EvoMSA.base.EvoMSA

class EvoMSA[source]

This is the main entry point to create an EvoMSA model.

Let us start with an example showing how to create an EvoMSA model. The first step is to read the dataset; EvoMSA includes a dummy dataset to test its functionality, so let us use it.

Read the dataset

>>> from EvoMSA import base
>>> from microtc.utils import tweet_iterator
>>> import os
>>> tweets = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> D = list(tweet_iterator(tweets))
>>> X = [x['text'] for x in D]
>>> y = [x['klass'] for x in D]
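
The tweet_iterator helper used above reads a JSON-lines file, yielding one decoded object per line. A minimal stdlib sketch of the same idea (the json_lines name and the toy records below are illustrative, not part of EvoMSA):

```python
import json
import tempfile

def json_lines(filename):
    """Yield one decoded JSON object per non-empty line (JSON-lines format)."""
    with open(filename, encoding='utf-8') as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Build a tiny two-line dataset mirroring the 'text'/'klass' schema
# of the dummy tweets.json file.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as fh:
    fh.write('{"text": "good", "klass": "P"}\n')
    fh.write('{"text": "bad", "klass": "N"}\n')
    path = fh.name

D = list(json_lines(path))
X = [x['text'] for x in D]
y = [x['klass'] for x in D]
```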

Once the dataset is loaded, it is time to create an EvoMSA model

>>> from EvoMSA.base import EvoMSA
>>> stacked_method = 'sklearn.naive_bayes.GaussianNB'
>>> evo = EvoMSA(stacked_method=stacked_method).fit(X, y)

Predict a sentence in Spanish

>>> evo.predict(['EvoMSA esta funcionando'])
array(['P'], dtype='<U4')
Parameters:
  • b4msa_args (dict) – Arguments passed to TextModel, updating the default arguments

  • stacked_method_args (dict) – Arguments passed to the stacked method

  • n_jobs (int) – Number of processes used; default 1, <= 0 to use all processors

  • n_splits (int) – Number of folds to train EvoDAG or evodag_class

  • seed (int) – Random seed; default 0

  • classifier (bool) – Whether EvoMSA acts as a classifier; default True

  • models (list) – Models used as list of pairs (see flags: TR, TH and Emo)

  • stacked_method (str or class) – Classifier or regressor used to ensemble the outputs of models; default EvoDAG.model.EvoDAGE

  • TR (bool) – Use b4msa.textmodel.TextModel, sklearn.svm.LinearSVC on the training set

  • Emo (bool) – Use EvoMSA.model.EmoSpace[Ar|En|Es], sklearn.svm.LinearSVC

  • TH (bool) – Use EvoMSA.model.ThumbsUpDown[Ar|En|Es], sklearn.svm.LinearSVC

  • HA (bool) – Use HA datasets, sklearn.svm.LinearSVC

  • B4MSA – Pre-trained text model

  • tm_n_jobs (int) – Number of processes used on the text models; <= 0 to use all processors

  • cache (str) – Basename used to store the output of the text models

__init__(b4msa_args={}, stacked_method='EvoDAG.model.EvoDAGE', stacked_method_args={}, n_jobs=1, n_splits=5, seed=0, classifier=True, models=None, lang=None, TR=True, Emo=False, TH=False, HA=False, B4MSA=False, Aggress=False, tm_n_jobs=None, cache=None)[source]
first_stage(X, y)[source]

Training EvoMSA’s first stage

Parameters:
  • X (dict or list) – Independent variables

  • y (list) – Dependent variable.

Returns:

List of vector spaces, i.e., the second stage's training set

Return type:

list

>>> import os
>>> from EvoMSA import base
>>> from microtc.utils import tweet_iterator
>>> TWEETS = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> X = [x['text'] for x in tweet_iterator(TWEETS)]
>>> y = [x['klass'] for x in tweet_iterator(TWEETS)]
>>> evo = base.EvoMSA()
>>> D = evo.first_stage(X, y)
>>> D.shape
(1000, 4)
fit(X, y, test_set=None)[source]

Train the model using a training set of pairs: text and dependent variable (e.g., class). EvoMSA is a two-stage procedure; the first stage transforms the text into a vector space whose dimensions are related to the number of classes, and the second stage trains a supervised learning algorithm on that space.

Parameters:
  • X (dict or list) – Independent variables

  • y (list) – Dependent variable.

Returns:

EvoMSA instance, i.e., self
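
The two-stage procedure described above can be illustrated with a toy sketch in pure Python. The keyword scorers here are hypothetical stand-ins for EvoMSA's text models, and the averaging ensemble stands in for the stacked method; only the structure (first stage produces per-class decision values, second stage combines them) mirrors EvoMSA:

```python
# Toy illustration of a two-stage (stacked) classifier.
# Stage 1: each "model" maps a text to per-class scores ('N', 'P').
# Stage 2: an ensemble averages the scores and picks the argmax.

POSITIVE = {'good', 'great', 'love'}
NEGATIVE = {'bad', 'awful', 'hate'}

def keyword_model(text):
    """First-stage model: counts of negative and positive keywords."""
    words = set(text.lower().split())
    return [len(words & NEGATIVE), len(words & POSITIVE)]

def prior_model(text):
    """A second (deliberately uninformative) first-stage model."""
    return [0.5, 0.5]

def first_stage(text):
    """Concatenate the decision values of all first-stage models."""
    return keyword_model(text) + prior_model(text)

def predict(text):
    """Stage 2: average per-class scores across models, take the argmax."""
    vec = first_stage(text)
    n_score = (vec[0] + vec[2]) / 2
    p_score = (vec[1] + vec[3]) / 2
    return 'P' if p_score >= n_score else 'N'
```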

property stacked_method

Method’s instance used to ensemble the output of the first stage.

property classifier

Whether EvoMSA is acting as a classifier

property models

Models used as list of pairs

Return type:

list

property textModels

Text Models

Return type:

list

property cache

Basename used to store the output of the text models

predict(X, cache=None)[source]

Predict the output for each element of X

Parameters:
  • X (list) – List of strings

  • cache (str) – Basename to store the output of the text models.

kfold_supervised_learning(X_vector_space, y)[source]

K-fold procedure used to create the training set for the stacked_method, i.e., the second stage's training set

Return type:

np.array
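
The out-of-fold idea behind this method can be sketched with the standard library alone: every training example is scored by a model that never saw it, so the second stage trains on unbiased predictions. The mean-predicting model below is a stand-in, not EvoMSA's actual first stage, and the contiguous fold split is a simplification:

```python
# Sketch of k-fold out-of-fold predictions for stacking.

def kfold_indices(n, n_splits):
    """Split range(n) into n_splits contiguous (test, train) index pairs."""
    fold = n // n_splits
    for k in range(n_splits):
        start = k * fold
        stop = n if k == n_splits - 1 else start + fold
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield test, train

def mean_model(train_values):
    """Stand-in 'model': always predicts the mean of its training values."""
    return sum(train_values) / len(train_values)

def out_of_fold(values, n_splits=5):
    """Each position gets a prediction from a model trained without it."""
    preds = [None] * len(values)
    for test, train in kfold_indices(len(values), n_splits):
        model = mean_model([values[i] for i in train])
        for i in test:
            preds[i] = model
    return preds
```

Because each fold's prediction comes from a model fit on the other folds, the resulting array is a fair training set for the stacked method.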