EvoMSA.base.EvoMSA

class EvoMSA[source]

This is the main entry point to create an EvoMSA model.

Let us start with an example showing how to create an EvoMSA model. The first step is to read the dataset; EvoMSA includes a dummy dataset to test its functionality, so let us use it.

Read the dataset

>>> from EvoMSA import base
>>> from microtc.utils import tweet_iterator
>>> import os
>>> tweets = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> D = list(tweet_iterator(tweets))
>>> X = [x['text'] for x in D]
>>> y = [x['klass'] for x in D]
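
The tweet_iterator helper used above reads a JSON-lines file, yielding one decoded object per line. A minimal stdlib sketch of the same idea (the json_lines name and the toy records below are illustrative, not part of EvoMSA):

```python
import json
import tempfile

def json_lines(filename):
    """Yield one decoded JSON object per non-empty line (JSON-lines format)."""
    with open(filename, encoding='utf-8') as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Build a tiny two-line dataset mirroring the 'text'/'klass' schema
# of the dummy tweets.json file.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as fh:
    fh.write('{"text": "good", "klass": "P"}\n')
    fh.write('{"text": "bad", "klass": "N"}\n')
    path = fh.name

D = list(json_lines(path))
X = [x['text'] for x in D]
y = [x['klass'] for x in D]
```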

Once the dataset is loaded, it is time to create an EvoMSA model

>>> from EvoMSA.base import EvoMSA
>>> stacked_method = 'sklearn.naive_bayes.GaussianNB'
>>> evo = EvoMSA(stacked_method=stacked_method).fit(X, y)

Predict a sentence in Spanish

>>> evo.predict(['EvoMSA esta funcionando'])
array(['P'], dtype='<U4')
Parameters:
  • b4msa_args (dict) – Arguments passed to TextModel, updating the default arguments

  • stacked_method_args (dict) – Arguments passed to the stacked method

  • n_jobs (int) – Number of processes used; default 1, <= 0 to use all processors

  • n_splits (int) – Number of folds to train EvoDAG or evodag_class

  • seed (int) – Random seed; default 0

  • classifier (bool) – Whether EvoMSA acts as a classifier; default True

  • models (list) – Models used as list of pairs (see flags: TR, TH and Emo)

  • stacked_method (str or class) – Classifier or regressor used to ensemble the outputs of models; default EvoDAG.model.EvoDAGE

  • TR (bool) – Use b4msa.textmodel.TextModel, sklearn.svm.LinearSVC on the training set

  • Emo (bool) – Use EvoMSA.model.EmoSpace[Ar|En|Es], sklearn.svm.LinearSVC

  • TH (bool) – Use EvoMSA.model.ThumbsUpDown[Ar|En|Es], sklearn.svm.LinearSVC

  • HA (bool) – Use HA datasets, sklearn.svm.LinearSVC

  • B4MSA – Pre-trained text model

  • tm_n_jobs (int) – Number of processes used on the text models; <= 0 to use all processors

  • cache (str) – Basename used to store the output of the text models

__init__(b4msa_args={}, stacked_method='EvoDAG.model.EvoDAGE', stacked_method_args={}, n_jobs=1, n_splits=5, seed=0, classifier=True, models=None, lang=None, TR=True, Emo=False, TH=False, HA=False, B4MSA=False, Aggress=False, tm_n_jobs=None, cache=None)[source]
first_stage(X, y)[source]

Training EvoMSA’s first stage

Parameters:
  • X (dict or list) – Independent variables

  • y (list) – Dependent variable.

Returns:

List of vector spaces, i.e., the second stage's training set

Return type:

list

>>> import os
>>> from EvoMSA import base
>>> from microtc.utils import tweet_iterator
>>> TWEETS = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> X = [x['text'] for x in tweet_iterator(TWEETS)]
>>> y = [x['klass'] for x in tweet_iterator(TWEETS)]
>>> evo = base.EvoMSA()
>>> D = evo.first_stage(X, y)
>>> D.shape
(1000, 4)
fit(X, y, test_set=None)[source]

Train the model using a training set of pairs: text and dependent variable (e.g., class). EvoMSA is a two-stage procedure; the first stage transforms the text into a vector space whose dimensions are related to the number of classes, and the second stage trains a supervised learning algorithm on that space.

Parameters:
  • X (dict or list) – Independent variables

  • y (list) – Dependent variable.

Returns:

EvoMSA instance, i.e., self
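
The two-stage procedure described above can be illustrated with a toy sketch in pure Python. The keyword scorers here are hypothetical stand-ins for EvoMSA's text models, and the averaging ensemble stands in for the stacked method; only the structure (first stage produces per-class decision values, second stage combines them) mirrors EvoMSA:

```python
# Toy illustration of a two-stage (stacked) classifier.
# Stage 1: each "model" maps a text to per-class scores ('N', 'P').
# Stage 2: an ensemble averages the scores and picks the argmax.

POSITIVE = {'good', 'great', 'love'}
NEGATIVE = {'bad', 'awful', 'hate'}

def keyword_model(text):
    """First-stage model: counts of negative and positive keywords."""
    words = set(text.lower().split())
    return [len(words & NEGATIVE), len(words & POSITIVE)]

def prior_model(text):
    """A second (deliberately uninformative) first-stage model."""
    return [0.5, 0.5]

def first_stage(text):
    """Concatenate the decision values of all first-stage models."""
    return keyword_model(text) + prior_model(text)

def predict(text):
    """Stage 2: average per-class scores across models, take the argmax."""
    vec = first_stage(text)
    n_score = (vec[0] + vec[2]) / 2
    p_score = (vec[1] + vec[3]) / 2
    return 'P' if p_score >= n_score else 'N'
```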

property stacked_method

Method’s instance used to ensemble the output of the first stage.

property classifier

Whether EvoMSA is acting as a classifier

property models

Models used as list of pairs

Return type:

list

property textModels

Text Models

Return type:

list

property cache

Basename used to store the output of the text models

predict(X, cache=None)[source]

Predict the output for each element of X

Parameters:
  • X (list) – List of strings

  • cache (str) – Basename to store the output of the text models.

kfold_supervised_learning(X_vector_space, y)[source]

K-fold procedure used to create the training set for the stacked_method, i.e., the second stage's training set

Return type:

np.array
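
The out-of-fold idea behind this method can be sketched with the standard library alone: every training example is scored by a model that never saw it, so the second stage trains on unbiased predictions. The mean-predicting model below is a stand-in, not EvoMSA's actual first stage, and the contiguous fold split is a simplification:

```python
# Sketch of k-fold out-of-fold predictions for stacking.

def kfold_indices(n, n_splits):
    """Split range(n) into n_splits contiguous (test, train) index pairs."""
    fold = n // n_splits
    for k in range(n_splits):
        start = k * fold
        stop = n if k == n_splits - 1 else start + fold
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield test, train

def mean_model(train_values):
    """Stand-in 'model': always predicts the mean of its training values."""
    return sum(train_values) / len(train_values)

def out_of_fold(values, n_splits=5):
    """Each position gets a prediction from a model trained without it."""
    preds = [None] * len(values)
    for test, train in kfold_indices(len(values), n_splits):
        model = mean_model([values[i] for i in train])
        for i in test:
            preds[i] = model
    return preds
```

Because each fold's prediction comes from a model fit on the other folds, the resulting array is a fair training set for the stacked method.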