EvoMSA.base.EvoMSA
- class EvoMSA
This is the main entry point for creating an EvoMSA model.
Let us start with an example that shows how to create an EvoMSA model. The first step is to read the dataset. EvoMSA ships with a dummy dataset for testing its functionality, so let's use it.
Read the dataset
>>> from EvoMSA import base
>>> from microtc.utils import tweet_iterator
>>> import os
>>> tweets = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> D = list(tweet_iterator(tweets))
>>> X = [x['text'] for x in D]
>>> y = [x['klass'] for x in D]
Once the dataset is loaded, it is time to create an EvoMSA model
>>> from EvoMSA.base import EvoMSA
>>> stacked_method = 'sklearn.naive_bayes.GaussianNB'
>>> evo = EvoMSA(stacked_method=stacked_method).fit(X, y)
Predict a sentence in Spanish
>>> evo.predict(['EvoMSA esta funcionando'])
array(['P'], dtype='<U4')
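The stacked_method argument in the example above is given as a dotted import path. The following is a minimal sketch of how such a string could be turned into a class with the standard library; the helper names resolve_dotted_path and as_class are hypothetical and are not part of EvoMSA, whose own resolution logic is not shown on this page.

```python
import importlib


def resolve_dotted_path(path):
    """Return the object named by a dotted path such as 'package.module.Name'."""
    module_name, _, attribute = path.rpartition('.')
    if not module_name:
        raise ValueError(f"expected 'module.Name', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def as_class(stacked_method):
    """Accept either a dotted-path string or an already imported class."""
    if isinstance(stacked_method, str):
        return resolve_dotted_path(stacked_method)
    return stacked_method


# A standard-library class is used here only so the sketch runs
# without installing scikit-learn or EvoDAG.
ordered_dict_cls = as_class('collections.OrderedDict')
```

Accepting both forms mirrors the documented type of stacked_method (str or class): a string keeps configuration serializable, while a class lets the caller pass something already imported.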
- Parameters:
b4msa_args (dict) – Arguments passed to TextModel, updating the default arguments
stacked_method_args (dict) – Arguments passed to the stacked method
n_jobs (int) – Number of processes; default 1, <= 0 to use all processors
n_splits (int) – Number of folds to train EvoDAG or evodag_class
seed (int) – Random seed; default 0
classifier (bool) – Whether EvoMSA acts as a classifier; default True
models (list) – Models used as list of pairs (see flags: TR, TH and Emo)
stacked_method (str or class) – Classifier or regressor used to ensemble the outputs of models; default EvoDAG.model.EvoDAGE
TR (bool) – Use b4msa.textmodel.TextModel, sklearn.svm.LinearSVC on the training set
Emo (bool) – Use EvoMSA.model.EmoSpace[Ar|En|Es], sklearn.svm.LinearSVC
TH (bool) – Use EvoMSA.model.ThumbsUpDown[Ar|En|Es], sklearn.svm.LinearSVC
HA (bool) – Use HA datasets, sklearn.svm.LinearSVC
B4MSA – Pre-trained text model
tm_n_jobs (int) – Number of processes used by the text models, <= 0 to use all processors
cache (str) – Store the output of text models
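The n_jobs and tm_n_jobs parameters share the convention that a value <= 0 means "use all processors". A small sketch of how that convention can be mapped to a concrete process count follows; resolve_n_jobs is a hypothetical helper written for illustration and is an assumption about the behavior, not EvoMSA's actual code.

```python
import os


def resolve_n_jobs(n_jobs=1):
    """Map the documented convention to a concrete process count."""
    if n_jobs <= 0:
        # os.cpu_count() may return None, so fall back to one process
        return os.cpu_count() or 1
    return n_jobs


single = resolve_n_jobs(1)       # explicit value is kept as given
all_cores = resolve_n_jobs(0)    # 0 or a negative value selects every processor
```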
- __init__(b4msa_args={}, stacked_method='EvoDAG.model.EvoDAGE', stacked_method_args={}, n_jobs=1, n_splits=5, seed=0, classifier=True, models=None, lang=None, TR=True, Emo=False, TH=False, HA=False, B4MSA=False, Aggress=False, tm_n_jobs=None, cache=None)
- first_stage(X, y)
Training EvoMSA’s first stage
- Parameters:
X (dict or list) – Independent variables
y (list) – Dependent variable.
- Returns:
List of vector spaces, i.e., second-stage’s training set
- Return type:
list
>>> import os
>>> from EvoMSA import base
>>> from microtc.utils import tweet_iterator
>>> TWEETS = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> X = [x['text'] for x in tweet_iterator(TWEETS)]
>>> y = [x['klass'] for x in tweet_iterator(TWEETS)]
>>> evo = base.EvoMSA()
>>> D = evo.first_stage(X, y)
>>> D.shape
(1000, 4)
- fit(X, y, test_set=None)
Train the model using a training set of pairs: text and dependent variable (e.g., class). EvoMSA is a two-stage procedure: the first stage transforms the text into a vector space whose dimensions are related to the number of classes, and the second stage trains a supervised learning algorithm on that space.
- Parameters:
X (dict or list) – Independent variables
y (list) – Dependent variable.
- Returns:
EvoMSA instance, i.e., self
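To make the two-stage idea concrete, here is a toy sketch in pure Python. It is not EvoMSA's implementation: per-class token counts stand in for the text models, a nearest-centroid rule stands in for the stacked method, and the fold-based training suggested by n_splits is omitted. All function names and the four-sentence dataset are made up for illustration.

```python
from collections import Counter, defaultdict


def train_first_stage(texts, labels):
    """Count how often each token appears with each class."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        counts[label].update(text.lower().split())
    return counts


def transform(counts, classes, text):
    """Map a text to one score per class (the first-stage vector space)."""
    tokens = text.lower().split()
    return [sum(counts[c][t] for t in tokens) for c in classes]


def train_second_stage(vectors, labels):
    """Compute one centroid per class from the first-stage vectors."""
    grouped = defaultdict(list)
    for vector, label in zip(vectors, labels):
        grouped[label].append(vector)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()}


def predict(centroids, vector):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], vector))
    return min(centroids, key=distance)


# Hypothetical toy data: 'P' for positive, 'N' for negative
X = ['good great day', 'great fun', 'bad awful day', 'awful sad']
y = ['P', 'P', 'N', 'N']
classes = sorted(set(y))

counts = train_first_stage(X, y)                        # stage 1: fit
D = [transform(counts, classes, text) for text in X]    # stage 1: vector space
centroids = train_second_stage(D, y)                    # stage 2: fit
prediction = predict(centroids, transform(counts, classes, 'great fun day'))
```

Each row of D has one column per class, which matches the description of a vector space whose dimensions are related to the number of classes.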
- property stacked_method
Method’s instance used to ensemble the output of the first stage.
- property classifier
Whether EvoMSA is acting as a classifier
- property models
Models used as list of pairs
- Return type:
list
- property textModels
Text Models
- Return type:
list
- property cache
Basename used to store the output of the text models