EvoMSA.utils

class LabelEncoderWrapper[source]

Wrapper of LabelEncoder. The idea is to keep the order of the classes when they are numbers; at some point this will help improve the performance in ordinary classification problems.

Parameters:

classifier (bool) – Specifies whether it is a classification problem

__init__(classifier=True)[source]
property classifier

Whether EvoMSA is acting as a classifier

fit(y)[source]

Fit the label encoder

Parameters:

y (list or np.array) – Labels to encode (the dependent variable)

Return type:

self
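
The order-keeping idea can be illustrated with a minimal sketch. This is not the wrapper's implementation; `fit_classes` and `encode` are hypothetical helpers, and the rule "sort numerically when every label parses as an integer" is an assumption made for illustration.

```python
def fit_classes(y):
    """Return the unique labels in the order used for encoding."""
    labels = set(y)
    try:
        # Assumption: when every label reads as an integer, sort by value
        # so that the codes follow the numeric order of the classes.
        return sorted(labels, key=int)
    except (TypeError, ValueError):
        # Otherwise fall back to plain lexicographic ordering.
        return sorted(labels)


def encode(y, classes):
    """Map each label to its position in `classes`."""
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in y]


numeric_classes = fit_classes(['10', '9', '2', '10'])
numeric_codes = encode(['9', '10', '2'], numeric_classes)
text_classes = fit_classes(['POS', 'NEG', 'POS'])
```

With a plain lexicographic sort, '10' would be placed before '2' and '9'; sorting by value keeps the codes aligned with the numeric order of the classes.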

class Cache[source]

Store the output of the text models

__init__(basename)[source]
linearSVC_array(classifiers)[source]

Transform LinearSVC classifiers into weights stored in array.array

Parameters:

classifiers (list) – List of LinearSVC where each element is binary
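
As a rough sketch of what storing weights in array.array can look like, the snippet below flattens per-classifier coefficients into typed arrays of doubles. `weights_to_arrays` is a hypothetical helper, the plain lists stand in for the `coef_[0]` and `intercept_[0]` values of fitted binary LinearSVC models, and the row-by-row layout is an assumption; the layout actually returned by linearSVC_array may differ.

```python
from array import array


def weights_to_arrays(coefs, intercepts):
    """Flatten one coefficient row per classifier into array.array('d')."""
    coef = array('d')
    for row in coefs:
        # Rows are appended one after another (assumed layout).
        coef.extend(row)
    return coef, array('d', intercepts)


# Two hypothetical binary classifiers with two features each.
coef, intercept = weights_to_arrays([[0.5, -1.0], [2.0, 0.25]], [0.1, -0.3])
```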

bootstrap_confidence_interval(y: numpy.ndarray, hy: numpy.ndarray, metric: Callable[[float, float], float] = <function <lambda>>, alpha: float = 0.05, nbootstrap: int = 500) → Tuple[float, float][source]

Confidence interval from predictions
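
A percentile bootstrap is one common way to obtain such an interval; the sketch below assumes that method and is not taken from the library's source. `bootstrap_ci_sketch`, `accuracy`, and the `seed` argument are hypothetical names, while `alpha` and `nbootstrap` mirror the parameters in the signature above.

```python
import random


def bootstrap_ci_sketch(y, hy, metric, alpha=0.05, nbootstrap=500, seed=0):
    """Percentile bootstrap interval for metric(y, hy)."""
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(nbootstrap):
        # Resample the (y, hy) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y[i] for i in idx], [hy[i] for i in idx]))
    scores.sort()
    # Read the alpha/2 and 1 - alpha/2 percentiles off the sorted scores.
    lower = scores[int((alpha / 2) * (nbootstrap - 1))]
    upper = scores[int((1 - alpha / 2) * (nbootstrap - 1))]
    return lower, upper


def accuracy(y, hy):
    """Fraction of predictions that match the gold labels."""
    return sum(a == b for a, b in zip(y, hy)) / len(y)


y = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
hy = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
low, high = bootstrap_ci_sketch(y, hy, accuracy)
```

Fixing the seed makes the interval reproducible; a smaller `alpha` widens it.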

class ConfidenceInterval[source]

Estimate the confidence interval

>>> from EvoMSA import base
>>> from EvoMSA.utils import ConfidenceInterval
>>> from microtc.utils import tweet_iterator
>>> import os
>>> tweets = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> D = list(tweet_iterator(tweets))
>>> X = [x['text'] for x in D]
>>> y = [x['klass'] for x in D]
>>> kw = dict(stacked_method="sklearn.naive_bayes.GaussianNB") 
>>> ci = ConfidenceInterval(X, y, evomsa_kwargs=kw)
>>> result = ci.estimate()
__init__(X: List[str], y: ndarray | list, Xtest: List[str] = None, y_test: ndarray | list = None, evomsa_kwargs: Dict = {}, folds: None | BaseCrossValidator = None) → None[source]
class Linear[source]
>>> import numpy as np
>>> from EvoMSA.model import Linear
>>> linear = Linear(coef=[12, 3], intercept=0.5, labels=[0, 'P'])
>>> X = np.array([[2, -1]])
>>> linear.decision_function(X)
21.5
>>> linear.predict(X)[0]
'P'
__init__(coef: list | ndarray, intercept: float = 0, labels: list | ndarray | None = None, N: int = 0) → None[source]
property N

Size

property coef

Coefficients

property intercept

Bias or intercept

property labels

Classes
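
The arithmetic behind the Linear example can be sketched without the library. The helpers below are hypothetical and assume that the decision value is the dot product of coef and the input plus intercept, and that a positive value selects the second label; the class itself may handle ties and array inputs differently.

```python
def decision_function(coef, intercept, x):
    """Dot product of the coefficients and one input row, plus the bias."""
    return sum(w * v for w, v in zip(coef, x)) + intercept


def predict(coef, intercept, labels, x):
    """Assumed rule: a positive decision value selects the second label."""
    return labels[1] if decision_function(coef, intercept, x) > 0 else labels[0]


# Same numbers as the doctest: 12 * 2 + 3 * (-1) + 0.5
value = decision_function([12, 3], 0.5, [2, -1])
label = predict([12, 3], 0.5, [0, 'P'], [2, -1])
```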

emoji_information(lang='es')[source]

Download and load the Emoji statistics

Parameters:

lang (str) – [‘ar’, ‘zh’, ‘en’, ‘fr’, ‘pt’, ‘ru’, ‘es’]

>>> from EvoMSA.utils import emoji_information
>>> info = emoji_information()
>>> info['💧']
{'recall': 0.10575916230366492, 'ratio': 0.0003977123419509893, 'number': 3905}
load_dataset(lang='es', name='HA', k=None, d=17, func='most_common_by_type', v1=False)[source]

Download and load the Dataset representation

Parameters:
  • lang (str) – [‘ar’, ‘zh’, ‘en’, ‘es’]

  • emoji (int) – emoji identifier

>>> from EvoMSA.utils import load_dataset, load_bow
>>> bow = load_bow(lang='en')
>>> ds = load_dataset(lang='en', name='travel', k=0)
>>> X = bow.transform(['this is funny'])
>>> df = ds.decision_function(X)
dataset_information(lang='es')[source]

Download and load datasets information

Parameters:

lang (str) – [‘ar’, ‘zh’, ‘en’, ‘es’]

>>> from EvoMSA.utils import dataset_information
>>> info = dataset_information()