`EvoMSA.utils`¶

class LabelEncoderWrapper[source]¶

Wrapper of LabelEncoder. The idea is to keep the order of the classes when they are numbers; this is expected to help improve performance in ordinal classification problems.

Parameters: classifier (bool) – Specifies whether it is a classification problem

__init__(classifier=True)[source]¶

property classifier¶: Whether EvoMSA is acting as a classifier

fit(y)[source]¶

Fit the label encoder

Parameters: y (list or np.array) – Dependent variable (the labels to encode)
Return type: self
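
The order-preserving idea described for LabelEncoderWrapper can be illustrated with a minimal sketch. The class below, `OrderedLabelEncoder`, is a hypothetical stand-in written in plain Python; it is not EvoMSA's implementation and only assumes that numeric classes should map to codes in ascending order.

```python
class OrderedLabelEncoder:
    """Hypothetical stand-in, NOT EvoMSA's LabelEncoderWrapper."""

    def fit(self, y):
        # Sort the distinct labels so that smaller numeric classes
        # receive smaller integer codes.
        self.classes_ = sorted(set(y))
        self._index = {label: i for i, label in enumerate(self.classes_)}
        return self  # mirrors the documented "Return type: self"

    def transform(self, y):
        # Map each label to its integer code.
        return [self._index[label] for label in y]

    def inverse_transform(self, codes):
        # Map integer codes back to the original labels.
        return [self.classes_[i] for i in codes]


encoder = OrderedLabelEncoder().fit([10, -1, 3, 10])
codes = encoder.transform([3, 10, -1])
```

Because the codes follow the numeric order of the classes, a model that treats the codes as ordered values sees the same ordering as the original labels.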

class Cache[source]¶

Store the output of the text models

__init__(basename)[source]¶

linearSVC_array(classifiers)[source]¶

Transform a list of LinearSVC classifiers into weights stored in array.array

Parameters: classifiers (list) – List of LinearSVC instances, each one a binary classifier
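
As a rough illustration of what storing LinearSVC weights in array.array can look like, the sketch below copies the coefficients and intercept of each binary classifier into compact arrays. The helper name `to_weight_arrays` and the returned layout are assumptions made for this example; the layout that linearSVC_array actually produces is not documented here. The stand-in objects only mimic the `coef_` (shape `(1, n_features)`) and `intercept_` (shape `(1,)`) attributes that a fitted binary LinearSVC exposes.

```python
from array import array
from types import SimpleNamespace


def to_weight_arrays(classifiers):
    """Hypothetical helper: copy binary linear models into array.array."""
    coefs = []
    intercepts = array('d')
    for clf in classifiers:
        # A binary LinearSVC has a single row of coefficients.
        coefs.append(array('d', clf.coef_[0]))
        intercepts.append(clf.intercept_[0])
    return coefs, intercepts


# Stand-ins for fitted classifiers (assumed values, for illustration only).
classifiers = [
    SimpleNamespace(coef_=[[0.5, -1.0]], intercept_=[0.25]),
    SimpleNamespace(coef_=[[2.0, 3.0]], intercept_=[-1.0]),
]
coefs, intercepts = to_weight_arrays(classifiers)
```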

bootstrap_confidence_interval(y: ndarray, hy: ndarray, metric: Callable[[float, float], float] = <function <lambda>>, alpha: float = 0.05, nbootstrap: int = 500) → Tuple[float, float][source]¶: Confidence interval from predictions
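
A percentile bootstrap is one common way to obtain such an interval from predictions; the sketch below shows the general technique in plain Python. It is not EvoMSA's code: the function name `bootstrap_ci_sketch`, the `seed` argument, and the choice of accuracy as the default metric are assumptions (the default metric of bootstrap_confidence_interval is an unnamed lambda).

```python
import random


def accuracy(y, hy):
    # Fraction of predictions that match the gold labels.
    return sum(a == b for a, b in zip(y, hy)) / len(y)


def bootstrap_ci_sketch(y, hy, metric=accuracy, alpha=0.05,
                        nbootstrap=500, seed=0):
    """Hypothetical percentile-bootstrap interval for metric(y, hy)."""
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(nbootstrap):
        # Resample prediction/label pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y[i] for i in idx], [hy[i] for i in idx]))
    scores.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical percentiles.
    low = scores[int(round((alpha / 2) * (nbootstrap - 1)))]
    high = scores[int(round((1 - alpha / 2) * (nbootstrap - 1)))]
    return low, high


y = [0, 1, 1, 0, 1, 0, 1, 1]
hy = [0, 1, 0, 0, 1, 1, 1, 1]
low, high = bootstrap_ci_sketch(y, hy, nbootstrap=200, seed=1)
```

With the default `alpha=0.05`, the returned pair brackets the central 95% of the resampled metric values.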

class ConfidenceInterval[source]¶

Estimate the confidence interval

>>> from EvoMSA import base
>>> from EvoMSA.utils import ConfidenceInterval
>>> from microtc.utils import tweet_iterator
>>> import os
>>> tweets = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> D = list(tweet_iterator(tweets))
>>> X = [x['text'] for x in D]
>>> y = [x['klass'] for x in D]
>>> kw = dict(stacked_method="sklearn.naive_bayes.GaussianNB") 
>>> ci = ConfidenceInterval(X, y, evomsa_kwargs=kw)
>>> result = ci.estimate()

__init__(X: List[str], y: ndarray | list, Xtest: List[str] = None, y_test: ndarray | list = None, evomsa_kwargs: Dict = {}, folds: None | BaseCrossValidator = None) → None[source]¶

class Linear[source]¶

>>> import numpy as np
>>> from EvoMSA.model import Linear
>>> linear = Linear(coef=[12, 3], intercept=0.5, labels=[0, 'P'])
>>> X = np.array([[2, -1]])
>>> linear.decision_function(X)
21.5
>>> linear.predict(X)[0]
'P'

__init__(coef: list | ndarray, intercept: float = 0, labels: list | ndarray | None = None, N: int = 0) → None[source]¶

property N¶: Size

property coef¶: Coefficients

property intercept¶: Bias or intercept

property labels¶: Classes
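
The value 21.5 in the Linear doctest is the dot product of the coefficients with the input plus the intercept: 12 · 2 + 3 · (−1) + 0.5. The plain-Python sketch below reproduces that arithmetic. The function names are hypothetical, and the rule that a positive score selects the second label is an assumption that is merely consistent with the doctest output, not a statement about the class internals.

```python
def linear_score(coef, intercept, x):
    # Dot product of coefficients and features, plus the bias term.
    return sum(c * v for c, v in zip(coef, x)) + intercept


def linear_predict(coef, intercept, labels, x):
    # Assumption: a positive score maps to the second label,
    # otherwise the first label is returned.
    return labels[1] if linear_score(coef, intercept, x) > 0 else labels[0]


score = linear_score([12, 3], 0.5, [2, -1])
label = linear_predict([12, 3], 0.5, [0, 'P'], [2, -1])
```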

emoji_information(lang='es')[source]¶

Download and load the Emoji statistics

Parameters: lang (str) – Language code, one of [‘ar’, ‘zh’, ‘en’, ‘fr’, ‘pt’, ‘ru’, ‘es’]

>>> from EvoMSA.utils import emoji_information
>>> info = emoji_information()
>>> info['💧']
{'recall': 0.10575916230366492, 'ratio': 0.0003977123419509893, 'number': 3905}

load_dataset(lang='es', name='HA', k=None, d=17, func='most_common_by_type', v1=False)[source]¶

Download and load the Dataset representation

Parameters:

lang (str) – Language code, one of [‘ar’, ‘zh’, ‘en’, ‘es’]
emoji (int) – emoji identifier

>>> from EvoMSA.utils import load_dataset, load_bow
>>> bow = load_bow(lang='en')
>>> ds = load_dataset(lang='en', name='travel', k=0)
>>> X = bow.transform(['this is funny'])
>>> df = ds.decision_function(X)

dataset_information(lang='es')[source]¶

Download and load datasets information

Parameters: lang (str) – Language code, one of [‘ar’, ‘zh’, ‘en’, ‘es’]

>>> from EvoMSA.utils import dataset_information
>>> info = dataset_information()
