EvoMSA.utils
- class LabelEncoderWrapper
Wrapper of LabelEncoder. The idea is to keep the order when the classes are numbers; at some point this will help improve the performance in ordinary classification problems.
- Parameters:
classifier (bool) – Specifies whether it is a classification problem
- property classifier
Whether EvoMSA is acting as a classifier
- linearSVC_array(classifiers)
Transform a list of LinearSVC models into weights stored in array.array
- Parameters:
classifiers (list) – List of LinearSVC models, each of them a binary classifier
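For readers unfamiliar with `array.array`, the sketch below shows one way per-classifier weights could be packed into compact typed arrays. It is an illustration under stated assumptions, not the function's actual behavior: `FakeBinarySVC` is a hypothetical stand-in that mimics the `coef_` and `intercept_` attributes of a fitted binary scikit-learn `LinearSVC`, `pack_weights` is a hypothetical name, and the row-by-row layout is chosen only for the example.

```python
from array import array
from typing import List, NamedTuple, Sequence, Tuple


class FakeBinarySVC(NamedTuple):
    """Hypothetical stand-in for a fitted binary LinearSVC."""
    coef_: List[List[float]]   # shape (1, n_features)
    intercept_: List[float]    # shape (1,)


def pack_weights(classifiers: Sequence[FakeBinarySVC]) -> Tuple[array, array]:
    """Pack coefficients and intercepts into arrays of C doubles.

    Assumption: coefficients are stored one classifier after another.
    """
    coef = array('d')
    intercept = array('d')
    for clf in classifiers:
        coef.extend(clf.coef_[0])           # the single weight row
        intercept.append(clf.intercept_[0])
    return coef, intercept


models = [FakeBinarySVC([[0.5, -1.0]], [0.1]),
          FakeBinarySVC([[2.0, 0.25]], [-0.3])]
coef, intercept = pack_weights(models)
print(list(coef), list(intercept))
```

A typed array stores raw doubles contiguously, so it is lighter than a list of Python floats and can be serialized without depending on scikit-learn objects.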
- bootstrap_confidence_interval(y: numpy.ndarray, hy: numpy.ndarray, metric: Callable[[float, float], float] = <function <lambda>>, alpha: float = 0.05, nbootstrap: int = 500) → Tuple[float, float]
Confidence interval from predictions
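To illustrate what a bootstrap interval over predictions involves, here is a minimal percentile-bootstrap sketch that uses only the standard library. It is not EvoMSA's implementation: the names `percentile_bootstrap_ci` and `accuracy`, the `seed` argument, the choice of accuracy as the default metric, and the assumption that the metric receives the full resampled label and prediction sequences are all hypothetical.

```python
import random
from typing import Callable, Sequence, Tuple


def accuracy(y: Sequence, hy: Sequence) -> float:
    """Fraction of positions where the prediction equals the gold label."""
    return sum(a == b for a, b in zip(y, hy)) / len(y)


def percentile_bootstrap_ci(y: Sequence, hy: Sequence,
                            metric: Callable[[Sequence, Sequence], float] = accuracy,
                            alpha: float = 0.05,
                            nbootstrap: int = 500,
                            seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap: resample (y, hy) pairs with replacement,
    score each resample, and read off the alpha/2 and 1 - alpha/2 quantiles."""
    rng = random.Random(seed)
    n = len(y)
    scores = []
    for _ in range(nbootstrap):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        scores.append(metric([y[i] for i in idx], [hy[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * (nbootstrap - 1))]
    upper = scores[int((1 - alpha / 2) * (nbootstrap - 1))]
    return lower, upper


y = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]
hy = [0, 1, 0, 0, 1, 0, 1, 1, 1, 1]
low, high = percentile_bootstrap_ci(y, hy)
print(low, high)
```

Resampling label/prediction pairs together keeps each prediction aligned with its gold label, which is what makes the resampled metric values meaningful.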
- class ConfidenceInterval
Estimate the confidence interval
>>> from EvoMSA import base
>>> from EvoMSA.utils import ConfidenceInterval
>>> from microtc.utils import tweet_iterator
>>> import os
>>> tweets = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> D = list(tweet_iterator(tweets))
>>> X = [x['text'] for x in D]
>>> y = [x['klass'] for x in D]
>>> kw = dict(stacked_method="sklearn.naive_bayes.GaussianNB")
>>> ci = ConfidenceInterval(X, y, evomsa_kwargs=kw)
>>> result = ci.estimate()
- class Linear
>>> import numpy as np
>>> from EvoMSA.model import Linear
>>> linear = Linear(coef=[12, 3], intercept=0.5, labels=[0, 'P'])
>>> X = np.array([[2, -1]])
>>> linear.decision_function(X)
21.5
>>> linear.predict(X)[0]
'P'
- __init__(coef: list | ndarray, intercept: float = 0, labels: list | ndarray | None = None, N: int = 0) → None
- property N
Size
- property coef
Coefficients
- property intercept
Bias or intercept
- property labels
Classes
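The values in the `Linear` example can be checked by hand: the decision value is the dot product of `coef` with the input plus `intercept`. The sketch below reproduces that arithmetic in plain Python. It assumes, as the example's output suggests, that a positive decision value maps to the second entry of `labels`; `decision_value` and `predict_label` are hypothetical helper names, not part of EvoMSA.

```python
from typing import Sequence


def decision_value(coef: Sequence[float], intercept: float,
                   x: Sequence[float]) -> float:
    """Dot product of the coefficients and the input, plus the intercept."""
    return sum(c * v for c, v in zip(coef, x)) + intercept


def predict_label(coef: Sequence[float], intercept: float,
                  labels: Sequence, x: Sequence[float]):
    """Assumption: a positive score selects labels[1], otherwise labels[0]."""
    return labels[1] if decision_value(coef, intercept, x) > 0 else labels[0]


# 12 * 2 + 3 * (-1) + 0.5 = 21.5
score = decision_value([12, 3], 0.5, [2, -1])
label = predict_label([12, 3], 0.5, [0, 'P'], [2, -1])
print(score, label)  # 21.5 P
```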
- emoji_information(lang='es')
Download and load the Emoji statistics
- Parameters:
lang (str) – Language code, one of [‘ar’, ‘zh’, ‘en’, ‘fr’, ‘pt’, ‘ru’, ‘es’]
>>> from EvoMSA.utils import emoji_information
>>> info = emoji_information()
>>> info['💧']
{'recall': 0.10575916230366492, 'ratio': 0.0003977123419509893, 'number': 3905}
- load_dataset(lang='es', name='HA', k=None, d=17, func='most_common_by_type', v1=False)
Download and load the Dataset representation
- Parameters:
lang (str) – Language code, one of [‘ar’, ‘zh’, ‘en’, ‘es’]
emoji (int) – emoji identifier
>>> from EvoMSA.utils import load_dataset, load_bow
>>> bow = load_bow(lang='en')
>>> ds = load_dataset(lang='en', name='travel', k=0)
>>> X = bow.transform(['this is funny'])
>>> df = ds.decision_function(X)