Sentiment lexicon-based model

This section introduces a sentiment lexicon-based model into EvoMSA. The idea is to count how many words of a text are positive and how many are negative according to an affective lexicon. This model is described in more detail here.
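The counting idea can be sketched in a few lines of Python. This is a minimal sketch, not EvoMSA's implementation: the positive and negative word lists below are a tiny hypothetical lexicon made up for illustration, and the tokenizer (lowercase, split into runs of letters) is an assumption; EvoMSA's own lexicon and text normalization may differ.

```python
import re

# Hypothetical toy lexicon for illustration only; not EvoMSA's affective lexicon.
POSITIVE = {"good", "great", "happy", "love"}
NEGATIVE = {"bad", "sad", "hate", "awful"}


def tokenize(text):
    """Lowercase the text and split it into runs of (Unicode) letters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def count_polarity(text, positive=POSITIVE, negative=NEGATIVE):
    """Return (number of positive words, number of negative words) in text."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos, neg


print(count_polarity("Good morning, I love this great day"))  # (3, 0)
print(count_polarity("What a bad, sad day"))  # (0, 2)
```

The two counts form a small, interpretable summary of a text's polarity; a downstream classifier decides how to weigh them.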

This model has been implemented for Arabic, English, and Spanish. To use it, first read a dataset on which to train EvoMSA:

>>> from EvoMSA import base
>>> from microtc.utils import tweet_iterator
>>> import os
>>> tweets = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> D = list(tweet_iterator(tweets))
>>> X = [x['text'] for x in D]
>>> y = [x['klass'] for x in D]
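tweet_iterator yields one dictionary per tweet, from which the text and klass fields are extracted above. Assuming the file stores one JSON object per line, a rough standard-library equivalent looks like the sketch below; read_json_lines is a hypothetical helper, the sample records and label values are made up, and microtc's function may handle more cases than this.

```python
import json
import os
import tempfile


def read_json_lines(path):
    """Yield one dict per non-empty line of a JSON-lines file (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Build a tiny sample file in the assumed format: one JSON object per line.
sample = [
    {"text": "buenos dias", "klass": "P"},
    {"text": "que mal dia", "klass": "N"},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as fh:
    for record in sample:
        fh.write(json.dumps(record) + "\n")
    path = fh.name

D = list(read_json_lines(path))
X = [x["text"] for x in D]  # inputs
y = [x["klass"] for x in D]  # labels
os.remove(path)  # clean up the temporary file
print(X, y)  # ['buenos dias', 'que mal dia'] ['P', 'N']
```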

Once the dataset is loaded, EvoMSA with the lexicon-based model is trained as follows; TH=True enables the lexicon-based model and lang='es' sets the language to Spanish:

>>> from EvoMSA.base import EvoMSA
>>> evo = EvoMSA(TH=True, lang='es').fit(X, y)
>>> evo.predict(['buenos dias'])
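To give an intuition of how the positive and negative counts can drive a prediction, the sketch below uses them as a two-dimensional feature vector and fits a plain nearest-centroid classifier. This is a swapped-in technique chosen for brevity, not what EvoMSA does internally; the Spanish word lists and the training texts are hypothetical.

```python
import math
import re

# Hypothetical toy Spanish lexicon; not the lexicon EvoMSA uses.
POSITIVE = {"buenos", "buen", "feliz", "excelente"}
NEGATIVE = {"mal", "triste", "horrible", "odio"}


def features(text):
    """Map a text to the 2-d vector (positive count, negative count)."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return (sum(t in POSITIVE for t in tokens), sum(t in NEGATIVE for t in tokens))


def fit_centroids(X, y):
    """Average the feature vectors of each class (nearest-centroid training)."""
    sums, counts = {}, {}
    for text, label in zip(X, y):
        p, n = features(text)
        sp, sn = sums.get(label, (0.0, 0.0))
        sums[label] = (sp + p, sn + n)
        counts[label] = counts.get(label, 0) + 1
    return {k: (s[0] / counts[k], s[1] / counts[k]) for k, s in sums.items()}


def predict(centroids, texts):
    """Assign each text the label of the closest centroid (Euclidean distance)."""
    out = []
    for text in texts:
        f = features(text)
        out.append(min(centroids, key=lambda k: math.dist(f, centroids[k])))
    return out


X = ["buenos dias", "excelente y feliz", "que mal dia", "dia triste y horrible"]
y = ["P", "P", "N", "N"]
centroids = fit_centroids(X, y)
print(predict(centroids, ["buenos dias"]))  # ['P']
```

A nearest-centroid rule keeps the example dependency-free; any classifier that accepts numeric features could take its place.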

Specifically, the lexicon-based model is implemented by the following classes, one per language:

  • EvoMSA.model.ThumbsUpDownAr

  • EvoMSA.model.ThumbsUpDownEn

  • EvoMSA.model.ThumbsUpDownEs

These models can be tested as follows:

>>> from EvoMSA.model import ThumbsUpDownEn
>>> th = ThumbsUpDownEn()
>>> th['good morning']
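The indexing interface used above, th['...'], can be mimicked with a class that defines __getitem__ and picks a lexicon according to a language code. The sketch below is hypothetical: ToyThumbsUpDown, its lang argument, the word lists, and its (positive, negative) return value are assumptions for illustration, not the interface or output of EvoMSA's ThumbsUpDown classes. Only English and Spanish toy lexicons are included.

```python
import re

# Hypothetical per-language toy lexicons: (positive words, negative words).
LEXICONS = {
    "en": ({"good", "great", "happy"}, {"bad", "sad", "awful"}),
    "es": ({"buenos", "feliz", "excelente"}, {"mal", "triste", "horrible"}),
}


class ToyThumbsUpDown:
    """Lexicon counter indexed like th['...']; a hypothetical stand-in, not EvoMSA's class."""

    def __init__(self, lang="en"):
        if lang not in LEXICONS:
            raise ValueError(f"no toy lexicon for language {lang!r}")
        self.positive, self.negative = LEXICONS[lang]

    def __getitem__(self, text):
        """Return (positive count, negative count) for text."""
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        pos = sum(t in self.positive for t in tokens)
        neg = sum(t in self.negative for t in tokens)
        return pos, neg


th = ToyThumbsUpDown(lang="en")
print(th["good morning"])  # (1, 0)
print(ToyThumbsUpDown(lang="es")["buenos dias"])  # (1, 0)
```

Keeping one lexicon per language behind a common interface mirrors the split into one class per language listed above.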