.. _v1:

==================================
EvoMSA first version
==================================

.. image:: https://github.com/INGEOTEC/EvoMSA/actions/workflows/test.yaml/badge.svg
   :target: https://github.com/INGEOTEC/EvoMSA/actions/workflows/test.yaml

.. image:: https://coveralls.io/repos/github/INGEOTEC/EvoMSA/badge.svg?branch=develop
   :target: https://coveralls.io/github/INGEOTEC/EvoMSA?branch=develop

.. image:: https://badge.fury.io/py/EvoMSA.svg
   :target: https://badge.fury.io/py/EvoMSA

.. image:: https://dev.azure.com/conda-forge/feedstock-builds/_apis/build/status/evomsa-feedstock?branchName=main
   :target: https://dev.azure.com/conda-forge/feedstock-builds/_build/latest?definitionId=16466&branchName=main

.. image:: https://img.shields.io/conda/vn/conda-forge/evomsa.svg
   :target: https://anaconda.org/conda-forge/evomsa

.. image:: https://img.shields.io/conda/pn/conda-forge/evomsa.svg
   :target: https://anaconda.org/conda-forge/evomsa

.. image:: https://readthedocs.org/projects/evomsa/badge/?version=docs
   :target: https://evomsa.readthedocs.io/en/docs/?badge=docs

.. image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: https://colab.research.google.com/github/INGEOTEC/EvoMSA/blob/master/docs/Quickstart.ipynb

EvoMSA is a Sentiment Analysis System based on B4MSA and EvoDAG. EvoMSA is a stack generalization algorithm specialized on text classification problems. It works by combining the output of different :ref:`text models <tm>` to produce the final prediction.

EvoMSA is a two-stage procedure; the first step transforms the text into a vector space with dimensions related to the number of classes, and then, the second stage trains a supervised learning algorithm.
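The two-stage idea can be illustrated with scikit-learn alone. The following is a minimal sketch, not EvoMSA's implementation: the toy corpus is hypothetical, ``TfidfVectorizer`` stands in for a text model, and the use of out-of-fold predictions to train the second stage is an assumption of this sketch (a common stacking practice). ``LinearSVC`` and ``GaussianNB`` are the first-stage default and the alternative stacking classifier mentioned in this document.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus with c = 3 classes (not an EvoMSA dataset).
texts = [
    "good movie", "great film", "nice plot", "lovely story",
    "bad movie", "awful film", "poor plot", "boring story",
    "the movie", "a film", "one plot", "some story",
]
labels = ["P"] * 4 + ["N"] * 4 + ["NEU"] * 4

# First stage: TF-IDF maps each text to a vector, and LinearSVC maps that
# vector to c decision values (one per class).
first_stage = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))

# Out-of-fold decision values, so the second stage is not trained on
# outputs computed for texts the first stage had already seen.
Z = cross_val_predict(first_stage, texts, labels, cv=2,
                      method="decision_function")

# Second stage: a supervised learner trained on the c-dimensional vectors.
stacker = GaussianNB().fit(Z, labels)

# Prediction: refit the first stage on all the data, then chain both stages.
first_stage.fit(texts, labels)
new_texts = ["great story", "awful plot"]
pred = stacker.predict(first_stage.decision_function(new_texts))
print(Z.shape, pred)
```

Each text ends up represented by as many values as there are classes, which is the sense in which the first stage produces a vector space with dimensions related to the number of classes.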
The first stage is a composition of two functions, :math:`g \circ m`, where :math:`m` is a :ref:`text model <tm>` that transforms a text into a vector (i.e., :math:`m: \text{text} \rightarrow \mathbb R^d`) and :math:`g` is a classifier or regressor (i.e., :math:`g: \mathbb R^d \rightarrow \mathbb R^c`); :math:`d` depends on :math:`m`, and :math:`c` is the number of classes or labels.

EvoMSA contains different :ref:`text models <tm>` (i.e., :math:`m`), which can be selected using flags in the class constructor. The :ref:`text models <tm>` implemented are:

* :py:class:`b4msa.textmodel.TextModel` model trained with the training set (it is set by default :py:attr:`TR`)
* :ref:`emospace` (it is selected using :py:attr:`EvoMSA.base.EvoMSA(Emo=True, lang="en")`)
* :ref:`th` (it is selected using :py:attr:`EvoMSA.base.EvoMSA(TH=True, lang="en")`)
* :ref:`ha` (it is selected using :py:attr:`EvoMSA.base.EvoMSA(HA=True, lang="en")`)
* :ref:`aggress` (it is selected using :py:attr:`EvoMSA.base.EvoMSA(Aggress=True, lang="en")`)

where :py:attr:`lang` specifies the language and can be *ar*, *en*, or *es*, which correspond to Arabic, English, and Spanish, respectively. On the other hand, :math:`g` is a classifier or regressor, and by default, it uses :py:class:`sklearn.svm.LinearSVC`.

The second stage is the stacking method, which is a classifier or regressor. EvoMSA uses by default EvoDAG (i.e., :py:class:`EvoDAG.model.EvoDAGE`); however, this method can be changed with the parameter :py:attr:`stacked_method`, e.g., :py:attr:`EvoMSA.base.EvoMSA(stacked_method="sklearn.naive_bayes.GaussianNB")`.

EvoMSA is described in *EvoMSA: A Multilingual Evolutionary Approach for Sentiment Analysis*, Mario Graff, Sabino Miranda-Jimenez, Eric Sadit Tellez, Daniela Moctezuma. Computational Intelligence Magazine, vol 15 no. 1, pp. 76-88, Feb. 2020.
In this document, we follow the notation used in the CIM paper as closely as possible; we believe this makes EvoMSA's goals easier to grasp.

Quickstart Guide
===================

We provide a live quickstart guide as a notebook; it covers the installation, the use of EvoMSA with different text models, and ends by explaining how the text models can be used on their own. The notebook can be found in the docs directory on GitHub.

Citing
======

If you find EvoMSA useful for any academic/scientific purpose, we would appreciate citations to the following reference:

.. code:: bibtex

   @article{DBLP:journals/corr/abs-1812-02307,
     author  = {Mario Graff and Sabino Miranda{-}Jim{\'{e}}nez and Eric Sadit Tellez and Daniela Moctezuma},
     title   = {EvoMSA: {A} Multilingual Evolutionary Approach for Sentiment Analysis},
     journal = {Computational Intelligence Magazine},
     volume  = {15},
     issue   = {1},
     year    = {2020},
     pages   = {76 -- 88},
     url     = {https://ieeexplore.ieee.org/document/8956106},
     month   = {Feb.}
   }

Installing EvoMSA
=======================

EvoMSA can be easily installed using Anaconda:

.. code:: bash

   conda install -c conda-forge EvoMSA

It can also be installed using pip; it depends on numpy, scipy, scikit-learn, and b4msa.

.. code:: bash

   pip install cython
   pip install sparsearray
   pip install evodag
   pip install EvoMSA

Usage
=========

EvoMSA can be used with the following commands.

Read the dataset:

>>> from EvoMSA import base
>>> from microtc.utils import tweet_iterator
>>> import os
>>> tweets = os.path.join(os.path.dirname(base.__file__), 'tests', 'tweets.json')
>>> D = list(tweet_iterator(tweets))
>>> X = [x['text'] for x in D]
>>> y = [x['klass'] for x in D]

Once the dataset is loaded, the next step is to create an EvoMSA model; let us create one enhanced with :ref:`emospace`.

>>> from EvoMSA.base import EvoMSA
>>> evo = EvoMSA(Emo=True, lang='es').fit(X, y)

Predict a sentence in Spanish:

>>> evo.predict(['EvoMSA esta funcionando'])

EvoMSA uses by default :py:class:`EvoDAG.model.EvoDAGE` as the stacked classifier; however, this is a parameter that can be modified. Let us, for example, use :py:class:`sklearn.naive_bayes.GaussianNB` in the previous example.

>>> evo = EvoMSA(Emo=True, lang='es', stacked_method='sklearn.naive_bayes.GaussianNB').fit(X, y)
>>> evo.predict(['EvoMSA esta funcionando'])

.. _tm:

Text Models
=================

Besides the default text model (i.e., :py:class:`b4msa.textmodel.TextModel`), EvoMSA has four text models (EvoMSA's CIM paper presents only the first three) for the Arabic, English, and Spanish languages that can be selected with a flag in the constructor. These are:

.. toctree::
   :maxdepth: 2

   emospace
   th
   ha
   aggress

..
   * :ref:`emospace`.
   * :ref:`th`.
   * :ref:`ha`.
   * :ref:`aggress`.

Nonetheless, more text models can be included in EvoMSA. EvoMSA's core idea is to facilitate the inclusion of diverse text models. We have been using EvoMSA (as the INGEOTEC team) in different competitions run at the Workshop of Semantic Evaluation as well as in other sentiment-analysis tasks and traditional text classification problems. During this time, we have created different text models -- some of them using the datasets provided by the competition's organizers and others inspired by our previous work -- in different languages. We have decided to make these text models public, organized by language.

.. toctree::
   :maxdepth: 2

   arabic
   english
   spanish
   cites

EvoMSA's classes
==================

.. toctree::
   :maxdepth: 2

   base
   utils