Text Classifier Competitions

Text classification (TC) is a Natural Language Processing (NLP) task focused on identifying a text's label. A standard approach is to pose text classification as a supervised learning problem. In supervised learning, everything starts with a dataset composed of pairs of inputs and outputs; in this case, the inputs are texts, and the outputs are the associated labels. The aim is for the developed algorithm to automatically assign a label to any given text, independently of whether that text appeared in the original dataset. The feasible classes are only those found in the original dataset. In some circumstances, the method can also report the confidence of its prediction, so the user can decide whether to use or discard it.

Following a supervised learning approach requires the input to be in a representation amenable to the learning algorithm; usually, this is a vector. One of the most common methods to represent a text as a vector is the Bag of Words (BoW) model, which relies on a fixed vocabulary: each component of the vector corresponds to an element of the vocabulary, and a non-zero value indicates the presence of that element in the text, as illustrated below.
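To make the BoW encoding concrete, here is a minimal sketch using scikit-learn's CountVectorizer; this is an illustration only, since EvoMSA builds its BoW models on the INGEOTEC B4MSA/microtc stack rather than on scikit-learn:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus; each text becomes a vector over the fixed vocabulary.
corpus = ["good movie", "bad movie", "good plot bad acting"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix, one row per text

print(vectorizer.get_feature_names_out())
# ['acting' 'bad' 'good' 'movie' 'plot']
print(X.toarray())
# [[0 0 1 1 0]
#  [0 1 0 1 0]
#  [1 1 1 0 1]]
```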

The text classifier's performance depends on the quality of the representation and on the classifier used. Deciding which representation and algorithm to use is daunting; in this contribution, we describe a set of classifiers that can be used, out of the box, on a new text classification problem. These classifiers are based on the BoW model. In addition, one of the methods, namely DenseBoW, represents the text in two stages. The first stage uses a set of BoW models and classifiers trained on self-supervised problems, where each task predicts the presence of a particular token. Consequently, the text is represented by a vector in which each component is associated with a token, and the value encodes the presence of that token. The BoW and DenseBoW models were combined using a stacked generalization approach, namely StackGeneralization; a usage sketch follows.
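The sketch below shows how these three classes are typically combined, following our reading of the EvoMSA 2.0 quickstart; the file dataset.json is a placeholder, and keyword arguments such as lang and decision_function_models are assumptions that should be verified against the library's documentation:

```python
from EvoMSA import BoW, DenseBoW, StackGeneralization
from microtc.utils import tweet_iterator

# Placeholder dataset: a JSON-lines file where each record has
# a 'text' field and a 'klass' (label) field.
D = list(tweet_iterator('dataset.json'))

# Base classifiers: a BoW model and the two-stage DenseBoW model.
bow = BoW(lang='es')
dense = DenseBoW(lang='es')

# StackGeneralization combines the base models' decision functions
# with a meta-classifier (stacked generalization).
stack = StackGeneralization(decision_function_models=[bow, dense]).fit(D)
stack.predict(['Buenos días'])
```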

The text classifiers presented here have been tested in many text classification competitions without modification. The aim is to offer a better understanding of how these algorithms perform in a new scenario, and of the performance gap with respect to an algorithm tailored to the problem at hand. We tested 13 different configurations on each task of each competition, and the configuration with the best performance was submitted to the contest. The best configuration was selected using either k-fold cross-validation or a validation set, depending on the information provided by the challenge; the selection step is sketched below.
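As a generic illustration of this selection procedure (the toy texts, labels, and the two candidate pipelines below are ours; the actual systems compare 13 EvoMSA configurations), the best candidate can be chosen by cross-validated macro-\(f_1\):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data standing in for a competition's training set.
X = ["great product", "awful service", "loved it", "terrible quality",
     "really good", "very bad", "excellent value", "worst purchase"]
y = ["pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

# Two stand-in candidates; the actual systems evaluate 13 configurations.
candidates = {
    "tfidf+svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    "tfidf+nb": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}

# Select the configuration with the best cross-validated macro-f1
# (cv=2 only because the toy dataset is tiny).
scores = {name: np.mean(cross_val_score(model, X, y, cv=2,
                                        scoring="f1_macro"))
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores[best])
```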

Results

Following an unconventional approach, the performance of EvoMSA 2.0 in different competitions is presented before the parameters used and the challenges are described. The table below lists, for each competition, the score of the winning system, the score of EvoMSA 2.0, and the relative difference between the two (the winner's score minus EvoMSA 2.0's, as a percentage of EvoMSA 2.0's score).

EvoMSA 2.0 Performance in different competitions.

| Competition | Edition | Score | Winner | EvoMSA 2.0 | Difference |
|---|---|---|---|---|---|
| HaSpeeDe3 (textual) | 2023 | macro-\(f_1\) | 0.9128 | 0.8845 (Conf.) | 3.2% |
| HaSpeeDe3 (XReligiousHate) | 2023 | macro-\(f_1\) | 0.6525 | 0.5522 (Conf.) | 18.2% |
| HODI | 2023 | macro-\(f_1\) | 0.81079 | 0.71527 (Conf.) | 13.4% |
| ACTI | 2023 | Accuracy | 0.85712 | 0.78207 (Conf.) | 9.6% |
| PoliticIT (Global) | 2023 | | 0.824057 | 0.762001 | 8.1% |
| PoliticIT (Gender) | 2023 | macro-\(f_1\) | 0.824287 | 0.732259 (Conf.) | 12.6% |
| PoliticIT (Ideology Binary) | 2023 | macro-\(f_1\) | 0.928223 | 0.848525 (Conf.) | 9.4% |
| PoliticIT (Ideology Multiclass) | 2023 | macro-\(f_1\) | 0.751477 | 0.705220 (Conf.) | 6.6% |
| PoliticEs (Global) | 2023 | | 0.811319 | 0.777584 | 4.3% |
| PoliticEs (Gender) | 2023 | macro-\(f_1\) | 0.829633 | 0.711549 (Conf.) | 16.6% |
| PoliticEs (Profession) | 2023 | macro-\(f_1\) | 0.860824 | 0.837945 (Conf.) | 2.7% |
| PoliticEs (Ideology Binary) | 2023 | macro-\(f_1\) | 0.896715 | 0.891394 (Conf.) | 0.6% |
| PoliticEs (Ideology Multiclass) | 2023 | macro-\(f_1\) | 0.691334 | 0.669448 (Conf.) | 3.3% |
| DA-VINCIS | 2023 | \(f_1\) | 0.9264 | 0.8903 (Conf.) | 4.1% |
| DA-VINCIS | 2022 | \(f_1\) | 0.7817 | 0.7510 (Conf.) | 4.1% |
| Rest-Mex (Global) | 2023 | see overview | 0.7790190145 | 0.7375714730 | 5.6% |
| Rest-Mex (Polarity) | 2023 | see overview | 0.621691991 | 0.554880778 (Conf.) | 12.0% |
| Rest-Mex (Type) | 2023 | see overview | 0.99032231 | 0.980539122 (Conf.) | 1.0% |
| Rest-Mex (Country) | 2023 | see overview | 0.942028113 | 0.927052594 (Conf.) | 1.6% |
| HOMO-MEX | 2023 | macro-\(f_1\) | 0.8847 | 0.8050 (Conf.) | 9.9% |
| HOPE (ES) | 2023 | macro-\(f_1\) | 0.9161 | 0.5214 (Conf.) | 75.7% |
| HOPE (EN) | 2023 | macro-\(f_1\) | 0.5012 | 0.4651 (Conf.) | 7.8% |
| DIPROMATS (ES) | 2023 | \(f_1\) | 0.8089 | 0.7485 (Conf.) | 8.1% |
| DIPROMATS (EN) | 2023 | \(f_1\) | 0.8090 | 0.7255 (Conf.) | 11.5% |
| HUHU | 2023 | \(f_1\) | 0.820 | 0.775 (Conf.) | 5.8% |
| TASS | 2017 | macro-\(f_1\) | 0.577 | 0.525 (Conf.) | 9.9% |
| EDOS (A) | 2023 | macro-\(f_1\) | 0.8746 | 0.7890 (Conf.) | 10.8% |
| EDOS (B) | 2023 | macro-\(f_1\) | 0.7326 | 0.5413 (Conf.) | 35.3% |
| EDOS (C) | 2023 | macro-\(f_1\) | 0.5606 | 0.3388 (Conf.) | 65.5% |

Competitions