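.. _comp_systems:

========================
Competition Systems
========================

We test 13 different combinations of :ref:`BoW` and :ref:`DenseBoW` models. These combinations cover the two procedures to select the vocabulary (parameter ``voc_selection``), the use of pre-trained :ref:`BoW`, and the creation of the :ref:`BoW` representation from the given training set. Additionally, we create text representations tailored to the problem at hand. That is, the words with the most discriminant power in a :ref:`BoW` classifier, trained on the training set, are selected as the labels of the self-supervised problems.

.. autoclass:: EvoMSA.competitions.Comp2023
   :members:

.. _tailored-keywords:

Tailored Keywords
-----------------------------

The following code ranks the tokens of a :ref:`BoW` classifier trained on the training set ``D`` by the median absolute value of their weighted coefficients. It skips tokens prefixed with ``q:``, tokens containing ``~``, and tokens already used as keywords by :ref:`DenseBoW`. The 2048 top-ranked tokens become the labels of the self-supervised problems. ``LANG``, ``D``, ``MODEL``, and ``PATH_DATASET`` are placeholders that must be defined beforehand.

.. code-block:: python

    import numpy as np
    from EvoMSA import BoW, DenseBoW
    # SelfSupervisedDataset must also be imported; it is not defined here.

    # Train a BoW classifier on the training set without the pre-trained vocabulary
    bow = BoW(lang=LANG, pretrain=False).fit(D)
    # Keywords already covered by the pre-trained dense representations
    keywords = DenseBoW(lang=LANG, emoji=False, dataset=False).names
    # Score each token by the median of |weight * coefficient|
    tokens = [(name, np.median(np.fabs(w * v)))
              for name, w, v in zip(bow.names, bow.weights,
                                    bow.estimator_instance.coef_.T)
              if name[:2] != 'q:' and '~' not in name and name not in keywords]
    tokens.sort(key=lambda x: x[1], reverse=True)
    # Build the self-supervised datasets using the top 2048 tokens as labels
    semi = SelfSupervisedDataset([k for k, _ in tokens[:2048]],
                                 tempfile=f'{MODEL}.gz',
                                 bow=BoW(lang=LANG), capacity=1, n_jobs=63)
    semi.process(PATH_DATASET, output=MODEL)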
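
The ranking step can be tried in isolation without training any model. The sketch below is a minimal illustration that uses only NumPy. The arrays ``names``, ``weights``, and ``coef``, as well as the set ``keywords``, are hypothetical stand-ins for ``bow.names``, ``bow.weights``, ``bow.estimator_instance.coef_``, and the :ref:`DenseBoW` keyword list; their values are invented for the example and do not come from a real dataset.

.. code-block:: python

    import numpy as np

    # Hypothetical stand-ins for bow.names and bow.weights (one entry per token)
    names = ['good', 'bad', 'q:goo', 'not~good', 'the', 'happy']
    weights = np.array([1.0, 1.2, 0.8, 1.5, 0.1, 2.0])
    # Hypothetical stand-in for bow.estimator_instance.coef_,
    # with shape (number of classes, number of tokens)
    coef = np.array([[0.9, -0.7, 0.5, 0.8, 0.01, 0.6],
                     [-0.8, 0.9, -0.4, -0.9, 0.02, -0.5],
                     [0.1, 0.2, 0.1, 0.3, 0.03, 0.4]])
    # Hypothetical set of tokens already used as keywords elsewhere
    keywords = {'happy'}

    # Score each token by the median of |weight * coefficient| across classes,
    # skipping 'q:' tokens, tokens containing '~', and existing keywords
    tokens = [(name, float(np.median(np.fabs(w * v))))
              for name, w, v in zip(names, weights, coef.T)
              if name[:2] != 'q:' and '~' not in name and name not in keywords]
    tokens.sort(key=lambda x: x[1], reverse=True)

    # Keep the two best-ranked tokens (the code above keeps 2048)
    selected = [name for name, _ in tokens[:2]]
    print(selected)

Taking the median across classes favors tokens that matter for at least half of the per-class coefficients, so a token with one very large coefficient and negligible values elsewhere is ranked low.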