Multi-label Classification Using Polylingual Embeddings


This document proposes a method for multi-label, multi-class text classification using polylingual embeddings. It generates document embeddings in each language with pooling methods and learns cross-language embeddings with an autoencoder. Experiments on a dataset of 12,670 instances across 100 classes show that distributed representations outperform bag-of-words models when labeled data are limited. Neighborhood-based classifiers such as k-NN outperform SVMs on the polylingual embeddings, likely because of the semantic nature of these representations. The authors conclude that more work is needed on composition functions for word representations and on efficiently combining them with bag-of-words models.


Multi-label, Multi-class Classification Using Polylingual Embeddings
Georgios.Balikas@imag.fr and Massih-Reza.Amini@imag.fr
1. MOTIVATION
Less than 50% of current Internet content is written in English, yet many high-quality resources exist for English.
Can we transfer knowledge between languages? How can we profitably exploit multilingual content?
Applications such as text classification and summarization would benefit from such approaches.
Figure 1: Multilinguality.
2. OUR REPRESENTATION LEARNING APPROACH
Figure 2: The generation of polylingual document embeddings starting from the given languages.
Generate document embeddings in each language (English, French, ...) by average pooling of word embeddings (cbow or skip-gram [1]) or with paragraph vectors [2].
Learn a language-independent embedding for each document with a denoising auto-encoder.
Evaluate the classifiers on these polylingual representations, taken from the auto-encoder's hidden layer.
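The three steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes pretrained word vectors are available as one dictionary per language, that a document's per-language pooled vectors are concatenated into a single auto-encoder input, and it picks masking noise, one tied-weight sigmoid hidden layer, a linear decoder and squared-error loss. The names `average_pool` and `DenoisingAutoencoder`, the toy vocabularies and all hyperparameters are hypothetical.

```python
import numpy as np


def average_pool(tokens, word_vectors, dim):
    """Document embedding = mean of the vectors of its in-vocabulary tokens."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return np.zeros(dim)  # no known word: fall back to the zero vector
    return np.mean(vecs, axis=0)


class DenoisingAutoencoder:
    """One tied-weight hidden layer trained to reconstruct the clean input
    from a copy corrupted by masking noise (illustrative choices)."""

    def __init__(self, n_in, n_hidden, corruption=0.3, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(n_in, n_hidden))
        self.b_h = np.zeros(n_hidden)
        self.b_out = np.zeros(n_in)
        self.corruption = corruption
        self.lr = lr

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def encode(self, X):
        # The hidden layer is the language-independent (polylingual) embedding.
        return self._sigmoid(X @ self.W + self.b_h)

    def fit(self, X, epochs=200):
        n = X.shape[0]
        for _ in range(epochs):
            mask = self.rng.random(X.shape) > self.corruption
            X_noisy = X * mask                   # masking noise: zero ~30% of inputs
            H = self.encode(X_noisy)
            X_rec = H @ self.W.T + self.b_out    # linear decoder with tied weights
            d_out = (X_rec - X) / n              # error against the *clean* input
            d_h = (d_out @ self.W) * H * (1.0 - H)
            # Tied weights receive gradients from both decoder and encoder.
            grad_W = X_noisy.T @ d_h + d_out.T @ H
            self.W -= self.lr * grad_W
            self.b_h -= self.lr * d_h.sum(axis=0)
            self.b_out -= self.lr * d_out.sum(axis=0)
        return self

    def reconstruction_error(self, X):
        H = self.encode(X)
        return float(np.mean((H @ self.W.T + self.b_out - X) ** 2))


# Usage: each document is the concatenation of its English and French
# pooled embeddings, built here from toy random word vectors.
rng = np.random.default_rng(1)
dim = 4
en_vectors = {w: rng.normal(size=dim) for w in ["market", "stocks", "rise", "match"]}
fr_vectors = {w: rng.normal(size=dim) for w in ["marché", "actions", "hausse", "match"]}
docs = [
    (["market", "rise"], ["marché", "hausse"]),
    (["stocks", "rise"], ["actions", "hausse"]),
    (["match"], ["match"]),
    (["market", "stocks"], ["marché", "actions"]),
]
X = np.array([
    np.concatenate([average_pool(en, en_vectors, dim), average_pool(fr, fr_vectors, dim)])
    for en, fr in docs
])
dae = DenoisingAutoencoder(n_in=2 * dim, n_hidden=3)
before = dae.reconstruction_error(X)
dae.fit(X)
Z = dae.encode(X)  # one 3-dimensional polylingual embedding per document
```

Concatenation is only one simple way to present several languages to a single auto-encoder; the poster specifies only that a denoising auto-encoder is used and that the embeddings are read from its hidden layer.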
3. THE EXPERIMENTAL FRAMEWORK
Figure 3: Polylingual embeddings vs. bag-of-words representations: F1 measure against the proportion of the training set used (0.1 to 0.9), with one panel for cbow and one for skip-gram embeddings, each comparing SVM (PE), k-NN (PE) and SVM (BoW). Complete dataset: 12,670 instances (100 classes).
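The poster reports an F1 measure but does not say how it is averaged over the 100 classes. As a hedged illustration of the two usual choices for multi-label output, the sketch below computes micro-averaged F1 (true and false positives pooled over all labels) and macro-averaged F1 (unweighted mean of per-label F1). Which variant the authors used is not stated, and both helper names are hypothetical.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1: pool TP/FP/FN over all documents and labels."""
    tp = fp = fn = 0
    for gold, pred in zip(true_sets, pred_sets):
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(true_sets, pred_sets, labels):
    """Macro-averaged F1: per-label F1 = 2TP / (2TP + FP + FN), then the mean."""
    scores = []
    for label in labels:
        pairs = list(zip(true_sets, pred_sets))
        tp = sum(1 for g, p in pairs if label in g and label in p)
        fp = sum(1 for g, p in pairs if label not in g and label in p)
        fn = sum(1 for g, p in pairs if label in g and label not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


gold = [{"sport"}, {"politics", "economy"}]
pred = [{"sport"}, {"economy", "culture"}]
print(round(micro_f1(gold, pred), 3))  # 0.667
print(macro_f1(gold, pred, ["sport", "politics", "economy", "culture"]))  # 0.5
```

Micro-averaging favors frequent classes, while macro-averaging weights every class equally, so the two can differ noticeably when the label distribution is skewed.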
Embedding   dim.   k-NN (DR)   SVM (DR)   k-NN (PE)   SVM (PE)
cbow          50       39.19      37.20       39.58      32.84
cbow         100       40.20      40.01       43.53      37.54
cbow         200       40.48      43.41       45.86      42.50
cbow         300       40.42      44.25       46.33      43.38
DBOW pv       50       24.45      25.06       30.26      24.08
DBOW pv      100       31.20      28.53       34.63      26.88
DBOW pv      200       27.73      29.80       36.02      30.80
DBOW pv      300       27.79      29.92       38.71      30.82
SVM (BoW): 36.03
Table 1: A detailed analysis of the performance gains.
Interestingly, distributed representations of documents outperform the bag-of-words baseline when few labeled examples are available.
On the polylingual embeddings, k-NN outperforms SVMs at every dimension, probably because of the semantic nature of these representations.
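One reading of the semantic-nature explanation is that documents with related content lie close together in the embedding space, so a neighborhood rule can use that geometry directly. A minimal multi-label k-NN sketch follows; cosine similarity, the vote threshold and the "at least one label" fallback are assumptions rather than details given in the poster, and `knn_multilabel_predict` is a hypothetical name.

```python
from collections import Counter

import numpy as np


def knn_multilabel_predict(X_train, Y_train, X_test, k=3, threshold=0.5):
    """For each test row, vote among its k most cosine-similar training rows
    and keep every label voted for by at least threshold * k neighbors."""

    def normalize(M):
        norms = np.linalg.norm(M, axis=1, keepdims=True)
        return M / np.where(norms == 0, 1.0, norms)  # avoid division by zero

    sims = normalize(X_test) @ normalize(X_train).T  # (n_test, n_train) cosines
    predictions = []
    for row in sims:
        neighbors = np.argsort(-row)[:k]  # indices of the k most similar documents
        votes = Counter(label for i in neighbors for label in Y_train[i])
        chosen = {label for label, c in votes.items() if c >= threshold * k}
        if not chosen and votes:  # always predict at least one label
            chosen = {votes.most_common(1)[0][0]}
        predictions.append(chosen)
    return predictions


X_train = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
Y_train = [{"sport"}, {"sport", "health"}, {"politics"}, {"politics", "economy"}]
X_test = np.array([[1.0, 0.05], [0.05, 1.0]])
preds = knn_multilabel_predict(X_train, Y_train, X_test, k=3)
print(preds)  # [{'sport'}, {'politics'}]
```

Because k-NN uses only distances between embeddings, it has no decision boundary to fit, which may help when few labeled examples per class are available; the poster itself offers only the semantic-nature explanation.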



Take home message!
We need composition functions that retain more information when combining word representations.
How can such representations be efficiently combined with bag-of-words representations?
REFERENCES
[1] Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv:1301.3781 (2013).
[2] Le, Quoc V., and Tomas Mikolov. "Distributed representations of sentences and documents." arXiv:1405.4053 (2014).
ACKNOWLEDGEMENTS
This work is partially supported by the CIFRE N° 28/2015 and by the LabEx PERSYVAL Lab ANR-11-LABX-0025.

Similar to Multi-label Classification Using Polylingual Embeddings

DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE

Dorra elmekki nlp

NLP Project: Paragraph Topic Classification

Fine grained irony classification through transfer learning approach

ODSC East: Effective Transfer Learning for NLP

Challenges in transfer learning in nlp

Survey on Natural Language Generation

Turkish language modeling using BERT

Word2vec on the italian language: first experiments

Development of learned dictionary based spoken language

A systematic study of text mining techniques

TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...

Texts Classification with the usage of Neural Network based on the Word2vec’s...

Improving Text Categorization with Semantic Knowledge in Wikipedia

Medical Text Classification using Convolutional Neural Network

Multi label classification of

Team G

Automatic Grading of Handwritten Answers

Athifah procedia technology_2013

Improvement wsd dictionary using annotated corpus and testing it with simplif...
