Topic Modeling and WSD on the Ancora Corpus
Ruben Izquierdo
Marten Postma
Piek Vossen
Ruben Izquierdo. LDA & WSD. SEPLN2015, Alicante.
Outline
1. Starting Point
2. Motivation
3. Our Approach
4. Evaluation Framework
5. Experiments and Results
6. Conclusions
Starting point
• "Understanding languages by machines" project
• Starts from the results of DutchSemCor (WSD)
• Analyse the real problems of WSD
• Understand the WSD task
  • Word
  • Meaning
  • Context
Still WSD?
• Word Sense Disambiguation is still unsolved
• Used in high-level applications
• Recently, some unsupervised approaches and SemEval tasks
  • BabelNet, Babelfy…
• Several reasons and problems
WSD problems I
• Context is not considered properly
  • Most are/were supervised approaches
  • Moving to unsupervised, graph-based…
• WSD as a black box
  • The larger the number of features, the better the performance?
  • The best and newest machine learning algorithm
• WSD is seen as only one problem
  • All words and cases treated in the same way
WSD problems II
• Error analysis of Senseval/SemEval systems [Postma et al., 2014]
  • Propagation errors (monosemous)
• Most Frequent Sense (MFS) bias
  • Supervised systems are skewed towards the MFS
• Error analysis on WSD and Senseval/SemEval
  • Performance on MFS cases is good
  • Very poor performance on non-MFS cases
• Systems assign the MFS in almost every case
  • Senseval-2
    • 799 cases where the correct sense is not the MFS
    • 84% of the systems still assign the MFS
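The MFS bias described above can be measured for a single lemma with a few lines of Python. This is only a sketch under stated assumptions, not the procedure of Postma et al.: the function name `mfs_bias`, the sense labels and the toy data are all hypothetical, and the MFS is taken to be the most frequent sense in the training annotations.

```python
from collections import Counter

def mfs_bias(train_senses, gold_senses, predicted_senses):
    """Return (number of non-MFS cases, share of them where the system output the MFS)."""
    # Assumption: the MFS is the most frequent sense in the training data.
    mfs = Counter(train_senses).most_common(1)[0][0]
    # Keep only the test instances whose correct sense is not the MFS.
    non_mfs = [(g, p) for g, p in zip(gold_senses, predicted_senses) if g != mfs]
    if not non_mfs:
        return 0, 0.0
    still_mfs = sum(1 for _, p in non_mfs if p == mfs)
    return len(non_mfs), still_mfs / len(non_mfs)

# Toy data for one lemma (hypothetical sense labels).
train = ["s1", "s1", "s1", "s2", "s3"]
gold = ["s1", "s2", "s2", "s3", "s1"]
pred = ["s1", "s1", "s2", "s1", "s1"]
n_cases, share = mfs_bias(train, gold, pred)
print(n_cases, round(share, 2))
```

A high share on the non-MFS cases means the system is mostly repeating the MFS rather than using the context.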
Main idea
• WSD considered as two different problems
  • When the MFS applies
    • More general usages
    • Larger contexts?
  • Rest of the senses
    • More concrete usages
    • Shorter contexts?
• Specialized classifiers for each case
  • Different features, parameters, contexts…
• Evaluation for Spanish
  • Sense-annotated corpus Ancora
Our approach
• TRAINING. Use topic modeling (LDA) to induce word-expert classifiers
  • For the Most Frequent Sense (BINARY)
    • Topics for the MFS case
    • Topics for non-MFS cases
  • For the rest of the senses, non-MFS (MULTICLASS)
    • Topics for every sense
• CLASSIFICATION. Apply the 2 classifiers in cascade to decide the sense in every case
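The cascade can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two classifiers are passed in as plain functions, and the toy rules, the sense labels (`partido#match`, `partido#party`) and the name `cascade_classify` are hypothetical stand-ins for the LDA-based word experts.

```python
from typing import Callable, Sequence

Context = Sequence[str]  # tokens of the context around the target word

def cascade_classify(
    context: Context,
    mfs_sense: str,
    is_mfs: Callable[[Context], bool],
    pick_sense: Callable[[Context], str],
) -> str:
    """Step 1 (binary): MFS or not. Step 2 (multiclass): choose among the senses."""
    if is_mfs(context):
        return mfs_sense
    return pick_sense(context)

# Toy stand-ins for the two word-expert classifiers (hypothetical rules).
def toy_is_mfs(context: Context) -> bool:
    return "gobierno" not in context

def toy_pick_sense(context: Context) -> str:
    return "partido#party"

first = cascade_classify(
    ["el", "equipo", "ganó", "el", "partido"], "partido#match", toy_is_mfs, toy_pick_sense
)
second = cascade_classify(
    ["el", "partido", "formó", "gobierno"], "partido#match", toy_is_mfs, toy_pick_sense
)
print(first, second)
```

Keeping the two steps as separate callables makes it possible to give each one its own context size and number of topics.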
Training
Classification
Evaluation framework
• Ancora corpus
  • News articles, Spanish part, 500K words, sense annotated (nouns)
  • Converted to NAF format
• 3-fold cross-validation
  • Keeping sense distribution
• 7119 unique lemmas annotated with nominal senses
  • 4907 are monosemous (69%)
  • 2212 are polysemous (31%)
    • 589 with at least 3 instances per sense (among the annotated instances)
[Figure: Number of lemmas vs. polysemy. x-axis: polysemy (2 to 12); y-axis: number of lemmas (0 to 1400).]
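A 3-fold split that keeps the sense distribution could look like the sketch below. It is an assumption about how such a split might be built, not the code used for the experiments: the function name `stratified_folds` and the toy data are hypothetical, and instances are assigned round-robin per sense without shuffling.

```python
from collections import defaultdict

def stratified_folds(instances, n_folds=3):
    """Split (instance_id, sense) pairs into n_folds, spreading each sense evenly."""
    by_sense = defaultdict(list)
    for instance_id, sense in instances:
        by_sense[sense].append(instance_id)
    folds = [[] for _ in range(n_folds)]
    # Deal the instances of each sense out to the folds in turn.
    for sense in sorted(by_sense):
        for i, instance_id in enumerate(by_sense[sense]):
            folds[i % n_folds].append((instance_id, sense))
    return folds

# Toy lemma with two senses: 6 instances of s1 and 3 of s2.
data = [(i, "s1") for i in range(6)] + [(i, "s2") for i in range(6, 9)]
folds = stratified_folds(data)
for fold in folds:
    print(fold)
```

With at least 3 instances per sense, this round-robin assignment places every sense in every fold.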
Baseline Results
• For the 589 selected lemmas

Baseline      Accuracy
Random        40.10
MFS overall   67.68
MFS folded    68.63
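The two kinds of baseline can be sketched like this. The slides do not define how the baselines were computed, so the choices here are assumptions: the random baseline is the expected accuracy of picking uniformly among the senses of each lemma, and the MFS is taken from the training part only. The function name `baselines` and the toy data are hypothetical.

```python
from collections import Counter

def baselines(data):
    """data maps lemma -> (train_senses, test_senses). Returns (random, mfs) accuracy."""
    total = 0
    random_hits = 0.0
    mfs_hits = 0
    for train, test in data.values():
        n_senses = len(set(train) | set(test))
        mfs = Counter(train).most_common(1)[0][0]
        total += len(test)
        # A uniform random choice is correct with probability 1 / n_senses.
        random_hits += len(test) / n_senses
        mfs_hits += sum(1 for s in test if s == mfs)
    return random_hits / total, mfs_hits / total

# Toy data (hypothetical sense labels).
toy = {
    "caso": (["a", "a", "b"], ["a", "b"]),
    "mes": (["x", "y", "y", "z"], ["y", "y", "z", "x"]),
}
random_acc, mfs_acc = baselines(toy)
print(round(random_acc, 4), mfs_acc)
```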
Experimentation
• Configuration of our cascade classifiers
  • Only one step, with the senseLDA classifier
  • 2 steps, mfsLDA with perfect performance + senseLDA
  • 2 steps, mfsLDA and senseLDA both induced automatically
• LDA parameters (Python gensim library)
  • Context size (number of sentences)
  • Number of topics for LDA
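The two LDA parameters can be illustrated with a small sketch. The slides only say that the context size is a number of sentences, so the symmetric window used here (n sentences on each side, 0 meaning the target sentence alone) is an assumption, and `context_window` is a hypothetical helper. The grid values are the ones that appear in the results tables.

```python
from itertools import product

def context_window(sentences, target_index, n_sentences):
    """Tokens of the target sentence plus n_sentences on each side (0 = target only)."""
    start = max(0, target_index - n_sentences)
    end = min(len(sentences), target_index + n_sentences + 1)
    return [token for sentence in sentences[start:end] for token in sentence]

# Parameter values taken from the results tables.
SENTENCE_SIZES = (0, 3, 50)
TOPIC_COUNTS = (3, 10, 100)
grid = list(product(SENTENCE_SIZES, TOPIC_COUNTS))

# Toy document: a list of tokenized sentences.
doc = [["a"], ["b", "c"], ["d"], ["e"]]
local_only = context_window(doc, 1, 0)
one_each_side = context_window(doc, 1, 1)
whole_doc = context_window(doc, 1, 50)
print(len(grid), local_only, one_each_side, whole_doc)
```

Each (sentences, topics) pair in `grid` corresponds to one row of the results tables.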
Results I
One-step classification
Pipeline: instance example → senseLDA (all senses) → word sense

Sentences  Topics  Accuracy
MFS baseline       68.63
0          3       67.54
0          10      65.56
0          100     58.34
3          3       66.30
3          10      64.62
3          100     60.07
50         3       66.04
50         10      63.42
50         100     59.06

• MFS baseline not reached
• Most informative clues are in small contexts
• More topics, lower performance
Results II
Two steps, MFS classifier with 100% performance
Pipeline: instance example → MFS (100% accuracy) → senseLDA (all senses) → word sense

Sentences  Topics  Accuracy
MFS baseline       68.63
0          3       92.48
0          10      92.12
0          100     90.50
3          3       92.45
3          10      92.11
3          100     91.60
50         3       92.41
50         10      92.12
50         100     91.43

• Extremely high figures
• Good performance of the senseLDA classifier (when not MFS)
• Similar behaviour w.r.t. number of sentences and number of topics
Results III
Two steps, MFS classifier with #S=5
Pipeline: instance example → MFS (s5) → senseLDA (all senses) → word sense

Sentences  Topics  Acc. (MFS T100)  Acc. (MFS T1000)
MFS baseline       68.63
0          3       74.53            66.73
0          10      74.00            66.41
0          100     72.61            64.91
3          3       74.30            66.61
3          10      73.87            66.36
3          100     73.39            65.76
50         3       74.26            66.48
50         10      73.90            66.24
50         100     73.53            65.75

• MFS s5 t100 (5 sentences, 100 topics)
• Smaller contexts for non-MFS cases (3, 50 included by 0)
• 3 topics is the best
Results IV
Two steps, MFS classifier with #S=50
Pipeline: instance example → MFS (s50) → senseLDA (all senses) → word sense

Sentences  Topics  Acc. (MFS T100)  Acc. (MFS T1000)
MFS baseline       68.63
0          3       73.34            67.15
0          10      72.92            66.76
0          100     71.43            65.13
3          3       73.21            67.02
3          10      72.88            66.60
3          100     72.40            66.24
50         3       73.21            66.95
50         10      72.83            66.58
50         100     72.15            66.20

• Similar behaviour compared to MFS_s5
• Slightly lower results
Lemma comparison
Most frequent lemmas

Lemma       MFS (68.63)  LDA (74.53)  Variation  Annotations
año         89.15        91.19        2.04       1275
país        72.29        83.55        11.26      695
presidente  70.31        73.94        3.63       690
partido     55.87        64.48        8.61       641
equipo      98.32        98.88        0.56       539
mes         54.29        80.00        25.71      315
hora        61.39        56.11        -5.28      305
caso        61.05        91.58        30.53      286
mundo       47.31        40.14        -7.17      279
semana      85.06        92.34        7.28       263
Conclusions
• Simple approach based on LDA for WSD in Spanish
• Two-step classification approach for WSD improves the results for Spanish (6 points)
• Different nature of both cases
  • MFS: contexts of 5 sentences, 100 topics
  • Non-MFS: context of the local sentence only, 3 topics
• All code and data publicly available on GitHub (group policy)
  http://github.com/rubenIzquierdo/lda_wsd
Ruben Izquierdo
Marten Postma
Piek Vossen
email: ruben.izquierdobevia@vu.nl
http://github.com/rubenIzquierdo/lda_wsd
http://rubenizquierdobevia.com
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốtmô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
mô tả các thí nghiệm về đánh giá tác động dòng khí hóa sau đốt
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf8.Isolation of pure cultures and preservation of cultures.pdf
8.Isolation of pure cultures and preservation of cultures.pdf
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
Medical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptxMedical Orthopedic PowerPoint Templates.pptx
Medical Orthopedic PowerPoint Templates.pptx
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
Thornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdfThornton ESPP slides UK WW Network 4_6_24.pdf
Thornton ESPP slides UK WW Network 4_6_24.pdf
 
Basics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different formsBasics of crystallography, crystal systems, classes and different forms
Basics of crystallography, crystal systems, classes and different forms
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Phenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvementPhenomics assisted breeding in crop improvement
Phenomics assisted breeding in crop improvement
 

Topic modeling and WSD on the Ancora corpus

  • 1. Topic Modeling and WSD on the Ancora Corpus
      Ruben Izquierdo, Marten Postma, Piek Vossen
      Ruben Izquierdo. LDA & WSD. SEPLN2015, Alicante.
  • 2. Outline
      1. Starting Point
      2. Motivation
      3. Our Approach
      4. Evaluation Framework
      5. Experiments and Results
      6. Conclusions
  • 3. Starting point
      - “Understanding languages by machines” project
      - Starts from the results of DutchSemCor (WSD)
      - Analyse the real problems of WSD
      - Understand the WSD task
          - Word
          - Meaning
          - Context
  • 4. Outline 1. Starting Point 2. Motivation 3. Our Approach 4. Evaluation Framework 5. Experiments and Results 6. Conclusions
  • 5. Still WSD?
      - Word Sense Disambiguation is still unsolved
      - Used in high-level applications
      - Recently, some unsupervised approaches and SemEval tasks
      - BabelNet, Babelfy…
      - Several reasons and problems
  • 6. WSD problems I
      - Context is not considered properly
      - Most are/were supervised approaches
      - Moving to unsupervised, graph-based…
      - WSD as a black box
      - The larger the number of features, the better the performance?
      - The best and newest machine learning algorithm
      - WSD is seen as only one problem
      - All words and cases treated in the same way
  • 7. WSD problems II
      - Error analysis of Senseval/SemEval systems [Postma et al., 2014]
      - Propagation errors (monosemous)
      - Most Frequent Sense (MFS) bias
      - Supervised systems are skewed towards the MFS
      - Error analysis on WSD and Senseval/SemEval
      - Performance on MFS cases is good
      - Very poor performance on non-MFS cases
  • 8. WSD problems II
  • 9. WSD problems II
      - Most Frequent Sense (MFS) bias
      - Supervised systems are skewed towards the MFS
      - Error analysis on WSD and Senseval/SemEval
      - Performance on MFS cases is good
      - Very poor performance on non-MFS cases
      - Systems assign the MFS in almost every case
      - Senseval-2 (Sval2): 799 cases where the correct sense is not the MFS
      - In 84% of these cases the systems still assign the MFS
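The MFS-bias figure on this slide can be read as a simple ratio: among the instances whose gold sense is not the MFS, how often a system still outputs the MFS. A minimal sketch, assuming hypothetical (gold sense, predicted sense, MFS of the lemma) triples; the function name and the toy data are illustrative and are not taken from the Senseval-2 analysis:

```python
def mfs_bias_rate(instances):
    """Share of non-MFS gold instances for which the system still predicted the MFS.

    `instances` is an iterable of (gold_sense, predicted_sense, lemma_mfs) triples.
    """
    non_mfs = [(gold, pred, mfs) for gold, pred, mfs in instances if gold != mfs]
    if not non_mfs:
        return 0.0
    wrongly_mfs = sum(1 for _, pred, mfs in non_mfs if pred == mfs)
    return wrongly_mfs / len(non_mfs)


# Toy data (hypothetical): sense labels are arbitrary strings.
toy = [
    ("bank%1", "bank%1", "bank%1"),  # gold is the MFS: ignored by the metric
    ("bank%2", "bank%1", "bank%1"),  # non-MFS gold, system fell back to the MFS
    ("bank%2", "bank%2", "bank%1"),  # non-MFS gold, system got it right
    ("bank%3", "bank%1", "bank%1"),  # non-MFS gold, system fell back to the MFS
]
rate = mfs_bias_rate(toy)
```

On the toy data two of the three non-MFS instances are labelled with the MFS, so the rate is 2/3.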
  • 10. Outline 1. Starting Point 2. Motivation 3. Our Approach 4. Evaluation Framework 5. Experiments and Results 6. Conclusions
  • 11. Main idea
      - WSD considered as two different problems
          - When the MFS applies: more general usages, larger contexts ??
          - Rest of the senses: more concrete usages, shorter contexts ??
      - Specialized classifiers for each case
          - Different features, parameters, contexts…
      - Evaluation for Spanish
          - Sense-annotated corpus Ancora
  • 12. Our approach
      - TRAINING. Use Topic Modeling (LDA) to induce word expert classifiers
          - For the Most Frequent Sense (BINARY)
              - Topics for the MFS case
              - Topics for non-MFS cases
          - For the rest of the senses, non-MFS (MULTICLASS)
              - Topics for every sense
      - CLASSIFICATION. Apply the 2 classifiers in cascade to decide the sense in every case
  • 13. Training
  • 14. Classification
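The cascade described in the approach can be sketched as a two-step decision: a binary classifier first decides whether the MFS applies, and only otherwise a multiclass classifier picks a sense. The two classifiers below are hypothetical stand-ins passed in as plain functions; the names `cascade_predict`, `toy_is_mfs` and `toy_pick_sense` are not from the authors' code.

```python
from typing import Callable, Sequence

Context = Sequence[str]


def cascade_predict(
    context: Context,
    mfs_sense: str,
    is_mfs: Callable[[Context], bool],
    pick_sense: Callable[[Context], str],
) -> str:
    """Step 1 (binary): MFS or not. Step 2 (multiclass): only if step 1 says 'not MFS'."""
    if is_mfs(context):
        return mfs_sense
    return pick_sense(context)


# Hypothetical stand-ins for the two trained classifiers.
def toy_is_mfs(context: Context) -> bool:
    return "dinero" in context


def toy_pick_sense(context: Context) -> str:
    return "banco_2" if "parque" in context else "banco_3"


pred_a = cascade_predict(["dinero", "cuenta"], "banco_1", toy_is_mfs, toy_pick_sense)
pred_b = cascade_predict(["parque", "madera"], "banco_1", toy_is_mfs, toy_pick_sense)
```

Keeping the two steps as separate callables makes it easy to swap in a perfect first step (an oracle), which is how the upper bound in the experiments can be simulated.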
  • 15. Outline 1. Starting Point 2. Motivation 3. Our Approach 4. Evaluation Framework 5. Experiments and Results 6. Conclusions
  • 16. Evaluation framework
      - Ancora corpus
      - News articles, Spanish part, 500K words, sense annotated (nouns)
      - Converted to NAF format
      - 3-fold cross-validation
      - Keeping sense distribution
      - 7119 unique lemmas annotated with nominal senses
  • 17. Evaluation framework
      - Ancora corpus
      - Spanish part, 500K words, sense annotated (nouns)
      - 3-fold cross-validation
      - Keeping sense distribution
      - 7119 unique lemmas annotated
          - 4907 are monosemous (69%)
          - 2212 are polysemous (31%)
          - 589 with at least 3 instances per sense (from the annotated)
  • 18. Evaluation framework
      - Ancora corpus
      - Spanish part, 500K words, sense annotated (nouns)
      - 3-fold cross-validation
      - Keeping sense distribution
      - 7119 unique lemmas annotated
      [Chart: number of lemmas vs. polysemy (2 to 12 senses)]
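A 3-fold split that keeps the sense distribution can be sketched by dealing the instances of each sense round-robin over the folds. This is a plain-Python illustration under that assumption, not the authors' splitting code; `stratified_folds` and the toy instance ids are hypothetical.

```python
from collections import defaultdict


def stratified_folds(instances, k=3):
    """Split (instance_id, sense) pairs into k folds, spreading each sense evenly."""
    by_sense = defaultdict(list)
    for inst_id, sense in instances:
        by_sense[sense].append(inst_id)

    folds = [[] for _ in range(k)]
    for sense, ids in sorted(by_sense.items()):
        # Deal the instances of each sense round-robin over the folds.
        for i, inst_id in enumerate(ids):
            folds[i % k].append((inst_id, sense))
    return folds


# Hypothetical lemma with 6 instances of sense "s1" and 3 of sense "s2".
data = [(f"i{n}", "s1") for n in range(6)] + [(f"j{n}", "s2") for n in range(3)]
folds = stratified_folds(data, k=3)
```

With at least 3 instances per sense, every fold receives at least one instance of every sense, which is presumably why the evaluation is restricted to lemmas meeting that threshold.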
  • 19. Baseline Results
      For the 589 selected lemmas:
      Baseline       Accuracy
      Random         40.10
      MFS overall    67.68
      MFS folded     68.63
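The two kinds of baseline can be sketched as follows. Reading "MFS folded" as an MFS estimated on the training folds and scored on the held-out fold, and "Random" as a uniform guess among the senses of a lemma, are assumptions; the slide does not define them. Function names and data are hypothetical.

```python
from collections import Counter


def mfs_accuracy(train_senses, test_senses):
    """Accuracy of always predicting the most frequent sense seen in training."""
    mfs = Counter(train_senses).most_common(1)[0][0]
    return sum(1 for s in test_senses if s == mfs) / len(test_senses)


def expected_random_accuracy(n_senses):
    """Expected accuracy of a uniform random guess among n_senses candidates."""
    return 1.0 / n_senses


# Hypothetical lemma with two senses.
train = ["s1"] * 6 + ["s2"] * 3
test = ["s1", "s1", "s2"]

mfs_acc = mfs_accuracy(train, test)
rand_acc = expected_random_accuracy(2)
```

Corpus-level figures such as those in the table would then be obtained by aggregating these per-lemma values over all test instances.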
  • 20. Outline 1. Starting Point 2. Motivation 3. Our Approach 4. Evaluation Framework 5. Experiments and Results 6. Conclusions
  • 21. Experimentation
      - Configuration of our cascade classifiers
          - Only one step with the senseLDA classifier
          - 2 steps, mfsLDA with perfect performance + senseLDA
          - 2 steps, mfsLDA and senseLDA both induced automatically
      - LDA parameters (Python gensim library)
          - Context size (number of sentences)
          - Number of topics for LDA
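The context-size parameter can be pictured as a window of sentences around the target sentence. The sketch below assumes a symmetric window in which 0 means the target sentence only; whether the original system counts sentences on both sides is not stated in the slides. It also enumerates the sentence/topic settings that appear in the results tables.

```python
def context_window(sentences, target_index, n_sentences):
    """Tokens of the target sentence plus n_sentences on each side (0 = target sentence only)."""
    start = max(0, target_index - n_sentences)
    end = min(len(sentences), target_index + n_sentences + 1)
    return [tok for sent in sentences[start:end] for tok in sent]


# Hypothetical document: a list of tokenized sentences.
doc = [["a"], ["b", "c"], ["d"], ["e"]]
local = context_window(doc, 1, 0)   # target sentence only
wider = context_window(doc, 1, 1)   # one sentence on each side

# Settings reported in the results tables: sentences x topics.
grid = [(s, t) for s in (0, 3, 50) for t in (3, 10, 100)]
```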
  • 22. Results I: one-step classification
      Diagram: Instance (example) → Sense LDA (all senses) → Word Sense
      Sentences      Topics   Accuracy
      MFS baseline            68.63
      0              3        67.54
      0              10       65.56
      0              100      58.34
      3              3        66.30
      3              10       64.62
      3              100      60.07
      50             3        66.04
      50             10       63.42
      50             100      59.06
      - MFS not reached
      - Most informative clues in small contexts
      - More topics, lower performance
  • 23. Results II: two steps, MFS classifier with 100% performance
      Diagram: Instance (example) → MFS (100% accuracy) → Sense LDA (all senses) → Word Sense
      Sentences      Topics   Accuracy
      MFS baseline            68.63
      0              3        92.48
      0              10       92.12
      0              100      90.50
      3              3        92.45
      3              10       92.11
      3              100      91.60
      50             3        92.41
      50             10       92.12
      50             100      91.43
      - Extremely high figures
      - Good performance of the senseLDA classifier (when no MFS)
      - Similar behaviour w.r.t. number of sentences and number of topics
  • 24. Results III: two steps, MFS classifier #S=5
      Diagram: Instance (example) → MFS (s5) → Sense LDA (all senses) → Word Sense
      Sents          Topics   Acc. MFS T100   Acc. MFS T1000
      MFS baseline            68.63
      0              3        74.53           66.73
      0              10       74.00           66.41
      0              100      72.61           64.91
      3              3        74.30           66.61
      3              10       73.87           66.36
      3              100      73.39           65.76
      50             3        74.26           66.48
      50             10       73.90           66.24
      50             100      73.53           65.75
      - MFS s5 t100
      - Smaller contexts for non-MFS cases (3, 50 included by 0)
      - 3 topics is the best
  • 25. Results IV: two steps, MFS classifier #S=50
      Diagram: Instance (example) → MFS (s50) → Sense LDA (all senses) → Word Sense
      Sents          Topics   Acc. MFS T100   Acc. MFS T1000
      MFS baseline            68.63
      0              3        73.34           67.15
      0              10       72.92           66.76
      0              100      71.43           65.13
      3              3        73.21           67.02
      3              10       72.88           66.60
      3              100      72.40           66.24
      50             3        73.21           66.95
      50             10       72.83           66.58
      50             100      72.15           66.20
      - Similar behaviour compared to MFS_s5
      - Slightly lower results
  • 26. Lemma comparison (most frequent lemmas)
      Lemma        MFS (68.63)   LDA (74.53)   Variation   Annotations
      año          89.15         91.19          2.04       1275
      país         72.29         83.55         11.26        695
      presidente   70.31         73.94          3.63        690
      partido      55.87         64.48          8.61        641
      equipo       98.32         98.88          0.56        539
      mes          54.29         80            25.71        315
      hora         61.39         56.11         -5.28        305
      caso         61.05         91.58         30.53        286
      mundo        47.31         40.14         -7.17        279
      semana       85.06         92.34          7.28        263
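The Variation column is the LDA accuracy minus the MFS accuracy for each lemma. A short sketch that recomputes it from the figures in the table and lists the lemmas for which the LDA cascade scores below the MFS baseline:

```python
rows = [
    # (lemma, MFS accuracy, LDA accuracy, annotations), copied from the table
    ("año", 89.15, 91.19, 1275),
    ("país", 72.29, 83.55, 695),
    ("presidente", 70.31, 73.94, 690),
    ("partido", 55.87, 64.48, 641),
    ("equipo", 98.32, 98.88, 539),
    ("mes", 54.29, 80.0, 315),
    ("hora", 61.39, 56.11, 305),
    ("caso", 61.05, 91.58, 286),
    ("mundo", 47.31, 40.14, 279),
    ("semana", 85.06, 92.34, 263),
]

# Variation = LDA accuracy - MFS accuracy, rounded to two decimals as in the table.
variation = {lemma: round(lda - mfs, 2) for lemma, mfs, lda, _ in rows}

# Lemmas where the LDA result is lower than the MFS baseline.
worse = [lemma for lemma, delta in variation.items() if delta < 0]
```

Only "hora" and "mundo" end up with a negative variation; the largest gains are for "caso" and "mes".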
  • 27. Outline 1. Starting Point 2. Motivation 3. Our Approach 4. Evaluation Framework 5. Experiments and Results 6. Conclusions
  • 28. Conclusions
      - Simple approach based on LDA for WSD in Spanish
      - Two-step classification approach for WSD improves the results for Spanish (6 points)
      - Different nature of both cases
          - MFS: contexts of 5 sentences, 100 topics
          - Non-MFS: context of the local sentence, 3 topics
      - All code and data publicly available on GitHub (group policy): http://github.com/rubenIzquierdo/lda_wsd
  • 31. Ruben Izquierdo, Marten Postma, Piek Vossen
      email: ruben.izquierdobevia@vu.nl
      http://github.com/rubenIzquierdo/lda_wsd
      http://rubenizquierdobevia.com

Editor's Notes

  1. Lemmas by number of senses: 4019 with 1 sense, 1318 with 2 senses, 449 with 3, 227 with 4, 110 with 5, 41 with 6, 38 with 7, 11 with 8, 10 with 9, 5 with 10, 2 with 11, 1 with 12.
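The counts in this note can be tallied to check them against the evaluation-framework slides. A small sketch using only the numbers given in the deck; it confirms that the lemmas with two or more senses add up to the 2212 polysemous lemmas (31% of 7119) reported on slide 17. The single-sense count in the note (4019) is a different figure from the 4907 monosemous lemmas on that slide, and the sketch does not try to reconcile the two.

```python
# Lemmas per number of senses, copied from the note.
lemmas_by_polysemy = {1: 4019, 2: 1318, 3: 449, 4: 227, 5: 110, 6: 41,
                      7: 38, 8: 11, 9: 10, 10: 5, 11: 2, 12: 1}

TOTAL_ANNOTATED_LEMMAS = 7119  # from the evaluation-framework slides

# Lemmas with two or more senses.
polysemous = sum(n for senses, n in lemmas_by_polysemy.items() if senses >= 2)

# Share of polysemous lemmas, as a whole percentage.
polysemous_share = round(100 * polysemous / TOTAL_ANNOTATED_LEMMAS)
```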