An Enhanced Lesk Word Sense Disambiguation 
Algorithm through a Distributional Semantic Model 
Pierpaolo Basile, Annalina Caputo and Giovanni Semeraro 
annalina.caputo@uniba.it 
Department of Computer Science - SWAP Research Group 
University of Bari Aldo Moro (ITALY) 
Coling 2014, Dublin, 27th-29th August 2014 
A. Caputo (annalina.caputo@uniba.it) Lesk-DSM Coling 2014 - 28 Aug. 2014 1 / 21
Motivations Problem 
One word... many meanings 
BANK 
1 Sloping land (especially the slope beside a body of water) 
2 A financial institution that accepts deposits and channels the money into lending activities
3 A long ridge or pile 
4 ... 
Motivations Lesk WSD 
Simple Lesk approach 
Insight 
Select the meaning whose gloss maximizes the context overlap 
Example 
The bank keeps my money 
1 Sloping land (especially the slope beside a body of water) ⇒ overlap=0
2 A financial institution that accepts deposits and channels the money into lending activities ⇒ overlap=1
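Here is a minimal Python sketch of the simple Lesk procedure above. The stop-word list and the lower-case alphabetic tokenizer are illustrative assumptions and are not taken from the slides. Overlap is the number of distinct content words that the context and a gloss share. The target word itself is left out of the context.

```python
import re

# Illustrative stop-word list (an assumption, not taken from the slides).
STOP_WORDS = {"a", "an", "and", "at", "for", "he", "in", "into", "my",
              "of", "on", "or", "that", "the", "to"}

def content_words(text, target=None):
    """Lower-case alphabetic tokens minus stop words and the target word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS and t != target}

def simplified_lesk(sentence, target, glosses):
    """Pick the sense whose gloss shares the most content words with the context."""
    context = content_words(sentence, target)
    overlaps = {sense: len(context & content_words(gloss))
                for sense, gloss in glosses.items()}
    best = max(overlaps, key=overlaps.get)  # ties: first sense in dict order wins
    return best, overlaps

BANK_GLOSSES = {
    1: "Sloping land (especially the slope beside a body of water)",
    2: "A financial institution that accepts deposits and channels "
       "the money into lending activities",
}

best, overlaps = simplified_lesk("The bank keeps my money", "bank", BANK_GLOSSES)
print(overlaps)  # {1: 0, 2: 1}
print(best)      # 2
```

When two senses tie, `max` keeps whichever comes first in dictionary order. A real system would need an explicit tie-breaking rule.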
Motivations Lesk WSD 
Simple Lesk approach 
Issues 
1 Sense definition is short ⇒ Reduced chances of matching
2 Overlap based on string matching ⇒ Semantically related words are treated as unrelated
3 No knowledge about sense usage
Lesk mismatch 
Sentence to disambiguate 
he cashed a check at the bank 
Correct sense definition
A financial institution that accepts deposits
and channels the money into lending activities
overlap=0 
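The mismatch can be reproduced with exact string matching. The stop-word list below is again an illustrative assumption. The context words are clearly related to the gloss (a check is cashed, and deposits hold money), but the two share no surface form, so the overlap is zero.

```python
import re

# Illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "and", "at", "he", "into", "that", "the"}

def content_words(text, target=None):
    """Lower-case alphabetic tokens minus stop words and the target word."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS and t != target}

context = content_words("he cashed a check at the bank", target="bank")
gloss = content_words("A financial institution that accepts deposits "
                      "and channels the money into lending activities")

print(sorted(context))  # ['cashed', 'check']
print(context & gloss)  # set(), i.e. overlap = 0
```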
Solution Distributional Lesk 
Idea 
Solutions 
1 Sense definition is short ⇒ Gloss expansion through related meanings
2 Overlap is based on string matching ⇒ Similarity computed in a WordSpace
3 No knowledge about sense usage ⇒ Exploiting a sense-annotated corpus
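Solution 3 can be sketched in two steps. First, estimate how often each sense of a word occurs in a sense-annotated corpus. Second, mix that prior into the sense score. This is a minimal sketch, and several pieces are hypothetical: the (word, sense) pairs, the sense labels, the add-one smoothing and the linear weight alpha. The slides do not say how the paper combines the two signals, so the interpolation below is only an illustrative choice.

```python
from collections import Counter

def sense_priors(annotations, word, senses, smoothing=1.0):
    """Estimate P(sense | word) from (word, sense) pairs with add-k smoothing."""
    counts = Counter(sense for w, sense in annotations if w == word)
    total = sum(counts[s] for s in senses) + smoothing * len(senses)
    return {s: (counts[s] + smoothing) / total for s in senses}

def combined_score(similarity, prior, alpha=0.5):
    """Hypothetical linear interpolation of gloss-context similarity and sense prior."""
    return alpha * similarity + (1 - alpha) * prior

# Toy (word, sense) pairs standing in for a sense-annotated corpus.
annotations = [("bank", "bank#2"), ("bank", "bank#2"),
               ("bank", "bank#1"), ("river", "river#1")]
priors = sense_priors(annotations, "bank", ["bank#1", "bank#2"])
print(priors)  # {'bank#1': 0.4, 'bank#2': 0.6}

# Invented similarity scores that, on their own, slightly favour bank#1.
similarity = {"bank#1": 0.32, "bank#2": 0.30}
scores = {s: combined_score(similarity[s], priors[s]) for s in priors}
best = max(scores, key=scores.get)
print(best)  # bank#2: the prior tips the decision
```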
Solution Distributional Lesk 
Idea 
Solutions 
1 Sense definition is short ⇒ Gloss expansion through related meanings
Gloss Expansion 
Sentence to disambiguate 
he cashed a check at the bank 
A financial institution that accepts deposits and channels the money into lending activities
+
A financial institution that accepts demand deposits and makes loans and provides other services for the public... One of 12 regional banks that monitor and act as depositories for banks in their region... A corporation gaining financial control over another corporation or financial institution through a payment in cash or an exchange of stock...
overlap=1 
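Gloss expansion can be sketched by joining a sense's gloss with the glosses of its related senses before computing the overlap. Several parts of this sketch are assumptions. The sense identifiers (bank#2 and rel#1 to rel#3) and the RELATED table are hypothetical. The glosses are copied from the slide and are truncated where the slide truncates them. LEMMAS is a tiny stand-in for a real lemmatizer, so that "cashed" can match "cash".

```python
import re

STOP_WORDS = {"a", "an", "and", "as", "at", "for", "he", "in", "into", "my",
              "of", "on", "or", "that", "the", "their", "to"}

# Hypothetical stand-in for a real lemmatizer: inflected form -> lemma.
LEMMAS = {"cashed": "cash", "banks": "bank", "deposits": "deposit", "loans": "loan"}

def content_lemmas(text, target=None):
    """Lemmatized content words of text, minus stop words and the target."""
    lemmas = {LEMMAS.get(t, t) for t in re.findall(r"[a-z]+", text.lower())}
    return {lem for lem in lemmas if lem not in STOP_WORDS and lem != target}

# Hypothetical sense inventory; the gloss text is copied from the slide.
GLOSSES = {
    "bank#2": "A financial institution that accepts deposits and channels "
              "the money into lending activities",
    "rel#1": "A financial institution that accepts demand deposits and makes "
             "loans and provides other services for the public",
    "rel#2": "One of 12 regional banks that monitor and act as depositories "
             "for banks in their region",
    "rel#3": "A corporation gaining financial control over another corporation "
             "or financial institution through a payment in cash or an exchange of stock",
}
RELATED = {"bank#2": ["rel#1", "rel#2", "rel#3"]}  # hypothetical relation table

def extended_gloss(sense):
    """Concatenate a sense's gloss with the glosses of its related senses."""
    return " ".join([GLOSSES[sense]] + [GLOSSES[r] for r in RELATED.get(sense, [])])

def overlap(sentence, target, gloss_text):
    """Number of distinct lemmas shared by the context and the gloss text."""
    return len(content_lemmas(sentence, target) & content_lemmas(gloss_text))

sentence = "he cashed a check at the bank"
print(overlap(sentence, "bank", GLOSSES["bank#2"]))         # 0
print(overlap(sentence, "bank", extended_gloss("bank#2")))  # 1
```

The single match comes from "cash" in the expanded text. This shows how expansion raises the chance of overlap without changing the matching rule itself.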
Solution Distributional Lesk 
Idea 
Solutions 
2 Overlap is based on string matching ⇒ Similarity computed in a WordSpace
Gloss Expansion 
Sentence to disambiguate 
that bank holds the mortgage on my home 
A financial institution that accepts deposits and channels the money into lending activities
+
A financial institution that accepts demand deposits and makes loans and provides other services for the public... One of 12 regional banks that monitor and act as depositories for banks in their region... A corporation gaining financial control over another corporation or financial institution through a payment in cash or an exchange of stock...
overlap=0 
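Replacing string overlap with vector similarity can be sketched under these assumptions:

- The three-dimensional WORD_VECTORS are invented for illustration. A real WordSpace is built from a large corpus and has far more dimensions.
- Context and gloss vectors are plain sums of word vectors.
- Similarity is cosine similarity, a common choice.

The words "holds", "mortgage" and "home" share no string with the extended gloss. Their summed vector still lies closer to the financial sense than to the sloping-land sense.

```python
import math
import re

STOP_WORDS = {"a", "an", "and", "at", "for", "in", "into", "my", "of", "on",
              "or", "that", "the", "their", "to"}

# Invented 3-d vectors (roughly: finance, terrain, housing). Purely illustrative.
WORD_VECTORS = {
    "mortgage": [0.9, 0.0, 0.6], "home": [0.2, 0.1, 0.9], "holds": [0.3, 0.2, 0.2],
    "financial": [1.0, 0.0, 0.0], "institution": [0.6, 0.1, 0.1],
    "deposits": [0.8, 0.1, 0.1], "money": [0.9, 0.0, 0.1], "loans": [0.9, 0.0, 0.3],
    "land": [0.1, 0.9, 0.2], "slope": [0.0, 1.0, 0.0], "water": [0.0, 0.9, 0.0],
}
DIM = 3

def tokens(text, target=None):
    """Lower-case alphabetic tokens minus stop words and the target word."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS and t != target]

def text_vector(words):
    """Sum the vectors of known words; words missing from the space are skipped."""
    vec = [0.0] * DIM
    for w in words:
        for i, x in enumerate(WORD_VECTORS.get(w, [0.0] * DIM)):
            vec[i] += x
    return vec

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

SENSES = {
    "bank#1": "Sloping land (especially the slope beside a body of water)",
    "bank#2": "A financial institution that accepts deposits and channels the money "
              "into lending activities A financial institution that accepts demand "
              "deposits and makes loans and provides other services for the public "
              "One of 12 regional banks that monitor and act as depositories for banks "
              "in their region A corporation gaining financial control over another "
              "corporation or financial institution through a payment in cash or an "
              "exchange of stock",
}

context_words = tokens("that bank holds the mortgage on my home", target="bank")
context = text_vector(context_words)
scores = {s: cosine(context, text_vector(tokens(g))) for s, g in SENSES.items()}
best = max(scores, key=scores.get)

print(set(context_words) & set(tokens(SENSES["bank#2"])))  # set(): no shared strings
print(best)  # bank#2
```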
A. Caputo (annalina.caputo@uniba.it) Lesk-DSM Coling 2014 - 28 Aug. 2014 5 / 21
Solution Distributional Lesk 
Idea 
Solutions 
2 Overlap is based on string matching ) Similarity computed in a WordSpace 
Gloss Expansion 
Sentence to disambiguate 
that bank holds the mortgage on my home 
A
nancial institution that accepts deposits and 
channels the money into lending activities + A
nancial institution that accepts demand 
deposits and makes loans and provides other 
services for the public... One of 12 regional 
banks that monitor and act as depositories for 
banks in their region... A corporation gaining
nancial control over another corporation or

More Related Content

Viewers also liked

Graph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationGraph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationElena-Oana Tabaranu
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Innovation Quotient Pvt Ltd
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationvini89
 
Word sense disambiguation a survey
Word sense disambiguation a surveyWord sense disambiguation a survey
Word sense disambiguation a surveyunyil96
 
Biomedical Word Sense Disambiguation presentation [Autosaved]
Biomedical Word Sense Disambiguation presentation [Autosaved]Biomedical Word Sense Disambiguation presentation [Autosaved]
Biomedical Word Sense Disambiguation presentation [Autosaved]akm sabbir
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationvini89
 
Error analysis of Word Sense Disambiguation
Error analysis of Word Sense DisambiguationError analysis of Word Sense Disambiguation
Error analysis of Word Sense DisambiguationRubén Izquierdo Beviá
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksLeonardo Di Donato
 
Word Sense Disambiguation and Induction
Word Sense Disambiguation and InductionWord Sense Disambiguation and Induction
Word Sense Disambiguation and InductionLeon Derczynski
 
Ontology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureOntology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureeXascale Infolab
 
Babelfy: Entity Linking meets Word Sense Disambiguation.
Babelfy: Entity Linking meets Word Sense Disambiguation.Babelfy: Entity Linking meets Word Sense Disambiguation.
Babelfy: Entity Linking meets Word Sense Disambiguation.Grupo HULAT
 
Sifting Social Data: Word Sense Disambiguation Using Machine Learning
Sifting Social Data: Word Sense Disambiguation Using Machine LearningSifting Social Data: Word Sense Disambiguation Using Machine Learning
Sifting Social Data: Word Sense Disambiguation Using Machine LearningStuart Shulman
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
 

Viewers also liked (14)

Graph-based Word Sense Disambiguation
Graph-based Word Sense DisambiguationGraph-based Word Sense Disambiguation
Graph-based Word Sense Disambiguation
 
Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...Usage of word sense disambiguation in concept identification in ontology cons...
Usage of word sense disambiguation in concept identification in ontology cons...
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguation
 
Word sense disambiguation a survey
Word sense disambiguation a surveyWord sense disambiguation a survey
Word sense disambiguation a survey
 
Word-sense disambiguation
Word-sense disambiguationWord-sense disambiguation
Word-sense disambiguation
 
Biomedical Word Sense Disambiguation presentation [Autosaved]
Biomedical Word Sense Disambiguation presentation [Autosaved]Biomedical Word Sense Disambiguation presentation [Autosaved]
Biomedical Word Sense Disambiguation presentation [Autosaved]
 
Similarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguationSimilarity based methods for word sense disambiguation
Similarity based methods for word sense disambiguation
 
Error analysis of Word Sense Disambiguation
Error analysis of Word Sense DisambiguationError analysis of Word Sense Disambiguation
Error analysis of Word Sense Disambiguation
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
Word Sense Disambiguation and Induction
Word Sense Disambiguation and InductionWord Sense Disambiguation and Induction
Word Sense Disambiguation and Induction
 
Ontology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureOntology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific Literature
 
Babelfy: Entity Linking meets Word Sense Disambiguation.
Babelfy: Entity Linking meets Word Sense Disambiguation.Babelfy: Entity Linking meets Word Sense Disambiguation.
Babelfy: Entity Linking meets Word Sense Disambiguation.
 
Sifting Social Data: Word Sense Disambiguation Using Machine Learning
Sifting Social Data: Word Sense Disambiguation Using Machine LearningSifting Social Data: Word Sense Disambiguation Using Machine Learning
Sifting Social Data: Word Sense Disambiguation Using Machine Learning
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment Analysis
 

Similar to COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model

Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioDeep Learning Italia
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Marina Santini
 
M. De Cubellis, F. De Fausti, Word Embeddings: modellare il significato delle...
M. De Cubellis, F. De Fausti, Word Embeddings: modellare il significato delle...M. De Cubellis, F. De Fausti, Word Embeddings: modellare il significato delle...
M. De Cubellis, F. De Fausti, Word Embeddings: modellare il significato delle...Istituto nazionale di statistica
 
Designing, Visualizing and Understanding Deep Neural Networks
Designing, Visualizing and Understanding Deep Neural NetworksDesigning, Visualizing and Understanding Deep Neural Networks
Designing, Visualizing and Understanding Deep Neural Networksconnectbeubax
 
graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)Sihan Chen
 

Similar to COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model (7)

Embedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglioEmbedding for fun fumarola Meetup Milano DLI luglio
Embedding for fun fumarola Meetup Milano DLI luglio
 
Measuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and ConceptsMeasuring Similarity Between Contexts and Concepts
Measuring Similarity Between Contexts and Concepts
 
Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)Lecture: Vector Semantics (aka Distributional Semantics)
Lecture: Vector Semantics (aka Distributional Semantics)
 
Icon 2007 Pedersen
Icon 2007 PedersenIcon 2007 Pedersen
Icon 2007 Pedersen
 
M. De Cubellis, F. De Fausti, Word Embeddings: modellare il significato delle...
M. De Cubellis, F. De Fausti, Word Embeddings: modellare il significato delle...M. De Cubellis, F. De Fausti, Word Embeddings: modellare il significato delle...
M. De Cubellis, F. De Fausti, Word Embeddings: modellare il significato delle...
 
Designing, Visualizing and Understanding Deep Neural Networks
Designing, Visualizing and Understanding Deep Neural NetworksDesigning, Visualizing and Understanding Deep Neural Networks
Designing, Visualizing and Understanding Deep Neural Networks
 
graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)
 

More from Pierpaolo Basile

Diachronic analysis of entities by exploiting wikipedia page revisions
Diachronic analysis of entities by exploiting wikipedia page revisionsDiachronic analysis of entities by exploiting wikipedia page revisions
Diachronic analysis of entities by exploiting wikipedia page revisionsPierpaolo Basile
 
Come l'industria tecnologica ha cancellato le donne dalla storia
Come l'industria tecnologica ha cancellato le donne dalla storiaCome l'industria tecnologica ha cancellato le donne dalla storia
Come l'industria tecnologica ha cancellato le donne dalla storiaPierpaolo Basile
 
EVALITA 2018 NLP4FUN - Solving language games
EVALITA 2018 NLP4FUN - Solving language gamesEVALITA 2018 NLP4FUN - Solving language games
EVALITA 2018 NLP4FUN - Solving language gamesPierpaolo Basile
 
Buon appetito! Analyzing Happiness in Italian Tweets
Buon appetito! Analyzing Happiness in Italian TweetsBuon appetito! Analyzing Happiness in Italian Tweets
Buon appetito! Analyzing Happiness in Italian TweetsPierpaolo Basile
 
Detecting semantic shift in large corpora by exploiting temporal random indexing
Detecting semantic shift in large corpora by exploiting temporal random indexingDetecting semantic shift in large corpora by exploiting temporal random indexing
Detecting semantic shift in large corpora by exploiting temporal random indexingPierpaolo Basile
 
Bi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
Bi-directional LSTM-CNNs-CRF for Italian Sequence LabelingBi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
Bi-directional LSTM-CNNs-CRF for Italian Sequence LabelingPierpaolo Basile
 
INSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
INSERT COIN - Storia dei videogame: da Spacewar a Street FighterINSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
INSERT COIN - Storia dei videogame: da Spacewar a Street FighterPierpaolo Basile
 
QuestionCube DigithON 2017
QuestionCube DigithON 2017QuestionCube DigithON 2017
QuestionCube DigithON 2017Pierpaolo Basile
 
Diachronic Analysis of the Italian Language exploiting Google Ngram
Diachronic Analysis of the Italian Language exploiting Google NgramDiachronic Analysis of the Italian Language exploiting Google Ngram
Diachronic Analysis of the Italian Language exploiting Google NgramPierpaolo Basile
 
La macchina più geek dell’universo The Turing Machine
La macchina più geek dell’universo The Turing MachineLa macchina più geek dell’universo The Turing Machine
La macchina più geek dell’universo The Turing MachinePierpaolo Basile
 
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...Pierpaolo Basile
 
Building WordSpaces via Random Indexing from simple to complex spaces
Building WordSpaces via Random Indexing from simple to complex spacesBuilding WordSpaces via Random Indexing from simple to complex spaces
Building WordSpaces via Random Indexing from simple to complex spacesPierpaolo Basile
 
Analysing Word Meaning over Time by Exploiting Temporal Random Indexing
Analysing Word Meaning over Time by Exploiting Temporal Random IndexingAnalysing Word Meaning over Time by Exploiting Temporal Random Indexing
Analysing Word Meaning over Time by Exploiting Temporal Random IndexingPierpaolo Basile
 
A Study on Compositional Semantics of Words in Distributional Spaces
A Study on Compositional Semantics of Words in Distributional SpacesA Study on Compositional Semantics of Words in Distributional Spaces
A Study on Compositional Semantics of Words in Distributional SpacesPierpaolo Basile
 
Exploiting Distributional Semantic Models in Question Answering
Exploiting Distributional Semantic Models in Question AnsweringExploiting Distributional Semantic Models in Question Answering
Exploiting Distributional Semantic Models in Question AnsweringPierpaolo Basile
 
Sst evalita2011 basile_pierpaolo
Sst evalita2011 basile_pierpaoloSst evalita2011 basile_pierpaolo
Sst evalita2011 basile_pierpaoloPierpaolo Basile
 
AI*IA 2012 PAI Workshop OTTHO
AI*IA 2012 PAI Workshop OTTHOAI*IA 2012 PAI Workshop OTTHO
AI*IA 2012 PAI Workshop OTTHOPierpaolo Basile
 
Word Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information AccessWord Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information AccessPierpaolo Basile
 

More from Pierpaolo Basile (20)

Diachronic analysis of entities by exploiting wikipedia page revisions
Diachronic analysis of entities by exploiting wikipedia page revisionsDiachronic analysis of entities by exploiting wikipedia page revisions
Diachronic analysis of entities by exploiting wikipedia page revisions
 
Come l'industria tecnologica ha cancellato le donne dalla storia
Come l'industria tecnologica ha cancellato le donne dalla storiaCome l'industria tecnologica ha cancellato le donne dalla storia
Come l'industria tecnologica ha cancellato le donne dalla storia
 
EVALITA 2018 NLP4FUN - Solving language games
EVALITA 2018 NLP4FUN - Solving language gamesEVALITA 2018 NLP4FUN - Solving language games
EVALITA 2018 NLP4FUN - Solving language games
 
Buon appetito! Analyzing Happiness in Italian Tweets
Buon appetito! Analyzing Happiness in Italian TweetsBuon appetito! Analyzing Happiness in Italian Tweets
Buon appetito! Analyzing Happiness in Italian Tweets
 
Detecting semantic shift in large corpora by exploiting temporal random indexing
Detecting semantic shift in large corpora by exploiting temporal random indexingDetecting semantic shift in large corpora by exploiting temporal random indexing
Detecting semantic shift in large corpora by exploiting temporal random indexing
 
Bi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
Bi-directional LSTM-CNNs-CRF for Italian Sequence LabelingBi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
Bi-directional LSTM-CNNs-CRF for Italian Sequence Labeling
 
INSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
INSERT COIN - Storia dei videogame: da Spacewar a Street FighterINSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
INSERT COIN - Storia dei videogame: da Spacewar a Street Fighter
 
QuestionCube DigithON 2017
QuestionCube DigithON 2017QuestionCube DigithON 2017
QuestionCube DigithON 2017
 
Diachronic Analysis of the Italian Language exploiting Google Ngram
Diachronic Analysis of the Italian Language exploiting Google NgramDiachronic Analysis of the Italian Language exploiting Google Ngram
Diachronic Analysis of the Italian Language exploiting Google Ngram
 
Diachronic Analysis
Diachronic AnalysisDiachronic Analysis
Diachronic Analysis
 
(Open) data hacking
(Open) data hacking(Open) data hacking
(Open) data hacking
 
La macchina più geek dell’universo The Turing Machine
La macchina più geek dell’universo The Turing MachineLa macchina più geek dell’universo The Turing Machine
La macchina più geek dell’universo The Turing Machine
 
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
UNIBA: Exploiting a Distributional Semantic Model for Disambiguating and Link...
 
Building WordSpaces via Random Indexing from simple to complex spaces
Building WordSpaces via Random Indexing from simple to complex spacesBuilding WordSpaces via Random Indexing from simple to complex spaces
Building WordSpaces via Random Indexing from simple to complex spaces
 
Analysing Word Meaning over Time by Exploiting Temporal Random Indexing
Analysing Word Meaning over Time by Exploiting Temporal Random IndexingAnalysing Word Meaning over Time by Exploiting Temporal Random Indexing
Analysing Word Meaning over Time by Exploiting Temporal Random Indexing
 
A Study on Compositional Semantics of Words in Distributional Spaces
A Study on Compositional Semantics of Words in Distributional SpacesA Study on Compositional Semantics of Words in Distributional Spaces
A Study on Compositional Semantics of Words in Distributional Spaces
 
Exploiting Distributional Semantic Models in Question Answering
Exploiting Distributional Semantic Models in Question AnsweringExploiting Distributional Semantic Models in Question Answering
Exploiting Distributional Semantic Models in Question Answering
 
Sst evalita2011 basile_pierpaolo
Sst evalita2011 basile_pierpaoloSst evalita2011 basile_pierpaolo
Sst evalita2011 basile_pierpaolo
 
AI*IA 2012 PAI Workshop OTTHO
AI*IA 2012 PAI Workshop OTTHOAI*IA 2012 PAI Workshop OTTHO
AI*IA 2012 PAI Workshop OTTHO
 
Word Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information AccessWord Sense Disambiguation and Intelligent Information Access
Word Sense Disambiguation and Intelligent Information Access
 

Recently uploaded

Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionPriyansha Singh
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptxkhadijarafiq2012
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 

Recently uploaded (20)

Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Caco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorptionCaco-2 cell permeability assay for drug absorption
Caco-2 cell permeability assay for drug absorption
 
Recombinant DNA technology (Immunological screening)

COLING 2014 - An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model

  • An Enhanced Lesk Word Sense Disambiguation Algorithm through a Distributional Semantic Model. Pierpaolo Basile, Annalina Caputo and Giovanni Semeraro (annalina.caputo@uniba.it), Department of Computer Science, SWAP Research Group, University of Bari Aldo Moro (ITALY). Coling 2014, Dublin, 27th-29th August 2014. A. Caputo (annalina.caputo@uniba.it) Lesk-DSM Coling 2014 - 28 Aug. 2014 1 / 21
  • Motivations / Problem: one word... many meanings. BANK: (1) Sloping land (especially the slope beside a body of water); (2) A financial institution that accepts deposits and channels the money into lending activities; (3) A long ridge or pile; (4) ...
  • Motivations / Lesk WSD: simple Lesk approach. Insight: select the meaning whose gloss maximizes the overlap with the context. Example: "The bank keeps my money". (1) Sloping land (especially the slope beside a body of water) → overlap = 0; (2) A financial institution that accepts deposits and channels the money into lending activities → overlap = 1 (the shared word is "money").
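A minimal sketch of the simplified Lesk overlap, assuming a regex tokenizer, lower-casing and a small hand-picked stopword list. `STOPWORDS`, `tokens`, `overlap` and `simplified_lesk` are hypothetical names; the actual system used a Lucene analyzer with Snowball stemming, which is not reproduced here. The sketch reproduces the counts on the slide:

```python
import re

# Hypothetical stopword list; not the authors' preprocessing.
STOPWORDS = {"a", "an", "the", "my", "of", "that", "and", "into", "at", "he", "on"}


def tokens(text):
    """Lower-cased alphabetic tokens with stopwords removed, as a set."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def overlap(context, gloss):
    """Simplified Lesk overlap: number of distinct words shared by context and gloss."""
    return len(tokens(context) & tokens(gloss))


def simplified_lesk(context, glosses):
    """Return (index of the best gloss, list of overlaps); ties go to the first gloss."""
    scores = [overlap(context, g) for g in glosses]
    best = max(range(len(glosses)), key=lambda i: scores[i])
    return best, scores


BANK_GLOSSES = [
    "Sloping land (especially the slope beside a body of water)",
    "A financial institution that accepts deposits and channels the money "
    "into lending activities",
]

best, scores = simplified_lesk("The bank keeps my money", BANK_GLOSSES)
print(best, scores)  # → 1 [0, 1]
```

Indices are 0-based, so `1` is the financial sense. The same function also returns 0 for "he cashed a check at the bank" against the financial gloss, which is the mismatch described in the issues slide: the sense is right, but no string is shared.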
  • Motivations / Lesk WSD: issues of the simple Lesk approach. (1) Sense definitions are short → reduced chances of matching; (2) the overlap is based on string matching → semantically related words are treated as unrelated; (3) there is no knowledge about sense usage. Lesk mismatch: the sentence to disambiguate is "he cashed a check at the bank" and the right sense definition is "A financial institution that accepts deposits and channels the money into lending activities", yet overlap = 0.
  • Solution / Distributional Lesk: idea. (1) Sense definitions are short → gloss expansion through related meanings; (2) the overlap is based on string matching → similarity computed in a WordSpace; (3) no knowledge about sense usage → exploiting a sense-annotated corpus.
  • Solution / Distributional Lesk: solution 1, gloss expansion through related meanings. Sentence to disambiguate: "he cashed a check at the bank". Gloss: "A financial institution that accepts deposits and channels the money into lending activities" + glosses of related meanings: "A financial institution that accepts demand deposits and makes loans and provides other services for the public... One of 12 regional banks that monitor and act as depositories for banks in their region... A corporation gaining financial control over another corporation or financial institution through a payment in cash or an exchange of stock..." → overlap = 1 ("cash" matches "cashed" once both are stemmed).
  • Solution / Distributional Lesk: solution 2, the overlap is based on string matching → similarity computed in a WordSpace. Sentence to disambiguate: "that bank holds the mortgage on my home". Expanded gloss: "A financial institution that accepts deposits and channels the money into lending activities" + "A financial institution that accepts demand deposits and makes loans and provides other services for the public... One of 12 regional banks that monitor and act as depositories for banks in their region... A corporation gaining financial control over another corporation or financial institution through a payment in cash or an exchange of stock..." → overlap = 0.
  • Solution / Distributional Lesk: solution 2 (continued). Same sentence and expanded gloss as above, shown in a WordSpace alongside the words sloping, side, ground, mortgage, loans, lending, deposit, money, financial, cash, payment (figure).
  • Solution / Gloss expansion: leveraging a semantic network. Recursively concatenate the glosses of related synsets until a depth d is reached, excluding the antonym relation.
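The expansion step can be sketched as follows, under clearly hypothetical assumptions: a hand-made network (`GLOSSES`, `RELATIONS`, the synset ids and the relation names are illustrative, not BabelNet data), and a breadth-first traversal standing in for the recursive concatenation, so that each related synset is collected once, at its shortest distance. The distance is kept because the term weighting depends on it.

```python
from collections import deque

# Hypothetical toy network: ids, relation names and edges are illustrative only.
GLOSSES = {
    "bank_fin": "a financial institution that accepts deposits and channels "
                "the money into lending activities",
    "commercial_bank": "a financial institution that accepts demand deposits "
                       "and makes loans and provides other services for the public",
    "federal_reserve_bank": "one of 12 regional banks that monitor and act as "
                            "depositories for banks in their region",
    "lend": "give temporarily; let have for a limited time",
    "borrow": "get temporarily",
}
RELATIONS = {
    "bank_fin": [("hyponym", "commercial_bank"),
                 ("hyponym", "federal_reserve_bank"),
                 ("related", "lend")],
    "lend": [("antonym", "borrow")],
}


def expand_gloss(synset, max_depth=1, excluded=("antonym",)):
    """Collect (gloss, distance) pairs for the synset and every synset reachable
    within max_depth edges, never following an excluded relation."""
    seen = {synset}
    result = [(GLOSSES[synset], 0)]
    frontier = deque([(synset, 0)])
    while frontier:
        current, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand this synset further
        for relation, target in RELATIONS.get(current, []):
            if relation in excluded or target in seen:
                continue
            seen.add(target)
            result.append((GLOSSES[target], depth + 1))
            frontier.append((target, depth + 1))
    return result


def expanded_gloss_text(synset, max_depth=1):
    """Concatenate the collected glosses into a single extended gloss."""
    return " ".join(gloss for gloss, _ in expand_gloss(synset, max_depth))
```

With `max_depth=1` (the experiments set d to 1), the financial sense gathers its own gloss plus three related ones; the antonym edge from `lend` to `borrow` is never followed, whatever the depth.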
  • Solution / Gloss expansion: term weighting. Idea: the relevance of a term depends on both its frequency and the distance d of the related synset it comes from. Solutions: the inverse gloss frequency (IGF), because words that occur in all the extended glosses associated with the target word poorly characterize any single meaning; and a distance weight, inversely proportional to the distance in the network (number of edges) between the target synset and the related synset. Example, the meanings of bank: (1) Sloping land (especially the slope beside a body of water); (2) A financial institution that accepts deposits and channels the money into lending activities; ...; (8) A container (usually with a slot in the top) for keeping money at home.
  • Solution / Gloss expansion: term weight. Inverse gloss frequency:
      $IGF_k = 1 + \log_2 \frac{|S_i|}{gf_k}$   (1)
    where $gf_k$ is the number of extended glosses that contain the word $w_k$ and $|S_i|$ is the number of meanings (hence of extended glosses) of the target word $w_i$. Term weight: the weight of a word $w_k$ appearing $h$ times in the extended gloss $g_{ij}$ is
      $weight(w_k, g_{ij}) = \sum_{x=1}^{h} \frac{1}{1 + d_x} \times IGF_k$   (2)
    where $d_x$ is the distance of the synset whose gloss contributed the x-th occurrence.
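Equations (1) and (2) can be checked on a toy example. This sketch assumes each extended gloss is stored as a list of (token, distance) pairs, where distance is the number of edges to the synset that contributed the token (0 for the target's own gloss); `EXTENDED_GLOSSES`, `igf` and `weight` are hypothetical names and the data is made up.

```python
import math

# Hypothetical extended glosses of "bank": (token, distance) pairs.
EXTENDED_GLOSSES = {
    "bank#1": [("sloping", 0), ("land", 0), ("slope", 0), ("water", 0)],
    "bank#2": [("financial", 0), ("institution", 0), ("deposit", 0),
               ("money", 0), ("lending", 0), ("deposit", 1), ("loan", 1)],
    "bank#8": [("container", 0), ("slot", 0), ("money", 0), ("home", 0)],
}


def igf(word, glosses):
    """Eq. (1): IGF_k = 1 + log2(|S_i| / gf_k)."""
    gf = sum(1 for gloss in glosses.values() if any(t == word for t, _ in gloss))
    if gf == 0:
        raise ValueError(f"{word!r} occurs in no extended gloss")
    return 1.0 + math.log2(len(glosses) / gf)


def weight(word, sense, glosses):
    """Eq. (2): each occurrence contributes IGF_k / (1 + d) for its own distance d."""
    idf = igf(word, glosses)
    return sum(idf / (1 + d) for t, d in glosses[sense] if t == word)
```

"money" occurs in two of the three extended glosses, so its IGF (about 1.58) is lower than that of "deposit" (about 2.58), which only the financial sense contains; a word found in every extended gloss would get the minimum IGF of 1. The second "deposit" comes from a related synset at distance 1, so it counts half.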
  • Solution / WordSpace: Distributional Semantic Models (DSMs). "You shall know a word by the company it keeps!" Words are represented as points in a geometric space, and two words are related if they are close in that space.
  • Solution / WordSpace: overlap in the DSM. The gloss is represented as a vector, the weighted sum of the vectors of the terms occurring in the expanded gloss; the context is represented as a vector, the sum of the vectors of the words surrounding the target; the overlap is the cosine similarity between the gloss vector and the context vector. Example: context "bank hold mortgage home"; glosses "financial institution accept deposit channel money lend activity..." and "sloping land especially slope beside body water...".
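A sketch of the DSM overlap, assuming a tiny hand-made 3-dimensional space (`SPACE` and its vectors are hypothetical stand-ins for the LSA WordSpace) and leaving the target word out of the context. `vector_sum` accepts optional per-term weights, such as those of the term weighting step; words missing from the space are skipped.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for the WordSpace.
SPACE = {
    "hold": [0.3, 0.3, 0.3],
    "mortgage": [0.9, 0.1, 0.0],
    "home": [0.5, 0.3, 0.2],
    "money": [0.8, 0.2, 0.0],
    "lend": [0.9, 0.0, 0.1],
    "deposit": [0.7, 0.1, 0.1],
    "land": [0.1, 0.2, 0.8],
    "slope": [0.0, 0.1, 0.9],
    "water": [0.0, 0.3, 0.9],
}


def vector_sum(words, space, weights=None):
    """(Weighted) sum of word vectors; words missing from the space are skipped."""
    dim = len(next(iter(space.values())))
    total = [0.0] * dim
    for w in words:
        if w not in space:
            continue
        factor = 1.0 if weights is None else weights.get(w, 1.0)
        total = [t + factor * x for t, x in zip(total, space[w])]
    return total


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


context = vector_sum(["hold", "mortgage", "home"], SPACE)
financial = vector_sum(["financial", "institution", "accept", "deposit",
                        "channel", "money", "lend", "activity"], SPACE)
sloping = vector_sum(["sloping", "land", "especially", "slope", "beside",
                      "body", "water"], SPACE)
```

None of the context words appears literally in either gloss, so the string overlap is 0 for both senses; the cosine still ranks the financial gloss (about 0.95) well above the sloping-land gloss (about 0.37).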
  • Solution / Sense distribution: insight: analyze the distribution of meanings of each word. Solution:
      $p(s_{ij} \mid w_i) = \frac{t(w_i, s_{ij}) + 1}{\#w_i + |S_i|}$   (3)
    where $t(w_i, s_{ij})$ is the number of times the word $w_i$ is tagged with $s_{ij}$, $\#w_i$ is the number of occurrences of $w_i$, and $|S_i|$ is the number of meanings of $w_i$.
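Equation (3) is an add-one (Laplace) smoothed relative frequency. A sketch, assuming the sense-annotated corpus is available as (word, sense) pairs; `TAGGED`, its counts and the sense ids are hypothetical.

```python
from collections import Counter

# Hypothetical sense-tagged corpus, as (word, sense) pairs.
TAGGED = [("bank", "bank#fin")] * 3 + [("bank", "bank#river"), ("money", "money#1")]


def sense_distribution(tagged, word, senses):
    """Eq. (3): p(s_ij | w_i) = (t(w_i, s_ij) + 1) / (#w_i + |S_i|)."""
    counts = Counter(s for w, s in tagged if w == word)  # t(w_i, s_ij)
    n_word = sum(1 for w, _ in tagged if w == word)      # #w_i
    return {s: (counts[s] + 1) / (n_word + len(senses)) for s in senses}


p = sense_distribution(TAGGED, "bank", ["bank#river", "bank#fin", "bank#container"])
```

With three financial and one river occurrence of "bank", the probabilities are 2/7 (river), 4/7 (financial) and 1/7 for the container sense, which never occurs but still receives some mass thanks to the +1. The values sum to 1 whenever every occurrence of the word is tagged with one of the listed senses.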
  • Solution / Methodology: shaking the ingredients.
      1. For each word, retrieve the list of meanings.
      2. Expand the glosses and build the corresponding vector for each expanded gloss.
      3. Create the context vector from the surrounding words.
      4. Compute the overlap in the DSM.
      5. Combine the overlap with the sense distribution.
      6. Select the meaning whose extended gloss has the maximum overlap.
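Steps 4 to 6 can be sketched as below, assuming the context vector, the gloss vectors and the priors p(s_ij | w_i) have already been produced by the earlier steps. The slides only say that the sense distribution is linearly combined with the cosine similarity using a factor of 0.5; the form `alpha * cosine + (1 - alpha) * prior` is an assumption, and `disambiguate`, `CONTEXT` and `CANDIDATES` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else dot / norm


def disambiguate(context_vec, candidates, alpha=0.5):
    """Score each sense and return (best sense, scores).

    candidates: list of (sense_id, gloss_vector, prior).
    score = alpha * cosine + (1 - alpha) * prior (assumed combination);
    alpha=1.0 ignores the sense distribution.
    """
    scores = {
        sense: alpha * cosine(context_vec, gloss_vec) + (1 - alpha) * prior
        for sense, gloss_vec, prior in candidates
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy vectors and priors (hypothetical).
CONTEXT = [1.7, 0.7, 0.5]
CANDIDATES = [
    ("bank#river", [0.1, 0.6, 2.6], 2 / 7),
    ("bank#fin", [2.4, 0.3, 0.2], 4 / 7),
]

best, scores = disambiguate(CONTEXT, CANDIDATES)
```

Setting `alpha=1.0` drops the sense distribution, which roughly mirrors the runs marked N in the results. The prior matters most when two glosses are about equally close to the context: with a context `[1, 0, 0]`, a sense with cosine 1.0 and prior 0.1 beats one with cosine 0.99 and prior 0.9 at `alpha=1.0`, but loses at `alpha=0.5`.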
  • Evaluation / Goal: compare our system with (1) the simplified Lesk approach and (2) the other task participants; evaluate the system with and without sense distribution, the sense distribution being linearly combined with the cosine similarity score. Dataset: Task 12 of SemEval-2013, Multilingual Word Sense Disambiguation. Sense inventory: BabelNet. Metric: F-measure.
  • Evaluation / System setup: developed in Java on top of the BabelNet API 1.1.1; a Lucene analyzer tokenizes both the glosses and the context, with stemming from the Snowball library; Latent Semantic Analysis builds the DSM over the 100,000 most frequent words, using the BNC corpus for English and a Wikipedia dump for Italian; the synset distance d is set to 1; several context sizes are tested: 3, 5, 10, 20 and the whole text; the combination factor between cosine similarity and sense distribution is 0.5.
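The slides state that the WordSpace is built with Latent Semantic Analysis over the 100,000 most frequent words, but not how the underlying matrix is constructed or weighted. The sketch below therefore shows only the core LSA step, a rank-k truncated SVD, on a made-up word-by-context count matrix (`WORDS`, `COUNTS` and `lsa_word_space` are hypothetical), using NumPy's `numpy.linalg.svd`.

```python
import numpy as np

# Toy word-by-context count matrix (hypothetical); rows follow WORDS.
WORDS = ["money", "loan", "deposit", "river", "water", "slope"]
COUNTS = np.array([
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 1, 1],
], dtype=float)


def lsa_word_space(matrix, k):
    """Truncated SVD: the rows of U_k * Sigma_k are k-dimensional word vectors."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)  # s is sorted descending
    return u[:, :k] * s[:k]  # scale each of the k kept columns by its singular value


def cos(a, b):
    """Cosine similarity between two NumPy vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


space = lsa_word_space(COUNTS, k=2)
vec = {w: space[i] for i, w in enumerate(WORDS)}
```

In the reduced space "money" and "loan" end up nearly parallel, while "money" and "river" are orthogonal, because the two groups of words never share a context in the toy matrix. Real LSA pipelines usually weight the counts before the SVD and keep many more dimensions; neither choice is documented in these slides.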
  • Evaluation / Results, English (ContextSize W = whole text; SenseDistr Y/N = with/without sense distribution):
      Run         ContextSize  SenseDistr  F
      MFS         -            -           0.656
      EN.LESK.1   3            N           0.525
      EN.LESK.6   3            Y           0.633
      EN.DSM.1    3            N           0.536
      EN.DSM.2    5            N           0.605
      EN.DSM.3    10           N           0.633
      EN.DSM.4    20           N           0.650
      EN.DSM.5    W            N           0.687
      EN.DSM.6    3            Y           0.669
      EN.DSM.7    5            Y           0.677
      EN.DSM.8    10           Y           0.689
      EN.DSM.9    20           Y           0.696
      EN.DSM.10   W            Y           0.715
  • Evaluation / Results, Italian (ContextSize W = whole text; SenseDistr Y/N = with/without sense distribution):
      Run         ContextSize  SenseDistr  F
      MFS         -            -           0.572
      IT.LESK.2   5            N           0.530
      IT.LESK.10  W            Y           0.607
      IT.DSM.1    3            N           0.610
      IT.DSM.2    5            N           0.607
      IT.DSM.3    10           N           0.626
      IT.DSM.4    20           N           0.628
      IT.DSM.5    W            N           0.633
      IT.DSM.6    3            Y           0.631
      IT.DSM.7    5            Y           0.630
      IT.DSM.8    10           Y           0.635
      IT.DSM.9    20           Y           0.639
      IT.DSM.10   W            Y           0.641
  • Evaluation / Task results, English:
      System       F
      EN.DSM.10    0.715
      EN.DSM.5     0.687
      UMCC-DLSI-2  0.685
      UMCC-DLSI-3  0.680
      UMCC-DLSI-1  0.677
      MFS          0.656
      DAEBAK       0.604
      GETALP-BN-1  0.263
      GETALP-BN-2  0.266
  • Evaluation / Task results, Italian:
      System       F
      UMCC-DLSI-2  0.658
      UMCC-DLSI-1  0.657
      IT.DSM.10    0.641
      IT.DSM.5     0.633
      DAEBAK       0.613
      MFS          0.572
      GETALP-BN-2  0.325
      GETALP-BN-1  0.324
  • Conclusions and Future Work / Conclusions: recap. The proposed algorithm outperforms the simplified Lesk algorithm for both English and Italian. Without knowledge about the sense distribution, the system outperforms the MFS baseline in every Italian run and, for English, when the whole text is used as context (EN.DSM.5). For English, the system obtained the best results in SemEval-2013 Task 12 with or without sense distribution.
  • Conclusions and Future Work / Future work: what's next? Extend the evaluation to other languages; evaluate different DSMs and compositional approaches; adapt the approach to a specific domain, using a domain corpus to build the DSM and a domain sense-annotated corpus for the sense distribution.
  • That's all folks! The system is available online at https://github.com/pippokill/lesk-wsd-dsm