A Survey on Unsupervised Graph-based Word Sense Disambiguation

Elena-Oana Tabaranu
elena.tabaranu@info.uaic.ro
UAIC, Iasi

Plan
1. Introduction
2. State of the Art
3. Experiments and Results
4. Conclusions
5. References




Introduction
●   WSD = automatically assign the most
    appropriate meaning to a polysemous word
    within a given context (Sinha and Mihalcea, 2007)
●   Use Cases:
    ●   Machine translation
    ●   Speech processing
    ●   Boosting the performance of tasks like text retrieval, document
        classification and document clustering




State of the Art
●   Supervised WSD vs Unsupervised WSD
●   GWSD and Semantic Graph Construction
●   SAN Method
●   PageRank Method
●   HITS Method
●   P-Rank Method



Supervised WSD vs Unsupervised WSD
●   Supervised WSD:
    ●   Most approaches transform the sense of the word into a
        feature vector
    ●   Low execution time
    ●   Accuracy of 60%-70%
    ●   Major disadvantage: knowledge acquisition bottleneck
        (accuracy depends on the amount of manually annotated data)
●   Unsupervised WSD:
    ●   Identify the best sense candidate for each word using a
        model of the word sense dependencies in text
    ●   A ranking algorithm chooses the most likely combination
        of senses
    ●   Window-based or graph-based representation of the model
    ●   Fast execution time
    ●   Accuracy of 40%-60%



Graph-based WSD
●   GWSD = graph representation used to model
    word sense dependencies in text (WSD with
    graphs, not just word window)
●   Goal: identify the most probable sense (label)
    for each word
●   Advantage: takes into account information
    drawn from the entire graph



Semantic Graph Construction (I)
●   Example (Sinha and Mihalcea, 2007)




Semantic Graph Construction (II)
●   Example (Tsatsaronis et al., 2010)




The PageRank Method (Brin and Page, 1998)
●   Ranking algorithm based on the idea of voting:
    when one node links to another it offers a vote
    to that other node
●   The higher the number of votes for a node, the
    higher the importance of the node
●   Recursively score the candidate nodes of a
    weighted undirected graph
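
The voting idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the surveyed papers: the function names `pagerank_weighted` and `pick_senses`, the damping factor of 0.85, the fixed iteration count, and the toy sense labels and edge weights are all hypothetical.

```python
def pagerank_weighted(edges, damping=0.85, iterations=50):
    """Score nodes of a weighted undirected graph.

    edges maps (u, v) pairs to positive weights; each pair is
    treated as an undirected edge. Names and defaults are assumptions.
    """
    # Build a symmetric adjacency map: node -> {neighbor: weight}.
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w

    nodes = list(neighbors)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    # Total edge weight attached to each node, used to split its vote.
    strength = {node: sum(neighbors[node].values()) for node in nodes}

    for _ in range(iterations):
        new_score = {}
        for node in nodes:
            # Each neighbor votes in proportion to the edge weight.
            votes = sum(
                score[m] * w / strength[m]
                for m, w in neighbors[node].items()
            )
            new_score[node] = (1 - damping) / n + damping * votes
        score = new_score
    return score


def pick_senses(score, senses_of):
    """For each word, keep the candidate sense with the highest score."""
    return {
        word: max(candidates, key=score.get)
        for word, candidates in senses_of.items()
    }


# Toy sense graph (labels and weights invented for illustration).
toy_edges = {
    ("bank#finance", "money#currency"): 0.9,
    ("bank#finance", "deposit#payment"): 0.8,
    ("money#currency", "deposit#payment"): 0.7,
    ("bank#river", "deposit#sediment"): 0.6,
    ("bank#river", "money#currency"): 0.1,
}
toy_senses = {
    "bank": ["bank#finance", "bank#river"],
    "deposit": ["deposit#payment", "deposit#sediment"],
}
chosen = pick_senses(pagerank_weighted(toy_edges), toy_senses)
```

In this toy graph the financial senses reinforce each other through heavier edges, so they collect more votes than the river senses.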



The P-Rank Method (Zhao et al., 2009)
●   Check the structural similarity of nodes in an
    information network
●   Based on the idea that two nodes are similar if
    they are referenced by similar nodes and also
    reference similar nodes
●   Represents a generalization of other state-of-the-art
    measures like CoCitation, Coupling, Amsler, SimRank
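
The recursive definition of similarity can be sketched as follows. This is a simplified illustration in the spirit of P-Rank rather than the published algorithm: it assumes a single decay constant for both directions, a weight `lam` balancing in-links against out-links, and a fixed number of iterations. The function name `p_rank` and all parameter names and defaults are hypothetical.

```python
from itertools import product


def p_rank(edges, lam=0.5, decay=0.8, iterations=10):
    """Pairwise structural similarity on a directed graph.

    edges is a list of (source, target) pairs. lam weights the
    in-link term against the out-link term; decay damps similarity
    at each recursion step. All defaults are assumptions.
    """
    nodes = sorted({n for edge in edges for n in edge})
    ins = {n: [] for n in nodes}
    outs = {n: [] for n in nodes}
    for u, v in edges:
        outs[u].append(v)
        ins[v].append(u)

    # A node is fully similar to itself and, initially, to nothing else.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    def average(group_a, group_b, current):
        # Mean similarity over all neighbor pairs; 0 if either side is empty.
        if not group_a or not group_b:
            return 0.0
        total = sum(current[(x, y)] for x, y in product(group_a, group_b))
        return total / (len(group_a) * len(group_b))

    for _ in range(iterations):
        updated = {}
        for a, b in sim:
            if a == b:
                updated[(a, b)] = 1.0
            else:
                in_term = average(ins[a], ins[b], sim)
                out_term = average(outs[a], outs[b], sim)
                updated[(a, b)] = decay * (
                    lam * in_term + (1 - lam) * out_term
                )
        sim = updated
    return sim


# Toy graph: x and y share both referrers, z shares none with them.
toy_links = [("p", "x"), ("p", "y"), ("q", "x"), ("q", "y"), ("r", "z")]
toy_sim = p_rank(toy_links)
```

With `lam=1` only the in-link term remains, and with `lam=0` only the out-link term remains, which is how a measure of this shape can cover several of the simpler measures listed above.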



The HITS Method (Kleinberg, 1999)
●   Identify authorities = the most important nodes
    in the graph
●   Identify hubs = the nodes which point to
    authorities
●   The sense with the highest authority is chosen
    as the most likely one for each word
●   Major disadvantage: densely connected nodes
    can attract the highest score (clique attack)
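
A minimal sketch of the hub and authority updates is given below, assuming a directed, unweighted graph and a fixed number of iterations. The function name `hits` and the toy edge list are hypothetical, and a WSD system would use sense nodes and typically weighted edges instead.

```python
import math


def hits(edges, iterations=50):
    """Compute hub and authority scores for a directed graph.

    edges is a list of (source, target) pairs. The iteration count
    is an assumption; a real system would test for convergence.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalize(scores):
        # Scale to unit Euclidean length so scores stay bounded.
        norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
        return {n: s / norm for n, s in scores.items()}

    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing to it.
        new_auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]
        auth = normalize(new_auth)

        # Hub: sum of authority scores of the nodes it points to.
        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += auth[v]
        hub = normalize(new_hub)
    return hub, auth


# Toy graph: "c" is pointed to by both "a" and "b".
toy_graph = [("a", "c"), ("b", "c"), ("a", "d")]
toy_hub, toy_auth = hits(toy_graph)
best_authority = max(toy_auth, key=toy_auth.get)
```

For disambiguation, the candidate sense with the highest authority score among a word's senses would be kept, as described in the bullets above.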

Experiments and Results (I)
●   The Senseval 2 and 3 data sets are often used for testing
●   Occurrences for Senseval 2 using WordNet 2




●   Occurrences for Senseval 3 using WordNet 2




Experiments and Results (II)
●   Accuracies on the Senseval 2 and 3 English All
    Words Task data sets (Tsatsaronis et al.)




Conclusions
●   Recent systems narrow the gap between supervised
    and unsupervised approaches.
●   Graph-based methods make the most of the rich
    semantic model they employ.
●   Unsupervised approaches seek optimal parameter values
    using as little training data as possible
    and testing on as large a data set as possible.
●   Future work: implement P-Rank using a different graph
    representation, for example that of Sinha and Mihalcea (2007).


References
1. Tsatsaronis, G., Varlamis, I., Norvag, K.: An Experimental
   Study on Unsupervised Graph-based Word Sense
   Disambiguation. In Proc. of CICLing (2010).
2. Sinha, R., Mihalcea, R.: Unsupervised Graph-based Word
   Sense Disambiguation Using Measures of Semantic Similarity. In
   Proc. of ICSC (2007).
3. Mihalcea, R., Csomai, A.: SenseLearner: Word Sense
   Disambiguation for All Words in Unrestricted Text. In Proc. of
   ACL, pages 53-56 (2005).
4. Tsatsaronis, G., Vazirgiannis, M., Androutsopoulos, I.: Word
   Sense Disambiguation with Spreading Activation Networks
   Generated from Thesauri. In Proc. of IJCAI (2007).

Questions?




