WORD SENSE DISAMBIGUATION AND LEXICAL
CHAINS CONSTRUCTION USING WORDNET
The Second Workshop on Natural Language Processing in Support of Learning: Metrics, Feedback and
Connectivity
September 14th 2010, Bucharest,
ROMANIA
Costin-Gabriel Chiru, Andrei Jancă, Traian Rebedea
Politehnica University of Bucharest
Summary
 The corpus
 Lexical Chains
 Semantic Distances
 WordNet
 Word sense
disambiguation
 Results
 Further research
The corpus
 Chats consisting of 4 or 5 participants,
debating subjects related to collaborative
learning
 High percentage of misspelled words
 Keywords such as “wiki” and “forum” are not
present in WordNet with the intended sense
 Utterances are not necessarily grammatical
English
Lexical chains
 Lexical chain: set of words where each word has
an acceptable degree of semantic relatedness to
every other word in the set
 Each word must be fitted into a lexical chain –
How?
 The percentage of words in the chain that are
“related” to a word
 Over 90% - word “belongs” to that chain
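The membership rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `is_related` is a hypothetical predicate standing in for the semantic-distance test, the `TOPIC` table is invented toy data, and opening a new chain for a word that fits nowhere is an assumed policy.

```python
from typing import Callable, Dict, List


def assign_to_chain(
    word: str,
    chains: List[List[str]],
    is_related: Callable[[str, str], bool],
    threshold: float = 0.9,
) -> int:
    """Place `word` in the first chain where it is related to at least
    `threshold` of the members; otherwise start a new chain.
    Returns the index of the chain that received the word."""
    for index, chain in enumerate(chains):
        related = sum(1 for member in chain if is_related(word, member))
        if related / len(chain) >= threshold:
            chain.append(word)
            return index
    chains.append([word])  # assumption: an unmatched word opens a unitary chain
    return len(chains) - 1


# Toy relatedness: two words are "related" if they share a hand-made topic label.
TOPIC: Dict[str, str] = {
    "wiki": "tool", "forum": "tool", "chat": "tool",
    "teacher": "person", "student": "person",
}


def toy_related(a: str, b: str) -> bool:
    return TOPIC[a] == TOPIC[b]


chains: List[List[str]] = []
for w in ["wiki", "teacher", "forum", "student", "chat"]:
    assign_to_chain(w, chains, toy_related)
```

With this toy data the five words end up in two chains, one per topic label. The first-fit policy is only one option; picking the chain with the highest related percentage would be an equally plausible reading of the rule.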
Semantic distance
 When are two words considered to be
“related”?
 Methods of computing a semantic distance
between words – a number reflecting the
strength of the semantic connection
 Most methods use word frequency and the
lowest superordinate => the need for an
ontology
WordNet
 Ontology containing information about the
semantic relationships between words
 Words defining the same concept are
grouped into sets - Synsets
 Directed acyclic graph – synsets are nodes,
semantic relationships are edges =>
consistent with the lowest superordinate
concept
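To make the graph view concrete, here is a small sketch over a hand-made hypernym graph. The synset names and the `HYPERNYMS` table are invented for illustration and are not taken from WordNet. The lowest superordinate is taken here to be the common ancestor with the smallest summed distance to the two synsets, which is one reasonable reading of the term.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical hypernym edges (child -> parents); not real WordNet data.
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],
    "animal": ["entity"],
    "artifact": ["entity"],
    "dog": ["animal"],
    "cat": ["animal"],
    "forum": ["artifact"],
}


def ancestor_distances(synset: str) -> Dict[str, int]:
    """Breadth-first walk up the hypernym edges; maps each ancestor
    (including the synset itself) to its distance in edges."""
    distances = {synset: 0}
    queue = deque([synset])
    while queue:
        node = queue.popleft()
        for parent in HYPERNYMS[node]:
            if parent not in distances:
                distances[parent] = distances[node] + 1
                queue.append(parent)
    return distances


def lowest_superordinate(a: str, b: str) -> Optional[str]:
    """Common ancestor closest to both synsets, or None if there is none."""
    up_a, up_b = ancestor_distances(a), ancestor_distances(b)
    common = set(up_a) & set(up_b)
    if not common:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return min(common, key=lambda s: (up_a[s] + up_b[s], s))
```

In this toy graph, "dog" and "cat" meet at "animal", while "dog" and "forum" only meet at the root "entity".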
Semantic distances in WN
 Path length
• If a path exists between two nodes, the two
synsets are related through a semantic
relationship
• The length of such a path shows the strength of
a relationship between two senses.
 Jiang-Conrath measure
• Uses the lowest superordinate and word
frequency
• Word frequency – number of hits returned by a
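The Jiang-Conrath distance is commonly written as dist(c1, c2) = IC(c1) + IC(c2) - 2 * IC(lso), where IC(c) = -log p(c) is the information content of a concept and lso is the lowest superordinate of the two concepts. The sketch below uses that standard formula with invented counts. The `COUNTS` table is hypothetical, and how the frequencies are obtained is left open here.

```python
import math
from typing import Dict

# Hypothetical concept counts; a concept's count includes its descendants' counts.
COUNTS: Dict[str, int] = {"entity": 1000, "animal": 200, "dog": 50, "cat": 40}
TOTAL = COUNTS["entity"]  # the root subsumes every occurrence


def information_content(concept: str) -> float:
    """IC(c) = -log p(c), with p(c) estimated from the toy counts."""
    return -math.log(COUNTS[concept] / TOTAL)


def jiang_conrath_distance(c1: str, c2: str, lso: str) -> float:
    """dist(c1, c2) = IC(c1) + IC(c2) - 2 * IC(lso)."""
    return (
        information_content(c1)
        + information_content(c2)
        - 2 * information_content(lso)
    )
```

A more specific lowest superordinate has higher information content, so it yields a smaller distance: with these counts, "dog" and "cat" are closer when joined at "animal" than they would be if their only shared ancestor were the root.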
Word sense disambiguation
 Assigning a sense to an ambiguous word,
based on a context
 Semantic distance is computed for senses,
not for words
 The following scenario might occur: two
words are found to be semantically close, but
through senses other than the ones intended
Word sense disambiguation (2)
 Context = a window of words
 Window size – trade-off between time and
quality of results
 Our corpus : ideas and the subject of the
utterances often alternate => a large window
size is not necessary
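A context window over a chat can be sketched as below. This is an illustration only: the function name `context_window` is hypothetical, and the choice to look backward from the current utterance (rather than on both sides) is an assumption, not something the slides specify.

```python
from typing import List


def context_window(utterances: List[List[str]], index: int, size: int = 3) -> List[str]:
    """Words from the `size` utterances ending at position `index`.
    Assumption: the window only looks backward from the current utterance."""
    start = max(0, index - size + 1)
    return [word for utterance in utterances[start:index + 1] for word in utterance]


chat = [["wiki", "page"], ["forum", "post"], ["edit", "page"], ["teacher"]]
window = context_window(chat, index=3, size=3)
```

A larger `size` gives each word more context but multiplies the number of sense combinations to evaluate, which is the time versus quality trade-off mentioned above.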
Word sense disambiguation (3)
 Each word in the context has a list of senses
 A set of word-sense pairs : for each word in
the window, a sense is assigned
 We must choose the best such set => a score
must be computed for each set
 What evaluation function should compute the
score?
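The candidate sets of word-sense pairs can be enumerated as a Cartesian product over the sense lists. The sketch below assumes an exhaustive enumeration, which the slides do not confirm; the `SENSES` inventory and its labels are invented and are not WordNet sense keys.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Hypothetical sense inventory; the labels are illustrative only.
SENSES: Dict[str, List[str]] = {
    "bank": ["bank.finance", "bank.river"],
    "deposit": ["deposit.money", "deposit.sediment"],
    "account": ["account.finance"],
}


def candidate_assignments(window: List[str]) -> Iterator[Tuple[Tuple[str, str], ...]]:
    """Yield every set of word-sense pairs: one sense chosen per word."""
    sense_lists = [SENSES[word] for word in window]
    for choice in product(*sense_lists):
        yield tuple(zip(window, choice))


assignments = list(candidate_assignments(["bank", "deposit", "account"]))
```

Here the three words produce 2 * 2 * 1 = 4 candidate sets. The count is the product of the sense-list lengths, so it grows quickly with window size.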
Evaluation function for WSD
 The degree of a word-sense pair: the number
of other senses in the set that are related to its sense
 We use WN and semantic distances to
determine if two senses are related
 High degree – higher probability of a correct
word-sense assignment
 The average of all degrees in a set – high
average = high score
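The degree computation and its average can be sketched as follows. The predicate `toy_related` is a hypothetical stand-in for the WN-based relatedness test, and the sense labels are invented for the example.

```python
from statistics import mean
from typing import Callable, List


def degrees(senses: List[str], related: Callable[[str, str], bool]) -> List[int]:
    """Degree of each sense = number of other senses in the set related to it."""
    return [
        sum(1 for j, other in enumerate(senses) if i != j and related(sense, other))
        for i, sense in enumerate(senses)
    ]


# Hypothetical relatedness: senses sharing the label after the dot count as related.
def toy_related(a: str, b: str) -> bool:
    return a.split(".")[1] == b.split(".")[1]


financial = ["bank.finance", "deposit.finance", "account.finance"]
mixed = ["bank.river", "deposit.finance", "account.finance"]

financial_score = mean(degrees(financial, toy_related))
mixed_score = mean(degrees(mixed, toy_related))
```

The all-financial assignment has the higher average degree, so under this criterion it would be preferred over the assignment that picks the river sense of "bank".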
Evaluation function for WSD (2)
 Problem : many word-sense pairs with very low
degree and few with very high degree
 All degrees should be “packed” around the
average => standard deviation of all degrees.
 Low standard deviation = high score
 Low semantic distances = all senses are closely
related => we need the average and standard
deviation of all distances
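The effect of the standard deviation can be seen on two degree lists with the same average. The slides do not say how the average and the deviation are combined, so `ranking_key` below is an assumed combination (higher average first, then lower spread), and the degree lists are invented.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def degree_profile(degrees: List[int]) -> Tuple[float, float]:
    """Return (average, population standard deviation) of the degrees."""
    return mean(degrees), pstdev(degrees)


def ranking_key(degrees: List[int]) -> Tuple[float, float]:
    # Assumption: prefer a higher average first, then a lower spread.
    average, spread = degree_profile(degrees)
    return (average, -spread)


balanced = [2, 2, 2, 2, 2]  # every sense has two related senses
skewed = [3, 3, 2, 1, 1]    # same average, but unevenly distributed
best = max([balanced, skewed], key=ranking_key)
```

Both lists average 2, but only the balanced one has zero deviation, so it ranks first. The same two statistics could be computed over pairwise semantic distances instead of degrees, with low values preferred for both.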
Results
 Window size : 3-4 utterances
 High threshold for computing semantic
relatedness
 The lowest superordinate is part of the
shortest path between two senses – we
ignore path lengths > 4
 A word is included in a chain if it is related to
90% of the words present in that chain
Results (2)
 Vocabulary size for corpus - 1696 words
 With WSD
 Average chain length for entire corpus = 52.68 words
 Number of chains = 649
 Longest chain : 95 distinct words
 Number of unitary chains (chains with a single distinct word) =
394 => 23.21 % of all words are part of such a chain
 Without WSD
 Average chain length for entire corpus = 94.48 words
 Number of chains = 358 , of which 260 are unitary chains.
Therefore, some chains are very long and probably inaccurate
 Longest chain : 756 distinct words (over 40% of the vocabulary
size)
Further research
 Results depend heavily on the linguistic
resource (currently WN), the corpus, and the
algorithms used for tasks such as lexical
chaining and WSD
 Bottlenecks and key points of this system must
be identified
 Wikipedia – can be used as an ontology
(Wikipedia has a category graph), as well as a
relevant corpus
 Implementing a spell-checker to increase the
number of words taken into account
Further research (2)
 Each word must be fitted into a lexical chain –
How?
 Ask when a chain is stronger, rather than where
a word best belongs
• “Strength” of a chain = how closely related its
words are
• Output is a set of chains => “strength” of a set of
chains
• State-space searching : the output of the current
lexical chaining algorithm is the initial state, while
the final state is an acceptably strong set of
chains
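One possible way to quantify the "strength" ideas above is sketched below. Everything here is an assumption made for illustration: chain strength is taken to be the fraction of related word pairs, the strength of a set of chains is a length-weighted average, and `toy_related` with its `TOPIC` table is invented data. The slides propose these notions only as future work and do not define them.

```python
from itertools import combinations
from typing import Callable, Dict, List


def chain_strength(chain: List[str], related: Callable[[str, str], bool]) -> float:
    """Fraction of word pairs in the chain that are related (1.0 for unitary chains)."""
    pairs = list(combinations(chain, 2))
    if not pairs:
        return 1.0
    return sum(1 for a, b in pairs if related(a, b)) / len(pairs)


def chain_set_strength(chains: List[List[str]], related: Callable[[str, str], bool]) -> float:
    """Average chain strength, weighted by chain length (an assumed aggregation)."""
    total_words = sum(len(chain) for chain in chains)
    return sum(len(chain) * chain_strength(chain, related) for chain in chains) / total_words


# Hypothetical topic labels used as a stand-in for semantic relatedness.
TOPIC: Dict[str, str] = {"wiki": "tool", "forum": "tool", "teacher": "person", "student": "person"}


def toy_related(a: str, b: str) -> bool:
    return TOPIC[a] == TOPIC[b]


split = [["wiki", "forum"], ["teacher", "student"]]
merged = [["wiki", "forum", "teacher", "student"]]
```

With this definition the split set scores 1.0 and the merged set scores 1/3, so a state-space search would prefer the split. Note that scoring unitary chains as 1.0 rewards splitting everything into single words, so a practical measure would need a penalty for fragmentation.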
Q & A

More Related Content

Similar to Word sense disambiguation and lexical chains construction using wordnet

Chat bot using text similarity approach
Chat bot using text similarity approachChat bot using text similarity approach
Chat bot using text similarity approachdinesh_joshy
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRubén Izquierdo Beviá
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRubén Izquierdo Beviá
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Chunyang Chen
 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics Ibutest
 
Word Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesWord Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesAndre Freitas
 
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET Journal
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsIRJET Journal
 
Learning to Link with Wikipedia
Learning to Link with WikipediaLearning to Link with Wikipedia
Learning to Link with WikipediaAshish Kulkarni
 
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONSEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONcscpconf
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONijnlc
 
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffnL6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffnRwanEnan
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationIOSR Journals
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learningcsandit
 
Extracting Key Terms From Noisy and Multi-theme Documents
Extracting Key Terms From Noisy and Multi-theme DocumentsExtracting Key Terms From Noisy and Multi-theme Documents
Extracting Key Terms From Noisy and Multi-theme Documentsmaria.grineva
 
Improving Text Categorization with Semantic Knowledge in Wikipedia
Improving Text Categorization with Semantic Knowledge in WikipediaImproving Text Categorization with Semantic Knowledge in Wikipedia
Improving Text Categorization with Semantic Knowledge in Wikipediachjshan
 

Similar to Word sense disambiguation and lexical chains construction using wordnet (20)

Chat bot using text similarity approach
Chat bot using text similarity approachChat bot using text similarity approach
Chat bot using text similarity approach
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpus
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics I
 
Supervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured TextSupervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured Text
 
Wsd final paper
Wsd final paperWsd final paper
Wsd final paper
 
Word Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesWord Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology Classes
 
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical Chains
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical Chains
 
1 l5eng
1 l5eng1 l5eng
1 l5eng
 
Learning to Link with Wikipedia
Learning to Link with WikipediaLearning to Link with Wikipedia
Learning to Link with Wikipedia
 
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONSEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
 
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffnL6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
 
Semeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic SimilaritySemeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic Similarity
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense Disambiguation
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learning
 
Extracting Key Terms From Noisy and Multi-theme Documents
Extracting Key Terms From Noisy and Multi-theme DocumentsExtracting Key Terms From Noisy and Multi-theme Documents
Extracting Key Terms From Noisy and Multi-theme Documents
 
Improving Text Categorization with Semantic Knowledge in Wikipedia
Improving Text Categorization with Semantic Knowledge in WikipediaImproving Text Categorization with Semantic Knowledge in Wikipedia
Improving Text Categorization with Semantic Knowledge in Wikipedia
 

More from University Politehnica Bucharest

PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisPhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisUniversity Politehnica Bucharest
 
Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...University Politehnica Bucharest
 
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...University Politehnica Bucharest
 
Determine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisDetermine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisUniversity Politehnica Bucharest
 
Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...University Politehnica Bucharest
 
Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...University Politehnica Bucharest
 
Movie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileMovie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileUniversity Politehnica Bucharest
 
Tracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaTracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaUniversity Politehnica Bucharest
 
The collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyThe collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyUniversity Politehnica Bucharest
 
Unsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesisUnsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesisUniversity Politehnica Bucharest
 
Tweets topic modelling across different countries prezentarea
Tweets topic modelling across different countries   prezentareaTweets topic modelling across different countries   prezentarea
Tweets topic modelling across different countries prezentareaUniversity Politehnica Bucharest
 
Nlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsNlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsUniversity Politehnica Bucharest
 

More from University Politehnica Bucharest (20)

PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisPhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
 
Time series analysis for sales prediction
Time series analysis for sales predictionTime series analysis for sales prediction
Time series analysis for sales prediction
 
Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...
 
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
 
Identifying cyclic words with the help of google
Identifying cyclic words with the help of googleIdentifying cyclic words with the help of google
Identifying cyclic words with the help of google
 
Expression of Political Opinions in Press
Expression of Political Opinions in PressExpression of Political Opinions in Press
Expression of Political Opinions in Press
 
Determine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisDetermine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysis
 
Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...
 
Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...
 
Movie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileMovie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profile
 
Tracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaTracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corpora
 
The collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyThe collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case study
 
Archaisms and neologisms identification in texts
Archaisms and neologisms identification in textsArchaisms and neologisms identification in texts
Archaisms and neologisms identification in texts
 
Unsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesisUnsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesis
 
Tweets topic modelling across different countries prezentarea
Tweets topic modelling across different countries   prezentareaTweets topic modelling across different countries   prezentarea
Tweets topic modelling across different countries prezentarea
 
Sentiment based text segmentation
Sentiment based text segmentationSentiment based text segmentation
Sentiment based text segmentation
 
Creativity detection in texts
Creativity detection in textsCreativity detection in texts
Creativity detection in texts
 
Nlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsNlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chats
 
Detecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversationsDetecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversations
 
Metaphor detection
Metaphor detectionMetaphor detection
Metaphor detection
 

Recently uploaded

BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfWildaNurAmalia2
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 

Recently uploaded (20)

BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdfBUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
BUMI DAN ANTARIKSA PROJEK IPAS SMK KELAS X.pdf
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are important
 

Word sense disambiguation and lexical chains construction using wordnet

  • 1. WORD SENSE DISAMBIGUATION AND LEXICAL CHAINS CONSTRUCTION USING WORDNET The Second Workshop on Natural Language Processing in Support of Learning: Metrics, Feedback and Connectivity September 14th 2010, Bucharest, ROMANIA Costin-Gabriel Chiru, Andrei Jancă, Traian Rebedea Politehnica University of Bucharest
  • 2. Summary: the corpus; lexical chains; semantic distances; WordNet; word sense disambiguation; results; further research.
  • 3. The corpus: chats with 4 or 5 participants debating subjects related to collaborative learning; a high percentage of misspelled words; the keywords “wiki” and “forum” are not present in WordNet with the proper sense; utterances are not necessarily grammatical English.
  • 4. Lexical chains: a lexical chain is a set of words in which each word has an acceptable degree of semantic relatedness to every other word in the set. How is each word fitted into a lexical chain? By the percentage of words in the chain that are “related” to it: over 90% means the word “belongs” to that chain.
  • 5. Semantic distance: when are two words considered to be “related”? Methods of computing a semantic distance between words yield a number that reflects the strength of the semantic connection. Most methods use word frequency and the lowest superordinate, hence the need for an ontology.
  • 6. WordNet: an ontology containing information about the semantic relationships between words; words that denote the same concept are grouped into sets called synsets; it forms a directed acyclic graph in which synsets are nodes and semantic relationships are edges, which is consistent with the lowest superordinate concept.
  • 7. Semantic distances in WN. Path length: if a path exists between two nodes, the two synsets are related through a semantic relationship, and the length of that path shows the strength of the relationship between the two senses. Jiang-Conrath measure: uses the lowest superordinate and word frequency; word frequency is the number of hits returned by a
  • 8. Word sense disambiguation: assigning a sense to an ambiguous word, based on a context. Semantic distance is computed for senses, not for words, so the following scenario can occur: two words are found to be semantically close, but not for their correct senses.
  • 9. Word sense disambiguation (2): the context is a window of words; the window size is a trade-off between running time and quality of results. In our corpus, ideas and the subject of the utterances often alternate, so a large window size is not necessary.
  • 10. Word sense disambiguation (3): each word in the context has a list of senses. A set of word-sense pairs assigns one sense to each word in the window. We must choose the best such set, so a score must be computed for each set. What evaluation function should compute the score?
  • 11. Evaluation function for WSD: the degree of a word-sense pair is the number of senses in the set that are related to that sense. We use WN and semantic distances to determine whether two senses are related. A high degree means a higher probability of a correct word-sense assignment. The average of all degrees in a set is computed: high average = high score.
  • 12. Evaluation function for WSD (2): the problem with the average alone is a set with many word-sense pairs of very low degree and a few of very high degree. All degrees should be “packed” around the average, so we also use the standard deviation of all degrees: low standard deviation = high score. Low semantic distances mean all senses are closely related, so we also need the average and standard deviation of all distances.
  • 13. Results: window size of 3-4 utterances; a high threshold for computing semantic relatedness; the lowest superordinate is part of the shortest path between two senses, and we ignore path lengths > 4; a word is included in a chain if it is related to 90% of the words present in that chain.
  • 14. Results (2): vocabulary size for the corpus = 1696 words. With WSD: average chain length for the entire corpus = 52.68 words; number of chains = 649; longest chain = 95 distinct words; number of unitary chains (chains with a single distinct word) = 394 => 23.21 % of all words are part of such a chain. Without WSD: average chain length for the entire corpus = 94.48 words; number of chains = 358, of which 260 are unitary chains, so some chains are very long and probably inaccurate; longest chain = 756 distinct words (over 40% of the vocabulary size).
  • 15. Further research: there is a high dependency between the linguistic tool (currently WN), the corpus, and the algorithms for tasks like lexical chaining and WSD. Bottlenecks and key points of this system must be identified. Wikipedia can be used as an ontology (it has a category graph) as well as a relevant corpus. Implementing a spell-checker would increase the number of words taken into account.
  • 16. Further research (2): how should each word be fitted into a lexical chain? Ask when a chain is stronger, rather than where a word best belongs. The “strength” of a chain is how closely related its words are. The output is a set of chains, so we need the “strength” of a set of chains. State-space searching: the output of the current lexical chaining algorithm is the initial state, while the final state is an acceptably strong set of chains.
  • 17. Q & A
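
The chain-membership rule from slides 4 and 13 (a word joins a chain when it is related to 90% of the words already in it) can be sketched as below. This is only an illustration under assumptions: the names `belongs_to_chain` and `build_chains`, the greedy first-fit order, the `>=` comparison, and the toy `related` predicate are all hypothetical and do not come from the slides, which do not say how chains are searched.

```python
from typing import Callable, List

Related = Callable[[str, str], bool]

def belongs_to_chain(word: str, chain: List[str], related: Related,
                     threshold: float = 0.9) -> bool:
    """True if `word` is related to at least `threshold` of the chain's words."""
    if not chain:
        return False
    hits = sum(1 for member in chain if related(word, member))
    return hits / len(chain) >= threshold

def build_chains(words: List[str], related: Related,
                 threshold: float = 0.9) -> List[List[str]]:
    """Greedy first-fit chaining (an assumed strategy, not the authors')."""
    chains: List[List[str]] = []
    for word in words:
        for chain in chains:
            if belongs_to_chain(word, chain, related, threshold):
                chain.append(word)
                break
        else:
            # No existing chain accepts the word: start a unitary chain.
            chains.append([word])
    return chains

# Toy relatedness: words sharing a first letter count as "related".
# A real system would threshold a WordNet semantic distance instead.
def toy_related(a: str, b: str) -> bool:
    return a[0] == b[0]

chains = build_chains(["cat", "cow", "dog", "camel", "deer"], toy_related)
```

In this sketch, a word that fits no existing chain becomes a unitary chain, which matches the unitary chains counted on slide 14.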
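
The evaluation function described on slides 10 to 12 can be sketched as follows. The slides state only that a high average degree and a low standard deviation of degrees should both raise the score; combining them as `mean - standard deviation` is an assumption made here for illustration, as are the function names, the exhaustive search over all sense assignments, and the toy distance table. The cutoff of 4 echoes the path-length limit on slide 13.

```python
from itertools import combinations, product
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Distance = Callable[[str, str], float]

def degrees(assignment: Sequence[str], distance: Distance,
            max_distance: float = 4) -> List[int]:
    """Degree of each sense: how many other senses in the set are related to it."""
    deg = [0] * len(assignment)
    for i, j in combinations(range(len(assignment)), 2):
        if distance(assignment[i], assignment[j]) <= max_distance:
            deg[i] += 1
            deg[j] += 1
    return deg

def score(assignment: Sequence[str], distance: Distance,
          max_distance: float = 4) -> float:
    """High average degree and low spread give a high score (assumed formula)."""
    deg = degrees(assignment, distance, max_distance)
    if not deg:
        return 0.0
    return mean(deg) - pstdev(deg)

def disambiguate(senses_per_word: List[List[str]], distance: Distance,
                 max_distance: float = 4) -> Tuple[str, ...]:
    """Try every set of word-sense pairs in the window and keep the best one."""
    return max(product(*senses_per_word),
               key=lambda a: score(a, distance, max_distance))

# Hypothetical distances between made-up sense labels; unknown pairs are far apart.
TOY_DISTANCES = {
    frozenset(("bank.shore", "river.stream")): 2,
    frozenset(("bank.shore", "water.liquid")): 3,
    frozenset(("river.stream", "water.liquid")): 2,
}

def toy_distance(a: str, b: str) -> float:
    return TOY_DISTANCES.get(frozenset((a, b)), 9)

window = [["bank.finance", "bank.shore"], ["river.stream"], ["water.liquid"]]
best = disambiguate(window, toy_distance)
```

Exhaustive enumeration grows with the product of the sense counts, which is one reason the window size on slide 9 is a trade-off between time and quality. Slide 12 also mentions the average and standard deviation of the distances themselves; that term is left out of this sketch.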
