SlideShare a Scribd company logo
WORD SENSE DISAMBIGUATION AND LEXICAL
CHAINS CONSTRUCTION USING WORDNET
The Second Workshop on Natural Language Processing in Support of Learning: Metrics, Feedback and
Connectivity
September 14th 2010, Bucharest,
ROMANIA
Costin-Gabriel Chiru, Andrei Jancă, Traian Rebedea
Politehnica University of Bucharest
Summary
 The corpus
 Lexical Chains
 Semantic Distances
 WordNet
 Word sense
disambiguation
 Results
 Further research
The corpus
 Chats consisting of 4 or 5 participants,
debating subjects related to collaborative
learning
 High percentage of misspelled words
 Keywords - “wiki” and “forum” – not present
in WordNet with the proper sense
 Utterances not necessarily a correct text in
respect to English grammar
Lexical chains
 Lexical chain: set of words where each word has
an acceptable degree of semantic relatedness to
every other word in the set
 Each word must be fitted into a lexical chain –
How?
 The percentage of words in the chain that are
“related” to a word
 Over 90% - word “belongs” to that chain
Semantic distance
 When are two words considered to be
“related”?
 Methods of computing a semantic distance
between words – a number in respect to the
strength of the semantic connection
 Most methods use word frequency and the
lowest superordinate => the need for an
ontology
WordNet
 Ontology containing information about the
semantic relationships between words
 Words defining the same concept are
grouped into sets - Synsets
 Directed acyclic graph – synsets are nodes,
semantic relationships are vertices =>
consistency with the lower superordinate
concept
Semantic distances in WN
 Path length
• If a vertex exists between two nodes, the two
synsets are related through a semantic
relationship
• The length of such a path shows the strength of
a relationship between two senses.
 Conrath-Jiang measure
• Uses the lower superordinate and word
frequency
• Word frequency – number of hits returned by a
Word sense disambiguation
 Assigning a sense to an ambiguous word,
based on a context
 Semantic distance is computed for senses,
not for words
 The following scenario might occur : two
words are found to be semantically close, but
not for their right senses
Word sense disambiguation (2)
 Context = a window of words
 Window size – trade-off between time and
quality of results
 Our corpus : ideas and the subject of the
utterances often alternate => a large window
size is not necessary
Word sense disambiguation (3)
 Each word in the context has a list of senses
 A set of word-sense pairs : for each word in
the window, a sense is assigned
 We must choose the best such set => a score
must be computed for each set
 What evaluation function should compute the
score?
Evaluation function for WSD
 The degree of a word-sense pair : The number
of senses related to that sense in a set
 We use WN and semantic distances to
determine if two senses are related
 High degree – higher probability of a correct
word-sense assignment
 The average of all degrees in a set – high
average = high score
Evaluation function for WSD (2)
 Problem : many word-sense pairs with very low
degree and few with very high degree
 All degrees should be “packed” around the
average => standard deviation of all degrees.
 Low standard deviation = high score
 Low semantic distances = all senses are closely
related => we need the average and standard
deviation of all distances
Results
 Window size : 3-4 utterances
 High threshold for computing semantic
relatedness
 The lowest superordinate is part of the
shortest path between two senses – we
ignore path lengths > 4
 A word is included in a chain if it is related to
90% of the words present in that chain
Results (2)
 Vocabulary size for corpus - 1696 words
 With WSD
 Average chain length for entire corpus = 52.68 words
 Number of chains = 649
 Longest chain : 95 distinct words
 Number of unitary chains (chains with a single distinct word) =
394 => 23.21 % of all words are part of such a chain
 Without WSD
 Average chain length for entire corpus = 94.48 words
 Number of chains = 358 , of which 260 are unitary chains.
Therefore, some chains are very long and probably inaccurate
 Longest chain : 756 distinct words (over 40% of the vocabulary
size)
Further research
 There is a high dependency between the
linguistic tool (WN is used now) , the corpus and
algorithms for tasks like lexical chaining and
WSD
 Bottlenecks and key points of this system must
be identified
 Wikipedia – can be used as an ontology
(Wikipedia has a category graph), as well as a
relevant corpus
 Implementing a spell-checker to increase the
number of words taken into account
Further research (2)
 Each word must be fitted into a lexical chain –
How?
 When is a chain stronger, rather than where
does a word best belong
• “Strength” of a chain = how closely related are
the words
• Output is a set of chains => “strength” of a set of
chains
• State-space searching : the output of the current
lexical chaining algorithm is the initial state, while
the final state is an acceptably strong set of
chains
Q & A

More Related Content

Similar to Word sense disambiguation and lexical chains construction using wordnet

Chat bot using text similarity approach
Chat bot using text similarity approachChat bot using text similarity approach
Chat bot using text similarity approach
dinesh_joshy
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpus
Rubén Izquierdo Beviá
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
Rubén Izquierdo Beviá
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Chunyang Chen
 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics I
butest
 
Supervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured TextSupervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured Text
International Journal of Engineering Inventions www.ijeijournal.com
 
Wsd final paper
Wsd final paperWsd final paper
Wsd final paper
Milind Gokhale
 
Word Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesWord Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology Classes
Andre Freitas
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical Chains
IRJET Journal
 
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET Journal
 
1 l5eng
1 l5eng1 l5eng
1 l5eng
Noobie312
 
Sentence Processing by Muhammad Saleem.pptx
Sentence Processing by Muhammad Saleem.pptxSentence Processing by Muhammad Saleem.pptx
Sentence Processing by Muhammad Saleem.pptx
E&S Education Department, KP
 
Learning to Link with Wikipedia
Learning to Link with WikipediaLearning to Link with Wikipedia
Learning to Link with Wikipedia
Ashish Kulkarni
 
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONSEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
cscpconf
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
ijnlc
 
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffnL6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
RwanEnan
 
Semeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic SimilaritySemeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic Similarity
Enterprise Search Warsaw Meetup
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense Disambiguation
IOSR Journals
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learning
csandit
 
Extracting Key Terms From Noisy and Multi-theme Documents
Extracting Key Terms From Noisy and Multi-theme DocumentsExtracting Key Terms From Noisy and Multi-theme Documents
Extracting Key Terms From Noisy and Multi-theme Documents
maria.grineva
 

Similar to Word sense disambiguation and lexical chains construction using wordnet (20)

Chat bot using text similarity approach
Chat bot using text similarity approachChat bot using text similarity approach
Chat bot using text similarity approach
 
RANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpusRANLP 2013: DutchSemcor in quest of the ideal corpus
RANLP 2013: DutchSemcor in quest of the ideal corpus
 
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged CorpusRANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
RANLP2013: DutchSemCor, in Quest of the Ideal Sense Tagged Corpus
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics I
 
Supervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured TextSupervised Approach to Extract Sentiments from Unstructured Text
Supervised Approach to Extract Sentiments from Unstructured Text
 
Wsd final paper
Wsd final paperWsd final paper
Wsd final paper
 
Word Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesWord Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology Classes
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical Chains
 
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical Chains
 
1 l5eng
1 l5eng1 l5eng
1 l5eng
 
Sentence Processing by Muhammad Saleem.pptx
Sentence Processing by Muhammad Saleem.pptxSentence Processing by Muhammad Saleem.pptx
Sentence Processing by Muhammad Saleem.pptx
 
Learning to Link with Wikipedia
Learning to Link with WikipediaLearning to Link with Wikipedia
Learning to Link with Wikipedia
 
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITIONSEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
SEMANTIC NETWORK BASED MECHANISMS FOR KNOWLEDGE ACQUISITION
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
 
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffnL6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
L6.pptxsdv dfbdfjftj hgjythgfvfhjyggunghb fghtffn
 
Semeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic SimilaritySemeval Deep Learning In Semantic Similarity
Semeval Deep Learning In Semantic Similarity
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense Disambiguation
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learning
 
Extracting Key Terms From Noisy and Multi-theme Documents
Extracting Key Terms From Noisy and Multi-theme DocumentsExtracting Key Terms From Noisy and Multi-theme Documents
Extracting Key Terms From Noisy and Multi-theme Documents
 

More from University Politehnica Bucharest

PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisPhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
University Politehnica Bucharest
 
Time series analysis for sales prediction
Time series analysis for sales predictionTime series analysis for sales prediction
Time series analysis for sales prediction
University Politehnica Bucharest
 
Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...
University Politehnica Bucharest
 
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
University Politehnica Bucharest
 
Identifying cyclic words with the help of google
Identifying cyclic words with the help of googleIdentifying cyclic words with the help of google
Identifying cyclic words with the help of google
University Politehnica Bucharest
 
Expression of Political Opinions in Press
Expression of Political Opinions in PressExpression of Political Opinions in Press
Expression of Political Opinions in Press
University Politehnica Bucharest
 
Determine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisDetermine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysis
University Politehnica Bucharest
 
Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...
University Politehnica Bucharest
 
Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...
University Politehnica Bucharest
 
Movie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileMovie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profile
University Politehnica Bucharest
 
Tracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaTracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corpora
University Politehnica Bucharest
 
The collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyThe collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case study
University Politehnica Bucharest
 
Archaisms and neologisms identification in texts
Archaisms and neologisms identification in textsArchaisms and neologisms identification in texts
Archaisms and neologisms identification in texts
University Politehnica Bucharest
 
Unsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesisUnsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesis
University Politehnica Bucharest
 
Tweets topic modelling across different countries prezentarea
Tweets topic modelling across different countries   prezentareaTweets topic modelling across different countries   prezentarea
Tweets topic modelling across different countries prezentarea
University Politehnica Bucharest
 
Sentiment based text segmentation
Sentiment based text segmentationSentiment based text segmentation
Sentiment based text segmentation
University Politehnica Bucharest
 
Creativity detection in texts
Creativity detection in textsCreativity detection in texts
Creativity detection in texts
University Politehnica Bucharest
 
Nlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsNlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chats
University Politehnica Bucharest
 
Detecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversationsDetecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversations
University Politehnica Bucharest
 
Metaphor detection
Metaphor detectionMetaphor detection

More from University Politehnica Bucharest (20)

PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisPhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
 
Time series analysis for sales prediction
Time series analysis for sales predictionTime series analysis for sales prediction
Time series analysis for sales prediction
 
Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...
 
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
 
Identifying cyclic words with the help of google
Identifying cyclic words with the help of googleIdentifying cyclic words with the help of google
Identifying cyclic words with the help of google
 
Expression of Political Opinions in Press
Expression of Political Opinions in PressExpression of Political Opinions in Press
Expression of Political Opinions in Press
 
Determine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisDetermine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysis
 
Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...
 
Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...
 
Movie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileMovie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profile
 
Tracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaTracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corpora
 
The collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyThe collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case study
 
Archaisms and neologisms identification in texts
Archaisms and neologisms identification in textsArchaisms and neologisms identification in texts
Archaisms and neologisms identification in texts
 
Unsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesisUnsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesis
 
Tweets topic modelling across different countries prezentarea
Tweets topic modelling across different countries   prezentareaTweets topic modelling across different countries   prezentarea
Tweets topic modelling across different countries prezentarea
 
Sentiment based text segmentation
Sentiment based text segmentationSentiment based text segmentation
Sentiment based text segmentation
 
Creativity detection in texts
Creativity detection in textsCreativity detection in texts
Creativity detection in texts
 
Nlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsNlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chats
 
Detecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversationsDetecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversations
 
Metaphor detection
Metaphor detectionMetaphor detection
Metaphor detection
 

Recently uploaded

Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
TinyAnderson
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
AbdullaAlAsif1
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
KrushnaDarade1
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
Daniel Tubbenhauer
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Texas Alliance of Groundwater Districts
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
Anagha Prasad
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
Vandana Devesh Sharma
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
vluwdy49
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
muralinath2
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
LengamoLAppostilic
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
İsa Badur
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
RitabrataSarkar3
 

Recently uploaded (20)

Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdfTopic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
Topic: SICKLE CELL DISEASE IN CHILDREN-3.pdf
 
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
Unlocking the mysteries of reproduction: Exploring fecundity and gonadosomati...
 
SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
Equivariant neural networks and representation theory
Equivariant neural networks and representation theoryEquivariant neural networks and representation theory
Equivariant neural networks and representation theory
 
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero WaterSharlene Leurig - Enabling Onsite Water Use with Net Zero Water
Sharlene Leurig - Enabling Onsite Water Use with Net Zero Water
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
Compexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titrationCompexometric titration/Chelatorphy titration/chelating titration
Compexometric titration/Chelatorphy titration/chelating titration
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
 
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
在线办理(salfor毕业证书)索尔福德大学毕业证毕业完成信一模一样
 
Oedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptxOedema_types_causes_pathophysiology.pptx
Oedema_types_causes_pathophysiology.pptx
 
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdfwaterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
waterlessdyeingtechnolgyusing carbon dioxide chemicalspdf
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
aziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobelaziz sancar nobel prize winner: from mardin to nobel
aziz sancar nobel prize winner: from mardin to nobel
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Eukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptxEukaryotic Transcription Presentation.pptx
Eukaryotic Transcription Presentation.pptx
 

Word sense disambiguation and lexical chains construction using wordnet

  • 1. WORD SENSE DISAMBIGUATION AND LEXICAL CHAINS CONSTRUCTION USING WORDNET The Second Workshop on Natural Language Processing in Support of Learning: Metrics, Feedback and Connectivity September 14th 2010, Bucharest, ROMANIA Costin-Gabriel Chiru, Andrei Jancă, Traian Rebedea Politehnica University of Bucharest
  • 2. Summary  The corpus  Lexical Chains  Semantic Distances  WordNet  Word sense disambiguation  Results  Further research
  • 3. The corpus  Chats consisting of 4 or 5 participants, debating subjects related to collaborative learning  High percentage of misspelled words  Keywords - “wiki” and “forum” – not present in WordNet with the proper sense  Utterances not necessarily a correct text in respect to English grammar
  • 4. Lexical chains  Lexical chain: set of words where each word has an acceptable degree of semantic relatedness to every other word in the set  Each word must be fitted into a lexical chain – How?  The percentage of words in the chain that are “related” to a word  Over 90% - word “belongs” to that chain
  • 5. Semantic distance  When are two words considered to be “related”?  Methods of computing a semantic distance between words – a number in respect to the strength of the semantic connection  Most methods use word frequency and the lowest superordinate => the need for an ontology
  • 6. WordNet  Ontology containing information about the semantic relationships between words  Words defining the same concept are grouped into sets - Synsets  Directed acyclic graph – synsets are nodes, semantic relationships are vertices => consistency with the lower superordinate concept
  • 7. Semantic distances in WN  Path length • If a vertex exists between two nodes, the two synsets are related through a semantic relationship • The length of such a path shows the strength of a relationship between two senses.  Conrath-Jiang measure • Uses the lower superordinate and word frequency • Word frequency – number of hits returned by a
  • 8. Word sense disambiguation  Assigning a sense to an ambiguous word, based on a context  Semantic distance is computed for senses, not for words  The following scenario might occur : two words are found to be semantically close, but not for their right senses
  • 9. Word sense disambiguation (2)  Context = a window of words  Window size – trade-off between time and quality of results  Our corpus : ideas and the subject of the utterances often alternate => a large window size is not necessary
  • 10. Word sense disambiguation (3)  Each word in the context has a list of senses  A set of word-sense pairs : for each word in the window, a sense is assigned  We must choose the best such set => a score must be computed for each set  What evaluation function should compute the score?
  • 11. Evaluation function for WSD  The degree of a word-sense pair : The number of senses related to that sense in a set  We use WN and semantic distances to determine if two senses are related  High degree – higher probability of a correct word-sense assignment  The average of all degrees in a set – high average = high score
  • 12. Evaluation function for WSD (2)  Problem : many word-sense pairs with very low degree and few with very high degree  All degrees should be “packed” around the average => standard deviation of all degrees.  Low standard deviation = high score  Low semantic distances = all senses are closely related => we need the average and standard deviation of all distances
  • 13. Results  Window size : 3-4 utterances  High threshold for computing semantic relatedness  The lowest superordinate is part of the shortest path between two senses – we ignore path lengths > 4  A word is included in a chain if it is related to 90% of the words present in that chain
  • 14. Results (2)  Vocabulary size for corpus - 1696 words  With WSD  Average chain length for entire corpus = 52.68 words  Number of chains = 649  Longest chain : 95 distinct words  Number of unitary chains (chains with a single distinct word) = 394 => 23.21 % of all words are part of such a chain  Without WSD  Average chain length for entire corpus = 94.48 words  Number of chains = 358 , of which 260 are unitary chains. Therefore, some chains are very long and probably inaccurate  Longest chain : 756 distinct words (over 40% of the vocabulary size)
  • 15. Further research  There is a high dependency between the linguistic tool (WN is used now) , the corpus and algorithms for tasks like lexical chaining and WSD  Bottlenecks and key points of this system must be identified  Wikipedia – can be used as an ontology (Wikipedia has a category graph), as well as a relevant corpus  Implementing a spell-checker to increase the number of words taken into account
  • 16. Further research (2)  Each word must be fitted into a lexical chain – How?  When is a chain stronger, rather than where does a word best belong • “Strength” of a chain = how closely related are the words • Output is a set of chains => “strength” of a set of chains • State-space searching : the output of the current lexical chaining algorithm is the initial state, while the final state is an acceptably strong set of chains
  • 17. Q & A

Editor's Notes

  1. Beginning course details and/or books/materials needed for a class/project.
  2. Beginning course details and/or books/materials needed for a class/project.
  3. Beginning course details and/or books/materials needed for a class/project.
  4. Beginning course details and/or books/materials needed for a class/project.
  5. Beginning course details and/or books/materials needed for a class/project.
  6. Beginning course details and/or books/materials needed for a class/project.
  7. Beginning course details and/or books/materials needed for a class/project.
  8. Beginning course details and/or books/materials needed for a class/project.
  9. Beginning course details and/or books/materials needed for a class/project.
  10. Beginning course details and/or books/materials needed for a class/project.
  11. Beginning course details and/or books/materials needed for a class/project.
  12. Beginning course details and/or books/materials needed for a class/project.
  13. Beginning course details and/or books/materials needed for a class/project.
  14. Beginning course details and/or books/materials needed for a class/project.
  15. Beginning course details and/or books/materials needed for a class/project.
  16. Beginning course details and/or books/materials needed for a class/project.