    1. Computational Lexical Semantics Om Damani, IIT Bombay
    2. Study of Word Meaning <ul><li>Word Sense Disambiguation </li></ul><ul><li>Word Similarity </li></ul><ul><li>WordNet Relations </li></ul><ul><li>Do we really know the meaning of meaning? </li></ul><ul><ul><li>We will just take the dictionary definition as the meaning </li></ul></ul>
    3. Word Sense Disambiguation (WSD) WSD Applications: Search, _____, ______
    4. Sense Inventory <ul><li>Wordnet, Dictionary etc. </li></ul><ul><li>Plant in English Wordnet (#senses ??): </li></ul><ul><li>Noun Senses: </li></ul><ul><ul><li>plant, works, industrial plant (buildings for carrying on industrial labor) &quot;they built a large plant to manufacture automobiles&quot; </li></ul></ul><ul><ul><li>plant, flora, plant life ((botany) a living organism lacking the power of locomotion) </li></ul></ul><ul><ul><li>plant (an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience) </li></ul></ul><ul><ul><li>plant (something planted secretly for discovery by another) &quot;the police used a plant to trick the thieves&quot;; &quot;he claimed that the evidence against him was a plant&quot; </li></ul></ul>
    5. Sense Inventory .. <ul><li>Plant (Verb Senses): </li></ul><ul><ul><li>plant, set (put or set (seeds, seedlings, or plants) into the ground) &quot;Let's plant flowers in the garden&quot; </li></ul></ul><ul><ul><li>implant, engraft, embed, imbed, plant (fix or set securely or deeply) &quot;He planted a knee in the back of his opponent&quot;; &quot;The dentist implanted a tooth in the gum&quot; </li></ul></ul><ul><ul><li>establish, found, plant, constitute, institute (set up or lay the groundwork for) &quot;establish a new department&quot; </li></ul></ul><ul><ul><li>plant (place into a river) &quot;plant fish&quot; </li></ul></ul><ul><ul><li>plant (place something or someone in a certain position in order to secretly observe or deceive) &quot;Plant a spy in Moscow&quot;; &quot;plant bugs in the dissident's apartment&quot; </li></ul></ul><ul><ul><li>plant, implant (put firmly in the mind) &quot;Plant a thought in the students' minds&quot; </li></ul></ul>
    6. How many Senses of सच्चा <ul><li>Noun: सत्यवादी , सच्चा , सत्यभाषी , सत्यवक्ता - वह जो सत्य बोलता हो (one who speaks the truth) &quot; आधुनिक समाज में भी सत्यवादियों की कमी नहीं है / यथार्थवादी होने के कारण कई लोग श्याम के दुश्मन बन गए हैं &quot; </li></ul><ul><li>Adjective (6 senses) </li></ul><ul><li>सत्यवादी , सच्चा , सत्यभाषी , सत्यवक्ता - जो सत्य बोलता हो (truthful, who speaks the truth) &quot; युधिष्ठिर एक सत्यवादी व्यक्ति थे &quot; </li></ul><ul><li>ईमानदार , छलहीन , निष्कपट , निःकपट , रिजु , ऋजु , दयानतदार , सच्चा , अपैशुन , सत्यपर - चित्त में सद्वृत्ति या अच्छी नीयत रखनेवाला , चोरी या छल - कपट न करनेवाला (honest, of good intent, who neither steals nor deceives) &quot; ईमानदार व्यक्ति सम्मान का पात्र होता है &quot; </li></ul><ul><li>वास्तविक , यथार्थ , सच्चा , सही , असली , वास्तव , अकाल्पनिक , अकल्पित , अकूट , प्रकृत - जो वास्तव में हो या हुआ हो या बिल्कुल ठीक (real, actual, exactly as it happened) &quot; मैंने अभी - अभी एक अविश्वसनीय पर वास्तविक घटना सुनी है &quot; </li></ul><ul><li>सच्चा , असली - जो झूठा या बनावटी न हो (genuine, not false or artificial) &quot; वह भारत माँ का सच्चा सपूत है &quot; </li></ul><ul><li>खरा , चोखा , सच्चा - जो ईमानदारी , निष्पक्षता , न्याय आदि के आधार पर हो (fair, grounded in honesty, impartiality, and justice) &quot; हमें खरा सौदा करना चाहिए &quot; </li></ul><ul><li>खरा , सच्चा , सीधा - बिना किसी बहाने या समझौता के यानि सीधा (straightforward, without excuse or compromise) &quot; वह इतना खरा नहीं है जितना दिखाता है &quot; </li></ul><ul><li>How do you know these are different senses? </li></ul><ul><ul><li>Hint: think translation </li></ul></ul>
    7. How many Senses of आदमी <ul><li>आदमी , पुरुष , मर्द , नर - नर जाति का मनुष्य (a male human being) &quot; आदमी और औरत की शारीरिक संरचनाएँ भिन्न होती हैं &quot; </li></ul><ul><li>मानव , आदमी , इंसान , इन्सान , इनसान , मनुष्य , मानुष , मानुस , मनुष , नर - वह द्विपद प्राणी जो अपने बुद्धिबल के कारण सब प्राणियों में श्रेष्ठ है और जिसके अंतर्गत हम , आप और सब लोग हैं (a human being, the two-legged creature whose intellect makes it supreme among all beings) &quot; आदमी अपनी बुद्धि के कारण सभी प्राणियों में श्रेष्ठ है &quot; </li></ul><ul><li>व्यक्ति , मानस , आदमी , शख़्स , शख्स , जन , बंदा , बन्दा - मनुष्य जाति या समूह में से कोई एक (a person, any one individual of the human race) &quot; इस कार में दो ही आदमी बैठ सकते हैं &quot; </li></ul><ul><li>नौकर , सेवक , दास , अनुचर , ख़ादिम , मुलाज़िम , मुलाजिम , आदमी , टहलुआ , पार्षद , लौंडा , अनुग , अनुचारक , अनुचारी , अनुयायी , पाबंद , पाबन्द , नफर , अभिचर , भृत्य , गण , अभिसर , अभिसारी - वह जो सेवा करता हो (a servant, one who serves) &quot; मेरा आदमी एक हफ्ते के लिए घर गया है &quot; </li></ul><ul><li>पति , मर्द , शौहर , घरवाला , मियाँ , आदमी , ख़सम , खसम , स्वामी , अधीश , नाथ , कांत , कंत , परिणेता , वारयिता , दयित - स्त्री की दृष्टि से उसका विवाहित पुरुष (a husband, a woman's married partner) &quot; शीला का आदमी किसानी करके परिवार का पालन - पोषण करता है &quot; </li></ul><ul><li>How do you know these are different senses? </li></ul><ul><ul><li>Hint: think translation </li></ul></ul>
    8. WSD: Problem Statement <ul><li>Given a string of words (sentence, phrase, set of key-words), and a set of senses for each word, decide the appropriate sense for each word. </li></ul><ul><li>Example: Translate ‘Where can I get spare parts for textile plant?’ to Hindi </li></ul><ul><li>Solution: ?? </li></ul>
    9. Solution Approaches <ul><li>The solution depends on what resources you have: </li></ul><ul><ul><li>Definition, Gloss </li></ul></ul><ul><ul><li>Topic/Category label for each sense definition </li></ul></ul><ul><ul><li>Selectional preference for each sense </li></ul></ul><ul><ul><li>Sense-Marked Corpora </li></ul></ul><ul><ul><li>Parallel Sense-Marked Corpora </li></ul></ul>
    10. Combinatorial Explosion Problem <ul><li>I saw a man who is 98 years old and can still walk and tell jokes </li></ul><ul><li>see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3) </li></ul><ul><li>43,929,600 sense combinations </li></ul><ul><li>Solution: Viterbi ?? </li></ul>
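The number on the slide can be sanity-checked by multiplying the per-word sense counts; a minimal sketch (the counts are the ones from the bullet above):

```python
from math import prod

# Per-word sense counts from the slide: see(26), man(11), year(4),
# old(8), can(5), still(4), walk(10), tell(8), joke(3).
sense_counts = [26, 11, 4, 8, 5, 4, 10, 8, 3]

# Every choice of one sense per word is a distinct candidate reading.
combinations = prod(sense_counts)
print(combinations)  # 43929600, written 4,39,29,600 in Indian digit grouping
```

This is why scoring every joint sense assignment is infeasible, and why a Viterbi-style dynamic program over local dependencies is attractive.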
    11. Dictionary-Based WSD
    12. Dictionary-Based WSD <ul><li>The bank did not give loan to him though he offered to mortgage his boat . </li></ul><ul><li>bank, sense 1. Gloss: a financial institution that accepts deposits and gives loans. Examples: &quot;he cashed a check at the bank&quot;, &quot;that bank holds the mortgage on my home&quot; </li></ul><ul><li>bank, sense 2. Gloss: the slope beside a body of water. Examples: &quot;they pulled the boat up on the bank&quot;, &quot;he watched the currents from the river bank&quot; </li></ul>
    13. How to improve LESK further <ul><li>Give an example where the algorithm fails, say for bank </li></ul><ul><ul><li>&quot;The bank did not give loan to him though he offered his boat as collateral.&quot; </li></ul></ul><ul><li>Problem: collateral is related to the bank, but the relation does not come out clearly </li></ul><ul><li>Solution: See if the definition of bank and the definition of collateral share a term: </li></ul><ul><ul><li>Collateral: security pledged for loan repayment </li></ul></ul><ul><li>Problem: Can you give an example where the new algorithm fails too? </li></ul>
    14. LESK Algorithm Function Lesk(word, sentence) returns best sense of word: context := set of words in sentence; for each sense in senses of word do sense.signature := GetSignature(sense); sense.relevance := ComputeRelevance(sense.signature, context); end; best-sense := MaxRelevantSense(); if (best-sense.relevance == 0) best-sense := GetDefaultSense(word); return best-sense. GetSignature(sense): all words in example and gloss of sense. ComputeRelevance(signature, context): number of common words.
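The pseudocode above can be sketched as runnable Python. The sense-inventory format (dicts with a gloss and examples) and the tiny stopword list are illustrative assumptions, not a real dictionary API:

```python
# A minimal simplified-Lesk sketch over the bank example on the slides.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "at", "he", "his"}

def words(text):
    """Lowercased content words of a string."""
    return {w.lower().strip('".,') for w in text.split()} - STOPWORDS

def get_signature(sense):
    """All words in the gloss and examples of a sense."""
    sig = words(sense["gloss"])
    for ex in sense["examples"]:
        sig |= words(ex)
    return sig

def compute_relevance(signature, context):
    """Number of common words."""
    return len(signature & context)

def lesk(word, sentence, inventory, default_sense=0):
    context = words(sentence)
    scored = [(compute_relevance(get_signature(s), context), i)
              for i, s in enumerate(inventory[word])]
    relevance, best = max(scored)
    return best if relevance > 0 else default_sense  # GetDefaultSense

inventory = {"bank": [
    {"gloss": "a financial institution that accepts deposits and gives loan",
     "examples": ['"he cashed a check at the bank"',
                  '"that bank holds the mortgage on my home"']},
    {"gloss": "the slope beside a body of water",
     "examples": ['"they pulled the boat up on the bank"',
                  '"he watched the currents from the river bank"']},
]}

sentence = "The bank did not give loan to him though he offered to mortgage his boat"
print(lesk("bank", sentence, inventory))  # 0: the financial-institution sense
```

Here "loan", "mortgage", and "bank" match sense 0's signature, beating the "boat"/"bank" overlap of sense 1.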
    15. GetSignature ( sense ) <ul><li>All words in example and gloss of sense </li></ul><ul><li>All words in gloss of sense </li></ul><ul><ul><li>All words in gloss of all words in the gloss of the given sense </li></ul></ul><ul><ul><ul><li>All words in gloss of all words in gloss of all words in gloss </li></ul></ul></ul><ul><ul><ul><ul><li>… </li></ul></ul></ul></ul><ul><li>Problem: </li></ul><ul><ul><li>Including the right sense of each word in the gloss needs WSD </li></ul></ul><ul><ul><li>Including all senses of all words in the gloss will lead to sense-drift </li></ul></ul><ul><li>Possible Solution: All context words in a sense-marked corpus </li></ul>
    16. Ideal Signature <ul><li>For each word, get a Vector of all the words in the language </li></ul><ul><li>Work with a |V|x|V| Matrix </li></ul><ul><li>Iterate over it till it converges </li></ul>
    17. ComputeRelevance( signature, context ) <ul><li>number of common words </li></ul><ul><ul><li>Favors longer definitions </li></ul></ul><ul><ul><li>| Set-Intersection | / | Set-Union | </li></ul></ul><ul><li>Define Relevance between two words </li></ul><ul><ul><li>Synonyms </li></ul></ul><ul><ul><li>Specialization, Generalization has to be accounted for – canoe and boat </li></ul></ul><ul><li>Sum of Relevance between all word pairs </li></ul><ul><li>Weigh different terms differently – maybe based on TF-IDF score </li></ul>
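The first two bullets can be contrasted directly: raw word overlap versus intersection-over-union (Jaccard), which stops long definitions from winning by sheer size. A small sketch with toy signatures:

```python
def overlap(signature, context):
    """Number of common words: favors senses with longer definitions."""
    return len(signature & context)

def jaccard(signature, context):
    """|intersection| / |union|: normalizes away signature length."""
    union = signature | context
    return len(signature & context) / len(union) if union else 0.0

# Toy signatures: same raw overlap, very different Jaccard scores.
sig_long = {"a", "b", "c", "d", "e", "f"}
sig_short = {"a", "b"}
ctx = {"a", "b", "x"}

print(overlap(sig_long, ctx), overlap(sig_short, ctx))   # 2 2: a tie
print(round(jaccard(sig_long, ctx), 2),
      round(jaccard(sig_short, ctx), 2))                 # 0.29 0.67
```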
    18. GetDefaultSense ( word ) <ul><li>The most frequent sense </li></ul><ul><li>The most frequent sense in a given domain </li></ul><ul><li>The most frequent sense as per the topic of the document </li></ul>
    19. Power of the LESK Schema <ul><li>Signature can even be a topic/domain code: finance, poetry, geo-physics </li></ul><ul><li>All variations of the ComputeRelevance function are still applicable </li></ul>
    20. Possible Improvements <ul><li>LESK gives equal weightage to all senses; the ‘right’ sense should be given more weight </li></ul><ul><ul><li>Iterative fashion – one at a time – most certain first </li></ul></ul><ul><ul><li>PageRank-like algorithm </li></ul></ul><ul><li>Give more weightage to Gloss than to Example </li></ul>
    21. Page-Rank-LESK
    22. Fundamental Limitation of Dictionary-Based Methods <ul><li>Depends too much on the exact word </li></ul><ul><ul><li>Another dictionary may use different gloss and example </li></ul></ul><ul><ul><li>Use the context words from a tagged corpus as signature </li></ul></ul>
    23. Supervised Learning <ul><li>Lesk-like methods depend too much on the exact word </li></ul><ul><ul><li>Another dictionary may use different gloss and example </li></ul></ul><ul><li>Use a sense-tagged corpus </li></ul><ul><li>Employ a machine learning algorithm </li></ul>
    24. Supervised Learning <ul><li>The machine can only learn what we ask it to </li></ul><ul><li>Collocation feature </li></ul><ul><ul><li>Relative position (2 words to the left) </li></ul></ul><ul><ul><li>Words and POS </li></ul></ul><ul><ul><li>“ An electric guitar and bass player stand off to one side, not really part of the scene, ...” </li></ul></ul><ul><ul><li>[w(i−2), POS(i−2), w(i−1), POS(i−1), w(i+1), POS(i+1), w(i+2), POS(i+2)] </li></ul></ul><ul><ul><li>[guitar, NN, and, CC, player, NN, stand, VB] </li></ul></ul><ul><li>Bag-of-words feature </li></ul><ul><ul><li>[fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band] </li></ul></ul><ul><ul><li>[0,0,0,1,0,0,0,0,0,0,1,0] </li></ul></ul>
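The two feature types can be sketched as follows; the tokenized sentence and hand-assigned POS tags reproduce the slide's example (in practice they come from a tagger):

```python
def collocation_features(tokens, pos, i, window=2):
    """Words and POS tags at relative positions around target index i:
    [w(i-2), POS(i-2), w(i-1), POS(i-1), w(i+1), POS(i+1), w(i+2), POS(i+2)]"""
    feats = []
    for off in list(range(-window, 0)) + list(range(1, window + 1)):
        j = i + off
        if 0 <= j < len(tokens):
            feats += [tokens[j], pos[j]]
        else:
            feats += ["<pad>", "<pad>"]  # sentence boundary
    return feats

def bag_of_words(tokens, vocab):
    """Binary presence vector over a fixed vocabulary."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocab]

tokens = "an electric guitar and bass player stand off to one side".split()
pos = ["DT", "JJ", "NN", "CC", "NN", "NN", "VB", "RP", "TO", "CD", "NN"]
i = tokens.index("bass")

print(collocation_features(tokens, pos, i))
# ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']

vocab = ["fishing", "big", "sound", "player", "fly", "rod",
         "pound", "double", "runs", "playing", "guitar", "band"]
print(bag_of_words(tokens, vocab))
# [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
```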
    25. Naïve Bayes Classifier Assumption: features are conditionally independent given the word sense. Still, the data sparsity problem remains: a simple binary bag-of-words vector defined over a vocabulary of 20 words would have --- possible feature vectors.
    26. Computing Naïve Bayes Probabilities If a collocational feature such as [w(i−2) = guitar] occurred 3 times for sense bass1, and sense bass1 itself occurred 60 times in training, the MLE estimate is P(fj | s) = 3/60 = 0.05. It’s hard for humans to examine Naïve Bayes’s workings and understand its decisions; hence use decision lists.
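A minimal sketch of the MLE estimate and the resulting classifier. The prior and conditional probability tables below are invented numbers for illustration:

```python
def mle(feature_and_sense_count, sense_count):
    """P(f | s) as a relative frequency, e.g. 3/60 = 0.05 as above."""
    return feature_and_sense_count / sense_count

def naive_bayes(priors, cond, features, floor=1e-6):
    """argmax over s of P(s) * prod_j P(f_j | s); unseen features get a
    tiny floor probability (the data-sparsity problem in miniature)."""
    best, best_score = None, 0.0
    for sense, prior in priors.items():
        score = prior
        for f in features:
            score *= cond.get((f, sense), floor)
        if score > best_score:
            best, best_score = sense, score
    return best

print(mle(3, 60))  # 0.05

priors = {"bass1": 0.6, "bass2": 0.4}    # fish sense vs music sense
cond = {("guitar", "bass1"): 0.001, ("guitar", "bass2"): 0.05,
        ("player", "bass1"): 0.002, ("player", "bass2"): 0.04}
print(naive_bayes(priors, cond, ["guitar", "player"]))  # bass2
```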
    27. Decision Lists <ul><li>Rule ⇒ Sense </li></ul><ul><li>fish within window ⇒ bass1 </li></ul><ul><li>striped bass ⇒ bass1 </li></ul><ul><li>guitar within window ⇒ bass2 </li></ul><ul><li>bass player ⇒ bass2 </li></ul><ul><li>piano within window ⇒ bass2 </li></ul><ul><li>tenor within window ⇒ bass2 </li></ul><ul><li>sea bass ⇒ bass1 </li></ul>
    28. How to Create Decision Lists <ul><li>Which feature has the most discrimination power? </li></ul><ul><li>Seems the same as max P(Sense | f) </li></ul><ul><li>Need Decision Trees </li></ul>
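One common way to build the list (Yarowsky-style decision lists, not necessarily what this slide intends) is to rank feature-to-sense rules by the magnitude of a smoothed log-likelihood ratio rather than raw P(Sense | f); a sketch for a two-sense word with invented counts:

```python
import math

def rank_rules(counts, alpha=0.1):
    """counts[(feature, sense)] -> frequency, senses named 's1'/'s2'.
    Returns (strength, feature, predicted_sense), strongest first."""
    rules = []
    for f in {feat for feat, _ in counts}:
        c1 = counts.get((f, "s1"), 0) + alpha   # additive smoothing
        c2 = counts.get((f, "s2"), 0) + alpha
        llr = math.log(c1 / c2)                 # log P(s1|f) / P(s2|f)
        rules.append((abs(llr), f, "s1" if llr > 0 else "s2"))
    return sorted(rules, reverse=True)

counts = {("fish", "s1"): 20, ("fish", "s2"): 1,
          ("guitar", "s2"): 15,
          ("player", "s1"): 3, ("player", "s2"): 6}
for strength, feature, sense in rank_rules(counts):
    print(f"{feature} -> {sense} ({strength:.2f})")
```

A feature seen only with one sense ("guitar") gets the strongest rule; a weakly discriminating feature ("player") drops to the bottom of the list.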
    29. Selectional Restrictions and Selectional Preferences <ul><li>“ In our house, everybody has a career and none of them includes washing dishes ,” he says. </li></ul><ul><li>In her kitchen, Ms. Chen works efficiently, cooking several simple dishes . </li></ul><ul><li>Wash[+WASHABLE], cook[+EDIBLE] </li></ul><ul><li>Used more often for elimination than selection </li></ul><ul><li>Problem: The gold standard fell apart in 1931, perhaps because people realized you can’t eat gold for lunch if you’re hungry. </li></ul><ul><li>Solution: Use these preferences as features/probabilities </li></ul>
    30. Selectional Preference Strength <ul><li>eat ⇒ edible </li></ul><ul><li>be ⇒ ?? </li></ul><ul><li>Strength: P(c) vs P(c | v) </li></ul><ul><li>Kullback-Leibler Divergence (Relative Entropy) </li></ul><ul><li>Selectional association: the contribution of that class to the general selectional preference of the verb </li></ul>
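The strength measure can be made concrete: KL divergence between the class prior P(c) and the verb-conditioned P(c | v), with selectional association as one class's share of that strength. All distributions below are invented for illustration:

```python
import math

def kl(p, q):
    """D(P || Q) = sum over c of P(c) * log2(P(c) / Q(c))."""
    return sum(pc * math.log2(pc / q[c]) for c, pc in p.items() if pc > 0)

prior = {"EDIBLE": 0.2, "PERSON": 0.4, "ARTIFACT": 0.4}    # P(c)
p_eat = {"EDIBLE": 0.9, "PERSON": 0.05, "ARTIFACT": 0.05}  # P(c | eat)
p_be = dict(prior)                                         # P(c | be)

strength_eat = kl(p_eat, prior)  # large: eat strongly prefers EDIBLE
strength_be = kl(p_be, prior)    # 0.0: be says nothing about its argument

def association(p_cv, prior, c):
    """Selectional association: class c's share of the verb's strength."""
    return p_cv[c] * math.log2(p_cv[c] / prior[c]) / kl(p_cv, prior)

print(round(strength_eat, 2), strength_be)
print(round(association(p_eat, prior, "EDIBLE"), 2))
```

Note that association scores of the dispreferred classes come out negative, matching the negative entries in the table on the next slide.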
    31. Selectional Association <ul><li>A probabilistic measure of the strength of association between a predicate and a class dominating the argument to the predicate </li></ul><ul><li>read: WRITING 6.80, ACTIVITY −0.20 </li></ul><ul><li>write: WRITING 7.26, COMMERCE 0 </li></ul><ul><li>see: ENTITY 5.79, METHOD −0.01 </li></ul>
    32. How do we use Selectional Association for WSD <ul><li>Use as a relatedness model </li></ul><ul><li>Select the sense with the highest selectional association between one of its ancestor hypernyms and the predicate. </li></ul>
    33. Minimally Supervised WSD <ul><li>Supervised: needs sense-tagged corpora </li></ul><ul><li>Dictionary-based: needs extensive examples and glosses </li></ul><ul><li>Supervised approaches do better but are much more expensive </li></ul><ul><li>Can we get the best of both worlds? </li></ul>
    34. Bootstrapping <ul><li>Seed-set L0 of labeled instances, a much larger unlabeled corpus V0 </li></ul><ul><li>Train a decision-list classifier on the seed-set L0 </li></ul><ul><li>Use this classifier to label the corpus V0 </li></ul><ul><li>Add to the training set the examples in V0 that it is confident about </li></ul><ul><li>Iterate { retrain the decision-list classifier } </li></ul>
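The loop above can be sketched generically. The decision-list learner is abstracted behind `train` and `classify` placeholders, and the toy one-dimensional classifier below is purely illustrative:

```python
def bootstrap(seed_labeled, unlabeled, train, classify,
              threshold=0.7, max_rounds=10):
    labeled = dict(seed_labeled)        # L0: seed set
    unlabeled = set(unlabeled)          # V0: unlabeled corpus
    for _ in range(max_rounds):
        model = train(labeled)          # train on the current labeled set
        confident = {x: lab for x in unlabeled
                     for lab, conf in [classify(model, x)]
                     if conf >= threshold}
        if not confident:               # nothing new we trust: stop
            break
        labeled.update(confident)       # grow the training set
        unlabeled -= set(confident)     # and shrink the unlabeled pool
    return labeled

# Toy instances on a line; label "A" left of a boundary, "B" right of it.
def train(labeled):
    a = [x for x, lab in labeled.items() if lab == "A"]
    b = [x for x, lab in labeled.items() if lab == "B"]
    return (sum(a) / len(a) + sum(b) / len(b)) / 2   # midpoint boundary

def classify(boundary, x):
    label = "A" if x < boundary else "B"
    return label, min(1.0, abs(x - boundary) / 4)    # distance = confidence

result = bootstrap({0: "A", 10: "B"}, {1, 2, 8, 9}, train, classify)
print(result)  # all six instances labeled; 1, 2 -> A and 8, 9 -> B
```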
    35. Bootstrapping Success Depends On <ul><li>Choosing the initial seed-set </li></ul><ul><ul><li>One sense per collocation </li></ul></ul><ul><ul><li>One sense per discourse </li></ul></ul><ul><li>Samples of bass sentences extracted from the WSJ using the simple correlates play and fish : </li></ul><ul><ul><li>We need more good teachers – right now, there are only half a dozen who can play the free bass with ease. </li></ul></ul><ul><ul><li>And it all started when fishermen decided the striped bass in Lake Mead were too skinny. </li></ul></ul><ul><li>Choosing the ‘confidence’ criterion </li></ul>
    36. WSD: Summary <ul><li>It is a hard problem </li></ul><ul><li>In part because it is not a well-defined problem </li></ul><ul><li>Or it cannot be well-defined </li></ul><ul><li>Because making sense of ‘Sense’ is hard </li></ul>
    37. Hindi Wordnet <ul><li>Wordnet: a lexical database </li></ul><ul><li>Inspired by the English WordNet </li></ul><ul><li>Built conceptually </li></ul><ul><li>Synset ( synonym set ) is the basic building block. </li></ul>
    38. Entry in Hindi Wordnet <ul><li>Synset </li></ul><ul><li>{ गाय , गऊ , गैया , धेनु } </li></ul><ul><li>{gaaya, gauu, gaiyaa, dhenu}, Cow </li></ul><ul><li>Gloss </li></ul><ul><ul><li>Text definition </li></ul></ul><ul><ul><ul><li>सींगवाला एक शाकाहारी मादा चौपाया </li></ul></ul></ul><ul><li>(siingwaalaa eka shaakaahaarii maadaa choupaayaa) </li></ul><ul><li>( a horned, herbivorous, four-legged female animal ) </li></ul><ul><ul><li>Example sentence </li></ul></ul><ul><li>हिन्दू लोग गाय को गो माता कहते हैं एवं उसकी पूजा करते हैं। </li></ul><ul><li>(hinduu loga gaaya ko go maataa kahate hain evam usakii puujaa karate hain) </li></ul><ul><li>(Hindus consider the cow a mother and worship it.) </li></ul>
    39. Subgraph for Noun [Figure: Hindi Wordnet subgraph centred on गाय , गऊ (gaaya, gauu, cow). Gloss: सींगवाला एक शाकाहारी मादा चौपाया (siingwaalaa eka shaakaahaarii maadaa choupaayaa, a horned, herbivorous, four-legged female animal). Hypernym: चौपाया , पशु (chaupaayaa, pashu, four-legged animal). Hyponyms: कामधेनु (kaamadhenu, a kind of cow), मैनी गाय (mainii gaaya, a kind of cow). Meronyms: थन (thana, udder), पूँछ (puunchh, tail). Antonym: बैल (baila, ox). Ability verb: पगुराना (paguraanaa, to ruminate). Attribute: शाकाहारी (shaakaahaarii, herbivorous).]
    40. Subgraph for Verb [Figure: Hindi Wordnet subgraph centred on रोना , रुदन करना (ronaa, rudan karanaa, to weep). Hypernym: भावाभिव्यक्ति करना (bhaavaabhivyakti karanaa, to express). Antonym: हँसना (hansanaa, to laugh). Troponym: सिसकना (sisakanaa, to sob). Entailment: आँसू बहाना (aansuu bahaanaa, to shed tears). Causative verb: रुलाना (rulaanaa, to make cry).]
    41. Marathi Wordnet (Noun) [Figure: subgraph centred on the synset झाड , वृक्ष , तरू (tree). Gloss: मुळे , खोड , फांद्या , पाने इत्यादींनी युक्त असा वनस्पतिविशेष (a plant having roots, trunk, branches, leaves, etc.): &quot; झाडे पर्यावरण शुद्ध करण्याचे काम करतात &quot; (trees purify the environment). Hypernymy: वनस्पती (plant). Hyponymy: आंबा (mango), लिंबू (lemon). Meronymy: मूळ (root), खोड (trunk). Holonymy: रान (forest), बाग (garden).]
    42. Word Similarity <ul><li>In Lesk and other algorithms, we need to measure how related two words are </li></ul><ul><li>Simplest measure: path length, the number of edges in the shortest path between sense nodes c1 and c2 </li></ul><ul><li>sim(c1, c2) = −log pathlen(c1, c2) </li></ul><ul><li>wordsim(w1, w2) = max over c1 ∈ senses(w1), c2 ∈ senses(w2) of sim(c1, c2) </li></ul>
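A runnable sketch of path-length similarity over a toy fragment of a hierarchy (the edge list is an assumption for illustration):

```python
import math
from collections import deque

# Toy fragment of a concept hierarchy, treated as an undirected graph.
edges = [("medium_of_exchange", "money"), ("medium_of_exchange", "standard"),
         ("money", "cash"), ("cash", "coin"),
         ("coin", "nickel"), ("coin", "dime")]

adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def pathlen(c1, c2):
    """Number of edges on the shortest path between sense nodes (BFS)."""
    seen, frontier = {c1}, deque([(c1, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == c2:
            return d
        for n in adj.get(node, ()):
            if n not in seen:
                seen.add(n)
                frontier.append((n, d + 1))
    return math.inf

def sim(c1, c2):
    return -math.log(pathlen(c1, c2))   # the slide's formula

print(pathlen("nickel", "dime"))                   # 2
print(pathlen("coin", "nickel"),
      pathlen("medium_of_exchange", "standard"))   # 1 1
```

Both coin/nickel and medium_of_exchange/standard are one edge apart, yet intuitively the first pair is far closer; that is exactly the "all edges are not equal" limitation raised on the next slide.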
    43. Path Length: Limitations <ul><li>All edges are not equal </li></ul><ul><li>Compare medium of exchange and standard with coin and nickel </li></ul><ul><li>Need a distance measure on edges </li></ul>
    44. Information Content Word Similarity LCS(c1, c2) = the lowest common subsumer, i.e., the lowest node in the hierarchy that subsumes (is a hypernym of) both c1 and c2. sim(c1, c2) = −log P(LCS(c1, c2))
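A sketch of LCS lookup and the Resnik-style score over a toy hierarchy; both the tree and the concept probabilities P(c) are made-up numbers:

```python
import math

# Toy is-a hierarchy with corpus-based concept probabilities P(c).
parent = {"nickel": "coin", "dime": "coin", "coin": "cash",
          "cash": "money", "money": "medium_of_exchange",
          "standard": "medium_of_exchange", "medium_of_exchange": None}
p = {"nickel": 0.01, "dime": 0.01, "coin": 0.05, "cash": 0.1,
     "money": 0.2, "standard": 0.02, "medium_of_exchange": 0.25}

def ancestors(c):
    """c itself plus all its hypernyms up to the root."""
    out = []
    while c is not None:
        out.append(c)
        c = parent[c]
    return out

def lcs(c1, c2):
    """Lowest common subsumer: first shared node walking up from c1."""
    up2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in up2:
            return a
    return None

def sim_ic(c1, c2):
    return -math.log(p[lcs(c1, c2)])   # sim(c1, c2) = -log P(LCS(c1, c2))

print(lcs("nickel", "dime"))           # coin
print(lcs("nickel", "standard"))       # medium_of_exchange
print(sim_ic("nickel", "dime") > sim_ic("nickel", "standard"))  # True
```

A low LCS (rare, specific concept) means high similarity; an LCS near the root means the pair shares almost no information.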
    45. IC Similarity: Limitations <ul><li>A concept is not similar to itself using the previous definition </li></ul><ul><li>Word similarity is not about information content alone; it is about commonality vs. differences, e.g. Lin’s normalized measure sim(c1, c2) = 2 log P(LCS(c1, c2)) / (log P(c1) + log P(c2)) </li></ul>
    46. Overlap-based Similarity <ul><li>Previous methods may not work for words belonging to different classes: car and petrol </li></ul><ul><li>similarity(A,B) = overlap(gloss(A), gloss(B)) + overlap(gloss(hypo(A)), gloss(hypo(B))) + overlap(gloss(A), gloss(hypo(B))) + overlap(gloss(hypo(A)), gloss(B)) </li></ul>
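The formula can be sketched directly; the glosses and the single hyponym per word are toy stand-ins for WordNet data:

```python
def overlap(gloss_a, gloss_b):
    """Number of words shared by two glosses."""
    return len(set(gloss_a.split()) & set(gloss_b.split()))

def similarity(a, b, gloss, hypo):
    """The four-term overlap sum from the slide."""
    return (overlap(gloss[a], gloss[b])
            + overlap(gloss[hypo[a]], gloss[hypo[b]])
            + overlap(gloss[a], gloss[hypo[b]])
            + overlap(gloss[hypo[a]], gloss[b]))

gloss = {
    "car": "a motor vehicle with an engine",
    "petrol": "a fuel burned in an engine",
    "sportscar": "a fast car with a powerful engine",
    "diesel": "a heavy fuel for engines",
}
hypo = {"car": "sportscar", "petrol": "diesel"}

print(similarity("car", "petrol", gloss, hypo))  # 3 + 1 + 1 + 2 = 7
```

Even though car and petrol share no hypernym, their glosses (and their hyponyms' glosses) overlap, which is exactly what this measure exploits.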
    47. Word Similarity: Distributional Methods (pointwise mutual information)
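Pointwise mutual information compares how often a word and a context feature co-occur with how often they would co-occur by chance: PMI(w, f) = log2(P(w, f) / (P(w) P(f))). A sketch from raw counts (the counts are hypothetical):

```python
import math

def pmi(count_wf, count_w, count_f, total):
    """PMI(w, f) = log2( P(w, f) / (P(w) * P(f)) ) from raw counts."""
    p_wf = count_wf / total
    p_w, p_f = count_w / total, count_f / total
    return math.log2(p_wf / (p_w * p_f))

# Hypothetical counts over a 10,000-token corpus: the pair co-occurs
# 15x more often than chance, so PMI is strongly positive.
print(round(pmi(30, 100, 200, 10000), 2))  # 3.91
```

Words can then be compared by the similarity of their PMI-weighted context vectors, which is where the next two slides pick up.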
    48. Similarity using Feature Vectors
    49. Cosine Distance The raw dot product favors long vectors, so it is normalized by the vector lengths.
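The point can be seen directly: scaling a vector changes the dot product but not the cosine, because cosine divides by both vector lengths. A sketch:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Dot product normalized by both vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

u = [1, 2, 0]
long_u = [10, 20, 0]   # same direction, ten times longer
v = [2, 1, 1]

print(dot(u, v), dot(long_u, v))  # 4 40: the dot product grows with length
print(round(cosine(u, v), 3), round(cosine(long_u, v), 3))  # 0.73 0.73
```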
    50. Conclusion <ul><li>A lot of care is needed in defining similarity measures </li></ul><ul><li>Impressive results can be obtained once similarity is carefully defined </li></ul>
    51. Backup
    52. OrgLESK: Taking Signature of Context Words into Account for Relatedness <ul><li>Disambiguating “pine cone” </li></ul><ul><ul><li>Neither ‘pine’ nor ‘cone’ appears in the other’s definition </li></ul></ul><ul><li>pine </li></ul><ul><ul><li>1 an evergreen tree with needle-shaped leaves and solid wood </li></ul></ul><ul><ul><li>2 waste away through sorrow or illness </li></ul></ul><ul><li>cone </li></ul><ul><ul><li>1 solid body which narrows to a point </li></ul></ul><ul><ul><li>2 something of this shape whether solid or hollow </li></ul></ul><ul><ul><li>3 fruit of certain evergreen trees </li></ul></ul>
    53. Does the Improvement Really Work <ul><li>Problem: Collateral has not one but many senses: </li></ul><ul><ul><li>Noun: collateral (a security pledged for the repayment of a loan) </li></ul></ul><ul><ul><li>Adjective </li></ul></ul><ul><ul><li>S: (adj) collateral, indirect (descended from a common ancestor but through different lines) &quot;cousins are collateral relatives&quot;; &quot;an indirect descendant of the Stuarts&quot; </li></ul></ul><ul><ul><li>S: (adj) collateral, confirmative, confirming, confirmatory, corroborative, corroboratory, substantiating, substantiative, validating, validatory, verificatory, verifying (serving to support or corroborate) &quot;collateral evidence&quot; </li></ul></ul><ul><ul><li>S: (adj) collateral (accompanying, concomitant) &quot;collateral target damage from a bombing run&quot; </li></ul></ul><ul><ul><li>S: (adj) collateral (situated or running side by side) &quot;collateral ridges of mountains&quot; </li></ul></ul><ul><li>Solution ?? </li></ul>