[Quilt diagram: Ontologies (WordNet-Similarity), Co-Occurrences (Ngram Statistics Package), Kernels (WSD-Shell), Contexts (SenseClusters, SenseRelate)]
SenseClusters Input … matrices
   - Word by Word co-occurrences to create second order representation (Native) …
Identifying Collocations
   - … could benefit from word clusters found in SenseClusters
   - … could benefit …
Conclusion
   - Time to integrate what we have at the word and term level
     - look for ways to stitch …
Many Thanks…
   - SenseClusters
     - Amruta Purandare (MS '04)
     - Anagha Kulk…
URLs
   - Ngram Statistics Package
     - http://ngram.sourceforge.net
   - SenseClus…
The Semantic Quilt
A talk on the Semantic Quilt, which combines various methods of "doing semantics" into a more unified framework.

Published in: Education, Technology
Transcript of "The Semantic Quilt"

1. The Semantic Quilt: Contexts, Co-occurrences, Kernels, and Ontologies
   Ted Pedersen, University of Minnesota, Duluth
   http://www.d.umn.edu/~tpederse
2. Create by stitching together
3. Sew together different materials
4. [Quilt diagram: Ontologies, Co-Occurrences, Kernels, Contexts]
5. Semantics in NLP
   - Potentially useful for many applications
     - Machine Translation
     - Document or Story Understanding
     - Text Generation
     - Web Search
     - …
   - Can come from many sources
   - Not well integrated
   - Not well defined?
6. What do we mean by semantics? … it depends on our resources…
   - Ontologies – relationships among concepts
     - similar / related concepts are connected
   - Dictionary – definitions of senses / concepts
     - similar / related senses have similar / related definitions
   - Contexts – short passages of words
     - similar / related words occur in similar / related contexts
   - Co-occurrences –
     - a word is defined by the company it keeps
     - words that occur with the same kinds of words are similar / related
7. What level of granularity?
   - words
   - terms / collocations
   - phrases
   - sentences
   - paragraphs
   - documents
   - books
8. The Terrible Tension: Ambiguity versus Granularity
   - Words are potentially very ambiguous
     - But we can list them (sort of)
     - … we can define their meanings (sort of)
     - … not ambiguous to a human reader, but hard for a computer to know which meaning is intended
   - Terms / collocations are less ambiguous
     - Difficult to enumerate because there are so many, but it can be done for a domain (e.g., medicine)
   - Phrases (short contexts) can still be ambiguous, but not to the same degree as words or terms / collocations
9. The Current State of Affairs
   - Most resources and methods focus on word or term semantics
     - makes it possible to build resources (manually or automatically) with reasonable coverage, but …
     - … techniques become very resource dependent
     - … resources become language dependent
     - … introduces a lot of ambiguity
     - … not clear how to bring together resources
   - Similarity is a useful organizing principle, but …
     - … there are lots of ways to be similar
10. Similarity as Organizing Principle
   - Measure word association using knowledge-lean methods based on co-occurrence information from large corpora
   - Measure contextual similarity using knowledge-lean methods based on co-occurrence information from large corpora
   - Measure conceptual similarity / relatedness using a structured repository of knowledge
     - the lexical database WordNet
     - the Unified Medical Language System (UMLS)
11. Things we can do now…
   - Identify associated words
     - fine wine
     - baseball bat
   - Identify similar contexts
     - I bought some food at the store
     - I purchased something to eat at the market
   - Assign meanings to words
     - I went to the bank/[financial-inst.] to deposit my check
   - Identify similar (or related) concepts
     - frog : amphibian
     - Duluth : snow
12. Things we want to do…
   - Integrate different resources and methods
   - Solve bigger problems
     - some of what we do now is a means to an unclear end
   - Be language independent
   - Offer broad coverage
   - Reduce dependence on manually built resources
     - ontologies, dictionaries, labeled training data…
13. Semantic Patches to Sew Together
   - Contexts
     - SenseClusters: measures similarity between written texts (i.e., contexts)
   - Co-Occurrences
     - Ngram Statistics Package: measures association between words; identifies collocations or terms
   - Kernels
     - WSD-Shell: supervised learning for word sense disambiguation, in the process of including SVMs with user-defined kernels
   - "Ontologies"
     - WordNet-Similarity: measures similarity between concepts found in WordNet
     - UMLS-Similarity
   - All of these are projects at the University of Minnesota, Duluth
14. [Quilt diagram: Ontologies, Co-Occurrences, Kernels, Contexts]
15. Ngram Statistics Package – http://ngram.sourceforge.net (Co-Occurrences)
16. Things we can do now… (slide 11 repeated)
17. Co-occurrences and semantics?
   - Individual words (especially common ones) are very ambiguous
     - bat
     - line
   - Pairs of words disambiguate each other
     - baseball bat
     - vampire … Transylvania
     - product line
     - speech … line
18. Why pairs of words?
   - Zipf's Law
     - most words are rare; most bigrams are rarer still, and most ngrams rarer yet
     - the more common a word, the more senses it will have
   - "Co-occurrences" are less frequent than individual words, and tend to be less ambiguous as a result
     - mutually disambiguating
19. Bigrams
   - Window size of 2
     - baseball bat, fine wine, apple orchard, bill clinton
   - Window size of 3
     - house of representatives, bottle of wine
   - Window size of 4
     - president of the republic, whispering in the wind
   - Selected using a small window size (2-4 words)
   - The objective is to capture a regular or localized pattern between two words (a collocation?)
   - If order doesn't matter, then these are co-occurrences…
20. "Occur together more often than expected by chance…"
   - Observed frequencies for two words occurring together and alone are stored in a 2x2 matrix
   - Expected values are calculated from the model of independence and the observed values
     - How often would you expect these words to occur together if they only occurred together by chance?
     - If two words occur "significantly" more often than the expected value, then the words do not occur together by chance.
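The arithmetic behind this slide can be sketched as follows. The counts are invented for illustration, and the score computed is the standard G² log-likelihood statistic over the 2x2 table (one of the measures NSP offers):

```python
from math import log

# Hypothetical 2x2 contingency table for a bigram such as "baseball bat"
# in a corpus of one million bigrams (all counts are invented).
n11 = 60        # word1 followed by word2
n12 = 940       # word1 followed by some other word
n21 = 340       # some other word followed by word2
n22 = 998660    # neither word in either position
n = n11 + n12 + n21 + n22

def expected(row_total, col_total, n):
    """Expected cell count under the model of independence."""
    return row_total * col_total / n

def log_likelihood(n11, n12, n21, n22):
    """G^2 = 2 * sum over cells of observed * ln(observed / expected)."""
    n = n11 + n12 + n21 + n22
    r1, r2 = n11 + n12, n21 + n22          # row totals
    c1, c2 = n11 + n21, n12 + n22          # column totals
    cells = [(n11, r1, c1), (n12, r1, c2), (n21, r2, c1), (n22, r2, c2)]
    return 2 * sum(o * log(o / expected(r, c, n)) for o, r, c in cells if o > 0)

# The pair occurs 60 times but would be expected only ~0.4 times by chance,
# so the association score comes out very large.
m11 = expected(n11 + n12, n11 + n21, n)
g2 = log_likelihood(n11, n12, n21, n22)
```

A large G² relative to the chi-squared critical value is what "significantly more often than expected" means on the slide.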
21. Measures and Tests of Association (http://ngram.sourceforge.net)
   - Log-likelihood ratio
   - Mutual information
   - Pointwise mutual information
   - Pearson's chi-squared test
   - Phi coefficient
   - Fisher's exact test
   - T-test
   - Dice coefficient
   - Odds ratio
22. What do we get at the end?
   - A list of bigrams or co-occurrences that are significant or interesting (meaningful?)
     - automatic
     - language independent
   - These can be used as building blocks for systems that do semantic processing
     - relatively unambiguous
     - often very informative about topic or domain
     - can serve as a fingerprint for a document or book
23. [Quilt diagram: Ontologies, Co-Occurrences, Kernels, Contexts]
24. SenseClusters – http://senseclusters.sourceforge.net (Contexts)
25. Things we can do now… (slide 11 repeated)
26. Identify Similar Contexts
   - Find phrases that say the same thing using different words
     - I went to the store
     - Ted drove to Wal-Mart
   - Find words that have the same meaning in different contexts
     - The line is moving pretty fast
     - I stood in line for 12 hours
   - Find different words that have the same meaning in different contexts
     - The line is moving pretty fast
     - I stood in the queue for 12 hours
27. SenseClusters Methodology
   - Represent contexts using first- or second-order co-occurrences
   - Reduce the dimensionality of the vectors
     - singular value decomposition
   - Cluster the context vectors
     - find the number of clusters
     - label the clusters
   - Evaluate and/or use the contexts!
28. Second Order Features
   - Second order features encode something 'extra' about a feature that occurs in a context, something not available in the context itself
     - Native SenseClusters: each feature is represented by a vector of the words with which it occurs
     - Latent Semantic Analysis: each feature is represented by a vector of the contexts in which it occurs
29. Similar contexts may have the same meaning…
   - Context 1: He drives his car fast
   - Context 2: Jim speeds in his auto
   - car -> motor, garage, gasoline, insurance
   - auto -> motor, insurance, gasoline, accident
   - car and auto share many co-occurrences…
30. Second Order Context Representation
   - Bigrams are used to create a word matrix
     - cell values = log-likelihood score of the word pair
   - Each row is a first-order co-occurrence vector for a word
   - Represent a context by averaging the vectors of the words in that context
     - the context includes the Cxt positions around the target, where Cxt is typically 5 or 20
31. 2nd Order Context Vectors
   - He won an Oscar, but Tom Hanks is still a nice guy.

                  baseball  football   actor     movie      war     family   needle
     won           18.5533  3324.98    30.520    51.7812   8.7399      0.00   0
     Oscar          0.0        0.0     29.576   136.0441   0.0         0.00   0
     guy          134.5102   205.5469   0.0       0.0      0.0     18818.55   0
     O2 context    51.021   1176.84    20.032    62.6084   2.9133   6272.85   0
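A minimal sketch of the averaging step, using rounded values from this example (and only four of the columns, for brevity):

```python
# Rows of a word-by-word matrix: first-order co-occurrence vectors
# (log-likelihood scores, rounded from the slide; four columns only).
cols = ["baseball", "football", "actor", "movie"]
word_vectors = {
    "won":   [18.6,  3325.0, 30.5,  51.8],
    "oscar": [0.0,      0.0, 29.6, 136.0],
    "guy":   [134.5,  205.5,  0.0,   0.0],
}

def context_vector(context_words, word_vectors, dims):
    """Second-order representation: average the first-order vectors of
    the context words that appear in the matrix."""
    rows = [word_vectors[w] for w in context_words if w in word_vectors]
    if not rows:
        return [0.0] * dims
    return [sum(r[i] for r in rows) / len(rows) for i in range(dims)]

# "He won an Oscar, but Tom Hanks is still a nice guy."
ctx = context_vector(["he", "won", "oscar", "guy"], word_vectors, len(cols))
# The "movie" component averages 51.8, 136.0, and 0.0 to roughly 62.6,
# matching the O2 context row of the table.
```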
32. After context representation…
   - The second-order vector is an average of the word vectors that make up the context; it captures indirect relationships
     - reduced by SVD to its principal components
   - Now, cluster the vectors!
     - many methods; we often use k-means or repeated bisections
     - CLUTO
33. What do we get at the end?
   - contexts organized into some number of clusters based on the similarity of their co-occurrences
   - contexts which share words that tend to co-occur with the same other words are clustered together
     - 2nd order co-occurrences
34. [Quilt diagram: Ontologies (WordNet-Similarity), Co-Occurrences (Ngram Statistics Package), Kernels (WSD-Shell), Contexts (SenseClusters)]
35. Oh… we also get plenty of these…
   - Similarity matrices…
     - Word by Word
     - Ngram by Ngram
     - Word by Context
     - Ngram by Context
     - Context by Word
     - Context by Ngram
     - Context by Context
36. The WSD-Shell – http://www.d.umn.edu/~tpederse/supervised.html (Kernels)
37. Things we can do now… (slide 11 repeated)
38. Machine Learning Approach
   - Annotate text with sense tags
     - must select a sense inventory
   - Find interesting features
     - bigrams and co-occurrences are quite effective
   - Learn a model
   - Apply the model to untagged data
   - Works very well… given sufficient quantities of training data and sufficient coverage of your sense inventory
39. Kernel Methods
   - The challenge for any learning algorithm is to separate the training data into groups by finding a boundary (hyperplane)
   - Sometimes this boundary is hard to find in the original space
   - Transform the data via a kernel function into a different, higher dimensional representation, where boundaries are easier to spot
40. Kernels are similarity matrices
   - NSP produces word-by-word similarity matrices, for use by SenseClusters
   - SenseClusters produces various sorts of similarity matrices based on co-occurrences
   - … which can be used as kernels
     - Latent Semantic kernel
     - Bigram Association kernel
     - Co-occurrence Association kernel
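One way to see "similarity matrix as kernel" concretely: build the Gram matrix of pairwise dot products over context vectors. A symmetric matrix like this is what an SVM with a precomputed kernel consumes. The vectors below are invented toy values, not output of any of the tools named above:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_matrix(vectors):
    """Pairwise similarity (here: dot product) of context vectors.
    The result is symmetric and can serve as a precomputed kernel."""
    return [[dot(u, v) for v in vectors] for u in vectors]

# Toy second-order context vectors (invented values).
contexts = [[1.0, 0.0, 2.0],
            [0.5, 1.0, 0.0],
            [1.0, 1.0, 1.0]]
K = gram_matrix(contexts)
```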
41. What do we get at the end?
   - More accurate supervised classifiers that potentially require less training data
   - The kernel improves the ability to find boundaries between training examples by transforming the feature space into a higher dimensional, "cleaner" space…
42. [Quilt diagram: Ontologies (WordNet-Similarity), Co-Occurrences (Ngram Statistics Package), Kernels (WSD-Shell), Contexts (SenseClusters)]
43. WordNet-Similarity – http://wn-similarity.sourceforge.net (Ontologies)
44. Things we can do now… (slide 11 repeated)
45. Similarity and Relatedness
   - Two concepts are similar if they are connected by is-a relationships.
     - A frog is-a-kind-of amphibian
     - An illness is-a health_condition
   - Two concepts can be related in many ways…
     - A human has-a-part liver
     - Duluth receives-a-lot-of snow
   - … similarity is one way to be related
46. WordNet-Similarity (http://wn-similarity.sourceforge.net)
   - Path based measures
     - Shortest path (path)
     - Wu & Palmer (wup)
     - Leacock & Chodorow (lch)
     - Hirst & St-Onge (hso)
   - Information content measures
     - Resnik (res)
     - Jiang & Conrath (jcn)
     - Lin (lin)
   - Gloss based measures
     - Banerjee and Pedersen (lesk)
     - Patwardhan and Pedersen (vector, vector_pairs)
47. Path Finding
   - Find the shortest is-a path between two concepts?
     - Rada et al. (1989)
   - Scaled by the depth of the hierarchy
     - Leacock & Chodorow (1998)
   - Depth of the subsuming concept scaled by the sum of the depths of the individual concepts
     - Wu and Palmer (1994)
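These path-based ideas can be sketched on a toy is-a hierarchy. The edges below are invented (loosely following the figure on the next slide); path similarity here is the inverse of the node count on the shortest is-a path, and Wu & Palmer scale the depth of the least common subsumer by the depths of the two concepts:

```python
# Toy is-a hierarchy (invented edges), child -> parent.
parent = {
    "car": "motor-vehicle", "motor-vehicle": "vehicle",
    "boat": "vehicle", "vehicle": "conveyance",
    "conveyance": "artifact", "artifact": "object",
}

def path_to_root(c):
    """The chain of concepts from c up to the root."""
    path = [c]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def least_common_subsumer(c1, c2):
    """Return (lcs, number of edges on the shortest is-a path c1..c2)."""
    p1, p2 = path_to_root(c1), path_to_root(c2)
    for i, node in enumerate(p1):
        if node in p2:
            return node, i + p2.index(node)
    raise ValueError("no common subsumer")

def path_similarity(c1, c2):
    """Rada et al.-style: inverse of path length, counted in nodes."""
    _, edges = least_common_subsumer(c1, c2)
    return 1.0 / (edges + 1)

def wup_similarity(c1, c2):
    """Wu & Palmer: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    lcs, _ = least_common_subsumer(c1, c2)
    depth = lambda c: len(path_to_root(c))  # nodes from concept up to root
    return 2.0 * depth(lcs) / (depth(c1) + depth(c2))
```

For example, car and boat meet at vehicle, four nodes apart, so their path similarity is 0.25.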
48. [Figure: fragment of an is-a hierarchy, from Jiang and Conrath (1997): object at the root, with one branch through artifact, instrumentality, conveyance, and vehicle down to motor-vehicle, car, watercraft, boat, and ark, and another through article, ware, table-ware, and cutlery down to fork]
49. Information Content
   - A measure of specificity in an is-a hierarchy (Resnik, 1995)
     - -log (probability of concept)
     - high information content values mean very specific concepts (like pitch-fork and basketball shoe)
   - Count how often a concept occurs in a corpus
     - increment the count associated with that concept, and propagate the count up!
     - if counting is based on word forms, increment all concepts associated with that form
50. Observed "car": the counts along the path car (73+1), motor vehicle (327+1), *root* (32783+1) are each incremented; minicab (6), cab (23), bus (17), and stock car (12) are unchanged.
51. Observed "stock car": the counts along the path stock car (12+1), car (74+1), motor vehicle (328+1), *root* (32784+1) are each incremented; minicab (6), cab (23), and bus (17) are unchanged.
52. After counting concepts: *root* (32785); motor vehicle (329), IC = 1.9; car (75); cab (23); minicab (6); bus (17), IC = 3.5; stock car (13), IC = 3.1.
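The counting scheme above translates directly into code. The hierarchy matches the example; information content is computed from relative frequency, so the slide's exact IC values would only emerge with its real corpus counts:

```python
from math import log

# The is-a hierarchy from the example, child -> parent.
parent = {
    "stock car": "car", "cab": "car", "minicab": "cab",
    "car": "motor vehicle", "bus": "motor vehicle",
    "motor vehicle": "*root*",
}
counts = {c: 0 for c in set(parent) | set(parent.values())}

def observe(concept):
    """Increment the concept's count and propagate it up to the root."""
    while True:
        counts[concept] += 1
        if concept not in parent:
            return
        concept = parent[concept]

def information_content(concept):
    """Resnik: -log(relative frequency); more specific => higher IC."""
    return -log(counts[concept] / counts["*root*"])

observe("car")        # increments car, motor vehicle, *root*
observe("stock car")  # increments stock car, car, motor vehicle, *root*
```

After these two observations the root's IC is 0 (it subsumes everything seen), while stock car, being more specific, scores higher than car.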
53. Similarity and Information Content
   - Resnik (1995) uses the information content of the least common subsumer to express the similarity between two concepts
   - Lin (1998) scales the information content of the least common subsumer by the sum of the information content of the two concepts
   - Jiang & Conrath (1997) take the distance between two concepts to be the sum of their information content minus twice that of their least common subsumer
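Written out, with IC(·) for information content and lcs for the least common subsumer, the three measures are a few lines each. The sample values are the IC scores from the counting example (motor vehicle = 1.9 as the lcs of bus and stock car, with IC 3.5 and 3.1 respectively):

```python
def resnik(ic_lcs):
    """Similarity = IC(lcs)."""
    return ic_lcs

def lin(ic_lcs, ic1, ic2):
    """Similarity = 2 * IC(lcs) / (IC(c1) + IC(c2)); ranges over [0, 1]."""
    return 2.0 * ic_lcs / (ic1 + ic2)

def jcn_distance(ic_lcs, ic1, ic2):
    """Distance = IC(c1) + IC(c2) - 2 * IC(lcs); smaller = more similar."""
    return ic1 + ic2 - 2.0 * ic_lcs

# bus vs. stock car, with motor vehicle as their least common subsumer.
sim_res = resnik(1.9)
sim_lin = lin(1.9, 3.5, 3.1)
dist_jcn = jcn_distance(1.9, 3.5, 3.1)
```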
54. What do we get at the end?
   - Similarity (or relatedness) scores between pairs of words / concepts that are based on path lengths, but augmented with distributional information from corpora
   - Can create a similarity matrix between concepts based on these scores
55. [Quilt diagram: Ontologies (WordNet-Similarity), Co-Occurrences (Ngram Statistics Package), Kernels (WSD-Shell), Contexts (SenseClusters)]
56. Wait a minute…
   - Concepts must be organized in a hierarchy, and connected in that hierarchy
     - limited to comparing nouns with nouns, or maybe verbs with verbs
     - limited to similarity measures (is-a)
   - What about mixed parts of speech?
     - murder (noun) and horrible (adjective)
     - tobacco (noun) and drinking (verb)
57. Using Dictionary Glosses to Measure Relatedness
   - Lesk (1986) algorithm – measure the relatedness of two concepts by counting the number of words shared by their definitions
     - Cold – a mild viral infection involving the nose and respiratory passages (but not the lungs)
     - Flu – an acute febrile highly contagious viral disease
   - Adapted Lesk (Banerjee & Pedersen, 2003) – expand the glosses to include those of directly related concepts
     - Cold – a common cold affecting the nasal passages and resulting in congestion and sneezing and headache; mild viral infection involving the nose and respiratory passages (but not the lungs); a disease affecting the respiratory system
     - Flu – an acute and highly contagious respiratory disease of swine caused by the orthomyxovirus thought to be the same virus that caused the 1918 influenza pandemic; an acute febrile highly contagious viral disease; a disease that can be communicated from one person to another
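A minimal sketch of the original overlap count, using the two short glosses above. The stopword list is an assumption (Lesk-style systems typically ignore function words one way or another):

```python
STOPWORDS = {"a", "an", "the", "and", "but", "not", "of", "in", "to"}

def gloss_words(gloss):
    """Lower-cased content words of a gloss, punctuation stripped."""
    return {w.strip("().,;").lower() for w in gloss.split()} - STOPWORDS

def lesk_overlap(gloss1, gloss2):
    """Relatedness = number of words the two definitions share."""
    return len(gloss_words(gloss1) & gloss_words(gloss2))

cold = ("a mild viral infection involving the nose and "
        "respiratory passages (but not the lungs)")
flu = "an acute febrile highly contagious viral disease"
score = lesk_overlap(cold, flu)  # only "viral" is shared
```

The single shared word ("viral") illustrates the weakness the next slide addresses: short glosses rarely overlap exactly even when the concepts are closely related.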
58. Gloss Vectors
   - Leskian approaches require exact matches between glosses
     - glosses are short, and use related but not identical words
   - Solution? Expand glosses by replacing each content word with a co-occurrence vector derived from corpora
     - rows are the words in the glosses, columns are the co-occurring words in a corpus, and cell values are their log-likelihood ratios
   - Average the word vectors to create a single vector that represents the gloss/sense (Patwardhan & Pedersen, 2003)
     - 2nd order co-occurrences
   - Measure relatedness using the cosine rather than exact matching!
   - The methodology is the same as that used in SenseClusters
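The averaging-plus-cosine step can be sketched as follows. The word vectors are invented, standing in for rows of a real log-likelihood co-occurrence matrix:

```python
from math import sqrt

# Invented first-order co-occurrence vectors for gloss words
# (the columns might be, e.g., "patient", "fever", "nose").
word_vectors = {
    "viral":      [2.0, 5.0, 1.0],
    "infection":  [4.0, 6.0, 2.0],
    "contagious": [3.0, 5.0, 0.5],
    "disease":    [5.0, 4.0, 1.0],
}

def gloss_vector(words):
    """Average the co-occurrence vectors of the gloss's content words."""
    rows = [word_vectors[w] for w in words if w in word_vectors]
    return [sum(r[i] for r in rows) / len(rows) for i in range(len(rows[0]))]

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

cold_vec = gloss_vector(["viral", "infection"])
flu_vec = gloss_vector(["contagious", "viral", "disease"])
relatedness = cosine(cold_vec, flu_vec)  # graded score, not an exact match
```

Unlike the raw overlap count, this yields a nonzero graded score even when the glosses share few literal words.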
  59. 59. What do we get at the end? <ul><li>Relatedness scores between pairs of words / concepts that are based on content of WordNet (viewing it more like MRD than ontology) </li></ul><ul><li>Can create a “relatedness” matrix between concepts based on these scores </li></ul>
  60. 60. Why measure conceptual similarity? <ul><li>A word will take the sense that is most related to the surrounding context </li></ul><ul><ul><li>I love Java , especially the beaches and the weather. </li></ul></ul><ul><ul><li>I love Java , especially the support for concurrent programming. </li></ul></ul><ul><ul><li>I love java , especially first thing in the morning with a bagel. </li></ul></ul>
61. 61. Word Sense Disambiguation <ul><li>… can be performed by finding the sense of a word most related to its neighbors </li></ul><ul><li>Here, we define similarity and relatedness with respect to WordNet-Similarity </li></ul><ul><li>WordNet-SenseRelate </li></ul><ul><ul><li>AllWords – assign a sense to every content word </li></ul></ul><ul><ul><li>TargetWord – assign a sense to a given word </li></ul></ul><ul><ul><ul><li>http://senserelate.sourceforge.net </li></ul></ul></ul>
62. 62. WordNet-SenseRelate http://senserelate.sourceforge.net SenseRelate
  63. 63. SenseRelate Algorithm <ul><li>For each sense of a target word in context </li></ul><ul><ul><li>For each content word in the context </li></ul></ul><ul><ul><ul><li>For each sense of that content word </li></ul></ul></ul><ul><ul><ul><ul><li>Measure similarity/relatedness between sense of target word and sense of content word with WordNet::Similarity </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Keep running sum for score of each sense of target </li></ul></ul></ul></ul><ul><li>Pick sense of target word with highest score with words in context </li></ul><ul><li>Go to the next word, repeat </li></ul>
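The nested loops above can be sketched directly in Python. The sense labels and relatedness scores here are hypothetical stand-ins; in the real system the scores come from WordNet::Similarity:

```python
# Hypothetical relatedness scores between sense pairs (stand-in for
# WordNet::Similarity); unlisted pairs default to 0.
RELATEDNESS = {
    ("java#coffee",   "bagel#food"):   0.8,
    ("java#coffee",   "morning#time"): 0.5,
    ("java#language", "bagel#food"):   0.1,
    ("java#language", "morning#time"): 0.1,
}

def relatedness(s1, s2):
    return RELATEDNESS.get((s1, s2), 0.0)

def senserelate(target_senses, context):
    """Pick the sense of the target word whose summed relatedness to
    every sense of every content word in the context is highest."""
    best_sense, best_score = None, float("-inf")
    for t_sense in target_senses:        # for each sense of the target word
        score = 0.0
        for word_senses in context:      # for each content word in the context
            for c_sense in word_senses:  # for each sense of that content word
                score += relatedness(t_sense, c_sense)  # keep a running sum
        if score > best_score:
            best_sense, best_score = t_sense, score
    return best_sense

# "I love java, first thing in the morning with a bagel."
context = [["bagel#food"], ["morning#time"]]
print(senserelate(["java#coffee", "java#language"], context))  # → java#coffee
```

In the AllWords setting, this procedure simply repeats for each content word in the text in turn.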
  64. 64. Coverage… <ul><li>WordNet </li></ul><ul><ul><li>Nouns – 82,000 concepts </li></ul></ul><ul><ul><li>Verbs – 14,000 concepts </li></ul></ul><ul><ul><li>Adjectives – 18,000 concepts </li></ul></ul><ul><ul><li>Adverbs – 4,000 concepts </li></ul></ul><ul><li>Words not found in WordNet can’t be disambiguated by SenseRelate </li></ul><ul><li>language and resource dependent… </li></ul>
  65. 65. What do we get at the end? <ul><li>Can assign a sense to every word (known to WordNet) in running text </li></ul><ul><li>Can assign similarity scores to pairs of contexts, or a word and a given set of words… </li></ul><ul><li>Can turn these into a matrix … </li></ul>
  66. 66. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
  67. 67. Kernels are similarity matrices <ul><li>NSP produces word by word similarity matrices, for use by SenseClusters </li></ul><ul><li>SenseClusters produces various similarity matrices based on co-occurrences </li></ul><ul><li>WordNet-Similarity produces concept by concept similarity matrices </li></ul><ul><li>SenseRelate produces context by context similarity matrices based on concept similarity </li></ul><ul><li>All of these could be used as kernels for Supervised WSD </li></ul>
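A similarity matrix can serve as a kernel when it is symmetric and positive semi-definite; a Gram matrix of dot products between context vectors satisfies both by construction. A minimal sketch with toy, hypothetical context vectors:

```python
# A context-by-context similarity matrix built as dot products of toy
# second-order context vectors (values are illustrative). Such a Gram
# matrix is symmetric and positive semi-definite, so it can be handed to
# a kernel method, e.g. an SVM trained with a precomputed kernel.
contexts = [
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
    [2.0, 0.0, 1.0],
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

kernel = [[dot(u, v) for v in contexts] for u in contexts]

# Kernels are symmetric: K[i][j] == K[j][i]
assert all(kernel[i][j] == kernel[j][i]
           for i in range(3) for j in range(3))
print(kernel[0])  # → [5.0, 2.0, 2.0]
```

Matrices that are similarity-like but not guaranteed positive semi-definite (e.g. some concept-relatedness scores) would need adjustment before use as a kernel.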
  68. 68. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
  69. 69. SenseClusters Input … matrices <ul><li>Word by Word co-occurrences to create second order representation (Native) </li></ul><ul><li>Context by Word co-occurrences to create LSA representation… </li></ul><ul><li>Concept by Concept similarity scores from WordNet::Similarity </li></ul><ul><li>Context by Context similarity scores from SenseRelate </li></ul>
  70. 70. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
  71. 71. Identifying Collocations <ul><li>…could benefit from word clusters found in SenseClusters </li></ul><ul><li>…could benefit from similarity measures from WordNet::Similarity… </li></ul>
  72. 72. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
  73. 74. Conclusion <ul><li>Time to integrate what we have at the word and term level </li></ul><ul><ul><li>look for ways to stitch semantic patches together </li></ul></ul><ul><li>This will increase our coverage and decrease language dependence </li></ul><ul><ul><li>make the quilt bigger and sturdier </li></ul></ul><ul><li>We will then be able to look at a broader range of languages and semantic problems </li></ul><ul><ul><li>calm problems with the warmth of your lovely quilt… </li></ul></ul>
  74. 75. Many Thanks… <ul><li>SenseClusters </li></ul><ul><ul><li>Amruta Purandare (MS '04) </li></ul></ul><ul><ul><li>Anagha Kulkarni (MS '06) </li></ul></ul><ul><ul><li>Mahesh Joshi (MS '06) </li></ul></ul><ul><li>WordNet Similarity </li></ul><ul><ul><li>Sid Patwardhan (MS '03) </li></ul></ul><ul><ul><li>Jason Michelizzi (MS '05) </li></ul></ul><ul><li>SenseRelate </li></ul><ul><ul><li>Satanjeev Banerjee (MS '02) </li></ul></ul><ul><ul><li>Sid Patwardhan (MS '03) </li></ul></ul><ul><ul><li>Jason Michelizzi (MS '05) </li></ul></ul><ul><ul><li>Varada Kolhatkar (MS '09) </li></ul></ul><ul><li>Ngram Statistics Package </li></ul><ul><ul><li>Satanjeev Banerjee (MS '02) </li></ul></ul><ul><ul><li>Bridget McInnes (MS '04, PhD '??) </li></ul></ul><ul><ul><li>Saiyam Kohli (MS '06) </li></ul></ul><ul><li>Supervised WSD </li></ul><ul><ul><li>Saif Mohammad (MS '03) </li></ul></ul><ul><ul><li>Amruta Purandare (MS '04) </li></ul></ul><ul><ul><li>Mahesh Joshi (MS '06) </li></ul></ul><ul><ul><li>Bridget McInnes (MS '04, PhD '??) </li></ul></ul>
75. 76. URLs <ul><li>Ngram Statistics Package </li></ul><ul><ul><li>http://ngram.sourceforge.net </li></ul></ul><ul><li>SenseClusters </li></ul><ul><ul><li>http://senseclusters.sourceforge.net </li></ul></ul><ul><li>WordNet-Similarity </li></ul><ul><ul><li>http://wn-similarity.sourceforge.net </li></ul></ul><ul><li>SenseRelate WSD </li></ul><ul><ul><li>http://senserelate.sourceforge.net </li></ul></ul><ul><li>Supervised WSD </li></ul><ul><ul><li>http://www.d.umn.edu/~tpederse/supervised.html </li></ul></ul>