The Semantic Quilt
A talk on the Semantic Quilt, which combines various methods of "doing semantics" into a more unified framework.

License: CC Attribution-NonCommercial

The Semantic Quilt Presentation Transcript

  • 1. The Semantic Quilt: Contexts, Co-occurrences, Kernels, and Ontologies Ted Pedersen University of Minnesota, Duluth http://www.d.umn.edu/~tpederse
  • 2. Create by stitching together
  • 3. Sew together different materials
  • 4. Ontologies Co-Occurrences Kernels Contexts
  • 5. Semantics in NLP
    • Potentially useful for many applications
      • Machine Translation
      • Document or Story Understanding
      • Text Generation
      • Web Search
    • Can come from many sources
    • Not well integrated
    • Not well defined?
  • 6. What do we mean by semantics? …it depends on our resources…
    • Ontologies – relationships among concepts
      • Similar / related concepts connected
    • Dictionary – definitions of senses / concepts
      • similar / related senses have similar / related definitions
    • Contexts – short passages of words
      • similar / related words occur in similar / related contexts
    • Co-occurrences –
      • a word is defined by the company it keeps
      • words that occur with the same kinds of words are similar / related
  • 7. What level of granularity?
    • words
    • terms / collocations
    • phrases
    • sentences
    • paragraphs
    • documents
    • books
  • 8. The Terrible Tension: Ambiguity versus Granularity
    • Words are potentially very ambiguous
      • But we can list them (sort of)
      • … we can define their meanings (sort of)
      • … not ambiguous to human reader, but hard for a computer to know which meaning is intended
    • Terms / collocations are less ambiguous
      • Difficult to enumerate because there are so many, but can be done for a domain (e.g., medicine)
    • Phrases (short contexts) can still be ambiguous, but not to the same degree as words or terms/collocations
  • 9. The Current State of Affairs
    • Most resources and methods focus on word or term semantics
      • makes it possible to build resources (manually or automatically) with reasonable coverage, but …
      • … techniques become very resource dependent
      • … resources become language dependent
      • … introduces a lot of ambiguity
      • … not clear how to bring together resources
    • Similarity is a useful organizing principle, but …
      • … there are lots of ways to be similar
  • 10. Similarity as Organizing Principle
    • Measure word association using knowledge lean methods that are based on co-occurrence information from large corpora
    • Measure contextual similarity using knowledge lean methods that are based on co-occurrence information from large corpora
    • Measure conceptual similarity / relatedness using a structured repository of knowledge
      • Lexical database WordNet
      • Unified Medical Language System (UMLS)
  • 11. Things we can do now…
    • Identify associated words
      • fine wine
      • baseball bat
    • Identify similar contexts
      • I bought some food at the store
      • I purchased something to eat at the market
    • Assign meanings to words
      • I went to the bank /[financial-inst.] to deposit my check
    • Identify similar (or related) concepts
      • frog : amphibian
      • Duluth : snow
  • 12. Things we want to do…
    • Integrate different resources and methods
    • Solve bigger problems
      • some of what we do now is a means to an unclear end
    • Be Language Independent
    • Offer Broad Coverage
    • Reduce dependence on manually built resources
      • ontologies, dictionaries, labeled training data…
  • 13. Semantic Patches to Sew Together
    • Contexts
      • SenseClusters: measures similarity between written texts (i.e., contexts)
    • Co-Occurrences
      • Ngram Statistics Package: measures association between words, identifies collocations or terms
    • Kernels
      • WSD-Shell: supervised learning for word sense disambiguation, in the process of including SVMs with user defined kernels
    • “Ontologies”
      • WordNet-Similarity: measures similarity between concepts found in WordNet
      • UMLS-Similarity
      • All of these are projects at the University of Minnesota, Duluth
  • 14. Ontologies Co-Occurrences Kernels Contexts
  • 15. Ngram Statistics Package http://ngram.sourceforge.net Co-Occurrences
  • 16. Things we can do now…
    • Identify associated words
      • fine wine
      • baseball bat
    • Identify similar contexts
      • I bought some food at the store
      • I purchased something to eat at the market
    • Assign meanings to words
      • I went to the bank/[financial-inst.] to deposit my check
    • Identify similar (or related) concepts
      • frog : amphibian
      • Duluth : snow
  • 17. Co-occurrences and semantics?
    • individual words (esp. common ones) are very ambiguous
      • bat
      • line
    • pairs of words disambiguate each other
      • baseball bat
      • vampire … Transylvania
      • product line
      • speech …. line
  • 18. Why pairs of words?
    • Zipf's Law
      • most words are rare, most bigrams are even more rare, most ngrams are even rarer still
      • the more common a word, the more senses it will have
    • “Co-occurrences” are less frequent than individual words, and tend to be less ambiguous as a result
      • Mutually disambiguating
  • 19. Bigrams
    • Window Size of 2
      • baseball bat, fine wine, apple orchard, bill clinton
    • Window Size of 3
      • house of representatives, bottle of wine,
    • Window Size of 4
      • president of the republic, whispering in the wind
    • Selected using a small window size (2-4 words)
    • Objective is to capture a regular or localized pattern between two words (collocation?)
    • If order doesn’t matter, then these are co-occurrences …
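A minimal sketch of the window-based pair extraction described on slide 19; the window_pairs helper and the example sentence are illustrative, not code from the Ngram Statistics Package.

```python
# A small sketch (not NSP itself) of collecting word pairs within a small
# window; window = 2 gives adjacent bigrams, larger windows allow gaps.
from collections import Counter

def window_pairs(tokens, window=2, ordered=True):
    """Count pairs of words that co-occur within `window` positions of each other."""
    counts = Counter()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + window]:
            pair = (w1, w2) if ordered else tuple(sorted((w1, w2)))
            counts[pair] += 1
    return counts

tokens = "he went to the house of representatives".split()
print(window_pairs(tokens, window=3))   # includes ('house', 'representatives')
```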
  • 20. “occur together more often than expected by chance…”
    • Observed frequencies for two words occurring together and alone are stored in a 2x2 matrix
    • Expected values are calculated, based on the model of independence and observed values
      • How often would you expect these words to occur together, if they only occurred together by chance?
      • If two words occur “significantly” more often than the expected value, then the words do not occur together by chance.
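A minimal sketch of the expected-value and log-likelihood calculation described on slide 20; the 2x2 counts are invented for illustration, and the G² formula shown is the standard log-likelihood ratio rather than code taken from NSP.

```python
# Expected counts under independence, and the log-likelihood ratio, for a
# 2x2 contingency table of made-up counts.
from math import log

# observed counts for a hypothetical bigram "fine wine":
#                 wine      not wine
#   fine        [[ n11,       n12 ],
#   not fine     [ n21,       n22 ]]
obs = [[30, 470], [970, 998530]]
total = sum(sum(row) for row in obs)

row_totals = [sum(row) for row in obs]
col_totals = [sum(col) for col in zip(*obs)]

# expected value under the independence model: E[i][j] = row_i * col_j / N
exp = [[row_totals[i] * col_totals[j] / total for j in range(2)] for i in range(2)]

# log-likelihood ratio: G^2 = 2 * sum( obs * ln(obs / exp) ), skipping zero cells
g2 = 2 * sum(
    obs[i][j] * log(obs[i][j] / exp[i][j])
    for i in range(2) for j in range(2) if obs[i][j] > 0
)
print(round(g2, 2))   # a large G^2 means the pair occurs together more than chance
```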
  • 21. Measures and Tests of Association http://ngram.sourceforge.net
    • Log-likelihood Ratio
    • Mutual Information
    • Pointwise Mutual Information
    • Pearson’s Chi-squared Test
    • Phi coefficient
    • Fisher’s Exact Test
    • T-test
    • Dice Coefficient
    • Odds Ratio
  • 22. What do we get at the end?
    • A list of bigrams or co-occurrences that are significant or interesting (meaningful?)
      • automatic
      • language independent
    • These can be used as building blocks for systems that do semantic processing
      • relatively unambiguous
      • often very informative about topic or domain
      • can serve as a fingerprint for a document or book
  • 23. Ontologies Co-Occurrences Kernels Contexts
  • 24. SenseClusters http://senseclusters.sourceforge.net Contexts
  • 25. Things we can do now…
    • Identify associated words
      • fine wine
      • baseball bat
    • Identify similar contexts
      • I bought some food at the store
      • I purchased something to eat at the market
    • Assign meanings to words
      • I went to the bank/[financial-inst.] to deposit my check
    • Identify similar (or related) concepts
      • frog : amphibian
      • Duluth : snow
  • 26. Identify Similar Contexts
    • Find phrases that say the same thing using different words
      • I went to the store
      • Ted drove to Wal-Mart
    • Find words that have the same meaning in different contexts
      • The line is moving pretty fast
      • I stood in line for 12 hours
    • Find different words that have the same meaning in different contexts
      • The line is moving pretty fast
      • I stood in the queue for 12 hours
  • 27. SenseClusters Methodology
    • Represent contexts using first or second order co-occurrences
    • Reduce dimensionality of vectors
      • Singular value decomposition
    • Cluster the context vectors
      • Find the number of clusters
      • Label the clusters
    • Evaluate and/or use the contexts!
  • 28. Second Order Features
    • Second order features encode something ‘extra’ about a feature that occurs in a context, something not available in the context itself
      • Native SenseClusters: each feature is represented by a vector of the words with which it occurs
      • Latent Semantic Analysis: each feature is represented by a vector of the contexts in which it occurs
  • 29. Similar Contexts may have the same meaning…
      • Context 1: He drives his car fast
      • Context 2: Jim speeds in his auto
      • Car -> motor, garage, gasoline, insurance
      • Auto -> motor, insurance, gasoline, accident
      • Car and Auto share many co-occurrences…
  • 30. Second Order Context Representation
    • Bigrams used to create a word matrix
      • Cell values = log-likelihood of word pair
    • Rows are first order co-occurrence vector for a word
    • Represent context by averaging vectors of words in that context
      • Context includes the Cxt positions around the target, where Cxt is typically 5 or 20.
  • 31. 2nd Order Context Vectors
    • He won an Oscar, but Tom Hanks is still a nice guy.
                   needle    family      war      movie     actor   football   baseball
      won               0         0    8.7399    51.7812   30.520    3324.98    18.5533
      Oscar             0         0    0        136.0441   29.576       0        0
      guy               0  18818.55    0          0         0        205.5469  134.5102
      O2 context        0   6272.85    2.9133    62.6084   20.032   1176.84     51.021
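A minimal sketch of how the "O2 context" row above is obtained: average the first-order co-occurrence vectors of the context words that appear in the word matrix (here won, Oscar, and guy). The code is illustrative, not SenseClusters itself.

```python
# Averaging the word vectors from slide 31 reproduces the 2nd order context vector.
import numpy as np

cols = ["needle", "family", "war", "movie", "actor", "football", "baseball"]
word_vectors = {
    "won":   np.array([0, 0,        8.7399, 51.7812,  30.520, 3324.98,  18.5533]),
    "Oscar": np.array([0, 0,        0,      136.0441, 29.576, 0,        0]),
    "guy":   np.array([0, 18818.55, 0,      0,        0,      205.5469, 134.5102]),
}

context = "He won an Oscar, but Tom Hanks is still a nice guy."
present = [w for w in word_vectors if w in context]

o2_vector = np.mean([word_vectors[w] for w in present], axis=0)
print(dict(zip(cols, o2_vector.round(2))))   # matches the "O2 context" row above
```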
  • 32. After context representation…
    • Second order vector is an average of word vectors that make up context, captures indirect relationships
      • Reduced by SVD to principal components
    • Now, cluster the vectors!
      • Many methods, we often use k-means or repeated bisections
      • CLUTO
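A minimal sketch of the reduce-then-cluster step on slide 32, using scikit-learn's TruncatedSVD and KMeans in place of CLUTO; the random matrix stands in for real context vectors.

```python
# Reduce the context-by-feature matrix with SVD, then cluster the reduced vectors.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
contexts = rng.random((100, 500))          # stand-in for 100 context vectors

reduced = TruncatedSVD(n_components=10).fit_transform(contexts)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(reduced)
print(labels[:10])                         # cluster id assigned to each context
```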
  • 33. What do we get at the end?
    • contexts organized into some number of clusters based on the similarity of their co-occurrences
    • contexts which share words that tend to co-occur with the same other words are clustered together
      • 2nd order co-occurrences
  • 34. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters
  • 35. Oh…we also get plenty of these…
    • Similarity Matrices…
      • Word by Word
      • Ngram by Ngram
      • Word by Context
      • Ngram by Context
      • Context by Word
      • Context by Ngram
      • Context by Context
  • 36. The WSD-Shell http://www.d.umn.edu/~tpederse/supervised.html Kernels
  • 37. Things we can do now…
    • Identify associated words
      • fine wine
      • baseball bat
    • Identify similar contexts
      • I bought some food at the store
      • I purchased something to eat at the market
    • Assign meanings to words
      • I went to the bank/[financial-inst.] to deposit my check
    • Identify similar (or related) concepts
      • frog : amphibian
      • Duluth : snow
  • 38. Machine Learning Approach
    • Annotate text with sense tags
      • must select sense inventory
    • Find interesting features
      • bigrams and co-occurrences quite effective
    • Learn a model
    • Apply model to untagged data
    • Works very well…given sufficient quantities of training data and sufficient coverage of your sense inventory
  • 39. Kernel Methods
    • The challenge for any learning algorithm is to separate the training data into groups by finding a boundary (hyperplane)
    • Sometimes in the original space this boundary is hard to find
    • Transform data via kernel function to a different higher dimensional representation, where boundaries are easier to spot
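A minimal sketch of using a similarity matrix as a precomputed SVM kernel (slides 39–40); scikit-learn's SVC is used here as a stand-in for the WSD-Shell's learner, and the feature matrix and sense labels are made up.

```python
# Any similarity (Gram) matrix over the training contexts can be handed to an
# SVM as a precomputed kernel.
import numpy as np
from sklearn.svm import SVC

X = np.random.default_rng(1).random((40, 200))   # 40 contexts, 200 features
y = np.array([0, 1] * 20)                        # made-up sense labels

K = X @ X.T                                      # a similarity matrix over training contexts
clf = SVC(kernel="precomputed").fit(K, y)

K_test = X[:5] @ X.T                             # test contexts vs. training contexts
print(clf.predict(K_test))
```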
  • 40. Kernels are similarity matrices
    • NSP produces word by word similarity matrices, for use by SenseClusters
    • SenseClusters produces various sorts of similarity matrices based on co-occurrences
    • …which can be used as kernels
      • Latent Semantic kernel
      • Bigram Association kernel
      • Co-occurrence Association kernel
  • 41. What do we get at the end?
    • More accurate supervised classifiers that potentially require less training data
    • Kernel improves ability to find boundaries between training examples by transforming feature space to a higher dimensional “cleaner” space…
  • 42. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters
  • 43. WordNet-Similarity http://wn-similarity.sourceforge.net Ontologies
  • 44. Things we can do now…
    • Identify associated words
      • fine wine
      • baseball bat
    • Identify similar contexts
      • I bought some food at the store
      • I purchased something to eat at the market
    • Assign meanings to words
      • I went to the bank/[financial-inst.] to deposit my check
    • Identify similar (or related) concepts
      • frog : amphibian
      • Duluth : snow
  • 45. Similarity and Relatedness
    • Two concepts are similar if they are connected by is-a relationships.
      • A frog is-a-kind-of amphibian
      • An illness is-a health_condition
    • Two concepts can be related many ways…
      • A human has-a-part liver
      • Duluth receives-a-lot-of snow
    • … similarity is one way to be related
  • 46. WordNet-Similarity http://wn-similarity.sourceforge.net
    • Path based measures
      • Shortest path (path)
      • Wu & Palmer (wup)
      • Leacock & Chodorow (lch)
      • Hirst & St-Onge (hso)
    • Information content measures
      • Resnik (res)
      • Jiang & Conrath (jcn)
      • Lin (lin)
    • Gloss based measures
      • Banerjee and Pedersen (lesk)
      • Patwardhan and Pedersen (vector, vector_pairs)
  • 47. Path Finding
    • Find shortest is-a path between two concepts?
      • Rada et al. (1989)
      • Scaled by depth of hierarchy
        • Leacock & Chodorow (1998)
      • Depth of subsuming concept scaled by sum of the depths of individual concepts
        • Wu and Palmer (1994)
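A minimal sketch of the path-based measures using NLTK's reimplementation; the WordNet-Similarity package from the talk is a Perl module, so this Python snippet is only an approximation of the same measures (it assumes NLTK and its WordNet data are installed).

```python
# Path-based similarity between two noun concepts via NLTK's WordNet interface.
from nltk.corpus import wordnet as wn

dog, cat = wn.synset("dog.n.01"), wn.synset("cat.n.01")

print(dog.path_similarity(cat))   # shortest is-a path (path)
print(dog.lch_similarity(cat))    # Leacock & Chodorow: scaled by depth of hierarchy (lch)
print(dog.wup_similarity(cat))    # Wu & Palmer: based on depth of the subsuming concept (wup)
```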
  • 48. [Figure: a fragment of the WordNet is-a hierarchy, from Jiang and Conrath (1997), with nodes object, artifact, instrumentality, conveyance, vehicle, motor-vehicle, car, watercraft, boat, ark, article, ware, table-ware, cutlery, fork]
  • 49. Information Content
    • Measure of specificity in is-a hierarchy (Resnik, 1995)
      • -log (probability of concept)
      • High information content values mean very specific concepts (like pitch-fork and basketball shoe)
    • Count how often a concept occurs in a corpus
      • Increment the count associated with that concept, and propagate the count up!
      • If based on word forms, increment all concepts associated with that form
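A minimal sketch of the counting-and-propagation scheme from slides 49–52; the tiny hierarchy and corpus counts are invented, and observe / ic are illustrative helpers rather than WordNet-Similarity code.

```python
# Propagate observed counts up the is-a hierarchy, then IC(c) = -log p(c).
from math import log

parent = {"stock car": "car", "car": "motor vehicle", "bus": "motor vehicle",
          "motor vehicle": "vehicle", "vehicle": "*root*",
          "animal": "*root*", "*root*": None}
count = {c: 0.0 for c in parent}

def observe(concept, n=1):
    """Add n to the concept's count and to every ancestor, up to the root."""
    while concept is not None:
        count[concept] += n
        concept = parent[concept]

# invented corpus frequencies
observe("car", 75); observe("stock car", 13); observe("bus", 17); observe("animal", 500)

def ic(concept):                          # information content = -log(count / root count)
    return -log(count[concept] / count["*root*"])

# more specific concepts receive higher information content
print(round(ic("stock car"), 2), round(ic("car"), 2), round(ic("motor vehicle"), 2))
```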
  • 50. Observed “car”... [Figure: the observation increments car (73+1), its ancestor motor vehicle (327+1), and the *root* (32783+1); the counts for minicab (6), cab (23), bus (17), and stock car (12) are unchanged]
  • 51. Observed “stock car”... [Figure: the observation increments stock car (12+1), car (74+1), motor vehicle (328+1), and the *root* (32784+1); minicab (6), cab (23), and bus (17) are unchanged]
  • 52. After Counting Concepts... [Figure: final counts of *root* (32785), motor vehicle (329, IC = 1.9), cab (23), minicab (6), car (75), bus (17), and stock car (13); the more specific concepts carry the higher IC values (3.5 and 3.1)]
  • 53. Similarity and Information Content
    • Resnik (1995) uses the information content of the least common subsumer to express the similarity between two concepts
    • Lin (1998) scales the information content of the least common subsumer by the sum of the information content of the two concepts
    • Jiang & Conrath (1997) take the difference between the sum of the information content of the two individual concepts and that of their least common subsumer (a distance)
  • 54. What do we get at the end?
    • Similarity (or relatedness) scores between pairs of words / concepts that are based on path lengths, but augmented with distributional information from corpora
    • Can create a similarity matrix between concepts based on these scores
  • 55. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters
  • 56. Wait a minute …
    • Concepts must be organized in a hierarchy, and connected in that hierarchy
      • Limited to comparing nouns with nouns, or maybe verbs with verbs
      • Limited to similarity measures (is-a)
    • What about mixed parts of speech?
      • Murder (noun) and horrible (adjective)
      • Tobacco (noun) and drinking (verb)
  • 57. Using Dictionary Glosses to Measure Relatedness
    • Lesk (1986) Algorithm – measures relatedness of two concepts by counting the number of shared words in their definitions
      • Cold - a mild viral infection involving the nose and respiratory passages (but not the lungs)
      • Flu - an acute febrile highly contagious viral disease
    • Adapted Lesk (Banerjee & Pedersen, 2003) – expand glosses to include those concepts directly related
      • Cold - a common cold affecting the nasal passages and resulting in congestion and sneezing and headache; mild viral infection involving the nose and respiratory passages (but not the lungs); a disease affecting the respiratory system
      • Flu - an acute and highly contagious respiratory disease of swine caused by the orthomyxovirus thought to be the same virus that caused the 1918 influenza pandemic; an acute febrile highly contagious viral disease; a disease that can be communicated from one person to another
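A minimal sketch of the original Lesk overlap count, using the glosses from slide 57; the lesk_overlap helper is illustrative (a real implementation would remove stopwords and, in the adapted version, would also compare the expanded glosses).

```python
# Relatedness as the number of words two glosses share.
def lesk_overlap(gloss1, gloss2):
    return len(set(gloss1.lower().split()) & set(gloss2.lower().split()))

cold = "a mild viral infection involving the nose and respiratory passages"
flu  = "an acute febrile highly contagious viral disease"
print(lesk_overlap(cold, flu))   # the only shared word here is "viral"
```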
  • 58. Gloss Vectors
    • Leskian approaches require exact matches in glosses
      • Glosses are short, use related but not identical words
    • Solution? Expand glosses by replacing each content word with a co-occurrence vector derived from corpora
      • Rows are words in glosses, columns are the co-occurring words in a corpus, cell values are their log-likelihood ratios
    • Average the word vectors to create a single vector that represents the gloss/sense (Patwardhan & Pedersen, 2003)
      • 2nd order co-occurrences
    • Measure relatedness using cosine rather than exact match!
    • Methodology the same as that used in SenseClusters
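A minimal sketch of the gloss-vector idea: each content word in a gloss is replaced by its corpus co-occurrence vector, the vectors are averaged, and glosses are compared by cosine. The tiny word vectors here are invented for illustration.

```python
# Gloss vectors: a 2nd order representation of each gloss, compared by cosine.
import numpy as np

word_vectors = {                      # rows of a (made-up) word-by-word co-occurrence matrix
    "viral":      np.array([4.0, 1.0, 0.0]),
    "infection":  np.array([3.0, 2.0, 0.5]),
    "contagious": np.array([3.5, 1.5, 0.0]),
    "disease":    np.array([2.5, 2.5, 1.0]),
}

def gloss_vector(words):
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    return np.mean(vecs, axis=0)      # average of the gloss words' co-occurrence vectors

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cold = gloss_vector("mild viral infection involving the nose".split())
flu  = gloss_vector("acute contagious viral disease".split())
print(round(cosine(cold, flu), 3))    # high even though few exact words match
```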
  • 59. What do we get at the end?
    • Relatedness scores between pairs of words / concepts that are based on the content of WordNet (viewing it more like a machine-readable dictionary than an ontology)
    • Can create a “relatedness” matrix between concepts based on these scores
  • 60. Why measure conceptual similarity?
    • A word will take the sense that is most related to the surrounding context
      • I love Java, especially the beaches and the weather.
      • I love Java, especially the support for concurrent programming.
      • I love java, especially first thing in the morning with a bagel.
  • 61. Word Sense Disambiguation
    • … can be performed by finding the sense of a word most related to its neighbors
    • Here, we define similarity and relatedness with respect to WordNet-Similarity
    • WordNet-SenseRelate
      • AllWords – assign a sense to every content word
      • TargetWord – assign a sense to a given word
        • http://senserelate.sourceforge.net
  • 62. WordNet-SenseRelate http://senserelate.sourceforge.net SenseRelate
  • 63. SenseRelate Algorithm
    • For each sense of a target word in context
      • For each content word in the context
        • For each sense of that content word
          • Measure similarity/relatedness between sense of target word and sense of content word with WordNet::Similarity
          • Keep running sum for score of each sense of target
    • Pick sense of target word with highest score with words in context
    • Go to the next word, repeat
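A minimal sketch of the loop on slide 63; senses and relatedness are hypothetical stand-ins for the WordNet sense inventory and a WordNet::Similarity measure.

```python
# Pick the sense of the target word whose summed relatedness to the senses of
# the surrounding content words is highest.
def disambiguate(target, context_words, senses, relatedness):
    best_sense, best_score = None, float("-inf")
    for t_sense in senses(target):
        score = sum(relatedness(t_sense, c_sense)
                    for word in context_words
                    for c_sense in senses(word))
        if score > best_score:
            best_sense, best_score = t_sense, score
    return best_sense
```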
  • 64. Coverage…
    • WordNet
      • Nouns – 82,000 concepts
      • Verbs – 14,000 concepts
      • Adjectives – 18,000 concepts
      • Adverbs – 4,000 concepts
    • Words not found in WordNet can’t be disambiguated by SenseRelate
    • language and resource dependent…
  • 65. What do we get at the end?
    • Can assign a sense to every word (known to WordNet) in running text
    • Can assign similarity scores to pairs of contexts, or a word and a given set of words…
    • Can turn these into a matrix …
  • 66. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
  • 67. Kernels are similarity matrices
    • NSP produces word by word similarity matrices, for use by SenseClusters
    • SenseClusters produces various similarity matrices based on co-occurrences
    • WordNet-Similarity produces concept by concept similarity matrices
    • SenseRelate produces context by context similarity matrices based on concept similarity
    • All of these could be used as kernels for Supervised WSD
  • 68. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
  • 69. SenseClusters Input … matrices
    • Word by Word co-occurrences to create second order representation (Native)
    • Context by Word co-occurrences to create LSA representation…
    • Concept by Concept similarity scores from WordNet::Similarity
    • Context by Context similarity scores from SenseRelate
  • 70. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
  • 71. Identifying Collocations
    • …could benefit from word clusters found in SenseClusters
    • …could benefit from similarity measures from WordNet::Similarity…
  • 72. Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
  • 73.  
  • 74. Conclusion
    • Time to integrate what we have at the word and term level
      • look for ways to stitch semantic patches together
    • This will increase our coverage and decrease language dependence
      • make the quilt bigger and sturdier
    • We will then be able to look at a broader range of languages and semantic problems
      • calm problems with the warmth of your lovely quilt…
  • 75. Many Thanks…
    • SenseClusters
      • Amruta Purandare (MS '04)
      • Anagha Kulkarni (MS '06)
      • Mahesh Joshi (MS '06)
    • WordNet Similarity
      • Sid Patwardhan (MS '03)
      • Jason Michelizzi (MS '05)
    • SenseRelate
      • Satanjeev Banerjee (MS '02)
      • Sid Patwardhan (MS '03)
      • Jason Michelizzi (MS '05)
      • Varada Kolhatkar (MS '09)
    • Ngram Statistics Package
      • Satanjeev Banerjee (MS '02)
      • Bridget McInnes (MS '04, PhD '??)
      • Saiyam Kohli (MS '06)
    • Supervised WSD
      • Saif Mohammad (MS '03)
      • Amruta Purandare (MS '04)
      • Mahesh Joshi (MS '06)
      • Bridget McInnes (MS '04, PhD '??)
  • 76. URLs
    • Ngram Statistics Package
      • http://ngram.sourceforge.net
    • SenseClusters
      • http://senseclusters.sourceforge.net
    • WordNet-Similarity
      • http://wn-similarity.sourceforge.net
    • SenseRelate WSD
      • http://senserelate.sourceforge.net
    • Supervised WSD
      • http://www.d.umn.edu/~tpederse/supervised.html