The Semantic Quilt
 

A talk on the Semantic Quilt, which combines various methods of "doing semantics" into a more unified framework.

    The Semantic Quilt: Presentation Transcript

    • The Semantic Quilt: Contexts, Co-occurrences, Kernels, and Ontologies. Ted Pedersen, University of Minnesota, Duluth. http://www.d.umn.edu/~tpederse
    • Create by stitching together
    • Sew together different materials
    • Ontologies Co-Occurrences Kernels Contexts
    • Semantics in NLP
      • Potentially useful for many applications
        • Machine Translation
        • Document or Story Understanding
        • Text Generation
        • Web Search
      • Can come from many sources
      • Not well integrated
      • Not well defined?
    • What do we mean by semantics? …it depends on our resources…
      • Ontologies – relationships among concepts
        • Similar / related concepts connected
      • Dictionary – definitions of senses / concepts
        • similar / related senses have similar / related definitions
      • Contexts – short passages of words
        • similar / related words occur in similar / related contexts
      • Co-occurrences –
        • a word is defined by the company it keeps
        • words that occur with the same kinds of words are similar / related
    • What level of granularity?
      • words
      • terms / collocations
      • phrases
      • sentences
      • paragraphs
      • documents
      • books
    • The Terrible Tension: Ambiguity versus Granularity
      • Words are potentially very ambiguous
        • But we can list them (sort of)
        • … we can define their meanings (sort of)
        • … not ambiguous to human reader, but hard for a computer to know which meaning is intended
      • Terms / collocations are less ambiguous
        • Difficult to enumerate because there are so many, but can be done for a domain (e.g., medicine)
      • Phrases (short contexts) can still be ambiguous, but not to the same degree as words or terms/collocations
    • The Current State of Affairs
      • Most resources and methods focus on word or term semantics
        • makes it possible to build resources (manually or automatically) with reasonable coverage, but …
        • … techniques become very resource dependent
        • … resources become language dependent
        • … introduces a lot of ambiguity
        • … not clear how to bring together resources
      • Similarity is a useful organizing principle, but …
        • … there are lots of ways to be similar
    • Similarity as Organizing Principle
      • Measure word association using knowledge lean methods that are based on co-occurrence information from large corpora
      • Measure contextual similarity using knowledge lean methods that are based on co-occurrence information from large corpora
      • Measure conceptual similarity / relatedness using a structured repository of knowledge
        • Lexical database WordNet
        • Unified Medical Language System (UMLS)
    • Things we can do now…
      • Identify associated words
        • fine wine
        • baseball bat
      • Identify similar contexts
        • I bought some food at the store
        • I purchased something to eat at the market
      • Assign meanings to words
        • I went to the bank/[financial-inst.] to deposit my check
      • Identify similar (or related) concepts
        • frog : amphibian
        • Duluth : snow
    • Things we want to do…
      • Integrate different resources and methods
      • Solve bigger problems
        • some of what we do now is a means to an unclear end
      • Be Language Independent
      • Offer Broad Coverage
      • Reduce dependence on manually built resources
        • ontologies, dictionaries, labeled training data…
    • Semantic Patches to Sew Together
      • Contexts
        • SenseClusters: measures similarity between written texts (i.e., contexts)
      • Co-Occurrences
        • Ngram Statistics Package: measures association between words, identifies collocations or terms
      • Kernels
        • WSD-Shell: supervised learning for word sense disambiguation, in the process of including SVMs with user-defined kernels
      • “Ontologies”
        • WordNet-Similarity: measures similarity between concepts found in WordNet
        • UMLS-Similarity
        • All of these are projects at the University of Minnesota, Duluth
    • Ontologies Co-Occurrences Kernels Contexts
    • Ngram Statistics Package http://ngram.sourceforge.net Co-Occurrences
    • Things we can do now…
      • Identify associated words
        • fine wine
        • baseball bat
      • Identify similar contexts
        • I bought some food at the store
        • I purchased something to eat at the market
      • Assign meanings to words
        • I went to the bank/[financial-inst.] to deposit my check
      • Identify similar (or related) concepts
        • frog : amphibian
        • Duluth : snow
    • Co-occurrences and semantics?
      • individual words (esp. common ones) are very ambiguous
        • bat
        • line
      • pairs of words disambiguate each other
        • baseball bat
        • vampire … Transylvania
        • product line
        • speech … line
    • Why pairs of words?
      • Zipf's Law
        • most words are rare, most bigrams are even more rare, most ngrams are even rarer still
        • the more common a word, the more senses it will have
      • “Co-occurrences” are less frequent than individual words and tend to be less ambiguous as a result
        • Mutually disambiguating
    • Bigrams
      • Window Size of 2
        • baseball bat, fine wine, apple orchard, bill clinton
      • Window Size of 3
        • house of representatives, bottle of wine,
      • Window Size of 4
        • president of the republic, whispering in the wind
      • Selected using a small window size (2-4 words)
      • Objective is to capture a regular or localized pattern between two words (collocation?)
      • If order doesn’t matter, then these are co-occurrences …
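    A minimal sketch of how such pairs might be collected from tokenized text (the function and its parameters are illustrative, not part of the Ngram Statistics Package): pair order is kept for bigrams and dropped for co-occurrences.

```python
from collections import Counter

def windowed_pairs(tokens, window=2, ordered=True):
    """Count word pairs that occur within `window` positions of each other.

    window=2 yields adjacent bigrams; larger windows allow gaps, e.g.
    window=3 pairs "house" with "representatives" in "house of representatives".
    """
    counts = Counter()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + window]:
            pair = (w1, w2) if ordered else tuple(sorted((w1, w2)))
            counts[pair] += 1
    return counts

tokens = "he poured a fine wine from a bottle of wine onto the table".split()
print(windowed_pairs(tokens, window=2).most_common(3))                 # bigrams
print(windowed_pairs(tokens, window=3, ordered=False).most_common(3))  # co-occurrences
```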
    • “occur together more often than expected by chance…”
      • Observed frequencies for two words occurring together and alone are stored in a 2x2 matrix
      • Expected values are calculated based on the model of independence and the observed values
        • How often would you expect these words to occur together, if they only occurred together by chance?
        • If two words occur “significantly” more often than the expected value, then the words do not occur together by chance.
    • Measures and Tests of Association http://ngram.sourceforge.net
      • Log-likelihood Ratio
      • Mutual Information
      • Pointwise Mutual Information
      • Pearson’s Chi-squared Test
      • Phi coefficient
      • Fisher’s Exact Test
      • T-test
      • Dice Coefficient
      • Odds Ratio
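    A minimal sketch of two of these measures computed from a 2x2 contingency table of observed bigram counts (the counts below are made up for illustration; NSP implements these and the other tests listed above):

```python
import math

# Observed counts for a word pair (w1, w2) over N bigrams:
#             w2     not-w2
#   w1       n11      n12
#   not-w1   n21      n22
n11, n12, n21, n22 = 30, 70, 170, 9730   # hypothetical counts

def expected(n11, n12, n21, n22):
    """Expected counts under independence: row total * column total / N."""
    N = n11 + n12 + n21 + n22
    rows, cols = (n11 + n12, n21 + n22), (n11 + n21, n12 + n22)
    return [[r * c / N for c in cols] for r in rows]

def log_likelihood(n11, n12, n21, n22):
    """Log-likelihood ratio G^2 = 2 * sum(observed * ln(observed / expected))."""
    obs = [[n11, n12], [n21, n22]]
    exp = expected(n11, n12, n21, n22)
    return 2 * sum(o * math.log(o / e)
                   for ro, re in zip(obs, exp) for o, e in zip(ro, re) if o > 0)

def pmi(n11, n12, n21, n22):
    """Pointwise mutual information: log of observed over expected joint count."""
    return math.log2(n11 / expected(n11, n12, n21, n22)[0][0])

print(expected(n11, n12, n21, n22)[0][0])  # joint count expected by chance alone
print(log_likelihood(n11, n12, n21, n22))  # large value => not together by chance
print(pmi(n11, n12, n21, n22))
```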
    • What do we get at the end?
      • A list of bigrams or co-occurrences that are significant or interesting (meaningful?)
        • automatic
        • language independent
      • These can be used as building blocks for systems that do semantic processing
        • relatively unambiguous
        • often very informative about topic or domain
        • can serve as a fingerprint for a document or book
    • Ontologies Co-Occurrences Kernels Contexts
    • SenseClusters http://senseclusters.sourceforge.net Contexts
    • Things we can do now…
      • Identify associated words
        • fine wine
        • baseball bat
      • Identify similar contexts
        • I bought some food at the store
        • I purchased something to eat at the market
      • Assign meanings to words
        • I went to the bank/[financial-inst.] to deposit my check
      • Identify similar (or related) concepts
        • frog : amphibian
        • Duluth : snow
    • Identify Similar Contexts
      • Find phrases that say the same thing using different words
        • I went to the store
        • Ted drove to Wal-Mart
      • Find words that have the same meaning in different contexts
        • The line is moving pretty fast
        • I stood in line for 12 hours
      • Find different words that have the same meaning in different contexts
        • The line is moving pretty fast
        • I stood in the queue for 12 hours
    • SenseClusters Methodology
      • Represent contexts using first or second order co-occurrences
      • Reduce dimensionality of vectors
        • Singular value decomposition
      • Cluster the context vectors
        • Find the number of clusters
        • Label the clusters
      • Evaluate and/or use the contexts!
    • Second Order Features
      • Second order features encode something ‘extra’ about a feature that occurs in a context, something not available in the context itself
        • Native SenseClusters : each feature is represented by a vector of the words with which it occurs
        • Latent Semantic Analysis : each feature is represented by a vector of the contexts in which it occurs
    • Similar Contexts may have the same meaning…
        • Context 1: He drives his car fast
        • Context 2: Jim speeds in his auto
        • Car -> motor, garage, gasoline, insurance
        • Auto -> motor, insurance, gasoline, accident
        • Car and Auto share many co-occurrences…
    • Second Order Context Representation
      • Bigrams used to create a word matrix
        • Cell values = log-likelihood of word pair
      • Rows are first order co-occurrence vector for a word
      • Represent context by averaging vectors of words in that context
        • Context includes the Cxt positions around the target, where Cxt is typically 5 or 20.
    • 2nd Order Context Vectors
      • He won an Oscar, but Tom Hanks is still a nice guy.
                     needle    family      war       movie       actor     football    baseball
        won             0         0        8.7399    51.7812     30.520    3324.98      18.5533
        Oscar           0         0        0         136.0441    29.576       0          0
        guy             0     18818.55     0           0          0        205.5469    134.5102
        O2 context      0      6272.85     2.9133    62.6084     20.032    1176.84      51.021
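    As a quick check on the averaging step described above, a minimal sketch that reproduces the O2 context row from the word vectors for won, Oscar, and guy (values copied from the slide; numpy is used only for convenience):

```python
import numpy as np

# First order co-occurrence vectors (log-likelihood scores) for the content words in
# "He won an Oscar, but Tom Hanks is still a nice guy."
won   = np.array([0, 0,        8.7399, 51.7812,  30.520, 3324.98,  18.5533])
oscar = np.array([0, 0,        0,      136.0441, 29.576, 0,        0])
guy   = np.array([0, 18818.55, 0,      0,        0,      205.5469, 134.5102])

# The 2nd order context vector is the average of the word vectors in the context.
o2_context = np.mean([won, oscar, guy], axis=0)
print(o2_context.round(4))   # matches the O2 context row in the table above
```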
    • After context representation…
      • The second order vector is an average of the word vectors that make up the context; it captures indirect relationships
        • Reduced by SVD to principal components
      • Now, cluster the vectors!
        • Many methods are available; we often use k-means or repeated bisections
        • CLUTO
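    A minimal sketch of the reduce-then-cluster step, using scikit-learn's TruncatedSVD and KMeans as stand-ins for the SVD and CLUTO components the package actually uses; the context matrix here is random, purely for illustration:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
contexts = rng.random((100, 500))      # 100 context vectors x 500 co-occurrence features

reduced = TruncatedSVD(n_components=10).fit_transform(contexts)   # keep principal components
labels = KMeans(n_clusters=4, n_init=10).fit_predict(reduced)     # cluster the context vectors
print(np.bincount(labels))             # number of contexts assigned to each cluster
```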
    • What do we get at the end?
      • contexts organized into some number of clusters based on the similarity of their co-occurrences
      • contexts which share words that tend to co-occur with the same other words are clustered together
        • 2nd order co-occurrences
    • Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters
    • Oh…we also get plenty of these…
      • Similarity Matrices…
        • Word by Word
        • Ngram by Ngram
        • Word by Context
        • Ngram by Context
        • Context by Word
        • Context by Ngram
        • Context by Context
    • The WSD-Shell http://www.d.umn.edu/~tpederse/supervised.html Kernels
    • Things we can do now…
      • Identify associated words
        • fine wine
        • baseball bat
      • Identify similar contexts
        • I bought some food at the store
        • I purchased something to eat at the market
      • Assign meanings to words
        • I went to the bank/[financial-inst.] to deposit my check
      • Identify similar (or related) concepts
        • frog : amphibian
        • Duluth : snow
    • Machine Learning Approach
      • Annotate text with sense tags
        • must select sense inventory
      • Find interesting features
        • bigrams and co-occurrences quite effective
      • Learn a model
      • Apply model to untagged data
      • Works very well…given sufficient quantities of training data and sufficient coverage of your sense inventory
    • Kernel Methods
      • The challenge for any learning algorithm is to separate the training data into groups by finding a boundary (hyperplane)
      • Sometimes in the original space this boundary is hard to find
      • Transform data via kernel function to a different higher dimensional representation, where boundaries are easier to spot
    • Kernels are similarity matrices
      • NSP produces word by word similarity matrices, for use by SenseClusters
      • SenseClusters produces various sorts of similarity matrices based on co-occurrences
      • …which can be used as kernels
        • Latent Semantic kernel
        • Bigram Association kernel
        • Co-occurrence Association kernel
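    A minimal sketch of using such a similarity matrix as a precomputed kernel in a supervised learner, here with scikit-learn's SVC rather than the WSD-Shell itself; the contexts, labels, and similarity scores are all made up:

```python
import numpy as np
from sklearn.svm import SVC

# sim[i, j] = similarity between training contexts i and j, e.g. from NSP,
# SenseClusters, or WordNet-Similarity (here: a toy positive semi-definite matrix).
X = np.random.default_rng(1).random((20, 30))
sim = X @ X.T
y = np.array([0, 1] * 10)              # toy sense labels for 20 training contexts

clf = SVC(kernel="precomputed").fit(sim, y)

# At prediction time, pass similarities between test contexts and training contexts.
test_sim = X[:5] @ X.T
print(clf.predict(test_sim))
```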
    • What do we get at the end?
      • More accurate supervised classifiers that potentially require less training data
      • Kernel improves ability to find boundaries between training examples by transforming feature space to a higher dimensional “cleaner” space…
    • Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters
    • WordNet-Similarity http://wn-similarity.sourceforge.net Ontologies
    • Things we can do now…
      • Identify associated words
        • fine wine
        • baseball bat
      • Identify similar contexts
        • I bought some food at the store
        • I purchased something to eat at the market
      • Assign meanings to words
        • I went to the bank/[financial-inst.] to deposit my check
      • Identify similar (or related) concepts
        • frog : amphibian
        • Duluth : snow
    • Similarity and Relatedness
      • Two concepts are similar if they are connected by is-a relationships.
        • A frog is-a-kind-of amphibian
        • An illness is-a health_condition
      • Two concepts can be related many ways…
        • A human has-a-part liver
        • Duluth receives-a-lot-of snow
      • … similarity is one way to be related
    • WordNet-Similarity http://wn-similarity.sourceforge.net
      • Path based measures
        • Shortest path (path)
        • Wu & Palmer (wup)
        • Leacock & Chodorow (lch)
        • Hirst & St-Onge (hso)
      • Information content measures
        • Resnik (res)
        • Jiang & Conrath (jcn)
        • Lin (lin)
      • Gloss based measures
        • Banerjee and Pedersen (lesk)
        • Patwardhan and Pedersen (vector, vector_pairs)
    • Path Finding
      • Find shortest is-a path between two concepts?
        • Rada et al. (1989)
        • Scaled by depth of hierarchy
          • Leacock & Chodorow (1998)
        • Depth of subsuming concept scaled by sum of the depths of individual concepts
          • Wu and Palmer (1994)
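    Several of the path-based measures above are also available through NLTK's WordNet interface (a Python counterpart to parts of WordNet::Similarity); a minimal example with two concepts from the hierarchy fragment shown next:

```python
from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet') once

car = wn.synset('car.n.01')    # the automobile sense
boat = wn.synset('boat.n.01')

print(car.path_similarity(boat))   # shortest is-a path (path)
print(car.lch_similarity(boat))    # Leacock & Chodorow (lch)
print(car.wup_similarity(boat))    # Wu & Palmer (wup)
```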
    • [Figure: a fragment of the WordNet is-a hierarchy containing object, artifact, instrumentality, conveyance, vehicle, motor-vehicle, car, watercraft, boat, ark, article, ware, table-ware, cutlery, and fork, from Jiang and Conrath (1997)]
    • Information Content
      • Measure of specificity in is-a hierarchy (Resnik, 1995)
        • -log (probability of concept)
        • High information content values mean very specific concepts (like pitch-fork and basketball shoe)
      • Count how often a concept occurs in a corpus
        • Increment the count associated with that concept, and propagate the count up!
        • If based on word forms, increment all concepts associated with that form
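    A minimal sketch of that counting scheme over a toy is-a hierarchy encoded as a child-to-parent map (the concept names and counts are illustrative, not WordNet's actual structure or frequencies):

```python
import math

parent = {                      # toy is-a hierarchy: child -> parent
    "motor vehicle": "*root*",
    "car": "motor vehicle",
    "bus": "motor vehicle",
    "cab": "car",
    "stock car": "car",
    "minicab": "cab",
}
counts = dict.fromkeys(list(parent) + ["*root*"], 0)

def observe(concept):
    """Increment the concept's count and propagate it up the is-a hierarchy."""
    while concept is not None:
        counts[concept] += 1
        concept = parent.get(concept)

for word in ["car", "car", "stock car", "bus"]:   # observations from a corpus
    observe(word)

total = counts["*root*"]
ic = {c: -math.log(n / total) for c, n in counts.items() if n > 0}
print(counts)   # observing "car" also counts toward motor vehicle and *root*
print(ic)       # more specific (rarer) concepts get higher information content
```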
    • Observed “car”: increment car (73+1) and propagate the count up to motor vehicle (327+1) and *root* (32783+1); minicab (6), cab (23), bus (17), and stock car (12) are unchanged
    • Observed “stock car”: increment stock car (12+1) and propagate the count up through car (74+1), motor vehicle (328+1), and *root* (32784+1); minicab (6), cab (23), and bus (17) are unchanged
    • After counting concepts: *root* (32785); motor vehicle (329), IC = 1.9; car (75); cab (23); minicab (6); bus (17); stock car (13); IC values of 3.5 and 3.1 are shown for the more specific concepts
    • Similarity and Information Content
      • Resnik (1995) use information content of least common subsumer to express similarity between two concepts
      • Lin (1998) scale information content of least common subsumer with sum of information content of two concepts
      • Jiang & Conrath (1997) find difference between least common subsumer’s information content and the sum of the two individual concepts
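    Written out over information content values (IC of the least common subsumer and of the two concepts), a minimal sketch; the inversion of the Jiang & Conrath distance into a similarity follows the convention used in WordNet::Similarity:

```python
def res(ic_lcs, ic1, ic2):
    """Resnik (1995): similarity = IC of the least common subsumer."""
    return ic_lcs

def lin(ic_lcs, ic1, ic2):
    """Lin (1998): the LCS's IC scaled by the sum of the concepts' ICs."""
    return 2 * ic_lcs / (ic1 + ic2)

def jcn(ic_lcs, ic1, ic2):
    """Jiang & Conrath (1997): distance = ic1 + ic2 - 2*ic_lcs, inverted to a similarity."""
    return 1 / (ic1 + ic2 - 2 * ic_lcs)
```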
    • What do we get at the end?
      • Similarity (or relatedness) scores between pairs of words / concepts that are based on path lengths, but augmented with distributional information from corpora
      • Can create a similarity matrix between concepts based on these scores
    • Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters
    • Wait a minute …
      • Concepts must be organized in a hierarchy, and connected in that hierarchy
        • Limited to comparing nouns with nouns, or maybe verbs with verbs
        • Limited to similarity measures (is-a)
      • What about mixed parts of speech?
        • Murder (noun) and horrible (adjective)
        • Tobacco (noun) and drinking (verb)
    • Using Dictionary Glosses to Measure Relatedness
      • Lesk (1986) Algorithm – measures relatedness of two concepts by counting the number of shared words in their definitions
        • Cold - a mild viral infection involving the nose and respiratory passages (but not the lungs)
        • Flu - an acute febrile highly contagious viral disease
      • Adapted Lesk (Banerjee & Pedersen, 2003) – expands glosses to include the glosses of directly related concepts
        • Cold - a common cold affecting the nasal passages and resulting in congestion and sneezing and headache; mild viral infection involving the nose and respiratory passages (but not the lungs); a disease affecting the respiratory system
        • Flu - an acute and highly contagious respiratory disease of swine caused by the orthomyxovirus thought to be the same virus that caused the 1918 influenza pandemic; an acute febrile highly contagious viral disease; a disease that can be communicated from one person to another
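    A minimal sketch of the original overlap count on the two short glosses above (whitespace tokens with a tiny stop list; the adapted measure additionally weights multi-word overlaps and draws in the glosses of related concepts):

```python
def gloss_overlap(gloss1, gloss2):
    """Count the content words shared by two glosses (Lesk-style overlap)."""
    stop = {"a", "an", "the", "and", "but", "not", "of", "in", "to"}
    words1 = {w.strip("().,;").lower() for w in gloss1.split()} - stop
    words2 = {w.strip("().,;").lower() for w in gloss2.split()} - stop
    return words1 & words2

cold = "a mild viral infection involving the nose and respiratory passages (but not the lungs)"
flu = "an acute febrile highly contagious viral disease"
print(gloss_overlap(cold, flu))   # only {'viral'} -- related senses, but few exact matches
```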
    • Gloss Vectors
      • Leskian approaches require exact matches in glosses
        • Glosses are short, use related but not identical words
      • Solution? Expand glosses by replacing each content word with a co-occurrence vector derived from corpora
        • Rows are words in glosses, columns are the co-occurring words in a corpus, cell values are their log-likelihood ratios
      • Average the word vectors to create a single vector that represents the gloss/sense (Patwardhan & Pedersen, 2003)
        • 2nd order co-occurrences
      • Measure relatedness using cosine rather than exact match!
      • Methodology the same as that used in SenseClusters
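    A minimal sketch of the gloss-vector idea: each content word is replaced by its co-occurrence vector, the vectors are averaged into a gloss vector, and relatedness is the cosine between gloss vectors (the vectors below are toy values, not ones derived from a real corpus):

```python
import numpy as np

cooc = {                              # toy rows of a word-by-word log-likelihood matrix
    "viral":      np.array([4.0, 0.1, 2.0]),
    "infection":  np.array([3.5, 0.2, 1.8]),
    "disease":    np.array([3.8, 0.3, 1.5]),
    "contagious": np.array([3.0, 0.0, 2.2]),
    "nose":       np.array([0.5, 3.0, 0.1]),
}

def gloss_vector(content_words):
    """Average the co-occurrence vectors of a gloss's content words (2nd order)."""
    return np.mean([cooc[w] for w in content_words if w in cooc], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

cold = gloss_vector(["viral", "infection", "nose"])
flu = gloss_vector(["contagious", "viral", "disease"])
print(cosine(cold, flu))   # high relatedness despite few exact word matches
```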
    • What do we get at the end?
      • Relatedness scores between pairs of words / concepts that are based on the content of WordNet (viewing it more like a machine-readable dictionary than an ontology)
      • Can create a “relatedness” matrix between concepts based on these scores
    • Why measure conceptual similarity?
      • A word will take the sense that is most related to the surrounding context
        • I love Java, especially the beaches and the weather.
        • I love Java, especially the support for concurrent programming.
        • I love java, especially first thing in the morning with a bagel.
    • Word Sense Disambiguation
      • … can be performed by finding the sense of a word most related to its neighbors
      • Here, we define similarity and relatedness with respect to WordNet-Similarity
      • WordNet-SenseRelate
        • AllWords – assign a sense to every content word
        • TargetWord – assign a sense to a given word
          • http://senserelate.sourceforge.net
    • WordNet-SenseRelate http://senserelate.sourceforge.net SenseRelate
    • SenseRelate Algorithm
      • For each sense of a target word in context
        • For each content word in the context
          • For each sense of that content word
            • Measure similarity/relatedness between sense of target word and sense of content word with WordNet::Similarity
            • Keep running sum for score of each sense of target
      • Pick sense of target word with highest score with words in context
      • Go to the next word, repeat
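    The loop above, as a minimal sketch; `senses(word)` and `similarity(sense1, sense2)` are assumed to be supplied, e.g. by a WordNet sense inventory and one of the WordNet::Similarity measures:

```python
def senserelate(target, context_words, senses, similarity):
    """Pick the sense of `target` most related to the senses of its context words.

    senses(word)        -> candidate senses of a word
    similarity(s1, s2)  -> relatedness score between two senses
    """
    best_sense, best_score = None, float("-inf")
    for target_sense in senses(target):
        score = 0.0
        for word in context_words:              # each content word in the context
            for word_sense in senses(word):     # each sense of that content word
                score += similarity(target_sense, word_sense)   # running sum
        if score > best_score:
            best_sense, best_score = target_sense, score
    return best_sense
```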
    • Coverage…
      • WordNet
        • Nouns – 82,000 concepts
        • Verbs – 14,000 concepts
        • Adjectives – 18,000 concepts
        • Adverbs – 4,000 concepts
      • Words not found in WordNet can’t be disambiguated by SenseRelate
      • language and resource dependent…
    • What do we get at the end?
      • Can assign a sense to every word (known to WordNet) in running text
      • Can assign similarity scores to pairs of contexts, or a word and a given set of words…
      • Can turn these into a matrix …
    • Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
    • Kernels are similarity matrices
      • NSP produces word by word similarity matrices, for use by SenseClusters
      • SenseClusters produces various similarity matrices based on co-occurrences
      • WordNet-Similarity produces concept by concept similarity matrices
      • SenseRelate produces context by context similarity matrices based on concept similarity
      • All of these could be used as kernels for Supervised WSD
    • Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
    • SenseClusters Input … matrices
      • Word by Word co-occurrences to create second order representation (Native)
      • Context by Word co-occurrences to create LSA representation…
      • Concept by Concept similarity scores from WordNet::Similarity
      • Context by Context similarity scores from SenseRelate
    • Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
    • Identifying Collocations
      • …could benefit from word clusters found in SenseClusters
      • …could benefit from similarity measures from WordNet::Similarity…
    • Ontologies WordNet-Similarity Co-Occurrences Ngram Statistics Package Kernels WSD-Shell Contexts SenseClusters SenseRelate
    • Conclusion
      • Time to integrate what we have at the word and term level
        • look for ways to stitch semantic patches together
      • This will increase our coverage and decrease language dependence
        • make the quilt bigger and sturdier
      • We will then be able to look at a broader range of languages and semantic problems
        • calm problems with the warmth of your lovely quilt…
    • Many Thanks…
      • SenseClusters
        • Amruta Purandare (MS '04)
        • Anagha Kulkarni (MS '06)
        • Mahesh Joshi (MS '06)
      • WordNet Similarity
        • Sid Patwardhan (MS '03)
        • Jason Michelizzi (MS '05)
      • SenseRelate
        • Satanjeev Banerjee (MS '02)
        • Sid Patwardhan (MS '03)
        • Jason Michelizzi (MS '05)
        • Varada Kolhatkar (MS '09)
      • Ngram Statistics Package
        • Satanjeev Banerjee (MS '02)
        • Bridget McInnes (MS '04, PhD '??)
        • Saiyam Kohli (MS '06)
      • Supervised WSD
        • Saif Mohammad (MS '03)
        • Amruta Purandare (MS '04)
        • Mahesh Joshi (MS '06)
        • Bridget McInnes (MS '04, PhD '??)
    • URLs
      • Ngram Statistics Package
        • http://ngram.sourceforge.net
      • SenseClusters
        • http://senseclusters.sourceforge.net
      • WordNet-Similarity
        • http://wn-similarity.sourceforge.net
      • SenseRelate WSD
        • http://senserelate.sourceforge.net
      • Supervised WSD
        • http://www.d.umn.edu/~tpederse/supervised.html