CICLing 2005

  1. Name Discrimination by Clustering Similar Contexts
     Ted Pedersen & Anagha Kulkarni, University of Minnesota, Duluth
     Amruta Purandare, now at University of Pittsburgh
     Research supported by a National Science Foundation Faculty Early Career Development Award (#0092784)
  2. Name Discrimination
     • Different people have the same name
       ◦ George (HW) Bush and George (W) Bush
     • Different places have the same name
       ◦ Duluth (Minn) and Duluth (GA)
     • Different things have the same abbreviation
       ◦ UMD (Duluth) and UMD (College Park)
  3. Our Goals
     • Given 1000 contexts containing "John Smith", identify those that are similar to each other
     • Group similar contexts together, assuming each group is associated with a single individual
     • Generate an identifying label from the content of each cluster
  4. Measuring Similarity of Words and Contexts with Large Corpora
     • Second-order co-occurrences
       ◦ "Jim drives his car fast" / "Jim speeds in his auto"
       ◦ car -> motor, garage, gasoline, insurance
       ◦ auto -> motor, insurance, gasoline, accident
       ◦ "car" and "auto" occur with many of the same words; they are therefore similar!
       ◦ The relationship is less direct, and more resistant to data sparsity
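     A minimal sketch of this idea in Python (the deck's own software is Perl; the vocabulary and co-occurrence counts below are invented for illustration):

         # "car" and "auto" need not co-occur directly; their co-occurrence
         # vectors over a shared vocabulary are similar: a second-order relation.
         import numpy as np

         vocab = ["motor", "garage", "gasoline", "insurance", "accident"]
         car   = np.array([12.0, 7.0, 9.0, 4.0, 0.0])   # hypothetical counts
         auto  = np.array([10.0, 0.0, 8.0, 5.0, 3.0])   # hypothetical counts

         cos = car @ auto / (np.linalg.norm(car) * np.linalg.norm(auto))
         print(f"cosine(car, auto) = {cos:.3f}")   # high, despite no direct co-occurrence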
  5. Word Sense Discrimination
     • Given 1000 contexts that include a particular target word (e.g., "shell")
     • Cluster those contexts so that similar contexts come together
       ◦ Similar contexts have similar meanings
     • Label each cluster with something that describes its content, perhaps even providing a definition
  6. Methodology
     • Feature Selection
     • Context Representation
     • Measuring Similarities
     • Clustering
     • Evaluation
  7. Feature Selection
     • Identify features in a large (separate) training corpus, or in the data to be clustered
     • Rely on lexical features
       ◦ Unigrams, bigrams, co-occurrences
  8. Lexical Features
     • Unigrams: words that occur more than X times
     • Bigrams: ordered pairs of words, separated by at most 2-3 intervening words, that score above a cutoff on a measure of association (a sketch of this step follows below)
     • Co-occurrences: the same as bigrams, but unordered
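     A sketch of the bigram selection step. PMI is used as the association measure for brevity (the Ngram Statistics Package listed later offers several, including the log-likelihood ratio); the window and cutoff values are illustrative:

         import math
         from collections import Counter

         def bigram_features(tokens, window=3, min_score=1.0):
             """Ordered pairs within `window` (at most window-1 intervening
             words) whose pointwise mutual information clears a cutoff."""
             unigrams = Counter(tokens)
             pairs = Counter()
             for i, w1 in enumerate(tokens):
                 for w2 in tokens[i + 1 : i + 1 + window]:
                     pairs[(w1, w2)] += 1
             n = len(tokens)
             selected = {}
             for (w1, w2), c in pairs.items():
                 # rough PMI estimate from raw counts
                 pmi = math.log2((c * n) / (unigrams[w1] * unigrams[w2]))
                 if pmi >= min_score:
                     selected[(w1, w2)] = pmi
             return selected

         toks = "jim drives his car fast and jim speeds in his auto".split()
         print(bigram_features(toks))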
  9. Context Representation
     • First order
       ◦ Unigrams, bigrams, and co-occurrences that occur in the training corpus and also occur in the context to be clustered
       ◦ The context is represented as a vector that shows whether (or how often) these features occur in it (see the small example below)
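     A small illustration of a first-order vector (unigram features only, for brevity; the feature names are invented):

         def first_order_vector(context_tokens, features):
             """Binary vector: 1 if the feature occurs in the context, else 0."""
             present = set(context_tokens)
             return [1 if f in present else 0 for f in features]

         features = ["motor", "garage", "gasoline", "insurance"]
         context = "jim drives his car to the garage".split()
         print(first_order_vector(context, features))   # [0, 1, 0, 0]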
  10. Context Representation
     • Second order
       ◦ Bigrams or co-occurrences are used to create a matrix; the cells hold the count or association measure for each word pair
       ◦ Each row serves as a co-occurrence vector for a word
       ◦ A context is represented by averaging the vectors of the words in that context
  11. 2nd Order Context Vectors
     • "The largest shell store by the sea shore"

                      Sells     Water     North-West  Sandy     Bombs   Sales     Artillery
        sea           18.5533   3324.98   30.520      51.7812   8.7399  0         0
        shore         0         0         29.576      136.0441  0       0         0
        store         134.5102  205.5469  0           0         0       18818.55  0
        O2 context    51.021    1176.84   20.032      62.6084   2.9133  6272.85   0
  12. 2nd Order Context Vectors
     [Figure: the second-order context vector is built from the word vectors for "sea", "shore", and "store"]
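     The "O2 context" row of the table above can be reproduced directly: it is the average of the co-occurrence vectors of the words in the context.

         import numpy as np

         # Columns: Sells, Water, North-West, Sandy, Bombs, Sales, Artillery
         sea   = np.array([18.5533,  3324.98,  30.520, 51.7812,  8.7399, 0.0,      0.0])
         shore = np.array([0.0,      0.0,      29.576, 136.0441, 0.0,    0.0,      0.0])
         store = np.array([134.5102, 205.5469, 0.0,    0.0,      0.0,    18818.55, 0.0])

         o2_context = (sea + shore + store) / 3
         print(o2_context)   # ≈ [51.021, 1176.84, 20.032, 62.6084, 2.9133, 6272.85, 0]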
  13. Measuring Similarities
     • c1: {file, unix, commands, system, store}
     • c2: {machine, os, unix, system, computer, dos, store}
     • Matching = |X ∩ Y|
       ◦ |{unix, system, store}| = 3
     • Cosine = |X ∩ Y| / (√|X| × √|Y|)
       ◦ 3 / (√5 × √7) = 3 / (2.2361 × 2.6458) ≈ 0.507
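     The two scores from this slide, computed directly:

         import math

         def matching(x, y):
             return len(x & y)

         def cosine(x, y):
             return len(x & y) / (math.sqrt(len(x)) * math.sqrt(len(y)))

         c1 = {"file", "unix", "commands", "system", "store"}
         c2 = {"machine", "os", "unix", "system", "computer", "dos", "store"}
         print(matching(c1, c2))   # 3
         print(cosine(c1, c2))     # ≈ 0.507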
  14. Limitations of 1st or 2nd Order

     (Two co-occurrence tables; the row labels did not survive extraction, so the rows are shown unlabeled.)

        Kill   Murder  Destroy  Fire   Shoot  Missile  Weapon
        2.53   0       1.28     0      3.24   0        28.72
        0      4.21    0        0.92   0      52.27    0

        Burn   CD      Fire     Pipe   Bomb   Command  Execute
        2.56   1.28    0        72.7   0      2.36     19.23
        34.2   0       22.1     46.2   14.6   0        17.77
  15. Latent Semantic Analysis
     • Singular Value Decomposition
     • Captures polysemy and synonymy (?)
     • Conceptual, fuzzy feature matching
     • From word space to semantic space
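     A minimal sketch of the SVD step (a random matrix stands in for a real word-by-word co-occurrence matrix):

         import numpy as np

         def semantic_space(cooc, k=2):
             """Keep the top-k singular dimensions; each row becomes a word's
             position in the reduced "semantic" space."""
             u, s, vt = np.linalg.svd(cooc, full_matrices=False)
             return u[:, :k] * s[:k]

         cooc = np.random.rand(6, 8)          # stand-in co-occurrence matrix
         print(semantic_space(cooc).shape)    # (6, 2)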
  16. After Context Representation…
     • Each context is represented by a vector of some sort
       ◦ A first-order vector shows the direct occurrence of features in the context
       ◦ A second-order vector is an average of the word vectors that make up the context, capturing indirect relationships
     • Now, cluster the vectors!
  17. Clustering
     • UPGMA
       ◦ Hierarchical: agglomerative
     • Repeated Bisections
       ◦ Hybrid: divisive + partitional
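     UPGMA is average-link agglomerative clustering. As a stand-in for CLUTO (listed under Software below), scikit-learn (≥ 1.2, where the argument is `metric` rather than `affinity`) can illustrate it; the data here is random:

         import numpy as np
         from sklearn.cluster import AgglomerativeClustering

         contexts = np.random.rand(55, 7)     # stand-in for real context vectors
         labels = AgglomerativeClustering(
             n_clusters=4, metric="cosine", linkage="average"
         ).fit_predict(contexts)
         print(labels)                        # one cluster id per context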
  18. Evaluation (before mapping)

               c1   c2   c3   c4
        C1     10    0    3    2
        C2      1    1    7    1
        C3      2    1    1    6
        C4      2   15    1    2
  19. Evaluation (after mapping)

               c1   c3   c4   c2   Total
        C1     10    3    2    0     15
        C2      1    7    1    1     10
        C3      2    1    6    1     10
        C4      2    1    2   15     20
        Total  15   12   11   17     55

     Accuracy = 38/55 = 0.69
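     The mapping step is an assignment problem: reorder the confusion matrix columns so the diagonal (correctly assigned contexts) is maximized. One exact way to find this mapping is the Hungarian algorithm, as implemented in SciPy:

         import numpy as np
         from scipy.optimize import linear_sum_assignment

         confusion = np.array([[10,  0, 3, 2],   # C1 vs. senses c1..c4
                               [ 1,  1, 7, 1],   # C2
                               [ 2,  1, 1, 6],   # C3
                               [ 2, 15, 1, 2]])  # C4
         rows, cols = linear_sum_assignment(confusion, maximize=True)
         correct = confusion[rows, cols].sum()
         print(correct, "/", confusion.sum())    # 38 / 55 ≈ 0.69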
  20. Majority Sense Classifier
     • Always choosing the most frequent sense (c2, 17 of the 55 contexts) gives a baseline accuracy of 17/55 = 0.31
  21. Data
     • Line, Hard, Serve
       ◦ 4000+ instances per word
       ◦ 60:40 split
       ◦ 3-5 senses per word
     • SENSEVAL-2
       ◦ 73 words = 29 verbs + 29 nouns + 15 adjectives
       ◦ Approx. 50-100 test and 100-200 training instances per word
       ◦ 8-12 senses per word
  22. Experimental Comparison of 1st and 2nd Order Representations

     Pedersen & Bruce (1st order contexts):
     • PB1: co-occurrences; UPGMA, similarity space
     • PB2: PB1, except Repeated Bisections, vector space
     • PB3: PB1 with bigram features

     Schütze (2nd order contexts):
     • SC1: co-occurrence matrix, SVD; Repeated Bisections, vector space
     • SC2: SC1, except UPGMA, similarity space
     • SC3: SC1 with bigram matrix
  23. Experimental Conclusions

        Nature of data                                 Recommendation
        Large, homogeneous (like Line, Hard, Serve)    1st order, UPGMA
        Smaller data (like SENSEVAL-2)                 2nd order, Repeated Bisections
  24. Software
     • SenseClusters: http://senseclusters.sourceforge.net/
     • Ngram Statistics Package: http://www.d.umn.edu/~tpederse/nsp.html
     • Cluto: http://www-users.cs.umn.edu/~karypis/cluto/
     • SVDPack: http://netlib.org/svdpack/
  25. Making Free Software (Mostly Perl, All Copyleft)
     • SenseClusters: identify similar contexts
     • Ngram Statistics Package: identify interesting sequences of words
     • WordNet::Similarity: measure similarity among concepts
     • Google-Hack: find sets of related words
     • WordNet::SenseRelate: all-words sense disambiguation
     • SyntaLex and the Duluth systems: supervised WSD
     • http://www.d.umn.edu/~tpederse/code.html
