  1. 1. Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation Masters Thesis : Saif Mohammad Advisor : Dr. Ted Pedersen University of Minnesota, Duluth Date: August 1, 2003
  2. 2. Path Map <ul><li>Introduction </li></ul><ul><li>Background </li></ul><ul><li>Data </li></ul><ul><li>Experiments </li></ul><ul><li>Conclusions </li></ul>
  3. 3. Word Sense Disambiguation <ul><li>Harry cast a bewitching spell </li></ul><ul><li>Humans immediately understand spell to mean a charm or incantation </li></ul><ul><ul><li>not reading out letter by letter, or a period of time </li></ul></ul><ul><ul><ul><li>Words with multiple senses – polysemy, ambiguity </li></ul></ul></ul><ul><ul><li>Humans utilize background knowledge and context </li></ul></ul><ul><li>Machines lack background knowledge </li></ul><ul><ul><li>Automatically identifying the intended sense of a word in written text, based on its context, remains a hard problem </li></ul></ul><ul><ul><li>Features are identified from the context </li></ul></ul><ul><ul><li>Best accuracies in the latest international evaluation event are around 65% </li></ul></ul>
  4. 4. Why do we need WSD? <ul><li>Information Retrieval </li></ul><ul><ul><li>Query: cricket bat </li></ul></ul><ul><ul><ul><li>Documents pertaining to the insect and the mammal are irrelevant </li></ul></ul></ul><ul><li>Machine Translation </li></ul><ul><ul><li>Consider English to Hindi translation </li></ul></ul><ul><ul><ul><li>head translates to sar (upper part of the body) or adhyaksh (leader) </li></ul></ul></ul><ul><li>Machine-Human Interaction </li></ul><ul><ul><li>Instructions to machines </li></ul></ul><ul><ul><ul><li>Interactive home system: turn on the lights </li></ul></ul></ul><ul><ul><ul><li>Domestic Android: get the door </li></ul></ul></ul><ul><li>Applications are widespread and will affect our way of life </li></ul>
  5. 5. Terminology <ul><li>Harry cast a bewitching spell </li></ul><ul><li>Target word – the word whose intended sense is to be identified </li></ul><ul><ul><li>spell </li></ul></ul><ul><li>Context – the sentence housing the target word and, possibly, one or two sentences around it </li></ul><ul><ul><li>Harry cast a bewitching spell </li></ul></ul><ul><li>Instance – target word along with its context </li></ul><ul><li>WSD is a classification problem wherein each occurrence of the target word is assigned to one of its many possible senses </li></ul>
  6. 6. Corpus-Based Supervised Machine Learning <ul><li>A computer program is said to learn from experience … if its performance at tasks … improves with experience </li></ul><ul><li>- Mitchell </li></ul><ul><li>Task : Word Sense Disambiguation of given test instances </li></ul><ul><li>Performance : Ratio of instances correctly disambiguated to the total test instances - accuracy </li></ul><ul><li>Experience : Manually created instances such that target words are marked with intended sense – training instances </li></ul><ul><ul><li>Harry cast a bewitching spell / incantation </li></ul></ul>
  7. 7. Path Map <ul><li>Introduction </li></ul><ul><li>Background </li></ul><ul><li>Data </li></ul><ul><li>Experiments </li></ul><ul><li>Conclusions </li></ul>
  8. 8. Decision Trees <ul><li>A kind of classifier </li></ul><ul><ul><li>Assigns a class by asking a series of questions </li></ul></ul><ul><ul><li>Questions correspond to features of the instance </li></ul></ul><ul><ul><li>Question asked depends on answer to previous question </li></ul></ul><ul><li>Inverted tree structure </li></ul><ul><ul><li>Interconnected nodes </li></ul></ul><ul><ul><ul><li>Top most node is called the root </li></ul></ul></ul><ul><ul><li>Each node corresponds to a question / feature </li></ul></ul><ul><ul><li>Each possible value of feature has corresponding branch </li></ul></ul><ul><ul><li>Leaves terminate every path from root </li></ul></ul><ul><ul><ul><li>Each leaf is associated with a class </li></ul></ul></ul>
  9. 9. Automating Toy Selection for Max [Figure: decision tree with a root, internal nodes asking Moving Parts?, Color?, Size? and Car?, branches labelled Yes/No, Blue/Red/Other and Big/Small, and leaves LOVE, SO SO and HATE]
  10. 10. WSD Tree [Figure: decision tree whose nodes test binary features (Feature 1? to Feature 4?), with branches labelled 0 and 1 and leaves SENSE 1 to SENSE 4]
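The question-asking procedure described on the Decision Trees slide can be sketched in a few lines of Python. This is only an illustration: the tree below, its feature names and its sense labels are hypothetical, and it is neither the tree drawn on the slides nor one learnt by ID3 or C4.5.

```python
# A decision tree as nested dicts: an internal node names the feature it tests
# and has one branch per feature value; a leaf is simply a class label (a sense).
# The tree, feature names and senses are hypothetical, for illustration only.
TREE = {
    "feature": "next_pos_is_noun",
    "branches": {
        1: "SENSE 1",
        0: {
            "feature": "bigram_cast_a",
            "branches": {1: "SENSE 2", 0: "SENSE 3"},
        },
    },
}


def classify(tree, instance):
    """Walk from the root to a leaf, following the branch for each feature value."""
    node = tree
    while isinstance(node, dict):
        value = instance[node["feature"]]
        node = node["branches"][value]
    return node


print(classify(TREE, {"next_pos_is_noun": 0, "bigram_cast_a": 1}))
```

Which question is asked second depends on the answer to the first, which is the property the slide highlights.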
  11. 11. Issues… <ul><li>Why use decision trees for WSD? </li></ul><ul><li>How are decision trees learnt? </li></ul><ul><ul><li>ID3 and C4.5 algorithms </li></ul></ul><ul><li>What is bagging and what are its advantages? </li></ul><ul><li>Drawbacks of decision tree bagging </li></ul><ul><li>Pedersen [2002]: Choosing the right features is of greater significance than the learning algorithm itself </li></ul>
  12. 12. Lexical Features <ul><li>Surface form </li></ul><ul><ul><li>A word we observe in text </li></ul></ul><ul><ul><li>Case(n) </li></ul></ul><ul><ul><ul><li>1. Object of investigation 2. frame or covering 3. A weird person </li></ul></ul></ul><ul><ul><ul><li>Surface forms : case , cases , casing </li></ul></ul></ul><ul><ul><ul><li>An occurrence of casing suggests sense 2 </li></ul></ul></ul><ul><li>Unigrams and Bigrams </li></ul><ul><ul><li>One word and two word sequences in text </li></ul></ul><ul><ul><li>The interest rate is low </li></ul></ul><ul><ul><li>Unigrams: the, interest, rate, is, low </li></ul></ul><ul><ul><li>Bigrams: the interest, interest rate, rate is, is low </li></ul></ul>
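As a minimal sketch of the unigram and bigram example above, the following Python enumerates adjacent one-word and two-word sequences. The helper name `ngrams` is hypothetical, and the sketch only lists candidate sequences; it does not show how the thesis selects which of them become features.

```python
def ngrams(tokens, n):
    """Return all sequences of n adjacent tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "The interest rate is low".lower().split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)

print(unigrams)  # ['the', 'interest', 'rate', 'is', 'low']
print(bigrams)   # ['the interest', 'interest rate', 'rate is', 'is low']
```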
  13. 13. Part of Speech Tagging <ul><li>Pre-requisite for many Natural Language Tasks </li></ul><ul><ul><ul><li>Parsing, WSD, Anaphora resolution </li></ul></ul></ul><ul><li>Brill Tagger – most widely used tool </li></ul><ul><ul><li>Accuracy around 95% </li></ul></ul><ul><ul><li>Source code available </li></ul></ul><ul><ul><li>Easily understood rules </li></ul></ul><ul><ul><li>Harry /NNP cast /VBD a /DT bewitching /JJ spell /NN </li></ul></ul><ul><ul><li>NNP proper noun, VBD verb past, DT determiner, JJ adjective, NN noun </li></ul></ul>
  14. 14. Pre-Tagging <ul><li>Pre-tagging is the act of manually assigning tags to selected words in a text prior to tagging </li></ul><ul><ul><li>Mona will sit in the pretty chair // NN this time </li></ul></ul><ul><ul><li>chair is the pre-tagged word, NN is its pre-tag </li></ul></ul><ul><ul><li>Reliable anchors or seeds around which tagging is done </li></ul></ul><ul><li>Brill Tagger facilitates pre-tagging </li></ul><ul><ul><li>Pre-tag not always respected ! </li></ul></ul><ul><li>Mona /NNP will /MD sit /VB in /IN the /DT </li></ul><ul><li>pretty /RB chair // VB this /DT time /NN </li></ul>
  15. 15. Contextual Rules <ul><li>Initial state tagger – assigns most frequent tag for a type based on entries in a Lexicon (pre-tag respected) </li></ul><ul><li>Final state tagger – may modify tag of word based on context (pre-tag not given special treatment) </li></ul><ul><li>Relevant Lexicon Entries (type: most frequent tag; other possible tags) </li></ul><ul><ul><li>chair: NN (noun); VB (verb) </li></ul></ul><ul><ul><li>pretty: RB (adverb); JJ (adjective) </li></ul></ul><ul><li>Relevant Contextual Rules (current tag, new tag, when) </li></ul><ul><ul><li>NN to VB when NEXTTAG DT </li></ul></ul><ul><ul><li>RB to JJ when NEXTTAG NN </li></ul></ul>
  16. 16. Guaranteed Pre-Tagging <ul><li>A patch to the tagger provided – BrillPatch </li></ul><ul><ul><li>Application of contextual rules to the pre-tagged words bypassed </li></ul></ul><ul><ul><li>Application of contextual rules to non pre-tagged words unchanged. </li></ul></ul><ul><ul><ul><li>Mona /NNP will /MD sit /VB in /IN the /DT </li></ul></ul></ul><ul><ul><ul><li>pretty /JJ chair //NN this /DT time /NN </li></ul></ul></ul><ul><li>Tag of chair retained as NN </li></ul><ul><ul><li>Contextual rule to change tag of chair from NN to VB not applied </li></ul></ul><ul><li>Tag of pretty transformed </li></ul><ul><ul><li>Contextual rule to change tag of pretty from RB to JJ applied </li></ul></ul>
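The difference between regular and guaranteed pre-tagging can be sketched with the lexicon entries and the two contextual rules from the slides. This is a simplified toy, not the Brill Tagger or BrillPatch: the function `tag`, the `guaranteed` flag and the lexicon entries for words other than chair and pretty are assumptions made for the example.

```python
# Most frequent tag per word type. Only "chair" and "pretty" come from the
# slides; the remaining entries are assumed so the example sentence can be tagged.
LEXICON = {
    "Mona": "NNP", "will": "MD", "sit": "VB", "in": "IN", "the": "DT",
    "pretty": "RB", "chair": "NN", "this": "DT", "time": "NN",
}

# Contextual rules from the slides: (current tag, new tag, required NEXTTAG).
RULES = [("NN", "VB", "DT"), ("RB", "JJ", "NN")]


def tag(tokens, guaranteed):
    """Tag tokens; a token written as word//TAG is pre-tagged with TAG."""
    words, tags, pretagged = [], [], []
    for token in tokens:
        word, sep, pretag = token.partition("//")
        words.append(word)
        # Initial state: the pre-tag is respected, otherwise use the lexicon.
        tags.append(pretag if sep else LEXICON[word])
        pretagged.append(bool(sep))
    # Final state: apply each contextual rule across the sentence in turn.
    for old, new, next_tag in RULES:
        for i in range(len(tags) - 1):
            if guaranteed and pretagged[i]:
                continue  # guaranteed pre-tagging: never rewrite a pre-tag
            if tags[i] == old and tags[i + 1] == next_tag:
                tags[i] = new
    return list(zip(words, tags))


sentence = "Mona will sit in the pretty chair//NN this time".split()
print(tag(sentence, guaranteed=False))
print(tag(sentence, guaranteed=True))
```

Without the guarantee, chair is rewritten to VB and pretty therefore stays RB; with it, chair stays NN and the second rule turns pretty into JJ, matching the two tagged sentences on the slides.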
  17. 17. Part of Speech Features <ul><li>A word in different parts of speech has different senses </li></ul><ul><li>A word used in different senses is likely to have different sets of pos around it </li></ul><ul><li>Why did jack turn /VB against /IN his /PRP$ team /NN </li></ul><ul><li>Why did jack turn /VB left /VBN at /IN the /DT crossing </li></ul><ul><li>Features used </li></ul><ul><ul><li>Individual word POS: P -2 , P -1 , P 0 , P 1 , P 2 * </li></ul></ul><ul><ul><ul><li>P 2 = JJ implies P 2 is an adjective </li></ul></ul></ul><ul><ul><li>Sequential POS: P -1 P 0 , P -1 P 0 P 1 , and so on </li></ul></ul><ul><ul><ul><li>P -1 P 0 = NN, VB implies P -1 is a noun and P 0 is a verb </li></ul></ul></ul><ul><ul><li>A combination of the above </li></ul></ul>
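A minimal sketch of how the individual and sequential POS features could be read off a tagged sentence. The function name `pos_features`, the `P-2` to `P2` dictionary keys and the use of `None` for positions outside the sentence are assumptions made for this example, not details from the thesis.

```python
def pos_features(tags, target, window=2):
    """Map P-2 .. P2 to the tags around position `target` (None if out of range)."""
    features = {}
    for offset in range(-window, window + 1):
        i = target + offset
        features[f"P{offset}"] = tags[i] if 0 <= i < len(tags) else None
    return features


tagged = "Harry/NNP cast/VBD a/DT bewitching/JJ spell/NN".split()
tags = [token.split("/")[1] for token in tagged]

features = pos_features(tags, target=4)  # the target word is "spell"
print(features)

# A sequential feature such as P-1 P0 is the tuple of the individual tags.
print((features["P-1"], features["P0"]))
```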
  18. 18. Parse Features <ul><li>Collins Parser used to parse the data </li></ul><ul><ul><li>Source code available </li></ul></ul><ul><ul><li>Uses part of speech tagged data as input </li></ul></ul><ul><li>Head word of a phrase </li></ul><ul><ul><li>the hard work , the hard surface </li></ul></ul><ul><ul><li>Phrase itself : noun phrase, verb phrase and so on </li></ul></ul><ul><li>Parent : Head word of the parent phrase </li></ul><ul><ul><li>fasten the line , cross the line </li></ul></ul><ul><ul><li>Parent Phrase </li></ul></ul>
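The four parse features (head word, phrase, parent head word, parent phrase) can be illustrated on a small hand-built tree. In this sketch the `Phrase` class and `parse_features` function are hypothetical, and the head word of every phrase is supplied by hand rather than computed by a parser's head rules.

```python
from dataclasses import dataclass, field


@dataclass
class Phrase:
    label: str                                     # e.g. "NP", "VP"
    head: str                                      # head word, given by hand here
    children: list = field(default_factory=list)   # Phrase objects or word strings


def parse_features(phrase, target, parent=None):
    """Find the innermost phrase containing `target` and describe it and its parent."""
    for child in phrase.children:
        if isinstance(child, Phrase):
            found = parse_features(child, target, phrase)
            if found:
                return found
        elif child == target:
            return {
                "head": phrase.head,
                "phrase": phrase.label,
                "parent": parent.head if parent else None,
                "parent_phrase": parent.label if parent else None,
            }
    return None


tree = Phrase("S", "cast", [
    Phrase("NP", "Harry", ["Harry"]),
    Phrase("VP", "cast", [
        "cast",
        Phrase("NP", "spell", ["a", "bewitching", "spell"]),
    ]),
])

print(parse_features(tree, "spell"))
```

For the target word spell, the enclosing phrase is a noun phrase headed by spell, and its parent is the verb phrase headed by cast.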
  19. 19. Sample Parse Tree (SENTENCE (NOUN PHRASE Harry/NNP) (VERB PHRASE cast/VBD (NOUN PHRASE a/DT bewitching/JJ spell/NN)))
  20. 20. Path Map <ul><li>Introduction </li></ul><ul><li>Background </li></ul><ul><li>Data </li></ul><ul><li>Experiments </li></ul><ul><li>Conclusions </li></ul>
  21. 21. Sense-Tagged Data <ul><li>Senseval-2 data </li></ul><ul><ul><li>4,328 instances of test data and 8,611 instances of training data ranging over 73 different nouns, verbs and adjectives. </li></ul></ul><ul><li>Senseval-1 data </li></ul><ul><ul><li>8,512 test instances and 13,276 training instances, ranging over 35 nouns, verbs and adjectives. </li></ul></ul><ul><li>Line, hard, serve, interest data </li></ul><ul><ul><li>4,149, 4,337, 4,378 and 2,476 sense-tagged instances with line, hard, serve and interest as the head words, respectively. </li></ul></ul><ul><ul><li>Around 50,000 sense-tagged instances in all! </li></ul></ul>
  22. 22. Data Processing <ul><li>Packages to convert line, hard, serve and interest data to Senseval-1 and Senseval-2 data formats </li></ul><ul><li>refine preprocesses data in Senseval-2 data format to make it suitable for tagging </li></ul><ul><ul><li>Restores one sentence per line and one line per sentence, pre-tags the target words, splits long sentences </li></ul></ul><ul><li>posSenseval part-of-speech tags any data in Senseval-2 data format </li></ul><ul><ul><li>Brill Tagger along with Guaranteed Pre-tagging utilized </li></ul></ul><ul><li>parseSenseval parses data in the format output by the Brill Tagger </li></ul><ul><ul><li>Restores XML tags, creating a parsed file in Senseval-2 data format </li></ul></ul><ul><ul><li>Uses the Collins Parser </li></ul></ul>
  23. 23. Sample line data instance <ul><li>Original instance: </li></ul><ul><li>art} aphb 01301041: </li></ul><ul><li>" There's none there . " He hurried outside to see if there were any dry ones on the line . </li></ul><ul><li>Senseval-2 data format: </li></ul><ul><li><instance id="line-n.art} aphb 01301041:"> </li></ul><ul><li><answer instance="line-n.art} aphb 01301041:" senseid="cord"/> </li></ul><ul><li><context> </li></ul><ul><li><s> " There's none there . " </s> <s> He hurried outside to see if there were any dry ones on the <head> line </head> . </s> </li></ul><ul><li></context> </li></ul><ul><li></instance> </li></ul>
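The conversion shown on this slide can be approximated with a short Python sketch. It is not the conversion package from the thesis: the function `to_senseval2` is hypothetical, and it assumes the target is simply the first token equal to the target word, whereas the slides note that identifying the target word in the line, hard and serve data needs more care.

```python
from xml.sax.saxutils import escape, quoteattr


def to_senseval2(instance_id, sense, sentences, target):
    """Wrap tokenized sentences in Senseval-2 style tags, marking the target with <head>."""
    marked = False
    wrapped = []
    for sentence in sentences:
        tokens = []
        for token in sentence.split():
            if token == target and not marked:
                # Assumption: the first matching token is the target word.
                tokens.append(f"<head> {escape(token)} </head>")
                marked = True
            else:
                tokens.append(escape(token))
        wrapped.append("<s> " + " ".join(tokens) + " </s>")
    return "\n".join([
        f"<instance id={quoteattr(instance_id)}>",
        f"<answer instance={quoteattr(instance_id)} senseid={quoteattr(sense)}/>",
        "<context>",
        " ".join(wrapped),
        "</context>",
        "</instance>",
    ])


xml = to_senseval2(
    "line-n.art} aphb 01301041:",
    "cord",
    ['" There\'s none there . "',
     "He hurried outside to see if there were any dry ones on the line ."],
    "line",
)
print(xml)
```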
  24. 24. Sample Output from parseSenseval <ul><li><instance id="harry"> </li></ul><ul><li><answer instance="harry" senseid="incantation"/> </li></ul><ul><li><context> </li></ul><ul><li>Harry cast a bewitching <head> spell </head> </li></ul><ul><li></context> </li></ul><ul><li></instance> </li></ul><ul><li><instance id="harry"> </li></ul><ul><li><answer instance="harry" senseid="incantation"/> </li></ul><ul><li><context> </li></ul><ul><li><P="TOP~cast~1~1"> <P="S~cast~2~2"> <P="NPB~Potter~2~2"> Harry </li></ul><ul><li><p="NNP"/> <P="VP~cast~2~1"> cast <p="VB"/> <P="NPB~spell~3~3"> </li></ul><ul><li>a <p="DT"/> bewitching <p="JJ"/> spell <p="NN"/> </P> </P> </P> </P> </li></ul><ul><li></context> </li></ul><ul><li></instance> </li></ul>
  25. 25. Issues… <ul><li>How is the target word identified in line , hard and serve data </li></ul><ul><li>How the data is tokenized for better quality pos tagging and parsing </li></ul><ul><li>How is the data pre-tagged </li></ul><ul><li>How is parse output of Collins Parser interpreted </li></ul><ul><li>How is the parsed output XML’ized and brought back to Senseval-2 data format </li></ul><ul><li>Idiosyncrasies of line , hard , serve , interest , Senseval-1 and Senseval-2 data and how they are handled </li></ul>
  26. 26. Path Map <ul><li>Introduction </li></ul><ul><li>Background </li></ul><ul><li>Data </li></ul><ul><li>Experiments </li></ul><ul><li>Conclusions </li></ul>
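  27. 27. Surface Forms Senseval-1 & Senseval-2
<table>
<tr><th>Features</th><th>Senseval-1</th><th>Senseval-2</th></tr>
<tr><td>Majority</td><td>56.3%</td><td>47.7%</td></tr>
<tr><td>Surface Form</td><td>62.9%</td><td>49.3%</td></tr>
<tr><td>Unigrams</td><td>66.9%</td><td>55.3%</td></tr>
<tr><td>Bigrams</td><td>66.9%</td><td>55.1%</td></tr>
</table>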
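  28. 28. Individual Word POS (Senseval-1)
<table>
<tr><th>Feature</th><th>Adj.</th><th>Verbs</th><th>Nouns</th><th>All</th></tr>
<tr><td>Majority</td><td>64.3%</td><td>56.9%</td><td>57.2%</td><td>56.3%</td></tr>
<tr><td>P -2</td><td>64.0%</td><td>58.6%</td><td>58.2%</td><td>57.5%</td></tr>
<tr><td>P -1</td><td>64.3%</td><td>58.2%</td><td>62.2%</td><td>59.2%</td></tr>
<tr><td>P 0</td><td>64.3%</td><td>58.2%</td><td>62.5%</td><td>60.3%</td></tr>
<tr><td>P 1</td><td>66.2%</td><td>64.4%</td><td>65.4%</td><td>63.9%</td></tr>
<tr><td>P 2</td><td>65.2%</td><td>60.8%</td><td>60.0%</td><td>59.9%</td></tr>
</table>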
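  29. 29. Individual Word POS (Senseval-2)
<table>
<tr><th>Feature</th><th>Adj.</th><th>Verbs</th><th>Nouns</th><th>All</th></tr>
<tr><td>Majority</td><td>59.0%</td><td>39.7%</td><td>51.0%</td><td>47.7%</td></tr>
<tr><td>P -2</td><td>57.9%</td><td>38.0%</td><td>51.9%</td><td>47.1%</td></tr>
<tr><td>P -1</td><td>59.0%</td><td>40.2%</td><td>55.2%</td><td>49.6%</td></tr>
<tr><td>P 0</td><td>58.2%</td><td>40.6%</td><td>55.7%</td><td>49.9%</td></tr>
<tr><td>P 1</td><td>61.0%</td><td>49.1%</td><td>53.8%</td><td>53.1%</td></tr>
<tr><td>P 2</td><td>59.4%</td><td>43.2%</td><td>50.2%</td><td>48.9%</td></tr>
</table>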
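  30. 30. Combining POS Features
<table>
<tr><th>Data</th><th>Majority</th><th>P 0 , P 1</th><th>P -1 , P 0 , P 1</th><th>P -2 , P -1 , P 0 , P 1 , P 2</th></tr>
<tr><td>Senseval-2</td><td>47.7%</td><td>54.3%</td><td>54.6%</td><td>54.6%</td></tr>
<tr><td>Senseval-1</td><td>56.3%</td><td>66.7%</td><td>68.0%</td><td>67.8%</td></tr>
<tr><td>line</td><td>54.3%</td><td>54.1%</td><td>60.4%</td><td>62.3%</td></tr>
</table>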
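  31. 31. Effect of Guaranteed Pre-tagging on WSD
<table>
<tr><th>Features</th><th>Senseval-1 Reg. P.</th><th>Senseval-1 Guar. P.</th><th>Senseval-2 Reg. P.</th><th>Senseval-2 Guar. P.</th></tr>
<tr><td>P -1 , P 0</td><td>62.1%</td><td>62.2%</td><td>50.9%</td><td>50.8%</td></tr>
<tr><td>P 0 , P 1</td><td>66.7%</td><td>66.7%</td><td>53.8%</td><td>54.3%</td></tr>
<tr><td>P -2 , P -1 , P 0 , P 1 , P 2</td><td>66.1%</td><td>67.8%</td><td>54.1%</td><td>54.6%</td></tr>
<tr><td>P -1 P 0 , P 0 P 1</td><td>66.3%</td><td>66.7%</td><td>53.7%</td><td>54.0%</td></tr>
<tr><td>P -1 , P 0 , P 1</td><td>67.6%</td><td>68.0%</td><td>54.7%</td><td>54.6%</td></tr>
</table>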
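  32. 32. Parse Features (Senseval-1)
<table>
<tr><th>Feature</th><th>Adj.</th><th>Verbs</th><th>Nouns</th><th>All</th></tr>
<tr><td>Majority</td><td>64.3%</td><td>56.9%</td><td>57.2%</td><td>56.3%</td></tr>
<tr><td>Head</td><td>66.9%</td><td>59.8%</td><td>70.9%</td><td>64.3%</td></tr>
<tr><td>Par. Phr.</td><td>66.2%</td><td>58.3%</td><td>58.1%</td><td>57.9%</td></tr>
<tr><td>Phrase</td><td>66.2%</td><td>57.2%</td><td>57.5%</td><td>58.5%</td></tr>
<tr><td>Parent</td><td>65.8%</td><td>60.3%</td><td>62.6%</td><td>60.6%</td></tr>
</table>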
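  33. 33. Parse Features (Senseval-2)
<table>
<tr><th>Feature</th><th>Adj.</th><th>Verbs</th><th>Nouns</th><th>All</th></tr>
<tr><td>Majority</td><td>59.0%</td><td>39.7%</td><td>51.0%</td><td>47.7%</td></tr>
<tr><td>Head</td><td>64.0%</td><td>39.8%</td><td>58.5%</td><td>51.7%</td></tr>
<tr><td>Par. Phr.</td><td>60.3%</td><td>39.1%</td><td>53.0%</td><td>48.5%</td></tr>
<tr><td>Phrase</td><td>59.5%</td><td>40.3%</td><td>51.7%</td><td>48.3%</td></tr>
<tr><td>Parent</td><td>59.3%</td><td>40.1%</td><td>56.1%</td><td>50.0%</td></tr>
</table>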
  34. 34. Thoughts… <ul><li>Both lexical and syntactic features perform comparably </li></ul><ul><li>But do they get the same instances right? </li></ul><ul><ul><li>How redundant are the individual feature sets? </li></ul></ul><ul><li>Are there instances correctly disambiguated by one feature set and not by the other? </li></ul><ul><ul><li>How complementary are the individual feature sets? </li></ul></ul><ul><ul><li>Is the effort to combine lexical and syntactic features justified? </li></ul></ul>
  35. 35. Measures <ul><li>Baseline Ensemble: accuracy of a hypothetical ensemble which predicts the sense correctly only if both individual feature sets do so </li></ul><ul><ul><li>Quantifies redundancy amongst feature sets </li></ul></ul><ul><li>Optimal Ensemble: accuracy of a hypothetical ensemble which predicts the sense correctly if either of the individual feature sets does so </li></ul><ul><ul><li>Difference with individual accuracies quantifies complementarity </li></ul></ul><ul><li>We used a simple ensemble which sums the probabilities assigned to each sense by the individual feature sets to decide the intended sense </li></ul>
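The three quantities on the Measures slide can be written down directly. The sketch below is an illustration under stated assumptions: the function names, the toy predictions and the probability values are invented for the example and are not results from the thesis.

```python
def baseline_ensemble(gold, pred_a, pred_b):
    """Fraction of instances that BOTH feature sets disambiguate correctly."""
    both = sum(a == g and b == g for g, a, b in zip(gold, pred_a, pred_b))
    return both / len(gold)


def optimal_ensemble(gold, pred_a, pred_b):
    """Fraction of instances that AT LEAST ONE feature set disambiguates correctly."""
    either = sum(a == g or b == g for g, a, b in zip(gold, pred_a, pred_b))
    return either / len(gold)


def sum_ensemble(probs_a, probs_b):
    """Pick the sense whose summed probability over the two feature sets is highest."""
    senses = sorted(set(probs_a) | set(probs_b))
    return max(senses, key=lambda s: probs_a.get(s, 0.0) + probs_b.get(s, 0.0))


# Toy data (hypothetical): gold senses and the predictions of two classifiers.
gold = ["cord", "phone", "cord", "text"]
lexical = ["cord", "phone", "text", "cord"]
syntactic = ["cord", "cord", "cord", "cord"]

print(baseline_ensemble(gold, lexical, syntactic))  # 0.25
print(optimal_ensemble(gold, lexical, syntactic))   # 0.75
print(sum_ensemble({"cord": 0.6, "phone": 0.4}, {"cord": 0.3, "phone": 0.7}))  # phone
```

The gap between the optimal ensemble and the better individual accuracy (0.75 versus 0.5 in this toy data) is what the slide calls complementarity.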
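  36. 36. Best Combinations
<table>
<tr><th>Data</th><th>Set 1</th><th>Set 1 accuracy</th><th>Set 2</th><th>Set 2 accuracy</th><th>Base</th><th>Maj.</th><th>Ens.</th><th>Opt.</th></tr>
<tr><td>Sval2</td><td>Unigrams</td><td>55.3%</td><td>P -1 , P 0 , P 1</td><td>55.3%</td><td>43.6%</td><td>47.7%</td><td>57.0%</td><td>67.9%</td></tr>
<tr><td>Sval1</td><td>Unigrams</td><td>66.9%</td><td>P -1 , P 0 , P 1</td><td>68.0%</td><td>57.6%</td><td>56.3%</td><td>71.1%</td><td>78.0%</td></tr>
<tr><td>line</td><td>Unigrams</td><td>74.5%</td><td>P -1 , P 0 , P 1</td><td>60.4%</td><td>55.1%</td><td>54.3%</td><td>74.2%</td><td>82.0%</td></tr>
<tr><td>hard</td><td>Bigrams</td><td>89.5%</td><td>Head, Par</td><td>87.7%</td><td>86.1%</td><td>81.5%</td><td>88.9%</td><td>91.3%</td></tr>
<tr><td>serve</td><td>Unigrams</td><td>73.3%</td><td>P -1 , P 0 , P 1</td><td>73.0%</td><td>58.4%</td><td>42.2%</td><td>81.6%</td><td>89.9%</td></tr>
<tr><td>interest</td><td>Bigrams</td><td>79.9%</td><td>P -1 , P 0 , P 1</td><td>78.8%</td><td>67.6%</td><td>54.9%</td><td>83.2%</td><td>90.1%</td></tr>
</table>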
  37. 37. Path Map <ul><li>Introduction </li></ul><ul><li>Background </li></ul><ul><li>Data </li></ul><ul><li>Experiments </li></ul><ul><li>Conclusions </li></ul>
  38. 38. Conclusions <ul><li>Significant amount of complementarity across lexical and syntactic features </li></ul><ul><ul><li>Combination of the two justified </li></ul></ul><ul><li>Part of speech of the word immediately to the right of the target word found most useful </li></ul><ul><ul><li>POS of the word immediately to the right of the target word best for verbs and adjectives </li></ul></ul><ul><ul><li>Nouns helped by tags on either side </li></ul></ul><ul><li>Head word of phrase particularly useful for adjectives </li></ul><ul><ul><li>Nouns helped by both head and parent </li></ul></ul>
  39. 39. Other Contributions <ul><li>Converted line , hard , serve and interest data into Senseval-2 data format </li></ul><ul><li>Part-of-speech tagged and parsed the Senseval-2, Senseval-1, line , hard , serve and interest data </li></ul><ul><li>Developed the Guaranteed Pre-tagging mechanism to improve quality of POS tagging </li></ul><ul><ul><li>Showed that guaranteed pre-tagging improves WSD </li></ul></ul>
  40. 40. Code, Data, Resources and Publication <ul><li>posSenseval : part-of-speech tags any data in Senseval-2 data format </li></ul><ul><li>parseSenseval : parses data in the format output by the Brill Tagger. Output is in Senseval-2 data format with part of speech and parse information as XML tags. </li></ul><ul><li>Packages to convert line, hard, serve and interest data to Senseval-1 and Senseval-2 data formats </li></ul><ul><li>BrillPatch : Patch to Brill Tagger to employ Guaranteed Pre-Tagging </li></ul><ul><li>http://www.d.umn.edu/~tpederse/data.html </li></ul><ul><li>Brill Tagger : http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z </li></ul><ul><li>Collins Parser : http://www.ai.mit.edu/people/mcollins </li></ul><ul><li>“ Guaranteed Pre-Tagging for the Brill Tagger ”, Mohammad and Pedersen, Fourth International Conference of Intelligent Systems and Text Processing, February 2003, Mexico </li></ul>
  41. 41. Thank You
