Learning to Link with Wikipedia

1. Learning to Link with Wikipedia by Milne and Witten
2. - Exploits Wikipedia, the largest known knowledge base
   - Enriches unstructured text with links to Wikipedia articles
   - Implications:
     - Provides structured knowledge
     - Tasks such as indexing, clustering, and retrieval can use this technique instead of a bag-of-words representation
3. Wikipedia
   - The largest and most visited encyclopaedia
   - Densely structured: millions of articles with hundreds of millions of links
   - Encourages serendipitous encounters
   - A "small world": any article is just 4.5 links away from any other
   - How can this structure be extended to ALL documents?
4. Enter Wikification
5. Related work
   - Wikify (Mihalcea and Csomai)
     - Detection: based on link probabilities
     - Disambiguation: extracts features from the phrase and its surrounding words and compares them against training samples from Wikipedia
     - Topic detection
6. Machine learning approach to disambiguation
   - Uses the links in Wikipedia itself for training: 500 training articles yield roughly 50,000 links and 1.8 million instances
   - Commonness: the prior probability of a sense
   - Relatedness: a link-based measure
     - Unambiguous links serve as context for disambiguating ambiguous terms
     - The chosen sense is the candidate article that has the most in common with all the context articles
     - R(a,b) = (log(max(|A|,|B|)) - log(|A ∩ B|)) / (log(|W|) - log(min(|A|,|B|))), where A and B are the sets of articles that link to a and b, and W is the set of all Wikipedia articles (a sketch of this measure follows below)
     - R(candidate) is the weighted average of its relatedness to each context article
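A minimal sketch of that measure, assuming the sets of in-linking articles have already been extracted from a Wikipedia dump. As written on the slide, R behaves like a distance (lower values mean more shared in-links); implementations often invert it into a similarity. The toy numbers below are illustrative only.

```python
import math

def relatedness(in_links_a, in_links_b, total_articles):
    """The slide's R(a,b): a Normalized Google Distance-style measure over
    the sets of articles that link to a and b. Lower values mean the two
    articles share more of their in-links, i.e. are more closely related."""
    a, b = set(in_links_a), set(in_links_b)
    common = a & b
    if not common:
        return 1.0  # no shared in-links: treat as maximally distant (assumption)
    return ((math.log(max(len(a), len(b))) - math.log(len(common)))
            / (math.log(total_articles) - math.log(min(len(a), len(b)))))

# Toy example: two articles sharing two in-links, in a wiki of |W| = 1000.
print(relatedness({1, 2, 3, 4}, {3, 4, 5}, 1000))
```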
7. Weighting the context terms
   - Not all context terms are equally useful: link probability is used to weight them
   - Many context terms are outliers that do not relate to the document's central theme: the relatedness measure gives each term's average semantic relatedness to all other context terms
   - These two scores are averaged into a weight for each context term, which is then used when computing R(candidate), as sketched below
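A sketch of that weighting scheme. The `link_prob` table and the similarity-style `rel(a, b)` function (higher = more related) are assumptions, and the toy values are invented for illustration.

```python
def weight_context_terms(context, link_prob, rel):
    """Weight each unambiguous context article by averaging two signals:
    the link probability of its anchor text, and its mean semantic
    relatedness to every other context article (outliers score low)."""
    weights = {}
    for a in context:
        others = [b for b in context if b != a]
        avg_rel = sum(rel(a, b) for b in others) / len(others) if others else 0.0
        weights[a] = (link_prob[a] + avg_rel) / 2
    return weights

def weighted_relatedness(candidate, context, weights, rel):
    """R(candidate): the weighted average of the candidate sense's
    relatedness to each context article."""
    total = sum(weights[a] for a in context)
    return sum(weights[a] * rel(candidate, a) for a in context) / total

# Toy usage with three context articles and a made-up relatedness function.
ctx = ["NLP", "Machine learning", "Wikipedia"]
lp = {"NLP": 0.4, "Machine learning": 0.5, "Wikipedia": 0.3}
sim = lambda a, b: 0.6 if {a, b} <= {"NLP", "Machine learning"} else 0.2
w = weight_context_terms(ctx, lp, sim)
print(weighted_relatedness("Text mining", ctx, w, sim))
```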
8. Combining the features
   - Two features so far: the commonness of each sense and its relatedness to the surrounding context
   - But how good is the context itself? A third feature, context quality, is the sum of the weights assigned to the context terms
   - These three features train a classifier that distinguishes valid senses from invalid ones (a toy example follows)
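A toy illustration of that training step. The paper uses C4.5; scikit-learn's CART decision tree stands in here, and every number below is invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row describes one candidate sense of an ambiguous anchor using the
# three features from the slide; the label says whether Wikipedia editors
# actually linked to that sense.
X = [
    # [commonness, relatedness, context_quality]
    [0.92, 0.61, 3.4],  # common, related sense in a strong context -> valid
    [0.05, 0.12, 3.4],  # rare, unrelated sense                     -> invalid
    [0.40, 0.70, 1.1],  # less common but strongly related          -> valid
    [0.55, 0.08, 1.1],  # common but off-topic                      -> invalid
]
y = [1, 0, 1, 0]

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[0.80, 0.50, 2.0]]))  # classify a new candidate sense
```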
9. Configuring the classifier
   - The probability of senses follows a power law, so unlikely senses can be safely ignored by choosing a threshold parameter
   - This improves performance and precision but decreases recall
   - A probability threshold of 2% is used (see the sketch below)
   - The C4.5 algorithm was chosen after trying several classifiers
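The thresholding step itself is simple; a minimal sketch, where the sense names and probabilities are invented for illustration:

```python
def plausible_senses(commonness, threshold=0.02):
    """Discard candidate senses whose prior probability (commonness) falls
    below the 2% threshold; because sense probabilities follow a power law,
    the discarded senses are almost never the intended meaning."""
    return {s: p for s, p in commonness.items() if p >= threshold}

# The rare set-theory sense is dropped; the other two survive.
print(plausible_senses({"Tree": 0.93,
                        "Tree (data structure)": 0.05,
                        "Tree (set theory)": 0.003}))
```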
10. Learning to detect links
   - Wikify uses link probabilities with no consideration of context, which is error prone
   - Instead, gather all n-grams in the document and retain those whose link probability exceeds a low threshold (defined later), as sketched below
   - The remaining phrases are disambiguated
   - The automatically identified Wikipedia articles then provide training instances for a link classifier
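A sketch of that detection front end, assuming a precomputed `link_prob` table (phrase -> fraction of its occurrences in Wikipedia that are links) built from a dump; the maximum n-gram length of 5 is an assumption.

```python
def candidate_links(tokens, link_prob, max_n=5, threshold=0.065):
    """Gather every n-gram in the document and keep those whose link
    probability exceeds a low threshold (the slides later set it to 6.5%);
    only these survive to the disambiguation stage."""
    kept = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if link_prob.get(phrase, 0.0) > threshold:
                kept.append((i, phrase))
    return kept

# Toy usage: only "machine learning" clears the threshold.
probs = {"machine learning": 0.31, "the": 0.0001}
print(candidate_links("we study the machine learning problem".split(), probs))
```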
11. Features used by the links classifier (see the sketch below)
   - Link probability: each training instance involves several candidate link locations, giving multiple link probabilities; these are combined into an average and a maximum (the maximum is more indicative of links)
   - Relatedness: topics that relate to the central thread of the document are more likely to be linked
   - Disambiguation confidence: the average and maximum disambiguation probability
   - Generality: links to specific topics are more useful than links to general ones
   - Location and spread: frequency, first occurrence, last occurrence
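A sketch of how the per-occurrence scores might be folded into these features; normalising the location features by document length is an assumption.

```python
def avg_and_max(values):
    """A topic may appear at several candidate link locations; its scores
    are combined into an average and a maximum, the maximum being the
    more indicative of a genuine link."""
    return sum(values) / len(values), max(values)

def location_features(positions, doc_length):
    """Location and spread of a topic's mentions: frequency, first and
    last occurrence (normalised by document length), and their spread."""
    first, last = min(positions) / doc_length, max(positions) / doc_length
    return len(positions), first, last, last - first

print(avg_and_max([0.04, 0.21, 0.09]))       # -> (0.113..., 0.21)
print(location_features([12, 48, 95], 100))  # -> (3, 0.12, 0.95, 0.83)
```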
12. Training the links classifier
   - The same 500 articles used to train the disambiguation classifier are used here
   - The threshold for discarding nonsense phrases and stop words is set to a minimum link probability of 6.5%
   - The disambiguation classifier initially performed poorly here: it was trained on links but applied to raw text; this was resolved by modifying the trainer to account for unambiguous terms as well
   - Several classifiers were evaluated and bagged C4.5 was chosen
13. Evaluation
   - Trained in 37 minutes and tested in 8 minutes
