Machine Learning Applications in NLP.ppt



  1. 1. Machine Learning Real World Applications Rob Jasper Intelligent Results [email_address]
  2. 2. Goals <ul><li>Convince you that: </li></ul><ul><ul><li>You can use machine learning (ML) techniques to solve difficult real world problems </li></ul></ul><ul><ul><li>Real world programmers / programs use ML techniques </li></ul></ul><ul><ul><li>Applications for ML abound (especially in text processing) </li></ul></ul><ul><li>Provide overview </li></ul><ul><ul><li>Variety of applications in just one small area (text processing) </li></ul></ul><ul><ul><li>Classification is the quintessential ML problem </li></ul></ul><ul><ul><li>Variety of techniques for solving classification problems </li></ul></ul><ul><ul><li>Issues involved in building classifiers </li></ul></ul><ul><ul><li>Advanced techniques for dealing with particular problems </li></ul></ul>
  3. 3. Overview <ul><li>Machine Learning </li></ul><ul><ul><li>Definition </li></ul></ul><ul><ul><li>Supervised versus unsupervised </li></ul></ul><ul><li>Machine Learning in NLP </li></ul><ul><ul><li>Part of Speech (PoS) tagging </li></ul></ul><ul><ul><li>Named entity extraction </li></ul></ul><ul><ul><li>Key phrase extraction </li></ul></ul><ul><ul><li>Spelling correction </li></ul></ul><ul><ul><li>(Text) classification </li></ul></ul><ul><li>The quintessential ML problem </li></ul><ul><li>Classification techniques </li></ul><ul><ul><li>K nearest neighbor </li></ul></ul><ul><ul><li>Rocchio </li></ul></ul><ul><ul><li>Support Vector Machines (SVM) </li></ul></ul><ul><ul><li>Ensemble Techniques </li></ul></ul><ul><ul><ul><li>Bagging </li></ul></ul></ul><ul><ul><ul><li>Boosting </li></ul></ul></ul>
  4. 4. Machine Learning <ul><li>“Machine Learning is the study of computer algorithms that improve automatically through experience.” -- Tom Mitchell </li></ul><ul><li>“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” -- Tom Mitchell </li></ul>
  5. 5. Machine Learning Example <ul><li>Backgammon </li></ul><ul><ul><li>T : playing backgammon </li></ul></ul><ul><ul><li>P : percent of games won against opponents </li></ul></ul><ul><ul><li>E : playing practice games against itself </li></ul></ul><ul><li>TD-Gammon (Tesauro 1992, 1995) learned to play at the level of world champions by playing games against itself </li></ul><ul><li>What are other approaches to this problem? </li></ul>
  6. 6. Unsupervised versus Supervised Learning <ul><li>Unsupervised learning </li></ul><ul><ul><li>“Learning in which the system parameters are adapted using only the information of the input and are constrained by prespecified internal rules.” </li></ul></ul><ul><li>Supervised learning </li></ul><ul><ul><li>“Learning or adaptation in which a desired response can be used by the system to guide the learning.” </li></ul></ul><ul><li>Is learning backgammon supervised or unsupervised? </li></ul>
  7. 7. Problem Setting <ul><li>From Machine Learning, Tom Mitchell, 1997 </li></ul><ul><ul><li>X: the set of instances over which target functions can be defined </li></ul></ul><ul><ul><li>C: the set of target concepts our learner might want to learn </li></ul></ul><ul><ul><li>Each concept c in C can be viewed as a subset of X </li></ul></ul><ul><ul><li>Training examples are generated by drawing an instance x from X at random according to some distribution D </li></ul></ul>
  8. 8. Concepts and training examples [Figure: instance space X containing target concept c, with positive (+) and negative (-) training examples]
  9. 9. General Model of Learning <ul><li>General model of learning </li></ul><ul><ul><li>Learner L considers a set of hypotheses H based on properties of x </li></ul></ul><ul><ul><li>L observes a sequence of training examples </li></ul></ul><ul><ul><ul><li>x </li></ul></ul></ul><ul><ul><ul><li>c(x) </li></ul></ul></ul><ul><ul><li>L outputs hypothesis h, which is its estimate of c </li></ul></ul><ul><li>We evaluate h over new instances of X according to D </li></ul>
  10. 10. Error of hypothesis [Figure: instance space X with target concept c, hypothesis h, and positive (+) and negative (-) examples; the error is the region where c and h disagree]
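The region where c and h disagree can be stated as a probability. Following the definition of true error in Mitchell (1997), and using the same symbols as the preceding slides (instances x drawn according to D, target concept c, hypothesis h):

```latex
\mathrm{error}_D(h) \equiv \Pr_{x \sim D}\left[\, c(x) \neq h(x) \,\right]
```

In words: the error of h is the probability that h misclassifies an instance drawn at random according to D.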
  11. 11. An Operational Model of Machine Learning [Figure: a learner builds a model from training data; an execution engine applies the model to production data and outputs tagged data]
  12. 12. Machine Learning in Natural Language Processing <ul><li>NLP: “The branch of information science that deals with processing natural language” </li></ul><ul><li>Applications include </li></ul><ul><ul><li>Part of Speech (PoS) tagging </li></ul></ul><ul><ul><li>Named entity extraction </li></ul></ul><ul><ul><li>Key phrase extraction </li></ul></ul><ul><ul><li>Spelling correction </li></ul></ul><ul><ul><li>(Text) categorization </li></ul></ul>
  13. 13. PoS Tagging <ul><li>PoS tagging </li></ul><ul><ul><li>Task (T): tag word tokens with the correct part of speech </li></ul></ul><ul><ul><li>Measure (P): percent of correctly tagged words </li></ul></ul><ul><ul><li>Experience (E): manually tagged text </li></ul></ul><ul><li>Input: “The dogmatic dog danced delightfully.” </li></ul><ul><li>Output: “The<article> dogmatic<adjective> dog<noun> danced<verb> delightfully<adverb>” </li></ul><ul><li>2002-3 SU Masters Project </li></ul>
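The T / P / E framing for PoS tagging can be made concrete with a very small learner. The sketch below is not the method from the slides (which do not name one); it assumes a most-frequent-tag baseline, and the training sentences are invented for illustration.

```python
from collections import Counter, defaultdict


def train_baseline_tagger(tagged_sentences):
    """Experience (E): learn each word's most frequent tag from tagged text."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_words(model, words, default="noun"):
    """Task (T): tag each token; unseen words get a default tag (an assumption)."""
    return [(w, model.get(w.lower(), default)) for w in words]


def accuracy(model, tagged_sentences):
    """Measure (P): fraction of correctly tagged words."""
    total = correct = 0
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        predicted = tag_words(model, words)
        correct += sum(p[1] == g[1] for p, g in zip(predicted, sentence))
        total += len(sentence)
    return correct / total


# Hypothetical manually tagged training text.
training = [
    [("The", "article"), ("dog", "noun"), ("danced", "verb")],
    [("The", "article"), ("dogmatic", "adjective"), ("dog", "noun"),
     ("danced", "verb"), ("delightfully", "adverb")],
]
model = train_baseline_tagger(training)
tagged = tag_words(model, ["The", "dogmatic", "dog", "danced", "delightfully"])
print(tagged)
```

A real tagger would also use context (neighboring words and tags), which this baseline ignores.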
  14. 14. Named Entity Extraction <ul><li>Named entities task </li></ul><ul><ul><li>Task (T): tag entities (e.g., people, places, things) </li></ul></ul><ul><ul><li>Measure (P): Precision and Recall </li></ul></ul><ul><ul><li>Experience (E): manually tagged text (e.g., MUC) </li></ul></ul><ul><li>Input: “George saw the New York skyline in the 50’s” </li></ul><ul><li>Output: “George<Person> saw the New<Place-start> York<Place-end> skyline in the 50’s<Date>” </li></ul><ul><li>2003-4 SU Masters Project </li></ul>
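Precision and recall serve as the performance measure for several of these tasks. As a minimal sketch, they can be computed by comparing the set of predicted items with the manually tagged (gold) set. The entity sets below are hypothetical, loosely based on the example sentence.

```python
def precision_recall(predicted, gold):
    """Precision: share of predictions that are correct.
    Recall: share of gold items that were found."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold annotations and system output for the example sentence.
gold = {("George", "Person"), ("New York", "Place"), ("50's", "Date")}
predicted = {("George", "Person"), ("New York", "Place"), ("skyline", "Place")}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here the system finds two of the three gold entities and makes one spurious prediction, so both measures come out to 2/3.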
  15. 15. Key Phrase Extraction <ul><li>Key phrases task </li></ul><ul><ul><li>Task (T): extract key phrases from a body of text </li></ul></ul><ul><ul><li>Measure (P): Precision and Recall </li></ul></ul><ul><ul><li>Experience (E): manually tagged text (identifying key phrases) </li></ul></ul><ul><li>Input: “DRESDEN, Germany (Reuters) - U.S. semiconductor maker Advanced Micro Devices is set to announce it will build a new chip plant in the eastern German city of Dresden, industry sources told Reuters on Saturday.” </li></ul><ul><li>Output: “Advanced Micro Devices”, “new chip plant”, “Dresden” </li></ul>
  16. 16. Spelling Correction <ul><li>Spelling correction task </li></ul><ul><ul><li>Task (T): identify and rank suitable replacements for misspelled words </li></ul></ul><ul><ul><li>Measure (P): Ideal ranking </li></ul></ul><ul><ul><li>Experience (E): misspellings, correctly spelled words, logical replacements </li></ul></ul><ul><li>Input: “Fuedng” --> {“Feeding”, “Feudal”, “Feuding”, “Feed”, “Feud”} </li></ul><ul><li>Output: “Fuedng” --> {“Feuding”, “Feeding”, “Feudal”, “Feud”} </li></ul>
  17. 17. Text Categorization <ul><li>Text categorization task </li></ul><ul><ul><li>Task (T): identify the proper category among a pre-defined set of categories </li></ul></ul><ul><ul><li>Measure (P): Precision and Recall </li></ul></ul><ul><ul><li>Experience (E): text documents tagged with a pre-defined set of categories (e.g., Reuters 21578) </li></ul></ul><ul><li>Input: “Shortly after Phish wraps up their four-night run in Miami this December, Page will begin a short tour up the East Coast with Vida Blue.” --> {Music, Sports, Business} </li></ul><ul><li>Output: Music </li></ul>
  18. 18. Document Representation <ul><li>“Shortly after Phish wraps up their four-night run in Miami this December, Page will begin a short tour up the East Coast with Vida Blue. Page, Russell and Oteil will be joined by the six-member Spam Allstars, who back Vida Blue…” </li></ul>Term counts: Phish 14, Page 3, Russell 2, Trey 6, Record 2, CD 3, begin 1, short 1, tour 3, ... <ul><li>Remove (stop) non-content-bearing terms: articles, conjunctions, etc. </li></ul><ul><li>Count content-bearing words in the document </li></ul><ul><li>Create a vector: each word is a dimension, and its count is the magnitude along that dimension </li></ul>
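The three steps on this slide (drop stop words, count the remaining terms, lay the counts out as a vector) can be sketched as follows. The stop list and the vocabulary order are illustrative assumptions, not taken from the slides.

```python
import re
from collections import Counter

# A tiny illustrative stop list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "in", "on", "up",
              "with", "by", "after", "this", "their", "will", "be", "who"}


def term_counts(text):
    """Lowercase, tokenize (keeping hyphenated words whole), drop stop words, count."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def to_vector(counts, vocabulary):
    """One dimension per vocabulary term; the count is the magnitude."""
    return [counts.get(term, 0) for term in vocabulary]


text = ("Shortly after Phish wraps up their four-night run in Miami this "
        "December, Page will begin a short tour up the East Coast with Vida Blue.")
vocabulary = ["phish", "page", "russell", "trey", "begin", "short", "tour"]

counts = term_counts(text)
vector = to_vector(counts, vocabulary)
print(vector)  # [1, 1, 0, 0, 1, 1, 1]
```

The counts differ from those on the slide because this sketch only sees the single quoted sentence, not the full document.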
  19. 19. Vector Example <ul><li>Example of documents in the Phish / Trey dimensions </li></ul>[Figure: documents d1 and d2 plotted as vectors on Phish and Trey axes, with tick marks at 6 and 14]
  20. 20. Vector Comparison [Figure: vectors d1 and d2 on Phish and Trey axes, with tick marks at 6 and 14]
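The slide does not say which comparison measure it uses. A common choice for term-count vectors, assumed here, is cosine similarity: the cosine of the angle between the two document vectors. The coordinates for d1 and d2 below are hypothetical values in the (Phish, Trey) plane.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for empty documents
    return dot / (norm_u * norm_v)


# Hypothetical (Phish, Trey) counts for two documents.
d1 = [14, 6]
d2 = [6, 14]

similarity = cosine_similarity(d1, d2)
print(round(similarity, 3))  # 0.724
```

Because the measure depends only on direction, a long document and a short one with the same term proportions compare as identical.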
  21. 21. kNN classification <ul><li>kNN--k nearest neighbors </li></ul>[Figure: an unlabeled document (?) surrounded by Music (M), Sports (S), and Business (B) training documents, with neighborhoods drawn for K=1, K=5, and K=10]
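A minimal sketch of kNN classification: find the k training documents closest to the query and return the majority label among them. Euclidean distance and the two-dimensional points are assumptions made for illustration; text classifiers often use cosine similarity over term vectors instead.

```python
import math
from collections import Counter


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(query, examples, k):
    """examples: list of (vector, label). Majority label among the k nearest."""
    nearest = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled documents in a two-term space.
examples = [
    ((1, 1), "Music"), ((1, 2), "Music"), ((2, 1), "Music"),
    ((5, 5), "Sports"), ((5, 6), "Sports"), ((6, 5), "Sports"),
    ((9, 1), "Business"), ((9, 2), "Business"), ((8, 1), "Business"),
]

print(knn_classify((2, 2), examples, k=3))      # Music
print(knn_classify((5.5, 5.5), examples, k=3))  # Sports
```

The choice of k matters: a small k follows local noise, while a large k lets distant, more numerous classes outvote the true neighbors.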
  22. 22. Rocchio Classifier [Figure: Music (m), Sports (s), and Business (b) documents, each class with a characteristic vector (M, S, B) and a threshold]
  23. 23. Rocchio Formula
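The formula on this slide did not survive extraction. For reference, one common formulation of the Rocchio profile for text categorization, as given in the Sebastiani survey listed under Resources, is shown below; it may differ from what the slide displayed.

```latex
w_{ki} = \beta \cdot \sum_{d_j \in POS_i} \frac{w_{kj}}{|POS_i|}
       \;-\; \gamma \cdot \sum_{d_j \in NEG_i} \frac{w_{kj}}{|NEG_i|}
```

Here w_{ki} is the weight of term t_k in the profile for category c_i, w_{kj} is the weight of t_k in document d_j, POS_i and NEG_i are the positive and negative training documents for c_i, and β and γ control the relative importance of the two sets.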
  24. 24. Rocchio Example [Figure: positive (+) and negative (-) examples on Phish and Sales axes, showing Centroid+, Centroid-, and the resulting Rocchio vector]
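The example can be sketched in code: compute the centroid of the positive and of the negative examples, then combine them into a characteristic vector. The beta and gamma values, the dot-product score, the zero threshold, and the (Phish, Sales) counts are all illustrative assumptions.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rocchio_prototype(positives, negatives, beta=16.0, gamma=4.0):
    """Characteristic vector: beta * Centroid+ minus gamma * Centroid-."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    return [beta * p - gamma * n for p, n in zip(pos_c, neg_c)]


def classify(prototype, doc, threshold=0.0):
    """Accept the document if its dot product with the prototype exceeds the threshold."""
    score = sum(p * d for p, d in zip(prototype, doc))
    return score > threshold


# Hypothetical (Phish, Sales) term counts: music documents vs. everything else.
positives = [[5, 0], [4, 1], [6, 1]]
negatives = [[0, 4], [1, 5], [0, 6]]

prototype = rocchio_prototype(positives, negatives)
print(classify(prototype, [3, 0]))  # True: mentions Phish, not Sales
print(classify(prototype, [0, 3]))  # False: mentions Sales only
```

Subtracting the negative centroid pushes the prototype away from terms that are typical of other categories, which is why the Sales dimension ends up negative here.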
  25. 25. Support Vector Machines [Figure: positive (+) and negative (-) training examples]
  26. 26. Issues <ul><li>Very few training examples </li></ul><ul><li>Distribution of training examples isn’t very representative of “real data” </li></ul><ul><li>Classifier works very well on training data, but poorly on new data </li></ul><ul><ul><li>Not a big issue with SVM </li></ul></ul><ul><ul><li>An issue with kNN, Rocchio, C4.5, and many others </li></ul></ul><ul><ul><li>Bagging and Boosting are typical responses </li></ul></ul>
  27. 27. Bagging <ul><li>Create a whole gaggle of classifiers, each trained on a different set of data </li></ul><ul><ul><li>Sample the training data with replacement </li></ul></ul><ul><ul><li>The majority vote of the sub-classifiers is the final answer </li></ul></ul>[Figure: training data sampled into sets T1, T2, T3, T4, T5, ..., Tn, whose classifiers vote for Music]
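A minimal sketch of bagging as described above: draw bootstrap samples (same size as the training set, sampled with replacement), train one classifier per sample, and take the majority vote. The base learner (a one-dimensional 1-nearest-neighbor classifier) and the data are hypothetical stand-ins.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Sample len(data) examples with replacement."""
    return [rng.choice(data) for _ in data]


def bagging_train(data, train_fn, n_models, seed=0):
    """Train n_models classifiers, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """The majority vote of the sub-classifiers is the final answer."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def train_1nn(sample):
    """Hypothetical base learner: label of the nearest training point."""
    def predict(x):
        return min(sample, key=lambda ex: abs(ex[0] - x))[1]
    return predict


# Hypothetical one-feature training data.
data = [(1.0, "Music"), (1.5, "Music"), (2.0, "Music"),
        (8.0, "Sports"), (8.5, "Sports"), (9.0, "Sports")]

models = bagging_train(data, train_1nn, n_models=11)
print(bagging_predict(models, 1.2))
print(bagging_predict(models, 8.7))
```

An odd number of models avoids ties in a two-class vote.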
  28. 28. Boosting <ul><li>Similar to bagging, run multiple classifiers on altered training data, combining the results into a final answer. </li></ul><ul><li>AdaBoost: </li></ul><ul><ul><li>Assign each training example a weight (all the same at start) </li></ul></ul><ul><ul><li>Boost a number of rounds </li></ul></ul><ul><ul><ul><li>Build classifier using weighted examples </li></ul></ul></ul><ul><ul><ul><li>Classify training examples </li></ul></ul></ul><ul><ul><ul><li>Increase weight of wrongly classified examples </li></ul></ul></ul><ul><ul><li>Create weighted majority classifier using weights (better classifiers get higher weights) </li></ul></ul>
  29. 29. AdaBoost Algorithm
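The algorithm shown on this slide did not survive extraction. The sketch below follows the steps listed on the Boosting slide (uniform starting weights, repeated rounds that raise the weight of misclassified examples, and a weighted majority vote), using the standard classifier weight alpha = 0.5 * ln((1 - err) / err). The decision-stump base learner and the one-dimensional data set are assumptions made for illustration.

```python
import math


def stump_predict(threshold, polarity, x):
    """A decision stump: one label on or above the threshold, the other below."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Pick the stump with the lowest weighted error on the training set."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(threshold, polarity, x) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n  # all examples start with the same weight
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # better stumps get larger alpha
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified examples, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(threshold, polarity, x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of the stumps."""
    score = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single stump can separate (labels are +1 / -1).
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs] == ys)  # True
```

Each stump alone misclassifies at least three of the ten points, but the weighted vote of three stumps classifies all of them correctly.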
  30. 30. Summary <ul><li>Machine learning (ML) provides a way to solve complex problems where programming would be difficult </li></ul><ul><li>Many problems can be framed as general classification problems </li></ul><ul><li>There are numerous (well known) techniques for solving these kinds of problems </li></ul><ul><li>Challenges are mainly collecting good training examples and identifying salient features </li></ul>
  31. 31. Resources <ul><li>“Machine Learning”, Tom Mitchell, McGraw Hill, 1997 </li></ul><ul><li>“Machine Learning in Automated Text Categorization”, Fabrizio Sebastiani, ACM Computing Surveys, March 2002 </li></ul><ul><li>“A Short Introduction to Boosting”, Freund & Schapire, Journal of Japanese Society for Artificial Intelligence, Sept. 1999 </li></ul>