Machine Learning Applications in NLP.ppt


  1. Machine Learning: Real World Applications. Rob Jasper, Intelligent Results, http://fac-staff.seattleu.edu/jasperr [email_address]
  2. Goals <ul><li>Convince you that: </li></ul><ul><ul><li>You can use machine learning (ML) techniques to solve difficult real-world problems </li></ul></ul><ul><ul><li>Real-world programmers and programs use ML techniques </li></ul></ul><ul><ul><li>Applications for ML abound (especially in text processing) </li></ul></ul><ul><li>Provide an overview of: </li></ul><ul><ul><li>The variety of applications in just one small area (text processing) </li></ul></ul><ul><ul><li>Classification, the quintessential ML problem </li></ul></ul><ul><ul><li>Techniques for solving classification problems </li></ul></ul><ul><ul><li>Issues involved in building classifiers </li></ul></ul><ul><ul><li>Advanced techniques for dealing with particular problems </li></ul></ul>
  3. Overview <ul><li>Machine Learning </li></ul><ul><ul><li>Definition </li></ul></ul><ul><ul><li>Supervised versus unsupervised </li></ul></ul><ul><li>Machine Learning in NLP </li></ul><ul><ul><li>Part of Speech (PoS) tagging </li></ul></ul><ul><ul><li>Named entity extraction </li></ul></ul><ul><ul><li>Key phrase extraction </li></ul></ul><ul><ul><li>Spelling correction </li></ul></ul><ul><ul><li>(Text) classification </li></ul></ul><ul><ul><ul><li>The quintessential ML problem </li></ul></ul></ul><ul><li>Classification techniques </li></ul><ul><ul><li>k-nearest neighbors (kNN) </li></ul></ul><ul><ul><li>Rocchio </li></ul></ul><ul><ul><li>Support Vector Machines (SVM) </li></ul></ul><ul><ul><li>Ensemble techniques </li></ul></ul><ul><ul><ul><li>Bagging </li></ul></ul></ul><ul><ul><ul><li>Boosting </li></ul></ul></ul>
  4. Machine Learning <ul><li>“Machine Learning is the study of computer algorithms that improve automatically through experience.” (Tom Mitchell) </li></ul><ul><li>“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell) </li></ul>
  5. Machine Learning Example <ul><li>Backgammon </li></ul><ul><ul><li>T: playing backgammon </li></ul></ul><ul><ul><li>P: percent of games won against opponents </li></ul></ul><ul><ul><li>E: playing practice games against itself </li></ul></ul><ul><li>TD-Gammon (Tesauro 1992, 1995) learned to play at the level of world champions by playing games against itself </li></ul><ul><li>What are other approaches to this problem? </li></ul>
  6. Unsupervised versus Supervised Learning <ul><li>Unsupervised learning </li></ul><ul><ul><li>“Learning in which the system parameters are adapted using only the information of the input and are constrained by prespecified internal rules.” </li></ul></ul><ul><li>Supervised learning </li></ul><ul><ul><li>“Learning or adaptation in which a desired response can be used by the system to guide the learning.” </li></ul></ul><ul><li>Is learning backgammon supervised or unsupervised? </li></ul>
  7. Problem Setting <ul><li>From “Machine Learning”, Tom Mitchell, 1997 </li></ul><ul><ul><li>X: the set of instances over which target functions can be defined </li></ul></ul><ul><ul><li>C: the set of target concepts our learner might want to learn </li></ul></ul><ul><ul><li>Each concept c in C can be viewed as a subset of X </li></ul></ul><ul><ul><li>Training examples are generated by drawing an instance x of X at random according to some distribution D </li></ul></ul>
  8. Concepts and training examples [Figure: instance space X containing the target concept c, with positive (+) and negative (-) training examples]
  9. General Model of Learning <ul><li>General model of learning </li></ul><ul><ul><li>Learner L considers a set of hypotheses H based on properties of x </li></ul></ul><ul><ul><li>L observes a sequence of training examples, each a pair of: </li></ul></ul><ul><ul><ul><li>x </li></ul></ul></ul><ul><ul><ul><li>c(x) </li></ul></ul></ul><ul><ul><li>L outputs a hypothesis h, which is its estimate of c </li></ul></ul><ul><li>We evaluate h over new instances of X drawn according to D </li></ul>
  10. Error of hypothesis [Figure: instance space X showing the target concept c and a hypothesis h; the error of h is the region where c and h disagree]
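The error idea on this slide can be made concrete. Mitchell defines the true error of h with respect to D as the probability that h and c disagree on an instance drawn according to D. The sketch below is a toy illustration that does not come from the slides: the target concept (x >= 50), the threshold hypothesis space, and the uniform distribution over 0..99 are all hypothetical choices, and the error is estimated by sampling.

```python
import random

def target_concept(x: int) -> bool:
    """Hypothetical target concept c: 'x is at least 50'."""
    return x >= 50

def learn_threshold(examples):
    """Learner L over a hypothetical hypothesis space H = {h_t(x) = x >= t}.

    Picks t as the smallest positive example seen, which is consistent
    with every observed (x, c(x)) pair for this target concept.
    """
    positives = [x for x, label in examples if label]
    t = min(positives) if positives else 100  # no positives seen: predict negative on 0..99
    return lambda x: x >= t

def estimate_error(h, c, draw, n=10_000):
    """Monte Carlo estimate of error_D(h) = Pr_{x ~ D}[c(x) != h(x)]."""
    samples = [draw() for _ in range(n)]
    return sum(c(x) != h(x) for x in samples) / n

rng = random.Random(0)

def draw() -> int:
    """Hypothetical distribution D: uniform over the integers 0..99."""
    return rng.randint(0, 99)

training = [(x, target_concept(x)) for x in (draw() for _ in range(20))]
h = learn_threshold(training)
print(f"estimated error of h: {estimate_error(h, target_concept, draw):.3f}")
```

Here h can only err on instances in [50, t), so its error shrinks as training examples closer to the concept boundary are observed.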
  11. An Operational Model of Machine Learning [Figure: the Learner builds a Model from Training Data; the Execution Engine applies the Model to Production Data to produce Tagged Data]
  12. Machine Learning in Natural Language Processing <ul><li>NLP: “The branch of information science that deals with processing natural language” </li></ul><ul><li>Applications include </li></ul><ul><ul><li>Part of Speech (PoS) tagging </li></ul></ul><ul><ul><li>Named entity extraction </li></ul></ul><ul><ul><li>Key phrase extraction </li></ul></ul><ul><ul><li>Spelling correction </li></ul></ul><ul><ul><li>(Text) categorization </li></ul></ul>
  13. PoS Tagging <ul><li>PoS tagging </li></ul><ul><ul><li>Task (T): tag word tokens with the correct part of speech </li></ul></ul><ul><ul><li>Measure (P): percent of correctly tagged words </li></ul></ul><ul><ul><li>Experience (E): manually tagged text </li></ul></ul><ul><li>Input: “The dogmatic dog danced delightfully.” </li></ul><ul><li>Output: “The<article> dogmatic<adjective> dog<noun> danced<verb> delightfully<adverb>” </li></ul><ul><li>2002-3 SU Masters Project </li></ul>
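As a hedged illustration of T, P, and E for this task, the sketch below implements a most-frequent-tag baseline: each word receives the tag it carried most often in the manually tagged training text. This is a standard simple baseline, not the method used in the SU project; the tiny corpus and the default tag for unseen words are made up.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Experience (E): learn each word's most frequent tag in manually tagged text."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_tokens(model, words, default="noun"):
    """Task (T): tag each token; unseen words get a default tag (an assumption)."""
    return [(w, model.get(w.lower(), default)) for w in words]

def accuracy(predicted, gold):
    """Measure (P): fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(p == g for (_, p), (_, g) in zip(predicted, gold))
    return correct / len(gold)

# Tiny, made-up training corpus using the slide's tag names.
training = [
    [("The", "article"), ("dog", "noun"), ("danced", "verb")],
    [("A", "article"), ("dogmatic", "adjective"), ("cat", "noun"),
     ("danced", "verb"), ("delightfully", "adverb")],
]
model = train_most_frequent_tag(training)

gold = [("The", "article"), ("dogmatic", "adjective"), ("dog", "noun"),
        ("danced", "verb"), ("delightfully", "adverb")]
predicted = tag_tokens(model, [word for word, _ in gold])
print(predicted)
print(accuracy(predicted, gold))  # 1.0
```

Real taggers also use context (neighboring words and tags), since many words take different parts of speech in different sentences.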
  14. Named Entity Extraction <ul><li>Named entities task </li></ul><ul><ul><li>Task (T): tag entities (e.g., people, places, things) </li></ul></ul><ul><ul><li>Measure (P): precision and recall </li></ul></ul><ul><ul><li>Experience (E): manually tagged text (e.g., MUC) </li></ul></ul><ul><li>Input: “George saw the New York skyline in the 50’s” </li></ul><ul><li>Output: “George<Person> saw the New<Place-start> York<Place-end> skyline in the 50’s<Date>” </li></ul><ul><li>2003-4 SU Masters Project </li></ul>
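Precision is the fraction of extracted entities that are correct; recall is the fraction of gold-standard entities that were extracted. A minimal sketch, assuming exact matching on (text, type) pairs and a hypothetical system output for the slide's sentence:

```python
def precision_recall(predicted, gold):
    """Precision and recall over sets of (entity text, entity type) pairs.

    An extracted entity counts as correct only if both its text and its type
    exactly match a gold-standard entity (exact-match scoring is an assumption).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

gold = {("George", "Person"), ("New York", "Place"), ("50's", "Date")}
# Hypothetical output: one spurious entity, one missed date.
predicted = {("George", "Person"), ("New York", "Place"), ("skyline", "Place")}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```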
  15. Key Phrase Extraction <ul><li>Key phrases task </li></ul><ul><ul><li>Task (T): extract key phrases from a body of text </li></ul></ul><ul><ul><li>Measure (P): precision and recall </li></ul></ul><ul><ul><li>Experience (E): manually tagged text (identifying key phrases) </li></ul></ul><ul><li>Input: “DRESDEN, Germany (Reuters) - U.S. semiconductor maker Advanced Micro Devices is set to announce it will build a new chip plant in the eastern German city of Dresden, industry sources told Reuters on Saturday.” </li></ul><ul><li>Output: “Advanced Micro Devices”, “new chip plant”, “Dresden” </li></ul>
  16. Spelling Correction <ul><li>Spelling correction task </li></ul><ul><ul><li>Task (T): identify and rank suitable replacements for misspelled words </li></ul></ul><ul><ul><li>Measure (P): agreement with an ideal ranking </li></ul></ul><ul><ul><li>Experience (E): misspellings, correctly spelled words, logical replacements </li></ul></ul><ul><li>Input: “Fuedng” --> {“Feeding”, “Feudal”, “Feuding”, “Feed”, “Feud”} </li></ul><ul><li>Output: “Fuedng” --> {“Feuding”, “Feeding”, “Feudal”, “Feud”} </li></ul>
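The slide does not say which features the ranker used. One plausible feature is an edit distance; the sketch below uses optimal string alignment (OSA) distance, where insertions, deletions, substitutions, and swaps of adjacent letters each cost 1, and ranks candidates by it.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (case-insensitive): insertions, deletions,
    substitutions, and transpositions of adjacent characters each cost 1."""
    a, b = a.lower(), b.lower()
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a[i-1]
                          d[i][j - 1] + 1,         # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # substitute (or match)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap adjacent letters
    return d[len(a)][len(b)]

def rank_candidates(word, candidates):
    """Rank by distance; sorted() is stable, so ties keep their input order."""
    return sorted(candidates, key=lambda c: osa_distance(word, c))

candidates = ["Feeding", "Feudal", "Feuding", "Feed", "Feud"]
print({c: osa_distance("Fuedng", c) for c in candidates})
print(rank_candidates("Fuedng", candidates))
```

On the slide's example, this distance ties “Feeding” and “Feuding” at 2, so a fixed distance alone does not recover the ideal ranking. That gap is what learning from examples of misspellings and their intended replacements is meant to close.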
  17. Text Categorization <ul><li>Text categorization task </li></ul><ul><ul><li>Task (T): identify the proper category among a pre-defined set of categories </li></ul></ul><ul><ul><li>Measure (P): precision and recall </li></ul></ul><ul><ul><li>Experience (E): text documents tagged with a pre-defined set of categories (e.g., Reuters-21578) </li></ul></ul><ul><li>Input: “Shortly after Phish wraps up their four-night run in Miami this December, Page will begin a short tour up the East Coast with Vida Blue.” --> {Music, Sports, Business} </li></ul><ul><li>Output: Music </li></ul>
  18. Document Representation <ul><li>“Shortly after Phish wraps up their four-night run in Miami this December, Page will begin a short tour up the East Coast with Vida Blue. Page, Russell and Oteil will be joined by the six-member Spam Allstars, who back Vida Blue…” </li></ul>Term counts: Phish 14, Page 3, Russell 2, Trey 6, Record 2, CD 3, begin 1, short 1, tour 3, ... <ul><li>Remove (stop) non-content-bearing terms: articles, conjunctions, etc. </li></ul><ul><li>Count the content-bearing words in the document </li></ul><ul><li>Create a vector: each word is a dimension, and its count is the magnitude along that dimension </li></ul>
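The three steps on this slide can be sketched with the Python standard library. The stop list, the tokenizer regex, and lowercasing are illustrative assumptions; the slide does not specify them.

```python
import re
from collections import Counter

# A tiny, illustrative stop list; real systems use much longer lists.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "in", "on", "up", "with",
              "this", "their", "will", "be", "by", "after", "who"}

def term_vector(text: str) -> Counter:
    """Map a document to a sparse term-count vector: one dimension per content word.

    Lowercasing (an assumption) merges e.g. the name "Page" with the noun "page".
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

doc = ("Shortly after Phish wraps up their four-night run in Miami this December, "
       "Page will begin a short tour up the East Coast with Vida Blue.")
vec = term_vector(doc)
print(vec.most_common(5))
```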
  19. Vector Example <ul><li>Example of documents in the Phish / Trey dimensions </li></ul>[Figure: documents d1 and d2 as vectors in the two-dimensional Phish/Trey space (axis marks at 6 and 14)]
  20. Vector Comparison [Figure: documents d1 and d2 compared as vectors in the Phish/Trey space]
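The slide does not name its comparison measure. Cosine similarity, which compares the directions of two vectors and is therefore insensitive to document length, is a common choice for term vectors and is assumed here. The counts for d1 and d2 are hypothetical, loosely echoing the 6 and 14 marks in the figure.

```python
import math

def cosine_similarity(u: dict, v: dict) -> float:
    """Cosine of the angle between two sparse term-count vectors (0.0 if either is all zeros)."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical counts in the two dimensions pictured on the slide.
d1 = {"phish": 14, "trey": 6}
d2 = {"phish": 6, "trey": 14}
print(round(cosine_similarity(d1, d2), 3))  # 0.724
```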
  21. kNN classification <ul><li>kNN: k nearest neighbors </li></ul>[Figure: an unlabeled document (?) among labeled training documents (S = Sports, M = Music, B = Business), with neighborhoods for k = 1, 5, and 10]
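kNN assigns a new document the majority category among its k most similar training documents. The sketch below assumes cosine similarity and a small, made-up training set; note that the answer can change with k, as the slide's k = 1, 5, 10 neighborhoods suggest.

```python
import math
from collections import Counter

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between sparse term-count vectors (assumed measure)."""
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(query, training, k):
    """Return the majority label among the k training documents most similar to query.

    training is a list of (term-count dict, label) pairs. Vote ties go to the label
    seen first among the nearest neighbors (Counter.most_common keeps insertion order).
    """
    ranked = sorted(training, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up training documents.
training = [
    ({"phish": 5, "tour": 2}, "Music"),
    ({"tour": 3, "album": 2}, "Music"),
    ({"phish": 2, "concert": 3}, "Music"),
    ({"score": 4, "season": 2}, "Sports"),
    ({"coach": 2, "season": 3}, "Sports"),
    ({"sales": 4, "quarter": 2}, "Business"),
    ({"shares": 3, "sales": 1}, "Business"),
]
query = {"sales": 2, "phish": 1, "tour": 1}
print(knn_classify(query, training, k=1))  # Business
print(knn_classify(query, training, k=3))  # Music
```

kNN does no work at training time; all the cost is in comparing each new document against every stored example.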
  22. Rocchio Classifier [Figure: training documents for Music (m), Sports (s), and Business (b), each category summarized by a characteristic vector (M, S, B) with a threshold]
  23. Rocchio Formula
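The formula on this slide did not survive extraction. For reference, the standard Rocchio profile for category c_i, in the form given in Sebastiani's survey (see Resources), is shown below; it may differ in notation from the original slide. POS_i and NEG_i are the training documents that do and do not belong to c_i, and β and γ control the relative weight of positive and negative examples.

```latex
\vec{c}_i \;=\; \frac{\beta}{|POS_i|} \sum_{\vec{d}_j \in POS_i} \vec{d}_j
\;-\; \frac{\gamma}{|NEG_i|} \sum_{\vec{d}_j \in NEG_i} \vec{d}_j
```

With β = γ = 1 the profile is simply the positive centroid minus the negative centroid.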
  24. Rocchio Example [Figure: positive (+) and negative (-) documents in the Phish/Sales space, their centroids (Centroid+ and Centroid-), and the resulting Rocchio vector]
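A minimal sketch of the slide's example, with β = γ = 1 so that the profile is Centroid+ minus Centroid-. These weights, the zero decision threshold, and the Phish/Sales counts are illustrative assumptions, not values from the slides.

```python
from collections import defaultdict

def centroid(docs):
    """Component-wise mean of sparse term-weight vectors (dicts)."""
    total = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}

def rocchio_profile(positives, negatives, beta=1.0, gamma=1.0):
    """beta * centroid(positives) - gamma * centroid(negatives).

    beta = gamma = 1.0 are illustrative defaults, not values from the slides.
    """
    pos, neg = centroid(positives), centroid(negatives)
    return {t: beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
            for t in set(pos) | set(neg)}

def in_category(profile, doc, threshold=0.0):
    """Assign doc to the category if its dot product with the profile exceeds the threshold."""
    score = sum(weight * doc.get(term, 0.0) for term, weight in profile.items())
    return score > threshold

# Hypothetical Phish/Sales counts for positive (Music) and negative documents.
positives = [{"phish": 4, "sales": 1}, {"phish": 6, "sales": 0}, {"phish": 5, "sales": 2}]
negatives = [{"phish": 0, "sales": 5}, {"phish": 1, "sales": 4}]
profile = rocchio_profile(positives, negatives)
print(sorted(profile.items()))  # [('phish', 4.5), ('sales', -3.5)]
print(in_category(profile, {"phish": 3, "sales": 1}))  # True
print(in_category(profile, {"sales": 3}))              # False
```

Training reduces to averaging, which makes Rocchio cheap; the price is that a single profile vector per category cannot represent categories whose documents form several separate clusters.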
  25. Support Vector Machines [Figure: positive (+) and negative (-) training examples]
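The slide shows only a picture. As a hedged sketch of how a linear SVM can be trained, the code below uses the Pegasos stochastic subgradient method, a swapped-in technique and not necessarily what the author used, on made-up two-dimensional data. It omits the bias term, so the separating hyperplane passes through the origin.

```python
import random

def train_linear_svm(examples, lam=0.1, iterations=2000, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM (no bias term).

    examples: list of (feature list, label) pairs with labels +1 or -1.
    Objective: lam/2 * ||w||^2 + average hinge loss max(0, 1 - y * <w, x>).
    """
    rng = random.Random(seed)
    w = [0.0] * len(examples[0][0])
    for t in range(1, iterations + 1):
        x, y = rng.choice(examples)
        eta = 1.0 / (lam * t)                     # decreasing step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]  # step from the regularizer
        if margin < 1:                            # hinge loss active: step toward y * x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Which side of the hyperplane <w, x> = 0 the point falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Made-up, linearly separable 2-D data.
data = [([2, 2], 1), ([3, 3], 1), ([2, 3], 1),
        ([-2, -2], -1), ([-3, -2], -1), ([-2, -3], -1)]
w = train_linear_svm(data)
print(w)
print([predict(w, x) for x, _ in data])  # [1, 1, 1, -1, -1, -1]
```

The regularization term is what pushes toward a wide margin: among weight vectors that satisfy the margin constraints, smaller ones correspond to wider margins.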
  26. Issues <ul><li>Very few training examples </li></ul><ul><li>The distribution of training examples isn’t very representative of “real data” </li></ul><ul><li>The classifier works very well on training data but poorly on new data (overfitting) </li></ul><ul><ul><li>Not a big issue with SVM </li></ul></ul><ul><ul><li>An issue with kNN, Rocchio, C4.5, and many others </li></ul></ul><ul><ul><li>Bagging and boosting are typical responses </li></ul></ul>
  27. Bagging <ul><li>Create a whole gaggle of classifiers, each trained on a different sample of the data </li></ul><ul><ul><li>Sample the training data with replacement </li></ul></ul><ul><ul><li>The majority vote of the sub-classifiers is the final answer </li></ul></ul>[Figure: the training data resampled into sets T1, T2, T3, T4, T5, ..., Tn, each training a sub-classifier whose votes combine into the final answer (Music)]
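Bagging can be sketched in a few lines: draw bootstrap samples T1..Tn with replacement, train one base classifier per sample, and take the majority vote. The one-dimensional nearest-class-mean base learner and the toy data are hypothetical stand-ins for a real text classifier.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) examples uniformly at random *with replacement*."""
    return [rng.choice(data) for _ in data]

def train_bagged(data, train_base, n_models=11, seed=0):
    """Train n_models base classifiers, each on its own bootstrap sample (T1..Tn)."""
    rng = random.Random(seed)
    return [train_base(bootstrap_sample(data, rng)) for _ in range(n_models)]

def predict_bagged(models, x):
    """Final answer: the majority vote of the sub-classifiers."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def train_nearest_mean(sample):
    """Hypothetical base learner on 1-D inputs: predict the label whose mean is closest."""
    by_label = {}
    for x, label in sample:
        by_label.setdefault(label, []).append(x)
    means = {label: sum(xs) / len(xs) for label, xs in by_label.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))

# Toy, hypothetical 1-D data.
data = [(1.0, "Music"), (2.0, "Music"), (3.0, "Music"),
        (8.0, "Sports"), (9.0, "Sports"), (10.0, "Sports")]
models = train_bagged(data, train_nearest_mean)
print(predict_bagged(models, 2.5), predict_bagged(models, 8.5))
```

An odd number of sub-classifiers avoids tied votes in the two-class case. Bagging helps most when the base learner is unstable, so that small changes in its training sample change its predictions.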
  28. Boosting <ul><li>Similar to bagging: run multiple classifiers on altered training data and combine the results into a final answer </li></ul><ul><li>AdaBoost: </li></ul><ul><ul><li>Assign each training example a weight (all the same at the start) </li></ul></ul><ul><ul><li>Repeat for a number of rounds: </li></ul></ul><ul><ul><ul><li>Build a classifier using the weighted examples </li></ul></ul></ul><ul><ul><ul><li>Classify the training examples </li></ul></ul></ul><ul><ul><ul><li>Increase the weight of wrongly classified examples </li></ul></ul></ul><ul><ul><li>Create a weighted majority classifier (better classifiers get higher weights) </li></ul></ul>
  29. AdaBoost Algorithm
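The algorithm on this slide was an image that did not survive extraction. As a hedged sketch, the code below follows the binary AdaBoost described in Freund and Schapire's “A Short Introduction to Boosting” (see Resources). Labels are +1/-1, each round fits a weak learner to the weighted examples, the learner's vote weight is alpha = 1/2 ln((1 - error) / error), and misclassified examples are up-weighted. The one-dimensional threshold “decision stump” weak learner and the toy data are hypothetical choices.

```python
import math

def train_stump(xs, ys, weights):
    """Hypothetical weak learner: the 1-D threshold rule with the lowest weighted error.

    Returns (weighted error, threshold, polarity); the stump predicts `polarity`
    when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=10):
    """Binary AdaBoost with labels +1/-1; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n  # every example starts with the same weight
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = max(error, 1e-10)  # avoid division by zero for a perfect stump
        if error >= 0.5:           # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - error) / error)  # better stumps get larger votes
        ensemble.append((alpha, threshold, polarity))
        # Up-weight misclassified examples, down-weight the rest, then renormalize.
        weights = [w * math.exp(-alpha * y * (polarity if x >= threshold else -polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote: the sign of the alpha-weighted sum of stump outputs."""
    score = sum(alpha * (polarity if x >= threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single threshold separates these labels
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

On this data no single stump gets more than four of the six examples right, but three boosted rounds classify all six correctly.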
  30. Summary <ul><li>Machine learning (ML) provides a way to solve complex problems where explicit programming would be difficult </li></ul><ul><li>Many problems can be framed as general classification problems </li></ul><ul><li>There are numerous well-known techniques for solving these kinds of problems </li></ul><ul><li>The main challenges are collecting good training examples and identifying salient features </li></ul>
  31. Resources <ul><li>“Machine Learning”, Tom Mitchell, McGraw Hill, 1997 </li></ul><ul><li>“Machine Learning in Automated Text Categorization”, Fabrizio Sebastiani, ACM Computing Surveys, March 2002 </li></ul><ul><li>“A Short Introduction to Boosting”, Freund & Schapire, Journal of Japanese Society for Artificial Intelligence, Sept. 1999 </li></ul>