1. Lehrstuhl fuer Maschinelles Lernen und Natuerlich Sprachliche Systeme. Albrecht Zimmermann, Tayfun Guerel, Kristian Kersting, Prof. Dr. Luc De Raedt. Machine Learning in Games: Crash Course on Machine Learning

2. Why Machine Learning?
- Past: computers (mostly) programmed by hand
- Future: computers (mostly) program themselves, by interaction with their environment

3. Behavioural Cloning / Verhaltensimitation (behaviour imitation)
(Figure: the user plays, the plays are logged, and a user model learned from the logs plays in turn.)

4. Backgammon
- More than 10^20 states (boards)
- Best human players see only a small fraction of all boards during their lifetime
- Searching is hard because of the dice (branching factor > 100)

5. TD-Gammon by Tesauro (1995)

6. Recent Trends
- Recent progress in algorithms and theory
- Growing flood of online data
- Computational power is available
- Growing industry

7. Three Niches for Machine Learning
- Data mining: using historical data to improve decisions
  - Medical records -> medical knowledge
- Software applications we can't program by hand
  - Autonomous driving
  - Speech recognition
- Self-customizing programs
  - Newsreader that learns user interests

8. Typical Data Mining Task
- Given:
  - 9,714 patient records, each describing pregnancy and birth
  - Each patient record contains 215 features
- Learn to predict:
  - Class of future patients at risk for Emergency Cesarean Section

9. Data Mining Result
- One of 18 learned rules:
  - If no previous vaginal delivery,
  - and abnormal 2nd Trimester Ultrasound,
  - and malpresentation at admission,
  - Then the probability of an Emergency C-Section is 0.6
- Accuracy over training data: 26/41 = .63
- Accuracy over test data: 12/20 = .60

10. Credit Risk Analysis
- Learned rules:
  - If Other-Delinquent-Accounts > 2
    and Number-Delinquent-Billing-Cycles > 1
    Then Profitable-Customer? = no
  - If Other-Delinquent-Accounts = 0
    and (Income > $30k OR Years-of-credit > 3)
    Then Profitable-Customer? = yes

11. Other Prediction Problems
- Process optimization
- Customer purchase behavior
- Customer retention

12. Problems Too Difficult to Program by Hand
- ALVINN [Pomerleau] drives 70 mph on highways

13. Problems Too Difficult to Program by Hand
- ALVINN [Pomerleau] drives 70 mph on highways

14. Software that Customizes to User

15. Lehrstuhl fuer Maschinelles Lernen und Natuerlich Sprachliche Systeme. Albrecht Zimmermann, Tayfun Guerel, Kristian Kersting, Prof. Dr. Luc De Raedt. Machine Learning in Games: Crash Course on Decision Tree Learning
(Figure: an example decision tree. Refund = Yes -> NO; Refund = No -> test MarSt; MarSt = Married -> NO; MarSt = Single or Divorced -> test TaxInc; TaxInc < 80K -> NO; TaxInc > 80K -> YES.)

16. Classification: Definition
- Given a collection of records (training set)
  - Each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into a training set and a test set; the training set is used to build the model and the test set is used to validate it.

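As a concrete illustration of the split-build-validate procedure described above, here is a minimal sketch in Python. The use of scikit-learn and of its bundled breast-cancer dataset are assumptions for illustration; the slides do not name any library or dataset.

```python
# Minimal sketch of the train/test methodology from slide 16,
# using scikit-learn and one of its bundled datasets (both assumed).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)        # records + class attribute
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)          # build on train, validate on test

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```
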
17. Illustrating Classification Task

18. Examples of Classification Tasks
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.

19. Classification Techniques
- Decision tree based methods
- Rule-based methods
- Instance-based learners
- Neural networks
- Bayesian networks
- (Conditional) random fields
- Support vector machines
- Inductive logic programming
- Statistical relational learning
- ...

20. Decision Tree for PlayTennis
(Tree: Outlook = Sunny -> test Humidity (High -> No, Normal -> Yes); Outlook = Overcast -> Yes; Outlook = Rain -> test Wind (Strong -> No, Weak -> Yes).)

21. Decision Tree for PlayTennis
(Tree fragment: Outlook = Sunny -> test Humidity (High -> No, Normal -> Yes).)
- Each internal node tests an attribute
- Each branch corresponds to an attribute value
- Each leaf node assigns a classification

22. Decision Tree for PlayTennis
- Classify the instance (Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Weak): PlayTennis = ? The tree predicts No.
(Tree: Outlook = Sunny -> test Humidity (High -> No, Normal -> Yes); Outlook = Overcast -> Yes; Outlook = Rain -> test Wind (Strong -> No, Weak -> Yes).)

23. Decision Tree for Conjunction: Outlook=Sunny ∧ Wind=Weak
(Tree: Outlook = Sunny -> test Wind (Strong -> No, Weak -> Yes); Outlook = Overcast -> No; Outlook = Rain -> No.)

24. Decision Tree for Disjunction: Outlook=Sunny ∨ Wind=Weak
(Tree: Outlook = Sunny -> Yes; Outlook = Overcast -> test Wind (Strong -> No, Weak -> Yes); Outlook = Rain -> test Wind (Strong -> No, Weak -> Yes).)

25. Decision Tree for XOR: Outlook=Sunny XOR Wind=Weak
(Tree: Outlook = Sunny -> test Wind (Strong -> Yes, Weak -> No); Outlook = Overcast -> test Wind (Strong -> No, Weak -> Yes); Outlook = Rain -> test Wind (Strong -> No, Weak -> Yes).)

26. Decision Tree
- Decision trees represent disjunctions of conjunctions:
  (Outlook=Sunny ∧ Humidity=Normal) ∨ (Outlook=Overcast) ∨ (Outlook=Rain ∧ Wind=Weak)
(Tree: the PlayTennis tree from slide 20.)

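One way to make this concrete: the PlayTennis tree can be written directly as the boolean expression it represents. The sketch below is illustrative only; the encoding of attribute values as strings is an assumption.

```python
# The PlayTennis tree from slide 20, written as the disjunction of
# conjunctions shown on slide 26 (string encoding of values assumed).
def play_tennis(outlook, humidity, wind):
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "High", "Weak"))       # False, i.e. No (cf. slide 22)
print(play_tennis("Overcast", "High", "Strong"))  # True, i.e. Yes
```
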
27. When to Consider Decision Trees
- Instances describable by attribute-value pairs
- Target function is discrete valued
- Disjunctive hypothesis may be required
- Possibly noisy training data
- Missing attribute values
- Examples:
  - Medical diagnosis
  - Credit risk analysis
  - RTS games?

28. Decision Tree Induction
- Many algorithms:
  - Hunt's Algorithm (one of the earliest)
  - CART
  - ID3, C4.5
  - ...

29. Top-Down Induction of Decision Trees (ID3)
1. A <- the "best" decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant
4. Sort the training examples to the leaf nodes according to the attribute value of the branch
5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes.

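A compact sketch of this loop in Python is shown below. It is not the original ID3 implementation; the example encoding (a dict of attribute values plus a label) and the helper names are assumptions, and the "best" attribute is chosen by the information gain defined on the following slides.

```python
# Illustrative ID3 sketch for categorical attributes (encoding assumed):
# examples are (attribute_dict, label) pairs.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    labels = [y for _, y in examples]
    g = entropy(labels)
    for value in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == value]
        g -= len(subset) / len(examples) * entropy(subset)
    return g

def id3(examples, attributes):
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:                  # perfectly classified: leaf
        return labels[0]
    if not attributes:                         # no attribute left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    tree = {best: {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best])
    return tree

toy = [({"Wind": "Weak"}, "Yes"), ({"Wind": "Strong"}, "No")]
print(id3(toy, ["Wind"]))   # a one-level tree that tests Wind
```
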
30. Which Attribute is "best"?
- Example:
  - 2 attributes, 1 class variable
  - 64 examples: 29+, 35-
- Two candidate splits of S = [29+, 35-]:
  - A1 = ?  True -> [21+, 5-], False -> [8+, 30-]
  - A2 = ?  True -> [18+, 33-], False -> [11+, 2-]

31. Entropy
- S is a sample of training examples
- p_+ is the proportion of positive examples
- p_- is the proportion of negative examples
- Entropy measures the impurity of S:
  Entropy(S) = -p_+ log2(p_+) - p_- log2(p_-)

32. Entropy
- Entropy(S) = expected number of bits needed to encode the class (+ or -) of a randomly drawn member of S (under the optimal, shortest-length code)
- Information theory: the optimal length code assigns -log2(p) bits to a message having probability p.
- So the expected number of bits to encode (+ or -) of a random member of S is:
  -p_+ log2(p_+) - p_- log2(p_-)
  (with the convention 0 log2 0 = 0)

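As a quick numerical check of the formula, a small sketch of the two-class entropy (plain Python; the function name is mine):

```python
# Two-class entropy as on slides 31/32, with the 0*log(0)=0 convention.
from math import log2

def entropy(p_pos, p_neg):
    return -sum(p * log2(p) for p in (p_pos, p_neg) if p > 0)

print(entropy(9 / 14, 5 / 14))    # ~0.940, the PlayTennis sample used later
print(entropy(29 / 64, 35 / 64))  # ~0.99, the [29+, 35-] sample from slide 30
```
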
33. Information Gain
- Gain(S,A): expected reduction in entropy due to sorting S on attribute A
  Gain(S,A) = Entropy(S) - Σ_{v ∈ values(A)} (|S_v| / |S|) Entropy(S_v)
- Entropy([29+,35-]) = -(29/64) log2(29/64) - (35/64) log2(35/64) = 0.99
- Candidate splits from slide 30: A1: True -> [21+, 5-], False -> [8+, 30-]; A2: True -> [18+, 33-], False -> [11+, 2-]

34. Information Gain
- Entropy(S) = Entropy([29+,35-]) = 0.99
- For A1: Entropy([21+,5-]) = 0.71, Entropy([8+,30-]) = 0.74
  Gain(S,A1) = Entropy(S) - (26/64) Entropy([21+,5-]) - (38/64) Entropy([8+,30-]) = 0.27
- For A2: Entropy([18+,33-]) = 0.94, Entropy([11+,2-]) = 0.62
  Gain(S,A2) = Entropy(S) - (51/64) Entropy([18+,33-]) - (13/64) Entropy([11+,2-]) = 0.12
- A1 has the higher gain, so it is the better split.

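The numbers above can be reproduced from the class counts in each branch; the sketch below does exactly that (helper names are mine, not from the slides):

```python
# Information gain from class counts, reproducing Gain(S,A1) and Gain(S,A2).
from math import log2

def entropy(pos, neg):
    n = pos + neg
    return -sum((c / n) * log2(c / n) for c in (pos, neg) if c)

def gain(parent, branches):
    n = sum(p + q for p, q in branches)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q) for p, q in branches)

print(round(gain((29, 35), [(21, 5), (8, 30)]), 2))   # Gain(S, A1) = 0.27
print(round(gain((29, 35), [(18, 33), (11, 2)]), 2))  # Gain(S, A2) = 0.12
```
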
35. Another Example
- 14 training examples (9+, 5-): days for playing tennis
  - Wind: weak, strong
  - Humidity: high, normal

36. Another Example
- Humidity: S = [9+,5-] (E = 0.940); High -> [3+,4-] (E = 0.985), Normal -> [6+,1-] (E = 0.592)
  Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
- Wind: S = [9+,5-] (E = 0.940); Weak -> [6+,2-] (E = 0.811), Strong -> [3+,3-] (E = 1.0)
  Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048

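The same count-based helper reproduces the two PlayTennis gains above; it is repeated here only so the snippet runs on its own (tiny rounding differences come from using exact entropies instead of the rounded ones on the slide):

```python
from math import log2

def entropy(pos, neg):
    n = pos + neg
    return -sum((c / n) * log2(c / n) for c in (pos, neg) if c)

def gain(parent, branches):
    n = sum(p + q for p, q in branches)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q) for p, q in branches)

print(round(gain((9, 5), [(3, 4), (6, 1)]), 3))  # ~0.152 (slide: 0.151)
print(round(gain((9, 5), [(6, 2), (3, 3)]), 3))  # ~0.048
```
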
37. Yet Another Example: Playing Tennis

Day | Outlook  | Temp. | Humidity | Wind   | PlayTennis
D1  | Sunny    | Hot   | High     | Weak   | No
D2  | Sunny    | Hot   | High     | Strong | No
D3  | Overcast | Hot   | High     | Weak   | Yes
D4  | Rain     | Mild  | High     | Weak   | Yes
D5  | Rain     | Cool  | Normal   | Weak   | Yes
D6  | Rain     | Cool  | Normal   | Strong | No
D7  | Overcast | Cool  | Normal   | Strong | Yes
D8  | Sunny    | Mild  | High     | Weak   | No
D9  | Sunny    | Cool  | Normal   | Weak   | Yes
D10 | Rain     | Mild  | Normal   | Weak   | Yes
D11 | Sunny    | Mild  | Normal   | Strong | Yes
D12 | Overcast | Mild  | High     | Strong | Yes
D13 | Overcast | Hot   | Normal   | Weak   | Yes
D14 | Rain     | Mild  | High     | Strong | No

38. PlayTennis: Selecting the Next Attribute
- Outlook: S = [9+,5-] (E = 0.940); Sunny -> [2+,3-] (E = 0.971), Overcast -> [4+,0-] (E = 0.0), Rain -> [3+,2-] (E = 0.971)
  Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
- For comparison: Gain(S, Humidity) = 0.151, Gain(S, Wind) = 0.048, Gain(S, Temp) = 0.029
- Outlook has the highest gain, so it becomes the root.

39. PlayTennis: ID3 Algorithm
- Splitting [D1,...,D14] = [9+,5-] on Outlook:
  - Sunny: S_sunny = [D1,D2,D8,D9,D11] = [2+,3-], still impure (?)
  - Overcast: [D3,D7,D12,D13] = [4+,0-] -> Yes
  - Rain: [D4,D5,D6,D10,D14] = [3+,2-], still impure (?)
- Choosing the attribute for the Sunny branch:
  - Gain(S_sunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
  - Gain(S_sunny, Temp.) = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
  - Gain(S_sunny, Wind) = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019

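To tie slides 37 to 39 together, here is a sketch that recomputes the root gains and the Sunny-branch gains directly from the table; the row encoding as tuples and the helper names are assumptions for illustration.

```python
# Recompute the gains of slides 38/39 from the PlayTennis table (slide 37).
from collections import Counter
from math import log2

# (Outlook, Temp, Humidity, Wind, PlayTennis), rows D1..D14
DATA = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),   # D1
    ("Sunny",    "Hot",  "High",   "Strong", "No"),   # D2
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),  # D3
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),  # D4
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),  # D5
    ("Rain",     "Cool", "Normal", "Strong", "No"),   # D6
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),  # D7
    ("Sunny",    "Mild", "High",   "Weak",   "No"),   # D8
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),  # D9
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),  # D10
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),  # D11
    ("Overcast", "Mild", "High",   "Strong", "Yes"),  # D12
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),  # D13
    ("Rain",     "Mild", "High",   "Strong", "No"),   # D14
]
COLS = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Wind": 3}

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr):
    col = COLS[attr]
    labels = [r[-1] for r in rows]
    g = entropy(labels)
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

for attr in COLS:                              # slide 38: Outlook wins (~0.247)
    print(attr, round(gain(DATA, attr), 3))

sunny = [r for r in DATA if r[0] == "Sunny"]
for attr in ("Temp", "Humidity", "Wind"):      # slide 39: Humidity wins (~0.97)
    print("Sunny:", attr, round(gain(sunny, attr), 3))
```
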
40. ID3 Algorithm
(Final tree: Outlook = Sunny -> test Humidity (High -> No [D1,D2], Normal -> Yes [D8,D9,D11]); Outlook = Overcast -> Yes [D3,D7,D12,D13]; Outlook = Rain -> test Wind (Strong -> No [D6,D14], Weak -> Yes [D4,D5,D10]).)

41. Hypothesis Space Search ID3
(Figure: ID3's greedy search through the space of decision trees, growing candidate trees by successively adding attribute tests such as A1, A2, A3, A4.)

42. Hypothesis Space Search ID3
- Hypothesis space is complete!
  - Target function is surely in there...
- Outputs a single hypothesis
- No backtracking on selected attributes (greedy search)
  - Local minima (suboptimal splits)
- Statistically-based search choices
  - Robust to noisy data
- Inductive bias (search bias)
  - Prefer shorter trees over longer ones
  - Place high information gain attributes close to the root

43. Converting a Tree to Rules
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
(Tree: the PlayTennis tree from slide 40.)

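The conversion itself is mechanical: walk the tree from the root to each leaf and collect the tests along the way. The sketch below assumes a nested-dict encoding of the tree, which is my choice for illustration and not something from the slides.

```python
# Emit one rule per root-to-leaf path of a (nested-dict encoded) tree.
tree = {"Outlook": {
    "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def tree_to_rules(node, conditions=()):
    if not isinstance(node, dict):                    # leaf: emit a rule
        body = " AND ".join(f"({a}={v})" for a, v in conditions) or "TRUE"
        return [f"If {body} Then PlayTennis={node}"]
    (attr, branches), = node.items()                  # single test per node
    rules = []
    for value, child in branches.items():
        rules += tree_to_rules(child, conditions + ((attr, value),))
    return rules

for rule in tree_to_rules(tree):
    print(rule)                                       # prints R1 .. R5 above
```
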
44. Conclusions
- Decision tree learning provides a practical method for concept learning.
- ID3-like algorithms search a complete hypothesis space.
- The inductive bias of decision trees is a preference (search) bias.
- Overfitting the training data (you will see it ;-)) is an important issue in decision tree learning.
- A large number of extensions of the ID3 algorithm have been proposed for overfitting avoidance, handling missing attributes, handling numerical attributes, etc. (feel free to try them out).

