3. Tree Learning

A learning algorithm is applied to the training set to induce a model (tree); the model is then applied to the test set to deduce the class of unseen records (induction vs. deduction).

Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
4. Example of a Decision Tree

Attributes: Refund (categorical), Marital Status (categorical), Taxable Income (continuous); Class is the target. Model (decision tree) learned from the training data, with Refund, MarSt, and TaxInc as the splitting attributes:

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
5. Another Example of Decision Tree

The same training data also fits a tree that splits on Marital Status first; there could be more than one tree that fits the same data!

MarSt?
  Married          -> NO
  Single, Divorced -> Refund?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
21. Decision Tree Induction Pseudocode

DTree(examples, features) returns a tree:
  If all examples are in one category, return a leaf node with that category label.
  Else if the set of features is empty, return a leaf node with the category label that is most common in examples.
  Else pick a feature F and create a node R for it.
    For each possible value v_i of F:
      Let examples_i be the subset of examples that have value v_i for F.
      Add an outgoing edge E to node R labeled with the value v_i.
      If examples_i is empty, then attach a leaf node to edge E labeled with the category that is most common in examples.
      Else call DTree(examples_i, features - {F}) and attach the resulting tree as the subtree under edge E.
    Return the subtree rooted at R.
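The pseudocode above maps almost line-for-line onto Python. A minimal sketch, assuming examples are represented as dicts with a "label" key (a representation the slide does not specify) and that the feature to split on is picked arbitrarily rather than by information gain:

```python
from collections import Counter

def most_common_label(examples):
    # Majority category among the examples.
    return Counter(e["label"] for e in examples).most_common(1)[0][0]

def dtree(examples, features):
    # If all examples are in one category, return a leaf with that label.
    labels = {e["label"] for e in examples}
    if len(labels) == 1:
        return labels.pop()
    # If no features remain, return a leaf with the most common label.
    if not features:
        return most_common_label(examples)
    # Pick a feature F and create a node R for it.
    f = next(iter(features))
    node = {"feature": f, "children": {}}
    # One outgoing edge per observed value v_i of F; since the values are
    # drawn from the examples themselves, examples_i is never empty here.
    for v in {e[f] for e in examples}:
        examples_i = [e for e in examples if e[f] == v]
        node["children"][v] = dtree(examples_i, features - {f})
    return node
```

ID3 and C4.5 replace the arbitrary `next(iter(features))` choice with the attribute of maximal information gain.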
22. The Basic Decision Tree Learning Algorithm: Top-Down Induction of Decision Trees. This approach is exemplified by the ID3 algorithm and its successor, C4.5.
24. How to Determine the Best Split

Figure: the same customers are split in two candidate ways, by Income (>= 10k vs. < 10k) and by Age (young vs. old); each split separates the "good customers" from the "fair customers" to a different degree, and the better split is the one whose children are purer.
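One way to make "best split" precise is to compare the weighted entropy of the children each split produces; the lower it is, the purer the split. The customer counts below are hypothetical, chosen only to mirror the figure's "good" vs. "fair" contrast:

```python
from math import log2

def entropy(counts):
    # Entropy (bits) of a class distribution given as raw counts.
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def split_entropy(partitions):
    # Weighted average entropy of the children after a split.
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)

# Hypothetical (good, fair) customer counts in each branch:
income_split = [(18, 2), (3, 17)]  # income >= 10k vs. < 10k: nearly pure children
age_split = [(11, 9), (10, 10)]    # young vs. old: nearly uniform children
```

Here `split_entropy(income_split)` is about 0.54 bits against about 1.0 for `age_split`, so the income split would be preferred.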
31. Hunt's Algorithm

The tree is grown by stepwise refinement:
Step 1: a single leaf predicting Cheat=No.
Step 2: split on Refund: Yes -> Cheat=No; No -> Cheat=No.
Step 3: under Refund=No, split on Marital Status: Single, Divorced -> Cheat; Married -> Cheat=No.
Step 4: under Marital Status = Single, Divorced, split on Taxable Income: < 80K -> Cheat=No; >= 80K -> Cheat.
43. Training Examples for PlayTennis

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
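As a quick check on this table, the class entropy at the root (9 Yes vs. 5 No) can be computed directly:

```python
from math import log2

# Class distribution over the 14 examples above: 9 Yes, 5 No.
p_yes, p_no = 9 / 14, 5 / 14
h = -(p_yes * log2(p_yes) + p_no * log2(p_no))
print(round(h, 2))  # 0.94 bits, the figure used on the following slides
```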
45. Selecting the Next Attribute (ID3): which attribute is the best classifier?
47. ID3

Entropy at a node is the amount of information required to specify the class of an example given that it reaches that node. At the root (9 play, 5 don't play) the entropy is 0.94 bits. Expected entropy after each candidate split:

outlook (sunny, overcast, rainy): 5/14 * 0.97 + 4/14 * 0.0 + 5/14 * 0.97 = 0.69 bits; gain: 0.25 bits
humidity (high, normal): 7/14 * 0.98 + 7/14 * 0.59 = 0.79 bits; gain: 0.15 bits
temperature (hot, mild, cool): 4/14 * 1.0 + 6/14 * 0.92 + 4/14 * 0.81 = 0.91 bits; gain: 0.03 bits
windy (false, true): 8/14 * 0.81 + 6/14 * 1.0 = 0.89 bits; gain: 0.05 bits

Outlook has the maximal information gain and is selected as the root.
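These gains can be reproduced from the PlayTennis table on slide 43. A minimal sketch (attribute order: Outlook, Temperature, Humidity, Wind, with the class label last):

```python
from math import log2

# The 14 PlayTennis examples: (Outlook, Temperature, Humidity, Wind, Play).
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    # Entropy (bits) of the class labels (last column) in `rows`.
    n = len(rows)
    counts = {}
    for r in rows:
        counts[r[-1]] = counts.get(r[-1], 0) + 1
    return -sum(c / n * log2(c / n) for c in counts.values())

def gain(rows, i):
    # Information gain of splitting `rows` on attribute index `i`.
    n = len(rows)
    remainder = 0.0
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder

gains = {a: round(gain(DATA, i), 2) for a, i in ATTRS.items()}
print(gains)  # Outlook has the maximal gain, 0.25 bits
```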
48. ID3

Under outlook = sunny (2 play, 3 don't play; entropy 0.97 bits):

humidity (high, normal): 3/5 * 0.0 + 2/5 * 0.0 = 0.0 bits; gain: 0.97 bits
temperature (hot, mild, cool): 2/5 * 0.0 + 2/5 * 1.0 + 1/5 * 0.0 = 0.40 bits; gain: 0.57 bits
windy (false, true): 3/5 * 0.92 + 2/5 * 1.0 = 0.95 bits; gain: 0.02 bits

Humidity has the maximal information gain and is selected for this branch.
49. ID3

Under outlook = rainy (3 play, 2 don't play; entropy 0.97 bits):

temperature (mild, cool): 3/5 * 0.92 + 2/5 * 1.0 = 0.95 bits; gain: 0.02 bits
humidity (high, normal): 2/5 * 1.0 + 3/5 * 0.92 = 0.95 bits; gain: 0.02 bits
windy (false, true): 3/5 * 0.0 + 2/5 * 0.0 = 0.0 bits; gain: 0.97 bits

Windy has the maximal information gain and is selected for this branch.
50. ID3

The resulting decision tree:

outlook?
  sunny    -> humidity?
                high   -> No
                normal -> Yes
  overcast -> Yes
  rainy    -> windy?
                false -> Yes
                true  -> No
55. Complex DT vs. Simple DT

Figure: two trees that fit the same data (P = play, N = don't play). The complex tree splits first on temperature (cool, mild, hot) and then repeats outlook, windy, and humidity tests in several branches, including a null leaf. The simple tree splits on outlook at the root:

outlook?
  sunny    -> humidity?
                high   -> N
                normal -> P
  overcast -> P
  rain     -> windy?
                true  -> N
                false -> P
68. Weather data – nominal values (witten & eibe)

Outlook   Temperature  Humidity  Windy  Play
Sunny     Hot          High      False  No
Sunny     Hot          High      True   No
Overcast  Hot          High      False  Yes
Rainy     Mild         Normal    False  Yes
...       ...          ...       ...    ...

Rules derived from the data:
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
77. Distribute Instances

Figure: a training record whose Refund value is missing is sent down both branches of the Refund node with fractional weight. The probability that Refund = Yes is 3/9 and that Refund = No is 6/9, so the record is assigned to the Refund = Yes child with weight = 3/9 and to the Refund = No child with weight = 6/9.
79. Classify Instances

Decision tree: Refund (Yes -> NO; No -> MarSt); MarSt (Married -> NO; Single, Divorced -> TaxInc); TaxInc (< 80K -> NO; > 80K -> YES).

A new record with a missing Marital Status value is classified using the class counts at the MarSt node:

           Married  Single  Divorced  Total
Class=No   3        1       0         4
Class=Yes  6/9      1       1         2.67
Total      3.67     2       1         6.67

Probability that Marital Status = Married is 3.67/6.67; probability that Marital Status = {Single, Divorced} is 3/6.67.
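The branch probabilities follow directly from the fractional counts; a minimal sketch (the 6/9 entry is the weight of the record distributed on slide 77):

```python
# Class counts reaching each Marital Status branch.
counts = {
    "Married":  {"No": 3, "Yes": 6 / 9},
    "Single":   {"No": 1, "Yes": 1},
    "Divorced": {"No": 0, "Yes": 1},
}
total = sum(sum(c.values()) for c in counts.values())  # 6.67
p_married = sum(counts["Married"].values()) / total    # 3.67 / 6.67 = 0.55
p_single_divorced = 1 - p_married                      # 3 / 6.67 = 0.45
```

The record with the missing value is then pushed down both branches with these weights, and the class distributions of the leaves it reaches are combined accordingly.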
94. Overfitting Example

Testing Ohm's Law: V = IR (I = (1/R)V). Experimentally measure 10 points and fit a curve to the resulting data. A 9th-degree polynomial fits the training data perfectly (a polynomial of degree n-1 can fit n points exactly), leading to the conclusion that "Ohm was wrong, we have found a more accurate function!"
95. Overfitting Example

Testing Ohm's Law: V = IR (I = (1/R)V). A linear function that fits the training data less accurately gives better generalization.
109. Example (witten & eibe)

Subtree pruning: the three children have observed error rates f = 0.33, 0.5, and 0.33, giving estimated errors e = 0.47, 0.72, and 0.47. Combined using the ratios 6:2:6, the children's estimated error is e = 0.51. At the parent node, f = 5/14 gives e = 0.46. Since e = 0.46 < 0.51, prune!
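The pruning decision reduces to a weighted average: combine the children's estimated errors in proportion to the number of training instances each child covers, and prune if the parent's own estimate is lower.

```python
# Children's estimated errors and their instance counts (ratios 6:2:6).
child_errors = [0.47, 0.72, 0.47]
weights = [6, 2, 6]
combined = sum(e * w for e, w in zip(child_errors, weights)) / sum(weights)
parent_error = 0.46  # estimate at the parent node, from f = 5/14

prune = parent_error < combined  # 0.46 < 0.51, so prune
```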