Machine Learning: Decision Tree Learning. Gun Ho Lee, Soongsil University, Seoul
Decision Tree Learning
Tree Learning Task: a learning algorithm induces a model (tree) from the training set; applying the model to the test set is the deduction step.

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Example of a Decision Tree. Attributes: Refund (categorical), MarSt (categorical), TaxInc (continuous); Cheat is the class. The splitting attributes chosen from the training data give the model:

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
Another Example of a Decision Tree, over the same attributes:

MarSt?
  Married          -> NO
  Single, Divorced -> Refund?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

There can be more than one tree that fits the same data!
Decision Tree Classification Task
Apply Model to Test Data: start from the root of the tree and route the test record (Refund = No, Marital Status = Married, Taxable Income = 80K) down the matching branches. Refund = No leads to the MarSt node; MarSt = Married leads to a leaf, so assign Cheat to "No".
Overview
Decision Trees (illustrated with two example trees over color {red, blue, green} and shape {circle, square, triangle}; one labels its leaves pos/neg, the other with categories A, B, C)
Properties of Decision Tree Learning
What makes a good tree?
Basic Algorithm
Top-Down Decision Tree Induction. Example set: <big, red, circle>: +, <small, red, circle>: +, <small, red, square>: -, <big, blue, circle>: -.

Splitting first on color sends all three red examples down one branch and the blue example down another; splitting the red branch on shape then separates the classes:

color?
  red   -> shape?
             circle -> pos  (<big, red, circle>: +, <small, red, circle>: +)
             square -> neg  (<small, red, square>: -)
  blue  -> neg  (<big, blue, circle>: -)
  green -> (no examples)
Decision Tree Induction Pseudocode

DTree(examples, features) returns a tree
  If all examples are in one category,
    return a leaf node with that category label.
  Else if the set of features is empty,
    return a leaf node with the category label that is the most common in examples.
  Else pick a feature F and create a node R for it.
    For each possible value vi of F:
      Let examples_i be the subset of examples that have value vi for F.
      Add an out-going edge E to node R labeled with the value vi.
      If examples_i is empty,
        attach a leaf node to edge E labeled with the category that is the most common in examples.
      Else call DTree(examples_i, features - {F}) and attach the resulting tree as the subtree under edge E.
  Return the subtree rooted at R.
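A direct Python transcription of this pseudocode, as a sketch only; the "pick a feature F" step is left pluggable because the pseudocode does not fix it (pick_feature, values, and the (feature_dict, label) example format are assumptions of mine):

from collections import Counter

def most_common_label(examples):
    # examples: list of (feature_dict, label) pairs
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtree(examples, features, pick_feature, values):
    # features: a set of feature names; values[f]: the possible values of f
    labels = {label for _, label in examples}
    if len(labels) == 1:                     # all examples in one category
        return labels.pop()
    if not features:                         # no features left: majority label
        return most_common_label(examples)
    f = pick_feature(examples, features)     # "pick a feature F", pluggable
    node = {f: {}}                           # node R with out-going edges
    for v in values[f]:
        subset = [(x, y) for x, y in examples if x[f] == v]
        if not subset:                       # empty branch: parent majority
            node[f][v] = most_common_label(examples)
        else:
            node[f][v] = dtree(subset, features - {f}, pick_feature, values)
    return node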
The Basic Decision Tree Learning Algorithm: Top-Down Induction of Decision Trees. This approach is exemplified by the ID3 algorithm and its successor, C4.5.
Tree Induction
How to determine the Best Split: for example, customers can be split on Income (>= 10K vs < 10K) or on Age (young vs old); the better split is the one that better separates good customers from fair customers.
How to determine the Best Split: prefer splits whose child nodes have a low degree of impurity.
  50% red / 50% green   -> high degree of impurity
  75% red / 25% green   -> lower impurity
  100% red / 0% green   -> pure node (lowest impurity)
Split Selection Method: for a numeric attribute such as Age, choose a split point along the sorted values (e.g., Age < 33, between the observed values 30 and 35).
Split Selection Method (Contd.): for a categorical attribute such as Car, partition the values into groups (e.g., {Sport, Truck} vs {Minivan}).
Decision Tree Induction
History of Decision-Tree Research
General Structure of Hunt’s Algorithm: let Dt be the set of training records that reach a node t; the tree is grown by recursively partitioning Dt.
Hunt’s Algorithm, applied step by step to the tax-cheat data:
  Step 1: a single leaf predicting Cheat = No.
  Step 2: split on Refund: Yes -> Cheat = No; No -> Cheat = No.
  Step 3: under Refund = No, split on Marital Status: Married -> Cheat = No; Single, Divorced -> Cheat.
  Step 4: under Single, Divorced, split on Taxable Income: < 80K -> Cheat = No; >= 80K -> Cheat.
What is ID3 (Iterative Dichotomiser 3)?
ID3
ID3 Algorithm Overview
ID3 Algorithm
Picking a Good Split Feature
Entropy: for a sample S with a proportion p+ of positive and p- of negative examples, Entropy(S) = -p+ log2(p+) - p- log2(p-).
Example – Information Needed
Entropy (Contd.)
Entropy Plot for Binary Classification: the entropy is 0 when p = 0 or p = 1 and reaches its maximum of 1 bit at p = 0.5.
Information Gain: the expected reduction in entropy caused by partitioning S on attribute A, Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv| / |S|) Entropy(Sv).
Information Gain example. Start with 2+, 2- examples (E = 1) and compare three splits:

  size (big / small): branches 1+,1- and 1+,1-, each E = 1.
    Gain = 1 - (0.5*1 + 0.5*1) = 0
  color (red / blue): branches 2+,1- (E = 0.918) and 0+,1- (E = 0).
    Gain = 1 - (0.75*0.918 + 0.25*0) = 0.311
  shape (circle / square): branches 2+,1- (E = 0.918) and 0+,1- (E = 0).
    Gain = 1 - (0.75*0.918 + 0.25*0) = 0.311
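These gains can be reproduced in a few lines of Python; a sketch under the assumption that examples are (feature_dict, label) pairs with labels "+" and "-":

from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def information_gain(examples, feature):
    # Expected entropy after splitting, weighted by branch size.
    labels = [y for _, y in examples]
    remainder = 0.0
    for v in {x[feature] for x, _ in examples}:
        branch = [y for x, y in examples if x[feature] == v]
        remainder += len(branch) / len(examples) * entropy(branch)
    return entropy(labels) - remainder

data = [({"size": "big",   "color": "red",  "shape": "circle"}, "+"),
        ({"size": "small", "color": "red",  "shape": "circle"}, "+"),
        ({"size": "small", "color": "red",  "shape": "square"}, "-"),
        ({"size": "big",   "color": "blue", "shape": "circle"}, "-")]

for f in ("size", "color", "shape"):
    print(f, round(information_gain(data, f), 3))   # size 0.0, color 0.311, shape 0.311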
Training Examples for PlayTennis

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
ID3: over all 14 examples, p_yes = 9/14 and p_no = 5/14, so the entropy of the whole training set is E = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94 bits.
Selecting the Next Attribute: which attribute is the best classifier?
Example – branch on Outlook. Partitioning the 14 examples by Outlook value:
  sunny:    p1 = 2, n1 = 3, E(p1, n1) = 0.971
  overcast: p2 = 4, n2 = 0, E(p2, n2) = 0
  rain:     p3 = 3, n3 = 2, E(p3, n3) = 0.971
ID3: 0.94 bits of information are required to specify the class of an example that reaches the root. Expected information after each candidate split:

  outlook:     (5/14)*0.97 + (4/14)*0.0 + (5/14)*0.97  = 0.69 bits -> gain: 0.25 bits (maximal information gain)
  humidity:    (7/14)*0.98 + (7/14)*0.59               = 0.79 bits -> gain: 0.15 bits
  temperature: (4/14)*1.0 + (6/14)*0.92 + (4/14)*0.81  = 0.91 bits -> gain: 0.03 bits
  windy:       (8/14)*0.81 + (6/14)*1.0                = 0.89 bits -> gain: 0.05 bits

Outlook gives the maximal information gain, so it becomes the root.
ID3, under outlook = sunny (5 examples, entropy 0.97 bits):

  humidity:    (3/5)*0.0 + (2/5)*0.0             = 0.0 bits  -> gain: 0.97 bits (maximal information gain)
  temperature: (2/5)*0.0 + (2/5)*1.0 + (1/5)*0.0 = 0.40 bits -> gain: 0.57 bits
  windy:       (3/5)*0.92 + (2/5)*1.0            = 0.95 bits -> gain: 0.02 bits

Humidity gives the maximal gain, so the sunny subtree splits on humidity.
ID3, under outlook = rainy (5 examples, entropy 0.97 bits):

  humidity:    (2/5)*1.0 + (3/5)*0.92 = 0.95 bits -> gain: 0.02 bits
  temperature: (3/5)*0.92 + (2/5)*1.0 = 0.95 bits -> gain: 0.02 bits
  windy:       (3/5)*0.0 + (2/5)*0.0  = 0.0 bits  -> gain: 0.97 bits (maximal)

Windy gives the maximal gain, so the rainy subtree splits on windy.
ID3, final tree:

outlook?
  sunny    -> humidity?
                high   -> No
                normal -> Yes
  overcast -> Yes
  rainy    -> windy?
                false -> Yes
                true  -> No
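Putting the pieces together, the following self-contained sketch re-implements ID3 (greedy splitting on maximal information gain) and recovers exactly this tree from the 14 PlayTennis rows; all names are mine:

from collections import Counter
import math

ROWS = [  # (Outlook, Temperature, Humidity, Wind, PlayTennis) for D1..D14
    ("Sunny","Hot","High","Weak","No"),        ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),    ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),     ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"),("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),    ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),  ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),  ("Rain","Mild","High","Strong","No")]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]
DATA = [dict(zip(ATTRS + ["Play"], r)) for r in ROWS]

def entropy(rows):
    n = len(rows)
    return -sum(c/n * math.log2(c/n) for c in Counter(r["Play"] for r in rows).values())

def gain(rows, a):
    parts = {}
    for r in rows:
        parts.setdefault(r[a], []).append(r)
    remainder = sum(len(s)/len(rows) * entropy(s) for s in parts.values())
    return entropy(rows) - remainder

def id3(rows, attrs):
    classes = {r["Play"] for r in rows}
    if len(classes) == 1:                         # pure node -> leaf
        return classes.pop()
    if not attrs:                                 # no attributes left -> majority
        return Counter(r["Play"] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))   # maximal information gain
    return {best: {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best])
                   for v in sorted({r[best] for r in rows})}}

print(id3(DATA, ATTRS))
# {'Outlook': {'Overcast': 'Yes',
#              'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
#              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}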
Hypothesis Space Search in Decision Tree Learning
Occam’s Razor (William of Ockham, AD 1285? – 1347?): prefer the simplest hypothesis that fits the data.
Complex DT vs. Simple DT: the same data can be fit by a convoluted tree rooted at temperature (with nested outlook, windy, and humidity tests under each temperature value) or by the simple tree rooted at outlook shown above; the simple tree is preferred.
Inductive Bias in Decision Tree Learning: Occam’s Razor
Decision Tree Representation
Disadvantages
Advantages
C4.5 History
C4.5
Weakness of ID3: Highly-branching attributes
Weakness of ID3: splitting on an ID code attribute. Give each PlayTennis example a unique ID code (D1 ... D14) and the ID attribute splits the data into 14 pure, single-example branches (No, No, Yes, ...). Its information gain is maximal, yet the split is useless for predicting new days.
C4.5 (Contd.): C4.5 counters highly-branching attributes with the gain ratio, Gain(S, A) / SplitInfo(S, A), where SplitInfo(S, A) = -sum over v of (|Sv| / |S|) log2(|Sv| / |S|) grows with the number and evenness of branches and so penalizes splits like the ID code.
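The original bullet text of this slide is lost in the transcript, but the gain ratio just described is standard C4.5; a self-contained sketch (helper names mine):

from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def gain_ratio(examples, feature):
    # examples: list of (feature_dict, label) pairs
    n = len(examples)
    branches = {}
    for x, y in examples:
        branches.setdefault(x[feature], []).append(y)
    gain = entropy([y for _, y in examples]) \
           - sum(len(b)/n * entropy(b) for b in branches.values())
    # Split information: the entropy of the partition itself. An ID-code
    # attribute with 14 singleton branches has split info log2(14) = 3.81,
    # which shrinks its otherwise maximal gain (0.94 / 3.81 = 0.25).
    split_info = -sum(len(b)/n * math.log2(len(b)/n) for b in branches.values())
    return gain / split_info if split_info > 0 else 0.0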
Numeric attributes (witten & eibe)
Example (witten & eibe): splitting on a numeric attribute. Sort the temperature values together with their classes:

  value: 64  65  68  69  70  71  72  72  75  75  80  81  83  85
  class: Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

Candidate split points lie between adjacent values; e.g., temperature < 71.5 gives 4 yes / 2 no, and temperature >= 71.5 gives 5 yes / 3 no.
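A sketch of this threshold search on the sorted temperatures (choosing midpoints between distinct adjacent values as breakpoints is my convention; the 71.5 split from the example falls out of it):

from collections import Counter
import math

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play  = ["Yes","No","Yes","Yes","Yes","No","No","Yes","Yes","Yes","No","Yes","Yes","No"]

def entropy(labels):
    n = len(labels)
    return -sum(c/n * math.log2(c/n) for c in Counter(labels).values())

def info_for_threshold(t):
    # Expected class entropy after splitting at temperature t.
    left  = [c for v, c in zip(temps, play) if v < t]
    right = [c for v, c in zip(temps, play) if v >= t]
    n = len(temps)
    return len(left)/n * entropy(left) + len(right)/n * entropy(right)

# Candidate breakpoints: midpoints between distinct adjacent values.
candidates = sorted({(a + b) / 2 for a, b in zip(temps, temps[1:]) if a != b})
print(round(info_for_threshold(71.5), 3))          # 0.939 bits: the slide's example split
best = min(candidates, key=info_for_threshold)
print(best, round(info_for_threshold(best), 3))    # 84.0 0.827 on this toy data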
Avoid repeated sorting! (witten & eibe)
Weather data – nominal values (witten & eibe):

  Outlook   Temperature  Humidity  Windy  Play
  Sunny     Hot          High      False  No
  Sunny     Hot          High      True   No
  Overcast  Hot          High      False  Yes
  Rainy     Mild         Normal    False  Yes
  ...       ...          ...       ...    ...

Rules derived from the tree:
  If outlook = sunny and humidity = high then play = no
  If outlook = rainy and windy = true then play = no
  If outlook = overcast then play = yes
  If humidity = normal then play = yes
  If none of the above then play = yes
More speeding up: only a few breakpoints need to be evaluated, because breakpoints between values of the same class cannot be optimal:

  value: 64  65  68  69  70  71  72  72  75  75  80  81  83  85
  class: Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

Only boundaries where the class changes are potential optimal breakpoints.
Continuous Attributes: Computing the Gini Index. Sort the attribute values, then sweep over the split positions between adjacent sorted values, updating running class counts and computing the Gini index of each candidate split.
Splitting Based on Nominal Attributes:
  Multi-way split: one partition per distinct value, e.g., CarType -> Family | Sports | Luxury.
  Binary split: divide the values into two subsets, e.g., CarType -> {Family, Luxury} vs {Sports}, or {Sports, Luxury} vs {Family}. For an ordinal attribute the grouping should respect the order; Size -> {Small, Large} vs {Medium} violates it.
Splitting Based on Continuous Attributes: either a binary split (A < v vs. A >= v) or a multi-way split into disjoint ranges via discretization.
Missing as a separate value
Handling Missing Attribute Values
Computing the impurity measure when one record's Refund value is missing. Before splitting:
  Entropy(Parent) = -0.3 log2(0.3) - 0.7 log2(0.7) = 0.8813

Split on Refund, using only the 9 records whose Refund value is known (3 Yes, 6 No):
  Entropy(Refund=Yes) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0
  Entropy(Refund=No)  = -(2/6) log2(2/6) - (4/6) log2(4/6) = 0.9183
  Entropy(Children)   = (3/10)(0) + (6/10)(0.9183) = 0.551

Gain = 0.8813 - 0.551 = 0.3303
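A quick check of this arithmetic in Python (the helper name is mine; 0 * log 0 is treated as 0, as usual):

import math

def entropy2(p, n):
    # Binary entropy of p positives and n negatives.
    total = p + n
    return -sum(c/total * math.log2(c/total) for c in (p, n) if c > 0)

parent = entropy2(3, 7)                           # 0.8813, before splitting
yes_branch = entropy2(0, 3)                       # Refund = Yes: 0
no_branch = entropy2(2, 4)                        # Refund = No: 0.9183
children = 3/10 * yes_branch + 6/10 * no_branch   # 0.551 (missing row excluded)
print(round(parent - children, 4))                # Gain = 0.3303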
Distribute Instances: a training record with a missing Refund value is sent down both branches with fractional weights. Since the known records split 3 Refund=Yes to 6 Refund=No, the probability that Refund=Yes is 3/9 and that Refund=No is 6/9; assign the record to the Yes child with weight 3/9 and to the No child with weight 6/9.
Classify Instances: a new record with Refund = No but a missing Marital Status reaches the MarSt node of the tree (Refund -> MarSt -> TaxInc, as before). The weighted training counts at that node are:

             Married  Single  Divorced  Total
  Class=No   3        1       0         4
  Class=Yes  6/9      1       1         2.67
  Total      3.67     2       1         6.67

So the probability that Marital Status = Married is 3.67/6.67, and the probability that Marital Status = {Single, Divorced} is 3/6.67; the record is routed down both branches with these weights.
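A sketch of the resulting weighted prediction; since the record's Taxable Income is not recoverable from this transcript, TaxInc = 90K (> 80K) is an assumed value for illustration:

# Mix the leaf predictions of every MarSt branch, weighted by the
# training counts at that node.
weights = {"Married": 3.67 / 6.67, "Single, Divorced": 3 / 6.67}
p_yes_at_leaf = {"Married": 0.0,              # Married branch -> leaf NO
                 "Single, Divorced": 1.0}     # assumed TaxInc = 90K > 80K -> leaf YES
p_yes = sum(weights[b] * p_yes_at_leaf[b] for b in weights)
print(round(p_yes, 2))                        # 0.45 -> classify as NO (majority)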
 
From trees to rules – how?
From trees to rules – simple (witten & eibe)
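The bullets here are lost in the transcript, but the simple scheme is one rule per root-to-leaf path. A sketch over the PlayTennis tree in the nested-dict form used earlier (it produces no default rule and does no rule pruning; C4.5rules adds those on top):

def tree_to_rules(node, conditions=()):
    # Emit one rule per root-to-leaf path of a nested-dict tree.
    if not isinstance(node, dict):               # leaf: finish the rule
        tests = " and ".join(f"{a} = {v}" for a, v in conditions)
        return [f"If {tests} then play = {node}"]
    (attribute, branches), = node.items()
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules

tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                    "overcast": "yes",
                    "rainy": {"windy": {"false": "yes", "true": "no"}}}}
for rule in tree_to_rules(tree):
    print(rule)
# If outlook = sunny and humidity = high then play = no
# If outlook = sunny and humidity = normal then play = yes
# If outlook = overcast then play = yes
# If outlook = rainy and windy = false then play = yes
# If outlook = rainy and windy = true then play = no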
C4.5rules: choices and options (witten & eibe)
CART Split Selection Method
CART
Measure of Impurity: GINI. For a node t, GINI(t) = 1 - sum over classes j of [p(j | t)]^2, where p(j | t) is the relative frequency of class j at node t.
Examples for computing GINI (6 examples, two classes):
  C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1; Gini = 1 - P(C1)^2 - P(C2)^2 = 1 - 0 - 1 = 0
  C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6; Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278
  C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6; Gini = 1 - (2/6)^2 - (4/6)^2 = 0.444
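The same computation as a small Python helper (the name is mine):

def gini(counts):
    # counts: class counts at a node, e.g. (1, 5)
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

print(gini((0, 6)), round(gini((1, 5)), 3), round(gini((2, 4)), 3))
# 0.0 0.278 0.444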
Comparison among Splitting Criteria: for a 2-class problem, entropy, Gini, and misclassification error all equal 0 for a pure node (p = 0 or p = 1) and peak when the classes are evenly mixed (p = 0.5).
CART (Contd.)
Comparing Attribute Selection Measures
Issues in Decision Tree Learning: Overfitting in Decision Trees. A hypothesis h is said to overfit the training data if there exists an alternative hypothesis h' with higher error than h on the training examples but lower error over the whole distribution of instances.
Overfitting: as hypothesis complexity increases, accuracy on the training data keeps improving while accuracy on unseen test data rises, peaks, and then falls.
Overfitting Example: testing Ohm's law, V = IR (i.e., I = (1/R)V). Experimentally measure 10 points and fit a curve to the resulting data. A 9th-degree polynomial fits the training data perfectly, since an (n-1)-degree polynomial can fit any n points exactly, tempting the conclusion: "Ohm was wrong, we have found a more accurate function!"
Overfitting Example (continued): a linear function that fits the training data less accurately gives better generalization.
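A hedged sketch of this experiment with numpy (R = 5 and the noise level are made-up values; numpy may warn that the degree-9 fit is poorly conditioned, which is rather the point):

import numpy as np

rng = np.random.default_rng(0)
v = np.linspace(1.0, 10.0, 10)              # 10 measured voltages
i = v / 5.0 + rng.normal(0.0, 0.05, 10)     # Ohm's law with R = 5, plus noise

flexible = np.polyfit(v, i, 9)              # degree 9: interpolates all 10 points
simple = np.polyfit(v, i, 1)                # degree 1: the "less accurate" line

print(np.polyval(flexible, 12.0))           # erratic away from the training data
print(np.polyval(simple, 12.0))             # near the true value 12/5 = 2.4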
Overfitting Noise in Decision Trees: noisy (mislabeled) training examples can force extra splits; the example tree over color and shape grows additional branches purely to accommodate the noise.
Overfitting Noise in Decision Trees (continued): with the conflicting examples <big, blue, circle>: - and <medium, blue, circle>: +, the blue branch must be split further on size to separate them, fitting the noise rather than the target concept.
Overfitting Prevention (Pruning) Methods
Pruning
Early stopping (witten & eibe)
Post-pruning (witten & eibe)
Post-pruning: Subtree replacement, 1 (witten & eibe)
Post-pruning: Subtree replacement, 2
Subtree replacement, 3 (witten & eibe)
Estimating error rates (witten & eibe)
*Mean and variance (witten & eibe)
*Subtree raising (witten & eibe)
C4.5’s method: estimate the error at a node from its own training data, taking the upper limit of a confidence interval on the observed error rate f over the node's N examples: e = (f + z^2/(2N) + z * sqrt(f/N - f^2/N + z^2/(4N^2))) / (1 + z^2/N), with c = 25% (z = 0.69) by default (witten & eibe).
Example (witten & eibe): the three children of a node have f = 0.33 (e = 0.47), f = 0.5 (e = 0.72), and f = 0.33 (e = 0.47); combining them with the ratios 6:2:6 gives e = 0.51. Collapsing the subtree to a single leaf gives f = 5/14 and e = 0.46. Since 0.46 < 0.51, prune!
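These e values follow from the upper-confidence-limit formula above; a sketch that checks them (with this formula the leaf value computes to 0.45, which the slide reports as 0.46):

import math

def pessimistic_error(f, n, z=0.69):        # z = 0.69 for confidence c = 25%
    # Upper limit of the confidence interval on the true error rate,
    # given observed error f over the n training examples at the node.
    return (f + z*z/(2*n)
            + z * math.sqrt(f/n - f*f/n + z*z/(4*n*n))) / (1 + z*z/n)

for f, n in ((2/6, 6), (1/2, 2), (5/14, 14)):
    print(round(pessimistic_error(f, n), 2))    # 0.47, 0.72, 0.45

combined = (6*0.47 + 2*0.72 + 6*0.47) / 14      # children mixed 6:2:6
print(round(combined, 2))                       # 0.51 -> prune, since the leaf is lower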
Issues in Decision Tree Learning: Effect of Reduced-Error Pruning
Issues with Reduced-Error Pruning: the validation set must be held out of training, so test accuracy suffers when the number of training examples is small.
Issues in Decision Tree Learning: Rule Post-Pruning
Issues in Decision Tree Learning: Converting A Tree to Rules
Issues in Decision Tree Learning: Attributes with Many Values
Issues in Decision Tree Learning: Attributes with Costs
Issues in Decision Tree Learning
Additional Decision Tree Issues
