2. Training Data
Age Income Student Credit_Rating Buy_NoBuy
Youth High no fair no
Youth High no excellent no
Middle_Aged High no fair yes
Senior Medium no fair yes
Senior Low yes fair yes
Senior Low yes excellent no
Middle_Aged Low yes excellent yes
Youth Medium no fair no
Youth Low yes fair yes
Senior Medium yes fair yes
Youth Medium yes excellent yes
Middle_Aged Medium no excellent yes
Middle_Aged High yes fair yes
Senior Medium no excellent no
Goal: to predict whether a person will buy a computer or not.
3. Prediction using Decision Tree
We want to predict the Buy or NoBuy decision of a person, given
that we know his/her:
• Age
• Income
• Student or Not
• Credit_Rating
4. Prediction with Decision Tree
• Decision trees are powerful and popular tools for classification
and prediction.
• We first make a list of attributes that we can measure. In our
case those are Age, Income, Student or Not, & Credit_Rating.
• We then choose a target attribute that we want to predict, in
our case it is the “Buy_NoBuy” decision.
5. Algorithms
Commonly Used Algorithms
• ID3 (Iterative Dichotomiser 3): developed in the early 1980s; good
for discrete attributes
• C4.5 (an improvement on ID3): handles both continuous and
discrete attributes
• CART (Classification and Regression Tree): developed in 1984;
good for continuous and discrete attributes
6. ID3 Algorithm
• Information gain is used to select the most useful attribute for
classification/splitting.
• To calculate information gain, we first need entropy:
\[
\mathrm{Entropy}(\mathit{Buy\_NoBuy}) = -\sum_{i=1}^{c} p_i \log_2 p_i
= -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.94
\]
Class counts in the Buy_NoBuy column: Yes 9, No 5, Total 14.
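As a sanity check, here is a minimal Python sketch of this entropy computation (the `entropy` helper is our own, not from any library):

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 2))  # 0.94 -- the 9 yes / 5 no split above
```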
7. ID3 Algorithm
The weighted entropy after splitting on Age is:
\[
\mathrm{Entropy}(\mathit{Buy\_NoBuy}, Age) = \sum_{c \in Age} P(c)\,\mathrm{Entropy}(c)
= P(\mathit{Youth})\,\mathrm{Entropy}(2,3) + P(\mathit{Middle\_Aged})\,\mathrm{Entropy}(4,0) + P(\mathit{Senior})\,\mathrm{Entropy}(3,2)
= \frac{5}{14}(0.971) + \frac{4}{14}(0) + \frac{5}{14}(0.971) = 0.694
\]
Age Buy NoBuy Total
Youth 2 3 5
Middle_Aged 4 0 4
Senior 3 2 5
Total 14
Age Buy_NoBuy
Youth no
Youth no
Middle_Aged yes
Senior yes
Senior yes
Senior no
Middle_Aged yes
Youth no
Youth yes
Senior yes
Youth yes
Middle_Aged yes
Middle_Aged yes
Senior no
9. ID3 Algorithm
Information gain is the reduction in entropy achieved by the split:
\[
\mathrm{InformationGain}(\mathit{Buy\_NoBuy}, Age) = 0.94 - 0.694 = 0.246
\]
Similarly:
\[
\begin{aligned}
\mathrm{InformationGain}(\mathit{Buy\_NoBuy}, Income) &= 0.029\\
\mathrm{InformationGain}(\mathit{Buy\_NoBuy}, Student) &= 0.151\\
\mathrm{InformationGain}(\mathit{Buy\_NoBuy}, Credit\_Rating) &= 0.048
\end{aligned}
\]
The attribute with the highest information gain (here Age) is selected as the
splitting attribute.
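These gains can be reproduced with a short Python sketch over the training table from slide 2; the `info_gain` helper and the list-of-dicts representation are our own choices, not part of the original material:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, attr, target="Buy_NoBuy"):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    gain = entropy([r[target] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# The 14 training rows from slide 2.
cols = ["Age", "Income", "Student", "Credit_Rating", "Buy_NoBuy"]
data = """Youth High no fair no
Youth High no excellent no
Middle_Aged High no fair yes
Senior Medium no fair yes
Senior Low yes fair yes
Senior Low yes excellent no
Middle_Aged Low yes excellent yes
Youth Medium no fair no
Youth Low yes fair yes
Senior Medium yes fair yes
Youth Medium yes excellent yes
Middle_Aged Medium no excellent yes
Middle_Aged High yes fair yes
Senior Medium no excellent no"""
rows = [dict(zip(cols, line.split())) for line in data.splitlines()]

for a in cols[:-1]:
    print(a, round(info_gain(rows, a), 3))
# Age 0.247, Income 0.029, Student 0.152, Credit_Rating 0.048
# (matching the slide's values up to rounding)
```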
11. Final Decision Tree Using ID3
Using this tree, we can predict that a young person who is also a
student will buy a computer.
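The finished tree is small enough to write down directly. A minimal sketch, with the tree hand-coded as nested dicts; this structure follows from the training data (Youth splits on Student, Senior on Credit_Rating, and Middle_Aged always buys):

```python
# Inner nodes map an attribute to {value: subtree}; leaves are the label.
tree = {"Age": {
    "Youth":       {"Student": {"yes": "yes", "no": "no"}},
    "Middle_Aged": "yes",
    "Senior":      {"Credit_Rating": {"fair": "yes", "excellent": "no"}},
}}

def predict(node, sample):
    while isinstance(node, dict):
        attr = next(iter(node))            # attribute tested at this node
        node = node[attr][sample[attr]]    # follow the branch for its value
    return node

print(predict(tree, {"Age": "Youth", "Student": "yes"}))  # yes -> will buy
```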
12. Random Forest
• First proposed by Tin Kam Ho of Bell Labs in 1995.
• Random forest is an ensemble/group classifier that consists of
a large number of decision trees.
• Each decision tree gives its own prediction; the final
prediction is made by a majority vote.
13. Step 1
• Take a random sample of size N with replacement from the
data (bootstrap sample).
Selected Age Income Student Credit_Rating Buy_NoBuy
X Youth High no fair no
X Middle_Aged High no fair yes
X Senior Low yes excellent no
X Middle_Aged Low yes excellent yes
X Senior Medium yes fair yes
X Youth Medium yes excellent yes
.
.
Nth Senior Medium no excellent no
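A minimal sketch of this bootstrap step, assuming `rows` is the list-of-dicts training table from the earlier gain computation:

```python
import random

def bootstrap(rows):
    """Draw N rows with replacement from an N-row dataset."""
    return [random.choice(rows) for _ in rows]

sample = bootstrap(rows)  # some originals appear twice, others not at all
```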
14. Step 2
• At each node, take a random sample of m attributes (without
replacement), where M is the total number of attributes and m < M.
• Generally m = sqrt(M).
• Let’s say Age &
Credit_Rating are the
attributes selected
Selected Age Income Student Credit_Rating Buy_NoBuy
X Youth High no fair no
X Middle_Aged High no fair yes
X Senior Low yes excellent no
X Middle_Aged Low yes excellent yes
X Senior Medium yes fair yes
X Youth Medium yes excellent yes
.
.
Nth Senior Medium no excellent no
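The attribute subsampling can be sketched in a couple of lines; `attributes` lists the M = 4 predictor columns:

```python
import math
import random

attributes = ["Age", "Income", "Student", "Credit_Rating"]
m = round(math.sqrt(len(attributes)))      # m = sqrt(M) = 2 here
candidates = random.sample(attributes, m)  # sampled WITHOUT replacement
print(candidates)                          # e.g. ['Age', 'Credit_Rating']
```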
16. Step 3
• Construct a split using the m attributes selected in Step 2.
• Let's say "Age" is selected for the split; the selection can be
made with the information gain method (see the sketch after the table below).
Selected Age Credit_Rating Buy_NoBuy
X Youth fair no
X Middle_Aged fair yes
X Senior excellent no
X Middle_Aged excellent yes
X Senior fair yes
X Youth excellent yes
.
.
X Senior excellent no
(Tree so far: root node Age, with branches Youth, Middle_Aged, and Senior.)
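Steps 1 to 3 fit in a few lines of Python; this sketch reuses `rows` and `info_gain` from the slide-9 sketch, so those names are assumptions carried over from there:

```python
import random

sample = [random.choice(rows) for _ in rows]                # Step 1: bootstrap
candidates = random.sample(["Age", "Income", "Student",
                            "Credit_Rating"], 2)            # Step 2: m = 2 attrs
best = max(candidates, key=lambda a: info_gain(sample, a))  # Step 3: best split
print(best)  # often 'Age' when it is among the candidates
```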
17. Step 4
• Repeat Steps 2 and 3 for each subsequent split until the tree is
complete.
• Say, for Age = Youth, suppose Income & Credit_Rating are the
attributes selected at random.
Selected Age Income Credit_Rating Buy_NoBuy
X Youth High fair no
X
X
X
X
X Youth Medium excellent yes
.
.
Nth
(Tree so far: root node Age, with branches Youth, Middle_Aged, and Senior; the Youth branch is being expanded.)
19. Step 4
• Out of Income & Credit_Rating, say Income is selected for the
split, as in Step 3, using the information gain method.
Selected Age Income Credit_Rating Buy_NoBuy
X Youth High fair no
X
X
X
X
X Youth Medium excellent yes
.
.
Nth
(Tree so far: root Age with branches Youth, Middle_Aged, and Senior; under Youth, an Income node with branches High and Medium.)
20. Step 5
• Repeat Steps 1 to 4 to create a large number of decision trees;
let's say we create 4 trees.
• Make a prediction with each decision tree.
• Make the final prediction by a majority vote over the set of trees.
21. Prediction
• Using the random forest, predict whether a young student with low
income and a fair credit rating will buy a computer or not.
Tree # Predicted (Buy_NoBuy)
1 Buy
2 Buy
3 NoBuy
4 Buy
Final prediction, on the basis of majority vote: Buy
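The majority vote itself is one line with `collections.Counter`:

```python
from collections import Counter

votes = ["Buy", "Buy", "NoBuy", "Buy"]      # one prediction per tree
print(Counter(votes).most_common(1)[0][0])  # Buy
```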