Decision Tree
&
Random Forest Algorithm
Priagung Khusumanegara
Chonnam National University
Outline
• Introduction
• Example of Decision Tree
• Principles of Decision Tree
  – Entropy
  – Information gain
• Random Forest
The problem
• Given a set of training cases/objects and their attribute values, try to determine the target attribute value of new examples.
  – Classification
  – Prediction
Key Requirements
• Attribute-value description: each object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).
• Predefined classes (target values): the target function has discrete output values (boolean or multiclass).
• Sufficient data: enough training cases must be provided to learn the model.
A simple example
(The slide shows the Play Golf training data, with the attributes Outlook, Temp., Humidity and Windy and the target Play Golf.)
Principled Criterion
• Choosing the most useful attribute for classifying examples.
• Entropy
  – A measure of the homogeneity of the set of examples.
  – If the sample is completely homogeneous the entropy is zero; if the sample is equally divided the entropy is one.
• Information Gain
  – Measures how well a given attribute separates the training examples according to their target classification.
  – This measure is used to select among the candidate attributes at each step while growing the tree.
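For reference, the two criteria are usually written as below (the standard ID3 formulas; the notation is an assumption, not copied from the slides):

E(S) = -\sum_{i=1}^{c} p_i \log_2 p_i, where p_i is the proportion of examples in S belonging to class i.

Gain(S, A) = E(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} E(S_v), where S_v is the subset of S whose value of attribute A is v.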
Information Gain
Step 1 : Calculate entropy of the target
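As a worked example (assuming the classic Play Golf data with 14 cases, 9 "Yes" and 5 "No"):

E(PlayGolf) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.94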
Information Gain (Cont’d)
Step 2 : Calculate the information gain for each attribute.
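For example, splitting on Outlook (assuming the usual 5 Sunny, 4 Overcast and 5 Rainy cases of the Play Golf data):

Gain(PlayGolf, Outlook) = 0.94 - \frac{5}{14}(0.971) - \frac{4}{14}(0) - \frac{5}{14}(0.971) \approx 0.247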
Information Gain (Cont’d)
Step 3 : Choose the attribute with the largest information gain as the decision node.
Information Gain (Cont’d)
Step 4a : A branch with an entropy of 0 is a leaf node.
Information Gain (Cont’d)
Step 4b : A branch with an entropy greater than 0 needs further splitting.
Information Gain (Cont’d)
Step 5 : The algorithm is run recursively on the non-leaf branches until all the data is classified.
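Steps 1–5 can be summarized in a short ID3-style sketch in Python (an illustrative implementation, not code from the slides; each example is assumed to be a dict mapping attribute names to values, one of them being the target):

import math
from collections import Counter

def entropy(rows, target):
    # Step 1: entropy of the target attribute over a list of example dicts
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def info_gain(rows, attr, target):
    # Step 2: information gain of splitting the rows on attr
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row for row in rows if row[attr] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder

def build_tree(rows, attrs, target):
    classes = set(row[target] for row in rows)
    if len(classes) == 1 or not attrs:
        # Step 4a: a pure branch (entropy 0) becomes a leaf
        # (majority class if no attributes are left)
        return Counter(row[target] for row in rows).most_common(1)[0][0]
    # Step 3: the attribute with the largest information gain is the decision node
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        # Steps 4b and 5: branches with entropy > 0 are split again, recursively
        tree[best][value] = build_tree(subset, [a for a in attrs if a != best], target)
    return tree

Calling build_tree(examples, ["Outlook", "Temp.", "Humidity", "Windy"], "Play Golf") on the training cases would grow the same kind of tree as the one built step by step above (attribute names are assumed to match the Play Golf data).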
Random Forest
• Decision Tree : one tree
• Random Forest : more than one tree
Decision Tree & Random Forest
(Diagram: a single decision tree compared with a random forest made up of Tree 1, Tree 2 and Tree 3.)
Decision Tree
Query case:
Outlook | Temp. | Humidity | Windy | Play Golf
Rainy | Mild | High | False | ?
Result : No
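A tree produced by the sketch above can be applied to this query case roughly as follows (illustrative only; golf_tree is assumed to come from build_tree):

def classify(tree, case):
    # Walk the nested dict until a leaf label ("Yes"/"No") is reached
    while isinstance(tree, dict):
        attr = next(iter(tree))          # attribute tested at this node
        tree = tree[attr][case[attr]]    # follow the branch for the case's value
    return tree

query = {"Outlook": "Rainy", "Temp.": "Mild", "Humidity": "High", "Windy": False}
# classify(golf_tree, query)  ->  "No" for the tree shown on this slide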
Random Forest
(Diagram: the same query case is passed to Tree 1, Tree 2 and Tree 3, and their votes are combined.)
Tree 1 : No
Tree 2 : No
Tree 3 : Yes
Votes – Yes : 1, No : 2
Result : No
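The majority vote over the three trees can be sketched in a few lines of Python (the per-tree labels below are the ones shown on the slide):

from collections import Counter

tree_votes = ["No", "No", "Yes"]        # Tree 1, Tree 2, Tree 3
counts = Counter(tree_votes)            # Counter({'No': 2, 'Yes': 1})
result = counts.most_common(1)[0][0]    # majority vote
print(result)                           # -> No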
OOB Error Rate
• The out-of-bag (OOB) error rate can be used to get a running, unbiased estimate of the classification error as trees are added to the forest.
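A minimal scikit-learn sketch of reading the OOB estimate (a usage illustration rather than part of the slides; any labelled training data would do in place of the iris set):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True evaluates each tree on the cases left out of its
# bootstrap sample, i.e. the out-of-bag cases.
forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)
print("OOB error rate:", 1 - forest.oob_score_)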