DECISION TREE
• Rabia Rehman
DECISION TREE
• A decision tree is a flowchart-like tree structure made up of a root node, branches, and leaf nodes.
• Each internal (non-leaf) node denotes a test on an attribute, each branch denotes an outcome of that test, and each leaf (terminal) node holds a class label.
• The topmost node in the tree is the root node.
• It is a supervised machine learning algorithm.
• A leaf node represents a homogeneous result (all examples belong to one class), so it requires no further classification testing.
Common terms used with Decision Tree
Root Node: Represents the entire population or sample; it is further divided into two or more homogeneous sets.
Splitting: The process of dividing a node into two or more sub-nodes.
Decision Node: Specifies a test on a single attribute.
Leaf/Terminal Node: Indicates the value of the target attribute.
Pruning: Removing sub-nodes of a decision node; it can be seen as the opposite of splitting.
Arc/Edge: A path leading out of an attribute node, one per outcome of its test.
Path: A conjunction of tests, from the root to a leaf, that yields the final decision.
• Decision trees classify instances (examples) by starting at the root of the tree and moving down through it until a leaf node is reached.
• The following decision tree is for the concept buy_computer; it indicates whether a customer at a company is likely to buy a computer or not.
• Each internal node represents a test on an attribute.
• Each leaf node represents a class.
WHY ARE DECISION TREE CLASSIFIERS SO POPULAR?
• They can handle multidimensional data.
• They require less data cleaning than some other modeling techniques and are, to a fair degree, not influenced by outliers and missing values.
• They work with both categorical and continuous input and output variables.
• The learning and classification steps of a decision tree are simple and fast.
• They perform classification without much computation.
• Constructing a decision tree classifier does not require any domain knowledge or parameter setting.
TYPES OF DECISION TREE
• There are two main types of decision tree algorithm:
1. CART (Classification And Regression Tree)
• Uses the Gini index as its splitting measure
2. ID3 (Iterative Dichotomiser 3)
• Uses entropy, the average entropy of attributes, and information gain
ID3 ALGORITHM
• During the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser 3).
• ID3 is a classification algorithm that follows a greedy approach to building a decision tree: at each step it selects the attribute that yields the maximum information gain (equivalently, the minimum average entropy).
• There is no backtracking in this algorithm; the tree is constructed in a top-down, recursive, divide-and-conquer manner.
• It uses three functions: entropy, average entropy, and information gain.
WHAT IS ENTROPY?
• A measure of the homogeneity (or uncertainty) of a set of examples.
• Given a set T of positive and negative examples of some target concept (a 2-class problem), the entropy of T relative to this binary classification is:
E(T) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
p = number of positive examples of the target attribute
n = number of negative examples of the target attribute
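A minimal Python sketch of this formula (not from the original slides; the function name and the convention 0 * log2(0) = 0 are assumptions made here):

```python
import math

def entropy(p, n):
    """Entropy of a set containing p positive and n negative examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:                      # treat 0 * log2(0) as 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result

print(entropy(9, 5))   # ~0.940, the entropy of the Play Tennis dataset used below
```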
WHAT IS AVERAGE ENTROPY?
• The weighted average of the entropies of the subsets obtained by splitting on each value of an attribute (its "sub-attributes").
• AvgEntropy(A) = Σv ((pv + nv) / (p + n)) * E(Tv), summed over the values v of attribute A, where Tv is the subset of examples with A = v and pv, nv are its class counts.
WHAT IS INFORMATION GAIN?
• Information gain measures the expected reduction in entropy (uncertainty) obtained by splitting on an attribute.
IG(A) = E(T) - AvgEntropy(A)
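Continuing the sketch above (these helpers reuse the entropy function just defined; the counts passed in below come from the Outlook split worked out later):

```python
def avg_entropy(partitions):
    """Weighted average entropy; partitions is a list of (p, n) counts, one per attribute value."""
    total = sum(p + n for p, n in partitions)
    return sum(((p + n) / total) * entropy(p, n) for p, n in partitions)

def information_gain(p, n, partitions):
    """Entropy of the whole set minus the average entropy after the split."""
    return entropy(p, n) - avg_entropy(partitions)

# Outlook splits the 14 examples into sunny (2 yes, 3 no), rain (3, 2), overcast (4, 0):
print(avg_entropy([(2, 3), (3, 2), (4, 0)]))             # ~0.693
print(information_gain(9, 5, [(2, 3), (3, 2), (4, 0)]))  # ~0.247
```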
THE PROCESS
1. Calculate the entropy of the dataset.
2. For each attribute/feature:
• Calculate the entropy for each of its categorical values.
• Calculate the information gain for the feature.
3. Split on the feature with the maximum information gain.
4. Repeat on each resulting subset until the desired tree is obtained.
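Put together, the process corresponds to a short recursive procedure. The sketch below illustrates these four steps; it is not the exact code behind the slides, and it assumes rows are dictionaries with `target` naming the class column.

```python
import math
from collections import Counter

def set_entropy(rows, target):
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def avg_entropy_of(rows, attr, target):
    total = len(rows)
    result = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row for row in rows if row[attr] == value]
        result += (len(subset) / total) * set_entropy(subset, target)
    return result

def id3(rows, attributes, target):
    classes = set(row[target] for row in rows)
    if len(classes) == 1:                    # pure node -> leaf (stop condition of step 4)
        return classes.pop()
    if not attributes:                       # no attributes left -> majority class
        return Counter(row[target] for row in rows).most_common(1)[0][0]
    base = set_entropy(rows, target)         # step 1
    best = max(attributes,                   # steps 2-3: pick maximum information gain
               key=lambda a: base - avg_entropy_of(rows, a, target))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)   # step 4: recurse
    return tree
```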
CONSIDER THE DATASET BELOW. THE COLUMN “PLAY TENNIS” IS THE TARGET ATTRIBUTE (T), AND EACH EXAMPLE IS A DAY DESCRIBED BY THE ATTRIBUTES OUTLOOK, TEMPERATURE, HUMIDITY, AND WIND. WE WANT TO KNOW WHICH DAYS ARE GOOD FOR PLAYING TENNIS.
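The data table itself did not survive this export. The per-value class counts used in the steps below match the standard 14-day Play Tennis dataset (Quinlan / Mitchell); the rows listed here are a reconstruction consistent with those counts, not a copy of the original slide.

```python
# Columns: Outlook, Temperature, Humidity, Wind, Play Tennis (the target attribute).
play_tennis = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]

# Sanity check against Step 1 below: 9 "Yes" and 5 "No" examples overall.
labels = [row[-1] for row in play_tennis]
print(labels.count("Yes"), labels.count("No"))   # 9 5
```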
STEP 1: CALCULATE ENTROPY FOR THE DATASET
• Choose the column “Play Tennis” as the Target Attribute (T).
• The dataset has two classes (yes and no): 9 of the 14 examples are "yes" and 5 of the 14 are "no".
• We treat Yes as Positive (p) and No as Negative (n).
• The entropy of the complete dataset (target value) is:
E(T) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))
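Substituting p = 9 and n = 5 (this intermediate step is not shown on the slide but is used as 0.94 in the gains below):
E(T) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940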
STEP 2: CALCULATE THE ENTROPY OF EACH ATTRIBUTE FOR ALL ITS CATEGORICAL VALUES
• First Attribute – Outlook
• Categorical values: sunny, overcast, rain
Sunny: p(Yes)=2, n(No)=3; Rain: p(Yes)=3, n(No)=2; Overcast: p(Yes)=4, n(No)=0
• E(Outlook=sunny) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.971
• E(Outlook=rain) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) = 0.971
• E(Outlook=overcast) = -(4/4)*log2(4/4) - 0 = 0
• AvgEntropy(Outlook) = p(sunny)*E(Outlook=sunny) + p(rain)*E(Outlook=rain) + p(overcast)*E(Outlook=overcast)
= (5/14)*0.971 + (5/14)*0.971 + (4/14)*0 = 0.693
• Information Gain(Outlook) = E(T) - AvgEntropy(Outlook) = 0.94 - 0.693 = 0.247
• Second Attribute – Temperature
• Categorical values: hot, mild, cool
Hot: p(Yes)=2, n(No)=2; Mild: p(Yes)=4, n(No)=2; Cool: p(Yes)=3, n(No)=1
• E(Temperature=hot) = -(2/4)*log2(2/4) - (2/4)*log2(2/4) = 1
• E(Temperature=cool) = -(3/4)*log2(3/4) - (1/4)*log2(1/4) = 0.811
• E(Temperature=mild) = -(4/6)*log2(4/6) - (2/6)*log2(2/6) = 0.9179
• AvgEntropy(Temperature) = p(hot)*E(Temperature=hot) + p(mild)*E(Temperature=mild) + p(cool)*E(Temperature=cool)
= (4/14)*1 + (6/14)*0.9179 + (4/14)*0.811 = 0.9108
• Information Gain(Temperature) = E(T) - AvgEntropy(Temperature) = 0.94 - 0.9108 = 0.0292
• Third Attribute – Humidity
• Categorical values: high, normal
High: p(Yes)=3, n(No)=4; Normal: p(Yes)=6, n(No)=1
• E(Humidity=high) = -(3/7)*log2(3/7) - (4/7)*log2(4/7) = 0.983
• E(Humidity=normal) = -(6/7)*log2(6/7) - (1/7)*log2(1/7) = 0.591
• AvgEntropy(Humidity) = p(high)*E(Humidity=high) + p(normal)*E(Humidity=normal)
= (7/14)*0.983 + (7/14)*0.591 = 0.787
• Information Gain(Humidity) = E(T) - AvgEntropy(Humidity) = 0.94 - 0.787 = 0.153
• Fourth Attribute – Wind
• Categorical values: weak, strong
Weak: p(Yes)=6, n(No)=2; Strong: p(Yes)=3, n(No)=3
• E(Wind=weak) = -(6/8)*log2(6/8) - (2/8)*log2(2/8) = 0.811
• E(Wind=strong) = -(3/6)*log2(3/6) - (3/6)*log2(3/6) = 1
• AvgEntropy(Wind) = p(weak)*E(Wind=weak) + p(strong)*E(Wind=strong)
= (8/14)*0.811 + (6/14)*1 = 0.892
• Information Gain(Wind) = E(T) - AvgEntropy(Wind) = 0.94 - 0.892 = 0.048
Attribute      Gain
Outlook        0.247
Temperature    0.029
Humidity       0.152
Wind           0.048
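As a check, the whole table can be reproduced with the information_gain helper sketched earlier; the (p, n) pairs below are the per-value class counts from the slides above.

```python
splits = {
    "Outlook":     [(2, 3), (3, 2), (4, 0)],   # sunny, rain, overcast
    "Temperature": [(2, 2), (4, 2), (3, 1)],   # hot, mild, cool
    "Humidity":    [(3, 4), (6, 1)],           # high, normal
    "Wind":        [(6, 2), (3, 3)],           # weak, strong
}
for attr, parts in splits.items():
    print(attr, round(information_gain(9, 5, parts), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
```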
STEP 3: FIND THE FEATURE WITH MAXIMUM INFORMATION GAIN
• Here, the attribute with the maximum information gain is Outlook (0.247), so Outlook becomes the root of the decision tree built so far.
• When Outlook == overcast, the subset is a pure class (Yes). We now repeat the same procedure for the rows with Outlook = Sunny, and then for the rows with Outlook = Rain.
• Now, find the best attribute for splitting the data with Outlook = Sunny (2 yes, 3 no).
• E(Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971
REPEAT STEP 2
• First Attribute – Temperature
• Categorical values: hot, mild, cool
(Sunny, Hot): p(Yes)=0, n(No)=2; (Sunny, Mild): p(Yes)=1, n(No)=1; (Sunny, Cool): p(Yes)=1, n(No)=0
• E(Sunny, Temperature=hot) = -0 - (2/2)*log2(2/2) = 0
• E(Sunny, Temperature=cool) = -(1)*log2(1) - 0 = 0
• E(Sunny, Temperature=mild) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1
• AvgEntropy(Sunny, Temperature) = p(Sunny, hot)*E(Sunny, Temperature=hot) + p(Sunny, mild)*E(Sunny, Temperature=mild) + p(Sunny, cool)*E(Sunny, Temperature=cool)
= (2/5)*0 + (2/5)*1 + (1/5)*0 = 0.4
• Information Gain(Temperature) = E(Sunny) - AvgEntropy(Sunny, Temperature) = 0.971 - 0.4 = 0.571
• Second Attribute – Humidity
• Categorical values: high, normal
(Sunny, High): p(Yes)=0, n(No)=3; (Sunny, Normal): p(Yes)=2, n(No)=0
• E(Sunny, Humidity=high) = -0 - (3/3)*log2(3/3) = 0
• E(Sunny, Humidity=normal) = -(2/2)*log2(2/2) - 0 = 0
• AvgEntropy(Sunny, Humidity) = p(Sunny, high)*E(Sunny, Humidity=high) + p(Sunny, normal)*E(Sunny, Humidity=normal)
= (3/5)*0 + (2/5)*0 = 0
• Information Gain(Humidity) = E(Sunny) - AvgEntropy(Sunny, Humidity) = 0.971 - 0 = 0.971
• Third Attribute – Wind
• Categorical values: weak, strong
(Sunny, Weak): p(Yes)=1, n(No)=2; (Sunny, Strong): p(Yes)=1, n(No)=1
• E(Sunny, Wind=weak) = -(1/3)*log2(1/3) - (2/3)*log2(2/3) = 0.918
• E(Sunny, Wind=strong) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1
• AvgEntropy(Sunny, Wind) = p(Sunny, weak)*E(Sunny, Wind=weak) + p(Sunny, strong)*E(Sunny, Wind=strong)
= (3/5)*0.918 + (2/5)*1 = 0.9508
• Information Gain(Wind) = E(Sunny) - AvgEntropy(Sunny, Wind) = 0.971 - 0.9508 = 0.0202
Attribute      Gain
Temperature    0.571
Humidity       0.971
Wind           0.020
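The same helpers reproduce these subset gains; the (p, n) pairs are the Sunny-subset counts (2 yes, 3 no overall) from the slides above.

```python
sunny_splits = {
    "Temperature": [(0, 2), (1, 1), (1, 0)],   # hot, mild, cool
    "Humidity":    [(0, 3), (2, 0)],           # high, normal
    "Wind":        [(1, 2), (1, 1)],           # weak, strong
}
for attr, parts in sunny_splits.items():
    print(attr, round(information_gain(2, 3, parts), 3))
# Temperature 0.571, Humidity 0.971, Wind 0.02
```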
REPEAT STEP 3
• Here, the attribute with the maximum information gain is Humidity (0.971). So Humidity is chosen as the test under the Sunny branch, giving the decision tree built so far.
• Now, find the best attribute for splitting the data with Outlook = Rain (3 yes, 2 no).
• E(Rain) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.971
REPEAT STEP 2
• First Attribute – Temperature
• Categorical values: mild, cool, hot
(Rain, Mild): p(Yes)=2, n(No)=1; (Rain, Cool): p(Yes)=1, n(No)=1; (Rain, Hot): no examples
• E(Rain, Temperature=cool) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1
• E(Rain, Temperature=mild) = -(2/3)*log2(2/3) - (1/3)*log2(1/3) = 0.918
• E(Rain, Temperature=hot) = 0
• AvgEntropy(Rain, Temperature) = p(Rain, mild)*E(Rain, Temperature=mild) + p(Rain, cool)*E(Rain, Temperature=cool)
= (3/5)*0.918 + (2/5)*1 = 0.9508
• Information Gain(Temperature) = E(Rain) - AvgEntropy(Rain, Temperature) = 0.971 - 0.9508 = 0.0202
• Second Attribute – Wind
• Categorical values: weak, strong
(Rain, Weak): p(Yes)=3, n(No)=0; (Rain, Strong): p(Yes)=0, n(No)=2
• E(Rain, Wind=weak) = -(3/3)*log2(3/3) - 0 = 0
• E(Rain, Wind=strong) = -0 - (2/2)*log2(2/2) = 0
• AvgEntropy(Rain, Wind) = p(Rain, weak)*E(Rain, Wind=weak) + p(Rain, strong)*E(Rain, Wind=strong)
= (3/5)*0 + (2/5)*0 = 0
• Information Gain(Wind) = E(Rain) - AvgEntropy(Rain, Wind) = 0.971 - 0 = 0.971
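And the corresponding check for the Rain subset (3 yes, 2 no), again reusing the information_gain helper sketched earlier:

```python
rain_splits = {
    "Temperature": [(2, 1), (1, 1)],   # mild, cool (no Rain examples are hot)
    "Wind":        [(3, 0), (0, 2)],   # weak, strong
}
for attr, parts in rain_splits.items():
    print(attr, round(information_gain(3, 2, parts), 3))
# Temperature 0.02, Wind 0.971
```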
REPEAT STEP 3
• Here, the attribute with the maximum information gain is Wind (0.971). So Wind is chosen as the test under the Rain branch, completing the decision tree.
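The final tree image is not included in this export. Based on the splits derived above, it has the structure below; the nested-dict representation and the classify helper are illustrative, not taken from the slides.

```python
# Learned tree: attribute -> value -> subtree or class label.
decision_tree = {
    "Outlook": {
        "Overcast": "Yes",
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Rain":  {"Wind":     {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(tree, example):
    """Walk from the root, following the branch chosen by each tested attribute."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

print(classify(decision_tree, {"Outlook": "Sunny", "Humidity": "High"}))   # No
```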