A decision tree classifier is explained. Key points include:
- Nodes test attribute values, edges correspond to test outcomes, and leaves predict the class.
- Information gain measures how much a variable contributes to the classification.
- It is used to select the variable that best splits the data at each node; the variable with the highest information gain is used to split the root node.
- An example calculates the information gain of the road type, obstruction, and speed limit variables to classify car speed. Speed limit has the highest information gain (1) and is used to build the decision tree.
2. Introduction
A decision tree consists of:
• Nodes: test the value of a certain attribute
• Edges: correspond to the outcomes of a test and connect to the next node or leaf
• Leaves: terminal nodes that predict the outcome
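These three parts map naturally onto a small data structure. Below is a minimal Python sketch (the class and function names are ours, not from the source): a Node tests one attribute, its edges map each test outcome to a child, and a Leaf stores the predicted class.

from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Leaf:
    prediction: str          # class label predicted at this terminal node

@dataclass
class Node:
    attribute: str                              # attribute tested at this node
    children: Dict[str, Union["Node", "Leaf"]]  # edge label -> next node or leaf

def classify(tree, example):
    """Follow the edges matching the example's attribute values until a leaf."""
    while isinstance(tree, Node):
        tree = tree.children[example[tree.attribute]]
    return tree.prediction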
5. What Is Information Gain?
Information Gain (IG) is the most significant measure used to build a Decision Tree. It indicates how much “information” a particular feature/variable gives us about the final outcome.
Information Gain is important because it is used to choose the variable that best splits the data at each node of a Decision Tree. The variable with the highest IG is used to split the data at the root node.
Equation for Information Gain (IG):
Information gain = Entropy(parent) – [weighted average] Entropy(children)
Entropy: Entropy is the uncertainty in our dataset, or a measure of disorder:
Entropy = – Σ pᵢ log2(pᵢ), where pᵢ is the fraction of samples belonging to class i
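A minimal Python sketch of these two formulas, used in the worked example below (the function names are ours, not from the source):

from collections import Counter
from math import log2

def entropy(labels):
    """Entropy = - sum of p_i * log2(p_i) over the class proportions."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    """IG = Entropy(parent) - weighted average of Entropy(children)."""
    total = len(parent_labels)
    weighted = sum(len(g) / total * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted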
6. An Example to Understand Information Gain
Create a Decision Tree that classifies the speed of a car (response variable) as either slow or fast, depending on the following predictor variables:
• Road type
• Obstruction
• Speed limit
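The slides reference a small data table that is not reproduced in the text. The sketch below reconstructs a four-row dataset that is consistent with every split size and IG value quoted later; the specific attribute values (‘steep’, ‘flat’, ‘yes’, ‘no’) are our assumptions, not from the source.

# Hypothetical reconstruction of the four observations; the attribute values
# are assumed, but the split sizes and class labels match the slides below.
dataset = [
    {"road_type": "steep", "obstruction": "yes", "speed_limit": "yes", "speed": "slow"},
    {"road_type": "steep", "obstruction": "no",  "speed_limit": "yes", "speed": "slow"},
    {"road_type": "flat",  "obstruction": "yes", "speed_limit": "no",  "speed": "fast"},
    {"road_type": "steep", "obstruction": "no",  "speed_limit": "no",  "speed": "fast"},
]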
7. • We calculate the Entropy and Information Gain (IG) for each of the predictor variables, starting with ‘Road type’.
• In our data set, there are four observations in the ‘Road type’ column that correspond to four labels in the ‘Speed of car’ column.
• We shall begin by calculating the entropy of the parent node (Speed of car).
• Step one is to find the fraction of the two classes present in the parent node. There are a total of four values in the parent node, of which two samples belong to the ‘slow’ class and the other two belong to the ‘fast’ class, therefore:
• P(slow) -> fraction of ‘slow’ outcomes in the parent node = 2/4 = 0.5
• P(fast) -> fraction of ‘fast’ outcomes in the parent node = 2/4 = 0.5
8. Entropy(parent) = – {0.5 log2(0.5) + 0.5 log2(0.5)} = – {–0.5 + (–0.5)} = 1
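A quick check with the entropy helper defined above, using the reconstructed dataset:

parent = [row["speed"] for row in dataset]
print(entropy(parent))  # 1.0 -- two 'slow' and two 'fast' labels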
9. • Now that we know that the entropy of the parent node is 1, let’s see how to calculate the Information Gain for the ‘Road type’ variable. Remember: only if the Information Gain of the ‘Road type’ variable is greater than the Information Gain of all the other predictor variables can the root node be split using the ‘Road type’ variable.
• In order to calculate the Information Gain of the ‘Road type’ variable, we first need to split the root node by the ‘Road type’ variable, as sketched below.
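Splitting a node by a variable simply groups the rows by that variable’s value, one child node per value. A small sketch, continuing from the snippets above:

from collections import defaultdict

def split_by(rows, attribute):
    """Group the rows into child nodes, one per value of the attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups

children = split_by(dataset, "road_type")
# Two child nodes: 'steep' with 3 rows, 'flat' with 1 row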
10. • Once we’ve split the parent node using the ‘Road type’ variable, the child nodes hold the corresponding responses shown in the data set. Now, we need to measure the entropy of the child nodes.
• The entropy of the right-hand child node (fast) is 0, because all of the outcomes in this node belong to one class (fast). In a similar manner, we must find the entropy of the left-hand child node (slow, slow, fast).
• In this node there are two types of outcomes (fast and slow), so we first need to calculate the fraction of slow and fast outcomes for this particular node.
P(slow) = 2/3 = 0.667
P(fast) = 1/3 = 0.333
Therefore, the entropy is:
Entropy(left child node) = – {0.667 log2(0.667) + 0.333 log2(0.333)} = – {–0.39 + (–0.53)}
= 0.92 ≈ 0.9
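The same value from the entropy helper above (the slide rounds it to 0.9):

left = [row["speed"] for row in children["steep"]]
print(entropy(left))  # ~0.918 -- labels are ['slow', 'slow', 'fast']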
11. • Our next step is to calculate Entropy(children) as a weighted average:
• Total number of outcomes in parent node: 4
• Total number of outcomes in left child node: 3
• Total number of outcomes in right child node: 1
• Formula for Entropy(children) with weighted avg.:
[Weighted avg] Entropy(children) = (no. of outcomes in left child node / total no. of outcomes in parent node) × Entropy(left node) + (no. of outcomes in right child node / total no. of outcomes in parent node) × Entropy(right node)
• Substituting the values: Entropy(children) = (3/4) × 0.9 + (1/4) × 0 = 0.675
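The same weighted average in code. Note that the slides round Entropy(left) to 0.9 first, which gives 0.675; with the unrounded entropy the value is about 0.689.

right = [row["speed"] for row in children["flat"]]
weighted = (len(left) / 4) * entropy(left) + (len(right) / 4) * entropy(right)
print(weighted)  # ~0.689 (0.675 in the slides, after rounding)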
12. Our final step is to substitute the above weighted average into the IG formula in order to calculate the final IG of the ‘Road type’ variable. Therefore:
Information gain(Road type) = 1 – 0.675 = 0.325
The information gain of the ‘Road type’ feature is 0.325.
13. • The Decision Tree Algorithm selects the variable with the highest Information Gain to split the Decision Tree. Therefore, using the above method, you need to calculate the Information Gain for all the predictor variables to check which variable has the highest IG.
• Using the above methodology, you get the following values for each predictor variable:
• Information gain(Road type) = 1 – 0.675 = 0.325
• Information gain(Obstruction) = 1 – 1 = 0
• Information gain(Speed limit) = 1 – 0 = 1
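Repeating the calculation for all three predictors on the reconstructed dataset, with the helpers defined earlier (entropy is left unrounded here, so Road type comes out as ~0.311 rather than the slides’ rounded 0.325):

for attr in ("road_type", "obstruction", "speed_limit"):
    groups = split_by(dataset, attr)
    child_labels = [[r["speed"] for r in rows] for rows in groups.values()]
    print(attr, round(information_gain(parent, child_labels), 3))
# road_type 0.311, obstruction 0.0, speed_limit 1.0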
14. Here we can see that the ‘Speed limit’ variable has the highest Information Gain. Therefore, the final Decision Tree for this dataset is built using the ‘Speed limit’ variable.
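Because ‘Speed limit’ separates the classes perfectly (IG = 1), the finished tree is a single test on that variable. A sketch using the Node and Leaf classes from the introduction, with the assumed yes/no attribute values:

tree = Node("speed_limit", {"yes": Leaf("slow"), "no": Leaf("fast")})

for row in dataset:
    assert classify(tree, row) == row["speed"]  # all four rows classified correctly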