Learning Decision Trees
• A Decision Tree is a tree-structured plan that specifies which attributes to test, and in what order, to predict the output.
• To decide which attribute should be tested first, simply find the one
with the highest information gain.
• Then recurse…
Decision Tree Learning Algorithm
Steps in the ID3 algorithm:
1. It begins with the original set S as the root node.
2. On each iteration, the algorithm goes through every unused attribute of the set S and calculates the Entropy (H) and Information Gain (IG) of that attribute.
3. It then selects the attribute with the smallest resulting entropy (equivalently, the largest information gain).
4. The set S is then split on the selected attribute to produce subsets of the data.
5. The algorithm continues to recurse on each subset, considering only attributes never selected before. (A runnable sketch of these steps follows.)
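A minimal Python sketch of these steps, assuming the dataset is a list of dicts with a categorical target stored under the key "label" (the representation and names are illustrative, not from the slides):

```python
import math
from collections import Counter

def entropy(rows, target="label"):
    """Shannon entropy H(S) of the target column, in bits."""
    total = len(rows)
    counts = Counter(row[target] for row in rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attr, target="label"):
    """IG(S, A) = H(S) - sum over values v of (|S_v|/|S|) * H(S_v)."""
    total = len(rows)
    remainder = 0.0
    for v in set(row[attr] for row in rows):
        subset = [r for r in rows if r[attr] == v]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target="label"):
    """Recursive ID3: returns a class label (leaf) or a {attr: {value: subtree}} node."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:              # pure subset -> leaf (step 5 base case)
        return labels[0]
    if not attributes:                     # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    # Steps 2-3: pick the unused attribute with the largest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    # Steps 4-5: split on it and recurse, excluding the chosen attribute.
    tree = {best: {}}
    for v in set(row[best] for row in rows):
        subset = [r for r in rows if r[best] == v]
        tree[best][v] = id3(subset, [a for a in attributes if a != best], target)
    return tree
```

On a toy weather dataset, id3(rows, ["Outlook", "Humidity", "Wind"]) would return a nested dict such as {"Outlook": {"Sunny": ..., "Rain": ...}}; the attribute names here are hypothetical.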
Different Decision Tree Learning Algorithms
The most notable decision tree algorithms are:
1. Iterative Dichotomiser 3 (ID3): This algorithm uses Information Gain to decide which attribute is used to classify the current subset of the data. At each level of the tree, information gain is calculated recursively for the remaining data.
2. C4.5: This algorithm is the successor of ID3. It uses either Information Gain or Gain Ratio to decide upon the classifying attribute (a sketch of the gain ratio follows this list). It is a direct improvement over ID3, as it can handle both continuous and missing attribute values.
3. Classification and Regression Tree (CART): A learning algorithm that can produce either a regression tree or a classification tree, depending on whether the dependent variable is continuous or categorical.
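A small sketch of C4.5's gain-ratio criterion, reusing the entropy and information_gain helpers from the ID3 sketch above (function names are illustrative; this shows only the selection measure, not full C4.5):

```python
import math
from collections import Counter

def split_information(rows, attr):
    """SplitInfo(S, A): entropy of the attribute's own value distribution."""
    total = len(rows)
    counts = Counter(row[attr] for row in rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain_ratio(rows, attr, target="label"):
    """GainRatio(S, A) = IG(S, A) / SplitInfo(S, A); penalizes many-valued attributes."""
    si = split_information(rows, attr)
    if si == 0.0:                      # attribute takes a single value in S
        return 0.0
    return information_gain(rows, attr, target) / si
```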
Node Selection Measures (the first four are defined below)
• Entropy
• Information Gain
• Gini Index
• Gain Ratio
• Reduction in Variance
• Chi-Square
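Standard definitions of the first four measures, with $p_k$ the proportion of class $k$ in a node $S$ (textbook formulas, not taken from the slides):

```latex
\begin{align*}
H(S) &= -\sum_{k} p_k \log_2 p_k && \text{(Entropy)}\\
\mathit{IG}(S, A) &= H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v) && \text{(Information Gain)}\\
\mathit{Gini}(S) &= 1 - \sum_{k} p_k^2 && \text{(Gini Index)}\\
\mathit{GainRatio}(S, A) &= \frac{\mathit{IG}(S, A)}{-\sum_{v} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}} && \text{(Gain Ratio)}
\end{align*}
```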
Decision Tree for Numeric Data
• In the dataset above there are 5 attributes, of which attribute E is the target feature; it contains 2 classes (Positive & Negative), in equal proportion.
• To apply the Gini Index to numeric data, we first choose threshold values with which to categorize each attribute (a threshold-based Gini computation is sketched below). The chosen values for this dataset are:
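A minimal sketch of evaluating candidate thresholds for one numeric attribute by weighted Gini impurity; the values, labels, and thresholds below are hypothetical, not the dataset from the slides:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum over classes k of p_k^2."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_for_threshold(values, labels, threshold):
    """Weighted Gini of splitting a numeric attribute at `threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    total = len(labels)
    return (len(left) / total) * gini(left) + (len(right) / total) * gini(right)

# Hypothetical example: evaluate each observed value as a candidate threshold.
values = [2.1, 3.5, 1.8, 4.0, 2.9, 3.3]
labels = ["Positive", "Negative", "Positive", "Negative", "Positive", "Negative"]
for t in sorted(set(values)):
    print(f"threshold {t}: weighted Gini = {gini_for_threshold(values, labels, t):.3f}")
```

The threshold with the lowest weighted Gini would be chosen as the split point for that attribute.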
Calculating Gini Index
