Decision Tree
Vijay Yadav (076MSCSK020)
Ravi Giri (076MSCSK009)
Sujit Kumar Jha (076MSCSK018)
Sujit Maharjan (076MSCSK019)
Introduction
• Decision trees are among the most powerful algorithms that fall
under the category of supervised learning.
• The two main entities of a tree are decision nodes, where
the data is split, and leaves, where the outcome is obtained.
• A decision tree can be used to visually and explicitly
represent decisions and decision making.
• They can be used for both classification and regression
tasks.
Types of decision tree
A. Classification decision trees −
• In this kind of decision tree, the decision variable is
categorical.
• It is used when the predicted outcome is the (discrete) class
to which the data belongs.
B. Regression decision trees −
• In this kind of decision tree, the decision variable is
continuous.
• It is used when the predicted outcome can be considered
a real number (e.g. the price of a house, or a patient's length
of stay in a hospital).
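As a quick illustration of the two variants, here is a minimal sketch using scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor; the toy data and parameter choices are assumptions made for this example, not part of the slides.

```python
# Illustrative sketch only: toy data and assumed parameters.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: the decision variable is categorical.
X_cls = [[25, 40_000], [35, 60_000], [45, 80_000], [20, 20_000]]    # e.g. [age, income]
y_cls = ["no", "yes", "yes", "no"]                                  # discrete class labels
clf = DecisionTreeClassifier(max_depth=2).fit(X_cls, y_cls)
print(clf.predict([[30, 55_000]]))          # -> a class label, e.g. ['yes']

# Regression tree: the decision variable is continuous.
X_reg = [[50], [80], [120], [200]]            # e.g. house size in m^2
y_reg = [100_000, 150_000, 220_000, 380_000]  # price (real-valued target)
reg = DecisionTreeRegressor(max_depth=2).fit(X_reg, y_reg)
print(reg.predict([[100]]))                 # -> a real-valued prediction
```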
Examples of decision trees (Example 1 and Example 2)
Important terminology
1. Root Node: The topmost node; its attribute is used for dividing the data
into two or more sets. The feature at this node is selected based on an
attribute selection technique.
2. Branch or Sub-Tree: A part of the entire decision tree is called a branch or
sub-tree.
3. Splitting: Dividing a node into two or more sub-nodes based on if-else
conditions.
4. Decision Node: A sub-node that is split further into sub-nodes is called a
decision node.
5. Leaf or Terminal Node: This is the end of the decision tree where it cannot
be split into further sub-nodes.
6. Pruning: Removing a sub-node from the tree is called pruning.
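The terminology above can be made concrete with a minimal node structure. The sketch below is an assumption for illustration only: a single class represents the root, decision nodes, and leaf (terminal) nodes.

```python
# Minimal sketch (assumed structure, not from the slides): one class covers
# decision nodes and leaves; the root is simply the topmost decision node.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None      # attribute tested at a decision node
    threshold: Optional[float] = None  # split condition (feature <= threshold)
    left: Optional["Node"] = None      # sub-tree (branch) where the condition holds
    right: Optional["Node"] = None     # sub-tree (branch) where it does not
    prediction: Optional[str] = None   # set only on leaf (terminal) nodes

    def is_leaf(self) -> bool:
        return self.prediction is not None

# Root node splitting on "age"; one child is a leaf, the other a further decision node.
tree = Node(feature="age", threshold=30,
            left=Node(prediction="no"),
            right=Node(feature="income", threshold=50_000,
                       left=Node(prediction="no"),
                       right=Node(prediction="yes")))
```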
Working of decision tree
• The root node feature is selected based on the results from the Attribute
Selection Measure(ASM).
• This splitting, guided by the ASM, is repeated until a leaf (terminal)
node is reached, i.e. a node that cannot be split into further sub-nodes.
Some decision tree algorithms
 ID3 (Iterative Dichotomizer 3)
 C4.5 (successor of ID3)
 CART (Classification And Regression Tree)
 Chi-square automatic interaction detection (CHAID). Performs
multi-level splits when computing classification trees.
 MARS: extends decision trees to handle numerical data better.
Attribute Selection Measures (ASM)
An Attribute Selection Measure is a technique used in the data mining process
for data reduction: it chooses, at each step, the attribute that best splits
the data, which supports better analysis and prediction of the target variable.
The two main ASM techniques are
1. Gini index
2. Information gain (used by ID3)
1. Gini Index (used by the CART algorithm)
The Gini index, or Gini impurity, measures the probability that a randomly
chosen object is wrongly classified if it is labelled at random according to
the class distribution in the node.
Mathematical formula:
Gini(S) = 1 − Σᵢ pᵢ²
where pᵢ is the probability of an object being classified into class i.
A Gini index of 0 means all objects in the node belong to a single class; the
index grows as the data becomes more equally distributed across the classes.
When we use the Gini index as the criterion for selecting the feature for the
root node, the feature with the least Gini index is selected.
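To make the formula concrete, here is a small sketch computing the Gini index of a set of class labels; the function name and example labels are assumptions for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class probabilities."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["yes", "yes", "yes"]))        # 0.0  -> pure node, single class
print(gini(["yes", "yes", "no", "no"]))   # 0.5  -> classes equally distributed
```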
2. Information gain (used by the ID3 and C4.5 algorithms)
Entropy is the basis of information theory. Entropy is a measure of
randomness; the smaller the entropy, the purer the node and the greater the
information content.
Mathematically,
E(S) = − Σᵢ pᵢ log₂(pᵢ)
where E(S) denotes the entropy of the set S and pᵢ denotes the probability of
class i. The information gain of an attribute A is the reduction in entropy
achieved by splitting S on A:
Gain(S, A) = E(S) − Σᵥ (|Sᵥ| / |S|) · E(Sᵥ)
where Sᵥ is the subset of S for which A takes the value v.
The feature or attribute with the highest information gain is used as the
root for the splitting.
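To make these formulas concrete, here is a small sketch (the helper names and toy data are assumptions) that computes the entropy of a label set and the information gain of splitting on one attribute.

```python
import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum p_i * log2(p_i) over the classes present in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain(S, A) = E(S) - sum_v |S_v|/|S| * E(S_v), splitting on attribute A."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Tiny made-up example: split on "outlook".
rows = [{"outlook": "sunny"}, {"outlook": "sunny"}, {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, "outlook"))   # 1.0 bit: a perfect split
```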
ID3 Algorithm
1. Create a root node for the tree.
2. If all examples are positive, make the node a positive leaf and stop.
3. If all examples are negative, make the node a negative leaf and stop.
4. Otherwise:
 Calculate the entropy and information gain of each attribute; the
attribute with the highest information gain (equivalently, the lowest
remaining entropy) is selected for the root or branch node.
 Partition the examples into subsets according to that attribute's values.
 Repeat on each subset until all examples are classified.
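The steps above can be sketched as a short recursive function. This is a hedged sketch, reusing the entropy/information_gain helpers and toy data from the previous sketch; it handles only categorical attributes and falls back to the majority label when no attributes remain.

```python
def id3(rows, labels, attributes):
    """Hedged sketch of the ID3 steps above; returns a nested dict as the tree."""
    # Steps 2-3: if all examples share one label, return a leaf with that label.
    if len(set(labels)) == 1:
        return labels[0]
    # If no attributes remain, fall back to the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 4: pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    # Partition the examples by the chosen attribute's values and recurse.
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        tree[best][value] = id3(list(sub_rows), list(sub_labels), remaining)
    return tree

print(id3(rows, labels, ["outlook"]))   # {'outlook': {'sunny': 'no', 'rain': 'yes'}}
```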
Advantages of decision tree
 It is simple to understand, as it follows the same process a
human follows when making a decision in real life.
 It can be very useful for solving decision-related problems.
 It helps to think about all the possible outcomes of a problem.
 It requires less data cleaning than many other algorithms.
Disadvantages of decision tree
 A decision tree can contain many layers, which makes it
complex.
 It may suffer from overfitting, which can be mitigated by
using the Random Forest algorithm.
 As the number of class labels grows, the computational
complexity of the decision tree may increase.
Steps for implementation of decision tree
in programming language
 Data Pre-processing step
 Fitting a Decision-Tree algorithm to the Training set
 Predicting the test result
 Test accuracy of the result
 Visualizing the test set result.
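These steps map naturally onto scikit-learn. The sketch below is illustrative only: the iris dataset, the 75/25 split, and the tree parameters are assumptions, not part of the slides.

```python
# Hedged sketch of the listed steps, using scikit-learn and an assumed dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# 1. Data pre-processing: load the data and split it into training and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. Fit a decision-tree algorithm to the training set (entropy = information gain criterion).
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# 3. Predict the test result.
y_pred = clf.predict(X_test)

# 4. Test the accuracy of the result.
print("Accuracy:", accuracy_score(y_test, y_pred))

# 5. Visualize the fitted tree.
plot_tree(clf, filled=True)
plt.show()
```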