Machine Learning
Submitted To
Neelam Ma'am
Assistant Prof.
SCRIET, Meerut
Submitted By
Ravindra Singh Kushwaha
B.Tech (IT), 8th Sem
SCRIET, Meerut
Issues in Decision Tree Learning
• Overfitting
• Incorporating Continuous-valued attributes
• Attributes with many values
• Handling attributes with costs
• Handling examples with missing attribute values
Overfitting
• Consider a hypothesis h and its error over
• the training data: error_train(h)
• the entire distribution D of data: error_D(h)
• The hypothesis h ∈ H overfits the training data if there is an
alternative hypothesis h' ∈ H such that
• error_train(h) < error_train(h') AND
• error_D(h) > error_D(h')
• In other words, h fits the training data better than h' but generalizes worse (see the sketch below)
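A minimal sketch of this behavior, assuming scikit-learn is available; the synthetic noisy dataset and the depth values are illustrative assumptions, not from the slides. A fully grown tree gets lower training error but higher held-out error than a depth-limited one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data so that a fully grown tree overfits.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None = grow the tree fully, 3 = restrict its depth
    h = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"error_train={1 - h.score(X_train, y_train):.3f}, "
          f"error_test={1 - h.score(X_test, y_test):.3f}")
```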
Overfitting in decision tree learning [figure omitted]
Avoiding Overfitting
• Causes
1. The training data contains errors or noise.
2. Small numbers of examples are associated with leaf nodes.
• Avoiding overfitting
1. Stop growing the tree when a data split is not statistically significant.
2. Grow the full tree, then post-prune it.
• Selecting the best tree
1. Measure performance over the training data.
2. Measure performance over a separate validation data set (a sketch of this follows).
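A minimal sketch of selecting tree size on a separate validation set, again assuming scikit-learn; the dataset, split size, and depth range are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, flip_y=0.1, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

best_depth, best_acc = None, 0.0
for depth in range(1, 15):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_train, y_train)
    acc = tree.score(X_val, y_val)   # measured on validation data, not training data
    if acc > best_acc:
        best_depth, best_acc = depth, acc

print("best depth chosen on the validation set:", best_depth, round(best_acc, 3))
```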
Reduced-Error Pruning
• Split the data into training and validation sets
• Do until further pruning is harmful:
1. Evaluate the impact of pruning each possible node on the validation set
2. Greedily remove the one that most improves validation-set accuracy (a minimal sketch follows)
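A minimal sketch of the greedy pruning loop. The nested-dict tree format, the tree itself, and the tiny validation set are purely illustrative assumptions:

```python
import copy

# A node is either a class label (leaf) or a dict holding the attribute tested,
# its branches, and the most common class among the training examples at that node.
tree = {"attr": "Outlook", "majority": "Yes", "branches": {
    "Sunny": {"attr": "Humidity", "majority": "No",
              "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": "Yes"}}

val_set = [({"Outlook": "Sunny", "Humidity": "High"}, "No"),
           ({"Outlook": "Sunny", "Humidity": "Normal"}, "Yes"),
           ({"Outlook": "Rain", "Humidity": "High"}, "Yes")]

def classify(node, x):
    while isinstance(node, dict):
        node = node["branches"].get(x[node["attr"]], node["majority"])
    return node

def accuracy(node, data):
    return sum(classify(node, x) == y for x, y in data) / len(data)

def internal_node_paths(node, path=()):
    """Yield the branch-value path leading to every internal (non-leaf) node."""
    if isinstance(node, dict):
        yield path
        for value, subtree in node["branches"].items():
            yield from internal_node_paths(subtree, path + (value,))

def pruned_at(root, path):
    """Copy the tree and replace the node at `path` with its majority-class leaf."""
    new = copy.deepcopy(root)
    if not path:
        return new["majority"]
    node = new
    for value in path[:-1]:
        node = node["branches"][value]
    node["branches"][path[-1]] = node["branches"][path[-1]]["majority"]
    return new

def reduced_error_pruning(root):
    while isinstance(root, dict):
        base = accuracy(root, val_set)
        candidates = [pruned_at(root, p) for p in internal_node_paths(root)]
        best = max(candidates, key=lambda t: accuracy(t, val_set))
        if accuracy(best, val_set) < base:   # further pruning would hurt validation accuracy
            break
        root = best                          # greedily keep the most helpful prune
    return root

print(reduced_error_pruning(tree))
```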
Effect of Reduced-Error Pruning [figure omitted]
Rule Post-Pruning
• The major drawback of Reduced-Error Pruning is that when data is limited,
holding out a validation set further reduces the number of examples
available for training.
Hence Rule Post-Pruning:
• Convert tree to equivalent set of rules
• Prune each rule independently of others
• Sort final rules into desired sequence for use
Converting a tree to rules
IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
IF (Outlook = Sunny) ∧ (Humidity = Normal)
THEN PlayTennis = Yes
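A minimal sketch of the conversion step, where each root-to-leaf path becomes one rule; the nested-dict tree is an illustrative assumption based on the PlayTennis example:

```python
tree = {"attr": "Outlook", "branches": {
    "Sunny": {"attr": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": "Yes"}}

def tree_to_rules(node, conditions=()):
    """Each root-to-leaf path becomes one rule, which can then be pruned independently."""
    if not isinstance(node, dict):
        yield conditions, node
        return
    for value, subtree in node["branches"].items():
        yield from tree_to_rules(subtree, conditions + ((node["attr"], value),))

for conditions, label in tree_to_rules(tree):
    body = " ∧ ".join(f"({attr} = {value})" for attr, value in conditions)
    print(f"IF {body} THEN PlayTennis = {label}")
```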
Continuous-Valued Attributes
• Create a new boolean (discrete-valued) attribute that tests the continuous
attribute against a threshold chosen to maximize information gain,
e.g. Temperature > 54 in the PlayTennis data
• So if Temperature = 75, the test is true and, following that branch,
we can infer that PlayTennis = Yes (threshold selection is sketched below)
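A minimal sketch of choosing the threshold, using the small Temperature sample from Mitchell's PlayTennis example; candidate thresholds are midpoints between adjacent examples whose labels differ:

```python
from math import log2

temps  = [40, 48, 60, 72, 80, 90]          # sorted by temperature
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

def entropy(ys):
    n = len(ys)
    return -sum((ys.count(c) / n) * log2(ys.count(c) / n) for c in set(ys))

def gain_for_threshold(c):
    below = [y for t, y in zip(temps, labels) if t <= c]
    above = [y for t, y in zip(temps, labels) if t > c]
    return (entropy(labels)
            - (len(below) / len(labels)) * entropy(below)
            - (len(above) / len(labels)) * entropy(above))

# Candidate thresholds: midpoints between adjacent examples whose labels differ.
candidates = [(t1 + t2) / 2
              for (t1, y1), (t2, y2) in zip(zip(temps, labels), zip(temps[1:], labels[1:]))
              if y1 != y2]
best = max(candidates, key=gain_for_threshold)
print("candidates:", candidates, "-> best threshold:", best)   # Temperature > 54
```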
Attributes with many values
• Problem:
• If an attribute has many values, Gain will tend to select it
• Example: a Date attribute, which has a distinct value for nearly every example
• One approach: Gain Ratio
• GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
• SplitInformation(S, A) = − Σi (|Si| / |S|) · log2(|Si| / |S|)
where Si is the subset of S for which attribute A has value vi
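A minimal sketch of the gain-ratio computation; the tiny label sets are illustrative assumptions that show how a many-valued split (such as Date) gets penalized:

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def split_information(subsets):
    n = sum(len(s) for s in subsets)
    return -sum((len(s) / n) * log2(len(s) / n) for s in subsets if s)

def gain_ratio(labels, subsets):
    gain = entropy(labels) - sum((len(s) / len(labels)) * entropy(s) for s in subsets)
    return gain / split_information(subsets)

labels      = ["Yes", "Yes", "No", "No"]
by_humidity = [["Yes", "Yes"], ["No", "No"]]        # two values: low split information
by_date     = [["Yes"], ["Yes"], ["No"], ["No"]]    # one value per example: high split information

print("gain ratio (Humidity):", gain_ratio(labels, by_humidity))   # 1.0
print("gain ratio (Date):    ", gain_ratio(labels, by_date))       # 0.5, penalized despite perfect gain
```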
Attributes with costs
• Problem:
• Medical diagnosis: BloodTest has a cost of $150
• Robotics: Width_from_1ft takes 23 seconds to measure
• One approach: replace Gain with a cost-sensitive measure
• Tan and Schlimmer (1990): Gain²(S, A) / Cost(A)
• Nunez (1988): (2^Gain(S, A) − 1) / (Cost(A) + 1)^w
• where w ∈ [0, 1] is a constant that determines the relative importance of
cost versus information gain
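A minimal sketch of the two cited measures; the gain, cost, and w values plugged in are illustrative assumptions only:

```python
def tan_schlimmer(gain, cost):
    # Tan and Schlimmer (1990): Gain^2(S, A) / Cost(A)
    return gain ** 2 / cost

def nunez(gain, cost, w):
    # Nunez (1988): (2^Gain(S, A) - 1) / (Cost(A) + 1)^w, with w in [0, 1]
    return (2 ** gain - 1) / (cost + 1) ** w

# Illustrative numbers only: a gain of 0.6 bits for the $150 BloodTest attribute.
print(tan_schlimmer(gain=0.6, cost=150))
print(nunez(gain=0.6, cost=150, w=0.5), "vs", nunez(gain=0.6, cost=150, w=0.0))
```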
Examples with missing attribute values
• What if some examples are missing values of attribute A?
• Use the training examples anyway and sort them through the tree:
• If node n tests A, assign the example the most common value of A among
the examples at node n
• Or assign a probability pi to each possible value vi of A and pass a
fraction pi of the example down each descendant branch (both options are sketched below)
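A minimal sketch of both options; the tiny set of attribute values at node n is an illustrative assumption:

```python
from collections import Counter

# Values of attribute A for the training examples at node n; None marks a missing value.
examples_at_node = ["High", "High", "Normal", None, "High"]
known = [v for v in examples_at_node if v is not None]

# Option 1: assign the most common value of A among the examples at node n.
most_common = Counter(known).most_common(1)[0][0]
filled = [v if v is not None else most_common for v in examples_at_node]
print("filled with most common value:", filled)

# Option 2: assign each value vi a probability pi and pass a fraction pi of the
# example down each branch (the fractional-example approach used by C4.5).
probabilities = {v: c / len(known) for v, c in Counter(known).items()}
print("fractions per branch:", probabilities)   # {'High': 0.75, 'Normal': 0.25}
```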
Some of the Latest Applications
• Gesture Recognition
• Motion Detection
• Xbox 360 Kinect
Thank You
