Issues in Decision Tree Learning
Issues in Decision Tree Learning, by Ravindra Singh Kushwaha, B.Tech(IT) 2017-21, Chaudhary Charan Singh University, Meerut
1. Machine Learning
Submitted To
Neelam Ma’am
Assistant Prof.
SCRIET, Meerut
Submitted By
Ravindra Singh Kushwaha
B.Tech(IT) 8th Sem
SCRIET, Meerut
Issues in Decision Tree Learning
2. Issues in Decision Tree Learning
• Overfitting
• Incorporating Continuous-valued attributes
• Attributes with many values
• Handling attributes with costs
• Handling examples with missing attribute values
3. Overfitting
• Consider a hypothesis h and its error over
• Training data: error_train(h)
• Entire distribution D of data: error_D(h)
• The hypothesis h ∈ H overfits the training data if there is an
alternative hypothesis h’ ∈ H such that
• error_train(h) < error_train(h’), AND
• error_D(h) > error_D(h’)
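The definition above can be checked directly in code. A minimal sketch, where the error values are made up purely for illustration:

```python
# Minimal sketch of the overfitting definition: h overfits if it beats
# the alternative h' on the training sample but loses on the whole
# distribution D. The error numbers below are illustrative only.

def overfits(err_train_h, err_train_h2, err_d_h, err_d_h2):
    """True when h overfits relative to alternative hypothesis h'."""
    return err_train_h < err_train_h2 and err_d_h > err_d_h2

# h fits the training sample perfectly (e.g., by memorizing noise)
# but generalizes worse than the simpler alternative h'.
print(overfits(0.00, 0.05, 0.20, 0.10))   # True: h overfits
```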
5. Avoiding Overfitting
• Causes
1. The training data contains errors or noise.
2. Small numbers of examples are associated with leaf nodes.
• Avoiding Overfitting
1. Stop growing when data split not statistically significant
2. Grow full tree, then post-prune it.
• Selecting Best Tree
1. Measure performance over training data
2. Measure performance over separate validation data
6. Reduced-Error Pruning
• Split data into training and validation sets
• Do until further pruning is harmful
1. Evaluate impact of pruning each possible node on
validation set
2. Greedily remove the one that most improves the validation
set accuracy
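The two steps above can be sketched as a greedy loop over the internal nodes of a small tree. The `Node` class and the tiny demo data are assumptions for illustration, not from the slides; pruning continues as long as validation accuracy does not drop.

```python
# Hedged sketch of Reduced-Error Pruning on a toy discrete-attribute
# tree. Node, the example encoding, and the demo data are illustrative.
from collections import Counter

class Node:
    def __init__(self, attr=None, children=None, label=None):
        self.attr = attr                 # attribute tested at this node
        self.children = children or {}   # attribute value -> subtree
        self.label = label               # class label (leaves only)

    def classify(self, x):
        if self.label is not None:
            return self.label
        return self.children[x[self.attr]].classify(x)

def leaf_labels(node):
    if node.label is not None:
        yield node.label
    else:
        for child in node.children.values():
            yield from leaf_labels(child)

def internal_nodes(node):
    if node.label is None:
        yield node
        for child in node.children.values():
            yield from internal_nodes(child)

def accuracy(tree, data):
    return sum(tree.classify(x) == y for x, y in data) / len(data)

def reduced_error_prune(tree, val_data):
    while True:
        base = accuracy(tree, val_data)
        best, best_acc = None, base
        for node in list(internal_nodes(tree)):
            majority = Counter(leaf_labels(node)).most_common(1)[0][0]
            saved_attr, saved_children = node.attr, node.children
            node.attr, node.children, node.label = None, {}, majority  # try pruning
            acc = accuracy(tree, val_data)
            node.attr, node.children, node.label = saved_attr, saved_children, None  # undo
            if acc >= best_acc:          # keep the prune that helps most
                best, best_acc = node, acc
        if best is None:                 # any further pruning is harmful
            break
        best.label = Counter(leaf_labels(best)).most_common(1)[0][0]
        best.attr, best.children = None, {}

# Demo: the subtree testing 'b' misclassifies one validation example,
# so pruning it (and then the now-redundant root) raises accuracy.
tree = Node(attr='a', children={
    0: Node(label='x'),
    1: Node(attr='b', children={0: Node(label='x'), 1: Node(label='y')})})
val = [({'a': 0}, 'x'), ({'a': 1, 'b': 0}, 'x'), ({'a': 1, 'b': 1}, 'x')]
reduced_error_prune(tree, val)
print(accuracy(tree, val))   # 1.0 after pruning
```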
8. Rule Post-Pruning
• The major drawback of Reduced-Error Pruning is that when
data is limited, holding out a validation set further reduces
the number of examples available for training.
Rule Post-Pruning avoids this:
• Convert tree to equivalent set of rules
• Prune each rule independently of others
• Sort final rules into desired sequence for use
9. Converting a tree to rules
IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
IF (Outlook = Sunny) ∧ (Humidity = Normal)
THEN PlayTennis = Yes
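The conversion is one rule per root-to-leaf path: the tests along the path become the rule's preconditions and the leaf becomes its conclusion. A minimal sketch for the PlayTennis subtree on this slide, where the nested-tuple tree encoding is an assumption for illustration:

```python
# Hedged sketch: convert a decision (sub)tree into IF-THEN rules,
# one rule per root-to-leaf path. The tree encoding is illustrative:
# an internal node is (attribute, {value: subtree}), a leaf is a label.

tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
})

def to_rules(node, path=()):
    if isinstance(node, str):            # leaf: emit the accumulated path
        yield (path, node)
        return
    attr, branches = node
    for value, child in branches.items():
        yield from to_rules(child, path + ((attr, value),))

rules = list(to_rules(tree))
for preconditions, conclusion in rules:
    cond = " AND ".join(f"({a} = {v})" for a, v in preconditions)
    print(f"IF {cond} THEN PlayTennis = {conclusion}")
```

Once in this form, each rule can be pruned independently (dropping preconditions that do not hurt its estimated accuracy) without affecting the other paths of the tree.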
11. Attributes with many values
• Problem:
• If an attribute has many distinct values, Gain will favor it even
when it generalizes poorly
• Example – a Date attribute, which splits the data into tiny subsets
• One approach – Gain Ratio, which penalizes attributes that split S into many small subsets:
GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
SplitInformation(S, A) = −Σi (|Si| / |S|) log2(|Si| / |S|)
where Si is the subset of S for which A has value vi
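The penalty is easy to see numerically. A minimal sketch (the subset counts are made up for illustration), computing SplitInformation in its standard form, −Σ (|Si|/|S|) log2(|Si|/|S|):

```python
# Hedged sketch of the Gain Ratio penalty. value_counts holds |Si|
# for each value vi of attribute A; the example counts are illustrative.
import math

def split_information(value_counts):
    """SplitInformation(S, A) = -sum (|Si|/|S|) * log2(|Si|/|S|)."""
    total = sum(value_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in value_counts if c)

def gain_ratio(gain, value_counts):
    return gain / split_information(value_counts)

# A Date-like attribute splitting 8 examples into 8 singletons has
# SplitInformation = log2(8) = 3, so its gain is divided by 3,
# while an even binary split is divided only by 1.
print(split_information([1] * 8))   # 3.0
print(split_information([4, 4]))    # 1.0
```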
12. Attributes with costs
• Problem:
• Medical diagnosis, BloodTest has cost $150
• Robotics, Width_from_1ft has cost 23 sec
• One approach – replace Gain with a cost-sensitive measure
• Tan and Schlimmer (1990): Gain²(S, A) / Cost(A)
• Nunez (1988): (2^Gain(S, A) − 1) / (Cost(A) + 1)^w
• where w ∈ [0, 1] is a constant that determines the relative importance of cost versus information
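Both measures, in their usual forms, are one-liners; a minimal sketch with made-up gain and cost values for illustration:

```python
# Hedged sketch of the two cost-sensitive attribute measures named
# above, in their standard forms. The gain/cost numbers are illustrative.

def tan_schlimmer(gain, cost):
    """Tan and Schlimmer (1990): Gain^2 / Cost."""
    return gain ** 2 / cost

def nunez(gain, cost, w):
    """Nunez (1988): (2^Gain - 1) / (Cost + 1)^w, with w in [0, 1]."""
    return (2 ** gain - 1) / (cost + 1) ** w

# With w = 0 cost is ignored entirely; with w = 1 it matters fully.
print(nunez(gain=1.0, cost=3.0, w=0.0))   # 1.0
print(nunez(gain=1.0, cost=3.0, w=1.0))   # 0.25
```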
13. Examples with missing attribute values
• What if some examples are missing values of attribute A?
• Use the training example anyway, and sort it through the tree:
1. If node n tests A, assign the example the most common value of A
among the examples at node n
2. Or assign a probability pi to each possible value vi of A, and
send a fraction pi of the example down each corresponding branch
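The fractional strategy can be sketched in a few lines: each pi is just the relative frequency of value vi among the examples at node n. The counts below are made up for illustration:

```python
# Hedged sketch of the fractional-example strategy for a missing
# value of attribute A: split the example among A's branches in
# proportion to the value frequencies at node n. Counts are illustrative.

def fractional_split(value_counts, weight=1.0):
    """Return the weight p_i of the example sent down each branch v_i."""
    total = sum(value_counts.values())
    return {v: weight * c / total for v, c in value_counts.items()}

# At node n, 6 examples with known Outlook split 3/2/1, so an example
# missing Outlook is sent down Sunny with weight 3/6 = 0.5, etc.
counts = {"Sunny": 3, "Overcast": 2, "Rain": 1}
print(fractional_split(counts))
```

The same fractional weights can be reused recursively: a fragment of weight pi that reaches another node testing a missing attribute is split again.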
14. Some of the latest Applications
Gesture Recognition
Motion Detection
Xbox 360 Kinect