Machine Learning
Submitted To
Neelam Ma'am
Assistant Prof.
SCRIET, Meerut
Submitted By
Ravindra Singh Kushwaha
B.Tech (IT), 8th Sem
SCRIET, Meerut
Issues in Decision Tree Learning
• Overfitting
• Incorporating Continuous-valued attributes
• Attributes with many values
• Handling attributes with costs
• Handling examples with missing attribute values
Overfitting
• Consider a hypothesis h and its error over
• the training data: error_train(h)
• the entire distribution D of data: error_D(h)
• The hypothesis h ∈ H overfits the training data if there is an
alternative hypothesis h' ∈ H such that
• error_train(h) < error_train(h') AND
• error_D(h) > error_D(h')
• In other words, h fits the training data better than h' but generalizes worse (see the sketch below)
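A minimal sketch of this behavior, assuming scikit-learn is available; the synthetic noisy dataset and the depth values are illustrative assumptions, not from the slides. A fully grown tree gets lower training error but higher held-out error than a depth-limited one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data so that a fully grown tree overfits.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # None = grow the tree fully, 3 = restrict its depth
    h = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"error_train={1 - h.score(X_train, y_train):.3f}, "
          f"error_test={1 - h.score(X_test, y_test):.3f}")
```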
Overfitting in decision tree learning [figure omitted]
Avoiding Overfitting
• Causes
1. The training data contains errors or noise.
2. Small numbers of examples are associated with leaf nodes.
• Avoiding overfitting
1. Stop growing the tree when a data split is not statistically significant.
2. Grow the full tree, then post-prune it.
• Selecting the best tree
1. Measure performance over the training data.
2. Measure performance over a separate validation data set (a sketch of this follows).
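A minimal sketch of selecting tree size on a separate validation set, again assuming scikit-learn; the dataset, split size, and depth range are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, flip_y=0.1, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

best_depth, best_acc = None, 0.0
for depth in range(1, 15):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1).fit(X_train, y_train)
    acc = tree.score(X_val, y_val)   # measured on validation data, not training data
    if acc > best_acc:
        best_depth, best_acc = depth, acc

print("best depth chosen on the validation set:", best_depth, round(best_acc, 3))
```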
Reduced-Error Pruning
• Split the data into training and validation sets
• Do until further pruning is harmful:
1. Evaluate the impact of pruning each possible node on the validation set
2. Greedily remove the one that most improves validation-set accuracy (a minimal sketch follows)
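A minimal sketch of the greedy pruning loop. The nested-dict tree format, the tree itself, and the tiny validation set are purely illustrative assumptions:

```python
import copy

# A node is either a class label (leaf) or a dict holding the attribute tested,
# its branches, and the most common class among the training examples at that node.
tree = {"attr": "Outlook", "majority": "Yes", "branches": {
    "Sunny": {"attr": "Humidity", "majority": "No",
              "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": "Yes"}}

val_set = [({"Outlook": "Sunny", "Humidity": "High"}, "No"),
           ({"Outlook": "Sunny", "Humidity": "Normal"}, "Yes"),
           ({"Outlook": "Rain", "Humidity": "High"}, "Yes")]

def classify(node, x):
    while isinstance(node, dict):
        node = node["branches"].get(x[node["attr"]], node["majority"])
    return node

def accuracy(node, data):
    return sum(classify(node, x) == y for x, y in data) / len(data)

def internal_node_paths(node, path=()):
    """Yield the branch-value path leading to every internal (non-leaf) node."""
    if isinstance(node, dict):
        yield path
        for value, subtree in node["branches"].items():
            yield from internal_node_paths(subtree, path + (value,))

def pruned_at(root, path):
    """Copy the tree and replace the node at `path` with its majority-class leaf."""
    new = copy.deepcopy(root)
    if not path:
        return new["majority"]
    node = new
    for value in path[:-1]:
        node = node["branches"][value]
    node["branches"][path[-1]] = node["branches"][path[-1]]["majority"]
    return new

def reduced_error_pruning(root):
    while isinstance(root, dict):
        base = accuracy(root, val_set)
        candidates = [pruned_at(root, p) for p in internal_node_paths(root)]
        best = max(candidates, key=lambda t: accuracy(t, val_set))
        if accuracy(best, val_set) < base:   # further pruning would hurt validation accuracy
            break
        root = best                          # greedily keep the most helpful prune
    return root

print(reduced_error_pruning(tree))
```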
Effect of Reduced-Error Pruning [figure omitted]
Rule Post-Pruning
• The major drawback of Reduced-Error Pruning is that when data is limited,
holding out a validation set further reduces the number of examples
available for training.
Hence Rule Post-Pruning:
• Convert tree to equivalent set of rules
• Prune each rule independently of others
• Sort final rules into desired sequence for use
Converting a tree to rules
IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
IF (Outlook = Sunny) ∧ (Humidity = Normal)
THEN PlayTennis = Yes
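A minimal sketch of the conversion step, where each root-to-leaf path becomes one rule; the nested-dict tree is an illustrative assumption based on the PlayTennis example:

```python
tree = {"attr": "Outlook", "branches": {
    "Sunny": {"attr": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": "Yes"}}

def tree_to_rules(node, conditions=()):
    """Each root-to-leaf path becomes one rule, which can then be pruned independently."""
    if not isinstance(node, dict):
        yield conditions, node
        return
    for value, subtree in node["branches"].items():
        yield from tree_to_rules(subtree, conditions + ((node["attr"], value),))

for conditions, label in tree_to_rules(tree):
    body = " ∧ ".join(f"({attr} = {value})" for attr, value in conditions)
    print(f"IF {body} THEN PlayTennis = {label}")
```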
Continuous-Valued Attributes
• Create a new boolean (discrete-valued) attribute that tests the continuous
attribute against a threshold chosen to maximize information gain,
e.g. Temperature > 54 in the PlayTennis data
• So if Temperature = 75, the test is true and, following that branch,
we can infer that PlayTennis = Yes (threshold selection is sketched below)
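A minimal sketch of choosing the threshold, using the small Temperature sample from Mitchell's PlayTennis example; candidate thresholds are midpoints between adjacent examples whose labels differ:

```python
from math import log2

temps  = [40, 48, 60, 72, 80, 90]          # sorted by temperature
labels = ["No", "No", "Yes", "Yes", "Yes", "No"]

def entropy(ys):
    n = len(ys)
    return -sum((ys.count(c) / n) * log2(ys.count(c) / n) for c in set(ys))

def gain_for_threshold(c):
    below = [y for t, y in zip(temps, labels) if t <= c]
    above = [y for t, y in zip(temps, labels) if t > c]
    return (entropy(labels)
            - (len(below) / len(labels)) * entropy(below)
            - (len(above) / len(labels)) * entropy(above))

# Candidate thresholds: midpoints between adjacent examples whose labels differ.
candidates = [(t1 + t2) / 2
              for (t1, y1), (t2, y2) in zip(zip(temps, labels), zip(temps[1:], labels[1:]))
              if y1 != y2]
best = max(candidates, key=gain_for_threshold)
print("candidates:", candidates, "-> best threshold:", best)   # Temperature > 54
```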
Attributes with many values
• Problem:
• If an attribute has many values, Gain will tend to select it
• Example: a Date attribute, which has a distinct value for nearly every example
• One approach: Gain Ratio
• GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)
• SplitInformation(S, A) = − Σi (|Si| / |S|) · log2(|Si| / |S|)
where Si is the subset of S for which attribute A has value vi
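A minimal sketch of the gain-ratio computation; the tiny label sets are illustrative assumptions that show how a many-valued split (such as Date) gets penalized:

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def split_information(subsets):
    n = sum(len(s) for s in subsets)
    return -sum((len(s) / n) * log2(len(s) / n) for s in subsets if s)

def gain_ratio(labels, subsets):
    gain = entropy(labels) - sum((len(s) / len(labels)) * entropy(s) for s in subsets)
    return gain / split_information(subsets)

labels      = ["Yes", "Yes", "No", "No"]
by_humidity = [["Yes", "Yes"], ["No", "No"]]        # two values: low split information
by_date     = [["Yes"], ["Yes"], ["No"], ["No"]]    # one value per example: high split information

print("gain ratio (Humidity):", gain_ratio(labels, by_humidity))   # 1.0
print("gain ratio (Date):    ", gain_ratio(labels, by_date))       # 0.5, penalized despite perfect gain
```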
Attributes with costs
• Problem:
• Medical diagnosis: BloodTest has a cost of $150
• Robotics: Width_from_1ft takes 23 seconds to measure
• One approach: replace Gain with a cost-sensitive measure
• Tan and Schlimmer (1990): Gain²(S, A) / Cost(A)
• Nunez (1988): (2^Gain(S, A) − 1) / (Cost(A) + 1)^w
• where w ∈ [0, 1] is a constant that determines the relative importance of
cost versus information gain
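A minimal sketch of the two cited measures; the gain, cost, and w values plugged in are illustrative assumptions only:

```python
def tan_schlimmer(gain, cost):
    # Tan and Schlimmer (1990): Gain^2(S, A) / Cost(A)
    return gain ** 2 / cost

def nunez(gain, cost, w):
    # Nunez (1988): (2^Gain(S, A) - 1) / (Cost(A) + 1)^w, with w in [0, 1]
    return (2 ** gain - 1) / (cost + 1) ** w

# Illustrative numbers only: a gain of 0.6 bits for the $150 BloodTest attribute.
print(tan_schlimmer(gain=0.6, cost=150))
print(nunez(gain=0.6, cost=150, w=0.5), "vs", nunez(gain=0.6, cost=150, w=0.0))
```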
Examples with missing attribute values
• What if some examples are missing values of attribute A?
• Use the training examples anyway and sort them through the tree:
• If node n tests A, assign the example the most common value of A among
the examples at node n
• Or assign a probability pi to each possible value vi of A and pass a
fraction pi of the example down each descendant branch (both options are sketched below)
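A minimal sketch of both options; the tiny set of attribute values at node n is an illustrative assumption:

```python
from collections import Counter

# Values of attribute A for the training examples at node n; None marks a missing value.
examples_at_node = ["High", "High", "Normal", None, "High"]
known = [v for v in examples_at_node if v is not None]

# Option 1: assign the most common value of A among the examples at node n.
most_common = Counter(known).most_common(1)[0][0]
filled = [v if v is not None else most_common for v in examples_at_node]
print("filled with most common value:", filled)

# Option 2: assign each value vi a probability pi and pass a fraction pi of the
# example down each branch (the fractional-example approach used by C4.5).
probabilities = {v: c / len(known) for v, c in Counter(known).items()}
print("fractions per branch:", probabilities)   # {'High': 0.75, 'Normal': 0.25}
```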
Some of the Latest Applications
• Gesture Recognition
• Motion Detection
• Xbox 360 Kinect
Thank You
