Machine Learning
Submitted To
Neelam Ma'am
Assistant Prof.
SCRIET, Meerut
Submitted By
Ravindra Singh Kushwaha
B.Tech (IT), 8th sem
SCRIET, Meerut
Issues in Decision Tree Learning
Issues in Decision Tree Learning
• Overfitting
• Incorporating Continuous-valued attributes
• Attributes with many values
• Handling attributes with costs
• Handling examples with missing attribute values
Overfitting
• Consider the error of a hypothesis h over
• Training data: error_train(h)
• Entire distribution D of data: error_D(h)
• The hypothesis h ∈ H overfits the training data if there is an
alternative hypothesis h’ ∈ H such that
• error_train(h) < error_train(h’) AND
• error_D(h) > error_D(h’)
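The definition above can be written as a one-line check. This is a minimal sketch; the function name and the error rates in the example are illustrative assumptions, and in practice the error over D can only be estimated from held-out data.

```python
def overfits(err_train_h, err_dist_h, err_train_alt, err_dist_alt):
    """True if h overfits relative to the alternative hypothesis h'.

    h has lower error on the training data but higher error
    over the whole distribution D than h'.
    """
    return err_train_h < err_train_alt and err_dist_h > err_dist_alt


# Hypothetical error rates, for illustration only.
print(overfits(0.02, 0.20, 0.08, 0.12))  # True: h fits training data better, generalizes worse
print(overfits(0.08, 0.12, 0.02, 0.20))  # False
```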
Overfitting in decision tree learning
Avoiding Overfitting
• Causes
1. The training data contains errors or noise.
2. Only a small number of examples is associated with a leaf node,
so coincidental regularities get fitted.
• Avoiding Overfitting
1. Stop growing the tree when a data split is not statistically significant
2. Grow the full tree, then post-prune it.
• Selecting the Best Tree
1. Measure performance over the training data
2. Measure performance over a separate validation data set
Reduced-Error Pruning
• Split data into training and validation sets
• Repeat until further pruning is harmful:
1. Evaluate the impact of pruning each possible node on the
validation set
2. Greedily remove the node whose removal most improves
validation set accuracy
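The loop above can be sketched in Python as follows. The `Node` class, the helper names, the small PlayTennis-style tree, and the validation examples are all assumptions made for illustration. Pruning a node here means replacing its subtree with a leaf that predicts the node's majority training class, and a prune that leaves validation accuracy unchanged is still accepted, which is one common reading of "until further pruning is harmful".

```python
class Node:
    """Decision-tree node; a leaf has attr=None and carries a label."""
    def __init__(self, attr=None, children=None, label=None, majority=None):
        self.attr = attr                # attribute tested here (None for a leaf)
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # predicted class if this is a leaf
        self.majority = majority        # most common training class at this node

def leaf(label):
    return Node(label=label, majority=label)

def predict(node, x):
    while node.attr is not None:
        child = node.children.get(x[node.attr])
        if child is None:               # unseen value: fall back to the majority class
            return node.majority
        node = child
    return node.label

def accuracy(root, data):
    return sum(predict(root, x) == y for x, y in data) / len(data)

def internal_nodes(node):
    if node.attr is not None:
        yield node
        for child in node.children.values():
            yield from internal_nodes(child)

def accuracy_if_pruned(root, node, data):
    """Temporarily turn `node` into a majority-class leaf and score the tree."""
    saved = (node.attr, node.children, node.label)
    node.attr, node.children, node.label = None, {}, node.majority
    acc = accuracy(root, data)
    node.attr, node.children, node.label = saved
    return acc

def reduced_error_prune(root, validation):
    while True:
        base = accuracy(root, validation)
        candidates = list(internal_nodes(root))
        if not candidates:
            return root
        scores = [accuracy_if_pruned(root, n, validation) for n in candidates]
        best = max(range(len(candidates)), key=scores.__getitem__)
        if scores[best] < base:         # every possible prune hurts: stop
            return root
        node = candidates[best]         # greedily apply the best prune
        node.attr, node.children, node.label = None, {}, node.majority

# Hypothetical tree and validation set, for illustration only.
tree = Node("Outlook", majority="Yes", children={
    "Sunny": Node("Humidity", majority="No",
                  children={"High": leaf("No"), "Normal": leaf("Yes")}),
    "Overcast": leaf("Yes"),
    "Rain": Node("Wind", majority="Yes",
                 children={"Strong": leaf("No"), "Weak": leaf("Yes")}),
})
validation = [
    ({"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}, "No"),
    ({"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}, "Yes"),
    ({"Outlook": "Rain", "Humidity": "High", "Wind": "Strong"}, "Yes"),
    ({"Outlook": "Rain", "Humidity": "Normal", "Wind": "Weak"}, "Yes"),
    ({"Outlook": "Overcast", "Humidity": "High", "Wind": "Weak"}, "Yes"),
]
before = accuracy(tree, validation)
reduced_error_prune(tree, validation)
after = accuracy(tree, validation)
print(before, after)
```

In this toy example only the Wind subtree is collapsed, because that is the only prune that does not lower validation accuracy.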
Effect of Reduced-Error Pruning
Rule Post-Pruning
• The major drawback of Reduced-Error Pruning: when
data is limited, holding out a validation set further reduces
the number of examples available for training.
Hence Rule Post-Pruning:
• Convert the tree to an equivalent set of rules
• Prune each rule independently of the others
• Sort the final rules into the desired sequence for use
Converting a tree to rules
IF (Outlook = Sunny) ∧ (Humidity = High)
THEN PlayTennis = No
IF (Outlook = Sunny) ∧ (Humidity = Normal)
THEN PlayTennis = Yes
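Each root-to-leaf path becomes one rule, which can be sketched as a short recursive walk. The nested-tuple tree encoding and the function names are assumptions for illustration; the Overcast and Rain branches are filled in from the usual PlayTennis example and do not appear on the slide.

```python
# A tree is either a class label (str) or a pair (attribute, {value: subtree}).
tree = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def tree_to_rules(node, conditions=()):
    """Return one (conditions, label) rule per root-to-leaf path."""
    if isinstance(node, str):
        return [(list(conditions), node)]
    attr, branches = node
    rules = []
    for value, child in branches.items():
        rules.extend(tree_to_rules(child, conditions + ((attr, value),)))
    return rules

def format_rule(conditions, label, target="PlayTennis"):
    body = " ∧ ".join(f"({a} = {v})" for a, v in conditions)
    return f"IF {body} THEN {target} = {label}"

rules = tree_to_rules(tree)
for conditions, label in rules:
    print(format_rule(conditions, label))
```

Once the tree is in this form, each precondition of a rule can be dropped independently, which is what makes rule post-pruning more flexible than pruning whole subtrees.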
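Continuous-Valued Attributes
• Create a discrete-valued attribute that tests the continuous one,
e.g. a boolean test Temperature > c for some threshold c
• So if Temperature = 75 passes a test whose branch leads to a positive leaf
• We can infer that PlayTennis = Yes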
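One common way to pick the threshold is to sort the examples by the continuous value, take midpoints between adjacent examples whose class differs as candidates, and keep the candidate with the highest information gain. The sketch below assumes that approach; the temperature readings and labels are illustrative values, not data from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted values where the class changes."""
    pairs = sorted(zip(values, labels))
    return [(a + b) / 2
            for (a, la), (b, lb) in zip(pairs, pairs[1:])
            if la != lb and a != b]

def gain_for_threshold(values, labels, c):
    """Information gain of the boolean test `value > c`."""
    below = [l for v, l in zip(values, labels) if v <= c]
    above = [l for v, l in zip(values, labels) if v > c]
    n = len(labels)
    remainder = len(below) / n * entropy(below) + len(above) / n * entropy(above)
    return entropy(labels) - remainder

# Illustrative data only.
temps = [40, 48, 60, 72, 80, 90]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]

candidates = candidate_thresholds(temps, play)
best = max(candidates, key=lambda c: gain_for_threshold(temps, play, c))
print(candidates, best)
```

With these values the candidates are 54 and 85, and the test Temperature > 54 has the higher gain, so a reading of 75 would follow the "true" branch of that test.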
Attributes with many values
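• Problem:
• If an attribute has many values, Gain will tend to select it
• Example – a Date attribute splits the training data into many tiny, pure subsets
• One approach – use Gain Ratio instead of Gain:
\[ GainRatio(S, A) = \frac{Gain(S, A)}{SplitInformation(S, A)} \]
\[ SplitInformation(S, A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|} \]
where S_i is the subset of S for which A has value v_i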
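A small sketch of how Gain Ratio penalizes a many-valued attribute, using the standard definition GainRatio = Gain / SplitInformation. The function names and the four-example data set are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(values, labels):
    """Information gain of splitting `labels` by the attribute `values`."""
    n = len(labels)
    groups = defaultdict(list)
    for v, l in zip(values, labels):
        groups[v].append(l)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

def split_information(values):
    """Entropy of S with respect to the values of the attribute itself."""
    return entropy(values)

def gain_ratio(values, labels):
    si = split_information(values)
    return gain(values, labels) / si if si > 0 else 0.0

# Illustrative data: Date is unique per example, Wind has two values.
play = ["Yes", "Yes", "No", "No"]
date = ["d1", "d2", "d3", "d4"]
wind = ["Weak", "Weak", "Strong", "Strong"]

print(gain(date, play), gain(wind, play))
print(gain_ratio(date, play), gain_ratio(wind, play))
```

Both attributes separate the classes perfectly here, so plain Gain cannot tell them apart, but Date's larger split information halves its Gain Ratio and Wind is preferred.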
Attributes with costs
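• Problem: some attributes are expensive to measure
• Medical diagnosis: BloodTest has cost $150
• Robotics: Width_from_1ft has cost 23 sec
• One approach – replace Gain with a cost-sensitive measure
• Tan and Schlimmer (1990):
\[ \frac{Gain^2(S, A)}{Cost(A)} \]
• Nunez (1988):
\[ \frac{2^{Gain(S, A)} - 1}{(Cost(A) + 1)^w} \]
• where w ∈ [0, 1] is a constant that determines the relative importance of cost versus information
gain.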
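The two cost-sensitive measures can be compared with a short sketch. The formulas are the standard Tan and Schlimmer and Nunez measures; the function names, the gain values, and the cheap Temperature attribute with cost 1 are hypothetical and chosen only to show the effect of w.

```python
def tan_schlimmer(gain, cost):
    """Gain^2(S, A) / Cost(A)."""
    return gain ** 2 / cost

def nunez(gain, cost, w):
    """(2^Gain(S, A) - 1) / (Cost(A) + 1)^w, with w in [0, 1]."""
    return (2 ** gain - 1) / (cost + 1) ** w

# Hypothetical attributes: (information gain, measurement cost).
blood_test = (0.6, 150.0)   # informative but expensive
temperature = (0.3, 1.0)    # less informative but cheap

print(tan_schlimmer(*blood_test), tan_schlimmer(*temperature))
for w in (0.0, 0.5, 1.0):
    print(w, nunez(*blood_test, w), nunez(*temperature, w))
```

With w = 0 the Nunez measure ignores cost and BloodTest scores higher; as w approaches 1 the cheap attribute wins.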
Examples with missing attribute values
• What if some examples are missing values of attribute A?
• Use those training examples anyway and sort them through the tree
• If node n tests A, assign the most common value of A among
the examples at node n
• Alternatively, assign a probability p_i to each possible value v_i of A and
pass a fraction p_i of the example down each descendant in the tree
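Both strategies can be sketched in a few lines. The function names and the five examples at the node are assumptions for illustration; a missing value is represented as `None`, and the probabilities p_i are estimated from the observed frequencies at the node.

```python
from collections import Counter

def observed_counts(examples, attr):
    return Counter(x[attr] for x in examples if x.get(attr) is not None)

def most_common_value(examples, attr):
    """Strategy 1: fill in the most common value of attr at this node."""
    return observed_counts(examples, attr).most_common(1)[0][0]

def value_probabilities(examples, attr):
    """Estimate p_i for each value v_i from the examples that have attr."""
    counts = observed_counts(examples, attr)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def distribute(example, weight, examples, attr):
    """Strategy 2: split one example into fractional copies, one per value."""
    probs = value_probabilities(examples, attr)
    return [({**example, attr: v}, weight * p) for v, p in probs.items()]

# Hypothetical examples reaching node n, which tests Humidity.
at_node = [
    {"Humidity": "High"}, {"Humidity": "High"}, {"Humidity": "High"},
    {"Humidity": "Normal"}, {"Humidity": None},
]
incomplete = at_node[-1]

print(most_common_value(at_node, "Humidity"))
print(distribute(incomplete, 1.0, at_node, "Humidity"))
```

The fractional copies carry their weights down the tree, so later gain computations count 0.75 of the example on the High branch and 0.25 on the Normal branch.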
Some of the latest Applications
Gesture Recognition
Motion Detection
Xbox 360 Kinect
Thank You

Issues in Decision Tree by Ravindra Singh Kushwaha B.Tech(IT) 2017-21 Chaudhary Charan Singh University, Meerut
