TREE PRUNING
BY SHIVANGI GUPTA
OVERVIEW
• Decision Tree
• Why Tree Pruning?
• Types of Tree Pruning
• Reduced Error Pruning
• Comparison
• References
INTRODUCTION
• Decision trees are built to classify a set of items (records).
• When building such a classifier, we can run into two problems:
1. Underfitting
2. Overfitting
• Underfitting occurs when both the “training error and test error are large”.
• This happens when the model is too simple.
• Overfitting occurs when the “training error is small but the test error is large”.
OVERFITTING
• Overfitting results in decision trees that are more complex than necessary.
• Training error no longer provides a good estimate of how well the tree will perform on previously unseen records.
• We therefore need other ways to estimate the error.
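One common way to get an error estimate that the training data cannot provide is to hold out part of the data and measure the error there. The sketch below is a minimal illustration under stated assumptions: `holdout_split`, `error_rate`, the 30% holdout fraction and the toy data are all hypothetical, and the "memorizer" stands in for a tree grown until every leaf holds a single training record.

```python
from __future__ import annotations

import random
from typing import Callable, Hashable, Sequence

Instance = tuple[dict, Hashable]  # (attribute values, class label)

def holdout_split(instances: Sequence[Instance], holdout_fraction: float = 0.3,
                  seed: int = 0) -> tuple[list[Instance], list[Instance]]:
    """Shuffle a copy of the data and return (training part, held-out part)."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must lie strictly between 0 and 1")
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_held_out = round(len(shuffled) * holdout_fraction)
    return shuffled[n_held_out:], shuffled[:n_held_out]

def error_rate(classify: Callable[[dict], Hashable], instances: Sequence[Instance]) -> float:
    """Fraction of instances whose predicted class differs from the true label."""
    if not instances:
        raise ValueError("cannot estimate an error rate from zero instances")
    wrong = sum(1 for x, y in instances if classify(x) != y)
    return wrong / len(instances)

# Hypothetical demo: a "classifier" that memorizes its training instances
# and guesses class 0 for anything it has not seen.
data = [({"id": i}, i % 2) for i in range(10)]
train, held_out = holdout_split(data)
memory = {x["id"]: y for x, y in train}

def memorizer(x: dict) -> Hashable:
    return memory.get(x["id"], 0)

training_error = error_rate(memorizer, train)    # 0.0: every training record is memorized
held_out_error = error_rate(memorizer, held_out)  # measured on records the model never saw
```

The training error of the memorizer is always zero, so only the held-out figure says anything about performance on unseen records.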
How to address overfitting?
“Tree Pruning”
WHAT IS PRUNING?
• Pruning is the process of adjusting a decision tree to minimize its “misclassification error” on unseen data.
• Pruning can be done in two ways:
1. Prepruning
2. Postpruning
PREPRUNING
• Prepruning halts subtree construction at a node after checking some measure of split quality.
• Such measures include information gain, the Gini index, etc.
• If partitioning the tuples at a node would result in a split whose measure falls below a prespecified threshold, the node is not split further and becomes a leaf.
• Early stopping: prepruning may stop the growth process prematurely.
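The threshold test described above can be sketched as follows. This is a minimal illustration, assuming a node's class labels are given as a plain list and a candidate split is described by the label lists of its partitions; the function names, the toy labels and the threshold value are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from typing import Callable, Hashable, Sequence

Labels = Sequence[Hashable]

def entropy(labels: Labels) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels: Labels) -> float:
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent: Labels, partitions: Sequence[Labels],
                      impurity: Callable[[Labels], float] = entropy) -> float:
    """Parent impurity minus the size-weighted impurity of the partitions.
    With `entropy` this is the information gain of the split."""
    n = len(parent)
    weighted = sum(len(p) / n * impurity(p) for p in partitions)
    return impurity(parent) - weighted

def should_split(parent: Labels, partitions: Sequence[Labels], threshold: float,
                 impurity: Callable[[Labels], float] = entropy) -> bool:
    """Prepruning test: split only if the split is good enough;
    otherwise the node is kept as a leaf."""
    return impurity_decrease(parent, partitions, impurity) >= threshold

# Toy labels: split_a separates the classes perfectly, split_b does not help at all.
parent = ["yes"] * 4 + ["no"] * 4
split_a = [["yes"] * 4, ["no"] * 4]
split_b = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]
THRESHOLD = 0.1  # hypothetical prespecified threshold
```

Here `split_a` has an information gain of 1 bit (Gini decrease 0.5) and is accepted, while `split_b` gains nothing and the node would be left as a leaf.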
POSTPRUNING
• Grow the decision tree in its entirety.
• Trim the nodes of the decision tree in a bottom-up fashion. Postpruning replaces a node, together with the subtree below it, with a leaf.
• If the error improves after trimming, replace the subtree by a leaf node.
REDUCED ERROR PRUNING
• The idea is to hold out some of the available instances, the “pruning set”, and use them only after the tree has been built.
• Prune the tree until the classification error on these independent instances starts to increase.
• Because the pruning instances are not used to build the decision tree, they provide a less biased estimate of its error rate on future instances than the training data does.
• Reduced error pruning is done in a bottom-up fashion.
• Criterion:
If the error of the parent (the node replaced by a leaf) on the pruning set is less than or equal to the combined error of its subtree, prune the subtree; otherwise, do not.
i.e. if Parent(error) ≤ Subtree(error) then “Prune”
else don’t Prune
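A minimal sketch of reduced error pruning on a tree of attribute tests, assuming each node stores the majority class of the training instances that reached it (the class it predicts if it becomes a leaf). `Node`, `classify`, `reduced_error_prune` and the toy weather data are hypothetical names chosen for illustration. Ties are pruned, which favors the smaller tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Hashable, Optional

Instance = tuple[dict, Hashable]  # (attribute values, class label)

@dataclass
class Node:
    majority: Hashable               # training-set majority class; used when the node is a leaf
    attribute: Optional[str] = None  # attribute tested at this node; None for a leaf
    children: dict = field(default_factory=dict)  # attribute value -> child Node

    def is_leaf(self) -> bool:
        return not self.children

def classify(node: Node, x: dict) -> Hashable:
    """Follow the attribute tests down to a leaf and return its class."""
    while not node.is_leaf():
        child = node.children.get(x[node.attribute])
        if child is None:            # value never seen in training: use this node's majority
            return node.majority
        node = child
    return node.majority

def errors(node: Node, instances: list[Instance]) -> int:
    """Number of instances the (sub)tree rooted at `node` misclassifies."""
    return sum(1 for x, y in instances if classify(node, x) != y)

def reduced_error_prune(node: Node, pruning_set: list[Instance]) -> Node:
    """Bottom-up reduced error pruning, modifying the tree in place.
    `pruning_set` holds only the pruning instances that reach `node`."""
    if node.is_leaf():
        return node
    # Children first (bottom-up), each with the pruning instances routed to it.
    for value, child in node.children.items():
        routed = [(x, y) for x, y in pruning_set if x[node.attribute] == value]
        reduced_error_prune(child, routed)
    # Error of this node as a leaf versus error of its (already pruned) subtree.
    as_leaf = sum(1 for _, y in pruning_set if y != node.majority)
    if as_leaf <= errors(node, pruning_set):  # ties prune; no instances gives 0 <= 0
        node.attribute = None
        node.children = {}
    return node

# Hypothetical toy tree and pruning set.
tree = Node("yes", "outlook", {
    "sunny": Node("no", "windy", {True: Node("no"), False: Node("yes")}),
    "rainy": Node("yes"),
})
pruning_set = [
    ({"outlook": "sunny", "windy": True}, "no"),
    ({"outlook": "sunny", "windy": False}, "no"),
    ({"outlook": "rainy", "windy": False}, "yes"),
]
errors_before = errors(tree, pruning_set)
reduced_error_prune(tree, pruning_set)
errors_after = errors(tree, pruning_set)
```

On this data the "windy" subtree makes 1 error while a "no" leaf makes 0, so it is pruned; the root as a leaf would make 2 errors against 0 for its subtree, so it is kept.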
EXAMPLE
[Figure: example decision tree and pruning set (not reproduced)]
STEPS
• In each tree, the number of pruning-set instances misclassified by each node is given in parentheses.
• The tree is assumed to be traversed from left to right.
• The pruning procedure first considers removing the subtree attached to node 3.
• Because the subtree’s error on the pruning data (1 error) exceeds the error of node 3 itself (0 errors), node 3 is converted to a leaf.
• Next, node 6 is replaced by a leaf for the same reason.
• Having processed both of its successors, the pruning procedure then considers node 2 for deletion. However, because the subtree attached to node 2 makes fewer mistakes (0 errors) than node 2 itself (1 error), the subtree remains in place.
• Next, the subtree extending from node 9 is considered for pruning, and it is replaced by a leaf.
• In the last step, node 1 is considered for pruning, which leaves the tree unchanged.
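The walk-through above can be replayed in code. Because the figure is not reproduced, the tree below is hypothetical: the counts for nodes 2, 3 and 6 follow from the steps (node 6 must make 0 errors for node 2's pruned subtree to make 0), while the children of nodes 3, 6 and 9 and the counts for nodes 1 and 9 are invented so that the decisions match the narration. Each `CountNode` (a hypothetical name) carries the number of pruning-set errors it would make as a leaf.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class CountNode:
    name: str
    leaf_errors: int  # pruning-set errors if this node were a leaf
    children: list[CountNode] = field(default_factory=list)

def subtree_errors(node: CountNode) -> int:
    """Pruning-set errors of a subtree = sum of the errors of its current leaves."""
    if not node.children:
        return node.leaf_errors
    return sum(subtree_errors(c) for c in node.children)

def prune_bottom_up(node: CountNode, log: list[str]) -> None:
    """Reduced error pruning on pre-computed counts, left to right, children first."""
    for child in node.children:
        prune_bottom_up(child, log)
    if not node.children:
        return
    below = subtree_errors(node)
    if node.leaf_errors <= below:
        node.children = []
        log.append(f"node {node.name}: leaf {node.leaf_errors} <= subtree {below}, prune")
    else:
        log.append(f"node {node.name}: leaf {node.leaf_errors} > subtree {below}, keep")

# Hypothetical tree (see the lead-in for which counts are invented).
tree = CountNode("1", 3, [
    CountNode("2", 1, [
        CountNode("3", 0, [CountNode("4", 1), CountNode("5", 0)]),
        CountNode("6", 0, [CountNode("7", 1), CountNode("8", 0)]),
    ]),
    CountNode("9", 1, [CountNode("10", 1), CountNode("11", 1)]),
])
log: list[str] = []
prune_bottom_up(tree, log)
```

After the call, `log` lists the decisions in the same order as the steps: nodes 3 and 6 pruned, node 2 kept, node 9 pruned, node 1 kept.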
COMPARISON
• Prepruning is faster than postpruning since it does not need to wait for the complete construction of the decision tree.
• Still, postpruning is generally preferable to prepruning because of the “interaction effect”.
• These are effects that arise only from the interaction of several attributes.
• Prepruning suppresses growth by evaluating each attribute individually, and so it might overlook effects that are due to the interaction of several attributes and stop too early. Postpruning, on the other hand, avoids this problem because interaction effects are visible in the fully grown tree.
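The interaction effect can be seen with a two-attribute XOR target, a standard textbook illustration assumed here rather than taken from the slides. Neither attribute has any information gain on its own, so a prepruning rule with any positive threshold (0.01 here, a hypothetical value) stops at the root, while a tree that tests A and then B separates the classes perfectly.

```python
from __future__ import annotations

import math
from collections import Counter

def entropy(labels: list) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows: list[dict], attr: str, target: str = "class") -> float:
    """Information gain of splitting `rows` on the values of `attr`."""
    gain = entropy([r[target] for r in rows])
    for value in {r[attr] for r in rows}:
        part = [r[target] for r in rows if r[attr] == value]
        gain -= len(part) / len(rows) * entropy(part)
    return gain

# class = A XOR B: the class depends on both attributes together.
rows = [{"A": a, "B": b, "class": a ^ b} for a in (0, 1) for b in (0, 1)]
THRESHOLD = 0.01  # hypothetical prepruning threshold

# Evaluated one attribute at a time, neither split looks useful at the root.
root_gains = {attr: info_gain(rows, attr) for attr in ("A", "B")}
prepruning_stops_at_root = max(root_gains.values()) < THRESHOLD

# In the fully grown tree, B is tested below A and is then perfectly informative.
gains_of_b_after_a = [info_gain([r for r in rows if r["A"] == a], "B") for a in (0, 1)]
```

Both root gains are 0 bits, yet each split on B below A gains a full bit, which only becomes visible once the tree is grown past the root.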
