Nandini V Patil
Asst. Professor
Godutai Engg. College Kalaburagi
Decision Trees
Introduction
• Decision trees are a simple way to guide one's path to a decision.
• The decision may be a simple binary one, e.g. whether to approve a loan, or a complex multi-valued decision, e.g. diagnosing a sickness.
• Decision trees are hierarchically branched structures that help one arrive at a decision by asking a series of questions.
• A good decision tree should be short and ask only a few meaningful questions.
• Decision trees are very efficient to use, easy to explain, and their classification accuracy is competitive with other methods.
Decision tree problem
• Experts use decision trees or decision rules to solve problems. Human experts learn from experience or from data points; similarly, a machine can be trained to learn from past data points and extract knowledge or rules from them.
• Predictive accuracy is measured by the proportion of correct decisions made.
• The more data available for training the decision tree, the more accurate its knowledge extraction, and thus the more accurate its decisions.
• The more variables the tree can choose from, the greater the accuracy of the decision tree.
• A good decision tree should be frugal, so that it asks the fewest questions and thus takes the least effort to reach the decision.
Conti
• Decision Problem: Create a decision tree that helps decide whether to approve playing outdoor games.
• The predictions are based on the atmospheric conditions of the place.
• To answer this question we need past experience of what decisions were made in similar instances. The past data is given in dataset 6.1. The instance to classify is:

Outlook | Temp | Humidity | Windy | Play
Sunny   | Hot  | Normal   | True  | ??
Conti
We do not have a direct solution for this instance in the dataset, so we have to construct the decision tree from the past data.
Decision tree construction
• A decision tree is a hierarchically branched structure.
• Creating one is based on asking a few simple questions: the more important question should come first, followed by the less important ones.
Determining the root node of the tree
• Start constructing the tree using the weather problem for playing as an example.
• There are four choices for the root, one per variable; start with the following questions:
  • What is the outlook?
  • What is the temperature?
  • What is the humidity?
  • What is the wind speed?
Conti
Start with the first variable, Outlook, and then evaluate the remaining variables: Temperature, Humidity and Windy. Outlook has three values: Sunny, Overcast and Rainy. Counting the errors of the Outlook rules gives:

Attribute | Rule           | Error | Total Error
Outlook   | Sunny → No     | 2/5   |
          | Overcast → Yes | 0/4   | 4/14
          | Rainy → Yes    | 2/5   |
Conti
Two variables, Outlook and Humidity, tie for the least total error: 4 errors out of 14 instances. The tie can be broken using the purity of the resulting subtrees: Outlook has a pure subclass (Overcast, with zero errors), whereas Humidity has no such subclass, so Outlook is chosen for the root split.
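The error counts above can be reproduced with a short script. This is a minimal sketch that assumes the training data is the classic 14-instance weather dataset these counts (2/5, 0/4, 2/5 for Outlook; 4/14 total for both Outlook and Humidity) correspond to; the rows below are an assumption, since dataset 6.1 is not reproduced on the slide.

```python
from collections import Counter

# Assumed data: (Outlook, Humidity, Play) triples for the classic
# 14-instance weather dataset (an assumption; dataset 6.1 is not shown).
ROWS = [
    ("Sunny", "High", "No"),      ("Sunny", "High", "No"),
    ("Overcast", "High", "Yes"),  ("Rainy", "High", "Yes"),
    ("Rainy", "Normal", "Yes"),   ("Rainy", "Normal", "No"),
    ("Overcast", "Normal", "Yes"),("Sunny", "High", "No"),
    ("Sunny", "Normal", "Yes"),   ("Rainy", "Normal", "Yes"),
    ("Sunny", "Normal", "Yes"),   ("Overcast", "High", "Yes"),
    ("Overcast", "Normal", "Yes"),("Rainy", "High", "No"),
]

def attribute_error(rows, attr_index):
    """Predict the majority class within each value of one attribute and
    count the misclassifications, per value and in total."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[-1])
    per_value = {value: len(labels) - Counter(labels).most_common(1)[0][1]
                 for value, labels in groups.items()}
    return per_value, sum(per_value.values())

print(attribute_error(ROWS, 0))  # Outlook:  ({'Sunny': 2, 'Overcast': 0, 'Rainy': 2}, 4)
print(attribute_error(ROWS, 1))  # Humidity: ({'High': 3, 'Normal': 1}, 4)
```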
Conti
Splitting the tree: this is what the decision tree looks like after the first split, with the root node Outlook branching into Sunny, Overcast and Rainy.
Conti
Determining the next nodes of the tree: error values are now calculated within the Sunny branch for its three remaining variables: Temperature, Humidity and Windy. The variable Humidity shows the least error (zero errors), so the Sunny branch on the left will use Humidity as the next splitting variable.
conti
Error values are calculated for the Rainy branch in the same way. The variable Windy shows the least error (zero errors), so the Outlook = Rainy branch on the right will use Windy as the next splitting variable.
Conti
The final decision tree looks as follows:
conti
Outlook | Temp | Humidity | Windy | Play
Sunny   | Hot  | Normal   | True  | ??
Sunny   | Hot  | Normal   | True  | Yes

Solve the current problem using the decision tree. The first question to ask is about the outlook. The outlook is Sunny, so the decision problem moves to the Sunny branch of the tree. That node splits on Humidity; in this instance the humidity is Normal, so the branch leads to a Yes answer. Thus the answer to the play problem is Yes.
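The finished tree can be written directly as nested conditions. Below is a minimal sketch of the rules derived above (Outlook at the root, Humidity under Sunny, Windy under Rainy), applied to the query row from dataset 6.1; the function name is illustrative.

```python
def play(outlook, temp, humidity, windy):
    """Classify one instance with the tree derived in the slides.
    temp is accepted but never used: the tree did not need it."""
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rainy":
        return "No" if windy else "Yes"
    raise ValueError(f"unknown outlook: {outlook}")

# The query row: Outlook=Sunny, Temp=Hot, Humidity=Normal, Windy=True
print(play("Sunny", "Hot", "Normal", True))  # -> Yes
```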
Comparing a decision tree with table lookup
Lessons from constructing trees
• The final decision tree has zero errors in mapping the prior data, i.e. the predictive accuracy of the tree on that data is 100%.
• The algorithm should select the minimum number of variables needed to solve the problem.
• The tree is almost symmetric, with all branches of roughly similar length.
• It may be possible to increase predictive accuracy further by adding more subtrees and making the tree longer.
• A perfectly fitting tree, however, runs the danger of over-fitting the data, thereby capturing all the random variations in the data.
• There need not be a single best tree for a dataset; two or more equally efficient decision trees of similar length, with similar predictive accuracy, may exist for the same dataset.
Decision tree Algorithms
• Decision tree construction is based on the divide-and-conquer method.
• Pseudocode for building a decision tree is as follows:
1. Create a root node and assign all of the training data to it.
2. Select the best splitting attribute according to certain criteria.
3. Add a branch to the root node for each value of the split.
4. Split the data into mutually exclusive subsets along the lines of the specific split.
5. Repeat steps 2 to 4 for each leaf node until a stopping criterion is reached.
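A compact Python sketch of this divide-and-conquer recipe, using least error (majority-class misclassifications) as the splitting criterion and stopping when a node is pure or no attributes remain. The names and data layout (rows as dicts keyed by column name, with the class label under "Play") are illustrative assumptions, not a specific library's API.

```python
from collections import Counter

def majority(labels):
    """Most frequent class label in a list."""
    return Counter(labels).most_common(1)[0][0]

def split_error(rows, attr):
    """Misclassifications if each value of attr predicts its majority class."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row["Play"])
    return sum(len(v) - Counter(v).most_common(1)[0][1] for v in groups.values())

def build_tree(rows, attrs):
    labels = [r["Play"] for r in rows]
    # Stopping criteria: the node is pure, or there is nothing left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority(labels)                            # leaf node
    best = min(attrs, key=lambda a: split_error(rows, a))  # step 2 (ties go to the earlier attribute)
    node = {"split": best, "branches": {}}
    for value in {r[best] for r in rows}:                  # step 3: one branch per value
        subset = [r for r in rows if r[best] == value]     # step 4: mutually exclusive subsets
        node["branches"][value] = build_tree(subset, [a for a in attrs if a != best])  # step 5: recurse
    return node
```

Called on dataset 6.1 with attrs = ["Outlook", "Temp", "Humidity", "Windy"], this sketch reproduces the tree in the earlier slides (Outlook at the root, Humidity under Sunny, Windy under Rainy); note that the root tie between Outlook and Humidity falls to whichever attribute is listed first, whereas the slides break it by subtree purity.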
Decision tree key elements
• Splitting criteria:
  • Which variable to use for the first split? How should one determine the most important variable for the first branch, and subsequently for each subtree?
  • Answer: algorithms use different measures, such as least error, information gain, and the Gini coefficient (a sketch of these measures follows this list).
  • What values to use for the split? If a variable has continuous values, such as age or blood pressure, what value ranges should be used to make the bins?
  • How many branches should be allowed for each node? There could be binary trees, with just two branches at each node, or more branches could be allowed.
• Stopping criteria: when to stop building the tree? There are two major approaches:
  a) when a certain depth of the branches has been reached and the tree would become unreadable beyond that;
  b) when the error level at any node is within predefined tolerable levels.
• Pruning: the act of reducing the size of a decision tree by removing sections of the tree that provide little value. The decision tree can be trimmed to make it more balanced, more general and more easily usable. There are two approaches to pruning:
  • Pre-pruning
  • Post-pruning
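The splitting measures named above can be written as short functions. A minimal sketch (independent of any particular package) of misclassification error, entropy-based information gain, and the Gini index; the example numbers assume the 9-Yes/5-No class split implied by the error table earlier.

```python
from collections import Counter
from math import log2

def class_counts(labels):
    return list(Counter(labels).values())

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in class_counts(labels))

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in class_counts(labels))

def misclassification_error(labels):
    return 1 - max(class_counts(labels)) / len(labels)

def information_gain(parent_labels, child_label_groups):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

# Example: splitting the 14 Play labels (9 Yes, 5 No) on Outlook.
parent = ["Yes"] * 9 + ["No"] * 5
children = [["Yes", "Yes", "No", "No", "No"],        # Sunny
            ["Yes", "Yes", "Yes", "Yes"],            # Overcast
            ["Yes", "Yes", "Yes", "No", "No"]]       # Rainy
print(round(information_gain(parent, children), 3))  # ~0.247
print(round(gini(parent), 3))                         # ~0.459
print(round(misclassification_error(parent), 3))      # ~0.357
```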
Comparing popular Decision tree Algorithms

Decision Tree         | C4.5                                   | CART                                 | CHAID
Full name             | Iterative Dichotomizer (ID3)           | Classification and Regression Trees  | Chi-square Automatic Interaction Detector
Basic algorithm       | Hunt's algorithm                       | Hunt's algorithm                     | Adjusted significance testing
Developer             | Ross Quinlan                           | Breiman                              | Gordon Kass
When developed        | 1986                                   | 1984                                 | 1980
Type of trees         | Classification                         | Classification & regression          | Classification & regression
Serial implementation | Tree growth & tree pruning             | Tree growth & tree pruning           | Tree growth & tree pruning
Type of data          | Discrete & continuous; incomplete data | Discrete & continuous                | Non-normal data also accepted
Conti

Decision Tree      | C4.5                                              | CART                                                              | CHAID
Type of splits     | Multi-way                                         | Binary splits only; clever surrogate splits to reduce tree depth | Multi-way splits as default
Splitting criteria | Information gain                                  | Gini coefficient & others                                         | Chi-square test
Pruning criteria   | Clever bottom-up technique to avoid over-fitting | Remove weakest links first                                        | Trees can become very large
Implementation     | Publicly available                                | Publicly available in most packages                               | Popular in market research for segmentation
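To connect the comparison above to practice, here is a small sketch using scikit-learn, whose DecisionTreeClassifier is a CART-style learner (binary splits, Gini criterion by default). The training frame is again the classic weather dataset, assumed to match dataset 6.1; the one-hot encoding step is only needed because the features here are categorical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed training data: the classic weather dataset, columns as in dataset 6.1.
df = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
                 "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"],
    "Temp":     ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool", "Mild", "Cool",
                 "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal", "High",
                 "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Windy":    [False, True, False, False, False, True, True, False, False, False,
                 True, True, False, True],
    "Play":     ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes",
                 "Yes", "Yes", "Yes", "No"],
})

X = pd.get_dummies(df.drop(columns="Play"))   # one-hot encode the categorical features
y = df["Play"]

tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

# Classify the query row from dataset 6.1: Sunny, Hot, Normal, Windy=True.
query = pd.DataFrame([{"Outlook": "Sunny", "Temp": "Hot",
                       "Humidity": "Normal", "Windy": True}])
query = pd.get_dummies(query).reindex(columns=X.columns, fill_value=0)
print(tree.predict(query))   # expected: ['Yes'] for this training data
```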