SlideShare a Scribd company logo
1 of 30
AN INTRODUCTION
TO
DECISION TREES
Prepared for:
CIS595 Knowledge Discovery and Data Mining
Professor Vasileios Megalooikonomou
Presented by:
Thomas Mahoney
Learning Systems
 Learning systems consider
 Solved cases - cases assigned to a class
 Information from the solved cases - general
decision rules
 Rules - implemented in a model
 Model - applied to new cases
 Different types of models - present their results
in various forms
 Linear discriminant model - mathematical
equation (p = ax1 + bx2 + cx3 + dx4 + ex5).
 Presentation comprehensibility
Data Classification and Prediction
 Data classification
 classification
 prediction
 Methods of classification
 decision tree induction
 Bayesian classification
 backpropagation
 association rule mining
Data Classification and Prediction
 Method creates model from a set of
training data
 individual data records (samples, objects,
tuples)
 records can each be described by its
attributes
 attributes arranged in a set of classes
 supervised learning - each record is assigned
a class label
Data Classification and Prediction
 Model form representations
 mathematical formulae
 classification rules
 decision trees
 Model utility for data classification
 degree of accuracy
 predict unknown outcomes for a new (no-test)
data set
 classification - outcomes always discrete or
nominal values
 regression may contain continuous or ordered
values
Description of
Decision Rules or Trees
 Intuitive appeal for users
 Presentation Forms
 “if, then” statements (decision rules)
 graphically - decision trees
What They Look Like
 Works like a flow chart
 Looks like an upside down tree
 Nodes
 appear as rectangles or circles
 represent test or decision
 Lines or branches - represent outcome of
a test
 Circles - terminal (leaf) nodes
 Top or starting node- root node
 Internal nodes - rectangles
An Example
 Bank - loan application
 Classify application
 approved class
 denied class
 Criteria - Target Class approved if 3 binary
attributes have certain value:
 (a) borrower has good credit history (credit rating in excess
of some threshold)
 (b) loan amount less than some percentage of collateral value
(e.g., 80% home value)
 (c) borrower has income to make payments on loan
 Possible scenarios = 32 = 8
 If the parameters for splitting the nodes can be adjusted, the
number of scenarios grows exponentially.
How They Work
 Decision rules - partition sample of data
 Terminal node (leaf) indicates the class assignment
 Tree partitions samples into mutually exclusive groups
 One group for each terminal node
 All paths
 start at the root node
 end at a leaf
 Each path represents a decision rule
 joining (AND) of all the tests along that path
 separate paths that result in the same class are disjunctions (ORs)
 All paths - mutually exclusive
 for any one case - only one path will be followed
 false decisions on the left branch
 true decisions on the right branch
Disjunctive Normal Form
 Non-terminal node - model identifies an
attribute to be tested
 test splits attribute into mutually exclusive
disjoint sets
 splitting continues until a node - one class
(terminal node or leaf)
 Structure - disjunctive normal form
 limits form of a rule to conjunctions (adding)
of terms
 allows disjunction (or-ing) over a set of rules
Geometry
 Disjunctive normal form
 Fits shapes of decision boundaries between classes
 Classes formed by lines parallel to axes
 Result - rectangular shaped class regions
Binary Trees
 Characteristics
 two branches leave each non-terminal
node
 those two branches cover outcomes of
the test
 exactly one branch enters each non-
root node
 there are n terminal nodes
 there are n-1 non-terminal nodes
Nonbinary Trees
 Characteristics
 two or more branches leave each non-
terminal node
 those branches cover outcomes of the
test
 exactly one branch enters each non-
root node
 there are n terminal nodes
 there are n-1 non-terminal nodes
Goal
 Dual goal - Develop tree that
 is small
 classifies and predicts class with accuracy
 Small size
 a smaller tree more easily understood
 smaller tree less susceptible to overfitting
 large tree less information regarding
classifying and predicting cases
Rule Induction
 Process of building the decision tree or
ascertaining the decision rules
 tree induction
 rule induction
 induction
 Decision tree algorithms
 induce decision trees recursively
 from the root (top) down - greedy approach
 established basic algorithms include ID3 and
C4.5
Discrete vs. Continuous Attributes
 Continuous variables attributes -
problems for decision trees
 increase computational complexity of the task
 promote prediction inaccuracy
 lead to overfitting of data
 Convert continuous variables into discrete
intervals
 “greater than or equal to” and “less than”
 optimal solution for conversion
 difficult to determine discrete intervals ideal
• size
• number
Making the Split
 Models induce a tree by recursively
selecting and subdividing attributes
 random selection - noisy variables
 inefficient production of inaccurate trees
 Efficient models
 examine each variable
 determine which will improve accuracy of
entire tree
 problem - this approach decides best split
without considering subsequent splits
Evaluating the Splits
Measures of impurity or its inverse, goodness reduce
impurity or degree of randomness at each node popular
measures include:
Entropy Function
- pj log pj
j
Gini Index
1 -  p2
j
j
Twoing Rule
k
(TL /n) * (TR /n) * ( Li TL -
Ri/ TR)2
Evaluating the Splits
Max Minority
Sum of Variances
Overfitting
 Error rate in predicting the correct
class for new cases
 overfitting of test data
 very low apparent error rate
 high actual error rate
Optimal Size
 Certain minimal size smaller tree
 higher apparent error rate
 lower actual error rate
 Goal
 identify threshold
 minimize actual error rate
 achieve greatest predictive accuracy
Ending Tree Growth
 Grow the tree until
 additional splitting produces no
significant information gain
 statistical test - a chi-squared test
 problem - trees that are too small
 only compares one split with the next
descending split
Pruning
 Grow large tree
 reduce its size by eliminating or pruning weak
branches step by step
 continue until minimum true error rate
 Pruning Methods
 reduced-error pruning
 divides samples into test set and training set
 training set is used to produce the fully
expanded tree
 tree is then tested using the test set
 weak branches are pruned
 stop when no more improvement
Pruning
 Resampling
 5 - fold cross-validation
 80% cases used for training; remainder for
testing
 Weakest-link or cost-complexity pruning
 trim weakest link ( produces the smallest
increase in the apparent error rate)
 method can be combined with resampling
Variations and Enhancements
to Basic Decision Trees
 Multivariate or Oblique Trees
 CART-LC - CART with Linear
Combinations
 LMDT - Linear Machine Decision Trees
 SADT - Simulated Annealing of Decision
Trees
 OC1 - Oblique Classifier 1
Evaluating Decision Trees
 Method’s Appropriateness
 Data set or type
 Criteria
 accuracy - predict class label for new data
 scalability
• performs model generation and prediction functions
• large data sets
• satisfactory speed
 robustness
• perform well despite noisy or missing data
 intuitive appeal
• results easily understood
• promotes decision making
Decision Tree Limitations
 No backtracking
 local optimal solution not global optimal
solution
 lookahead features may give us better trees
 Rectangular-shaped geometric regions
 in two-dimensional space
• regions bounded by lines parallel to the x- and y-
axes
 some linear relationships not parallel to the
axes
Conclusions
 Utility
 analyze classified data
 produce
 accurate and easily understood classification
rules
 with good predictive value
Improvements
 Limitations being addressed
 multivariate discrimination - oblique trees
 data mining techniques
Bibliography
 A System for Induction of Oblique Decision Trees, Sreerama K. Murthy,
Simon Kasif, Steven Salzberg, Journal of Artificial Intelligence Research 2
(1994) 1-32.
 Automatic Construction of Decision Trees from Data: A Multi-Disciplinary
Survey, Sreerama K. Murthy, Data Mining and Knowledge Discovery, 2.
345-389 (1998) Kluwer Academic Publishers.
 Classification and Regression Trees, Leo Breiman, Jerome Friedman,
Richard Olshen and Charles Stone, 1984, Wadsworth Int. Group.
 Computer Systems That Learn, Sholom M. Weiss and Casimer A.
Kulikowski, 1991, Morgan Kaufman.
 Data Mining, Concepts and Techniques, Jiawei Han and Micheline Kamber,
2001, Morgan Kaufman.
 Introduction to Mathematical Techniques in Pattern Recognition, Harry C.
Andrews, 1972, Wiley-Interscience.
 Machine Learning, Tom M. Mitchell, 1997, McGraw-Hill.

More Related Content

Similar to decisiontrees.ppt

Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseasesijsrd.com
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptxssuser6654de1
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchjim
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Researchbutest
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Researchkevinlan
 
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptxMACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptxVijayalakshmi171563
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxAsrithaKorupolu
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CARTXueping Peng
 
Using Tree algorithms on machine learning
Using Tree algorithms on machine learningUsing Tree algorithms on machine learning
Using Tree algorithms on machine learningRajasekhar364622
 
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique Sujeet Suryawanshi
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfAdityaSoraut
 
Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data Shalin Hai-Jew
 
A General Framework for Accurate and Fast Regression by Data Summarization in...
A General Framework for Accurate and Fast Regression by Data Summarization in...A General Framework for Accurate and Fast Regression by Data Summarization in...
A General Framework for Accurate and Fast Regression by Data Summarization in...Yao Wu
 
slides
slidesslides
slidesbutest
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Kush Kulshrestha
 

Similar to decisiontrees.ppt (20)

Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
 
Bank loan purchase modeling
Bank loan purchase modelingBank loan purchase modeling
Bank loan purchase modeling
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptx
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Data Mining in Market Research
Data Mining in Market ResearchData Mining in Market Research
Data Mining in Market Research
 
Data Mining In Market Research
Data Mining In Market ResearchData Mining In Market Research
Data Mining In Market Research
 
Introduction to cart_2009
Introduction to cart_2009Introduction to cart_2009
Introduction to cart_2009
 
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptxMACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
MACHINE LEARNING - ENTROPY & INFORMATION GAINpptx
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Using Tree algorithms on machine learning
Using Tree algorithms on machine learningUsing Tree algorithms on machine learning
Using Tree algorithms on machine learning
 
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
NSL KDD Cup 99 dataset Anomaly Detection using Machine Learning Technique
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
 
Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data
 
A General Framework for Accurate and Fast Regression by Data Summarization in...
A General Framework for Accurate and Fast Regression by Data Summarization in...A General Framework for Accurate and Fast Regression by Data Summarization in...
A General Framework for Accurate and Fast Regression by Data Summarization in...
 
slides
slidesslides
slides
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees
 
Advanced cart 2007
Advanced cart 2007Advanced cart 2007
Advanced cart 2007
 

More from PriyadharshiniG41

understanding-cholera-a-comprehensive-analysis.pdf
understanding-cholera-a-comprehensive-analysis.pdfunderstanding-cholera-a-comprehensive-analysis.pdf
understanding-cholera-a-comprehensive-analysis.pdfPriyadharshiniG41
 
combatting-malaria-strategies-for-prevention-and-treatment.pdf
combatting-malaria-strategies-for-prevention-and-treatment.pdfcombatting-malaria-strategies-for-prevention-and-treatment.pdf
combatting-malaria-strategies-for-prevention-and-treatment.pdfPriyadharshiniG41
 
ant colony optimization working and explanation
ant colony optimization working and explanationant colony optimization working and explanation
ant colony optimization working and explanationPriyadharshiniG41
 
RandomForests in artificial intelligence
RandomForests in artificial intelligenceRandomForests in artificial intelligence
RandomForests in artificial intelligencePriyadharshiniG41
 
knowledge representation in artificial intelligence
knowledge representation in artificial intelligenceknowledge representation in artificial intelligence
knowledge representation in artificial intelligencePriyadharshiniG41
 
information retrieval in artificial intelligence
information retrieval in artificial intelligenceinformation retrieval in artificial intelligence
information retrieval in artificial intelligencePriyadharshiniG41
 
Inference rules in artificial intelligence
Inference rules in artificial intelligenceInference rules in artificial intelligence
Inference rules in artificial intelligencePriyadharshiniG41
 
Unit I-Final MArketing analytics unit 1 ppt
Unit I-Final MArketing analytics unit 1 pptUnit I-Final MArketing analytics unit 1 ppt
Unit I-Final MArketing analytics unit 1 pptPriyadharshiniG41
 
agent architecture in artificial intelligence.pptx
agent architecture in artificial intelligence.pptxagent architecture in artificial intelligence.pptx
agent architecture in artificial intelligence.pptxPriyadharshiniG41
 
trust,bargain,negotiate in artificail intelligence
trust,bargain,negotiate in artificail intelligencetrust,bargain,negotiate in artificail intelligence
trust,bargain,negotiate in artificail intelligencePriyadharshiniG41
 
spirometry classification enhanced using ai
spirometry classification enhanced using aispirometry classification enhanced using ai
spirometry classification enhanced using aiPriyadharshiniG41
 
Minmax and alpha beta pruning.pptx
Minmax and alpha beta pruning.pptxMinmax and alpha beta pruning.pptx
Minmax and alpha beta pruning.pptxPriyadharshiniG41
 
First order logic or Predicate logic.pptx
First order logic or Predicate logic.pptxFirst order logic or Predicate logic.pptx
First order logic or Predicate logic.pptxPriyadharshiniG41
 
Decision Tree Classification Algorithm.pptx
Decision Tree Classification Algorithm.pptxDecision Tree Classification Algorithm.pptx
Decision Tree Classification Algorithm.pptxPriyadharshiniG41
 
Naïve Bayes Classifier Algorithm.pptx
Naïve Bayes Classifier Algorithm.pptxNaïve Bayes Classifier Algorithm.pptx
Naïve Bayes Classifier Algorithm.pptxPriyadharshiniG41
 

More from PriyadharshiniG41 (20)

understanding-cholera-a-comprehensive-analysis.pdf
understanding-cholera-a-comprehensive-analysis.pdfunderstanding-cholera-a-comprehensive-analysis.pdf
understanding-cholera-a-comprehensive-analysis.pdf
 
combatting-malaria-strategies-for-prevention-and-treatment.pdf
combatting-malaria-strategies-for-prevention-and-treatment.pdfcombatting-malaria-strategies-for-prevention-and-treatment.pdf
combatting-malaria-strategies-for-prevention-and-treatment.pdf
 
ant colony optimization working and explanation
ant colony optimization working and explanationant colony optimization working and explanation
ant colony optimization working and explanation
 
RandomForests in artificial intelligence
RandomForests in artificial intelligenceRandomForests in artificial intelligence
RandomForests in artificial intelligence
 
knowledge representation in artificial intelligence
knowledge representation in artificial intelligenceknowledge representation in artificial intelligence
knowledge representation in artificial intelligence
 
information retrieval in artificial intelligence
information retrieval in artificial intelligenceinformation retrieval in artificial intelligence
information retrieval in artificial intelligence
 
Inference rules in artificial intelligence
Inference rules in artificial intelligenceInference rules in artificial intelligence
Inference rules in artificial intelligence
 
Unit I-Final MArketing analytics unit 1 ppt
Unit I-Final MArketing analytics unit 1 pptUnit I-Final MArketing analytics unit 1 ppt
Unit I-Final MArketing analytics unit 1 ppt
 
agent architecture in artificial intelligence.pptx
agent architecture in artificial intelligence.pptxagent architecture in artificial intelligence.pptx
agent architecture in artificial intelligence.pptx
 
trust,bargain,negotiate in artificail intelligence
trust,bargain,negotiate in artificail intelligencetrust,bargain,negotiate in artificail intelligence
trust,bargain,negotiate in artificail intelligence
 
spirometry classification enhanced using ai
spirometry classification enhanced using aispirometry classification enhanced using ai
spirometry classification enhanced using ai
 
Minmax and alpha beta pruning.pptx
Minmax and alpha beta pruning.pptxMinmax and alpha beta pruning.pptx
Minmax and alpha beta pruning.pptx
 
dds.pptx
dds.pptxdds.pptx
dds.pptx
 
actuators.pptx
actuators.pptxactuators.pptx
actuators.pptx
 
First order logic or Predicate logic.pptx
First order logic or Predicate logic.pptxFirst order logic or Predicate logic.pptx
First order logic or Predicate logic.pptx
 
14.08.2020 LKG.pdf
14.08.2020 LKG.pdf14.08.2020 LKG.pdf
14.08.2020 LKG.pdf
 
Decision Tree Classification Algorithm.pptx
Decision Tree Classification Algorithm.pptxDecision Tree Classification Algorithm.pptx
Decision Tree Classification Algorithm.pptx
 
problemsolving with AI.pptx
problemsolving with AI.pptxproblemsolving with AI.pptx
problemsolving with AI.pptx
 
Naïve Bayes Classifier Algorithm.pptx
Naïve Bayes Classifier Algorithm.pptxNaïve Bayes Classifier Algorithm.pptx
Naïve Bayes Classifier Algorithm.pptx
 
problem characterstics.pptx
problem characterstics.pptxproblem characterstics.pptx
problem characterstics.pptx
 

Recently uploaded

Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 

Recently uploaded (20)

Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 

decisiontrees.ppt

  • 1. AN INTRODUCTION TO DECISION TREES Prepared for: CIS595 Knowledge Discovery and Data Mining Professor Vasileios Megalooikonomou Presented by: Thomas Mahoney
  • 2. Learning Systems  Learning systems consider  Solved cases - cases assigned to a class  Information from the solved cases - general decision rules  Rules - implemented in a model  Model - applied to new cases  Different types of models - present their results in various forms  Linear discriminant model - mathematical equation (p = ax1 + bx2 + cx3 + dx4 + ex5).  Presentation comprehensibility
  • 3. Data Classification and Prediction  Data classification  classification  prediction  Methods of classification  decision tree induction  Bayesian classification  backpropagation  association rule mining
  • 4. Data Classification and Prediction  Method creates model from a set of training data  individual data records (samples, objects, tuples)  records can each be described by its attributes  attributes arranged in a set of classes  supervised learning - each record is assigned a class label
  • 5. Data Classification and Prediction  Model form representations  mathematical formulae  classification rules  decision trees  Model utility for data classification  degree of accuracy  predict unknown outcomes for a new (no-test) data set  classification - outcomes always discrete or nominal values  regression may contain continuous or ordered values
  • 6. Description of Decision Rules or Trees  Intuitive appeal for users  Presentation Forms  “if, then” statements (decision rules)  graphically - decision trees
  • 7. What They Look Like  Works like a flow chart  Looks like an upside down tree  Nodes  appear as rectangles or circles  represent test or decision  Lines or branches - represent outcome of a test  Circles - terminal (leaf) nodes  Top or starting node- root node  Internal nodes - rectangles
  • 8.
  • 9. An Example  Bank - loan application  Classify application  approved class  denied class  Criteria - Target Class approved if 3 binary attributes have certain value:  (a) borrower has good credit history (credit rating in excess of some threshold)  (b) loan amount less than some percentage of collateral value (e.g., 80% home value)  (c) borrower has income to make payments on loan  Possible scenarios = 32 = 8  If the parameters for splitting the nodes can be adjusted, the number of scenarios grows exponentially.
  • 10. How They Work  Decision rules - partition sample of data  Terminal node (leaf) indicates the class assignment  Tree partitions samples into mutually exclusive groups  One group for each terminal node  All paths  start at the root node  end at a leaf  Each path represents a decision rule  joining (AND) of all the tests along that path  separate paths that result in the same class are disjunctions (ORs)  All paths - mutually exclusive  for any one case - only one path will be followed  false decisions on the left branch  true decisions on the right branch
  • 11. Disjunctive Normal Form  Non-terminal node - model identifies an attribute to be tested  test splits attribute into mutually exclusive disjoint sets  splitting continues until a node - one class (terminal node or leaf)  Structure - disjunctive normal form  limits form of a rule to conjunctions (adding) of terms  allows disjunction (or-ing) over a set of rules
  • 12. Geometry  Disjunctive normal form  Fits shapes of decision boundaries between classes  Classes formed by lines parallel to axes  Result - rectangular shaped class regions
  • 13. Binary Trees  Characteristics  two branches leave each non-terminal node  those two branches cover outcomes of the test  exactly one branch enters each non- root node  there are n terminal nodes  there are n-1 non-terminal nodes
  • 14. Nonbinary Trees  Characteristics  two or more branches leave each non- terminal node  those branches cover outcomes of the test  exactly one branch enters each non- root node  there are n terminal nodes  there are n-1 non-terminal nodes
  • 15. Goal  Dual goal - Develop tree that  is small  classifies and predicts class with accuracy  Small size  a smaller tree more easily understood  smaller tree less susceptible to overfitting  large tree less information regarding classifying and predicting cases
  • 16. Rule Induction  Process of building the decision tree or ascertaining the decision rules  tree induction  rule induction  induction  Decision tree algorithms  induce decision trees recursively  from the root (top) down - greedy approach  established basic algorithms include ID3 and C4.5
  • 17. Discrete vs. Continuous Attributes  Continuous variables attributes - problems for decision trees  increase computational complexity of the task  promote prediction inaccuracy  lead to overfitting of data  Convert continuous variables into discrete intervals  “greater than or equal to” and “less than”  optimal solution for conversion  difficult to determine discrete intervals ideal • size • number
  • 18. Making the Split  Models induce a tree by recursively selecting and subdividing attributes  random selection - noisy variables  inefficient production of inaccurate trees  Efficient models  examine each variable  determine which will improve accuracy of entire tree  problem - this approach decides best split without considering subsequent splits
  • 19. Evaluating the Splits Measures of impurity or its inverse, goodness reduce impurity or degree of randomness at each node popular measures include: Entropy Function - pj log pj j Gini Index 1 -  p2 j j Twoing Rule k (TL /n) * (TR /n) * ( Li TL - Ri/ TR)2
  • 20. Evaluating the Splits Max Minority Sum of Variances
  • 21. Overfitting  Error rate in predicting the correct class for new cases  overfitting of test data  very low apparent error rate  high actual error rate
  • 22. Optimal Size  Certain minimal size smaller tree  higher apparent error rate  lower actual error rate  Goal  identify threshold  minimize actual error rate  achieve greatest predictive accuracy
  • 23. Ending Tree Growth  Grow the tree until  additional splitting produces no significant information gain  statistical test - a chi-squared test  problem - trees that are too small  only compares one split with the next descending split
  • 24. Pruning  Grow large tree  reduce its size by eliminating or pruning weak branches step by step  continue until minimum true error rate  Pruning Methods  reduced-error pruning  divides samples into test set and training set  training set is used to produce the fully expanded tree  tree is then tested using the test set  weak branches are pruned  stop when no more improvement
  • 25. Pruning  Resampling  5 - fold cross-validation  80% cases used for training; remainder for testing  Weakest-link or cost-complexity pruning  trim weakest link ( produces the smallest increase in the apparent error rate)  method can be combined with resampling
  • 26. Variations and Enhancements to Basic Decision Trees  Multivariate or Oblique Trees  CART-LC - CART with Linear Combinations  LMDT - Linear Machine Decision Trees  SADT - Simulated Annealing of Decision Trees  OC1 - Oblique Classifier 1
  • 27. Evaluating Decision Trees  Method’s Appropriateness  Data set or type  Criteria  accuracy - predict class label for new data  scalability • performs model generation and prediction functions • large data sets • satisfactory speed  robustness • perform well despite noisy or missing data  intuitive appeal • results easily understood • promotes decision making
  • 28. Decision Tree Limitations  No backtracking  local optimal solution not global optimal solution  lookahead features may give us better trees  Rectangular-shaped geometric regions  in two-dimensional space • regions bounded by lines parallel to the x- and y- axes  some linear relationships not parallel to the axes
  • 29. Conclusions  Utility  analyze classified data  produce  accurate and easily understood classification rules  with good predictive value Improvements  Limitations being addressed  multivariate discrimination - oblique trees  data mining techniques
  • 30. Bibliography  A System for Induction of Oblique Decision Trees, Sreerama K. Murthy, Simon Kasif, Steven Salzberg, Journal of Artificial Intelligence Research 2 (1994) 1-32.  Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey, Sreerama K. Murthy, Data Mining and Knowledge Discovery, 2. 345-389 (1998) Kluwer Academic Publishers.  Classification and Regression Trees, Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone, 1984, Wadsworth Int. Group.  Computer Systems That Learn, Sholom M. Weiss and Casimer A. Kulikowski, 1991, Morgan Kaufman.  Data Mining, Concepts and Techniques, Jiawei Han and Micheline Kamber, 2001, Morgan Kaufman.  Introduction to Mathematical Techniques in Pattern Recognition, Harry C. Andrews, 1972, Wiley-Interscience.  Machine Learning, Tom M. Mitchell, 1997, McGraw-Hill.