Decision Tree of CART
Dinda, Isaac
Decision Tree (1/2)
• Training set: feature vectors x_i with labels y_i (+1: Yes, -1: No)
• Learned decision tree
[Note]
Only one feature is tested at each node.
Decision Tree (2/2)
• Example:
Outlook  Temperature  Humidity  Windy  Play
Rainy    Hot          High      True   ?
(A test instance; answer: No)
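The learned tree itself is not recoverable from the slide, so the sketch below assumes it is the classic play-tennis tree (Outlook at the root, Humidity under Sunny, Windy under Rainy). That assumption agrees with the answer "No" for the test instance above. `Node`, `PLAY_TENNIS_TREE` and `predict` are hypothetical names; each internal node tests exactly one feature, as the note on the previous slide says.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Node:
    """Internal node: tests exactly one feature and branches on its value."""
    feature: str
    children: Dict[str, "Tree"] = field(default_factory=dict)


Tree = Union[Node, str]  # a leaf is simply the predicted label

# Hypothetical learned tree: assumed to be the classic play-tennis tree.
PLAY_TENNIS_TREE: Tree = Node("Outlook", {
    "Sunny": Node("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rainy": Node("Windy", {"True": "No", "False": "Yes"}),
})


def predict(tree: Tree, instance: Dict[str, str]) -> str:
    """Follow one branch per node until a leaf (a label string) is reached."""
    while isinstance(tree, Node):
        tree = tree.children[instance[tree.feature]]
    return tree


test_instance = {"Outlook": "Rainy", "Temperature": "Hot", "Humidity": "High", "Windy": "True"}
print(predict(PLAY_TENNIS_TREE, test_instance))  # No
```

Under this assumed tree, Temperature is never tested, so it plays no part in the prediction.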
Why use decision trees?
• Easy to interpret
• For example: if the outlook is sunny and the humidity is high, then we do not play tennis.
• Performs feature selection
• The top few nodes on which the tree is split are essentially the most important variables in the dataset, so feature selection is done automatically.
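Why not use decision trees?
• Decision trees do not work well when the true class boundary is smooth: each node tests a single feature, so the tree can only approximate a smooth boundary with axis-parallel steps.
[Figure: smooth boundaries vs. D.T. boundaries]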
Why not use decision trees?
• Poor resolution on data with complex relationships among the variables
How to generate a classification tree?
CART (Classification and Regression Trees)
Given a dataset S that contains n classes, Gini(S) is defined as

$$\mathrm{Gini}(S) = 1 - \sum_{j=1}^{n} p_j^2$$

where p_j is the probability that a sample in S belongs to class j.

Learning method (algorithm): CART
Author: Breiman (1984)
Data type: Discrete and continuous
Splitting rule: Gini index
Pruning rule: Entire error rate
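The definition above can be sketched directly in code, estimating each p_j as a class count divided by |S|. The function name `gini` is a hypothetical helper; `Fraction` keeps the results exact so they can be compared with the hand calculations on the following slides.

```python
from fractions import Fraction
from typing import Sequence


def gini(class_counts: Sequence[int]) -> Fraction:
    """Gini(S) = 1 - sum_j p_j^2, where p_j = (count of class j) / |S|."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("Gini index is undefined for an empty set")
    return 1 - sum(Fraction(c, total) ** 2 for c in class_counts)


print(gini([3, 0]))  # 0    (pure node)
print(gini([1, 1]))  # 1/2  (evenly mixed, the maximum for two classes)
print(gini([2, 1]))  # 4/9
```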
Example
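We have two features, Outlook and Temp., and we want to predict whether we will play tennis.

• Split on Outlook (Sunny | Rain), the best threshold:

$$\mathrm{Gini}_{\text{Outlook}}(S) = \frac{3}{4}\left(1 - \left(\frac{3}{3}\right)^2 - \left(\frac{0}{3}\right)^2\right) + \frac{1}{4}\left(1 - \left(\frac{1}{1}\right)^2 - \left(\frac{0}{1}\right)^2\right) = 0$$

[Figure: a threshold between Sunny and Rain separates Play from Not play]

• Split on Temp. (Hot | Cold), the worst threshold:

$$\mathrm{Gini}_{\text{Temp.}}(S) = \frac{2}{4}\left(1 - \left(\frac{1}{2}\right)^2 - \left(\frac{1}{2}\right)^2\right) + \frac{2}{4}\left(1 - \left(\frac{1}{2}\right)^2 - \left(\frac{1}{2}\right)^2\right) = \frac{1}{2}$$

[Figure: a threshold between Hot and Cold leaves Play and Not play mixed on both sides]

In each formula, the term in parentheses (blue on the slide) measures how impure each branch still is after the cut, and the fraction in front of it (orange on the slide) is that branch's weight, i.e. the share of samples that fall into it:

$$\mathrm{Gini}_A(S) = \sum_{i} \frac{|S_i|}{|S|}\,\mathrm{Gini}(S_i)$$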
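The weighted sum above can be sketched as follows; `gini_split` is a hypothetical helper name. Each branch is given as a list of class counts, here taken from the two formulas above (the order of classes inside a branch does not change the result).

```python
from fractions import Fraction
from typing import Sequence


def gini(class_counts: Sequence[int]) -> Fraction:
    total = sum(class_counts)
    return 1 - sum(Fraction(c, total) ** 2 for c in class_counts)


def gini_split(branches: Sequence[Sequence[int]]) -> Fraction:
    """Weighted Gini of a split: sum over branches of (|S_i| / |S|) * Gini(S_i)."""
    n = sum(sum(b) for b in branches)
    return sum(Fraction(sum(b), n) * gini(b) for b in branches)


outlook = [[3, 0], [0, 1]]  # Sunny | Rain, counts as [play, not play]
temp = [[1, 1], [1, 1]]     # Hot | Cold
print(gini_split(outlook))  # 0
print(gini_split(temp))     # 1/2
```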
Example
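DAY  Outlook  Temp.  Play tennis
D1   Sunny    Hot    NO
D2   Sunny    Hot    YES
D3   Sunny    Mild   NO
D4   Sunny    Cold   YES
D5   Rain     Cold   YES

Play tennis by Outlook:
Play  Sunny  Rain
NO    2      0
YES   2      1

Play tennis by Temp.:
Play  Hot  Mild  Cold
NO    1    1     0
YES   1    0     2

Using the two features Outlook and Temp., we build a decision tree with the CART method to predict whether we will play tennis.

[Figure: bar charts of NO/YES counts for the splits Outlook (Sunny | Rain), Temp. (Hot, Mild | Cold), and Temp. (Hot | Mild, Cold)]

Outlook (Sunny | Rain): Sunny has NO 2, YES 2; Rain has NO 0, YES 1.

$$\mathrm{Gini}(S_{\text{Sunny}}) = 1 - \left(\frac{2}{4}\right)^2 - \left(\frac{2}{4}\right)^2 = \frac{1}{2}$$
$$\mathrm{Gini}(S_{\text{Rain}}) = 1 - \left(\frac{1}{1}\right)^2 - \left(\frac{0}{1}\right)^2 = 0$$
$$\mathrm{Gini}_{\text{Outlook}}(S) = \frac{4}{5} \cdot \frac{1}{2} + \frac{1}{5} \cdot 0 = \frac{2}{5}$$

Temp. (Hot, Mild | Cold): Hot, Mild has NO 2, YES 1; Cold has NO 0, YES 2.

$$\mathrm{Gini}(S_{\text{Temp} \in \{\text{Hot},\text{Mild}\}}) = 1 - \left(\frac{2}{3}\right)^2 - \left(\frac{1}{3}\right)^2 = \frac{4}{9}$$
$$\mathrm{Gini}(S_{\text{Temp} \in \{\text{Cold}\}}) = 1 - \left(\frac{2}{2}\right)^2 - \left(\frac{0}{2}\right)^2 = 0$$
$$\mathrm{Gini}_{\text{Temp.}}(S) = \frac{3}{5} \cdot \frac{4}{9} + \frac{2}{5} \cdot 0 = \frac{4}{15}$$

Temp. (Hot | Mild, Cold): Hot has NO 1, YES 1; Mild, Cold has NO 1, YES 2.

$$\mathrm{Gini}(S_{\text{Temp} \in \{\text{Hot}\}}) = 1 - \left(\frac{1}{2}\right)^2 - \left(\frac{1}{2}\right)^2 = \frac{1}{2}$$
$$\mathrm{Gini}(S_{\text{Temp} \in \{\text{Mild},\text{Cold}\}}) = 1 - \left(\frac{1}{3}\right)^2 - \left(\frac{2}{3}\right)^2 = \frac{4}{9}$$
$$\mathrm{Gini}_{\text{Temp.}}(S) = \frac{2}{5} \cdot \frac{1}{2} + \frac{3}{5} \cdot \frac{4}{9} = \frac{7}{15}$$

Example
• $\mathrm{Gini}_{\text{Outlook}}(S) = \frac{2}{5} > \mathrm{Gini}_{\text{Temp.}}(S) = \frac{4}{15}$ (Temp. split Hot, Mild | Cold)
We choose the split with the smaller Gini index as the root of the decision tree.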
Temp.
  [Cold] → Yes
  [Mild, Hot] → Outlook
    [Rain] → Yes
    [Sunny] → No
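The root selection can be sketched end to end, computing everything directly from the five rows D1 to D5. Following CART's binary splits, the sketch enumerates every way to divide a feature's values into two non-empty groups, so it also scores the Temp. split {Cold, Hot} | {Mild}, which the slides do not list. `DATA`, `binary_partitions` and `split_gini` are hypothetical names.

```python
from fractions import Fraction
from itertools import combinations
from typing import FrozenSet, Iterator, Sequence, Tuple

# The five training days D1-D5 from the table above.
DATA = [
    {"Outlook": "Sunny", "Temp.": "Hot", "Play": "NO"},    # D1
    {"Outlook": "Sunny", "Temp.": "Hot", "Play": "YES"},   # D2
    {"Outlook": "Sunny", "Temp.": "Mild", "Play": "NO"},   # D3
    {"Outlook": "Sunny", "Temp.": "Cold", "Play": "YES"},  # D4
    {"Outlook": "Rain", "Temp.": "Cold", "Play": "YES"},   # D5
]


def gini(labels: Sequence[str]) -> Fraction:
    """Gini(S) = 1 - sum_j p_j^2 over the classes present in labels."""
    n = len(labels)
    return 1 - sum(Fraction(labels.count(c), n) ** 2 for c in set(labels))


def binary_partitions(values: Sequence[str]) -> Iterator[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Yield every split of the distinct values into two non-empty groups, once each."""
    vals = sorted(set(values))
    first, rest = vals[0], vals[1:]
    # Keeping `first` on the left avoids yielding each partition twice (mirrored).
    for k in range(len(rest)):  # k < len(rest) keeps the right group non-empty
        for combo in combinations(rest, k):
            left = frozenset((first,) + combo)
            yield left, frozenset(vals) - left


def split_gini(rows, feature: str, left: FrozenSet[str]) -> Fraction:
    """Weighted Gini of the split: sum over both sides of (|S_i| / |S|) * Gini(S_i)."""
    n = len(rows)
    total = Fraction(0)
    for goes_left in (True, False):
        labels = [r["Play"] for r in rows if (r[feature] in left) == goes_left]
        total += Fraction(len(labels), n) * gini(labels)
    return total


candidates = []
for feature in ("Outlook", "Temp."):
    for left, right in binary_partitions([r[feature] for r in DATA]):
        candidates.append((split_gini(DATA, feature, left), feature, sorted(left), sorted(right)))

ranked = sorted(candidates)  # lowest weighted Gini first
for g, feature, left, right in ranked:
    print(f"{feature}: {left} | {right} -> Gini = {g}")
```

The lowest value, 4/15, belongs to the Temp. split {Cold} | {Hot, Mild}, which agrees with the root chosen above.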
Example
Riding mower classification
Obs #  Income  Lot size  Class
1      Middle  Middle    Owners
2      High    Middle    Owners
3      High    Big       Owners
4      High    Big       Owners
5      Middle  Big       Owners
6      Middle  Middle    Non-owners
7      Low     Big       Non-owners
8      Middle  Middle    Non-owners
9      Low     Big       Non-owners
10     High    Middle    Non-owners
A riding-mower manufacturer would like to find a way of classifying families in a city into those that
are likely to purchase a riding mower and those who are not likely to buy one. A pilot random sample
of 5 owners and 5 non-owners in the city is undertaken.
Example
[Figure: scatter plot of Lot size against Income, marking owners and non-owners]
How to split?
• Split criterion: a goodness function
• Used to select the attribute to split on at each tree node during the tree-generation phase
• Goodness function in CART: the Gini index
Example: Gini(Lot size)
Candidate split points: 14, 14.4, 14.8, 15.4, 16, 16.2, 16.4, 16.6, 16.8, 17, 17.2, 17.4, 17.6, 18, 18.4, 18.6, 18.8, 19, 19.2
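For a continuous attribute, each candidate split point v defines D1 = (x ≤ v) and D2 = (x > v), and the point with the lowest weighted Gini is kept. The sketch below uses observed values as thresholds, like the list above appears to; some implementations use midpoints between consecutive sorted values instead. `best_threshold` is a hypothetical name, and the six lot sizes are made-up data, not the slide's 24 observations.

```python
from fractions import Fraction
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> Fraction:
    n = len(labels)
    return 1 - sum(Fraction(labels.count(c), n) ** 2 for c in set(labels))


def best_threshold(
    values: Sequence[float], labels: Sequence[str]
) -> Tuple[Optional[float], Optional[Fraction]]:
    """Return (v, weighted Gini) for the split x <= v vs. x > v with the lowest weighted Gini."""
    n = len(values)
    best_v: Optional[float] = None
    best_g: Optional[Fraction] = None
    # Every distinct value except the largest is a candidate split point
    # (splitting at the largest value would leave D2 empty).
    for v in sorted(set(values))[:-1]:
        d1 = [y for x, y in zip(values, labels) if x <= v]
        d2 = [y for x, y in zip(values, labels) if x > v]
        g = Fraction(len(d1), n) * gini(d1) + Fraction(len(d2), n) * gini(d2)
        if best_g is None or g < best_g:
            best_v, best_g = v, g
    return best_v, best_g


# Hypothetical lot sizes and classes, not the slide's data.
lot_size = [14.0, 15.4, 16.0, 17.2, 18.4, 19.2]
klass = ["Non-owner", "Non-owner", "Non-owner", "Owner", "Owner", "Owner"]
print(best_threshold(lot_size, klass))  # (16.0, Fraction(0, 1))
```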
Example
[Figure: the Lot size vs. Income scatter plot of owners and non-owners, with candidate split lines at Lot size = 17 and Lot size = 19]
Example: Gini(Lot size)
Split-point = 17
• D1 = Lot size ≤ 17, D2 = Lot size > 17

$$\mathrm{Gini}_{\text{Lot size}}(D) = \frac{5}{24}\,\mathrm{Gini}(D_1) + \frac{19}{24}\,\mathrm{Gini}(D_2) = \frac{5}{24}\left(1 - \left(\frac{1}{5}\right)^2 - \left(\frac{4}{5}\right)^2\right) + \frac{19}{24}\left(1 - \left(\frac{11}{19}\right)^2 - \left(\frac{8}{19}\right)^2\right) \approx 0.067 + 0.386 \approx 0.453$$

Split-point = 19
• D1 = Lot size ≤ 19, D2 = Lot size > 19

$$\mathrm{Gini}_{\text{Lot size}}(D) = \frac{12}{24}\,\mathrm{Gini}(D_1) + \frac{12}{24}\,\mathrm{Gini}(D_2) = \frac{12}{24}\left(1 - \left(\frac{9}{12}\right)^2 - \left(\frac{3}{12}\right)^2\right) + \frac{12}{24}\left(1 - \left(\frac{3}{12}\right)^2 - \left(\frac{9}{12}\right)^2\right) = 0.1875 + 0.1875 = 0.375$$

Distance is narrow (split-point = 17); distance is wide (split-point = 19). Since 0.375 < 0.453, the split at 19 produces purer subsets and is the better of the two.
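The two weighted Gini values above can be checked exactly from the class counts in D1 and D2 as they appear in the formulas; `weighted_gini` is a hypothetical helper name.

```python
from fractions import Fraction


def gini(counts):
    n = sum(counts)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts)


def weighted_gini(d1, d2):
    """(|D1|/|D|) * Gini(D1) + (|D2|/|D|) * Gini(D2) for a binary split."""
    n = sum(d1) + sum(d2)
    return Fraction(sum(d1), n) * gini(d1) + Fraction(sum(d2), n) * gini(d2)


# Class counts in each subset, taken from the two formulas above.
split_17 = weighted_gini([1, 4], [11, 8])
split_19 = weighted_gini([9, 3], [3, 9])
print(f"split at 17: {split_17} ≈ {float(split_17):.3f}")  # split at 17: 43/95 ≈ 0.453
print(f"split at 19: {split_19} ≈ {float(split_19):.3f}")  # split at 19: 3/8 ≈ 0.375
```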
Why use impurity instead of error as the goodness function?
• The main objective of a decision tree is to find pure nodes, each containing only one class.
Error rate: 25%
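A small example shows why impurity is preferred. The counts below are hypothetical, not from the slides: a parent node with 400 samples of each class is split two ways. Both splits have the same 25% error rate, but the Gini index favors split B because it creates a pure node. `error_rate` and `weighted` are hypothetical helper names.

```python
from fractions import Fraction


def gini(counts):
    n = sum(counts)
    return 1 - sum(Fraction(c, n) ** 2 for c in counts)


def error_rate(counts):
    # Misclassification rate if the node predicts its majority class.
    return 1 - Fraction(max(counts), sum(counts))


def weighted(measure, branches):
    """Sum over branches of (|S_i| / |S|) * measure(S_i)."""
    n = sum(sum(b) for b in branches)
    return sum(Fraction(sum(b), n) * measure(b) for b in branches)


split_a = [[300, 100], [100, 300]]  # two impure children
split_b = [[200, 400], [200, 0]]    # the second child is pure
for name, split in (("A", split_a), ("B", split_b)):
    print(name, weighted(error_rate, split), weighted(gini, split))
# A 1/4 3/8
# B 1/4 1/3
```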
Example: Decision Tree
[Figure: decision tree with a root split on Lot size at 19 (Lot size ≤ 19 vs. Lot size > 19) and leaves labeled Owner and Non-owner]
