SlideShare a Scribd company logo
1 of 8
Prof. Neeraj Bhargava
Vishal Dutt
Department of Computer Science, School of
Engineering & System Sciences
MDS University, Ajmer
What is CART?
 Classification And Regression Trees
 Developed by Breiman, Friedman, Olshen, Stone in early 80’s.
 Introduced tree-based modeling into the statistical mainstream
 Rigorous approach involving cross-validation to select the optimal
tree
 One of many tree-based modeling techniques.
 CART -- the classic
 CHAID
 C5.0
 Software package variants (SAS, S-Plus, R…)
 Note: the “rpart” package in “R” is freely available
The Key Idea
Recursive Partitioning
 Take all of your data.
 Consider all possible values of all variables.
 Select the variable/value (X=t1) that produces the greatest
“separation” in the target.
 (X=t1) is called a “split”.
 If X< t1 then send the data to the “left”; otherwise, send data
point to the “right”.
 Now repeat same process on these two “nodes”
 You get a “tree”
 Note: CART only uses binary splits.
Example
Let’s Get Rolling Suppose you have 3 variables:
# vehicles: {1,2,3…10+}
Age category: {1,2,3…6}
Liability-only: {0,1}
 At each iteration, CART tests all 15 splits.
(#veh<2), (#veh<3),…, (#veh<10)
(age<2),…, (age<6)
(lia<1)
Select split resulting in greatest increase in purity.
 Perfect purity: each split has either all claims or all no-claims.
 Perfect impurity: each split has same proportion of claims as
overall population.
Classification Tree Example:
predict likelihood of a claim
 Commercial Auto Dataset
 57,000 policies
 34% claim frequency
 Classification Tree using Gini
splitting rule
 First split:
 Policies with ≥5 vehicles
have 58% claim frequency
 Else 20%
 Big increase in purity
NUM_VEH <= 4.500
Terminal
Node 1
Class Cases %
0 29083 80.0
1 7276 20.0
N = 36359
NUM_VEH > 4.500
Terminal
Node 2
Class Cases %
0 8808 42.3
1 12036 57.7
N = 20844
Node 1
NUM_VEH
Class Cases %
0 37891 66.2
1 19312 33.8
N = 57203
Growing the Tree
FREQ1_F_RPT <= 0.500
Terminal
Node 1
Class = 0
Class Cases %
0 18984 78.7
1 5138 21.3
N = 24122
FREQ1_F_RPT > 0.500
Terminal
Node 2
Class = 1
Class Cases %
0 2508 57.4
1 1859 42.6
N = 4367
LIAB_ONLY <= 0.500
Node 3
FREQ1_F_RPT
N = 28489
LIAB_ONLY > 0.500
Terminal
Node 3
Class = 0
Class Cases %
0 7591 96.5
1 279 3.5
N = 7870
NUM_VEH <= 4.500
Node 2
LIAB_ONLY
N = 36359
AVGAGE_CAT <= 8.500
Terminal
Node 4
Class = 1
Class Cases %
0 4327 48.1
1 4671 51.9
N = 8998
AVGAGE_CAT > 8.500
Terminal
Node 5
Class = 0
Class Cases %
0 2072 76.5
1 637 23.5
N = 2709
NUM_VEH <= 10.500
Node 5
AVGAGE_CAT
N = 11707
NUM_VEH > 10.500
Terminal
Node 6
Class = 1
Class Cases %
0 2409 26.4
1 6728 73.6
N = 9137
NUM_VEH > 4.500
Node 4
NUM_VEH
N = 20844
Node 1
NUM_VEH
N = 57203
Observations (Shaking the Tree)
 First split (# vehicles) is rather
obvious
 More exposure  more claims
 But it confirms that CART is doing
something reasonable.
 Also: the choice of splitting
value 5 (not 4 or 6) is non-
obvious.
 This suggests a way of
optimally “binning” continuous
variables into a small number
of groups
NUM_VEH <= 4.500
Terminal
Node 1
Class Cases %
0 29083 80.0
1 7276 20.0
N = 36359
NUM_VEH > 4.500
Terminal
Node 2
Class Cases %
0 8808 42.3
1 12036 57.7
N = 20844
Node 1
NUM_VEH
Class Cases %
0 37891 66.2
1 19312 33.8
N = 57203

More Related Content

What's hot

Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Kazi Toufiq Wadud
 
Density based Clustering Algorithms(DB SCAN, Mean shift )
Density based Clustering Algorithms(DB SCAN, Mean shift )Density based Clustering Algorithms(DB SCAN, Mean shift )
Density based Clustering Algorithms(DB SCAN, Mean shift )Utkarsh Sharma
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector MachineDerek Kane
 
★Mean shift a_robust_approach_to_feature_space_analysis
★Mean shift a_robust_approach_to_feature_space_analysis★Mean shift a_robust_approach_to_feature_space_analysis
★Mean shift a_robust_approach_to_feature_space_analysisirisshicat
 
WEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And AttributesWEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And AttributesDataminingTools Inc
 
Principal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT SlidesPrincipal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT SlidesAbhishekKumar4995
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysishktripathy
 
Handling noisy data
Handling noisy dataHandling noisy data
Handling noisy dataVivek Gandhi
 
Event classification & prediction using support vector machine
Event classification & prediction using support vector machineEvent classification & prediction using support vector machine
Event classification & prediction using support vector machineRuta Kambli
 
Machine learning session 10
Machine learning session 10Machine learning session 10
Machine learning session 10NirsandhG
 
ML basic &amp; clustering
ML basic &amp; clusteringML basic &amp; clustering
ML basic &amp; clusteringmonalisa Das
 
K means clustering
K means clusteringK means clustering
K means clusteringThomas K T
 
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...Edureka!
 

What's hot (20)

Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?
 
16 Simple CART
16 Simple CART16 Simple CART
16 Simple CART
 
Density based Clustering Algorithms(DB SCAN, Mean shift )
Density based Clustering Algorithms(DB SCAN, Mean shift )Density based Clustering Algorithms(DB SCAN, Mean shift )
Density based Clustering Algorithms(DB SCAN, Mean shift )
 
Data Science - Part IX - Support Vector Machine
Data Science - Part IX -  Support Vector MachineData Science - Part IX -  Support Vector Machine
Data Science - Part IX - Support Vector Machine
 
★Mean shift a_robust_approach_to_feature_space_analysis
★Mean shift a_robust_approach_to_feature_space_analysis★Mean shift a_robust_approach_to_feature_space_analysis
★Mean shift a_robust_approach_to_feature_space_analysis
 
WEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And AttributesWEKA: Data Mining Input Concepts Instances And Attributes
WEKA: Data Mining Input Concepts Instances And Attributes
 
Principal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT SlidesPrincipal Component Analysis (PCA) and LDA PPT Slides
Principal Component Analysis (PCA) and LDA PPT Slides
 
Lect5 principal component analysis
Lect5 principal component analysisLect5 principal component analysis
Lect5 principal component analysis
 
K means clustring @jax
K means clustring @jaxK means clustring @jax
K means clustring @jax
 
Data miningpresentation
Data miningpresentationData miningpresentation
Data miningpresentation
 
Clustering
ClusteringClustering
Clustering
 
Pca ppt
Pca pptPca ppt
Pca ppt
 
Handling noisy data
Handling noisy dataHandling noisy data
Handling noisy data
 
Event classification & prediction using support vector machine
Event classification & prediction using support vector machineEvent classification & prediction using support vector machine
Event classification & prediction using support vector machine
 
Machine learning session 10
Machine learning session 10Machine learning session 10
Machine learning session 10
 
ML basic &amp; clustering
ML basic &amp; clusteringML basic &amp; clustering
ML basic &amp; clustering
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Clustering part 1
Clustering part 1Clustering part 1
Clustering part 1
 
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
K Means Clustering Algorithm | K Means Example in Python | Machine Learning A...
 
Birch
BirchBirch
Birch
 

Similar to 14 Simple CART

Getting started with chemometric classification
Getting started with chemometric classificationGetting started with chemometric classification
Getting started with chemometric classificationAlex Henderson
 
Variable Selection Methods
Variable Selection MethodsVariable Selection Methods
Variable Selection Methodsjoycemi_la
 
Variable Selection Methods
Variable Selection MethodsVariable Selection Methods
Variable Selection Methodsjoycemi_la
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsPeter Solymos
 
Improve Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForestsImprove Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForestsSalford Systems
 
Bigger Data v Better Math
Bigger Data v Better MathBigger Data v Better Math
Bigger Data v Better MathBrent Schneeman
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelJenny Liu
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Mostafa G. M. Mostafa
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowOswald Campesato
 
Deep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlowDeep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlowOswald Campesato
 
Regression Analysis and model comparison on the Boston Housing Data
Regression Analysis and model comparison on the Boston Housing DataRegression Analysis and model comparison on the Boston Housing Data
Regression Analysis and model comparison on the Boston Housing DataShivaram Prakash
 
Mini project boston housing dataset v1
Mini project   boston housing dataset v1Mini project   boston housing dataset v1
Mini project boston housing dataset v1Wyendrila Roy
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression TreesHemant Chetwani
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesUni Azza Aunillah
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesUni Azza Aunillah
 

Similar to 14 Simple CART (20)

15 Simple CART
15 Simple CART15 Simple CART
15 Simple CART
 
Getting started with chemometric classification
Getting started with chemometric classificationGetting started with chemometric classification
Getting started with chemometric classification
 
Variable Selection Methods
Variable Selection MethodsVariable Selection Methods
Variable Selection Methods
 
Variable Selection Methods
Variable Selection MethodsVariable Selection Methods
Variable Selection Methods
 
Complex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutionsComplex models in ecology: challenges and solutions
Complex models in ecology: challenges and solutions
 
Improve Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForestsImprove Your Regression with CART and RandomForests
Improve Your Regression with CART and RandomForests
 
Bigger Data v Better Math
Bigger Data v Better MathBigger Data v Better Math
Bigger Data v Better Math
 
A Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in ParallelA Tale of Data Pattern Discovery in Parallel
A Tale of Data Pattern Discovery in Parallel
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)
 
Deep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlowDeep Learning, Keras, and TensorFlow
Deep Learning, Keras, and TensorFlow
 
Deep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlowDeep Learning: R with Keras and TensorFlow
Deep Learning: R with Keras and TensorFlow
 
Regression Analysis and model comparison on the Boston Housing Data
Regression Analysis and model comparison on the Boston Housing DataRegression Analysis and model comparison on the Boston Housing Data
Regression Analysis and model comparison on the Boston Housing Data
 
K fold validation
K fold validationK fold validation
K fold validation
 
6조
6조6조
6조
 
Advanced cart 2007
Advanced cart 2007Advanced cart 2007
Advanced cart 2007
 
Mini project boston housing dataset v1
Mini project   boston housing dataset v1Mini project   boston housing dataset v1
Mini project boston housing dataset v1
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
Mod mean quartile
Mod mean quartileMod mean quartile
Mod mean quartile
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tables
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tables
 

More from Vishal Dutt

Grid computing components
Grid computing componentsGrid computing components
Grid computing componentsVishal Dutt
 
Python files / directories part16
Python files / directories  part16Python files / directories  part16
Python files / directories part16Vishal Dutt
 
Python Classes and Objects part14
Python Classes and Objects  part14Python Classes and Objects  part14
Python Classes and Objects part14Vishal Dutt
 
Python Classes and Objects part13
Python Classes and Objects  part13Python Classes and Objects  part13
Python Classes and Objects part13Vishal Dutt
 
Python files / directories part15
Python files / directories  part15Python files / directories  part15
Python files / directories part15Vishal Dutt
 
Python functions part12
Python functions  part12Python functions  part12
Python functions part12Vishal Dutt
 
Python functions part11
Python functions  part11Python functions  part11
Python functions part11Vishal Dutt
 
Python functions part10
Python functions  part10Python functions  part10
Python functions part10Vishal Dutt
 
Python decision making_loops_control statements part9
Python decision making_loops_control statements part9Python decision making_loops_control statements part9
Python decision making_loops_control statements part9Vishal Dutt
 
Python decision making_loops_control statements part8
Python decision making_loops_control statements part8Python decision making_loops_control statements part8
Python decision making_loops_control statements part8Vishal Dutt
 
Python decision making_loops part7
Python decision making_loops part7Python decision making_loops part7
Python decision making_loops part7Vishal Dutt
 
Python decision making_loops part6
Python decision making_loops part6Python decision making_loops part6
Python decision making_loops part6Vishal Dutt
 
Python decision making part5
Python decision making part5Python decision making part5
Python decision making part5Vishal Dutt
 
Python decision making part4
Python decision making part4Python decision making part4
Python decision making part4Vishal Dutt
 
Python operators part3
Python operators part3Python operators part3
Python operators part3Vishal Dutt
 

More from Vishal Dutt (20)

Grid computing components
Grid computing componentsGrid computing components
Grid computing components
 
Python files / directories part16
Python files / directories  part16Python files / directories  part16
Python files / directories part16
 
Python Classes and Objects part14
Python Classes and Objects  part14Python Classes and Objects  part14
Python Classes and Objects part14
 
Python Classes and Objects part13
Python Classes and Objects  part13Python Classes and Objects  part13
Python Classes and Objects part13
 
Python files / directories part15
Python files / directories  part15Python files / directories  part15
Python files / directories part15
 
Python functions part12
Python functions  part12Python functions  part12
Python functions part12
 
Python functions part11
Python functions  part11Python functions  part11
Python functions part11
 
Python functions part10
Python functions  part10Python functions  part10
Python functions part10
 
List view5
List view5List view5
List view5
 
Python decision making_loops_control statements part9
Python decision making_loops_control statements part9Python decision making_loops_control statements part9
Python decision making_loops_control statements part9
 
List view4
List view4List view4
List view4
 
List view3
List view3List view3
List view3
 
Python decision making_loops_control statements part8
Python decision making_loops_control statements part8Python decision making_loops_control statements part8
Python decision making_loops_control statements part8
 
Python decision making_loops part7
Python decision making_loops part7Python decision making_loops part7
Python decision making_loops part7
 
Python decision making_loops part6
Python decision making_loops part6Python decision making_loops part6
Python decision making_loops part6
 
List view2
List view2List view2
List view2
 
List view1
List view1List view1
List view1
 
Python decision making part5
Python decision making part5Python decision making part5
Python decision making part5
 
Python decision making part4
Python decision making part4Python decision making part4
Python decision making part4
 
Python operators part3
Python operators part3Python operators part3
Python operators part3
 

Recently uploaded

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...KokoStevan
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 

Recently uploaded (20)

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 

14 Simple CART

  • 1. Prof. Neeraj Bhargava Vishal Dutt Department of Computer Science, School of Engineering & System Sciences MDS University, Ajmer
  • 2. What is CART?  Classification And Regression Trees  Developed by Breiman, Friedman, Olshen, Stone in early 80’s.  Introduced tree-based modeling into the statistical mainstream  Rigorous approach involving cross-validation to select the optimal tree  One of many tree-based modeling techniques.  CART -- the classic  CHAID  C5.0  Software package variants (SAS, S-Plus, R…)  Note: the “rpart” package in “R” is freely available
  • 3. The Key Idea Recursive Partitioning  Take all of your data.  Consider all possible values of all variables.  Select the variable/value (X=t1) that produces the greatest “separation” in the target.  (X=t1) is called a “split”.  If X< t1 then send the data to the “left”; otherwise, send data point to the “right”.  Now repeat same process on these two “nodes”  You get a “tree”  Note: CART only uses binary splits.
  • 5. Let’s Get Rolling Suppose you have 3 variables: # vehicles: {1,2,3…10+} Age category: {1,2,3…6} Liability-only: {0,1}  At each iteration, CART tests all 15 splits. (#veh<2), (#veh<3),…, (#veh<10) (age<2),…, (age<6) (lia<1) Select split resulting in greatest increase in purity.  Perfect purity: each split has either all claims or all no-claims.  Perfect impurity: each split has same proportion of claims as overall population.
  • 6. Classification Tree Example: predict likelihood of a claim  Commercial Auto Dataset  57,000 policies  34% claim frequency  Classification Tree using Gini splitting rule  First split:  Policies with ≥5 vehicles have 58% claim frequency  Else 20%  Big increase in purity NUM_VEH <= 4.500 Terminal Node 1 Class Cases % 0 29083 80.0 1 7276 20.0 N = 36359 NUM_VEH > 4.500 Terminal Node 2 Class Cases % 0 8808 42.3 1 12036 57.7 N = 20844 Node 1 NUM_VEH Class Cases % 0 37891 66.2 1 19312 33.8 N = 57203
  • 7. Growing the Tree FREQ1_F_RPT <= 0.500 Terminal Node 1 Class = 0 Class Cases % 0 18984 78.7 1 5138 21.3 N = 24122 FREQ1_F_RPT > 0.500 Terminal Node 2 Class = 1 Class Cases % 0 2508 57.4 1 1859 42.6 N = 4367 LIAB_ONLY <= 0.500 Node 3 FREQ1_F_RPT N = 28489 LIAB_ONLY > 0.500 Terminal Node 3 Class = 0 Class Cases % 0 7591 96.5 1 279 3.5 N = 7870 NUM_VEH <= 4.500 Node 2 LIAB_ONLY N = 36359 AVGAGE_CAT <= 8.500 Terminal Node 4 Class = 1 Class Cases % 0 4327 48.1 1 4671 51.9 N = 8998 AVGAGE_CAT > 8.500 Terminal Node 5 Class = 0 Class Cases % 0 2072 76.5 1 637 23.5 N = 2709 NUM_VEH <= 10.500 Node 5 AVGAGE_CAT N = 11707 NUM_VEH > 10.500 Terminal Node 6 Class = 1 Class Cases % 0 2409 26.4 1 6728 73.6 N = 9137 NUM_VEH > 4.500 Node 4 NUM_VEH N = 20844 Node 1 NUM_VEH N = 57203
  • 8. Observations (Shaking the Tree)  First split (# vehicles) is rather obvious  More exposure  more claims  But it confirms that CART is doing something reasonable.  Also: the choice of splitting value 5 (not 4 or 6) is non- obvious.  This suggests a way of optimally “binning” continuous variables into a small number of groups NUM_VEH <= 4.500 Terminal Node 1 Class Cases % 0 29083 80.0 1 7276 20.0 N = 36359 NUM_VEH > 4.500 Terminal Node 2 Class Cases % 0 8808 42.3 1 12036 57.7 N = 20844 Node 1 NUM_VEH Class Cases % 0 37891 66.2 1 19312 33.8 N = 57203

Editor's Notes

  1. 2
  2. 3
  3. 5
  4. 6
  5. 7
  6. 8