Prof. Neeraj Bhargava
Vishal Dutt
Department of Computer Science, School of
Engineering & System Sciences
MDS University, Ajmer
CART Gains Chart
• How do the three trees compare?
• Use gains chart on test data.
• Outer black line: the best one could do
• 45° line: monkey throwing darts
• The bigger trees are about equally good in catching 80% of the spam.
• We do lose something with the simpler tree.
[Figure: Spam Email Detection - Gains Charts. x-axis: Perc.Total.Pop; y-axis: Perc.Spam. Curves: perfect model, unpruned tree, pruned tree #1, pruned tree #2.]
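A gains curve like the one described above can be sketched in a few lines. This is a minimal illustration, not the code behind the slides: the function name `gains_curve` and the toy labels and scores are assumptions made for the example. Cases are sorted from highest to lowest model score, and the curve tracks what share of all spam has been caught after examining each share of the population.

```python
def gains_curve(y_true, scores):
    """Return (fraction of population, fraction of positives captured) points."""
    # Sort cases from highest to lowest score; ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(y_true)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += y_true[i]
        points.append((rank / len(order), captured / total_pos))
    return points

# Hypothetical toy data: 1 = spam, 0 = not spam.
y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.3, 0.4, 0.05, 0.15]
curve = gains_curve(y, s)
```

Scoring with the true labels themselves (`gains_curve(y, y)`) gives the "perfect model" line; scores unrelated to the labels would, on average, track the 45° line.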
Other Models
• Fit a purely additive MARS model to the data.
  • No interactions among basis functions
• Fit a neural network with 3 hidden nodes.
• Fit a logistic regression (GLM).
  • Using the 20 strongest variables
• Fit an ordinary multiple regression.
  • A statistical sin: the target is binary, not normal
GLM model
Logistic regression run on 20 of the most powerful predictive variables
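To see why ordinary regression on a binary target counts as a "statistical sin" while logistic regression does not, the sketch below fits both to one predictor. Everything here is an assumption for illustration: the toy data, the helper names `fit_ols` and `fit_logistic`, and the use of plain gradient ascent in place of whatever fitting routine the slides used. The least-squares line is free to predict values above 1, whereas the logistic fit is a probability by construction.

```python
import math

def fit_ols(x, y):
    """Least-squares line y = a + b*x for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def fit_logistic(x, y, lr=0.1, steps=5000):
    """Logistic regression by plain gradient ascent on the log-likelihood."""
    a = b = 0.0
    for _ in range(steps):
        p = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]
        a += lr * sum(yi - pi for yi, pi in zip(y, p)) / len(x)
        b += lr * sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p)) / len(x)
    return a, b

# Hypothetical toy data: x is one word frequency, y is 1 for spam.
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 4.0]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

a_ols, b_ols = fit_ols(x, y)
a_log, b_log = fit_logistic(x, y)

x_new = 4.0  # the largest observed frequency
ols_pred = a_ols + b_ols * x_new  # can fall outside [0, 1]
log_pred = 1 / (1 + math.exp(-(a_log + b_log * x_new)))  # a probability
```

On this toy data the least-squares prediction at `x_new` comes out above 1, which cannot be read as a probability of spam.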
Neural Net Weights
Comparison of Techniques
• All techniques add value.
• MARS/NNET beats GLM.
  • But note: we used all variables for MARS/NNET; only 20 for GLM.
• GLM beats CART.
• In real life we’d probably use the GLM model but refer to the tree for “rules” and intuition.
[Figure: Spam Email Detection - Gains Charts. x-axis: Perc.Total.Pop; y-axis: Perc.Spam. Curves: perfect model, mars, neural net, pruned tree #1, glm, regression.]
Parting Shot: Hybrid GLM model
• We can use the simple decision tree (#3) to motivate the creation of two ‘interaction’ terms:
  • “Goodnode”: (freq_$ < .0565) & (freq_remove < .065) & (freq_! < .524)
  • “Badnode”: (freq_$ > .0565) & (freq_hp < .16) & (freq_! > .375)
• We read these off tree (#3)
• Code them as {0,1} dummy variables
• Include in GLM model
• At the same time, remove terms no longer significant.
Hybrid GLM model
• The Goodnode and Badnode indicators are highly significant.
• Note that we also removed 5 variables that were in the original GLM.
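Coding the two tree nodes as {0,1} dummy variables might look like the sketch below. The thresholds are the ones quoted on the "Parting Shot" slide; the function name `node_indicators`, the dict-per-email layout, and the three sample emails are assumptions made for illustration. The resulting columns would then be added to the GLM's design matrix alongside the remaining predictors.

```python
def node_indicators(row):
    """Return (goodnode, badnode) as 0/1 dummies for one email.

    `row` is assumed to be a dict of frequencies keyed by the variable
    names used on the slide.
    """
    good = (row["freq_$"] < 0.0565
            and row["freq_remove"] < 0.065
            and row["freq_!"] < 0.524)
    bad = (row["freq_$"] > 0.0565
           and row["freq_hp"] < 0.16
           and row["freq_!"] > 0.375)
    return int(good), int(bad)

# Hypothetical emails for illustration only.
emails = [
    {"freq_$": 0.00, "freq_remove": 0.00, "freq_hp": 1.20, "freq_!": 0.10},
    {"freq_$": 0.30, "freq_remove": 0.50, "freq_hp": 0.00, "freq_!": 0.90},
    {"freq_$": 0.30, "freq_remove": 0.00, "freq_hp": 2.00, "freq_!": 0.90},
]
flags = [node_indicators(e) for e in emails]
```

The two conditions split on `freq_$` in opposite directions, so no email can be flagged as both Goodnode and Badnode; an email can also fall in neither node, as the third sample does.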
Hybrid Model Result
• Slight improvement over the original GLM.
  • See gains chart
  • See confusion matrix
• Improvement not huge in this particular model…
• … but proves the concept
[Figure: Spam Email Detection - Gains Charts. x-axis: Perc.Total.Pop; y-axis: Perc.Spam. Curves: perfect model, neural net, decision tree #2, glm, hybrid glm.]
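The confusion matrix mentioned on this slide did not survive extraction, but the idea can be sketched as follows. The function name `confusion_counts`, the 0.5 cutoff, and the toy labels and probabilities are all assumptions for illustration; none of the numbers come from the original models.

```python
def confusion_counts(y_true, probs, threshold=0.5):
    """Count true/false positives and negatives at a given score cutoff."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts

# Hypothetical labels (1 = spam) and predicted spam probabilities.
y_true = [1, 1, 0, 0, 1, 0]
probs = [0.9, 0.4, 0.6, 0.1, 0.7, 0.2]
result = confusion_counts(y_true, probs)
```

Running the same function on the original GLM's and the hybrid GLM's predicted probabilities, with the same test data and cutoff, would give two tables that can be compared cell by cell.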
Concluding Thoughts
• In many cases, CART will likely under-perform tried-and-true techniques like GLM.
  • Poor at handling linear structure
  • Data gets chopped thinner at each split
• BUT: is highly intuitive and a great way to:
  • Get a feel for your data
  • Select variables
  • Search for interactions
  • Search for “rules”
  • Bin variables

20 Simple CART
