1. Prof. Neeraj Bhargava
Vishal Dutt
Department of Computer Science, School of
Engineering & System Sciences
MDS University, Ajmer
2. CART Advantages
Nonparametric (no probabilistic assumptions)
Automatically performs variable selection
Uses any combination of continuous/discrete variables
Very nice feature: the ability to automatically bin high-cardinality
categorical variables into a few categories.
zip code, business class, make/model…
Discovers “interactions” among variables
Good for “rules” search
Hybrid GLM-CART models
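The split search behind several of these advantages can be sketched in a few lines: for each candidate variable, a CART-style tree tries every threshold and keeps the one minimizing an impurity measure such as Gini, so the most informative variable is "selected" automatically and no distributional assumptions are needed. A minimal pure-Python sketch (toy data and a single split; real implementations differ):

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, weighted Gini) of the best binary split on xs."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        n = len(ys)
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Two made-up predictors: x1 separates the classes cleanly, x2 is noise.
x1 = [1, 2, 3, 10, 11, 12]
x2 = [5, 1, 9, 2, 8, 3]
y = [0, 0, 0, 1, 1, 1]

t1, g1 = best_split(x1, y)
t2, g2 = best_split(x2, y)
print(t1, g1)   # x1 splits at 3 with impurity 0.0, so x1 is chosen first
print(g1 < g2)  # True
```

The same scoring loop runs over continuous and discrete variables alike, which is why mixing variable types requires no extra machinery.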
3. CART Advantages
CART handles missing values automatically
Using “surrogate splits”
Invariant to monotonic transformations of predictor
variables
Not sensitive to outliers in predictive variables
Unlike regression
Great way to explore, visualize data
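Both the transformation invariance and the outlier insensitivity follow from the fact that a split depends only on the ordering of a predictor's values, not their magnitudes. A small sketch (toy data, single split, hand-rolled Gini search; not any library's actual code) shows the same partition emerging before and after strictly increasing transforms, even with an extreme value present:

```python
import math

def split_partition(xs, ys):
    """Return the left/right index sets of the best Gini split on xs."""
    def gini(labels):
        n = len(labels)
        if n == 0:
            return 0.0
        counts = {}
        for y in labels:
            counts[y] = counts.get(y, 0) + 1
        return 1.0 - sum((c / n) ** 2 for c in counts.values())

    best_score, best_part = float("inf"), None
    for t in sorted(set(xs)):
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t]
        if not left or not right:
            continue
        n = len(xs)
        score = (len(left) / n * gini([ys[i] for i in left])
                 + len(right) / n * gini([ys[i] for i in right]))
        if score < best_score:
            best_score, best_part = score, (left, right)
    return best_part

x = [0.5, 1.0, 2.0, 40.0, 80.0, 1000.0]  # note the extreme value
y = [0, 0, 0, 1, 1, 1]

raw = split_partition(x, y)
logged = split_partition([math.log(v) for v in x], y)
cubed = split_partition([v ** 3 for v in x], y)
print(raw == logged == cubed)  # True: identical partition every time
```

Contrast this with regression, where log-transforming x or the 1000.0 outlier would change the fitted coefficients substantially.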
4. CART Disadvantages
The model is a step function, not a continuous score
So a tree with 10 terminal nodes can produce only 10 possible
predicted values.
MARS improves this.
Might take a large tree to get good lift
But then hard to interpret
Data gets chopped thinner at each split
Instability of model structure
Correlated variables or random data fluctuations can result in
entirely different trees.
CART does a poor job of modeling linear structure
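The step-function limitation is easy to see directly: a tree with k leaves can emit at most k distinct scores, no matter how many records it scores. A toy illustration (the tree below is hand-built for the example, not fitted to data):

```python
def predict(tree, x):
    """Walk a toy tree (nested dicts) down to a leaf value."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree

# A depth-2 tree on one variable: 4 leaves -> at most 4 possible scores.
tree = {
    "threshold": 50,
    "left": {"threshold": 25, "left": 10.0, "right": 20.0},
    "right": {"threshold": 75, "left": 30.0, "right": 40.0},
}

scores = {predict(tree, x) for x in range(101)}  # score 101 inputs
print(sorted(scores))  # only 4 distinct values come out
```

If the true relationship is linear, approximating it this way forces many splits on the same variable, which is why a large, hard-to-interpret tree may be needed for good lift.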
5. Uses of CART
Building predictive models
Alternative to GLMs, neural nets, etc.
Exploratory Data Analysis
Breiman et al.: a different view of the data.
You can build a tree on nearly any data set with minimal data
preparation.
Which variables are selected first?
Interactions among variables
Take note of cases where CART keeps re-splitting the same
variable (suggests a linear relationship)
Variable Selection
CART can rank variables
Alternative to stepwise regression