Improve Your Regression
CART® and RandomForests®
Charles Harrison
Marketing Statistician
Outline
Applications of CART and Random Forests
Ordinary Least Squares Regression
– A review
– Common issues in standard linear regression
Data Description
Improving your regression with an applied example
– CART decision tree
– Random Forest
Conclusions
Salford Systems © 2016
Applications
In this webinar we use CART® software and RandomForests® software to predict
concrete strength, but as we will see these techniques can be applied to any field.
Quantitative Targets: Number of Cavities, Blood Pressure, Income etc.
Qualitative Targets: Disease or No Disease; Buy or Not Buy; Lend or Do Not Lend;
Buy Product A vs. Product B vs. Product C vs. Product D
Examples
Credit Risk
Glaucoma Screening
Insurance Fraud
Customer Loyalty
Drug Discovery
Early Identification of Reading Disabilities
Biodiversity and Wildlife Conservation
Preview: CART and Random Forest Advantages
As we will see in this presentation both CART and Random
Forests have desirable properties that allow you to build
accurate predictive models with dirty data (i.e. missing values,
lots of variables, nonlinear relationships, outliers etc.)
Preview: Geometry of a CART tree (1 split)
Preview: Geometry of a CART tree (2 splits)
Preview: Model Performance
Method                                        | Test MSE
Linear Regression                             | 109.04
Linear Regression with interactions           | 67.35
Min 1 SE CART (default settings)              | 65.05
CART (default settings)                       | 55.99
RandomForests® (default settings)             | 37.570
Improved RandomForests® using an SPM Automate | 36.02
$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$
What is OLS?
OLS – ordinary least squares regression
– Discovered by Legendre (1805) and Gauss (1809) to solve
problems in astronomy using pen and paper
The model is of the form

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_p x_p$

$\beta_0$ – the intercept term
$\beta_1, \beta_2, \beta_3, \dots$ – coefficient estimates
$x_1, x_2, x_3, \dots, x_p$ – predictor variables (i.e. columns in the dataset)
Example: Income = 20,000 + 2,500*WorkExperience + 1,000*EducationYears
Common Issues in Regression
Missing values
– Requires imputation OR
– Results in record deletion
Nonlinearities and Local Effects
– Example: $Y = 10 + 3x_1 + x_2 - 0.3x_1^2$
– Modeled via manual transformations, or terms are automatically added and
then selected via forward, backward, or stepwise selection, or regularization
– Ignores local effects unless specified by the analyst, but this is very
difficult/impossible in practice without subject matter expertise or prior
knowledge
Interactions
– Example: $Y = 10 + 3x_1 - 2x_2 + 0.25x_1 x_2$
– Manually added to the model (or through some automated procedure)
– Add interactions then use variable selection (i.e. regularized regression or
forward, backward, or stepwise selection)
Variable selection
– Usually accomplished manually or in combination with automated
selection procedures (a short sketch of this manual workflow follows below)
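To make the manual workflow concrete, here is a rough sketch (using scikit-learn, not SPM) of hand-building the quadratic and interaction terms and then letting a regularized regression (a lasso) do the variable selection. The data-generating function is the interaction example above; everything else is an illustrative assumption.

```python
# Sketch of the manual work OLS typically requires: hand-crafted nonlinear and
# interaction terms followed by an automated selection step (a lasso here).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 10 + 3 * X[:, 0] - 2 * X[:, 1] + 0.25 * X[:, 0] * X[:, 1] + rng.normal(size=200)

# Expand to x1, x2, x1^2, x1*x2, x2^2 ...
X_expanded = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
model = LassoCV(cv=5).fit(X_expanded, y)
print(model.coef_)  # near-zero coefficients are effectively dropped
```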
Solutions to OLS Problems
Two methods that do not suffer from the
drawbacks of linear regression are CART and
Random Forests
These methods automatically
โ€“ Handle missing values
โ€“ Model nonlinear relationships and local effects
โ€“ Select variables
โ€“ Model variable interactions
Concrete Strength
Target:
– STRENGTH
Compressive strength of concrete in megapascals
Predictors:
– CEMENT
– BLAST_FURNACE_SLAG
– FLY_ASH
– WATER
– SUPERPLASTICIZER
– COARSE_AGGREGATE
– FINE_AGGREGATE
– AGE
I-Cheng Yeh, "Modeling of strength of high performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998)
Why predict concrete strength?
Concrete is one of the most important materials in our society
and is a key ingredient in important infrastructure projects like
bridges, roads, buildings, and dams (MATSE)
Predicting the strength of concrete is important because concrete strength is a key
component of the overall stability of these structures.
Source: http://matse1.matse.illinois.edu/concrete/prin.html
Data Sample
Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength
540   | 0     | 0 | 162 | 2.5 | 1040  | 676   | 28  | 79.98611076
540   | 0     | 0 | 162 | 2.5 | 1055  | 676   | 28  | 61.88736576
332.5 | 142.5 | 0 | 228 | 0   | 932   | 594   | 270 | 40.26953526
332.5 | 142.5 | 0 | 228 | 0   | 932   | 594   | 365 | 41.05277999
198.6 | 132.4 | 0 | 192 | 0   | 978.4 | 825.5 | 360 | 44.2960751
266   | 114   | 0 | 228 | 0   | 932   | 670   | 90  | 47.02984744
380   | 95    | 0 | 228 | 0   | 932   | 594   | 365 | 43.6982994
380   | 95    | 0 | 228 | 0   | 932   | 594   | 28  | 36.44776979
266   | 114   | 0 | 228 | 0   | 932   | 670   | 28  | 45.85429086
475   | 0     | 0 | 228 | 0   | 932   | 594   | 28  | 39.28978986
198.6 | 132.4 | 0 | 192 | 0   | 978.4 | 825.5 | 90  | 38.07424367
198.6 | 132.4 | 0 | 192 | 0   | 978.4 | 825.5 | 28  | 28.02168359
427.5 | 47.5  | 0 | 228 | 0   | 932   | 594   | 270 | 43.01296026
190   | 190   | 0 | 228 | 0   | 932   | 670   | 90  | 42.32693164
304   | 76    | 0 | 228 | 0   | 932   | 670   | 28  | 47.81378165
380   | 0     | 0 | 228 | 0   | 932   | 670   | 90  | 52.90831981
Regression Results
Method                              | Test MSE
Linear Regression                   | 109.04
Linear Regression with interactions | 67.35
Strength = -9.70 + .115*Cement + .01*BlastFurnaceSlag + .014*FlyAsh - .172*Water +
.10*Superplasticizer + .01*CoarseAggregate + .01*FineAggregate + .11*Age
$\text{Test MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$
**Test sample: 20% of observations were randomly selected for the testing dataset
**This same test dataset was used to evaluate all models for the purpose of comparisons
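A sketch of this evaluation protocol in Python follows. It is an assumption about how one might reproduce the setup outside SPM; the file name "concrete.csv" and the use of scikit-learn are hypothetical.

```python
# Hold out a random 20% test sample and score models on it with test MSE.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression

df = pd.read_csv("concrete.csv")                       # hypothetical file name
X, y = df.drop(columns="STRENGTH"), df["STRENGTH"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)

model = LinearRegression().fit(X_train, y_train)
print(mean_squared_error(y_test, model.predict(X_test)))  # test MSE
```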
Classification And Regression Trees
Authors: Breiman, Friedman, Olshen, and Stone (1984)
CART is a decision tree algorithm used for both regression and classification problems
1. Classification: tries to separate classes by choosing variables and points that best
separate them
2. Regression: chooses the best variables and split points for reducing the squared or
absolute error criterion
CART is available exclusively in the SPM® 8 Software Suite and was developed in close
consultation with the original authors.
CART: Introduction
Main Idea: divide the predictor variables (often people say "partition" instead of "divide")
into different regions so that the dependent variable can be predicted more accurately.
The following figure shows the predicted values from a CART tree (i.e. the red horizontal bars)
fit to the curve $Y = x^2 + \text{noise}$, where the noise is drawn from a N(0,1) distribution.
CART: Terminology
A tree split occurs when a variable is partitioned (an in-depth example starts after the next slide).
This tree has two splits:
1. AGE_DAY <= 21
2. CEMENT_AMT <= 355.95
The node at the top of the tree is called the root node.
A node that has no sub-branch is a terminal node.
This tree has three terminal nodes (i.e. the red boxes in the tree).
[Tree diagram:
Node 1 (N = 825, Avg = 36.036, STD = 16.661) splits on AGE_DAY <= 21.00
  AGE_DAY <= 21.00 → Terminal Node 1: N = 260, Avg = 23.944, STD = 12.441
  AGE_DAY > 21.00  → Node 2 (N = 565, Avg = 41.600, STD = 15.358) splits on CEMENT_AMT <= 355.95
    CEMENT_AMT <= 355.95 → Terminal Node 2: N = 436, Avg = 37.036, STD = 12.683
    CEMENT_AMT > 355.95  → Terminal Node 3: N = 129, Avg = 57.026, STD = 13.452]
The predicted value in a CART regression model is the average of the target variable (i.e. "Y")
for the records that fall into one of the terminal nodes.
Example: If Age = 26 days and the amount of cement is 400, then the predicted strength is
57.026 megapascals.
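The prediction rule of this two-split tree can be written out directly; the function below simply returns the terminal-node average for a record, using the numbers from the tree above.

```python
# The two-split tree above expressed as a plain function: the prediction is
# just the average of the terminal node the record falls into.
def predict_strength(age_day: float, cement_amt: float) -> float:
    if age_day <= 21.00:
        return 23.944       # Terminal Node 1 average
    if cement_amt <= 355.95:
        return 37.036       # Terminal Node 2 average
    return 57.026           # Terminal Node 3 average

print(predict_strength(26, 400))  # -> 57.026, matching the example above
```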
CART: Algorithm
Step 1: Grow a large tree
This is done for you automatically
All variables are considered
at each split in the tree
Each split is made using one
variable and a specific value or set of values.
Splits are chosen so as to minimize model
error
The tree is grown until either a user-specified
criterion is met or until the tree cannot be
grown further
Step 2: Prune the large tree
This is also done for you automatically
Use either a test sample or cross validation to prune subtrees
CART: Splitting Procedure
Consider the following CART tree grown on this dataset
How exactly do we get this tree?

Y       | X1  | X2
79.9861 | 162 | 28
61.8874 | 162 | 28
40.2695 | 228 | 270
41.0528 | 228 | 365
44.2961 | 192 | 360
47.0298 | 228 | 90
43.6983 | 228 | 365
36.4478 | 228 | 28
45.8543 | 228 | 28
39.2898 | 228 | 28
38.0742 | 192 | 90
28.0217 | 192 | 28
43.013  | 228 | 270
42.3269 | 228 | 90
47.8138 | 228 | 28
52.9083 | 228 | 90
[Tree diagram:
Node 1 (N = 16, Avg = 45.748, STD = 11.371) splits on X1 <= 177.00
  X1 <= 177.00 → Terminal Node 1: N = 2, Avg = 70.937, STD = 9.049
  X1 > 177.00  → Node 2 (N = 14, Avg = 42.150, STD = 5.700) splits on X1 <= 210.00
    X1 <= 210.00 → Terminal Node 2: N = 3, Avg = 36.797, STD = 6.705
    X1 > 210.00  → Terminal Node 3: N = 11, Avg = 43.609, STD = 4.375]
CART: Splitting Procedure
Step 1: Find the best split point for the variable $X_1$
• Sort the variable $X_1$
• Compute the split improvement for each split point
• Best split for $X_1$: $X_1 \le 177$
Note: the midpoint between $X_1 = 192$ and $X_1 = 162$ is 177
Y ๐‘‹1 ๐‘‹2
79.99 162 28
61.89 162 28
40.27 228 270
41.05 228 365
44.30 192 360
47.03 228 90
43.70 228 365
36.45 228 28
45.85 228 28
39.29 228 28
38.07 192 90
28.02 192 28
43.01 228 270
42.33 228 90
47.81 228 28
52.91 228 90
Y ๐‘‹1 ๐‘‹2
79.99 162 28
61.89 162 28
28.02 192 28
38.07 192 90
44.3 192 360
36.45 228 28
45.85 228 28
39.29 228 28
47.81 228 28
47.03 228 90
42.33 228 90
52.91 228 90
40.27 228 270
43.01 228 270
41.05 228 365
43.7 228 365
Y ๐‘‹1 ๐‘‹2
79.99 162 28
61.89 162 28
Y ๐‘‹1 ๐‘‹2
28.02 192 28
38.07 192 90
44.3 192 360
36.45 228 28
45.85 228 28
39.29 228 28
47.81 228 28
47.03 228 90
42.33 228 90
52.91 228 90
40.27 228 270
43.01 228 270
41.05 228 365
43.7 228 365
[Tree diagram:
Node 1 (N = 16, Avg = 45.748, STD = 11.371) splits on X1 <= 177.00
  X1 <= 177.00 → Terminal Node 1: N = 2, Avg = 70.937, STD = 9.049
  X1 > 177.00  → Terminal Node 2: N = 14, Avg = 42.150, STD = 5.700]
Split Improvement (Least Squares):
$\Delta R(s, t) = R(t) - R(t_L) - R(t_R)$, where $R(t) = \frac{1}{N}\sum_{x_n \in t}\left(y_n - \bar{y}(t)\right)^2$
CART: Splitting Procedure
Step 2: Find the best split point for the variable $X_2$
• Sort the variable $X_2$
• Compute the split improvement for each split point
• Best split for $X_2$: $X_2 \le 59$
Note: the midpoint between $X_2 = 28$ and $X_2 = 90$ is 59
Y ๐‘‹1 ๐‘‹2
79.9861 162 28
61.8874 162 28
40.2695 228 270
41.0528 228 365
44.2961 192 360
47.0298 228 90
43.6983 228 365
36.4478 228 28
45.8543 228 28
39.2898 228 28
38.0742 192 90
28.0217 192 28
43.013 228 270
42.3269 228 90
47.8138 228 28
52.9083 228 90
Y ๐‘‹1 ๐‘‹2
79.99 162 28
61.89 162 28
28.02 192 28
36.45 228 28
45.85 228 28
39.29 228 28
47.81 228 28
38.07 192 90
47.03 228 90
42.33 228 90
52.91 228 90
40.27 228 270
43.01 228 270
44.30 192 360
41.05 228 365
43.70 228 365
[Tree diagram:
Node 1 (N = 16, Avg = 45.748, STD = 11.371) splits on X2 <= 59.00
  X2 <= 59.00 → Terminal Node 1: N = 7, Avg = 48.472, STD = 16.158
  X2 > 59.00  → Terminal Node 2: N = 9, Avg = 43.630, STD = 4.069]
CART: Splitting Procedure
At this point CART has evaluated all possible split points for our two variables,
$X_1$ and $X_2$, and determined the optimal split point for each.
Splitting on either $X_1$ or $X_2$ will yield a different tree, so what is the best split?
The one with the largest split improvement.
Best split for $X_1$: $X_1 \le 177$, improvement value: 90.64
Best split for $X_2$: $X_2 \le 59$, improvement value: 5.77
[Tree diagrams: the two candidate one-split trees shown above, one splitting on X1 <= 177.00 and one splitting on X2 <= 59.00]
CART: 1st Split
Our best first split in the tree is $X_1 \le 177$, which leads to the
following tree and partitioned dataset.
Y ๐‘‹1 ๐‘‹2
79.99 162 28
61.89 162 28
Y ๐‘‹1 ๐‘‹2
28.02 192 28
38.07 192 90
44.3 192 360
36.45 228 28
45.85 228 28
39.29 228 28
47.81 228 28
47.03 228 90
42.33 228 90
52.91 228 90
40.27 228 270
43.01 228 270
41.05 228 365
43.7 228 365
Y ๐‘‹1 ๐‘‹2
79.99 162 28
61.89 162 28
28.02 192 28
38.07 192 90
44.3 192 360
36.45 228 28
45.85 228 28
39.29 228 28
47.81 228 28
47.03 228 90
42.33 228 90
52.91 228 90
40.27 228 270
43.01 228 270
41.05 228 365
43.7 228 365
[Tree diagram:
Node 1 (N = 16, Avg = 45.748, STD = 11.371) splits on X1 <= 177.00
  X1 <= 177.00 → Terminal Node 1: N = 2, Avg = 70.937, STD = 9.049
  X1 > 177.00  → Terminal Node 2: N = 14, Avg = 42.150, STD = 5.700]
Note: the predicted values for this tree are the respective averages in each terminal node.
Terminal Node 1 predicted value: (79.99 + 61.89) / 2 ≈ 70.94
CART Geometry
[Figure: the one-split tree (X1 <= 177.00) shown alongside the corresponding partition of the (X1, X2) predictor space and the step-function fit to Y]
CART: Splitting Procedure
So how do we get to our final tree?
[Tree diagrams: the current one-split tree (X1 <= 177.00) and the final tree, in which the X1 > 177.00 node is split again at X1 <= 210.00]
CART: Splitting Procedure
We now perform the same procedure again, but this time for each partition
of the data (we can only split one partition at a time)
Y ๐‘‹1 ๐‘‹2
79.99 162 28
61.89 162 28
Y ๐‘‹1 ๐‘‹2
28.02 192 28
38.07 192 90
44.3 192 360
36.45 228 28
45.85 228 28
39.29 228 28
47.81 228 28
47.03 228 90
42.33 228 90
52.91 228 90
40.27 228 270
43.01 228 270
41.05 228 365
43.7 228 365
Y ๐‘‹1 ๐‘‹2
28.02 192 28
38.07 192 90
44.3 192 360
Y ๐‘‹1 ๐‘‹2
36.45 228 28
45.85 228 28
39.29 228 28
47.81 228 28
47.03 228 90
42.33 228 90
52.91 228 90
40.27 228 270
43.01 228 270
41.05 228 365
43.7 228 365
X1 <= 177.00
Terminal
Node 1
STD = 9.049
Avg = 70.937
W = 2.000
N = 2
X1 > 177.00
Terminal
Node 2
STD = 5.700
Avg = 42.150
W = 14.000
N = 14
Node 1
X1 <= 177.00
STD = 11.371
Avg = 45.748
W = 16.000
N = 16
Best Split: Split
Partition 2 at
๐‘‹1 โ‰ค 210Partition 1 Partition 2
[Tree diagram:
Node 1 (N = 16, Avg = 45.748, STD = 11.371) splits on X1 <= 177.00
  X1 <= 177.00 → Terminal Node 1: N = 2, Avg = 70.937, STD = 9.049
  X1 > 177.00  → Node 2 (N = 14, Avg = 42.150, STD = 5.700) splits on X1 <= 210.00
    X1 <= 210.00 → Terminal Node 2: N = 3, Avg = 36.797, STD = 6.705
    X1 > 210.00  → Terminal Node 3: N = 11, Avg = 43.609, STD = 4.375]
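For intuition only (this is not the CART® code), the whole "find the best split, partition, repeat" loop described above can be sketched compactly in Python:

```python
# A compact sketch of greedy least-squares tree growing: find the best split,
# partition, and recurse on each partition until no split improves the criterion.
import numpy as np

def best_split(X: np.ndarray, y: np.ndarray):
    """Return (improvement, column, threshold) of the best single split."""
    n = len(y)
    risk = lambda v: np.sum((v - v.mean()) ** 2) / n if len(v) else 0.0
    best = (0.0, None, None)
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for cut in (values[:-1] + values[1:]) / 2:       # candidate midpoints
            left = X[:, j] <= cut
            gain = risk(y) - risk(y[left]) - risk(y[~left])
            if gain > best[0]:
                best = (gain, j, cut)
    return best

def grow(X, y, min_node=2):
    gain, j, cut = best_split(X, y)
    if j is None or len(y) < 2 * min_node:
        return {"prediction": y.mean()}                  # terminal node
    left = X[:, j] <= cut
    return {"var": j, "cut": cut,
            "left": grow(X[left], y[left], min_node),
            "right": grow(X[~left], y[~left], min_node)}
```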
CART Geometry
[Figure: the one-split and two-split trees shown alongside the corresponding partitions of the (X1, X2) predictor space and the step-function fits to Y]
Where are we?
[x] CART Splitting Process
[ ] CART Pruning
[ ] Advantages of CART
[ ] Interpreting CART Output
[ ] Applied Example using CART
[ ] Random Forest Section
CART: Algorithm
Step 1: Grow a large tree
Step 2: Prune the large tree
This is also done for you automatically
Use either a test sample or cross validation to
prune subtrees
CART: Pruning with a Test Sample
Test sample: randomly select a certain percentage of the data (often ~20%-30%)
to be used to assess the model error.
Prune the CART tree
1. Run the test data down the large tree and the smaller trees (the smaller trees are called
"subtrees")
2. Compute the test error for each tree
3. The final tree shown to the user is the tree with
the smallest test error
Subtree | Error
1       | 200
2       | 125
3       | 100
4       | 83
5       | 113
6       | 137
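Here is a rough scikit-learn analogue of test-sample pruning. It is an assumption about how to mimic the procedure outside SPM: scikit-learn exposes the nested subtrees through its cost-complexity pruning path, and we pick the one with the smallest test error.

```python
# Test-sample pruning in the spirit described above, using scikit-learn.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=8, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 1. Grow the large tree and enumerate its nested subtrees (one per alpha).
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# 2. Compute the test error for each subtree.
test_errors = []
for alpha in path.ccp_alphas:
    subtree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    test_errors.append(mean_squared_error(y_test, subtree.predict(X_test)))

# 3. Report the subtree with the smallest test error.
best = int(np.argmin(test_errors))
print(f"best subtree: alpha={path.ccp_alphas[best]:.4f}, test MSE={test_errors[best]:.2f}")
```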
Where are we?
[x] CART Splitting Process
[x] CART Pruning
[ ] Advantages of CART (this is what allows you to build models with dirty data)
[ ] Interpreting CART Output
[ ] Applied Example using CART
[ ] Random Forest Section
CART Advantages
In practice, you can build CART models with dirty data (i.e.
missing values, lots of variables, nonlinear relationships, outliers,
and numerous local effects)
This is due to CART's desirable properties:
1. Easy to interpret
2. Automatic handling of the following:
a) Variable selection
b) Variable interaction modeling
c) Local effect modeling
d) Nonlinear relationship modeling
e) Missing values
f) Outliers
3. Not affected by monotonic transformations of variables
CART: Interpretation and Automatic
Variable Selection
Interpretation: CART trees have a simple interpretation and only require that someone ask
themselves a series of "yes or no" questions like "Is AGE_DAY <= 21?"
Variable Selection:
All variables will be considered for each split
but not all variables will be used. Some
variables will be used more than others.
Only one variable is used for each split
The variables chosen are those that reduce
the error the most
[Tree diagram: the two-split concrete tree shown earlier, splitting on AGE_DAY <= 21.00 and then on CEMENT_AMT <= 355.95]
CART: Automatic Variable Interactions
and Local Effects
In regression, interaction terms are modeled globally in the form x1*x2 or x1*x2*x3
(global means that the interaction is present everywhere).
In CART, interactions are automatically modeled over certain regions of the data (i.e. locally),
so you do not have to worry about adding interaction terms or local terms to your model.
Example: Notice how the prediction changes for different amounts of cement given that the Age
is over 21 days (i.e. this is the interaction):
1. If Age > 21 and Cement Amount <= 355.95 then the average strength is 37 megapascals
2. If Age > 21 and Cement Amount > 355.95 then the average strength is 57 megapascals
[Tree diagram: the two-split concrete tree shown earlier, splitting on AGE_DAY <= 21.00 and then on CEMENT_AMT <= 355.95]
CART: Automatic Nonlinear Modeling
Nonlinear functions (and linear) are approximated via step functions, so in practice
you do not need to worry about adding terms like $x^2$ or $\ln x$ to capture
nonlinear relationships. The picture below is the CART fit to $Y = X^2 + \text{noise}$. CART
modeled this data automatically. No data pre-processing. Just CART.
CART: Automatic Missing Value Handling
CART automatically handles missing values while building the
model, so you do not need to impute missing values yourself
The missing values are handled using a surrogate split
Surrogate Split: find another variable whose split is "similar" to the split on the variable
with the missing values, and split on the variable that does not have missing values.
Reference: see Section 5.3 in Breiman, Friedman, Olshen, and
Stone for more information
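A rough sketch of the surrogate idea follows (not the exact CART® surrogate criterion from Breiman et al.): among the other variables, look for the split that most often sends records to the same side as the primary split, so it can stand in when the primary variable is missing. The function and example data are illustrative assumptions.

```python
# Illustrative surrogate search: maximize agreement with the primary split.
import numpy as np

def best_surrogate(X: np.ndarray, primary_left: np.ndarray, primary_col: int):
    best = (0.0, None, None)   # (agreement rate, column, threshold)
    for j in range(X.shape[1]):
        if j == primary_col:
            continue
        values = np.unique(X[:, j])
        for cut in (values[:-1] + values[1:]) / 2:
            agreement = np.mean((X[:, j] <= cut) == primary_left)
            agreement = max(agreement, 1 - agreement)   # allow the reversed split
            if agreement > best[0]:
                best = (agreement, j, cut)
    return best

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 35.0], [4.0, 40.0]])
primary_left = X[:, 0] <= 2.5                            # the primary split on column 0
print(best_surrogate(X, primary_left, primary_col=0))    # column 1 at ~27.5 agrees perfectly
```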
CART: Outliers in the Target Variable
Two types of outliers are
1. Outliers in the target variable (i.e. "Y")
2. Outliers in the predictor variable (i.e. "x")
CART is more sensitive to outliers with
respect to the target variable
1. More severe in a regression context
than a classification context
2. CART may treat target variable outliers
by isolating them in small terminal
nodes which can limit their effect
Reference: Pages 197-200 and 253 in
Breiman, Friedman, Olshen, and Stone (1984)
๐‘‹1
๐‘‹2
Y
Here the target outliers are
isolated in terminal node 1
X1 <= 177.00
Terminal
Node 1
STD = 9.049
Avg = 70.937
W = 2.000
N = 2
X1 > 177.00
Terminal
Node 2
STD = 5.700
Avg = 42.150
W = 14.000
N = 14
Node 1
X1 <= 177.00
STD = 11.371
Avg = 45.748
W = 16.000
N = 16
CART: Outliers in the Predictor Variables
CART is more robust to outliers in the predictor variables, partly due to the nature of the splitting process
Reference: Pages 197-200 and 253 in Breiman, Friedman, Olshen, and Stone (1984)
[Figure: two one-split trees on X1 <= 177.00, one grown with an extreme predictor value added (N = 17; Terminal Node 1 Avg = 73.957) and one grown without it (N = 16; Terminal Node 1 Avg = 70.937); the split point and the right-hand terminal node (Avg ≈ 42.15, N = 14) are essentially unchanged]
CART: Monotonic Transformations of Variables
Monotonic transformation: a transformation that does not change the order of a variable.
– CART, unlike linear regression, is not affected by this, so if a transformation does not affect the order of a variable
then you do not need to worry about adding it to a CART model.
– Example: Our best first split in the example tree was $X_1 \le 177$. What happens if we square $X_1$? The split point value
changes, but nothing else does, including the predicted values. This happens because the same Y values fall into the
same partition (i.e. their order has not changed after we squared and sorted $X_1$).
Y ๐‘‹1 ๐‘‹2
79.99 162 28
61.89 162 28
Y ๐‘‹1 ๐‘‹2
28.02 192 28
38.07 192 90
44.3 192 360
36.45 228 28
45.85 228 28
39.29 228 28
47.81 228 28
47.03 228 90
42.33 228 90
52.91 228 90
40.27 228 270
43.01 228 270
41.05 228 365
43.7 228 365
Y ๐‘‹1 ๐‘‹2
79.99 162 28
61.89 162 28
28.02 192 28
38.07 192 90
44.3 192 360
36.45 228 28
45.85 228 28
39.29 228 28
47.81 228 28
47.03 228 90
42.33 228 90
52.91 228 90
40.27 228 270
43.01 228 270
41.05 228 365
43.7 228 365
X1 <= 177.00
Terminal
Node 1
STD = 9.049
Avg = 70.937
W = 2.000
N = 2
X1 > 177.00
Terminal
Node 2
STD = 5.700
Avg = 42.150
W = 14.000
N = 14
Node 1
X1 <= 177.00
STD = 11.371
Avg = 45.748
W = 16.000
N = 16
Y ๐‘‹1 ๐‘‹2
79.99 26,244 28
61.89 26,244 28
28.02 36,864 28
38.07 36,864 90
44.3 36,864 360
36.45 51,984 28
45.85 51,984 28
39.29 51,984 28
47.81 51,984 28
47.03 51,984 90
42.33 51,984 90
52.91 51,984 90
40.27 51,984 270
43.01 51,984 270
41.05 51,984 365
43.7 51,984 365
Y ๐‘‹1 ๐‘‹2
79.99 26,244 28
61.89 26,244 28
Y ๐‘‹1 ๐‘‹2
28.02 36,864 28
38.07 36,864 90
44.3 36,864 360
36.45 51,984 28
45.85 51,984 28
39.29 51,984 28
47.81 51,984 28
47.03 51,984 90
42.33 51,984 90
52.91 51,984 90
40.27 51,984 270
43.01 51,984 270
41.05 51,984 365
43.7 51,984 365
X1_SQ <= 31554.00
Terminal
Node 1
STD = 9.049
Avg = 70.937
W = 2.000
N = 2
X1_SQ > 31554.00
Terminal
Node 2
STD = 5.700
Avg = 42.150
W = 14.000
N = 14
Node 1
X1_SQ <= 31554.00
STD = 11.371
Avg = 45.748
W = 16.000
N = 16
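This claim can be checked quickly with scikit-learn trees (assuming the same invariance carries over to that implementation): fitting on X and on X squared should give identical predictions even though the printed split points differ. The data below are synthetic.

```python
# Squaring a positive predictor changes the split point, not the predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(200, 1))                  # positive, so squaring is monotone
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

tree_raw = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
tree_sq  = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X ** 2, y)

print(np.allclose(tree_raw.predict(X), tree_sq.predict(X ** 2)))  # expected: True
```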
Where are we?
[x] CART Splitting Process
[x] CART Pruning
[x] Advantages of CART (this is what allows you to build models with dirty data)
[ ] Interpreting CART Output
[ ] Applied Example using CART
[ ] Random Forest Section
CART: Relative Error
Relative error: used to determine the optimal model complexity for CART models.
GOOD: Relative error values close to zero mean that CART is doing a better job than predicting only the
overall average (or median) for all records in the data.
BAD: A relative error equal to one means that CART is no better than predicting the overall
average (or median) of the target variable for every record. Note: the relative error can be greater than one,
which is especially bad.
The relative error can be computed for both Least Squares, $LS = \sum_i (y_i - \hat{y}_i)^2$, and Least Absolute Deviation, $LAD = \sum_i |y_i - \hat{y}_i|$.

$$\text{Relative Error} = \frac{\text{CART model error (using either LS or LAD)}}{\text{Error for predicting the overall average for all records}}$$

Example SPM output: Relative Error: 0.129
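In code, the least-squares relative error is just the model's squared error divided by the error of always predicting the overall average (a small numpy sketch with made-up numbers):

```python
import numpy as np

def relative_error_ls(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    model_error = np.sum((y_true - y_pred) ** 2)
    baseline_error = np.sum((y_true - y_true.mean()) ** 2)   # predict the overall average
    return model_error / baseline_error

y_true = np.array([3.0, 5.0, 7.0, 9.0])
print(relative_error_ls(y_true, np.array([3.5, 4.5, 7.5, 8.5])))  # -> 0.05
```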
CART: Variable Importance
CART Variable Importance: sum each variable's split improvement score across the splits in the tree.
The importance score for a variable is increased in two ways:
1) When the variable is actually used to split a node
2) When the variable is used as the surrogate split (i.e. the backup splitting variable when the
primary splitting variable has a missing value)
"Consider Only Primary Splitters" (green rectangle on the right) removes the surrogate splitting
variables from the variable importance calculation.
"Discount Surrogates" allows you to discount surrogates in a more specific manner.
Applied Example using CART
CART Performance
Method                                       | Test MSE
Linear Regression                            | 109.04
Linear Regression with interactions          | 67.35
Min 1 SE CART (default settings)             | 65.05
CART (default settings)                      | 55.99
Random Forest (default settings)             |
Improved Random Forest using an SPM Automate |
$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$
***More information
about the 1 Standard
Error Rule for CART
can be found in the
appendix
Where are we?
[x] CART Splitting Process
[x] CART Pruning
[x] Advantages of CART (this is what allows you to build models with dirty data)
[x] Interpreting CART Output
[x] Applied Example using CART
[ ] Random Forest Section
Introduction to Random Forests
Main Idea: fit multiple CART trees to independent "bootstrap"
samples of the data and then combine the predictions.
Leo Breiman, one of the co-creators of CART,
also created Random Forests and published a
paper on this method in 2001
Our RandomForests® software was developed in close consultation
with Breiman himself.
What is a bootstrap sample?
A bootstrap sample is a random sample conducted with replacement
Steps:
1. Randomly select an observation from the original data
2. "Write it down"
3. "Put it back" (i.e. any observation can be selected more than once)
Repeat steps 1-3 N times, where N is the number of observations in the original sample.
FINAL RESULT: One "bootstrap sample" with N observations (see the short numpy sketch below)
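A bootstrap sample takes only a couple of lines of numpy; the stand-in data below are illustrative.

```python
# Sample N row indices with replacement from the original data.
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10).reshape(5, 2)                       # stand-in for the original data (N = 5 rows)
indices = rng.integers(0, len(data), size=len(data))     # steps 1-3 repeated N times
bootstrap_sample = data[indices]                         # some rows repeat, others are left out
print(indices, bootstrap_sample, sep="\n")
```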
[Figure: an original data table (columns Y, X1, X2) next to one bootstrap sample drawn from it; some rows appear several times in the bootstrap sample and others not at all]
Original Data → Bootstrap 1, Bootstrap 2, ....., Bootstrap 199, Bootstrap 200

1. Draw a bootstrap sample
2. Fit a large, unpruned CART tree to this bootstrap sample
   - At each split in the tree consider only k randomly selected variables instead of all of them
Repeat Steps 1-2 at least 200 times
3. To predict a new record, run the record down each tree, each time computing a prediction, then average the predictions

Example: if the individual trees predict Tree 1: 10.5, Tree 2: 9.8, ..., Tree 199: 10.73, Tree 200: 12, the final prediction for the new record is

$$\frac{10.5 + 9.8 + \dots + 10.73 + 12}{200}$$
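A hand-rolled miniature of this procedure, using scikit-learn trees on synthetic data rather than the RandomForests® software, might look like this:

```python
# 200 bootstrap samples, an unpruned tree per sample that considers only k
# randomly chosen variables per split, and an averaged prediction.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=8, noise=10, random_state=0)
rng = np.random.default_rng(0)
k, n_trees = 3, 200

trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(y), size=len(y))            # 1. bootstrap sample
    tree = DecisionTreeRegressor(max_features=k)          # 2. k variables tried per split,
    trees.append(tree.fit(X[idx], y[idx]))                #    tree grown large and left unpruned

new_record = X[:1]
prediction = np.mean([t.predict(new_record)[0] for t in trees])  # 3. average the 200 predictions
print(prediction)
```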
CART and Random Forests
When you build a Random Forest model just keep this picture in
the back of your mind:
The reason is that a Random Forest is really just an average of
CART trees constructed on bootstrap samples of the original data:

$$\hat{f}_{\text{forest}} = \frac{1}{B}\left[\,\hat{f}_{\text{tree }1} + \hat{f}_{\text{tree }2} + \hat{f}_{\text{tree }3} + \dots + \hat{f}_{\text{tree }B}\,\right]$$
CART and Random Forests
Random Forests generally have superior predictive performance versus CART trees
because Random Forests have lower variance than a single CART tree
Since Random Forests are a combination of CART trees they inherit many of CART's
properties:
Automatic
Variable selection
Variable interaction detection
Nonlinear relationship detection
Missing value handling
Outlier handling
Modeling of local effects
Invariant to monotone transformations of predictors
One drawback is that a Random Forest is not as interpretable as a single CART tree
Random Forests: Tuning Parameters
The performance of a Random Forest is
dependent upon the values of certain model
parameters
Two of these parameters are
1. Number of trees
2. Random number of variables chosen at each split
Random Forests: Number of Trees
Number of trees
Default: 200
The number of trees should be large
enough so that the model error no
longer meaningfully declines as the
number of trees increases
(Experimentation will be required)
In Random Forests the optimal number of trees tends to be the maximum value allotted
(due to the Law of Large Numbers).
[Figure: error curves for the default setting of 200 trees and a setting of 400 trees]
There is not much of a difference between the error for a forest with 200 trees and one with
400 trees, so, at least for this dataset, a larger number of trees will not improve the model
meaningfully.
Random Forests: Tuning Parameters
The performance of a Random Forest is
dependent upon the values of certain model
parameters
Two of these parameters are
1. Number of trees
2. Random number of variables chosen at each
split
Random Forest Parameters:
Random variable subset size
Random number of variables k chosen
at each split in each tree in the forest
Default: k=3
Experimentation will be required to
find the optimal value and this can be
done using Automate RFNPREDS
Automate RFNPREDS: automatically builds multiple Random Forests; each
forest is the same except that the number of randomly selected
variables at each split in each CART tree changes.
This allows us to conveniently
determine the optimal number
of variables to randomly select
at each split in each tree
This output is telling you that the optimal number of randomly selected variables at each
split in each tree in the forest is 5.
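A rough scikit-learn analogue of this Automate RFNPREDS experiment (not the SPM automation itself) is to refit the forest for each candidate number of variables per split and compare an out-of-bag score; the synthetic data below are assumptions for illustration.

```python
# Sweep the number of variables considered at each split and compare OOB scores.
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=8, noise=10, random_state=0)

for k in range(1, 9):
    rf = RandomForestRegressor(n_estimators=200, max_features=k,
                               oob_score=True, random_state=0).fit(X, y)
    print(f"k={k}  OOB R^2={rf.oob_score_:.3f}")
```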
Interpreting a Random Forest
Since a Random Forest is a collection of hundreds or
even thousands of CART trees, the simple
interpretation is lost because we now have hundreds of
trees and are averaging the predictions
One method used to interpret a Random Forest is
variable importance
Random Forest for Regression:
Variable Importance
CART Variable Importance: sum each variable's split
improvement score across the splits in the tree.
The importance score for a variable is increased in two ways: 1)
when the variable is actually used to split a node and 2) when the
variable is used as the surrogate split (i.e. the backup splitting variable
when the primary splitting variable has a missing value).
Random Forest Variable Importance for Regression:
1. Compute a score for every split the variable generates and sum the scores across all splits made
• Relative Importance: divide all variable importance scores by the
maximum variable importance score (i.e. the most important variable
has a relative importance value of 100)
Note: For classification models, the preferred method is the random permutation method (see appendix for more details)
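A sketch of relative importance using scikit-learn's impurity-based importances (an analogue of the idea above, not the SPM calculation): rescale the scores so the most important variable scores 100.

```python
# Relative variable importance: most important variable = 100.
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=6, noise=10, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

relative = 100 * rf.feature_importances_ / rf.feature_importances_.max()
for name, score in sorted(zip([f"X{i}" for i in range(6)], relative),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.1f}")
```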
Random Forest Demonstration in SPM
Random Forests: Model Performance
Method                                       | Test MSE
Linear Regression                            | 109.04
Linear Regression with interactions          | 67.35
Min 1 SE CART (default settings)             | 65.05
CART (default settings)                      | 55.99
Random Forest (default settings)             | 37.70
Improved Random Forest using an SPM Automate | 36.02
$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$
Conclusion
CART produces an interpretable model that is more resistant to
outliers, predicts future data well, and automatically handles
1. Variable interactions
2. Missing values
3. Nonlinear relationships
4. Local effects
Random Forests are fundamentally a combination of individual
CART trees and thus inherit all of the advantages of CART above
(except the nice interpretation).
*A Random Forest is generally superior to a single CART tree in terms of predictive
accuracy.
Next in the seriesโ€ฆ
Improve Your Regression with TreeNet® Gradient Boosting
Try CART and Random Forest
Download SPM® 8 now to start building CART and
Random Forest models on your data
We will be more than happy to personally help you
if you have any questions or need assistance
My Email: charrison@salford-systems.com
Support Email: support@salford-systems.com
The appendix follows this slide
Appendix
1 Standard Error Rule for CART trees
1SE Rule in SPM
[Figure panels: the Optimal Tree and the 1 Standard Error Rule Tree]
1SE Rule Tree: the smallest tree whose error is within one standard deviation of the minimum error
Figures: Upper Left: the Optimal Tree has 188 terminal nodes and a relative error of .177; Upper Right: the 1SE
tree has 85 terminal nodes and a relative error of .209
Smaller trees are preferred because they are less likely to overfit the data (i.e. the 1SE tree in this case is competitive
in terms of accuracy and is much less complex) and they are easier to interpret
$$\text{Relative Error} = \frac{\text{CART model error (using either LS or LAD)}}{\text{Error for predicting the overall average (or median if using LAD) for all records}}$$
Introduction to Random Forests by Dr. Adele CutlerIntroduction to Random Forests by Dr. Adele Cutler
Introduction to Random Forests by Dr. Adele Cutler
ย 
9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You
ย 
Statistically Significant Quotes To Remember
Statistically Significant Quotes To RememberStatistically Significant Quotes To Remember
Statistically Significant Quotes To Remember
ย 
Evolution of regression ols to gps to mars
Evolution of regression   ols to gps to marsEvolution of regression   ols to gps to mars
Evolution of regression ols to gps to mars
ย 
Data Mining for Higher Education
Data Mining for Higher EducationData Mining for Higher Education
Data Mining for Higher Education
ย 
Comparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modelingComparison of statistical methods commonly used in predictive modeling
Comparison of statistical methods commonly used in predictive modeling
ย 
Molecular data mining tool advances in hiv
Molecular data mining tool  advances in hivMolecular data mining tool  advances in hiv
Molecular data mining tool advances in hiv
ย 
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees:  A Winning CombinationTreeNet Tree Ensembles & CART Decision Trees:  A Winning Combination
TreeNet Tree Ensembles & CART Decision Trees: A Winning Combination
ย 
SPM v7.0 Feature Matrix
SPM v7.0 Feature MatrixSPM v7.0 Feature Matrix
SPM v7.0 Feature Matrix
ย 
SPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARSSPM User's Guide: Introducing MARS
SPM User's Guide: Introducing MARS
ย 
Hybrid cart logit model 1998
Hybrid cart logit model 1998Hybrid cart logit model 1998
Hybrid cart logit model 1998
ย 
Session Logs Tutorial for SPM
Session Logs Tutorial for SPMSession Logs Tutorial for SPM
Session Logs Tutorial for SPM
ย 
Some of the new features in SPM 7
Some of the new features in SPM 7Some of the new features in SPM 7
Some of the new features in SPM 7
ย 
TreeNet Overview - Updated October 2012
TreeNet Overview  - Updated October 2012TreeNet Overview  - Updated October 2012
TreeNet Overview - Updated October 2012
ย 
TreeNet Tree Ensembles and CART Decision Trees: A Winning Combination
TreeNet Tree Ensembles and CART  Decision Trees:  A Winning CombinationTreeNet Tree Ensembles and CART  Decision Trees:  A Winning Combination
TreeNet Tree Ensembles and CART Decision Trees: A Winning Combination
ย 
Text mining tutorial
Text mining tutorialText mining tutorial
Text mining tutorial
ย 
Paradigm shifts in wildlife and biodiversity management through machine learning
Paradigm shifts in wildlife and biodiversity management through machine learningParadigm shifts in wildlife and biodiversity management through machine learning
Paradigm shifts in wildlife and biodiversity management through machine learning
ย 
Global Modeling of Biodiversity and Climate Change
Global Modeling of Biodiversity and Climate ChangeGlobal Modeling of Biodiversity and Climate Change
Global Modeling of Biodiversity and Climate Change
ย 

Recently uploaded

Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionBoston Institute of Analytics
ย 
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Callshivangimorya083
ย 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
ย 
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Callshivangimorya083
ย 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
ย 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
ย 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
ย 
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...Call Girls In Delhi Whatsup 9873940964 Enjoy Unlimited Pleasure
ย 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
ย 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
ย 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
ย 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
ย 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
ย 
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...Florian Roscheck
ย 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
ย 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
ย 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
ย 
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰ ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰																			ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰																			ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰ ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€ffjhghh
ย 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
ย 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
ย 

Recently uploaded (20)

Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
ย 
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
ย 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
ย 
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
ย 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
ย 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
ย 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
ย 
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
๊งโค Aerocity Call Girls Service Aerocity Delhi โค๊ง‚ 9999965857 โ˜Ž๏ธ Hard And Sexy ...
ย 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
ย 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
ย 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
ย 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
ย 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
ย 
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
From idea to production in a day โ€“ Leveraging Azure ML and Streamlit to build...
ย 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
ย 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
ย 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
ย 
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰ ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰																			ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰																			ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ๅฎšๅˆถ่‹ฑๅ›ฝ็™ฝ้‡‘ๆฑ‰ๅคงๅญฆๆฏ•ไธš่ฏ๏ผˆUCBๆฏ•ไธš่ฏไนฆ๏ผ‰ ๆˆ็ปฉๅ•ๅŽŸ็‰ˆไธ€ๆฏ”ไธ€
ย 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
ย 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
ย 

Improve Your Regression with CART and RandomForests

  • 1. Improve Your Regression CART® and RandomForests® Charles Harrison Marketing Statistician
  • 2. Outline Applications of CART and Random Forests Ordinary Least Squares Regression – A review – Common issues in standard linear regression Data Description Improving your regression with an applied example – CART decision tree – Random Forest Conclusions Salford Systems © 2016 2
  • 3. Applications In this webinar we use CART® software and RandomForests® software to predict concrete strength, but as we will see these techniques can be applied to any field Quantitative Targets: Number of Cavities, Blood Pressure, Income etc. Qualitative Targets: Disease or No Disease; Buy or Not Buy; Lend or Do Not Lend; Buy Product A vs Product B vs. Product C vs. Product D Examples Credit Risk Glaucoma Screening Insurance Fraud Customer Loyalty Drug Discovery Early Identification of Reading Disabilities Biodiversity and Wildlife Conservation
  • 4. Preview: CART and Random Forest Advantages As we will see in this presentation both CART and Random Forests have desirable properties that allow you to build accurate predictive models with dirty data (i.e. missing values, lots of variables, nonlinear relationships, outliers etc.) Preview: Geometry of a CART tree (1 split) Preview: Geometry of a CART tree (2 splits)
  • 5. Preview: Model Performance Salford Systems © 2016 5 Method | Test MSE: Linear Regression 109.04; Linear Regression with interactions 67.35; Min 1 SE CART (default settings) 65.05; CART (default settings) 55.99; RandomForests® (default settings) 37.570; Improved RandomForests® using an SPM Automate 36.02. $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$
  • 6. What is OLS? OLS – ordinary least squares regression – Discovered by Legendre (1805) and Gauss (1809) to solve problems in astronomy using pen and paper The model is of the form $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_p x_p$ where $\beta_0$ is the intercept term, $\beta_1, \beta_2, \beta_3, \dots$ are coefficient estimates, and $x_1, x_2, x_3, \dots, x_p$ are predictor variables (i.e. columns in the dataset) Example: Income = 20,000 + 2,500*WorkExperience + 1,000*EducationYears Salford Systems © 2016 6
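To make the model form concrete, here is a minimal sketch of fitting an OLS model in Python with scikit-learn (an open-source stand-in for the regression tooling discussed in the webinar). The data values and column names follow the slide's income example and are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: income modeled from work experience and education years
X = np.array([[1, 12], [5, 16], [10, 14], [20, 18], [3, 12], [8, 16]])  # [WorkExperience, EducationYears]
y = np.array([35000, 52000, 59000, 88000, 40000, 61000])                # Income

ols = LinearRegression().fit(X, y)
print("Intercept (beta_0):", ols.intercept_)
print("Coefficients (beta_1, beta_2):", ols.coef_)

# Predicted income for 4 years of experience and 15 years of education
print("Prediction:", ols.predict([[4, 15]])[0])
```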
  • 7. Common Issues in Regression Missing values – Requires imputation OR – Results in record deletion Nonlinearities and Local Effects – Example: $Y = 10 + 3x_1 + x_2 - 0.3x_1^2$ – Modeled via manual transformations or they are automatically added and then selected via forward, backward, stepwise, or regularization – Ignores local effects unless specified by the analyst, but this is very difficult/impossible in practice without subject matter expertise or prior knowledge Interactions – Example: $Y = 10 + 3x_1 - 2x_2 + 0.25x_1 x_2$ – Manually added to the model (or through some automated procedure) – Add interactions then use variable selection (i.e. regularized regression or forward, backward, or stepwise selection) Variable selection – Usually accomplished manually or in combination with automated selection procedures Salford Systems © 2016 7
  • 8. Solutions to OLS Problems Two methods that do not suffer from the drawbacks of linear regression are CART and Random Forests These methods automatically – Handle missing values – Model nonlinear relationships and local effects – Select variables – Model variable interactions Salford Systems © 2016 8
  • 9. Concrete Strength Target: – STRENGTH Compressive strength of concrete in megapascals Predictors: – CEMENT – BLAST_FURNACE_SLAG – FLY_ASH – WATER – SUPERPLASTICIZER – COARSE_AGGREGATE – FINE_AGGREGATE – AGE Salford Systems © 2016 9 I-Cheng Yeh, "Modeling of strength of high performance concrete using artificial neural networks," Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998)
  • 10. Why predict concrete strength? Concrete is one of the most important materials in our society and is a key ingredient in infrastructure projects like bridges, roads, buildings, and dams (MATSE) Predicting the strength of concrete is important because concrete strength is a key component of the overall stability of these structures Source: http://matse1.matse.illinois.edu/concrete/prin.html
  • 11. Data Sample Salford Systems © 2016 11
    Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength
    540 | 0 | 0 | 162 | 2.5 | 1040 | 676 | 28 | 79.98611076
    540 | 0 | 0 | 162 | 2.5 | 1055 | 676 | 28 | 61.88736576
    332.5 | 142.5 | 0 | 228 | 0 | 932 | 594 | 270 | 40.26953526
    332.5 | 142.5 | 0 | 228 | 0 | 932 | 594 | 365 | 41.05277999
    198.6 | 132.4 | 0 | 192 | 0 | 978.4 | 825.5 | 360 | 44.2960751
    266 | 114 | 0 | 228 | 0 | 932 | 670 | 90 | 47.02984744
    380 | 95 | 0 | 228 | 0 | 932 | 594 | 365 | 43.6982994
    380 | 95 | 0 | 228 | 0 | 932 | 594 | 28 | 36.44776979
    266 | 114 | 0 | 228 | 0 | 932 | 670 | 28 | 45.85429086
    475 | 0 | 0 | 228 | 0 | 932 | 594 | 28 | 39.28978986
    198.6 | 132.4 | 0 | 192 | 0 | 978.4 | 825.5 | 90 | 38.07424367
    198.6 | 132.4 | 0 | 192 | 0 | 978.4 | 825.5 | 28 | 28.02168359
    427.5 | 47.5 | 0 | 228 | 0 | 932 | 594 | 270 | 43.01296026
    190 | 190 | 0 | 228 | 0 | 932 | 670 | 90 | 42.32693164
    304 | 76 | 0 | 228 | 0 | 932 | 670 | 28 | 47.81378165
    380 | 0 | 0 | 228 | 0 | 932 | 670 | 90 | 52.90831981
  • 12. Regression Results Salford Systems © 2016 12 Method | Test MSE: Linear Regression 109.04; Linear Regression with interactions 67.35. Strength = -9.70 + .115*Cement + .01*BlastFurnaceSlag + .014*FlyAsh - .172*Water + .10*Superplasticizer + .01*CoarseAggregate + .01*FineAggregate + .11*Age $\mathrm{Test\ MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$ **Test sample: 20% of observations were randomly selected for the testing dataset **This same test dataset was used to evaluate all models for the purpose of comparisons
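The evaluation protocol above (hold out 20% of the rows, score every model on that same test set) could be sketched as follows. The file name concrete.csv and the use of scikit-learn are assumptions for illustration, not part of the original material.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumed file/column names for the concrete data (not taken from the original slides)
df = pd.read_csv("concrete.csv")
X, y = df.drop(columns="STRENGTH"), df["STRENGTH"]

# 20% of observations randomly held out; reuse the same split for every model compared
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
test_mse = mean_squared_error(y_test, model.predict(X_test))  # (1/n) * sum((Y_i - Yhat_i)^2)
print("Test MSE:", test_mse)
```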
  • 13. Classification And Regression Trees Authors: Breiman, Friedman, Olshen, and Stone (1984) CART is a decision tree algorithm used for both regression and classification problems 1. Classification: tries to separate classes by choosing variables and points that best separate them 2. Regression: chooses the best variables and split points for reducing the squared or absolute error criterion CART is available exclusively in the SPM® 8 Software Suite and was developed in close consultation with the original authors CART: Introduction
  • 14. CART Introduction Main Idea: divide the predictor variables (often people say “partition” instead of “divide”) into different regions so that the dependent variable can be predicted more accurately. The following shows the predicted values from a CART tree (i.e. the red horizontal bars) fit to the curve $Y = x^2 + \text{noise}$, where “noise” is drawn from a N(0,1) distribution (plot axes: Y vs. x)
  • 15. CART: Terminology A tree split occurs when a variable is partitioned (in-depth example starts after the next slide). This tree has two splits: 1. AGE_DAY <=21 2. CEMENT_AMT <=355.95 The node at the top of the tree is called the root node A node that has no sub-branch is a terminal node This tree has three terminal nodes (i.e. red boxes in the tree) AGE_DAY <= 21.00 Terminal Node 1 STD = 12.441 Avg = 23.944 W = 260.000 N = 260 CEMENT_AMT <= 355.95 Terminal Node 2 STD = 12.683 Avg = 37.036 W = 436.000 N = 436 CEMENT_AMT > 355.95 Terminal Node 3 STD = 13.452 Avg = 57.026 W = 129.000 N = 129 AGE_DAY > 21.00 Node 2 CEMENT_AMT <= 355.95 STD = 15.358 Avg = 41.600 W = 565.000 N = 565 Node 1 AGE_DAY <= 21.00 STD = 16.661 Avg = 36.036 W = 825.000 N = 825 The predicted value in a CART regression model is the average of the target variable (i.e. “Y”) for the records that fall into one of the terminal nodes Example: If Age = 26 days and the amount of cement is 400 then the predicted strength is 57.026 megapascals
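The prediction rule of this small tree can be written as two nested yes/no questions. The sketch below simply hard-codes the three terminal-node averages shown on the slide to illustrate how a record is scored; it is an illustration, not output generated by the software.

```python
def predict_strength(age_day: float, cement_amt: float) -> float:
    """Reproduce the 3-terminal-node example tree from the slide."""
    if age_day <= 21.00:
        return 23.944          # Terminal Node 1 average
    if cement_amt <= 355.95:
        return 37.036          # Terminal Node 2 average
    return 57.026              # Terminal Node 3 average

# Example from the slide: Age = 26 days, cement amount = 400 -> predicted strength 57.026 MPa
print(predict_strength(26, 400))
```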
  • 16. CART: Algorithm Step 1: Grow a large tree This is done for you automatically All variables are considered at each split in the tree Each split is made using one variable and a specific value or set of values. Splits are chosen so as to minimize model error The tree is grown until either a user-specified criterion is met or until the tree cannot be grown further Step 2: Prune the large tree This is also done for you automatically Use either a test sample or cross validation to prune subtrees
  • 17. CART: Splitting Procedure Consider the following CART tree grown on this dataset How exactly do we get this tree? Y X1 X2 79.9861 162 28 61.8874 162 28 40.2695 228 270 41.0528 228 365 44.2961 192 360 47.0298 228 90 43.6983 228 365 36.4478 228 28 45.8543 228 28 39.2898 228 28 38.0742 192 90 28.0217 192 28 43.013 228 270 42.3269 228 90 47.8138 228 28 52.9083 228 90 X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 <= 210.00 Terminal Node 2 STD = 6.705 Avg = 36.797 W = 3.000 N = 3 X1 > 210.00 Terminal Node 3 STD = 4.375 Avg = 43.609 W = 11.000 N = 11 X1 > 177.00 Node 2 X1 <= 210.00 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16
  • 18. CART: Splitting Procedure Step 1: Find the best split point for the variable X1 – Sort the variable X1 – Compute the split improvement for each split point – Best split for X1: X1 ≤ 177 Note: the midpoint between X1 = 192 and X1 = 162 is 177 [Tables on the slide show the 16-row dataset sorted by X1 and then partitioned at X1 ≤ 177: the two rows with X1 = 162 form the left node and the remaining 14 rows form the right node] X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 Split Improvement: $\Delta R(s,t) = R(t) - R(t_L) - R(t_R)$, $R(t) = \frac{1}{N}\sum_{x_n \in t}\left(y_n - \bar{y}(t)\right)^2$ (Least Squares)
  • 19. CART: Splitting Procedure Step 2: Find the best split point for the variable X2 – Sort the variable X2 – Compute the split improvement for each split point – Best split for X2: X2 ≤ 59 Note: the midpoint between X2 = 28 and X2 = 90 is 59 [Tables on the slide show the 16-row dataset sorted by X2 and then partitioned at X2 ≤ 59: the seven rows with X2 = 28 form the left node and the remaining nine rows form the right node] X2 <= 59.00 Terminal Node 1 STD = 16.158 Avg = 48.472 W = 7.000 N = 7 X2 > 59.00 Terminal Node 2 STD = 4.069 Avg = 43.630 W = 9.000 N = 9 Node 1 X2 <= 59.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 Split Improvement: $\Delta R(s,t) = R(t) - R(t_L) - R(t_R)$, $R(t) = \frac{1}{N}\sum_{x_n \in t}\left(y_n - \bar{y}(t)\right)^2$ (Least Squares)
  • 20. CART: Splitting Procedure At this point CART has evaluated all possible split points for our two variables, X1 and X2, and determined the optimal split points for each. Splitting on either X1 or X2 will yield a different tree, so what is the best split? The one with the largest split improvement. Best split for X1: X1 ≤ 177 Improvement Value: 90.64 Best split for X2: X2 ≤ 59 Improvement Value: 5.77 X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 X2 <= 59.00 Terminal Node 1 STD = 16.158 Avg = 48.472 W = 7.000 N = 7 X2 > 59.00 Terminal Node 2 STD = 4.069 Avg = 43.630 W = 9.000 N = 9 Node 1 X2 <= 59.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 Split Improvement: $\Delta R(s,t) = R(t) - R(t_L) - R(t_R)$, $R(t) = \frac{1}{N}\sum_{x_n \in t}\left(y_n - \bar{y}(t)\right)^2$ (Least Squares)
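The numbers on this slide can be checked directly from the improvement formula. The sketch below (plain NumPy, not the SPM software) applies ΔR(s, t) = R(t) − R(t_L) − R(t_R) with R(t) = (1/N) Σ (y_n − ȳ_t)² to the 16-row toy dataset; it should reproduce improvements close to 90.64 for X1 ≤ 177 and 5.77 for X2 ≤ 59, with small differences due to rounding of the Y values.

```python
import numpy as np

y  = np.array([79.99, 61.89, 40.27, 41.05, 44.30, 47.03, 43.70, 36.45,
               45.85, 39.29, 38.07, 28.02, 43.01, 42.33, 47.81, 52.91])
x1 = np.array([162, 162, 228, 228, 192, 228, 228, 228,
               228, 228, 192, 192, 228, 228, 228, 228])
x2 = np.array([28, 28, 270, 365, 360, 90, 365, 28,
               28, 28, 90, 28, 270, 90, 28, 90])

N = len(y)

def node_risk(y_node):
    """R(t): sum of squared deviations from the node mean, divided by the full sample size N."""
    if len(y_node) == 0:
        return 0.0
    return np.sum((y_node - y_node.mean()) ** 2) / N

def improvement(x, split_point):
    """Delta R(s, t) = R(t) - R(t_L) - R(t_R) for the candidate split x <= split_point."""
    left = x <= split_point
    return node_risk(y) - node_risk(y[left]) - node_risk(y[~left])

print("X1 <= 177:", improvement(x1, 177))   # roughly 90.6
print("X2 <=  59:", improvement(x2, 59))    # roughly 5.8
```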
  • 21. CART: 1st Split Our best first split in the tree is X1 ≤ 177 which leads to the following tree and partitioned dataset [Tables on the slide show the dataset split into two partitions: the two rows with X1 = 162 (Terminal Node 1) and the remaining 14 rows (Terminal Node 2)] X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 Note: the predicted values for this tree are the respective averages in each terminal node. Terminal Node 1 predicted value: (79.99 + 61.89)/2 ≈ 70.94
  • 22. CART Geometry (plot axes: Y vs. X1, and X1 vs. X2) X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16
  • 23. CART: Splitting Procedure So how do we get to our final tree? X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 <= 210.00 Terminal Node 2 STD = 6.705 Avg = 36.797 W = 3.000 N = 3 X1 > 210.00 Terminal Node 3 STD = 4.375 Avg = 43.609 W = 11.000 N = 11 X1 > 177.00 Node 2 X1 <= 210.00 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16
  • 24. CART: Splitting Procedure We now perform the same procedure again, but this time for each partition of the data (we can only split one partition at a time) [Tables on the slide show Partition 1 (the two rows with X1 = 162) and Partition 2 (the remaining 14 rows), with Partition 2 then divided into the three rows with X1 = 192 and the eleven rows with X1 = 228] Best Split: Split Partition 2 at X1 ≤ 210 X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 <= 210.00 Terminal Node 2 STD = 6.705 Avg = 36.797 W = 3.000 N = 3 X1 > 210.00 Terminal Node 3 STD = 4.375 Avg = 43.609 W = 11.000 N = 11 X1 > 177.00 Node 2 X1 <= 210.00 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16
  • 25. CART Geometry X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 (plot axes: X1, X2, Y) X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 <= 210.00 Terminal Node 2 STD = 6.705 Avg = 36.797 W = 3.000 N = 3 X1 > 210.00 Terminal Node 3 STD = 4.375 Avg = 43.609 W = 11.000 N = 11 X1 > 177.00 Node 2 X1 <= 210.00 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16
  • 26. Where are we? ✓ CART Splitting Process ☐ CART Pruning ☐ Advantages of CART ☐ Interpreting CART Output ☐ Applied Example using CART ☐ Random Forest Section
  • 27. CART: Algorithm Step 1: Grow a large tree Step 2: Prune the large tree This is also done for you automatically Use either a test sample or cross validation to prune subtrees
  • 28. CART: Pruning with a Test Sample Test sample – randomly select a certain percentage of data (often ~20%-30%) to be used to assess the model error Prune the CART tree 1. Run the test data down the large tree and the smaller trees (the smaller trees are called “subtrees”) 2. Compute the test error for each tree 3. The final tree shown to the user is the tree with the smallest test error Subtree | Error: 1: 200, 2: 125, 3: 100, 4: 83, 5: 113, 6: 137
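As a rough open-source analogue of this step (scikit-learn prunes by cost-complexity rather than exactly the CART test-sample procedure, so treat this as a sketch), one can grow a large tree, enumerate its pruned subtrees, and keep the subtree with the smallest test error:

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for the concrete dataset
X, y = make_regression(n_samples=1000, n_features=8, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 1: grow a large tree, then obtain the sequence of pruned subtrees (indexed by alpha)
alphas = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X_train, y_train).ccp_alphas

# Step 2: score each subtree on the test sample and keep the one with the smallest test error
best = min(
    (DecisionTreeRegressor(random_state=0, ccp_alpha=a).fit(X_train, y_train) for a in alphas),
    key=lambda tree: mean_squared_error(y_test, tree.predict(X_test)),
)
print("Chosen subtree terminal nodes:", best.get_n_leaves())
```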
  • 29. Where are we? ✓ CART Splitting Process ✓ CART Pruning ☐ Advantages of CART (this is what allows you to build models with dirty data) ☐ Interpreting CART Output ☐ Applied Example using CART ☐ Random Forest Section
  • 30. CART Advantages In practice, you can build CART models with dirty data (i.e. missing values, lots of variables, nonlinear relationships, outliers, and numerous local effects) This is due to CART’s desirable properties: 1. Easy to interpret 2. Automatic handling of the following: a) Variable selection b) Variable interaction modeling c) Local effect modeling d) Nonlinear relationship modeling e) Missing values f) Outliers 3. Not affected by monotonic transformations of variables
  • 31. CART: Interpretation and Automatic Variable Selection Interpretation: CART trees have a simple interpretation and only require that someone ask themselves a series of “yes or no” questions like “Is Age_Day <= 21?” etc. Variable Selection: All variables will be considered for each split but not all variables will be used. Some variables will be used more than others. Only one variable is used for each split The variables chosen are those that reduce the error the most AGE_DAY <= 21.00 Terminal Node 1 STD = 12.441 Avg = 23.944 W = 260.000 N = 260 CEMENT_AMT <= 355.95 Terminal Node 2 STD = 12.683 Avg = 37.036 W = 436.000 N = 436 CEMENT_AMT > 355.95 Terminal Node 3 STD = 13.452 Avg = 57.026 W = 129.000 N = 129 AGE_DAY > 21.00 Node 2 CEMENT_AMT <= 355.95 STD = 15.358 Avg = 41.600 W = 565.000 N = 565 Node 1 AGE_DAY <= 21.00 STD = 16.661 Avg = 36.036 W = 825.000 N = 825
  • 32. CART: Automatic Variable Interactions and Local Effects In regression, interaction terms are modeled globally in the form x1*x2 or x1*x2*x3 (global means that the interaction is present everywhere). In CART interactions are automatically modeled over certain regions of the data (i.e. locally) so you do not have to worry about adding interaction terms or local terms to your model Example: Notice how the prediction changes for different amounts of cement given that the Age is over 21 days (i.e. this is the interaction) 1. If Age > 21 and Cement Amount <= 355.95 then the average strength is 37 megapascals 2. If Age > 21 and Cement Amount > 355.95 then the average strength is 57 megapascals AGE_DAY <= 21.00 Terminal Node 1 STD = 12.441 Avg = 23.944 W = 260.000 N = 260 CEMENT_AMT <= 355.95 Terminal Node 2 STD = 12.683 Avg = 37.036 W = 436.000 N = 436 CEMENT_AMT > 355.95 Terminal Node 3 STD = 13.452 Avg = 57.026 W = 129.000 N = 129 AGE_DAY > 21.00 Node 2 CEMENT_AMT <= 355.95 STD = 15.358 Avg = 41.600 W = 565.000 N = 565 Node 1 AGE_DAY <= 21.00 STD = 16.661 Avg = 36.036 W = 825.000 N = 825
  • 33. CART: Automatic Nonlinear Modeling Nonlinear functions (and linear) are approximated via step functions, so in practice you do not need to worry about adding terms like $x^2$ or $\ln x$ to capture nonlinear relationships. The picture below is the CART fit to $Y = X^2 + \text{noise}$. CART modeled this data automatically. No data pre-processing. Just CART. (plot axes: Y vs. X)
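A quick way to see this step-function behaviour, using scikit-learn's DecisionTreeRegressor as an open-source stand-in for CART, is to fit a tree to Y = x² + noise and plot the piecewise-constant predictions:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 300).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 1, size=300)   # Y = x^2 + N(0,1) noise

# No transformations of x are supplied; the tree approximates the curve with steps
tree = DecisionTreeRegressor(max_leaf_nodes=8).fit(x, y)

plt.scatter(x, y, s=8, alpha=0.4, label="data")
plt.step(x.ravel(), tree.predict(x), where="mid", color="red", label="tree fit")
plt.legend()
plt.show()
```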
  • 34. CART: Automatic Missing Value Handling CART automatically handles missing values while building the model, so you do not need to impute missing values yourself The missing values are handled using a surrogate split Surrogate Split – find another variable whose split is “similar” to the variable with the missing values and split on the variable that does not have missing values Reference: see Section 5.3 in Breiman, Friedman, Olshen, and Stone for more information
  • 35. CART: Outliers in the Target Variable Two types of outliers are 1. Outliers in the target variable (i.e. “Y”) 2. Outliers in the predictor variable (i.e. “x”) CART is more sensitive to outliers with respect to the target variable 1. More severe in a regression context than a classification context 2. CART may treat target variable outliers by isolating them in small terminal nodes which can limit their effect Reference: Pages 197-200 and 253 in Breiman, Friedman, Olshen, and Stone (1984) (plot axes: X1, X2, Y) Here the target outliers are isolated in terminal node 1 X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16
  • 36. CART: Outliers in the Predictor Variables CART is more robust to outliers in the predictor variables partly due to the nature of the splitting process Reference: Pages 197-200 and 253 in Breiman, Friedman, Olshen, and Stone (1984) Y X1 <= 177.00 Terminal Node 1 STD = 8.532 Avg = 73.957 W = 3.000 N = 3 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.149 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 13.661 Avg = 47.762 W = 17.000 N = 17 X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 (plot axes: X1, X2)
  • 37. CART: Monotonic Transformations of Variables Monotonic transformation – a transformation that does not change the order of a variable. – CART, unlike linear regression, is not affected by this, so if a transformation does not affect the order of a variable then you do not need to worry about adding it to a CART model – Example: Our best first split in the example tree was X1 ≤ 177. What happens if we square X1? The split point value changes, but nothing else does, including the predicted values. This happens because the same Y values fall into the same partition (i.e. their order has not changed after we squared and sorted X1) [Tables on the slide show the same 16-row dataset with X1 replaced by X1 squared (162² = 26,244; 192² = 36,864; 228² = 51,984), with the same rows falling into each partition] X1 <= 177.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1 > 177.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1 <= 177.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16 X1_SQ <= 31554.00 Terminal Node 1 STD = 9.049 Avg = 70.937 W = 2.000 N = 2 X1_SQ > 31554.00 Terminal Node 2 STD = 5.700 Avg = 42.150 W = 14.000 N = 14 Node 1 X1_SQ <= 31554.00 STD = 11.371 Avg = 45.748 W = 16.000 N = 16
  • 38. Where are we? ✓ CART Splitting Process ✓ CART Pruning ✓ Advantages of CART (this is what allows you to build models with dirty data) ☐ Interpreting CART Output ☐ Applied Example using CART ☐ Random Forest Section
  • 39. CART: Relative Error Relative error – used to determine the optimal model complexity for CART models GOOD: Relative error values close to zero mean that CART is doing a better job than predicting only the overall average (or median) for all records in the data BAD: Relative error values equal to one mean that CART is no better than predicting the overall average (or median) of the target variable for every record. Note: the relative error can be greater than one, which is especially bad. The relative error can be computed for both Least Squares: $LS = \sum_i (y_i - \hat{y}_i)^2$ and Least Absolute Deviation: $LAD = \sum_i |y_i - \hat{y}_i|$ Relative Error = (CART model error using either LS or LAD) / (error for predicting the overall average for all records) LAD Relative Error: 0.129
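A minimal sketch of the least-squares relative error (the model's sum of squared errors divided by the sum of squared errors from always predicting the overall average); the numbers here are made up for illustration:

```python
import numpy as np

def relative_error_ls(y_true, y_pred, y_overall_mean):
    """Least-squares relative error: model SSE / SSE of always predicting the overall average."""
    model_sse    = np.sum((y_true - y_pred) ** 2)
    baseline_sse = np.sum((y_true - y_overall_mean) ** 2)
    return model_sse / baseline_sse

y_true = np.array([30.0, 45.0, 60.0])
y_pred = np.array([32.0, 44.0, 55.0])
# Well below 1 -> the model beats simply predicting the average for every record
print(relative_error_ls(y_true, y_pred, y_overall_mean=45.0))
```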
  • 40. CART: Variable Importance CART Variable Importance: sum each variable’s split improvement score across the splits in the tree. The importance scores for variables are increased in two ways: 1) When the variable is actually used to split a node 2) When the variable is used as the surrogate split (i.e. the backup splitting variable when the primary splitting variable has a missing value) “Consider Only Primary Splitters” (green rectangle on the right) removes the surrogate splitting variables from the variable importance calculation “Discount Surrogates” allows you to discount surrogates in a more specific manner
  • 42. CART Performance Salford Systems © 2016 42 Method | Test MSE: Linear Regression 109.04; Linear Regression with interactions 67.35; Min 1 SE CART (default settings) 65.05; CART (default settings) 55.99; Random Forest (default settings) and Improved Random Forest using an SPM Automate (not yet filled in on this slide). $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$ ***More information about the 1 Standard Error Rule for CART can be found in the appendix
  • 43. Where are we? ✓ CART Splitting Process ✓ CART Pruning ✓ Advantages of CART (this is what allows you to build models with dirty data) ✓ Interpreting CART Output ✓ Applied Example using CART ☐ Random Forest Section
  • 44. Introduction to Random Forests Main Idea: fit multiple CART trees to independent “bootstrap” samples of the data and then combine the predictions Leo Breiman, one of the co-creators of CART, also created Random Forests and published a paper on this method in 2001 Our RandomForests® software was developed in close consultation with Breiman himself
  • 45. What is a bootstrap sample? A bootstrap sample is a random sample conducted with replacement Steps: 1. Randomly select an observation from the original data 2. “Write it down” 3. “Put it back” (i.e. any observation can be selected more than once) Repeat steps 1-3 N times; N is the number of observations in the original sample FINAL RESULT: One “bootstrap sample” with N observations [Diagram on the slide shows an original data table (columns Y, X1, X2) alongside a bootstrap sample in which some rows appear more than once and others not at all]
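Drawing a bootstrap sample is just sampling row indices with replacement; a minimal NumPy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[48, 3], [37, 1], [10, 1], [24, 4]])   # original predictors (N = 4 rows, illustrative)
y = np.array([0, 0, 0, 0])                           # original target

idx = rng.integers(0, len(X), size=len(X))           # draw N row indices WITH replacement
X_boot, y_boot = X[idx], y[idx]                      # some rows repeat, some are left out entirely
print(idx)                                           # e.g. the same row index may appear twice
```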
  • 46. Original Data … Bootstrap 1, Bootstrap 2, …, Bootstrap 199, Bootstrap 200 Final Prediction for a New Record: take the average of the 200 individual predictions, (10.5 + 9.8 + … + 10.73 + 12) / 200 Predict a New Record: run the record down each tree, each time computing a prediction 1. Draw a bootstrap sample 2. Fit a large, unpruned CART tree to this bootstrap sample – At each split in the tree consider only k randomly selected variables instead of all of them 3. Average the predictions to predict a new record Repeat Steps 1-2 at least 200 times Tree 1: 10.5, Tree 2: 9.8, …, Tree 199: 10.73, Tree 200: 12
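Putting the pieces together, here is a compact sketch of the bootstrap-plus-random-feature-subset recipe described above, written with scikit-learn trees. It is a toy illustration of the algorithm on the slide, not the RandomForests® software itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def random_forest_fit(X, y, n_trees=200, k=3, random_state=0):
    """X, y as NumPy arrays. Returns a list of trees grown on bootstrap samples."""
    rng = np.random.default_rng(random_state)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))      # 1. draw a bootstrap sample
        tree = DecisionTreeRegressor(max_features=k)    # 2. large unpruned tree; only k randomly
        tree.fit(X[idx], y[idx])                        #    selected variables tried at each split
        trees.append(tree)
    return trees

def random_forest_predict(trees, X_new):
    # 3. average the individual tree predictions for each new record
    return np.mean([t.predict(X_new) for t in trees], axis=0)

# Example use (with X_train, y_train, X_test as NumPy arrays):
# forest = random_forest_fit(X_train, y_train)
# preds = random_forest_predict(forest, X_test)
```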
  • 47. CART and Random Forests When you build a Random Forest model just keep this picture in the back of your mind: a Random Forest is really just an average of CART trees constructed on bootstrap samples of the original data, $\text{Forest} = \frac{1}{B}\left[\text{Tree}_1 + \text{Tree}_2 + \text{Tree}_3 + \dots + \text{Tree}_B\right]$
  • 48. CART and Random Forests Random Forests generally have superior predictive performance versus CART trees because Random Forests have lower variance than a single CART tree Since Random Forests are a combination of CART trees they inherit many of CART’s properties: Automatic Variable selection Variable interaction detection Nonlinear relationship detection Missing value handling Outlier handling Modeling of local effects Invariant to monotone transformations of predictors One drawback is that a Random Forest is not as interpretable as a single CART tree
  • 49. Random Forests: Tuning Parameters The performance of a Random Forest is dependent upon the values of certain model parameters Two of these parameters are 1. Number of trees 2. Random number of variables chosen at each split
  • 50. Random Forests: Number of Trees Number of trees Default: 200 The number of trees should be large enough so that the model error no longer meaningfully declines as the number of trees increases (Experimentation will be required) In Random Forests the optimal number of trees tends to be the maximum value allotted (due to the Law of Large Numbers) Default Setting: 200 Trees My Setting: 400 Trees There is not much of a difference between the error for a forest with 200 trees and one with 400 trees, so, at least for this dataset, a larger number of trees will not improve the model meaningfully
  • 51. Random Forests: Tuning Parameters The performance of a Random Forest is dependent upon the values of certain model parameters Two of these parameters are 1. Number of trees 2. Random number of variables chosen at each split
  • 52. Random Forest Parameters: Random variable subset size Random number of variables k chosen at each split in each tree in the forest Default: k=3 Experimentation will be required to find the optimal value and this can be done using Automate RFNPREDS Automate RFNPREDS – automatically builds multiple Random Forests: each time the forest is the same except that the number of randomly selected variables at each split in each CART tree changes This allows us to conveniently determine the optimal number of variables to randomly select at each split in each tree This output is telling you that the optimal number of randomly selected variables at each split in each tree in the forest is 5.
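An open-source analogue of this Automate (a sketch, not the SPM feature) is simply to refit the forest once per candidate value of the random subset size and compare test error; this reuses the concrete.csv loading and 20% train/test split assumed in the earlier test-MSE sketch.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# X_train, X_test, y_train, y_test assumed from the earlier train/test split sketch
for k in range(1, 9):   # candidate number of variables tried at each split (8 predictors)
    rf = RandomForestRegressor(n_estimators=200, max_features=k, random_state=0)
    rf.fit(X_train, y_train)
    print(k, mean_squared_error(y_test, rf.predict(X_test)))
```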
  • 53. Interpreting a Random Forest Since a Random Forest is a collection of hundreds or even thousands of CART trees, the simple interpretation is lost because we now have hundreds of trees and are averaging the predictions One method used to interpret a Random Forest is variable importance
  • 54. Random Forest for Regression: Variable Importance CART Variable Importance: sum each variable’s split improvement score across the splits in the tree The importance scores for variables are increased in two ways: 1) when the variable is actually used to split a node and 2) when the variable is used as the surrogate split (i.e. the backup splitting variable when the primary splitting variable has a missing value) Random Forest Variable Importance for Regression: 1. Compute a score for every split the variable generates, sum the scores across all splits made – Relative Importance: divide all variable importance scores by the maximum variable importance score (so that, expressed as a percentage, the most important variable has a relative importance value of 100) Note: For classification models, the preferred method is the random permutation method (see appendix for more details)
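With a scikit-learn forest as a stand-in, the analogous (split-improvement based) importance scores can be read off the fitted model and rescaled so the top variable equals 100, mirroring the relative importance described above; X_train and y_train are assumed to come from the earlier sketches.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Assumes X_train is a DataFrame and y_train its matching target from the earlier split sketch
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

importance = pd.Series(rf.feature_importances_, index=X_train.columns)
relative_importance = 100 * importance / importance.max()   # most important variable = 100
print(relative_importance.sort_values(ascending=False))
```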
  • 56. Random Forests: Model Performance Salford Systems © 2016 56 Method | Test MSE: Linear Regression 109.04; Linear Regression with interactions 67.35; Min 1 SE CART (default settings) 65.05; CART (default settings) 55.99; Random Forest (default settings) 37.70; Improved Random Forest using an SPM Automate 36.02. $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$
  • 57. Conclusion CART produces an interpretable model that is more resistant to outliers, predicts future data well, and automatically handles 1. Variable interactions 2. Missing values 3. Nonlinear relationships 4. Local effects Random Forests are fundamentally a combination of individual CART trees and thus inherit all of the advantages of CART above (except the nice interpretation) *Generally superior to a single CART tree in terms of predictive accuracy
  • 58. Next in the series… Improve Your Regression with TreeNet® Gradient Boosting
  • 59. Try CART and Random Forest Download SPM® 8 now to start building CART and Random Forest models on your data We will be more than happy to personally help you if you have any questions or need assistance My Email: charrison@salford-systems.com Support Email: support@salford-systems.com The appendix follows this slide
  • 60. Appendix 1 Standard Error Rule for CART trees
  • 61. 1SE Rule in SPM Optimal Tree 1 Standard Error Rule Tree 1SE Rule Tree: the smallest tree whose error is within one standard deviation of the minimum error Figures: Upper Left: Optimal Tree has 188 terminal nodes and a relative error of .177; Upper Right: the 1SE tree has 85 terminal nodes and a relative error .209 Smaller trees are preferred because they are less likely to overfit the data (i.e. the 1SE tree in this case is competitive in terms of accuracy and is much less complex) and they are easier to interpret Relative Error = (CART model error using either LS or LAD) / (error for predicting the overall average (or median if using LAD) for all records)

Editor's Notes

  1. 63.2% * 300 ≈ 190 (on average, about 63.2% of the original observations appear in a bootstrap sample)