Leonardo Auslender – Copyright 2004, 2018. 7/3/2018
Ensemble models and
Gradient Boosting, part 2.
Leonardo Auslender
Independent Statistical Consultant
Leonardo ‘dot’ Auslender ‘at’
Gmail ‘dot’ com.
Copyright 2018.
2 studies
2.8.b: Raw data, GB without constraints on its
parameters, compared to its friends.
2.8.c: Comparison of methods but focusing on whether
raw vs 50/50 re-sampling makes a difference.
Partial Dependency plots (PDP).
Because GB (and other methods) are black boxes, these plots show the
effect of predictor X on the modeled response once all other predictors
have been marginalized (integrated away). In some implementations the
marginalized predictors are instead fixed at a constant value, typically the mean.
Graphs may not capture the nature of variable interactions, especially if
an interaction significantly affects the model outcome.
Formally, the PDP of F(x1, x2, ..., xp) on X is E(F) over all variables except X. Thus, for
a given value of X, the PDP is the average of the predictions over the training data with X held at that value.
Since GB, Boosting, Bagging, etc. are BLACK BOX models, use PDP to
obtain model interpretation. Also useful for logistic models.
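The averaging in the formal definition can be sketched in a few lines. This is a minimal illustration, not the code behind these slides: `toy_model` is a hypothetical stand-in for a fitted black-box model, and the three training rows are made up.

```python
from statistics import mean

def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite only the predictor of interest; all other predictors
        # keep their observed values, so they are averaged (marginalized) out.
        modified = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(r) for r in modified))
    return curve

# Hypothetical stand-in for a fitted black-box model (an assumption).
def toy_model(row):
    score = 0.1 + 0.15 * row["no_claims"] - 0.0005 * row["member_duration"]
    return min(1.0, max(0.0, score))

# Made-up training rows, for illustration only.
training_rows = [
    {"no_claims": 0, "member_duration": 200},
    {"no_claims": 1, "member_duration": 120},
    {"no_claims": 3, "member_duration": 60},
]

pdp = partial_dependence(toy_model, training_rows, "no_claims", grid=[0, 1, 2, 3])
for value, avg in zip([0, 1, 2, 3], pdp):
    print(f"no_claims={value}: average prediction {avg:.3f}")
```

Note that the original rows are copied, not modified, so the same data can be reused for each predictor in turn.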
Analytical problem to investigate.
Fraud among optical health care insurance patients. Longer care typically involves higher
treatment costs, and the insurance company has to set up reserves as soon as
a case is opened. Sometimes doctors are involved in fraud.
Aim: predict fraudulent charges → classification problem; use a battery of models and
compare them. Below, original data (M1 models). Focus is on comparisons across
models (see earlier chapters for individual model analytics). For brevity's sake, mean
and median ensembles are omitted.
Model Name   Item           Information
M1           TRN data set   train
             TRN num obs    3595
             VAL data set   validata
             VAL num obs    2365
             TST data set
             TST num obs
             Dep. Var       fraud
             TRN % Events   20.389
             VAL % Events   19.281
E.g., 12_M1_VAL_BAGGING: 12th model of the M1 data set case, evaluated on Validation data and using Bagging as the modeling
technique.
Requested Models: Names & Descriptions.
Model #   Full Model Name                  Model Description
-1        *** Overall Models
-10       M1                               Raw 20pct
1         01_M1_NSMBL_TRN_AVG              Ensemble AVG
2         02_M1_NSMBL_TRN_LOGISTIC_NONE    Logistic TRN NONE Ensemble
3         03_M1_NSMBL_TRN_MED              Ensemble MED
4         04_M1_NSMBL_VAL_AVG              Ensemble AVG
5         05_M1_NSMBL_VAL_LOGISTIC_NONE    Logistic VAL NONE Ensemble
6         06_M1_NSMBL_VAL_MED              Ensemble MED
7         07_M1_TRN_BAGGING                Bagging TRN Bagging
8         08_M1_TRN_GRAD_BOOSTING          Gradient Boosting
9         09_M1_TRN_LOGISTIC_STEPWISE      Logistic TRN STEPWISE
10        10_M1_TRN_RFORESTS               Random Forests
11        11_M1_TRN_TREES                  Trees TRN Trees
12        12_M1_VAL_BAGGING                Trees VAL Trees
13        13_M1_VAL_GRAD_BOOSTING          Gradient Boosting
14        14_M1_VAL_LOGISTIC_STEPWISE      Logistic VAL STEPWISE
15        15_M1_VAL_RFORESTS               Random Forests
16        16_M1_VAL_TREES                  Trees VAL Trees
For models other than Trees themselves, a tree was fitted to the modeled posterior
probabilities, treated as an interval-valued target variable (includes
logistic and ensembles).
For simplicity, just the first 2 levels of trees are shown.
Notation: M1_GB_TRN_TREES: Data M1, tree simulation of
Gradient Boosting run (GB). BG: Bagging, RF: Random Forests, LG:
logistic, NSMBL: ensemble.
Intention: obtain a general idea of the tree representation for
comparison to the standard tree model.
Next page: small detail for BG (Bagging), GB (Gradient Boosting) and Trees
themselves. Later, graphical comparison of variables + splits at each tree level.
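The idea of a tree representation can be illustrated with its first level only: choose the split of one predictor that best explains another model's posterior probabilities, treated as an interval target. The sketch below is an assumption-laden toy, not the software used for these slides; the data are made up and `best_split` is a hypothetical helper that uses squared error as the split criterion.

```python
def best_split(xs, probs):
    """Return (threshold, left_mean, right_mean) minimizing squared error."""
    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best = None
    levels = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive observed values.
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2
        left = [p for x, p in zip(xs, probs) if x < t]
        right = [p for x, p in zip(xs, probs) if x >= t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct predictor values")
    return best[1:]

# Made-up data: a predictor and a black-box model's posterior probabilities.
no_claims = [0, 0, 0, 1, 1, 2, 3, 4]
p_model = [0.10, 0.15, 0.12, 0.40, 0.45, 0.50, 0.55, 0.60]

threshold, p_left, p_right = best_split(no_claims, p_model)
print(f"no_claims < {threshold} ({p_left:.3f}); no_claims >= {threshold} ({p_right:.3f})")
```

Because the target is a probability rather than a 0/1 event, the node values are averages of the other model's predictions, which is why they can differ from the event rates of a tree fitted to the raw data.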
Requested Tree Models: Names & Descriptions.
Model Name: M1_BG_TRN_TREES. Each split is followed by its probability in parentheses; level 4 rows end with Pred.
Level 1: no_claims < 0.5 ( 0.142 )
  Level 2: member_duration < 180.5 ( 0.2 )
    Level 3: total_spend < 5250 ( 0.519 )
      Level 4: total_spend >= 5050 ( 0.488 )   Pred 0.488
      Level 4: total_spend < 5050 ( 0.522 )   Pred 0.522
    Level 3: total_spend >= 5250 ( 0.184 )
      Level 4: optom_presc >= 3.5 ( 0.324 )   Pred 0.324
      Level 4: optom_presc < 3.5 ( 0.175 )   Pred 0.175
  Level 2: member_duration >= 180.5 ( 0.062 )
    Level 3: doctor_visits < 5.5 ( 0.102 )
      Level 4: member_duration >= 187.5 ( 0.099 )   Pred 0.099
      Level 4: member_duration < 187.5 ( 0.128 )   Pred 0.128
    Level 3: doctor_visits >= 5.5 ( 0.043 )
      Level 4: member_duration < 189.5 ( 0.067 )   Pred 0.067
      Level 4: member_duration >= 189.5 ( 0.041 )   Pred 0.041
Level 1: no_claims >= 0.5 ( 0.446 )
  Level 2: no_claims < 3.5 ( 0.395 )
    Level 3: member_duration < 127.5 ( 0.534 )
      Level 4: optom_presc >= 0.5 ( 0.583 )   Pred 0.583
      Level 4: optom_presc < 0.5 ( 0.389 )   Pred 0.389
ETC …
Tree representation, level 1.
06 actual Tree. Top splitter is no_claims, but LG splits at 1.5. Note the different event probabilities
(bar heights).
Tree representation, level 2.
RF pursues a different structure search for level 2.
See next slide as well.
Etc. for levels 3 and 4.
Requested ENSEMBLE Tree Models: Names & Descriptions.
Model Name: M1_NSMBL_LG_TRN_TREES (Mod # 4). Each split is followed by its probability in parentheses.
Level 1: p_M1_RFORESTS < 0.32216 ( 0.12 )
  Level 2: p_M1_RFORESTS < 0.20605 ( 0.094 )
    Level 3: p_M1_RFORESTS < 0.13014 ( 0.061 )
      Level 4: p_M1_RFORESTS < 0.09849 ( 0.054 )
      Level 4: p_M1_RFORESTS >= 0.09849 ( 0.076 )
    Level 3: p_M1_RFORESTS >= 0.13014 ( 0.118 )
      Level 4: p_M1_RFORESTS >= 0.17199 ( 0.138 )
      Level 4: p_M1_RFORESTS < 0.17199 ( 0.107 )
  Level 2: p_M1_RFORESTS >= 0.20605 ( 0.25 )
    Level 3: p_M1_RFORESTS < 0.2581 ( 0.208 )
      Level 4: p_M1_BAGGING < 0.16363 ( 0.326 )
      Level 4: p_M1_BAGGING >= 0.16363 ( 0.195 )
    Level 3: p_M1_RFORESTS >= 0.2581 ( 0.308 )
      Level 4: p_M1_TREES >= 0.09642 ( 0.299 )
      Level 4: p_M1_TREES < 0.09642 ( 0.642 )
Level 1: p_M1_RFORESTS >= 0.32216 ( 0.693 )
  Level 2: p_M1_RFORESTS < 0.45694 ( 0.499 )
    Level 3: p_M1_LOGISTIC_STEPWISE < 0.40764 ( 0.602 )
      Level 4: p_M1_RFORESTS >= 0.40264 ( 0.767 )
The M1 ensemble splits mostly on RF; does that mean RF is the best model?
Requested ENSEMBLE Tree Models: Names & Descriptions (continued).
Model Name: M1_NSMBL_LG_TRN_TREES (Mod # 4). Each split is followed by its probability in parentheses.
Level 1: p_M1_RFORESTS >= 0.32216 ( 0.693 )
  Level 2: p_M1_RFORESTS < 0.45694 ( 0.499 )
    Level 3: p_M1_LOGISTIC_STEPWISE < 0.40764 ( 0.602 )
      Level 4: p_M1_RFORESTS < 0.40264 ( 0.535 )
    Level 3: p_M1_LOGISTIC_STEPWISE >= 0.40764 ( 0.38 )
      Level 4: p_M1_RFORESTS >= 0.38749 ( 0.441 )
      Level 4: p_M1_RFORESTS < 0.38749 ( 0.308 )
  Level 2: p_M1_RFORESTS >= 0.45694 ( 0.872 )
    Level 3: p_M1_RFORESTS < 0.61918 ( 0.821 )
      Level 4: p_M1_LOGISTIC_STEPWISE >= 0.59377 ( 0.725 )
      Level 4: p_M1_LOGISTIC_STEPWISE < 0.59377 ( 0.887 )
    Level 3: p_M1_RFORESTS >= 0.61918 ( 0.954 )
      Level 4: p_M1_TREES < 0.92105 ( 0.985 )
      Level 4: p_M1_TREES >= 0.92105 ( 0.903 )
Ensemble for level 1.
Ensemble for level 2.
Conclusion on tree representations I
No_claims at 0.5 is certainly the top splitter for most TREE models, but notice
that event probabilities diverge (because RF, GB and BG model
posterior probability, not a binary event, and thus carry information
from previous models). Later splits diverge in predictors and split
values across models. LG finds a completely different structure and
starts with no_claims at 1.5. Thus, for tree-based models, existence of a
claim raises suspicion of fraud, while logistic requires a higher
threshold.
Ensemble models: mixture of models → typical interpretability from a
single model is doubtful when reality is complex.
Important to view each tree model independently to gauge
interpretability. Note that the ensemble's primary splitter is RF, but RF is not
the best model (it over-fits badly); it is chosen because all methods
minimize misclassification.
It is also important to view these findings in terms of variable
importance and “best” model choice.
Conclusion on tree representations, II
Most importantly, it looks like RF wins, should we stop now?
(Validation results not shown to add to the suspense).
DO NOT RUSH YOUR CONCLUSIONS and keep on reading.
Importance measures for tree-based methods.
Agreement on No_claims by all methods, not so much for other variables.
For GB and BG all predictors matter; RF disparages num_members, and Trees disparages doctor_visits.
Comparing GB and RF, GB allocates more importance to all predictors (other than no_claims),
which implies that the structure found by RF is simpler.
Tree methods find no_claims as most important, logistic finds most predictors important.
Validation results show effects of over-fitting (variable doctor_visits)
Note the almost null standardized RF VAL estimate; the corresponding p-value is insignificant.
Partial Dependency Plots for Logistic and Gradient Boosting non-ensemble models.
Most important variable: similar
shapes in both cases. Note the
“logistic”-like shape of one
and the jagged shape of the
other, plus flatness for >= 5 at
0.8 probability.
Num_members was eliminated
from logistic stepwise. GB shows a
jagged relationship → there
is a strong interaction effect
with other predictors.
Pair-wise PDP for GB, some variables.
Fraud is concentrated on just one or two claims with
lower membership time (two most important vars).
Fraud is concentrated on a smaller number of members and a higher
number of claims.
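A pair-wise PDP uses the same averaging as the single-variable case, with two predictors forced at once and all others left at their observed values. The sketch below is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a fitted GB model, the rows are made up, and the high-probability region exists only because the toy model was built that way.

```python
from statistics import mean

def partial_dependence_2d(predict, rows, feat_a, grid_a, feat_b, grid_b):
    """Return surface[i][j]: average prediction with feat_a=grid_a[i], feat_b=grid_b[j]."""
    surface = []
    for a in grid_a:
        surface.append([
            # Force both predictors; the remaining ones are averaged out.
            mean(predict({**row, feat_a: a, feat_b: b}) for row in rows)
            for b in grid_b
        ])
    return surface

# Hypothetical stand-in for a fitted model with an interaction (an assumption).
def toy_model(row):
    risky = 1 <= row["no_claims"] <= 2 and row["member_duration"] < 150
    return (0.6 if risky else 0.1) + (0.05 if row["doctor_visits"] > 5 else 0.0)

# Made-up rows, for illustration only.
rows = [
    {"no_claims": 0, "member_duration": 200, "doctor_visits": 2},
    {"no_claims": 1, "member_duration": 120, "doctor_visits": 6},
    {"no_claims": 3, "member_duration": 60, "doctor_visits": 8},
]

claims_grid = [0, 1, 2, 3]
duration_grid = [100, 200]
surface = partial_dependence_2d(
    toy_model, rows, "no_claims", claims_grid, "member_duration", duration_grid
)
for a, line in zip(claims_grid, surface):
    print(a, [round(v, 3) for v in line])
```

A contour plot of such a surface is what lets the reader focus on the ranges where the two predictors jointly matter.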
Model # 4 (RF) seems best at fitting the event probability for TRN once other predictors’ effects are
marginalized away, but VAL results point to GB instead.
Conclusions on PDPs
1) From the Ensemble PDPs, it is obvious that RF fails in
validation. All the ensemble power rests strongly on GB
and on logistic with a downward slope.
2) Individual variable PDPs show a uniform relationship for
variables in logistic, while GB shows fuzzy and nonlinear
structures.
3) The contour plots for pairs of variables (GB) allow focusing
on ranges of importance. For instance, No_claims
and Member_duration concentrate important information
at low levels of their respective ranges.
4) Still, it is not possible (at present) to obtain simple
interpretable graphs to understand the full complexity of GB
models. Logistic models are easier to understand, though not fully easy.
Tree based methods do not necessarily reach top probability of 1 and lowest of 0.
Not over-fitted.
Some strong over-fit.
Over-fit degree differs from that
in classification rates (previous slide).
Note that TRN and VAL ranks do not match. Models ranked lower in VAL tend to over-fit more.
GOF ranks, TRN (rank per GOF measure)
Model Name                      AUROC  Avg Square Error  Cum Lift 3rd bin  Cum Resp Rate 3rd  Gini  Rsquare Cramer Tjur  Unw. Mean  Unw. Median
01_M1_NSMBL_TRN_LOGISTIC_NONE   1      1                 1                 1                  1     1                    1.00       1.00
03_M1_TRN_BAGGING               4      4                 4                 4                  4     4                    4.00       4.00
04_M1_TRN_GRAD_BOOSTING         3      2                 3                 3                  3     3                    2.83       3.00
05_M1_TRN_LOGISTIC_STEPWISE     6      6                 6                 6                  6     6                    6.00       6.00
06_M1_TRN_RFORESTS              2      3                 2                 2                  2     5                    2.67       2.00
07_M1_TRN_TREES                 5      5                 5                 5                  5     2                    4.50       5.00
GOF ranks, VAL (rank per GOF measure)
Model Name                      AUROC  Avg Square Error  Cum Lift 3rd bin  Cum Resp Rate 3rd  Gini  Rsquare Cramer Tjur  Unw. Mean  Unw. Median
02_M1_NSMBL_VAL_LOGISTIC_NONE   1      1                 2                 2                  1     1                    1.33       1.00
08_M1_VAL_BAGGING               5      6                 5                 5                  5     5                    5.17       5.00
09_M1_VAL_GRAD_BOOSTING         2      2                 1                 1                  2     2                    1.67       2.00
10_M1_VAL_LOGISTIC_STEPWISE     3      3                 4                 4                  3     4                    3.50       3.50
11_M1_VAL_RFORESTS              6      5                 6                 6                  6     6                    5.83       6.00
12_M1_VAL_TREES                 4      4                 3                 3                  4     3                    3.50       3.50
Based on this methodology, the ensemble is the winner and GB the single best model. Alternative selection methods for
best models depend on the user.
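The unweighted mean and median of ranks can be reproduced directly from the VAL rank table. The sketch below uses the six VAL ranks per model as listed in that table; the variable names are illustrative, and breaking ties by the median is an assumption made here, not something stated in the slides.

```python
from statistics import mean, median

# Ranks per model on VAL, in table order: AUROC, Avg Square Error,
# Cum Lift 3rd bin, Cum Resp Rate 3rd, Gini, Rsquare Cramer Tjur.
val_ranks = {
    "02_M1_NSMBL_VAL_LOGISTIC_NONE": [1, 1, 2, 2, 1, 1],
    "08_M1_VAL_BAGGING": [5, 6, 5, 5, 5, 5],
    "09_M1_VAL_GRAD_BOOSTING": [2, 2, 1, 1, 2, 2],
    "10_M1_VAL_LOGISTIC_STEPWISE": [3, 3, 4, 4, 3, 4],
    "11_M1_VAL_RFORESTS": [6, 5, 6, 6, 6, 6],
    "12_M1_VAL_TREES": [4, 4, 3, 3, 4, 3],
}

# Unweighted mean and median rank per model (lower is better).
summary = {name: (round(mean(r), 2), median(r)) for name, r in val_ranks.items()}

# Order by mean rank, then by median rank as a tie-breaker (an assumption).
ordering = sorted(summary, key=lambda name: summary[name])

for name in ordering:
    avg, med = summary[name]
    print(f"{name:32s} mean={avg:.2f} median={med:.2f}")
```

Any weighting of the measures, or a different tie-breaking rule, would be a user choice and could change the ordering of the middle models.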
ETC.
Ensembles have good performance and no over-fitting.
Conclusions
At least for the present defaults of RF in this presentation, it
has badly over-fitted. The best overall model is the
ensemble and the best single model is Gradient
Boosting.
The user should decide which metric to use for judging
goodness. Here, a simple unweighted ranking of the six GOF
measures was used.
Since there was no financial information, models could not
be measured in terms of profits. The K-S chart (not
recommended) shows different cut-off points per model.
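To see why a K-S chart yields a different cut-off per model, the statistic can be computed as the largest gap between the cumulative event and non-event rates over all score cut-offs. This is a minimal sketch with made-up labels and scores for two hypothetical models; it is not the computation behind the slides.

```python
def ks_statistic(y_true, scores):
    """Return (ks, cutoff): largest gap between event and non-event capture rates."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_ks, best_cut = 0.0, None
    for cut in sorted(set(scores), reverse=True):
        # Share of events and of non-events scored at or above the cut-off.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cut and y == 0)
        gap = tp / n_pos - fp / n_neg
        if gap > best_ks:
            best_ks, best_cut = gap, cut
    return best_ks, best_cut

# Made-up labels (1 = fraud) and scores from two hypothetical models.
y = [1, 1, 0, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
scores_b = [0.5, 0.45, 0.2, 0.4, 0.3, 0.25, 0.1, 0.05]

ks_a, cut_a = ks_statistic(y, scores_a)
ks_b, cut_b = ks_statistic(y, scores_b)
print(f"model A: KS={ks_a:.2f} at cut-off {cut_a}")
print(f"model B: KS={ks_b:.2f} at cut-off {cut_b}")
```

The two toy models reach their maximum separation at different score values, which is the behavior the slide attributes to the K-S chart.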
for now

 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 

4 2 ensemble models and grad boost part 2 2019-10-07

  • 1. Leonardo Auslender, Copyright 2004; Leonardo Auslender, Copyright 2018. 7/3/2018. Ensemble models and Gradient Boosting, part 2. Leonardo Auslender, Independent Statistical Consultant. Leonardo ‘dot’ Auslender ‘at’ Gmail ‘dot’ com. Copyright 2018.
  • 2. 2 studies. 2.8.b: Raw data, GB without constraints on its parameters, compared to its friends. 2.8.c: Comparison of methods, focusing on whether raw vs. 50/50 re-sampling makes a difference.
  • 4. Partial Dependency Plots (PDP). Because GB (and other methods) are black boxes, these plots show the effect of predictor X on the modeled response once all other predictors have been marginalized (integrated away). In practice, the marginalized predictors are often simply fixed at a constant value, typically the mean. The graphs may not capture the nature of variable interactions, especially if an interaction significantly affects the model outcome. Formally, the PDP of F(x1, x2, ..., xp) on X is E(F) taken over all variables except X. Thus, for given values Xs, the PDP is the average of the predictions over the training data with Xs kept constant. Since GB, Boosting, Bagging, etc. are BLACK BOX models, use PDPs to obtain model interpretation. They are also useful for logistic models.
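The averaging definition of the PDP given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_model`, `train_rows` and the grid values are hypothetical stand-ins, not the GB model or the fraud data from the slides; only the variable names `no_claims` and `member_duration` are borrowed from the deck.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """For each grid value, force `feature` to that value in every training
    row, predict, and average the predictions (the PDP definition)."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(mean(preds))
    return curve


# Hypothetical stand-in for a fitted black-box model (assumption, not the
# model from the slides): probability rises with no_claims up to 5 claims.
def toy_model(row):
    p = 0.1 + 0.15 * min(row["no_claims"], 5)
    if row["member_duration"] < 180:
        p += 0.1
    return p


# Hypothetical training rows (assumption).
train_rows = [
    {"no_claims": 0, "member_duration": 100},
    {"no_claims": 1, "member_duration": 200},
    {"no_claims": 3, "member_duration": 150},
    {"no_claims": 6, "member_duration": 250},
]

pdp_no_claims = partial_dependence(
    toy_model, train_rows, "no_claims", [0, 1, 2, 5, 8]
)
print(pdp_no_claims)
```

Note that this averages over the observed rows rather than fixing the other predictors at their means; the two shortcuts agree only when the model is additive in the marginalized predictors.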
  • 6. Analytical problem to investigate. Fraud among optical health care insurance patients. Longer care typically involves higher treatment costs, and the insurance company has to set up reserves as soon as a case is opened. Sometimes doctors are involved in fraud. Aim: predict fraudulent charges → classification problem; use a battery of models and compare them. Below left, original data (M1 models). Focus is on comparisons across models (see earlier chapters for individual model analytics). For brevity's sake, mean and median ensembles are omitted.
      Model Name   Item           Information
      M1           TRN data set   train
                   TRN num obs    3595
                   VAL data set   validata
                   VAL num obs    2365
                   TST data set
                   TST num obs
                   Dep. Var       fraud
                   TRN % Events   20.389
                   VAL % Events   19.281
  • 7. E.g., 08_M1_VAL_BAGGING: 8th model of the M1 data set case, on Validation data, using Bagging as the modeling technique. Requested Models: Names & Descriptions.
      Model #  Full Model Name                 Model Description
      -1       ***                             Overall Models
      -10      M1                              Raw 20pct
      1        01_M1_NSMBL_TRN_AVG             Ensemble AVG
      2        02_M1_NSMBL_TRN_LOGISTIC_NONE   Logistic TRN NONE Ensemble
      3        03_M1_NSMBL_TRN_MED             Ensemble MED
      4        04_M1_NSMBL_VAL_AVG             Ensemble AVG
      5        05_M1_NSMBL_VAL_LOGISTIC_NONE   Logistic VAL NONE Ensemble
      6        06_M1_NSMBL_VAL_MED             Ensemble MED
      7        07_M1_TRN_BAGGING               Bagging TRN Bagging
      8        08_M1_TRN_GRAD_BOOSTING         Gradient Boosting
      9        09_M1_TRN_LOGISTIC_STEPWISE     Logistic TRN STEPWISE
      10       10_M1_TRN_RFORESTS              Random Forests
      11       11_M1_TRN_TREES                 Trees TRN Trees
      12       12_M1_VAL_BAGGING               Trees VAL Trees
      13       13_M1_VAL_GRAD_BOOSTING         Gradient Boosting
      14       14_M1_VAL_LOGISTIC_STEPWISE     Logistic VAL STEPWISE
      15       15_M1_VAL_RFORESTS              Random Forests
      16       16_M1_VAL_TREES                 Trees VAL Trees
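As a rough illustration of what the "Ensemble AVG" and "Ensemble MED" entries in the list above could compute, the sketch below combines per-model posterior probabilities observation by observation. It assumes the ensembles are plain row-wise means and medians of the component models' probabilities; the probability values are hypothetical, and only the `p_M1_...` names are taken from later slides. The logistic ensemble (a logistic model fitted on the component predictions) is not sketched here.

```python
from statistics import mean, median

# Hypothetical posterior probabilities for three observations (assumption);
# one list per component model.
model_probs = {
    "p_M1_BAGGING": [0.10, 0.40, 0.90],
    "p_M1_RFORESTS": [0.00, 0.30, 1.00],
    "p_M1_TREES": [0.10, 0.60, 0.30],
    "p_M1_LOGISTIC_STEPWISE": [0.20, 0.50, 0.80],
}


def combine(probs_by_model, how):
    """Apply `how` (e.g. mean or median) across models, per observation."""
    per_observation = zip(*probs_by_model.values())
    return [how(list(obs)) for obs in per_observation]


ens_avg = combine(model_probs, mean)
ens_med = combine(model_probs, median)
print(ens_avg, ens_med)
```

On the third observation one model disagrees strongly with the rest; the median ensemble is pulled less by that single value than the average is.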
  • 8. For models other than Trees themselves, a tree was fitted to the model's posterior probabilities, treated as an interval-valued target variable (this includes logistic and ensembles). For simplicity, just the first 2 levels of the trees are shown. Notation: M1_GB_TRN_TREES: data M1, tree simulation of the Gradient Boosting run (GB). BG: Bagging, RF: Random Forests, LG: logistic, NSMBL: ensemble. Intention: obtain a general idea of the tree representation for comparison with the standard tree model. Next page: small detail for BG (Bagging), GB (Gradient Boosting) and Trees themselves. Later, graphical comparison of variables and splits at each tree level.
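The "tree simulation" idea above, fitting a tree to another model's posterior probabilities, can be sketched for a single level as a one-split regression stump chosen by least squares. This is a minimal sketch under assumptions: the splitting criterion (sum of squared errors around leaf means) and the data are hypothetical choices for illustration and are not confirmed by the slides.

```python
def best_split(x, p):
    """Return (threshold, left_mean, right_mean) for the single split on x
    that minimizes the squared error of p around the two leaf means."""
    best = None
    values = sorted(set(x))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [pi for xi, pi in zip(x, p) if xi < thr]
        right = [pi for xi, pi in zip(x, p) if xi >= thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((pi - left_mean) ** 2 for pi in left)
        sse += sum((pi - right_mean) ** 2 for pi in right)
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1:]


# Hypothetical predictor values and black-box posterior probabilities
# (assumption; not the data from the slides).
no_claims = [0, 0, 0, 1, 1, 2, 3, 4]
p_model = [0.10, 0.15, 0.12, 0.40, 0.45, 0.50, 0.55, 0.60]

threshold, left_mean, right_mean = best_split(no_claims, p_model)
print(threshold, left_mean, right_mean)
```

Applying the same search recursively inside each leaf would give the deeper levels; the target here is a probability, not the binary event, which is why leaf values differ from those of a tree fitted directly to the fraud indicator.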
  • 10. Requested Tree Models: Names & Descriptions. Model M1_BG_TRN_TREES. Each node shows the split and, in parentheses, the event probability; indentation gives levels 1 to 4, and the level-4 probability is the prediction (Pred).
      no_claims < 0.5 (0.142)
        member_duration < 180.5 (0.2)
          total_spend < 5250 (0.519)
            total_spend >= 5050 (0.488)
            total_spend < 5050 (0.522)
          total_spend >= 5250 (0.184)
            optom_presc >= 3.5 (0.324)
            optom_presc < 3.5 (0.175)
        member_duration >= 180.5 (0.062)
          doctor_visits < 5.5 (0.102)
            member_duration >= 187.5 (0.099)
            member_duration < 187.5 (0.128)
          doctor_visits >= 5.5 (0.043)
            member_duration < 189.5 (0.067)
            member_duration >= 189.5 (0.041)
      no_claims >= 0.5 (0.446)
        no_claims < 3.5 (0.395)
          member_duration < 127.5 (0.534)
            optom_presc >= 0.5 (0.583)
            optom_presc < 0.5 (0.389)
      ETC …
  • 11. Tree representation, level 1.
  • 12. 06: actual Tree. The top splitter is no_claims, but LG splits at 1.5. Note the different event probabilities (bar heights).
  • 13. Tree representation, level 2.
  • 14. RF pursues a different structure search for level 2. See the next slide as well.
  • 17. Etc., for levels 3 and 4.
  • 19. Requested ENSEMBLE Tree Models: Names & Descriptions. Mod # 4, model M1_NSMBL_LG_TRN_TREES. Each node shows the split and, in parentheses, the event probability; indentation gives levels 1 to 4.
      p_M1_RFORESTS < 0.32216 (0.12)
        p_M1_RFORESTS < 0.20605 (0.094)
          p_M1_RFORESTS < 0.13014 (0.061)
            p_M1_RFORESTS < 0.09849 (0.054)
            p_M1_RFORESTS >= 0.09849 (0.076)
          p_M1_RFORESTS >= 0.13014 (0.118)
            p_M1_RFORESTS >= 0.17199 (0.138)
            p_M1_RFORESTS < 0.17199 (0.107)
        p_M1_RFORESTS >= 0.20605 (0.25)
          p_M1_RFORESTS < 0.2581 (0.208)
            p_M1_BAGGING < 0.16363 (0.326)
            p_M1_BAGGING >= 0.16363 (0.195)
          p_M1_RFORESTS >= 0.2581 (0.308)
            p_M1_TREES >= 0.09642 (0.299)
            p_M1_TREES < 0.09642 (0.642)
      p_M1_RFORESTS >= 0.32216 (0.693)
        p_M1_RFORESTS < 0.45694 (0.499)
          p_M1_LOGISTIC_STEPWISE < 0.40764 (0.602)
            p_M1_RFORESTS >= 0.40264 (0.767)
  • 20. The M1 ensemble splits mostly on RF; does that mean that RF is the best model? Requested ENSEMBLE Tree Models: Names & Descriptions. Mod # 4, model M1_NSMBL_LG_TRN_TREES. Each node shows the split and, in parentheses, the event probability; indentation gives levels 1 to 4.
      p_M1_RFORESTS >= 0.32216 (0.693)
        p_M1_RFORESTS < 0.45694 (0.499)
          p_M1_LOGISTIC_STEPWISE < 0.40764 (0.602)
            p_M1_RFORESTS < 0.40264 (0.535)
          p_M1_LOGISTIC_STEPWISE >= 0.40764 (0.38)
            p_M1_RFORESTS >= 0.38749 (0.441)
            p_M1_RFORESTS < 0.38749 (0.308)
        p_M1_RFORESTS >= 0.45694 (0.872)
          p_M1_RFORESTS < 0.61918 (0.821)
            p_M1_LOGISTIC_STEPWISE >= 0.59377 (0.725)
            p_M1_LOGISTIC_STEPWISE < 0.59377 (0.887)
          p_M1_RFORESTS >= 0.61918 (0.954)
            p_M1_TREES < 0.92105 (0.985)
            p_M1_TREES >= 0.92105 (0.903)
  • 21. Ensemble for level 1.
  • 23. Ensemble for level 2.
  • 25. Conclusion on tree representations, I. No_claims at 0.5 is certainly the top splitter for most TREE models, but notice that the event probabilities diverge (because the RF, GB and BG representations model a posterior probability, not a binary event, and thus carry information from the previous models). Later splits diverge in predictors and split values across models. LG finds a completely different structure and starts with no_claims at 1.5. Thus, for tree-based models, the existence of a claim raises a suspicion of fraud, while logistic requires a higher threshold. Ensemble models: mixture of models → the typical interpretability of a single model is doubtful when reality is complex. It is important to view each tree model independently to gauge interpretability. Note that the ensemble's primary splitter is RF, yet RF is not the best model (it over-fits badly); it is chosen because all methods minimize misclassification. It is also important to view these findings in terms of variable importance and “best” model choice.
  • 26. Conclusion on tree representations, II. Most importantly, it looks like RF wins; should we stop now? (Validation results are not shown, to add to the suspense.) DO NOT RUSH YOUR CONCLUSIONS and keep on reading.
  • 27. Importance Measures for Tree-based Methods.
  • 28. All methods agree on no_claims, not so much on the other variables.
  • 29. For GB and BG all predictors matter; RF disparages num_members, and Trees doctor_visits. Comparing GB and RF, GB allocates more importance to all predictors (other than no_claims) than the other methods do, which implies that the structure found by RF is simpler.
  • 31. Tree methods find no_claims most important; logistic finds most predictors important. Validation results show the effects of over-fitting (variable doctor_visits).
  • 32. Note the almost null standardized RF VAL estimate → the corresponding p-value is insignificant.
  • 34. Partial Dependency Plots for Logistic and Gradient Boosting Non-Ensemble Models.
  • 35. Most important variable: similar shapes in both cases. Note the “logistic”-like shape of one and the jagged shape of the other, plus flatness for >= 5 at 0.8 probability.
  • 36. Num_members was eliminated from the logistic stepwise model. GB jagged relationship → there is a strong interaction effect with other predictors.
  • 37. Pair-wise PDP for GB, some variables.
  • 38. Fraud is concentrated on just one or two claims with lower membership time (the two most important variables).
  • 39. Fraud is concentrated on a smaller number of members and a higher number of claims.
  • 41. Model # 4 (RF) seems best at fitting the event probability for TRN once the other predictors' effects are marginalized away, but VAL results point to GB instead.
  • 43. Conclusions on PDPs. 1) From the ensemble PDPs, it is obvious that RF fails in validation. All the ensemble power rests strongly on GB, and on logistic with a downward slope. 2) Individual-variable PDPs show a uniform relationship for the variables in logistic, while GB shows fuzzy and nonlinear structures. 3) The contour plots for pairs of variables (GB) allow one to focus on ranges of importance. For instance, No_claims and Member_duration concentrate important information at the low levels of their respective ranges. 4) Still, it is not possible (at present) to obtain simple interpretable graphs for understanding the full complexity of GB models. Logistic models are easier to understand, though not fully easy.
  • 45. Tree-based methods do not necessarily reach a top probability of 1 or a lowest probability of 0.
  • 48. Not over-fitted. Some strong over-fit.
  • 49. The degree of over-fit differs from that in the classification rates (previous slide).
  • 51. Note that TRN and VAL ranks do not match. Models ranked lower in VAL tend to over-fit more.
      GOF ranks (TRN)
      Model Name                     AUROC  Avg Square Error  Cum Lift 3rd bin  Cum Resp Rate 3rd  Gini  Rsquare Cramer Tjur  Unw. Mean  Unw. Median
      01_M1_NSMBL_TRN_LOGISTIC_NONE  1      1                 1                 1                  1     1                    1.00       1.00
      03_M1_TRN_BAGGING              4      4                 4                 4                  4     4                    4.00       4.00
      04_M1_TRN_GRAD_BOOSTING        3      2                 3                 3                  3     3                    2.83       3.00
      05_M1_TRN_LOGISTIC_STEPWISE    6      6                 6                 6                  6     6                    6.00       6.00
      06_M1_TRN_RFORESTS             2      3                 2                 2                  2     5                    2.67       2.00
      07_M1_TRN_TREES                5      5                 5                 5                  5     2                    4.50       5.00
      GOF ranks (VAL)
      Model Name                     AUROC  Avg Square Error  Cum Lift 3rd bin  Cum Resp Rate 3rd  Gini  Rsquare Cramer Tjur  Unw. Mean  Unw. Median
      02_M1_NSMBL_VAL_LOGISTIC_NONE  1      1                 2                 2                  1     1                    1.33       1.00
      08_M1_VAL_BAGGING              5      6                 5                 5                  5     5                    5.17       5.00
      09_M1_VAL_GRAD_BOOSTING        2      2                 1                 1                  2     2                    1.67       2.00
      10_M1_VAL_LOGISTIC_STEPWISE    3      3                 4                 4                  3     4                    3.50       3.50
      11_M1_VAL_RFORESTS             6      5                 6                 6                  6     6                    5.83       6.00
      12_M1_VAL_TREES                4      4                 3                 3                  4     3                    3.50       3.50
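The unweighted mean and median columns of the rank table above can be reproduced with a short Python sketch. The rank values are copied from the VAL rows of the slide; treating the six per-measure ranks as equally weighted and sorting by (mean, median) is an assumption about how the final ordering was obtained.

```python
from statistics import mean, median

# VAL ranks from the slide, in column order: AUROC, Avg Square Error,
# Cum Lift 3rd bin, Cum Resp Rate 3rd, Gini, Rsquare Cramer Tjur.
val_ranks = {
    "02_M1_NSMBL_VAL_LOGISTIC_NONE": [1, 1, 2, 2, 1, 1],
    "08_M1_VAL_BAGGING": [5, 6, 5, 5, 5, 5],
    "09_M1_VAL_GRAD_BOOSTING": [2, 2, 1, 1, 2, 2],
    "10_M1_VAL_LOGISTIC_STEPWISE": [3, 3, 4, 4, 3, 4],
    "11_M1_VAL_RFORESTS": [6, 5, 6, 6, 6, 6],
    "12_M1_VAL_TREES": [4, 4, 3, 3, 4, 3],
}

# Unweighted mean (rounded to 2 decimals, as in the table) and median.
summary = {
    name: (round(mean(ranks), 2), median(ranks))
    for name, ranks in val_ranks.items()
}

# Assumed ordering rule: smaller mean rank first, median as tie-breaker.
ordering = sorted(summary, key=lambda name: summary[name])

for name in ordering:
    print(name, summary[name])
```

Under this rule the logistic ensemble comes first and Gradient Boosting is the best single model, while Random Forests comes last on validation.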
  • 52. Based on this methodology, the ensemble is the winner, and GB is the single best model. Alternative selection methods for best models are user-dependent.
  • 56. ETC.
  • 59. Ensembles have good performance and no over-fitting.
  • 61. Conclusions. At least with the RF defaults used in this presentation, RF has badly over-fitted. The best overall model is the ensemble, and the best single model is Gradient Boosting. The user should decide which metric to use for judging goodness. Here, a simple unweighted ranking of 5 measures was used. Since there was no financial information, models could not be measured in terms of profits. The K-S chart (not recommended) shows different cut-off points per model.
  • 62. For now.