Leonardo Auslender Copyright 2004
Leonardo Auslender – Copyright 2018 1
10/7/2019
Ensemble models and
Gradient Boosting, part 2.
Leonardo Auslender
Independent Statistical Consultant
Leonardo ‘dot’ Auslender ‘at’
Gmail ‘dot’ com.
Copyright 2018.
2 studies
2.8.b: Raw data, GB without constraints on its
parameters, compared to its friends.
2.8.c: Comparison of methods but focusing on whether
raw vs 50/50 re-sampling makes a difference.
Partial Dependency plots (PDP).
Due to the black-box nature of GB (and other methods), we need tools to
study model structure: these plots show the mean predicted Y score for each
value of predictor X, with each value matched to the entire data set of the
remaining predictors.
Graphs may not capture the nature of variable interactions, especially if
an interaction significantly affects the model outcome.
Formally, the PDP of F(x1, x2, ..., xp) on Xs is E(F) over all variables
except Xs. Thus, for given Xs, the PDP is the average of predictions over
the training data with Xs kept constant.
Since GB, Boosting, Bagging, etc. are BLACK BOX models, use PDPs to
obtain model interpretation. Also useful for logistic models.
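The averaging described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used for the plots in this deck: `toy_model`, `toy_rows` and the grid are hypothetical stand-ins for a fitted black-box model and its training data.

```python
from statistics import mean

def partial_dependence(predict, rows, feature, grid):
    """Average prediction with `feature` forced to each grid value.

    predict: callable taking a dict of predictor values and returning a score.
    rows:    list of dicts, one per training observation.
    """
    curve = []
    for value in grid:
        # Overwrite the chosen predictor in every row; keep the others as observed.
        scores = [predict({**row, feature: value}) for row in rows]
        curve.append((value, mean(scores)))
    return curve

# Hypothetical stand-in for a fitted black-box model (assumption, not the deck's GB).
def toy_model(row):
    return min(1.0, 0.1 + 0.15 * row["no_claims"]
               + 0.001 * (200 - row["member_duration"]))

# Hypothetical training rows (assumption).
toy_rows = [
    {"no_claims": 0, "member_duration": 200},
    {"no_claims": 1, "member_duration": 150},
    {"no_claims": 3, "member_duration": 100},
]

pdp_no_claims = partial_dependence(toy_model, toy_rows, "no_claims", [0, 1, 2])
```

Each point of `pdp_no_claims` pairs a grid value with the mean score over all rows, which is what a one-variable PDP plots.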
Modifications of Partial Dependency plots (PDP).
In a PDP, each value of X (or pair of X's) is matched to each observation of
the complementary Xs, scores (predicted Ys) are obtained and then averaged
for each X value. It is known that if predictors are correlated, PDPs are not
informative. Ergo, partial out effects as in the following possible options:
1) Obtain q1 and q3, in addition to the avg value, to verify stability of the average.
2) In like manner as in linear regression, create models from selected
variables but with fully orthogonalized predictors (method proposed by
yours truly). Could be called Partialized PDP, or PPDP.
3) When obtaining 3d PDPs for pairs of variables, obtain Marginal
PDPs, i.e., avg probability of each var2 point along the var1 range.
Reason is that 3d plots typically extrapolate in low density areas ➔
misleading local curves are possible.
4) I(ndividual) C(onditional) E(xpectations) plot: PDP plots for
individual observations or groups of observations. Grouping defined by user:
quantiles of posterior prob, clustering solution, levels of a given
variable, specific individuals, etc.
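Options 1 and 4 above can be sketched together: ICE curves are the per-observation predictions before averaging, and q1/q3 across those curves show how stable the average is. The sketch below is an assumption-laden illustration; `toy_model`, `toy_rows` and the grid are hypothetical, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`.

```python
from statistics import mean, quantiles

def ice_curves(predict, rows, feature, grid):
    """One curve per observation: predictions as `feature` sweeps the grid."""
    return [[predict({**row, feature: v}) for v in grid] for row in rows]

def pdp_with_quartiles(curves):
    """Per grid point: (q1, mean, q3) across the individual curves."""
    summary = []
    for scores in zip(*curves):  # all observations' scores at one grid value
        q1, _, q3 = quantiles(scores, n=4, method="inclusive")
        summary.append((q1, mean(scores), q3))
    return summary

# Hypothetical model with an interaction: claims matter more for short memberships.
def toy_model(row):
    weight = 0.15 if row["member_duration"] < 180 else 0.03
    return 0.05 + weight * row["no_claims"]

# Hypothetical observations (assumption).
toy_rows = [
    {"no_claims": 0, "member_duration": 100},
    {"no_claims": 1, "member_duration": 150},
    {"no_claims": 2, "member_duration": 200},
    {"no_claims": 0, "member_duration": 250},
]

curves = ice_curves(toy_model, toy_rows, "no_claims", [0, 1, 2, 3])
summary = pdp_with_quartiles(curves)
```

A widening gap between q1 and q3 along the grid signals that the average hides diverging individual curves, which is the situation where a plain PDP is least informative.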
Analytical problem to investigate.
Optical health care insurance fraud among patients. Longer care typically involves higher
treatment costs, and the insurance company has to set up reserves as soon as
a case is opened. Sometimes doctors are involved in fraud.
Aim: predict fraudulent charges ➔ classification problem; use a battery of models and
compare them. Below left, original data (M1 models). Focus is on comparisons across
models (see earlier chapters for individual model analytics). For brevity's sake, mean
and median ensembles are omitted.
Model Name   Item            Information
M1           TRN data set    train
             TRN num obs     3595
             VAL data set    validata
             VAL num obs     2365
             TST data set
             TST num obs
             Dep. Var        fraud
             TRN % Events    20.389
             VAL % Events    19.281
Naming convention, e.g., 08_M1_VAL_BAGGING: model number 08, M1 data set case, Validation (VAL) data, with
Bagging as the modeling technique.
Requested Models: Names & Descriptions.

Model #   Full Model Name                  Model Description
-1        ***                              Overall Models
-10       M1                               20 pct prior
1         01_M1_GB_TRN_TREES               Tree Repr. for Gradient Boosting
2         02_M1_LG_TRN_TREES               Tree Repr. of Logistic STEPWISE
3         03_M1_NSMBL_LG_TRN_TREES         Tree Repr. of Logistic NONE
4         04_M1_TRN_BAGGING                Bagging TRN Bagging TRN
5         05_M1_TRN_GRAD_BOOSTING          Gradient Boosting
6         06_M1_TRN_LOGISTIC_NONE_NSMBL    Logistic TRN NONE TRN ENSEMBLE
7         07_M1_TRN_LOGISTIC_STEPWISE      Logistic TRN STEPWISE TRN
8         08_M1_TRN_NSMBL_AVG              Ensemble AVG
9         09_M1_TRN_NSMBL_MED              Ensemble MED
10        10_M1_TRN_RFORESTS               Random Forests
11        11_M1_TRN_TREES                  Trees TRN Trees TRN
12        12_M1_VAL_BAGGING                Trees VAL Trees VAL
13        13_M1_VAL_GRAD_BOOSTING          Gradient Boosting
14        14_M1_VAL_LOGISTIC_NONE_NSMBL    Logistic VAL NONE VAL ENSEMBLE
15        15_M1_VAL_LOGISTIC_STEPWISE      Logistic VAL STEPWISE VAL
16        16_M1_VAL_NSMBL_AVG              Ensemble AVG
17        17_M1_VAL_NSMBL_MED              Ensemble MED
18        18_M1_VAL_RFORESTS               Random Forests
19        19_M1_VAL_TREES                  Trees VAL Trees VAL
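The Ensemble AVG and Ensemble MED entries can be pictured as case-by-case combinations of the component models' posterior probabilities. The sketch below assumes exactly that (a row-wise mean and median); the deck does not spell out the construction, and the scores and the name `p_M1_GRAD_BOOSTING` are hypothetical.

```python
from statistics import mean, median

# Hypothetical posterior probabilities from three component models for three cases.
component_scores = {
    "p_M1_GRAD_BOOSTING": [0.10, 0.60, 0.85],      # hypothetical name and values
    "p_M1_RFORESTS": [0.05, 0.70, 0.95],           # hypothetical values
    "p_M1_LOGISTIC_STEPWISE": [0.30, 0.40, 0.60],  # hypothetical values
}

def combine(scores_by_model, reducer):
    """Apply `reducer` to each case's scores across all component models."""
    per_case = zip(*scores_by_model.values())
    return [reducer(case) for case in per_case]

nsmbl_avg = combine(component_scores, mean)    # mean ensemble
nsmbl_med = combine(component_scores, median)  # median ensemble
```

The median version is less sensitive to one component model producing an extreme probability, which matters when a single over-fitted model is part of the mix.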
For models other than Trees themselves, the posterior probabilities were
modeled by a tree via an interval-valued target variable (this includes
logistic and ensembles).
For simplicity, just the first 2 levels of trees are shown.
Notation: M1_GB_TRN_TREES: Data M1, Tree simulation of
Gradient Boosting run (GB). BG: Bagging, RF: Random Forests, LG:
logistic, NSMBL: ensemble.
Intention: obtain a general idea of the tree representation for
comparison to the standard tree model.
Next page: small detail for BG (Bagging), GB (Gradient Boosting) and Trees
themselves. Later, graphical comparison of vars + splits at each tree level.
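To make the idea of a tree representation concrete, the sketch below finds the first split of a regression tree fitted to a black-box model's posterior probabilities (an interval-valued target). It is a simplified illustration under stated assumptions: one predictor, squared-error splitting, and hypothetical data; the deck's own tree software and settings are not shown in the source.

```python
def best_split(values, probs):
    """Threshold on one predictor minimizing squared error around the child means."""
    def sse(group):
        if not group:
            return 0.0
        m = sum(group) / len(group)
        return sum((p - m) ** 2 for p in group)

    levels = sorted(set(values))
    best = None
    for lo, hi in zip(levels, levels[1:]):
        cut = (lo + hi) / 2  # midpoint between adjacent observed levels
        left = [p for v, p in zip(values, probs) if v < cut]
        right = [p for v, p in zip(values, probs) if v >= cut]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, cut, sum(left) / len(left), sum(right) / len(right))
    return best[1:]  # (cut, mean probability left, mean probability right)

# Hypothetical data: number of claims and a black-box model's posterior probability.
no_claims = [0, 0, 0, 1, 1, 2, 3, 5]
posterior = [0.10, 0.12, 0.15, 0.40, 0.38, 0.45, 0.60, 0.80]

cut, p_left, p_right = best_split(no_claims, posterior)
```

Applying the same search recursively inside each child, and over all predictors, yields the level-by-level representations shown on the following slides.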
Tree representation(s) up to 4 levels, Model 'M1_BG_TRN_TREES' (Bagging).
Requested Tree Models: Names & Descriptions.
Indentation gives the tree level (1 to 4); each node shows split and (Prob.); Pred is the leaf prediction.

no_claims < 0.5 ( 0.138 )
    member_duration < 180.5 ( 0.194 )
        total_spend < 7950 ( 0.337 )
            total_spend >= 7550 ( 0.273 )       Pred 0.273
            total_spend < 7550 ( 0.343 )        Pred 0.343
        total_spend >= 7950 ( 0.173 )
            optom_presc >= 1.5 ( 0.25 )         Pred 0.250
            optom_presc < 1.5 ( 0.146 )         Pred 0.146
    member_duration >= 180.5 ( 0.061 )
        doctor_visits < 6.5 ( 0.105 )
            doctor_visits >= 5.5 ( 0.081 )      Pred 0.081
            doctor_visits < 5.5 ( 0.109 )       Pred 0.109
        doctor_visits >= 6.5 ( 0.034 )
            member_duration < 189.5 ( 0.083 )   Pred 0.083
            member_duration >= 189.5 ( 0.03 )   Pred 0.030
no_claims >= 0.5 ( 0.43 )
    no_claims < 2.5 ( 0.39 )
        optom_presc < 0.5 ( 0.304 )
            member_duration >= 171.5 ( 0.266 )  Pred 0.266
            member_duration < 171.5 ( 0.343 )   Pred 0.343
        optom_presc >= 0.5 ( 0.447 )
            member_duration >= 170.5 ( 0.363 )  Pred 0.363
            member_duration < 170.5 ( 0.521 )   Pred 0.521
    no_claims >= 2.5 ( 0.556 )
        optom_presc < 0.5 ( 0.515 )             Pred 0.515
        optom_presc >= 0.5 ( 0.577 )
            member_duration >= 168.5 ( 0.54 )   Pred 0.540
            member_duration < 168.5 ( 0.626 )   Pred 0.626
ETC …
Tree representation(s) up to 4 levels, Model 'M1_GB_TRN_TREES' (G. Boosting).
Requested Tree Models: Names & Descriptions.
Indentation gives the tree level (1 to 4); each node shows split and (Prob.); Pred is the leaf prediction.

no_claims < 2.5 ( 0.185 )
    no_claims < 0.5 ( 0.159 )
        member_duration < 180.5 ( 0.199 )
            total_spend >= 5250 ( 0.186 )       Pred 0.186
            total_spend < 5250 ( 0.464 )        Pred 0.464
        member_duration >= 180.5 ( 0.103 )
            doctor_visits < 5.5 ( 0.126 )       Pred 0.126
            doctor_visits >= 5.5 ( 0.093 )      Pred 0.093
    no_claims >= 0.5 ( 0.321 )
        optom_presc < 3.5 ( 0.291 )
            total_spend < 6300 ( 0.467 )        Pred 0.467
            total_spend >= 6300 ( 0.273 )       Pred 0.273
        optom_presc >= 3.5 ( 0.59 )
            member_duration < 154.5 ( 0.67 )    Pred 0.670
            member_duration >= 154.5 ( 0.447 )  Pred 0.447
no_claims >= 2.5 ( 0.633 )
    no_claims < 4.5 ( 0.57 )
        optom_presc < 3.5 ( 0.54 )
            member_duration >= 128.5 ( 0.498 )  Pred 0.498
            member_duration < 128.5 ( 0.627 )   Pred 0.627
        optom_presc >= 3.5 ( 0.81 )
            member_duration >= 137 ( 0.785 )    Pred 0.785
            member_duration < 137 ( 0.85 )      Pred 0.850
    no_claims >= 4.5 ( 0.761 )
        member_duration < 303.5 ( 0.778 )
            member_duration >= 148 ( 0.757 )    Pred 0.757
            member_duration < 148 ( 0.823 )     Pred 0.823
Tree Repr.
Level 1
06 actual Tree. Top splitter No_claims, but LG splits at 1.5. Note different prob. events
(bar heights).
Tree Repr.
Level 2
RF pursues a different structure search for level 2. See next slide as well.
Etc, for
Levels 3 and 4.
In M1, the ensemble tree splits mostly on RF; does it mean that RF is the best model?
Requested ENSEMBLE Tree Models: Names & Descriptions.
Model Name: M1_NSMBL_LG_TRN_TREES (Mod # 4 on every level-4 row).
Indentation gives the tree level (1 to 4); each node shows split and (Prob.).

p_M1_RFORESTS >= 0.34378 ( 0.707 )
    p_M1_RFORESTS < 0.49055 ( 0.541 )
        p_M1_LOGISTIC_STEPWISE < 0.36438 ( 0.669 )
            p_M1_RFORESTS < 0.40898 ( 0.563 )
        p_M1_LOGISTIC_STEPWISE >= 0.36438 ( 0.476 )
            p_M1_RFORESTS >= 0.42914 ( 0.581 )
            p_M1_RFORESTS < 0.42914 ( 0.408 )
    p_M1_RFORESTS >= 0.49055 ( 0.893 )
        p_M1_RFORESTS < 0.58664 ( 0.829 )
            p_M1_LOGISTIC_STEPWISE >= 0.51796 ( 0.761 )
            p_M1_LOGISTIC_STEPWISE < 0.51796 ( 0.899 )
        p_M1_RFORESTS >= 0.58664 ( 0.941 )
            p_M1_TREES >= 0.8273 ( 0.908 )
            p_M1_TREES < 0.8273 ( 0.966 )
Ensemble
for level 1
Ensemble
for level 2
Conclusion on tree representations I
No_claims at 0.5 is certainly the top splitter for most TREE models, but notice
that event probabilities diverge (because RF, GB and BG model a
posterior probability, not a binary event, and thus carry information
from previous models). Later splits diverge in predictors and split
values across models. LG finds a completely different structure and
starts with no_claims at 1.5. Thus, for tree-based models, the existence of a
claim is a suspicion of fraud, while logistic requires a higher
threshold.
Ensemble models: mixture of models ➔ typical interpretability from a
single model is doubtful when reality is complex.
Important to view each tree model independently to gauge
interpretability. Note that the ensemble's primary splitter is RF, but RF is not
the best model (it over-fits badly); it is chosen because all methods
minimize misclassification.
It is also important to view these findings in terms of variable
importance and “best” model choice.
Conclusion on tree representations, II
Most importantly, it looks like RF wins, should we stop now?
(Validation results not shown to add to the suspense).
DO NOT RUSH YOUR CONCLUSIONS and keep on reading.
Importance
Measures
For Tree based
Methods.
Agreement on No_claims by all methods, not so much for other variables.
For GB and BG all predictors matter; RF disparages num_members, and Trees disparage doctor_visits.
GB allocates more importance to all predictors (other than no_claims) than the other methods do,
RF included, which implies that the structure found by RF is simpler.
Tree methods find no_claims most important; logistic finds most predictors important.
Validation results show the effects of over-fitting (variable doctor_visits).
Note the almost null standardized RF VAL estimate ➔ the corresponding p-value is insignificant.
Partial Dependency
Plots for
Logistic and
Gradient Boosting
Non-Ensemble
Models.
Most important var, similar shapes in both cases. Note the
“logistic”-like case of one, and the jagged shape of the
other, plus flatness for >= 5 at 0.8 prob.
Num_members was eliminated from logistic stepwise. GB shows a jagged
relationship ➔ there is a strong interaction effect with other predictors.
Pair-wise PDP
For
Some variables
Fraud is concentrated on lower membership time. TRN Stepwise Logistic (left), GB right,
correlation = 0.02846.
Similar but not
Identical.
Fraud concentrated on smaller number of members and higher
Number of claims. GB.
RF big winner, right? But ….
Model # 4 (RF) seems best in fitting Prob event once other predictors’ effects are
marginalized away for TRN but VAL results point to GB instead.
Pair-wise PDP
For some
Ensemble Model
variables
Conclusions on PDPs
1) From Ensemble PDPs, it is obvious that RF fails in
validation. All the ensemble power rests strongly on GB
and on logistic with a downward slope.
2) Individual variable PDPs show a uniform relationship for
variables in logistic, while GB shows fuzzy and nonlinear
structures.
3) The contour plots for pairs of variables (GB) allow one to
focus on ranges of importance. For instance, No_claims
and Member_duration concentrate important information
at low levels of their respective ranges.
4) Still, it is not possible (at present) to obtain simple
interpretable graphs to understand the full complexity of GB
models. Logistic models are easier to understand, though not fully easy.
Tree based methods do not necessarily reach top probability of 1 and lowest of 0.
Overfit?
Not over-fitted.
Some strong over-fit.
Over-fit degree different than in classif. rates (prev. slide).
Note that TRN and VAL ranks do not match. Lower VAL-ranked models tend to overfit more.
GOF ranks by GOF measure.
Rank columns: (1) AUROC, (2) Avg Square Error, (3) Class Rate, (4) Cum Lift 3rd bin,
(5) Cum Resp Rate 3rd, (6) Gini, (7) P-R AUC, (8) Precision Rate, (9) Rsquare Cramer Tjur.
Unw. Mean and Unw. Median are taken over the nine ranks.

GOF ranks, TRN
Model Name                      (1) (2) (3) (4) (5) (6) (7) (8) (9)   Unw. Mean   Unw. Median
02_M1_TRN_BAGGING                6   8   8   6   6   6   8   4   8      6.67          6
03_M1_TRN_GRAD_BOOSTING          3   3   7   5   5   3   3   2   6      4.11          3
04_M1_TRN_LOGISTIC_NONE_NSMBL    1   1   4   2   2   1   1   5   1      2.00          1
05_M1_TRN_LOGISTIC_STEPWISE      8   7   3   8   8   8   6   8   7      7.00          8
06_M1_TRN_NSMBL_AVG              4   5   1   3   3   4   5   7   4      4.00          4
07_M1_TRN_NSMBL_MED              5   4   2   4   4   5   4   6   5      4.33          4
08_M1_TRN_RFORESTS               2   2   5   1   1   2   2   1   3      2.11          2
09_M1_TRN_TREES                  7   6   6   7   7   7   7   3   2      5.78          7

GOF ranks, VAL
Model Name                      (1) (2) (3) (4) (5) (6) (7) (8) (9)   Unw. Mean   Unw. Median
10_M1_VAL_BAGGING                5   6   7   5   5   5   6   3   7      5.44          5
11_M1_VAL_GRAD_BOOSTING          2   2   5   1   1   2   2   1   2      2.00          2
12_M1_VAL_LOGISTIC_NONE_NSMBL    1   1   4   2   2   1   1   4   1      1.89          1
13_M1_VAL_LOGISTIC_STEPWISE      6   5   3   6   6   6   5   7   4      5.33          6
14_M1_VAL_NSMBL_AVG              3   3   1   3   3   3   3   6   6      3.44          3
15_M1_VAL_NSMBL_MED              4   4   2   4   4   4   4   5   5      4.00          4
16_M1_VAL_RFORESTS               8   8   8   8   8   8   8   8   8      8.00          8
17_M1_VAL_TREES                  7   7   6   7   7   7   7   2   3      5.89          7
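The unweighted mean and median columns, and the TRN-to-VAL mismatch noted above, can be recomputed directly from the rank rows. The sketch below copies the ranks from the two tables; treating a lower rank as better, and using the VAL-minus-TRN mean rank as a rough over-fit signal, are assumptions of this sketch rather than definitions given in the deck.

```python
from statistics import mean, median

# Ranks per technique: (TRN ranks, VAL ranks), copied from the two tables above,
# in the tables' column order (AUROC ... Rsquare Cramer Tjur).
ranks = {
    "BAGGING":             ([6, 8, 8, 6, 6, 6, 8, 4, 8], [5, 6, 7, 5, 5, 5, 6, 3, 7]),
    "GRAD_BOOSTING":       ([3, 3, 7, 5, 5, 3, 3, 2, 6], [2, 2, 5, 1, 1, 2, 2, 1, 2]),
    "LOGISTIC_NONE_NSMBL": ([1, 1, 4, 2, 2, 1, 1, 5, 1], [1, 1, 4, 2, 2, 1, 1, 4, 1]),
    "LOGISTIC_STEPWISE":   ([8, 7, 3, 8, 8, 8, 6, 8, 7], [6, 5, 3, 6, 6, 6, 5, 7, 4]),
    "NSMBL_AVG":           ([4, 5, 1, 3, 3, 4, 5, 7, 4], [3, 3, 1, 3, 3, 3, 3, 6, 6]),
    "NSMBL_MED":           ([5, 4, 2, 4, 4, 5, 4, 6, 5], [4, 4, 2, 4, 4, 4, 4, 5, 5]),
    "RFORESTS":            ([2, 2, 5, 1, 1, 2, 2, 1, 3], [8, 8, 8, 8, 8, 8, 8, 8, 8]),
    "TREES":               ([7, 6, 6, 7, 7, 7, 7, 3, 2], [7, 7, 6, 7, 7, 7, 7, 2, 3]),
}

summary = {}
for name, (trn, val) in ranks.items():
    summary[name] = {
        "trn_mean": round(mean(trn), 2),
        "val_mean": round(mean(val), 2),
        "val_median": median(val),
        # Positive shift: the model ranks worse on VAL than on TRN (assumed over-fit signal).
        "shift": round(mean(val) - mean(trn), 2),
    }

best_val = min(summary, key=lambda n: summary[n]["val_mean"])    # assumes lower rank is better
largest_drop = max(summary, key=lambda n: summary[n]["shift"])
```

With these ranks, `best_val` is the logistic ensemble (LOGISTIC_NONE_NSMBL), Gradient Boosting is the best single model on VAL, and `largest_drop` is Random Forests, whose mean rank moves from 2.11 on TRN to 8.00 on VAL.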
Based on this methodology, the ensemble is the winner and GB the single best model. Alternative
selection methods for best models depend on the user; below is just one approach.
ETC.
Ensembles have good performance and no over-fitting.
Specific example: Note distance to reach ‘best’ lift.
Note ‘event’ separation for ENSEMBLE case.
Conclusions
At least for the present defaults of RF in this presentation, it
has badly over-fitted. The best overall model is the
ensemble and the best single model is given by Gradient
Boosting.
The user should decide which metric to use for judging
goodness. Here, a simple unweighted ranking of 5
measures was used.
Since there was no financial information, models could not
be measured in terms of profits. The K-S chart (not
recommended) shows different cut-off points per model.

 

Recently uploaded (20)

How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
Predictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting TechniquesPredictive Precipitation: Advanced Rain Forecasting Techniques
Predictive Precipitation: Advanced Rain Forecasting Techniques
 
原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证原件一样伦敦国王学院毕业证成绩单留信学历认证
原件一样伦敦国王学院毕业证成绩单留信学历认证
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
一比一原版(ucla文凭证书)加州大学洛杉矶分校毕业证学历认证官方成绩单
 
sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444sourabh vyas1222222222222222222244444444
sourabh vyas1222222222222222222244444444
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 

4_3_Ensemble models and grad boost part 2.pdf

  • 1. Leonardo Auslender Copyright 2004 Leonardo Auslender – Copyright 2018 1 10/7/2019 Ensemble models and Gradient Boosting, part 2. Leonardo Auslender Independent Statistical Consultant Leonardo ‘dot’ Auslender ‘at’ Gmail ‘dot’ com. Copyright 2018.
  • 2. 2 studies 2.8.b: Raw data, GB without constraints on its parameters, compared to its friends. 2.8.c: Comparison of methods but focusing on whether raw vs 50/50 re-sampling makes a difference.
  • 4. Partial Dependency plots (PDP). Due to GB’s (and other methods’) black-box nature, tools are needed to study model structure: these plots show the mean predicted Y for each value of predictor X, with that value matched to every observation of the remaining predictors in the data set. Graphs may not capture the nature of variable interactions, especially if an interaction significantly affects model outcome. Formally, the PDP of F(x1, x2, …, xp) on X is E(F) taken over all variables except X. Thus, for given Xs, the PDP is the average of predictions over the training data with Xs kept constant. Since GB, Boosting, Bagging, etc. are black-box models, use PDPs to obtain model interpretation. Also useful for logistic models.
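The averaging described in slide 4 can be sketched in a few lines. This is only an illustration under stated assumptions: `partial_dependence`, `toy_model`, and the toy rows are hypothetical names and numbers, not the author's code or data, and any fitted model's scoring function could stand in for `predict`.

```python
import math


def partial_dependence(predict, rows, feature, grid):
    """Return [(value, mean prediction)] for one predictor.

    For each grid value, the predictor at index `feature` is overwritten in
    every observation, all other predictors keep their observed values, and
    the resulting predictions are averaged.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_model(row):
    """Hypothetical scoring function standing in for a fitted black-box model."""
    no_claims, member_duration = row
    return 1.0 / (1.0 + math.exp(-(-1.0 + 0.8 * no_claims - 0.005 * member_duration)))


# Hypothetical observations: (no_claims, member_duration).
toy_rows = [(0, 200), (1, 150), (3, 120), (0, 260)]
pdp_no_claims = partial_dependence(toy_model, toy_rows, feature=0, grid=[0, 1, 2, 3, 4])
```

Each point on the curve costs one prediction per observation, so a grid of g values over n observations needs g × n model evaluations.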
  • 5. Modifications of Partial Dependency plots (PDP). In PDP, each value of X (or pair of X’s) is matched to each observation of the complementary Xs, scores are obtained (predicted Ys) and then averaged for each X value. It is known that if predictors are correlated, PDPs are not informative. Ergo, partial out effects as in the following possible options: 1) Obtain q1 and q3 in addition to the avg value, to verify stability of the average. 2) In like manner as in linear regression, create models from selected variables but with fully orthogonalized predictors (method proposed by yours truly). Could be called Partialized PDP, or PPDP. 3) When obtaining 3d PDPs for pairs of variables, obtain Marginal PDPs, i.e., avg probability of each var2 point along var 1 range. Reason is that 3d plots typically extrapolate in low density areas ➔ misleading local curves are possible. 4) I(ndividual) C(onditional) E(xpectations) plot: PDP plots for individual or groups of observations. Grouping defined by user: quantiles of posterior prob, clustering solution, levels of even variable, specific individuals, etc.
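Options 1) and 4) above can be illustrated together: an ICE plot keeps one curve per observation, and q1, mean and q3 across those curves show how stable the PDP average is. A minimal sketch, assuming hypothetical function names (`ice_curves`, `summarize_curves`) and any scoring function `predict`; it is not the author's implementation.

```python
import statistics


def ice_curves(predict, rows, feature, grid):
    """One curve per observation: predictions along `grid` for the chosen
    predictor, with the other predictors left at their observed values."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def summarize_curves(curves, grid):
    """For each grid value return (value, q1, mean, q3) across the ICE curves.

    The mean column is the ordinary PDP; a wide q1-q3 band suggests the
    average hides heterogeneous (interaction-driven) behaviour.
    Needs at least two observations for the quartiles.
    """
    summary = []
    for j, value in enumerate(grid):
        column = [curve[j] for curve in curves]
        q1, _, q3 = statistics.quantiles(column, n=4)
        summary.append((value, q1, statistics.fmean(column), q3))
    return summary
```

Grouped ICE curves (by quantile of posterior probability, cluster, etc.) follow by passing only the rows of one group to `ice_curves`.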
  • 7. Analytical problem to investigate. Optical Health Care fraud insurance patients. Longer care typically involves higher treatment costs, and the insurance company has to set up reserves as soon as a case is opened. Sometimes doctors are involved in fraud. Aim: predict fraudulent charges ➔ classification problem; use battery of models and compare them. Below left, original data (M1 models). Focus is on comparisons across models (see earlier chapters for individual models analytics). For brevity's sake, omitted mean and median ensembles. Data summary for model M1: TRN data set: train; TRN num obs: 3595; VAL data set: validata; VAL num obs: 2365; TST data set: (missing); TST num obs: (missing); Dep. Var: fraud; TRN % Events: 20.389; VAL % Events: 19.281.
  • 8. E.g., 08_M1_VAL_BAGGING: 8th model of M1 data set case, Validation and using Bagging as the modeling technique. Requested Models: Names & Descriptions.
      Model #   Full Model Name                  Model Description
      -1        *** Overall Models
      -10       M1                               20 pct prior
      1         01_M1_GB_TRN_TREES               Tree Repr. for Gradient Boosting
      2         02_M1_LG_TRN_TREES               Tree Repr. of Logistic STEPWISE
      3         03_M1_NSMBL_LG_TRN_TREES         Tree Repr. of Logistic NONE
      4         04_M1_TRN_BAGGING                Bagging TRN Bagging TRN
      5         05_M1_TRN_GRAD_BOOSTING          Gradient Boosting
      6         06_M1_TRN_LOGISTIC_NONE_NSMBL    Logistic TRN NONE TRN ENSEMBLE
      7         07_M1_TRN_LOGISTIC_STEPWISE      Logistic TRN STEPWISE TRN
      8         08_M1_TRN_NSMBL_AVG              Ensemble AVG
      9         09_M1_TRN_NSMBL_MED              Ensemble MED
      10        10_M1_TRN_RFORESTS               Random Forests
      11        11_M1_TRN_TREES                  Trees TRN Trees TRN
      12        12_M1_VAL_BAGGING                Trees VAL Trees VAL
      13        13_M1_VAL_GRAD_BOOSTING          Gradient Boosting
      14        14_M1_VAL_LOGISTIC_NONE_NSMBL    Logistic VAL NONE VAL ENSEMBLE
      15        15_M1_VAL_LOGISTIC_STEPWISE      Logistic VAL STEPWISE VAL
      16        16_M1_VAL_NSMBL_AVG              Ensemble AVG
      17        17_M1_VAL_NSMBL_MED              Ensemble MED
      18        18_M1_VAL_RFORESTS               Random Forests
      19        19_M1_VAL_TREES                  Trees VAL Trees VAL
  • 9. For models other than Trees themselves, posterior probabilities were modeled with a tree via an interval-valued target variable (includes logistic and ensembles). For simplicity, just first 2 levels of trees are shown. Notation: M1_GB_TRN_TREES: Data M1, Tree simulation of Gradient Boosting run (GB). BG: Bagging, RF: Random Forests, LG: logistic, NSMBL: ensemble. Intention: obtain general idea of tree representation for comparison to standard tree model. Next page: small detail for BG (Bagging), GB (Gradient Boosting) and Trees themselves. Later, graphical comparison of vars + splits at each tree level.
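The idea of a tree representation, i.e., fitting a tree to another model's posterior probabilities treated as an interval target, can be sketched for a single split. The function below is a hypothetical illustration (its name and the toy numbers are assumptions, and the software used for the slides is not known); it searches one predictor for the threshold that minimizes the squared error of the probabilities, which is the usual regression-tree criterion.

```python
def best_split(values, probs):
    """Return (sse, threshold, mean_left, mean_right) for the threshold on one
    predictor that best separates the modeled posterior probabilities.

    Candidate thresholds are midpoints between consecutive distinct predictor
    values; the winner minimizes the within-node sum of squared errors.
    Returns None when the predictor has a single distinct value.
    """
    pairs = sorted(zip(values, probs))
    best = None
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        left = [p for _, p in pairs[:i]]
        right = [p for _, p in pairs[i:]]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((p - mean_left) ** 2 for p in left)
               + sum((p - mean_right) ** 2 for p in right))
        if best is None or sse < best[0]:
            best = (sse, (low + high) / 2, mean_left, mean_right)
    return best


# Hypothetical example: posterior probabilities from some black-box model,
# indexed by number of claims.
toy_no_claims = [0, 0, 0, 1, 1, 2, 3]
toy_probs = [0.10, 0.15, 0.12, 0.40, 0.45, 0.50, 0.60]
toy_split = best_split(toy_no_claims, toy_probs)
```

Applying the same search recursively within each side, and across all predictors, would give the deeper levels shown in the tree representations.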
  • 11. Tree representation(s) up to 4 levels. Model 'M1_BG_TRN_TREES' (Bagging). Requested Tree Models: Names & Descriptions. Each row: split at Level 1 to Level 4 (by indentation), event prob. in parentheses, Pred at terminal rows.
      no_claims < 0.5 ( 0.138 )
        member_duration < 180.5 ( 0.194 )
          total_spend < 7950 ( 0.337 )
            total_spend >= 7550 ( 0.273 )  Pred 0.273
            total_spend < 7550 ( 0.343 )  Pred 0.343
          total_spend >= 7950 ( 0.173 )
            optom_presc >= 1.5 ( 0.25 )  Pred 0.250
            optom_presc < 1.5 ( 0.146 )  Pred 0.146
        member_duration >= 180.5 ( 0.061 )
          doctor_visits < 6.5 ( 0.105 )
            doctor_visits >= 5.5 ( 0.081 )  Pred 0.081
            doctor_visits < 5.5 ( 0.109 )  Pred 0.109
          doctor_visits >= 6.5 ( 0.034 )
            member_duration < 189.5 ( 0.083 )  Pred 0.083
            member_duration >= 189.5 ( 0.03 )  Pred 0.030
      no_claims >= 0.5 ( 0.43 )
        no_claims < 2.5 ( 0.39 )
          optom_presc < 0.5 ( 0.304 )
            member_duration >= 171.5 ( 0.266 )  Pred 0.266
            member_duration < 171.5 ( 0.343 )  Pred 0.343
          optom_presc >= 0.5 ( 0.447 )
            member_duration >= 170.5 ( 0.363 )  Pred 0.363
            member_duration < 170.5 ( 0.521 )  Pred 0.521
        no_claims >= 2.5 ( 0.556 )
          optom_presc < 0.5 ( 0.515 )  Pred 0.515
          optom_presc >= 0.5 ( 0.577 )
            member_duration >= 168.5 ( 0.54 )  Pred 0.540
            member_duration < 168.5 ( 0.626 )  Pred 0.626
  • 12. Tree representation(s) up to 4 levels. Model 'M1_GB_TRN_TREES' (G. Boosting). Requested Tree Models: Names & Descriptions. Each row: split at Level 1 to Level 4 (by indentation), event prob. in parentheses, Pred at terminal rows.
      no_claims < 2.5 ( 0.185 )
        no_claims < 0.5 ( 0.159 )
          member_duration < 180.5 ( 0.199 )
            total_spend >= 5250 ( 0.186 )  Pred 0.186
            total_spend < 5250 ( 0.464 )  Pred 0.464
          member_duration >= 180.5 ( 0.103 )
            doctor_visits < 5.5 ( 0.126 )  Pred 0.126
            doctor_visits >= 5.5 ( 0.093 )  Pred 0.093
        no_claims >= 0.5 ( 0.321 )
          optom_presc < 3.5 ( 0.291 )
            total_spend < 6300 ( 0.467 )  Pred 0.467
            total_spend >= 6300 ( 0.273 )  Pred 0.273
          optom_presc >= 3.5 ( 0.59 )
            member_duration < 154.5 ( 0.67 )  Pred 0.670
            member_duration >= 154.5 ( 0.447 )  Pred 0.447
      no_claims >= 2.5 ( 0.633 )
        no_claims < 4.5 ( 0.57 )
          optom_presc < 3.5 ( 0.54 )
            member_duration >= 128.5 ( 0.498 )  Pred 0.498
            member_duration < 128.5 ( 0.627 )  Pred 0.627
          optom_presc >= 3.5 ( 0.81 )
            member_duration >= 137 ( 0.785 )  Pred 0.785
            member_duration < 137 ( 0.85 )  Pred 0.850
        no_claims >= 4.5 ( 0.761 )
          member_duration < 303.5 ( 0.778 )
            member_duration >= 148 ( 0.757 )  Pred 0.757
            member_duration < 148 ( 0.823 )  Pred 0.823
      ETC …
  • 13. Tree Repr. Level 1
  • 14. 06 actual Tree. Top splitter No_claims, but LG splits at 1.5. Note different prob. events (bar heights).
  • 16. Tree Repr. Level 2
  • 17. RF pursues different structure search for level 2. See next slide as well.
  • 20. Etc., for Levels 3 and 4.
  • 22. M1 ensembled mostly in RF, does it mean that RF is best model? Requested ENSEMBLE Tree Models: Names & Descriptions. Model Name: M1_NSMBL_LG_TRN_TREES (Mod # 4). Each row: split at Level 1 to Level 4 (by indentation), event prob. in parentheses.
      p_M1_RFORESTS >= 0.34378 ( 0.707 )
        p_M1_RFORESTS < 0.49055 ( 0.541 )
          p_M1_LOGISTIC_STEPWISE < 0.36438 ( 0.669 )
            p_M1_RFORESTS < 0.40898 ( 0.563 )
          p_M1_LOGISTIC_STEPWISE >= 0.36438 ( 0.476 )
            p_M1_RFORESTS >= 0.42914 ( 0.581 )
            p_M1_RFORESTS < 0.42914 ( 0.408 )
        p_M1_RFORESTS >= 0.49055 ( 0.893 )
          p_M1_RFORESTS < 0.58664 ( 0.829 )
            p_M1_LOGISTIC_STEPWISE >= 0.51796 ( 0.761 )
            p_M1_LOGISTIC_STEPWISE < 0.51796 ( 0.899 )
          p_M1_RFORESTS >= 0.58664 ( 0.941 )
            p_M1_TREES >= 0.8273 ( 0.908 )
            p_M1_TREES < 0.8273 ( 0.966 )
  • 23. Ensemble for level 1
  • 25. Ensemble for level 2
  • 27. Conclusion on tree representations, I. No_claims at 0.5 is certainly the top splitter for most TREE models, but notice that event probabilities diverge (because RF, GB and BG model posterior probability, not a binary event, and thus carry information from previous models). Later splits diverge in predictors and split values across models. LG finds a completely different structure and starts with no_claims at 1.5. Thus, for tree based models, existence of a claim is a suspicion of fraud, while for logistic it requires a higher threshold. Ensemble models: mixture of models ➔ typical interpretability from a single model is doubtful when reality is complex. Important to view each tree model independently to gauge interpretability. Note that the ensemble's primary splitter is RF, but RF is not the best model (it over-fits badly); it is chosen because all methods minimize misclassification. It is also important to view these findings in terms of variable importance and “best” model choice.
  • 28. Conclusion on tree representations, II. Most importantly, it looks like RF wins, should we stop now? (Validation results not shown to add to the suspense). DO NOT RUSH YOUR CONCLUSIONS and keep on reading.
  • 29. Importance Measures For Tree based Methods.
  • 30. Agreement on No_claims by all methods, not so much for other variables.
  • 31. For GB and BG all predictors matter, RF disparages num_members, Trees doctor_visits. Comparing GB and RF, GB allocates more importance to all predictors (other than no_claims) when compared to other methods, which implies that structure by RF is simpler.
  • 33. Tree methods find no_claims as most important, logistic finds most predictors important. Validation results show effects of over-fitting (variable doctor_visits).
  • 34. Note almost null stdzed RF VAL estimate ➔ corresp. p-value insignificant.
  • 36. Partial Dependency Plots for Logistic and Gradient Boosting Non-Ensemble Models.
  • 37. Most important var, similar shapes in both cases. Note the “logistic”-like shape of one and the jagged shape of the other, plus flatness for >= 5 at 0.8 prob.
  • 38. Num_members eliminated from logistic stepwise. GB jagged relationship ➔ there is strong interaction effect with other predictors.
  • 39. Pair-wise PDP for some variables
  • 40. Fraud is concentrated on lower membership time. TRN Stepwise Logistic (left), GB right, correlation = 0.02846. Similar but not identical.
  • 41. Fraud concentrated on smaller number of members and higher number of claims. GB.
  • 43. RF big winner, right? But ….
  • 44. Model # 4 (RF) seems best in fitting Prob event once other predictors’ effects are marginalized away for TRN but VAL results point to GB instead.
  • 46. Pair-wise PDP for some Ensemble Model variables
  • 48. Conclusions on PDPs. 1) From Ensemble PDPs, it is obvious that RF fails in validation. All the ensemble power rests strongly on GB and on logistic with downward slope. 2) Individual variable PDPs show uniform relationships for variables in logistic, while GB shows fuzzy and nonlinear structures. 3) The contour plots for pairs of variables (GB) allow focusing on ranges of importance. For instance, No_claims and Member_duration concentrate important information at low levels of their respective ranges. 4) Still, it is not possible (at present) to obtain simple interpretable graphs to understand the full complexity of GB models. Logistic models are easier to understand, though not fully easy.
  • 50. Tree based methods do not necessarily reach top probability of 1 and lowest of 0.
  • 52. Overfit?
  • 53. Not over-fitted. Some strong over-fit.
  • 54. Over-fit degree different than in classif. rates (prev. slide).
  • 56. Note that TRN and VAL rank do not match. Lower VAL ranked Models tend to overfit more.
      GOF ranks (GOF measure rank). Rank columns, in order: AUROC, Avg Square Error, Class Rate, Cum Lift 3rd bin, Cum Resp Rate 3rd, Gini, P-R AUC, Precision Rate, Rsquare Cramer Tjur; then Unw. Mean and Unw. Median.
      Model Name                       Ranks               Unw. Mean   Unw. Median
      02_M1_TRN_BAGGING                6 8 8 6 6 6 8 4 8   6.67        6
      03_M1_TRN_GRAD_BOOSTING          3 3 7 5 5 3 3 2 6   4.11        3
      04_M1_TRN_LOGISTIC_NONE_NSMBL    1 1 4 2 2 1 1 5 1   2.00        1
      05_M1_TRN_LOGISTIC_STEPWISE      8 7 3 8 8 8 6 8 7   7.00        8
      06_M1_TRN_NSMBL_AVG              4 5 1 3 3 4 5 7 4   4.00        4
      07_M1_TRN_NSMBL_MED              5 4 2 4 4 5 4 6 5   4.33        4
      08_M1_TRN_RFORESTS               2 2 5 1 1 2 2 1 3   2.11        2
      09_M1_TRN_TREES                  7 6 6 7 7 7 7 3 2   5.78        7
      10_M1_VAL_BAGGING                5 6 7 5 5 5 6 3 7   5.44        5
      11_M1_VAL_GRAD_BOOSTING          2 2 5 1 1 2 2 1 2   2.00        2
      12_M1_VAL_LOGISTIC_NONE_NSMBL    1 1 4 2 2 1 1 4 1   1.89        1
      13_M1_VAL_LOGISTIC_STEPWISE      6 5 3 6 6 6 5 7 4   5.33        6
      14_M1_VAL_NSMBL_AVG              3 3 1 3 3 3 3 6 6   3.44        3
      15_M1_VAL_NSMBL_MED              4 4 2 4 4 4 4 5 5   4.00        4
      16_M1_VAL_RFORESTS               8 8 8 8 8 8 8 8 8   8.00        8
      17_M1_VAL_TREES                  7 7 6 7 7 7 7 2 3   5.89        7
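The unweighted mean and median ranks on this slide can be reproduced with a short calculation. The sketch below uses the validation rank rows as read from the flattened table (the pairing of rows to model names is an assumption, although it agrees with the reported means); `summarize_ranks` is a hypothetical helper name.

```python
import statistics

# Validation ranks per model over the nine goodness-of-fit measures
# (AUROC, ASE, class rate, cum lift, cum resp rate, Gini, P-R AUC,
# precision rate, R-square), as read from the slide; row-to-model
# pairing is an assumption.
val_ranks = {
    "BAGGING": [5, 6, 7, 5, 5, 5, 6, 3, 7],
    "GRAD_BOOSTING": [2, 2, 5, 1, 1, 2, 2, 1, 2],
    "LOGISTIC_NONE_NSMBL": [1, 1, 4, 2, 2, 1, 1, 4, 1],
    "LOGISTIC_STEPWISE": [6, 5, 3, 6, 6, 6, 5, 7, 4],
    "NSMBL_AVG": [3, 3, 1, 3, 3, 3, 3, 6, 6],
    "NSMBL_MED": [4, 4, 2, 4, 4, 4, 4, 5, 5],
    "RFORESTS": [8, 8, 8, 8, 8, 8, 8, 8, 8],
    "TREES": [7, 7, 6, 7, 7, 7, 7, 2, 3],
}


def summarize_ranks(ranks):
    """Return (model, unweighted mean rank, median rank), best mean first."""
    rows = [
        (name, round(statistics.fmean(r), 2), statistics.median(r))
        for name, r in ranks.items()
    ]
    return sorted(rows, key=lambda row: row[1])


val_summary = summarize_ranks(val_ranks)
```

Equal weights are a choice, not a necessity: a user who cares mostly about lift, for example, could weight that column more heavily and might arrive at a different ordering.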
  • 57. Based on this methodology, the ensemble is the winner, and GB the single best model. Alternative selection methods for best models are user-dependent; below is just one approach.
  • 61. ETC.
  • 64. Ensembles have good performance and no over-fitting.
  • 65. Specific example: Note distance to reach ‘best’ lift.
  • 67. Note ‘event’ separation for ENSEMBLE case.
  • 69. Conclusions. At least for the present defaults of RF in this presentation, it has badly over-fitted. The best overall model is the ensemble and the best single model is given by Gradient Boosting. The user should decide which metric to use for judging goodness. Here, simple unweighted ranking of 5 measures was used. Since there was no financial information, models could not be measured in terms of profits. K-S chart (not recommended) shows different cut-off points per model.
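For readers unfamiliar with the K-S chart mentioned in the conclusions, the statistic behind it is the largest gap between the cumulative score distributions of events and non-events, and the score at which that gap occurs is the cut-off it suggests. A minimal sketch with a hypothetical function name and made-up toy scores (not the presentation's data):

```python
def ks_statistic(scores, labels):
    """Return (max separation, cutoff score) between the cumulative score
    distributions of non-events (label 0) and events (label 1).

    Assumes both classes are present in `labels`.
    """
    n_event = sum(1 for y in labels if y == 1)
    n_non = len(labels) - n_event
    best_gap, best_cut = 0.0, None
    for cut in sorted(set(scores)):
        cdf_event = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= cut) / n_event
        cdf_non = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= cut) / n_non
        gap = abs(cdf_non - cdf_event)
        if gap > best_gap:
            best_gap, best_cut = gap, cut
    return best_gap, best_cut


# Two hypothetical models scoring the same five cases.
toy_labels = [0, 0, 0, 1, 1]
ks_model_a = ks_statistic([0.1, 0.2, 0.3, 0.8, 0.9], toy_labels)
ks_model_b = ks_statistic([0.2, 0.4, 0.5, 0.6, 0.7], toy_labels)
```

Both toy models separate the classes perfectly yet report different cut-offs, which illustrates why cut-off points from K-S charts are not comparable across models.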
  • 70. for now