Leonardo Auslender – Copyright 2018 – 7/1/2018
Ensemble models and
Gradient Boosting, part 3.
Leonardo Auslender
Independent Statistical Consultant
Leonardo ‘dot’ Auslender ‘at’
Gmail ‘dot’ com.
Copyright 2018.
Studies
2.8.c: Comparison of methods but focusing on whether
raw vs 50/50 re-sampling makes a difference.
Aim: study the performance of fraud models, whose original data contain 20% fraud events, by altering the percentage of events.
3 studies:
M1: 5% events
M2: 20% events (original)
M3: 50% events.
The validation data set is a random sample from the original 20% data set for all three studies.
Battery of models as in the previous study; similar graphs for evaluation.
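The three training event rates can be produced by re-sampling non-events from the raw file. A minimal sketch (toy data and field names are hypothetical; the actual data preparation is not shown in the slides):

```python
import random

def downsample_nonevents(rows, target_rate, seed=0):
    """Undersample non-events (y == 0) so events make up target_rate.

    Works when target_rate is above the raw event rate; to go below it
    (e.g., a 5% file like M1), one would downsample events instead.
    """
    events = [r for r in rows if r["y"] == 1]
    nonevents = [r for r in rows if r["y"] == 0]
    n_keep = round(len(events) * (1 - target_rate) / target_rate)
    rng = random.Random(seed)
    return events + rng.sample(nonevents, min(n_keep, len(nonevents)))

# Toy population with a 20% event rate, re-balanced to 50/50:
population = [{"y": 1}] * 200 + [{"y": 0}] * 800
balanced = downsample_nonevents(population, 0.50)
rate = sum(r["y"] for r in balanced) / len(balanced)  # 0.5
```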
Model  TRN data set  TRN obs  TRN % events  VAL data set  VAL obs  VAL % events  Dep. var
M1     train05          954       5.346     validata        2365      19.281     fraud
M2     train           3595      20.389     validata        2365      19.281     fraud
M3     train50         1133      50.838     validata        2365      19.281     fraud

(No TST data set was used in any of the three studies.)
Requested Models: Names & Descriptions.

Overall models: M1 Raw 05pct, M2 Raw 20pct, M3 50pct.

 #  Full Model Name                 Model Description
 1  01_M1_NSMBL_TRN_LOGISTIC_NONE   Logistic TRN NONE Ensemble
 2  02_M1_NSMBL_VAL_LOGISTIC_NONE   Logistic VAL NONE Ensemble
 3  03_M1_TRN_BAGGING               Bagging TRN
 4  04_M1_TRN_GRAD_BOOSTING         Gradient Boosting TRN
 5  05_M1_TRN_LOGISTIC_STEPWISE     Logistic TRN STEPWISE
 6  06_M1_TRN_RFORESTS              Random Forests TRN
 7  07_M1_TRN_TREES                 Trees TRN
 8  08_M1_VAL_BAGGING               Bagging VAL
 9  09_M1_VAL_GRAD_BOOSTING         Gradient Boosting VAL
10  10_M1_VAL_LOGISTIC_STEPWISE     Logistic VAL STEPWISE
11  11_M1_VAL_RFORESTS              Random Forests VAL
12  12_M1_VAL_TREES                 Trees VAL
13  13_M2_NSMBL_TRN_LOGISTIC_NONE   Logistic TRN NONE Ensemble
14  14_M2_NSMBL_VAL_LOGISTIC_NONE   Logistic VAL NONE Ensemble
15  15_M2_TRN_BAGGING               Bagging TRN
16  16_M2_TRN_GRAD_BOOSTING         Gradient Boosting TRN
17  17_M2_TRN_LOGISTIC_STEPWISE     Logistic TRN STEPWISE

And similarly for the rest of M2 and all of M3.
Top split level, nodes 2 and 3. Note: three M1 models (02, 03, 05) split on member_duration, while the corresponding M2 and M3 models split on no_claims; M1 GB is the lone exception. Previously, all split on no_claims.
Same info, different categorization. Omitted next levels.
Omitted the rest for brevity. Some conclusions:
Extreme imbalance has caused a different initial split variable, and therefore a different model structure, compared to the more balanced data sets.
In the more balanced cases, even the splitting value has mostly not changed. The probability of event in the resulting nodes differs because of the different initial event rates.
The difference in splitting variables is not necessarily “BAD”. Note that the sample size for the more imbalanced data sets is smaller.
M1 models choose a different important variable. Among the more balanced runs, the 50/50 tree (M3_TREES) selected just no_claims, while M2_TREES selected 3 additional predictors. BG, RF and GB are not similarly affected. M1 trees have no important variables and are the most affected by imbalance.
The 50/50 model stops earlier, but its validation misclassification is higher than for the raw-rate model.
Similar results (see previous slide)
Logistic shows a monotonically increasing relationship, while GB is more jagged but still increasing. Just one variable shown. The unbalanced M1 case is seriously affected in comparison. No_claims obviously positively affects the probability of fraud.
M1 logistic suffers due to event imbalance.
No_claims and Optom_presc positively associated with prob Fraud (Training).
No_claims and Optom_presc positively associated with prob
Fraud (Validation).
TRN GB has # 3 No_claims as most important, others flat.
VAL GB repeats No_claims as most important, and adds doctor_visits and optom_presc as positive effects. The corresponding logistic does not point to doctor_visits. The corresponding M1 and M3 results are almost identical.
# 7 VAL GB by far most important.
Confluence of curves at the point of the event prior. Forests perform very well at TRN but not at VAL.
#28 VAL logistic works to bring down the positive GB VAL slope.
Class imbalance shifts the curve down and flattens its slope.
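When a model is trained on a re-balanced (e.g., 50/50) sample but scored on data with the true prior, the posteriors can be mapped back to the original event rate with the standard prior-correction formula. A sketch (this adjustment is implied by, but not shown in, the slides):

```python
def correct_posterior(p, sample_rate, true_rate):
    """Map a posterior p from the re-sampled prior back to the true prior."""
    num = p * true_rate / sample_rate
    den = num + (1.0 - p) * (1.0 - true_rate) / (1.0 - sample_rate)
    return num / den

# A score of 0.5 under a 50/50 training mix maps back to the ~20% prior:
adjusted = correct_posterior(0.5, sample_rate=0.50, true_rate=0.20)  # 0.2
```

The correction preserves the rank order of the scores, so AUROC and lift at a given depth are unchanged; only the probability scale (and hence any fixed cutoff) moves.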
Val results for RF in ensemble models are flat.
Goodness of Fit and Model Selection.
GOF ranks (VAL). Columns follow the original order of GOF measures: AUROC, Average Squared Error, Cum Lift (3rd bin), Cum Resp Rate (3rd bin), Gini, and R-square (Cramer/Tjur); the last two columns are the unweighted mean and median of the six ranks.

Model Name                      AUROC  ASE  Lift3  Resp3  Gini  Rsq   Mean   Median
02_M1_NSMBL_VAL_LOGISTIC_NONE     6     3     6      6      6    3    5.00    6.00
08_M1_VAL_BAGGING                14    13    15     15     14   14   14.17   14.00
09_M1_VAL_GRAD_BOOSTING           5     4     1      1      5    4    3.33    4.00
10_M1_VAL_LOGISTIC_STEPWISE      16    11    16     16     16   17   15.33   16.00
11_M1_VAL_RFORESTS               13    16    14     14     13   15   14.17   14.00
12_M1_VAL_TREES                  18    18    18     18     18   18   18.00   18.00
14_M2_NSMBL_VAL_LOGISTIC_NONE     1     1     2      2      1    1    1.33    1.00
20_M2_VAL_BAGGING                 9     7     9      9      9   12    9.17    9.00
21_M2_VAL_GRAD_BOOSTING           3     5     3      3      3    5    3.67    3.00
22_M2_VAL_LOGISTIC_STEPWISE      10     8    11     11     10   10   10.00   10.00
23_M2_VAL_RFORESTS               17    12    17     17     17   16   16.00   17.00
24_M2_VAL_TREES                  12    10    10     10     12    8   10.33   10.00
26_M3_NSMBL_VAL_LOGISTIC_NONE     2     2     4      4      2    2    2.67    2.00
32_M3_VAL_BAGGING                 8    14     8      8      8    7    8.83    8.00
33_M3_VAL_GRAD_BOOSTING           4     6     5      5      4    6    5.00    5.00
34_M3_VAL_LOGISTIC_STEPWISE      11     9    12     12     11   11   11.00   11.00
35_M3_VAL_RFORESTS                7    15     7      7      7    9    8.67    7.00
36_M3_VAL_TREES                  15    17    13     13     15   13   14.33   14.00
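The two summary columns are just the unweighted mean and median of each model's six per-measure ranks; for example, for the two top ensembles:

```python
from statistics import mean, median

# Per-measure ranks (AUROC, ASE, Lift3, Resp3, Gini, Rsq) from the table above.
ranks = {
    "14_M2_NSMBL_VAL_LOGISTIC_NONE": [1, 1, 2, 2, 1, 1],
    "02_M1_NSMBL_VAL_LOGISTIC_NONE": [6, 3, 6, 6, 6, 3],
}
summary = {m: (round(mean(r), 2), median(r)) for m, r in ranks.items()}
# {'14_M2_NSMBL_VAL_LOGISTIC_NONE': (1.33, 1.0),
#  '02_M1_NSMBL_VAL_LOGISTIC_NONE': (5.0, 6.0)}
```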
The M2 VAL ensemble is best overall, and M1 VAL GB is the best single model.
M3 VAL performance of the single models is lackluster.
Conclusion on re-sampling.
In this example, the 50/50 re-sampled M3 models yielded a smaller tree with no discernible difference in performance from their M2 counterparts. M1 trees failed to perform, while the other M1 methods performed acceptably well.
Actual performance (for the best models) was not affected by 50/50 versus raw modeling. Extreme imbalance seriously affected raw trees, but not the other variants.
The overall winner in all cases, evaluated on VAL, was GB. Models suffer when the event prior is seriously imbalanced, except for GB.
XGBoost
Developed by Chen and Guestrin (2016): XGBoost: A Scalable Tree Boosting System.
Claims: faster and better than neural networks and Random Forests.
Uses 2nd-order gradients of the loss function, based on a Taylor expansion, plugged into the same algorithm for greater generalization. In addition, it transforms the loss function into a more sophisticated objective function containing regularization terms that penalize tree growth, with a penalty proportional to the size of the leaf weights, thus preventing overfitting.
More efficient than GB due to parallel computing on a single machine (claimed ~10 times faster).
The algorithm takes advantage of a decomposition of the objective function that allows it to outperform GB.
Not yet available in SAS. Available in R, Julia, Python, and the CLI.
Used in many champion models in recent competitions (Kaggle, etc.).
See also Foster's (2017) xgboostExplainer.
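The second-order expansion referred to above can be written out (following Chen and Guestrin, 2016): at iteration $t$, with $g_i$ and $h_i$ the first and second derivatives of the loss with respect to the previous prediction, the new tree $f_t$ is chosen to minimize

$$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\Big[g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^2(x_i)\Big] + \Omega(f_t), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\,\lambda \sum_{j=1}^{T} w_j^2,$$

where $T$ is the number of leaves and $w_j$ the leaf weights; $\gamma$ penalizes tree growth and $\lambda$ the size of the leaf weights, which are the regularization terms mentioned above.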
Comments
1) It is not immediately apparent what the weak classifier for GB should be (e.g., by varying depth in our case). Likewise, the number of iterations is a big issue. In our simple example in the first study, M6 GB was the best performer. Still, overall modeling benefited from ensembling all methods, as measured by AUROC, Cum Lift, or ensemble p-values.
2) The posterior probability ranges are vastly different, and thus classifying observations by the 0.5 threshold is too simplistic.
3) The PDPs show that different methods find distinct multivariate structures. Interestingly, the ensemble p-values show a decreasing tendency for logistic and trees, and a strong S-shaped tendency for M6 GB (first study), which could mean that M6 GB alone tends to overshoot its predictions.
4) GB is relatively unaffected by the 50/50 mixture.
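The point about the 0.5 threshold can be made concrete: with a low event prior, a 0.5 cutoff may flag no events at all, while cutting near the event prior recovers them. A toy illustration with made-up scores:

```python
def recall_at(scores, labels, threshold):
    """Fraction of actual events captured at a given probability cutoff."""
    hits = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    return hits / sum(labels)

labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]          # 20% event prior
scores = [0.40, 0.10, 0.05, 0.30, 0.20,
          0.35, 0.15, 0.10, 0.25, 0.05]          # posteriors never reach 0.5
recall_half = recall_at(scores, labels, 0.5)      # 0.0: no event flagged
recall_prior = recall_at(scores, labels, 0.2)     # 1.0: both events flagged
```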
Comments
5) While for GB classification problems predictions are within [0, 1], for continuous-target problems predictions can fall beyond the range of the target variable → headaches.
This is because GB models the residual at each iteration, not the original target; this can lead to surprises, such as negative predictions when Y takes only non-negative values, contrary to the original tree algorithm.
6) The shrinkage parameter and early stopping (number of trees) both act as regularizers, but their combined effect is not well understood and could be ineffective.
7) If shrinkage is too small and a large number of trees T is allowed, the model is large and expensive to compute, implement, and understand.
8) Random Forests over-fitted. A larger study should vary its parameters for better validation.
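The residual-fitting loop described in (5) can be sketched with regression stumps on a 1-D toy problem (a bare-bones illustration of squared-error GB, not the production algorithm):

```python
def fit_stump(x, r):
    """Best single split on x minimizing squared error of residuals r."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - ml) ** 2 for ri in left)
               + sum((ri - mr) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr

def gradient_boost(x, y, n_trees=20, shrinkage=0.3):
    """Each stump is fit to the current residuals, not the original target."""
    f0 = sum(y) / len(y)
    stumps, pred = [], [f0] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + shrinkage * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + shrinkage * sum(s(xi) for s in stumps)

x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 5]
model = gradient_boost(x, y)
mse = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(x)
```

Because the final prediction is a sum of residual fits, nothing constrains it to the range of y, which is the source of the headaches noted above.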
Comments
9) Model interpretation is difficult for BG, RF, and GB (and not trivial for the other methods either). PDPs for logistic regression variables show monotonic relationships, while those for GB variables are very nonlinear. PDPs for the other methods were not created.
Drawbacks of GB.
1) IT IS NOT MAGIC: it won't solve ALL modeling needs, but it is the best off-the-shelf tool. One still needs to look for transformations, odd issues, missing values, etc.
2) As with all tree methods, categorical variables with many levels (e.g., zip codes) can make it impossible to obtain a model.
3) Memory requirements can be very large, especially with many iterations: a typical problem of ensemble methods.
4) A large number of iterations → slow prediction speed → on-line scoring may require a trade-off between complexity and available time. Once GB is trained, parallelization certainly helps.
5) No simple algorithm to capture interactions, because of the base learners.
6) No simple rules to determine gamma, the number of iterations, or the depth of the simple learner. One needs to try different combinations and possibly recalibrate over time.
7) Still, one of the most powerful methods available.
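For drawback 2, one common workaround (not covered in the slides) is smoothed target (mean) encoding, which replaces a many-level categorical such as zip code with a single numeric column; the zip codes and the constant k below are illustrative:

```python
def target_encode(rows, prior, k=5.0):
    """Smoothed mean encoding: (sum_y + k * prior) / (count + k) per level.

    `rows` is a list of (level, y) pairs; k shrinks rare levels toward
    the global event rate `prior`, limiting overfit on sparse levels.
    """
    stats = {}
    for level, y in rows:
        count, total = stats.get(level, (0, 0.0))
        stats[level] = (count + 1, total + y)
    return {lvl: (s + k * prior) / (c + k) for lvl, (c, s) in stats.items()}

rows = [("07030", 1), ("07030", 1), ("10001", 0)]
enc = target_encode(rows, prior=0.5, k=1.0)
# enc["07030"] = (2 + 0.5) / (2 + 1) ≈ 0.833; enc["10001"] = 0.5 / 2 = 0.25
```

The encoding must be computed on training data only (ideally out-of-fold) to avoid leaking the target into validation.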
Un-reviewed
Catboost
DeepForest
gcForest
Use of tree methods for continuous target variable.
Naïve-Bayes
Bootstrapping.
…
2.11) References
Auslender, L. (1998): Alacart, poor man's classification trees, NESUG.
Breiman, L., Friedman, J., Olshen, R., Stone, C. (1984): Classification and Regression Trees, Wadsworth.
Chen, T., Guestrin, C. (2016): XGBoost: A Scalable Tree Boosting System.
Chipman, H., George, E., McCulloch, R.: BART: Bayesian Additive Regression Trees, The Annals of Applied Statistics.
Foster, D. (2017): New R package that makes XGBoost interpretable, https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211
Friedman, J. (2001): Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189-1232. doi:10.1214/aos/1013203451
Paluszynska, A. (2017): Structural mining and knowledge extraction from random forest with applications to The Cancer Genome Atlas project (https://rawgit.com/geneticsMiNIng/BlackBoxOpener/master/randomForestExplainer_Master_thesis.pdf and https://mi2datalab.github.io/randomForestExplainer/)
Quinlan, J.R. (1993): C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers.
Earlier literature on combining methods:
Winkler, R.L. and Makridakis, S. (1983): The combination of forecasts. J. R. Statist. Soc. A, 146(2), 150-157.
Makridakis, S. and Winkler, R.L. (1983): Averages of forecasts: Some empirical results. Management Science, 29(9), 987-996.
Bates, J.M. and Granger, C.W. (1969): The combination of forecasts. OR, 451-468.
1) Can you explain in nontechnical language the idea of maximum likelihood estimation? Of SVM (unreviewed in class)?
2) Contrast GB with RF.
3) In what way is over-fitting like a glove? Like an umbrella?
4) Would ensemble models always improve on individual models?
5) Would you select variables by way of tree methods to use in linear methods later on? Yes? No? Why?
6) In tree regression, final predictions are means. Could better predictions be obtained by a regression model instead? A logistic for a binary target? Discuss.
7) There are 9 coins, 8 of which are of equal weight, and there is one balance scale. How many weighings until you identify the odd coin?
8) Why are manhole covers round?
9) You obtain 100% accuracy in validation of a classification model. Are you a genius? Yes, no, why?
10) If 85% of witnesses saw a blue car during the accident, and 15% saw a red car, what is the probability that the car is blue?
Counter-interview questions (you ask the interviewer).
1) How do you measure the height of a building with just a barometer? Give at least three answers.
2) Two players A and B take turns saying a positive integer from 1 to 9. The numbers are added, and whoever reaches 100 or above loses. Is there a strategy to never lose? (Aborting a game midway is acceptable, but give reasoning.)
3) There are two jugs, one holding 5 gallons and the other 3, and a nearby water fountain. How do you put exactly 4 gallons (less than one ounce deviation is fine) into the 5-gallon jug?
for now

More Related Content

More from Leonardo Auslender (20)

4_2_Ensemble models and gradient boosting2.pdf
4_2_Ensemble models and gradient boosting2.pdf4_2_Ensemble models and gradient boosting2.pdf
4_2_Ensemble models and gradient boosting2.pdf
 
4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf4_5_Model Interpretation and diagnostics part 4_B.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf
 
4_5_Model Interpretation and diagnostics part 4.pdf
4_5_Model Interpretation and diagnostics part 4.pdf4_5_Model Interpretation and diagnostics part 4.pdf
4_5_Model Interpretation and diagnostics part 4.pdf
 
4_2_Ensemble models and grad boost part 1.pdf
4_2_Ensemble models and grad boost part 1.pdf4_2_Ensemble models and grad boost part 1.pdf
4_2_Ensemble models and grad boost part 1.pdf
 
Classification methods and assessment.pdf
Classification methods and assessment.pdfClassification methods and assessment.pdf
Classification methods and assessment.pdf
 
Linear Regression.pdf
Linear Regression.pdfLinear Regression.pdf
Linear Regression.pdf
 
4 MEDA.pdf
4 MEDA.pdf4 MEDA.pdf
4 MEDA.pdf
 
2 UEDA.pdf
2 UEDA.pdf2 UEDA.pdf
2 UEDA.pdf
 
3 BEDA.pdf
3 BEDA.pdf3 BEDA.pdf
3 BEDA.pdf
 
1 EDA.pdf
1 EDA.pdf1 EDA.pdf
1 EDA.pdf
 
0 Statistics Intro.pdf
0 Statistics Intro.pdf0 Statistics Intro.pdf
0 Statistics Intro.pdf
 
0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf0 Model Interpretation setting.pdf
0 Model Interpretation setting.pdf
 
4 2 ensemble models and grad boost part 1 2019-10-07
4 2 ensemble models and grad boost part 1 2019-10-074 2 ensemble models and grad boost part 1 2019-10-07
4 2 ensemble models and grad boost part 1 2019-10-07
 
4 meda
4 meda4 meda
4 meda
 
3 beda
3 beda3 beda
3 beda
 
2 ueda
2 ueda2 ueda
2 ueda
 
1 eda
1 eda1 eda
1 eda
 
0 statistics intro
0 statistics intro0 statistics intro
0 statistics intro
 
4 1 tree world
4 1 tree world4 1 tree world
4 1 tree world
 
Classification methods and assessment
Classification methods and assessmentClassification methods and assessment
Classification methods and assessment
 

Recently uploaded

Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontangsiskavia95
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...ThinkInnovation
 
Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...
Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...
Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...Payal Garg #K09
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives23050636
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...ThinkInnovation
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptxClient Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptxStephen266013
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证acoha1
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"John Sobanski
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSSnehalVinod
 
jll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdfjll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdfjaytendertech
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...ssuserf63bd7
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...yulianti213969
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 

Recently uploaded (20)

Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontangobat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di  Bontang
obat aborsi Bontang wa 082135199655 jual obat aborsi cytotec asli di Bontang
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...
Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...
Unsatisfied Bhabhi ℂall Girls Vadodara Book Esha 7427069034 Top Class ℂall Gi...
 
Displacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second DerivativesDisplacement, Velocity, Acceleration, and Second Derivatives
Displacement, Velocity, Acceleration, and Second Derivatives
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptxClient Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
Client Researchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh.pptx
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTSDBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
DBMS UNIT 5 46 CONTAINS NOTES FOR THE STUDENTS
 
jll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdfjll-asia-pacific-capital-tracker-1q24.pdf
jll-asia-pacific-capital-tracker-1q24.pdf
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
obat aborsi Tarakan wa 081336238223 jual obat aborsi cytotec asli di Tarakan9...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 

4 2 ensemble models and grad boost part 3 2019-10-07

  • 1. Leonardo Auslender Copyright 2004Leonardo Auslender – Copyright 2018 17/1/2018 Ensemble models and Gradient Boosting, part 3. Leonardo Auslender Independent Statistical Consultant Leonardo ‘dot’ Auslender ‘at’ Gmail ‘dot’ com. Copyright 2018.
  • 2. Leonardo Auslender Copyright 2004Leonardo Auslender – Copyright 2018 27/1/2018 Studies 2.8.c: Comparison of methods but focusing on whether raw vs 50/50 re-sampling makes a difference.
  • 3. Leonardo Auslender Copyright 2004Leonardo Auslender – Copyright 2018 37/1/2018
  • 4. Leonardo Auslender Copyright 2004Leonardo Auslender – Copyright 2018 47/1/2018 Aim: study performance of Fraud models with original 20% fraud events by altering percentage of events. 3 studies: M1 5% events M2 20% events, original M3 50% Events. Validation data set is random sample from original 20% data set for all three studies. Battery of models as in previous study, similar graphs for evaluation.
  • 5. Leonardo Auslender Copyright 2004Leonardo Auslender – Copyright 2018 57/1/2018 Model Name Item Information M1 TRN data set train05 . TRN num obs 954 VAL data set validata . VAL num obs 2365 TST data set . TST num obs Dep. Var fraud TRN % Events 5.346 VAL % Events 19.281 M2 TRN data set train . TRN num obs 3595 VAL data set validata . VAL num obs 2365 TST data set . TST num obs Dep. Var fraud TRN % Events 20.389 VAL % Events 19.281 M3 TRN data set train50 . TRN num obs 1133 VAL data set validata . VAL num obs 2365 TST data set M3 . TST num obs 1 2 Dep. Var fraud 1 TRN % Events 50.838 1 VAL % Events 19.281 1
  • 6. Leonardo Auslender Copyright 2004Leonardo Auslender – Copyright 2018 67/1/2018 Requested Models: Names & Descriptions. Model # Full Model Name Model Description *** Overall Models -1 M1 Raw 05pct -10 M2 Raw 20pct -10 M3 50pct -10 01_M1_NSMBL_TRN_LOGISTIC_NONE Logistic TRN NONE Ensemble 1 02_M1_NSMBL_VAL_LOGISTIC_NONE Logistic VAL NONE Ensemble 2 03_M1_TRN_BAGGING Bagging TRN Bagging 3 04_M1_TRN_GRAD_BOOSTING Gradient Boosting 4 05_M1_TRN_LOGISTIC_STEPWISE Logistic TRN STEPWISE 5 06_M1_TRN_RFORESTS Random Forests 6 07_M1_TRN_TREES Trees TRN Trees 7 08_M1_VAL_BAGGING Trees VAL Trees 8 09_M1_VAL_GRAD_BOOSTING Gradient Boosting 9 10_M1_VAL_LOGISTIC_STEPWISE Logistic VAL STEPWISE 10 11_M1_VAL_RFORESTS Random Forests 11 12_M1_VAL_TREES Trees VAL Trees 12 13_M2_NSMBL_TRN_LOGISTIC_NONE Logistic TRN NONE Ensemble 13 14_M2_NSMBL_VAL_LOGISTIC_NONE Logistic VAL NONE Ensemble 14 15_M2_TRN_BAGGING Bagging TRN Bagging 15 16_M2_TRN_GRAD_BOOSTING Gradient Boosting 16 17_M2_TRN_LOGISTIC_STEPWISE Logistic TRN STEPWISE 17 And similarly for rest of M2 and all of M3.
  • 7. Leonardo Auslender Copyright 2004Leonardo Auslender – Copyright 2018 77/1/2018
  • 8. Leonardo Auslender Copyright 2004Leonardo Auslender – Copyright 2018 87/1/2018 Top split level, nodes 2 and 3. Note: 3 M1 (02, 03, 05) models split on member_duration, but corresponding M2 and M3 on no_claims. Lonely M1 GB. Previously, just No_claims.
  • 9. Leonardo Auslender Copyright 2004Leonardo Auslender – Copyright 2018 97/1/2018 Same info, different categorization. Omitted next levels.
  • 10. Rest omitted for brevity. Some conclusions:
      - Extreme imbalance changed the initial split variable, and therefore the model structure, relative to the more balanced data sets.
      - In the more balanced cases, even the splitting value mostly did not change.
      - The event probability in the resulting nodes differs because the initial event rates differ.
      - A different splitting variable is not necessarily "bad".
      - Note that the sample size is smaller for the more imbalanced data sets.
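The kind of split search behind these trees can be illustrated with a minimal sketch of a CART-style, Gini-impurity threshold search on one numeric predictor. This is toy code with made-up data, not the deck's software; it only shows how the node event rate feeds both the impurity and the resulting node probabilities.

```python
# Minimal CART-style split search sketch: for one numeric predictor,
# pick the threshold minimizing the weighted Gini impurity of the
# two child nodes (x <= t vs. x > t).

def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2*p*(1-p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split(x, y):
    """Return (threshold, weighted_gini) minimizing child impurity."""
    n = len(x)
    best = (None, float("inf"))
    for t in sorted(set(x))[:-1]:               # candidate thresholds
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        w = (len(left) * gini(left) + len(right) * gini(right)) / n
        if w < best[1]:
            best = (t, w)
    return best

# Toy data: a no_claims-like count; event rate rises with the count.
x = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
t, w = best_split(x, y)
print(t, round(w, 3))   # best threshold and its weighted impurity
```

Changing the event rate in the training sample changes the label vector y, and with it both the chosen threshold and the child-node event probabilities, which is the mechanism behind the different M1 vs. M2/M3 splits above.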
  • 12. M1 models choose a different important variable. The 50/50 tree (M3_TREES) selected just no_claims, while M2_TREES selected three additional predictors. BG, RF and GB are not similarly affected. M1 trees have no important variables and are the most affected by imbalance.
  • 14. M2 (50/50) stops earlier, but its validation misclassification is higher than for raw (M1).
  • 15. Similar results (see previous slide).
  • 17. The logistic shows a monotonically increasing relationship, while GB is more jagged though still increasing. Just one variable is shown. The unbalanced M1 case is seriously affected by comparison. No_claims clearly has a positive effect on the probability of fraud.
  • 18. M1 logistic suffers due to event imbalance.
  • 20. No_claims and Optom_presc are positively associated with the probability of fraud (training).
  • 21. No_claims and Optom_presc are positively associated with the probability of fraud (validation).
  • 22. TRN GB (# 3) has no_claims as most important; the others are flat.
  • 23. VAL GB repeats no_claims as most important and adds doctor_visits and optom_presc as positive effects. The corresponding logistic does not point to doctor_visits. The corresponding M1 and M3 models are almost identical.
  • 25. VAL GB (# 7) is by far the most important.
  • 26. The curves converge at the event prior. Forests perform very well at TRN but not at VAL.
  • 27. VAL logistic (# 28) works to bring down the positive GB VAL slope.
  • 29. Class imbalance shifts the curve down and flattens its slope.
  • 30. VAL results for RF in the ensemble models are flat.
  • 31. Goodness of Fit and Model Selection.
  • 32. GOF ranks (VAL): rank per measure, with unweighted mean and median across the six measures.

        Model Name                     AUROC  ASE  CumLift3  CumRR3  Gini  Cramer-Tjur R2   Mean  Median
        02_M1_NSMBL_VAL_LOGISTIC_NONE      6    3         6       6     6               3   5.00     6.0
        08_M1_VAL_BAGGING                 14   13        15      15    14              14  14.17    14.0
        09_M1_VAL_GRAD_BOOSTING            5    4         1       1     5               4   3.33     4.0
        10_M1_VAL_LOGISTIC_STEPWISE       16   11        16      16    16              17  15.33    16.0
        11_M1_VAL_RFORESTS                13   16        14      14    13              15  14.17    14.0
        12_M1_VAL_TREES                   18   18        18      18    18              18  18.00    18.0
        14_M2_NSMBL_VAL_LOGISTIC_NONE      1    1         2       2     1               1   1.33     1.0
        20_M2_VAL_BAGGING                  9    7         9       9     9              12   9.17     9.0
        21_M2_VAL_GRAD_BOOSTING            3    5         3       3     3               5   3.67     3.0
        22_M2_VAL_LOGISTIC_STEPWISE       10    8        11      11    10              10  10.00    10.0
        23_M2_VAL_RFORESTS                17   12        17      17    17              16  16.00    17.0
        24_M2_VAL_TREES                   12   10        10      10    12               8  10.33    10.0
        26_M3_NSMBL_VAL_LOGISTIC_NONE      2    2         4       4     2               2   2.67     2.0
        32_M3_VAL_BAGGING                  8   14         8       8     8               7   8.83     8.0
        33_M3_VAL_GRAD_BOOSTING            4    6         5       5     4               6   5.00     5.0
        34_M3_VAL_LOGISTIC_STEPWISE       11    9        12      12    11              11  11.00    11.0
        35_M3_VAL_RFORESTS                 7   15         7       7     7               9   8.67     7.0
        36_M3_VAL_TREES                   15   17        13      13    15              13  14.33    14.0

      (ASE = average squared error; CumLift3 = cumulative lift, 3rd bin; CumRR3 = cumulative response rate, 3rd bin.)
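The two summary columns are simply the unweighted mean and median of each model's six per-measure ranks. A quick check, using the 09_M1_VAL_GRAD_BOOSTING row (a sketch with Python's standard statistics module):

```python
# Recompute the "Unw. Mean" and "Unw. Median" columns for one row of
# the GOF rank table: 09_M1_VAL_GRAD_BOOSTING.
from statistics import mean, median

# Ranks by AUROC, ASE, Cum Lift 3rd bin, Cum Resp Rate 3rd bin,
# Gini, and Cramer-Tjur R-square, as in the table above.
ranks = [5, 4, 1, 1, 5, 4]

print(round(mean(ranks), 2))   # 3.33, matching the table
print(median(ranks))           # 4, matching the table
```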
  • 33. The M2 VAL ensemble is best overall, and M1 VAL GB is the best single model. M3 VAL single-model performance is lackluster.
  • 35. Conclusions on re-sampling:
      - In this example, the 50/50 resampled models (M3) yielded a smaller tree with no discernible performance difference from their M2 counterparts.
      - M1 trees failed to perform; the other M1 methods performed acceptably well.
      - Actual performance (for the best models) was not affected by 50/50 vs. raw modeling.
      - Extreme imbalance seriously hurt raw trees, but not the other variants.
      - The overall winner in all cases, evaluated at VAL, was GB.
      - Models suffer when the event prior is seriously imbalanced, except for GB.
  • 37. XGBoost. Developed by Chen and Guestrin (2016), "XGBoost: A Scalable Tree Boosting System". Claims to be faster and better than neural networks and random forests. Uses second-order gradients of the loss function, obtained from a Taylor expansion and plugged into the same boosting algorithm, for greater generalization. In addition, it turns the loss into a more sophisticated objective function containing regularization terms that penalize tree growth, with the penalty proportional to the size of the leaf weights, thus preventing overfitting. More efficient than GB thanks to parallel computing on a single machine (about 10 times faster). The algorithm exploits an advanced decomposition of the objective function that lets it outperform GB. Not yet available in SAS; available in R, Julia, Python and a CLI. Used in many champion models in recent competitions (Kaggle, etc.). See also Foster's (2017) xgboostExplainer.
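The second-order machinery just described can be sketched numerically. This is an illustration of the formulas in Chen and Guestrin (2016), not the xgboost library itself: for logistic loss the per-observation gradient is g_i = p_i - y_i and the hessian is h_i = p_i(1 - p_i); a leaf's optimal weight is -G/(H + lambda), and a split's gain is 0.5 * [G_L^2/(H_L + lambda) + G_R^2/(H_R + lambda) - (G_L + G_R)^2/(H_L + H_R + lambda)] - gamma.

```python
# Second-order (Newton) boosting statistics for logistic loss, with
# XGBoost-style L2 regularization (lambda) and per-leaf penalty (gamma).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def leaf_stats(y, raw_scores):
    """Sum of first- and second-order loss derivatives over a leaf."""
    G = H = 0.0
    for yi, s in zip(y, raw_scores):
        p = sigmoid(s)
        G += p - yi          # gradient of logistic loss wrt the raw score
        H += p * (1.0 - p)   # hessian of logistic loss wrt the raw score
    return G, H

def leaf_weight(G, H, lam=1.0):
    """Optimal leaf value under L2 penalty lambda."""
    return -G / (H + lam)

def split_gain(Gl, Hl, Gr, Hr, lam=1.0, gamma=0.0):
    """Objective reduction from splitting; gamma penalizes the extra leaf."""
    score = lambda G, H: G * G / (H + lam)
    return 0.5 * (score(Gl, Hl) + score(Gr, Hr)
                  - score(Gl + Gr, Hl + Hr)) - gamma

# All-zero initial scores (p = 0.5 everywhere); a leaf with three
# events and one non-event gets G = -1.0, H = 1.0, weight = 0.5.
G, H = leaf_stats([1, 1, 1, 0], [0.0, 0.0, 0.0, 0.0])
print(leaf_weight(G, H))   # 0.5
```

Raising lambda shrinks the leaf weights toward zero, and raising gamma makes more candidate splits unprofitable, which is how the objective itself penalizes tree growth.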
  • 39. Comments.
      1) It is not immediately apparent what the weak classifier should be for GB (e.g., by varying depth in our case); likewise, the number of iterations is a big issue. In our simple example in the first study, M6 GB was the best performer. Still, overall modeling benefited from ensembling all methods, as measured by AUROC, cumulative lift, or ensemble p-values.
      2) The posterior probability ranges are vastly different across methods, so classifying observations by a 0.5 threshold is too simplistic.
      3) The PDPs show that different methods find distinct multivariate structures. Interestingly, the ensemble p-values show a decreasing tendency for logistic and trees and a strong S-shaped tendency for M6 GB (first study), which could mean that M6 GB alone tends to overshoot its predictions.
      4) GB is relatively unaffected by the 50/50 mixture.
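Comment 2 can be made concrete: with an imbalanced event prior, a fixed 0.5 cutoff can flag nothing at all. A common, simple alternative, shown here as an illustration rather than the deck's recommendation, is to cut at the training event prior instead.

```python
# Classify posterior probabilities at a chosen threshold. With an
# event prior around 20%, a 0.5 cutoff flags no events; cutting at
# the prior recovers the high-probability observations.

def classify(probs, threshold):
    """Label 1 where the posterior probability reaches the threshold."""
    return [int(p >= threshold) for p in probs]

probs = [0.08, 0.15, 0.22, 0.31, 0.05]   # toy posteriors, ~20% prior

print(classify(probs, 0.5))    # [0, 0, 0, 0, 0] -> nothing flagged
print(classify(probs, 0.20))   # [0, 0, 1, 1, 0]
```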
  • 40. Comments (cont.).
      5) While GB predictions in classification problems lie within [0, 1], predictions for continuous targets can fall outside the range of the target variable, which causes headaches. This is because GB models the residual at each iteration, not the original target; it can lead to surprises, such as negative predictions when Y takes only non-negative values, contrary to the original tree algorithm.
      6) The shrinkage parameter and early stopping (number of trees) act as regularizers, but their combined effect is not well understood and could be ineffective.
      7) If shrinkage is too small and a large T is allowed, the model is large and expensive to compute, implement and understand.
      8) Random forests over-fitted. A larger study should vary their parameters for better validation.
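The residual-fitting mechanics behind comment 5 can be sketched with a minimal least-squares gradient booster built from depth-1 stumps. This is toy code on made-up data, not the deck's software: each round fits the current residuals (the negative gradient of squared loss), never the original target, and the final prediction is the initial mean plus a shrunken sum of stump outputs.

```python
# Minimal least-squares gradient boosting with regression stumps:
# F0 = mean(y); each round fits a stump to the residuals y - F and
# adds a shrunken copy of its output to the running prediction.

def fit_stump(x, r):
    """Best single-split regressor on residuals r: (t, left_mean, right_mean)."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(x, y, rounds=50, shrink=0.3):
    pred = [sum(y) / len(y)] * len(y)              # F0 = mean of y
    for _ in range(rounds):
        r = [yi - pi for yi, pi in zip(y, pred)]   # residuals = neg. gradient
        t, lm, rm = fit_stump(x, r)
        pred = [pi + shrink * (lm if xi <= t else rm)
                for xi, pi in zip(x, pred)]
    return pred

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]
pred = boost(x, y)
print([round(p, 2) for p in pred])
```

Because the accumulated prediction is a sum of fitted residual pieces rather than a direct fit to Y, nothing in the arithmetic constrains it to Y's observed range, which is exactly the headache comment 5 describes for continuous targets.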
  • 41. Comments (cont.).
      9) Model interpretation is difficult for BG, RF and GB (and not trivial for the other methods either). PDPs for the logistic regression variables show monotonic relationships, while those for the GB variables are very nonlinear. PDPs for the other methods were not created.
  • 42. Drawbacks of GB.
      1) IT IS NOT MAGIC: it won't solve all modeling needs, though it is the best off-the-shelf tool. You still need to look for transformations, odd issues, missing values, etc.
      2) As with all tree methods, categorical variables with many levels (e.g., zip codes) can make it impossible to obtain a model.
      3) Memory requirements can be very large, especially with many iterations: a typical problem of ensemble methods.
      4) A large number of iterations means slow prediction, so on-line scoring may require a trade-off between complexity and available time. Once the GB model is learned, parallelization certainly helps.
      5) No simple algorithm to capture interactions, because of the base-learners.
      6) No simple rules to determine gamma, the number of iterations, or the depth of the simple learner; one must try different combinations and possibly recalibrate over time.
      7) Still, it is one of the most powerful methods available.
  • 43. Not reviewed: CatBoost; DeepForest / gcForest; use of tree methods for continuous target variables; naïve-Bayes bootstrapping; ...
  • 44. 2.11) References.
      Auslender, L. (1998): "Alacart: poor man's classification trees", NESUG.
      Breiman, L., Friedman, J., Olshen, R., Stone, C. (1984): Classification and Regression Trees, Wadsworth.
      Chen, T., Guestrin, C. (2016): "XGBoost: A Scalable Tree Boosting System".
      Chipman, H., George, E., McCulloch, R.: "BART: Bayesian Additive Regression Trees", The Annals of Statistics.
      Foster, D. (2017): "New R package that makes XGBoost interpretable", https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211
      Friedman, J. (2001): "Greedy function approximation: a gradient boosting machine", Annals of Statistics 29, 1189-1232. doi:10.1214/aos/1013203451
      Paluszynska, A. (2017): "Structural mining and knowledge extraction from random forest with applications to The Cancer Genome Atlas project", https://rawgit.com/geneticsMiNIng/BlackBoxOpener/master/randomForestExplainer_Master_thesis.pdf and https://mi2datalab.github.io/randomForestExplainer/
      Quinlan, J. R. (1993): C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers.
  • 45. Earlier literature on combining methods:
      Winkler, R. L., Makridakis, S. (1983): "The combination of forecasts", J. R. Statist. Soc. A 146(2), 150-157.
      Makridakis, S., Winkler, R. L. (1983): "Averages of Forecasts: Some Empirical Results", Management Science 29(9), 987-996.
      Bates, J. M., Granger, C. W. (1969): "The combination of forecasts", OR, 451-468.
  • 47. Interview questions:
      1) Can you explain in nontechnical language the idea of maximum likelihood estimation? Of SVM (not reviewed in class)?
      2) Contrast GB with RF.
      3) In what way is over-fitting like a glove? Like an umbrella?
      4) Would ensemble models always improve on individual models?
      5) Would you select variables by way of tree methods to use in linear methods later on? Yes? No? Why?
      6) In tree regression, final predictions are means. Could better predictions be obtained by a regression model instead? A logistic for a binary target? Discuss.
      7) There are 9 coins, 8 of which are of equal weight, and there is one scale. How many steps until you identify the odd coin?
      8) Why are manhole covers round?
      9) You obtain 100% accuracy in validating a classification model. Are you a genius? Yes, no, why?
      10) If 85% of witnesses saw a blue car during an accident, and 15% saw a red car, what is the probability that the car is blue?
  • 48. Counter-interview questions (you ask the interviewer):
      1) How do you measure the height of a building with just a barometer? Give at least three answers.
      2) Two players, A and B, take turns saying a positive integer from 1 to 9. The numbers are added, and whoever reaches 100 or above loses. Is there a strategy to never lose? (Aborting a game midway is acceptable, but give your reasoning.)
      3) There are two jugs, one holding 5 gallons and the other 3, and a nearby water fountain. How do you put exactly 4 gallons (a deviation of less than one ounce is fine) in the 5-gallon jug?
  • 49. ... for now.