Overall Description
The present work presents tools for model interpretation derived from Partial Dependency Plots (in many different guises, explained in the text), contrasted with posterior probabilities, hereby called scores.
The work comprises 4 PowerPoint documents, with a possible fifth (if I get to it), numbered 0 to 4. Document 0 describes overall issues and introduces the working data set and models.
At the risk of spoiling end results, the Multivariate section provides insights at (almost) the observation level, and it requires univariate and bivariate support. This conclusion is quite surprising to me, since I thought that the Univariate and Bivariate analyses would be rendered lacking. But context reality is far more complex than expected, and model interpretations are as varied as the different contexts available in the data, which should not be dismissed all too eagerly.
This work is based mostly on visualization, and I have tried to avoid statistical inference and lengthy tables.
Abstract
Statistical and data science models: are they interpretive black boxes? Let's try for NO.
Molnar's (2018) "Interpretable Machine Learning" makes a big effort in finding solutions. Our presentation is humbler: visual tools for model interpretation based on partial dependency plots and their variants, such as collapsed PDPs created by the presenter, some of which may be polemical and debatable. Almost no use of statistical inference.
The audience should be versed in model creation and have at least some insight into partial dependency plots. The presentation is based on a simple working example with 8 predictors and one binary target variable.
It is not possible to detail exhaustively every method described in this presentation; an extensive document is in preparation. The presentation requires 3 hours and a wide-awake audience. Double time if not awake. Sleepers will be punished accordingly.
Slides marked **** can be skipped for easier first reading.
Contents: Model Interpretation (MI)
1. Introduction and General Notes
2. Confounding
3. Model Interpretation (MI) and Categorization: UMI, BMI, MMI.
4. Binary Target Study
4.1: Report of coefficients, estimates, etc.
4.2: Models Structures
4.3: GOF and model Interpretation
5. Univariate Model Interpretation: UMI
5. Profile and Model Interpretation area. Univariate Model Interpretation (UMI).
6. Partial Dependency Plots (PDPs) and their variants. UMI.
6. Bivariate Model Interpretation: BMI
7. PDPs and Bivariate Model Interpretation (BMI).
7.1: UMI vs. BMI.
8. Multivariate Model Interpretation: MMI
9. Future Steps
10. Observation level Interpretation.
11. References
Overall comments and introduction.
Presentation by way of example, focusing on the Fraud/Default data set and continuing previous chapters available on the web (standard class for Principal Analytics Prep).
Aim: study interpretation/diagnosis, mostly via Partial Dependency Plots, of logistic regression, Classification Trees and Gradient Boosting.
Presentation(s) available at
https://www.slideshare.net/LeonardoAuslender/visual-tools-for-interpretation-of-machine-learning-models
At present, there are many written opinions and distinctions about the topic, and no room or desire to discuss them all. See Molnar's (2018) book for an overall view, as well as O'Rourke (2018) and Doshi-Velez et al. (2017).
Overall comments and introduction (cont 1).
No discussion about imbalanced data set modeling
or other modeling issues such as model selection.
This presentation introduces novel visual concepts
as well as tools derived from Partial Dependency
Plots (PDP):
-Overall PDP
-Collapsed PDP and residuals
-Marginal PDP
-PDP vs. actual scores, ….
and how they assist in model interpretation.
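All the PDP variants listed above start from the classical (Friedman-style) partial dependence computation. Below is a minimal sketch of that computation in Python with NumPy. It assumes the fitted model is exposed as a function that returns P(event) for each row. `partial_dependence` and `toy_model` are hypothetical names, and the toy logistic model stands in for any of the three fitted models. The presenter's collapsed, marginal and overall PDP variants are not reproduced here.

```python
import numpy as np


def partial_dependence(predict_proba, X, feature, grid):
    """Classical partial dependence of a fitted model on one column of X.

    For each grid value v, column `feature` of every row is set to v and the
    model's predicted probabilities are averaged over all rows.
    """
    pd_values = np.empty(len(grid))
    for k, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = v          # force the feature to v for every row
        pd_values[k] = predict_proba(X_mod).mean()
    return pd_values


def toy_model(X):
    # Hypothetical stand-in for a fitted classifier's P(FRAUD = 1 | X)
    logit = -1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]
    return 1.0 / (1.0 + np.exp(-logit))


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
grid = np.linspace(-2.0, 2.0, 9)
pdp = partial_dependence(toy_model, X, feature=0, grid=grid)
for v, p in zip(grid, pdp):
    print(f"x0 = {v:+.1f}  ->  average P(event) = {p:.3f}")
```

Averaging over all rows is what separates a PDP from a single "typical observation" profile. It is also why correlated predictors can make PDPs evaluate unrealistic feature combinations.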
Model Interpretation (MI) and model building issues.
1) Why/where does the model make mistakes (large residuals, outliers, etc.)?
2) Which attributes (alone or in groups) end up being important, and when?
3) Why are some attributes not important?
4) Do observation-level predictions differ across models?
However, the immediate aim is NOT interpretation at the observation level (why an observation was predicted sick/churner/innocent…) but
Objectives of MI (cont. 1)
Why not directly at the observation level?
Suppose a model predicts entertainment-type preference for a database of families in large cities. Since it is not possible to obtain updated family preferences consistently (i.e., the data are 'soft'), the models are necessarily not interpretable at the level of specific families.
Contrariwise, disease diagnostic prediction is closer to individual explanation and interpretability (the data are typically 'hard').
MOTTO: Posterior probability follows from Data + Model algorithm(s). Interpretation follows primarily from probability but must include data (i.e., context) ➔
Model Interpretation categorization.
Just as in EDA, but on model results (i.e., predictions) rather than on the initial data, there are three types of MI:
Univariate Model Interpretation (UMI): one variable at a time vis-à-vis predictions/probabilities. EASIEST to understand and a huge source of "makes sense" discourse. E.g., classical linear model interpretations, reasons to decline a bank loan, etc.
Bivariate Model Interpretation (BMI): looking at pairs of variables to interpret model results. Correlation measures immediately spring to mind.
Multivariate Model Interpretation (MMI): overall model interpretation, the most difficult and valuable.
Typically, most work results in UMI and perhaps BMI. We will aim for MMI as well.
Aside: Does Occam's razor help?
"Pluralitas non est ponenda sine necessitate" ("plurality should not be posited without necessity") ➔ it can lead us either to interpret and then choose a model, or to choose a model and then interpret ➔ it does not help us.
Model Interpretation presentation
We will present results in UMI, BMI and MMI order and, at the end, compare across the three methodologies.
The aim is to find the insights and contradictions that arise when UMI is generalized without validating the interpretation in BMI and MMI.
And likewise, to verify strong UMI results that still prevail in BMI and MMI.
Confounding rears its ugly head.
See earlier chapters for review and examples. Must read; not elaborated herein.
Golden Days of Linear Regression Interpretation ***
Based on the "ceteris paribus" assumption, which fails in the case of even relatively small VIFs. At present, the rule of thumb is VIF >= 10 (R-sq = .90 among predictors) ➔ unstable model (see earlier slides on SlideShare …).
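The VIF rule of thumb above rests on VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. Below is a minimal NumPy sketch. The helper `vif` and the simulated predictors are assumptions, not the presenter's code.

```python
import numpy as np


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j of X on
    the remaining columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out


# Rule of thumb from the slide: R^2 = 0.90 among predictors <-> VIF = 10,
# while R^2 = 0.10 gives a VIF of only about 1.11.
rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = 0.95 * a + 0.3 * rng.normal(size=1000)   # strongly collinear with a
X2 = np.column_stack([a, b])
print(vif(X2))
```

With only two predictors, both VIFs equal 1 / (1 - r^2), where r is their correlation.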
"Ceteris paribus" exercise: keeping all other predictors constant, an increase in …. But if the R-sq among predictors is even 10%, it is not possible to keep all predictors constant while increasing the variable of interest by 1, as per the ceteris paribus frame of analysis.
Advantages, however: EASY to conceptualize, because practice follows the notion of mostly bivariate correlation (keeping all else constant reduces the relationship to just 1 variable vs. predictions ➔ UMI). But it is wrong with even small bivariate correlations and mostly wrong in the multivariate case. Let us see …..
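Confusion on signs of coefficients and interpretation. Simple LR case.
$$Y = \alpha + \beta X + \varepsilon, \qquad \hat\beta = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} = r_{XY}\,\frac{s_Y}{s_X}, \qquad \operatorname{sg}(\hat\beta) = \operatorname{sg}(r_{XY})$$
➔ Corr(X, Y) = $\hat\beta$ if SD(Y) = SD(X), that is, if both variables are standardized. Otherwise they at least share the same sign, and the interpretation from correlation holds in the simple regression case.
Notice that the regression of X on Y is NOT the inverse of the regression of Y on X, because of SD(X) and SD(Y): its slope is $r_{XY}\,s_X / s_Y$, so the product of the two slopes is $r_{XY}^2$ rather than 1.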
5/4/2022
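Below is a quick numerical check of the simple-regression facts above, in Python with NumPy on simulated data. The data-generating values are arbitrary assumptions. The check shows three things:
- the OLS slope equals r * s_Y / s_X and shares the sign of r;
- the slope of X on Y is not the reciprocal of the slope of Y on X;
- with standardized variables, the slope is the correlation itself.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=1000)                 # s_X is about 2
y = 3.0 - 0.8 * x + rng.normal(0.0, 1.0, size=1000)

r = np.corrcoef(x, y)[0, 1]
s_x, s_y = x.std(ddof=1), y.std(ddof=1)

b_yx = np.polyfit(x, y, 1)[0]    # OLS slope of Y on X
b_xy = np.polyfit(y, x, 1)[0]    # OLS slope of X on Y

# beta_hat = r * s_Y / s_X, so the slope and r always share a sign
print(b_yx, r * s_y / s_x)
# The X-on-Y slope is r * s_X / s_Y, NOT 1 / b_yx; the product of slopes is r^2
print(b_xy, 1.0 / b_yx, b_yx * b_xy, r ** 2)

# With standardized variables the slope equals the correlation itself
zx = (x - x.mean()) / s_x
zy = (y - y.mean()) / s_y
b_std = np.polyfit(zx, zy, 1)[0]
print(b_std, r)
```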
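In multiple linear regression, the previous relationship does not hold. Predictors can be correlated ($r_{XZ}$), and that correlation enters weighted by $r_{YZ}$, hinting at collinearity and/or relationships of suppression/enhancement (paper on suppression/enhancement on SlideShare) ➔
But in the multivariate case, e.g.:
$$Y = a + \beta_{YX.Z}\,X + \beta_{YZ.X}\,Z + \varepsilon,$$
estimated equation (emphasizing "partial"):
$$\hat Y = \hat a + \hat\beta_{YX.Z}\,X + \hat\beta_{YZ.X}\,Z,$$
and for example:
$$\hat\beta_{YX.Z} = \frac{s_Y}{s_X}\,\frac{r_{YX} - r_{YZ}\,r_{XZ}}{1 - r_{XZ}^2},$$
so that $\operatorname{sg}(\hat\beta_{YX.Z}) \neq \operatorname{sg}(r_{YX})$ when $\operatorname{abs}(r_{YZ}\,r_{XZ}) > \operatorname{abs}(r_{YX})$ and $r_{YZ}\,r_{XZ}$ has the same sign as $r_{YX}$ (with $r_{XZ}^2 < 1$).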
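Below is a simulated illustration of the sign flip, in Python with NumPy. All data-generating values are assumptions, chosen to trigger the flip. X and Z are strongly correlated, and Y depends negatively on X once Z is held fixed, yet r_YX is positive. The correlation formula for the partial slope and an ordinary least-squares fit on [1, X, Z] should agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
x = 0.9 * z + np.sqrt(0.19) * rng.normal(size=n)   # Var(X) ~ 1, r_XZ ~ 0.9
y = -1.0 * x + 2.0 * z + rng.normal(size=n)         # true partial slope of X is -1

R = np.corrcoef(np.vstack([y, x, z]))
r_yx, r_yz, r_xz = R[0, 1], R[0, 2], R[1, 2]
s_y, s_x = y.std(ddof=1), x.std(ddof=1)

# Partial slope of X from the correlation formula
b_formula = (s_y / s_x) * (r_yx - r_yz * r_xz) / (1.0 - r_xz ** 2)

# The same slope from ordinary least squares on [1, X, Z]
A = np.column_stack([np.ones(n), x, z])
b_ols = np.linalg.lstsq(A, y, rcond=None)[0][1]

print(f"r_YX = {r_yx:+.3f}  beta_YX.Z (formula) = {b_formula:+.3f}  (OLS) = {b_ols:+.3f}")
```

Reading the sign of r_YX as "the effect of X" here would give the wrong direction, which is the UMI trap described above.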
   
 
 
Comment on Linear Model Interpretation
Even in traditional UMI land, the multivariate relations given by partial and semi-partial correlations must be part of the interpretation.
Note that while correlation is a bivariate relationship, partial and semi-partial correlations can be extended to the multivariate setting. In the case of a binary target, these relationships are not fully analyzed.
However, even BMI, and certainly MMI, are not often performed.
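For three variables, the partial and semi-partial correlations mentioned above can be computed from the pairwise correlations. Below is a minimal NumPy sketch on simulated data. The helper name and the data are illustrative assumptions. The partial correlation r_YX.Z is the correlation between the residuals of Y and of X after regressing each on Z. The semi-partial correlation removes Z from X only.

```python
import numpy as np


def partial_and_semipartial(y, x, z):
    """First-order partial correlation r_YX.Z and semi-partial (part)
    correlation r_Y(X.Z), computed from the three pairwise correlations."""
    R = np.corrcoef(np.vstack([y, x, z]))
    r_yx, r_yz, r_xz = R[0, 1], R[0, 2], R[1, 2]
    numerator = r_yx - r_yz * r_xz
    partial = numerator / np.sqrt((1.0 - r_yz ** 2) * (1.0 - r_xz ** 2))
    semi = numerator / np.sqrt(1.0 - r_xz ** 2)
    return partial, semi


rng = np.random.default_rng(2)
z = rng.normal(size=2000)
x = 0.7 * z + rng.normal(size=2000)
y = 0.5 * x + 0.8 * z + rng.normal(size=2000)
partial, semi = partial_and_semipartial(y, x, z)
print(f"r_YX = {np.corrcoef(y, x)[0, 1]:.3f}, partial = {partial:.3f}, semi-partial = {semi:.3f}")
```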
EDA and Model Interpretation
EDA analyzes data sets without reference to the dependent or target variable (DV), which is instead handled by modeling. Thus, MI = EDA + Predictions Analysis.
Nevertheless, for given value(s) of the DV or of the predicted values, UMI, BMI and MMI can utilize EDA tools. For instance, the histogram of posterior model probabilities is part of model UEDA and thus part of UMI.
Thus, MI is based on the relationship of predictions (and residuals) vis-à-vis single predictors, pairs, triads, tetrads, etc. This translates into different techniques, such as original PDPs, pair PDPs, triads, etc., to be reviewed below.
NB: We use binning and rescaling of variable ranges for easier visual interpretation. The number of bins is 10, mostly for UMI analysis, and 3 otherwise. We do not discuss issues of optimal binning, which are left to the reader.
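The NB above does not say how the bins are formed. One possible reading is min-max rescaling to [0, 1] followed by equal-width bins; quantile bins would be an equally reasonable choice. Under that assumption, a UMI-style profile of the mean model score per bin of one predictor could be computed as below. `binned_profile` is a hypothetical helper.

```python
import numpy as np


def binned_profile(x, scores, n_bins=10):
    """Min-max rescale x to [0, 1], cut it into n_bins equal-width bins and
    return the observation count and the mean model score in each bin."""
    x = np.asarray(x, dtype=float)
    scores = np.asarray(scores, dtype=float)
    span = x.max() - x.min()
    x01 = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    # bin index 0..n_bins-1; the maximum value is folded into the last bin
    idx = np.minimum((x01 * n_bins).astype(int), n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sums = np.bincount(idx, weights=scores, minlength=n_bins)
    means = np.divide(sums, counts, out=np.full(n_bins, np.nan), where=counts > 0)
    return counts, means


# Hypothetical usage: 10 bins for UMI, 3 bins for BMI/MMI views
x = np.arange(100)
scores = x / 100.0
counts10, means10 = binned_profile(x, scores, n_bins=10)
counts3, means3 = binned_profile(x, scores, n_bins=3)
print(counts10, np.round(means10, 3))
print(counts3, np.round(means3, 3))
```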
Searching for important variables en route to answering the modeling question.
QUESTION: what are the minimum components needed to make a car go along a highway?
1) Engine
2) Tires
3) Steering wheel
4) Transmission
5) Gas
6) ….. Other MMI aspects and interrelations.
Take just one of them out, and the car won't MOVE ➔ there is NO SINGLE most important variable. Instead, a minimum irreducible set of them is NECESSARY. In the Data Science case with n → ∞, there are possibly many subsets of 'important' variables for (n, p) subsets.
Typically, "suspect VARIABLES" are a good starting point for research. "STARTING" is the key word.
Basic DATA set(s) Information (Model M2)
Item | Information
TRN DATA set | train
TRN num obs | 3595
VAL DATA set |
VAL num obs | 0
TST DATA set |
TST num obs | 0
Dep. Var | fraud
TRN % Events | 20.389
VAL % Events |
TST % Events |
Data set: Definition by way of Example
• Health insurance company: Ophthalmologic Insurance Claims
• Is the claim valid or fraudulent? Binary target.
• Full description and analysis of this data set in https://www.slideshare.net/LeonardoAuslender (lectures at Principal Analytics Prep).
While presenting the results of 3 models, we'll concentrate on the 'best' model for interpretation, for brevity's sake, except to mention specific examples of different interpretations across models.
Requested Models: Names & Descriptions.
Model # | Full Model Name | Model Description
2 | 002_M2_TRN_GRAD_BOOSTING | Gradient Boosting
4 | 004_M2_TRN_LOGISTIC_STEPWISE | Logistic STEPWISE TRN
5 | 005_M2_TRN_TREE | TREE model
Original Vars + Labels (Model M2)
Var # | Variable | Label
1 | FRAUD | Fraudulent Activity yes/no
2 | TOTAL_SPEND | Total spent on opticals
3 | DOCTOR_VISITS | Total visits to a doctor
4 | NO_CLAIMS | No of claims made recently
5 | MEMBER_DURATION | Membership duration
6 | OPTOM_PRESC | Number of opticals claimed
7 | SPEND_PER_CLAIM | Expenses per claim
8 | CLAIMS_PER_DURATION | Claims per duration
Overall MI: Comparison of Models' Posterior Probabilities and Histograms.
Similar, not identical. Logistic & Trees achieve [0, 1].
The probability distributions are very different ➔ model interpretation must depend on model selection. It is possible to 'mix' all models into one (an Ensemble), but that is not covered in this ppt (see slides on SlideShare).
Some conclusions and comments so far: (cont.)
Probability distributions differ in:
1) Extreme points: Logistic and TREES achieve [0; 1], which is not necessarily true of other methods, such as GradBoost in our case.
2) Very different % of obs in the models' probability bins.
3) % events per bin is fairly linear, except for the Logistic 'drop' at 0.7. Grad Boosting has higher % events at higher probability levels than the other 2 models.
4) Above about 0.4 of posterior probability, the 3 methods have similar distributions; they are quite different in the segment 0 - < 0.4. Notice that GB and TREE have a large proportion of observations at lower probability levels, compared to Logistic.
5) Relative, but not absolute, MI information can be inferred. % events differ across models ➔ different probability estimates, especially above the segment 0 - < 0.4. Since higher probability levels reflect higher % events, MI is necessarily different.
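Points 2) to 5) compare two per-bin quantities: % of observations and % events in each posterior-probability bin. They can be tabulated with a short sketch like the one below. It assumes ten equal-width probability bins [0, 0.1), ..., [0.9, 1.0] and 0/1 event coding. `probability_bin_table` is a hypothetical name, not the tool used to produce the slides. Running it once per model on the same observations gives the comparison discussed above.

```python
import numpy as np


def probability_bin_table(p, y, n_bins=10):
    """For each posterior-probability bin: % of all observations falling in
    the bin, and % events (y == 1) among the observations in the bin."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    inner = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    idx = np.digitize(p, inner)          # bin k covers [k/n_bins, (k+1)/n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    events = np.bincount(idx, weights=y, minlength=n_bins)
    pct_obs = 100.0 * counts / counts.sum()
    pct_events = np.divide(100.0 * events, counts,
                           out=np.full(n_bins, np.nan), where=counts > 0)
    return pct_obs, pct_events


# Hypothetical toy scores and outcomes; empty bins report NaN % events
p = np.array([0.05, 0.05, 0.55, 0.95])
y = np.array([0, 1, 1, 1])
pct_obs, pct_events = probability_bin_table(p, y)
print(np.round(pct_obs, 1))
print(np.round(pct_events, 1))
```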
Let's get into data details for the sake of completeness.
Quick EDA area.
U(nivariate) EDA = UEDA
Note the "small" CLAIMS_PER_DURATION and NO_CLAIMS values at p95.
4.2 PROBABILITIES, PARAMETERS, IMPORTANCE
....
4.2.1: Logistic Regression Details.
Note: Importance and coefficients share one column, as do p-values and number of rules. Note that the models do not share all variables. Interestingly, CLAIMS_PER_DURATION is # 1 for the tree methods, and it was not selected by Logistic.
Coefficients, p-values and Importance (Vars * Models * Coeffs).
Variable | M2_TRN_GRAD_BOOSTING Importance | M2_TRN_GRAD_BOOSTING Nrules | M2_TRN_LOGISTIC_STEPWISE Coeff | M2_TRN_LOGISTIC_STEPWISE PVal | M2_TRN_TREE Importance | M2_TRN_TREE Nrules
CLAIMS_PER_DURATION | 1.0000 | 26.000 | | | 1.0000 | 5.000
DOCTOR_VISITS | 0.4035 | 20.000 | -0.0180 | 0.014 | 0.2895 | 2.000
MEMBER_DURATION | 0.5643 | 26.000 | -0.0065 | 0.000 | 0.3650 | 2.000
NO_CLAIMS | 0.2483 | 6.000 | 0.7137 | 0.000 | |
OPTOM_PRESC | 0.5963 | 21.000 | 0.2185 | 0.000 | 0.5383 | 5.000
SPEND_PER_CLAIM | 0.2202 | 8.000 | 0.0000 | 0.001 | |
TOTAL_SPEND | 0.6148 | 29.000 | -0.0000 | 0.000 | 0.4404 | 3.000
INTERCEPT | | | -0.5160 | 0.000 | |
Logistic Selection Steps (Model M2_TRN_LOGISTIC_STEPWISE)
Step | Effect Entered | Effect Removed | # in model | P-value
1 | no_claims | | 1 | .00
2 | member_duration | | 2 | .00
3 | optom_presc | | 3 | .00
4 | total_spend | | 4 | .00
5 | spend_per_claim | | 5 | .00
6 | doctor_visits | | 6 | .01
....
4.2.2: Specific Tree based methods, EDA and diagnostics.
Some conclusions and comments so far:
. Logistic stepwise did not select NUM_MEMBERS, which is shown with the lowest relative importance in GB and Trees. More importantly, "claims_per_duration" is deemed most important by the tree methods and disregarded by logistic. Notice that Logistic Regression does not have an agreed-upon scale of importance; by default, we use odds ratios.
. CLAIMS_PER_DURATION is deemed the most important single variable by GB and TREE, but logistic deems NO_CLAIMS # 1 and OPTOM_PRESC # 2 (via odds ratios), while GB differed.
. The remaining variables have odds ratios of approximately 1, which seems to indicate similar effects across them, while GB/TREE distinguish relative importance after the first two variables.
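Odds ratios are exp(coefficient). The small sketch below uses the logistic stepwise coefficients from the coefficients table. It reproduces the ranking in the second point (NO_CLAIMS first, OPTOM_PRESC second) and shows why the remaining variables sit near 1. One caveat: these are per-unit odds ratios, so they depend on each variable's scale. A coefficient reported as -0.0000 on TOTAL_SPEND can still matter over that variable's full range, which is one reason odds ratios are a debatable importance scale.

```python
import math

# Logistic stepwise coefficients, copied from the coefficients table
coefs = {
    "NO_CLAIMS": 0.7137,
    "OPTOM_PRESC": 0.2185,
    "DOCTOR_VISITS": -0.0180,
    "MEMBER_DURATION": -0.0065,
    "SPEND_PER_CLAIM": 0.0000,
    "TOTAL_SPEND": -0.0000,
}

# Per-unit odds ratio: multiplicative change in the odds of FRAUD per +1 unit
odds_ratios = {var: math.exp(b) for var, b in coefs.items()}

# Rank by distance from "no effect" on the log-odds scale, i.e. |coefficient|
ranking = sorted(coefs, key=lambda var: abs(coefs[var]), reverse=True)

for var in ranking:
    print(f"{var:16s} OR = {odds_ratios[var]:.4f}")
```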
Strongly summarized area for brevity's sake, added just for completeness.
GOF ranks (every measure column is a rank)
Model Name | AUROC | Avg Square Error | Class Rate | Cum Lift 3rd bin | Cum Resp Rate 3rd | Gini | P-R AUC | Precision Rate | Rsquare Cramer Tjur | Unw. Mean | Unw. Median
005_M2_TRN_GRAD_BOOSTING | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1.22 | 1
007_M2_TRN_LOGISTIC_STEPWISE | 3 | 3 | 1 | 3 | 3 | 3 | 3 | 3 | 3 | 2.78 | 3
008_M2_TRN_TREE | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 2 | 2.00 | 2
➔ Gradient Boosting is our champion; we omit the usual ROCs, precision-recall curves, etc.
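The unweighted mean and median columns are simple summaries of each row of ranks. The sketch below reproduces that arithmetic, with the ranks copied from the GOF table (nine measures, in table order). It confirms the 1.22 / 2.78 / 2.00 means and the choice of champion. Ranking by mean rank first and median rank second is an assumption about how ties would be broken.

```python
from statistics import mean, median

# Ranks per GOF measure, in table order: AUROC, Avg Square Error, Class Rate,
# Cum Lift 3rd bin, Cum Resp Rate 3rd, Gini, P-R AUC, Precision Rate,
# Rsquare Cramer Tjur
ranks = {
    "GRAD_BOOSTING": [1, 1, 2, 1, 1, 1, 1, 2, 1],
    "LOGISTIC_STEPWISE": [3, 3, 1, 3, 3, 3, 3, 3, 3],
    "TREE": [2, 2, 3, 2, 2, 2, 2, 1, 2],
}

# Unweighted (equal-weight) mean and median rank per model
summary = {model: (round(mean(r), 2), median(r)) for model, r in ranks.items()}

# Champion: lowest mean rank, median rank as tie-breaker (an assumption)
champion = min(summary, key=lambda model: summary[model])

for model, (m, md) in summary.items():
    print(f"{model:18s} mean = {m:.2f}  median = {md}")
print("champion:", champion)
```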
Tree representation(s) up to 4 levels, Model 'M2_TRN_LG'
Intermediate prediction in parentheses; final (leaf) prediction after ➔.
7 Vars: 1._CLAIMS_PER_DURATION 2._DOCTOR_VISITS 3._MEMBER_DURATION 4._NO_CLAIMS 5._OPTOM_PRESC 6._SPEND_PER_CLAIM 7._TOTAL_SPEND
CLAIMS_PER_DURATION < 0.00791 ( 0.153 )
  MEMBER_DURATION < 155.5 ( 0.213 )
    OPTOM_PRESC < 4.5 ( 0.197 )
      OPTOM_PRESC < 1.5 ( 0.178 ) ➔ 0.178
      OPTOM_PRESC >= 1.5 ( 0.258 ) ➔ 0.258
    OPTOM_PRESC >= 4.5 ( 0.514 )
      OPTOM_PRESC < 6.5 ( 0.399 ) ➔ 0.399
      OPTOM_PRESC >= 6.5 ( 0.622 ) ➔ 0.622
  MEMBER_DURATION >= 155.5 ( 0.113 )
    CLAIMS_PER_DURATION < 0.00376 ( 0.099 )
      OPTOM_PRESC >= 3.5 ( 0.204 ) ➔ 0.204
      OPTOM_PRESC < 3.5 ( 0.093 ) ➔ 0.093
    CLAIMS_PER_DURATION >= 0.00376 ( 0.262 )
      OPTOM_PRESC < 2.5 ( 0.235 ) ➔ 0.235
      OPTOM_PRESC >= 2.5 ( 0.39 ) ➔ 0.390
CLAIMS_PER_DURATION >= 0.00791 ( 0.572 )
  CLAIMS_PER_DURATION < 0.017 ( 0.469 )
    OPTOM_PRESC < 2.5 ( 0.421 )
      CLAIMS_PER_DURATION >= 0.01272 ( 0.496 ) ➔ 0.496
      CLAIMS_PER_DURATION < 0.01272 ( 0.386 ) ➔ 0.386
    OPTOM_PRESC >= 2.5 ( 0.61 )
      OPTOM_PRESC >= 6.5 ( 0.8 ) ➔ 0.800
      OPTOM_PRESC < 6.5 ( 0.571 ) ➔ 0.571
  CLAIMS_PER_DURATION >= 0.017 ( 0.755 )
    NO_CLAIMS < 3.5 ( 0.652 )
      OPTOM_PRESC >= 4.5 ( 0.845 ) ➔ 0.845
      OPTOM_PRESC < 4.5 ( 0.633 ) ➔ 0.633
    NO_CLAIMS >= 3.5 ( 0.859 )
      NO_CLAIMS < 5.5 ( 0.796 ) ➔ 0.796
      NO_CLAIMS >= 5.5 ( 0.938 ) ➔ 0.938
Tree representation(s) up to 4 levels, Model 'M2_TRN_GB'
Intermediate prediction in parentheses; final (leaf) prediction after ➔.
7 Vars: 1._CLAIMS_PER_DURATION 2._DOCTOR_VISITS 3._MEMBER_DURATION 4._NO_CLAIMS 5._OPTOM_PRESC 6._SPEND_PER_CLAIM 7._TOTAL_SPEND
CLAIMS_PER_DURATION < 0.00583 ( 0.15 )
  TOTAL_SPEND < 4150 ( 0.583 )
    MEMBER_DURATION < 190 ( 0.686 )
      OPTOM_PRESC >= 1.5 ( 0.87 ) ➔ 0.870
      OPTOM_PRESC < 1.5 ( 0.63 ) ➔ 0.630
    MEMBER_DURATION >= 190 ( 0.25 )
      TOTAL_SPEND >= 3400 ( 0.151 ) ➔ 0.151
      TOTAL_SPEND < 3400 ( 0.348 ) ➔ 0.348
  TOTAL_SPEND >= 4150 ( 0.143 )
    OPTOM_PRESC < 3.5 ( 0.13 )
      MEMBER_DURATION < 182.5 ( 0.165 ) ➔ 0.165
      MEMBER_DURATION >= 182.5 ( 0.087 ) ➔ 0.087
    OPTOM_PRESC >= 3.5 ( 0.329 )
      MEMBER_DURATION < 118.5 ( 0.556 ) ➔ 0.556
      MEMBER_DURATION >= 118.5 ( 0.234 ) ➔ 0.234
CLAIMS_PER_DURATION >= 0.00583 ( 0.527 )
  CLAIMS_PER_DURATION < 0.01954 ( 0.433 )
    OPTOM_PRESC < 0.5 ( 0.246 )
      SPEND_PER_CLAIM >= 4016.67 ( 0.233 ) ➔ 0.233
      SPEND_PER_CLAIM < 4016.67 ( 0.354 ) ➔ 0.354
    OPTOM_PRESC >= 0.5 ( 0.548 )
      OPTOM_PRESC >= 3.5 ( 0.797 ) ➔ 0.797
      OPTOM_PRESC < 3.5 ( 0.492 ) ➔ 0.492
  CLAIMS_PER_DURATION >= 0.01954 ( 0.803 )
    NO_CLAIMS < 4.5 ( 0.742 )
      DOCTOR_VISITS >= 3 ( 0.788 ) ➔ 0.788
      DOCTOR_VISITS < 3 ( 0.632 ) ➔ 0.632
    NO_CLAIMS >= 4.5 ( 0.91 )
      CLAIMS_PER_DURATION < 0.02491 ( 0.851 ) ➔ 0.851
      CLAIMS_PER_DURATION >= 0.02491 ( 0.92 ) ➔ 0.920
Curiously, while node numbers don't mean anything across models, it is obvious that GB and LG share a similar structure despite being very different algorithms. However, tree representations are just approximations, except in the Tree case.
Discussion of comparison of tree representations between LG and GB.
Both methods split initially on Claims_per_duration, but at very different values (0.00791 (LG) vs. 0.00583 (GB)). Remember that the actual logistic regression results had dropped Claims_per_duration.
Later levels obviously differ, since the initial split is quite different.
Therefore, these two models should 'a priori' differ in model interpretation.
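The tree representations of LG and GB can be read as surrogate trees: regression trees grown on each model's scores rather than on FRAUD. The slides do not say how they were built, so this is an assumption. Under it, the root split on CLAIMS_PER_DURATION is the threshold that minimizes the within-node squared error of the scores. The sketch below shows that search for a single variable. `best_split` is a hypothetical helper. Applied separately to LG scores and GB scores, the same search is how two models could land on different thresholds such as 0.00791 and 0.00583.

```python
import numpy as np


def best_split(x, target):
    """Return (threshold, sse) of the single split on x that minimizes the
    summed within-node squared error of target (e.g. a model's scores)."""
    order = np.argsort(x)
    xs = np.asarray(x, dtype=float)[order]
    ts = np.asarray(target, dtype=float)[order]
    n = len(xs)
    # Prefix sums let every candidate split be evaluated in O(1)
    csum, csq = np.cumsum(ts), np.cumsum(ts ** 2)
    total, total_sq = csum[-1], csq[-1]
    best_thr, best_sse = None, np.inf
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between tied x values
        sl, sr = csum[i - 1], total - csum[i - 1]
        ql, qr = csq[i - 1], total_sq - csq[i - 1]
        sse = (ql - sl ** 2 / i) + (qr - sr ** 2 / (n - i))
        if sse < best_sse:
            best_thr, best_sse = (xs[i - 1] + xs[i]) / 2.0, sse
    return best_thr, best_sse


# Hypothetical scores that jump at x = 30: the split should land at 29.5
xx = np.arange(100, dtype=float)
scores = np.where(xx < 30, 0.2, 0.7)
thr, sse = best_split(xx, scores)
print(thr, round(sse, 6))
```

A full surrogate tree repeats this search over all variables and recursively within each child node, up to the chosen depth (4 levels above).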
Statistical and visual tools for model interpretation

Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 

Statistical and visual tools for model interpretation

  • 3. Overall Description The present work aims at presenting tools for model interpretation derived from Partial Dependency Plots (in many different guises, explained in the text), contrasted with posterior probabilities, hereafter called scores. The work comprises 4 PowerPoint documents, with a possible fifth (if I get to it), numbered 0 to 4. Document 0 describes overall issues and introduces the working data set and models. At the risk of spoiling the end results, the Multivariate section provides insights at (almost) the observation level, and it requires univariate and bivariate support. This conclusion is quite surprising to me, since I thought that Univariate and Bivariate analyses would be rendered lacking. But contextual reality is far more complex than expected, and model interpretations are as varied as the different contexts available in the data, which should not be dismissed too eagerly. This work is based mostly on visualization, and I have tried to avoid statistical inference and lengthy tables.
  • 4. Abstract Statistical and data science models: are they interpretive black boxes? Let’s try for NO. Molnar’s (2018) “Interpretable Machine Learning” makes a big effort in finding solutions. Our presentation is humbler: visual tools for model interpretation based on partial dependency plots and their variants, such as collapsed PDPs created by the presenter, some of which may be polemical and debatable. Almost no use of statistical inference. The audience should be versed in model creation and have at least some insight into partial dependency plots. The presentation is based on a simple working example with 8 predictors and one binary target variable. It is not possible to detail exhaustively every method described in this presentation; an extensive document is in preparation. The presentation requires 3 hours and a wide-awake audience. Double the time if not awake. Sleepers will be punished accordingly. Slides marked **** can be skipped for an easier first reading.
  • 5. Contents: Model Interpretation (MI)
     1. Introduction and General Notes
     2. Confounding
     3. Model Interpretation (MI) and Categorization: UMI, BMI, MMI
     4. Binary Target Study
        4.1: Report of coefficients, estimates, etc.
        4.2: Model Structures
        4.3: GOF and Model Interpretation
     5. Univariate Model Interpretation: UMI
     5. Profile and Model Interpretation area. Univariate Model Interpretation (UMI)
     6. Partial Dependency Plots (PDPs) and their variants. UMI
     6. Bivariate Model Interpretation: BMI
     7. PDPs and Bivariate Model Interpretation (BMI)
        7.1: UMI vs. BMI
     8. Multivariate Model Interpretation: MMI
     9. Future Steps
     10. Observation-level Interpretation
     11. References
  • 7. Overall comments and introduction. Presentation by way of example, focusing on the Fraud/Default data set and continuing previous chapters available on the web (standard class for Principal Analytics Prep). Aim: study interpretation/diagnosis mostly via Partial Dependency Plots of logistic regression, Classification Trees and Gradient Boosting. Presentation(s) available at https://www.slideshare.net/LeonardoAuslender/visual-tools-for-interpretation-of-machine-learning-models. At present, there are many written opinions and distinctions about the topic, and there is neither room nor desire to discuss them all. See Molnar’s (2018) book for an overall view, O’Rourke (2018), and Doshi-Velez et al. (2017).
  • 8. Overall comments and introduction (cont. 1). No discussion of imbalanced data set modeling or of other modeling issues such as model selection. This presentation introduces novel visual concepts as well as tools derived from Partial Dependency Plots (PDP):
     - Overall PDP
     - Collapsed PDP and residuals
     - Marginal PDP
     - PDP vs. actual scores, …
   and how they assist in model interpretation.
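As a reference point for the PDP-derived tools listed above, the classic (Friedman-style) partial dependence plot can be sketched in a few lines: force one predictor to each grid value for every observation and average the model's posterior probabilities. This is a minimal sketch under stated assumptions, not the presenter's code: `toy_model` is a hypothetical stand-in for a fitted classifier, the data are synthetic, and the collapsed and marginal variants are not reproduced here.

```python
import numpy as np

def partial_dependence(predict_proba, X, feature, grid):
    """Classic PDP: mean prediction with column `feature` forced to each grid value."""
    pdp = np.empty(len(grid))
    X_mod = X.copy()                           # leave the caller's data untouched
    for k, value in enumerate(grid):
        X_mod[:, feature] = value              # same value for every observation
        pdp[k] = predict_proba(X_mod).mean()   # average over the other predictors
    return pdp

def toy_model(X):
    """Hypothetical posterior probability; column 2 is deliberately unused."""
    logit = -1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 0] * X[:, 1]
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(1000, 3))

# 10 grid points at quantiles of the predictor, echoing the 10-bin UMI convention
grid_x0 = np.quantile(X[:, 0], np.linspace(0.05, 0.95, 10))
pdp_x0 = partial_dependence(toy_model, X, 0, grid_x0)
pdp_x2 = partial_dependence(toy_model, X, 2, np.quantile(X[:, 2], [0.1, 0.5, 0.9]))
```

A flat PDP (as for column 2 here) says the model ignores that predictor on average; because the PDP averages over the other predictors, an interaction such as the `X[:, 0] * X[:, 1]` term is blended into a single curve, which is one motivation for the bivariate and multivariate views in the slides.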
  • 9. Model Interpretation (MI) and model building issues. 1) Why/where does the model make mistakes (large residuals, outliers, etc.)? 2) Which attributes (alone or in groups) end up being important, and when? 3) Why are the others not important? 4) Do observation-level predictions differ across models? However, the immediate aim is NOT interpretation at the observation level (why predicted sick/churner/innocent…) but
  • 10. Objectives of MI (cont. 1) Why not directly at the observation level? Suppose a model predicts entertainment-type preference for a database of families in large cities. Since it is not possible to obtain updated family preferences consistently (i.e., the data are ‘soft’), such models are necessarily not interpretable at the level of a specific family. Conversely, disease diagnostic prediction is closer to individual explanation and interpretability (data are typically ‘hard’). MOTTO: Posterior probability follows from Data + Model algorithm(s). Interpretation follows primarily from the probability but must include the data (i.e., context) ➔
  • 13. Model Interpretation categorization. Just as in EDA, but applied to model results (i.e., predictions) rather than to the initial data, there are three types of MI: Univariate Model Interpretation (UMI): one variable at a time vis-à-vis predictions/probabilities. EASIEST to understand and a huge source of “makes sense” discourse, e.g., classical linear model interpretations, reasons to decline a bank loan, etc. Bivariate Model Interpretation (BMI): looking at pairs of variables to interpret model results; correlation measures immediately spring to mind. Multivariate Model Interpretation (MMI): overall model interpretation, the most difficult and the most valuable. Typically, most work results in UMI and perhaps BMI; we will aim for MMI as well. Aside: does Occam’s razor help? “Pluralitas non est ponenda sine necessitate” (“plurality should not be posited without necessity”) ➔ it can lead either to interpreting and then choosing a model, or to choosing a model and then interpreting ➔ it does not help us.
  • 14. Model Interpretation presentation. We will present results in UMI, BMI and MMI order and, at the end, compare across the three methodologies. The aim is to find insights and contradictions that arise when UMI results are generalized without validating the interpretation in BMI and MMI, and likewise to verify strong UMI results that still prevail in BMI and MMI.
  • 16. Confounding rears its ugly head. See earlier chapters for review and examples. Must read; not elaborated here.
  • 18. Golden Days of Linear Regression Interpretation *** Based on the “ceteris paribus” assumption, which fails even with relatively small VIFs. At present, the rule of thumb is VIF >= 10 (R-sq = .90 among predictors) ➔ unstable model (see earlier slides in shareware …). “Ceteris paribus” exercise: keeping all other predictors constant, an increase in …. But if the R-sq among predictors is even 10%, it is not possible to keep all other predictors constant while increasing the variable of interest by 1, as the ceteris paribus frame of analysis requires. Advantage, however: EASY to conceptualize, because practice follows the notion of a mostly bivariate correlation (keeping all else constant reduces the relationship to just 1 variable vs. predictions ➔ UMI). But it is wrong even with small bivariate correlations, and mostly wrong in the multivariate case. Let us see …..
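The VIF rule of thumb above rests on the identity $VIF_j = 1/(1 - R^2_j)$, where $R^2_j$ comes from regressing predictor $j$ on all other predictors; so R-sq = .90 gives VIF = 10, and even R-sq = .10 gives VIF ≈ 1.11. The sketch below computes VIFs with plain NumPy least squares; the data are synthetic (not the claims data set) and the `vif` helper is hypothetical.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the others (plus intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Rule-of-thumb arithmetic: R^2 among predictors -> VIF
for r2 in (0.10, 0.50, 0.90):
    print(f"R^2 = {r2:.2f} -> VIF = {1.0 / (1.0 - r2):.2f}")

# Synthetic example: x1 is nearly a copy of x0, x2 is independent of both
rng = np.random.default_rng(7)
x0 = rng.normal(size=5000)
x1 = x0 + 0.1 * rng.normal(size=5000)
x2 = rng.normal(size=5000)
vifs = vif(np.column_stack([x0, x1, x2]))
print(np.round(vifs, 1))
```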
  • 19. Confusion on signs of coefficients and interpretation. Simple LR case.
   $$Y = \beta_0 + \beta_1 X + \varepsilon, \qquad \hat{\beta}_1 = r_{xy}\,\frac{s_y}{s_x}, \qquad \hat{Y}_i = \bar{Y} + r_{xy}\,\frac{s_y}{s_x}\,(X_i - \bar{X})$$
   $$\operatorname{sg}(\hat{\beta}_1) = \operatorname{sg}(r_{xy})$$
   ➔ Corr(X, Y) = $\hat{\beta}_1$ if SD(Y) = SD(X), that is, if both variables are standardized; otherwise they at least share the same sign, and the interpretation from the correlation holds in the simple regression case. Notice that the regression of X on Y is NOT the inverse of the regression of Y on X (its slope is $r_{xy}\,s_x/s_y$), because of SD(X) and SD(Y).
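A quick numerical check of the simple-regression identities above, on synthetic data (an illustration, not the fraud data): the OLS slope of Y on X equals $r_{xy}\,s_y/s_x$, the slope of X on Y is $r_{xy}\,s_x/s_y$ rather than the reciprocal of the first, the two slopes multiply to $r_{xy}^2$, and after standardizing both variables the slope equals the correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=5000)                  # SD(X) about 2
y = 1.0 + 0.8 * x + rng.normal(0.0, 3.0, size=5000)

r_xy = np.corrcoef(x, y)[0, 1]
b_yx = np.polyfit(x, y, 1)[0]   # OLS slope of Y on X (polyfit returns [slope, intercept])
b_xy = np.polyfit(y, x, 1)[0]   # OLS slope of X on Y

# Slope of Y on X is r * s_y / s_x, so its sign always matches r
print(b_yx, r_xy * y.std() / x.std())
# Slope of X on Y is r * s_x / s_y, not 1 / b_yx; the two slopes multiply to r^2
print(b_xy, 1.0 / b_yx, b_yx * b_xy, r_xy ** 2)

# With both variables standardized, the slope equals the correlation
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()
b_std = np.polyfit(zx, zy, 1)[0]
print(b_std, r_xy)
```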
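  • 20. (5/4/2022) In multiple linear regression, the previous relationship does not hold, because predictors can be correlated ($r_{XZ}$), weighted by $r_{YZ}$, hinting at collinearity and/or relationships of suppression/enhancement (paper on suppression/enhancement in shareware.net) ➔ But in the multivariate case, e.g.:
   $$Y = a + \beta_{YX.Z}\,X + \beta_{YZ.X}\,Z + \varepsilon,$$
   the estimated equation (emphasizing "partial") is
   $$\hat{Y} = \hat{a} + \hat{\beta}_{YX.Z}\,X + \hat{\beta}_{YZ.X}\,Z, \qquad \hat{\beta}_{YX.Z} = \frac{s_Y}{s_X}\,\frac{r_{YX} - r_{YZ}\,r_{XZ}}{1 - r_{XZ}^2},$$
   and, for example, it is possible that
   $$\operatorname{sg}(\hat{\beta}_{YX.Z}) \neq \operatorname{sg}(r_{YX}),$$
   which happens when $r_{YZ}\,r_{XZ}$ has the same sign as $r_{YX}$ and $\operatorname{abs}(r_{YZ}\,r_{XZ}) > \operatorname{abs}(r_{YX})$ (with $\operatorname{abs}(r_{XZ}) < 1$).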
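The sign reversal above can be reproduced with a small simulation, assuming a synthetic setup (not the claims data) in which X is positively correlated with Y, yet its partial effect holding Z fixed is negative. The sketch also checks that the correlation formula for $\hat{\beta}_{YX.Z}$ matches the OLS coefficient exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)                     # X strongly correlated with Z
y = 2.0 * z - 1.0 * x + 0.5 * rng.normal(size=n)     # true partial effect of X is -1

r = np.corrcoef(np.vstack([y, x, z]))
r_yx, r_yz, r_xz = r[0, 1], r[0, 2], r[1, 2]

# Partial slope of X from the correlation formula on the slide
beta_formula = (y.std() / x.std()) * (r_yx - r_yz * r_xz) / (1.0 - r_xz ** 2)

# Same slope from ordinary least squares with an intercept
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_ols = coef[1]

print(f"r_YX = {r_yx:.3f}, beta_YX.Z formula = {beta_formula:.3f}, OLS = {beta_ols:.3f}")
```

Here the bivariate (UMI-style) reading "more X, more Y" is contradicted once Z enters the model, which is the kind of contradiction the UMI/BMI/MMI comparison is meant to expose.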
  • 21. Comment on Linear Model Interpretation. Even in traditional UMI land, the multivariate relations given by partial and semi-partial correlations must be part of the interpretation. Note that while correlation is a bivariate relationship, partial and semi-partial correlations extend to the multivariate setting. In the case of a binary target, these relationships are not fully analyzed. However, BMI, and certainly MMI, are not performed very often.
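For three variables, the partial and semi-partial correlations mentioned above have closed forms in terms of the pairwise correlations: $r_{YX.Z} = (r_{YX} - r_{YZ} r_{XZ}) / \sqrt{(1 - r_{YZ}^2)(1 - r_{XZ}^2)}$ and $r_{Y(X.Z)} = (r_{YX} - r_{YZ} r_{XZ}) / \sqrt{1 - r_{XZ}^2}$. The sketch below, on synthetic data with hypothetical helper names, checks both formulas against their residual-based definitions.

```python
import numpy as np

def residualize(v, z):
    """Residuals of v after OLS regression on z (with intercept)."""
    A = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

def partial_corr(r_yx, r_yz, r_xz):
    """r_{YX.Z}: correlation of Y and X after removing Z from both."""
    return (r_yx - r_yz * r_xz) / np.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))

def semipartial_corr(r_yx, r_yz, r_xz):
    """r_{Y(X.Z)}: correlation of Y with the part of X not explained by Z."""
    return (r_yx - r_yz * r_xz) / np.sqrt(1 - r_xz ** 2)

rng = np.random.default_rng(3)
z = rng.normal(size=4000)
x = 0.7 * z + rng.normal(size=4000)
y = 0.5 * x + 0.9 * z + rng.normal(size=4000)

r = np.corrcoef(np.vstack([y, x, z]))
r_yx, r_yz, r_xz = r[0, 1], r[0, 2], r[1, 2]

pc = partial_corr(r_yx, r_yz, r_xz)
spc = semipartial_corr(r_yx, r_yz, r_xz)

# Cross-check against the residual-based definitions
pc_resid = np.corrcoef(residualize(y, z), residualize(x, z))[0, 1]
spc_resid = np.corrcoef(y, residualize(x, z))[0, 1]
print(f"bivariate r_YX = {r_yx:.3f}, partial = {pc:.3f}, semi-partial = {spc:.3f}")
```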
  • 23. EDA and Model Interpretation. EDA analyzes data sets without reference to the dependent or target variable (DV); that is instead done by modeling. Thus, MI = EDA + Predictions Analysis. Nevertheless, for given value(s) of the DV or of the predicted values, UMI, BMI and MMI can use EDA tools. For instance, the histogram of posterior model probabilities is part of model UEDA and thus part of UMI. Thus, MI is based on the relationship of predictions (and residuals) vis-à-vis single predictors and pairs, triads, tetrads, etc. of predictors, and this translates into different techniques such as Original PDPs, Pair PDPs, triads, etc., reviewed below. NB: We use binning and rescaling of variable ranges for easier visual interpretation. The number of bins is mostly 10 for UMI analysis, and 3 otherwise. We do not discuss issues of optimal binning, which are left to the reader.
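The binning convention above (10 bins for UMI, 3 otherwise) can be sketched as follows: a UMI profile averages model scores within quantile bins of one predictor, and a BMI profile does the same on a 3 x 3 grid of two predictors. Quantile-based bins are an assumption here (the slides leave binning open), and `score` is hypothetical synthetic data standing in for a model's posterior probabilities.

```python
import numpy as np

def quantile_bins(x, n_bins):
    """Bin index 0..n_bins-1 from empirical quantiles of x (ties may leave bins empty)."""
    interior_edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    return np.searchsorted(interior_edges, x, side="right")

def univariate_profile(x, score, n_bins=10):
    """UMI-style view: mean model score within each quantile bin of one predictor."""
    b = quantile_bins(x, n_bins)
    return np.array([score[b == k].mean() if np.any(b == k) else np.nan
                     for k in range(n_bins)])

def bivariate_profile(x1, x2, score, n_bins=3):
    """BMI-style view: mean model score in each (bin of x1, bin of x2) cell."""
    b1, b2 = quantile_bins(x1, n_bins), quantile_bins(x2, n_bins)
    table = np.full((n_bins, n_bins), np.nan)
    for i in range(n_bins):
        for j in range(n_bins):
            cell = (b1 == i) & (b2 == j)
            if cell.any():
                table[i, j] = score[cell].mean()
    return table

# Hypothetical scores standing in for posterior probabilities
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(size=5000), rng.uniform(size=5000)
score = 1.0 / (1.0 + np.exp(-(3.0 * x1 - 2.0 * x2)))

umi = univariate_profile(x1, score)        # 10 bins
bmi = bivariate_profile(x1, x2, score)     # 3 x 3 bins
print(np.round(umi, 3))
print(np.round(bmi, 3))
```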
  • 25. Searching for important variables en route to answering the modeling question. QUESTION: what are the minimum components to make a car go along a highway? 1) Engine 2) Tires 3) Steering wheel 4) Transmission 5) Gas 6) ….. Other MMI aspects and interrelations. Take just one of them out and the car won’t MOVE ➔ there is NO SINGLE most important variable. Instead, a minimum irreducible set of them is NECESSARY. In the Data Science case with n → ∞, there are possibly many subsets of ‘important’ variables for (n, p) subsets. Typically, “suspect VARIABLES” are a good starting point for research. “STARTING” is the key word.
  • 29. Basic DATA set(s) Information (Model Name: M2)
     Item            Information
     TRN DATA set    train
     TRN num obs     3595
     VAL DATA set    .
     VAL num obs     0
     TST DATA set    .
     TST num obs     0
     Dep. Var        fraud
     TRN % Events    20.389
     VAL % Events    .
     TST % Events    .
  • 30. Data set: Definition by way of Example
     • Health insurance company: Ophthalmologic Insurance Claims
     • Is the claim valid or fraudulent? Binary target.
     • Full description and analysis of this data set in https://www.slideshare.net/LeonardoAuslender (lectures at Principal Analytics Prep).
  • 31. While presenting the results of 3 models, we’ll concentrate on the ‘best’ model for interpretation, for brevity’s sake, except to mention specific examples of different interpretations across models.
     Requested Models: Names & Descriptions.
     Model #   Full Model Name                 Model Description
     2         002_M2_TRN_GRAD_BOOSTING        Gradient Boosting
     4         004_M2_TRN_LOGISTIC_STEPWISE    Logistic STEPWISE TRN
     5         005_M2_TRN_TREE                 TREE model
  • 32. Original Vars + Labels (Model Name: M2)
     Var #   Variable               Label
     1       FRAUD                  Fraudulent Activity yes/no
     2       TOTAL_SPEND            Total spent on opticals
     3       DOCTOR_VISITS          Total visits to a doctor
     4       NO_CLAIMS              No of claims made recently
     5       MEMBER_DURATION        Membership duration
     6       OPTOM_PRESC            Number of opticals claimed
     7       SPEND_PER_CLAIM        Expenses per claim
     8       CLAIMS_PER_DURATION    Claims per duration
  • 34. Similar, not identical. Logistic & Trees achieve [0, 1].
  • 35. Probability distributions are very different ➔ model interpretation must depend on model selection. It is possible to ‘mix’ all models into one Ensemble, which is not covered in this ppt (see slides in shareware).
  • 37. Some conclusions and comments so far (cont.): Probability distributions differ in:
     1) Extreme points: Logistic and TREES reach [0; 1], which is not necessarily the case for other methods, such as GradBoost in our case.
     2) Very different % of observations in the models’ probability bins.
     3) % events per bin is fairly linear, except for a Logistic ‘drop’ at 0.7. Gradient Boosting has higher % events at higher probability levels than the other 2 models.
     4) Above about 0.4 posterior probability, the 3 methods have similar distributions; they are quite different in the 0 - < 0.4 segment. Notice that GB and TREE have a large proportion of observations at lower probability levels, compared to Logistic.
     5) Relative but not absolute MI information can be inferred. % events differs across models ➔ different probability estimates, especially above the 0 - < 0.4 segment. Since higher probability levels reflect higher % events, MI is necessarily different.
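The per-bin quantities compared on this slide (% of observations and % events by posterior-probability bin) can be tabulated with a sketch like the following, run once per model's scores. Equal-width bins of width 0.1 are an assumption, since the slides do not state the bin edges; the small example uses hypothetical scores, not the fraud data.

```python
import numpy as np

def probability_bin_summary(prob, y, n_bins=10):
    """% of observations and % events (y == 1) per equal-width posterior-probability bin.
    Bin k covers [k/n_bins, (k+1)/n_bins); a probability of exactly 1.0 falls in the last bin."""
    prob = np.asarray(prob, dtype=float)
    y = np.asarray(y, dtype=float)
    interior_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    bins = np.digitize(prob, interior_edges)
    pct_obs = np.array([100.0 * np.mean(bins == k) for k in range(n_bins)])
    pct_events = np.array([100.0 * y[bins == k].mean() if np.any(bins == k) else np.nan
                           for k in range(n_bins)])
    return pct_obs, pct_events

# Tiny hand-checkable example with hypothetical scores and outcomes
pct_obs, pct_events = probability_bin_summary([0.05, 0.15, 0.95, 1.0], [0, 1, 1, 1])
print(pct_obs)
print(pct_events)
```

Placing the three models' `pct_obs` and `pct_events` side by side gives the comparison described in items 2) to 4): where each model concentrates its observations, and how the event rate climbs across bins.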
  • 38. Let’s get into data details for the sake of completeness. Quick EDA area. U(nivariate) EDA = UEDA
  • 39. Note the “small” CLAIMS_PER_DURATION and NO_CLAIMS values at p95.
  • 42. Note: importance and coefficients share one column, as do p-values and number of rules. Note that the models do not share all variables. Interestingly, CLAIMS_PER_DURATION is # 1 for the tree methods, and it was not selected by Logistic.
     Coefficients, p-values and Importance (Vars * Models * Coeffs; "." = variable not in model)
                            M2_TRN_GRAD_BOOSTING     M2_TRN_LOGISTIC_STEPWISE   M2_TRN_TREE
     Variable               Import.   Nrules         Coeff      PVal            Import.   Nrules
     CLAIMS_PER_DURATION    1.0000    26.000         .          .               1.0000    5.000
     DOCTOR_VISITS          0.4035    20.000         -0.0180    0.014           0.2895    2.000
     MEMBER_DURATION        0.5643    26.000         -0.0065    0.000           0.3650    2.000
     NO_CLAIMS              0.2483    6.000          0.7137     0.000           .         .
     OPTOM_PRESC            0.5963    21.000         0.2185     0.000           0.5383    5.000
     SPEND_PER_CLAIM        0.2202    8.000          0.0000     0.001           .         .
     TOTAL_SPEND            0.6148    29.000         -0.0000    0.000           0.4404    3.000
     INTERCEPT              .         .              -0.5160    0.000           .         .
  • 43. Logistic Selection Steps (Model Name: M2_TRN_LOGISTIC_STEPWISE)
     Step   Effect Entered     Effect Removed   # in model   P-value
     1      no_claims                           1            .00
     2      member_duration                     2            .00
     3      optom_presc                         3            .00
     4      total_spend                         4            .00
     5      spend_per_claim                     5            .00
     6      doctor_visits                       6            .01
  • 47. Some conclusions and comments so far:
     • Logistic stepwise did not select NUM_MEMBERS, which is shown with the lowest relative importance in GB and Trees. More importantly, CLAIMS_PER_DURATION is deemed most important by the tree methods and disregarded by logistic. Notice that logistic regression does not have an agreed-upon scale of importance; by default, we use odds ratios.
     • CLAIMS_PER_DURATION is deemed the most important single variable for GB and TREE, but logistic deems NO_CLAIMS # 1 and OPTOM_PRESC # 2 (via odds ratios), while GB differed.
     • The remaining variables have odds ratios of 1, which seems to indicate a similar effect across them, while GB/TREE distinguish relative importance after the first two variables.
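The odds-ratio ranking mentioned above follows from exponentiating the logistic coefficients reported on slide 42: $OR_j = e^{\beta_j}$ is the multiplicative change in the odds of fraud for a one-unit increase in predictor $j$, other predictors held fixed. A minimal sketch, using the table's rounded coefficients:

```python
import math

# Logistic coefficients as reported on slide 42 (rounded to 4 decimals)
logistic_coef = {
    "NO_CLAIMS": 0.7137,
    "OPTOM_PRESC": 0.2185,
    "DOCTOR_VISITS": -0.0180,
    "MEMBER_DURATION": -0.0065,
    "SPEND_PER_CLAIM": 0.0000,
    "TOTAL_SPEND": -0.0000,
}

# Odds ratio for a one-unit increase in each predictor, others held fixed
odds_ratios = {name: math.exp(b) for name, b in logistic_coef.items()}
ranking = sorted(odds_ratios, key=odds_ratios.get, reverse=True)

for name in ranking:
    print(f"{name:16s} OR = {odds_ratios[name]:.4f}")
```

A per-unit odds ratio depends on each variable's units, and the coefficients above are rounded, so a value printed as 1.0000 need not mean a null effect: TOTAL_SPEND and SPEND_PER_CLAIM both have small p-values on slide 42. Rescaling predictors (e.g., per standard deviation) before comparing odds ratios is one common way to reduce this unit dependence.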
  • 48. Strongly summarized area for brevity’s sake, added just for completeness.
  • 49. GOF ranks (rank per GOF measure; 1 = best)
     GOF measure            005_M2_TRN_GRAD_BOOSTING   007_M2_TRN_LOGISTIC_STEPWISE   008_M2_TRN_TREE
     AUROC                  1                          3                              2
     Avg Square Error       1                          3                              2
     Class Rate             2                          1                              3
     Cum Lift 3rd bin       1                          3                              2
     Cum Resp Rate 3rd      1                          3                              2
     Gini                   1                          3                              2
     P-R AUC                1                          3                              2
     Precision Rate         2                          3                              1
     Rsquare Cramer Tjur    1                          3                              2
     Unw. Mean              1.22                       2.78                           2.00
     Unw. Median            1                          3                              2
   ➔ Gradient Boosting is our champion; we omit the usual ROC and precision-recall curves, etc.
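The unweighted mean and median ranks in the GOF table can be recomputed directly from the per-measure ranks, which also makes the champion choice reproducible. A minimal sketch (the dictionary layout is an illustrative choice):

```python
from statistics import mean, median

# Ranks per GOF measure (1 = best), in the table's row order:
# AUROC, Avg Square Error, Class Rate, Cum Lift 3rd bin, Cum Resp Rate 3rd,
# Gini, P-R AUC, Precision Rate, Rsquare Cramer Tjur
gof_ranks = {
    "005_M2_TRN_GRAD_BOOSTING":     [1, 1, 2, 1, 1, 1, 1, 2, 1],
    "007_M2_TRN_LOGISTIC_STEPWISE": [3, 3, 1, 3, 3, 3, 3, 3, 3],
    "008_M2_TRN_TREE":              [2, 2, 3, 2, 2, 2, 2, 1, 2],
}

# Unweighted mean (rounded as in the table) and median rank per model
summary = {model: (round(mean(r), 2), median(r)) for model, r in gof_ranks.items()}
champion = min(summary, key=lambda model: summary[model])

for model, (avg, med) in summary.items():
    print(f"{model:30s} unweighted mean = {avg:.2f}, median = {med}")
print("champion:", champion)
```

Unweighted averaging treats all nine measures as equally relevant; a different weighting (e.g., emphasizing lift in the top bins for fraud screening) could change the ordering in closer contests.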
  • 51. Tree representation(s) up to 4 levels. Model 'M2_TRN_LG'. Intermediate prediction in parentheses; leaf prediction after ➔. 7 Vars: 1. CLAIMS_PER_DURATION 2. DOCTOR_VISITS 3. MEMBER_DURATION 4. NO_CLAIMS 5. OPTOM_PRESC 6. SPEND_PER_CLAIM 7. TOTAL_SPEND
     CLAIMS_PER_DURATION < 0.00791 (0.153)
        MEMBER_DURATION < 155.5 (0.213)
           OPTOM_PRESC < 4.5 (0.197)
              OPTOM_PRESC < 1.5 (0.178) ➔ 0.178
              OPTOM_PRESC >= 1.5 (0.258) ➔ 0.258
           OPTOM_PRESC >= 4.5 (0.514)
              OPTOM_PRESC < 6.5 (0.399) ➔ 0.399
              OPTOM_PRESC >= 6.5 (0.622) ➔ 0.622
        MEMBER_DURATION >= 155.5 (0.113)
           CLAIMS_PER_DURATION < 0.00376 (0.099)
              OPTOM_PRESC >= 3.5 (0.204) ➔ 0.204
              OPTOM_PRESC < 3.5 (0.093) ➔ 0.093
           CLAIMS_PER_DURATION >= 0.00376 (0.262)
              OPTOM_PRESC < 2.5 (0.235) ➔ 0.235
              OPTOM_PRESC >= 2.5 (0.39) ➔ 0.390
     CLAIMS_PER_DURATION >= 0.00791 (0.572)
        CLAIMS_PER_DURATION < 0.017 (0.469)
           OPTOM_PRESC < 2.5 (0.421)
              CLAIMS_PER_DURATION >= 0.01272 (0.496) ➔ 0.496
              CLAIMS_PER_DURATION < 0.01272 (0.386) ➔ 0.386
           OPTOM_PRESC >= 2.5 (0.61)
              OPTOM_PRESC >= 6.5 (0.8) ➔ 0.800
              OPTOM_PRESC < 6.5 (0.571) ➔ 0.571
        CLAIMS_PER_DURATION >= 0.017 (0.755)
           NO_CLAIMS < 3.5 (0.652)
              OPTOM_PRESC >= 4.5 (0.845) ➔ 0.845
              OPTOM_PRESC < 4.5 (0.633) ➔ 0.633
           NO_CLAIMS >= 3.5 (0.859)
              NO_CLAIMS < 5.5 (0.796) ➔ 0.796
              NO_CLAIMS >= 5.5 (0.938) ➔ 0.938
  • 52. Tree representation(s) up to 4 levels. Model 'M2_TRN_GB'. Intermediate prediction in parentheses; leaf prediction after ➔. 7 Vars: 1. CLAIMS_PER_DURATION 2. DOCTOR_VISITS 3. MEMBER_DURATION 4. NO_CLAIMS 5. OPTOM_PRESC 6. SPEND_PER_CLAIM 7. TOTAL_SPEND
     CLAIMS_PER_DURATION < 0.00583 (0.15)
        TOTAL_SPEND < 4150 (0.583)
           MEMBER_DURATION < 190 (0.686)
              OPTOM_PRESC >= 1.5 (0.87) ➔ 0.870
              OPTOM_PRESC < 1.5 (0.63) ➔ 0.630
           MEMBER_DURATION >= 190 (0.25)
              TOTAL_SPEND >= 3400 (0.151) ➔ 0.151
              TOTAL_SPEND < 3400 (0.348) ➔ 0.348
        TOTAL_SPEND >= 4150 (0.143)
           OPTOM_PRESC < 3.5 (0.13)
              MEMBER_DURATION < 182.5 (0.165) ➔ 0.165
              MEMBER_DURATION >= 182.5 (0.087) ➔ 0.087
           OPTOM_PRESC >= 3.5 (0.329)
              MEMBER_DURATION < 118.5 (0.556) ➔ 0.556
              MEMBER_DURATION >= 118.5 (0.234) ➔ 0.234
     CLAIMS_PER_DURATION >= 0.00583 (0.527)
        CLAIMS_PER_DURATION < 0.01954 (0.433)
           OPTOM_PRESC < 0.5 (0.246)
              SPEND_PER_CLAIM >= 4016.67 (0.233) ➔ 0.233
              SPEND_PER_CLAIM < 4016.67 (0.354) ➔ 0.354
           OPTOM_PRESC >= 0.5 (0.548)
              OPTOM_PRESC >= 3.5 (0.797) ➔ 0.797
              OPTOM_PRESC < 3.5 (0.492) ➔ 0.492
        CLAIMS_PER_DURATION >= 0.01954 (0.803)
           NO_CLAIMS < 4.5 (0.742)
              DOCTOR_VISITS >= 3 (0.788) ➔ 0.788
              DOCTOR_VISITS < 3 (0.632) ➔ 0.632
           NO_CLAIMS >= 4.5 (0.91)
              CLAIMS_PER_DURATION < 0.02491 (0.851) ➔ 0.851
              CLAIMS_PER_DURATION >= 0.02491 (0.92) ➔ 0.920
  • 53. Curiously, while node numbers don’t mean anything across models, it is obvious that GB and LG share a similar structure despite being very different algorithms. However, tree representations are just approximations, except in the case of the Tree model itself.
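The slides do not say how these tree representations were produced. One common approach, assumed here, is a global surrogate: fit a shallow regression tree to a model's scores and read off its splits, then measure fidelity as the R² of the surrogate against those scores. The pure-NumPy sketch below grows a depth-4 tree by squared-error splits; `black_box_scores` is hypothetical synthetic data, and the helper names are illustrative, not the presenter's tooling.

```python
import numpy as np

def best_split(X, target, min_leaf):
    """Best binary split (feature, threshold, SSE) by within-node squared error, or None."""
    n, p = X.shape
    best = None
    for j in range(p):
        order = np.argsort(X[:, j], kind="stable")
        xs, ts = X[order, j], target[order]
        csum, csq = np.cumsum(ts), np.cumsum(ts ** 2)
        for i in range(min_leaf, n - min_leaf + 1):    # left node = first i sorted obs
            if xs[i - 1] == xs[i]:
                continue                                # no threshold separates tied values
            left_sse = csq[i - 1] - csum[i - 1] ** 2 / i
            right_sse = (csq[-1] - csq[i - 1]) - (csum[-1] - csum[i - 1]) ** 2 / (n - i)
            if best is None or left_sse + right_sse < best[2]:
                best = (j, 0.5 * (xs[i - 1] + xs[i]), left_sse + right_sse)
    return best

def grow(X, target, depth=0, max_depth=4, min_leaf=20):
    """Recursively grow a regression tree; each node stores its mean (the 'intermediate prediction')."""
    node = {"pred": float(target.mean()), "n": int(len(target))}
    split = best_split(X, target, min_leaf) if depth < max_depth else None
    if split is not None:
        j, thr, _ = split
        left = X[:, j] < thr
        node.update(feature=j, threshold=thr,
                    left=grow(X[left], target[left], depth + 1, max_depth, min_leaf),
                    right=grow(X[~left], target[~left], depth + 1, max_depth, min_leaf))
    return node

def predict(node, X):
    out = np.empty(len(X))
    for k, row in enumerate(X):
        cur = node
        while "feature" in cur:
            cur = cur["left"] if row[cur["feature"]] < cur["threshold"] else cur["right"]
        out[k] = cur["pred"]
    return out

def tree_depth(node):
    return 0 if "feature" not in node else 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))

# Hypothetical "black-box" scores on synthetic predictors
rng = np.random.default_rng(5)
X = rng.uniform(size=(2000, 3))
black_box_scores = 1.0 / (1.0 + np.exp(-(4.0 * X[:, 0] - 3.0 * X[:, 1])))

surrogate = grow(X, black_box_scores)
approx = predict(surrogate, X)
fidelity_r2 = 1.0 - np.sum((black_box_scores - approx) ** 2) / np.sum(
    (black_box_scores - black_box_scores.mean()) ** 2)
print(f"root split on column {surrogate['feature']} at {surrogate['threshold']:.3f}; fidelity R^2 = {fidelity_r2:.3f}")
```

Reporting fidelity alongside the tree helps judge how far such an approximation can be trusted when comparing, for example, the LG and GB representations.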
  • 54. Discussion of the comparison of tree representations between LG and GB. Both methods split initially on Claims_per_duration, but at very different values (0.00791 (LG) vs. 0.00583 (GB)). Remember that the actual logistic regression results had dropped Claims_per_duration. Later levels obviously differ, since the initial split is quite different. Therefore, these two models should ‘a priori’ differ in model interpretation.