• Areas to improve
• Principal components main corr part, difficult
to read. Improve on this
• Explain how we got plot of overall prob
against single variable.
Abstract
Statistical and data science models are considered to be, somewhat
pejoratively, black-boxes, interpretation of which has not been
systematically studied.
Molnar’s “Interpretable Machine Learning” is a big effort in finding
solutions. Our presentation is humbler. We aim at presenting visual tools
for model interpretation based on partial dependency plots and their
variants, such as collapsed PDPs created by the presenter, some of
which may be polemical and debatable.
The audience should be versed in model creation, with at
least some insight into partial dependency plots. The presentation is based on a
simple working example with 6 predictors and one binary target variable
for ease of exposition.
Not possible to detail exhaustively every method described in this
presentation. Extensive document in preparation.
Slides marked *** can be skipped for an easier first reading.
Overall comments and introduction.
Presentation by way of example focusing on Fraud/Default
Data sets and continuing previous chapters.
Aim: study interpretation, diagnosis mostly via Partial
Dependency Plots of logistic regression, Classification
Trees and Regression Boosting.
At present, lots of written opinions and distinctions about
topic. No time to discuss them all. See Molnar’s (2018)
recent book for an overall view.
No discussion about imbalanced data set modeling or
other modeling issues. No discussion on literature, all due
to time constraints.
Interpretation and explanation from whom to whom? ***
Models typically involve multivariate relationships, usually displayed,
summarized and/or measured by specialized statistics. Either
graphically or as tables/formulae.
Graphical representations are easier to “interpret” and “understand”
but not necessarily fully interpretable.
Plus, possible existence of subgroups in data imply larger
interpretations and need of inter group comparisons that can test
patience of ‘to whom’ audience.
Finally, models created by software, not ‘by hand’. Software does not
explain or interpret, it ‘fits’ following computing or algebraic algorithms.
Thus, don’t blame software for poor interpretations.
More importantly, there are EU regulations at present to explain models
under the GDPR “right to explanation” (General Data Protection
Regulation).
Interpretation in context of pre-conceptions:
‘it makes-sense’ . ***
In Middle Ages, world was deemed to be flat, and it ‘made-sense’ .
Pre-conceptions and lack of analytical insight can seriously undermine, if not
mislead, model interpretation.
‘Making sense’ and ‘rules of thumb’, usually based on univariate or at most
bivariate (sometimes causal) relationships (or just convenience), bias
understanding and model creation and selection.
Model Interpretation should not be understood as ‘dumbing down’ of complex
multi-variate relationships but of clear exposition of conditions that lead to an
event, however difficult it may be to disentangle multivariate conditions.
On other hand, modeler should not take refuge in arcane formulae to
hide his/her own superficial understanding of conditions unveiled by
model. Otherwise, the action is called deceit.
“The most difficult subjects
can be explained to the most
slow-witted man if he has not
formed any idea of them
already; but the simplest thing
cannot be made clear to the
most intelligent man if he is
firmly persuaded that he knows
already, without a shadow of
doubt, what is laid before him.”
Leo Tolstoy, “The Kingdom of
God Is Within You” (1894)
Interpretation in context of many competing
models. ***
It is desirable that ONE and just ONE interpretation be the final outcome
of a model search. But just like in criminal detection there may be many
suspects with different or similar motivations, different models may
sometimes be interpreted similarly but often, interpretations are vastly
different.
Should model interpretation determine or condition model selection?
Practice: model creation and selection come prior to model
interpretation, assuming competent model creation and interpretation.
Personal preference: If model non-interpretable, NOT A GOOD MODEL.
Classical statisticians however neglected variable selection and ensuing
model uncertainty.
Model Interpretation categorization.
Just as in EDA (but on model results, not on initial data), three
types:
Univariate Model Interpretation (UMI): One variable at a
time. EASIEST to understand and a huge source of “makes
sense”. E.g., Classical linear models interpretations. E.g.,
reasons to decline a loan.
Bivariate Model Interpretation (BMI): Looking at pairs of
variables to interpret model results.
Multivariate Model Interpretation (MMI): Overall model
interpretation, the most difficult.
Typically, most work results in UMI and perhaps BMI.
Days of Linear Regression Interpretation ***
Based on the “ceteris paribus” assumption, which fails in case of
even relatively small VIFs. At present, rule of thumb: VIF >=
10 (R-sq = .90 among predictors) → unstable model.
“Ceteris paribus” exercise: keeping all other predictors
constant, an increase in …. But if R-sq among predictors is
even 10%, it is not possible to keep all predictors constant while
increasing the variable of interest by 1.
Advantage: EASY to conceptualize because the practice
follows the notion of bivariate correlation.
But that notion is generally wrong in the multivariate case.
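The VIF rule of thumb above can be checked numerically. A minimal sketch on toy data (not the presentation's data set; coefficients illustrative): with only two predictors, the R-squared of one regressed on the other is their squared correlation, so VIF = 1/(1 − r²).

```python
# Variance Inflation Factor (VIF) sketch on toy data.  With two predictors,
# the R-squared of one regressed on the other is their squared correlation.
import random

random.seed(1)
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]
# z is correlated with x by construction (illustrative coefficients)
z = [0.9 * xi + 0.45 * random.gauss(0, 1) for xi in x]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

r = corr(x, z)
vif = 1.0 / (1.0 - r ** 2)
print(round(r, 3), round(vif, 2))
# VIF at exactly R-sq = .90: the rule-of-thumb threshold of the slide.
print(1.0 / (1.0 - 0.90))
```

Even a moderate correlation among predictors pushes the VIF well above 1, which is the slide's point: "all else held constant" stops being a meaningful exercise.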
In simple regression, Corr(X,Y) equals the slope when SD(Y) = SD(X), e.g., if both
variables are standardized; otherwise the two at least share the same sign, and
the interpretation from correlation holds in the simple regression
case.
Notice that the regression of X on Y is NOT the inverse of the
regression of Y on X, because of SD(X) and SD(Y).
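The slope/correlation identity behind the "same sign" remark can be sketched on toy data (all numbers illustrative): b_YX = r·s_Y/s_X, so the slope and the correlation always share a sign, and the X-on-Y slope is not the reciprocal of the Y-on-X slope unless s_X = s_Y.

```python
# Sketch (toy data): slope of Y on X is r * s_y / s_x; the two regressions
# are not inverses of each other because the SDs enter differently.
import random

random.seed(7)
n = 500
x = [random.gauss(0, 2) for _ in range(n)]
y = [1.5 * xi + random.gauss(0, 3) for xi in x]   # illustrative relationship

def mean(v): return sum(v) / len(v)
def sd(v):
    m = mean(v)
    return (sum((vi - m) ** 2 for vi in v) / (len(v) - 1)) ** 0.5
def corr(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return num / ((len(a) - 1) * sd(a) * sd(b))

r = corr(x, y)
b_yx = r * sd(y) / sd(x)          # slope of Y on X
b_xy = r * sd(x) / sd(y)          # slope of X on Y
print(b_yx, b_xy, 1 / b_yx)       # b_xy != 1 / b_yx in general
```

Note that b_yx · b_xy = r², so the product of the two slopes equals 1 only when the correlation is perfect.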
Confusion on signs of coefficients
and interpretation.
In simple linear regression the fitted slope, the correlation and R-square are tied together:

\hat{\beta}_{YX} = r_{xy}\,\frac{s_y}{s_x},\qquad
\frac{\hat{Y}_i-\bar{Y}}{s_y} = r_{xy}\,\frac{X_i-\bar{X}}{s_x},\qquad
\mathrm{sg}(\hat{\beta}_{YX}) = \mathrm{sg}(r_{xy}),\qquad
R^2 = r_{xy}^{2}.

2019-05-10
In multiple linear regression, the previous relationship does
not hold because predictors can be correlated (r_{XZ}),
weighted by r_{YZ}, hinting at co-linearity and/or relationships
of suppression/enhancement →
In the multivariate case, e.g. Y = f(X, Z), the estimated equation
(emphasizing “partial”) is

\hat{Y} = \hat{a} + \hat{\beta}_{YX\cdot Z}\,X + \hat{\beta}_{YZ\cdot X}\,Z,

and, for example,

\hat{\beta}_{YX\cdot Z} = \frac{s_Y}{s_X}\cdot
\frac{r_{YX} - r_{YZ}\,r_{XZ}}{1 - r_{XZ}^{2}},
\qquad
\mathrm{sg}(\hat{\beta}_{YX\cdot Z}) = \mathrm{sg}(r_{YX} - r_{YZ}\,r_{XZ}),

which differs from sg(r_{YX}) whenever abs(r_{YX}) < abs(r_{YZ} r_{XZ}) and the
two terms share the same sign.
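The partial-slope formula above can be verified numerically on toy data (all numbers illustrative). The example is built so that the marginal correlation r_YX and the partial slope of X have opposite signs, i.e. suppression/enhancement:

```python
# Sketch (toy data): the partial slope of X in Y ~ X + Z, reconstructed from
# pairwise correlations, can carry the opposite sign from r_YX when
# r_YZ * r_XZ is large enough (suppression/enhancement).
import random

random.seed(3)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.6 * random.gauss(0, 1) for zi in z]
y = [1.0 * zi - 0.3 * xi + 0.5 * random.gauss(0, 1) for zi, xi in zip(z, x)]

def mean(v): return sum(v) / len(v)
def sd(v):
    m = mean(v)
    return (sum((t - m) ** 2 for t in v) / (len(v) - 1)) ** 0.5
def corr(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / ((len(a) - 1) * sd(a) * sd(b))

r_yx, r_yz, r_xz = corr(y, x), corr(y, z), corr(x, z)
# partial slope of X from the correlation formula
beta_yx_z = (r_yx - r_yz * r_xz) / (1 - r_xz ** 2) * sd(y) / sd(x)
print(r_yx, beta_yx_z)   # marginal correlation positive, partial slope negative
```

The marginal correlation of Y and X is positive (both ride on Z), yet the partial slope recovers the negative −0.3 used to generate the data: exactly the sign confusion the slide warns about.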
Comment
Even in traditional UMI land, we find that
multivariate relations given by Partial- and semi-
partial correlations must be part of the
interpretation.
Note that while correlation is a bivariate
relationship, partial and semipartial corrs can be
extended to multivariate setting.
However, even BMI and certainly MMI not so often
performed.
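The partial correlations mentioned above can be sketched in two equivalent ways: the closed-form correlation formula, and correlating OLS residuals after removing Z from both Y and X. Toy data, illustrative only:

```python
# Sketch: partial correlation of Y and X controlling for Z, computed two
# equivalent ways -- the correlation formula and correlating OLS residuals.
import random

random.seed(11)
n = 1500
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.7 * zi + random.gauss(0, 1) for zi in z]
y = [0.7 * zi + random.gauss(0, 1) for zi in z]   # Y, X related only via Z

def mean(v): return sum(v) / len(v)
def corr(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    da = sum((p - ma) ** 2 for p in a)
    db = sum((q - mb) ** 2 for q in b)
    return num / (da * db) ** 0.5

def residuals(v, w):             # residuals of v regressed on w (with intercept)
    mv, mw = mean(v), mean(w)
    b = sum((p - mv) * (q - mw) for p, q in zip(v, w)) / sum((q - mw) ** 2 for q in w)
    a = mv - b * mw
    return [p - (a + b * q) for p, q in zip(v, w)]

r_yx, r_yz, r_xz = corr(y, x), corr(y, z), corr(x, z)
pc_formula = (r_yx - r_yz * r_xz) / (((1 - r_yz ** 2) * (1 - r_xz ** 2)) ** 0.5)
pc_resid = corr(residuals(y, z), residuals(x, z))
print(pc_formula, pc_resid)      # the two routes agree
```

Here the marginal correlation of Y and X is clearly positive, but the partial correlation (controlling for Z) is near zero — the bivariate "makes sense" reading disappears once the shared driver is removed.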
Searching for Important variables en route to answering
modeling question.
Case study: minimum components to make a car go
along highway.
1) Engine
2) Tires
3) Steering wheel
4) Transmission
5) Gas
6) ….. Other MMI aspects and interrelations.
Take just one of them out, and the car won’t drive. There is no
SINGLE most important variable but a minimum irreducible set of
them. In the Data Science case with n → ∞, possibly many subsets of
‘important’ variables.
But “suspect VARS” are a good starting point of research.
Model M2 — Item Information
  TRN data set: train       TRN num obs: 3595    TRN % events: 20.389
  VAL data set: validata    VAL num obs: 2365    VAL % events: 19.281
  TST data set: (none)      TST num obs: 0       TST % events: (none)
  Dep. var: fraud
Original Vars + Labels, Model M2
  DOCTOR_VISITS     Total visits to a doctor
  MEMBER_DURATION   Membership duration
  NO_CLAIMS         No of claims made recently
  NUM_MEMBERS       Number of members covered
  OPTOM_PRESC       Number of opticals claimed
  TOTAL_SPEND       Total spent on opticals
Requested Models: Names & Descriptions.
Full Model Name Model Description
Overall Models
M2 20 pct prior
M2_BY_DEPVAR Inference
01_M2_GB_TRN_TREES Tree Repr. for Gradient Boosting
02_M2_TRN_GRAD_BOOSTING Gradient Boosting
03_M2_TRN_LOGISTIC_STEPWISE Logistic TRN STEPWISE
04_M2_VAL_GRAD_BOOSTING Gradient Boosting
05_M2_VAL_LOGISTIC_STEPWISE Logistic VAL STEPWISE
Data set: Definition by way of Example
• Health insurance company:
Ophthalmologic Insurance Claims
• Is claim valid or fraudulent? Binary
target.
• No transformations created to have
simple data set.
• Full description and analysis of this data
set in
https://www.slideshare.net/LeonardoAuslender
(lectures at Principal Analytics Prep).
Ch. 1.1-27
Alphabetic List of Variables and Attributes
  #  Variable          Type  Len  Format   Informat  Label
  3  DOCTOR_VISITS     Num   8    BEST12.  F12.      Total visits to a doctor
  1  FRAUD             Num   8    BEST12.  F12.      Fraudulent Activity yes/no
  5  MEMBER_DURATION   Num   8                       Membership duration
  4  NO_CLAIMS         Num   8    BEST12.  F12.      No of claims made recently
  7  NUM_MEMBERS       Num   8                       Number of members covered
  6  OPTOM_PRESC       Num   8    BEST12.  F12.      Number of opticals claimed
  2  TOTAL_SPEND       Num   8    BEST12.  F12.      Total spent on opticals
Note: No nominal predictors. No transformations, to keep the presentation simple
but not simpler than necessary.
....
Reporting area for
all models’
coefficients,
importance, etc., and
selected variables.
Vars * Models * Coeffs
(Gradient Boosting columns: Importance / Nrules; Logistic columns: Coeff / P-val)

Variable          M2_TRN_GRAD_BOOSTING  M2_TRN_LOGISTIC_STEPWISE  M2_VAL_GRAD_BOOSTING  M2_VAL_LOGISTIC_STEPWISE
NUM_MEMBERS       0.1099 / 2            .                         0.1099 / 2            .
OPTOM_PRESC       0.6211 / 19           0.2178 / 0.000            0.6211 / 19           0.1463 / 0.000
DOCTOR_VISITS     0.4434 / 20           -0.0171 / 0.020           0.4434 / 20           -0.0065 / 0.428
MEMBER_DURATION   0.7843 / 41           -0.0066 / 0.000           0.7843 / 41           -0.0065 / 0.000
TOTAL_SPEND       0.6864 / 29           -0.0000 / 0.003           0.6864 / 29           -0.0000 / 0.004
NO_CLAIMS         1.0000 / 19           0.7752 / 0.000            1.0000 / 19           0.7610 / 0.000
INTERCEPT         .                     -0.5767 / 0.000           .                     -0.5635 / 0.001
Logistic Selection Steps, Model M2_TRN_LOGISTIC_STEPWISE
Step  Effect Entered   Effect Removed  # in model  P-value
1     no_claims                        1           .00
2     member_duration                  2           .00
3     optom_presc                      3           .00
4     total_spend                      4           .00
5     doctor_visits                    5           .02
Dropped NUM_MEMBERS.
Mg (marginal) effect: change in probability as X changes.
Some conclusions and comments so far:
. Logistic stepwise dropped Num_members that is shown
with lowest relative importance in GB. Notice that Logistic
Regression does not have agreed-upon scale of importance.
We can use odds-ratios, e.g.
. NO_CLAIMS is deemed most important single variable for
GB, but logistic deems OPTOM_PRESC as the second one
(via odds ratios), while GB selected MEMBER_DURATION.
. Remaining variables have odds ratios of 1 which seem to
indicate similar effect, while GB distinguishes relative
importance after first two variables.
LG does not have a measure of importance, as GB does →
we use marginal effects plots that indicate the change in
probability along each variable’s range. Except for Member
duration (which declines initially), the other effects are
positive with different intensities, and the maximum value declines
as per the logistic shape. Member duration has a pronounced
decline for low duration levels → possibility of fraudulent
members who join, commit their fraud and leave.
Note the sharper increase in prob. for no_claims at bins 1 and
8, and for optom_presc at 6.
GB importance measures the impact of individual inputs on
predicting Y, but does not tell how the impact changes along the
range of the inputs, and individual variable effects are not taken into
consideration
→ use Partial Dependency Plots, also for LG as a free ride.
Marginal Effects and PDPs
Marginal effects refer to change in probability with one
unit change in X, ceteris paribus (if meaningful or at
least desirable).
PDPs do not indicate change in Y at all; instead, PDP
measures probability levels at different values of X1
measuring all other predictors at their means (or modes,
medians, etc.).
→ No marginality in PDPs, unless we also measure ‘change’
in probability. Shown later on, called Marginal
PDPs.
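A minimal sketch of a one-variable PDP as defined above — the variable of interest swept over a grid while all other predictors are held at their means — on a hypothetical stand-in model (the coefficients are invented, not the presentation's fitted model). The successive differences give the "Marginal PDP" of the text.

```python
# One-variable PDP sketch: sweep x1 over a grid, hold the other predictors
# at their means, and read off the model's probability level at each point.
import math, random

random.seed(5)
n = 400
X = [[random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]

def model_prob(row):             # stand-in fitted model (hypothetical coefficients)
    s = -0.5 + 1.2 * row[0] - 0.8 * row[1] + 0.3 * row[2]
    return 1.0 / (1.0 + math.exp(-s))

means = [sum(r[j] for r in X) / n for j in range(3)]
grid = [-2 + 0.5 * k for k in range(9)]          # grid over x1's range

pdp = []
for g in grid:                   # PDP as defined in the text: others at means
    row = means[:]
    row[0] = g
    pdp.append(model_prob(row))

# "Marginal PDP": successive differences, i.e. the CHANGE in probability
# along the grid rather than the level itself.
mpdp = [b - a for a, b in zip(pdp, pdp[1:])]
print([round(p, 3) for p in pdp])
```

The PDP reports probability levels; only the differenced curve speaks about change, which is exactly the distinction the slide draws between PDPs and marginal effects.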
Bù shì de (Chinese: ‘no, it isn’t’).
Maybe?
Tree representation(s) up to 4 levels, Model M2_GB_TRN_TREES
(node probabilities in parentheses; leaf predictions at right)

no_claims < 2.5 (0.185)
    no_claims < 0.5 (0.159)
        member_duration < 180.5 (0.199)
            total_spend < 5250 (0.464) ................ 0.464
            total_spend >= 5250 (0.186) ............... 0.186
        member_duration >= 180.5 (0.103)
            doctor_visits >= 5.5 (0.093) .............. 0.093
            doctor_visits < 5.5 (0.126) ............... 0.126
    no_claims >= 0.5 (0.321)
        optom_presc < 3.5 (0.291)
            total_spend >= 6300 (0.273) ............... 0.273
            total_spend < 6300 (0.467) ................ 0.467
        optom_presc >= 3.5 (0.59)
            member_duration < 154.5 (0.67) ............ 0.670
            member_duration >= 154.5 (0.447) .......... 0.447
no_claims >= 2.5 (0.633)
    no_claims < 4.5 (0.57)
        optom_presc < 3.5 (0.54)
            member_duration >= 128.5 (0.498) .......... 0.498
            member_duration < 128.5 (0.627) ........... 0.627
        optom_presc >= 3.5 (0.81)
            member_duration >= 137 (0.785) ............ 0.785
            member_duration < 137 (0.85) .............. 0.850
    no_claims >= 4.5 (0.761)
        member_duration < 303.5 (0.778)
            member_duration >= 148 (0.757) ............ 0.757
            member_duration < 148 (0.823) ............. 0.823
Missing one line.
Tree representation(s) up to 4 levels, Model M2_LG_TRN_TREES
(node probabilities in parentheses; leaf predictions at right)

no_claims < 1.5 (0.164)
    member_duration < 155.5 (0.235)
        optom_presc < 3.5 (0.213)
            no_claims < 0.5 (0.195) ................... 0.195
            no_claims >= 0.5 (0.337) .................. 0.337
        optom_presc >= 3.5 (0.49)
            optom_presc < 6.5 (0.404) ................. 0.404
            optom_presc >= 6.5 (0.647) ................ 0.647
    member_duration >= 155.5 (0.111)
        optom_presc < 3.5 (0.103)
            member_duration >= 246.5 (0.065) .......... 0.065
            member_duration < 246.5 (0.122) ........... 0.122
        optom_presc >= 3.5 (0.235)
            no_claims >= 0.5 (0.353) .................. 0.353
            no_claims < 0.5 (0.213) ................... 0.213
no_claims >= 1.5 (0.61)
    no_claims < 2.5 (0.451)
        member_duration < 155.5 (0.562)
            optom_presc >= 1.5 (0.651) ................ 0.651
            optom_presc < 1.5 (0.493) ................. 0.493
        member_duration >= 155.5 (0.353)
            member_duration >= 237 (0.204) ............ 0.204
            member_duration < 237 (0.39) .............. 0.390
    no_claims >= 2.5 (0.748)
        no_claims < 4.5 (0.675)
            member_duration >= 236.5 (0.477) .......... 0.477
            member_duration < 236.5 (0.721) ........... 0.721
        no_claims >= 4.5 (0.899)
            member_duration >= 272 (0.741) ............ 0.741
Missing one line.
Comment
LG starts by splitting on NO_CLAIMS >= 2 as likely fraud,
while GB >= 3. Predictions for first level across 2 models are
similar ( .185 ; .633) for GB vs. (.164, .61) for LG, which
indicates that the structures identified so far are similar.
While in 2nd level, GB only splits on NO_CLAIMS, LG splits
on MEMBER_DURATION for suspected non-fraudsters in the
first stage and on NO_CLAIMS for the fraudster suspects.
Predictions are similar only for the 4th node in level 2 (.748
and .761) but different otherwise. The careful reader may
verify that these two predictions emerge by splitting on
no_claims albeit at different values, which supports the
notion of No_claims being the leading clue in our research.
NO_CLAIMS is not so heavily used after the 2nd level
however, and the structures of the models are clearly
different. GB does not use it at all, while LG splits at 4.5 to
produce the highest prediction level of .899. While GB did
split initially on No_claims 2.5 and then on 4.5, it did not
reach the same level of prediction as LG that started
splitting at 1.5.
By going to the marginal effects plot, we can see that
No_claims has the largest slope for low values, but
member_duration has the highest for highest value of the
variable. But no similar plots can be created for GB.
Thus, found structures and consequent interpretations differ
and there is no isomorphism from one into the other.
Perhaps fractal approximation?
2nd most “important variable”, very different structures.
1st most “important” variable, very different structures.
Great! similar posterior probabilities, different structures but maybe similar
Interpretations? Note more discrepancies when Prob is higher.
....
Ranking the
models
by GOF.
Strongly summarized area for brevity’s sake, and just for completeness.
GOF ranks, TRN (rank per GOF measure; 1 = best):
Model Name                   AUROC  AvgSqErr  ClassRate  CumLift 3rd bin  CumRespRate 3rd  Gini  Precision  R-sq (Cramer/Tjur)  Unw. Mean  Unw. Median
02_M2_TRN_GRAD_BOOSTING        1      1          2             1                1           1       1              1              1.13        1
03_M2_TRN_LOGISTIC_STEPWISE    2      2          1             2                2           2       2              2              1.88        2

GOF ranks, VAL (rank per GOF measure; 1 = best):
Model Name                   AUROC  AvgSqErr  ClassRate  CumLift 3rd bin  CumRespRate 3rd  Gini  Precision  R-sq (Cramer/Tjur)  Unw. Mean  Unw. Median
04_M2_VAL_GRAD_BOOSTING        1      1          2             1                1           1       1              1              1.13        1
05_M2_VAL_LOGISTIC_STEPWISE    2      2          1             2                2           2       2              2              1.88        2
....
Profile and
Model Interpretation
Area.
....
Univariate Profile diagnostics
for 6
Important Vars.
....
Event Proportions
and Posterior Probabilities
for 5
Important Vars.
by original
Model Names.
Variables and probabilities binned for ease of
visualization. Proportion events same across models (it’s
just original data), but probabilities differ across models.
Not all cases shown.
Etc for the other variables.
Some observations
Binned No_claims: While similar in shape, GradBoost seriously
underestimates proportion of events throughout, while logistic has the
problem for bins 2, 3, 5, 6, 7. Logistic has a positive slope, while GB
flattens due to interactive GB model. Up to bin 7, similar behavior for
GB and LG, and then LG jumps to higher level of probability.
Binned Member_Duration: Probability distributions are similar but not
identical. For bins 1, 2, 3 and 16 methods underestimate proportion of
events. Slightly declining slope for both models.
Binned OPTOM_PRESC: Both methods failed to match proportion of
events in the mid range of the bins. Sudden positive upshift in
positive slope for GB starting at bin 15, while overall flat but positive
slope for Logistic.
....
Rescaled Variables
along binned
Posterior
Probability.
Interpretation: In bin 5, No_claims reaches overall max (100), while for bin 1 max is around
35 and 15 in 0-100 scale for respective models. Same interpretation for Q3, etc.
And Conversely ….. (GB = Tree repr. Of Grad_boosting …)
....
Partial Dependency
Plots and variants for
Non Ensemble
Models.
Some variables may be dropped due to computer resources.
Note the narrow range of GB PDPs compared to those of LG, due to GB’s
interactive nature → more difficult to interpret.
Marginal (1) PDP comparative notes
(Marginal (1): one var at a time. Could also marginalize two vars at a time, not
done in this presentation).
GB marginals are rather flat, except for
MEMBER_DURATION, of which a caveat later on.
LG is juicier: NO_CLAIMS’ increase of probability declines
along its range, but OPTOM_PRESC’s increases, which seems to
indicate that the leading reason would be prescriptions and not
overall claims.
Corresponding marginals for logistic end up with a slowing
down of growth due to the logistic shape. GB is not constrained
in that way.
PDP comparative notes
Overall PDP is Model probability when all predictors are at their means.
For LG, it’s about .17, while GB is .53. Individual PDPs (by def.) are
deviations from Overall when var of interest measured along its range,
while others remain at mean values. GB clumps most PDPs around
Overall, LG clearly distinct values instead.
Highest probability level for GB is around .7 while LG reaches 1, and
minima are around .6 and 0 respectively. Note LG monotonicity while
GB is mostly monotonic (except for Doctor_visits), possibly product of
data set created artificially.
In both cases, NO_CLAIMS appears as leading variable, especially in LG
but while Member_duration is rather flat in GB, it certainly declines
steadily in LG, with a very different interpretation. Longer member
duration implies steadier customer and familiarity. No_members had
been excluded in LG’s stepwise and should not be confused with
MEMBER_DURATION.
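The "Overall PDP" quoted above is the model probability with every predictor at its mean. For a logistic model this generally differs from the average predicted probability (Jensen's inequality), which helps explain why the overall level can sit far from the raw event rate. A toy sketch, with invented numbers:

```python
# Probability at the mean of X vs. mean of the probabilities: for a
# nonlinear link the two differ, sometimes substantially.
import math, random

random.seed(2)
xs = [random.gauss(0, 2) for _ in range(5000)]

def p(x):                        # stand-in logistic model (hypothetical coefs)
    return 1.0 / (1.0 + math.exp(-(-1.5 + 1.0 * x)))

p_at_mean = p(sum(xs) / len(xs))            # the "Overall PDP" level
mean_of_p = sum(p(x) for x in xs) / len(xs) # the average prediction
print(round(p_at_mean, 3), round(mean_of_p, 3))
```

With these numbers the probability evaluated at the mean predictor value is noticeably below the average predicted probability, so the two summaries should not be read interchangeably.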
....
Now, mix
All previous
Probability
and PDPs
Together.
Member_duration alone brags too much.
UMI: Univariate Model Interpretation.
From preceding pages, we can conclude that:
No_claims: positively associated with increased fraud,
for both logistic and grad boosting, but with a far steeper slope in
logistic. Grad_b stays in a narrow band of probability and is
more interactive with other predictors → Grad_b requires
more BMI and MMI. GB’s PDP overshoots the posterior
probability → other vars bring down this effect in GB.
Member_duration has a U shape relationship,
especially in Logistic case, while GB has a more spiky
one. Note the high spike at duration minimal and
immediate decline which seems to indicate members that
committed fraud as soon as they joined and left
immediately.
UMI: Univariate Model Interpretation (cont. 1)
PDP view: logistic shows positive effects of NO_CLAIMS
and OPTOM_PRESC, balanced by negative effects of
remaining variables.
Comparing posterior probability with No_claims PDP,
they are almost the same for Logistic. Similarly for
MEMBER_DURATION.
Grad_b instead shows more tepid effects of same
variables, and almost unchanging effects of remaining
predictors. Comparing PDP with probability, other
predictors bring down PDP of No_claims. Similar effects
for MEMBER_DURATION.
....
PDPs for
"Pairs of
Variables"
Note: 3d plots tend to interpolate areas with no data, producing false
expectations of results. Thus, sometimes 2d charts are preferable to
3d plots.
Not all Pairs of variables available due to computer resources.
Same for LG.
Note that correlations of No_claims with other variables are relatively small when
compared to the pairs Member_duration – Doctor_visits and Member_duration –
Optom_presc. How will this translate into PDPs for 2 variables at a time?
M_: 'M2_TRN_LOGISTIC_STEPWISE' 'ORIGINAL' PDP Corr '-0.02542'
BINNED ORIGINAL PDP M2_TRN_GRAD_BOOSTING Corr ' 0.05073' NO_CLAIMS
DOCTOR_VISITS
Combination of No_claims & Doctor_visits shows high probability at NE corner
and middle section stable high prob. level. Too many charts to show but
necessary for full interpretation.
BINNED ORIGINAL PDP M2_TRN_GRAD_BOOSTING Corr ' 0.02549' NO_CLAIMS
MEMBER_DURATION
BINNED ORIGINAL PDP M2_TRN_LOGISTIC_STEPWISE Corr ' 0.06580'
NO_CLAIMS OPTOM_PRESC
BINNED ORIGINAL PDP M2_TRN_GRAD_BOOSTING Corr '-0.10759' MEMBER_DURATION
OPTOM_PRESC
BINNED ORIGINAL PDP M2_TRN_LOGISTIC_STEPWISE Corr '-0.10759' MEMBER_DURATION
OPTOM_PRESC
Some BMI comments
LG: high levels of NO_CLAIMS have high probability at the lowest level
of total_spend, which probably denotes one-time fraud.
Otherwise, even mid levels of NO_CLAIMS associated with high
probability for any level of TOTAL_SPEND. It seems that FRAUD is
not necessarily linked to TOTAL_SPEND alone.
About NO_CLAIMS and MEMBER_DURATION, fraud happens for low
level of duration, after which fraudsters leave.
About the pair Optom_presc and Member_duration for which we have
contrasting Pair PDPs, with corr = -0.10, interpretation is very
different. While Grad_b shows flat probabilities throughout, except in
NE empty corner, logistic shows more extreme NE corner, plus
declining probabilities from NW top.
For 2 dimensional visualization, collapse 3d chart by averaging
levels of variable 2 into those of variable 1 and compare to original
PDP.
Original and collapsed PDPs are derived from posterior model
probabilities.
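The collapsing step described above can be sketched as follows: build a pair PDP grid, average the levels of variable 2 into those of variable 1, and compare against the original one-variable PDP. The stand-in model here is additive by construction (invented coefficients), so the collapsed and original curves share the same shape; deviations between them would flag interaction.

```python
# "Collapsed PDP" sketch: average a two-variable PDP grid over variable 2
# and compare the collapsed curve with the original one-variable PDP.
import math

def model_prob(x1, x2):          # stand-in fitted model, additive on purpose
    s = -0.4 + 0.9 * x1 - 0.6 * x2
    return 1.0 / (1.0 + math.exp(-s))

g1 = [i * 0.5 for i in range(-4, 5)]           # grid for variable 1
g2 = [j * 0.5 for j in range(-4, 5)]           # grid for variable 2

pdp2 = [[model_prob(a, b) for b in g2] for a in g1]   # pair PDP grid
collapsed = [sum(row) / len(row) for row in pdp2]     # average over var 2
pdp1 = [model_prob(a, 0.0) for a in g1]               # var 2 at its mean (0)
print([round(c, 3) for c in collapsed])
print([round(c, 3) for c in pdp1])
```

Both curves rise together here; with an interaction term in the scorer, the collapsed curve would pull away from the one-variable PDP, which is the diagnostic the slides exploit.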
No room for TOTAL_SPEND
Comments for BMI.
In case of NO_CLAIMS, all cases show overlapping of
collapsed and Original, except for OPTOM_PRESC. In GB
case, DURATION brings down probability slightly because
duration is itself strong predictor.
LG shows that the presence of OPTOM_PRESC raises the
posterior probability, not so accentuated in GB. The LG model
could benefit from a NO_CLAIMS and OPTOM_PRESC
interaction, or possibly an overall transformation by way of
obtaining information per month and per number of
members. (LG chart with TOTAL_SPEND omitted for space
brevity.)
MEMBER_DURATION shows overlap with all second
variables, plus declining slope, more evident in LG
models.
Comments for BMI (cont).
It is possible to obtain 3-way and higher PDPs, and also
collapse them, not tried here.
Given overlap between Original and CPDP, UMI effects are
correct so far, except possibly for triplet NO_CLAIMS,
MEMBER_DURATION and OPTOM_PRESC.
P-comp1 mostly fitted by doctor_visits and member duration. # 2 (which fits
residuals from step 1 ) by No_claims and optom_presc, etc.
And discriminating by Fraud levels →
For Illustration: No_claims for fraud = 1 still highly correlated with second
eigenvector.
Member_duration cannot compete with NO_CLAIMS.
Overall View and omitting # 2 and # 3 for brevity sake.
Comments for PCA results.
PC # 1 shows MEMBER_DURATION and DOCTOR_VISITS
grouped together, NO_CLAIMS and TOTAL_SPEND in
another, and remaining in separate groups (separation
can be proven by statistical inference).
For logistic case, note that NO_CLAIMS is first ‘entered’
variable in Stepwise selection (earlier slides), followed by
MEMBER_DURATION and OPTOM_PRESC. Note that
NO_CLAIMS does not have largest correlation with first
component, even when looking at correlations by values
of FRAUD.
And GB also has NO_CLAIMS and MEMBER_DURATION,
not represented in hierarchy of Principal Components
Analysis. PCA does not provide framework for
interpreting models.
Comments for PCA results (cont.)
PCA orthogonalizes away predictors effects when going
from step to step, not done by our present modeling
methods.
Having chosen cutoff point in posterior probability,
possible to obtain similar PCA results for predicted 0 and
predicted 1, obtain correlations and compare to previous
results.
Possible to use statistical inference to determine
equality/inequality of correlations (with original results)
for different cutoff points.
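The inference on equality of correlations mentioned above is commonly done with Fisher's z transformation. A minimal sketch — the sample correlations and sizes below are made up, not results from the presentation's data:

```python
# Fisher z test for equality of two independent correlations, e.g.
# correlations computed at two different probability cutoffs.
import math

def fisher_z_test(r1, n1, r2, n2):
    z1 = math.atanh(r1)                      # Fisher transform of each r
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
    return z, p

# Hypothetical inputs: r = .45 on 500 obs vs. r = .30 on 450 obs.
z, p = fisher_z_test(0.45, 500, 0.30, 450)
print(round(z, 3), round(p, 4))
```

With these illustrative inputs the difference is statistically significant at the usual 0.05 level; identical correlations give z = 0 by construction.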
Comments for Statistical Inference – Multiple Comparisons.
Bars below ‘0.05’ are considered to be significant, having taken
multiple-comparison effects into consideration.
Most results are insignificant, but the significant results differ
across models: NO_CLAIMS provides the same information
throughout the range of probs for GB, but LG finds the first bin to
be significantly different from the rest. GB results come from
splitting the GB search. Thus, LG states that a lower probability
level indicates NO_FRAUD; GB cannot state that.
Definite monotonic relationship: higher values of NO_CLAIMS (e.g., bin 5)
associated with higher probability. Note differences between LG and GB.
More diffuse relationship.
GB monotonic, LG slightly U-shaped.
Similar coeffs among logistic , GB and beta regressions. By Beta Regression,
standard interpretation of log odds is possible, with caveats.
Vars * Models * Coeffs (Coeff / Importance; blank cells “.” as in source)
Columns: (1) M2_TRN_GRAD_BOOSTING, (2) M2_TRN_GRAD_BOOSTING_BETA_REG,
(3) M2_TRN_LOGISTIC_STEPWISE, (4) M2_TRN_LOGISTIC_STEPWISE_BETA_REG,
(5) M2_VAL_GRAD_BOOSTING, (6) M2_VAL_LOGISTIC_STEPWISE

Variable          (1)      (2)       (3)      (4)             (5)      (6)
MEMBER_DURATION   0.7318   -0.0051   -0.0057  -0.0057         0.7318   -0.0084
DOCTOR_VISITS     0.3925   -0.0061   .        .               0.3925   .
TOTAL_SPEND       0.6610   -0.0000   -0.0000  -0.0000         0.6610   -0.0000
OPTOM_PRESC       0.5944   0.1713    0.2132   0.2132          0.5944   0.1634
NO_CLAIMS         1.0000   0.7027    0.7921   0.7921          1.0000   0.7351
SCALE             .        21.8895   .        2.590291516E16  .        .
INTERCEPT         .        -0.7979   -0.8352  -0.8352         .        -0.3111
Beta Regression results
Results of analyzing posterior probabilities (i.e.,
original GB and LG posteriors) via BETA
regression show very similar coefficients and
structures  Beta is reassuring but not providing
additional information.
Possible to say that …
1) Manipulate just NO_CLAIMS and problem
solved?
2) Maybe add MEMBER_DURATION and
OPTOM_PRESC for parts of NO_CLAIMS
range?
3) Maybe add if-then rules from simplified
TREE_REPRESENTATION because easier than
GB and more interactive than LG?
4) If using a Neural Network and NN derivatives,
abandon all hope of interpretation?
5) → Interpretation needs definition of the INTENDED
AUDIENCE (see Tolstoy ut supra).
Possible to say that … (cont. 1)
1) The analyst needs to focus on NO_CLAIMS,
MEMBER_DURATION and OPTOM_PRESC as
an ‘IMPORTANT’ group.
2) Different model Interpretations should be
entertained.
3) Different marginal effects must be explained.
Final thoughts before I exhaust the audience, if
not exhausted already.
MI analysis can proceed further obtaining
insights from collapsing three way PDPs for
instance.
If an ‘easier’ linear-model explanation is preferred, beta
regression on the posterior probability would provide
regression-like information. Still, beta regression
is not straightforward, and model selection is a big
issue.
Future steps
Focus on MMI
1) Collapsing higher PDP orders, i.e., 3 way variables
and interpreting.
2) Beta regression for ‘linear’ interpretation. More
difficult because it requires model search as well.
Plus, additional error in modeling posterior
probability of original model.
3) Andrews’ curves.
Lime: Local Interpretable Model-Agnostic Explanations:
Uses surrogate interpretable model on black-box model, applied to observations of
interest. Tree representation in this presentation similar to this.
(https://homes.cs.washington.edu/~marcotcr/blog/lime/)
ICE: Clusters or classification variable applied to
PDP results. For given predictor, ICE plots draw one line per obs.,
representing how instance’s prediction changes when predictor
changes.
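A minimal sketch of ICE as described above: one curve per observation, with the PDP as the pointwise average of the curves. The stand-in model below has a deliberate interaction (hypothetical coefficients) so that the individual curves disagree in direction, which is exactly what ICE is meant to reveal and a PDP alone would hide:

```python
# ICE sketch: for each observation, sweep the chosen predictor over a grid
# while keeping that row's other values fixed; one curve per observation.
import math, random

random.seed(9)
rows = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]

def model_prob(x1, x2):          # stand-in fitted model with an interaction
    return 1.0 / (1.0 + math.exp(-(0.8 * x1 + 1.5 * x1 * x2)))

grid = [i * 0.5 for i in range(-4, 5)]
ice = [[model_prob(g, r[1]) for g in grid] for r in rows]   # one curve per row
# the PDP is the pointwise average of the ICE curves
pdp = [sum(curve[k] for curve in ice) / len(ice) for k in range(len(grid))]
print(len(ice), len(pdp))
```

Because of the interaction, some ICE curves rise with the predictor while others fall; averaging them into a PDP washes that heterogeneity out.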
Shap Values: Shapley Additive Explanations (Lundberg et
al., 2017): measure positive or negative feature contributions to the posterior
probability; the technique comes from game theory, where it determines each
player’s contribution to the success of a game. Affected by correlations among
predictors → focusing on just one predictor to change behavior may
change other predictors as well (available in Python).
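The Shapley idea can be sketched exactly on a toy 3-feature scorer by averaging each feature's marginal contribution over all orderings. Here "absent" features are set to a fixed baseline for simplicity, whereas real SHAP implementations average over the data distribution; the model and numbers are purely illustrative:

```python
# Exact Shapley values on a toy scorer: average each feature's marginal
# contribution over all permutations of feature arrival orders.
import itertools

BASELINE = [0.0, 0.0, 0.0]       # stand-in for "feature absent"

def score(x):                    # stand-in model with an interaction term
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

def shapley(x):
    n = len(x)
    phi = [0.0] * n
    perms = list(itertools.permutations(range(n)))
    for perm in perms:
        cur = BASELINE[:]
        prev = score(cur)
        for j in perm:           # add features one at a time, credit the gain
            cur[j] = x[j]
            now = score(cur)
            phi[j] += now - prev
            prev = now
    return [v / len(perms) for v in phi]

x = [1.0, 1.0, 1.0]
phi = shapley(x)
print([round(v, 3) for v in phi])
# Contributions sum to score(x) - score(BASELINE): the efficiency property.
```

The interaction term's credit is split evenly between the two features involved, which is the symmetry property that makes Shapley attributions attractive, and also why correlated predictors share (and blur) credit.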
AND OTHERS ….
References
Lundberg, S.M., Lee, S.I. (2017): “Consistent feature
attribution for tree ensembles”, presented at the 2017
ICML Workshop on Human Interpretability in Machine
Learning (WHI 2017), Sydney, NSW, Australia.
https://arxiv.org/abs/1706.06060
Molnar, C. (2018): Interpretable Machine Learning. A
Guide for Making Black Box Models Explainable.
https://christophm.github.io/interpretable-ml-book/
Tolstoy, L. (1894): “The Kingdom of God Is Within You”.
4_5_Model Interpretation and diagnostics part 4.pdf

  • 1.
  • 2. Areas to improve: the principal components main correlation part is difficult to read; improve on this. Explain how we got the plot of overall probability against a single variable.
  • 3. Abstract. Statistical and data science models are considered, somewhat pejoratively, to be black boxes whose interpretation has not been systematically studied. Molnar's "Interpretable Machine Learning" is a big effort at finding solutions. Our presentation is humbler: we aim to present visual tools for model interpretation based on partial dependency plots and their variants, such as the collapsed PDPs created by the presenter, some of which may be polemical and debatable. The audience should be versed in model creation and have at least some insight into partial dependency plots. The presentation is based on a simple working example with 6 predictors and one binary target variable, for ease of exposition. It is not possible to detail exhaustively every method described in this presentation; an extensive document is in preparation. Slides marked **** can be skipped for an easier first reading.
  • 4.
  • 5. Overall comments and introduction. Presentation by way of example, focusing on the fraud/default data sets and continuing previous chapters. Aim: to study interpretation and diagnosis, mostly via partial dependency plots, of logistic regression, classification trees and regression boosting. At present there are many written opinions and distinctions about the topic; no time to discuss them all, so see Molnar's (2018) book for an overall view. No discussion of imbalanced-data-set modeling or other modeling issues, and no discussion of the literature, all due to time constraints.
  • 6. Interpretation and explanation: from whom to whom? *** Models typically involve multivariate relationships, usually displayed, summarized and/or measured by specialized statistics, either graphically or as tables/formulae. Graphical representations are easier to "interpret" and "understand", but not necessarily fully interpretable. In addition, the possible existence of subgroups in the data implies more interpretations and the need for inter-group comparisons that can test the patience of the "to whom" audience. Finally, models are created by software, not "by hand". Software does not explain or interpret; it "fits", following computational or algebraic algorithms. Thus, don't blame the software for poor interpretations. More importantly, EU regulations at present require explaining models under the GDPR "right to explanation" (General Data Protection Regulation).
  • 7. Interpretation in the context of pre-conceptions: "it makes sense". *** In the Middle Ages the world was deemed to be flat, and it "made sense". Pre-conceptions and lack of analytical insight can seriously undermine, if not mislead, model interpretation. "Making sense" and rules of thumb, usually based on univariate or at most bivariate (sometimes causal) relationships (or just convenience), bias understanding, model creation and selection. Model interpretation should not be understood as the "dumbing down" of complex multivariate relationships, but as a clear exposition of the conditions that lead to an event, however difficult it may be to disentangle multivariate conditions. On the other hand, the modeler should not take refuge in arcane formulae to hide his/her own superficial understanding of the conditions unveiled by the model; otherwise, the action is called deceit.
  • 8. “The most difficult subjects can be explained to the most slow-witted man if he has not formed any idea of them already; but the simplest thing cannot be made clear to the most intelligent man if he is firmly persuaded that he knows already, without a shadow of doubt, what is laid before him.” Leo Tolstoy, “The Kingdom of God Is Within You” (1894)
  • 9. Interpretation in the context of many competing models. *** It is desirable that ONE and just ONE interpretation be the final outcome of a model search. But just as in criminal detection there may be many suspects with different or similar motivations, different models may sometimes be interpreted similarly, though often interpretations are vastly different. Should model interpretation determine or condition model selection? In practice, model creation and selection come prior to model interpretation, assuming competent model creation and interpretation. Personal preference: if a model is non-interpretable, it is NOT A GOOD MODEL. Classical statisticians, however, neglected variable selection and the ensuing model uncertainty.
  • 10.
  • 11.
  • 12.
  • 13. Model interpretation categorization. Just as in EDA (but on model results, not on the initial data), three types: Univariate Model Interpretation (UMI): one variable at a time. EASIEST to understand and a huge source of "makes sense". E.g., classical linear model interpretations; e.g., reasons to decline a loan. Bivariate Model Interpretation (BMI): looking at pairs of variables to interpret model results. Multivariate Model Interpretation (MMI): overall model interpretation, the most difficult. Typically, most work results in UMI and perhaps BMI.
  • 14.
  • 15. Days of linear regression interpretation. *** Based on the "ceteris paribus" assumption, which fails in the case of even relatively small VIFs. At present, the rule of thumb is VIF >= 10 (R-sq = .90 among predictors) implies an unstable model. The "ceteris paribus" exercise: keeping all other predictors constant, an increase in ... But if the R-sq among predictors is even 10%, it is not possible to keep all predictors constant while increasing the variable of interest by 1. Advantage: EASY to conceptualize, because the practice follows the notion of bivariate correlation. But that notion is generally wrong in the multivariate case.
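The VIF rule of thumb above can be checked numerically. A minimal sketch with simulated, deliberately collinear predictors (the data and column roles are made up for illustration): VIF_j = 1/(1 - R²_j), where R²_j comes from regressing column j on the remaining columns.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing
    column j of X on the other columns (plus an intercept)."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=500)   # nearly collinear with x1
x3 = rng.normal(size=500)                      # independent predictor
X = np.column_stack([x1, x2, x3])
print(vif(X))   # x1 and x2 blow past the VIF >= 10 threshold; x3 stays near 1
```

With the collinear pair, the "keep all other predictors constant" exercise is exactly what the data cannot do.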
  • 16. In simple regression the estimated slope is
    \hat{\beta} = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2} = r_{xy}\,\frac{s_Y}{s_X}, \qquad sg(\hat{\beta}) = sg(r_{xy}).
    Corr(X,Y) = \hat{\beta} if SD(Y) = SD(X), e.g., if both are standardized; otherwise they at least share the same sign, and the interpretation from the correlation holds in the simple regression case. Notice that the regression of X on Y is NOT the inverse of the regression of Y on X, because of SD(X) and SD(Y), which causes confusion about the signs of coefficients and their interpretation.
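The slope/correlation identity on this slide is easy to verify numerically; a small sketch with simulated data (values and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = -2.0 * x + rng.normal(size=1000)

r_xy = np.corrcoef(x, y)[0, 1]
beta = np.polyfit(x, y, 1)[0]            # OLS slope of y on x
# beta_hat = r_xy * s_Y / s_X, so sign(beta_hat) = sign(r_xy)
assert np.isclose(beta, r_xy * y.std() / x.std())

# The reverse regression (x on y) is r_xy * s_X / s_Y, NOT 1/beta:
beta_rev = np.polyfit(y, x, 1)[0]
assert np.isclose(beta_rev, r_xy * x.std() / y.std())
```

The second assertion is the slide's point that regressing X on Y is not the inverse of regressing Y on X.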
  • 17. 2019-05-10. In multiple linear regression the previous relationship does not hold, because predictors can be correlated (r_{XZ}), weighted by r_{YZ}, hinting at collinearity and/or relationships of suppression/enhancement. In the multivariate case, e.g. Y = \alpha + \beta X + \gamma Z + \varepsilon, the estimated equation (emphasizing the "partial" coefficient) gives, for example,
    \hat{\beta}_{YX.Z} = \frac{s_Y}{s_X}\cdot\frac{r_{YX} - r_{YZ} r_{XZ}}{1 - r_{XZ}^2},
    so sg(\hat{\beta}_{YX.Z}) = sg(r_{YX} - r_{YZ} r_{XZ}), which need not equal sg(r_{YX}).
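The partial-coefficient formula, and the sign reversal it allows, can be demonstrated on simulated data; the coefficients below are chosen (illustratively) so that the marginal correlation of Y with X is positive while the partial coefficient is negative:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=2000)
x = 0.9 * z + 0.4 * rng.normal(size=2000)      # X strongly correlated with Z
y = -0.5 * x + 2.0 * z + rng.normal(size=2000) # true partial effect of X is negative

r = np.corrcoef(np.vstack([y, x, z]))
r_yx, r_yz, r_xz = r[0, 1], r[0, 2], r[1, 2]

# beta_YX.Z = (s_Y/s_X) * (r_YX - r_YZ*r_XZ) / (1 - r_XZ^2)
formula = (y.std() / x.std()) * (r_yx - r_yz * r_xz) / (1 - r_xz**2)

A = np.column_stack([np.ones_like(x), x, z])
beta = np.linalg.lstsq(A, y, rcond=None)[0][1]  # OLS coefficient on X given Z
assert np.isclose(beta, formula)

# Sign reversal: marginally Y and X move together, partially they do not.
assert r_yx > 0 and beta < 0
```

This is exactly the suppression/enhancement situation the slide hints at: sg(r_YX - r_YZ r_XZ) can differ from sg(r_YX).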
  • 18. Comment: even in traditional UMI land, we find that the multivariate relations given by partial and semi-partial correlations must be part of the interpretation. Note that while correlation is a bivariate relationship, partial and semi-partial correlations can be extended to the multivariate setting. However, even BMI, and certainly MMI, are not so often performed.
  • 19.
  • 20. Searching for important variables en route to answering the modeling question. Case study: the minimum components needed to make a car go along a highway: 1) engine, 2) tires, 3) steering wheel, 4) transmission, 5) gas, 6) ... plus other MMI aspects and interrelations. Take just one of them out, and the car won't drive. There is no SINGLE most important variable, but a minimum irreducible set of them. In the data science case with n → ∞, there are possibly many subsets of "important" variables. But "suspect VARS" are a good starting point for the research.
  • 21.
  • 22.
  • 23. Model M2: data summary.
    TRN data set: train, 3,595 obs, 20.389% events
    VAL data set: validata, 2,365 obs, 19.281% events
    TST data set: none, 0 obs
    Dependent variable: fraud
  • 24. Original variables and labels (Model M2):
    DOCTOR_VISITS: Total visits to a doctor
    MEMBER_DURATION: Membership duration
    NO_CLAIMS: No. of claims made recently
    NUM_MEMBERS: Number of members covered
    OPTOM_PRESC: Number of opticals claimed
    TOTAL_SPEND: Total spent on opticals
  • 25. Requested models: names & descriptions.
    M2: 20 pct prior (overall model)
    M2_BY_DEPVAR: Inference
    01_M2_GB_TRN_TREES: Tree representation for Gradient Boosting
    02_M2_TRN_GRAD_BOOSTING: Gradient Boosting
    03_M2_TRN_LOGISTIC_STEPWISE: Logistic TRN stepwise
    04_M2_VAL_GRAD_BOOSTING: Gradient Boosting
    05_M2_VAL_LOGISTIC_STEPWISE: Logistic VAL stepwise
  • 26. Data set: definition by way of example. Health insurance company: ophthalmologic insurance claims. Is a claim valid or fraudulent? Binary target. No transformations created, to keep the data set simple. Full description and analysis of this data set at https://www.slideshare.net/LeonardoAuslender (lectures at Principal Analytics Prep).
  • 27. Alphabetic list of variables and attributes (all numeric, length 8):
    1 FRAUD: Fraudulent activity yes/no
    2 TOTAL_SPEND: Total spent on opticals
    3 DOCTOR_VISITS: Total visits to a doctor
    4 NO_CLAIMS: No. of claims made recently
    5 MEMBER_DURATION: Membership duration
    6 OPTOM_PRESC: Number of opticals claimed
    7 NUM_MEMBERS: Number of members covered
    Note: no nominal predictors. No transformations, to keep the presentation simple, but not simpler than necessary.
  • 28. .... Reporting area for all models' coefficients, importance, etc., and selected variables.
  • 29. Variables * models: GB importance (with number of rules; identical on TRN and VAL) and logistic coefficient with p-value (TRN / VAL):
    NUM_MEMBERS      GB 0.1099 (2 rules);  LG: not selected
    OPTOM_PRESC      GB 0.6211 (19 rules); LG 0.2178, p=.000 / 0.1463, p=.000
    DOCTOR_VISITS    GB 0.4434 (20 rules); LG -0.0171, p=.020 / -0.0065, p=.428
    MEMBER_DURATION  GB 0.7843 (41 rules); LG -0.0066, p=.000 / -0.0065, p=.000
    TOTAL_SPEND      GB 0.6864 (29 rules); LG -0.0000, p=.003 / -0.0000, p=.004
    NO_CLAIMS        GB 1.0000 (19 rules); LG 0.7752, p=.000 / 0.7610, p=.000
    INTERCEPT        LG -0.5767, p=.000 / -0.5635, p=.001
  • 30. Logistic selection steps (M2_TRN_LOGISTIC_STEPWISE):
    Step 1: no_claims entered (p = .00), 1 in model
    Step 2: member_duration entered (p = .00), 2 in model
    Step 3: optom_presc entered (p = .00), 3 in model
    Step 4: total_spend entered (p = .00), 4 in model
    Step 5: doctor_visits entered (p = .02), 5 in model
    NUM_MEMBERS was dropped.
  • 31.
  • 32. Marginal effect: change in probability as X changes.
  • 33.
  • 34. Some conclusions and comments so far: Logistic stepwise dropped NUM_MEMBERS, which is shown with the lowest relative importance in GB. Notice that logistic regression does not have an agreed-upon scale of importance; we can use odds ratios, e.g. NO_CLAIMS is deemed the most important single variable for GB, but logistic deems OPTOM_PRESC the second one (via odds ratios), while GB selected MEMBER_DURATION. The remaining variables have odds ratios of 1, which seems to indicate similar effects, while GB distinguishes relative importance after the first two variables.
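The odds-ratio comparison above is just exp(coefficient). A short sketch using the reported TRN stepwise coefficients (copied from the coefficients slide; note that odds ratios are per one unit of each predictor and hence scale-dependent):

```python
import math

# Reported TRN logistic stepwise coefficients
coefs = {
    "NO_CLAIMS": 0.7752,
    "OPTOM_PRESC": 0.2178,
    "DOCTOR_VISITS": -0.0171,
    "MEMBER_DURATION": -0.0066,
    "TOTAL_SPEND": -0.0000,
}
odds_ratios = {v: math.exp(b) for v, b in coefs.items()}

# Order by distance from OR = 1 (i.e., |coefficient|)
for v, orat in sorted(odds_ratios.items(), key=lambda kv: -abs(math.log(kv[1]))):
    print(f"{v:16s} OR = {orat:.4f}")
```

NO_CLAIMS and OPTOM_PRESC stand out; the remaining odds ratios sit essentially at 1, matching the slide's reading.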
  • 35. LG does not have a measure of importance as GB does, so we use marginal-effects plots, which indicate the change in probability along each variable's range. Except for MEMBER_DURATION (which declines initially), the other effects are positive with different intensities, and the maximum value declines as per the logistic shape. MEMBER_DURATION shows a pronounced decline at low duration levels, suggesting the possibility of fraudulent members who join, commit their fraud and leave. Note the sharper increase in probability for NO_CLAIMS at bins 1 and 8, and for OPTOM_PRESC at 6. GB importance measures the impact of individual inputs on predicting Y, but does not tell how that impact changes along the range of the inputs, and individual variable effects are not in consideration; so we use partial dependency plots, which come as a free ride for LG as well.
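The shape described above ("max value declines as per the logistic shape") follows from the logistic marginal effect dP/dx = beta * p * (1 - p). A sketch along one predictor's range, holding the linear predictor otherwise fixed; the two coefficients are illustrative round-offs of the reported intercept and NO_CLAIMS coefficient, not a refit:

```python
import numpy as np

beta0, beta1 = -0.58, 0.78          # rounded from the reported logistic fit
x_grid = np.linspace(0, 10, 11)     # grid over the predictor's range

# Probability along the grid, then the marginal effect dP/dx = beta1 * p * (1-p)
p = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x_grid)))
marginal = beta1 * p * (1.0 - p)

# The effect peaks where p is near 0.5 and shrinks toward both tails.
print(np.round(marginal, 4))
```

This is why a positive-coefficient variable shows a positive but fading effect at high values of X.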
  • 36. Marginal effects and PDPs. Marginal effects refer to the change in probability for a one-unit change in X, ceteris paribus (if meaningful, or at least desirable). PDPs do not indicate change in Y at all; instead, a PDP measures probability levels at different values of X1, with all other predictors measured at their means (or modes, medians, etc.). So there is no marginality in PDPs, unless we also measure the "change" in probability, shown later on as marginal PDPs.
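A minimal PDP sketch, not the presenter's exact procedure: the Friedman-style construction clamps predictor j at each grid value for every row and averages the predictions; the cheaper variant described on the slide instead fixes the other predictors at their means. The stand-in model and its weights below are illustrative, not the presentation's fitted model:

```python
import numpy as np

def predict_proba(X):
    # Stand-in model: a logistic in two predictors (illustrative weights).
    eta = -0.5 + 0.8 * X[:, 0] - 0.01 * X[:, 1]
    return 1.0 / (1.0 + np.exp(-eta))

def partial_dependence(X, j, grid):
    """Clamp column j at each grid value for all rows; average predictions."""
    pdp = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        pdp.append(predict_proba(Xv).mean())
    return np.array(pdp)

rng = np.random.default_rng(3)
X = np.column_stack([rng.poisson(1.0, 1000).astype(float),   # claims-like count
                     rng.uniform(0, 300, 1000)])             # duration-like scale
grid = np.arange(0, 6)
print(np.round(partial_dependence(X, 0, grid), 3))
```

Note that the output is a probability level at each grid point, not a change in probability, which is the distinction the slide draws against marginal effects.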
  • 37.
  • 39.
  • 40.
  • 41. Tree representation(s) up to 4 levels, model M2_GB_TRN_TREES (rule, with prediction in parentheses):
    no_claims < 2.5 (0.185)
      no_claims < 0.5 (0.159)
        member_duration < 180.5 (0.199)
          total_spend < 5250 (0.464)
          total_spend >= 5250 (0.186)
        member_duration >= 180.5 (0.103)
          doctor_visits >= 5.5 (0.093)
          doctor_visits < 5.5 (0.126)
      no_claims >= 0.5 (0.321)
        optom_presc < 3.5 (0.291)
          total_spend >= 6300 (0.273)
          total_spend < 6300 (0.467)
        optom_presc >= 3.5 (0.59)
          member_duration < 154.5 (0.67)
          member_duration >= 154.5 (0.447)
    no_claims >= 2.5 (0.633)
      no_claims < 4.5 (0.57)
        optom_presc < 3.5 (0.54)
          member_duration >= 128.5 (0.498)
          member_duration < 128.5 (0.627)
        optom_presc >= 3.5 (0.81)
          member_duration >= 137 (0.785)
          member_duration < 137 (0.85)
      no_claims >= 4.5 (0.761)
        member_duration < 303.5 (0.778)
          member_duration >= 148 (0.757)
          member_duration < 148 (0.823)
    Missing one line.
  • 42. Tree representation(s) up to 4 levels, model M2_LG_TRN_TREES (rule, with prediction in parentheses):
    no_claims < 1.5 (0.164)
      member_duration < 155.5 (0.235)
        optom_presc < 3.5 (0.213)
          no_claims < 0.5 (0.195)
          no_claims >= 0.5 (0.337)
        optom_presc >= 3.5 (0.49)
          optom_presc < 6.5 (0.404)
          optom_presc >= 6.5 (0.647)
      member_duration >= 155.5 (0.111)
        optom_presc < 3.5 (0.103)
          member_duration >= 246.5 (0.065)
          member_duration < 246.5 (0.122)
        optom_presc >= 3.5 (0.235)
          no_claims >= 0.5 (0.353)
          no_claims < 0.5 (0.213)
    no_claims >= 1.5 (0.61)
      no_claims < 2.5 (0.451)
        member_duration < 155.5 (0.562)
          optom_presc >= 1.5 (0.651)
          optom_presc < 1.5 (0.493)
        member_duration >= 155.5 (0.353)
          member_duration >= 237 (0.204)
          member_duration < 237 (0.39)
      no_claims >= 2.5 (0.748)
        no_claims < 4.5 (0.675)
          member_duration >= 236.5 (0.477)
          member_duration < 236.5 (0.721)
        no_claims >= 4.5 (0.899)
          member_duration >= 272 (0.741)
    Missing one line.
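The nested rule-plus-probability text on these two slides can be produced by a small recursive renderer; a sketch below hard-codes only the top of the reported GB tree (values taken from the GB tree slide) as the data structure, since the full fitted trees are not available here:

```python
def render(node, depth=0):
    """Render a nested rule tree as indented 'rule (prob)' lines."""
    lines = []
    if "rule" in node:
        lines.append("  " * depth + f"{node['rule']} ({node['prob']})")
    for child in node.get("children", []):
        lines.extend(render(child, depth + 1))
    return lines

# Top two levels of the reported GB tree (from the slide).
gb_top = {"children": [
    {"rule": "no_claims < 2.5", "prob": 0.185, "children": [
        {"rule": "no_claims < 0.5", "prob": 0.159},
        {"rule": "no_claims >= 0.5", "prob": 0.321}]},
    {"rule": "no_claims >= 2.5", "prob": 0.633, "children": [
        {"rule": "no_claims < 4.5", "prob": 0.57},
        {"rule": "no_claims >= 4.5", "prob": 0.761}]},
]}
print("\n".join(render(gb_top)))
```

The same renderer applies to the LG tree, which makes side-by-side structural comparison of the two models straightforward.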
• 43. Comment. LG starts by splitting at NO_CLAIMS >= 2 as likely fraud, while GB splits at >= 3. First-level predictions across the two models are similar, (.185, .633) for GB vs. (.164, .61) for LG, which indicates that the structures identified so far are similar. At the second level, GB splits only on NO_CLAIMS, while LG splits on MEMBER_DURATION for the suspected non-fraudsters of the first stage and on NO_CLAIMS for the fraud suspects. Predictions are similar only for the 4th node of level 2 (.748 and .761) and differ otherwise. The careful reader may verify that these two predictions emerge by splitting on NO_CLAIMS, albeit at different values, which supports the notion of NO_CLAIMS being the leading clue in our research.
• 44. NO_CLAIMS is not heavily used after the 2nd level, however, and the structures of the two models are clearly different: GB does not use it at all, while LG splits it at 4.5 to produce its highest prediction level, .899. Although GB split initially on NO_CLAIMS at 2.5 and then at 4.5, it did not reach the prediction level of LG, which started splitting at 1.5. The marginal-effects plot shows that NO_CLAIMS has the largest slope at low values, while MEMBER_DURATION has the largest slope at the high end of its range; no comparable plots can be created for GB. Thus the structures found, and the consequent interpretations, differ, and there is no isomorphism from one into the other. Perhaps a fractal approximation?
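The depth-4 GB tree of slide 41 can be read as a set of if-then rules. A minimal sketch, with split and leaf values taken from that table (the branch the slide notes as missing returns None):

```python
def gb_tree_pred(no_claims, member_duration, doctor_visits, total_spend, optom_presc):
    """Leaf probabilities transcribed from the slide-41 tree representation of GB."""
    if no_claims < 2.5:
        if no_claims < 0.5:
            if member_duration < 180.5:
                return 0.464 if total_spend < 5250 else 0.186
            return 0.093 if doctor_visits >= 5.5 else 0.126
        if optom_presc < 3.5:
            return 0.273 if total_spend >= 6300 else 0.467
        return 0.670 if member_duration < 154.5 else 0.447
    if no_claims < 4.5:
        if optom_presc < 3.5:
            return 0.498 if member_duration >= 128.5 else 0.627
        return 0.785 if member_duration >= 137 else 0.850
    if member_duration < 303.5:
        return 0.757 if member_duration >= 148 else 0.823
    return None  # this branch is the line noted as missing in the source table
```

For example, a member with 5 claims and duration 200 falls in the 0.757 leaf, while a member with no claims, short duration and low spend falls in the 0.464 leaf.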
  • 45. 2nd most “important variable”, very different structures.
  • 46. 1st most “important” variable, very different structures.
  • 47.
• 48. Great! Similar posterior probabilities, different structures, but maybe similar interpretations? Note more discrepancies when Prob is higher.
• 49. .... Ranking the models by GOF. Area strongly summarized for brevity's sake, and included just for completeness.
• 50. GOF ranks (rank on each GOF measure; Unw. = unweighted):

Model Name                    AUROC  AvgSqErr  ClassRate  CumLift3rdBin  CumRespRate3rd  Gini  PrecisionRate  Rsq(Cramer-Tjur)  Unw.Mean  Unw.Median
02_M2_TRN_GRAD_BOOSTING         1       1         2            1              1           1         1               1             1.13        1
03_M2_TRN_LOGISTIC_STEPWISE     2       2         1            2              2           2         2               2             1.88        2
04_M2_VAL_GRAD_BOOSTING         1       1         2            1              1           1         1               1             1.13        1
05_M2_VAL_LOGISTIC_STEPWISE     2       2         1            2              2           2         2               2             1.88        2
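Several of the table's GOF measures can be computed directly from labels and posterior probabilities. A sketch in pure Python (the data here are toy stand-ins, not the presentation's dataset):

```python
def auroc(y, p):
    """Area under the ROC curve via the probability-of-correct-ranking identity."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def gini(y, p):
    return 2 * auroc(y, p) - 1  # Gini is a rescaling of AUROC

def avg_square_error(y, p):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)  # Brier score

y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p), gini(y, p))  # 0.75 0.5
```

This makes the Gini and AUROC ranks in the table necessarily identical, which is indeed what the table shows.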
  • 51. .... Profile and Model Interpretation Area. .... Univariate Profile diagnostics for 6 Important Vars.
• 52. .... Event Proportions and Posterior Probabilities for 5 Important Vars., by original Model Names. Variables and probabilities binned for ease of visualization. The proportion of events is the same across models (it is just the original data), but probabilities differ across models. Not all cases shown.
  • 53.
  • 54.
  • 55. Etc for the other variables.
• 56. Some observations. Binned NO_CLAIMS: while similar in shape, GradBoost seriously underestimates the proportion of events throughout, while logistic has the problem for bins 2, 3, 5, 6, 7. Logistic has a positive slope, while GB flattens, owing to GB's interactive model. Up to bin 7 GB and LG behave similarly, and then LG jumps to a higher probability level. Binned MEMBER_DURATION: the probability distributions are similar but not identical. For bins 1, 2, 3 and 16 both methods underestimate the proportion of events. Slightly declining slope for both models. Binned OPTOM_PRESC: both methods fail to match the proportion of events in the mid range of the bins. Sudden upshift in positive slope for GB starting at bin 15, while the slope is overall flat but positive for Logistic.
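The binned comparisons above can be reproduced with simple equal-width binning of a predictor and the event proportion per bin. A sketch under toy data (the variable values are hypothetical):

```python
def binned_event_proportion(x, y, nbins):
    """Equal-width bins of x; proportion of events (y == 1) per bin."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / nbins or 1.0
    counts = [0] * nbins
    events = [0] * nbins
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), nbins - 1)  # clamp max(x) into last bin
        counts[b] += 1
        events[b] += yi
    return [e / c if c else None for e, c in zip(events, counts)]

# toy NO_CLAIMS-like variable and fraud indicator
x = [0, 0, 1, 1, 2, 2, 3, 3]
y = [0, 0, 0, 1, 1, 1, 1, 1]
print(binned_event_proportion(x, y, 4))  # [0.0, 0.5, 1.0, 1.0]
```

The same binning applied to each model's posterior probabilities, instead of `y`, gives the comparison curves discussed above.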
• 58. Interpretation: in bin 5, NO_CLAIMS reaches its overall max (100), while for bin 1 the max is around 35 and 15 on the 0-100 scale for the respective models. Same interpretation for Q3, etc.
  • 59.
  • 60.
  • 61.
  • 62. And Conversely ….. (GB = Tree repr. Of Grad_boosting …)
  • 63.
  • 64. .... Partial Dependency Plots and variants for Non Ensemble Models. Some variables may be dropped due to computer resources.
• 65. Note the narrow range of GB PDPs compared to those of LG, due to GB's interactive nature → more difficult to interpret.
  • 66.
• 67. Marginal (1) PDP comparative notes (Marginal (1): one variable at a time; one could also marginalize two variables at a time, not done in this presentation). GB marginals are rather flat, except for MEMBER_DURATION, about which a caveat later on. LG is juicier: the probability increase from NO_CLAIMS declines along its range, but that from OPTOM_PRESC increases, which seems to indicate that the leading reason would be prescriptions and not overall claims. The corresponding marginals for logistic end up with growth slowing down due to the logistic shape; GB is not constrained in that way.
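The "growth slowing down" for logistic follows from the marginal-effect formula dP/dx_j = beta_j * p * (1 - p), which shrinks as p approaches 0 or 1. A minimal sketch (the coefficient value is hypothetical):

```python
import math

def logistic_marginal_effect(beta_j, eta):
    """Slope of the event probability w.r.t. predictor j, at linear predictor eta."""
    p = 1.0 / (1.0 + math.exp(-eta))
    return beta_j * p * (1.0 - p)

beta = 0.79  # hypothetical NO_CLAIMS-style coefficient
print(logistic_marginal_effect(beta, 0.0))  # largest slope, at p = 0.5
print(logistic_marginal_effect(beta, 4.0))  # slope flattens as p -> 1
```

This is why the LG marginals must bend toward flatness at the extremes of the probability range, regardless of the coefficient's size.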
• 68. PDP comparative notes. The Overall PDP is the model probability when all predictors are at their means: for LG it is about .17, while for GB it is .53. Individual PDPs are (by definition) deviations from the Overall value as the variable of interest moves along its range while the others remain at their mean values. GB clumps most PDPs around the Overall value; LG shows clearly distinct values instead. The highest probability level for GB is around .7 while LG reaches 1; the minima are around .6 and 0 respectively. Note LG's monotonicity, while GB is mostly monotonic (except for DOCTOR_VISITS), possibly a product of the artificially created data set. In both cases NO_CLAIMS appears as the leading variable, especially in LG; and while MEMBER_DURATION is rather flat in GB, it certainly declines steadily in LG, with a very different interpretation: longer member duration implies a steadier customer and familiarity. NO_MEMBERS had been excluded in LG's stepwise selection and should not be confused with MEMBER_DURATION.
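The slide's construction (other predictors frozen at their means) and the classic Friedman PDP (average over all rows) can both be sketched. The model and data below are hypothetical stand-ins, not the presentation's fitted models:

```python
import math

def predict(row):
    """Hypothetical stand-in for a fitted model's posterior probability."""
    eta = -1.5 + 0.8 * row["no_claims"] - 0.005 * row["member_duration"]
    return 1.0 / (1.0 + math.exp(-eta))

def pdp(f, data, var, grid):
    """Friedman PDP: average prediction with var forced to each grid value."""
    return [sum(f({**row, var: g}) for row in data) / len(data) for g in grid]

def profile_at_means(f, data, var, grid):
    """Slide's variant: all other predictors frozen at their sample means."""
    means = {k: sum(r[k] for r in data) / len(data) for k in data[0]}
    return [f({**means, var: g}) for g in grid]

data = [{"no_claims": c, "member_duration": d}
        for c, d in [(0, 100), (1, 250), (3, 80), (5, 300)]]
print(pdp(predict, data, "no_claims", [0, 2, 4]))
print(profile_at_means(predict, data, "no_claims", [0, 2, 4]))
```

For an additive model like this stand-in the two constructions agree up to a shift; for an interactive model like GB they can differ, which is part of why GB's curves clump.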
  • 70.
  • 72.
• 73. UMI: Univariate Model Interpretation. From the preceding pages we can conclude that: NO_CLAIMS is positively associated with increased fraud for both logistic and Grad_b, but with a far steeper slope in Logistic. Grad_b stays in a narrow band of probability and is more interactive with the other predictors → Grad_b requires more BMI and MMI. GB's PDP overshoots the posterior probability → other variables bring down this effect in GB. MEMBER_DURATION has a U-shaped relationship, especially in the Logistic case, while GB has a spikier one. Note the high spike at minimal duration and the immediate decline, which seems to indicate members who committed fraud as soon as they joined and left immediately.
• 74. UMI: Univariate Model Interpretation (cont. 1). PDP view: logistic shows positive effects of NO_CLAIMS and OPTOM_PRESC, balanced by negative effects of the remaining variables. Comparing the posterior probability with the NO_CLAIMS PDP, they are almost the same for Logistic; similarly for MEMBER_DURATION. Grad_b instead shows more tepid effects of the same variables, and almost unchanging effects of the remaining predictors. Comparing PDP with probability, the other predictors bring down the PDP of NO_CLAIMS; similar effects for MEMBER_DURATION.
• 75. .... PDPs for "Pairs of Variables". Note: 3-d plots tend to interpolate areas with no data, producing false expectations of results; thus binned 2-d charts are sometimes preferable to 3-d plots. Not all pairs of variables are available, due to computer resources.
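A pair PDP is the same averaging with two variables forced jointly onto a grid. A sketch with hypothetical stand-ins (the interaction term mimics GB's interactive behavior; none of these coefficients come from the presentation's models):

```python
import math

def predict(row):
    # hypothetical stand-in model with an interaction term
    eta = (-1.5 + 0.6 * row["no_claims"] + 0.2 * row["optom_presc"]
           + 0.1 * row["no_claims"] * row["optom_presc"])
    return 1.0 / (1.0 + math.exp(-eta))

def pdp2(f, data, v1, v2, grid1, grid2):
    """Grid of average predictions with (v1, v2) forced jointly."""
    return [[sum(f({**r, v1: a, v2: b}) for r in data) / len(data)
             for b in grid2] for a in grid1]

data = [{"no_claims": c, "optom_presc": o}
        for c, o in [(0, 0), (1, 2), (3, 1), (5, 4)]]
grid = pdp2(predict, data, "no_claims", "optom_presc", [0, 2, 4], [0, 2, 4])
print(grid[2][2])  # "NE corner": high values of both variables
```

Cells of the grid where no data actually fall are exactly the regions a 3-d surface would silently interpolate, hence the caution above.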
• 76. Same for LG. Note that the correlations of NO_CLAIMS with the other variables are relatively small compared to the pairs MEMBER_DURATION – DOCTOR_VISITS and MEMBER_DURATION – OPTOM_PRESC. How will this translate into PDPs for two variables at a time?
  • 78. BINNED ORIGINAL PDP M2_TRN_GRAD_BOOSTING Corr ' 0.05073' NO_CLAIMS DOCTOR_VISITS Combination of No_claims & Doctor_visits shows high probability at NE corner and middle section stable high prob. level. Too many charts to show but necessary for full interpretation.
  • 79. BINNED ORIGINAL PDP M2_TRN_GRAD_BOOSTING Corr ' 0.02549' NO_CLAIMS MEMBER_DURATION
  • 80. BINNED ORIGINAL PDP M2_TRN_LOGISTIC_STEPWISE Corr ' 0.06580' NO_CLAIMS OPTOM_PRESC
  • 81. BINNED ORIGINAL PDP M2_TRN_GRAD_BOOSTING Corr '-0.10759' MEMBER_DURATION OPTOM_PRESC
  • 82. BINNED ORIGINAL PDP M2_TRN_LOGISTIC_STEPWISE Corr '-0.10759' MEMBER_DURATION OPTOM_PRESC
• 83. Some BMI comments (LG). High levels of NO_CLAIMS show high probability at the lowest level of TOTAL_SPEND, which probably denotes one-time fraud. Otherwise, even mid levels of NO_CLAIMS are associated with high probability at any level of TOTAL_SPEND; it seems that FRAUD is not necessarily linked to TOTAL_SPEND alone. As for NO_CLAIMS and MEMBER_DURATION, fraud happens at low durations, after which the fraudsters leave. For the pair OPTOM_PRESC and MEMBER_DURATION, for which we have contrasting pair PDPs with corr = -0.10, the interpretation is very different: while Grad_b shows flat probabilities throughout, except in the empty NE corner, logistic shows a more extreme NE corner, plus probabilities declining from the NW top.
• 84. For 2-dimensional visualization, collapse the 3-d chart by averaging the levels of variable 2 into those of variable 1, and compare to the original PDP. Original and collapsed PDPs are both derived from posterior model probabilities.
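The collapsing step described above is a simple row-wise average over the levels of variable 2. A minimal sketch (the grid values are a hypothetical pair PDP, not from the presentation):

```python
def collapse_pdp(grid2d):
    """Collapse a two-variable PDP grid to one variable by averaging over
    the levels of variable 2 (the columns), as described on slide 84."""
    return [sum(row) / len(row) for row in grid2d]

# hypothetical 2x2 pair-PDP grid: rows = levels of variable 1
grid = [[0.25, 0.75],
        [0.50, 1.00]]
print(collapse_pdp(grid))  # [0.5, 0.75]
```

Plotting the collapsed curve against the ordinary one-variable PDP is exactly the overlap comparison used on the following slides: large gaps flag an interaction with variable 2.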
  • 85. No room for TOTAL_SPEND
  • 86.
  • 87.
• 88. Comments for BMI. In the case of NO_CLAIMS, all pairs show overlap of the collapsed and original PDPs, except with OPTOM_PRESC. In the GB case, DURATION brings the probability down slightly, because duration is itself a strong predictor. LG shows that the presence of OPTOM_PRESC raises the posterior probability, an effect less accentuated in GB. The LG model could benefit from a NO_CLAIMS × OPTOM_PRESC interaction, or possibly from an overall transformation obtaining information per month and per number of members. (LG chart with TOTAL_SPEND omitted for brevity.) MEMBER_DURATION shows overlap with all second variables, plus a declining slope, more evident in the LG models.
• 89. Comments for BMI (cont.). It is possible to obtain 3-way and higher PDPs, and also to collapse them; not tried here. Given the overlap between the original and collapsed PDPs, the UMI effects are correct so far, except possibly for the triplet NO_CLAIMS, MEMBER_DURATION and OPTOM_PRESC.
  • 90.
  • 91.
• 92. P-comp 1 is mostly fitted by DOCTOR_VISITS and MEMBER_DURATION; # 2 (which fits the residuals from step 1) by NO_CLAIMS and OPTOM_PRESC, etc.
  • 94. For Illustration: No_claims for fraud = 1 still highly correlated with second eigenvector.
• 96. Overall view, omitting # 2 and # 3 for brevity's sake.
• 97. Comments for PCA results. PC # 1 groups MEMBER_DURATION and DOCTOR_VISITS together, NO_CLAIMS and TOTAL_SPEND in another group, and the remaining variables separately (the separation can be proven by statistical inference). For the logistic case, note that NO_CLAIMS is the first variable entered in the Stepwise selection (earlier slides), followed by MEMBER_DURATION and OPTOM_PRESC; yet NO_CLAIMS does not have the largest correlation with the first component, even when looking at correlations by values of FRAUD. GB likewise leads with NO_CLAIMS and MEMBER_DURATION, a hierarchy not represented in the Principal Components Analysis. PCA therefore does not provide a framework for interpreting the models.
• 98. Comments for PCA results (cont.). PCA orthogonalizes away predictor effects when going from step to step, something our present modeling methods do not do. Having chosen a cutoff point on the posterior probability, it is possible to obtain analogous PCA results for predicted 0 and predicted 1, obtain the correlations, and compare them to the previous results. Statistical inference can then determine equality/inequality of the correlations (with the original results) for different cutoff points.
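The correlation of each predictor with the first component, used in the comments above, can be computed directly. A pure-Python sketch using power iteration on the correlation matrix (the two toy columns are hypothetical, not the presentation's data):

```python
def standardize(col):
    n = len(col)
    m = sum(col) / n
    s = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / s for v in col]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def pc1_scores(cols, iters=500):
    """First-principal-component scores via power iteration on the corr matrix."""
    z = [standardize(c) for c in cols]
    p, n = len(z), len(z[0])
    R = [[pearson(z[i], z[j]) for j in range(p)] for i in range(p)]
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(R[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return [sum(w[i] * z[i][k] for i in range(p)) for k in range(n)]

doctor_visits   = [1, 3, 5, 7, 9]          # toy values
member_duration = [50, 140, 150, 260, 300]  # toy values
scores = pc1_scores([doctor_visits, member_duration])
print(pearson(doctor_visits, scores))  # close to 1 when this pair dominates PC1
```

Running the same correlations separately for FRAUD = 0 and FRAUD = 1 rows gives the by-value comparison mentioned above.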
  • 99.
  • 100.
  • 101.
• 102. Comments for Statistical Inference – Multiple Comparisons. Bars below '0.05' are considered significant, having taken multiple-comparison effects into consideration. Most results are insignificant, but the significant ones differ across models: NO_CLAIMS provides the same information throughout the range of probabilities for GB, but LG finds the first bin to be significantly different from the rest. The GB results stem from GB's split-based search. Thus LG can state that a low probability level indicates NO_FRAUD; GB cannot.
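The '0.05' bars imply a multiple-comparison correction; the deck does not state which one was used, but a standard choice such as Benjamini–Hochberg can be sketched:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Indices of hypotheses rejected under Benjamini-Hochberg FDR control."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # largest rank whose sorted p-value sits under the BH line rank/m * alpha
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])

print(benjamini_hochberg([0.001, 0.2, 0.01, 0.03]))  # [0, 2, 3]
```

A Bonferroni correction (compare each p-value to alpha/m) would be the more conservative alternative; either way, the per-bin conclusions above depend on which correction is applied.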
  • 103.
• 104. A definite monotonic relationship: higher values of NO_CLAIMS (e.g., bin 5) are associated with higher probability. Note the differences between LG and GB.
• 106. GB monotonic, LG slightly U-shaped.
• 107. Similar coefficients among logistic, GB and beta regressions. Via beta regression, the standard interpretation of log odds is possible, with caveats. Vars × Models × Coeffs (Coeff / Importance). Columns: (1) M2_TRN_GRAD_BOOSTING, (2) M2_TRN_GRAD_BOOSTING_BETA_REG, (3) M2_TRN_LOGISTIC_STEPWISE, (4) M2_TRN_LOGISTIC_STEPWISE_BETA_REG, (5) M2_VAL_GRAD_BOOSTING, (6) M2_VAL_LOGISTIC_STEPWISE:

Variable          (1)      (2)       (3)      (4)             (5)      (6)
MEMBER_DURATION   0.7318   -0.0051   -0.0057  -0.0057         0.7318   -0.0084
DOCTOR_VISITS     0.3925   -0.0061   .        .               0.3925   .
TOTAL_SPEND       0.6610   -0.0000   -0.0000  -0.0000         0.6610   -0.0000
OPTOM_PRESC       0.5944   0.1713    0.2132   0.2132          0.5944   0.1634
NO_CLAIMS         1.0000   0.7027    0.7921   0.7921          1.0000   0.7351
SCALE             .        21.8895   .        2.590291516E16  .        .
INTERCEPT         .        -0.7979   -0.8352  -0.8352         .        -0.3111
• 108. Beta Regression results. Analyzing the posterior probabilities (i.e., the original GB and LG posteriors) via beta regression yields very similar coefficients and structures → beta regression is reassuring but provides no additional information.
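The "standard interpretation of log odds" mentioned on slide 107 is that, under a logit link, each unit increase in a predictor multiplies the odds by exp(coefficient). A minimal illustration using the OPTOM_PRESC coefficient from that table (the interpretation carries the slide's caveats):

```python
import math

coef_optom = 0.2132  # OPTOM_PRESC coefficient, logistic / beta regression (slide 107)
odds_ratio = math.exp(coef_optom)
print(round(odds_ratio, 3))  # 1.238: each extra prescription multiplies the odds by ~1.24

def inv_logit(eta):
    """The same logit link maps the beta-regression linear predictor to its mean."""
    return 1.0 / (1.0 + math.exp(-eta))
```

For the beta regression this describes the modeled mean of the posterior probability rather than an event probability, which is one of the caveats.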
  • 109.
• 110. Possible to say that … 1) Manipulating just NO_CLAIMS solves the problem? 2) Maybe add MEMBER_DURATION and OPTOM_PRESC for parts of the NO_CLAIMS range? 3) Maybe add if-then rules from the simplified TREE_REPRESENTATION, because it is easier than GB and more interactive than LG? 4) If using a Neural Network, and NN derivatives, abandon all hope of interpretation? 5) → Interpretation needs a definition of the INTENDED AUDIENCE (see Tolstoy ut supra).
  • 111. Possible to say that … (cont. 1) 1) The analyst needs to focus on NO_CLAIMS, MEMBER_DURATION and OPTOM_PRESC as an ‘IMPORTANT’ group. 2) Different model Interpretations should be entertained. 3) Different marginal effects must be explained.
• 112. Final thoughts, before I exhaust the audience (if not exhausted already). MI analysis can proceed further, obtaining insights from collapsing three-way PDPs, for instance. If an 'easier' linear-model explanation is preferred, beta regression on the posterior probability would provide regression-like information. Still, beta regression is not straightforward, and model selection is a big issue.
  • 113.
• 114. Ch. 1.1-114, 2019-05-10. Future steps, focused on MMI: 1) Collapsing higher PDP orders, i.e., 3-way variables, and interpreting them. 2) Beta regression for a 'linear' interpretation; more difficult because it requires a model search as well, plus the additional error in modeling the posterior probability of the original model. 3) Andrews' curves.
  • 115.
• 116. LIME: Local Interpretable Model-agnostic Explanations: fits a surrogate interpretable model on the black-box model, applied to observations of interest. The tree representation in this presentation is similar in spirit. (https://homes.cs.washington.edu/~marcotcr/blog/lime/) ICE: clusters or a classification variable applied to PDP results. For a given predictor, ICE plots draw one line per observation, representing how that instance's prediction changes when the predictor changes. SHAP values: SHapley Additive exPlanations (Lundberg et al., 2017): measures the positive or negative contribution of each feature to the posterior probability, a technique used in game theory to determine each player's contribution to the success of a game. Affected by correlations among predictors → focusing on just one predictor to change behavior may change other predictors as well (available in Python). AND OTHERS ….
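The ICE description above ("one line per observation") is a small variation on the PDP computation. A sketch with a hypothetical stand-in model (coefficients invented for illustration):

```python
import math

def predict(row):
    # hypothetical stand-in for a fitted black-box model
    eta = -1.0 + 0.7 * row["no_claims"] - 0.004 * row["member_duration"]
    return 1.0 / (1.0 + math.exp(-eta))

def ice_curves(f, data, var, grid):
    """One curve per observation: prediction as var sweeps the grid,
    all other predictors kept at that observation's own values."""
    return [[f({**row, var: g}) for g in grid] for row in data]

data = [{"no_claims": 0, "member_duration": 100},
        {"no_claims": 4, "member_duration": 280}]
curves = ice_curves(predict, data, "no_claims", [0, 2, 4])
# averaging the ICE curves pointwise recovers the ordinary PDP
pdp_curve = [sum(c[j] for c in curves) / len(curves) for j in range(3)]
```

Heterogeneous (crossing or fanning) ICE lines reveal interactions that the averaged PDP hides, which is the usual motivation for the method.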
• 117. References
Lundberg S.M., Lee S.-I. (2017), "Consistent feature attribution for tree ensembles", presented at the 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia. https://arxiv.org/abs/1706.06060
Molnar C. (2018), Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/
Tolstoy, Leo (1894), The Kingdom of God Is Within You.