1530 track 2 abbott_using our laptop

When Model Interpretation Matters:
Understanding Complex Predictive Models
Dean Abbott
Co-Founder and Chief Data Scientist, SmarterHQ
President, Abbott Analytics
Twitter: @deanabb

© Abbott Analytics, 2014-2017
Your Boss

Simple Models…Simple Story

Variable Importance in Linear Regression

Variable Importance in Decision Trees
•  Decision Trees
–  You think this is easy to explain?
–  How about this?

Then We Do This

Variable Importance in Neural Networks
•  Huh?

Neural Networks:
So we do this!

The Truly “Engaged” Even Do This!

To be Fair: Other Ways to Compute Neural Network Sensitivities
Such as… http://www.palisade.com/downloads/pdf/academic/DTSpaper110915.pdf
and ftp://ftp.sas.com/pub/neural/importance.html#mlp_parder_interp

•  Weight tracing – sum of product of weights (and variants)

•  Partial derivatives – average, average absolute, squared, etc.

•  Remove variable, compute change in accuracy

Naïve Bayes Model Outputs
Essentially a series of cross-tabs for every variable!

Remember, the final probability is the product of the individual variable probabilities.
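
As a sketch of that product rule (the symbols are notation assumed here, not taken from the slides): for a class y and inputs x_1, ..., x_p, Naïve Bayes multiplies the class prior by one conditional probability per variable, each of which can be read off that variable's cross-tab, and then normalizes over the classes.

```latex
P(y \mid x_1, \dots, x_p)
  = \frac{P(y) \prod_{j=1}^{p} P(x_j \mid y)}
         {\sum_{y'} P(y') \prod_{j=1}^{p} P(x_j \mid y')}
```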

SVM Output
The Support Vectors define the decision boundary!

What About Model Ensembles?
Decision Logic
Ensemble Prediction
10s, 100s, 1000s of trees…
“A forest of trees is impenetrable as far as simple interpretations of its mechanism go.”
– L. Breiman. Random forests. Machine Learning, 45(1): 5–32, 2001.
(https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf)

Breiman’s Solution to “Impenetrable”
https://www.stat.berkeley.edu/~breiman/Using_random_forests_v4.0.pdf

Permutation Importance Available in Salford Systems SPM,
scikit-learn and R

Outline
•  Classical variable importance: linear regression
•  Input Shuffling for Regression: compare and contrast with classical regression importance
•  Extend to other regression algorithms
•  Extend to classification
•  Demonstrate on larger dataset

The Data: Easiest Possible!
•  3 inputs: each is a random Normal: mean = 20, std = 5
•  Target variable: 0.5*var1 + 0.2*var2 + 0.3*var3
•  95,412 records (same size as cup98lrn)
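
A minimal sketch of how such a dataset could be generated, using only the Python standard library. The function name `make_idealized_data` and the seed are illustrative assumptions; the slides do not say which tool produced the data.

```python
import random

def make_idealized_data(n_rows=95_412, seed=0):
    """Return (rows, target): three Normal(20, 5) inputs and their weighted sum."""
    rng = random.Random(seed)
    # Each row holds var1, var2, var3, drawn independently.
    rows = [[rng.gauss(20, 5) for _ in range(3)] for _ in range(n_rows)]
    # The target is an exact (noise-free) linear combination of the inputs.
    target = [0.5 * v1 + 0.2 * v2 + 0.3 * v3 for v1, v2, v3 in rows]
    return rows, target

rows, target = make_idealized_data()
```

Because the target has no noise term, a linear regression on this data can recover the weights 0.5, 0.2 and 0.3 exactly, which is what makes it a convenient baseline.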

Linear Regression Coefficient
For Each Variable to Assess Influence
•  Coefficients match (by definition) the proportions used to build the target variable
•  This is the average influence of each input on the predictions for all records

Assess Influence with t-proportion
For Each Variable

•  T-value measures the significance of the relationship.
•  It turns out that the proportions of the t-values for the exact model match the coefficients
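
One way to see why this happens, sketched under the assumption of an ordinary least squares fit with an intercept (the notation is introduced here, not taken from the slides): s_j is the sample standard deviation of input j, R_j^2 is the R-squared from regressing input j on the other inputs, and \hat\sigma is the residual standard error, which cancels in the ratio.

```latex
t_j = \frac{\hat\beta_j}{\mathrm{SE}(\hat\beta_j)},
\qquad
\mathrm{SE}(\hat\beta_j) = \frac{\hat\sigma}{s_j \sqrt{(n-1)\,(1 - R_j^2)}}

\frac{|t_j|}{\sum_k |t_k|}
  = \frac{|\hat\beta_j|\, s_j \sqrt{1 - R_j^2}}{\sum_k |\hat\beta_k|\, s_k \sqrt{1 - R_k^2}}
  \approx \frac{|\hat\beta_j|}{\sum_k |\hat\beta_k|}
  \quad \text{when } R_j^2 \approx 0 \text{ and all } s_j \text{ are equal}
```

In the idealized data the inputs are independent with the same standard deviation (5) and the coefficients sum to 1, so each t-proportion is approximately the coefficient itself.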

Assess Influence using Direct Measure of
Influence Proportion
•  Compute the contribution of each term in the linear regression model separately (each record).
–  Var1_influence = var1coef * var1, etc.
•  For each term/input, compute its proportion of the predicted target variable value
–  What proportion of the prediction comes from each input?
–  Var1_proportion = Var1_influence / SUM(all variable influences)
•  Average the proportions of each variable over all records to compute the average influence of each variable
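
The steps above can be sketched in plain Python as follows. The function name `direct_influence_proportions` is hypothetical, the intercept is ignored, and taking abs() of each term follows the "Use abs() for influence calculations" note that appears on a later slide; treat this as an illustration under those assumptions rather than the author's exact calculation.

```python
import random

def direct_influence_proportions(rows, coefs):
    """Average, over records, each term's share of the summed |coef * value|."""
    sums = [0.0] * len(coefs)
    used = 0
    for row in rows:
        # Per-record influence of each term, e.g. var1coef * var1.
        influences = [abs(c * x) for c, x in zip(coefs, row)]
        total = sum(influences)
        if total == 0.0:
            continue  # no term contributes, so a proportion is undefined
        used += 1
        for j, influence in enumerate(influences):
            sums[j] += influence / total
    if used == 0:
        raise ValueError("no record had a non-zero total influence")
    # Average the per-record proportions over the records that were used.
    return [s / used for s in sums]

# Idealized data as described earlier: three Normal(20, 5) inputs.
rng = random.Random(0)
rows = [[rng.gauss(20, 5) for _ in range(3)] for _ in range(5_000)]
coefs = [0.5, 0.2, 0.3]  # assumed to be the fitted coefficients
proportions = direct_influence_proportions(rows, coefs)
```

On this data the averaged proportions land close to, though not exactly on, 0.5, 0.2 and 0.3, because the average of a ratio is not the ratio of the averages.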

So Far So Good
•  Now let’s do the same thing for
–  Neural Networks
–  Support Vector Machines.

Motivation for Input Shuffling / Permutation Importance
http://www.elderresearch.com/company/target-shuffling

Why “Input Shuffling” / Permutation Importance?
•  We don’t always have nice metrics to assess inputs of predictive models: Neural Networks, SVMs, ensembles
–  Contrast with statistical methods like Regression
•  Even with regression, the inputs don’t always have the distributions these metrics need to be good indicators of variable influence

Input Distributions Are Not Always Ideal

What does “Shuffled” mean?
•  Scramble (randomly) a single input variable
–  Input Shuffling Node doesn’t have to be in a loop; it can scramble a column while leaving the others in their natural order
•  Captures the actual distribution of the data
This node is from the open source software KNIME
http://www.knime.com

Principles of Input Shuffling
•  Key: randomly re-populate values of a single input variable while leaving all other variables with their original values
•  Compute the standard deviation (or some other measure of perturbation) for each record
–  Of the predicted target variable (e.g., the posterior probability)
–  NOT the actual target variable value
•  This perturbation is a measure of how influential the variable is in the model
–  High standard deviation -> lots of influence
–  Low standard deviation -> not much influence
–  ~0 standard deviation -> no influence

Shuffled Inputs Meta Node
Two Loops: (1) loop on input variables and (2) shuffle input variable (50x or so)

The Input Shuffling Process
1.  Build the predictive model
2.  For the training set (or suitable subset), loop over every variable
    1.  For every variable (in loop), loop M times (50 by default)
        1.  Shuffle the variable (keeping all other inputs for that row fixed)
        2.  Score the model
        3.  Save all the scores for the entire data set (M scores)
    2.  Compute the standard deviation of the predictions for each row (or some other measure of “spread”), i.e., group by Row ID, computing stdev. Now we have N records again
    3.  Compute the average spread of an input over all N records, such as the mean of these standard deviations, i.e., group by entire data set. Now we have 1 number, the variable influence
3.  Compare all results. Sort descending by variable influence.
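
The process can be sketched in plain Python as below. This is an illustration, not the KNIME workflow from the slides: the function name `input_shuffling_influence` is hypothetical, step 1 is replaced by a fixed `predict` function standing in for a fitted model, and the population standard deviation is used as the measure of spread.

```python
import random
import statistics

def input_shuffling_influence(predict, rows, n_shuffles=50, seed=0):
    """Return one influence value per input: the mean, over rows, of the
    standard deviation of the predictions across n_shuffles shuffles."""
    rng = random.Random(seed)
    influences = []
    for j in range(len(rows[0])):           # loop over every variable
        column = [row[j] for row in rows]
        scores = [[] for _ in rows]          # will hold M scores per row
        for _ in range(n_shuffles):          # loop M times
            shuffled = column[:]
            rng.shuffle(shuffled)            # shuffle only variable j
            for i, row in enumerate(rows):
                # All other inputs for this row keep their original values.
                perturbed = row[:j] + [shuffled[i]] + row[j + 1:]
                scores[i].append(predict(perturbed))
        # Group by row: spread of the M predictions (N values again).
        row_spread = [statistics.pstdev(s) for s in scores]
        # Group by entire data set: one number for this variable.
        influences.append(statistics.fmean(row_spread))
    return influences

def predict(row):
    """Stand-in for a fitted model (assumption): the idealized linear target."""
    return 0.5 * row[0] + 0.2 * row[1] + 0.3 * row[2]

rng = random.Random(1)
rows = [[rng.gauss(20, 5) for _ in range(3)] for _ in range(2_000)]
influence = input_shuffling_influence(predict, rows)

# Convert to proportions and sort descending by variable influence.
total = sum(influence)
ranking = sorted(
    zip(["var1", "var2", "var3"], (v / total for v in influence)),
    key=lambda pair: pair[1],
    reverse=True,
)
```

Since `predict` is any callable that maps a row to a score, the same loop would apply unchanged to a neural network, an SVM or an ensemble.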

Single Record: what it looks like
•  After 50 “input shuffles”: Row0

Average for All Records in data
•  Measures the spread of the predictions when randomly perturbing the single input variable

Input Shuffling Result:
Idealized Linear Regression Data
•  Compute proportion of the average standard deviation from shuffling the input (keeping others with the original values)
•  (yes, I know I’m averaging standard deviations!)
Target variable: 0.5*var1 + 0.2*var2 + 0.3*var3
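
A short derivation of why these proportions should reproduce the coefficients for a linear model (a sketch; the notation is assumed here). Shuffling input j changes only the term \beta_j x_j in the prediction for record i, so the spread of that record's predictions over the shuffles m is |\beta_j| times the spread of the shuffled values, which is approximately the standard deviation \sigma_j of input j:

```latex
\hat{y}_i^{(m)} = \beta_0 + \beta_j\, x_j^{(m)} + \sum_{k \neq j} \beta_k\, x_{ik}
\quad\Longrightarrow\quad
\mathrm{sd}_m\!\left(\hat{y}_i^{(m)}\right)
  = |\beta_j|\, \mathrm{sd}_m\!\left(x_j^{(m)}\right)
  \approx |\beta_j|\, \sigma_j

\frac{|\beta_j|\, \sigma_j}{\sum_k |\beta_k|\, \sigma_k}
  = \frac{5\, |\beta_j|}{5\, (0.5 + 0.2 + 0.3)}
  = |\beta_j|
```

With every input having standard deviation 5, the expected proportions are 0.5, 0.2 and 0.3.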

Realistic Data: KDD Cup 1998
•  95,412 records: cup98lrn from KDD Cup 1998 Competition
–  Use only the responders (4843) in linear regression models
•  Hundreds of fields in data, but only use 4 for our purposes here
–  LASTGIFT, NGIFTALL, RFA_2F, D_RFA_2A
•  Continuous target
•  Two continuous inputs
•  One ordinal input (RFA_2F)
•  One dummy input (D_RFA_2A)

Realistic Data: KDD Cup 1998
•  Heavy skew of LASTGIFT, NGIFTALL, TARGET_D
–  Makes visualization difficult
–  Biases regression coefficients (if one cares)
–  So, do the usual “best practices”

For Regression Modeling, Normalize the Data
•  To remove influence of skew and scale
–  Log10 transform LASTGIFT, NGIFTALL, TARGET_D
–  Scale all variables (post log10) to [0, 1]
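
A minimal sketch of this normalization for one column, assuming strictly positive values (the slides do not say how zeros, if any, were handled). The function name `log10_then_minmax` and the sample amounts are made up for illustration.

```python
import math

def log10_then_minmax(values):
    """Log10-transform strictly positive values, then rescale to [0, 1]."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 needs strictly positive values")
    logged = [math.log10(v) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        # A constant column carries no information; map it to 0.
        return [0.0 for _ in logged]
    return [(x - lo) / (hi - lo) for x in logged]

# Made-up gift amounts for illustration only (not taken from cup98lrn).
last_gift = [1.0, 5.0, 10.0, 25.0, 100.0]
scaled = log10_then_minmax(last_gift)
```

An ordinal or dummy input such as RFA_2F or D_RFA_2A would skip the log step and only be rescaled.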

For Regression Modeling, Normalize the Data
•  Relationships clearer
–  LASTGIFT strong positive correlation with TARGET_D
–  NGIFTALL, RFA_2F, D_RFA_2A all have apparently slight negative correlation with TARGET_D

The Basic Model: Linear Regression
Coefficient
Use abs() for influence calculations

Linear Regression:
Compare Influence Using Different Methods
Panels: Coefficient, t-Proportion, Direct Proportion, Input Shuffling Proportion
Use abs() for t-proportion calculations; use abs() for influence calculations

Linear Regression, Neural Network: Input Shuffling
Influence
Panels: Input Shuffling - LR, Input Shuffling - MLP

Applying Input Shuffling to Classification: Logistic Regression
Start simple: just 4 variables (like the regression example)

Applying Input Shuffling to Classification: Logistic Regression
Panels: Influence Based on Proportion of z-score, Influence Based on Input Shuffling

Ranking Larger Numbers of Variables:
Regression Example (TARGET_D)

Conclusion
•  Input shuffling (Permutation Importance) can generate model sensitivity scores for any model, no matter how complex or nonlinear
•  Input shuffling can be applied to any algorithm, no matter how linear or nonlinear the algorithm is
•  Matches linear regression variable influence (t-value)
•  Similar to logistic regression variable influence (z-score)

Improvements
•  Scores shown here are averages aggregated over the entire data space
•  Will tell a less powerful story or even be misleading if
–  model predictions (scores) are not normally distributed,
–  the input influence is not uniform
•  Solution: bin predictions into quantiles (deciles or other number of bins) and compute the permutation importance score for every quantile
–  Answers the question: for high predicted values, which variables are most influential?
•  Build score influence rather than prediction influence
–  Use ROC AUC statistics for each shuffled input, and determine the influence of each variable on the model score rather than the predicted value
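
One possible reading of the last idea, sketched in plain Python: measure how much the ROC AUC drops when each input is shuffled. The helper names `roc_auc` and `auc_drop_by_input`, the two-input scoring function and the synthetic labels are all assumptions made for illustration; the slides do not specify an implementation.

```python
import math
import random

def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) formula, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1           # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auc_drop_by_input(score, rows, labels, n_shuffles=20, seed=0):
    """Return (baseline AUC, mean AUC drop per input after shuffling it)."""
    rng = random.Random(seed)
    base = roc_auc(labels, [score(r) for r in rows])
    drops = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        total_drop = 0.0
        for _ in range(n_shuffles):
            shuffled = column[:]
            rng.shuffle(shuffled)            # shuffle only input j
            perturbed = [score(r[:j] + [s] + r[j + 1:]) for r, s in zip(rows, shuffled)]
            total_drop += base - roc_auc(labels, perturbed)
        drops.append(total_drop / n_shuffles)
    return base, drops

def score(row):
    """Hypothetical classifier score: input 0 matters far more than input 1."""
    return 2.0 * row[0] + 0.5 * row[1]

rng = random.Random(0)
rows = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1_000)]
# Synthetic labels drawn from the logistic of the same score (assumption).
labels = [1 if rng.random() < 1 / (1 + math.exp(-score(r))) else 0 for r in rows]
base_auc, auc_drops = auc_drop_by_input(score, rows, labels)
```

A large drop means the model's ranking quality depends heavily on that input; a drop near zero means shuffling it barely changes the AUC.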
