When Model Interpretation Matters:
Understanding Complex Predictive Models
Dean Abbott
Co-Founder and Chief Data Scientist, SmarterHQ
President, Abbott Analytics
Twitter: @deanabb
© Abbott Analytics, 2014-2017
To be Fair: Other Ways to Compute Neural Network Sensitivities
Such as… http://www.palisade.com/downloads/pdf/academic/DTSpaper110915.pdf
And ftp://ftp.sas.com/pub/neural/importance.html#mlp_parder_interp
• Weight tracing – sum of product of weights (and variants)
• Partial derivatives – avg, avg absolute, squared, etc.
• Remove variable, compute change in accuracy
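The partial-derivative option above can be sketched numerically. This is a minimal illustration, not the method from either linked reference: it assumes a model exposed as a `predict(X)` function, estimates each derivative by central differences, and reports the avg, avg absolute, and squared summaries named in the bullet. The names `average_sensitivities`, `toy_predict`, and the step size `eps` are hypothetical.

```python
import numpy as np

def average_sensitivities(predict, X, eps=1e-4):
    """Estimate d(prediction)/d(input) per record by central differences,
    then summarize each input's derivatives over all records."""
    X = np.asarray(X, dtype=float)
    n_rows, n_inputs = X.shape
    derivs = np.empty((n_rows, n_inputs))
    for j in range(n_inputs):
        step = np.zeros(n_inputs)
        step[j] = eps  # perturb only input j
        derivs[:, j] = (predict(X + step) - predict(X - step)) / (2 * eps)
    return {
        "avg": derivs.mean(axis=0),
        "avg_abs": np.abs(derivs).mean(axis=0),
        "avg_squared": (derivs ** 2).mean(axis=0),
    }

# Hypothetical stand-in for a trained network: a single tanh unit
# that uses inputs 0 and 1 and ignores input 2.
def toy_predict(X):
    return np.tanh(2.0 * X[:, 0] - 0.5 * X[:, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
sens = average_sensitivities(toy_predict, X)
```

In this toy setup the unused third input gets a sensitivity of zero, and the plain average can hide influence when derivatives change sign across records, which is one reason to also look at the absolute or squared variants.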
What About Model Ensembles?
Decision Logic
Ensemble Prediction
10s, 100s, 1000s of trees…
“A forest of trees is impenetrable as far as simple
interpretations of its mechanism go.” –
L. Breiman. Random forests. Machine Learning,
45(1): 5–32, 2001. 18.
(https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf)
Assess Influence using Direct Measure of
Influence Proportion
• Compute the contribution of each term in the linear regression model
separately (for each record).
– Var1_influence = var1coef * var1
– Var2_influence = var2coef * var2
– Var3_influence = var3coef * var3
• For each term/input, compute its proportion of the predicted target
variable value
– How much of the prediction comes from each input?
– Var1_proportion = Var1_influence / SUM(all variable influences)
• Average each variable's contributions over all records to compute
the average influence of each variable
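A minimal sketch of these three steps, assuming the coefficients are already fitted and the inputs sit in a NumPy array. The function name `influence_proportions` and the sample numbers are hypothetical. The intercept is left out of the sum, which is an assumption the slides do not settle, and the final step averages the per-record proportions, which is one reading of the last bullet.

```python
import numpy as np

def influence_proportions(X, coefs):
    """Per-record influence (coef * value), its share of the summed
    influences, and the average share of each variable."""
    X = np.asarray(X, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    influence = X * coefs                          # VarK_influence, one row per record
    total = influence.sum(axis=1, keepdims=True)   # SUM(all variable influences)
    # Note: a total near zero (e.g. terms of opposite sign that cancel)
    # makes the proportions unstable; this sketch does not guard against it.
    proportion = influence / total
    return influence, proportion, proportion.mean(axis=0)

# Made-up records and coefficients, for illustration only.
X = np.array([[10.0, 2.0, 1.0],
              [20.0, 4.0, 0.0],
              [ 5.0, 1.0, 1.0]])
coefs = np.array([0.5, 1.0, 3.0])
influence, proportion, avg_influence = influence_proportions(X, coefs)
```

Each row of `proportion` sums to 1 by construction, so `avg_influence` also sums to 1 and can be read as each variable's average share of the prediction.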
Why “Input Shuffling” / Permutation Importance?
• We don’t always have nice metrics
to assess inputs of predictive
models -- Neural Networks, SVM,
ensembles
– Contrast with statistical methods like
Regression
• Even with regression, the inputs don’t
always have the right distributions
for these metrics to be
good indicators of variable
influence
What does “Shuffled” mean?
• Scramble (randomly) a single input
variable
– Input Shuffling Node doesn’t have to be
in a loop; it can scramble a column while
leaving the others in their natural order
• Captures the actual distribution of
the data
This node from open source software KNIME
http://www.knime.com
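Outside KNIME, the same scrambling can be sketched in a few lines. This is a NumPy stand-in, not the KNIME node itself; `shuffle_column` is a hypothetical helper name. It permutes one column and leaves every other column in its original order, so the shuffled column keeps exactly the same set of values (its actual distribution) while losing its pairing with the rest of each row.

```python
import numpy as np

def shuffle_column(X, col, rng):
    """Return a copy of X with one column randomly permuted;
    all other columns stay in their natural order."""
    X_shuffled = np.array(X, copy=True)
    X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
    return X_shuffled

rng = np.random.default_rng(42)
X = np.column_stack([np.arange(6), np.arange(6) * 10.0])
X_new = shuffle_column(X, col=1, rng=rng)
```

Working on a copy keeps the original table intact, which matters when the same data is shuffled repeatedly.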
Principles of Input Shuffling
• Key: randomly re-populate values of a single input variable while leaving
all other variables with their original values
• Compute the standard deviation (or some other measure of perturbation)
for each record
– Of the Predicted Target Variable – posterior probability
– NOT the actual target variable value
• This perturbation is a measure of how influential the variable is in the
model
– High standard deviation -> lots of influence
– Low standard deviation -> not much influence
– ~0 standard deviation -> no influence
The Input Shuffling Process
1. Build the predictive model
2. For the training set (or suitable subset), loop over every variable
   1. For every variable (in loop), loop M times (50 by default)
      1. Shuffle the variable (keeping all other inputs for that row fixed)
      2. Score the Model
      3. Save all the scores for the entire data set (M scores)
   2. Compute the standard deviation of the predictions for each row (or
      some other measure of “spread”), i.e., group by Row ID, computing
      stdev. Now we have N records again
   3. Compute the average spread of an input over all N records, such as
      the mean of these standard deviations, i.e., group by entire data set.
      Now we have 1 number, the variable influence
3. Compare all results. Sort descending by variable influence.
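The loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not the KNIME workflow from the talk: the model is any `predict(X)` function returning one score per row, the spread measure is NumPy's default population standard deviation, and the names `input_shuffling_influence` and `toy_predict` are hypothetical.

```python
import numpy as np

def input_shuffling_influence(predict, X, n_shuffles=50, seed=0):
    """Mean per-row standard deviation of predictions when one input
    at a time is shuffled n_shuffles (M) times."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_rows, n_inputs = X.shape
    influence = np.empty(n_inputs)
    for j in range(n_inputs):                      # loop over every variable
        scores = np.empty((n_shuffles, n_rows))
        for m in range(n_shuffles):                # loop M times
            X_shuffled = X.copy()                  # other inputs stay fixed
            X_shuffled[:, j] = rng.permutation(X[:, j])
            scores[m] = predict(X_shuffled)        # score the model, save scores
        per_row_spread = scores.std(axis=0)        # group by row: N values
        influence[j] = per_row_spread.mean()       # group by data set: 1 number
    return influence

# Hypothetical stand-in model: input 0 matters most, input 2 not at all.
def toy_predict(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
influence = input_shuffling_influence(toy_predict, X)
ranking = np.argsort(influence)[::-1]              # sort descending by influence
```

Note that the function never looks at the actual target values, only at how the model's predictions move, which matches the principle that the spread is computed on the predicted target variable.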
Realistic Data: KDD Cup 1998
• 95,412 records: cup98lrn from KDD Cup 1998 Competition
– Use only the responders (4843) in linear regression models
• Hundreds of fields in data, but only use 4 for our purposes here
– LASTGIFT, NGIFTALL, RFA_2F, D_RFA_2A
• Continuous target
• Two continuous inputs
• One ordinal input (RFA_2F)
• One dummy input (D_RFA_2A)