Train, explain, acclaim
Build a good model in three steps
AI & NLP workshop day
Przemysław Biecek
Samsung SRPOL
Warsaw University of Technology
Train
with AutoML
Working on your (n+1)-th dataset, should you take into account what
you have learned on previous datasets?
Defaults (package defaults, Def.P, and optimal defaults, Def.O), tunability of the
hyperparameters with the package defaults (Tun.P) and our optimal defaults (Tun.O)
as reference, and tuning-space quantiles (q0.05 and q0.95) for different parameters
of the algorithms
https://autodl.chalearn.org/
Sequential Model-based
Algorithm Configuration
The open-source solution of AAD
Freiburg uses a heterogeneous
ensemble of learning machines
(AutoSklearn (Feurer et al.,
2015a,c)) combining the machine
learning library scikit-learn
(Pedregosa et al., 2011) with the
state-of-the-art SMBO method
SMAC to find suitable machine
learning pipelines for a data set at
hand. This is essentially a
reimplementation of Auto-WEKA.
To speed up the optimization
process they employed a
metalearning technique (Feurer et
al., 2015b) which starts SMAC
from promising configurations of
scikit-learn. Furthermore, they
used the outputs of all models and
combined these into an ensemble
using ensemble selection.
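The SMBO loop described above can be illustrated with a deliberately minimal sketch: a toy objective stands in for the cross-validated error of a pipeline, and an inverse-distance surrogate stands in for SMAC's random-forest model. All names, constants, and the acquisition rule here are illustrative assumptions, not SMAC's actual implementation:

```python
import random

def objective(x):
    # Toy black-box objective standing in for the cross-validated
    # error of a pipeline with hyperparameter x; optimum at x = 2.
    return (x - 2.0) ** 2

def surrogate_mean(x, history):
    # Inverse-distance-weighted average of observed scores -- a crude
    # stand-in for SMAC's random-forest surrogate model.
    num = den = 0.0
    for xi, yi in history:
        w = 1.0 / (abs(x - xi) + 1e-9)
        num += w * yi
        den += w
    return num / den

def smbo(n_iter=40, n_candidates=100, seed=0):
    rng = random.Random(seed)
    # A few random evaluations to start; meta-learning would instead
    # seed this with configurations that worked on similar datasets.
    history = [(x, objective(x)) for x in (rng.uniform(-5, 5) for _ in range(3))]
    for _ in range(n_iter):
        candidates = [rng.uniform(-5, 5) for _ in range(n_candidates)]
        def acquisition(x):
            # Predicted score minus an exploration bonus for points
            # far from anything already evaluated.
            bonus = min(abs(x - xi) for xi, _ in history)
            return surrogate_mean(x, history) - 0.5 * bonus
        x_next = min(candidates, key=acquisition)
        history.append((x_next, objective(x_next)))  # expensive evaluation
    return min(history, key=lambda pair: pair[1])

best_x, best_y = smbo()
```

The key idea the sketch shares with SMAC: never evaluate the expensive objective blindly, but let a cheap surrogate fitted to past evaluations propose the next configuration.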
This work has been supported in part by the Defense Advanced Research Projects Agency
(DARPA) Data-Driven Discovery of Models (D3M) Program.
https://github.com/automl/HpBandSter
http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
https://towardsdatascience.com/shallow-understanding-on-bayesian-optimization-324b6c1f7083
https://github.com/hibayesian/awesome-automl-papers
https://www.ml4aad.org/automl/literature-on-neural-architecture-search/
Auto-Keras
auto-sklearn
automl-gs
Auto-WEKA
FeatureTools
h2o automl
Ludwig
mljar-supervised
Neural Network Intelligence (NNI)
tpot
TransmogrifAI
auto_ml
https://www.linkedin.com/in/igor2k/
Explain
with Interpretable Machine Learning
https://www.wired.com/2015/10/can-learn-epic-failure-google-flu-trends/
• “You don’t see a lot of skepticism,” she says. “The algorithms are like shiny new
toys that we can’t resist using. We trust them so much that we project meaning on to
them.”
• Ultimately algorithms, according to O’Neil, reinforce discrimination and widen
inequality, “using people’s fear and trust of mathematics to prevent them from
asking questions”.
https://www.theguardian.com/books/2016/oct/27/cathy-oneil-weapons-of-math-destruction-algorithms-big-data
Cathy O'Neil:
The era of blind faith
in big data must end
black boxes
Why do we need explanations for complex models?
Right to explanation
Why do we need explanations for complex models?
https://panoptykon.org/wiadomosc/prawo-do-wyjasnienia-decyzji-kredytowej-dla-kazdego-sukces-panoptykonu
Local Model approximations
"Why Should I Trust You?" Explaining the Predictions of Any Classifier.
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin (2016). https://arxiv.org/pdf/1602.04938.pdf
Port to R: Thomas Lin Pedersen (2017) https://github.com/thomasp85/lime
Other implementations: live (Staniak, Biecek 2018) and iml (Molnar 2018)
A different approach to model explanation is to locally approximate
the complex black-box model with an easier to interpret white-box
model constructed on interpretable features.
Local Model approximations
1. Generate a fake dataset around x.
2. Use the black-box estimator to get target values y.
3. Train a new white-box estimator on (y, x).
4. Check the prediction quality of the white-box estimator.
5. Use the white-box estimator as an explanation of the black-box model.
Properties:
model-agnostic
interpretable representation
local fidelity
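The five steps can be sketched in plain Python. The black-box model, the perturbation scale, and the kernel width below are illustrative assumptions, not the lime package's actual defaults:

```python
import math
import random

def black_box(x1, x2):
    # Hypothetical opaque model; around (1, 0) its gradient is (2, 3).
    return x1 * x1 + 3.0 * x2

def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 system.
    n = 3
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lime_explain(x0, n=500, kernel_width=0.5, seed=1):
    """Steps 1-5: perturb around x0, query the black box, fit a
    distance-weighted linear surrogate, return its coefficients."""
    rng = random.Random(seed)
    A = [[0.0] * 3 for _ in range(3)]
    bvec = [0.0] * 3
    for _ in range(n):
        p = (x0[0] + rng.gauss(0, 0.3), x0[1] + rng.gauss(0, 0.3))  # step 1
        yi = black_box(*p)                                          # step 2
        d2 = (p[0] - x0[0]) ** 2 + (p[1] - x0[1]) ** 2
        wi = math.exp(-d2 / kernel_width ** 2)   # closer points weigh more
        z = (1.0, p[0], p[1])
        for i in range(3):                       # accumulate the weighted
            bvec[i] += wi * z[i] * yi            # normal equations (step 3)
            for j in range(3):
                A[i][j] += wi * z[i] * z[j]
    return solve3(A, bvec)  # [intercept, slope_x1, slope_x2] (step 5)

coef = lime_explain((1.0, 0.0))
```

The fitted slopes approximate the local gradient of the black box at x0, which is exactly the white-box explanation LIME reports.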
LIME / live vs Break Down
Model debugging
Biecek P (2018). “DALEX: Explainers for Complex Predictive Models in R.”
Journal of Machine Learning Research, 19(84), 1-5. URL: http://jmlr.org/papers/v19/18-416.html
What would you explain?
https://kmichael08.github.io
What If?
Why?
https://www.encyclopedia-titanica.org/
What are the odds of surviving?
What If?
Ceteris Paribus
Individual Conditional Expectations
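A Ceteris Paribus profile (an Individual Conditional Expectation curve) is easy to sketch: vary one feature of a single observation over a grid while holding everything else fixed. The model and its coefficients below are made up for illustration:

```python
def model(obs):
    # Hypothetical fitted model of apartment price per m2;
    # coefficients are invented for illustration only.
    return 4000.0 - 10.0 * obs["surface"] + 50.0 * obs["floor"]

def ceteris_paribus(model, instance, feature, grid):
    """Model response for one observation when a single feature is
    varied over `grid` and all other variables are held fixed."""
    profile = []
    for value in grid:
        obs = dict(instance)   # copy, so the instance is not modified
        obs[feature] = value
        profile.append((value, model(obs)))
    return profile

apartment = {"surface": 22, "floor": 1}
profile = ceteris_paribus(model, apartment, "surface", range(20, 101, 20))
```

Plotting `profile` gives the What-If curve for this single apartment: how the predicted price would change if only its surface changed.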
Champion - Challenger
Why?
iBreakDown: Uncertainty of Model Explanations for Non-additive Predictive Models
Alicja Gosiewska, Przemyslaw Biecek (2019) https://arxiv.org/abs/1903.11420v1
SHAP (SHapley Additive exPlanations) Lundberg (2017)
IME complexity is O(2^p). Shapley values have been known for a long
time, and we have methods to approximate them efficiently.
Order does matter
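Averaging contributions over random orderings removes this order dependence and approximates Shapley values without enumerating all 2^p subsets. A minimal Monte Carlo sketch with a toy two-feature model (illustrative only, not the SHAP library):

```python
import random

def shapley_sample(predict, instance, baseline, n_samples=2000, seed=0):
    """Monte Carlo Shapley values: the contribution of a feature depends
    on the order in which it enters; averaging each feature's marginal
    contribution over random orderings removes that dependence."""
    rng = random.Random(seed)
    p = len(instance)
    phi = [0.0] * p
    for _ in range(n_samples):
        order = list(range(p))
        rng.shuffle(order)
        x = list(baseline)
        prev = predict(x)
        for j in order:
            x[j] = instance[j]          # switch feature j to the instance
            cur = predict(x)
            phi[j] += cur - prev        # its marginal contribution
            prev = cur
    return [v / n_samples for v in phi]

def predict(x):
    # Toy model with an interaction term, so order genuinely matters.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[1]

phi = shapley_sample(predict, instance=[1.0, 1.0], baseline=[0.0, 0.0])
```

By construction the attributions sum to f(instance) − f(baseline) for every sampled ordering, so the efficiency property holds exactly.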
https://github.com/MI2DataLab/modelDown
https://chudekm.shinyapps.io/model_explorer_example/
https://breakdeeper.netlify.com/
What If? Interactive with D3
What If? Comparison of models between languages
Acclaim
With Human Centered AI
https://www.massdevice.com/report-ibm-watson-delivered-unsafe-and-inaccurate-cancer-recommendations/
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
On Explainable Machine Learning Misconceptions and A More Human-Centered
Machine Learning; Patrick Hall
Predictive Models: Visual Exploration, Explanation and Debugging
Production
Development
Concept / Forge / Validate
Model debugging
Model development is an
iterative process. Each
iteration brings new insights.
Early phases:
Crisp modeling, general
understanding of the problem.
Medium phases:
Selective modeling, here we
select the best type of model.
Late phases:
Fine-tuning of model
parameters or feature
engineering.
In each iteration, model development starts with concepts and ideas; then the model is trained,
and finally it needs to be validated.
Predictions need to be
explained.
Here the instance level
explanation helps.
Over time the model
performance may deteriorate,
so it requires constant
monitoring, e.g. with the drifter
package.
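One simple drift check compares the distribution of a variable (or of the model's predictions) between the training data and fresh production data, e.g. with the Population Stability Index. This is a generic sketch of the idea, not the drifter package's API; the 0.2 threshold is a common rule of thumb:

```python
import math
import random

def psi(expected, actual, n_bins=10):
    """Population Stability Index between a reference sample (e.g. the
    training data) and a fresh sample; values above ~0.2 are commonly
    taken as a signal to investigate drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0
    def bin_fractions(sample):
        counts = [0] * n_bins
        for v in sample:
            i = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[i] += 1
        # Mild smoothing so empty bins do not blow up the logarithm.
        return [(c + 0.5) / (len(sample) + 0.5 * n_bins) for c in counts]
    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(5000)]      # training-time data
fresh_ok = [rng.gauss(0, 1) for _ in range(5000)]       # same distribution
fresh_drifted = [rng.gauss(1, 1) for _ in range(5000)]  # shifted by 1 sigma
psi_ok = psi(reference, fresh_ok)
psi_drifted = psi(reference, fresh_drifted)
```

Running the same check on every scoring batch turns drift detection into a cheap, automatable monitoring step.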
[Figure: variable importance measured by drop-out loss for the GBM model (district, surface, floor, construction.year, no. rooms) and a Ceteris Paribus profile of the Random Forest prediction as a function of surface.]
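Drop-out loss can be sketched as permutation importance: permute one variable at a time and measure how much the loss grows relative to the intact data. The data and model below are toy assumptions for illustration:

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, seed=0):
    """Drop-out loss per variable: shuffle one column, re-score, and
    report the increase over the full-data loss (larger = more important)."""
    rng = random.Random(seed)
    full_loss = mse(y, [predict(row) for row in X])
    importance = {}
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the link between variable j and y
        X_perm = [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(X)]
        importance[j] = mse(y, [predict(row) for row in X_perm]) - full_loss
    return full_loss, importance

# Toy data: the target depends only on the first variable, and the
# (assumed already fitted) model has learned exactly that.
rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [3.0 * row[0] for row in X]
predict = lambda row: 3.0 * row[0]
full_loss, importance = permutation_importance(predict, X, y)
```

Because the approach only needs predictions, it works identically for GBM, Random Forest, or any other black box.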
Feature influence: variable attributions
[Figure: break-down plots of variable attributions for the GBM, Random Forest, and LM models for the same apartment (district: Srodmiescie, surface: 22, no.rooms: 2, construction.year: 2005, floor: 1); each model distributes its prediction across the variables differently.]
[Figure: break-down plot for the GBM model shown next to Ceteris Paribus profiles of surface, construction.year, no.rooms, and floor for the Random Forest model.]
Variable selection
Feature engineering
Factor Merger
[Figure: Factor Merger for the district variable in the Random Forest model; Warsaw districts ordered by mean price and merged into groups of similar price levels, from Srodmiescie (highest) to Praga (lowest).]
Prediction explanations · What-If analysis · Concept drift detection
explain
Like it? Let us know! Find a bug? Fire an issue!
