2. Vision
Harness the relative strengths of humans and machine learning models.
[Diagram: Human + Machine Learning Models; image: http://blogs.teradata.com/]
3. Research objectives
Develop machine learning models inspired by how humans think that can…
7. Research objectives
Develop machine learning models inspired by how humans think that can…
1. Infer human team decisions from team planning conversation (infer decisions of humans)
2. Communication from machine to human: provide intuitive explanations (make sense to humans)
3. Communication from human to machine: incorporate feedback (interact with humans)
8. Road map
1. Infer human team decisions from team planning conversation (infer decisions of humans)
2. Communication from machine to human: provide intuitive explanations (make sense to humans)
3. Communication from human to machine: incorporate feedback (interact with humans)
10. Mirror the way humans think
• Humans' tactical decisions are based on exemplar-based reasoning (matching and prototyping) [Cohen 96, Newell 72]
• Skilled firefighters use recognition-primed decision making, in which a situation is matched to typical cases [Klein 89]
• Machines can better support people's decision-making by representing data in the same way
11. Case-based reasoning and interpretable models
Case-based reasoning
• Applied to various applications thanks to its intuitive power [Aamodt 94, Slade 91, Bekkerman 06]
Limitations
• Always requires labels (supervised)
• Does not scale to complex problems
• Does not leverage global patterns in the data
Interpretable models
• Decision trees [De`ath 00]
• Sparse linear classifiers [Tibshirani 96, Ustun 14]
• Prototype-based methods [Graf 09]
Limitations
• Sparsity is not enough [Freitas 14]
• Models are linear or supervised
12. Our approach: Bayesian Case Model (BCM)
• Bayesian generative models + case-based reasoning = Bayesian Case Model (BCM)
• Leverages the power of examples (prototypes) and subspaces (hot features) to explain machine learning results
• Explains complicated concepts using examples
[Kim, Rudin, Shah NIPS 2014]
14. Bayesian Case Model (BCM)
• A general framework for Bayesian case-based reasoning
• Joint inference on prototypes, subspaces, and cluster labels
[Diagram: Clusters A, B, C, each with its prototype, subspace, and cluster labels]
15. Explanations provided by Bayesian Case Model (BCM)
• Cluster A, prototype "Taco": subspace salsa, sour cream, avocado; remaining prototype ingredients salt, pepper, taco shell, lettuce, oil
• Cluster B, prototype "Basic crepe": subspace flour, egg; remaining ingredients water, salt, milk, butter
• Cluster C, prototype "Chocolate berry tart": subspace chocolate, strawberry; remaining ingredients pie crust, whipping cream, kirsch, almonds
16. Bayesian Case Model (BCM)
• A general framework for Bayesian case-based reasoning
• Joint inference on cluster labels, prototypes, and subspaces
• Prototype: the quintessential observation that best represents the cluster
• Subspace: the set of important features in characterizing clusters
• Two parts: 1. clustering; 2. learning the explanation
[Example: Cluster A is explained by the prototype "Taco" with subspace salsa, sour cream, avocado]
17. Bayesian Case Model (BCM): 1. Clustering part
• Admixture model for modeling the underlying distributions
• Cluster labels assign each feature of a data point to a cluster, e.g. mexican_crepe = [A, B, A] over Clusters A, B, C
• "It is a crepe, since it has flour and egg. It is inspired by Mexican food, because it has avocado, salsa and sour cream."
18. Bayesian Case Model (BCM): 1. Clustering part
• Admixture model for modeling the underlying distributions
• e.g. chocolate_crepe = [B, C, C] over Clusters A, B, C
• "It is a crepe, since it has flour and egg. It is a sweet crepe that is like a chocolate and berry dessert."
19. Bayesian Case Model (BCM): 1. Clustering part
• The cluster distribution of each data point, combined with supervised classification methods, can be used to evaluate clustering performance [1]
• The concentration hyperparameter controls how many different cluster labels appear within one data point
[1] D. Blei, A. Ng, M. Jordan 2003
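The clustering part above follows a standard admixture (LDA-style) generative process. A minimal sketch, where the Dirichlet parameter `alpha` plays the role of the concentration hyperparameter on the slide; the toy probabilities and sizes are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_admixture(n_points, n_features, n_clusters, alpha, cluster_feature_probs):
    """Sample data from a simple admixture model, as in BCM's clustering part.

    alpha is the concentration parameter: small alpha means each data point
    draws most features from a single cluster; large alpha mixes cluster
    labels within one point."""
    labels, data = [], []
    for _ in range(n_points):
        pi = rng.dirichlet([alpha] * n_clusters)            # per-point cluster distribution
        z = rng.choice(n_clusters, size=n_features, p=pi)   # per-feature cluster labels
        x = np.array([rng.choice(len(cluster_feature_probs[zj]),
                                 p=cluster_feature_probs[zj]) for zj in z])
        labels.append(z)
        data.append(x)
    return np.array(data), np.array(labels)

# Two toy clusters over 3 feature values; alpha=0.1 keeps points nearly pure.
probs = [np.array([0.8, 0.1, 0.1]), np.array([0.1, 0.1, 0.8])]
X, Z = generate_admixture(n_points=5, n_features=4, n_clusters=2, alpha=0.1,
                          cluster_feature_probs=probs)
```

Raising `alpha` makes the sampled cluster-label vectors look like the mexican_crepe = [A, B, A] example, with several clusters inside one data point.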
20. Bayesian Case Model (BCM): 2. Learning explanation part
• Each cluster is characterized by a prototype and subspaces
• Subspaces are binary variables: 1 for important features
21. Bayesian Case Model (BCM): 2. Learning explanation part
• Prototype: the quintessential observation that best represents the cluster
• A prototype is an actual data point that exists in the dataset
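Because a prototype must be an actual data point, it can be pictured as a selection among cluster members. A minimal sketch that scores members by feature-wise agreement; BCM instead samples the prototype during inference, so this scoring rule is purely illustrative:

```python
import numpy as np

def pick_prototype(cluster_points):
    """Return the index of the member that best represents the cluster,
    scored by mean feature-wise agreement with all members (0/1 match)."""
    X = np.asarray(cluster_points)
    # agreement[i] = average fraction of features on which point i matches the others
    agreement = (X[:, None, :] == X[None, :, :]).mean(axis=(1, 2))
    return int(np.argmax(agreement))

cluster = [[1, 0, 2], [1, 0, 1], [1, 0, 2], [0, 2, 2]]
proto_idx = pick_prototype(cluster)  # the most representative actual data point
```

Note the prototype is always one of the inputs, never an averaged "centroid" that may not exist in the dataset.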
22. Bayesian Case Model (BCM): 2. Learning explanation part
• Subspace: the set of important features in characterizing clusters
• Subspaces are binary variables: 1 for important features
23. Bayesian Case Model (BCM): 2. Learning explanation part
• Subspace: the set of important features in characterizing clusters
• Any similarity measure can be used; for example, with a 0/1 loss, feature j of cluster s is an important feature (i.e., in the subspace) when the value of feature j is identical to the value of the prototype of cluster s
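The 0/1-loss criterion above can be sketched directly. The threshold below is an illustrative assumption; BCM samples the binary subspace indicators during inference rather than thresholding:

```python
import numpy as np

def subspace_indicator(cluster_points, prototype, threshold=0.9):
    """Binary subspace vector omega: 1 for important features.

    A feature j is marked important when cluster members overwhelmingly agree
    with the prototype's value on j (a 0/1 loss). Any similarity measure could
    replace the equality check."""
    X = np.asarray(cluster_points)
    agreement = (X == np.asarray(prototype)).mean(axis=0)  # per-feature match rate
    return (agreement >= threshold).astype(int)

cluster = [[1, 0, 2], [1, 0, 1], [1, 0, 2], [1, 2, 2]]
omega = subspace_indicator(cluster, prototype=[1, 0, 2])
# feature 0 matches the prototype in every member; features 1 and 2 only in 3 of 4
```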
24. Results
Challenges for interpretable models:
1. Do the learned prototypes and subspaces make sense?
2. Are we sacrificing performance for interpretability?
3. Do the learned prototypes and subspaces help humans' understanding?
25. BCM on recipe data
1. Do the learned prototypes and subspaces make sense?
• Unsupervised clustering on a subset of recipe data
• Data from the computer cooking contest: liris/cnrs.fr/ccc/ccc2014
26. BCM on digit data
1. Do the learned prototypes and subspaces make sense?
• Data: http://www.cs.nyu.edu/~roweis/data.html
27. BCM on digit data
1. Do the learned prototypes and subspaces make sense?
[Figure: learned cluster D across Gibbs sampling iterations]
28. Maintain accuracy
2. Are we sacrificing anything for interpretability?
[Figure: sensitivity analysis of BCM accuracy on the handwritten-digit dataset and the 20 Newsgroups dataset]
29. Joint inference on prototypes, subspaces, and cluster labels is the key
2. Are we sacrificing anything for interpretability?
[Figure: level sets of the posterior distribution. Given one solution that clusters the data well and another solution that clusters the data equally well but has better interpretability, BCM gives the higher score to the second.]
30. Collapsed Gibbs sampling for inference
• Observed to converge quickly in admixture models
• Integrating out the per-point cluster distributions and per-cluster feature distributions for efficient inference
[Kim, Rudin, Shah NIPS 2014]
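A minimal sketch of collapsed Gibbs sampling for the admixture clustering part, with the mixture and feature distributions integrated out so only the per-feature cluster labels are resampled. BCM's full sweep additionally resamples prototypes and subspaces; the hyperparameters and update here are the generic LDA-style ones, shown for illustration:

```python
import numpy as np

def collapsed_gibbs(X, n_clusters, alpha=1.0, beta=1.0, n_iters=50, seed=0):
    """Collapsed Gibbs sampler for an admixture over discrete features.

    Only the count tables are tracked; the per-point cluster distributions
    and per-cluster feature distributions are integrated out analytically."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    n, f = X.shape
    n_vals = X.max() + 1
    z = rng.integers(n_clusters, size=(n, f))               # per-feature cluster labels
    nz = np.zeros((n, n_clusters))                          # point-cluster counts
    nv = np.zeros((n_clusters, n_vals))                     # cluster-value counts
    ns = np.zeros(n_clusters)                               # cluster totals
    for i in range(n):
        for j in range(f):
            nz[i, z[i, j]] += 1; nv[z[i, j], X[i, j]] += 1; ns[z[i, j]] += 1
    for _ in range(n_iters):
        for i in range(n):
            for j in range(f):
                s, v = z[i, j], X[i, j]
                nz[i, s] -= 1; nv[s, v] -= 1; ns[s] -= 1    # remove current label
                p = (nz[i] + alpha) * (nv[:, v] + beta) / (ns + n_vals * beta)
                s = rng.choice(n_clusters, p=p / p.sum())   # resample from the conditional
                z[i, j] = s
                nz[i, s] += 1; nv[s, v] += 1; ns[s] += 1
    return z

z = collapsed_gibbs([[0, 0, 0], [0, 0, 0], [2, 2, 2], [2, 2, 2]], n_clusters=2)
```

Integrating out the continuous parameters is what makes each step a cheap categorical draw from count tables, which is why this family of samplers converges quickly on admixture models.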
31. Does the model make sense to humans?
Objective measure of human understanding: accuracy of the human classifier
• The participant's task is to assign the ingredients of a specific dish (a new data point to be classified) to a cluster
• Each cluster is explained using either BCM or LDA
32. Does the model make sense to humans?
Objective measure of human understanding: accuracy of the human classifier
• 384 classification questions asked of 24 people
• Statistically significantly better performance with BCM (85.9% vs. 71.3%)
• Clusters were explained using either 1. BCM (ingredients of the prototype recipe) or 2. LDA (representative ingredients of each cluster)
[Kim, Rudin, Shah NIPS 2014]
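The reported gap (85.9% vs. 71.3%) can be checked for significance with a standard two-proportion z-test. The per-condition counts below are hypothetical placeholders chosen only to match the stated accuracies, since the slide reports 384 questions total but not the split; the slide does not say which test was used:

```python
from math import sqrt, erf

def two_proportion_z_pvalue(correct1, n1, correct2, n2):
    """Two-sided p-value for a two-proportion z-test (normal approximation)."""
    p1, p2 = correct1 / n1, correct2 / n2
    pooled = (correct1 + correct2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # standard normal CDF via the error function; two-sided tail probability
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# HYPOTHETICAL counts: 165/192 = 85.9% and 137/192 = 71.3%; substitute the
# real per-condition counts to reproduce the analysis.
p_value = two_proportion_z_pvalue(165, 192, 137, 192)
```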
33. Road map
1. Infer human team decisions from team planning conversation (infer decisions of humans)
2. Communication from machine to human: provide intuitive explanations (make sense to humans)
3. Communication from human to machine: incorporate feedback (interact with humans)
37. Related work on interactive machine learning
• Interact via multiple model parameter settings [Patel 10, Amershi 15]
• Design smart interfaces [Amershi 11] and visualizations [Chaney 12, Gou 03]
• Interact via a simplified medium of interaction [Kapoor 10, Ware 01]
• Our medium of interaction: prototypes and subspaces!
38. Interactive BCM (iBCM)
[Graphical models: BCM vs. iBCM. Double-circled nodes represent interacted latent variables: nodes that receive information both from user feedback and from the data points.]
40. Interactive BCM (iBCM): internal mechanism
• Key: balance between what the data indicates and what makes most sense to the user
• Our approach: decompose the Gibbs sampling steps to
  1) adjust feedback propagation depending on the user's confidence
  2) accelerate inference by rearranging latent variables
• Cycle: 1. listen to users; 2. propagate user feedback to accelerate inference; 3. listen to data
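One way to picture the balance between what the data indicates and what makes sense to the user is a confidence-weighted blend inside a single sampling step. The geometric-mixture rule below is an illustrative assumption, not the published iBCM update:

```python
import numpy as np

def feedback_weighted_sample(data_probs, user_probs, confidence, rng):
    """One sketch of a decomposed Gibbs step for an interacted latent variable.

    Blends the data-driven conditional with the user's feedback distribution,
    weighted by the user's confidence in [0, 1]: confidence=0 ignores feedback
    (plain BCM), confidence=1 follows the user exactly."""
    data_probs = np.asarray(data_probs, dtype=float)
    user_probs = np.asarray(user_probs, dtype=float)
    blended = data_probs ** (1 - confidence) * user_probs ** confidence
    blended /= blended.sum()                     # renormalize the mixture
    return rng.choice(len(blended), p=blended), blended

rng = np.random.default_rng(1)
# The data prefers cluster 0, but a confident user prefers cluster 1.
idx, probs = feedback_weighted_sample([0.7, 0.3], [0.05, 0.95], confidence=0.9, rng=rng)
```

With high confidence the blended distribution tilts toward the user's choice while still letting strong evidence from the data push back, which matches the "listen to users, then listen to data" cycle above.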
41. User's workflow with iBCM (abstract domain)
• Click to change a feature's value
• Click to promote any item to be the prototype
42. Experiment procedure
1. Subjects are asked how they want to group items
2. Subjects view results from BCM (essentially one of the optimal clusterings)
3. Subjects indicate how well the results match their preferred clustering
4. Subjects interact with iBCM
5. Subjects indicate how well the results match what they want
24 participants, 192 questions
43. Experiment results
• 24 participants, 192 questions
• Participants agreed more strongly that the final clusters matched their preferences compared to the initial clusters (Wilcoxon signed rank test)
44. iBCM for introductory programming education
• Why education?
  • Teachers' current workflow for creating a grading rubric: randomly pick 4-5 assignments and do "hodgepodge grading" [Cross 99]
  • Understanding the variation across submissions is important for providing appropriate, tailored feedback to students [Basu 13, Huang 13]
• What are the challenges?
  • Extracting the right features: OverCode [Glassman 15]
45. iBCM + OverCode system
• Submissions from MIT introductory Python classes
47. iBCM experiment with domain experts
• Task: explore the full spectrum of students' submissions and write down a "discovery list" for a recitation
• Conditions compared: a baseline interface ("Click here to get a new grouping") vs. iBCM
49. Experiment with domain experts: results
• 48 problems explored by 12 subjects who had previously taught an introductory Python class
• Compared to BCM, participants agreed more strongly (p < 0.001, Wilcoxon signed rank test) that with iBCM they:
  • were more satisfied
  • better explored the full spectrum of students' submissions
  • better identified important features to expand the discovery list
  • found the important features and prototypes useful
50. Experiment with domain experts: results (continued)
• Participant quotes:
  • "[iBCM enabled me to] go in depth as to how students could do"
  • "[iBCM] is useful with large datasets where brute-force would not be practical."
51. Summary
• Inspiration: how humans make decisions [Kim, Chacha, Shah AAAI 13] [Kim, Chacha, Shah JAIR 15]
• Communication from machine to human: provide intuitive explanations (make sense to humans)
  • Approach: case-based Bayesian model
  • Results: provided intuitive explanations while maintaining performance [Kim, Rudin, Shah NIPS 2014]
• Communication from human to machine: incorporate feedback (interact with humans)
  • Approach: enable interaction by decomposing the sampling inference steps
  • Results: implemented and validated the approach in the education domain [Kim, Glassman, Johnson, Shah submitted*] [Kim, Patel, Rostamizadeh, Shah AAAI 2015]
52. Next steps
• Interpretability for data exploration: visualization
• Domain-specific interpretability: learning features that distinguish clusters
• Interactive machine learning for debugging models or hyperparameter exploration
[Example of a misclassified data point: Doc id #24, predicted "politics", true label "medicine"]
[Kim, Patel, Rostamizadeh, Shah AAAI 2015] [Kim, Doshi-Velez, Shah NIPS 2015]
53. Next steps at AI2
• Extend interpretability to initially uninterpretable features (neural nets)
[Example: a 4th-grade science exam question]
54. Q&A
[Summary recap: communication from machine to human (case-based Bayesian model, intuitive explanations while maintaining performance) and from human to machine (interaction by decomposing the sampling inference steps, validated in the education domain)]
[Kim, Chacha, Shah AAAI 13] [Kim, Chacha, Shah JAIR 15] [Kim, Rudin, Shah NIPS 2014] [Kim, Glassman, Johnson, Shah submitted*] [Kim, Patel, Rostamizadeh, Shah AAAI 2015] [Kim, Doshi-Velez, Shah NIPS 2015]
AI2 is hiring research interns any time of the year. Shoot me an email if interested! beenk@allenai.org