Visual Exploration of Machine Learning Results using Data Cube Analysis
HILDA 2016
Minsuk (Brian) Kahng, Dezhi (Andy) Fang, Polo Chau
Workshop on Human-In-the-Loop Data Analytics
Co-located with SIGMOD 2016 | June 26, 2016
Machine learning is becoming complex
Long Pipeline
e.g., Feature extraction, model selection
Complex learning algorithms
e.g., Boosted models, Deep learning
[Figure: ML pipeline. Databases feed feature vectors (with labels) into a model, which produces output scores summarized by an evaluation metric, e.g., 0.74]
ML often viewed as “black box”
People often select models based only on
evaluation metrics (e.g., accuracy), without
a deeper understanding of the models
[Figure: two models (A and B) trained on the same raw data table, compared only by their accuracy scores (0.85 vs. 0.74)]
Challenge: Interpretation
Users want to understand how a model works
and why/when it performs better than others
If the model performs well, we can trust it
If not, we know how to “debug” it
AI is changing the technology behind Google searches, Wired, 2016
Google search team was reluctant to adopt complex
algorithms because “it’s hard to explain and
ascertain why a particular search result ranks more
highly than another result for a given query.”
Existing Approaches
“Black-box” evaluation: a model maps input features and labels to a single accuracy score (e.g., 0.75).
Hard to discover contributing causes.
Instance-level inspection: explains how an individual instance is classified, via textual explanations [Kulesza et al., 2011] or visualization [Amershi et al., 2015].
Fine-grained & may not scale to many instances.
Our Approach:
Slicing ML instances into subsets
Instead of a single overall accuracy (0.75), slice the instances into subsets and compute the evaluation metric for each, e.g., by age:
age = “14-25”: 0.85
age = “25-40”: 0.74
age = “>65”: 0.62
The model works particularly well for teenage users; you may want to see why another group performs badly.
Unlike “black-box” evaluation, per-subset metrics help discover contributing causes; unlike instance-level inspection, they scale to many instances.
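A minimal sketch of this slicing idea in Python with pandas, using toy data; the column names (age_group, label, predicted) are hypothetical placeholders, not from the paper:

```python
import pandas as pd

# Toy data; column names are hypothetical.
df = pd.DataFrame({
    "age_group": ["14-25", "14-25", "25-40", "25-40", ">65"],
    "label":     [1, 0, 1, 1, 0],
    "predicted": [1, 0, 0, 1, 1],
})

# Overall accuracy: a single number that can hide subset behavior.
overall = (df["label"] == df["predicted"]).mean()

# Per-subset accuracy: slice instances by a raw attribute and
# compute the same evaluation metric within each slice.
per_subset = (
    df.assign(correct=df["label"] == df["predicted"])
      .groupby("age_group")["correct"]
      .mean()
)

print(overall)
print(per_subset)
```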
Our Approach:
Specifying subsets with MLCube
A data cube provides a natural framework for specifying subsets:
Dimension attributes: raw attributes and features, e.g., user_country, user_age, user_gender
Measure attributes: evaluation metrics, e.g., accuracy
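As a rough illustration, a pandas pivot table with margins can mimic cube-style roll-ups of the accuracy measure over two dimension attributes; all names below are hypothetical:

```python
import pandas as pd

# Toy per-instance results; "correct" is 1 if the prediction
# matched the label. All column names are hypothetical.
df = pd.DataFrame({
    "user_country": ["US", "US", "KR", "KR", "US"],
    "user_gender":  ["F", "M", "F", "M", "F"],
    "correct":      [1, 0, 1, 1, 0],
})

# A pivot table with margins approximates cube-style roll-ups:
# mean of "correct" = accuracy per (country, gender) cell, plus
# "All" rows/columns for the coarser cuboids.
cube = pd.pivot_table(
    df,
    values="correct",
    index="user_country",
    columns="user_gender",
    aggfunc="mean",
    margins=True,
)
print(cube)
```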
Flexible subset definition
Subsets can be specified over
raw attributes, features, labels & output scores.
Our approach is aware of the ML pipeline and
its intermediate data.
σ_condition(RawTables ⋈ Features ⋈ Labels ⋈ Scores)
e.g., title LIKE ‘%car%’ AND title_len > 10 AND score > 0.7
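A minimal sketch of this selection over the joined pipeline tables, assuming hypothetical pandas DataFrames keyed by a shared instance id:

```python
import pandas as pd

# Hypothetical pipeline tables, keyed by a shared instance id.
raw      = pd.DataFrame({"id": [1, 2], "title": ["used car for sale", "new phone"]})
features = pd.DataFrame({"id": [1, 2], "title_len": [17, 9]})
labels   = pd.DataFrame({"id": [1, 2], "label": [1, 0]})
scores   = pd.DataFrame({"id": [1, 2], "score": [0.91, 0.35]})

# RawTables ⋈ Features ⋈ Labels ⋈ Scores
joined = (raw.merge(features, on="id")
             .merge(labels, on="id")
             .merge(scores, on="id"))

# σ_condition: title LIKE '%car%' AND title_len > 10 AND score > 0.7
subset = joined[
    joined["title"].str.contains("car")
    & (joined["title_len"] > 10)
    & (joined["score"] > 0.7)
]
print(subset)
```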
Visual and Interactive
Generate a visual overview of the data
Interactively spot and explore interesting patterns
MLCube Explorer:
Interactive Visualization for Exploring ML Results by Subsets
Challenge: Large number of possible subsets
Task: Building ad click prediction models
Dataset: Ad Click Log from KDD Cup 2012
Example use case
Model B performs much better than Model A for the subset “user_age.. = 0”.
Drilling down into that subset reveals interesting patterns between accuracy and the tfidf_sim_query_title feature.
Future work
Rank and suggest interesting subsets
e.g., Subsets with the largest accuracy differences between models (see the sketch below)
Interactive materialization
e.g., By predicting the next possible user steps [Kamat et al., 2014]
User studies to evaluate usability and utility
e.g., How our tool helps engineers streamline their workflow
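One way the subset-ranking idea could look, as a hedged sketch: compute per-subset accuracy for two models and sort subsets by the gap; all column names are hypothetical:

```python
import pandas as pd

# Per-instance correctness for two models; column names hypothetical.
df = pd.DataFrame({
    "user_age":  ["0", "0", "1", "1", "2", "2"],
    "correct_a": [0, 0, 1, 1, 1, 0],   # did model A predict correctly?
    "correct_b": [1, 1, 1, 0, 1, 0],   # did model B predict correctly?
})

# Per-subset accuracy for each model, then rank subsets by the gap.
gaps = (
    df.groupby("user_age")[["correct_a", "correct_b"]].mean()
      .assign(gap=lambda t: (t["correct_b"] - t["correct_a"]).abs())
      .sort_values("gap", ascending=False)
)
print(gaps)  # subsets with the largest accuracy differences come first
```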
Thanks!
MLCube for analyzing ML results by subsets
MLCube Explorer spots interesting patterns
Minsuk (Brian) Kahng
CS PhD student at Georgia Tech
http://minsuk.com
We thank Thomas Dudziak, Hussein Mehanna, Sofus Macskassy, Liang Xiong, and Oliver Zeldin for their advice and feedback.
This work is supported by the NSF Graduate Research Fellowship Program.