From Data to Decisions, A Mixed Path of Data Visualization and Machine Learning
Qianwen Wang
[Figure: matrix view. Rows are statistical comparisons between model results; columns are the hypotheses H1 to H12]
Model results: R(M, D) = 0.7405, R(M, D+) = 0.5232, R(M+, D) = 0.2961, R(M+, D+) = 0.8705
Statistical comparisons (p-value, threshold 0.05):
R(M, D+) < R(M, D): p = 0.030
R(M+, D) < R(M, D): p = 0.000
R(M+, D+) > R(M, D): p = 0.002
R(M+, D+) > R(M, D+): p = 0.006
R(M+, D) < R(M, D+): p = 0.048
R(M+, D+) > R(M+, D): p = 0.000
Advisor: Huamin Qu Advisor: Nils Gehlenborg
2020
2017 2019
2015
Machine Learning
Data
Visualization
Human
Computer
Interaction
Machine Learning
and
Data Visualization,
What are we talking about?
Machine Learning
• An ability to learn from data, extract patterns, and make decisions with minimum human intervention
Data Visualization
• An accessible way for humans to interpret data, identify patterns, and make data-driven decisions
Data → Machine Learning / Data Visualization → Decisions
http://querytreeapp.com/blog/make-sense-with-data-visualization/
Artificial intelligence is still human intelligence
Machine Learning: Problem Understanding, Data Collection, Model Development, Model Evaluation, Model Application
Data Visualization (data, visualization, user): Data and Specification feed the Visualization, which produces an image; Perception of the image increases Knowledge; Exploration modifies the Specification
• VIS4ML
• ML4VIS
• A better collaboration
between ML and VIS
Human intervention is needed at each step
How can data visualization facilitate the process?
How to choose a suitable model?
Overwhelmed by the Variety
Deep Neural Network (DNN)
DNN Genealogy
Visual Genealogy of Deep Neural Networks
Qianwen Wang1, Jun Yuan2, Shuxin Chen2, Hang Su2, Huamin Qu1, and Shixia Liu2
Tsinghua University
Visualization Module: Architecture, Evolution, Performance
http://dnn.hkustvis.org/
Case: Investigate Evolution Patterns
How to combine a skip connection with the main branch?
Gate
Addition (+)
Concatenation (||)
A mixture (+ ||)
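The options for combining a skip connection with the main branch (gate, addition, concatenation) can be illustrated on plain feature vectors. This is a minimal sketch, not code from the DNN Genealogy work: the function names are hypothetical, and the gate uses one common form (a sigmoid weight that mixes the two branches), which is an assumption.

```python
import math


def merge_add(main, skip):
    """Addition: element-wise sum; both vectors need the same length."""
    return [m + s for m, s in zip(main, skip)]


def merge_concat(main, skip):
    """Concatenation: features are stacked, so the output is longer."""
    return list(main) + list(skip)


def merge_gate(main, skip, gate_logits):
    """Gate (assumed form): a sigmoid gate g in (0, 1) mixes the branches."""
    gates = [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]
    return [g * m + (1.0 - g) * s for g, m, s in zip(gates, main, skip)]


# Hypothetical activations of the main branch and the skip connection
main_branch = [1.0, 2.0, 3.0]
skip_branch = [0.5, 0.5, 0.5]

added = merge_add(main_branch, skip_branch)
concatenated = merge_concat(main_branch, skip_branch)
gated = merge_gate(main_branch, skip_branch, [0.0, 0.0, 0.0])  # logits 0 give g = 0.5
```

Addition keeps the feature size fixed, concatenation grows it, and the gate lets the network learn how much of each branch to pass on.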
ATMSeer: Increasing
Transparency and Controllability
in Automated Machine Learning
Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu,
Micah J. Smith, Kalyan Veeramachaneni, Huamin Qu
Developing ML Models
A model for my task:
Support Vector Machine (SVM)? Neural Network (MLP)? Random Forest? K Nearest Neighbor (KNN)? Linear Regression? ...
learning rate = ? # layers = ? batch size = ? # neurons = ? hidden layer = ? kernel function = ? max depth = ? activation = ? leaf size = ? min samples leaf = ? min samples split = ? ...
Automated
Machine Learning
Make it automated!
controllability, transparency, …
Overview
Algorithm Level
HyperPartition Level
HyperParameter Level
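The three levels in the overview (algorithm, hyperpartition, hyperparameter) can be read as a nested search space. The sketch below is an illustration under stated assumptions, not the ATMSeer implementation: a hyperpartition is assumed to be a fixed choice of a categorical option, the search space and `toy_score` are hypothetical, and plain random search stands in for whatever strategy the AutoML system actually uses.

```python
import random

# Hypothetical search space: algorithm -> hyperpartition -> tunable ranges
SEARCH_SPACE = {
    "svm": {
        ("kernel", "rbf"): {"C": (0.01, 100.0), "gamma": (0.001, 1.0)},
        ("kernel", "linear"): {"C": (0.01, 100.0)},
    },
    "knn": {
        ("weights", "uniform"): {"n_neighbors": (1, 20)},
        ("weights", "distance"): {"n_neighbors": (1, 20)},
    },
}


def sample_configuration(rng):
    """Pick an algorithm, then a hyperpartition, then hyperparameter values."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    hyperpartition = rng.choice(sorted(SEARCH_SPACE[algorithm]))
    hyperparameters = {}
    for name, (low, high) in sorted(SEARCH_SPACE[algorithm][hyperpartition].items()):
        if isinstance(low, int):
            hyperparameters[name] = rng.randint(low, high)  # integer range
        else:
            hyperparameters[name] = rng.uniform(low, high)  # continuous range
    return algorithm, hyperpartition, hyperparameters


def toy_score(config):
    """Hypothetical stand-in for training and validating a model."""
    _, _, params = config
    return -abs(params.get("n_neighbors", 50) - 5)


def random_search(evaluate, n_trials, seed=0):
    """Sample n_trials configurations and return the best one plus the full history."""
    rng = random.Random(seed)
    trials = [sample_configuration(rng) for _ in range(n_trials)]
    return max(trials, key=evaluate), trials


best, trials = random_search(toy_score, n_trials=20, seed=1)
```

Keeping the full `trials` history, grouped by the three levels, is what a visual tool would need to show what the automated search tried.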
Can we conduct behavioral testing of ML
models that goes beyond accuracy?
How to examine Discrimination?
A College Admission Example
Legend: accepted females, accepted males, rejected
Overall: 50% > 42%. Seems unfair?
By score: low score 33.3% > 26.7%; high score 75% > 65%
By score and department (CS, EE): 20% = 20%, 40% = 40%, 60% = 60%, 80% = 80%
Two individuals who are similar with respect
to a task are treated equally
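The admission example can be reproduced in spirit with a small calculation: acceptance rates that differ overall can become equal once applicants are compared within the same score level and department. The counts below are hypothetical and chosen only for illustration; they are not the numbers behind the slide.

```python
from fractions import Fraction

# Hypothetical counts: (score, department) -> group -> (accepted, total)
APPLICANTS = {
    ("low", "CS"): {"female": (2, 10), "male": (4, 20)},
    ("low", "EE"): {"female": (4, 10), "male": (8, 20)},
    ("high", "CS"): {"female": (12, 20), "male": (6, 10)},
    ("high", "EE"): {"female": (16, 20), "male": (8, 10)},
}


def acceptance_rate(counts):
    """Accepted applicants divided by all applicants over a list of (accepted, total)."""
    accepted = sum(a for a, _ in counts)
    total = sum(t for _, t in counts)
    return Fraction(accepted, total)


def overall_rate(group):
    """Rate over all strata, ignoring score and department."""
    return acceptance_rate([stratum[group] for stratum in APPLICANTS.values()])


def rate_by_stratum(group):
    """Rate within each (score, department) stratum."""
    return {key: acceptance_rate([stratum[group]]) for key, stratum in APPLICANTS.items()}


female_overall = overall_rate("female")  # 34 of 60
male_overall = overall_rate("male")      # 26 of 60
same_within_strata = rate_by_stratum("female") == rate_by_stratum("male")
```

In this toy data the overall gap comes entirely from how the two groups are distributed over strata, which is why similar individuals have to be compared with each other.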
Visual Analysis of Discrimination in Machine Learning
Qianwen Wang1, Zhenhua Xu1, Huamin Qu1, Shixia Liu2, Zhutian Chen1, Yong Wang1
Tsinghua University
Discriminatory Itemset
[Figure: example discriminatory itemsets; the attribute text is not recoverable from the extraction]
Challenges in Analysis
Long and Complex Definition
Intertwining Relationships
Long and Complex Definition: Attribute Matrix (Itemset, Attribute), e.g. 23 < raised hands < 50
Intertwining Relationships: RippleSet
Designing RippleSet
Legend: an item; items ∈ set A
Regions and their labels:
(A∩B∩C) \ (D∪E): ABC
(A∩B∩E) \ (C∪D): ABE
(B∩C∩E) \ (A∪D): BCE
(A∩B∩C∩D) \ E: ABCD
(C∩D) \ (A∪B∪E): CD
Items belonging to the same set are put together
Weighted DAG
Circle packing algorithm
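The regions used in the RippleSet design, such as the items that lie in A, B, and C but in neither D nor E, can be computed by grouping items by the exact sets they belong to. This is a minimal sketch with hypothetical sets and a hypothetical function name; it covers only the grouping step, not the weighted DAG or the circle packing layout.

```python
from collections import defaultdict


def exclusive_regions(sets):
    """Map a membership label such as 'ABC' to the items lying in exactly those sets."""
    membership = defaultdict(set)
    for name, items in sets.items():
        for item in items:
            membership[item].add(name)
    regions = defaultdict(list)
    for item, names in membership.items():
        regions["".join(sorted(names))].append(item)
    return {label: sorted(items) for label, items in regions.items()}


# Hypothetical itemsets A to E over five items
example_sets = {
    "A": {1, 2, 4},
    "B": {1, 2, 3, 4},
    "C": {1, 3, 4, 5},
    "D": {4, 5},
    "E": {2, 3},
}
regions = exclusive_regions(example_sets)
```

Each label corresponds to one region of the diagram, so items sharing a label are the ones that would be placed together.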
Can we conduct behavioral testing of ML
models that goes beyond accuracy?
What concepts has the model learned? Are the learned concepts always useful?
Hypothesize about the effect of the common orientation of an object
Hypothesize about the effect of the surrounding environment of an object
Black-box Analysis
input → model → prediction
Examine hypotheses about how perturbations to inputs affect the ML model outputs
Prospector (Krause et al. 2016)
What-If Tool (Wexler et al. 2019)
Gamut (Hohman et al. 2019)
Not statistically meaningful:
• Only observations on individual predictions
White-box Analysis
What has a neuron learned?
Deconvnet (Zeiler and Fergus 2013)
Guided backpropagation (Springenberg et al. 2013)
Not statistically meaningful:
• The depicted patterns largely provide a hunch rather than solid conclusions
Not efficient:
• It is impossible to examine all neurons
Can we test concept-based hypotheses in an efficient and statistically meaningful way?
HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models
Qianwen Wang1, William Alexander2, Huamin Qu1, Min Chen2, Jack Pegg2
Concept-based Testing
2 pairs of datasets: D (with noise) and D+ (with extra data that contains the testing concept)
ML Training: 2 ML models, M and M+
ML Testing: 4 sets of results, R(M, D), R(M, D+), R(M+, D), R(M+, D+)
Statistical Comparison
Many uncontrolled variables... µ(R(a)) = 0.878 > µ(R(b)) = 0.876
R(a) is, compared with R(b):
significantly higher than
significantly lower than
insignificantly lower or higher than
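A small difference in means, such as 0.878 versus 0.876, says little by itself, so two sets of results are compared with a significance test. The sketch below uses a one-sided permutation test as a stand-in; the slides do not say which test HypoML applies, so the choice of test, the function names, and the sample accuracies are all assumptions.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """One-sided p-value for mean(a) > mean(b) under random relabelling."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_permutations + 1)


def compare(a, b, threshold=0.05):
    """Return one of the three outcomes used in the statistical comparison."""
    if permutation_p_value(a, b) < threshold:
        return "significantly higher than"
    if permutation_p_value(b, a) < threshold:
        return "significantly lower than"
    return "insignificantly lower or higher than"


# Hypothetical accuracies from repeated runs
high = [0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98]
low = [0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68]
```

For example, `compare(high, low)` reports a significant difference, while comparing a result set with itself does not.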
Top-down workflow
Model Results: R(M+,D+), R(M+,D), R(M,D+), R(M,D)
Example values: 0.8133, 0.8347, 0.8365, 0.8356
Statistical Comparison: p: 0.446, p: 0.098, p: 0.256, p: 0.377, p: 0.061, p: 0.079
Hypotheses
H1. The concept is useful to M+ and would be useful to M
H2. The concept is harmful to M+ and would be harmful to M
H3. M has learned the concept ξ adequately
H4. M+ has learned the concept ξ adequately
H5. The extra information in D+ has a positive effect on M
H6. The extra information in D+ has a negative effect on M
H7. The extra information in D+ has a positive effect on M+
H8. The extra information in D+ has a negative effect on M+
H9. Learning with Dm+ affects the M part of M+ positively
H10. Learning with Dm+ affects the M part of M+ negatively
H11. Learning with Dm+ affects the extra part of M+ positively
H12. Learning with Dm+ affects the extra part of M+ negatively
Visual Analysis of Hypotheses
[Figure: matrix view. Rows are statistical comparisons between model results; columns are the hypotheses H1 to H12]
Model results: R(M, D) = 0.8757, R(M, D+) = 0.6471, R(M+, D) = 0.6092, R(M+, D+) = 0.9188
Statistical comparisons (p-value, threshold 0.05):
R(M, D+) < R(M, D): p = 0.032
R(M+, D) < R(M, D): p = 0.002
R(M+, D+) > R(M, D): p = 0.002
R(M+, D+) > R(M, D+): p = 0.015
R(M+, D) < R(M, D+): p = 0.405
R(M+, D+) > R(M+, D): p = 0.002
A hypothesis is Supported, Unproven, or Rejected based on the analyses in the row
The analysis in a row rejects, supports, unproves, is conditional on, or is unrelated to the hypothesis in a column
The difference is statistically significant or insignificant
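The significance marking in the matrix view follows from comparing each p-value with the threshold of 0.05. As a minimal sketch, the six comparisons and p-values below are the ones shown on the "Visual Analysis of Hypotheses" slide; the function and variable names are hypothetical.

```python
THRESHOLD = 0.05

# (comparison, p-value) pairs as shown on the slide
COMPARISONS = [
    ("R(M, D+) < R(M, D)", 0.032),
    ("R(M+, D) < R(M, D)", 0.002),
    ("R(M+, D+) > R(M, D)", 0.002),
    ("R(M+, D+) > R(M, D+)", 0.015),
    ("R(M+, D) < R(M, D+)", 0.405),
    ("R(M+, D+) > R(M+, D)", 0.002),
]


def label_comparisons(comparisons, threshold=THRESHOLD):
    """Mark each comparison as significant when its p-value is below the threshold."""
    return {
        name: "significant" if p_value < threshold else "insignificant"
        for name, p_value in comparisons
    }


labels = label_comparisons(COMPARISONS)
insignificant = [name for name, status in labels.items() if status == "insignificant"]
```

Only the comparison between R(M+, D) and R(M, D+), with p = 0.405, fails the threshold, so any hypothesis that depends on it cannot be supported by that row.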
Testing Concept: Color Space
RGB, CIELAB, HSV, HSL, CMYK, YCrC
How does the concept Color Space influence the ML model?
Color Space: Experiment Design
D: RGB + noise, for M
D+: RGB + HSV, for M+
How to merge
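The experiment design pairs RGB input with either HSV channels (the tested concept) or noise. The sketch below builds both variants for a few pixels with Python's standard `colorsys` module. Reading D as RGB plus noise and D+ as RGB plus HSV is an interpretation of the slide, and the helper names and pixel values are hypothetical.

```python
import colorsys
import random


def make_d_plus(rgb_pixels):
    """D+: each RGB pixel extended with its HSV representation."""
    return [rgb + colorsys.rgb_to_hsv(*rgb) for rgb in rgb_pixels]


def make_d(rgb_pixels, seed=0):
    """D: each RGB pixel extended with three noise channels of the same shape."""
    rng = random.Random(seed)
    return [rgb + tuple(rng.random() for _ in range(3)) for rgb in rgb_pixels]


# Hypothetical pixels with channels in [0, 1]
pixels = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.5)]
d_plus = make_d_plus(pixels)
d = make_d(pixels)
```

Both variants have six channels per pixel, so the two models can share an architecture and differ only in whether the extra channels carry the concept.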
[Figure: two-branch network. Main branch: Conv2D, Max Pooling, Conv2D, Max Pooling, Flatten, Dropout, Dense. A second branch of Conv2D and Max Pooling layers is merged into the main branch]
Merge position: after maxpool 1, after maxpool 2, or after Flatten
Merge method: add or max
Color Space: Results
Merging at maxpool 1: the information from another color space, HSV, contributes to the prediction of this model
Merging at maxpool 1 vs. maxpool 2: the hypothesis testing results change when we merge at different positions
Merging with add vs. max: merge using different methods
What can data visualization do to facilitate the application of ML in a specific domain?
DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery
Qianwen Wang, Nils Gehlenborg, Kexin Huang, Payal Chandak, Marinka Zitnik
Data about biomedicine: relationships about drugs, diseases, proteins, pathways, and effects as a heterogeneous graph
Node types: Drug, Disease, Protein/Gene, Phenotype/Effect, Reactome Pathway, Anatomy, Molecular Function, Cellular Component, Biological Process
Edge types include: indication, contraindication, off-label use (drug, disease); drug side effects; disease symptoms/phenotypes; present, absent
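A heterogeneous graph of this kind stores a type for every node and a relation for every edge. The class below is a minimal sketch of such a structure, not the DrugxAI data model: the class name, the placeholder entities (`drug_a`, `disease_x`, and so on), and the `associated_with` relation are hypothetical, while the other relation names come from the slide.

```python
from collections import defaultdict


class HeteroGraph:
    """Typed nodes connected by typed, undirected edges."""

    def __init__(self):
        self.node_types = {}
        self.edges = defaultdict(list)  # node -> [(relation, neighbour)]

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_edge(self, source, relation, target):
        # Store both directions so either endpoint can be queried
        self.edges[source].append((relation, target))
        self.edges[target].append((relation, source))

    def neighbours(self, name, relation=None, node_type=None):
        """Neighbours of a node, optionally filtered by relation or node type."""
        return [
            other
            for rel, other in self.edges[name]
            if (relation is None or rel == relation)
            and (node_type is None or self.node_types[other] == node_type)
        ]


graph = HeteroGraph()
graph.add_node("drug_a", "Drug")
graph.add_node("disease_x", "Disease")
graph.add_node("disease_y", "Disease")
graph.add_node("protein_p", "Protein/Gene")
graph.add_edge("drug_a", "indication", "disease_x")
graph.add_edge("drug_a", "contraindication", "disease_y")
graph.add_edge("protein_p", "associated_with", "disease_x")  # hypothetical relation
```

Queries such as `graph.neighbours("drug_a", relation="indication")` return the known relationships that a model would learn from before predicting new therapeutic uses.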
The challenges are more than just providing explanations:
1) find a form of explanation that can be easily interpreted by doctors in the context of biomedicine
2) present the explanations in a scalable, effective, and steerable way
Known relationships → deep learning → new therapeutic use
Explanations: knowledge learned by this model; reasons for this prediction
Machine
Learning
Data
Visualization
Data
Decisions
Data
Specification
Knowledge
Visualization Perception
Exploration
data visualization user
image
modify
specification
increase
knowledge
J. J. Van Wijk, “The value of visualization”, 2005
Data
Visualization
Applying Machine Learning Advances to Data Visualization: A Survey of ML4VIS
Qianwen Wang, Huamin Qu, Zhutian Chen, Yong Wang
(a) Papers by venue type: VIS 50, HCI 15, ML 9, DMM 7, Other 4
(b) Papers by year: 2009: 1, 2010: 0, 2011: 1, 2012: 1, 2013: 1, 2014: 3, 2015: 1, 2016: 5, 2017: 9, 2018: 19, 2019: 28, 2020-: 16
https://ml4vis.github.io
Visualization processes, linking DATA, VIS, and USER: VIS-driven Data Processing, Data Presentation, Insight Communication, Style Imitation, VIS Perception, VIS Interaction
VIS-driven Data Processing (raw data → processed data, driven by VIS): Luo et al., Interactive Cleaning for Progressive Visualization through Composite Questions, 2020
Data Presentation (processed data → VIS): Dibia and Demiralp, Data2Vis, 2018; Hu et al., VizML, 2018
Insight Communication (insights → VIS): Qian et al. 2020; Wang et al. 2020
Style Imitation (data + VIS of a specific style → VIS): Tang et al., PlotThread, 2020; Wu et al., MobileVisFixer, 2020; Smart et al., 2019
DeepDrawing: A Deep Learning Approach to Graph Drawing
Yong Wang, Zhihua Jin, Qianwen Wang, Weiwei Cui, Tengfei Ma, Huamin Qu
Style Imitation: graph data + graph drawing samples → graph drawing
A graph-based LSTM for the learning of graph drawing:
• The curved green arrows (real edges of graphs) explicitly reflect the actual graph structure
• The dotted yellow arrows ("fake" edges) propagate the prior nodes' overall influence on the drawing of subsequent nodes
Baseline model: a 4-layer bidirectional LSTM
VIS Perception (VIS → data, style, insights): Bylinskii et al. 2017; Poco et al. 2017; Kafle et al. 2018
Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline
Zhutian Chen, Yun Wang, Qianwen Wang, Yong Wang, and Huamin Qu
VIS Perception: bitmap visualization → visualization specification
"encoding": { "x": { "field": "sale", "scale": { "bandSize": 30 }, "type": "quantitative" ..
Mask-RCNN
Post processing based on GrabCut
VIS Interaction (VIS → user action): Chen et al. 2020; Ottley et al. 2019
ID paper venue
1 Gotz and Wen [86] IUI 2009 X X X
2 Savva et al. [107] UIST 2011 X X
3 Key et al. [11] SIGMOD 2012 X X
4 Steichen et al. [84] IUI 2013 X X
5 Brown et al. [62] TVCG 2014 X X
6 Lalle et al. [83] IUI 2014 X X
7 Toker et al. [12] IUI 2014 X X
8 Sedlmair and Aupetit [13] CGF 2015 X X
9 Mutlu et al. [14] TiiS 2016 X X
10 Aupetit and Sedlmair [95] PVis 2016 X X
11 Siegel et al. [102] ECCV 2016 X X
12 Kembhavi et al. [92] ECCV 2016 X x
13 Al-Zaidy et al. [15] AAAI 2016 X X
14 Poco et al. [88] VIS 2017 X X
15 Kwon et al. [74] VIS 2017 X x
16 Bylinskii et al. [64] UIST 2017 X X
17 Saha et al. [117] IJCAI 2017 X X
18 Kruiger et al. [16] EuroVis 2017 X X
19 Poco and Heer [89] EuroVis 2017 X X
20 Jung et al. [99] CHI 2017 X X
21 Bylinskii et al. [100] arxiv 2017 X X X
22 Al-Zaidy and Giles [17] AAAI 2017 X X
23 Siddiqui et al. [61] VLDB 2018 X X
24 Gramazio et al. [85] VIS 2018 X X
25 Moritz et al. [18] VIS 2018 X X x
26 Berger et al. [68] VIS 2018 X X
27 Wang et al. [53] VIS 2018 X X
28 Haehn et al. [19] VIS 2018 X x
29 Luo et al. [57] SIGMOD 2018 X X x
30 Milo and Somech [80] KDD 2018 X X
31 Zhou et al. [20] IJCAI 2018 X X
32 Kahou et al. [101] ICLR 2018 X X
33 Luo et al. [65] ICDE 2018 X X
34 Fan and Hauser [79] EuroVis 2018 X X
35 Chegini et al. [96] EuroVis 2018 X X
36 Kafle et al. [63] CVPR 2018 X X x
37 Kim et al. [106] CVPR 2018 X x
38 Battle et al. [108] CHI 2018 X X
39 Dibia and Demiralp [54] CGA 2018 X X
40 Haleem et al. [94] CGA 2018 X X
41 Madan et al. [103] arxiv 2018 X x X
Column headers (visualization process): VIS-driven Data Processing, Present Data, Communicate Insight, Imitate Style, VIS Perception, VIS Interaction
Column headers (ML task): Clustering, Dimension Reduction, Generative, Classification, Regression, Semi-supervised, Reinforcement
42 Yu and Silva [82] VIS 2019 X X
43 He et al. [69] VIS 2019 X X
44 Chen et al. [59] VIS 2019 X X
45 Han and Wang [67] VIS 2019 X X
46 Chen et al. [55] VIS 2019 X X
47 Kwon and Ma [75] VIS 2019 X X
48 Wang et al. [2] VIS 2019 X x
49 Han et al. [120] VIS 2019 X X x
50 Wall et al. [111] VIS 2019 X X
51 Fujiwara et al. [118] VIS 2019 X X
52 Fu et al. [3] VIS 2019 X x X
53 Porter et al. [21] VIS 2019 X X
54 Jo and Seo [119] VIS 2019 X X x
55 Ma et al. [93] VIS 2019 X X
56 Wang et al. [73] VIS 2019 x X
57 Cui et al. [56] VIS 2019 X X
58 Chen et al. [5] VIS 2019 x X
59 Wang et al. [22] VIS 2019 X x
60 Smart et al. [58] VIS 2019 X X
61 Huang et al. [104] VIS 2019 X X
62 Hong et al. [23] PacificVis 2019 X X
63 Fan and Hauser [122] EuroVis 2019 X X
64 Ottley et al. [60] EuroVis 2019 X X
65 Abbas et al. [121] EuroVis 2019 X x x
66 Kassel and Rohs [24] EuroVis 2019 X X X
67 Hu et al. [66] CHI 2019 X X
68 Fan and Hauser [25] CGA 2019 X X
69 Kafle et al. [26] arxiv 2019 X X
70 Mohammed [27] VLDB 2020 X x
71 Zhang et al. [90] VIS 2020 x x x
72 Wu et al. [77] VIS 2020 x x
73 Tang et al. [76] VIS 2020 x x
74 Qian et al. [28] VIS 2020 x x
75 Wang et al. [29] VIS 2020 x X
76 Fosco et al. [112] UIST 2020 x x
77 Giovannangeli et al. [139] PacificVis 2020 x x
78 Liu et al. [105] PacificVis 2020 x x x
79 Luo et al. [52] ICDE 2020 X x x
80 Lekschas et al. [113] EuroVis 2020 x x X x
81 Zhao et al. [30] CHI 2020 X x
82 Lai et al. [31] CHI 2020 x x x
83 Kim et al. [32] CHI 2020 x x x
84 Lu et al. [33] CHI 2020 x x x
85 Zhou et al. [109] arxiv 2020 X X
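Rows of the survey table follow a regular pattern (ID, authors, reference number, venue, year, marks), so counts per year or venue can be tallied with a short script. This is a minimal sketch; the regular expression and function names are assumptions, and the sample rows are copied from the table above.

```python
import re
from collections import Counter

# ID, authors, [reference], venue, year; trailing X marks are ignored
ROW_PATTERN = re.compile(r"^(\d+)\s+(.+?)\s+\[(\d+)\]\s+(\S+)\s+(\d{4})\b")


def parse_row(line):
    """Split one table row into its fields, or return None if it does not match."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    paper_id, authors, reference, venue, year = match.groups()
    return {
        "id": int(paper_id),
        "authors": authors,
        "reference": int(reference),
        "venue": venue,
        "year": int(year),
    }


sample_rows = [
    "1 Gotz and Wen [86] IUI 2009 X X X",
    "5 Brown et al. [62] TVCG 2014 X X",
    "6 Lalle et al. [83] IUI 2014 X X",
    "39 Dibia and Demiralp [54] CGA 2018 X X",
    "85 Zhou et al. [109] arxiv 2020 X X",
]
parsed = [row for row in map(parse_row, sample_rows) if row is not None]
papers_per_year = Counter(row["year"] for row in parsed)
papers_per_venue = Counter(row["venue"] for row in parsed)
```

Applied to all 85 rows, the same tally would give the per-year counts shown in the survey overview chart.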
Machine Learning Tasks: Clustering, Dimension Reduction, Generation, Classification, Regression, Semi-supervised Learning, Reinforcement Learning
Visualization Process: VIS-driven Data Processing, Data Presentation, Insight Communication, Style Imitation, VIS Perception, VIS Interaction
Classification is the most widely used ML task. This might be caused by the success of deep learning in computer vision tasks.
We need to better embrace the diversity of ML techniques.
ML4VIS: Opportunities & Challenges
Public High-quality Datasets & Benchmark Tasks
• Most papers constructed their own datasets due to the lack of public visualization datasets
• The dataset quality may endanger the validity of the obtained ML models, e.g., DeepEye [Luo et al. 2019] learns to classify "good"/"bad" visualizations based on training examples labelled by 100 students
• Benchmark tasks for ML4VIS remain unclear
Visualization-Tailored Machine Learning
• Most ML4VIS studies directly apply general ML techniques developed in the field of ML
• General ML techniques do not always suit the specific problems in visualization
User-Friendly ML4VIS
• The employment of ML not only provides opportunities but also poses new challenges in designing visualizations
• Some ML4VIS studies have discussed the usability issues of ML4VIS, but these suggestions are scattered among different papers
• Future studies are needed to help designers better understand user behaviours and expectations in this new ML4VIS scenario
https://qarea.com/blog/5-tips-for-creating-user-friendly-interface
Machine Learning
+
Data Visualization
+
Humans
[Chart: Task Definition (Fuzzy to Clear) against Amount of Information (Few to Large), with regions Human Head, Pure Machine Learning, and Pure Data Visualization]
There is no panacea
A better combination of the power of visualization, machine learning, and human users:
• How to split tasks
• How to dynamically modify the splitting based on user preference and expertise
• How to design novel algorithms & visualizations for the collaboration
Machine Learning or Data Visualization?
Thanks!
https://wangqianwen0418.github.io/
qianwen_wang@hms.harvard.edu

More Related Content

Similar to From Data to Decisions, a Mixed Path of Data Visualization and Machine Learning

FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISFUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISIrene Pochinok
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis TestingRyan Herzog
 
Marketing Research Hypothesis Testing.pptx
Marketing Research Hypothesis Testing.pptxMarketing Research Hypothesis Testing.pptx
Marketing Research Hypothesis Testing.pptxxababid981
 
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...Michael Lie
 
Mimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmMimo system-order-reduction-using-real-coded-genetic-algorithm
Mimo system-order-reduction-using-real-coded-genetic-algorithmCemal Ardil
 
STANDARD DEVIATION (2018) (STATISTICS)
STANDARD DEVIATION (2018) (STATISTICS)STANDARD DEVIATION (2018) (STATISTICS)
STANDARD DEVIATION (2018) (STATISTICS)sumanmathews
 
Lesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing dataLesson 27 using statistical techniques in analyzing data
Lesson 27 using statistical techniques in analyzing datamjlobetos
 
Page 1 of 18Part A Multiple Choice (1–11)______1. Using.docx
Page 1 of 18Part A Multiple Choice (1–11)______1. Using.docxPage 1 of 18Part A Multiple Choice (1–11)______1. Using.docx
Page 1 of 18Part A Multiple Choice (1–11)______1. Using.docxalfred4lewis58146
 
Sparsenet
SparsenetSparsenet
Sparsenetndronen
 
VARIOUS FUZZY NUMBERS AND THEIR VARIOUS RANKING APPROACHES
VARIOUS FUZZY NUMBERS AND THEIR VARIOUS RANKING APPROACHESVARIOUS FUZZY NUMBERS AND THEIR VARIOUS RANKING APPROACHES
VARIOUS FUZZY NUMBERS AND THEIR VARIOUS RANKING APPROACHESIAEME Publication
 
t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750richardchandler
 
unit classification.pptx
unit  classification.pptxunit  classification.pptx
unit classification.pptxssuser908de6
 
ObjectiveQuestionsonEngineeringMathematicsForGATE2022.pdf
ObjectiveQuestionsonEngineeringMathematicsForGATE2022.pdfObjectiveQuestionsonEngineeringMathematicsForGATE2022.pdf
ObjectiveQuestionsonEngineeringMathematicsForGATE2022.pdfMohammedArish6
 
Business Analytics using R.ppt
Business Analytics using R.pptBusiness Analytics using R.ppt
Business Analytics using R.pptRohit Raj
 

Similar to From Data to Decisions, a Mixed Path of Data Visualization and Machine Learning (20)

Topic 1 part 2
Topic 1 part 2Topic 1 part 2
Topic 1 part 2
 
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSISFUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
FUNCTION OF RIVAL SIMILARITY IN A COGNITIVE DATA ANALYSIS
 



From Data to Decisions, a Mixed Path of Data Visualization and Machine Learning

  • 1. From Data to Decisions, A Mixed Path of Data Visualization and Machine Learning. Qianwen Wang. [Figure: hypothesis matrix with p-value threshold 0.05. Model results: R(M, D) = 0.7405, R(M, D+) = 0.5232, R(M+, D) = 0.2961, R(M+, D+) = 0.8705. Comparisons: R(M, D+) < R(M, D), p = 0.030; R(M+, D) < R(M, D), p = 0.000; R(M+, D+) > R(M, D), p = 0.002; R(M+, D+) > R(M, D+), p = 0.006; R(M+, D) < R(M, D+), p = 0.048; R(M+, D+) > R(M+, D), p = 0.000. Columns: hypotheses H1 to H12.]
  • 2. Advisor: Huamin Qu. Advisor: Nils Gehlenborg. Years shown: 2015, 2017, 2019, 2020. Areas: Machine Learning, Data Visualization, Human Computer Interaction.
  • 4. Machine Learning: an ability to learn from data, extract patterns, and make decisions with minimal human intervention. Data Visualization: an accessible way for humans to interpret data, identify patterns, and make data-driven decisions.
  • 7. Artificial intelligence is still human intelligence Data Visualization Machine Learning
  • 12. Overwhelmed by the Variety. [Figure: many variants of the Deep Neural Network (DNN)]
  • 14. Visual Genealogy of Deep Neural Networks. Qianwen Wang1, Jun Yuan2, Shuxin Chen2, Hang Su2, Huamin Qu1, and Shixia Liu2. Tsinghua University
  • 19. Case: Investigate Evolution Patterns. How to combine a skip connection with the main branch? Gate, addition (+), concatenation (||), or a mixture (+ and ||).
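The two simplest merge options named on this slide, addition and concatenation, can be sketched on plain feature vectors. This is a minimal illustration, not code from the talk: the function names and the feature values are hypothetical, and real networks apply the same operations to tensors.

```python
def merge_add(main, skip):
    """Element-wise addition: the output keeps the width of the inputs."""
    if len(main) != len(skip):
        raise ValueError("addition needs equal-length feature vectors")
    return [m + s for m, s in zip(main, skip)]


def merge_concat(main, skip):
    """Concatenation: the output width is the sum of both input widths."""
    return list(main) + list(skip)


main_branch = [0.5, -1.0, 2.0]  # hypothetical features from the main branch
skip_branch = [1.0, 1.0, 1.0]   # hypothetical features carried by the skip connection

added = merge_add(main_branch, skip_branch)
concatenated = merge_concat(main_branch, skip_branch)
print(added)         # [1.5, 0.0, 3.0]
print(concatenated)  # [0.5, -1.0, 2.0, 1.0, 1.0, 1.0]
```

Addition requires both branches to have the same shape, while concatenation widens the layer that follows, which is one reason architectures differ in which option they pick.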
  • 20. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, Huamin Qu
  • 21. Developing ML Models: a model for my task. Which model? SVM, MLP, Random Forest, KNN, ... Which hyperparameters? learning rate = ?, # layers = ?, batch size = ?, # neurons = ?, ...
  • 22. Automated Machine Learning: make it automated! Model choices: Support Vector Machine? Neural Network? Random Forest? K Nearest Neighbor? Linear Regression? Hyperparameter choices: Hidden Layer = ?, Learning Rate = ?, Kernel Function = ?, Max Depth = ?, Activation = ?, Leaf Size = ?, Min Samples Leaf = ?, Min Samples Split = ?
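The search problem on these two slides (pick a model, then pick its hyperparameters) can be sketched as a random search over a nested search space. This is only an assumed stand-in to show the shape of the problem: the slides do not say which search strategy is used, the candidate values are hypothetical, and `toy_scores` replaces a real train-and-validate step.

```python
import random

# Model types and hyperparameter names follow the slide; candidate values are hypothetical.
SEARCH_SPACE = {
    "svm": {"kernel_function": ["linear", "rbf", "poly"]},
    "neural_network": {"hidden_layer": [1, 2, 3], "learning_rate": [0.001, 0.01, 0.1]},
    "random_forest": {"max_depth": [3, 5, 10], "min_samples_leaf": [1, 5]},
    "knn": {"leaf_size": [10, 30, 50]},
}


def sample_configuration(rng):
    """Draw one model type and one value for each of its hyperparameters."""
    model = rng.choice(sorted(SEARCH_SPACE))
    params = {name: rng.choice(values) for name, values in SEARCH_SPACE[model].items()}
    return model, params


def random_search(evaluate, n_trials, seed=0):
    """Return (score, model, params) of the best sampled configuration."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        model, params = sample_configuration(rng)
        score = evaluate(model, params)
        if best is None or score > best[0]:
            best = (score, model, params)
    return best


# Placeholder scores, NOT real validation results.
toy_scores = {"svm": 0.71, "neural_network": 0.80, "random_forest": 0.78, "knn": 0.65}
best = random_search(lambda model, params: toy_scores[model], n_trials=20)
print(best)
```

Even this toy version hides every intermediate trial from the user, which is the transparency gap the talk points at.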
  • 28. How to examine Discrimination?
  • 29. A College Admission Example. Accepted females vs. accepted males (the rest rejected): 50% > 42%. Seems unfair?
  • 30. A College Admission Example, split by score. Low score: 33.3% > 26.7%. High score: 75% > 65%.
  • 31. A College Admission Example, split by score (low, high) and department (CS, EE). The rates are equal in every subgroup: 20% = 20%, 40% = 40%, 60% = 60%, 80% = 80%.
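The pattern in this example, where an overall gap disappears once applicants are split into comparable subgroups, can be reproduced with a small worked calculation. The counts below are hypothetical and chosen only to show the effect; they are not the numbers behind the slides.

```python
from fractions import Fraction

# (group, department) -> (accepted, applicants); all counts are hypothetical.
applications = {
    ("female", "CS"): (48, 60),
    ("female", "EE"): (8, 40),
    ("male", "CS"): (24, 30),
    ("male", "EE"): (14, 70),
}


def acceptance_rate(records):
    """Pooled acceptance rate over a list of (accepted, applicants) pairs."""
    accepted = sum(a for a, _ in records)
    total = sum(n for _, n in records)
    return Fraction(accepted, total)


overall = {
    group: acceptance_rate([v for (g, _), v in applications.items() if g == group])
    for group in ("female", "male")
}
per_department = {key: Fraction(a, n) for key, (a, n) in applications.items()}

print({g: float(r) for g, r in overall.items()})  # {'female': 0.56, 'male': 0.38}
print({k: float(r) for k, r in per_department.items()})
```

Both groups are accepted at 80% in CS and 20% in EE in this toy data; the overall gap comes entirely from which department each group applies to.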
  • 32. Two individuals who are similar with respect to a task are treated equally
  • 33. Visual Analysis of Discrimination in Machine Learning. Qianwen Wang1, Zhenhua Xu1, Huamin Qu1, Shixia Liu2, Zhutian Chen1, Yong Wang1. Tsinghua University
  • 34. Discriminatory Itemset. [Figure: example itemset; its text is not recoverable]
  • 36. Challenges in Analysis: Long and Complex Definition; Intertwining Relationship
  • 37. Long and Complex Definition
  • 38. Long and Complex Definition. Attribute Matrix: itemsets against attributes, e.g. 23 < raised hands < 50.
  • 39. Intertwining Relationships: RippleSet
  • 40. Designing RippleSet. An item; items ∈ set A.
  • 41. Designing RippleSet. An item; items ∈ set A. Regions: (C∩D)\(A∪B∪E), (A∩B∩C∩D)\E, (A∩B∩C)\(D∪E), (A∩B∩E)\(C∪D), (B∩C∩E)\(A∪D).
  • 42. Designing RippleSet. An item; items ∈ set A. Regions: (C∩D)\(A∪B∪E), (A∩B∩C∩D)\E, (A∩B∩C)\(D∪E), (A∩B∩E)\(C∪D), (B∩C∩E)\(A∪D), labelled CD, ABCD, ABC, ABE, BCE.
  • 43. Designing RippleSet. An item; items ∈ set A. Regions: (C∩D)\(A∪B∪E), (A∩B∩C∩D)\E, (A∩B∩C)\(D∪E), (A∩B∩E)\(C∪D), (B∩C∩E)\(A∪D), labelled CD, ABCD, ABC, ABE, BCE. Items belonging to the same set are put together. Layout: weighted DAG, circle packing algorithm.
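The regions listed on these slides group items by the exact combination of sets they belong to. A minimal sketch of that grouping step follows; the itemsets and item ids are hypothetical, picked so that the resulting region labels match the five on the slide, and the layout steps (weighted DAG, circle packing) are not covered.

```python
from collections import defaultdict

# Hypothetical itemsets A..E, each holding item ids.
itemsets = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 4, 5},
    "C": {1, 2, 5, 6},
    "D": {1, 6},
    "E": {3, 4, 5},
}


def exclusive_regions(sets):
    """Map each exact membership combination (e.g. "ABC") to its items."""
    membership = defaultdict(set)
    for name, items in sets.items():
        for item in items:
            membership[item].add(name)
    regions = defaultdict(set)
    for item, names in membership.items():
        regions["".join(sorted(names))].add(item)
    return dict(regions)


regions = exclusive_regions(itemsets)
for label in sorted(regions):
    print(label, sorted(regions[label]))
```

The region "ABC" here holds the items in A, B, and C but in neither D nor E, which corresponds to (A∩B∩C)\(D∪E).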
  • 46. What concepts has the model learned? Are the learned concepts always useful? Hypothesize about the effect of the common orientation of an object. Hypothesize about the effect of the surrounding environment of an object.
  • 48. Black-box Analysis (input, model, prediction): examine hypotheses about how perturbations to inputs affect the ML model outputs. Examples: Prospector (Krause et al. 2016), What-if tool (Wexler et al. 2019), Gamut (Hohman et al. 2019). Not statistically meaningful: only observations on individual predictions.
  • 49. White-box Analysis: what has a neuron learned? Examples: Deconvnet (Zeiler and Fergus 2013), Guided back propagation (Springenberg et al. 2013). Not statistically meaningful: the depicted patterns provide largely a hunch rather than solid conclusions. Not efficient: it is impossible to examine all neurons.
  • 50. Can we test concept-based hypotheses in an efficient and statistically meaningful way?
  • 51. HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models. Qianwen Wang1, William Alexander2, Huamin Qu1, Min Chen2, Jack Pegg2
  • 52. Concept-based Testing. ML training: 2 ML models (M and M+) and 2 pairs of datasets (D with noise, and D with extra data that contains the testing concept, i.e. D+).
  • 53. Concept-based Testing. ML training: 2 ML models (M and M+) and 2 pairs of datasets (D with noise, and D with extra data that contains the testing concept, i.e. D+). ML testing: 4 sets of results, R(M, D), R(M, D+), R(M+, D), R(M+, D+); any two of them are compared as R(a) and R(b).
  • 54. Statistical Comparison. R(a) can be significantly lower than, significantly higher than, or insignificantly lower or higher than R(b). Many uncontrolled variables: for example, µ(R(a)) = 0.878 > µ(R(b)) = 0.876.
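A small sketch shows why a gap such as 0.878 vs. 0.876 needs a significance test before it is read as "higher". The slides do not name the test used, so an exact one-sided permutation test stands in here; the per-run scores are hypothetical and were chosen only so that their means equal the two values on the slide.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b, tol=1e-9):
    """Exact one-sided permutation test of mean(a) > mean(b)."""
    pooled = list(a) + list(b)
    observed = mean(a) - mean(b)
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Count relabelings whose gap is at least as large as the observed one.
        if mean(group_a) - mean(group_b) >= observed - tol:
            hits += 1
        total += 1
    return hits / total


# Hypothetical scores from five repeated runs each (means 0.878 and 0.876).
r_a = [0.86, 0.90, 0.87, 0.89, 0.87]
r_b = [0.88, 0.85, 0.90, 0.86, 0.89]
p_close = permutation_p_value(r_a, r_b)

# A clearly separated, equally hypothetical pair for contrast.
p_far = permutation_p_value([0.90, 0.91, 0.92, 0.93, 0.94],
                            [0.60, 0.61, 0.62, 0.63, 0.64])
print(p_close, p_far)
```

With these toy runs the small gap is far from the 0.05 threshold, while the separated pair falls well below it, so only the second would count as "significantly higher".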
  • 55. Top-down workflow: Model Results, Statistical Comparison, Hypotheses. Model results shown: 0.8133, 0.8347, 0.8365, 0.8356 (labels: R(M+,D+), R(M+,D), R(M,D+), R(M,D)). p-values shown: 0.446, 0.098, 0.256, 0.377, 0.061, 0.079. Hypotheses:
      • H1. The concept is useful to M+ and would be useful to M
      • H2. The concept is harmful to M+ and would be harmful to M
      • H3. M has learned the concept ξ adequately
      • H4. M+ has learned the concept ξ adequately
      • H5. The extra information in D+ has a positive effect on M
      • H6. The extra information in D+ has a negative effect on M
      • H7. The extra information in D+ has a positive effect on M+
      • H8. The extra information in D+ has a negative effect on M+
      • H9. Learning with Dm+ affects the M part of M+ positively
      • H10. Learning with Dm+ affects the M part of M+ negatively
      • H11. Learning with Dm+ affects the extra part of M+ positively
      • H12. Learning with Dm+ affects the extra part of M+ negatively
  • 56. Visual Analysis of Hypotheses. Model results: R(M, D) = 0.8757, R(M, D+) = 0.6471, R(M+, D) = 0.6092, R(M+, D+) = 0.9188. Comparisons (p-value threshold 0.05): R(M, D+) < R(M, D), p = 0.032; R(M+, D) < R(M, D), p = 0.002; R(M+, D+) > R(M, D), p = 0.002; R(M+, D+) > R(M, D+), p = 0.015; R(M+, D) < R(M, D+), p = 0.405; R(M+, D+) > R(M+, D), p = 0.002. Columns: hypotheses H1 to H12, each marked Supported, Unproven, or Rejected. A hypothesis is based on the analyses in the row.
  • 57. Visual Analysis of Hypotheses (legend): the analysis in a row rejects, supports, unproves, is conditional on, or is unrelated to the hypothesis in a column.
  • 58. Visual Analysis of Hypotheses (legend): the difference is statistically significant or insignificant, judged against the p-value threshold of 0.05.
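The significant/insignificant split in this view can be checked by hand against the stated threshold. The sketch below uses the six comparisons and p-values shown on the slide; treating "p below 0.05" as significant is an assumption about how the threshold is applied.

```python
THRESHOLD = 0.05  # "thr" on the slide

# Comparisons and p-values as shown on the slide.
comparisons = [
    ("R(M, D+) < R(M, D)", 0.032),
    ("R(M+, D) < R(M, D)", 0.002),
    ("R(M+, D+) > R(M, D)", 0.002),
    ("R(M+, D+) > R(M, D+)", 0.015),
    ("R(M+, D) < R(M, D+)", 0.405),
    ("R(M+, D+) > R(M+, D)", 0.002),
]

labels = {
    name: ("significant" if p < THRESHOLD else "insignificant")
    for name, p in comparisons
}
for name, p in comparisons:
    print(f"{name:24s} p={p:.3f} {labels[name]}")
```

Only R(M+, D) < R(M, D+) fails the threshold, so any hypothesis that depends on that comparison cannot be supported from these results alone.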
  • 59. Testing Concept: Color Space. Color spaces: RGB, CIELAB, HSV, HSL, CMYK, YCrC. How does the concept Color Space influence the ML model? Tested here: RGB and HSV.
  • 60. Color Space: Experiment Design. Inputs: RGB with HSV (D+) and RGB with noise (D). Models: M+ and M. Open question: how to merge?
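One reading of this design is that D+ pairs every RGB input with its HSV version, while D pairs it with noise of the same shape. The sketch below builds one sample of each kind for a single pixel using Python's standard `colorsys` module; the helper names and the per-pixel framing are assumptions made for illustration.

```python
import colorsys
import random


def with_hsv(rgb_pixel):
    """D+ style sample: RGB channels plus the same pixel expressed in HSV."""
    r, g, b = rgb_pixel
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (r, g, b, h, s, v)


def with_noise(rgb_pixel, rng):
    """D style sample: RGB channels plus three uniform noise values."""
    r, g, b = rgb_pixel
    return (r, g, b, rng.random(), rng.random(), rng.random())


rng = random.Random(0)
pixel = (1.0, 0.0, 0.0)  # pure red, channels in [0, 1]

d_plus_sample = with_hsv(pixel)
d_sample = with_noise(pixel, rng)
print(d_plus_sample)  # (1.0, 0.0, 0.0, 0.0, 1.0, 1.0)
print(d_sample)
```

Keeping both samples the same width lets the same two-input architecture consume either one, so any change in results can be attributed to the extra information rather than to the input shape.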
  • 61. Color Space: Experiment Design. How to merge: at different positions (maxpool 1, maxpool 2) and with different methods (add, max). [Figure: network variants built from Conv2D, Max Pooling, Flatten, Dropout, and Dense layers]
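The two merge methods compared here, add and max, are both element-wise operations on equally shaped feature maps. A minimal sketch on flat feature lists follows; the function name and feature values are hypothetical.

```python
def merge(rgb_features, hsv_features, method):
    """Merge two equally sized feature lists element-wise with add or max."""
    if len(rgb_features) != len(hsv_features):
        raise ValueError("both branches must produce features of the same size")
    if method == "add":
        return [a + b for a, b in zip(rgb_features, hsv_features)]
    if method == "max":
        return [max(a, b) for a, b in zip(rgb_features, hsv_features)]
    raise ValueError(f"unknown merge method: {method}")


rgb_branch = [0.25, 1.0]  # hypothetical activations from the RGB branch
hsv_branch = [0.5, 0.5]   # hypothetical activations from the HSV branch

merged_add = merge(rgb_branch, hsv_branch, "add")
merged_max = merge(rgb_branch, hsv_branch, "max")
print(merged_add)  # [0.75, 1.5]
print(merged_max)  # [0.5, 1.0]
```

With add, both branches always contribute to every feature; with max, only the stronger activation passes through.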
  • 62. Color Space: Results (merge at maxpool 1). The information from another color space, HSV, contributes to the prediction of this model.
  • 63. Color Space: Results (maxpool 1 vs. maxpool 2). The hypothesis testing results change when we merge at different positions.
  • 64. Color Space: Results (add vs. max). Merge using different methods.
  • 65. Stages: Problem Understanding, Data Collection, Model Development, Model Evaluation, Model Application. What can data visualization do to facilitate the application of ML in a specific domain?
  • 66. DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery. Qianwen Wang, Nils Gehlenborg, Kexin Huang, Payal Chandak, Marinka Zitnik. Data about biomedicine: relationships about drugs, diseases, proteins, pathways, effects as a heterogeneous graph. Node types: Anatomy, Molecular Function, Cellular Component, Biological Process, Phenotype/Effect, Drug, Disease, Reactome Pathway, Protein/Gene. Relationship labels: indication, contraindication, off-label use; drug side effects; disease symptoms/phenotypes; present, absent.
  • 67. DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery. The challenges are more than just providing explanations: 1) find a form of explanation that can be easily interpreted by doctors in the context of biomedicine; 2) present the explanations in a scalable, effective, and steerable way. [Figure labels: known relationships, deep learning, new therapeutic use, knowledge learned by this model, reasons of this prediction]
  • 68. DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery
  • 69. Machine Learning and Data Visualization, from Data to Decisions. [Figure: model of visualization with Data, Specification, Visualization, image, Perception, Knowledge, Exploration, and user; arrows: modify specification, increase knowledge]
  • 70. Data Visualization. [Figure: model of visualization with Data, Specification, Visualization, image, Perception, Knowledge, Exploration, and user; arrows: modify specification, increase knowledge] J. J. Van Wijk, “The value of visualization”, 2005
  • 71. Applying Machine Learning Advances to Data Visualization: A Survey of ML4VIS. Qianwen Wang, Huamin Qu, Zhutian Chen, Yong Wang
  • 72. Surveyed papers by venue type: VIS 50, HCI 15, ML 9, DMM 7, Other 4. Surveyed papers by year: 2009: 1, 2010: 0, 2011: 1, 2012: 1, 2013: 1, 2014: 3, 2015: 1, 2016: 5, 2017: 9, 2018: 19, 2019: 28, 2020-: 16.
  • 75. VIS-driven Data Preprocessing (Raw Data, Processed Data, VIS). Luo et al., Interactive Cleaning for Progressive Visualization through Composite Questions, 2020
  • 76. Data Presentation (Processed Data, VIS). Dibia and Demiralp, Data2Vis, 2018; Hu et al., VizML, 2018
  • 78. Style Imitation (Data, VIS of a specific style, VIS). Tang et al., PlotThread, 2020; Wu et al., MobileVisFixer, 2020; Smart et al., 2019
  • 79. DeepDrawing: A Deep Learning Approach to Graph Drawing. Yong Wang, Zhihua Jin, Qianwen Wang, Weiwei Cui, Tengfei Ma, Huamin Qu. Style Imitation: graph data and graph drawing samples in, graph drawing out. A graph-based LSTM for the learning of graph drawing: the curved green arrows (real edges of graphs) explicitly reflect the actual graph structure; the dotted yellow arrows (“fake” edges) propagate the prior nodes’ overall influence on the drawing of subsequent nodes.
  • 80. DeepDrawing: A Deep Learning Approach to Graph Drawing. Yong Wang, Zhihua Jin, Qianwen Wang, Weiwei Cui, Tengfei Ma, Huamin Qu. Style Imitation: graph data and graph drawing samples in, graph drawing out. Baseline model: a 4-layer bi-directional LSTM.
  • 81. VIS Perception (VIS in; Data, Style, Insights out). Bylinskii et al. 2017; Poco et al. 2017; Kafle et al. 2018
  • 82. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline. Zhutian Chen, Yun Wang, Qianwen Wang, Yong Wang, and Huamin Qu. VIS Perception: from a bitmap visualization to a visualization specification, e.g. "encoding": { "x": { "field": "sale", "scale": { "bandSize": 30 }, "type": "quantitative" .. Method: Mask-RCNN, then post processing based on GrabCut.
  • 83. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline. Zhutian Chen, Yun Wang, Qianwen Wang, Yong Wang, and Huamin Qu. VIS Perception: from a bitmap visualization to a visualization specification, e.g. "encoding": { "x": { "field": "sale", "scale": { "bandSize": 30 }, "type": "quantitative" ..
  • 84. VIS Interaction (VIS, User Action, VIS). Chen et al. 2020; Ottley et al. 2019
  • 85. ML4VIS survey table: 85 papers (ID, paper, venue), each marked in the original slide against a visualization process and the machine learning tasks it uses. Visualization Process: VIS-driven Data Processing, Data Presentation, Insight Communication, Style Imitation, VIS Perception, VIS Interaction. Machine Learning Tasks: Clustering, Dimension Reduction, Generation, Classification, Regression, Semi-supervised Learning, Reinforcement Learning. Papers:
      • 2009 to 2016: 1 Gotz and Wen [86], IUI 2009; 2 Savva et al. [107], UIST 2011; 3 Key et al. [11], SIGMOD 2012; 4 Steichen et al. [84], IUI 2013; 5 Brown et al. [62], TVCG 2014; 6 Lalle et al. [83], IUI 2014; 7 Toker et al. [12], IUI 2014; 8 Sedlmair and Aupetit [13], CGF 2015; 9 Mutlu et al. [14], TiiS 2016; 10 Aupetit and Sedlmair [95], PVis 2016; 11 Siegel et al. [102], ECCV 2016; 12 Kembhavi et al. [92], ECCV 2016; 13 Al-Zaidy et al. [15], AAAI 2016
      • 2017: 14 Poco et al. [88], VIS 2017; 15 Kwon et al. [74], VIS 2017; 16 Bylinskii et al. [64], UIST 2017; 17 Saha et al. [117], IJCAI 2017; 18 Kruiger et al. [16], EuroVis 2017; 19 Poco and Heer [89], EuroVis 2017; 20 Jung et al. [99], CHI 2017; 21 Bylinskii et al. [100], arxiv 2017; 22 Al-Zaidy and Giles [17], AAAI 2017
      • 2018: 23 Siddiqui et al. [61], VLDB 2018; 24 Gramazio et al. [85], VIS 2018; 25 Moritz et al. [18], VIS 2018; 26 Berger et al. [68], VIS 2018; 27 Wang et al. [53], VIS 2018; 28 Haehn et al. [19], VIS 2018; 29 Luo et al. [57], SIGMOD 2018; 30 Milo and Somech [80], KDD 2018; 31 Zhou et al. [20], IJCAI 2018; 32 Kahou et al. [101], ICLR 2018; 33 Luo et al. [65], ICDE 2018; 34 Fan and Hauser [79], EuroVis 2018; 35 Chegini et al. [96], EuroVis 2018; 36 Kafle et al. [63], CVPR 2018; 37 Kim et al. [106], CVPR 2018; 38 Battle et al. [108], CHI 2018; 39 Dibia and Demiralp [54], CGA 2018; 40 Haleem et al. [94], CGA 2018; 41 Madan et al. [103], arxiv 2018
      • 2019: 42 Yu and Silva [82], VIS 2019; 43 He et al. [69], VIS 2019; 44 Chen et al. [59], VIS 2019; 45 Han and Wang [67], VIS 2019; 46 Chen et al. [55], VIS 2019; 47 Kwon and Ma [75], VIS 2019; 48 Wang et al. [2], VIS 2019; 49 Han et al. [120], VIS 2019; 50 Wall et al. [111], VIS 2019; 51 Fujiwara et al. [118], VIS 2019; 52 Fu et al. [3], VIS 2019; 53 Porter et al. [21], VIS 2019; 54 Jo and Seo [119], VIS 2019; 55 Ma et al. [93], VIS 2019; 56 Wang et al. [73], VIS 2019; 57 Cui et al. [56], VIS 2019; 58 Chen et al. [5], VIS 2019; 59 Wang et al. [22], VIS 2019; 60 Smart et al. [58], VIS 2019; 61 Huang et al. [104], VIS 2019; 62 Hong et al. [23], PacificVis 2019; 63 Fan and Hauser [122], EuroVis 2019; 64 Ottley et al. [60], EuroVis 2019; 65 Abbas et al. [121], EuroVis 2019; 66 Kassel and Rohs [24], EuroVis 2019; 67 Hu et al. [66], CHI 2019; 68 Fan and Hauser [25], CGA 2019; 69 Kafle et al. [26], arxiv 2019
      • 2020: 70 Mohammed [27], VLDB 2020; 71 Zhang et al. [90], VIS 2020; 72 Wu et al. [77], VIS 2020; 73 Tang et al. [76], VIS 2020; 74 Qian et al. [28], VIS 2020; 75 Wang et al. [29], VIS 2020; 76 Fosco et al. [112], UIST 2020; 77 Giovannangeli et al. [139], PacificVis 2020; 78 Liu et al. [105], PacificVis 2020; 79 Luo et al. [52], ICDE 2020; 80 Lekschas et al. [113], EuroVis 2020; 81 Zhao et al. [30], CHI 2020; 82 Lai et al. [31], CHI 2020; 83 Kim et al. [32], CHI 2020; 84 Lu et al. [33], CHI 2020; 85 Zhou et al. [109], arxiv 2020
  • 86. Classification is the most widely used ML task.
  • 87. This might be due to the success of deep learning in computer vision tasks.
  • 88. We need to better embrace the diversity of ML techniques.
  • 89. ML4VIS: Opportunities & Challenges • Public High-quality Datasets & Benchmark Tasks • Visualization-Tailored Machine Learning • User-Friendly ML4VIS
  • 90. ML4VIS: Opportunities & Challenges: Public High-quality Datasets & Benchmark Tasks • Most papers constructed their own datasets because public visualization datasets are lacking • Dataset quality may threaten the validity of the resulting ML models; e.g., DeepEye [Luo et al. 2019] learns to classify “good”/“bad” visualizations from training examples labelled by 100 students • Benchmark tasks for ML4VIS remain unclear
  • 91. ML4VIS: Opportunities & Challenges: Visualization-Tailored Machine Learning • Most ML4VIS studies directly apply general-purpose techniques developed in the ML field • General ML techniques are not always well suited to the specific problems in visualization
  • 92. ML4VIS: Opportunities & Challenges: User-Friendly ML4VIS • The use of ML not only provides opportunities but also poses new challenges in designing visualizations • Some ML4VIS studies have discussed the usability issues of ML4VIS, but their suggestions are scattered across different papers • Future studies are needed to help designers better understand user behaviours and expectations in this new ML4VIS scenario https://qarea.com/blog/5-tips-for-creating-user-friendly-interface
  • 93. Machine Learning + Data Visualization + Humans. Machine Learning or Data Visualization? [Figure: chart with axes "Amount of Information" (Few to Large) and "Task Definition" (Fuzzy to Clear), with regions labelled Human Head, Pure Machine Learning, and Pure Data Visualization.] There is no panacea. A better combination of the power of visualization, machine learning, and human users is needed: • How to split tasks • How to dynamically modify the splitting based on user preference and expertise • How to design novel algorithms & visualizations for the collaboration