### From Data to Decisions, a Mixed Path of Data Visualization and Machine Learning

• 1. From Data to Decisions, A Mixed Path of Data Visualization and Machine Learning. Qianwen Wang. (Cover figure: a hypothesis matrix with p-value threshold 0.05, model results R(M, D) = 0.7405, R(M, D+) = 0.5232, R(M+, D) = 0.2961, R(M+, D+) = 0.8705, six pairwise comparisons with p-values from 0.000 to 0.048, and hypotheses H1 to H12.)
• 2. Background (2015 to 2020) spanning Machine Learning, Data Visualization, and Human-Computer Interaction. Advisors: Huamin Qu; Nils Gehlenborg.
• 4. Machine Learning: an ability to learn from data, extract patterns, and make decisions with minimum human intervention. Data Visualization: an accessible way for humans to interpret data, identify patterns, and make data-driven decisions.
• 7. Artificial intelligence is still human intelligence. (Data Visualization and Machine Learning.)
• 12. Overwhelmed by the Variety: the many variants of Deep Neural Networks (DNNs).
• 14. Visual Genealogy of Deep Neural Networks. Qianwen Wang¹, Jun Yuan², Shuxin Chen², Hang Su², Huamin Qu¹, and Shixia Liu² (²Tsinghua University).
• 19. Case: Investigate Evolution Patterns. How to combine a skip connection with the main branch? Options: gate, addition (+), concatenation (||), or a mixture.
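To make the combination options concrete, here is a minimal sketch on plain Python lists, assuming both branches output feature vectors of the same length. The function names are hypothetical, and the gate is one common form (the highway-network style mix), not necessarily the exact variant recorded in the genealogy data.

```python
import math

Vector = list[float]


def add_merge(main: Vector, skip: Vector) -> Vector:
    """Addition (+): element-wise sum; both branches need the same width."""
    if len(main) != len(skip):
        raise ValueError("addition needs equal-length vectors")
    return [m + s for m, s in zip(main, skip)]


def concat_merge(main: Vector, skip: Vector) -> Vector:
    """Concatenation (||): keep both feature sets; the widths add up."""
    return main + skip


def gate_merge(main: Vector, skip: Vector, gate_logits: Vector) -> Vector:
    """Gate: g = sigmoid(logit) mixes the branches as g * main + (1 - g) * skip."""
    if not len(main) == len(skip) == len(gate_logits):
        raise ValueError("gating needs equal-length vectors")
    gates = [1.0 / (1.0 + math.exp(-z)) for z in gate_logits]
    return [g * m + (1.0 - g) * s for g, m, s in zip(gates, main, skip)]


main_branch = [0.5, -1.0, 2.0]  # hypothetical main-branch features
skip_branch = [1.0, 1.0, 1.0]   # hypothetical skip (identity) features

print(add_merge(main_branch, skip_branch))                    # [1.5, 0.0, 3.0]
print(concat_merge(main_branch, skip_branch))                 # [0.5, -1.0, 2.0, 1.0, 1.0, 1.0]
print(gate_merge(main_branch, skip_branch, [0.0, 0.0, 0.0]))  # [0.75, 0.0, 1.5]
```

Addition and gating keep the width fixed, while concatenation doubles it, which is one reason architectures differ in where each option can be used.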
• 20. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, Huamin Qu
• 21. Developing ML Models: which model fits my task (SVM, MLP, Random Forest, KNN, ...)? And which hyperparameters (learning rate = ?, # layers = ?, batch size = ?, # neurons = ?)?
• 22. Automated Machine Learning: make it automated! The search covers algorithms (Support Vector Machine? Neural Network? Random Forest? K Nearest Neighbor? Linear Regression?) together with their hyperparameters (hidden layers, learning rate, kernel function, max depth, activation, K, leaf size, min samples leaf, min samples split).
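The joint search over algorithms and their hyperparameters can be sketched as a random search. This is a minimal illustration under stated assumptions: the search space, `toy_evaluate`, and all function names are hypothetical, and random search is used only for simplicity; it is not necessarily the strategy behind ATMSeer.

```python
import random
from typing import Callable

# Hypothetical joint search space: each algorithm brings its own hyperparameters.
SEARCH_SPACE: dict[str, dict[str, list]] = {
    "svm": {"kernel": ["linear", "rbf", "poly"], "C": [0.1, 1.0, 10.0]},
    "mlp": {"hidden_layers": [1, 2, 3], "learning_rate": [1e-3, 1e-2, 1e-1]},
    "random_forest": {"max_depth": [3, 5, 10], "min_samples_leaf": [1, 2, 5]},
    "knn": {"k": [1, 3, 5, 7], "leaf_size": [10, 30, 50]},
}


def sample_config(rng: random.Random) -> tuple[str, dict]:
    """Pick an algorithm first, then one value for each of its hyperparameters."""
    algorithm = rng.choice(sorted(SEARCH_SPACE))
    params = {name: rng.choice(values) for name, values in SEARCH_SPACE[algorithm].items()}
    return algorithm, params


def random_search(evaluate: Callable[[str, dict], float], budget: int, seed: int = 0):
    """Evaluate `budget` random configurations; return the best one and the full history."""
    rng = random.Random(seed)
    history = []
    for _ in range(budget):
        algorithm, params = sample_config(rng)
        history.append((evaluate(algorithm, params), algorithm, params))
    best = max(history, key=lambda entry: entry[0])
    return best, history


def toy_evaluate(algorithm: str, params: dict) -> float:
    """Hypothetical stand-in for a cross-validated score (ignores params)."""
    return {"svm": 0.80, "mlp": 0.85, "random_forest": 0.83, "knn": 0.78}[algorithm]


(best_score, best_algorithm, best_params), history = random_search(toy_evaluate, budget=20)
print(best_algorithm, best_params, best_score)
```

Returning the whole history, not just the winner, is what makes such a search inspectable after the fact.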
• 28. How to examine discrimination?
• 29. A College Admission Example: the overall acceptance rates of females and males differ (50% vs. 42%). Seems unfair?
• 30. A College Admission Example, split by score: the gap remains within the low-score group (33.3% vs. 26.7%) and the high-score group (75% vs. 65%).
• 31. A College Admission Example, split further by department (CS, EE) within each score level: the acceptance rates are equal (20% = 20%, 40% = 40%, 60% = 60%, 80% = 80%).
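The college admission example shows an aggregation effect related to Simpson's paradox: a gap in overall rates can vanish once applicants are conditioned on the relevant attributes. The sketch below uses invented counts (not the slides' data), with four hypothetical strata standing in for score level and department.

```python
from collections import defaultdict

# Hypothetical counts: (group, stratum, applicants, accepted).
# Within each stratum both groups are accepted at the same rate (20/40/60/80%),
# but group_a mostly applies to the high-acceptance strata.
APPLICATIONS = [
    ("group_a", "stratum_1", 10, 2), ("group_b", "stratum_1", 40, 8),
    ("group_a", "stratum_2", 10, 4), ("group_b", "stratum_2", 40, 16),
    ("group_a", "stratum_3", 40, 24), ("group_b", "stratum_3", 10, 6),
    ("group_a", "stratum_4", 40, 32), ("group_b", "stratum_4", 10, 8),
]


def acceptance_rates(rows, by_stratum: bool) -> dict:
    """Acceptance rate per group, optionally conditioned on the stratum."""
    applied: dict = defaultdict(int)
    accepted: dict = defaultdict(int)
    for group, stratum, n_applied, n_accepted in rows:
        key = (group, stratum) if by_stratum else group
        applied[key] += n_applied
        accepted[key] += n_accepted
    return {key: accepted[key] / applied[key] for key in applied}


overall = acceptance_rates(APPLICATIONS, by_stratum=False)
per_stratum = acceptance_rates(APPLICATIONS, by_stratum=True)
print(overall)  # {'group_a': 0.62, 'group_b': 0.38}
for stratum in ("stratum_1", "stratum_2", "stratum_3", "stratum_4"):
    print(stratum, per_stratum[("group_a", stratum)], per_stratum[("group_b", stratum)])
```

The overall rates differ by 24 points even though no stratum treats the groups differently, which is why the choice of conditioning attributes matters when judging discrimination.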
• 32. Two individuals who are similar with respect to a task are treated equally.
• 33. Visual Analysis of Discrimination in Machine Learning. Qianwen Wang¹, Zhenhua Xu¹, Huamin Qu¹, Shixia Liu², Zhutian Chen¹, Yong Wang¹ (²Tsinghua University).
• 34. Discriminatory Itemset.
• 36. Challenges in Analysis: long and complex definitions; intertwining relationships.
• 37. Long and Complex Definition.
• 38. Long and Complex Definition: an attribute matrix with itemsets and attributes as its two axes (e.g., 23 < raised hands < 50).
• 39. Intertwining Relationships: RippleSet.
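One way to read "discriminatory itemset" is a subgroup, defined by a conjunction of attribute conditions such as 23 < raised hands < 50, inside which the positive-outcome rate differs between protected groups. The sketch follows that reading; the rate-difference criterion, the 0.1 threshold, the field names, and the records are all assumptions for illustration, not the paper's definition.

```python
from typing import Callable

# An itemset is a conjunction of conditions on attributes.
Itemset = dict[str, Callable[[object], bool]]


def matches(record: dict, itemset: Itemset) -> bool:
    return all(condition(record[attr]) for attr, condition in itemset.items())


def positive_rate(records: list[dict], label: str) -> float:
    return sum(r[label] for r in records) / len(records)


def rate_gap(records: list[dict], itemset: Itemset, protected: str, label: str) -> float:
    """Positive rate of the first protected group minus the second,
    computed only over records covered by the itemset."""
    covered = [r for r in records if matches(r, itemset)]
    groups = sorted({r[protected] for r in covered})
    if len(groups) != 2:
        raise ValueError("expected exactly two protected groups among covered records")
    first, second = ([r for r in covered if r[protected] == g] for g in groups)
    return positive_rate(first, label) - positive_rate(second, label)


def is_discriminatory(records, itemset, protected, label, threshold=0.1) -> bool:
    # Hypothetical criterion: the absolute rate gap exceeds a fixed threshold.
    return abs(rate_gap(records, itemset, protected, label)) > threshold


students = [  # hypothetical records
    {"raised_hands": 30, "gender": "F", "passed": 1},
    {"raised_hands": 40, "gender": "F", "passed": 1},
    {"raised_hands": 35, "gender": "M", "passed": 0},
    {"raised_hands": 45, "gender": "M", "passed": 1},
    {"raised_hands": 80, "gender": "M", "passed": 1},  # not covered by the itemset
]
itemset = {"raised_hands": lambda v: 23 < v < 50}
print(rate_gap(students, itemset, "gender", "passed"))           # 0.5
print(is_discriminatory(students, itemset, "gender", "passed"))  # True
```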
• 40. Designing RippleSet. Legend: an item; items ∈ set A.
• 41. Designing RippleSet: items are split into exclusive regions such as (C ∩ D) \ (A ∪ B ∪ E), (A ∩ B ∩ C ∩ D) \ E, (A ∩ B ∩ C) \ (D ∪ E), (A ∩ B ∩ E) \ (C ∪ D), and (B ∩ C ∩ E) \ (A ∪ D).
• 42. Designing RippleSet: the same regions labelled by their sets: CD, ABCD, ABC, ABE, BCE.
• 43. Designing RippleSet: items belonging to the same sets are put together (e.g., all items of D), using a weighted DAG and a circle packing algorithm.
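The grouping step can be sketched by keying each item on the exact combination of sets it belongs to, so an item in C and D only lands in the region (C ∩ D) \ (A ∪ B ∪ E), labelled CD. Linking each combination to its strict supersets is our assumption about how a weighted DAG might be formed (weights here are group sizes); the items are hypothetical, and the circle-packing layout is not shown.

```python
from collections import defaultdict


def group_by_membership(item_sets: dict[str, set[str]]) -> dict[frozenset, list[str]]:
    """Key each item by the exact combination of sets it belongs to.

    An item in C and D but in none of A, B, E gets the key {"C", "D"},
    i.e. it lies in the region (C ∩ D) minus (A ∪ B ∪ E).
    """
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for item, sets in item_sets.items():
        groups[frozenset(sets)].append(item)
    return dict(groups)


def label(combination: frozenset) -> str:
    return "".join(sorted(combination))


def superset_edges(combinations) -> list[tuple[str, str]]:
    """Assumed DAG: an edge from each combination to every strict superset."""
    combos = list(combinations)
    return sorted((label(a), label(b)) for a in combos for b in combos if a < b)


items = {  # hypothetical items and their set memberships
    "i1": {"C", "D"}, "i2": {"A", "B", "C", "D"}, "i3": {"A", "B", "C"},
    "i4": {"A", "B", "E"}, "i5": {"B", "C", "E"}, "i6": {"C", "D"},
}
groups = group_by_membership(items)
weights = {label(c): len(members) for c, members in groups.items()}
print(weights)                 # {'CD': 2, 'ABCD': 1, 'ABC': 1, 'ABE': 1, 'BCE': 1}
print(superset_edges(groups))  # [('ABC', 'ABCD'), ('CD', 'ABCD')]
```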
• 46. Hypothesize about the effect of the common orientation of an object. Hypothesize about the effect of the surrounding environment of an object. What concepts has the model learned? Are the learned concepts always useful?
• 48. Black-box Analysis (input → model → prediction): examine hypotheses about how perturbations to inputs affect the ML model outputs. Examples: Prospector (Krause et al. 2016), What-If Tool (Wexler et al. 2019), Gamut (Hohman et al. 2019). Not statistically meaningful: only observations on individual predictions.
• 49. White-box Analysis: what has a neuron learned? Examples: Deconvnet (Zeiler and Fergus 2013), guided backpropagation (Springenberg et al. 2013). Not statistically meaningful: the depicted patterns provide largely a hunch rather than solid conclusions. Not efficient: it is impossible to examine all neurons.
• 50. Can we test concept-based hypotheses in an efficient and statistically meaningful way?
• 51. HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models. Qianwen Wang¹, William Alexander², Huamin Qu¹, Min Chen², Jack Pegg².
• 52. Concept-based Testing: two pairs of datasets, D and D+, where D+ adds extra data that contains the testing concept and noise takes its place in D, go through ML training to produce two ML models, M and M+.
• 53. Concept-based Testing: in ML testing, each model is evaluated on both datasets, giving 4 sets of results: R(M, D), R(M, D+), R(M+, D), and R(M+, D+).
• 54. Statistical Comparison: for two result sets R(a) and R(b), R(a) is either significantly lower than, significantly higher than, or insignificantly lower or higher than R(b). Comparing means alone is not enough because of many uncontrolled variables, e.g., µ(R(a)) = 0.878 > µ(R(b)) = 0.876.
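Why a raw comparison of means such as 0.878 > 0.876 is not enough can be illustrated with a significance test over repeated runs. This is a minimal sketch using a one-sided permutation test, which is our assumption (HypoML may use a different test); `r_a` and `r_b` are hypothetical scores from repeated training or testing runs.

```python
import random
from statistics import mean


def permutation_p_value(r_a, r_b, n_permutations=5_000, seed=0) -> float:
    """One-sided permutation test of 'mean(R(a)) > mean(R(b))'."""
    rng = random.Random(seed)
    observed = mean(r_a) - mean(r_b)
    pooled = list(r_a) + list(r_b)
    n_a = len(r_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of runs under "no difference"
        if mean(pooled[:n_a]) - mean(pooled[n_a:]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)  # add-one keeps p above zero


def compare(r_a, r_b, threshold=0.05) -> str:
    if permutation_p_value(r_a, r_b) < threshold:
        return "significantly higher"
    if permutation_p_value(r_b, r_a) < threshold:
        return "significantly lower"
    return "insignificantly lower or higher"


# Hypothetical accuracies from five repeated runs per condition.
r_a = [0.90, 0.91, 0.92, 0.93, 0.94]
r_b = [0.80, 0.81, 0.82, 0.83, 0.84]
print(compare(r_a, r_b))  # significantly higher
```

With only five runs per side the smallest attainable one-sided p-value is about 1/252, so very few repetitions limit how strong any conclusion can be.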
• 55. Top-down workflow: hypotheses, statistical comparison, and model results. (Figure: four model results, 0.8133, 0.8347, 0.8365, and 0.8356, for R(M, D), R(M, D+), R(M+, D), and R(M+, D+), with comparison p-values 0.446, 0.098, 0.256, 0.377, 0.061, and 0.079.) The hypotheses:
H1. The concept is useful to M+ and would be useful to M.
H2. The concept is harmful to M+ and would be harmful to M.
H3. M has learned the concept ξ adequately.
H4. M+ has learned the concept ξ adequately.
H5. The extra information in D+ has a positive effect on M.
H6. The extra information in D+ has a negative effect on M.
H7. The extra information in D+ has a positive effect on M+.
H8. The extra information in D+ has a negative effect on M+.
H9. Learning with Dm+ affects the M part of M+ positively.
H10. Learning with Dm+ affects the M part of M+ negatively.
H11. Learning with Dm+ affects the extra part of M+ positively.
H12. Learning with Dm+ affects the extra part of M+ negatively.
• 56. Visual Analysis of Hypotheses (p-value threshold: 0.05). Model results: R(M, D) = 0.8757, R(M, D+) = 0.6471, R(M+, D) = 0.6092, R(M+, D+) = 0.9188. Pairwise comparisons (p-value): R(M, D+) < R(M, D) (0.032); R(M+, D) < R(M, D) (0.002); R(M+, D+) > R(M, D) (0.002); R(M+, D+) > R(M, D+) (0.015); R(M+, D) < R(M, D+) (0.405); R(M+, D+) > R(M+, D) (0.002). Each hypothesis, H1 to H12, is marked Supported, Unproven, or Rejected; a hypothesis's status is based on the analyses in the rows.
• 57. Visual Analysis of Hypotheses: the same matrix, where each cell shows whether the analysis in the row rejects, supports, leaves unproven, is conditional on, or is unrelated to the hypothesis in the column.
• 58. Visual Analysis of Hypotheses: the same matrix, distinguishing statistically significant differences from insignificant ones.
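The comparison rows on these slides can be re-derived mechanically. The means and p-values below are copied from the slides; the sketch only checks that each stated direction holds and whether it passes the 0.05 threshold. How those outcomes map to Supported, Unproven, or Rejected for H1 to H12 is not spelled out here, so the sketch stops before that step, and the function name is hypothetical.

```python
# Values shown on the "Visual Analysis of Hypotheses" slides.
MODEL_RESULTS = {"R(M, D)": 0.8757, "R(M, D+)": 0.6471, "R(M+, D)": 0.6092, "R(M+, D+)": 0.9188}

COMPARISONS = [  # (left, operator, right, p-value)
    ("R(M, D+)", "<", "R(M, D)", 0.032),
    ("R(M+, D)", "<", "R(M, D)", 0.002),
    ("R(M+, D+)", ">", "R(M, D)", 0.002),
    ("R(M+, D+)", ">", "R(M, D+)", 0.015),
    ("R(M+, D)", "<", "R(M, D+)", 0.405),
    ("R(M+, D+)", ">", "R(M+, D)", 0.002),
]


def classify(comparisons, results, threshold=0.05):
    """Return (comparison, direction holds on the means, significance) per row."""
    rows = []
    for left, op, right, p in comparisons:
        if op == "<":
            holds = results[left] < results[right]
        else:
            holds = results[left] > results[right]
        status = "significant" if p < threshold else "insignificant"
        rows.append((f"{left} {op} {right}", holds, status))
    return rows


for row in classify(COMPARISONS, MODEL_RESULTS):
    print(row)
```

Only R(M+, D) < R(M, D+) (p = 0.405) falls above the threshold, even though its direction still holds on the means.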
• 59. Testing Concept: Color Space. Images can be represented in many color spaces (RGB, CIELAB, HSV, HSL, CMYK, YCrCb). How does the concept of color space influence the ML model? The experiment compares RGB with HSV.
• 60. Color Space: Experiment Design. D pairs RGB images with noise, and D+ pairs RGB with HSV; M and M+ are the corresponding models. How to merge the two inputs?
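Building the paired inputs might look like this, assuming images are nested lists of (r, g, b) tuples in [0, 1]; it uses the standard-library `colorsys.rgb_to_hsv`. Pairing D with uniform noise of the same shape is our reading of the slide, and the function names are hypothetical.

```python
import colorsys
import random


def rgb_image_to_hsv(image):
    """Convert rows of (r, g, b) tuples in [0, 1] to rows of (h, s, v) tuples."""
    return [[colorsys.rgb_to_hsv(r, g, b) for (r, g, b) in row] for row in image]


def make_paired_inputs(image, rng: random.Random):
    """Hypothetical pairing: D+ sample = (RGB, HSV); D sample = (RGB, noise of the same shape)."""
    hsv = rgb_image_to_hsv(image)
    noise = [[(rng.random(), rng.random(), rng.random()) for _ in row] for row in image]
    return (image, hsv), (image, noise)


image = [[(1.0, 0.0, 0.0), (0.5, 0.5, 0.5)]]  # hypothetical 1x2 image: red, gray
d_plus_sample, d_sample = make_paired_inputs(image, random.Random(0))
print(d_plus_sample[1])  # [[(0.0, 1.0, 1.0), (0.0, 0.0, 0.5)]]
```

Feeding noise of the same shape keeps M and M+ architecturally identical, so any difference can be attributed to the extra HSV information rather than to the extra input channels.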
• 61. Color Space: Experiment Design. Each model stacks Conv2D and Max Pooling layers followed by Flatten, Dropout, and Dense layers. The two input branches are merged at different positions (maxpool 1 or maxpool 2) and with different operations (add or max).
• 62. Color Space: Results (merged at maxpool 1). The information from another color space, HSV, contributes to the prediction of this model.
• 63. Color Space: Results (maxpool 2 vs. maxpool 1). The hypothesis testing results change when we merge at different positions.
• 64. Color Space: Results (add vs. max). Merging with different methods.
• 65. Data Collection, Model Development, Model Evaluation, Model Application, Problem Understanding: what can data visualization do to facilitate the application of ML in a specific domain?
• 66. DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery. Qianwen Wang, Nils Gehlenborg, Kexin Huang, Payal Chandak, Marinka Zitnik. Data about biomedicine: relationships among drugs, diseases, proteins, pathways, and effects form a heterogeneous graph. Node types include drug, disease, protein/gene, anatomy, molecular function, cellular component, biological process, phenotype/effect, and Reactome pathway; relations include indication, contraindication, off-label use, drug side effects, disease symptoms/phenotypes, and present/absent.
• 67. DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery. The challenges go beyond just providing explanations: (1) find a form of explanation that doctors can easily interpret in the context of biomedicine; (2) present the explanations in a scalable, effective, and steerable way. (Figure: a deep learning model uses known relationships to predict a new therapeutic use; the explanation shows the knowledge learned by this model and the reasons for this prediction.)
• 68. DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery
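One explanation form that fits a heterogeneous graph is a set of short typed paths linking the drug to the predicted disease. The sketch enumerates such paths with breadth-first search; the entities, relation names, and three-hop limit are hypothetical, and this is not DrugxAI's implementation.

```python
from collections import deque

# Hypothetical typed edges (head, relation, tail); node types are encoded in the names.
EDGES = [
    ("drug:X", "targets", "protein:P1"),
    ("protein:P1", "participates_in", "pathway:W1"),
    ("pathway:W1", "associated_with", "disease:Y"),
    ("drug:X", "indication", "disease:Z"),
    ("disease:Z", "shares_phenotype", "disease:Y"),
]


def build_graph(edges):
    """Adjacency lists with an inverse relation added for every edge."""
    graph = {}
    for head, relation, tail in edges:
        graph.setdefault(head, []).append((relation, tail))
        graph.setdefault(tail, []).append((f"inverse_{relation}", head))
    return graph


def explanation_paths(graph, source, target, max_hops=3):
    """Simple paths of at most max_hops edges from source to target, shortest first."""
    paths = []
    queue = deque([(source, [])])
    while queue:
        node, path = queue.popleft()
        if node == target and path:
            paths.append(path)
            continue
        if len(path) == max_hops:
            continue
        visited = {source} | {tail for _, _, tail in path}
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return paths


paths = explanation_paths(build_graph(EDGES), "drug:X", "disease:Y")
for path in paths:
    print(path)
```

Capping the hop count keeps the number of candidate explanations manageable, which speaks to the scalability challenge listed above.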
• 69. Two paths from Data to Decisions: Machine Learning and Data Visualization. (Figure: the data visualization loop of data, specification, visualization, image, perception, knowledge, and exploration.)
• 70. Data Visualization, after J. J. van Wijk, "The value of visualization", 2005: data and a specification feed the visualization, which produces an image; the user's perception of the image increases knowledge, and exploration modifies the specification.
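Van Wijk's paper formalizes this loop. As we recall it (treat the exact notation as approximate), the image I, knowledge K, and specification S evolve over time t as follows, where D is the data, V the visualization, P perception and cognition, and E exploration:

$$
\begin{aligned}
I(t) &= V(D, S, t) \\
\frac{dK}{dt} &= P(I, K) \\
\frac{dS}{dt} &= E(K)
\end{aligned}
$$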
• 71. Applying Machine Learning Advances to Data Visualization: A Survey of ML4VIS. Qianwen Wang, Huamin Qu, Zhutian Chen, Yong Wang.
• 72. Distribution of the 85 surveyed papers. By venue: VIS 50, HCI 15, ML 9, DMM 7, Other 4. By year: 2009: 1, 2010: 0, 2011: 1, 2012: 1, 2013: 1, 2014: 3, 2015: 1, 2016: 5, 2017: 9, 2018: 19, 2019: 28, 2020 onward: 16.
• 75. VIS-driven Data Preprocessing: Raw Data → Processed Data → VIS. Luo et al., Interactive Cleaning for Progressive Visualization through Composite Questions, 2020.
• 76. Data Presentation: Processed Data → VIS. Dibia and Demiralp, Data2Vis, 2018; Hu et al., VizML, 2018.
• 78. Style Imitation: Data → VIS of a specific style. Tang et al., PlotThread, 2020; Wu et al., MobileVisFixer, 2020; Smart et al., 2019.
• 79. DeepDrawing: A Deep Learning Approach to Graph Drawing. Yong Wang, Zhihua Jin, Qianwen Wang, Weiwei Cui, Tengfei Ma, Huamin Qu. Style imitation for graph data: learn graph drawing from graph drawing samples with a graph-based LSTM. The curved green arrows (real edges of graphs) explicitly reflect the actual graph structure; the dotted yellow arrows ("fake" edges) propagate the prior nodes' overall influence on the drawing of subsequent nodes.
• 80. DeepDrawing: A Deep Learning Approach to Graph Drawing. Yong Wang, Zhihua Jin, Qianwen Wang, Weiwei Cui, Tengfei Ma, Huamin Qu. Baseline model: a 4-layer bidirectional LSTM.
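A sketch of how real and fake edges could determine which earlier nodes feed each node's LSTM step, assuming nodes are processed in breadth-first order and each fake edge links a node to its predecessor in that order. This is our reading of the slide, not the paper's code; the function names and the example graph are hypothetical, and the graph must be connected.

```python
from collections import deque


def bfs_order(adjacency: dict[str, set[str]], start: str) -> list[str]:
    """Breadth-first node order; neighbors are visited alphabetically for determinism."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in sorted(adjacency[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


def lstm_inputs(adjacency: dict[str, set[str]], start: str):
    """For each node (in BFS order), the earlier nodes whose states it would receive.

    real: graph neighbors that appear earlier in the order.
    fake: the immediately preceding node, unless it is already a real input.
    Assumes the graph is connected, so every node gets an order position.
    """
    order = bfs_order(adjacency, start)
    position = {node: i for i, node in enumerate(order)}
    inputs = {}
    for i, node in enumerate(order):
        real = sorted(n for n in adjacency[node] if position[n] < i)
        fake = [order[i - 1]] if i > 0 and order[i - 1] not in real else []
        inputs[node] = {"real": real, "fake": fake}
    return order, inputs


graph = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}  # hypothetical 4-cycle
order, inputs = lstm_inputs(graph, "a")
print(order)        # ['a', 'b', 'c', 'd']
print(inputs["c"])  # {'real': ['a'], 'fake': ['b']}
```

Node c has no real edge to b, so without the fake edge the drawing of b would not influence c; the fake chain gives every node access to the layout decided so far.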
• 81. VIS Perception: VIS → data, style, insights. Bylinskii et al. 2017; Poco et al. 2017; Kafle et al. 2018.
• 82. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline. Zhutian Chen, Yun Wang, Qianwen Wang, Yong Wang, and Huamin Qu. VIS Perception: a bitmap visualization is parsed with Mask R-CNN, followed by post-processing based on GrabCut, into a visualization specification such as "encoding": { "x": { "field": "sale", "scale": { "bandSize": 30 }, "type": "quantitative" ..
• 83. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline. Zhutian Chen, Yun Wang, Qianwen Wang, Yong Wang, and Huamin Qu. VIS Perception: bitmap visualization → visualization specification.
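For readers unfamiliar with the target format, here is a minimal, complete Vega-Lite bar-chart specification built in Python. It is a sketch under assumptions: the data and the "month" field are hypothetical, the v5 schema URL is our choice, and the `scale.bandSize` property in the slide's fragment appears to come from an earlier Vega-Lite version, so it is left out.

```python
import json


def bar_chart_spec(rows: list[dict], value_field: str, category_field: str) -> dict:
    """Minimal Vega-Lite bar chart: quantitative values on x, categories on y."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": value_field, "type": "quantitative"},
            "y": {"field": category_field, "type": "nominal"},
        },
    }


rows = [{"month": "Jan", "sale": 30}, {"month": "Feb", "sale": 45}]  # hypothetical data
spec = bar_chart_spec(rows, value_field="sale", category_field="month")
print(json.dumps(spec, indent=2))
```

A perception model that extracts a specification from a bitmap would, in effect, have to recover the mark type, the fields, and their encodings shown here.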
• 84. VIS Interaction: VIS → user action → VIS. Chen et al. 2020; Ottley et al. 2019.
• 85. Surveyed ML4VIS papers. Each paper is categorized by visualization process (VIS-driven Data Processing, Data Presentation, Insight Communication, Style Imitation, VIS Perception, VIS Interaction) and by machine learning task (Clustering, Dimension Reduction, Generation, Classification, Regression, Semi-supervised Learning, Reinforcement Learning).

| ID | Paper | Venue |
|---|---|---|
| 1 | Gotz and Wen [86] | IUI 2009 |
| 2 | Savva et al. [107] | UIST 2011 |
| 3 | Key et al. [11] | SIGMOD 2012 |
| 4 | Steichen et al. [84] | IUI 2013 |
| 5 | Brown et al. [62] | TVCG 2014 |
| 6 | Lalle et al. [83] | IUI 2014 |
| 7 | Toker et al. [12] | IUI 2014 |
| 8 | Sedlmair and Aupetit [13] | CGF 2015 |
| 9 | Mutlu et al. [14] | TiiS 2016 |
| 10 | Aupetit and Sedlmair [95] | PVis 2016 |
| 11 | Siegel et al. [102] | ECCV 2016 |
| 12 | Kembhavi et al. [92] | ECCV 2016 |
| 13 | Al-Zaidy et al. [15] | AAAI 2016 |
| 14 | Poco et al. [88] | VIS 2017 |
| 15 | Kwon et al. [74] | VIS 2017 |
| 16 | Bylinskii et al. [64] | UIST 2017 |
| 17 | Saha et al. [117] | IJCAI 2017 |
| 18 | Kruiger et al. [16] | EuroVis 2017 |
| 19 | Poco and Heer [89] | EuroVis 2017 |
| 20 | Jung et al. [99] | CHI 2017 |
| 21 | Bylinskii et al. [100] | arXiv 2017 |
| 22 | Al-Zaidy and Giles [17] | AAAI 2017 |
| 23 | Siddiqui et al. [61] | VLDB 2018 |
| 24 | Gramazio et al. [85] | VIS 2018 |
| 25 | Moritz et al. [18] | VIS 2018 |
| 26 | Berger et al. [68] | VIS 2018 |
| 27 | Wang et al. [53] | VIS 2018 |
| 28 | Haehn et al. [19] | VIS 2018 |
| 29 | Luo et al. [57] | SIGMOD 2018 |
| 30 | Milo and Somech [80] | KDD 2018 |
| 31 | Zhou et al. [20] | IJCAI 2018 |
| 32 | Kahou et al. [101] | ICLR 2018 |
| 33 | Luo et al. [65] | ICDE 2018 |
| 34 | Fan and Hauser [79] | EuroVis 2018 |
| 35 | Chegini et al. [96] | EuroVis 2018 |
| 36 | Kafle et al. [63] | CVPR 2018 |
| 37 | Kim et al. [106] | CVPR 2018 |
| 38 | Battle et al. [108] | CHI 2018 |
| 39 | Dibia and Demiralp [54] | CGA 2018 |
| 40 | Haleem et al. [94] | CGA 2018 |
| 41 | Madan et al. [103] | arXiv 2018 |
| 42 | Yu and Silva [82] | VIS 2019 |
| 43 | He et al. [69] | VIS 2019 |
| 44 | Chen et al. [59] | VIS 2019 |
| 45 | Han and Wang [67] | VIS 2019 |
| 46 | Chen et al. [55] | VIS 2019 |
| 47 | Kwon and Ma [75] | VIS 2019 |
| 48 | Wang et al. [2] | VIS 2019 |
| 49 | Han et al. [120] | VIS 2019 |
| 50 | Wall et al. [111] | VIS 2019 |
| 51 | Fujiwara et al. [118] | VIS 2019 |
| 52 | Fu et al. [3] | VIS 2019 |
| 53 | Porter et al. [21] | VIS 2019 |
| 54 | Jo and Seo [119] | VIS 2019 |
| 55 | Ma et al. [93] | VIS 2019 |
| 56 | Wang et al. [73] | VIS 2019 |
| 57 | Cui et al. [56] | VIS 2019 |
| 58 | Chen et al. [5] | VIS 2019 |
| 59 | Wang et al. [22] | VIS 2019 |
| 60 | Smart et al. [58] | VIS 2019 |
| 61 | Huang et al. [104] | VIS 2019 |
| 62 | Hong et al. [23] | PacificVis 2019 |
| 63 | Fan and Hauser [122] | EuroVis 2019 |
| 64 | Ottley et al. [60] | EuroVis 2019 |
| 65 | Abbas et al. [121] | EuroVis 2019 |
| 66 | Kassel and Rohs [24] | EuroVis 2019 |
| 67 | Hu et al. [66] | CHI 2019 |
| 68 | Fan and Hauser [25] | CGA 2019 |
| 69 | Kafle et al. [26] | arXiv 2019 |
| 70 | Mohammed [27] | VLDB 2020 |
| 71 | Zhang et al. [90] | VIS 2020 |
| 72 | Wu et al. [77] | VIS 2020 |
| 73 | Tang et al. [76] | VIS 2020 |
| 74 | Qian et al. [28] | VIS 2020 |
| 75 | Wang et al. [29] | VIS 2020 |
| 76 | Fosco et al. [112] | UIST 2020 |
| 77 | Giovannangeli et al. [139] | PacificVis 2020 |
| 78 | Liu et al. [105] | PacificVis 2020 |
| 79 | Luo et al. [52] | ICDE 2020 |
| 80 | Lekschas et al. [113] | EuroVis 2020 |
| 81 | Zhao et al. [30] | CHI 2020 |
| 82 | Lai et al. [31] | CHI 2020 |
| 83 | Kim et al. [32] | CHI 2020 |
| 84 | Lu et al. [33] | CHI 2020 |
| 85 | Zhou et al. [109] | arXiv 2020 |
• 86. Classification is the most widely used ML task.
• 87. This might be caused by the success of deep learning in computer vision tasks
• 88. We need to better embrace the diversity of ML techniques.
• 89. ML4VIS: Opportunities & Challenges Public High-quality Datasets & Benchmark Tasks Visualization-Tailored Machine Learning User-Friendly ML4VIS
• 90. ML4VIS: Opportunities & Challenges. Public High-quality Datasets & Benchmark Tasks • Most papers constructed their own datasets due to the lack of public visualization datasets • Dataset quality may endanger the validity of the resulting ML models; e.g., DeepEye [Luo et al. 2019] learns to classify "good"/"bad" visualizations from training examples labelled by 100 students • Benchmark tasks for ML4VIS remain unclear
• 91. ML4VIS: Opportunities & Challenges. Visualization-Tailored Machine Learning • Most ML4VIS studies directly apply general techniques developed in the field of ML • General ML techniques do not always suit the specific problems in visualization
• 92. ML4VIS: Opportunities & Challenges User-friendly ML4VIS • The employment of ML not only provides opportunities but also poses new challenges in designing visualizations • Some ML4VIS studies have discussed the usability issues of ML4VIS, but these suggestions are scattered among different papers • Future studies are needed to help designers better understand user behaviours and expectations in this new ML4VIS scenario https://qarea.com/blog/5-tips-for-creating-user-friendly-interface
• 93. Machine Learning or Data Visualization? Machine Learning + Data Visualization + Humans. (Figure: tasks placed by amount of information, from few to large, and by task definition, from fuzzy to clear, with regions for the human head, pure machine learning, and pure data visualization.) There is no panacea. We need a better combination of the power of visualization, machine learning, and human users: • How to split tasks • How to dynamically modify the split based on user preference and expertise • How to design novel algorithms and visualizations for the collaboration