From Data to Decisions, a Mixed Path of Data Visualization and Machine Learning


  1. 1. From Data to Decisions, A Mixed Path of Data Visualization and Machine Learning. Qianwen Wang. [Background figure: a hypothesis-testing matrix with model results R(M, D), R(M, D+), R(M+, D), R(M+, D+), pairwise p-values, and hypotheses H1–H12.]
  2. 2. Advisor: Huamin Qu; Advisor: Nils Gehlenborg. [Timeline: 2015, 2017, 2019, 2020.] Machine Learning, Data Visualization, Human Computer Interaction
  3. 3. Machine Learning and Data Visualization, What are we talking about?
  4. 4. Machine Learning Data Visualization • An ability to learn from data, extract patterns, and make decisions with minimum human intervention • An accessible way for humans to interpret data, identify patterns, and make data-driven decisions
  5. 5. Machine Learning Data Visualization Data Decisions
  6. 6. http://querytreeapp.com/blog/make-sense-with-data-visualization/ Data Visualization Machine Learning
  7. 7. Artificial intelligence is still human intelligence Data Visualization Machine Learning
  8. 8. Machine Learning Data Visualization Data Decisions
  9. 9. Data Collection Model Development Model Evaluation Model Application Problem Understanding Machine Learning Data Visualization Data Decisions Data Specification Knowledge Visualization Perception Exploration data visualization user image modify specification increase knowledge • VIS4ML • ML4VIS • A better collaboration between ML and VIS
  10. 10. Data Collection Model Development Model Evaluation Model Application Problem Understanding Human intervention is needed at each step How can data visualization facilitate the process?
  11. 11. Data Collection Model Development Model Evaluation Model Application Problem Understanding How to choose a suitable model?
  12. 12. Overwhelmed by the Variety 12 [Figure: a cloud of many DNN labels.] Deep Neural Network (DNN)
  13. 13. DNN Genealogy 13 [Figure: the same DNN labels organized into a genealogy.]
  14. 14. Visual Genealogy of Deep Neural Networks. Qianwen Wang1, Jun Yuan2, Shuxin Chen2, Hang Su2, Huamin Qu1, and Shixia Liu2. Tsinghua University
  15. 15. Visualization Module 15 Architecture Evolution Performance http://dnn.hkustvis.org/
  16. 16. Case: Investigate Evolution Patterns 17
  17. 17. Case: Investigate Evolution Patterns 18
  18. 18. 19 How to combine skip connections with the main branch? Gate, Addition, Concatenation, A mixture (+, ||). Case: Investigate Evolution Patterns
  19. 19. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, Huamin Qu
  20. 20. 21 Developing ML Models A model for my task SVM MLP Random Forest KNN . . . . . . learning rate = ? # layers = ? batch size = ? # neurons = ? . . . . . .
  21. 21. Automated Machine Learning: make it automated! [Figure: the search space of algorithms and hyperparameters; SVM / Support Vector Machine (kernel function = ?), MLP / Neural Network (hidden layers = ?, learning rate = ?, activation = ?), Random Forest (max depth = ?), K Nearest Neighbor (leaf size = ?, min samples leaf = ?, min samples split = ?), Linear Regression, …; learning rate = ?, # layers = ?, batch size = ?, # neurons = ?, …] 22
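What an AutoML system automates, in its simplest form, is a joint search over algorithms and their hyperparameters. A minimal sketch with plain random search over scikit-learn models follows; it is an illustration of the idea, not ATM/ATMSeer's actual method or code, and the search space here is made up.

```python
# Minimal sketch (not ATMSeer's actual implementation): a joint random search
# over algorithms and their hyperparameters, the kind of loop AutoML automates.
import random
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

SEARCH_SPACE = {
    SVC: {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    MLPClassifier: {"hidden_layer_sizes": [(32,), (64, 64)], "learning_rate_init": [1e-3, 1e-2]},
    RandomForestClassifier: {"n_estimators": [50, 200], "max_depth": [5, None]},
    KNeighborsClassifier: {"n_neighbors": [3, 5, 15], "leaf_size": [15, 30]},
}

X, y = load_digits(return_X_y=True)
best = (None, -1.0)
for _ in range(20):                                            # trial budget
    algo, grid = random.choice(list(SEARCH_SPACE.items()))     # sample an algorithm
    params = {k: random.choice(v) for k, v in grid.items()}    # sample a configuration
    score = cross_val_score(algo(**params), X, y, cv=3).mean()
    if score > best[1]:
        best = ((algo.__name__, params), score)
print("best configuration:", best)
```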
  22. 22. controllability transparency …
  23. 23. 24 Overview
  24. 24. 25 Algorithm Level HyperPartition Level Hyperparameter Level
  25. 25. Data Collection Model Development Model Evaluation Model Application Problem Understanding Can we conduct behavioral testing of ML models that goes beyond accuracy?
  26. 26. How to examine Discrimination? 28
  27. 27. A College Admission Example 29 accepted females accepted males rejected 50%>42% Seems unfair?
  28. 28. A College Admission Example 30 accepted females accepted males rejected Low score High score 33.3%>26.7% 75%>65%
  29. 29. A College Admission Example 31 accepted females accepted males rejected 20%=20% 40%=40% 60%=60% 80%=80% Low score High score CS EE CS EE
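The arithmetic behind this example can be checked directly. The sketch below uses made-up applicant counts, chosen only to reproduce the flavor of the slides rather than their actual numbers: the overall acceptance rates differ by gender, yet within each department the rates are identical.

```python
# Toy check of the admission example (numbers are invented, not the slides' data):
# an overall acceptance gap between groups can vanish once you condition on the
# attribute (department) that actually drives the decision.
import pandas as pd

df = pd.DataFrame({
    "gender":   ["F"] * 10 + ["M"] * 20,
    "dept":     ["CS"] * 5 + ["EE"] * 5 + ["CS"] * 15 + ["EE"] * 5,
    "accepted": [1, 1, 1, 1, 0,  1, 0, 0, 0, 0] + [1] * 12 + [0] * 3 + [1, 0, 0, 0, 0],
})

print(df.groupby("gender")["accepted"].mean())            # overall: F 0.50 vs M 0.65
print(df.groupby(["dept", "gender"])["accepted"].mean())  # per dept: CS 0.8 = 0.8, EE 0.2 = 0.2
```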
  30. 30. 32 Two individuals who are similar with respect to a task are treated equally
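One common formalization of this statement is Dwork et al.'s "fairness through awareness" Lipschitz condition; the slide states the intuition, and this is not necessarily the exact definition used in the paper.

```latex
% One common formalization (Dwork et al., "Fairness Through Awareness"): a model M is
% individually fair w.r.t. a task-specific similarity metric d if similar individuals
% receive similar outcomes. Not necessarily the exact definition used in this work.
\mathrm{dist}\bigl(M(x),\, M(y)\bigr) \;\le\; d(x, y) \qquad \text{for all individuals } x, y
```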
  31. 31. Visual Analysis of Discrimination in Machine Learning. Qianwen Wang1, Zhenhua Xu1, Huamin Qu1, Shixia Liu2, Zhutian Chen1, Yong Wang1. Tsinghua University
  32. 32. 34 Discriminatory Itemset. [Figure: example records and an itemset of attribute–value pairs defining two subgroups, A and B.]
  33. 33. 35 Discriminatory Itemset
  34. 34. Challenges in Analysis 36 Long and Complex Definition. Intertwining Relationship. [Figure: example itemsets, each a long list of attribute–value pairs, with several itemsets overlapping one another.]
  35. 35. Long and Complex Definition 37
  36. 36. Long and Complex Definition 38 23 < raised hands < 50 Attribute Matrix: Itemset × Attribute
  37. 37. Intertwining Relationships 39 [Figure: several overlapping itemsets sharing attribute–value pairs.] RippleSet
  38. 38. Designing RippleSet 40 An item; Items ∈ set A
  39. 39. 41 An item; Items ∈ set A. (C∩D)\(A∪B∪E), (A∩B∩C∩D)\E, (A∩B∩C)\(D∪E), (A∩B∩E)\(C∪D), (B∩C∩E)\(A∪D). Designing RippleSet
  40. 40. 42 An item; Items ∈ set A. (C∩D)\(A∪B∪E) = CD, (A∩B∩C∩D)\E = ABCD, (A∩B∩C)\(D∪E) = ABC, (A∩B∩E)\(C∪D) = ABE, (B∩C∩E)\(A∪D) = BCE. Designing RippleSet
  41. 41. 43 An item; Items ∈ set A. Regions ABC, ABE, BCE, ABCD, CD. Items belonging to the same set are put together. Weighted DAG; circle packing algorithm. Designing RippleSet
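The grouping step behind this design can be sketched in a few lines: bucket items by the exact combination of sets they belong to, which yields regions such as ABC, ABE, BCE, ABCD, and CD. The sets below are made up for illustration, and the weighted-DAG ordering and circle-packing layout from the actual design are omitted.

```python
# Sketch of the grouping step behind RippleSet-style layouts (not the paper's code):
# bucket items by the exact combination of sets they belong to, so that e.g. the
# region "ABC" holds items in A, B and C but in no other set. The actual RippleSet
# layout then orders these regions with a weighted DAG and packs them with a
# circle-packing algorithm; that part is omitted here.
from collections import defaultdict

sets = {
    "A": {1, 2, 3, 4},
    "B": {2, 3, 4, 5},
    "C": {3, 4, 5, 6},
    "D": {4, 6, 7},
    "E": {2, 5, 8},
}

regions = defaultdict(list)
for item in set().union(*sets.values()):
    membership = "".join(name for name, members in sorted(sets.items()) if item in members)
    regions[membership].append(item)

for membership, items in sorted(regions.items()):
    print(membership, "->", items)   # e.g. "ABC" -> items in exactly A, B, and C
```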
  42. 42. 44
  43. 43. Data Collection Model Development Model Evaluation Model Application Problem Understanding Can we conduct behavioral testing of ML models that goes beyond accuracy?
  44. 44. 46 Hypothesize about the effect of the Common Orientation of an object Hypothesize about the effect of the Surrounding environment of an object What concepts has the model learned? Are the learned concepts always useful?
  45. 45. Black-box Analysis 47 input model prediction
  46. 46. Black-box Analysis 48 Prospector Krause et al. 2016 model prediction input What-if tool Wexler et al. 2019 Gamut Hohman et al. 2019 examine hypotheses about how perturbations to inputs affect the ML model outputs Not statistically meaningful: • Only observations on individual predictions
  47. 47. White-box Analysis 49 Deconvnet Zeiler and Fergus 2013 Guided back propagation Springenberg et al. 2013 What has a neuron learned? Not statistically meaningful: • The depicted patterns provide largely a hunch rather than solid conclusions Not efficient: • It is impossible to examine all neurons
  48. 48. Can we test concept-based hypotheses in an efficient and statistically meaningful way? 50
  49. 49. HypoML: Visual Analysis for Hypothesis-based Evaluation of Machine Learning Models. Qianwen Wang1, William Alexander2, Huamin Qu1, Min Chen2, Jack Pegg2
  50. 50. Concept-based Testing 52 [Figure: ML Training with 2 ML models (M, M+) and 2 pairs of datasets (D, D + noise, D+); the extra data in D+ contains the testing concept.]
  51. 51. Concept-based Testing 53 ML Training: 2 ML models (M, M+) trained with 2 pairs of datasets (D, D+; the extra data in D+ contains the testing concept). ML Testing: 4 sets of results, R(M, D), R(M, D+), R(M+, D), R(M+, D+), compared pairwise as R(a) vs. R(b).
  52. 52. Statistical Comparison 54 µ(R(a)) = 0.878 > µ(R(b)) = 0.876, but many uncontrolled variables… For each pair, decide whether R(a) is significantly lower than, significantly higher than, or insignificantly lower or higher than R(b).
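A sketch of the pairwise comparison this slide calls for, using a Welch t-test over repeated-run accuracies at the 0.05 threshold. This is one plausible choice of test rather than necessarily the exact statistics used in HypoML, and the accuracy lists are made up.

```python
# Sketch of the pairwise comparison step (one plausible realization, not necessarily
# HypoML's exact statistical test): compare two sets of repeated-run accuracies,
# R(a) and R(b), and label the relation at a p-value threshold of 0.05.
from scipy.stats import ttest_ind

def compare(results_a, results_b, alpha=0.05):
    """Return 'significantly higher/lower' or 'insignificantly different'."""
    stat, p = ttest_ind(results_a, results_b, equal_var=False)  # Welch's t-test
    if p >= alpha:
        return f"insignificantly different (p={p:.3f})"
    relation = "higher" if stat > 0 else "lower"
    return f"significantly {relation} (p={p:.3f})"

# accuracies from repeated training/testing runs (made-up numbers)
r_m_plus_d_plus = [0.91, 0.92, 0.90, 0.93, 0.92]   # R(M+, D+)
r_m_d_plus      = [0.64, 0.66, 0.65, 0.63, 0.66]   # R(M, D+)
print("R(M+, D+) vs R(M, D+):", compare(r_m_plus_d_plus, r_m_d_plus))
```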
  53. 53. Top-down workflow 55 Model Results: 0.8133, 0.8347, 0.8365, 0.8356 (R(M+, D+), R(M+, D), R(M, D+), R(M, D)). Statistical Comparison p-values: 0.446, 0.098, 0.256, 0.377, 0.061, 0.079. Hypotheses: H1. The concept is useful to M+ and would be useful to M. H2. The concept is harmful to M+ and would be harmful to M. H3. M has learned the concept ξ adequately. H4. M+ has learned the concept ξ adequately. H5. The extra information in D+ has a positive effect on M. H6. The extra information in D+ has a negative effect on M. H7. The extra information in D+ has a positive effect on M+. H8. The extra information in D+ has a negative effect on M+. H9. Learning with Dm+ affects the M part of M+ positively. H10. Learning with Dm+ affects the M part of M+ negatively. H11. Learning with Dm+ affects the extra part of M+ positively. H12. Learning with Dm+ affects the extra part of M+ negatively.
  54. 54. Visual Analysis of Hypotheses 56 [Figure: hypothesis matrix, p-value threshold 0.05. Model results: R(M, D) = 0.8757, R(M, D+) = 0.6471, R(M+, D) = 0.6092, R(M+, D+) = 0.9188. Pairwise comparisons: R(M, D+) < R(M, D), p = 0.032; R(M+, D) < R(M, D), p = 0.002; R(M+, D+) > R(M, D), p = 0.002; R(M+, D+) > R(M, D+), p = 0.015; R(M+, D) < R(M, D+), p = 0.405; R(M+, D+) > R(M+, D), p = 0.002. Hypotheses H1–H12 as on the previous slide.] A hypothesis is Supported, Unproven, or Rejected based on the analyses in its row.
  55. 55. Visual Analysis of Hypotheses 57 [Figure: the same hypothesis matrix as the previous slide.] The analysis in a row rejects, supports, unproves, is conditional on, or is unrelated to the hypothesis in a column.
  56. 56. Visual Analysis of Hypotheses 58 [Figure: the same hypothesis matrix.] The difference is statistically significant or insignificant.
  57. 57. Testing Concept: Color Space 64 RGB CIELAB HSV HSL CMYK YCrC How does the concept Color Space influence the ML model? RGB HSV
  58. 58. 65 Color Space: Experiment Design. M+ is trained with D+ (RGB + HSV); M is trained with D (RGB + noise). How to merge?
  59. 59. 66 Color Space: Experiment Design, how to merge. [Figure: four two-branch CNN variants built from Conv2D, Max Pooling, Flatten, Dropout, and Dense layers; the two color-space branches are merged with add or max, at maxpool 1 or maxpool 2.]
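One of the four variants can be sketched with the Keras functional API, merging the RGB and HSV branches with add after the first pooling block. Filter counts, input shape, and the number of classes are placeholders rather than the experiment's actual settings.

```python
# Minimal sketch of one merge variant (add after maxpool 1); layer sizes, input
# shape, and class count are placeholders, not the experiment's actual settings.
import tensorflow as tf
from tensorflow.keras import layers

def branch(inputs):
    """Conv2D -> MaxPooling block applied to one color-space input."""
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
    return layers.MaxPooling2D()(x)

rgb_in = layers.Input(shape=(32, 32, 3), name="rgb")
hsv_in = layers.Input(shape=(32, 32, 3), name="hsv")     # or noise for the control model M

merged = layers.Add()([branch(rgb_in), branch(hsv_in)])  # swap in layers.Maximum() for "max"
x = layers.Conv2D(64, 3, activation="relu", padding="same")(merged)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model([rgb_in, hsv_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```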
  60. 60. 67 Color Space: Results. maxpool 1: the information from the other color space (HSV) contributes to the prediction of this model.
  61. 61. 68 maxpool 2 maxpool 1 The hypothesis testing results change when we merge at different positions Color Space: Results
  62. 62. 69 add max Merge using different methods Color Space: Results
  63. 63. Data Collection Model Development Model Evaluation Model Application Problem Understanding What can data visualization do to facilitate the application of ML in a specific domain ?
  64. 64. DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery. Qianwen Wang, Nils Gehlenborg, Kexin Huang, Payal Chandak, Marinka Zitnik. 71 Data about biomedicine: relationships about drugs, diseases, proteins, pathways, and effects as a heterogeneous graph. Node types: Anatomy, Molecular Function, Cellular Component, Biological Process, Phenotype/Effect, Drug, Disease, Reactome Pathway, Protein/Gene. Edge types: indication, contraindication, off-label use, drug side effects, disease symptoms/phenotypes, present, absent.
  65. 65. DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery. The challenges are more than just providing explanations: 1) find a form of explanation that can be easily interpreted by doctors in the context of biomedicine; 2) present the explanations in a scalable, effective, and steerable way. [Figure: known relationships → deep learning → new therapeutic use; knowledge learned by this model; reasons for this prediction.]
  66. 66. DrugxAI: Interactive Visualization for Explainable AI in Drug Discovery
  67. 67. Data Specification Knowledge Visualization Perception Exploration data visualization user image modify specification increase knowledge Machine Learning Data Visualization Data Decisions
  68. 68. Data Specification Knowledge Visualization Perception Exploration data visualization user image modify specification increase knowledge J. J. Van Wijk, “The value of visualization”, 2005 Data Visualization
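The diagram corresponds to the operational model in van Wijk (2005), which can be written compactly as below; the notation is paraphrased from that paper rather than taken from the slide.

```latex
% The operational model the diagram depicts (van Wijk, "The Value of Visualization", 2005);
% notation paraphrased: D data, S specification, I image, K knowledge, and V, P, E the
% visualization, perception, and exploration processes.
\begin{align*}
  I(t) &= V(D, S, t)          && \text{visualization renders the data under the current specification}\\
  \frac{dK}{dt} &= P(I, K, t) && \text{perceiving the image increases the user's knowledge}\\
  \frac{dS}{dt} &= E(K, t)    && \text{exploration driven by knowledge modifies the specification}
\end{align*}
```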
  69. 69. Applying Machine Learning Advances to Data Visualization: A Survey of ML4VIS. Qianwen Wang, Huamin Qu, Zhutian Chen, Yong Wang
  70. 70. [Bar charts: surveyed ML4VIS papers by venue category (Other 4, DMM 7, ML 9, HCI 15, VIS 50) and by year from 2009 to 2019 (1, 0, 1, 1, 1, 3, 1, 5, 9, 19, 28), plus 16 so far in 2020.]
  71. 71. https://ml4vis.github.io
  72. 72. VIS-driven Data Processing Insight Style Data Visualization VIS Interaction VIS Perception Data Presentation Insight Communication Style Imitation USER VIS DATA User Action
  73. 73. Raw Data Processed Data VIS-driven Data Preprocessing VIS Luo et al. Interactive Cleaning for Progressive Visualization through Composite Questions, 2020
  74. 74. Processed Data VIS Data Presentation Dibia and Demiralp, Data2Vis, 2018 Hu et al., VizML, 2018
  75. 75. Insights VIS Insight Communication Qian et al. 2020 Wang et al. 2020
  76. 76. Data VIS of A Specific Style Style Imitation VIS Tang et al, PlotThread, 2020 Wu et al., MobileVisFixer, 2020 Smart et al., 2019
  77. 77. DeepDrawing: A Deep Learning Approach to Graph Drawing Yong Wang, Zhihua Jin, Qianwen Wang, Weiwei Cui, Tengfei Ma, Huamin Qu Graph Data Style Imitation Graph Drawing Graph Drawing Samples The curved green arrows (real edges of graphs) explicitly reflect the actual graph structure The dotted yellow arrows (“fake” edges) propagate the prior nodes’ overall influence on the drawing of subsequent nodes A graph-based LSTM for the learning of graph drawing
  78. 78. DeepDrawing: A Deep Learning Approach to Graph Drawing Yong Wang, Zhihua Jin, Qianwen Wang, Weiwei Cui, Tengfei Ma, Huamin Qu Graph Data Style Imitation Graph Drawing Graph Drawing Samples Baseline Model: a 4-layer bi-directional LSTM
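The baseline named on the slide can be sketched in PyTorch: a 4-layer bi-directional LSTM that maps a sequence of node feature vectors to 2D coordinates. The feature and hidden sizes are placeholders, and DeepDrawing's graph-based LSTM itself is not reproduced here.

```python
# Sketch of the baseline described on the slide: a 4-layer bi-directional LSTM that
# maps a sequence of node feature vectors to 2D layout coordinates. Feature and
# hidden sizes are placeholders; DeepDrawing's graph-based LSTM is not shown here.
import torch
import torch.nn as nn

class BiLSTMDrawer(nn.Module):
    def __init__(self, node_feat_dim=16, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(node_feat_dim, hidden_dim, num_layers=4,
                            bidirectional=True, batch_first=True)
        self.to_xy = nn.Linear(2 * hidden_dim, 2)    # one (x, y) position per node

    def forward(self, node_feats):                   # (batch, num_nodes, node_feat_dim)
        hidden, _ = self.lstm(node_feats)            # (batch, num_nodes, 2 * hidden_dim)
        return self.to_xy(hidden)                    # (batch, num_nodes, 2)

model = BiLSTMDrawer()
coords = model(torch.randn(8, 30, 16))               # 8 graphs, 30 nodes each
print(coords.shape)                                  # torch.Size([8, 30, 2])
```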
  79. 79. VIS Data, Style, Insights VIS Perception Bylinskii et al. 2017 Poco et al. 2017 Kafle et al. 2018
  80. 80. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline Zhutian Chen, Yun Wang, Qianwen Wang, Yong Wang, and Huamin Qu VIS Perception Bitmap visualization → visualization specification: "encoding": { "x": { "field": "sale", "scale": { "bandSize": 30 }, "type": "quantitative" .. Mask-RCNN; post-processing based on GrabCut
  81. 81. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline Zhutian Chen, Yun Wang, Qianwen Wang, Yong Wang, and Huamin Qu VIS Perception Bitmap visualization → visualization specification: "encoding": { "x": { "field": "sale", "scale": { "bandSize": 30 }, "type": "quantitative" ..
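The specification fragment on the slide is truncated. A plausible completion, written as a Python dict in a Vega-Lite-like schema, is shown below; the mark type and the y channel are assumptions added for illustration, since only the x-encoding fragment appears on the slide.

```python
# A plausible completion of the slide's truncated specification, written as a Python
# dict in a Vega-Lite-like schema; the mark type and the y channel are assumptions
# added for illustration, only the x-encoding fragment is shown on the slide.
spec = {
    "mark": "bar",
    "encoding": {
        "x": {
            "field": "sale",
            "scale": {"bandSize": 30},
            "type": "quantitative",
        },
        "y": {"field": "year", "type": "ordinal"},
    },
}
```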
  82. 82. VIS VIS Interaction User Action VIS Chen et al. 2020 Ottley et al. 2019
  83. 83. [Table: the 85 surveyed ML4VIS papers, listed with venue and year (2009–2020) and tagged by the visualization process step(s) and ML task(s) involved.] Machine Learning Tasks: Clustering, Dimension Reduction, Generation, Classification, Regression, Semi-supervised Learning, Reinforcement Learning. Visualization Process: VIS-driven Data Processing, Data Presentation, Insight Communication, Style Imitation, VIS Perception, VIS Interaction
  84. 84. Classification is the most widely used
  85. 85. This might be caused by the success of deep learning in computer vision tasks
  86. 86. We need to better embrace the diversity of ML techniques.
  87. 87. ML4VIS: Opportunities & Challenges Public High-quality Datasets & Benchmark Tasks Visualization-Tailored Machine Learning User-Friendly ML4VIS
  88. 88. ML4VIS: Opportunities & Challenges Public High-quality Datasets & Benchmark Tasks • Most papers constructed their own datasets due to the lack of public visualization datasets • The dataset quality may endanger the validity of the obtained ML models, e.g., DeepEye [Luo et al. 2019] learns to classify "good"/"bad" visualizations based on the training examples labelled by 100 students • Benchmark tasks for ML4VIS remain unclear
  89. 89. ML4VIS: Opportunities & Challenges Visualization-Tailored Machine Learning • Most ML4VIS studies directly apply general ML techniques developed in the field of ML • General ML techniques do not always suit the specific problems in visualization well
  90. 90. ML4VIS: Opportunities & Challenges User-friendly ML4VIS • The employment of ML not only provides opportunities but also poses new challenges in designing visualizations • Some ML4VIS studies have discussed the usability issues of ML4VIS, but these suggestions are scattered among different papers • Future studies are needed to help designers better understand user behaviours and expectations in this new ML4VIS scenario https://qarea.com/blog/5-tips-for-creating-user-friendly-interface
  91. 91. Machine Learning or Data Visualization? Machine Learning + Data Visualization + Humans. [Figure: a 2D space with axes Amount of Information (few to large) and Task Definition (fuzzy to clear), positioning the human head, pure machine learning, and pure data visualization.] There is no panacea. A better combination between the power of visualization, machine learning, and human users: • How to split tasks • How to dynamically modify the splitting based on user preference and expertise • How to design novel algorithms & visualizations for the collaboration
  92. 92. Thanks! https://wangqianwen0418.github.io/ qianwen_wang@hms.harvard.edu [Summary figure: the ML pipeline (Problem Understanding, Data Collection, Model Development, Model Evaluation, Model Application), the Data → Decisions loop between Machine Learning and Data Visualization, and the ML4VIS framework (DATA, VIS, USER; VIS-driven Data Processing, Data Presentation, Insight Communication, Style Imitation, VIS Perception, VIS Interaction, User Action).]
