
Applying Machine Learning to Data Visualization: What, Why, Where, and How


Inspired by the great success of machine learning (ML), researchers have applied ML techniques to visualizations to achieve better design, development, and evaluation of visualizations. This branch of studies, known as ML4VIS, has gained increasing research attention in recent years. To successfully adapt ML techniques for visualizations, a structured understanding of the integration of ML4VIS is needed. In this paper, we systematically survey 88 ML4VIS studies, aiming to answer two motivating questions: "what visualization processes can be assisted by ML?" and "how can ML techniques be used to solve visualization problems?" This survey reveals seven main processes where the employment of ML techniques can benefit visualizations: Data Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS Interaction, VIS Reading, and User Profiling. The seven processes are related to existing visualization theoretical models in an ML4VIS pipeline, aiming to illuminate the role of ML-assisted visualization in general visualizations. Meanwhile, the seven processes are mapped into main learning tasks in ML to align the capabilities of ML with the needs in visualization. Current practices and future opportunities of ML4VIS are discussed in the context of the ML4VIS pipeline and the ML-VIS mapping. While more studies are still needed in the area of ML4VIS, we hope this paper can provide a stepping-stone for future exploration. A web-based interactive browser of this survey is available at https://ml4vis.github.io


Applying Machine Learning to Data Visualization: What, Why, Where, and How

  1. 1. Qianwen Wang, 2022.02.03. Applying Machine Learning Advances to Data Visualization: WHAT, WHY, WHERE, HOW
  2. 2. Survey table of the 88 ML4VIS studies: Current Practices, Trends, Challenges, Opportunities. Columns: ID, paper, venue; visualization processes (Data Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS Reading, VIS Interaction, User Profiling); ML tasks (Clustering, Dimension Reduction, Generative, Classification, Regression, Semi-supervised, Reinforcement). An X marks a process or ML task covered by the paper.
     ID | Paper | Venue | Marks
     1 | Sips et al. [1] | EuroVis 2009 | X X
     2 | Gotz and Wen [2] | IUI 2009 | X X
     3 | Savva et al. [3] | UIST 2011 | X X
     4 | Key et al. [4] | SIGMOD 2012 | X X
     5 | Steichen et al. [5] | IUI 2013 | X X
     6 | Brown et al. [6] | TVCG 2014 | X X
     7 | Lalle et al. [7] | IUI 2014 | X X
     8 | Toker et al. [8] | IUI 2014 | X X
     9 | Sedlmair and Aupetit [9] | CGF 2015 | X X
     10 | Mutlu et al. [10] | TiiS 2016 | X X
     11 | Aupetit and Sedlmair [11] | PVis 2016 | X X
     12 | Siegel et al. [12] | ECCV 2016 | X X
     13 | Kembhavi et al. [13] | ECCV 2016 | X X
     14 | Al-Zaidy et al. [14] | AAAI 2016 | X X
     15 | Pezzotti et al. [15] | TVCG 2016 | X X
     16 | Poco et al. [16] | VIS 2017 | X X
     17 | Kwon et al. [17] | VIS 2017 | X X
     18 | Bylinskii et al. [18] | UIST 2017 | X X
     19 | Saha et al. [19] | IJCAI 2017 | X X
     20 | Kruiger et al. [20] | EuroVis 2017 | X X
     21 | Poco and Heer [21] | EuroVis 2017 | X X
     22 | Jung et al. [22] | CHI 2017 | X X
     23 | Bylinskii et al. [23] | arxiv 2017 | X X X
     24 | Al-Zaidy and Giles [24] | AAAI 2017 | X X
     25 | Siddiqui et al. [25] | VLDB 2018 | X X X
     26 | Gramazio et al. [26] | VIS 2018 | X X
     27 | Moritz et al. [27] | VIS 2018 | X X X
     28 | Berger et al. [28] | VIS 2018 | X X
     29 | Wang et al. [29] | VIS 2018 | X X
     30 | Haehn et al. [30] | VIS 2018 | X X
     31 | Luo et al. [31] | SIGMOD 2018 | X X X
     32 | Milo and Somech [32] | KDD 2018 | X X
     33 | Zhou et al. [33] | IJCAI 2018 | X X
     34 | Kahou et al. [34] | ICLR 2018 | X X
     35 | Luo et al. [35] | ICDE 2018 | X X
     36 | Fan and Hauser [36] | EuroVis 2018 | X X
     37 | Chegini et al. [37] | EuroVis 2018 | X X
     38 | Kafle et al. [38] | CVPR 2018 | X X X
     39 | Kim et al. [39] | CVPR 2018 | X X
     40 | Battle et al. [40] | CHI 2018 | X X
     41 | Dibia and Demiralp [41] | CGA 2018 | X X
     42 | Haleem et al. [42] | CGA 2018 | X X
     43 | Madan et al. [43] | arxiv 2018 | X X X
     44 | Yu and Silva [44] | VIS 2019 | X X
     45 | He et al. [45] | VIS 2019 | X X
     46 | Chen et al. [46] | VIS 2019 | X X
     47 | Han and Wang [47] | VIS 2019 | X X
     48 | Chen et al. [48] | VIS 2019 | X X
     49 | Kwon and Ma [49] | VIS 2019 | X X
     50 | Wang et al. [50] | VIS 2019 | X X
     51 | Han et al. [51] | VIS 2019 | X X X
     52 | Wall et al. [52] | VIS 2019 | X X
     53 | Fujiwara et al. [53] | VIS 2019 | X X
     54 | Fu et al. [54] | VIS 2019 | X X X
     55 | Porter et al. [55] | VIS 2019 | X X
     56 | Jo and Seo [56] | VIS 2019 | X X X
     57 | Ma et al. [57] | VIS 2019 | X X
     58 | Wang et al. [58] | VIS 2019 | X X
     59 | Cui et al. [59] | VIS 2019 | X X
     60 | Chen et al. [60] | VIS 2019 | X X
     61 | Wang et al. [61] | VIS 2019 | X X
     62 | Smart et al. [62] | VIS 2019 | X X
     63 | Huang et al. [63] | VIS 2019 | X X
     64 | Hong et al. [64] | PVis 2019 | X X
     65 | Fan and Hauser [65] | EuroVis 2019 | X X
     66 | Ottley et al. [66] | EuroVis 2019 | X X
     67 | Abbas et al. [67] | EuroVis 2019 | X X X
     68 | Kassel and Rohs [68] | EuroVis 2019 | X X X
     69 | Hu et al. [69] | CHI 2019 | X X
     70 | Fan and Hauser [70] | CGA 2019 | X X
     71 | Kafle et al. [71] | arxiv 2019 | X X
     72 | Mohammed [72] | VLDB 2020 | X X
     73 | Zhang et al. [73] | VIS 2020 | X X X
     74 | Wu et al. [74] | VIS 2020 | X X
     75 | Tang et al. [75] | VIS 2020 | X X
     76 | Qian et al. [76] | VIS 2020 | X X
     77 | Wang et al. [77] | VIS 2020 | X X
     78 | Oppermann et al. [78] | VIS 2020 | X X
     79 | Fosco et al. [79] | UIST 2020 | X X
     80 | Giovannangeli et al. [80] | PacificVis 2020 | X X
     81 | Liu et al. [81] | PacificVis 2020 | X X X
     82 | Luo et al. [82] | ICDE 2020 | X X X
     83 | Lekschas et al. [83] | EuroVis 2020 | X X X X
     84 | Zhao et al. [84] | CHI 2020 | X X
     85 | Lai et al. [85] | CHI 2020 | X X X
     86 | Kim et al. [86] | CHI 2020 | X X X
     87 | Lu et al. [87] | CHI 2020 | X X X
     88 | Zhou et al. [88] | arxiv 2020 | X X
  3. 3. https://ml4vis.github.io https://github.com/ML4VIS/ML4VIS.github.io/ 3
  4. 4. Outline. WHAT: What is ML4VIS. WHY: Why ML4VIS. WHERE: Where do the needs for ML exist in visualization. HOW: How can ML be used for visualization problems. Summary. Deep Learning-based Auto-Extraction of Extensible Timeline (Chen et al., IEEE InfoVIS 2019).
  5. 5. WHAT: Data Visualization (VIS), Machine Learning (ML), VIS4ML, ML4VIS
  6. 6. Data Real World Humans 6
  7. 7. Data Real World Humans 7
  8. 8. Data Real World Humans VIS ML 8
  9. 9. VIS: strengths of human visual perception systems to efficiently make sense of data ("a picture is worth a thousand words"). ML: unprecedented power of automatic algorithms to reveal hidden patterns from large amounts of data without human intervention. The two directions: ML4VIS and VIS4ML.
  10. 10. VIS4ML Known relationships between medical entities ML Qianwen Wang et al. 2021 ICML Workshop on Interpretable Machine Learning in Healthcare 10
  11. 11. VIS4ML: Qianwen Wang et al. 2019, Visual Genealogy of Deep Neural Networks, IEEE TVCG
  12. 12. VIS4ML: Qianwen Wang et al. 2019, Visual Genealogy of Deep Neural Networks, IEEE TVCG
  13. 13. VIS4ML: Data Collection, Model Development, Model Evaluation, Model Application
  14. 14. VIS 4 ML Assess Create Design 14
  15. 15. WHY: Why ML4VIS; why this ML4VIS survey
  16. 16. Why ML4VIS: It can be challenging to create effective visualizations. http://leoyuholo.com/bad-vis-browser/ https://www.reddit.com/r/shittydataisbeautiful/ Data Analytics, Graphic Design, Full Stack Development, User Experience, Cognitive Science, Human-Computer Interaction.
  17. 17. Why an ML4VIS survey: Capabilities of ML, Needs in Visualization
  18. 18. Why an ML4VIS survey: Capabilities of ML, Needs in Visualization. Applying ML to unsuitable visualization problems may only impose the drawbacks of ML (e.g., uncertainty, inexplainability) without bringing any benefit.
  19. 19. Why an ML4VIS survey: Capabilities of ML, Needs in Visualization. Given a suitable visualization problem, selecting a proper ML technique and employing necessary adaptation are crucial yet challenging.
  20. 20. Why an ML4VIS survey: Capabilities of ML (HOW), Needs in Visualization (WHERE)
  21. 21. Why an ML4VIS survey: Capabilities of ML (HOW), Needs in Visualization (WHERE)
  22. 22. WHERE: Where do the needs exist in visualization?
  23. 23. Data: clean, process, transform data. VIS: create visualizations. Users: interpret, interact with, extract information from visualizations. The seven processes: Data Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS Interaction, User Profiling, VIS Reading.
  24. 24. 24
  25. 25. Data Processing4VIS: raw data is transformed into a format that better suits the following visualization processes. Input: data with errors/missing values. Output: data with no errors that will influence the visualization. Example: Luo, Yuyu, et al. "Interactive cleaning for progressive visualization through composite questions." 2020 IEEE 36th International Conference on Data Engineering (ICDE). IEEE, 2020.
  26. 26. Data-VIS Mapping: data fields are mapped into visual channels. Input: data, e.g., [{"sale": "100", "category": "car", "year": "1993"} … {"sale": "1605", "category": "car", "year": "1993"}]. Output: VIS. Example: Haotian Li et al. 2019, KG4Vis: A Knowledge Graph-Based Approach for Visualization Recommendation.
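The idea of mapping data fields into visual channels can be illustrated with a small rule-based sketch. This is not the KG4Vis method cited on the slide; the type-inference heuristic, the function names `infer_field_type` and `map_fields_to_channels`, and the channel preferences are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical channel preferences per inferred field type (illustrative only).
CHANNELS_BY_TYPE = {
    "temporal": ["x"],
    "quantitative": ["y", "size"],
    "nominal": ["color", "shape"],
}


def infer_field_type(name: str, values: List[str]) -> str:
    """Guess a field type from its name and string values (toy heuristic)."""
    if name.lower() in {"year", "date", "time"}:
        return "temporal"
    try:
        for value in values:
            float(value)
        return "quantitative"
    except ValueError:
        return "nominal"


def map_fields_to_channels(records: List[Dict[str, str]]) -> Dict[str, str]:
    """Assign each data field to the first free channel preferred by its type."""
    used = set()
    mapping: Dict[str, str] = {}
    for field_name in records[0]:
        field_type = infer_field_type(field_name, [r[field_name] for r in records])
        for channel in CHANNELS_BY_TYPE[field_type]:
            if channel not in used:
                mapping[field_name] = channel
                used.add(channel)
                break
    return mapping


records = [
    {"sale": "100", "category": "car", "year": "1993"},
    {"sale": "1605", "category": "car", "year": "1993"},
]
mapping = map_fields_to_channels(records)
print(mapping)
```

An ML-based approach would replace the hand-written rules with a model learned from existing data-visualization pairs; the input and output shapes would stay the same.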
  27. 27. Insight Communication: insights are embedded in visualizations to be effectively communicated. Input: insight and data, e.g., “Among all students, 49% like football, 32% like basketball, and 21% like baseball.” Output: VIS. Example: Weiwei Cui et al. 2019, Text-to-Viz.
  28. 28. Style Imitation: styles are extracted from the given examples and applied to the created visualization. Input: style and data, e.g., a layout style that emphasizes the node communities, and network data. Output: VIS, e.g., a graph with a similar style. Example: Yong Wang et al. 2019, DeepDrawing: A Deep Learning Approach to Graph Drawing.
  29. 29. VIS Interaction: users interact with a visualization and transform it into a new stage through user actions. Input: VIS and user action, e.g., a 3D point cloud and a 2D lasso selection. Output: VIS. Example: Chen et al., LassoNet: Deep Lasso-Selection of 3D Point Clouds, IEEE InfoVIS 2019 & TVCG.
  30. 30. User Profiling: user actions with visualizations are logged and analyzed to better understand users. Input: user actions and a VIS, e.g., eye-tracker records on a specific visualization. Output: user characteristics or user actions, e.g., perceptual speed, verbal working memory, visual working memory, locus of control (personality trait), learning curve for a certain visual analysis task. Example: Sébastien Lallé et al. 2020, Prediction of Users’ Learning Curves for Adaptation while Using an Information Visualization.
  31. 31. VIS Reading: users read visualizations and obtain useful information. Input: VIS. Output: data, style, or insight. Example: Can Liu et al. 2020, AutoCaption: An Approach to Generate Natural Language Description from Visualization Automatically, PacificVis.
  32. 32. ML4VIS pipeline (diagram) across DATA, VIS, and USER. Processes: Data Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS Interaction, VIS Reading, User Profiling. Elements: Data, Insight, Style, Visualization, User Action, User Characteristics.
  33. 33. ML4VIS pipeline (diagram) across DATA, VIS, and USER. Processes: Data Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS Interaction, VIS Reading, User Profiling. Elements: Data, Insight, Style, Visualization, User Action, User Characteristics.
  34. 34. It would be great if I could create fancy timeline infographics (Style Imitation). Chen et al. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline. IEEE InfoVIS 2019 & TVCG.
  35. 35. Manually? Chen et al. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline. IEEE InfoVIS 2019 & TVCG.
  36. 36. Existing timeline infographic: 2014: The first year of my Ph.D. Everything is wonderful! 2015: My first submission to VIS has been accepted… 2016: My second submission to VIS has been accepted Again! New data: 2002: Brazil 2-0 Germany. A beautiful match. 2006: Italy 1 – 1 France. OMG Zidane head-butted Materazzi! 2010: Spain 1-0 Netherlands. What a pity for Netherlands. 2014: Germany 1-0 Argentina. Wonderful game. 2018: France 4-2 Croatia. Very exciting for so many goals. Can we ask the question differently? Chen et al. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline. IEEE InfoVIS 2019 & TVCG.
  37. 37. Can we extract the template from a bitmap timeline infographic automatically? VIS Reading (ML-based): from the bitmap timeline (2014, 2015, 2016: the Ph.D. example), extract the template, i.e., the timeline type (Linear, Sequential, Unified, Horizontal) and its labeled elements (fonts, icons). Non-ML-based: apply the extracted template to the new data (2002 to 2018: the World Cup finals example). Chen et al. Towards Automated Infographic Design: Deep Learning-based Auto-Extraction of Extensible Timeline. IEEE InfoVIS 2019 & TVCG.
  38. 38. ML4VIS pipeline (diagram) across DATA, VIS, and USER. Processes: Data Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS Interaction, VIS Reading, User Profiling. Elements: Data, Insight, Style, Visualization, User Action, User Characteristics.
  39. 39. ML4VIS pipeline (diagram) across DATA, VIS, and USER. Processes: Data Processing4VIS, Data-VIS Mapping, Insight Communication, Style Imitation, VIS Interaction, VIS Reading, User Profiling. Elements: Data, Insight, Style, Visualization, User Action, User Characteristics.
  40. 40. HOW: How can ML be used to satisfy these needs?
  41. 41. ML models are quickly evolving 41
  42. 42. Supervised Learning Semi-Supervised Learning Unsupervised Learning Reinforcement Learning 42
  43. 43. Supervised Learning (Classification, Regression): a model learns the mapping from input X to output Y from the labeled training examples. How a visualization problem can be formed as a supervised learning task: • Training dataset (labeled input-output pairs) • The output can be described using either a numerical value (regression) or a finite number of types (classification). Datasets: FigureSeer dataset (60k), AI2D dataset (5k), Visually29K dataset (29k), DVQA dataset (300k), FigureQA dataset (100k), ColorMapping dataset (1.6k).
  44. 44. Supervised Learning, Regression examples. VIS Reading: a saliency score for each pixel. VIS Reading: bounding box and data values. User Profiling: learning curve. A score for a visualization? A score for a data processing?
  45. 45. Supervised Learning, Classification examples. Data-VIS Mapping: is there always a finite number of classes? VIS Interaction: type of action. VIS Reading: type of a chart.
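The difference between the two supervised outputs can be shown with a minimal sketch: the same 1-nearest-neighbor model returns a chart type (classification) or a numeric score (regression), depending only on the labels in the training pairs. The features, labels, and values below are made up for illustration and do not come from any of the datasets named on the slides.

```python
import math
from typing import Sequence, Tuple, TypeVar

Y = TypeVar("Y")


def nearest_neighbor(train: Sequence[Tuple[Sequence[float], Y]], x: Sequence[float]) -> Y:
    """Return the output of the labeled training example whose input is closest to x."""
    return min(train, key=lambda pair: math.dist(pair[0], x))[1]


# Classification: the output is one of a finite number of types.
# Hypothetical features: (number of bar marks, number of line marks).
chart_type_train = [
    ((12.0, 0.0), "bar"),
    ((10.0, 1.0), "bar"),
    ((0.0, 3.0), "line"),
    ((1.0, 4.0), "line"),
]

# Regression: the output is a numerical value.
# Hypothetical feature: (normalized task time,) -> a score in [0, 1].
score_train = [((0.1,), 0.2), ((0.5,), 0.6), ((0.9,), 0.95)]

predicted_type = nearest_neighbor(chart_type_train, (11.0, 0.0))
predicted_score = nearest_neighbor(score_train, (0.55,))
print(predicted_type, predicted_score)
```

Real ML4VIS systems use far richer inputs (for example, chart images) and learned models, but the task-forming question is the same: is the desired output a category or a number?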
  46. 46. Unsupervised Learning (Generative, Clustering, Dimension Reduction): a model learns the underlying structure of the unlabelled data X. How a visualization problem can be formed as an unsupervised learning task: • Labeled dataset is unavailable • Find similar new samples by learning the distribution of existing samples (Generative). Examples: generate interpolation visualizations between t and t+n (Chen Chen et al. 2019, GenerativeMap: Visualization and Exploration of Dynamic Density Maps via Generative Learning Model); generate next-step user actions (Alvitta Ottley et al. 2019, Follow The Clicks: Learning and Anticipating Mouse Interactions During Exploratory Data Analysis).
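Learning the distribution of existing samples in order to produce the next user action can be sketched with a first-order Markov chain over unlabeled interaction logs. This is a deliberately simple stand-in, not the model used by Ottley et al.; the action names and the helper functions `fit_transitions` and `most_likely_next` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def fit_transitions(sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each action is followed by each other action."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def most_likely_next(counts: Dict[str, Counter], action: str) -> Optional[str]:
    """Return the most frequent follower of `action`, or None if it was never followed."""
    if action not in counts:
        return None
    return counts[action].most_common(1)[0][0]


# Unlabeled logs: sequences of user actions, with no target labels attached.
sessions = [
    ["zoom", "pan", "click"],
    ["zoom", "pan", "hover"],
    ["zoom", "pan", "click"],
    ["hover", "click"],
]
counts = fit_transitions(sessions)
next_action = most_likely_next(counts, "pan")
print(next_action)
```

Sampling from the counted transitions instead of taking the most frequent one would turn the same table into a generator of plausible action sequences.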
  47. 47. Semi-supervised Learning: similar to supervised learning, but the model is trained using a small amount of labeled data with a large amount of unlabeled data. How a visualization problem can be formed as a semi-supervised learning task: • Training dataset (labeled input-output pairs) • The output can be described using either a numerical value (regression) or a finite number of types (classification) • Only a small amount of data is labeled • Interactively query new labels from users.
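A minimal sketch of these two ingredients, a few labels plus many unlabeled samples and an interactive label query, is given below. It assumes a toy one-dimensional feature and a nearest-centroid classifier; the labels "simple" and "complex", the confidence threshold of 0.5, and all function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def centroids(labeled: List[Tuple[float, str]]) -> Dict[str, float]:
    """Compute the mean feature value of each class from the labeled samples."""
    by_label: Dict[str, List[float]] = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(x: float, cents: Dict[str, float]) -> str:
    """Assign x to the class with the nearest centroid."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def margin(x: float, cents: Dict[str, float]) -> float:
    """Gap between the two smallest centroid distances; a small gap means uncertain."""
    distances = sorted(abs(x - c) for c in cents.values())
    return distances[1] - distances[0]


labeled = [(0.1, "simple"), (0.9, "complex")]   # small labeled set
unlabeled = [0.05, 0.2, 0.48, 0.8, 0.95]        # larger unlabeled set

cents = centroids(labeled)

# Interactive query: ask the user to label the most uncertain sample.
query = min(unlabeled, key=lambda x: margin(x, cents))

# Self-training: pseudo-label only the samples the model is confident about.
pseudo_labeled = [(x, predict(x, cents)) for x in unlabeled if margin(x, cents) >= 0.5]
print(query, pseudo_labeled)
```

After the user answers the query, the new label and the pseudo-labels would be added to `labeled` and the centroids recomputed, repeating until the labeling budget is spent.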
  48. 48. Reinforcement Learning: an agent learns to take actions in an environment to maximize the cumulative rewards. How a visualization problem can be formed as a reinforcement learning task: • The solution can be formed as a set of actions • The quality of the solution can be presented by cumulative rewards. Example: Tan Tang et al. 2020, PlotThread: Creating Expressive Storyline Visualizations using Reinforcement Learning. The creation of a timeline visualization is decomposed into a set of actions. Reward: Δsimilarity between the ground truth layout and the k-th step layout.
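The reward described above, the change in similarity to the ground-truth layout after each action, can be sketched as follows. The similarity function here is a toy assumption (negative total position difference), not PlotThread's actual measure, and the layouts are made-up one-dimensional positions.

```python
from typing import List, Sequence


def similarity(layout: Sequence[int], truth: Sequence[int]) -> int:
    """Toy similarity: negative total absolute position difference (0 is a perfect match)."""
    return -sum(abs(a - b) for a, b in zip(layout, truth))


def step_rewards(layouts: List[Sequence[int]], truth: Sequence[int]) -> List[int]:
    """Reward of step k = similarity after step k minus similarity before it."""
    sims = [similarity(layout, truth) for layout in layouts]
    return [after - before for before, after in zip(sims, sims[1:])]


truth = [0, 2, 4]
# Layout after each action, starting from the initial layout.
layouts = [[3, 3, 3], [1, 3, 3], [1, 2, 3], [0, 2, 4]]

rewards = step_rewards(layouts, truth)
cumulative = sum(rewards)
print(rewards, cumulative)
```

Because each reward is a difference of consecutive similarities, the cumulative reward telescopes to the similarity of the final layout minus that of the initial one, which is why maximizing it pushes the agent toward the ground-truth layout.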
  49. 49. Task Forming: content understanding of the bitmap image, i.e., VIS Reading (ML-based), is formed as a Supervised Learning task that extracts the timeline type (Linear, Sequential, Unified, Horizontal) and the labeled elements (fonts, icons) from the example timeline (2014, 2015, 2016). Applying the extracted template to the new data (2002 to 2018) is non-ML-based.
  50. 50. Task Forming. About the whole timeline (classification of an image): 1. Representation 2. Scale 3. Layout 4. Orientation. About the elements: 1. Category (classification of an object) 2. Location (regression) 3. Mask (classification of a pixel). Design space: M. Brehmer, B. Lee, B. Bach, N. H. Riche, and T. Munzner. Timelines Revisited: A Design Space and Considerations for Expressive Storytelling. IEEE TVCG.
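One way to picture this task forming is as the shape of a single training label: four global, image-level classes plus a list of per-element targets. The sketch below is only an assumed data layout for illustration; the class and field names are hypothetical, and the example values reuse the "Linear, Sequential, Unified, Horizontal" timeline from the earlier slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ElementLabel:
    category: str                            # classification of an object
    bbox: Tuple[float, float, float, float]  # regression target: (x, y, width, height)
    mask: List[List[int]]                    # classification of each pixel (1 = element)


@dataclass
class TimelineLabel:
    # Four image-level classification targets about the whole timeline.
    representation: str
    scale: str
    layout: str
    orientation: str
    # Per-element targets.
    elements: List[ElementLabel] = field(default_factory=list)


label = TimelineLabel(
    representation="Linear",
    scale="Sequential",
    layout="Unified",
    orientation="Horizontal",
    elements=[
        ElementLabel(
            category="icon",  # hypothetical category name
            bbox=(10.0, 20.0, 4.0, 2.0),
            mask=[[0, 1, 1, 0], [0, 1, 1, 0]],
        )
    ],
)
print(label.orientation, len(label.elements))
```

A model trained on such labels needs one output per field, which is why an architecture with both global and local prediction heads fits this problem.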
  51. 51. Model, based on Mask R-CNN (Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick): ResNeXt-FPN produces feature maps; the RPN yields feature maps with RoIs; the RoIAlign layer yields a fixed-size feature map of a RoI. Global outputs: Timeline Type, Timeline Orientation. Local outputs: Box Head (Element Bbox, Element Category) and Mask Head (Element Mask).
  52. 52. Training Data: the model is pre-trained with the Microsoft COCO dataset. The model is then fine-tuned with a synthetic dataset (4296) + a real-world dataset (393). TimelineStoryteller: https://timelinestoryteller.com/
  53. 53. Performance Evaluation
  54. 54. Performance Evaluation
  55. 55. ML4VIS: Align Needs with Capabilities 55
  56. 56. Supervised ML is the most widely used ML technique
  57. 57. Calling for more diverse ML techniques and closer AI-human collaboration
  58. 58. 58
  59. 59. What is still missing: Multi-View Visualizations, Visualization Interactions, Visualization Animation
  60. 60. What is still missing: the adaptation of ML techniques for visualization data
  61. 61. What is still missing: the adaptation of ML techniques for visualization data
  62. 62. Deep learning for natural images, a blessing and a curse 62
  63. 63. Take-Home Message • 7 visualization processes that can benefit from ML • How to form different visualization problems into 4 main types of ML tasks
  64. 64. Thanks. Questions & comments are welcome!