Visualization design -- Interactive data visualization course

This lecture talks about the nested model for visualization design proposed by Tamara Munzner, the LATCH principle, and Tufte design principles.

Published in: Education

  1. 1. Visualization Design Chen He
  2. 2. Nested Model for Visualization Design Four nested levels, from outermost to innermost: Domain situation; Data / task abstraction; Encoding / interaction technique; Algorithm. Munzner, T. (2009). A nested model for visualization design and validation. IEEE Transactions on Visualization and Computer Graphics, 15(6), 921-928.
  3. 3. Domain situation Characterize the problems and data of target users in some particular target domain.
  4. 4. Data / task abstraction Data abstraction: What type of data is shown? Task abstraction: Why is the user looking at it? Abstract domain-specific problems and data into a more generic description that is in the vocabulary of computer science.
  5. 5. Encoding / interaction technique How is the data shown? Decide on the specific way to create and manipulate the visual representation of the abstraction.
  6. 6. Algorithm Craft a detailed procedure that allows a computer to automatically and efficiently carry out the desired visualization goal.
  7. 7. Case study 1: visualizing drug-target datasets
  8. 8. Domain situation Biologists: drug-target datasets are usually dispersed in different sources, which hinders exploration. (CGI) (DTC)
  9. 9. Task abstraction Visually represent drug-target relations; Depict relations from multiple sources;
  10. 10. Data abstraction https://www.coursera.org/learn/datavisualization/lecture/ONoPE/2-1-1-data
  11. 11. Data abstraction Data (data type): Drug (nominal); Mutation (nominal); Tumor type (nominal); Drug-target relation from CGI: Effects (nominal), Evidence level (ordinal); Drug-target relation from DTC: Potency level (ordinal).
  12. 12. Visual encoding - Layout Matrix-based layout: scalable numbers of drugs and targets; Overlaid layers: facilitate comparison of data from multiple sources.
  13. 13. Visual encoding J. Mackinlay, Automating the Design of Graphical Presentations of Relational Information, ACM Transactions on Graphics 5(2), 1986.
  14. 14. Visual encoding Angle and area are read less accurately than position and length (Angle & Area < Position & Length).
  15. 15. Visual encoding Data (data type): visual variable. Drug (nominal): position; Mutation (nominal): position; Tumor type (nominal): position; Drug-target relation from CGI: Effects (nominal): hue, Evidence level (ordinal): position, length, saturation; Drug-target relation from DTC: Potency level (ordinal): position, length.
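
The encoding table on this slide can be written down as a small lookup structure. The following is a minimal Python sketch, not code from MediSyn; the dictionary layout, the shortened attribute names, and the helper `channels_for` are assumptions made for illustration.

```python
# Attribute -> (data type, visual variables), transcribed from the slide.
# The key names are shortened labels chosen for this sketch.
ENCODING = {
    "Drug": ("nominal", ["position"]),
    "Mutation": ("nominal", ["position"]),
    "Tumor type": ("nominal", ["position"]),
    "CGI effects": ("nominal", ["hue"]),
    "CGI evidence level": ("ordinal", ["position", "length", "saturation"]),
    "DTC potency level": ("ordinal", ["position", "length"]),
}


def channels_for(data_type):
    """Return the sorted visual variables used for one data type."""
    used = set()
    for dtype, channels in ENCODING.values():
        if dtype == data_type:
            used.update(channels)
    return sorted(used)


print(channels_for("nominal"))  # ['hue', 'position']
print(channels_for("ordinal"))  # ['length', 'position', 'saturation']
```

In this table, hue (which has no natural order) is used only for a nominal attribute, while length and saturation (which are ordered) are used only for ordinal attributes.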
  16. 16. Interaction technique https://www.youtube.com/watch?v=Bg_YvhBs1sg
  17. 17. New discovery - Inconsistency Exposure He, C., Micallef, L., Kaski, S., Aittokallio, T. and Jacucci, G., 2017. MediSyn: uncertainty-aware visualization of multiple biomedical datasets to support drug treatment selection. BMC bioinformatics, 18(10), p.393.
  18. 18. Nested Model for Validation
  19. 19. Validation - Discovery of drug repurposing opportunity.
  20. 20. Validation - Qualitative lab study Baseline: two unlinked datasets. Participants: 6 domain experts. Tasks: (1) drug selection; (2) inconsistency discovery. Measures: task performance, subjective feedback.
  21. 21. Results - Qualitative lab study
  22. 22. Results - Qualitative lab study
  23. 23. Findings - Qualitative lab study The matrix view supports drug comparison and exposes missing data. Depicting datasets in overlaid layers facilitates direct comparison of data from multiple sources (data consistency). Exposed data conflicts tend to lower user trust in MediSyn but have no observable effect on user trust in the data.
  24. 24. Iterative process & rapid prototyping
  25. 25. Iterative process & rapid prototyping Five datasets - Juxtaposed bars
  26. 26. Iterative process & rapid prototyping
  27. 27. Case study 2: visualizing biological data From Miriah Meyer, Danyel Fisher. Making Data Visual: A Practical Guide to Using Visualization for Insight. O'Reilly Media, 2018.
  28. 28. Domain situation How do genes influence the physical features of animals? Biologists study a set of fundamental genes that are shared across many species and control the development of body parts in developing embryos. These genes are nearly the same in many species, and yet the species are physically very different.
  29. 29. Domain situation What is known: Differences between species are related to when and where these genes are turned on and off in developing embryos. -- Gene expression Goal: Link differences in gene expression to differences in physical traits.
  30. 30. Existing task and tool Task 1: Find cells in one embryo that had significantly different gene expression from cells in another embryo. -- Outlier cells
  31. 31. 2D representation of a fruit fly embryo. Outlier cells are clustered by color and shape.
  32. 32. Existing task and tool Task 2: Find out which genes were different in the outlier cells. Heatmap: encode gene expression values using color. Column: a cell. Grouped columns: clusters of cells in the outlier cell plot. Rows: genes and 6 time points of each gene.
  33. 33. Limitation of the existing tool Manual look-ups between multiple views. Task 3 (not supported): characterize how this gene expression differs from that of the corresponding cells in another embryo. Task 3 requires comparing a large number of heatmaps.
  34. 34. First iteration Link two views together via user interaction. Details on demand.
  35. 35. Deploy and interview Problem: The outlier detection algorithm was too restrictive, which led the biologists to rethink their computational approach.
  36. 36. Second iteration -- similarity, not outliers Show how similar each cell in one embryo is to the corresponding cells in the other embryo. Task 1: From “Find outlier cells” to “Find cells with low similarity.”
  37. 37. Second iteration -- results Explore many more different cells than the first version. Finding: The experimental measure from one of the species was plagued with low-level noise, causing the biologists to go back and recapture the data. Emerging question: What would a different similarity metric reveal?
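
The shift from "find outlier cells" to "find cells with low similarity" can be sketched in a few lines of Python. The slides do not say which similarity metric the tool used, so cosine similarity here is an assumption, as are the function names, the threshold, and the expression values.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length expression profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero profile as dissimilar
    return dot / (norm_a * norm_b)


def low_similarity_cells(embryo_a, embryo_b, threshold):
    """Return ids of cells whose profiles differ between two embryos.

    Each argument maps a cell id to that cell's expression profile.
    Cells are assumed to be already matched by id across embryos.
    """
    return [
        cell for cell in embryo_a
        if cell in embryo_b
        and cosine_similarity(embryo_a[cell], embryo_b[cell]) < threshold
    ]


# Hypothetical values: one gene measured at 6 time points per cell.
species_1 = {"c1": [1, 2, 3, 4, 5, 6], "c2": [6, 5, 4, 3, 2, 1]}
species_2 = {"c1": [1, 2, 3, 4, 5, 6], "c2": [1, 2, 3, 4, 5, 6]}
print(low_similarity_cells(species_1, species_2, threshold=0.9))  # ['c2']
```

Swapping `cosine_similarity` for another function is one way to explore the emerging question of what a different similarity metric would reveal.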
  38. 38. A final version -- apply good design principles Each gene's expression values are measured at 6 time points.
  39. 39. A final version Columns: genes. Top row: a selected cell. Bottom rows: corresponding cells from other species. Guideline: It is important to present new ideas to the target users with their own data.
  40. 40. Final prototype Meyer, M., Munzner, T., DePace, A. and Pfister, H., 2010. MulteeSum: a tool for comparative spatial and temporal gene expression data. IEEE transactions on visualization and computer graphics, 16(6), pp.908-917.
  41. 41. Recap -- an iterative process with rapid prototyping
  42. 42. Design principles Chen He
  43. 43. Genre Time Length Country Rating Directors Stars
  44. 44. From millions of movies, how do you find interesting movies to watch? Genre Time Length Country Rating Directors Stars
  45. 45. The L.A.T.C.H principle -- Methods of organization Information may be infinite; however, the organization of information is finite: Location, Alphabet, Time, Category, Hierarchy.
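
The five organizing methods can be tried on the movie example from the previous slide. This is a minimal Python sketch: the catalogue, its field names, and the helper `group_by` are hypothetical, and "Hierarchy" is interpreted here as ordering by magnitude (rating, highest first).

```python
from itertools import groupby

# Hypothetical catalogue; titles and values are placeholders.
movies = [
    {"title": "Beta", "country": "FI", "year": 2001, "genre": "Drama", "rating": 7.1},
    {"title": "Alpha", "country": "US", "year": 1995, "genre": "Comedy", "rating": 8.3},
    {"title": "Gamma", "country": "FI", "year": 2010, "genre": "Comedy", "rating": 6.4},
]


def group_by(items, key):
    """Group movies by one attribute, returning {value: [titles]}."""
    ordered = sorted(items, key=lambda m: m[key])  # groupby needs sorted input
    return {
        value: [m["title"] for m in group]
        for value, group in groupby(ordered, key=lambda m: m[key])
    }


by_location = group_by(movies, "country")                                 # Location
by_alphabet = sorted(m["title"] for m in movies)                          # Alphabet
by_time = [m["title"] for m in sorted(movies, key=lambda m: m["year"])]   # Time
by_category = group_by(movies, "genre")                                   # Category
by_hierarchy = [                                                          # Hierarchy
    m["title"] for m in sorted(movies, key=lambda m: m["rating"], reverse=True)
]

print(by_location)   # {'FI': ['Beta', 'Gamma'], 'US': ['Alpha']}
print(by_category)   # {'Comedy': ['Alpha', 'Gamma'], 'Drama': ['Beta']}
```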
  46. 46. The L.A.T.C.H principle -- Location
  47. 47. The L.A.T.C.H principle -- Alphabet Use when no other methods are appropriate.
  48. 48. The L.A.T.C.H principle -- Time
  49. 49. The L.A.T.C.H principle -- Time
  50. 50. The L.A.T.C.H principle -- Category Organize by similarity or relatedness.
  51. 51. The L.A.T.C.H principle -- Hierarchy
  52. 52. The L.A.T.C.H principle -- Benefits? Support information navigation and exploration; Maximize visual working memory: Four items four chunks;
  53. 53. The L.A.T.C.H principle -- VisGets Dörk, M., Carpendale, S., Collins, C. and Williamson, C., 2008. Visgets: Coordinated visualizations for web-based information exploration and discovery. IEEE Transactions on Visualization and Computer Graphics, 14(6).
  54. 54. Tufte design principles Principle of Graphical Integrity Data-Ink Chartjunk Data Density Small Multiples Principle of Graphical Excellence Following contents are adapted from Edward R. Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT 1983.
  55. 55. What are the problems of this visualization?
  56. 56. Principle of Graphical Integrity Problems: The negative income in 1970 is disguised. The three charts use different baselines, so they cannot be compared with each other. The backgrounds are distracting.
  57. 57. Principle of Graphical Integrity -- Visual vs. Numerical Scale The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented.
  58. 58. Counterpoint See slides of lecture 2
  59. 59. Principle of Graphical Integrity -- Labeling Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity.
  60. 60. Lie factor Lie factor = (effect size shown in graphic) / (effect size in data). Lie factor ≈ 1: truth. Lie factor > 1.05 or < 0.95: substantial distortion.
  61. 61. Principle of Graphical Integrity
  62. 62. Principle of Graphical Integrity The years run from the future to the present. The road narrows because of both the change in values and the perspective, and the viewer has no way to separate the two.
  63. 63. Principle of Graphical Integrity Data: (27.5 - 18.0) / 18.0 = 0.53. Graphic: (5.3 - 0.6) / 0.6 = 7.83. Lie factor = 7.83 / 0.53 = 14.8.
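
The arithmetic on this slide can be reproduced with a short Python sketch. The function names are chosen for illustration, and "effect size" is taken to be the relative change from the first value to the last, which is how the slide computes it.

```python
def relative_change(first, last):
    """Relative change from the first value to the last."""
    return (last - first) / first


def lie_factor(graphic_first, graphic_last, data_first, data_last):
    """Effect size shown in the graphic divided by effect size in the data."""
    return (relative_change(graphic_first, graphic_last)
            / relative_change(data_first, data_last))


def is_substantial_distortion(factor):
    """Apply the thresholds from the lie factor slide."""
    return factor > 1.05 or factor < 0.95


factor = lie_factor(0.6, 5.3, 18.0, 27.5)
print(round(factor, 1))                   # 14.8
print(is_substantial_distortion(factor))  # True
```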
  64. 64. Principle of Graphical Integrity
  65. 65. Principle of Graphical Integrity Left: $10 for one year is 0.31 square inches. Right: $10 for one year is 4.69 square inches. Lie factor = 4.69 / 0.31 = 15.1
  66. 66. Principle of Graphical Integrity Variation: Show data variation, not design variation.
  67. 67. Principle of Graphical Integrity Lie factor: 9.4 Lie factor: 59.4
  68. 68. Principle of Graphical Integrity Visual vs. data dimensions: The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
  69. 69. Principle of Graphical Integrity Context: Graphics must not quote data out of context.
  70. 70. Principle of Graphical Integrity Visual vs. numerical scale: The representation of numbers, as physically measured on the surface of the graphic itself, should be directly proportional to the numerical quantities represented. Labeling: Clear, detailed, and thorough labeling should be used to defeat graphical distortion and ambiguity. Variation: Show data variation, not design variation. Visual vs. data dimensions: The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data. Context: Graphics must not quote data out of context.
  71. 71. Principle of Graphical Integrity
  72. 72. Principle of Graphical Integrity -- Scale
  73. 73. Principle of Graphical Integrity -- Context
  74. 74. Data-Ink Data-ink ratio = data-ink / total ink used to print the graphic = 1.0 - proportion of a graphic that can be erased without loss of data-information
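
Both forms of the definition can be checked against each other in a few lines. This is a minimal Python sketch; how ink is measured (for example, as pixel counts) and the numbers used are assumptions, and the function names are illustrative.

```python
def data_ink_ratio(data_ink, total_ink):
    """Share of a graphic's ink that presents data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


def erasable_proportion(data_ink, total_ink):
    """Proportion of the graphic that can be erased without losing data."""
    return 1.0 - data_ink_ratio(data_ink, total_ink)


# Hypothetical chart: 300 of 1200 ink units carry data.
print(data_ink_ratio(300, 1200))       # 0.25
print(erasable_proportion(300, 1200))  # 0.75
```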
  75. 75. Data-Ink Above all else show the data Maximize the data-ink ratio, within reason Erase non-data-ink, within reason Erase redundant data-ink, within reason Revise and edit
  76. 76. How to increase the data-ink ratio of this visualization?
  77. 77. Chartjunk -- Vibration
  78. 78. Chartjunk -- The grid The grid should be muted relative to the data. Avoid doubled grid lines.
  79. 79. Chartjunk -- The grid A grid can help in reading and interpolating. A gray grid can promote more accurate data reconstruction than a dark grid. Example: train schedule.
  80. 80. Arrivals and departures from a station are located along the horizontal line. Length of a stop at a station is indicated by the length of the horizontal line. Stations are separated in proportion to their actual distance. The slope of the line reflects the speed of the train.
  81. 81. Chartjunk -- The duck Chart elements that add no value whatsoever, other than to distract or entertain the viewer.
  82. 82. Chartjunk -- The duck Can the data-ink be made clearer, eliminating the need for the legend?
  83. 83. Chartjunk -- Counterpoints Significantly better recall of embellished charts than plain charts after a two-to-three week gap.
  84. 84. Chartjunk -- Counterpoints
  85. 85. Data density Data density of a graphic = number of entries in data matrix / area of data graphic Maximize data density and the size of the data matrix, within reason. Graphics can be shrunk way down.
  86. 86. Data density Maximize data density and the size of the data matrix, within reason. Graphics can be shrunk way down.
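
The data density formula also shows why shrinking a graphic helps. The following Python sketch uses a hypothetical data matrix and hypothetical areas; the function name and the choice of square inches as the unit are assumptions.

```python
def data_density(rows, columns, area):
    """Entries in the data matrix per unit area of the graphic."""
    if area <= 0:
        raise ValueError("area must be positive")
    return rows * columns / area


# Hypothetical 12 x 4 data matrix drawn on 6 square inches.
print(data_density(12, 4, 6.0))  # 8.0

# The same data drawn at half the area has twice the density.
print(data_density(12, 4, 3.0))  # 16.0
```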
  87. 87. Data density -- Critiques White space is thought to contribute to good visual design. Tufte's book itself has lots of white space.
  88. 88. Small multiples Repeated application of the Shrink principle.
  89. 89. Small multiples
  90. 90. Small multiples Well-designed small multiples are: Comparative; Multivariate; Shrunken, high-density graphics; Usually based on a large data matrix; Drawn almost entirely with data-ink; Efficient in interpretation; Often narrative in content, showing shifts in the relationship between variables as the index variable changes.
  91. 91. Principle of Graphical Excellence Complex ideas communicated with clarity, precision, and efficiency. Give the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space. Telling the truth about the data.
  92. 92. Recap The L.A.T.C.H principle -- Methods of organization. Tufte design principles Principle of Graphical Integrity Data-Ink Chartjunk Data Density Small Multiples Principle of Graphical Excellence
