How Humans See Data

As presented at Velocity Amsterdam 2016

Published in: Technology

  1. 1. How Humans See Data John Rauser @jrauser November 2016
  3. 3. visualization
  4. 4. visualization is communication
  5. 5. how to make better visualizations
  6. 6. help humans solve analytical problems quickly and accurately with visualization
  7. 7. Part I: Why visualize data at all?
  8. 8.   x      y        x      y
        1.972  1.236    0.111  0.542
        1.112  1.994    0.902  0.005
        0.000  1.009    0.598  0.085
        0.665  1.942    1.613  1.790
        0.235  0.356    1.298  1.955
        0.247  1.658    0.651  1.937
        1.275  1.961    1.949  1.316
        0.702  0.045    0.099  0.567
        1.760  0.350    0.862  0.010
        1.691  0.277    0.027  0.768
        1.628  1.778    0.706  1.956
        1.957  1.290    1.042  1.999
  9. 9. pre-attentive processing
  10. 10. A graph is an encoding of the data.
  11. 11. (The table of 24 x, y values from slide 8, shown again.)
  12. 12.  n     x      y        n     x      y
           1   1.972  1.236     13   0.111  0.542
           2   1.112  1.994     14   0.902  0.005
           3   0.000  1.009     15   0.598  0.085
           4   0.665  1.942     16   1.613  1.790
           5   0.235  0.356     17   1.298  1.955
           6   0.247  1.658     18   0.651  1.937
           7   1.275  1.961     19   1.949  1.316
           8   0.702  0.045     20   0.099  0.567
           9   1.760  0.350     21   0.862  0.010
          10   1.691  0.277     22   0.027  0.768
          11   1.628  1.778     23   0.706  1.956
          12   1.957  1.290     24   1.042  1.999
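Editor's note on slides 8 to 12: the table hides a structure that a plot would show at a glance. The sketch below checks it numerically. The center (1, 1) and the idea that the points sit on a circle are inferences from the numbers themselves, not something the slide text states, and the names `POINTS`, `CENTER`, and `radii` are made up for this illustration.

```python
import math

# The 24 (x, y) pairs from the slide's table, in row order n = 1..24.
POINTS = [
    (1.972, 1.236), (1.112, 1.994), (0.000, 1.009), (0.665, 1.942),
    (0.235, 0.356), (0.247, 1.658), (1.275, 1.961), (0.702, 0.045),
    (1.760, 0.350), (1.691, 0.277), (1.628, 1.778), (1.957, 1.290),
    (0.111, 0.542), (0.902, 0.005), (0.598, 0.085), (1.613, 1.790),
    (1.298, 1.955), (0.651, 1.937), (1.949, 1.316), (0.099, 0.567),
    (0.862, 0.010), (0.027, 0.768), (0.706, 1.956), (1.042, 1.999),
]

# Assumption: this center is inferred from the data, not given in the slides.
CENTER = (1.0, 1.0)


def radii(points, center=CENTER):
    """Return the distance of every point from the assumed center."""
    cx, cy = center
    return [math.hypot(x - cx, y - cy) for x, y in points]


if __name__ == "__main__":
    rs = radii(POINTS)
    # If the points lie on a circle, the smallest and largest radius nearly agree.
    print(f"min radius = {min(rs):.3f}, max radius = {max(rs):.3f}")
```

Reading 48 numbers gives no hint of this shape, while a scatterplot of the same pairs would make it visible without any arithmetic.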
  13. 13. Good visualizations optimize for the human visual system.
  14. 14. How does the human visual system work?
  15. 15. How does the human visual system decode a graph?
  16. 16. Cleveland’s three visual operations of pattern perception: 1. Detection 2. Assembly 3. Estimation
  17. 17. Part II: estimation
  18. 18. Three levels of estimation: a. discrimination (X = Y, X ≠ Y) b. ranking (X > Y, X < Y) c. ratioing (X / Y = ?)
  19. 19. At the heart of quantitative reasoning is a single question: Compared to what? - Tufte, Envisioning Information
  20. 20. Three levels of estimation: a. discrimination (X = Y, X ≠ Y) b. ranking (X > Y, X < Y) c. ratioing (X / Y = ?)
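Editor's note: the three levels of estimation can be read as three increasingly demanding questions about the same pair of values. The sketch below spells them out as functions; the function names and the sample values are hypothetical and only serve the illustration.

```python
def discriminate(x, y):
    """Level a, discrimination: are the two values different at all?"""
    return x != y


def rank(x, y):
    """Level b, ranking: which value is larger? Returns '<', '>' or '='."""
    if x < y:
        return "<"
    if x > y:
        return ">"
    return "="


def ratio(x, y):
    """Level c, ratioing: how many times larger is x than y?"""
    if y == 0:
        raise ZeroDivisionError("ratio is undefined when y is zero")
    return x / y


if __name__ == "__main__":
    x, y = 22, 11  # hypothetical values
    print(discriminate(x, y), rank(x, y), ratio(x, y))
```

Each level needs more from the reader than the one before it: a chart that supports ratioing also supports ranking and discrimination, but not the other way around.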
  21. 21. the most important thing
  22. 22. The most important measurement should exploit the highest ranked encoding possible. • Position along a common scale • Position on identical but nonaligned scales • Length • Angle or Slope • Area • Volume or Density or Color saturation • Color hue
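Editor's note: one way to act on this ranking is to hand out encodings in order of importance. The sketch below is a toy illustration of that idea, not a method from the talk; `assign_encodings` and the measurement names are hypothetical, and the list simply repeats the slide's ordering.

```python
# The slide's ranking, from most to least accurately decoded.
ENCODINGS_BY_ACCURACY = [
    "position along a common scale",
    "position on identical but nonaligned scales",
    "length",
    "angle or slope",
    "area",
    "volume, density, or color saturation",
    "color hue",
]


def assign_encodings(measurements_by_importance, available=ENCODINGS_BY_ACCURACY):
    """Pair each measurement, most important first, with the best remaining encoding."""
    if len(measurements_by_importance) > len(available):
        raise ValueError("more measurements than available encodings")
    return dict(zip(measurements_by_importance, available))


if __name__ == "__main__":
    # Hypothetical measurements, listed from most to least important.
    for name, encoding in assign_encodings(["mpg", "weight", "cylinders"]).items():
        print(f"{name}: {encoding}")
```

A real chart rarely uses the list this mechanically (two positions usually mean the x and y axes), but the greedy pairing captures the slide's rule: the measurement that matters most gets the top of the list.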
  25. 25. “The first rule of color: do not talk about color!” - Tamara Munzner
  26. 26. luminance saturation hue
  28. 28. Observation: Alphabetical is almost never the correct ordering of a categorical variable.
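Editor's note: a common alternative to alphabetical order is to sort the categories by the value being plotted, so that rank can be read straight off the axis. A minimal sketch, assuming the data is a simple mapping from category to value; the function name and the numbers are hypothetical.

```python
def order_by_value(values, descending=True):
    """Return category names sorted by their value instead of alphabetically."""
    return sorted(values, key=values.get, reverse=descending)


if __name__ == "__main__":
    # Hypothetical fuel-economy figures, keyed by category name.
    mpg = {"Alpha": 18.5, "Bravo": 30.1, "Charlie": 11.0, "Delta": 24.3}
    print("alphabetical:", sorted(mpg))
    print("by value:    ", order_by_value(mpg))
```

With the value ordering, the largest and smallest categories sit at the ends of the axis and neighbours are the closest competitors, which is usually what the reader wants to compare.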
  29. 29. The most important measurement should exploit the highest ranked encoding possible. • Position along a common scale • Position on identical but nonaligned scales • Length • Angle or Slope • Area • Volume or Density or Color saturation • Color hue
  33. 33. 11 mpg
  36. 36. The most important measurement should exploit the highest ranked encoding possible. • Position along a common scale • Position on identical but nonaligned scales • Length • Angle or Slope • Area • Volume or Density or Color saturation • Color hue
  39. 39. Observation: Stacked anything is nearly always a mistake.
  40. 40. Stacking makes the reader decode lengths, not position on a common scale.
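Editor's note: the point about stacking can be seen in the geometry of the bars. The sketch below computes where each stacked segment starts and ends; the function name and the data are hypothetical. Only the bottom series starts at zero in every bar, so only it can be compared by position on a common scale.

```python
def stack(series):
    """Map each series name to a list of (bottom, top) pairs, one per bar.

    `series` maps a name to its values, one value per bar, bottom series first.
    """
    lengths = {len(values) for values in series.values()}
    if len(lengths) > 1:
        raise ValueError("all series must have the same number of bars")
    bottoms = [0.0] * (lengths.pop() if lengths else 0)
    segments = {}
    for name, values in series.items():
        tops = [b + v for b, v in zip(bottoms, values)]
        segments[name] = list(zip(bottoms, tops))
        bottoms = tops  # the next series sits on top of this one
    return segments


if __name__ == "__main__":
    # Hypothetical data: "b" has the same value in both bars.
    for name, spans in stack({"a": [1, 2], "b": [3, 3]}).items():
        print(name, spans)
```

Series "b" is 3 in both bars, yet its segments run from 1 to 4 and from 2 to 5. Neither end lines up between bars, so the reader has to judge two lengths instead of comparing two positions.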
  41. 41. 11 mpg
  42. 42. Observation: Stacked anything is nearly always a mistake.
  43. 43. Observation: Pie charts are ALWAYS a mistake.
  44. 44. Piecharts are the information visualization equivalent of a roofing hammer to the frontal lobe. They have no place in the world of grownups, and occupy the same semiotic space as short pants, a runny nose, and chocolate smeared on one’s face. They are as professional as a pair of assless chaps. http://blog.codahale.com/2006/04/29/google-analytics-the-goggles-they-do-nothing/
  46. 46. The most important measurement should exploit the highest ranked encoding possible. • Position along a common scale • Position on identical but nonaligned scales • Length • Angle or Slope • Area • Volume or Density or Color saturation • Color hue
  47. 47. Tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only thing worse than a pie chart is several of them, for then the viewer is asked to compare quantities located in spatial disarray both within and between pies… Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used. -Edward Tufte, The Visual Display of Quantitative Information
  49. 49. Who do you think did a better job in tonight’s debate?
                               Clinton   Trump
          Among Democrats        99%       1%
          Among Republicans      53%      47%
  50. 50. Afghanistan, Albania, Algeria, Angola, Argentina, Australia, Austria, Bahrain, Bangladesh, Belgium, Benin, Bolivia, Bosnia and Herzegovina, Botswana, Brazil, Bulgaria, Burkina Faso, Burundi, Cambodia, Cameroon
  51. 51. All good pie charts are jokes.
  52. 52. Observation: Comparison is trivial on a common scale.
  53. 53. the dashboard metaphor is fundamentally flawed
  54. 54. Observation: Scatterplots show relationships directly.
  55. 55. Observation: Growth charts usually aren’t.
  56. 56. If growth (slope) is important, plot it directly.
  57. 57. Observation: Growth charts usually aren’t. If growth (slope) is important, plot it directly.
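Editor's note: one reading of these slides is that a chart of a running total leaves the reader to judge growth from slope, which sits low in the encoding ranking. Plotting the change per period puts growth on a position scale instead. A minimal sketch with hypothetical function names and numbers:

```python
def period_changes(values):
    """Absolute change between consecutive observations."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def relative_changes(values):
    """Change between consecutive observations as a fraction of the earlier one."""
    changes = []
    for earlier, later in zip(values, values[1:]):
        if earlier == 0:
            raise ZeroDivisionError("relative change is undefined from a zero base")
        changes.append((later - earlier) / earlier)
    return changes


if __name__ == "__main__":
    totals = [100, 150, 190, 220, 240]  # hypothetical cumulative totals
    print("totals: ", totals)
    print("changes:", period_changes(totals))
```

The totals rise at every step, so a line chart of them looks healthy. The changes fall at every step, which is the story a chart of growth would actually tell.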
  58. 58. The most important measurement should exploit the highest ranked encoding possible. • Position along a common scale • Position on identical but nonaligned scales • Length • Angle or Slope • Area • Volume or Density or Color saturation • Color hue
  59. 59. Cleveland’s three visual operations of pattern perception: 1. Detection 2. Assembly 3. Estimation
  60. 60. Part III: assembly
  61. 61. Gestalt Psychology
  62. 62. reification
  63. 63. emergence
  65. 65. Prägnanz
  66. 66. Law Of Closure
  67. 67. Law Of Continuity
  68. 68. Observation: Good plots leverage the law of continuity to assist with assembly.
  69. 69. Law of Similarity
  70. 70. Law of Proximity
  71. 71. Observation: dodged bar charts are a bad idea
  72. 72. Cleveland’s three visual operations of pattern perception: 1. Detection 2. Assembly 3. Estimation
  73. 73. Part IV: detection
  74. 74. Excel’s defaults are pretty bad
  75. 75. [Chart: y-axis ticks up to 200,000 in steps of 20,000; x-axis positions 1 to 6]
  76. 76. Observation: Detection isn’t as trivial as it seems.
  77. 77. “Above all else, show the data.” -Tufte
  78. 78. Part V: other useful results
  79. 79. Weber’s law: The “Just Noticeable Difference” is proportional to the size of the initial stimulus.
  80. 80. 10 20
  81. 81. 10 20 100 110
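Editor's note: the two pairs on this slide differ by the same absolute amount but by very different relative amounts, which is the quantity Weber's law is about. The sketch below computes that relative difference. The function names are made up, and the threshold `k` is a hypothetical value chosen for illustration, not a measured constant.

```python
def weber_fraction(reference, comparison):
    """Difference between two stimuli relative to the reference stimulus."""
    if reference <= 0:
        raise ValueError("reference stimulus must be positive")
    return abs(comparison - reference) / reference


def is_noticeable(reference, comparison, k):
    """True if the relative difference reaches the threshold k (an assumed value)."""
    return weber_fraction(reference, comparison) >= k


if __name__ == "__main__":
    k = 0.2  # hypothetical threshold, for illustration only
    for reference, comparison in [(10, 20), (100, 110)]:
        fraction = weber_fraction(reference, comparison)
        print(reference, comparison, fraction, is_noticeable(reference, comparison, k))
```

Both pairs are 10 units apart, but 20 is twice 10 while 110 is only a tenth larger than 100, so the second difference is much harder to see.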
  82. 82. 12 units 12 units
  83. 83. Observation: Weber’s Law is why gridlines are useful
  84. 84. “Erase non-data ink.” -Tufte
  85. 85. “Erase non-data ink, within reason.” -Tufte
  86. 86. “Erase non-data ink that interferes with detection or doesn’t assist assembly and estimation.” -Rauser
  87. 87. You are best at detecting variation in slope near 45 degrees.
  88. 88. banking to 45
  89. 89. Observation: Banking to 45 best shows variation in slope
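Editor's note: banking to 45 means choosing the plot's aspect ratio so that typical line segments are drawn near 45 degrees. The sketch below uses one simple variant, making the median absolute slope come out at exactly 45 degrees. It is an illustration under that assumption, not necessarily the procedure used for the plots in the talk, and `bank_to_45` is a hypothetical name.

```python
import statistics


def bank_to_45(xs, ys):
    """Return the height/width ratio at which the median absolute slope is 45 degrees.

    Slopes are taken between consecutive points and rescaled by the data
    ranges, so the result applies to a plot whose axes span exactly the data.
    """
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    if x_range == 0 or y_range == 0:
        raise ValueError("both x and y must vary")
    # Slope of each segment as it would appear in a square plot.
    square_slopes = [
        abs((y1 - y0) / (x1 - x0)) * x_range / y_range
        for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))
        if x1 != x0
    ]
    median_slope = statistics.median(square_slopes)
    if median_slope == 0:
        raise ValueError("median slope is zero; banking is undefined")
    # Stretching the height by h/w multiplies every drawn slope by h/w.
    return 1.0 / median_slope


if __name__ == "__main__":
    # Hypothetical zigzag: four segments, each rising or falling by 1 per unit of x.
    print(bank_to_45([0, 1, 2, 3, 4], [0, 1, 0, 1, 0]))
```

For the zigzag the function returns 0.25, meaning the plot should be four times as wide as it is tall; in a square plot the same segments would be drawn much steeper than 45 degrees.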
  90. 90. Q: Should I include 0 on my scale?
  91. 91. Q: Should I include 0 on my scale? A: It depends.
  92. 92. Q: Should I include 0 on my scale? A: Relying on the pre-attentive perception of size or intensity? Yes, otherwise you will mislead. Using position? It’s up to you.
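Editor's note: the rule on this slide can be written as a small helper for choosing axis limits. The mapping of "size or intensity" onto the named encodings below is the editor's interpretation, and `ZERO_BASED_ENCODINGS` and `axis_limits` are hypothetical names.

```python
# Assumption: these are the encodings read as "size or intensity".
ZERO_BASED_ENCODINGS = {"length", "area", "volume", "density", "color saturation"}


def axis_limits(values, encoding, include_zero_for_position=False):
    """Return (low, high) scale limits for the given encoding.

    Size and intensity encodings always get zero in range. For position,
    the choice is left to the caller, as the slide suggests.
    """
    low, high = min(values), max(values)
    needs_zero = encoding in ZERO_BASED_ENCODINGS or (
        encoding == "position" and include_zero_for_position
    )
    if needs_zero:
        low, high = min(low, 0), max(high, 0)
    return low, high


if __name__ == "__main__":
    values = [95, 100, 105]  # hypothetical measurements
    print("bars (length):  ", axis_limits(values, "length"))
    print("dots (position):", axis_limits(values, "position"))
```

With bars, cutting the scale at 95 would make 105 look many times larger than 96; with dots, the reader compares positions, so a scale from 95 to 105 just spends the space on the variation.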
  93. 93. “Above all else, show the data.” -Tufte
  94. 94. “Above all else, show the variation in the data.” -Rauser (via Tufte)
  95. 95. R/ggplot2 code for every plot in this presentation is available at http://goo.gl/xH5PLV The rendered document is at http://rpubs.com/jrauser/hhsd_notes This presentation is at http://goo.gl/VKxxya I will tweet these links as @jrauser
  96. 96. coda
  97. 97. visualization is communication
  98. 98. art is communication
  99. 99. visualization is art
  100. 100. why does it make you feel that way?
  101. 101. visualization has as much to learn from art as from science
  103. 103. end
