Data & Analytics Club - Data Visualization Workshop

Introduction to principles and techniques of data visualization.

  1. Data Visualization, Nikhil Srivastava, 2015. Nikhil Srivastava, Wharton Data & Analytics Club
  2. hoster@wharton.upenn.edu
  3. About this Lecture • Shortened version of a longer course – Slides, demos, extra material – Code samples and libraries – Sample projects • Questions
  4. About You
  5. Outline • What is Data Visualization? (introduction) • Thinking and Seeing (foundation & theory) • From Data to Graphics (building blocks) • Principles and Guidelines (design & critique) • Building Visualizations (construction) • Advanced
  6. Outline: What is Data Visualization?
  7. Data Visualization and related terms: Information Visualization • Scientific Visualization • Infographics • Statistical Graphics • Informative Art • Visual Analytics. Neighboring fields: Art • Science • Statistics • Journalism • Design • Business
  8. City           State        Population
      Baton Rouge    Louisiana      191,741
      Birmingham     Alabama        220,927
      Broken Arrow   Oklahoma        58,018
      Eugene         Oregon         115,890
      Glendale       Arizona        245,868
      Huntsville     Alabama         55,741
      Lafayette      Louisiana       87,737
      Mobile         Alabama         98,147
      Montgomery     Alabama        126,250
      New Orleans    Louisiana      322,172
      Norman         Oklahoma       101,590
      Peoria         Arizona        167,868
      Portland       Oregon         514,108
      Salem          Oregon         147,631
      Scottsdale     Arizona        134,335
      Shreveport     Louisiana       68,756
      Surprise       Arizona         90,548
      Tempe          Arizona        143,369
      Tulsa          Oklahoma       392,138
  9. • Which is the most populous city in the list? • Which state in the list has the most cities? • Which state in the list has the largest average city? (shown beside the City / State / Population table from slide 8)
  10.
  11. • Which is the most populous city in the list? • Which state in the list has the most cities? • Which state in the list has the largest average city?
  12. • Which is the most populous city in the list? • Which state in the list has the most cities? • Which state in the list has the largest average city? • What is the population of Montgomery, Alabama?
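Slides 9 to 12 make the point that a raw table answers these questions only through slow scanning. For comparison, here is a minimal Python sketch that answers all four directly; the rows are transcribed from the slide 8 table, and the function names are illustrative, not taken from the workshop's code samples.

```python
from collections import Counter, defaultdict

# (city, state, population) rows transcribed from the slide 8 table.
ROWS = [
    ("Baton Rouge", "Louisiana", 191_741),
    ("Birmingham", "Alabama", 220_927),
    ("Broken Arrow", "Oklahoma", 58_018),
    ("Eugene", "Oregon", 115_890),
    ("Glendale", "Arizona", 245_868),
    ("Huntsville", "Alabama", 55_741),
    ("Lafayette", "Louisiana", 87_737),
    ("Mobile", "Alabama", 98_147),
    ("Montgomery", "Alabama", 126_250),
    ("New Orleans", "Louisiana", 322_172),
    ("Norman", "Oklahoma", 101_590),
    ("Peoria", "Arizona", 167_868),
    ("Portland", "Oregon", 514_108),
    ("Salem", "Oregon", 147_631),
    ("Scottsdale", "Arizona", 134_335),
    ("Shreveport", "Louisiana", 68_756),
    ("Surprise", "Arizona", 90_548),
    ("Tempe", "Arizona", 143_369),
    ("Tulsa", "Oklahoma", 392_138),
]


def most_populous_city(rows):
    """City with the largest population."""
    return max(rows, key=lambda row: row[2])[0]


def state_with_most_cities(rows):
    """State that appears in the most rows."""
    return Counter(state for _, state, _ in rows).most_common(1)[0][0]


def state_with_largest_average(rows):
    """State whose listed cities have the highest mean population."""
    totals, counts = defaultdict(int), defaultdict(int)
    for _, state, population in rows:
        totals[state] += population
        counts[state] += 1
    return max(totals, key=lambda state: totals[state] / counts[state])


def population_of(rows, city, state):
    """Exact lookup of a single value."""
    for c, s, population in rows:
        if (c, s) == (city, state):
            return population
    raise KeyError(f"{city}, {state} not in table")


print(most_populous_city(ROWS))                      # Portland
print(state_with_most_cities(ROWS))                  # Arizona
print(state_with_largest_average(ROWS))              # Oregon
print(population_of(ROWS, "Montgomery", "Alabama"))  # 126250
```

The Montgomery lookup is the only one of the four questions that needs no aggregation.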
  13. Data Visualization is: • Useful – Answers user questions – Reduces user workload (by design, not by default)
  14. Anscombe’s quartet (1973)
  15. Anscombe’s quartet (1973)
  16. Data Visualization is: • Useful – Understand structure and patterns – Resolve ambiguity – Locate outliers
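Anscombe's quartet illustrates slide 16's point: four small datasets whose summary statistics nearly coincide although their scatter plots look completely different. The sketch below uses the values as published by Anscombe (1973) and computes those statistics with the Python standard library, so the similarity can be checked numerically before anything is plotted.

```python
from statistics import mean, variance

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x values of sets I-III
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample variances, Pearson r, and least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),
        "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four sets come out with a mean x of 9, a mean y of about 7.50, a correlation of about 0.82, and a fitted line close to y = 3 + 0.5x. Only a plot shows that set II is curved, set III has a single outlier, and set IV's line is determined by one point.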
  17.
  18. Data Visualization is: • Important – Design decisions affect interpretation
  19. Crimean War Deaths, Florence Nightingale, 1858 (re-colorized)
  20. Gapminder Foundation
  21. Data Visualization is: • Powerful – Communicate, teach, inspire
  22. Our Focus • Purpose: communicate (rather than explore, analyze) • Data type: numerical, categorical (rather than text, maps, graphs, networks) • Method: static representation (rather than animation, interactivity)
  23. Outline: Thinking and Seeing
  24. The Hardware
  25. The Software: Visual Perception • High-level concepts: objects, symbols; involves working memory; slower, serial, conscious • “Bottom-up”: sensory input; low-level features: orientation, shape, color, movement; rapid, parallel, automatic
  26. The Software: Visual Perception • “Top-down”: high-level concepts: objects, symbols; involves working memory; slow, sequential, conscious • “Bottom-up”: sensory input; low-level features: orientation, shape, color, movement; rapid, parallel, automatic
  27. Task: Counting. How many 3’s?
      1281768756138976546984506985604982826762
      9809858458224509856458945098450980943585
      9091030209905959595772564675050678904567
      8845789809821677654876364908560912949686
  28. Task: Counting. How many 3’s? The following grid is shown twice:
      1281768756138976546984506985604982826762
      9809858458224509856358945098450980943585
      9091030209905959595772564675050678904567
      8845789809821677654876364908560912949686
  29. Task: Counting. The two copies of the slide 28 grid are labeled “Slow, sequential, conscious” and “Rapid, parallel, automatic”.
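The counting task can also be written out as the serial procedure that slide 29 labels “slow, sequential, conscious”: inspect every digit in turn. The minimal Python sketch below does that for the grid as transcribed from slide 28, and also builds a text-only analogue of making the target pre-attentive by replacing every other digit with a dot. The dot substitution is an illustration chosen here, not a description of how the slide was drawn.

```python
# Digit grid transcribed from slide 28 (four rows of 40 digits).
GRID = [
    "1281768756138976546984506985604982826762",
    "9809858458224509856358945098450980943585",
    "9091030209905959595772564675050678904567",
    "8845789809821677654876364908560912949686",
]


def count_digit(grid, digit):
    """Serial scan: check every cell, one at a time."""
    return sum(row.count(digit) for row in grid)


def highlight(grid, digit, filler="."):
    """Keep only the target digit so it stands out; blank out the rest."""
    return ["".join(ch if ch == digit else filler for ch in row) for row in grid]


print(count_digit(GRID, "3"))
for line in highlight(GRID, "3"):
    print(line)
```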
  30. Task: (Distracted) Search. Which side has the red circle?
  31. Task: (Distracted) Search. Which side has the red circle?
  32. Task: (Distracted) Search. Which side has the red circle?
  33. Task: (Distracted) Search. Which side has the red circle?
  34. Task: (Distracted) Search. Labels: “Slow, sequential, conscious” and “Rapid, parallel, automatic”
  35. Task: (Distracted) Search
  36. Task: (Distracted) Search
  37. Task: (Distracted) Search
  38. Task: (Distracted) Search. Labels: “Slow, sequential, conscious” and “Rapid, parallel, automatic”; panels marked (n=7), (n=5), (n=3)
  39. Lessons for Visualization • Use “pre-attentive” attributes when possible – Color, shape, orientation (depth, motion) – Faster, higher bandwidth • Caveats – Beware limits of working memory (<7) – Be careful mixing attributes
  40. Example: Inefficient Attributes
  41. Example: Too Many Attributes
  42. Outline: From Data to Graphics
  43. Visual Encoding • What kind of data do we have? • How can we represent the data visually? • How can we organize this into a visualization?
  44. Data Types (with the operations each type supports)
      CATEGORICAL: Male / Female; Asia / Africa / Europe; True / False. Operations: =
      ORDINAL: Small / Med / Large; Low / High; Yes / Maybe / No. Operations: =, <, >
      NUMERICAL, Interval: Latitude/Longitude; Compass direction; Time (event). Operations: =, <, >, +, -
      NUMERICAL, Ratio: Length; Count; Time (duration). Operations: =, <, >, +, -, *, /
  45. Data Types (same categories and examples as slide 44), with conversions between them: Bin/Categorize, Difference/Normalize
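The conversions on slide 45 can be made concrete. Binning turns a numerical variable into an ordinal one, and differencing turns interval data such as event times into ratio data such as durations, where division becomes meaningful. A minimal Python sketch follows; the bin edges, labels, and timestamps are hypothetical choices for illustration.

```python
from bisect import bisect_right
from datetime import datetime


def bin_values(values, edges, labels):
    """Bin/Categorize: map numbers to ordered labels.

    edges must be sorted and len(labels) must equal len(edges) + 1.
    A value equal to an edge falls into the higher bin.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]


def differences(events):
    """Difference: consecutive event times (interval) -> durations (ratio)."""
    return [later - earlier for earlier, later in zip(events, events[1:])]


# Populations from the slide 8 table, binned with hypothetical cut-offs.
sizes = bin_values([58_018, 191_741, 514_108], [100_000, 250_000], ["Small", "Med", "Large"])
print(sizes)  # ['Small', 'Med', 'Large']

# Hypothetical event times: subtracting them yields durations, which can be divided.
events = [datetime(2015, 1, 1, 9, 0), datetime(2015, 1, 1, 9, 30), datetime(2015, 1, 1, 11, 0)]
durations = differences(events)
print(durations[1] / durations[0])  # 3.0
```

In Python, dividing one `datetime` by another raises `TypeError`, while dividing two `timedelta` values returns a float, which mirrors the interval/ratio distinction.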
  46. Data Types (Advanced) • Networks/Graphs – Hierarchies/Trees • Text • Maps: points, regions, routes
  47. Visual Encoding • What kind of data do we have? • How can we represent the data visually? • How can we organize this into a visualization?
  48. Visual Encodings • Marks: point, line, area, volume • Channels: position, size, shape, color, angle/tilt
  49. Channel Effectiveness
  50. Channel Effectiveness: “Spatial position is such a good visual coding of data that the first decision of visualization design is which variables get spatial encoding at the expense of others”
  51. What kind of data do we have? How can we represent the data visually? How can we organize this into a visualization?
      Town           County       Population
      Athi River     Machakos        139,380
      Awasi          Kisumu           93,369
      Kangundo-Tala  Machakos        218,557
      Karuri         Kiambu          129,934
      Kiambu         Kiambu           88,869
      Kikuyu         Kiambu          233,231
      Kisumu         Kisumu          409,928
      Kitale         Trans-Nzoia     106,187
      Kitui          Kitui           155,896
      Limuru         Kiambu          104,282
      Machakos       Machakos        150,041
      Molo           Nakuru          107,806
      Mwingi         Kitui            83,803
      Naivasha       Nakuru          181,966
      Nakuru         Nakuru          307,990
      Nandi Hills    Trans-Nzoia      73,626
  52. Scatter Plot. Mark: point • Channel: position • Data represented: 2 quantitative
  53. Scatter + Hue. Mark: point • Channels: position, color • Data represented: 2 quantitative, 1 categorical
  54. Scatter + Size (“Bubble”). Mark: point • Channels: position, size • Data represented: 3 quantitative
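When the size channel carries a quantity, as in slide 54's bubble chart, a common pitfall is scaling the radius instead of the area: doubling a value then quadruples the visible area. A minimal Python sketch of area-proportional sizing; `max_radius` is an assumed display parameter and the values are hypothetical.

```python
import math


def bubble_radius(value, max_value, max_radius):
    """Radius for a circle whose AREA, not radius, is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("expects value >= 0 and max_value > 0")
    return max_radius * math.sqrt(value / max_value)


def circle_area(radius):
    return math.pi * radius ** 2


# Hypothetical values: 100 is four times 25, so its bubble should have 4x the area.
r_large = bubble_radius(100, 100, 20)
r_small = bubble_radius(25, 100, 20)
print(r_large, r_small)                                  # 20.0 10.0
print(circle_area(r_large) / circle_area(r_small))       # 4.0 (up to rounding)
```

Scaling the radius linearly instead would have drawn the larger bubble with 16 times the area of the smaller one.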
  55. Scatter Plot – Applications: RELATIONSHIP • GROUPING • OUTLIERS
  56. Scatter Plot – Dangers: OCCLUSION (DENSITY) • OCCLUSION (OVERLAP) • 3-D
  57. Line Chart. Mark: line • Channel: position (orientation) • Data represented: 2 quantitative
  58. Area Chart. Mark: area • Channel: size (length) • Data represented: 2 quantitative
  59. Line Chart – Applications: PATTERN OVER TIME • COMPARISON
  60. Line Chart – Dangers: Y SCALING • X SCALING • OVERLOAD
  61. Bar Chart. Mark: line • Channel: size (length) • Data represented: 1 categorical, 1 quantitative
  62. Histogram. Mark: line • Channel: size (length) • Data represented: 1 ordinal/quantitative, 1 quantitative (count)
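What separates the histogram on slide 62 from the bar chart on slide 61 is a counting step: the quantitative variable is cut into bins, and the count per bin becomes the bar length. A minimal Python sketch with fixed-width bins, applied to the populations in the slide 51 table; the 100,000 bin width is an arbitrary choice.

```python
def histogram(values, bin_width, start=0):
    """Count values in fixed-width bins [lower, lower + bin_width).

    Returns {lower_edge: count} in ascending order, including empty bins
    between the smallest and largest value.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    if not values:
        return {}
    first = int((min(values) - start) // bin_width)
    last = int((max(values) - start) // bin_width)
    counts = {start + k * bin_width: 0 for k in range(first, last + 1)}
    for v in values:
        counts[start + int((v - start) // bin_width) * bin_width] += 1
    return counts


# Populations from the slide 51 table.
KENYA_POPULATIONS = [
    139_380, 93_369, 218_557, 129_934, 88_869, 233_231, 409_928, 106_187,
    155_896, 104_282, 150_041, 107_806, 83_803, 181_966, 307_990, 73_626,
]

for lower, n in histogram(KENYA_POPULATIONS, 100_000).items():
    print(f"{lower:>7,}  {'#' * n} {n}")
```

Empty bins are kept on purpose: dropping them would shorten the horizontal axis and hide gaps in the distribution.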
  63. Bar Chart – Applications: COMPARE CATEGORIES • DISTRIBUTION
  64. Bar Chart – Dangers: TOO MANY CATEGORIES • POORLY SORTED CATEGORIES • ZERO AXIS
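Two of the dangers on slide 64, poorly sorted categories and a missing zero axis, can be avoided mechanically. The minimal Python sketch below renders a plain-text bar chart that sorts categories by value and draws every bar from zero; the `width` parameter and the `#` glyph are arbitrary choices, and the state totals are sums computed from the slide 8 table.

```python
def text_bar_chart(data, width=40):
    """Render a horizontal bar chart as lines of text.

    Bars are sorted from largest to smallest value and all start at zero,
    so each bar's length stays proportional to its value.
    """
    if not data:
        return []
    top = max(data.values())
    if top <= 0 or min(data.values()) < 0:
        raise ValueError("expects non-negative values with a positive maximum")
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in sorted(data.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<{label_width}} {bar} {value:,}")
    return lines


# Total population per state, summed from the slide 8 table.
state_totals = {
    "Alabama": 501_065,
    "Arizona": 781_988,
    "Louisiana": 670_406,
    "Oklahoma": 551_746,
    "Oregon": 777_629,
}
print("\n".join(text_bar_chart(state_totals)))
```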
  65. Pie Chart. Mark: area • Channel: size (angle) • Data represented: 1 quantitative
  66. Pie Chart – Dangers: AREA/ANGLE SCALE • SIMILAR AREAS • OVERLOAD
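The “similar areas” danger on slide 66 follows from the arithmetic of a pie: each slice's angle is its share of the total times 360 degrees, so shares one percentage point apart differ by only 3.6 degrees. A minimal Python sketch with made-up shares:

```python
def pie_angles(values):
    """Angle in degrees for each slice: share of the total times 360."""
    total = sum(values)
    if total <= 0 or any(v < 0 for v in values):
        raise ValueError("expects non-negative values with a positive total")
    return [360.0 * v / total for v in values]


# Hypothetical shares (percent) that are hard to rank by eye in a pie.
shares = [20, 21, 19, 22, 18]
print([round(angle, 1) for angle in pie_angles(shares)])  # [72.0, 75.6, 68.4, 79.2, 64.8]
```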
  67. Multi-Series: Bar • “GROUPED” BAR CHART • “STACKED” BAR CHART
  68. Multi-Series: Line • MULTIPLE LINE • STACKED AREA CHART
  69. Normalization • NORMALIZED BAR • NORMALIZED AREA
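A normalized bar or area chart (slide 69) plots each series as a share of its column total instead of as a raw value. A minimal Python sketch of that transformation, with hypothetical data and a dict-of-lists layout chosen here for brevity:

```python
def normalize_stacks(series):
    """Rescale stacked series so each column (group or time point) sums to 1.

    series maps a series name to a list of non-negative values, one per column.
    Columns whose total is zero are left as zeros.
    """
    if not series:
        return {}
    lengths = {len(values) for values in series.values()}
    if len(lengths) != 1:
        raise ValueError("all series must have the same length")
    n = lengths.pop()
    totals = [sum(values[i] for values in series.values()) for i in range(n)]
    return {
        name: [v / t if t else 0.0 for v, t in zip(values, totals)]
        for name, values in series.items()
    }


# Hypothetical two-series data over two time points.
raw = {"A": [10, 30], "B": [30, 10]}
print(normalize_stacks(raw))  # {'A': [0.25, 0.75], 'B': [0.75, 0.25]}
```

Every column of the result sums to 1, so differences between the original column totals are no longer visible in the normalized chart.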
  70. Outline: Principles and Guidelines
  71. From Science to Art • Design principles* • Style guidelines* (*dependent on context and objective, and author)
  72. Design Principles
  73. Design Principles • Integrity – Tell the truth with data • Effectiveness – Achieve visualization objectives • Aesthetics – Be compelling, vivid, beautiful
  74. Integrity: Lie Ratio = (size of effect shown in graphic) / (size of effect in data)
  75. Integrity
  76. Integrity: “show data variation, not design variation”
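The Lie Ratio on slide 74 can be computed once “size of effect” is pinned down. The minimal Python sketch below assumes it is measured as relative change, |second − first| / |first|, and uses a hypothetical bar chart whose value axis starts at 45 instead of 0, the “zero axis” danger from slide 64.

```python
def effect_size(first, second):
    """Relative change from first to second (assumed measure of 'size of effect')."""
    if first == 0:
        raise ZeroDivisionError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_ratio(data_first, data_second, drawn_first, drawn_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)


def bar_length(value, axis_start):
    """Drawn bar length when the value axis starts at axis_start."""
    return value - axis_start


# Hypothetical values 50 and 55: a 10% change in the data.
truncated = lie_ratio(50, 55, bar_length(50, 45), bar_length(55, 45))
honest = lie_ratio(50, 55, bar_length(50, 0), bar_length(55, 0))
print(round(truncated, 2), round(honest, 2))  # 10.0 1.0
```

With the truncated axis the bars differ by 100% while the data differ by 10%, so the graphic exaggerates the effect tenfold; a ratio of 1 means the graphic is faithful.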
  77. Effectiveness*: Data/Ink Ratio = (ink representing data) / (total ink) (*Tufte)
  78. Effectiveness*: avoid “chart junk” (*Tufte)
  79. Avoid Chart Junk
  80. Avoid Chart Junk
  81. Avoid Chart Junk
  82. Avoid Chart Junk
  83. Avoid Chart Junk
  84. Avoid Chart Junk
  85. Effectiveness (Few)
  86. Practical Guidelines • Avoid 3-D charts • Focus on substance over graphics • Avoid separate legends and keys • Use faint grids/guidelines • Avoid unnecessary textures and colors
  87. A Note on Color • To label • To emphasize • To liven or decorate
  88. Color as a Channel
                  Categorical      Quantitative
      Hue         Good (6-8 max)   Poor
      Value       Poor             Good
      Saturation  Poor             Okay
  89. Bad Color
  90. Good Color
  91. More Color Guidelines • Use color only when necessary • Saturated colors for small areas, labels • Less saturated colors for large areas, backgrounds • Use tools like ColorBrewer
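Slide 88 rates value (lightness) as a good channel for quantitative data, and slide 91 recommends tools like ColorBrewer. As a rough illustration of why, here is a minimal Python sketch of a single-hue sequential ramp that varies only lightness. The hue, saturation, and lightness bounds are arbitrary assumptions, and the result is not a ColorBrewer palette; for real charts, ColorBrewer's tested palettes are the safer choice.

```python
import colorsys


def sequential_color(value, vmin, vmax, hue=0.6, saturation=0.6, light=0.9, dark=0.3):
    """Map a quantitative value to a hex color by varying lightness only.

    Low values get light colors, high values get dark ones. The default
    hue, saturation, light, and dark settings are illustrative assumptions.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)  # clamp to [0, 1]
    lightness = light + t * (dark - light)
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def brightness(hex_color):
    """Sum of the RGB channels, a rough proxy for how light a color is."""
    return sum(int(hex_color[i:i + 2], 16) for i in (1, 3, 5))


ramp = [sequential_color(v, 0, 100) for v in (0, 25, 50, 75, 100)]
print(ramp)
```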
  92. Outline: Building Visualizations
  93. What Tools to Use? Shown with the town / county / population table from slide 51 and the labels DATA (Clean, Restructure, Explore, Analyze) and Visualization Goals
  94. Visualization Tools: Excel, Tableau, Plotly, Python, R, Matlab, Ubiq/Silk, compared on “How hard is it to learn?” and “How powerful & flexible is it?”, with the note “I’ll have to write code”
  95. Visualization Tools: as on slide 94, adding Google Charts, Highcharts, d3
  96. Cheat Sheets • For Hackathon participants • Otherwise, email me
  97. Outline: Advanced
  98. Small Multiples
  99. Treemap (Hierarchical Data). Strengths: nested relationships. Concerns: order, aspect ratio
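The treemap's aspect-ratio concern on slide 99 is easiest to see with the simplest layout, slice-and-dice, which splits each rectangle into strips proportional to child sizes and alternates the cutting direction at each level. Libraries commonly offer smarter layouts, such as squarified treemaps, to avoid long, thin strips; the minimal Python sketch below implements only slice-and-dice, and its tuple-based tree format is an assumption made for brevity.

```python
def node_size(node):
    """Leaf: (name, number). Internal node: (name, [children])."""
    _, content = node
    if isinstance(content, list):
        return sum(node_size(child) for child in content)
    return content


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a tree as nested rectangles.

    Each node's rectangle is split among its children in proportion to their
    sizes, cutting along x at even depths and along y at odd depths.
    Returns a list of (name, depth, x, y, width, height) tuples.
    """
    if out is None:
        out = []
    name, content = node
    out.append((name, depth, x, y, w, h))
    if isinstance(content, list):
        total = node_size(node)
        offset = 0.0
        for child in content:
            share = node_size(child) / total if total else 0.0
            if depth % 2 == 0:
                rect = (x + offset * w, y, share * w, h)
            else:
                rect = (x, y + offset * h, w, share * h)
            slice_and_dice(child, *rect, depth + 1, out)
            offset += share
    return out


# Hypothetical two-level hierarchy laid out in a 100 x 100 square.
tree = ("root", [("A", [("A1", 30), ("A2", 10)]), ("B", 60)])
for name, depth, x, y, w, h in slice_and_dice(tree, 0, 0, 100, 100):
    print(f"{'  ' * depth}{name}: x={x:.0f} y={y:.0f} w={w:.0f} h={h:.0f}")
```

Children keep the order in which they are given, which is the treemap's other concern on slide 99: reordering siblings changes the picture even though the areas stay the same.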
  100. Multi-Level Pie Chart (Hierarchical Data). Strengths: nested relationships. Concerns: readability
  101. Heat Map (Table/Field Data). Strengths: pattern/outlier detection. Concerns: ordering, clustering, color
  102. Choropleth (Region Data). Strengths: geography. Concerns: region size, color
  103. Cartogram (Region Data). Strengths: geographic pattern. Concerns: base map knowledge
  104. Streamgraph: “The Ebb and Flow of Movies”, NY Times, 2008
  105. Word Cloud: Wordle word cloud of the “Data Visualization” Wikipedia page
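Word clouds like the one on slide 105 map word frequency to font size. The slides do not describe Wordle's own layout algorithm; the minimal Python sketch below covers only the counting and a linear frequency-to-size mapping, with `min_size`, `max_size`, `top_n`, and the stopword set as assumed parameters. Because a word's area grows with the square of its font size (and with its length), this mapping visually exaggerates frequent words.

```python
import re
from collections import Counter


def word_sizes(text, min_size=10, max_size=48, top_n=20, stopwords=frozenset()):
    """Map the top_n most frequent words to font sizes in [min_size, max_size]."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    common = counts.most_common(top_n)
    if not common:
        return {}
    high, low = common[0][1], common[-1][1]

    def scale(count):
        # Linear interpolation between the least and most frequent word kept.
        if high == low:
            return float(max_size)
        return min_size + (count - low) * (max_size - min_size) / (high - low)

    return {word: round(scale(count), 1) for word, count in common}


sample = "Data visualization turns data into graphics; good data graphics reveal the data."
print(word_sizes(sample, stopwords={"the", "into"}))
```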
  106.
  107. Twitter Networks, PJ Lamberson, 2012
  108. Blogs/Reference • Infosthetics.com • Visualizing.org • FlowingData.com
  109. Nikhil Srivastava, nsri@wharton.upenn.edu
