Practical Data Visualization

Published in: Data & Analytics, Technology

  1. 1. Practical Data Visualization April 10, 2014 COMPSCI 290-01: Everything Data https://iu.box.com/everythingdata Angela Zoss Data Visualization Coordinator Data & GIS Services
  2. 2. WHY VISUALIZE?
  3. 3. Preserve complexity
     Anscombe’s Quartet

          I              II             III             IV
        x      y       x      y       x      y       x      y
     10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
      8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
     13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
      9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
     11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
     14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
      6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
      4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
     12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
      7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
      5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89
  4. 4. Preserve complexity http://en.wikipedia.org/wiki/Anscombe%27s_quartet
     Anscombe’s Quartet: data sets I-IV (the same values as on slide 3) share nearly identical summary statistics.

     Property                       Value
     Mean of x                      9 (exact)
     Variance of x                  11 (exact)
     Mean of y                      7.50 (to 2 decimal places)
     Variance of y                  4.122 or 4.127 (to 3 decimal places)
     Correlation between x and y    0.816 (to 3 decimal places)
     Linear regression line         y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively)
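The summary statistics on this slide can be checked directly. The sketch below recomputes them from the quartet's values using only the Python standard library; the names `ANSCOMBE` and `summarize` are illustrative and do not come from the slides, and the variances are sample variances (dividing by n - 1), which is the convention that gives 11 for x.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet, copied from the slide. Sets I-III share the same x values.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
ANSCOMBE = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, correlation, and the least-squares line."""
    mean_x, mean_y = mean(xs), mean(ys)
    var_x, var_y = variance(xs), variance(ys)
    # Sample covariance, using the same n - 1 denominator as the variances.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    slope = cov / var_x
    return {
        "mean_x": mean_x,
        "var_x": var_x,
        "mean_y": mean_y,
        "var_y": var_y,
        "correlation": cov / sqrt(var_x * var_y),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


if __name__ == "__main__":
    for name, (xs, ys) in ANSCOMBE.items():
        stats = summarize(xs, ys)
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four sets print nearly the same numbers, which is the point of the slide: the differences only show up when the points are plotted.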
  5. 5. Preserve complexity http://en.wikipedia.org/wiki/Anscombe%27s_quartet Anscombe’s Quartet
  6. 6. Evaluate data quality Query using Facebook API • Node-link diagram Kandel, Heer, Plaisant, et al. (2011) http://dx.doi.org/10.1177/1473871611415994
  7. 7. Evaluate data quality Query using Facebook API • Node-link diagram • Matrix display with clustering Kandel, Heer, Plaisant, et al. (2011) http://dx.doi.org/10.1177/1473871611415994
  8. 8. Evaluate data quality Query using Facebook API • Node-link diagram • Matrix display with clustering • Matrix display, API return order Kandel, Heer, Plaisant, et al. (2011) http://dx.doi.org/10.1177/1473871611415994
  9. 9. Evaluate data quality Query using Facebook API • Node-link diagram • Matrix display with clustering • Matrix display, API return order: 5000-item result limit; silent failure Kandel, Heer, Plaisant, et al. (2011) http://dx.doi.org/10.1177/1473871611415994
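A silent result limit like the one on this slide can also be caught with a simple count check before any chart is drawn. The following is a minimal sketch, not part of the original talk: `hits_result_cap` and `require_complete` are hypothetical helper names, and the default cap of 5000 is taken from the slide.

```python
def hits_result_cap(records, cap=5000):
    """Return True when the result holds at least `cap` items, a hint of truncation."""
    return len(records) >= cap


def require_complete(records, cap=5000):
    """Raise instead of silently continuing when the result may be truncated."""
    if hits_result_cap(records, cap):
        raise ValueError(
            f"Got {len(records)} records, which reaches the {cap}-item limit; "
            "the result may be incomplete."
        )
    return records
```

A result that lands exactly on the limit is not proof of truncation, only a reason to look closer, which is what the matrix display does visually.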
  10. 10. Tell a story Hans Rosling: The River of Myths http://www.youtube.com/watch?v=OwII-dwh-bk http://www.gapminder.org/
  11. 11. CREATING A VISUALIZATION
  12. 12. From Data to Graphic • What data types are present in the data source? • How are the variables likely to relate? • What visualization type seems to be the best fit for the goal?
  13. 13. Matching Data Types to Visual Elements Mackinlay, J. (1986). Automating the design of graphical presentations of relational information. ACM Transactions on Graphics, 5(2), 110-141. http://dx.doi.org.proxy.lib.duke.edu/10.1145/22949.22950
  14. 14. Visual Encodings http://complexdiagrams.com/properties
      Properties and Best Uses of Visual Encodings • Noah Iliinsky • ComplexDiagrams.com/properties • 2012-06
      Data types rated: Quantitative, Ordinal, Categorical, Relational

      Encoding                              Ordered                                Useful values   Rated "Good" for
      position, placement                   yes                                    infinite        all 4 data types
      text labels (e.g. 1, 2, 3; A, B, C)   optional (alphabetical or numbered)    infinite        all 4 data types
      length                                yes                                    many            2 data types
      size, area                            yes                                    many            2 data types
      angle                                 yes                                    medium/few      2 data types
      pattern density                       yes                                    few             2 data types
      weight, boldness                      yes                                    few             1 data type
      saturation, brightness                yes                                    few             1 data type
      color                                 no                                     few (< 20)      1 data type
      shape, icon                           no                                     medium          1 data type
      pattern texture                       no                                     medium          1 data type
      enclosure, connection                 no                                     infinite        2 data types
      line pattern                          no                                     few             1 data type
      line endings                          no                                     few             1 data type
      line weight                           yes                                    few             1 data type
  15. 15. Chart Choosers • Interested in showing composition? Relationship? Distribution? (What do the charts do well?) http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html • Chart typically determines position of elements, with some built-in visual encodings. • Additional visual encodings can often be added to incorporate more variables into charts, but beware of overwhelming the audience.
  16. 16. Chart Choosers http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html
  17. 17. VISUALIZATION TYPES
  18. 18. Common Visualization Types • 1D/Linear • 2D/Planar (incl. Geospatial) • 3D/Volumetric • Temporal • nD/Multidimensional (common charts, etc.) • Tree/Hierarchical • Network Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. Proceedings of IEEE Symposium on Visual Languages - Boulder, CO (pp. 336-343). http://dx.doi.org.proxy.lib.duke.edu/10.1109/VL.1996.545307 See LibGuide (http://guides.library.duke.edu/vis_types) for examples and tools.
  19. 19. One-dimensional scatter plot
  20. 20. 3D visualization http://bit.ly/1gaahiw
  21. 21. Common Visualization Types Showing Space
  22. 22. Proportional symbol http://ti.me/RQaRH9 http://wapo.st/2012-campaignvisits
  23. 23. Proportional symbol
  24. 24. Choropleth https://twitter.com/mihi_tr/status/330261204083810304/photo/1
  25. 25. Choropleth https://twitter.com/mihi_tr/status/330261204083810304/photo/1
  26. 26. And don’t make users do “visual math.” http://eagereyes.org/criticism/visual-math-wrong http://enb105-2012s-rw.blogspot.com/2012/02/lab-two-mapping-excercise.html
  27. 27. Common Routes Based on Ship Log Data http://bit.ly/1i3PSQh
  28. 28. Atlas of the Historical Geography of the United States (1932) http://bit.ly/1qv0Lvo
  29. 29. Possible tools for mapping • ArcGIS • QGIS • Google Fusion Tables • Tableau Public • Google Earth • GeoCommons • CartoDB • JavaScript – D3 http://d3js.org/ – Leaflet http://leafletjs.com/ – Kartograph http://kartograph.org/ – Polymaps http://polymaps.org/ – Google Maps API https://developers.google.com/maps/documentation/javascript/ • Very basic: – Google Spreadsheets – BatchGeo http://batchgeo.com/ – OpenHeatMap http://www.openheatmap.com/ See also: http://library.duke.edu/data/gis https://github.com/veltman/learninglunches/tree/master/maps
  30. 30. Common Visualization Types Showing Time
  31. 31. Economic indicators over time http://blogs.library.duke.edu/data/2012/11/12/adding-colored-regions-to-excel-charts/
  32. 32. http://seawifs.gsfc.nasa.gov/SEAWIFS/BACKGROUND/Gallery/time_series.jpg Time series of 2D data set
  33. 33. Storylines http://xkcd.com/657/
  34. 34. http://neoformix.com/2013/NovelViews.html
  35. 35. Shape of Song http://www.turbulence.org/Works/song/mono.html
  36. 36. http://itsbeenreal.co.uk/index.php?/wwwords/rhythm-textures/
  37. 37. Over the Decades, How States Have Shifted http://nyti.ms/Wr1dhZ
  38. 38. Possible tools for temporal vis. • Basic charting tools • TimelineJS http://timeline.knightlab.com/ • SIMILE Timeline http://simile.mit.edu/ • D3
  39. 39. Common Visualization Types Showing Numbers
  40. 40. Raw: Binned Scatterplot http://raw.densitydesign.org/
  41. 41. Parallel Coordinates http://eagereyes.org/techniques/parallel-coordinates
  42. 42. Alluvial Diagram (Raw) http://raw.densitydesign.org/
  43. 43. http://www.nytimes.com/interactive/2008/09/04/us/politics/20080905_WORDS_GRAPHIC.html
  44. 44. Heat Maps http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/ http://flowingdata.com/2011/09/13/last-fm-scrobbles-as-calendar-heat-map/
  45. 45. Dynamic Pairs Plot: http://www.stat.sc.edu/~west/bradley/census.html Pairs Plots
  46. 46. Possible tools for multidimensional vis. • Basic charting tools • Raw http://raw.densitydesign.org/ • Tableau • D3 • R
  47. 47. Common Visualization Types Showing Hierarchies
  48. 48. Dendrogram (Raw) http://raw.densitydesign.org/
  49. 49. Treemap (Raw) http://raw.densitydesign.org/
  50. 50. Circle Packing (Raw) http://raw.densitydesign.org/
  51. 51. Possible tools for hierarchies • Tableau • D3 • Raw • Google Spreadsheets
  52. 52. Common Visualization Types Showing Relationships
  53. 53. Flights http://www.aaronkoblin.com/work/flightpatterns/
  54. 54. Revolutionaries http://kieranhealy.org/blog/archives/2013/06/09/using-metadata-to-find-paul-revere/
  55. 55. Scientific Fields, Spanish Empire (1600-1810) http://republicofletters.stanford.edu/casestudies/spanishempire.html
  56. 56. Example: NIH Map Viewer https://app.nihmaps.org/nih/browser/
  57. 57. Chinese Canadian Immigrant Flows http://stanford.io/1hCYwkd
  58. 58. Flight & Expulsion http://www.niceone.org/lab/refugees/
  59. 59. Tube Map
  60. 60. Possible tools for network vis. • D3 • Gephi http://gephi.org/ • NodeXL http://nodexl.codeplex.com/ • Pajek http://vlado.fmf.uni-lj.si/pub/networks/pajek/ • Cytoscape • Network Workbench/Sci2 http://nwb.cns.iu.edu/, https://sci2.cns.iu.edu/ • VOSviewer http://www.vosviewer.com/ • UCINET https://sites.google.com/site/ucinetsoftware/home • GUESS http://graphexploration.cond.org/ • R • SigmaJS http://sigmajs.org/
  61. 61. VISUALIZING UNCERTAINTY
  62. 62. Showing uncertainty http://peltiertech.com/WordPress/excel-fan-chart-showing-uncertainty-in-projections/
  63. 63. Showing uncertainty http://ivi.sagepub.com/content/10/4/271
  64. 64. Showing uncertainty http://vialab.science.uoit.ca/portfolio/lattice-uncertainty-visualization-understanding-machine-translation-and-speech-recognition
  65. 65. Take-away Uncertainty is blue.
  66. 66. TOOLS
  67. 67. JMP Pro https://oit.duke.edu/comp-print/software/license/detail.php?id=4 http://www.jmp.com/support/help/Essential_Graphing.shtml
  68. 68. JMP: Essential Graphing • Overlay Plots • Scatterplot 3D • Contour Plots • Bubble Plots • Parallel Plots • Cell Plots • Treemaps • Scatterplot Matrix • Ternary Plots • Summary Charts • Create Maps http://www.jmp.com/support/help/Essential_Graphing.shtml
  69. 69. DEMO
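  70. 70. Congress data query
      -- Earliest start date and latest end date for each person and role type
      SELECT person_id, type, MIN(start_date), MAX(end_date), gender
      FROM person_roles
      LEFT JOIN persons ON person_roles.person_id = persons.id
      GROUP BY person_id, type, gender;  -- gender grouped too, so strict SQL engines accept the query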
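The query on this slide can be tried without a database server. The sketch below runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The table and column names come from the slide; the column types, the four sample rows, and the helper names `demo_connection` and `career_spans` are assumptions made purely for illustration. `gender` is included in the `GROUP BY` so the query is also accepted by engines that reject ungrouped, unaggregated columns; since each person has one gender, the grouping is unchanged.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a few made-up rows (not real Congress data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE persons (id INTEGER PRIMARY KEY, gender TEXT);
        CREATE TABLE person_roles (
            person_id INTEGER, type TEXT, start_date TEXT, end_date TEXT
        );
        INSERT INTO persons VALUES (1, 'F'), (2, 'M');
        INSERT INTO person_roles VALUES
            (1, 'rep', '2001-01-03', '2003-01-03'),
            (1, 'rep', '2003-01-03', '2005-01-03'),
            (1, 'sen', '2005-01-03', '2011-01-03'),
            (2, 'sen', '1999-01-03', '2005-01-03');
        """
    )
    return conn


def career_spans(conn):
    """Return one row per person and role type with the first start and last end date."""
    query = """
        SELECT person_roles.person_id, person_roles.type,
               MIN(person_roles.start_date), MAX(person_roles.end_date),
               persons.gender
        FROM person_roles
        LEFT JOIN persons ON person_roles.person_id = persons.id
        GROUP BY person_roles.person_id, person_roles.type, persons.gender
        ORDER BY person_roles.person_id, person_roles.type
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in career_spans(demo_connection()):
        print(row)
```

Dates are stored as ISO 8601 text here, so `MIN` and `MAX` compare them correctly as strings. The result has one row per variable combination, which is a convenient shape for charting tools.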
  71. 71. OTHER TOOLS
  72. 72. Tableau http://guides.library.duke.edu/tableau
  73. 73. What can Tableau make? • Maps (symbol, filled) • Text tables • Heat maps (a grid representing variables by size and color) • Highlight tables (a grid representing variables by text and color) • Treemap (a grid representing variables by size) • Horizontal bars • Stacked bars • Side-by-side bars • Lines/Area charts • Lines/Area charts (discrete) • Dual lines • Pie charts • Scatter plots • Circle views • Side-by-side circles • Dual combination • Bullet graphs • Gantt • Packed bubbles/Word cloud • Histogram
  74. 74. Tableau Desktop Windows only (for now). Free for: • students (http://www.tableausoftware.com/academic/students) • teachers using it in a class, semester license (http://www.tableausoftware.com/academic/teaching) Otherwise, can use Tableau Public for free (installed in Perkins 226)
  75. 75. Protip: Tableau wants one column per variable
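"One column per variable" usually means reshaping a wide table (for example, one column per year) into a long one, where the year becomes its own column. The sketch below shows one way to do that with plain Python dictionaries; `wide_to_long` is a hypothetical helper and the sample values are made up for illustration.

```python
def wide_to_long(rows, id_fields, var_name, value_name):
    """Turn every non-identifier column of each row into its own output row."""
    long_rows = []
    for row in rows:
        ids = {field: row[field] for field in id_fields}
        for column, value in row.items():
            if column in id_fields:
                continue  # identifier columns are repeated on each output row
            long_rows.append({**ids, var_name: column, value_name: value})
    return long_rows


# Made-up example: one column per year (wide) becomes a "year" column (long).
wide = [
    {"state": "NC", "2010": 10, "2011": 12},
    {"state": "VA", "2010": 7, "2011": 9},
]
long_rows = wide_to_long(wide, ["state"], "year", "value")

if __name__ == "__main__":
    for row in long_rows:
        print(row)
```

In the long form, "year" can be dropped onto an axis or a filter as a single field, which is not possible when each year is a separate column.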
  76. 76. Gephi http://bit.ly/gephi_workshop
  77. 77. Data formats • Confusing number of choices • GEXF supports many program features, but a pain to write by hand • Spreadsheet is convenient and supports important features https://gephi.org/users/supported-graph-formats/
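As a rough illustration of the spreadsheet route, the sketch below writes an edge list as CSV text with Python's standard `csv` module. It assumes the importer looks for columns named `Source` and `Target` (with an optional `Weight`); check the supported-formats page linked on this slide before relying on that. The function name `edges_to_csv` is hypothetical.

```python
import csv
import io


def edges_to_csv(edges):
    """Return CSV text with one row per (source, target, weight) edge."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed header names; verify them against the importer's documentation.
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, weight])
    return buffer.getvalue()


if __name__ == "__main__":
    print(edges_to_csv([("alice", "bob", 2), ("bob", "carol", 1)]))
```

Writing the file with a CSV library rather than by string concatenation keeps names containing commas or quotes intact.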
  78. 78. In addition to network visualization, Gephi can calculate: • Degree (when directed, in-degree and out-degree) • Diameter – Betweenness Centrality – Closeness Centrality – Eccentricity • Density • Clustering/Modularity
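Two of these measures are simple enough to compute by hand, which helps when sanity-checking a tool's output. The sketch below computes degree and density for a small undirected graph, assuming no self-loops and no duplicate edges; the function names and the sample edges are illustrative only. Density here is the number of edges divided by the number of possible edges, 2m / (n(n - 1)).

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return dict(counts)


def density(edges, node_count):
    """Share of possible undirected edges that are present: 2m / (n(n - 1))."""
    if node_count < 2:
        return 0.0
    return 2 * len(edges) / (node_count * (node_count - 1))


if __name__ == "__main__":
    sample = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    print(degrees(sample))
    print(round(density(sample, node_count=4), 3))
```

For a directed graph the same counting would be split into in-degree (edges arriving) and out-degree (edges leaving), and the density denominator would be n(n - 1) without the factor of 2.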
  79. 79. D3.js http://bl.ocks.org/dukevis/8782982 http://d3js.org
  80. 80. About D3 • JavaScript library • Fairly low level; building with rectangles and circles and lines, instead of pre-made chart structures* • Basic functioning makes it easy to join HTML elements with data points
  81. 81. *D3 Middleware Basic line/area chart: • xCharts ~10 lines? http://tenxer.github.io/xcharts/ • Rickshaw (specifically for time series) ~16 lines http://code.shutterstock.com/rickshaw/ • NVD3 ~31 lines http://nvd3.org/ • Vega ~57 lines http://trifacta.github.io/vega/ http://chimera.labs.oreilly.com/books/1230000000345/ch02.html#_tools_built_with_d3
  82. 82. D3 Resources • Interactive Data Visualization for the Web http://chimera.labs.oreilly.com/books/1230000000345 • Tutorial and Cheat Sheet, c. 2012 www.jeromecukier.net/blog/2012/10/15/d3-tutorial-at-visweek-2012/ • D3 Tips and Tricks https://leanpub.com/D3-Tips-and-Tricks/read
  83. 83. When to use D3 • Need for unusual, highly customized chart types (http://bl.ocks.org/mbostock) • Relatively low number of data points or visible elements (SVG vs. HTML5 Canvas) • Impress your friends
  84. 84. Raw http://raw.densitydesign.org/ Has visualizations to show: • Numbers • Relationships • Hierarchies
  85. 85. Google Spreadsheets https://drive.google.com/
  86. 86. Datawrapper http://datawrapper.de
  87. 87. ManyEyes http://www.ibm.com/manyeyes …or maybe http://www-958.ibm.com/software/analytics/labs/manyeyes/
  88. 88. ManyEyes http://www.ibm.com/manyeyes …or maybe http://www-958.ibm.com/software/analytics/labs/manyeyes/ Many of these require Java
  89. 89. ChartBuilder http://quartz.github.io/Chartbuilder/
  90. 90. Plot.ly https://plot.ly/
  91. 91. TimelineJS http://timeline.knightlab.com/
  92. 92. Timeliner http://timemapper.okfnlabs.org/
  93. 93. StoryMapJS http://storymap.knightlab.com/
  94. 94. Also, GitHub auto-rendering • 3D Files https://help.github.com/articles/3d-file-viewer • GeoJSON/TopoJSON https://help.github.com/articles/mapping-geojson-files-on-github • CSV/TSV https://help.github.com/articles/rendering-csv-and-tsv-data
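For the GeoJSON case, a file only needs a small amount of structure. The sketch below builds a minimal `FeatureCollection` of points with Python's standard `json` module. The helper name and the two example points (names and coordinates) are placeholders, not real data from the talk. Note that GeoJSON lists coordinates as longitude first, then latitude.

```python
import json


def point_feature_collection(points):
    """Build a GeoJSON FeatureCollection from (name, longitude, latitude) tuples."""
    features = [
        {
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": name},
        }
        for name, lon, lat in points
    ]
    return {"type": "FeatureCollection", "features": features}


# Placeholder points for illustration only.
places = [("Example point A", -78.94, 36.00), ("Example point B", -79.05, 35.91)]
geojson_text = json.dumps(point_feature_collection(places), indent=2)

if __name__ == "__main__":
    print(geojson_text)
```

Saving that text in a file with a `.geojson` extension is the usual way to hand it to a map renderer.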
  95. 95. VISUALIZATION TIPS
  96. 96. Visualization Tips: Vector output • Stats programs aren’t design programs • Vector output (PDF, SVG, EPS) is easy to edit later in a vector graphics program like Adobe Illustrator • Also helps to create high-res for posters, Mediawall, etc.
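To make the idea of vector output concrete: an SVG file is plain text that describes shapes, so every bar stays an editable object rather than a block of pixels. The sketch below writes a tiny bar chart as SVG using only string formatting. The function name, sizes, and fill color are arbitrary choices for illustration, and it assumes a non-empty list of positive values.

```python
def bar_chart_svg(values, bar_width=30, gap=10, height=100):
    """Return SVG text with one rectangle per value, scaled to the tallest bar."""
    peak = max(values)
    width = len(values) * (bar_width + gap) + gap
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for index, value in enumerate(values):
        bar_height = height * value / peak
        x = gap + index * (bar_width + gap)
        y = height - bar_height  # SVG's y axis points down, so bars hang from y
        parts.append(
            f'<rect x="{x}" y="{y:.1f}" width="{bar_width}" '
            f'height="{bar_height:.1f}" fill="steelblue"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(bar_chart_svg([1, 2, 4]))
```

Because each `<rect>` is a separate element, a vector graphics program can recolor, relabel, or resize individual bars without any loss of quality.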
  97. 97. Design Tips: In a nutshell Simplify (but not the axis): • Reduce color • Focus on major trends • Consistent style/format/reference system http://guides.library.duke.edu/topten
  98. 98. Pay attention to text in figures • Horizontal text/bars • Increase font size http://bit.ly/figtext
  99. 99. Crowded charts can be overwhelming
  100. 100. Clarify groups and trends…
  101. 101. Or use multiple charts, keeping a consistent grid
  102. 102. Small multiples See also: Tufte, E. R. (1990). Envisioning information. Cheshire, CT: Graphics Press, pp. 28-29, 78. http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/china-cdm-projects-by-type-and-regio
  103. 103. Consider summarizing
  104. 104. Avoid special effects http://bit.ly/3dpiebad
  105. 105. Design for Non-Designers Michael Faber, basic graphic design principles • Learn IT at Lunch, Wednesday, April 9 Follow-up: http://bit.ly/1ktHzRg • Visualization Friday Forum recording, Spring 2013 http://bit.ly/14oxuIO
  106. 106. Good Chart Makeover Examples The Why Axis chart remakes: http://thewhyaxis.info/remakes/ Storytelling With Data visual makeovers: http://www.storytellingwithdata.com/search/label/Visual%20Makeover
  107. 107. On the web • Bad examples: WTF Viz, http://wtfviz.net/ • Good examples: Thumbs Up Viz, http://thumbsupviz.com/ • Ask for help: Help Me Viz, http://helpmeviz.com/
  108. 108. More on Data Visualization Visual communication: http://guides.library.duke.edu/visualcomm Data visualization: http://guides.library.duke.edu/datavis/ Top 10 dos and don’ts for charts and graphs: http://guides.library.duke.edu/topten
  109. 109. RESOURCES OFFERED AT DUKE
  110. 110. Data & GIS Services • Perkins 226 computing cluster • Walk-in consultations • Data collections • Workshops • Online instructional materials
  111. 111. Brandaleone Family Center for Data and GIS Services • Perkins 226 • Open whenever the library is open • 12 high-powered Dell workstations • 3 Bloomberg financial workstations • Various data analysis, GIS, and visualization software packages available http://library.duke.edu/data/about/lab
  112. 112. Support Area: Visualizing Data • GIS (Geographic Information Systems) support – Workshops on ArcGIS and other online mapping tools – High-powered computers with GIS software – Expert help from Data & GIS Staff • Visualization support, more broadly – Workshops on Tableau Public and best practices for charts, graphs, posters, etc.
  113. 113. Walk-in Consulting …or by appointment: askdata@duke.edu http://library.duke.edu/data/about/schedule
  114. 114. Workshops • Typically toward the beginning of the semester • Covering: data processing/statistical software, GIS/mapping, visualization http://library.duke.edu/data/news • 1-2 hours, often hands-on http://library.duke.edu/data/guides/ For announcements, sign up for our listserv: https://lists.duke.edu/sympa/subscribe/dgs-announce
  115. 115. Information about Data & GIS Services • Data collections, LibGuides, etc. http://library.duke.edu/data/ • Blog (tutorials, announcements, etc.) http://blogs.library.duke.edu/data/ • E-mail consultations askdata@duke.edu • Twitter accounts @duke_data, @duke_vis
  116. 116. Other visualization resources • Visualization Friday Forum http://vis.duke.edu/FridayForum/ • Duke Flickr Gallery http://bit.ly/dukevis • Student Data Visualization Contest http://bit.ly/viscontest14 • LINK Mediawall https://wiki.duke.edu/display/LMW/
  117. 117. QUESTIONS? SUGGESTIONS? angela.zoss@duke.edu http://twitter.com/duke_vis
