I Can See Clearly Now: A Survey of Data Visualization Techniques & Practice

A short survey of data visualization techniques

Presented at Toronto Data Science Meetup
February 12th, 2014

Published in: Technology

  • Introduction
  • Why data visualization?
  • Big data vs. “small data”
  • Three components of data science / the data scientist: (1) Math (predictive modeling, advanced statistical techniques, machine learning…); (2) Technology (data, coding, big data platforms & databases, Hadoop, Mongo, etc.); (3) Communication (explaining technical findings and processes to non-technical people, stakeholders, the business). Data viz is important because it fits squarely into #3, which is often overlooked.
  • Anscombe’s quartet – an example of why dataviz is important
  • 1: linear relationship; 2: non-linear relationship; 3: linear relationship with an outlier; 4: centered around a single x-value, with an outlier
  • Attentive vs. pre-attentive processing
  • Attentive and pre-attentive processing – a simple example
  • From Stephen Few, several measures for encoding quantity visually: length, width, area / size, orientation, shape, enclosure, color (hue, intensity), position
  • Simplest case, a quantity across a categorical variable: the pie chart. The dataviz community & experts say: no, thanks.
  • Bar graphs instead: good for comparing quantitative data across categories, especially for ranking
  • Misleading example
  • As quantity is interpreted from length, this is misleading
  • Instead, ensure the quantitative (value) axis always starts from 0 (or another meaningful base point), so that bar length stays proportional to the quantity
  • On that note: How to Make a Graph Suck Less (in 30s). An example of the data-ink ratio.
  • 3D pie chart (!)
  • 3D is unnecessary
  • Bar chart
  • Colour is unnecessary
  • As is shading
  • As are gridlines
  • Ranked
  • Highlight value of interest
  • And label
  • Compare and contrast the quality of the information being represented: 15% looks larger than 33% in the pie chart, when it is in fact less than half
  • Don’t use 3D
  • Scatterplots for comparing two quantitative variables. Compare across a categorical variable with colour. Add a line of best fit to illustrate the trend.
  • For larger data sets, design choices become more important. For relationships that are not as simple, a moving average or other fitted curve can be illuminating.
  • Nuances of line graphs: points / markers or not? How to handle missing data?
  • As the number of quantities plotted increases, legibility decreases – tradeoffs required
  • Bubble plot
  • For dense data points, interpretation suffers. Different techniques can help (alpha, point sizing, kernel density estimation).
  • Hexbinning
  • Extension of 1-D histogram
  • Also the boxplot. The measure of central tendency is the dark line (usually the median). Overplot points with jitter for more detail / context, as the x-axis is not quantitative.
  • SPLOM (scatterplot matrix). Histograms along the diagonal.
  • Trellis graph / small multiples (Tufte)
  • Slopegraph. Compares changes in a large number of categories across two values of a categorical variable (usually time / before-after). Values should be in the same unit. Useful for ranked data.
  • Network graph: shows relationships between entities. Nodes are typically sized by degree or another measure of connectedness or centrality. Color for category.
  • Or colour by quantitative value, usually some measure of the importance of points in the network (a measure of centrality)
  • Chord diagram
  • Sankey diagram, 1898: energy efficiency of a steam engine. Shows flow with proportional widths. Most notably used in Google Analytics, from the Google research visualization group.
  • Know your audience and how to capture their imagination; make choices accordingly. Startup guys (show us something cool) vs. old C-levels at a bank (give me the data).
  • Interactive? Data exploration
  • Exploratory vs. explanatory
  • Breathe. Why visualize? It allows us to take in more information at once (pre-attentive processing). Pattern recognition. A picture is worth 1,000 words; a graph is worth 1,000 data points (or more).
  • iv. Communicating with Data
  • I Can See Clearly Now: A Survey of Data Visualization Techniques & Practice

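The note on misleading bar graphs (quantity is read from bar length) can be made concrete with a little arithmetic. This is a rough sketch, not something shown in the talk: the function name `apparent_ratio` and the example values 105 and 100 are hypothetical, chosen only to illustrate how a truncated axis exaggerates a small difference.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b.

    A bar for value v is drawn from `baseline` up to v, so its visible
    length is v - baseline. With baseline=0 the ratio of lengths equals
    the true ratio a / b; any other baseline distorts it.
    """
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    return (a - baseline) / (b - baseline)


if __name__ == "__main__":
    # Hypothetical values: 105 is only 5% larger than 100.
    print(apparent_ratio(105, 100))               # axis starts at 0
    print(apparent_ratio(105, 100, baseline=95))  # axis starts at 95
```

With the axis starting at 0 the bars differ by 5%, matching the data; with the axis starting at 95 the first bar is drawn twice as long as the second.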
    1. 1. I Can See Clearly Now: A Survey of Data Visualization Techniques & Practice. Toronto Data Science Group, Wednesday, February 12, 2014. Myles Harrison, @everydayanalyst, www.everydayanalytics.ca, myles@mylesharrison.com
    2. 2. i.
    3. 3. Pourquoi le #dataviz? (Why #dataviz?)
    4. 4. DATA
    5. 5. #dataviz Communication “DATA SCIENTIST” Mathematics Technology
    6. 6. Anscombe’s quartet

              I              II             III             IV
           x      y       x      y       x      y       x      y
         10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
          8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
         13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
          9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
         11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
         14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
          6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
          4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
         12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
          7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
          5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

         Property                                    Value
         Mean of x in each case                      9 (exact)
         Variance of x in each case                  11 (exact)
         Mean of y in each case                      7.50 (to 2 decimal places)
         Variance of y in each case                  4.122 or 4.127 (to 3 decimal places)
         Correlation between x and y in each case    0.816 (to 3 decimal places)
         Linear regression line in each case         y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively)
    7. 7. Anscombe’s quartet
    8. 8. pre-attentive processing
    9. 9. 1172 / 293 = ? 1172 293
    10. 10. Adapted from Show Me The Numbers, 2nd ed. by Stephen Few. Analytics Press, 2012
    11. 11. ii.
    12. 12. Variable Y vs. Variable X [chart; x-axis: Variable X, y-axis: Variable Y]
    13. 13. Variable Y, January 2013 [chart; x-axis: Date, y-axis: Variable Y]
    14. 14. Variable Y, January 2013 [chart; x-axis: Date, y-axis: Variable Y]
    15. 15. [multi-panel chart; date axes from 1-Jan-13 to 26-Jan-13]
    16. 16. iii.
    17. 17. Credit: Kirk, Andy. In Praise of Slopegraphs.
    18. 18. Credit: griffsgraphs.com
    19. 19. Credit: Bostock, M. Co-occurrence of Characters in Les Misérables. mbostock.github.io/provitis/ex/arc.html
    20. 20. iv.
    21. 21. The Spectrum of Data Visualization [diagram; axis from Art / DESIGN to Science / ANALYSIS; items shown: Data Art, Information Design, ?, Infographics, Dashboards, Graphs, Tables]
    22. 22. Know your audience!
    23. 23. E.D.A Exploratory Data Analysis Explanatory Data Analysis
    24. 24. Recommended Resources
