Visualization for
Data Science
Angela Zoss
Duke University Libraries
angela.zoss@duke.edu
NCDS DataBytes Webinar
May 3, 2017
Slides: http://bit.ly/vis4ds
Angela Zoss
Data Vis.
Coordinator,
Duke University
Libraries
Angela Zoss
http://library.duke.edu/data
Today’s Topics
• Approaches to visualization
• Visualization for data science
• Developing your design instincts
• Reproducibility and visualization
VISUALIZATION APPROACHES
Vis for
Everyone
http://guides.library.duke.edu/vis_types
Vis for
Utility
http://web.mta.info/nyct/maps/subway_map.pdf
Vis for
News
https://nyti.ms/Wr1dhZ
http://www.stanfordkaystudio.com/information.html
Vis for
Entertainment
http://www.informationisbeautiful.net/visualizations/a-taxonomy-of-hipster-coffee-shop-names/
Vis for Art
http://www.dear-data.com/theproject
Vis for
Activism
http://guns.periscopic.com
Vis for
Research
http://atlas.cid.harvard.edu
Vis for
Business
https://datastudio.google.com
VISUALIZATION FOR DATA SCIENCE
Transforming
Data
Jeff Heer
http://bit.ly/HeerVisProcess
Visualization for
• Exploration
• Communication
Visualization
for Exploration
Distributions
Visualization
for Exploration
Correlations
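One way to see why correlations are worth plotting and not only computing: the sketch below uses the first two sets of Anscombe's quartet (a well-known teaching dataset, not taken from these slides). The helpers `pearson_r` and `text_scatter` are hypothetical names written for this illustration. Both sets have nearly the same correlation coefficient, yet the crude text scatterplots show one roughly linear cloud and one clear curve.

```python
import math

# Anscombe's quartet, sets I and II (they share the same x values).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def text_scatter(xs, ys, rows=8):
    """Crude text scatterplot: one column per point, ordered by x.

    Assumes the x values are evenly spaced (true for X above) and
    that ys contains at least two distinct values.
    """
    pairs = sorted(zip(xs, ys))
    lo, hi = min(ys), max(ys)
    grid = [[" "] * len(pairs) for _ in range(rows)]
    for col, (_, y) in enumerate(pairs):
        row = round((hi - y) / (hi - lo) * (rows - 1))  # row 0 is the top
        grid[row][col] = "*"
    return "\n".join("".join(line) for line in grid)


if __name__ == "__main__":
    for name, ys in (("Set I", Y1), ("Set II", Y2)):
        print(f"{name}: r = {pearson_r(X, ys):.3f}")
        print(text_scatter(X, ys))
        print()
```

In practice a plotting library would replace `text_scatter`; the point is only that the summary statistic alone hides the difference in shape.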
Visualization
for Exploration
Testing statistical
assumptions
[Figure: regression diagnostic plots, four panels]
Residuals vs Fitted (x: Fitted values, y: Residuals)
Normal Q-Q (x: Theoretical Quantiles, y: Standardized residuals)
Scale-Location (x: Fitted values, y: Standardized residuals)
Residuals vs Leverage (x: Leverage, y: Standardized residuals; Cook's distance contours at 0.5 and 1)
http://graphicaldescriptives.org
http://dx.doi.org/10.1177/1745691616663875
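A "Residuals vs Fitted" diagnostic plot is built from two quantities: the values the model predicts and what is left over after prediction. The following is a minimal sketch of that computation for a simple linear regression. It uses synthetic data with a fixed seed, and `fit_line` and `residuals_vs_fitted` are hypothetical helper names for this illustration, not the method used to produce the figures in the slides.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals_vs_fitted(xs, ys):
    """Return (fitted, residuals), the two axes of the diagnostic plot."""
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    return fitted, residuals


# Synthetic example data (an assumption for illustration only):
# a straight line plus Gaussian noise, generated with a fixed seed.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]

fitted, residuals = residuals_vs_fitted(xs, ys)
```

Plotting `residuals` against `fitted` should show a patternless band around zero when the linear model is adequate; curvature or a funnel shape suggests a violated assumption.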
Visualization
for Exploration
Insight through
experimentation
Nathan Yau
http://flowingdata.com/2017/01/24/one-dataset-visualized-25-ways/
Visualization for
Communication
Annotations
Visualization for
Communication
Storytelling
Hans Rosling
http://www.youtube.com/watch?v=OwII-dwh-bk
Visualization for
Communication
Dashboards
Shamik Sharma, via
https://www.perceptualedge.com/blog/?p=1374
DESIGNING VISUALIZATIONS
Importance of
Design
Insights depend
on using the
right chart
for the right
question
Kandel, Heer, Plaisant, et al. (2011)
http://dx.doi.org/10.1177/1473871611415994
Importance of
Design
Insights depend
on using the
right chart
for the right
question
5000-item result limit
Silent failure
Kandel, Heer, Plaisant, et al. (2011)
http://dx.doi.org/10.1177/1473871611415994
Importance of
Design
Insights depend
on matching
data properties
to visual
properties
Borkin, Gajos, Peters, et al. (2011)
http://bit.ly/hemovis
Design
Instincts
Be comfortable
with a variety
of charts
http://www.datavizcatalogue.com/
Design
Instincts
Follow the
work of other
designers
https://medium.com/accurat-studio/the-architecture-of-a-data-visualization-470b807799b4
Design
Instincts
Get feedback
http://dataremixed.com/2016/04/the-design-of-everyday-visualizations
VISUALIZATION AND
REPRODUCIBILITY
Reproducible
Research
“[T]he ability to
recompute
results”
“Can I trust this
analysis?”
Not the same as
trusting the
results
Jeffrey T. Leek and Roger D. Peng, PNAS 2015;112:1645-1646
http://dx.doi.org/10.1073/pnas.1421412111
Peer review and editor evaluation help treat poor data analysis.
Reproducible
Visualization
The ability to
regenerate
visualizations
Doesn’t ensure
effectiveness
of visualization
Scripting
Visualization
JMP Pro (proprietary)
R (fully open)
MATLAB (proprietary)
Python (fully open)
Why create reproducible
visualizations?
• Transparency of process
• Easy to recreate previous figures
• Easy to create multiple figures
that have a similar style
… but it is often much harder to
customize or add design elements
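A minimal sketch of what "scripted" reproducibility can look like: the same script and the same seed always regenerate the same figure. To stay dependency-free, this example writes a bare SVG scatterplot by hand; in practice the figure would more likely come from a plotting library in R or Python. The names `make_points`, `scatter_svg`, and `figure_digest` are hypothetical, and the data are random placeholders.

```python
import hashlib
import random


def make_points(seed=42, n=30):
    """Deterministic placeholder data: same seed, same points."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def scatter_svg(points, width=300, height=200, margin=20, radius=3):
    """Render points as a bare SVG scatterplot (no axes or labels).

    Assumes the points span a nonzero range in both x and y.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def scale_x(x):
        return margin + (x - x_lo) / (x_hi - x_lo) * (width - 2 * margin)

    def scale_y(y):
        # SVG y grows downward, so flip the axis.
        return height - margin - (y - y_lo) / (y_hi - y_lo) * (height - 2 * margin)

    circles = [
        f'<circle cx="{scale_x(x):.1f}" cy="{scale_y(y):.1f}" '
        f'r="{radius}" fill="steelblue"/>'
        for x, y in points
    ]
    header = (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    )
    return "\n".join([header, *circles, "</svg>"]) + "\n"


def figure_digest(seed=42):
    """SHA-256 of the rendered figure, useful for checking regeneration."""
    svg = scatter_svg(make_points(seed))
    return hashlib.sha256(svg.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    with open("figure.svg", "w", encoding="utf-8") as handle:
        handle.write(scatter_svg(make_points()))
    print(figure_digest())
```

Changing the style in one place (for example the `fill` color) restyles every figure the script produces, which is the "similar style" benefit listed above. Hand-tuned design elements are the part that stays hard to script.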
Reproducible
Design
http://vdl.sci.utah.edu/publications/2016_infovis_hanpuku
LESSONS FOR DATA SCIENCE
Visualization:
• Exploits our powerful visual processing
system
• Can improve data exploration and
communication
• Requires thoughtful design choices for
full impact
• Will become increasingly reproducible
RESOURCES
Learn more about
visualization
• Data Matters 2017 (Raleigh, NC)
http://datamatters.org
• Stephanie Evergreen’s
Data Visualization Academy (virtual)
http://academy.stephanieevergreen.com
• VDS Workshop at IEEE VIS conference
http://www.visualdatascience.org/
• In-person training, e.g.:
– Andy Kirk
http://www.visualisingdata.com/training
– Cole Nussbaumer Knaflic
http://www.storytellingwithdata.com/public-workshops
QUESTIONS?
Angela Zoss
angela.zoss@duke.edu
Slides: http://bit.ly/vis4ds
