6. 6 •
I keep seeing plain tables.
Do they want me to read all this?
Did they copy-paste their slides from their paper?
Do they care about their audience?
Do they care about giving this talk?
Are they hiding something?
Do they realize a dataviz would be much more
powerful?
Most respectful interpretation?
9. 9 •
never trust summary
statistics alone;
always visualize your
data
Detecting patterns
http://www.thefunctionalart.com/2016/08/download-datasaurus-never-trust-summary.html
13. 13 • Picture credits: Wiki Commons / the_jetboy CC2.0 / Sandor Vamos / Wiki Commons / Acute3D / Walkerssk / Pixabay / Wiki Commons / Wiki Commons
Many ways to apprehend the world
7,000,000 visitors a year
2,500,000 rivets
10,100 tons
21. 21 •
Define your goal Choose an
effective visual
Find the right
focus
Close the loop
Explore / Explain
Question?
Simple is better
Function first,
form next
Use color, size
Remove clutter
Do you answer your
question?
Do you have a
story?
Follow the process
22. 22 •
Follow best practices
Actively take control
Think accessibility
Use rules of thumb
Be truthful
23. 23 •
When you want to focus the
attention on just a number or two
When you have a mixed
audience, for information lookup
To show the relationship between
two things
The best for continuous data over
time
Makes it very easy to compare
categories
To compare totals and also
subcomponents
Choose an effective, simple visual
Source: http://www.storytellingwithdata.com/book/downloads
26. 26 •
There are many preattentive attributes
Source: http://www.storytellingwithdata.com/book/downloads
27. 27 •
But two are special
Colour is the most powerful tool you have.
Use it sparingly and resist the urge to use colour for the sake of being colourful.
Leverage colour selectively to highlight the important parts of your visual.
Size matters.
If you’re showing multiple things that are of roughly equal importance, size them similarly.
If there is one really important thing, leverage size to indicate that: make it BIG!
28. 28 •
Maximise data-ink ratio, within
reason.
Edward Tufte, The Visual Display of Quantitative Information
29. 29 •
Forgo chartjunk, including
moiré vibration, the grid, and the duck.
Edward Tufte, The Visual Display of Quantitative Information
37. 37 •
You should care
It’s not only about nice graphics
There’s a wealth of resources
Well-grounded best practices
38. Further tips
1 Highlight the important stuff
Only highlight 10% of the overall visual. Use preattentive attributes to do so, even together
for very important stuff
2 Eliminate distractions
When detail isn’t needed, summarize. Ask yourself if eliminating this would change
anything. If not, take it out. Push less impacting items to the background with light grey
3 Create a visual hierarchy of information
Organize information to guide the audience. Follow a Z-pattern from top left to bottom right.
4 Make it accessible
You might be an engineer, but it shouldn’t take someone with an engineering degree to
understand your graph.
5 Use simple language
Choose simple language over complex, choose fewer words over more words, define any
specialized language with which your audience may not be familiar, and spell out
acronyms.
6 Be mindful of aesthetics
Be smart with colors. Pay attention to alignment to give a sense of unity and cohesion.
Leverage white space, and don’t add stuff just to fill space
Always prefer simple over complex
39. 39 •
The Visual Display of Quantitative Information. Edward Tufte. Graphics Press, 2d edition,
2001. The classic on beautiful, faithful displays.
Visualization Analysis and Design. Tamara Munzner. AK Peters / CRC Press, Oct 2014. A
comprehensive textbook.
Visualize this: the FlowingData guide to design, visualization, and statistics. Nathan Yau. John
Wiley & Sons, 2011. For practical examples and code.
The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting
Data, Facts, and Figures. Dona M. Wong. W. W. Norton & Company, 2013.
Storytelling with Data: A Data Visualization Guide for Business Professionals. Cole
Nussbaumer Knaflic. Wiley, 2015.
Books
40. 40 •
Tukey, John W. "The future of data analysis." The annals of mathematical statistics 33.1 (1962): 1-67.
Cleveland, William S., and Robert McGill. "Graphical perception: Theory, experimentation, and application to the development of graphical methods." Journal of the American statistical association 79.387 (1984): 531-554.
Gelman, Andrew, Cristian Pasarica, and Rahul Dodhia. "Let's practice what we preach: turning tables into graphs." The American Statistician 56.2 (2002): 121-130.
Gelman, Andrew, and Antony Unwin. "Infovis and statistical graphics: different goals, different looks." Journal of Computational and Graphical Statistics 22.1 (2013): 2-28.
Gelman, Andrew, and Thomas Basbøll. "When do stories work? Evidence and illustration in the social sciences." Sociological Methods & Research 43.4 (2014): 547-570.
Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of machine learning research 9.Nov (2008): 2579-2605.
Kim, Been, Rajiv Khanna, and Oluwasanmi O. Koyejo. "Examples are not enough, learn to criticize! criticism for interpretability." Advances in Neural Information Processing Systems. 2016.
Wongsuphasawat, Kanit, et al. "Visualizing Dataflow Graphs of Deep Learning Models in TensorFlow." IEEE transactions on visualization and computer graphics 24.1 (2018): 1-12.
Research papers
41. 41 •
• Flowing Data
• Storytelling With Data
• The Functional Art
• Google Brain PAIR group
• colorbrewer2.org helps select colors
Blogs & other resources
Learn from good examples
• junkcharts
• vizwiz
• fivethirtyeight
• theguardian.com/data
But also from bad ones
• viz.wtf
Practice with makeovermonday
Interested? React on paris-wimlds.slack.com
Why are we interested in data visualization?
Disclaimer: I’m not a dataviz specialist. My interest in the topic is fairly recent.
Out of curiosity, I moderated the roundtable on data visualization at the first WiMLDS Paris Meetup… we had interesting discussions, and I bought one more book.
I did some more serious research after the Meetup. Read some books & papers.
To be honest, I didn’t find it mind-blowing. The most disappointing part is that, yes, it takes a lot of time and practice to do _really_ good dataviz. However, there are many easy-to-follow best practices.
What I found most shocking is how rarely these basic best practices are actually followed by machine learning experts, engineers, researchers, etc. We should behave better towards our data. _I_ can do better.
There are many more than two reasons, but I found these ones important for anyone in data-related engineering, science, or research.
These are the 2 reasons I'm giving this talk :-)
It might be OK in a paper which I can take the time to read carefully. Time is limited in a presentation. And it’s always precious.
What’s the most respectful interpretation?
Copy-pasting just means they rushed preparing the talk. So, … do they care?
Maybe people just don’t realize how powerful a good dataviz is to convey information.
Data visualisation is about communication: let people see for themselves what _you_ want to tell them.
This one is not even very good (and it’s a bit outdated). But it gets the message across.
When dealing with lots of data, it’s tempting to only look at summary statistics to analyse the data.
However, aggregated statistics can hide lots of patterns.
You can create a dataset that has the same summary statistics with a totally different underlying distribution.
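A minimal sketch of this point, using only the Python standard library. The two samples below are made up for illustration (they are not the Datasaurus data): they are built to share the same mean and standard deviation, yet a crude text histogram shows that their shapes differ.

```python
import math
import statistics
from collections import Counter

# Hypothetical samples, constructed so that both have mean 0 and
# population standard deviation 1.
a = math.sqrt(1.5)
two_peaks = [-1.0] * 60 + [1.0] * 60               # all mass at the extremes
three_peaks = [-a] * 40 + [0.0] * 40 + [a] * 40    # a third of the mass in the middle


def summary(sample):
    """Return (mean, population standard deviation), rounded for display."""
    return (round(statistics.fmean(sample), 6), round(statistics.pstdev(sample), 6))


def text_histogram(sample):
    """Return one line per distinct value, with one '#' per 10 observations."""
    counts = Counter(round(value, 2) for value in sample)
    return [f"{value:+.2f} {'#' * (count // 10)}" for value, count in sorted(counts.items())]


for name, sample in [("two_peaks", two_peaks), ("three_peaks", three_peaks)]:
    print(name, summary(sample))
    print("\n".join(text_histogram(sample)))
```

The summaries agree, while the histograms make the difference obvious at a glance.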
An answer to:
What can you do to facilitate understanding?
If it was good for Maxwell, it certainly is good for me!
In machine learning, it's about explainability and interpretability.
Google is pushing it very far. ICML talk at workshop on Human Interpretability by Fernanda Viégas and Martin Wattenberg, Google Brain.
http://playground.tensorflow.org
We've been given 5 senses to explore the world and learn.
There are always different ways to explain / understand / explore the same place ==> or the same data.
https://www.quora.com/How-fast-is-the-human-visual-system-as-a-whole
"High-bandwidth channel to our brain" (Munzner) Parallel processing / preconscious level
Also role of iconic memory (https://en.wikipedia.org/wiki/Iconic_memory)
'Popout', 'grouping' => Gestalt theory
On the next slide, play the same game.
By highlighting the 3, it was that much easier for our brain to spot them, even before we knew they were «3»
So we want to take advantage of our brain power. Let’s take a step back.
Dataviz is a tool for a purpose, don’t use it at random => Follow the process.
Side note: who is this Cail I just spotted on the Eiffel Tower? https://fr.wikipedia.org/wiki/Jean-Fran%C3%A7ois_Cail
Accessibility: e.g. colorblind-friendly colors, fonts as large as possible.
Truthfulness: don’t truncate axes, state data sources.
« All things whatsoever ye would that men should do unto you, even so do ye also unto them ».
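To make the point about truncated axes concrete, here is a small sketch with invented numbers. The function name `drawn_height_ratio` and the two scores are assumptions for illustration only: the code compares how much taller one bar is drawn than another when the value axis starts at zero versus at a truncated value.

```python
def drawn_height_ratio(small, large, axis_start=0.0):
    """Ratio of the drawn bar heights when the value axis starts at axis_start."""
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    return (large - axis_start) / (small - axis_start)


# Hypothetical scores for two models, in percent.
baseline_score, new_score = 95.0, 96.0

honest = drawn_height_ratio(baseline_score, new_score)           # axis starts at 0
truncated = drawn_height_ratio(baseline_score, new_score, 94.0)  # axis starts at 94

print(f"axis from 0:  the taller bar is drawn {honest:.2f}x as high")
print(f"axis from 94: the taller bar is drawn {truncated:.2f}x as high")
```

With the axis starting at 94, a one-point difference is drawn as a bar twice as tall, which is why a truncated bar axis misleads.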
Tables may be useful when people may be looking for a specific figure – have them ordered in a sensible way for easy lookup!
Which is the bigger supplier?
Is the yellow share bigger than green?
Pie charts are tricky because we are much better at comparing lengths than arcs or angles.
They are especially bad as a tool for comparison, where bars give much quicker insight.
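As a rough illustration of why bars compare better, the sketch below prints a sorted text bar chart. The supplier names and shares are hypothetical, chosen to be close enough that they would be hard to rank from pie slices.

```python
def text_bar_chart(shares, width=40):
    """Return one line per category, sorted from largest to smallest share."""
    largest = max(shares.values())
    label_width = max(len(name) for name in shares)
    lines = []
    for name, value in sorted(shares.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * value / largest)
        lines.append(f"{name:<{label_width}} {bar} {value}%")
    return lines


# Hypothetical market shares that are hard to rank on a pie chart.
supplier_shares = {"Supplier A": 24, "Supplier B": 26, "Supplier C": 23, "Supplier D": 27}
print("\n".join(text_bar_chart(supplier_shares)))
```

Sorting the bars and aligning them on a common baseline lets the reader answer "which is the bigger supplier?" by comparing lengths only.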
Note as you scan across the attributes in Figure 4.4, your eye is drawn to the one element within each group that is different from the rest:
you don’t have to look for it. That’s because our brains are hardwired to quickly pick up differences we see in our environment.
Don’t overdo it with colors.
Think of color blindness and other interpretations of colors (context matters).
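One way to apply "use color sparingly" in practice is to grey out everything except the item you want the audience to look at. The sketch below is only an illustration: the helper `highlight_palette`, the grey value, and the category names are assumptions, and the highlight color is the orange from the Okabe-Ito palette, which was designed with color blindness in mind.

```python
# Okabe-Ito orange for the one item in focus; light grey pushes the rest
# to the background. The grey value is an arbitrary choice.
HIGHLIGHT = "#E69F00"
BACKGROUND = "#BBBBBB"


def highlight_palette(categories, focus):
    """Map every category to grey, except the one in focus."""
    if focus not in categories:
        raise ValueError(f"{focus!r} is not one of the categories")
    return {name: (HIGHLIGHT if name == focus else BACKGROUND) for name in categories}


# Hypothetical algorithm names for a comparison plot.
palette = highlight_palette(["brute force", "kd-tree", "ball tree", "LSH"], focus="kd-tree")
for name, color in palette.items():
    print(f"{name:<12} {color}")
```

A mapping like this can then be handed to whatever plotting tool is in use, so that a single series stands out and the others recede.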
Comparing different nearest neighbors algorithms (but who cares?)
Depending on the scale, your eyes may blink differently.
Note that matplotlib defaults have been improved a lot in the last 2 years.
I’m not a data viz expert. I still find it hard to produce nice plots, but seaborn really makes it easy to get clean ones.
Also worth mentioning: ggplot, Tableau, and, for interactive viz, Altair (Python) and d3.js.
We’re not done!
It’s not random
You should care
If you want to communicate results
If you want to know your data better
It’s not only about nice graphics
It’s about statistics
It’s about putting the human in the loop
It’s about asking the right questions
There’s a wealth of resources
It’s not easy to plot nice graphs
Follow best practices
Provided as a reference.
A very incomplete list, but these books were used in the preparation of (the previous versions of) this talk.
https://en.wikipedia.org/wiki/Jacques_Bertin
Tufte's duck: a really pretty house which is not a house anymore (not even windows, or on the wrong side!)
https://en.wikipedia.org/wiki/Big_Duck