This document provides dos and don'ts for data visualization. It discusses how to properly scale and represent proportions in charts. Common misleading techniques are shown such as rescaling axes, omitting the y-axis origin, using different scales for the same axis, and including meaningless or invented data. The document advocates showing only relevant information without crowding plots. It also notes that the visualization should fit the intended audience and goal. While rules can be broken, the overall message is that visualizations should accurately and honestly portray the data.
5. NOW IN A PROPORTIONAL SCALE
[Chart: PSOE vs. Partido Popular; axis: number of unemployed (Número de parados)]
6. DON’T OMIT THE ORIGIN OF THE Y-AXIS
Where is the axis??
94 is not 0
Sources: http://blog.rtve.es/ and http://mediamatters.org/
7. DO SHOW THE Y-AXIS FROM THE ORIGIN
Million dollars
50.66% 49.07%
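A minimal sketch of why the origin matters, using the two percentages shown on this slide. The function name `drawn_height_ratio` and the truncated axis start of 49 are assumptions made for illustration; they are not taken from the original chart.

```python
def drawn_height_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of the drawn bar heights for values a and b
    when the y-axis starts at axis_start instead of 0."""
    if min(a, b) <= axis_start:
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)


# The two percentages shown on the slide.
first, second = 50.66, 49.07

# Axis drawn from the origin: bar heights are proportional to the values.
honest = drawn_height_ratio(first, second)

# Hypothetical truncated axis starting at 49 (an assumed value).
truncated = drawn_height_ratio(first, second, axis_start=49.0)

print(f"axis from 0:  first bar is {honest:.2f}x as tall as the second")
print(f"axis from 49: first bar is {truncated:.1f}x as tall as the second")
```

With the axis at the origin the two bars are nearly the same height; with the assumed truncated axis the first bar is drawn many times taller, although the data are identical.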
8. THIS ALSO HAPPENS IN SCIENTIFIC PAPERS
This is a big difference, isn’t it?
According to the paper, this should be 1.82
The value of Y (Rape Myth Acceptance) varies between 1 and 5
There are values placed in the wrong position
Source: Fox, Jesse; Bailenson, Jeremy N.; Tricase, Liz (2013). "The embodiment of sexualized virtual selves: The Proteus effect and experiences of self-objectification via avatars". Computers in Human Behavior 29 (3): 930–938
9. THE REALITY IS SOMETHING DIFFERENT
Face
It was not that different in the end…
Remember: the value of Y (Rape Myth Acceptance) varies between 1 and 5
10. DON’T USE INVENTED OR TAILOR-MADE SCALES
How can this be a line?
Source: http://mediamatters.org/
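A small sketch of how a tailor-made time axis distorts a trend: observations taken at irregular dates are placed at evenly spaced positions, as if they were consecutive quarters. The dates below are illustrative assumptions, and `time_positions` is a hypothetical helper name.

```python
from datetime import date


def time_positions(dates):
    """Map each date to an x-position in [0, 1] proportional to elapsed days."""
    start, end = dates[0], dates[-1]
    span = (end - start).days
    if span <= 0:
        raise ValueError("dates must be sorted and cover more than one day")
    return [(d - start).days / span for d in dates]


# Illustrative observation dates (assumed): irregular months across three years.
observed = [date(2007, 12, 1), date(2008, 9, 1), date(2009, 3, 1), date(2010, 6, 1)]

# What a tailor-made axis does: every point gets the same spacing.
even = [i / (len(observed) - 1) for i in range(len(observed))]

# What an honest time axis does: spacing follows the real elapsed time.
true = time_positions(observed)

for d, e, t in zip(observed, even, true):
    print(f"{d.isoformat()}  even spacing: {e:.2f}  true position: {t:.2f}")
```

Plotting against the true positions keeps the slope between points faithful to how much time actually passed.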
12. DON’T USE DIFFERENT SCALES FOR THE SAME AXIS
Left Y-axis (representing the non-smokers) starts at 2
Right Y-axis (representing the smokers) starts at 3
Source: H. Wainer, Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot
Disclaimer! This graph is from a tobacco company
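A minimal sketch of the effect: the same value lands at a different height depending on which axis it is read against. The axis starts (2 and 3) come from the slide; the axis tops (6 and 7), the sample value, and the helper name `axis_fraction` are assumptions for illustration only.

```python
def axis_fraction(value: float, axis_min: float, axis_max: float) -> float:
    """Vertical position of value as a fraction of the axis height
    (0 = bottom of the plot, 1 = top)."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (value - axis_min) / (axis_max - axis_min)


# Axis starts are from the slide; the tops are assumed.
left_axis = (2.0, 6.0)    # non-smokers
right_axis = (3.0, 7.0)   # smokers

value = 4.0  # hypothetical data value, identical for both groups

left_pos = axis_fraction(value, *left_axis)
right_pos = axis_fraction(value, *right_axis)

print(f"drawn against the left axis:  {left_pos:.2f} of the plot height")
print(f"drawn against the right axis: {right_pos:.2f} of the plot height")
```

Because the two axes are shifted, an identical value is drawn lower for one group than for the other, which can hide or invent a gap between the curves.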
14. DON’T SHOW MEANINGLESS NUMBERS
DON’T USE PIE CHARTS
193% ???
That’s a big pie!
Source: http://mediamatters.org/
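If you do draw a pie chart, a quick sanity check before plotting can catch totals like the one above. This is only a sketch: `check_pie_shares`, its tolerance, and the sample shares (chosen so that they total 193) are assumptions for illustration.

```python
import math


def check_pie_shares(shares, total=100.0, tol=0.5):
    """Return the sum of the shares, or raise ValueError if they are
    negative or do not add up to `total` (within `tol`)."""
    if any(v < 0 for v in shares.values()):
        raise ValueError("shares must be non-negative")
    s = math.fsum(shares.values())
    if abs(s - total) > tol:
        raise ValueError(
            f"shares sum to {s:g}%, not {total:g}%: "
            "the categories are not parts of one whole"
        )
    return s


# Illustrative shares (assumed) that overlap and therefore total 193%.
overlapping = {"A": 70, "B": 63, "C": 60}

try:
    check_pie_shares(overlapping)
except ValueError as err:
    print("not a pie chart:", err)
```

When the check fails because respondents could pick several options, a bar chart is usually the safer choice, since bars do not imply parts of a whole.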
DON’T USE 3D
Perspective makes percentages look different
Source: http://imgarcade.com/1/misleading-circle-graphs/
15. SOME THINGS WE LEARNED AT SCHIBSTED
■Know your audience and adapt the visualization to them
■The title matters: it has to be attractive but not distracting
■Select the most suitable plot; there is no one-plot-fits-all
■Show only relevant information; crowded visualizations are misleading
■Sometimes you can break the rules…
16. DO CHOOSE A VISUALIZATION FITTING YOUR AUDIENCE
Percentage of Sellers per segment
Slack channels sharing users
17. DON’T USE CROWDED PLOTS WITH MISLEADING INFORMATION
■Too many elements
■The colours are meaningless
■The axes are misleading (not showing the origin)
18. DO SHOW ONLY WHAT IS IMPORTANT
■Axes starting at 0
■Only the necessary elements
GOAL: Show the correlation of the data points
19. … A DIFFERENT APPROACH
■We don’t care about the value, so it’s OK to break the axis rule!!
■The colours have a meaning
GOAL: Show the distribution and density of the data points
20. WE ARE LOOKING FOR TALENT!
inaki.puigdollers@schibsted.com
Thanks, questions?
Data Scientist – Schibsted Product & Technology
Editor's Notes
-A picture tells a thousand words
-Goal: share examples of visualizations showing distorted information and how this can be addressed
-A common practice to fool people is rescaling proportions
-Even if you show the numbers, a plot that is not proportional conveys contradictory information
-Here you see how different the plot looks when the proportions are as they should be
-However, this particular example may be just an unintentional error. But what about this one?
-Spatial perception is a very important component of image processing in the human brain
-This is why mass media abuse this kind of blatant distortion to communicate a somewhat biased message
-Again, if we re-plot the data in a fairer way, we see that reality differs from what they try to show
-So the blue line is flatter than the one they presented originally; draw your own conclusions…
-Another technique to show distorted data is omitting the Y-axis.
-It messes with spatial perception again
-Comparing is very difficult
-But if we re-plot it, the truth comes to the surface again…
-And that incredibly huge difference between both candidates is gone
-And the federal welfare received in the US hasn’t grown that much either...
-No surprise the media use this
-We all knew that TV and newspapers provided biased information
-It is stranger to see this in science
-Some scientific studies use distortion techniques as well to “enhance” their message
-But if we look at how it really is, this is what we have: the difference between conditions is not that big
-Is science a matter of belief in the end?
-Another great example from Fox News: a linear growth of job loss by QUARTER, created out of the blue
-This is how it really looks: not only are the values not linear, but the periods are not quarters but random months across 3 different years!
-Another good deceiving technique is to use a double axis in the same plot
-It can be good (enhanced readability), but if the axes are not the same you can create effects like the one from this tobacco company, suggesting that smoking does not affect the death rate and only age matters
-However, if we re-plot it correctly we see a completely different story
-No surprise it comes from a tobacco company, right?
-And then we have the pie charts.
-Should I use them? I’ll try to avoid them
-If you insist:
-remember a simple rule: pie charts show parts of a whole, so make them sum up to 100% and no more
-avoid perspective games
-Things we learned at Schibsted; I’m going to talk about a couple of them
-One of the most delicate points: choosing which visualization to use
-Know your audience beforehand
-Not everybody understands reality the same way: while a DS may feel comfortable with a network plot, BP tend to prefer bar plots or waterfall plots
-In addition, there is no one-plot-fits-all solution
-Once you have decided which way to go, you have to be careful with the number of elements you add to the plot. By elements I mean colours, size of the points, width of the bars, regression lines, among others
-Crowded plots are, more often than not, misleading and distract the audience's attention from what is important.
-My suggestion: do not add irrelevant elements; every single element in the plot has to be meaningful by itself.
-Here, for instance, we have a clear goal, so we stuck to it and showed only elements that helped us explain that message
-If your goal is different, so is your plot
-All in all, I would say that the golden rule in data visualization is twofold: communicate a message (this is your goal) based on some observed data (which you have to respect)