Edwin de Jonge (@edwindjonge)
May 17th 2016,
Visualisation Workshop, Valencia
Uncertainty visualisation
Who am I?
Statistical consultant / Data scientist
- working @ R&D department of Statistics
Netherlands (CBS)
- Expertise:
- Visualisation
- Computational Statistics
- Complexity
Statistics and Visualisation?
Edward Tufte
– Visualisation
expert:
Is a statistician!
Minard: Napoleon's war on Russia
Nathan Yau (FlowingData)
How not to lie with statistics
• Is a statistician!
William Cleveland
• Is a statistician
• Deserves more credit
• Scientific work on which charts really work: user experiments!
Why Visualization?
“Statistics is the study of the collection, 
organization, analysis, interpretation, and 
presentation of data. 
It deals with all aspects of 
this, including the planning of data collection in 
terms of the design of surveys and experiments”
Wikipedia (May 2016)
• Visualisation is useful for every step:
  – Collection
  – Presentation of data
  – But also analysis and interpretation!
How not to lie with statistics
Anscombe's quartet…
DS1           DS2           DS3           DS4
x     y       x     y       x     y       x     y
10    8.04    10    9.14    10    7.46    8     6.58
8     6.95    8     8.14    8     6.77    8     5.76
13    7.58    13    8.74    13    12.74   8     7.71
9     8.81    9     8.77    9     7.11    8     8.84
11    8.33    11    9.26    11    7.81    8     8.47
14    9.96    14    8.1     14    8.84    8     7.04
6     7.24    6     6.13    6     6.08    8     5.25
4     4.26    4     3.1     4     5.39    19    12.5
12    10.84   12    9.13    12    8.15    8     5.56
7     4.82    7     7.26    7     6.42    8     7.91
5     5.68    5     4.74    5     5.73    8     6.89
Anscombe’s quartet
Property Value
Mean of x1, x2, x3, x4 All equal: 9
Variance of x1, x2, x3, x4 All equal: 11
Mean of y1, y2, y3, y4 All equal: 7.50
Variance of y1, y2, y3, y4 All equal: 4.1
Correlation for ds1, ds2, ds3, ds4 All equal 0.816
Linear regression for ds1, ds2, ds3, ds4 All equal: y = 3.00 + 0.500x
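The summary values in this table can be checked directly from the quartet's data. The following is a minimal sketch in Python (the slides themselves contain no code; the helper name `summarise` is hypothetical). It computes the sample mean, sample variance, Pearson correlation and least-squares line for each data set.

```python
from statistics import mean, variance

# Anscombe's quartet, copied from the data table on the previous slide.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "DS1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "DS2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]),
    "DS3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "DS4": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Return the summary statistics listed in the table for one data set."""
    mx, my = mean(x), mean(y)
    vx, vy = variance(x), variance(y)  # sample variances (divide by n - 1)
    # Sample covariance between x and y.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    slope = sxy / vx
    return {
        "mean_x": mx,
        "var_x": vx,
        "mean_y": my,
        "var_y": vy,
        "corr": sxy / (vx * vy) ** 0.5,
        "intercept": my - slope * mx,
        "slope": slope,
    }


summaries = {name: summarise(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

The four printed rows agree to about two decimals, which is exactly why the numbers alone cannot tell the data sets apart.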
Looks the same, right?
Let's plot!
Publishing uncertainty...
Official statistics institutes are:
– very careful / prudent / reluctant about publishing uncertainty margins.
Reasons:
– “Users don't understand them”
– “Users don't need them”
– “Users may choose the number that best fits”
– “We don't have an accurate estimate of the accuracy”
– “It may be embarrassingly large”
– ?
Why is uncertainty important?
For official statistics, at least two reasons:
– Communicating accuracy
– Statistical/stochastic uncertainty
Let's look at two cases from Statistics Netherlands (CBS)
What is not surrounded by uncertainty cannot be
the truth,
Richard Feynman
Case 1: Diabetes (Statistics Netherlands)
– Diabetes incidence
– Based on a (large) health survey of Statistics Netherlands (CBS)
StatMine 0.2
Diabetes increasing
For everyone?
Small multiples:
Split in groups
Reaching measurement accuracy
Case 2: Traffic casualties
– Based on mortality statistics
– In NL, based on a register, all medical
death certificates collected by our office.
Traffic casualties
Exact numbers!
Mortality stats (no estimation)
Traffic casualties
Let's split in smaller regions
October 1st 2013, Statistics Netherlands
[Figure: after ‘data reduction’; axes: age, amount. Plot: 3D?]
Over 10% year-on-year changes!
Case 2: Stochastic uncertainty
Side track:
Exact numbers...
Exact is not always better...
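Why exactly counted numbers can still swing by more than 10% from year to year: a minimal sketch, assuming (this is not stated on the slides) that the yearly count in a region behaves like an independent Poisson draw around a constant underlying level. The function name and the example counts are hypothetical.

```python
import math


def relative_yoy_sd(expected_count):
    """Standard deviation of the year-on-year change, relative to the expected count.

    Assumption: two independent Poisson counts with the same mean, so their
    difference has variance 2 * expected_count.
    """
    return math.sqrt(2 * expected_count) / expected_count


# Hypothetical yearly counts: a whole country versus smaller regions.
for n in (1000, 200, 50):
    print(n, f"{relative_yoy_sd(n):.1%}")
```

Under this assumption, a region with around 200 casualties a year already shows chance fluctuations of about 10% between consecutive years, and the relative noise grows as regions get smaller.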
Uncertainty Visualisation
– Active research area!
– Most research papers on:
  • Cartography
  • Geology
  • 3D (medical imaging, astronomy)
– Techniques used:
  • Transparency
  • Different colourisation (“whitening”)
– Not many on plain statistical graphs…
– Error bars are known to be imperfect.
User Study 1:
– The perception of visual uncertainty representation by non-experts
  Tak, Toet, van Erp, Transactions on Visualization and Computer Graphics, 2014
– User experiment:
  • 140 users
– Tests:
  • Reading of certainty.
  • Given a number, how certain is that value?
Weather forecast (Dutch television)
User Study 1: findings
– Non-experts can read probability intervals.
– However: users with high numeracy are better at it.
– No (significant) difference in response time.
– Random lines work well for stochastic numbers.
User Study 2:
– Effect of displaying uncertainty in line and bar charts
  Van der Laan, de Jonge, Solcer, IVAPP, 2015
– User study:
  • 110 persons
– Goal:
  • Line: how does uncertainty affect the (overall) trend? (main purpose of a line chart)
  • Bar: how does uncertainty affect comparison? (main purpose of a bar chart)
User Study 2: Confidence intervals
– All facts of Statistics Netherlands have a confidence interval
– European Statistics Code of Practice (12.2):
  - “sampling and non-sampling errors should be systematically documented”
Investigate how uncertainty in numbers can be presented understandably to users.
Restricted to:
- How do users interpret CI’s?
- And how does that affect the interpretation of facts?
- Do users need CI’s?
Assumption:
- For the test data set, a point estimate with CI is available
Study 2
User test (100+) with synthetic data shows that:
- CI’s improve validity of user statements (they are
more correct)
User test CI’s
Line charts with uncertainty
Line charts:
– displaying uncertainty improves user statements (more correct)
– “band + line” works best for point estimates, “error bar” works best for interval estimates
– Users do not “freak out” on uncertainty
– Appreciate it and ask for its definition.
User Study 2: findings Line chart
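The “band + line” variant needs a lower and an upper bound around each point estimate. The following is a minimal sketch of how such a band could be derived, assuming (not taken from the study) that each estimate comes with a standard error and that a normal approximation is acceptable, so the 95% interval is the estimate plus or minus 1.96 standard errors. The function name and all numbers are made up for illustration.

```python
def confidence_band(estimates, standard_errors, z=1.96):
    """Return (lower, upper) bounds of an approximate 95% interval per point.

    Assumption: normal approximation, interval = estimate +/- z * standard error.
    """
    lower = [e - z * se for e, se in zip(estimates, standard_errors)]
    upper = [e + z * se for e, se in zip(estimates, standard_errors)]
    return lower, upper


# Made-up yearly point estimates (in percent) with made-up standard errors.
years = [2010, 2011, 2012, 2013]
estimates = [4.0, 4.2, 4.1, 4.5]
standard_errors = [0.2, 0.2, 0.25, 0.3]

lower, upper = confidence_band(estimates, standard_errors)
for year, lo, est, hi in zip(years, lower, estimates, upper):
    print(year, round(lo, 2), est, round(hi, 2))
```

With a plotting library such as matplotlib, the band could then be drawn with `fill_between(years, lower, upper, alpha=0.3)` and the point estimates with `plot(years, estimates)` on top.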
Bar charts with uncertainty
Error bar chart
Bar chart variant 1 - chisel
Bar chart variant 2 - cigarette
Bar charts:
– displaying uncertainty makes users less confident in comparison tasks (which is good)
– No significant difference between methods
– PhDs prefer error bars, but error bars do not perform better.
– When publishing intervals (without point):
  – cigarette is better
User Study 2: findings 2
Start publishing uncertainty measures
Plot them!
Users appreciate it, and we are doing statistics, aren't we?
Recommendation
Far better an approximate answer to the right question, which is often vague,
than an exact answer to the wrong question, which can always be made precise
John Tukey
Questions?

Uncertainty visualisation