A Visual Guide for Describing Numbers

torturing numbers
a novice’s guide to descriptive statistics

Bandhu P. Das

"If you torture the data long
enough, it will confess"
– Ronald Harry Coase
@BPDas_

why do we torture numbers?

•  Describe the story
•  Find trends in data against variation
•  Determine if a sample represents a population
•  Draw conclusions about the story

a tool called
‘descriptive statistics’
is used

describing numbers

25 people were asked: what does an
average person pay in tax?
What do these numbers tell you?
£45,000   £3,700   £10,000   £2,000   £2,000
£15,000   £3,000   £5,000    £3,700   £2,000
£10,000   £2,000   £2,000    £3,700   £2,000
£5,700    £2,000   £2,000    £3,700   £2,000
£5,000    £2,000   £5,000    £2,000   £2,000


Here is the same data ordered from greatest to
least and weighted to show how many times each
value occurs in the data set
•  Now what do the data tell you?
•  What is the average income?
[Figure: the distinct values from greatest to least: £45,000, £15,000, £10,000, £5,700, £5,000, £3,700, £3,000, £2,000]
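The ordering and weighting described above can be reproduced with a simple tally. This is a minimal sketch, assuming the 25 responses are exactly the values listed on the earlier slide; the variable names are illustrative only.

```python
from collections import Counter

# The 25 responses from the earlier slide, in pounds.
responses = [
    45000, 3700, 10000, 2000, 2000,
    15000, 3000, 5000, 3700, 2000,
    10000, 2000, 2000, 3700, 2000,
    5700, 2000, 2000, 3700, 2000,
    5000, 2000, 5000, 2000, 2000,
]

# Count how many times each distinct value occurs.
counts = Counter(responses)

# Print from greatest to least, with the number of occurrences.
for value in sorted(counts, reverse=True):
    print(f"£{value:>6,}  x{counts[value]}")
```

Seen this way, nearly half of the responses (12 of 25) are £2,000, while £45,000 occurs only once.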


BEWARE! The reported ‘average’ might
depend on what you are meant to see.
Which would you use?
MEAN (arithmetic average)
MEDIAN (middle value when the data are ordered)
MODE (most frequent)
So, to really understand the
data set you need more than
just the ‘average’
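To see how far the three 'averages' can disagree, here is a small sketch that computes each one for the 25 responses from the earlier slide, using Python's standard `statistics` module. The variable names are illustrative only.

```python
import statistics

# The 25 responses from the earlier slide, in pounds.
responses = [
    45000, 3700, 10000, 2000, 2000,
    15000, 3000, 5000, 3700, 2000,
    10000, 2000, 2000, 3700, 2000,
    5700, 2000, 2000, 3700, 2000,
    5000, 2000, 5000, 2000, 2000,
]

mean = statistics.mean(responses)      # arithmetic average: 5700
median = statistics.median(responses)  # middle value of the ordered data: 3000
mode = statistics.mode(responses)      # most frequent value: 2000

print(f"mean   £{mean:,.0f}")
print(f"median £{median:,.0f}")
print(f"mode   £{mode:,.0f}")
```

The same data set can honestly be summarised as £5,700, £3,000 or £2,000, depending on which 'average' is reported.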

spread and variability

You need to know the spread of the data
•  This histogram shows the ages of people that use a smartphone
•  Is it typical for 90-year-olds to use a smartphone?


When the data form a symmetrical bell shape, the mean
and median are the same: a special situation called a
‘normal’ curve
On this symmetrical curve, the variability can be
described using standard deviations (SD)


SD is a way to describe how far data points
typically fall from the mean
You can now say that 90-year-olds fall more than
2 SD from the mean, or that they make up less
than 2.5% of the data set
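The '2 SD' and '2.5%' figures can be checked with a short sketch. The mean age and SD below are invented for illustration (the slide does not give them), and the calculation assumes the ages really do follow a normal curve.

```python
from statistics import NormalDist

# Hypothetical figures, NOT taken from the slides.
mean_age = 40.0
sd_age = 20.0
age = 90.0

# How many standard deviations is a 90-year-old from the mean?
z = (age - mean_age) / sd_age

# Share of a normal curve lying more than 2 SD above the mean.
share_beyond_2sd = 1 - NormalDist().cdf(2)

# Share of this hypothetical population aged 90 or over.
ages = NormalDist(mu=mean_age, sigma=sd_age)
share_at_or_above = 1 - ages.cdf(age)

print(f"z-score: {z}")
print(f"share more than 2 SD above the mean: {share_beyond_2sd:.4f}")
print(f"share aged {age:.0f} or over: {share_at_or_above:.4f}")
```

With these made-up numbers a 90-year-old sits 2.5 SD above the mean, and the upper tail beyond 2 SD holds a little under 2.5% of the data.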


If we collapse the whole data set to one bar,
we can show the mean with some measure
of variability (std dev, std error, etc.)
Without some indication of variability, you
cannot effectively compare two data sets
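A small sketch of why the mean alone is not enough. The two data sets below are invented for illustration; `summarise` is a hypothetical helper, not something from the slides.

```python
import math
import statistics

def summarise(values):
    """Return the mean, sample standard deviation and standard error."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (divides by n - 1)
    se = sd / math.sqrt(n)         # standard error of the mean
    return mean, sd, se

# Two made-up data sets with the same mean but very different spread.
tight = [49, 50, 50, 51, 50]
wide = [20, 80, 35, 65, 50]

for name, data in (("tight", tight), ("wide", wide)):
    mean, sd, se = summarise(data)
    print(f"{name}: mean={mean:.1f}  SD={sd:.2f}  SE={se:.2f}")
```

Both bars would be drawn at the same height of 50, and only the measure of variability reveals how different the two data sets are.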


Perhaps the best way to describe any data set is
with five numbers: Minimum, Q1, Median, Q3,
Maximum. This helps when comparing data sets,
and when there are oddities called outliers.
[Figure: box plot marking Min, Q1, Median, Q3 and Max, with 25% of the data in each of the four sections and an outlier shown as *]
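As a sketch, the five numbers can be computed for the 25 responses from the earlier slide with the standard `statistics` module. The slides do not say how outliers are identified; the 1.5 × IQR fences used here are a common convention and an assumption on my part.

```python
import statistics

# The 25 responses from the earlier slide, in pounds.
responses = [
    45000, 3700, 10000, 2000, 2000,
    15000, 3000, 5000, 3700, 2000,
    10000, 2000, 2000, 3700, 2000,
    5700, 2000, 2000, 3700, 2000,
    5000, 2000, 5000, 2000, 2000,
]

# Quartiles: the three cut points that split the ordered data into four parts.
q1, median, q3 = statistics.quantiles(responses, n=4)
five_numbers = (min(responses), q1, median, q3, max(responses))

# Assumed rule: flag values more than 1.5 x IQR outside the quartiles.
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = sorted(x for x in responses if x < low_fence or x > high_fence)

print("min, Q1, median, Q3, max:", five_numbers)
print("outliers:", outliers)
```

Note that different tools use slightly different quartile definitions, so Q1 and Q3 may vary a little between packages.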

“79.48% of all statistics are
made up on the spot.”
– John A. Paulos

a sample study

Researchers want to
know which of three
fertilisers produces the
highest wheat yield in
kg/plot


They design a study with three treatments
and five replications for each treatment
3 Treatments (Fertilisers 1, 2 and 3)
5 Replicates


Could a nearby forest or river be a
confounding variable?
Variables like soil type and other local
influences may have unexpected impacts…


This is why a good study is
randomised, to defeat potentially
confounding variables
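One way to randomise is to shuffle the treatment labels before assigning them to plots. This is a minimal sketch, assuming 15 plots (3 treatments × 5 replicates); the plot names and the seed are arbitrary choices for illustration.

```python
import random

treatments = ["Fertiliser 1", "Fertiliser 2", "Fertiliser 3"]
replicates = 5

# Hypothetical plot labels: 3 treatments x 5 replicates = 15 plots.
plots = [f"plot {i}" for i in range(1, len(treatments) * replicates + 1)]

# Each treatment appears once per replicate, then the order is shuffled
# so that position in the field is unrelated to the treatment received.
allocation = treatments * replicates
rng = random.Random(42)  # fixed seed only so the example is repeatable
rng.shuffle(allocation)

layout = dict(zip(plots, allocation))
for plot, treatment in layout.items():
    print(plot, "->", treatment)
```

Because the assignment is random, a forest or river next to some plots is unlikely to favour any one fertiliser systematically.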

Does the sample plot in our study represent
all the wheat in all the world?
[Figure labels: POPULATION, SAMPLE]

uncertainty

With all the unknown variables, there will
always be a degree of uncertainty about whether
our sample represents the population
That’s why the larger our sample, the more
confident we can be that our study represents the
population
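A quick simulation illustrates the point. The population below is invented (yields with a mean of 60 and an SD of 8), and `spread_of_sample_means` is a hypothetical helper written for this sketch.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed only so the example is repeatable

# Invented population of plot yields: mean 60, SD 8 (not from the slides).
population = [rng.gauss(60, 8) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=500):
    """Draw many samples of one size and report how much their means vary."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

small = spread_of_sample_means(5)
large = spread_of_sample_means(100)

print(f"spread of sample means, n=5:   {small:.2f}")
print(f"spread of sample means, n=100: {large:.2f}")
```

The means of the larger samples cluster much more tightly around the population mean, which is why a bigger sample justifies more confidence.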

confidence

•  Any confidence level could be used, but 95% is often chosen
•  This means that if the study were repeated many times, about 95% of the intervals calculated would contain the true population value
•  BEWARE reports with no confidence interval


two ways to present data
Tables are the preferred way to show data,
but graphs paint a quick, easy and
seductive picture

Wheat yield (kg/plot)
Replicate   Fertiliser 1   Fertiliser 2   Fertiliser 3
1           64.8           56.5           65.8
2           60.5           53.8           73.2
3           63.4           59.4           59.5
4           48.2           61.1           66.3
5           55.5           58.8           70.2

drawing conclusions
A presenter may want you to see a
relationship between two variables
Fertiliser 3 appears to increase the average yield
of wheat – but what kind of average is this? How big
was the sample? Where is the indication of
variability? Where is the confidence interval?
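Those questions can be answered for the yield table shown earlier. This is a sketch under stated assumptions: each column of the table holds the five replicates of one fertiliser, the data are roughly normal, and the interval uses the standard two-sided 95% critical value of Student's t for 4 degrees of freedom (2.776), hard-coded because the standard library has no t distribution. `summarise` is a hypothetical helper.

```python
import math
import statistics

# Yields in kg/plot, one list per fertiliser (five replicates each).
yields = {
    "Fertiliser 1": [64.8, 60.5, 63.4, 48.2, 55.5],
    "Fertiliser 2": [56.5, 53.8, 59.4, 61.1, 58.8],
    "Fertiliser 3": [65.8, 73.2, 59.5, 66.3, 70.2],
}

# Two-sided 95% critical value of Student's t with 4 degrees of freedom.
T_CRIT_DF4 = 2.776

def summarise(values):
    """Return sample size, mean, sample SD and an approximate 95% CI."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    half_width = T_CRIT_DF4 * sd / math.sqrt(n)
    return n, mean, sd, (mean - half_width, mean + half_width)

summary = {name: summarise(values) for name, values in yields.items()}

for name, (n, mean, sd, (low, high)) in summary.items():
    print(f"{name}: n={n}  mean={mean:.2f}  SD={sd:.2f}  "
          f"95% CI=({low:.1f}, {high:.1f})")
```

The means come out at roughly 58.5, 57.9 and 67.0 kg/plot, but with only five plots per treatment the intervals are wide, and the interval for Fertiliser 1 overlaps the one for Fertiliser 3. Overlap is not a formal test, but it shows why a bare bar chart of averages can overstate the case.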

Bad stats and presentation may lead to bad conclusions
[Figure label: 2 SD]


Correlation does not imply causation
The more firemen fighting a fire, the
bigger the fire is observed to be.
Therefore more firemen cause an increase
in the size of a fire
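The firemen example can be simulated. In the invented data below, the size of the fire decides how many firemen are sent, never the other way round, yet the two are still strongly correlated. All numbers are made up for illustration, and `pearson` is a hand-written helper.

```python
import math
import random

rng = random.Random(1)  # fixed seed only so the example is repeatable

# Invented data: bigger fires get more firemen sent to them.
fire_sizes = [rng.uniform(1, 100) for _ in range(200)]
firemen = [round(2 + 0.3 * size + rng.gauss(0, 3)) for size in fire_sizes]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r = pearson(firemen, fire_sizes)
print(f"correlation between firemen and fire size: {r:.2f}")
```

The correlation is high, but by construction the causation runs from fire size to firemen. The number alone cannot tell you which way round it is.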

Often, a presenter wants to lead you to
a conclusion. Newspapers, TV and
online articles should be scrutinised!
BEWARE:
“This is not a scientific poll…”
“These results may not be representative of
the population”
“…based on a list of those that responded”
“Data showed a trend but was not
statistically significant”
it’s all in how they are presented


Pies are for eating
It’s very hard to see differences
BEWARE CHARTJUNK!


Amusing graphics are nothing but distractions
Again, it’s very hard to see differences
BEWARE CHARTJUNK!


Here is the same population growth data
shown on two scales. Which would you use to
demonstrate rapid growth?
BEWARE tricky scales!


BEWARE statements with no context.
Here’s a made-up example:
Did you know that even speaking to
someone who once smoked DOUBLES
your chance of getting cancer?! ;)
Your odds go from 0.000000001:1 to 0.000000002:1
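The arithmetic behind the made-up example, as a short sketch: the relative change is large, the absolute change is tiny. `odds_to_probability` is a helper written for this illustration.

```python
def odds_to_probability(odds):
    """Convert odds of 'odds:1' into a probability."""
    return odds / (1 + odds)

# The made-up odds from the slide.
before = odds_to_probability(0.000000001)
after = odds_to_probability(0.000000002)

relative_change = after / before  # how many times bigger the chance is
absolute_change = after - before  # how much chance was actually added

print(f"relative change: x{relative_change:.2f}")
print(f"absolute change: {absolute_change:.1e}")
```

'Doubles your chance' is true, and so is 'adds about one in a billion'. Only the second statement gives the context.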

conclusion

Like any tool, stats can be misused
(intentionally or unintentionally)
Maintain a healthy skepticism and
question charts, tables and conclusions
where insufficient information is provided

references

-  The Cartoon Guide to Statistics (1993), Larry Gonick and Woollcott Smith
-  How to Lie with Statistics (1954), Darrell Huff
