Dr. Carlos Rodríguez Contreras
UNAM
The Exceptional Power of Data Visualisation
Scientific data visualisation is a collection of statistical methods, both quantitative and qualitative, for identifying relationships in data and consolidating them into an illustrative, informative summary graphic.
As Edward Tufte, one of the leading authorities on data visualisation, says:
The Art of Data Visualization
To illustrate the raw power of data visualisation, let us walk through a classic example:
Analyse the relationships in the following four datasets to identify significant differences between them.
Anscombe's Quartet
Graphs in Statistical Analysis
F. J. Anscombe
The American Statistician, Vol. 27, No. 1. (Feb., 1973), pp. 17-21.
Anscombe's Quartet
• Summary statistics are shown in this table.
• There appears to be little difference between the four datasets.
• Even the correlation coefficient between each x and y variable shows no visible difference between the datasets.
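The point of the summary table can be checked numerically. The following is a minimal sketch in Python (standard library only, chosen here for illustration; the deck's own scripts are in R) that uses the values published in Anscombe (1973). The helper names `pearson_r` and `summarise` are illustrative, not taken from the deck's scripts.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet as published in Anscombe (1973).
# Datasets 1 to 3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "Dataset 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                         7.24, 4.26, 10.84, 4.82, 5.68]),
    "Dataset 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                         6.13, 3.10, 9.13, 7.26, 4.74]),
    "Dataset 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                         6.08, 5.39, 8.15, 6.42, 5.73]),
    "Dataset 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                   5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def summarise(xs, ys):
    """Means, sample variances and correlation for one dataset."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


summaries = {name: summarise(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four rows come out practically identical (mean of x is 9, variance of x is 11, mean of y is about 7.5, correlation is about 0.82), which is why the numbers alone cannot tell the datasets apart.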
Scatter Plots
Volume per day    Cost per day
23                125
26                140
29                146
33                160
38                167
42                170
50                188
55                195
60                200
Figure: Production Volume vs. Cost per Day (scatter plot; x-axis: Volume per Day, y-axis: Cost per Day)
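The relationship shown in this scatter plot can also be quantified. The sketch below, in Python with the standard library only, takes the nine (volume, cost) pairs from the slide and computes an ordinary least-squares line and the Pearson correlation. The function name `fit_line` is illustrative, and fitting a straight line is an assumption made for this example rather than a step stated on the slide.

```python
from math import sqrt

# Data from the slide: production volume per day and cost per day.
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]


def fit_line(xs, ys):
    """Return (slope, intercept, r) of the least-squares line ys ~ xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = fit_line(volume, cost)
print(f"cost = {intercept:.1f} + {slope:.2f} * volume, r = {r:.3f}")
```

A correlation close to 1 agrees with what the plot suggests: cost rises almost linearly with volume over the observed range.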
Types of Relationships (example scatter plots of Y against X)
• Linear Relationships
• Curvilinear Relationships
• No Relationship
Anscombe's Quartet
Figures: scatter plots of Dataset 1, Dataset 2, Dataset 3 and Dataset 4
Let us see another example to illustrate the raw power of data visualisation:
Consider the famous investigation on marriage selection in respect to stature conducted by Sir Francis Galton.
Galton's Investigation
Natural Inheritance
Francis Galton, F.R.S.
Galton's Investigation
Preparing the data in a cross tab for analysis
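A cross tab counts how often each combination of two categorical variables occurs. The sketch below, in Python with the standard library only, shows the idea on a handful of HYPOTHETICAL (husband, wife) stature records made up for illustration. They are not Galton's data, and the category labels are assumptions.

```python
from collections import Counter

# HYPOTHETICAL records (husband stature, wife stature), invented for
# illustration only. These are NOT Galton's observations.
couples = [
    ("tall", "tall"), ("tall", "medium"), ("medium", "medium"),
    ("medium", "short"), ("short", "medium"), ("medium", "medium"),
    ("short", "short"), ("tall", "medium"), ("medium", "tall"),
    ("medium", "medium"),
]
levels = ["short", "medium", "tall"]

# Count each (husband, wife) combination; missing pairs count as 0.
counts = Counter(couples)

# Rows are husband stature, columns are wife stature.
table = [[counts[(husband, wife)] for wife in levels] for husband in levels]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Print the cross tab with its margins.
print(f"{'husband/wife':<14}" + "".join(f"{w:>8}" for w in levels) + f"{'total':>8}")
for husband, row, total in zip(levels, table, row_totals):
    print(f"{husband:<14}" + "".join(f"{n:>8}" for n in row) + f"{total:>8}")
print(f"{'total':<14}" + "".join(f"{n:>8}" for n in col_totals) + f"{sum(row_totals):>8}")
```

A table in this shape, cell counts plus row and column totals, is the usual starting point for categorical graphics such as association plots and spine plots.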
An association plot for Galton's Investigation
A spine plot for Galton's Investigation
Lifeboats on the Titanic
Report Into the Loss of the SS Titanic - 100 Years Later
Lifeboats R dataset
A data frame with 18 observations and 8 variables:
launch    launch time in "POSIXt" format.
side      factor. Side of the boat.
boat      factor indicating the boat.
crew      number of male crew members on board.
men       number of men on board.
women     number of women (including female crew) on board.
total     total number of passengers.
cap       capacity of the boat.
Titanic Ternary Plot
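A ternary plot places each observation inside a triangle according to the shares of three parts that sum to the whole; for a lifeboat these could be crew, men and women. The sketch below, in Python with the standard library only, converts three counts into 2D coordinates. The vertex layout (crew at bottom left, men at bottom right, women at the top) is an assumption made for this example and may differ from the plot on the slide. The function name `ternary_xy` and the example counts are hypothetical.

```python
from math import sqrt


def ternary_xy(crew, men, women):
    """Map three non-negative counts to (x, y) inside a unit triangle.

    Assumed layout: crew -> (0, 0), men -> (1, 0), women -> (0.5, sqrt(3)/2).
    """
    total = crew + men + women
    if total <= 0:
        raise ValueError("at least one count must be positive")
    # Convert counts to shares of the total.
    p_men = men / total
    p_women = women / total
    # The crew share is implied: it pulls the point toward the origin.
    x = p_men + 0.5 * p_women
    y = (sqrt(3) / 2) * p_women
    return x, y


# HYPOTHETICAL boat, invented for illustration; not a row of the dataset.
example_boat = {"crew": 5, "men": 10, "women": 25}
x, y = ternary_xy(**example_boat)
print(f"x = {x:.4f}, y = {y:.4f}")
```

A boat filled mostly with women lands near the top vertex under this layout, while equal shares of all three groups land at the centre of the triangle.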
Let us try all these in R.
Work with:
• categoricaldataviz.R
• anscombe1.R
• anscombe2.R
• titanic.R
Power DataViz