David Giard
Microsoft Senior Software Engineer
blog: DavidGiard.com
tv: TechnologyAndFriends.com
aka.ms/gcast
twitter: @DavidGiard
Data Visualization
The Ideas of Edward Tufte
@DavidGiard
     I           II           III           IV
   x     y      x     y      x     y      x     y
10.0  8.04   10.0  9.14   10.0  7.46    8.0  6.58
 8.0  6.95    8.0  8.14    8.0  6.77    8.0  5.76
13.0  7.58   13.0  8.74   13.0 12.74    8.0  7.71
 9.0  8.81    9.0  8.77    9.0  7.11    8.0  8.84
11.0  8.33   11.0  9.26   11.0  7.81    8.0  8.47
14.0  9.96   14.0  8.10   14.0  8.84    8.0  7.04
 6.0  7.24    6.0  6.13    6.0  6.08    8.0  5.25
 4.0  4.26    4.0  3.10    4.0  5.39   19.0 12.50
12.0 10.84   12.0  9.13   12.0  8.15    8.0  5.59
 7.0  4.82    7.0  7.26    7.0  6.42    8.0  7.91
 5.0  5.68    5.0  4.74    5.0  5.72    8.0  6.89
[Figure: scatter plots of data sets I, II, III, and IV; x-axis 0-20, y-axis 0-10]
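The four data sets in the table are Anscombe's quartet. A minimal Python sketch (standard library only) that computes the usual summary statistics for each set shows why the plots matter: the numbers barely differ. The data is copied from the table as printed; the names `QUARTET` and `summarize` are illustrative, not from any library.

```python
from math import sqrt

# Data copied from the table above; sets I-III share the same x values.
X_SHARED = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.72]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.59, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Means, sample variances, Pearson r, and least-squares slope/intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": sxx / (n - 1),  # sample variance (n - 1 denominator)
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(f"{name:>3}: " + ", ".join(f"{k}={v:.2f}" for k, v in stats.items()))
```

Each set reports a mean x of 9.00, a mean y of about 7.50, a correlation of about 0.82, and a fitted line of roughly y = 3.0 + 0.5x, even though the four plots look nothing alike.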
Dr. Edward Tufte
Graphical Excellence
[Figures: examples of graphical excellence; one legend reads 500,000, 100,000, 10,000]
Graphical Integrity
Blatant Lies
Source: Fox News, Dec 2011
Reprinted by Washington Post
[Figure: chart with values $0 and $(11,014)]
Lie
Lie Factor
$$\text{Lie Factor} = \frac{\text{Size of effect shown in graphic}}{\text{Size of effect in data}}$$
Lie
Data Increase = 53%
Graphical Increase = 783%
Lie Factor=14.8
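The Lie Factor is a simple ratio. A minimal Python sketch, assuming both effects are expressed as percent changes; the helper names `percent_change` and `lie_factor` are hypothetical.

```python
def percent_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`."""
    if before == 0:
        raise ValueError("percent change from zero is undefined")
    return (after - before) / before * 100.0


def lie_factor(graphic_change_pct: float, data_change_pct: float) -> float:
    """Size of effect shown in graphic divided by size of effect in data."""
    if data_change_pct == 0:
        raise ValueError("data change must be non-zero")
    return graphic_change_pct / data_change_pct


# Percent changes from the slide: data +53%, graphic +783%.
print(round(lie_factor(783, 53), 1))  # 14.8
```

A Lie Factor of 1 means the graphic changes in proportion to the data. To get the graphic's change, apply `percent_change` to a physical size measured on the chart, such as the lengths of two lines.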
Truth
[Figure: Required Fuel Economy Standards: New cars built from 1978 to 1985; x-axis 1978-1985, y-axis 0-30]
Data Change = 125%
Graphical Change = 406%
Lie Factor=3.8
Data Change = 554%
Graphical Change = 27,000%
Lie Factor=48.8
Context
[Figure: Connecticut Traffic Deaths, Before (1955) and After (1956) Stricter Enforcement by the Police Against Cars Exceeding Speed Limit; labels: Before stricter enforcement, After stricter enforcement; y-axis 275-325]
[Figure: Connecticut Traffic Deaths, 1951-1959; y-axis 220-320]
[Figure: Traffic Deaths per 100,000 Persons in Connecticut, Massachusetts, Rhode Island, and New York, 1951-1959; series NY, MA, CT, RI; y-axis 6-16]
Principles of Graphical Integrity
• Data representations proportional to the numbers they represent
• Number of dimensions in graph should not exceed number of dimensions in data
• Deflated (real) dollars, instead of nominal dollars
• Provide context
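Deflating a money series means dividing each year's nominal amount by that year's price index and multiplying by the index for a chosen base year. A minimal Python sketch; the index and dollar values below are made up for illustration (not real CPI figures), and `to_real_dollars` is a hypothetical helper.

```python
def to_real_dollars(nominal: float, price_index: float, base_index: float) -> float:
    """Convert a nominal amount to base-period ("real") dollars using a price index."""
    if price_index <= 0:
        raise ValueError("price index must be positive")
    return nominal * base_index / price_index


# Hypothetical values, for illustration only.
PRICE_INDEX = {2000: 100.0, 2010: 125.0, 2020: 150.0}
NOMINAL = {2000: 50.0, 2010: 60.0, 2020: 75.0}

for year, amount in NOMINAL.items():
    real = to_real_dollars(amount, PRICE_INDEX[year], PRICE_INDEX[2000])
    print(year, amount, round(real, 2))
```

Here the nominal amounts rise 50% from 2000 to 2020, but in 2000 dollars the series is flat (50.0, 48.0, 50.0).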
Data-Ink
Data-Ink Ratio
$$\text{Data-Ink Ratio} = \frac{\text{Data Ink}}{\text{Total Ink}}$$
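One way to estimate the ratio is to list a chart's elements, estimate the ink each one uses, and mark which ones carry data. A minimal Python sketch; the `Element` type, the ink units, and the example inventory are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    ink: float     # ink used, in arbitrary units (e.g. pixel count)
    is_data: bool  # True if erasing it would lose data-information


def data_ink_ratio(elements):
    """Ink devoted to data divided by the total ink in the graphic."""
    total = sum(e.ink for e in elements)
    if total == 0:
        raise ValueError("graphic has no ink")
    data = sum(e.ink for e in elements if e.is_data)
    return data / total


# Hypothetical ink inventory for one bar chart.
chart = [
    Element("bars", 400, True),
    Element("gridlines", 300, False),
    Element("background fill", 200, False),
    Element("redundant value labels", 100, False),
]
print(data_ink_ratio(chart))                          # 0.4
print(data_ink_ratio([e for e in chart if e.is_data]))  # 1.0
```

Erasing every non-data element drives the ratio to 1.0; the principle on the slides qualifies this with "within reason".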
Redundant Data
[Figures: chart with the value label 35.9]
Metadata
[Figures: the same chart redrawn six times; x-axis 0-6, y-axis 0-160, with fewer tick labels in the last three versions]
Principles
• Above all else, show the data
• Maximize the Data-Ink ratio, within reason
• Erase non-data-ink
• Erase redundant data-ink
• Revise and edit
Vibrations
[Figures: chart of Percent Critical Articles by Issue Area (Inflation, Unemployment, Shortages, Race, Crime, Govt. Power, Confidence, Watergate, Competence), y-axis 0-60, redrawn several times; the final version orders the issue areas Confidence, Watergate, Govt. Power, Crime, Competence, Inflation, Race, Shortages, Unemployment]
Chart Junk and Ducks
Worst. Graph. Ever.
Year   % Students < 25
1972   28.0
1973   29.2
1974   32.8
1975   33.6
1976   33.0
Year   % Students < 25   % Students >= 25
1972   28.0              72.0
1973   29.2              70.8
1974   32.8              67.2
1975   33.6              66.4
1976   33.0              67.0
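The second column carries no additional information: each pair sums to 100, so it can be derived from the first rather than drawn or typed, which also guards against transcription errors. A minimal Python sketch using the percentages from the table; the names are illustrative.

```python
# "% Students < 25" column from the table above.
UNDER_25 = {1972: 28.0, 1973: 29.2, 1974: 32.8, 1975: 33.6, 1976: 33.0}


def complement(pct: float) -> float:
    """Percentage in the other group, rounded to one decimal place."""
    return round(100.0 - pct, 1)


# Derive the "% Students >= 25" column instead of entering it by hand.
AT_LEAST_25 = {year: complement(pct) for year, pct in UNDER_25.items()}

for year in UNDER_25:
    print(year, UNDER_25[year], AT_LEAST_25[year])
```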
Multifunctioning Graphical Elements
Data Density
$$\text{Data Density} = \frac{\text{Number of entries in data matrix}}{\text{Area of data graphic}}$$
Low Data Density
Number of entries = 4
Graph Area = 26.5 square inches
$$\text{Data Density} = \frac{4\ \text{data entries}}{26.5\ \text{sq. in.}} \approx 0.15\ \text{data entries per sq. in.}$$
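The same calculation as a short Python sketch; `data_density` is a hypothetical helper name.

```python
def data_density(entries: int, area_sq_in: float) -> float:
    """Entries in the data matrix divided by the area of the data graphic (sq. in.)."""
    if area_sq_in <= 0:
        raise ValueError("area must be positive")
    return entries / area_sq_in


low = data_density(4, 26.5)
print(f"{low:.2f} data entries per sq. in.")  # 0.15 data entries per sq. in.

# Area the same 4 entries would need at 181 numbers per square inch.
print(f"{4 / 181:.3f} sq. in.")
```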
High Data Density
181 Numbers per square inch
High Data Density
1,000 Numbers per square inch
Small Multiples
Tufte’s Graphs
• Sparkline
• Slope Graph
Sparklines
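A sparkline is a small, word-sized graphic meant to sit inline with text. A rough text-only approximation, sketched in Python, maps each value onto one of eight Unicode block characters; this bar-style rendering is an assumption for illustration rather than Tufte's own line-based design, and the `sparkline` helper is hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest


def sparkline(values):
    """Scale each value between min and max and pick the nearest block character."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return BARS[0] * len(values)  # flat series: draw every point at the bottom
    span = hi - lo
    return "".join(BARS[round((v - lo) / span * (len(BARS) - 1))] for v in values)


print(sparkline([1, 5, 3, 8, 2, 7]))  # ▁▅▃█▂▇
```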
Slope Graph
Source: The Atlantic, June 30, 2012
Takeaways
• Maintain Graphical Integrity
• Maximize Data-Ink Ratio, within reason
• Avoid Chartjunk and Ducks
• Use Multifunctioning Graphical Elements, if possible
• Keep Labels with data
• Maximize Data Density
[Figures: three charts over the dates 10/10, 10/18, 10/24, 11/9, 11/14, 11/20, 11/28, 12/1, 12/6, 12/7: Temperature (C), ranging from 0 to -30; # Troops, from 100,000 down to 10,000; Distance Traveled (km), from 0 to 365]
[Figures: chart of # Troops by Date, 10/10 to 12/7, y-axis 0-120,000, redrawn several times; the final version adds Temperature (Celsius) on a second axis, -35 to 0]
Video of Presentation
tinyurl.com/DataVizTechEd
tinyurl.com/DataVizITCamp
Learn more
• “Visualizing Code”, Matt Eland, Friday at 8:30 AM in Salon A/H

Data Visualization - CodeMash 2022

Editor's Notes

  • #7 Hand-drawn graph from the 1880s showing a Paris train schedule, attributed to the French engineer Ibry. Source: E.J. Marey, La Méthode graphique (Paris, 1885)
  • #8 William Playfair (1759-1823). Three series over time: wheat prices, labor wages, and the reigning monarchs.
  • #9 From the 1960 census: number of families per county with very low income (< $3,000) and number of families per county with very high income (> $10,000).
  • #10 Charles Joseph Minard, French engineer, 1781-1870. “It may well be the best statistical graphic ever drawn.” – Tufte. Tan line = Napoleon’s march to Moscow in 1812 (422,000 men at the start, 100,000 on reaching Moscow). Black line = Napoleon’s retreat to Poland (100,000 men down to 10,000). The width of the lines represents the size of the army. The retreat line is linked to the lower graph, which shows dates and temperatures (a very cold winter). Auxiliary troop movements are also shown. Crossing the Berezina River was a disaster. Variables: size of army, location, direction of movement, temperature, dates.
  • #11 Same as #10.
  • #15 From the New York Times, 1978. Fuel economy standards increased by 53%, but the graphic shows an increase of 783%. Lie factor = 14.8.
  • #17 Same as #15.
  • #19 From the Los Angeles Times, 1979. Lie factor = 3.8 (the horizontal spacing of the x-axis is also wrong).
  • #20 Time, 1979. One-dimensional data is shown as three-dimensional objects: an increase of 454% is shown as a volume increase of 27,000%. Lie factor = 48.8, a record!
  • #21 Source: Sunday Times (London), 1979
  • #22 New York Times, 1978
  • #30 Data-ink = ink that directly shows the data; erasing it results in a loss of data. All else = decorations, metadata, and redundant data. Data-ink ratio = the proportion of a graphic’s ink devoted to the non-redundant display of data-information = 1.0 – the proportion of the graphic that can be erased without loss of data-information.
  • #48 Duck-shaped building in Flanders, NY. Three types of chartjunk: 1) unintentional optical art, 2) the grid, 3) the self-promoting graphical duck.
  • #49 Moiré effect: the graphic appears to vibrate or shimmer.
  • #57 Same as #48.
  • #73 Source: Executive Office of the President, Office of Management and Budget, 1973
  • #74 Source: Executive Office of the President, Office of Management and Budget, 1973
  • #75 Source: JASA
  • #77 Source: Maps and Diagrams by F.J. Monkhouse and H.R. Wilkinson, 1971
  • #79 Source: Fluctuations of the Great Fisheries of Northern Europe by John Hjort, 1914
  • #80 See this animated at http://blog.stephenwolfram.com/2013/04/data-science-of-the-facebook-world/
  • #88 Source: The Visual Display of Quantitative Information by Edward Tufte
  • #89 Source: The Visual Display of Quantitative Information by Edward Tufte
  • #92 Same as #10.