Telling a Story – or Even Propaganda – Through Data Visualization

Demetris Trihinas, Full-Time Lecturer at University of Nicosia
7/16/19, Demetris Trihinas
trihinas.d@unic.ac.cy
Lead Cyprus: Disinformation Battles | Limassol, July 2019
Department of Computer Science
Telling a Story
– or Even Propaganda –
Through Data Visualization
Demetris Trihinas
Department of Computer Science
ailab @ University of Nicosia
trihinas.d@unic.ac.cy
Full-Time Faculty Member
University of Nicosia
“Designing and developing scalable and self-adaptive tools for data
management, exploration and visualization”
@dtrihinas
http://dtrihinas.info
https://ailab.unic.ac.cy/
https://www.slideshare.net/DemetrisTrihinas
@AilabUnic
A picture is worth a thousand words...
Chinese proverb
Unemployment Data in the US
Any conclusions or insights from this table?
Unemployment Data in the US
Colored visualization of unemployment per area
Which areas have low unemployment?
Seismic Activity in California
Any conclusions or insights from this table?
Seismic Activity in California
Map labels: Alcatraz, National Park, Hollywood
Is there no seismic activity at the national park?
Is this a good place to live?
Data Visualization
It is easier for humans to understand data conceptually by visually
focusing on the main information.
Data visualization is both a tool for disseminating knowledge
and a form of knowledge communication.
Why Visual Representations?
Anscombe's quartet [F. J. Anscombe], CIS 467, Spring 2015
      I               II              III              IV
    x      y        x      y        x      y        x      y
 10.0   8.04     10.0   9.14     10.0   7.46      8.0   6.58
  8.0   6.95      8.0   8.14      8.0   6.77      8.0   5.76
 13.0   7.58     13.0   8.74     13.0  12.74      8.0   7.71
  9.0   8.81      9.0   8.77      9.0   7.11      8.0   8.84
 11.0   8.33     11.0   9.26     11.0   7.81      8.0   8.47
 14.0   9.96     14.0   8.10     14.0   8.84      8.0   7.04
  6.0   7.24      6.0   6.13      6.0   6.08      8.0   5.25
  4.0   4.26      4.0   3.10      4.0   5.39     19.0  12.50
 12.0  10.84     12.0   9.13     12.0   8.15      8.0   5.56
  7.0   4.82      7.0   7.26      7.0   6.42      8.0   7.91
  5.0   5.68      5.0   4.74      5.0   5.73      8.0   6.89
Mean of x       9
Variance of x   11
Mean of y       7.50
Variance of y   4.122
Correlation     0.816
What is the data “telling” us?
How about letting me “see” the data first?
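The summary statistics quoted for the four datasets can be checked numerically. The following is a minimal sketch using only the Python standard library; the names `QUARTET`, `pearson` and `summarize` are illustrative choices, not part of the original slides. The data values are copied from the table above.

```python
from statistics import mean, variance

# Datasets I-III share the same x values; dataset IV has its own.
X123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X4 = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]

QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return (mean x, sample variance x, mean y, sample variance y, correlation)."""
    return mean(xs), variance(xs), mean(ys), variance(ys), pearson(xs, ys)


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        mx, vx, my, vy, r = summarize(xs, ys)
        print(f"{name:>3}: mean x={mx:.2f} var x={vx:.2f} "
              f"mean y={my:.2f} var y={vy:.2f} r={r:.3f}")
```

The four printed lines are nearly identical, which is the reason for asking to “see” the data before trusting the numbers.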
Visualization is also a Data Exploration Tool
[Figure: scatter plots of the four Anscombe datasets, panels x1/y1, x2/y2, x3/y3 and x4/y4, each with the x axis from 4 to 18 and the y axis from 4 to 12. F. J. Anscombe; CIS 602, Fall 2014]
Linear dependency
…“perfect” linear dependency without the “outlier”…
Should we just consider this an error and throw this point away?
What Data Should be Visualized
Dashboards, spreadsheets and visuals
only tell you what is happening.
But, they do not tell you why…
The “Computer vs Human” Ability Matrix
Today’s Talk
• Data visualization as a communication and data exploration tool
• Data storytelling
• Give your data a voice!
• The unintentional and intentional “bewares”
• Tools of the trade
What is a Story?
A story is a set of observations, facts, or events, true or invented,
that are presented in a specific order so that they create an
emotional reaction in the audience.
Data Storytelling
Data storytelling uses a narrative
tailored to a specific audience with the
intent to communicate information
extracted from (raw) data.
What Makes a (Good) Story?
hypothesis -> data -> insights -> narrative -> visuals
The narrative through visuals is the key vehicle to convey insights extracted from the data.
you start here…
Data Storytelling – It’s a Brain Thing…
Narratives aid memory: the emotional aspect of a story can engage
more parts of the brain, making the story and its elements
easier to recall.
How Stories Change the Brain. P. Zak, Berkeley, 2013.
Should we Get Into the Camcorder or
Digital Camera Business?
the hypothesis
the data
Camcorder vs Digital Camera Sales
Chart legend: Camcorder, Digital Camera
…but also part of the insights
The GoPro Story
A Good Story… has a Personal Touch
It’s not about how much money you spent but how many miles you traveled and the equivalent of those miles.
A Good Story… can be Interactive
Harsh reality: 10-12% of our lives is devoted to travelling between work, leisure and our homes.
A Good Story… can be Interactive
Focus even more on the interesting information: time is factored into the visual.
Data Journalism is the Future…
Journalists need to be data-savvy. It used to be that you
would get stories by chatting to people in bars, and it
still might be that you’ll do it that way some times. But
now it’s also going to be about equipping yourself with
the tools to analyze data and picking out what is
interesting. And keeping it in perspective, helping
people out by really seeing where it all fits together, and
what’s going on in the world.
Sir Tim Berners-Lee (2013)
The Data Journalism Pyramid
Paul Bradshaw (2011)
Data
The Data Science Process
Data Science and Data Journalism
WSJ: The Impact of Vaccines (2015)
Data from CDC 1920-2014 (US)
Heatmap
“cool to warm” scale denoting number of infection cases
ProPublica: A Disappearing Planet
https://projects.propublica.org/extinctions/
Sliding time window
Data from IUCN Red List of Threatened Species (2013)
Stacked bar plot: quantities out of total
Stacked bar plot “clustered” by species
Bloomberg: Most dangerous jobs (2015)
Data from U.S. Department of Labor
Tagline
Stacked bar plot with highlighting on the focused category
How things can go wrong unintentionally…
Misinformation
What is the Intended Story?
Mean arrival delay versus distance from New York City
Each point represents a destination, and the size of each point represents the number of
flights from New York to that destination in 2013.
Which is the best airline?
Make a Figure for the “Generals”
• Common dataviz misconceptions:
  • The audience sees your figures and immediately infers the points
    you are trying to make.
  • The audience can rapidly process complex visualizations and
    understand the key trends and relationships that are shown.
• Follow your audience’s “language” and thinking process.
Claus Wilke, “Fundamentals of Data Visualization”, https://serialmentor.com/dataviz/
What is the Intended Story?
simple and clear is better than complex and confusing.
Death to Pie Charts [and Comic Sans]
“I hate pie charts. I mean, really hate them.”
Cole Nussbaumer, www.storytellingwithdata.com/2011/07/death-to-pie-charts.html
Share of coverage on TechCrunch
Redesign
Storytelling Pie vs Bars
So, what to use instead?
http://www.storytellingwithdata.com/blog/2014/06/alternatives-to-pies
Imagine you just completed a pilot summer learning program on science, aimed at improving perceptions of the field among 2nd and 3rd grade elementary school children.
Storytelling Pie vs Bars
Alternative #2: Simple Bar Graph
Storytelling Pie vs Bars
Alternative #3: 100% Stacked Horizontal Bar Graph
Storytelling Pie vs Bars
Alternative #4: Slopegraph
Data Storytelling is NOT about “fitting”
the data to the story YOU want!
Correlation
• Correlation is a statistical technique that tells us how
strongly pairs of variables are related.
• But… correlation does not tell us the why and how
behind the relationship.
• So… correlation just says that a relationship exists.
Ice-Cream and Sunglass Sales
As sales of ice cream increase, so do
the sales of sunglasses.
Causation
• Causation denotes that any change in the value of one
variable will cause a change in the value of another
variable.
• This means that one variable makes the other happen.
Exercise and Calories
• When a person is exercising, the number of
calories burned increases every minute.
• The former (exercise) is causing the latter (calories
burned) to happen.
Ice-Cream and Homicides in New York
• A study in the 90’s showed that ice-cream sales are the
cause of homicides in New York.
• As the sales of ice-cream rise and fall, so does the
number of homicides -> correlation.
• But… does the consumption of ice-cream actually
cause the death of people in NY?
https://www.nytimes.com/2009/06/19/nyregion/19murder.html
Correlation Does NOT Imply Causation
• The two things are, yes, correlated.
• But this does NOT mean one causes the other.
Correlation is what we fall back on when we can’t see
under the covers.
So the less information we have, the more we are forced
to rely on observed correlations.
There is NO Correlation without Causation
If neither A nor B causes the other, and the two are
correlated, there must be some common cause. It may not
be a direct cause of each of them, but it’s there somewhere
“upstream” in the picture.
Bottom line:
you have to keep “digging”… don’t be lazy!
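The common-cause idea can be illustrated with a small numerical sketch. The numbers below are made up for illustration and are not data from the talk: a hypothetical `temperature` series drives both `ice_cream` and `sunglasses` sales. The raw correlation between the two sales series is high, but it almost vanishes once the straight-line effect of temperature is removed from each. The helper names `pearson` and `residuals` are likewise assumptions of this sketch.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


# Invented monthly data: temperature is the hypothetical common cause.
temperature = list(range(12))
noise_a = [1, -1] * 6            # fluctuations unrelated to temperature
noise_b = [1, 1, -1, -1] * 3
ice_cream = [2 * t + a for t, a in zip(temperature, noise_a)]
sunglasses = [3 * t + b for t, b in zip(temperature, noise_b)]

# Correlation between the two sales series as observed.
raw_r = pearson(ice_cream, sunglasses)
# Correlation after controlling for temperature (a partial correlation).
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(sunglasses, temperature))

print(f"raw correlation:               {raw_r:.2f}")
print(f"after removing temperature:    {partial_r:.2f}")
```

In this toy setup neither series influences the other, yet they are strongly correlated until the upstream variable is accounted for, which is the kind of “digging” the slide asks for.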
How things intentionally go wrong…
Disinformation
Using a Sample of the Data
• How many football games do US citizens go to?
• To get an exact answer (100% correct), you must ask
everyone in the US (about 330M people) -> Not practical!
• Use a random sample, meaning ask (far) fewer people
-> but we won’t be 100% correct.
Small Sample Sizes
• Picking an adequate sample size is “part science and
part art”
• But statements, like “75% of (some group) plan to use
(some product) this year” become suspect when the
sample size is just 24 companies.
• Even worse is when the sample size is NOT mentioned in the
study or visual at all.
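To see why a “75%” claim from 24 respondents is suspect, one can estimate its margin of error. This is a rough sketch, assuming a simple random sample and the usual normal approximation for a proportion (which is itself shaky at n = 24); the function name `margin_of_error` and the comparison sample of 1000 are assumptions added here.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p with n respondents."""
    return z * sqrt(p * (1 - p) / n)


small = margin_of_error(0.75, 24)    # the 24-company example from the slide
large = margin_of_error(0.75, 1000)  # a hypothetical larger sample, for comparison

print(f"n=24:   75% +/- {small * 100:.1f} points")
print(f"n=1000: 75% +/- {large * 100:.1f} points")
```

Under these assumptions the 24-company figure is uncertain by roughly 17 percentage points either way, against fewer than 3 points for a sample of 1000.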
Biased Sampling
• This involves over/under polling a non-representative
group.
• A survey reveals that “81% of bank customers would
use mobile banking if it were available…”
• Meaningless if the survey only polled people on their
mobile devices.
Random Sample Selection
• Random… means random!
• You cannot just select 1000 people from one city; the
sample won’t represent the whole country.
• You cannot just send FB messages to 1000 random
people; you will get a representation of FB users, and
of course not all of the country’s citizens use FB.
• So… constructing a random sample is actually hard!
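The one-city problem can be shown with a toy simulation. Everything in this sketch is invented for illustration: the population size, the 20% share living in one football-mad city, and the number of games attended. It compares a sample taken only from that city with a simple random sample from the whole population.

```python
import random
from statistics import mean

random.seed(2019)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20% live in one city and attend 5 games a year,
# everyone else attends 1 game a year.
population = [("city", 5)] * 20_000 + [("elsewhere", 1)] * 80_000

# The exact answer, only available because this population is simulated.
true_mean = mean(games for _, games in population)

# Biased sample: 1000 people, all taken from the one city.
city_only = [games for place, games in population if place == "city"]
biased_mean = mean(random.sample(city_only, 1000))

# Simple random sample: 1000 people drawn from the whole population.
random_mean = mean(games for _, games in random.sample(population, 1000))

print(f"true mean:          {true_mean:.2f}")
print(f"one-city sample:    {biased_mean:.2f}")
print(f"random sample:      {random_mean:.2f}")
```

The one-city sample reports 5 games per person while the true average is 1.8; the random sample lands close to the true value but, as noted above, not exactly on it.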
Poorly Chosen Lying with Statistics
Using the mean with values across non-uniform populations.
What is the starting salary at a company?
Poorly Chosen Lying with Statistics
Using the median to hide skewed data.
“Invest with me. My portfolio’s median profit is 8%.”
Chart labels: median, mean
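Both tricks, quoting the mean of a mixed population and quoting the median of skewed data, are easy to reproduce. The sketch below uses invented numbers (not from the talk): a list of starting salaries with one executive hire, and a portfolio with one very bad investment.

```python
from statistics import mean, median

# Invented starting salaries: nine junior hires and one executive hire.
salaries = [30_000] * 9 + [300_000]

# Invented yearly returns (%) of a portfolio with one disastrous investment.
returns = [8, 8, 9, 10, -60]

print("salaries: mean =", mean(salaries), " median =", median(salaries))
print("returns:  mean =", mean(returns), " median =", median(returns))
```

Here the mean starting salary is 57,000 although nine out of ten hires earn 30,000, and the portfolio’s median return is 8% although its mean return is -5%. Which figure gets quoted decides the story.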
Poorly Chosen Lying with Statistics
A survey is only as accurate as its standard error.
The Semi-Attached Persona
• Stating one thing as a proof for something else.
• For example, if an ad says “15% of CEOs drive a BMW;
more than any other brand”– what does that prove?
• The implication is that CEOs are some sort of
authority on cars or, the other way around, that
BMWs “make” CEOs.
The Lie Factor
• The size of the graphic effect should be directly
proportional to the numerical quantities:
Lie Factor = (size of effect shown in graphic) / (size of effect in data)
Edward Tufte: Principles of Graphical Integrity
Proportional Data -> Proportional Viz
Rule: Use a channel proportional to the data!
Lying through Graphics
Lie Factor - Graphical Integrity
Magnitude in data must correspond to magnitude of mark
Flowing Data
Scale Distortions: 35% vs 39.6%
Effect in Data: factor 39.6/35 ≈ 1.13
Effect in Graphic: factor 5
Lie Factor: 5/1.13 ≈ 4.42
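The Lie Factor arithmetic for this chart can be reproduced in a few lines. This is a sketch that follows the slide in measuring each effect as a plain ratio (Tufte’s own examples measure effects as relative changes); the function name `lie_factor` is an illustrative choice, and the factor of 5 for the graphic is taken from the slide, not measured here.

```python
def lie_factor(effect_in_graphic, effect_in_data):
    """Size of the effect shown in the graphic divided by the size of the effect in the data."""
    return effect_in_graphic / effect_in_data


data_effect = 39.6 / 35   # ratio of the two plotted values (35% and 39.6%)
graphic_effect = 5        # ratio of the drawn bar heights, as stated on the slide

print(f"effect in data:    {data_effect:.2f}")
print(f"effect in graphic: {graphic_effect:.2f}")
print(f"lie factor:        {lie_factor(graphic_effect, data_effect):.2f}")
```

With the unrounded ratio 39.6/35 the Lie Factor comes out at about 4.42; a faithful chart would score close to 1.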
Propaganda Gone Really Really Bad?
Conceptualizing Scale via Comparison
To truly understand scale, a comparison must be made.
This is a good visualization because we have the UK as a reference.
Diverging Value-Scale
Who won the election?
Election maps carry significant bias
US Election 2016 – Displaying Polls
• Donald Trump’s campaign used the actual US map to
present poll results.
• Influencing swing voters by feeding “your” news.
“…a lot of red folks… we’re winning...”
The “reality”
2016 Spain Elections
(Some) tools of the trade
Going beyond Microsoft Excel…
kepler.gl
Instead of a Conclusion…
Data can be the source of a story, or it can be the
tool with which the story is told – or it can be both.
Like any source, it should be treated with
skepticism; and like any tool, we should be
conscious of how it can shape the stories that are
created with it.
Telling a Story
– or Even Propaganda –
Through Data Visualization
Questions?
Demetris Trihinas
Department of Computer Science
ailab @ University of Nicosia
trihinas.d@unic.ac.cy
1 of 73

More Related Content

What's hot(20)

Idiro Analytics - Analytics & Big DataIdiro Analytics - Analytics & Big Data
Idiro Analytics - Analytics & Big Data
Idiro Analytics7.4K views
Introduction to Data VisualizationIntroduction to Data Visualization
Introduction to Data Visualization
Stephen Tracy11.2K views
3 data visualization3 data visualization
3 data visualization
ThilinaWanshathilaka586 views
Swisscom Network AnalyticsSwisscom Network Analytics
Swisscom Network Analytics
confluent204 views
Splunk for AIOps: Reduce IT outages through prediction with machine learningSplunk for AIOps: Reduce IT outages through prediction with machine learning
Splunk for AIOps: Reduce IT outages through prediction with machine learning
Digital Transformation EXPO Event Series1.3K views
econometrics project PG1 2015-16econometrics project PG1 2015-16
econometrics project PG1 2015-16
Sayantan Baidya1.5K views
SCM PresentationSCM Presentation
SCM Presentation
Yousef A Al Saayed1.4K views
Big Data StrategiesBig Data Strategies
Big Data Strategies
Misiek Piskorski7.3K views
Data QualityData Quality
Data Quality
Michael Collins5.3K views
Sensitivity AnalysisSensitivity Analysis
Sensitivity Analysis
Bhargav Seeram37.2K views
Demystifying Healthcare Data GovernanceDemystifying Healthcare Data Governance
Demystifying Healthcare Data Governance
Health Catalyst57.3K views
A Pixar Twist on Presenting DataA Pixar Twist on Presenting Data
A Pixar Twist on Presenting Data
Amanda Makulec1.2K views
Building Data Lakehouse.pdfBuilding Data Lakehouse.pdf
Building Data Lakehouse.pdf
Luis Jimenez526 views
Data visualizationData visualization
Data visualization
Subarna Natarajan358 views
Data AnalyticsData Analytics
Data Analytics
Srinimf-Slides 2K views

Similar to Telling a Story – or Even Propaganda – Through Data Visualization(9)

More from Demetris Trihinas(17)

From Mining Raw Data to Story VisualizationFrom Mining Raw Data to Story Visualization
From Mining Raw Data to Story Visualization
Demetris Trihinas136 views
Adam - Adaptive Monitoring in 5minAdam - Adaptive Monitoring in 5min
Adam - Adaptive Monitoring in 5min
Demetris Trihinas624 views
Find A ProjectFind A Project
Find A Project
Demetris Trihinas370 views
Cloud Elasticity and the CELAR ProjectCloud Elasticity and the CELAR Project
Cloud Elasticity and the CELAR Project
Demetris Trihinas1.1K views

Recently uploaded(20)

CXL at OCPCXL at OCP
CXL at OCP
CXL Forum203 views
Web Dev - 1 PPT.pdfWeb Dev - 1 PPT.pdf
Web Dev - 1 PPT.pdf
gdsczhcet49 views
METHOD AND SYSTEM FOR PREDICTING OPTIMAL LOAD FOR WHICH THE YIELD IS MAXIMUM ...METHOD AND SYSTEM FOR PREDICTING OPTIMAL LOAD FOR WHICH THE YIELD IS MAXIMUM ...
METHOD AND SYSTEM FOR PREDICTING OPTIMAL LOAD FOR WHICH THE YIELD IS MAXIMUM ...
Prity Khastgir IPR Strategic India Patent Attorney Amplify Innovation24 views

Telling a Story – or Even Propaganda – Through Data Visualization

  • 1. 7/16/19 1Demetris Trihinas trihinas.d@unic.ac.cy 1Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science Telling a Story – or Even Propaganda – Through Data Visualization Demetris Trihinas Department of Computer Science ailab @ University of Nicosia trihinas.d@unic.ac.cy
  • 2. 7/16/19 2Demetris Trihinas trihinas.d@unic.ac.cy 2Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science Full-Time Faculty Member University of Nicosia “Designing and developing scalable and self-adaptive tools for data management, exploration and visualization” @dtrihinas http://dtrihinas.info https://ailab.unic.ac.cy/https://www.slideshare.net/DemetrisTrihinas @AilabUnic
  • 3. 7/16/19 3Demetris Trihinas trihinas.d@unic.ac.cy 3Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science A picture is worth a 1000 words... Chinese proverb
  • 4. 7/16/19 4Demetris Trihinas trihinas.d@unic.ac.cy 4Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science Unemployment Data in the US Any conclusions or insights from this table?
  • 5. 7/16/19 5Demetris Trihinas trihinas.d@unic.ac.cy 5Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science Unemployment Data in the US Colored visualization of unemployment per area Which areas have low unemployment?
  • 6. 7/16/19 6Demetris Trihinas trihinas.d@unic.ac.cy 6Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science Seismic Activity in California Any conclusions or insights from this table?
  • 7. 7/16/19 7Demetris Trihinas trihinas.d@unic.ac.cy 7Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science Seismic Activity in California Alcatraz NationalPark Hollywood At the national park are there no seismic activity? Is this a good place to live?
  • 8. 7/16/19 8Demetris Trihinas trihinas.d@unic.ac.cy 8Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science Data Visualization Easier –for humans– to conceptually understand data by visually focusing on the main information. Data visualization is a tool for both disseminating knowledge and a form of knowledge communication.
  • 9. 7/16/19 9Demetris Trihinas trihinas.d@unic.ac.cy 9Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science Why Visual Representations?Why Visual? 18CIS 467, Spring 2015 I II III IV x y x y x y x y 10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58 8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76 13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71 9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84 11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47 14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04 6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25 4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50 12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56 7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91 5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89 Mean of x 9 Variance of x 11 Mean of y 7.50 Variance of y 4.122 Correlation 0.816 [F. J. Anscombe] Why Visual? 18CIS 467, Spring 2015 I II III IV x y x y x y x y 10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58 8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76 13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71 9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84 11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47 14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04 6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25 4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50 12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56 7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91 5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89 Mean of x 9 Variance of x 11 Mean of y 7.50 Variance of y 4.122 Correlation 0.816 [F. J. Anscombe] Why Visual? 18CIS 467, Spring 2015 I II III IV x y x y x y x y 10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58 8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76 13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71 9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84 11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47 14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04 6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25 4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50 12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56 7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91 5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89 Mean of x 9 Variance of x 11 Mean of y 7.50 Variance of y 4.122 Correlation 0.816 [F. J. Anscombe] What is the data “telling” us? How about letting me “see” the data first?
  • 10. 7/16/19 10Demetris Trihinas trihinas.d@unic.ac.cy 10Lead Cyprus: Disinformation Battles | Limassol, July 2019 Department of Computer Science Visualization is also a Data Exploration Tool Why Visual? 18CIS 467, Spring 2015 I II III IV x y x y x y x y 10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58 8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76 13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71 9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84 11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47 14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04 6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25 4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50 12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56 7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91 5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89 Mean of x 9 Variance of x 11 Mean of y 7.50 Variance of y 4.122 Correlation 0.816 [F. J. Anscombe] CIS 602, Fall 2014 ● ● ● ● ● ● ● ● ● ● ● 4 6 8 10 12 14 16 18 4 6 8 10 12 x1 y1 ● ● ●● ● ● ● ● ● ● ● 4 6 8 10 12 14 16 18 4 6 8 10 12 x2 y2 ● ● ● ● ● ● ● ● ● ● ● 4 6 8 10 12 14 16 18 4 6 8 10 12 x3 y3 ● ● ● ●● ● ● ● ● ● ● 4 6 8 10 12 14 16 18 4 6 8 10 12 x4 y4 Why Visual? 19 [F. J. Anscombe] Linear dependency …“perfect” linear dependency Without ”outlier…” Should we just consider this an error and throw this point away?
  • 12. What Data Should be Visualized
  • 13. Dashboards, spreadsheets and visuals only tell you what is happening. But they do not tell you why…
  • 14. The “Computer vs Human” Ability Matrix
  • 15. Today’s Talk • Data visualization as a communication and data exploration tool • Data storytelling • Give your data a voice! • The unintentional and intentional “bewares” • Tools of the trade
  • 16. What is a Story? A story is a set of observations, facts, or events, true or invented, that are presented in a specific order such that they create an emotional reaction in the audience.
  • 17. Data Storytelling. Data storytelling uses a narrative tailored to a specific audience with the intent to communicate information extracted from (raw) data.
  • 18. What Makes a (Good) Story? hypothesis -> data -> insights -> narrative -> visuals. The narrative through visuals is the key vehicle to convey insights extracted from the data. Annotation: you start here…
  • 19. Data Storytelling: It’s a Brain Thing… Narratives aid memory: the emotional aspect of a story can engage more parts of the brain, making the story and its elements easier to recall. [P. Zak, “How Stories Change the Brain”, Berkeley, 2013]
  • 20. Should we Get Into the Camcorder or Digital Camera Business? (the hypothesis)
  • 21. Camcorder vs Digital Camera Sales (the data… but also part of the insights). Chart series: Camcorder, Digital Camera.
  • 22. The GoPro Story
  • 23. A Good Story… has a Personal Touch. It’s not about how much money you spent but how many miles you traveled and the equivalent of those miles.
  • 24. A Good Story… can be Interactive. Harsh reality: 10-12% of our lives is devoted to travelling between work, leisure and our homes.
  • 25. A Good Story… can be Interactive. Focus even more on the interesting information: time is factored into the visual.
  • 26. Data Journalism is the Future… “Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you’ll do it that way sometimes. But now it’s also going to be about equipping yourself with the tools to analyze data and picking out what is interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what’s going on in the world.” Sir Tim Berners-Lee (2013)
  • 27. The Data Journalism Pyramid, Paul Bradshaw (2011). Data. The Data Science Process.
  • 29. Data Science and Data Journalism
  • 30. WSJ: The Impact of Vaccines (2015). Data from CDC, 1920-2014 (US). Heatmap with a “cool to warm” scale denoting the number of infection cases.
  • 31. ProPublica: A Disappearing Planet. https://projects.propublica.org/extinctions/ Data from the IUCN Red List of Threatened Species (2013). Sliding time window. Stacked bar plot showing quantities out of the total. Stacked bar plot “clustered” by species.
  • 32. Bloomberg: Most dangerous jobs (2015). Data from the U.S. Department of Labor. Tagline. Stacked bar plot with highlighting on the focused category.
  • 33. How things can go wrong unintentionally… Misinformation
  • 34. What is the Intended Story? Mean arrival delay versus distance from New York City. Each point represents a destination, and the size of each point represents the number of flights from New York to that destination in 2013. Which is the best airline?
  • 35. Make a Figure for the “Generals” • Common dataviz misconceptions: • The audience sees your figures and immediately infers the points you are trying to make. • The audience can rapidly process complex visualizations and understand the key trends and relationships that are shown. • Follow your audience’s “language” and thinking process. [Claus Wilke, “Fundamentals of Data Visualization”, https://serialmentor.com/dataviz/]
  • 36. What is the Intended Story? Simple and clear is better than complex and confusing.
  • 37. Death to Pie Charts [and Comic Sans]. “I hate pie charts. I mean, really hate them.” [Cole Nussbaumer, www.storytellingwithdata.com/2011/07/death-to-pie-charts.html] Share of coverage on TechCrunch, and its redesign.
  • 38. Storytelling Pie vs Bars. So, what to use instead? http://www.storytellingwithdata.com/blog/2014/06/alternatives-to-pies Imagine you just completed a pilot summer learning program on science aimed at improving perceptions of the field among 2nd and 3rd grade elementary children.
  • 39. Storytelling Pie vs Bars. Alternative #2: Simple Bar Graph
  • 40. Storytelling Pie vs Bars. Alternative #3: 100% Stacked Horizontal Bar Graph
  • 41. Storytelling Pie vs Bars. Alternative #4: Slopegraph
  • 43. Data Storytelling is NOT about “fitting” the data to the story YOU want!
  • 44. Correlation • Correlation is a statistical technique that tells us how strongly pairs of variables are related. • But… correlation does not tell us the why and how behind the relationship. • So… correlation just says that a relationship exists.
  • 45. Ice-Cream and Sunglass Sales. As ice-cream sales increase, so do sunglass sales.
  • 46. Causation • Causation denotes that any change in the value of one variable will cause a change in the value of another variable. • This means that one variable makes the other happen.
  • 47. Exercise and Calories • When a person exercises, the number of calories burned increases every minute. • The former (exercise) is causing the latter (calories burned) to happen.
  • 48. Ice-Cream and Homicides in New York • A study in the ’90s claimed that ice-cream sales are the cause of homicides in New York. • As the sales of ice-cream rise and fall, so does the number of homicides -> correlation. • But… does the consumption of ice-cream actually cause the death of people in NY? https://www.nytimes.com/2009/06/19/nyregion/19murder.html
  • 49. Correlation Does NOT Imply Causation • The two things are, yes, correlated. • But this does NOT mean one causes the other. Correlation is what we fall back on when we can’t see under the covers: the less information we have, the more we are forced to rely on observed correlations.
  • 50. There is NO Correlation without Causation. If neither A nor B causes the other, and the two are correlated, there must be some common cause. It may not be a direct cause of each of them, but it’s there somewhere “upstream” in the picture. Bottom line: you have to keep “digging”… don’t be lazy!
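The “common cause” argument can be illustrated with a small simulation. Everything in the sketch below is synthetic and assumed for illustration only: daily temperature is taken as the hypothetical upstream cause, and the coefficients and noise levels are made up, not taken from the cited study. Ice-cream sales and homicides each depend on temperature but not on each other. They still come out strongly correlated, and the correlation largely disappears once temperature is controlled for with the standard first-order partial correlation formula.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic year: temperature is the assumed common cause (made-up numbers).
temperature = [15 + 10 * math.sin(2 * math.pi * day / 365) for day in range(365)]
ice_cream = [20 * t + rng.gauss(0, 30) for t in temperature]   # depends on temperature only
homicides = [0.1 * t + rng.gauss(0, 0.5) for t in temperature]  # depends on temperature only

r_raw = pearson(ice_cream, homicides)
r_partial = partial_corr(ice_cream, homicides, temperature)

print(f"raw correlation:            {r_raw:.2f}")
print(f"controlling for temperature: {r_partial:.2f}")
```

In this toy setup neither series causes the other, so the raw correlation is entirely explained by the shared driver. Finding that driver in real data is the “digging” the slide asks for.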
  • 51. How things intentionally go wrong… Disinformation
  • 52. Using a Sample of the Data • How many football games do US citizens go to? • To get an exact answer (100% correct), you must ask everyone in the US (>300M people) -> Not practical! • Use a random sample, meaning ask (far) fewer people -> but we won’t be 100% correct.
  • 53. Small Sample Sizes • Picking an adequate sample size is “part science and part art”. • But statements like “75% of (some group) plan to use (some product) this year” become suspect when the sample size is just 24 companies. • Even worse is when the sample size is NOT mentioned in the study or visual at all.
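A rough way to see why 24 respondents make the “75%” claim suspect is to compute the margin of error of a sample proportion. The sketch below uses the textbook normal approximation (standard error = sqrt(p(1-p)/n), 95% margin ≈ 1.96 standard errors). It assumes a simple random sample, and the approximation is itself crude at n = 24, so treat the numbers as indicative only. The function names are illustrative.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion p observed on n respondents."""
    return math.sqrt(p * (1 - p) / n)


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error (normal approximation)."""
    return z * standard_error(p, n)


for n in (24, 240, 2400):
    moe = margin_of_error(0.75, n)
    print(f"n={n:5d}: 75% +/- {100 * moe:.1f} percentage points")
```

With 24 respondents the margin is roughly 17 percentage points either way, so the survey is compatible with anything from under 60% to over 90%. A hundred times more respondents shrinks the margin tenfold.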
  • 54. Biased Sampling • This involves over- or under-polling a non-representative group. • A survey reveals that “81% of bank customers would use mobile banking if it were available…” • Meaningless if the survey only polled people on their mobile devices.
  • 55. Random Sample Selection • Random… means random! • You cannot just select 1000 people from one city: the sample won’t represent the whole country. • You cannot just send FB messages to 1000 random people: you will get a representation of FB users, and of course not all of the country’s citizens use FB. • So… constructing a random sample is actually hard!
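The difference between a random and a biased sample can be sketched with a toy population. The numbers below are invented purely for illustration: 20,000 “city” residents who each attend 6 games a year and 80,000 people elsewhere who attend 1. A sample drawn from the whole population lands near the true average, while a sample of the same size drawn only from the city does not, no matter how many people are asked.

```python
import random
import statistics

# Toy population (made-up numbers): games attended per person per year.
CITY, ELSEWHERE = 20_000, 80_000
population = [6] * CITY + [1] * ELSEWHERE
true_mean = statistics.mean(population)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simple random sample: every person is equally likely to be picked.
random_sample = rng.sample(population, 1000)
# Biased sample: same size, but only city residents can be picked.
biased_sample = rng.sample(population[:CITY], 1000)

random_estimate = statistics.mean(random_sample)
biased_estimate = statistics.mean(biased_sample)

print(f"true mean:       {true_mean}")
print(f"random sample:   {random_estimate}")
print(f"city-only sample: {biased_estimate}")
```

Increasing the size of the city-only sample does not help: the error comes from who could be selected, not from how many were.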
  • 56. Poorly Chosen Lying with Statistics. Using the mean with values across non-uniform populations. What is the starting salary at a company?
  • 57. Poorly Chosen Lying with Statistics. Using the median to hide skewed data. “Invest with me. My portfolio’s median profit is 8%.” Chart labels: median, mean.
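Both tricks on the last two slides come down to picking whichever average flatters the story. The sketch below uses invented figures (nine starting salaries of 30,000, one of 300,000, and five hypothetical portfolio returns) to show the mean and the median pulling in opposite directions.

```python
import statistics

# Hypothetical starting salaries: nine junior hires and one executive hire.
salaries = [30_000] * 9 + [300_000]
mean_salary = statistics.mean(salaries)      # pulled up by the single outlier
median_salary = statistics.median(salaries)  # what a typical hire actually gets

# Hypothetical yearly returns (%) of five investments in one portfolio.
returns = [8.0, 8.0, 8.0, 9.0, -60.0]
median_return = statistics.median(returns)   # hides the one large loss
mean_return = statistics.mean(returns)       # reveals that the portfolio lost money

print(f"salary: mean={mean_salary}, median={median_salary}")
print(f"return: median={median_return}%, mean={mean_return}%")
```

Quoting the mean salary overstates what a new hire earns, and quoting the median return hides the loss. Reporting both, or showing the distribution, removes the room for either trick.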
  • 58. Poorly Chosen Lying with Statistics. A survey is only as accurate as its standard error.
  • 59. The Semi-Attached Persona • Stating one thing as proof of something else. • For example, if an ad says “15% of CEOs drive a BMW; more than any other brand”, what does that prove? • The implication is that CEOs are some sort of authorities on cars, or it could be the other way around: BMWs “make” CEOs.
  • 60. The Lie Factor • The size of the graphic effect should be directly proportional to the numerical quantities: Lie Factor = (size of effect shown in graphic) / (size of effect in data). [Edward Tufte: Principles of Graphical Integrity]
  • 61. Proportional Data -> Proportional Viz. …d bar chart? Rule: Use a channel proportional to the data!
  • 62. Lying through Graphics: Scale Distortions. Lie Factor - Graphical Integrity: magnitude in data must correspond to magnitude of mark. Example [Flowing Data], 35% vs 39.6%: Effect in Data: factor 1.14. Effect in Graphic: factor 5. Lie Factor: 5/1.14 = 4.38.
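The arithmetic on this slide is a single division, sketched here for completeness. The function name is illustrative, and the sketch follows the slide's convention of expressing each effect as a growth factor, using the slide's own figures of 5 for the graphic and 1.14 for the data.

```python
def lie_factor(effect_in_graphic, effect_in_data):
    """Ratio of the effect drawn in the graphic to the effect present in the data.

    A value close to 1 means the graphic is proportional to the data;
    values well above 1 mean the graphic exaggerates the change.
    """
    if effect_in_data == 0:
        raise ValueError("effect in data must be non-zero")
    return effect_in_graphic / effect_in_data


# Figures quoted on the slide: the bar grows 5x while the data grows 1.14x.
slide_example = lie_factor(5, 1.14)
print(f"lie factor: {slide_example:.3f}")

# A proportional chart would draw the same 1.14x change at 1.14x the size.
honest_example = lie_factor(1.14, 1.14)
print(f"proportional chart: {honest_example:.3f}")
```

The slide's 4.38 is this quotient with the digits after the second decimal place dropped.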
  • 63. Propaganda Gone Really Really Bad?
  • 64. Conceptualizing Scale via Comparison. To truly understand scale, a comparison must be made. This is a good visualization because we have the UK as a reference.
  • 65. Diverging Value-Scale. Who won the election? Election maps carry significant bias.
  • 66. US Election 2016: Displaying Polls • Donald Trump’s campaign used the actual US map to present poll results. • Influencing swing voters by feeding “your” news. “…a lot of red folks… we’re winning…” The “reality”.
  • 67. 2016 Spain Elections
  • 68. (Some) tools of the trade. Going beyond Microsoft Excel…
  • 71. kepler.gl
  • 72. Instead of a Conclusion… Data can be the source of a story, or it can be the tool with which the story is told, or it can be both. Like any source, it should be treated with skepticism; and like any tool, we should be conscious of how it can shape the stories that are created with it.
  • 73. Telling a Story – or Even Propaganda – Through Data Visualization. Questions? Demetris Trihinas, Department of Computer Science, ailab @ University of Nicosia, trihinas.d@unic.ac.cy