Post-academic course Big Data
Joris Klerkx
Research Manager, PhD.
joris.klerkx@cs.kuleuven.be
Visualisation
Big Data
IVPV - Instituut voor Permanente Vorming
28-05-2015
1
Augment group - HCI research lab
Dept. Computerwetenschappen
KU Leuven
https://augmenthuman.wordpress.com
2
Erik Duval
11/9/1965 – 12/3/2016
3
Our mission
“To augment the human intellect” (Engelbart, 1962)
4
By ‘augmenting human intellect’ we mean increasing the capability of a man
to approach a complex problem situation, to gain comprehension to suit his
particular needs, and to derive solutions to problems.
Design, build and evaluate relevant tools and
technologies that help users to become better in
their daily life & work (Duval, 2015)
Our mission
5
What are relevant user
actions?
How can we capture signals?
How can we store them?
How can we create a
meaningful feedback loop?
Our Research
Physiological, behavioural signals
Sensors, (self-)trackers
Information visualization
Scalable infrastructure
6
Application Domains
Technology-Enhanced Learning
Media
Consumption
Science 2.0
(e)Health
7
Slides will be posted to Slideshare & Zephyr
8
http://www.hearts.com/ecolife/cut-paper-consumption-protect-forests/
9
Big Data
10
Big data
11
Big data
insights
12
Better Human
Understanding
13
A mental model represents what a person
thinks is true… but isn’t necessarily true
14
UNDERSTANDING OF THEIR MENTAL MODELS
15
Wouter Walgraeve - http://www.slideshare.net/wouterwalgraeve/mental-models-as-information-radiators
16
17
18
?
19
"The idea that business is strictly a numbers affair has always struck me as preposterous.
For one thing, I’ve never been particularly good at numbers, but I think I’ve done a
reasonable job with feelings. And I’m convinced that it is feelings — and feelings alone —
that account for the success of the Virgin brand in all of its myriad forms." -- Richard Branson
20
Gut feeling
21
What your gut
feeling says
What the
facts say
22
What your gut
feeling says
What the
facts say
Confirmation bias
Undervalued Overvalued Foolish
23
Big data
insights
data-driven insights
24
25
Big data
insights
data-driven insights
Meaningful
26
Defining visualization
27
Definition
28
Information Visualization is the use
of interactive visual representations
to amplify cognition [Card et al.]
algorithm
<>
human
29
30
http://www.demorgen.be/dm/nl/5403/Internet/article/detail/1890428/2014/05/18/Twitteractiviteit-verraadt-je-politieke-profiel.dhtml
31
Facilitate human interaction
for exploration with and
understanding of big data
32
Data visualization
Slide source: John Stasko
Scientific
visualization
Information
visualization
33
Scientific visualisation
Specifically concerned with data that has a well-defined representation in 2D or 3D space (e.g., from
simulation mesh or scanner).
Slide source: Robert Putman
34
Information Visualisation
Concerned with data that does not have a well-defined
representation in 2D or 3D space (i.e., “abstract data”)
35
Dispersion (Backstrom & Kleinberg)
36
The role of visualisation
37
Big data
insights
data-driven insights
Meaningful
38
By Longlivetheux - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=37705247
39
https://medium.com/@angelamorelli/3-powerful-lessons-i-have-learnt-as-an-information-designer-cb028940254#.mkgb0h2cc
40
The Role of visualisation
Brehmer, M.; Munzner, T., "A Multi-Level Typology of Abstract Visualization Tasks," IEEE Transactions on
Visualization and Computer Graphics, vol. 19, no. 12, pp. 2376-2385, Dec. 2013
41
Explore
Data insights: a visualization (Gregor Aisch)
42
http://www.visual-analytics.eu/faq
Also: Visual Analytics
43
Visualizing Big Data
44
Multiple data sources with varied data types
“Diverse” data
I talk GeoJSON
I talk custom XML
I talk Apache logs
45
millions of records
“Tall” data
46
http://dataclysm.org
Example: 51 million ratings
47
Example: 51 million ratings
48
http://dataclysm.org
Example: 51 million ratings
49
http://dataclysm.org
Example: 51 million ratings
50
http://dataclysm.org
51
Cluttered displays
Heer, J. & Kandel, S. (2012), Interactive Analysis of Big Data, XRDS, 19 (1)
52
Cluttered displays
Binned density scatterplot
Hexagonal instead of rectangular
Heer, J. & Kandel, S. (2012), Interactive Analysis of Big Data, XRDS, 19 (1)
53
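The binned density idea can be sketched in a few lines. This is a minimal illustration, not the method used by Heer & Kandel: it assumes pointy-top hexagons addressed by axial (q, r) coordinates and uses the common cube-rounding trick to find the hexagon a point falls in. The function names, the bin size and the synthetic Gaussian data are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def hex_bin(x, y, size):
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y)."""
    # Convert the point to fractional axial coordinates.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round in cube coordinates (q + r + s == 0), then repair the
    # component with the largest rounding error so the sum stays zero.
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hexbin_counts(points, size):
    """Aggregate points into per-hexagon counts (the values a density plot colours)."""
    return Counter(hex_bin(x, y, size) for x, y in points)


# Synthetic "tall" data: 100,000 points drawn from a 2D Gaussian (illustrative only).
random.seed(42)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
counts = hexbin_counts(points, size=0.25)

print(len(points), "points reduced to", len(counts), "hexagonal bins")
densest_bin, n = counts.most_common(1)[0]
print("densest bin", densest_bin, "holds", n, "points")
```

Drawing one coloured hexagon per bin instead of one dot per record keeps the display readable no matter how many rows the data set has.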
Multi-variate data with 100s to 1000s of variables
“Wide” data
54
http://www.perceptualedge.com/blog/?p=2046
In this day of so-called Big Data,
organizations are scrambling to
implement new software and
hardware to increase the amount of
data that they collect and store.
In so doing they are unwittingly
making it harder to find the needles of
useful information in the rapidly
growing mounds of hay.
If you don’t know how to
differentiate signals from noise,
adding more noise only makes
matters worse.
55
Avoid the All-You-Can-Eat buffet! (Ben Fry)
56
Visualizations might help reveal multidimensional patterns
Use the power of the machine to find a proxy in the data that
predicts the selected variables
Depending on their specific questions, domain experts might
select a subset of variables they are interested in
57
Example: 4 million messages/day on OKCupid
http://dataclysm.org
58
Each dot at 90% transparency
http://dataclysm.org
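Why transparency helps with overplotting can be worked out directly. The sketch below assumes standard "over" alpha compositing, which is an assumption about the rendering and not something stated on the slide: a dot at 90% transparency has an opacity of 0.1, and n dots stacked on the same pixel give a combined opacity of 1 - (1 - 0.1)^n.

```python
import math


def stacked_opacity(alpha, n):
    """Combined opacity of n overlapping dots, each with opacity alpha."""
    return 1 - (1 - alpha) ** n


def dots_needed(alpha, target):
    """Smallest number of overlapping dots whose combined opacity reaches target."""
    return math.ceil(math.log(1 - target) / math.log(1 - alpha))


alpha = 0.1  # 90% transparency
for n in (1, 5, 10, 20, 50):
    print(f"{n:>2} overlapping dots -> opacity {stacked_opacity(alpha, n):.2f}")
print("dots needed for 90% opacity:", dots_needed(alpha, 0.9))
```

Sparse regions stay faint while dense regions saturate, so the darkness of an area becomes a rough density estimate.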
59
http://dataclysm.org
60
http://dataclysm.org
61
http://dataclysm.org
62
Multiple views on the data allow exploration of patterns
63
The strength of visualization
64
Anscombe's quartet
http://en.wikipedia.org/wiki/Anscombe's_quartet
Enables discovery of visual patterns in data sets
Graphics reveal data (Tufte, 2001)
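The point of the quartet can be checked numerically. The sketch below uses the published Anscombe (1973) values and only the Python standard library; the helper name pearson is an assumption for the example. All four data sets share nearly identical summary statistics, yet look completely different once plotted.

```python
from statistics import mean

# Anscombe's quartet: sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

QUARTET = {"I": (X123, Y1), "II": (X123, Y2), "III": (X123, Y3), "IV": (X4, Y4)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


summary = {name: (mean(xs), mean(ys), pearson(xs, ys)) for name, (xs, ys) in QUARTET.items()}
for name, (mx, my, r) in summary.items():
    print(f"set {name}: mean x = {mx:.2f}, mean y = {my:.2f}, r = {r:.3f}")
```

The numbers alone cannot tell the sets apart; a scatterplot of each set shows a linear trend, a curve, an outlier on a line, and a single leverage point.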
65
World Population Growth
A tremendous change occurred with the industrial revolution: whereas it had taken all of human history until
around 1800 for world population to reach one billion, the second billion was achieved in only 130 years
(1930), the third billion in less than 30 years (1959), the fourth billion in 15 years (1974), and the fifth billion in
only 13 years (1987). During the 20th century alone, the population in the world has grown from 1.65
billion to 6 billion.
Seeing is understanding
66
Facilitates understanding
http://www.bbc.co.uk/news/world-15391515
67
Facilitates human interaction for exploration and understanding
http://www.bbc.co.uk/news/world-15391515
68
http://www.informationisbeautiful.net/visualizations/how-many-gigatons-of-co2/
Tells stories
69
T. Nagel, M. Maitan, E. Duval, A. Vande Moere, J. Klerkx, K. Kloeckl, and C. Ratti. Touching transport - a case study on visualizing metropolitan public
transit on interactive tabletops. In AVI2014: 12th ACM International Working Conference on Advanced Visual Interfaces, pages 281–288, 2014.
http://www.youtube.com/watch?v=wQpTM7ASc-w
Facilitates human interaction for exploration and understanding
70
Will there be enough food?
http://www.footprintnetwork.org/en/index.php/gfn/page/earth_overshoot_day/
Communicates insights easily
71
Triggers Impact
http://terror.periscopic.com
Shows patterns & triggers questions
72
Interactivity allows comparison
73
http://blog.stephenwolfram.com/2012/03/the-personal-analytics-of-my-life/
Shows trends & anomalies in the data, therefore triggers questions
74
Helps to find stories, see trends
Belgium
Brazil
USA
India
75
Sentiment analysis in enterprise social network (slack)
Shows patterns
76
http://deredactie.be/cm/vrtnieuws/grafiek/interactief/1.2248561
77
Reader Client
Tracking Service
WebSockets
Database
engagement data mouse data
10,065 sessions were tracked
9,674 sessions were used
in the analysis
391 sessions were removed
from the analysis (noise)
78
Visualizing Reader Activity
Each square is a ‘slide’
Each row represents a
navigation pattern
through the slides
Column 1 shows the absolute
number of readers
Column 2 shows the
percentage of readers
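A table like the one described above could be derived from tracked sessions roughly as follows. This is only a sketch: the session data is invented, and representing a session as the ordered list of slide numbers a reader visited is an assumption, not the format used by the actual tracking service.

```python
from collections import Counter

# Hypothetical sessions: the ordered slide numbers each reader visited.
sessions = [
    [1, 2, 3, 4, 1],
    [1, 2],
    [1, 2, 3, 4, 1],
    [1],
    [1, 2],
    [1, 2],
]


def pattern_table(sessions):
    """One row per navigation pattern: (pattern, absolute count, percentage of readers)."""
    counts = Counter(tuple(s) for s in sessions)
    total = len(sessions)
    return [(pattern, n, 100 * n / total) for pattern, n in counts.most_common()]


table = pattern_table(sessions)
for pattern, n, pct in table:
    print(" > ".join(map(str, pattern)), f"| {n} readers | {pct:.1f}%")
```

Sorting the rows by frequency puts the dominant reading behaviour at the top, which is where the eye lands first.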
79
262 readers (2.7%) go through all the slides, then
quickly return to the first slide to look at it
once more.
Reader time per slide
Readers spend about 75 seconds (avg) on the first slide,
studying what information is available.
80
Shows patterns
Sentiment analysis in enterprise social network (slack)
Triggers questions & creates awareness
Disclaimer: Should we trust NLP-algorithms?
81
Empowers users to make informed decisions
Positive Badges
Negative Badges
82
Show errors in the data
http://woutervds.github.io/InfoVisPostgraduwhat/
83
Show errors in the data
84


Khaled Bachour, Frederic Kaplan, Pierre Dillenbourg, "An Interactive Table for Supporting Participation Balance in Face-to-Face
Collaborative Learning," IEEE Transactions on Learning Technologies, vol. 3, no. 3, pp. 203-213, July-September, 2010
Creates awareness
85
http://infosthetics.com/
http://visualizing.org
http://www.visualcomplexity.com/vc/
http://visual.ly/
http://flowingdata.com
http://www.infovis-wiki.net
86
Visualizing (big) data
Guidelines & Facts
88
How many circles?
89
Humans have advanced perceptual abilities
Our brains make us extremely good at recognizing visual patterns
90
91
Humans have little short-term memory
Our brains remember relatively little of what we perceive.
Most of us can hold only three to seven chunks of data at the same time.
92
Recognition
Identify previously learned information
93
Humans have advanced perceptual abilities
Humans have little short-term memory
Our brains make us extremely good at recognizing visual patterns
Our brains remember relatively little of what we perceive
Externalize data by using interactive, visual encodings
Promote recognition rather than recall
94
https://www.youtube.com/watch?v=og7bzN0DhpI (9:51 - 11:22)
95
96
“The centrality of human
activity in the process is key”
97
Explore
Data insights: a visualization (Gregor Aisch)
98
“It’s not a magical algorithm
that finds the insight for you”
“You have to look at the overview,
you have to decide what you zoom
in to, what you filter out. And then
you click to get the details”
Ben Shneiderman, 2011
99
http://www.bbc.com/future/bespoke/20140724-flight-risk/
Overview first, zoom & filter, details-on-demand
100
Overview first, zoom & filter, details-on-demand
http://www.student.kuleuven.be/~r0580868/
101
https://postgraduwhatblog.wordpress.com/2016/02/13/infovis-van-de-week-1-wouter/
Overview first, zoom & filter, details-on-demand
102
Visual Information Seeking Mantra
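The three steps of the mantra map naturally onto three small operations on a data set. The sketch below is only an illustration under assumed names: the records, their fields and the three function names are hypothetical and do not come from any of the linked examples.

```python
from collections import Counter

# Hypothetical records; field names are illustrative only.
FLIGHTS = [
    {"id": "F1", "region": "Europe", "year": 2012, "delay_min": 12},
    {"id": "F2", "region": "Europe", "year": 2013, "delay_min": 45},
    {"id": "F3", "region": "Asia", "year": 2013, "delay_min": 5},
    {"id": "F4", "region": "Asia", "year": 2013, "delay_min": 90},
    {"id": "F5", "region": "Americas", "year": 2012, "delay_min": 0},
]


def overview(records, key):
    """Overview first: aggregate everything into counts per category."""
    return Counter(r[key] for r in records)


def zoom_and_filter(records, **criteria):
    """Zoom & filter: keep only the records matching every criterion."""
    return [r for r in records if all(r[k] == v for k, v in criteria.items())]


def details(records, record_id):
    """Details-on-demand: return the full record the user clicked on, if any."""
    return next((r for r in records if r["id"] == record_id), None)


print(overview(FLIGHTS, "region"))
subset = zoom_and_filter(FLIGHTS, region="Asia", year=2013)
print([r["id"] for r in subset])
print(details(subset, "F4"))
```

In an interactive visualisation the human chooses the arguments at each step; the code only answers the question that was asked.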
103
Real data is ugly and needs to be cleaned
http://hcil2.cs.umd.edu/trs/2011-34/2011-34.pdf
http://www.netmagazine.com/features/seven-dirty-secrets-data-visualisation
https://code.google.com/p/google-refine/
http://vis.stanford.edu/wrangler/
Pre-process your data
104
http://nieuws.vtm.be/verkiezingen/gemeente?province=P1&city=G73
Always check & pre-process your data
105
Elections
14/10/12
Forget about 3D graphs (on a 2D screen)
Occlusion
Complex to interact with
Doesn’t add anything to the data
106
Source: Stephen Few
What if we need to add a 3rd variable?
107
Use small coordinated graphs to add variables
108
Forget about 3D graphs
Source: Stephen Few
Which student has more blogposts?
• Size & angle are difficult to compare
• Without labels & legends, it is impossible to show exact quantitative differences
• Limited short-term (visual) memory
109
Source: Stephen Few
Save the pies for dessert (S. Few)
Try using either of the pies to put the slices in order by size
110
deredactie.be
demorgen.be
vtm.be
Elections
14/10/12
111
Obviously there are exceptions to the rule
112
http://themetapicture.com/the-sunny-side-of-the-pyramid/
Use Common Sense
[Figure: three bar charts of Student 1's activity counts (blogposts, tweets, comments on blogs, reports submitted) on a 0 to 30 scale, in vertical and horizontal layouts with different category orders]
113
[Figure: two bar charts for Students 1 to 4 (blogposts, tweets, comments on blogs, reports submitted), one with absolute counts on a 0 to 60 scale and one with shares on a 0% to 100% scale]
Use Common Sense
What are you comparing?
What story do you get from it?
114
Which graph makes it easier to focus on the pattern of change
through time, instead of the individual values?
Choose the graph that answers your questions about your data
115
Source: Stephen Few
vtm.be
deredactie.be
nieuwsblad.be
Elections
14/10/12
Communicate the correct story
116
Don’t use visualisations to mislead
117
Don’t use visualisations to mislead
118
Source: Stephen Few
119
Source: Stephen Few
120
121
http://fellinlovewithdata.com/research/deceptive-visualizations
122
http://fellinlovewithdata.com/research/deceptive-visualizations
123
How much better are the drinking water conditions in Willowtown as
compared to Silvatown?
124
http://fellinlovewithdata.com/research/deceptive-visualizations
Storytelling with visualisation
125
Visualization tasks
Brehmer, M.; Munzner, T., "A Multi-Level Typology of Abstract Visualization Tasks," IEEE Transactions on
Visualization and Computer Graphics, vol. 19, no. 12, pp. 2376-2385, Dec. 2013
126
http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html
127
Human Perception
128
Our brains make us extremely good at recognizing visual patterns
Source: Katrien Verbert
129
Source: Katrien Verbert
130
A limited set of visual properties that are detected by the low-level visual system
- very rapidly (< 200 to 250 ms),
- accurately,
- with little effort,
- before focused attention.
Healey, C., & Enns, J. (2012). Attention and Visual Memory in Visualization and Computer Graphics. IEEE Transactions on Visualization
and Computer Graphics, 18 (7), 1170-1188.
Pre-attentive characteristics
Note that eye movements take at least 200 ms to initiate.
131
Pre-attentive characteristics
Find the red dot: hue
Find the dot: shape
Find the red dot: conjunction, not pre-attentive
http://www.csc.ncsu.edu/faculty/healey/PP/
helps to spot differences in multi-element display
132
Pre-attentive characteristics
Line orientation; length, width; closure; size;
curvature; density, contrast; intersection; 3D depth
Not all of them allow showing exact quantitative differences
Helps to spot differences in multi-element display
133
http://www.csc.ncsu.edu/faculty/healey/PP/
http://www.slideshare.net/chelsc/gestalt-laws-and-design-presentation
http://artspilesenglish.blogspot.be/2011/11/gestalt-theory-exercise-for-3rdlevel.html
134
Gestalt Laws (“Pattern” laws)
Basic rules or design principles that describe perceptual phenomena.
Explain the way users or humans see patterns in visualisations.
Figure & Ground
135
136
Closure
Smallness
137
Source: Katrien Verbert
Common Fate

Objects that move together, in the same direction, at the same
pace and at the same time, are organised as a
group (Ehrenstein, 2004).
138
Law of Isomorphism
Similarity that can be behavioural or perceptual, and can
be a response based on the viewer's previous experiences
(Luchins & Luchins, 1999; Chang, 2002). This law is the basis
for symbolism (Schamber, 1986).
139
London Tube Map
Which Gestalt laws do you see?
140
Visualization design process
141
B. McDonnel and N. Elmqvist. Towards utilizing GPUs in information visualization: A model and implementation of
image-space operations. IEEE Transactions on Visualization and Computer Graphics, 15(6):1105–1112, 2009.
http://www.infovis-wiki.net/index.php/Visualization_Pipeline
142
143
Data
- structure
time, hierarchy, network, 1D, 2D, nD, …
- questions
where, when, how often, …
- audience
domain & visualisation expertise, …
144
S. Stevens. On the theory of scales of measurement. Science, 103(2684), 1946.
Structure
Time? hierarchical? 1D? 2D? nD? network? …
145
Questions (to get things going)
What is the average number of students who bought the course book?
What? When? How much? How often?
When did students start looking at the course material?
How many hours did Peter work on this assignment?
(Why did Peter have to redo his assignment?)
How often did Peter retake the course before he passed?
(why?)
146
147
Visual mapping
Encode data characteristics into visual form
Each mark (point, line, area,…) represents a data element
Think about relationships between elements (position)
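Visual mapping can be made concrete with a small sketch. The example assumes a simple linear scale and a scatterplot-style encoding; the student records, field names and pixel ranges are hypothetical and chosen only to show how each data element becomes one mark whose position encodes its values.

```python
def linear_scale(domain, output_range):
    """Return a function mapping values from the data domain to the visual range."""
    (d0, d1), (r0, r1) = domain, output_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


# Hypothetical data elements.
students = [
    {"name": "Student 1", "blogposts": 10, "tweets": 25},
    {"name": "Student 2", "blogposts": 30, "tweets": 5},
]

x = linear_scale((0, 30), (0, 300))  # blogposts -> horizontal position in pixels
y = linear_scale((0, 30), (200, 0))  # tweets -> vertical position (screen y grows downward)

# One point mark per data element.
marks = [
    {"type": "point", "x": x(s["blogposts"]), "y": y(s["tweets"]), "label": s["name"]}
    for s in students
]
for mark in marks:
    print(mark)
```

Position is used for both variables here because it is the channel readers compare most accurately; size or colour would be added only if a third variable had to be shown.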
“Simplicity is the ultimate sophistication.”
Leonardo da Vinci
Size
http://www.informationisbeautiful.net/2009/visualising-the-guardian-datablog/
148
× 4
How much bigger is the lower bar?
Slide adapted from Michael Porath & Katrien Verbert
Length
149
× 5
How much bigger is the right circle?
Slide adapted from Michael Porath & Katrien Verbert
Area
150
× 9
How much bigger is the right circle?
151
Apparent magnitude curves
http://makingmaps.net/2007/08/28/perceptual-scaling-of-map-symbols
Slide adapted from Michael Porath
152
Which one looks more accurate?
Slide adapted from Michael Porath
153
Compensating magnitude to match perception
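The difference between mathematical and perceptual scaling of circles can be sketched as follows. With an exponent of 0.5 the circle's area is proportional to the value. The exponent 0.5716 is the compensation commonly attributed to Flannery in the cartography literature and is used here as an assumption, not as a value taken from the slides; the function name is also hypothetical.

```python
import math

AREA_EXPONENT = 0.5         # area proportional to the value
FLANNERY_EXPONENT = 0.5716  # assumption: commonly cited perceptual compensation


def radius_for_value(value, ref_value, ref_radius, exponent=AREA_EXPONENT):
    """Radius of a circle encoding value, relative to a reference circle."""
    return ref_radius * (value / ref_value) ** exponent


ref_radius = 10.0
for value in (1, 4, 9):
    r_area = radius_for_value(value, 1, ref_radius)
    r_perc = radius_for_value(value, 1, ref_radius, FLANNERY_EXPONENT)
    print(f"value {value}: area-true radius {r_area:.1f}, compensated radius {r_perc:.1f}")

# Area ratio of the circles for values 9 and 1 under area-true scaling.
area_ratio = (radius_for_value(9, 1, ref_radius) / ref_radius) ** 2
print("area ratio for a 9x value:", round(area_ratio, 6))
```

Compensated circles are drawn larger than their values strictly warrant, because viewers tend to underestimate differences in area.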
Color
Color Principles - Hue, Saturation, and Value
https://www.youtube.com/watch?v=l8_fZPHasdo
154
Use at most about 5 colors for categories (short-term memory is limited)
http://en.wikipedia.org/wiki/HSL_and_HSV
• hue: categorical
• saturation: ordinal and quantitative
• luminance/brightness: ordinal and quantitative
How to choose colors
Source: Katrien Verbert
155
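These guidelines can be sketched with Python's standard colorsys module: hue varies for categories, lightness varies for ordered or quantitative data. The function names, the default saturation and lightness values, and the hard limit of five categories are assumptions made for illustration; a tested palette such as those from ColorBrewer is usually the better choice in practice.

```python
import colorsys

MAX_CATEGORIES = 5  # assumption: follows the "about 5 colors" guideline


def categorical_palette(n, lightness=0.5, saturation=0.65):
    """Evenly spaced hues at constant lightness and saturation, for categories."""
    if not 1 <= n <= MAX_CATEGORIES:
        raise ValueError(f"use between 1 and {MAX_CATEGORIES} categorical colors")
    return [colorsys.hls_to_rgb(i / n, lightness, saturation) for i in range(n)]


def sequential_palette(n, hue=0.6, saturation=0.65, lightest=0.9, darkest=0.3):
    """A single hue going from light to dark, for ordinal or quantitative data."""
    if n < 2:
        raise ValueError("a sequential palette needs at least 2 steps")
    steps = [lightest + (darkest - lightest) * i / (n - 1) for i in range(n)]
    return [colorsys.hls_to_rgb(hue, step, saturation) for step in steps]


def to_hex(rgb):
    """Convert an (r, g, b) tuple with components in 0..1 to a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))


print("categorical:", [to_hex(c) for c in categorical_palette(4)])
print("sequential: ", [to_hex(c) for c in sequential_palette(5)])
```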
http://colorbrewer2.org
156
157
https://eagereyes.org/basics/rainbow-color-map
158
http://gizmodo.com/why-a-white-cup-makes-your-coffee-taste-more-intense-1663691154
intensity, sweetness, aroma, bitterness, and quality
159
How to choose colors
Position
160
Position & color
http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/
161
J. Mackinlay. Automating the design of graphical presentations of relational information. ACM Transactions On Graphics, 5(2):110–141, 1986.
162
163
164
Offer precise controls for sharing on the Internet...
Users should navigate through 50 settings with more than 170 options
Example
Facebook privacy statement
Questions?
How did its complexity change over time?
How does its length compare to privacy statements
of other tools?
165
How did its complexity change over time?
http://www.nytimes.com/interactive/2010/05/12/business/facebook-privacy.html
166
How does its length compare to privacy statements
of other tools?
http://www.nytimes.com/interactive/2010/05/12/business/facebook-privacy.html
167
Example:
Encoding weather forecast on a smartphone
168
?
Joris Klerkx
Research Manager, PhD.
joris.klerkx@cs.kuleuven.be
@jkofmsk
https://augmenthuman.wordpress.com
169
Always on the lookout for new opportunities…

Visualisation - Module 3 - Big Data