5. Graphical POV
Lie factor = (Size of effect shown in graphic) / (Size of effect in data)
= 1 : Truth
≠ 1 : Lie
From: http://www.infovis-wiki.net/index.php?title=Lie_Factor
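A minimal sketch of the Lie factor computation. It assumes the usual definition of "size of effect" as the relative change between two values, |second − first| / |first|. The function names and the example numbers are hypothetical, chosen only for illustration.

```python
def effect_size(first: float, second: float) -> float:
    """Relative change from the first value to the second (assumed definition)."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(graphic_first: float, graphic_second: float,
               data_first: float, data_second: float) -> float:
    """Ratio of the effect shown in the graphic to the effect in the data."""
    data_effect = effect_size(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data show no effect, so the ratio is undefined")
    return effect_size(graphic_first, graphic_second) / data_effect


# Hypothetical chart: the data grow from 10 to 15 (+50%),
# but the bar is drawn growing from 2 cm to 6 cm (+200%).
print(lie_factor(2, 6, 10, 15))
```

A result of 1 means the graphic is faithful to the data; the further the value is from 1, the stronger the distortion.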
Eurostat
8. Data visualization and Big Data
Implementing effective data visualization
solutions for Big Data has to take into account,
apart from the volume of the data, other intrinsic
constraints generated by the typical
characteristics of Big Data:
• real-time changes
• extreme variety of the sources
• different levels of data structuring
Moreover, it is advisable to use several
visualization techniques simultaneously to
better illustrate relationships within a large
amount of data.
9. When do Data become Big?
Extreme-scale
Size
Inclusion of visual and analytical
Active involvement of a human
Data in many forms
Structured, unstructured,
text, multimedia
Data in motion
Analysis of streaming data
to enable decisions within
fractions of a second
Data at scale
Petabyte (10^15 bytes) to
Exabyte (10^18 bytes)
Complex Information Spaces
(a) the data items being difficult to
compare based on raw data,
(b) data composed of several base data
types
Three critical elements in
applying visual analytics to
extreme-scale data and
complex Information Spaces
10. Complexity and flatness
“The world is complex, dynamic,
multidimensional; the paper is static,
flat.
How are we to represent the rich
visual world of experience and
measurement on mere flatland?”
E. Tufte
11. Big Data building blocks
Generic process model,
Big data analytics
processes based on
building blocks [Chau]
Collection
Cleaning
Integration
Visualization
Analysis
Presentation
Dissemination
Some building blocks can be
skipped, depending on the
operating context, and going
back to an earlier block
(two-way street) is allowed
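A minimal sketch of the building-block idea, assuming each block is a plain function and a "plan" lists which blocks to run. Blocks left out of the plan are skipped, and a block may appear more than once to model going back. The block order follows the list on this slide; the step functions and the sample data are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Block names in the order listed on the slide.
BLOCKS = ["Collection", "Cleaning", "Integration", "Visualization",
          "Analysis", "Presentation", "Dissemination"]


def run_blocks(data: Any, steps: Dict[str, Callable[[Any], Any]],
               plan: List[str]) -> Tuple[Any, List[str]]:
    """Run the blocks named in `plan`, in order; return the result and a trace."""
    trace = []
    for name in plan:
        if name not in BLOCKS:
            raise ValueError(f"unknown building block: {name}")
        data = steps[name](data)
        trace.append(name)
    return data, trace


# Hypothetical steps for illustration only.
steps = {
    "Collection": lambda _: [" 3", "7 ", "", "oops", "5"],
    "Cleaning": lambda rows: [r.strip() for r in rows if r.strip().isdigit()],
    "Visualization": lambda rows: rows,  # placeholder: inspect, return unchanged
    "Analysis": lambda rows: sum(int(r) for r in rows),
}

# Integration is skipped; Cleaning is revisited after a look at the data.
plan = ["Collection", "Cleaning", "Visualization", "Cleaning", "Analysis"]
result, trace = run_blocks(None, steps, plan)
print(result, trace)
```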
12. Role of data visualization in Big Data Life Cycle
• Data visualization can play a
specific role in several
phases of the Big Data Life
Cycle
• Data types can affect
visualization design
• Visualization methods can
inform data cleaning and
the choice of analysis
algorithms
Along the Big Data life cycle, visualization methods can be properly
incorporated in three phases:
• Pre-processing, staging, handling
• Exploratory data analysis
• Presentation of analytical results
13. Three Styles of Big Data Visualization
Remco Chang – Fields Institute 15
Emphasis on… | Methodology | Author
Data reduction | Big Data → (filtering) → Medium Data → (filtering) → Small Data + R | Wickham
Visual interaction | New representation pattern + User Interaction (StarGlyphs + Parallel coordinates; Interaction) | Carpendale
HCP | Divide and conquer + Parallel Computation | Bowei Xi
14. Visualizing Big Data in Official Statistics
Although there are already many experiences and success
stories in applying data visualization technologies to Big Data,
the most interesting proposals are aimed at future challenges.
The main issues concern how to combine some basic
opportunities, such as:
• Automated analysis tools
• Interactive visual methods
• Traditional visual analytics approaches
• Presentation tools
• New advanced data visualization technologies
• Analytic platforms
15. Automated analysis and interactive visual methods
In order to support the entire life cycle of Big Data, a good visual
analytics system has to combine the advantages of automatic
analysis with interactive techniques for exploring data.
Behind this desired technical feature lies the deeper aim of
integrating the analytic capability of a computer with the
analytical abilities of a human.
volume, velocity, variety
mapping complex data
into simpler visual
forms of knowledge
Appropriate definition,
in the design and
implementation phases, of
the specific weight and right
balance of the two
components
16. Reorganization of the structure of the visual analytics functionalities
Macro phase | Data processes
Data management | Selection & data loading; Integration; Export
Data handling | Pre-processing, cleaning & transformation; Calculations & querying
Data modelling | Statistics functions (univariate, bivariate and multivariate analysis); Clustering, classification, network modelling, predictive analysis; Data projection (Principal Components, Multidimensional scaling, Self organizing map, Bayesian Network); Pattern recognition & Visual query analysis (both automated and interactive)
Data visualization | Visual interpretation, evaluation, representation
Automated analysis
17. Automated analysis
Automated analysis of Big Data is concerned with the “development of
methods and techniques for making sense of data” [Fayyad]
Extreme characteristics of Big Data:
• Huge
• At low level
Specific data-mining methods for pattern discovery and extraction
map these data into forms that are more abstract, synthetic,
clear and useful:
• Simple reports
• Descriptive approximation or model of the process that
generated the data
• Predictive model for estimating the value of future cases
18. Interactive Visual Analytics techniques with Big Data
Data
preprocessing
through visual
approaches
• Data mining
• Machine learning
• Statistical
methods
Interactive
visualization
Dissemination
tools
• Browse
• search
• monitor
• Show the data
Bring out meaningful:
• patterns
• outliers
• clusters
• gaps
• Discover the most interesting
relationships among data
• Investigate what-if scenarios
• Verify the presence of biases
• Simulate the impact of changes
• Shed light on the meaning of the data
• Tell stories about them
19. Interactive visualization
In the context of Big Data, the following interaction categories can be
adopted as a basis for reasoning [Yi-etal-2007]:
• Select (mark something as interesting)
• Explore (show me something else)
• Reconfigure (show me a different arrangement)
• Encode (show me a different representation)
• Abstract/elaborate (show me more or less detail)
• Filter (show me something conditionally)
• Connect (show me related items)
http://www.cs.tufts.edu/comp/250VA/papers/yi2007toward.pdf
23. Interactive visualization
Select: ability to mark data items of interest in order to highlight them.
Example: outlier values.
Explore: enabling users to examine the different subsets into which the
data can be divided. Example: panning across the data.
Reconfigure: provide users with different data perspectives.
Examples: revelation of hidden patterns; visual rearrangements of a series.
Encode: capability of a visualization system to handle and transform the
basic elements of human vision. Examples: pre-attentive processing,
colours, shapes, dimensions.
Abstract/elaborate: capability to reduce or increase the level of detail
of the visualization.
Filter: highlight the visual elements that comply with specific
conditions defined by users.
Connect: enables users to better emphasize relationships and associations
already known, or to discover hidden patterns in the data.
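Several of these interaction categories can be sketched as small operations on a list of records. This is only an illustration under stated assumptions: the records, field names and function names are hypothetical, and Explore and Encode are left out because they depend on the rendering layer.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records, invented for illustration only.
RECORDS = [
    {"region": "North", "year": 2020, "value": 12.0},
    {"region": "North", "year": 2021, "value": 14.0},
    {"region": "South", "year": 2020, "value": 9.0},
    {"region": "South", "year": 2021, "value": 40.0},
]


def select(records, predicate):
    """Select: positions of the items marked as interesting."""
    return [i for i, r in enumerate(records) if predicate(r)]


def filter_records(records, predicate):
    """Filter: keep only the items that satisfy a user-defined condition."""
    return [r for r in records if predicate(r)]


def reconfigure(records, key):
    """Reconfigure: the same items in a different arrangement."""
    return sorted(records, key=lambda r: r[key])


def abstract(records, by, field="value"):
    """Abstract: less detail, here one mean value per group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[by]].append(r[field])
    return {k: mean(v) for k, v in groups.items()}


def connect(records, item, key):
    """Connect: other items related to `item` through a shared key."""
    return [r for r in records if r is not item and r[key] == item[key]]


print(select(RECORDS, lambda r: r["value"] > 30))
print(abstract(RECORDS, by="region"))
```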
24. Traditional vs. New techniques
Traditional Visual Analytics tools and techniques don’t
properly fit big data.
Computational problems for VA with Big Data
Main causes and their effects:
• Human perception: when the number of visualized
objects becomes large, humans often have difficulty
extracting meaningful information
• Limited screen space: risk of significant visual clutter
when a visualization displays too many data items
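One common way to limit visual clutter is to aggregate points into grid cells before drawing, so that the number of marks depends on the screen resolution rather than on the data volume. The sketch below is an illustration, not a method prescribed by these slides; the function name, the cell size and the random sample are assumptions.

```python
import random
from collections import Counter


def bin_points(points, cell_size):
    """Count how many (x, y) points fall into each square grid cell."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts


# Hypothetical data: 100,000 random points on a 1000 x 1000 canvas.
rng = random.Random(0)
points = [(rng.random() * 1000, rng.random() * 1000) for _ in range(100_000)]

# With 100-unit cells, at most 10 x 10 = 100 marks need to be drawn,
# each one coloured or sized by its count.
cells = bin_points(points, cell_size=100)
print(len(points), "points reduced to", len(cells), "cells")
```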
25. Traditional vs. New techniques
Working with new data sources brings about a number
of analytical challenges
(1) getting the picture right, i.e. summarising
the data
(2) interpreting, or making sense of the data
through inferences
(3) defining and detecting anomalies.
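Challenges (1) and (3) can be illustrated with a five-number summary and a simple outlier rule. The sketch assumes the interquartile-range rule with the conventional factor 1.5, which is one possible definition of an anomaly and not one given in these slides. The readings are hypothetical.

```python
from statistics import median, quantiles


def summarise(values):
    """Five-number summary: a compact picture of the data."""
    q1, _, q3 = quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median(values),
            "q3": q3, "max": max(values)}


def find_anomalies(values, k=1.5):
    """Values further than k * IQR outside the quartiles (assumed rule)."""
    s = summarise(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - k * iqr, s["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95, 12, 10, 13]
print(summarise(readings))
print(find_anomalies(readings))
```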
27.
1. Social Networks (human-sourced information)
1100. Social Networks
1200. Blogs and comments
1300. Personal documents
1400. Pictures: Instagram, Flickr, Picasa
1500. Videos: Youtube etc.
1600. Internet searches
1700. Mobile data content: text messages
1800. User-generated maps
1900. E-Mail
2. Traditional Business systems (process-mediated data)
21. Data produced by Public Agencies
2110. Medical records
22. Data produced by businesses
2210. Commercial transactions
2220. Banking/stock records
2230. E-commerce
2240. Credit cards
3. Internet of Things (machine-generated data)
31. Data from sensors
311. Fixed sensors
3111. Home automation
3112. Weather/pollution sensors
3113. Traffic sensors/webcam
3114. Scientific sensors
3115. Security videos/images
312. Mobile sensors (tracking)
3121. Mobile phone location
3122. Cars
3123. Satellite images
32. Data from computer systems
3210. Logs
3220. Web logs
29. Blogopole
http://blogopole.observatoire-presidentielle.fr/
1200. Blogs and comments
«The Blogopole (a contraction of "political blogosphere") is
the set of sites and blogs by citizens who feed the political
debate in France, that is, politicians, activists and
supporters as well as commentators and analysts»
31. The Bible
1300. Personal documents
«The bar graph that runs
along the bottom
represents all of the
chapters in the Bible. Books
alternate in color between
white and light gray. The
length of each bar denotes
the number of verses in the
chapter. Each of the 63,779
cross references found in
the Bible is depicted by a
single arc - the color
corresponds to the distance
between the two chapters,
creating a rainbow-like
effect»
http://www.chrisharrison.net/index.php/Visualizations/BibleViz
32. Human emotion
1100. Social Networks
«This video shows
the mood in the U.S.,
as inferred using
over 300 million
tweets, over the
course of the day.
The maps are
represented using
density-preserving
cartograms»
https://www.youtube.com/watch?v=ujcrJZRSGkg
35. 100 seconds of History
1. Human-sourced information
http://flowingdata.com/2011/03/21/history-of-the-world-in-100-seconds-according-to-wikipedia/
For a sort of evolution of the world at a glance, all
geotagged Wikipedia articles with a time attached
to them have been scraped, providing a
total of 14,238 events.
36. Human disease network
2110. Medical records
«The diseasome website is
a disease/disorder
relationships explorer and a
sample of an innovative
map-oriented scientific work.
Built by a team of
researchers and engineers,
it uses the Human Disease
Network dataset and allows
intuitive knowledge
discovery by mapping its
complexity»
37. Digital City Portraits (launch of 4G by EE)
http://brendandawes.com/projects/ee
1700. Mobile data content: text messages
«…Digital portrait for each city, formed from millions of bits of data
as people talked and interacted about the biggest events of the day.»
«…time explodes outwards from the centre with each point representing
one minute giving a possible 4320 points – the number of minutes in
three days – to cover the day before, during and after the launch of 4G.»
«In the London image you can clearly see when Hurricane Sandy hit in
New York, and even when Obama visited the city to inspect the damage.»
«It's also evident that only a day later hardly anybody was talking
about the hurricane, showing the transient nature of social media,
even for large global events.»
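The figure of 4320 points is simply 3 days × 24 hours × 60 minutes. The sketch below checks that arithmetic and shows one possible way to place one point per minute so that time moves outwards from the centre. The spiral layout (one turn per day, radius growing linearly) is an assumption made for illustration and is not taken from the original project.

```python
import math

MINUTES_PER_DAY = 24 * 60
TOTAL_MINUTES = 3 * MINUTES_PER_DAY  # day before, day of, day after


def minute_to_point(minute, max_radius=1.0):
    """Map a minute index to (x, y) on an assumed one-turn-per-day spiral."""
    if not 0 <= minute < TOTAL_MINUTES:
        raise ValueError("minute index out of range")
    radius = max_radius * minute / (TOTAL_MINUTES - 1)
    angle = 2 * math.pi * minute / MINUTES_PER_DAY
    return radius * math.cos(angle), radius * math.sin(angle)


print(TOTAL_MINUTES)
print(minute_to_point(0), minute_to_point(TOTAL_MINUTES - 1))
```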
38. Urban Mobs
3121. Mobile phone location
«This visualization represents the number of SMS messages sent on the
evening of the Fête de la Musique (21 June 2008).
From 5 pm onwards, strong activity can be seen around the Parc des
Princes, which can be related to the Tokio Hotel concert held that
evening. Another hub of activity then appears at the hippodrome
d'Auteuil, corresponding to the concert organised by France 2»
http://www.urbanmobs.fr/fr/france/
39. LIVE Singapore!
31. Data from sensors
«Making decisions in sync
with the environment
LIVE Singapore! provides
people with access to a range
of useful real-time information
about their city by developing
an open platform for the
collection, elaboration and
distribution of real-time data
that reflect urban activity.
Giving people visual and
tangible access to real-time
information about their city
enables them to take their
decisions more in sync with
their environment, with what
is actually happening around
them.»
https://www.youtube.com/watch?feature=player_embedded&v=2aEPkyOBtRo
40. San Francisco Transportation
312. Mobile sensors (tracking)
«…data from the Muni (San
Francisco Municipal
Transportation Agency)
showing the geographic
coordinates of their vehicles to
create this map showing
average transit speeds over a
24-hour period.
[…]
Black lines represent very slow
movement under 7 mph. Red
are less than 19 mph. Blue are
less than 43 mph. Green lines
depict faster speeds above 43
mph.»
https://www.flickr.com/photos/walkingsf/4521616274/in/photostream/
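The colour legend quoted above amounts to a threshold encoding of average speed. A minimal sketch follows, with a hypothetical function name. The quote does not say how a speed of exactly 43 mph is drawn, so assigning it to green is an assumption.

```python
def speed_colour(mph: float) -> str:
    """Map an average transit speed (mph) to the colour used on the map."""
    if mph < 0:
        raise ValueError("speed cannot be negative")
    if mph < 7:
        return "black"   # very slow movement
    if mph < 19:
        return "red"
    if mph < 43:
        return "blue"
    return "green"       # assumption: exactly 43 mph counts as fast


for mph in (3, 12, 30, 55):
    print(mph, speed_colour(mph))
```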
43. Hints about Storytelling
“Narrative or recital of an event, or a series of events whether
real or fictitious”
New International Webster’s Comprehensive Dictionary
(2013 edition)
“Programme to make the results of official statistics accessible
and understandable to people and – in fulfilment of an
information mandate – to make "evidence based decision
making" possible.”
Armin Grossenbacher, Federal Statistical Office,
Storytelling revisited, 2010
44. Storytelling principles
1) Gricean Maxims (P. Grice)
2) Pyramid principle (B. Minto)
3) Seven steps to storytelling (J. Lambert)
4) Scenario for combining data, model and stories (J. Koomey)
5) Five golden rules for statistics storytellers (D. Marder)
45. Gricean Maxims
Grice’s conversational maxims
“Make your conversational contribution what is required, at the stage
at which it occurs, by the accepted purpose or direction of the talk
exchange in which you are engaged.” (P. Grice)
Quantity
1. Make your contribution to the conversation as informative as necessary.
2. Do not make your contribution to the conversation more informative
than necessary.
Quality
1. Do not say what you believe to be false.
2. Do not say that for which you lack adequate evidence.
Relation
Be relevant (that is, say things related to the current topic of
conversation).
Manner
1. Avoid obscurity of expression.
2. Avoid ambiguity.
3. Be brief (avoid unnecessary wordiness).
4. Be orderly.
46. Barbara Minto’s pyramid principle
The Situation is simply the state of affairs in your particular area.
For example, your current growth rate or your product offering.
The Complication is what is changing in your field to make things more
challenging: it’s the proverbial thorn in your side that you have to
remove in order to make things run smoothly. This might be your new
competition, or a lack of fresh prospects.
The Question states what the situation and complication are asking.
For instance, how do I achieve double-digit growth with increased
competition? Or another question: how do I reach out to the particular
audience that I’ve targeted and get them to buy my product?
The Answer is your particularly inspired way of solving the problem you
are presenting.
http://blog.kurtosys.com/storytelling-pyramid-principle/
47. Seven steps to storytelling
Step 1: Owning Your Insights
Step 2: Owning Your Emotions
Step 3: Finding The Moment
Step 4: Seeing Your Story
Step 5: Hearing Your Story
Step 6: Assembling Your Story
Step 7: Sharing Your Story
Joe Lambert, DIGITAL STORYTELLING COOKBOOK – 2010, Digital Diner Press
Insights → Emotions → Decisive Moments → Vision → Narrative → Editing → Sharing
48. Scenario for combining data, model and stories
Turning Numbers Into Knowledge: Mastering the Art of Problem Solving - Jon Koomey
49. Five golden rules for statistics storytellers
… five golden rules that statistical story writers often lose sight
of:
• Write as people speak;
• Don’t just get to the point – start with it;
• Make every sentence relevant to the audience – what’s in it
for them;
• Stay simple, but don’t patronise;
• Use only one idea per sentence.
David Marder, Office for National Statistics.
The Holistic Approach to Statistical Story-Telling,
16 UNECE Work Session on Dissemination of Statistical Commentary (Geneva, 4-5 Dec. 2003).
50. Killer examples (i.e. 8 ways to build effective storytelling with
infographics)
http://www.howtostory.be/killer-examples-of-the-best-infographics/
• newspaper
• flowchart
• timeline
• bait
• comparison
• numbers
• photos
• vision