Data stories
1. DATA STORIES
ENGAGING WITH DATA IN A POST-TRUTH WORLD
Elena Simperl
@esimperl
Data science seminar
Feb 19th 2018
2. “One of the interpretations of the EU referendum
result and the rise of Donald Trump in the US is that
we are now living in a post-truth society - a world in
which anecdotes shared on social media and invented
numbers thrown on the sides of buses are more
trusted and influential than official statistics,
extensive research, and proven expertise. In this
world, scientists, statisticians, analysts, and journalists
must find new ways to bring hard, factual data to
citizens.”
“Data must entertain as well as inform, excite as well
as educate. It must be built with social media sharing
in mind, and become part of our everyday activities
and digital interactions with others.”
3. Data Stories looks at frameworks and technology to
bring data closer to people through art, games, and
storytelling.
It examines the impact that varying levels of
localisation, topicalisation, participation, and
shareability have on the engagement of the public with
factual evidence.
It delivers tools and guidance for communities and
civic groups to achieve wider participation and support
for their initiatives, and to empower artists, designers,
statisticians, analysts, and journalists to communicate
through data in inspiring, informative ways.
4. “Data is infrastructure. It underpins
transparency, accountability, public services,
business innovation and civil society.”
7. How do we help people tell their data stories?
What data stories do people share and why?
How do we make data more engaging?
8. HUMAN DATA
INTERACTION
Term originally introduced
by Crabtree and Mortier
(2015) in the context of
personal data
A multidisciplinary field
that places human factors
at the centre of attention in
everything to do with data
Considers the whole
interaction process between
people and data, and the
context in which such
interactions take place
11. RESEARCH QUESTIONS
• Who searches for
data and why?
• How do people search
for data?
• What sort of queries
do they write?
• Do they need query
writing support?
• How should results be
displayed?
• Do they need one or
more search sessions to
find what they are
looking for?
• Is the search
exploratory?
• How do people pick
the best results?
12. CONCEPTUAL FRAMEWORKS FOR INTERACTING WITH DATA
HELP SYSTEM DESIGNERS IDENTIFY USER TASKS AND TAILOR FEATURES
Existing frameworks
Belkin et al. introduced a faceted approach
to conceptualizing tasks in information
seeking (Belkin et al., 2008)
Yi et al. introduced a taxonomy of tasks in
information visualisation (Yi et al., 2007)
We introduced an interaction framework for
structured data (Koesten et al., 2017)
13. INTERACTING WITH STRUCTURED DATA
Diagram labels: goal- or process-oriented tasks; web, data portals, people, FoI (Freedom of Information) requests; relevance, usability, quality; visual scan, obvious errors, basic stats, headers, metadata
Koesten, L.M., Kacprzak, E., Tennison, J.F. and Simperl, E., 2017. The Trials and Tribulations of Working with Structured Data: A Study on
Information Seeking Behaviour. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 1277-1289). ACM.
14. ANALYSIS OF SEARCH BEHAVIOUR
INFORMS THE DESIGN OF DATA SEARCH ENGINES
● Four national open government data portals, 2.2
million queries from 2013-2016 (Kacprzak et al., 2017)
● Short queries that include temporal and location
information
● Exploratory search
● Topics differ between queries issued directly to
portals and queries issued to web search engines
● Ongoing work: comparison to data requests
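The query-log findings above (short queries, temporal and location terms) can be illustrated with a minimal Python sketch of per-query feature extraction. This is not the method used by Kacprzak et al.; the sample queries, the rule that a four-digit year counts as temporal information, and the tiny place-name list `KNOWN_PLACES` are all hypothetical assumptions made for illustration.

```python
import re
from statistics import mean

# Hypothetical sample queries; a real analysis would read the portals' search logs.
QUERIES = [
    "crime statistics london 2015",
    "house prices",
    "air quality manchester",
    "population 2011 census",
    "school spending",
]

# Assumption: a four-digit year between 1900 and 2099 signals temporal information.
YEAR_PATTERN = re.compile(r"\b(19|20)\d{2}\b")
# Assumption: a small hand-made gazetteer stands in for proper location detection.
KNOWN_PLACES = {"london", "manchester", "leeds", "scotland", "wales"}


def query_features(query: str) -> dict:
    """Return the query length in terms plus temporal and location flags."""
    terms = query.lower().split()
    return {
        "length": len(terms),
        "temporal": bool(YEAR_PATTERN.search(query)),
        "location": any(term in KNOWN_PLACES for term in terms),
    }


def summarise(queries: list[str]) -> dict:
    """Aggregate per-query features over a whole query log."""
    features = [query_features(q) for q in queries]
    n = len(features)
    return {
        "mean_length": mean(f["length"] for f in features),
        "share_temporal": sum(f["temporal"] for f in features) / n,
        "share_location": sum(f["location"] for f in features) / n,
    }


if __name__ == "__main__":
    print(summarise(QUERIES))
```

On the five made-up queries this reports a mean length of 2.8 terms, with 40% of queries containing a year and 40% containing a listed place name.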
Kacprzak, E., Koesten, L.M., Ibáñez, L.D., Simperl, E. and Tennison, J., 2017. A Query Log Analysis of Dataset Search. In International
Conference on Web Engineering (pp. 429-436). Springer.
15. DATA SUMMARIES
HELP PEOPLE MAKE SENSE OF DATA EFFECTIVELY
Study with experts and novices,
20 datasets
Task: Write a summary (100 words)
about the data
Analysis: thematic analysis, comparison
with existing summaries and metadata
schemas
Automatically generating text
from structured data
Neural network architecture
Tested on DBpedia/Wikidata triples in
English, Arabic, Esperanto
Text reused by editors to start new
articles
Vougiouklis, P., Elsahar, H., Kaffee, L.A., Gravier, C., Laforest, F., Hare, J. and Simperl, E., 2017. Neural Wikipedian:
Generating Textual Summaries from Knowledge Base Triples. arXiv preprint arXiv:1711.00155.
19. VIRAL DATA
HELPS (ALTERNATIVE) FACTS SPREAD FASTER
How does data travel?
E.g. on social media
What makes data go viral?
Visualisations?
Subject matter/topic?
“Transmission vectors”: journalists, celebrities,
grassroots, botnets?
20. CURRENT DATA SHARING PRACTICES ON
TWITTER
• What evidence can we see of data sharing activities?
• What form is data being shared in?
• How are the various stages of the data science pipeline represented?
• Does anyone share raw data?
• Do narratives explicitly reference the data that they are built on?
• How common is data sharing?
• Who is it done by?
• How do they do it?
• What kind of data is (not) being shared?
• Who makes use of the data for what purposes?
21. OFFICIAL DATA
• Six-week Twitter study of
ons.gov.uk
• 1186 original tweets made
by 898 people, with 4906
subsequent retweets
• Of the 15 most active tweeters,
half work for the ONS or are
official ONS accounts
• Most retweeted tweet (503
times) is by a BBC journalist
mentioning an ONS data
visualisation
• One of the 64 separate
tweets about this ONS data
release
22. OPEN DATA
• Six-week Twitter study of
data.gov.uk
• 113 original tweets made by
87 different accounts, with
258 subsequent retweets
• No bias towards
organisational affiliation is
present in the set of active
retweeters
• The single most retweeted
tweet (121 times) is by a
Joint Nature Conservation
Committee earth observation
specialist. It mentions a crop
map visualisation from
environment.data.gov.uk
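Because both Twitter studies cover six weeks, simple ratios are one way to read their figures side by side. The minimal Python sketch below uses only the counts on the two slides; the `TwitterStudy` class is a hypothetical helper, and its `accounts` field stands in for the "898 people" reported for ons.gov.uk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterStudy:
    """Counts reported for one six-week Twitter study (hypothetical helper)."""
    name: str
    original_tweets: int
    accounts: int  # for ons.gov.uk the slide reports "people" rather than accounts
    retweets: int

    @property
    def retweets_per_tweet(self) -> float:
        # Average number of retweets attracted by each original tweet.
        return self.retweets / self.original_tweets

    @property
    def tweets_per_account(self) -> float:
        # Average number of original tweets posted per account.
        return self.original_tweets / self.accounts


STUDIES = [
    TwitterStudy("ons.gov.uk", original_tweets=1186, accounts=898, retweets=4906),
    TwitterStudy("data.gov.uk", original_tweets=113, accounts=87, retweets=258),
]

if __name__ == "__main__":
    for study in STUDIES:
        print(
            f"{study.name}: {study.retweets_per_tweet:.2f} retweets per original tweet, "
            f"{study.tweets_per_account:.2f} original tweets per account"
        )
```

On these figures, ONS tweets attracted about 4.14 retweets each against about 2.28 for data.gov.uk, while both sets of accounts posted roughly 1.3 original tweets each.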
23. SHARING SPREADSHEETS
• No XLSX files, but Google Sheets
• 1475 original tweets
from 1067 unique
accounts with 6923
retweets
• Most retweeted spreadsheet
(1188 times)
• Schedule for the timings
of INKIGAYO broadcasts
(famous Korean
livestreamed pop music
program with live voting)
• Sent by account
promoting BTS, a recent
high-profile K-pop band
(the first to win a
Billboard newcomers
award in the US)
• Gives detailed song
broadcast timings
24. SPREADSHEET CATEGORIES AND USE
• Visual inspection of 100 highly
retweeted sheets
• sports statistics (including gambling
analysis)
• computer games statistics
• catalogues of resources/assets
(including artist’s videos or a series of
TV episodes)
• selling goods/artwork/services for a
trader or fan group
• coordinating donations/volunteers,
political info
• coordinating political activity
• music voting
• buying on behalf of an artist
• monitoring cryptocurrency offerings
Purpose of sheets:
Simple list: 10%
Rich data: 40%
Data analysis: 10%
Promoting action: 15%
Coordinating crowd action: 20%
Other: 5%
25. USE OF CHARTS
• 5% (29) of sheets contained charts
• 4 charts intended to promote
subsequent use and discussion
• Survey of fanfic community from NYC festival
attendees
• A maths teacher who takes part in Maths
Teaching discussion groups tweeted a Google
form to record preferences for banana
ripeness
• A study on the citation of Registered Reports
in Cognitive Neuroscience
• Historic weather data collected by a local
citizen offered to a “sports weather”
journalist
Sheets with charts, by topic (count):
Games (trading, playing, curation): 7
Politics (monitoring, organising, arguing): 6
Surveys (attitudes, phenomena): 4
Financial investment analysis: 3
Personal list of assets/achievements: 2
TV/radio (voting/ratings): 2
Trading (orders): 1
Miscellaneous data collection: 4
- Historic weather data
- Boeing 787 production data (hobbyist)
- Google Analytics audit of Udemy
- Academic citation analysis
26. USE OF CHARTS (2)
• 2 charts support an
argument or discussion
• UN data on firearms. Discussion
thread between pro- and anti-NRA
positions. Sent by the author,
a senior technologist at
Microsoft.
• Use of the Physics GRE in North
American university physics
admission processes. Sent by a
delegate at the Conference for
Undergraduate
Underrepresented Minorities in
Physics, not the spreadsheet
author.
29. DATA GAMES
HELP PEOPLE EXPLORE FACTS
Minecraft maps generated
using LIDAR data
Demonstrate effects of global
warming
Create/model archaeological
digs over different time
periods
C. Gutteridge, Magical Minecraft Map Maker, https://www.ecs.soton.ac.uk/news/4827, 2015