DATA STORIES
ENGAGING WITH DATA IN A POST-TRUTH WORLD
Elena Simperl
@esimperl
Data science seminar
Feb 19th 2018
“One of the interpretations of the EU referendum
result and the rise of Donald Trump in the US is that
we are now living in a post-truth society - a world in
which anecdotes shared on social media and invented
numbers thrown on the sides of buses are more
trusted and influential than official statistics,
extensive research, and proven expertise. In this
world, scientists, statisticians, analysts, and journalists
must find new ways to bring hard, factual data to
citizens.”
“Data must entertain as well as inform, excite as well
as educate. It must be built with social media sharing
in mind, and become part of our everyday activities
and digital interactions with others.”
Data Stories looks at frameworks and technology to
bring data closer to people through art, games, and
storytelling.
It examines the impact that varying levels of
localisation, topicalisation, participation, and
shareability have on the engagement of the public with
factual evidence.
It delivers tools and guidance for communities and
civic groups to achieve wider participation and support
for their initiatives; and empower artists, designers,
statisticians, analysts, and journalists to communicate
through data in inspiring, informative ways.
“Data is infrastructure. It underpins
transparency, accountability, public services,
business innovation and civil society.”
How do we help people tell their data stories?
What data stories do people share and why?
How do we make data more engaging?
HUMAN DATA
INTERACTION
Term originally introduced
in (Crabtree and Mortier,
2015) in the context of
personal data
A multidisciplinary field
that places human factors
at the centre of attention in
everything data
Considers the whole
interaction process between
people and data, and the
context in which such
interactions take place
HOW DO WE HELP PEOPLE
TELL THEIR DATA STORIES?
RESEARCH QUESTIONS
• Who searches for
data and why?
• How do people search
for data?
• What sort of queries
do they write?
• Do they need query
writing support?
• How should results be
displayed?
• Do they need one or
more search sessions to
find what they are
looking for?
• Is the search
exploratory?
• How do people pick
the best results?
CONCEPTUAL
FRAMEWORKS
FOR
INTERACTING
WITH DATA
HELP SYSTEM
DESIGNERS
IDENTIFY USER
TASKS AND
TAILOR
FEATURES
Existing frameworks
• Belkin et al. introduced a faceted approach
to conceptualizing tasks in information
seeking (Belkin et al., 2008)
• Yi et al. introduced a taxonomy of tasks in
information visualisation (Yi et al., 2007)
• We introduced an interaction framework for
structured data (Koesten et al., 2017)
INTERACTING WITH STRUCTURED DATA
[Diagram labels: goal or process oriented; Web, data portals, people, FoI; relevance, usability, quality; visual scan, obvious errors, basic stats, headers, metadata]
Koesten, L.M., Kacprzak, E., Tennison, J.F. and Simperl, E., 2017, May. The Trials and Tribulations of Working with Structured Data:-a Study on
Information Seeking Behaviour. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 1277-1289). ACM.
ANALYSIS OF SEARCH BEHAVIOUR
INFORMS THE DESIGN OF DATA SEARCH ENGINES
● Four national open governmental data portals, 2.2
million queries from 2013-2016 (Kacprzak et al., 2017)
● Shorter queries that include temporal and location
information
● Explorative search
● Difference in topics between queries issued directly to
portals and web search engines
● Ongoing work: comparison to data requests
Kacprzak, E., Koesten, L.M., Ibáñez, L.D., Simperl, E. and Tennison, J., A Query Log Analysis of Dataset Search. In International
Conference on Web Engineering (pp. 429-436). Springer, 2017.
DATA SUMMARIES
HELP PEOPLE MAKE SENSE OF DATA EFFECTIVELY
Study with experts and novices,
20 datasets
• Task: Write a summary (100 words)
about the data
• Analysis: thematic analysis, comparison
with existing summaries and metadata
schemas
Automatically generating text
from structured data
• Neural network architecture
• Tested on DBpedia/Wikidata triples in
English, Arabic, Esperanto
• Text reused by editors to start new
articles
Vougiouklis, P., Elsahar, H., Kaffee, L.A., Gravier, C., Laforest, F., Hare, J. and Simperl, E., 2017. Neural Wikipedian:
Generating Textual Summaries from Knowledge Base Triples. arXiv preprint arXiv:1711.00155.
DATA
NEEDS
CONTEXT
See https://beta.ons.gov.uk/datasets/cpih01/editions/time-series/versions/3
WHAT DATA STORIES DO
PEOPLE SHARE AND WHY?
VIRAL DATA
HELPS (ALTERNATIVE) FACTS SPREAD FASTER
How does data travel?
E.g. on social media
What makes data go viral?
Visualisations?
Subject matter/topic?
“Transmission vectors”: journalists, celebrities,
grassroots, botnets?
CURRENT DATA SHARING PRACTICES ON
TWITTER
• What evidence can we see of data sharing activities?
• What form is data being shared in?
• How are the various stages of the data science pipeline represented?
• Does anyone share raw data?
• Do narratives explicitly reference the data that they are built on?
• How common is data sharing?
• Who is it done by?
• How do they do it?
• What kind of data is (not) being shared?
• Who makes use of the data for what purposes?
OFFICIAL DATA
• Six-week Twitter study of
ons.gov.uk
• 1186 original tweets made
by 898 people, with 4906
subsequent retweets
• Of the 15 most active tweeters,
half work for the ONS or are
official accounts of the ONS
• Most retweeted tweet (503
times) is by a BBC journalist
mentioning an ONS data
visualisation
• One of the 64 separate
tweets about this ONS data
release
OPEN DATA
• Six-week Twitter study of
data.gov.uk
• 113 original tweets made by
87 different accounts, with
258 subsequent retweets
• No bias towards
organisational affiliation is
present in the set of active
retweeters
• The single most retweeted
tweet (121 times) is by a
Joint Nature Conservation
Committee earth observation
specialist; it mentions a crop
map visualisation from
environment.data.gov.uk
SHARING SPREADSHEETS
• No XLSX, but Google Sheets
• 1475 original tweets
from 1067 unique
accounts with 6923
retweets
• Most retweeted spreadsheet
(1188 times)
• Schedule for the timings
of INKIGAYO broadcasts
(famous Korean
livestreamed pop music
program with live voting)
• Sent by account
promoting BTS, a recent
high profile K-pop band
(the first to win a
Billboard newcomers
award in the US)
• Gives detailed song
broadcast timings
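The three Twitter studies above can be compared on a common scale by dividing retweets by original tweets. The following is a minimal sketch in Python that uses only the counts quoted on the slides. The dictionary layout and the function names are illustrative assumptions, not part of the original study. The ons.gov.uk and data.gov.uk figures cover six weeks, but the slides do not state the observation window for the spreadsheet sample, so the ratios are indicative only.

```python
# Counts copied from the OFFICIAL DATA, OPEN DATA and SHARING SPREADSHEETS slides.
# The structure and names below are illustrative assumptions.
studies = {
    "ons.gov.uk": {"tweets": 1186, "accounts": 898, "retweets": 4906},
    "data.gov.uk": {"tweets": 113, "accounts": 87, "retweets": 258},
    "Google Sheets": {"tweets": 1475, "accounts": 1067, "retweets": 6923},
}


def amplification(study):
    """Return (retweets per original tweet, original tweets per account)."""
    return (
        study["retweets"] / study["tweets"],
        study["tweets"] / study["accounts"],
    )


def summarise(all_studies):
    """Round both ratios to two decimals for each study."""
    return {
        name: tuple(round(value, 2) for value in amplification(study))
        for name, study in all_studies.items()
    }


if __name__ == "__main__":
    for name, (per_tweet, per_account) in summarise(studies).items():
        print(f"{name}: {per_tweet} retweets per tweet, "
              f"{per_account} tweets per account")
```

On these figures, links to Google Sheets and to ons.gov.uk are retweeted roughly four to five times per original tweet, about twice the rate for data.gov.uk.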
SPREADSHEET CATEGORIES AND USE
• Visual inspection of 100 highly
retweeted sheets
• sports statistics (including gambling
analysis)
• computer games statistics
• catalogues of resources/assets
(including artist’s videos or a series of
TV episodes)
• selling goods/artwork/services for a
trader or fan group
• coordinating donations/volunteers,
political info
• coordinating political activity
• music voting
• buying on behalf of an artist
• monitoring cryptocurrency offerings
Simple list 10%
Rich data 40%
Data analysis 10%
Promoting action 15%
Coordinating crowd action 20%
Other 5%
USE OF CHARTS
• 5% (29) of sheets contained charts
• 4 charts intended to promote
subsequent use and discussion
• Survey of fanfic community from NYC festival
attendees
• A maths teacher who takes part in Maths
Teaching discussion groups tweeted a Google
form to record preferences for banana
ripeness
• A study on the citation of Registered Reports
in Cognitive Neuroscience
• Historic weather data collected by a local
citizen offered to a “sports weather”
journalist
Games (trading, playing, curation): 7
Politics (monitoring, organising, arguing): 6
Surveys (attitudes, phenomena): 4
Financial investment analysis: 3
Personal list of assets/achievements: 2
TV/radio (voting/ratings): 2
Trading (orders): 1
Miscellaneous data collection (historic weather data; Boeing 787 production data (hobbyist); Google Analytics audit of Udemy; academic citation analysis): 4
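As a quick consistency check, the per-category counts in this table should add up to the 29 sheets with charts reported at the top of the slide. The sketch below, in Python, uses only the counts listed here. The shortened category keys and the function name are illustrative assumptions.

```python
# Counts copied from the chart-purpose table on this slide.
# Category keys are shortened for readability (an assumption of this sketch).
chart_purposes = {
    "Games": 7,
    "Politics": 6,
    "Surveys": 4,
    "Financial investment analysis": 3,
    "Personal list of assets/achievements": 2,
    "TV/radio": 2,
    "Trading": 1,
    "Miscellaneous data collection": 4,
}


def purpose_shares(counts):
    """Return the total and each category's share of it, in percent."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = purpose_shares(chart_purposes)
    print(f"Sheets with charts: {total}")
    for name, share in shares.items():
        print(f"{name}: {share}%")
```

The total matches the 29 sheets quoted on the slide, with games and politics together accounting for a little under half of them.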
USE OF CHARTS (2)
• 2 charts support an
argument or discussion
• UN data on firearms. Discussion
thread between pro- and
anti-NRA positions. Sent by the
author, a senior technologist at
Microsoft.
• Use of the Physics GRE in North
American university physics
admission processes. Sent by a
delegate at the Conference for
Undergraduate
Underrepresented Minorities in
Physics, not the spreadsheet
author.
MAKING DATA MORE
ENGAGING
Can games help people get
familiar with data?
DATA GAMES
HELP PEOPLE EXPLORE FACTS
Minecraft maps generated
using LIDAR data
Demonstrate effects of global
warming
Create/model archaeological
digs over different time
periods
C. Gutteridge, Magical Minecraft Map Maker, https://www.ecs.soton.ac.uk/news/4827, 2015
Alexa, what are our discount
levels on those sales?
DATA AS
CULTURE, 2018
Exhibition at the Open
Data Institute, London
Launched January 23rd
2018
Curated by Julie Freeman
and Hannah Redler
Hawes
Dan Hett
Lee Montgomery
Pip Thornton
Riita Oittinen
WE’RE HIRING @esimperl

INTRODUCTION TO Natural language processing
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
SWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptxSWOT Analysis Slides Powerpoint Template.pptx
SWOT Analysis Slides Powerpoint Template.pptx
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 

Data stories

  • 1. DATA STORIES ENGAGING WITH DATA IN A POST-TRUTH WORLD Elena Simperl @esimperl Data science seminar Feb 19th 2018
  • 2. “One of the interpretations of the EU referendum result and the rise of Donald Trump in the US is that we are now living in a post-truth society - a world in which anecdotes shared on social media and invented numbers thrown on the sides of buses are more trusted and influential than official statistics, extensive research, and proven expertise. In this world, scientists, statisticians, analysts, and journalists must find new ways to bring hard, factual data to citizens.” “Data must entertain as well as inform, excite as well as educate. It must be built with social media sharing in mind, and become part of our everyday activities and digital interactions with others.”
  • 3. Data Stories looks at frameworks and technology to bring data closer to people through art, games, and storytelling. It examines the impact that varying levels of localisation, topicalisation, participation, and shareability have on the engagement of the public with factual evidence. It delivers tools and guidance for communities and civic groups to achieve wider participation and support for their initiatives, and empowers artists, designers, statisticians, analysts, and journalists to communicate through data in inspiring, informative ways.
  • 4. “Data is infrastructure. It underpins transparency, accountability, public services, business innovation and civil society.”
  • 7. How do we help people tell their data stories? What data stories do people share and why? How do we make data more engaging?
  • 8. HUMAN DATA INTERACTION • Term originally introduced in (Crabtree and Mortier, 2015) in the context of personal data • A multidisciplinary field that places human factors at the centre of attention in everything data • Considers the whole interaction process between people and data, and the context in which such interactions take place
  • 9. HOW DO WE HELP PEOPLE TELL THEIR DATA STORIES?
  • 11. RESEARCH QUESTIONS • Who searches for data and why? • How do people search for data? • What sort of queries do they write? • Do they need query writing support? • How should results be displayed? • Do they need one or more search sessions to find what they are looking for? • Is the search exploratory? • How do people pick the best results?
  • 12. CONCEPTUAL FRAMEWORKS FOR INTERACTING WITH DATA HELP SYSTEM DESIGNERS IDENTIFY USER TASKS AND TAILOR FEATURES Existing frameworks • Belkin et al. introduced a faceted approach to conceptualizing tasks in information seeking (Belkin et al., 2008) • Yi et al. introduced a taxonomy of tasks in information visualisation (Yi et al., 2007) • We introduced an interaction framework for structured data (Koesten et al., 2017)
  • 13. INTERACTING WITH STRUCTURED DATA Goal or process oriented Web Data portals People FoI Relevance Usability Quality Visual scan Obvious errors Basic stats Headers Metadata Koesten, L.M., Kacprzak, E., Tennison, J.F. and Simperl, E., 2017, May. The Trials and Tribulations of Working with Structured Data: a Study on Information Seeking Behaviour. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 1277-1289). ACM.
  • 14. ANALYSIS OF SEARCH BEHAVIOUR INFORMS THE DESIGN OF DATA SEARCH ENGINES ● Four national open government data portals, 2.2 million queries from 2013-2016 (Kacprzak et al., 2017) ● Shorter queries that include temporal and location information ● Explorative search ● Difference in topics between queries issued directly to portals and queries issued to web search engines ● Ongoing work: comparison to data requests Kacprzak, E., Koesten, L.M., Ibáñez, L.D., Simperl, E. and Tennison, J., 2017. A Query Log Analysis of Dataset Search. In International Conference on Web Engineering (pp. 429-436). Springer.
  • 15. DATA SUMMARIES HELP PEOPLE MAKE SENSE OF DATA EFFECTIVELY Study with experts and novices, 20 datasets • Task: Write a summary (100 words) about the data • Analysis: thematic analysis, comparison with existing summaries and metadata schemas Automatically generating text from structured data • Neural network architecture • Tested on DBpedia/Wikidata triples in English, Arabic, Esperanto • Text reused by editors to start new articles Vougiouklis, P., Elsahar, H., Kaffee, L.A., Gravier, C., Laforest, F., Hare, J. and Simperl, E., 2017. Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples. arXiv preprint arXiv:1711.00155.
  • 18. WHAT DATA STORIES DO PEOPLE SHARE AND WHY?
  • 19. VIRAL DATA HELPS (ALTERNATIVE) FACTS SPREAD FASTER How does data travel? E.g. on social media What makes data go viral? Visualisations? Subject matter/topic? “Transmission vectors”: journalists, celebrities, grassroots, botnets?
  • 20. CURRENT DATA SHARING PRACTICES ON TWITTER • What evidence can we see of data sharing activities? • What form is data being shared in? • How are the various stages of the data science pipeline represented? • Does anyone share raw data? • Do narratives explicitly reference the data that they are built on? • How common is data sharing? • Who is it done by? • How do they do it? • What kind of data is (not) being shared? • Who makes use of the data for what purposes?
  • 21. OFFICIAL DATA • Six-week Twitter study of ons.gov.uk • 1186 original tweets made by 898 people, with 4906 subsequent retweets • Of the 15 most active tweeters, half work for the ONS or are official ONS accounts • Most retweeted tweet (503 times) is by a BBC journalist mentioning an ONS data visualisation • It is one of 64 separate tweets about this ONS data release
  • 22. OPEN DATA • Six-week Twitter study of data.gov.uk • 113 original tweets made by 87 different accounts, with 258 subsequent retweets • No bias towards organisational affiliation is present in the set of active retweeters • The single most retweeted tweet (121 times) is by a Joint Nature Conservation Committee earth observation specialist and mentions a crop map visualisation from environment.data.gov.uk
  • 23. SHARING SPREADSHEETS • No XLSX, but Google sheets • 1475 original tweets from 1067 unique accounts with 6923 retweets • Most retweeted spreadsheet (1188 times) • Schedule for the timings of INKIGAYO broadcasts (famous Korean livestreamed pop music program with live voting) • Sent by account promoting BTS, a recent high profile K-pop band (the first to win a Billboard newcomers award in the US) • Gives detailed song broadcast timings
  • 24. SPREADSHEET CATEGORIES AND USE • Visual inspection of 100 highly retweeted sheets • sports statistics (including gambling analysis) • computer games statistics • catalogues of resources/assets (including artist’s videos or a series of TV episodes) • selling goods/artwork/services for a trader or fan group • coordinating donations/volunteers, political info • coordinating political activity • music voting • buying on behalf of an artist • monitoring cryptocurrency offerings Breakdown: Simple list 10%; Rich data 40%; Data analysis 10%; Promoting action 15%; Coordinating crowd action 20%; Other 5%
  • 25. USE OF CHARTS • 5% (29) of sheets contained charts • 4 charts intended to promote subsequent use and discussion • Survey of fanfic community from NYC festival attendees • A maths teacher who takes part in Maths Teaching discussion groups tweeted a Google form to record preferences for banana ripeness • A study on the citation of Registered Reports in Cognitive Neuroscience • Historic weather data collected by a local citizen offered to a “sports weather” journalist Counts by category: Games (trading, playing, curation) 7; Politics (monitoring, organising, arguing) 6; Surveys (attitudes, phenomena) 4; Financial investment analysis 3; Personal list of assets/achievements 2; TV/radio (voting/ratings) 2; Trading (orders) 1; Miscellaneous data collection (historic weather data, Boeing 787 production data (hobbyist), Google Analytics audit of Udemy, academic citation analysis) 4
  • 26. USE OF CHARTS (2) • 2 charts support an argument or discussion • UN data on firearms. Discussion thread between pro- and anti-NRA positions. Sent by the author, a senior technologist at Microsoft. • Use of the Physics GRE in North American university physics admission processes. Sent by a delegate at the Conference for Undergraduate Underrepresented Minorities in Physics, not the spreadsheet author.
  • 28. Can games help people get familiar with data?
  • 29. DATA GAMES HELP PEOPLE EXPLORE FACTS Minecraft maps generated using LIDAR data Demonstrate effects of global warming Create/model archaeological digs over different time periods C. Gutteridge, Magical Minecraft Map Maker, https://www.ecs.soton.ac.uk/news/4827, 2015
  • 30. Alexa, what’s our discount levels on those sales?
  • 31. DATA AS CULTURE, 2018 Exhibition at the Open Data Institute, London Launched January 23rd 2018 Curated by Julie Freeman and Hannah Redler Hawes
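The three Twitter studies on slides 21 to 23 report original tweets, distinct accounts, retweets, and the retweet count of the single most shared item. A minimal Python sketch of how those figures could be compared side by side follows. The class and property names are illustrative and not from the slides, and the "top item share" assumes that the most retweeted item's count is included in each study's retweet total, which the slides do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwitterStudy:
    """Headline counts for one study, copied from slides 21-23."""
    name: str
    original_tweets: int
    accounts: int
    retweets: int
    top_item_retweets: int

    @property
    def retweets_per_tweet(self) -> float:
        # Average amplification of an original tweet.
        return self.retweets / self.original_tweets

    @property
    def tweets_per_account(self) -> float:
        # How concentrated authorship is among the tweeting accounts.
        return self.original_tweets / self.accounts

    @property
    def top_item_share(self) -> float:
        # Assumption: the top item's retweets are part of the total.
        return self.top_item_retweets / self.retweets


STUDIES = [
    TwitterStudy("ons.gov.uk", 1186, 898, 4906, 503),
    TwitterStudy("data.gov.uk", 113, 87, 258, 121),
    TwitterStudy("Google Sheets", 1475, 1067, 6923, 1188),
]

if __name__ == "__main__":
    for s in STUDIES:
        print(
            f"{s.name}: {s.retweets_per_tweet:.2f} retweets per tweet, "
            f"{s.tweets_per_account:.2f} tweets per account, "
            f"top item share {s.top_item_share:.1%}"
        )
```

Under these assumptions, the ratios come out at roughly 4.1, 2.3, and 4.7 retweets per original tweet, and the single most shared item accounts for close to half of all data.gov.uk retweets, compared with about a tenth for ons.gov.uk.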