SlideShare a Scribd company logo
1 of 41
Download to read offline
The web of data:
how are we
doing so far?
E L E N A S I M P E R L
K I N G ’ S C O L L E G E LO N D O N
@ E S I M P E R L
THE WEB CONFERENCE, APRIL 2021
The web has shaped our understanding and
interactions with data
Answering
factual
questions
Sharing
data
online
Publishing
data for
others to
use
Creating
datasets in
collaboration
Creating
digital
traces
Labelling
data for
algorithms
to use
(Source: Fensel, 2013)
The theory and practice of the web of data
are different
We are living through a crucial moment in
how data is published and used on the web
(Source: Hitzler, 2021)
European Data Portal
Technology, resources and support to increase the value of European open government data
Highlights of our work
Supporting the entire data value chain from publishing to reuse
Low uptake of linked data, limited vocabulary
reuse, proprietary, non-dereferenceable
vocabularies, reasonable metadata quality
Content metadata published as linked data,
joint data model, data sharing framework,
Europeana identifiers
25 million datasets (DCAT, schema.org) in
summer 2020
There is a lot of annotated data online,
especially about products, people and
businesses
Making portals more user-centric
Walker & Simperl, 2017
The ten guidelines
Organise for use of the datasets - rather than simply for publication
Promote use through data storytelling and community building, borrowing from open-source communities and other
peer-production systems
Invest in discoverability best practices, borrowing from e-commerce and web search
Publish good quality metadata - to enhance reuse
Adopt standards to ensure interoperability
Co-locate tools so that a wider range of users can be engaged with
Link datasets to enhance value
Be accessible by offering options for from APIs to CSV downloads
Co-locate documentation - users should not need to be domain experts to understand the data;
Be measurable - as a way to assess how well they are meeting users’ needs.
Operationalising the guidance
Literature review to develop 5* schemes to operationalise indicators.
Application of the schemes on 10 open data portals at different maturity level.
(Walker & Simperl, 2017)
Example: Organise for use
Each dataset is accompanied by a comprehensive descriptive
record (going beyond a collection of structured metadata)
An extract of the data can be previewed (for sense making)
The portal provides recommendations for related datasets
The portal enables users to review/rate the datasets
Keywords from datasets are linked to other published datasets
Example: Promote for use
The portal is connected with social media to create a social distribution channel
for open data.
The portal provides users with online support for feedback, to request/suggest
the publication of new datasets, and when problems arise during use (e.g.
contact form, discussion forum, FAQs, helpdesk, search tips, tutorials, demos).
The portal provides a way for users to keep informed of updates to the data (e.g.
news feed).
Datasets are accompanied by links or resources that provide user guidance and
support.
Examples of reuse (fictitious or real) are provided (e.g. information contributed
by other users, last reuse, best reuse, data stories).
Example: Co-locate documentation
Supporting documentation does not exist.
Supporting documentation exists, but as a document found separately from the data.
Supporting documentation is found at the same time as the data (e.g. the link to the document is
next to the link to the data in the search).
Supporting documentation can be immediately accessed from within the dataset but it is not context
sensitive (e.g. a link to the documentation or text contained within the dataset).
Supporting documentation can be immediately accessed from within the dataset and it is context
sensitive so that users can immediately access information about a specific item of concern (e.g. a
link to a specific point in the documentation or the text contained within the dataset).
Varying open data maturity levels
Be discoverable , co-locate documentation, be
measurable are universally challenging
A lot of guidance available already
Is there any evidence that it works?
Be
measurable:
GitHub as a
data platform
~1.4 million datasets (e.g. CSV,
excel) from ~65K repos
Map literature features to both
dataset and repository features
Use engagement metrics as
proxies for data reuse
Train a predictive model to see
what publishing guidance leads to
higher engagement values
Size Attributes
Age
Quality
Documentation
Reviews
Recommendations for publishers
Co-locate documentation:
◦ Informative, short text about the dataset
◦ Comprehensive README file in a structured form,
links to further information
Co-locate tools:
◦ Standard processable file sizes for dataset
distributions
◦ Openable with a standard configuration of a
common library (such as Pandas)
Can people find the data they need?
Analysis of logs and data requests
(2018)
• Four national open government data portals, 2.2 million queries (2013 – 2016), 1500 data
requests.
• Shorter queries, include temporal and location information.
• Explorative search.
• Native and external queries topically different.
• Data requests offer more context to user intent.
Analysis of logs (2020 - 21)
844k sessions from 04/2018 to 06/2020, web search as well as
native search sessions from the European Data Portal
Location, provenance, format, licence, time frame and date, publishing
date, location of publication and data schema
Mostly web search, web search and native search users have different
information needs and different success rates
Dataset preview page is important in web search
Linking to stories and other content helps with traffic
Recommendations for publishers
Two types of
users
Spatial and
temporal
queries
Result
presentation
Quality
reviews
Data stories
More logs
needed!
Data documentation and sensemaking
practices
(Source: Gregory et al., 2020)
Data work is teamwork
Open approaches and standards work best when solving
actual problems. These problems are rarely about a set of
technologies.
Conclusions
We are at a crucial moment in data availability and use, online and elsewhere
There is an increasing body of evidence about what people’s data needs and about how data is
published on the web
We don’t have links and we don’t always have great business cases for creating and maintaining
them on the open, decentralised web. In fact, we need better models to resource data publishing
all together
There are other data modalities e.g. charts which web technologies can help share responsibly
Metadata vocabularies used where there is a clear
business case
More documentation needed to make data useful
for others
Some data is missing, with serious consequences
Charts as alternative to ‘raw’ data. Where are the
links to the data?
Thank you
Talking Datasets: understanding data sensemaking behaviours. L Koesten, K Gregory, P Groth, E Simperl.
International Journal of Human-Computer Studies. 146:102562. 2021
Everything You Always Wanted to Know about a Dataset: Studies in Data Summarisation. L Koesten, E Simperl,
E Kacprzak, T Blount, J Tennison. International Journal of Human-Computer Studies. 2019
Collaborative Practices with Structured Data: Do Tools Support what Users Need? L Koesten, E Kacprzak, E
Simperl, J Tennison; ACM CHI Conference on Human Factors in Computing Systems, CHI 2019.
Dataset search: a survey. A Chapman, E Simperl, L Koesten, G Konstantinidis, LD Ibáñez, E Kacprzak, P Groth.
The International Journal on Very Large Data Bases, 2019.
Characterising dataset search — An analysis of search logs and data requests. E Kacprzak, L Koesten, LD
Ibáñez, T Blount, J Tennison, E Simperl; Journal of Web Semantics, 2018
Making sense of numerical data-semantic labelling of web tables. Kacprzak, E., Giménez-García, J.M., Piscopo,
A., Koesten, L., Ibáñez, L.D., Tennison, J. and Simperl, E. In European Knowledge Acquisition Workshop (pp. 163-
178). Springer, 2018
The Trials and Tribulations of Working with Structured Data - a Study on Information Seeking Behaviour. L
Koesten, E Kacprzak, J Tennison, E Simperl. Proceedings of ACM CHI Conference on Human Factors in
Computing Systems, CHI 2017
Dataset Reuse: Toward Translating Principles to Practice. L Koesten, P Vougiouklis, E Simperl, P Groth - Patterns,
2020
Characterising Dataset Search on the European Data Portal . L Ibáñez, L Koesten, E Kacprzak, E Simperl.
European Data Portal Analytical Report 18, 2020
Understanding Supply and Demand on the European Data Portal. L Ibáñez, E Simperl. European Data Portal
Analytical Report 19, 2020
The Future of Open Data Portals. J Walker, E Simperl. European Data Portal Analytical Report 8, 2017
Smart Rural: The Open Data Gap. J Walker, G Thuermer, E Simperl, L Carr. Data for Policy, 2020

More Related Content

What's hot

Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Artificial Intelligence Institute at UofSC
 

What's hot (20)

Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data science
Data science Data science
Data science
 
Data Discovery and Visualization
Data Discovery and VisualizationData Discovery and Visualization
Data Discovery and Visualization
 
Isolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’sIsolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’s
 
Noshir Contractor's view on the future of Linked Data
Noshir Contractor's view on the future of Linked DataNoshir Contractor's view on the future of Linked Data
Noshir Contractor's view on the future of Linked Data
 
Data Science and its impact on society
Data Science and its impact on societyData Science and its impact on society
Data Science and its impact on society
 
Crowdsourcing and citizen engagement for people-centric smart cities
Crowdsourcing and citizen engagement for people-centric smart citiesCrowdsourcing and citizen engagement for people-centric smart cities
Crowdsourcing and citizen engagement for people-centric smart cities
 
Data science and visualization lab presentation
Data science and visualization lab presentationData science and visualization lab presentation
Data science and visualization lab presentation
 
Graphs in Government
Graphs in GovernmentGraphs in Government
Graphs in Government
 
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWUSING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
 
The profile of the management (data) scientist: Potential scenarios and skill...
The profile of the management (data) scientist: Potential scenarios and skill...The profile of the management (data) scientist: Potential scenarios and skill...
The profile of the management (data) scientist: Potential scenarios and skill...
 
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
 
EDF2013: Invited talk Florian Bauer: Unleashing climate and energy knowledge ...
EDF2013: Invited talk Florian Bauer: Unleashing climate and energy knowledge ...EDF2013: Invited talk Florian Bauer: Unleashing climate and energy knowledge ...
EDF2013: Invited talk Florian Bauer: Unleashing climate and energy knowledge ...
 
SMART Seminar Series: "From Big Data to Smart data"
SMART Seminar Series: "From Big Data to Smart data"SMART Seminar Series: "From Big Data to Smart data"
SMART Seminar Series: "From Big Data to Smart data"
 
State of Florida Neo4J Graph Briefing - Keynote
State of Florida Neo4J Graph Briefing - KeynoteState of Florida Neo4J Graph Briefing - Keynote
State of Florida Neo4J Graph Briefing - Keynote
 
Big data divided (24 march2014)
Big data divided (24 march2014)Big data divided (24 march2014)
Big data divided (24 march2014)
 
Quality, Relevance and Importance in Information Retrieval with Fuzzy Semanti...
Quality, Relevance and Importance in Information Retrieval with Fuzzy Semanti...Quality, Relevance and Importance in Information Retrieval with Fuzzy Semanti...
Quality, Relevance and Importance in Information Retrieval with Fuzzy Semanti...
 
Bigdatacooltools
BigdatacooltoolsBigdatacooltools
Bigdatacooltools
 
More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?More ways of symbol grounding for knowledge graphs?
More ways of symbol grounding for knowledge graphs?
 
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
Data Processing and Semantics for Advanced Internet of Things (IoT) Applicati...
 

Similar to The web of data: how are we doing so far?

Data ecosystems: turning data into public value
Data ecosystems:  turning data into public valueData ecosystems:  turning data into public value
Data ecosystems: turning data into public value
Slim Turki, Dr.
 

Similar to The web of data: how are we doing so far? (20)

Open government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impactOpen government data portals: from publishing to use and impact
Open government data portals: from publishing to use and impact
 
The web of data: how are we doing so far
The web of data: how are we doing so farThe web of data: how are we doing so far
The web of data: how are we doing so far
 
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at ScaleFull Erdmann Ruttenberg Community Approaches to Open Data at Scale
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
 
Managing Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS caseManaging Metadata for Science and Technology Studies: the RISIS case
Managing Metadata for Science and Technology Studies: the RISIS case
 
RDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOneRDA, Data Citation, and PIDs for DataOne
RDA, Data Citation, and PIDs for DataOne
 
Publishing Data on the Web
Publishing Data on the Web Publishing Data on the Web
Publishing Data on the Web
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 
Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018 Metadata 2020 Vivo Conference 2018
Metadata 2020 Vivo Conference 2018
 
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repositoryEdinburgh DataShare: Tackling research data in a DSpace institutional repository
Edinburgh DataShare: Tackling research data in a DSpace institutional repository
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...Identifying semantics characteristics of user’s interactions datasets through...
Identifying semantics characteristics of user’s interactions datasets through...
 
Paving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflowsPaving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflows
 
Big Data for Library Services (2017)
Big Data for Library Services (2017)Big Data for Library Services (2017)
Big Data for Library Services (2017)
 
Open Data is not Enough
Open Data is not EnoughOpen Data is not Enough
Open Data is not Enough
 
Introduction abstract
Introduction abstractIntroduction abstract
Introduction abstract
 
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
 
Data ecosystems: turning data into public value
Data ecosystems:  turning data into public valueData ecosystems:  turning data into public value
Data ecosystems: turning data into public value
 
Web Mining
Web MiningWeb Mining
Web Mining
 
Matching data detection for the integration system
Matching data detection for the integration systemMatching data detection for the integration system
Matching data detection for the integration system
 
Open Data and Institutional Repositories
Open Data and Institutional RepositoriesOpen Data and Institutional Repositories
Open Data and Institutional Repositories
 

More from Elena Simperl

More from Elena Simperl (19)

This talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing scienceThis talk was not generated with ChatGPT: how AI is changing science
This talk was not generated with ChatGPT: how AI is changing science
 
Knowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generationKnowledge graph use cases in natural language generation
Knowledge graph use cases in natural language generation
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
 
Ten myths about knowledge graphs.pdf
Ten myths about knowledge graphs.pdfTen myths about knowledge graphs.pdf
Ten myths about knowledge graphs.pdf
 
What Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineeringWhat Wikidata teaches us about knowledge engineering
What Wikidata teaches us about knowledge engineering
 
Data commons and their role in fighting misinformation.pdf
Data commons and their role in fighting misinformation.pdfData commons and their role in fighting misinformation.pdf
Data commons and their role in fighting misinformation.pdf
 
High-value datasets: from publication to impact
High-value datasets: from publication to impactHigh-value datasets: from publication to impact
High-value datasets: from publication to impact
 
The story of Data Stories
The story of Data StoriesThe story of Data Stories
The story of Data Stories
 
Qrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart citiesQrowd and the city: designing people-centric smart cities
Qrowd and the city: designing people-centric smart cities
 
Qrowd and the city
Qrowd and the cityQrowd and the city
Qrowd and the city
 
Inclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approachInclusive cities: a crowdsourcing approach
Inclusive cities: a crowdsourcing approach
 
Making transport smarter, leveraging the human factor
Making transport smarter, leveraging the human factorMaking transport smarter, leveraging the human factor
Making transport smarter, leveraging the human factor
 
Data storytelling
Data storytelling Data storytelling
Data storytelling
 
Quality and collaboration in Wikidata
Quality and collaboration in WikidataQuality and collaboration in Wikidata
Quality and collaboration in Wikidata
 
Beyond monetary incentives: experiments with paid microtasks
Beyond monetary incentives: experiments with paid microtasksBeyond monetary incentives: experiments with paid microtasks
Beyond monetary incentives: experiments with paid microtasks
 
The Data Pitch call
The Data Pitch callThe Data Pitch call
The Data Pitch call
 
The business of open data
The business of open dataThe business of open data
The business of open data
 
Open data – are we done
Open data – are we doneOpen data – are we done
Open data – are we done
 

Recently uploaded

1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
dq9vz1isj
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
Amil baba
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
cyebo
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
RafigAliyev2
 
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat ViagraToko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
adet6151
 
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一
hwhqz6r1y
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
 

Recently uploaded (20)

How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat ViagraToko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 

The web of data: how are we doing so far?

  • 1. The web of data: how are we doing so far? E L E N A S I M P E R L K I N G ’ S C O L L E G E LO N D O N @ E S I M P E R L THE WEB CONFERENCE, APRIL 2021
  • 2.
  • 3. The web has shaped our understanding and interactions with data
  • 11. The theory and practice of the web of data are different We are living through a crucial moment in how data is published and used on the web
  • 13. European Data Portal Technology, resources and support to increase the value of European open government data
  • 14. Highlights of our work Supporting the entire data value chain from publishing to reuse
  • 15. Low uptake of linked data, limited vocabulary reuse, proprietary, non-dereferenceable vocabularies, reasonable metadata quality
  • 16. Content metadata published as linked data, joint data model, data sharing framework, Europeana identifiers
  • 17. 25 million datasets (DCAT, schema.org) in summer 2020
  • 18. There is a lot of annotated data online, especially about products, people and businesses
  • 19. Making portals more user-centric Walker & Simperl, 2017
  • 20. The ten guidelines Organise for use of the datasets - rather than simply for publication Promote use through data storytelling and community building, borrowing from open-source communities and other peer-production systems Invest in discoverability best practices, borrowing from e-commerce and web search Publish good quality metadata - to enhance reuse Adopt standards to ensure interoperability Co-locate tools so that a wider range of users can be engaged with Link datasets to enhance value Be accessible by offering options for from APIs to CSV downloads Co-locate documentation - users should not need to be domain experts to understand the data; Be measurable - as a way to assess how well they are meeting users’ needs.
  • 21. Operationalising the guidance Literature review to develop 5* schemes to operationalise indicators. Application of the schemes on 10 open data portals at different maturity level. (Walker & Simperl, 2017)
  • 22. Example: Organise for use Each dataset is accompanied by a comprehensive descriptive record (going beyond a collection of structured metadata) An extract of the data can be previewed (for sense making) The portal provides recommendations for related datasets The portal enables users to review/rate the datasets Keywords from datasets are linked to other published datasets
  • 23. Example: Promote for use The portal is connected with social media to create a social distribution channel for open data. The portal provides users with online support for feedback, to request/suggest the publication of new datasets, and when problems arise during use (e.g. contact form, discussion forum, FAQs, helpdesk, search tips, tutorials, demos). The portal provides a way for users to keep informed of updates to the data (e.g. news feed). Datasets are accompanied by links or resources that provide user guidance and support. Examples of reuse (fictitious or real) are provided (e.g. information contributed by other users, last reuse, best reuse, data stories).
  • 24. Example: Co-locate documentation Supporting documentation does not exist. Supporting documentation exists, but as a document found separately from the data. Supporting documentation is found at the same time as the data (e.g. the link to the document is next to the link to the data in the search). Supporting documentation can be immediately accessed from within the dataset but it is not context sensitive (e.g. a link to the documentation or text contained within the dataset). Supporting documentation can be immediately accessed from within the dataset and it is context sensitive so that users can immediately access information about a specific item of concern (e.g. a link to a specific point in the documentation or the text contained within the dataset).
  • 25. Varying open data maturity levels Be discoverable , co-locate documentation, be measurable are universally challenging
  • 26. A lot of guidance available already
  • 27. Is there any evidence that it works?
  • 28. Be measurable: GitHub as a data platform ~1.4 million datasets (e.g. CSV, excel) from ~65K repos Map literature features to both dataset and repository features Use engagement metrics as proxies for data reuse Train a predictive model to see what publishing guidance leads to higher engagement values Size Attributes Age Quality Documentation Reviews
  • 29. Recommendations for publishers Co-locate documentation: ◦ Informative, short text about the dataset ◦ Comprehensive README file in a structured form, links to further information Co-locate tools: ◦ Standard processable file sizes for dataset distributions ◦ Openable with a standard configuration of a common library (such as Pandas)
  • 30. Can people find the data they need?
  • 31. Analysis of logs and data requests (2018) • Four national open government data portals, 2.2 million queries (2013 – 2016), 1500 data requests. • Shorter queries, include temporal and location information. • Explorative search. • Native and external queries topically different. • Data requests offer more context to user intent.
  • 32. Analysis of logs (2020 - 21) 844k sessions from 04/2018 to 06/2020, web search as well as native search sessions from the European Data Portal Location, provenance, format, licence, time frame and date, publishing date, location of publication and data schema Mostly web search, web search and native search users have different information needs and different success rates Dataset preview page is important in web search Linking to stories and other content helps with traffic
  • 33. Recommendations for publishers Two types of users Spatial and temporal queries Result presentation Quality reviews Data stories More logs needed!
  • 34. Data documentation and sensemaking practices
  • 35. (Source: Gregory et al., 2020) Data work is teamwork
  • 36. Open approaches and standards work best when solving actual problems. These problems are rarely about a set of technologies.
  • 37. Conclusions We are at a crucial moment in data availability and use, online and elsewhere There is an increasing body of evidence about what people’s data needs and about how data is published on the web We don’t have links and we don’t always have great business cases for creating and maintaining them on the open, decentralised web. In fact, we need better models to resource data publishing all together There are other data modalities e.g. charts which web technologies can help share responsibly
  • 38. Metadata vocabularies used where there is a clear business case More documentation needed to make data useful for others
  • 39. Some data is missing, with serious consequences
  • 40. Charts as alternative to ‘raw’ data. Where are the links to the data?
  • 41. Thank you Talking Datasets: understanding data sensemaking behaviours. L Koesten, K Gregory, P Groth, E Simperl. International Journal of Human-Computer Studies. 146:102562. 2021 Everything You Always Wanted to Know about a Dataset: Studies in Data Summarisation. L Koesten, E Simperl, E Kacprzak, T Blount, J Tennison. International Journal of Human-Computer Studies. 2019 Collaborative Practices with Structured Data: Do Tools Support what Users Need? L Koesten, E Kacprzak, E Simperl, J Tennison; ACM CHI Conference on Human Factors in Computing Systems, CHI 2019. Dataset search: a survey. A Chapman, E Simperl, L Koesten, G Konstantinidis, LD Ibáñez, E Kacprzak, P Groth. The International Journal on Very Large Data Bases, 2019. Characterising dataset search — An analysis of search logs and data requests. E Kacprzak, L Koesten, LD Ibáñez, T Blount, J Tennison, E Simperl; Journal of Web Semantics, 2018 Making sense of numerical data-semantic labelling of web tables. Kacprzak, E., Giménez-García, J.M., Piscopo, A., Koesten, L., Ibáñez, L.D., Tennison, J. and Simperl, E. In European Knowledge Acquisition Workshop (pp. 163- 178). Springer, 2018 The Trials and Tribulations of Working with Structured Data - a Study on Information Seeking Behaviour. L Koesten, E Kacprzak, J Tennison, E Simperl. Proceedings of ACM CHI Conference on Human Factors in Computing Systems, CHI 2017 Dataset Reuse: Toward Translating Principles to Practice. L Koesten, P Vougiouklis, E Simperl, P Groth - Patterns, 2020 Characterising Dataset Search on the European Data Portal . L Ibáñez, L Koesten, E Kacprzak, E Simperl. European Data Portal Analytical Report 18, 2020 Understanding Supply and Demand on the European Data Portal. L Ibáñez, E Simperl. European Data Portal Analytical Report 19, 2020 The Future of Open Data Portals. J Walker, E Simperl. European Data Portal Analytical Report 8, 2017 Smart Rural: The Open Data Gap. J Walker, G Thuermer, E Simperl, L Carr. Data for Policy, 2020