Max Schmachtenberg 
Christian Bizer 
Heiko Paulheim 
Adoption of the Linked Data Best Practices 
in Different Topical Domains 
Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 1
The Linked Data Best Practices 
Central idea of Linked Data: Ease data discovery and 
integration by complying with a set of best practices. 
1. Linking Best Practices 
• Set RDF links pointing at instances in other data sources. 
2. Vocabulary Best Practices 
• Reuse terms from widely-used vocabularies. 
• Make definitions of proprietary terms dereferenceable. 
• Link vocabulary terms to terms in other vocabularies. 
3. Metadata Best Practices 
• Publish machine-readable provenance and licensing metadata. 
• Publish metadata about alternative access methods (SPARQL, dumps). 
State of the LOD Cloud Report - 2011 
 http://lod-cloud.net/state/ 
 Based on information provided by dataset publishers via the datahub.io catalog 
LOD Cloud - 2011 
Consists of 295 datasets. 
Outline 
Goal: Update the State of the LOD Cloud report 
and LOD Cloud itself to 2014. 
1. Methodology 
2. Adoption of the Linking Best Practices 
3. Adoption of the Vocabulary Best Practices 
4. Adoption of the Metadata Best Practices 
5. Conclusions (in Relation to Schema.org) 
1. Methodology: Crawl of the Linked Data Web 
 Crawler: LDSpider, Crawl Date: April 2014 
 Seeds: 560,000 seed URIs from 
1. Example URIs in datahub.io catalog 
2. URIs from BTC2012 dataset 
3. URIs from datasets advertised on public-lod@w3.org mailing list 
 Crawled Data Corpus 
• 900,000 documents containing 
• 8,038,000 resources 
• 1014 datasets 
• 77 datasets prevent crawling via robots.txt 
• Figure: distribution of documents (red line) and resources (blue line) by dataset 
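The robots.txt exclusion mentioned above can be checked before a URI is fetched. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the user-agent name `example-crawler`, and the `example.org` URIs are made up for illustration, and this is not a description of how LDSpider itself handles robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /data/
"""

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

blocked = may_crawl(ROBOTS_TXT, "example-crawler", "http://example.org/data/resource/1")
allowed = may_crawl(ROBOTS_TXT, "example-crawler", "http://example.org/about")
```

A dataset whose robots.txt disallows all of its resource URIs would end up without crawled documents, which is one plausible reading of "prevent crawling" here.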
Categorization by Topical Domain 
 Used categorization from datahub.io for existing datasets. 
 Manually categorized remaining datasets. 
 Added new category Social Networking 
 Growth without new category Social Networking: 94 % 
 LODstats (http://stats.lod2.eu/) discovered a similar number of datasets: 1048 
2. Adoption of the Linking Best Practices 
Data publishers should set RDF links because: 
1. Discoverability depends on being linked. 
2. RDF links ease data integration. 
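To make the notion of an RDF link concrete, here is a minimal sketch that flags triples whose subject and object belong to different datasets. Approximating a dataset by the host name of a URI is an assumption made for this example, and the `a.example` / `b.example` URIs are invented; the study's own dataset assignment may differ.

```python
from urllib.parse import urlparse

def dataset_of(uri: str) -> str:
    """Assumption: approximate a dataset by the host name of the URI."""
    return urlparse(uri).netloc.lower()

def is_rdf_link(subject: str, obj: str) -> bool:
    """Treat a triple as an RDF link if subject and object are in different datasets."""
    return dataset_of(subject) != dataset_of(obj)

# Hypothetical triples as (subject, predicate, object) URI strings.
triples = [
    ("http://a.example/id/berlin",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://b.example/resource/Berlin"),
    ("http://a.example/id/berlin",
     "http://a.example/vocab/partOf",
     "http://a.example/id/germany"),
]

links = [t for t in triples if is_rdf_link(t[0], t[2])]
```

Only the first triple crosses a dataset boundary. A fuller analysis would likely also need to separate instance-level links from references to vocabulary terms.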
Degrees 
 56% of all datasets set RDF links pointing to other datasets. 
• The remaining 44% are either only the target of RDF links from other 
datasets or are isolated. 
 Datasets with Top In- and Outdegrees: 
 Most widely used linking predicates: owl:sameAs, rdfs:seeAlso, foaf:knows 
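The degree figures above can be derived from a list of dataset-level links. The sketch below uses a made-up toy graph (datasets `A` to `E`) and counts, per dataset, how many distinct other datasets it links to (outdegree) and is linked from (indegree); it is an illustration, not the authors' analysis code.

```python
from collections import defaultdict

# Hypothetical dataset-level links as (source dataset, target dataset).
links = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C")]
datasets = {"A", "B", "C", "D", "E"}  # "E" is isolated

outdegree = defaultdict(set)  # dataset -> datasets it links to
indegree = defaultdict(set)   # dataset -> datasets linking to it
for source, target in links:
    if source != target:      # ignore links within a dataset
        outdegree[source].add(target)
        indegree[target].add(source)

# Share of datasets that set at least one RDF link to another dataset.
linking_share = sum(1 for d in datasets if outdegree.get(d)) / len(datasets)
```

In this toy graph three of five datasets set outgoing links, `C` is only a link target, and `E` is isolated, which mirrors the split between linking and non-linking datasets described above.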
“Crawlable” LOD Cloud 2014 
Degree Distributions 
 Dotted line: Social Networking (status.net, etc.) 
 Solid line: Cross-Domain datasets (DBpedia, etc.) 
 Largest Strongly Connected Component: 36% (377 datasets) 
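As an illustration of what a strongly connected component is, the following sketch finds the largest one in a small directed graph using Kosaraju's two-pass algorithm. The graph is invented, and the choice of algorithm is an assumption; the slides do not say how the component was computed.

```python
from collections import defaultdict

def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm: return a list of node sets, one per component."""
    graph, reverse = defaultdict(list), defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
        reverse[target].append(source)

    # Pass 1: record nodes in order of depth-first completion.
    visited, order = set(), []
    def visit(node):
        visited.add(node)
        for nxt in graph[node]:
            if nxt not in visited:
                visit(nxt)
        order.append(node)
    for node in nodes:
        if node not in visited:
            visit(node)

    # Pass 2: depth-first search on the reversed graph,
    # starting from the nodes that finished last in pass 1.
    assigned, components = set(), []
    def collect(node, component):
        assigned.add(node)
        component.add(node)
        for prev in reverse[node]:
            if prev not in assigned:
                collect(prev, component)
    for node in reversed(order):
        if node not in assigned:
            component = set()
            collect(node, component)
            components.append(component)
    return components

# Hypothetical dataset graph: A, B, C link in a cycle; D is only a target; E is isolated.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
largest = max(strongly_connected_components(nodes, edges), key=len)
share = len(largest) / len(nodes)
```

The recursive search is fine for a toy graph; for a graph with very long link chains an iterative version would avoid Python's recursion limit.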
Conclusion concerning Linking Best Practices 
 Some datasets put a lot of effort into linking. 
 Many datasets only link to a small number of other datasets 
or do not set RDF links at all. 
 Similar situation as in 2011. 
3. Adoption of the Vocabulary Best Practices 
Goal: Help applications understand the data by 
1. Reusing terms from widely-used vocabularies. 
2. Making definitions of proprietary terms 
dereferenceable. 
3. Linking vocabulary terms to terms in other 
vocabularies. 
Widely-Used and Proprietary Vocabularies 
 Strong agreement on some vocabularies. 
 Proprietary vocabularies are used in addition to common ones, 
as data is often very specific. 
Widely-Used Vocabularies 
Proprietary Vocabularies 
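Vocabulary usage across datasets can be tallied by mapping each term URI to its namespace and counting every vocabulary once per dataset. This is a minimal sketch: the namespace rule (cut at the last `#` or `/`) is a common heuristic assumed here, and the datasets and the `dataset3.example` vocabulary are invented.

```python
from collections import Counter

def namespace_of(term_uri: str) -> str:
    """Assumption: the vocabulary namespace is everything up to the last '#' or '/'."""
    cut = max(term_uri.rfind("#"), term_uri.rfind("/"))
    return term_uri[: cut + 1]

# Hypothetical terms observed per dataset.
terms_by_dataset = {
    "dataset1": ["http://xmlns.com/foaf/0.1/name", "http://purl.org/dc/terms/title"],
    "dataset2": ["http://xmlns.com/foaf/0.1/knows"],
    "dataset3": ["http://dataset3.example/vocab#price", "http://purl.org/dc/terms/creator"],
}

usage = Counter()
for terms in terms_by_dataset.values():
    # A set ensures each vocabulary is counted once per dataset.
    usage.update({namespace_of(t) for t in terms})

# Fraction of datasets using each vocabulary.
share = {ns: count / len(terms_by_dataset) for ns, count in usage.items()}
```

In this toy data, FOAF and Dublin Core Terms are each used by two of three datasets, while the `dataset3.example` vocabulary is used by one dataset only, the pattern one would expect of a proprietary vocabulary.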
Dereferenceability of Term URIs and Vocabulary Linking 
 28% of the proprietary vocabularies provide dereferenceable URIs. 
 21% set RDF links to other vocabularies (8% in 2011) 
• Popular linking predicates: rdfs:range, rdfs:subClassOf 
4. Adoption of the Metadata Best Practices 
1. Publish machine-readable provenance information. 
2. Publish machine-readable licensing information. 
3. Publish metadata about alternative access methods 
(SPARQL endpoints, RDF dumps) 
Provenance and Licensing Metadata 
 37% of the datasets provide provenance information 
• Dublin Core is used more than W3C Prov 
 10% provide machine-readable licensing information 
• Most used predicates: dc:license, cc:license 
Dataset Level Metadata (VoID) 
 15% of the datasets publish VoID descriptions. 
 Via these descriptions, it is possible to discover SPARQL 
endpoints and dumps for about 10% of the data sources. 
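Discovering alternative access methods from a VoID description amounts to looking for the `void:sparqlEndpoint` and `void:dataDump` properties. The sketch below reads them from a small N-Triples snippet with a simple regular expression; the `example.org` description is invented, and the regex only handles triples whose three terms are all URIs, so it is not a full N-Triples parser.

```python
import re

VOID = "http://rdfs.org/ns/void#"
# Matches only triples of the form <s> <p> <o> . (no literals or blank nodes).
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")

# Hypothetical VoID description in N-Triples.
VOID_NT = """\
<http://example.org/void#ds> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/ns/void#Dataset> .
<http://example.org/void#ds> <http://rdfs.org/ns/void#sparqlEndpoint> <http://example.org/sparql> .
<http://example.org/void#ds> <http://rdfs.org/ns/void#dataDump> <http://example.org/dump.nt.gz> .
"""

def access_methods(ntriples: str) -> dict:
    """Collect SPARQL endpoint and dump URIs advertised via VoID."""
    found = {"sparqlEndpoint": [], "dataDump": []}
    for line in ntriples.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip anything that is not a URI-only triple
        _, predicate, obj = match.groups()
        for name in found:
            if predicate == VOID + name:
                found[name].append(obj)
    return found

methods = access_methods(VOID_NT)
```

An application could try the listed endpoint or dump first and fall back to crawling when a dataset publishes no VoID description.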
Conclusion concerning Metadata Best Practices 
 Applications cannot rely on the availability of metadata, 
as only a small fraction of all data sources publish such data. 
 The Government and Library domains are positive exceptions. 
 Similarly low numbers as in 2011. 
“Full” LOD Cloud Diagram 
570 datasets 
 374 from datahub.io 
 196 from our crawl 
http://lod-cloud.net/ 
Growth of the “Full” LOD Cloud Diagram 
 2011: 295 datasets 
 2014: 570 datasets (+ 93 %) 
http://lod-cloud.net/ 
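The growth figure follows directly from the two dataset counts given above; a short check:

```python
datasets_2011 = 295
datasets_2014 = 570

# Relative growth of the diagram between 2011 and 2014, in percent.
growth_percent = (datasets_2014 - datasets_2011) / datasets_2011 * 100
print(f"{growth_percent:.1f} %")  # prints "93.2 %"
```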
Comparison of Linked Data and Schema.org 
Schema.org 
1. does not expect data publishers to set data links. 
2. relies on marking up data in HTML pages. 
3. has strong application pull from Google, Microsoft, Yahoo! 
Adoption 
WebDataCommons, 2013*: 
463,000 websites (PLDs) provide Microdata annotations. 
Google, 2014**: 
5 million websites provide Schema.org data. 
 Orders of magnitude more Schema.org data sources. 
* WebDataCommons extracts Microdata, RDFa, Microformat data 
from the CommonCrawl (2.2 billion HTML pages from 12.8 million PLDs). 
** Guha in LDOW2014 Keynote 
Schema.org Topical Focus 
Different topics compared to Linked Data. 
Class / Property Distribution 
Microdata 2012 
 Only a small set of classes / properties is actually used. 
 Less variety compared to Linked Data. 
Shallowness of the Schema.org Data 
schema:Product 
Product Names 
• AppleMacBook Air MC968/A 11.6-Inch Laptop 
• Apple MacBook Air 11-in, Intel Core i5 1.60GHz, 64 GB, Lion 10.7 
schema:JobPosting 
JobPostings 
• More specific properties like skills are hardly used. 
• 57% of all hiringOrganizations are strings, not instances. 
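The difference between a string and an instance for `hiringOrganization` can be shown with two job postings. In this sketch the postings are invented and written as Python dictionaries for brevity (the data in the study was extracted from Microdata markup); a nested object counts as an instance, a plain string does not.

```python
def is_instance(value) -> bool:
    """A nested object describes an entity; a plain string is only a name."""
    return isinstance(value, dict)

# Hypothetical extracted job postings.
postings = [
    {"@type": "JobPosting", "title": "Data Engineer",
     "hiringOrganization": "Example Corp"},
    {"@type": "JobPosting", "title": "Librarian",
     "hiringOrganization": {"@type": "Organization",
                            "name": "Example Library",
                            "url": "http://library.example/"}},
]

# Share of postings whose hiring organization is only a string.
string_share = sum(
    1 for p in postings if not is_instance(p["hiringOrganization"])
) / len(postings)
```

With a string value, matching two postings to the same employer requires comparing names, whereas an instance can carry further properties such as a URL that make the match easier.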
Conclusion 
Linked Data 
• ~ 1,000 sources 
• covers wider range of specific topics (government, libraries, science) 
• contains more complex data structures 
• partial ontology agreement 
• identity resolution eased by RDF links 
Schema.org 
• > 460,000 sources 
• topics focused on search engines (products, organizations) 
• very simple and shallow data structures 
• strong ontology agreement 
• identity resolution often requires value parsing 
Thank you. 
References 
 Report 
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/ 
 Catalog 
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/ 
Acknowledgement 
 This work was supported by 
Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 28

More Related Content

What's hot

The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
Martin Hepp
 
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
Data Beers
 
Extending Tables with Data from over a Million Websites
 Extending Tables with Data from over a Million Websites Extending Tables with Data from over a Million Websites
Extending Tables with Data from over a Million Websites
Chris Bizer
 
2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar
Open Analytics
 
Cenitpede: Analyzing Webcrawl
Cenitpede: Analyzing WebcrawlCenitpede: Analyzing Webcrawl
Cenitpede: Analyzing Webcrawl
Primal Pappachan
 
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Informationballoon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
Kai Schlegel
 
Industry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraftIndustry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraft
RuleML
 
Internet in space - Networkshop44
Internet in space - Networkshop44Internet in space - Networkshop44
Internet in space - Networkshop44
Jisc
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
Chiara Del Vescovo
 
SomeSlides
SomeSlidesSomeSlides
SomeSlides
guestd60742
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
RDTF-Discovery
 
Grid Computing July 2009
Grid Computing July 2009Grid Computing July 2009
Grid Computing July 2009
Ian Foster
 
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by GlobusHealth Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
Globus
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
petrknoth
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
petrknoth
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
EUCLID project
 
Unlocking the full potential of five-star addresses by using Linked Data Frag...
Unlocking the full potential of five-star addresses by using Linked Data Frag...Unlocking the full potential of five-star addresses by using Linked Data Frag...
Unlocking the full potential of five-star addresses by using Linked Data Frag...
Raf Buyle
 
ResourceSync Tutorial
ResourceSync TutorialResourceSync Tutorial
ResourceSync Tutorial
Open Archives Initiative
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
Ontotext
 
Mapping the Repository Landscape
Mapping the Repository LandscapeMapping the Repository Landscape
Mapping the Repository Landscape
EDINA, University of Edinburgh
 

What's hot (20)

The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
The Semantic Web – A Vision Come True, or Giving Up the Great Plan?
 
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data ap...
 
Extending Tables with Data from over a Million Websites
 Extending Tables with Data from over a Million Websites Extending Tables with Data from over a Million Websites
Extending Tables with Data from over a Million Websites
 
2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar2013 open analytics-meetup-mortar
2013 open analytics-meetup-mortar
 
Cenitpede: Analyzing Webcrawl
Cenitpede: Analyzing WebcrawlCenitpede: Analyzing Webcrawl
Cenitpede: Analyzing Webcrawl
 
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Informationballoon Fusion: SPARQL Rewriting Based on  Unified Co-Reference Information
balloon Fusion: SPARQL Rewriting Based on Unified Co-Reference Information
 
Industry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraftIndustry@RuleML2015 DataGraft
Industry@RuleML2015 DataGraft
 
Internet in space - Networkshop44
Internet in space - Networkshop44Internet in space - Networkshop44
Internet in space - Networkshop44
 
Documents, services, and data on the web
Documents, services, and data on the webDocuments, services, and data on the web
Documents, services, and data on the web
 
SomeSlides
SomeSlidesSomeSlides
SomeSlides
 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
 
Grid Computing July 2009
Grid Computing July 2009Grid Computing July 2009
Grid Computing July 2009
 
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by GlobusHealth Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Unlocking the full potential of five-star addresses by using Linked Data Frag...
Unlocking the full potential of five-star addresses by using Linked Data Frag...Unlocking the full potential of five-star addresses by using Linked Data Frag...
Unlocking the full potential of five-star addresses by using Linked Data Frag...
 
ResourceSync Tutorial
ResourceSync TutorialResourceSync Tutorial
ResourceSync Tutorial
 
The Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open DataThe Power of Semantic Technologies to Explore Linked Open Data
The Power of Semantic Technologies to Explore Linked Open Data
 
Mapping the Repository Landscape
Mapping the Repository LandscapeMapping the Repository Landscape
Mapping the Repository Landscape
 

Similar to Adoption of the Linked Data Best Practices in Different Topical Domains

Linked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for EntrepreneursLinked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for Entrepreneurs
3 Round Stones
 
Linked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter HaaseLinked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter Haase
Laboratory of Information Science and Semantic Technologies
 
EPA OEI Linked Data Process
EPA OEI Linked Data ProcessEPA OEI Linked Data Process
EPA OEI Linked Data Process
3 Round Stones
 
THOR Workshop - Data Publishing Elsevier
THOR Workshop - Data Publishing ElsevierTHOR Workshop - Data Publishing Elsevier
THOR Workshop - Data Publishing Elsevier
Maaike Duine
 
Lecture20
Lecture20Lecture20
Lecture20
srikanthhadoop
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
Vivien Bonazzi
 
Llinked open data training for EU institutions
Llinked open data training for EU institutionsLlinked open data training for EU institutions
Llinked open data training for EU institutions
Open Data Support
 
Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)
aaroncollie
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
Vivien Bonazzi
 
Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
Michael Hausenblas
 
Bourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication RepositoriesBourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication Repositories
ASIS&T
 
Linked Open Data_mlanet13
Linked Open Data_mlanet13Linked Open Data_mlanet13
Linked Open Data_mlanet13
Kristi Holmes
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the Software
IMC Technologies
 
The Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataThe Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big Data
Philip Bourne
 
2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning Workshop2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning Workshop
Lizzy_Rolando
 
Exposing Hidden Relationships: Practical Work in Linked Data using Digital Co...
Exposing Hidden Relationships: Practical Work in Linked Data using Digital Co...Exposing Hidden Relationships: Practical Work in Linked Data using Digital Co...
Exposing Hidden Relationships: Practical Work in Linked Data using Digital Co...
Cory Lampert
 
Webinar@AIMS: LODE-BD
Webinar@AIMS: LODE-BDWebinar@AIMS: LODE-BD
The Linked Data Lifecycle
The Linked Data LifecycleThe Linked Data Lifecycle
The Linked Data Lifecycle
geoknow
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
Anja Jentzsch
 
Using Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case studyUsing Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case study
Leila Zemmouchi-Ghomari
 

Similar to Adoption of the Linked Data Best Practices in Different Topical Domains (20)

Linked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for EntrepreneursLinked Data: Opportunities for Entrepreneurs
Linked Data: Opportunities for Entrepreneurs
 
Linked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter HaaseLinked Data and Semantic Web Application Development by Peter Haase
Linked Data and Semantic Web Application Development by Peter Haase
 
EPA OEI Linked Data Process
EPA OEI Linked Data ProcessEPA OEI Linked Data Process
EPA OEI Linked Data Process
 
THOR Workshop - Data Publishing Elsevier
THOR Workshop - Data Publishing ElsevierTHOR Workshop - Data Publishing Elsevier
THOR Workshop - Data Publishing Elsevier
 
Lecture20
Lecture20Lecture20
Lecture20
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 
Llinked open data training for EU institutions
Llinked open data training for EU institutionsLlinked open data training for EU institutions
Llinked open data training for EU institutions
 
Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
 
Linked data life cycles
Linked data life cyclesLinked data life cycles
Linked data life cycles
 
Bourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication RepositoriesBourne RDAP11 Data Publication Repositories
Bourne RDAP11 Data Publication Repositories
 
Linked Open Data_mlanet13
Linked Open Data_mlanet13Linked Open Data_mlanet13
Linked Open Data_mlanet13
 
Linked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the SoftwareLinked Data for the Masses: The approach and the Software
Linked Data for the Masses: The approach and the Software
 
The Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big DataThe Commons: Leveraging the Power of the Cloud for Big Data
The Commons: Leveraging the Power of the Cloud for Big Data
 
2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning Workshop2012 Fall Data Management Planning Workshop
2012 Fall Data Management Planning Workshop
 
Exposing Hidden Relationships: Practical Work in Linked Data using Digital Co...
Exposing Hidden Relationships: Practical Work in Linked Data using Digital Co...Exposing Hidden Relationships: Practical Work in Linked Data using Digital Co...
Exposing Hidden Relationships: Practical Work in Linked Data using Digital Co...
 
Webinar@AIMS: LODE-BD
Webinar@AIMS: LODE-BDWebinar@AIMS: LODE-BD
Webinar@AIMS: LODE-BD
 
The Linked Data Lifecycle
The Linked Data LifecycleThe Linked Data Lifecycle
The Linked Data Lifecycle
 
Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)Linked Data (1st Linked Data Meetup Malmö)
Linked Data (1st Linked Data Meetup Malmö)
 
Using Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case studyUsing Linked Data Resources to generate web pages based on a BBC case study
Using Linked Data Resources to generate web pages based on a BBC case study
 

More from Chris Bizer

GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
Chris Bizer
 
Integrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning TechniquesIntegrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning Techniques
Chris Bizer
 
Using the Semantic Web as Training Data for Product Matching
Using the Semantic Web as Training Data for Product MatchingUsing the Semantic Web as Training Data for Product Matching
Using the Semantic Web as Training Data for Product Matching
Chris Bizer
 
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open WebJIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
Chris Bizer
 
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Chris Bizer
 
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Chris Bizer
 
Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)
Chris Bizer
 
Exploring the Application Potential of Relational Web Tables
Exploring the Application Potential of Relational Web TablesExploring the Application Potential of Relational Web Tables
Exploring the Application Potential of Relational Web Tables
Chris Bizer
 
Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications. Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications.
Chris Bizer
 

More from Chris Bizer (9)

GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
GPT4 versus BERT: Which Foundation Model is better for Web Data Integration?
 
Integrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning TechniquesIntegrating Product Data from the Semantic Web using Deep Learning Techniques
Integrating Product Data from the Semantic Web using Deep Learning Techniques
 
Using the Semantic Web as Training Data for Product Matching
Using the Semantic Web as Training Data for Product MatchingUsing the Semantic Web as Training Data for Product Matching
Using the Semantic Web as Training Data for Product Matching
 
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open WebJIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
JIST2019 Keynote: Completing Knowledge Graphs using Data from the Open Web
 
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
Schema.org Annotations and Web Tables: Underexploited Semantic Nuggets on the...
 
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
Is the Semantic Web what we expected? Adoption Patterns and Content-driven Ch...
 
Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)Data Search and Search Joins (Universität Heidelberg 2015)
Data Search and Search Joins (Universität Heidelberg 2015)
 
Exploring the Application Potential of Relational Web Tables
Exploring the Application Potential of Relational Web TablesExploring the Application Potential of Relational Web Tables
Exploring the Application Potential of Relational Web Tables
 
Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications. Evolving the Web into a Global Database - Advances and Applications.
Evolving the Web into a Global Database - Advances and Applications.
 

Adoption of the Linked Data Best Practices in Different Topical Domains

  • 1. Max Schmachtenberg, Christian Bizer, Heiko Paulheim
       Adoption of the Linked Data Best Practices in Different Topical Domains
       Schmachtenberg, Bizer, Paulheim: Adoption of the Linked Data Best Practices, 23.10.2014 Slide 1
  • 2. The Linked Data Best Practices
       Central idea of Linked Data: ease data discovery and integration by complying with a set of best practices.
       1. Linking Best Practices
          • Set RDF links pointing at instances in other data sources.
       2. Vocabulary Best Practices
          • Reuse terms from widely-used vocabularies.
          • Make definitions of proprietary terms dereferenceable.
          • Link vocabulary terms to terms in other vocabularies.
       3. Metadata Best Practices
          • Publish machine-readable provenance and licensing metadata.
          • Publish metadata about alternative access methods (SPARQL, dumps).
  • 3. State of the LOD Cloud Report - 2011
       • http://lod-cloud.net/state/
       • Based on information provided by dataset publishers via the datahub.io catalog.
  • 4. LOD Cloud - 2011
       Consists of 295 datasets.
  • 5. Outline
       Goal: Update the State of the LOD Cloud report and the LOD Cloud itself to 2014.
       1. Methodology
       2. Adoption of the Linking Best Practices
       3. Adoption of the Vocabulary Best Practices
       4. Adoption of the Metadata Best Practices
       5. Conclusions (in Relation to Schema.org)
  • 6. 1. Methodology: Crawl of the Linked Data Web
       • Crawler: LDSpider; crawl date: April 2014
       • Seeds: 560,000 seed URIs from
          1. Example URIs in the datahub.io catalog
          2. URIs from the BTC2012 dataset
          3. URIs from datasets advertised on the public-lod@w3.org mailing list
       • Crawled data corpus
          • 900,000 documents containing 8,038,000 resources
          • 1014 datasets
          • 77 datasets prevent crawling via robots.txt
       • Distribution by dataset (figure: red line = documents, blue line = resources)
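The robots.txt restriction mentioned on this slide can be illustrated with a minimal sketch that uses Python's standard urllib.robotparser module. The robots.txt content, the user-agent string "ldspider", and the example.org URLs are made up for illustration. This is not a description of how LDSpider itself performs the check.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks every crawler from /resource/.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /resource/
"""

# Hypothetical user agent and URLs, used only to exercise the function.
blocked = may_crawl(EXAMPLE_ROBOTS, "ldspider", "http://example.org/resource/Berlin")
allowed = may_crawl(EXAMPLE_ROBOTS, "ldspider", "http://example.org/about")
```

A dataset whose resource URIs all fall under a disallowed path would be skipped by a polite crawler and so would be missing from a crawl-based corpus.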
  • 7. Categorization by Topical Domain
       • Used the categorization from datahub.io for existing datasets.
       • Manually categorized the remaining datasets.
       • Added the new category Social Networking.
       • Growth without the new category Social Networking: 94%.
       • LODstats (http://stats.lod2.eu/) discovered a similar number of datasets: 1048.
  • 8. 2. Adoption of the Linking Best Practices
       Data publishers should set RDF links because:
       1. Discoverability depends on being linked.
       2. RDF links ease data integration.
  • 9. Degrees
       • 56% of all datasets set RDF links pointing to other datasets.
          • The remaining 44% are either only the target of RDF links from other datasets or are isolated.
       • Datasets with top in- and outdegrees (table on slide).
       • Most widely used linking predicates: owl:sameAs, rdfs:seeAlso, foaf:knows
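The degree figures on this slide come from treating datasets as nodes and RDF links between datasets as directed edges. A minimal sketch of that bookkeeping follows, assuming the links have already been reduced to (source dataset, target dataset) pairs. The dataset names A to D and the function names are hypothetical, and the slides do not say how the authors computed their numbers.

```python
def dataset_degrees(datasets, links):
    """Return {dataset: (indegree, outdegree)}, counting distinct linked datasets."""
    incoming = {d: set() for d in datasets}
    outgoing = {d: set() for d in datasets}
    for source, target in links:
        if source == target:
            continue  # links inside a single dataset are not counted here
        outgoing.setdefault(source, set()).add(target)
        incoming.setdefault(target, set()).add(source)
    names = set(incoming) | set(outgoing)
    return {
        d: (len(incoming.get(d, ())), len(outgoing.get(d, ())))
        for d in names
    }


def share_with_outlinks(degrees):
    """Fraction of datasets that set at least one link to another dataset."""
    linking = sum(1 for _, outdegree in degrees.values() if outdegree > 0)
    return linking / len(degrees)


# Hypothetical toy graph: D is isolated, C is only a link target.
degrees = dataset_degrees(
    ["A", "B", "C", "D"],
    [("A", "B"), ("A", "C"), ("B", "A"), ("A", "B")],
)
share = share_with_outlinks(degrees)
```

In the toy graph, two of the four datasets set outgoing links, which is the same kind of ratio as the 56% reported on the slide.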
  • 10. “Crawlable” LOD Cloud 2014
  • 11. Degree Distributions
       • Dotted line: Social Networking (status.net, etc.)
       • Solid line: Cross-Domain datasets (DBpedia, etc.)
       • Largest Strongly Connected Component: 36% (377 datasets)
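A strongly connected component is a set of datasets in which every dataset can reach every other one by following links. The sketch below finds the largest such component with Kosaraju's two-pass algorithm. That algorithm is chosen here only for illustration, since the slides do not state which method the authors used, and the graph is a hypothetical toy example.

```python
def largest_scc(graph):
    """Return the largest strongly connected component of a directed graph.

    graph maps each node to an iterable of its successor nodes.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Pass 1: iterative depth-first search, recording nodes as they finish.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                order.append(node)  # all successors handled
                stack.pop()

    # Reverse every edge.
    reverse = {n: [] for n in nodes}
    for source, targets in graph.items():
        for target in targets:
            reverse[target].append(source)

    # Pass 2: sweep the reversed graph in decreasing finish order.
    best, assigned = set(), set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = {start}, [start]
        while todo:
            node = todo.pop()
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    todo.append(prev)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy graph: A, B and C link in a cycle, D is only a target.
toy = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
component = largest_scc(toy)
```

Dividing the size of the returned component by the number of datasets gives a share comparable to the one quoted on the slide.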
  • 12. Conclusion concerning Linking Best Practices
       • Some datasets put a lot of effort into linking.
       • Many datasets only link to a small number of other datasets or do not set RDF links at all.
       • Similar situation as in 2011.
  • 13. 3. Adoption of the Vocabulary Best Practices
       Goal: Help applications understand the data by
       1. Reusing terms from widely-used vocabularies.
       2. Making definitions of proprietary terms dereferenceable.
       3. Linking vocabulary terms to terms in other vocabularies.
  • 14. Widely-Used and Proprietary Vocabularies
       • Strong agreement on some vocabularies.
       • Proprietary vocabularies are used in addition to common ones, as data is often very specific.
       • Tables on slide: Widely-Used Vocabularies, Proprietary Vocabularies.
  • 15. Dereferenceability of Term URIs and Vocabulary Linking
       • 28% of the proprietary vocabularies provide dereferenceable URIs.
       • 21% set RDF links to other vocabularies (8% in 2011).
          • Popular linking predicates: rdfs:range, rdfs:subClassOf
  • 16. 4. Adoption of the Metadata Best Practices
       1. Publish machine-readable provenance information.
       2. Publish machine-readable licensing information.
       3. Publish metadata about alternative access methods (SPARQL endpoints, RDF dumps).
  • 17. Provenance and Licensing Metadata
       • 37% of the datasets provide provenance information.
          • Dublin Core is used more than W3C Prov.
       • 10% provide machine-readable licensing information.
          • Most used predicates: dc:license, cc:license
  • 18. Dataset Level Metadata (VoID)
       • 15% of the datasets publish VoID descriptions.
       • Via these descriptions, it is possible to discover SPARQL endpoints and dumps for about 10% of the data sources.
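Discovering access methods from a VoID description amounts to looking up the void:sparqlEndpoint and void:dataDump properties of the described dataset. The sketch below assumes the VoID description is available as N-Triples and uses a deliberately simple regular expression that only handles triples whose object is an IRI. The example.org IRIs are made up, and a real application would use a proper RDF parser.

```python
import re

VOID = "http://rdfs.org/ns/void#"

# Matches only N-Triples lines of the form <s> <p> <o> . (IRI objects only).
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def access_methods(ntriples: str):
    """Collect SPARQL endpoint and dump IRIs from a VoID description."""
    found = {"sparqlEndpoint": set(), "dataDump": set()}
    for line in ntriples.splitlines():
        match = TRIPLE.match(line)
        if match is None:
            continue  # literals, blank nodes and comments are ignored here
        _subject, predicate, obj = match.groups()
        for name in found:
            if predicate == VOID + name:
                found[name].add(obj)
    return found


# Hypothetical VoID description of a made-up dataset.
EXAMPLE_VOID = """\
<http://example.org/void#ds> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://rdfs.org/ns/void#Dataset> .
<http://example.org/void#ds> <http://rdfs.org/ns/void#sparqlEndpoint> <http://example.org/sparql> .
<http://example.org/void#ds> <http://rdfs.org/ns/void#dataDump> <http://example.org/dump.nt.gz> .
"""

methods = access_methods(EXAMPLE_VOID)
```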
  • 19. Conclusion concerning Metadata Best Practices
       • Applications cannot rely on the availability of metadata, as only a small fraction of all data sources publish such data.
       • The Government and Library domains are positive exceptions.
       • Similarly low numbers as in 2011.
  • 20. “Full” LOD Cloud Diagram (http://lod-cloud.net/)
       570 datasets:
       • 374 from datahub.io
       • 196 from our crawl
  • 21. Growth of the “Full” LOD Cloud Diagram (http://lod-cloud.net/)
       • 2011: 295 datasets
       • 2014: 570 datasets (+93%)
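The growth figure on this slide is the relative increase from 295 to 570 datasets. As a quick check of the arithmetic (the function name is only illustrative):

```python
def growth_percent(old: int, new: int) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


# (570 - 295) / 295 * 100 is about 93.2, which the slide rounds to 93%.
growth = growth_percent(295, 570)
```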
  • 22. Comparison of Linked Data and Schema.org
       Schema.org
       1. does not expect data publishers to set data links.
       2. relies on marking up data in HTML pages.
       3. has strong application pull from Google, Microsoft, Yahoo!
  • 23. Adoption
       • WebDataCommons, 2013*: 463,000 websites (PLDs) provide Microdata annotations.
       • Google, 2014**: 5 million websites provide Schema.org data.
       • Orders of magnitude more Schema.org data sources.
       * WebDataCommons extracts Microdata, RDFa, and Microformat data from the CommonCrawl (2.2 billion HTML pages from 12.8 million PLDs).
       ** Guha in LDOW2014 Keynote
  • 24. Schema.org Topical Focus
       Different topics compared to Linked Data.
  • 25. Class / Property Distribution Microdata 2012
       • Only a small set of classes / properties is actually used.
       • Less variety compared to Linked Data.
  • 26. Shallowness of the Schema.org Data
       schema:Product (Product Names)
          • AppleMacBook Air MC968/A 11.6-Inch Laptop
          • Apple MacBook Air 11-in, Intel Core i5 1.60GHz, 64 GB, Lion 10.7
       schema:JobPosting (JobPostings)
          • More specific properties like skills are hardly used.
          • 57% of all hiringOrganizations are strings, not instances.
  • 27. Conclusion
       Linked Data
          • ~ 1,000 sources
          • covers wider range of specific topics (government, libraries, science)
          • contains more complex data structures
          • partial ontology agreement
          • identity resolution eased by RDF links
       Schema.org
          • > 460,000 sources
          • topics focused on search engines (products, organizations)
          • very simple and shallow data structures
          • strong ontology agreement
          • identity resolution often requires value parsing
  • 28. Thank you.
       References
       • Report: http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
       • Catalog: http://linkeddatacatalog.dws.informatik.uni-mannheim.de/
       Acknowledgement
       • This work was supported by