SlideShare a Scribd company logo
1 of 27
Download to read offline
1Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
F. Michel
Université Côte d’Azur, CNRS, Inria, I3S, France
Describe and Publish data sets on the web:
vocabularies, catalogues, data portals…
ANF APSEM2018 : Apprentissage et sémantique
Toulouse, 12-15 novembre 2018
2Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Making datasets FAIR
requires quality, comprehensive metadata
published in public catalogues
3Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Metadata are the key to “FAIRness”
Context
Identification, authors/organizations, dates, version, reference articles,
funding, reuse conditions (licenses, rights),
coverage: time, space, topics
Access Format, structure, location (download), query methods, protocols
Interpretation
What do the data represent? What concepts, entities, semantics?
Reference vocabularies, dimensions, variables, time granularity,
units (cm or inches, left/right)…
Provenance
Acquired with which equipment, parameters, protocols?
Derived from which dataset? With which processing?
Dataset-level or entity-level provenance
Statistics Number of triples per property of class, links to other datasets…
…
4Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
No single data
vocabulary
covers all these
metadata
out-of-the-box
5Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Vocabularies to describe datasets
and dataset catalogues
6Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Standards about metadata and how to us them
About metadata
DCAT: Data Catalog Vocabulary
VOID: Vocabulary of Interlinked Datasets
Schema.org: Dataset and DataCatalog
About how to use metadata
HCLS: Health Care & Life Sciences Dataset Profile
7Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
DCAT, Data Catalog Vocabulary
Describe any dataset, in any format, standalone or in a catalog
Describe any catalog of datasets
Facilitate interoperability between catalogs published on the web
http://www.w3.org/TR/2014/REC-vocab-dcat-20140116/
8Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
DCAT model
Source: http://www.w3.org/TR/2014/REC-vocab-dcat-20140116/
Distinguish between
the data and the
representation formats
9Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
DCAT limitations: partial metadata coverage
No versioning
No properties for provenance, organizations, projects, funding,
interpretation…
Yes, but DCAT is just a framework:
“Other complementary vocabularies may be used (…)
to provide more detailed format-specific information”
(from DCAT recommandation)
10Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
DCAT is extensible with Application Profiles
 DCAT-AP: DCAT profile for data portals in Europe
“(…) enable cross-data portal search of data sets
and make public sector data better searchable
across borders and sectors”
(https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe)
 Europe: DCAT-AP-IT, DCAT-AP-NO, DCAT-AP.de, TransportDCAT-AP,
StatDCAT-AP
 RING DCAT extension: agriculture, agrifood
 BotDCAT-AP: datasets for Chatbot Systems
 …
11Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
At DCAT 1.1 will fix many limitations (2nd Q 2019)
Draws on the experience of DCAT,
the multiple use cases and Application Profiles
Multiple extensions:
• Versioning
• Qualitative information
• Formal description (coverage, interpretation…)
• …
https://w3c.github.io/dxwg/ucr/
12Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Microdata
Collaborative community project led by:
Define a common vocabulary to markup web pages
- Make data understandable to search engines
- Improve discoverability, ranking
- Lightweight descriptions to allow for summarizations
Markup formats
RDFa
13Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Not as
comprehensive as
DCAT, but benefits
from large adoption
of schema.org
14Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
VOID: Vocabulary of Interlinked Datasets
“VOID is an RDF Schema vocabulary for
expressing metadata about RDF datasets”
(from VOID recommandation)
General metadata
Terms from Dublin Core, FOAF: topics, web page, contacts, license, version…
Access metadata
Data dump, RDF serialization formats, SPARQL endpoint, example
Structural metadata
URIs scheme, vocabularies, statistics (no. triples, properties, classes, subjects, objects),
dataset partitions, root resource
Links from/to other datasets
No. link triples, link predicates (e.g. owl:sameAs, skos:exactMatch, owl:equivalentClass)
http://www.w3.org/TR/2011/NOTE-void-20110303/
15Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Health Care and Life Sciences Profile (HCLS)
“Develop a guidance note for reusing existing
vocabularies to describe datasets with RDF”
(Michel Dumontier, https://www.slideshare.net/micheldumontier/w3c-hcls-dataset-description)
Consensus among participating
stakeholders on the RDF description
of HC and LS datasets
Detailed requirements have been
drawn from a wide range of use cases
http://www.w3.org/TR/hcls-dataset/
16Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Vocabularies used in HCLS
Data Catalog (DCAT)
Citation Typing Ontology
Dublin Core Types/Terms
Friend-of-a-Friend (FOAF)
Collection Description Frequency Vocabulary
Identifiers.org vocabulary (IdoT)
Lexical Vocabulary (Lexvo)
Provenance Authoring and Versioning ontology (PAV)
PROV Ontology
Schema.org
SPARQL Service Description (SD)
Semantic science Integrated Ontology (SIO)
Vocabulary of Interlinked Datasets (VOID)
17Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Organize, host, search datasets
with data portals
18Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
CKAN
19Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
CKAN Data Management Platform
 Functions: allow to publish, share, search datasets
 Faceted search by topics, publishers, keywords, formats, languages…
 197 instances as of Oct. 2018 (https://ckan.org/about/instances/)
• 93 in Europe, 39 national government instances
• data.gov.*
 Native exporting of records in DCAT and
harvesting DCAT records from other catalogues
20Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
“Powered by CKAN”
http://data.europa.eu/euodp/en/data/
https://www.europeandataportal.eu/
21Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Linking Open Data cloud diagram, 2018. J.P. McCrae, A. Abele,
P. Buitelaar, A. Jentzsch, V. Andryushechkin and R. Cyganiak.
http://lod-cloud.net/
Datahub.io:
Source for the
Linked Open Data cloud
“Powered by CKAN”
22Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
https://www.blog.google/products/search/making-it-easier-discover-datasets/
23Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Google Dataset Search
24Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
25Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
26Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Makup embedded in the
web page as JSON-LD
27Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
Thank you!

More Related Content

What's hot

How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?
openminted_eu
 
DCAT-Application Profile for Data Providers
DCAT-Application Profile for Data ProvidersDCAT-Application Profile for Data Providers
DCAT-Application Profile for Data Providers
Johann Höchtl
 
Building new knowledge from distributed scientific corpus: HERBADROP & EUROPE...
Building new knowledge from distributed scientific corpus: HERBADROP & EUROPE...Building new knowledge from distributed scientific corpus: HERBADROP & EUROPE...
Building new knowledge from distributed scientific corpus: HERBADROP & EUROPE...
Nuno Freire
 

What's hot (16)

2014 10 23 (fie2014) emadrid upm roadmap towards the openness of educational ...
2014 10 23 (fie2014) emadrid upm roadmap towards the openness of educational ...2014 10 23 (fie2014) emadrid upm roadmap towards the openness of educational ...
2014 10 23 (fie2014) emadrid upm roadmap towards the openness of educational ...
 
Connecting Museums with Linked Data
Connecting Museums with Linked DataConnecting Museums with Linked Data
Connecting Museums with Linked Data
 
How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?
 
dotAC: Exploring the Uk Research Landscape
dotAC: Exploring the Uk Research LandscapedotAC: Exploring the Uk Research Landscape
dotAC: Exploring the Uk Research Landscape
 
DCAT-Application Profile for Data Providers
DCAT-Application Profile for Data ProvidersDCAT-Application Profile for Data Providers
DCAT-Application Profile for Data Providers
 
AXES: If only you knew what's in your archives
AXES: If only you knew what's in your archivesAXES: If only you knew what's in your archives
AXES: If only you knew what's in your archives
 
Sonex deposit meeting_ws_20110301
Sonex deposit meeting_ws_20110301Sonex deposit meeting_ws_20110301
Sonex deposit meeting_ws_20110301
 
Fabrizia Bignami
Fabrizia  BignamiFabrizia  Bignami
Fabrizia Bignami
 
Perx and TechXtra
Perx and TechXtraPerx and TechXtra
Perx and TechXtra
 
Building new knowledge from distributed scientific corpus: HERBADROP & EUROPE...
Building new knowledge from distributed scientific corpus: HERBADROP & EUROPE...Building new knowledge from distributed scientific corpus: HERBADROP & EUROPE...
Building new knowledge from distributed scientific corpus: HERBADROP & EUROPE...
 
Pipe dreams
Pipe dreamsPipe dreams
Pipe dreams
 
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
 
Mapping of Terminology Standards, a Way for Interoperability (Position Paper)
Mapping of Terminology Standards, a Way for Interoperability (Position Paper)Mapping of Terminology Standards, a Way for Interoperability (Position Paper)
Mapping of Terminology Standards, a Way for Interoperability (Position Paper)
 
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni...
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni...Eureka Research Workbench: A Semantic Approach to an Open Source Electroni...
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni...
 
Scientific Units in the Electronic Age
Scientific Units in the Electronic AgeScientific Units in the Electronic Age
Scientific Units in the Electronic Age
 
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
 

Similar to Describe and Publish data sets on the web: vocabularies, catalogues, data portals…

Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
Carole Goble
 

Similar to Describe and Publish data sets on the web: vocabularies, catalogues, data portals… (20)

ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
 
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
 
Bioschemas: Marking up biodiversity websites to improve data discovery and we...
Bioschemas: Marking up biodiversity websites to improve data discovery and we...Bioschemas: Marking up biodiversity websites to improve data discovery and we...
Bioschemas: Marking up biodiversity websites to improve data discovery and we...
 
semantic and social (intra)webs
semantic and social (intra)webssemantic and social (intra)webs
semantic and social (intra)webs
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
Web open standards for linked data and knowledge graphs as enablers of EU dig...
Web open standards for linked data and knowledge graphs as enablers of EU dig...Web open standards for linked data and knowledge graphs as enablers of EU dig...
Web open standards for linked data and knowledge graphs as enablers of EU dig...
 
Let’s go on a FAIR safari!
Let’s go on a FAIR safari!Let’s go on a FAIR safari!
Let’s go on a FAIR safari!
 
EPOS metadata catalogue
EPOS metadata catalogueEPOS metadata catalogue
EPOS metadata catalogue
 
Make our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the WebMake our Scientific Datasets Accessible and Interoperable on the Web
Make our Scientific Datasets Accessible and Interoperable on the Web
 
An introduction to ViBRANT: Virtual Biodiversity Research and Access Network ...
An introduction to ViBRANT: Virtual Biodiversity Research and Access Network ...An introduction to ViBRANT: Virtual Biodiversity Research and Access Network ...
An introduction to ViBRANT: Virtual Biodiversity Research and Access Network ...
 
Shifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data ProviderShifting the Burden from the User to the Data Provider
Shifting the Burden from the User to the Data Provider
 
Knowledge Engineering: Semantic web, web of data, linked data
Knowledge Engineering: Semantic web, web of data, linked dataKnowledge Engineering: Semantic web, web of data, linked data
Knowledge Engineering: Semantic web, web of data, linked data
 
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future OpportunitiesModelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
NeOn Project : Lifecycle support for Networked Ontologies
NeOn Project : Lifecycle support for Networked Ontologies NeOn Project : Lifecycle support for Networked Ontologies
NeOn Project : Lifecycle support for Networked Ontologies
 
NeOn project
NeOn projectNeOn project
NeOn project
 
Data interoperability for fisheries statistics
Data interoperability for fisheries statisticsData interoperability for fisheries statistics
Data interoperability for fisheries statistics
 
Dariah vcc3 2505-2013_displaying
Dariah vcc3 2505-2013_displayingDariah vcc3 2505-2013_displaying
Dariah vcc3 2505-2013_displaying
 
Open Data Mashups: linking fragments into mosaics
Open Data Mashups: linking fragments into mosaicsOpen Data Mashups: linking fragments into mosaics
Open Data Mashups: linking fragments into mosaics
 

More from Franck Michel

SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked DataSPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
Franck Michel
 
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Franck Michel
 

More from Franck Michel (9)

Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
 
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
 
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked DataSPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
 
Integrating Heterogeneous Data Sources in the Web of Data
Integrating Heterogeneous Data Sources in the Web of DataIntegrating Heterogeneous Data Sources in the Web of Data
Integrating Heterogeneous Data Sources in the Web of Data
 
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
 
A Mapping-based Method to Query MongoDB Documents with SPARQL
A Mapping-based Method to Query MongoDB Documents with SPARQLA Mapping-based Method to Query MongoDB Documents with SPARQL
A Mapping-based Method to Query MongoDB Documents with SPARQL
 
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
 
Translation of Relational and Non-Relational Databases into RDF with xR2RML
Translation of Relational and Non-Relational Databases into RDF with xR2RMLTranslation of Relational and Non-Relational Databases into RDF with xR2RML
Translation of Relational and Non-Relational Databases into RDF with xR2RML
 
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
 

Recently uploaded

Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptx
ChloeMeadows1
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
lolsDocherty
 
Article writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptxArticle writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptx
abhinandnam9997
 

Recently uploaded (16)

Development Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appsDevelopment Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of apps
 
Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptx
 
Reggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirts
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case Study
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?
 
Thank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsThank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirts
 
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdf
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's Guide
 
Topology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdfTopology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdf
 
Case study on merger of Vodafone and Idea (VI).pptx
Case study on merger of Vodafone and Idea (VI).pptxCase study on merger of Vodafone and Idea (VI).pptx
Case study on merger of Vodafone and Idea (VI).pptx
 
Statistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfStatistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdf
 
Article writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptxArticle writing on excessive use of internet.pptx
Article writing on excessive use of internet.pptx
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdf
 
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresenceCyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
 

Describe and Publish data sets on the web: vocabularies, catalogues, data portals…

  • 1. 1Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France F. Michel Université Côte d’Azur, CNRS, Inria, I3S, France Describe and Publish data sets on the web: vocabularies, catalogues, data portals… ANF APSEM2018 : Apprentissage et sémantique Toulouse, 12-15 novembre 2018
  • 2. 2Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Making datasets FAIR requires quality, comprehensive metadata published in public catalogues
  • 3. 3Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Metadata are the key to “FAIRness” Context Identification, authors/organizations, dates, version, reference articles, funding, reuse conditions (licenses, rights), coverage: time, space, topics Access Format, structure, location (download), query methods, protocols Interpretation What do the data represent? What concepts, entities, semantics? Reference vocabularies, dimensions, variables, time granularity, units (cm or inches, left/right)… Provenance Acquired with which equipment, parameters, protocols? Derived from which dataset? With which processing? Dataset-level or entity-level provenance Statistics Number of triples per property of class, links to other datasets… …
  • 4. 4Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France No single data vocabulary covers all these metadata out-of-the-box
  • 5. 5Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Vocabularies to describe datasets and dataset catalogues
  • 6. 6Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Standards about metadata and how to us them About metadata DCAT: Data Catalog Vocabulary VOID: Vocabulary of Interlinked Datasets Schema.org: Dataset and DataCatalog About how to use metadata HCLS: Health Care & Life Sciences Dataset Profile
  • 7. 7Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France DCAT, Data Catalog Vocabulary Describe any dataset, in any format, standalone or in a catalog Describe any catalog of datasets Facilitate interoperability between catalogs published on the web http://www.w3.org/TR/2014/REC-vocab-dcat-20140116/
  • 8. 8Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France DCAT model Source: http://www.w3.org/TR/2014/REC-vocab-dcat-20140116/ Distinguish between the data and the representation formats
  • 9. 9Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France DCAT limitations: partial metadata coverage No versioning No properties for provenance, organizations, projects, funding, interpretation… Yes, but DCAT is just a framework: “Other complementary vocabularies may be used (…) to provide more detailed format-specific information” (from DCAT recommandation)
  • 10. 10Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France DCAT is extensible with Application Profiles  DCAT-AP: DCAT profile for data portals in Europe “(…) enable cross-data portal search of data sets and make public sector data better searchable across borders and sectors” (https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe)  Europe: DCAT-AP-IT, DCAT-AP-NO, DCAT-AP.de, TransportDCAT-AP, StatDCAT-AP  RING DCAT extension: agriculture, agrifood  BotDCAT-AP: datasets for Chatbot Systems  …
  • 11. 11Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France At DCAT 1.1 will fix many limitations (2nd Q 2019) Draws on the experience of DCAT, the multiple use cases and Application Profiles Multiple extensions: • Versioning • Qualitative information • Formal description (coverage, interpretation…) • … https://w3c.github.io/dxwg/ucr/
  • 12. 12Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Microdata Collaborative community project led by: Define a common vocabulary to markup web pages - Make data understandable to search engines - Improve discoverability, ranking - Lightweight descriptions to allow for summarizations Markup formats RDFa
  • 13. 13Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Not as comprehensive as DCAT, but benefits from large adoption of schema.org
  • 14. 14Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France VOID: Vocabulary of Interlinked Datasets “VOID is an RDF Schema vocabulary for expressing metadata about RDF datasets” (from VOID recommandation) General metadata Terms from Dublin Core, FOAF: topics, web page, contacts, license, version… Access metadata Data dump, RDF serialization formats, SPARQL endpoint, example Structural metadata URIs scheme, vocabularies, statistics (no. triples, properties, classes, subjects, objects), dataset partitions, root resource Links from/to other datasets No. link triples, link predicates (e.g. owl:sameAs, skos:exactMatch, owl:equivalentClass) http://www.w3.org/TR/2011/NOTE-void-20110303/
  • 15. 15Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Health Care and Life Sciences Profile (HCLS) “Develop a guidance note for reusing existing vocabularies to describe datasets with RDF” (Michel Dumontier, https://www.slideshare.net/micheldumontier/w3c-hcls-dataset-description) Consensus among participating stakeholders on the RDF description of HC and LS datasets Detailed requirements have been drawn from a wide range of use cases http://www.w3.org/TR/hcls-dataset/
  • 16. 16Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Vocabularies used in HCLS Data Catalog (DCAT) Citation Typing Ontology Dublin Core Types/Terms Friend-of-a-Friend (FOAF) Collection Description Frequency Vocabulary Identifiers.org vocabulary (IdoT) Lexical Vocabulary (Lexvo) Provenance Authoring and Versioning ontology (PAV) PROV Ontology Schema.org SPARQL Service Description (SD) Semantic science Integrated Ontology (SIO) Vocabulary of Interlinked Datasets (VOID)
  • 17. 17Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Organize, host, search datasets with data portals
  • 18. 18Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France CKAN
  • 19. 19Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France CKAN Data Management Platform  Functions: allow to publish, share, search datasets  Faceted search by topics, publishers, keywords, formats, languages…  197 instances as of Oct. 2018 (https://ckan.org/about/instances/) • 93 in Europe, 39 national government instances • data.gov.*  Native exporting of records in DCAT and harvesting DCAT records from other catalogues
  • 20. 20Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France “Powered by CKAN” http://data.europa.eu/euodp/en/data/ https://www.europeandataportal.eu/
  • 21. 21Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Linking Open Data cloud diagram, 2018. J.P. McCrae, A. Abele, P. Buitelaar, A. Jentzsch, V. Andryushechkin and R. Cyganiak. http://lod-cloud.net/ Datahub.io: Source for the Linked Open Data cloud “Powered by CKAN”
  • 22. 22Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France https://www.blog.google/products/search/making-it-easier-discover-datasets/
  • 23. 23Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Google Dataset Search
  • 24. 24Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
  • 25. 25Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France
  • 26. 26Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Makup embedded in the web page as JSON-LD
  • 27. 27Franck MICHEL - Université Côte d’Azur, CNRS, Inria, I3S, France Thank you!