How to describe a dataset.
Interoperability issues
Valeria Pesce
Global Forum on Agricultural Research
Definition of “dataset”
The term “dataset” has been defined in several ways, all of which
further specify or extend the basic concept of “a collection of data”.
Definition given by the W3C Government Linked Data Working Group:
A dataset is “a collection of data, published or curated by a
single source, and available for access or download in one or
more formats”
The “instances” of the dataset “available for access or
download in one or more formats” are called
“distributions”. A dataset can have many distributions.
Examples of distributions include a downloadable CSV
file, an API or an RSS feed.
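The dataset/distribution relationship can be sketched in a few lines of Python. This is only an illustration: the class names, field names and example.org URLs are assumptions made for the example, not part of the W3C definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One concrete way of accessing a dataset (file, API, feed)."""
    access_url: str  # hypothetical example URL
    media_type: str  # e.g. "text/csv"


@dataclass
class Dataset:
    """A collection of data published or curated by a single source."""
    title: str
    publisher: str
    distributions: List[Distribution] = field(default_factory=list)


# One dataset, three distributions: a CSV file, an API and an RSS feed.
rainfall = Dataset(
    title="Monthly rainfall",
    publisher="Example Agency",
    distributions=[
        Distribution("http://example.org/rainfall.csv", "text/csv"),
        Distribution("http://example.org/api/rainfall", "application/json"),
        Distribution("http://example.org/rainfall.rss", "application/rss+xml"),
    ],
)

media_types = [d.media_type for d in rainfall.distributions]
print(media_types)
```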
Definition of “interoperability”
“Data interoperability is a feature of datasets -
and of information services that give access to
datasets - whereby data can easily be retrieved,
processed, re-used, and re-packaged
(“operated”) by other systems.”
Interim Proceedings of International Expert Consultation on “Building the CIARD
Framework for Data and Information Sharing”, CIARD (2011)
“Other systems” here means software applications, so
datasets have to be machine-readable.
What applications need
Besides information common to any type of resource (name, author /
owner, date…), applications have to find enough metadata about
datasets to understand:
1. the specific coverage of the dataset (type of data, thematic
coverage, geographic coverage)
2. the necessary technical specifications to retrieve and parse a
distribution of the dataset (format, protocol etc.)
3. the conditions for re-use (rights, licenses)
4. the “dimensions” covered by the dataset (e.g. temperature,
time, salinity, gene, coordinates)
5. the semantics of the dimensions (units of measure, time
granularity, syntax, reference taxonomies)
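An application could check a metadata record against the five needs above roughly as follows. This is a minimal sketch: the group names and field names are hypothetical and do not come from any of the vocabularies discussed here.

```python
# Hypothetical field names, grouped by the five kinds of metadata listed above.
REQUIRED_GROUPS = {
    "coverage": ["data_type", "subject", "spatial"],
    "technical": ["format", "protocol"],
    "reuse": ["license"],
    "dimensions": ["dimensions"],
    "semantics": ["units", "code_lists"],
}


def missing_metadata(record):
    """Return, per group, the fields that are absent or empty in the record."""
    gaps = {}
    for group, fields in REQUIRED_GROUPS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[group] = absent
    return gaps


# Example record that covers re-use and dimensions but little else.
record = {
    "title": "Sea surface measurements",
    "subject": "oceanography",
    "format": "text/csv",
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "dimensions": ["time", "temperature", "salinity"],
}

gaps = missing_metadata(record)
print(gaps)
```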
Partial answers in existing vocabularies
• DCAT vocabulary
– RDF vocabulary for describing any dataset
– Datasets can be standalone or part of a “catalog”
– Datasets are accessible through several “distributions”
– “Other, complementary vocabularies may be used together with DCAT to provide
more detailed format-specific information. For example, properties from the VoID
vocabulary can be used if that dataset is in RDF format.”
• VOID vocabulary
– RDF vocabulary for expressing metadata about RDF datasets
• (SDMX ) DataCube vocabulary
– RDF vocabulary for describing statistical datasets
– Useful for attaching metadata about the “data structure” to any dataset that
doesn’t follow a known published standard
Coverage of a dataset
• This can be handled by common Dublin Core properties like subject and
coverage.
• DCAT re-uses these DC properties.
Issue 1: No specific property for the type of data covered in a dataset
The values of these properties have to be understood by machines:
- The value should be standardized, possibly a URI
- The URI should be de-referenceable to a thing
- The thing should be part of an authority list / taxonomy
Issue 2: There is no authority vocabulary for types of data
Conditions for re-use
• DCAT re-uses the license DC property at the level of
distributions
• DCAT re-uses the rights DC property at both the level
of dataset and the level of distribution
dc:license > dc:LicenseDocument
dc:rights > dc:RightsStatement
W3C DCAT > DCAT AP
DCAT core
Technical properties
The necessary technical specifications to retrieve and
parse a distribution of a dataset (format, protocol etc.)
• DCAT re-uses the DC format property;
Issue 1: No property for protocol
The values of these properties have to be understood by
machines, so they should possibly be URIs:
Issue 2: No comprehensive RDF authority lists for these
values (partial: DC Types; non-RDF: IANA types)
VOID
VOID can help with the protocol metadata but only for
RDF datasets:
- Property for data dump: dataDump
- Property for SPARQL endpoint: sparqlEndpoint
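A consumer could read these two VOID properties to discover how an RDF dataset can be accessed. The sketch below assumes the description has already been parsed into a plain dictionary of property URI to list of values; that structure and the example.org URLs are assumptions for illustration.

```python
VOID = "http://rdfs.org/ns/void#"


def access_points(description):
    """Map each VOID access property found in the description to its URLs."""
    found = {}
    for local_name in ("sparqlEndpoint", "dataDump"):
        urls = description.get(VOID + local_name, [])
        if urls:
            found[local_name] = urls
    return found


# Hypothetical description of an RDF dataset: property URI -> list of values.
description = {
    VOID + "dataDump": ["http://example.org/dump.nt.gz"],
    VOID + "sparqlEndpoint": ["http://example.org/sparql"],
    "http://purl.org/dc/terms/title": ["Example RDF dataset"],
}

points = access_points(description)
print(points)
```

A dataset that is not in RDF would simply yield an empty result here, which matches the limitation noted above.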
“Dimensions” and their semantics
DCAT does not describe the dimensions of a dataset,
except for a reference to a standard if the dataset
dimensions can be defined by a formalized standard
(e.g. an XML schema or an RDF vocabulary or an ISO
standard)
dc:conformsTo > dc:Standard
Statistical vocabularies can help
with the description of the dimensions
SDMX: data structure and dimensions
SDMX: Statistical Data and Metadata Exchange
The data structure definition is a description of all the metadata needed to
understand the data set structure.
This includes:
• identification of the dimensions (Dimension) according to standard
statistical terminology,
• the key structure (KeyDescriptor),
• the code-lists (CodeList) that enumerate valid values for each dimension
• coded attribute (CodedAttribute), information about whether attributes
are required or optional and coded or free text.
Given the metadata in the data structure definition, all of the data in the
data set becomes meaningful.
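To illustrate how a data structure definition makes data checkable, the sketch below validates one observation against code lists and attribute rules. It is a simplified stand-in, not SDMX itself: the dictionary layout, dimension names and codes are all invented for the example.

```python
# Hypothetical data structure: each dimension has a code list of valid values;
# each attribute is required or optional, and coded or free text.
DATA_STRUCTURE = {
    "dimensions": {
        "area": {"KE", "IT", "BR"},
        "period": {"2012", "2013"},
    },
    "attributes": {
        "unit": {"required": True, "codes": {"TONNES", "HECTARES"}},
        "note": {"required": False, "codes": None},  # free text
    },
}


def validate(observation, structure):
    """Return a list of problems found in one observation."""
    errors = []
    for name, codes in structure["dimensions"].items():
        if observation.get(name) not in codes:
            errors.append(f"invalid or missing dimension: {name}")
    for name, spec in structure["attributes"].items():
        value = observation.get(name)
        if value is None:
            if spec["required"]:
                errors.append(f"missing required attribute: {name}")
        elif spec["codes"] is not None and value not in spec["codes"]:
            errors.append(f"invalid code for attribute: {name}")
    return errors


ok = {"area": "KE", "period": "2013", "unit": "TONNES", "value": 1200}
bad = {"area": "XX", "period": "2013"}

print(validate(ok, DATA_STRUCTURE))
print(validate(bad, DATA_STRUCTURE))
```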
DataCube: simplified SDMX in RDF
Reference to a concept scheme
“Semantic role” of the property
“Semantic role” of
Combining different vocabularies
Catalog: the directory
• DATASET (DCAT model)
– Name, URL, Owner, Content type, Topic(s), Language, Metadata set(s), Data structure, Distribution(s), […]
• DISTRIBUTION (DCAT model)
– Name, Protocol, Endpoint URL, Media type, Format, Size
• DATA STRUCTURE (DataCube model)
– Dimensions, Attributes, Measures, Value lists
• RDF dataset info (VOID properties)
– Vocabulary(ies), SPARQL endpoint, Data dump, Serialization format, Number of triples
Notes:
– If one or more known published metadata sets are used, just fill “metadata set(s)”; otherwise link to a “data structure” with custom “dimensions”.
– RDF dataset info applies if the media type is RDF or a SPARQL response.
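The two conditional notes in this combined model can be read as a small decision rule. The sketch below is one possible interpretation: the function name, the section labels it returns and the set of RDF-related media types are assumptions, not something prescribed by DCAT, DataCube or VOID.

```python
# Assumed set of media types treated as "RDF or SPARQL response".
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/sparql-results+xml",
    "application/sparql-results+json",
}


def sections_to_fill(metadata_sets, media_types):
    """List the description blocks a catalog record needs."""
    sections = ["dataset", "distribution"]
    if metadata_sets:
        # A known published metadata set already explains the structure.
        sections.append("metadata set(s)")
    else:
        # Otherwise describe custom dimensions in a data structure.
        sections.append("data structure")
    if any(m in RDF_MEDIA_TYPES for m in media_types):
        sections.append("RDF dataset info")
    return sections


csv_only = sections_to_fill([], ["text/csv"])
rdf_with_standard = sections_to_fill(["http://purl.org/dc/terms/"], ["text/turtle"])
print(csv_only)
print(rdf_with_standard)
```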
Tools for managing dataset metadata
• CKAN maintained by the Open Knowledge Foundation
Uses most of DCAT. Doesn’t describe dimensions.
Also provides a global dataset hub called the Datahub
• Dataverse created by Harvard University
Uses a custom vocabulary. Doesn’t describe dimensions.
• Commercial solutions
• Repositories and catalogs:
OpenAIRE, DataCite (using re3data to search repositories) and Dryad
use their own vocabularies.
• CIARD RING
Uses full DCAT AP with some extended properties (protocol, data
type) and local taxonomies with URIs mapped when possible to
authorities.
Next steps: adding DataCube properties for dimensions.
Major outstanding issues
• Some missing properties in existing vocabularies:
→ approach vocabulary owners OR extend vocabularies
• Missing vocabularies for protocols, formats
→ approach standardizing bodies?
→ perhaps specific dataset formats?
• Need for more standardized semantics for
dimensions:
→ Joint discussions with the RDA Data Type Registries WG?
• Lack of interoperability metadata in existing tools
References
• W3C DCAT: http://www.w3.org/TR/vocab-dcat/
• DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final
• DataCube: http://purl.org/linked-data/cube#
• VOID: http://rdfs.org/ns/void-guide
• VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/
• CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/
• CKAN: http://ckan.org/
• Datahub: http://datahub.io/
• DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture
• Re3data: http://www.re3data.org
• Dryad: http://datadryad.org/
• OpenAIRE: https://www.openaire.eu/
Thank you
Valeria Pesce
Global Forum on Agricultural Research

CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Types of different blotting techniques.pptx
Types of different blotting techniques.pptxTypes of different blotting techniques.pptx
Types of different blotting techniques.pptx
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 

How to Describe a Dataset. Interoperability Issues, by Valeria Pesce

  • 1. How to describe a dataset. Interoperability issues Valeria Pesce Global Forum on Agricultural Research
  • 2. Definition of “dataset” The term “dataset” has been defined in several ways, all of which further specify or extend the basic concept of “a collection of data”. Definition given by the W3C Government Linked Data Working Group: A dataset is “a collection of data, published or curated by a single source, and available for access or download in one or more formats” The “instances” of the dataset “available for access or download in one or more formats” are called “distributions”. A dataset can have many distributions. Examples of distributions include a downloadable CSV file, an API or an RSS feed.
  • 3. Definition of “interoperability” “Data interoperability is a feature of datasets - and of information services that give access to datasets - whereby data can easily be retrieved, processed, re-used, and re-packaged (“operated”) by other systems.” Interim Proceedings of International Expert Consultation on “Building the CIARD Framework for Data and Information Sharing”, CIARD (2011) software applications datasets have to be machine-readable
  • 4. What applications need Besides information common to any type of resource (name, author / owner, date…), applications have to find enough metadata about datasets to understand: 1. the specific coverage of the dataset (type of data, thematic coverage, geographic coverage) 2. the necessary technical specifications to retrieve and parse a distribution of the dataset (format, protocol etc.) 3. the conditions for re-use (rights, licenses) 4. the “dimensions” covered by the dataset (e.g. temperature, time, salinity, gene, coordinates) 5. the semantics of the dimensions (units of measure, time granularity, syntax, reference taxonomies)
  • 5. Partial answers in existing vocabularies • DCAT vocabulary – RDF vocabulary for describing any dataset – Datasets can be standalone or part of a “catalog” – Datasets are accessible through several “distributions” – “Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information. For example, properties from the VoID vocabulary can be used if that dataset is in RDF format.” • VOID vocabulary – RDF vocabulary for expressing metadata about RDF datasets • (SDMX) DataCube vocabulary – RDF vocabulary for describing statistical datasets – Useful for attaching metadata about the “data structure” to any dataset that doesn’t follow a known published standard
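To make the dataset / distribution split concrete, here is a minimal sketch that writes a DCAT description as Turtle text using only the Python standard library. The `dcat:` and `dct:` terms are taken from the published vocabularies; the `example.org` URIs, the title, and the helper name `describe_dataset` are assumptions made up for illustration.

```python
# Sketch only: one dataset with two distributions, using DCAT terms.
# All example.org URIs and the title below are hypothetical.

PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def describe_dataset(uri, title, distributions):
    """Return a Turtle string for a dataset and its distributions.

    `distributions` is a non-empty list of
    (distribution_uri, access_url, media_type) tuples.
    The title is assumed to contain no double quotes.
    """
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{uri}> a dcat:Dataset ;")
    lines.append(f'    dct:title "{title}" ;')
    links = " , ".join(f"<{d[0]}>" for d in distributions)
    lines.append(f"    dcat:distribution {links} .")
    for dist_uri, access_url, media_type in distributions:
        lines.append("")
        lines.append(f"<{dist_uri}> a dcat:Distribution ;")
        lines.append(f"    dcat:accessURL <{access_url}> ;")
        lines.append(f'    dcat:mediaType "{media_type}" .')
    return "\n".join(lines)


turtle = describe_dataset(
    "http://example.org/dataset/rainfall",
    "Monthly rainfall",
    [
        ("http://example.org/dataset/rainfall/csv",
         "http://example.org/files/rainfall.csv", "text/csv"),
        ("http://example.org/dataset/rainfall/feed",
         "http://example.org/feeds/rainfall.rss", "application/rss+xml"),
    ],
)
print(turtle)
```

The same dataset URI is linked to each distribution through `dcat:distribution`, so an application can pick the distribution whose media type it knows how to parse.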
  • 6. Coverage of a dataset • This can be handled by common Dublin Core properties like subject and coverage. • DCAT re-uses these DC properties. Issue 1: No specific property for the type of data covered in a dataset. The values of these properties have to be understood by machines: - The value should be standardized, possibly a URI - The URI should be de-referenceable to a thing - The thing should be part of an authority list / taxonomy Issue 2: There is no authority vocabulary for types of data
  • 7. Conditions for re-use • DCAT re-uses the license DC property at the level of distributions • DCAT re-uses the rights DC property at both the level of dataset and the level of distribution dc:license > dc:LicenseDocument dc:rights > dc:RightsStatement
  • 8. W3C DCAT > DCAT AP
  • 10. Technical properties The necessary technical specifications to retrieve and parse a distribution of a dataset (format, protocol etc.) • DCAT re-uses the DC format property. Issue 1: No property for protocol. The values of these properties have to be understood by machines, possibly URIs. Issue 2: No comprehensive RDF authority lists for these values (partial: DC Types; non-RDF: IANA types)
  • 11. VOID VOID can help with the protocol metadata but only for RDF datasets: - Property for data dump: dataDump - Property for SPARQL endpoint: sparqlEndpoint
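As a rough illustration of the two VoID properties named on this slide, the sketch below collects the triples that would describe how to reach an RDF dataset. `void:dataDump` and `void:sparqlEndpoint` are real VoID terms; the helper name `void_triples` and the `example.org` URLs are hypothetical.

```python
# Sketch only: VoID access metadata as plain (subject, predicate, object) tuples.
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def void_triples(dataset_uri, dump_url=None, endpoint_url=None):
    """Return triples typing the dataset and pointing to its dump and/or endpoint."""
    triples = [(dataset_uri, RDF_TYPE, VOID + "Dataset")]
    if dump_url is not None:
        triples.append((dataset_uri, VOID + "dataDump", dump_url))
    if endpoint_url is not None:
        triples.append((dataset_uri, VOID + "sparqlEndpoint", endpoint_url))
    return triples


# Hypothetical RDF dataset offering both a dump file and a SPARQL endpoint.
triples = void_triples(
    "http://example.org/dataset/soils",
    dump_url="http://example.org/dumps/soils.nt",
    endpoint_url="http://example.org/sparql",
)
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```

Nothing comparable is available here for non-RDF distributions, which is the gap the slide on technical properties points out.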
  • 12. “Dimensions” and their semantics DCAT does not describe the dimensions of a dataset, except for a reference to a standard if the dataset dimensions can be defined by a formalized standard (e.g. an XML schema or an RDF vocabulary or an ISO standard) dc:conformsTo > dc:Standard Statistical vocabularies can help with the description of the dimensions
  • 13. SDMX: data structure and dimensions SDMX: Statistical Data and Metadata Exchange The data structure definition is a description of all the metadata needed to understand the data set structure. This includes: • identification of the dimensions (Dimension) according to standard statistical terminology, • the key structure (KeyDescriptor), • the code-lists (CodeList) that enumerate valid values for each dimension • coded attribute (CodedAttribute), information about whether attributes are required or optional and coded or free text. Given the metadata in the data structure definition, all of the data in the data set becomes meaningful.
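The idea that a data structure definition makes the data "meaningful" can be sketched as a small validator: dimensions carry code lists, attributes are required or optional and coded or free text, and an observation is checked against them. This is not the SDMX information model itself; the class names, the field names and the sample codes are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dimension:
    name: str
    code_list: frozenset  # valid codes for this dimension


@dataclass
class Attribute:
    name: str
    required: bool
    code_list: Optional[frozenset] = None  # None means free text


@dataclass
class DataStructure:
    dimensions: list
    attributes: list = field(default_factory=list)

    def problems(self, observation):
        """Return a list of problems; an empty list means the observation conforms."""
        issues = []
        for dim in self.dimensions:
            value = observation.get(dim.name)
            if value is None:
                issues.append(f"missing dimension: {dim.name}")
            elif value not in dim.code_list:
                issues.append(f"invalid code for {dim.name}: {value}")
        for attr in self.attributes:
            value = observation.get(attr.name)
            if value is None:
                if attr.required:
                    issues.append(f"missing attribute: {attr.name}")
            elif attr.code_list is not None and value not in attr.code_list:
                issues.append(f"invalid code for {attr.name}: {value}")
        return issues


# Hypothetical structure: two coded dimensions, one required coded
# attribute (unit) and one optional free-text attribute (note).
dsd = DataStructure(
    dimensions=[
        Dimension("country", frozenset({"IT", "KE"})),
        Dimension("year", frozenset({"2012", "2013"})),
    ],
    attributes=[
        Attribute("unit", required=True, code_list=frozenset({"t", "kg"})),
        Attribute("note", required=False),
    ],
)

good = {"country": "IT", "year": "2013", "unit": "t", "value": 12.5}
bad = {"country": "XX", "year": "2013"}
print(dsd.problems(good))
print(dsd.problems(bad))
```

An application that receives only the observations cannot run a check like this; it needs the structure definition, or a pointer to it, in the dataset metadata.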
  • 15. DataCube: simplified SDMX in RDF Reference to a concept scheme
  • 16. DataCube: simplified SDMX in RDF “Semantic role” of the property
  • 17. DataCube: simplified SDMX in RDF “Semantic role” of
  • 18. Combining different vocabularies Name URL Owner Content type Topic(s) Language Metadata set(s) Data structure Distribution(s) […] DATASET Name Protocol Endpoint URL Media type Format Size DISTRIBUTION DCAT model Dimensions Attributes Measures Value lists DATA STRUCTURE DataCube model Catalog: the directory Vocabulary(ies) SPARQL endpoint Data dump Serialization format Number of triples RDF dataset info VOID properties If one or more known published metadata sets are used, just fill “metadata set(s)”, otherwise link to a “data structure” with custom “dimensions” IF media type has RDF or SPARQL response
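The two rules stated on this slide can be read as a small decision function: describe a custom data structure only when no known published metadata set applies, and add VoID information only when a distribution is served as RDF or as a SPARQL response. The sketch below is one possible reading; the function name, the block labels and the choice of media types treated as "RDF or SPARQL" are assumptions.

```python
# Sketch only: which metadata blocks a dataset description needs.
# The media types listed are registered IANA types; treating exactly
# these as "RDF or SPARQL response" is an assumption of this sketch.
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/sparql-results+xml",
    "application/sparql-results+json",
}


def required_blocks(metadata_sets, media_types):
    """Return the labels of the metadata blocks to fill in for a dataset.

    `metadata_sets`: known published metadata sets the dataset follows.
    `media_types`: media types of the dataset's distributions.
    """
    blocks = ["dataset (DCAT)", "distribution (DCAT)"]
    if not metadata_sets:
        # No known standard: link to a custom data structure.
        blocks.append("data structure (DataCube)")
    if any(mt in RDF_MEDIA_TYPES for mt in media_types):
        blocks.append("RDF dataset info (VoID)")
    return blocks


csv_only = required_blocks([], ["text/csv"])
rdf_with_standard = required_blocks(["Dublin Core"], ["text/turtle"])
print(csv_only)
print(rdf_with_standard)
```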
  • 19. Tools for managing dataset metadata • CKAN maintained by the Open Knowledge Foundation Uses most of DCAT. Doesn’t describe dimensions. Also provides a global dataset hub called the Datahub • Dataverse created by Harvard University Uses a custom vocabulary. Doesn’t describe dimensions. • Commercial solutions • Repositories and catalogs: OpenAIRE, DataCite (using re3data to search repositories) and Dryad use their own vocabularies. • CIARD RING Uses full DCAT AP with some extended properties (protocol, data type) and local taxonomies with URIs mapped when possible to authorities. Next steps: adding DataCube properties for dimensions.
  • 20. Major outstanding issues • Some missing properties in existing vocabularies: approach vocabulary owners OR extend vocabularies • Missing vocabularies for protocols, formats: approach standardizing bodies? Perhaps specific dataset formats? • Need for more standardized semantics for dimensions: joint discussions with the RDA Data Type Registries WG? • Lack of interoperability metadata in existing tools
  • 21. References • W3C DCAT: http://www.w3.org/TR/vocab-dcat/ • DCAT AP: https://joinup.ec.europa.eu/asset/dcat_application_profile/asset_release/dcat-application-profile-data-portals-europe-final • DataCube: http://purl.org/linked-data/cube# • VOID: http://rdfs.org/ns/void-guide • VIVO Datastar: http://sourceforge.net/projects/vivo/files/Datastar%20ontology/ • CERIF for datasets: https://cerif4datasets.wordpress.com/c4d-deliverables/ • CKAN: http://ckan.org/ • Datahub: http://datahub.io/ • DataCite: http://search.datacite.org/ui?q=subject%3Aagriculture • Re3data: http://www.re3data.org • Dryad: http://datadryad.org/ • OpenAIRE: https://www.openaire.eu/
  • 22. Thank you Valeria Pesce Global Forum on Agricultural Research