Data discovery
through federated
dataset catalogs
Valeria Pesce
Secretariat of the Global Forum on Agricultural Research (GFAR)
Secretariat of the Global Open Data for Agriculture and Nutrition (GODAN) initiative
eROSA workshop, Montpellier, 6-7 July 2017
• Many institutional catalogs / geographically-scoped catalogs / thematic catalogs
• How many catalogs do I have to search?
>> General meta-catalogs? Different targeted catalogs?
>> Federated metadata catalogs / Secondary catalogs
1. Dataset discovery: how
• Data are in datasets, stored in some dataset repository
• Datasets can be made searchable through a dataset catalog
Good dataset metadata at the level of the local repository / catalog
Open interoperable dataset metadata
at the level of the primary repository / catalog
IDEAL
>>> Linked Data federated search engines LOD-enabled
primary catalog
Heavy requirements for the
local primary catalog
1. Dataset discovery: good metadata (1)
1. General metadata about the dataset “resource”:
a) identifier(s)
b) who is responsible for it
c) when and where the data were collected
d) relations to organizations, persons, publications, software, projects, funding…
e) the conditions for re-use (rights, licenses)
f) provenance, versions
g) the specific coverage of the dataset (type of data, thematic coverage, geographic
coverage)
Normally covered by generic vocabularies like Dublin Core or DCAT
IDEAL
Let’s look at existing good practices and standards
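As an illustration of the general metadata listed in points a) to g), a dataset record could be sketched as follows. This is a minimal sketch, not a complete DCAT profile: the property names come from DCAT and Dublin Core Terms, but every value, URL and identifier is a placeholder, and the `REQUIRED` list is an assumption made only for this example.

```python
import json

# Hypothetical record: all values, URLs and identifiers are placeholders.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:identifier": "example-dataset-001",                  # a) identifier
    "dct:title": "Example crop yield observations",
    "dct:publisher": "Example Research Institute",            # b) who is responsible
    "dct:temporal": "2010/2015",                              # c) when collected
    "dct:spatial": "Example region",                          # c) where / g) geographic coverage
    "dct:relation": ["https://example.org/projects/42"],      # d) related project
    "dct:license": "https://creativecommons.org/licenses/by/4.0/",  # e) re-use conditions
    "dct:provenance": "Derived from field surveys (placeholder)",   # f) provenance
    "dcat:theme": ["https://example.org/themes/crop-yield"],  # g) thematic coverage
}

# Assumption for this sketch: the fields a catalog might insist on.
REQUIRED = ["dct:identifier", "dct:title", "dct:publisher", "dct:license"]


def missing_fields(record, required=REQUIRED):
    """Return the required properties that are absent or empty in a record."""
    return [key for key in required if not record.get(key)]


if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
    print("Missing:", missing_fields(dataset))
```

A check like `missing_fields` is one simple way a catalog could flag incomplete records before exposing them.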
1. Dataset discovery: good metadata (2)
2. Metadata about the data structure!
a) The variables: the observed “dimensions” (e.g. time, geographic region, gender,
elevation…) and the measured / observed phenomenon (e.g. life expectancy)
b) The specification of the dimensions (units of measure, time granularity, syntax, any
scaling factors and metadata such as the status of the observation, reference
taxonomies…)
c) Possible time and space slices; subsets
Not always considered in generic dataset metadata vocabularies (DCAT) but traditionally
included in research datasets (e.g. in formats like NetCDF) and covered by DataCube
IDEAL
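The data structure metadata described on this slide (dimensions, measured phenomenon, units, code lists) could be modelled roughly as below. This is a simplified sketch loosely inspired by the dimension / measure split of the DataCube vocabulary; the class names, field names and code-list URLs are assumptions made for illustration, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One column of the dataset: either a dimension or a measure."""
    name: str
    role: str            # "dimension" or "measure"
    unit: str = ""       # unit of measure or time granularity
    code_list: str = ""  # placeholder URI of a reference taxonomy / code list


@dataclass
class DataStructure:
    components: list = field(default_factory=list)

    def dimensions(self):
        return [c.name for c in self.components if c.role == "dimension"]

    def measures(self):
        return [c.name for c in self.components if c.role == "measure"]


# The life expectancy example from the slide, with placeholder code lists.
structure = DataStructure([
    Component("time", "dimension", unit="year"),
    Component("geographic region", "dimension",
              code_list="https://example.org/codes/regions"),
    Component("gender", "dimension",
              code_list="https://example.org/codes/gender"),
    Component("life expectancy", "measure", unit="years"),
])


def observes(data_structure, phenomenon):
    """True if the dataset measures the given phenomenon."""
    return phenomenon in data_structure.measures()


if __name__ == "__main__":
    print(structure.dimensions())
    print(observes(structure, "life expectancy"))
```

With structure metadata of this kind exposed, a catalog could answer queries such as "datasets that measure life expectancy broken down by gender".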
1. Dataset discovery: good metadata (3)
3. Metadata about the actual “serializations” or “distributions” of the
dataset.
1. Where to retrieve the dataset: URL (data dump, service…)
2. The necessary technical specifications to retrieve and parse a distribution of the
dataset:
- format (file format, data format), vocabularies / data dictionaries
- protocol, API parameters…
Not always considered in generic dataset metadata vocabularies: DCAT covers data
dump and format, VOID some services
IDEAL
Data will be processed by tools! Data formats and access protocols are important.
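To show why distribution metadata matters to tools, here is a small sketch of a client choosing a distribution it can parse. `dcat:downloadURL`, `dcat:accessURL` and `dct:format` are real DCAT / Dublin Core property names; the `protocol` key, the URLs and both helper functions are hypothetical and used only for this example.

```python
# Hypothetical distributions of one dataset; URLs are placeholders.
distributions = [
    {"dcat:downloadURL": "https://example.org/data/yields.csv",
     "dct:format": "CSV"},                         # data dump
    {"dcat:accessURL": "https://example.org/api/yields",
     "dct:format": "JSON", "protocol": "REST"},    # service ("protocol" is an assumed key)
    {"dcat:accessURL": "https://example.org/sparql",
     "dct:format": "RDF", "protocol": "SPARQL"},   # service
]


def pick_distribution(candidates, supported_formats):
    """Return the first distribution matching the tool's formats, in preference order."""
    for wanted in supported_formats:
        for dist in candidates:
            if dist.get("dct:format", "").upper() == wanted.upper():
                return dist
    return None


def locator(dist):
    """URL to fetch: the dump if there is one, otherwise the service endpoint."""
    return dist.get("dcat:downloadURL") or dist.get("dcat:accessURL")


if __name__ == "__main__":
    chosen = pick_distribution(distributions, ["NetCDF", "JSON"])
    print(locator(chosen) if chosen else "no usable distribution")
```

Without format and protocol metadata, the tool would have to download each distribution and guess how to read it.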
1. Dataset discovery: interoperable metadata
Secondary catalogs have to be able to retrieve metadata from the dataset catalog
IDEAL
Ideally, secondary catalogs would be able to retrieve only subsets of the
catalog (by type of data, by data format, by phenomenon observed?)
Data service / API with filtering parameters
Catalogs as DaaS (Data-as-a-Service)
• All discovery-relevant metadata are exposed in machine-readable form
• Exposed metadata use shared semantics
• Standardization of the values, e.g. for “thematic coverage” or “dimensions” of
datasets, “format” or “protocol used” of distributions etc.
• The value should be standardized, possibly a URI
• The value should be part of an authority list / code list
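The idea of a secondary catalog retrieving only a subset of a primary catalog can be sketched with an in-memory filter that stands in for a data service with filtering parameters. The record keys (`data_type`, `format`, `phenomenon`) and all URIs are assumptions made for this example; the point is that filtering only works reliably when values are standardized, ideally as URIs from a code list.

```python
# Hypothetical catalog records; values are standardized placeholder URIs or format codes.
records = [
    {"identifier": "ds-1",
     "data_type": "https://example.org/types/observations",
     "format": ["CSV", "NetCDF"],
     "phenomenon": "https://example.org/phenomena/rainfall"},
    {"identifier": "ds-2",
     "data_type": "https://example.org/types/statistics",
     "format": ["CSV"],
     "phenomenon": "https://example.org/phenomena/crop-yield"},
    {"identifier": "ds-3",
     "data_type": "https://example.org/types/observations",
     "format": ["JSON"],
     "phenomenon": "https://example.org/phenomena/crop-yield"},
]


def filter_catalog(catalog, **criteria):
    """Return the records whose metadata match every given key=value criterion."""
    def matches(record):
        for key, wanted in criteria.items():
            values = record.get(key, [])
            if isinstance(values, str):
                values = [values]  # treat single values and lists uniformly
            if wanted not in values:
                return False
        return True
    return [record for record in catalog if matches(record)]


if __name__ == "__main__":
    subset = filter_catalog(records, format="CSV")
    print([record["identifier"] for record in subset])
```

If one catalog wrote "csv", another "text/csv" and a third "Comma separated", the same filter would silently miss records, which is the case for shared value vocabularies.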
1. Dataset discovery: ideal architecture
Conclusions
• Dataset metadata ideally created by authors / curators at the local level,
catalog associated with repository
• High-quality metadata in catalogs allowing for answers to all possible queries
• Ownership, rights, temporal, spatial, thematic, data structure, access…
• Machine-readable metadata; agreed vocabularies; shared semantics; APIs for
querying
• General or specialized secondary catalogs federate metadata from primary
catalogs; multiply discoverability and cater for different audiences
• Also secondary catalogs expose good metadata and APIs
• There’s an inventory / registry of dataset repositories and all types of catalogs
IDEAL
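The federation step in the conclusions (secondary catalogs gathering metadata from primary catalogs) could look roughly like this. It is a sketch under stated assumptions: records are plain dictionaries, the `identifier` key is shared across catalogs, and the merge rule (first value seen wins, sources are accumulated) is an arbitrary choice for illustration, not a description of how any existing catalog works.

```python
def federate(primary_catalogs):
    """Merge records from several primary catalogs, de-duplicating by identifier.

    primary_catalogs maps a catalog name to its list of metadata records.
    Records without an identifier are skipped, since they cannot be matched.
    """
    merged = {}
    for catalog_name, catalog_records in primary_catalogs.items():
        for record in catalog_records:
            identifier = record.get("identifier")
            if not identifier:
                continue
            entry = merged.setdefault(
                identifier, {"identifier": identifier, "sources": []}
            )
            for key, value in record.items():
                entry.setdefault(key, value)   # keep the first value seen
            entry["sources"].append(catalog_name)  # remember where it came from
    return list(merged.values())


# Hypothetical input from two primary catalogs.
primary = {
    "catalog-a": [
        {"identifier": "ds-1", "title": "Soil moisture"},
        {"title": "Record without identifier"},
    ],
    "catalog-b": [
        {"identifier": "ds-1", "title": "Soil moisture (copy)", "license": "CC-BY"},
        {"identifier": "ds-2", "title": "Rainfall"},
    ],
}

if __name__ == "__main__":
    for entry in federate(primary):
        print(entry)
```

Keeping the list of sources lets the secondary catalog point users back to the primary catalogs, which remain the authoritative place for the metadata.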
2. Dataset discovery: current situation
in Agriculture (1)
• Institutional data repositories are picking up (need for an inventory!)
CURRENT
• Use of standardized or semi-standardized data repository tools with
cataloguing functionalities and APIs is picking up (Dataverse, CKAN…)
• Some governmental metadata catalogs exist, often using standardized tools
(CKAN) and standard vocabularies (DCAT), that include agricultural datasets
• Some international data catalogs exist that include agricultural datasets
(re3data, OpenAIRE, DataHub…)
• Also research-oriented data services like OpenDAP or Unidata THREDDS
• Some secondary federated catalogs exist (? Need for an inventory!)
• General one for agriculture (usable as an inventory): the CIARD RING
2. Dataset discovery: current situation
Example of CIARD RING secondary catalog
• Architecture:
• Datasets can be hosted anywhere, the RING only hosts the metadata
• Optionally, datasets can be uploaded and the RING can act as a subsidiary repository
• Datasets (metadata) can be federated from other catalogs
• It uses the dataset / distribution DCAT model
• Metadata quality:
• it uses a combination of the DCAT model + the VOID vocabulary and the DataCube
vocabulary + some extra properties (? a “RING DCAT profile” will be published)
• Shared semantics:
• it has a Linked Data layer, URIs for all entities; all categories are published as SKOS
concepts in SKOS concept schemes and are mapped to external concepts whenever
possible
2. Dataset discovery: current situation
in Agriculture (2)
• Metadata quality of most used primary data catalog tools is not high
• E.g. no metadata about data structure, no shared semantics for data types, topics,
formats, standards used
CURRENT
>> poor discovery services in secondary catalogs
• Metadata interoperability of most used primary data catalog tools is not high
• No full compliance with broadly recognized vocabularies (DCAT, DataCube…)
• No functionality to apply shared semantics for categorizations like topics, data types,
formats, dimensions (sometimes keywords from AGROVOC)
• Data not always accessible through exposed metadata
• Scarce population >> lack of reputation / authority of secondary catalogs
>> lack of motivation to share
3. Dataset discovery: infrastructural improvements (1)
Quality depends on the metadata coming from the primary repository /
catalog… how can a good infrastructure overcome this problem?
• Advocacy for better / improved tools?
• Promote improvement of existing tools?
• Dataset repository / catalog platforms in the cloud?
• Complementary / subsidiary role of secondary catalogs?
• Allow subsidiary use of secondary catalogs as primary catalog and even repository
for some datasets (small institutions, individuals)
• Cater for the improvement of metadata directly in the secondary catalogs
• Incentives to provide good metadata?
• E.g. offer mechanisms to a) measure reuse; b) enforce respect of usage rights.
• Good agreed metadata standards and reference value vocabularies
• Combine existing standards (DCAT, DataCube, VOID…) in an application
profile?
• Provide a reference framework of agreed value vocabularies with URIs?
Mapping from local values to agreed ones?
>> AgriSemantics – GACS, VEST/AgroPortal
• Avoid too much interdependence. Design a loosely coupled
infrastructure. (How?)
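The question of mapping local values to agreed ones can be made concrete with a small sketch. The mapping table, the `https://example.org/vocab/...` URIs and the normalization rule are all hypothetical; in practice the agreed URIs would come from a shared reference vocabulary rather than a hand-written dictionary.

```python
# Hypothetical mapping from normalized local labels to agreed concept URIs.
AGREED = {
    "maize": "https://example.org/vocab/c_maize",
    "corn": "https://example.org/vocab/c_maize",   # local synonym, same concept
    "rainfall": "https://example.org/vocab/c_rainfall",
}


def normalize(value):
    """Normalize a local label before lookup (trim whitespace, lowercase)."""
    return value.strip().lower()


def map_values(local_values, mapping):
    """Split local values into mapped URIs and labels that still need curation."""
    mapped, unmapped = [], []
    for value in local_values:
        uri = mapping.get(normalize(value))
        if uri:
            mapped.append(uri)
        else:
            unmapped.append(value)
    return mapped, unmapped


if __name__ == "__main__":
    uris, leftovers = map_values(["Maize ", "Sorghum"], AGREED)
    print(uris)
    print(leftovers)
```

Because the mapping lives outside the primary catalog, a secondary catalog can apply it without requiring any change at the source, which fits the loosely coupled design asked for above.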
3. Dataset discovery: infrastructural improvements (2)
Key questions
• Is it our task to aim at having better machine-readable metadata at the level
of the primary local repository / catalog?
• How can we influence this? Advocate for including metadata in researchers’ tools?
• Do we want to “drive” secondary catalogs or let them bloom? Or both?
• At least a global one for food&ag? How many? Who decides? Who manages them?
• How can other infrastructural components facilitate good catalogs?
• Subsidiary metadata in secondary catalogs? Good dataset catalog tools in the cloud?
• Good agreed metadata standards and reference value vocabularies?
• Mapping with local values
• How to design for resilience of the system? Loosely coupled components?
• How much of this is specific to food&ag and which aspects should be tackled
in a broader context? (EOSC?)
Data discovery
through dataset repositories
and catalogs
Thank you for your attention
Some recommendations from EC High Level
Expert Group on EOSC (1)
“An Internet of data and services where containers with software
applications are routed to relevant data and vice versa” (B. Mons)
- Develop and sustain core data assets for the EOSC and make them
available to the community under well-defined conditions. These may
include workflows, analytics programmes and notably existing datasets
with FAIR status (including metadata creation)
- Support the development of one or more publicly available data search
engine(s) that find FAIR metadata across trusted EOSC repositories
- Develop technologies and approaches to meaningfully measure re-use
and scientific impact of Research Objects after their initial publication
(e.g. metrics that matter and get recognised)
Some recommendations from EC High Level
Expert Group on EOSC (2)
- Start dedicated efforts to prepare data and research objects for
inclusion in the EOSC
- Combine single sign-on issues with the connection of social and
professional people oriented web applications resulting in a federated
identity and credentials for all people in the EOSC
- A repository of research vocabularies and a software application to
support wider access, reuse and development of vocabularies thereby
enhancing interoperability
Diagram of India govt. Data-as-a-Service
Data-as-a-Service from an Elixir PPT

More Related Content

What's hot

Urm concept for sharing information inside of communities
Urm concept for sharing information inside of communitiesUrm concept for sharing information inside of communities
Urm concept for sharing information inside of communities
Karel Charvat
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
Scott Edmunds
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative Advantage
Tom Plasterer
 
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
dkNET
 
FAIR data overview
FAIR data overviewFAIR data overview
Metadata lecture 1, intro
Metadata lecture 1, introMetadata lecture 1, intro
Metadata lecture 1, intro
Richard.Sapon-White
 
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET
 
Advantages of metadata
Advantages of metadataAdvantages of metadata
Advantages of metadata
Azeem Sultan
 
Metadata: A concept
Metadata: A conceptMetadata: A concept
Metadata: A concept
SrikantaSahu10
 
FAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODS
Felipe Gutierrez
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
Merce Crosas
 
Martone acs presentation
Martone acs presentationMartone acs presentation
Martone acs presentation
Neuroscience Information Framework
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTags
Merce Crosas
 
Neuroscience as networked science
Neuroscience as networked scienceNeuroscience as networked science
Neuroscience as networked science
Neuroscience Information Framework
 
Applied semantic technology and linked data
Applied semantic technology and linked dataApplied semantic technology and linked data
Applied semantic technology and linked data
William Smith
 
Dk net webinar tutorial pen
Dk net webinar tutorial penDk net webinar tutorial pen
Dk net webinar tutorial pen
Maryann Martone
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse Integration
Michael Bar-Sinai
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxf
Philippe Rocca-Serra
 
Introduction to eudat and its services
Introduction to eudat and its servicesIntroduction to eudat and its services
Introduction to eudat and its services
EUDAT
 
MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)
Nikos Palavitsinis, PhD
 

What's hot (20)

Urm concept for sharing information inside of communities
Urm concept for sharing information inside of communitiesUrm concept for sharing information inside of communities
Urm concept for sharing information inside of communities
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative Advantage
 
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
 
FAIR data overview
FAIR data overviewFAIR data overview
FAIR data overview
 
Metadata lecture 1, intro
Metadata lecture 1, introMetadata lecture 1, intro
Metadata lecture 1, intro
 
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
 
Advantages of metadata
Advantages of metadataAdvantages of metadata
Advantages of metadata
 
Metadata: A concept
Metadata: A conceptMetadata: A concept
Metadata: A concept
 
FAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODSFAIR sequencing data repository based on iRODS
FAIR sequencing data repository based on iRODS
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
 
Martone acs presentation
Martone acs presentationMartone acs presentation
Martone acs presentation
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTags
 
Neuroscience as networked science
Neuroscience as networked scienceNeuroscience as networked science
Neuroscience as networked science
 
Applied semantic technology and linked data
Applied semantic technology and linked dataApplied semantic technology and linked data
Applied semantic technology and linked data
 
Dk net webinar tutorial pen
Dk net webinar tutorial penDk net webinar tutorial pen
Dk net webinar tutorial pen
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse Integration
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxf
 
Introduction to eudat and its services
Introduction to eudat and its servicesIntroduction to eudat and its services
Introduction to eudat and its services
 
MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)
 

Similar to eROSA Stakeholder WS1: Data discovery through federated dataset catalogues

Presentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenbergPresentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenberg
Nederlands Instituut voor Beeld en Geluid
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
Tom Plasterer
 
Essentials 4 Data Support: a fine course in FAIR Data Support
Essentials 4 Data Support: a fine course in FAIR Data SupportEssentials 4 Data Support: a fine course in FAIR Data Support
Essentials 4 Data Support: a fine course in FAIR Data Support
Ellen Verbakel
 
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
OpenAIRE
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Jenn Riley
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
Tom Plasterer
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Anita de Waard
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
FAIRDOM
 
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
ASIS&T
 
L07 metadata
L07 metadataL07 metadata
L07 metadata
thplayer127
 
FAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
The University of Edinburgh
 
Global RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm DataGlobal RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm Data
Vassilis Protonotarios
 
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataNIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
Susanna-Assunta Sansone
 
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data LifecycleSteven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steve Androulakis
 
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ARDC
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan Broeder
OpenAIRE
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems Research
Dr. Mirko Kämpf
 
How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?
Jian Qin
 
How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?
Jian Qin
 

Similar to eROSA Stakeholder WS1: Data discovery through federated dataset catalogues (20)

Presentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenbergPresentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenberg
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
Essentials 4 Data Support: a fine course in FAIR Data Support
Essentials 4 Data Support: a fine course in FAIR Data SupportEssentials 4 Data Support: a fine course in FAIR Data Support
Essentials 4 Data Support: a fine course in FAIR Data Support
 
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
 
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
 
L07 metadata
L07 metadataL07 metadata
L07 metadata
 
FAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech ProposalsFAIRDOM data management support for ERACoBioTech Proposals
FAIRDOM data management support for ERACoBioTech Proposals
 
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
Perspectives on the Role of Trustworthy Repository Standards in Data Journal ...
 
Global RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm DataGlobal RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm Data
 
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataNIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
 
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data LifecycleSteven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
 
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
 
EUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan BroederEUDAT data architecture and interoperability aspects – Daan Broeder
EUDAT data architecture and interoperability aspects – Daan Broeder
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems Research
 
How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?
 
How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?
 

More from e-ROSA

Building Capacities for Open Science
Building Capacities for Open Science Building Capacities for Open Science
Building Capacities for Open Science
e-ROSA
 
Community and Governance Recommendations for the Future State of an e-infrast...
Community and Governance Recommendations for the Future State of an e-infrast...Community and Governance Recommendations for the Future State of an e-infrast...
Community and Governance Recommendations for the Future State of an e-infrast...
e-ROSA
 
Technical Recommendations for the Future State of an e-infrastructure in Agri...
Technical Recommendations for the Future State of an e-infrastructure in Agri...Technical Recommendations for the Future State of an e-infrastructure in Agri...
Technical Recommendations for the Future State of an e-infrastructure in Agri...
e-ROSA
 
Towards Open Science in Agriculture & Food
Towards Open Science in Agriculture & FoodTowards Open Science in Agriculture & Food
Towards Open Science in Agriculture & Food
e-ROSA
 
FACCE JPI agenda on big data and digitization of agriculture
FACCE JPI agenda on big data and digitization of agricultureFACCE JPI agenda on big data and digitization of agriculture
FACCE JPI agenda on big data and digitization of agriculture
e-ROSA
 
ICT-AGRI agenda on digitization of agriculture
ICT-AGRI agenda on digitization of agricultureICT-AGRI agenda on digitization of agriculture
ICT-AGRI agenda on digitization of agriculture
e-ROSA
 
D4Science experience: VREs for increasing the sharing and collaboration in th...
D4Science experience: VREs for increasing the sharing and collaboration in th...D4Science experience: VREs for increasing the sharing and collaboration in th...
D4Science experience: VREs for increasing the sharing and collaboration in th...
e-ROSA
 
The state-of-play of the general EOSC policy work
The state-of-play of the general EOSC policy workThe state-of-play of the general EOSC policy work
The state-of-play of the general EOSC policy work
e-ROSA
 
The Vision and the Grand Challenges of the Agri-Food Community
The Vision and the Grand Challenges of the Agri-Food CommunityThe Vision and the Grand Challenges of the Agri-Food Community
The Vision and the Grand Challenges of the Agri-Food Community
e-ROSA
 
Why the food sector needs a research infrastructure on Food and Health Consum...
Why the food sector needs a research infrastructure on Food and Health Consum...Why the food sector needs a research infrastructure on Food and Health Consum...
Why the food sector needs a research infrastructure on Food and Health Consum...
e-ROSA
 
eROSA Vision 2030
eROSA Vision 2030eROSA Vision 2030
eROSA Vision 2030
e-ROSA
 
Technical Implementation Agenda for a pan-European Scientific e-infrastructur...
Technical Implementation Agenda for a pan-European Scientific e-infrastructur...Technical Implementation Agenda for a pan-European Scientific e-infrastructur...
Technical Implementation Agenda for a pan-European Scientific e-infrastructur...
e-ROSA
 
E-Infrastructure for open agri-food sciences - The landscape
E-Infrastructure for open agri-food sciences - The landscapeE-Infrastructure for open agri-food sciences - The landscape
E-Infrastructure for open agri-food sciences - The landscape
e-ROSA
 
OpenAIRE: Implementing Open Science
OpenAIRE: Implementing Open ScienceOpenAIRE: Implementing Open Science
OpenAIRE: Implementing Open Science
e-ROSA
 
The D4Science Infrastructure
The D4Science InfrastructureThe D4Science Infrastructure
The D4Science Infrastructure
e-ROSA
 
EOSC-Hub - Services for the European Open Science Cloud
EOSC-Hub - Services for the European Open Science CloudEOSC-Hub - Services for the European Open Science Cloud
EOSC-Hub - Services for the European Open Science Cloud
e-ROSA
 
Grand Challenges and Open Science for the Food System
Grand Challenges and Open Science for the Food SystemGrand Challenges and Open Science for the Food System
Grand Challenges and Open Science for the Food System
e-ROSA
 
E-infrastructure for open agri-food sciences: Vision & Roadmap
E-infrastructure for open agri-food sciences: Vision & RoadmapE-infrastructure for open agri-food sciences: Vision & Roadmap
E-infrastructure for open agri-food sciences: Vision & Roadmap
e-ROSA
 
2nd e-ROSA Stakeholder workshop: M. Chelle, Genomics?
2nd e-ROSA Stakeholder workshop: M. Chelle, Genomics?2nd e-ROSA Stakeholder workshop: M. Chelle, Genomics?
2nd e-ROSA Stakeholder workshop: M. Chelle, Genomics?
e-ROSA
 
EOSC Stakeholder Forum - The e-ROSA project
EOSC Stakeholder Forum - The e-ROSA projectEOSC Stakeholder Forum - The e-ROSA project
EOSC Stakeholder Forum - The e-ROSA project
e-ROSA
 




eROSA Stakeholder WS1: Data discovery through federated dataset catalogues

  • 1. Data discovery through federated dataset catalogs Valeria Pesce Secretariat of the Global Forum on Agricultural Research (GFAR) Secretariat of the Global Open Data for Agriculture and Nutrition (GODAN) initiative eROSA workshop, Montpellier, 6-7 July 2017
  • 2. • Many institutional catalogs / geographically-scoped catalogs / thematic catalogs • How many catalogs do I have to search? >> General meta-catalogs? Different targeted catalogs? >> Federated metadata catalogs / Secondary catalogs 1. Dataset discovery: how • Data are in datasets, stored in some dataset repository • Datasets can be made searchable through a dataset catalog Good dataset metadata at the level of the local repository / catalog Open interoperable dataset metadata at the level of the primary repository / catalog IDEAL >>> Linked Data federated search engines LOD-enabled primary catalog Heavy requirements for the local primary catalog
  • 3. 1. Dataset discovery: good metadata (1) 1. General metadata about the dataset “resource”: a) identifier(s) b) who is responsible for it c) when and where the data were collected d) relations to organizations, persons, publications, software, projects, funding… e) the conditions for re-use (rights, licenses) f) provenance, versions g) the specific coverage of the dataset (type of data, thematic coverage, geographic coverage) Normally covered by generic vocabularies like Dublin Core or DCAT IDEAL Let’s look at existing good practices and standards
  • 4. 1. Dataset discovery: good metadata (2) a) The variables: the observed “dimensions” (e.g. time, geographic region, gender, elevation…) and the measured / observed phenomenon (e.g. life expectancy) b) The specification of the dimensions (units of measure, time granularity, syntax, any scaling factors and metadata such as the status of the observation, reference taxonomies…) c) Possible time and space slices; subsets Not always considered in generic dataset metadata vocabularies (DCAT) but traditionally included in research datasets (e.g. in formats like NetCDF) and covered by DataCube IDEAL 2. Metadata about the data structure!
  • 5. 1. Dataset discovery: good metadata (3) 1. Where to retrieve the dataset: URL (data dump, service…) 2. The necessary technical specifications to retrieve and parse a distribution of the dataset: - format (file format, data format), vocabularies / data dictionaries - protocol, API parameters… Not always considered in generic dataset metadata vocabularies: DCAT covers data dump and format, VOID some services IDEAL 3. Metadata about the actual “serializations” or “distributions” of the dataset. Data will be processed by tools! Data formats and access protocols are important.
  • 6. 1. Dataset discovery: interoperable metadata Secondary catalogs have to be able to retrieve metadata from the dataset catalog IDEAL Ideally, secondary catalogs would be able to retrieve only subsets of the catalog (by type of data, by data format, by phenomenon observed?) Data service / API with filtering parameters Catalogs as DAAS - Data-as-a-Service • All discovery-relevant metadata are exposed in machine-readable form • Exposed metadata use shared semantics • Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions etc. • The value should be standardized, possibly a URI • The value should be part of an authority list / code list
  • 7. 1. Dataset discovery: ideal architecture Conclusions • Dataset metadata ideally created by authors / curators at the local level, catalog associated with repository • High-quality metadata in catalogs allowing for answers to all possible queries • Ownership, rights, temporal, spatial, thematic, data structure, access… • Machine-readable metadata; agreed vocabularies; shared semantics; APIs for querying • General or specialized secondary catalogs federate metadata from primary catalogs; multiply discoverability and cater for different audiences • Also secondary catalogs expose good metadata and APIs • There’s an inventory / registry of dataset repositories and all types of catalogs IDEAL
  • 8. 2. Dataset discovery: current situation in Agriculture (1) • Institutional data repositories are picking up (need for an inventory!) CURRENT • Use of standardized or semi-standardized data repository tools with cataloguing functionalities and APIs is picking up (Dataverse, CKAN…) • Some governmental metadata catalogs exist, often using standardized tools (CKAN) and standard vocabularies (DCAT), that include agricultural datasets • Some international data catalogs exist that include agricultural datasets (re3data, OpenAIRE, DataHub…) • Also research-oriented data services like OpenDAP or Unidata THREDDS • Some secondary federated catalogs exist (? Need for an inventory!) • General one for agriculture (usable as an inventory): the CIARD RING
  • 9. 2. Dataset discovery: current situation Example of CIARD RING secondary catalog • Architecture: • Datasets can be hosted anywhere, the RING only hosts the metadata • Optionally, datasets can be uploaded and the RING can act as a subsidiary repository • Datasets (metadata) can be federated from other catalogs • It uses the dataset / distribution DCAT model • Metadata quality: • it uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary + some extra properties (? a “RING DCAT profile” will be published) • Shared semantics: • it has a Linked Data layer, URIs for all entities; all categories are published as SKOS concepts in SKOS concept schemes and are mapped to external concepts whenever possible
  • 10. 2. Dataset discovery: current situation in Agriculture (2) • Metadata quality of most used primary data catalog tools is not high • E.g. no metadata about data structure, no shared semantics for data types, topics, formats, standards used CURRENT >> poor discovery services in secondary catalogs • Metadata interoperability of most used primary data catalog tools is not high • No full compliance with broadly recognized vocabularies (DCAT, DataCube…) • No functionality to apply shared semantics for categorizations like topics, data types, formats, dimensions (sometimes keywords from AGROVOC) • Data not always accessible through exposed metadata • Scarce population >> lack of reputation / authority of secondary catalogs >> lack of motivation to share
  • 11. 3. Dataset discovery: infrastructural improvements (1) Quality depends on the metadata coming from the primary repository / catalog… how can a good infrastructure overcome this problem? • Advocacy for better / improved tools? • Promote improvement of existing tools? • Dataset repository / catalog platforms in the cloud? • Complementary / subsidiary role of secondary catalogs? • Allow subsidiary use of secondary catalogs as primary catalog and even repository for some datasets (small institutions, individuals) • Cater for the improvement of metadata directly in the secondary catalogs • Incentives to provide good metadata? • E.g. offer mechanisms to a) measure reuse; b) enforce respect of usage rights.
  • 12. 3. Dataset discovery: infrastructural improvements (2) • Good agreed metadata standards and reference value vocabularies • Combine existing standards (DCAT, DataCube, VOID…) in an application profile? • Provide a reference framework of agreed value vocabularies with URIs? Mapping from local values to agreed ones? >> AgriSemantics – GACS, VEST/AgroPortal • Avoid too much interdependence. Design a loosely coupled infrastructure. (How?)
  • 13. Key questions • Is it our task to aim at having better machine-readable metadata at the level of the primary local repository / catalog? • How can we influence this? Advocate for including metadata in researchers’ tools? • Do we want to “drive” secondary catalogs or let them bloom? Or both? • At least a global one for food&ag? How many? Who decides? Who manages them? • How can other infrastructural components facilitate good catalogs? • Subsidiary metadata in secondary catalogs? Good dataset catalog tools in the cloud? • Good agreed metadata standards and reference value vocabularies? • Mapping with local values • How to design for resilience of the system? Loosely coupled components? • How much of this is specific to food&ag and which aspects should be tackled in a broader context? (EOSC?)
  • 14. Data discovery through dataset repositories and catalogs Thank you for your attention eROSA workshop, Montpellier, 6-7 July 2017
  • 15. Some recommendations from EC High Level Expert Group on EOSC (1) “An Internet of data and services where containers with software applications are routed to relevant data and vice versa” (B. Mons) - Develop and sustain core data assets for the EOSC and make them available to the community under well-defined conditions. These may include workflows, analytics programmes and notably existing datasets with FAIR status (including metadata creation) - Support the development of one or more publicly available data search engine(s) that find FAIR metadata across trusted EOSC repositories - Develop technologies and approaches to meaningfully measure re-use and scientific impact of Research Objects after their initial publication (e.g. metrics that matter and get recognised)
  • 16. Some recommendations from EC High Level Expert Group on EOSC (2) - Start dedicated efforts to prepare data and research objects for inclusion in the EOSC - Combine single sign-on issues with the connection of social and professional people oriented web applications resulting in a federated identity and credentials for all people in the EOSC - A repository of research vocabularies and a software application to support wider access, reuse and development of vocabularies thereby enhancing interoperability
  • 17. Diagram of India govt. Data-as-a-Service
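
The "catalogs as Data-as-a-Service" idea from slide 6 and the federation step from slide 7 can be illustrated with a small sketch. It is not taken from the slides or from any of the tools they mention: the class names, the function names `filter_datasets` and `federate`, and all `example.org` URIs are assumptions made up for illustration. The sketch only shows the general shape: a dataset / distribution model in the spirit of DCAT, filter values expressed as URIs from a shared vocabulary, and a secondary catalog that harvests a filtered subset from several primary catalogs.

```python
from dataclasses import dataclass

# Hypothetical, simplified dataset / distribution model (loosely DCAT-like).
@dataclass(frozen=True)
class Distribution:
    access_url: str   # where this serialization can be retrieved
    media_type: str   # format of the serialization, ideally from a code list

@dataclass(frozen=True)
class Dataset:
    identifier: str
    title: str
    themes: frozenset          # thematic coverage as URIs from a shared vocabulary
    distributions: tuple = ()  # the available serializations of the dataset

def filter_datasets(catalog, theme=None, media_type=None):
    """Return the datasets in `catalog` that match every filter that is set."""
    matches = []
    for ds in catalog:
        if theme is not None and theme not in ds.themes:
            continue
        if media_type is not None and not any(
            d.media_type == media_type for d in ds.distributions
        ):
            continue
        matches.append(ds)
    return matches

def federate(primary_catalogs, **filters):
    """Harvest filtered metadata from several primary catalogs.

    A dataset listed in more than one catalog is kept once, keyed by identifier.
    """
    merged = {}
    for catalog in primary_catalogs:
        for ds in filter_datasets(catalog, **filters):
            merged.setdefault(ds.identifier, ds)
    return list(merged.values())

# Hypothetical example records; every URI below is a placeholder.
SOIL = "http://example.org/theme/soil"
CLIMATE = "http://example.org/theme/climate"

soil_ph = Dataset(
    "ds-1", "Soil pH survey", frozenset({SOIL}),
    (Distribution("http://example.org/ds-1.csv", "text/csv"),),
)
rainfall = Dataset(
    "ds-2", "Rainfall grid", frozenset({CLIMATE}),
    (Distribution("http://example.org/ds-2.nc", "application/x-netcdf"),),
)

institutional_catalog = [soil_ph, rainfall]
national_catalog = [soil_ph]  # the same dataset is also listed here

secondary_catalog = federate([institutional_catalog, national_catalog])
soil_only = federate([institutional_catalog, national_catalog], theme=SOIL)
```

Because the filter values are URIs rather than free-text labels, the same `theme` value works against every primary catalog that uses the shared vocabulary, which is the point of the standardization bullets on slide 6. In a real deployment the filtering would run on the primary catalog's side, behind its API, rather than in the harvester.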