Data discovery through federated dataset catalogs
Valeria Pesce
Secretariat of the Global Forum on Agricultural Research (GFAR)
Secretariat of the Global Open Data for Agriculture and Nutrition (GODAN) initiative
eROSA workshop, Montpellier, 6-7 July 2017
• Many institutional catalogs / geographically-scoped catalogs / thematic catalogs
• How many catalogs do I have to search?
>> General meta-catalogs? Different targeted catalogs?
>> Federated metadata catalogs / Secondary catalogs
1. Dataset discovery: how
• Data are in datasets, stored in some dataset repository
• Datasets can be made searchable through a dataset catalog
Good dataset metadata at the level of the local repository / catalog
Open interoperable dataset metadata at the level of the primary repository / catalog
IDEAL
>>> Linked Data federated search engines
LOD-enabled primary catalog
Heavy requirements for the local primary catalog
1. Dataset discovery: good metadata (1)
1. General metadata about the dataset “resource”:
a) identifier(s)
b) who is responsible for it
c) when and where the data were collected
d) relations to organizations, persons, publications, software, projects, funding…
e) the conditions for re-use (rights, licenses)
f) provenance, versions
g) the specific coverage of the dataset (type of data, thematic coverage, geographic
coverage)
Normally covered by generic vocabularies like Dublin Core or DCAT
IDEAL
Let’s look at existing good practices and standards
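The general metadata properties a) to g) listed above can be sketched as one record. This is a minimal illustration, assuming a JSON-LD-style dictionary keyed by Dublin Core (`dct:`) and DCAT (`dcat:`) terms. The identifier, names and URLs are placeholders invented for the example, and literal strings stand in for what a full RDF description would model as linked resources.

```python
import json

# Hypothetical dataset record; every value below is an illustrative placeholder.
dataset = {
    "@context": {
        "dct": "http://purl.org/dc/terms/",
        "dcat": "http://www.w3.org/ns/dcat#",
    },
    "@type": "dcat:Dataset",
    "dct:identifier": "example-dataset-001",             # a) identifier
    "dct:title": "Example maize yield observations",
    "dct:publisher": "Example Research Institute",       # b) who is responsible
    "dct:temporal": "2010/2015",                         # c) when
    "dct:spatial": "Example Region",                     # c) where
    "dct:relation": ["https://example.org/project/42"],  # d) relations
    "dct:license": "https://creativecommons.org/licenses/by/4.0/",  # e) re-use
    "dct:provenance": "Version 2, derived from field surveys",      # f) provenance
    "dcat:theme": ["https://example.org/themes/crop-production"],   # g) coverage
    "dcat:keyword": ["maize", "yield"],
}

# Serialize the record so that another catalog could read it.
serialized = json.dumps(dataset, indent=2)
print(serialized)
```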
1. Dataset discovery: good metadata (2)
2. Metadata about the data structure!
a) The variables: the observed “dimensions” (e.g. time, geographic region, gender,
elevation…) and the measured / observed phenomenon (e.g. life expectancy)
b) The specification of the dimensions (units of measure, time granularity, syntax, any
scaling factors and metadata such as the status of the observation, reference
taxonomies…)
c) Possible time and space slices; subsets
Not always considered in generic dataset metadata vocabularies (DCAT) but traditionally
included in research datasets (e.g. in formats like NetCDF) and covered by DataCube
IDEAL
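A description of the data structure can be sketched as a list of components, each marked as a dimension or a measure, in the spirit of the DataCube vocabulary. The class names, field names and code list URIs below are assumptions made for this illustration; they are not part of DataCube or of any catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    role: str              # "dimension" or "measure"
    unit: str = ""         # unit of measure, if any
    granularity: str = ""  # e.g. time granularity
    code_list: str = ""    # URI of a reference taxonomy (hypothetical here)


@dataclass
class DataStructure:
    components: list = field(default_factory=list)

    def dimensions(self):
        """Names of the observed dimensions."""
        return [c.name for c in self.components if c.role == "dimension"]

    def measures(self):
        """Names of the measured / observed phenomena."""
        return [c.name for c in self.components if c.role == "measure"]


# The example from the slide: life expectancy by time, region and gender.
structure = DataStructure([
    Component("time", "dimension", granularity="year"),
    Component("geographic region", "dimension",
              code_list="https://example.org/codes/regions"),
    Component("gender", "dimension",
              code_list="https://example.org/codes/gender"),
    Component("life expectancy", "measure", unit="years"),
])

print(structure.dimensions())
print(structure.measures())
```

With such a description in the catalog, a secondary catalog could answer queries like "datasets that measure life expectancy by gender" without opening the data files.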
1. Dataset discovery: good metadata (3)
3. Metadata about the actual “serializations” or “distributions” of the dataset.
a) Where to retrieve the dataset: URL (data dump, service…)
b) The necessary technical specifications to retrieve and parse a distribution of the
dataset:
- format (file format, data format), vocabularies / data dictionaries
- protocol, API parameters…
Not always considered in generic dataset metadata vocabularies: DCAT covers data
dump and format, VOID covers some services
IDEAL
Data will be processed by tools! Data formats and access protocols are important.
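Since tools need to know how to retrieve and parse a dataset, distribution metadata can drive that choice automatically. The sketch below assumes two hypothetical distributions described with the DCAT properties `dcat:downloadURL`, `dcat:accessURL` and `dcat:mediaType`. The `access` and `parameters` keys are ad hoc additions for the example, and the URLs are placeholders.

```python
# Hypothetical distributions of one dataset; URLs are placeholders.
distributions = [
    {
        "dcat:downloadURL": "https://example.org/data/yield.csv",
        "dcat:mediaType": "text/csv",
        "access": "dump",
    },
    {
        "dcat:accessURL": "https://example.org/api/yield",
        "dcat:mediaType": "application/json",
        "access": "service",
        "parameters": ["year", "region"],  # API parameters a client may pass
    },
]


def pick_distribution(candidates, supported_media_types):
    """Return the distribution matching the most preferred supported type.

    `supported_media_types` is ordered by preference; None is returned when
    the tool cannot parse any of the available distributions.
    """
    for media_type in supported_media_types:
        for dist in candidates:
            if dist.get("dcat:mediaType") == media_type:
                return dist
    return None


chosen = pick_distribution(distributions, ["application/json", "text/csv"])
print(chosen["dcat:accessURL"])
```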
1. Dataset discovery: interoperable metadata
Secondary catalogs have to be able to retrieve metadata from the dataset catalog
IDEAL
Ideally, secondary catalogs would be able to retrieve only subsets of the
catalog (by type of data, by data format, by phenomenon observed?)
Data service / API with filtering parameters
Catalogs as DaaS (Data-as-a-Service)
• All discovery-relevant metadata are exposed in machine-readable form
• Exposed metadata use shared semantics
• Standardization of the values, e.g. for “thematic coverage” or “dimensions” of
datasets, “format” or “protocol used” of distributions etc.
• The value should be standardized, possibly a URI
• The value should be part of an authority list / code list
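Retrieving only a subset of a catalog works reliably only when values are standardized. The following sketch assumes records whose theme and format values are URIs or media types taken from shared code lists; the URIs, record identifiers and the `filter_records` function are hypothetical.

```python
# Hypothetical catalog records using values from shared code lists.
THEME_SOIL = "https://example.org/themes/soil"
THEME_CROPS = "https://example.org/themes/crops"

records = [
    {"id": "ds-1", "theme": THEME_SOIL, "format": "text/csv"},
    {"id": "ds-2", "theme": THEME_CROPS, "format": "application/x-netcdf"},
    {"id": "ds-3", "theme": THEME_CROPS, "format": "text/csv"},
]


def filter_records(catalog, **criteria):
    """Keep the records whose fields equal every given criterion exactly."""
    return [
        rec for rec in catalog
        if all(rec.get(key) == value for key, value in criteria.items())
    ]


subset = filter_records(records, theme=THEME_CROPS, format="text/csv")
print([rec["id"] for rec in subset])
```

Exact matching is only possible because every catalog uses the same value for the same concept; with free-text values such as "crops", "Crop" or "cultures" the same filter would miss records.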
1. Dataset discovery: ideal architecture
Conclusions
• Dataset metadata ideally created by authors / curators at the local level,
catalog associated with repository
• High-quality metadata in catalogs allowing for answers to all possible queries
• Ownership, rights, temporal, spatial, thematic, data structure, access…
• Machine-readable metadata; agreed vocabularies; shared semantics; APIs for
querying
• General or specialized secondary catalogs federate metadata from primary
catalogs; multiply discoverability and cater for different audiences
• Also secondary catalogs expose good metadata and APIs
• There’s an inventory / registry of dataset repositories and all types of catalogs
IDEAL
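The federation step described in these conclusions can be sketched as merging records from several primary catalogs into one secondary index. This is a simplified in-memory illustration: the catalog names, identifiers and the rule that the first catalog to supply a record wins are all assumptions, and a real harvester would fetch the records through each catalog's API.

```python
def federate(primary_catalogs):
    """Merge records from several primary catalogs into one secondary index.

    Records are keyed by identifier. The first catalog to supply an
    identifier provides the metadata, and every contributing catalog is
    remembered under "sources".
    """
    index = {}
    for catalog_name, records in primary_catalogs.items():
        for rec in records:
            entry = index.setdefault(rec["id"], {**rec, "sources": []})
            entry["sources"].append(catalog_name)
    return index


# Hypothetical primary catalogs with one overlapping dataset.
primary_catalogs = {
    "institute-a": [{"id": "ds:abc", "title": "Soil survey"}],
    "national-portal": [
        {"id": "ds:abc", "title": "Soil survey (copy)"},
        {"id": "ds:xyz", "title": "Rainfall series"},
    ],
}

secondary = federate(primary_catalogs)
print(sorted(secondary))
print(secondary["ds:abc"]["sources"])
```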
2. Dataset discovery: current situation in Agriculture (1)
• Institutional data repositories are picking up (need for an inventory!)
CURRENT
• Use of standardized or semi-standardized data repository tools with
cataloguing functionalities and APIs is picking up (Dataverse, CKAN…)
• Some governmental metadata catalogs exist, often using standardized tools
(CKAN) and standard vocabularies (DCAT), that include agricultural datasets
• Some international data catalogs exist that include agricultural datasets
(re3data, OpenAIRE, DataHub…)
• Also research-oriented data services like OpenDAP or Unidata THREDDS
• Some secondary federated catalogs exist (>> need for an inventory!)
• General one for agriculture (usable as an inventory): the CIARD RING
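Repository tools with APIs make harvesting by secondary catalogs straightforward. As an illustration, the sketch below builds a request URL for the `package_search` action of the CKAN Action API. The host name is a placeholder, the helper function is hypothetical, and the `res_format` filter field is an assumption about how the target CKAN instance indexes resource formats.

```python
from urllib.parse import urlencode


def ckan_search_url(base_url, query, filter_query="", rows=10):
    """Build a URL for the CKAN Action API `package_search` call."""
    params = {"q": query, "rows": rows}
    if filter_query:
        params["fq"] = filter_query
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{urlencode(params)}"


# Hypothetical catalog host; no request is sent here.
url = ckan_search_url("https://catalog.example.org", "maize",
                      filter_query="res_format:CSV")
print(url)
```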
2. Dataset discovery: current situation
Example of CIARD RING secondary catalog
• Architecture:
• Datasets can be hosted anywhere, the RING only hosts the metadata
• Optionally, datasets can be uploaded and the RING can act as a subsidiary repository
• Datasets (metadata) can be federated from other catalogs
• It uses the dataset / distribution DCAT model
• Metadata quality:
• it uses a combination of the DCAT model + the VOID vocabulary and the DataCube
vocabulary + some extra properties (>> a “RING DCAT profile” will be published)
• Shared semantics:
• it has a Linked Data layer, URIs for all entities; all categories are published as SKOS
concepts in SKOS concept schemes and are mapped to external concepts whenever
possible
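The shared-semantics layer described above can be illustrated with a tiny concept scheme using the SKOS properties `skos:prefLabel`, `skos:inScheme` and `skos:exactMatch`. All URIs below are hypothetical and are not actual RING identifiers; the sketch only shows how mappings to external concepts let different catalogs recognise the same category.

```python
# A tiny concept scheme in the spirit of SKOS; all URIs are hypothetical.
concepts = {
    "https://example.org/ring/concept/soil-data": {
        "skos:prefLabel": "Soil data",
        "skos:inScheme": "https://example.org/ring/scheme/data-types",
        "skos:exactMatch": ["https://vocab.example.net/c_soil"],
    },
    "https://example.org/ring/concept/survey-data": {
        "skos:prefLabel": "Survey data",
        "skos:inScheme": "https://example.org/ring/scheme/data-types",
        "skos:exactMatch": [],  # no external mapping available
    },
}


def external_matches(concept_uri):
    """Return the external concepts a local concept is mapped to."""
    return concepts.get(concept_uri, {}).get("skos:exactMatch", [])


def find_by_external(external_uri):
    """Return the local concept URIs mapped to the given external concept."""
    return [uri for uri, concept in concepts.items()
            if external_uri in concept["skos:exactMatch"]]


print(find_by_external("https://vocab.example.net/c_soil"))
```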
2. Dataset discovery: current situation in Agriculture (2)
• Metadata quality of the most widely used primary data catalog tools is not high
• E.g. no metadata about data structure, no shared semantics for data types, topics,
formats, standards used
CURRENT
>> poor discovery services in secondary catalogs
• Metadata interoperability of the most widely used primary data catalog tools is not high
• No full compliance with broadly recognized vocabularies (DCAT, DataCube…)
• No functionality to apply shared semantics for categorizations like topics, data types,
formats, dimensions (sometimes keywords from AGROVOC)
• Data not always accessible through exposed metadata
• Scarce population >> lack of reputation / authority of secondary catalogs >>
lack of motivation to share
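The metadata quality gaps listed above could be measured when a secondary catalog harvests a record. The sketch below computes a simple completeness score per group of properties. The groups, property names and the sample record are illustrative assumptions, not an agreed quality metric.

```python
# Groups of discovery-relevant properties; the names are illustrative.
CHECKS = {
    "general": ["identifier", "title", "publisher", "license"],
    "structure": ["dimensions", "measures"],
    "access": ["access_url", "format"],
}


def completeness(record):
    """Return, per group, the fraction of properties that are filled in."""
    return {
        group: sum(1 for key in keys if record.get(key)) / len(keys)
        for group, keys in CHECKS.items()
    }


# Hypothetical harvested record: no license, no data structure metadata.
record = {
    "identifier": "ds-1",
    "title": "Rainfall series",
    "publisher": "Example Institute",
    "license": "",
    "access_url": "https://example.org/data/rain.csv",
    "format": "text/csv",
}

report = completeness(record)
print(report)
```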
3. Dataset discovery: infrastructural improvements (1)
Quality depends on the metadata coming from the primary repository /
catalog… how can a good infrastructure overcome this problem?
• Advocacy for better / improved tools?
• Promote improvement of existing tools?
• Dataset repository / catalog platforms in the cloud?
• Complementary / subsidiary role of secondary catalogs?
• Allow subsidiary use of secondary catalogs as primary catalog and even repository
for some datasets (small institutions, individuals)
• Cater for the improvement of metadata directly in the secondary catalogs
• Incentives to provide good metadata?
• E.g. offer mechanisms to a) measure reuse; b) enforce respect of usage rights.
• Good agreed metadata standards and reference value vocabularies
• Combine existing standards (DCAT, DataCube, VOID…) in an application
profile?
• Provide a reference framework of agreed value vocabularies with URIs?
Mapping from local values to agreed ones?
>> AgriSemantics – GACS, VEST/AgroPortal
• Avoid too much interdependence. Design a loosely coupled
infrastructure. (How?)
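Mapping from local values to agreed ones can be sketched as a lookup against a reference vocabulary. The mapping table and its URIs below are hypothetical; in practice they would come from an agreed value vocabulary. Values that cannot be mapped are returned separately, so the secondary catalog can keep them as plain keywords or flag them for curation.

```python
# Hypothetical mapping from local free-text values to agreed vocabulary URIs.
AGREED = {
    "maize": "https://vocab.example.org/concept/maize",
    "corn": "https://vocab.example.org/concept/maize",
    "soil": "https://vocab.example.org/concept/soil",
}


def map_values(local_values, mapping=AGREED):
    """Split local values into mapped URIs and values left unmapped."""
    mapped, unmapped = [], []
    for value in local_values:
        uri = mapping.get(value.strip().lower())
        if uri is None:
            unmapped.append(value)
        elif uri not in mapped:
            mapped.append(uri)  # synonyms collapse to a single URI
    return mapped, unmapped


mapped, unmapped = map_values(["Maize", "corn ", "Irrigation"])
print(mapped)
print(unmapped)
```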
3. Dataset discovery: infrastructural improvements (2)
Key questions
• Is it our task to aim at having better machine-readable metadata at the level
of the primary local repository / catalog?
• How can we influence this? Advocate for including metadata in researchers’ tools?
• Do we want to “drive” secondary catalogs or let them bloom? Or both?
• At least a global one for food&ag? How many? Who decides? Who manages them?
• How can other infrastructural components facilitate good catalogs?
• Subsidiary metadata in secondary catalogs? Good dataset catalog tools in the cloud?
• Good agreed metadata standards and reference value vocabularies?
• Mapping with local values
• How to design for resilience of the system? Loosely coupled components?
• How much of this is specific to food&ag and which aspects should be tackled
in a broader context? (EOSC?)
Data discovery through dataset repositories and catalogs
Thank you for your attention

More Related Content

What's hot

Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
James Serra
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache Atlas
DataWorks Summit
 
Data Mesh 101
Data Mesh 101Data Mesh 101
Data Mesh 101
ChrisFord803185
 
Practical Guide to Data Governance Success
Practical Guide to Data Governance SuccessPractical Guide to Data Governance Success
Practical Guide to Data Governance Success
Ample Insight Inc
 
Successful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata DesignSuccessful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata Design
sarakirsten
 
Data Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceData Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and Governance
DATAVERSITY
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
Alan McSweeney
 
Big data architecture
Big data architectureBig data architecture
Big data architecture
Dr. Jasmine Beulah Gnanadurai
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
DATAVERSITY
 
Data mesh
Data meshData mesh
Data mesh
ManojKumarR41
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Spark Summit
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance Strategy
Analytics8
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Nathan Bijnens
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
Databricks secure deployments and security baselines, doug march 2022
Databricks secure deployments and security baselines, doug march 2022Databricks secure deployments and security baselines, doug march 2022
Databricks secure deployments and security baselines, doug march 2022
Henrik Brattlie
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
vty
 
Consensus and Raft Algorithm in Distributed System
Consensus and  Raft Algorithm in Distributed SystemConsensus and  Raft Algorithm in Distributed System
Consensus and Raft Algorithm in Distributed System
Thao Huynh Quang
 
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
BigID Inc
 

What's hot (20)

Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache Atlas
 
Data Mesh 101
Data Mesh 101Data Mesh 101
Data Mesh 101
 
Practical Guide to Data Governance Success
Practical Guide to Data Governance SuccessPractical Guide to Data Governance Success
Practical Guide to Data Governance Success
 
Successful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata DesignSuccessful Content Management Through Taxonomy And Metadata Design
Successful Content Management Through Taxonomy And Metadata Design
 
Data Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceData Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and Governance
 
Designing An Enterprise Data Fabric
Designing An Enterprise Data FabricDesigning An Enterprise Data Fabric
Designing An Enterprise Data Fabric
 
Big data architecture
Big data architectureBig data architecture
Big data architecture
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Data mesh
Data meshData mesh
Data mesh
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Building a Data Governance Strategy
Building a Data Governance StrategyBuilding a Data Governance Strategy
Building a Data Governance Strategy
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Databricks secure deployments and security baselines, doug march 2022
Databricks secure deployments and security baselines, doug march 2022Databricks secure deployments and security baselines, doug march 2022
Databricks secure deployments and security baselines, doug march 2022
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
Consensus and Raft Algorithm in Distributed System
Consensus and  Raft Algorithm in Distributed SystemConsensus and  Raft Algorithm in Distributed System
Consensus and Raft Algorithm in Distributed System
 
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
 

Viewers also liked

Data Modeling & Data Integration
Data Modeling & Data IntegrationData Modeling & Data Integration
Data Modeling & Data Integration
DATAVERSITY
 
The agINFRA Linked Data layer
The agINFRA Linked Data layerThe agINFRA Linked Data layer
The agINFRA Linked Data layer
Valeria Pesce
 
Semantic challenges in sharing dataset metadata and creating federated datase...
Semantic challenges in sharing dataset metadata and creating federated datase...Semantic challenges in sharing dataset metadata and creating federated datase...
Semantic challenges in sharing dataset metadata and creating federated datase...
Valeria Pesce
 
Sharing Agricultural Events Information: When and where is that workshop?
Sharing Agricultural Events Information: When and where is that workshop?Sharing Agricultural Events Information: When and where is that workshop?
Sharing Agricultural Events Information: When and where is that workshop?
Gauri Salokhe
 
The path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesThe path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial Services
Hortonworks
 
Inventory of data standards for food & agriculture
Inventory of data standards for food & agricultureInventory of data standards for food & agriculture
Inventory of data standards for food & agriculture
Valeria Pesce
 
How to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesHow to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issues
Valeria Pesce
 
Semantics for food and agriculture: the GODAN Action map of data standards
Semantics for food and agriculture: the GODAN Action map of data standardsSemantics for food and agriculture: the GODAN Action map of data standards
Semantics for food and agriculture: the GODAN Action map of data standards
Valeria Pesce
 
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Mark Kromer
 
Cognitive Search for Knowledge Management
Cognitive Search for Knowledge ManagementCognitive Search for Knowledge Management
Cognitive Search for Knowledge Management
Attivio
 
A global linked and open data infrastructure for agricultural development
A global linked and open data infrastructure for agricultural developmentA global linked and open data infrastructure for agricultural development
A global linked and open data infrastructure for agricultural development
Valeria Pesce
 
Attivio Predictions 2017
Attivio Predictions 2017Attivio Predictions 2017
Attivio Predictions 2017
Attivio
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabularies
Valeria Pesce
 
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Amazon Web Services
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
Mark Tabladillo
 

Viewers also liked (15)

Data Modeling & Data Integration
Data Modeling & Data IntegrationData Modeling & Data Integration
Data Modeling & Data Integration
 
The agINFRA Linked Data layer
The agINFRA Linked Data layerThe agINFRA Linked Data layer
The agINFRA Linked Data layer
 
Semantic challenges in sharing dataset metadata and creating federated datase...
Semantic challenges in sharing dataset metadata and creating federated datase...Semantic challenges in sharing dataset metadata and creating federated datase...
Semantic challenges in sharing dataset metadata and creating federated datase...
 
Sharing Agricultural Events Information: When and where is that workshop?
Sharing Agricultural Events Information: When and where is that workshop?Sharing Agricultural Events Information: When and where is that workshop?
Sharing Agricultural Events Information: When and where is that workshop?
 
The path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesThe path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial Services
 
Inventory of data standards for food & agriculture
Inventory of data standards for food & agricultureInventory of data standards for food & agriculture
Inventory of data standards for food & agriculture
 
How to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issuesHow to describe a dataset. Interoperability issues
How to describe a dataset. Interoperability issues
 
Semantics for food and agriculture: the GODAN Action map of data standards
Semantics for food and agriculture: the GODAN Action map of data standardsSemantics for food and agriculture: the GODAN Action map of data standards
Semantics for food and agriculture: the GODAN Action map of data standards
 
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
 
Cognitive Search for Knowledge Management
Cognitive Search for Knowledge ManagementCognitive Search for Knowledge Management
Cognitive Search for Knowledge Management
 
A global linked and open data infrastructure for agricultural development
A global linked and open data infrastructure for agricultural developmentA global linked and open data infrastructure for agricultural development
A global linked and open data infrastructure for agricultural development
 
Attivio Predictions 2017
Attivio Predictions 2017Attivio Predictions 2017
Attivio Predictions 2017
 
Dataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabulariesDataset description: DCAT and other vocabularies
Dataset description: DCAT and other vocabularies
 
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
Big Data Integration & Analytics Data Flows with AWS Data Pipeline (BDT207) |...
 
Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612Microsoft Technologies for Data Science 201612
Microsoft Technologies for Data Science 201612
 

Similar to Data discovery through federated dataset catalogs

eROSA Stakeholder WS1: Data discovery through federated dataset catalogues
eROSA Stakeholder WS1: Data discovery through federated dataset catalogueseROSA Stakeholder WS1: Data discovery through federated dataset catalogues
eROSA Stakeholder WS1: Data discovery through federated dataset catalogues
e-ROSA
 
Urm concept for sharing information inside of communities
Urm concept for sharing information inside of communitiesUrm concept for sharing information inside of communities
Urm concept for sharing information inside of communities
Karel Charvat
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Jenn Riley
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
Scott Edmunds
 
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
ASIS&T
 
MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)
Nikos Palavitsinis, PhD
 
NIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS modelNIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS model
Susanna-Assunta Sansone
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
Tom Plasterer
 
L07 metadata
L07 metadataL07 metadata
L07 metadata
thplayer127
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxf
Philippe Rocca-Serra
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
Tom Plasterer
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Anita de Waard
 
FSCI Data Discovery
FSCI Data DiscoveryFSCI Data Discovery
FSCI Data Discovery
ARDC
 
MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)
Nikos Palavitsinis, PhD
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems Research
Dr. Mirko Kämpf
 
JOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big DataJOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big Data
Jordan Open Source Association
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
FAIRDOM
 
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and ApplicationsSemantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Artificial Intelligence Institute at UofSC
 
The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...
Hilmar Lapp
 

Similar to Data discovery through federated dataset catalogs (20)

eROSA Stakeholder WS1: Data discovery through federated dataset catalogues
eROSA Stakeholder WS1: Data discovery through federated dataset catalogueseROSA Stakeholder WS1: Data discovery through federated dataset catalogues
eROSA Stakeholder WS1: Data discovery through federated dataset catalogues
 
Urm concept for sharing information inside of communities
Urm concept for sharing information inside of communitiesUrm concept for sharing information inside of communities
Urm concept for sharing information inside of communities
 
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Me...
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
 
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
RDAP 15: Beyond Metadata: Leveraging the “README” to support disciplinary Doc...
 
MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)MetadataTheory: Introduction to Metadata (5th of 10)
MetadataTheory: Introduction to Metadata (5th of 10)
 
NIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS modelNIH BD2K DataMed data index - DATS model
NIH BD2K DataMed data index - DATS model
 
FAIR Data Knowledge Graphs
FAIR Data Knowledge GraphsFAIR Data Knowledge Graphs
FAIR Data Knowledge Graphs
 
Presentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenbergPresentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenberg
 
L07 metadata
L07 metadataL07 metadata
L07 metadata
 
Dats nih-dccpc-kc7-april2018-prs-uoxf
Dats  nih-dccpc-kc7-april2018-prs-uoxfDats  nih-dccpc-kc7-april2018-prs-uoxf
Dats nih-dccpc-kc7-april2018-prs-uoxf
 
FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
FSCI Data Discovery
FSCI Data DiscoveryFSCI Data Discovery
FSCI Data Discovery
 
MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)
 
Hadoop & Complex Systems Research
Hadoop & Complex Systems ResearchHadoop & Complex Systems Research
Hadoop & Complex Systems Research
 
JOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big DataJOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big Data
 
ERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management WebinarERA CoBioTech Data Management Webinar
ERA CoBioTech Data Management Webinar
 
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and ApplicationsSemantics-enhanced Geoscience Interoperability, Analytics, and Applications
Semantics-enhanced Geoscience Interoperability, Analytics, and Applications
 
The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...
 

Data discovery through federated dataset catalogs

  • 1. Data discovery through federated dataset catalogs Valeria Pesce Secretariat of the Global Forum on Agricultural Research (GFAR) Secretariat of the Global Open Data for Agriculture and Nutrition (GODAN) initiative eROSA workshop, Montpellier, 6-7 July 2017
  • 2. • Many institutional catalogs / geographically-scoped catalogs / thematic catalogs • How many catalogs do I have to search? >> General meta-catalogs? Different targeted catalogs? >> Federated metadata catalogs / Secondary catalogs 1. Dataset discovery: how • Data are in datasets, stored in some dataset repository • Datasets can be made searchable through a dataset catalog Good dataset metadata at the level of the local repository / catalog Open interoperable dataset metadata at the level of the primary repository / catalog IDEAL >>> Linked Data federated search engines LOD-enabled primary catalog Heavy requirements for the local primary catalog
  • 3. 1. Dataset discovery: good metadata (1) 1. General metadata about the dataset “resource”: a) identifier(s) b) who is responsible for it c) when and where the data were collected d) relations to organizations, persons, publications, software, projects, funding… e) the conditions for re-use (rights, licenses) f) provenance, versions g) the specific coverage of the dataset (type of data, thematic coverage, geographic coverage) Normally covered by generic vocabularies like Dublin Core or DCAT IDEAL Let’s look at existing good practices and standards
  • 4. 1. Dataset discovery: good metadata (2) a) The variables: the observed “dimensions” (e.g. time, geographic region, gender, elevation…) and the measured / observed phenomenon (e.g. life expectancy) b) The specification of the dimensions (units of measure, time granularity, syntax, any scaling factors and metadata such as the status of the observation, reference taxonomies…) c) Possible time and space slices; subsets Not always considered in generic dataset metadata vocabularies (DCAT) but traditionally included in research datasets (e.g. in formats like NetCDF) and covered by DataCube IDEAL 2. Metadata about the data structure!
  • 5. 1. Dataset discovery: good metadata (3) 1. Where to retrieve the dataset: URL (data dump, service…) 2. The necessary technical specifications to retrieve and parse a distribution of the dataset: - format (file format, data format), vocabularies / data dictionaries - protocol, API parameters… Not always considered in generic dataset metadata vocabularies: DCAT covers data dump and format, VOID some services IDEAL 3. Metadata about the actual “serializations” or “distributions” of the dataset. Data will be processed by tools! Data formats and access protocols are important.
  • 6. 1. Dataset discovery: interoperable metadata Secondary catalogs have to be able to retrieve metadata from the dataset catalog IDEAL Ideally, secondary catalogs would be able to retrieve only subsets of the catalog (by type of data, by data format, by phenomenon observed?) Data service / API with filtering parameters Catalogs as DAAS - Data-as-a-Service • All discovery-relevant metadata are exposed in machine-readable form • Exposed metadata use shared semantics • Standardization of the values, e.g. for “thematic coverage” or “dimensions” of datasets, “format” or “protocol used” of distributions etc. • The value should be standardized, possibly a URI • The value should be part of an authority list / code list
  • 7. 1. Dataset discovery: ideal architecture Conclusions • Dataset metadata ideally created by authors / curators at the local level, catalog associated with repository • High-quality metadata in catalogs allowing for answers to all possible queries • Ownership, rights, temporal, spatial, thematic, data structure, access… • Machine-readable metadata; agreed vocabularies; shared semantics; APIs for querying • General or specialized secondary catalogs federate metadata from primary catalogs; multiply discoverability and cater for different audiences • Also secondary catalogs expose good metadata and APIs • There’s an inventory / registry of dataset repositories and all types of catalogs IDEAL
  • 8. 2. Dataset discovery: current situation in Agriculture (1) • Institutional data repositories are picking up (need for an inventory!) CURRENT • Use of standardized or semi-standardized data repository tools with cataloguing functionalities and APIs is picking up (Dataverse, CKAN…) • Some governmental metadata catalogs exist, often using standardized tools (CKAN) and standard vocabularies (DCAT), that include agricultural datasets • Some international data catalogs exist that include agricultural datasets (re3data, OpenAIRE, DataHub…) • Also research-oriented data services like OpenDAP or Unidata THREDDS • Some secondary federated catalogs exist (? Need for an inventory!) • General one for agriculture (usable as an inventory): the CIARD RING
  • 9. 2. Dataset discovery: current situation Example of CIARD RING secondary catalog • Architecture: • Datasets can be hosted anywhere, the RING only hosts the metadata • Optionally, datasets can be uploaded and the RING can act as a subsidiary repository • Datasets (metadata) can be federated from other catalogs • It uses the dataset / distribution DCAT model • Metadata quality: • it uses a combination of the DCAT model + the VOID vocabulary and the DataCube vocabulary + some extra properties (? a “RING DCAT profile” will be published) • Shared semantics: • it has a Linked Data layer, URIs for all entities; all categories are published as SKOS concepts in SKOS concept schemes and are mapped to external concepts whenever possible
  • 10. 2. Dataset discovery: current situation in Agriculture (2) • Metadata quality of most used primary data catalog tools is not high • E.g. no metadata about data structure, no shared semantics for data types, topics, formats, standards used CURRENT >> poor discovery services in secondary catalogs • Metadata interoperability of most used primary data catalog tools is not high • No full compliance with broadly recognized vocabularies (DCAT, DataCube…) • No functionality to apply shared semantics for categorizations like topics, data types, formats, dimensions (sometimes keywords from AGROVOC) • Data not always accessible through exposed metadata • Scarce population >> lack of reputation / authority of secondary catalogs >> lack of motivation to share
  • 11. 3. Dataset discovery: infrastructural improvements (1) Quality depends on the metadata coming from the primary repository / catalog… how can a good infrastructure overcome this problem? • Advocacy for better / improved tools? • Promote improvement of existing tools? • Dataset repository / catalog platforms in the cloud? • Complementary / subsidiary role of secondary catalogs? • Allow subsidiary use of secondary catalogs as primary catalog and even repository for some datasets (small institutions, individuals) • Cater for the improvement of metadata directly in the secondary catalogs • Incentives to provide good metadata? • E.g. offer mechanisms to a) measure reuse; b) enforce respect of usage rights.
  • 12. • Good agreed metadata standards and reference value vocabularies • Combine existing standards (DCAT, DataCube, VOID…) in an application profile? • Provide a reference framework of agreed value vocabularies with URIs? Mapping from local values to agreed ones? >> AgriSemantics – GACS, VEST/AgroPortal • Avoid too much interdependence. Design a loosely coupled infrastructure. (How?) 3. Dataset discovery: infrastructural improvements (2)
  • 13. Key questions • Is it our task to aim at having better machine-readable metadata at the level of the primary local repository / catalog? • How can we influence this? Advocate for including metadata in researcher’s tools? • Do we want to “drive” secondary catalogs or let them bloom? Or both? • At least a global one for food&ag? How many? Who decides? Who manages them? • How can other infrastructural components facilitate good catalogs? • Subsidiary metadata in secondary catalogs? Good dataset catalog tools in the cloud? • Good agreed metadata standards and reference value vocabularies? • Mapping with local values • How to design for resilience of the system? Loosely coupled components? • How much of this is specific to food&ag and which aspects should be tackled in a broader context? (EOSC?)
  • 14. Data discovery through dataset repositories and catalogs Thank you for your attention eROSA workshop, Montpellier, 6-7 July 2017
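
Slides 3 to 6 describe three groups of dataset metadata (general, data structure, distributions) and suggest that a secondary catalog should be able to retrieve only a subset of a primary catalog, for example by data format or thematic coverage. The following is a minimal Python sketch of that idea, not the RING's or any catalog tool's actual API. The class names, field names, the `filter_catalog` helper and the `example.org` URIs are all hypothetical; the record layout is only loosely modelled on the DCAT dataset / distribution split.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Distribution:
    """One retrievable serialization of a dataset (slide 5)."""
    access_url: str
    data_format: str          # e.g. "NetCDF", "CSV"
    protocol: str = "HTTP"


@dataclass
class DatasetRecord:
    """Discovery metadata for one dataset (hypothetical layout)."""
    identifier: str
    title: str
    publisher: str
    license: str
    themes: List[str] = field(default_factory=list)       # thematic coverage, as URIs
    dimensions: List[str] = field(default_factory=list)   # data structure (slide 4)
    distributions: List[Distribution] = field(default_factory=list)


def filter_catalog(records: List[DatasetRecord],
                   theme: Optional[str] = None,
                   data_format: Optional[str] = None) -> List[DatasetRecord]:
    """Return the records matching every filter that is not None."""
    selected = []
    for record in records:
        if theme is not None and theme not in record.themes:
            continue
        if data_format is not None and not any(
            d.data_format == data_format for d in record.distributions
        ):
            continue
        selected.append(record)
    return selected


# Hypothetical theme URIs standing in for entries of a shared code list.
SOIL = "http://example.org/themes/soil"
CROPS = "http://example.org/themes/crops"

catalog = [
    DatasetRecord("ds-1", "Soil moisture grid", "Institute A", "CC-BY", [SOIL],
                  ["time", "region"],
                  [Distribution("http://example.org/ds-1.nc", "NetCDF")]),
    DatasetRecord("ds-2", "Crop yields by soil type", "Institute B", "CC0",
                  [SOIL, CROPS], ["time", "region"],
                  [Distribution("http://example.org/ds-2.csv", "CSV")]),
    DatasetRecord("ds-3", "Crop calendar", "Institute C", "CC-BY", [CROPS],
                  ["time"],
                  [Distribution("http://example.org/ds-3.csv", "CSV"),
                   Distribution("http://example.org/ds-3.nc", "NetCDF")]),
]

# A secondary catalog interested only in crop datasets available as CSV.
crop_csv = filter_catalog(catalog, theme=CROPS, data_format="CSV")
```

Because the themes are URIs from a shared list rather than free-text keywords, the filter can compare them exactly, which is the point the slides make about standardized values.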