Designing TCS e-Infrastructure: data, metadata and architecture
Daniele Bailo (EPOS-ICS / INGV),
Daniele Trippanera (INGV), WP7 team
TCS Objectives (proposal)
• Implementation and integration of data and services from Multi-scale Laboratories
• collect and harmonize available and emerging laboratory data on the properties and processes controlling rock system behaviour at multiple scales, in order to
• generate accessible and interoperable products through services for supporting research activities.
What’s happening elsewhere
(examples)
• WP8: good maturity, long history of sharing data
• WP9: starting from scratch, designing the architecture now (new community)
• WP10: set up a shared pan-European e-infrastructure and software in EPOS-PP (GSAC)
• WP11: harmonising metadata, data and products
TCS e-infrastructures
1. The advantages of sharing data are being recognised
2. TCS infrastructures are being set up
3. Much effort is still required in ALL TCSs (even the more mature ones)
WP16 state of the art
• Some institutions have good e-infrastructures and metadata schemes
• Some have data stored on local drives
• Some only have data stored in accessible repositories
• Some have repositories and portals with proprietary metadata schemes
→ Quite heterogeneous
e-architecture
• An e-infrastructure for sharing data needs an architecture
• Institutions are located in several countries → distributed architecture
• Several options, with different impacts
• Two main elements:
1. data repository
2. metadata catalogue
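The split between the two main elements can be sketched in Python as below. This is a minimal illustration under assumptions: the class names, the in-memory storage and the `repo.example.org` URL are hypothetical and not part of any EPOS component. The point it shows is that the catalogue holds only descriptions plus a pointer (`location`) to where the data live, so the two elements can be hosted by different parties, which is exactly what distinguishes the options that follow.

```python
from dataclasses import dataclass, field


@dataclass
class DataRepository:
    """Hypothetical data repository: keeps raw files under a base URL."""
    base_url: str
    files: dict[str, bytes] = field(default_factory=dict)

    def store(self, name: str, content: bytes) -> str:
        # Save the raw content and return the location the catalogue will point to
        self.files[name] = content
        return f"{self.base_url}/{name}"


@dataclass
class MetadataRecord:
    identifier: str
    title: str
    location: str  # where the data themselves live (any repository)


@dataclass
class MetadataCatalogue:
    """Hypothetical metadata catalogue: used for discovery, holds no data."""
    records: dict[str, MetadataRecord] = field(default_factory=dict)

    def register(self, record: MetadataRecord) -> None:
        self.records[record.identifier] = record

    def search(self, keyword: str) -> list[MetadataRecord]:
        # Case-insensitive match on the title only
        kw = keyword.lower()
        return [r for r in self.records.values() if kw in r.title.lower()]


repo = DataRepository("https://repo.example.org/data")
catalogue = MetadataCatalogue()
location = repo.store("rock_friction_01.csv", b"time,shear_stress\n0,0.0\n")
catalogue.register(MetadataRecord("lab-0001", "Rock friction experiment", location))
hits = catalogue.search("friction")
print(hits[0].location)  # https://repo.example.org/data/rock_friction_01.csv
```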
Generic TCS-ICS architecture
[Figure: TCS system components (data/metadata catalogue, national network, national repository, local HPC) and the API / web service]
Option 1: fully distributed
[Figure: the ICS connects to Institutions 1, 2 and 3, each running its own web service, metadata catalogue and data storage]
PROs
1. No delegation: each institution has full control over its data and metadata
CONs
1. Several access points
2. Each institution must set up and maintain its own e-infrastructure (web service, institutional data storage system, metadata catalogue)
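In the fully distributed option, the ICS has to query every institution separately. A minimal sketch of that fan-out, assuming each institution's web service can be modelled as a plain search function (`make_institution`, `offline` and the sample titles are hypothetical), shows why "several access points" is listed as a con: the client must merge results and cope with any endpoint being unavailable.

```python
from collections.abc import Callable

# Each institution is modelled as a search function standing in for its own web service
SearchEndpoint = Callable[[str], list[dict]]


def make_institution(name: str, titles: list[str]) -> SearchEndpoint:
    """Hypothetical stand-in for one institution's web service + metadata catalogue."""
    def search(keyword: str) -> list[dict]:
        kw = keyword.lower()
        return [{"institution": name, "title": t} for t in titles if kw in t.lower()]
    return search


def federated_search(endpoints: dict[str, SearchEndpoint],
                     keyword: str) -> tuple[list[dict], list[str]]:
    """Query every access point in turn; collect hits and note the unreachable ones."""
    results: list[dict] = []
    failed: list[str] = []
    for name, endpoint in endpoints.items():
        try:
            results.extend(endpoint(keyword))
        except OSError:  # ConnectionError and TimeoutError are OSError subclasses
            failed.append(name)
    return results, failed


def offline(keyword: str) -> list[dict]:
    # Hypothetical institution whose service is down
    raise ConnectionError("service unavailable")


endpoints = {
    "Institution 1": make_institution("Institution 1", ["Rock friction test", "Permeability run"]),
    "Institution 2": make_institution("Institution 2", ["Analogue model of rifting"]),
    "Institution 3": offline,
}
hits, failed = federated_search(endpoints, "rock")
print(hits)    # [{'institution': 'Institution 1', 'title': 'Rock friction test'}]
print(failed)  # ['Institution 3']
```

A real deployment would replace the functions with HTTP calls, but the merging and failure handling stay on the ICS side either way.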
Option 2: metadata centralised
(distributed institutional data storage system)
[Figure: the ICS connects to a single web service and metadata catalogue; Institutions 1, 2 and 3 keep their DATA locally, with metadata extraction feeding the central catalogue]
PROs
1. Very low delegation: each institution only provides metadata
2. Less effort: each institution only maintains its institutional data storage system
CONs / TODOs
1. Agreements on metadata provision
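The metadata-centralised option relies on a metadata extraction step at each institution. The sketch below assumes, hypothetically, that each local file carries a small header from which a few fields can be copied; the field names, base URLs and files are illustrative only. The central catalogue stores the extracted records together with an `access_url`, while the data themselves never leave the institution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalFile:
    """A file kept in an institution's own storage (data never leave the institution)."""
    path: str
    header: dict[str, str]


def extract_metadata(institution: str, base_url: str, f: LocalFile) -> dict[str, str]:
    # Hypothetical extraction step: copy a few header fields and record where the data live
    return {
        "title": f.header.get("title", "untitled"),
        "instrument": f.header.get("instrument", "unknown"),
        "provider": institution,
        "access_url": f"{base_url}/{f.path}",
    }


def harvest(sources: dict[str, tuple[str, list[LocalFile]]]) -> list[dict[str, str]]:
    """Build the central catalogue: metadata is centralised, data stay distributed."""
    catalogue: list[dict[str, str]] = []
    for institution, (base_url, files) in sources.items():
        catalogue.extend(extract_metadata(institution, base_url, f) for f in files)
    return catalogue


sources = {
    "Institution 1": ("https://inst1.example.org/data",
                      [LocalFile("exp01.dat", {"title": "Triaxial test",
                                               "instrument": "triaxial press"})]),
    "Institution 3": ("https://inst3.example.org/files",
                      [LocalFile("run7.csv", {"title": "Sandbox model"})]),
}
catalogue = harvest(sources)
for record in catalogue:
    print(record["provider"], record["access_url"])
```

Missing header fields fall back to placeholder values here; in practice, which fields are mandatory is part of the "agreements on metadata provision" listed above.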
Option 3: metadata and data
repository centralised
[Figure: the ICS connects to a central web service, metadata catalogue and data repository; Institutions 1, 2 and 3 provide their DATA, with metadata extraction]
PROs
1. No effort required from institutions
CONs
1. Full delegation: each institution provides both data and metadata to the central node
2. Agreements on data/metadata provision
3. No infrastructure is built locally
RISK
1. No data is shared
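With full delegation, the central node receives both data and metadata. A minimal ingest sketch, assuming a hypothetical minimum set of agreed metadata fields (`title`, `provider`, `licence`), rejects incomplete submissions and stores each file under its SHA-256 digest so that integrity can be checked later. None of these choices come from the slides; they only illustrate what "agreements on data/metadata provision" could translate into.

```python
import hashlib

REQUIRED_FIELDS = ("title", "provider", "licence")  # hypothetical agreed minimum


class CentralNode:
    """Sketch of a central node holding both the data repository and the metadata catalogue."""

    def __init__(self) -> None:
        self.repository: dict[str, bytes] = {}
        self.catalogue: dict[str, dict[str, str]] = {}

    def ingest(self, metadata: dict[str, str], data: bytes) -> str:
        missing = [f for f in REQUIRED_FIELDS if not metadata.get(f)]
        if missing:
            # Reject submissions that do not meet the agreed metadata provision
            raise ValueError(f"missing metadata fields: {', '.join(missing)}")
        # Content-addressed key: the SHA-256 digest also lets users verify integrity later
        key = hashlib.sha256(data).hexdigest()
        self.repository[key] = data
        self.catalogue[key] = {**metadata, "sha256": key, "size": str(len(data))}
        return key


node = CentralNode()
key = node.ingest({"title": "Shear test", "provider": "Institution 2",
                   "licence": "CC-BY-4.0"}, b"1,2,3\n")
try:
    node.ingest({"title": "Incomplete"}, b"x")
    rejection = ""
except ValueError as err:
    rejection = str(err)
print(rejection)  # missing metadata fields: provider, licence
```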
WHICH IS THE BEST ONE?
THE ONE THAT WORKS!
Taking into account all aspects:
1. Technical
2. Available (financial) resources
3. Available (technical) solutions
4. Willingness to share and contribute
Reality is complex
WP16 institutions have different set-ups:
• Metadata catalogue + DATA + web service: ready for interoperability
• DATA with metadata extraction: ready to provide metadata
• DATA only: needs to figure out how to store / maintain data
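The three situations above can be expressed as simple decision rules over an institution's set-up. The sketch below is illustrative only: the `SetUp` flags and the rule order are assumptions, not a WP16 classification scheme. It can help turn a survey of institutions into a quick overview of who can plug in directly and who needs support first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetUp:
    has_data: bool
    has_metadata_catalogue: bool
    has_web_service: bool
    metadata_extractable: bool  # e.g. files carry headers a harvester could read


def readiness(s: SetUp) -> str:
    """Map an institutional set-up onto the three situations (illustrative rules)."""
    if s.has_data and s.has_metadata_catalogue and s.has_web_service:
        return "ready for interoperability"
    if s.has_data and s.metadata_extractable:
        return "ready to provide metadata"
    return "needs to figure out how to store / maintain data"


institutions = {
    "Institution 1": SetUp(True, True, True, True),
    "Institution 2": SetUp(True, False, False, True),
    "Institution 3": SetUp(True, False, False, False),
}
for name, setup in institutions.items():
    print(f"{name}: {readiness(setup)}")
```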
Hybrid Option
[Figure: hybrid option. Components: the ICS; a CENTRAL NODE with web service, metadata catalogue and DATA; Institution 1 and Institution 2 (DATA, metadata extraction); Institution 3 (its own metadata catalogue and DATA)]
PROs
1. Low perturbation of existing systems
TODOs
1. Agreements on provision of data / metadata
RISK
“Empty box” scenario: nothing is really shared, only example files
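The hybrid option combines the previous ones behind a single central node: some records are harvested or hosted centrally, while institutions that already run their own services are queried remotely. A minimal sketch under assumptions (the `HybridCatalogue` class, the remote search function and the sample records are hypical illustrations, not an existing EPOS API) shows that the ICS still sees one access point.

```python
from collections.abc import Callable

Record = dict[str, str]


class HybridCatalogue:
    """Sketch of a hybrid central node (hypothetical design)."""

    def __init__(self) -> None:
        # Records whose metadata was extracted, or whose data is hosted centrally
        self.local_records: list[Record] = []
        # Institutions that already expose their own web service
        self.remote: dict[str, Callable[[str], list[Record]]] = {}

    def add_record(self, record: Record) -> None:
        self.local_records.append(record)

    def add_remote(self, name: str, search: Callable[[str], list[Record]]) -> None:
        self.remote[name] = search

    def search(self, keyword: str) -> list[Record]:
        kw = keyword.lower()
        hits = [r for r in self.local_records if kw in r["title"].lower()]
        for endpoint in self.remote.values():
            try:
                hits.extend(endpoint(keyword))
            except OSError:
                continue  # an unreachable institution is skipped, not fatal
        return hits


def institution_3_search(keyword: str) -> list[Record]:
    # Hypothetical remote institution with its own catalogue and web service
    if "friction" in keyword.lower():
        return [{"title": "Friction of basalt", "provider": "Institution 3"}]
    return []


hub = HybridCatalogue()
hub.add_record({"title": "Friction experiment", "provider": "Institution 1"})
hub.add_record({"title": "Gouge friction data", "provider": "Institution 2"})
hub.add_remote("Institution 3", institution_3_search)
results = hub.search("friction")
print([r["provider"] for r in results])  # ['Institution 1', 'Institution 2', 'Institution 3']
```

Note that such a hub can be technically complete and still hold only example records, which is the "empty box" risk above; the code cannot address that, only the provision agreements can.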
DOI
• Allows data to be uniquely identified
• Ensures the dataset can be cited
• Data access via the DOI link
• Guarantees long-term availability of data access
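Accessing data through a DOI link amounts to sending the DOI to the doi.org resolver. The sketch below normalises common DOI spellings and builds the resolver link. The validation pattern is adapted from the one Crossref has published for modern DOIs; it does not accept every legal DOI, so it is only a sanity check, and the example DOI `10.1234/example.dataset.001` is hypothetical.

```python
import re

# Adapted from the pattern Crossref has published for matching modern DOIs.
# It does not accept every legal DOI, so treat it as a sanity check only.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalise_doi(raw: str) -> str:
    """Strip common prefixes so 'doi:10.x/y' and 'https://doi.org/10.x/y' compare equal."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def is_valid_doi(raw: str) -> bool:
    return DOI_PATTERN.match(normalise_doi(raw)) is not None


def doi_link(raw: str) -> str:
    """Return the doi.org resolver link through which the data can be accessed."""
    if not is_valid_doi(raw):
        raise ValueError(f"not a valid DOI: {raw!r}")
    return "https://doi.org/" + normalise_doi(raw)


print(doi_link("doi:10.1234/example.dataset.001"))  # https://doi.org/10.1234/example.dataset.001
print(is_valid_doi("local-file-42"))                # False
```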
Current scenario
• 4 types of data provider:
– Published data (data are within the publication, e.g. as a table): no raw data, only metadata
– Raw data + metadata + DOI (GFZ + others)
– Raw data repository + no DOI + non-standard metadata
– Raw data (personal storage, e.g. hard drive) + no DOI
Plan for harmonization and DDSS prioritization
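Planning harmonization starts from knowing which of the four provider types each contributor falls into. The sketch below assigns a type from three yes/no questions; the flags, the rule order and the lab names are hypothetical, chosen only to mirror the list above, and no prioritization order is implied.

```python
from dataclasses import dataclass
from enum import Enum


class ProviderType(Enum):
    PUBLISHED = "published data: metadata only, values in the publication"
    RAW_WITH_DOI = "raw data + metadata + DOI"
    REPOSITORY_NO_DOI = "raw data repository, no DOI, non-standard metadata"
    PERSONAL_STORAGE = "raw data on personal storage, no DOI"


@dataclass(frozen=True)
class Provider:
    name: str
    raw_data_available: bool
    has_doi: bool
    in_repository: bool


def provider_type(p: Provider) -> ProviderType:
    """Assign one of the four provider types (illustrative decision rules)."""
    if not p.raw_data_available:
        return ProviderType.PUBLISHED
    if p.has_doi:
        return ProviderType.RAW_WITH_DOI
    if p.in_repository:
        return ProviderType.REPOSITORY_NO_DOI
    return ProviderType.PERSONAL_STORAGE


providers = [
    Provider("Lab A", raw_data_available=True, has_doi=True, in_repository=True),
    Provider("Lab B", raw_data_available=True, has_doi=False, in_repository=True),
    Provider("Lab C", raw_data_available=True, has_doi=False, in_repository=False),
    Provider("Lab D", raw_data_available=False, has_doi=False, in_repository=False),
]
for p in providers:
    print(p.name, "->", provider_type(p).value)
```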
Conclusions
• Some effort is needed to build a supporting infrastructure (metadata harmonization, digital infrastructure)
• A good architecture should take multiple institutional set-ups into account
• Find a way to also manage non-DOI data
• Possible outcome of the meeting → a list of available data, metadata and infrastructure for each institution, and of the institutions offering DOIs
Invitation
W3C-VRE4EIC
Smart Descriptions & Smarter Vocabularies
30 November - 1 December, CWI, Amsterdam
“The need to describe data with metadata is well
understood: the problem is how best to do it. ”[…]
9 October 2016: deadline for submission of position papers
https://www.w3.org/2016/11/sdsvoc/

 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 

Designing TCS e-Infrastructure: data, metadata and architecture

  • 1. Designing TCS e-Infrastructure: data, metadata and architecture Daniele Bailo (EPOS-ICS / INGV), Daniele Trippanera (INGV), WP7 team
  • 2. TCS Objectives (proposal) • Implementation and integration of data and services from Multi-scale Laboratories • collect and harmonize available and emerging laboratory data on the properties and processes controlling rock system behaviour at multiple scales, in order to • generate accessible and interoperable products through services for supporting research activities.
  • 3. What’s happening elsewhere (examples) • WP8: good maturity, long history of sharing data • WP9: starting from scratch, designing the architecture now (new community) • WP10: set up a shared pan-European e-infrastructure and software in EPOS-PP (GSAC) • WP11: harmonising metadata, data and products
  • 4. TCS e-infrastructures 1. The advantages of sharing data are being recognised 2. TCS infrastructures are being set up 3. Much effort is still required in ALL TCSs (even the more mature ones)
  • 5. WP16 state of the art • Some institutions have good e-infrastructures and metadata schemes • Some have data stored on local drives • Some only have data stored in accessible repositories • Some have repositories and portals with proprietary metadata schemes → Quite heterogeneous
  • 6. e-architecture • An e-infrastructure for sharing data needs an architecture • Institutions are located in several countries → distributed architecture • Several options, each with a different impact • Two main elements: 1. data repository, 2. metadata catalogue
  • 7. Generic TCS-ICS architecture [diagram components: TCS system, Data/metadata catalogue, National network, National repository, API / web service, Local HPC]
  • 8. Option 1: fully distributed [diagram: the ICS connects to Institutions 1, 2 and 3, each running its own Web Service, Metadata catalogue and DATA] PROs 1. No delegation: each institution has full control over its data and metadata CONs 1. Several access points 2. Effort for each institution to set up and maintain the e-infrastructure (WS, institutional data storage system, MD catalogue)
  • 9. Option 2: metadata centralised (distributed institutional data storage systems) [diagram: Institutions 1, 2 and 3 keep their DATA locally; metadata extraction feeds a central Metadata catalogue, exposed to the ICS through a Web Service] PROs 1. Very low delegation: each institution only provides metadata 2. Less effort: each institution only maintains its institutional data storage system CONs / TODO 1. Agreements on metadata provision
  • 10. Option 3: metadata and data repository centralised [diagram: Institutions 1, 2 and 3 pass their DATA and extracted metadata to a central data repository and Metadata catalogue, exposed to the ICS through a Web Service] PROs 1. No effort required from institutions CONs 1. Full delegation: each institution provides both data and metadata to the central node 2. Agreements on data/metadata provision 3. No infrastructure is built locally RISK 1. No data is shared
  • 11. WHAT IS THE BEST ONE?
  • 12. THE ONE that WORKS! Taking into account all aspects: 1. Technical 2. Available (financial) resources 3. Available (technical) solutions 4. Willingness to share and contribute
  • 13. Reality is complex: WP16 institutions have different set-ups
  • 14. Reality is complex: WP16 institutions have different set-ups [diagram: an institution with Metadata catalogue, DATA and Web Service → ready for interoperability]
  • 15. Reality is complex: WP16 institutions have different set-ups [diagram: an institution with Metadata catalogue, DATA and Web Service → ready for interoperability; an institution with DATA only → ready to provide metadata through metadata extraction]
  • 16. Reality is complex: WP16 institutions have different set-ups [diagram: an institution with Metadata catalogue, DATA and Web Service; an institution with DATA; an institution with DATA that needs to figure out how to store / maintain data]
  • 17. Hybrid option [diagram: a CENTRAL NODE (DATA, Metadata catalogue, Web Service) serves the ICS alongside Institutions 1, 2 and 3, which connect in different ways; e.g. metadata is extracted (MD Extr.) from Institution 1's DATA, while Institution 3 runs its own Metadata catalogue next to its DATA] PROs 1. Low perturbation of existing systems TODOs 1. Agreements on provision of data / metadata RISK “Empty box” scenario: nothing is really shared, only example files
  • 18. DOI • Allows data to be uniquely identified • Ensures citation of the dataset • Data access via a DOI link • Guarantees long-term availability of data access
  • 19. Current scenario • 4 types of DATA PROVIDER: – Published data (data is within the publication, e.g. as a table): no raw data, only metadata – Raw data + metadata + DOI (GFZ + others) – Raw data repository + no DOI + non-standard metadata – Raw data (personal storage, e.g. HD) + no DOI • Plan for harmonization and DDSS prioritization
  • 20. Conclusions • Some effort to build a supporting infrastructure is needed (metadata harmonization, digital infrastructure) • A good architecture should take into account multiple institutional set-ups • Find a way to also manage non-DOI data • Possible outcome of the meeting → a list of available data, metadata and infrastructure for each institution, and of the institutions offering DOIs
  • 21. Invitation: W3C-VRE4EIC workshop “Smart Descriptions & Smarter Vocabularies”, 30 November - 1 December, CWI, Amsterdam. “The need to describe data with metadata is well understood: the problem is how best to do it.” […] 9 October 2016: deadline for submission of position papers. https://www.w3.org/2016/11/sdsvoc/
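Slide 5 notes that some institutions use proprietary metadata schemes, and slide 3 lists WP11 as harmonising metadata. A common starting point for harmonization is a per-institution mapping from local field names onto one shared set of fields. The Python sketch below assumes a hypothetical common schema (`title`, `creator`, `sample_type`, `date`) and made-up proprietary field names; it is an illustration, not an EPOS metadata model.

```python
from typing import Dict, List

# Hypothetical common schema shared by all contributing laboratories.
COMMON_FIELDS: List[str] = ["title", "creator", "sample_type", "date"]

# Hypothetical mappings: proprietary field name -> common field name, per institution.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "institution_a": {"DatasetName": "title", "Author": "creator", "Material": "sample_type", "Created": "date"},
    "institution_b": {"name": "title", "pi": "creator", "rock": "sample_type", "acquired": "date"},
}


def harmonize(institution: str, record: Dict[str, str]) -> Dict[str, str]:
    """Translate one proprietary metadata record into the common schema.

    Unmapped source fields are dropped; common fields with no source value
    are kept as empty strings so that gaps remain visible.
    """
    mapping = FIELD_MAPPINGS[institution]
    harmonized = {name: "" for name in COMMON_FIELDS}
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target is not None:
            harmonized[target] = value
    return harmonized


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Common fields that are still empty after harmonization."""
    return [name for name in COMMON_FIELDS if not record.get(name)]


if __name__ == "__main__":
    raw = {"name": "Triaxial test 12", "pi": "J. Doe", "rock": "sandstone", "lab_notes": "internal"}
    clean = harmonize("institution_b", raw)
    print(clean)                  # 'date' is empty: the source record had no date field
    print(missing_fields(clean))  # ['date']
```

Keeping empty common fields, rather than omitting them, makes it easy to report which institutions still need to supply which information.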
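Slide 6 splits the architecture into two main elements, a data repository and a metadata catalogue. The minimal in-memory sketch below shows that split, assuming hypothetical class names and a made-up `repo://` location scheme. The repository holds the bytes, while the catalogue holds only descriptions plus a pointer to where the data live.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataRepository:
    """Stores raw data objects by identifier (stand-in for an institutional storage system)."""
    base_url: str
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, object_id: str, payload: bytes) -> str:
        """Store the bytes and return the location the catalogue should point to."""
        self.objects[object_id] = payload
        return f"{self.base_url}/{object_id}"


@dataclass
class MetadataRecord:
    identifier: str
    title: str
    keywords: List[str]
    data_location: str  # pointer into a repository; the catalogue never holds the bytes


@dataclass
class MetadataCatalogue:
    records: List[MetadataRecord] = field(default_factory=list)

    def register(self, record: MetadataRecord) -> None:
        self.records.append(record)

    def search(self, keyword: str) -> List[MetadataRecord]:
        """Case-insensitive exact keyword match over the descriptions only."""
        wanted = keyword.lower()
        return [r for r in self.records if wanted in (k.lower() for k in r.keywords)]


if __name__ == "__main__":
    repo = DataRepository("repo://institution-1")  # hypothetical location scheme
    catalogue = MetadataCatalogue()
    location = repo.put("ds-001", b"time_s,stress_mpa\n0,0.0\n")
    catalogue.register(MetadataRecord("ds-001", "Uniaxial compression test", ["rock mechanics", "sandstone"], location))
    print([r.data_location for r in catalogue.search("Sandstone")])  # ['repo://institution-1/ds-001']
```

Because the catalogue only stores locations, the two elements can sit at different places, which is exactly what distinguishes the architecture options on slides 8 to 10.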
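In Option 1 (slide 8) the ICS has no catalogue of its own: it has to query every institutional web service and merge the answers, which is where the "several access points" cost comes from. A minimal sketch of that fan-out, in which plain Python functions stand in for the institutional web services (a real deployment would make network calls; all names and the record shape are hypothetical):

```python
from typing import Callable, Dict, List, Tuple

# A stand-in for one institution's metadata web service: keyword in, records out.
Service = Callable[[str], List[Dict[str, str]]]


def make_service(institution: str, records: List[Dict[str, str]]) -> Service:
    """Build a fake institutional service that searches only its own catalogue."""
    def query(keyword: str) -> List[Dict[str, str]]:
        return [dict(r, institution=institution) for r in records if keyword in r["keywords"].split()]
    return query


def federated_search(services: Dict[str, Service], keyword: str) -> Tuple[List[Dict[str, str]], List[str]]:
    """Ask every access point and merge the answers.

    Returns the merged records plus the institutions that could not be reached,
    so one unavailable node does not break the whole search.
    """
    merged: List[Dict[str, str]] = []
    unreachable: List[str] = []
    for name, service in services.items():
        try:
            merged.extend(service(keyword))
        except (ConnectionError, TimeoutError):
            unreachable.append(name)
    return merged, unreachable


def offline(keyword: str) -> List[Dict[str, str]]:
    """Simulates an institution whose web service is down."""
    raise ConnectionError("service unavailable")


if __name__ == "__main__":
    services = {
        "Institution 1": make_service("Institution 1", [{"id": "a1", "keywords": "sandstone porosity"}]),
        "Institution 2": make_service("Institution 2", [{"id": "b7", "keywords": "granite friction"}]),
        "Institution 3": offline,
    }
    records, down = federated_search(services, "sandstone")
    print(records)  # [{'id': 'a1', 'keywords': 'sandstone porosity', 'institution': 'Institution 1'}]
    print(down)     # ['Institution 3']
```

Every new institution adds one more entry to `services`, and the ICS must tolerate each of them being unavailable; that is the practical cost behind the first CON on slide 8.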
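In Option 2 (slide 9) the data stay on each institution's storage system and only metadata travel: a metadata-extraction step reads the local holdings and feeds a central catalogue, and each record keeps a link back to the institution. A minimal sketch, assuming a hypothetical in-memory store (`path -> bytes`), a made-up record layout and `example.org` URLs; a real extractor would read richer information from file headers or sidecar descriptions.

```python
import posixpath
from typing import Dict, List


def extract_metadata(institution: str, base_url: str, storage: Dict[str, bytes]) -> List[Dict[str, object]]:
    """Derive a minimal metadata record for every file in an institutional store.

    Only descriptive information leaves the institution; the data stays where
    it is and is referenced by URL.
    """
    records: List[Dict[str, object]] = []
    for path in sorted(storage):
        _, ext = posixpath.splitext(path)
        records.append({
            "institution": institution,
            "name": posixpath.basename(path),
            "format": ext.lstrip(".").lower() or "unknown",
            "size_bytes": len(storage[path]),
            "data_url": f"{base_url}/{path}",
        })
    return records


class CentralCatalogue:
    """Central metadata catalogue: holds descriptions and links, never the data."""

    def __init__(self) -> None:
        self.records: List[Dict[str, object]] = []

    def harvest(self, institution: str, records: List[Dict[str, object]]) -> int:
        """Replace this institution's previous records, so re-harvesting does not duplicate."""
        self.records = [r for r in self.records if r["institution"] != institution]
        self.records.extend(records)
        return len(records)


if __name__ == "__main__":
    local_store = {"tests/run1.CSV": b"t,p\n0,1\n", "tests/readme": b"notes"}
    catalogue = CentralCatalogue()
    for _ in range(2):  # harvesting twice leaves one copy of each record
        catalogue.harvest("Institution 1", extract_metadata("Institution 1", "https://lab1.example.org/data", local_store))
    for record in catalogue.records:
        print(record["name"], record["format"], record["data_url"])
```

Making the harvest idempotent per institution means extraction can simply be re-run on a schedule, which keeps the ongoing effort for each institution low, as slide 9 intends.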
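Slide 10 names the main risk of Option 3 ("no data is shared"), and slide 17 describes the related "empty box" scenario: the central node may end up holding descriptions with no data behind them. A minimal sketch of a central node that keeps both catalogue and repository and reports catalogued datasets with nothing deposited; the class and method names are hypothetical, and the share ratio is only one possible indicator.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CentralNode:
    """Option 3 sketch: one node holds both the metadata catalogue and the data repository."""
    metadata: Dict[str, Dict[str, str]] = field(default_factory=dict)
    data: Dict[str, bytes] = field(default_factory=dict)

    def deposit(self, dataset_id: str, meta: Dict[str, str], payload: Optional[bytes] = None) -> None:
        """Record the description; store the data only if non-empty bytes were actually handed over."""
        self.metadata[dataset_id] = meta
        if payload:
            self.data[dataset_id] = payload

    def empty_box_report(self) -> List[str]:
        """Datasets described in the catalogue but with no data behind them."""
        return sorted(d for d in self.metadata if d not in self.data)

    def share_ratio(self) -> float:
        """Fraction of catalogued datasets whose data was really deposited."""
        if not self.metadata:
            return 0.0
        return sum(1 for d in self.metadata if d in self.data) / len(self.metadata)


if __name__ == "__main__":
    node = CentralNode()
    node.deposit("lab1-001", {"title": "Friction experiment"}, b"slip,mu\n0.1,0.6\n")
    node.deposit("lab2-004", {"title": "Porosity measurements"})  # metadata only
    print(node.empty_box_report())  # ['lab2-004']
    print(node.share_ratio())       # 0.5
```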
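Slides 13 to 17 amount to a routing decision for the hybrid option. Institutions that already run a catalogue and web service can be connected as they are. Those that only hold data can have metadata extracted into the central node. Those still looking for a way to store and maintain data can deposit with the central node. A minimal sketch of that decision, assuming each set-up can be reduced to three yes/no flags (a simplification) and using hypothetical lab names:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class IntegrationPath(Enum):
    FEDERATE = "connect the institution's own web service and catalogue to the ICS"
    EXTRACT_METADATA = "harvest metadata into the central node; data stays local"
    DEPOSIT = "store data and metadata on the central node"


@dataclass
class InstitutionSetup:
    name: str
    has_managed_storage: bool     # data kept on a maintained storage system
    has_metadata_catalogue: bool
    has_web_service: bool


def choose_path(setup: InstitutionSetup) -> IntegrationPath:
    """Pick the path that perturbs the institution's existing system the least."""
    if setup.has_metadata_catalogue and setup.has_web_service:
        return IntegrationPath.FEDERATE          # "ready for interoperability"
    if setup.has_managed_storage:
        return IntegrationPath.EXTRACT_METADATA  # "ready to provide metadata"
    return IntegrationPath.DEPOSIT               # "needs to figure out how to store / maintain data"


def plan(setups: List[InstitutionSetup]) -> Dict[str, IntegrationPath]:
    return {s.name: choose_path(s) for s in setups}


if __name__ == "__main__":
    setups = [
        InstitutionSetup("Lab A", True, False, False),
        InstitutionSetup("Lab B", False, False, False),
        InstitutionSetup("Lab C", True, True, True),
    ]
    for name, path in plan(setups).items():
        print(f"{name}: {path.name}")
```

Checking the most capable set-up first reflects the hybrid option's main PRO, low perturbation of existing systems: nobody is asked to move data that is already properly served.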
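Slide 18 lists what a DOI brings: unique identification, citation, and access through a DOI link. A DOI consists of a prefix starting with `10.`, a slash and a suffix, and it is resolved through the `https://doi.org/` proxy. The target URL registered behind a DOI can be updated when data move, while the link itself stays the same. The sketch below checks DOI syntax with a pattern adapted from the one Crossref suggests for matching most modern DOIs (some rare legacy DOIs will not match), then builds a resolver link and a simple citation string. The citation layout is illustrative rather than a mandated format, and `10.1000/xyz123` is a placeholder, not a real dataset.

```python
import re
from urllib.parse import quote

# Adapted from the pattern Crossref suggests for matching most modern DOIs;
# some rare legacy DOIs will not match.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def is_valid_doi(doi: str) -> bool:
    """Syntax check only: a well-formed DOI may still be unregistered."""
    return DOI_PATTERN.match(doi.strip()) is not None


def doi_link(doi: str) -> str:
    """Resolver link via the doi.org proxy, percent-encoding everything but '/'."""
    doi = doi.strip()
    if not is_valid_doi(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + quote(doi, safe="/")


def cite(creator: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Illustrative dataset citation string ending in the persistent DOI link."""
    return f"{creator} ({year}): {title}. {publisher}. {doi_link(doi)}"


if __name__ == "__main__":
    placeholder = "10.1000/xyz123"  # placeholder, not a real dataset DOI
    print(is_valid_doi(placeholder))  # True
    print(cite("Doe, J.", 2016, "Example rock deformation dataset", "Example Data Centre", placeholder))
```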
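The four provider types on slide 19 can be told apart with three questions: is raw data available, does it have a DOI, and is it in a repository rather than on personal storage? A minimal sketch of that classification follows. The priority order at the end (least harmonization work first) is an assumption for illustration only, since the slide just states that a prioritization plan is needed.

```python
from enum import Enum
from typing import List, NamedTuple


class ProviderType(Enum):
    PUBLISHED_ONLY = "published data only (e.g. tables in a paper): metadata, no raw data"
    RAW_WITH_DOI = "raw data + metadata + DOI"
    REPOSITORY_NO_DOI = "raw data repository, no DOI, non-standard metadata"
    PERSONAL_STORAGE = "raw data on personal storage (e.g. HD), no DOI"


class Provider(NamedTuple):
    name: str
    has_raw_data: bool
    has_doi: bool
    in_repository: bool


def classify(p: Provider) -> ProviderType:
    """Map a provider onto one of the four types from slide 19."""
    if not p.has_raw_data:
        return ProviderType.PUBLISHED_ONLY
    if p.has_doi:
        return ProviderType.RAW_WITH_DOI
    if p.in_repository:
        return ProviderType.REPOSITORY_NO_DOI
    return ProviderType.PERSONAL_STORAGE


# ASSUMED ordering, not from the slides: providers needing the least work come first.
ASSUMED_PRIORITY = [
    ProviderType.RAW_WITH_DOI,
    ProviderType.REPOSITORY_NO_DOI,
    ProviderType.PUBLISHED_ONLY,
    ProviderType.PERSONAL_STORAGE,
]


def prioritize(providers: List[Provider]) -> List[Provider]:
    """Stable sort by the assumed priority of each provider's type."""
    return sorted(providers, key=lambda p: ASSUMED_PRIORITY.index(classify(p)))


if __name__ == "__main__":
    providers = [
        Provider("Lab A", has_raw_data=True, has_doi=False, in_repository=False),
        Provider("Lab B", has_raw_data=True, has_doi=True, in_repository=True),
        Provider("Lab C", has_raw_data=False, has_doi=False, in_repository=False),
    ]
    for p in prioritize(providers):
        print(p.name, classify(p).name)
```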
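Slide 20 suggests one possible outcome of the meeting: a list of available data, metadata and infrastructure for each institution, plus the institutions offering DOIs. That is essentially a small inventory. A minimal sketch of how such an inventory could be recorded and queried, including which institutions hold non-DOI data; the field names and example entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InstitutionInventory:
    institution: str
    metadata_scheme: str            # e.g. "none", "proprietary", or a named standard
    has_repository: bool
    has_web_service: bool
    offers_doi: bool
    data_types: List[str] = field(default_factory=list)


def offering_doi(inventory: List[InstitutionInventory]) -> List[str]:
    return sorted(i.institution for i in inventory if i.offers_doi)


def holding_non_doi_data(inventory: List[InstitutionInventory]) -> List[str]:
    """Institutions that hold data but offer no DOI: the non-DOI data to be managed."""
    return sorted(i.institution for i in inventory if i.data_types and not i.offers_doi)


def summary_table(inventory: List[InstitutionInventory]) -> str:
    """Plain-text overview, one row per institution."""
    rows = [f"{'Institution':<14}{'Metadata':<13}{'Repo':<6}{'WS':<5}{'DOI':<5}Data"]
    for i in sorted(inventory, key=lambda x: x.institution):
        yes_no = ["yes" if flag else "no" for flag in (i.has_repository, i.has_web_service, i.offers_doi)]
        rows.append(f"{i.institution:<14}{i.metadata_scheme:<13}{yes_no[0]:<6}{yes_no[1]:<5}{yes_no[2]:<5}{', '.join(i.data_types)}")
    return "\n".join(rows)


if __name__ == "__main__":
    inventory = [
        InstitutionInventory("Lab A", "proprietary", True, True, False, ["rock deformation"]),
        InstitutionInventory("Lab B", "none", False, False, False, ["analogue models"]),
        InstitutionInventory("Lab C", "DataCite", True, True, True, ["paleomagnetism"]),
    ]
    print(summary_table(inventory))
    print("Offering DOIs:", offering_doi(inventory))
    print("Non-DOI data:", holding_non_doi_data(inventory))
```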