Designing TCS e-Infrastructure: data, metadata and architecture
Daniele Bailo (EPOS-ICS / INGV),
Daniele Trippanera (INGV), WP7 team
TCS Objectives (proposal)
• Implementation and integration of data and services from Multi-scale Laboratories
• collect and harmonize available and emerging laboratory data on the properties and processes controlling rock system behaviour at multiple scales, in order to
• generate accessible and interoperable products through services for supporting research activities.
What’s happening elsewhere (examples)
• WP8: good maturity, long history of sharing data
• WP9: starting from scratch, designing the architecture now (new community)
• WP10: set up a shared pan-European e-infrastructure and software in EPOS-PP (GSAC)
• WP11: harmonising metadata, data and products
TCS e-infrastructures
1. The advantages of sharing data are being recognised
2. TCS infrastructures are being set up
3. Much effort is still required in ALL TCSs (even the more mature ones)
WP16 state of the art
• Some institutions have good e-infrastructures and metadata schemes
• Some have data stored on local drives
• Some only have data stored in accessible repositories
• Some have repositories and portals with proprietary metadata schemes
→ Quite heterogeneous
e-architecture
• An e-infrastructure for sharing data needs an architecture
• Institutions are located in several countries → distributed architecture
• Several options, each with a different impact
• Two main elements:
1. data repository
2. metadata catalogue
Generic TCS-ICS architecture
[Diagram: TCS system with a data/metadata catalogue, national network, national repository, API / web service and local HPC]
Option 1: fully distributed
[Diagram: the ICS connects to Institutions 1, 2 and 3, each running its own web service, metadata catalogue and data storage]
PROs
1. No delegation: each institution has full control over its data and metadata
CONs
1. Several access points
2. Each institution must set up and maintain its own e-infrastructure (web service, institutional data storage system, metadata catalogue)
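The "several access points" drawback of Option 1 can be made concrete with a minimal sketch of a federated search: the ICS has to query every institutional service and merge the answers itself. The slides do not specify any API, so the service interface, the record fields and the names `make_local_service` and `federated_search` are all hypothetical; each institutional web service is simulated in memory instead of over HTTP.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class MetadataRecord:
    identifier: str
    title: str
    institution: str


# Hypothetical: each institution exposes its own search web service. Here a
# service is modelled as a plain function (keyword -> matching records); in a
# real deployment it would be an HTTP API that the slides do not specify.
SearchService = Callable[[str], List[MetadataRecord]]


def make_local_service(institution: str, titles: List[str]) -> SearchService:
    """Build an in-memory stand-in for one institution's catalogue service."""
    records = [
        MetadataRecord(f"{institution}-{i}", title, institution)
        for i, title in enumerate(titles)
    ]

    def search(keyword: str) -> List[MetadataRecord]:
        return [r for r in records if keyword.lower() in r.title.lower()]

    return search


def federated_search(services: Dict[str, SearchService], keyword: str) -> List[MetadataRecord]:
    """Query every institutional access point and merge the results (Option 1)."""
    results: List[MetadataRecord] = []
    for service in services.values():
        try:
            results.extend(service(keyword))
        except ConnectionError:
            # An unreachable institution should not break the whole search.
            continue
    return sorted(results, key=lambda r: r.identifier)


services = {
    "Institution 1": make_local_service("INST1", ["Friction experiment A", "Rock permeability"]),
    "Institution 2": make_local_service("INST2", ["Friction experiment B"]),
    "Institution 3": make_local_service("INST3", ["Magnetic susceptibility"]),
}
hits = federated_search(services, "friction")
```

The ICS must know, and keep up to date, the full list of institutional endpoints; every new institution adds one more entry to `services`.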
Option 2: metadata centralised
(distributed institutional data storage system)
[Diagram: Institutions 1, 2 and 3 keep their data locally; metadata extraction feeds a central metadata catalogue and web service serving the ICS]
PROs
1. Very low delegation: each institution only provides metadata
2. Less effort: each institution only maintains its institutional data storage system
CONs / TODO
1. Agreements on metadata provision
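In Option 2 only metadata travel to the centre, while each catalogue entry points back to the data held at the institution. A minimal sketch under that assumption follows; the function `extract_metadata`, the class `CentralCatalogue`, the file header format and the `.example` URLs are hypothetical, and a real deployment would use an agreed harvesting protocol and metadata schema rather than this in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class CatalogueEntry:
    identifier: str
    institution: str
    title: str
    data_url: str  # the data stay at the institution; only this link is centralised


def extract_metadata(institution: str, base_url: str, filename: str,
                     header: Dict[str, str]) -> CatalogueEntry:
    """Hypothetical 'metadata extraction' step for one locally stored file."""
    return CatalogueEntry(
        identifier=f"{institution}:{filename}",
        institution=institution,
        title=header.get("title", filename),
        data_url=f"{base_url.rstrip('/')}/{filename}",
    )


class CentralCatalogue:
    """Central metadata catalogue; it never stores the data themselves."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogueEntry] = {}

    def harvest(self, entries: Iterable[CatalogueEntry]) -> int:
        """Insert or update entries and return how many were new."""
        new = 0
        for entry in entries:
            if entry.identifier not in self._entries:
                new += 1
            self._entries[entry.identifier] = entry
        return new

    def search(self, keyword: str) -> List[CatalogueEntry]:
        return [e for e in self._entries.values() if keyword.lower() in e.title.lower()]


catalogue = CentralCatalogue()
batch = [
    extract_metadata("INST1", "https://data.inst1.example/lab/", "run42.csv",
                     {"title": "Triaxial test run 42"}),
]
first = catalogue.harvest(batch)
second = catalogue.harvest(batch)  # re-harvesting the same batch adds nothing new
```

Making `harvest` an idempotent upsert means institutions can resend their metadata periodically without creating duplicates, which keeps the "agreement on metadata provision" simple.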
Option 3: metadata and data repository centralised
[Diagram: data from Institutions 1, 2 and 3 are transferred to a central node, which hosts the data, the metadata catalogue (populated by metadata extraction) and the web service for the ICS]
PROs
1. No effort required from institutions
CONs
1. Full delegation: each institution provides both data and metadata to the central node
2. Agreements on data/metadata provision
3. No infrastructure is built locally
RISK
1. No data is shared
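With full delegation, the central node becomes the custodian of the institutions' files as well as their metadata. The sketch below assumes a simple ingest step that stores the delivered bytes, records a SHA-256 checksum and verifies it on download; the class `CentralNode`, its methods and the sample dataset are hypothetical illustrations, not part of any EPOS component.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class StoredDataset:
    identifier: str
    institution: str
    metadata: Dict[str, str]
    sha256: str
    size_bytes: int


class CentralNode:
    """Hypothetical central node holding both the data and the metadata (Option 3)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._catalogue: Dict[str, StoredDataset] = {}

    def ingest(self, institution: str, name: str, payload: bytes,
               metadata: Dict[str, str]) -> StoredDataset:
        """Accept a dataset delivered by an institution and record its checksum."""
        identifier = f"{institution}:{name}"
        if identifier in self._catalogue:
            raise ValueError(f"dataset already ingested: {identifier}")
        record = StoredDataset(
            identifier=identifier,
            institution=institution,
            metadata=dict(metadata),
            sha256=hashlib.sha256(payload).hexdigest(),
            size_bytes=len(payload),
        )
        self._data[identifier] = payload
        self._catalogue[identifier] = record
        return record

    def download(self, identifier: str) -> bytes:
        """Serve the central copy after checking it still matches its checksum."""
        payload = self._data[identifier]
        if hashlib.sha256(payload).hexdigest() != self._catalogue[identifier].sha256:
            raise IOError(f"checksum mismatch for {identifier}")
        return payload


node = CentralNode()
sample = b"time,stress\n0,0\n"
rec = node.ingest("INST2", "uniaxial_07.csv", sample, {"title": "Uniaxial test 07"})
```

Nothing in this design runs at the institution, which is the "no infrastructure is built locally" point: if institutions never call `ingest`, the central node stays empty.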
WHAT IS THE BEST ONE?
THE ONE that WORKS!
Taking into account all aspects:
1. Technical
2. Available (financial) resources
3. Available (technical) solutions
4. Willingness to share and contribute
Reality is complex
WP16 Institutions have different set-ups
• Institution with metadata catalogue, data and web service: ready for interoperability
• Institution with data, from which metadata can be extracted: ready to provide metadata
• Institutions with data only: need to figure out how to store / maintain data
Hybrid Option
[Diagram: the ICS is served by a CENTRAL NODE (web service, metadata catalogue, data); Institutions 1, 2 and 3 connect to it with different set-ups, from data with metadata extraction (MD Extr.) to a local metadata catalogue with data]
PROs
1. Low perturbation of existing systems
TODOs
1. Agreements on provision of data / metadata
RISK
“Empty box” scenario: nothing is really shared, only example files
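One way to read the hybrid option is as a per-institution choice among the three earlier options, driven by each institution's existing set-up. The sketch below encodes that reading; the decision rule ("reuse as much of the existing set-up as possible"), the flags in `InstitutionSetup` and the institution names are assumptions made for illustration, not a rule stated in the slides.

```python
from dataclasses import dataclass
from enum import Enum


class Integration(Enum):
    OWN_SERVICE = "link the institution's own web service and catalogue (as in Option 1)"
    HARVEST_METADATA = "extract metadata into the central catalogue; data stay local (as in Option 2)"
    HOST_CENTRALLY = "store data and metadata at the central node (as in Option 3)"


@dataclass(frozen=True)
class InstitutionSetup:
    name: str
    has_web_service: bool
    has_metadata_catalogue: bool
    has_accessible_storage: bool


def choose_integration(setup: InstitutionSetup) -> Integration:
    """Hypothetical rule: reuse as much of the existing set-up as possible."""
    if setup.has_web_service and setup.has_metadata_catalogue:
        return Integration.OWN_SERVICE
    if setup.has_accessible_storage:
        return Integration.HARVEST_METADATA
    return Integration.HOST_CENTRALLY


setups = [
    InstitutionSetup("Institution A", True, True, True),
    InstitutionSetup("Institution B", False, False, True),
    InstitutionSetup("Institution C", False, False, False),
]
plan = {s.name: choose_integration(s) for s in setups}
```

Choosing the least intrusive mode per institution is what keeps perturbation of existing systems low; the agreements listed under TODOs are still needed for every mode.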
DOI
• Allows data to be uniquely identified
• Ensures citation of the dataset
• Provides data access via the DOI link
• Guarantees long-term availability of data access
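Data access "via DOI link" works through the public resolver at https://doi.org/, which redirects a DOI name to the dataset's current landing page. The sketch below normalises a DOI written in common forms and builds that link. The regular expression is a simplified syntax check (an assumption, not the full DOI specification), and `10.1234/example.dataset.42` is a made-up identifier.

```python
import re
from urllib.parse import quote

# Simplified DOI syntax check: "10." + registrant code + "/" + suffix.
# Real DOI suffixes are more permissive; this is a sketch, not a full validator.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(raw: str) -> str:
    """Strip common prefixes and return the bare DOI name."""
    doi = raw.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_link(raw: str) -> str:
    """Return the https://doi.org/ link that resolves to the dataset landing page."""
    return "https://doi.org/" + quote(normalise_doi(raw), safe="/")
```

Long-term availability comes from the registration agency and the repository keeping the DOI record up to date, not from the link itself; the code only guarantees a consistent, citable form.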
Current scenario
• 4 types of DATA PROVIDERS:
– Published data (data are within the publication, e.g. as a table). No raw data, only metadata
– Raw data + metadata + DOI (GFZ + others)
– Raw data repository + no DOI + non-standard metadata
– Raw data (personal storage, e.g. hard drive) + no DOI
Plan for harmonization and DDSS prioritization
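As a first step towards harmonization, each institution's holdings could be sorted into the four provider types above and counted. The sketch below assumes three yes/no attributes per holding; the attribute names, the laboratory names and the treatment of repositories without DOIs are hypothetical simplifications, and the slides do not define how DDSS prioritization would use the result.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ProviderType(Enum):
    PUBLISHED_ONLY = "Published data (e.g. table in a paper): metadata only, no raw data"
    RAW_WITH_DOI = "Raw data + metadata + DOI"
    REPOSITORY_NO_DOI = "Raw data repository + no DOI + non-standard metadata"
    PERSONAL_STORAGE = "Raw data on personal storage + no DOI"


@dataclass(frozen=True)
class DataHolding:
    has_raw_data: bool
    has_doi: bool
    in_repository: bool


def classify(holding: DataHolding) -> ProviderType:
    """Map a data holding onto the four provider types of the current scenario."""
    if not holding.has_raw_data:
        return ProviderType.PUBLISHED_ONLY
    if holding.has_doi:
        return ProviderType.RAW_WITH_DOI
    if holding.in_repository:
        # Assumption: any repository without DOIs is treated as this type,
        # whatever its metadata standard.
        return ProviderType.REPOSITORY_NO_DOI
    return ProviderType.PERSONAL_STORAGE


# Hypothetical inventory, one entry per laboratory.
inventory = {
    "Lab A": DataHolding(has_raw_data=True, has_doi=True, in_repository=True),
    "Lab B": DataHolding(has_raw_data=True, has_doi=False, in_repository=True),
    "Lab C": DataHolding(has_raw_data=True, has_doi=False, in_repository=False),
    "Lab D": DataHolding(has_raw_data=False, has_doi=False, in_repository=False),
}
summary = Counter(classify(h) for h in inventory.values())
```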
Conclusions
• Some effort is needed to build a supporting infrastructure (metadata harmonization, digital infrastructure)
• A good architecture should take into account multiple institutional set-ups
• Find a way to also manage non-DOI data
• Possible outcome of the meeting → a list of the available data, metadata and infrastructure for each institution, and of the institutions offering DOIs
Invitation
W3C-VRE4EIC
Smart Descriptions & Smarter Vocabularies
30 November - 1 December, CWI, Amsterdam
“The need to describe data with metadata is well understood: the problem is how best to do it.” […]
9 October 2016: Deadline for submission of Position Papers
https://www.w3.org/2016/11/sdsvoc/
