Daniele Bailo
METADATA
& BROKERING
a modern approach
EPISODE #2
Previously on… Metadata & Brokering #1
Main concepts
- Digital Data
- Metadata
- Brokering system
- The triad <PID, MD, DO>
- Database
- APIs (web services)
Side concepts
- Ontologies / Semantics
- PID
- Digital Object
- Standard
- Interoperability
- Open Access
[Figure: several data sets, each exposed through an API]
Discovery (DC, CKAN, eGMS)
Contextual (CERIF metadata model)
Detailed (community specific)
Features
1. APIs
2. <PID, metadata, DO>
3. Contextualization metadata
4. Support for ontologies
[Figure: a request for "Data from Irpinia" is answered with a <PID, metadata, DO> response]
THE PERFECT SYSTEM
#6 Metadata-driven canonical brokering with contextualization & PID
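The triad <PID, metadata, DO> and the request/response flow of the perfect system can be pictured as a tiny in-memory store. This is a minimal Python sketch under stated assumptions, not the design of any real broker: the class names, the `hdl:0000/...` identifiers and the metadata fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triad:
    """One <PID, metadata, DO> triad (hypothetical structure)."""
    pid: str               # persistent identifier
    metadata: dict         # data about the data
    digital_object: bytes  # the data itself


@dataclass
class TriadStore:
    """Toy broker-side store: answers a metadata request with whole triads."""
    triads: dict = field(default_factory=dict)

    def add(self, triad: Triad) -> None:
        self.triads[triad.pid] = triad

    def resolve(self, pid: str) -> Triad:
        # A PID always leads back to the same metadata and object.
        return self.triads[pid]

    def request(self, **criteria) -> list:
        # Metadata-driven: keep triads whose metadata match every criterion.
        return [
            t for t in self.triads.values()
            if all(t.metadata.get(k) == v for k, v in criteria.items())
        ]


store = TriadStore()
store.add(Triad("hdl:0000/example-1", {"region": "Irpinia", "type": "waveform"}, b"..."))
store.add(Triad("hdl:0000/example-2", {"region": "Friuli", "type": "waveform"}, b"..."))
response = store.request(region="Irpinia")
print([t.pid for t in response])  # ['hdl:0000/example-1']
```

The request is expressed only in terms of metadata; the PID in each response is what a client would keep to cite or fetch the object later.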
NEW & OLD CHARACTERS
Metadata
Purposes
1. Discovery (by humans & machines)
2. Contextualization: what is the context of the data
3. Use in processing or other advanced tasks
Usually attached to a D.O. (Digital Object)
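A minimal sketch of purpose 1 (discovery), assuming metadata records are plain dictionaries attached to each D.O. through its PID; the records and field names are hypothetical.

```python
# Hypothetical metadata records, each attached to a digital object via its PID.
records = [
    {"pid": "do-1", "title": "Irpinia seismic waveforms", "keywords": ["seismology", "waveform"]},
    {"pid": "do-2", "title": "GNSS time series", "keywords": ["geodesy", "gnss"]},
]


def discover(records, term):
    """Discovery: return PIDs of records whose title or keywords mention `term`."""
    term = term.lower()
    return [
        r["pid"] for r in records
        if term in r["title"].lower() or any(term == k.lower() for k in r["keywords"])
    ]


print(discover(records, "waveform"))  # ['do-1']
```

The same function serves a human typing into a portal and a machine calling an API; only the interface in front of it changes.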
Interoperability
What & Why
Enables 2 systems to
1. Exchange information
2. Understand information
Usually achieved through:
- Agreed language
- Software "translators" (interfaces, thin layers)
...what, are you speaking Arabic???
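A software "translator" can be as thin as a field-name mapping. The sketch assumes system A labels its metadata with Italian field names and system B expects an agreed vocabulary; both vocabularies and the mapping are hypothetical. Unmapped fields are set aside rather than dropped, so nothing is lost silently.

```python
# Hypothetical mapping from system A's local field names to an agreed vocabulary.
A_TO_AGREED = {
    "autore": "creator",
    "titolo": "title",
    "data_creazione": "date",
}


def translate(record, mapping):
    """Thin-layer translator: rename known fields, keep unknown ones aside."""
    translated, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            translated[mapping[key]] = value
        else:
            unmapped[key] = value
    return translated, unmapped


record_a = {"autore": "INGV", "titolo": "Irpinia waveforms", "formato": "miniSEED"}
print(translate(record_a, A_TO_AGREED))
# ({'creator': 'INGV', 'title': 'Irpinia waveforms'}, {'formato': 'miniSEED'})
```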
Ontologies
Why an ontology?
It is the way machines
manage “meaning”
How does it work?
1. Connects concepts
2. Needs a vocabulary
Issues
• Many ontologies exist
• Vocabulary mapping
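[Figure: example ontology graph connecting Michelini, CNT, INGV, Gresta, Sailing, Trieste, Italy, Boat and sea through relations such as "is director of", "is section of", "is president of", "has hobby", "is born (in)", "located in" and "use"]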
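Ontologies store knowledge as (subject, predicate, object) statements; real systems use RDF/OWL and a triple store. As a rough Python stand-in, the sketch keeps triples in a set. The triples are our reading of the diagram and are illustrative only; the exact edges are an assumption.

```python
# Illustrative (subject, predicate, object) triples, loosely based on the diagram.
triples = {
    ("Michelini", "is director of", "CNT"),
    ("CNT", "is section of", "INGV"),
    ("Trieste", "located in", "Italy"),
    ("Sailing", "use", "Boat"),
    ("Sailing", "use", "sea"),
}


def query(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def linked_to(triples, node):
    """Follow edges outward from `node` and collect every reachable concept."""
    seen, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for _, _, target in query(triples, subject=current):
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen


print(query(triples, predicate="use"))  # [('Sailing', 'use', 'Boat'), ('Sailing', 'use', 'sea')]
print(sorted(linked_to(triples, "Michelini")))  # ['CNT', 'INGV']
```

The vocabulary issue shows up immediately: a second system writing "uses" instead of "use" would need a mapping before the two graphs could be queried together.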
Metadata
Catalogue #1
Purposes
Store metadata, e.g.:
1. producer
2. date of creation
3. data format
Misleading
Example (why?)
Metadata
Catalogue #2
How to implement it?
Single table (bad habit)
One table with all data
Multi table (good habit)
- Data is stored in multiple tables (one per concept)
- Tables are linked
- Can contextualize data
Metadata catalogue = relational database *
[Figure: single table vs. multi table]
Metadata
Catalogue #2
How to implement it?
Single table (bad habit)
One table with all data
Multi table (good habit)
- Data is stored in separate tables (one per concept)
- Tables are linked
- Can contextualize data
Metadata catalogue = relational database *
[Figure: single table vs. multi table and contextualization]
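The multi-table (good habit) layout can be tried with Python's built-in `sqlite3`. A minimal sketch, assuming a hypothetical schema with one table per concept (organisation, person, dataset) and invented sample rows; the joins show how linked tables contextualize a dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per concept (hypothetical schema).
    CREATE TABLE organisation (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE person (
        id              INTEGER PRIMARY KEY,
        name            TEXT NOT NULL,
        organisation_id INTEGER REFERENCES organisation(id)
    );
    CREATE TABLE dataset (
        id          INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        created     TEXT,            -- date of creation (ISO 8601)
        data_format TEXT,
        producer_id INTEGER REFERENCES person(id)
    );
    """
)
conn.execute("INSERT INTO organisation VALUES (1, 'INGV')")
conn.execute("INSERT INTO person VALUES (1, 'Jane Doe', 1)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Irpinia waveforms", "2015-03-01", "miniSEED", 1),
        (2, "Station inventory", "2015-04-10", "StationXML", 1),
    ],
)

# Contextualization: joins walk the links from a dataset to its producer and organisation.
rows = conn.execute(
    """
    SELECT d.title, p.name, o.name
    FROM dataset d
    JOIN person p       ON d.producer_id = p.id
    JOIN organisation o ON p.organisation_id = o.id
    ORDER BY d.id
    """
).fetchall()
for row in rows:
    print(row)
```

Renaming the organisation touches one row of `organisation`; in a single table the same name would be repeated on every dataset row and could drift out of sync.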
Catalogue Interface
Human interface (GUI)
Website or portal
Machine interface
- API or web service
- which executes scripts or queries
- returns metadata in a given standard
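The machine interface can be pictured as a function that looks a record up and returns it in an agreed standard. A sketch, assuming JSON output keyed by Dublin Core element names (`identifier`, `title`, `creator`, `date`, `format`); the internal field names and the sample record are hypothetical, and a real Dublin Core service would normally also offer XML with proper namespaces.

```python
import json

# Internal catalogue fields (hypothetical) -> Dublin Core element names.
DC_FIELDS = {
    "pid": "identifier",
    "title": "title",
    "producer": "creator",
    "created": "date",
    "data_format": "format",
}


def get_metadata(catalogue, pid):
    """Machine interface: look a record up and return it as DC-style JSON."""
    record = catalogue.get(pid)
    if record is None:
        return json.dumps({"error": "not found", "identifier": pid})
    dc = {DC_FIELDS[k]: v for k, v in record.items() if k in DC_FIELDS}
    return json.dumps(dc, sort_keys=True)


catalogue = {
    "do-1": {"pid": "do-1", "title": "Irpinia waveforms", "producer": "INGV",
             "created": "2015-03-01", "data_format": "miniSEED", "internal_note": "tape 42"},
}
print(get_metadata(catalogue, "do-1"))
# {"creator": "INGV", "date": "2015-03-01", "format": "miniSEED", "identifier": "do-1", "title": "Irpinia waveforms"}
```

Internal fields such as `internal_note` never leave the catalogue: the standard defines what the outside world sees.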
What is it?
It does something for the user
(delivers value to the customer)*
A "thin layer"
We usually don't know what's under the hood
Examples
- FDSN stations (web) service
[Figure: the FDSN stations and FDSN Dataselect services in front of a database (MD catalogue) and a waveform repository]
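As a thin layer, an FDSN web service is used through URLs, without knowing what sits behind it. The sketch builds query URLs following the FDSN path convention `/fdsnws/<service>/1/query`; the base URL is a placeholder and the station code is illustrative, so substitute a real data centre before use.

```python
from urllib.parse import urlencode

# Placeholder base URL: replace with a real FDSN data centre (assumption).
BASE_URL = "https://example.org"


def fdsn_query_url(service, **params):
    """Build an FDSN web service query URL for service='station' or 'dataselect'.

    The caller only sees the URL; which database or waveform repository
    answers it stays hidden behind the service.
    """
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/fdsnws/{service}/1/query?{query}"


station_url = fdsn_query_url("station", network="IV", station="ACER", level="station", format="text")
waveform_url = fdsn_query_url(
    "dataselect", network="IV", station="ACER", channel="HHZ",
    starttime="2015-01-01T00:00:00", endtime="2015-01-01T00:10:00",
)
print(station_url)
# https://example.org/fdsnws/station/1/query?network=IV&station=ACER&level=station&format=text
```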
CKAN
[Figure: CKAN GUI and CKAN APIs on top of a METADATA catalogue, filled through plugins by metadata replication from EIDA stations and ISIDE stations]
What is it?
- Metadata catalogue
- With interfaces (GUI+API)
- No direct CKAN <-> sources connection
Examples
- Works with FDSN stations
- Doesn't work with FDSN dataselect
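The CKAN pattern (metadata replicated in, no direct link back to the sources) can be sketched as follows. This is not the CKAN API or its harvesting extension: `ReplicatingCatalogue` and `eida_stations` are hypothetical stand-ins that show why station metadata can be served while waveform (dataselect) requests cannot.

```python
class ReplicatingCatalogue:
    """Toy CKAN-like catalogue: metadata are copied in, sources are never contacted again."""

    def __init__(self):
        self.records = {}

    def harvest(self, source_name, fetch_metadata):
        # `fetch_metadata` stands in for a harvesting plugin (hypothetical callable).
        for record in fetch_metadata():
            self.records[(source_name, record["id"])] = dict(record)

    def search(self, **criteria):
        return sorted(
            key for key, rec in self.records.items()
            if all(rec.get(k) == v for k, v in criteria.items())
        )

    def get_data(self, key):
        # Only metadata were replicated: there is no live link to the source service.
        raise NotImplementedError("no direct catalogue <-> source connection")


def eida_stations():
    # Hypothetical station metadata, as a station service might return them.
    return [{"id": "IV.ACER", "network": "IV"}, {"id": "MN.AQU", "network": "MN"}]


catalogue = ReplicatingCatalogue()
catalogue.harvest("EIDA stations", eida_stations)
print(catalogue.search(network="IV"))  # [('EIDA stations', 'IV.ACER')]
```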
Brokering System
(e.g. VERCE framework)
[Figure: BROKER GUI and BROKER APIs on top of a METADATA catalogue, filled by metadata replication from EIDA stations and ISIDE stations; a system manager gives interactive access to services such as EIDA dataselect and to a processing facility, plus further services marked "? ? ?"]
What is it?
- Metadata catalogue
- With interfaces (GUI+API)
- System manager
- Other modules
- BROKER <-> sources interactive connection
Examples
- EIDA stations
- EIDA dataselect
- Processing job at a processing facility
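The difference from CKAN is the interactive connection: the broker forwards a request to the live service and can hand the result to a processing facility. A minimal sketch with hypothetical names (`Broker`, `fake_dataselect`, `mean_processor`); the "service" is a local function standing in for EIDA dataselect.

```python
class Broker:
    """Toy broker: keeps a metadata catalogue AND forwards requests to live services."""

    def __init__(self):
        self.catalogue = {}  # replicated metadata, searchable as in the CKAN case (unused below)
        self.services = {}   # name -> callable standing in for a remote service

    def register_service(self, name, handler):
        self.services[name] = handler

    def request(self, service, **params):
        # System manager role: route the request interactively to the source service.
        if service not in self.services:
            raise KeyError(f"unknown service: {service}")
        return self.services[service](**params)

    def run_job(self, service, processor, **params):
        # Fetch data through the broker, then hand it to a processing facility.
        return processor(self.request(service, **params))


def fake_dataselect(station, samples):
    # Hypothetical stand-in for EIDA dataselect; `station` is ignored by this stub.
    return [float(i % 5) for i in range(samples)]


def mean_processor(trace):
    # Hypothetical stand-in for a processing job.
    return sum(trace) / len(trace)


broker = Broker()
broker.register_service("EIDA dataselect", fake_dataselect)
print(broker.run_job("EIDA dataselect", mean_processor, station="ACER", samples=10))  # 2.0
```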
Comments & Questions
Why was the example misleading?
A global view
Data initiatives
RDA
- "Regulate" data sharing/use
EUDAT
- Common data infrastructure
EGI
- Organize National Grid Infrastructures (CINECA)
EPOS
- ESFRI integrating Solid Earth science
RDA
Do for data what has been done for the internet (TCP/IP)
RDA concepts
Data Fabric
What?
Identifies mechanisms, standards, components and interfaces that make data science efficient and cost-effective
Data Management Plan
• Data management
• Data analysis
• Data preservation
• Data publication
• Data sharing
[UK Data Archive http://www.data-archive.ac.uk/]
RDA concepts
Data Fabric
[RDA WG outputs https://indico.cern.ch/event/370271/session/2/contribution/6/material/0/0.pdf]
[Figure: the data fabric, annotated with the questions "How to store?", "How to register?", "How to discover?", "How to cite?", "How to document processing?", "How to integrate?", "How to collect new DP?" and "How to access data?"]
How to discover data?
Metadata system
WE ALREADY KNOW EVERYTHING ABOUT IT
[Figure: METADATA catalogue]
standards?
How to preserve data?
Registry system
What?
An agreed/legacy catalog of:
- data formats (schemas)
- metadata formats
- vocabularies & semantic categories
- data types
- trusted repositories
- ….
Registry
Ahaa.. but in practice it's a database..
…indeed…
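Indeed, a registry can be sketched as a single database table. The sketch uses `sqlite3`; the categories and entries are illustrative examples, not the content of any real registry.

```python
import sqlite3

# "In practice it's a database": a registry as one table of registered entries.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE registry (
        category TEXT NOT NULL,   -- e.g. data format, metadata format, vocabulary
        name     TEXT NOT NULL,
        version  TEXT,
        PRIMARY KEY (category, name, version)
    )
    """
)
conn.executemany(
    "INSERT INTO registry VALUES (?, ?, ?)",
    [
        ("data format", "miniSEED", "2"),
        ("metadata format", "StationXML", "1.0"),
        ("metadata format", "Dublin Core", "1.1"),
    ],
)


def lookup(category):
    """Return registered (name, version) pairs in a category, so systems can agree on them."""
    rows = conn.execute(
        "SELECT name, version FROM registry WHERE category = ? ORDER BY name", (category,)
    )
    return rows.fetchall()


print(lookup("metadata format"))  # [('Dublin Core', '1.1'), ('StationXML', '1.0')]
```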
How to register/cite data or publications?
PID system
Purpose
- DO / publication can be uniquely referenced
- Assign a PID at data creation time
Issues
- Need for a simple mechanism to implement it
- Now EUDAT can help
- Peter & Massimo
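A minimal sketch of minting a PID at creation time and resolving it after the object moves. It assumes a Handle-style `prefix/suffix` syntax with a hypothetical prefix `21.T99999` and a UUID suffix; real PID services (Handle, DOI, for example via EUDAT) are external systems with their own APIs, which this does not reproduce.

```python
import uuid

PREFIX = "21.T99999"  # hypothetical Handle-style prefix, not a registered one


class PidService:
    """Toy PID system: mint at creation time, resolve to the current location."""

    def __init__(self, prefix=PREFIX):
        self.prefix = prefix
        self.table = {}  # PID -> location (URL or path)

    def mint(self, location):
        pid = f"{self.prefix}/{uuid.uuid4()}"
        self.table[pid] = location
        return pid

    def move(self, pid, new_location):
        # The object moves; its PID (and every citation using it) stays the same.
        self.table[pid] = new_location

    def resolve(self, pid):
        return self.table[pid]


pids = PidService()
pid = pids.mint("https://repo-a.example.org/data/42")
pids.move(pid, "https://repo-b.example.org/archive/42")
print(pid.startswith("21.T99999/"), pids.resolve(pid))  # True https://repo-b.example.org/archive/42
```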
How to access data?
AAI system (federated & distributed)
Purpose
- Authenticate users
- Authorize users
Issues
- Delegation
- Many systems, sometimes not interoperable
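Authentication (who are you?) and authorization (what may you do?) are separate steps. The sketch assumes a single shared secret and signed `user:role` tokens built with the standard library's `hmac`; it is not a federated protocol such as SAML or OpenID Connect, and it does not address delegation. Roles, actions and the key are hypothetical.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key shared with the identity provider

# Authorization policy: role -> allowed actions (hypothetical roles and actions).
PERMISSIONS = {"researcher": {"read"}, "curator": {"read", "write"}}


def issue_token(user, role):
    """Identity provider side: sign 'user:role' so it cannot be forged."""
    payload = f"{user}:{role}"
    signature = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{signature}"


def authenticate(token):
    """Who are you? Return (user, role) if the signature is valid, else None."""
    user, role, signature = token.rsplit(":", 2)
    expected = hmac.new(SECRET, f"{user}:{role}".encode(), hashlib.sha256).hexdigest()
    return (user, role) if hmac.compare_digest(signature, expected) else None


def authorize(token, action):
    """What may you do? Authenticate first, then apply the role-based policy."""
    identity = authenticate(token)
    return identity is not None and action in PERMISSIONS.get(identity[1], set())


token = issue_token("alice", "researcher")
print(authorize(token, "read"), authorize(token, "write"))  # True False
```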
How to store data?
Data repository (trusted)
What?
- Store data
- Couple with PIDs
- Ensure preservation (not curation)
- Can be trusted (DSA, Data Seal of Approval)
Opportunity
- INGV DSA repository…
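Preservation without curation means keeping the bits intact, not improving the content. A minimal sketch, assuming a SHA-256 checksum recorded at ingest and re-checked later (a fixity check); the class and the PID are hypothetical.

```python
import hashlib


class TrustedRepository:
    """Toy repository: stores bytes with a checksum and couples each object with a PID."""

    def __init__(self):
        self.objects = {}  # PID -> (data, SHA-256 checksum recorded at ingest)

    def store(self, pid, data):
        self.objects[pid] = (data, hashlib.sha256(data).hexdigest())

    def verify(self, pid):
        """Preservation check: is the stored object still bit-identical?"""
        data, checksum = self.objects[pid]
        return hashlib.sha256(data).hexdigest() == checksum


repo = TrustedRepository()
repo.store("21.T99999/abc", b"waveform bytes")
print(repo.verify("21.T99999/abc"))  # True
```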
How to document data processing?
Workflow engines
Purpose
- Track data transformations
- Allow versioning
- Allow reproducibility
Comments
- Interoperability among various workflow engines
- VERCE did it
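A workflow engine's provenance log can be sketched as one record per step, with fingerprints of input and output. The engine, the two processing steps and the record fields below are hypothetical; the point is that re-running the same version on the same input must reproduce the same fingerprints.

```python
import hashlib
import json


def digest(obj):
    """Stable short fingerprint of a JSON-serialisable value."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:12]


class Workflow:
    """Toy workflow engine: runs steps in order and records provenance for each one."""

    def __init__(self, version):
        self.version = version
        self.steps = []
        self.provenance = []

    def step(self, func):
        self.steps.append(func)
        return func

    def run(self, data):
        for func in self.steps:
            result = func(data)
            self.provenance.append({
                "step": func.__name__,
                "workflow_version": self.version,
                "input": digest(data),
                "output": digest(result),
            })
            data = result
        return data


wf = Workflow(version="1.0")


@wf.step
def demean(trace):
    mean = sum(trace) / len(trace)
    return [x - mean for x in trace]


@wf.step
def absolute(trace):
    return [abs(x) for x in trace]


print(wf.run([1.0, 2.0, 3.0]))  # [1.0, 0.0, 1.0]
for record in wf.provenance:
    print(record)
```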
Brokering System
(e.g. VERCE framework)
Full version includes
- Metadata catalogue
- Interfaces (GUI+API)
- System manager
- AAI system
- Workflow engine
External actors
- PID system
- Trusted repositories
- Registries
- Processing facilities
[Figure: BROKER GUI and BROKER APIs on top of the METADATA catalogue; the system manager connects data sets (through APIs), the AAI system, the workflow engine, trusted repositories, a registry, the PID system and an HPC center]
Q&A


Editor's Notes

  • #3 DIGITAL DATA: a sequence of (digital) symbols with a meaning; it can be stored, transmitted and computed. METADATA: data about data; what is metadata to me can be data to others; many standards; ontologies. BROKERING SYSTEM: intermediary software giving access to several systems from your place; collects data for you (integration). DATABASE: a collection of (organized) data, usually with a DBMS. APIs: Application Programming Interface; standard procedures or instructions to access a service (or function).
  • #6 to #15 Identity card example
  • #19, #21 Data management: an enterprise to build a data repository, manage an information catalog and enforce management policy. Data analysis: an enterprise to process a data collection, apply analysis tools and automate a processing pipeline. Data preservation: an enterprise to build reference collections and knowledge bases that make up the intellectual capital, while managing technology evolution. Data publication: discovery of and access to data collections. Data sharing: controlled sharing of a data collection, shared analysis workflows and information catalogs.
  • #27 Identity card example