Presentation of agINFRA project (www.aginfra.eu) in the EGI-APARSEN workshop, Amsterdam, 4-6 March 2014
“Managing, computing and preserving big data for research”
https://indico.egi.eu/indico/conferenceDisplay.py?confId=2052
This is a presentation by Peter Coppola, VP of Product and Marketing at Basho Technologies, and Matthew Aslett, Research Director at 451 Research. Join them as they discuss whether multi-model databases and polyglot persistence have increased operational complexity, the benefits and importance of NoSQL databases, and how the Basho Data Platform helps enterprises leverage Big Data applications.
This presentation has been used to start the pilot phase of the OpenAIRE Advance-funded implementation project in DSpace-CRIS.
DSpace-CRIS now provides support for the OpenAIRE Guidelines for CRIS Managers, in addition to the previously supported guidelines for Literature Repositories and Data Archives.
Extending DSpace 7: DSpace-CRIS and DSpace-GLAM for empowered repositories an... — 4Science
DSpace-CRIS is an extended version of DSpace that offers a powerful and flexible data model to describe not only publications but all research entities and their relationships. DSpace-CRIS 7 will feature a new Angular UI and REST API in addition to functionality for compliance with OpenAire, integrating publications from external sources, bidirectional ORCID integration, and synchronizing with other systems. DSpace-CRIS also extends data modeling capabilities and provides tools for data quality, metadata management, and extensibility.
GraphDB Cloud: Enterprise-Ready RDF Database on Demand — Ontotext
GraphDB Cloud is an enterprise-grade RDF graph database providing high-performance querying over large volumes of RDF data. In this webinar, Ontotext demonstrates how to instantly create and deploy a fully managed graph database, then import & query data with the (OpenRDF) GraphDB Workbench, and finally explore and visualize data with the built-in visualization tools.
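For illustration only, a minimal sketch of querying a GraphDB repository over its SPARQL endpoint from Python with the SPARQLWrapper library; the endpoint host and the repository name "demo" are hypothetical placeholders, not part of the webinar material.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Hypothetical GraphDB repository endpoint; adjust host and repository id.
    endpoint = "http://localhost:7200/repositories/demo"
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery("""
        SELECT ?s ?p ?o
        WHERE { ?s ?p ?o }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    # Print the first few triples returned by the repository.
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["s"]["value"], row["p"]["value"], row["o"]["value"])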
Smarter content with a Dynamic Semantic Publishing Platform — Ontotext
Personalized content recommendation systems enable users to overcome the information overload associated with rapidly changing deep and wide content streams such as news. This webinar discusses Ontotext’s latest improvements to its Dynamic Semantic Publishing (DSP) platform NOW (News on the Web). The Platform includes social data mining, web usage mining, behavioral and contextual semantic fingerprinting, content typing and rich relationship search.
Knowledge graphs are what many businesses are now on the lookout for. But what exactly is a knowledge graph and, more importantly, how do you get one? Do you get it as an out-of-the-box solution, or do you have to build it (or have someone else build it for you)? With the help of our knowledge graph technology experts, we have created a step-by-step list of how to build a knowledge graph. A well-built knowledge graph properly exposes and enforces the semantics of the data model via inference, consistency checking and validation, and thus offers organizations many more opportunities to transform and interlink data into coherent knowledge.
Today's organizations contend with more diverse applications, data, and systems than ever before – silos that are often fragmented and difficult to leverage together. iWay Big Data Integrator (BDI) simplifies the creation, management, and use of Hadoop-based data lakes. It provides a modern, native approach to Hadoop-based data integration and management that ensures high levels of capability, compatibility, and flexibility for your organization.
Join us to learn how you can simplify adoption of Apache Hadoop using iWay Big Data Integrator. Learn about our ability to streamline the deployment of ingestion, transformation, and extraction tasks.
See the pre-recorded webcast online at: http://www.informationbuilders.com/webevents/online/24427#sthash.J0cRy1PG.dpuf
How Data Virtualization Adds Value to Your Data Science Stack — Denodo
Watch here: https://bit.ly/3cZGCxr
For their machine learning and data science projects to be successful, data scientists need access to all of the enterprise data, delivered through a myriad of data models. However, gaining access to all of that data, integrated into a central repository, has been a challenge; often 80% of project time is spent on these tasks. A virtual layer can help data scientists speed up some of the most tedious tasks, like data exploration and analysis, while integrating well with the data science ecosystem: there is no need to change tools or learn new languages. The data virtualization platform lets data scientists offload these data integration tasks, allowing them to focus on advanced analytics.
In this session, you will learn how data virtualization:
- Provides all of the enterprise data, in real-time, and without replication
- Enables data scientists to create and share multiple logical models using simple drag and drop
- Provides a catalog of all business definitions, lineage, and relationships
To transform your organization and unlock the value of your data, you need a way to ingest, store and analyze every type of data in your organization.
This presentation covers the Data Access Layer of the Hadoop Ecosystem which enables you to achieve this.
We will use the HDP (Hortonworks Data Platform) reference architecture to walk through the Hadoop core and its ecosystem, with a focus on the data access layer.
We will cover some of the prominent tools of the ecosystem such as Pig, Hive, Sqoop, Flume and Oozie and how they are used for ingesting data into Hadoop from structured, unstructured and streaming sources.
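As a rough illustration of the data access layer in action, here is a minimal sketch of querying a Hive table from Python with the PyHive library; the HiveServer2 host, credentials and the "web_logs" table are hypothetical assumptions, not part of the course material.

    from pyhive import hive

    # Hypothetical HiveServer2 host and table name.
    conn = hive.Connection(host="hadoop-master", port=10000, username="etl")
    cursor = conn.cursor()

    # Aggregate ingested log records by day.
    cursor.execute(
        "SELECT to_date(event_time) AS day, COUNT(*) AS hits "
        "FROM web_logs GROUP BY to_date(event_time)"
    )
    for day, hits in cursor.fetchall():
        print(day, hits)
    conn.close()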
Talk to us at +91 80 6567 9700 or send an email to training@springpeople.com for more information.
Evolution of the Spark framework for simplifying data analysis — Anirudh Gangwar
This document provides an overview of Spark, a framework for simplifying big data analytics. It discusses the types of data used in big data and defines big data and big data analytics. It then describes Hadoop's traditional approach using HDFS for storage and MapReduce for processing. The document introduces Spark as a faster alternative to Hadoop and describes Spark's ecosystem, including Spark SQL, Spark Streaming, MLlib, and GraphX. It compares Hadoop and Spark and concludes that the choice depends on the specific use case.
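As a small, hedged illustration of why Spark is often seen as simpler than hand-written MapReduce, a word count in PySpark; the input path is a placeholder.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount-demo").getOrCreate()

    # Hypothetical input path; any plain-text file on HDFS or the local FS works.
    lines = spark.read.text("hdfs:///data/sample.txt")

    # Classic word count expressed as a short chain of RDD transformations.
    counts = (
        lines.rdd.flatMap(lambda row: row.value.split())
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    for word, n in counts.take(10):
        print(word, n)

    spark.stop()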
Big Data Analytics Projects - Real World with Pentaho — Mark Kromer
This document discusses big data analytics projects and technologies. It provides an overview of Hadoop, MapReduce, YARN, Spark, SQL Server, and Pentaho tools for big data analytics. Specific scenarios discussed include digital marketing analytics using Hadoop, sentiment analysis using MongoDB and SQL Server, and data refinery using Hadoop, MPP databases, and Pentaho. The document also addresses myths and challenges around big data and provides code examples of MapReduce jobs.
Coherent and consistent tracking of provenance data and in particular update history information is a crucial building block for any serious information system architecture.
Marvin Frommhold | AKSW, Universität Leipzig
Presentation at Semantics 2016 in Leipzig in the context with the results of the LEDS project
This document discusses how ArcGIS supports scientific multidimensional data. It can directly ingest data in formats like netCDF, HDF, and GRIB, and represent the data as raster layers, feature layers, or tables. Users can visualize, analyze, and share the data through tools in ArcGIS Desktop and services. Python can also be used to extend analytical capabilities. ArcGIS is evolving to better support scientific data through capabilities like multidimensional raster and feature layers, on-the-fly processing, and disseminating content as web services.
This document describes a data warehousing solution using Apache Spark that was developed by Team 18 for the Movielens 20M movie rating dataset. Key aspects of the solution include storing the dataset in HDFS for faster access, developing an API interface using Flask, querying the data through Spark RDDs in response to API calls, and using GraphX to plot graphs of results like movie rating progressions. The goal was to build a scalable data warehouse system for performing queries and basic analytics on large movie rating data.
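A minimal sketch, not the team's actual code, of how such an API call might be served: a Flask endpoint that computes a movie's average rating with Spark. The HDFS path and the MovieLens CSV layout (userId, movieId, rating, timestamp) are assumptions for the example.

    from flask import Flask, jsonify
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    app = Flask(__name__)
    spark = SparkSession.builder.appName("movielens-api").getOrCreate()

    # Assumed MovieLens ratings file stored in HDFS for faster access.
    ratings = spark.read.csv("hdfs:///movielens/ratings.csv", header=True, inferSchema=True)

    @app.route("/movies/<int:movie_id>/avg_rating")
    def avg_rating(movie_id):
        # Filter to one movie and aggregate its ratings on the cluster.
        row = (
            ratings.filter(F.col("movieId") == movie_id)
            .agg(F.avg("rating").alias("avg"))
            .collect()[0]
        )
        return jsonify({"movieId": movie_id, "avg_rating": row["avg"]})

    if __name__ == "__main__":
        app.run(port=5000)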
PoolParty Semantic Search Server is described technologically. How to use SKOS thesauri to map data from different sources and how to generate a semantic index. How to build precise faceted search.
How to maximize the value of Big Data with SpagoBI suite through a comprehens... — OW2
The document discusses how to maximize the value of big data using the open source SpagoBI suite. It presents a comprehensive approach for working with big data that includes collecting data from various sources using datasets, performing queries, visualizing data, building information, and creating dashboards and reports. The suite allows for agile development, self-service business intelligence, and extracting additional value from information through techniques like data mining, text mining, and predictive analysis.
A brief presentation for an internship project at BEL on data visualization using Seaborn and Matplotlib.
Some sensitive information has been redacted.
This document discusses how Telemach Slovenia leveraged open source big data technologies like Elasticsearch, Logstash, and Kibana (ELK stack) to build analytics dashboards for fraud detection and network monitoring. It summarizes their initial success building a roaming fraud dashboard in 8 days using these technologies. This proof of concept led them to expand usage to additional fraud and network performance dashboards. The ELK stack provided scalable and cost-effective log analytics capabilities compared to commercial options like Splunk. This enabled both IT and business users to gain new visual insights into network operations and issues.
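As a hedged sketch of the kind of query such a dashboard runs underneath, here is an aggregation sent straight to Elasticsearch's _search REST API with Python requests; the index pattern and field names are hypothetical, not Telemach's actual schema.

    import requests

    # Hypothetical index of roaming call records and field names.
    query = {
        "size": 0,
        "query": {"range": {"@timestamp": {"gte": "now-1d/d"}}},
        "aggs": {
            "calls_per_country": {"terms": {"field": "destination_country.keyword"}}
        },
    }

    # Ask Elasticsearch for call counts per destination country over the last day.
    resp = requests.post(
        "http://localhost:9200/roaming-cdr-*/_search",
        json=query,
        timeout=10,
    )
    for bucket in resp.json()["aggregations"]["calls_per_country"]["buckets"]:
        print(bucket["key"], bucket["doc_count"])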
This document discusses new analysis skills required for working with big data technologies like NoSQL databases. It provides examples of popular open source NoSQL databases like Cassandra, HBase, MongoDB and Couchbase. It also classifies NoSQL databases into categories like column-oriented, document, key-value and graph databases. The document then discusses how big data solves problems related to volume, velocity, variety and value of data. It provides examples of sources of big data and trends shaping interest in big data. Finally, it discusses use cases of big data in retail and finance industries.
Here I talk about examples and use cases for Big Data & Big Data Analytics and how we accomplished massive-scale sentiment, campaign and marketing analytics for Razorfish using a collecting of database, Big Data and analytics technologies.
Fuzzy matching is a technique used to find similar items that are not exactly the same. It is used for applications like image search, biometrics, and audio/video search. The growth of multimedia and biometric databases has created a big data problem for fuzzy matching. A scalable solution presented uses Hadoop and MapReduce to process large amounts of data in parallel across clusters. It introduces a Fuzzy Table distributed database that uses clustering and low latency searching to enable fast fuzzy matching across petabytes of data. Performance testing showed the system can scale to handle large query volumes on large datasets.
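To make the idea concrete, a tiny, hedged example of fuzzy matching in plain Python using a similarity ratio; in the system described above this kind of comparison would run in parallel as MapReduce tasks rather than in a single process, and the names and threshold here are invented.

    from difflib import SequenceMatcher

    def similarity(a, b):
        # Ratio in [0, 1]; 1.0 means an exact match.
        return SequenceMatcher(None, a, b).ratio()

    names = ["Jonathan Smith", "Jon Smyth", "Maria Garcia", "Mariah Garsia"]
    query = "John Smith"

    # Keep candidates above a similarity threshold instead of requiring exact equality.
    matches = [(n, similarity(query, n)) for n in names]
    for name, score in sorted(matches, key=lambda x: -x[1]):
        if score >= 0.7:
            print(f"{name}: {score:.2f}")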
The document discusses big data and its key characteristics known as the 5Vs: volume, velocity, variety, variability, and value. It provides examples of how different companies and industries deal with large volumes of data from various sources in real-time. Big data technologies like Hadoop, HDFS, MapReduce, Cassandra, and MongoDB are helping companies analyze and gain insights from both structured and unstructured data across industries like retail, finance, and social media. Data scientists use tools, techniques and programming languages to understand trends and patterns in large, complex data sets.
All about Big Data components and the best tools to ingest, process, store and visualize the data.
This is a keynote from the series "by Developer for Developers" powered by eSolutionsGrup.
An overview of several technologies which contribute to the landscape of Big Data.
An intro to the technology challenges of Big Data, followed by key open-source components which help in dealing with various big data aspects such as OLAP, real-time online analytics, and machine learning on MapReduce. I conclude with an enumeration of the key areas where those technologies are most likely to unleash new opportunities for various businesses.
Solution architecture for big data projects
solution architecture,big data,hadoop,hive,hbase,impala,spark,apache,cassandra,SAP HANA,Cognos big insights
- Data science domains like statistics, natural language processing, predictive analytics, and visualization have entered the market, while image processing, internet of things, and artificial intelligence are still in exploration.
- The "3 V's of BIG DATA" are volume, variety, and velocity.
- Popular programming languages for data science include R, Python, and SQL.
- Apache Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. The core Hadoop modules are Hadoop Common, HDFS, YARN, and MapReduce.
- A sample data science methodology includes defining a problem statement, choosing an appropriate machine learning algorithm, running models/analysis in R/Python
SpagoBI and Big Data: next Open Source Information Management suite, OW2con'1... — OW2
Big Data refers to the capability of managing data that are growing along three dimensions - volume, velocity and variety - while respecting the simplicity of the user interface. The speech describes SpagoBI's approach to the “big data” scenario and presents the SpagoBI suite roadmap, which is two-fold: it aims to address emerging analytical areas and domains, providing the suite with new capabilities - including big data and open data support, in-memory analysis, real-time and mobile BI - and following a research path towards the realization of a new generation of the SpagoBI suite.
Webinar: SpagoBI & Big Data, a smart approach to turn data into knowledge — SpagoWorld
The presentation supported the webinar focused on the smart approach adopted by SpagoBI suite to manage Big Data, delivered on October 8th, 2013 within SpagoWorld Webinar Center. http://www.spagoworld.org/
The new CIARD RING, a machine-readable directory of datasets for agriculture — Valeria Pesce
The CIARD RING, a global directory of datasets for agriculture, has been enhanced during the EC-funded agINFRA project. It has become a Linked Data hub that can be queried by other applications.
Presented at the 4th RDA Plenary Meeting in Amsterdam on 22/09/2014.
The CIARD RING, a global directory of datasets for agriculture, by Valeria P... — CIARD Movement
Presentation delivered at the Agricultural Data Interoperability Interest Group -- Research Data Alliance (RDA) 4th Plenary Meeting -- Amsterdam, September 2014
FIWARE Wednesday Webinars - NGSI-LD and Smart Data Models: Standard Access to... — FIWARE
NGSI-LD and Smart Data Models: Standard Access to Digital Twin Data - 15 July 2020
Corresponding webinar recording: https://youtu.be/MBx23ypORLk
Understanding the basics of context information management, NGSI-LD and Smart Data Models
Chapter: Core
Difficulty: 2
Audience: Any Technical
Speaker: Juanjo Hierro (CTO, FIWARE Foundation), Alberto Abella (Data Modeling Expert and Technical Evangelist, FIWARE Foundation)
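For orientation, a minimal sketch of creating an NGSI-LD entity in a context broker such as Orion-LD via the standard /ngsi-ld/v1/entities endpoint; the broker URL, the entity type and its attributes are illustrative assumptions rather than content from the webinar, and a real deployment would reference the relevant Smart Data Models @context.

    import requests

    # Hypothetical local context broker.
    broker = "http://localhost:1026/ngsi-ld/v1/entities"

    entity = {
        "id": "urn:ngsi-ld:ParkingSpot:example-001",
        "type": "ParkingSpot",
        "status": {"type": "Property", "value": "free"},
        "location": {
            "type": "GeoProperty",
            "value": {"type": "Point", "coordinates": [13.35, 52.51]},
        },
        # NGSI-LD core context; a Smart Data Models context could be added here.
        "@context": "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
    }

    resp = requests.post(
        broker,
        json=entity,
        headers={"Content-Type": "application/ld+json"},
        timeout=10,
    )
    print(resp.status_code)  # 201 on successful creation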
Presentation about http://worldwidesemanticweb.org/ given at SugarCamp#3 in Paris on April 12-13. The slides introduce the activities of the WWSW group centred around adapting Semantic Web technologies to be usable in challenging conditions.
Michael Cutler (CTO cofounder of TUMRA) provides a high-level introduction to Apache Spark in a presentation given at ‘Big Data Week 2014’ #BDW14 held at University College London.
TUMRA were early adopters of Spark after a brief PoC in Dec ‘12 and took it to production just a few months later. The main motivation was the inflexibility and high latency of Hadoop Map/Reduce jobs and the knock-on effect for the technologies that utilise it (Mahout machine learning, Hive data warehousing, Cascading).
With two primary use cases, ‘Ecommerce Personalisation’ and ‘Marketing Automation’, TUMRA are currently flowing around 29 million ‘user engagement events’ (JSON) each day through Apache Kafka and Spark Streaming, at peak rates of up to 800 events per second.
TUMRA use Apache Spark on Amazon Web Services (EC2) in production for a mix of machine learning model building, graph analytics and near-real-time reporting.
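A hedged sketch of what producing such user-engagement events into Kafka can look like with the kafka-python client; the broker address, topic name and event fields are assumptions for illustration, not TUMRA's actual schema.

    import json
    import time
    from kafka import KafkaProducer

    # Hypothetical broker and JSON serialization for event payloads.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # One illustrative user-engagement event.
    event = {
        "user_id": "u-12345",
        "action": "add_to_basket",
        "sku": "SKU-998",
        "ts": int(time.time() * 1000),
    }
    producer.send("user-engagement-events", event)
    producer.flush()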
To learn more about how we use Spark and the services we can deliver through our Platform please contact: hello@tumra.com
Interoperability is the key: repositories networks promoting the quality and ... — Pedro Príncipe
Presentation from José Carvalho and Pedro Principe, University of Minho, at ETD 2019 Conference (22nd International Symposium on Electronic Theses and Dissertations), Porto, Nov 7, 2019.
Splunk is an industry-leading platform for machine data that allows users to access, analyze, and take action on data from any source. It uses universal indexing to ingest data in real-time from various sources without needing predefined schemas. This enables search, reporting, and alerting across all machine data. Splunk can scale to handle large volumes and varieties of data, provides a developer platform for customization, and supports both on-premises and cloud deployments.
This document discusses Red Hat's Open Data Hub platform for multi-tenant data analytics and machine learning. It describes the challenges of sharing data and compute resources across teams and the Open Data Hub architecture which allows teams to spin up and down their own compute clusters while sharing a common data store. Key elements of the Open Data Hub include Spark, Ceph storage, JupyterHub notebooks, and TensorFlow/Keras for modeling. The document provides an overview of data structures, analytics workflows, and the components and roadmap for the Open Data Hub platform.
Apache Spark is a fast and general engine for large-scale data processing. It was created by UC Berkeley and is now the dominant framework in big data. Spark can run programs over 100x faster than Hadoop in memory, or more than 10x faster on disk. It supports Scala, Java, Python, and R. Databricks provides a Spark platform on Azure that is optimized for performance and integrates tightly with other Azure services. Key benefits of Databricks on Azure include security, ease of use, data access, high performance, and the ability to solve complex analytics problems.
Science and Research - a new experimental platform in Brazil — ATMOSPHERE
The document discusses Brazil's cyberinfrastructure and plans for its development. It outlines the current situation including remote collaboration services, remote visualization, distributed software platforms and more. It emphasizes the need to better integrate these resources. The national cyberinfrastructure program for 2020-2022 then details plans to improve the national communication infrastructure, develop academic cloud services, and establish a national open data initiative to organize and support large collaboration projects through services, repositories, and high performance computing resources. The goal is to simplify and promote the use of technologies through a cloud marketplace and integrated services to support research.
20140902 LinDA Workshop Semantics2014 - LinDA Project Overview — LinDa_FP7
LinDA Project presentation - challenges, tools, workplan and objectives
Presentation at LinDA Workshop on 2nd September 2014 at Semantics2014 by Spiros Mouzakitis
Dataset Descriptions in Open PHACTS and HCLS — Alasdair Gray
This presentation gives an overview of the dataset description specification developed in the Open PHACTS project (http://www.openphacts.org/). The creation of the specification was driven by a real need within the project to track the datasets used.
Details of the dataset metadata captured and the vocabularies used to model this metadata are given together with the tools developed to enable the specification's uptake.
Over the course of the last 12 months, the W3C Healthcare and Life Science Interest Group have been developing a community profile for dataset descriptions. This has drawn on the ideas developed in the Open PHACTS specification. A brief overview of the forthcoming community profile is given in the presentation.
This presentation was given to the Network Data Exchange project http://www.ndexbio.org/ on 2 April 2014.
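As a rough illustration of the kind of metadata such a specification captures, a few VoID/Dublin Core statements built with rdflib; the dataset URI, endpoint and values are invented for the example and are not taken from the Open PHACTS specification itself.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    VOID = Namespace("http://rdfs.org/ns/void#")

    g = Graph()
    ds = URIRef("http://example.org/dataset/activity-sample")  # hypothetical dataset URI

    # Minimal provenance and access metadata for the dataset.
    g.add((ds, RDF.type, VOID.Dataset))
    g.add((ds, DCTERMS.title, Literal("Sample activity dataset")))
    g.add((ds, DCTERMS.license, URIRef("http://creativecommons.org/licenses/by/4.0/")))
    g.add((ds, VOID.sparqlEndpoint, URIRef("http://example.org/sparql")))
    g.add((ds, VOID.triples, Literal(123456)))

    print(g.serialize(format="turtle"))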
HiFX designed and implemented a unified data analytics platform called Vision Lens for Malayala Manorama to generate meaningful insights from large amounts of data across their multiple digital properties. The solution involved building a data lake, data pipeline, processing framework, and dashboards to provide real-time and historical analytics. This helped Manorama improve user experiences, drive smarter marketing, and make better business decisions.
AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ A... — LIBER Europe
AnalogIST/ezPAARSE: Analysing Locally Gathered Logfiles to Determine Users’ Accesses to Subscribed e-Resources (Thomas Jouneau, Université de Lorraine, France). This presentation was one of the 10 most highly ranked at LIBER's Annual Conference 2014 in Riga, Latvia. Learn more: www.libereurope.eu
Apache Big_Data Europe event: "Demonstrating the Societal Value of Big & Smar... — BigData_Europe
H2020 BigDataEurope is a flagship project of the European Union's Horizon 2020 framework programme for research and innovation. In this talk we present the Docker-based BigDataEurope platform, which integrates a variety of Big Data processing components such as Hive, Cassandra, Apache Flink and Spark. Particularly supporting the variety dimension of Big Data, it adds a semantic data processing layer, which allows users to ingest, map, transform and exploit semantically enriched data. We present the innovative technical architecture as well as applications of the BigDataEurope platform for life sciences (OpenPhacts), mobility, food & agriculture, and industrial analytics (predictive maintenance). We demonstrate how societal value can be generated by Big Data analytics, e.g. making transportation networks more efficient or facilitating drug research.
This document provides an overview of relevant approaches for accessing open data programmatically and data-as-a-service (DaaS) solutions. It discusses common data access methods like web APIs, OData, and SPARQL and describes several DaaS platforms that simplify publishing and consuming open data. It also outlines requirements for a proposed open DaaS platform called DaPaaS that aims to address challenges in open data management and application development.
Presentation of the USEMP and Privacy Flag projects during INFO-COM 2015, Athens, Greece, discussing about privacy and risks in today's electronic world
agINFRA vision after the end of the project — Andreas Drakos
The agINFRA project (http://www.aginfra.eu) ran from October 2011 to February 2015. This presentation shows the vision for after the end of the project.
The document provides an overview of the Open Discovery Space (ODS) Application Profile (AP), which is based on the IEEE Learning Object Metadata (LOM) standard. The ODS AP includes curriculum-based vocabularies and social tagging options to enable aggregation and alignment of metadata across different repositories. It defines mandatory, recommended, and optional metadata elements and provides detailed descriptions of elements in various LOM categories such as general, technical, educational, and rights information.
Big Data in Agriculture, the SemaGrow and agINFRA experience — Andreas Drakos
Presentation of the SemaGrow and agINFRA projects during the EDBT/ICDT 2014 Special Track on Big Data Management Challenges and Solutions in the Context of European Projects, 27th of March 2014
http://www.edbticdt2014.gr/index.php/eu-projects-track
agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014
1. agINFRA: A data infrastructure to support agricultural scientific communities
Andreas Drakos, University of Alcala
2. Our project
In agINFRA we will: share agricultural research… …over a data e-infrastructure
3. Agricultural research data
• Primary data:
– Structured, e.g. datasets as tables
– Digitized: images, videos, etc.
• Secondary data (elaborations, e.g. a dendrogram)
• Provenance information, incl. authors, their organizations and projects
• Methods and procedures followed
• Reports, including papers
• Secondary documents, e.g. training resources
• Metadata about the above
• Social data, tags, ratings, etc.
4. agINFRA values: scientific data must be
A | Open | Must be open and interlinked
NOT subject to barriers, based on standard formats and avoiding building data silos due to lack of interrelatedness and ad-hoc APIs.
B | Meaningful | Must be meaningful through explicit semantics
Reusing the semantics already provided in mature terminologies and ontologies that are exposed and interlinked through the Web.
C | Reliable | Must be reliable, traceable and accessible
Any kind of research object can be stored in the data infrastructure, and there are NO barriers to expressing relations between these objects to capture the context of research activities.
D | Actionable | Must be actionable via services that empower research
Data is not useful without flexible and adaptable services that allow researchers to act on the data in the ways they need.
5. There is a lot of data
6.-8. Connecting content providers to agINFRA (diagram spanning slides 6-8)
The diagram walks through three provider scenarios and the common aggregation path:
• Content provider with an unorganised collection (e.g. listed at a Web site or on DVD-ROM): chooses a sharing-compliant tool hosted over agINFRA, exports its (meta)data from the proprietary format, maps it to a known format and ingests it into the sharing-compliant tool, then registers as a data source.
• Content provider with a CMS that does not support sharing (e.g. a proprietary DB): registers as a data source; its (meta)data is hosted and computed over agINFRA.
• Content provider with a CMS that supports sharing (e.g. OAI-PMH, RSS, ...): registers as a data source and shares its (meta)data, e.g. through OAI-PMH.
In each case the shared (meta)data is harvested by a (meta)data aggregator, computed and hosted over agINFRA, and indexed & made available through the CIARD RING, served through agINFRA.
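A minimal, hedged sketch of the harvesting step in the third scenario: pulling Dublin Core records from a provider's OAI-PMH endpoint with the Sickle library. The endpoint URL is a placeholder, not an actual agINFRA data source.

    from sickle import Sickle

    # Hypothetical OAI-PMH endpoint of a registered data provider.
    sickle = Sickle("https://repository.example.org/oai")

    # Harvest Dublin Core records, skipping deleted ones.
    records = sickle.ListRecords(metadataPrefix="oai_dc", ignore_deleted=True)
    for i, record in enumerate(records):
        # record.metadata is a dict of Dublin Core fields, e.g. title, creator.
        print(record.header.identifier, record.metadata.get("title"))
        if i >= 9:  # only preview the first ten records
            break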
9. Actors over the infrastructure (diagram)
The infrastructure layer shown comprises: a registry of datasets, collections and data sources; a registry of vocabularies, APIs and tools; cloud / SaaS tools; public REST APIs; grid jobs and grid workflows; productivity tools; information services; and LOD vocabularies, i.e. the agINFRA RDF vocabularies and agINFRA LOD KOSs.
10. Actors over the infrastructure (diagram)
The same components with the actors that use them: developers, information systems providers, taxonomists, data providers, researchers and policy makers.
11. An existing data community
• a global community movement to make agricultural research information and knowledge publicly accessible to all
– http://www.ciard.net
12. A core registry service
• CIARD RING (Routemap to Information Nodes and Gateways)
– global registry to give access to any kind of information sources pertaining to agricultural research for development
– principal tool created through CIARD to allow information providers to register their services in various categories and facilitate discovery of sources of agriculture-related information across the world
15. RING data registry usage scenario 1
• data aggregators registering their data providers to CIARD RING
– asking directly to be registered there (AGRIS)
– federating own smaller registries (GLN)
16. RING data registry usage scenario 2
• new data providers using agINFRA cloud tools can be automatically registered to CIARD RING
– cloud-hosted AgriDrupal or AgriOceanDSpace instances for document repositories
– cloud-hosted agLR instances for learning repositories
• agINFRA cloud hosting services
– in collaboration with other cloud communities (e.g. OKEANOS/GRNET)
– in collaboration with the CHAIN-REDS project etc.
17. Data provider scenario 1 (diagram)
A data provider in need of hosting & storage for a small-scale CMS uses a cloud-hosted CMS and sets up its own CMS instance among the agINFRA cloud / SaaS tools.
18. Data provider scenario 2 (diagram)
A data provider in need of large-scale hosting & replication requests space/accounts in a large-scale CMS offered over the infrastructure.
19. A semantic backbone for agINFRA
• to help all data providers declare, publish & link their metadata properties and value spaces
– publishing their KOSs using VocBench and their metadata vocabularies using Neologism
– linking them to existing vocabularies, e.g. AGROVOC for KOSs, Dublin Core for metadata
• guidelines & tools to support data providers in adopting such a LOD framework
– e.g. LODE-BD recommendations
• to provide an entry point to existing relevant vocabularies
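To make the linking step concrete, a small sketch with rdflib that publishes a local concept as SKOS and maps it to an AGROVOC concept; the local namespace, concept label and the AGROVOC concept id are hypothetical placeholders, not agINFRA data.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, SKOS

    EX = Namespace("http://example.org/kos/")          # hypothetical local KOS namespace
    AGROVOC = Namespace("http://aims.fao.org/aos/agrovoc/")

    g = Graph()
    concept = EX["soil-moisture"]

    # Publish the local concept as SKOS so it gets a dereferenceable URI.
    g.add((concept, RDF.type, SKOS.Concept))
    g.add((concept, SKOS.prefLabel, Literal("soil moisture", lang="en")))

    # Hypothetical AGROVOC concept id used purely for illustration.
    g.add((concept, SKOS.exactMatch, AGROVOC["c_0000000"]))

    print(g.serialize(format="turtle"))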
20. Exposing to the e-infrastructure scenario (diagram)
A data provider hosting its CMS on its own or on an external/commercial infrastructure, and interested in exposing (meta)data to the e-infrastructure, connects to the same registries, LOD vocabularies and services of the agINFRA infrastructure.
21. agINFRA LOD layer usage scenario 1
• A data owner wants to share their data as Linked Data
• The data owner uses non-LOD vocabularies and KOSs and wants to publish them as LOD and link them to existing vocabularies
• agINFRA offers tools for publishing vocabularies and KOSs
Once the vocabularies are published, all metadata and all concepts have URIs and can be referenced by any other system
22. agINFRA LOD layer usage scenario 2
• Once KOSs are published, all metadata and all concepts have URIs and can be referenced by any other system
• Data aggregators like AGRIS and GLN can create mash-ups between their core data and other agricultural data types (e.g. germplasm, soil maps, statistics, …) by using the LOD semantic backbone as a crosswalk between metadata formalizations and concepts in different vocabularies
23. agINFRA LOD layer usage scenario 2 — Example: LOD-based mash-ups in AGRIS (diagram)
Starting from the AGRIS bibliographic metadata held in the AGRIS RDF store (journal, topic, geographic and thematic metadata, scientific names), the mash-up pulls in related information from external sources: info on the journal from AGRIS Journals, info on the topic from DBpedia, info on the country from the FAO Country Profiles, info on the species from FAO Fisheries, and specific indicators on the country from the World Bank indicators by country.
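A hedged sketch of one such mash-up lookup: fetching a short English description of a country from DBpedia's public SPARQL endpoint with SPARQLWrapper, which an aggregator like AGRIS could display next to its own geographic metadata. The choice of country is arbitrary and the snippet is illustrative, not the AGRIS implementation.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?abstract WHERE {
            <http://dbpedia.org/resource/Kenya> dbo:abstract ?abstract .
            FILTER (lang(?abstract) = "en")
        }
    """)
    sparql.setReturnFormat(JSON)

    # Print the first 200 characters of the English abstract.
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["abstract"]["value"][:200])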
24. Workflow architecture (diagram)
Components shown: file systems holding DC, IEEE LOM and MODS XML records; the Ariadne harvester, which stores harvested records back to the file system; a filtering component (to be ported on the Grid); an identification and de-duplication component (gets a unique ID for each record, stores duplicates); a transformation component (stores metadata in JSON); a link-checking component (records with broken links stored in MySQL); and a post-processing/enrichment component.
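For illustration, a minimal sketch of what the link-checking step could do for each harvested record: issue a HEAD request per URL and flag the broken ones. The record identifier, URLs and function name are invented for the example; the actual agINFRA component may work differently.

    import requests

    def check_links(record_id, urls, timeout=10):
        """Return the record id and the subset of urls that do not resolve with a 2xx/3xx status."""
        broken = []
        for url in urls:
            try:
                resp = requests.head(url, allow_redirects=True, timeout=timeout)
                if resp.status_code >= 400:
                    broken.append(url)
            except requests.RequestException:
                # Network errors and timeouts are also treated as broken links.
                broken.append(url)
        return record_id, broken

    # Hypothetical harvested record with two links to verify.
    print(check_links("oai:example.org:123",
                      ["http://example.org/fulltext.pdf", "http://example.org/missing"]))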