How e-infrastructure can contribute to Linked Germplasm data

Giannis Stoitsis, Agro-Know
stoitsis@agroknow.gr
e-conference on Germplasm Data Interoperability
  
Contents
•  Why we need e-infrastructure
•  What e-infrastructure can provide
•  The agINFRA approach
•  agINFRA powered services for Germplasm data
•  What is next

WHY WE NEED E-INFRASTRUCTURE
  
agricultural data
•  publications, theses, reports, other grey literature
•  educational material and content, courseware
•  primary data, such as measurements & observations
   –  structured, e.g. datasets as tables
   –  digitized, e.g. images, videos
•  secondary data, such as processed elaborations
   –  e.g. dendrograms, pie charts, models
•  provenance information, incl. authors, their organizations and projects
•  experimental protocols & methods
•  social data, tags, ratings, etc.
•  …
  
educators’ view
•  stats
•  gene banks
•  GIS data
•  blogs
•  journals
•  open archives
•  raw data
•  technologies
•  learning objects
•  …

researchers’ view
•  stats
•  gene banks
•  GIS data
•  blogs
•  journals
•  open archives
•  raw data
•  technologies
•  learning objects
•  …

practitioners’ view
•  stats
•  gene banks
•  GIS data
•  blogs
•  journals
•  open archives
•  raw data
•  technologies
•  learning objects
•  …
  
we still have data silos
•  Many metadata standards (e.g. DC, IEEE LOM, Dw, local schemas)
•  Diversity of web interfaces (e.g. REST, OAI-PMH, SOAP, SPI, SQI)
•  Different exchange formats (e.g. XML, RDF, JSON)
•  Fragmented use of taxonomies
[Background figure, text partly recoverable: "LD for educational data/resource sharing: Overview"; "Approaches for LD in educational data sharing"; http://www.meducator.net; http://linkedup-project.eu; labels "We are still here …" and "… and not here …"]

we need ontologies published online and aligned
•  stats
•  gene banks
•  blogs
•  journals
•  open archives
•  raw data
•  learning objects
  
we need tools to share data

we need tools to semantically annotate data

and for all this we need
  
data infrastructure for agriculture
•  aim is: promoting data sharing and consumption related to any research activity aimed at improving productivity and quality of crops
ICT for computing, connectivity, storage, instrumentation
  
	
  
	
  
what researchers need in agINFRA
… only a browser and internet connection

typical problem: computing

typical problem: hosting
  
what can be hosted and executed on agINFRA
•  Data storage & management tools
   –  APIs for content dissemination in large networks
•  Processing & visualisation tools
•  Metadata aggregation infra
•  Search engines and apps for institutions or communities
•  Environments for running experiments, e.g. comparing different content recommendation algorithms

http://aginfra.eu/en/our-solution/api
  
HOW AGINFRA CAN SOLVE DATA INTEROPERABILITY PROBLEMS

WORKFLOW FOR METADATA AGGREGATION
  
metadata aggregations
•  concerns viewing merged collections of metadata records from different sources
•  useful when access is needed to specific supersets or subsets of networked collections
   –  records actually stored at the aggregator
   –  or queries distributed to virtually aggregated collections
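
The two options in the last bullet (records stored at the aggregator versus queries distributed to the sources) can be contrasted in a minimal Python sketch. Everything in it is illustrative: the `Source` class, the sample records, and the function names are assumptions made for this example and are not agINFRA components.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A hypothetical metadata provider holding a list of record dicts."""
    name: str
    records: list = field(default_factory=list)

    def search(self, keyword):
        # Each source answers queries over its own records only.
        return [r for r in self.records if keyword.lower() in r["title"].lower()]


def harvest_all(sources):
    """Stored aggregation: copy every record into one merged collection."""
    merged = []
    for src in sources:
        for rec in src.records:
            merged.append({**rec, "source": src.name})
    return merged


def federated_search(sources, keyword):
    """Virtual aggregation: send the query to each source, merge the answers."""
    hits = []
    for src in sources:
        for rec in src.search(keyword):
            hits.append({**rec, "source": src.name})
    return hits


# Made-up sample data, for illustration only.
sources = [
    Source("genebank_a", [{"id": "a1", "title": "Wheat landrace accessions"}]),
    Source("archive_b", [{"id": "b1", "title": "Wheat yield trial report"},
                         {"id": "b2", "title": "Maize drought study"}]),
]

stored = harvest_all(sources)
stored_hits = [r for r in stored if "wheat" in r["title"].lower()]
virtual_hits = federated_search(sources, "wheat")
```

Both routes return the same hits here; the difference is where the records live. The stored variant needs periodic re-harvesting to stay current, while the virtual variant depends on every source being reachable at query time.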
  
  
typically look like this
[Figure not reproduced; source: Ternier et al., 2010]

metadata aggregation tools
More than a harvester:
•  Validation Service
•  Repository Software
•  Registry Service
•  Harvester
  
a metadata aggregation workflow that can be ported on agINFRA
[Diagram labels: Harvesting; Validating; Transforming; OAI target - XMLs; Storing and indexing; Triplification]
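
The stages named in the diagram can be sketched end to end in a few lines of Python. This is only a sketch under stated assumptions: the XML is a hand-written stand-in for an OAI-PMH `ListRecords` response with Dublin Core metadata, the validation rule (identifier and title required) is invented for the example, and none of the function names come from the agINFRA tools.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/", "dc": DC}

# Hand-written stand-in for an OAI-PMH ListRecords response (not real data).
SAMPLE_XML = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Barley accession passport data</dc:title>
          <dc:creator>Example Genebank</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:creator>No Title Provided</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest(xml_text):
    """Harvesting: pull the raw record elements out of the response."""
    return ET.fromstring(xml_text).findall(".//oai:record", NS)


def validate(record):
    """Validating: this sketch only requires an identifier and a title."""
    has_id = record.find("oai:header/oai:identifier", NS) is not None
    has_title = record.find(".//dc:title", NS) is not None
    return has_id and has_title


def transform(record):
    """Transforming: flatten one XML record into a plain dict."""
    return {
        "id": record.findtext("oai:header/oai:identifier", namespaces=NS),
        "title": record.findtext(".//dc:title", namespaces=NS),
        "creator": record.findtext(".//dc:creator", namespaces=NS),
    }


def store_and_index(items):
    """Storing and indexing: keep items by id, plus a word -> ids index."""
    store = {item["id"]: item for item in items}
    index = {}
    for item in items:
        for word in item["title"].lower().split():
            index.setdefault(word, set()).add(item["id"])
    return store, index


def triplify(item):
    """Triplification: emit one N-Triples line per populated field."""
    lines = []
    for field in ("title", "creator"):
        value = item.get(field)
        if value:
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{item["id"]}> <{DC}{field}> "{literal}" .')
    return lines


records = harvest(SAMPLE_XML)
items = [transform(r) for r in records if validate(r)]
store, index = store_and_index(items)
triples = [line for item in items for line in triplify(item)]
```

The second sample record has no title, so it is dropped at the validation stage and never reaches the store or the triples.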
  
TOOLS FOR PUBLISHING AND LINKING VOCABULARIES

AGRICULTURAL DATA DISCOVERY SERVICE/PORTAL OVER THE CLOUD
  
agricultural data discovery modules for open source CMS
http://www.youtube.com/watch?v=OYlxWlyag04&feature=youtu.be
  
LINKING GERMPLASM DATABASES AND EXPOSING DESCRIPTIONS AS LINKED DATA
  
agINFRA contribution to germplasm data interoperability
•  Define recommendations for describing germplasm data
•  Define mappings between different metadata formats
•  Provide APIs for transformation
   –  triplification of germplasm descriptions
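
As a rough illustration of the last two bullets, the Python sketch below maps a germplasm description from one set of field names to another and then serialises the result as N-Triples. The choice of formats is an assumption: the source names MCPD-style passport descriptors and the target names Darwin Core terms, and the pairing in `MCPD_TO_DWC` is invented for this example rather than taken from any agINFRA recommendation. The accession record and the `example.org` IRI are made up.

```python
# Illustrative crosswalk: MCPD-style descriptor -> Darwin Core term name.
# The pairing is an assumption for this sketch, not a published mapping.
MCPD_TO_DWC = {
    "INSTCODE": "institutionCode",
    "ACCENUMB": "catalogNumber",
    "GENUS": "genus",
    "SPECIES": "specificEpithet",
}

DWC_NS = "http://rs.tdwg.org/dwc/terms/"


def map_record(record, crosswalk):
    """Rename known fields; report the ones the crosswalk does not cover."""
    mapped, unmapped = {}, {}
    for key, value in record.items():
        if key in crosswalk:
            mapped[crosswalk[key]] = value
        else:
            unmapped[key] = value
    return mapped, unmapped


def to_ntriples(subject_iri, mapped):
    """Serialise the mapped fields as N-Triples lines, sorted by term name."""
    lines = []
    for term, value in sorted(mapped.items()):
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject_iri}> <{DWC_NS}{term}> "{literal}" .')
    return lines


# Made-up accession record, for illustration only.
accession = {
    "INSTCODE": "XXX000",
    "ACCENUMB": "ACC-0001",
    "GENUS": "Hordeum",
    "SPECIES": "vulgare",
    "COLLSITE": "hypothetical site",
}

mapped, unmapped = map_record(accession, MCPD_TO_DWC)
triples = to_ntriples("http://example.org/accession/ACC-0001", mapped)
```

Returning the unmapped fields separately makes gaps in a crosswalk visible instead of silently dropping data, which is useful while the mappings are still being defined.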
  
mapping between different metadata formats powered by agINFRA

publishing germplasm data as linked data in agINFRA services

next steps in the context of agINFRA
•  Develop the recommendations for publishing germplasm data
•  Deploy transformers and make them available in agINFRA
•  Deploy API for triplification
 
	
  

thank you!
stoitsis@agroknow.gr
www.agroknow.gr
www.aginfra.eu
