agINFRA EGI-APARSEN workshop, Amsterdam, 4-6 March 2014

Presentation of agINFRA project (www.aginfra.eu) in the EGI-APARSEN workshop, Amsterdam, 4-6 March 2014
“Managing, computing and preserving big data for research”
https://indico.egi.eu/indico/conferenceDisplay.py?confId=2052


  1. agINFRA: A data infrastructure to support agricultural scientific communities. Andreas Drakos, University of Alcala. EGI-APARSEN workshop, Amsterdam, 4-6 March 2014
  2. Our project: in agINFRA we will share agricultural research over a data e-infrastructure.
  3. Agricultural research data
     • Primary data:
       – Structured, e.g. datasets as tables
       – Digitized: images, videos, etc.
     • Secondary data (elaborations, e.g. a dendrogram)
     • Provenance information, incl. authors, their organizations and projects
     • Methods and procedures followed
     • Reports, including papers
     • Secondary documents, e.g. training resources
     • Metadata about the above
     • Social data: tags, ratings, etc.
  4. agINFRA values: scientific data must be
     • A | Open: must be open and interlinked. NOT subject to barriers, based on standard formats, and avoiding data silos caused by lack of interrelatedness and ad-hoc APIs.
     • B | Meaningful: must be meaningful through explicit semantics. Reusing the semantics already provided in mature terminologies and ontologies that are exposed and interlinked through the Web.
     • C | Reliable: must be reliable, traceable and accessible. Any kind of research object can be stored in the data infrastructure, and there are NO barriers to expressing relations between these objects to capture the context of research activities.
     • D | Actionable: must be actionable via services that empower research. Data is not useful without flexible and adaptable services that allow researchers to act on the data in the ways they need.
  5. There is a lot of data
  6. [Diagram: how content providers join agINFRA] Three kinds of content provider are shown:
     • Content provider with unorganised collection (e.g. listed at a Web site or on DVD-ROM)
     • Content provider with CMS that does not support sharing (e.g. proprietary DB)
     • Content provider with CMS that supports sharing (e.g. OAI-PMH, RSS, ...)
     Step labels in the diagram: chooses sharing-compliant tool; (meta)data export in proprietary format & ingestion in sharing-compliant tool; mapping to known (label incomplete); register as data source; hosted over agINFRA; computed over agINFRA.
  7. [Diagram, continued] Data sources share (meta)data, e.g. through OAI-PMH, with a (meta)data aggregator that is indexed & available through CIARD RING and served through agINFRA. Other labels: hosted over agINFRA; computed over agINFRA.
  8. [Diagram, continued] Labels: hosted over agINFRA; computed over agINFRA; …
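
Slides 6 to 8 name OAI-PMH as one way a data source shares (meta)data with an aggregator. The following is a minimal sketch of what harvesting over OAI-PMH can look like, not the agINFRA implementation. The repository URL, the sample response and the function names `list_records_url` and `parse_records` are hypothetical; the `ListRecords` verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH 2.0 protocol. The response is embedded as a string so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical response from a hypothetical repository (illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Soil survey report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier and title from each record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", OAI_NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        title = rec.findtext(".//dc:title", namespaces=OAI_NS)
        records.append({"identifier": identifier, "title": title})
    return records


if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also need to follow `resumptionToken` elements to page through large result sets, which this sketch leaves out.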
  9. Actors over the infrastructure [diagram]. Components shown: registries (datasets and collections; data sources; vocabularies and tools), APIs, Cloud / SaaS tools, public REST APIs, Grid jobs, Grid workflows, productivity tools, information services, LOD vocabularies, agINFRA RDF vocabularies, agINFRA LOD KOSs.
  10. Actors over the infrastructure [diagram, with actors added]. Actors: developers, information systems providers, taxonomists, data providers, researchers, policy makers. Components shown: registries (datasets and collections; data sources; vocabularies and tools), APIs, Cloud / SaaS tools, public REST APIs, Grid jobs, Grid workflows, productivity tools, information services, LOD vocabularies, agINFRA RDF vocabularies, agINFRA LOD KOSs.
  11. An existing data community
      • a global community movement to make agricultural research information and knowledge publicly accessible to all
        – http://www.ciard.net
      (Slide footer: agINFRA 2nd Review Meeting, 13th of December 2013)
  12. A core registry service
      • CIARD RING (Routemap to Information Nodes and Gateways)
        – global registry to give access to any kind of information sources pertaining to agricultural research for development
        – principal tool created through CIARD to allow information providers to register their services in various categories and facilitate discovery of sources of agriculture-related information across the world
  13. New agINFRA RING
  14. New agINFRA RING
  15. RING data registry usage scenario 1
      • data aggregators registering their data providers to CIARD RING
        – asking directly to be registered there (AGRIS)
        – federating own smaller registries (GLN)
  16. RING data registry usage scenario 2
      • new data providers using agINFRA cloud tools can be automatically registered to CIARD RING
        – cloud-hosted AgriDrupal or AgriOceanDSpace instances for document repositories
        – cloud-hosted agLR instances for learning repositories
      • agINFRA Cloud hosting services
        – In collaboration with other cloud communities (e.g. OKEANOS/GRNET)
        – In collaboration with CHAIN-REDS project etc.
  17. Data provider scenario 1: a data provider in need of hosting & storage for a small-scale CMS uses a cloud-hosted CMS and sets up its own CMS instance. [Diagram: infrastructure components as on slide 9]
  18. Data provider scenario 2: a data provider in need of large-scale hosting & replication requests space/accounts in a large-scale CMS. [Diagram: infrastructure components as on slide 9]
  19. A semantic backbone for agINFRA
      • to help all data providers declare, publish & link their metadata properties and value spaces
        – Publishing their KOSs using the VocBench and their metadata vocabularies using Neologism
        – Linking them to existing vocabularies, e.g. AGROVOC for KOSs, Dublin Core for metadata
      • guidelines & tools to support data providers in adopting such a LOD framework
        – e.g. LODE-BD recommendations
      • to provide an entry point to existing relevant vocabularies
  20. Exposing to the e-infrastructure scenario: a data provider hosting a CMS on its own or on external/commercial infrastructure is interested in exposing (meta)data to the e-infrastructure. [Diagram: infrastructure components as on slide 9]
  21. agINFRA LOD layer usage scenario 1
      • A data owner wants to share their data as Linked Data
      • The data owner uses non-LOD vocabularies and KOSs and wants to publish them as LOD and link them to existing vocabularies
      • agINFRA offers tools for publishing vocabularies and KOSs
      Once the vocabularies are published, all metadata and all concepts have URIs and can be referenced by any other system.
  22. agINFRA LOD layer usage scenario 2
      • Once KOSs are published, all metadata and all concepts have URIs and can be referenced by any other system
      • Data aggregators like AGRIS and GLN can create mash-ups between their core data and other agricultural data types (e.g. germplasm, soil maps, statistics, ...) by using the LOD semantic backbone as a crosswalk between metadata formalizations and concepts in different vocabularies
  23. agINFRA LOD layer usage scenario 2. Example: LOD-based mash-ups in AGRIS [diagram]
      • AGRIS bibliographic metadata: journal, topic, geographic metadata, thematic metadata, scientific names
      • Linked sources: AGRIS Journals RDF store, DBpedia, FAO Country Profiles, FAO Fisheries, WorldBank indicators by country
      • Information obtained: info on journal, info on topic, info on country, info on species, specific indicators on country
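
Slides 21 to 23 describe published concept URIs being used as a crosswalk between vocabularies so that aggregators can build mash-ups. A minimal sketch of that idea follows, assuming the links between vocabularies are expressed as `skos:exactMatch` statements (the slides do not say which linking property is used). All concept URIs, record contents and function names here are hypothetical; only the SKOS property URI is real.

```python
# SKOS mapping property (real URI from the W3C SKOS vocabulary).
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical links from two local KOSs to one shared concept URI.
# In practice the shared URI could come from a vocabulary such as AGROVOC.
LINKS = [
    ("http://vocab.example.org/local/maize", SKOS_EXACT_MATCH,
     "http://vocab.example.org/shared/c_maize"),
    ("http://other.example.org/kos/corn", SKOS_EXACT_MATCH,
     "http://vocab.example.org/shared/c_maize"),
]


def build_crosswalk(links):
    """Map each local concept URI to the shared concept URI it matches."""
    return {s: o for s, p, o in links if p == SKOS_EXACT_MATCH}


def mash_up(records_a, records_b, crosswalk):
    """Group record titles from two sources under the shared concept URI."""
    merged = {}
    for source, records in (("a", records_a), ("b", records_b)):
        for rec in records:
            # Fall back to the record's own URI when no mapping is known.
            shared = crosswalk.get(rec["concept"], rec["concept"])
            merged.setdefault(shared, {"a": [], "b": []})[source].append(rec["title"])
    return merged


if __name__ == "__main__":
    bibliographic = [{"title": "Maize yield study",
                      "concept": "http://vocab.example.org/local/maize"}]
    germplasm = [{"title": "Corn accession 42",
                  "concept": "http://other.example.org/kos/corn"}]
    print(mash_up(bibliographic, germplasm, build_crosswalk(LINKS)))
```

Both records end up under the same shared URI even though each source used its own vocabulary, which is the role the slides give to the LOD semantic backbone.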
  24. Workflow architecture [diagram]
      • Components: Ariadne harvester; filtering component; identification and de-duplication component; transformation component; link checking component; post-processing/enrichment component
      • Storage: file system (DC, IEEE LOM, MODS XML); file system (XMLs); MySQL
      • Other labels: get unique ID; stores duplicates; store metadata in JSON; records with broken links; to be ported on the Grid
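
The workflow slide names a chain of components: filtering, identification and de-duplication, transformation to JSON, and link checking. The sketch below shows one way such a chain could be wired together. It is an illustration under stated assumptions, not the agINFRA code: the order of the steps is one reading of the flattened diagram, the record fields (`title`, `url`), the SHA-1 based identifier and all function names are hypothetical, and the link checker is passed in as a function so the example runs without network access. The harvesting and enrichment steps are left out.

```python
import hashlib
import json


def filter_records(records, required=("title", "url")):
    """Filtering step: keep only records that have all required fields."""
    return [r for r in records if all(r.get(f) for f in required)]


def assign_id(record):
    """Identification step: derive a stable ID from normalized title and URL."""
    key = record["title"].strip().lower() + "|" + record["url"].strip()
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def deduplicate(records):
    """De-duplication step: return (unique records with IDs, duplicates)."""
    unique, duplicates, seen = [], [], set()
    for r in records:
        rid = assign_id(r)
        if rid in seen:
            duplicates.append(r)
        else:
            seen.add(rid)
            unique.append({**r, "id": rid})
    return unique, duplicates


def check_links(records, link_ok):
    """Link checking step: split records by whether their URL resolves."""
    ok, broken = [], []
    for r in records:
        (ok if link_ok(r["url"]) else broken).append(r)
    return ok, broken


def run_pipeline(harvested, link_ok):
    """Run the steps in sequence and serialize surviving records as JSON."""
    kept = filter_records(harvested)
    unique, duplicates = deduplicate(kept)
    ok, broken = check_links(unique, link_ok)
    return {
        "json": [json.dumps(r, sort_keys=True) for r in ok],
        "duplicates": duplicates,
        "broken": broken,
    }


if __name__ == "__main__":
    harvested = [
        {"title": "Soil maps", "url": "http://a.example.org/1"},
        {"title": "soil maps ", "url": "http://a.example.org/1"},
        {"title": "", "url": "http://a.example.org/2"},
        {"title": "Germplasm", "url": "http://dead.example.org/3"},
    ]
    # Stub link checker: treats any URL containing "dead" as broken.
    result = run_pipeline(harvested, lambda url: "dead" not in url)
    print(result)
```

Keeping each step as a separate function that takes and returns plain lists mirrors the slide's component boxes and would let individual steps be moved to other execution environments independently.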
  25. Thank you! Questions?
