agINFRA Germplasm metadata analysis

Presentation of the two agINFRA Germplasm data sources (CGRIS, China and CRA, Italy) and the metadata used for the description of their germplasm accessions. Presented during Session 2 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)

Transcript

  • 1. Metadata analysis of germplasm collections The case of agINFRA Dr. Vassilis Protonotarios Agricultural Biotechnologist, PhD Agro-Know Technologies, Greece e-Conference on Germplasm Data Interoperability Session 2: “Status of data and metadata for germplasm”
  • 2. Structure of the presentation 1. The agINFRA germplasm data sources – Chinese Crop Germplasm Information System – Italian National Germplasm Database 2. Current status – Mappings – Linked Data approach 3. Conclusions
  • 3. The agINFRA germplasm data sources
  • 4. agINFRA germplasm data sources • Italian Germplasm Database (CRA) – Data available through EURISCO -> GENESYS – Uses EURISCO set of descriptors – Data also available through GBIF • Chinese Crop Germplasm Information System (CGRIS/CAAS) – Data unavailable through aggregators – Own schema used for description of germplasm accessions – Metadata exposure in CSV
  • 5. agINFRA germplasm data analysis 1. Analysis of agINFRA germplasm data sources 2. Analysis of metadata schemas used 3. Identification of external schemas – Review of existing work 4. Definition of a base schema (descriptors) 5. Mappings of various schemas to the base one 6. Development of a linked data approach for linking germplasm data sources
  • 6. 1. Chinese Crop Germplasm Information System (CGRIS / CAAS)
  • 7. Chinese Crop Germplasm Information System (CGRIS) • Provided by: Chinese Academy of Agricultural Sciences • A central repository for all types of plant genetic resources information. It consists of six subsystems: 1. The management system of the National Crop Gene Bank (NCGB), 2. The management system of the long-term storage in Qinghai, 3. The management system of the National Germplasm Resources Nursery, 4. The crop characterization and evaluation database system, 5. The database system for germplasm exchange at home and abroad and 6. The management system of the medium-term storage in Beijing. URL: http://icgr.caas.net.cn/cgrisintroduction.html
  • 8. CGRIS: Data At present, CGRIS owns • > 2,000 MB of data on 180 kinds of crops – including food crops, fibre plants, oil crops, vegetables, fruit trees, tea, mulberry, tobacco, sugar crops, green manure crops, tropical crops, etc. • 390,000 accessions of germplasm
  • 9. CGRIS: Accessions (indicative list) http://icgr.caas.net.cn/cgrisintroduction.html
  • 10. Crop Germplasm Classification
  • 11. Info on wheat varieties
  • 12. Info on wheat varieties
  • 13. CGRIS: Germplasm Data Query
  • 14. CGRIS: Germplasm Data Query
  • 15. CGRIS Metadata • CGRIS germplasm descriptors are based on its own schema – can be seen as the de facto standard for germplasm accession information in China. – Based on metadata schema standards such as those developed by IPGRI (Bioversity) and GRIN
  • 16. CGRIS: Basic Descriptors
  • 17. CGRIS: Wheat descriptors
  • 18. CGRIS Metadata: Next steps • A mapping to the Multi-crop Passport Descriptors (MCPD) standard is intended – According to CAAS subject experts such a mapping should be rather easy to produce.
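A mapping of this kind can be sketched as a simple field-renaming table. In the sketch below the CGRIS field names (`unified_id`, `variety_name`, and so on) and the sample record are hypothetical placeholders, not the actual CGRIS schema; only the target names are MCPD descriptors.

```python
# Hypothetical CGRIS field names (left) mapped to MCPD descriptors (right).
# The CGRIS names are illustrative assumptions; only the MCPD names are real.
CGRIS_TO_MCPD = {
    "unified_id": "ACCENUMB",      # accession number
    "variety_name": "ACCENAME",    # accession name
    "genus": "GENUS",
    "species": "SPECIES",
    "origin_country": "ORIGCTY",   # country of origin
}


def to_mcpd(record):
    """Split a source record into mapped MCPD fields and unmapped leftovers."""
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target = CGRIS_TO_MCPD.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


# Made-up sample record for illustration only.
sample = {
    "unified_id": "ZM000001",
    "variety_name": "Example wheat",
    "genus": "Triticum",
    "species": "aestivum",
    "plant_height_cm": "95",  # crop-specific trait with no passport equivalent
}
mcpd_record, leftovers = to_mcpd(sample)
print(mcpd_record)
print(leftovers)
```

Keeping the unmapped fields separate, rather than dropping them, makes it easy to see which crop-specific descriptors fall outside the passport data.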
  • 19. CGRIS: Exposing data • Data stored in relational DBs • Hosted in an SQL server • Exposure of data as CSV files (partially in Chinese)
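Because the CSV exports are partially in Chinese, a consumer has to handle the encoding and translate column headers before any mapping. A minimal sketch follows, assuming UTF-8 files; the Chinese headers, their English labels and the sample row are hypothetical and are not taken from an actual CGRIS export.

```python
import csv
import io

# Hypothetical translation of Chinese column headers to English labels.
HEADER_TRANSLATIONS = {
    "统一编号": "unified_id",
    "品种名称": "variety_name",
    "原产地": "origin",
}

# Stand-in for an exported file. A real export would be opened with
# open(path, encoding=..., newline="") using whatever encoding the source uses.
SAMPLE_CSV = "统一编号,品种名称,原产地\nZM000001,Example wheat,Beijing\n"


def read_accessions(handle):
    """Yield rows as dicts with translated headers; unknown headers are kept as-is."""
    reader = csv.DictReader(handle)
    for row in reader:
        yield {HEADER_TRANSLATIONS.get(key, key): value for key, value in row.items()}


rows = list(read_accessions(io.StringIO(SAMPLE_CSV)))
print(rows)
```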
  • 20. CGRIS: IPR information • The CGRIS website is public and accessible to everybody. The information is provided free of charge but is subject to copyright. • With regard to data exchange, there is no explicit policy to follow. • CGRIS does not have an Open Access mandate and the members of the CGRIS network apply their own institutional policies.
  • 21. 2. Italian Germplasm Database (CRA)
  • 22. Italian Germplasm Database • Provided by: Italian Council for Research and Experimentation in Agriculture • Developed in the context of the “Plant Genetic Resources/FAO” project in 2004 – Research Centres and Units of the CRA – The Institute of Plant Genetics of the CNR in Bari, – NGO “Rete Semi Rurali” – University collections (Perugia, Potenza etc.) URL: http://fru.entecra.it
  • 23. CRA Germplasm: Data Current status of germplasm data (CRA) • 20,954 records from Italy are included in EURISCO, of which 17,212 are from CRA • 28,509 records for 275 plant species in the National Inventory (in general) – the inventory does not allow the number of CRA germplasm records to be identified
  • 24. CRA: Accessions (indicative list) URL: http://fru.entecra.it/accessioni.php
  • 25. Info on specific species
  • 26. EURISCO descriptors
  • 27. CRA Metadata • Most CRA institutional databases use the MCPD – however, in the records provided to the National Inventory several fields are often not filled. • Some CRA collections also use descriptors defined by – the Union for the Protection of New Varieties of Plants (UPOV) and – the National Register of New Varieties. • Ensure mapping to the Multi-crop Passport Descriptors (MCPD)/EURISCO
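The observation that several fields are often left empty can be quantified with a per-descriptor fill rate. The sketch below checks a small subset of MCPD descriptors chosen for illustration; the records and the institute code are made up and do not come from the National Inventory.

```python
from collections import Counter

# A small subset of MCPD descriptors, chosen here for illustration.
CHECKED_FIELDS = ["INSTCODE", "ACCENUMB", "GENUS", "SPECIES", "ORIGCTY", "ACQDATE"]

# Made-up records; "XXX001" is a placeholder, not a real institute code.
records = [
    {"INSTCODE": "XXX001", "ACCENUMB": "A-1", "GENUS": "Malus",
     "SPECIES": "domestica", "ORIGCTY": "ITA", "ACQDATE": ""},
    {"INSTCODE": "XXX001", "ACCENUMB": "A-2", "GENUS": "Vitis",
     "SPECIES": "", "ACQDATE": None},
]


def fill_rates(rows, fields):
    """Return the share of rows in which each field holds a non-blank value."""
    filled = Counter()
    for row in rows:
        for field in fields:
            # Missing keys, None and whitespace-only strings all count as empty.
            if (row.get(field) or "").strip():
                filled[field] += 1
    total = len(rows)
    return {field: (filled[field] / total if total else 0.0) for field in fields}


rates = fill_rates(records, CHECKED_FIELDS)
for field, rate in rates.items():
    print(f"{field}: {rate:.0%}")
```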
  • 28. CRA: IPR information • The CRA website is public and accessible to everybody. The information is provided free of charge but is subject to copyright. • The Multilateral System (MLS) of the Treaty demands free availability of the information on the PGRFA that are under the management and control of the Contracting Parties and in the public domain (Treaty, Art. 11.2). • This excludes – germplasm accessions that are subject to IPR and – other legally binding protection which restricts the Contracting Party’s control over the material. – Accessions that are not covered by IPR include old and autochthonous varieties, crop wild relatives and other material found in in-situ conditions, new cultivars not protected by IPR and cultivars whose IPR have expired.
  • 29. Conclusions
  • 30. Current status • First version of mappings is available • EURISCO descriptors used as base schema – MCPD – Darwin Core for Genebanks – ABCD – CGRIS – CRA
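Using one base schema means each source schema is mapped once to the base, and any two schemas can then be translated into each other by pivoting through it. A minimal sketch of that idea: the Darwin Core term names are real, but the `cgris` field names and the specific pairings are illustrative assumptions, not the project's actual mapping table.

```python
# Each schema is mapped once to the base (EURISCO/MCPD-style) descriptor names.
# Darwin Core term names are real; the "cgris" field names are hypothetical,
# and the pairings are illustrative rather than the agINFRA mapping itself.
TO_BASE = {
    "dwc": {
        "institutionCode": "INSTCODE",
        "catalogNumber": "ACCENUMB",
        "genus": "GENUS",
    },
    "cgris": {
        "holding_institute": "INSTCODE",
        "unified_id": "ACCENUMB",
        "genus": "GENUS",
    },
}


def translate(record, source, target):
    """Translate a record between two schemas by pivoting through the base."""
    # Invert the target mapping: base descriptor -> target field name.
    from_base = {base: name for name, base in TO_BASE[target].items()}
    result = {}
    for field, value in record.items():
        base = TO_BASE[source].get(field)
        if base in from_base:  # fields without a base equivalent are skipped
            result[from_base[base]] = value
    return result


converted = translate(
    {"unified_id": "ZM000001", "genus": "Triticum", "plant_height_cm": "95"},
    source="cgris",
    target="dwc",
)
print(converted)
```

With this layout, adding a new schema requires a single mapping to the base rather than one mapping per existing schema.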
  • 31. Mapping table
  • 32. Mapping table
  • 33. Development of decision trees
  • 34. Development of decision trees
  • 35. Linked Data • A linked data approach will be used by agINFRA for linking germplasm data sources • OpenAGRIS already aggregates germplasm data using AGROVOC
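As a rough illustration of what a linked data representation of an accession could look like, the sketch below emits N-Triples using only the standard library. The accession base URI, the institute code and the AGROVOC concept id (`c_0000`) are placeholders, and the choice of Darwin Core and Dublin Core properties is an assumption, not the vocabulary agINFRA settled on.

```python
# Placeholder base URI for accessions (hypothetical).
ACCESSION_BASE = "http://example.org/accession/"
# Darwin Core terms namespace and Dublin Core "subject" property.
DWC = "http://rs.tdwg.org/dwc/terms/"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def literal(text):
    """Format a plain string as an N-Triples literal, escaping \\ and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def accession_triples(inst_code, acc_number, scientific_name, concept_uri):
    """Describe one accession and link it to a thesaurus concept URI."""
    subject = f"<{ACCESSION_BASE}{inst_code}/{acc_number}>"
    return [
        f"{subject} <{DWC}institutionCode> {literal(inst_code)} .",
        f"{subject} <{DWC}catalogNumber> {literal(acc_number)} .",
        f"{subject} <{DWC}scientificName> {literal(scientific_name)} .",
        f"{subject} <{DCT_SUBJECT}> <{concept_uri}> .",
    ]


# "c_0000" is a placeholder id, not a real AGROVOC concept.
triples = accession_triples(
    "XXX001", "A-1", "Triticum aestivum",
    "http://aims.fao.org/aos/agrovoc/c_0000",
)
print("\n".join(triples))
```

Two sources that link their accessions to the same concept URI become connected without either having to adopt the other's schema.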
  • 36. Conclusions • Both schemas / sets of descriptors can be mapped to the EURISCO ones • Linked Data approach will facilitate linking of germplasm data from CRA/CGRIS • EURISCO descriptors to be published as linked data – To be used as the base of passport data • Linking to other germplasm standards – e.g. Darwin Core for Genebanks* *https://code.google.com/p/darwincore-germplasm/wiki/DarwinCoreGermplasmMapping
  • 37. Take home message • The identification of common properties between different metadata schemas will facilitate the linked data framework
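Once each schema has been mapped to the base descriptors, identifying the common properties reduces to a set intersection. The coverage sets below are illustrative values only and do not reflect the actual CGRIS or CRA mappings.

```python
# Base descriptors covered by each schema after mapping (illustrative values only).
COVERAGE = {
    "cgris": {"ACCENUMB", "ACCENAME", "GENUS", "SPECIES", "ORIGCTY"},
    "cra": {"INSTCODE", "ACCENUMB", "GENUS", "SPECIES", "ACQDATE"},
}

# Properties shared by every schema: candidates for linking records.
common = set.intersection(*COVERAGE.values())

# Properties specific to one schema: these need extra handling or stay local.
only_in = {name: fields - common for name, fields in COVERAGE.items()}

print(sorted(common))
for name, fields in only_in.items():
    print(name, sorted(fields))
```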
  • 38. (Indicative) List of References • agINFRA Deliverable D2.3 “Review of Content Requirements” • agINFRA Deliverable D5.3 “Conceptual specification of linked agricultural data framework” • agINFRA Germplasm Working Group Wiki http://wiki.aginfra.eu/index.php/Germplasm_Working_Group • EURISCO passport descriptors http://www.ecpgr.cgiar.org/germplasm_databases.html • Draft Mapping of EURISCO Descriptors to ABCD 2.06 http://www.bgbm.org/TDWG/CODATA/Schema/Mappings/EURISCO-2-ABCD.pdf
  • 39. Source: http://verastic.com/social/why-do-people-not-say-thank-you.html Contact me: vprot@agroknow.gr