agINFRA Germplasm metadata analysis

Presentation of the two agINFRA Germplasm data sources (CGRIS, China and CRA, Italy) and the metadata used for the description of their germplasm accessions. Presented during Session 2 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)

Published in: Education, Technology

Transcript of "agINFRA Germplasm metadata analysis"

  1. Metadata analysis of germplasm collections: The case of agINFRA
     Dr. Vassilis Protonotarios, Agricultural Biotechnologist, PhD
     Agro-Know Technologies, Greece
     e-Conference on Germplasm Data Interoperability
     Session 2: “Status of data and metadata for germplasm”
  2. Structure of the presentation
     1. The agINFRA germplasm data sources
        – Chinese Crop Germplasm Information System
        – Italian National Germplasm Database
     2. Current status
        – Mappings
        – Linked Data approach
     3. Conclusions
  3. The agINFRA germplasm data sources
  4. agINFRA germplasm data sources
     • Italian Germplasm Database (CRA)
       – Data available through EURISCO -> GENESYS
       – Uses EURISCO set of descriptors
       – Data also available through GBIF
     • Chinese Crop Germplasm Information System (CGRIS/CAAS)
       – Data not available through aggregators
       – Own schema used for description of germplasm accessions
       – Metadata exposure in CSV
  5. agINFRA germplasm data analysis
     1. Analysis of agINFRA germplasm data sources
     2. Analysis of metadata schemas used
     3. Identification of external schemas
        – Review of existing work
     4. Definition of a base schema (descriptors)
     5. Mappings of various schemas to the base one
     6. Development of a linked data approach for linking germplasm data sources
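Steps 4 and 5 of the analysis (define a base schema, then map each source schema to it) can be pictured as a field-level crosswalk. The following is a minimal Python sketch, not the agINFRA implementation. It assumes the base schema uses MCPD-style descriptor names (ACCENUMB, ACCENAME, GENUS, SPECIES, ORIGCTY). The source-side field names and the sample record are hypothetical placeholders, not actual CGRIS descriptors.

```python
# Hypothetical crosswalk: source field name -> base-schema descriptor.
# The right-hand names follow MCPD conventions; the left-hand names are
# invented for illustration and are not the real CGRIS column names.
CGRIS_TO_BASE = {
    "unified_id": "ACCENUMB",
    "variety_name": "ACCENAME",
    "genus": "GENUS",
    "species": "SPECIES",
    "origin_country": "ORIGCTY",
}


def map_record(record, crosswalk):
    """Return (mapped, unmapped): fields renamed to the base schema,
    plus any source fields that have no counterpart in the crosswalk."""
    mapped = {}
    unmapped = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is None:
            unmapped[field] = value
        else:
            mapped[target] = value
    return mapped, unmapped


# Made-up sample record, for illustration only.
sample = {
    "unified_id": "X0001",
    "variety_name": "Example wheat",
    "genus": "Triticum",
    "species": "aestivum",
    "plant_height_cm": "95",
}
mapped, unmapped = map_record(sample, CGRIS_TO_BASE)
print(mapped)
print(unmapped)
```

Returning the unmapped fields separately keeps crop-specific descriptors (such as the hypothetical `plant_height_cm`) visible, so they can be reviewed rather than silently dropped.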
  6. Part 1: Chinese Crop Germplasm Information System (CGRIS / CAASD)
  7. Chinese Crop Germplasm Information System (CGRIS)
     • Provided by: Chinese Academy of Agricultural Sciences
     • A central repository for all types of plant genetic resources information. It consists of six subsystems:
       1. The management system of the National Crop Gene Bank (NCGB)
       2. The management system of the long-term storage in Qinghai
       3. The management system of the National Germplasm Resources Nursery
       4. The crop characterization and evaluation database system
       5. The database system for germplasm exchange at home and abroad
       6. The management system of the medium-term storage in Beijing
     URL: http://icgr.caas.net.cn/cgrisintroduction.html
  8. CGRIS: Data
     At present, CGRIS holds:
     • More than 2,000 MB of data on 180 kinds of crops
       – including food crops, fibre plants, oil crops, vegetables, fruit trees, tea, mulberry, tobacco, sugar, green manure crops, tropical crops, etc.
     • 390,000 accessions of germplasm
  9. CGRIS: Accessions (indicative list)
     http://icgr.caas.net.cn/cgrisintroduction.html
  10. Crop Germplasm Classification
  11. Info on wheat varieties
  12. Info on wheat varieties
  13. CGRIS: Germplasm Data Query
  14. CGRIS: Germplasm Data Query
  15. CGRIS Metadata
     • CGRIS germplasm descriptors are based on its own schema
       – It can be seen as the de facto standard for germplasm accession information in China.
       – It is based on metadata schema standards such as those developed by IPGRI (Bioversity) and GRIN.
  16. CGRIS: Basic Descriptors
  17. CGRIS: Wheat descriptors
  18. CGRIS Metadata: Next steps
     • A mapping to the Multi-crop Passport Descriptors (MCPD) standard is intended
       – According to CAAS subject experts, such a mapping should be rather easy to produce.
  19. CGRIS: Exposing data
     • Data stored in relational DBs
     • Hosted in an SQL server
     • Exposure of data as CSV files (partially in Chinese)
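Because the CSV exports are partially in Chinese, a consumer would typically need to read them with the correct encoding and translate the column headers before any mapping. The sketch below is one way to do that with the Python standard library. The header names, their translations, and the sample row are assumptions made for illustration; the real CGRIS export columns and file encoding are not given in the slides.

```python
import csv
import io

# Hypothetical header translations; the real CGRIS export columns may differ.
HEADER_TRANSLATIONS = {
    "统一编号": "accession_id",
    "品种名称": "variety_name",
    "原产地": "origin",
}

# Made-up stand-in for a CSV export with Chinese column headers.
SAMPLE_CSV = "统一编号,品种名称,原产地\nX0001,示例小麦,北京\n"


def read_export(text_stream, translations):
    """Read CSV rows, renaming known headers and keeping unknown ones as-is."""
    reader = csv.DictReader(text_stream)
    rows = []
    for row in reader:
        rows.append({translations.get(key, key): value for key, value in row.items()})
    return rows


rows = read_export(io.StringIO(SAMPLE_CSV), HEADER_TRANSLATIONS)
print(rows)
```

For a file on disk, the same function can be given `open(path, encoding="utf-8", newline="")`. If the export uses a legacy Chinese encoding instead, `encoding="gb18030"` would be the value to try; which one applies here is not known from the source.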
  20. CGRIS: IPR information
     • The CGRIS website is public and accessible to everybody. The information is provided free of charge but is subject to copyright.
     • With regard to data exchange, there is no explicit policy to follow.
     • CGRIS does not have an Open Access mandate, and the members of the CGRIS network apply their own institutional policies.
  21. Part 2: Italian Germplasm Database (CRA)
  22. Italian Germplasm Database
     • Provided by: Italian Council for Research and Experimentation in Agriculture
     • Developed in the context of the “Plant Genetic Resources/FAO” project in 2004
       – Research Centres and Units of the CRA
       – The Institute of Plant Genetics of the CNR in Bari
       – NGO “Rete Semi Rurali”
       – University collections (Perugia, Potenza, etc.)
     URL: http://fru.entecra.it
  23. CRA Germplasm: Data
     Current status of germplasm data (CRA)
     • 20,954 records from Italy are included in EURISCO, of which 17,212 are from CRA
     • 28,509 records for 275 plant species are in the National Inventory (in general)
       – The National Inventory does not allow for identifying the number of CRA germplasm records
  24. CRA: Accessions (indicative list)
     URL: http://fru.entecra.it/accessioni.php
  25. Info on specific species
  26. EURISCO descriptors
  27. CRA Metadata
     • Most CRA institutional databases use the MCPD
       – However, in the records provided to the National Inventory, several fields are often not filled.
     • Some CRA collections also use descriptors defined by
       – the International Union for the Protection of New Varieties of Plants (UPOV) and
       – the National Register of New Varieties.
     • Ensure mapping to the Multi-crop Passport Descriptors (MCPD)/EURISCO
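The remark that several MCPD fields are often not filled suggests measuring completeness per descriptor before mapping. The following Python sketch shows one way to compute fill rates. The descriptor names are MCPD field names, but the choice of which fields to check is an assumption, and the records are made up for illustration (they are not CRA data).

```python
from collections import Counter

# A few MCPD descriptor names used as the fields to check; which fields
# matter for a given collection is an assumption made for this example.
CHECKED_FIELDS = ["INSTCODE", "ACCENUMB", "GENUS", "SPECIES", "ORIGCTY", "ACQDATE"]


def fill_rates(records, fields):
    """Return the fraction of records with a non-empty value per field."""
    filled = Counter()
    for record in records:
        for field in fields:
            value = record.get(field)
            if value is not None and str(value).strip():
                filled[field] += 1
    total = len(records)
    return {field: (filled[field] / total if total else 0.0) for field in fields}


# Made-up records, for illustration only.
records = [
    {"INSTCODE": "XXX001", "ACCENUMB": "A1", "GENUS": "Malus", "SPECIES": "domestica"},
    {"INSTCODE": "XXX001", "ACCENUMB": "A2", "GENUS": "Prunus", "SPECIES": "", "ORIGCTY": "ITA"},
]
rates = fill_rates(records, CHECKED_FIELDS)
for field in CHECKED_FIELDS:
    print(f"{field}: {rates[field]:.0%}")
```

Treating missing keys, `None`, and whitespace-only strings alike as "not filled" is a design choice; a stricter check could also validate the value format for each descriptor.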
  28. CRA: IPR information
     • The CRA website is public and accessible to everybody. The information is provided free of charge but is subject to copyright.
     • The Multilateral System (MLS) of the Treaty demands free availability of the information on the PGRFA that are under the management and control of the Contracting Parties and in the public domain (Treaty, Art. 11.2).
     • This excludes
       – germplasm accessions that are subject to IPR and
       – other legally binding protection which restricts the Contracting Party’s control over the material.
     • Accessions that are not covered by IPR include old and autochthonous varieties, crop wild relatives and other material found in in-situ conditions, new cultivars not protected by IPR, and cultivars whose IPR has expired.
  29. Conclusions
  30. Current status
     • First version of mappings is available
     • EURISCO descriptors used as base schema
       – MCPD
       – Darwin Core for Genebanks
       – ABCD
       – CGRIS
       – CRA
  31. Mapping table
  32. Mapping table
  33. Development of decision trees
  34. Development of decision trees
  35. Linked Data
     • A linked data approach will be used by agINFRA for linking germplasm data sources
     • OpenAGRIS already aggregates germplasm data using AGROVOC
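To make the linked data idea concrete, a passport record can be written as RDF triples so that accessions from different sources share property URIs. The sketch below emits N-Triples using only the Python standard library. It is not the agINFRA framework: the accession base URI is a placeholder, the sample record is made up, and the choice of Darwin Core terms (`institutionCode`, `catalogNumber`, `genus`) for these MCPD descriptors is an assumption made for illustration.

```python
from urllib.parse import quote

# Darwin Core term namespace; the accession base URI is a placeholder.
DWC = "http://rs.tdwg.org/dwc/terms/"
BASE_URI = "http://example.org/accession/"

# Assumed descriptor-to-property mapping, for illustration only.
DESCRIPTOR_TO_PROPERTY = {
    "INSTCODE": DWC + "institutionCode",
    "ACCENUMB": DWC + "catalogNumber",
    "GENUS": DWC + "genus",
}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(record):
    """Serialize one passport record as a list of N-Triples lines."""
    inst = quote(record["INSTCODE"], safe="")
    number = quote(record["ACCENUMB"], safe="")
    subject = f"<{BASE_URI}{inst}/{number}>"
    lines = []
    for descriptor, prop in DESCRIPTOR_TO_PROPERTY.items():
        value = record.get(descriptor)
        if value:
            lines.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return lines


# Made-up record, for illustration only.
triples = to_ntriples({"INSTCODE": "XXX001", "ACCENUMB": "A1", "GENUS": "Triticum"})
print("\n".join(triples))
```

Building the subject URI from the holding institute code plus the accession number mirrors how passport data usually identifies an accession; a production setup would more likely use an RDF library and agreed, persistent URIs.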
  36. Conclusions
     • Both schemas / sets of descriptors can be mapped to the EURISCO ones
     • Linked Data approach will facilitate linking of germplasm data from CRA/CGRIS
     • EURISCO descriptors to be published as linked data
       – To be used as the base of passport data
     • Linking to other germplasm standards
       – e.g. Darwin Core for Genebanks*
     * https://code.google.com/p/darwincore-germplasm/wiki/DarwinCoreGermplasmMapping
  37. Take home message
     • The identification of common properties between different metadata schemas will facilitate the linked data framework
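One way to read the take home message: once each source has a crosswalk to the base schema, the common properties are simply the base descriptors that every source can populate. The Python sketch below computes that intersection. Both crosswalks are hypothetical; the source field names are invented, and only the right-hand MCPD-style descriptor names are real.

```python
# Hypothetical crosswalks: source field -> base-schema (MCPD-style) descriptor.
# Source field names are invented for illustration.
CROSSWALKS = {
    "CGRIS": {
        "unified_id": "ACCENUMB",
        "variety_name": "ACCENAME",
        "genus": "GENUS",
        "origin": "ORIGCTY",
    },
    "CRA": {
        "accenumb": "ACCENUMB",
        "genus": "GENUS",
        "species": "SPECIES",
        "origcty": "ORIGCTY",
    },
}


def common_descriptors(crosswalks):
    """Base-schema descriptors that every source can populate."""
    covered = [set(mapping.values()) for mapping in crosswalks.values()]
    return set.intersection(*covered) if covered else set()


def source_specific(crosswalks):
    """Per source, the mapped descriptors that are not shared by all sources."""
    shared = common_descriptors(crosswalks)
    return {name: set(mapping.values()) - shared for name, mapping in crosswalks.items()}


shared = common_descriptors(CROSSWALKS)
specific = source_specific(CROSSWALKS)
print(sorted(shared))
print({name: sorted(extra) for name, extra in specific.items()})
```

The shared set is the natural starting point for properties published as linked data, while the source-specific sets show where one collection carries information the other lacks.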
  38. (Indicative) List of References
     • agINFRA Deliverable D2.3 “Review of Content Requirements”
     • agINFRA Deliverable D5.3 “Conceptual specification of linked agricultural data framework”
     • agINFRA Germplasm Working Group Wiki: http://wiki.aginfra.eu/index.php/Germplasm_Working_Group
     • EURISCO passport descriptors: http://www.ecpgr.cgiar.org/germplasm_databases.html
     • Draft Mapping of EURISCO Descriptors to ABCD 2.06: http://www.bgbm.org/TDWG/CODATA/Schema/Mappings/EURISCO-2-ABCD.pdf
  39. Source: http://verastic.com/social/why-do-people-not-say-thank-you.html
     Contact me: vprot@agroknow.gr