Presentation of some of the major germplasm data sources, including aggregators, networks and individual data providers. Information based on the agINFRA Dossier on Germplasm Data sources (available at http://wiki.aginfra.eu/index.php/Germplasm_Working_Group)
Presented during Session 3 of the 1st International e-Conference on Germplasm Data Interoperability (https://sites.google.com/site/germplasminteroperability/)
Major germplasm data sources and referatories
1. Major germplasm data sources
and referatories
Dr. Vassilis Protonotarios
Agro-Know Technologies, Greece
Dr. Guntram Geser
Salzburg Research, Austria
e-Conference on Germplasm Data Interoperability
Session 3: “Setting up an infra for the Germplasm Data”
2. Structure of the presentation
1. Introduction
2. Germplasm data aggregators
3. Germplasm data sources
4. Conclusions
3. Aim of presentation
• To provide an overview of the major
germplasm data sources
– Not possible to cover all of them
– Information mostly based on “agINFRA Dossier on
Germplasm Information” (2012)
5. About Genesys
• Developed by Bioversity International on
behalf of
– the System-wide Genetic Resources Programme of
the CGIAR,
– the Global Crop Diversity Trust and
– the Secretariat of the International Treaty on Plant
Genetic Resources for Food and Agriculture.
• Genesys was launched on 13 May 2011
URL: http://www.genesys-pgr.org/
6. Data aggregation
1. SINGER (Systems-Wide Information Network
for Genetic Resources)
2. EURISCO (European National Germplasm
Inventories)
3. GRIN (Germplasm Resources Information
Network) of USDA
7. Genesys data
• Provides access to almost 2.5M germplasm
accessions in 356 institutions of 238 countries.
– This covers about one third of the genebank
accessions estimated to be held worldwide.
• Contains over 11 million Characterization and
Evaluation (C&E) records
• Environmental data records for over
600,000 geo-referenced sites where
accessions were collected.
8. Metadata model used
• Schema based on MCPD
– Adopted by SINGER & EURISCO
• Expanded to include Characterization &
Evaluation data (C&E)
– As used by GRIN and CGIAR genebanks
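As a rough illustration of the model above, a record can be thought of as a set of MCPD passport descriptors plus an open-ended list of C&E observations. The sketch below is an example under stated assumptions, not the actual Genesys schema: INSTCODE, ACCENUMB, GENUS, SPECIES and ORIGCTY are MCPD descriptor names, while the class names, the `CEObservation` structure and all sample values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CEObservation:
    """One characterization/evaluation value (hypothetical structure)."""
    descriptor: str               # e.g. a crop-specific trait name
    value: str
    unit: Optional[str] = None


@dataclass
class Accession:
    """Passport data using a few MCPD descriptor names, plus C&E records."""
    INSTCODE: str                 # FAO WIEWS code of the holding institute
    ACCENUMB: str                 # accession number within that genebank
    GENUS: str
    SPECIES: str
    ORIGCTY: Optional[str] = None  # ISO 3166-1 alpha-3 country of origin
    observations: List[CEObservation] = field(default_factory=list)

    def key(self) -> tuple:
        # Assumption: institute code + accession number identify a record.
        return (self.INSTCODE, self.ACCENUMB)


# Made-up sample values, for illustration only.
acc = Accession("XXX001", "ACC-0001", "Triticum", "aestivum", "GRC")
acc.observations.append(CEObservation("plant height", "95", "cm"))
```

Keeping the C&E values in a separate list reflects the split described above: passport fields are the same for every crop, whereas trait descriptors vary by crop.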
10. About EURISCO
• EURISCO: a catalogue based on a European network
of ex situ National Inventories (NIs) that makes
European plant genetic resources data available
worldwide.
• Maintained by Bioversity International on behalf
of the Secretariat of the European Cooperative
Programme for Plant Genetic Resources (ECPGR)
in collaboration with the National Focal Points for
the National Inventories.
URL: http://eurisco.ecpgr.org
11. Data aggregation
1. National Focal Points
– the unique link between EURISCO and the
European National Inventories (NIs) and national
documentation systems.
2. National Inventories
– A number of European countries have
established national PGR inventories that are
available on the web.
12. EURISCO germplasm data
• The EURISCO Catalogue contains passport
data on more than 1.1M samples of crop
diversity
– representing >5,600 genera and >36,000 species
(genus-species combinations including synonyms
and spelling variants)
– from 43 countries
*as of May 2012
13. Metadata model used
• Schema based on MCPD
– Adopted by SINGER & EURISCO
• Expanded to include Characterization &
Evaluation data (C&E)
– As used by GRIN and CGIAR genebanks
15. About GBIF
• An international open data infrastructure,
funded by governments.
• Operates through a network of nodes,
coordinating the biodiversity information
facilities of participant countries and
organizations, collaborating with each other
and the Secretariat
URL: http://www.gbif.org
16. Data aggregation
• Data aggregated from almost 750 data
sources, including
– Laboratories
– Research centers
– Corporations
– Museums
– NGOs
– Universities
20. About GRIN
• Developed by the U.S. Department of
Agriculture / Agricultural Research Service
• Aims to acquire, characterize, preserve,
document, and distribute to scientists,
germplasm of all lifeforms important for food
and agricultural production.
URL: http://www.ars-grin.gov/
21. Data aggregation
• Data aggregated from more than thirty
USDA / ARS germplasm data sources,
including
– Arctic and Subarctic Plant Gene Bank Research
centers
– Desert Legume Program
– Forest Service National Seed Lab
– Maize Genetic Stock Center
– National Arid Land Plant Genetic Resources Unit
22. GRIN data
• Data available through GRIN include
– Passport,
– Characterization & Evaluation,
– Inventory and
– Distribution data
• > 500,000 accessions (distinct varieties of
plants) in the GRIN database.
– Representing >10,000 species of plants
23. Metadata model used
• Passport information
– Crop independent
– Own schema*
• Crop descriptors**
– Crop specific
*http://www.ars-grin.gov/npgs/pcgrin/manual/genlinfo.htm
** http://www.ars-grin.gov/npgs/pcgrin/manual/concepts.htm
25. ECPGR Germplasm Databases
• European Cooperative Programme for Plant
Genetic Resources (ECPGR)
– founded in 1980 on the basis of the
recommendations of
• the United Nations Development Programme (UNDP),
• the Food and Agriculture Organization of the United
Nations (FAO) and
• the Genebank Committee of the European Association
for Research on Plant Breeding (EUCARPIA).
URL: http://www.ecpgr.cgiar.org/germplasm_databases.html
26. ECPGR Data sources
• 64 ECPGR Central Crop Databases have been
established by individual institutes and the
ECPGR Working Groups.
• The databases hold passport data and, to
varying degrees, characterization and primary
evaluation data of the major collections of the
respective crops
27. ECPGR germplasm data
• ECPGR offers Web access to specific crop and multicrop databases:
1. ECPGR Central Crop Databases and other Crop Databases
ECPGR and other Central Crop Databases have been
established through the initiative of individual institutes
and of ECPGR Working Groups. The databases hold
passport data and, to varying degrees, characterization
and primary evaluation data of the major collections of
the respective crops in Europe.
2. Germplasm Collecting Missions Database
3. International Multi-crop Databases
4. National Multi-crop Databases
29. Crop Genebank Knowledge Base
• An initiative of the System-wide Genetic
Resources Programme (SGRP) of the
Consultative Group on International
Agricultural Research (CGIAR).
• Developed as part of the World Bank funded
project “Collective Action for the Rehabilitation of Global
Public Goods in the CGIAR Genetic Resources System, Phase 2
(GPG 2)”.
URL: http://cropgenebank.sgrp.cgiar.org
30. CGKB data
• User-friendly online access to procedures, standards
and practices for managing clonally propagated and
seed crops held in genebanks.
• Best practices in the framework of a learning platform.
• Links to other related information and training
resources.
• A mechanism to update the existing best practices for
crop management in genebanks and to develop best
practices for additional crops.
• Capacity building for genebank curators and technicians.
32. A European Genebank Integrated System (AEGIS)
• Developed by the European Cooperative
Programme for Plant Genetic Resources
(ECPGR)
• Supports the coordination of plant genetic
resources for food and agriculture (PGRFA)
URL: http://aegis.cgiar.org
33. AEGIS data
• The European Collection
– operated as a virtual European genebank,
– composed of European Accessions conserved for
the long-term by the AEGIS Associate Members on
behalf of the ECPGR Member countries and being
available for use or conservation only for the
purposes of research, breeding and training for
food and agriculture.
35. NordGen
• Nordic organization supporting the
coordination of Nordic plant genetic resources
for food and agriculture (PGRFA)
• Joint initiative of all Nordic countries
– Denmark,
– Finland,
– Iceland,
– Norway,
– Sweden
URL: http://www.nordgen.org
36. NordGen SESTO
• SESTO Genebank Documentation System
– A genebank management tool developed by the
Nordic Gene Bank (today Nordic Genetic Resource
Center, NordGen).
– Developed into a more generic PGR information
system
– Adopted for management and presentation of
data from other genebanks in other parts of the
world.
37. NordGen Data
• Information available for:
– Trait Datasets
– Trait Descriptors
– Germplasm accessions
– Observations
39. Conclusions (1/2)
• Germplasm data available from several
sources
– Global aggregators
– National Inventories/aggregators
– Individual data sources
• Different metadata used in each case
– Highlights the need for harmonization of
standards
– Standard vocabularies will allow linked data
approach
40. Conclusions (2/2)
• Aggregation of metadata facilitates
harmonization
– Application of a common standard for several data
sources
• Exposure of metadata as linked data per
aggregator
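Applying a common standard across several data sources amounts, in practice, to a field mapping per provider. A minimal sketch, assuming two hypothetical providers whose local field names (`inst`, `acc_no`, `holder`, `number` and so on) are invented for illustration; only the target names are MCPD descriptors, and the sample records are made up:

```python
# Hypothetical per-provider mappings: local field name -> MCPD descriptor.
FIELD_MAPS = {
    "provider_a": {"inst": "INSTCODE", "acc_no": "ACCENUMB",
                   "genus": "GENUS", "species": "SPECIES"},
    "provider_b": {"holder": "INSTCODE", "number": "ACCENUMB",
                   "taxon_genus": "GENUS", "taxon_species": "SPECIES"},
}


def harmonize(provider: str, record: dict) -> dict:
    """Rename a provider record's fields to the common descriptor names.

    Fields without a mapping are dropped here; a real aggregator would
    more likely log or preserve them.
    """
    mapping = FIELD_MAPS[provider]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def aggregate(sources: dict) -> dict:
    """Merge harmonized records, keyed by (INSTCODE, ACCENUMB)."""
    merged = {}
    for provider, records in sources.items():
        for rec in records:
            h = harmonize(provider, rec)
            merged[(h["INSTCODE"], h["ACCENUMB"])] = h
    return merged


# Made-up records, for illustration only.
sources = {
    "provider_a": [{"inst": "AAA001", "acc_no": "1", "genus": "Zea",
                    "species": "mays", "note": "local only"}],
    "provider_b": [{"holder": "BBB002", "number": "7",
                    "taxon_genus": "Oryza", "taxon_species": "sativa"}],
}
catalogue = aggregate(sources)
```

Once every provider is expressed in the same descriptor names, the aggregator holds one uniform catalogue regardless of how each source stores its data locally.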
41. Next steps
• Complete pending mapping between existing
standards
– Work of bioinformatics experts
• Define common vocabularies
– Work of germplasm experts
• Deploy a linked germplasm data framework
– agINFRA could help with that
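To make the linked-data step more concrete, the sketch below serializes one accession record as N-Triples, one statement per descriptor. It is only an illustration: the base URI and vocabulary URI are placeholders on example.org, not published agINFRA or MCPD namespaces, and the record values are made up.

```python
BASE = "http://example.org/germplasm/"     # hypothetical resource namespace
VOCAB = "http://example.org/vocab/mcpd#"   # hypothetical vocabulary URI


def to_ntriples(record: dict) -> list:
    """Serialize one record with common descriptor names as N-Triples lines."""
    subject = f"<{BASE}{record['INSTCODE']}/{record['ACCENUMB']}>"
    lines = []
    for name, value in sorted(record.items()):
        # Escape backslashes and double quotes inside the literal.
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{VOCAB}{name}> "{literal}" .')
    return lines


# Made-up record, for illustration only.
record = {"INSTCODE": "AAA001", "ACCENUMB": "1",
          "GENUS": "Zea", "SPECIES": "mays"}
triples = to_ntriples(record)
for line in triples:
    print(line)
```

With agreed vocabularies, the predicate URIs would come from a shared, published namespace, which is what lets records from different aggregators be linked to one another.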
42. References
• Geser, Guntram (2012) “agINFRA Dossier on
Germplasm information”. Available online at:
http://wiki.aginfra.eu/index.php/Germplasm_Working_Group
• Websites of data sources mentioned in the
presentation