BioCASE web services for germplasm data sets, at FAO, Rome (2006)



Sharing of biodiversity data with web services - demonstration of the BioCASE software. Food and Agriculture Organization of the United Nations (FAO) 2nd March 2006.

  • * Text formulation source []; the wording above is modified. * Photo (top): Beetle collection in Benin, West Africa (24 March 2004). Photographer: Dag Endresen.
  • * Photo (top): Seed storage in Benin, West Africa (24 March 2004). Photographer: Dag Endresen. * Photo (below): VIR seed collection, St. Petersburg. Photographer: Eva Thörn (NGB Picture Archive, image 001319).
  • Photo: Field bean from Boreal, accession NGB11518, 2005-03-05. Photographer: Dag Endresen.
  • * IPGRI descriptor lists (119 descriptor lists, 2005). * Multi-crop Passport Descriptors (MCPD), FAO (Food and Agriculture Organization of the United Nations) and IPGRI (International Plant Genetic Resources Institute). This is a revised version (December 2001) of the 1997 MCPD list. * UPOV: the International Union for the Protection of New Varieties of Plants (French: Union internationale pour la protection des obtentions végétales) is an intergovernmental organization with headquarters in Geneva, Switzerland. * COMECON: the Council for Mutual Economic Assistance (COMECON / Comecon / CMEA / CEMA), 1949–1991, was an economic organisation of communist states and a kind of Eastern European equivalent to the European Economic Community. Its military counterpart was the Warsaw Pact. * FAO World Information and Early Warning System (WIEWS). * 19 plant use categories based on categories developed for the Working Group on Taxonomic Databases (TDWG) (Cook, Frances E.M., 1995. Economic Botany: Data Collection Standard. Royal Botanic Gardens, Kew). * The mapping of MCPD to ABCD was started in 2004 by Helmut Knüpffer and Walter Berendsohn, and continued by Javier de la Torre and Dag Terje Filip Endresen in 2005.
  • * Illustration: Corn earworm pupae that will be used to produce control parasites for release in the field. Photo by Scott Bauer. * UBIF is an attempt to define a common foundation for several TDWG/GBIF standards such as SDD (see the SDD wiki), ABCD (see the ABCD content schema homepage) and TaxonConceptNames (see the Taxonomic Concept Transfer Schema wiki). * The Unified Biosciences Information Framework (UBIF) is an XML schema for data exchange and integration across knowledge domains. The schema has been designed for biological data, but is applicable to other knowledge areas as well. It is based on work of the TDWG SDD and ABCD subgroups and is currently jointly authored by the SDD, ABCD and TaxonName subgroups and by GBIF (Global Biodiversity Information Facility). The framework may be used without changes for new schemas; no registration is necessary. * Complex types are part of the UBIF infrastructure (common TDWG complex types shared by several schemas: ABCD, SDD, TCS, Linnean Core, etc.).
  • * The mapping of MCPD to ABCD was started in 2004 by Helmut Knüpffer and Walter Berendsohn, and continued by Javier de la Torre and Dag Terje Filip Endresen in 2005.
  • GCP_Passport v1.03
  • * Demo data portal. The work on the demo portal has been replaced by routines to harvest and index remote data, because live remote access proved too slow and unreliable. See the Germplasm Clearing House Mechanism for more information [].
  • * Illustration: Tapir, © 1999-2005 (licence: “Feel free to use Barry's Clipart Server content in personal/non-profit projects to create webpages…”). * Quality counts: chemist Gary List checks soybeans. Photo by Keith Weller.
  • Photo: PICT0173.jpg, sub-section from a whale safari to Kaikoura, New Zealand. Photographer: Dag Terje Filip Endresen.
  • BioCASE development is coordinated by the Botanischer Garten und Botanisches Museum Berlin-Dahlem – BGBM.
  • <?xml version='1.0' encoding='UTF-8'?>
    <request>
      <header />
      <inventory count='true' start='0' limit='40' xmlns:singer=''>
        <concepts>
          <concept path='singer:/sourcename' />
          <concept path='singer:/taxonomy/genus' />
          <concept path='singer:/taxonomy/species' />
          <concept path='singer:/taxonomy/subspecies' />
          <concept path='singer:/holding/ID' />
          <concept path='singer:/holding/name' />
          <concept path='singer:/origin/collecting/countrysource' />
          <concept path='singer:/origin/collecting/countrysourceID' />
          <concept path='singer:/status/biologicalstatus' />
          <concept path='singer:/status/biologicalstatusID' />
        </concepts>
        <filter>
          <like>
            <concept path='singer:/taxonomy/genus' />
            <literal value='cice*' />
          </like>
        </filter>
      </inventory>
    </request>
  • Slide by Samy Gaiji, from the presentation “Information Networking: Challenges for the Plant Genetic Resources Communities”, 2004.
  • Slide by Samy Gaiji, from the presentation “Information Networking: Challenges for the Plant Genetic Resources Communities”, 2004.
  • Photo (top): IRRI genebank, Los Baños, Philippines. Photo (below): CIP genebank, Lima, Peru.

    1. Sharing of biodiversity data with web services: demonstration of BioCASE
    2. Topics
       • Biodiversity data
       • Data standards
       • Data exchange tools
       • The BioCASE data provider software
       • Decentralized data network
    3. Biodiversity collections data
       • Different kinds of biodiversity collections describe very similar data objects:
       • Preserved reference collections, such as those in museums and herbaria.
       • Living collections, such as botanical and zoological gardens, aquaria, seed banks, microbial strain cultures and tissue collections.
       • Data collections from surveys of objects in the field, such as observations.
       • These collections have most of their attributes in common, although the terminology used to describe them may differ substantially.
    4. Germplasm data, seed genebanks
       • Germplasm genebanks are biodiversity collections.
       • Collection-level data: metadata about genebank institutes and the germplasm collections they hold.
       • Unit-level data: for germplasm collections the units are the accessions. Genebank accessions have most of the same properties and attributes as other biodiversity specimens.
    5. Data standards
    6. Crop descriptors
       • The IPGRI crop descriptors (like the descriptors of other networks) are developed to meet the specific needs of each crop.
       • The MCPD is designed to be compatible with the IPGRI crop-specific descriptor lists and the FAO World Information and Early Warning System (WIEWS).
       • The MCPD descriptor list is compatible with ABCD (2.06).
    7. Taxonomic Databases Working Group (TDWG): standards development and maintenance
       • Darwin Core 2: “Element definitions designed to support the sharing and integration of primary biodiversity data”.
       • Access to Biological Collection Data (ABCD) 2.06: “An evolving comprehensive standard for the access to and exchange of data about specimens and observations (a.k.a. primary biodiversity data)”.
    8. ABCD: Access to Biological Collection Data
       • ABCD is a common data specification for data on biological specimens and observations (including plant genetic resources seed banks).
       • The design goal is to be both comprehensive and general (about 1200 elements).
       • Development of ABCD started after the 2000 TDWG meeting.
       • ABCD was developed with support from TDWG/CODATA, ENHSIN, BioCASE and GBIF.
       • The MCPD descriptor list is now completely mapped to, and compatible with, ABCD 2.06.
    9. PGR sub-unit of ABCD
       • PGR
    10. Generation Challenge Program: GCP_Passport_1.03
        • In the context of the GCP (Generation Challenge Program), the GCP Passport data exchange schema was developed.
        • Similar XML schemas are under development for phenotype (trait data) and genotype data.
    11. Demo data portal
        • A demo data portal was developed, providing live access to selected BioCASE data providers.
    12. Create your own BioCASE data schema
        • Create an XML schema (.xsd file) of your data model and put the schema online (http://...).
        • Create a Concept Mapping Configuration (CMF) file from the XML schema [] (or use your own BioCASE installation: ... /utilities/process_schema.html).
        • Save the resulting XML (CMF file) into the cmf folder of your BioCASE installation to make it available for local mapping:
          .../biocase/configuration/templates/cmf/cmf_your-preferred-file-name.xml
        • Visit [] for more information.
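The first step of turning an XML schema into a concept mapping file is to list the leaf elements that become mappable concepts. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the code behind process_schema.html: the example schema and the function names are hypothetical, the path format imitates the `singer:/taxonomy/genus` paths in the speaker notes (without the prefix), and only inline element declarations are handled (`ref=` and named types are ignored).

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, tiny schema loosely modelled on the concept paths in the
# speaker notes; it is not a published schema.
EXAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="accession">
    <xs:complexType><xs:sequence>
      <xs:element name="sourcename" type="xs:string"/>
      <xs:element name="taxonomy">
        <xs:complexType><xs:sequence>
          <xs:element name="genus" type="xs:string"/>
          <xs:element name="species" type="xs:string"/>
        </xs:sequence></xs:complexType>
      </xs:element>
    </xs:sequence></xs:complexType>
  </xs:element>
</xs:schema>"""


def child_elements(node):
    """Yield the nearest xs:element declarations below node.

    Descends through xs:complexType, xs:sequence, xs:choice, ... but stops at
    each xs:element; that element's own children are handled by walk().
    """
    for child in node:
        if child.tag == XS + "element":
            yield child
        else:
            yield from child_elements(child)


def concept_paths(xsd_text):
    """Return leaf concept paths relative to the root element, e.g. '/taxonomy/genus'."""
    schema = ET.fromstring(xsd_text)
    paths = []

    def walk(element, prefix):
        children = list(child_elements(element))
        if not children:
            paths.append(prefix)  # leaf element: a concept that can be mapped
        for child in children:
            walk(child, prefix + "/" + child.get("name"))

    for root in schema.findall(XS + "element"):
        for child in child_elements(root):
            walk(child, "/" + child.get("name"))
    return paths


print(concept_paths(EXAMPLE_XSD))
```

For the example schema this lists `/sourcename`, `/taxonomy/genus` and `/taxonomy/species`, the kind of paths a CMF file then associates with database columns.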
    13. Biodiversity informatics data exchange tools
    14. Data provider software
        • A distributed network of data providers makes structured data from multiple, distributed, heterogeneous databases retrievable across the Internet.
        • DiGIR: Distributed Generic Information Retrieval.
        • BioCASE: the Biological Collection Access Service for Europe.
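In such a network a portal sends the same query to many independent providers and merges whatever comes back. The Python sketch below shows that fan-out pattern with the standard library's thread pool. The provider functions, record fields and IDs are hypothetical in-process stand-ins for remote web services; a real portal would send an XML request over HTTP to each provider. A failing provider is reported rather than aborting the whole query, which matters because, as the speaker notes for the demo portal observe, live remote access can be slow and unreliable.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote data providers (made-up records).
def provider_a(genus_prefix):
    records = [{"id": "A-001", "genus": "Cicer"}, {"id": "A-002", "genus": "Pisum"}]
    return [r for r in records if r["genus"].lower().startswith(genus_prefix.lower())]


def provider_b(genus_prefix):
    records = [{"id": "B-017", "genus": "Cicer"}]
    return [r for r in records if r["genus"].lower().startswith(genus_prefix.lower())]


def provider_offline(genus_prefix):
    raise ConnectionError("provider did not respond")


def query_network(providers, genus_prefix, timeout=30.0):
    """Send the same query to every provider in parallel and merge the answers.

    Providers that raise an error or exceed the timeout are listed in
    `failures` instead of aborting the query. (Leaving the `with` block still
    waits for any slow worker thread to finish.)
    """
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = {name: pool.submit(fn, genus_prefix) for name, fn in providers.items()}
        for name, future in futures.items():
            try:
                results.extend(future.result(timeout=timeout))
            except Exception as exc:  # timeout, network error, bad response, ...
                failures.append((name, str(exc)))
    return results, failures


hits, failed = query_network({"A": provider_a, "B": provider_b, "C": provider_offline}, "cic")
print(hits, failed)
```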
    15. Protocol integration: TAPIR
        • The protocols currently used by the different biodiversity informatics community networks need to be integrated.
        • At the 2004 TDWG meeting, the unified protocol was presented and named TAPIR: the TDWG Access Protocol for Information Retrieval.
        • New BioCASE and DiGIR software will implement the TAPIR protocol.
        • Will TAPIR also help us to integrate GBIF with the BioMOBY community?
    16. BioMOBY
        • BioMOBY is an international research project on methodologies for biological data representation, distribution and discovery.
        • BioMOBY was chosen as the web service framework for the Generation Challenge Program.
        • Work is in progress to develop interoperability between BioMOBY and BioCASE.
    17. BioCASE data provider software
        BioCASE: Biological Collection Access Service for Europe
    18. BioCASE: Biological Collection Access Service for Europe
        • BioCASE establishes web-based unified access to biological collections in Europe while leaving control of the information with the collection holders.
        • ABCD is the main data definition used by BioCASE.
        • BioCASE is designed to be generic: it can handle any schema and connect to any SQL-capable database.
        • BioCASE gives GBIF full access to its registry, so being a BioCASE provider also means being a GBIF provider.
    19. BioCASE
        • BioCASE runs on MS Windows, Mac OS X, Linux, BSD, Solaris...
        • BioCASE works with many different databases: PostgreSQL, MySQL, Oracle, MS Access, MS SQL Server...
        • BioCASE works with Unicode:
          ضاإطقكغب שּׁשׁﭻﭗﭼﱠ אָבּדּוּ
        • BioCASE is open source.
        • BioCASE is developed in the Python programming language.
        CVS
    20. Distributed BioCASE network
    21. BioCASE protocol stack
    22. BioCASE Provider Software v2.3.1
        • Required configuration:
        • Web server: any CGI-compliant web server (Apache, IIS, etc.).
        • Database: major databases are supported, including MySQL, Oracle, SQL Server, Sybase, Access and PostgreSQL. In principle, any database with a Python library should work.
        • Python: BioCASE is developed in the Python programming language; install version 2.3 or later.
    23. BioCASE installation
        • Download the provider software and unpack the archive file [provider_software_2.3.1.tar.gz].
        • For example, unpack it into [C:\biocase].
        • Configure your web server to publish the www folder, for example so that [C:\biocase] is accessible through [http://localhost/biocase/].
        • Download and install the latest Python software [].
        • Execute the [] script.
        • On a UNIX-like system: %> cd biocase
          %> python
        • Test your installation [http://localhost/biocase].
    24. BioCASE: install third-party software [http://localhost/biocase/utilities/testlibs.cgi]
        Follow the links from the library test page. After a successful installation, the "installed version" column displays the installed version.
        • To update the BioCASE software:
        • Download the new release.
        • Unpack it to a temporary folder.
        • Execute the [] and follow the instructions.
    25. BioCASE configuration
        • After a successful installation you need to configure your data provider. Follow the instructions in the BioCASE documentation to configure:
        • Data sources: if you provide several datasets or databases, each is configured as an individual data source.
        • Database connection: so that the software can access your database.
        • Database structure: define the relevant tables, primary keys and foreign keys.
        • Data model: map your database model to the standard represented by the XML schemas you choose.
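The database structure and data model steps together amount to a mapping from standard concepts to tables and columns, joined through primary and foreign keys. A minimal Python sketch of that idea, using an in-memory SQLite database, is shown below. The table names, column names and the `MAPPING` dictionary are hypothetical; BioCASE keeps such mappings in its own configuration, which is not reproduced here. The concept paths are taken from the request in the speaker notes.

```python
import sqlite3

# Hypothetical mapping: standard concept path -> (table, column) in a local database.
MAPPING = {
    "singer:/holding/ID": ("accession", "acc_number"),
    "singer:/taxonomy/genus": ("taxon", "genus"),
    "singer:/taxonomy/species": ("taxon", "species"),
}


def build_demo_db():
    """In-memory database with a primary key / foreign key relation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taxon (id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
        CREATE TABLE accession (
            id INTEGER PRIMARY KEY,
            acc_number TEXT,
            taxon_id INTEGER REFERENCES taxon(id)
        );
        INSERT INTO taxon VALUES (1, 'Cicer', 'arietinum'), (2, 'Pisum', 'sativum');
        INSERT INTO accession VALUES (1, 'ACC-1', 1), (2, 'ACC-2', 2);
    """)
    return conn


def fetch_records(conn, mapping):
    """Return one dict per accession, keyed by concept path instead of column name."""
    # Table and column names come from the trusted mapping, never from user input.
    columns = ", ".join(f"{table}.{column}" for table, column in mapping.values())
    sql = (f"SELECT {columns} FROM accession "
           "JOIN taxon ON accession.taxon_id = taxon.id ORDER BY accession.id")
    concepts = list(mapping)
    return [dict(zip(concepts, row)) for row in conn.execute(sql)]


for record in fetch_records(build_demo_db(), MAPPING):
    print(record)
```

Keeping the mapping as data, separate from the query code, is what lets the same provider software serve different local database layouts.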
    26. Example of a service request
        • All exchanged data is formatted as XML.
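The speaker notes of this deck contain the XML of an inventory request for genus names matching `cice*`. A minimal Python sketch that assembles a request of the same shape with the standard library's ElementTree follows. It mirrors only what the note shows: the namespace URI was stripped from the note, so `http://example.org/singer` is a hypothetical placeholder, and the function name and parameters are illustrative rather than part of any BioCASE or TAPIR client library.

```python
import xml.etree.ElementTree as ET

# The ten concepts requested in the speaker-note example.
CONCEPTS = [
    "singer:/sourcename",
    "singer:/taxonomy/genus",
    "singer:/taxonomy/species",
    "singer:/taxonomy/subspecies",
    "singer:/holding/ID",
    "singer:/holding/name",
    "singer:/origin/collecting/countrysource",
    "singer:/origin/collecting/countrysourceID",
    "singer:/status/biologicalstatus",
    "singer:/status/biologicalstatusID",
]


def build_inventory_request(concepts, genus_pattern, start=0, limit=40,
                            singer_ns="http://example.org/singer"):
    """Return an inventory request shaped like the one in the notes, as a string.

    singer_ns is a hypothetical placeholder: the real namespace URI is not given.
    """
    request = ET.Element("request")
    ET.SubElement(request, "header")
    inventory = ET.SubElement(request, "inventory", {
        "count": "true",        # copied from the note: ask for a total count
        "start": str(start),    # offset of the first record returned
        "limit": str(limit),    # maximum number of records per response
        "xmlns:singer": singer_ns,
    })
    concepts_el = ET.SubElement(inventory, "concepts")
    for path in concepts:
        ET.SubElement(concepts_el, "concept", path=path)
    # Filter: genus LIKE genus_pattern ('cice*' in the note matches names starting with 'cice').
    like = ET.SubElement(ET.SubElement(inventory, "filter"), "like")
    ET.SubElement(like, "concept", path="singer:/taxonomy/genus")
    ET.SubElement(like, "literal", value=genus_pattern)
    return ET.tostring(request, encoding="unicode")


print(build_inventory_request(CONCEPTS, "cice*"))
```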
    27. Example of a service response
    28. TAPIR
        • TAPIR will offer you more advanced request formats.
    29. TAPIR service request
        • TAPIR will offer you more advanced request formats.
    30. TAPIR service response
        Concepts: singer:/sourcename, singer:/taxonomy/genus, singer:/taxonomy/species, singer:/taxonomy/subspecies, singer:/holding/ID, singer:/holding/name, singer:/origin/collecting/countrysource, singer:/origin/collecting/countrysourceID, singer:/status/biologicalstatus, singer:/status/biologicalstatusID, ...
    31. Decentralized data network with web services
    32. Data warehouse model (slide by Samy Gaiji, IPGRI)
    33. Decentralized model (slide by Samy Gaiji, IPGRI)
    34. Data flow from genebanks to EURISCO and ECCDBs
    35. Decentralized model
    36. Genebanks on BioCASE
        • The BioCASE data provider software was installed at (almost) all the CGIAR germplasm centers during the autumn of 2005.
        • Several other genebanks have installed the GBIF web service technology: Nordic Gene Bank, IPK Gatersleben, IHAR (DiGIR), USDA GRIN, CGN, with more to follow soon.
    37. Germplasm data indexing tools
        • We are building data indexing methods for access to germplasm data with BioCASE.
        • The plan is to use them to build a Germplasm Clearing House Mechanism.
        • Development is done in cooperation with GBIF, which itself indexes basic biodiversity data using a similar approach.
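Indexing replaces live querying with harvesting: the indexer pages through each provider's records and builds a local index that can be searched quickly. The Python sketch below shows the idea for a single provider. The records and the `fetch_page` stand-in are hypothetical; the `start`/`limit` paging echoes the attributes of the request in the speaker notes, and stopping at the first short page is an assumption (a real harvester could instead use the total count reported by the provider).

```python
from collections import defaultdict

# Hypothetical accession records held by one provider.
PROVIDER_RECORDS = [
    {"id": "X1", "genus": "Cicer"},
    {"id": "X2", "genus": "Pisum"},
    {"id": "X3", "genus": "Cicer"},
    {"id": "X4", "genus": "Lens"},
    {"id": "X5", "genus": "Cicer"},
]


def fetch_page(start, limit):
    """Stand-in for one remote request using start/limit paging."""
    return PROVIDER_RECORDS[start:start + limit]


def harvest(fetch, limit=2):
    """Request successive pages until a short (or empty) page signals the end."""
    start, harvested = 0, []
    while True:
        page = fetch(start, limit)
        harvested.extend(page)
        if len(page) < limit:
            return harvested
        start += limit


def index_by_genus(records):
    """Build a simple genus -> accession IDs index over the harvested copy."""
    index = defaultdict(list)
    for record in records:
        index[record["genus"]].append(record["id"])
    return dict(index)


print(index_by_genus(harvest(fetch_page)))
```

Searches then run against the local index, so a slow or offline provider only delays the next harvest instead of every user query.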
    38. BioCASE and germplasm data
    39. Works in progress
        • Globally Unique Identifiers, GUID (LSID, Life Science Identifiers)
        • Biodiversity informatics workflow tools (BioMOBY and Taverna, Kepler and SEEK...)
        • Germplasm Clearing House Mechanism
        • TAPIR
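An LSID is a URN of the form `urn:lsid:authority:namespace:object`, optionally followed by `:revision`. The Python sketch below only builds and parses that identifier string; it does not resolve LSIDs. The authority `genebank.example.org` and the use of an accession number as the object ID are hypothetical examples, not an identifier scheme any genebank is known to use.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str                  # e.g. the DNS name of the issuing institution
    namespace: str                  # a namespace chosen by the authority
    object_id: str                  # identifier of the object within the namespace
    revision: Optional[str] = None  # optional version component

    def __str__(self):
        parts = ["urn", "lsid", self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_lsid(text):
    """Split an LSID string into its components; raise ValueError if malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])


example = LSID("genebank.example.org", "accession", "NGB11518")
print(example, parse_lsid(str(example)) == example)
```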
    40. Thank you for listening!