Data exchange alternatives, SBIS conference in Stockholm (2008)


Biodiversity Data Publishing Software for the Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008). Stockholm, 3rd December 2008. Dag Endresen (Bioversity/NordGen).

Published in: Technology
  • Image source: University of Ottawa, Distributed Computing Research Group: (Google Images). See also:
  • IMAGE: [Creative Commons License:]
  • For more details, see: GBIF NODES meeting 2007 in Amsterdam. Agenda 09 Technical Training session - TAPIR/PyWrapper3:
  • Apache HTTPD server: database server: programming language:
  • In some cases with proxies: svn co svn:// pywrapper
  • IMAGE source:; Copyright: GNU Public Licence
  • Photographer Dag Terje Endresen (NordGen Picture Archive, image 002986)

    1. 1. Biodiversity Data Provider Software<br />Hands-on exercises with TAPIR<br />Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008)<br />Dag Terje Filip Endresen, Nordic Genetic Resources Center (Sweden) / Bioversity International (Italy)<br />
    2. 2. Fallacies of Distributed Computing<br />The network is reliable.<br />Latency is zero.<br />Bandwidth is infinite.<br />The network is secure.<br />Topology doesn't change.<br />There is one administrator.<br />Transport cost is zero.<br />The network is homogeneous.<br />This list of fallacies originated at Sun Microsystems around 1994.<br />
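A client that harvests data from remote providers has to plan for the first two fallacies in particular. The following is a minimal sketch in current Python 3 (not the Python 2.5 that the provider software in these slides targets); the function name and its parameters are illustrative assumptions, not part of any provider package.

```python
import time


def fetch_with_retry(fetch, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that raises OSError on a network
    failure (urllib.error.URLError and socket.timeout are both subclasses).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as error:
            last_error = error
            if attempt < attempts - 1:
                # Back off exponentially: base_delay, 2 * base_delay, ...
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

A caller would typically wrap a request that carries its own timeout, for example `fetch_with_retry(lambda: urllib.request.urlopen(url, timeout=10).read())`, where `url` is the provider address.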
    3. 3. TAPIR<br />3<br />Cartoon by Sasha Kopf (Creative Commons)<br />
    4. 4. TAPIR<br />TAPIR - TDWG Access Protocol for Information Retrieval.<br />During the 2004 TDWG meeting in Christchurch, NZ, work started on a unified protocol, which was named TAPIR.<br />TAPIR is based on the protocols of the two earlier data provider packages, BioCASE and DiGIR.<br />
    5. 5. Provider software, wrappers<br />DiGIR (2002, not active)<br />BioCASE (2003, PyWrapper)<br />PyWrapper3 (2006, not active)<br />TapirLink (2007)<br />GBIF Provider Toolkit (2009)<br />
    6. 6. BioCASE 2.5.0RC<br />The BioCASE provider software is a product of the EU-funded BioCASE project (2001-2004).<br />Developed at BGBM in Berlin.<br />Last updated in April 2008, with support for Python version 2.5 and fewer required external libraries.<br />Implement the BioCASE provider to share data as ABCD 2.06.<br />
    7. 7. BioCASE 2.5.0RC<br />1. Make sure you have Python 2.5 installed (command line: python -V)<br />2. Download the latest provider software from<br />3. Uncompress the BioCASE provider software to a folder on your system [provider_software_2.5.0RC.tar.gz] (tar -xzvf provider_...tar.gz)<br />4. Run, (python<br />5. Configure your web server to mount biocase/www as http://localhost/biocase/<br />Hint: You will find an example for httpd.conf as the last terminal output from running<br />
    8. 8. BioCASE 2.5.0RC<br />6. Visit the library test page: http://localhost/biocase/utilities/testlibs.cgi<br />6a. Download the latest 4Suite from<br />Uncompress and install [4Suite-XML-1.0.2.tar.bz2]<br />6b. Install additional Python libraries, including the desired database driver. For each Python package: (python install)<br />6c. Graphviz is useful to visualize the database table structure.<br />
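The idea behind a library test page can be sketched in a few lines: try to locate each required module and report the ones that are missing. This is not the code of testlibs.cgi; it is a standalone illustration in current Python 3, and the `REQUIRED` list is a placeholder to replace with the libraries and database driver your own installation needs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Placeholder list (an assumption): add the driver module for your database.
REQUIRED = ["json", "sqlite3"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing libraries: " + ", ".join(missing))
    else:
        print("All required libraries were found.")
```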
    9. 9. BioCASE 2.5.0RC<br />7. Configuration<br /><ul><li>Add datasource (dsa)</li>
    <li>Database connection</li>
    <li>Database table structure</li>
    <li>Mapping of data model to standard schema</li></ul>
    13. 13. BioCASE 2.5.0RC<br />8. Query Form<br />The manual query form is a good way to see exactly how the wrapper software works: http://localhost/biocase/utilities/queryforms/qf_manual.cgi?dsa=sesto<br />
    14. 14. 11<br />
    15. 15. PyWrapper3<br />Home:<br />Primary developers: Markus Döring, Javier de la Torre<br />14/07/2008 - Development stalled<br />"We are sorry to inform you that development of the TAPIR branch of PyWrapper has been stalled. The latest 3.1 alpha version is not stable and not recommended for production!" (Message from the home page)<br /><ul><li>PyWrapper 3.0.0 (latest stable version, requires Python 2.4)</li>
    <li>PyWrapper 3.1.0 alpha (development version, works with Python 2.5)</li></ul>PyWrapper is tested and verified to work on Windows, Mac OS X and Linux.<br />
    17. 17. Required configuration<br />Web server: Any CGI-compliant web server: Apache, IIS etc. (The built-in CherryPy web server can also be used.)<br />Database: Major databases are supported, including MySQL, Oracle, SQL Server, Sybase, Access and PostgreSQL. In theory, any database with a Python library should work.<br />Python: PyWrapper is developed in the Python programming language. (The latest version from the SVN code repository works with Python version 2.5.)<br />Apache, MySQL and Python are open source software, free to use, even for commercial products.<br />
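The reason that nearly any database with a Python library can work is that Python database drivers share a common cursor interface (the DB-API). The sketch below illustrates that interface only and is not PyWrapper code: an in-memory SQLite database stands in for MySQL, PostgreSQL or Oracle, and the table and column names are made up.

```python
import sqlite3


def fetch_rows(connection, table, columns, limit=10):
    """Read a few rows through the generic DB-API cursor interface."""
    # Identifiers cannot be bound as query parameters, so accept plain names only.
    for name in [table] + list(columns):
        if not name.isidentifier():
            raise ValueError("unsafe identifier: %r" % name)
    cursor = connection.cursor()
    cursor.execute("SELECT %s FROM %s" % (", ".join(columns), table))
    # fetchmany avoids LIMIT, whose syntax differs between database systems.
    return cursor.fetchmany(limit)


# Hypothetical sample data in an in-memory SQLite database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE accession (id INTEGER PRIMARY KEY, genus TEXT)")
connection.execute("INSERT INTO accession (genus) VALUES ('Hordeum')")
print(fetch_rows(connection, "accession", ["id", "genus"]))  # [(1, 'Hordeum')]
```

With another database, only the `connect` call and the driver module would change; the cursor calls stay the same.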
    18. 18. Installation<br />1. Download the latest PyWrapper3 installer.<br /> Use SVN export or checkout for Python 2.5 support.<br />2. Uncompress to a folder of your choice.<br /> Example: "/usr/local/pywrapper3/"<br /> Example: "C:\pywrapper"<br />Local installation: If you have a Subversion client installed, you may use the automatic installer. (Local Python and libraries are installed to your pywrapper folder.)<br />prompt$ svn export svn:// pywrapper<br />prompt$ cd pywrapper/tools<br />prompt$ /bin/<br />This requires a bash shell, and probably a Unix-like system such as FreeBSD, Linux or Mac OS X.<br />3. Execute: pywrapper/<br /> Example: prompt$ python (Mac OS X, Linux)<br /> On Windows locate and double-click<br />
    19. 19. Start standalone server<br />Execute (default port is 8080)<br />prompt$ cd webapp/<br />prompt$ ./ 8088 (example to start on port 8088)<br />On a Windows system you may do this in an MS-DOS window (or double-click the file, if you accept the default port).<br />Some messages will scroll across your screen. Please be patient, this can take a minute. Wait for the message “start server …” and then find PyWrapper at:<br />http://localhost:8088/pywrapper<br />
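Because the server can take a minute to come up, a script that depends on it may want to wait until the port accepts connections. The following is a minimal sketch in current Python 3; the function name, the number of attempts and the polling interval are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_server(host, port, attempts=60, interval=1.0,
                    connect=socket.create_connection, sleep=time.sleep):
    """Return True once a TCP connection to host:port succeeds, else False."""
    for attempt in range(attempts):
        try:
            # Open and immediately close a connection as a reachability probe.
            connect((host, port), timeout=interval).close()
            return True
        except OSError:
            if attempt < attempts - 1:
                sleep(interval)
    return False


if __name__ == "__main__":
    # Port 8088 matches the example start command on the slide.
    if wait_for_server("localhost", 8088):
        print("Server is up: http://localhost:8088/pywrapper")
    else:
        print("Server did not start in time.")
```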
    20. 20. Configuration<br />After successful installation, you need to configure your data provider. Follow the instructions on the PyWrapper documentation web page.<br />Data sources. If you provide several datasets or databases, each one is configured as an individual data source (dsa).<br />Database connection. Allows PyWrapper to access your database.<br />Database structure. Define the relevant database tables, their primary keys and foreign keys.<br />Data model. Map your database model to the standard represented by the XML Schemas you choose.<br />
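The last two configuration steps (table structure with keys, then mapping columns to a standard schema) can be pictured with a small sketch. It is not how PyWrapper stores its configuration: the tables, the `MAPPING` dictionary and the output element names are all invented for illustration, and a real provider would map to ABCD or Darwin Core concepts instead.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical database structure: two tables linked by a foreign key.
SCHEMA = """
CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
CREATE TABLE accession (
    accession_id INTEGER PRIMARY KEY,
    taxon_id INTEGER REFERENCES taxon(taxon_id),
    country TEXT
);
INSERT INTO taxon VALUES (1, 'Hordeum', 'vulgare');
INSERT INTO accession VALUES (10, 1, 'SWE');
"""

# Hypothetical mapping from output element names to database columns.
MAPPING = {
    "Genus": "taxon.genus",
    "Species": "taxon.species",
    "Country": "accession.country",
}


def records_as_xml(connection):
    """Join the tables on the foreign key and emit one element per row."""
    query = (
        "SELECT %s FROM accession "
        "JOIN taxon ON accession.taxon_id = taxon.taxon_id "
        "ORDER BY accession.accession_id" % ", ".join(MAPPING.values())
    )
    root = ET.Element("Records")
    for row in connection.execute(query):
        record = ET.SubElement(root, "Record")
        for element_name, value in zip(MAPPING, row):
            ET.SubElement(record, element_name).text = value
    return ET.tostring(root, encoding="unicode")


connection = sqlite3.connect(":memory:")
connection.executescript(SCHEMA)
print(records_as_xml(connection))
```

The wrapper software does the same kind of work generically: the key definitions tell it how to join tables, and the mapping tells it which column fills which element of the chosen schema.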
    21. 21. Screen examples<br />PyWrapper comes with a graphical web-based configuration tool.<br />For more information and more screenshots from the configuration of PyWrapper, see:<br />
    22. 22. TapirLink 0.6.1<br />18<br />
    23. 23. TapirLink 0.6.1<br />Home:<br />Primary developers: Renato De Giovanni, Dave Vieglais<br />Download:<br />Uncompress the PHP source code, e.g. to /usr/local/tapir/tapirlink<br />Mount the admin and www directories in your web server.<br />Example: Apache "httpd.conf"<br />Alias /tapirlink "/usr/local/tapir/tapirlink/www"<br />Alias /tapirlink-admin "/usr/local/tapir/tapirlink/admin"<br />&lt;Location /tapirlink&gt;<br /> Order allow,deny<br /> Allow from all<br />&lt;/Location&gt;<br />&lt;Location /tapirlink-admin&gt;<br /> Order allow,deny<br /> Allow from all<br />&lt;/Location&gt;<br />The web server needs read permission on all directories and write permission on cache, config, log and statistics.<br />
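The permission requirements can be checked with a short script. This sketch assumes that cache, config, log and statistics sit directly under the TapirLink folder, which the slide does not state explicitly; the function name is also an assumption. It is written for current Python 3.

```python
import os

# Directories that need write access, taken from the slide.
WRITABLE = ("cache", "config", "log", "statistics")


def permission_problems(root):
    """List (path, problem) pairs for directories that lack the needed access."""
    problems = []
    candidates = [root]
    for current, dirs, _files in os.walk(root):
        candidates.extend(os.path.join(current, name) for name in dirs)
    for path in candidates:
        # A directory must be readable and searchable to be served.
        if not os.access(path, os.R_OK | os.X_OK):
            problems.append((path, "not readable"))
    for name in WRITABLE:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            problems.append((path, "missing"))
        elif not os.access(path, os.W_OK):
            problems.append((path, "not writable"))
    return problems


if __name__ == "__main__":
    for path, problem in permission_problems("/usr/local/tapir/tapirlink"):
        print("%s: %s" % (path, problem))
```

The result is only meaningful when the script runs as the same user as the web server, since `os.access` tests the permissions of the current user.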
    24. 24. TapirLink 0.6.1<br />Start by adding a new resource at http://localhost/tapirlink-admin/<br />Step 1: Describe your new resource<br />
    25. 25. TapirLink 0.6.1<br />Step 2: Data source, database connection<br />Step 3: Table structure<br />21<br />
    26. 26. TapirLink 0.6.1<br />Step 4: Filter<br />Step 5: Select mapping standards to use<br />22<br />
    27. 27. TapirLink 0.6.1<br />Step 5b: Mapping data schema (ABCD 2.06)<br />etc…<br />23<br />
    28. 28. TapirLink 0.6.1<br />Step 5c: Mapping data concepts (Darwin Core)<br />etc…<br />Step 5d: Remember that DwC has an extension for geospatial descriptors<br />etc…<br />
    29. 29. TapirLink 0.6.1<br />Step 6: Settings<br />New resource successfully configured<br />25<br />
    30. 30. TapirLink 0.6.1<br />Test the resource with the client form:<br />http://localhost/tapirlink/tapir_client.php<br />The XML client form is a good way to see exactly how the wrapper software works.<br />
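Besides the client form, a configured resource can be probed with a plain HTTP GET. The sketch below assumes the key-value request form of the TAPIR specification, where the operation is passed in an `op` parameter; the access point URL is hypothetical and should be replaced by the one shown for your resource. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Operation names as defined by the TAPIR protocol.
OPERATIONS = ("ping", "metadata", "capabilities", "inventory", "search")


def tapir_kvp_url(access_point, operation="ping"):
    """Build a key-value TAPIR request URL for the given operation."""
    if operation not in OPERATIONS:
        raise ValueError("unknown TAPIR operation: %s" % operation)
    return access_point + "?" + urlencode({"op": operation})


# Hypothetical access point of a local TapirLink resource.
print(tapir_kvp_url("http://localhost/tapirlink/tapir.php/myresource"))
# http://localhost/tapirlink/tapir.php/myresource?op=ping
```

Inventory and search requests take further parameters, which this sketch leaves out.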
    31. 31. Service interface<br />27<br />
    32. 32. Example of a service request<br /> All exchanged data is formatted with XML tags.<br />28<br />
    33. 33. Example of a service response<br />29<br />...<br />
    34. 34. Example TAPIR service request<br />30<br />
    35. 35. Example TAPIR service response<br />singer:/sourcename<br />singer:/taxonomy/genus<br />singer:/taxonomy/species<br />singer:/taxonomy/subspecies<br />singer:/holding/ID<br />singer:/holding/name<br />singer:/origin/collecting/countrysource<br />singer:/origin/collecting/countrysourceID<br />singer:/status/biologicalstatus<br />singer:/status/biologicalstatusID<br />...<br />31<br />
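The concept identifiers in the response are slash-separated paths under a common prefix, so they can be read as a tree. The following sketch (current Python 3, with an illustrative function name) nests a few of the identifiers listed above by path segment.

```python
def concept_tree(concepts):
    """Nest concept paths such as 'singer:/taxonomy/genus' by path segment."""
    tree = {}
    for concept in concepts:
        prefix, _, path = concept.partition(":/")
        node = tree.setdefault(prefix, {})
        for segment in path.split("/"):
            node = node.setdefault(segment, {})
    return tree


concepts = [
    "singer:/sourcename",
    "singer:/taxonomy/genus",
    "singer:/taxonomy/species",
    "singer:/origin/collecting/countrysource",
]
tree = concept_tree(concepts)
print(sorted(tree["singer"]))              # ['origin', 'sourcename', 'taxonomy']
print(sorted(tree["singer"]["taxonomy"]))  # ['genus', 'species']
```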
    36. 36. Example TAPIR service SEARCH request<br />32<br />
    37. 37. Example TAPIR service Search response<br />33<br />
    38. 38. Example of OAI-PMH service request<br />OAI-PMH requests are expressed as HTTP requests.<br />OAI-PMH requests must be submitted using either the HTTP GET or POST methods.<br /><br />&<br />&metadataPrefix=oai_dc<br />34<br />
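An OAI-PMH GET request is a base URL plus a `verb` argument and any further arguments as query parameters. The sketch below builds such URLs in current Python 3; the base URL `http://example.org/oai` is a placeholder and the function name is an assumption.

```python
from urllib.parse import urlencode

# The six request types defined by OAI-PMH.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "GetRecord", "ListIdentifiers", "ListRecords"}


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL with the verb as first parameter."""
    if verb not in VERBS:
        raise ValueError("unknown OAI-PMH verb: %s" % verb)
    query = [("verb", verb)] + sorted(arguments.items())
    return base_url + "?" + urlencode(query)


# Placeholder repository address.
print(oai_request_url("http://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc"))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

The same parameters can be sent in the body of a POST request instead, which is useful when the argument values get long.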
    39. 39. Example of OAI-PMH service response<br />OAI-PMH responses are formatted as HTTP responses, with the Content-Type text/xml.<br />
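Since the response body is XML, a client can read it with any XML parser. The sketch below extracts record identifiers from a ListIdentifiers response in current Python 3. The sample document is invented and heavily shortened; only the namespace URI and element names come from the OAI-PMH protocol.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical, shortened ListIdentifiers response.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-12-03T10:00:00Z</responseDate>
  <request verb="ListIdentifiers" metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2008-11-01</datestamp>
    </header>
    <header>
      <identifier>oai:example.org:2</identifier>
      <datestamp>2008-11-02</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def list_identifiers(xml_text):
    """Return the record identifiers found in a ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [node.text
            for node in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


print(list_identifiers(SAMPLE))  # ['oai:example.org:1', 'oai:example.org:2']
```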
    40. 40. OAI-PMH PROTOCOL, metadata formats<br />Request types (verb):<br />Identify<br />ListMetadataFormats<br />ListSets<br />GetRecord<br />ListIdentifiers<br />ListRecords<br />For purposes of interoperability, the metadataPrefix 'oai_dc' is reserved for Dublin Core.<br />Communities adopt their own metadataPrefixes for their own metadata formats. Relevant formats/schemas for biodiversity informatics are Darwin Core and ABCD.<br />
    41. 41. Why share data?<br />
    42. 42. []<br />38<br />[]<br />
    43. 43. GBIF PGR Network 2<br />[]<br />39<br />
    44. 44. 40<br />Distributed network<br />The image is from the BioCASE web site<br />
    45. 45. Decentralized network (diagram)<br />USER<br />Web Services<br />GBIF (Global Biodiversity Information Facility)<br />ALIS (Accession Level Information System)<br />Svalbard Global Seed Vault (Safe Backup)<br />USDA GRIN (USA): USDA ARS National Germplasm Repositories...<br />SINGER (CGIAR): CGIAR International Future Harvest gene banks...<br />EURISCO (Europe)<br />MCPD<br />IHAR (Poland), WUR CGN (Netherlands), NordGen (Northern Europe), IPK Gatersleben (Germany), other European gene banks...<br />
    46. 46. Crop Wild Relatives<br />National datasets (LKA, ARM, BOL, MDG, UZB) are shared with the central CWR data index.<br />The national datasets, as well as access to other international datasets (EURISCO, SINGER), are provided from the CWR data portal.<br />
    47. 47. Data portal example<br />43<br />
    48. 48. 44<br /><br />
    49. 49. 45<br />
    50. 50. 46<br />
    51. 51. 47<br />
    52. 52. 48<br />
    53. 53. Outlook<br />The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community.<br />Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work.<br />Use of GBIF technology and integration of PGR collection data into GBIF allows PGR users to search PGR collections and other biodiversity collections simultaneously, and to get access to the data (and possibly the material) of relevant biodiversity collections.<br />New data portals for a specific crop, a regional thematic network or a similar subset of the global biodiversity datasets can be established with relatively little effort. This requires only that all the relevant datasets are provided by GBIF-compatible web services (like the BioCASE PyWrapper).<br />
    54. 54. Participation and the sharing of your institute datasets with global and national biodiversity projects<br />is important for your public and scientific visibility,<br />promoting the use (usefulness) of your data<br />and ultimately for the continued funding of your institutional activities. <br />50<br />
    55. 55. Special thanks to<br />Bioversity International<br /> []<br />GBIF, Global Biodiversity Information Facility []<br />BioCASE, The Biological Collection Access Service for Europe.<br /> []<br />TDWG, Biodiversity Information Standards []<br />51<br />
    56. 56. Special thanks to<br />BioCASE and PyWrapper3 software<br />Markus Döring<br />Javier de la Torre<br />DiGIR and TapirLink software<br />Renato de Giovanni<br />Dave Vieglais<br />52<br />
    57. 57. Thank you for listening!<br />53<br />