Data exchange alternatives, SBIS conference in Stockholm (2008)
Biodiversity Data Publishing Software for the Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008). Stockholm, 3rd December 2008. Dag Endresen (Bioversity/NordGen).


Published in: Technology

  • Image source: University of Ottawa, Distributed Computing Research Group: http://www.genie.uottawa.ca/research/rsrch_site.php?lang=e&id=90 (Google Images). See also: http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
  • IMAGE: http://blog.tapirtype.com/cartoons/ [Creative Commons License: http://creativecommons.org/licenses/by-nc-sa/3.0/us/]
  • http://www.tdwg.org
  • For more details, see the GBIF NODES meeting 2007 in Amsterdam, Agenda 09, Technical Training session - TAPIR/PyWrapper3: http://circa.gbif.net/Public/irc/gbif/nodes/library?l=/meetings/2007_10_amsterdam/tapir_pywrapper3/_EN_1.0_&a=i
  • Apache HTTPD server: http://www.apache.org/ MySQL database server: http://www.mysql.com/ Python programming language: http://www.python.org
  • In some cases with proxies: svn co svn://svn.pywrapper.org/pywrapper/trunk pywrapper
  • IMAGE source: http://commons.wikimedia.org/wiki/Image:Handshake_(Workshop_Cologne_%2706).jpeg; Copyright: GNU Public Licence
  • http://wwwdev.ngb.se/portal/index.php?scope=demo and http://chm.grinfo.net/index.php?app=data_providers
  • Photographer Dag Terje Endresen (NordGen Picture Archive, image 002986): http://www.nordgen.org/sesto/index.php?scp=ngb&thm=pictures&mod=det&id=002986&img_size=768x512
  • Transcript

    • 1. Biodiversity Data Provider Software
      Hands-on exercises with TAPIR
      Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008)
      Dag Terje Filip Endresen, Nordic Genetic Resources Center (Sweden) / Bioversity International (Italy)
    • 2. Fallacies of Distributed Computing
      The network is reliable.
      Latency is zero.
      Bandwidth is infinite.
      The network is secure.
      Topology doesn't change.
      There is one administrator.
      Transport cost is zero.
      The network is homogeneous.
      This list of fallacies came about at Sun Microsystems around 1994.
    • 3. TAPIR
      Cartoon by Sasha Kopf (Creative Commons)
    • 4. TAPIR
      TAPIR - TDWG Access Protocol for Information Retrieval.
      During the 2004 TDWG meeting in Christchurch, NZ, work started on a unified protocol, which was named TAPIR.
      TAPIR is based on the protocols of two earlier data provider software packages, BioCASE and DiGIR.
    • 5. Provider software, wrappers
      DiGIR (2002, not active)
      http://digir.sourceforge.net
      BioCASE (2003, PyWrapper)
      http://www.biocase.org
      PyWrapper3 (2006, not active)
      http://trac.pywrapper.org/
      TapirLink (2007)
      http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink
      GBIF Provider Toolkit (2009)
      http://code.google.com/p/gbif-providertoolkit
    • 6. BioCASE 2.5.0RC
      The BioCASE provider software is a product of the EU-funded BioCASE project (2001-2004).
      Developed at BGBM in Berlin.
      Last updated in April 2008, with support for Python version 2.5 and fewer required external libraries.
      Implement the BioCASE provider to share data as ABCD 2.06.
      http://www.biocase.org
    • 7. BioCASE 2.5.0RC
      1. Make sure you have Python 2.5 installed
      (command line: python -V)
      2. Download the latest provider software from
      http://www.biocase.org
      3. Uncompress the BioCASE provider software to a folder on your system [provider_software_2.5.0RC.tar.gz]
      (tar -xzvf provider_...tar.gz)
      4. Run setup.py (python setup.py)
      5. Configure your web server to mount biocase/www as
      http://localhost/biocase/
      Hint: You will find an example for httpd.conf as the last terminal output from running setup.py
    • 8. BioCASE 2.5.0RC
      6. Visit the library test page:
      http://localhost/biocase/utilities/testlibs.cgi
      6a. Download the latest 4Suite from http://4suite.org/
      Uncompress and install [4Suite-XML-1.0.2.tar.bz2]
      6b. Install additional Python libraries, including the desired database driver. For each Python package:
      (python setup.py install)
      6c. Graphviz is useful to visualize the database table structure.
    • 9. BioCASE 2.5.0RC
      7. Configuration
      • Add datasource (dsa)
      • Database connection
      • Database table structure
      • Mapping of data model to standard schema
    • 13. BioCASE 2.5.0RC
      8. Query Form
      The manual query form is a good way to see exactly how the wrapper software works: http://localhost/biocase/utilities/queryforms/qf_manual.cgi?dsa=sesto
    • 14. 11
    • 15. PyWrapper3
      Home: http://trac.pywrapper.org/
      Primary developers: Markus Döring, Javier de la Torre
      14/07/2008 - Development stalled
      We are sorry to inform you that development of the TAPIR branch of PyWrapper has been stalled. The latest 3.1 alpha version is not stable and not recommended for production! (Message from the home page)
      • PyWrapper 3.0.0 (latest stable version, requires Python 2.4)
      • PyWrapper 3.1.0 alpha (development version, works with Python 2.5)
      PyWrapper is tested and verified to work with Windows, Mac OS X and Linux.
    • 17. Required configuration
      Web server: Any CGI-compliant web server, such as Apache or IIS. (The built-in CherryPy web server can also be used.)
      Database: Major databases are supported, including MySQL, Oracle, SQL Server, Sybase, Access and PostgreSQL. In theory, any database with a Python library should work.
      Python: PyWrapper is developed in the Python programming language. (The latest version from the SVN code repository works with Python 2.5.)
      Apache, MySQL and Python are open source software, free to use, even for commercial products.
    • 18. Installation
      http://trac.pywrapper.org/pywrapper/wiki/InstallationGuide
      1. Download the latest PyWrapper3 installer.
      Use SVN export or checkout for Python 2.5 support
      2. Uncompress to a folder of your choice.
      Example: “/usr/local/pywrapper3/”
      Example: “C:\pywrapper”
      Local installation: If you have a Subversion client installed, you may use the automatic installer. (A local Python and its libraries are installed into your pywrapper folder.)
      prompt$ svn export svn://svn.pywrapper.org:80/pywrapper/trunk pywrapper
      prompt$ cd pywrapper/tools
      prompt$ /bin/sh install.sh
      This requires a bash shell, and probably a Unix-like system such as FreeBSD, Linux or Mac OS X.
      3. Execute: pywrapper/setup.py
      Example: prompt$ python setup.py (Mac OS X, Linux)
      On Windows, locate setup.py and double-click it.
    • 19. Start standalone server
      Execute start_server.py (the default port is 8080).
      prompt$ cd webapp/
      prompt$ ./start_server.py 8088 (example to start on port 8088)
      On a Windows system you may do this in an MS-DOS window (or double-click the file, if you accept the default port).
      Some messages will scroll across your screen. Please be patient, this could take a minute. Wait for the message “start server …” and then find PyWrapper at:
      http://localhost:8088/pywrapper
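
Because the standalone server can take a minute to start, a script can poll the URL instead of a person watching the terminal. This is a minimal sketch, not part of PyWrapper: the function name `wait_for_server` and its parameters are assumptions made for illustration, and the `probe` argument exists only so the logic can be exercised without a running server.

```python
import time
import urllib.request


def wait_for_server(url, attempts=30, delay=2.0, probe=None, sleep=time.sleep):
    """Poll url until it answers; return True once it does, else False."""
    if probe is None:
        def probe(target):
            # urlopen raises an OSError subclass while the server is down.
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status
    for _ in range(attempts):
        try:
            probe(url)
            return True
        except OSError:
            # Not up yet (or answering with an error): wait and try again.
            sleep(delay)
    return False


if __name__ == "__main__":
    # Port 8088 matches the example start command on this slide.
    print(wait_for_server("http://localhost:8088/pywrapper"))
```

With the defaults this waits up to about a minute before giving up, which matches the start-up time mentioned on the slide.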
    • 20. Configuration
      After successful installation, you will need to configure your data provider. Follow the instructions from the PyWrapper documentation web page to configure.
      Data sources. If you provide several datasets or databases, each one is configured as an individual data source (dsa).
      Database connection. Lets PyWrapper access your database.
      Database structure. Define the relevant database tables, the primary keys and foreign keys.
      Data model. Map your database model to the standard represented by the XML Schemas you choose.
      http://trac.pywrapper.org/pywrapper/wiki/Documentation
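
The idea behind the data model mapping can be sketched in a few lines of Python. This is not how PyWrapper is implemented: the table `accession`, its columns, the concept names in `MAPPING` and the `<records>` output layout are all hypothetical, chosen only to show database columns being renamed to the concepts of a shared standard.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from local database columns to standard concepts.
MAPPING = {
    "acc_no": "CatalogNumber",
    "genus": "Genus",
    "country": "Country",
}


def rows_to_xml(connection, table, mapping):
    """Render every row of table as a <record> of mapped concepts."""
    columns = list(mapping)
    # Table and column names come from trusted configuration, not user input.
    cursor = connection.execute(
        "SELECT {} FROM {}".format(", ".join(columns), table)
    )
    root = ET.Element("records")
    for row in cursor:
        record = ET.SubElement(root, "record")
        for column, value in zip(columns, row):
            ET.SubElement(record, mapping[column]).text = str(value)
    return ET.tostring(root, encoding="unicode")


# An in-memory database stands in for the provider's real database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE accession (acc_no TEXT PRIMARY KEY, genus TEXT, country TEXT)"
)
connection.execute("INSERT INTO accession VALUES ('NGB1', 'Hordeum', 'SWE')")
print(rows_to_xml(connection, "accession", MAPPING))
```

A real wrapper additionally follows foreign keys between tables and validates the output against the chosen XML Schema, which is why the configuration asks for the table structure as well as the mapping.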
    • 21. Screen examples
      PyWrapper comes with a graphical, web-based configuration tool.
      For more information and more screenshots from the configuration of PyWrapper, see:
      http://circa.gbif.net/Public/irc/gbif/nodes/library?l=/meetings/2007_10_amsterdam/tapir_pywrapper3/_EN_1.0_&a=i
    • 22. TapirLink 0.6.1
    • 23. TapirLink 0.6.1
      Home: http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink
      Primary developers: Renato De Giovanni, Dave Vieglais
      Download: http://sourceforge.net/project/showfiles.php?group_id=38190
      Uncompress the PHP source code
      E.g.: /usr/local/tapir/tapirlink
      Mount the admin and www directories in your web server.
      Example: Apache “httpd.conf”
      Alias /tapirlink "/usr/local/tapir/tapirlink/www"
      Alias /tapirlink-admin "/usr/local/tapir/tapirlink/admin"
      <Location /tapirlink>
      Order allow,deny
      Allow from all
      </Location>
      <Location /tapirlink-admin>
      Order allow,deny
      Allow from all
      </Location>
      Grant read permission on all directories,
      and write permission on cache, config, log and statistics.
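
The directory permissions can be checked with a short script before opening the admin interface. This is a sketch under stated assumptions: the helper `permission_problems` is hypothetical and not part of TapirLink, and it tests access for the user running the script, so it should be run as the account the web server uses.

```python
import os

# Directories that need write permission, as named on this slide.
WRITABLE = ("cache", "config", "log", "statistics")


def permission_problems(root, writable=WRITABLE):
    """List (path, problem) pairs for directories that fail the checks."""
    problems = []
    # Every directory must be readable and traversable.
    for current, _dirs, _files in os.walk(root):
        if not os.access(current, os.R_OK | os.X_OK):
            problems.append((current, "not readable"))
    # The listed directories must exist and also be writable.
    for name in writable:
        path = os.path.join(root, name)
        if not os.path.isdir(path):
            problems.append((path, "missing"))
        elif not os.access(path, os.W_OK):
            problems.append((path, "not writable"))
    return problems


if __name__ == "__main__":
    # Installation path taken from the example on this slide.
    for path, problem in permission_problems("/usr/local/tapir/tapirlink"):
        print(path, problem)
```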
    • 24. TapirLink 0.6.1
      Start by adding a new resource at http://localhost/tapirlink-admin/
      Step 1: Describe your new resource
    • 25. TapirLink 0.6.1
      Step 2: Data source, database connection
      Step 3: Table structure
    • 26. TapirLink 0.6.1
      Step 4: Filter
      Step 5: Select mapping standards to use
    • 27. TapirLink 0.6.1
      Step 5b: Mapping data schema (ABCD 2.06)
      etc…
    • 28. TapirLink 0.6.1
      Step 5c: Mapping data concepts (Darwin Core)
      etc…
      Step 5d: Remember that Darwin Core (DwC) has an extension for geospatial descriptors
      etc…
    • 29. TapirLink 0.6.1
      Step 6: Settings
      New resource successfully configured
    • 30. TapirLink 0.6.1
      Test resource with client form:
      http://localhost/tapirlink/tapir_client.php
      The XML client form is a good way to see exactly how the wrapper software works.
    • 31. Service interface
    • 32. Example of a service request
      All exchanged data is formatted with XML tags.
    • 33. Example of a service response
      ...
    • 34. Example TAPIR service request
    • 35. Example TAPIR service response
      singer:/sourcename
      singer:/taxonomy/genus
      singer:/taxonomy/species
      singer:/taxonomy/subspecies
      singer:/holding/ID
      singer:/holding/name
      singer:/origin/collecting/countrysource
      singer:/origin/collecting/countrysourceID
      singer:/status/biologicalstatus
      singer:/status/biologicalstatusID
      ...
    • 36. Example TAPIR service SEARCH request
    • 37. Example TAPIR service Search response
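
The request and response slides above are screenshots, so no request text survives in this transcript. As a rough illustration only: besides XML requests, TAPIR also accepts simple key-value-pair GET requests, where the `op` parameter names the operation. The helper `tapir_url` and the access point URL below are hypothetical; check the real URL in your own TapirLink or PyWrapper configuration.

```python
from urllib.parse import urlencode

# Operation names defined by the TAPIR protocol.
OPERATIONS = ("ping", "metadata", "capabilities", "inventory", "search")


def tapir_url(access_point, operation, **params):
    """Build a key-value-pair GET request for a TAPIR access point."""
    if operation not in OPERATIONS:
        raise ValueError("unknown TAPIR operation: " + operation)
    query = {"op": operation}
    # Extra parameters are passed through unchanged.
    query.update(params)
    return access_point + "?" + urlencode(query)


# Hypothetical access point; replace it with the URL of your own resource.
print(tapir_url("http://localhost/tapirlink/tapir.php/example", "capabilities"))
```

A capabilities request is a reasonable first test, since the response lists the concepts a resource has mapped, such as the `singer:/taxonomy/genus` entries shown on slide 35.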
    • 38. Example of OAI-PMH service request
      OAI-PMH requests are expressed as HTTP requests.
      OAI-PMH requests must be submitted using either the HTTP GET or POST methods.
      http://an.oa.org/OAI-script?verb=GetRecord
      &identifier=oai:arXiv.org:hep-th/9901001
      &metadataPrefix=oai_dc
    • 39. Example of OAI-PMH service RESPONSE
      OAI-PMH responses are formatted as HTTP responses, with the Content-Type text/xml.
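
Since the response is plain XML, it can be read with a standard XML parser. The sketch below uses a hand-written, abridged GetRecord response: the namespaces are those of OAI-PMH 2.0 and Dublin Core, while the title value and the helper `record_summary` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 responses and Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abridged sample response; the title is a placeholder value.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv.org:hep-th/9901001</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def record_summary(xml_text):
    """Return (identifier, title) from a GetRecord response."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(".//oai:header/oai:identifier", namespaces=NS)
    title = root.findtext(".//dc:title", namespaces=NS)
    return identifier, title


print(record_summary(SAMPLE))
```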
    • 40. OAI-PMH PROTOCOL, metadata formats
      Request types (verb):
      Identify
      ListMetadataFormats
      ListSets
      GetRecord
      ListIdentifiers
      ListRecords
      For purposes of interoperability, the metadataPrefix 'oai_dc' is reserved for Dublin Core.
      Communities adopt their own metadataPrefixes for their own metadata formats. Relevant formats/schemas for Biodiversity Informatics are Darwin Core and ABCD.
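
The six verbs and their arguments map directly onto a URL builder. This is a minimal sketch: the required arguments follow the OAI-PMH 2.0 specification, the base URL is the placeholder from the request example in this deck, and the function name `oai_request_url` is an assumption made for illustration.

```python
from urllib.parse import urlencode

# Required arguments for each verb in OAI-PMH 2.0.
REQUIRED = {
    "Identify": (),
    "ListMetadataFormats": (),
    "ListSets": (),
    "GetRecord": ("identifier", "metadataPrefix"),
    "ListIdentifiers": ("metadataPrefix",),
    "ListRecords": ("metadataPrefix",),
}


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request, checking the verb and its arguments."""
    if verb not in REQUIRED:
        raise ValueError("unknown OAI-PMH verb: " + verb)
    # A resumptionToken continues a list request and replaces other arguments.
    if "resumptionToken" not in arguments:
        missing = [name for name in REQUIRED[verb] if name not in arguments]
        if missing:
            raise ValueError("missing arguments: " + ", ".join(missing))
    query = {"verb": verb}
    query.update(arguments)
    return base_url + "?" + urlencode(query)


print(oai_request_url("http://an.oa.org/OAI-script", "ListRecords",
                      metadataPrefix="oai_dc"))
```

A community-specific format would be requested the same way, using whatever metadataPrefix the repository advertises in its ListMetadataFormats response.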
    • 41. Why share data?
    • 42. [http://data.gbif.org/]
      [http://data.gbif.org/datasets/resource/1487]
    • 43. GBIF PGR Network 2
      [http://data.gbif.org/datasets/network/2]
    • 44. Distributed network
      The image is from the BioCASE web site
    • 45. Decentralized network
      Diagram labels:
      USER
      Web Services
      GBIF (Global Biodiversity Information Facility)
      ALIS (Accession Level Information System)
      Svalbard Global Seed Vault (Safe Backup)
      USDA GRIN (USA) (USDA ARS National Germplasm Repositories...)
      SINGER (CGIAR) (CGIAR International Future Harvest gene banks...)
      EURISCO (Europe)
      MCPD
      IHAR (Poland)
      WUR CGN (Netherlands)
      NordGen (Northern Europe)
      IPK Gatersleben (Germany)
      (Other European gene banks...)
    • 46. Crop Wild Relatives
      National datasets are shared with the central CWR data index.
      The national datasets, as well as access to other international datasets, are provided from the CWR data portal.
      Diagram labels: LKA, ARM, BOL, MDG, UZB, EURISCO, SINGER
      http://www.cropwildrelatives.org
    • 47. Data portal example
    • 48. 44
      http://wwwdev.ngb.se/portal/index.php?scope=demo
    • 49. 45
    • 50. 46
    • 51. 47
    • 52. 48
    • 53. Outlook
      The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community.
      Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work.
      Use of GBIF technology and integration of PGR collection data into GBIF allows PGR users to simultaneously search PGR collections and other biodiversity collections, and to get access to the data (and possibly the material) of relevant biodiversity collections.
      New data portals for a specific crop, a regional thematic network or a similar subset of the global biodiversity datasets can be established with relatively little effort. This requires only that all the relevant datasets are provided through GBIF-compatible web services (like the BioCASE PyWrapper).
    • 54. Participation and the sharing of your institute datasets with global and national biodiversity projects
      is important for your public and scientific visibility,
      promoting the use (usefulness) of your data
      and ultimately for the continued funding of your institutional activities.
    • 55. Special thanks to
      Bioversity International
      [http://www.bioversityinternational.org]
      GBIF, Global Biodiversity Information Facility [http://www.gbif.org]
      BioCASE, The Biological Collection Access Service for Europe.
      [http://www.biocase.org]
      TDWG, Biodiversity Information Standards [http://www.tdwg.org]
    • 56. Special thanks to
      BioCASE and PyWrapper3 software
      Markus Döring
      Javier de la Torre
      DiGIR and TapirLink software
      Renato de Giovanni
      Dave Vieglais
    • 57. Thank you for listening!