Data exchange alternatives, SBIS conference in Stockholm (2008)
Biodiversity Data Publishing Software for the Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008). Stockholm, 3rd December 2008. Dag Endresen (Bioversity/NordGen).

Usage Rights

CC Attribution License

  • Image source: University of Ottawa, Distributed Computing Research Group: http://www.genie.uottawa.ca/research/rsrch_site.php?lang=e&id=90 (Google Images). See also: http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
  • IMAGE: http://blog.tapirtype.com/cartoons/ [Creative Commons License: http://creativecommons.org/licenses/by-nc-sa/3.0/us/]
  • http://www.tdwg.org
  • For more details, see: GBIF NODES meeting 2007 in Amsterdam. Agenda 09 Technical Training session - TAPIR/PyWrapper3: http://circa.gbif.net/Public/irc/gbif/nodes/library?l=/meetings/2007_10_amsterdam/tapir_pywrapper3/_EN_1.0_&a=i
  • Apache HTTPD server: http://www.apache.org/ MySQL database server: http://www.mysql.com/ Python programming language: http://www.python.org
  • In some cases with proxies: svn co svn://svn.pywrapper.org/pywrapper/trunk pywrapper
  • IMAGE source: http://commons.wikimedia.org/wiki/Image:Handshake_(Workshop_Cologne_%2706).jpeg; Copyright: GNU Public Licence
  • http://wwwdev.ngb.se/portal/index.php?scope=demo http://chm.grinfo.net/index.php?app=data_providers
  • Photographer Dag Terje Endresen (NordGen Picture Archive, image 002986) http://www.nordgen.org/sesto/index.php?scp=ngb&thm=pictures&mod=det&id=002986&img_size=768x512

Presentation Transcript

  • Biodiversity Data Provider Software
    Hands-on exercises with TAPIR
    Stockholm Biodiversity Informatics Symposium 2008 (SBIS2008)
    Dag Terje Filip Endresen, Nordic Genetic Resources Center (Sweden) / Bioversity International (Italy)
  • Fallacies of Distributed Computing
    The network is reliable.
    Latency is zero.
    Bandwidth is infinite.
    The network is secure.
    Topology doesn't change.
    There is one administrator.
    Transport cost is zero.
    The network is homogeneous.
    This list of fallacies came about at Sun Microsystems around 1994.
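The first fallacy ("the network is reliable") is the one a data portal meets most often when it calls remote providers. A minimal sketch of one common response, retrying with an increasing delay, is shown below. The function name `call_with_retries` and its parameters are illustrative assumptions and are not part of any provider software discussed here; the code targets modern Python 3 rather than the Python 2.5 used in the slides.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation() and retry on OSError, doubling the delay each time.

    urllib.error.URLError and socket timeouts are OSError subclasses in
    Python 3, so network failures raised by urllib are covered.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

A caller would wrap the actual request, for example `call_with_retries(lambda: urllib.request.urlopen(url, timeout=10))`, so that latency and outages are handled explicitly instead of being assumed away.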
  • TAPIR
    Cartoon by Sasha Kopf (Creative Commons)
  • TAPIR
    TAPIR - TDWG Access Protocol for Information Retrieval.
    During the 2004 TDWG meeting in Christchurch, NZ, work started on a unified protocol, which was named TAPIR.
    TAPIR is based on the protocols of two earlier data provider software packages, BioCASE and DiGIR.
  • Provider software, wrappers
    DiGIR (2002, not active)
    http://digir.sourceforge.net
    BioCASE (2003, PyWrapper)
    http://www.biocase.org
    PyWrapper3 (2006, not active)
    http://trac.pywrapper.org/
    TapirLink (2007)
    http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink
    GBIF Provider Toolkit (2009)
    http://code.google.com/p/gbif-providertoolkit
  • BioCASE 2.5.0RC
    The BioCASE provider software is a product of the EU funded BioCASE project (2001-2004).
    Developed at BGBM in Berlin.
    Last updated in April 2008, with support for Python version 2.5 and fewer required external libraries.
    Implement the BioCASE provider to share data as ABCD 2.06.
    http://www.biocase.org
  • BioCASE 2.5.0RC
    1. Make sure you have Python 2.5 installed
    (command line: python -V)
    2. Download the latest provider software from
    http://www.biocase.org
    3. Uncompress the BioCASE provider software to a folder on your system [provider_software_2.5.0RC.tar.gz]
    (tar -xzvf provider_...tar.gz)
    4. Run setup.py (python setup.py)
    5. Configure your web server to mount biocase/www as
    http://localhost/biocase/
    Hint: You will find an example for httpd.conf as the last terminal output from running setup.py
  • BioCASE 2.5.0RC
    6. Visit the library test page:
    http://localhost/biocase/utilities/testlibs.cgi
    6a. Download the latest 4Suite from http://4suite.org/
    Uncompress and install [4Suite-XML-1.0.2.tar.bz2]
    6b. Install additional Python libraries, including the desired database driver. For each Python package:
    (python setup.py install)
    6c. Graphviz is useful to visualize the database table structure.
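The idea behind a library test page can be sketched in a few lines: check the interpreter version, then check whether each required module can be found. This is only an illustration of the concept, not the code behind testlibs.cgi. The function name `check_environment` is hypothetical, the module names in the usage example are standard-library stand-ins, and the code is written for modern Python 3 rather than Python 2.5.

```python
import importlib.util
import sys


def check_environment(modules, minimum_python=(2, 5)):
    """Return (python_ok, missing) for a list of module names."""
    python_ok = sys.version_info[:2] >= minimum_python
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a spec is malformed.
            found = False
        if not found:
            missing.append(name)
    return python_ok, missing


if __name__ == "__main__":
    # Stand-in module names; substitute the libraries your setup needs.
    ok, missing = check_environment(["sqlite3", "xml.etree.ElementTree"])
    print("Python version ok:", ok)
    print("Missing modules:", ", ".join(missing) or "none")
```

Running the script before opening the web-based test page gives a quick first indication of which database driver or XML library still has to be installed.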
  • BioCASE 2.5.0RC
    7. Configuration
    • Add datasource (dsa)
    • Database connection
    • Database table structure
    • Mapping of data model to standard schema
  • BioCASE 2.5.0RC
    8. Query Form
    The manual query form is a good illustration of exactly how the wrapper software works!
    http://localhost/biocase/utilities/queryforms/qf_manual.cgi?dsa=sesto
  • PyWrapper3
    Home: http://trac.pywrapper.org/
    Primary developers: Markus Döring, Javier de la Torre
    14/07/2008 - Development stalled
    We are sorry to inform you that development of the TAPIR branch of PyWrapper has been stalled. The latest 3.1 alpha version is not stable and not recommended for production! (Message from the home page)
    • PyWrapper 3.0.0 (Latest stable version, requires Python 2.4)
    • PyWrapper 3.1.0 alpha (development version, works with Python 2.5)
    PyWrapper is tested and verified to work fine with Windows, Mac OS X and Linux.
  • Required configuration
    Web server: Any CGI compliant web server: Apache, IIS etc. (The built in CherryPy web server can also be used).
    Database: Major databases are supported, including MySQL, Oracle, SQLServer, Sybase, Access, PostgreSQL. Theoretically any database with a Python library should work.
    Python: PyWrapper is developed with the Python programming language. (The latest version from the SVN code repository works with Python version 2.5.)
    Apache, MySQL and Python are open source software, free to use - even for commercial products.
  • Installation
    http://trac.pywrapper.org/pywrapper/wiki/InstallationGuide
    1. Download the latest PyWrapper3 installer.
    Use SVN export or checkout for Python 2.5 support
    2. Uncompress to a folder of your choice.
    Example: “/usr/local/pywrapper3/”
    Example: “C:\pywrapper”
    Local installation: If you have a Subversion client installed, you may use the automatic installer. (A local Python and its libraries are installed to your pywrapper folder.)
    prompt$ svn export svn://svn.pywrapper.org:80/pywrapper/trunk pywrapper
    prompt$ cd pywrapper/tools
    prompt$ /bin/sh install.sh
    This requires a bash shell, and probably a Unix-like system such as FreeBSD, Linux or Mac OS X…
    3. Execute: pywrapper/setup.py
    Example: prompt$ python setup.py (Mac OS X, Linux)
    On Windows, locate setup.py and double-click it.
  • Start standalone server
    Execute start_server.py (default port is 8080)
    prompt$ cd webapp/
    prompt$ ./start_server.py 8088 (example to start on port 8088)
    On a Windows system you may do this in an MS-DOS window (or double-click the file if you accept the default port).
    Some messages will pass across your screen. Please be patient, this could take a minute. Wait for the message “start server …” and then find PyWrapper at:
    http://localhost:8088/pywrapper
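Once the standalone server reports that it has started, a short script can confirm that the address answers. The sketch below assumes only what the slide states (host localhost, default port 8080, path /pywrapper); the helper names `pywrapper_url` and `is_reachable` are hypothetical, and the code is modern Python 3.

```python
import urllib.error
import urllib.request


def pywrapper_url(port=8080, host="localhost"):
    """Build the address at which the standalone server is expected."""
    return "http://%s:%d/pywrapper" % (host, port)


def is_reachable(url, timeout=5):
    """Return True if the URL answers with a success or redirect status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False


if __name__ == "__main__":
    url = pywrapper_url(8088)
    print(url, "is reachable" if is_reachable(url) else "is not reachable")
```

Because start-up can take a minute, a "not reachable" result shortly after launching the server is not necessarily an error; try again after the start message appears.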
  • Configuration
    After successful installation, you will need to configure your data provider. Follow the instructions from the PyWrapper documentation web page to configure.
    Data sources. If you provide several datasets or several databases, each one is configured as an individual data source (dsa).
    Database connection. For PyWrapper to access your database.
    Database structure. Define the relevant database tables, the primary keys and foreign keys.
    Data model. Map your database model to the standard represented by the XML Schemas you choose.
    http://trac.pywrapper.org/pywrapper/wiki/Documentation
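The data model step amounts to a lookup table from local database columns to concepts of the chosen standard. The sketch below illustrates that idea only; it is not how PyWrapper stores its configuration. The column names and the sample values are hypothetical, while the target names are Darwin Core terms.

```python
# Hypothetical local column names mapped to Darwin Core term names.
COLUMN_TO_CONCEPT = {
    "accession_no": "catalogNumber",
    "institute": "institutionCode",
    "genus_name": "genus",
    "species_name": "specificEpithet",
    "origin_country": "country",
}


def map_record(row, mapping=COLUMN_TO_CONCEPT):
    """Translate one database row (a dict) into concept/value pairs.

    Columns without a mapping, and mapped columns that are missing or
    empty, are left out of the result.
    """
    return {
        concept: row[column]
        for column, concept in mapping.items()
        if row.get(column) not in (None, "")
    }


if __name__ == "__main__":
    row = {"accession_no": "A-001", "genus_name": "Hordeum", "notes": "unmapped"}
    print(map_record(row))
```

A wrapper applies the same kind of mapping for every record it returns, which is why the table structure and keys must be defined before the mapping step.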
  • Screen examples
    PyWrapper comes with a graphical web based configuration tool
    For more information and more screen dumps from the configuration of PyWrapper, see:
    http://circa.gbif.net/Public/irc/gbif/nodes/library?l=/meetings/2007_10_amsterdam/tapir_pywrapper3/_EN_1.0_&a=i
  • TapirLink 0.6.1
    Home: http://wiki.tdwg.org/twiki/bin/view/TAPIR/TapirLink
    Primary developers: Renato De Giovanni, Dave Vieglais
    Download: http://sourceforge.net/project/showfiles.php?group_id=38190
    Uncompress PHP source code
    E.g.: /usr/local/tapir/tapirlink
    Mount the admin and www directories on your web server.
    Example: Apache “httpd.conf”
    Alias /tapirlink "/usr/local/tapir/tapirlink/www"
    Alias /tapirlink-admin "/usr/local/tapir/tapirlink/admin"
    <Location /tapirlink>
    Order allow,deny
    Allow from all
    </Location>
    <Location /tapirlink-admin>
    Order allow,deny
    Allow from all
    </Location>
    Read permissions on all directories
    Write on cache, config, log, statistics
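The permission requirements can be checked with a small script before opening the admin interface. The sketch below assumes that cache, config, log and statistics are direct subdirectories of the TapirLink folder, which the slide does not state explicitly; the function name `check_permissions` is hypothetical. It tests access for the user running the script, so run it as the web server user for a meaningful result.

```python
import os

# Assumed to be direct subdirectories of the TapirLink installation folder.
WRITABLE_SUBDIRS = ("cache", "config", "log", "statistics")


def check_permissions(base_dir, writable_subdirs=WRITABLE_SUBDIRS):
    """Return a list of human-readable permission problems (empty if none)."""
    problems = []
    # Every directory below base_dir should be readable and traversable.
    for root, dirs, _files in os.walk(base_dir):
        for name in dirs:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK | os.X_OK):
                problems.append("not readable: " + path)
    # The listed subdirectories must exist and be writable.
    for name in writable_subdirs:
        path = os.path.join(base_dir, name)
        if not os.path.isdir(path):
            problems.append("missing: " + path)
        elif not os.access(path, os.W_OK):
            problems.append("not writable: " + path)
    return problems


if __name__ == "__main__":
    for problem in check_permissions("/usr/local/tapir/tapirlink"):
        print(problem)
```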
  • TapirLink 0.6.1
    Start by adding a new resource http://localhost/tapirlink-admin/
    Step 1: Describe your new resource
  • TapirLink 0.6.1
    Step 2: Data source, database connection
    Step 3: Table structure
  • TapirLink 0.6.1
    Step 4: Filter
    Step 5: Select mapping standards to use
  • TapirLink 0.6.1
    Step 5b: Mapping data schema (ABCD 2.06)
    etc…
  • TapirLink 0.6.1
    Step 5c: Mapping data concepts (Darwin Core)
    etc…
    Step 5d: Remember that DwC has an extension for geospatial descriptors
    etc…
  • TapirLink 0.6.1
    Step 6: Settings
    New resource successfully configured
  • TapirLink 0.6.1
    Test resource with client form:
    http://localhost/tapirlink/tapir_client.php
    The XML client form is a good illustration of exactly how the wrapper software works!
  • Service interface
  • Example of a service request
    All exchanged data is formatted with XML tags.
  • Example of a service response
    ...
  • Example TAPIR service request
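The request shown on the original slide is not preserved in this transcript. As a rough illustration, a simple TAPIR request can be written as a URL with key-value parameters. The sketch below assumes, from the TAPIR specification as recalled here, that the operation is passed in an `op` parameter with the values ping, metadata, capabilities, inventory and search; the access point URL in the usage example and the function name `tapir_request_url` are hypothetical. Inventory and search requests need further parameters that are not shown.

```python
from urllib.parse import urlencode

# Operation names as recalled from the TAPIR specification (assumption).
TAPIR_OPERATIONS = ("ping", "metadata", "capabilities", "inventory", "search")


def tapir_request_url(access_point, operation, **parameters):
    """Build a key-value-pair request URL for a TAPIR access point."""
    if operation not in TAPIR_OPERATIONS:
        raise ValueError("unknown TAPIR operation: %r" % (operation,))
    # Keep "op" first and sort the rest for a predictable URL.
    query = [("op", operation)] + sorted(parameters.items())
    return access_point + "?" + urlencode(query)


if __name__ == "__main__":
    # Hypothetical access point for a resource configured in TapirLink.
    access_point = "http://localhost/tapirlink/tapir.php/myresource"
    print(tapir_request_url(access_point, "ping"))
    print(tapir_request_url(access_point, "capabilities"))
```

Pasting the printed URLs into a browser is a quick way to see the XML that a provider returns for each operation.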
  • Example TAPIR service response
    singer:/sourcename
    singer:/taxonomy/genus
    singer:/taxonomy/species
    singer:/taxonomy/subspecies
    singer:/holding/ID
    singer:/holding/name
    singer:/origin/collecting/countrysource
    singer:/origin/collecting/countrysourceID
    singer:/status/biologicalstatus
    singer:/status/biologicalstatusID
    ...
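The concept identifiers listed above follow a path-like pattern: a prefix (singer), then slash-separated levels. A client can group them into a tree to display or navigate the data model. The sketch below is one simple way to do that; the function name `build_concept_tree` is hypothetical, and treating the text before ":/" as a prefix is an assumption based on the identifiers shown.

```python
def build_concept_tree(concept_ids):
    """Group identifiers such as 'singer:/taxonomy/genus' into nested dicts."""
    tree = {}
    for concept_id in concept_ids:
        # Split 'singer:/taxonomy/genus' into 'singer' and 'taxonomy/genus'.
        prefix, _separator, path = concept_id.partition(":/")
        node = tree.setdefault(prefix, {})
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


if __name__ == "__main__":
    concepts = [
        "singer:/sourcename",
        "singer:/taxonomy/genus",
        "singer:/taxonomy/species",
        "singer:/holding/ID",
    ]
    print(build_concept_tree(concepts))
```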
  • Example TAPIR service search request
  • Example TAPIR service search response
  • Example of OAI-PMH service request
    OAI-PMH requests are expressed as HTTP requests.
    OAI-PMH requests must be submitted using either the HTTP GET or POST methods.
    http://an.oa.org/OAI-script?verb=GetRecord
    &identifier=oai:arXiv.org:hep-th/9901001
    &metadataPrefix=oai_dc
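The GetRecord request above can be assembled programmatically. The sketch below uses the base URL, identifier and metadataPrefix from the slide; the function name `oai_request_url` is hypothetical. Note that `urlencode` percent-encodes the ":" and "/" characters in the identifier, so the printed URL differs in spelling from the one on the slide.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL with the verb as first parameter."""
    query = [("verb", verb)] + list(arguments.items())
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    print(oai_request_url(
        "http://an.oa.org/OAI-script",
        "GetRecord",
        identifier="oai:arXiv.org:hep-th/9901001",
        metadataPrefix="oai_dc",
    ))
```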
  • Example of OAI-PMH service response
    OAI-PMH responses are formatted as HTTP responses, with the Content-Type text/xml.
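Since the response is XML, a client reads it with an XML parser. The response shown on the original slide is not preserved here, so the sketch below parses a small hand-written sample; its identifier, dates and title are invented for illustration. The three namespace URIs are the ones defined for OAI-PMH 2.0, oai_dc and Dublin Core. The function name `extract_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

NAMESPACES = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Invented sample in the shape of a GetRecord response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2008-12-03T10:00:00Z</responseDate>
  <request verb="GetRecord">http://an.oa.org/OAI-script</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:record-1</identifier>
        <datestamp>2008-12-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example record title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def extract_record(xml_text):
    """Return identifier and title of a GetRecord response, or None."""
    root = ET.fromstring(xml_text)
    record = root.find("oai:GetRecord/oai:record", NAMESPACES)
    if record is None:
        return None
    return {
        "identifier": record.findtext(
            "oai:header/oai:identifier", namespaces=NAMESPACES),
        "title": record.findtext(
            "oai:metadata/oai_dc:dc/dc:title", namespaces=NAMESPACES),
    }


if __name__ == "__main__":
    print(extract_record(SAMPLE_RESPONSE))
```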
  • OAI-PMH protocol, metadata formats
    Request types (verb):
    Identify
    ListMetadataFormats
    ListSets
    GetRecord
    ListIdentifiers
    ListRecords
    For purposes of interoperability, the metadataPrefix 'oai_dc' is reserved for Dublin Core.
    Communities adopt their own metadataPrefixes for their own metadata formats. Relevant formats/schemas for biodiversity informatics are Darwin Core and ABCD.
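Each of the six verbs has its own required arguments, and a client can check a request before sending it. The table below follows the OAI-PMH 2.0 specification as recalled here; the function name `missing_arguments` is hypothetical, and the handling of resumptionToken (which replaces the other arguments when continuing a list request) is deliberately left out of this sketch.

```python
# Required arguments per verb (resumptionToken handling omitted).
REQUIRED_ARGUMENTS = {
    "Identify": (),
    "ListMetadataFormats": (),
    "ListSets": (),
    "GetRecord": ("identifier", "metadataPrefix"),
    "ListIdentifiers": ("metadataPrefix",),
    "ListRecords": ("metadataPrefix",),
}


def missing_arguments(verb, arguments):
    """Return the required argument names that are absent for this verb."""
    if verb not in REQUIRED_ARGUMENTS:
        raise ValueError("unknown OAI-PMH verb: %r" % (verb,))
    return [name for name in REQUIRED_ARGUMENTS[verb] if name not in arguments]


if __name__ == "__main__":
    print(missing_arguments("GetRecord", {"identifier": "oai:arXiv.org:hep-th/9901001"}))
    print(missing_arguments("ListRecords", {"metadataPrefix": "oai_dc"}))
```

A biodiversity provider would typically accept a community-specific metadataPrefix for Darwin Core or ABCD next to oai_dc; the prefix strings themselves are chosen by each community and are not fixed by the protocol.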
  • Why share data?
  • [http://data.gbif.org/]
    [http://data.gbif.org/datasets/resource/1487]
  • GBIF PGR Network 2
    [http://data.gbif.org/datasets/network/2]
  • Distributed network
    The image is from the BioCASE web site
  • Decentralized network
    Diagram labels:
    GBIF (Global Biodiversity Information Facility)
    ALIS (Accession Level Information System)
    USER
    Svalbard Global Seed Vault (Safe Backup)
    Web Services
    USDA GRIN (USA)
    (USDA ARS National Germplasm Repositories...)
    SINGER (CGIAR)
    (CGIAR International Future Harvest gene banks...)
    EURISCO (Europe)
    MCPD
    IHAR (Poland)
    WUR CGN (Netherlands)
    NordGen (Northern Europe)
    IPK Gatersleben (Germany)
    (Other European gene banks...)
  • Crop Wild Relatives
    National datasets are shared with the central CWR data index.
    The national datasets, as well as access to other international datasets, are provided from the CWR data portal.
    Diagram labels: LKA, ARM, BOL, MDG, UZB, EURISCO, SINGER
    http://www.cropwildrelatives.org
  • Data portal example
    http://wwwdev.ngb.se/portal/index.php?scope=demo
  • Outlook
    The compatibility of data standards between PGR and biodiversity collections made it possible to integrate the worldwide germplasm collections into the biodiversity community.
    Using GBIF technology (and contributing to its development), the PGR community can easily establish specific PGR networks without duplicating GBIF's work.
    Use of GBIF technology and integration of PGR collection data into GBIF allows PGR users to simultaneously search PGR collections and other biodiversity collections, and to get access to the data (and possibly the material) of relevant biodiversity collections.
    New data portals for a specific crop, a regional thematic network or a similar subset of the global biodiversity datasets can be established with rather little effort! This requires only that all the relevant datasets are provided through GBIF-compatible web services (like the BioCASE PyWrapper).
  • Participation and the sharing of your institute datasets with global and national biodiversity projects
    is important for your public and scientific visibility,
    promoting the use (usefulness) of your data
    and ultimately for the continued funding of your institutional activities.
  • Special thanks to
    Bioversity International
    [http://www.bioversityinternational.org]
    GBIF, Global Biodiversity Information Facility [http://www.gbif.org]
    BioCASE, The Biological Collection Access Service for Europe.
    [http://www.biocase.org]
    TDWG, Biodiversity Information Standards [http://www.tdwg.org]
  • Special thanks to
    BioCASE and PyWrapper3 software
    Markus Döring
    Javier de la Torre
    DiGIR and TapirLink software
    Renato de Giovanni
    Dave Vieglais
  • Thank you for listening!