GBIF Summary Paper:

The Global Biodiversity Resources Discovery System (GBRDS)

Background:
One of the major challenges for existing biodiversity informatics
infrastructure is to provide users with ways that substantially
increase their ability to discover and access relevant biodiversity
information and data resources. Our current ability to discover
distributed, isolated and unknown data and information
resources is limited.

GBIF has already demonstrated that a worldwide-distributed
network of biodiversity data publishers can be linked together
and made searchable from a single point of access. Currently,
more than 181 million primary biodiversity data records are
searchable and downloadable online from over 296 publishers,
demonstrating the feasibility of linking data-holding institutions
and individuals at national, regional and thematic levels. However, this represents only a small
fraction of the biodiversity information and data resources that exist (or will be generated
in the future) and that are not, and will not become, readily discoverable without a
comprehensive discovery system.

Rationale:
One of the major challenges for GBIF, as a leading global biodiversity informatics
infrastructure, is to provide innovative means of discovering and accessing all relevant
information and data resources for the benefit of the biodiversity community at large.

This challenge is being addressed by GBIF through the development of a Global Biodiversity
Resources Discovery System (GBRDS) for registration and discovery of biodiversity information
and data resources and services, as set out in our Work Programme 2009-2010.

Previously, users of libraries or biological collections had to rely on a card catalogue,
which was the only means of finding much of the material held there. Today, of course, much
of the information painstakingly captured on catalogue cards has become available digitally
and is cross-searchable between catalogues, so that relevant resources can easily be located
remotely using titles, authors, abstracts, dates or keywords. Even more than a global
catalogue, what the biodiversity informatics community requires today is a comprehensive
directory of resources and a road map to access them: a resource discovery system. Like a
compass, such a global directory of resources would orient users and offer a unique map of
existing institutions, collections, datasets, services and other relevant information
resources.

At present the GBIF infrastructure runs on a Universal Description, Discovery and
Integration (UDDI) platform, which allows for registration of services and associated technical
access points. It is used as a network compass to locate the data publishers' access points
exposed on the Internet via Web services. However, UDDI alone falls short of mapping the
full complexity of the networked GBIF community and does not address more complex issues
such as the roles of, and relationships between, institutions, collections and
resources. For example, within the DarwinCore schema, the two fundamental concepts
InstitutionCode and CollectionCode are intended to refer to a “standard” code
that uniquely identifies the institution and the collection. However, no global registry exists
for assigning unique, standardised institution and collection codes that publishers should use
across the various disciplines.
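
The consequence of having no global registry of codes can be illustrated with a minimal
sketch. It assumes a handful of invented records (the codes "MUS", "HERB" and "FISH" and the
publisher names are hypothetical, not taken from any real dataset) and simply reports every
InstitutionCode/CollectionCode pair that is used by more than one publisher, which is the kind
of ambiguity a registry of unique identifiers is meant to remove.

```python
from collections import defaultdict

# Hypothetical records: every code and publisher name here is invented for illustration.
records = [
    {"institutionCode": "MUS", "collectionCode": "HERB", "publisher": "Publisher A"},
    {"institutionCode": "MUS", "collectionCode": "HERB", "publisher": "Publisher B"},
    {"institutionCode": "MUS", "collectionCode": "FISH", "publisher": "Publisher A"},
]


def find_ambiguous_codes(records):
    """Return code pairs that more than one publisher uses.

    Without a global registry, nothing prevents two unrelated institutions
    from choosing the same InstitutionCode/CollectionCode pair.
    """
    publishers_by_code = defaultdict(set)
    for record in records:
        key = (record["institutionCode"], record["collectionCode"])
        publishers_by_code[key].add(record["publisher"])
    return {
        key: sorted(publishers)
        for key, publishers in publishers_by_code.items()
        if len(publishers) > 1
    }


ambiguous = find_ambiguous_codes(records)
```

In this invented data only the pair ("MUS", "HERB") is reported, because two different
publishers use it; the code pair alone cannot tell a user which collection is meant.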

In 2008, acutely aware of this growing challenge, GBIF, together with Biodiversity
Information Standards (TDWG) and the Royal Botanic Garden, Edinburgh (RBGE), initiated a
project called the ‘Biodiversity Collections Index’ (BCI) with the aim of facilitating the
understanding, conservation and utilisation of global biodiversity resources by creating a
single global annotated index of biodiversity collections based on existing authoritative
references. The BCI feasibility project intended to do this by collaborating with the
organisations and individuals who curate these collections. It was not intended to duplicate
these authoritative references but to reconcile them through a central reference point and
associate them with a globally unique identifier. The BCI project succeeded in mobilizing
interest from the community, but integrating it within the full GBIF infrastructure and
community requires broadening its scope in content, coverage and reconciliation ability.

DESIGN:
A GBRDS should ideally be a combination of 1) a Registry of resources and services and 2) a
set of discovery services interacting with existing infrastructure such as GBIF to facilitate the
discovery of biodiversity information. The most important component, the Registry, would
facilitate the inventory of information resources by creating a single annotated index of
publishers, institutions, networks, collections (datasets), schema repositories and services.
The envisaged GBRDS is conceived not simply as a collection of centralized indexes but as an
integrated ‘Yellow Pages’ reference of all biodiversity information resources, reconciling all
distributed resources and providing a meaningful way to discover them in a distributed
manner. Any interested organisation should be able to register its resources and services in
the GBRDS and contribute to the discovery services.

It is envisaged that the GBRDS will form the core of the next generation of biodiversity
informatics infrastructure, built on the principle of distributed architecture and
decentralized implementation. Through the related activities planned in the GBIF 2009-2010
Work Programme, the GBRDS, in conjunction with existing infrastructures, is intended to
become by December 2010 a unified global entry point for discovering all kinds of
information (who, what, where, when, and how) about biodiversity resources, both digitised
and undigitised, including primary and non-primary biodiversity data, standards and services,
and for integrating the GBIF network with other systems and networks. The functionalities and
services offered by the GBRDS are intended to be scalable and to evolve over time based on the
needs of the community, improvements in global biodiversity information infrastructure, and
interoperable linkages with other similar networks such as GEO-BON. The GBRDS is
conceived as a discovery system that will provide a platform for a coherent global map of
resources. It is therefore foreseen as a catalyst for integrating all existing resources and
enabling their discovery and use.

SPECIFICATIONS
Two main components constitute the GBRDS: (a) registry and (b) discovery services.

   a) REGISTRY

The registry will facilitate unified registration, disambiguation and resolution of data
resources and services. In other words, the GBRDS will incorporate an inventory of
‘persistent identifiers’ that glue together all the data components within and outside the
GBIF infrastructure. Thus, the registry will provide a ‘wiring diagram’ of GBIF and other
biodiversity-related network topologies and the services available.

The registry is specified as follows:

   •   Open source (Apache 2.0 license), Java-based, customisable, multilingual web
       application.
   •   Web-based user interface and web services offering:
          o Creation, deletion and update of network entities (institutions, datasets,
              products, protocols, services, thematic networks, etc.)
          o Management of multiple relationships between network entities, allowing,
              for example, a resource to participate in several networks
          o Role-based user management, allowing multiple data curators to
              participate in the registration of network entities
          o Browse and search capabilities for registered resources, enabling discovery of
              access points
          o Ability to filter the network view so that only registered entities participating
              in a specific thematic network are displayed: “Show me the publishers
              contributing to Network X and their relationships”
          o Services to help uniquely identify entities by reconciling multiple identifiers
              to the same network entity
          o Registration of IPT-specific extensions and controlled vocabularies
          o Notification of the appropriate administrators when changes are made
          o Services for assignment and resolution of persistent identifiers for network
              entities
          o Services for resolution of ‘persistent identifiers’ assigned by other
              registries/discovery systems
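
Several of the features listed above (registering entities, relating them to thematic
networks, reconciling multiple identifiers, and filtering the network view) can be sketched
with a small in-memory model. This is only an illustration under stated assumptions: the
class and method names (Registry, register, relate, resolve, members_of), the relation label
"participates_in", the use of random UUIDs as persistent identifiers, and all entity names and
aliases are hypothetical and are not taken from the GBRDS code base. The registry itself is
specified as a Java-based web application; Python is used here purely for brevity.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Entity:
    key: str                                    # identifier assigned by this sketch registry
    kind: str                                   # e.g. "publisher", "dataset", "network"
    name: str
    aliases: set = field(default_factory=set)   # identifiers the entity carries elsewhere


class Registry:
    """Hypothetical in-memory registry; not the real GBRDS API."""

    def __init__(self):
        self.entities = {}       # key -> Entity
        self.alias_index = {}    # external identifier -> key
        self.relations = set()   # (subject key, relation type, object key)

    def register(self, kind, name, aliases=()):
        """Create an entity, assign it a new identifier and index its aliases."""
        key = str(uuid.uuid4())
        entity = Entity(key, kind, name, set(aliases))
        self.entities[key] = entity
        for alias in entity.aliases:
            self.alias_index[alias] = key
        return entity

    def relate(self, subject, relation, obj):
        """Record a typed relationship; an entity may have many of them."""
        self.relations.add((subject.key, relation, obj.key))

    def resolve(self, identifier):
        """Reconcile a registry key or any known alias to a single entity."""
        key = self.alias_index.get(identifier, identifier)
        return self.entities.get(key)

    def members_of(self, network, kind=None):
        """Filter the network view to entities participating in one network."""
        keys = {
            subject
            for (subject, relation, obj) in self.relations
            if relation == "participates_in" and obj == network.key
        }
        members = [self.entities[k] for k in keys]
        if kind is not None:
            members = [m for m in members if m.kind == kind]
        return sorted(members, key=lambda m: m.name)


registry = Registry()
network_x = registry.register("network", "Network X")
network_y = registry.register("network", "Network Y")
herbarium = registry.register(
    "publisher", "Example Herbarium", aliases={"EXH", "urn:example:herbarium"}
)
museum = registry.register("publisher", "Example Museum")

# One publisher can take part in several networks.
registry.relate(herbarium, "participates_in", network_x)
registry.relate(herbarium, "participates_in", network_y)
registry.relate(museum, "participates_in", network_y)

# "Show me the publishers contributing to Network X"
publishers_in_x = [e.name for e in registry.members_of(network_x, kind="publisher")]
```

In this sketch, publishers_in_x contains only "Example Herbarium", and
registry.resolve("EXH") returns the same entity as registry.resolve(herbarium.key), which is
the reconciliation behaviour the list describes. A real implementation would also need
persistence, role-based access control and change notifications, none of which are modelled
here.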

   b) DISCOVERY SERVICES

The GBIF infrastructure aims to provide search and discovery of the following data types:
   • Primary biodiversity data
           o Single location (e.g. point-based)
           o Grid-based, such as plot monitoring
           o Individual specimen monitoring
   • Taxonomic and checklist information
   • Species distributions (range of distribution, etc.)
   • Multimedia resources in biodiversity
   • Literature-based biodiversity data
   • Descriptions or metadata of data resources (datasets)

Other services are made available by many organisations and initiatives. The discovery
services of the GBRDS aim to document these in such a way that they can be easily
discovered, even by machine-to-machine systems.
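
As a rough illustration of what machine-to-machine discovery could look like, the sketch
below filters a small catalogue of service descriptions by data type and returns the matching
access points as JSON. It is an assumption-laden example: the descriptor fields (name,
data_type, access_point), the function name discover and the example.org URLs are all
hypothetical placeholders, not part of any published GBRDS interface.

```python
import json

# Hypothetical service descriptions; names and URLs are placeholders.
SERVICES = [
    {
        "name": "Example occurrence service",
        "data_type": "primary biodiversity data",
        "access_point": "http://example.org/occurrences",
    },
    {
        "name": "Example checklist service",
        "data_type": "taxonomic and checklist information",
        "access_point": "http://example.org/checklists",
    },
    {
        "name": "Example image service",
        "data_type": "multimedia resources in biodiversity",
        "access_point": "http://example.org/media",
    },
]


def discover(services, data_type):
    """Return the access points of all services offering the given data type.

    Matching is case-insensitive so that a client does not depend on the
    exact capitalisation used when a service was registered.
    """
    wanted = data_type.strip().lower()
    return [s["access_point"] for s in services if s["data_type"].lower() == wanted]


# A client program asks for checklist services and gets a machine-readable answer.
response = json.dumps(discover(SERVICES, "Taxonomic and checklist information"))
```

Returning a structured response such as JSON, rather than a human-oriented page, is what
would let another system consume the answer without manual intervention.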

Ultimately, such discovery services, combined with the content of the Registry, will provide a
unique reference to all biodiversity information resources and enable scientists and other
users to discover simply and quickly which information resources are available in the global
distributed network and which analytical tools and services can be used to analyse them.


DEVELOPMENT SCHEDULE
    •     GBRDS Summary Paper: August 2009

    •     GBRDS Stakeholders workshop: September 2009

    •     GBRDS version 1.0: October 2009

    •     GBRDS version 2.0: December 2010


RESOURCES
    1. http://code.google.com/p/gbif-registry/

          Source code, wiki, bug reporting

    2. http://www.gbif.org/

          GBIF communications portal


GBIF CONTACTS
 
        Head of Informatics                    Systems Architect

        SAMY GAIJI                             TIM ROBERTSON

        sgaiji@gbif.org                        trobertson@gbif.org



        Senior Programme Officer for DIGIT     Developer

        VISHWAS CHAVAN                         JOSE CUADRA

        vchavan@gbif.org                       jcuadra@gbif.org

 
