Gbrds Summary Final July2009 (2)

GBIF Summary Paper:

The Global Biodiversity Resources Discovery System (GBRDS)

Background:
One of the major challenges for existing biodiversity informatics
infrastructure is to provide users with ways that substantially
increase their ability to discover and access relevant biodiversity
information and data resources. Our current ability to discover
distributed, isolated and unknown data and information
resources is limited.

GBIF has already demonstrated that a worldwide-distributed
network of biodiversity data publishers can be linked together
and made searchable from a single point of access. Currently,
more than 181 million primary biodiversity data records are
searchable and downloadable online from over 296 publishers,
demonstrating the feasibility of linking data-holding institutions
and individuals at national, regional and thematic levels. However, this represents only a small
fraction of the biodiversity information and data resources that exist (and will be generated
in the future). The remainder is not, and will not be, readily discoverable without a comprehensive
discovery system.

Rationale:
One of the major challenges for GBIF as a leading global biodiversity informatics
infrastructure is to provide innovative means of discovering and accessing all relevant
information and data resources for the benefit of the biodiversity community at large.

This challenge is being addressed by GBIF through the development of a Global Biodiversity
Resources Discovery System (GBRDS) for registration and discovery of biodiversity information
and data resources and services, as set out in our Work Programme 2009-2010.

Previously, users of resources such as a library or biological collections had to rely on a card
catalogue. It was the only means for finding much of the material available in libraries or
other collections. Today, of course, much of the information painstakingly captured on cards
in the catalogues has become available digitally and is cross-searchable between catalogues,
so that relevant resources can easily be located remotely using titles, authors, abstracts, dates
or keywords. Even more than a global catalogue, what the Biodiversity Informatics community
requires today is a comprehensive directory of resources and a road map to access
them: a resource discovery system. Like a compass, such a global directory of resources would
offer a unique map of existing institutions, collections, datasets, services and other relevant
information resources.

At present the GBIF infrastructure functions on a Universal Description Discovery and
Integration (UDDI) platform, which allows for registration of services and associated technical
access points. It is used as a network compass to locate data publishers' access points
exposed on the Internet via Web services. However, UDDI alone falls short of mapping the
full complexity of the networked GBIF community and does not address more complex issues
such as understanding the roles and relationships between institutions, collections and
resources. For example, within the DarwinCore schema, two fundamental concepts
(InstitutionCode and CollectionCode) are intended to refer to a “standard” code identifier
that uniquely identifies the institution and collection. However, no global registry exists for
assigning unique, standardised institutional (and collection) codes that publishers should use
across the various disciplines.
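
The identifier problem described above can be illustrated with a minimal sketch. It is written in Python purely for brevity; the record layout, publisher names and code values are invented for illustration and are not taken from any GBIF dataset. It shows how, without a shared registry, two unrelated publishers may end up using the same InstitutionCode and CollectionCode pair.

```python
from collections import defaultdict

# Hypothetical records as they might arrive from independent publishers.
# Publisher names and code values are invented for illustration only.
records = [
    {"publisher": "Publisher A", "institutionCode": "MUS", "collectionCode": "BOT"},
    {"publisher": "Publisher B", "institutionCode": "MUS", "collectionCode": "BOT"},
    {"publisher": "Publisher C", "institutionCode": "XYZ", "collectionCode": "HERB"},
]


def find_ambiguous_codes(records):
    """Return code pairs that are used by more than one publisher."""
    claimed_by = defaultdict(set)
    for rec in records:
        key = (rec["institutionCode"], rec["collectionCode"])
        claimed_by[key].add(rec["publisher"])
    return {key: sorted(pubs) for key, pubs in claimed_by.items() if len(pubs) > 1}


ambiguous = find_ambiguous_codes(records)
```

Here `ambiguous` contains the pair `("MUS", "BOT")`, claimed by two different publishers, which is the kind of collision a global registry of codes would be expected to prevent or reconcile.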

In 2008, acutely aware of this growing challenge, GBIF, together with Biodiversity
Information Standards (TDWG) and the Royal Botanic Garden, Edinburgh (RBGE), initiated a
project called the ‘Biodiversity Collections Index’ (BCI) with the aim of facilitating the
understanding, conservation and utilisation of global biodiversity resources by creating a
single global annotated index of biodiversity collections based on existing authoritative
references. The BCI feasibility project intended to do this by collaborating with the
organisations and individuals who curate these collections. The BCI project was not intended
to duplicate these authoritative references but to reconcile these through a central reference
point, and associate them with a globally unique identifier. The BCI project was successful in
mobilizing interest from the community but its integration within the full GBIF infrastructure
and community requires broadening its scope – content, coverage, and reconciliation ability.

DESIGN:
A GBRDS should ideally be a combination of 1) a Registry of resources and services and 2) a
set of discovery services interacting with existing infrastructure such as GBIF to facilitate the
discovery of biodiversity information. The most important component, the Registry, would
facilitate the inventory of information resources by creating a single annotated index of
publishers, institutions, networks, collections (datasets), schema repositories and services. The
envisaged GBRDS is conceived not simply as a collection of centralized indexes
but as an integrated ‘Yellow Pages’ reference of all biodiversity information
resources, reconciling all distributed resources and providing a meaningful way to discover
them in a distributed manner. Any and all interested organisations should be able to register
their resources and services into the GBRDS and contribute to the discovery services.

It is envisaged that the GBRDS will form the core of the next generation of biodiversity
informatics infrastructure, built on the principle of distributed architecture, and
decentralized implementation. Through the related activities planned in the
GBIF 2009-2010 Work Programme, the GBRDS, in conjunction with existing
infrastructures, is intended to become by December 2010 a unified global entry point for discovery of all kinds of
information (who, what, where, when, and how) about biodiversity resources, both digitised
and undigitised, including primary and non-primary biodiversity data, standards and services,
and for integrating the GBIF network with other systems/networks. The functionalities and
services offered by the GBRDS are intended to be scalable and evolve over time based on the
needs of the community, improvement of global biodiversity information infrastructure, and
interoperable linkages with other similar networks such as GEO-BON. The GBRDS is
conceived as a discovery system that will provide a platform for a coherent global map of
resources. The GBRDS is therefore foreseen as a catalyst for integration of all existing
resources and enabling their discovery and use.

SPECIFICATIONS
Two main components constitute the GBRDS: (a) registry and (b) discovery services.

a) REGISTRY

The registry will facilitate unified registration, disambiguation and resolution of data
resources and services. In other words, the GBRDS will incorporate an inventory of
‘persistent identifiers’ that glues together all the data components within and outside of the
GBIF infrastructure. Thus, the registry will provide a ‘wiring diagram’ of GBIF and other
biodiversity related network topologies and services available.

• Open source (Apache 2.0 license) Java-based customisable, multilingual web
application.
• Web based user interface and web services offering:
o Creation, deletion and update of network entities (institutions, datasets,
products, protocols, services, thematic networks etc)
o Management of multiple relationships between network entities, allowing
resources, for example, to participate in many networks
o Role-based user management, allowing for multiple data curators to
participate in the registration of network entities
o Browse and search capabilities for registered resources, enabling discovery of
access points
o Ability to filter the network view so that only registered entities participating
in a specific thematic network are displayed, e.g. “Show me the publishers
contributing to Network X and their relationships”
o Services to help uniquely identify entities by reconciling multiple identifiers to the
same network entity
o Registration of IPT-specific extensions and controlled vocabularies
o Notification to appropriate administrators when changes are made
o Services for assignment and resolution of persistent identifiers for network
entities
o Services for resolution of ‘Persistent Identifiers’ assigned by other
registries/discovery systems
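
Several of the capabilities listed above (entity registration, persistent identifiers, reconciliation of multiple identifiers, relationships and the thematic network filter) can be sketched together in a few lines. This is a minimal in-memory illustration under stated assumptions, not the GBRDS implementation: the paper describes a Java-based web application, whereas Python is used here for brevity, and the class names, method names, the `participates_in` relation and the use of UUIDs as persistent identifiers are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    kind: str  # e.g. "institution", "dataset", "network"
    # Assumption: a random UUID stands in for a persistent identifier.
    persistent_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class Registry:
    """Hypothetical in-memory registry of network entities."""

    def __init__(self):
        self.entities = {}      # persistent_id -> Entity
        self.aliases = {}       # external identifier -> persistent_id
        self.relations = set()  # (subject_id, relation, object_id)

    def register(self, name, kind, aliases=()):
        """Create an entity and record any identifiers it is already known by."""
        entity = Entity(name, kind)
        self.entities[entity.persistent_id] = entity
        for alias in aliases:
            self.aliases[alias] = entity.persistent_id
        return entity

    def resolve(self, identifier):
        """Reconcile an external or persistent identifier to one entity."""
        pid = self.aliases.get(identifier, identifier)
        return self.entities.get(pid)

    def relate(self, subject, relation, obj):
        """Record a relationship; an entity may take part in many networks."""
        self.relations.add((subject.persistent_id, relation, obj.persistent_id))

    def members_of(self, network):
        """Names of entities participating in the given thematic network."""
        return sorted(
            self.entities[s].name
            for (s, rel, o) in self.relations
            if rel == "participates_in" and o == network.persistent_id
        )


registry = Registry()
net_x = registry.register("Network X", "network")
net_y = registry.register("Network Y", "network")
herbarium = registry.register(
    "Example Herbarium", "institution", aliases=["EXH", "urn:example:exh"]
)
museum = registry.register("Example Museum", "institution", aliases=["EXM"])
registry.relate(herbarium, "participates_in", net_x)
registry.relate(herbarium, "participates_in", net_y)
registry.relate(museum, "participates_in", net_y)
```

With this data, `registry.members_of(net_x)` answers the “Show me the publishers contributing to Network X” query, and `registry.resolve("EXH")` and `registry.resolve("urn:example:exh")` return the same entity, which is the reconciliation behaviour the list describes.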

b) DISCOVERY SERVICES:

The GBIF infrastructure aims to provide search and discovery of the following data types:
• Primary biodiversity data
o Single location (e.g. point-based)
o Grid-based, such as plot monitoring
o Individual specimen monitoring
• Taxonomic and checklist information
• Species distributions (range of distribution, etc.)
• Multimedia resources in biodiversity
• Literature-based biodiversity data
• Data resource (dataset) descriptions or metadata

Other services are made available by many organisations and initiatives. The discovery
services of the GBRDS are aimed at documenting these in such a way that they can be easily
discovered, including by machine-to-machine systems.
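
As a rough illustration of what machine-to-machine discovery could look like, the sketch below filters machine-readable service descriptions by data type. The JSON layout, the field names (`name`, `data_type`, `access_point`) and the `example.org` URLs are assumptions made for this example only; the paper does not specify a description format.

```python
import json

# Hypothetical service descriptions; field names and URLs are illustrative.
SERVICES_JSON = """
[
  {"name": "Occurrence search",
   "data_type": "primary biodiversity data",
   "access_point": "http://example.org/occurrence"},
  {"name": "Checklist lookup",
   "data_type": "taxonomic and checklist information",
   "access_point": "http://example.org/checklist"},
  {"name": "Dataset metadata",
   "data_type": "dataset metadata",
   "access_point": "http://example.org/metadata"}
]
"""


def discover(services, data_type):
    """Return access points of services that offer the requested data type."""
    wanted = data_type.lower()
    return [s["access_point"] for s in services if s["data_type"].lower() == wanted]


services = json.loads(SERVICES_JSON)
checklist_endpoints = discover(services, "Taxonomic and checklist information")
```

A client program could run a query like this against the registry content and then contact the returned access points directly, without a person having to browse a catalogue first.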

Ultimately, such discovery services, combined with the content of the Registry, will provide a
unique reference of all biodiversity information resources and enable scientists and other
users to discover simply and quickly which information resources are available in the global
distributed network, and which analytical tools and services are available to analyse them.

DEVELOPMENT SCHEDULE
• GBRDS Summary Paper: August 2009
• GBRDS Stakeholders workshop: September 2009
• GBRDS version 1.0: October 2009
• GBRDS version 2.0: December 2010

RESOURCES
1. http://code.google.com/p/gbif-registry/ (source code, wiki, bug reporting)
2. http://www.gbif.org/ (GBIF communications portal)

GBIF CONTACTS

Head of Informatics: SAMY GAIJI, sgaiji@gbif.org
Systems Architect: TIM ROBERTSON, trobertson@gbif.org
Senior Programme Officer for DIGIT: VISHWAS CHAVAN, vchavan@gbif.org
Developer: JOSE CUADRA, jcuadra@gbif.org
