Remsen EOL Content Summit

Global Biodiversity Information Facility presentation to the Encyclopedia of Life Content Summit

  • Before I describe the challenges inherent to the index, I’d like to illustrate how primary biodiversity data have been used in various scientific and biodiversity policy-related contexts.
  • GBIF has a specific focus within biodiversity information in that our scope is restricted to the mobilisation, discovery, and use of primary biodiversity data. Primary biodiversity data are the digital text or multimedia data records that detail the instance of an organism – the ‘what, where, when, how and by whom’ of the organism’s occurrence and recording. One major class of primary biodiversity data is that derived from natural history collections.
  • A second class of primary biodiversity data originates with observations of species, and there are numerous observational data networks that collect millions of species observations every year.
  • GBIF represents a federated network that is composed of thousands of different primary biodiversity databases located all over the world.
  • GBIF has invested heavily in the development of Darwin Core Archive data publishing tools and supporting documentation.
  • What makes all of these different databases part of the GBIF network is twofold: the data are made available on the Internet using a common set of communications protocols and data formats, and a registry, listing all members of the network and the location of the data itself (often a URL), serves as a master network directory.
  • Lists of these resources are available via RESTful machine interfaces. Here is an example that lists all Darwin Core Archive checklist datasets as a JSON object.
  • The registry and communications protocols are utilised to poll each database in the network and retrieve an index of the biodiversity data records they contain. The index includes the key taxonomic, geospatial, and provenance elements of the data record. This allows the data to be visually represented, for instance, on a map of the Earth.
  • The data in the index are made available through the GBIF data portal. A primary means by which data are accessed is via taxonomic organisation – either by searching for a taxon by keyword or by browsing through a taxonomic hierarchy.
  • Currently the GBIF index stands at over 310 million records from over 9000 different databases. Each of these records includes the name of the taxon, usually a species, with which the record is associated. The total number of scientific names in this virtual dataset exceeds 6 million distinct text strings, far more than the number of known species. Correctly interpreting this list of names is a key requirement for enabling effective use of the index.
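The notes above describe the registry as a master network directory: it lists every member of the network and where its data live, and its listings can be filtered by resource type and returned as JSON (the slides cite http://gbrds.gbif.org/registry/service.json?type=DWC-ARCHIVE-CHECKLIST). The following is a minimal in-memory sketch of that idea only. It is not the GBIF registry's implementation or response schema; the class names, field names, the "OTHER-TYPE" label and the example.org URLs are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RegistryEntry:
    # Hypothetical fields; the real GBIF registry schema is not given in the slides.
    title: str
    publisher: str
    resource_type: str  # e.g. "DWC-ARCHIVE-CHECKLIST"
    access_url: str     # location of the data itself, often a URL


class Registry:
    """In-memory master directory: which members exist and where their data live."""

    def __init__(self) -> None:
        self._entries: list[RegistryEntry] = []

    def register(self, entry: RegistryEntry) -> None:
        self._entries.append(entry)

    def list_by_type(self, resource_type: str) -> list[RegistryEntry]:
        return [e for e in self._entries if e.resource_type == resource_type]

    def to_json(self, resource_type: str) -> str:
        # Serialise the filtered listing as a JSON array, as a RESTful endpoint might.
        return json.dumps([asdict(e) for e in self.list_by_type(resource_type)], indent=2)


if __name__ == "__main__":
    registry = Registry()
    registry.register(RegistryEntry(
        "Example checklist", "Example publisher",
        "DWC-ARCHIVE-CHECKLIST", "http://example.org/checklist.zip"))
    registry.register(RegistryEntry(
        "Example occurrence dataset", "Example publisher",
        "OTHER-TYPE", "http://example.org/occurrences.zip"))
    print(registry.to_json("DWC-ARCHIVE-CHECKLIST"))  # only the checklist entry
```

Keeping only the location of each dataset in the directory, rather than the data itself, is what lets the network stay federated: the data remain with their publishers.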
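The indexing step described in the notes keeps only the key taxonomic, geospatial and provenance elements of each harvested record, which is what allows the records to be drawn on a map of the Earth. A minimal sketch, assuming records arrive as dictionaries keyed by Darwin Core term names; the particular term selection and the 1-degree grid are illustrative assumptions, not GBIF's actual index schema.

```python
import math
from collections import Counter

# Darwin Core term names kept in the index; this particular selection is an assumption.
INDEX_TERMS = (
    "scientificName",                       # taxonomic
    "decimalLatitude", "decimalLongitude",  # geospatial
    "institutionCode", "catalogNumber",     # provenance
)


def to_index_record(record: dict) -> dict:
    """Reduce a full data record to the elements kept in the index."""
    return {term: record.get(term) for term in INDEX_TERMS}


def grid_cell(lat: float, lon: float, size_deg: float = 1.0) -> tuple[int, int]:
    """Return the (row, column) indices of the size_deg x size_deg cell containing a point."""
    return math.floor(lat / size_deg), math.floor(lon / size_deg)


def density_map(records: list[dict], size_deg: float = 1.0) -> Counter:
    """Count georeferenced index records per grid cell, e.g. to shade a world map."""
    counts: Counter = Counter()
    for rec in map(to_index_record, records):
        lat, lon = rec["decimalLatitude"], rec["decimalLongitude"]
        if lat is None or lon is None:
            continue  # records without coordinates cannot be placed on the map
        counts[grid_cell(float(lat), float(lon), size_deg)] += 1
    return counts


if __name__ == "__main__":
    harvested = [
        {"scientificName": "Puma concolor", "decimalLatitude": 9.15,
         "decimalLongitude": -79.85, "institutionCode": "EXAMPLE"},
        {"scientificName": "Puma concolor", "decimalLatitude": "9.9",
         "decimalLongitude": "-79.1"},
        {"scientificName": "Bufo bufo"},  # no coordinates
    ]
    print(density_map(harvested))  # Counter({(9, -80): 2})
```

Using `math.floor` rather than `int` keeps cells consistent for negative coordinates (west and south), where truncation towards zero would merge two cells.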
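The notes name two ways of reaching data through taxonomic organisation: searching for a taxon by keyword, or browsing a taxonomic hierarchy. The toy sketch below shows both over an in-memory tree. The `Taxon` class and `search` function are hypothetical and say nothing about how the GBIF portal itself implements search.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Taxon:
    name: str
    rank: str
    # repr/compare disabled so the parent <-> child links do not recurse forever.
    parent: Optional["Taxon"] = field(default=None, repr=False, compare=False)
    children: list["Taxon"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, name: str, rank: str) -> "Taxon":
        child = Taxon(name, rank, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Names from the root down to this taxon: the path a user browses."""
        path, node = [], self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path[::-1]


def search(root: Taxon, keyword: str) -> list[Taxon]:
    """Case-insensitive substring search over every name in the tree (depth-first)."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        if keyword.lower() in node.name.lower():
            hits.append(node)
        stack.extend(node.children)
    return hits


if __name__ == "__main__":
    animalia = Taxon("Animalia", "kingdom")
    felidae = animalia.add_child("Chordata", "phylum").add_child(
        "Mammalia", "class").add_child("Felidae", "family")
    felidae.add_child("Puma concolor", "species")
    for hit in search(animalia, "puma"):  # keyword search...
        print(" > ".join(hit.lineage()))  # ...then show where it sits in the hierarchy
```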
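The notes point out that the index holds over 6 million distinct name strings, far more than the number of known species. One reason is that the same name can be written in many ways: with or without authorship, with different spacing, and so on. The sketch below groups verbatim strings under a naive "Genus epithet" key to illustrate the interpretation problem. It is a deliberately simplistic heuristic, not GBIF's name parser: it ignores infraspecific ranks and cannot resolve synonymy, so an older combination such as Felis concolor stays separate from Puma concolor.

```python
import re
from collections import defaultdict

# A capitalised genus, optionally followed by a lower-case specific epithet.
_BINOMIAL = re.compile(r"([A-Z][a-z]+)(?:\s+([a-z-]+))?")


def naive_canonical(name: str) -> str:
    """Reduce a verbatim name string to 'Genus epithet' (naive heuristic)."""
    cleaned = " ".join(name.split())  # trim and collapse runs of whitespace
    match = _BINOMIAL.match(cleaned)
    if match is None:
        return cleaned  # leave strings the heuristic cannot parse unchanged
    return " ".join(part for part in match.groups() if part)


def group_name_strings(names: list[str]) -> dict[str, set[str]]:
    """Group distinct verbatim strings under their naive canonical form."""
    groups = defaultdict(set)
    for name in names:
        groups[naive_canonical(name)].add(name)
    return dict(groups)


if __name__ == "__main__":
    verbatim = [
        "Puma concolor",
        "Puma  concolor ",
        "Puma concolor (Linnaeus, 1771)",
        "Puma concolor L.",
        "Felis concolor Linnaeus, 1771",  # a synonym: not merged by this heuristic
    ]
    for canonical, strings in group_name_strings(verbatim).items():
        print(canonical, len(strings))
```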

Presentation Transcript

  • EOL Content Summit, Barro Colorado Island, Panama; Global Biodiversity Information Facility; David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF); January 2012
  • GBIF and its parts
  • GBIF is composed of countries
  • GBIF Governing Board
  • GBIF Organisation
  • GBIF Participant Countries
  • Why is this important?
    • Build capacity in contributing countries
    • Support creation of national biodiversity information facilities
    • Serve as a means to mobilise and discover biodiversity data, not an end in itself
      • Not just a single portal application
  • GBIF Data Scope
  • PRIMARY BIODIVERSITY DATA
  • SPECIES INFORMATION
  • SPECIES INFORMATION: Distribution, Species Descriptions, Classification, Synonymy, Bibliography, Specimens, Common Names, Images, Annotated Species Checklists, General Descriptions, Morphology, Behavior, Conservation, Diagnostic, Reproduction
  • GBIF Infrastructure Components
  • GBIF IS A FEDERATED NETWORK: a “network of networks”
  • Heterogeneous biodiversity databases
  • Standard Formats/Protocols set the scope of the network
  • Darwin Core Archives: a text-based solution to publishing biodiversity data
  • Core data file: each row = 1 taxon
    • CSV or TAB
    • Easily exported from DB
    • Easy to import into Excel
    • Classification
    • Synonymy
    • Checklist parts
  • Extending Darwin Core: one-to-many extensions
    • Extensions defined via simple schema
    • Darwin Core or other terms
    • Linked to controlled vocabularies
    • One taxon, many extension records
    • Simple to Export
    • Simple to Manage
    • Supports sharing of EOL content
    Diagram: Taxon core linked one-to-many to Taxon Descriptions and Distribution extensions
  • The archive is a stand-alone data file: no complicated protocols required; data is shared with URLs
  • Standards-based data publishing: data publishing tools; user guides, references, best practices
  • Integrated Publishing Toolkit
  • Easy to customise/internationalise
  • Knowledge Organisation System: http://tools.gbif.org/resource-browser/
  • Common discovery system: http://gbrds.gbif.org
  • Access to resources: http://gbrds.gbif.org/registry/service.json?type=DWC-ARCHIVE-CHECKLIST
  • GLOBAL DATA INDEX
  • DATA PORTAL: DISCOVERY, ACCESS
  • 317,199,241 data records; 9,290 datasets; 6,112,683 “names”
  • Nodes Portal Toolkit: http://npt-demo.gbif.org/
  • EOL discussion points
  • How can EOL and GBIF simplify the process of mobilisation and discovery of biodiversity data/content?
  • Leverage and contribute to a common biodiversity data mobilisation network
  • Dataset Registry
  • Adoption of DwC-A for some EOL resources, particularly those within the common scope of EOL and GBIF
  • Develop shared vocabularies and internationalise them
  • Darwin Core Archive-related documentation
  • IPT and other publishing tools: one tool, multiple data types; GBIF support of Plinian Core and Audubon Core; see: Customizing the IPT
  • Other ideas? [email_address]
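The "Core data file" slide above describes a CSV or tab-delimited table with one row per taxon, from which a checklist's classification and synonymy can be read. A minimal sketch, assuming the file uses the Darwin Core terms `parentNameUsageID` (for classification) and `acceptedNameUsageID` (for synonymy); whether a given checklist is structured exactly this way is an assumption, and the sample rows are made up.

```python
import csv
import io

# A made-up tab-delimited core file: one row per taxon, columns named by Darwin Core terms.
SAMPLE_CORE = (
    "taxonID\tscientificName\ttaxonRank\tparentNameUsageID\tacceptedNameUsageID\n"
    "1\tFelidae\tfamily\t\t\n"
    "2\tPuma\tgenus\t1\t\n"
    "3\tPuma concolor\tspecies\t2\t\n"
    "4\tFelis concolor\tspecies\t\t3\n"
)


def read_core(text: str, delimiter: str = "\t") -> dict[str, dict]:
    """Load a tab-delimited (or, with delimiter=",", CSV) core file, keyed by taxonID."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["taxonID"]: row for row in reader}


def classification(taxa: dict[str, dict], taxon_id: str) -> list[str]:
    """Follow parentNameUsageID links up to the root, returning names root-first."""
    path, seen = [], set()
    current = taxa.get(taxon_id)
    while current is not None and current["taxonID"] not in seen:
        seen.add(current["taxonID"])  # guard against cyclic parent links
        path.append(current["scientificName"])
        current = taxa.get(current["parentNameUsageID"])
    return path[::-1]


def synonyms(taxa: dict[str, dict], accepted_id: str) -> list[str]:
    """Names whose acceptedNameUsageID points at the given accepted taxon."""
    return [t["scientificName"] for t in taxa.values()
            if t["acceptedNameUsageID"] == accepted_id]


if __name__ == "__main__":
    taxa = read_core(SAMPLE_CORE)
    print(classification(taxa, "3"))  # ['Felidae', 'Puma', 'Puma concolor']
    print(synonyms(taxa, "3"))        # ['Felis concolor']
```

Because each row is a flat record, the same file can be produced by a simple database export and opened in a spreadsheet, which is the point the slide makes.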
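The "Extending Darwin Core" slide describes extension files whose records link one-to-many to core taxa, for example several distribution records for one taxon. A minimal sketch of that join, assuming the extension's linking column is also named `taxonID`; in a real archive the linking column is declared in the archive descriptor, and the `locality` column and sample rows here are illustrative assumptions.

```python
import csv
import io
from collections import defaultdict

# Made-up sample files. Extension rows refer back to a core taxonID; in a real
# archive the linking column is declared in its descriptor, here it is assumed.
CORE = "taxonID\tscientificName\n3\tPuma concolor\n5\tBufo bufo\n"
DISTRIBUTION = "taxonID\tlocality\n3\tPanama\n3\tArgentina\n5\tFrance\n"


def read_tsv(text: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def attach_extension(core_rows: list[dict], ext_rows: list[dict],
                     key: str = "taxonID") -> dict[str, dict]:
    """Group extension records under their core record: one taxon, many records."""
    by_core_id = defaultdict(list)
    for row in ext_rows:
        by_core_id[row[key]].append(row)
    return {
        core[key]: {"core": core, "extension": by_core_id.get(core[key], [])}
        for core in core_rows
    }


if __name__ == "__main__":
    joined = attach_extension(read_tsv(CORE), read_tsv(DISTRIBUTION))
    for item in joined.values():
        places = [row["locality"] for row in item["extension"]]
        print(item["core"]["scientificName"], places)
```

Keeping each extension in its own flat file means a publisher can add, say, descriptions or distributions without changing the core taxon file.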
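The slides describe a Darwin Core Archive as a stand-alone data file that needs no special protocol and is shared simply by placing it at a URL. The sketch below packs a core file and a `meta.xml` descriptor into one zip in memory, then uses the descriptor to locate the core file and its `scientificName` column. The descriptor follows the general shape of a Darwin Core Archive `meta.xml`, but treat its attribute details as assumptions to check against the Darwin Core text guidelines; for brevity the reader assumes tab-delimited fields rather than parsing `fieldsTerminatedBy`.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

NS = "http://rs.tdwg.org/dwc/text/"  # namespace of the archive descriptor

META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxon.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/taxonID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>
"""


def build_archive(taxon_tsv: str) -> bytes:
    """Pack the core data file plus its meta.xml descriptor into one zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("taxon.txt", taxon_tsv)
    return buf.getvalue()


def read_core_names(archive: bytes) -> list[str]:
    """Use meta.xml to find the core file and its scientificName column."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(f"{{{NS}}}core")
        location = core.find(f"{{{NS}}}files/{{{NS}}}location").text
        name_index = next((int(f.get("index")) for f in core.findall(f"{{{NS}}}field")
                           if f.get("term", "").endswith("/scientificName")), None)
        if name_index is None:
            raise ValueError("meta.xml declares no scientificName column")
        skip = int(core.get("ignoreHeaderLines", "0"))
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))[skip:]  # tab assumed
    return [row[name_index] for row in rows]


if __name__ == "__main__":
    data = build_archive("taxonID\tscientificName\n1\tPuma concolor\n2\tBufo bufo\n")
    print(read_core_names(data))  # ['Puma concolor', 'Bufo bufo']
```

Because the descriptor travels inside the archive, a consumer needs nothing but the file itself to interpret the columns.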
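The slides call for extension terms linked to controlled vocabularies and for shared vocabularies that are internationalised. A minimal sketch of what that can mean in practice: each concept carries labels in several languages, displays fall back to English, and values supplied in any supported language map back to one shared concept. The vocabulary content (present/absent values, as might be used for an occurrence status) and the function names are hypothetical.

```python
from typing import Optional

# Hypothetical controlled vocabulary: each concept has labels in several languages.
VOCABULARY: dict[str, dict[str, str]] = {
    "present": {"en": "present", "es": "presente", "fr": "présent"},
    "absent": {"en": "absent", "es": "ausente"},
}


def label(concept: str, lang: str, fallback: str = "en") -> str:
    """Display label for a concept, falling back to English if no translation exists."""
    labels = VOCABULARY[concept]
    return labels.get(lang, labels[fallback])


def to_concept(text: str) -> Optional[str]:
    """Map a publisher's value in any supported language back to the shared concept."""
    needle = text.strip().lower()
    for concept, labels in VOCABULARY.items():
        if needle in (value.lower() for value in labels.values()):
            return concept
    return None


if __name__ == "__main__":
    print(label("present", "es"))    # presente
    print(label("absent", "fr"))     # absent (English fallback)
    print(to_concept(" Presente "))  # present
```

Storing the concept identifier in the data and the translations in the vocabulary keeps records comparable across publishers regardless of the language they work in.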