GLOBAL BIODIVERSITY INFORMATION FACILITY
David Remsen, Senior Programme Officer, GBIF
15 September 2009, Biodiversity Informatics
WWW.GBIF.ORG
Global Names Architecture
A Rationale
Brief History
Components
Biodiversity Information: A focus on taxa
“All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.” (Grimaldi & Engel, 2005, Evolution of the Insects)
Biodiversity Informatics: creation, curation, discovery, and delivery of biodiversity information
A name that serves as a link to what has been learned in the past… From T.E. Glover, The Fishes of Southwestern Japan, c.1870
A name that serves as a link to what has been learned in the past… Unlike many other domains of science, historic publications have continued importance.
… and that we today add to the body of knowledge. From T.E. Glover, The Fishes of Southwestern Japan, c.1870
GBIF index: 177 million records (> 5%/month). Gigabytes of text (~100 now). All data mobilized through GBIF.
Biodiversity Information: species information “tied” to scientific names
The “Names Problem”
Not Stable
5-10% names invalidated/decade
Not unique
No complete list of names
No complete list of species
No agreement on how many
Even within a single group
Impacts discovery and access of information about species
The “Names Problem”
Properties of Names
Orthographic (As labels of text that are “tied” to information about species)
Nomenclature (As the core “words” of taxonomy that tie a name to an original publication and type)
Taxonomy (As components of taxon definitions derived via authoritative taxonomic rigor)
Orthography
Orthography and the Names Problem
Objectives for Remediation
Variations in name spelling: Loligo pealeii, Loligo pealii, Loligo pealei
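Spelling variants like these can often be caught with approximate string matching. The following is a minimal sketch using Python's standard `difflib` module; the reference list, the function name `closest_reference_name`, and the 0.85 cutoff are assumptions made for illustration, and "Loligo pealeii" is simply treated here as the reference spelling.

```python
import difflib

# Hypothetical reference list. A real service would query a nomenclatural
# source rather than a hard-coded list.
REFERENCE_NAMES = ["Loligo pealeii", "Peranema trichophorum", "Oenanthe oenanthe"]


def closest_reference_name(candidate, reference=REFERENCE_NAMES, cutoff=0.85):
    """Return the most similar reference name, or None if none is close enough."""
    matches = difflib.get_close_matches(candidate, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for variant in ["Loligo pealeii", "Loligo pealii", "Loligo pealei"]:
    print(variant, "->", closest_reference_name(variant))
```

A similarity cutoff is a blunt instrument: set too low, it will merge names that are different but similarly spelled, so matches of this kind would normally be flagged for review rather than accepted automatically.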
Agalinus paupercula borealis
Agalinus pauperculum borealis
Agalinis paupercula var. Borealis
Agalinus pauperculum var. borealis
Agalinus paupercula var. borealis
Agalinus paupercula var. borealis Pennell
Agalinus paupercula Britton var. borealis Pennell
Agalinus paupercula (Gray) Britt. var. borealis Pennell
Agalinis paupercula (A.Gray) Britton var. borealis Pennell
Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934
Gerardia paupercula borealis
Gerardia paupercula var. borealis
Gerardia paupercula var. borealis (Pennell) Deam
Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam
Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell
Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell
Gerardia paupercula Britton ssp. borealis Pennell
Many ways to correctly spell a name. Should GBIF/EoL/BHL display all/one/some?
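Many of the variants above differ only in authorship, rank markers, or capitalisation. One way to group them is to reduce each string to a canonical form (genus plus epithets). The sketch below is a deliberately naive illustration, not a full name parser: `canonical_form` and `RANK_MARKERS` are hypothetical names, and the heuristic assumes that author names start with a capital letter or contain punctuation, which fails for authors such as "de Candolle".

```python
import re
from collections import defaultdict

# Rank markers to drop; an assumed, incomplete list.
RANK_MARKERS = {"var.", "subsp.", "ssp.", "f."}


def canonical_form(name):
    """Strip authorship, years and rank markers, keeping genus and epithets."""
    # Remove parenthesised authorship such as "(A. Gray)" or "(Pennell)".
    tokens = re.sub(r"\([^)]*\)", " ", name).split()
    if not tokens:
        return ""
    parts = [tokens[0].capitalize()]
    after_rank = False
    for token in tokens[1:]:
        lowered = token.lower()
        if lowered in RANK_MARKERS:
            after_rank = True
            continue
        # Keep all-lowercase words, plus whatever directly follows a rank marker
        # (this rescues the miscapitalised "var. Borealis").
        if after_rank or re.fullmatch(r"[a-z-]+", token):
            parts.append(lowered)
        after_rank = False
    return " ".join(parts)


variants = [
    "Agalinus paupercula var. borealis Pennell",
    "Agalinus paupercula (Gray) Britt. var. borealis Pennell",
    "Agalinis paupercula var. Borealis",
    "Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell",
    "Gerardia paupercula borealis",
]

groups = defaultdict(list)
for variant in variants:
    groups[canonical_form(variant)].append(variant)

for canonical, members in groups.items():
    print(canonical, len(members))
```

Canonicalisation removes the authorship and rank noise but leaves genuine spelling differences (Agalinus versus Agalinis) and different combinations (Gerardia versus Agalinis) as separate groups; those need approximate matching and nomenclatural links respectively.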
Objectives
Informatics can contribute
Index names occurring in content we wish to publicise and access
Develop tools to extract, catalog, and match names.
Reconcile names to authoritative names sources via a common resolution path
Reconcile name occurrence to taxonomic concepts via a common concept resolution path
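The objectives above (index the names that occur in content, then reconcile them against an authoritative source) can be sketched in a few lines. Everything here is an illustrative assumption: the regular expression is a crude binomial detector, `known_genera` stands in for a real name-finding lexicon, and the record identifier `rec-1` is invented.

```python
import re
from collections import defaultdict

# Crude pattern for "Genus epithet"; real name finding needs far more care.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{2,})\b")


def extract_names(text, known_genera):
    """Return candidate binomials whose genus appears in a known-genus lexicon."""
    return [f"{genus} {epithet}"
            for genus, epithet in BINOMIAL.findall(text)
            if genus in known_genera]


def build_index(documents, known_genera):
    """Map each name string to the set of document ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for name in extract_names(text, known_genera):
            index[name].add(doc_id)
    return index


def reconcile(index, authority):
    """Split indexed names into those found in the authority and those not."""
    resolved, unresolved = {}, []
    for name in sorted(index):
        if name in authority:
            resolved[name] = authority[name]
        else:
            unresolved.append(name)
    return resolved, unresolved


documents = {
    "doc1": "The squid Loligo pealeii was collected offshore.",
    "doc2": "Specimens of Loligo pealii and Loligo pealeii were compared.",
}
authority = {"Loligo pealeii": "rec-1"}  # hypothetical authoritative record id

index = build_index(documents, {"Loligo"})
resolved, unresolved = reconcile(index, authority)
print(resolved)
print(unresolved)
```

Names left in `unresolved` are the candidates for the spelling and concept reconciliation steps; keeping them in the index rather than discarding them preserves the link to the content in which they occur.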
Nomenclature
Nomenclatural aspects of the names problem.
Approaches for remediating them
Don’t pass on bad information. How can we determine the status of the names we discover in content that we serve?
Nomenclatural changes impact search and retrieval. Where can I find out these names are related? The Zoological Code doesn’t track recombinations; the Botanical Code does.
Homonyms
Peranema – the fern
Peranema – the euglenid
How many Peranema are there? How can I tell them apart?
Homonyms: Taxonomic context alone doesn’t tell me enough.
Kingdom > Phylum > Class > Order > Family > Genus (some records omit ranks)
Plantae > Magnoliophyta > Magnoliopsida > Apiales > Umbelliferae > Oenanthe
Plantae > Oenanthe > Oenanthe
Plantae > Magnoliophyta > Magnoliopsida > Apiales > Apiaceae > Oenanthe
Plantae > Orchidaceae > Oenanthe
Animalia > Chordata > Aves > Passeriformes > Muscicapidae > Oenanthe
Animalia > Chordata > Aves > Passeriformes > Turdidae > Oenanthe
Animalia > Chordata > Actinopterygii > Perciformes > Pomatomidae > Pomatomus
Animalia > Chordata > Pisces > Perciformes > Serranidae > Pomatomus
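The point of the table can be made concrete: counting distinct classification contexts overstates the number of homonyms, because the same name is filed under different families by different sources. The sketch below uses four of the Oenanthe rows; the `nom:` identifiers are invented for illustration and stand in for entries in a shared nomenclatural dictionary.

```python
from collections import defaultdict

# (kingdom, family, genus, nomenclatural_id). The identifiers are made up.
RECORDS = [
    ("Plantae", "Umbelliferae", "Oenanthe", "nom:1"),
    ("Plantae", "Apiaceae", "Oenanthe", "nom:1"),
    ("Animalia", "Muscicapidae", "Oenanthe", "nom:2"),
    ("Animalia", "Turdidae", "Oenanthe", "nom:2"),
]


def classification_contexts(records):
    """Group the distinct (kingdom, family) contexts seen for each genus."""
    contexts = defaultdict(set)
    for kingdom, family, genus, _ in records:
        contexts[genus].add((kingdom, family))
    return contexts


def distinct_names(records):
    """Group the distinct nomenclatural identifiers seen for each genus."""
    names = defaultdict(set)
    for _, _, genus, nom_id in records:
        names[genus].add(nom_id)
    return names


print(len(classification_contexts(RECORDS)["Oenanthe"]))  # 4 contexts
print(len(distinct_names(RECORDS)["Oenanthe"]))           # 2 homonyms
```

With identifiers attached to the records, the plant genus and the bird genus separate cleanly regardless of which family each data provider uses.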
Approaches to remediation
Consolidate the major nomenclatural databases
A single nomenclatural dictionary
Populate with provisionally verified records and enable open annotation
Provides nomenclatural status of a name
Collectively identifies all homonyms. Identifiers used in taxonomic data provide disambiguation context
Ties all distinct nomenclatural combinations to the original published name.
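Tying every combination to its original published name allows a search on one combination to retrieve the others. A minimal sketch follows, assuming a simple lookup table; `ORIGINAL_NAME` and `expand_query` are hypothetical, and the choice of the Agalinis name as the original is inferred from the parenthesised "(Pennell)" authorship in the Gerardia combinations listed earlier.

```python
# combination -> original published name (illustrative entries only)
ORIGINAL_NAME = {
    "Agalinis paupercula var. borealis": "Agalinis paupercula var. borealis",
    "Gerardia paupercula var. borealis": "Agalinis paupercula var. borealis",
    "Gerardia paupercula subsp. borealis": "Agalinis paupercula var. borealis",
}


def expand_query(name):
    """Return every known combination that shares the query's original name."""
    original = ORIGINAL_NAME.get(name)
    if original is None:
        return {name}  # unknown name: search for it unchanged
    return {combo for combo, orig in ORIGINAL_NAME.items() if orig == original}


print(sorted(expand_query("Gerardia paupercula var. borealis")))
```

This links names by nomenclature only; whether the combinations refer to the same taxon is a separate, taxonomic question.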
Informatics
Promote global identifiers and simple resolution pathway for these data
Consequences of Splitting. Taxon Concept problem: What does someone mean when they refer to P. carinii?
The Perils of Lumping
Bear Lodge meadow jumping mouse: Zapus hudsonius campestris
Preble’s meadow jumping mouse: Zapus hudsonius preblei
Dr. Rob Roy Ramey says Zapus hudsonius campestris INCLUDES Zapus hudsonius preblei.
Dr. Tim King says it DOES NOT INCLUDE Zapus hudsonius preblei.
What should a search for “Zapus hudsonius campestris” return?
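One way to frame the question is that the answer depends on whose taxon concept is used. The sketch below models a concept as a name plus the authority it is "according to"; the `CONCEPTS` table and `names_for_search` are hypothetical, and the two entries simply encode the two positions attributed to Ramey and King on this slide.

```python
# (name, according_to) -> names included in that taxon concept
CONCEPTS = {
    ("Zapus hudsonius campestris", "Ramey"): {
        "Zapus hudsonius campestris",
        "Zapus hudsonius preblei",
    },
    ("Zapus hudsonius campestris", "King"): {
        "Zapus hudsonius campestris",
    },
}


def names_for_search(name, according_to):
    """Expand a name to every name included under the chosen concept."""
    return CONCEPTS.get((name, according_to), {name})


print(sorted(names_for_search("Zapus hudsonius campestris", "Ramey")))
print(sorted(names_for_search("Zapus hudsonius campestris", "King")))
```

The same name string yields two different result sets, which is why a name usage needs to be resolvable to a defined concept and not only to a nomenclatural record.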
Different taxonomic views, different numbers of species, different names
Taxonomic Backbones: Scope and completeness
Organisational value of Non-Taxonomic Lists
Approaches to remediation
An inventory of different taxonomic catalogues
Inform if there are concept issues for the species
Provide synonymised taxon concepts with unique and resolvable identifiers
Multiple classifications via checklists and catalogues accessible and utilised as organisational frameworks for species information
Summary
A data publication framework that enables
A complete index of all names that are tied to information about species
Tools and infrastructure to support this.
A complete index of verified nomenclature and an identification and resolution system to make it easy to tie a name to an authoritative record.
A global taxonomic resolution system that allows a particular usage of a name to be tied to a defined taxon.
A system that puts taxonomy as a global organisational framework for species information.
Inventory and Index
uBio Indexes
Web Service outputs Taxon Object
Web Service calls from client applications
Taxonomic organisation of content
Indexes support processes that support discovery
That enable new and better tools and services
Formalise the Architecture
Coordinate Communities of Interest
Summary: GNA Objectives
A complete index of names tied to information about species reconciled to a common and verified nomenclatural dictionary.
This same dictionary forms the basis for multiple expressions of taxonomic catalogues, regional checklists, and thematic lists of species.
These lists are openly accessible and tied to services and processes that enable them to be effectively employed in data organisation and retrieval.
Collectively, these components serve the delivery and utilisation of biological knowledge.
David Remsen lecture on Tuesday, Sept 15, 2009, for the Biodiversity Informatics Course, a Swedish Taxonomy Initiative (Svenska Artprojektet) course at the Swedish Natural History Museum, Stockholm, supported by the Swedish Species Service (ArtDatabanken) and the Swedish GBIF node.