Remsen Lect04



David Remsen lecture on Tuesday, Sept 15, 2009, for the Biodiversity Informatics Course, a Swedish Taxonomy Initiative (Svenska Artprojektet) course at the Swedish Natural History Museum, Stockholm, supported by the Swedish Species Service (ArtDatabanken) and the Swedish GBIF node.

Published in: Technology, Education

  1. 1. GLOBAL BIODIVERSITY INFORMATION FACILITY (WWW.GBIF.ORG). David Remsen, Senior Programme Officer, GBIF. 15 September 2009, Biodiversity Informatics. Global Names Architecture: A Rationale, Brief History, Components
  2. 2. Biodiversity Information: A focus on taxa. “All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.” (Grimaldi & Engel, 2005, Evolution of the Insects). Biodiversity Informatics: creation, curation, discovery, and delivery of biodiversity information
  3. 3. A name that serves as a link to what has been learned in the past… From T.E. Glover, The Fishes of Southwestern Japan, c.1870
  4. 4. A name that serves as a link to what has been learned in the past… Unlike many other domains of science, historic publications have continued importance.
  5. 5. … and that we today add to the body of knowledge. From T.E. Glover, The Fishes of Southwestern Japan, c.1870
  6. 6. GBIF index: 177 million records (> 5%/month). Gigabytes of text (~100 now). All data mobilized through GBIF
  7. 7. Biodiversity Information Species information “tied” to scientific names
  8. 8. The “Names Problem”
        • Not stable
            • 5-10% of names invalidated per decade
        • Not unique
        • No complete list of names
        • No complete list of species
            • No agreement on how many
            • Even within a single group
        • Impacts discovery and access of information about species
  9. 9. The “Names Problem”
        • Properties of names
            • Orthographic (as labels of text that are “tied” to information about species)
            • Nomenclature (as the core “words” of taxonomy that tie a name to an original publication and type)
            • Taxonomy (as components of taxon definitions derived via authoritative taxonomic rigor)
  10. 10. Orthography
        • Orthography and the Names Problem
        • Objectives for remediation
  11. 11. Variations in name spelling Loligo pealeii Loligo pealii Loligo pealei
  12. 12. Some names are harder to spell than others
        Actinobacillus actimomycetemcomitans
        Actinobacillus actimycetemcomitans
        Actinobacillus actinmycetemcomitans
        Actinobacillus actinomicetemcomitans
        Actinobacillus actinomy
        Actinobacillus actinomyce
        Actinobacillus actinomycemcomitans
        Actinobacillus actinomyceremcomitans
        Actinobacillus actinomycetam
        Actinobacillus actinomycetamcomitans
        Actinobacillus actinomycetecomitans
        Actinobacillus actinomycetemcmitans
        Actinobacillus actinomycetemcomintans
        Actinobacillus actinomycetemcomitance
        Actinobacillus actinomycetemcomitans
        Actinobacillus actinomycetemcomitants
        Actinobacillus actinomycetemcommitans
        Actinobacillus actinomycetemocimitans
        Actinobacillus actinomycetencomitans
        Actinobacillus actinomycetum
        Actinobacillus actinomyctemcomitans
        Actinobacillus actinomyectomcomitans
        Actinobacillus actinomyetemcomitans
        Actinobacillus actinonmycetemcomitans
        Actinobacillus actionomycetemcomitans
        Actinobacillus actynomicetemcomitans
        Actinobacillus antinomycetemcomitans
        • Difficulties with Latinized names
        • Transcription errors
        Which one is the correct one?
  13. 13. Many ways to correctly spell a name
        Agalinus paupercula borealis
        Agalinus pauperculum borealis
        Agalinis paupercula var. Borealis
        Agalinus pauperculum var. borealis
        Agalinus paupercula var. borealis
        Agalinus paupercula var. borealis Pennell
        Agalinus paupercula Britton var. borealis Pennell
        Agalinus paupercula (Gray) Britt. var. borealis Pennell
        Agalinis paupercula (A.Gray) Britton var. borealis Pennell
        Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934
        Gerardia paupercula borealis
        Gerardia paupercula var. borealis
        Gerardia paupercula var. borealis (Pennell) Deam
        Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
        Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam
        Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam
        Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell
        Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell
        Gerardia paupercula Britton ssp. borealis Pennell
        Should GBIF/EoL/BHL display all/one/some?
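One way to see how the many renderings on slide 13 could be grouped is to reduce each string to a bare "canonical" form (genus plus epithets, without authorship or rank markers). The following is a minimal sketch, not the method used by GBIF or uBio: the function name `canonical_form`, the small `RANK_MARKERS` set, and the token rules are all assumptions made for illustration. The heuristic is naive: lowercase author particles such as "de" or "ex" would be kept by mistake, and real name parsing needs far more care.

```python
# Hypothetical helper: strips authorship and rank markers from a name string.
# The marker list is an assumption and is deliberately incomplete.
RANK_MARKERS = {"var.", "subsp.", "ssp."}


def canonical_form(name: str) -> str:
    """Return genus plus lowercase epithets, dropping authors, years, and rank markers."""
    tokens = name.split()
    if not tokens:
        return ""
    kept = [tokens[0]]  # first token is assumed to be the genus
    after_rank = False
    for tok in tokens[1:]:
        if tok.lower() in RANK_MARKERS:
            after_rank = True  # the next token is the infraspecific epithet
            continue
        if after_rank:
            kept.append(tok.lower())  # tolerate a capitalised epithet such as "Borealis"
            after_rank = False
        elif tok.isalpha() and tok.islower():
            kept.append(tok)  # plain lowercase word: treated as an epithet
        # anything else ("(Gray)", "Britt.", "Pennell", "1934") is treated as authorship
    return " ".join(kept)


if __name__ == "__main__":
    for raw in [
        "Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam",
        "Gerardia paupercula Britton ssp. borealis Pennell",
        "Agalinis paupercula var. Borealis",
    ]:
        print(canonical_form(raw))
```

Under this sketch the Gerardia strings collapse to one key and the Agalinis strings to another; the remaining spelling differences (Agalinus, pauperculum) would still need fuzzy matching or a curated lookup.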
  14. 14. Objectives
        • Informatics can contribute
            • Index names occurring in content we wish to publicise and access
            • Develop tools to extract, catalog, and match names
            • Reconcile names to authoritative names sources via a common resolution path
            • Reconcile name occurrences to taxonomic concepts via a common concept resolution path
  15. 15. Nomenclature
        • Nomenclatural aspects of the names problem
        • Approaches for remediating them
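The "match names" and "reconcile to authoritative sources" objectives can be illustrated with approximate string matching. This is only a sketch under stated assumptions: `REFERENCE_NAMES` is a tiny stand-in for an authoritative names source, `suggest_name` and the 0.8 cutoff are hypothetical choices, and Python's standard `difflib` is used here purely for convenience rather than because the lecture names it.

```python
from difflib import get_close_matches
from typing import Optional

# Stand-in for an authoritative names source (assumption: a real service
# would hold millions of verified names, not two).
REFERENCE_NAMES = [
    "Actinobacillus actinomycetemcomitans",
    "Loligo pealeii",
]


def suggest_name(candidate: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the most similar reference name, or None if none is close enough."""
    matches = get_close_matches(candidate, REFERENCE_NAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for spelling in ["Loligo pealii", "Actinobacillus actinomycetemcomitants", "Homo sapiens"]:
        print(spelling, "->", suggest_name(spelling))
```

A suggestion from such a matcher is only a candidate: a human or a curated source still has to confirm that the variant really refers to the reference name, since two valid names can differ by a single letter.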
  16. 16. Don’t pass on bad information. How can we determine the status of the names we discover in content that we serve?
  17. 17. Nomenclatural changes impact search and retrieval. Where can I find out that these names are related? The Zoological Code doesn’t track recombinations; the Botanical Code does.
  18. 18. Nomenclatural changes impact search and retrieval
  19. 19. Homonyms. Peranema, the fern; Peranema, the euglenid. How many Peranema are there? How can I tell them apart?
  20. 20. Homonyms. Taxonomic context alone doesn’t tell me enough.
        Kingdom Phylum Class Order Family Genus
        Plantae Magnoliophyta Magnoliopsida Apiales Umbelliferae Oenanthe
        Plantae Oenanthe Oenanthe
        Plantae Magnoliophyta Magnoliopsida Apiales Apiaceae Oenanthe
        Plantae Orchidaceae Oenanthe
        Animalia Chordata Aves Passeriformes Muscicapidae Oenanthe
        Animalia Chordata Aves Passeriformes Turdidae Oenanthe
        Animalia Chordata Actinopterygii Perciformes Pomatomidae Pomatomus
        Animalia Chordata Pisces Perciformes Serranidae Pomatomus
  21. 21. Approaches to remediation
        • Consolidate the major nomenclatural databases
            • A single nomenclatural dictionary
                • Populate with provisionally verified records and enable open annotation
            • Provides the nomenclatural status of a name
            • Collectively identifies all homonyms. Identifiers used in taxonomic data provide disambiguation context
            • Ties all distinct nomenclatural combinations to the original published name
        • Informatics
            • Promote global identifiers and a simple resolution pathway for these data
  22. 22. Taxonomy
        • Taxonomic examples of the Names Problem
        • Approaches for remediating them
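The idea that a single dictionary "collectively identifies all homonyms" while identifiers "provide disambiguation context" can be sketched with a few records. Everything structural here is assumed for illustration: the `NameRecord` type, the `name:N` identifiers, and the short context labels are hypothetical, and only the genus names themselves come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NameRecord:
    identifier: str  # hypothetical global identifier
    name: str        # the name string, which may be shared by several records
    context: str     # free-text hint, used here only for display


# Toy dictionary; identifiers and context labels are made up for this sketch.
DICTIONARY = [
    NameRecord("name:1", "Peranema", "fern"),
    NameRecord("name:2", "Peranema", "euglenid"),
    NameRecord("name:3", "Oenanthe", "plant genus"),
    NameRecord("name:4", "Oenanthe", "bird genus"),
    NameRecord("name:5", "Pomatomus", "fish genus"),
]


def find_homonyms(records: List[NameRecord]) -> Dict[str, List[NameRecord]]:
    """Group records by name string and keep only strings with more than one record."""
    by_name: Dict[str, List[NameRecord]] = defaultdict(list)
    for record in records:
        by_name[record.name].append(record)
    return {name: recs for name, recs in by_name.items() if len(recs) > 1}


def resolve(identifier: str, records: List[NameRecord]) -> Optional[NameRecord]:
    """Look up exactly one record by identifier; the identifier is unambiguous, the string is not."""
    for record in records:
        if record.identifier == identifier:
            return record
    return None


if __name__ == "__main__":
    for name, recs in find_homonyms(DICTIONARY).items():
        print(name, [f"{r.identifier} ({r.context})" for r in recs])
```

The design point is that data citing `name:2` rather than the bare string "Peranema" carries its own disambiguation, so a search does not have to guess from surrounding classification.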
  23. 23. Taxonomic synonyms Halichondria panicea (Pallas 1776) sec Van Soest 2002 (WoRMS)
  24. 24. Consequences of Splitting. Taxon concept problem: what does someone mean when they refer to P. carinii?
  25. 25. The Perils of Lumping. Bear Lodge meadow jumping mouse: Zapus hudsonius campestris. Preble’s meadow jumping mouse: Zapus hudsonius preblei. Dr. Rob Roy Ramey says Z. h. campestris INCLUDES Z. h. preblei; Dr. Tim King says it DOES NOT INCLUDE it. What should a search for “Zapus hudsonius campestris” return?
  26. 26. Different taxonomic views, different # species, different names. Taxonomic Backbones: Scope and completeness
  27. 27. Organisational value of Non-Taxonomic Lists
  28. 28. Approaches to remediation
        • An inventory of different taxonomic catalogues
            • Inform if there are concept issues for the species
        • Provide synonymised taxon concepts with unique and resolvable identifiers
        • Multiple classifications via checklists and catalogues, accessible and utilised as organisational frameworks for species information
  29. 29. Summary
        • A data publication framework that enables
            • A complete index of all names that are tied to information about species
                • Tools and infrastructure to support this
            • A complete index of verified nomenclature and an identification and resolution system to make it easy to tie a name to an authoritative record
            • A global taxonomic resolution system that allows a particular usage of a name to be tied to a defined taxon
        • A system that puts taxonomy as a global organisational framework for species information
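The jumping mouse example shows why a name alone is not enough and why "inform if there are concept issues" is useful. A rough sketch follows, assuming a taxon concept can be modelled as a name plus the authority it is used "according to" plus the set of names it includes. The `TaxonConcept` type and both function names are hypothetical, and the two circumscriptions are read from the lumping slide rather than from a published checklist.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class TaxonConcept:
    name: str                 # accepted name of the concept
    according_to: str         # whose taxonomic view this is
    includes: FrozenSet[str]  # names covered by this circumscription


CAMPESTRIS = "Zapus hudsonius campestris"
PREBLEI = "Zapus hudsonius preblei"

# Two competing views of the same name, as presented on the lumping slide.
CONCEPTS: List[TaxonConcept] = [
    TaxonConcept(CAMPESTRIS, "Ramey", frozenset({CAMPESTRIS, PREBLEI})),
    TaxonConcept(CAMPESTRIS, "King", frozenset({CAMPESTRIS})),
]


def names_to_search(name: str, according_to: str) -> Set[str]:
    """Expand a name into every name covered by the chosen taxonomic view."""
    for concept in CONCEPTS:
        if concept.name == name and concept.according_to == according_to:
            return set(concept.includes)
    return {name}  # no known concept: fall back to the literal name


def has_concept_issue(name: str) -> bool:
    """True when sources disagree about what the name includes."""
    circumscriptions = {c.includes for c in CONCEPTS if c.name == name}
    return len(circumscriptions) > 1


if __name__ == "__main__":
    print(names_to_search(CAMPESTRIS, "Ramey"))
    print(names_to_search(CAMPESTRIS, "King"))
    print(has_concept_issue(CAMPESTRIS))
```

In this model the answer to "what should a search return?" depends on which view the user selects, and `has_concept_issue` is the kind of flag a catalogue inventory could expose so that the disagreement is at least visible.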
  30. 30. Inventory and Index
  31. 31. uBio Indexes
  32. 32. Web Service outputs Taxon Object
  33. 33. Web Service calls from client applications
  34. 34. Taxonomic organisation of content
  35. 35. Taxonomic organisation of content
  36. 36. Indexes support processes that support discovery
  37. 37. That enable new and better tools and services
  38. 38. Formalise the Architecture
  39. 39. Coordinate Communities of Interest
  40. 40. Summary: GNA Objectives
        • A complete index of names tied to information about species, reconciled to a common and verified nomenclatural dictionary.
        • This same dictionary forms the basis for multiple expressions of taxonomic catalogues, regional checklists, and thematic lists of species.
        • These lists are openly accessible and tied to services and processes that enable them to be effectively employed in data organisation and retrieval.
        • Collectively, these components serve the delivery and utilisation of biological knowledge.
  41. 41. Thank you [email_address] Skype:dremsen