Remsen sherborne

David Remsen

Anchoring Biodiversity Information: From Sherborne to the 21st century and beyond. Biodiversity Informatics – GBIF's role in linking information through scientific names. David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF). 28 October 2011

“All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.”

GBIF IS A FEDERATED NETWORK A “network of networks”

310,132,149 data records 9,290 datasets 6,112,683 “names”

Change in suitability for cultivating common bean across the world, from present to 2020, showing a global loss in suitability, especially in Africa. Predict distribution changes

CBD Access & Benefit Sharing (Nagoya) Protocol

Challenges to presenting Taxon-oriented data

Tealia crassicornis (Müller) Urticina crassicornis (Müller)

Urticina felina (Linnaeus 1767) Tealia felina, Tealia felina, Urticina crassicornis, Urticina columbiana, Tealia crassicornis, Urticina felina, Urticina coriacea, Stomphia churchiae, Rhodactinia crassicornis, Tealia lofotensis, Leiotealia spetsbergensis, Madoniactis lofotensis, Tealia tuberculata, Bolocera eques, Tealia greenii, Cereus coriaceus, Actinia felina, Bunodes crassicornis, Actinea tuberculata, Actinea coriacea, Actinia gemmacea, Actinia crassicornis, Actinia dævisii, Actinia coriacea, Actinia holsatica,

UNTRUSTWORTHY SCIENCE Trochilidae UNTRUSTWORTHY TAXONOMY

Unraveling Homonyms Oenanthe Plantae Magnoliophyta Magnoliopsida Apiales Umbelliferae Oenanthe Plantae Oenanthe Oenanthe Plantae Magnoliophyta Magnoliopsida Apiales Apiaceae Oenanthe ? Orchidaceae Oenanthe Animalia Chordata Aves Passeriformes Muscicapidae Oenanthe Animalia Chordata Aves Passeriformes Turdidae Oenanthe

Difficult for user to interpret Accurate search results Yesterday Today Unraveling Homonyms

A need for nomenclators Actinobacillus actimomycetemcomitans Actinobacillus actimycetemcomitans Actinobacillus actinmycetemcomitans Actinobacillus actinomicetemcomitans Actinobacillus actinomy Actinobacillus actinomyce Actinobacillus actinomycemcomitans Actinobacillus actinomyceremcomitans Actinobacillus actinomycetam Actinobacillus actinomycetamcomitans Actinobacillus actinomycetecomitans Actinobacillus actinomycetemcmitans Actinobacillus actinomycetemcomintans Actinobacillus actinomycetemcomitance Actinobacillus actinomycetemcomitans Actinobacillus actinomycetemcomitants Actinobacillus actinomycetemcommitans Actinobacillus actinomycetemocimitans Actinobacillus actinomycetencomitans Actinobacillus actinomycetum Actinobacillus actinomyctemcomitans Actinobacillus actinomyectomcomitans Actinobacillus actinomyetemcomitans Actinobacillus actinonmycetemcomitans Actinobacillus actionomycetemcomitans Actinobacillus actynomicetemcomitans Actinobacillus antinomycetemcomitans … and TaxaMatch

Agalinus paupercula borealis Agalinus pauperculum borealis Agalinis paupercula var. Borealis Agalinus pauperculum var. borealis Agalinus paupercula var. borealis Agalinus paupercula var. borealis Pennell Agalinus paupercula Britton var. borealis Pennell Agalinus paupercula (Gray) Britt. var. borealis Pennell Agalinis paupercula (A.Gray) Britton var. borealis Pennell Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934 Issues of Orthography Reconciling different forms of the same name

Effective Biodiversity Informatics requires Taxonomic and nomenclatural authority files & services

A GLOBAL NAMES ARCHITECTURE Another federated network

1. Anchoring Biodiversity Information: From Sherborne to the 21st century and beyond. Biodiversity Informatics – GBIF's role in linking information through scientific names. David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF). 28 October 2011

2. BIODIVERSITY INFORMATICS

3. “All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.”

4. PRIMARY BIODIVERSITY DATA

5. PRIMARY BIODIVERSITY DATA

6. Primary Biodiversity Databases

7. GBIF IS A FEDERATED NETWORK A “network of networks”

8. COMMON COMMUNICATIONS REGISTRY

9. GLOBAL DATA INDEX

10. 310,132,149 data records 9,290 datasets 6,112,683 “names”

11. DATA PORTAL DISCOVERY ACCESS

12. Use of Primary Biodiversity Data

13. Build Provisional Species Lists

14. Validate range maps

15. Predict species distribution

16. Change in suitability for cultivating common bean across the world, from present to 2020, showing a global loss in suitability, especially in Africa. Predict distribution changes

17. CBD Access & Benefit Sharing (Nagoya) Protocol

18. Challenges to presenting Taxon-oriented data

19. Tealia crassicornis (Müller) Urticina crassicornis (Müller)

20. Urticina felina (Linnaeus 1767) Tealia felina, Tealia felina, Urticina crassicornis, Urticina columbiana, Tealia crassicornis, Urticina felina, Urticina coriacea, Stomphia churchiae, Rhodactinia crassicornis, Tealia lofotensis, Leiotealia spetsbergensis, Madoniactis lofotensis, Tealia tuberculata, Bolocera eques, Tealia greenii, Cereus coriaceus, Actinia felina, Bunodes crassicornis, Actinea tuberculata, Actinea coriacea, Actinia gemmacea, Actinia crassicornis, Actinia dævisii, Actinia coriacea, Actinia holsatica,

21. Gap in Synonymies

22. UNTRUSTWORTHY SCIENCE Trochilidae UNTRUSTWORTHY TAXONOMY

23. TRUSTED TAXONOMY BETTER SCIENCE

24. Unraveling Homonyms Oenanthe Plantae Magnoliophyta Magnoliopsida Apiales Umbelliferae Oenanthe Plantae Oenanthe Oenanthe Plantae Magnoliophyta Magnoliopsida Apiales Apiaceae Oenanthe ? Orchidaceae Oenanthe Animalia Chordata Aves Passeriformes Muscicapidae Oenanthe Animalia Chordata Aves Passeriformes Turdidae Oenanthe

25. Difficult for user to interpret Accurate search results Yesterday Today Unraveling Homonyms

26.

27. A need for nomenclators Actinobacillus actimomycetemcomitans Actinobacillus actimycetemcomitans Actinobacillus actinmycetemcomitans Actinobacillus actinomicetemcomitans Actinobacillus actinomy Actinobacillus actinomyce Actinobacillus actinomycemcomitans Actinobacillus actinomyceremcomitans Actinobacillus actinomycetam Actinobacillus actinomycetamcomitans Actinobacillus actinomycetecomitans Actinobacillus actinomycetemcmitans Actinobacillus actinomycetemcomintans Actinobacillus actinomycetemcomitance Actinobacillus actinomycetemcomitans Actinobacillus actinomycetemcomitants Actinobacillus actinomycetemcommitans Actinobacillus actinomycetemocimitans Actinobacillus actinomycetencomitans Actinobacillus actinomycetum Actinobacillus actinomyctemcomitans Actinobacillus actinomyectomcomitans Actinobacillus actinomyetemcomitans Actinobacillus actinonmycetemcomitans Actinobacillus actionomycetemcomitans Actinobacillus actynomicetemcomitans Actinobacillus antinomycetemcomitans … and TaxaMatch

28. Agalinus paupercula borealis Agalinus pauperculum borealis Agalinis paupercula var. Borealis Agalinus pauperculum var. borealis Agalinus paupercula var. borealis Agalinus paupercula var. borealis Pennell Agalinus paupercula Britton var. borealis Pennell Agalinus paupercula (Gray) Britt. var. borealis Pennell Agalinis paupercula (A.Gray) Britton var. borealis Pennell Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934 Issues of Orthography Reconciling different forms of the same name

29. Parsing

30. Dictionaries

31. Rapidly mines names from literature

32.

33.

Editor's Notes

Biodiversity informatics fills a space between traditional bioinformatics with its focus on genomes and Ecoinformatics that looks at entire landscapes and their interaction with the physical world. Biodiversity informatics focuses on taxa and their interactions among each other.
Nomenclature and taxonomy play a central role in the handling of biodiversity information because nearly every piece of information or data related to a species (or more specifically – a taxon) is labeled with a scientific name.
GBIF has a specific focus within biodiversity information in that our scope is restricted to the mobilisation, discovery, and use of primary biodiversity data. Primary biodiversity data are the digital text or multimedia data records that detail the instance of an organism – the ‘what, where, when, how and by whom’ of the organism’s occurrence and recording. One major class of primary biodiversity data is that derived from natural history collections.
A second class of primary biodiversity data originates with observations of species, and numerous observational data networks collect millions of species observations every year.
These different classes of biodiversity information are typically stored in databases of some sort that are hosted throughout the world. These databases may contribute to larger networks or act as standalone data access systems. In most cases, the data are made accessible on the Internet through a variety of gateways or portals.
GBIF represents a federated network that is composed of thousands of different primary biodiversity databases located all over the world.
Two things make all of these different databases part of the GBIF network. First, the data are made available on the Internet using a common set of communications protocols and data formats. Second, a registry, listing all members of the network and the location of the data itself (often a URL), serves as a master network directory.
The registry and communications protocols are utilised to poll each database in the network and retrieve an index of the biodiversity data records they contain. The index includes the key taxonomic, geospatial, and provenance elements of the data record. This allows the data to be visually represented, for instance, on a map of the Earth.
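The polling step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GBIF's actual harvester: the registry contents, the example.org URLs, the sample records, and the names IndexEntry, fetch_records, and build_index are all hypothetical. Only the field names scientificName, decimalLatitude, and decimalLongitude are borrowed from the Darwin Core vocabulary.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class IndexEntry:
    dataset_id: str       # provenance: which database the record came from
    scientific_name: str  # taxonomic element
    latitude: float       # geospatial elements
    longitude: float


# Hypothetical registry: dataset identifier -> access point URL.
REGISTRY = {
    "museum-collection": "https://example.org/museum",
    "bird-observations": "https://example.org/birds",
}

# Stand-in for the remote databases, keyed by access point (invented data).
FAKE_SOURCES = {
    "https://example.org/museum": [
        {"scientificName": "Urticina felina",
         "decimalLatitude": 60.4, "decimalLongitude": 5.3},
    ],
    "https://example.org/birds": [
        {"scientificName": "Oenanthe oenanthe",
         "decimalLatitude": 55.7, "decimalLongitude": 12.6},
        {"scientificName": "Parus caeruleus",
         "decimalLatitude": 51.5, "decimalLongitude": -0.1},
    ],
}


def fetch_records(endpoint: str) -> Iterable[dict]:
    """Pretend to poll one database; a real harvester would make a network call."""
    return FAKE_SOURCES.get(endpoint, [])


def build_index(registry: dict, fetch: Callable[[str], Iterable[dict]]) -> list:
    """Walk the registry and keep only the key elements of every record."""
    index = []
    for dataset_id, endpoint in registry.items():
        for record in fetch(endpoint):
            index.append(IndexEntry(
                dataset_id=dataset_id,
                scientific_name=record["scientificName"],
                latitude=record["decimalLatitude"],
                longitude=record["decimalLongitude"],
            ))
    return index


index = build_index(REGISTRY, fetch_records)
```

Passing the fetch function in as a parameter keeps the registry walk independent of whichever protocol a given database speaks.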
Currently the GBIF index stands at over 310 million records from over 9,000 different databases. Each of these records carries the name of the taxon, usually a species, that it is associated with. The total number of scientific names in this virtual dataset exceeds 6 million different text strings – far exceeding the number of known species. Correctly interpreting this list of names is a key requirement in enabling effective use of the index.
This graph shows the growth of the GBIF occurrence index since 2007.
Before I describe the challenges inherent to the index, I’d like to illustrate how biodiversity data has been used in various scientific and biodiversity policy-related contexts.
In this example, occurrence data from the GBIF network has been geospatially joined with world protected area boundaries to generate provisional species lists and data distribution summaries for the protected area.
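A geospatial join of this kind can be sketched in a few lines. The sketch below is only an illustration under simplifying assumptions: the protected area is approximated by a latitude/longitude bounding box (real boundaries are polygons), and the coordinates, the BoundingBox type, and provisional_species_list are hypothetical.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Crude stand-in for a protected-area boundary (assumption: a rectangle)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def provisional_species_list(occurrences, area: BoundingBox) -> list:
    """Return the sorted, de-duplicated names recorded inside the area."""
    return sorted({name for name, lat, lon in occurrences
                   if area.contains(lat, lon)})


# Invented occurrence records: (scientific name, latitude, longitude).
occurrences = [
    ("Oenanthe oenanthe", 55.70, 12.60),
    ("Oenanthe oenanthe", 55.72, 12.58),
    ("Parus caeruleus", 55.71, 12.61),
    ("Urticina felina", 60.40, 5.30),   # falls outside the area below
]
reserve = BoundingBox(min_lat=55.6, max_lat=55.8, min_lon=12.5, max_lon=12.7)
species = provisional_species_list(occurrences, reserve)
```

The list is only as good as the names on the records, which is why the naming problems discussed in these notes feed directly into outputs like this one.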
Occurrence data has been combined with IUCN species range maps both to validate the distribution and identify potential gaps in coverage.
Species occurrence data is geo-spatially integrated with additional data types such as climatic data to create an ecological profile for the species. Aquamaps uses ecological niche modeling to predict the distribution of marine species.
In the example illustrated here, the model outputs project changes in distribution of a crop species based on possible climate change scenarios.
Researchers at Lancaster University have used GBIF data mining tools and the occurrence index to extract over 65,000 species names from US and world patent indices and to determine the distribution of these species among the world's nations, in order to inform the Access and Benefit Sharing processes demanded by developing countries as a component of the Convention on Biological Diversity.
The uses illustrated here require access to primary biodiversity data that is organised around taxa – either species or higher groups like families. This organisation is challenged by a number of different factors, which I would like to illustrate.
In a federated data environment, specimens may be labeled with different names that refer to the same species. Here is an example of a pair of nomenclatural synonyms that are initially interpreted as distinct taxa and subsequently result in distinct occurrence data maps.
Access to authoritative synonymised species checklists, when properly annotated and interpreted, enables data records labeled with different names to be linked to the same taxon. This clearly affects the resulting data distribution output and any subsequent uses of these data. A challenge for GBIF has been gaining access to taxonomic authority files. Until recently the only major taxonomic data source was the Catalogue of Life – a wonderful resource, but one that only partially addressed this problem within the GBIF index.
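Linking records through a synonymised checklist amounts to a lookup from each recorded name to an accepted name before grouping. The following is a minimal sketch of that idea. The checklist is a small illustrative dictionary rather than an authoritative source, the choice of which name is treated as accepted is an assumption made for the example, and group_by_taxon is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative checklist: synonym -> accepted name (assumed for this example).
SYNONYM_TO_ACCEPTED = {
    "Tealia crassicornis": "Urticina crassicornis",
    "Parus caeruleus": "Cyanistes caeruleus",
}


def accepted_name(name: str) -> str:
    """Resolve a recorded name; names not in the checklist pass through unchanged."""
    return SYNONYM_TO_ACCEPTED.get(name, name)


def group_by_taxon(records):
    """Group (name, dataset) records under the accepted name of their taxon."""
    groups = defaultdict(list)
    for name, dataset in records:
        groups[accepted_name(name)].append((name, dataset))
    return dict(groups)


# Invented records from two hypothetical datasets.
records = [
    ("Tealia crassicornis", "dataset-a"),
    ("Urticina crassicornis", "dataset-b"),
    ("Urticina crassicornis", "dataset-a"),
]
grouped = group_by_taxon(records)
```

Without the lookup, the same three records would fall into two groups and produce two separate distribution maps.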
Edward Dickinson mentioned the problem of synonymy in birds, where compilations are scattered among a range of resources. A consequence of this is illustrated here: although the Catalogue of Life provides the correct name for the blue tit, it does not include the original combination of the name coined by Linnaeus and, as a consequence, misses the majority of occurrences in the index.
Without access to sufficient authoritative taxonomic data, we have been forced to rely on less-accurate classification data originating in occurrence datasets. These datasets often contain errors such as the one illustrated here, where a synonym of a European bird species was mistakenly placed in the hummingbird family. This creates knock-on effects that extend beyond the single species to the entire family.
With access to a more complete array of authoritative taxonomic sources, we are able to match more taxa and improve the taxonomic backbone used to organise and present species data records.
The lack of a comprehensive multi-regnal nomenclator means that we have no clear indication of the number of homonyms that exist, nor a method for determining which classification is ‘correct’. As a result, the GBIF index may present a confusing array of options to a user. Illustrated above is a typical case where we have a number of different Oenanthe but lack sufficient external taxonomic resources to reconcile this number any further.
Access to a wider array of nomenclatural sources reveals there are exactly two genera with this name and includes a common name to help distinguish them.
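One way to picture the reconciled result is a genus index in which each homonym is keyed by its kingdom and carries a common name for display. This is a sketch under assumptions: GENUS_INDEX and resolve_genus are hypothetical, and the two entries simply restate the plant and bird placements shown on the slide.

```python
# Hypothetical reconciled index: one entry per distinct genus sharing a name.
GENUS_INDEX = {
    "Oenanthe": [
        {"kingdom": "Plantae", "family": "Apiaceae",
         "common_name": "water dropworts"},
        {"kingdom": "Animalia", "family": "Muscicapidae",
         "common_name": "wheatears"},
    ],
}


def resolve_genus(genus: str, kingdom: str = None) -> list:
    """Return all genera with this name, narrowed by kingdom when one is given."""
    candidates = GENUS_INDEX.get(genus, [])
    if kingdom is not None:
        candidates = [c for c in candidates if c["kingdom"] == kingdom]
    return candidates


both = resolve_genus("Oenanthe")
birds = resolve_genus("Oenanthe", kingdom="Animalia")
```

A search interface could show the two candidates with their common names and let the user, or the kingdom recorded on a data record, pick between them.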
Difficulties with orthography in scientific names start at the source. Here are some examples of insect specimen labels that have been transcribed to electronic databases.
It may come as no surprise, therefore, to see the sort of variation that may exist in a federated dataset for some of the more complex scientific names. Considerable work has gone into the development of ‘fuzzy matching’ algorithms, notably Tony Rees’ TaxaMatch. But only authoritative nomenclatural sources can tell us which is the correctly spelled version of the name.
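To make the idea of fuzzy matching concrete, here is a small sketch that matches a misspelled name against an authority list using plain Levenshtein edit distance. It is not TaxaMatch, which is considerably more sophisticated. The authority list, the threshold of three edits, and the function names are assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_name(query: str, authority: list, max_distance: int = 3):
    """Return the nearest authoritative name, or None if nothing is close enough."""
    best = min(authority, key=lambda name: edit_distance(query, name))
    return best if edit_distance(query, best) <= max_distance else None


# Illustrative authority list; a real one would come from a nomenclator.
AUTHORITY = [
    "Actinobacillus actinomycetemcomitans",
    "Actinobacillus lignieresii",
    "Actinobacillus equuli",
]

match = closest_name("Actinobacillus actinomycetemcommitans", AUTHORITY)
```

The algorithm only ranks candidates by similarity. Knowing that the matched string is the correctly spelled one still depends on the authority list being right.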
Reconciling orthography and nomenclature presents problems beyond simple misspellings. Nomenclatural formats include authorship, infraspecific ranks, and other notation. For a computer, all of these strings represent different names, which makes it hard to organise data records properly in a federated environment.
Taxonomic name parsing services provide a solution for matching different forms of the same name whenever biodiversity data needs to be integrated from multiple sources. The service atomises a name into recognisable constituent parts and reassembles a simplified canonical form that will be equivalent for the different versions of the name.
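The atomise-and-reassemble step can be illustrated with a deliberately naive parser. This is a sketch, not the parsing service described in the talk: it assumes the canonical form is the genus plus lower-case epithets with rank markers and authorship dropped, it recognises only the rank markers listed in the code, and it will misread lower-case author particles such as "de" or "ex" as epithets. The function name canonical_form is hypothetical.

```python
import re

# Rank markers this sketch recognises (assumption: deliberately incomplete).
RANK_MARKERS = {"var.", "subsp.", "ssp.", "forma"}


def canonical_form(name: str) -> str:
    """Reduce a name string to genus plus epithets, dropping authors and ranks."""
    # Remove parenthesised authorship such as "(A.Gray)" or "(Pennell)".
    stripped = re.sub(r"\([^)]*\)", " ", name)
    tokens = stripped.split()
    if not tokens:
        return ""
    parts = [tokens[0].capitalize()]  # the first token is taken as the genus
    take_next = False                 # True right after a rank marker
    for token in tokens[1:]:
        if token.lower() in RANK_MARKERS:
            take_next = True
            continue
        # Epithets are all lower case; the token after a rank marker is
        # accepted as an epithet even if it was wrongly capitalised.
        if take_next or re.fullmatch(r"[a-z]+", token):
            parts.append(token.lower())
        take_next = False
    return " ".join(parts)


variants = [
    "Agalinus paupercula borealis",
    "Agalinus paupercula var. borealis",
    "Agalinus paupercula var. borealis Pennell",
    "Agalinus paupercula Britton var. borealis Pennell",
    "Agalinus paupercula (Gray) Britt. var. borealis Pennell",
    "Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934",
]
canonical = {canonical_form(v) for v in variants}
```

All six strings collapse to one canonical form. Spelling variants such as Agalinis versus Agalinus, or paupercula versus pauperculum, are untouched by parsing and still need fuzzy matching against an authoritative source.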
These name parsers – combined with authoritative nomenclatural data – extend the utility of this service by providing the raw materials for creating specialised taxonomic name dictionaries.
These dictionaries, combined with software, result in name-mining services that can locate scientific names in literature, on specimen labels, and in other full-text publications. Such services can rapidly and accurately extract all scientific names from large compilations of literature. They are employed by the BHL to develop taxonomic indices and by the CBD data mining example I cited earlier.
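A dictionary-driven name miner can be sketched as a scan for word pairs that look like binomials, keeping only those found in the names dictionary. This is a toy version under assumptions: the dictionary holds three names chosen for illustration, the pattern handles only unaccented two-part names, and find_names is a hypothetical function rather than the service used by the BHL.

```python
import re

# Illustrative names dictionary; a real one would be built from parsed,
# authoritative nomenclatural data.
NAME_DICTIONARY = {
    "Urticina felina",
    "Oenanthe oenanthe",
    "Parus caeruleus",
}

# A capitalised word followed by a lower-case word is a candidate binomial.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]+)\b")


def find_names(text: str, dictionary: set) -> list:
    """Return dictionary names in the order they occur in the text."""
    found = []
    for match in CANDIDATE.finditer(text):
        candidate = f"{match.group(1)} {match.group(2)}"
        if candidate in dictionary:
            found.append(candidate)
    return found


text = ("Specimens of Urticina felina were collected near Bergen, "
        "alongside a wheatear, Oenanthe oenanthe.")
names = find_names(text, NAME_DICTIONARY)
```

The dictionary lookup is what rejects ordinary phrases such as "Specimens of", which match the pattern but are not names.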
How do we facilitate this?
At GBIF we are working today on extending our architectural framework to serve as a contributor to a Global Names Architecture: a framework that supports the discovery of, and access to, a range of nomenclatural and taxonomic resources. It should enable the development of new integrated resources, such as a consolidated nomenclatural index that can serve as a core authoritative names dictionary to which different taxonomies may be tied. And it should promote the development of name services that enable taxonomy to serve as the core organisational framework for all biodiversity information. Thank you.
