GBIF Checklist Bank
Checklist index, name matching & backbone building
Markus Döring, GBIF
Uppsala, October 2016
Organizing Occurrences
• GBIF needs a single, consistent taxonomy
• for metrics, search, maps
• considerable variation in higher taxa
• synonymies can be very large
• Catalogue of Life largest single source
• ~90% of GBIF occurrence records (thanks to birds)
• ~60% of GBIF occurrence names (35% in 2010)
• GBIF needs to assemble a taxonomy
• originally merged (noisy) names found in occurrences, which resulted in many duplicates
• improved by stitching together checklist datasets
• include fossil names
Cronquist classification
Mimosaceae: 3,200 species
Caesalpiniaceae: 2,000 species
Fabaceae: 14,000 species
“Modern” classification
Fabaceae: 19,200 species
  Mimosoideae: 3,200 species
  Cæsalpinioideae: 2,000 species
  Faboideae: 14,000 species
Checklist Bank Goals
• Index to “checklists”, i.e. full taxonomies & simple name lists
• uniform API
• standard use of DwC terms
• Source for GBIF Backbone names
• better quality than using GBIF occurrence names
• also stores & serves the GBIF Backbone in the same way
• Name matching service for GBIF Backbone
• for dirty occurrence names
• for cleaner checklist names
• Link same names across checklists
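The name matching service is reached over plain HTTP. The helper below is a minimal sketch (the function name `match_url` is hypothetical) that only builds request URLs for the `/v1/species/match` endpoint, using the `name`, `kingdom` and `strict` parameters that appear in the example URLs of these slides. `urlencode` escapes the space in a name such as "Oenanthe L."; the sketch sends no request.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint as written in the slides; this sketch only builds URLs.
MATCH_ENDPOINT = "http://api.gbif.org/v1/species/match"


def match_url(name: str, kingdom: Optional[str] = None, strict: bool = False) -> str:
    """Return a species-match URL with query parameters safely encoded."""
    params = {}
    if strict:
        params["strict"] = "true"  # strict mode, intended for cleaner checklist names
    if kingdom:
        params["kingdom"] = kingdom
    params["name"] = name
    # urlencode escapes spaces and reserved characters ("Oenanthe L." -> "Oenanthe+L.")
    return f"{MATCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(match_url("Oenante", kingdom="Plantae"))  # fuzzy mode is the default
    print(match_url("Oenanthe L.", kingdom="Plantae", strict=True))
```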
Indexed Checklists Oct 2016
16,519 datasets registered; 22.1 million name records
Datasets by publisher: Plazi (14,861), Scratchpads (998), Pensoft (217), CoL GSDs (160)
Backbone Sources
• Catalogue of Life
• GBIF Algae Classification
• ION Taxonomic Hierarchy
• World Register of Marine Species
• Catalogue of Afrotropical Bees
• EDIT Cichorieae
• World Typhlocybinae database
• Spinnengids
• Afromoths
• True Fruit Flies of the Afrotropical Region
• Fauna Europaea
• Euro+Med Plantbase
• Beetles (Coleoptera) of Canada and Alaska
• The Clements Checklist
• IOC World Bird Names
• Nomenclators
• IPNI
• Index Fungorum
• Prokaryotic Nomenclature Up-to-date
• ICTV Master Species List
• Publisher
• Species Files
• Diversity Taxon Names
• Plazi articles
• Mammal Species of the World
• Dyntaxa - Svensk taxonomisk databas
• Artsnavnebasen
• GRIN Taxonomy
• Flora of Brazil
• Database of Vascular Plants of Canada
• Plant List
• ITIS
• TAXREF
• The National Checklist of Taiwan
• Endemic species in Taiwan
• IRMNG
• Index Fungorum
• Paleobiology Database
https://github.com/gbif/checklistbank/blob/master/checklistbank-nub/nub-sources.tsv
Backbone Building
• Overlay prioritised sources
• start with Catalogue of Life
• primary source defines status
• create new name if kingdom, canonical name & authorship do not exist in the current nub (backbone)
• Ignore source name if …
• not a major Linnean rank (infraspecific ranks are included)
• higher ranks above family (configurable per source, CoL for higher taxa)
• status conflicts with previously encountered status
• hybrid formula, cultivar, candidatus or placeholder names !!!
• Rebuild backbone every 4 months
[Figure: sources feeding the backbone: Catalogue of Life, Fauna Europaea, GRIN, Mammal Species of the World, tens of taxonomic resources, 8,000 species lists, specimens]
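The overlay rules above can be sketched as a single pass over prioritised sources. This is a simplified illustration, not GBIF's nub builder: the `SourceName` and `Source` types and `build_nub` are hypothetical, authorship is compared exactly (the real build matches authors loosely), and rank handling is reduced to a fixed set of major ranks.

```python
from dataclasses import dataclass, field

# Ranks the sketch keeps; anything else (tribe, section, ...) is ignored.
MAJOR_RANKS = {"kingdom", "phylum", "class", "order", "family",
               "genus", "species", "subspecies", "variety", "form"}
# Ranks above family are only accepted from sources flagged for higher taxa.
ABOVE_FAMILY = {"kingdom", "phylum", "class", "order"}


@dataclass(frozen=True)
class SourceName:
    kingdom: str
    canonical: str
    authorship: str
    rank: str
    status: str                # e.g. "accepted" or "synonym"
    unsupported: bool = False  # hybrid formula, cultivar, candidatus or placeholder


@dataclass
class Source:
    title: str
    names: list = field(default_factory=list)
    higher_ranks: bool = False  # may this source contribute ranks above family?


def build_nub(sources):
    """Overlay sources in priority order; the first source is the most trusted.

    Returns {(kingdom, canonical, authorship): (status, source title)}.
    """
    nub = {}
    for src in sources:
        for n in src.names:
            if n.rank not in MAJOR_RANKS or n.unsupported:
                continue
            if n.rank in ABOVE_FAMILY and not src.higher_ranks:
                continue
            key = (n.kingdom, n.canonical, n.authorship)
            # Only unseen keys create new names. An existing name keeps the status
            # set by the earlier, more trusted source; a conflicting later status
            # is simply ignored.
            nub.setdefault(key, (n.status, src.title))
    return nub
```

With Catalogue of Life listed first and allowed to supply higher taxa, a later checklist can add new species but cannot change the status of a name that CoL already provided.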
Backbone Metrics
[Figure: four bar charts on a logarithmic axis (100 to 1,000,000): accepted names by kingdom (Animalia, Archaea, Bacteria, Chromista, Fungi, Plantae, Protozoa, Viruses, other); names by status (total, accepted, doubtful, synonym, homotypic, heterotypic and pro parte synonyms); accepted names by rank (kingdom to form, plus unranked); various name counts (Catalogue of Life, IRMNG, IPNI, basionym, basionym placeholder, basionym derived, ex-author synonym, pro parte)]
Backbone Name Matching
• Fuzzy mode for occurrence names
• fuzzy name match
• fuzzy classification match
• allow higher rank matches, e.g. to genus or class only
http://api.gbif.org/v1/species/match?kingdom=Plantae&name=Oenante
• Strict mode for checklist names
• kingdom match required
• rank match required
• canonical name match required
• allow double letters and a few common misspellings (ll->l, y->i, rh->r)
• gender-neutral epithet matching
• loose authorship comparison
http://api.gbif.org/v1/species/match?strict=true&kingdom=Plantae&name=Oenanthe L.
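A toy contrast between the two modes, assuming a three-name backbone and Python's `difflib` similarity ratio as a stand-in for GBIF's fuzzy matcher. `strict_match` requires kingdom, rank and canonical name to agree; `fuzzy_match` accepts the closest name above a cutoff and otherwise falls back to a genus-level match. The cutoff of 0.85, the backbone contents and both function names are assumptions; the misspelling rules and the fuzzy classification match are left out.

```python
import difflib

# Hypothetical miniature backbone: canonical name -> (kingdom, rank).
BACKBONE = {
    "Oenanthe": ("Plantae", "genus"),
    "Oenanthe aquatica": ("Plantae", "species"),
    "Oenanthe crocata": ("Plantae", "species"),
}


def strict_match(name, kingdom, rank):
    """Strict mode: canonical name, kingdom and rank must all agree."""
    return name if BACKBONE.get(name) == (kingdom, rank) else None


def fuzzy_match(name, cutoff=0.85):
    """Fuzzy mode: closest backbone name above `cutoff`, else a genus-level match."""
    hits = difflib.get_close_matches(name, list(BACKBONE), n=1, cutoff=cutoff)
    if hits:
        return hits[0], BACKBONE[hits[0]][1]
    words = name.split()
    if not words:
        return None
    # Higher-rank fallback: match only the genus part of the name.
    genera = [n for n, (_, rank) in BACKBONE.items() if rank == "genus"]
    hits = difflib.get_close_matches(words[0], genera, n=1, cutoff=cutoff)
    return (hits[0], "genus") if hits else None
```

Oenanthe is also a bird genus (the wheatears), a cross-code homonym of the kind discussed next, which is one reason the strict mode insists on the kingdom.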
Name Matching Issues
• Homonyms
• legal cross-code homonyms
• synonyms with different authors
• monomials at different ranks
difficult but solved !!!
• Name not in backbone
• regular gaps (especially fossils, molluscs & insects)
• non-Linnean rank (e.g. subclass Vertebrata)
• No taxon concept matching
• concepts hard to define
• type specimen information rare & not well structured
• synonyms & included children good candidates for comparison
• tough for occurrences: no taxonomies & hardly any taxonomic references used
Storing Name Matches
• Avoid quadratic growth of pairwise links between datasets
• thousands of datasets possible
• store links to backbone only
• allows all crosswalks in 2 steps if backbone is complete
[Figure: variant strings of one name (Abies alba Mill., Abies alba Miller, Abies alba) from many datasets, each linked only to a single backbone name]
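Why linking every record to the backbone scales: a sketch with hypothetical names (`pairwise_links`, `crosswalk` and the sample `MATCHES`) that counts the dataset pairs a direct dataset-to-dataset scheme would need, and resolves a crosswalk in two lookups through a backbone key, assuming the backbone holds every name.

```python
from collections import defaultdict


def pairwise_links(n_datasets):
    """Dataset pairs to maintain if every checklist were linked to every other."""
    return n_datasets * (n_datasets - 1) // 2


# Hypothetical stored matches: (dataset, verbatim name) -> backbone key.
MATCHES = {
    ("dataset A", "Abies alba Mill."): 1,
    ("dataset B", "Abies alba Miller"): 1,
    ("dataset C", "Abies alba"): 1,
    ("dataset C", "Picea abies (L.) H. Karst."): 2,
}

# Inverted index: backbone key -> all (dataset, name) records linked to it.
BY_KEY = defaultdict(list)
for record, key in MATCHES.items():
    BY_KEY[key].append(record)


def crosswalk(dataset, name):
    """Step 1: record -> backbone key. Step 2: backbone key -> records elsewhere."""
    key = MATCHES.get((dataset, name))
    if key is None:
        return []
    return sorted(r for r in BY_KEY[key] if r[0] != dataset)
```

For 16,519 registered datasets, `pairwise_links(16519)` is 136,430,421 dataset pairs, while links to the backbone grow only with the number of name records.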
Backbone Identifier
• Identifies names, not accepted taxa
• Status, synonyms, classification, included taxa, description or types are ignored
• Unchanged for a lexical group of name strings, e.g.
• Macrozamia platyrhachis F. M. Bailey
• Macrozamia platyrhachis
• Macrozamia platyrachis Bailey
• Uses strict matching service to group name strings
• Stable over different backbone versions
• Deleted names still resolve
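A sketch of how lexical variants could share one stable identifier. `lexical_key` is a hypothetical approximation of strict matching: it drops authorship, applies the strict-mode misspelling rules (rh->r, y->i, doubled letters), adds a crude -us/-um to -a rule as a stand-in for gender-neutral epithet matching, and only handles names whose epithets are plain lowercase words. The hypothetical `NameIdRegistry` never removes entries, so a name dropped from a later backbone version keeps resolving to its old id.

```python
import re


def _spelling(word):
    """Apply the strict-mode misspelling rules: rh->r, y->i, doubled letters."""
    word = word.lower().replace("rh", "r").replace("y", "i")
    return re.sub(r"(\w)\1+", r"\1", word)


def _gender_neutral(epithet):
    """Crude assumption: treat -us and -um endings as the feminine -a."""
    for ending in ("us", "um"):
        if epithet.endswith(ending):
            return epithet[: -len(ending)] + "a"
    return epithet


def lexical_key(name):
    """Canonical name with authorship dropped and spelling normalised."""
    tokens = name.split()
    if not tokens:
        return ""
    epithets = []
    for tok in tokens[1:]:
        if not (tok.isalpha() and tok.islower()):
            break  # first token that is not a plain epithet starts the authorship
        epithets.append(_gender_neutral(_spelling(tok)))
    return " ".join([_spelling(tokens[0])] + epithets)


class NameIdRegistry:
    """One id per lexical group; ids are never reused or deleted."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def id_for(self, name):
        key = lexical_key(name)
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]
```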
Backbone Assembling
Animalia
Archaea
Bacteria
Chromista
Fungi
Plantae
Protozoa
Viruses
incertae sedis
• Nub build starts with 8 kingdoms
Backbone Assembling
Backbone:
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
• Catalogue of Life is added
• Defines higher classification
Source (Catalogue of Life):
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
Backbone Assembling
Backbone:
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Cichorium
            Cichorium intybus L.
• Missing genera are created
• Tribe is ignored
Source:
Asteraceae
  Cichorieae Lam. & DC. [tribe]
    Cichorium intybus L.
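The placement step can be sketched with the nub as a child-to-parent map of canonical names. `insert_species` is hypothetical and drops authorship for brevity: a source's tribe is skipped because it is not a major rank, and a genus missing from the nub is created under the lowest major taxon the source supplied. A family unknown to the nub would end up without a parent in this simplified version.

```python
# Major Linnean ranks above species; anything else in a source classification is skipped.
MAJOR_RANKS = {"kingdom", "phylum", "class", "order", "family", "genus"}


def insert_species(nub, classification, species):
    """Attach `species` to `nub`, a child -> parent map of canonical names.

    `classification` lists (rank, name) pairs from the source, highest rank first.
    Existing nub entries keep their parent; only missing names are added.
    """
    parent = None
    for rank, name in classification:
        if rank not in MAJOR_RANKS:
            continue  # e.g. a tribe such as Cichorieae is ignored
        nub.setdefault(name, parent)
        parent = name
    genus = species.split()[0]
    nub.setdefault(genus, parent)  # missing genus is created under the lowest known taxon
    nub.setdefault(species, genus)
    return nub
```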
Backbone Assembling
Backbone:
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Cichorium Linneaus
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clementi
• Synonyms respect authors
• Author match very loose
• Existing genus author updated
Source:
Plantae
  Asteraceae
    Cichorium Linneaus
      Cichorium intybus Linneaus
        = Cichorium balearicum Porta
        = Cichorium byzantinum Clem.
        = Cichorium byzantinum Clementi
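A loose author comparison in the spirit of this slide, assuming that abbreviations are prefixes of the full surname ("L."/"Linneaus", "Clem."/"Clementi", "Mill."/"Miller"). `authors_match` and `merge_author` are hypothetical helpers; comparing only the last word ignores initials and co-authors, which a real matcher would have to handle.

```python
import re


def _surname(authorship):
    """Last word of an authorship string, lower-cased, without dots."""
    words = re.findall(r"[^\W\d_]+", authorship.lower())
    return words[-1] if words else ""


def authors_match(a, b):
    """Loose match: one surname must be a prefix (abbreviation) of the other.

    A missing authorship never blocks a match.
    """
    sa, sb = _surname(a), _surname(b)
    if not sa or not sb:
        return True
    return sa.startswith(sb) or sb.startswith(sa)


def merge_author(existing, incoming):
    """An existing name without authorship adopts the incoming author."""
    return existing or incoming
```

The looseness cuts both ways: under this rule "L." also matches "Lam.", so a production matcher would need abbreviation lists rather than bare prefixes.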
Backbone Assembling
Backbone:
Plantae
  Magnoliophyta
    Magnoliopsida
      Asterales
        Asteraceae
          Helianthus L.
            Helianthus annuus L.
          Cichorium L.
            Cichorium intybus L.
              = C. balearicum Porta
              = C. byzantinum Clem.
• Prefer authors from nomenclators
Source:
Asteraceae
  Cichorium L.
    Cichorium byzantinum Clem.
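One way to express the preference for nomenclator authorship, assuming each candidate authorship carries its source priority and a flag saying whether the source is a nomenclator (such as IPNI). The `Authorship` tuple and `preferred_authorship` are hypothetical, and the sketch assumes the candidates have already been matched to the same name.

```python
from typing import NamedTuple


class Authorship(NamedTuple):
    text: str
    priority: int      # lower = more trusted source
    nomenclator: bool  # True if the source is a nomenclator


def preferred_authorship(candidates):
    """Pick the authorship for one matched name.

    Nomenclator authorships win; otherwise the most trusted source's is used.
    Empty authorships are never chosen.
    """
    usable = [c for c in candidates if c.text]
    pool = [c for c in usable if c.nomenclator] or usable
    return min(pool, key=lambda c: c.priority).text if pool else ""
```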
Backbone Assembling
Backbone:
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
      A. a. var. maritima (E. Sheld.) Baird
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Infraspecific names included
Source:
Asteraceae
  Agoseris apargioides (Less.) Greene
    = A. maritima Eastw.
    A. a. var. eastwoodiae (Fedde) Munz
    A. a. var. maritima (E. Sheld.) Baird
Backbone Assembling
Backbone:
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
      A. a. var. maritima (E. Sheld.) Baird
    Agoseris eastwoodiae Fedde
    Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Other source treats them as species
• Same canonical name Agoseris maritima allowed twice: authors differ
Source:
Asteraceae
  Agoseris eastwoodiae Fedde
  Agoseris maritima E. Sheld.
Final Cleanup - Basionyms
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
        = Agoseris eastwoodiae Fedde
      A. a. var. maritima (E. Sheld.) Baird
        = Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Finally, basionyms are detected
• by terminal epithet & author within a family
• skip epithets that are used in multiple original names
• Only 1 accepted name per group
• the name from the most trusted source stays
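The basionym step can be sketched as grouping by family, terminal epithet and original author, where the original author is the parenthetical one in a recombination such as "(E. Sheld.) Baird". Everything here is hypothetical: "original name" is read as a name without a parenthetical author, groups holding more than one such name are skipped as ambiguous, and within a group all but the most trusted accepted name become synonyms.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Name:
    canonical: str           # e.g. "Agoseris apargioides var. maritima"
    authorship: str          # e.g. "(E. Sheld.) Baird"
    family: str
    status: str              # "accepted" or "synonym"
    priority: int            # lower = more trusted source
    accepted_name: str = ""  # filled in when the name becomes a synonym


def basionym_key(n):
    """Family, terminal epithet and original author (parenthetical if present)."""
    m = re.match(r"\((.+?)\)", n.authorship)
    original_author = m.group(1) if m else n.authorship
    return n.family, n.canonical.split()[-1], original_author.strip()


def is_original_name(n):
    """Assumption: a name without a parenthetical author is an original name."""
    return not n.authorship.startswith("(")


def resolve_basionym_groups(names):
    groups = defaultdict(list)
    for n in names:
        groups[basionym_key(n)].append(n)
    for members in groups.values():
        if len(members) < 2 or sum(is_original_name(n) for n in members) > 1:
            continue  # nothing to merge, or epithet shared by several original names
        accepted = sorted((n for n in members if n.status == "accepted"),
                          key=lambda n: n.priority)
        # Only one accepted name per group: the most trusted source's name stays.
        for n in accepted[1:]:
            n.status = "synonym"
            n.accepted_name = accepted[0].canonical
    return names
```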
Final Cleanup - Autonyms
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. apargioides
      A. a. var. eastwoodiae (Fedde) Munz
        = Agoseris eastwoodiae Fedde
      A. a. var. maritima (E. Sheld.) Baird
        = Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Create missing autonyms
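Under the botanical Code, publishing an infraspecific taxon within a species implies the autonym at that rank, which repeats the species epithet and carries no author. A minimal sketch, assuming names are already parsed into the four-word form genus, species epithet, rank marker, infraspecific epithet; `missing_autonyms` is hypothetical.

```python
def missing_autonyms(names):
    """Return autonyms implied by infraspecific names but absent from `names`.

    Names are canonical strings; infraspecific names are assumed to have exactly
    four words: genus, species epithet, rank marker, infraspecific epithet.
    """
    present = set(names)
    created = []
    for name in names:
        parts = name.split()
        if len(parts) != 4:
            continue
        genus, epithet, marker, _ = parts
        autonym = f"{genus} {epithet} {marker} {epithet}"  # autonyms carry no author
        if autonym not in present:
            present.add(autonym)
            created.append(autonym)
    return created
```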
