5. Checklist Challenges
• Highly relational taxonomic data, almost all records linked in tree & basionym
• Wrong or missing records destroy dataset integrity, not just a single record!
• Different to flat, unrelated occurrence records
• Data Quality
• broken referential integrity
• bad names or placeholders (e.g. «Unallocated Family»)
• missing or unused controlled vocabularies, e.g. «art» given as the rank instead of «species»
• Name strings can be published in several ways
• ScientificName
• ScientificName + Authorship
• Genus + SpecificEpithet + Rank + InfraspecificEpithet + Authorship
• Classifications can be published in several ways
• Normalised via parentNameUsageID
• Normalised via parentNameUsage
• Denormalised via Kingdom,Phylum,Class,Order,Family,Genus
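The three ways of publishing a name string listed above can be brought into one form before any comparison. The following is a minimal sketch, not the ChecklistBank implementation; the dictionary keys (`scientificName`, `authorship`, `genus`, `specificEpithet`, `rankMarker`, `infraspecificEpithet`) and the function `full_name` are assumptions made for illustration.

```python
def full_name(record: dict) -> str:
    """Return 'name authorship' from any of the three publication styles.

    The keys used here are hypothetical field names for this sketch.
    """
    authorship = (record.get("authorship") or "").strip()
    name = (record.get("scientificName") or "").strip()
    if not name:
        # Atomised form: genus + specific epithet
        # [+ rank marker + infraspecific epithet]
        parts = [record.get("genus"), record.get("specificEpithet")]
        if record.get("infraspecificEpithet"):
            parts += [record.get("rankMarker"), record["infraspecificEpithet"]]
        name = " ".join(p.strip() for p in parts if p and p.strip())
    # Append the authorship only if the name does not already end with it
    if authorship and not name.endswith(authorship):
        name = f"{name} {authorship}"
    return name


if __name__ == "__main__":
    print(full_name({"scientificName": "Poa pratensis", "authorship": "L."}))
```

A real parser has to handle far more variation (hybrids, abbreviated genera, misplaced authorship), so this only shows the idea of converging on a single representation.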
6. Checklist Indexing
• Basic archive validation
• unique ids
• Checklist Normalizer
• resolve relations
• create implicit taxa from denormalised classification
• interpret controlled vocabularies, e.g. rank
• match to backbone
• match to previous version to keep GBIF ids stable
• Checklist Importer
• Inserts data into a PostgreSQL database and a Solr index for searching
• Checklist Analyser
• generate dataset metrics
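The normalizer step "create implicit taxa from denormalised classification" can be sketched as follows. This is an illustration under simplifying assumptions, not the actual normalizer: the function `implicit_taxa` and the record layout are hypothetical, and taxa are keyed by rank and name only, which ignores homonyms.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def implicit_taxa(records):
    """Return {(rank, name): parent_key} for every higher taxon the records imply."""
    taxa = {}
    for rec in records:
        parent = None
        for rank in RANKS:
            name = rec.get(rank)
            if not name:
                continue  # skip ranks the record does not supply
            key = (rank, name)
            taxa.setdefault(key, parent)  # the first parent seen wins
            parent = key
    return taxa


if __name__ == "__main__":
    rows = [
        {"kingdom": "Plantae", "family": "Asteraceae", "genus": "Helianthus"},
        {"kingdom": "Plantae", "family": "Asteraceae", "genus": "Cichorium"},
    ]
    for key, parent in implicit_taxa(rows).items():
        print(key, "<-", parent)
```

Each distinct higher taxon is created once and linked to the nearest supplied higher rank, which turns the flat columns into a parent-child tree.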
7. Organizing Occurrences
• GBIF needs a single, consistent taxonomy
• for metrics, search, maps
• considerable variation in higher taxa
• synonymies can be very large
• Catalog of Life is largest single source
• ~90% of GBIF occurrence records (thanks to birds)
• ~50% of GBIF occurrence names (35% in 2010)
• GBIF needs to assemble a taxonomy
• originally merged the (noisy) names found in occurrences, which resulted in lots of duplicates
• improved by stitching together checklist datasets
Cronquist classification
  Mimosaceae: 3,200 species
  Caesalpiniaceae: 2,000 species
  Fabaceae: 14,000 species
“Modern” classification
  Fabaceae: 19,200 species
    Mimosoideae: 3,200 species
    Caesalpinioideae: 2,000 species
    Faboideae: 14,000 species
8. Current Backbone Issues
• Far too many accepted species (counts given as accepted, with synonyms in brackets)
• Cactaceae: GBIF 12,062 (342 syn), TPL (The Plant List) 2,233 (5,422 syn) + 5,500 unknown
• Genus Weingartia: GBIF 129 (0 syn), TPL 8 (26 syn) + 68 unknown
• Many accepted names based on the same basionym
• Sulcorebutia breviflora Backeb.
• Weingartia breviflora (Backeb.) Hentzschel & K.Augustin
• No synonyms with different authors possible
• Poa pubescens R.Br. synonym of Eragrostis pubescens (R.Br.) Steud.
• Poa pubescens Lej. synonym of Poa pratensis L.
• all names with exactly the same canonical name were merged
• list of known homonym genera (IRMNG) used to disambiguate between larger groups
9. Backbone Building
• Overlay ordered sources
• Start with Catalog of Life
• Primary source defines status
• Create new name if kingdom, canonical name & authorship do not exist in current nub
• Ignore source name if …
• not a major Linnean rank (infraspecific ranks are included)
• higher ranks above family (configurable per source)
• status conflicts with already existing status
• hybrid formula, cultivar, candidatus or placeholder names !!!
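The overlay rules above can be sketched in a few lines. This is a simplified illustration, not the actual backbone build: the function `build_backbone`, the usage fields (`kingdom`, `canonical`, `authorship`, `rank`, `status`, `name_type`) and the rank lists are assumptions, and the per-source rule for ranks above family is left out.

```python
MAJOR_RANKS = {"kingdom", "phylum", "class", "order", "family", "genus", "species"}
INFRASPECIFIC = {"subspecies", "variety", "form"}
ALLOWED_RANKS = MAJOR_RANKS | INFRASPECIFIC
IGNORED_TYPES = {"hybrid formula", "cultivar", "candidatus", "placeholder"}


def build_backbone(sources):
    """Overlay sources in order of trust (most trusted first).

    Returns {(kingdom, canonical, authorship): status}.
    """
    nub = {}
    for usages in sources:
        for u in usages:
            if u["rank"] not in ALLOWED_RANKS:
                continue  # not a major Linnean or infraspecific rank
            if u.get("name_type", "scientific") in IGNORED_TYPES:
                continue  # hybrid formula, cultivar, candidatus, placeholder
            key = (u["kingdom"], u["canonical"], u["authorship"])
            # An earlier (more trusted) source defines the status;
            # a later, conflicting usage of the same name is ignored.
            nub.setdefault(key, u["status"])
    return nub


if __name__ == "__main__":
    primary = [{"kingdom": "Plantae", "canonical": "Poa pubescens",
                "authorship": "R.Br.", "rank": "species", "status": "synonym"}]
    secondary = [{"kingdom": "Plantae", "canonical": "Poa pubescens",
                  "authorship": "Lej.", "rank": "species", "status": "synonym"}]
    print(build_backbone([primary, secondary]))
```

Because the key includes the authorship, the two homonyms Poa pubescens R.Br. and Poa pubescens Lej. are kept as separate names.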
[Diagram: sources feeding the backbone: Catalogue of Life, Fauna Europaea, GRIN, Mammal Species of the World, observations, specimens, 8000 species lists, 10s of taxonomic resources]
15. Backbone Assembling
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
      A. a. var. maritima (E. Sheld.) Baird
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Infraspecifics are included
Asteraceae
  Agoseris apargioides (Less.) Greene
    = A. maritima Eastw.
    A. a. var. eastwoodiae (Fedde) Munz
    A. a. var. maritima (E. Sheld.) Baird
16. Backbone Assembling
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
      A. a. var. maritima (E. Sheld.) Baird
    Agoseris eastwoodiae Fedde
    Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Another source treats them as species
• The same canonical name Agoseris maritima is allowed twice because the authors differ
Asteraceae
  Agoseris eastwoodiae Fedde
  Agoseris maritima E. Sheld.
17. Final Cleanup - Basionyms
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. eastwoodiae (Fedde) Munz
        = Agoseris eastwoodiae Fedde
      A. a. var. maritima (E. Sheld.) Baird
        = Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Finally basionyms are detected
• by terminal epithet & author within a family
• Only 1 accepted name per group
• the name from the most trusted source stays accepted
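The basionym cleanup can be sketched as grouping accepted names by family, terminal epithet and basionym author, then keeping only the first name of each group as accepted. This is a minimal sketch under assumptions: the functions `basionym_author` and `consolidate` and the record fields are hypothetical, and the input is assumed to be ordered from most to least trusted source.

```python
import re
from collections import defaultdict


def basionym_author(authorship: str) -> str:
    """'(Fedde) Munz' -> 'Fedde'; 'Fedde' -> 'Fedde'."""
    m = re.match(r"\((.+?)\)", authorship)
    return m.group(1) if m else authorship


def full(u: dict) -> str:
    return f"{u['canonical']} {u['authorship']}"


def consolidate(usages):
    """usages: accepted names, most trusted source first.

    Returns {name: accepted_name or None}, where None means the name
    stays accepted and any other value is the name it becomes a synonym of.
    """
    groups = defaultdict(list)
    for u in usages:
        epithet = u["canonical"].split()[-1]  # terminal epithet
        key = (u["family"], epithet, basionym_author(u["authorship"]))
        groups[key].append(u)
    result = {}
    for members in groups.values():
        accepted = members[0]  # the most trusted name stays accepted
        result[full(accepted)] = None
        for other in members[1:]:
            result[full(other)] = full(accepted)
    return result


if __name__ == "__main__":
    fam = "Asteraceae"
    names = [
        {"family": fam, "canonical": "Agoseris apargioides var. maritima",
         "authorship": "(E. Sheld.) Baird"},
        {"family": fam, "canonical": "Agoseris maritima", "authorship": "E. Sheld."},
    ]
    print(consolidate(names))
```

With the names from the slide, Agoseris maritima E. Sheld. ends up as a synonym of A. a. var. maritima (E. Sheld.) Baird, matching the tree above.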
18. Final Cleanup - Autonyms
Asteraceae
  Helianthus L.
    Helianthus annuus L.
  Agoseris
    Agoseris apargioides (Less.) Greene
      = A. maritima Eastw.
      A. a. var. apargioides
      A. a. var. eastwoodiae (Fedde) Munz
        = Agoseris eastwoodiae Fedde
      A. a. var. maritima (E. Sheld.) Baird
        = Agoseris maritima E. Sheld.
  Cichorium L.
    Cichorium intybus L.
      = C. balearicum Porta
      = C. byzantinum Clem.
• Create missing autonyms
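Creating missing autonyms can be illustrated with canonical names alone: whenever a species has an infraspecific taxon, the autonym repeats the specific epithet after the same rank marker and carries no authorship. The function `missing_autonyms` is a hypothetical helper for this sketch and assumes four-part infraspecific names (genus, epithet, rank marker, infraspecific epithet).

```python
def missing_autonyms(names):
    """names: canonical names without authorship.

    Returns the autonyms implied by infraspecific names but not yet present.
    """
    existing = set(names)
    wanted = set()
    for n in names:
        parts = n.split()
        if len(parts) == 4:
            genus, epithet, marker, _ = parts
            wanted.add(f"{genus} {epithet} {marker} {epithet}")
    return sorted(wanted - existing)


if __name__ == "__main__":
    print(missing_autonyms([
        "Agoseris apargioides",
        "Agoseris apargioides var. eastwoodiae",
        "Agoseris apargioides var. maritima",
        "Cichorium intybus",
    ]))
```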
19. Backbone Building Rules
• Create missing genus or species in classification
• only for accepted taxa
• Create missing autonyms for infraspecific taxa
• Detect basionyms based on terminal epithet & authorship
• Assumes the combination of epithet & authorship is unique within a family
• Converts all but one accepted name to synonyms
• Flag taxa as doubtful
• genus or higher taxon without any species (IRMNG)
• species (or infraspecific taxon) whose parent genus (or species) is considered a synonym
• moved to the newly accepted genus (or species)
• this is the case for potential children of a synonymised basionym combination
20. Backbone Sources
• GBIF Backbone Patch
• Catalogue of Life
• World Register of Marine Species
• Dyntaxa - Svensk taxonomisk databas
• GRIN Taxonomy
• Fauna Europaea
• Integrated Taxonomic Information System
• Euro+Med Plantbase
• Interim Register of Marine and Nonmarine Genera
• The Clements Checklist
• IOC World Bird Names
• Mammal Species of the World
• Paleobiology Database
• Nomenclators
• International Plant Names Index
• Index Fungorum
• ZooBank
• Prokaryotic Nomenclature Up-to-date
• ICTV Master Species List
• Organisations
• Species Files
• Biodiversity Data Journal (Pensoft)
• ZooKeys (Pensoft)
• PhytoKeys (Pensoft)
• Plazi ???
21. Backbone Matching
• Occurrence
• fuzzy name match
• classification match
• allow higher rank matches
• Checklist
• match kingdom
• require straight canonical match
• incl authorship comparison
• no webservice yet, only embedded
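The difference between the two matching modes can be sketched as follows. This is an illustration only: `match_occurrence` and `match_checklist` are hypothetical functions, the fuzzy step uses Python's `difflib` rather than whatever the real matcher uses, and the classification match for occurrences is omitted.

```python
import difflib


def match_occurrence(name, canonicals, cutoff=0.9):
    """Fuzzy match against backbone names, falling back to the genus."""
    parts = name.split()
    if not parts:
        return None
    hits = difflib.get_close_matches(name, canonicals, n=1, cutoff=cutoff)
    if hits:
        return hits[0]
    # Higher rank match: accept the genus if the full name is unknown
    genus = parts[0]
    return genus if genus in canonicals else None


def match_checklist(kingdom, canonical, authorship, backbone):
    """Strict match: kingdom, exact canonical name and authorship must agree."""
    return (kingdom, canonical, authorship) in backbone


if __name__ == "__main__":
    names = ["Helianthus", "Helianthus annuus", "Cichorium intybus"]
    print(match_occurrence("Helianthus anuus", names))
    print(match_occurrence("Helianthus petiolaris", names))
```

Occurrence names are noisy, so a misspelling or a genus-level match is still useful there, while checklist names are expected to be clean enough for a strict comparison.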
22. NameUsage
[Schema diagram: Parsed Name, Backbone Match, Citation, Dataset Metrics, Verbatim Record, Metrics, Extensions]
• Checklists & Nub share the same structure
• Parent-child hierarchy
• normalized classification
• flexible ranks
• synonym to accepted relation
• Dataset metrics as time series
• Basionym relation
23. CLB Supported Extensions
• Description: human-readable paragraphs about a topic
• Distribution: area ranges with statuses
• Identifier: additional identifier for the record
• Multimedia: image, video, sound
• Literature references: bibliography
• Occurrence (indexed via occurrence workflows)
• Species Profile: extinct, marine, freshwater, terrestrial flags
• Types and specimens: (overlaps with Occurrence)
• Vernacular names: name with language & region
http://rs.gbif.org/extension/gbif/1.0/