 Bioinformatics – mainly concerned with genomics / DNA
sequencing
 Biodiversity Informatics – Biodiversity as a computable
resource, e.g. …
◦ Numbers of species, genera, families, etc. in the world; what
are their names, where and when published (and by whom)
◦ A navigable “taxonomic backbone” for biodiversity data systems
◦ Name recognition / spell checking
◦ A machine-searchable repository of attributes (traits) e.g.
marine/nonmarine, extant/fossil, more…
◦ Cross referencing old names (synonyms) to current names
◦ Tracking taxonomic dynamism (new names / taxonomic changes
through time)
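The "name recognition / spell checking" item can be illustrated with a minimal sketch using Python's standard-library `difflib`. This is a generic string-similarity match, not the Taxamatch algorithm; the short `KNOWN_GENERA` list and the `suggest_names` helper are hypothetical, for illustration only.

```python
import difflib

# Tiny illustrative reference list; a real system would hold ~0.5m genus names.
KNOWN_GENERA = ["Mytilus", "Ostrea", "Crassostrea", "Pecten", "Physeter", "Homo"]

def suggest_names(name, known=KNOWN_GENERA, cutoff=0.8):
    """Return known genus names that closely resemble `name`, best match first."""
    # Genus names are capitalised, so normalise the input before comparing.
    candidate = name.strip().capitalize()
    if candidate in known:
        return [candidate]  # exact hit, no correction needed
    # Generic similarity ratio from difflib (an assumption of this sketch).
    return difflib.get_close_matches(candidate, known, n=3, cutoff=cutoff)

if __name__ == "__main__":
    print(suggest_names("Mytilis"))  # misspelling of Mytilus
    print(suggest_names("Xyzzyq"))   # nothing similar in the list
```

A high cutoff keeps false positives down, at the cost of missing badly garbled names.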
 One answer: “the variety of plant and
animal life in the world or in a
particular habitat”
 For a number of reasons, may wish
to include fossil as well as extant
taxa in a comprehensive
treatment:
◦ Enable a view of evolutionary
processes / serve more than
one community
◦ New names must not be
preoccupied in either extant
or fossil realms
◦ Provide a facility to distinguish
between extant and fossil taxa in
stored or incoming data.
http://www.evogeneao.com/tree-of-life/tree-of-life.htm
~1.9m valid, extant species + ?0.3m fossils
– say 2.2m valid names total, + more for
synonyms (maybe another 2-3m…)
…no figures available for genera, but
guesstimate would be around 10% of these
values – maybe 200k valid, another 200-
300k synonyms
 ~20,000 new species described every year, also ~2,000 new
genera, dozens/hundreds of families – no single source of these
at this time (but IPNI, ION* help)
 Probably >1,000 species and some genus names move into/out of
synonymy each year – specialist databases/publications keep
track of these to a degree
 Major new treatments of particular groups e.g. flowering plants
(APG I/II/III), birds, protists can radically affect higher taxonomy.
 *International Plant Names Index / Index of Organism Names
 Print works – need a physical copy (or pdf), no integration,
limited computer querying
 Wikipedia / Google / Google Scholar – ad hoc treatments,
no/few common terms, hierarchies set up for human reading
but not computers, cannot detect data gaps or generate lists
 Databases (“relational databases”) – standardise content
and relations between data items enabling rapid and efficient
queries, e.g. …
◦ Search millions of entries in seconds or milliseconds
◦ Easy checks for data inconsistencies e.g. same author names spelled
different ways, “extant” species in “fossil” genus, etc.
◦ Produce summary statistics on-the-fly
◦ Provide intuitive navigation tools e.g. drill down or go up tree, etc.,
all derived from the data (no separate maintenance overhead)
◦ Enter the data once, query / report in multiple ways according to
current and future user needs / ideas
◦ Support both human- and machine- based queries (=data services).
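As a rough sketch of the consistency checks and on-the-fly statistics listed above, the following uses an in-memory SQLite database. The two-table schema, the column names and the genus and species names ("Vivens", "Fossilia") are all hypothetical and are not IRMNG's actual schema or data.

```python
import sqlite3

def build_demo_db():
    """Create a tiny in-memory database with hypothetical genera and species."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE genus (
            id INTEGER PRIMARY KEY, name TEXT NOT NULL, is_extant INTEGER NOT NULL);
        CREATE TABLE species (
            id INTEGER PRIMARY KEY, genus_id INTEGER NOT NULL REFERENCES genus(id),
            epithet TEXT NOT NULL, is_extant INTEGER NOT NULL);
    """)
    conn.executemany("INSERT INTO genus VALUES (?, ?, ?)",
                     [(1, "Vivens", 1), (2, "Fossilia", 0)])
    conn.executemany("INSERT INTO species VALUES (?, ?, ?, ?)",
                     [(1, 1, "primus", 1),
                      (2, 2, "antiqua", 0),
                      (3, 2, "errata", 1)])  # deliberately inconsistent row
    return conn

def extant_species_in_fossil_genera(conn):
    """Consistency check: species flagged extant inside a genus flagged fossil."""
    rows = conn.execute("""
        SELECT g.name || ' ' || s.epithet
        FROM species s JOIN genus g ON g.id = s.genus_id
        WHERE s.is_extant = 1 AND g.is_extant = 0
        ORDER BY 1""").fetchall()
    return [r[0] for r in rows]

def species_per_genus(conn):
    """Summary statistic computed on the fly from the stored data."""
    rows = conn.execute("""
        SELECT g.name, COUNT(s.id)
        FROM genus g LEFT JOIN species s ON s.genus_id = g.id
        GROUP BY g.id ORDER BY g.name""").fetchall()
    return dict(rows)

if __name__ == "__main__":
    db = build_demo_db()
    print(extant_species_in_fossil_genera(db))
    print(species_per_genus(db))
```

The data are entered once; both the check and the summary are just different queries over the same tables.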
 Catalogue of Life – attempt to “stitch
together” existing species-level databases
for particular groups, where these exist
◦ Extant only (2002-15), fossils just starting
◦ Where no global species database/s (GSDs),
no data (even for genera/families)
◦ ~45% complete for species in 2006, 85% in
2015 (less so for synonyms and genus names)
 Paleobiology Database (previously
PaleoDB) – community-based activity for
fossil taxa & occurrences
◦ Does include genus, family level information,
also content types not in CoL e.g. references,
geological range, collecting localities
◦ 50,000(?) names in 2006, 320,000 names in
2015 (all ranks), degree of completeness not
known
 Index of Organism Names (ION) – based on Zoological
Record ongoing indexing of the scientific literature (big task –
5,000 journals scanned + more)
◦ Contains 5m+ animal names (extant + fossil), but mix of “clean” and
“dirty” data (including many duplicates)
◦ Proprietary product: cannot obtain entire database (or portion of it) as
an export file although can query via web
◦ Some “key” content e.g. abstracts, authorship of publications, user-
defined searches, is behind paywall (limited to subscribers to
Zoological Record) and protected by IP assertions
◦ ION IDs a useful concept for subset of the names (those linked to
primary publication instances) – potential for re-use in other systems.
 Nomenclator Zoologicus, Index Nominum Genericorum
are fairly complete genus-level compilations (to ~2004) for animals
and plants, respectively (including fossils)
 Other databases attempt to be complete for specific groups (including
species as well as genera) – many updated on ongoing basis
 Genera easier than species (10x fewer to catalogue – 0.5m vs. 5m), still
useful because part of every species name e.g. Homo sapiens, Physeter
macrocephalus
 Use most of the resources mentioned (especially genus-level
compilations) to provide as comprehensive a survey of biodiversity as
possible (“complete”, no gaps), i.e. plant + animal, extant + fossil, also
prokaryotes and viruses – initially for OBIS use
 Some sources (e.g. the Plant List, PaleoBioDB, recent versions of Cat. of
Life) not yet included, could be added in future
 “Missing” animal species names could be included from other resources
as time available (but lower priority)
 “Interim” = some limitations, but useable in the absence of anything
better or more “finished”.
2015: - 479,000/488,000 genus names
(valid + synonyms + “not known”)
- 1.9m species names
(incl. 0.6m synonyms)
- extant plus fossil taxa, all groups, most with
marine/nonmarine and extant/fossil flags
 OBIS (Ocean Biogeographic Information System) /
CoML (Census of Marine Life) – need to discriminate marine
from nonmarine, extant from fossil names in incoming data
 CSIRO Marine Research – complement / extension to existing
“marine, Australian only” dataset (CAAB) – including cross checks
and new information/records
 Author’s personal interest / background – including
experience with algae, higher plants, marine fauna, palynology and
microfossils – plus challenge of creating something new and useful
 Subsequent interest from other projects – including GBIF,
Atlas of Living Australia (ALA), SeaLifeBase, Global Names,
Encyclopedia of Life, Open Tree of Life, WoRMS and more.
• Terminal tips are species (extant or extinct i.e. fossil)
• Groups of species are genera
• Groups of genera are families
• Groups of families are orders, then classes, then phyla, then kingdoms
(intermediate ranks e.g. superfamilies, subphyla could be added later if
desired)
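The hierarchy described above can be sketched as a simple tree in which each taxon knows its parent and children. The `Taxon` class and its method names are assumptions made for illustration, not IRMNG's data model; the example lineage for Homo sapiens follows the standard classification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Taxon:
    name: str
    rank: str                      # kingdom, phylum, class, order, family, genus, species
    extant: bool = True            # False marks a fossil (extinct) taxon
    parent: Optional["Taxon"] = field(default=None, repr=False)
    children: List["Taxon"] = field(default_factory=list, repr=False)

    def add_child(self, name, rank, extant=True):
        """Attach a new taxon one level down and return it."""
        child = Taxon(name, rank, extant, parent=self)
        self.children.append(child)
        return child

    def lineage(self):
        """Go up the tree: names from the kingdom down to this taxon."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))

    def species(self):
        """Drill down: names of all terminal tips (species) below this taxon."""
        if self.rank == "species":
            return [self.name]
        found = []
        for child in self.children:
            found.extend(child.species())
        return found

def build_example():
    """Build a small example branch; returns (class Mammalia, Homo sapiens)."""
    animalia = Taxon("Animalia", "kingdom")
    chordata = animalia.add_child("Chordata", "phylum")
    mammalia = chordata.add_child("Mammalia", "class")
    primates = mammalia.add_child("Primates", "order")
    hominidae = primates.add_child("Hominidae", "family")
    homo = hominidae.add_child("Homo", "genus")
    sapiens = homo.add_child("Homo sapiens", "species")
    homo.add_child("Homo neanderthalensis", "species", extant=False)  # fossil tip
    return mammalia, sapiens

if __name__ == "__main__":
    mammalia, sapiens = build_example()
    print(" > ".join(sapiens.lineage()))
    print(mammalia.species())
```

Because navigation is derived from the parent/child links alone, adding an intermediate rank later only means inserting one more node.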
• Mammal genus synonyms are well
resolved (varies with group)
• Mix of extant and fossil (†) taxa
• Species completeness variable
(extant currently much better than
fossil)
• Some genus names not yet in
families, just in placeholder
categories like “Mammalia –
awaiting allocation” – to be
addressed in due course (part of
“Interim” nature of project)
 Initial lists of families
from Parker and Benton
print works (1982, 1983)
for extant and fossil taxa,
respectively; updated for
many groups from more
recent sources
 Genus names from
◦ Taxonomicon/SN200
(private Dutch
compilation, 2006)
◦ Nomenclator
Zoologicus (animals,
to 2004)
◦ Index Nominum
Genericorum, The
Plant List and GRIN
(plants, to c. 2010)
◦ more animal genera
from ION/Zoological
Record (to c. 2009) +
updates in progress
◦ Index Fungorum
(2009 + 2013 update)
◦ prokaryotes and
viruses from LPSN
(2009) and virus DB
(2006)
◦ more…
 Species from
◦ Catalogue of Life (2006
version)
◦ Australian Faunal
Directory (2007 version)
◦ NZ Biodiversity Inventory
(2008 preprint) – 56k
living + 15k fossil
species
◦ Aphia/ERMS/
WoRMS (2006 + 2013
update) – 220k valid
species+159k synonyms
◦ Joel Hallan Biology
Catalog (2012 version)
◦ Museum Victoria KEmu
database (2006 version)
◦ Print sources for some
fossil groups
◦ more…
 Author’s original version (2006-current) at CSIRO Hobart, at
www.cmar.csiro.au/datacentre/irmng/
- software and content will not be developed further, but has
some custom features not yet in VLIZ-hosted version as below
 New version / location (2015 on, still under test) at Flanders
Marine Institute (VLIZ), Belgium, home of WoRMS + OBIS:
www.marinespecies.org/irmng/
(new content will be added to this version)
CSIRO Marine Labs, Hobart Flanders Marine Institute, Oostende
(paste names
list here)
http://www.zo.utexas.edu/faculty/antisense/downloadfilestol.html
You are here
Example copy-and-paste into IRMNG query – what are these critters?
Corculum cardissa
Galeomma takii
Ostrea edulis
Crassostrea virginica
Nerita albicilla
Mytilus edulis
Mytilus trossulus
Mytilus galloprovincialis
Mytilus californianus
Geukensia demissa
Mimachlamys varia
Chlamys hastata
Crassadoma gigantea
Pecten maximus
Argopecten gibbus
Argopecten irradians
Placopecten magellanicus
Chlamys islandica
Atrina pectinata
Arca noae
Barbatia virescens
(etc. …)
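Because the genus is part of every species name, a pasted list like the one above can be classified at least to family from the genus alone. The following is a minimal offline sketch of that idea; it does not call the IRMNG web site, and the small `GENUS_TO_FAMILY` table and the `classify` helper are hypothetical stand-ins for a full genus-level lookup.

```python
# Illustrative subset only; a real lookup would cover all known genera.
GENUS_TO_FAMILY = {
    "Mytilus": "Mytilidae",
    "Ostrea": "Ostreidae",
    "Crassostrea": "Ostreidae",
    "Pecten": "Pectinidae",
    "Arca": "Arcidae",
}

def classify(pasted_text, lookup=GENUS_TO_FAMILY):
    """Return (full name, genus, family or None) for each non-empty pasted line."""
    results = []
    for line in pasted_text.splitlines():
        name = " ".join(line.split())          # collapse stray whitespace
        if not name:
            continue
        genus = name.split(" ")[0].capitalize()  # first word of a binomial
        results.append((name, genus, lookup.get(genus)))
    return results

if __name__ == "__main__":
    pasted = """Ostrea edulis
    Mytilus edulis
    Nerita albicilla
    Pecten maximus"""
    for name, genus, family in classify(pasted):
        print(f"{name:20s} {family or '(genus not in lookup)'}")
```

Names whose genus is missing from the table come back with `None`, which marks them for follow-up rather than guessing.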
(Screenshot of query results, annotated: family; IRMNG data source; classification; traits; remarks, synonymy where known; full name + authority; etc. …)
 Other web functions include search/match at higher
taxonomic levels e.g. genus, family; traverse taxonomic
hierarchy; filter by taxonomic group; search by author, year,
more; partial search e.g. by 3 or more leading characters…
 At database level, can do many other custom searches e.g.
look for genera with no children or verified source (could be
misspellings), look for genera awaiting family allocation, etc.
etc.
 VLIZ copy is intended to support multiple remote editing
(different editors for particular groups), as per current
WoRMS (World Register of Marine Species), allowing for
future distribution of effort, also leverage of data entry into
WoRMS down the track…
(NB conscious effort here to transition IRMNG from a “single
editor” project to more collective ownership / ongoing
maintenance & development; will share software environment
with WoRMS going forward…)
 “I've never seen someone's chin hit the floor so quickly when I
showed them IRMNG. I have been asked to show everyone in
the WAM [West Australian Museum] about it, and send out an
email to let them know it is there. At the morning tea it was
discussed and there was a good level of excitement… Part of
this work is reviewing taxonomic names in the Kimberley. We
are going to keep an eye out for any names that we identify as
errors and can feed that back.”
 – Piers Higgs, Gaia Resources, Australia
[Piers@gaiaresources.com.au], 21 May 2009
 “IRMNG is the most useful web tool I've ever used - I also used
it to place Micropalaeontology genera into families (ostracods
and foraminifera) - Here there is always a single match in these
groups, sometimes also a match in Mollusca and Arthropoda,
but I just use the former ones because I know that these
names are of specimens belonging to those groups… Thank you
very much for developing this fantastic tool.”
 – Willem Coetzer, South African Institute for Aquatic
Biodiversity [w.coetzer@saiab.ac.za], 2 April 2011
 Fill gap in more recent genus
names (e.g. 2008-present) – start
made over past 12 months (9k
animal genera added to 2011,
additional 5k in pipeline)
 Improve taxonomic resolution for
~100k genus names not yet
placed to family (big task, also
many may be older synonyms)
 Check ~10k presently “unverified”
genus names – mix of
misspellings and “good” names
not in major compilations
 Revisit higher taxonomy (Kingdom ->
Family) with additional QA, using
the most recent sources
 Add in newly available “quality”
species lists: The Plant List
(2010/2013), PaleoBioDB,
Catalogue of Life post 2006
 Update prokaryote and virus
species lists
 Get additional animal species
names from elsewhere e.g. ION
(big task!), fungal species from
Index Fungorum / Mycobank
 Think about data flows in the
bigger picture – who tracks new
names, who wants them, how to
notify and transport between
projects with similar needs.
 Intention is to create a new future for IRMNG with the
handover to VLIZ
 VLIZ with Tony will investigate best ongoing use for the
system, synergies with WoRMS etc.
 Other projects now active in this space e.g. EOL, OTOL,
GBIF, Global Names all have “own systems” – possible
scope for further collaboration and discussions
 Watch this space…
Thank you!
Tony Rees Leen Vandepitte + team at VLIZ
tonyrees49@gmail.com info@marinespecies.org
 Hexapoda (Insects): 175k (of which
7.5k fossil)
◦ includes Coleoptera 62k,
Lepidoptera 30k,
Hemiptera 21k, Diptera 21k,
Hymenoptera 19k
 Vertebrata: 55k (incl. 15.5k fossil)
◦ Mammalia 13k (incl. 6.9k fossil)
◦ Aves 13k (incl. 0.6k fossil)
◦ Pisces 19k (incl. 4.1k fossil)
◦ Reptilia 8k (incl. 3.2k fossil)
◦ Amphibia 2k (incl. 0.6k fossil)
 Land Plants: (Mosses -> Angios) 50k
(incl. 8.6k fossil) – includes
Angiosperms 40k
 Mollusca: 41k (incl. 17.2k fossil)
 Chelicerata: 20k (incl. 1.1k fossil)
 Crustacea: 20k (incl. 4.8k fossil)
 Fungi: 17k (incl. 0.5k fossil)
 Protista (excl. Algae): 13k (incl. 4.2k
fossil)
 “Algae”: 9k (incl. 2.7k fossil)
 Cnidaria: 9k (incl. 4.1k fossil)
 Echinodermata: 8k (incl. 4.3k fossil)
 Platyhelminthes: 7k (incl. 0.2k
fossil)
 Brachiopoda: 6k (incl. 5.9k fossil)
 Trilobitomorpha: 6k (all fossil)
 Porifera: 5k (incl. 2.1k fossil)
 Nematoda: 5k (incl. 0.1k fossil)
 Bacteria/Cyanos/Archaea: 3k (incl.
0.4k fossil)
 smaller invert. groups: 0k -> 3k each
 Viruses: 0.4k (0 fossil)
Approx. mean ratio of species to genus names varies from
around 5:1 through 10:1 (Hexapoda) to 20:1 (Land plants)
 >469,000 genus names (incl. 94,000 fossil) and 1.9m species names
held (1.3m valid, 0.6m synonyms) – latter from CoL, WoRMS and
elsewhere e.g. national lists
 Genus coverage good up to ~pub. yr. 2002 (2,000+ per year), then
declines: 2002: 2,488
2003: 2,385
2004: 1,942
2005: 1,372
2006: 1,554
2007: 1,656
2008: 1,232
2009: 859
2010: 478
2011: 335
2012: 253
2013: 38
2014: 0
 N.B., ~100k animal genus names from Nomenclator Zoologicus still
need allocation to family (presently e.g. “Mammalia”, “Mollusca” only)
 Indicates backlog exists to be filled, see next slide
 2014-2015 activity: parsing and preparing c.9k missed animal genus
names from ION database (Zoological Record) to 2012, via R. Page
“Bionames” data dump
 Existing genus coverage: 2002: 2,488 Add: 108
2003: 2,385 64
2004: 1,942 625
2005: 1,372 939
2006: 1,554 1,002
2007: 1,656 1,033
2008: 1,232 1,277
2009: 859 1,661
2010: 478 1,661
2011: 335 962
2012: 253 0
2013: 38 0
2014: 0 0
 Need an ongoing mechanism to trap new genus names as published
(also need new botanical genera c.2009 onwards)
 2014-2015 activity: parsing and preparing c.9k missed animal genus
names from ION database (Zoological Record) to 2012, via R. Page
“Bionames” data dump
 Existing genus coverage: 2002: 2,488 Add: 108
2003: 2,385 64
2004: 1,942 625
2005: 1,372 939
2006: 1,554 1,002
2007: 1,656 1,033
2008: 1,232 1,277 39
2009: 859 1,661 31
2010: 478 1,661 81
2011: 335 962 595
2012: 253 0 1,790
2013: 38 0 1,783
2014: 0 0 1,187
 Need an ongoing mechanism to trap new genus names as published
(also need new botanical genera c.2009 onwards)
New
batch
#2
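The column totals behind the "c.9k" and "additional 5k" figures can be checked with a short calculation. The numbers below are transcribed from the slide; the variable and function names are only illustrative.

```python
# year: (existing genus names, names to add, new batch #2), as shown on the slide.
# Years 2002-2007 show no batch #2 figure on the slide; treated here as 0.
COVERAGE = {
    2002: (2488, 108, 0),
    2003: (2385, 64, 0),
    2004: (1942, 625, 0),
    2005: (1372, 939, 0),
    2006: (1554, 1002, 0),
    2007: (1656, 1033, 0),
    2008: (1232, 1277, 39),
    2009: (859, 1661, 31),
    2010: (478, 1661, 81),
    2011: (335, 962, 595),
    2012: (253, 0, 1790),
    2013: (38, 0, 1783),
    2014: (0, 0, 1187),
}

def column_totals(table=COVERAGE):
    """Sum each column over all years: (existing, add, batch 2)."""
    existing = sum(row[0] for row in table.values())
    add = sum(row[1] for row in table.values())
    batch2 = sum(row[2] for row in table.values())
    return existing, add, batch2

def combined_by_year(table=COVERAGE):
    """Genus names per publication year once both additions are loaded."""
    return {year: sum(row) for year, row in table.items()}

if __name__ == "__main__":
    existing, add, batch2 = column_totals()
    print(f"existing {existing:,}  add {add:,}  batch #2 {batch2:,}")
    for year, total in combined_by_year().items():
        print(year, f"{total:,}")
```

The "Add" column sums to 9,332 and batch #2 to 5,506, consistent with the rounded figures quoted on the slides.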
(Screenshot annotations: input name; target names; near matches)
Successful submission
based on IRMNG,
Taxamatch and 2 other
systems developed by the
author, 2002-2014
(2014 GBIF keynote address available on YouTube)

Tony Rees IRMNG 2015 presentation
