www.obis.org.au/irmng



              IRMNG – the Interim Register of
              Marine and Nonmarine Genera:
                rationale and current status


Tony Rees – CSIRO Marine and Atmospheric Research, Australia
for: GN-CoL names and taxonomy sharing workshop, Hawaii, March 2012
The Dream…
Imagine a system that would…
   • Automatically classify “any” genus & species name to kingdom /
     phylum / class / order / family (as far down as possible) – “what is
     this critter” – plus hierarchical relations e.g. parents / children /
     siblings
   • Return whether the name is current (valid) or non-current, e.g. a
     synonym
   • Check spelling for correctness, also authority details, plus supply
     original publication ref. as available
   • Return associated attributes such as extant / fossil status, habitat
     information, geographic / geologic range, more…
   • Work seamlessly, with a single point of entry, across all groups
     and geologic epochs including present day
   • Be as up-to-date as possible (latest content), and authoritative
     (maintained by relevant experts)


  Tony Rees: IRMNG March 2012
Realising the Dream…

 • For extant taxa: role of Cat. of Life, however ~30% of species still to
   go; for fossil taxa: PaleoDB (unknown proportion missing, maybe
   50%?)
 • In the meantime, could make progress by assembling a global genera
   list, and infilling with species names as available




   [Figure labels: genera; species]


 • IRMNG is an attempt along these lines… a work in progress, with
   modest resourcing, but available for use now.

IRMNG data sources

• Animal genera + auth’s from Nomenclator Zoologicus and
  elsewhere, tax. placements and synonymies from multiple sources
  including CoL, individual taxon treatments and printed works

• Botanical genera and auth’s from Index Nominum Genericorum
  (ING) supplemented with other sources, tax. placements and
  synonymies from multiple sources including GRIN (APGIII in the
  main), Index Fungorum, AlgaeBase, CyanoDB, more

• Prokaryote genera, auth’s and tax. placements from LPSN
  (Euzéby list), previous/non-valid names from multiple sources

• Virus genera and tax. placements from ICTV db (multiple versions
  – very different through time)

• Species lists (all groups) from CoL 2006, Aphia/WoRMS 2006,
  AFD, NZ Organisms Register + more.


IRMNG content as at March 2012 (cf. e.g. Cat. of Life):

     Cat. of Life (2011 version):     IRMNG:
             • 8k families                • 19k families
             • 178k genera                • 454k genera
             • 2.25m species names        • 1.46m species names
               (including synonyms)         (including synonyms)

• Not all IRMNG genera yet linked to relevant families, but ~370k are
  (remainder linked to higher taxon i.e. phylum, class or order)
• Extant/fossil, marine/nonmarine flags held for majority of names

• Nomenclatural status known for most names, tax. status i.e. valid
  name/synonym for only a subset at this time (varies by group)
• Authority known for >97% of genera, publication details for “animal”
  subset (from Nomenclator Zoologicus in the main)
• Fuzzy matching (TAXAMATCH) deployed over all web-based queries for
  correction of potential errors in input names to be matched.
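
As a rough illustration of the fuzzy-matching step, the sketch below corrects a misspelled genus name against a small reference list using Python's standard `difflib`. It is not the TAXAMATCH algorithm itself; the reference list, the `suggest_genus` helper and the 0.8 cutoff are all assumptions chosen for the example.

```python
from difflib import get_close_matches

# Tiny hypothetical reference list; a real register holds several hundred
# thousand genus names.
KNOWN_GENERA = ["Lawsonia", "Laminaria", "Homo", "Quercus"]


def suggest_genus(name, known=KNOWN_GENERA, cutoff=0.8):
    """Return near matches for `name`, best first (empty list if none)."""
    # Compare in lower case so that input capitalisation does not matter.
    lookup = {g.lower(): g for g in known}
    hits = get_close_matches(name.strip().lower(), list(lookup), n=3, cutoff=cutoff)
    return [lookup[h] for h in hits]


if __name__ == "__main__":
    print(suggest_genus("Lawsonnia"))  # misspelled input
    print(suggest_genus("Zzzz"))       # nothing similar in the list
```

A generic similarity ratio like this one knows nothing about authorities, ranks or typical taxonomic misspellings, so it should be read only as a stand-in for the idea of tolerant name lookup.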
IRMNG in practice – example genus = “Lawsonia”

• Same name is currently a valid genus in 3 Codes i.e. plants,
  animals and bacteria (no barriers to this)
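
Because the same genus name can be valid under several Codes, a lookup by name alone has to return a list of records rather than a single one. A minimal sketch of that idea follows; the record IDs and the `build_index` / `lookup` helpers are hypothetical, and the group labels simply reuse the plant / animal / bacterium wording of the slide.

```python
from collections import defaultdict

# Hypothetical records: (record id, genus name, group governed by its own Code).
RECORDS = [
    (1, "Lawsonia", "plant"),
    (2, "Lawsonia", "animal"),
    (3, "Lawsonia", "bacterium"),
    (4, "Quercus", "plant"),
]


def build_index(records):
    """Index records by name; one name may map to several records."""
    index = defaultdict(list)
    for rec_id, name, group in records:
        index[name].append((rec_id, group))
    return index


def lookup(index, name, group=None):
    """Return all records for `name`, optionally narrowed to one group."""
    hits = index.get(name, [])
    return [h for h in hits if group is None or h[1] == group]


INDEX = build_index(RECORDS)

if __name__ == "__main__":
    print(lookup(INDEX, "Lawsonia"))            # three homonyms
    print(lookup(INDEX, "Lawsonia", "animal"))  # narrowed to one group
```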




Required base information is scattered in multiple
systems / printed works at this time




[Figure panels: plant; animal; bacterium (etc.)]
IRMNG query as at March 2012

[Figure: IRMNG query result, annotated with: extant, habitat flags; synonym of (as known); children; parents]

Note: IRMNG fields displayed on the web are only a
          subset of full information held for any name, e.g.:




IRMNG core fields

• IRMNG ID, Rank
• Scientific name (for species: epithet + parent ID)
• Authority
• Publication (as “microcitation” – subset with link to refs. module)
• Source(s) for above
• Orthography verified against (authoritative source)
• Parent ID (+ “according to…”) – Linnaean ranks only at this time
• Nomenclatural status (+ relation with other names as needed) + “according to…”
• Taxonomic status (same)
• Nomenclatural Code
• Taxonomic or nomenclatural remarks
• Extant/fossil, marine/nonmarine flags + “according to” (could be “as per parent”)
• Date entered, last modified, deprecated (where required)

(under consideration…)
• Intermediate ranks e.g. subfamily, subgenus, also infraspecies (not currently held)
• Type genus / species indicator
• Freshwater / terrestrial flags vs. present “nonmarine”
• Geo flags (country codes etc.)
• Palaeo range (periods/epochs)
• Vernacular names as available
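
To make the field list more concrete, here is a minimal sketch of how a subset of these fields might be modelled, and how a classification and a list of children can be derived by following parent IDs. The class name, attribute names, record IDs and helper functions are assumptions for illustration and do not reflect the actual IRMNG schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class NameRecord:
    """Hypothetical subset of the core fields listed above."""
    irmng_id: int
    rank: str
    scientific_name: str
    authority: Optional[str] = None
    parent_id: Optional[int] = None
    nomenclatural_status: Optional[str] = None
    taxonomic_status: Optional[str] = None
    is_extant: Optional[bool] = None  # None means "flag not recorded"
    is_marine: Optional[bool] = None


def classification(rec_id: int, table: Dict[int, NameRecord]) -> List[str]:
    """Walk parent IDs upward; return 'rank: name' entries, highest rank first."""
    path, seen = [], set()
    current = table.get(rec_id)
    while current is not None and current.irmng_id not in seen:
        seen.add(current.irmng_id)  # guard against accidental parent loops
        path.append(f"{current.rank}: {current.scientific_name}")
        parent = current.parent_id
        current = table.get(parent) if parent is not None else None
    return list(reversed(path))


def children(rec_id: int, table: Dict[int, NameRecord]) -> List[str]:
    """Names of all records whose parent is `rec_id`, sorted alphabetically."""
    return sorted(r.scientific_name for r in table.values() if r.parent_id == rec_id)


# Example data with made-up IDs.
TABLE = {
    1: NameRecord(1, "family", "Hominidae"),
    2: NameRecord(2, "genus", "Homo", authority="Linnaeus, 1758", parent_id=1),
    3: NameRecord(3, "genus", "Pan", parent_id=1),
}

if __name__ == "__main__":
    print(classification(2, TABLE))
    print(children(1, TABLE))
```

Siblings follow from the same structure: they are the children of a record's parent, minus the record itself.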


IRMNG is not just a “passive” aggregator…

Editorial / curatorial decisions / actions required to:
• Correct obvious data errors
• Assemble “complete” records from multiple sources (where one source
  data deficient)
• Normalise authority data (in particular) to a “house style”
• Digitise or transcribe print material into electronic form where not
  otherwise available
• Decide between conflicting content in data sources e.g. for authority
  orthography/year, taxonomic placement, valid/synonym status and
  more
• Cross-link names e.g. synonyms -> current names, basionyms ->
  replacement names, misspelled names to their correctly spelled
  counterparts, etc. etc.
• Reconcile variant higher taxonomies as supplied to a single hierarchy
• Add nomenclatural or taxonomic remarks as required.
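
The cross-linking step in the list above can be sketched as a simple chain of pointers that is followed until a current name is reached. The placeholder names ("Aus", "Bus", "Auss"), the `LINKS` mapping and the `resolve` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical cross-links: non-current name -> the name it points to.
LINKS = {
    "Aus": "Bus",    # synonym -> current name
    "Auss": "Aus",   # misspelling -> correctly spelled counterpart
}


def resolve(name, links=LINKS, max_hops=10):
    """Follow cross-links until a name with no outgoing link is reached."""
    seen = [name]
    while name in links:
        name = links[name]
        # A curated register should never contain loops, but check anyway.
        if name in seen or len(seen) > max_hops:
            raise ValueError(f"circular or overlong link chain: {seen + [name]}")
        seen.append(name)
    return name


if __name__ == "__main__":
    print(resolve("Auss"))  # follows misspelling -> synonym -> current name
    print(resolve("Bus"))   # already current, returned unchanged
```

Keeping the type of each link (synonym, basionym, misspelling) alongside the target would let a query report why a name is non-current, not just what it resolves to.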
Relevance to present meeting?

 • Demonstrates utility of a single entry point to a system permitting
   query on “any name” – i.e., a [comprehensive] Taxonomic Name
   Resolution Service (TNRS) covering all life
     • Envisage something like OBIS or GBIF, but for taxonomy – the
       aggregator / central query point is not a content author, but
       provides integration and value-added services
 • IRMNG – based on static snapshot/s of multiple data sources; cf. a
   “super catalogue” should be based on live feeds from relevant
   authoritative sources, continuously updated as available (?+ some
   static data not available as feeds)
      • Maybe the static data lives outside the “data aggregation/query”
        point, becomes a separately managed source
 • How does / should GNA facilitate this?

 • Will the need for an IRMNG (or IRMNG equivalent) disappear or grow
   in the above scenario? (for example could this role be taken by
   another player or group of players…)
Thank you!




(supplementary slides)




Size of the task: IRMNG 2011 content cf. Cat. of Life 2011

                      Cat. of Life    % with    IRMNG Oct 2011       % with    IRMNG Oct 2011
                      2011 edition    auth's    (extant + fossil)    auth's    (fossil only)

Kingdoms                         8                              7                         (0)
Phyla                          111                            153                        (12)
Classes                        288                            509                        (64)
Orders                       1,233                          2,645                       (715)
Families                     8,071        0%               19,639     22.1%           (6,542)
Subfamilies                      -         -                    -         -                 -
Genera                     178,515        0%              452,848     97.1%          (90,278)
Subgenera                        -         -                    -         -                 -
Species (valid)          1,347,224     ~100%            1,020,519     ~100%          (16,792)
Species (synonyms)         895,441     ~100%              440,738     ~100%             (100)

• CoL has 70% of valid extant species names (of est. 1.9m total), thus maybe
  also 70% of valid extant genera (with subset of genus-level synonyms)
• IRMNG has further ~180k extant genus names and ~90k fossil names at this
  time (including syns) – est. ~25k still missing
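
The rounded figures in the two bullets above can be checked against the table. The sketch below is one reading of how they arise; it assumes that the "further" extant genera are IRMNG's non-fossil genera minus the Cat. of Life genus count, which is an interpretation rather than something stated on the slide.

```python
# Values copied from the table above.
col_valid_species = 1_347_224
col_genera = 178_515
irmng_genera = 452_848          # extant + fossil
irmng_fossil_genera = 90_278    # fossil only

# Estimate quoted in the bullet above.
est_total_extant_species = 1_900_000

# Share of estimated extant species already in Cat. of Life.
col_species_share = col_valid_species / est_total_extant_species

# Assumed reading: extant IRMNG genera not accounted for by the CoL count.
irmng_extant_genera = irmng_genera - irmng_fossil_genera
further_extant_genera = irmng_extant_genera - col_genera

if __name__ == "__main__":
    print(f"CoL share of estimated extant species: {col_species_share:.3f}")
    print(f"IRMNG extant genera beyond CoL count:  {further_extant_genera:,}")
    print(f"IRMNG fossil-only genera:              {irmng_fossil_genera:,}")
```

Under these assumptions the share comes out close to the quoted 70%, and the two genus counts are consistent with the rounded ~180k and ~90k.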
Taxonomic names:
what the customer is currently offered (+ more…)

[Figure: name sources arranged in columns; column assignment of some items is approximate]
• publication: new names published (in primary literature)
• discovery: journal TOC’s, RSS feeds, text mining; abstracting services; subject bibliographies; Zoological Record; reviews, secondary literature
• official registers: ICTV Viruses DB; LPSN (Prokaryote names); ICBN Decisions; ICZN Decisions
• taxon-specific DB’s: CyanoDB; Index Fungorum; MycoBank; AlgaeBase; Plant GSD’s; The Plant List, IPNI, TROPICOS, ING; Animal GSD’s; Nomenclator Zoologicus
• integrated DB’s: ITIS; NCBI Taxonomy; WoRMS; etc.; Catalogue of Life; PaleoDB
• “all names”: ChecklistBank; GNI; GNUB (Botany, Zoology); ION (Index of Organism Names); other compilations e.g. regional lists, Wikispecies, Wikipedia, more…
Two approaches - GNI and Cat. of Life




          NameBank / GNI
• 20m+ names – all ranks, no hierarchy
• mix of “clean” and “dirty” names
• many duplicates
• extant + fossil, most sectors with at least some names
GNI search result – “Lawsonia” (all ranks returned) (Mar 2012)

… candidate genus names highlighted in red (although could be other ranks too)

… need access to original taxonomic / nomenclatural resources to sort out / see if anything missed




Two approaches - GNI and Cat. of Life




NameBank / GNI
• 20m+ names – all ranks, no hierarchy
• mix of “clean” and “dirty” names
• many duplicates
• extant + fossil, most sectors with at least some names

Cat. of Life
• <2m names – Linnaean ranks, in hierarchy
• all “clean”/ vetted names / relationships
• extant only, sectors either complete or absent
Cat. of Life search result – “Lawsonia” (Mar 2012)




FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 

IRMNG presentation March 2012

  • 1. www.obis.org.au/irmng IRMNG – the Interim Register of Marine and Nonmarine Genera: rationale and current status Tony Rees – CSIRO Marine and Atmospheric Research, Australia for: GN-CoL names and taxonomy sharing workshop, Hawaii, March 2012
  • 2. The Dream… Imagine a system that would…
      • Automatically classify “any” genus & species name to kingdom / phylum / class / order / family (as far down as possible) – “what is this critter” – plus hierarchical relations e.g. parents / children / siblings
      • Return whether a current (valid) or non-current name e.g. synonym
      • Check spelling for correctness, also authority details, plus supply original publication ref. as available
      • Return associated attributes such as extant / fossil status, habitat information, geographic / geologic range, more…
      • Work seamlessly, with a single point of entry, across all groups and geologic epochs including present day
      • Be as up-to-date as possible (latest content), and authoritative (maintained by relevant experts)
      Tony Rees: IRMNG March 2012
  • 3. Realising the Dream…
      • For extant taxa: role of Cat. of Life, however ~30% of species still to go; for fossil taxa: PaleoDB (unknown proportion missing, maybe 50%?)
      • In the meantime, could make progress by assembling a global genera list, and infilling with species names as available
      [Diagram labels: genera, species]
      • IRMNG is an attempt along these lines… a work in progress, with modest resourcing, but available for use now.
  • 4. IRMNG data sources
      • Animal genera + auth’s from Nomenclator Zoologicus and elsewhere, tax. placements and synonymies from multiple sources including CoL, individual taxon treatments and printed works
      • Botanical genera and auth’s from Index Nominum Genericorum (ING) supplemented with other sources, tax. placements and synonymies from multiple sources including GRIN (APGIII in the main), Index Fungorum, AlgaeBase, CyanoDB, more
      • Prokaryote genera, auth’s and tax. placements from LPSN (Euzéby list), previous/non-valid names from multiple sources
      • Virus genera and tax. placements from ICTV db (multiple versions – very different through time)
      • Species lists (all groups) from CoL 2006, Aphia/WoRMS 2006, AFD, NZ Organisms Register + more.
  • 5. IRMNG content as at March 2012 (cf. e.g. Cat. of Life):
      Cat. of Life (2011 version):
      • 8k families
      • 178k genera
      • 2.25m species names (including synonyms)
      IRMNG:
      • 19k families
      • 454k genera
      • 1.46m species names (including synonyms)
      Notes:
      • Not all IRMNG genera yet linked to relevant families, but ~370k are (remainder linked to higher taxon i.e. phylum, class or order)
      • Extant/fossil, marine/nonmarine flags held for majority of names
      • Nomenclatural status known for most names, tax. status i.e. valid name/synonym for only a subset at this time (varies by group)
      • Authority known for >97% of genera, publication details for “animal” subset (from Nomenclator Zoologicus in the main)
      • Fuzzy matching (TAXAMATCH) deployed over all web-based queries for correction of potential errors in input names to be matched.
  • 6. IRMNG in practice – example genus = “Lawsonia” • Same name is currently a valid genus in 3 Codes i.e. plants, animals and bacteria (no barriers to this)
  • 7. Required base information is scattered in multiple systems / printed works at this time [screenshots: plant, animal, bacterium (etc.)]
  • 8. Required base information is scattered in multiple systems / printed works at this time [screenshots: plant, animal, bacterium (etc.)]
  • 9. IRMNG query as at March 2012
  • 10. IRMNG query as at March 2012 [screenshot annotations: synonym of; extant, habitat flags (as known); children; parents]
  • 11. Note: IRMNG fields displayed on the web are only a subset of full information held for any name, e.g.:
  • 12. IRMNG core fields
      • IRMNG ID, Rank
      • Scientific name (for species: epithet + parent ID)
      • Authority
      • Publication (as “microcitation” – subset with link to refs. module)
      • Orthography verified against (authoritative source)
      • Parent ID (+ “according to…”) – Linnaean ranks only at this time
      • Nomenclatural status (+ relation with other names as needed) + “according to…”
      • Taxonomic status (same)
      • Nomenclatural Code
      • Taxonomic or nomenclatural remarks
      • Extant/fossil, marine/nonmarine flags + “according to” (could be “as per parent”)
      • Date entered, last modified, deprecated (where required)
      • Source(s) for above
      (under consideration…)
      • Intermediate ranks e.g. subfamily, subgenus, also infraspecies (not currently held)
      • Type genus / species indicator
      • Freshwater / terrestrial flags vs. present “nonmarine”
      • Geo flags (country codes etc.)
      • Palaeo range (periods/epochs)
      • Vernacular names as available
  • 13. IRMNG is not just a “passive” aggregator… Editorial / curatorial decisions / actions required to:
      • Correct obvious data errors
      • Assemble “complete” records from multiple sources (where one source data deficient)
      • Normalise authority data (in particular) to a “house style”
      • Digitise or transcribe print material into electronic form where not otherwise available
      • Decide between conflicting content in data sources e.g. for authority orthography/year, taxonomic placement, valid/synonym status and more
      • Cross-link names e.g. synonyms -> current names, basionyms -> replacement names, misspelled names to their correctly spelled counterparts, etc.
      • Reconcile variant higher taxonomies as supplied to a single hierarchy
      • Add nomenclatural or taxonomic remarks as required.
  • 14. Relevance to present meeting?
      • Demonstrates utility of a single entry point to a system permitting query on “any name” – i.e., a [comprehensive] Taxonomic Name Resolution Service (TNRS) covering all life
      • Envisage something like OBIS or GBIF, but for taxonomy – the aggregator / central query point is not a content author, but provides integration and value-added services
      • IRMNG – based on static snapshot/s of multiple data sources; cf. a “super catalogue” should be based on live feeds from relevant authoritative sources, continuously updated as available (?+ some static data not available as feeds)
      • Maybe the static data lives outside the “data aggregation/query” point, becomes a separately managed source
      • How does / should GNA facilitate this?
      • Will the need for an IRMNG (or IRMNG equivalent) disappear or grow in the above scenario? (for example could this role be taken by another player or group of players…)
  • 15. Thank you!
  • 17. Size of the task: IRMNG 2011 content cf. Cat. of Life 2011

      Rank                 Cat. of Life    % with    IRMNG Oct 2011       % with    IRMNG Oct 2011
                           2011 edition    auth's    (extant + fossil)    auth's    (fossil only)
      Kingdoms                        8                              7                         (0)
      Phyla                         111                            153                        (12)
      Classes                       288                            509                        (64)
      Orders                      1,233                          2,645                       (715)
      Families                    8,071    0%                   19,639    22.1%            (6,542)
      Subfamilies                     -    -                         -    -                      -
      Genera                    178,515    0%                  452,848    97.1%           (90,278)
      Subgenera                       -    -                         -    -                      -
      Species (valid)         1,347,224    ~100%             1,020,519    ~100%           (16,792)
      Species (synonyms)        895,441    ~100%               440,738    ~100%              (100)

      • CoL has 70% of valid extant species names (of est. 1.9m total), thus maybe also 70% of valid extant genera (with subset of genus-level synonyms)
      • IRMNG has further ~180k extant genus names and ~90k fossil names at this time (including syns) – est. ~25k still missing
  • 18. Taxonomic names: what the customer is currently offered (+ more…)
      [Diagram. Column headings: publication; discovery; official registers; taxon-specific DB’s; integrated DB’s; “all names”.
      Boxes include: New names published (in primary literature); Journal TOC’s, RSS feeds, text mining; Abstracting services; Subject bibliographies; Zoological Record; Reviews, secondary literature; ICTV Viruses DB; LPSN (Prokaryote names); ICBN Decisions; ICZN Decisions; CyanoDB; Index Fungorum; MycoBank; AlgaeBase; Plant GSD’s; The Plant List, IPNI, TROPICOS, ING (Botany); PaleoDB; Animal GSD’s; Nomenclator Zoologicus; ION (Index of Organism Names) (Zoology); ITIS; NCBI Taxonomy; WoRMS etc.; ChecklistBank; Catalogue of Life; GNI; GNUB; other compilations e.g. regional lists, Wikispecies, Wikipedia, more…]
  • 19. Two approaches - GNI and Cat. of Life
      NameBank / GNI
      • 20m+ names – all ranks, no hierarchy
      • mix of “clean” and “dirty” names
      • many duplicates
      • extant + fossil, most sectors with at least some names
  • 20. GNI search result – “Lawsonia” (all ranks returned) (Mar 2012) … candidate genus names highlighted in red (although could be other ranks too) … need access to original taxonomic / nomenclatural resources to sort out / see if anything missed
  • 21. Two approaches - GNI and Cat. of Life
      NameBank / GNI
      • 20m+ names – all ranks, no hierarchy
      • mix of “clean” and “dirty” names
      • many duplicates
      • extant + fossil, most sectors with at least some names
      Cat. of Life
      • <2m names – Linnaean ranks, in hierarchy
      • all “clean” / vetted names / relationships
      • extant only, sectors either complete or absent
  • 22. Cat. of Life search result – “Lawsonia” (Mar 2012)
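
Slide 5 notes that fuzzy matching (TAXAMATCH) is applied to web queries so that slightly misspelled input names can still be matched. The sketch below is only a rough illustration of that idea and is not the TAXAMATCH algorithm: it uses the generic similarity ratio from Python's standard difflib module, and the name list, the function name suggest_genera and the 0.8 cutoff are all assumptions made for the example.

```python
import difflib

# Illustrative sample list only (an assumption); the real register holds ~450k genera.
GENERA = ["Lawsonia", "Wagneria", "Lawsonella", "Larsonia", "Homo"]

def suggest_genera(query, names=GENERA, cutoff=0.8):
    """Return names that closely resemble the query, best match first."""
    # Genus names are written with an initial capital only.
    normalised = query.strip().capitalize()
    # Generic string similarity; a stand-in for a taxon-aware matcher.
    return difflib.get_close_matches(normalised, names, n=5, cutoff=cutoff)

# A misspelled query still surfaces near matches for a human to review.
print(suggest_genera("lawsonnia"))
```

Returning several near matches rather than a single "corrected" name leaves the final choice to the user, which is in line with the editor's note that one of the returned names may be the intended target.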

Editor's Notes

  1. Talk prepared for GN-CoL names and taxonomy sharing workshop, Hawaii, March 2012
  2. Hierarchical approach is in contrast to e.g. NameBank, GNI:
      - Names are not accepted without a parent (even if this is “Animalia (awaiting allocation)” in a few cases)
      - Placeholder groups e.g. “Mollusca (awaiting allocation)” are erected at Order and Family level to allow addition of genus names not yet placed to family (for homonymy in particular, also because other details e.g. publication info, extant/fossil status may already be available)
  3. Homonymy is a big problem: up to 15% of all genus names are homonyms/isonyms, either within or across Codes (including some misspellings which collide with other “good” names). (Isonyms: multiple publication instances of the same name as new, based on the same type or concept.) Many genus names are valid across more than one Code (e.g. used in botany and zoology for different taxa); a handful are concurrently valid across three Codes, as in this example: Lawsonia. The worst example currently is “Wagneria”: 14 instances in IRMNG, 2 valid, the rest synonyms. These cannot be disentangled without a master list of genus names.
  4. IRMNG is a central aggregation point for all such information as readily available from multiple sources, both electronic and print, although the compilation of names / associated nomenclatural info. outstrips the full taxonomic information at this time. Incorporation of “TAXAMATCH” fuzzy matching also permits return of other names differing only slightly from the entered name, in case one of these is in fact the intended target (it also permits a degree of data cleaning and reconciliation/deduplication).
  5. The IRMNG web query interface also includes information on extant & habitat flags, synonymy (as held), sources of the data, information about parent and child taxa, and so on.
  6. Currently IRMNG is structured around Linnaean ranks only, i.e. kingdom / phylum / class / order / family / genus / species (no infraspecies are held at this time); this may be extended in future. Deprecated records (e.g. duplicates detected during subsequent QA) are left on the system with their IRMNG ID intact, in case they are referred to elsewhere or require re-activation. Records are flagged as either current (valid) or non-current at the indicated Rank; it is not yet clear how to handle taxa considered non-current at the designated rank but current at another.
  7. Cat. of Life compared with IRMNG:
      - Cat. of Life misses many genus-level synonyms / misspellings recognised elsewhere (including its source DB’s)
      - Genera not treated as distinct data objects in CoL (unless changed recently) i.e. no authorities, publication info, nomenclatural or taxonomic remarks
      - Coverage of fossils is considered a valuable feature of IRMNG (though no systematic attempt at species ingestion as yet)
  8. Many single sources of taxon names:
      - often not integrated
      - newly published names discoverable only with some effort (although “official” registries/lists for prokaryotes, viruses)
      - considerable latency as names flow from published (at left) to aggregators (at right)
  9. GNI / NameBank approach: collect as many namestrings as possible, any rank
      - User needs to explore source/s to determine taxonomic hierarchy and other information (if held)
      - Or: maybe one day, will be offered in a coherent hierarchy/list (but not any time soon)
  10. GNI produces a (partial?) list of known orthographies (mix of all ranks). Species and below can generally be eliminated by pattern matching, leaving uninomial names, i.e. genera and above (plus authorities), in multiple potential variants. Note this suggests that there may be 3 genuinely distinct “Lawsonia” instances known to GNI at this time, although sometimes the situation is more opaque / potentially misleading (similar auth’s but different taxa, different auth’s but the same taxon, or no auth held).
  11. Cat. of Life approach: stitch together authoritative lists for global sectors complete to species
      - Some sectors (30% of all extant taxa) not yet sourced, may have no lists
      - Information above species level is sketchy (e.g. no genus, family auth’s or other information)
      - Fossil taxa are omitted at this time
  12. The Catalogue of Life largely indexes species and infraspecies; genera are presented last, with no authorities (although position in the hierarchy can be accessed from this page). Note that only 2 “Lawsonia”s are held; at least one more exists that is not known to CoL (either missing, or out of scope, i.e. fossil).
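
Note 10 describes eliminating species-level strings by pattern matching and then judging how many distinct genus instances remain from their authorities. A minimal sketch of that step follows, under stated assumptions: the sample strings use invented authorities ("Alpha", "Beta & Gamma") rather than real ones, the regular expression handles only plain ASCII uninomials, and grouping by the first word of the authority is a deliberately crude stand-in for proper authority normalisation.

```python
import re
from collections import defaultdict

# One capitalised word, optionally followed by an authority that starts
# with a capital letter or an opening parenthesis (ASCII names only).
UNINOMIAL = re.compile(r"^([A-Z][a-z]+)(?:\s+([A-Z(].*))?$")

def distinct_instances(name_strings, genus):
    """Group uninomial strings for one genus by a crude authority key."""
    groups = defaultdict(list)
    for raw in name_strings:
        match = UNINOMIAL.match(raw.strip())
        if not match or match.group(1) != genus:
            continue  # binomials, trinomials and other names are skipped
        authority = match.group(2) or ""
        tokens = re.findall(r"[A-Za-z]+", authority)
        # Key is the first word of the authority; "" when none is held.
        key = tokens[0].lower() if tokens else ""
        groups[key].append(raw)
    return dict(groups)

# Invented sample strings (assumption): authorities are placeholders.
SAMPLE = [
    "Lawsonia Alpha, 1801",
    "Lawsonia (Alpha, 1801)",
    "Lawsonia Beta & Gamma, 1902",
    "Lawsonia",
    "Lawsonia inermis",
    "Lawsonia inermis Alpha",
    "Lawsoniaceae",
]

for key, variants in distinct_instances(SAMPLE, "Lawsonia").items():
    print(repr(key), variants)
```

The group with an empty key shows the "no auth held" case from the note: such strings cannot be assigned to any instance without consulting the original sources, and abbreviated authorities would need real normalisation before grouping could be trusted.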