SHERBORN’S INDEX ANIMALIUM
INTEGRATION INTO ION: ACCESS TO ALL
NIGEL ROBINSON
DIRECTOR YORK OPERATIONS

28 OCTOBER 2011
ZOOLOGICAL RECORD & INDEX ANIMALIUM




                        1758 to 1850   Index Animalium (names covered)
                        1864           Zoological Record founded
                        1999           ION created
                        Current        ZR and ION
©2010 Thomson Reuters
TEXT TO DATABASE




     ID        Image ID        Page             Name & reference
"362382","SIL34_02_24_0193","6101","splendens Turdus, W. E. Leach, Zool. Miscell. II. 1815, 30."
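The input row is ordinary CSV. The following is a minimal sketch of loading such rows with Python's standard `csv` module, assuming exactly four quoted columns per row (ID, image ID, page, name and reference). `IARow` and `read_rows` are hypothetical names. Typographic quotes such as a closing ” would not close a quoted field, so the sketch converts curly quotes to straight ones first; that conversion is itself an assumption about the data.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator


@dataclass
class IARow:
    """One transcribed Index Animalium entry (hypothetical record type)."""
    record_id: int
    image_id: str
    page: int
    entry: str  # raw "name & reference" text, not yet parsed


def read_rows(text: str) -> Iterator[IARow]:
    # Normalise typographic quotes left over from OCR or editing before parsing.
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != 4:
            continue  # malformed row; a real load would log it for correction
        record_id, image_id, page, entry = fields
        yield IARow(int(record_id), image_id, int(page), entry.strip())


sample = '"362382","SIL34_02_24_0193","6101","splendens Turdus, W. E. Leach, Zool. Miscell. II. 1815, 30.\u201d'
for row in read_rows(sample):
    print(row.record_id, row.page, row.entry)
```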
CHALLENGES




                        • Interpretation of Index Animalium data
                        • Inconsistent formatting, punctuation etc. in original books
                        • OCR errors
                        • Variant publication titles
                        • Application of management classification
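Variant publication titles and inconsistent punctuation mean that the same work can appear under several spellings. One simple first pass is to compare titles on a normalised key. This sketch assumes the differences are limited to case, punctuation and spacing. The function names and the variant strings are invented for illustration. Genuinely different abbreviations, or OCR misreadings, would still need a curated authority list.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def title_key(title: str) -> str:
    """Collapse case, punctuation and spacing so trivially different spellings compare equal."""
    title = title.lower()
    title = re.sub(r"[^\w\s]", " ", title)  # replace full stops, commas, etc. with spaces
    return " ".join(title.split())           # collapse runs of whitespace


def group_variants(titles: Iterable[str]) -> Dict[str, List[str]]:
    """Group the input titles by their normalised key, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for t in titles:
        groups[title_key(t)].append(t)
    return dict(groups)


# Hypothetical variant spellings, for illustration only.
variants = ["Zool. Miscell.", "Zool Miscell", "zool.  miscell", "Zool. Journ."]
for key, members in group_variants(variants).items():
    print(key, "->", members)
```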
NORMALISED DATA RECORD

                        Genus: Turdus
                        Species: splendens                                 • Bibliographic
                        Name string: Turdus splendens                        record
                        Author lastname: Leach
                                                                           • Classification
                        Author forename: W.E.
                                                                             routines
                        Abbreviated publication name: Zool. Miscell.
                        Full publication name: The Zoological Miscellany
                                                                           • ZR style entry
                        Volume: II                                         • Standard data
                        Year: 1815                                           load and QC
©2010 Thomson Reuters




                        Page: 30
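The slides do not describe the parsing routines that turn the raw "name & reference" string into these fields. The following is a minimal sketch. It assumes every entry follows the same `species Genus, Author, Publication Volume. Year, Page.` layout as the Turdus splendens example. The regular expression, the `NormalisedRecord` type and the `FULL_TITLES` lookup table are hypothetical. Real Index Animalium entries vary far more than this pattern allows.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup table; in practice this would come from a curated
# publication authority file.
FULL_TITLES = {"Zool. Miscell.": "The Zoological Miscellany"}

# Matches only the "species Genus, Author, Publication Volume. Year, Page."
# layout of the example entry; other layouts need other rules.
ENTRY_RE = re.compile(
    r"^(?P<species>[a-z-]+) (?P<genus>[A-Z][a-z]+), "
    r"(?P<author>[^,]+), "
    r"(?P<pub>.+?) (?P<volume>[IVXLC]+)\. "
    r"(?P<year>\d{4}), (?P<page>\d+)\.?$"
)


@dataclass
class NormalisedRecord:
    genus: str
    species: str
    name_string: str
    author_lastname: str
    author_forename: str
    abbreviated_publication: str
    full_publication: Optional[str]
    volume: str
    year: int
    page: int


def parse_entry(entry: str) -> Optional[NormalisedRecord]:
    m = ENTRY_RE.match(entry.strip())
    if m is None:
        return None  # unparsed: route to a manual correction queue
    parts = m["author"].split()
    if not parts:
        return None
    *forenames, lastname = parts
    pub = m["pub"]
    return NormalisedRecord(
        genus=m["genus"],
        species=m["species"],
        name_string=f"{m['genus']} {m['species']}",
        author_lastname=lastname,
        author_forename="".join(forenames),  # "W. E." -> "W.E."
        abbreviated_publication=pub,
        full_publication=FULL_TITLES.get(pub),
        volume=m["volume"],
        year=int(m["year"]),
        page=int(m["page"]),
    )


print(parse_entry("splendens Turdus, W. E. Leach, Zool. Miscell. II. 1815, 30."))
```

Entries that do not match return `None`. This lets them be queued for correction instead of being loaded with wrong field boundaries.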
PROCESS FLOW
                        INPUT FILE
                            ↓
                        PARSING ROUTINES
                            ↓
                        ZR CLASSIFICATION APPLIED
                            ↓ ↑  Correction cycles
                        DATA WAREHOUSE (in standard format)
                            ↓
                        OUTPUT FILES and ION
                            ↓
                        BIODIVERSITY COMMUNITY
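This flow can be sketched as a chain of record-processing stages. Records that fail a check are diverted to a correction queue rather than loaded. Everything here is hypothetical and stands in for the real parsing routines and ZR classification, which the slides do not detail:

- the toy `parse` and `classify` stages;
- the one-entry `CLASSIFICATION` table, which places Turdus, the thrush genus, in Aves;
- the error handling.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for the ZR classification: genus -> higher group.
CLASSIFICATION = {"Turdus": "Aves"}


@dataclass
class Record:
    raw: str
    genus: Optional[str] = None
    species: Optional[str] = None
    group: Optional[str] = None
    errors: List[str] = field(default_factory=list)


def parse(rec: Record) -> Record:
    # Toy parser: expects "species Genus, ..." at the start of the entry.
    head = rec.raw.split(",", 1)[0].split()
    if len(head) == 2:
        rec.species, rec.genus = head
    else:
        rec.errors.append("unparsed name")
    return rec


def classify(rec: Record) -> Record:
    # Attach a higher group only to records that parsed successfully.
    if rec.genus is not None:
        rec.group = CLASSIFICATION.get(rec.genus)
        if rec.group is None:
            rec.errors.append(f"no classification for {rec.genus}")
    return rec


def run(records: List[Record],
        stages: List[Callable[[Record], Record]]) -> Tuple[List[Record], List[Record]]:
    """Pass each record through every stage; split clean records from those needing correction."""
    warehouse: List[Record] = []
    corrections: List[Record] = []
    for rec in records:
        for stage in stages:
            rec = stage(rec)
        (corrections if rec.errors else warehouse).append(rec)
    return warehouse, corrections


loaded, to_fix = run(
    [Record("splendens Turdus, W. E. Leach, Zool. Miscell. II. 1815, 30."),
     Record("garbled OCR line")],
    [parse, classify],
)
print(len(loaded), len(to_fix))
```

A correction cycle then amounts to editing the records in `to_fix` and passing them through `run` again.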
INDEX TO ORGANISM NAMES SEARCH




           Turdus splendens
RESULTS AND VIEWS
DETAILS - MERGED RECORD
LINK OUT FOR ADDITIONAL INFORMATION
UPDATE NOTIFICATIONS – RSS ALERTS
METRICS INTEGRATION
NEXT STEPS

                                • Load next 200,000 names
                                • Further processing of remaining names
                        2011
                                •   Load remaining names
                                •   Web services and APIs
                                •   Publication name unification
                        2012    •   Apply unification and links to BHL
BENEFITS OF COLLABORATION
                        • Zoological Record production
                          – Authority file validation
                          – Resolution of homonyms in ZR production
                             • Enable notification to authors
                          – Increased links to literature and ZR in ION

                        • Extends the list of names used in the
                          literature back to 1758
                          – Homonym checks
                          – Data available for other projects
                             • More granular digital format
                             • Cleaner version of the data
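A homonym check over the extended name list can start by flagging any name string attributed to more than one author and year. This is a minimal sketch. The function name is hypothetical, the Examplus records are invented placeholders, and only the Turdus splendens authority comes from the slides.

The check has clear limits, so flagged names still need expert review:

- Exact string matching finds primary homonym candidates only.
- Variant spellings of an author's name would produce false positives.
- Secondary homonyms that arise when species are moved between genera are not detected.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (name string, author, year)
NameRecord = Tuple[str, str, int]


def homonym_candidates(records: Iterable[NameRecord]) -> Dict[str, Set[Tuple[str, int]]]:
    """Return name strings attributed to more than one (author, year) authority."""
    authorities: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for name, author, year in records:
        authorities[name].add((author, year))
    return {name: auths for name, auths in authorities.items() if len(auths) > 1}


records = [
    ("Turdus splendens", "Leach", 1815),
    ("Turdus splendens", "Leach", 1815),    # same authority cited twice: not flagged
    ("Examplus primus", "Author A", 1801),  # invented placeholder
    ("Examplus primus", "Author B", 1840),  # same name, different authority: flagged
]
print(homonym_candidates(records))
```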
QUESTIONS?


                        • Access
                              http://www.organismnames.com
                        • Contact
                              nigel.robinson@thomsonreuters.com


Editor's Notes

  • #5 More granularity needed to be a useful tool in detailed nomenclatural and bibliographic databases
    – Break down data into standard bibliographic and nomenclatural fields
    – Employ current ZR editorial processes and practices
    – Facilitate loading to ION through normal update processes
  • #8
    – IA data CSV file
    – Parsing algorithms
    – Normalised data loaded to data warehouse
    – ZR classification applied
    – Further QC, clean-up
    – Assignment of BHL links
    – Load to ION, assignment of LSID
    – Update capabilities