Applying Taxonomic Intelligence to Digitization Initiatives

    Applying Taxonomic Intelligence to Digitization Initiatives - Presentation Transcript

    1. MBL WHOI Library, Marine Biological Laboratory, Woods Hole Oceanographic Institution • Applying Taxonomic Intelligence to Digitization Initiatives • Translating the Value of the Research Library, Information Futures Institute • Cathy Norton, MBLWHOI Library Director; Deputy Director, Biodiversity Heritage Library • October 27, 2007 • © 2007 MBLWHOI Library www.mblwhoilibrary.org
    2. TOPICS • Biodiversity Heritage Libraries • Open Content Alliance, Principles • Internet Archive Partner • Northeast Regional Digitizing Center @ Boston Public Library • Taxonomic Intelligence: modernizing the literature
    3. Woods Hole Scientific Community • NMFS (1871), MBL (1888), WHOI (1930), USGS (1960), SEA (1971), WHRC (1985) • This library serves the MBL, WHOI, USGS, NMFS, SEA, WHRC, and other scientific groups in the area. • Facing a new dynamic phase
    4. The vision: "Imagine an electronic page for each species of organism on Earth, available everywhere by single access on command. The page contains the scientific name of the species, a pictorial or genomic presentation of the primary type specimen on which its name is based, and a summary of its diagnostic traits. The page opens out directly or by linkage with other databases such as ARKive, Ecoport, and GenBank. It comprises a summary of everything known about the species’ genome, proteome, geographic distribution, phylogenetic position, habitat, ecological relationships, and, not least, its perceived practical importance for humanity." E. O. Wilson, 2003.
    6. Vision: Build a Digital Open Access Library for Biodiversity Literature
    7. Meetings • Colorado, 2005 • London, 2005: laboratories and libraries • Washington, BHL, 2006 • Simultaneous meetings in Woods Hole for BHL & EOL, 2006
    8. Members • American Museum of Natural History • Botany Library, Harvard • British Natural History Museum, UK • Field Museum* • MBLWHOI Library* • Missouri Botanical Garden* • Museum of Comparative Zoology, Harvard* • New York Botanical Garden • Royal Botanic Gardens, Kew, UK • Smithsonian Museum of Natural History* – University of Illinois, contributing member – EOL Founding Institution
    9. Why BHL now? • Legacy taxonomic literature available in museums has limited access • Much of it is rare • Systematic literature depends on the historic literature • The cited half-life of natural history is longer than that of any other scientific domain (TAXA TOY) Pre 1923 • 90% of biodiversity information is in these libraries • 90% of biodiversity is in developing countries, for example in Africa and South America
    10. The Open Content Alliance (OCA) represents the collaborative efforts of a group of cultural, technology, nonprofit, and governmental organizations from around the world that will help build a permanent archive of multilingual digitized text and multimedia content.
    11. Principles of OCA (Internet Archive) • The OCA will encourage the greatest possible degree of access to and reuse of collections in the archive, while respecting the rights of content owners and contributors. • Contributors will determine the terms and conditions under which their collections are distributed and how attribution should be made. • IA need not be obligated to accept all content that is offered to it and may give preference to that which can be made widely accessible. • IA will offer collection and item-level metadata of its hosted collections in a variety of formats. • IA welcomes efforts to create and offer tools (including finding aids, catalogs, and indexes) that will enhance the usability of the materials in the archive. • Copies of IA collections will reside in multiple archives internationally to ensure their long-term preservation and accessibility to all.
    12. Name: BioDiversity Heritage Library • Wiki for all involved • Web Presence! • Where to begin?
    13. Name of Consortium: BioDiversity Heritage Library • Web Presence!
    25. In the end… simplicity… • http://bhl.si.edu/
    26. • BHL invited to be a part of the EOL project. • EOL: build one web page for each known species… 1.8 million! • Alfred P. Sloan and MacArthur Foundations
    30. Northeast Digitization Center • Boston Public Library: Cathy Norton, Bernie Margolis, Brewster Kahle – Space infrastructure • 10 scanning stations • $0.10 per page • 50 books per day • Journals: metadata, foldouts • Transportation – ILL delivery, moving company, 15 rolling carts per trip • Photo by lesveilleus, 9/20/07
    31. Economies of Scale • North East Regional Digitization Center • Agreements made with the Boston Public Library to include the Boston Library Consortium and NE BHL members. • Smithsonian and Library of Congress • Field Museum (Illinois) • BNH UK and Kew UK* • Barbara Preece, Exec Director, BLC • Cathy Norton, MBLWHOI Library
    32. 10 Scribes, BPL
    33. Biology Digitization Projects: Problems, Dilemmas, Puzzles, Difficulties • Copyright: pre-1923, 1923-1964, orphan works, out-of-print – Stanford University Copyright Renewal Database • Permissions • Collaboration with publishers, societies, institutions, etc. • Duplicates, journals 85,000 - 14,000 BID LIST • Monographs, collection analysis, reference works
    34. Name Changes over Time • Taxonomic Intelligence
    35. “All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.” ~ Grimaldi & Engel, 2005, Evolution of the Insects
    36. • Information about named groups (taxa) of organisms (taxon-related information) • Extends back at least 1000 years • Books, journals, surveys • Museum specimens, herbaria • Is in many languages and is distributed • Image: from T.E. Glover, The Fishes of Southwestern Japan, c.1870
    37. The challenge for contemporary DIGITAL libraries • But … names of organisms change over time • Goal: Use one name to find the content for all names
    38. Names are even misspelled, such as: Loligo pealei, Loligo pealeii, Loligo pealii, Loligo pealei
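One way to catch spelling variants like those on slide 38 is approximate string matching against a list of reference names. The sketch below uses Python's standard `difflib`; the `closest_name` helper, the 0.85 threshold, and the choice of `Loligo pealeii` as the reference spelling are assumptions made for illustration, not something taken from the presentation or from any uBio tool.

```python
from difflib import SequenceMatcher

# Reference spellings (assumed for illustration only).
KNOWN_NAMES = ["Loligo pealeii", "Escherichia coli"]


def closest_name(candidate, known_names, threshold=0.85):
    """Return the known name most similar to candidate, or None if no
    known name reaches the similarity threshold."""
    best_name, best_score = None, 0.0
    for name in known_names:
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, candidate.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    for variant in ["Loligo pealei", "Loligo pealii", "Peranema"]:
        print(variant, "->", closest_name(variant, KNOWN_NAMES))
```

A threshold this strict keeps unrelated names apart but would still need expert review, since two distinct valid names can differ by a single letter.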
    39. Homonyms and polysemes • Virginia: people, places, animals • Peranema – the fern; Peranema – the euglenid • And of course Anorexia nervosa, Habeas corpus, and Etcetera etcetera
    40. Red spotted newt [figure] • Sources shown: Libraries, Search engines, Students and researchers, Publishers, Federal Agencies, Museums, COML, Federated databases • Counts shown: 106000, 515, 35800, 33, 712, 155850, 18700
    41. Serious challenges in federated environments • One organism, 4 scientific names, 4 maps • We want one map
    42. Classifications • Metadata – such as names – provided the power to index and search • Classifications allowed us to browse, navigate, and run hierarchical searches
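The hierarchical search mentioned on slide 42 can be illustrated by a search on a higher taxon that also returns records filed under its descendants. This is a minimal sketch under stated assumptions: the `PARENT` table is a heavily simplified classification, and the records and their titles are hypothetical.

```python
# child -> parent links in a simplified, illustrative classification.
PARENT = {
    "Mollusca": "Animalia",
    "Cephalopoda": "Mollusca",
    "Loligo": "Cephalopoda",
    "Loligo pealeii": "Loligo",
    "Chordata": "Animalia",
}

# Hypothetical catalog records, each filed under one taxon name.
RECORDS = [
    {"taxon": "Loligo pealeii", "title": "Squid record"},
    {"taxon": "Chordata", "title": "Fish record"},
    {"taxon": "Cephalopoda", "title": "Cephalopod record"},
]


def descendants(taxon, parent=PARENT):
    """Return every taxon below the given one in the hierarchy."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    found, stack = [], [taxon]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            found.append(child)
            stack.append(child)
    return found


def hierarchical_search(taxon, records):
    """Return records filed under the taxon or any of its descendants."""
    wanted = {taxon, *descendants(taxon)}
    return [r for r in records if r["taxon"] in wanted]


if __name__ == "__main__":
    for record in hierarchical_search("Mollusca", RECORDS):
        print(record["title"])
```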
    43. Reconciliation – linking alternative names for the same organism • A query initiated with any name can be expanded to all names and will unify the data associated with each
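The query expansion described on slide 43 can be sketched as a lookup from any name to its group of reconciled names, followed by a search across the whole group. The `NameReconciler` class, the group identifier, and the sample records are hypothetical; the name variants are the ones listed on slide 56.

```python
from collections import defaultdict


class NameReconciler:
    """Maps any known name to the full set of names for the same organism."""

    def __init__(self):
        self._group_of = {}                  # lowercased name -> group id
        self._names_in = defaultdict(set)    # group id -> names as written

    def add_group(self, group_id, names):
        for name in names:
            self._group_of[name.strip().lower()] = group_id
            self._names_in[group_id].add(name)

    def expand(self, name):
        """Return all reconciled names, or just the input if it is unknown."""
        group_id = self._group_of.get(name.strip().lower())
        if group_id is None:
            return {name}
        return set(self._names_in[group_id])


def unified_search(reconciler, records, query):
    """Return records filed under any name reconciled with the query."""
    names = {n.lower() for n in reconciler.expand(query)}
    return [r for r in records if r["name"].lower() in names]


reconciler = NameReconciler()
reconciler.add_group(
    "group-1",  # hypothetical identifier
    ["Escherichia coli", "Bacterium coli", "Bacillus coli", "Escheria coli"],
)

# Hypothetical records from different sources, each using a different name.
RECORDS = [
    {"name": "Bacterium coli", "source": "older monograph"},
    {"name": "Escherichia coli", "source": "recent journal"},
    {"name": "Escheria coli", "source": "scanned report"},
    {"name": "Loligo pealeii", "source": "squid paper"},
]

if __name__ == "__main__":
    for record in unified_search(reconciler, RECORDS, "Bacillus coli"):
        print(record["name"], "|", record["source"])
```

Starting from `Bacillus coli`, which no record uses, the search still returns the three records filed under the other variants.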
    44. And the other issue: connecting ALL data about all organisms together? • Data stores – mostly not happening (despite the success of GenBank) • Search engines – not taxonomically intelligent and missed 90% • Hyperlinks – slow, tedious, and unstable • Dynamic links – using variables, databases, and code (e.g. micro*scope) • Federation – cluster of partners playing by the same rules (e.g. OBIS) • Data transfer standards – rules that anyone can use (e.g. DiGIR, TAPIR, UBDB) • APIs – spigots from databases • Aggregation (mashups) – the chosen way
    45. Taxonomic intelligence is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms • All names & all classifications: ClassificationBank • Alternative names reconciled • Similar names disambiguated • Exploit hierarchies to browse and search, build a comprehensive classification • Improve performance with federated systems • Read documents, web sites, and databases, and taxonomically index the content • Create a unified portal to information about organisms on the internet
    46. Taxonomically intelligent aggregation technology builds portals to distribute information about organisms • There are many resources out there, but no single comprehensive resource for species information • Rather than building another big database, we can create a new way to link existing information using an aggregation portal • This places little or no burden on data providers • Protecting ownership and diversity of initiatives
    47. uBioRSS: Taxonomically Intelligent RSS Feed Aggregator
    48. MBL WHOI Library – Woods Hole authors’ publications
    49. MBL WHOI Library – Woods Hole species publications
    50-51. Taxonomically intelligent scientific text parsing
    52. Taxonomic intelligence works miracles • It will benefit any initiative that uses distributed and heterogeneous information about biology • Distributed content on the same species can be drawn together because different names will be standardized through reconciliation • We can read documents, find names, catalog and taxonomically index documents • Produce a framework around which we can organize and assemble remote and local content
    53. “Taxonomic intelligence” enhances search
    54. • Documents go to Internet Archive for OCR and storage • The documents are added to the BHL collection • uBio checks the BHL collection for new documents • The documents are scanned for names • TaxonFinder adds new strings to Namebank • Document markup with anchors • TaxonFinder adds all namebankIDs to Taxonomic Index • This index is called upon by various applications...
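The name-scanning and indexing steps on slide 54 can be approximated with a toy example. This is only a sketch and does not reproduce TaxonFinder or NameBank: the regular expression is a naive stand-in for name recognition, the name identifiers and document texts are hypothetical, and unrecognized candidates are set aside for review here rather than added to the name list automatically.

```python
import re
from collections import defaultdict

# Naive candidate pattern: a capitalized word followed by a lowercase word.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+ [a-z]{3,}\b")

# Hypothetical name list mapping known names to identifiers.
NAME_BANK = {"Loligo pealeii": 1, "Escherichia coli": 2}

# Hypothetical OCR text keyed by document identifier.
DOCS = {
    "d1": "Loligo pealeii was collected near Woods Hole.",
    "d2": "Later work compared Loligo pealeii with Escherichia coli.",
}


def build_taxonomic_index(documents, name_bank):
    """Return (index, unverified): index maps a name id to a list of
    (doc_id, character offset) anchors; unverified holds candidate strings
    that matched the pattern but are not in the name bank."""
    index = defaultdict(list)
    unverified = set()
    for doc_id, text in documents.items():
        for match in CANDIDATE.finditer(text):
            name_id = name_bank.get(match.group(0))
            if name_id is None:
                unverified.add(match.group(0))
            else:
                index[name_id].append((doc_id, match.start()))
    return dict(index), unverified


if __name__ == "__main__":
    index, unverified = build_taxonomic_index(DOCS, NAME_BANK)
    print(index)
    print(unverified)
```

The false positive that lands in `unverified` for the second document shows why a pattern alone is not enough and why candidates need to be checked against a curated name list.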
    55. Biological Data Revolution • Biomedical Knowledge • Biodiversity Knowledge
    56. No Complete List of Scientific Names • Scientific Names*: 112,133; 741,872; 49,382 • *Scientific Names ≠ Species • Published variants, objective synonyms, mis-spellings: Bacterium coli, Escherichia coli, Escheria coli, Bacillus coli
    57. Taxonomic Knowledge
    58. Data, Data, Everywhere
    59. The ‘biopipes’ concept (BIOPIPES) • Then, dragged the functions (pipes) you wanted onto your desktop • [Diagram labels, interleaved in extraction: get tree get data get matching blast clade name get ITIS Google get all reconciled preferred name Earth names search Nomenclator Zoologicus get subset of Get original Original publication EoL species site description information myEoL page] • And, of course, saved the functionality to apply to the next data
    60. Proceeding Boldly
    61. Progress • Mid 2007 (July): EoL funds started to flow • By mid 2008: EOL Informatics Teams: core bioinformatics infrastructure (Taxonomic Intelligence and high priority marine modules of the Universal Biodiversity Data Bus) will be in place • BioPipes for OBIS, BOLD, GenBank, EoL, BHL • List of most marine genera • EoL, with agreement, to show content from FishBase, SeaLifeBase, CephBase, etc. • RSS feeds and other alerts established to inform interested parties of new content
    62. Progress • BHL • 10 Scribes installed in Boston – MBL/Harvard/SI/BNH/MOBOT/Field Museum all scanning – AMNH/NYBG will use NY Public – Close to 2 million pages as of Nov 07
    63. What role will libraries play once the scanning is done? Will you be negotiators like you are now with serials? Will public domain publications be restricted forever by contract, or open?
    64. Road map • Be a community • Be a working environment • Be creative • Become an Informatics Center with scientific appointments. • Think translation not transactions! • Stay alive professionally
    65. Acknowledgments • Martin Kalfatovic, Neil Sarkar, Tom Garnett, David Remsen, Graham Higley, David Patterson, Connie Rinaldo, Diane Rielinger • A.W. Mellon Foundation • Alfred P. Sloan Foundation • John D. and Catherine T. MacArthur Foundation

    Martin Kalfatovic, 3 years ago

    CC Attribution-ShareAlike License
