
Global Library of Life: The Biodiversity Heritage Library



Global Library of Life: The Biodiversity Heritage Library. Martin R. Kalfatovic. Boston Library Consortium Meeting. Boston Public Library. 18 March 2008. Boston, MA.

Published in: Education, Technology


  1. A Global Library for Life
     Martin R. Kalfatovic, Smithsonian Institution Libraries, 18 March 2008
  2. "The cultivation of natural science cannot be efficiently carried on without reference to an extensive library." Charles Darwin et al. (1847)
     Darwin, C. R. et al. 1847. Copy of Memorial to the First Lord of the Treasury [Lord John Russell], respecting the Management of the British Museum. Parliamentary Papers, Accounts and Papers 1847, paper number (268), volume XXXIV.253 (13 April): 1-3. [Complete Works of Charles Darwin Online]
  3. Taxonomic Literature
     • Over 250 years of systematic description of life
     • Systema naturae (10th ed., 1758) by Carl von Linné
  4. Taxonomic Literature
     • Taxonomic descriptions must be published for a name to be valid
     • Publications must be available to the public through trusted sources
     • Libraries have traditionally been that place
  5. Taxonomic Literature
     • The cited half-life of publications in taxonomy is longer than in any other scientific discipline
     • "The decay rate is longer than in any scientific discipline." (Tom Moritz, "Macro-economic case for open access")
  6. BHL Timeline
     • 2003. Telluride. Encyclopedia of Life meeting
     • February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
     • May 2005. Washington. Groundwork for the Biodiversity Heritage Library
     • June 2006. Washington. Organizational and technical meeting
     • October 2006. St. Louis/San Francisco. Technical meetings
     • February 2007. Museum of Comparative Zoology. Organizational meeting
     • May 2007. Encyclopedia of Life and BHL Portal launch. Washington, DC
     • February 2008. Launch of EOL species pages and associated BHL literature
  7. BHL Members
     • American Museum of Natural History (New York)
     • Field Museum (Chicago)
     • Natural History Museum (London)
     • Smithsonian Institution (Washington)
     • Missouri Botanical Garden (St. Louis)
     • New York Botanical Garden (New York)
  8. BHL Members
     • Royal Botanic Gardens, Kew
     • Botany Libraries, Harvard University
     • Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
     • Marine Biological Laboratory / Woods Hole Oceanographic Institution
  9. BHL Members
     • University of Illinois, Urbana-Champaign (contributing member)
     • Scheme for adding European and Asian partners under way
     • Additional categories of membership under consideration
  10. BHL Focus: Literature
  11. BHL Focus: Literature
  12. BHL Focus: Literature
     • Low cost: 10-19 cents per page
     • Other projects recently funded: Microsoft and Google
     • Tractable, well-defined scientific domain
     • Supports GBIF and other international initiatives, including CBD, ABS, and the Darwin Declaration
  13. BHL Focus: Literature
     • Core literature pre-1923: 100 million pages (?)
     • All pre-1923: 120-150 million pages
     • All literature: 280-320 million pages
     • Over 5.4 million books dating back to 1469
     • 800,000 monographs
     • 40,000 journal titles (12,500 current)
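The per-page cost from slide 12 and the page estimates above can be combined into a rough digitization budget. This is back-of-the-envelope arithmetic only, assuming the 10 to 19 cent rate applies uniformly to every page; `RATE_CENTS`, `CORPORA` and `cost_range_dollars` are illustrative names.

```python
# Rough digitization budget from the figures on slides 12 and 13.
# Assumption: a flat per-page rate of 10 to 19 cents applies to every page.

RATE_CENTS = (10, 19)  # low and high per-page cost, in US cents

# (label, low page estimate, high page estimate) as given on slide 13
CORPORA = [
    ("Core literature pre-1923", 100_000_000, 100_000_000),
    ("All pre-1923", 120_000_000, 150_000_000),
    ("All literature", 280_000_000, 320_000_000),
]


def cost_range_dollars(pages_low: int, pages_high: int) -> tuple[int, int]:
    """Return (low, high) cost in whole US dollars using integer arithmetic."""
    low = pages_low * RATE_CENTS[0] // 100
    high = pages_high * RATE_CENTS[1] // 100
    return low, high


if __name__ == "__main__":
    for label, lo_pages, hi_pages in CORPORA:
        lo, hi = cost_range_dollars(lo_pages, hi_pages)
        print(f"{label}: ${lo:,} to ${hi:,}")
```

Under that assumption the core pre-1923 literature alone works out to roughly $10 million to $19 million, and the full 280-320 million page corpus to roughly $28 million to $61 million.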
  14. BHL Collections
     • 1.3 million catalogue records
     • 73% are monographs (the remainder are serials at title level)
     • 63% is English-language material
     • The next most common language is German (9%)
     • About 30% of the material was published before 1923
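To put the percentages above in absolute terms, here is a rough conversion to record counts. It assumes every percentage is taken over the full 1.3 million records and rounds down; `approx_counts` is an illustrative helper, not part of any BHL tool.

```python
# Approximate record counts implied by the percentages on slide 14.
# Assumption: each percentage applies to all 1.3 million catalogue records.

TOTAL_RECORDS = 1_300_000

SHARES_PERCENT = {
    "monographs": 73,
    "English-language": 63,
    "German-language": 9,
    "published before 1923": 30,
}


def approx_counts(total: int, shares: dict[str, int]) -> dict[str, int]:
    """Convert whole-number percentages into approximate record counts."""
    return {label: total * pct // 100 for label, pct in shares.items()}


if __name__ == "__main__":
    for label, count in approx_counts(TOTAL_RECORDS, SHARES_PERCENT).items():
        print(f"{label}: about {count:,} records")
```

That gives roughly 949,000 monograph records, 819,000 English-language records, 117,000 German-language records and 390,000 pre-1923 records; the remaining 27% (about 351,000) would be title-level serial records.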
  15. BHL Tools
     • Serials Bid List
       ◦ Allows institutions to “bid” on titles for scanning
       ◦ Helps avoid duplicate scanning of large serial runs
       ◦ Allows partial “bidding” so that other members can fill in runs of titles
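The bid list's core rule (two members should not claim the same volumes, while gaps stay visible so others can fill them) can be modelled in a few lines. This is a hypothetical sketch: the `Bid` and `SerialTitle` classes, the inclusive volume ranges, and refusing any overlapping claim are assumptions, not a description of the actual BHL tool.

```python
# Hypothetical model of a serials bid list: members claim inclusive volume
# ranges of a serial title, overlapping claims are refused, and unclaimed
# volumes are listed so other members can fill in the run.
from dataclasses import dataclass, field


@dataclass
class Bid:
    institution: str
    first_volume: int
    last_volume: int  # inclusive


@dataclass
class SerialTitle:
    title: str
    total_volumes: int
    bids: list[Bid] = field(default_factory=list)

    def place_bid(self, bid: Bid) -> bool:
        """Record the bid unless it overlaps an existing claim."""
        if not 1 <= bid.first_volume <= bid.last_volume <= self.total_volumes:
            raise ValueError("volume range out of bounds")
        for existing in self.bids:
            # Inclusive ranges overlap unless one ends before the other starts.
            ends_before = bid.last_volume < existing.first_volume
            starts_after = existing.last_volume < bid.first_volume
            if not (ends_before or starts_after):
                return False
        self.bids.append(bid)
        return True

    def unclaimed_volumes(self) -> list[int]:
        """Volumes nobody has bid on yet, open for partial bids."""
        claimed: set[int] = set()
        for b in self.bids:
            claimed.update(range(b.first_volume, b.last_volume + 1))
        return [v for v in range(1, self.total_volumes + 1) if v not in claimed]


if __name__ == "__main__":
    journal = SerialTitle("Example Journal", total_volumes=10)
    print(journal.place_bid(Bid("Member A", 1, 4)))  # True
    print(journal.place_bid(Bid("Member B", 3, 6)))  # False: overlaps volumes 3-4
    print(journal.place_bid(Bid("Member B", 5, 7)))  # True
    print(journal.unclaimed_volumes())               # [8, 9, 10]
```

Rejecting the whole overlapping bid keeps the sketch simple; a real system might instead trim the bid to the unclaimed volumes.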
  16. BHL Tools
     • Monograph de-duping tool
       ◦ Separate accounts for each institution
       ◦ Designed to ingest packing lists in Excel (.xls) format
       ◦ Lets each institution track its own packing-list upload activity
       ◦ If duplicates are found, you can see which institution scanned the item, and when
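One way such a de-duping check might work is sketched below. Everything here is an assumption for illustration: the CSV input (the slide says the real tool took .xls files), the `oclc`/`title`/`year` column names, and the matching rule of OCLC number first, then normalized title plus year.

```python
# Hypothetical sketch of packing-list de-duplication. Assumptions: packing
# lists arrive as CSV (the real tool ingested .xls), and a monograph is keyed
# by its OCLC number when present, otherwise by normalized title + year.
import csv
import io
import re
from typing import NamedTuple


class ScanRecord(NamedTuple):
    institution: str
    scan_date: str  # ISO date, e.g. "2008-02-14"


def match_key(row: dict[str, str]) -> str:
    """Build a comparison key for one packing-list row."""
    oclc = (row.get("oclc") or "").strip()
    if oclc:
        return "oclc:" + oclc
    title = re.sub(r"[^a-z0-9 ]", "", (row.get("title") or "").lower())
    title = " ".join(title.split())  # collapse runs of whitespace
    return f"title:{title}|{(row.get('year') or '').strip()}"


def check_packlist(csv_text: str, institution: str, scan_date: str,
                   registry: dict[str, ScanRecord]) -> list[tuple[str, ScanRecord]]:
    """Return (title, earlier scan) for duplicates; register new titles."""
    duplicates = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = match_key(row)
        if key in registry:
            duplicates.append((row["title"], registry[key]))
        else:
            registry[key] = ScanRecord(institution, scan_date)
    return duplicates


if __name__ == "__main__":
    registry: dict[str, ScanRecord] = {}
    first = "title,year,oclc\nFlora of Example County,1901,\n"
    second = "title,year,oclc\nFLORA of Example  County.,1901,\n"
    print(check_packlist(first, "Member A", "2008-01-10", registry))
    print(check_packlist(second, "Member B", "2008-02-14", registry))
```

The second upload differs only in capitalization, spacing and punctuation, so it is flagged against Member A's earlier scan and date, mirroring the "which institution scanned it, and when" behaviour on the slide.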
  17. BHL Tools
     • WonderFetch
       ◦ Allows institutions to pass along additional metadata that is not included in a MARC record:
         ▪ Intellectual property
         ▪ Due diligence
         ▪ Other identifiers
         ▪ Enumeration and chronology (for serial volumes)
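To make the idea concrete, the sketch below models a supplementary record carrying the four kinds of information listed above, plus a small parser for serial enumeration and chronology labels. The class, its field names, and the `v.12 (1899)` label format are hypothetical; they are not WonderFetch's actual schema.

```python
# Hypothetical shape for extra, non-MARC metadata attached to a scanned item.
# Field names and the label format are illustrative, not WonderFetch's schema.
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SupplementalMetadata:
    item_id: str
    rights_statement: str = ""      # intellectual property status
    due_diligence_note: str = ""    # how copyright status was checked
    other_identifiers: dict[str, str] = field(default_factory=dict)
    volume: Optional[int] = None    # enumeration
    year: Optional[int] = None      # chronology


# "v." followed by a volume number, optionally followed later by "(YYYY)".
ENUM_CHRON = re.compile(r"v\.\s*(\d+)(?:.*?\((\d{4})\))?")


def add_enum_chron(meta: SupplementalMetadata, label: str) -> SupplementalMetadata:
    """Fill volume/year from a spine-style label such as 'v.12 (1899)'."""
    match = ENUM_CHRON.search(label)
    if match:
        meta.volume = int(match.group(1))
        if match.group(2):
            meta.year = int(match.group(2))
    return meta
```

For example, `add_enum_chron(SupplementalMetadata("item-001"), "v.12 (1899)")` sets `volume` to 12 and `year` to 1899, while a bare `"v.3"` sets only the volume.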
  18. BHL Tools
     • OCLC Collections Analysis Tool
       ◦ Allows gross analysis of the member libraries' collections to help in selection
  19. So how do we get from this ...
  20. ... to this?
  21. The Internet Archive
     • 501(c)(3) organization
     • Dedicated to “Universal Access to Human Knowledge”
     • Founder of the Open Content Alliance
     • Provides:
       ◦ Mass scanning
       ◦ Archival storage of files
       ◦ Image processing
       ◦ Technology development
  22. BHL Scanning Centers
     • New York City
       ◦ 10-Scribe facility at New York Public Library
     • Boston
       ◦ 10-Scribe facility at Boston Public Library
     • Other
       ◦ 1 Scribe, 2 shifts, London
       ◦ 2 Scribes at UIUC
  23. BHL Scanning Centers
     • Washington, DC
       ◦ 1 Scribe machine at Smithsonian Libraries
       ◦ 10-Scribe facility at the Library of Congress with FEDLINK (operational Spring 2008)
  24. BHL Scanning Centers ... what Robert said!
  25. Scanning Stats
     • More than 5.5 million total pages scanned (IA and non-IA)
     • Going full speed in Boston
     • Productive smaller operations in Urbana-Champaign and London
     • Operations ramping up in Washington and New York
  26. But what about ...
  28. The difference between BHL and Google
     • Bibliographic accuracy for all materials
     • Ability to repurpose and reuse all data as needed
     • Congruence of the digital versions with the original printed materials
  32. BHL Add-ons
     • Identifiers added at a variety of levels
     • Structural markup for repurposing data in other systems
     • Semantic markup for repurposing data in other systems
     • Taxonomic intelligence
  33. Persistent Identifiers
     • Stable URL
     • Handle
     • DOI
     • BICI/SICI
     • ISSN
     • ISBN
     • LSIDs
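Some of the identifiers above carry their own check digit, which allows a cheap first validation before any resolver lookup. The functions below implement the standard ISSN rule (weights 8 down to 2, modulo 11, `X` for 10) and the ISBN-13 rule (alternating weights 1 and 3, modulo 10); the function names are illustrative.

```python
# Check-digit validation for two of the identifier schemes listed above.
# Handles, DOIs and LSIDs have no check digit, so they cannot be checked this way.


def is_valid_issn(issn: str) -> bool:
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    # Weights 8, 7, 6, 5, 4, 3, 2 for the first seven digits.
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


def is_valid_isbn13(isbn: str) -> bool:
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0
```

`is_valid_issn("0317-8471")` and `is_valid_isbn13("978-0-306-40615-7")` both return `True`; changing the final digit of either makes the check fail.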
  34. Structural Markup
      <article>
        <title> A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OF THE FAMILY CHALCIDIDÆ.* </title>
        <author> L. O. HOWARD. </author>
        <volume> 1 </volume>
        <issue> 2 </issue>
        <start_page> 65 </start_page>
        <end_page> 86 </end_page>
        <start_count_page> 85 </start_count_page>
        <end_count_page> 106 </end_count_page>
        <start_page_image_file> 3908800908001101smthrich_0085.djvu </start_page_image_file>
        <end_page_image_file> 3908800908001101smthrich_0106.djvu </end_page_image_file>
      </article>
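The markup above records both printed page numbers (`start_page`, `end_page`) and scan-sequence positions (`start_count_page`, `end_count_page`), which is what lets a citation to a printed page be resolved to a page image. A minimal sketch, assuming the offset between the two is constant within an article (85 - 65 = 20 here, which also matches 106 - 86) and that image files follow the `<prefix>_<4-digit count page>.djvu` pattern seen in the two file names; `image_file_for_page` is a hypothetical helper.

```python
# Sketch: map a printed page number to its page-image file using the
# structural markup. Assumptions: a constant printed-to-scan offset within the
# article, and image files named <prefix>_<4-digit count page>.djvu.
import xml.etree.ElementTree as ET

RECORD = """<article>
  <volume> 1 </volume>
  <start_page> 65 </start_page>
  <end_page> 86 </end_page>
  <start_count_page> 85 </start_count_page>
  <end_count_page> 106 </end_count_page>
  <start_page_image_file> 3908800908001101smthrich_0085.djvu </start_page_image_file>
</article>"""


def image_file_for_page(xml_text: str, printed_page: int) -> str:
    article = ET.fromstring(xml_text)
    start = int(article.findtext("start_page"))  # int() ignores surrounding spaces
    end = int(article.findtext("end_page"))
    if not start <= printed_page <= end:
        raise ValueError(f"page {printed_page} is outside {start}-{end}")
    offset = int(article.findtext("start_count_page")) - start  # 20 for RECORD
    first_file = article.findtext("start_page_image_file").strip()
    prefix = first_file.rsplit("_", 1)[0]  # drop the "_0085.djvu" suffix
    return f"{prefix}_{printed_page + offset:04d}.djvu"
```

For the Howard article, printed page 70 resolves to `3908800908001101smthrich_0090.djvu`, and the last printed page, 86, resolves to the `_0106.djvu` file named in `end_page_image_file`.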
  35. Semantic Markup
     • GoldenGATE: The intention of the GoldenGATE editor is to build a bridge between NLP components and XML markup of natural language text according to arbitrary XML schemas. It allows NLP components to be deployed to mark up the bodies of literature they were designed for. In this way, it enables transforming the texts into XML content according to an XML schema that was designed to gain maximum benefit from the knowledge they contain.
     • Integrated Open Taxonomic Access (INOTAXA)
  36. Taxonomic Intelligence
     • 10.7 million name strings in NameBank
     • Uses a sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
     • Iterative processing of BHL texts will both increase the number of name strings in NameBank and improve the accuracy of name-string recognition
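TaxonGrab's own method is not described on the slide, so the following is only a naive illustration of the general task: scan OCR text for a capitalized word followed by a lowercase word, then discard pairs containing common non-Latin words. The regular expression and the tiny `STOPLIST` are assumptions; a real name finder needs a far larger lexicon and must cope with OCR errors such as hyphenated line breaks.

```python
# Naive candidate-finder for Latin binomials in OCR text. This is NOT the
# TaxonGrab algorithm; it only illustrates flagging "Capitalized lowercase"
# word pairs and filtering out ordinary words. The stoplist is a tiny
# hypothetical sample, not a real lexicon.
import re

BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")
STOPLIST = {"the", "this", "these", "larvae", "of", "in", "and", "from", "feed"}


def candidate_names(ocr_text: str) -> list[str]:
    found = []
    for genus, epithet in BINOMIAL.findall(ocr_text):
        # Reject pairs where either word is a common (non-Latin) word.
        if genus.lower() in STOPLIST or epithet in STOPLIST:
            continue
        found.append(f"{genus} {epithet}")
    return found
```

Run on "The larvae of Apis mellifera feed on pollen.", the pattern also matches "The larvae", but the stoplist removes it and only `"Apis mellifera"` is returned.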
  37. BHL & Publishers
  38. Permissions
     • Seek permissions from copyright holders
     • Opt-in copyright model: the BHL will actively work with professional societies and associations to integrate their publications into the BHL in a way that serves the societies’ missions and goals
     • The BHL will digitize learned-society backfiles and mount them through the BHL Portal at no cost
     • The BHL will provide a set of files to the publishers for reuse as they see fit
  39. BHL Advantages
     • Use of the articles will increase, as evidenced by an upsurge in citations
     • Long-term management of the digital assets is provided by the BHL at no cost
     • Publishers’ content is embedded in the emerging knowledge ecology that is sweeping biology in this century
     • Structural markup of backfiles to conform with the NLM DTD (just starting)
  40. Successes
     • Entomological News
     • Journal of Hymenoptera Research
     • Herpetological Review
     • Publications of the San Diego Natural History Museum
     • California Academy of Sciences publications
     • And more ...
  41. Funding
     • Initial grant from the MacArthur and Sloan Foundations (as part of the Encyclopedia of Life grant)
     • Additional support from parent institutions
     • Additional grants being actively pursued by the BHL and individual members
  43. Structure of the Encyclopedia of Life: Serine Molecule
  44. Serine Molecule
     • Synthesis Center: Field Museum
     • Biodiversity Heritage Library
     • Secretariat: Smithsonian
     • Education & Outreach: Smithsonian/Harvard
     • Informatics: Marine Biological Laboratory & MOBOT
  45. EOL Species Pages
     • Built from a variety of new and existing sources
     • Views available for varying levels of expertise, from novice to expert
     • Legacy literature is a key component of the EOL species pages
  48. Looking Forward
     • Co-evolving bioinformatics resources produce a rich information ecology:
       ◦ Consortium for the Barcode of Life (CBOL), with gene sequences deposited in GenBank
       ◦ GBIF’s Electronic Catalog of Taxonomic Names
       ◦ Herbaria and museum specimen databases
  49. Looking Forward
     • Quick ramp-up: high early costs (development, mass scanning, etc.)
     • Derive some long-term costs from the operating budgets of the member institutions (examples under consideration: acquisitions budgets, staff positions, etc.)
     • Integrate functions and tasks with wider efforts where appropriate, e.g. mass storage
  50. The Long Now Strategy
     • The institutions creating the BHL exist to persist through time; that is an important part of their business
     • The future is uncertain, the technology landscape changes, and people pass on, so create consortial structures that are low-overhead, flexible, and able to respond quickly
  51. A Global Library for Life
     "In any well-appointed Natural History Library there should be found every book and every edition of every book dealing in the remotest way with the subjects concerned." Charles Davies Sherborn, Epilogue to Index Animalium, March 1922
  54. Thank You ... for sticking around!
  55. LINKS
     • Biodiversity Heritage Library
     • Biodiversity Heritage Library Blog
     • Encyclopedia of Life
     • Smithsonian Institution Libraries http:// /
     • Universal Biological Indexer and Organizer
     • Biologia Centrali-Americana
  56. CREDITS
     • Thanks to:
       ◦ Chris Freeland, Missouri Botanical Garden
       ◦ Tom Garnett, The Biodiversity Heritage Library
       ◦ The staff at the Internet Archive
     • Images from:
       ◦ The Galaxy of Images, Smithsonian Libraries ( )
       ◦ Martin R. Kalfatovic
       ◦ Suzanne C. Pilsk
       ◦ Bernard Scaife