The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life


Presentation at the Biodiversity Heritage Library @ Smithsonian Libraries event during ALA (June 25, 2007) held at the National Museum of Natural History

Published in: Education, Technology

  1. 1. The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life. Martin R. Kalfatovic, Smithsonian Institution Libraries
  2. 2. Biodiversity Heritage Library
  3. 3. Structure of the Encyclopedia of Life
  4. 4. [Figure: Serine Molecule]
  5. 5. Education & Outreach (Smithsonian/Harvard); Informatics (Marine Biological Laboratory); Secretariat (Smithsonian); Synthesis Center (Field Museum); Biodiversity Heritage Library
  6. 6. Biodiversity Heritage Library
      • 2003, Telluride. Encyclopedia of Life meeting
      • February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
      • May 2005. Washington. Ground work for the Biodiversity Heritage Library
      • June 2006. Washington. Organizational and Technical meeting
      • August 2006. New York Botanical Garden. BHL Director’s Meeting
      • October 2006. St. Louis/San Francisco. Technical meetings
      • February 2007. Museum of Comparative Zoology. Organizational meeting
  7. 7. Biodiversity Heritage Library
      • American Museum of Natural History (New York)
      • Field Museum (Chicago)
      • Natural History Museum (London)
      • Smithsonian Institution (Washington)
      • Missouri Botanical Garden (St. Louis)
      • New York Botanical Garden (New York)
      • Royal Botanic Garden, Kew
      • Botany Libraries, Harvard University
      • Ernst Meyer Library of the Museum of Comparative Zoology, Harvard University
      • Marine Biological Laboratory / Woods Hole Oceanographic Institution
      Illustration: James Dwight Dana, Zoophytes. Atlas, 1849
  8. 8. Taxonomic Literature
      • Over 250 years of systematic description of life
      • The cited half-life of publications in taxonomy is longer than in any other scientific discipline
      • The decay rate is longer than in any scientific discipline
          • Tom Moritz
  9. 9. Literature Repatriation
      Biologia Centrali-Americana. Edited by Frederick Ducane Godman and Osbert Salvin. London: Pub. for the editors by R. H. Porter, 1879-1915
  10. 10. Digital Divide?
  11. 11. Digital Divide?
      “Vishwas Chavan travels a lot. An informatician based at the National Chemical Laboratory in Pune, India, he collects data on what types of animal live where in India to enter into a biodiversity database … Much of the information Chavan seeks is in old, out-of-print tomes … To find them, Chavan has spent years trailing around libraries. He dreams of the day when books such as these are scanned and made available as digital files on the Internet.”
      “Science in the Web Age: The Real Death of Print” by Andreas von Bubnoff, Nature 438, 550-552, 1 December 2005
      Illustration: Henry Walter Bates, The Naturalist on the River Amazons, 1863
  12. 12. Narrowing the Divide
  13. 13. Biodiversity Heritage Library
      • Core literature pre-1923: 400,000 (80 million pages)
      • All pre-1923: 600-750,000 (120-150 million pages)
      • All literature: 1.4-1.6 million (280-320 million pages)
      Illustration: Mass. Zoological and Botanical Survey, Reports on the fishes, reptiles and birds of Massachusetts, 1839
  14. 14. Changing Priorities
      • Open Access for scientific literature
      • Encourage re-use and re-purposing of the data in multiple and diverse systems
      • Work with non-commercial publishers to provide access
  15. 15. Changing Priorities
      • BHL has had discussions with various society publishers as well as:
          • BioOne
          • JSTOR
      Illustration: T.H. Huxley by Leslie Ward (“Spy”)
  16. 16. Digital Book Creation
      • Automated structure detection – vital for serials
      • Taxonomic Intelligence
      • Digital Identifiers
      • Scalable mass scanning (outside of the Google environment)
      Illustration: Richard Owen by Leslie Ward (“Spy”)
  17. 17. BHL Structural Metadata: First Ingest. Internet Archive
      Bib #    Barcode      Sub-element
      45632    390888347
      45632    390888346
      45632    390888345
      45632    390888344
      45632    390888343
  18. 18. BHL Structural Metadata: Sub-Element Map. Internet Archive
      Bib #    Barcode      Sub-element
      45632    390888343    5
      45632    390888343    4
      45632    390888343    3
      45632    390888343    2
      45632    390888343    1
  19. 19. BHL Structural Metadata: Page Structure Map. Internet Archive
      XML structure map that delineates the relationships of the images created automatically
      Bib #    Barcode      Sub-element    Image Number
      45632    390888343    1              0005
                                           0004
                                           0003
                                           0002
                                           0001
  20. 20. Taxonomic Intelligence
  21. 21. Taxonomic Intelligence
      • 9.4 million name strings in NameBank
      • Uses sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
      • Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
      Illustration: Georges Louis Leclerc, comte de Buffon, Histoire naturelle: générale et particulière (Oiseaux), 1799-1808
  22. 22. Digital Identifiers
      • Digital Object Identifiers (DOI)
      • Handles
      • Life Science Identifiers (LSID)
      • URIs
      • Etc.
      Illustration: Telespiza palmeri, Avifauna of Laysan, 1893-1900
  23. 23. Digital Identifiers
      • Factors:
          • Cost per identifier
          • Community acceptance
          • Scalability
      • BHL is working with TDWG and others to come up with the best scheme(s)
      Illustration: Moho bishopi, Avifauna of Laysan, 1893-1900
  24. 24. Scalable Mass Scanning
  25. 25. The Internet Archive
      • 501(c)(3) organization
      • Dedicated to “Universal Access to Human Knowledge”
      • Founder of the Open Content Alliance
      • Provides:
          • Mass scanning
          • Archival storage of files
          • Image processing
          • Technology development
  26. 26. Internet Archive Scribe Scanner
      • Single Scribe Machine
          • Human operated
          • 200 volumes per shift per week
          • ~70,000 pages from a single machine per week
          • Cost: $100,000 / year
  27. 27. Internet Archive Scribe: Boston
      • Cooperative facility with the Boston Library Consortium (19 New England Libraries)
      • BHL Members MBL/WHOI and Harvard Libraries will use the facility
      • Status: In production
  28. 28. Internet Archive Scribe: Boston
  29. 29. Internet Archive Scribe: London
      • Single Scribe in place
      • Projected 5 unit pod to be located at The Natural History Museum
      • Status: In production
  30. 30. Internet Archive Scribe: London
  31. 31. Internet Archive Scribe: Washington
      • Single unit arrived May 5
      • Funded by Smithsonian Libraries
      • Projected 5 unit BHL pod in National Museum of Natural History
      • Projected 10-15 unit pod shared by Smithsonian/BHL and regional Washington libraries
  32. 32. Internet Archive Scribe: Washington
  33. 33. Internet Archive Scribe: New York
      • Current BHL plans focus on sharing a 10 unit pod located at the New York Public Library
      • American Museum of Natural History and New York Botanical Garden will use this facility
      • Status: In planning
      Illustration: Carl von Linné (1707-1778)
  34. 34. Internet Archive Scribe: Illinois
      • Two machines funded by State of Illinois
      • UIUC scanning Fieldiana (all series)
      • Arrangement coordinated by Michael Godow, Bryan Heidorn (UIUC/GSLIS), Betsy Kruger (UIUC/Library)
      • Status: In production
  35. 35. Internet Archive Scribe: Illinois
  36. 36. BHL Portal
      • Library catalog-like interface to BHL literature
      • Enhanced structural analysis to provide volume/issue/article page access to the literature
      • Iterative development based on feedback from user community
      • Provide access to two key audiences:
          • Humans
          • Machines
  37. 37.
  38. 46. BHL Literature Online
      • 1,291,485 pages
      • 657,310 pages via BHL Portal
      “Yet another physical difficulty is the task of assembling the library and indexes which will enable the student to work under proper conditions…. the beginner must now be prepared to spend liberally, or else must establish himself in an institution where a large library exists; if he work by himself with only a few books, he will have to confine himself to a very narrow specialty indeed.”
      'The Limitations of Taxonomy' by J.M. Aldrich, Science, April 22, 1927, vol. LXV, no. 1686, p. 381
  39. 47. Biodiversity Heritage Library
  40. 48. Biodiversity Heritage Library
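
Slide 13 pairs each size estimate with a page count. The short Python check below works out the pages-per-item ratio those pairs imply. It assumes the first figure on each line counts volumes, which the slide does not state, and the dictionary layout is only an illustration.

```python
# Figures copied from slide 13 as (low, high) ranges.
# Assumption: the first number on each slide line counts volumes.
ESTIMATES = {
    "Core literature pre-1923": ((400_000, 400_000), (80_000_000, 80_000_000)),
    "All pre-1923": ((600_000, 750_000), (120_000_000, 150_000_000)),
    "All literature": ((1_400_000, 1_600_000), (280_000_000, 320_000_000)),
}


def pages_per_volume(volumes, pages):
    """Return the (low, high) pages-per-volume ratio implied by two ranges."""
    return (pages[0] / volumes[0], pages[1] / volumes[1])


for label, (volumes, pages) in ESTIMATES.items():
    low, high = pages_per_volume(volumes, pages)
    print(f"{label}: {low:.0f} to {high:.0f} pages per volume")
```

Every line of the slide works out to the same ratio of 200 pages per volume, which suggests the page counts were derived from the volume counts with a flat multiplier.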
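
Slides 17 to 19 describe a hierarchy that runs from bib number to barcode to sub-element to image number, recorded in an XML structure map. The Python sketch below shows one way such a map could be nested. The element and attribute names (`title`, `item`, `subElement`, `image`) are hypothetical and are not taken from any BHL or Internet Archive schema; only the identifier values come from the slides.

```python
import xml.etree.ElementTree as ET


def build_structure_map(bib_number, barcode, sub_element, image_numbers):
    """Nest image numbers under a sub-element, a barcode and a bib number.

    Element and attribute names are illustrative assumptions only.
    """
    title = ET.Element("title", {"bib": bib_number})
    item = ET.SubElement(title, "item", {"barcode": barcode})
    part = ET.SubElement(item, "subElement", {"n": str(sub_element)})
    for number in image_numbers:
        ET.SubElement(part, "image", {"number": number})
    return title


# Values from slide 19: one bib number, one barcode, sub-element 1, five images.
root = build_structure_map(
    "45632", "390888343", 1, ["0001", "0002", "0003", "0004", "0005"]
)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the image numbers as strings preserves their leading zeros, which matters if they are later matched against image file names.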
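
Slide 21 says TaxonGrab locates likely name strings in OCR text, but the slides do not describe how. The Python sketch below is a much simpler stand-in, not TaxonGrab itself: it treats a capitalised word followed by a lowercase word as a candidate binomial and discards pairs that contain common English words. The stop list and the `known_names` parameter (standing in for a NameBank-style lookup) are assumptions made for illustration.

```python
import re

# Candidate binomial: capitalised genus followed by a lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")

# Tiny illustrative stop list; a real system would need a far larger lexicon.
COMMON_WORDS = {
    "the", "and", "from", "with", "was", "were", "this", "that",
    "are", "has", "for", "found", "near",
}


def find_candidate_names(text, known_names=frozenset()):
    """Return likely binomial name strings found in text, in order."""
    candidates = []
    for match in BINOMIAL.finditer(text):
        genus, epithet = match.groups()
        name = f"{genus} {epithet}"
        if name in known_names:
            # Already confirmed by the (hypothetical) name list.
            candidates.append(name)
        elif genus.lower() not in COMMON_WORDS and epithet not in COMMON_WORDS:
            candidates.append(name)
    return candidates


sample = "The specimen of Telespiza palmeri was collected with Moho bishopi on the island."
print(find_candidate_names(sample))
```

The two species names in the sample are taken from the captions on slides 22 and 23. Adding newly confirmed names to `known_names` on each pass mirrors, in miniature, the iterative processing the slide describes.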
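
The throughput and cost figures on slide 26 can be combined with the core-literature page count on slide 13 for a rough sense of scale. The Python sketch below assumes one shift, 52 working weeks per year, and that the $100,000 annual figure is the full cost of one machine; none of these assumptions is stated in the slides.

```python
PAGES_PER_WEEK = 70_000            # slide 26: approximate pages per machine per week
COST_PER_YEAR_USD = 100_000        # slide 26: cost per machine per year
WEEKS_PER_YEAR = 52                # assumption: the machine runs every week
CORE_PRE_1923_PAGES = 80_000_000   # slide 13: core pre-1923 literature


def pages_per_year(pages_per_week=PAGES_PER_WEEK, weeks=WEEKS_PER_YEAR):
    """Pages one machine scans in a year under the stated assumptions."""
    return pages_per_week * weeks


def cost_per_page(cost_per_year=COST_PER_YEAR_USD):
    """Annual machine cost divided by annual page output."""
    return cost_per_year / pages_per_year()


def machine_years(total_pages):
    """Years a single machine would need to scan total_pages."""
    return total_pages / pages_per_year()


print(f"Pages per machine-year: {pages_per_year():,}")
print(f"Cost per page: ${cost_per_page():.4f}")
print(f"Machine-years for the core literature: {machine_years(CORE_PRE_1923_PAGES):.1f}")
```

Under these assumptions a single machine comes to roughly 2.7 cents per page, and the core pre-1923 literature to about 22 machine-years.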