Biodiversity Heritage Library: Cornerstone of the Encyclopedia of Life


An overview of the Biodiversity Heritage Library project; presentation given at The Field Museum, May 8, 2007

Published in: Education, Technology

  1. The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life. Martin R. Kalfatovic, Smithsonian Institution Libraries
  2. Biodiversity Heritage Library
  3. Structure of the Encyclopedia of Life
  4. [Figure: structural diagram of a serine molecule]
  5. Education & Outreach (Smithsonian/Harvard); Informatics (Marine Biological Laboratory); Secretariat (Smithsonian); Synthesis Center (Field Museum); Biodiversity Heritage Library
  6. Biodiversity Heritage Library
     • 2003, Telluride. Encyclopedia of Life meeting
     • February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
     • May 2005. Washington. Groundwork for the Biodiversity Heritage Library
     • June 2006. Washington. Organizational and Technical meeting
     • August 2006. New York Botanical Garden. BHL Director’s Meeting
     • October 2006. St. Louis/San Francisco. Technical meetings
     • February 2007. Museum of Comparative Zoology. Organizational meeting
  7. Biodiversity Heritage Library
     • American Museum of Natural History (New York)
     • Field Museum (Chicago)
     • Natural History Museum (London)
     • Smithsonian Institution (Washington)
     • Missouri Botanical Garden (St. Louis)
     • New York Botanical Garden (New York)
     • Royal Botanic Gardens, Kew
     • Botany Libraries, Harvard University
     • Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
     • Marine Biological Laboratory / Woods Hole Oceanographic Institution
     Image: James Dwight Dana, Zoophytes. Atlas, 1849
  8. Taxonomic Literature
     • Over 250 years of systematic description of life
     • The cited half-life of publications in taxonomy is longer than in any other scientific discipline
     • The decay rate is longer than in any scientific discipline
       • Tom Moritz
  9. Literature Repatriation
     Image: Biologia Centrali-Americana. Edited by Frederick Ducane Godman and Osbert Salvin. London: Pub. for the editors by R. H. Porter, 1879-1915
  10. Digital Divide?
  11. Digital Divide?
      “Vishwas Chavan travels a lot. An informatician based at the National Chemical Laboratory in Pune, India, he collects data on what types of animal live where in India to enter into a biodiversity database … Much of the information Chavan seeks is in old, out-of-print tomes … To find them, Chavan has spent years trailing around libraries. He dreams of the day when books such as these are scanned and made available as digital files on the Internet.”
      Source: “Science in the Web Age: The Real Death of Print” by Andreas von Bubnoff, Nature 438, 550-552, 1 December 2005
      Image: Henry Walter Bates, The Naturalist on the River Amazons, 1863
  12. Narrowing the Divide
  13. Biodiversity Heritage Library
      • Core literature pre-1923: 400,000 (80 million pages)
      • All pre-1923: 600,000-750,000 (120-150 million pages)
      • All literature: 1.4-1.6 million (280-320 million pages)
      Image: Mass. Zoological and Botanical Survey, Reports on the fishes, reptiles and birds of Massachusetts, 1839
  14. Changing Priorities
      • Open Access for scientific literature
      • Encourage re-use and re-purposing of the data in multiple and diverse systems
      • Work with non-commercial publishers to provide access
  15. Changing Priorities
      • BHL has had discussions with various society publishers as well as:
        • BioOne
        • JSTOR
      Image: T.H. Huxley by Leslie Ward (“Spy”)
  16. Digital Book Creation
      • Automated structure detection: vital for serials
      • Taxonomic Intelligence
      • Digital Identifiers
      • Scalable mass scanning
      Image: Richard Owen by Leslie Ward (“Spy”)
  17. BHL Structural Metadata: First Ingest (Internet Archive)

      Bib #    Barcode      Sub-element
      45632    390888343
      45632    390888344
      45632    390888345
      45632    390888346
      45632    390888347

  18. BHL Structural Metadata: Sub-Element Map (Internet Archive)

      Bib #    Barcode      Sub-element
      45632    390888343    1
      45632    390888343    2
      45632    390888343    3
      45632    390888343    4
      45632    390888343    5

  19. BHL Structural Metadata: Page Structure Map (Internet Archive)
      XML structure map that delineates the relationships of the images, created automatically

      Bib #    Barcode      Sub-element    Image Number
      45632    390888343    1              0001
      45632    390888343    1              0002
      45632    390888343    1              0003
      45632    390888343    1              0004
      45632    390888343    1              0005
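
The three structural metadata slides describe a hierarchy: a bibliographic record (Bib #) owns barcoded physical items, each item is divided into sub-elements, and each sub-element maps to numbered page images. The Python sketch below is one possible way to model that hierarchy. The class and method names (`ScannedItem`, `SubElement`, `add_image`, `page_map`) are hypothetical and are not taken from BHL or Internet Archive software; the sample values are the ones shown on the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubElement:
    """One sub-element of a barcoded item, with its page images in scan order."""
    number: int
    image_numbers: list[str] = field(default_factory=list)


@dataclass
class ScannedItem:
    """A barcoded physical item that belongs to one bibliographic record."""
    bib_number: str
    barcode: str
    sub_elements: dict[int, SubElement] = field(default_factory=dict)

    def add_image(self, sub_element: int, image_number: str) -> None:
        # Create the sub-element on first use, then append the image to it.
        element = self.sub_elements.setdefault(sub_element, SubElement(sub_element))
        element.image_numbers.append(image_number)

    def page_map(self) -> list[tuple[str, str, int, str]]:
        # Flatten to (bib #, barcode, sub-element, image number) rows,
        # the same columns as the page structure map on slide 19.
        return [
            (self.bib_number, self.barcode, number, image)
            for number, element in sorted(self.sub_elements.items())
            for image in element.image_numbers
        ]


# Sample values from the slides: bib 45632, barcode 390888343, sub-element 1.
item = ScannedItem(bib_number="45632", barcode="390888343")
for image in ["0001", "0002", "0003", "0004", "0005"]:
    item.add_image(1, image)

rows = item.page_map()
for row in rows:
    print(row)
```

Keeping the image numbers as strings preserves the zero padding shown on the slide, which a numeric type would lose.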
  20. Taxonomic Intelligence
  21. Taxonomic Intelligence
      • 9.4 million name strings in NameBank
      • Uses sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
      • Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
      Image: Georges Louis Leclerc, comte de Buffon, Histoire naturelle: générale et particulière (Oiseaux), 1799-1808
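
The general idea of locating likely name strings in OCR text and checking them against a list of known names can be illustrated with a small Python sketch. This is not the TaxonGrab algorithm, whose details are not given in the slides; it is a deliberately naive stand-in that matches a capitalized word followed by a lowercase word. The function name `find_name_candidates` and the tiny `known` set (standing in for NameBank) are assumptions made for illustration, and the sample names come from the image captions on slides 22 and 23.

```python
import re

# Naive pattern for a candidate binomial: a capitalized genus-like word
# followed by a lowercase epithet-like word. Real name finding is far
# more involved; this pattern is only an illustrative assumption.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def find_name_candidates(ocr_text, known_names):
    """Split pattern matches into names found in the known list and the rest."""
    confirmed, unverified = [], []
    for match in BINOMIAL.finditer(ocr_text):
        name = f"{match.group(1)} {match.group(2)}"
        if name in known_names:
            confirmed.append(name)
        else:
            unverified.append(name)
    return confirmed, unverified


# A two-entry stand-in for a name list such as NameBank.
known = {"Telespiza palmeri", "Moho bishopi"}
text = "Specimens of Telespiza palmeri were taken on Laysan. The bird sings."

confirmed, unverified = find_name_candidates(text, known)
print(confirmed)
print(unverified)
```

The pattern alone also matches ordinary sentence openings such as "The bird", which shows why a lookup against a large name list matters: a bigger list of known names makes it easier to separate real names from false positives.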
  22. Digital Identifiers
      • Digital Object Identifiers (DOI)
      • Handles
      • Life Science Identifiers (LSID)
      • URIs
      • Etc.
      Image: Telespiza palmeri, Avifauna of Laysan, 1893-1900
  23. Digital Identifiers
      • Factors:
        • Cost per identifier
        • Community acceptance
        • Scalability
      • BHL is working with TDWG and others to come up with the best scheme(s)
      Image: Moho bishopi, Avifauna of Laysan, 1893-1900
  24. Scalable Mass Scanning
  25. The Internet Archive
      • 501(c)(3) organization
      • Dedicated to “Universal Access to Human Knowledge”
      • Founder of the Open Content Alliance
      • Provides:
        • Mass scanning
        • Archival storage of files
        • Image processing
        • Technology development
  26. Internet Archive Scribe Scanner
      • Single Scribe Machine
        • Human operated
        • 200 volumes per shift per week
        • ~ 70,000 pages from a single machine per week
        • Cost: $100,000 / year
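
The Scribe figures above imply a rough per-page cost and scanning time, worked out in the Python sketch below. It rests on assumptions the slide does not state: the machine runs one shift, works all 52 weeks of the year, and the $100,000 annual cost covers exactly that output. The 80 million page target is the core pre-1923 estimate from slide 13; the function name `machine_years` is hypothetical.

```python
# Figures from the Scribe slide.
PAGES_PER_WEEK = 70_000
VOLUMES_PER_WEEK = 200
COST_PER_YEAR_USD = 100_000

# Assumption: the machine runs every week of the year at the stated rate.
WEEKS_PER_YEAR = 52

pages_per_volume = PAGES_PER_WEEK / VOLUMES_PER_WEEK
pages_per_year = PAGES_PER_WEEK * WEEKS_PER_YEAR
cost_per_page = COST_PER_YEAR_USD / pages_per_year


def machine_years(total_pages, machines=1):
    """Years needed to scan total_pages with the given number of machines."""
    return total_pages / (pages_per_year * machines)


# Core pre-1923 literature estimate from slide 13.
core_pre_1923_pages = 80_000_000
years_one_machine = machine_years(core_pre_1923_pages)
years_ten_machines = machine_years(core_pre_1923_pages, machines=10)

print(f"Pages per volume: {pages_per_volume:.0f}")
print(f"Cost per page: ${cost_per_page:.4f}")
print(f"One machine: {years_one_machine:.1f} years")
print(f"Ten machines: {years_ten_machines:.1f} years")
```

Under these assumptions a single machine would need about 22 years for the core pre-1923 literature, which is consistent with the multi-unit pods described in the following slides.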
  27. Internet Archive Scribe: Boston
      • Cooperative facility with the Boston Library Consortium (19 New England Libraries)
      • BHL Members MBL/WHOI and Harvard Libraries will use the facility
      • Status: In production
  28. Internet Archive Scribe: Boston
  29. Internet Archive Scribe: London
      • Single Scribe in place
      • Projected 5 unit pod to be located at The Natural History Museum
      • Status: In production
  30. Internet Archive Scribe: London
  31. Internet Archive Scribe: Washington
      • Single unit arrived May 5
      • Funded by Smithsonian Libraries
      • Projected 5 unit BHL pod in National Museum of Natural History
      • Projected 10-15 unit pod shared by Smithsonian/BHL and regional Washington libraries
      • Status: Being installed right now!
  32. Internet Archive Scribe: Washington
  33. Internet Archive Scribe: Washington
  34. Internet Archive Scribe: Washington
  35. Internet Archive Scribe: New York
      • Current BHL plans focus on sharing a 10 unit pod located at the New York Public Library
      • American Museum of Natural History and New York Botanical Garden will use this facility
      • Status: in planning
      Image: Carl von Linné (1707-1778)
  36. Internet Archive Scribe: Illinois
      • Two machines funded by State of Illinois
      • UIUC scanning Fieldiana (all series)
      • Arrangement coordinated by Michael Godow and Bryan Heidorn (UIUC/GSLIS)
      • No cost to The Field or BHL
      • Status: In production
  37. Internet Archive Scribe: Illinois
  38. Internet Archive Scribe: Illinois
  39. (slide 42) BHL Portal
      • Library catalog-like interface to BHL literature
      • Enhanced structural analysis to provide volume/issue/article page access to the literature
      • Iterative development based on feedback from user community
      • Provide access to two key audiences:
        • Humans
        • Machines
  41. (slide 52) BHL Literature Online: 1,258,653 pages; 627,360 pages via BHL Portal
  42. (slide 53) Biodiversity Heritage Library
  43. (slide 54) Biodiversity Heritage Library