Ifla Bhl080208cr

Published in: Education

  1. Nancy Gwinn, Director, Smithsonian Institution Libraries [email_address], and Connie Rinaldo, Ernst Mayr Library, Harvard University [email_address]
  2. Overview
       - How BHL began
       - Purpose
       - Encyclopedia of Life
       - Why now, and who is involved?
       - How is it being done?
       - What makes this different?
       - Who else is involved?
       - Challenges ahead
  3. What?
       - MISSION
           - Provide open access to biodiversity literature for scientists, researchers, students, and the public worldwide
       - GOALS
           - Digitize the core published literature of biodiversity
           - Collaborate with the global taxonomic community, rights holders, and others
     "The cultivation of natural science cannot be efficiently carried on without reference to an extensive library." (C. Darwin et al., 1847)
  4. Why now?
       - Biodiversity is HOT
       - Taxonomic literature has extreme longevity
       - Current taxonomic literature often relies on texts and specimens more than 100 years old
       - A tractable, well-defined scientific domain
       - Low cost: 10-19 cents a page
       - Supports GBIF and other international initiatives
       - Literature repatriation
     Image: Distribution of Biologia Centrali-Americana (courtesy of Martin Kalfatovic)
  5. Encyclopedia of Life
       - "The launch of the Encyclopedia of Life will have a profound and creative effect in science… this effort will lay out new directions for research in every branch of biology" (E.O. Wilson)
  6. Encyclopedia of Life
       - EOL needed the literature underpinning provided by the BHL project
       - EOL launched on 9 May 2007
       - BHL is now a key partner in the EOL project
       - Total funding will reach at least $50M
  7. How big is the biodiversity domain?
       - Over 5.4 million books, dating back to 1469
       - 800,000 monographs
       - 40,000 journal titles (12,500 current)
       - 50% published before 1923
  8. Why is the literature so important?
       - Taxonomic descriptions must be published for the name to be valid
       - Publications must be available to the public through trusted sources
       - Libraries have been the traditional place
  9. Taxonomic Literature
       - The cited half-life of publications in taxonomy is longer than in any other scientific discipline*
     * The rate at which citations decay is slower than in any other scientific discipline.
  10. WHO? Member Partners
       - Museum libraries
           - American Museum of Natural History
           - Field Museum of Natural History
           - Natural History Museum, London
           - Smithsonian Institution Libraries
           - Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
       - Botany libraries
           - Missouri Botanical Garden
           - New York Botanical Garden
           - Royal Botanic Gardens, Kew
           - Botany Libraries, Harvard University
       - Research institute library
           - Marine Biological Laboratory / Woods Hole Oceanographic Institution
  11. BHL Collections
       - 1.3 million catalogue records
       - 73% are monographs (the remainder are serials, counted at title level)
       - 63% is English-language material
       - The next most common language is German (9%)
       - About 30% of the material was published before 1923
  12. The Internet Archive
       - A 501(c)(3) nonprofit organization
       - Dedicated to "Universal Access to Human Knowledge"
       - Founder of the Open Content Alliance
       - Provides:
           - Mass scanning
           - Archival storage of files
           - Image processing
           - Technology development
  13. HOW?
       - Internet Archive scanning centers in London, New York, Washington DC, Boston, and Illinois
       - Image files, plus text derived from OCR
  14. ERNST MAYR LIBRARY STORY (1)
       - Workflow:
           - Generate picklists
           - Identify acceptable items (size, foldouts)
           - Avoid duplication
           - Barcode items
           - Generate packing list; check out and pack books
           - Check in and reshelve returns
  15. ERNST MAYR LIBRARY STORY (2)
       - Serials bid list and monograph de-duping tool
       - OCLC Collection Analysis Tool
       - Internet Archive provides image files and text from Optical Character Recognition (OCR) to the BHL portal
       - "Boutique" scanning
       - 2.5 FTE devoted to the project
  16. What makes this project different? TAXONOMIC INTELLIGENCE
  17. "All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge." (Grimaldi & Engel, 2005, Evolution of the Insects)
  18. Taxonomic intelligence is the inclusion of taxonomic practices, skills, and knowledge within informatics services to manage information about organisms. It was established at the Marine Biological Laboratory / Woods Hole Oceanographic Institution.
  19. Taxonomic Intelligence
       - 10.7 million name strings in NameBank
       - Uses a sophisticated algorithm to locate likely name strings in OCR text
       - Processing BHL texts will both increase the number of name strings in NameBank and improve the accuracy of name-string recognition
  20. BHL Portal: http://www.biodiversitylibrary.org
       - Portal developed at Missouri Botanical Garden
       - The portal serves image and text files and uses a variety of tools to organize the content
       - Persistent URLs allow linking at the bibliographic-record, volume, and page levels, both within BHL and to other taxonomic services
  21. Page Delivery
  22. Publishers & Permissions
       - Seek permissions from copyright holders of journals (49 so far)
       - Opt-in copyright model
       - Will digitize learned-society backfiles and mount them through the BHL Portal at no cost to the society
       - Will provide a set of files to the publishers for reuse as they see fit
  23. So far, BHL has shown:
       - High levels of OCR accuracy for late 19th- and 20th-century printing
       - Taxonomic intelligence (species name finding) across millions of pages, against nearly 11 million names in NameBank, is highly effective
       - Administratively separate and geographically dispersed institutions can collaborate effectively
       - Society journal publishers are enthusiastic about participating in the BHL opt-in copyright model
       - The project has generated excitement in the international community and many opportunities to develop new partnerships
       - The project can generate significant financial support
  24. Funding
       - Initial $3 million from the John D. and Catherine T. MacArthur Foundation
       - Gordon Moore Foundation
       - Individual members (Harvard, Smithsonian, NY Botanical Garden)
  25. Potential Collaborators
       - Open Content Alliance
       - International Commission on Zoological Nomenclature
       - European Distributed Institute of Taxonomy
       - Global Biodiversity Information Facility (GBIF)
       - Atlas of Living Australia
       - BioOne
       - Chinese Academy of Sciences
  26. Future Directions
       - Sustainable platform
       - Ability to scan fold-outs and oversized volumes
       - Slow page-access times
       - Mirror sites
       - How to represent results to users?
           - 6.7 million pages in the BHL portal
           - 14.7 million name occurrences found using TaxonFinder
           - One search can yield 19,000 occurrences of a single name
  27. Future Directions
       - Further develop global partnerships (e.g., BHL Europe) and add multiple languages
       - Further develop partnerships with publishers
       - Improved OCR
       - Enhance connections with EOL
       - Linkages to molecular, morphological, and other data types
       - Expand content analysis and tools to new audiences
       - Grey literature and archives
       - Article-level analysis of serials using automated tools
  28. CREDITS
       - Thanks to:
           - All BHL partners and staff whose slides and presentations we borrowed
       - Images from:
           - The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)
  29. LINKS
       - Biodiversity Heritage Library
       - Biodiversity Heritage Library Blog
       - Encyclopedia of Life
       - Smithsonian Institution Libraries
       - Universal Biological Indexer and Organizer
       - Biologia Centrali-Americana
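The Taxonomic Intelligence slides describe locating likely name strings in OCR text and checking them against NameBank. The sketch below is a deliberately simplified illustration of that idea, not the TaxonFinder or NameBank algorithm. It assumes a hypothetical in-memory set `KNOWN_NAMES` standing in for NameBank, and it treats any capitalized word followed by a lowercase word as a candidate binomial; `find_names` is likewise a hypothetical helper.

```python
from __future__ import annotations

import re

# Hypothetical toy stand-in for a name index such as NameBank.
KNOWN_NAMES = {
    "Quercus alba",
    "Homo sapiens",
    "Drosophila melanogaster",
}

# Candidate pattern (an assumption for illustration): a capitalized genus
# followed by a lowercase epithet, e.g. "Quercus alba". Real name finders
# also handle abbreviated genera, authorities, uninomials, and OCR errors.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+) ([a-z]{2,})\b")


def find_names(text: str, known: set[str]) -> list[tuple[int, str]]:
    """Return (character offset, name) for each candidate that is in `known`."""
    hits = []
    for match in CANDIDATE.finditer(text):
        name = f"{match.group(1)} {match.group(2)}"
        # Keep only candidates confirmed by the name index.
        if name in known:
            hits.append((match.start(), name))
    return hits


if __name__ == "__main__":
    page = "Leaves of Quercus alba were compared with those of Quercus rubra."
    print(find_names(page, KNOWN_NAMES))  # [(10, 'Quercus alba')]
```

The split between a cheap pattern match and a dictionary lookup mirrors the slides' point that processing more BHL text enlarges the name index: in this sketch, adding "Quercus rubra" to `KNOWN_NAMES` would make the second candidate on the sample page count as a hit.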