An Inordinate Fondness for Data: The Biodiversity Heritage Library

An Inordinate Fondness for Data: The Biodiversity Heritage Library. Martin R. Kalfatovic. OCLC Digital Forum East 2009. November 5, 2009. Arlington, VA.

Published in: Technology, Education


  • 1. An Inordinate Fondness for Data The Biodiversity Heritage Library OCLC Digital Forum East 2009 5 November 2009 Arlington, VA Martin R. Kalfatovic Smithsonian Institution Libraries
  • 2. American Museum of Natural History (New York) Academy of Natural Sciences Philadelphia California Academy of Sciences (San Francisco) Field Museum (Chicago) Natural History Museum (London) Smithsonian Institution Libraries (Washington) Missouri Botanical Garden (St. Louis) New York Botanical Garden (New York) Royal Botanic Garden, Kew Botany Libraries, Harvard University Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University Marine Biological Laboratory / Woods Hole Oceanographic Institution
  • 3. The Encyclopedia of Life
  • 4. Education and Outreach Smithsonian & Harvard H Synthesis Center Field Museum Species Pages & Secretariat Smithsonian Informatics Marine Biological Laboratory Missouri Botanical Garden
  • 5. How much is there: Core literature pre-1923: 100 million pages (?) All pre-1923: 120-150 million pages All literature: 280-320 million pages
  • 6. • Northeast Regional Scanning Facility (Boston) • Jersey City Facility • University of Illinois • Natural History Museum, London • Missouri Botanical Garden (Non-Scribe operation) • Fedscan (Library of Congress) • Smithsonian Libraries
  • 7. BHL Members: BHL-Europe • Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin • Natural History Museum, UK • Narodni muzeum NMP CZ • Angewandte Informationstechnik Forschungsgesellschaft mbH • Freie Universität Berlin FUBBGBM • Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts • Naturhistorisches Museum Wien • Hungarian Natural History Museum • Museum and Institute of Zoology, Polish Academy of Sciences • University of Copenhagen • Stichting Nationaal Natuurhistorisch Museum, Naturalis • National Botanic Garden of Belgium • Royal Museum for Central Africa • Royal Belgian Institute of Natural Sciences • Bibliothèque nationale de France • Museum national d’histoire naturelle • Consejo Superior de Investigaciones Cientificas • Università degli Studi di Firenze • Royal Botanic Garden, Edinburgh • Species 2000 • John Wiley & Sons limited • Helsingin yliopisto UH-Viikki
  • 8. Now Online More than: 40,000 volumes 16 million pages Only 290 million to go! Avg. monthly growth rate 1,500 volumes 600,000 pages See you in 2048!
  • 9. Ingest existing content 12,000,000 pages+ from other Internet Archive scanning partners
  • 10. Acquiring other content ... Researchers scanning their own work or literature relevant to their work Journals that have scanned their content, but do not have a robust platform to host it
  • 11. Biodiversity Heritage Library Permission Process: working with non-profit publishers for sharing with the BHL. To digitize and mount works under copyright, BHL must obtain permission from the copyright holders. Many biodiversity journals and monographs are published by non-profit institutions or learned societies whose mission is to promote research and learning. Some of these institutions have not sold their rights to commercial publishers and are open to sharing with the BHL.
  • 12. So what? Does [fill in blank] do that? … and more and faster?
  • 13. So what? Does [fill in blank] do that? … and more and faster?
  • 14. BHL is all about OPEN & SHARING
  • 15. Remind me again why?
  • 16. An inordinate fondness for data Access Putting biodiversity literature in the hands of researchers Set the data free Suck it; mash it; broadcast it Increase Reuse, recycle, expand
  • 17. Stats: Usage • Jan – Sep 2009: 266,000 visitors; 436,000 visits; 2.1 million pageviews • Daily average: 970 visitors; 1,600 visits / day; 7,700 pageviews / day Jan – Sep 2009 Launch to 30 Sep 2009
  • 18. Global, coordinated development New functionality from BHL-Europe Improved deduplication tools Semantic interface OAIS-compliant preservation infrastructure Building a community of developers Funded & volunteer RubyBHL: PyBHL: New partners, new content
  • 19. Open Software & Development BHL Bits: Portal code, utilities, services Taxonomic Literature Group Google Group for discussion of “taxonomic literature & the services required to make literature interoperable within biodiversity research and biodiversity informatics.”
  • 20. Open Data Downloads Simple tab-delimited exports of core data Data model DB schema as ERD
  • 21. Open Data
  • 22. Open Source Pageturning UI
  • 23. Metadata: Feedback loop Assigned to library staff for review & resolution
  • 24. Services Names Service Return all occurrences of a name throughout BHL digitized corpus Documentation: Access to 51 million name strings using TaxonFinder 1.4 million unique names Working out a strategy for obscure species Algorithm improvements to detect nomenclatural & taxonomic acts OpenURL Facilitate links to citations: protologues, articles, references Documentation: Useful to Nomenclators, Reference Systems IPNI Tropicos
  • 25. Services: OpenURL Request pid=title:3934&volume=3&issue=&spage=262&date=1856
  • 26. Services: OpenURL Disambiguation Looking for: BHL returns:
  • 27. Services: OpenURL Results
  • 28. Encyclopedia of Life 522,000 species pages linked to BHL #1 referring site
  • 29. Other Consumers EarthCape Labs Sort/Search capabilities with harvested names YouTube demo: BioGUID BHL Name Timeline BHL Name Comparison
  • 30. Global BHL Based on open access Open content Collaboration Shared development
  • 31. Uh, so what's it mean to me? 1.9 million known species … most described once in a hard-to-find article … wouldn't it be nice to know more about your neighbors ...
  • 32. And thanks to ...
  • 33. Thanks for sticking around!
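
The "See you in 2048!" line on slide 8 rests on simple arithmetic: pages still to scan divided by the average monthly growth rate. A minimal sketch of that linear projection, using only the two figures quoted on the slide (290 million pages to go, 600,000 pages per month) and assuming the rate stays constant; the function name is illustrative, not part of any BHL tool:

```python
def months_to_finish(pages_remaining: int, pages_per_month: int) -> float:
    """Months needed to scan the remaining pages at a constant monthly rate."""
    return pages_remaining / pages_per_month


# Figures quoted on slide 8.
PAGES_REMAINING = 290_000_000
PAGES_PER_MONTH = 600_000

months = months_to_finish(PAGES_REMAINING, PAGES_PER_MONTH)
years = months / 12

print(f"{months:.0f} months, or about {years:.1f} years at the current rate")
```

At a constant rate this works out to roughly four decades of scanning, which is the same ballpark as the slide's tongue-in-cheek target year.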
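
The OpenURL request on slide 25 is an ordinary query string with `pid`, `volume`, `issue`, `spage`, and `date` parameters. A minimal sketch of assembling it, assuming the parameter order shown on the slide; the helper name `build_openurl_query` is hypothetical, and the resolver's base URL is deliberately left out because the slide does not give one:

```python
from urllib.parse import urlencode


def build_openurl_query(title_id, volume="", issue="", spage="", date=""):
    """Build a query string shaped like the request on slide 25.

    Empty values are kept (e.g. "issue="), matching the slide.
    """
    params = [
        ("pid", f"title:{title_id}"),
        ("volume", volume),
        ("issue", issue),
        ("spage", spage),
        ("date", date),
    ]
    # safe=":" keeps the colon in "title:3934" unescaped, as on the slide.
    return urlencode(params, safe=":")


query = build_openurl_query(3934, volume=3, spage=262, date=1856)
print(query)
```

The printed string would be appended after a `?` to whatever resolver endpoint is in use.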