An Inordinate Fondness for Data: The Biodiversity Heritage Library
An Inordinate Fondness for Data: The Biodiversity Heritage Library. Martin R. Kalfatovic. OCLC Digital Forum East 2009. November 5, 2009. Arlington, VA.

Published in Technology, Education
  • 1. An Inordinate Fondness for Data The Biodiversity Heritage Library OCLC Digital Forum East 2009 5 November 2009 Arlington, VA Martin R. Kalfatovic Smithsonian Institution Libraries
  • 2. American Museum of Natural History (New York); Academy of Natural Sciences Philadelphia; California Academy of Sciences (San Francisco); Field Museum (Chicago); Natural History Museum (London); Smithsonian Institution Libraries (Washington); Missouri Botanical Garden (St. Louis); New York Botanical Garden (New York); Royal Botanic Garden, Kew; Botany Libraries, Harvard University; Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University; Marine Biological Laboratory / Woods Hole Oceanographic Institution
  • 3. The Encyclopedia of Life
  • 4. Education and Outreach Smithsonian & Harvard H Synthesis Center Field Museum Species Pages & Secretariat Smithsonian Informatics Marine Biological Laboratory Missouri Botanical Garden
  • 5. How much is there: Core literature pre-1923: 100 million pages (?); all pre-1923: 120-150 million pages; all literature: 280-320 million pages
  • 6. Northeast Regional Scanning Facility (Boston); Jersey City Facility; University of Illinois; Natural History Museum, London; Missouri Botanical Garden (Non-Scribe operation); Fedscan (Library of Congress); Smithsonian Libraries
  • 7. BHL Members: BHL-Europe: Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin; Natural History Museum, UK; Narodni muzeum NMP CZ; Angewandte Informationstechnik Forschungsgesellschaft mbH; Freie Universität Berlin FUBBGBM; Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts; Naturhistorisches Museum Wien; Hungarian Natural History Museum; Museum and Institute of Zoology, Polish Academy of Sciences; University of Copenhagen; Stichting Nationaal Natuurhistorisch Museum, Naturalis; National Botanic Garden of Belgium; Royal Museum for Central Africa; Royal Belgian Institute of Natural Sciences; Bibliothèque nationale de France; Museum national d’histoire naturelle; Consejo Superior de Investigaciones Cientificas; Università degli Studi di Firenze; Royal Botanic Garden, Edinburgh; Species 2000; John Wiley & Sons limited; Helsingin yliopisto UH-Viikki
  • 8. Now Online. More than: 40,000 volumes; 16 million pages. Only 290 million to go! Avg. monthly growth rate: 1,500 volumes; 600,000 pages. See you in 2048!
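
The "See you in 2048!" line on slide 8 is a back-of-the-envelope projection. A minimal sketch of that arithmetic, assuming a constant scanning rate and using only the figures on the slide, could look like this (the function name is hypothetical, and the slide does not say exactly how its year was derived):

```python
# Figures taken from slide 8; START_YEAR is the year of the talk.
PAGES_REMAINING = 290_000_000   # "Only 290 million to go!"
PAGES_PER_MONTH = 600_000       # average monthly growth rate
START_YEAR = 2009


def years_to_finish(pages_remaining: int, pages_per_month: int) -> float:
    """Years needed to clear the backlog at a constant monthly rate."""
    months = pages_remaining / pages_per_month
    return months / 12


years = years_to_finish(PAGES_REMAINING, PAGES_PER_MONTH)
print(f"about {years:.1f} years, finishing around {START_YEAR + round(years)}")
```

At the stated rate the backlog takes roughly four decades, which is in the same range as the slide's estimate; the small difference presumably comes from rounding or slightly different inputs.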
  • 9. Ingest existing content: 12,000,000+ pages from other Internet Archive scanning partners
  • 10. Acquiring other content ... Researchers scanning their own work or literature relevant to their work; journals that have scanned their content but do not have a robust platform to host it
  • 11. Biodiversity Heritage Library Permission Process: working with non-profit publishers for sharing with the BHL. To digitize and mount works under copyright, BHL must obtain permission from the copyright holders. Many biodiversity journals and monographs are published by non-profit institutions or learned societies whose mission is to promote research and learning. Some of these institutions have not sold their rights to commercial publishers and are open to sharing with the BHL.
  • 12. So what? Does [fill in blank] do that? … and more and faster?
  • 13. So what? Does [fill in blank] do that? … and more and faster?
  • 14. BHL is all about OPEN & SHARING
  • 15. Remind me again why?
  • 16. An inordinate fondness for data. Access: putting biodiversity literature in the hands of researchers. Set the data free: suck it; mash it; broadcast it. Increase: reuse, recycle, expand
  • 17. Stats: Usage. Jan – Sep 2009: 266,000 visitors; 436,000 visits; 2.1 million pageviews. Daily average: 970 visitors; 1,600 visits / day; 7,700 pageviews / day. (Charts: Jan – Sep 2009; launch to 30 Sep 2009)
  • 18. Global, coordinated development. New functionality from BHL-Europe: improved deduplication tools; semantic interface; OAIS-compliant preservation infrastructure. Building a community of developers (funded & volunteer): RubyBHL: http://github.com/mjy/rubyBHL ; PyBHL: http://linux.softpedia.com/get/Programming/Libraries/pybhl-51612.shtml . New partners, new content
  • 19. Open Software & Development. BHL Bits: portal code, utilities, services: http://code.google.com/p/bhl-bits/ . Taxonomic Literature Group: Google Group for discussion of “taxonomic literature & the services required to make literature interoperable within biodiversity research and biodiversity informatics.” http://groups.google.com/group/taxonlit
  • 20. Open Data. Downloads: simple tab-delimited exports of core data: http://www.biodiversitylibrary.org/data/BHLExportSchema.pdf . Data model: DB schema as ERD: http://bhl-bits.googlecode.com/files/20090930_BHLDataModel.pdf
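
Because the exports on slide 20 are plain tab-delimited text, they can be read with standard tooling. The sketch below is only an illustration: the column names and rows are hypothetical placeholders (the real fields are defined in the export schema PDF), and it assumes the file starts with a header row.

```python
import csv
import io

# Hypothetical sample data. The real column names come from
# BHLExportSchema.pdf and are NOT reproduced here.
SAMPLE_EXPORT = (
    "TitleID\tFullTitle\tStartYear\n"
    "1\tExample flora of somewhere\t1856\n"
    "2\tExample journal of zoology\t1901\n"
)


def read_export(handle):
    """Parse a tab-delimited export with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_export(io.StringIO(SAMPLE_EXPORT))

# Example query: titles whose (hypothetical) start year is before 1900.
pre_1900 = [row["FullTitle"] for row in rows if int(row["StartYear"]) < 1900]
print(pre_1900)
```

For a downloaded file, the same function would take an open file handle, for example `open(path, newline="", encoding="utf-8")`, in place of the in-memory sample.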
  • 21. Open Data
  • 22. Open Source Pageturning UI http://github.com/openlibrary/bookreader
  • 23. Metadata: Feedback loop Assigned to library staff for review & resolution
  • 24. Services. Names Service: return all occurrences of a name throughout the BHL digitized corpus (documentation: http://bit.ly/2e6sg9 ); access to 51 million name strings using TaxonFinder; 1.4 million unique names; working out a strategy for obscure species; algorithm improvements to detect nomenclatural & taxonomic acts. OpenURL: facilitate links to citations: protologues, articles, references (documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx ); useful to nomenclators and reference systems: IPNI, Tropicos
  • 25. Services: OpenURL Request: http://www.biodiversitylibrary.org/openurl?pid=title:3934&volume=3&issue=&spage=262&date=1856 http://www.tropicos.org/Name/1200408
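
The request on slide 25 is an ordinary query string, so a client can assemble it from citation fields. The following is a minimal sketch that uses only the parameters visible on the slide; `build_openurl` is a hypothetical helper name, and whether the service accepts other parameters or combinations is a question for the OpenURL documentation linked on slide 24.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "http://www.biodiversitylibrary.org/openurl"  # from the slide


def build_openurl(title_id, volume="", issue="", spage="", date=""):
    """Assemble a query like the one on the slide (hypothetical helper).

    Empty fields are kept in the URL, mirroring the slide's "issue=".
    """
    params = {
        "pid": f"title:{title_id}",
        "volume": volume,
        "issue": issue,
        "spage": spage,
        "date": date,
    }
    # safe=":" leaves the colon in "title:3934" unescaped, as on the slide.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


url = build_openurl(3934, volume=3, spage=262, date=1856)
print(url)

# Round trip: split the query string back into its fields.
fields = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(fields["pid"], fields["spage"])
```

Keeping empty fields such as `issue=` in the URL is a choice made here to reproduce the slide's example exactly; a client could equally drop them if the service tolerates that.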
  • 26. Services: OpenURL Disambiguation Looking for: BHL returns:
  • 27. Services: OpenURL Results
  • 28. Encyclopedia of Life 522,000 species pages linked to BHL #1 referring site
  • 29. Other Consumers. EarthCape Labs: sort/search capabilities with harvested names; YouTube demo: http://www.youtube.com/watch?v=qw7qw87JTOs . BioGUID: BHL Name Timeline: http://bioguid.info/bhl/ ; BHL Name Comparison: http://bioguid.info/bhl/compare.php
  • 30. Global BHL Based on open access Open content Collaboration Shared development
  • 31. Uh, so what's it mean to me? 1.9 million known species … most described once in a hard to find article … wouldn't it be nice to know more about your neighbors ...
  • 32. And thanks to ...
  • 33. Thanks for sticking around!