An Inordinate Fondness for Data: The Biodiversity Heritage Library


An Inordinate Fondness for Data: The Biodiversity Heritage Library. Martin R. Kalfatovic. OCLC Digital Forum East 2009. November 5, 2009. Arlington, VA.

Published in: Technology, Education

  1. 1. An Inordinate Fondness for Data The Biodiversity Heritage Library OCLC Digital Forum East 2009 5 November 2009 Arlington, VA Martin R. Kalfatovic Smithsonian Institution Libraries
  2. 2. • American Museum of Natural History (New York) • Academy of Natural Sciences (Philadelphia) • California Academy of Sciences (San Francisco) • Field Museum (Chicago) • Natural History Museum (London) • Smithsonian Institution Libraries (Washington) • Missouri Botanical Garden (St. Louis) • New York Botanical Garden (New York) • Royal Botanic Garden, Kew • Botany Libraries, Harvard University • Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University • Marine Biological Laboratory / Woods Hole Oceanographic Institution
  3. 3. The Encyclopedia of Life
  4. 4. Education and Outreach Smithsonian & Harvard H Synthesis Center Field Museum Species Pages & Secretariat Smithsonian Informatics Marine Biological Laboratory Missouri Botanical Garden
  5. 5. How much is there: • Core literature pre-1923: 100 million pages (?) • All pre-1923: 120-150 million pages • All literature: 280-320 million pages
  6. 6. • Northeast Regional Scanning Facility (Boston) • Jersey City Facility • University of Illinois • Natural History Museum, London • Missouri Botanical Garden (Non-Scribe operation) • Fedscan (Library of Congress) • Smithsonian Libraries
  7. 7. BHL Members: BHL-Europe • Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin • Natural History Museum, UK • Narodni muzeum NMP CZ • Angewandte Informationstechnik Forschungsgesellschaft mbH • Freie Universität Berlin FUB-BGBM • Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts • Naturhistorisches Museum Wien • Hungarian Natural History Museum • Museum and Institute of Zoology, Polish Academy of Sciences • University of Copenhagen • Stichting Nationaal Natuurhistorisch Museum, Naturalis • National Botanic Garden of Belgium • Royal Museum for Central Africa • Royal Belgian Institute of Natural Sciences • Bibliothèque nationale de France • Museum national d’histoire naturelle • Consejo Superior de Investigaciones Cientificas • Università degli Studi di Firenze • Royal Botanic Garden, Edinburgh • Species 2000 • John Wiley & Sons limited • Helsingin yliopisto UH-Viikki
  8. 8. Now Online More than: 40,000 volumes 16 million pages Only 290 million to go! Avg. monthly growth rate 1,500 volumes 600,000 pages See you in 2048!
  9. 9. Ingest existing content 12,000,000 pages+ from other Internet Archive scanning partners
  10. 10. Acquiring other content ... • Researchers scanning their own work or literature relevant to their work • Journals that have scanned their content, but do not have a robust platform to host it
  11. 11. Biodiversity Heritage Library Permission Process Working with non-profit publishers for sharing with the BHL To digitize and mount works under copyright BHL must obtain permission from the copyright holders. Many biodiversity journals and monographs are published by non-profit institutions or learned societies whose mission is to promote research and learning. Some of these institutions have not sold their rights to commercial publishers and are open to sharing with the BHL.
  12. 12. So what? Does [fill in blank] do that? … and more and faster?
  13. 13. So what? Does [fill in blank] do that? … and more and faster?
  14. 14. BHL is all about OPEN & SHARING
  15. 15. Remind me again why?
  16. 16. An inordinate fondness for data Access Putting biodiversity literature in the hands of researchers Set the data free Suck it; mash it; broadcast it Increase Reuse, recycle, expand
  17. 17. Stats: Usage • Jan – Sep 2009: 266,000 visitors; 436,000 visits; 2.1 million pageviews • Daily average: 970 visitors; 1,600 visits / day; 7,700 pageviews / day • Charts: Jan – Sep 2009; Launch to 30 Sep 2009
  18. 18. Global, coordinated development New functionality from BHL-Europe Improved deduplication tools Semantic interface OAIS-compliant preservation infrastructure Building a community of developers Funded & volunteer RubyBHL: PyBHL: New partners, new content
  19. 19. Open Software & Development BHL Bits: Portal code, utilities, services Taxonomic Literature Group Google Group for discussion of “taxonomic literature & the services required to make literature interoperable within biodiversity research and biodiversity informatics.”
  20. 20. Open Data Downloads Simple tab-delimited exports of core data Data model DB schema as ERD
  21. 21. Open Data
  22. 22. Open Source Pageturning UI
  23. 23. Metadata: Feedback loop Assigned to library staff for review & resolution
  24. 24. Services Names Service Return all occurrences of a name throughout BHL digitized corpus Documentation: Access to 51 million name strings using TaxonFinder 1.4 million unique names Working out a strategy for obscure species Algorithm improvements to detect nomenclatural & taxonomic acts OpenURL Facilitate links to citations: protologues, articles, references Documentation: Useful to Nomenclators, Reference Systems: IPNI, Tropicos
  25. 25. Services: OpenURL Request pid=title:3934&volume=3&issue=&spage=262&date=1856
  26. 26. Services: OpenURL Disambiguation Looking for: BHL returns:
  27. 27. Services: OpenURL Results
  28. 28. Encyclopedia of Life 522,000 species pages linked to BHL #1 referring site
  29. 29. Other Consumers EarthCape Labs Sort/Search capabilities with harvested names YouTube demo: BioGUID BHL Name Timeline BHL Name Comparison
  30. 30. Global BHL Based on open access Open content Collaboration Shared development
  31. 31. Uh, so what's it mean to me? 1.9 million known species … most described once in a hard to find article … wouldn't it be nice to know more about your neighbors ...
  32. 32. And thanks to ...
  33. 33. Thanks for sticking around!
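
The OpenURL request on slide 25 is an ordinary URL query string, so it can be taken apart with standard tooling. The sketch below is only an illustration: it uses the query exactly as printed on the slide, and it deliberately leaves out any host or endpoint path, since the transcript does not give one. The variable names are ours, not BHL's.

```python
from urllib.parse import parse_qs, urlencode

# Query string exactly as it appears on slide 25 (no host or path is given there).
query = "pid=title:3934&volume=3&issue=&spage=262&date=1856"

# keep_blank_values=True preserves the empty "issue" field instead of dropping it.
parsed = parse_qs(query, keep_blank_values=True)

# Each key occurs once in this request, so flatten the single-item lists.
params = {key: values[0] for key, values in parsed.items()}
print(params)

# Rebuild the query; safe=":" leaves the colon in "title:3934" unescaped.
rebuilt = urlencode(params, safe=":")
print(rebuilt == query)
```

Read this way, the request asks for title 3934, volume 3, starting page 262, dated 1856, with the issue left blank.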
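
The "See you in 2048!" quip on slide 8 can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses only the figures quoted on slides 5 and 8 and assumes, as a simplification of ours, that the monthly growth rate stays constant.

```python
# Figures quoted on slide 8.
pages_online = 16_000_000
pages_remaining = 290_000_000   # "Only 290 million to go!"
pages_per_month = 600_000       # average monthly growth rate

# Implied total, to compare with slide 5's "All literature: 280-320 million pages".
implied_total = pages_online + pages_remaining

# Assumption: the scanning rate never changes.
months_left = pages_remaining / pages_per_month
years_left = months_left / 12

print(f"Implied total: {implied_total:,} pages")
print(f"{months_left:.0f} months, about {years_left:.1f} years")
```

At a constant rate the backlog works out to roughly four decades from the 2009 talk, the same ballpark as the slide's tongue-in-cheek 2048.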