Biodiversity Heritage Library


Biodiversity Heritage Library talk for EOL Fellows workshop by Dr. Tom Garnett, director of BHL

Published in: Technology

  1. The Biodiversity Heritage Library: Liberating the World’s Biodiversity Literature. Thomas Garnett, EOL Fellows, March 2010
  2. BHL: Why? The cited half-life of publications in taxonomy is longer than in any other scientific discipline (Macro-economic case for open access, Tom Moritz). Current taxonomic literature often relies on texts and specimens more than 100 years old. Illustration: Levinus Vincent, Elenchus tabularum, pinacothecarum, 1719.
  3. BHL: Why? The Taxonomic Impediment. “The taxonomic impediment is a term that describes the gaps of knowledge in our taxonomic system” (Darwin Declaration, 1998). Illustration: Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808.
  4. BHL Members: US/UK • Academy of Natural Sciences (Philadelphia, PA) • American Museum of Natural History (New York, NY) • California Academy of Sciences (San Francisco, CA) • The Field Museum (Chicago, IL) • Harvard University Botany Libraries (Cambridge, MA) • Harvard University, Ernst Mayr Library of the Museum of Comparative Zoology (Cambridge, MA) • Marine Biological Laboratory / Woods Hole Oceanographic Institution (Woods Hole, MA) • Missouri Botanical Garden (St. Louis, MO) • Natural History Museum (London, UK) • The New York Botanical Garden (New York, NY) • Royal Botanic Gardens, Kew (Richmond, UK) • Smithsonian Institution Libraries (Washington, DC)
  5. BHL Members: BHL-Europe • Museum für Naturkunde - Leibniz-Institut für Evolutions- und Biodiversitätsforschung an der Humboldt-Universität zu Berlin • Natural History Museum, UK • Narodni muzeum NMP CZ • Angewandte Informationstechnik Forschungsgesellschaft mbH • Freie Universität Berlin FUBBGBM • Georg-August-Universität Göttingen Stiftung Öffentlichen Rechts • Naturhistorisches Museum Wien • Hungarian Natural History Museum • Museum and Institute of Zoology, Polish Academy of Sciences • Helsingin yliopisto UH-Viikki • Stichting Nationaal Natuurhistorisch Museum, Naturalis • National Botanic Garden of Belgium • Royal Museum for Central Africa • Royal Belgian Institute of Natural Sciences • Bibliothèque nationale de France • Museum national d’histoire naturelle • Consejo Superior de Investigaciones Cientificas • Università degli Studi di Firenze • Royal Botanic Garden, Edinburgh • Species 2000 • John Wiley & Sons limited • University of Copenhagen
  6. BHL Members: BHL-China • Chinese Academy of Sciences – Institute of Botany • Chinese Academy of Sciences – Institute of Zoology • Chinese Academy of Sciences – Institute of Microbiology • Chinese Academy of Sciences – Institute of Oceanography
  7. BHL is a Focused Program • Though BHL is composed of libraries, it has been a domain-specific program, not just a digital library project. It arose from and is responsive to the biodiversity community, which comprises the disciplines of taxonomy, systematics, evolutionary biology, ecology, conservation, and wildlife management. These disciplines are the primary audience.
  8. [Diagram: topical terms derived from LCSH, arranged in three groups labeled Core, Supporting, and Outliers. Example terms shown include paleontology, forestry, oceanography, genetics, soil science, aquaculture, mineralogy, hydrology, virology, immunology, plate tectonics, wildlife conservation, and economic botany.]
  9. Core Literature: Botany; Plant conservation; Phytogeography; Plant anatomy; Plant physiology; Plant ecology; Spermatophyta, Phanerogams; Cryptogams; Biological diversity; Evolution; Phylogenetic relationships; Evolutionary genetics; Scientific voyages and expeditions; Pre-Linnaean works; Linnaean works; Biodiversity conservation; Conservation biology; Ecosystem management; Endangered species & ecosystems; Extinction; Classification, Nomenclature; Biogeography; Zoology/Botany--Morphology; Zoology/Botany--Anatomy; Zoology/Botany--Embryology; Zoology/Botany--Reproduction; Zoology/Botany--Geographical distribution; Classification, systematics and taxonomy; Zoology; Invertebrates; Chordates; Vertebrates; Animal Behavior
  10. Stats: Now Online • 70,630 volumes • 26.4 million pages • Oldest book: Schöffer’s Herbarius, 1484.
  11. What is the plan? Digitize the core literature of biodiversity. Full works, not bits & pieces. Open Access: all content can be repurposed, reused, reformatted. Congruent: must fit into a dynamic knowledge ecology. Scan public domain biodiversity literature. Negotiate rights to digitize copyrighted materials. Ingest content digitized by others. Provide interfaces & APIs for the repository: GUIs, and services for data mining & citation resolution.
  12. BHL Digital Preservation • Committed to long-term storage, curation, and preservation of digital text assets for the world-wide biodiversity community. • BHL is a steward for this literature. • To keep this content available and open for the future requires careful organizational planning. • Preservation is both a technical and political/social process.
  13. BHL Relationship with Non-Profit Journal Publishers. Opt-in Copyright Model: the BHL works with professional societies and associations to integrate their publications into the BHL in a way that serves the societies’ missions and goals. BHL indexes the articles using Taxonomic Intelligence, thereby vastly increasing their usability. Publishers’ content is embedded in the emerging knowledge ecology that is sweeping biology in this century. 73 permission agreements to date; more under negotiation. Integration with gray literature in later phases of the project.
  14. Scanning = human work
  15. Scan & Store: Internet Archive. Storage in Petaboxes; scanning on Scribes.
  16. Referrers: Jan 1, 2008 – Jan 31, 2010
  17. Name Finding via TaxonFinder
  18. Name finding in action with Taxonomic Intelligence… [Diagram: image from scanner → converted to text via OCR → name finding via TaxonFinder → extract names → submit to NameBank → SOAP response]
  19. OCR error rate for names only. Of the 3,003 names, 1,056 (35.16%) were incorrectly transcribed by OCR. Top OCR errors: (1) insert space; (2) omit space; (3) e->c; (4) u->I; (5) u->n; (6) i->l; (7) c->e; (8) n->v; (9) l->i; (10) r->i; (11) u->ii; (12) h->l; (13) h->ii; (14) e->o.
  20. Considerations • Improving OCR software is out of scope – Google’s Tesseract is the only viable open source option – Flurry of activity in 2006-2007, quiet since • Rekeying is expensive given the size of the corpus – Will not scale
  21. Name finding statistics • 27.7 million pages scanned • 70.4 million name strings found • 56.2 million names verified with a NameBankID • 1.4 million unique names with a NameBankID • 3.3 million unique names *without* a NameBankID – This is where the interesting data live!!!
  22.
  23. PDF Generation Stats
  24. Mandate for new development • display / manage articles • meet community demands for bibliography / citation management • build from more open source tools
  25. Development goals re: citations • Create a repository for community-vetted taxonomic bibliographies. • Ability to ingest, display, download, and index articles so that the BHL can operate as an article repository. • Build from existing community of work around Drupal / Biblio. – In use by collaborators
  26.
  27.
  28.
  29. Services • OpenURL – Facilitate links to citations: protologues, articles, references • Documentation: • Names Service – Return all occurrences of a name throughout the BHL digitized corpus • Documentation: – Access to 51 million name strings using TaxonFinder – 1.4 million unique names – Working out a strategy for obscure species – Algorithm improvements to detect nomenclatural & taxonomic acts • New API
  30. Services: OpenURL pid=title:3934&volume=14&issue=&spage=301&date=1879
  31. Services: OpenURL Disambiguation • Looking for: • BHL returns:
  32. Services: OpenURL Results
  33. EOL Interfaces • Taxonomic name finding enhancements – Nomenclatural acts in web services – Other algorithms / verification • WoRMS data • Improvement – Ranking results – Visualization • LifeDesks – Bibliography sharing – Resolve to articles
  34. Thank You. We welcome your input and advice. Tom Garnett, Biodiversity Heritage Library Program Director, 202-633-2238
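
Slide 19 tallies OCR errors in names by type (inserted space, omitted space, single-character confusions such as e->c, and one-to-two confusions such as u->ii). The slides do not say how that tally was produced. The following is only a minimal sketch of one way such labels could be assigned, given a reference name and its OCR transcription; the function names and the sample names are hypothetical and are not part of BHL's tooling.

```python
def classify_ocr_error(truth: str, ocr: str) -> str:
    """Label the difference between a reference name and its OCR transcription.

    Illustrative only: the categories mirror the labels on slide 19,
    but the rules below are an assumption, not BHL's method.
    """
    if truth == ocr:
        return "correct"

    # Whitespace-only differences: the strings match once spaces are removed.
    if truth.replace(" ", "") == ocr.replace(" ", ""):
        if ocr.count(" ") > truth.count(" "):
            return "insert space"
        if ocr.count(" ") < truth.count(" "):
            return "omit space"
        return "other"

    # Same length, exactly one differing position: single-character substitution.
    if len(truth) == len(ocr):
        diffs = [(t, o) for t, o in zip(truth, ocr) if t != o]
        if len(diffs) == 1:
            t, o = diffs[0]
            return f"{t}->{o}"

    # OCR output is one character longer: one character read as two (e.g. u->ii).
    if len(ocr) == len(truth) + 1:
        for i, t in enumerate(truth):
            if truth[:i] == ocr[:i] and truth[i + 1:] == ocr[i + 2:]:
                return f"{t}->{ocr[i:i + 2]}"

    return "other"


def error_rate_percent(errors: int, total: int) -> float:
    """Share of incorrectly transcribed names, as a percentage."""
    return 100.0 * errors / total


# Hypothetical sample pairs (reference, OCR output).
for truth, ocr in [
    ("Homo sapiens", "Ho mo sapiens"),
    ("Homo sapiens", "Homosapiens"),
    ("Quercus", "Qucrcus"),
    ("Quercus", "Qiiercus"),
]:
    print(f"{truth!r} -> {ocr!r}: {classify_ocr_error(truth, ocr)}")

# Figures from slide 19: 1,056 errors among 3,003 names.
print(round(error_rate_percent(1056, 3003), 2))
```

Names that differ in more than one place fall into "other" here; a fuller treatment would align the two strings with an edit-distance algorithm before labeling each difference.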
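
The name finding statistics on slide 21 are easier to compare as ratios. The short sketch below derives three of them from the slide's figures. It assumes that the unique names with and without a NameBankID are disjoint and together make up all unique names found, which the slide implies but does not state; the variable names are illustrative.

```python
# Figures copied from slide 21, all in millions.
PAGES_SCANNED = 27.7
NAME_STRINGS_FOUND = 70.4
NAMES_VERIFIED = 56.2
UNIQUE_WITH_ID = 1.4
UNIQUE_WITHOUT_ID = 3.3

# Share of found name strings that were verified with a NameBankID.
verified_share = NAMES_VERIFIED / NAME_STRINGS_FOUND

# Share of unique names lacking a NameBankID (assumes the two groups are disjoint).
unmatched_unique_share = UNIQUE_WITHOUT_ID / (UNIQUE_WITH_ID + UNIQUE_WITHOUT_ID)

# Average number of name strings found per scanned page.
strings_per_page = NAME_STRINGS_FOUND / PAGES_SCANNED

print(f"verified name strings: {verified_share:.1%}")
print(f"unique names without a NameBankID: {unmatched_unique_share:.1%}")
print(f"name strings per page: {strings_per_page:.1f}")
```

Under that assumption, roughly four in five name strings are verified, while about 70% of the distinct names have no NameBankID, which is the group the slide singles out as the interesting one.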
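
The OpenURL query on slide 30 is an ordinary URL query string, so it can be taken apart and rebuilt with standard tools. The sketch below uses Python's `urllib.parse` on the exact string from the slide. The resolver's base URL does not appear in the slides and is deliberately left out, and the reading of the fields (`pid` as a BHL title identifier, `spage` as the start page) is an assumption based on common OpenURL usage.

```python
from urllib.parse import parse_qs, urlencode

# Query string exactly as shown on slide 30.
SLIDE_QUERY = "pid=title:3934&volume=14&issue=&spage=301&date=1879"


def parse_openurl_query(query: str) -> dict[str, str]:
    """Split a query string into a flat field -> value mapping.

    keep_blank_values=True preserves empty fields such as `issue=`.
    """
    parsed = parse_qs(query, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


def build_openurl_query(fields: dict[str, str]) -> str:
    """Rebuild the query string; safe=":" leaves the colon in `title:3934` unescaped."""
    return urlencode(fields, safe=":")


fields = parse_openurl_query(SLIDE_QUERY)
for key, value in fields.items():
    print(f"{key} = {value!r}")

# Round trip: rebuilding from the parsed fields reproduces the slide's string.
print(build_openurl_query(fields) == SLIDE_QUERY)
```

A caller would append the rebuilt query to the resolver's base URL, which has to be taken from the BHL documentation that slide 29 refers to.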