An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library

An International Cooperative Digital Library for Taxonomic Literature: The Biodiversity Heritage Library. Martin R. Kalfatovic. American Library Association Annual Meeting. Collaborative Digital Initiatives: Show and Tell and Lessons Learned. June 30, 2008. Anaheim, CA.


    1. 1. An International Cooperative Digital Library for Taxonomic Literature Martin R. Kalfatovic Smithsonian Institution Libraries 30 June 2008
    2. 3. Biodiversity
          • What is Biodiversity?
          • Genetic variability within species
          • Diversity of species
          • Ecosystems and landscapes
    3. 4. Biodiversity
          • Wholesome food
          • Drinkable water
          • Breathable air
          • Stable climate for
              • Forestry
              • Agriculture
              • Fisheries
          • Waste decomposition
          • Bioremediation
          • Invasive species
          • Pest control
          • Ecotourism
          • Pharmaceuticals
          • Genomics
          • Proteomics
          • Bioengineering
          • Biotechnology
          • Molecular design
          • Imitating nature
          • Designer organisms
          • Renewable feedstocks
          • Envirofriendly
          • Manufacturing processes
    4. 5. Taxonomic Literature
          • Over 250 years of systematic description of life
          • Systema naturae (10th ed. 1758) by Carl von Linné
    5. 6. Taxonomic Literature
          • Taxonomic descriptions must be published for the name to be valid
          • Publications must be available to the public through trusted sources
          • Libraries have been the traditional place
    6. 7. Taxonomic Literature The cited half-life of publications in taxonomy is longer than in any other scientific discipline * * * The decay rate is longer than in any scientific discipline ~ Macro-economic case for open access Tom Moritz
    7. 8. The cultivation of natural science cannot be efficiently carried on without reference to an extensive library. Charles Darwin, et al. (1847)
          Darwin, C. R. et al. 1847. Copy of Memorial to the First Lord of the Treasury [Lord John Russell], respecting the Management of the British Museum. Parliamentary Papers, Accounts and Papers 1847, paper number (268), volume XXXIV.253 (13 April): 1-3. [Complete Works of Charles Darwin Online]
    8. 9. The Taxonomic Impediment
          “The taxonomic impediment is a term that describes the gaps of knowledge in our taxonomic system” - Darwin Declaration, 1998
    9. 10. Taxonomic Impediment
          • Specimen collections
          • Databases
          • Publications
          • Observations
          • ‘Gray’ literature
          • Index cards
          • Field notebooks
    10. 11. The Taxonomic Impediment
          The essential requirements for accessing and utilizing this global information are:
          • that there is access to information held in national/regional/global collections
          • that electronic data is efficiently captured and provided in usable form
          • that existing information held in literature and by current experts is made available electronically
          • that stability of scientific names of organisms, used to access this information, is promoted
          - Darwin Declaration, 1998
    11. 12. The Taxonomic Impediment
          Yet another physical difficulty is the task of assembling the library and indexes which will enable the student to work under proper conditions…. the beginner must now be prepared to spend liberally, or else must establish himself in an institution where a large library exists; if he work by himself with only a few books, he will have to confine himself to a very narrow specialty indeed.
          'The Limitations of Taxonomy' by J.M. Aldrich, Science, April 22, 1927, vol. LXV, no. 1686, p.381
    12. 13. Biologia Centrali-Americana
          Edited by Frederick Ducane Godman and Osbert Salvin. London : Pub. for the editors by R. H. Porter, 1879-1915
          Chart showing distribution in public collections of the complete 63 volume sets held worldwide. 2 complete copies in Central America held at the Smithsonian Tropical Research Institute Library
    13. 14. Digital Divide
    14. 15. Digital Divide?
          Henry Walter Bates, The Naturalist on the River Amazons, 1863
          Vishwas Chavan travels a lot. An informatician based at the National Chemical Laboratory in Pune, India, he collects data on what types of animal live where in India to enter into a biodiversity database … Much of the information Chavan seeks is in old, out-of-print tomes … To find them, Chavan has spent years trailing around libraries. He dreams of the day when books such as these are scanned and made available as digital files on the Internet.
          “Science in the Web Age: The Real Death of Print” by Andreas von Bubnoff, Nature 438, 550-552, 1 December 2005
    15. 16. BHL Timeline
          • 2003. Telluride. Encyclopedia of Life meeting
          • February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
          • May 2005. Washington. Ground work for the Biodiversity Heritage Library
          • June 2006. Washington. Organizational and Technical meeting
          • August 2006. New York Botanical Garden. BHL Director’s Meeting
          • October 2006. St. Louis/San Francisco. Technical meetings
          • February 2007. Museum of Comparative Zoology. Organizational meeting
          • May 2007. Encyclopedia of Life and BHL Portal Launch. Washington DC.
    16. 17. BHL Members
          • American Museum of Natural History (New York)
          • Field Museum (Chicago)
          • Natural History Museum (London)
          • Smithsonian Institution Libraries (Washington)
          • Missouri Botanical Garden (St. Louis)
          • New York Botanical Garden (New York)
          • Royal Botanic Garden, Kew
          • Botany Libraries, Harvard University
          • Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
          • Marine Biological Laboratory / Woods Hole Oceanographic Institution
    17. 18. BHL Members
          • University of Illinois, Urbana-Champaign (contributing member)
          • Scheme for addition of European and Asian partners underway
          • Additional categories of membership under consideration
    18. 20. Encyclopedia of Life
          … imagine for a moment that all the diversity of the world were finally revealed and then described, say one page to a species. The description would contain the scientific name, a photograph or drawing, a brief diagnosis, and information of where the species is found. If published in conventional book form … this Great Encyclopedia of Life would occupy 60 meters of library shelf per million species … 100 million species of organisms … would extend through 6 kilometers of shelving … E.O. Wilson (1992)
    19. 22. Informatics Marine Biological Laboratory Missouri Botanical Garden Species Pages & Secretariat Smithsonian Education and Outreach Smithsonian & Harvard Synthesis Center Field Museum
    20. 23. Funding
          • Initial grant from the MacArthur and Sloan Foundations (as part of the Encyclopedia of Life grant)
          • Additional support from parent institutions
          • Additional grants being actively pursued by BHL and individual members
    21. 24. BHL Focus: Literature
    22. 25. BHL Focus: Literature
    23. 26. Mass Scanning
          • Mass scanning is a proven technology
          • Post processing of generated data proven, but evolving
    24. 27. The Internet Archive
          • 501(c)(3) organization
          • Dedicated to “Universal Access to Human Knowledge”
          • Founder of the Open Content Alliance
          • Provides:
              • Mass scanning
              • Archival storage of files
              • Image processing
              • Technology development
    25. 28. Scribe Scanner
          • Single Scribe Machine
              • Custom built by the Internet Archive
              • Human operated
              • 3,500 pages per shift per day
    26. 29. BHL Scanning Centers
          • Northeast Regional Scanning Center
              • 10 Scribe machines
              • MBL/WHOI
              • Harvard
          • New York Public Library
              • 10 Scribe machines
              • AMNH
              • NYBG
    27. 30. BHL Scanning Centers
          • University of Illinois
              • 2 Scribe machines
          • Natural History Museum, London
              • 1 Scribe machine
          • Missouri Botanical Garden
              • Non-Scribe operation
    28. 31. BHL Scanning Centers
          • Washington, DC
              • 1 Scribe machine at Smithsonian Libraries
              • 10 Scribe facility at Library of Congress with Fedlink (operational May 2008)
    29. 32. Scanning Stats 28 June 2008
          • 6,153,568 pages
          • 15,343 volumes
          • 6,049 titles
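
To put the scanning totals in proportion, the short sketch below derives a few ratios from the figures on this slide and the per-machine rate quoted on the Scribe Scanner slide. The function name and the idea of "Scribe-shift equivalents" are illustrative assumptions, not BHL metrics; not all scanning was done on Scribe machines, so the last figure is only a rough yardstick.

```python
# Figures copied from the "Scanning Stats 28 June 2008" slide.
PAGES = 6_153_568
VOLUMES = 15_343
TITLES = 6_049

# Per-machine rate quoted on the "Scribe Scanner" slide.
PAGES_PER_SCRIBE_SHIFT = 3_500


def scanning_summary(pages, volumes, titles, pages_per_shift):
    """Return simple ratios derived from the raw scanning totals."""
    return {
        "pages_per_volume": pages / volumes,
        "volumes_per_title": volumes / titles,
        # Hypothetical yardstick: shifts needed if every page came from a Scribe.
        "scribe_shift_equivalents": pages / pages_per_shift,
    }


summary = scanning_summary(PAGES, VOLUMES, TITLES, PAGES_PER_SCRIBE_SHIFT)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

On these numbers an average volume runs to roughly 400 pages, and a title averages about two and a half volumes.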
    30. 33. Automate Discovery
          • Automated, scalable structural mark-up
          • Open to schemas for semantic mark-up
          • Integration of taxonomic intelligence
    31. 34. Structural Markup
          <article>
            <title> A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OFTHE FAMILY CHALCIDID^E.*. </title>
            <author> L. O. HOWARD. </author>
            <volume> 1 </volume>
            <issue> 2 </issue>
            <start_page> 65 </start_page>
            <end_page> 86 </end_page>
            <start_count_page> 85 </start_count_page>
            <end_count_page> 106 </end_count_page>
            <start_page_image_file> 3908800908001101smthrich_0085.djvu </start_page_image_file>
            <end_page_image_file> 3908800908001101smthrich_0106.djvu </end_page_image_file>
          </article>
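
A record like the one on this slide can be read with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` to load the sample and check that the printed page range and the scan range cover the same number of pages. The helper name `parse_article` is hypothetical, and reading `start_count_page`/`end_count_page` as positions in the scan sequence is an assumption based on the matching image file names.

```python
import xml.etree.ElementTree as ET

# The article record from the slide, reproduced verbatim (including OCR quirks).
RECORD = """<article>
  <title> A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OFTHE FAMILY CHALCIDID^E.*. </title>
  <author> L. O. HOWARD. </author>
  <volume> 1 </volume>
  <issue> 2 </issue>
  <start_page> 65 </start_page>
  <end_page> 86 </end_page>
  <start_count_page> 85 </start_count_page>
  <end_count_page> 106 </end_count_page>
  <start_page_image_file> 3908800908001101smthrich_0085.djvu </start_page_image_file>
  <end_page_image_file> 3908800908001101smthrich_0106.djvu </end_page_image_file>
</article>"""

NUMERIC_FIELDS = ("volume", "issue", "start_page", "end_page",
                  "start_count_page", "end_count_page")


def parse_article(xml_text):
    """Turn one <article> record into a dict with whitespace stripped."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    for key in NUMERIC_FIELDS:
        fields[key] = int(fields[key])
    return fields


article = parse_article(RECORD)
printed_pages = article["end_page"] - article["start_page"] + 1
scanned_images = article["end_count_page"] - article["start_count_page"] + 1
print(article["author"], printed_pages, scanned_images)
```

Comparing the two ranges is a cheap sanity check: if they disagree, the article boundaries or the page numbering were probably detected incorrectly.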
    32. 35. Semantic Markup
          • GoldenGATE: The intention of the GoldenGATE editor is to build a bridge between NLP components and XML markup of natural language text according to arbitrary XML schemas. It allows the deployment of NLP components to marking up the bodies of literature they were designed for. In this way, it enables transforming the texts into XML content according to an XML schema that was designed to gain maximum benefit from the knowledge provided in them.
          • Integrated Open Taxonomic Access (INOTAXA)
    33. 36. Taxonomic Intelligence
          • 10.7 million name strings in NameBank
          • Uses sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
          • Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
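
The slide does not describe how TaxonGrab works, so the following is not that algorithm. It is a deliberately simple sketch of the general idea of locating likely name strings in OCR text: match "Genus epithet" patterns with a regular expression, then keep only candidates whose genus appears in a known-name list. The `KNOWN_GENERA` set is a hypothetical stand-in for a lookup against a name index such as NameBank.

```python
import re

# Capitalized word followed by a lowercase word, e.g. "Apis mellifera".
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")

# Hypothetical stand-in for a lookup against a large name index.
KNOWN_GENERA = {"Apis", "Bombus"}


def candidate_names(ocr_text, known_genera):
    """Return binomial-looking strings whose genus is in the known list."""
    found = []
    for genus, epithet in BINOMIAL.findall(ocr_text):
        if genus in known_genera:
            found.append(f"{genus} {epithet}")
    return found


sample = ("The honey bee Apis mellifera was seen. "
          "Near it, Bombus terrestris foraged.")
print(candidate_names(sample, KNOWN_GENERA))
```

The pattern alone also matches ordinary phrases such as "The honey", which is why the lookup step matters. A real system would additionally have to cope with OCR errors, abbreviated genera and names split across lines.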
    34. 37. Build Content
    35. 38. Permissions
          • Seek permissions from copyright holders
          • Opt-in copyright model: The BHL will actively work with professional societies and associations to integrate their publications into the BHL in a way that serves the societies’ missions and goals
          • BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost
          • Will provide a set of files to the publishers for reuse as they see fit
    36. 39. Successes
          • Entomological News
          • Journal of Hymenoptera Research
          • Herpetological Review
          • Publications of the San Diego Natural History Museum
          • California Academy of Sciences publications
          • And more ...
    37. 40. BHL Advantages
          • Use of the articles will increase as evidenced by citation upsurge
          • Long-term management of the digital assets is provided by the BHL at no cost
          • Publishers’ content is embedded in the emerging knowledge ecology that is sweeping biology in this century
          • Structural mark-up of backfiles into conformance with NLM DTD (just starting)
    38. 41. Serve Content
          • Machine to machine communication
          • Human interfaceable portal
          • Standard identifiers (proponent of the “yodi”: yet another digital identifier)
          • ???
    39. 43. Encyclopedia of Life
    40. 45. Persistent Identifiers
          • Stable URL
          • Handle
          • DOI
          • BICI/SICI
          • ISSN
          • ISBN
          • LSIDs
          http://www.biodiversitylibrary.org
    41. 46. BHL Portal
          • Library catalog-like interface to BHL literature
          • Enhanced structural analysis to provide volume/issue/article page access to the literature
          • Iterative development based on feedback from user community
          • Provide access to two key audiences:
              • Humans
              • Machines
    42. 47.
          • Search
          • Browse
    43. 48. Infrastructure: Now / Soon / Later
          • Now: Missouri Botanical Garden development site
          • Now: Storage; Internet Archive, Missouri Botanical Garden
          • Soon: Move to Fedora storage model
          • Later: Move to a distributed Fedora storage model
    44. 49. Looking Forward
          • Co-evolving bioinformatics resources produce a rich information ecology:
              • Consortium for the Barcoding of Life (CBOL) with gene sequences deposited in GenBank
              • GBIF’s Electronic Catalog of Taxonomic Names
              • Herbaria and museum specimen databases
    45. 50. Looking Forward
          • Quick ramp-up, high early costs: development, mass scanning, etc.
          • Derive some long-term costs from the operating budgets of the member institutions (Examples under consideration: acquisitions budget, staff positions, etc.)
          • Integrate functions/tasks with wider efforts where appropriate, e.g. mass storage
    46. 51. The Long Now Strategy
          • Institutions that are creating the BHL exist to persist through time. That’s an important part of their business
          • The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly
    47. 52. A Global Library for Life
          In any well-appointed Natural History Library there should be found every book and every edition of every book dealing in the remotest way with the subjects concerned. Charles Davies Sherborn, Epilogue to Index Animalium, March 1922
    48. 54.
          • Midrange estimate: 25% of 5 million species = 1.3 million species, or roughly 1 every 20 minutes
          • Low estimate: 15% of 4 million species = 0.6 million species, or roughly 1 every 44 minutes
          • High estimate: 50% of 6 million species = 3 million species, or roughly 1 every 9 minutes
          • Conservation International http://tinyurl.com/3hzkax
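
The slide gives species counts and per-minute rates but does not state the time span they cover. As a consistency check only, the sketch below multiplies each species count by its quoted interval to see what span the figures imply. The function name is illustrative, and the year length of 365.25 days is an assumption.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumed average year length

# (species count, minutes per species) as quoted on the slide.
ESTIMATES = {
    "low": (0.6e6, 44),
    "midrange": (1.3e6, 20),
    "high": (3.0e6, 9),
}


def implied_years(species, minutes_per_species):
    """Time span implied by a species count and a 'one every N minutes' rate."""
    return species * minutes_per_species / MINUTES_PER_YEAR


for label, (species, minutes) in ESTIMATES.items():
    print(f"{label}: about {implied_years(species, minutes):.0f} years")
```

All three estimates come out at roughly half a century, which suggests the rates were computed over a common period of about 50 years. That reading is an inference from the arithmetic, not something the slide states.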
    49. 59. Thank You ... now, stick around for ... Suzanne
    50. 60. LINKS
          • Biodiversity Heritage Library http://www.biodiversitylibrary.org/
          • Biodiversity Heritage Library Blog http://biodiversitylibrary.blogspot.com
          • Encyclopedia of Life http://www.eol.org/
          • Smithsonian Institution Libraries http://www.sil.si.edu/
          • Universal Biological Indexer and Organizer http://www.ubio.org/
          • Biologia Centrali-Americana http://www.sil.si.edu/digitalcollections/bca/
    51. 61. CREDITS
          • Thanks to:
              • Chris Freeland, Missouri Botanical Garden
              • Tom Garnett, The Biodiversity Heritage Library
              • The staff at the Internet Archive
          • Images from
              • The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)
              • Martin R. Kalfatovic
              • Suzanne C. Pilsk
              • Bernard Scaife