The Biodiversity Heritage Library

From Kalfatovic, 5 months ago

Talk given January 30, 2008 at the National Agriculture Library

CC Attribution-ShareAlike License

Slideshow transcript

Slide 1: The Biodiversity Heritage Library Martin R. Kalfatovic Suzanne C. Pilsk Smithsonian Institution Libraries 30 January 2008 KALFATOVIC and PILSK :: SMITHSONIAN INSTITUTION LIBRARIES :: NATIONAL AGRICULTURE LIBRARY :: 30 JANUARY 2008

Slide 2: Yet another physical difficulty is the task of assembling the library and indexes which will enable the student to work under proper conditions…. the beginner must now be prepared to spend liberally, or else must establish himself in an institution where a large library exists; if he work by himself with only a few books, he will have to confine himself to a very narrow specialty indeed. 'The Limitations of Taxonomy' by J.M. Aldrich, Science, April 22, 1927, vol. LXV, no. 1686, p.381

Slide 3: BHL Timeline
• 2003. Telluride. Encyclopedia of Life meeting
• February 2005. London. Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
• May 2005. Washington. Ground work for the Biodiversity Heritage Library
• June 2006. Washington. Organizational and Technical meeting
• August 2006. New York Botanical Garden. BHL Director’s Meeting
• October 2006. St. Louis/San Francisco. Technical meetings
• February 2007. Museum of Comparative Zoology. Organizational meeting
• May 2007. Encyclopedia of Life and BHL Portal Launch. Washington DC.

Slide 4: BHL Members
• American Museum of Natural History (New York)
• Field Museum (Chicago)
• Natural History Museum (London)
• Smithsonian Institution (Washington)
• Missouri Botanical Garden (St. Louis)
• New York Botanical Garden (New York)

Slide 5: BHL Members
• Royal Botanic Gardens, Kew
• Botany Libraries, Harvard University
• Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
• Marine Biological Laboratory / Woods Hole Oceanographic Institution

Slide 6: BHL Members
• University of Illinois, Urbana-Champaign (contributing member)
• Scheme for addition of European and Asian partners under consideration
• Additional categories of membership under consideration

Slide 7: BHL Focus: Literature

Slide 8: BHL Focus: Literature

Slide 9: BHL Focus: Literature
• Core literature pre-1923: 100 million pages (?)
• All pre-1923: 120-150 million pages
• All literature: 280-320 million pages

Slide 11: Selection Tools
• Combined serial list for selecting titles to scan, to avoid duplication of effort
• Monographic “de-duping” algorithm
• OCLC Collection Analysis
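A monographic de-duping pass could be sketched roughly as follows. The slides do not describe the actual algorithm, so the match key (normalized title plus year), the record fields, the library labels, and the function names are all assumptions made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def match_key(title: str, year: str) -> str:
    """Build a hypothetical match key from a normalized title and a year."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and replace punctuation with spaces, then collapse whitespace,
    # so trivial cataloguing differences do not hide a duplicate.
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split()) + "|" + year.strip()


def find_duplicates(records):
    """Group records by match key and keep keys held by more than one library."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec["title"], rec["year"])].append(rec["library"])
    return {key: libs for key, libs in groups.items() if len(libs) > 1}


# Invented sample records; not taken from any BHL catalogue.
records = [
    {"library": "Library A", "title": "Index Animalium.", "year": "1902"},
    {"library": "Library B", "title": "index  animalium", "year": "1902"},
    {"library": "Library C", "title": "Genera Plantarum", "year": "1754"},
]
duplicates = find_duplicates(records)
```

A real pass would likely need fuzzier matching (editions, multi-volume sets), which this sketch deliberately leaves out.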

Slide 12: BHL Collections
• 1.3 million catalogue records
• 73% are monographs (the remainder are serials, counted at title level)
• 63% is English-language material
• German is the next most common language (9%)
• About 30% of material was published before 1923

Slide 13: Selection
• Marine Biological Laboratory/WHOI
  – Marine monographs
  – General Science
• Museum of Comparative Zoology
  – MCZ publications
  – Herpetology monographs and serials
  – Ichthyology monographs and serials

Slide 14: Selection
• University of Illinois
  – Fieldiana
  – Natural history of Illinois
• American Museum of Natural History
  – AMNH publications
  – Ornithology
• Natural History Museum
  – NHM publications
  – Major natural history general serials

Slide 15: Selection
• Botany Collections
  – Missouri Botanical Garden, New York Botanical Garden, Harvard Botany Libraries, and Royal Botanic Gardens, Kew will cooperatively develop a methodology for botanical publications

Slide 16: Selection
• Smithsonian Institution Libraries
  – Smithsonian publications
  – Entomology collection
  – Marine mammals
  – Fishes
  – Selected special collections materials

Slide 18: The Internet Archive
• 501(c)(3) organization
• Dedicated to “Universal Access to Human Knowledge”
• Founder of the Open Content Alliance
• Provides:
  – Mass scanning
  – Archival storage of files
  – Image processing
  – Technology development

Slide 19: Scribe Scanner
• Single Scribe Machine
  – Custom built by the Internet Archive
  – Human operated
  – 3,500 pages per shift per day
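To put the per-machine rate in perspective, the arithmetic below estimates how many working days a group of Scribe machines would need for a given page count. The 3,500 pages per shift comes from the slide and the 100 million page target from Slide 9; the single shift per day and the ten-machine example are assumptions, and the function name is hypothetical.

```python
PAGES_PER_SHIFT = 3_500  # per machine, from the slide


def days_to_scan(total_pages: int, machines: int, shifts_per_day: int = 1) -> int:
    """Return the number of whole working days needed, rounded up."""
    daily_pages = PAGES_PER_SHIFT * machines * shifts_per_day
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_pages // daily_pages)


# Assumed scenario: ten machines, one shift per day, 100 million pages.
days = days_to_scan(100_000_000, machines=10)
```

Under those assumptions ten machines handle 35,000 pages a day, so the core pre-1923 estimate alone would take several thousand working days, which is why multiple scanning centers matter.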

Slide 20: BHL Scanning Centers
• Northeast Regional Scanning Center
  – 10 Scribe machines
  – MBL/WHOI
  – Harvard
• New York Public Library
  – 10 Scribe machines
  – AMNH
  – NYBG

Slide 21: BHL Scanning Centers
• University of Illinois
  – 2 Scribe machines
• Natural History Museum, London
  – 1 Scribe machine
• Missouri Botanical Garden
  – Non-Scribe operation

Slide 22: BHL Scanning Centers
• Washington, DC
  – 1 Scribe machine at Smithsonian Libraries
  – 10-Scribe facility at the Library of Congress with Fedlink, under construction (Spring 2008)

Slide 23: Scanning Stats
• 5.5 million plus total pages scanned
• 500,000 plus from the Natural History Museum, London
• 1,000,000 from the MBL/WHOI library
• Fieldiana, 75,000 plus pages

Slide 24: Scanning Stats
• Smithsonian Libraries
  – 250,000 pages (non-Scribe scanned, 1996-2007)
  – 100,000 Scribe scanned pages (since August 2007)
• Other libraries (non-Scribe)
  – MOBOT: 780,000
  – AMNH: 150,000

Slide 26: But what about ...

Slide 28: But what about
• Difficult (impossible?) to re-purpose much of the material
• Quality of images often questionable
• “Frankenbooks”
• Sketchy / inaccurate bibliographic data

Slide 29: Persistent Identifiers
• Stable URL
• Handle
• DOI
• BICI/SICI
• ISSN
• ISBN
• LSIDs
http://www.biodiversitylibrary.org
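Of the identifiers listed, the ISSN has a simple public check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11), which makes it easy to validate before using it as a key. The sketch below illustrates that rule only; the function names are hypothetical and nothing here reflects how the BHL Portal itself handles identifiers.

```python
def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character for the first seven digits."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    # A check value of 10 is written as the letter X.
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """Return True if the string is a well-formed ISSN with a correct check digit."""
    compact = issn.replace("-", "").upper()
    if len(compact) != 8 or not compact[:7].isdigit():
        return False
    return issn_check_digit(compact[:7]) == compact[7]


# Sample value used only to exercise the rule.
ok = is_valid_issn("0036-8075")
```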

Slide 30: Structural Markup
<article>
  <title>A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OF THE FAMILY CHALCIDID^E.*.</title>
  <author>L. O. HOWARD.</author>
  <volume>1</volume>
  <issue>2</issue>
  <start_page>65</start_page>
  <end_page>86</end_page>
  <start_count_page>85</start_count_page>
  <end_count_page>106</end_count_page>
  <start_page_image_file>3908800908001101smthrich_0085.djvu</start_page_image_file>
  <end_page_image_file>3908800908001101smthrich_0106.djvu</end_page_image_file>
</article>
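A record like this one can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` to pull out the fields and check that the printed page range and the scan-sequence range cover the same number of pages. The element names come from the slide; the file names are rejoined from the slide's line-wrapped text (an assumption), and the `summarize` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Article record as shown on the slide; wrapped file names rejoined (assumption).
ARTICLE_XML = """<article>
  <title>A BRIEF CONSIDERATION OF CERTAIN POINTS IN THE MORPHOLOGY OF THE FAMILY CHALCIDID^E.*.</title>
  <author>L. O. HOWARD.</author>
  <volume>1</volume>
  <issue>2</issue>
  <start_page>65</start_page>
  <end_page>86</end_page>
  <start_count_page>85</start_count_page>
  <end_count_page>106</end_count_page>
  <start_page_image_file>3908800908001101smthrich_0085.djvu</start_page_image_file>
  <end_page_image_file>3908800908001101smthrich_0106.djvu</end_page_image_file>
</article>"""


def summarize(xml_text: str) -> dict:
    """Extract basic fields and derive page counts from an article record."""
    article = ET.fromstring(xml_text)

    def number(tag: str) -> int:
        return int(article.findtext(tag))

    return {
        "author": article.findtext("author"),
        "volume": number("volume"),
        "issue": number("issue"),
        # Pages as printed in the journal (inclusive range).
        "printed_pages": number("end_page") - number("start_page") + 1,
        # Pages by position in the scanned volume (inclusive range).
        "scanned_pages": number("end_count_page") - number("start_count_page") + 1,
    }


summary = summarize(ARTICLE_XML)
```

Comparing the two counts is a cheap sanity check: a mismatch would suggest a missing or duplicated scan inside the article.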

Slide 31: Semantic Markup
• GoldenGATE: The intention of the GoldenGATE editor is to build a bridge between NLP components and XML markup of natural language text according to arbitrary XML schemas. It allows NLP components to be deployed to mark up the bodies of literature they were designed for. In this way, it enables transforming the texts into XML content according to an XML schema that was designed to gain maximum benefit from the knowledge provided in them.
• Integrated Open Taxonomic Access (INOTAXA)

Slide 32: Taxonomic Intelligence
• 10.7 million name strings in NameBank
• Uses a sophisticated algorithm (TaxonGrab) to locate likely name strings in OCR text
• Iterative processing of BHL texts will both increase the number of name strings in NameBank and increase the accuracy of name string recognition
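The general idea of locating likely name strings in OCR text can be illustrated with a deliberately naive sketch: find capitalized-word plus lowercase-word pairs, then keep those whose first word appears in a list of known genera. This is not TaxonGrab, whose details the slides do not give; the pattern, the tiny genus set standing in for a NameBank-style lookup, and the function name are all assumptions.

```python
import re

# Candidate pattern: a capitalized word followed by a lowercase word.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{2,})\b")

# Tiny illustrative lexicon; a real system would consult a large name index.
KNOWN_GENERA = {"Apis", "Bombus"}


def find_name_strings(text: str, genera=KNOWN_GENERA) -> list:
    """Return candidate binomials whose genus appears in the lexicon."""
    names = []
    for match in CANDIDATE.finditer(text):
        genus, epithet = match.groups()
        # The lexicon check filters ordinary phrases such as "The bees".
        if genus in genera:
            names.append(f"{genus} {epithet}")
    return names


sample = (
    "Specimens of Apis mellifera and Bombus terrestris were collected. "
    "The bees were active."
)
found = find_name_strings(sample)
```

The iterative loop the slide describes would correspond to adding newly confirmed genera back into the lexicon so that later passes recognize more names.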

Slide 36: BHL & Publishers

Slide 37: Permissions
• Seek permissions from copyright holders
• Opt-in copyright model: the BHL will actively work with professional societies and associations to integrate their publications into the BHL in a way that serves the societies’ missions and goals
• BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost
• BHL will provide a set of files to the publishers for reuse as they see fit

Slide 38: BHL Advantages
• Use of the articles will increase, as evidenced by citation upsurge
• Long-term management of the digital assets is provided by the BHL at no cost
• Publishers’ content is embedded in the emerging knowledge ecology that is sweeping biology in this century
• Structural markup of backfiles to conform with the NLM DTD (just starting)

Slide 39: Successes
• Entomological News
• Journal of Hymenoptera Research
• Herpetological Review
• Publications of the San Diego Natural History Museum
• California Academy of Sciences publications
• And more ...

Slide 40: BHL Portal
• Library catalog-like interface to BHL literature
• Enhanced structural analysis to provide volume/issue/article page access to the literature
• Iterative development based on feedback from user community
• Provide access to two key audiences:
  – Humans
  – Machines

Slide 41: Page Delivery

Slide 42: Taxonomic Intelligence

Slide 43: Web 2.0 Features
• Search
• Browse

Slide 44: Discovered Bibliographies

Slide 46: Demos
• BHL Portal: www.biodiversitylibrary.org
• uBio: www.ubio.org

Slide 48: Funding & the Future
• Initial grant from the MacArthur and Sloan Foundations (as part of the Encyclopedia of Life grant)
• Additional support from parent institutions
• Additional grants being actively pursued by BHL and individual members

Slide 49: Funding & the Future
• Co-evolving bioinformatics resources produce a rich information ecology:
  – Consortium for the Barcode of Life (CBOL) with gene sequences deposited in GenBank
  – GBIF’s Electronic Catalog of Taxonomic Names
  – Herbaria and museum specimen databases

Slide 50: Funding & the Future
Financial Sustainability Strategy
• Quick ramp-up with high early costs (development, mass scanning, etc.), then drive long-term costs asymptotically toward zero.
• Derive some long-term costs from the operating budgets of the member institutions (examples under consideration: acquisitions budget, staff positions, etc.).

Slide 51: Funding & the Future
Financial Sustainability Strategy
• Integrate functions/tasks with wider efforts where appropriate, e.g. mass storage.
• Clear roles for staff who wear multiple hats: currently 1.5 grant-funded positions, but more than 15 staff make substantive contributions.

Slide 52: Funding & the Future
The Long Now Strategy
• Institutions that are creating the BHL exist to persist through time. That’s an important part of their business. Use them!
• The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly

Slide 55: Structure of the Encyclopedia of Life Serine Molecule

Slide 56: Biodiversity Synthesis Center Heritage Field Museum Library Informatics Marine Biological Laboratory & MOBOT Secretariat Smithsonian Education & Outreach Serine Molecule Smithsonian/Harvard

Slide 57: EOL Species Pages
• Built from a variety of new and existing sources
• Views available for varying levels of expertise from novice to expert
• Legacy literature a key component of the EOL species pages

Slide 61: In any well-appointed Natural History Library there should be found every book and every edition of every book dealing in the remotest way with the subjects concerned. Charles Davies Sherborn, Epilogue to Index Animalium, March 1922

Slide 62: Thank You ... for sticking around!

Slide 63: CREDITS
Thanks to:
• Chris Freeland, Missouri Botanical Garden
• Tom Garnett, The Biodiversity Heritage Library Project
• The staff at the Internet Archive
Images from:
• The Galaxy of Images, Smithsonian Libraries (www.sil.si.edu/imagegalaxy)
• Martin R. Kalfatovic
• Suzanne C. Pilsk
• Bernard Scaife