BHL Tech Overview for BHL-Europe
Presented at BHL-Europe Kickoff Meeting.
Museum für Naturkunde, Berlin
12 May 2009

Published in: Technology, Education
  • Transcript

    • 1. BHL Technology Overview
      • Chris Freeland
      • Technical Director, BHL
      • Director of Bioinformatics, Missouri Botanical Garden
    • 2. About BHL: Usage, History
    • 3. Goals of BHL
      • Scan public domain biodiversity literature.
      • Negotiate rights to digitize copyrighted materials.
      • Ingest content digitized by others.
      • Provide interfaces & APIs for repository.
        • GUIs
        • Services for data mining & citation resolution
      http://www.biodiversitylibrary.org
    • 4. Now Online
      • More than:
        • 33,000 volumes
        • 13.3 million pages
      • Avg. monthly growth rate
        • 1,500 volumes
        • 600,000 pages
    • 5. Monthly Usage Stats
      • 45,000 unique users
      • 250,000 pageviews
    • 6. History
      • Preliminary work: MOBOT’s Botanicus
        • http://www.botanicus.org
      • Funded by Keck Foundation & IMLS
      • Working demonstration of how nomenclators/databases can link into digitized scientific literature
    • 7. Architecture
    • 8. Distributed
      • Digitized content on Internet Archive servers in California
      • Metadata index on MOBOT servers in Missouri
      • Image server on MBL servers in Massachusetts
      • Nice, but not global
    • 9. [Diagram] MOBOT; Petabox cluster (Internet Archive); Image Server (MBL)
    • 10.  
    • 11. Scanning Workflow
    • 12. Scanning Operations
      • BHL uses scanning centers established by Internet Archive for mass scanning.
      • Some partner libraries also scan in-house.
      • Want to expand international footprint:
        • mirrored content
        • ingest from global data providers
      • [Map] Locations of BHL/IA Scanning Centers
    • 13. Workflow: Selection, Preparation, Post Production, (Re)publication, Digitization, Conservation
    • 14. Open Access Data
      • Flora medica, oder, Abbildung der wichtigsten officinellen Pflanzen…[Heft 1-18]
        • Publisher: Jena, August Schmid, 1831 [i.e. 1829-1831].
      • Available formats: PDF, OCR, XML, JP2
    • 15.  
    • 16. Complexities of distributed, mass scanning (examples from NYBG and from Smithsonian)
    • 17. Post Processing & Derivatives
    • 18. Derivatives
      • JPEG2000 (JP2) images
      • OCR: ABBYY FineReader
      • PDF: LuraTech PDF Compressor
      • XML metadata
    • 19. Name Finding via TaxonFinder
    • 20. Name Finding in action with Taxonomic Intelligence…
      • Raw Image
      • Converted to text via OCR
      • Name finding via TaxonFinder
      • Extract names
      • Submit to NameBank
      • SOAP response
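
The name-finding step on slide 20 can be illustrated with a minimal sketch. This is not TaxonFinder's algorithm: it assumes a simple regular expression for binomial-shaped strings, the function name `find_candidate_names` is hypothetical, and the NameBank verification step is left out.

```python
import re

# Capitalized word followed by a lowercase word, e.g. "Quercus alba".
# A crude stand-in for TaxonFinder, whose real logic is not reproduced here.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")

def find_candidate_names(ocr_text):
    """Return unique candidate name strings in order of first appearance."""
    seen = []
    for genus, epithet in BINOMIAL.findall(ocr_text):
        name = f"{genus} {epithet}"
        if name not in seen:
            seen.append(name)
    return seen

# Made-up OCR text for illustration only.
page_text = (
    "Plate X shows Amanita muscaria beside Quercus alba. "
    "Amanita muscaria is poisonous."
)
candidates = find_candidate_names(page_text)
```

A pattern this loose also matches ordinary sentence openings such as "The quick", which is one reason the slide's workflow submits extracted strings to NameBank for verification.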
    • 21. Name Finding Stats to date *
      • Have mined more than 42 million name string occurrences
      • More than 30 million name strings verified by NameBank
        • 1.5 million unique
      *12 May 2009
    • 22. Content Delivery
    • 23.  
    • 24.  
    • 25. OCR error rate for names only: 35.16%
      • Study in 2008 found that for sample population of 3,003 names, 1,056 were incorrectly transcribed by OCR.
      • http://biodiversitylibrary.blogspot.com/2008/10/evaluation-of-taxonomic-name-finding.html
      • Top OCR errors
        • 1 Insert Space
        • 2 Omit Space
        • 3 e->c
        • 4 u->I
        • 5 u->n
        • 6 i->l
        • 7 c->e
        • 8 n->v
        • 9 l->i
        • 10 r->i
        • 11 u->ii
        • 12 h->l
        • 13 h->ii
        • 14 e->o
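
The 35.16% figure follows directly from the sample counts on slide 25, and the substitution list suggests one way to propose corrections. The sketch below is an illustration, not the method used in the 2008 study: it assumes each pair reads "intended character -> OCR output", it ignores the two space errors, and `candidate_corrections` is a hypothetical helper.

```python
sample_size = 3003
incorrect = 1056
error_rate = incorrect / sample_size
rate_text = f"{error_rate:.2%}"  # formatted as a percentage with two decimals

# (intended, as_read) pairs copied from the slide; the direction is an assumption.
SUBSTITUTIONS = [
    ("e", "c"), ("u", "I"), ("u", "n"), ("i", "l"), ("c", "e"), ("n", "v"),
    ("l", "i"), ("r", "i"), ("u", "ii"), ("h", "l"), ("h", "ii"), ("e", "o"),
]

def candidate_corrections(ocr_name):
    """Undo a single substitution at a single position; return all variants."""
    variants = set()
    for intended, as_read in SUBSTITUTIONS:
        start = ocr_name.find(as_read)
        while start != -1:
            end = start + len(as_read)
            variants.add(ocr_name[:start] + intended + ocr_name[end:])
            start = ocr_name.find(as_read, start + 1)
    return variants

variants = candidate_corrections("Qucrcus")  # includes "Quercus"
```

Each variant would still need checking against a name authority, since most of them are not real names.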
    • 26. Current image delivery: djatoka
      • Images stored as JPEG2000 (.jp2)
      • Decoded & delivered to browser via djatoka
        • Open source JP2 image server
        • Developed by digital librarians
        • Scalable
        • Rapid development cycle (v1.1)
        • Growing community of users
    • 27. BHL/IA architecture
      • A user requests Mushrooms of America, edible and poisonous, Plate X: http://www.biodiversitylibrary.org/page/1274907
      • St. Louis: www.biodiversitylibrary.org receives /page/1274907; BHLdb is used to locate pageid: 1274907
      • San Francisco: IA holds the .jp2 at http://www.archive.org/download/mushroomsofameri00palm/.../mushroomsofameri00palm_0010.jp2
      • Woods Hole: djatoka at images.biodivlibrary.org converts .jp2 to .jpg
      • Browser: IIPViewer displays the image
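
The lookup-then-decode flow on slide 27 can be sketched as follows. djatoka is queried through OpenURL parameters (`rft_id`, `svc_id`, `svc.format`, `svc.level`); everything else here is an assumption: the resolver path `/adore-djatoka/resolver` is djatoka's default deployment path and may not match BHL's setup, the `PAGE_TO_JP2` dictionary is a hypothetical stand-in for BHLdb, and the JP2 URL is a placeholder rather than the elided Internet Archive path from the slide.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical stand-in for the BHLdb lookup: page ID -> JP2 location.
# The URL is a placeholder, not the real Internet Archive path.
PAGE_TO_JP2 = {1274907: "http://example.org/scans/book_0010.jp2"}

# Assumed resolver location; the slide names only the host.
DJATOKA_RESOLVER = "http://images.biodivlibrary.org/adore-djatoka/resolver"

def djatoka_jpeg_url(page_id, level=3):
    """Build a djatoka getRegion request that returns the page as a JPEG."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": PAGE_TO_JP2[page_id],
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": "image/jpeg",
        "svc.level": str(level),  # resolution level to decode
    }
    return DJATOKA_RESOLVER + "?" + urlencode(params)

url = djatoka_jpeg_url(1274907)
query = parse_qs(urlparse(url).query)  # decoded parameters, for inspection
```

Because the JP2 location travels in `rft_id`, the image server needs no copy of the metadata index, which fits the distributed layout described on slide 8.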
    • 28.  
    • 29.  
    • 30. New delivery option: IA Bookreader
      • Open source
      • Example: Flora medica
        • http://www.us.archive.org/GnuBook/?id=floramedicaodera118diet#229
    • 31. IA Book Viewer http://www.us.archive.org/GnuBook/?id=floramedicaodera118diet#229
    • 32. APIs & Data Sharing
      • Name Service ( Documentation )
        • REST: XML or JSON
      • Data Export ( Documentation )
        • Monthly export of BHL titles, volumes, pages, names, other metadata in delimited files
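
A monthly delimited export like the one described on slide 32 could be read with Python's standard `csv` module. This is a sketch under stated assumptions: the tab delimiter, the column names `TitleID` and `FullTitle`, and the sample rows are all made up for illustration and are not taken from the BHL export documentation.

```python
import csv
import io

# Made-up sample; real exports may use different columns and delimiters.
sample_export = (
    "TitleID\tFullTitle\n"
    "1\tExample title A\n"
    "2\tExample title B\n"
)

def titles_by_id(handle):
    """Map each title ID to its full title from a tab-delimited file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {int(row["TitleID"]): row["FullTitle"] for row in reader}

titles = titles_by_id(io.StringIO(sample_export))
```

For a real file, `open(path, newline="", encoding="utf-8")` would take the place of the `io.StringIO` wrapper.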
    • 33. *Soon: Citation resolver via OpenURL
        • Beetle, A. A. 1977. Noteworthy grasses from Mexico V. Phytologia 37(4): 317–407.
        • http://example.edu/cgi?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:article&rft.jtitle=Phytologia&rft.atitle=Noteworthy+grasses+from+Mexico&rft.aulast=Beetle&rft.aufirst=A&rft.date=1977&rft.volume=37&rft.issue=4&rft.spage=317&rft.epage=407
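
The OpenURL on slide 33 is a set of key/value pairs built from the citation. A minimal sketch of that construction follows. The base URL and the `rft_val_fmt` value are copied from the slide; the helper name `build_openurl` is hypothetical, and `urlencode` percent-encodes the colons and slashes that the slide shows unescaped.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_openurl(base_url, citation):
    """Serialize citation fields as rft.* keys in an OpenURL query string."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:article",  # value as given on the slide
    }
    params.update({f"rft.{key}": value for key, value in citation.items()})
    return base_url + "?" + urlencode(params)

# Fields from the Beetle 1977 citation, as they appear in the slide's URL.
citation = {
    "jtitle": "Phytologia",
    "atitle": "Noteworthy grasses from Mexico",
    "aulast": "Beetle",
    "aufirst": "A",
    "date": "1977",
    "volume": "37",
    "issue": "4",
    "spage": "317",
    "epage": "407",
}
openurl = build_openurl("http://example.edu/cgi", citation)
fields = parse_qs(urlparse(openurl).query)  # what a resolver would decode
```

A resolver receiving this request would match the journal title, volume, and start page against its own index to find the scanned page.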
    • 34. Articles
    • 35.  
    • 36.  
    • 37.  
    • 38.  
    • 39.  
    • 40.  
    • 41. Article repository
      • Needed a way to display these PDFs
      • Wanted to extend contribution functionality to users
      • “Safe harbor” model
        • BHL provides platform
        • Community provides content
          • Scientists, students, libraries
    • 42. http://cite.biodiversitylibrary.org
      • Drupal with Biblio module
      • Multi-lingual interface
      • Customizable display, layout
      • Solr search/faceting
      • OAI & other services for discovery/sharing
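
The OAI service listed on slide 42 would typically be harvested with OAI-PMH requests. The sketch below shows a `ListRecords` request and identifier extraction using only the standard library. The endpoint `http://example.org/oai` and the sample response are hypothetical; the slide does not give the repository's actual OAI path.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def record_identifiers(response_xml):
    """Extract record identifiers from a ListRecords response body."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]

# Made-up response for illustration; real responses carry full metadata records.
sample_response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

first_request = list_records_url("http://example.org/oai")
identifiers = record_identifiers(sample_response)
```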
    • 43.  
    • 44.  
    • 45.  
    • 46.  
    • 47. Outreach
    • 48. BHL Blog
      • Updates
      • Announcements
      • 1,500 users / month
    • 49. Twitter
      • twitter.com/BioDivLibrary
      • Communication tool
        • Connecting with LinkedData community, other users
        • Receiving assistance, guidance
        • FAST turnaround
    • 50. If BHL-E is not a Research Project…
    • 51. Technologies in hand:
      • TaxonFinder
      • djatoka
      • IA Bookreader
      • Drupal/Biblio
      • OAI-PMH
      • OpenURL
      • Fedora Commons
    • 52. Needed:
      • Deduplication Tools
      • Storage
      • OCR
      • Markup/rekeying
      • UI/UX
      • Interface translation
      • Data synchronization
    • 53. Thank you
        • Chris Freeland
        • 4344 Shaw Blvd.
        • St. Louis, MO 63110
        • [email_address]
        • http://www.biodiversitylibrary.org
