BHL Tech Overview for BHL-Europe
Presented at BHL-Europe Kickoff Meeting.
Museum für Naturkunde, Berlin
12 May 2009

Published in: Technology, Education
  • Transcript

    • 1. BHL Technology Overview
        • Chris Freeland
        • Technical Director, BHL
        • Director of Bioinformatics, Missouri Botanical Garden
    • 2. About BHL: Usage, History
    • 3. Goals of BHL
        • Scan public domain biodiversity literature.
        • Negotiate rights to digitize copyrighted materials.
        • Ingest content digitized by others.
        • Provide interfaces & APIs for repository.
            • GUIs
            • Services for data mining & citation resolution
        • http://www.biodiversitylibrary.org
    • 4. Now Online
        • More than:
            • 33,000 volumes
            • 13.3 million pages
        • Avg. monthly growth rate
            • 1,500 volumes
            • 600,000 pages
    • 5. Monthly Usage Stats
        • 45,000 unique users
        • 250,000 pageviews
    • 6. History
        • Preliminary work: MOBOT’s Botanicus
            • http://www.botanicus.org
        • Funded by Keck Foundation & IMLS
        • Working demonstration of how nomenclators/databases can link into digitized scientific literature
    • 7. Architecture
    • 8. Distributed
        • Digitized content on Internet Archive servers in California
        • Metadata index on MOBOT servers in Missouri
        • Image server on MBL servers in Massachusetts
        • Nice, but not global
    • 9. [Diagram labels]
        • MOBOT
        • Petabox cluster: Internet Archive
        • Image Server: MBL
    • 10.  
    • 11. Scanning Workflow
    • 12. Scanning Operations
        • BHL uses scanning centers established by Internet Archive for mass scanning.
        • Some partner libraries also scan in-house.
        • Want to expand international footprint:
            • mirrored content
            • ingest from global data providers
        • [Map: Locations of BHL/IA Scanning Centers]
    • 13. Workflow
        • Selection
        • Preparation
        • Post Production
        • (Re)publication
        • Digitization
        • Conservation
    • 14. Open Access Data
        • Flora medica, oder, Abbildung der wichtigsten officinellen Pflanzen…[Heft 1-18]
            • Publisher: Jena, August Schmid, 1831 [i.e. 1829-1831].
        • Formats: PDF, OCR, XML, JP2
    • 15.  
    • 16. Complexities of distributed, mass scanning
        • from NYBG
        • from Smithsonian
    • 17. Post Processing & Derivatives
    • 18. Derivatives
        • JPEG2000 (JP2) images
        • OCR: ABBYY FineReader
        • PDF: LuraTech PDF Compressor
        • XML metadata
    • 19. Name Finding via TaxonFinder
    • 20. Name Finding in action with Taxonomic Intelligence…
        • Raw Image
        • Converted to text via OCR
        • Name finding via TaxonFinder
        • Extract names
        • Submit to NameBank
        • SOAP response
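
To make the "extract names" step of the pipeline above concrete, here is a minimal sketch of pulling candidate binomials out of OCR text. It is a deliberately simple heuristic stand-in and not TaxonFinder's algorithm; the regular expression, the function name `extract_candidate_names`, and the sample sentence are all assumptions made for illustration. In the slides, candidates are then submitted to NameBank for verification, which this sketch does not attempt.

```python
import re

# Heuristic stand-in (NOT TaxonFinder): a capitalized genus-like word
# followed by a lowercase epithet-like word.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def extract_candidate_names(ocr_text: str) -> list[str]:
    """Return unique candidate binomials in order of first appearance."""
    seen = set()
    names = []
    for genus, epithet in BINOMIAL.findall(ocr_text):
        name = f"{genus} {epithet}"
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Illustrative OCR text, not taken from a BHL page.
sample = "Plate X shows Amanita muscaria beside Amanita phalloides."
candidates = extract_candidate_names(sample)
print(candidates)
```

A pattern this loose also matches ordinary phrases such as "The plate", which is one reason a verification step against a name authority matters.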
    • 21. Name Finding Stats to date (12 May 2009)
        • Have mined more than 42 million name string occurrences
        • More than 30 million name strings verified by NameBank
            • 1.5 million unique
    • 22. Content Delivery
    • 23.  
    • 24.  
    • 25. OCR error rate for names only: 35.16%
        • Study in 2008 found that for sample population of 3,003 names, 1,056 were incorrectly transcribed by OCR.
        • http://biodiversitylibrary.blogspot.com/2008/10/evaluation-of-taxonomic-name-finding.html
        • Top OCR errors, by rank:
            • 1 Insert Space
            • 2 Omit Space
            • 3 e->c
            • 4 u->I
            • 5 u->n
            • 6 i->l
            • 7 c->e
            • 8 n->v
            • 9 l->i
            • 10 r->i
            • 11 u->ii
            • 12 h->l
            • 13 h->ii
            • 14 e->o
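
The 35.16% figure on this slide follows directly from the two counts it reports. A short check, using only those numbers (the variable names are illustrative):

```python
# Counts reported on the slide for the 2008 study.
total_names = 3003
ocr_errors = 1056

# Share of sampled names that OCR transcribed incorrectly.
error_rate = ocr_errors / total_names
print(f"{error_rate:.2%}")
```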
    • 26. Current image delivery: djatoka
        • Images stored as JPEG2000 (.jp2)
        • Decoded & delivered to browser via djatoka
            • Open source JP2 image server
            • Developed by digital librarians
            • Scalable
            • Rapid development cycle (v1.1)
            • Growing community of users
    • 27. BHL/IA architecture
        • A user requests Mushrooms of America, edible and poisonous, Plate X: http://www.biodiversitylibrary.org/page/1274907
        • Browser (IIPViewer) requests /page/1274907 from www.biodiversitylibrary.org (St. Louis)
        • BHLdb locates pageid 1274907: http://www.archive.org/download/mushroomsofameri00palm/.../mushroomsofameri00palm_0010.jp2
        • djatoka at images.biodivlibrary.org (Woods Hole) reads the .jp2 from IA (San Francisco) and delivers a .jpg to the browser
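
As a rough illustration of the last hop in this flow, the sketch below builds a request URL in the OpenURL style that djatoka's region service accepts, asking for a JPEG rendering of a remote JP2. The resolver base URL and the JP2 address are hypothetical placeholders (the slides do not give BHL's actual resolver path), and the function name `djatoka_region_url` is an assumption. The parameter names reflect djatoka's `getRegion` service as commonly documented; treat them as something to confirm against the djatoka release in use.

```python
from urllib.parse import urlencode


def djatoka_region_url(resolver_base: str, jp2_url: str, level: int = 3) -> str:
    """Build a djatoka-style getRegion request for a JPEG derived from a JP2."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": jp2_url,  # the JP2 to decode
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": "image/jpeg",  # format delivered to the browser
        "svc.level": str(level),  # resolution level to decode
        "svc.rotate": "0",
    }
    return f"{resolver_base}?{urlencode(params)}"


# Hypothetical resolver and JP2 locations, for illustration only.
example_url = djatoka_region_url(
    "http://images.example.org/adore-djatoka/resolver",
    "http://example.org/scans/sample_0010.jp2",
)
print(example_url)
```

Keeping the JP2 as the only stored image and deriving JPEGs on request is the design the slides describe: one master file per page, with sizes and regions computed by the image server.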
    • 28.  
    • 29.  
    • 30. New delivery option: IA Bookreader
        • Open source
        • Example: Flora medica
            • http://www.us.archive.org/GnuBook/?id=floramedicaodera118diet#229
    • 31. IA Book Viewer http://www.us.archive.org/GnuBook/?id=floramedicaodera118diet#229
    • 32. APIs & Data Sharing
        • Name Service (Documentation)
            • REST: XML or JSON
        • Data Export (Documentation)
            • Monthly export of BHL titles, volumes, pages, names, other metadata in delimited files
    • 33. Soon: Citation resolver via OpenURL
        • Beetle, A. A. 1977. Noteworthy grasses from Mexico V. Phytologia 37(4): 317–407.
        • http://example.edu/cgi?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:article&rft.jtitle=Phytologia&rft.atitle=Noteworthy+grasses+from+Mexico&rft.aulast=Beetle&rft.aufirst=A&rft.date=1977&rft.volume=37&rft.issue=4&rft.spage=317&rft.epage=407
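
The OpenURL on this slide is a set of key/value pairs appended to a resolver address. A minimal sketch of assembling it from the citation fields follows; the function name `citation_openurl` and the dictionary layout are assumptions, while the base address and field values are the ones shown on the slide. Note that `urlencode` percent-encodes the colons and slashes in `rft_val_fmt`, so the output is an encoded equivalent of the slide's URL rather than a character-for-character copy.

```python
from urllib.parse import urlencode


def citation_openurl(base: str, citation: dict[str, str]) -> str:
    """Build an OpenURL (Z39.88-2004, key/value article format) for a citation."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:article",
    }
    # Each citation field becomes an rft.* key, in the order given.
    params.update({f"rft.{key}": value for key, value in citation.items()})
    return f"{base}?{urlencode(params)}"


# Field values as they appear in the slide's example URL.
beetle_1977 = {
    "jtitle": "Phytologia",
    "atitle": "Noteworthy grasses from Mexico",
    "aulast": "Beetle",
    "aufirst": "A",
    "date": "1977",
    "volume": "37",
    "issue": "4",
    "spage": "317",
    "epage": "407",
}
url = citation_openurl("http://example.edu/cgi", beetle_1977)
print(url)
```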
    • 34. Articles
    • 35.  
    • 36.  
    • 37.  
    • 38.  
    • 39.  
    • 40.  
    • 41. Article repository
        • Needed a way to display these PDFs
        • Wanted to extend contribution functionality to users
        • “Safe harbor” model
            • BHL provides platform
            • Community provides content
                • Scientists, students, libraries
    • 42. http://cite.biodiversitylibrary.org
        • Drupal with Biblio module
        • Multi-lingual interface
        • Customizable display, layout
        • Solr search/faceting
        • OAI & other services for discovery/sharing
    • 43.  
    • 44.  
    • 45.  
    • 46.  
    • 47. Outreach
    • 48. BHL Blog
        • Updates
        • Announcements
        • 1,500 users / month
    • 49. Twitter
        • twitter.com/BioDivLibrary
        • Communication tool
            • Connecting with LinkedData community, other users
            • Receiving assistance, guidance
            • FAST turnaround
    • 50. If BHL-E is not a Research Project…
    • 51. Technologies in hand:
        • TaxonFinder
        • djatoka
        • IA Bookreader
        • Drupal/Biblio
        • OAI-PMH
        • OpenURL
        • Fedora Commons
    • 52. Needed:
        • Deduplication Tools
        • Storage
        • OCR
        • Markup/rekeying
        • UI/UX
        • Interface translation
        • Data synchronization
    • 53. Thank you
        • Chris Freeland
        • 4344 Shaw Blvd.
        • St. Louis, MO 63110
        • [email_address]
        • http://www.biodiversitylibrary.org
