BHL Technology Overview


    BHL Technology Overview - Presentation Transcript

    1. Biodiversity Heritage Library (BHL): Technology Overview
      • Chris Freeland
      • Director, Bioinformatics, Missouri Botanical Garden
      • Technical Director, Biodiversity Heritage Library
      • [email_address]
      • www.biodiversitylibrary.org
    2. BHL Partners
      • Museums
        • American Museum of Natural History (New York)
        • Natural History Museum (London)
        • Smithsonian Institution (Washington)
        • The Field Museum (Chicago)
      • Botanical Gardens
        • Missouri Botanical Garden
        • New York Botanical Garden
        • Royal Botanic Garden, Kew
      • University Libraries
        • Botany Libraries, Harvard University
        • Ernst Mayr Library of the Museum of Comparative Zoology, Harvard University
        • University of Illinois
      • Bioinformatics Institutes
        • MBL/WHOI
        • uBio.org
    3. Why have BHL?
      "In any well-appointed Natural History Library there should be found every book and every edition of every book dealing in the remotest way with the subjects concerned. One never knows wherein one edition differs from or supplements the other and unless these are on the same table at the same time it is not possible to collate them properly. Moreover for accurate work it is necessary for the student to verify every reference he may find; it is not enough to copy from a previous author; he must verify each reference itself from the original."
      Charles Davies Sherborn (1861-1942), Epilogue to Index Animalium, March 1922
    4. Unique Components of BHL
      • Combining metadata records from multiple libraries (similar, but not identical) and presenting them through a shared portal
      • Use of JPEG2000
      • Web 2.0 Mashups
      • Taxonomic data mining
      • Services
      • Rare & novel content
    5. Scanning process
      • Select Book
      • Pull from Shelf
      • Send to IA scanning center
      • Book is scanned & quality-checked (QA)
      • Page images loaded on IA cluster
        • Derivatives created
      • Book returned to library
      • Files harvested from IA portal
      • Books available for display within BHL portal
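The scanning workflow above is a strict sequence, so it can be sketched as an ordered set of states. This is a minimal illustration only: the stage names and the `advance` helper are hypothetical, not part of any BHL or Internet Archive system, and the sketch assumes the steps always run in exactly the order listed.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical names for the scanning steps, in the order listed above."""
    SELECTED = 1
    PULLED_FROM_SHELF = 2
    SENT_TO_IA = 3
    SCANNED_AND_QA = 4
    LOADED_ON_IA = 5  # page images loaded on the IA cluster, derivatives created
    RETURNED_TO_LIBRARY = 6
    HARVESTED = 7
    AVAILABLE_IN_PORTAL = 8


def advance(current: Stage) -> Stage:
    """Return the next stage; a book already in the portal has no further step."""
    if current is Stage.AVAILABLE_IN_PORTAL:
        raise ValueError("book is already available in the BHL portal")
    return Stage(current + 1)


# Walk one book through every step, recording each stage it passes.
stage = Stage.SELECTED
history = [stage.name]
while stage is not Stage.AVAILABLE_IN_PORTAL:
    stage = advance(stage)
    history.append(stage.name)
print(" -> ".join(history))
```

Modeling the steps as an ordered enum makes out-of-sequence transitions impossible by construction, which is the property a tracking system for physical volumes would want.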
    6. Mushrooms of America, edible and poisonous. Ed. by Julius A. Palmer, Jr., 1885.
    7. Scan & Store: Internet Archive. Scanning on Scribes; storage in Petaboxes
    8. Scanning & Derivatives (master files and derivatives)
      • XML
      • JP2
      • PDF
      • JPG
      • TXT
      • DjVu
    9. Harvest from IA
      • Extract, Transform, Load (ETL)
      • Custom scripts to extract content via IA’s APIs
      • Database scripts to transform to relational data structure
      • Load into database
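A minimal ETL sketch of the harvest step, assuming the extracted item metadata arrives as XML. The XML element names, the `item`/`page` table layout, and the `transform`/`load` helpers are hypothetical illustrations, not IA's metadata schema or the BHL database design; an in-memory SQLite database stands in for the production database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Extract: in practice this would be fetched from IA; here it is an inline sample.
# Element names are assumptions for illustration only.
SAMPLE = """<metadata>
  <identifier>mushroomsofameri00palm</identifier>
  <title>Mushrooms of America, edible and poisonous</title>
  <date>1885</date>
  <page seq="1"/><page seq="2"/><page seq="3"/>
</metadata>"""


def transform(xml_text):
    """Turn one item's XML into an item row plus one row per page."""
    root = ET.fromstring(xml_text)
    ident = root.findtext("identifier")
    item = (ident, root.findtext("title"), int(root.findtext("date")))
    pages = [(ident, int(p.get("seq"))) for p in root.iter("page")]
    return item, pages


def load(conn, item, pages):
    """Insert the transformed rows into a small relational schema."""
    conn.execute("CREATE TABLE IF NOT EXISTS item "
                 "(identifier TEXT PRIMARY KEY, title TEXT, year INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS page "
                 "(identifier TEXT REFERENCES item, seq INTEGER)")
    conn.execute("INSERT INTO item VALUES (?, ?, ?)", item)
    conn.executemany("INSERT INTO page VALUES (?, ?)", pages)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, *transform(SAMPLE))
print(conn.execute("SELECT COUNT(*) FROM page").fetchone()[0])
```

Keeping transform separate from load means the same parsing code can be tested without a database and rerun when the relational schema changes.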
    10.  
    11.  
    12.  
    13. Portal features: Stable URL, Attribution, Name Finding, Page Turning, Zoom/Pan, Download/View, Browse, Search, Filter, Target/Object
    14. JPEG2000 (*.jp2) display
      • RAW original => 85% .jp2
      • LuraTech encoder
        • Wavelet compression
      • LizardTech decoder
        • Tiled on the fly, cached for performance
      • GSIV browser-based client viewer
        • ‘AJAXian’
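A JPEG2000 wavelet decomposition lets a decoder pull out lower resolutions directly, each one halving width and height, which is what makes serving tiles on the fly for a zoom/pan viewer practical. The sketch below counts the display tiles needed at each reduction factor. The 256-pixel tile size, the 3000 × 4000 page size, and the `tile_grid` helper are assumptions for illustration, not LizardTech or GSIV settings.

```python
import math


def tile_grid(width, height, reductions, tile=256):
    """Return (r, w, h, tiles) for reduction factors r = 0..reductions.

    Reduction r is the full image divided by 2**r in each dimension, like a
    JPEG2000 decoder discarding its r highest resolution levels.
    """
    grid = []
    for r in range(reductions + 1):
        w = math.ceil(width / 2 ** r)
        h = math.ceil(height / 2 ** r)
        tiles = math.ceil(w / tile) * math.ceil(h / tile)  # partial edge tiles count
        grid.append((r, w, h, tiles))
    return grid


for row in tile_grid(3000, 4000, reductions=4):
    print(row)
```

Because each reduction quarters the pixel count, the whole pyramid costs only about a third more tiles than full resolution alone, and a zoomed-out view needs just a handful of tiles.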
    15. BHL/IA architecture: delivering a page image
      • A user requests Mushrooms of America, edible and poisonous, Plate X: http://www.biodiversitylibrary.org/page/1274907
      • Browser: GSIV.js
      • Portal: www.biodiversitylibrary.org, /page/1274907
      • BHLdb: pageid 1274907
      • IA: http://www.archive.org/download/mushroomsofameri00palm/.../mushroomsofameri00palm_0010.jp2
      • LizardTech ExpressServer (images.mobot.org): .jp2 => .jpg
      • Locate via BHL/IA architecture: 5.0+ sec transfer
      • Time to deliver image: 8+ sec
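The lookup-then-transfer path on this slide is where caching pays off, since every uncached request incurs the multi-second transfer. A minimal sketch, assuming a hypothetical `PAGE_TABLE` in place of BHLdb and inferring the `<identifier>_<4-digit sequence>.jp2` naming from the single example URL. The elided directory in that URL is not reconstructed, so only the file name is built, and Python's `functools.lru_cache` stands in for whatever cache the production system uses.

```python
from functools import lru_cache

# Hypothetical stand-in for the BHLdb lookup: page ID -> (IA identifier, image sequence).
# The single row mirrors the example request on the slide.
PAGE_TABLE = {1274907: ("mushroomsofameri00palm", 10)}

fetch_count = 0  # counts simulated transfers from IA


def jp2_filename(identifier, seq):
    """Build the page image name, assuming <identifier>_<4-digit seq>.jp2."""
    return f"{identifier}_{seq:04d}.jp2"


@lru_cache(maxsize=1024)
def page_image(page_id):
    """Resolve a page ID and 'fetch' its image once; repeat requests hit the cache."""
    global fetch_count
    identifier, seq = PAGE_TABLE[page_id]
    fetch_count += 1  # a real system would do the slow IA transfer here
    return jp2_filename(identifier, seq)


page_image(1274907)
page_image(1274907)
print(page_image(1274907), fetch_count)  # mushroomsofameri00palm_0010.jp2 1
```

Three requests for the same page trigger only one simulated transfer; the later two are served from memory.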
    16. Reuse, don’t rebuild
    17. Name Finding in action with Taxonomic Intelligence…
      • TIF image from scanner
      • Converted to text via PrimeOCR
      • Name finding via TaxonFinder
      • Extract names
      • Submit to NameBank
      • SOAP response
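The name-finding step can be illustrated with a deliberately naive pattern match over OCR text. This is not TaxonFinder's algorithm: `find_candidate_names`, the sample text, and the genus list are all hypothetical, and the sketch only flags "Genus species" pairs whose first word is a known genus. The NameBank SOAP submission is not shown.

```python
import re

# Naive binomial pattern: a capitalized word, one space, then a lowercase word
# of at least three letters (a stand-in, not TaxonFinder's real logic).
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_candidate_names(ocr_text, known_genera):
    """Return 'Genus species' strings whose genus appears in known_genera."""
    names = []
    for genus, epithet in BINOMIAL.findall(ocr_text):
        if genus in known_genera:  # filters out ordinary phrases like "The cap"
            names.append(f"{genus} {epithet}")
    return names


text = "Plate X. Agaricus campestris, the common field mushroom. The cap is white."
print(find_candidate_names(text, {"Agaricus", "Amanita"}))
```

Checking candidates against a list of known genera is what keeps ordinary capitalized phrases out of the results; a dictionary-backed approach of some kind is needed for useful precision on OCR text.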
    18. Names data mining
    19. Tag cloud from LCSH
      • Subject headings from library catalog
      • Expressed as MARCXML
      • Rendered as tag cloud
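A sketch of building a tag cloud from catalog records, assuming topical LCSH headings are read from MARCXML field 650, subfield a, and that font size scales linearly with frequency. The helper names, the sample records, and the 10 to 32 size range are illustrative choices, not the BHL implementation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def subject_counts(marcxml_records):
    """Count topical subject terms (650 $a) across MARCXML records."""
    counts = Counter()
    for xml_text in marcxml_records:
        root = ET.fromstring(xml_text)
        path = ".//marc:datafield[@tag='650']/marc:subfield[@code='a']"
        for sub in root.findall(path, NS):
            counts[sub.text.strip().rstrip(".")] += 1  # drop MARC end punctuation
    return counts


def font_sizes(counts, smallest=10, largest=32):
    """Scale each term's count linearly onto a font-size range."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {term: smallest + (n - lo) * (largest - smallest) // span
            for term, n in counts.items()}


# Made-up sample records for illustration.
RECORDS = [
    """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Mushrooms.</subfield></datafield>
  <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Botany</subfield></datafield>
</record>""",
    """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Mushrooms</subfield></datafield>
</record>""",
]

print(font_sizes(subject_counts(RECORDS)))
```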
    20. Geocoding LCSH
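Geocoding LCSH could start by pulling place names out of the same MARCXML records: field 651 (geographic name) subfield a, plus any $z geographic subdivision. A minimal sketch: the gazetteer is a caller-supplied dictionary, the coordinates shown are rough placeholder values, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def place_names(xml_text):
    """Collect geographic terms: 651 $a (geographic name) and any $z subdivision."""
    root = ET.fromstring(xml_text)
    places = [sf.text for sf in root.findall(
        ".//marc:datafield[@tag='651']/marc:subfield[@code='a']", NS)]
    places += [sf.text for sf in root.findall(
        ".//marc:datafield/marc:subfield[@code='z']", NS)]
    return [p.strip().rstrip(".") for p in places]


def geocode(places, gazetteer):
    """Look each place up in a caller-supplied gazetteer; unknown places are skipped."""
    return {p: gazetteer[p] for p in places if p in gazetteer}


# Made-up sample record for illustration.
RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Mushrooms</subfield>
    <subfield code="z">North America.</subfield>
  </datafield>
  <datafield tag="651" ind1=" " ind2="0">
    <subfield code="a">Missouri</subfield>
  </datafield>
</record>"""

GAZETTEER = {"Missouri": (38.5, -92.5)}  # rough placeholder (lat, lon)
print(geocode(place_names(RECORD), GAZETTEER))
```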
    21. RSS Feeds
      • Specific: Last 25 books published in German from NYBG
      • RSS Feed location: http://www.biodiversitylibrary.org/RecentRss/25/GER/NYBG    
      • Allgemeine deutsche Garten-Zeitung, 7, 1829 (added: 04/03/2008)
      • Zeitschrift für wissenschaftliche Mikroskopie und für mikroskopische Technik, 2, 1885 (added: 03/28/2008)
      • Zeitschrift für technische Biologie, 7, 1919 (added: 03/27/2008)
      • General: Last 25 books from all libraries
      • RSS Feed location: http://www.biodiversitylibrary.org/RecentRss/25    
      • Summa plantarum : v.1 (added: 05/01/2008)
      • Vegetable materia medica of the United States (added: 04/30/2008)
      • The family herbal; (added: 04/30/2008)
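The two feed locations above suggest a URL pattern of `/RecentRss/<count>/<language>/<institution>`. The sketch below builds such URLs and reads item titles from an RSS 2.0 document. The path pattern is inferred from only those two examples, requiring a language whenever an institution is given is an assumption, and the sample feed is a made-up fragment reusing titles from the slide.

```python
import xml.etree.ElementTree as ET

BASE = "http://www.biodiversitylibrary.org/RecentRss"


def recent_rss_url(count=25, language=None, institution=None):
    """Build a recent-additions feed URL (pattern inferred from two examples)."""
    if institution and not language:
        # Assumption: the institution segment only appears after a language segment.
        raise ValueError("an institution filter needs a language code here")
    parts = [BASE, str(count)] + [p for p in (language, institution) if p]
    return "/".join(parts)


def item_titles(rss_text):
    """Return the <title> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title") for item in root.iter("item")]


# Made-up feed fragment for illustration.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Recent additions</title>
<item><title>Summa plantarum : v.1</title></item>
<item><title>The family herbal;</title></item>
</channel></rss>"""

print(recent_rss_url(25, "GER", "NYBG"))
print(item_titles(SAMPLE_FEED))
```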
    22. Services
      • Names
        • v.1 released http://www.biodiversitylibrary.org/services/name/NameService.asmx
      • Stable urls
        • http://www.biodiversitylibrary.org/bibliography/1652
        • http://www.biodiversitylibrary.org/name/Carcharodon_carcharias
      • Future:
        • Citation Resolver
        • Titles Resolver
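The stable URLs above follow simple patterns that a client could build directly. A sketch, assuming the `/bibliography/<id>`, `/name/<Genus_species>`, and `/page/<id>` forms seen in this presentation generalize, with whitespace in a name becoming underscores. The helper names are hypothetical, and this does not call the SOAP name service.

```python
from urllib.parse import quote

BASE = "http://www.biodiversitylibrary.org"


def bibliography_url(title_id: int) -> str:
    """Stable URL for a bibliographic record, e.g. /bibliography/1652."""
    return f"{BASE}/bibliography/{title_id}"


def name_url(scientific_name: str) -> str:
    """Stable URL for a name; assumes whitespace becomes underscores."""
    slug = "_".join(scientific_name.split())
    return f"{BASE}/name/{quote(slug)}"  # percent-encode anything unusual


def page_url(page_id: int) -> str:
    """Stable URL for a single page, e.g. /page/1274907."""
    return f"{BASE}/page/{page_id}"


print(name_url("Carcharodon carcharias"))
```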
    23. BHL Name Services http://www.biodiversitylibrary.org/services/name/NameService.asmx
    24. Provider Integration
      • Encyclopedia of Life
      • Atrium Andes Biodiversity
      • Wikipedia
      • EDIT Scratchpads
      • More to come…
    25.  
    26.  
    27. Hardware Infrastructure
      • Distributed
      • Partially redundant
        • Work needed
      • Mixed platforms
      • Mixed app frameworks
    28. MOBOT Petabox cluster (Internet Archive)
    29.  
    30. File Storage Estimates
      • 4 MB per page, including derivatives
      • 1 million pages = 4 TB storage
      • Expected output: 60–100 million pages
      • 240–400 TB for files
      • 10–20 GB for the database
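The storage figures follow from multiplying page count by about 4 MB per page. The arithmetic works out in decimal units (1 TB = 1,000,000 MB); `storage_tb` is just an illustrative helper.

```python
MB_PER_PAGE = 4  # from the slide: about 4 MB per page including derivatives


def storage_tb(pages, mb_per_page=MB_PER_PAGE):
    """Estimated file storage in decimal terabytes (1 TB = 1,000,000 MB)."""
    return pages * mb_per_page / 1_000_000


for pages in (1_000_000, 60_000_000, 100_000_000):
    print(f"{pages:>11,} pages -> {storage_tb(pages):g} TB")
```

This reproduces the slide's 4 TB, 240 TB, and 400 TB. In binary units the same million pages would be roughly 3.6 TiB, so the slide's round numbers are consistent with decimal units.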
    31. Future Work
      • Services
        • Citation Resolver
        • Titles Resolver
      • Interfaces
      • Editing
        • Authoritative
        • Community
      • Backend
    32. Fedora
      • Funded by the Gordon and Betty Moore Foundation to adopt Fedora Commons
      • Working with Internet Archive to define use and practice
      • Project completion December 2009
    33. Thank You
      • Chris Freeland: [email_address]
      • BHL Portal: www.biodiversitylibrary.org
      • BHL Blog: biodiversitylibrary.blogspot.com
      • BHL collection at Internet Archive: www.archive.org/details/biodiversity

    Uploaded by chrisfreeland, 2 years ago


    Presentation to Smithsonian's Office of the Chief I…


    © All Rights Reserved
