Purposeful Gaming: Crowdsourcing the Correction of OCRed Text in the Biodiversity Heritage Library



Optical Character Recognition (OCR) of scanned text enables full-text searching. Unfortunately, OCR software does not produce a 100% accurate representation of the text, especially in older works with varying fonts, odd layouts and ink bleed-through. Led by Missouri Botanical Garden, in partnership with New York Botanical Garden, Harvard Museum of Comparative Zoology and Cornell, an IMLS-funded project has developed two games to engage the public in correcting inaccurately OCRed text in BHL.

Published in: Technology

  1. Purposeful Gaming: Crowdsourcing the Correction of OCRed Text in the Biodiversity Heritage Library
     Marty Schlabach, Food & Agriculture Librarian, Mann Library, Cornell University
     Upstate NY Science Librarians Annual Meeting, Cornell University, October 23, 2015
  2. BHL Members
     • American Museum of Natural History Library
     • Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO)
     • Cornell University Library
     • Field Museum of Natural History Library
     • Harvard University Botany Libraries
     • Harvard University, Museum of Comparative Zoology, Ernst Mayr Library
     • Library of Congress
     • LuEsther T. Mertz Library, The New York Botanical Garden
     • Marine Biological Laboratory / Woods Hole Oceanographic Institution Library
     • Missouri Botanical Garden, Peter H. Raven Library
     • National Library Board, Singapore
     • Natural History Museum Library, London
     • Royal Botanic Gardens, Kew, Library, Art & Archives
     • Smithsonian Libraries
     • United States Geological Survey Libraries Program
     • University Library, University of Illinois Urbana-Champaign
  3. Purposeful Gaming and BHL: engaging the public in improving & enhancing discovery & access to digital texts
     • National Leadership Grant for Libraries awarded to Missouri Botanical Garden in St. Louis
     • Partners include Harvard, New York Botanical Garden, Cornell
     • Runs Dec 2013-Nov 2015
     • Funded in part by IMLS
  4. BHL Problem Statement:
     • Major challenge for digital libraries:
       • Full-text searching of scanned texts is significantly hampered by poor output from Optical Character Recognition (OCR) software.
     • Historic literature has proven to be particularly problematic because of its tendency to have
       • varying fonts
       • varying typesetting
       • varying layouts
       • ink bleed-through
       • foxing
       • other physical condition issues
  5. How are we engaging the public in improving and enhancing the discovery and access to digital texts in BHL?
     • Building an online game to crowdsource the correction of inaccurate OCR
     • Crowdsourcing the transcription of inaccurate OCR and handwritten texts
     • Adding new content types upon which to test the approach
       • Seed & Nursery Catalogs & Seed Exchange Lists
         • Test OCR correction on this content type
         • Crowdsource the transcription when needed
       • Field Notebooks
         • Handwritten, OCR virtually impossible
         • Crowdsource the transcription
  6. Beanstalk
  7. Amount of Game-Playing?
     • Smorball: 1353 sessions, 1081 players
     • Beanstalk: 991 sessions, 829 players
  8. What happens with the game output?
     • When multiple players enter the same character string for a word, the system considers that string correct
     • The agreed string of characters (the corrected word) is added to the index
     • The corrected word becomes available for searching, which improves discoverability
  9. Thanks to my colleagues at:
     • Mann Library, Cornell University
     • Liberty Hyde Bailey Hortorium, Cornell University
     • Biodiversity Heritage Library
     • Mertz Library, New York Botanical Garden
     • Peter H. Raven Library, Missouri Botanical Garden
     • National Agricultural Library, USDA
     • Institute of Museum and Library Services
     • Tiltfactor
  10. Questions?
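
The agreement step described on the "What happens with the game output?" slide can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the slides do not say how many matching entries are required, so the `min_agreement` threshold of 2, the tie handling, and the names `consensus_transcription` and `update_index` are all assumptions made for the example.

```python
from collections import Counter


def consensus_transcription(entries, min_agreement=2):
    """Return the string that players agree on, or None if there is no consensus.

    `min_agreement` is an assumed threshold; the slides only say "multiple players".
    """
    # Trim surrounding whitespace and drop empty submissions.
    cleaned = [e.strip() for e in entries if e and e.strip()]
    if not cleaned:
        return None

    counts = Counter(cleaned)
    (top, top_count), *rest = counts.most_common(2)

    # Not enough players typed the same string.
    if top_count < min_agreement:
        return None
    # Assumed policy: a tie between two candidates is treated as no consensus.
    if rest and rest[0][1] == top_count:
        return None
    return top


def update_index(index, word_id, entries, min_agreement=2):
    """Add the agreed string to a toy search index (dict of term -> set of word ids)."""
    accepted = consensus_transcription(entries, min_agreement)
    if accepted is not None:
        # Lowercasing the index key is an illustrative choice, not from the slides.
        index.setdefault(accepted.lower(), set()).add(word_id)
    return accepted
```

For example, calling `update_index(index, "page12-word7", ["Quercus", "Quercus", "Quereus"])` on an empty `index` would accept "Quercus" and make the hypothetical word id `page12-word7` findable under the term `quercus`, while two disagreeing entries would leave the index unchanged.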