Many hands make light work, the american version [charleston library conference 201111]

Initial results (2 months after first use) of crowd sourcing at the California Digital Newspaper Collection.

Published in: Technology, Education

Transcript

  • 1. Many Hands Make Light Work, the American Version. Experiences with User-Text-Correction at California Digital Newspaper Collection (CDNC): How crowd-sourcing OCR text correction impacts a historic newspaper collection
  • 2. About the Collection: The California Digital Newspaper Collection contains over 490,000 pages of significant California newspapers published from 1846 to 1922. The newspapers were digitized to both page and article level METS/ALTO data as part of the National Digital Newspaper Program. The collection is displayed using Veridian digital library software. (Chart: visits per month, minutes per visit, and pages per visit; site statistics between Nov. 2010 and Aug. 2011; values not included in the transcript.)
  • 3. Poor OCR reduces search recall to low levels. OCR quality ranges between 50% and 90% word-level accuracy.
  • 4. Post-OCR text correction is expensive: ≈ $0.50 per 1,000 characters, or $5.00 to $10.00 per newspaper page. (Image: Daily Alta California, 2 January 1850.)
  • 5. The Average CDNC User: Like the users of many digital newspaper collections, patrons of the CDNC visit the site for personal reasons, consider themselves genealogists or family historians, and return to the site frequently. (Chart: users above 40 years old; users who consider themselves genealogists; users who visit the site at least weekly; values not included in the transcript.)
  • 6. Wikipedia on Crowdsourcing: “distributed problem-solving and production model”; “sourcing tasks traditionally performed by specific individuals to an undefined large group of people or community (crowd) through an open call”
  • 7. Crowd-Sourcing Projects: Project Gutenberg; Family Search; The National Library of Australia; The National Library of Finland; FreeBMD.org
  • 8. Site Statistics Since User Text Correction: visits per month; minutes per visit; pages per visit (chart values not included in the transcript)
  • 9. Lines per month corrected by the top corrector: 30,000. Total lines corrected since 2008: 49 million. Total number of text correctors: 30,000. Lines corrected per month in 2011: 2,000,000+. ‘Engaging with users and building virtual communities is just as important to the users as providing the data itself. They want to be part of a community.’ Rose Holley, The National Library of Australia
  • 10. User Text Correction added to CDNC
  • 11. Results, August 22 - October 22: users who have corrected text; lines corrected per month; lines corrected by top corrector; total number of lines corrected (chart values not included in the transcript)
  • 12. Goals: improve OCR text at low cost; improve search precision / recall; build user community
  • 13. Risks? User text correction of newspapers is (relatively) new; users won’t know what to do, interface is confusing; users don’t understand errors in OCR text; vandalism of text
  • 14. Benefits: text quality improved; cost effective; community involvement; users empowered
  • 15. User Reaction: “Great feature (I tested it during the beta) for a great site, which I have used extensively. I plan to use the edit feature when I get back to research in the Los Angeles Herald and the Daily Alta California.” ~Lawrence B. “I have used the new system and like it. The user correction is great idea.” ~Pat. “Exactly what the system needed!!! Pulled up a couple articles in the beta system and made some text corrections. Went back and tried the old system using the words I corrected and it worked!! Outstanding enhancement!” ~Mary B. “STUNNINGLY FANTASTIC!!!! is what I think!” ~A fifth generation Californian of multiple Forty-niner families
  • 16. “The addition of user text correction (UTC) to the California Digital Newspaper Collection has dramatically improved the quality of the computer-generated text and enlivened our relationship with our users. Within a couple of weeks of implementing UTC, and with little publicity, a handful of users had already corrected thousands of lines of text. Many of those users emailed us directly with questions about or praise for the UTC, building direct, personal connections between our staff and users that hadn’t existed before.” ~Brian Geiger, Center for Bibliographic Research, UC Riverside
  • 17. ? Brian Geiger, Director, Center for Bibliographic Studies and Research, University of California Riverside, bgeiger@ucr.edu; Frederick Zarndt, Chair, IFLA Newspapers Section, frederick@frederickzarndt.com
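
The cost figures on slide 4 lend themselves to a quick back-of-the-envelope check. The Python sketch below is an illustration, not part of the presentation. It takes the quoted rate (about $0.50 per 1,000 characters), the quoted per-page range ($5.00 to $10.00), and the collection size from slide 2 (over 490,000 pages), then works out the page length those prices imply and what paid correction of the whole collection might cost. The function names are hypothetical, and applying one flat per-page price to every page is an assumption.

```python
# Figures quoted in the slides; everything else here is illustrative.
COST_PER_1000_CHARS = 0.50        # USD, slide 4
PAGE_COST_RANGE = (5.00, 10.00)   # USD per newspaper page, slide 4
COLLECTION_PAGES = 490_000        # "over 490,000 pages", slide 2


def implied_chars_per_page(page_cost, cost_per_1000=COST_PER_1000_CHARS):
    """Characters per page implied by a per-page price at the per-character rate."""
    return round(page_cost / cost_per_1000 * 1000)


def collection_cost(pages, page_cost):
    """Cost of paid correction for `pages` pages, assuming a flat per-page price."""
    return pages * page_cost


if __name__ == "__main__":
    low, high = PAGE_COST_RANGE
    print("implied characters per page:",
          implied_chars_per_page(low), "to", implied_chars_per_page(high))
    print("paid correction of the collection (USD):",
          collection_cost(COLLECTION_PAGES, low), "to",
          collection_cost(COLLECTION_PAGES, high))
```

Under these assumptions the per-page range corresponds to roughly 10,000 to 20,000 characters per page, and paying to correct all 490,000 pages would cost on the order of $2.45 million to $4.9 million. That is the gap the "improve OCR text at low cost" goal on slide 12 is aimed at.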
