Written Rummage

Senior defense presentation by Joshua Rio-Ross, ORU, April 18, 2011.

  1. Written Rummage: An Exploration of Crowdsourcing and Digital Transcription, by Joshua Rio-Ross
  2. Crowdsourcing <ul><li>Outsourcing a task to an undefined group of people (a crowd) rather than contracting professionals. </li></ul><ul><li>Crowd: volunteer or hired </li></ul><ul><li>A few extant examples: </li></ul><ul><ul><li>Open Solubility Challenge </li></ul></ul><ul><ul><li>Open Dinosaur Project </li></ul></ul><ul><ul><li>Galaxy Zoo </li></ul></ul><ul><li>Amazon’s Mechanical Turk </li></ul>
  3. Digital Libraries <ul><li>Increasingly prominent in the information age </li></ul><ul><li>Sharing among academic institutions </li></ul><ul><ul><li>Broader exposure of collections </li></ul></ul><ul><ul><li>Facilitated and expanded academic research </li></ul></ul><ul><li>Drawback: page images are often unsearchable </li></ul><ul><li>Extant examples: </li></ul><ul><ul><li>Google Books </li></ul></ul><ul><ul><li>Hathitrust.org </li></ul></ul>
  4. OCR & Transcription <ul><li>OCR: Optical Character Recognition </li></ul><ul><ul><li>Useful for most typewritten texts and some handwriting </li></ul></ul><ul><ul><li>OCR programs are still not flawless </li></ul></ul><ul><ul><li>CAPTCHA/reCAPTCHA </li></ul></ul><ul><li>There are some things OCR can’t read; </li></ul><ul><li>for everything else, there are people. </li></ul><ul><li>Professional transcription is expensive </li></ul><ul><ul><li>$2.50-$8.00 per page </li></ul></ul>
  5. The Need <ul><li>Libraries have the stash </li></ul><ul><li>Students near and far need the stash </li></ul><ul><li>Searchable texts are ideal and often imperative for research </li></ul><ul><li>Libraries can often upload entire manuscript collections but cannot transcribe them. </li></ul><ul><li>A resource for affordable mass transcription is necessary </li></ul>
  6. The Project <ul><li>Written Rummage </li></ul><ul><ul><li>Use crowdsourcing resources to distribute large handwritten documents to an anonymous workforce for transcription. </li></ul></ul><ul><ul><li>Use crowdsourcing resources to quality-control (i.e., proofread and correct) the submitted transcriptions. </li></ul></ul><ul><ul><li>Possibly systematize the process as a non-profit organization serving libraries, universities and other private collectors. </li></ul></ul>
  7. Snapter <ul><li>Online program </li></ul><ul><li>Free version is functional </li></ul><ul><li>Advanced version is library-compatible </li></ul><ul><li>Pages are photographed rather than scanned </li></ul><ul><li>Ideal for manuscript preservation </li></ul><ul><li>Program reshapes images for readability </li></ul><ul><li>Converts images to PDF </li></ul><ul><li>PDF can be given a URL </li></ul>
  8. Mechanical Turk <ul><li>Online crowdsourcing resource </li></ul><ul><li>Ideal for distributing small, simple tasks (called HITs, Human Intelligence Tasks) </li></ul><ul><li>“Requester-Worker” dynamic </li></ul><ul><li>Requester proposes description and pay; workers choose which tasks to take </li></ul><ul><li>Pay at requester’s discretion </li></ul><ul><li>Text-box answer option </li></ul><ul><li>Can embed URL links in HITs </li></ul>
  9. MTurk Practicality <ul><li>Can publish numerous HITs at once (via .csv upload) </li></ul><ul><li>Can vary prices and expiration time per HIT or HIT batch </li></ul><ul><li>Little incentive to do poor work </li></ul><ul><li>Poor work can be quality-controlled </li></ul><ul><li>Cost-effective </li></ul><ul><li>Notable: trade-off between HIT completion time and pay </li></ul>
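Publishing many HITs at once from a .csv file works, as we understand the MTurk Requester site's batch feature, by giving the HIT template `${...}` placeholders whose names match the CSV column headers, so each row becomes one HIT. The sketch below only builds such a file for a run of manuscript pages. The `page` and `image_url` column names and the `example.org` URL pattern are hypothetical placeholders, not the project's real values.

```python
import csv
import io

# Hypothetical URL pattern for the hosted page images (e.g. Snapter PDFs or
# photographs put online by the library); the real location will differ.
IMAGE_URL_PATTERN = "https://example.org/douglass-diary/page-{:03d}.jpg"


def build_transcription_batch(num_pages: int) -> str:
    """Return CSV text with one row (and so one HIT) per manuscript page.

    The 'image_url' header is meant to match an ${image_url} placeholder
    in the HIT template; both names are illustrative assumptions.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["page", "image_url"], lineterminator="\n"
    )
    writer.writeheader()
    for page in range(1, num_pages + 1):
        writer.writerow({"page": page, "image_url": IMAGE_URL_PATTERN.format(page)})
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_transcription_batch(3))
```

Writing the file with the standard `csv` module, rather than joining strings by hand, keeps URLs containing commas or quotes correctly escaped.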
  10. Google Docs <ul><li>Google account holders can share documents online with varying levels of access. </li></ul><ul><li>Each document receives its own URL </li></ul><ul><li>Can be shared with non-account holders </li></ul><ul><li>Equipped with most essential editing tools (e.g., paragraphing, font modification, cut/paste) </li></ul>
  11. Google Doc Practicality <ul><li>Copy and paste each submitted transcription into a Google Doc. </li></ul><ul><li>Save it and set sharing to “share with everyone” to create a public URL. </li></ul><ul><li>Provide a link to that URL in the proofreading template </li></ul><ul><li>Also provide a link to the original manuscript for comparison. </li></ul><ul><li>Insert and inform users of errors </li></ul>
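The proofreading HIT needs two links per page: the shared Google Doc holding the first transcription and the original manuscript image. MTurk performs the placeholder substitution itself, but Python's `string.Template` uses the same `${name}` syntax, so it can preview one filled-in HIT locally before a batch is published. The instruction wording and the field names `image_url` and `transcription_url` are hypothetical.

```python
from string import Template

# Hypothetical proofreading instructions; the ${...} names would have to
# match the column headers of the proofreading batch CSV.
PROOFREAD_TEMPLATE = Template(
    "Compare the transcription at ${transcription_url} with the original "
    "manuscript image at ${image_url}. Correct every error you find and "
    "paste the corrected text into the box below."
)


def preview_proofread_hit(image_url: str, transcription_url: str) -> str:
    """Fill the template for one page; substitute() raises KeyError if a field is missing."""
    return PROOFREAD_TEMPLATE.substitute(
        image_url=image_url, transcription_url=transcription_url
    )


if __name__ == "__main__":
    print(preview_proofread_hit(
        "https://example.org/douglass-diary/page-002.jpg",
        "https://docs.example.org/transcription-page-002",
    ))
```

Using `substitute()` rather than `safe_substitute()` makes a missing link fail loudly instead of leaving a literal `${...}` in front of workers.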
  12. Process <ul><li>Link image URL(s) to HIT </li></ul><ul><li>Establish HIT compensation </li></ul><ul><li>Receive transcriptions </li></ul><ul><li>Store in Google Docs and make shareable </li></ul><ul><li>Make proofreading HIT with transcription and image links </li></ul><ul><li>Establish HIT compensation </li></ul><ul><li>Receive 2nd transcriptions and store in Google Docs </li></ul><ul><li>Check off in Transcription Track Spreadsheet </li></ul>
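The process above moves each page through two HIT rounds and records progress in a tracking spreadsheet. A minimal sketch of that bookkeeping, assuming one row per page and three states, could look like this; `Stage` and `TranscriptionTracker` are hypothetical names standing in for the spreadsheet, not part of the original workflow.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    PENDING = 0      # image linked to a transcription HIT, no submission yet
    TRANSCRIBED = 1  # 1st submission received and stored in Google Docs
    PROOFREAD = 2    # 2nd (proofread) submission received and stored


@dataclass
class TranscriptionTracker:
    """Hypothetical stand-in for the tracking spreadsheet: page -> stage."""
    stages: dict[int, Stage] = field(default_factory=dict)

    def add_pages(self, count: int) -> None:
        for page in range(1, count + 1):
            self.stages.setdefault(page, Stage.PENDING)

    def advance(self, page: int) -> Stage:
        """Check a page off for its next stage; stages cannot be skipped."""
        current = self.stages[page]
        if current is Stage.PROOFREAD:
            raise ValueError(f"page {page} is already proofread")
        self.stages[page] = Stage(current.value + 1)
        return self.stages[page]

    def count_reached(self, stage: Stage) -> int:
        """Number of pages that have reached `stage` or gone past it."""
        return sum(1 for s in self.stages.values() if s.value >= stage.value)


if __name__ == "__main__":
    tracker = TranscriptionTracker()
    tracker.add_pages(72)
    for page in range(1, 69):  # 68 first submissions received
        tracker.advance(page)
    for page in range(1, 29):  # 28 of those proofread (which pages is illustrative)
        tracker.advance(page)
    print(tracker.count_reached(Stage.TRANSCRIBED), tracker.count_reached(Stage.PROOFREAD))
```

The usage lines reproduce the 68/72 and 28/72 counts reported on the Results slide; the choice of pages 1-68 and 1-28 is arbitrary.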
  13. Results <ul><li>Collection: Frederick Douglass Diary </li></ul><ul><li>Number transcribed: 68/72 </li></ul><ul><li>Number transcribed and proofread: 28/72 </li></ul><ul><li>Quality: inconsistent </li></ul>
  14. Frederick Douglass Diary, Manuscript #2
  15. <ul><li>Several letters came on board and were handed to me after after our voyage began and we were well on our way to the gates of the Sea, and unfortunately the Pilots left the ship too soon to permits of sending in word of answer or a word of farewell to many dear friends to whom its would have been a pleasure to ahve sent such a word. We have now been on this noble ship about .8. hours . and no sign of the dreaded trouble of Sea Sickness . Had two or Three Shark talks with mr Brulock and mr Trayland. The latter is a remarkable clever talker , and is a man very free from pretense , but I fear is a little biased in his politics by mugwanfuism I like him. We have but few few passengers and of them there is a large sprinkling of ladies. Most of the latter thus far have spent their time on deck in Steamer chains with B ooks <strike>laze</strike> peacefully closed on this Lazy morn as <strike>soon</strike> moments tharn for use s while the Men walk the deck and </li></ul><ul><li>Several letters came on board and were handed to me after our voyage began and we were well on our way to the gates of the sea and unfortunately the pilots left the ship too soon to possibility of sending a word of answer or a word of farewell to many dear friends to whom it would have been a pleasure to have sent such a word. We have now been on this noble ship about 8 hours and no sign of the dreaded trouble of sea sickness. Had two or three sharp talks with Mr Brulock and Mr Wayland. The latter is a remarkable clever talker and is a man very free from pretense but I fear is a little biased in his politics by mugrerantism. I like him.We have but few few passengers and of them there is a large sprinkling of ladies. Much of the latter thus for have spend their time on deck in steamer chairs with books peacefully closed on their laps more as ornaments than for use while the men walk the deck and </li></ul>
  16. Frederick Douglass Diary, Manuscript #21
  17. <ul><li>Jan 21. Another bright day. Cool and bracing and the blue sky answered well stereotyped descriptions. The time was favorable for viewing the vast assemblage of shattered ruins spread out before us in the great Roman Forum. The Forum of Tragic and other features of the ancient greatness of Rome we did not view these alone. Men and women were there penhafed , from each quarter of the globe, seated or standing in the sunlight with pencils, pens and note books in their hands, noticing the fuller columns, broken tablets over which skillful artificers thought and wrought long before the Babe of Bethlehem was born. I have seen nothing more impressive and solemn. Nothing that tells so eloquently the story that all who live must die and at last - not only for man but for all his best endeavors it is dust to dust ashes to ashes. Marble, granite in whatever vastness shape hardness or position. Much yeild to the deft touch of time. Yet how grandly and persistently have these old tablets marble blocks resisted. How nobly they have continued to bear testimony to the energy the ambition and the greatness of the people who two thousand years </li></ul><ul><li>Jan 21. Another bright day. Cool and bracing and the blue sky answered well stereotyped descriptions. The time was favorable for viewing the vast assemblage of shattered ruins spread out before us in the great Roman Forum. </li></ul>
  18. Results: Process & Economics <ul><li>Base 1st submission pay: $0.03 </li></ul><ul><li>Current 1st submission pay: $0.10 </li></ul><ul><li>Base 2nd submission pay: $0.05 </li></ul><ul><li>Current 2nd submission pay: $0.10 </li></ul><ul><li>If a task is not completed within the allotted time, increase pay by $0.01 or $0.02 </li></ul><ul><li>Amazon MTurk service fee (10%) </li></ul><ul><li>Total paid: $13.418 </li></ul><ul><li>More tests pending on the time/pay trade-off </li></ul>
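The per-page economics follow from the figures on this slide. Assuming one transcription HIT and one proofreading HIT per page and a flat 10% fee on worker pay, the sketch below computes the requester's cost per page and the pay after repeated increases for an expired HIT. It uses `Decimal` so cent amounts are exact. The helper names are hypothetical, and the slide's $13.418 total mixes base and raised rates over a partial run, so the sketch does not try to reproduce it.

```python
from decimal import Decimal

FEE_RATE = Decimal("0.10")  # Amazon MTurk service fee, as given on the slide


def cost_per_page(first_pay: Decimal, second_pay: Decimal) -> Decimal:
    """Requester cost for one page: transcription HIT + proofreading HIT + fee."""
    return (first_pay + second_pay) * (1 + FEE_RATE)


def escalated_pay(start: Decimal, step: Decimal, expirations: int) -> Decimal:
    """Pay after raising it by `step` each time the HIT expires uncompleted."""
    return start + step * expirations


if __name__ == "__main__":
    pages = 72  # pages in the Frederick Douglass Diary collection
    base = cost_per_page(Decimal("0.03"), Decimal("0.05"))
    current = cost_per_page(Decimal("0.10"), Decimal("0.10"))
    print(f"base rate:    ${base} per page, ${base * pages} for {pages} pages")
    print(f"current rate: ${current} per page, ${current * pages} for {pages} pages")
    # Professional range from the OCR & Transcription slide ($2.50-$8.00 per page)
    print(f"professional: ${Decimal('2.50') * pages}-${Decimal('8.00') * pages}")
```

Even at the raised rates, two crowd passes cost $0.22 per page under these assumptions, roughly a tenth or less of the professional range.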
  19. Results: Problems <ul><li>Choppy production pace </li></ul><ul><li>Effect of increased wages: </li></ul><ul><ul><li>Quicker production </li></ul></ul><ul><ul><li>Same quality of production </li></ul></ul><ul><li>Tedious processing pace </li></ul><ul><ul><li>Google Doc transfer </li></ul></ul><ul><ul><li>Proofreading accountability </li></ul></ul><ul><li>Need a system to verify accuracy </li></ul>
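One inexpensive starting point for verifying accuracy is to measure how much the proofreading pass changed the first submission. The sketch below uses Python's `difflib.SequenceMatcher` on normalized word lists. The 0.9 threshold and the rule of sending low-agreement pages for another look are hypothetical. High agreement is not proof of accuracy, since a proofreader who changes nothing also scores 1.0.

```python
import difflib


def _words(text: str) -> list[str]:
    """Lower-case words with surrounding punctuation stripped."""
    cleaned = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in cleaned if w]


def word_agreement(first: str, second: str) -> float:
    """Similarity ratio (0.0-1.0) between two transcriptions, word by word."""
    return difflib.SequenceMatcher(None, _words(first), _words(second)).ratio()


def needs_review(first: str, second: str, threshold: float = 0.9) -> bool:
    """Hypothetical rule: flag a page when the two passes differ heavily."""
    return word_agreement(first, second) < threshold


if __name__ == "__main__":
    # Phrases taken from the two versions of Manuscript #2 shown earlier
    a = "Had two or Three Shark talks with mr Brulock and mr Trayland."
    b = "Had two or three sharp talks with Mr Brulock and Mr Wayland."
    print(round(word_agreement(a, b), 3), needs_review(a, b))
```

Comparing words rather than characters keeps the score readable: each differing word, such as "Shark" versus "sharp", counts once.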
  20. Projections, Expansions, Visions <ul><li>Functional automated program </li></ul><ul><li>Receive digital images from libraries </li></ul><ul><li>Format submitted manuscripts (Snapter) </li></ul><ul><li>Turn whole batches of manuscripts into HITs </li></ul><ul><li>If not done, increase pay at intervals </li></ul><ul><li>Automated error generator? </li></ul><ul><li>Better proofreading accountability </li></ul><ul><li>Send manuscripts back to the library </li></ul>
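One possible reading of "Automated error generator?" together with "Insert and inform users of errors" is to plant known errors in a transcription before proofreading and then check how many the proofreader fixed. The sketch below is written under that assumption. It transposes two adjacent letters in randomly chosen words and scores the returned text by word position. `seed_errors` and `share_caught` are hypothetical helpers, and the positional comparison assumes the proofreader neither adds nor deletes words.

```python
import random


def _swappable(word: str) -> list[int]:
    """Positions j where swapping word[j] and word[j + 1] changes the word."""
    return [j for j in range(len(word) - 1) if word[j] != word[j + 1]]


def seed_errors(text: str, count: int, rng: random.Random) -> tuple[str, list[int]]:
    """Transpose two adjacent letters in `count` distinct words.

    Returns the altered text (words re-joined by single spaces) and the sorted
    indices of the altered words. Raises ValueError if too few words qualify.
    """
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.isalpha() and _swappable(w)]
    chosen = sorted(rng.sample(candidates, count))
    for i in chosen:
        w = words[i]
        j = rng.choice(_swappable(w))
        words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words), chosen


def share_caught(original: str, proofread: str, seeded: list[int]) -> float:
    """Fraction of seeded words that the proofreader restored exactly.

    Compares word positions directly; a fuller check would align the
    texts first (e.g. with difflib) to tolerate inserted or deleted words.
    """
    if not seeded:
        return 1.0
    orig_words, proof_words = original.split(), proofread.split()
    restored = sum(
        1 for i in seeded if i < len(proof_words) and proof_words[i] == orig_words[i]
    )
    return restored / len(seeded)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run can be repeated
    original = "We have now been on this noble ship about eight hours"
    altered, seeded = seed_errors(original, 3, rng)
    print(altered)
    print(share_caught(original, altered, seeded))  # proofreader changed nothing
```

Returning the positions of the planted errors, rather than only the altered text, is what makes the later score possible without re-deriving which words were changed.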
  21. L’chaim (“To life!”)
