Crowdsourcing Transcription with Open Source Software

Ben Brumfield's presentation on crowdsourcing transcription tools at the Midwest Archives Conference Fall Symposium 2013. It discusses factors for choosing a crowdsourcing tool, with screenshots and analysis of Scripto, the Bentham Transcription Desk, the NARA Transcribr Drupal module, and Zooniverse's Scribe, and includes live demos of the hosted tools Virtual Transcription Laboratory, Wikisource.org, and FromThePage.com.


  1. 1. Crowdsourcing Transcription with Open Source Software Ben Brumfield MAC Fall Symposium 2013
  2. 2. Why Transcribe? ● Crowdsourcing can be − Tagging − Georectification − Identification ● But if you've got scanned documents, you've got a problem
  3. 3. Serendipity: One Volunteer's Story Nat Wooding – Semi-retired data analyst – 200 pages of Julia Brumfield's 1923 diary in nine months – No relation to diarist
  4. 4. Serendipity: One Volunteer's Story Nat Wooding – Semi-retired data analyst – 200 pages of Julia Brumfield's 1923 diary in nine months – No relation to diarist – Great uncle was diarist's letter carrier, also named Nat Wooding
  5. 5. Why Crowdsource? Free Labor!
  6. 6. Why Crowdsource? Free Labor! “Free as in beer” “Free as in speech” “Free as in....
  7. 7. Free as in puppy! http://www.flickr.com/photos/magnusbrath/7614518858/
  8. 8. Why Crowdsource? “At its best, crowdsourcing is not about getting someone to do work for you, it is about offering your users the opportunity to participate in public memory.” – Trevor Owens, “Crowdsourcing Cultural Heritage: The Objectives are Upside-down”
  9. 9. Why Crowdsource? “By engaging the public in digitising our collections, we are − Increasing the scientific literacy of the public − Providing increased access to our collections − Building an advocacy network for our collections and our institutions.” – Paul Flemons, Australian Museum
  10. 10. Why Crowdsource? ● Convert website visitors into volunteers ● Convert volunteers into advocates ● What's next?
  11. 11. Questions?
  12. 12. Choosing a Transcription Platform ● The good news: – More than 30 tools to choose from!
  13. 13. Choosing a Transcription Platform ● The good news: – More than 30 tools to choose from! ● The bad news: – More than 30 tools to choose from!
  14. 14. Selection Factors ● Source Material ● Transcript Purpose ● Organizational/Project Management Fit ● Financial and Technical Resources
  15. 15. Source Material ● Is it of interest to anyone else? ● Is it under copyright? ● Does it need restricted access? ● Is it composed of “text” or “records”? ● How complex is the layout? How important is that layout?
  16. 16. Purpose • How will you be using the transcribed data? – Traditional print editions – Searchable online editions • Do you want to use the system to analyze the text? • Do you need to import the transcripts into other systems? • Is public engagement the only goal?
  17. 17. Organizational Fit • How important is traditional editorial workflow? • Will you rely on volunteers? How will you find and motivate them? • What is the duration of the project? • Is there a "final version"? • Is TEI a mandate?
  18. 18. Financial and Technical Resources • System administrators to install non-hosted software? • Money to pay hosting costs? • Programming skills to customize a tool? • Money to pay programmers for customization? • Support for on-going costs to keep the site running, however small?
  19. 19. The Tools ● Recent (oldest started in 2005) ● Influenced by origin ● Still pretty raw ● Most require tech expertise for set-up and customization ● All require making trade-offs http://tinyurl.com/TranscriptionToolGDoc
  20. 20. Open-source, On-site Tools Scripto Bentham Transcription Desk NARA Transcribr Drupal Module Zooniverse Scribe
  21. 21. Quick Definitions MediaWiki: Popular software framework for running wiki projects. Wikipedia, Wikisource, Wiktionary, Wikitravel: Projects running on MediaWiki. Wikimedia: Organization running many, but not all, MediaWiki-based wiki projects.
  22. 22. Hosted Tools Virtual Transcription Laboratory Wikisource.org FromThePage.com
  23. 23. Virtual Transcription Laboratory
  24. 24. Virtual Transcription Laboratory
  25. 25. Wikisource Live demo of State Library of Queensland on Wikisource showing project page, edit screen, and editorial workflow. Recommendation of Lori and the GLAMWiki group to help organizations navigate the community.
  26. 26. FromThePage Live demo of FromThePage showing edit screen, wiki-linking a single term, read pages for a subject, full-text search on name variants, and auto-link.
  27. 27. Thanks! Ben Brumfield benwbrum@gmail.com @benwbrum http://manuscripttranscription.blogspot.com My transcription tools: – FromThePage.com – OpenSourceIndexing.org http://tinyurl.com/TranscriptionToolGDoc
