Roots and Routes: Crowdsourced Manuscript Transcription Workshop

A three-hour workshop on crowdsourced transcription software, given for the University of Toronto's Roots and Routes seminar in 2012.

  • 1. Crowdsourced Manuscript Transcription
    Ben Brumfield
    Roots and Routes 2012
  • 2. Not just crowdsourcing...
    ● Collaborative work
    ● Off-site solo work
    ● Private work
  • 3. Not just manuscripts...
    ● Maps
    ● Textiles
    ● Music
    ● Flawed OCR
  • 4. Not just transcription...
    ● Indexing
    ● Editing
    ● Identification
    Counting seals on Arctic ice caps.
  • 5. What it isn't
    We'll concentrate on web-based tools for extracting text from images, not addressing:
    ● Oral History
    ● Video
    ● Audio Transcription
    ● Image Manipulation
    ● Transcription/Facsimile Display
    Tools exist for these tasks, nevertheless.
  • 6. Break
    What materials are you working with outside of modern, printed books and websites?
  • 7. Origins (Approaches)
    Two Approaches and one Dead End
    ● Indexing
    ● Editing
    ● Tagging
  • 8. Indexing
    ● Structured Data
    ● Extracts from Text vs. Representing Text
    ● Databases for Search and Analysis
    ● Granular Quality Control
    ● Gamification
  • 9. Editing
    ● Books, Diaries, Letters, Articles
    ● Representing Text
    ● Traditional Editorial Workflow
    ● Digital or Print Editions
  • 10. Tagging
    ● Too small
    ● Too imprecise
  • 11. Origins (Traditions)
    ● OCR Correction
    ● Documentary Editing
    ● Genealogy
    ● Natural Science
    ● Astronomy
    Split this into 5 slides
  • 12. Online Tools
    ● Recent (none older than 2005)
    ● Influenced by origin
    ● Still pretty raw
    ● Most require tech expertise for set-up and customization
    ● All require making trade-offs
  • 13. Lab Session 1: Breadth
    ● NYPL What's on the Menu: Indexing
    ● Wikisource: Editing
  • 14. Selection Factors
    ● Source Material
    ● Transcript Purpose
    ● Organizational/Project Management Fit
    ● Financial and Technical Resources
  • 15. Source Material
    Evaluating your source material:
    ● Is it of interest to anyone else?
    ● Is it under copyright?
    ● Does it need restricted access?
    ● Is it composed of documents or records?
    ● Is it non-textual?
    ● How complex is the layout? How important is that layout?
  • 16. Purpose
    How will you be using the transcribed data?
    ● Traditional print editions
    ● Searchable online editions
    ● Do you want to use the system to analyze the text?
    ● How do you want to analyze the text?
    ● Is public engagement a goal?
    ● Should the transcripts be open?
  • 17. Organizational/Project Management Fit
    ● How important is traditional editorial workflow?
    ● Will you rely on volunteers? How will you motivate them?
    ● What is the duration of the project?
    ● Is there a "final version"?
    ● Is TEI a mandate?
  • 18. Financial and Technical Resources
    Do you have or need:
    ● System administrators to install non-hosted software?
    ● Money to pay hosting costs?
    ● Programming skills to customize a tool?
    ● Money to pay programmers for customization?
    ● Support for on-going costs to keep the site running, however small?
  • 19. Lab Session 2: Markup Options
    ● FromThePage
    ● TranscribeBentham
  • 20. Technical Questions to Answer
    ● Where are the images now?
    ● How do images get into the system?
    ● How do transcripts get out of the system?
    ● How mature is the underlying technology?
    ● How configurable is the technology?
    ● How does the system work with the public face of your project?
    ● Where does the metadata live?
    ● Who will maintain this? How long?
    ● How many sites are using this system?
  • 21. Wikisource
    Pro:
    ● MediaWiki plus its add-on modules (e.g. print-on-demand, export).
    ● Wikimedia community.
    ● Incredibly mature.
    Con:
    ● Wikimedia policy.
    ● Public editing.
    ● Limited mark-up.
  • 22. Bentham Transcription Desk
    Pro:
    ● MediaWiki is very mature.
    ● TEI Toolbar (can also be used on other systems).
    ● Deployed outside original project.
    Con:
    ● Development efforts halted.
  • 23. Scripto
    Pro:
    ● Team at CHNM has a great track record.
    ● Your CMS is your public face.
    ● MediaWiki is very mature.
    ● Deployed and under active development.
    Con:
    ● Your CMS handles all metadata.
    ● Mark-up is extremely limited.
  • 24. FromThePage
    Pro:
    ● Designed for intensive editing and indexing.
    ● Semantic mark-up and analysis.
    ● Hosting available.
    Con:
    ● Single developer (me).
    ● No TEI mark-up.
  • 25. Islandora TEI Editor
    Caveat: I don't know much about this tool or this team.
    ● Based on Drupal and Fedora.
    ● Supports TEI via friendly interface.
    ● Many Drupal-based projects considering it.
  • 26. T-PEN
    Caveat: I don't know much about this tool.
    ● Designed for medieval manuscripts.
    ● Supports TEI natively.
    ● Line-by-line interface.
    ● Hosted version available.
  • 27. Scribe
    Pro:
    ● Excellent for complex layout or non-documentary transcription.
    ● Zooniverse team is large, well-funded, experienced.
    ● Configurable.
    Con:
    ● No automated tool for loading images or viewing transcript database (yet!).
    ● No concept of image-as-a-text.
  • 28. Pybossa
    Caveat: I don't know much about this tool or this team.
    ● Open Knowledge Foundation's crowdsourcing task management tool.
    ● Designed for tabular data.
    ● Google Spreadsheet data entry.
    ● Extremely young.
  • 29. TextLab
    Caveat: I don't know much about this tool or this team.
    ● Melville Electronic Library.
    ● Direct addition of TEI tags to image.
  • 30. Lab Session 3: Configuration
    ● Scribe: Old Weather, What's the Score, Development deployments
  • 31. Find me
    Ben Brumfield
    @benwbrum