Presentation by Ines Byrne, National Library of Scotland. Invited talk at a workshop for 'Scotland's National Collections and the Digital Humanities,' a knowledge-exchange project hosted at the University of Edinburgh. 12 September 2014. http://www.blogs.hss.ed.ac.uk/archives-now/
Transcribe NLS: Crowdsourcing at the National Library of Scotland
1. Transcribe NLS
Crowdsourcing at the National Library of Scotland
Scotland's National Collections and the Digital Humanities
Workshop 3: Research and/as Engagement
12 September 2014
Ines Byrne, Digital Collections Specialist, National Library of Scotland
i.byrne@nls.uk
2. Transcribe NLS
Question:
What is crowdsourcing?
Answer:
More than just free labour!
Actively engaging the public in our
collection development
Work we could never
resource ourselves
3. Transcribe NLS
Plans at NLS:
manuscript transcriptions - Transcribe NLS
printed text (OCR) corrections
indexing/tagging
4. crowdsourcing transcriptions
around the world
National Archives USA http://www.archives.gov/citizen-archivist/transcribe/
University of Iowa http://diyhistory.lib.uiowa.edu/
National Archives of Australia http://transcribe.naa.gov.au/
The Smithsonian https://transcription.si.edu/
Transcribe Bentham http://www.transcribe-bentham.da.ulcc.ac.uk
ScotlandsPlaces http://www.scotlandsplaces.gov.uk/transcribe
5. What can we learn from them?
crowd activity level
only 21% of registered transcribers produce transcriptions
70% of transcriptions come from 3% of active transcribers
on average, one transcriber contributes 2 working hours per week
on average, one transcriber works on 6 pages per hour
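The averages above can be combined into a rough capacity estimate. The following is a minimal sketch only, not a planning model from the talk: the function name and the example user counts are hypothetical, and it assumes that the per-transcriber averages apply to active transcribers only.

```python
# Rough capacity estimate from the averages quoted on this slide.
ACTIVE_SHARE = 0.21   # share of registered transcribers who produce transcriptions
HOURS_PER_WEEK = 2    # average working hours per transcriber per week
PAGES_PER_HOUR = 6    # average pages per transcriber per hour


def estimated_pages_per_week(registered: int) -> float:
    """Estimate weekly page output for a given number of registered users.

    Assumption (not stated on the slides): the hours-per-week and
    pages-per-hour averages apply to active transcribers only.
    """
    active = registered * ACTIVE_SHARE
    return active * HOURS_PER_WEEK * PAGES_PER_HOUR


if __name__ == "__main__":
    # Hypothetical registration counts, chosen only for illustration.
    for registered in (100, 1000):
        pages = estimated_pages_per_week(registered)
        print(f"{registered} registered -> about {round(pages)} pages per week")
```

Under these assumptions, 1,000 registered users would yield about 1000 × 0.21 × 2 × 6 = 2,520 pages per week. Because 70% of transcriptions come from 3% of active transcribers, the real output depends heavily on a handful of people, so an average-based figure like this is best read as an order of magnitude.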
6. What can we learn from them?
what the transcriber wants
guidelines on how to use the tool
be able to flag up issues with more experienced transcribers
personal activity logs to keep track of their activity and history
7. What can we learn from them?
What is off-putting?
extensive and complicated instructions
hard-to-read handwriting
technical issues with the tool
complexity of mark-up encoding
8. What can we learn from them?
how to keep the crowd motivated
• feeling trusted and respected
• recognition – “show us how we fit into the big picture”
• the outcome – “tell us what your aim is – we want to help achieve it”
• clear instructions – “we will work better if we know what you want from us”
• ability to communicate with other transcribers
• constantly adding more material to be transcribed – “keep us busy”
9. What can we learn from them?
lessons learned
• reply to enquiries promptly or else you lose your transcribers
• the public have more spare time than you can imagine
• moderation raises quality but is time-consuming (Bentham’s 2 moderators could have produced 2.5 times more transcriptions than the crowd did)
• one system doesn’t fit all content → adapt your tool to fit your material
• majority of transcribers are not very experienced
• invest in training videos
• go live quietly to be able to deal with all the issues arising
• no transcriber felt exploited
10. What is it that we want to build?
Easy to use → no complex mark-up encoding
No resources for moderation → self-moderation (loss of quality control)
High-level control → registration required
12. Transcribe NLS
Materials for transcription:
Marjory Fleming diary
mountaineering diaries and notebooks
recipe books
genealogy materials
16. Thank you
Ines Byrne
National Library of Scotland
i.byrne@nls.uk
Editor's Notes
How do we incorporate feedback:
personal activity log → registration allows for “my archive”
flag up issues with other transcribers → forum
guidelines → will have link to “how to use the tool” guide
training videos → will have link to video with voice-over
off-putting: complex instructions → guidelines are short and free from “jargon”
off-putting: technical issues → “contact us”
reply to enquiries → “contact us”
feeling trusted → self-correcting within the crowd
“How do I fit into the bigger picture” → say what our aim is
How do we incorporate feedback:
constantly add more materials → drip-feeding
transcribers may not be experienced → select easy materials (no complex structure, no messy layout, easy-to-read hand)
How do we incorporate feedback:
many are not very experienced → intuitive tool
link to “how to use the tool” guidelines
documents are easily legible
off-putting: complexity of mark-up → plain text editor (no TEI)
special mark-up via blue buttons (e.g. strike-through, illegible, marginal text)
ability to communicate with others → link to forum
Other features:
image and transcription box viewable side by side or below each other
includes version history
transcriptions are visible to everyone, editable after log-in
saving as “finished” or “in progress” → you can come back any time
saved transcriptions are immediately available via online search engines and soon after in our Digital Gallery
Integrated into existing NLS systems
many projects build from scratch (e.g. via open source MediaWiki)
NLS built in-house to allow for system integration
Downsides
bound to existing look & feel
old-fashioned
boring
bound to existing functionality and layout
not extremely intuitive (e.g. facets on left)
navigation to next page in tool
display of metadata (how we can query the database)
bound to existing terminology
mainly in tool
duplication of materials in Digital Gallery
delay in transcriptions being visible in Digital Gallery → discrepancy between tool and Gallery
Still great?
Yes
Huge potential to move accessibility to our collections forward
Huge potential to engage the public in our collections and collection-building