Dr. Mia Ridge presented on the British Library's crowdsourcing project to catalogue over 230,000 playbills. Volunteers transcribe key information from digitised playbills, such as play titles, dates and theatres, to enhance minimal catalogue records. Over 1,600 volunteers have contributed hundreds of thousands of transcriptions. Feedback from volunteers identified errors and opportunities for improvement. Future goals include offering the platform to other researchers and addressing user-experience issues to support ongoing crowdsourced work.
1. Crowdsourcing at the British Library:
lessons learnt and future directions
Dr. Mia Ridge, digitalresearch@bl.uk
Digital Humanities Congress
Sheffield, September 2018
3. Asking the public to help with tasks that
contribute to a shared, significant goal or
research interest related to cultural heritage
collections or knowledge.
The activities and/or goals should be inherently
rewarding.
Crowdsourcing in cultural heritage
6. Potentially huge reach and impact
https://www.flickr.com/photos/jdevaunphotography/8456110245/ by Jason Devaun
Hundreds of millions of contributions
Tens of millions of online volunteers
Hundreds of completed projects
9. Playbills 'In the Spotlight'
Collection of over 230,000 printed sheets in 1,000
volumes
Minimal cataloguing: 'A collection of playbills from
miscellaneous Plymouth theatres 1796-1882'
No information about individual playbills,
performances, people
10. =005 20180813120353.0
=006 md
=007 cr|||||||||||
=008 180813r20171818enk|||||s|0||0|eng
=033 0$a18181211$pTheatre Royal (Bath, England)
=040 $aUk$beng$cUk$erda
=042 $aukblproject
=110 2$aTheatre Royal (Bath, England)$eauthor
=245 11$a[Playbill. At Theatre Royal, Bath]
=264 30$a[London] :$b[British Library Playbills Project],$c2017.
=300 $a1 online resource
=336 $atext$2rdacontent
=337 $acomputer$2rdamedia
=338 $aonline resource$2rdacarrier
=500 $aTitle devised by cataloguing agency.
=518 $aPerformance date: 11th December 1818.
=530 $aAlso available in print.
=534 $pReproduction of (manifestation):$aTheatre Royal (Bath, England)$t[Playbill. At Theatre Royal, Bath]$c1818.
=650 0$aTheater$zGreat Britain$y19th Century.
=655 0$aPlaybills (Posters)$zEngland$zBath$y1818.
=655 7$aPlaybills (Posters)$2fast$0(OCoLC)fst01919953
=710 2$aBritish Library Playbills Project,$emanufacturer.
=773 0$aTheatre Royal (Bath, England)$tA collection of playbills from Theatre Royal, Bath 1812-1818.$oDigital Store Playbills 178.$w(Uk)016661285
=856 40$uhttps://api.bl.uk/metadata/iiif/ark:/81055/vdc_100022589024.0x00015f$ydigitised sheet
=916 $a110 not NACO
=916 $a710/1 not NACO
=SRC $aPlaybills Project
Preparing the ground
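The record above is written in MarcEdit-style text notation: one field per line, beginning `=TAG`, then the indicators, then `$`-delimited subfields. For real binary MARC the usual Python tool is pymarc; purely as a self-contained sketch of the text layout shown above (it skips wrapped continuation lines for simplicity):

```python
# Minimal parser for MarcEdit-style text records ("=TAG inds$a...$b...").
# A sketch only; use pymarc for production work with MARC data.
def parse_mrk(text):
    fields = []
    for line in text.strip().splitlines():
        if not line.startswith("="):
            continue  # skip wrapped continuation lines and blanks
        tag, _, rest = line[1:].partition(" ")
        chunks = rest.split("$")           # subfields are "$"-delimited
        subfields = [(c[0], c[1:]) for c in chunks[1:] if c]
        fields.append((tag, chunks[0].strip(), subfields))
    return fields

def first_subfield(fields, tag, code):
    """Return the first $code value of the first occurrence of tag."""
    for t, _, subfields in fields:
        if t == tag:
            for c, value in subfields:
                if c == code:
                    return value
    return None

sample = (
    "=245 11$a[Playbill. At Theatre Royal, Bath]\n"
    "=518 $aPerformance date: 11th December 1818.\n"
)
fields = parse_mrk(sample)
print(first_subfield(fields, "245", "a"))  # [Playbill. At Theatre Royal, Bath]
```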
17. Learning from comments
• At the head of page is the title of the main work performed - "The
Haunted Tower." The boxed title is for the ballet "Don Juan" which
follows "The Haunted Tower."
• "Merchant of Venice" followed by "Lovers' Quarrels."
• This feels as though it should be 'The POOR SOLDIER' rather than
just 'POOR SOLDIER'
• 'The' should also be highlighted as part of this title
• Title of musical farce not outlined, so cannot transcribe it.
• This play is not the same date as the main plays of the bill.
• This play is also not the same date as the main plays of the bill.
• The Death of Gen. Wolfe is a ballet, not a play.
• Is this a reference to the 'Flitch of Bacon custom' in Essex?
• forthcoming, not main item on playbill
• not sure if 'The Tragedy of' is the genre or part of the title
21. Future goals: platform for other
people's research?
Could we let academics and volunteers set up
new tasks based on our collections?
How would we ensure that they were
committed to recruiting and motivating
volunteers, and reporting on progress?
https://www.flickr.com/photos/internetarchivebookimages/14586824519/
22. Future work
Fix 'hygiene' issues with user experience (UX):
• Review and tidy documentation / help screens and posts
that have drifted out of sync
• Improve tagging / viewing tags experience
• Better access to data for researchers; easier progress
reports (update Jupyter notebooks)
New tasks? E.g. confirm theatre names, locations
Rethink 'performances' - not just plays?
Fun uses of data - #OnThisDay tweets?
Provide practice tasks with feedback on how you did
Analyse and use survey data
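The '#OnThisDay tweets' idea above amounts to a filter over performance dates. A sketch, with illustrative records and field names (only the 11 December 1818 Bath performance comes from the sample record earlier; the rest is made up):

```python
from datetime import date

# Hypothetical records derived from transcribed playbill data;
# field names are illustrative, not the project's export schema.
performances = [
    {"title": "The Haunted Tower", "theatre": "Theatre Royal, Bath",
     "date": date(1818, 12, 11)},
    {"title": "Merchant of Venice", "theatre": "Theatre Royal, Margate",
     "date": date(1796, 12, 11)},
]

def on_this_day(records, today):
    """Format #OnThisDay posts for performances sharing today's month/day."""
    return [
        f"#OnThisDay in {r['date'].year}: {r['title']} at {r['theatre']}"
        for r in records
        if (r["date"].month, r["date"].day) == (today.month, today.day)
    ]

for post in on_this_day(performances, date(2018, 12, 11)):
    print(post)
```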
23. Thank you for listening.
http://playbills.libcrowds.com/
http://britishlibrary.typepad.co.uk/digital-scholarship/
Thanks to: Alex Mendes, Christian Algar, Beatrice Ashton-Lelliott and our 1600+ volunteers
Editor's Notes
This is the newspaper storage building - the collections we work with are huge.
My definition is partly descriptive, and partly prescriptive (what it should be, as well as what it is). The benefit should be wider than your institution e.g. improving catalogue data helps any user of the catalogue as well as the institution.
No financial rewards so has to be rewarding. Often task is quite enjoyable, and people are motivated by knowing their contribution helps make the world a better place.
'Online volunteering' is a good way of thinking about crowdsourcing in cultural heritage. Contributors are looking for a meaningful leisure activity - some just want casual activities they can pick up whenever suits them, others want an opportunity to develop deeper skills and interests. The opportunity to socialise with other people with similar interests can turn into a strong motivation for continuing for some volunteers.
If you've worked with in-person volunteer or community programmes, you already have a lot of the skills needed to run a good crowdsourcing project.
Digital tech offers serious advantages over in-person volunteering programmes. They are not tied to venue opening hours or location; not limited by conservation or handling issues once material is digitised. Allows you to reach thousands of people, or just a few interested specialists who might be located anywhere in the world. Convenience for volunteers means they can fit it in around their lives. A few minutes here and there adds up, means people can take up hobbies sooner (where previously they might have waited until retirement).
For the BL, specifically, it also aligns with our mission to make intellectual heritage available for research, inspiration and enjoyment. Because of the scale of BL collections, and the size of the uncatalogued and undigitised backlog, our goal is to help make collection items easier to find (rather than focusing on a specific research question). When working at this scale, have to focus on the basics first.
The case study... These playbills were always an obvious candidate as they had been digitised as single sheets but were only catalogued as volumes. Aiming to improve discoverability by making information about individual sheets findable in the catalogue. In future would be interested in working with researchers on more specific research questions.
The basics: Problem - There are almost a quarter of a million (230,000) printed sheets bound into 1,000 volumes. Existing catalogue records provide minimal details and do not expand beyond naming a location (town), the year(s) covered, and sometimes the name of a particular theatre.
No details important to researchers: no titles of plays or performances; no names of actors or dramatis personae; no dates or details of songs performed.
Varied formats, not suitable for OCR or computational processing into structured data. Crowdsourcing some structured text seemed like the most realistic way of enhancing records and aiding discoverability.
We talked to the Metadata Services team right at the start of the project, months before we had a prototype. We designed tasks to create the data that was most needed first. This work has been part of a push to create more granular metadata - single sheets, not just entire volumes - as the emphasis shifts from things that can be physically ordered to the reading room to single pages, or regions of pages.
I designed series of small tasks, rather than asking people to switch between different tasks on the one page. Currently four tasks per volume. Some tasks relate to the whole page, others to specific regions (information about specific performances). As titles are marked out (drawn around) they can be transcribed; related genres can then also be transcribed. Dates are done per sheet so can be done at any point. There's some manual management of volumes behind the scenes - reports to see what's been completed recently and could therefore move to the next stage.
We're trying to have it both ways - we're hoping people will complete lots of transcription tasks, but we're also hoping they'll get distracted and want to go off and learn more about something they've noticed.
We've reduced as many barriers to participation as we can (with the resources we have), and worked to make tasks as small and easy as we can (more could always be done) but we're also trying to encourage lots of discussion. People can download the data as it's created so they don't have to wait for access to something they're interested in.
Built in ways to do more with the playbills - you can download the specific image, view metadata from the catalogue record, and 'share' - will have link to forum thread 'spotted on in the spotlight'.
Uses IIIF images directly from the library viewer system - saved a lot on storage costs. Metadata drawn from viewer system so always up-to-date.
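The IIIF Image API addresses images with a fixed URL pattern: {base}/{region}/{size}/{rotation}/{quality}.{format}. A minimal sketch of building such a request, reusing the identifier from the =856 field in the sample record; whether that exact BL endpoint serves the Image API directly (rather than a manifest) is an assumption here:

```python
def iiif_image_url(base, region="full", size="full", rotation=0,
                   quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL:
    {base}/{region}/{size}/{rotation}/{quality}.{format}"""
    return f"{base}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Identifier taken from the =856 field in the sample MARC record above.
base = "https://api.bl.uk/metadata/iiif/ark:/81055/vdc_100022589024.0x00015f"
# "!512,512" asks the server for a copy scaled to fit within 512x512.
print(iiif_image_url(base, size="!512,512"))
```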
We're still working on the MARC import but we've got the play titles appearing in the Index tab of our Universal Viewer. We've also documented some ways to access the data via Jupyter Notebooks, which have embedded Python code that can be tweaked and run according to your interests.
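As an illustration of the kind of notebook analysis mentioned here, a sketch that counts contributions per task in a task-run CSV; the column names (task_id, info) are assumptions for the example, not the actual schema of the project's Pybossa exports:

```python
import csv
import io
from collections import Counter

# In-memory stand-in for a task-run export such as
# transcribe_titles_..._task_run.csv; columns are assumed.
sample_csv = io.StringIO(
    "task_id,info\n"
    "1,The Haunted Tower\n"
    "1,The Haunted Tower\n"
    "2,Don Juan\n"
)

# How many volunteer contributions does each task have so far?
contributions = Counter(row["task_id"] for row in csv.DictReader(sample_csv))
print(contributions.most_common())  # [('1', 2), ('2', 1)]
```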
With academics and staff of Nottingham's Theatre Royal we organised afternoon workshops with volunteers working on theatre archives.
One of our questions was whether they'd be interested in transcribing or using personal names for people mentioned on playbills (including actors, writers, printers, patrons, designers and more). We also ran informal usability tests, asking for first impressions of the site and related tasks. The workshop confirmed our suspicion that the value some people might get from personal names wouldn't justify the work required to record them, so we felt comfortable deciding not to add this task. We also benefited from feedback on the usability of specific aspects of the interface.
Nottingham workshop
Within each hour-long session, topics for discussion included:
• About the volunteers: their interests, historical knowledge, projects. (Relatedly, do they use personal names in their current research, and if so, how? What other sources do they use, particularly for biographical data?)
• Introducing In the Spotlight; first impressions of the site
• What information about personal names can be usefully collected from the digitised playbills? Is it useful to include roles, character names, the relevant work or performance date?
• What factors should we think about when designing the names task interface?
The Europeana Impact Model is a toolkit for assessing the impact of your project under various headings. Thinking about what we wanted to report on helped us build ways of measuring impact into the project.
Based on the Balanced Value Impact Model developed by Professor Simon Tanner, King’s College London; Europeana took the theoretical model and made it into an actionable playbook. Draws on agile, lean start-up, design thinking. Uses 'lenses' for measuring value from a specific perspective: utility, learning, community, legacy, existence and uses headings of economic, innovation, social and operational impact.
We included a comment box so that people could leave a note for us as they worked on sheets. This became really useful when we changed the system so we got the comments as emails and didn't have to go looking for them. Comments like this helped us design the 'report an error' function and work out where our 'how to' documentation needed fixing. It also got us thinking about whether we should expand the definition of a 'performance' from plays to other kinds of performances (which would mean adding a second 'categorise this title' task).
transcribe_titles_theatre_royal_margate_1796-1797_task_run.csv
Aiming for a virtuous circle - people tell us about things they've found or noticed, questions they have - we then share those - allows us to talk about the project without being self-promoting. This is important because the attention we pay to the project reaps rewards. Social media activity helps remind people that you exist so they come back and do a bit more.
When we launched a few people asked if participants could keep an eye out for various things. This indicated that people wanted to be able to note and access specific topics related to their personal interests. We translated that need into a 'tagging' system. Built a tag interface to support this - all tags can be viewed from a central page. UX needs work but it's ok as a first pass.
Additional research questions like 'for the benefit of', patronage, song titles etc. - we now have a platform for creating tasks for people, but the onus is on the person with the question to motivate participation. We don't want academics saying 'sure, that'd be useful' but then not taking part in comms.
Prompt with questions like 'How much time can you commit to spending on discussion, outreach and promotion each week?' We recommend 90 minutes over a week: reviewing comments, compiling progress reports, finding people to answer questions, posting updates to social media, newsletters and other reports.
Created... In the Spotlight (launching Nov 2, ok to tweet now). Based on Pybossa software, using IIIF images provided by the library's digital library system.