Crowdsourcing projects have generated millions of data points through volunteer contributions of classifications, tags and other information about cultural heritage and scientific collections. However, to what extent have crowdsourcing and citizen science projects democratised knowledge about the past within 'official' collections and knowledge management systems? And how would infrastructures and policies in cultural heritage organisations need to change to allow deeper integration with knowledge captured through citizen science projects?
Infrastructural Tensions: Infrastructure, Implementation, Policies
The event is a collaboration between Digital Humanities Uppsala, Uppsala University Library, the Department of Archives, Museums and Libraries (ALM), and Uppsala Forum on Democracy, Peace and Justice.
Hopes, dreams and reality: crowdsourcing and the democratisation of knowledge in cultural heritage
1. Hopes, dreams and reality: crowdsourcing and the democratisation of knowledge in cultural heritage
Dr Mia Ridge @mia_out
Infrastructural Tensions
Uppsala, August 2019
2. Overview
• What is crowdsourcing in cultural heritage?
• Why is it needed?
• How do you define 'success'?
• Key examples
• What infrastructure underlies crowdsourcing in cultural heritage?
• To what extent have projects democratised knowledge about the past?
• What’s next?
3. Asking the public to help with tasks that contribute to a shared, significant goal or research interest related to cultural heritage collections or knowledge.
As unpaid work, the activities and/or goals should be inherently rewarding.
Crowdsourcing in cultural heritage
https://www.flickr.com/photos/nlireland/5786204856
32. Collections management systems
• Focused on management, discovery
• Use GLAM standards
• Access only for trained cataloguers, subject experts
• Information on ‘deliverable unit’
• Uses authorities, controlled vocabularies
• Compulsory fields enforced
Crowdsourcing platforms
• Focused on learning, engagement, discovery
• New forms of data
• Participant expertise unknown and variable
• Information about page, region
• Folksonomies, free-form tags
• Quality control varies by task type
• Optional processes -> uneven data, hard to analyse
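The contrast between the two systems above can be sketched as two data models. This is a purely illustrative sketch in Python: the class names, fields and the toy controlled vocabulary are invented for the example, not drawn from any real collections management system or crowdsourcing platform.

```python
from dataclasses import dataclass, field

# Toy controlled vocabulary standing in for a real authority list.
SUBJECT_AUTHORITY = {"maps", "playbills", "manuscripts"}

@dataclass
class CatalogueRecord:
    """Collections management system: compulsory fields, authority control."""
    identifier: str  # the 'deliverable unit', e.g. a shelfmark
    title: str
    subject: str     # must come from the controlled vocabulary

    def __post_init__(self):
        # Compulsory fields are enforced at creation time.
        if not self.identifier or not self.title:
            raise ValueError("compulsory fields are enforced")
        if self.subject not in SUBJECT_AUTHORITY:
            raise ValueError(f"{self.subject!r} not in authority list")

@dataclass
class Contribution:
    """Crowdsourcing platform: free-form, optional, page- or region-level."""
    target: str                                # a page or region, not a whole object
    tags: list = field(default_factory=list)   # folksonomy: anything goes
    transcription: str = ""                    # optional process -> uneven data
    volunteer_expertise: str = "unknown"       # unknown and variable
```

Everything in the catalogue record is validated up front; everything in the contribution is optional and unconstrained, which is exactly why the two are hard to merge later.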
33. Prefers
• ‘Type what you see’ transcriptions, tags and other easily verifiable information
• Object-focused data that can fit into existing catalogue fields
Struggles with
• Resources for manually validating contributions
• Finding somewhere to store, manage and preserve personal or unverifiable stories and metadata
• Implementing changes to working practices across the organisation
Cultural heritage infrastructure
34. To what extent has crowdsourcing democratised knowledge about the past?
35. ‘Democratising knowledge’?
• More voices, different kinds of knowledge and expertise?
• I designed / used / remember that thing
• I have relevant family / local history information to share
• I’m a self-taught expert / hobbyist with knowledge about the object
• I’m a credentialed expert (but don’t work for a GLAM)
• Knowledge contributed included in collections systems?
• The ability to decide what to work on?
• The ability to devise new tasks or projects?
36. Crowdsourcing is mostly contributory
• Contributory projects, which are generally designed by scientists and for which members of the public primarily contribute data
• Collaborative projects, which are generally designed by scientists and for which members of the public contribute data but also may help to refine project design, analyze data, or disseminate findings
• Co-created projects, which are designed by scientists and members of the public working together and for which at least some of the public participants are actively involved in most or all steps of the scientific process.
Bonney et al. (2009). Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education. A CAISE Inquiry Group Report.
37. Task complexity vs audience size
https://www.flickr.com/photos/library_of_congress/2162650585
(Chart axes: Tedium, Complexity, Participation)
39. Cultural heritage organisations aren’t the only sites of knowledge
Wikipedia logo: Version 1 by Nohat (concept by Paullusmagnus), via Wikimedia Commons, CC BY-SA 3.0.
https://en.wikipedia.org/w/index.php?curid=33285413
47. …for emerging sources of metadata
https://www.flickr.com/photos/21133841@N03/4108723484 by KollageK
48. Opportunities to shift GLAM infrastructure from ‘catalogue’ to ‘lake’ and provide platforms for collaborative work?
https://www.flickr.com/photos/missouristatearchives/11653956994
49. Thank you for listening.
Thoughts?
Dr. Mia Ridge, @mia_out
Digital Curator, British Library
digitalresearch@bl.uk @BL_DigiSchol
Infrastructural Tensions, Uppsala, August 2019
Editor's Notes
'the purpose of this two-day workshop is to explore the aesthetic, socio-political, and organizational dimensions of digitization of historical texts and objects, targeting precisely questions of open citizen science and the democratization of knowledge about the past'
My definition is partly descriptive, and partly prescriptive (what it should be, as well as what it is). The benefit should be wider than your institution, e.g. improving catalogue data helps any user of the catalogue as well as the institution.
No financial rewards so has to be rewarding. Often task is quite enjoyable, and people are motivated by knowing their contribution helps make the world a better place.
'Online volunteering' is a good way of thinking about crowdsourcing in cultural heritage. Contributors are looking for a meaningful leisure activity - some just want casual activities they can pick up whenever suits them, others want an opportunity to develop deeper skills and interests. The opportunity to socialise with other people with similar interests can turn into a strong motivation for continuing for some volunteers.
If you've worked with in-person volunteer or community programmes, you already have a lot of the skills needed to run a good crowdsourcing project.
Digital tech offers serious advantages over in-person volunteering programmes. They are not tied to venue opening hours or location; not limited by conservation or handling issues once material is digitised. Allows you to reach thousands of people, or just a few interested specialists who might be located anywhere in the world. Convenience for volunteers means they can fit it in around their lives. A few minutes here and there adds up, means people can take up hobbies sooner (where previously they might have waited until retirement).
The long tradition of volunteering in museums and cultural heritage encompasses both citizen science and citizen history. The OED is one famous example (though in today's terms we might say they started out with nichesourcing and moved to crowdsourcing).
So crowdsourcing as we know it has been transformed by technology, but not created by it. Networked technology's addition of automatic validation, the speed of data gathering and feedback, and the ability to reach both broad and niche groups through loose networks have all been particularly important. For museums, technology has also helped manage the sheer physical issue of providing access to a museum's collection without space or conservation limitations.
There are many examples of natural history collecting and observation. In 1849 the Smithsonian meteorological observation project began with 150 volunteers; within ten years it had more than 600 volunteer observers, including people in Canada, Mexico, Latin America, and the Caribbean.
19th Century natural history collecting
1849 Smithsonian weather observation project
1857, 1879 Oxford English Dictionary appeals
WWII Soldiers given a Field Collector's Manual in Natural History by the US Museum of Natural History
Sources:
Bruno, Elena. 2011. 'Smithsonian Crowdsourcing Since 1849!' Smithsonian Institution Archives. April 14. http://siarchives.si.edu/blog/smithsonian-crowdsourcing-1849.
Millikan, Frank Rives. 2012. 'Joseph Henry: Father of Weather Service'. The Joseph Henry Papers Project, Smithsonian Institution Archives. Accessed October 28. http://siarchives.si.edu/history/jhp/joseph03.htm.
GLAMs are galleries, libraries, archives, museums.
Manually enhancing collections records is expensive and time-consuming. Very few orgs have the resources for straight digitisation.
Even when metadata or information records are created by professional cataloguers, the content is often designed for internal or specialist users and doesn't use the everyday language our audiences might use to find material.
There's a lot of specialist expertise outside the museum. There's an online community for almost every topic or type of item under the sun.
Well-designed projects can help people discover new interests, communities, or just encourage them to have a brief moment of deeper engagement with cultural heritage.
https://www.flickr.com/photos/george_eastman_house/2987740474/
George Eastman Museum St. Marks Place
Participants might be learning by following their own interests using heritage material, through discussion with others and with heritage staff, or just through spending lots of time developing familiarity with material.
https://www.flickr.com/photos/wwplarchives/4454748238/
Woodrow Wilson Presidential Library Archives
Bryn Mawr Faculty and Students, Bryn Mawr's class of 1886.
Some headline figures re successful projects...
Can define success by the number of tasks completed. To achieve this, must also have succeeded in letting people know about your project and providing an interface that lets enough of them get started on their first tasks.
Over 320 million lines of text corrected in Australian newspapers
Every time I talk about this site I have to go check their stats because they go up all the time. The task design is satisfying enough that some people go to the site just to transcribe. A few people spent a lot of time doing the task - a pattern we'll see in nearly every project.
You can also look at the number or type of people engaged in the tasks. This is often important for organisations whose mission is to reach the public, whether that's to give them an experience of contemporary science or access to their history through specific collections. By 2014 Zooniverse projects had reached well over a million volunteers worldwide, with nearly 1.7 million registered volunteers by October 2018; FamilySearch Indexing reached 1.2 million volunteers in 10 years. Scientific projects might also look at the impact of publications on social media and in journals that result from their projects. Museums might look at the number of researchers who find their digitised collections.
Finally, you can look at the number of people who are deeply engaged - people whose feelings or knowledge about the material or the underlying disciplines change to the extent that they change some aspect of their behaviour. In this eg, people came for the herbarium specimens and got caught up in biographical interest of the original specimen collectors.
http://herbariaunited.org/wiki/Harry_Corbyn_Levinge or http://herbariaunited.org/wiki/Augustin_Ley
This is a new one, but it's been percolating for a while. If the data created isn't a) preserved and b) available for use (and re-use) as intended, has the project succeeded? The last mile is the hardest. Granularity is one issue; data validation is another.
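On the data validation point: one widely used quality-control pattern for 'type what you see' tasks (described generically here, not as any specific project's implementation) is to gather several independent transcriptions of the same item and only accept a reading once enough volunteers agree. A minimal sketch, with an assumed agreement threshold of 60%:

```python
from collections import Counter

def aggregate(transcriptions, threshold=0.6):
    """Accept a transcription only if enough volunteers agree on it.

    Returns (value, confidence), with value=None when no reading reaches
    the agreement threshold -- those items go back for manual review.
    """
    counts = Counter(t.strip().lower() for t in transcriptions)
    value, votes = counts.most_common(1)[0]
    confidence = votes / len(transcriptions)
    return (value if confidence >= threshold else None, confidence)

# Three of four volunteers agree: accepted with 0.75 confidence.
print(aggregate(["Uppsala", "uppsala", "Upsala", "Uppsala"]))
```

The trade-off is visible even in this toy version: automatic agreement works well for easily verifiable readings, but disagreement on difficult material still falls back to the manual review backlog described above.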
Getting georeferencer data into the catalogue took years! (Unlike Trove, where changes show up straight away.) Another key difference is that the corrected data is part of the infrastructure of Trove, whereas georeferencer data is bolted onto the Library's catalogue. So why is this so hard?
Classification
Flickr and Flickr Commons
Tags - describe what you see; Comment - share what you know
'the largest genealogy organization in the world' Introduced 2005. By November 2009, 160,000 online volunteers, 334 million individual names transcribed [more recent figures would be good] 'more than 10 million hits per day and more than one billion names in its database' Shows power of tapping into work that people are already highly motivated to do (and of religion, as it's The Church of Jesus Christ of Latter-day Saints)
Complex task - marking up transcriptions in XML - on difficult source material. Has a small number of very productive super-taggers… manual validation creates a backlog, and the delay in approving content reduces feelings of reward. Posts to the blog about progress help make up for it. Media coverage helped - each round drew in a few contributors.
New forms of collecting…
Richard said in an earlier workshop that they had moderation because they had to check for copyrighted material. An example of a) collecting new material and b) non-text, image material.
Getting into more specialised requests... Museum of Design in Plastics trying to find information about designers, methods, manufacturers for specific objects. Requires research or specialist knowledge. Not a microtask...
Not only the catalogue, but all it represents – processes ensuring discoverability, circulation, conservation, etc.
Trained cataloguers can be paid or volunteers
But that’s partly because… Unless you can draw on a really strong motivation (including a great community), you need to work hard to reduce task complexity and tedium
Might be more room for in-person or community-led projects to democratise knowledge. Digitisation as an enabler of all kinds of work.
Zooniverse originally created forums to help answer questions when busy after launch, but they turned out to provide a place for wonderful discussion. Here volunteers have compiled detailed guides to understanding handwriting on particular ships.
Reserve time to pay attention!
I suppose we're trying to have it both ways - we've reduced as many barriers to participation as we can (with the resources we have), and worked to make tasks as small and easy as we can (more could always be done), but we're also trying to encourage lots of discussion. People can download the data as it's created - it might take a while to work through our systems, so we wanted to provide instant access.
Built in ways to do more with the playbills - you can download the specific image, view metadata from the catalogue record, and 'share', which will have a link to a forum thread, 'spotted on In the Spotlight'.
Needs to allow for things like probabilistic classifications. Don’t verify, just store and (re)use – think outside the CMS. Everybody needs a GOD?
Had been thinking about data lakes as a way of dealing with the complexities of data structures for different uses, but the need for shared infrastructure is actually more profound.
Integrating results of DS into discovery systems means those systems need to change: more ‘data lake’ than MARC? Treating crowdsourcing / machine learning output as additional information. New data storage paradigms, new UI/UX ideas.
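The 'don't verify, just store and (re)use' and 'data lake' ideas in these notes could be sketched as an append-only log of classifications, from which catalogue views are derived later rather than written in as single verified values. All names, field shapes and numbers below are invented for illustration; this is one possible reading of the idea, not a description of any existing system.

```python
# Keep every classification (volunteer or machine, each with a probability)
# as an append-only event log; derive views from it later instead of
# verifying a single 'correct' value up front.

events = []  # in a real system: files in object storage, not a Python list

def record(item_id, source, label, probability):
    """Append a classification without overwriting or verifying anything."""
    events.append({"item": item_id, "source": source,
                   "label": label, "p": probability})

record("map-042", "volunteer:anna", "railway map", 1.0)
record("map-042", "ml-model-v2", "railway map", 0.87)
record("map-042", "volunteer:ben", "canal map", 1.0)

def best_labels(item_id):
    """Derive one possible view: total evidence per label across sources."""
    scores = {}
    for e in events:
        if e["item"] == item_id:
            scores[e["label"]] = scores.get(e["label"], 0.0) + e["p"]
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(best_labels("map-042"))
```

Because nothing is discarded, a discovery system can later re-derive views with different rules (weight machine output down, trust super-taggers more) without re-running the crowdsourcing project - which is the practical appeal of the lake over the catalogue.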