Presentation given at the Linguistic Data Consortium (LDC), University of Pennsylvania, April 2019. Based on presentations at the 6th ACM Collective Intelligence Conference, 2018 and the 6th AAAI Conference on Human Computation & Crowdsourcing (HCOMP), 2018. Blog post: https://blog.humancomputation.com/?p=9932.
BUT WHO PROTECTS THE MODERATORS?
BRANDON DANG¹, MARTIN J. RIEDL², AND MATTHEW LEASE¹
¹School of Information, ²School of Journalism (both students contributed equally)
The University of Texas at Austin
AAAI HCOMP -&- ACM Collective Intelligence
July 2018, Zurich, Switzerland
“Gold rush” for crowdsourced labels in NLP
Snow et al., EMNLP 2008
• Annotating human language for
natural language processing (NLP)
• 22,000 labels for only $26 USD
• Crowd’s consensus labels can
replace traditional expert labels
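A minimal sketch of how consensus labels might be derived, assuming simple majority voting (the slide does not say which aggregation was used). The item names and votes below are hypothetical illustrations, not data from Snow et al.; only the cost figures come from the slide.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical crowd votes for three items (illustration only).
crowd_labels = {
    "item_1": ["positive", "positive", "negative"],
    "item_2": ["negative", "negative", "negative"],
    "item_3": ["positive", "negative", "positive"],
}

# One consensus label per item.
consensus = {item: majority_vote(votes) for item, votes in crowd_labels.items()}

# Cost per label implied by the slide's figures: $26 for 22,000 labels.
cost_per_label = 26 / 22000
```

At these figures, each label costs a little over a tenth of a cent.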
Simultaneous “gold” rush across other areas
• Alonso et al., SIGIR Forum (Information Retrieval)
• Kittur et al., CHI (Human-Computer Interaction)
• Sorokin and Forsyth, CVPR (Computer Vision)
ACM Queue 2006 – Human Computation
“Software developers with innovative ideas for businesses and
technologies are constrained by the limits of artificial intelligence… If
software developers could programmatically access and incorporate
human intelligence into their applications, a whole new class of
innovative businesses and applications would be possible. This is the
goal of Amazon Mechanical Turk… people are freer to innovate
because they can now imbue software with real human intelligence.”
Soylent: A Word Processor with a Crowd Inside
• Bernstein et al., UIST 2010
But what about ethics?
• Fort, Adda, and Cohen (2011) – Gold Mine or Coal Mine?
• “…opportunities for our community to deliberately value ethics above cost savings.”
• Silberman, Irani, and Ross (2010)
• “How should we… conceptualize the role of [those] we ask to power our computing?”
• Irani and Silberman (2013)
• “…by hiding workers behind web forms and APIs… employers see themselves as
builders of innovative technologies, rather than… employers unconcerned with
working conditions… redirecting focus to the innovation of human computation
as a field of technological achievement.”
“Jeff Howe reveals that the crowd is more than
wise–it’s talented, creative, and stunningly
productive. It’s also a perfect meritocracy, where
age, gender, race, education, and job history no
longer matter; the quality of the work is all that
counts. If you can perform the service, design the
product, or solve the problem, you’ve got the job.”
Another Task: Online Content Moderation
• Many online platforms allow/encourage user generated content
• However, some types of content disallowed
• e.g., Pornography and nudity, depictions of violence, hate speech
• What is considered acceptable varies by platform and region;
often strong overlap but notable differences
• Also issues of free speech & due process in content removal & remediation
• Idea: AI detection & filtering
• Problem: Insufficient accuracy. What to do?
• Go-to solution when AI not good enough? Human Computation!
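The fallback from AI to human review described above can be sketched as a confidence-based router. This is an illustration under stated assumptions: the function name `route`, the score semantics (a classifier's estimated probability that content is disallowed), and both threshold values are hypothetical and not taken from any platform.

```python
def route(score, remove_above=0.95, allow_below=0.05):
    """Decide what happens to an item given a classifier score in [0, 1].

    score: assumed probability that the content is disallowed.
    Confident predictions are handled automatically; everything in
    between is sent to a human moderator.
    """
    if score >= remove_above:
        return "auto_remove"
    if score <= allow_below:
        return "auto_allow"
    return "human_review"


# Hypothetical scores for four uploads.
decisions = [route(s) for s in (0.99, 0.50, 0.02, 0.80)]
```

Tightening the thresholds reduces automated mistakes but sends more items, including disturbing ones, to human reviewers.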
Digital “Dirty Jobs”
• The Googler who Looked at the Worst of the Internet
• Facebook content moderation
• The dirty job of keeping Facebook clean
• Even linguistic annotators report stress &
nightmares from reading news articles
(Strauss et al., LREC 2000)
Litigation & research
• Soto & Blauert vs. Microsoft Corporation (2018)
• Two content moderators report post-traumatic
stress disorder (Ghoshal 2017) from having to watch
child pornography on the job
• Growing research awareness & interest
• Conferences and workshops, e.g., at UCLA,
Santa Clara University, USC, and
Alexander von Humboldt Institute for Internet and Society
The great irony
The sort of task we most want an algorithm to do (emotionally disturbing)
is what people are instead doing because the algorithm isn’t good enough
Assuming such work will occur regardless, how can we protect the
workers engaged in it?
How can we reveal the minimum amount of information to a human
reviewer such that an objectionable image is still correctly identified?
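One way to picture this question is progressive obfuscation: show the reviewer a heavily obscured view first and reveal more detail only if no decision can be made. The sketch below assumes blurring as the obfuscation, which the slide does not specify. The functions `box_blur` and `review`, the blur radii, and the `can_decide` callback standing in for a human reviewer are all hypothetical.

```python
def box_blur(image, radius):
    """Blur a grayscale image (list of rows of numbers) with a square mean filter."""
    if radius == 0:
        return [row[:] for row in image]
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average over the window, clipped at the image borders.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def review(image, can_decide, radii=(4, 2, 1, 0)):
    """Show views from most to least blurred until the reviewer decides.

    can_decide: callback returning a label, or None if more detail is needed.
    Returns (label, radius at which the decision was made).
    """
    for radius in radii:
        label = can_decide(box_blur(image, radius))
        if label is not None:
            return label, radius
    return None, radii[-1]


# Toy 3x3 "image" with one bright pixel in the centre.
toy = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
# Stand-in reviewer: flags once any pixel in the view is bright enough.
label, radius = review(
    toy,
    lambda view: "flag" if max(map(max, view)) >= 2 else None,
    radii=(1, 0),
)
```

The radius at which a decision is reached gives a rough measure of how much detail the reviewer needed to see.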
But Who Protects the Moderators?
• Data collection in progress…
• Concept paper: https://arxiv.org/pdf/1804.10999.pdf
• Gillespie, T. (2018). Custodians of the internet: Platforms, content moderation, and the hidden decisions that
shape social media. Yale University Press.
• Grimmelmann, J. (2015). The virtues of moderation. The Yale Journal of Law & Technology, 17(1), 42–68.
• Klonick, K. (2018). The new governors: The people, rules, and processes governing online speech. Harvard
Law Review, 131.
• Myers West, S. (2018). Censored, suspended, shadowbanned: User interpretations of content moderation
on social media platforms. New Media & Society.
• Roberts, S. T. (2014). Behind the screen: The hidden digital labor of commercial content moderation. UIUC
Dang, B.*, Riedl, M. J.* & Lease, M. (2018): Toward Safer Crowdsourced Content Moderation. 6th
ACM Collective Intelligence Conference, July 7-8, 2018, Zurich, Switzerland.
Dang, B.*, Riedl, M. J.* & Lease, M. (2018): But Who Protects the Moderators? The Case of
Crowdsourced Image Moderation. 6th AAAI Conference on Human Computation & Crowdsourcing.