The Rise of Crowd Computing
School of Information, University of Texas at Austin
@mattlease · firstname.lastname@example.org
What’s an Information School?
“The place where people & technology meet”
~ Wobbrock et al., 2009
“iSchools” now exist at 65 universities around the world
• Motivation from Artificial Intelligence (AI)
– Need for Plentiful Labeled Data
– Need for Capabilities Beyond What AI Can Deliver
• The Rise of Crowd Computing
– 1st Wave: Crowd-based data labeling
• Mechanical Turk & Beyond
– 2nd Wave: Crowd-based Human Computation
• Delivering beyond state-of-the-art AI applications today
• Open Problems
AI effectiveness is often limited by training data size
Problem: creating labeled data is expensive!
Banko and Brill (2001)
Motivation 2: What do we do when
state-of-the-art AI isn’t good enough?
• Jeff Howe. Wired, June 2006.
• Take a job traditionally
performed by a known agent
(often an employee)
• Outsource it to an undefined,
generally large group of
people via an open call
Amazon Mechanical Turk (MTurk)
• Marketplace for paid crowd work (“micro-tasks”)
– Created in 2005 (remains in “beta” today)
• On-demand, scalable, 24/7 global workforce
• API lets human labor be integrated into software
– “You’ve heard of software-as-a-service. Now this is human-as-a-service.”
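The API integration can be sketched in a few lines. Below is a minimal, hypothetical sketch of assembling a request for MTurk's CreateHIT operation; the task title, reward, and form HTML are illustrative assumptions, and the actual network call (via a client such as AWS's boto3) is left commented out since it requires an account.

```python
# Hypothetical sketch: assembling the parameters for MTurk's CreateHIT
# operation, which posts a paid micro-task to the crowd. All task details
# below (title, reward, form HTML) are illustrative assumptions.

def build_hit_request(title, description, reward_usd, html_form, assignments=3):
    """Assemble keyword arguments for MTurk's CreateHIT operation."""
    question_xml = (
        '<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/'
        'AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">'
        f"<HTMLContent><![CDATA[{html_form}]]></HTMLContent>"
        "<FrameHeight>450</FrameHeight></HTMLQuestion>"
    )
    return {
        "Title": title,
        "Description": description,
        "Reward": f"{reward_usd:.2f}",       # paid per assignment, in USD
        "MaxAssignments": assignments,       # redundant labels aid quality control
        "AssignmentDurationInSeconds": 600,  # time each worker has to finish
        "LifetimeInSeconds": 86400,          # how long the task stays posted
        "Question": question_xml,
    }

params = build_hit_request(
    "Label an image", "Is this a photo of a cat?", 0.05, "<form>...</form>")

# To actually post the task (requires an AWS account and credentials):
#   mturk = boto3.client("mturk")
#   response = mturk.create_hit(**params)
```

This is how “human labor integrated into software” looks in practice: the human task is just another service request issued from code.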
The Early Days
• Artificial Intelligence, With Help From the Humans.
– J. Pontin. NY Times, March 25, 2007
• Is Amazon's Mechanical Turk a Failure? April 9, 2007
– “As of this writing, there are [only] 128 Human Intelligence
Tasks available via the Mechanical Turk task page.”
• Su et al., WWW 2007: “a web-based human data
collection system that we [call] ‘System M’ ”
The 1st Wave of Crowd Computing:
Data Collection via Crowdsourcing
MTurk “Discovery” sparks rush for “gold” labels across areas
• Alonso et al., SIGIR Forum (Information Retrieval)
• Kittur et al., CHI (Human-Computer Interaction)
• Sorokin and Forsythe, CVPR (Computer Vision)
• Snow et al., EMNLP 2008 (NLP)
• Annotating human language
• 22,000 labels for only US $26
• Crowd’s consensus labels can
replace traditional expert labels
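The consensus idea above can be sketched in a few lines. This is a minimal illustration with invented labels, not the aggregation method of any particular paper:

```python
# Minimal sketch of consensus labeling: take the majority vote over
# redundant worker judgments for the same item. Labels are illustrative.
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among redundant worker judgments."""
    return Counter(labels).most_common(1)[0][0]

# Three workers label one item; two agree, so their answer becomes the label.
consensus = majority_vote(["positive", "positive", "negative"])
# consensus == "positive"
```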
NLP Example – Dialect Identification
See work by Chris Callison-Burch.
Social & Behavioral Sciences
• A Guide to Behavioral Experiments
on Mechanical Turk
– W. Mason and S. Suri (2010). SSRN online.
• Crowdsourcing for Human Subjects Research
– L. Schmidt (CrowdConf 2010)
• Crowdsourcing Content Analysis for Behavioral Research:
Insights from Mechanical Turk
– Conley & Tosti-Kharas (2010). Academy of Management
• Amazon's Mechanical Turk: A New Source of
Inexpensive, Yet High-Quality, Data?
– M. Buhrmester et al. (2011). Perspectives… 6(1):3-5.
– see also: Amazon Mechanical Turk Guide for Social Scientists
Many Platforms for Paid Crowd Work
JobBoy, microWorkers, MiniFreelance,
MiniJobz, MinuteWorkers, MyEasyTask,
OpTask, ShortTask, SimpleWorkers
Why Eytan Adar hates MTurk Research
(CHI 2011 CHC Workshop)
• Overly narrow research focus on MTurk
– Distinguish general vs. platform-specific problems
– Distinguish research vs. industry concerns
• Should researchers really focus on…
– “...writing the user’s manual for MTurk ...”?
– “…struggl[ing] against the limits of the platform...”?
“…by rewarding quick demonstrations of the tool’s
use, we fail to attain a deeper understanding of the
problems to which it is applied…”
Beyond Mechanical Turk: An Analysis of
Paid Crowd Work Platforms
Vakharia and Lease, iConference 2015
Qualitative assessment of 7 platforms for paid crowd work
Crowdsourcing Transcription Beyond Mechanical Turk
With Haofeng Zhou & Denys Baskov
HCOMP 2013 Speech Workshop
Tracking Sentiment
Brew et al., PAIS 2010
• Work in exchange for access to rich content
• Never-ending learning
– Continual model updates as what is relevant vs. not changes over time
D.A. Grier, When Computers Were Human. Princeton University Press, 2005
• What was old is new
• Crowdsourcing: A New
Branch of Computer Science
– D.A. Grier, IEEE President
• Tabulating the heavens: computing the Nautical Almanac in 18th-century England
– M. Croarken (2003)
The Mechanical Turk
The original, constructed and
unveiled in 1770 by Wolfgang
von Kempelen (1734–1804)
J. Pontin. Artificial Intelligence, With Help From
the Humans. New York Times (March 25, 2007)
The Human Processing Unit (HPU)
Davis et al. (2010)
ACM Queue, May 2006
“Software developers with innovative ideas for
businesses and technologies are constrained by the
limits of artificial intelligence… If software developers
could programmatically access and incorporate human
intelligence into their applications, a whole new class
of innovative businesses and applications would be
possible. This is the goal of Amazon Mechanical Turk…
people are freer to innovate because they can now
imbue software with real human intelligence.”
Creating a New Class of Applications
Ethics Checking: The Next Frontier?
• Mark Johnson’s address at ACL 2003
– Transcript in Conduit 12(2) 2003
• Think how useful a little “ethics checker and
corrector” program integrated into a word
processor could be!
Soylent: A Word Processor with a Crowd Inside
• Bernstein et al., UIST 2010
Foldit: S. Cooper et al. (2010)
Alice G. Walton. Online Gamers Help Solve Mystery of
Critical AIDS Virus Enzyme. The Atlantic, October 8, 2011.
Translation by monolingual speakers
• Bederson et al.,
• See also: Morita & Ishida, ACM IUI 2009
HCOMP 2013 Panel
Anand Kulkarni: “How do we
dramatically reduce the complexity of
getting work done with the crowd?”
Greg Little: “How can we post a task and with 98% confidence know we’ll get a […]?”
How to ensure data quality?
• Research on statistical quality control methods
– Online vs. offline, feature-based vs. content-agnostic
– Worker calibration, noise vs. bias, weighted voting
• Human factors matter too!
– Instructions, design, interface, interaction
– Names, relationships, reputation
– Fair pay, hourly vs. per-task, recognition, advancement
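One of the statistical ideas above, weighted voting with per-worker calibration, can be sketched as follows. The accuracy estimates are invented for illustration; in practice they might come from gold calibration questions or EM-style estimation:

```python
# Sketch of weighted voting: labels from workers estimated to be more
# accurate (e.g., calibrated on gold questions) carry more weight.
# The accuracy numbers below are illustrative assumptions.
from collections import defaultdict

def weighted_vote(judgments, worker_accuracy):
    """judgments: (worker_id, label) pairs; returns the highest-weighted label."""
    scores = defaultdict(float)
    for worker, label in judgments:
        # Unknown workers default to 0.5 (chance on a binary task).
        scores[label] += worker_accuracy.get(worker, 0.5)
    return max(scores, key=scores.get)

accuracy = {"w1": 0.95, "w2": 0.45, "w3": 0.45}  # assumed calibration estimates
judgments = [("w1", "spam"), ("w2", "not spam"), ("w3", "not spam")]
label = weighted_vote(judgments, accuracy)
# One well-calibrated worker (0.95) outweighs two unreliable ones
# (0.45 + 0.45 = 0.90), so label == "spam", whereas an unweighted
# majority vote over the same judgments would return "not spam".
```

The design choice here is the simplest form of worker calibration; richer models also estimate per-class bias rather than a single accuracy number.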
Is everyone just lazy, stupid, or deceitful?!?
Many published papers seem to suggest this
• “Lazy Turkers”
But why can’t the workers just get it
right to begin with?
What is our responsibility?
• Ill-defined/incomplete/ambiguous/subjective task?
• Confusing, difficult, or unusable interface?
• Incomplete or unclear instructions?
• Insufficient or unhelpful examples given?
• Gold standard with low or unknown inter-assessor agreement (i.e., measurement error in assessing quality)?
• Task design matters! (garbage in = garbage out)
– Report it for review, completeness, & reproducibility
What about context?
“Best practices” for crowdsourcing design often
minimize context to maximize task efficiency
– e.g. “Are these pictures of the same person?”
Importance of Informed Consent +
Potential for Oppression, Crime, & War
Jonathan Zittrain, Minds for Sale
• A. Baio, November 2008. The Faces of Mechanical Turk.
• P. Ipeirotis. March 2010. The New Demographics of Mechanical Turk.
• J. Ross, et al. Who are the Crowdworkers? CHI 2010.
What about ethics?
• Silberman, Irani, and Ross (2010)
– “How should we… conceptualize the role of these people
who we ask to power our computing?”
• Irani and Silberman (2013)
– “…by hiding workers behind web forms and APIs…
employers see themselves as builders of innovative
technologies, rather than… unconcerned with working
conditions… redirecting focus to the innovation of human
computation as a field of technological achievement.”
• Fort, Adda, and Cohen (2011)
– “…opportunities for our community to deliberately
value ethics above cost savings.”
Digital Dirty Jobs
• The Googler who Looked at the Worst of the Internet
• Policing the Web’s Lurid Precincts
• Facebook content moderation
• The dirty job of keeping Facebook clean
• Even linguistic annotators report stress &
nightmares from reading news articles!
What about freedom?
• Crowdsourcing vision: empowering freedom
– work whenever you want for whomever you want
• Risk: people being compelled to perform work
– Digital sweat shops? Digital slaves?
– Chinese prisoners used for online gold farming
– We really don’t know (and need to learn more…)
– Traction? Human Trafficking at MSR Summit’12
The Future of Crowd Work
Paper @ CSCW 2013 by
Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
• Crowdsourcing is quickly transforming practice
in industry and academia via greater efficiency
• Human computation enables a new design space for
applications, augmenting state-of-the-art AI to offer
new capabilities and user experiences
• With people at the center of this new computing
paradigm, important research questions span
both technological and social/societal challenges