Toward Better Crowdsourcing Science
(& Predicting Annotator Performance)
Matt Lease
School of Information
University of Texas at Austin
ir.ischool.utexas.edu
@mattlease
ml@utexas.edu
Slides: www.slideshare.net/mattlease
“The place where people & technology meet”
~ Wobbrock et al., 2009
www.ischools.org
The Future of Crowd Work, CSCW’13
by Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
Matt Lease <ml@utexas.edu>
Roadmap
• Task Design, Language, & Occam’s Razor
• What About the Humans?
• Predicting Annotator Performance
Hyun Joon Jung
A Popular Tale of Crowdsourcing Woe
• Heroic ML researcher asks the
crowd to perform a simple task
• Crowd (invariably) screws it up…
• “Aha!” cries the ML researcher, “Fortunately,
I know exactly how to solve this problem!”
But why can’t the workers just get it
right to begin with?
Is everyone just lazy, stupid, or deceitful?!?
Much of our literature
seems to suggest this:
• Cheaters
• Fraudsters
• “Lazy Turkers”
• Scammers
• Spammers
Another story (a parable)
“We had a great software interface, but we went
out of business because our customers were too
stupid to figure out how to use it.”
Moral
• Even if a user were stupid or lazy, we still lose
• By accepting our own responsibility, we create
another opportunity to fix the problem…
– Cynical view: idiot-proofing
What is our responsibility?
• Ill-defined/incomplete/ambiguous/subjective task?
• Confusing, difficult, or unusable interface?
• Incomplete or unclear instructions?
• Insufficient or unhelpful examples given?
• Gold standard with low or unknown inter-assessor
agreement (i.e. measurement error in assessing
response quality)?
• Task design matters! (garbage in = garbage out)
– Report it for review, completeness, & reproducibility
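One way to check the inter-assessor agreement point above is to compute a chance-corrected agreement score between two gold assessors before using their labels to grade workers. The sketch below uses Cohen's kappa, which is a swapped-in choice and not something the slides prescribe; the function name `cohens_kappa` and the two judgment lists are hypothetical illustrations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two assessors who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each assessor labeled at random with
    # their own label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both assessors used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical relevance judgments (1 = relevant, 0 = not relevant).
gold_a = [1, 1, 0, 0, 1, 0, 1, 1]
gold_b = [1, 0, 0, 0, 1, 0, 1, 0]
kappa = cohens_kappa(gold_a, gold_b)
print(f"kappa = {kappa:.3f}")
```

A low kappa here would suggest that apparent worker "errors" may partly reflect noise in the gold standard itself.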
A Few Simple Suggestions (1 of 2)
1. Make task self-contained: everything the worker
needs to know should be visible in-task
2. Short, simple, & clear instructions with examples
3. Avoid domain-specific & advanced terminology;
write for typical people (e.g., your mom)
4. Engage worker / avoid boring stuff. If possible,
select interesting content for people to work on
5. Always ask for open-ended feedback
Omar Alonso. Guidelines for Designing Crowdsourcing-based Relevance Experiments. 2009.
Suggested Sequencing (2 of 2)
1. Simulate first draft of task with your in-house personnel.
Assess, revise, & iterate (ARI)
2. Run task using relatively few workers & examples (ARI)
   a. Do workers understand the instructions?
   b. How long does it take? Is pay effective & ethical?
3. Replicate results on another dataset (generalization). (ARI)
4. [Optional] qualification test. (ARI)
5. Increase items. Look for boundary items & noisy gold (ARI)
6. Increase # of workers (ARI)
Omar Alonso. Guidelines for Designing Crowdsourcing-based Relevance Experiments. 2009.
Toward Better Crowdsourcing Science
Goal: Strengthen individual studies and minimize
unwarranted spread of bias in our scientific literature
• Occam’s Razor: avoid making assumptions beyond
what the data actually tells us (avoid prejudice!)
• Enumerate hypotheses for possible causes of low data
quality, assess supporting evidence for each hypothesis,
and for any claims made, cite supporting evidence
• Recognize uncertainty of analyses and convey this via
hedge statements such as, “the data suggests that…”
• Avoid derogatory language use without very strong
supporting evidence. The crowd enables our work!!
– Acknowledge your workers!
Roadmap
• Task Design, Language, & Occam’s Razor
• What About the Humans?
• Predicting Annotator Performance
Who are the workers?
• A. Baio, November 2008. The Faces of Mechanical Turk.
• P. Ipeirotis. March 2010. The New Demographics of Mechanical Turk.
• J. Ross, et al. Who are the Crowdworkers? CHI 2010.
CACM August, 2013
Paul Hyman. Communications of the ACM, Vol. 56 No. 8, Pages 19-21, August 2013.
• “Contribute to society and human well-being”
• “Avoid harm to others”
“As an ACM member I will
– Uphold and promote the principles of this Code
– Treat violations of this code as inconsistent with membership in the ACM”
“Which approaches are less expensive and is this sensible? With the advent of
outsourcing and off-shoring these matters become more complex and take on new
dimensions …there are often related ethical issues concerning exploitation…
“…legal, social, professional and ethical [topics] should feature in all computing degrees.”
2008 ACM/IEEE Curriculum Update
Power Asymmetry on MTurk
• Mistakes are made in HIT rejection & worker blocking
– e.g., student error, bug, poor task design, noisy gold, etc.
• Workers have limited recourse for appeal
• Our errors impact real people’s lives
• What is the loss function to optimize?
• Should anyone hold researchers accountable? IRB?
• How do we balance the risk of human harm vs.
the potential benefit if our research succeeds?
ACM: “Contribute to society and human
well-being; avoid harm to others”
• How do we know who is doing the work, or if a
decision to work (for a given price) is freely made?
• Does it matter if work is performed by
– Political refugees? Children? Prisoners? Disabled?
• What (if any) moral obligation do crowdsourcing
researchers have to consider broader impacts of
our research (either good or bad) on the lives of
those we depend on to power our systems?
Who Are We Building a Better Future For?
• Irani and Silberman (2013)
– “…AMT helps employers see themselves as builders
of innovative technologies, rather than employers
unconcerned with working conditions.”
• Silberman, Irani, and Ross (2010)
– “How should we… conceptualize the role of the
people we ask to power our computing?”
Could Effective Human Computation
Sometimes Be a Bad Idea?
• The Googler who Looked at the Worst of the Internet
• Policing the Web’s Lurid Precincts
• Facebook content moderation
• The dirty job of keeping Facebook clean
• Even linguistic annotators report stress &
nightmares from reading news articles!
Join the conversation!
Crowdwork-ethics, by Six Silberman
http://crowdwork-ethics.wtf.tw
an informal, occasional blog for researchers
interested in ethical issues in crowd work
Roadmap
• Task Design, Language, & Occam’s Razor
• What About the Humans?
• Predicting Annotator Performance
Hyun Joon Jung
Quality Control in Crowdsourcing
Existing Quality Control Methods
• Task Design
• Workflow Design
• Label Aggregation
• Worker Management
Motivation
[Diagram: Requester, Online marketplace, Crowd workers]
Who is more accurate? (worker performance estimation and prediction)
Equally Accurate Workers?
Correctness of the ith task instance over time t (1 -> correct, 0 -> wrong)
Alice: 1 0 1 0 1 0 1 0 1 0
Bob:   0 0 0 0 1 0 1 1 1 1
Accuracy(Alice) = Accuracy(Bob) = 0.5
But should we expect equal work quality in the future?
What if examples are not i.i.d.?
Bob seems to be improving over time.
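The point about equal overall accuracy hiding different trends can be checked with a few lines of arithmetic. This is a minimal sketch using the two correctness sequences from the slide; the helper names and the window of 5 most recent tasks are illustrative assumptions, not part of the talk.

```python
# Correctness sequences from the slide (1 = correct, 0 = wrong), oldest first.
alice = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
bob = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def accuracy(correct):
    """Overall accuracy, ignoring the order of task instances."""
    return sum(correct) / len(correct)


def recent_accuracy(correct, window):
    """Accuracy over only the most recent `window` task instances."""
    recent = correct[-window:]
    return sum(recent) / len(recent)


workers = {"Alice": alice, "Bob": bob}
overall = {name: accuracy(seq) for name, seq in workers.items()}
# Window size 5 is an arbitrary choice for illustration.
recent = {name: recent_accuracy(seq, window=5) for name, seq in workers.items()}

for name in workers:
    print(f"{name}: overall={overall[name]:.1f}, last 5={recent[name]:.1f}")
```

Both workers score 0.5 overall, but over the last five tasks Alice scores 0.4 and Bob 0.8, which is the kind of temporal signal an order-blind accuracy estimate throws away.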
1: Time-series model
• Latent autoregressive: latent variable x_t
• Noise model: real observation y_t = f(x_t)
• Temporal correlation φ: how frequently y has changed over time
• Offset c: sign navigates direction between correct vs. not
[Figure: chain of latent values x_t (-0.3, 0.4, -0.1, 0.8) and observations y_t (1, 0, 1, 0); each step is governed by c and φ]
EM Variant (LAMORE, Park et al. 2014)
Jung et al. Predicting Next Label Quality: A Time-Series Model of Crowdwork. AAAI HCOMP 2014.
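To make the roles of φ and c concrete, here is a small simulation sketch. The slide does not give the update equation or the form of f, so the first-order update x_t = c + φ·x_{t-1} + noise and the threshold at zero are assumptions for illustration only; the function name `simulate_latent_ar` and all parameter values are hypothetical and should not be read as the model from Jung et al.

```python
import random


def simulate_latent_ar(n, c, phi, noise_sd, seed=0, x0=0.0):
    """Simulate n steps of an assumed latent AR(1) process.

    Assumed update: x_t = c + phi * x_{t-1} + Gaussian noise.
    Assumed observation: y_t = 1 (correct) if x_t > 0, else 0 (wrong).
    """
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for _ in range(n):
        x = c + phi * x + rng.gauss(0.0, noise_sd)
        xs.append(x)
        ys.append(1 if x > 0 else 0)
    return xs, ys


# A positive offset pushes the worker toward correct labels; a larger phi
# makes consecutive labels more alike.
xs, ys = simulate_latent_ar(n=10, c=0.1, phi=0.8, noise_sd=0.5, seed=42)
print(ys)
```

With the noise switched off, a positive c yields all-correct labels and a negative c all-wrong ones, which matches the slide's description of the sign of c steering between correct and not.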
2: Modeling More Features
Predict an assessor’s next label quality based on a single feature

Alice
temporal effect   label quality
0.6               0
0.5               1
0.4               0
0.3               ?

Integrate multi-dimensional features of a crowd assessor (multiple features)

Alice
accuracy   time   temporal effect   topic familiarity   # of labels   label quality
0.7        10.3   0.6               0.8                 20            0
0.6        8.5    0.5               0.2                 21            1
0.65       7.5    0.4               0.4                 22            0
0.63       11.5   0.3               0.5                 23            ?

Jung & Lease. A Discriminative Approach to Predicting Assessor Accuracy. ECIR 2015.
Features
How do we flexibly capture a wider range of assessor behaviors by incorporating multi-dimensional features?
Feature groups: various accuracy measures, task features, temporal features (individual features drawn from [1], [2], [3])
[1] Carterette, B., Soboroff, I.: The effect of assessor error on IR system evaluation. SIGIR ’10
[2] Ipeirotis, P.G., Gabrilovich, E.: Quizz: targeted crowdsourcing with a billion (potential) users. WWW ’14
[3] Jung, H., et al.: Predicting Next Label Quality: A Time-Series Model of Crowdwork. HCOMP ’14
Jung & Lease. A Discriminative Approach to Predicting Assessor Accuracy. ECIR 2015.
Model
Input: X (features for crowd assessor model)
Learning Framework [ ]
Output: Y (likelihood of getting correct label at t)
Generalizable feature-based Assessor Model (GAM)
Jung & Lease. A Discriminative Approach to Predicting Assessor Accuracy. ECIR 2015.
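The input/output framing above can be sketched end to end on the Alice rows from the "Modeling More Features" slide. The slide leaves the learning framework unspecified, so the logistic regression below is only a stand-in chosen for illustration; the helper names, learning rate, epoch count, and L2 penalty are all assumptions, and three training rows are far too few for a meaningful estimate.

```python
import math

# Rows as read from the Alice table: (accuracy, time, temporal effect,
# topic familiarity, # of labels) paired with the observed label quality.
history = [
    ([0.70, 10.3, 0.6, 0.8, 20], 0),
    ([0.60, 8.5, 0.5, 0.2, 21], 1),
    ([0.65, 7.5, 0.4, 0.4, 22], 0),
]
next_features = [0.63, 11.5, 0.3, 0.5, 23]  # the row marked "?" on the slide


def column_stats(rows):
    """Per-column mean and standard deviation (sd of 0 is replaced by 1)."""
    cols = list(zip(*rows))
    means = [sum(col) / len(col) for col in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
           for col, m in zip(cols, means)]
    return means, sds


def scale(row, means, sds):
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def sigmoid(z):
    # Numerically stable form for both signs of z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=500, l2=0.1):
    """Stochastic gradient descent on L2-regularized logistic loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Y: estimated likelihood that the next label is correct."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


raw_rows = [features for features, _ in history]
labels = [label for _, label in history]
means, sds = column_stats(raw_rows)
w, b = fit_logistic([scale(r, means, sds) for r in raw_rows], labels)
p_next = predict(w, b, scale(next_features, means, sds))
print(f"estimated P(next label correct) = {p_next:.2f}")
```

Features are standardized first because time and label count sit on much larger scales than the accuracy-style features, which would otherwise dominate the gradient updates.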
Which Features Matter?
[Figure: Fig. 4. Summary of relative feature importance across 54 regression models. Horizontal bar chart, axis 0 to 50. Features listed: AA, BA_opt, BA_PES, C, NumLabels, CurrentLabelQuality, AccChangeDirection, SA, Phi, BA_uni, TaskTime, TopicChange, TopicEverSeen. Bar values listed: 49, 43, 39, 28, 27, 23, 22, 20, 19, 16, 10, 7, 5.]
Relative feature importance across 54 individual prediction models.
A GAM with only the top 5 features shows good performance
(7-10% less than the full-featured GAM)
Jung & Lease. A Discriminative Approach to Predicting Assessor Accuracy. ECIR 2015.
3: Reducing Supervision
Jung & Lease. Modeling Temporal Crowd Work Quality with Limited Supervision. HCOMP 2015.
Soft Label Updating & Discounting
Soft Label Updating
The Future of Crowd Work, CSCW’13
by Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
Thank You!
ir.ischool.utexas.edu
Slides: www.slideshare.net/mattlease

UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
 

Toward Better Crowdsourcing Science

  • 1. Toward Better Crowdsourcing Science (& Predicting Annotator Performance) Matt Lease School of Information University of Texas at Austin ir.ischool.utexas.edu @mattlease ml@utexas.edu Slides: www.slideshare.net/mattlease
  • 2. “The place where people & technology meet” ~ Wobbrock et al., 2009 www.ischools.org
  • 3. The Future of Crowd Work, CSCW’13 by Kittur, Nickerson, Bernstein, Gerber, Shaw, Zimmerman, Lease, and Horton
  • 4. Roadmap • Task Design, Language, & Occam’s Razor • What About the Humans? • Predicting Annotator Performance. Hyun Joon Jung
  • 5. Roadmap • Task Design, Language, & Occam’s Razor • What About the Humans? • Predicting Annotator Performance
  • 6. A Popular Tale of Crowdsourcing Woe • Heroic ML researcher asks the crowd to perform a simple task • Crowd (invariably) screws it up… • “Aha!” cries the ML researcher, “Fortunately, I know exactly how to solve this problem!”
  • 8. But why can’t the workers just get it right to begin with? Is everyone just lazy, stupid, or deceitful?!? Much of our literature seems to suggest this: • Cheaters • Fraudsters • “Lazy Turkers” • Scammers • Spammers
  • 9. Another story (a parable) “We had a great software interface, but we went out of business because our customers were too stupid to figure out how to use it.” Moral • Even if a user were stupid or lazy, we still lose • By accepting our own responsibility, we create another opportunity to fix the problem… – Cynical view: idiot-proofing
  • 10. What is our responsibility? • Ill-defined/incomplete/ambiguous/subjective task? • Confusing, difficult, or unusable interface? • Incomplete or unclear instructions? • Insufficient or unhelpful examples given? • Gold standard with low or unknown inter-assessor agreement (i.e., measurement error in assessing response quality)? • Task design matters! (garbage in = garbage out) – Report it for review, completeness, & reproducibility
  • 11. A Few Simple Suggestions (1 of 2) 1. Make task self-contained: everything the worker needs to know should be visible in-task 2. Short, simple, & clear instructions with examples 3. Avoid domain-specific & advanced terminology; write for typical people (e.g., your mom) 4. Engage worker / avoid boring stuff. If possible, select interesting content for people to work on 5. Always ask for open-ended feedback. Source: Omar Alonso. Guidelines for Designing Crowdsourcing-based Relevance Experiments. 2009.
  • 12. Suggested Sequencing (2 of 2) 1. Simulate first draft of task with your in-house personnel. Assess, revise, & iterate (ARI) 2. Run task using relatively few workers & examples (ARI): (a) Do workers understand the instructions? (b) How long does it take? Is pay effective & ethical? 3. Replicate results on another dataset (generalization) (ARI) 4. [Optional] qualification test (ARI) 5. Increase items. Look for boundary items & noisy gold (ARI) 6. Increase # of workers (ARI). Source: Omar Alonso. Guidelines for Designing Crowdsourcing-based Relevance Experiments. 2009.
  • 13. Toward Better Crowdsourcing Science Goal: Strengthen individual studies and minimize unwarranted spread of bias in our scientific literature • Occam’s Razor: avoid making assumptions beyond what the data actually tells us (avoid prejudice!) • Enumerate hypotheses for possible causes of low data quality, assess supporting evidence for each hypothesis, and for any claims made, cite supporting evidence • Recognize uncertainty of analyses and convey this via hedge statements such as, “the data suggests that…” • Avoid derogatory language use without very strong supporting evidence. The crowd enables our work!! – Acknowledge your workers!
  • 14. Roadmap • Task Design, Language, & Occam’s Razor • What About the Humans? • Predicting Annotator Performance
  • 15. Who are the workers? • A. Baio, November 2008. The Faces of Mechanical Turk. • P. Ipeirotis, March 2010. The New Demographics of Mechanical Turk. • J. Ross, et al. Who are the Crowdworkers? CHI 2010.
  • 16. CACM, August 2013. Paul Hyman. Communications of the ACM, Vol. 56 No. 8, Pages 19-21, August 2013.
  • 17. ACM: • “Contribute to society and human well-being” • “Avoid harm to others” • “As an ACM member I will – Uphold and promote the principles of this Code – Treat violations of this code as inconsistent with membership in the ACM” 2008 ACM/IEEE Curriculum Update: “Which approaches are less expensive and is this sensible? With the advent of outsourcing and off-shoring these matters become more complex and take on new dimensions …there are often related ethical issues concerning exploitation…” “…legal, social, professional and ethical [topics] should feature in all computing degrees.”
  • 18. Power Asymmetry on MTurk • Mistakes are made in HIT rejection & worker blocking – e.g., student error, bug, poor task design, noisy gold, etc. • Workers have limited recourse for appeal • Our errors impact real people’s lives • What is the loss function to optimize? • Should anyone hold researchers accountable? IRB? • How do we balance the risk of human harm vs. the potential benefit if our research succeeds?
  • 19. ACM: “Contribute to society and human well-being; avoid harm to others” • How do we know who is doing the work, or if a decision to work (for a given price) is freely made? • Does it matter if work is performed by – Political refugees? Children? Prisoners? Disabled? • What (if any) moral obligation do crowdsourcing researchers have to consider broader impacts of our research (either good or bad) on the lives of those we depend on to power our systems?
  • 20. Who Are We Building a Better Future For? • Irani and Silberman (2013) – “…AMT helps employers see themselves as builders of innovative technologies, rather than employers unconcerned with working conditions.” • Silberman, Irani, and Ross (2010) – “How should we… conceptualize the role of the people we ask to power our computing?”
  • 21. Could Effective Human Computation Sometimes Be a Bad Idea? • The Googler who Looked at the Worst of the Internet • Policing the Web’s Lurid Precincts • Facebook content moderation • The dirty job of keeping Facebook clean • Even linguistic annotators report stress & nightmares from reading news articles!
  • 22. Join the conversation! Crowdwork-ethics, by Six Silberman http://crowdwork-ethics.wtf.tw an informal, occasional blog for researchers interested in ethical issues in crowd work
  • 23. Roadmap • Task Design, Language, & Occam’s Razor • What About the Humans? • Predicting Annotator Performance. Hyun Joon Jung
  • 24. Quality Control in Crowdsourcing (7/10/2015) Existing Quality Control Methods: Task Design, Workflow Design, Label Aggregation, Worker Management. Who is more accurate? (worker performance estimation and prediction) [Diagram: Requester, Online marketplace, Crowd workers]
  • 26. Equally Accurate Workers? Correctness of the ith task instance over time t (1 -> correct, 0 -> wrong). Alice: 1 0 1 0 1 0 1 0 1 0. Bob: 0 0 0 0 1 0 1 1 1 1. Accuracy(Alice) = Accuracy(Bob) = 0.5. But should we expect equal work quality in the future? What if examples are not i.i.d.? Bob seems to be improving over time.
  • 27. 1: Time-series model. Components: Latent Autoregressive model, Noise Model, real observation y_t = f(x_t), latent variable x_t. φ: temporal correlation (how frequently y has changed over time). c: offset (sign navigates direction between correct vs. not). [Figure: observations y_t (1 0 1 0) and latent values x_t (-0.3 0.4 -0.1 0.8), each step linked by c and φ.] EM Variant (LAMORE, Park et al. 2014). Jung et al. Predicting Next Label Quality: A Time-Series Model of Crowdwork. AAAI HCOMP 2014.
  • 28. 2: Modeling More Features. Single feature: predict an assessor’s next label quality based on a single feature. Alice, temporal effect -> label: 0.6 -> 0; 0.5 -> 1; 0.4 -> 0; 0.3 -> ? Multiple features: integrate multi-dimensional features of a crowd assessor. Alice (accuracy, time, temporal effect, topic familiarity, # of labels -> label): (0.7, 10.3, 0.6, 0.8, 20 -> 0); (0.6, 8.5, 0.5, 0.2, 21 -> 1); (0.65, 7.5, 0.4, 0.4, 22 -> 0); (0.63, 11.5, 0.3, 0.5, 23 -> ?). Jung & Lease. A Discriminative Approach to Predicting Assessor Accuracy. ECIR 2015.
  • 29. Features. How do we flexibly capture a wider range of assessor behaviors by incorporating multi-dimensional features? Feature groups: various accuracy measures, task features, temporal features (drawn from [1], [2], [3]). [1] Carterette, B., Soboroff, I.: The effect of assessor error on IR system evaluation. SIGIR ’10 [2] Ipeirotis, P.G., Gabrilovich, E.: Quizz: targeted crowdsourcing with a billion (potential) users. WWW’14 [3] Jung, H., et al.: Predicting Next Label Quality: A Time-Series Model of Crowdwork. HCOMP’14. Jung & Lease. A Discriminative Approach to Predicting Assessor Accuracy. ECIR 2015.
  • 30. Model: Generalizable feature-based Assessor Model (GAM). Input: X (features for crowd assessor model) -> Learning Framework -> Output: Y (likelihood of getting correct label at t). Jung & Lease. A Discriminative Approach to Predicting Assessor Accuracy. ECIR 2015.
  • 31. Which Features Matter? Relative feature importance across 54 individual prediction models. A GAM with only the top 5 features shows good performance (7-10% less than the full-featured GAM). [Figure (Fig. 4): summary of relative feature importance across 54 regression models. Features, in the order listed on the slide: AA, BA_opt, BA_PES, C, NumLabels, CurrentLabelQuality, AccChangeDirection, SA, Phi, BA_uni, TaskTime, TopicChange, TopicEverSeen. Bar values, in the order listed: 49, 43, 39, 28, 27, 23, 22, 20, 19, 16, 10, 7, 5.] [Cropped paper excerpt: prediction performance (MAE) of assessors’ next judgments and corresponding coverage under varying decision rejection options (δ from 0 to 0.25 in steps of 0.05); under all the given reject options, GAM shows better coverage as well as prediction performance.] Jung & Lease. A Discriminative Approach to Predicting Assessor Accuracy. ECIR 2015.
  • 32. 3: Reducing Supervision. Jung & Lease. Modeling Temporal Crowd Work Quality with Limited Supervision. HCOMP 2015.
  • 33. Soft Label Updating & Discounting
  • 34. Soft Label Updating
  • 35. The Future of Crowd Work, CSCW’13 by Kittur, Nickerson, Bernstein, Gerber, Shaw, Zimmerman, Lease, and Horton
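
The point of slide 26 (two workers with the same overall accuracy but different trends) can be sketched in a few lines of Python. This is only an illustration, not the model from the cited papers: the two label sequences are an assumed reading of the digits on the flattened slide (ten labels per worker, oldest first), and `recent_accuracy` with its window of 5 is a hypothetical helper introduced here to show why a single lifetime accuracy can hide a trend.

```python
from typing import Sequence


def accuracy(labels: Sequence[int]) -> float:
    """Fraction of correct labels (1 = correct, 0 = wrong) over the full history."""
    return sum(labels) / len(labels)


def recent_accuracy(labels: Sequence[int], window: int = 5) -> float:
    """Accuracy over only the most recent `window` labels (hypothetical helper)."""
    recent = labels[-window:]
    return sum(recent) / len(recent)


# Assumed reading of slide 26: ten labels per worker, oldest first.
alice = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
bob = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

for name, labels in (("Alice", alice), ("Bob", bob)):
    # Overall accuracy is identical; the recent window tells the two apart.
    print(name, accuracy(labels), recent_accuracy(labels))
```

Under these assumed sequences both workers score the same overall, while the recent-window figure is higher for Bob than for Alice, which is the motivation the slides give for modeling label quality as a time series rather than a single number.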