2. DEFINITIONS
Crowdsourcing: The practice of obtaining information or
input into a task or project by enlisting the services of a large
number of people, either paid or unpaid, typically via the
Internet.
Human-based computation: A computer science technique in which a machine performs its function by outsourcing certain steps to humans, usually as microwork.
3. MARIE-JEAN-ANTOINE-NICOLAS DE CARITAT,
MARQUIS DE CONDORCET (1743-1794)
- French philosopher of the Enlightenment and advocate of public education and women's rights (among many other things)
- Éléments du calcul des probabilités, et son application aux jeux de hasard, à la loterie et aux jugements des hommes (Elements of the calculus of probabilities, and its application to games of chance, the lottery, and human judgments)
- “Jury theorem”
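The jury theorem is, in a sense, the mathematical seed of crowdsourcing: if each voter is independently correct with probability p > 1/2, the probability that the majority verdict is correct grows towards 1 as the jury grows. A minimal sketch of the computation (the function name and example values are mine, for illustration):

```python
from math import comb

def majority_correct(n: int, p: float) -> float:
    """Probability that a majority of n independent voters,
    each correct with probability p, reaches the right verdict."""
    # Sum the binomial probabilities of more than n/2 correct votes.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# With p = 0.6: 1 voter -> 0.60, 11 voters -> ~0.75, 101 voters -> ~0.98
for n in (1, 11, 101):
    print(n, round(majority_correct(n, 0.6), 2))
```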
4. SIR FRANCIS GALTON
(1822-1911)
Expert in pretty much everything
- Statistician, sociologist, psychologist, anthropologist,
eugenicist, tropical explorer, geographer,
meteorologist, psychometrician, and cake-cutter
- Created the statistical concept of correlation.
- Introduced the use of questionnaires and surveys for collecting data on human communities
- As the initiator of scientific meteorology, devised the
first weather map
- (Was the first to apply statistical methods to the
study of human differences and inheritance of
intelligence)
10. HOW WOULD YOU SOLVE THIS?
Greg Little, Lydia B. Chilton, Robert C. Miller, and Max Goldman
TurKit: Tools for Iterative Tasks on Mechanical Turk
HCOMP 2009
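TurKit's core idea is to write human computation as ordinary imperative code, with human tasks as blocking function calls, e.g. the classic improve-and-vote loop for iterative writing. A rough sketch of the pattern (the ask_worker helper is hypothetical, standing in for posting a task and waiting for the answer):

```python
def ask_worker(prompt: str) -> str:
    """Hypothetical stand-in: post a task to the crowd platform,
    block until a worker answers, and return the answer."""
    raise NotImplementedError

def iterative_improve(text: str, rounds: int = 5) -> str:
    """TurKit-style improve-and-vote loop: one worker proposes an
    improved version, others vote on whether to keep it."""
    for _ in range(rounds):
        improved = ask_worker(f"Improve this text:\n{text}")
        votes = [ask_worker(f"Which is better?\nA: {text}\nB: {improved}")
                 for _ in range(3)]
        if votes.count("B") >= 2:  # majority prefers the new version
            text = improved
    return text
```

TurKit itself adds crash-and-rerun semantics on top of this pattern: completed human calls are memoized, so the script can be killed and re-executed without re-paying for work already done.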
14. REST API TO PEOPLE
- Create task
- Run batch*
- Monitor
- Results
- Pay
A platform for human computation. But how do we program it? How do we limit recourse to (expensive) humans? How do we make their work more efficient?
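A sketch of what that lifecycle could look like against such an API; the endpoint paths and payload fields below are illustrative assumptions, not any real platform's interface:

```python
import requests

BASE = "https://crowd.example.org/api"  # hypothetical platform endpoint

# Create task: what to ask, what to pay, how much redundancy
task = requests.post(f"{BASE}/tasks", json={
    "title": "Is this paper a study on adults 75 and older?",
    "reward_usd": 0.05,
    "votes_per_item": 3,
}).json()

# Run batch: submit the items to be judged
items = [{"id": 1, "text": "Abstract of paper 1 ..."}]
requests.post(f"{BASE}/tasks/{task['id']}/batches", json={"items": items})

# Monitor / collect results
results = requests.get(f"{BASE}/tasks/{task['id']}/results").json()

# Pay: approve each completed assignment
for r in results:
    requests.post(f"{BASE}/assignments/{r['assignment_id']}/approve")
```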
16. SYSTEMATIC LITERATURE REVIEWS (SLRs)
Process
Prevalence of antepartum hemorrhage in women with placenta previa: a systematic review and meta-analysis. Dazhi Fan, Song Wu, Li Liu, Qing Xia, Wen Wang, Xiaoling Guo & Zhengping Liu. Scientific Reports, volume 7, Article number: 40320 (2017).
1. Study on adults 75 and older
2. Involves the use of interaction technology
3. Is an “intervention” (alternatively: RCT)
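Screening boils down to a conjunction of inclusion predicates like the three above: a paper stays in the review only if every criterion holds, and a single confident "no" suffices to exclude it, which is what makes cheap per-predicate crowd questions attractive. A minimal sketch:

```python
# Each predicate asks: does the paper satisfy this inclusion criterion?
PREDICATES = [
    "Is the study on adults 75 and older?",
    "Does it involve the use of interaction technology?",
    "Is it an intervention study (e.g., an RCT)?",
]

def include(answers: dict) -> bool:
    """A paper is included only if all predicates hold; any single 'no'
    excludes it. answers maps each predicate question to a bool."""
    return all(answers[q] for q in PREDICATES)
```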
17. USEFUL BUT PAINFUL…
- Millions of papers published every year
- About half of them are never cited (not even by the authors)
- Incomplete (40-70% of relevant papers missed!)
- From idea to submission: typically 9 to 36 months
- Queries repeated multiple times (6-30 months apart, sometimes 60)
- ~1/3 abandoned
Perrine Créquit, Ludovic Trinquart, Amélie Yavchitz, and Philippe Ravaud. 2016. Wasted research when systematic reviews fail to provide a complete and up-to-date evidence synthesis: the example of lung cancer. BMC Medicine 14, 1 (2016), 8.
20. CAN WE DO BETTER? CAN MACHINE LEARNING HELP?
• Help in screening (keep the same search+filter process but improve it)
• Help in finding (a different process), or live SLR
[Diagram: crowdsourcing feeds model training, which yields trained ML models]
21. ON RCTs
Wallace et al. Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach. JAMIA, 2017.
Auto-excluding items with a predicted probability of being an RCT of ≤ 0.1 gave a specificity of 99.8% and an overall recall of 98%.
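The pattern here is threshold-based triage: the model auto-excludes only the items it scores as almost surely not RCTs, and everything else goes to the crowd, trading a little specificity for near-perfect recall. A sketch of the routing rule (threshold from the slide; the scikit-learn-style classifier is an assumption):

```python
EXCLUDE_BELOW = 0.1  # predicted probability of being an RCT, from the slide

def triage(items, features, model):
    """Auto-exclude items the model scores as almost surely not RCTs;
    route everything else to crowd screening. Assumes class 1 = 'RCT'."""
    probs = model.predict_proba(features)[:, 1]
    auto_excluded = [it for it, p in zip(items, probs) if p <= EXCLUDE_BELOW]
    to_crowd = [it for it, p in zip(items, probs) if p > EXCLUDE_BELOW]
    return auto_excluded, to_crowd
```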
22. 3 OPTIONS SO FAR
• Expert analysis: the typical approach today (painful, slow, and expensive even if you don't notice it)
• Crowdsourcing: works well (speed, diversity, quality…), but at a cost
• For scientists and experts, it is hard to use
• Machine learning and classification: label, train, classify
• Works great only in some cases: a fairly “easy” problem, a very large pool
24. APPLICABILITY
• Finite pool, uniqueness of the problem: not enough items to train
• Can't get ML to the precision we need
• Or we can, but it takes time; in the meantime we lean on the crowd heavily at first, then progressively less (e.g., crisis situations)
25. ML, THEN CROWD WHEN IN DOUBT
[Pipeline: get training data → train algorithms → trained ML models → apply: machine first, then (maybe) crowd]
Works with weak algorithms for classification problems (as long as the confidence estimate is accurate).
William Callaghan et al. MechanicalHeart: A Human-Machine Framework for the Classification of Phonocardiograms. CSCW 2018.
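A minimal sketch of the "machine first, then (maybe) crowd" rule: accept the machine's label when its confidence clears a threshold, otherwise pay for crowd votes. The threshold value and the ask_crowd helper are illustrative assumptions:

```python
CONFIDENCE_THRESHOLD = 0.9  # illustrative; must be tuned per task

def classify(features, model, ask_crowd):
    """Machine first, then (maybe) crowd. Assumes a scikit-learn-style
    classifier with a reasonably calibrated predict_proba."""
    probs = model.predict_proba([features])[0]
    if probs.max() >= CONFIDENCE_THRESHOLD:
        return int(probs.argmax())  # trust the (cheap) machine
    return ask_crowd(features)      # recourse to (expensive) humans
```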
26. When the crowd is more confident than the machine in the classification of a given instance, the crowd is most often correct.
Works well only if we take the machine's input when it is very confident.
A “sprinkle” of ML helps.
William Callaghan et al. MechanicalHeart: A Human-Machine Framework for the Classification of Phonocardiograms. CSCW 2018.
27. ML AS AN ASSISTANT THAT BIASES OUR THINKING
[Pipeline: get training data → train algorithms → trained ML models → apply: machine sets a prior, crowd votes update it]
P(class | votes) = P(votes | class) · P(class) / P(votes)
The impact is on redundancy; the crowd is always asked.
Krivosheev et al. Combining Crowd and Machines for Multi-predicate Item Screening. CSCW 2018.
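Concretely, the formula above is applied with the machine's score as the prior P(class) and the crowd votes as evidence. Assuming workers answer correctly and independently with a known accuracy (a simplification; the paper's model is richer), a minimal posterior update looks like:

```python
def posterior(machine_prior: float, votes: list, worker_accuracy: float = 0.8) -> float:
    """P(class | votes) via Bayes: the machine sets the prior,
    each crowd vote multiplies in its likelihood.
    votes: True for an 'in class' vote, False otherwise."""
    p_in, p_out = machine_prior, 1.0 - machine_prior
    for v in votes:
        # A vote matches the true class with probability worker_accuracy
        p_in *= worker_accuracy if v else (1 - worker_accuracy)
        p_out *= (1 - worker_accuracy) if v else worker_accuracy
    return p_in / (p_in + p_out)  # normalization plays the role of P(votes)

# e.g. a machine prior of 0.2 and two positive votes:
# 0.2*0.8*0.8 / (0.2*0.8*0.8 + 0.8*0.2*0.2) = 0.128 / 0.160 = 0.8
```

A confident machine prior means fewer crowd votes are needed to cross a decision threshold, which is exactly where the redundancy savings come from.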
28.
- Works with weak algorithms for classification problems
- A “sprinkle” of crowd makes it right
29. EMBED CROWDS INSIDE MACHINE LEARNING ARCHITECTURES
- Explore feature spaces that are largely unreachable by automatic extraction
- Train models that use human-understandable features
Cheng and Bernstein. Flock: Hybrid Crowd-Machine Learning Classifiers. CSCW 2015.
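A sketch of the Flock pattern: the crowd nominates human-understandable features and fills in their values per item, and an ordinary classifier is trained on that matrix. The crowd-collection helper and the example features are hypothetical; the classifier is standard scikit-learn:

```python
from sklearn.linear_model import LogisticRegression

# Crowd-nominated, human-understandable features (illustrative examples)
FEATURES = ["mentions concrete details", "overly emotional tone",
            "reads like an advertisement"]

def crowd_feature_value(item, feature) -> int:
    """Hypothetical: ask workers whether the feature holds (1) or not (0)."""
    raise NotImplementedError

def train_hybrid(items, labels):
    # Feature matrix filled in by humans; learning stays fully automatic
    X = [[crowd_feature_value(it, f) for f in FEATURES] for it in items]
    return LogisticRegression().fit(X, labels)  # interpretable by design
```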
31. A 0.1 improvement in ROC AUC.
The hybrid part here is in the features; classification is automatic.
Outliers are important.
Cheng and Bernstein. Flock: Hybrid Crowd-Machine Learning Classifiers. CSCW 2015.
32. CROWD HELPS MACHINES HELP CROWD
• Bias the crowd to obtain better and faster (cheaper) responses
34. Ramirez et al. Influencing workers: The case of human-machine collaboration (in progress).
37. GENERAL FINITE POOL PROBLEM
• No clear idea of how well ML can do
• No clear idea of how well the crowd can do (not to mention task design)
• Limited items and limited budget: how to spend it?
• Kind of a meta-active learning problem, where in addition we have to learn how to learn
38. SMALL STEPS: ACTIVE HYBRID LEARNING
• Given a set of hotel descriptions, find hotels that are kid-friendly and that are near Macquarie
• We are given an ML algorithm, and a crowd or hybrid classifier
• It is a learning vs. exploitation trade-off.
39. ACTIVE HYBRID LEARNING
A restricted version of the general problem:
1. Manage the trade-off between labelling items to learn vs. labelling items to classify
2. Actively learn whether to favour ML or the crowd, and then perform active sampling
This is a MAB or RL problem (a minimal sketch follows).
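One way to cast point 2 as a bandit, sketched below: treat the labelling strategies (say, ML vs. crowd) as arms and pick by estimated payoff, with ε-greedy exploration standing in for "learning how to learn". The reward definition and ε value are illustrative assumptions:

```python
import random

class EpsilonGreedy:
    """Minimal multi-armed bandit over labelling strategies,
    e.g. arms = ['ml', 'crowd'], played within a fixed budget."""
    def __init__(self, arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}
        self.values = {a: 0.0 for a in arms}  # running mean reward per arm

    def choose(self):
        if random.random() < self.epsilon:            # explore
            return random.choice(list(self.counts))
        return max(self.values, key=self.values.get)  # exploit

    def update(self, arm, reward):
        # Reward could be, e.g., classification accuracy gained per dollar
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```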
45. PROCESS
- Open call
- Training materials (on SLRs in general, and on SLRs on related topics)
- Screening task (acts also as a selection filter)
- Paper assignment and full-paper screening (also acts as a filter)
- Paper reading and “guided” paper summarization (with redundancy and metadata extraction)
- Peer “grading” (positive, like-style)
- Definition of dimensions for analysis (separate subgroups)
- Selection of group leaders (also based on volunteering)
- Brainstorming in a video call with the PI and group leaders, each presenting dimensions
- Second iteration
- Revisiting summaries of papers based on the dimensions and filling in the tables
- Cross-checking the tables
46. ASSISTED TASK DESIGN
- How to define a task
- How to train
- How (much) to test
- Pricing
- Stopping
- Optimizing task assignment to workers
- Finding task design errors early
- => Assist in the design of creative work
48. SUMMING UP…
• Combining human and machine computation has incredible potential for solving a
variety of tasks
• Get results immediately, while improving ML
• Crisis situations
• Novel versions of old problems (from SLRs to fake news to criminal activities)
• Continuously check and improve areas where ML is weak, even with human-suggested
features
• None of this is actually restricted to the “crowd”: it works with experts as well
• Move towards systems that do not require expertise, meaning that the average knowledge worker can use them