1. Scheduling Human Intelligence Tasks in Multi-Tenant Crowd-Powered Systems
Djellel Eddine Difallah, University of Fribourg, CH
Gianluca Demartini, University of Sheffield, UK
Philippe Cudré-Mauroux, University of Fribourg, CH
2. Introduction
• Crowdsourcing relies on a large pool of humans to perform complex tasks (paid workers, volunteers, players, etc.)
• A crowdsourcing platform (e.g., CrowdFlower, Amazon MTurk) allows requesters to tap into a pool of paid workers in a shared-resource fashion
• Requesters publish batches of similar tasks to be completed in exchange for a monetary reward
• Workers can arrive and leave at any point in time and can selectively focus on an arbitrary subset of the tasks
3. Introduction
Observations
• A few workers perform many tasks, followed by a long tail of workers performing fewer tasks [Ipeirotis 2010; Franklin et al. 2011]
• Large jobs are fast at the beginning, then lose their momentum toward the end [Difallah et al. 2014]
• We suspect that this leads to batches being treated unequally (depending on batch size, freshness, requester, price) [Difallah et al. 2015]
4. Introduction
Data Analysis
• Most of the batches present on AMT have 10 HITs or less
• The overall platform throughput is dominated by larger batches
• Batch size classes: Tiny [0,10], Small [10,100], Medium [100,1000], Large [1000,Inf]
[Figure: (a) Batch distribution per size and (b) cumulative throughput per batch size, both normalized, over time (Jan 01 to Apr 01)]
5. Motivation
The case of Multi-Tenant Crowd-Powered Systems (CPS)
• Definition: a CPS serves multiple customers/users (e.g., a crowd DBMS)
• The system posts a batch of tasks on the crowdsourcing platform per user query
• The CPS is in constant competition to attract workers:
  • With itself (multiple tenants)
  • With other requesters
• Job starvation is problematic in business applications
6. Contributions
• We design a novel crowdsourcing system architecture that allows job scheduling for a CPS on top of a traditional crowdsourcing platform
• We devise a scheduling algorithm that embodies a set of general design requirements
• We empirically evaluate our setup on Amazon MTurk, with a real crowd and a set of scheduling algorithms
7. HIT-Bundle
Definition
• Scheduling requires that we have control over the serving process of tasks
• A HIT-Bundle is a batch that contains heterogeneous tasks
• All tasks generated by the CPS are published through the HIT-Bundle (sketched in code below)
[Diagram: Batches 1-4 merged into a single HIT-Bundle]
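To make the idea concrete, here is a minimal sketch of a HIT-Bundle as an in-memory structure that multiplexes several tenant batches behind a single published batch; the class and method names are illustrative and not taken from the HIT-Scheduler code.

```python
from collections import defaultdict


class HITBundle:
    """A single published batch containing heterogeneous tasks from several tenant batches."""

    def __init__(self):
        # batch_id -> list of pending tasks from that tenant batch
        self.pending = defaultdict(list)

    def add_batch(self, batch_id, tasks):
        """Publish all tasks of a tenant batch through the bundle."""
        self.pending[batch_id].extend(tasks)

    def pop_task(self, batch_id):
        """Serve one pending task of the given batch (the scheduler chooses the batch)."""
        return self.pending[batch_id].pop(0)


# Example: four tenant batches published through one HIT-Bundle.
bundle = HITBundle()
for i in range(1, 5):
    bundle.add_batch(f"Batch {i}", [f"task-{i}-{j}" for j in range(10)])
```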
8. HIT-Bundle
Micro Experiment
• Comparison of batch execution time using different grouping strategies:
  • Distinct batches
  • Combined in a HIT-Bundle
[Figure: #HITs remaining over time (seconds) for batches B6 and B7, run as distinct batches vs. combined in a HIT-Bundle]
9. Proposed CPS Architecture
[Architecture diagram: the Multi-Tenant Crowd-Powered System issues crowdsourced queries whose batches (with their rewards) populate a Batch Catalog; a Crowdsourcing Decision Engine performs batch merging and HIT-Bundle creation/update, while the HIT Scheduler, Progress Monitor, Results Aggregator, HIT Manager and Resource Tracker interact with the Crowdsourcing Platform API; human workers complete HITs through an external HIT page, followed by HIT collection and reward, and status/metadata flow back to the system]
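To complement the diagram, here is a rough and purely hypothetical sketch of how the responsibilities could be split in code; none of the class or method names below come from the HIT-Scheduler repository.

```python
class BatchCatalog:
    """Holds the tenant batches (with their rewards) produced by crowdsourced queries."""
    def __init__(self):
        self.batches = {}  # batch_id -> {"price": float, "tasks": [...]}

    def register(self, batch_id, price, tasks):
        self.batches[batch_id] = {"price": price, "tasks": list(tasks)}


class ProgressMonitor:
    """Tracks how many tasks of each batch are currently being worked on."""
    def __init__(self):
        self.running = {}

    def started(self, batch_id):
        self.running[batch_id] = self.running.get(batch_id, 0) + 1

    def finished(self, batch_id):
        self.running[batch_id] -= 1


class HITScheduler:
    """Picks the batch that serves the next task to an arriving worker.
    Concrete policies (FIFO, RR, FS, WFS, WCFS) would override select_batch()."""
    def __init__(self, catalog, monitor):
        self.catalog = catalog
        self.monitor = monitor

    def select_batch(self, worker_id):
        raise NotImplementedError

    def get_task(self, worker_id):
        batch_id = self.select_batch(worker_id)
        self.monitor.started(batch_id)
        return self.catalog.batches[batch_id]["tasks"].pop()
```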
10. Scheduling for the Crowd
Design Guidelines
• (R1) Runtime Scalability: adopt a runtime scheduler that a) dynamically adapts to the current availability of the crowd, and b) scales to make real-time scheduling decisions as the work demand grows
• (R2) Fairness: the scheduler must provide steady progress to large requests without blocking or starving the smaller requests
• (R3) Priority: the scheduler must be sensitive to clients who have higher priority (e.g., those who pay more)
• (R4) Human Awareness: unlike machines, people's performance is affected by many factors, including context switching, training effects, boredom, task difficulty, and interestingness
11. (Weighted) Fair Scheduler
• Fair Scheduling FS (R1) (R2):
  • Keep track of how many tasks per batch are currently assigned (running_tasks)
  • Assign a task from the batch with the minimum running_tasks
• The Weighted Fair Sharing WFS variant (R3):
  • Compute a weight based on priority (e.g., price): weight(Bj) = p(Bj) / sum(p(B))
  • Assign a task from the batch with the minimum running_tasks / weight
• Pros: ensures that all batches receive a proportional share of the available workers (see the code sketch below)
• Cons: does not satisfy (R4) Human Awareness
[Diagram: a HIT-Bundle with 7 tasks currently running across three batches priced 0.10$, 0.05$, and 0.05$ (weights 0.5, 0.25, 0.25); on get_task(), FS and WFS return tasks from different batches]
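As a concrete illustration, here is a minimal, self-contained sketch of the two selection rules; the function names and the example numbers are illustrative, not taken from the paper.

```python
def fs_select(running_tasks):
    """Fair Sharing: pick the batch with the fewest currently running tasks."""
    return min(running_tasks, key=running_tasks.get)


def wfs_select(running_tasks, prices):
    """Weighted Fair Sharing: weight each batch by its share of the total price,
    then pick the batch minimizing running_tasks / weight."""
    total_price = sum(prices.values())
    weights = {b: p / total_price for b, p in prices.items()}
    return min(running_tasks, key=lambda b: running_tasks[b] / weights[b])


# Example close to the slide: 7 tasks running over three batches.
prices = {"B1": 0.10, "B2": 0.05, "B3": 0.05}   # weights 0.5, 0.25, 0.25
running = {"B1": 3, "B2": 2, "B3": 2}
print(fs_select(running))            # -> "B2": smallest running count
print(wfs_select(running, prices))   # -> "B1": smallest running/weight ratio (3 / 0.5 = 6)
```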
12. Worker Context Switch
Micro Experiment
• We run a HIT-Bundle with heterogeneous tasks and compute the average execution time for each HIT
• RR: Round Robin, the task type changes every time
• SEQ10 / SEQ25: task types alternate every 10, respectively 25, tasks
• The mean task execution time is significantly lower for SEQ25
[Box plot: execution time per HIT (seconds) for RR, SEQ10 and SEQ25; the difference is significant (p-value = 0.023)]
13. Worker Conscious Fair Scheduling (WCFS)
• Goal: reduce the context switching introduced by having the worker continuously change task types
• We modify Fair Sharing with Delayed Scheduling [Zaharia et al. 2010]: a task gives up its priority up to K times, until a worker who just completed a similar task is available again (sketched in code below)
• Pros: satisfies all our design requirements; a worker receives longer sequences of similar tasks
• Cons: K needs to be tuned
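Here is a minimal sketch of one way to implement the delayed-scheduling rule described above, assuming we also track the task type of each batch and of each worker's last completed HIT; the bookkeeping structure and all names are illustrative, not taken from the HIT-Scheduler code.

```python
def wcfs_select(running_tasks, batch_type, worker_last_type, skip_count, K):
    """Worker Conscious Fair Sharing (illustrative sketch).

    Batches are visited in fair-sharing order (fewest running tasks first).
    A batch whose task type differs from the worker's last completed type
    gives up its turn, at most K times, in the hope that a worker who just
    completed a similar task arrives instead (Delayed Scheduling)."""
    ordered = sorted(running_tasks, key=running_tasks.get)
    for batch in ordered:
        if batch_type[batch] == worker_last_type or skip_count.get(batch, 0) >= K:
            skip_count[batch] = 0                             # batch is served: reset its skips
            return batch
        skip_count[batch] = skip_count.get(batch, 0) + 1      # batch yields this turn
    # No batch matched and none exhausted its K skips: fall back to plain fair sharing.
    return ordered[0]


# Example: a worker who just finished an image-labeling HIT keeps receiving
# image-labeling tasks as long as batch B2 still has pending work.
running = {"B1": 4, "B2": 2}
types = {"B1": "sentiment", "B2": "image-labeling"}
skips = {}
print(wcfs_select(running, types, "image-labeling", skips, K=3))   # -> "B2"
```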
14. Experiments
Controlled Setup
• On Amazon Mechanical Turk (no simulations)
• HIT-Bundle with 5 different task types
• We artificially ensure that num_workers > 10 before starting an experiment
• We compare against basic schedulers: First In First Out (FIFO), Round Robin (RR), Shortest Job First (SJF)
15. Controlled Experiments
Latency
All experiments are run in parallel
FIFO order: [B1, B2, B3, B4, B5]
SJF order: [B4, B3, B5, B2, B1], based on previous evidence
• FIFO finishes jobs one after the other
• While SJF finishes the shortest jobs first
• FS and RR offer a balanced workforce
[Figure: (a) batch latency (seconds) for B1-B5 and (b) overall experiment latency (seconds) under FIFO, FS, RR and SJF]
16. Experiments
Varying the Control Factors
The Weighted Fair Scheduler is used
• (a) Effect of increasing B2's priority (price) on batch execution time: B2 executes faster
• (b) Effect of varying the number of crowd workers involved in the completion of the HIT batches: the load is rebalanced (albeit with different proportions) and all batches complete faster
[Figure: batch execution time (seconds) for B1-B5 when (a) varying B2's price ($0.02 vs. $0.05) and (b) varying the workforce (10 vs. 20 workers)]
17. Experiments in the Wild
Execution Trace
[Figure: execution trace showing the number of active workers over time (12:20-12:50) for FS, the individual batches, and WCFS]
18. Conclusions
• Batch starvation in crowdsourcing is problematic for requesters
• We introduce a new scheduling layer that shares a pool of crowd workers among multiple tenants of a crowd-powered system
• We perform evaluations in a real setup with real workers
• We show that a HIT-Bundle increases the overall throughput
• Our technique (Worker Conscious Fair Sharing), inspired by large-scale data processing frameworks, minimises context switching
• Toward Service Level Agreement-aware scheduling for crowdsourcing platforms
Code: https://github.com/XI-lab/HIT-Scheduler