Crowd scheduling www2016

eXascale Infolab
Scheduling Human Intelligence
Tasks in Multi-Tenant
Crowd-Powered Systems
Djellel Eddine Difallah, University of Fribourg, CH
Gianluca Demartini, University of Sheffield, UK
Philippe Cudré-Mauroux, University of Fribourg, CH
Introduction
• Crowdsourcing relies on a large pool of humans to perform
complex tasks (paid workers, volunteers, players, etc.)
• A crowdsourcing platform (e.g., CrowdFlower, Amazon
MTurk) lets requesters tap into a pool of paid workers
as a shared resource
• Requesters publish batches of similar tasks to be
completed in exchange for a monetary reward
• Workers can arrive and leave at any point in time and can
selectively focus on an arbitrary subset of the tasks
Introduction
Observations
• A few workers perform many tasks, followed by a
long tail of workers performing fewer tasks [Ipeirotis
2010; Franklin et al. 2011]
• Large jobs are fast at the beginning, then lose
their momentum toward the end [Difallah et al. 2014]
• We suspect that this leads to batches being treated
unequally, depending on batch size, freshness, requester,
and price [Difallah et al. 2015]
[Figure: (a) Batch distribution per size; (b) Cumulative throughput per batch size. x-axis: Time (Day), Jan 01 to Apr 01; y-axes: Count (Normalized) and Throughput (Normalized).]
Introduction
Data Analysis
• Most of the batches
present on AMT have
10 HITs or fewer
• The overall platform
throughput is
dominated by larger
batches
Batch size classes (figure legend): Tiny [0,10], Small [10,100], Medium [100,1000], Large [1000,Inf]
Motivation
The case of Multi-Tenant Crowd-powered Systems (CPS)
• Definition: A CPS serves multiple customers/users (e.g., a
Crowd DBMS)
• The system posts a batch of tasks on the crowdsourcing
platform per user query
• The CPS is in constant competition to attract workers
• With itself — multiple tenants
• With other requesters
• Job starvation is problematic in business applications
Contributions
• We design a novel crowdsourcing system
architecture that allows job scheduling for a CPS
on top of a traditional crowdsourcing platform
• We devise a scheduling algorithm that embodies
a set of general design requirements
• We empirically evaluate our setup on Amazon
MTurk, with a real crowd and a set of scheduling
algorithms
HIT-Bundle
Definition
• Scheduling requires that
we have control over the
serving process of tasks
• A HIT-Bundle is a batch
that contains
heterogeneous tasks
• All tasks that are generated
by the CPS are published
through the HIT-Bundle
[Figure: a HIT-Bundle containing Batch 1, Batch 2, Batch 3, and Batch 4.]
HIT-Bundle
Micro Experiment
• Comparison of
batch execution
time using different
grouping strategies
• Distinct batches
• Combined in a
HIT-Bundle
[Figure: #HITs remaining (0 to 100) over time (0 to 4000 seconds) for B6 and B7, run as distinct batches (B6, B7) and combined in a HIT-Bundle (B6-Bundle, B7-Bundle).]
Proposed CPS
Architecture
[Figure: architecture diagram. Labels as they appear: Multi-Tenant Crowd-Powered System; Crowdsourcing Decision Engine; HIT-Bundle Manager; Batch Catalog (Batch A $$, Batch B $$$, Batch C $, ..); Batch Merging; Batch Input Merger; HIT-Bundle Creation/Update; Status; META System; Crowdsourced queries; Resource Tracker; config_file; HIT Scheduler; Scheduler; Queue (c1 a1 b3 ..); HIT Manager; HIT Results Aggregator; HIT Collection and Reward; Progress Monitor API; Crowdsourcing App; External HIT Page; Crowdsourcing Platform; Human Workers.]
Scheduling for the Crowd
Design Guidelines
• (R1) Runtime Scalability: Adopt a runtime scheduler that a)
dynamically adapts to the current availability of the crowd, and b)
scales to make real-time scheduling decisions as work
demand grows
• (R2) Fairness: The scheduler must provide steady progress to
large requests without blocking or starving the smaller requests
• (R3) Priority: The scheduler must be sensitive to clients who
have higher priority (e.g., those who pay more)
• (R4) Human Aware: Unlike machines, people's performance is
affected by many factors, including context switching, training
effects, boredom, task difficulty, and how interesting the task is
(Weighted) Fair Scheduler
• Fair Scheduling FS (R1) (R2):
• Keep track of how many tasks per
batch are currently assigned
(running_tasks)
• Assign a task from the batch with the minimum running_tasks
• The Weighted Fair Sharing WFS variant
(R3):
• Compute a weight, based on priority
(e.g., price)
• weight(Bj) = p(Bj)/sum(p(B))
• Assign a task from the batch with the minimum
running_tasks/weight
• Pros. Ensures that every batch receives a
proportional share of the available workers
• Cons. We don’t satisfy (R4) Human
Awareness
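The two selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Batch class, the function names, and the split of running tasks across batches are assumptions made for the example. Only the weight formula and the two "minimum" rules come from the slide, and prices are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    name: str
    price: float            # reward per task, used here as the priority signal
    pending: int            # tasks not yet assigned to any worker
    running_tasks: int = 0  # tasks currently assigned to workers


def weights(batches):
    """weight(Bj) = p(Bj) / sum(p(B)); assumes all prices are positive."""
    total = sum(b.price for b in batches)
    return {b.name: b.price / total for b in batches}


def next_batch(batches, weighted=False):
    """Pick the batch to serve next.

    FS:  minimum running_tasks.
    WFS: minimum running_tasks / weight.
    Ties go to the batch listed first.
    """
    candidates = [b for b in batches if b.pending > 0]
    if not candidates:
        return None
    w = weights(batches) if weighted else {b.name: 1.0 for b in batches}
    return min(candidates, key=lambda b: b.running_tasks / w[b.name])


def get_task(batches, weighted=False):
    """Assign one task from the chosen batch and update its counters."""
    batch = next_batch(batches, weighted)
    if batch is not None:
        batch.pending -= 1
        batch.running_tasks += 1
    return batch


# Hypothetical bundle: prices give weights 0.5, 0.25, 0.25; the 3/2/2 split
# of the 7 running tasks is invented for illustration.
bundle = [
    Batch("A", price=0.10, pending=20, running_tasks=3),
    Batch("B", price=0.05, pending=20, running_tasks=2),
    Batch("C", price=0.05, pending=20, running_tasks=2),
]
fs_choice = next_batch(bundle).name                  # "B": fewest running tasks
wfs_choice = next_batch(bundle, weighted=True).name  # "A": 3/0.5 < 2/0.25
```

A complete scheduler would also decrement running_tasks when a worker submits or abandons a task; that bookkeeping is left out of the sketch.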
[Figure: example HIT-Bundle with 7 tasks running and three batches (p=$0.10, w=0.5; p=$0.05, w=0.25; p=$0.05, w=0.25), illustrating which task get_task() returns under FS and under WFS.]
Worker Context Switch
Micro Experiment
• We run a HIT-Bundle with
heterogeneous tasks
• Compute average execution
time for each HIT
• RR: Round Robin, task type
changes every time
• SEQ10 / SEQ25: the task type
changes every 10 and every
25 tasks, respectively
• The mean task execution time
is significantly lower for SEQ25
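As a rough illustration of the three orderings, the sketch below generates task-type sequences and counts how often a worker would have to switch type. The number of task types (two) and the sequence length (100) are assumptions chosen for the example; the slide does not state them.

```python
from itertools import cycle


def sequence(task_types, run_length, total):
    """Return `total` task types, changing type after every `run_length` tasks."""
    types = cycle(task_types)
    out = []
    while len(out) < total:
        out.extend([next(types)] * run_length)
    return out[:total]


def count_switches(seq):
    """Number of positions where consecutive tasks differ in type."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


# Hypothetical setup: two task types, 100 tasks per ordering.
rr = sequence(["A", "B"], run_length=1, total=100)      # type changes every task
seq10 = sequence(["A", "B"], run_length=10, total=100)  # type changes every 10 tasks
seq25 = sequence(["A", "B"], run_length=25, total=100)  # type changes every 25 tasks

switches = {name: count_switches(s)
            for name, s in [("RR", rr), ("SEQ10", seq10), ("SEQ25", seq25)]}
# {"RR": 99, "SEQ10": 9, "SEQ25": 3}
```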
[Figure: execution time per HIT (seconds) by experiment type (RR, SEQ10, SEQ25); a significant difference is marked ** (p-value = 0.023).]
Worker Conscious Fair
Scheduling WCFS
• Goal: Reduce the context switching introduced by having
the worker continuously switch task types
• We modify Fair Sharing with Delayed Scheduling [Zaharia
et al. 2010]
• A task gives up its priority up to K times, waiting for a worker
who has just completed a similar task to become available again
• Pros. We satisfy all our design requirements. A worker
receives longer sequences of similar tasks
• Cons. Need to set K
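One way to read the delayed-scheduling rule is sketched below. It is a simplified interpretation under stated assumptions, not the authors' code: the Batch class, the skips counter, the unweighted fair order, and the fallback for workers with no matching batch are all choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    task_type: str
    pending: int            # tasks not yet assigned
    running_tasks: int = 0  # tasks currently assigned
    skips: int = 0          # consecutive times this batch gave up its turn


def wcfs_get_task(batches, last_type, k):
    """Choose a batch for a worker whose previous task had type `last_type`.

    Batches are visited in fair-share order (fewest running tasks first).
    A batch of a different type yields its turn, unless it has already
    yielded `k` times, in which case it is served anyway.
    """
    order = sorted((b for b in batches if b.pending > 0),
                   key=lambda b: b.running_tasks)
    if not order:
        return None
    chosen = None
    if last_type is not None:
        for batch in order:
            if batch.task_type == last_type or batch.skips >= k:
                chosen = batch
                break
            batch.skips += 1  # this batch yields to a later one
    if chosen is None:
        # New worker, or nothing matched: fall back to the plain fair choice.
        chosen = order[0]
    chosen.skips = 0
    chosen.pending -= 1
    chosen.running_tasks += 1
    return chosen


# Hypothetical bundle: batch "x" is behind, but the worker has just done a "y" task.
batches = [Batch("x", pending=5), Batch("y", pending=5, running_tasks=2)]
served = [wcfs_get_task(batches, "y", k=2).task_type for _ in range(3)]
# ["y", "y", "x"]: "x" yields twice, then is served regardless of the worker's history
```

A larger K gives workers longer runs of similar tasks at the cost of delaying under-served batches, which is the tuning burden the slide lists as the drawback.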
Experiments
Controlled Setup
• On Amazon Mechanical Turk (no simulations)
• HIT-Bundle with 5 different task types
• We artificially ensure that we have num_workers
> 10 before starting an experiment
• We compare against basic schedulers: First In First
Out (FIFO), Round Robin (RR), Shortest Job First
(SJF)
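For reference, the three baseline policies can be sketched as below. The function names, the dictionary of pending-task counts, and the counts themselves are assumptions for illustration; the SJF order is assumed to be supplied from prior estimates rather than computed here.

```python
def fifo(pending):
    """Serve the earliest-submitted batch that still has tasks.

    `pending` maps batch name to remaining tasks, in submission order.
    """
    for name, left in pending.items():
        if left > 0:
            return name
    return None


def sjf(pending, order):
    """Serve batches in a fixed shortest-first order estimated beforehand."""
    for name in order:
        if pending.get(name, 0) > 0:
            return name
    return None


class RoundRobin:
    """Hand out one task per batch in turn, skipping empty batches."""

    def __init__(self, names):
        self.names = list(names)
        self.pos = 0

    def next(self, pending):
        for _ in range(len(self.names)):
            name = self.names[self.pos]
            self.pos = (self.pos + 1) % len(self.names)
            if pending[name] > 0:
                return name
        return None


# Hypothetical remaining-task counts for five batches.
pending = {"B1": 2, "B2": 1, "B3": 0, "B4": 3, "B5": 1}
first_fifo = fifo(pending)                               # "B1"
first_sjf = sjf(pending, ["B4", "B3", "B5", "B2", "B1"])  # "B4"
rr = RoundRobin(pending)
rr_order = [rr.next(pending) for _ in range(5)]
# ["B1", "B2", "B4", "B5", "B1"]: the empty batch B3 is skipped
```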
Controlled Experiments
Latency
All experiments are run in parallel
FIFO order [B1, B2, B3, B4, B5]
SJF order [B4, B3, B5, B2, B1] based on
previous evidence
• FIFO finishes jobs one after the other
• SJF finishes the shortest jobs
first
• FS and RR offer a balanced
workforce
[Figure: (a) Batch latency: time (seconds, 0 to 2000) per batch (B1 to B5) under FIFO, FS, RR, and SJF; (b) Overall experiment latency: time (seconds, 0 to 2000) per scheduling scheme (FIFO, FS, RR, SJF).]
[Figure: (a) Vary the price: time (seconds) per batch (B1 to B5) with B2 priced at $0.02 vs. $0.05; (b) Vary the workforce: time (seconds) per batch (B1 to B5) with 10 vs. 20 workers.]
Experiments
Varying the Control Factors
Weighted Fair Scheduler is used
• (a) Effect of increasing B2’s
priority (Price) on batch
execution time
• B2 executes faster
• (b) Effect of varying the number
of crowd workers involved in the
completion of the HIT batches
• The load is rebalanced (albeit
in different proportions), and
all batches complete
faster
Experiments in the Wild
Execution Trace
[Figure: number of active workers (0 to 30) over time (12:20 to 12:50), in three panels: FS, Individual Batches, WCFS.]
Conclusions
• Batch starvation in crowdsourcing is problematic for requesters
• We introduce a new scheduling layer that shares a pool of crowd
workers among multiple tenants of a crowd-powered system
• We perform evaluations in a real setup with real workers
• We show that a HIT-Bundle increases the overall throughput
• Our technique (Worker Conscious Fair Sharing), inspired by
large-scale data processing frameworks, minimises context switching
• Toward Service Level Agreement aware scheduling for
crowdsourcing platforms.
Code: https://github.com/XI-lab/HIT-Scheduler