The Dynamics of Micro-Task
Crowdsourcing
The Case of Amazon MTurk
Djellel Eddine Difallah, Michele Catasta, Gianluca Demartini,
Panos Ipeirotis, Philippe Cudré-Mauroux
WWW’15 - 20th May 2015 - Florence
1
Background
Crowdsourcing is an effective solution
to certain classes of problems
2
Background
A crowdsourcing platform allows requesters to publish a
crowdsourcing request (a batch)
composed of multiple tasks (HITs)
The crowd can be invoked programmatically through APIs (see the sketch below)
3
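For concreteness, here is a minimal sketch of publishing a HIT programmatically. It uses today's boto3 MTurk client, which postdates this 2015 talk; the task URL, reward, and other parameters are purely illustrative.

```python
# Minimal sketch: publishing one HIT of a batch via the (modern) MTurk API.
# Assumptions: boto3 with valid AWS credentials; the ExternalURL is illustrative.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# An ExternalQuestion embeds the requester's own task page in the worker's browser.
question = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/task?id=1</ExternalURL>
  <FrameHeight>450</FrameHeight>
</ExternalQuestion>"""

# One call per HIT; a batch is simply many HITs sharing the same layout.
hit = mturk.create_hit(
    Title="Label an image",
    Description="Pick the best category for the image shown.",
    Keywords="image, labeling, categorization",
    Reward="0.05",                      # USD, passed as a string
    MaxAssignments=3,                   # distinct workers per HIT
    LifetimeInSeconds=24 * 3600,        # how long the HIT stays visible
    AssignmentDurationInSeconds=600,    # time a worker has to finish
    Question=question,
)
print(hit["HIT"]["HITId"])
```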
Background
Paid microtask crowdsourcing
scales out but remains highly unpredictable
[Figure: batch throughput (#HITs per minute) over time]
5
SLAs (service-level agreements) are expensive
6
MTurk is a Marketplace for HITs
Direct factors: price, time of day, #workers, #HITs, etc.
Other factors: forums, reputation systems (Turkopticon), recommendation systems (Openturk)
7
A Data-Driven
Approach
8
9
...Five Years Later
[2009 - 2014]
mturk-tracker collected
2.5 million distinct batches
with over 130 million HITs
10
mturk-tracker.com
● Collects metadata about each visible batch (title, description, reward,
required qualifications, HITs available, etc.)
● Records batch progress (roughly every 20 minutes)
Note that the tracker reports data only periodically and does not capture
fine-grained, real-time variations; a polling sketch follows below
11
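A minimal sketch of what such a tracker loop could look like. Assumptions: fetch_visible_batches() is a hypothetical scraper stub, and the SQLite schema is illustrative; the real mturk-tracker collection code is not shown in the talk.

```python
# Minimal sketch of a tracker-style poller: snapshot visible batches every
# ~20 minutes; diffs between consecutive snapshots give batch progress.
import json
import sqlite3
import time


def fetch_visible_batches() -> list[dict]:
    """Hypothetical scraper: return metadata for every batch visible on MTurk."""
    return []  # replace with real scraping of title, reward, HITs available, ...


db = sqlite3.connect("tracker.db")
db.execute("CREATE TABLE IF NOT EXISTS snapshots (ts REAL, batch TEXT)")

while True:
    ts = time.time()
    for batch in fetch_visible_batches():
        # One row per batch per poll.
        db.execute("INSERT INTO snapshots VALUES (?, ?)", (ts, json.dumps(batch)))
    db.commit()
    time.sleep(20 * 60)  # the tracker samples roughly every 20 minutes
```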
Menu
1. Notable Facts Extracted from the Data
2. Large-scale HIT Type Classification
3. Analyzing the Features Affecting Batch Throughput
4. Market Analysis
12
1) Notable Facts Extracted
from the Data
13
Country-Specific HITs
14
US and India?
Country-Specific HITs
Workers from the US, India, and Canada are the most sought after.
15
Distribution of Batch Size
16
“Power-law”
Evolution of Batch Sizes
Very large batches
start to appear
17
HIT Pricing
18
Is 1 cent per HIT
the norm?
HIT Pricing
19
5 cents is the new
1 cent
Requesters and Reward Evolution
20
Increasing number of new
and distinct requesters
2) Large-scale HIT Type
Classification
21
HIT Classes
Classify HITs into types (Gadiraju et al., 2014)
- Information Finding (IF)
- Verification and Validation (VV)
- Interpretation and Analysis (IA)
- Content Creation (CC)
- Surveys (SU)
- Content Access (CA)
22
Supervised Classification
With the Crowd
We trained a Support Vector Machine (SVM) model
- Features: HIT title, description, keywords, reward, date, allocated time, and batch
size
- Created labeled data on MTurk for 5,000 uniformly sampled HITs
- Each labeling HIT used 3 repetitions
- Consensus was reached for 89% of the tasks
- 10-fold cross-validation:
- Precision of 0.895
- Recall of 0.899
- F-measure of 0.895
- We then performed a large-scale classification of all 2.5M batches (a training sketch follows)
23
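A minimal sketch of such a classifier setup, assuming scikit-learn; the tiny toy dataset, column names, and labels below are illustrative stand-ins for the 5,000 crowd-labeled HITs, not the paper's data.

```python
# Minimal sketch: SVM over text + numeric HIT metadata with 10-fold CV.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the crowd-labeled HITs.
hits = pd.DataFrame({
    "text": ["Transcribe this receipt", "Find the company website",
             "Write a product review", "Answer a short survey"] * 25,
    "reward": [0.05, 0.02, 0.10, 0.25] * 25,
    "batch_size": [500, 50, 10, 1] * 25,
    "label": ["CC", "IF", "CC", "SU"] * 25,  # HIT types from Gadiraju et al.
})

features = ColumnTransformer([
    # Bag-of-words over title + description + keywords.
    ("text", TfidfVectorizer(), "text"),
    # Numeric HIT metadata passed through unchanged.
    ("num", "passthrough", ["reward", "batch_size"]),
])
model = Pipeline([("features", features), ("svm", LinearSVC())])

# 10-fold cross-validation, mirroring the evaluation in the talk.
scores = cross_val_score(model, hits, hits["label"], cv=10, scoring="f1_macro")
print(f"macro-F1: {scores.mean():.3f}")
```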
Distribution of HIT Types
Fewer Content Access batches;
Content Creation is the most popular type
24
3) Analyzing the Features
Affecting Batch
Throughput
25
[Figure: batch throughput (#HITs per minute) over time]
Batch Throughput Prediction
29 Features
HIT Features
HITs available, Start time, Reward, Description length, Title length, Keywords,
requester_id, Time_alloted, Task type, Age (minutes), etc.
Market Features
Total HITs available, HITs arrived, Rewards arrived, % HITs completed, etc.
26
Batch Throughput Prediction
[Diagram: training window of length delta ending at prediction time T]
- Predict batch throughput at time T by training a Random Forest
Regression model on samples taken in the [T - delta, T) time span
- 29 features (including the type of the batch)
- Hourly data from June to October 2014
- We sampled 50 time points for evaluation purposes (a modeling sketch follows)
27
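A minimal sketch of the regression setup, assuming scikit-learn; the synthetic data below reduces the paper's 29 features to four illustrative ones, and the random train/test split stands in for the [T - delta, T) windowing.

```python
# Minimal sketch: Random Forest regression of batch throughput.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Stand-ins for a few of the 29 features.
X = np.column_stack([
    rng.integers(1, 10_000, n),      # hits_available (batch size)
    rng.integers(1, 5_000, n),       # age_minutes
    rng.uniform(0.01, 1.0, n),       # reward
    rng.integers(1, 100_000, n),     # total HITs on the market
])
# Synthetic throughput loosely driven by batch size and age.
y = 0.01 * X[:, 0] / (1 + X[:, 1] / 60) + rng.normal(0, 1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"R^2 on held-out points: {r2_score(y_te, rf.predict(X_te)):.3f}")
```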
Batch Throughput Prediction
We are interested in the cases where the prediction works reasonably well
28
Predicted vs. Actual Batch
Throughput (delta=4 hours)
Prediction works best for large batches with
high momentum
29
Significant Features
- Which features contribute most when the
prediction works well?
- We proceed by feature ablation
- Re-run the prediction, removing one feature at a time (see the sketch below)
- 1,000 samples
The two most significant features:
HITs_Available (the number of tasks in the batch)
Age_Minutes (how long ago the batch was created)
31
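A minimal ablation loop, reusing the synthetic X, X_tr/X_te, y_tr/y_te, rf, and imports from the regression sketch above; the ranking it prints reflects toy data, not the paper's results.

```python
# Minimal sketch: leave-one-feature-out ablation for the throughput regressor.
feature_names = ["hits_available", "age_minutes", "reward", "market_hits"]
base = r2_score(y_te, rf.predict(X_te))  # accuracy with all features

for i, name in enumerate(feature_names):
    # Drop one feature, retrain, and measure the loss in accuracy.
    keep = [j for j in range(X.shape[1]) if j != i]
    m = RandomForestRegressor(n_estimators=100, random_state=0)
    m.fit(X_tr[:, keep], y_tr)
    drop = base - r2_score(y_te, m.predict(X_te[:, keep]))
    print(f"ablating {name:>14}: R^2 drops by {drop:+.3f}")
```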
4) Market Analysis
32
Demand - The number of new tasks
published on the platform by the requesters
Supply - The workforce that the crowd
provides
Supply Elasticity
How does the market react when new tasks
arrive on the platform?
33
Supply Elasticity
We regressed the percentage of
work done (within 1 hour)
against the number of new HITs (a fitting sketch follows)
34
Supply Elasticity
Intercept = 2.5
Slope = 0.5%
20% of new work gets
completed within an hour
35
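A minimal sketch of that fit, assuming NumPy and a synthetic hourly series in place of the tracker data; the coefficients baked into the synthetic response are for illustration only.

```python
# Minimal sketch: OLS fit of %-work-done-within-1h against new-HIT volume.
import numpy as np

rng = np.random.default_rng(1)
new_hits = rng.integers(100, 20_000, size=500)   # new HITs posted per hour
# Synthetic response with a small positive slope, plus noise.
pct_done = 2.5 + 0.0005 * new_hits + rng.normal(0, 2, size=500)

# np.polyfit returns the highest-degree coefficient first: (slope, intercept).
slope, intercept = np.polyfit(new_hits, pct_done, 1)
print(f"intercept={intercept:.2f}, slope={slope:.5f}")
```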
Demand and Supply Periodicity
Demand Supply
37
Demand and Supply Periodicity
Strong weekly periodicity (a 7-10 day cycle); see the autocorrelation sketch below.
38
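A minimal autocorrelation check of the kind that would reveal such a cycle, assuming a synthetic daily demand series; the talk's actual periodicity analysis may differ.

```python
# Minimal sketch: find the dominant repeat period of a daily demand series.
import numpy as np

rng = np.random.default_rng(2)
days = np.arange(365)
# Synthetic demand with a weekly cycle plus noise.
daily_demand = 100 + 20 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 5, 365)

x = daily_demand - daily_demand.mean()
# Autocorrelation at non-negative lags, normalized so acf[0] == 1.
acf = np.correlate(x, x, mode="full")[x.size - 1:] / (x.var() * x.size)
lag = 1 + np.argmax(acf[1:30])  # strongest repeat within a month
print(f"dominant lag: {lag} days")  # ~7 for a weekly cycle
```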
Conclusions
- Long-term data analysis uncovers hidden trends
- Large-scale HIT classification
- Important features in throughput prediction (HITs_Available,
Age_Minutes)
- Supply is elastic
- (more work available -> more work done)
- Supply and demand are periodic (7-10 days)
39
Is a Crowdsourcing Marketplace the right
paradigm for efficient and predictable
crowdsourcing?
41
Q&A
Djellel Difallah
ded@exascale.info
