3. We focus on aggregating discrete labels
Templates provided by FigureEight (formerly CrowdFlower)
4. A matrix of worker labels: infer the truth…

   Worker labels    Truth
   T   T   ?          ?
   T   F   ?          ?
   T   ?   T          ?
   T   ?   F          ?
   ?   F   F          ?

('?' marks a label that is unobserved or unknown.)
5. Problem statement
Assume there are W workers who classify N items into K categories.
Let z_i be the latent true label of item i, y_ij the label that worker j assigns to item i, and W_i the set of workers who have labelled item i.
Goal: infer the true labels Z = {z_i : i = 1, …, N} based on the observed worker labels Y = {y_ij : j ∈ W_i, i = 1, …, N}.
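The notation above can be made concrete with a small data-structure sketch; the toy labels below are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical toy instance of the notation: N = 3 items, W = 3 workers,
# K = 2 classes. Y holds only the observed labels y_ij, keyed by (item, worker).
Y = {(0, 0): 1, (0, 1): 1, (1, 1): 0, (1, 2): 1, (2, 0): 0}

# W_i: the set of workers who have labelled item i.
workers_of = defaultdict(set)
for (i, j) in Y:
    workers_of[i].add(j)

print(workers_of[0])  # workers 0 and 1 labelled item 0
```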
11. Independent BCC (iBCC)
Models mainly differ in how p(y_i1, y_i2, …, y_iW | z_i) is parameterized by V.
iBCC (Kim & Ghahramani, 2012) assumes conditional independence between the y_ij's:

   p(y_i1, y_i2, …, y_iW | z_i) = ∏_j p(y_ij | z_i),  where v_jkl = p(y_ij = l | z_i = k)

• Easy to marginalize out unobserved y_ij
• #params in V is O(WK²)
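A minimal sketch of the iBCC likelihood under conditional independence; the values of V and the class prior are assumed toy numbers, not from the slides.

```python
import numpy as np

# V[j, k, l] = p(y_ij = l | z_i = k); three identical toy workers, K = 2.
W, K = 3, 2
V = np.array([[[0.9, 0.1], [0.2, 0.8]]] * W)
prior = np.array([0.5, 0.5])  # p(z_i = k)

def posterior_z(labels, V, prior):
    """p(z_i = k | y_i) ∝ p(z_i = k) · ∏_j p(y_ij | z_i = k).
    Unobserved y_ij (None) are simply skipped, i.e. marginalized out."""
    like = prior.copy()
    for j, l in enumerate(labels):
        if l is not None:
            like = like * V[j, :, l]
    return like / like.sum()

print(posterior_z([1, 1, None], V, prior))  # two votes for class 1
```

Skipping a `None` entry is exactly the "easy to marginalize" property: under conditional independence, the unobserved factor sums to 1.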
12. Dependent BCC (dBCC)
dBCC (Kim & Ghahramani, 2012) uses a Markov random field for p(y_i1, y_i2, …, y_iW | z_i) to capture the correlation between the y_ij's:

   p(y_i1, y_i2, …, y_iW | z_i, V, U) = (1 / C(V, U, z_i)) · exp( Σ_{1≤j<j'≤W} u_jj' y_ij y_ij' + Σ_{j=1}^{W} v_{j z_i} y_ij )

However, it is intractable to marginalize out unobserved y_ij, and #params in V and U is O(W²K²).
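The normalizer C(V, U, z_i) is the source of the intractability: computing it exactly requires summing over all K^W joint label configurations. A brute-force sketch with assumed toy parameters (binary labels coded as ±1, as is common for such pairwise models):

```python
import itertools
import math

W = 4
U = {(j, jp): 0.3 for j in range(W) for jp in range(j + 1, W)}  # u_jj'
v = [0.5] * W                                                   # v_{j z_i}

def energy(y):
    # Pairwise term  Σ_{j<j'} u_jj' y_j y_j'  plus singleton term  Σ_j v_j y_j
    pair = sum(U[j, jp] * y[j] * y[jp] for (j, jp) in U)
    single = sum(v[j] * y[j] for j in range(W))
    return pair + single

# C sums exp(energy) over 2**W configurations: exponential in W.
C = sum(math.exp(energy(y)) for y in itertools.product([-1, 1], repeat=W))
print(C)
```

With W workers and K classes the sum has K^W terms, which is why exact marginalization of unobserved labels is infeasible beyond small W.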
16. There are 2 subtypes for class-1 items

The joint confusion matrix of workers A and B on class-1 items is not well approximated by a single outer product:

   [0.4 0.1; 0.1 0.4] ≉ (0.5, 0.5) ⊗ (0.5, 0.5) = [0.25 0.25; 0.25 0.25]

but it is well approximated by a mixture of two outer products:

   [0.4 0.1; 0.1 0.4] ≈ ½ (0.1, 0.9) ⊗ (0.1, 0.9) + ½ (0.9, 0.1) ⊗ (0.9, 0.1) = [0.41 0.09; 0.09 0.41]

Half of the time, A & B both have 90% accuracy: they are labelling easy items. The other half of the time, both have 10% accuracy: they are labelling hard items.
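The decomposition above can be checked numerically; this sketch reproduces the slide's matrices with NumPy:

```python
import numpy as np

# Joint confusion matrix of workers A and B on class-1 items.
joint = np.array([[0.4, 0.1], [0.1, 0.4]])

# A single outer product cannot reproduce the diagonal correlation…
rank1 = np.outer([0.5, 0.5], [0.5, 0.5])       # 0.25 everywhere

# …but a two-component mixture (easy/hard subtypes) nearly can.
easy = np.outer([0.1, 0.9], [0.1, 0.9])        # both 90% accurate
hard = np.outer([0.9, 0.1], [0.9, 0.1])        # both 10% accurate
rank2 = 0.5 * easy + 0.5 * hard                # [[0.41, 0.09], [0.09, 0.41]]

print(rank1)
print(rank2)
```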
17. Worker A & B have labelled 20 items…

Labels on the 10 class-0 items:

   A  B  Truth
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   1  0  0
   0  1  0

Labels on the 10 class-1 items:

   A  B  Truth  Subtype
   1  1  1      Easy
   1  1  1      Easy
   1  1  1      Easy
   1  1  1      Easy
   1  0  1      ½ E + ½ H
   0  1  1      ½ E + ½ H
   0  0  1      Hard
   0  0  1      Hard
   0  0  1      Hard
   0  0  1      Hard

Confusion matrix at the class level (rows: truth; columns: worker label):

   Truth   label=0  label=1
   0       0.9      0.1
   1       0.5      0.5

Confusion matrix at the subtype level:

   Truth  Subtype  label=0  label=1
   0      -        0.9      0.1
   1      Easy     0.1      0.9
   1      Hard     0.9      0.1
18. In general, tensor rank decomposition can be used for modelling p(y_i1, y_i2, …, y_iW | z_i)

   p(y_i1, y_i2, …, y_iW | z_i = k) = Σ_m π_km · v⃗_1km ⊗ v⃗_2km ⊗ ⋯ ⊗ v⃗_Wkm

Or, in a probabilistic way,

   p(y_i1, y_i2, …, y_iW | z_i) = Σ_m p(g_i = m | z_i) · ∏_j p(y_ij | z_i, g_i = m)

where m = 1, …, M, M is the number of subtypes per class, and g_i is the subtype of item i under its class z_i.
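A sketch of the corresponding generative process, with assumed toy parameters: sample a subtype g_i given the class, then sample each worker label conditionally independently given (z_i, g_i).

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, W = 2, 2, 5
pi = np.full((K, M), 1.0 / M)              # p(g_i = m | z_i = k), uniform toy prior
# conf[j, k, m] = p(y_ij = 1 | z_i = k, g_i = m); binary labels for simplicity.
conf = rng.uniform(0.1, 0.9, size=(W, K, M))

def sample_labels(z):
    """Draw a subtype for one item of class z, then all W worker labels."""
    g = rng.choice(M, p=pi[z])                         # item's subtype
    y = (rng.random(W) < conf[:, z, g]).astype(int)    # independent given (z, g)
    return g, y

g, y = sample_labels(z=1)
print(g, y)
```

Marginalizing g out of this process recovers exactly the mixture-of-products form above.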
20. More details about EBCC
• Mean-field variational Bayes
• Run multiple times, then select the solution with the highest ELBO
• Use Dir(0.1) as the prior for π_k to encourage the model to use fewer subtypes
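A quick illustration of why a Dir(0.1) prior encourages fewer subtypes: Dirichlet samples with concentration below 1 put most of their mass on a few components. The comparison against Dir(10) below is for illustration only, not a setting from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10
# 1000 draws each from a sparse and a dense symmetric Dirichlet.
sparse = rng.dirichlet([0.1] * M, size=1000)   # the prior used for pi_k
dense = rng.dirichlet([10.0] * M, size=1000)

print(sparse.max(axis=1).mean())   # large: one subtype tends to dominate
print(dense.max(axis=1).mean())    # near 1/M: mass spread evenly
```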
23. Synthetic data
• Binary task, 2 subtypes per class
• 5 workers, subtype-level worker accuracies shown in the table
• All workers have labelled all items
• Subtypes evenly distributed
• Worker labels randomly generated according to their confusion matrices

   Truth  Subtype  W1   W2   W3   W4   W5   %
   0      0        0.9  0.9  0.7  0.7  0.7  25%
   0      1        0.1  0.1  0.7  0.7  0.7  25%
   1      0        0.9  0.1  0.7  0.7  0.7  25%
   1      1        0.1  0.9  0.7  0.7  0.7  25%
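The described generation process can be sketched as follows; N = 1000 items is an assumed size, and the accuracies come from the table above.

```python
import numpy as np

rng = np.random.default_rng(0)
# acc[k, m, j]: accuracy of worker j on items of (class k, subtype m).
acc = np.array([
    [[0.9, 0.9, 0.7, 0.7, 0.7], [0.1, 0.1, 0.7, 0.7, 0.7]],
    [[0.9, 0.1, 0.7, 0.7, 0.7], [0.1, 0.9, 0.7, 0.7, 0.7]],
])
N = 1000
z = rng.integers(0, 2, size=N)   # true labels, classes evenly likely
g = rng.integers(0, 2, size=N)   # subtypes, evenly distributed

# correct[i, j]: whether worker j labels item i correctly, per acc[z_i, g_i, j].
correct = rng.random((N, 5)) < acc[z, g]
y = np.where(correct, z[:, None], 1 - z[:, None])   # flip the label on errors
print(y[:3])
```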
24. Results on 17 real-world datasets
17 datasets, coming from three crowdsourcing dataset collections, namely the union of:
• Venanzi et al. (2015, AAAI): 8 datasets
• Zheng et al. (2017, VLDB): 7 datasets
• Zhang et al. (2014, NIPS): 5 datasets
noting that 3 datasets are in common between the last two collections.
10 benchmarks
MV
ZenCrowd (Demartini et al., 2012, WWW)
GLAD (Whitehill et al., 2009, NIPS)
DS (Dawid & Skene, 1979)
Minimax (Zhou et al., 2012, JMLR)
iBCC (Kim & Ghahramani, 2012, AISTATS)
CBCC (Venanzi et al., 2014, WWW)
LFC (Raykar et al., 2010, JMLR)
CATD (Li et al., 2014, VLDB)
CRH (Aydin et al., 2014, AAAI)
25. Results on 17 real-world datasets
• EBCC (M=10) has the highest mean accuracy, 84.5%
• the best existing method, iBCC-MF, reaches 83.4%
• confusion-matrix-based models (DS, iBCC, CBCC, LFC) perform similarly, with mean accuracy within the range [82.9%, 83.4%]
• followed by "1-coin" models, namely CATD (82.8%), GLAD (82.3%), ZC (82.2%)
• Minimax and CRH fail catastrophically on a few datasets