3. We focus on aggregating discrete labels
Templates provided by FigureEight (formerly CrowdFlower)
4. A matrix of worker labels: infer the truth…

   Worker labels    Truth
   T   T   ?          ?
   T   F   ?          ?
   T   ?   T          ?
   T   ?   F          ?
   ?   F   F          ?

('?' marks a label that is unobserved or unknown.)
5. Problem statement
Assume there are W workers who classify N items into K categories.
Let z_i be the latent true label of item i, y_ij the label that worker j assigns to item i, and W_i the set of workers who have labelled item i.
Goal: infer the true labels Z = {z_i : i = 1, …, N} based on the observed worker labels Y = {y_ij : j ∈ W_i, i = 1, …, N}.
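The notation above can be made concrete with a small data-structure sketch; the toy labels below are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical toy instance of the notation: N = 3 items, W = 3 workers,
# K = 2 classes. Y holds only the observed labels y_ij, keyed by (item, worker).
Y = {(0, 0): 1, (0, 1): 1, (1, 1): 0, (1, 2): 1, (2, 0): 0}

# W_i: the set of workers who have labelled item i.
workers_of = defaultdict(set)
for (i, j) in Y:
    workers_of[i].add(j)

print(workers_of[0])  # workers 0 and 1 labelled item 0
```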
11. Independent BCC (iBCC)
Models mainly differ in how p(y_i1, y_i2, …, y_iW | z_i) is parameterized by V.
iBCC (Kim & Ghahramani, 2012) assumes conditional independence between the y_ij's:

   p(y_i1, y_i2, …, y_iW | z_i) = ∏_j p(y_ij | z_i),  where v_jkl = p(y_ij = l | z_i = k)

• Easy to marginalize out unobserved y_ij
• #params in V is O(WK²)
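A minimal sketch of the iBCC likelihood under conditional independence; the values of V and the class prior are assumed toy numbers, not from the slides.

```python
import numpy as np

# V[j, k, l] = p(y_ij = l | z_i = k); three identical toy workers, K = 2.
W, K = 3, 2
V = np.array([[[0.9, 0.1], [0.2, 0.8]]] * W)
prior = np.array([0.5, 0.5])  # p(z_i = k)

def posterior_z(labels, V, prior):
    """p(z_i = k | y_i) ∝ p(z_i = k) · ∏_j p(y_ij | z_i = k).
    Unobserved y_ij (None) are simply skipped, i.e. marginalized out."""
    like = prior.copy()
    for j, l in enumerate(labels):
        if l is not None:
            like = like * V[j, :, l]
    return like / like.sum()

print(posterior_z([1, 1, None], V, prior))  # two votes for class 1
```

Skipping a `None` entry is exactly the "easy to marginalize" property: under conditional independence, the unobserved factor sums to 1.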
12. Dependent BCC (dBCC)
dBCC (Kim & Ghahramani, 2012) uses a Markov random field for p(y_i1, y_i2, …, y_iW | z_i) to capture the correlation between the y_ij's:

   p(y_i1, y_i2, …, y_iW | z_i, V, U) = (1 / C(V, U, z_i)) · exp( Σ_{1≤j<j'≤W} u_jj' y_ij y_ij' + Σ_{j=1}^{W} v_{j z_i} y_ij )

However, it is intractable to marginalize out unobserved y_ij, and #params in V and U is O(W²K²).
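The normalizer C(V, U, z_i) is the source of the intractability: computing it exactly requires summing over all K^W joint label configurations. A brute-force sketch with assumed toy parameters (binary labels coded as ±1, as is common for such pairwise models):

```python
import itertools
import math

W = 4
U = {(j, jp): 0.3 for j in range(W) for jp in range(j + 1, W)}  # u_jj'
v = [0.5] * W                                                   # v_{j z_i}

def energy(y):
    # Pairwise term  Σ_{j<j'} u_jj' y_j y_j'  plus singleton term  Σ_j v_j y_j
    pair = sum(U[j, jp] * y[j] * y[jp] for (j, jp) in U)
    single = sum(v[j] * y[j] for j in range(W))
    return pair + single

# C sums exp(energy) over 2**W configurations: exponential in W.
C = sum(math.exp(energy(y)) for y in itertools.product([-1, 1], repeat=W))
print(C)
```

With W workers and K classes the sum has K^W terms, which is why exact marginalization of unobserved labels is infeasible beyond small W.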
16. There are 2 subtypes for class-1 items

The joint confusion matrix of workers A and B on class-1 items is not well approximated by a single outer product:

   [0.4 0.1; 0.1 0.4] ≉ (0.5, 0.5) ⊗ (0.5, 0.5) = [0.25 0.25; 0.25 0.25]

but it is well approximated by a mixture of two outer products:

   [0.4 0.1; 0.1 0.4] ≈ ½ (0.1, 0.9) ⊗ (0.1, 0.9) + ½ (0.9, 0.1) ⊗ (0.9, 0.1) = [0.41 0.09; 0.09 0.41]

Half of the time, A & B both have 90% accuracy: they are labelling easy items. The other half of the time, both have 10% accuracy: they are labelling hard items.
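The decomposition above can be checked numerically; this sketch reproduces the slide's matrices with NumPy:

```python
import numpy as np

# Joint confusion matrix of workers A and B on class-1 items.
joint = np.array([[0.4, 0.1], [0.1, 0.4]])

# A single outer product cannot reproduce the diagonal correlation…
rank1 = np.outer([0.5, 0.5], [0.5, 0.5])       # 0.25 everywhere

# …but a two-component mixture (easy/hard subtypes) nearly can.
easy = np.outer([0.1, 0.9], [0.1, 0.9])        # both 90% accurate
hard = np.outer([0.9, 0.1], [0.9, 0.1])        # both 10% accurate
rank2 = 0.5 * easy + 0.5 * hard                # [[0.41, 0.09], [0.09, 0.41]]

print(rank1)
print(rank2)
```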
17. Worker A & B have labelled 20 items…

Labels on the 10 class-0 items:

   A  B  Truth
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   0  0  0
   1  0  0
   0  1  0

Labels on the 10 class-1 items:

   A  B  Truth  Subtype
   1  1  1      Easy
   1  1  1      Easy
   1  1  1      Easy
   1  1  1      Easy
   1  0  1      ½ E + ½ H
   0  1  1      ½ E + ½ H
   0  0  1      Hard
   0  0  1      Hard
   0  0  1      Hard
   0  0  1      Hard

Confusion matrix at the class level (rows: truth; columns: worker label):

   Truth   label=0  label=1
   0       0.9      0.1
   1       0.5      0.5

Confusion matrix at the subtype level:

   Truth  Subtype  label=0  label=1
   0      -        0.9      0.1
   1      Easy     0.1      0.9
   1      Hard     0.9      0.1
18. In general, tensor rank decomposition can be used for modelling p(y_i1, y_i2, …, y_iW | z_i)

   p(y_i1, y_i2, …, y_iW | z_i = k) = Σ_m π_km · v⃗_1km ⊗ v⃗_2km ⊗ ⋯ ⊗ v⃗_Wkm

Or, in a probabilistic way,

   p(y_i1, y_i2, …, y_iW | z_i) = Σ_m p(g_i = m | z_i) · ∏_j p(y_ij | z_i, g_i = m)

where m = 1, …, M, M is the number of subtypes per class, and g_i is the subtype of item i under its class z_i.
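A sketch of the corresponding generative process, with assumed toy parameters: sample a subtype g_i given the class, then sample each worker label conditionally independently given (z_i, g_i).

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, W = 2, 2, 5
pi = np.full((K, M), 1.0 / M)              # p(g_i = m | z_i = k), uniform toy prior
# conf[j, k, m] = p(y_ij = 1 | z_i = k, g_i = m); binary labels for simplicity.
conf = rng.uniform(0.1, 0.9, size=(W, K, M))

def sample_labels(z):
    """Draw a subtype for one item of class z, then all W worker labels."""
    g = rng.choice(M, p=pi[z])                         # item's subtype
    y = (rng.random(W) < conf[:, z, g]).astype(int)    # independent given (z, g)
    return g, y

g, y = sample_labels(z=1)
print(g, y)
```

Marginalizing g out of this process recovers exactly the mixture-of-products form above.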
20. More details about EBCC
• Mean-field variational Bayes
• Run multiple times, then select the solution with the highest ELBO
• Use Dir(0.1) as the prior for π_k to encourage the model to use fewer subtypes
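A quick illustration of why a Dir(0.1) prior encourages fewer subtypes: Dirichlet samples with concentration below 1 put most of their mass on a few components. The comparison against Dir(10) below is for illustration only, not a setting from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 10
# 1000 draws each from a sparse and a dense symmetric Dirichlet.
sparse = rng.dirichlet([0.1] * M, size=1000)   # the prior used for pi_k
dense = rng.dirichlet([10.0] * M, size=1000)

print(sparse.max(axis=1).mean())   # large: one subtype tends to dominate
print(dense.max(axis=1).mean())    # near 1/M: mass spread evenly
```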
23. Synthetic data
• Binary task, 2 subtypes per class
• 5 workers, subtype-level worker accuracies shown in the table
• All workers have labelled all items
• Subtypes evenly distributed
• Worker labels randomly generated according to their confusion matrices

   Truth  Subtype  W1   W2   W3   W4   W5   %
   0      0        0.9  0.9  0.7  0.7  0.7  25%
   0      1        0.1  0.1  0.7  0.7  0.7  25%
   1      0        0.9  0.1  0.7  0.7  0.7  25%
   1      1        0.1  0.9  0.7  0.7  0.7  25%
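The described generation process can be sketched as follows; N = 1000 items is an assumed size, and the accuracies come from the table above.

```python
import numpy as np

rng = np.random.default_rng(0)
# acc[k, m, j]: accuracy of worker j on items of (class k, subtype m).
acc = np.array([
    [[0.9, 0.9, 0.7, 0.7, 0.7], [0.1, 0.1, 0.7, 0.7, 0.7]],
    [[0.9, 0.1, 0.7, 0.7, 0.7], [0.1, 0.9, 0.7, 0.7, 0.7]],
])
N = 1000
z = rng.integers(0, 2, size=N)   # true labels, classes evenly likely
g = rng.integers(0, 2, size=N)   # subtypes, evenly distributed

# correct[i, j]: whether worker j labels item i correctly, per acc[z_i, g_i, j].
correct = rng.random((N, 5)) < acc[z, g]
y = np.where(correct, z[:, None], 1 - z[:, None])   # flip the label on errors
print(y[:3])
```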
24. Results on 17 real-world datasets
17 datasets, coming from three crowdsourcing dataset collections, namely the union of:
• Venanzi et al. (2015, AAAI): 8 datasets
• Zheng et al. (2017, VLDB): 7 datasets
• Zhang et al. (2014, NIPS): 5 datasets
noting that 3 datasets are in common between the last two collections.
10 benchmarks
MV
ZenCrowd (Demartini et al., 2012, WWW)
GLAD (Whitehill et al., 2009, NIPS)
DS (Dawid & Skene, 1979)
Minimax (Zhou et al., 2012, JMLR)
iBCC (Kim & Ghahramani, 2012, AISTATS)
CBCC (Venanzi et al., 2014, WWW)
LFC (Raykar et al., 2010, JMLR)
CATD (Li et al., 2014, VLDB)
CRH (Aydin et al., 2014, AAAI)
25. Results on 17 real-world datasets
• EBCC (M=10) has the highest mean accuracy, 84.5%
• the best existing method, iBCC-MF, reaches 83.4%
• confusion-matrix-based models (DS, iBCC, CBCC, LFC) perform similarly, with mean accuracy within the range [82.9%, 83.4%]
• followed by "1-coin" models, namely CATD (82.8%), GLAD (82.3%), ZC (82.2%)
• Minimax and CRH fail catastrophically on a few datasets