[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems
2012/04/08 #NIPSreading
Nakatani Shuyo
2. Crowdsourcing
• Outsourcing work to an undefined public
  – Most workers are not experts
  – Some workers may be spammers
• Amazon Mechanical Turk
  – A large task is split into microtasks
  – Workers earn a few cents per microtask
3. Spammer and Hammer
• Spam/spammer
  – submits arbitrary answers just to collect the fee
• Ham/hammer
  – answers questions correctly
• It is difficult to identify spam and spammers
  – The requester has no gold standard (ground-truth answers)
  – Workers are neither persistent nor identifiable
4. Questions
• How can we ensure the reliability of workers?
  – Is this worker a spammer or a hammer?
• How can we minimize the total cost?
  – The cost is proportional (∝) to the number of task assignments
• How should answers be predicted?
  – Majority voting? The EM algorithm?
• How can we bound the error rate?
  – Derive an upper bound on the error rate
5. Setting
• $t_i$: tasks, $i = 1, \dots, m$
• $w_j$: workers, $j = 1, \dots, n$
• (l, r)-regular bipartite graph (figure: tasks $t_1, \dots, t_m$ on one side, workers $w_1, \dots, w_n$ on the other)
  – Each task is assigned to l workers.
  – Each worker is assigned r tasks.
• Given m and r, how should l be chosen?
  – Since $ml = nr$, the number of workers $n = ml/r$ is then determined.
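One common way to draw a random (l, r)-regular bipartite graph is the configuration model: give each task l half-edges and each worker r half-edges, then match the half-edges uniformly at random. The sketch below illustrates that assumption; the function name `regular_bipartite_graph` is hypothetical, and this plain version may produce repeated task-worker pairs, which a stricter sampler would reject or rewire.

```python
import random


def regular_bipartite_graph(m, l, r, seed=0):
    """Sample an (l, r)-regular bipartite task-worker graph (configuration model).

    Returns (edges, n): edges is a list of (task i, worker j) pairs and n = m*l/r.
    This plain version may contain repeated (i, j) pairs.
    """
    if (m * l) % r != 0:
        raise ValueError("m * l must be divisible by r so that n = m*l/r is an integer")
    n = m * l // r  # each edge has one task end and one worker end: m*l = n*r
    rng = random.Random(seed)
    task_stubs = [i for i in range(m) for _ in range(l)]    # l half-edges per task
    worker_stubs = [j for j in range(n) for _ in range(r)]  # r half-edges per worker
    rng.shuffle(worker_stubs)                                # uniform random matching of half-edges
    return list(zip(task_stubs, worker_stubs)), n


edges, n = regular_bipartite_graph(m=12, l=5, r=3)
```

For example, m = 12 tasks with l = 5 and r = 3 requires n = 12 · 5 / 3 = 20 workers and 60 task assignments.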
6. Model
• $s_i = \pm 1$: correct answer of $t_i$ (unobserved)
• $A_{ij}$: answer of $w_j$ to $t_i$ (observed)
• $p_j = P(A_{ij} = s_i)$ for all $i$: reliability of worker $w_j$
  – assumed to be independent of the task
• $\mathbf{E}[(2p_j - 1)^2] = q$: average quality parameter
  – $q \in [0, 1]$; $q$ close to 1 indicates that most workers are diligent
  – $q$ is set to 0.3 in the later experiment
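The answer model can be simulated directly: each observed $A_{ij}$ equals $s_i$ with probability $p_j$ and $-s_i$ otherwise. The sketch below (helper names `sample_answers` and `quality` are hypothetical) draws answers for a given edge list and computes the empirical counterpart of $q$.

```python
import random


def sample_answers(edges, s, p, seed=0):
    """Draw A_ij for every edge (i, j): A_ij = s_i with probability p_j, else -s_i."""
    rng = random.Random(seed)
    return {(i, j): s[i] if rng.random() < p[j] else -s[i] for i, j in edges}


def quality(p):
    """Empirical version of q = E[(2 p_j - 1)^2] over a list of worker reliabilities."""
    return sum((2 * pj - 1) ** 2 for pj in p) / len(p)


# Worker 0 is always right (p = 1), worker 1 is always wrong (p = 0).
A = sample_answers([(0, 0), (0, 1), (1, 0), (1, 1)], s=[1, -1], p=[1.0, 0.0])
```

Note that $(2p_j - 1)^2$ equals 1 both for $p_j = 1$ and for $p_j = 0$, so $q$ measures how informative workers are rather than how often they are right: a consistently wrong worker becomes useful once their answers are flipped.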
7. Example: spammer-hammer model
• Given $q \in [0, 1]$:
• $p_j = 1$ with probability $q$
  – $w_j$ is a perfect hammer (all answers correct).
• $p_j = 1/2$ with probability $1 - q$
  – $w_j$ is a spammer (random answers).
• Then $\mathbf{E}[(2p_j - 1)^2] = q \times 1 + (1 - q) \times 0 = q$.
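A quick simulation makes the identity $\mathbf{E}[(2p_j - 1)^2] = q$ concrete: each hammer contributes $(2 \cdot 1 - 1)^2 = 1$ and each spammer $(2 \cdot \tfrac12 - 1)^2 = 0$, so the empirical average is just the fraction of hammers. The helper name `spammer_hammer` is hypothetical.

```python
import random


def spammer_hammer(n, q, seed=0):
    """Sample p_j for n workers: 1.0 (hammer) with probability q, else 0.5 (spammer)."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < q else 0.5 for _ in range(n)]


p = spammer_hammer(100_000, q=0.3)
q_hat = sum((2 * pj - 1) ** 2 for pj in p) / len(p)  # hammers add 1, spammers add 0
mu_hat = sum(2 * pj - 1 for pj in p) / len(p)         # mu = E[2 p_j - 1] also equals q here
```

In this model $\mu = \mathbf{E}[2p_j - 1]$ and $q$ coincide, because $2p_j - 1$ only takes the values 1 and 0.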
8. Iterative Inference
• $x_{i\to j}$: real-valued task messages from $t_i$ to $w_j$
• $y_{j\to i}$: worker messages from $w_j$ to $t_i$
(algorithm figure from [Karger+ NIPS11])
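The slide shows the algorithm only as a figure from [Karger+ NIPS11]. The sketch below is one reading of it as message passing on the task-worker graph: each task message $x_{i\to j}$ is a vote on $t_i$ by the other workers, weighted by their messages, and each worker message $y_{j\to i}$ scores how well $w_j$ agrees with its other tasks. The Gaussian N(1, 1) initialization and the fixed iteration count are assumptions of this sketch, the per-round rescaling is added only to avoid floating-point overflow, and the function name `iterative_inference` is hypothetical.

```python
import random
from collections import defaultdict


def iterative_inference(A, n_iter=10, seed=0):
    """Estimate each s_i from answers A[(i, j)] in {+1, -1} by task/worker message passing.

    Update rules assumed in this sketch:
      x_{i->j} = sum of A_{ij'} * y_{j'->i} over workers j' of task i with j' != j
      y_{j->i} = sum of A_{i'j} * x_{i'->j} over tasks i' of worker j with i' != i
    """
    rng = random.Random(seed)
    task_nbrs, worker_nbrs = defaultdict(list), defaultdict(list)
    for i, j in A:
        task_nbrs[i].append(j)
        worker_nbrs[j].append(i)
    # Assumed initialization: worker messages i.i.d. Gaussian with mean 1 and std 1.
    y = {(j, i): rng.gauss(1.0, 1.0) for i, j in A}
    for _ in range(n_iter):
        # Task message: vote on t_i by the other workers, weighted by their messages.
        x = {(i, j): sum(A[i, jj] * y[jj, i] for jj in task_nbrs[i] if jj != j)
             for i, j in A}
        # Worker message: how well w_j agrees with its other tasks' current votes.
        y = {(j, i): sum(A[ii, j] * x[ii, j] for ii in worker_nbrs[j] if ii != i)
             for i, j in A}
        # Rescale by a positive constant to avoid float overflow; no sign changes.
        scale = max(abs(v) for v in y.values()) or 1.0
        y = {e: v / scale for e, v in y.items()}
    # Final estimate: sign of the weighted vote; ties go to +1 (arbitrary choice).
    return {i: 1 if sum(A[i, j] * y[j, i] for j in js) >= 0 else -1
            for i, js in task_nbrs.items()}


# Usage: 50 tasks, each answered correctly by 5 workers arranged on a ring.
s = [1 if i % 3 else -1 for i in range(50)]
A = {(i, (i + d) % 50): s[i] for i in range(50) for d in range(5)}
est = iterative_inference(A)
```

With every worker answering correctly, each message becomes a positive-weighted sum of the initial N(1, 1) values, so the weighted vote recovers every $s_i$ with overwhelming probability. In the spammer-hammer setting the worker messages are meant to grow for hammers and stay comparatively small for spammers.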
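9. Prediction
• Predicted answer:
  $$\hat{s}_i\bigl(\{A_{ij}\}_{(i,j)\in E}\bigr) = \operatorname{sign}\Bigl(\sum_{j\in\partial i} A_{ij}\, y_{j\to i}\Bigr)$$
  – where $\partial i$ is the neighborhood of $t_i$ and $E$ is the edge set of the task-worker graph
• Error rate:
  $$\limsup_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} P\Bigl(s_i \neq \hat{s}_i\bigl(\{A_{ij}\}_{(i,j)\in E}\bigr)\Bigr)$$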
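The prediction rule differs from majority voting only in the weights $y_{j\to i}$. In the toy numbers below (hypothetical values for three workers on one task), two low-weight workers answer +1 and one high-weight worker answers −1: majority voting returns +1, while the weighted rule returns −1. The helper names are hypothetical; `error_rate` computes the finite-m average inside the lim sup.

```python
def weighted_vote(answers, weights):
    """Return sign(sum_j A_ij * y_{j->i}) for one task; ties go to +1 (an arbitrary choice)."""
    total = sum(a * w for a, w in zip(answers, weights))
    return 1 if total >= 0 else -1


def error_rate(s, s_hat):
    """Fraction of tasks with s_i != s_hat_i: the finite-m average inside the lim sup."""
    return sum(a != b for a, b in zip(s, s_hat)) / len(s)


answers = [+1, +1, -1]     # three workers' answers A_ij to one task t_i
weights = [0.1, 0.2, 2.0]  # hypothetical worker messages y_{j->i}
majority = weighted_vote(answers, [1, 1, 1])  # equal weights = majority voting -> +1
weighted = weighted_vote(answers, weights)    # 0.1 + 0.2 - 2.0 < 0 -> -1
```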
10. Performance Guarantee
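11. Theorem 2.1
• For given $l > 1$, $r > 1$ and $q \in [0, 1]$, let $\hat{l} = l - 1$ and $\hat{r} = r - 1$.
• Assume that m tasks are assigned to $n = ml/r$ workers according to an (l, r)-regular bipartite graph.
• Let the estimates $\hat{s}_i$ be computed from k iterations of the iterative algorithm.
• If $\mu \equiv \mathbf{E}[2p_j - 1] > 0$ and $q^2 > 1/(\hat{l}\hat{r})$, then
  $$\limsup_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} P\Bigl(s_i \neq \hat{s}_i\bigl(\{A_{ij}\}_{(i,j)\in E}\bigr)\Bigr) \le e^{-lq/(2\rho_k^2)}$$
  – where …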
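The hypotheses of Theorem 2.1 are easy to check numerically. In the spammer-hammer model $\mu = q \cdot (2 \cdot 1 - 1) + (1 - q) \cdot (2 \cdot \tfrac12 - 1) = q$, so $\mu > 0$ whenever $q > 0$, and the binding condition is $q^2 > 1/(\hat{l}\hat{r})$. The helper names below are hypothetical; `min_degree` assumes $l = r$, as in the experiments.

```python
def theorem_2_1_applies(l, r, q, mu):
    """Check the hypotheses l > 1, r > 1, mu > 0 and q^2 > 1/(l_hat * r_hat)."""
    l_hat, r_hat = l - 1, r - 1
    return l > 1 and r > 1 and mu > 0 and q * q * l_hat * r_hat > 1


def min_degree(q):
    """Smallest l = r for which q^2 (l - 1)^2 > 1 holds (requires q > 0)."""
    if q <= 0:
        raise ValueError("q must be positive")
    l = 2
    while q * q * (l - 1) ** 2 <= 1:
        l += 1
    return l


# Spammer-hammer model with q = 0.3, so mu = 0.3 as well.
print(theorem_2_1_applies(25, 25, q=0.3, mu=0.3))  # True: q^2 l_hat r_hat = 51.84 > 1
print(min_degree(0.3))                              # 5: 0.09 * 16 = 1.44 > 1, but 0.09 * 9 = 0.81
```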
12. Corollary 2.2
• Under the hypotheses of Theorem 2.1,
  $$\limsup_{k\to\infty}\,\limsup_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} P\Bigl(s_i \neq \hat{s}_i\bigl(\{A_{ij}\}_{(i,j)\in E}\bigr)\Bigr) \le e^{-lq/(2\rho_\infty^2)}$$
• where …
  – For $q = 0.3$, $l = r = 25$: r.h.s. = 0.31
  – For $q = 0.5$, $l = 25$, $r = 10$: r.h.s. = 0.15
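The slide text omits the definition of $\rho_\infty$. The sketch below assumes $\rho_\infty^2 = \bigl(3 + 1/(q\hat{r})\bigr) / \bigl(1 - 1/(q^2\hat{l}\hat{r})\bigr)$; this form is an assumption here, checked only in that it reproduces both numbers quoted on the slide. The function name `corollary_bound` is hypothetical.

```python
import math


def corollary_bound(l, r, q):
    """exp(-l q / (2 rho_inf^2)) with the ASSUMED form
    rho_inf^2 = (3 + 1/(q r_hat)) / (1 - 1/(q^2 l_hat r_hat))."""
    l_hat, r_hat = l - 1, r - 1
    if q * q * l_hat * r_hat <= 1:
        raise ValueError("requires q^2 > 1/(l_hat * r_hat)")
    rho_inf_sq = (3 + 1 / (q * r_hat)) / (1 - 1 / (q * q * l_hat * r_hat))
    return math.exp(-l * q / (2 * rho_inf_sq))


print(round(corollary_bound(25, 25, 0.3), 2))  # 0.31
print(round(corollary_bound(25, 10, 0.5), 2))  # 0.15
```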
13. Experiments
• m = n = 1000, l = r
• Left panel: q = 0.3, l ∈ [1, 30]
• Right panel: l = 25, q ∈ [0, 0.4]
(figures from [Karger+ NIPS11])
14. Lower Bound