- 1. [Karger+] Iterative Learning for Reliable Crowdsourcing Systems (2012/04/08, #NIPSreading, Nakatani Shuyo)
- 2. Crowdsourcing
  - Outsource tasks to an undefined public
    - Most workers are not experts.
    - Some workers may be spammers.
  - Amazon Mechanical Turk
    - A large task is split into microtasks.
    - Workers earn a few cents per microtask.
- 3. Spammer and Hammer
  - Spam/spammer: submitting arbitrary answers to collect the fee
  - Ham/hammer: answering questions correctly
  - It is difficult to identify spam and spammers:
    - The requester does not have a gold standard.
    - Workers are neither persistent nor identifiable.
- 4. Questions
  - How can we ensure the reliability of workers?
    - Is this worker a spammer or a hammer?
  - How can we minimize the total cost?
    - The cost is proportional to the number of task assignments.
  - How can we predict the answers?
    - Majority voting? EMA?
  - How can we estimate an upper bound on the error rate?
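- 5. Setting
  - $t_i$: tasks, $i = 1, \dots, m$
  - $w_j$: workers, $j = 1, \dots, n$
  - $(l, r)$-regular bipartite graph
    - Each task is assigned to $l$ workers.
    - Each worker is assigned $r$ tasks.
  - Given $m$ and $r$, how should $l$ be chosen?
    - Since $ml = nr$, choosing $l$ fixes $n = ml/r$.
  - [Figure: bipartite graph with tasks $t_1, t_2, t_3, \dots, t_m$ on one side and workers $w_1, w_2, w_3, \dots, w_n$ on the other]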
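To make the $(l, r)$-regular assignment concrete, here is a minimal sketch that pairs $ml$ task "half-edges" with $nr$ worker "half-edges" uniformly at random (a configuration-model construction). The function name `regular_bipartite_edges` and the random pairing are our own illustrative assumptions, not code from the paper; this pairing can occasionally produce the same (task, worker) pair twice, which a real deployment would resample.

```python
import random

def regular_bipartite_edges(m, l, r, seed=0):
    """Hypothetical helper: assign m tasks to n = m*l/r workers so that every
    task gets l workers and every worker gets r tasks.

    Assumption: configuration-model pairing of half-edges, which may contain
    repeated (task, worker) pairs.
    """
    if (m * l) % r != 0:
        raise ValueError("m*l must be divisible by r so that n = m*l/r is an integer")
    n = m * l // r
    rng = random.Random(seed)
    task_stubs = [i for i in range(m) for _ in range(l)]    # task i appears l times
    worker_stubs = [j for j in range(n) for _ in range(r)]  # worker j appears r times
    rng.shuffle(worker_stubs)                               # random matching of stubs
    return n, list(zip(task_stubs, worker_stubs))           # m*l == n*r edges

n, edges = regular_bipartite_edges(m=1000, l=25, r=25)
print(n, len(edges))  # 1000 25000
```

The divisibility check mirrors the slide's remark that $ml = nr$ determines $n$ once $m$, $l$ and $r$ are fixed.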
- 6. Model
  - $s_i = \pm 1$: correct answer of $t_i$ (unobserved)
  - $A_{ij}$: answer of $w_j$ to $t_i$ (observed)
  - $p_j = P(A_{ij} = s_i)$ for all $i$: reliability of worker $w_j$
    - The reliability is assumed to be independent of the task.
  - $q \equiv \mathbf{E}[(2p_j - 1)^2]$: average quality parameter
    - $q \in (0, 1]$; a value close to 1 indicates that most workers are diligent.
    - $q$ is set to 0.3 in the later experiment.
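The generative model above can be simulated directly. The sketch below assumes the hidden labels $s_i$ are drawn uniformly from $\{+1, -1\}$ and stores the observed answers $A_{ij}$ in a dict keyed by the (task, worker) edge; both choices, and the names `sample_answers` and `quality`, are illustrative assumptions rather than anything specified on the slide.

```python
import random

def sample_answers(edges, p, m, seed=0):
    """Hypothetical simulator for the slide's model.

    Worker j answers task i correctly with probability p[j], independently of
    the task. Assumption: s_i is uniform on {+1, -1}.
    """
    rng = random.Random(seed)
    s = [rng.choice((1, -1)) for _ in range(m)]   # unobserved correct answers
    A = {}
    for i, j in edges:
        correct = rng.random() < p[j]             # P(A_ij = s_i) = p_j
        A[(i, j)] = s[i] if correct else -s[i]
    return s, A

def quality(p):
    """Empirical q = E[(2 p_j - 1)^2] over a list of worker reliabilities."""
    return sum((2 * pj - 1) ** 2 for pj in p) / len(p)

edges = [(0, 0), (0, 1), (1, 0), (1, 1)]
s, A = sample_answers(edges, p=[1.0, 0.5], m=2)
print(quality([1.0, 0.5]))  # 0.5
```

A perfectly reliable worker ($p_j = 1$) contributes 1 to $q$, and a coin-flipping worker ($p_j = 1/2$) contributes 0.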
- 7. Example: spammer-hammer model
  - For a given $q \in (0, 1]$:
  - $p_j = 1$ with probability $q$
    - $w_j$ is a perfect hammer (all answers correct).
  - $p_j = 1/2$ with probability $1 - q$
    - $w_j$ is a spammer (random answers).
  - Then $\mathbf{E}[(2p_j - 1)^2] = q \times 1 + (1 - q) \times 0 = q$.
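The arithmetic $q \times 1 + (1 - q) \times 0 = q$ can be checked numerically by sampling a large worker pool. This is a minimal sketch; the function name `spammer_hammer` and the pool size are our own choices.

```python
import random

def spammer_hammer(n, q, seed=0):
    """Sample reliabilities p_j: 1.0 (hammer) with probability q, else 0.5 (spammer)."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < q else 0.5 for _ in range(n)]

p = spammer_hammer(n=100000, q=0.3)
# Hammers contribute (2*1 - 1)^2 = 1 and spammers (2*0.5 - 1)^2 = 0.
q_hat = sum((2 * pj - 1) ** 2 for pj in p) / len(p)
print(round(q_hat, 3))  # should be close to 0.3
```

Note that in this model $\mathbf{E}[2p_j - 1] = q$ as well, since $2p_j - 1$ is either 1 or 0.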
- 8. Iterative Inference
  - $x_{i\to j}$: real-valued task messages from $t_i$ to $w_j$
  - $y_{j\to i}$: worker messages from $w_j$ to $t_i$
  - [Figure: the iterative algorithm, from [Karger+ NIPS11]]
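Since the algorithm itself appears only as a figure, here is a sketch of the message passing based on our reading of [Karger+ NIPS11]: each task message is a leave-one-out weighted vote, each worker message measures how well $w_j$ agrees with the other tasks' current estimates, and worker messages start as i.i.d. $N(1, 1)$ values. The function name, the dict representation, and the per-round rescaling (added only to avoid floating-point overflow; it does not change any sign) are our own assumptions.

```python
import random

def iterative_inference(A, k_max=10, seed=0):
    """Message-passing sketch (our reading of [Karger+ NIPS11]).

    A maps (task i, worker j) -> answer in {+1, -1}.
    x_{i->j} = sum over j' in d(i), j' != j, of A[i, j'] * y_{j'->i}
    y_{j->i} = sum over i' in d(j), i' != i, of A[i', j] * x_{i'->j}
    Returns {i: sum_{j in d(i)} A[i, j] * y_{j->i}}.
    """
    rng = random.Random(seed)
    d_task, d_worker = {}, {}
    for i, j in A:
        d_task.setdefault(i, []).append(j)    # d(i): workers who answered t_i
        d_worker.setdefault(j, []).append(i)  # d(j): tasks answered by w_j
    y = {(j, i): rng.gauss(1.0, 1.0) for i, j in A}  # initial worker messages
    for _ in range(k_max):
        x = {}
        for i, js in d_task.items():
            total = sum(A[i, j] * y[j, i] for j in js)
            for j in js:
                x[i, j] = total - A[i, j] * y[j, i]  # exclude w_j's own message
        y = {}
        for j, tasks in d_worker.items():
            total = sum(A[i, j] * x[i, j] for i in tasks)
            for i in tasks:
                y[j, i] = total - A[i, j] * x[i, j]  # exclude t_i's own message
        # Rescale by a positive constant: avoids overflow, keeps every sign.
        scale = max(abs(v) for v in y.values()) or 1.0
        y = {e: v / scale for e, v in y.items()}
    return {i: sum(A[i, j] * y[j, i] for j in js) for i, js in d_task.items()}
```

Taking the sign of each returned statistic gives the predicted answer of the next slide. Because messages exclude the recipient's own contribution, a worker's weight for task $t_i$ is not inflated by its own answer to $t_i$.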
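- 9. Prediction
  - Predicted answer: $\hat{s}_i\big(\{A_{ij}\}_{(i,j)\in E}\big) = \operatorname{sign}\Big(\sum_{j\in\partial i} A_{ij}\, y_{j\to i}\Big)$
    - where $\partial i$ is the neighborhood of $t_i$
  - Error rate: $\limsup_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} P\big(s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j)\in E})\big)$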
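The prediction rule and an empirical version of the error rate are easy to write down. Below is a minimal sketch; breaking ties ($\sum_j A_{ij} y_{j\to i} = 0$) towards $+1$ is our own assumption. Majority voting, one of the options raised on slide 4, is the special case where every worker message $y_{j\to i}$ equals 1.

```python
def predict(stat):
    """s_hat_i = sign(stat_i); ties broken towards +1 (an assumption)."""
    return {i: 1 if v >= 0 else -1 for i, v in stat.items()}

def majority_vote(A):
    """Baseline: the prediction rule with every y_{j->i} = 1."""
    votes = {}
    for (i, _j), a in A.items():
        votes[i] = votes.get(i, 0) + a
    return predict(votes)

def error_rate(s, s_hat):
    """Empirical counterpart of (1/m) * sum_i P(s_i != s_hat_i)."""
    return sum(1 for i, si in enumerate(s) if s_hat[i] != si) / len(s)

A = {(0, 0): 1, (0, 1): 1, (0, 2): -1, (1, 0): -1, (1, 1): -1, (1, 2): 1}
s_hat = majority_vote(A)
print(s_hat)                       # {0: 1, 1: -1}
print(error_rate([1, -1], s_hat))  # 0.0
```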
- 10. Performance Guarantee
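- 11. Theorem 2.1
  - For given $l > 1$, $r > 1$ and $q \in (0, 1]$, let $\hat{l} = l - 1$ and $\hat{r} = r - 1$.
  - Assume $m$ tasks are assigned to $n = ml/r$ workers according to an $(l, r)$-regular bipartite graph.
  - Let $\hat{s}_i$ be the estimate obtained from $k$ iterations of the iterative algorithm.
  - If $\mu \equiv \mathbf{E}[2p_j - 1] > 0$ and $q^2 > 1/(\hat{l}\hat{r})$, then
    $$\limsup_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} P\big(s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j)\in E})\big) \le e^{-lq/(2\rho_k^2)}$$
    - where $\rho_k$ is as defined in [Karger+ NIPS11].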
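The hypotheses of Theorem 2.1 reduce to a simple check. This sketch (the function name is hypothetical) shows, for instance, that with $q = 0.3$ the condition $q^2 > 1/(\hat{l}\hat{r})$ needs $\hat{l}\hat{r} > 1/0.09 \approx 11.1$, so $l = r = 4$ is not enough while $l = r = 5$ is. For the spammer-hammer model, $\mu = \mathbf{E}[2p_j - 1] = q \times 1 + (1 - q) \times 0 = q$.

```python
def theorem_2_1_applies(l, r, q, mu):
    """Check l > 1, r > 1, q in (0, 1], mu > 0 and q^2 > 1 / (l_hat * r_hat)."""
    # The short-circuit on l > 1 and r > 1 also avoids dividing by zero below.
    if not (l > 1 and r > 1 and 0 < q <= 1 and mu > 0):
        return False
    l_hat, r_hat = l - 1, r - 1
    return q * q > 1 / (l_hat * r_hat)

# Spammer-hammer model with q = 0.3, so mu = q.
print(theorem_2_1_applies(25, 25, 0.3, 0.3))  # True
print(theorem_2_1_applies(4, 4, 0.3, 0.3))    # False: 0.09 < 1/9
```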
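- 12. Corollary 2.2
  - Under the hypotheses of Theorem 2.1,
    $$\limsup_{k\to\infty}\limsup_{m\to\infty} \frac{1}{m}\sum_{i=1}^{m} P\big(s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j)\in E})\big) \le e^{-lq/(2\rho_\infty^2)}$$
  - where $\rho_\infty$ is as defined in [Karger+ NIPS11]. For example:
    - For $q = 0.3$ and $l = r = 25$, the r.h.s. is 0.31.
    - For $q = 0.5$, $l = 25$ and $r = 10$, the r.h.s. is 0.15.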
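The corollary's right-hand side can be evaluated numerically. The slide gives $\rho_\infty$ only as an image, so the sketch below assumes $\rho_\infty^2 = \big(3 + \frac{1}{q\hat{r}}\big)\frac{q^2\hat{l}\hat{r}}{q^2\hat{l}\hat{r} - 1}$, which is our reading of [Karger+ NIPS11]. Treat it as an assumption: its main support here is that it reproduces both numerical examples on the slide.

```python
import math

def corollary_2_2_bound(l, r, q):
    """Evaluate exp(-l q / (2 rho_inf^2)).

    ASSUMPTION: rho_inf^2 = (3 + 1/(q r_hat)) * g / (g - 1) with
    g = q^2 l_hat r_hat, our reading of [Karger+ NIPS11].
    """
    l_hat, r_hat = l - 1, r - 1
    g = q * q * l_hat * r_hat              # Theorem 2.1 requires g > 1
    if g <= 1:
        raise ValueError("requires q^2 > 1 / (l_hat * r_hat)")
    rho_inf_sq = (3 + 1 / (q * r_hat)) * g / (g - 1)
    return math.exp(-l * q / (2 * rho_inf_sq))

print(round(corollary_2_2_bound(25, 25, 0.3), 2))  # 0.31
print(round(corollary_2_2_bound(25, 10, 0.5), 2))  # 0.15
```

With this form, the bound decays exponentially in $lq$: doubling the number of workers per task roughly squares the bound once $q^2\hat{l}\hat{r}$ is well above 1.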
- 13. Experiments
  - $m = n = 1000$, $l = r$
  - Left: $q = 0.3$, $l \in [1, 30]$
  - Right: $l = 25$, $q \in [0, 0.4]$
  - [Figures from [Karger+ NIPS11]]
- 14. Lower Bound
