[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems


Shuyo Nakatani


[Karger+] Iterative Learning for
Reliable Crowdsourcing Systems

2012/04/08 #NIPSreading
Nakatani Shuyo

Crowdsourcing
• Outsource tasks to an unspecified public crowd
– Most workers are not experts
– Some workers may be SPAMMERs
• Amazon Mechanical Turk
– Separates a large task into microtasks
– Workers earn a few cents per microtask

Spammer and Hammer
• Spam/Spammer
– submits arbitrary answers to collect the fee
• Ham/Hammer
– answers questions correctly
• It is difficult to distinguish spammers from hammers
– The requester doesn't have a gold standard
– Workers are neither persistent nor identifiable

Questions
• How to ensure the reliability of workers
– Is this worker a spammer or a hammer?
• How to minimize the total price
– ∝ number of task assignments
• How to predict answers
– majority voting? EM algorithm?
• How to estimate an upper bound on the error rate

Setting
• $t_i$: tasks, $i = 1, \dots, m$
• $w_j$: workers, $j = 1, \dots, n$
• $(l, r)$-regular bipartite graph between tasks $t_1, t_2, t_3, \dots, t_m$ and workers $w_1, w_2, w_3, \dots, w_n$
– Each task is assigned to $l$ workers.
– Each worker is assigned $r$ tasks.
• Given $m$ and $r$, how to select $l$?
– Since $ml = nr$, $n = \frac{ml}{r}$ is then determined.
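The assignment structure above can be sketched in Python. This is a minimal illustration, assuming a configuration-model style pairing of "stubs"; the function name `random_regular_bipartite` and the `seed` parameter are hypothetical, and the pairing may produce repeated task-worker pairs, which a careful implementation would resample.

```python
import random

def random_regular_bipartite(m, l, r, seed=0):
    """Pair m tasks (degree l) with n = m*l/r workers (degree r).

    Returns (n, edges) where edges is a list of (task, worker) pairs.
    Repeated pairs are possible in this simple sketch.
    """
    if (m * l) % r != 0:
        raise ValueError("m * l must be divisible by r so that n is an integer")
    n = m * l // r
    rng = random.Random(seed)
    # Each task contributes l stubs, each worker contributes r stubs.
    task_stubs = [i for i in range(m) for _ in range(l)]
    worker_stubs = [j for j in range(n) for _ in range(r)]
    # A uniformly random matching of the two stub lists gives the graph.
    rng.shuffle(worker_stubs)
    edges = list(zip(task_stubs, worker_stubs))
    return n, edges

n, edges = random_regular_bipartite(m=1000, l=25, r=25)
print(n, len(edges))  # 1000 25000
```

Since the total number of assignments is $ml$, the price in the sketch is simply `len(edges)`.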

Model
• $s_i = \pm 1$: correct answer of $t_i$ (unobserved)
• $A_{ij}$: answer of $w_j$ to $t_i$ (observed)
• $p_j = P(A_{ij} = s_i)$ for $\forall i$: reliability of worker $w_j$
– Assumed to be independent of the task
• $\mathbf{E}[(2p_j - 1)^2] = q$: average quality parameter
– $q \in [0, 1]$; $q$ close to 1 indicates that most workers are diligent
– $q$ is set to 0.3 in the later experiment

Example: spammer-hammer model
• For a given $q \in [0, 1]$,
• $p_j = 1$ with probability $q$
– $w_j$ is a perfect hammer (all answers correct).
• $p_j = 1/2$ with probability $1 - q$
– $w_j$ is a spammer (random answers).
• Then $\mathbf{E}[(2p_j - 1)^2] = q \times 1 + (1 - q) \times 0 = q$
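The identity above can be checked numerically. The following is a small sketch, assuming workers are drawn independently; the helper names `sample_reliabilities` and `empirical_quality` are hypothetical.

```python
import random

def sample_reliabilities(n, q, seed=0):
    """Draw p_j for n workers: 1.0 (hammer) w.p. q, 0.5 (spammer) otherwise."""
    rng = random.Random(seed)
    return [1.0 if rng.random() < q else 0.5 for _ in range(n)]

def empirical_quality(p):
    """Sample average of (2 p_j - 1)^2, an estimate of q."""
    return sum((2 * pj - 1) ** 2 for pj in p) / len(p)

p = sample_reliabilities(100_000, q=0.3)
print(empirical_quality(p))  # close to 0.3, up to sampling noise
```

In this model $(2p_j - 1)^2$ is 1 for a hammer and 0 for a spammer, so the estimate is just the fraction of hammers.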

Iterative Inference
• $x_{i \to j}$: real-valued task messages from $t_i$ to $w_j$
• $y_{j \to i}$: worker messages from $w_j$ to $t_i$
(Figure: the iterative algorithm, from [Karger+ NIPS11])
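A sketch of how the two kinds of messages are updated, assuming the form of the update rules given in [Karger+ NIPS11]. The iteration index $(k)$, the Gaussian initialization, and the notation $\partial j$ for the tasks assigned to $w_j$ are assumptions that do not appear in the text of this slide.

```latex
\begin{align*}
y_{j \to i}^{(0)} &\sim \mathcal{N}(1, 1) \\
x_{i \to j}^{(k)} &= \sum_{j' \in \partial i \setminus j} A_{ij'} \, y_{j' \to i}^{(k-1)} \\
y_{j \to i}^{(k)} &= \sum_{i' \in \partial j \setminus i} A_{i'j} \, x_{i' \to j}^{(k)}
\end{align*}
```

Each message excludes the contribution of its own recipient, so a worker's weight on a task is built only from that worker's agreement with others on different tasks.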

Prediction
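• predicted answer:
$$\hat{s}_i\left(\{A_{ij}\}_{(i,j) \in E}\right) = \mathrm{sign}\left(\sum_{j \in \partial i} A_{ij} \, y_{j \to i}\right)$$
– where $\partial i$: neighborhood of $t_i$
• error rate:
$$\limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} P\left(s_i \neq \hat{s}_i\left(\{A_{ij}\}_{(i,j) \in E}\right)\right)$$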
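The prediction rule can be tried on simulated data. Below is a minimal sketch, not the authors' implementation: it assumes message updates that sum over all other neighbors, a $\mathcal{N}(1, 1)$ initialization, and a rescaling step added only for numerical stability. The names `iterative_inference`, `majority_vote`, and `simulate` are hypothetical, and the simulation uses the spammer-hammer model with $n = m$.

```python
import random

def iterative_inference(m, n, edges, A, k_max=10, seed=0):
    """Predict s_i in {+1, -1} from answers A[e] given on edges[e] = (task, worker)."""
    rng = random.Random(seed)
    task_edges = [[] for _ in range(m)]
    worker_edges = [[] for _ in range(n)]
    for e, (i, j) in enumerate(edges):
        task_edges[i].append(e)
        worker_edges[j].append(e)
    # y[e] holds the worker message y_{j->i}; N(1, 1) start is an assumption.
    y = [rng.gauss(1.0, 1.0) for _ in edges]
    x = [0.0] * len(edges)
    for _ in range(k_max):
        # Task update: x_{i->j} sums A * y over the task's other edges.
        for es in task_edges:
            total = sum(A[e] * y[e] for e in es)
            for e in es:
                x[e] = total - A[e] * y[e]
        # Worker update: y_{j->i} sums A * x over the worker's other edges.
        for es in worker_edges:
            total = sum(A[e] * x[e] for e in es)
            for e in es:
                y[e] = total - A[e] * x[e]
        # Rescaling by a positive constant keeps every sign unchanged.
        scale = max(abs(v) for v in y) or 1.0
        y = [v / scale for v in y]
    # Final decision: sign of the weighted vote over the whole neighborhood.
    return [1 if sum(A[e] * y[e] for e in task_edges[i]) >= 0 else -1
            for i in range(m)]

def majority_vote(m, edges, A):
    """Baseline: unweighted vote per task (ties resolved as +1)."""
    votes = [0] * m
    for (i, _), a in zip(edges, A):
        votes[i] += a
    return [1 if v >= 0 else -1 for v in votes]

def simulate(m, l, q, seed=0):
    """Spammer-hammer answers on a random (l, l)-regular multigraph, n = m."""
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(m)]
    hammer = [rng.random() < q for _ in range(m)]
    edges = []
    for _ in range(l):
        perm = list(range(m))
        rng.shuffle(perm)  # one random perfect matching per round
        edges.extend((i, perm[i]) for i in range(m))
    A = [s[i] if hammer[j] else rng.choice((-1, 1)) for i, j in edges]
    return s, edges, A

def error_rate(s, s_hat):
    return sum(a != b for a, b in zip(s, s_hat)) / len(s)

s, edges, A = simulate(m=1000, l=15, q=0.3)
print("iterative:", error_rate(s, iterative_inference(1000, 1000, edges, A)))
print("majority: ", error_rate(s, majority_vote(1000, edges, A)))
```

The empirical error rates printed here depend on the seed and on the finite $m$, so they are not directly comparable with the asymptotic quantity defined above.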

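Performance Guarantee

Theorem 2.1
• For given $l > 1$, $r > 1$, $q \in [0, 1]$, let $\hat{l} = l - 1$, $\hat{r} = r - 1$.
• Assume $m$ tasks are assigned to $n = ml/r$ workers according to an $(l, r)$-regular bipartite graph
• Estimate from $k$ iterations of the iterative algorithm
• If $\mu \equiv \mathbf{E}[2p_j - 1] > 0$ and $q^2 > 1/(\hat{l}\hat{r})$, then
$$\limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} P\left(s_i \neq \hat{s}_i\left(\{A_{ij}\}_{(i,j) \in E}\right)\right) \leq e^{-\frac{lq}{2\rho_k^2}}$$
– where $\rho_k^2$ is as defined in [Karger+ NIPS11]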
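The hypothesis $q^2 > 1/(\hat{l}\hat{r})$ is easy to check for concrete parameters. A small sketch follows; the function names `satisfies_condition` and `smallest_l` are hypothetical, and the search assumes $l = r$ as in the later experiment.

```python
def satisfies_condition(l, r, q):
    """True if q^2 > 1 / ((l - 1) * (r - 1)), written without division."""
    return q * q * (l - 1) * (r - 1) > 1

def smallest_l(q, l_max=1000):
    """Smallest l = r (at least 2) for which the condition holds, or None."""
    for l in range(2, l_max + 1):
        if satisfies_condition(l, l, q):
            return l
    return None

print(satisfies_condition(25, 25, 0.3))  # True
print(smallest_l(0.3))                   # 5
```

For $q = 0.3$ and $l = r$, the condition reduces to $\hat{l} > 1/q \approx 3.33$, hence $l \geq 5$.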

Corollary 2.2
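• Under the hypotheses of Theorem 2.1,
$$\limsup_{k \to \infty} \limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} P\left(s_i \neq \hat{s}_i\left(\{A_{ij}\}_{(i,j) \in E}\right)\right) \leq e^{-\frac{lq}{2\rho_\infty^2}}$$
• where $\rho_\infty^2$ is as defined in [Karger+ NIPS11]
– For $q = 0.3$, $l = r = 25$, the r.h.s. is 0.31
– For $q = 0.5$, $l = 25$, $r = 10$, the r.h.s. is 0.15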
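The two numerical values above can be reproduced with a short script. This is a sketch under an assumption: the text here does not give the definition of $\rho_\infty^2$, so the code uses the limiting form $\left(3 + \frac{1}{q\hat{r}}\right)\frac{q^2\hat{l}\hat{r}}{q^2\hat{l}\hat{r} - 1}$, taken to be the one in [Karger+ NIPS11]. The function names are hypothetical.

```python
import math

def rho2_inf(l, r, q):
    """Assumed limiting variance term: (3 + 1/(q r_hat)) * g / (g - 1),
    with g = q^2 * l_hat * r_hat, l_hat = l - 1, r_hat = r - 1."""
    l_hat, r_hat = l - 1, r - 1
    g = q * q * l_hat * r_hat
    if g <= 1:
        raise ValueError("requires q^2 > 1 / (l_hat * r_hat)")
    return (3 + 1 / (q * r_hat)) * g / (g - 1)

def error_bound_inf(l, r, q):
    """Right-hand side exp(-l q / (2 rho_inf^2)) of the corollary."""
    return math.exp(-l * q / (2 * rho2_inf(l, r, q)))

print(round(error_bound_inf(25, 25, 0.3), 2))  # 0.31
print(round(error_bound_inf(25, 10, 0.5), 2))  # 0.15
```

That the assumed formula matches both values quoted on the slide is a consistency check, not a proof that it is the exact definition used there.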

Experiments
• $m = n = 1000$, $l = r$
• left: $q = 0.3$, $l \in [1, 30]$
• right: $l = 25$, $q \in [0, 0.4]$
(Figures from [Karger+ NIPS11])
