Quizz: Targeted Crowdsourcing with a Billion (Potential) Users

We describe Quizz, a gamified crowdsourcing system that simultaneously assesses the knowledge of users and acquires new knowledge from them. Quizz operates by asking users to complete short quizzes on specific topics; as a user answers the quiz questions, Quizz estimates the user’s competence. To acquire new knowledge, Quizz also incorporates questions for which we do not have a known answer; the answers given by competent users provide useful signals for selecting the correct answers for these questions. Quizz actively tries to identify knowledgeable users on the Internet by running advertising campaigns, effectively leveraging “for free” the targeting capabilities of existing, publicly available, ad placement services. Quizz quantifies the contributions of the users using information theory and sends feedback to the advertising system about each user. The feedback allows the ad targeting mechanism to further optimize ad placement.

Our experiments, which involve over ten thousand users, confirm that we can crowdsource knowledge curation for niche and specialized topics, as the advertising network can automatically identify users with the desired expertise and interest in the given topic. We present controlled experiments that examine the effect of various incentive mechanisms, highlighting the need for having short-term rewards as goals, which incentivize the users to contribute. Finally, our cost-quality analysis indicates that the cost of our approach is below that of hiring workers through paid-crowdsourcing platforms, while offering the additional advantage of giving access to billions of potential users all over the planet, and being able to reach users with specialized expertise that is not typically available through existing labor marketplaces.


  1. Targeted Crowdsourcing with a Billion (Potential) Users
     Panos Ipeirotis, Stern School of Business, New York University
     Evgeniy Gabrilovich, Google
     Work done while on sabbatical at Google
  2. The sabbatical mission…
     “We have a billion users… leverage their knowledge…”
  3. Knowledge Graph: Things, not Strings
  4. Knowledge Graph: Things, not Strings
  5. Still incomplete…
     • “Date of birth of Bayes” (…uncertain…)
     • “Symptom of strep throat”
     • “Side effects of treximet”
     • “Who is Cristiano Ronaldo dating”
     • “When is Jay Z playing in New York”
     • “What is the customer service number for Google”
     • …
  6. The sabbatical mission…
     “Let’s create a new crowdsourcing system…”
     “We have a billion users… leverage their knowledge…”
  7. Ideally…
  8. But often…
  9. The common solution…
  10. Volunteers vs. hired workers
      • Hired workers provide predictability
      • …but have a monetary cost
      • …are motivated by money (spam, misrepresented qualifications…)
      • …extrinsic rewards crowd out intrinsic motivation
      • …do not always have the knowledge
      • Volunteers cost less (…)
      • …but their success is difficult to predict
  11. Key Challenge
      “Crowdsource in a predictable manner,
      with knowledgeable users,
      without introducing monetary rewards”
  12. Quizz
  13. Calibration vs. Collection
      • Calibration questions (known answer): evaluate user competence on the topic at hand
      • Collection questions (unknown answer): ask about things we do not know
      • Trust answers from competent users more
      Tradeoff: learning more about user quality vs. getting answers
      (technical solution: use a Markov Decision Process)
  14. Model: Markov Decision Process
      State [a, b, c]: a correct and b incorrect calibration answers, c collection answers; value V = c·IG[a,b]
      • Explore (calibration question):
        – user correct → [a+1, b, c], V = c·IG[a+1,b]
        – user incorrect → [a, b+1, c], V = c·IG[a,b+1]
      • Exploit (collection question) → [a, b, c+1], V = (c+1)·IG[a,b]
      • After each question, the user continues or drops out
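The explore/exploit decision above can be sketched as a one-step greedy lookahead. This is a minimal illustration, not the paper's solver (the actual system solves the full MDP); the 4-choice questions, uniform prior, and point-estimate information gain are assumptions of mine.

```python
import math


def info_gain(q, n):
    """Bits per answer from a user with accuracy q on n-choice questions:
    log2(n) minus the entropy of the user's answer distribution
    (probability q on the correct choice, the rest spread uniformly)."""
    def h(p):
        return 0.0 if p <= 0 else -p * math.log2(p)
    return math.log2(n) - (h(q) + (n - 1) * h((1 - q) / (n - 1)))


def quality(a, b):
    """Posterior mean of user quality under a uniform prior:
    a correct and b incorrect calibration answers give Beta(a+1, b+1)."""
    return (a + 1) / (a + b + 2)


def next_question(a, b, c, n=4):
    """One-step lookahead over the slide's transitions.

    Exploit: ask a collection question -> [a, b, c+1], value (c+1)*IG[a,b].
    Explore: ask a calibration question; with probability q the user is
    correct -> [a+1, b, c] worth c*IG[a+1,b], else -> [a, b+1, c]
    worth c*IG[a,b+1]."""
    q = quality(a, b)
    exploit = (c + 1) * info_gain(q, n)
    explore = (q * c * info_gain(quality(a + 1, b), n)
               + (1 - q) * c * info_gain(quality(a, b + 1), n))
    return ("exploit", exploit) if exploit >= explore else ("explore", explore)
```

With no collection answers banked (c = 0), exploring has zero value and the sketch always exploits; once enough collection answers depend on the quality estimate, learning more about q pays off and it explores.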
  15. Challenges
      • Why would anyone come and play this game?
      • Why would knowledgeable users come?
      • Wouldn’t it be simpler to just pay?
  16. Attracting Visitors: Ad Campaigns
  17. Running Ad Campaigns: Objectives
      • We want to attract good users, not just clicks
      • We do not want to think hard about keyword selection, appropriate ad text, etc.
      • We want automation across thousands of topics (from treatment side effects to celebrity dating)
  18. Solution: Treat Quizz as an eCommerce Site
  19. Solution: Treat Quizz as an eCommerce Site
      Feedback: value of a click
  20. User Value: Information Gain
      • Value of a user: total information contributed
      • Information gain is additive: #questions × infogain per question
      • Information gain for a question with n choices and user quality q:
        – Random user (q = 1/n) → IG(q, n) = 0
        – Perfect user (q = 1) → IG(q, n) = log(n)
      • A Bayesian version accommodates uncertainty about q
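The bullet points pin the formula down. A sketch of the point-estimate (non-Bayesian) version, assuming the user's answer puts probability q on the correct choice and spreads 1−q uniformly over the other n−1 choices:

```python
import math


def info_gain(q, n):
    """Bits contributed per answer by a user with accuracy q on n-choice
    questions: the prior entropy log2(n) minus the entropy of the user's
    answer distribution (prob q on the correct choice, (1-q)/(n-1) on
    each of the others)."""
    def h(p):
        return 0.0 if p <= 0 else -p * math.log2(p)
    return math.log2(n) - (h(q) + (n - 1) * h((1 - q) / (n - 1)))


# Sanity checks matching the slide (4-choice questions):
# a random guesser (q = 1/4) contributes 0 bits,
# a perfect user (q = 1) contributes log2(4) = 2 bits.
random_user = info_gain(0.25, 4)
perfect_user = info_gain(1.0, 4)
```

The slide's Bayesian version would integrate this quantity over the posterior of q rather than plugging in a point estimate.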
  21. How to measure quality?
      • Naïve (and unstable) approach: q = correct/total
      • Bayesian approach: q is latent, with a uniform prior
      • Then q follows a Beta(a, b) distribution (a: correct, b: incorrect)
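A minimal sketch of the Bayesian estimate, assuming the slide's Beta(a, b) is shorthand for the posterior obtained by updating a uniform Beta(1, 1) prior:

```python
def quality_estimate(correct, incorrect):
    """Posterior mean of the latent quality q. With a uniform Beta(1,1)
    prior, observing `correct` right and `incorrect` wrong calibration
    answers gives q ~ Beta(correct+1, incorrect+1), whose mean is
    (correct+1) / (correct+incorrect+2)."""
    return (correct + 1) / (correct + incorrect + 2)


# Why the naive ratio is unstable: after a single correct answer the
# naive estimate jumps to 1/1 = 1.0, while the posterior mean stays
# cautious at 2/3 and only approaches 1.0 with more evidence.
one_answer = quality_estimate(1, 0)
no_data = quality_estimate(0, 0)
```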
  22. Expected Information Gain
  23. Effect of Ad Targeting (perhaps it is just more users?)
      • Control: ad campaign with no feedback, all keywords across quizzes [optimizes for clicks]
      • Treatment: ad campaign with feedback enabled [optimizes for conversions]
      • Clicks/visitors: same
      • Conversion rate: 34% vs. 13% (~3x more users participated)
      • Number of answers: 2,866 vs. 279 (~10x more answers submitted)
      • Total information gain: 7,560 bits vs. 610 bits (~11.5x more bits)
  24. Effect of Optimizing for Conversion Value
      • Control: feedback on the “conversion event” but no value
      • Treatment: feedback provides information gain per click
      • Clicks/visitors: same
      • Conversion rate: 39% vs. 30% (~30% more users participated)
      • Number of answers: 1,683 vs. 1,183 (~42% more answers submitted)
      • Total information gain: 4,690 bits vs. 2,870 bits (~63% more bits)
  25. Example of Targeting: Medical Quizzes
      • Our initial goal was to use medical topics as evidence that some topics are not crowdsourcable
      • Our hypothesis failed: they were the best-performing quizzes…
      • Users came from sites such as Mayo Clinic, WebMD, … (i.e., “prosumers”, not professionals)
  26. Participation is important!
      Really useful users
  27. • Immediate feedback helps most
      – Knowing the correct answer is 10x more important than knowing whether the given answer was correct
      – Conjecture: users also want to learn

      Treatment                                        Effect
      Show if user answer was correct                  +2.4%
      Show the correct answer                          +20.4%
      Score: % of correct answers                      +2.3%
      Score: # of correct answers                      -2.2%
      Score: information gain                          +4.0%
      Show statistics for performance of other users   +9.8%
      Leaderboard based on percent correct             -4.8%
      Leaderboard based on total correct answers       -1.5%
  28. • Showing score is moderately helpful
      – Be careful what you incentivize though 
      – “Total Correct” incentivizes quantity, not quality
      (same treatment-effect table as on slide 27)
  29. • Competitiveness (how other users performed) helps significantly
      (same treatment-effect table as on slide 27)
  30. • Leaderboards are tricky!
      – Initially, a strong positive effect
      – Over time, the effect became strongly negative
      – All-time leaderboards considered harmful
      (same treatment-effect table as on slide 27)
  31. Cost/Benefit Analysis
  32. Self-selection and participation
      • Low-performing users naturally drop out
      • With paid users, monetary incentives keep them in
      (figure: submitted answers vs. % correct)
  33. Comparison with paid crowdsourcing
      • Best paid user:
        – 68% quality, 40 answers (~1.5 minutes per question)
        – Quality-equivalency: 13 answers @ 99% accuracy, 23 answers @ 90% accuracy
        – 5 cents/question, or $3/hr, to match the advertising cost of unpaid users
      • Knowledgeable users are much faster and more efficient
      (figure: submitted answers vs. % correct)
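The quality-equivalency can be sanity-checked with the information-gain measure from slide 20. A back-of-envelope sketch, assuming 4-choice questions (the number of choices is not stated on the slide) and the point-estimate version of the measure:

```python
import math


def info_gain(q, n=4):
    """Bits per answer for a user with accuracy q on n-choice questions."""
    def h(p):
        return 0.0 if p <= 0 else -p * math.log2(p)
    return math.log2(n) - (h(q) + (n - 1) * h((1 - q) / (n - 1)))


# Best paid user: 40 answers at 68% quality.
paid_bits = 40 * info_gain(0.68)

# How many answers from a 99%-accurate user carry the same information?
equiv_99 = paid_bits / info_gain(0.99)   # roughly 12-13 answers
```

Under these assumptions the 99%-accuracy figure lands close to the slide's 13 answers; the slide's exact numbers presumably come from the Bayesian version of the measure.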
  34. Citizen Science Applications
      • Google gives nonprofits $10K/month in ad budget
      • Climate CoLab experiment running
        – Doubled traffic with only $20/day
        – Targets political activist groups (not only climate)
      • Additional experiments: Crowdcrafting, eBird, Weendy
  35. Conclusions
      • A new way to run crowdsourcing: targeting with ads
      • Engages unpaid users, avoiding the problems of extrinsic rewards
      • Provides access to expert users not available on labor platforms
      • Experts are not always professionals (e.g., Mayo Clinic users)
      • Nonprofits can use Google Ad Grants to attract participants to citizen science projects for free
