UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (IRAT)

Joint work with Hyun Joon Jung describing our submission to this year's IRAT task, presented at the NIST TREC conference (November 8, 2012).

  1. TREC 2012 Crowdsourcing Track
     Becoming IRATE: UT Austin's Image Relevance Assessment Task Enthusiasm!
     Hyun Joon Jung (hyunJoon@utexas.edu), Matthew Lease (ml@ischool.utexas.edu, @mattlease)
  2. Key Points
     • Interface design for efficient, cohesive judging
       – Collected 44K labels for $40
     • Off-the-shelf worker scoring metric (Raykar & Yu)
     • Completely unsupervised (no training or tuning)
     • Online label analysis (cf. Welinder & Perona '10)
     • Personalized error reports for workers
     • … and all in 3 weeks!
  3. Interface Design
  4. Scoring and Incentivizing Workers
     V. Raykar, S. Yu, L. Zhao, G. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds. Journal of Machine Learning Research, 11:1297–1322, 2010.
     (A scoring sketch follows below.)
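A minimal Python sketch of the kind of binary-label worker score the Raykar & Yu line of work defines: |sensitivity + specificity - 1|, near 1.0 for a reliable annotator and near 0.0 for a random spammer. The function name, the dictionary-based inputs, and the use of pseudo-gold as the comparison set are assumptions for illustration; the slide does not spell out the submission's exact scoring details.

```python
def worker_score(worker_labels, pseudo_gold):
    """Hypothetical Raykar & Yu-style score for binary labels:
    |sensitivity + specificity - 1|. Near 1.0 = reliable worker,
    near 0.0 = random (spammer-like) labeling.
    Both arguments map example ids to 0/1 labels."""
    tp = fp = tn = fn = 0
    for example, label in worker_labels.items():
        gold = pseudo_gold.get(example)
        if gold is None:
            continue  # only score examples that have a pseudo-gold label
        if gold == 1:
            tp += (label == 1)
            fn += (label == 0)
        else:
            tn += (label == 0)
            fp += (label == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return abs(sensitivity + specificity - 1.0)
```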
  5. Past Work: Offline Crowdsourcing
     e.g., Jung & Lease, HCOMP 2012
  6. Here: Online Crowdsourcing
     Unsupervised, incremental, iterative data collection
     [Diagram: Label Collection and Worker Evaluation stages iterating; confident vs. ambiguous labels, trusted workers, and pseudo-ground truth.]
     Welinder & Perona. Online crowdsourcing: Rating annotators and obtaining cost-effective labels. CVPR '10 Workshops.
  7. Collecting Labels
     • Partition examples into subsets
     • For each example in the current partition:
       – Collect 2k labels for the example
       – If Jaccard agreement & high confidence:
         • Declare aggregate label as "pseudo-gold"
       – Else if within budget and trusted workers exist:
         • Collect another label and re-test for pseudo-gold
       – Else:
         • Give up, output best-guess aggregate label
     (A code sketch of this loop follows below.)
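A minimal sketch of this collection loop in Python. The request_label callback, conf_threshold, and max_labels are hypothetical placeholders, and the slide's "Jaccard agreement & high confidence" test is approximated by a simple agreement ratio over a majority vote; the actual submission's rules may differ.

```python
from collections import Counter

def collect_labels(examples, request_label, k=1, max_labels=5,
                   trusted_workers=None, conf_threshold=0.75):
    """Sketch of the per-partition collection loop from the slide.
    request_label(example, trusted_only) is caller-supplied and returns
    one binary (0/1) label; thresholds here are illustrative only."""
    trusted_workers = trusted_workers or set()
    results = {}
    for example in examples:
        # Collect 2k labels for the example.
        labels = [request_label(example, trusted_only=False) for _ in range(2 * k)]
        while True:
            label, votes = Counter(labels).most_common(1)[0]
            confidence = votes / len(labels)  # stand-in for the agreement test
            if confidence >= conf_threshold:
                # Declare the aggregate label "pseudo-gold".
                results[example] = ("pseudo-gold", label)
                break
            if len(labels) < max_labels and trusted_workers:
                # Within budget and trusted workers exist: collect one more label.
                labels.append(request_label(example, trusted_only=True))
            else:
                # Give up: output the best-guess aggregate label.
                results[example] = ("best-guess", label)
                break
    return results
```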
  8. Identifying Trusted Workers
     • For a subset of pseudo-gold examples:
       – Collect 2k labels for the example
     • For each worker:
       – If spammer score > 0.5 over >= 100 examples:
         • Add worker to trusted pool
     (A code sketch of this check follows below.)
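A small sketch of the trusted-pool check, assuming some score function in [0, 1] where higher is better (for example the worker_score sketch above). The 0.5 threshold and 100-example minimum come from the slide; the data layout is an illustrative assumption.

```python
def build_trusted_pool(labels_by_worker, pseudo_gold, score_fn,
                       min_examples=100, threshold=0.5):
    """Add a worker to the trusted pool if score_fn exceeds `threshold`
    over at least `min_examples` pseudo-gold examples (values from the slide).
    labels_by_worker: {worker_id: {example_id: 0/1 label}}
    score_fn: e.g. the worker_score() sketch shown earlier."""
    trusted = set()
    for worker_id, worker_labels in labels_by_worker.items():
        judged = {e: l for e, l in worker_labels.items() if e in pseudo_gold}
        if len(judged) >= min_examples and score_fn(judged, pseudo_gold) > threshold:
            trusted.add(worker_id)
    return trusted
```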
  9. Personalized Error Reports
  10. Number of Labels & Cost Breakdown
      Number of workers per judgment:
      • 2 workers: 15,757 judgments (80%)
      • 3 workers: 3,821 judgments (19%)
      • 4 workers: 182 judgments (1%)
      • 5 workers: 40 judgments (0%)
      • 80% of judgments were labeled only twice; 99% were labeled at most three times.
      Cost breakdown:
      • Label Collection: $22 (44,000 labels / 100 labels per HIT * $0.05)
      • Worker Evaluation: $5 (10,000 labels / 100 labels per HIT * $0.05)
      • Bonus: $10 to 4 trusted workers, based on our policy
      (The arithmetic is reproduced below.)
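A quick reproduction of the cost arithmetic, using only the constants shown on the slide (per-HIT price, labels per HIT, label counts, and the flat bonus):

```python
PRICE_PER_HIT = 0.05     # $0.05 paid per HIT
LABELS_PER_HIT = 100     # each HIT bundles 100 labels

label_collection = 44_000 / LABELS_PER_HIT * PRICE_PER_HIT    # $22.00
worker_evaluation = 10_000 / LABELS_PER_HIT * PRICE_PER_HIT   # $5.00
bonus = 10.00                                                 # paid to 4 trusted workers

total = label_collection + worker_evaluation + bonus
print(f"total cost: ${total:.2f}")  # ~$37, consistent with "44K labels for $40"
```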
  11. Effectiveness
  12. Key Points
      • Some interesting ideas to explore further
        – Interface design
        – Online label analysis (cf. Welinder & Perona '10)
        – Personalized error reports for workers
      • Some nice properties
        – Unsupervised, 44K labels for $40, rapid development
      • Preliminary results; more analysis needed…
  13. Thanks!
      NIST: Ellen & Ian
      Track Org: Gabriella & Mark
      ir.ischool.utexas.edu/crowd
      Support – Temple Fellowship
      Matt Lease - ml@ischool.utexas.edu - @mattlease
