CrowdED: Guideline for optimal Crowdsourcing Experimental Design

Presented at the #HumL workshop "Augmenting Intelligence with Humans­-in-­the-­Loop" at the WWW 2018 conference

Published in: Education

  1. CrowdED: Guideline for Optimal Crowdsourcing. Amrapali Zaveri, Pedro Hernandez Serrano, Manisha Desai, Michel Dumontier. HumL@WWW2018, 24 April 2018. @AmrapaliZ
  2. Crowdsourcing Tasks ❖ Tasks based on human skills not yet replicable by machines ❖ Highly parallelizable tasks ❖ Every human (worker) must be provided with a monetary reward for an answer ❖ Consolidated answers solve scientific problems
  3. Crowdsourcing Design ❖ Gold standard questions ❖ Master Workers ❖ Majority voting ❖ Overall accuracy
  4. Crowdsourcing Use Case: Biomedical Metadata Quality Assessment* (*MetaCrowd: Crowdsourcing Biomedical Metadata Quality Assessment. Amrapali Zaveri and Michel Dumontier. Bio-Ontologies 2017.)
  5. BUT: How CrowdED is too crowded?
  6. Research Question: Can we estimate, a priori, the optimal assignment of workers and tasks to obtain maximum accuracy on all tasks?
  7. CrowdED: a two-staged statistical Crowdsourcing Experimental Design
  8. Related Studies [diagram labels: Adaptive Model, Active Learning, KB, Test Questions, Self Assessment, Cost-Time & Cost-Quality Optimization, CrowdED]. CrowdED offers a two-staged statistical model to estimate worker and task assignment a priori to achieve maximum accuracy.
  9. 2 Stages. Stage 1: • Train all workers on a proportion of tasks • Identify best workers & hard tasks. Stage 2: • Assign best workers to hard tasks & remaining tasks • Calculate Overall Accuracy
  10. Stage 1 [diagram: Workers split into Good and Poor; Tasks split into Easy and Hard]
  11. Assign Tasks to Workers (Stage 1) [diagram labels: Workers (Good, Poor); Tasks (Easy, Hard); Simulate; Odd no.; Proportion of tasks to train]
      Workerview:
      Worker  Task  Label      Truth
      1       1     hard_task  age
      1       2     hard_task  age
      1       3     hard_task  age
      1       4     easy_task  age
      1       5     easy_task  age
      Taskview:
      Worker  Task  Label        Truth
      1       1     good_worker  age
      2       1     poor_worker  age
      3       1     good_worker  age
      4       1     good_worker  age
      5       1     poor_worker  age
  12. Calculate Worker Accuracy & Task Difficulty
      Workerview:
      Worker  Task  Label      Truth  Task Difficulty
      1       1     hard_task  age    0.54
      1       2     hard_task  age    0.42
      1       3     hard_task  age    0.45
      1       4     easy_task  age    0.80
      1       5     easy_task  age    0.70
      Taskview:
      Worker  Task  Label        Truth  Worker Accuracy
      1       1     good_worker  age    0.75
      2       1     poor_worker  age    0.58
      3       1     good_worker  age    0.78
      4       1     good_worker  age    0.95
      5       1     poor_worker  age    0.54
  13. Simulate Worker Answer
      Workerview:
      Worker  Task  Label      Truth  Task Difficulty  Worker Answer
      1       1     hard_task  age    0.54             age
      1       2     hard_task  age    0.42             tissue
      1       3     hard_task  age    0.45             disease
      1       4     easy_task  age    0.80             age
      1       5     easy_task  age    0.70             age
      Taskview:
      Worker  Task  Label        Truth  Worker Accuracy  Worker Answer
      1       1     good_worker  age    0.75             age
      2       1     poor_worker  age    0.58             age
      3       1     good_worker  age    0.78             age
      4       1     good_worker  age    0.95             tissue
      5       1     poor_worker  age    0.54             age
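The slides do not say how a simulated answer is drawn, so the following is only a minimal sketch under one assumption: a worker returns the true answer with probability equal to their accuracy and otherwise picks a wrong answer uniformly from the answer key. The function name `simulate_answer` and the `ANSWER_KEY` list (built from the answers shown on the slide) are hypothetical, not taken from the CrowdED code.

```python
import random

def simulate_answer(truth, accuracy, answer_key, rng):
    """Return `truth` with probability `accuracy`, else a random wrong answer.

    Assumption: accuracy is the only factor; task difficulty is ignored here.
    """
    if rng.random() < accuracy:
        return truth
    wrong = [a for a in answer_key if a != truth]
    return rng.choice(wrong)

# Hypothetical answer key assembled from the answers visible on the slide.
ANSWER_KEY = ["age", "tissue", "disease", "treatment"]

rng = random.Random(0)  # seeded so that a run is repeatable
answers = [simulate_answer("age", 0.75, ANSWER_KEY, rng) for _ in range(5)]
print(answers)
```

Passing the random generator in explicitly keeps the simulation repeatable, which matters when many parameter combinations are compared.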
  14. Calculate Worker Performance: the average proportion of times a worker is in agreement with the other workers on a given task, taken over all tasks performed by the worker. Range [0…1]. A threshold identifies Easy vs. Hard tasks and Good vs. Poor workers.
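One way to read the worker-performance definition on slide 14 is sketched below: for each task a worker answered, take the share of the other workers on that task who gave the same answer, then average over the worker's tasks. The function name, the tuple layout, and the 0.5 threshold are assumptions for illustration; the slides give no threshold value.

```python
from collections import defaultdict

def worker_performance(answers):
    """answers: iterable of (worker_id, task_id, answer) tuples.

    Assumes each worker answers a given task at most once.
    """
    by_task = defaultdict(list)
    for worker, task, answer in answers:
        by_task[task].append((worker, answer))

    agreement = defaultdict(list)
    for rows in by_task.values():
        for worker, answer in rows:
            others = [a for w, a in rows if w != worker]
            if others:  # skip tasks answered by a single worker
                agreement[worker].append(sum(a == answer for a in others) / len(others))

    # Average agreement per worker, always within [0, 1].
    return {w: sum(v) / len(v) for w, v in agreement.items()}

# Answers for task 1 as shown in the Taskview table of the slides.
rows = [(1, 1, "age"), (2, 1, "age"), (3, 1, "age"), (4, 1, "tissue"), (5, 1, "age")]
perf = worker_performance(rows)

THRESHOLD = 0.5  # hypothetical cut-off, not given in the slides
labels = {w: "good_worker" if p >= THRESHOLD else "poor_worker" for w, p in perf.items()}
print(perf, labels)
```

With these five answers, worker 1 agrees with three of the four other workers, while worker 4 agrees with none of them.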
  15. Easy Tasks and Hard Tasks
      Taskview (Easy Tasks):
      Worker  Task  Label        Truth  Worker Accuracy  Worker Answer
      1       1     good_worker  age    0.75             age
      2       1     poor_worker  age    0.58             age
      3       1     good_worker  age    0.78             age
      4       1     good_worker  age    0.95             tissue
      5       1     poor_worker  age    0.54             age
      Taskview (Hard Tasks):
      Worker  Task  Label        Truth  Worker Accuracy  Worker Answer
      2       2     good_worker  age    0.75             treatment
      3       2     poor_worker  age    0.58             disease
      15      2     good_worker  age    0.78             age
      17      2     poor_worker  age    0.95             tissue
      20      2     poor_worker  age    0.54
  16. 2 Stages. Stage 1: • Train all workers on a proportion of tasks • Identify best workers & hard tasks. Stage 2: • Assign best workers to hard tasks & remaining tasks • Calculate Overall Accuracy
  17. Stage 2 [diagram: Tasks split into Easy and Hard; Workers split into Good and Poor]
  18. Simulate Worker Answer (Stage 2) [diagram labels: Good workers; Hard tasks; Remaining Tasks; simulate]
      Workerview:
      Worker  Task  Label      Truth  Task Difficulty  Worker Answer
      1       1     hard_task  age    0.54             age
      1       2     hard_task  age    0.42             tissue
      1       3     hard_task  age    0.45             disease
      1       4     easy_task  age    0.80             age
      1       5     easy_task  age    0.70             age
  19. Merge Stage 1 and 2 & Assign Answers
      Taskview:
      Worker  Task  Label        Truth  Worker Accuracy  Worker Answer
      1       1     good_worker  age    0.75             age
      2       1     poor_worker  age    0.58             age
      3       1     good_worker  age    0.78             age
      4       1     good_worker  age    0.95             tissue
      5       1     poor_worker  age    0.54             age
      Answer = age
  20. Assessing Design: on the merged dataset, calculate Overall Accuracy (avg. of all the tasks which had consensus)
      Taskview:
      Worker  Task  Label        Truth  Worker Accuracy  Worker Answer
      1       1     good_worker  age    0.75             age
      2       1     poor_worker  age    0.58             age
      3       1     good_worker  age    0.78             age
      4       1     good_worker  age    0.95             tissue
      5       1     poor_worker  age    0.54             age
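Slides 19 and 20 assign one answer per task and then score the design. A minimal sketch of that step follows, under two assumptions that the slides do not spell out: "consensus" means one answer holds a strict majority of the votes, and overall accuracy is the share of consensus tasks whose consensus answer equals the truth. The function names and the second example task are hypothetical.

```python
from collections import Counter

def majority_answer(answers):
    """Return the answer given by a strict majority, or None if there is none."""
    top, count = Counter(answers).most_common(1)[0]
    return top if count > len(answers) / 2 else None

def overall_accuracy(task_answers, truth):
    """Share of consensus tasks whose consensus answer matches the truth."""
    consensus = {t: majority_answer(a) for t, a in task_answers.items()}
    decided = {t: a for t, a in consensus.items() if a is not None}
    if not decided:
        return 0.0
    return sum(truth[t] == a for t, a in decided.items()) / len(decided)

task_answers = {
    1: ["age", "age", "age", "tissue", "age"],  # task 1 as shown on slide 19
    2: ["tissue", "tissue", "age"],             # hypothetical second task
}
truth = {1: "age", 2: "age"}

print(majority_answer(task_answers[1]))       # age
print(overall_accuracy(task_answers, truth))  # 0.5
```

Using an odd number of workers per task, as the deck recommends, makes a tie between two answers impossible, although three or more distinct answers can still leave a task without a strict majority.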
  21. Experimental Evaluation • tasks = [60, 80, 100, 120, 140, 160, 180] • workers = [20, 30, 40] • answers key = ["liver", "blood", "lung", "brain", "heart"] • good workers = [0.1, 0.3, 0.5, 0.7, 0.9] • hard tasks = [0.1, 0.3, 0.5, 0.7, 0.9] • proportion of training tasks = [0.2, 0.3, 0.4, 0.5, 0.6] • workers per task = [3, 5, 7, 9, 11] • 13,125 combinations
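The 13,125 figure on slide 21 is the size of the full Cartesian product of the six varied parameters (7 × 3 × 5 × 5 × 5 × 5); the answer key is fixed and does not multiply the count. A short sketch that enumerates the grid is shown here; the dictionary keys are illustrative names, not identifiers from the CrowdED code.

```python
from itertools import product

# Parameter values as listed on the slide; key names are illustrative.
grid = {
    "tasks": [60, 80, 100, 120, 140, 160, 180],
    "workers": [20, 30, 40],
    "good_workers": [0.1, 0.3, 0.5, 0.7, 0.9],
    "hard_tasks": [0.1, 0.3, 0.5, 0.7, 0.9],
    "train_proportion": [0.2, 0.3, 0.4, 0.5, 0.6],
    "workers_per_task": [3, 5, 7, 9, 11],
}

# One dict per combination, e.g. {"tasks": 60, "workers": 20, ...}.
combinations = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(combinations))  # 13125
```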
  22. • Results support the intuition that reduced task difficulty (10%) results in higher accuracy
  23. • Calculating a worker's performance in combination with whether she was a good worker from the beginning ensures that she is the best worker • Adopting the two-staged algorithm ensures that only the best workers are chosen to perform all the tasks
  24. Results
  25. CrowdED recommendation • no. of workers should be 40-60% of the total number of tasks • train workers on 40-60% of the tasks in Stage 1 • set the number of workers per task to be either 3, 5 or 7 (fewer than 9) • reduce the number of hard tasks • adopt the two-staged algorithm to identify the best workers
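As a worked example of the percentage rules on slide 25, the sketch below turns them into concrete integer ranges for a given number of tasks. The helper `recommended_ranges` is hypothetical and only restates the slide's rules of thumb; it is not part of the CrowdED tool.

```python
def recommended_ranges(n_tasks):
    """Translate the 40-60% rules of thumb into integer ranges."""
    low = (n_tasks * 40 + 99) // 100   # ceil(0.4 * n_tasks), integer arithmetic
    high = (n_tasks * 60) // 100       # floor(0.6 * n_tasks)
    return {
        "workers": (low, high),
        "stage1_training_tasks": (low, high),
        "workers_per_task": (3, 5, 7),
    }

print(recommended_ranges(100))
```

For 100 tasks this gives 40 to 60 workers and 40 to 60 Stage 1 training tasks.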
  26. https://pedrohserrano.shinyapps.io/crowdapp/
  27. Conclusion & Future Work • Two-staged statistical design for designing optimal crowdsourcing experiments • A priori estimation of the optimal assignment of workers and tasks to obtain maximum accuracy on all tasks • Implemented in Python, open source, Jupyter notebook • Future work: training the workers vs. not training; real-world experiments and comparison with baseline approaches; include budgetary constraints; extend the interface to allow users to vary parameters and observe how sensitive the design is to various assumptions
  28. Thank You! Questions? @AmrapaliZ, amrapali.zaveri@maastrichtuniversity.nl • Try it yourself: https://github.com/MaastrichtU-IDS/crowdED • Feedback welcome!
