Successfully reported this slideshow.

Programming with People

1

Share

Loading in …3
×
1 of 85
1 of 85

Programming with People

1

Share

Humans can perform many tasks with ease that remain difficult or impossible for computers. Crowdsourcing platforms like Amazon's Mechanical Turk make it possible to harness human-based computational power on an unprecedented scale. However, their utility as a general-purpose computational platform remains limited. The lack of complete automation makes it difficult to orchestrate complex or interrelated tasks. Scheduling human workers to reduce latency costs real money, and jobs must be monitored and rescheduled when workers fail to complete their tasks. Furthermore, it is often difficult to predict the length of time and payment that should be budgeted for a given task. Finally, the results of human-based computations are not necessarily reliable, both because human skills and accuracy vary widely, and because workers have a financial incentive to minimize their effort.

This talk presents AutoMan, the first fully automatic crowdprogramming system. AutoMan integrates human-based computations into a standard programming language as ordinary function calls, which can be intermixed freely with traditional functions. This abstraction allows AutoMan programmers to focus on their programming logic. An AutoMan program specifies a confidence level for the overall computation and a budget. The AutoMan runtime system then transparently manages all details necessary for scheduling, pricing, and quality control. AutoMan automatically schedules human tasks for each computation until it achieves the desired confidence level; monitors, reprices, and restarts human tasks as necessary; and maximizes parallelism across human workers while staying under budget.

AutoMan is available for download at www.automan-lang.org.

Humans can perform many tasks with ease that remain difficult or impossible for computers. Crowdsourcing platforms like Amazon's Mechanical Turk make it possible to harness human-based computational power on an unprecedented scale. However, their utility as a general-purpose computational platform remains limited. The lack of complete automation makes it difficult to orchestrate complex or interrelated tasks. Scheduling human workers to reduce latency costs real money, and jobs must be monitored and rescheduled when workers fail to complete their tasks. Furthermore, it is often difficult to predict the length of time and payment that should be budgeted for a given task. Finally, the results of human-based computations are not necessarily reliable, both because human skills and accuracy vary widely, and because workers have a financial incentive to minimize their effort.

This talk presents AutoMan, the first fully automatic crowdprogramming system. AutoMan integrates human-based computations into a standard programming language as ordinary function calls, which can be intermixed freely with traditional functions. This abstraction allows AutoMan programmers to focus on their programming logic. An AutoMan program specifies a confidence level for the overall computation and a budget. The AutoMan runtime system then transparently manages all details necessary for scheduling, pricing, and quality control. AutoMan automatically schedules human tasks for each computation until it achieves the desired confidence level; monitors, reprices, and restarts human tasks as necessary; and maximizes parallelism across human workers while staying under budget.

AutoMan is available for download at www.automan-lang.org.

More Related Content

Related Books

Free with a 14 day trial from Scribd

See all

Related Audiobooks

Free with a 14 day trial from Scribd

See all

Programming with People

  1. 1. Dan  Barowy,  Charlie  Curtsinger,  Emery  Berger,  Andrew  McGregor   Programming  with  People:   Integra(ng  Human-­‐Based   &  Digital  Computa(on  
  2. 2. Computers really good at some tasks… decoding human genome
  3. 3. designing your next car…
  4. 4. blood flow simulation…
  5. 5. (science) blood flow simulation…
  6. 6. Not so good at other tasks…
  7. 7. “is this a giraffe?”
  8. 8. isGiraffe(image) (not a real function!)
  9. 9. Can we “implement” this function? isGiraffe( )
  10. 10. Can we “implement” this function? isGiraffe( ) We could just ask people!
  11. 11. isGiraffe( ) Find people via MTurk…
  12. 12. h"p://mturk.com   MTurk = Amazon’s “Mechanical Turk”
  13. 13. A sample task.
  14. 14. Original “Mechanical Turk”: 18th Century chess machine (!)
  15. 15. The secret. Amazon’s service also looks like a computer but has people inside.
  16. 16. Now we implement this function with MTurk workers… isGiraffe( ) True
  17. 17. Q1: How much is this task worth? isGiraffe( ) True
  18. 18. isGiraffe( ) True Q2: How much time should it take?
  19. 19. isGiraffe( ) False Q3: What if worker gets it wrong?
  20. 20. Why would a worker get it wrong? False
  21. 21. Incompetent or lazy (“Homer”) False
  22. 22. Bots (Internet + $ = Scammers) False
  23. 23. Both just guessing answers.
  24. 24. Or…evil genius: deliberately picks wrong answer. False
  25. 25. random adversary model No need to worry about evil geniuses: we are fine with a
  26. 26. random adversary model i.e., Homer & Bender = OK.
  27. 27. random adversary model i.e., Homer & Bender = OK. We rule Dr. Evil out. Why?
  28. 28. long-term financial incentive MTurk tracks approval rates and work records:
  29. 29. credentials limit Sybil attacks Use of financial credentials make it hard to gin up new accounts, and
  30. 30. isGiraffe( ) ?   But how do we know any one person is not a Homer or a Bender?
  31. 31. isGiraffe( ) Idea: vote. If both agree on the answer, we’re happy, right?
  32. 32. Pr[agree]  =  1/2   isGiraffe( ) Not so much. Random = 50% chance of agreement.
  33. 33. Pr[agree]  =  1/32   <  5%   BUT: can dramatically reduce likelihood by increasing # of workers.
  34. 34. Pr[agree]  =  k(1/k)n   (unanimous  agreement)     k = # choices, n = # workers. (see paper for more details)
  35. 35. é choices ê Pr[agree] More choices = fewer people
  36. 36. isGiraffe( ) ( ) AutoMan: DSL in Scala – runs on any JVM.
  37. 37. isGiraffe( ) ( ) Total $ for computation AutoMan programmer-specified
  38. 38. isGiraffe( ) ( ) Total $ for computation Confidence level (per function) 95% (p < 0.05) AutoMan programmer-specified
  39. 39. isGiraffe( ) US minimum wage, Adaptive doubling AutoMan ties pay and time; Doubles both if no workers.
  40. 40. isGiraffe( ) US minimum wage, Adaptive doubling 30s, $0.06 ($7.25 / 120) Initially: all tasks 30 seconds.
  41. 41. isGiraffe( ) US minimum wage, Adaptive doubling prevents gaming 30s, $0.06 ($7.25 / 120) 60s, $0.12 No one shows up: both doubled.
  42. 42. You might think this is exploitable: just wait for jobs to rise in price.
  43. 43. Strategy fails when other people are around & grab money.
  44. 44.                          =  base  (Pavail)round                                  *  mul=plierround   E[gain]     Math: workers should never wait. Expected earnings after some # of rounds…
  45. 45. =  base  (½)round        *  mul=plierround   E[gain]     Even odds somebody will take the money…
  46. 46. E[gain]    =  base  (½)round        *  2round   Doubling increases wage after each round…
  47. 47. =  base           E[gain]     Terms dependent on rounds cancel out.
  48. 48. no incentive to wait =  base           E[gain]    
  49. 49. isGiraffe( ) True 95% confidence AutoMan manages time, $, quality.
  50. 50. How  many  giraffes  are   in  this  picture?   k = 3 choices! AutoMan handles “radio button” questions
  51. 51. How  many  giraffes  are   in  this  picture?   k = 3 choices!Risk: Homer & Bender always guess
  52. 52. How  many  giraffes  are   in  this  picture?   k = 3 choices!E.g., always choose first option.
  53. 53. How  many  giraffes  are   in  this  picture?   k = 3 choices! To combat this, AutoMan randomizes answers.
  54. 54. 25 choices! Which  are  from  Sesame  Street?   Kermit  the  Frog               Spongebob  Squarepants   Cookie  Monster                                           The  Count   Oscar  the  Grouch    ☐   ☐   ☐   ☐   ☐   “Checkbox” questions
  55. 55. Which  are  from  Sesame  Street?   Kermit  the  Frog               Spongebob  Squarepants   Cookie  Monster                                           The  Count   Oscar  the  Grouch    þ þ þ þ þ 25 choices! Same risk: random respondents
  56. 56. Which  are  from  Sesame  Street?   Kermit  the  Frog               Spongebob  Squarepants   Cookie  Monster                                           The  Count   Oscar  the  Grouch    þ þ ☐   þ ☐   25 choices! AutoMan checks each randomly
  57. 57. What  does  this   license  plate  say?   36d choices! XXXXXX 366 choices = !2176782336[A-Z0-9]{6}! Last question category: constrained free-text
  58. 58. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawns  3  tasks  @  $0.06;  30s  work     t1   t2   t3   Example real execution
  59. 59. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawns  3  tasks  @  $0.06;  30s  work     t1   t2   t3   1m  50s  
  60. 60. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawns  3  tasks  @  $0.06;  30s  work     t1   t2   t3   1m  50s   2m  30s  
  61. 61. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawns  3  tasks  @  $0.06;  30s  work     t1   t2   t3   1m  50s   2m  30s   2m  50s  
  62. 62. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawns  3  tasks  @  $0.06;  30s  work     t1   t2   t3   1m  50s   2m  30s   2m  50s  Inconclusive!  
  63. 63. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawns  3  more  tasks   t1   t2   t3   t4   t5   t6  
  64. 64. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawns  3  more  tasks   t1   t2   t3   t4   t5   t6   7m  
  65. 65. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawns  3  more  tasks   t1   t2   t3   t4   t5   t6   18m  50s   7m  
  66. 66. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawns  3  more  tasks   t1   t2   t3   t4   t5   t6   7m   18m  50s   51m   Timeout: double pay and time
  67. 67. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawn  1  more  task  @  $0.12,  60s  work   t1   t2   t3   t4   t5   t6   t7  
  68. 68. Which  one  of  these  doesn’t  belong?   [95%  conf.]   AUTOMAN:  spawn  1  more  task  @  $0.12,  60s  work   t1   t2   t3   t4   t5   t6   t7   1h  9m  50s;   cost  =  $0.36   AUTOMAN:  5  out  of  6    ⇒  95%  confidence;   return      
  69. 69. read_plate( ) More sophisticated function:
  70. 70. 12.2%   [Maryland  State  Highway  Administra=on]   Success rate of real system!
  71. 71. Easy under optimal conditions
  72. 72. More complex in general
  73. 73. “Difficult” set of plates
  74. 74. Easier to read than CAPTCHAs!
  75. 75. Real task as posted on MTurk
  76. 76. Workflow: pictures to strings
  77. 77. def is_car(img_url: String) = a.RadioButtonQuestion { q => q.budget = 1.00 q.confidence = 0.95 q.text = “Is this a car?” q.image_url = img_url q.options = List( a.Option('yes, ”Yes"), a.Option('no, ”No”) ) } Actual AutoMan code:
  78. 78. def get_plate_text(img_url: String) = a.FreeTextQuestion { q => q.text = ”What does this plate say?" q.image_url = img_url q.pattern = "XXXXXYYY” } Actual AutoMan code:
  79. 79. t1 t2 t3 t3 t4 t5 t6 t7 t8 "Is this a vehicle?" start end $0.06 post tasks w1: yes w2: yes w3: yes 3 answers w4: yes 1 answer w5: yes Task 1 Task 2 Task 3 Task 4 Task 5 1 answer "What does the license plate say?" unanimous agreement! post tasks $0.06 workers disagree! 2 answers post tasks Task 8 Task 9 timeout! $0.12$0.06 post tasks X cancelled! 1 answer end 767JKF  yes   w6: 767JFK w7: 767JKF Task 6 Task 7 w8: 767JKF Task 10 Task 11 Example execution
  80. 80. MediaLab  LPR  database     “extremely  dif-icult”  dataset   144  plates   Accuracy:  91.6%   Average  cost:  12.08  cents   Latency:  <  2  minutes  per  image          >12.2%!   AutoMan evaluation
  81. 81. www.automan-­‐lang.org   AUTOMAN:   Programming  with  People   read_plate( ) def read_plate(url: String) = a.FreeTextQuestion { q => q.text = ”What does this plate say?” q.image_url = url q.pattern = "XXXXXYYY” } Dan  Barowy,  Charlie  Curtsinger,  Emery  Berger,  Andrew  McGregor  

×