Dan Barowy, Charlie Curtsinger, Emery Berger, Andrew McGregor

Programming with People:
Integrating Human-Based & Digital Computation
Computers are really good at some tasks…
decoding the human genome
designing your next car
blood flow simulation (science)
Not so good at
other tasks…
“is this a giraffe?”
isGiraffe(image)
(not a real function!)
Can we “implement” this function?
isGiraffe( )
We could just ask people!
Find people via MTurk… http://mturk.com
MTurk = Amazon’s “Mechanical Turk”
A sample task.
The original “Mechanical Turk”: an 18th-century chess machine (!)
The secret: a person hidden inside. Amazon’s service also looks like a computer but has people inside.
Now we implement this function with MTurk workers…
isGiraffe( ) returns True
Q1: How much is this task worth?
Q2: How much time should it take?
Q3: What if the worker gets it wrong (isGiraffe( ) returns False)?
Why would a worker get it wrong?
Incompetent or lazy (“Homer”)
Bots (“Bender”): Internet + $ = scammers
Both are just guessing answers.
Or… an evil genius, who deliberately picks the wrong answer.
Random adversary model
No need to worry about evil geniuses: we are fine with a random adversary model, i.e., Homer & Bender = OK.
We rule Dr. Evil out. Why?
Credentials limit Sybil attacks: the use of financial credentials makes it hard to gin up new accounts, and
long-term financial incentive: MTurk tracks approval rates and work records.
isGiraffe( ) = ?
But how do we know any one person is not a Homer or a Bender?
Idea: vote. If both agree on the answer, we’re happy, right?
Not so much. On a yes/no question, two random guessers agree 50% of the time: $\Pr[\text{agree}] = 1/2$.
BUT: we can dramatically reduce this likelihood by increasing the number of workers, e.g., down to $\Pr[\text{agree}] = 1/32 < 5\%$.
For unanimous agreement, $\Pr[\text{agree}] = k\,(1/k)^n$, where k = # choices and n = # workers (see paper for more details).
↑ choices ⇒ ↓ $\Pr[\text{agree}]$: more choices = fewer people needed.
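A quick numeric check of the unanimous-agreement formula, sketched in Scala since AutoMan is a Scala DSL. The names `prAgree` and `minWorkers` are illustrative, not AutoMan API, and the sketch covers only unanimous agreement among uniform random guessers, not AutoMan's full stopping rule (see the paper).

```scala
object AgreementOdds {
  /** Probability that n workers, each guessing uniformly at random among
    * k choices, all give the same answer: k * (1/k)^n. */
  def prAgree(k: Int, n: Int): Double =
    k * math.pow(1.0 / k, n)

  /** Smallest number of random guessers whose unanimous agreement is less
    * likely than alpha (e.g. 0.05 for 95% confidence). */
  def minWorkers(k: Int, alpha: Double): Int = {
    require(k >= 2, "need at least two choices")
    require(alpha > 0 && alpha < 1, "alpha must lie in (0, 1)")
    var n = 1
    while (prAgree(k, n) >= alpha) n += 1
    n
  }
}
```

For a yes/no question (k = 2), six unanimous workers push accidental agreement down to 1/32; with k = 5, three suffice, which is the “more choices = fewer people” effect.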
isGiraffe( )
AutoMan: a DSL in Scala that runs on any JVM.
Programmer-specified in AutoMan:
Total $ for the computation
Confidence level (per function), e.g., 95% (p < 0.05)
isGiraffe( )
Pay: US minimum wage, with adaptive doubling (which prevents gaming).
AutoMan ties pay and time, and doubles both if no workers take the task.
Initially, all tasks are 30 seconds: 30s pays $0.06 ($7.25 / 120).
No one shows up: both are doubled, to 60s and $0.12.
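A sketch of the doubling schedule using the slide's numbers. `AdaptiveDoubling` and `schedule` are hypothetical names, and truncating the base reward to whole cents is an assumption; AutoMan's own rounding may differ.

```scala
import scala.math.BigDecimal.RoundingMode

object AdaptiveDoubling {
  // Constants from the slides: US federal minimum wage and a 30-second first round.
  val MinimumWagePerHour: BigDecimal = BigDecimal("7.25")
  val InitialSeconds: Int = 30

  // Minimum-wage pay for the first round, truncated to whole cents:
  // $7.25 * 30 / 3600 = $0.0604... -> $0.06 (truncation is an assumption).
  val BaseReward: BigDecimal =
    (MinimumWagePerHour * InitialSeconds / 3600).setScale(2, RoundingMode.DOWN)

  /** Time allotted (seconds) and reward offered in a given round (0 = first posting).
    * Each round that ends with no takers doubles both, keeping pay tied to time. */
  def schedule(round: Int): (Int, BigDecimal) = {
    require(round >= 0 && round < 20, "round out of range for this sketch")
    val factor = 1 << round
    (InitialSeconds * factor, BaseReward * factor)
  }
}
```

Doubling a fixed base (rather than recomputing and re-rounding the wage each round) keeps the reward exactly proportional to the time: 30s/$0.06, 60s/$0.12, 120s/$0.24, and so on.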
You might think this is exploitable: just wait for jobs to rise in price.
The strategy fails when other people are around and grab the money.
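Math: workers should never wait. Expected earnings after some number of rounds, where $P_{\text{avail}}$ is the probability that the task is still available after each round:
$$E[\text{gain}] = \text{base} \cdot (P_{\text{avail}})^{\text{round}} \cdot \text{multiplier}^{\text{round}}$$
Even odds somebody else will take the money ($P_{\text{avail}} = 1/2$):
$$E[\text{gain}] = \text{base} \cdot (1/2)^{\text{round}} \cdot \text{multiplier}^{\text{round}}$$
Doubling increases the wage after each round ($\text{multiplier} = 2$):
$$E[\text{gain}] = \text{base} \cdot (1/2)^{\text{round}} \cdot 2^{\text{round}}$$
Terms dependent on rounds cancel out:
$$E[\text{gain}] = \text{base}$$
No incentive to wait.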
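The cancellation can also be checked numerically. The hypothetical `expectedGain` below simply evaluates base · pAvail^round · multiplier^round for given parameters; it is an illustration, not part of AutoMan.

```scala
object ExpectedGain {
  /** Expected earnings for a worker who waits `round` rounds before taking a task:
    * base * pAvail^round * multiplier^round, where pAvail is the chance the task is
    * still unclaimed after each round and multiplier is the per-round pay increase. */
  def expectedGain(base: Double, pAvail: Double, multiplier: Double, round: Int): Double =
    base * math.pow(pAvail, round) * math.pow(multiplier, round)
}
```

With pAvail = 1/2 and doubling, the expected gain is the same in every round; if competition is stiffer (pAvail < 1/2), waiting strictly loses money.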
isGiraffe( ) = True, with 95% confidence.
AutoMan manages time, $, and quality.
How many giraffes are in this picture?
k = 3 choices! AutoMan handles “radio button” questions.
Risk: Homer & Bender always guess, e.g., always choose the first option.
To combat this, AutoMan randomizes the order of the answers.
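A sketch of why randomizing the order defeats position-biased guessing: a worker who always clicks the first option on screen ends up choosing uniformly at random among the underlying answers. `Choice`, `presentation`, and `tallyLazyWorker` are hypothetical names, not AutoMan API.

```scala
import scala.util.Random

object OptionShuffling {
  /** Hypothetical answer option: an internal key plus the label shown to workers. */
  final case class Choice(key: String, label: String)

  /** An independently shuffled display order for a single posted task. */
  def presentation(options: Seq[Choice], rng: Random): Vector[Choice] =
    rng.shuffle(options.toVector)

  /** A lazy worker ("Homer") who always clicks the first option on screen;
    * returns the internal key that the click maps back to. */
  def alwaysFirst(shown: Vector[Choice]): String = shown.head.key

  /** Count which underlying answers a position-biased worker picks across many tasks. */
  def tallyLazyWorker(options: Seq[Choice], tasks: Int, seed: Long): Map[String, Int] = {
    val rng = new Random(seed)
    (1 to tasks)
      .map(_ => alwaysFirst(presentation(options, rng)))
      .groupBy(identity)
      .map { case (key, picks) => key -> picks.size }
  }
}
```

Because the shuffle is redrawn per task, the "always first" strategy spreads its picks roughly evenly over all k answers, which is exactly the random adversary the vote is designed to handle.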
“Checkbox” questions
Which are from Sesame Street?
☐ Kermit the Frog
☐ SpongeBob SquarePants
☐ Cookie Monster
☐ The Count
☐ Oscar the Grouch
$2^5 = 32$ choices!
Same risk: random respondents.
To combat this, AutoMan checks each box at random in the initial form.
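A sketch of the checkbox case, assuming each box's initial state is drawn independently at random for every posted task, so that a worker who submits the defaults is effectively a random respondent. The names are hypothetical. With 2^5 = 32 possible answers, two random respondents agree unanimously only 1/32 of the time.

```scala
import scala.util.Random

object CheckboxDefaults {
  /** Number of distinct answers to a checkbox question with n boxes: 2^n. */
  def answerSpace(n: Int): Long = 1L << n

  /** A random initial state for each box, drawn independently per posted task. */
  def randomDefaults(options: Seq[String], rng: Random): Map[String, Boolean] =
    options.map(o => o -> rng.nextBoolean()).toMap

  /** Unanimous-agreement odds for n random respondents over k answers: k * (1/k)^n. */
  def prAgree(k: Long, n: Int): Double = k * math.pow(1.0 / k, n)
}
```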
Last question category: constrained free-text
What does this license plate say?
$36^d$ choices for a d-character plate!
XXXXXX: $36^6$ = 2,176,782,336 choices ([A-Z0-9]{6})
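A sketch of constrained free-text checking using the slide's regular expression `[A-Z0-9]{6}`. This is a plain JVM regex, not AutoMan's `XXXXXYYY` pattern syntax, and `PlateAnswers` is a hypothetical name.

```scala
object PlateAnswers {
  // Exactly six characters drawn from [A-Z0-9], as on the slide.
  private val SixChar = "[A-Z0-9]{6}".r

  /** Size of the answer space for a d-character plate: 36^d (26 letters + 10 digits). */
  def answerSpace(d: Int): BigInt = BigInt(36).pow(d)

  /** Accept only answers that match the whole constrained format. */
  def isWellFormed(answer: String): Boolean =
    SixChar.pattern.matcher(answer).matches()
}
```

With a space this large, two random guessers almost never agree by chance, so a couple of matching answers already carry a lot of evidence.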
Example real execution
Which one of these doesn’t belong? [95% conf.]
AUTOMAN: spawns 3 tasks (t1-t3) @ $0.06; 30s work.
Elapsed: 1m 50s, 2m 30s, 2m 50s. Inconclusive!
AUTOMAN: spawns 3 more tasks (t4-t6).
Elapsed: 7m, 18m 50s, 51m.
Timeout: double pay and time.
AUTOMAN: spawns 1 more task (t7) @ $0.12; 60s work.
Done after 1h 9m 50s; cost = $0.36.
AUTOMAN: 5 out of 6 agree ⇒ 95% confidence; return the agreed answer.
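A sketch of an agreement test that reproduces the shape of this run. It assumes k = 4 options (the slide does not say how many were shown) and applies a union bound over the binomial tail under the random adversary model; it is not claimed to be AutoMan's exact test. `pValue` and `decide` are hypothetical names.

```scala
object ConfidenceCheck {
  // Binomial coefficient C(n, j) as a Double, built up multiplicatively.
  private def choose(n: Int, j: Int): Double =
    (1 to j).foldLeft(1.0)((acc, i) => acc * (n - j + i) / i)

  /** Probability that one specific option gets at least m of n votes when every
    * worker guesses uniformly among k options (upper tail of Binomial(n, 1/k)). */
  def tail(k: Int, n: Int, m: Int): Double = {
    val p = 1.0 / k
    (m to n).map(j => choose(n, j) * math.pow(p, j) * math.pow(1 - p, n - j)).sum
  }

  /** Union bound over the k options: chance that some option gets >= m of n random votes. */
  def pValue(k: Int, n: Int, m: Int): Double = math.min(1.0, k * tail(k, n, m))

  /** Return the leading answer if its count is unlikely (< alpha) under random guessing;
    * None means "inconclusive: post more tasks". */
  def decide(votes: Seq[String], k: Int, alpha: Double = 0.05): Option[String] =
    if (votes.isEmpty) None
    else {
      val (answer, count) =
        votes.groupBy(identity).map { case (a, vs) => a -> vs.size }.maxBy(_._2)
      if (pValue(k, votes.size, count) < alpha) Some(answer) else None
    }
}
```

With k = 4, even three unanimous answers give p = 1/16 > 0.05 (inconclusive), while 5 out of 6 gives p ≈ 0.019, enough for 95% confidence.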
read_plate( )
A more sophisticated function: reading license plates.
Easy under optimal conditions; more complex in general.
12.2%: the success rate of a real system! [Maryland State Highway Administration]
A “difficult” set of plates, yet easier to read than CAPTCHAs!
Real task as posted on MTurk
Workflow: pictures to strings
Actual AutoMan code:
def is_car(img_url: String) =
  a.RadioButtonQuestion { q =>   // `a`: the AutoMan MTurk adapter, assumed defined elsewhere
    q.budget = 1.00                // total $ for this question
    q.confidence = 0.95            // confidence level for this function
    q.text = "Is this a car?"
    q.image_url = img_url
    q.options = List(
      a.Option('yes, "Yes"),
      a.Option('no, "No")
    )
  }

def get_plate_text(img_url: String) =
  a.FreeTextQuestion { q =>
    q.text = "What does this plate say?"
    q.image_url = img_url
    q.pattern = "XXXXXYYY"         // constrained free text; assumed: X = required, Y = optional character
  }
[Figure: Example execution. “Is this a vehicle?” tasks posted at $0.06; workers answer yes in unanimous agreement. “What does the license plate say?” tasks posted at $0.06; workers disagree (767JFK vs. 767JKF), so AutoMan posts more tasks; after a timeout, pay doubles to $0.12; another worker answers 767JKF, and the remaining task is cancelled.]
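The two-stage workflow (filter vehicles, then read plates) can be sketched as ordinary control flow around human-backed functions. `HumanFn`, `readPlates`, and the stub oracles in the usage are hypothetical; in AutoMan the two steps would be `is_car` and `get_plate_text`, which post MTurk tasks instead of returning immediately.

```scala
object PlateWorkflow {
  // Hypothetical stand-in for a human-backed function:
  // None = no confident answer within budget.
  type HumanFn[A] = String => Option[A]

  /** Ask whether each image is a car; only for confirmed cars, ask for the plate text.
    * Returns image URL -> plate string for the images that got a confident answer. */
  def readPlates(urls: Seq[String], isCar: HumanFn[Boolean], plateText: HumanFn[String]): Map[String, String] =
    urls.flatMap { url =>
      isCar(url) match {
        case Some(true) => plateText(url).map(url -> _)
        case _          => None
      }
    }.toMap
}
```

Gating the (harder, pricier) free-text question behind the cheap yes/no question avoids paying workers to read plates off pictures that contain no vehicle.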
AutoMan evaluation
MediaLab LPR database: “extremely difficult” dataset, 144 plates
Accuracy: 91.6% (>12.2%!)
Average cost: 12.08 cents
Latency: < 2 minutes per image
AutoMan: Programming with People
www.automan-lang.org

read_plate( )
def read_plate(url: String) =
  a.FreeTextQuestion { q =>
    q.text = "What does this plate say?"
    q.image_url = url
    q.pattern = "XXXXXYYY"
  }