Programming with People

Dan Barowy, Charlie Curtsinger, Emery Berger, Andrew McGregor

Programming with People: Integrating Human-Based & Digital Computation

Computers are really good at some tasks… (science): decoding the human genome, blood flow simulation…

isGiraffe(image)
(not a real function!)

Can we “implement” this function?
isGiraffe( )
We could just ask people!

isGiraffe( )
Find people via MTurk…

http://mturk.com

MTurk = Amazon’s “Mechanical Turk”

Original “Mechanical Turk”: 18th-century chess machine (!)

The secret: a human chess player hidden inside. Amazon’s service also looks like a computer but has people inside.

Now we implement this function with MTurk workers…
isGiraffe( )
True

Q1: How much is this task worth?
isGiraffe( )
True

isGiraffe( )
True
Q2: How much time should it take?

isGiraffe( )
False
Q3: What if a worker gets it wrong?

Why would a worker get it wrong?
False

Incompetent or lazy (“Homer”)
False

Bots (“Bender”): Internet + $ = scammers
False

Or… an evil genius (“Dr. Evil”) who deliberately picks the wrong answer.
False

No need to worry about evil geniuses: we are fine with a random adversary model, i.e., Homer & Bender = OK.

We rule Dr. Evil out. Why?

MTurk tracks approval rates and work records: a long-term financial incentive.
Use of financial credentials makes it hard to gin up new accounts, so credentials limit Sybil attacks.

isGiraffe( )
?

But how do we know any one person is not a Homer or a Bender?

isGiraffe( )
Idea: vote. If both agree on the answer, we’re happy, right?

$\Pr[\text{agree}] = 1/2$

isGiraffe( )
Not so much. Random answers = 50% chance of agreement.

$\Pr[\text{agree}] = 1/32 < 5\%$ (6 workers)

BUT: we can dramatically reduce this likelihood by increasing the # of workers.

$\Pr[\text{agree}] = k\,(1/k)^n$ (unanimous agreement)

k = # choices, n = # workers.
(see paper for more details)

é choices
ê Pr[agree]
More choices = fewer people
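The unanimous-agreement formula is easy to check numerically. The Scala sketch below evaluates $k(1/k)^n$ (which simplifies to $1/k^{n-1}$) and searches for the smallest number of workers that pushes chance agreement below a threshold. `Agreement`, `prUnanimous`, and `minWorkers` are hypothetical names, not part of AutoMan, and the sketch covers only unanimous agreement; AutoMan's full test is described in the paper.

```scala
// Hypothetical helper, not AutoMan's API: chance that n random respondents,
// each picking uniformly among k choices, all give the same answer.
object Agreement {
  // Pr[agree] = k * (1/k)^n, which simplifies to 1 / k^(n-1).
  def prUnanimous(k: Int, n: Int): Double =
    1.0 / math.pow(k, n - 1)

  // Smallest n whose chance of unanimous random agreement is below alpha
  // (alpha = 0.05 corresponds to 95% confidence).
  def minWorkers(k: Int, alpha: Double): Int = {
    require(k >= 2 && alpha > 0, "need at least two choices and a positive alpha")
    var n = 1
    while (prUnanimous(k, n) >= alpha) n += 1
    n
  }
}

println(Agreement.prUnanimous(2, 2))   // 0.5: two workers, yes/no question
println(Agreement.prUnanimous(2, 6))   // 0.03125: the 1/32 on the slide
println(Agreement.minWorkers(2, 0.05)) // 6
println(Agreement.minWorkers(3, 0.05)) // 4: more choices, fewer people
```

Going from two choices to three drops the number of unanimous workers needed for $p < 0.05$ from 6 to 4.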

isGiraffe( )
( )
AutoMan: a DSL in Scala; runs on any JVM.

isGiraffe( )
( )
AutoMan programmer-specified:
Total $ for computation
Confidence level (per function): 95% (p < 0.05)

isGiraffe( )
US minimum wage, adaptive doubling
AutoMan ties pay and time; doubles both if no workers show up.

30s, $0.06 ($7.25/hour ÷ 120, since 30 seconds is 1/120 of an hour)
Initially: all tasks 30 seconds.

Adaptive doubling prevents gaming
60s, $0.12
No one shows up: both doubled.
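The pay schedule is simple arithmetic. A minimal sketch, assuming rewards are tracked in whole cents and rounded down; `PaySchedule`, `rewardCents`, and `round` are illustrative names, not AutoMan's API.

```scala
// Hypothetical sketch of the pay/time schedule on the slides: a 30-second task
// paid at the US federal minimum wage ($7.25/hour), with both the time limit
// and the reward doubled each time a round gets no takers.
object PaySchedule {
  val MinWageCentsPerHour = 725L

  // Reward in whole cents for `seconds` of work at minimum wage, rounded down.
  def rewardCents(seconds: Long): Long = MinWageCentsPerHour * seconds / 3600

  // (time limit in seconds, reward in cents) for round r = 0, 1, 2, ...
  def round(r: Int, baseSeconds: Long = 30L): (Long, Long) =
    (baseSeconds << r, rewardCents(baseSeconds) << r)
}

println(PaySchedule.rewardCents(30)) // 6 cents, i.e. $0.06
println(PaySchedule.round(1))        // (60,12): 60s for $0.12
println((0 to 3).map(r => PaySchedule.round(r)))
```

Doubling the base reward, rather than recomputing it from each new time limit, keeps pay exactly proportional to time in every round.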

You might think this is exploitable: just wait for jobs to rise in price.

Strategy fails when other people are around & grab the money.

Math: workers should never wait. Expected earnings after some # of rounds…
$E[\text{gain}] = \text{base} \cdot p_{\text{avail}}^{\,\text{round}} \cdot \text{multiplier}^{\text{round}}$

Even odds somebody will take the money…
$E[\text{gain}] = \text{base} \cdot (1/2)^{\text{round}} \cdot \text{multiplier}^{\text{round}}$

Doubling increases the wage after each round…
$E[\text{gain}] = \text{base} \cdot (1/2)^{\text{round}} \cdot 2^{\text{round}}$

Terms dependent on rounds cancel out.
$E[\text{gain}] = \text{base}$

$E[\text{gain}] = \text{base}$: no incentive to wait
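The cancellation can also be checked numerically. This sketch evaluates $\text{base} \cdot p_{\text{avail}}^{\,\text{round}} \cdot \text{multiplier}^{\text{round}}$ over several rounds. `WaitingGame` and `expectedGain` are hypothetical names, and `pAvail` is the assumed probability that a task is still unclaimed after each round.

```scala
// Hypothetical model of a worker who waits `round` rounds before accepting:
// the task survives each round with probability pAvail, and the reward is
// multiplied by `multiplier` each round.
object WaitingGame {
  def expectedGain(base: Double, pAvail: Double, multiplier: Double, round: Int): Double =
    base * math.pow(pAvail, round) * math.pow(multiplier, round)
}

// Even odds (pAvail = 1/2) plus doubling: every round has the same expectation.
for (r <- 0 to 4)
  println(s"round $r: E[gain] = ${WaitingGame.expectedGain(0.06, 0.5, 2.0, r)}")

// If others grab tasks more often than that, waiting strictly loses money.
println(WaitingGame.expectedGain(0.06, 0.4, 2.0, 3)) // about 0.031, less than accepting now
```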

isGiraffe( )
True (95% confidence)
AutoMan manages time, $, quality.

How many giraffes are in this picture?
k = 3 choices!
AutoMan handles “radio button” questions

Risk: Homer & Bender always guess the same way.

E.g., always choose the first option.

To combat this, AutoMan randomizes the order of the answers.
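A minimal sketch of answer randomization, assuming each worker sees an independently shuffled copy of the options and the clicked position is mapped back to the canonical option. `ShuffledQuestion` and `Presentation` are hypothetical names, not AutoMan's API. The point is that a worker who always clicks the first button ends up behaving like the random respondent the model assumes.

```scala
import scala.util.Random

// Hypothetical sketch: shuffle the options independently for every worker and
// translate the clicked position back to the canonical option.
object ShuffledQuestion {
  final case class Presentation[A](order: Vector[A]) {
    // The worker clicked button i; return the option displayed there.
    def answerAt(i: Int): A = order(i)
  }

  def present[A](options: Seq[A], rng: Random): Presentation[A] =
    Presentation(rng.shuffle(options.toVector))
}

val giraffeOptions = Vector("0 giraffes", "1 giraffe", "2 giraffes") // illustrative k = 3 options
val rng = new Random(42)

// A lazy worker who always clicks the first button, over 3000 tasks.
val lazyAnswers = (1 to 3000).map(_ => ShuffledQuestion.present(giraffeOptions, rng).answerAt(0))
val counts = lazyAnswers.groupBy(identity).map { case (opt, xs) => opt -> xs.size }
println(counts) // each option appears roughly 1000 times
```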

“Checkbox” questions
Which are from Sesame Street?
☐ Kermit the Frog
☐ Spongebob Squarepants
☐ Cookie Monster
☐ The Count
☐ Oscar the Grouch
$2^5 = 32$ choices!

☑ Kermit the Frog
☑ Spongebob Squarepants
☑ Cookie Monster
☑ The Count
☑ Oscar the Grouch
Same risk: random respondents (e.g., always check every box)

☑ Kermit the Frog
☑ Spongebob Squarepants
☐ Cookie Monster
☑ The Count
☐ Oscar the Grouch
AutoMan initially checks each box at random
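Two details of checkbox questions can be made concrete. A question with n boxes has $2^n$ possible answers (every subset of boxes), so the unanimous-agreement formula applies with $k = 2^n$. Pre-checking each box at random means a form submitted untouched yields a random subset. A minimal sketch, assuming each box is pre-checked independently with probability 1/2; `CheckboxQuestion` and its methods are hypothetical names.

```scala
import scala.util.Random

// Hypothetical sketch of checkbox-question bookkeeping; not AutoMan's API.
object CheckboxQuestion {
  // Every subset of the boxes is a distinct answer: 2^n possibilities.
  def numAnswers(numBoxes: Int): Long = 1L << numBoxes

  // Initial form state: each box checked independently with probability 1/2,
  // so submitting the form untouched yields a uniformly random subset.
  def randomInitialState(options: Seq[String], rng: Random): Set[String] =
    options.filter(_ => rng.nextBoolean()).toSet
}

val characters = Seq("Kermit the Frog", "Spongebob Squarepants", "Cookie Monster",
                     "The Count", "Oscar the Grouch")
val k = CheckboxQuestion.numAnswers(characters.size)
println(k)       // 32
println(1.0 / k) // 0.03125: k * (1/k)^2, already below 5% with two workers
println(CheckboxQuestion.randomInitialState(characters, new Random(7)))
```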

Last question category: constrained free-text
What does this license plate say?
$36^d$ choices!
XXXXXX: $36^6 = 2{,}176{,}782{,}336$ choices ([A-Z0-9]{6})
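For constrained free text, the size of the answer space is what makes agreement meaningful. Under the random-respondent model with $k = 36^6$, two matching plates agree by chance with probability $k(1/k)^2 = 1/k$. The sketch below counts the answer space and checks an answer against [A-Z0-9]{6}. The normalization step (trimming, uppercasing, dropping spaces) is a hypothetical choice that the slides do not specify, and `FreeText` is an illustrative name.

```scala
// Hypothetical sketch, not AutoMan's API: size of the answer space for a
// d-character answer over [A-Z0-9], and a pattern check for 6-character plates.
object FreeText {
  // 26 letters + 10 digits = 36 symbols per position.
  def numAnswers(d: Int): BigInt = BigInt(36).pow(d)

  private val Plate = "[A-Z0-9]{6}".r

  // Hypothetical normalization: trim, uppercase, drop spaces, then require
  // the whole string to match [A-Z0-9]{6}.
  def normalize(raw: String): Option[String] = {
    val s = raw.trim.toUpperCase.replace(" ", "")
    s match {
      case Plate() => Some(s)
      case _       => None
    }
  }
}

println(FreeText.numAnswers(6))         // 2176782336
println(FreeText.normalize(" 767 jkf")) // Some(767JKF)
println(FreeText.normalize("767-JKF"))  // None
```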

Example real execution
Which one of these doesn’t belong? [95% conf.]

AUTOMAN: spawns 3 tasks (t1, t2, t3) @ $0.06; 30s work
Elapsed: 1m 50s, 2m 30s, 2m 50s
Inconclusive!

AUTOMAN: spawns 3 more tasks (t4, t5, t6)
Elapsed: 7m, 18m 50s, 51m
Timeout: double pay and time

AUTOMAN: spawns 1 more task (t7) @ $0.12; 60s work
Elapsed: 1h 9m 50s; cost = $0.36

AUTOMAN: 5 out of 6 ⇒ 95% confidence; return
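The stopping decision in this run can be approximated with a simple significance check. This is a minimal sketch that assumes the random-respondent null model and a hypothetical k = 4 options, since the slide does not say how many options the question had. It bounds the chance that any option gets at least m of n random votes by $k \cdot \Pr[\mathrm{Binomial}(n, 1/k) \ge m]$ and stops once that bound drops below 0.05. With m = n, the bound reduces to $k(1/k)^n$. This is a stand-in for illustration, not the test AutoMan actually uses (see the paper), and `StoppingRule` is a hypothetical name.

```scala
// Hypothetical stopping rule under the random-respondent null model; a
// simplified stand-in, not the statistical test AutoMan actually uses.
object StoppingRule {
  // Binomial coefficient C(n, r), computed exactly with BigInt.
  private def choose(n: Int, r: Int): BigInt =
    (1 to r).foldLeft(BigInt(1))((acc, i) => acc * (n - r + i) / i)

  // Pr[Binomial(n, 1/k) >= m]: one specific option gets at least m of n random votes.
  def tailProb(n: Int, m: Int, k: Int): Double = {
    val p = 1.0 / k
    (m to n).map(j => choose(n, j).toDouble * math.pow(p, j) * math.pow(1 - p, n - j)).sum
  }

  // Union bound over the k options; with m = n this is k * (1/k)^n.
  def isConfident(votes: Seq[String], k: Int, alpha: Double = 0.05): Boolean =
    votes.nonEmpty && {
      val top = votes.groupBy(identity).values.map(_.size).max
      k * tailProb(votes.size, top, k) < alpha
    }
}

// With a hypothetical k = 4, three unanimous votes are not enough (bound 0.0625)...
println(StoppingRule.isConfident(Seq("b", "b", "b"), k = 4))                // false
// ...but 5 matching votes out of 6 are (bound about 0.019).
println(StoppingRule.isConfident(Seq("b", "b", "b", "b", "b", "c"), k = 4)) // true
```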

More sophisticated function: read_plate( )
12.2%: success rate of a real system! [Maryland State Highway Administration]

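Actual AutoMan code:
// assumes `a` is an AutoMan backend object (e.g., for MTurk) defined elsewhere
def is_car(img_url: String) =
  a.RadioButtonQuestion { q =>
    q.budget = 1.00        // total $ for this computation
    q.confidence = 0.95    // per-function confidence level
    q.text = "Is this a car?"
    q.image_url = img_url
    q.options = List(
      a.Option('yes, "Yes"),
      a.Option('no, "No")
    )
  }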

Actual AutoMan code:
def get_plate_text(img_url: String) =
  a.FreeTextQuestion { q =>
    q.text = "What does this plate say?"
    q.image_url = img_url
    q.pattern = "XXXXXYYY"
  }

Example execution [figure: timeline of tasks t1 to t8]
“Is this a vehicle?” @ $0.06: workers w1 to w5 answer yes (unanimous agreement!)
“What does the license plate say?” @ $0.06: w6 answers 767JFK, w7 answers 767JKF (workers disagree!); after a timeout, pay rises to $0.12; w8 answers 767JKF; a remaining task is cancelled; result: 767JKF

AutoMan evaluation
MediaLab LPR database: “extremely difficult” dataset, 144 plates
Accuracy: 91.6% (>12.2%!)
Average cost: 12.08 cents
Latency: < 2 minutes per image

www.automan-lang.org

AUTOMAN: Programming with People

read_plate( )
def read_plate(url: String) =
  a.FreeTextQuestion { q =>
    q.text = "What does this plate say?"
    q.image_url = url
    q.pattern = "XXXXXYYY"
  }
