Programming with People: Integrating Human-Based & Digital Computation

Dan
Barowy,
Charlie
Curtsinger,
Emery
Berger,
Andrew
McGregor

Programming
with
People:

Integra(ng
Human-‐Based

&
Digital
Computa(on

Computers
really good at
some tasks…
decoding human genome

(science)
blood flow simulation…

isGiraffe(image)
(not a real function!)

Can we
“implement”
this function?
isGiraffe( )

Can we
“implement”
this function?
isGiraffe( )
We could just ask people!

isGiraffe( )
Find people via MTurk…

h"p://mturk.com

MTurk = Amazon’s
“Mechanical Turk”

Original “Mechanical
Turk”: 18th Century
chess machine (!)

The secret. Amazon’s
service also looks like a
computer but has
people inside.

Now we
implement
this function
with MTurk
workers…
isGiraffe( )
True

Q1: How
much is this
task worth?
isGiraffe( )
True

isGiraffe( )
True
Q2: How
much time
should it
take?

isGiraffe( )
False
Q3: What if
worker gets
it wrong?

Why would
a worker get
it wrong?
False

Incompetent or
lazy (“Homer”)
False

Bots (Internet + $ = Scammers)
False

Or…evil genius:
deliberately picks
wrong answer.
False

random adversary model
No need to worry
about evil geniuses:
we are fine with a

i.e., Homer & Bender = OK.

i.e., Homer & Bender = OK.
We rule Dr. Evil out. Why?

long-term financial incentive
MTurk tracks approval
rates and work records:

credentials limit Sybil attacks
Use of financial credentials
make it hard to gin up new
accounts, and

isGiraffe( )
?

But how do we know
any one person is not a
Homer or a Bender?

isGiraffe( )
Idea: vote. If both agree
on the answer, we’re
happy, right?

Pr[agree]
=
1/2

isGiraffe( )
Not so much. Random =
50% chance of agreement.

Pr[agree]
=
1/32

<
5%

BUT: can dramatically
reduce likelihood by
increasing # of workers.

Pr[agree]
=
k(1/k)n

(unanimous
agreement)

k = # choices, n = # workers.
(see paper for more details)

é choices
ê Pr[agree]
More choices = fewer people

isGiraffe( )
( )
AutoMan: DSL in Scala
– runs on any JVM.

isGiraffe( )
( )
Total $ for computation
AutoMan programmer-specified

isGiraffe( )
( )
Total $ for computation
Confidence level
(per function)
95%
(p < 0.05)
AutoMan programmer-specified

isGiraffe( )
US minimum wage,
Adaptive doubling
AutoMan ties pay and time;
Doubles both if no workers.

isGiraffe( )
US minimum wage,
Adaptive doubling
30s, $0.06 ($7.25 / 120)
Initially: all tasks 30 seconds.

isGiraffe( )
US minimum wage,
Adaptive doubling
prevents gaming
30s, $0.06 ($7.25 / 120)
60s, $0.12
No one shows up: both doubled.

You might think
this is exploitable:
just wait for jobs to
rise in price.

Strategy fails
when other
people are
around &
grab money.

=
base
(Pavail)round

*
mul=plierround

E[gain]

Math: workers
should never wait.
Expected
earnings
after some #
of rounds…

=
base
(½)round

*
mul=plierround

E[gain]

Even odds
somebody
will take the
money…

E[gain]

=
base
(½)round

*
2round

Doubling
increases
wage after
each round…

=
base

E[gain]

Terms
dependent on
rounds
cancel out.

no incentive to wait
=
base

E[gain]

isGiraffe( )
True 95% confidence
AutoMan
manages
time, $,
quality.

How
many
giraﬀes
are

in
this
picture?

k = 3 choices!
AutoMan
handles
“radio
button”
questions

How
many
giraﬀes
are

in
this
picture?

k = 3 choices!Risk: Homer &
Bender always guess

How
many
giraﬀes
are

in
this
picture?

k = 3 choices!E.g., always choose
first option.

How
many
giraﬀes
are

in
this
picture?

k = 3 choices!
To combat this,
AutoMan randomizes
answers.

25 choices!
Which
are
from
Sesame
Street?

Kermit
the
Frog

Spongebob
Squarepants

Cookie
Monster

The
Count

Oscar
the
Grouch

☐

☐

☐

☐

☐

“Checkbox” questions

Which
are
from
Sesame
Street?

Kermit
the
Frog

Spongebob
Squarepants

Cookie
Monster

The
Count

Oscar
the
Grouch

þ

þ

þ

þ

þ

25 choices!
Same risk: random respondents

Which
are
from
Sesame
Street?

Kermit
the
Frog

Spongebob
Squarepants

Cookie
Monster

The
Count

Oscar
the
Grouch

þ

þ

☐

þ

☐

25 choices!
AutoMan checks each randomly

What
does
this

license
plate
say?

36d choices!
XXXXXX
366 choices = !2176782336[A-Z0-9]{6}!
Last question category:
constrained free-text

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawns
3
tasks
@
$0.06;
30s
work

t1
t2
t3

Example real execution

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawns
3
tasks
@
$0.06;
30s
work

t1
t2
t3

1m
50s

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawns
3
tasks
@
$0.06;
30s
work

t1
t2
t3

1m
50s

2m
30s

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawns
3
tasks
@
$0.06;
30s
work

t1
t2
t3

1m
50s

2m
30s

2m
50s

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawns
3
tasks
@
$0.06;
30s
work

t1
t2
t3

1m
50s

2m
30s

2m
50s
Inconclusive!

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawns
3
more
tasks

t1
t2
t3
t4
t5
t6

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawns
3
more
tasks

t1
t2
t3
t4
t5
t6

7m

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawns
3
more
tasks

t1
t2
t3
t4
t5
t6

18m
50s

7m

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawns
3
more
tasks

t1
t2
t3
t4
t5
t6

7m

18m
50s

51m

Timeout: double pay and time

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawn
1
more
task
@
$0.12,
60s
work

t1
t2
t3
t4
t5
t6
t7

Which
one
of
these
doesn’t
belong?

[95%
conf.]

AUTOMAN:
spawn
1
more
task
@
$0.12,
60s
work

t1
t2
t3
t4
t5
t6
t7

1h
9m
50s;

cost
=
$0.36

AUTOMAN:
5
out
of
6

⇒
95%
conﬁdence;

return

read_plate( )
More sophisticated function:

12.2%

[Maryland
State
Highway
Administra=on]

Success rate of real system!

def is_car(img_url: String) =
a.RadioButtonQuestion { q =>
q.budget = 1.00
q.confidence = 0.95
q.text = “Is this a car?”
q.image_url = img_url
q.options = List(
a.Option('yes, ”Yes"),
a.Option('no, ”No”)
)
}
Actual AutoMan code:

def get_plate_text(img_url: String) =
a.FreeTextQuestion { q =>
q.text = ”What does this plate
say?"
q.image_url = img_url
q.pattern = "XXXXXYYY”
}
Actual AutoMan code:

t1 t2 t3 t3 t4 t5 t6 t7 t8
"Is this a vehicle?"
start end
$0.06
post tasks
w1:
yes
w2:
yes
w3:
yes
3 answers
w4:
yes
1 answer
w5:
yes
Task 1
Task 2
Task 3
Task 4
Task 5
1 answer
"What does the license plate say?"
unanimous
agreement!
post tasks
$0.06
workers
disagree!
2 answers post tasks
Task 8
Task 9
timeout!
$0.12$0.06
post tasks
X
cancelled!
1 answer
end
767JKF
yes

w6:
767JFK
w7:
767JKF
Task 6
Task 7
w8:
767JKF
Task
10
Task
11
Example execution

MediaLab
LPR
database

“extremely
dif-icult”
dataset

144
plates

Accuracy:
91.6%

Average
cost:
12.08
cents

Latency:
<
2
minutes
per
image

>12.2%!

AutoMan evaluation

www.automan-‐lang.org

AUTOMAN:

Programming
with
People

read_plate( )
def read_plate(url:
String) =
a.FreeTextQuestion { q =>
q.text = ”What does this
plate say?”
q.image_url = url
q.pattern = "XXXXXYYY”
}
Dan
Barowy,
Charlie
Curtsinger,
Emery
Berger,
Andrew
McGregor

Programming with People: Integrating Human-Based & Digital Computation

Recommended

Recommended

More Related Content

Similar to Programming with People: Integrating Human-Based & Digital Computation

Similar to Programming with People: Integrating Human-Based & Digital Computation (20)

More from Emery Berger

More from Emery Berger (20)

Recently uploaded

Recently uploaded (20)

Programming with People: Integrating Human-Based & Digital Computation