1. Preserving Worker Privacy in Crowdsourcing
Hiroshi Kajino (1), Hiromi Arai (2), Hisashi Kashima (3)
1. The University of Tokyo, 2. RIKEN, 3. Kyoto University
18/09/14
ECML/PKDD 2014
2. Outline
Propose & address a worker privacy problem in crowdsourcing
■ Introduction & Existing Work
□ Crowdsourcing: Outsourcing to unspecified people
□ Quality control: Quality of results is variable
■ Proposed Problem Setting
□ Worker privacy: Sensitive info of workers can be inferred
□ Worker-private quality control problem
■ Proposed Method
□ Existing quality control method + secure computation
■ Experiments
□ Accuracy: Validate approximation in secure computation
□ Computation time: Validate computational overhead
4. Research Target
Crowdsourcing is a method to outsource tasks to unspecified workers
■ Crowdsourcing
□ Pros: Easy to use at low cost
• Industry: Reduce financial/time costs for outsourcing
• Academia: Trigger for new AI research areas (human computation)
□ Cons: Quality issues, privacy issues, etc.
[Figure: 1. The requester submits instances to workers; 2. Workers return answers. Example task: a CAPTCHA image reading "overlooks inquiry" (http://www.captcha.net/)]
5. Existing Work
Estimate ground truth labels by aggregating multiple workers' answers
■ Quality of answers depends on abilities of workers
□ Collecting labels from multiple workers is necessary
■ Quality control problem (in a labeling task)
□ Input: Crowd labels {y_ij ∈ {0,1} | i = 1, ..., I, j = 1, ..., J}
□ Output: Estimated true labels {y_i ∈ {0,1} | i = 1, ..., I}
Task example: Label whether an image contains a bird or not (1 = Bird, 0 = Not Bird)
[Figure: table of 0/1 crowd labels given by worker j to instance i; the ground truth label of each instance is unknown ("?")]
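As a point of reference for the input/output specification above, the simplest aggregation rule is an unweighted majority vote. The sketch below is only an illustrative baseline, not the method discussed in these slides; the function name `majority_vote`, the use of `None` for a missing label, and the tie-breaking rule are assumptions made for this example.

```python
from typing import List, Optional, Sequence


def majority_vote(labels: Sequence[Sequence[Optional[int]]]) -> List[int]:
    """Aggregate crowd labels by unweighted majority vote.

    labels[i][j] is worker j's label (0 or 1) for instance i,
    or None if worker j did not label instance i (an assumption of this sketch).
    """
    estimates = []
    for row in labels:
        given = [y for y in row if y is not None]
        ones = sum(given)
        # Ties and unlabeled instances default to 0 in this sketch.
        estimates.append(1 if 2 * ones > len(given) else 0)
    return estimates


crowd = [
    [1, 1, 0],      # instance 1: two of three workers say "Bird"
    [0, 0, 0],      # instance 2: unanimous "Not Bird"
    [1, 0, None],   # instance 3: one label missing, tie resolved to 0
]
print(majority_vote(crowd))  # [1, 0, 0]
```

Majority vote treats every worker as equally reliable, which is exactly the limitation that worker-ability models address.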
6. Existing Work
Estimate consensus labels by inferring worker models
■ Latent Class Method [Dawid & Skene, 1979]
□ Model: Latent class model
• p = Pr[y_i = 1]: Prob. of true label = 1
• α_j = Pr[y_ij = y_i | y_i = 1], β_j = Pr[y_ij = y_i | y_i = 0]: Abilities of worker j
• I, J: #(Instance), #(Worker)
[Figure: graphical model with nodes p, y_i, y_ij, α_j, β_j and plates over I and J]
□ Inference: Given {y_ij}, estimate {y_i}, {α_j, β_j}, p
• E-step: Estimate {y_i}, fixing {α_j, β_j}, p
• M-step: Estimate {α_j, β_j}, p, fixing {y_i}
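The E-step/M-step alternation of the latent class method can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: every worker labels every instance, the soft labels are initialized by averaging, the E-step keeps posterior probabilities rather than hard labels, and the names `latent_class_em`, `n_iter`, and `eps` are invented for this example.

```python
import numpy as np


def latent_class_em(Y, n_iter=15, eps=1e-6):
    """EM for the binary latent class model.

    Y: (I, J) array of 0/1 labels; Y[i, j] is worker j's label for instance i.
    Returns (mu, p, alpha, beta) with mu[i] = Pr[y_i = 1 | data].
    """
    Y = np.asarray(Y, dtype=float)
    mu = Y.mean(axis=1)  # initialization: soft majority vote
    for _ in range(n_iter):
        # M-step: re-estimate the prior and worker abilities from soft labels.
        p = np.clip(mu.mean(), eps, 1 - eps)
        alpha = np.clip((mu @ Y) / max(mu.sum(), eps), eps, 1 - eps)
        beta = np.clip(((1 - mu) @ (1 - Y)) / max((1 - mu).sum(), eps), eps, 1 - eps)
        # E-step: posterior of y_i = 1, computed in log space for stability.
        log1 = np.log(p) + Y @ np.log(alpha) + (1 - Y) @ np.log(1 - alpha)
        log0 = np.log(1 - p) + Y @ np.log(1 - beta) + (1 - Y) @ np.log(beta)
        mu = 1.0 / (1.0 + np.exp(log0 - log1))
    return mu, p, alpha, beta


# Workers 1 and 2 always agree; worker 3 is noisier.
Y = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
mu, p, alpha, beta = latent_class_em(Y)
print((mu > 0.5).astype(int))  # estimated true labels
print(alpha, beta)             # estimated abilities per worker
```

On this toy input the estimated abilities of the two agreeing workers end up higher than those of the third worker, so their labels dominate the final estimate.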
8. Worker Privacy Issue
Simply passing answers to the requester can invade worker privacy
■ Sensitive information in answers
□ Location
• AED4 collects locations of AEDs on a map
• Movement history of a worker is revealed
□ Personal information in questionnaire tasks
• Interests of workers, personal information (quasi-identifiers)
• Joining with other data sets can identify anonymous workers
□ Ability
• Quality control methods reveal the ability of a worker
• This can demotivate workers from joining volunteer-based crowdsourcing
9. Our Problem Setting
We propose a worker-private quality control problem
Definition: Worker j's value v_j is worker-private if others cannot determine v_j uniquely
(cf. A similar definition can be found in query auditing)
■ Worker-Private Quality Control Problem
□ Input: Crowd labels {y_ij | i = 1, ..., I, j = 1, ..., J}
□ Output: Estimated true labels {y_i | i = 1, ..., I}
□ Subject to: Labels and abilities are kept worker-private
11. Proposed Method: Overview
Propose a privacy-preserving inference algorithm for the LC model
■ Worker-Private Latent Class Protocol
□ Model: Latent class model (same as the previous one)
□ Secure Inference (New!):
• E-step: Requester & workers estimate {y_i} by secure computation
• M-step: Each worker updates α_j, β_j secretly
[Figure: workers keep their answers secret; the requester obtains the true answers via secure computation]
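The M-step can stay private because each worker needs only the publicly shared label posteriors and their own labels. The sketch below assumes the standard latent class update formulas and that the posteriors `mu` are known to every party; the function name `local_m_step` and the sample numbers are hypothetical.

```python
from typing import List, Tuple


def local_m_step(mu: List[float], my_labels: List[int]) -> Tuple[float, float]:
    """One worker's private ability update.

    mu[i] is the public posterior Pr[y_i = 1 | data];
    my_labels[i] is this worker's own 0/1 label, which never leaves the worker.
    """
    # alpha_j: expected share of positive instances this worker labeled 1.
    alpha = sum(m * y for m, y in zip(mu, my_labels)) / sum(mu)
    # beta_j: expected share of negative instances this worker labeled 0.
    beta = sum((1 - m) * (1 - y) for m, y in zip(mu, my_labels)) / sum(1 - m for m in mu)
    return alpha, beta


mu_public = [0.9, 0.2, 0.8]
alpha_j, beta_j = local_m_step(mu_public, [1, 0, 0])
print(alpha_j, beta_j)
```

Nothing computed here is sent to the requester, which matches the slide's statement that each worker updates α_j, β_j secretly.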
12. Proposed Method: Building Block
Secure sum allows us to compute the sum without privacy invasion
■ Secure Sum Protocol (Generalized Paillier cryptosystem [Damgård+,01])
Compute Σ_j v_j when each worker j holds value v_j secretly
□ Additive Homomorphic Cryptosystem:
For plaintexts v_1, v_2 ∈ Z_n and ciphertexts Enc(v_1), Enc(v_2),
Enc(v_1 + v_2) = Enc(v_1) · Enc(v_2) holds
□ Protocol:
1) Each worker j computes Enc(v_j), and parties compute Enc(Σ_j v_j)
2) Parties decrypt Enc(Σ_j v_j) using distributed secret keys
Lemma: After executing the protocol, no party learns anything other than its initial knowledge & the sum.
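The additive homomorphism behind the secure sum can be illustrated with a toy version of the basic Paillier cryptosystem. This sketch simplifies the slide's setting in several ways that should be read as assumptions: it uses plain Paillier rather than the generalized variant, a single key holder instead of distributed secret keys, and tiny hard-coded primes that offer no security. All function names are invented for the example.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Textbook Paillier key pair from two primes (toy sizes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Enc(m) = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, sk, c: int) -> int:
    lam, mu = sk
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n * mu) % n


def secure_sum(n: int, ciphertexts) -> int:
    """Multiply ciphertexts; the result is an encryption of the plaintext sum."""
    acc = 1
    for c in ciphertexts:
        acc = (acc * c) % (n * n)
    return acc


n, sk = keygen(1009, 1013)
worker_values = [12, 30, 7]                       # each v_j is known only to worker j
ciphertexts = [encrypt(n, v) for v in worker_values]
total = decrypt(n, sk, secure_sum(n, ciphertexts))
print(total)  # 49
```

In the protocol on the slide, decryption requires cooperation of the parties holding key shares, so no single party could decrypt an individual Enc(v_j) as the key holder in this sketch could.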
13. Proposed Method: Algorithm
Incorporate workers into computation to preserve worker privacy
■ Worker-Private Latent Class Protocol
□ Parameters: {μ_i}, p, {α_j}, {β_j}
• μ_i = Pr[y_i = 1 | Data], p = Pr[y_i = 1]
• α_j = Pr[y_ij = y_i | y_i = 1], β_j = Pr[y_ij = y_i | y_i = 0]
[Figure: true labels μ_1, μ_2, μ_3 with prior p, and each worker's labels and abilities]
Worker 1: labels 1, 0, 1; abilities α_1, β_1
Worker 2: labels 1, 0, 0; abilities α_2, β_2
Worker 3: labels 0, 0, 0; abilities α_3, β_3
14. Proposed Method: Algorithm
Incorporate workers into computation to preserve worker privacy
■ Worker-Private Latent Class Protocol
□ Parameters: {μ_i}, p, {α_j}, {β_j}
• μ_i = Pr[y_i = 1 | Data], p = Pr[y_i = 1]
• α_j = Pr[y_ij = y_i | y_i = 1], β_j = Pr[y_ij = y_i | y_i = 0]
[Figure: true labels μ_1, μ_2, μ_3 with prior p, and each worker's labels and abilities]
Public: true labels μ_1, μ_2, μ_3 and p
Private values of each worker:
Worker 1: labels 1, 0, 1; abilities α_1, β_1
Worker 2: labels 1, 0, 0; abilities α_2, β_2
Worker 3: labels 0, 0, 0; abilities α_3, β_3
15. Proposed Method: Algorithm
Incorporate workers into computation to preserve worker privacy
■ Worker-Private Latent Class Protocol
□ Parameters: {μ_i}, p, {α_j}, {β_j}
□ E-Step: Parties update true labels using secure sum (weighted majority vote of crowd labels)
[Figure: table of true labels μ_1, μ_2, μ_3 with prior p, and each worker's labels and abilities α_j, β_j]
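One way to see why a sum suffices for the E-step: under the latent class model, the posterior log-odds of y_i = 1 is the prior log-odds plus one additive term per worker, and each term depends only on that worker's own label and abilities. The sketch below illustrates this decomposition; it is our reading of "weighted majority vote" and not necessarily the exact quantities exchanged in the authors' protocol. The plain `sum` stands in for the secure sum, and all names and numbers are hypothetical.

```python
import math


def local_contribution(y: int, alpha: float, beta: float) -> float:
    """Log-likelihood ratio one worker contributes for one instance.

    Computed by the worker from private values only (own label and abilities).
    """
    if y == 1:
        return math.log(alpha / (1 - beta))
    return math.log((1 - alpha) / beta)


def e_step(p: float, contribution_sum: float) -> float:
    """mu_i = Pr[y_i = 1 | data] from the public prior and the summed contributions."""
    logit = math.log(p / (1 - p)) + contribution_sum
    return 1 / (1 + math.exp(-logit))


# (y_ij, alpha_j, beta_j) for three workers on a single instance i.
workers = [(1, 0.9, 0.8), (1, 0.7, 0.6), (0, 0.6, 0.7)]
total = sum(local_contribution(*w) for w in workers)  # stand-in for the secure sum
mu_i = e_step(0.5, total)
print(round(mu_i, 4))  # 0.8182
```

Workers with higher abilities contribute terms of larger magnitude, which is what makes the vote weighted.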
17. Proposed Method: Security Analysis
Making true labels public does not invade worker privacy
Theorem: After executing the protocol, each worker's labels and abilities are kept worker-private.
■ Conditions
□ #(workers) ≥ 3
□ For each instance, there exists at least one worker who does not give a label to the instance.
19. Experiments: Overview
Evaluate two drawbacks of introducing secure computation
■ Cons of secure computation
1) Approximation:
• Secure sum protocol works only on integers
• Use approximation parameter L (a large number) to convert worker j's value v_j -> round(L v_j)
2) Computation Time:
• Cryptographic (& communication) overhead
■ Data Set
□ Duchenne Data Set: [Whitehill+,09]
• Judge fake smile or not
• #(workers)=20, #(instances)=159
[Figure cited from [Whitehill+,09]]
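The effect of the approximation parameter L can be checked directly: each rounded value is off by at most 0.5, so after dividing the integer sum by L the total error is at most J / (2L) for J workers. The sketch below demonstrates this bound on made-up values; `encode` and `approx_sum` are hypothetical helper names, and mapping negative integers into Z_n for encryption is left out.

```python
from typing import List


def encode(v: float, L: int) -> int:
    """Fixed-point encoding of a real value as an integer: round(L * v)."""
    return round(L * v)


def approx_sum(values: List[float], L: int) -> float:
    """Sum computed on integer encodings, then scaled back by L."""
    return sum(encode(v, L) for v in values) / L


values = [0.4055, -1.2040, 0.8473]  # e.g. real-valued per-worker terms
exact = sum(values)
for L in (10, 1000, 100000):
    err = abs(approx_sum(values, L) - exact)
    bound = len(values) / (2 * L)
    print(L, err, bound)  # the error never exceeds the bound
```

Larger L gives a tighter bound at the cost of larger plaintexts, which must still fit in the plaintext space Z_n.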
20. Experiments: (1) Approximation Accuracy
Estimation errors can be handled by approximation parameter L
■ Relative Errors of Estimated Parameters
□ Compare estimated model parameters w/ & w/o secure comp.
□ Approx. parameter L can control errors arbitrarily
□ Note: Accuracy of the true labels was the same as the original
[Figure: relative errors of estimated parameters; x-axis: approx. parameter L]
21. Experiments: (2) Computation Time
Additional computation time on a real data set was less than a second
■ Cryptographic Overhead
□ Key generation
□ One iteration of the algorithm (encryption & decryption)
0.8 sec on the real data set (#(workers)=20, #(instances)=159, #(iterations)=15)
[Figure: x-axis: #(workers)]
22. Conclusion
We proposed the notion of worker privacy
■ Contributions of Our Work
□ Notion of worker privacy
• Workers' sensitive information can leak from their answers
□ WPLC protocol
• Introducing secure computation into the LC method
• Security is theoretically guaranteed
□ Experiments
• Accuracy can be controlled by a hyperparameter
• Computation time is tolerable