
Preserving Worker Privacy in Crowdsourcing

  • 1. Preserving Worker Privacy in Crowdsourcing
      Hiroshi Kajino (1), Hiromi Arai (2), Hisashi Kashima (3)
      (1) The University of Tokyo, (2) RIKEN, (3) Kyoto University
      18/09/14, ECML/PKDD 2014
  • 2. Outline: Propose & address a worker privacy problem in crowdsourcing
      ■ Introduction & Existing Work
        □ Crowdsourcing: Outsourcing to unspecified people
        □ Quality control: Quality of results is variable
      ■ Proposed Problem Setting
        □ Worker privacy: Sensitive info of workers can be inferred
        □ Worker-private quality control problem
      ■ Proposed Method
        □ Existing quality control method + secure computation
      ■ Experiments
        □ Accuracy: Validate approximation in secure computation
        □ Computation time: Validate computational overhead
  • 4. Research Target: Crowdsourcing is a method to outsource tasks to unspecified workers
      ■ Crowdsourcing
        □ Pros: Easy to use at low cost
          • Industry: Reduces financial and time costs of outsourcing
          • Academia: Triggers new AI research areas (human computation)
        □ Cons: Quality issues, privacy issues, etc.
      [Figure: 1. the requester submits instances to a worker (example: a CAPTCHA reading "overlooks inquiry", from http://www.captcha.net/); 2. the worker returns answers]
  • 5. Existing Work: Estimate ground truth labels by aggregating multiple workers' answers
      ■ Quality of answers depends on abilities of workers
        □ Collecting labels from multiple workers is necessary
      ■ Quality control problem (in a labeling task)
        □ Input: Crowd labels {yij ∈ {0,1} | i = 1,..., I, j = 1,..., J}
        □ Output: Estimated true labels {yi ∈ {0,1} | i = 1,..., I}
      Task example: Label whether an image contains a bird or not (1 = Bird, 0 = Not Bird)
      [Figure: matrix of crowd labels, instance i by worker j, with the unknown ground truth labels marked "?"]
  • 6. Existing Work: Estimate consensus labels by inferring worker models
      ■ Latent Class Method [Dawid & Skene, 1979]
        □ Model: Latent class model
          • p = Pr[yi = 1]: Prob. of true label = 1
          • αj = Pr[yij = yi | yi = 1], βj = Pr[yij = yi | yi = 0]: Abilities of worker j
          • I, J: #(instances), #(workers)
          [Figure: graphical model with nodes p, yi, yij, αj, βj and plates over I instances and J workers]
        □ Inference: Given {yij}, estimate {yi}, {αj, βj}, p
          • E-step: Estimate {yi}, fixing {αj, βj}, p
          • M-step: Estimate {αj, βj}, p, fixing {yi}
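The E-step / M-step alternation above can be sketched in plain Python. This is a minimal, non-private baseline written for illustration, not the authors' code: the function name `em_latent_class`, the majority-vote initialisation, the clipping constant `eps`, and the log-odds clamp are all assumptions. The soft estimate `mu[i]` stands in for Pr[yi = 1 | data].

```python
import math

def em_latent_class(labels, n_iter=15, eps=1e-6):
    """EM for the two-parameter latent class model.

    labels[i][j] in {0, 1} is worker j's label for instance i.
    Returns (mu, p, alpha, beta), where mu[i] approximates Pr[y_i = 1 | labels].
    """
    I, J = len(labels), len(labels[0])
    clip = lambda x: min(max(x, eps), 1 - eps)  # keep probabilities off 0 and 1
    # Assumed initialisation: fraction of workers who answered 1.
    mu = [sum(row) / J for row in labels]
    for _ in range(n_iter):
        # M-step: estimate p, alpha_j, beta_j with mu fixed.
        pos = sum(mu)
        neg = I - pos
        p = clip(pos / I)
        alpha = [clip(sum(mu[i] * labels[i][j] for i in range(I)) / max(pos, eps))
                 for j in range(J)]
        beta = [clip(sum((1 - mu[i]) * (1 - labels[i][j]) for i in range(I)) / max(neg, eps))
                for j in range(J)]
        # E-step: estimate mu with p, alpha, beta fixed (posterior log-odds).
        for i in range(I):
            log_odds = math.log(p / (1 - p))
            for j in range(J):
                if labels[i][j] == 1:
                    log_odds += math.log(alpha[j] / (1 - beta[j]))
                else:
                    log_odds += math.log((1 - alpha[j]) / beta[j])
            log_odds = max(min(log_odds, 50.0), -50.0)  # avoid overflow in exp
            mu[i] = 1 / (1 + math.exp(-log_odds))
    return mu, p, alpha, beta

labels = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 1]]
mu, p, alpha, beta = em_latent_class(labels)
```

Thresholding `mu[i]` at 0.5 gives the estimated true label for instance i.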
  • 8. Worker Privacy Issue: Simply passing answers to the requester can invade worker privacy
      ■ Sensitive information in answers
        □ Location
          • AED4 collects locations of AEDs on a map
          • The movement history of a worker is revealed
        □ Personal information in questionnaire tasks
          • Interests of workers, personal information (quasi-identifiers)
          • Joining with other data sets can identify anonymous workers
        □ Ability
          • Quality control methods reveal the ability of a worker
          • This can demotivate workers from joining volunteer-based crowdsourcing
  • 9. Our Problem Setting: We propose a worker-private quality control problem
      ■ Worker-Private Quality Control Problem
        □ Input: Crowd labels {yij | i = 1,..., I, j = 1,..., J}
        □ Output: Estimated true labels {yi | i = 1,..., I}
        □ Subject to: Labels and abilities are kept worker-private
      Definition: Worker j's value vj is worker-private if others cannot determine vj uniquely
      cf. A similar definition can be found in query auditing
  • 11. Proposed Method: Overview. Propose a privacy-preserving inference algorithm for the LC model
      ■ Worker-Private Latent Class Protocol
        □ Model: Latent class model (same as the previous one)
        □ Secure Inference (new):
          • E-step: Requester & workers estimate {yi} by secure computation
          • M-step: Each worker updates αj, βj secretly
      [Figure: requester and workers connected by secure computation; workers keep their answers secret, and the requester obtains the true answers]
  • 12. Proposed Method: Building Block. Secure sum allows us to compute the sum without privacy invasion
      ■ Secure Sum Protocol (generalized Paillier cryptosystem [Damgård+,01]): compute Σj vj when each worker j holds a value vj secretly
        □ Additive homomorphic cryptosystem: for plaintexts v1, v2 ∈ Z_n and ciphertexts Enc(v1), Enc(v2), Enc(v1 + v2) = Enc(v1)·Enc(v2) holds
        □ Protocol:
          1) Each worker j computes Enc(vj), and the parties compute Enc(Σj vj)
          2) The parties decrypt Enc(Σj vj) using distributed secret keys
      Lemma: After executing the protocol, no party learns anything other than its initial knowledge and the sum.
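The additive homomorphism behind the secure sum can be tried out with a toy version of the basic Paillier scheme. This sketch makes several simplifying assumptions that differ from the slide: it uses plain Paillier rather than the generalized variant of [Damgård+,01], it uses two fixed Mersenne primes that are far too small to be secure, and a single party holds the secret key instead of decrypting with distributed keys. All function names are illustrative.

```python
import math
import secrets

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation from two fixed primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Enc(m) = (1 + m*n) * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return c1 * c2 % (n * n)

def decrypt(n, sk, c):
    lam, mu = sk
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def secure_sum(values):
    """Each worker encrypts its own value; only the aggregate is decrypted."""
    n, sk = keygen()
    total = encrypt(n, 0)
    for v in values:  # in a real protocol each encrypt() runs on a worker's machine
        total = add_encrypted(n, total, encrypt(n, v))
    return decrypt(n, sk, total)

print(secure_sum([3, 5, 7]))
```

Because encryption is randomized, two encryptions of the same value differ; the identity Enc(v1 + v2) = Enc(v1)·Enc(v2) should be read as "the product decrypts to the sum".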
  • 13. Proposed Method: Algorithm. Incorporate workers into computation to preserve worker privacy
      ■ Worker-Private Latent Class Protocol
        □ Parameters: {μi}, p, {αj}, {βj}
          • μi = Pr[yi = 1 | Data], p = Pr[yi = 1]
          • αj = Pr[yij = yi | yi = 1], βj = Pr[yij = yi | yi = 0]
      [Figure: example with three instances and three workers; a top row holds the true-label estimates μ1, μ2, μ3 and p, and each worker row holds that worker's labels (1 0 1; 1 0 0; 0 0 0) and abilities (α1, β1; α2, β2; α3, β3)]
  • 14. Proposed Method: Algorithm (continued)
      [Figure: the same example, annotated "Public" and "Private values of each worker"; presumably the true-label estimates {μi} and p are the public part, and each worker's labels and abilities (αj, βj) are that worker's private values]
  • 15. Proposed Method: Algorithm (continued)
      ■ Worker-Private Latent Class Protocol
        □ Parameters: {μi}, p, {αj}, {βj}
        □ E-step: Parties update true labels using secure sum (a weighted majority vote of crowd labels)
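One way to see why the E-step reduces to a sum is to write the posterior of the latent class model in log-odds form. The derivation below follows from the definitions of p, αj and βj given in the deck and assumes workers answer independently given the true label; it is a standard rewriting and may not match term for term the quantities the authors' protocol actually sums.

```latex
\log\frac{\mu_i}{1-\mu_i}
  = \log\frac{p}{1-p}
  + \sum_{j}\left[
      y_{ij}\,\log\frac{\alpha_j}{1-\beta_j}
      + (1-y_{ij})\,\log\frac{1-\alpha_j}{\beta_j}
    \right]
```

Here the sum runs over the workers j who labeled instance i. Each bracketed term depends only on worker j's own label and abilities, so worker j can compute it locally, and only the total is needed to recover μi. The weights log(αj / (1 - βj)) and log((1 - αj) / βj) are what make this a weighted majority vote.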
  • 16. Proposed Method: Algorithm (continued)
      ■ Worker-Private Latent Class Protocol
        □ Parameters: {μi}, p, {αj}, {βj}
        □ M-step: Each worker independently updates its abilities (by checking agreement between its labels and the true labels)
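The local M-step can be sketched as a function that a single worker runs on its own machine, using only its private labels and the true-label estimates. The name `update_abilities`, the use of `None` for unlabeled instances, and the `eps` guard are assumptions made for this sketch; the update itself is the usual soft-count estimate for the latent class model.

```python
def update_abilities(my_labels, mu, eps=1e-6):
    """Worker-local M-step.

    my_labels[i] is this worker's label in {0, 1}, or None if instance i was skipped.
    mu[i] is the current estimate of Pr[y_i = 1].
    Returns (alpha, beta) without sending the labels to anyone.
    """
    pairs = [(m, y) for m, y in zip(mu, my_labels) if y is not None]
    pos = sum(m for m, _ in pairs)       # soft count of instances with y_i = 1
    neg = sum(1 - m for m, _ in pairs)   # soft count of instances with y_i = 0
    alpha = sum(m * y for m, y in pairs) / max(pos, eps)
    beta = sum((1 - m) * (1 - y) for m, y in pairs) / max(neg, eps)
    return alpha, beta

alpha, beta = update_abilities([1, 0, 0, None], [1.0, 0.0, 1.0, 0.0])
```

In this usage the worker agrees with one of the two positive instances it labeled and with the one negative instance it labeled.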
  • 17. Proposed Method: Security Analysis. Making true labels public does not invade worker privacy
      Theorem: After executing the protocol, each worker's labels and abilities are kept worker-private.
      ■ Conditions
        □ #(workers) ≥ 3
        □ For each instance, there exists at least one worker who does not give a label to the instance.
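The counting intuition behind the first condition can be illustrated with a toy enumeration. This is not the theorem's proof and it simplifies the setting: it assumes the revealed quantity is a plain sum of binary values and asks which values of the other workers are consistent with what one worker knows. The function name is hypothetical.

```python
from itertools import product

def consistent_values(own_value, total, n_others):
    """All binary value vectors of the other workers that match the revealed sum."""
    return [vs for vs in product((0, 1), repeat=n_others)
            if own_value + sum(vs) == total]

# Two workers: the other worker's value is determined uniquely.
print(consistent_values(own_value=1, total=1, n_others=1))
# Three workers: more than one assignment can explain the same sum.
print(consistent_values(own_value=1, total=2, n_others=2))
```

With only two workers, subtracting one's own value from the sum always pins down the other value, so it cannot be worker-private. With three workers some sums leave several candidates, although a unanimous sum would still be revealing in this toy.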
  • 19. Experiments: Overview. Evaluate two drawbacks of introducing secure computation
      ■ Cons of secure computation
        1) Approximation:
          • The secure sum protocol works only on integers
          • Use an approximation parameter L (a large number) to convert worker j's value vj into the integer round(L vj)
        2) Computation time:
          • Cryptographic (& communication) overhead
      ■ Data Set
        □ Duchenne Data Set [Whitehill+,09]
          • Judge whether a smile is fake or not
          • #(workers)=20, #(instances)=159
      [Figure: example images, cited from [Whitehill+,09]]
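The integer conversion round(L vj) can be sketched as a fixed-point encoding. The helper names, the toy modulus `N`, and the convention of mapping negative values into the upper half of Z_N are assumptions for this sketch; a plain sum stands in for the encrypted sum. Each value is rounded by at most 1/2, so the decoded sum of J values is off by at most J / (2L).

```python
N = 2**61 - 1  # toy plaintext modulus, assumed much larger than any encoded sum

def encode(v, L):
    """Map a real value to an element of Z_N with L units per 1.0."""
    return round(L * v) % N

def decode_sum(encoded_values, L):
    """Decode a modular sum, reading the upper half of Z_N as negative numbers."""
    s = sum(encoded_values) % N
    if s > N // 2:
        s -= N
    return s / L

values = [0.25, -1.5, 0.3333]
L = 1000
approx = decode_sum([encode(v, L) for v in values], L)
print(approx, sum(values))
```

Increasing L tightens the bound J / (2L), at the cost of larger integers inside the cryptosystem.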
  • 20. Experiments: (1) Approximation Accuracy. Estimation errors can be handled by the approximation parameter
      ■ Relative Errors of Estimated Parameters
        □ Compare estimated model parameters w/ & w/o secure computation
        □ The approximation parameter L can control errors arbitrarily
        □ Note: Accuracy of the true labels was the same as the original
      [Figure: relative errors of estimated parameters vs. approximation parameter L]
  • 21. Experiments: (2) Computation Time. Additional computation time on a real data set was less than a second
      ■ Cryptographic Overhead
        □ Key generation
        □ One iteration of the algorithm (encryption & decryption)
      0.8 sec on the real data set (#(workers)=20, #(instances)=159, #(iterations)=15)
      [Figure: computation time vs. #(workers)]
  • 22. Conclusion: We proposed the notion of worker privacy
      ■ Contributions of Our Work
        □ Notion of worker privacy
          • Workers' sensitive information can leak from their answers
        □ WPLC protocol
          • Introduces secure computation into the LC method
          • Security is theoretically guaranteed
        □ Experiments
          • Accuracy can be controlled by a hyperparameter
          • Computation time is tolerable