Spatial Crowdsourcing (SC) is a transformative platform that engages individuals, groups and communities in the act of collecting, analyzing, and disseminating environmental, social and other spatio-temporal information. The objective of SC is to outsource a set of spatio-temporal tasks to a set of workers, i.e., individuals with mobile devices that perform the tasks by physically traveling to specified locations of interest. However, current solutions require the workers, who in many cases are simply volunteering for a cause, to disclose their locations to untrustworthy entities. In this paper, we introduce a framework for protecting location privacy of workers participating in SC tasks. We argue that existing location privacy techniques are not sufficient for SC, and we propose a mechanism based on differential privacy and geocasting that achieves effective SC services while offering privacy guarantees to workers. We investigate analytical models and task assignment strategies that balance multiple crucial aspects of SC functionality, such as task completion rate, worker travel distance and system overhead. Extensive experimental results on real-world datasets show that the proposed technique protects workers' location privacy without incurring significant performance metrics penalties.
Link: http://dl.acm.org/citation.cfm?id=2732966
A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing
1. A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing
VLDB 2014
CSCI 587, Nov 12 2014
Cyrus Shahabi
Privacy in spatial crowdsourcing
2. Motivation
• Ubiquity of mobile users
– 6.5 billion mobile subscriptions, 93.5% of the world population [1]
• Technology advances on mobiles
– Smartphone sensors, e.g., video cameras
• Network bandwidth improvements
– From 2.5G (up to 384 Kbps) to 3G (up to 14.7 Mbps) and recently 4G (up to 100 Mbps)
[1] http://mobithinking.com/mobile-marketing-tools/latest-mobile-stats/
3. Spatial Crowdsourcing
• Crowdsourcing
– Outsourcing a set of tasks to a set of workers
• Spatial Crowdsourcing
– Crowdsourcing a set of spatial tasks to a set of workers
– A spatial task is related to a location, e.g., taking pictures
Location privacy is one of the major impediments that may hinder workers from participation in SC
4. Problem Statement
[Figure: Workers, SC-server, Requesters; workers report their locations to the SC-server]
Current solutions require the workers to disclose their locations to untrustworthy entities, i.e., the SC-server.
A framework for protecting privacy of worker locations, whereby the SC-server only has access to data sanitized according to differential privacy.
7. Related Work
• Pseudonymity (using a fake identity)
– e.g., fake identity + location == resident of the home
• K-anonymity model (a record is not distinguishable among k other records)
– Identities are known
– Location k-anonymity fails to keep the location of a subject unidentifiable when all k users reside in the exact same location
– k-anonymity does not provide rigorous privacy
• Cryptography
– Such techniques are computationally expensive => not suitable for SC applications
8. Differential Privacy (DP)
DP ensures that an adversary cannot tell from the sanitized data whether an individual is present or not in the original data.
ε-distinguishability [Dwork'06]
A database produces transcript U on a set of queries. Transcript U satisfies ε-distinguishability if for every pair of sibling datasets $D_1$ and $D_2$, such that $|D_1| = |D_2|$ and they differ in only one record, it holds that
$\ln \frac{\Pr[QS_{D_1} = U]}{\Pr[QS_{D_2} = U]} \le \varepsilon$
ε: privacy budget
$L_1$-sensitivity
Given neighboring datasets $D_1$ and $D_2$, the sensitivity of query set QS is the maximum change in their query results, summed over the q queries:
$\sigma(QS) = \max_{D_1, D_2} \sum_{i=1}^{q} |QS(D_1) - QS(D_2)|$
[Dwork'06] shows that it is sufficient to achieve ε-DP by adding random zero-mean Laplace noise with scale $\lambda = \sigma(QS)/\varepsilon$
DP allows only aggregate queries, e.g., count, sum.
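A minimal sketch of the Laplace mechanism for a single count query, added here for illustration. It assumes a sensitivity of 1 (one individual changes a count by at most 1) and zero-mean Laplace noise with scale equal to sensitivity divided by ε. The function names `laplace` and `noisy_count` are hypothetical and do not come from the slides.

```python
import random


def laplace(scale, rng):
    """Zero-mean Laplace sample with the given scale.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, eps, sensitivity=1.0, rng=None):
    """Return a sanitized count: the true count plus Laplace noise.

    Assumption of this sketch: the noise scale is sensitivity / eps.
    """
    rng = rng or random.Random()
    return true_count + laplace(sensitivity / eps, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    for eps in (0.1, 0.4, 0.7, 1.0):
        print(eps, noisy_count(100, eps, rng=rng))
```

A smaller ε gives a larger noise scale, so the released count is more private but less accurate.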
10. Privacy Framework
[Figure: Workers, Cell Service Provider (CSP) with its worker database, SC-server, and Requesters. Arrows: 0. Report Locations; 1. Sanitized Release (PSD); 2. Task Request t; 3. Geocast {t, GR}; 4. Consent]
0. Workers send their locations to a trusted CSP
1. CSP releases a PSD according to ε. PSD is accessed by SC-server
2. SC-server receives tasks from requesters
3. When SC-server receives task t, it queries the PSD to determine a GR that encloses sufficient workers. Then, SC-server initiates geocast communication to disseminate t to all workers within GR
4. Workers confirm their availability to perform the assigned task
Workers trust the CSP
Workers do not trust SC-server and requesters
Focus on private task assignment rather than post assignment
11. Design Goal and Performance Metrics
Protecting worker location may reduce the effectiveness and efficiency of worker-task matching, captured by the following metrics:
• Assignment Success Rate (ASR): measures the ratio of tasks accepted by workers to the total number of task requests
• Worker Travel Distance (WTD): the average travel distance of all workers
• System Overhead: the average number of notified workers (ANW). ANW affects both the communication overhead required to geocast task requests and the computation overhead of the matching algorithm
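The three metrics can be illustrated with a small sketch. The record layout (one dictionary per task request with `accepted`, `travel_km`, and `notified`) is a hypothetical choice made here, and WTD is averaged over workers who accepted a task, which is one reading of "all workers".

```python
from statistics import mean


def summarize(outcomes):
    """Compute (ASR, WTD, ANW) from per-task outcome records.

    Each record is a dict with hypothetical keys:
      accepted  - True if some worker accepted the task
      travel_km - travel distance of the accepting worker (None if no one accepted)
      notified  - number of workers that received the geocast
    """
    asr = sum(1 for o in outcomes if o["accepted"]) / len(outcomes)
    distances = [o["travel_km"] for o in outcomes if o["accepted"]]
    wtd = mean(distances) if distances else 0.0
    anw = mean(o["notified"] for o in outcomes)
    return asr, wtd, anw


if __name__ == "__main__":
    demo = [
        {"accepted": True, "travel_km": 1.0, "notified": 10},
        {"accepted": True, "travel_km": 3.0, "notified": 20},
        {"accepted": False, "travel_km": None, "notified": 30},
        {"accepted": True, "travel_km": 2.0, "notified": 20},
    ]
    print(summarize(demo))
```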
13. Adaptive Grid (Worker PSD)
[Figure: two-level adaptive grid. Level 1 has cells A, B, C, D with noisy counts $N'_A = 100$, $N'_B = 100$, $N'_C = 100$, $N'_D = 200$; level 2 splits them into cells $c_1$ to $c_{21}$]
Level 1: creates a coarse-grained, fixed-size $m_1 \times m_1$ grid over the data domain. Then issues $m_1^2$ count queries, one for each level-1 cell, using $\varepsilon_1$
$m_1 = \max\left(10, \left\lceil \frac{1}{4} \sqrt{\frac{N \times \varepsilon}{k_2}} \right\rceil\right)$
Level 2: partitions each level-1 cell into $m_2 \times m_2$ level-2 cells; $m_2$ is adaptively chosen based on the noisy count $N'$ of the level-1 cell
$m_2 = \left\lceil \sqrt{\frac{N' \times \varepsilon_2}{k_2}} \right\rceil$
$\varepsilon = \varepsilon_1 + \varepsilon_2$
[Qardaji'13]
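A rough sketch of the two-level construction, for illustration only. Several choices are assumptions of this sketch rather than statements from the slides: points live in the unit square, the budget is split evenly (`alpha = 0.5`), the constant in the level-1 formula is passed as the hypothetical parameter `k_level1`, negative noisy counts are clamped to zero, and every level-1 cell keeps at least one level-2 cell. Level-2 noisy counts are omitted.

```python
import math
import random


def laplace(scale, rng):
    """Zero-mean Laplace sample: difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_adaptive_grid(points, eps, alpha=0.5, k_level1=5.0, k2=5.0, seed=0):
    """Return (m1, grid) where grid maps (row, col) -> (noisy_count, m2).

    points: iterable of (x, y) with 0 <= x, y < 1 (assumed domain).
    """
    rng = random.Random(seed)
    eps1, eps2 = alpha * eps, (1.0 - alpha) * eps

    # Level-1 granularity: coarse, fixed-size grid with a floor of 10.
    m1 = max(10, math.ceil(0.25 * math.sqrt(len(points) * eps / k_level1)))

    # Exact level-1 counts (never released directly).
    counts = {}
    for x, y in points:
        cell = (min(int(y * m1), m1 - 1), min(int(x * m1), m1 - 1))
        counts[cell] = counts.get(cell, 0) + 1

    grid = {}
    for row in range(m1):
        for col in range(m1):
            # Cells are disjoint, so one record changes one count by 1:
            # the sensitivity is 1 and the noise scale is 1 / eps1.
            noisy = counts.get((row, col), 0) + laplace(1.0 / eps1, rng)
            # Level-2 granularity adapts to the noisy level-1 count.
            m2 = max(1, math.ceil(math.sqrt(max(noisy, 0.0) * eps2 / k2)))
            grid[(row, col)] = (noisy, m2)
    return m1, grid


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(1000)]
    m1, grid = build_adaptive_grid(pts, eps=1.0)
    print(m1, grid[(0, 0)])
```

Dense level-1 cells receive a finer level-2 split, while sparse cells stay coarse.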
14. Customized AG
Expected #workers (noisy count) in level-2 cells:
$n = N'/m_2^2 = k_2/\varepsilon_2$
Large $n$ leads to high communication cost
Increase $m_2$ to decrease overhead, but only to the point where there is at least one worker in a cell
The probability that the real count is larger than zero:
$p_h = 1 - \frac{1}{2} \exp\left(-\frac{count_{PSD}}{1/\varepsilon_2}\right)$
Example with $N' = 100$:
Original AG ($k_2 = 5$)
ε      ε_2     m_2    n
1      0.5     3      11
0.5    0.25    2      25
0.1    0.05    1      100
Customized AG ($k_2 = \sqrt{2}$, $p_h = 88\%$)
ε      ε_2     m_2    n
1      0.5     6      2.8
0.5    0.25    5      5.6
0.1    0.05    2      28
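The customized AG numbers on this slide can be checked with a short calculation. This sketch assumes that m_2 is obtained by rounding the square root up, that k_2 is the square root of 2 (about 1.41), and that count_PSD is taken equal to the expected count n; the function names are hypothetical.

```python
import math


def level2_granularity(noisy_count, eps2, k2):
    """m_2 = ceil(sqrt(N' * eps_2 / k_2)); rounding up is an assumption."""
    return math.ceil(math.sqrt(noisy_count * eps2 / k2))


def expected_workers_per_cell(eps2, k2):
    """n = N' / m_2^2, which simplifies to k_2 / eps_2 before rounding m_2."""
    return k2 / eps2


def prob_nonempty(count_psd, eps2):
    """p_h = 1 - 0.5 * exp(-count_PSD / (1 / eps_2)), for count_PSD >= 0."""
    return 1.0 - 0.5 * math.exp(-count_psd * eps2)


if __name__ == "__main__":
    k2 = math.sqrt(2)  # assumed customized constant
    for eps2 in (0.5, 0.25, 0.05):
        m2 = level2_granularity(100, eps2, k2)
        n = expected_workers_per_cell(eps2, k2)
        print(eps2, m2, round(n, 2), round(prob_nonempty(n, eps2), 3))
```

Because n times ε_2 equals k_2, the probability p_h depends only on k_2 under these assumptions, which is why a single p_h value accompanies the customized setting.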
15. Customized AG
• Original AG and Customized AG adapt to data distributions
• Original AG minimizes overall estimation error of region queries while customized AG increases the number of 2nd level cells
[Figure: Original AG and Customized AG on the Yelp dataset]
17. Analytical Utility Model
SC-server establishes an Expected Utility ($EU$) threshold, which is the targeted success rate for a task. $EU > p_a$.
$X$ is a random variable for an event that a worker accepts a received task
$P(X = True) = p_a; \quad P(X = False) = 1 - p_a$
Assuming $w$ independent workers:
$X \sim Binomial(w, p_a) \Rightarrow U = 1 - (1 - p_a)^w$
$U$ is the probability that at least one worker accepts the task
We define Acceptance Rate as a decreasing function of task-worker distance (e.g., linear, Zipfian)
$p_a = F(d); \quad 0 \le p_a \le 1$
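A small worked sketch of the utility model. The linear acceptance function below, which falls from a maximum rate at distance 0 to zero at the maximum travel distance, is an assumed form of F(d); the names `acceptance_rate_linear`, `utility`, and `min_workers` are hypothetical.

```python
def acceptance_rate_linear(distance_km, max_ar, mtd_km):
    """Assumed linear F(d): max_ar at d = 0, decreasing to 0 at d = mtd_km."""
    if distance_km >= mtd_km:
        return 0.0
    return max_ar * (1.0 - distance_km / mtd_km)


def utility(w, p_a):
    """U = 1 - (1 - p_a)^w: probability that at least one of w workers accepts."""
    return 1.0 - (1.0 - p_a) ** w


def min_workers(eu, p_a):
    """Smallest w with utility(w, p_a) >= eu, found by simple iteration."""
    if not 0.0 < p_a <= 1.0 or not 0.0 <= eu < 1.0:
        raise ValueError("need 0 < p_a <= 1 and 0 <= eu < 1")
    w = 0
    while utility(w, p_a) < eu:
        w += 1
    return w


if __name__ == "__main__":
    p_a = acceptance_rate_linear(1.8, max_ar=0.4, mtd_km=3.6)
    print(p_a, utility(5, p_a), min_workers(0.9, p_a))
```

The loop in `min_workers` avoids floating-point edge cases that a closed-form logarithm with rounding up can hit when the ratio is an exact integer.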
19. Geocast Region Construction
Determines a small region that contains sufficient workers
Greedy Algorithm (GDY)
1. Init GR = {}, max-heap of candidates Q = {the cell that contains t}
2. $c_i \leftarrow Q$
3. $U \leftarrow 1 - (1 - U)(1 - U_{c_i})$
4. If $U \ge EU$, return GR
5. $neighbors = \{c_i\text{'s neighbors}\} - GR \cap MTD$
6. $Q = Q \cup neighbors$; Go to 2.
[Figure: task t and candidate heap Q over grid cells $c_1$ to $c_{21}$]
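The greedy steps can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the heap is keyed by each cell's own utility, the popped cell is added to GR before the threshold check, and cell utility is computed from a rounded noisy count with a per-cell acceptance rate. The data layout and all names are hypothetical.

```python
import heapq


def cell_utility(noisy_count, p_accept):
    """Utility of one cell: 1 - (1 - p_a)^w with w the rounded noisy count."""
    w = max(0, round(noisy_count))
    return 1.0 - (1.0 - p_accept) ** w


def greedy_geocast_region(start, cells, neighbors, eu, within_mtd=lambda c: True):
    """Grow a geocast region from the cell containing the task.

    cells:      cell id -> (noisy_count, p_accept)
    neighbors:  cell id -> list of adjacent cell ids
    within_mtd: predicate, True if the cell is within the maximum travel distance
    Returns (region as a list of cell ids, achieved utility).
    """
    region, seen, u = [], {start}, 0.0
    # heapq is a min-heap, so utilities are negated to pop the largest first.
    heap = [(-cell_utility(*cells[start]), start)]
    while heap:
        neg_uc, cell = heapq.heappop(heap)       # step 2
        region.append(cell)
        u = 1.0 - (1.0 - u) * (1.0 - (-neg_uc))  # step 3
        if u >= eu:                              # step 4
            return region, u
        for nb in neighbors.get(cell, []):       # steps 5 and 6
            if nb not in seen and within_mtd(nb):
                seen.add(nb)
                heapq.heappush(heap, (-cell_utility(*cells[nb]), nb))
    return region, u  # threshold not reachable with the available cells


if __name__ == "__main__":
    cells = {"c7": (2, 0.3), "c6": (3, 0.2), "c8": (1, 0.1)}
    neighbors = {"c7": ["c6", "c8"], "c6": ["c7"], "c8": ["c7"]}
    print(greedy_geocast_region("c7", cells, neighbors, eu=0.7))
```

In the demo, the starting cell alone gives a utility of about 0.51, so the neighbor with the highest utility is added next and the region stops growing once the threshold of 0.7 is passed.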
20. Partial Cell Selection
The number of workers can still be large with AG, especially when $\varepsilon_2$ is small
Allow partial cell inclusion on the last added cell $c_i$
[Figure: splitting cell $c_i$ into a sub-cell $c_i'$, with points $t$ and $t_0$ to $t_8$; example of splitting $c_7$ in the grid of cells $c_1$ to $c_{21}$]
21. Communication Cost
The more compact the GR, the lower the cost
Infrastructure-based Mode vs. Infrastructure-less Mode
[Figure: Internet, WLAN, Cellular, and Mobile Ad-hoc Networks; a GR around task t in the grid of cells $c_1$ to $c_{21}$]
Measurement:
$Hop\ count = \frac{Farthest\ distance\ between\ two\ workers}{2 \times Communication\ range}$
Digital Compactness Measurement [Kim'84]
$DCM = \frac{area(GR)}{area(MIN\_BALL)}$
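Both cost measures can be computed directly for a small region. This sketch assumes the GR is a set of axis-aligned rectangular cells given as (x_min, y_min, x_max, y_max), takes MIN_BALL to be the smallest circle enclosing all cell corners, and finds that circle by brute force over pairs and triples of corners, which is only practical for small regions. All names are hypothetical.

```python
import math
from itertools import combinations


def hop_count(workers, comm_range):
    """Farthest distance between two workers divided by (2 * communication range)."""
    farthest = max((math.dist(p, q) for p, q in combinations(workers, 2)), default=0.0)
    return farthest / (2.0 * comm_range)


def _circle_from_two(p, q):
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0, math.dist(p, q) / 2.0)


def _circle_from_three(a, b, c):
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:  # collinear or duplicate points: no circumcircle
        return None
    sa, sb, sc = (a[0] ** 2 + a[1] ** 2, b[0] ** 2 + b[1] ** 2, c[0] ** 2 + c[1] ** 2)
    ux = (sa * (b[1] - c[1]) + sb * (c[1] - a[1]) + sc * (a[1] - b[1])) / d
    uy = (sa * (c[0] - b[0]) + sb * (a[0] - c[0]) + sc * (b[0] - a[0])) / d
    return (ux, uy, math.dist((ux, uy), a))


def min_enclosing_circle(points):
    """Brute force: the smallest candidate circle that contains every point."""
    candidates = [_circle_from_two(p, q) for p, q in combinations(points, 2)]
    for triple in combinations(points, 3):
        circle = _circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    best = None
    for cx, cy, r in candidates:
        if all(math.dist((cx, cy), p) <= r + 1e-9 for p in points):
            if best is None or r < best[2]:
                best = (cx, cy, r)
    return best


def dcm(cells):
    """DCM = area(GR) / area(smallest circle enclosing the GR)."""
    area = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in cells)
    corners = sorted({(x, y) for x0, y0, x1, y1 in cells for x in (x0, x1) for y in (y0, y1)})
    _, _, radius = min_enclosing_circle(corners)
    return area / (math.pi * radius ** 2)


if __name__ == "__main__":
    region = [(0, 0, 1, 1), (1, 0, 2, 1)]  # two unit cells side by side
    print(dcm(region), hop_count([(0, 0), (3, 4)], comm_range=0.5))
```

A DCM closer to 1 means the region fills more of its enclosing circle, which is the sense of "compact" used on the slide.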
24. Experimental Setup
• Datasets
Name       #Tasks     #Workers    MTD (km)
Gowalla    151,075    6,160       3.6
Yelp       15,583     70,817      13.5
• Assumptions
– Gowalla and Yelp users are workers
– Check-in points (i.e., of restaurants) are task locations
• Parameter settings
– $\varepsilon = \{0.1, 0.4, 0.7, 1\}$
– $EU = \{0.3, 0.5, 0.7, 0.9\}$
– $MaxAR = \{0.1, 0.4, 0.7, 1\}$
• 1000 random tasks x 10 seeds
25. GR Construction Heuristics (Gow.-Linear)
[Figure: three bar charts (ANW, WTD-FC, HOP) comparing GDY, G-GR, G-PA, and G-GP at Eps = 0.1, 0.4, 0.7, 1]
GDY = geocast (GREedy algorithm) + original Adaptive grid (AG) [Qardaji'13]
G-GR = geocast + AG with customized GRanularity
G-PA = geocast with PArtial cell selection + original Adaptive grid (AG)
G-GP = geocast with Partial cell selection + AG with customized Granularity
26. Effect of Grid Size on ASR
[Figure: ASR (50 to 100) versus $k_2$ (0.1, 0.2, 0.4, 0.8, 1.41, 1.6, 3.2, 6.4, 12.8, 25.6) for Gowalla-Linear, Gowalla-Zipf, Yelp-Linear, and Yelp-Zipf, with over-provision and under-provision regions marked]
Average ASR over all values of budget by varying $k_2$
32. Conclusion
Identified geocasting as a needed step to preserve privacy prior to workers consenting to a task
Introduced a novel privacy-aware framework in SC, which enables workers' participation without compromising their location privacy
Provided heuristics and optimizations for determining effective geocast regions that achieve high assignment success rate with low overhead
Experimental results on real datasets show that the proposed techniques are effective and the cost of privacy is practical
33. References
Hien To, Gabriel Ghinita, Cyrus Shahabi. A Framework for Protecting Worker Location Privacy in Spatial Crowdsourcing. In Proceedings of the 40th International Conference on Very Large Data Bases (VLDB 2014)
Hien To, Gabriel Ghinita, Cyrus Shahabi. PriGeoCrowd: A Toolbox for Private Spatial Crowdsourcing. (demo) In Proceedings of the 31st IEEE International Conference on Data Engineering (ICDE 2015)