[WWW2014] Reconciling Mobile App Privacy and Usability on Smartphones: Could User Privacy Profiles Help?
1. Reconciling Mobile App Privacy and Usability on Smartphones: Could User Privacy Profiles Help?
Bin Liu, Jialiu Lin, Norman Sadeh
School of Computer Science
Carnegie Mellon University
2. Explosion of Smartphone Privacy Settings
• Users are expected to configure/review an
unrealistically large number of privacy / permission
settings! [Lin 2012]
3. Approaches for Privacy Settings
iOS Privacy Settings: give users fine-grained controls, but the number of decisions is overwhelming.
4. Approaches for Privacy Settings
Android Permissions: permissions are granted upfront during installation and are not actively shown again; grant/deny is on a per-app basis.
Insufficient control.
5. Neither of them works well
• iOS Privacy Settings:
– Users are overwhelmed with options!
• Android Permissions:
– Ineffective [Felt 2012], bad timing [Kelley 2012]
– Lack of sufficiently fine-grained control
– In Android 4.3, App Ops was introduced:
• Post-installation fine-grained control
6. App Ops
Then it has the same problem as iOS Privacy Settings: way more settings than users can handle!
7. How can we simplify this process?
• Users care about app privacy [Lin 2012, Kelley 2012]
– but they are overwhelmed with options.
• Can default settings take care of everything?
– Ideally, people would feel and behave in similar ways.
– However, people's app privacy preferences are diverse [Agarwal 2013, Lin 2012].
8. Research Question
Can we build a manageable framework that captures users' diverse preferences and reduces them to a small number of profiles?
"I want to choose the profile that: protects my location information… keeps my phone call history away from social apps…"
"Whatever, just give those apps the permissions."
9. Dataset Description
• We were given access to a unique corpus of data:
users’ actual permission settings from LBE Privacy
Guard
– This app runs on rooted Android phones.
– Available on Google Play and several other app stores.
• It relies on API interception technology to give users
the ability to control 12 permissions that can
possibly be requested by an app
– e.g. location, phone ID, call monitoring, SMS, etc.
10. A sample snapshot of LBE Privacy Guard on a MIUI 2 phone
Users can choose "Allow", "Deny", "Ask" (to be dynamically prompted), or leave the setting as "Default" (which is managed by LBE).
11. Dataset Description
• Permission settings of 4.8 million LBE users
– over a 10-day period (May 1st–10th, 2013)
• Users' settings are mostly stable after 10 days; the majority of users are done making changes.
– Format: [user, app, permission, decision]
• Decision: Allow, Deny, Ask, Default.
• We can tell whether each decision was made/reviewed by the user or left to default settings. (In the analysis, we exclude decisions in which users were not involved.)
12. Preprocessing
• Filtering criteria:
– Users: >=20 apps, >=1 non-default and >=1 non-allow settings.
– Apps: >=10 users, >=1 permission request, available on Google Play during the same time period.
– Permission request from an app: >5 users' decisions.
• After this screening process, the corpus analyzed in this study includes:
– 239,402 representative users
– 12,119 representative apps
– 28,630,179 decision records
– On average, each user has 22.66 apps; each app requested 3.03 of the 12 observed permissions.
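The user-screening step above can be sketched in a few lines. This is a minimal, hypothetical illustration on toy records, not the authors' code; it covers the per-user and per-app thresholds but omits the Google Play availability and >5-decision checks:

```python
from collections import defaultdict

def filter_corpus(records, min_apps=20, min_users=10):
    """Screen raw [user, app, permission, decision] records using the
    thresholds on the slide (illustrative helper, not the authors' code)."""
    apps_per_user = defaultdict(set)
    users_per_app = defaultdict(set)
    for user, app, perm, decision in records:
        apps_per_user[user].add(app)
        users_per_app[app].add(user)
    # Users must have touched their settings: >=1 non-default AND >=1
    # non-allow decision (possibly in different records).
    non_default = {u for u, a, p, d in records if d != "Default"}
    non_allow = {u for u, a, p, d in records if d != "Allow"}
    keep_users = {u for u, apps in apps_per_user.items()
                  if len(apps) >= min_apps and u in non_default and u in non_allow}
    keep_apps = {a for a, users in users_per_app.items()
                 if len(users) >= min_users}
    return [r for r in records if r[0] in keep_users and r[1] in keep_apps]
```

With the real thresholds (20 apps, 10 users) this reduces the 4.8 million raw users to the 239,402 representative users analyzed above.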
13. Diversity of users' app permission settings
App-permission pairs with >80% agreement among users: ONLY 63.9%.
One-size-fits-all does not apply!
[Figure: distribution of users' decisions ("Allow", "Deny", and "Ask") for each app-permission pair.]
14. Could we predict their settings?
• Specifically, can we learn a function F: (user, app, permission) → decision?
– Assumption: in this study we restrict the set of decisions to "Allow" or "Deny" (the majority of decisions).
– We train classifiers on 20% of the apps a user has already installed and evaluate them by predicting decisions on the other 80%.
• Equivalent to assuming that a user has already made the corresponding app-permission decisions for 4 or 5 installed apps.
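The per-user 20/80 evaluation split above can be sketched as follows. This is an illustrative stand-in (function name and seeding are mine, not from the paper):

```python
import random

def per_user_split(user_apps, train_frac=0.2, seed=0):
    """Hold out ~20% of a user's installed apps as 'already answered'
    training decisions and predict the rest (sketch of the setup above)."""
    rng = random.Random(seed)
    apps = sorted(user_apps)
    rng.shuffle(apps)
    k = max(1, round(train_frac * len(apps)))
    return apps[:k], apps[k:]
```

For the average user with 22.66 apps this yields the 4-to-5-app training set mentioned above.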
15. High Dimensionality & Sparsity Challenge
• Users' data is sparse.
– On average, each user installs only 22.66 of the 12,119 apps.
• Two approaches were considered:
– Approach 1: Aggregation
• Instead of dealing with decisions app by app, study users' preferences at an aggregate level.
– Approach 2: SVD
• Similar to techniques used in recommender systems.
16. Details of the prediction settings
• Apply an SVM classifier with linear kernels (LibLinear)
– Efficient for large-scale input (14.5 million rows)
– Convenient for incorporating additional features
• Aggregation:
– Collect users' general preferences on each of the 12 permissions across all installed apps.
• SVD:
– Reduce the dimensionality of the #users × #(app-permission pairs) matrix to 100. (The app-permission pairs are generated from the permission requests of the 1,000 most popular apps.)
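The SVD-plus-linear-SVM pipeline above can be sketched with scikit-learn on synthetic data. Everything here is a toy: the matrix is random, the dimensions are shrunk (10 components instead of 100, 50 pairs instead of the top-1000 apps' pairs), and the target is an arbitrary stand-in, so this illustrates the shape of the method rather than reproducing the authors' setup:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Toy stand-in for the user x (app, permission) decision matrix:
# +1 = Allow, -1 = Deny, 0 = unobserved.
M = rng.choice([-1.0, 0.0, 1.0], size=(200, 50))

# SVD step: compress the sparse decision matrix into a low-rank
# user embedding (the paper uses 100 dimensions).
svd = TruncatedSVD(n_components=10, random_state=0)
user_embedding = svd.fit_transform(M)

# Linear-kernel SVM, as with LibLinear, trained on the reduced features.
y = (M[:, 0] > 0).astype(int)  # toy binary target: one pair's decision
clf = LinearSVC(dual=True, max_iter=5000).fit(user_embedding, y)
```

In the real pipeline the aggregation features (mean preference per permission) would be concatenated with this embedding before training.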
18. Interactive Process
• Fully predicting users' settings is hard.
• A more realistic way to predict users' settings: answer queries automatically when we are confident; otherwise, ask the users.
– Estimate confidence for each decision query from the trained classifier.
– Select a small subset of questions about which our classifier is relatively uncertain (not random samples) and ask the users.
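One common way to realize "ask only about uncertain decisions" with a linear SVM is to treat the distance from the separating hyperplane as confidence and query the smallest margins. This is a sketch of that idea on toy data; the authors' actual confidence estimate may differ:

```python
import numpy as np
from sklearn.svm import LinearSVC

def uncertain_queries(clf, X, budget):
    """Indices of the `budget` decisions the classifier is least sure
    about: smallest absolute margin from the separating hyperplane.
    (Illustrative; not necessarily the paper's confidence measure.)"""
    margins = np.abs(clf.decision_function(X))
    return np.argsort(margins)[:budget]

# Toy demo: fit on a handful of labeled decisions, then pick the two
# queries to route to the user instead of answering automatically.
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.], [2., 2.], [-1., -1.]])
y = np.array([0, 1, 0, 1, 1, 0])
clf = LinearSVC(dual=True, max_iter=10000).fit(X, y)
ask_user = uncertain_queries(clf, X, budget=2)
```

Selecting by margin rather than at random is what lets a small number of questions buy a large accuracy gain.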
19. Prediction accuracy improves as users enter more decisions in the interactive process (87.8% → 92% with additional input of 10% of users' settings).
• Tested in a purely analytical setting:
– We take users' final settings in the time period as ground truth.
– Further user experiments will be conducted.
20. Simplifying Privacy Decisions Using
Privacy Profiles
• Intuition: Though users' preferences are diverse,
there are strong correlations that enable us to
identify a small set of privacy profiles
• Question: Is it possible to develop easy-to-
understand privacy profiles that capture users’
different preferences?
– Each profile effectively corresponds to a group of like-
minded users.
– We can match individual users with profiles by
showing descriptions or asking a few questions.
21. Generating Privacy Profiles
• Clustering like-minded users
• We represent each user as a vector of their aggregated preferences on each of the 12 permissions.
– According to our previous results, aggregated permission features boosted the performance of decision prediction.
• We then apply the K-means algorithm with Euclidean distance to these characteristic vectors to identify the clusters.
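The clustering step above can be sketched directly with scikit-learn. The data here is synthetic (random 12-dimensional preference vectors) and the encoding of preferences in [-1, 1] is my assumption for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Each row: one user's aggregated preference on the 12 permissions,
# e.g. mean decision in [-1 (deny), +1 (allow)] -- synthetic data.
users = rng.uniform(-1, 1, size=(300, 12))

# K-means with Euclidean distance on the characteristic vectors.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(users)
profiles = kmeans.cluster_centers_  # one candidate privacy profile per cluster
```

Each centroid is a 12-dimensional summary of a group of like-minded users, which is what makes the resulting profiles describable to end users.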
22. How many privacy profiles do we need?
• The effectiveness depends on the clustering method and users' actual experience.
• Metrics to consider for a good value of K:
– Prediction Accuracy
• If we replace users' identities with profile memberships and re-run the classification task, how accurately can we predict?
– Interpretability & Understandability
• Compact descriptions could be presented to users, who would then identify which profile is the best match.
– Stability of Privacy Profiles
• If only part of a user's decisions is observable, will the user be matched with the same privacy profile?
24. Comparing Aggregated Preferences for Different K
[Figure: heatmaps of aggregated preferences on the 12 permissions (Send SMS, Phone Call, SMS DB, Contact, Call Log, Positioning, Phone ID, 3G Network, Wi-Fi Network, ROOT, Phone State, Call Monitoring) for K = 2, 3, 4, and 5 clusters, on a −1.0 to 1.0 scale. Smaller K captures only coarse differences between profiles; larger K reveals finer differences.]
25. Comparing Variations of Users' Settings
[Figure: heatmaps of the variance of users' settings on each of the 12 permissions, on a 0.0 to 1.0 scale. The average variance drops from 0.511 at K = 1 to 0.251 at K = 3, 0.231 at K = 4, and 0.216 at K = 5.]
The variance of users' settings on each permission is significantly reduced by using profiles.
26. For each profile, users' overall preferences are clearer and their decisions are more similar to each other.
27. Assigning users to profiles
• Description-based approach: discriminative features of each profile
– Which decisions do users in this profile usually make that are relatively unique to it?
28. These features can also provide a basis for asking users a few questions to determine which cluster their preferences fall into.
[Figure: heatmap of discriminative features on the 12 permissions for six profiles (C1–C6), on a −1.0 to 1.0 scale.]
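Once a few answers have been collected, matching the user to a profile can be as simple as a nearest-centroid lookup. This is an illustrative sketch, not the authors' assignment procedure:

```python
import numpy as np

def assign_profile(user_vector, profiles):
    """Match a user's aggregated preference vector (e.g. built from a few
    answered questions) to the nearest profile centroid by Euclidean
    distance. Illustrative sketch only."""
    distances = np.linalg.norm(profiles - user_vector, axis=1)
    return int(np.argmin(distances))
```

Because K is small, this lookup replaces thousands of per-app decisions with a single profile choice.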
29. Concluding Remarks
• Mobile apps can access a wide range of sensitive data and functionality.
• Mobile app users care about privacy, but in different ways: one-size-fits-all settings would not work.
• However, our research shows that a small set of privacy profiles and a limited number of interactions with users can go a long way toward accurately capturing people's privacy preferences.
– Privacy profiles & simple dialogues can go a long way in reconciling mobile app privacy and usability.
30. Concluding Remarks
• We are refining our predictions with input from users and by tuning features and models.
• Showing deeper analysis, such as the purpose for which an app requested a permission, can also help users make decisions (paper submitted for publication).
• Human-subject experiments will be conducted to evaluate how users respond to these interfaces.
"I got it. I should choose this profile."