Privacy, Ethics, and Big (Smartphone) Data, Mobisys 2014
12. ©2014 Carnegie Mellon University
The Challenge
• Mobile and cloud computing are
clearly part of the future
• Lots of fun technical challenges here
– Computer scientists are great here!
– We’re awesome at optimizing things
• But biggest challenges will likely
relate to privacy and ethics
13.
Three Stories
• Reflections and personal experiences
• Livehoods
– How much can we infer about people?
– Who wins, who doesn’t with tech?
• Google Glass
– Why so much negative feedback?
– Are there lessons we can learn?
• PrivacyGrade
– What are ways of protecting people?
– What is the role of developers?
14.
Story 1: The Challenge of
Getting Data About Our Cities
• Today’s methods for getting
city data are slow, costly, and limited
– Ex. Travel Behavioral Inventory
– US Census 2010 cost $13b
– Quality of life surveys
• Some approaches today:
– Call Data Records
– Deploy a custom app
16.
Livehoods, Our First Urban
Analytics Tool
• The character of an urban area is defined
not just by the types of places found
there, but also by the people that make
it part of their daily life
Cranshaw et al., The Livehoods Project: Utilizing Social Media
to Understand the Dynamics of a City. ICWSM 2012.
18.
Two Perspectives of Cities
“Socially constructed”
Neighborhoods are organic,
cultural artifacts. Borders are
blurry, imprecise, and may be
different to different people.
Can we discover
automated ways of
identifying the “organic”
boundaries of the city?
Can we extract local
cultural knowledge from
social media?
Can we build a collective
cognitive map from data?
31.
Reflections on Urban Analytics
Few Privacy Issues
• Publicly visible data with no
expectations of privacy
– No IRB issues
• Remove venues labeled as “home”
– We only received one request to remove
a venue (wasn’t labeled as a home)
• We only show data about geographic
areas vs individuals
• So far, so good
32.
Reflections on Urban Analytics
But Lots of Questions About Ethics
• Discussions with lots of different folks
on how data like this might be used in
the future
– Urban planners, Yahoo, Facebook,
Google Maps, Startups, and more
• Lots of great questions
– Not so many great answers
– Share some of these with you
34.
How Much Can Be Inferred?
• Very likely much more can be inferred
using rich data like this
– Demographics, socioeconomic, friends
– Physical and mental health (depression)
– How “risky” you are (bars, clinics, etc)
• Unclear how far inferencing can go
– Also, not much can stop advertisers, NSA,
startups, etc
– And, not just what you do, but what
others like you are doing
36.
“[An analyst at Target] was able to identify about
25 products that… allowed him to assign each
shopper a ‘pregnancy prediction’ score. [H]e
could also estimate her due date to within a small
window, so Target could send coupons timed to
very specific stages of her pregnancy.” (NYTimes)
38.
A New Kind of Redlining?
• “denying, or charging more for,
services such as banking, insurance,
access to health care, … supermarkets,
or denying jobs … against a particular
group of people” (Wikipedia)
39.
Huge Risk of New Kind of
Redlining
Map of Philadelphia
showing redlining of lower
income neighborhoods.
Households and businesses
in the red zones could not
get mortgages or business
loans. (Wikipedia)
40.
Huge Risk of New Kind of
Redlining
Johnson says his jaw
dropped when he read one
of the reasons American
Express gave for lowering
his credit limit:
"Other customers who have
used their card at
establishments where you
recently shopped have a
poor repayment history
with American Express."
41.
Potential Biases in the Data?
Pittsburgh’s Hill District
was ground zero for jazz
musicians in the 20th century.
Urban renewal displaced 8,000
residents and 400 businesses,
decimating the economic center
of African-American Pittsburgh.
Median Income (2009):
$17,939
42.
Potential Biases in the Data?
• Socioeconomic bias
– Little data from lower socioeconomic areas
• Urban bias
– Twitter, Flickr, Foursquare all more active
per capita in cities
• Age and gender bias
– Users mostly young, male, technology-savvy
• Is this a problem that will solve itself?
– Or, can we address this in our models?
44.
Who Gains From this Data?
• Can we design systems that share
the value across more people?
– People co-create data and gain value
– Participatory design philosophy
• Can we also make
people feel more
invested in the
cities they live in?
49.
Why Such Negative Press?
• PARC knew privacy a big problem
– But didn’t know what to do
– So didn’t build any privacy protections
• Unclear value proposition
– Focused on technical aspects
– What benefits to end-users?
• Google Glass is replaying
the past
50.
Why Such Negative Press?
• Grudin’s law
– Why does groupware fail?
– “When those who benefit are not
those who do the work, then the
technology is likely to fail, or at least
be subverted”
• Privacy corollary
– When those who bear the privacy risks do
not benefit in proportion to the perceived
risks, the tech is likely to fail
54.
But Expectations Can Change
One resort felt the trend so heavily that it
posted a notice: “PEOPLE ARE FORBIDDEN
TO USE THEIR KODAKS ON THE BEACH.”
Other locations were no safer. For a time,
Kodak cameras were banned from the
Washington Monument.
The “Hartford Courant” sounded the alarm
as well, declaring that “the sedate citizen
can’t indulge in any hilariousness without
the risk of being caught in the act and
having his photograph passed around
among his Sunday School children.”
56.
Privacy Hump
• A lot of our concerns about tech fall
under umbrella term “privacy”
– Value, fears, expectations,
what others around us think
Before the hump:
– Many legitimate concerns
– Many alarmist rants
– Open questions: “right” way to deploy? Value proposition? Rules on proper use?
After the hump:
– Things have settled down
– Few fears materialized
– People feel in control
– People get tangible value
61.
What Do Developers Know
about Privacy?
• Interviews with 13 app developers
• Surveys with 228 app developers
• What tools? Knowledge? Incentives?
• Points of leverage?
Balebako et al., The Privacy and Security Behaviors
of Smartphone App Developers. USEC 2014.
62.
Summary of Findings
• App developers not evil
• But need to monetize
– Ads and analytics heavily used
– Turns out libraries big source of problems
• Also don’t know what to do
– “I haven’t even read [our privacy policy]. I
mean, it’s just legal stuff that’s required, so
I just put in there.”
– Often just ask other people around them
– Low awareness of guidelines
63.
Developers Need a Lot of Help!
• Good set of best practices for security
– SSL, hashing of passwords, randomization
– Common attacks: SQL injection, XSS, CSRF
• What are best practices for privacy???
• Developers also have many problems
– App functionality, bandwidth, power,
making money… privacy far down the list
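To make the security side concrete, salted password hashing (one of the best practices above) can be sketched in Python; the function names are illustrative, not from any particular guideline:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Hash a password with PBKDF2-HMAC-SHA256 and a fresh random salt."""
    if salt is None:
        salt = os.urandom(16)  # new random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, expected):
    """Recompute the hash and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(digest, expected)
```

Storing `(salt, digest)` instead of the raw password means a database leak does not directly expose credentials.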
64.
What are your apps really doing?
Shares your location,
gender, unique phone ID,
phone# with advertisers
Uploads your entire
contact list to their server
(including phone #s)
Lin et al., Expectation and Purpose: Understanding Users’ Mental Models
of Mobile App Privacy through Crowdsourcing. Ubicomp 2012.
67.
Privacy as Expectations
• Apply this same idea of mental models
for privacy
– Compare what people expect an app
to do vs what an app actually does
– Emphasize the biggest gaps,
misconceptions that many people had
App Behavior
(What an app
actually does)
User Expectations
(What people think
the app does)
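The expectation-vs-behavior comparison above can be sketched as follows; the behaviors and survey numbers here are invented for illustration:

```python
# Toy illustration of "privacy as expectations": compare what an app
# actually does against the fraction of users who expect that behavior.
ACTUAL = {  # behaviors observed via app analysis
    "sends location to ad network": True,
    "uploads contacts to app server": True,
    "changes audio settings": True,
}
EXPECTED = {  # fraction of surveyed users who expect each behavior
    "sends location to ad network": 0.05,
    "uploads contacts to app server": 0.10,
    "changes audio settings": 0.95,
}

def expectation_gaps(actual, expected):
    """Rank the app's actual behaviors by how few users expected them."""
    gaps = [(1.0 - expected.get(b, 0.0), b)
            for b, present in actual.items() if present]
    return sorted(gaps, reverse=True)

# The biggest expectation gaps are the ones worth surfacing to users.
for surprise, behavior in expectation_gaps(ACTUAL, EXPECTED):
    print(f"{surprise:.0%} surprised: {behavior}")
```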
68.
10% of users were surprised this app
wrote contents to their SD card.
25% of users were surprised this app
sent their approximate location to
dictionary.com for searching nearby
words.
85% of users were surprised this app
sent their phone’s unique ID to
mobile ad providers.
0% of users were surprised this app
could control their audio settings.
90% of users were surprised this app
sent their precise location to
mobile ad providers.
95% of users were surprised this app
sent their approximate location
to mobile ad providers.
95% of users were surprised this app
sent their phone’s unique ID to
mobile ad providers.
0% of users were surprised this app
can control the camera flashlight.
69.
Results for Location Data
(N=20 per app, Expectations Condition)
App: Comfort Level (scale -2 to 2)
Maps: 1.52
GasBuddy: 1.47
Weather Channel: 1.45
Foursquare: 0.95
TuneIn Radio: 0.60
Evernote: 0.15
Angry Birds: -0.70
Brightest Flashlight Free: -1.15
Toss It: -1.20
71.
How PrivacyGrade Works
• Libraries can offer us insight into the
purpose of a permission request
– Location used by Google Maps library vs
Location used by advertising library
• Create a model of people’s concerns
about each permission, broken down by purpose
Lin et al., Modeling Users’ Mobile App Privacy Preferences: Restoring
Usability in a Sea of Permission Settings. SOUPS 2014.
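The permission-by-purpose idea can be sketched roughly as below; the concern scores and grade cutoffs are made up for illustration, not the model from the paper:

```python
# Illustrative sketch of the PrivacyGrade idea: the same permission is
# scored differently depending on the purpose behind the request, which
# can be inferred from the library making the call.
CONCERN = {  # assumed average user concern per (permission, purpose), 0-1
    ("location", "app functionality"): 0.1,
    ("location", "advertising"):       0.8,
    ("unique_id", "advertising"):      0.9,
    ("contacts", "app functionality"): 0.3,
}

def grade(requests):
    """Map an app's (permission, purpose) pairs to a letter grade,
    driven by the single most concerning request."""
    worst = max((CONCERN.get(r, 0.5) for r in requests), default=0.0)
    for cutoff, letter in [(0.25, "A"), (0.5, "B"), (0.75, "C")]:
        if worst <= cutoff:
            return letter
    return "D"
```

The same permission yields different grades depending on purpose: `grade([("location", "app functionality")])` scores well, while `grade([("location", "advertising")])` does not.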
83.
Thanks!
More info at cmuchimps.org
or email jasonh@cs.cmu.edu
Special thanks to:
• Army Research Office
• NSF
• Alfred P. Sloan
• NQ Mobile
• DARPA
• Google
• CMU Cylab
• Shah Amini
• Justin Cranshaw
• Afsaneh Doryab
• Jialiu Lin
• Song Luan
• Jun-Ki Min
• Dan Tasse
• Jason Wiese
87.
Using Location Data to
Infer Friendships
• 2.8m location sightings of
489 users of Locaccino
friend finder in Pittsburgh
• Place entropy for inferring
social quality of a place
– #unique people seen in a place
– 0.0002 x 0.0002 lat/lon grid,
~30m x 30m
Cranshaw et al., Bridging the Gap Between Physical Location and
Online Social Networks. Ubicomp 2010.
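Under the grid scheme above, place entropy can be sketched roughly as follows (the data layout is an assumption for illustration):

```python
import math
from collections import Counter

CELL = 0.0002  # grid cell size in degrees of lat/lon, roughly 30 m

def cell(lat, lon):
    """Snap a location sighting to its grid cell."""
    return (round(lat / CELL), round(lon / CELL))

def place_entropy(sightings):
    """Entropy of *who* is seen in each cell: high entropy means many
    different people pass through (a social, public place); low entropy
    means a few regulars (e.g. someone's home)."""
    visitors = {}  # cell -> Counter of sightings per user
    for user, lat, lon in sightings:
        visitors.setdefault(cell(lat, lon), Counter())[user] += 1
    entropy = {}
    for c, counts in visitors.items():
        total = sum(counts.values())
        entropy[c] = -sum((n / total) * math.log2(n / total)
                          for n in counts.values())
    return entropy
```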
88.
Inferring Friendships
• 67 different machine learning features
– Location diversity (and entropy)
– Intensity and Duration
– Specificity (TF-IDF)
– Graph structure (overlap in friends)
• 92% accuracy in predicting friend/not
– Location entropy improves performance
over shallow features like #co-locations
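As a toy illustration of how place entropy can improve on shallow features like raw co-location counts (the paper's 67 features are far richer), one entropy-weighted feature might look like:

```python
# `place_entropy` here is assumed to be a dict from grid cell to entropy.
def colocations(cells_a, cells_b):
    """Shallow feature: cells where both users have been observed."""
    return set(cells_a) & set(cells_b)

def entropy_weighted_colocation(cells_a, cells_b, place_entropy):
    """Down-weight co-locations at high-entropy (busy, public) places:
    meeting at a quiet place is stronger evidence of friendship than
    both passing through a popular one."""
    return sum(1.0 / (1.0 + place_entropy.get(c, 0.0))
               for c in colocations(cells_a, cells_b))
```

A feature like this, alongside intensity, duration, specificity, and graph-structure features, would then feed a standard binary classifier.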
97.
How PrivacyGrade Works (1)
• We operationalize privacy as people’s
expectations of data use
– Ex. Most people don’t expect Fruit Ninja
to use location data, but it actually does
– Ex. Most people do expect Google Maps
to use location data