Analyzing the Privacy of Smartphone Apps, for CMU Cylab Talk on April 2013

This is a talk I gave in April 2013 at Carnegie Mellon University's CyLab weekly seminar. It describes some of our team's latest work on combining crowdsourcing with static and dynamic analysis to understand the privacy and security behaviors of smartphone apps.

Published in: Technology, News & Politics
  • image source: http://expatlingo.com/2012/10/04/phone-addicts-hong-kong-welcomes-you/ http://news.cnet.com/8301-1035_3-57534132-94/worldwide-smartphone-user-base-hits-1-billion/
  • Start out with a statement that probably won’t be controversial, which is that smartphones are pervasive. About 40% of all mobile phones sold today are smartphones, and the number is rapidly growing. What’s also interesting are trends in how people use these smartphones. http://blog.sciencecreative.com/2011/03/16/the-authentic-online-marketer/ http://www.generationalinsights.com/millennials-addicted-to-their-smartphones-some-suffer-nomophobia/ In fact, Millennials don’t just sleep with their smartphones. 75% use them in bed before going to sleep and 90% check them again first thing in the morning. Half use them while eating and a third use them in the bathroom. A third check them every half hour. Another fifth check them every ten minutes. A quarter of them check them so frequently that they lose count. http://www.androidtapp.com/how-simple-is-your-smartphone-to-use-funny-videos/ Pew Research Center: around 83 percent of those 18- to 29-year-olds sleep with their cell phones within reach. http://persquaremile.com/category/suburbia/
  • Smartphones are an intimate part of our lives. Location, call logs, SMS, pics, and more. They can capture human behavior at unprecedented fidelity and scale.
  • We know these relationships, but computers have an overly simplified model of our relationships, usually just “friend”. Can we do better?
  • Image adapted from Real Life Social Network, by Paul Adams
  • http://mashable.com/2012/10/31/pipsqueek-bluetooth-smartphone-kids/
  • http://www.google.com/intl/en/chrome/webstore/apps.html
  • Screenshot from AppBrain: http://www.appbrain.com/stats/libraries/ad
  • http://www.micklerandassociates.com/10-apps-everybody-should-have/
  • cbsnews.com/8301-505263_162-57560825/smartphone-snoops-how-your-phone-data-is-being-shared/
  • DARPA, Google, CMU CyLab
  • http://www.flickr.com/photos/robby_van_moor/478725670/
  • http://about-google-android.blogspot.com/2012/12/an-android-powered-car-infotainment.html

    1. Analyzing the Privacy of Smartphone Apps
       Apr 22, 2013
       Shah Amini, Jialiu Lin, Prateek Sachdeva, Jason Hong, Janne Lindqvist, Norman Sadeh, Joy Zhang
       Computer Human Interaction: Mobility Privacy Security
       ©2009 Carnegie Mellon University
    2. How to Manage Smartphone Privacy?
       • Lots of smart devices
         – 1B smartphones worldwide
       • Lots of apps
         – ~700k apps and 40B+ downloads for each of Android and iOS
       • Highly intimate
       • Lots of rich data
       • Lots of inferences
    3. Smartphones are Intimate
       Mobile phones and millennials (Pew 2012):
       • 75% use in bed before going to sleep
       • 83% sleep with their mobile phones
       • 90% check first thing in the morning
       • Half use them while eating
       • A third use them in the bathroom (!)
       • A fifth check them every ten minutes
    4. Smartphone Data is Rich
       • Who we know (contact list, social networking)
       • Who we call (call log)
       • Who we text (SMS log, Kakao, social networking)
    5. Smartphone Data is Rich
       • Where we go (GPS, foursquare)
       • Photos (some geotagged)
       • Sensors (accel, sound, light)
    6. Inferences from Data
       Example: Modeling Social Relationships
       • If you were in a jail in Mexico, which of the 500+ “friends” in your phone contact list would come and get you out?
    7. Inferences from Data
       Example: Modeling Social Relationships
       • Can we build a richer augmented social graph?
         – Models tie strength, group, role
    8. Inferences from Data. Example: Modeling Social Relationships
    9. Inferences from Data. Example: Modeling Social Relationships
    10. Inferences from Data. Example: Modeling Social Relationships
    11. Inferences from Data
        Example: Modeling Social Relationships
        • Friend or not: 92% accuracy
          – Using just GPS co-location data
        • Life facet {family, social, work}: 90%
        • Tie strength {low, med, high}: 75%
          – Using just contacts, call logs, SMS logs
        Cranshaw et al, Bridging the Gap Between Physical Location and Online Social Networks, Ubicomp 2010.
        Min et al, Mining Smartphone Data to Classify Life-Facets of Social Relationships, CSCW 2013.
    12. Inferences from Data
        Example: Sleep
        • Sensor data
        • Sleep data (self-reported ground truth)
    13. Smartphone Data for Depression
        Social Relationships
        • Isolation
        • Lack of close family or friends
        Physical Activities
        • Mobility
        • Consistency
        • Places you go to
        Sleep Patterns
        • Excessive sleep
        • Too little sleep
        • Change over time
        Cognitive Behaviors
        • Multitasking
        • Lots of phone use
    14. How to Manage Smartphone Privacy?
        • Lots of smart devices
          – 1B smartphones worldwide
        • Lots of apps
          – ~700k apps and 40B+ downloads for each of Android and iOS
        • High intimacy
        • Lots of rich data
        • Lots of inferences
    15. What are your apps really doing?
        • Shares your location, gender, unique phone ID, phone# with advertisers
        • Uploads your entire contact list to their server (including phone #s)
    16. Many Smartphone Apps Have “Unusual” Permissions
        App                      Permissions Used
        Tiny Flashlight + LED    Internet Access, phone#
        Backgrounds              Contact List
        Dictionary               Location
        Bible Quotes             Location
        • Advertising, malware, bootstrapping social networks, future permissions
    17. Android
        • What do these permissions mean?
        • Why does app need this permission?
        • When does it use these permissions?
    18. Two Threads of Work
        • Works in progress, feedback appreciated
        • CrowdScanning
          – Crowdsourcing approach to understand coarse-grain privacy perceptions of apps
        • Gort
          – Tool for analysts to understand fine-grain app behaviors
    19. CrowdScanning Core Ideas
        • Idea 1: find the gap between what people expect an app to do and what it actually does
        • Idea 2: use crowdsourcing to do this (crowdsource privacy)
        Lin et al, Expectation and Purpose: Understanding User’s Mental Models of Mobile App Privacy thru Crowdsourcing. Ubicomp 2012.
    20. Nissan Maxima Gear Shift
    21. Privacy as Expectations
        • Apply this same idea of mental models for privacy
          – Compare what people expect an app to do vs what an app actually does
          – Emphasize the biggest gaps, misconceptions that many people had
        App Behavior (what an app actually does) vs. User Expectations (what people think the app does)
    22. Crowdsourcing Privacy
        • Few people read privacy policies
          – We want to install the app
          – Reading policies not part of main task
          – Complexity of these policies (the pain!!!)
          – Clear cost (time) for unclear benefit
        • Crowdsourcing can mitigate these problems
    23. • 10% of users were surprised this app wrote contents to their SD card.
        • 25% of users were surprised this app sent their approximate location to dictionary.com for searching nearby words.
        • 85% of users were surprised this app sent their phone’s unique ID to mobile ads providers.
        • 0% of users were surprised this app could control their audio settings.
        See all
        • 90% of users were surprised this app sent their precise location to mobile ads providers.
        • 95% of users were surprised this app sent their approximate location to mobile ads providers.
        • 95% of users were surprised this app sent their phone’s unique ID to mobile ads providers.
        • 0% of users were surprised this app can control camera flashlight.
    24. Our Study on App Privacy
        • Showed crowd workers screenshots and description of app (from Google Play)
          – 56 of top 100 Android Apps
        • Showed permissions one at a time
          – Only those related to privacy
        • Expectation Condition
          – Why they think the app uses permission
          – How comfortable they were with it
        • Purpose Condition
          – We gave an explanation (based on our analysis)
          – How comfortable they were with it
    25. Our Study on App Privacy
        • Participants
          – Recruited from MTurk, US people only
          – Asked what version of Android OS they used
          – Between-subjects (one condition only)
        • Method
          – Only 56 of top 100 apps requested use of unique phone ID, contact list, or location
            • Led to a total of 134 app-resource pairs
          – 20 participants per pair per condition
            • 2 × 20 × 134 = 5,360 tasks
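
The task count on slide 25 follows from the between-subjects design: each of the 134 app-resource pairs is rated by 20 participants in each of the two conditions. A minimal sketch of that arithmetic, using the numbers from the slide (the constant and function names are illustrative, not taken from the study materials):

```python
# Task count for the crowdsourcing study; all numbers come from slide 25.
CONDITIONS = 2               # Expectation condition and Purpose condition
PARTICIPANTS_PER_PAIR = 20   # per app-resource pair, per condition
APP_RESOURCE_PAIRS = 134     # drawn from 56 of the top 100 apps


def total_tasks(conditions, participants, pairs):
    """One task = one participant rating one app-resource pair in one condition."""
    return conditions * participants * pairs


tasks = total_tasks(CONDITIONS, PARTICIPANTS_PER_PAIR, APP_RESOURCE_PAIRS)
print(tasks)  # 5360
```
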
    26. Results for Location Data
        (N=20 per app, Expectations Condition)
        App                          Comfort Level (-2 – 2)
        Maps                          1.52
        GasBuddy                      1.47
        Weather Channel               1.45
        Foursquare                    0.95
        TuneIn Radio                  0.60
        Evernote                      0.15
        Angry Birds                  -0.70
        Brightest Flashlight Free    -1.15
        Toss It                      -1.2
    27. Most Unexpected Uses
        (N=20 per app, Expectations Condition)
        • Found strong correlation between expectations & comfort level (r=0.91)
        Apps using Contact List      Comfort Level (-2 – 2)
        Backgrounds HD Wallpaper     -1.35
        Pandora                      -0.70
        GO Launcher EX               -0.75
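
The two quantities on slides 26 and 27, a mean comfort level on the -2 to 2 scale and a correlation between expectation and comfort, can be sketched as below. This is an illustration only. It assumes that "expectation" is measured as the fraction of participants who expected the use, and the sample numbers are made up, so it does not reproduce the reported r=0.91.

```python
from math import sqrt
from statistics import mean


def mean_comfort(ratings):
    """Average of individual ratings on the slides' -2..2 comfort scale."""
    if any(r < -2 or r > 2 for r in ratings):
        raise ValueError("ratings must lie in [-2, 2]")
    return mean(ratings)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-pair data (NOT from the study): fraction of participants
# who expected the resource use, and the mean comfort level for that pair.
expected_fraction = [0.95, 0.80, 0.40, 0.10]
comfort = [1.5, 0.9, -0.2, -1.1]

r = pearson_r(expected_fraction, comfort)
print(f"r = {r:.2f}")
```
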
    28. Showing Purpose Lowers Concerns
        • All differences statistically significant
        • Big increases for dictionary, Shazam, Air Control Lite, and others (> 1.0)
        Resource            Comfort w/ Purpose    Comfort w/o Purpose
        Device ID           0.47 (σ=0.30)         -0.10 (σ=0.41)
        Contact List        0.66 (σ=0.22)          0.16 (σ=0.54)
        Network Location    0.90 (σ=0.53)          0.65 (σ=0.55)
        GPS Location        0.72 (σ=0.62)          0.35 (σ=0.73)
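
To make the size of the effect on slide 28 concrete, the per-resource increase in mean comfort can be computed directly from the table. The sketch below only subtracts the two columns. It says nothing about statistical significance, which the slide reports separately.

```python
# Mean comfort per resource, copied from the slide 28 table:
# (with purpose shown, without purpose shown), on the -2..2 scale.
COMFORT = {
    "Device ID": (0.47, -0.10),
    "Contact List": (0.66, 0.16),
    "Network Location": (0.90, 0.65),
    "GPS Location": (0.72, 0.35),
}


def comfort_increase(table):
    """Difference in mean comfort (with purpose minus without), rounded to 2 places."""
    return {res: round(with_p - without_p, 2)
            for res, (with_p, without_p) in table.items()}


increase = comfort_increase(COMFORT)
# Print resources from largest to smallest increase.
for resource, delta in sorted(increase.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{resource}: {delta:+.2f}")
```
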
    29. Scaling Up CrowdScanning
        • It took ~2 wks to crowdsource 56 apps
        • 700k+ apps for iOS & Android markets
        • Idea: Use static & dynamic analysis + clustering for privacy models of apps
          – Ex. “Games uses location” -1.3
          – Ex. “Uses location for map” +0.5
    30. Scaling Up CrowdScanning: Crawled Data Set
        • Crawled 171k apps from Google Play
          – App name
          – Category (Arcade, Finance, etc)
          – Number of downloads
          – Average user rating (1-5)
          – Rating distribution
          – Price
          – Content Rating
          – 13M user reviews
    33. Scaling Up CrowdScanning: Static Analysis of Apps
        • Starting assumptions:
          – Most apps use third-party libraries
          – When sensitive data is used, it is because of libraries
            • Ex. Location sent to ad server via library
            • Ex. Location sent to Google for maps
        • Understanding what libraries an app uses and how they are used can offer us richer semantics and explanations
    34. Scaling Up CrowdScanning: Libraries are Major Point of Leverage
    35. Scaling Up CrowdScanning: Static Analysis of Apps
        • Features extracted:
          – Libraries used
          – Network conn (in library or in main code)
          – Permissions (in library or main code)
        • 124k apps processed
          – Uses PyDev (Python for Eclipse) and AndroGuard (reverse eng apps)
          – 5 Amazon EC2 instances, 30 secs / app
        • Will crowdsource core set of 400 apps and build models to predict privacy
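
The feature extraction on slides 33 to 35 can be pictured as attributing each sensitive API use to either a known third-party library or the app's own code. The sketch below is a simplified illustration under stated assumptions, not the authors' pipeline. It does not call AndroGuard, and it assumes an earlier decompilation step has already produced `(class_name, permission)` pairs. The prefix table, `origin_of`, and `extract_features` are hypothetical names, and the table entries are illustrative examples rather than a curated list.

```python
# Hypothetical lookup table: package prefix -> (library name, purpose).
# Entries are illustrative; a real table would be curated from many apps.
KNOWN_LIBRARIES = {
    "com.google.ads": ("AdMob", "advertising"),
    "com.flurry": ("Flurry", "analytics"),
    "com.google.android.maps": ("Google Maps", "maps"),
}


def origin_of(class_name, app_package, known=KNOWN_LIBRARIES):
    """Attribute a class to a known library, the app's main code, or neither."""
    for prefix, (name, purpose) in known.items():
        if class_name == prefix or class_name.startswith(prefix + "."):
            return name, purpose
    if class_name.startswith(app_package + "."):
        return "main code", None
    return "unknown library", None


def extract_features(app_package, permission_uses):
    """Summarize which libraries appear and where each permission is used.

    permission_uses: iterable of (class_name, permission) pairs, assumed to
    come from a separate static-analysis step that is not shown here.
    """
    features = {"libraries": set(), "permissions": []}
    for class_name, permission in permission_uses:
        name, purpose = origin_of(class_name, app_package)
        if name not in ("main code", "unknown library"):
            features["libraries"].add(name)
        features["permissions"].append((permission, name, purpose))
    return features


# Made-up input for a hypothetical flashlight app.
uses = [
    ("com.google.ads.AdView", "ACCESS_FINE_LOCATION"),
    ("com.example.flashlight.MainActivity", "CAMERA"),
]
features = extract_features("com.example.flashlight", uses)
print(features)
```

Attributing a permission to a library with a known purpose is what would let a result read "location is used for advertising" rather than only "this app uses location".
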
    36. Scaling Up CrowdScanning: Tangent: Analyzing App Comments
        • Linear regression of most common words to 5-star ratings
          – Out of 1M comments, 8% of dataset
          – Only 0.09% of comments related to privacy
    37. Two Threads of Work
        • CrowdScanning
          – Crowdsourcing approach to understand coarse-grain privacy perceptions of apps
        • Gort
          – Tool for analysts to understand fine-grain app behaviors
    38. Gort App Analysis Tool
        • Goal of Gort is to help analysts understand and vet behaviors of apps
          – Journalists
          – Privacy advocates
          – Three letter agencies
    39. Example Comparison
        • CrowdScanning: Yelp uses location
        • Gort: When (what screens) and why?
    40. Gort v1
        • Control Flow Graph
        • Current Screen
        • Servers contacted
        • HTTP details
        • HTTP requests
        • Market description
        • Permissions used
        • Personal data sent
    41. Gort v2 Envisioned Workflow
        • Start with a pool of apps
        • Use heuristics to flag unusual behaviors to direct analyst’s attention
          – Static and dynamic heuristics
        • See overview of apps, view individual apps, check odd behaviors and context (screens)
    42. Gort v2 Heuristics for Apps
        • Interviewed 13 experts
          – Asked what characteristics and behaviors they would check to vet an app
          – Got ~100 heuristics, still organizing them
        Network
        • Sends password w/o SSL
        • Connects to fixed IP address
        Permissions
        • Contact List
        • Location but not for maps or ads
        • Uses mic
        Phone / SMS
        • SMS to fixed / premium num
        • Forwards SMS to server
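
Two of the network heuristics on slide 42 are simple enough to sketch as checks over captured HTTP requests. The code below is an illustration under assumptions, not Gort's implementation. The function names are hypothetical, the request is assumed to be available as a URL plus a plain-text body, and the password check is a naive substring match.

```python
import ipaddress
from urllib.parse import urlsplit


def connects_to_fixed_ip(url):
    """True if the URL's host is a literal IP address rather than a domain name."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def sends_password_without_ssl(url, body):
    """Naive check: plain-HTTP request whose body contains a 'password=' field."""
    return urlsplit(url).scheme == "http" and "password=" in body.lower()


def flag_request(url, body=""):
    """Return the list of heuristic labels that this request triggers."""
    flags = []
    if sends_password_without_ssl(url, body):
        flags.append("sends password w/o SSL")
    if connects_to_fixed_ip(url):
        flags.append("connects to fixed IP address")
    return flags


# 203.0.113.5 is an address reserved for documentation examples.
print(flag_request("http://203.0.113.5/login", "user=a&password=b"))
print(flag_request("https://example.com/login", "user=a&password=b"))
```

Flags like these would only direct an analyst's attention, as the envisioned workflow describes. They are not a verdict on the app.
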
    43. Traversing Screens in Apps
        • Have to traverse app for some heuristics
          – Ex. when exactly does the app use location?
          – Also want to capture screenshots
    44. Traversing Screens in Apps
        • General case is fairly easy
          – Breadth-first-search from home screen
          – Uses TEMA to get widgets on screen
          – Use Android’s MonkeyRunner to simulate input and get screenshots
        • But lots of exception cases…
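
The general case on slide 44 is a breadth-first search over screens. The sketch below shows only the search itself. `get_transitions` is a hypothetical callback standing in for whatever reads the widgets on a screen and simulates input (TEMA and MonkeyRunner in the talk), and the example app is a made-up dictionary of screens.

```python
from collections import deque


def traverse_screens(home, get_transitions):
    """Breadth-first traversal of an app's screens starting from the home screen.

    get_transitions(screen) must return an iterable of (action, next_screen)
    pairs. Returns the visit order and, for each screen, the shortest list of
    actions that reaches it from home.
    """
    visited = {home}
    order = [home]
    paths = {home: []}
    queue = deque([home])
    while queue:
        screen = queue.popleft()
        for action, nxt in get_transitions(screen):
            if nxt not in visited:
                visited.add(nxt)
                paths[nxt] = paths[screen] + [action]
                order.append(nxt)
                queue.append(nxt)
    return order, paths


# Made-up screen graph for illustration.
APP = {
    "home": [("tap search", "search"), ("tap settings", "settings")],
    "search": [("tap go", "results"), ("back", "home")],
    "settings": [("back", "home")],
    "results": [],
}

order, paths = traverse_screens("home", lambda s: APP.get(s, []))
print(order)             # ['home', 'search', 'settings', 'results']
print(paths["results"])  # ['tap search', 'tap go']
```

Recording the action path to each screen lets a traversal replay its way back to a screen. The sketch assumes screens are stable and identifiable, which is exactly what the hard cases on the following slides break.
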
    45. Some Hard Cases for Traversal
        • Dialogs w/ side effects
        • Text Inputs
        • Logins
    46. Some Hard Cases for Traversal
        • Changes to system env
        • App Updates
        • Randomized dialogs
    47. Scaling Up CrowdScanning: Making the Results Public
        • What will we do with all these results?
        • Basic idea: deploy a web site
          – Let public see results of our scans
          – Show privacy scores (and explanations)
          – Tell app developers how to fix their apps
        • Awareness, Knowledge, Motivation
        • Still early stages here, should have first iteration of site out end of May
    48. Public Feedback to Date
        • Slate
        • Yahoo News
        • MSNBC
        • Pittsburgh Tribune Review
    49. Thanks!
        More info at cmuchimps.org or email jasonh@cs.cmu.edu
        Special thanks to:
        • Army Research Office
        • National Science Foundation
        • Alfred P. Sloan Foundation
        • Google
        • CMU Cylab
        Join our community for researchers at: www.reddit.com/r/pervasivecomputing
    51. The Opportunity
        • We are creating a worldwide sensor network with these smartphones
        • We can now capture and analyze human behavior at unprecedented fidelity and scale
    52. Summary
        • Smartphones offer big opportunity to understand human behavior at unprecedented fidelity and scale
        • Augmented Social Graph
        • Urban Analytics
        • CrowdScanning
    53. Reach of Apps Growing
        • Finances
        • Automobiles
        • Homes
    54. Reach of Apps Growing
