Analyzing the Privacy of Smartphone Apps, for CMU CyLab Talk in April 2013

This is a talk I gave in April 2013 at Carnegie Mellon University's CyLab weekly seminar. It describes some of our team's latest work on combining crowdsourcing with static and dynamic analysis to understand the privacy and security behaviors of smartphone apps.

Published in: Technology, News & Politics

  1. © 2009 Carnegie Mellon University
     Analyzing the Privacy of Smartphone Apps
     Apr 22, 2013
     Shah Amini, Jialiu Lin, Prateek Sachdeva, Jason Hong, Janne Lindqvist, Norman Sadeh, Joy Zhang
     Computer Human Interaction: Mobility Privacy Security
  2. How to Manage Smartphone Privacy?
     • Lots of smart devices
       – 1B smartphones worldwide
     • Lots of apps
       – ~700k apps and 40B+ downloads for each of Android and iOS
     • Highly intimate
     • Lots of rich data
     • Lots of inferences
  3. Smartphones are Intimate
     Mobile phones and millennials (Pew 2012):
     • 75% use in bed before going to sleep
     • 83% sleep with their mobile phones
     • 90% check first thing in the morning
     • Half use them while eating
     • A third use them in the bathroom (!)
     • A fifth check them every ten minutes
  4. Smartphone Data is Rich
     Who we know (contact list, social networking)
     Who we call (call log)
     Who we text (SMS log, Kakao, social networking)
  5. Smartphone Data is Rich
     Where we go (GPS, Foursquare)
     Photos (some geotagged)
     Sensors (accel, sound, light)
  6. Inferences from Data
     Example: Modeling Social Relationships
     • If you were in a jail in Mexico, which of the 500+ “friends” in your phone contact list would come and get you out?
  7. Inferences from Data
     Example: Modeling Social Relationships
     • Can we build a richer augmented social graph?
       – models tie strength, group, role
  8. Inferences from Data
     Example: Modeling Social Relationships
  9. Inferences from Data
     Example: Modeling Social Relationships
  10. Inferences from Data
      Example: Modeling Social Relationships
  11. Inferences from Data
      Example: Modeling Social Relationships
      • Friend or not
        – 92% accuracy
        – Using just GPS co-location data
      • Life facet {family, social, work}
        – 90%
      • Tie strength {low, med, high}
        – 75%
        – Using just contacts, call logs, SMS logs
      Cranshaw et al, Bridging the Gap Between Physical Location and Online Social Networks, Ubicomp 2010.
      Min et al, Mining Smartphone Data to Classify Life-Facets of Social Relationships, CSCW 2013.
  12. Inferences from Data
      Example: Sleep
      Sensor data
      Sleep data (self-reported ground truth)
  13. Smartphone Data for Depression
      Social Relationships
      • Isolation
      • Lack of close family or friends
      Physical Activities
      • Mobility
      • Consistency
      • Places you go to
      Sleep Patterns
      • Excessive sleep
      • Too little sleep
      • Change over time
      Cognitive Behaviors
      • Multitasking
      • Lots of phone use
  14. How to Manage Smartphone Privacy?
      • Lots of smart devices
        – 1B smartphones worldwide
      • Lots of apps
        – ~700k apps and 40B+ downloads for each of Android and iOS
      • High intimacy
      • Lots of rich data
      • Lots of inferences
  15. What are your apps really doing?
      Shares your location, gender, unique phone ID, phone# with advertisers
      Uploads your entire contact list to their server (including phone #s)
  16. Many Smartphone Apps Have “Unusual” Permissions
      App                      Permissions Used
      Tiny Flashlight + LED    Internet Access, phone#
      Backgrounds              Contact List
      Dictionary               Location
      Bible Quotes             Location
      • Advertising, malware, bootstrapping social networks, future permissions
  17. Android
      • What do these permissions mean?
      • Why does app need this permission?
      • When does it use these permissions?
  18. Two Threads of Work
      • Works in progress, feedback appreciated
      • CrowdScanning
        – Crowdsourcing approach to understand coarse-grain privacy perceptions of apps
      • Gort
        – Tool for analysts to understand fine-grain app behaviors
  19. CrowdScanning Core Ideas
      • Idea 1: find the gap between what people expect an app to do and what it actually does
      • Idea 2: use crowdsourcing to do this (crowdsource privacy)
      Lin et al, Expectation and Purpose: Understanding User’s Mental Models of Mobile App Privacy thru Crowdsourcing. Ubicomp 2012.
  20. Nissan Maxima Gear Shift
  21. Privacy as Expectations
      • Apply this same idea of mental models for privacy
        – Compare what people expect an app to do vs what an app actually does
        – Emphasize the biggest gaps, misconceptions that many people had
      App Behavior (What an app actually does)
      User Expectations (What people think the app does)
  22. Crowdsourcing Privacy
      • Few people read privacy policies
        – We want to install the app
        – Reading policies not part of main task
        – Complexity of these policies (the pain!!!)
        – Clear cost (time) for unclear benefit
      • Crowdsourcing can mitigate these problems
  23. 10% users were surprised this app wrote contents to their SD card.
      25% users were surprised this app sent their approximate location to dictionary.com for searching nearby words.
      85% users were surprised this app sent their phone’s unique ID to mobile ads providers.
      0% users were surprised this app could control their audio settings.
      See all
      90% users were surprised this app sent their precise location to mobile ads providers.
      95% users were surprised this app sent their approximate location to mobile ads providers.
      95% users were surprised this app sent their phone’s unique ID to mobile ads providers.
      0% users were surprised this app can control camera flashlight.
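The "% users were surprised" figures above can be read as the share of crowd workers who did not expect a given behavior. The following is a minimal sketch of that calculation, assuming each worker's answer is recorded as a boolean; the function name `surprise_rate` and the sample responses are hypothetical and not taken from the talk.

```python
def surprise_rate(responses):
    """Return the percentage of workers who did not expect the behavior.

    `responses` is a list of booleans: True means the worker expected the
    app to use the resource, False means the worker was surprised.
    """
    if not responses:
        raise ValueError("need at least one response")
    surprised = sum(1 for expected in responses if not expected)
    return 100.0 * surprised / len(responses)


# Hypothetical answers from 20 workers for one app-resource pair.
example = [False] * 17 + [True] * 3
print(f"{surprise_rate(example):.0f}% of users were surprised")
```

Ranking app-resource pairs by this rate is one way to surface the largest gaps between expectation and behavior.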
  24. Our Study on App Privacy
      • Showed crowd workers screenshots and description of app (from Google Play)
        – 56 of top 100 Android Apps
      • Showed permissions one at a time
        – Only those related to privacy
      • Expectation Condition
        – Why they think the app uses permission
        – How comfortable they were with it
      • Purpose Condition
        – We gave an explanation (based on our analysis)
        – How comfortable they were with it
  25. Our Study on App Privacy
      • Participants
        – Recruited from Mturk, US people only
        – Asked what version of Android OS they used
        – Between-subjects (one condition only)
      • Method
        – Only 56 of top 100 apps requested use of unique phone ID, contact list, or location
          • Led to a total of 134 app-resource pairs
        – 20 participants per pair per condition
          • 2*20*134 = 5360 tasks
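The task count on the slide above (2 conditions, 20 participants per pair, 134 app-resource pairs) can be checked by enumerating the between-subjects design. This is only an illustrative sketch; the constant names and the tuple layout are assumptions, not the study's actual tooling.

```python
from itertools import product

CONDITIONS = ("expectation", "purpose")  # the two study conditions
PAIRS = 134                              # app-resource pairs reported in the talk
WORKERS_PER_PAIR = 20                    # participants per pair per condition

# One task per (condition, pair, worker slot) combination.
tasks = list(product(CONDITIONS, range(PAIRS), range(WORKERS_PER_PAIR)))
print(len(tasks))  # 5360
```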
  26. Results for Location Data (N=20 per app, Expectations Condition)
      App                          Comfort Level (-2 – 2)
      Maps                          1.52
      GasBuddy                      1.47
      Weather Channel               1.45
      Foursquare                    0.95
      TuneIn Radio                  0.60
      Evernote                      0.15
      Angry Birds                  -0.70
      Brightest Flashlight Free    -1.15
      Toss It                      -1.2
  27. Most Unexpected Uses (N=20 per app, Expectations Condition)
      • Found strong correlation between expectations & comfort level (r=0.91)
      Apps using Contact List      Comfort Level (-2 – 2)
      Backgrounds HD Wallpaper     -1.35
      Pandora                      -0.70
      GO Launcher EX               -0.75
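The reported r=0.91 is a correlation between how many people expected a behavior and how comfortable they were with it. The sketch below shows how a Pearson correlation of that kind could be computed. The per-app values are made up purely for illustration (they are not the study's data), and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for four imaginary apps: fraction of workers who expected
# the behavior, and mean comfort on the -2..2 scale.
expected_fraction = [0.95, 0.80, 0.40, 0.10]
mean_comfort = [1.5, 0.9, -0.2, -1.1]
print(round(pearson_r(expected_fraction, mean_comfort), 2))
```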
  28. Showing Purpose Lowers Concerns
      • All differences statistically significant
      • Big increases for dictionary, Shazam, Air Control Lite, and others (> 1.0)
      Resource            Comfort w/ Purpose    Comfort w/o Purpose
      Device ID           0.47 (σ=0.30)         -0.10 (σ=0.41)
      Contact List        0.66 (σ=0.22)          0.16 (σ=0.54)
      Network Location    0.90 (σ=0.53)          0.65 (σ=0.55)
      GPS Location        0.72 (σ=0.62)          0.35 (σ=0.73)
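To make the size of the effect concrete, the comfort gain from showing a purpose can be computed directly from the means in the table above. This is a small worked calculation; the dictionary layout and variable names are assumptions for illustration.

```python
# Mean comfort (with purpose, without purpose), copied from the slide.
comfort = {
    "Device ID": (0.47, -0.10),
    "Contact List": (0.66, 0.16),
    "Network Location": (0.90, 0.65),
    "GPS Location": (0.72, 0.35),
}

# Gain in mean comfort when the purpose is explained to the participant.
gains = {
    resource: round(with_purpose - without_purpose, 2)
    for resource, (with_purpose, without_purpose) in comfort.items()
}

for resource, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{resource}: +{gain:.2f}")
```

On these averages the largest gain is for Device ID and the smallest for Network Location.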
  29. Scaling Up CrowdScanning
      • It took ~2 wks to crowdsource 56 apps
      • 700k+ apps for iOS & Android markets
      • Idea: Use static & dynamic analysis + clustering for privacy models of apps
        – Ex. “Games uses location” -1.3
        – Ex. “Uses location for map” +0.5
  30. Scaling Up CrowdScanning
      Crawled Data Set
      • Crawled 171k apps from Google Play
        – App name
        – Category (Arcade, Finance, etc)
        – Number of downloads
        – Average user rating (1-5)
        – Rating distribution
        – Price
        – Content Rating
        – 13M user reviews
  33. Scaling Up CrowdScanning
      Static Analysis of Apps
      • Starting assumptions:
        – Most apps use third-party libraries
        – When sensitive data is used, it is often because of libraries
          • Ex. Location sent to ad server via library
          • Ex. Location sent to Google for maps
      • Understanding what libraries an app uses and how they are used can offer us richer semantics and explanations
  34. Scaling Up CrowdScanning
      Libraries are Major Point of Leverage
  35. Scaling Up CrowdScanning
      Static Analysis of Apps
      • Features extracted:
        – Libraries used
        – Network conn (in library or in main code)
        – Permissions (in library or main code)
      • 124k apps processed
        – Uses PyDev (Python for Eclipse) and AndroGuard (reverse eng apps)
        – 5 Amazon EC2 instances, 30 secs / app
      • Will crowdsource core set of 400 apps and build models to predict privacy
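The "libraries used" feature can be illustrated with a simple prefix match over the class names found in an app. This is a hedged sketch rather than the team's pipeline: it assumes the class names have already been extracted by a reverse-engineering tool, and the prefix table, labels, and sample class list are illustrative assumptions.

```python
# Illustrative prefix table; a real analysis would need a far larger list.
KNOWN_LIBRARIES = {
    "com.google.ads": "AdMob (advertising)",
    "com.flurry": "Flurry (analytics/advertising)",
    "com.google.android.maps": "Google Maps",
}


def libraries_used(class_names):
    """Return the sorted labels of known libraries whose package prefix
    matches at least one class name in the app."""
    found = set()
    for name in class_names:
        for prefix, label in KNOWN_LIBRARIES.items():
            if name == prefix or name.startswith(prefix + "."):
                found.add(label)
    return sorted(found)


# Hypothetical class names for an imaginary flashlight app.
classes = [
    "com.example.flashlight.MainActivity",
    "com.google.ads.AdView",
    "com.flurry.android.FlurryAgent",
]
print(libraries_used(classes))
```

Matching on `prefix + "."` avoids false hits on packages that merely share a leading string, such as a hypothetical `com.flurryfan` package.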
  36. Scaling Up CrowdScanning
      Tangent: Analyzing App Comments
      • Linear regression of most common words to 5-star ratings
        – Out of 1M comments, 8% of dataset
        – Only 0.09% comments related to privacy
  37. Two Threads of Work
      • CrowdScanning
        – Crowdsourcing approach to understand coarse-grain privacy perceptions of apps
      • Gort
        – Tool for analysts to understand fine-grain app behaviors
  38. Gort App Analysis Tool
      • Goal of Gort is to help analysts understand and vet behaviors of apps
        – Journalists
        – Privacy advocates
        – Three letter agencies
  39. Example Comparison
      • CrowdScanning: Yelp uses location
      • Gort: When (what screens) and why?
  40. Gort v1
      Control Flow Graph
      Current Screen
      Servers contacted
      HTTP details
      HTTP requests
      Market description
      Permissions used
      Personal data sent
  41. Gort v2 Envisioned Workflow
      • Start with a pool of apps
      • Use heuristics to flag unusual behaviors to direct analyst’s attention
        – Static and dynamic heuristics
      • See overview of apps, view individual apps, check odd behaviors and context (screens)
  42. Gort v2 Heuristics for Apps
      • Interviewed 13 experts
        – Asked what characteristics and behaviors they would check to vet an app
        – Got ~100 heuristics, still organizing them
      Network
      • Sends password w/o SSL
      • Connects to fixed IP address
      Permissions
      • Contact List
      • Location but not for maps or ads
      • Uses mic
      Phone / SMS
      • SMS to fixed / premium num
      • Forwards SMS to server
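Heuristics like the ones listed above lend themselves to a rule table: each rule is a label plus a predicate over what was observed about an app. The sketch below is one possible shape for that idea, not Gort's implementation. The `AppObservation` record, its field names, and `flag_app` are hypothetical; the permission strings are standard Android permission names.

```python
from dataclasses import dataclass, field

LOCATION_PERMISSIONS = {"ACCESS_FINE_LOCATION", "ACCESS_COARSE_LOCATION"}


@dataclass
class AppObservation:
    """Hypothetical summary of static and dynamic findings for one app."""
    permissions: set = field(default_factory=set)
    location_purposes: set = field(default_factory=set)  # e.g. {"maps", "ads"}
    sends_password_without_ssl: bool = False
    connects_to_fixed_ip: bool = False
    sms_to_premium_number: bool = False
    forwards_sms_to_server: bool = False


def unexplained_location(app):
    """True if the app can read location but no maps/ads purpose was seen."""
    uses_location = bool(app.permissions & LOCATION_PERMISSIONS)
    explained = bool(app.location_purposes & {"maps", "ads"})
    return uses_location and not explained


# Each heuristic is (label, predicate); labels follow the slide's wording.
HEURISTICS = [
    ("Sends password w/o SSL", lambda a: a.sends_password_without_ssl),
    ("Connects to fixed IP address", lambda a: a.connects_to_fixed_ip),
    ("Contact List", lambda a: "READ_CONTACTS" in a.permissions),
    ("Location but not for maps or ads", unexplained_location),
    ("Uses mic", lambda a: "RECORD_AUDIO" in a.permissions),
    ("SMS to fixed / premium num", lambda a: a.sms_to_premium_number),
    ("Forwards SMS to server", lambda a: a.forwards_sms_to_server),
]


def flag_app(app):
    """Return the labels of all heuristics the app triggers."""
    return [label for label, check in HEURISTICS if check(app)]


example = AppObservation(
    permissions={"ACCESS_FINE_LOCATION", "READ_CONTACTS"},
    connects_to_fixed_ip=True,
)
print(flag_app(example))
```

Keeping the rules in a flat list makes it easy to add the remaining expert heuristics without touching the flagging logic.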
  43. Traversing Screens in Apps
      • Have to traverse app for some heuristics
        – Ex. when exactly does the app use location?
        – Also want to capture screenshots
  44. Traversing Screens in Apps
      • General case is fairly easy
        – Breadth-first-search from home screen
        – Uses TEMA to get widgets on screen
        – Use Android’s MonkeyRunner to simulate input and get screenshots
      • But lots of exception cases…
  45. Some Hard Cases for Traversal
      Dialogs w/ side effects
      Text Inputs
      Logins
  46. Some Hard Cases for Traversal
      Changes to system env
      App Updates
      Randomized dialogs
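The general case described in the traversal slides, a breadth-first search from the home screen, can be sketched over an abstract screen graph. In this sketch a plain dictionary stands in for the widget discovery and input simulation steps that the talk attributes to TEMA and MonkeyRunner; `traverse_screens`, `get_transitions`, and `FAKE_APP` are hypothetical names, and none of the hard cases (logins, text inputs, randomized dialogs) are handled.

```python
from collections import deque


def traverse_screens(home, get_transitions):
    """Visit every reachable screen breadth-first, starting at `home`.

    `get_transitions(screen)` must return a dict mapping each actionable
    widget on that screen to the screen reached by activating it.
    Returns the screens in the order they were first discovered.
    """
    order = [home]
    seen = {home}
    queue = deque([home])
    while queue:
        screen = queue.popleft()
        for _widget, target in get_transitions(screen).items():
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical app: screen -> {widget: destination screen}.
FAKE_APP = {
    "home": {"search_button": "search", "settings_icon": "settings"},
    "search": {"result_row": "details", "back": "home"},
    "settings": {"back": "home"},
    "details": {"back": "search"},
}

order = traverse_screens("home", lambda screen: FAKE_APP.get(screen, {}))
print(order)  # ['home', 'search', 'settings', 'details']
```

The `seen` set is what keeps "back" buttons from sending the search into a loop; a real traversal would also need a notion of screen identity that survives minor layout changes.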
  47. Scaling Up CrowdScanning
      Making the Results Public
      • What will we do with all these results?
      • Basic idea: deploy a web site
        – Let public see results of our scans
        – Show privacy scores (and explanations)
        – Tell app developers how to fix their apps
      • Awareness, Knowledge, Motivation
      • Still early stages here, should have first iteration of site out end of May
  48. Public Feedback to Date
      • Slate
      • Yahoo News
      • MSNBC
      • Pittsburgh Tribune Review
  49. Thanks!
      More info at cmuchimps.org or email jasonh@cs.cmu.edu
      Special thanks to:
      • Army Research Office
      • National Science Foundation
      • Alfred P. Sloan Foundation
      • Google
      • CMU CyLab
      Join our community for researchers at: www.reddit.com/r/pervasivecomputing
  51. The Opportunity
      • We are creating a worldwide sensor network with these smartphones
      • We can now capture and analyze human behavior at unprecedented fidelity and scale
  52. Summary
      • Smartphones offer big opportunity to understand human behavior at unprecedented fidelity and scale
      • Augmented Social Graph
      • Urban Analytics
      • CrowdScanning
  53. Reach of Apps Growing
      Finances
      Automobiles
      Homes
  54. Reach of Apps Growing
