
Helping Developers with Privacy

Keynote talk for VL/HCC 2018. I talk about why developers should care about privacy, what privacy is and why it is hard, some of our group's research in building better tools to help developers (in particular, Coconut IDE Plug-in and PrivacyStreams), and lastly some frameworks for thinking about privacy and developers.

Helping Developers with Privacy

  1. 1. 1 Helping Developers with Privacy VL/HCC 2018 Jason Hong jasonh@cs.cmu.edu Computer Human Interaction: Mobility Privacy Security
  2. 2. :2
  3. 3. :3 New Kinds of Guidelines and Regulations US Federal Trade Commission guidelines California Attorney General recommendations European Union General Data Protection Regulation (GDPR)
  4. 4. :4 How Can We Help Developers Do Better with Respect to Privacy? • Why devs? Shouldn’t lawyers and management be handling privacy issues? • Lots of decisions about privacy will be made by devs with little knowledge and experience – Google, Facebook, etc. can afford privacy teams, but still require devs to help design and implement – For the long tail of small and medium businesses, devs will be making almost all decisions – All of these developers need help in managing and navigating privacy issues
  5. 5. :5 Today’s Talk • What is privacy? Why is it hard? • Our team’s work on smartphone privacy – Why smartphone privacy? – PrivacyGrade.org for grading app privacy – Studies on what developers know about privacy – Coconut IDE plugin tool – PrivacyStreams programming model • What you can do to help with privacy
  6. 6. :6 Why is Privacy Hard? #1 Privacy is a broad and fuzzy term • Privacy is a broad umbrella term that captures concerns about our relationships with others • A spectrum of concerns, from everyday risks to extreme risks, by who poses them: – Stalkers, Hackers: Well-being, Personal safety, Finances – Employers: Over-monitoring, Discrimination, Reputation – Friends, Family: Over-protection, Social obligations, Embarrassment – Government: Civil liberties
  7. 7. :7 Why is Privacy Hard? #1 Privacy is a broad and fuzzy term • Lots of lenses (not mutually exclusive) – The right to be left alone – Control and feedback over one’s data – Anonymity (popular among researchers) – Presentation of self (impression management) – Right to be forgotten – Contextual integrity (take social norms into account) • Each leads to different way of handling privacy – Right to be left alone -> do not call list, blocking – Right to be forgotten -> delete from search engines
  8. 8. :8 Today, Will Focus on One Form of Privacy Data Privacy • Data privacy is primarily about how orgs collect, use, and protect sensitive data – Focuses on Personally Identifiable Information (PII) • Ex. Name, street address, unique IDs, pictures – Rules about data use, privacy notices • Led to the Fair Information Practices – Notice / Awareness – Choice / Consent – Access / Participation – Integrity / Security – Enforcement / Redress
  9. 9. :9 Some Comments on Data Privacy • Data privacy tends to be procedurally-oriented – Did you follow this set of rules? – Did you check off all of the boxes? – This is in contrast to outcome-oriented – Somewhat hard to measure too (Better? Worse?) • Many laws embody the Fair Information Practices – GDPR, HIPAA, Financial Privacy Act, COPPA, FERPA – But, enforcement is a weakness here • If an org violates, can be hard to detect • In practice, limited resources for enforcement
  10. 10. :10 Why is Privacy Hard? #2 No Common Set of Best Practices for Privacy • Security has lots of best practices + tools for devs – Use TLS/SSL – Hash user passwords – Devices should not have common default passwords – Use firewalls to block unauthorized traffic • For privacy, not so much – Choice / Consent: Best way of offering choice? – Access / Participation: Best way of offering access? – Notice / Awareness: Typically privacy policies, useful?
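To make the contrast concrete, here is a minimal sketch of the "hash user passwords" best practice using standard Java crypto APIs; the iteration count and salt/key sizes are illustrative choices, not a recommendation.

import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PasswordHashing {
    // Hash a password with PBKDF2 and a per-user random salt.
    public static String hash(char[] password) throws Exception {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        PBEKeySpec spec = new PBEKeySpec(password, salt, 100_000, 256);
        byte[] hash = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        // Store salt and hash together; never store the plaintext password.
        return Base64.getEncoder().encodeToString(salt) + ":"
                + Base64.getEncoder().encodeToString(hash);
    }
}

Security guidance is often this actionable; there is no equivalently crisp recipe for "offer choice" or "provide notice."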
  11. 11. :11 • New York Times Privacy Policy • Still state of the art for privacy notices • But no one reads these
  12. 12. :12 Why is Privacy Hard? #3 Technological Capabilities Rapidly Growing • Data gathering easier and pervasive – Everything on the web (Google + FB) – Sensors (smartphones, IoT) • Data storage and querying bigger and faster • Inferences more powerful – Some examples shortly • Data sharing more widespread – Social media – Lots of companies collecting and sharing with each other, hard to explain to end-users (next slide)
  13. 13. :13 • 2010 diagram of ad tech ecosystem • Most of these are collecting and using data about you
  14. 14. :14 Inferences about people more powerful: Built a logistic regression to predict sexuality based on what your friends on Facebook disclosed, even if you didn’t disclose
  15. 15. :15 “[An analyst at Target] was able to identify about 25 products that… allowed him to assign each shopper a ‘pregnancy prediction’ score. [H]e could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.” (NYTimes)
  16. 16. :16 Recap of Why Privacy is Hard • Privacy is a broad and fuzzy term • No common set of best practices • Technological capabilities rapidly growing • Note that these are just a few reasons, there are many, many more – But enough so that we have common ground
  17. 17. :17 Today’s Talk • What is privacy? Why is it hard? • Our team’s work on smartphone privacy – Why smartphone privacy? – PrivacyGrade.org for grading app privacy – Studies on what developers know about privacy – Coconut IDE plugin tool – PrivacyStreams programming model • What you can do to help with privacy
  18. 18. :18 Why Care About Smartphone Privacy? • Over 1B smartphones sold every year – Perhaps most widely deployed platform • Well over 100B apps downloaded on each of Android and iOS • Incredibly intimate devices
  19. 19. :19 Fun Facts about Millennials 83% sleep with phones
  20. 20. :20 Fun Facts about Millennials 83% sleep with phones 90% check first thing in morning
  21. 21. :21 Fun Facts about Millennials 83% sleep with phones 90% check first thing in morning 1 in 3 use in bathroom
  22. 22. :22 Smartphone Data is Intimate Who we know (contacts + call log) Sensors (accel, sound, light) Where we go (gps, photos)
  23. 23. :23 The Opportunity and the Risk • There are all these amazing things we could do – Healthcare – Urban analytics – Sustainability • But only if we can legitimately address privacy concerns – Spam, misuse, breaches http://www.flickr.com/photos/robby_van_moor/478725670/
  24. 24. :24 Some Smartphone Apps Use Your Data in Unexpected Ways Shared your location, gender, unique phone ID, phone# with advertisers Uploaded your entire contact list to their server (including phone #s)
  25. 25. :25 More Unexpected Uses of Your Data (data accessed by the example apps pictured): • Location Data, Unique device ID • Location Data, Network Access, Unique device ID • Location Data, Microphone, Unique device ID
  26. 26. :26 PrivacyGrade.org • Improve transparency • Assign privacy grades to all 1M+ Android apps • Does not help devs directly
  27. 27. :27
  28. 28. :28
  29. 29. :29
  30. 30. :30
  31. 31. :31 Expectations vs Reality
  32. 32. :32 Privacy as Expectations Use crowdsourcing to compare what people expect an app to do vs what an app actually does App Behavior (What an app actually does) User Expectations (What people think the app does)
  33. 33. :33 How PrivacyGrade Works • We crowdsourced people’s expectations of a core set of 837 apps – Ex. “How comfortable are you with Drag Racing using your location for ads?” • We generated purposes by examining which third-party libraries the app used • Created a model to predict people’s likely privacy concerns and applied it to 1M Android apps
  34. 34. :34 How PrivacyGrade Works
  35. 35. :35 How PrivacyGrade Works • Long tail distribution of libraries • We focused on top 400 libraries, which covers vast majority of cases
  36. 36. :36 Impact of PrivacyGrade • Popular Press – NYTimes, CNN, BBC, CBS, more • Government – Earlier work helped lead to FTC fines • Google – Google has something like PrivacyGrade internally • Developers
  37. 37. :37 Market Failure for Privacy • Let’s say you want to purchase a web cam – Go into store, can compare price, color, features – But can’t easily compare security (hidden feature) – So, security does not influence customer purchases – So, devs not incentivized to improve • Same is true for privacy – This is where things like PrivacyGrade can help – Improve transparency, address market failures – More broadly, what other ways to incentivize?
  38. 38. :38 Study 1 What Do Developers Know about Privacy? • A lot of privacy research is about end-users – Very little about developers • Interviewed 13 app developers • Surveyed 228 app developers – Got a good mix of experiences and size of orgs • What knowledge? What tools used? Incentives? • Are there potential points of leverage? Balebako et al, The Privacy and Security Behaviors of Smartphone App Developers. USEC 2014.
  39. 39. :39 Study 1 Summary of Findings Third-party Libraries Problematic • Use ads and analytics to monetize
  40. 40. :40 Study 1 Summary of Findings Third-party Libraries Problematic • Use ads and analytics to monetize • Hard to understand their behaviors – A few didn’t know they were using libraries (based on inconsistent answers) – Some didn’t know the libraries collected data – “If either Facebook or Flurry had a privacy policy that was short and concise and condensed into real English rather than legalese, we definitely would have read it.” – In a later study we did on apps, we found 40% of apps used sensitive data only b/c of libraries [Chitkara 2017]
  41. 41. :41 Study 1 Summary of Findings Devs Don’t Know What to Do • Low awareness of existing privacy guidelines – Fair Information Practices, FTC guidelines, Google – Often just ask others around them • Low perceived value of privacy policies – Mostly protection from lawsuits – “I haven’t even read [our privacy policy]. I mean, it’s just legal stuff that’s required, so I just put in there.”
  42. 42. :42 Study 2 How do developers address privacy when coding? • Interviewed 9 Android developers • Semi-structured interview probing about their three most recent apps – Their understanding of privacy – Any privacy training they received – What data collected in app and how used • Libraries used? • Was data sent to cloud server? • How and where data stored? – We also checked against their app if on app store
  43. 43. :43 Study 2 Findings Inaccurate Understanding of Their Own Apps • Some data practices they claimed didn’t match app behaviors • Lacked knowledge of library behaviors • Fast iterations led to changes in data collection and data use • Team dynamics – Division of labor, don’t know what other devs doing – Turnover, use of sensitive data not documented
  44. 44. :44 Study 2 Findings Lack of Knowledge of Alternatives • Many apps use some kind of identifier, and different identifiers have tradeoffs – Hardware identifiers (riskiest since persistent) – Application identifier (email, hashcode) – Advertising identifier • Main point: Many alternatives exist, but often went with first solution found (e.g. StackOverflow) – We also saw this a lot in a later user study
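As a sketch of one such alternative, here is a minimal app-scoped identifier that avoids persistent hardware IDs; the preference file and key names are made up for illustration.

import android.content.Context;
import android.content.SharedPreferences;
import java.util.UUID;

public class InstallId {
    // Returns a random, app-scoped identifier instead of a hardware ID.
    // "app_ids" and "install_id" are illustrative names, not a standard API.
    public static String get(Context context) {
        SharedPreferences prefs =
                context.getSharedPreferences("app_ids", Context.MODE_PRIVATE);
        String id = prefs.getString("install_id", null);
        if (id == null) {
            id = UUID.randomUUID().toString();  // not tied to the device hardware
            prefs.edit().putString("install_id", id).apply();
        }
        return id;  // reset by clearing app data or reinstalling
    }
}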
  45. 45. :45 Study 2 Findings Lack of Motivation to Address Privacy Issues • Might ignore privacy issues if not required – Ex. Get location permission for one reason (maps), but also use for other reasons (ads) – Ex. Get name and email address, only need email – Ex. Get device ID because no permission needed • Android permissions and Play Store requirements useful in forcing devs to improve
  46. 46. :46 How to Get People to Change Behaviors? Security Sensitivity Stack • Awareness: Does person know of existing threat? Can person identify attack / problem? • Knowledge: Does person know tools, behaviors, strategies to protect? Can person use tools, behaviors, strategies? • Motivation: Does person care?
  47. 47. :47 Security Sensitivity Stack Adapted for Developers and Privacy • Awareness: Are devs aware of privacy problem? Ex. Identifier tradeoffs, library behavior • Knowledge: Do devs know how to address? Ex. Might not know right API call • Motivation: Do devs care? Ex. Sometimes ignore issues if not required
  48. 48. :48 Coconut Plug-In to Help Devs with Privacy • Plug-in for IntelliJ IDE to help with privacy – Require Java annotations to document data practices • A form of metadata for Java source code (@Override @Deprecated @Inherited) • Intended to address awareness, knowledge, motivation • Coconut currently only works with limited set of APIs • Example annotation for location request
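A rough sketch of what such an annotation on a location request might look like; the annotation type and its fields below are illustrative stand-ins, not necessarily Coconut’s actual API.

// Illustrative only: @LocationAnnotation and its fields are hypothetical,
// stand-ins for the kind of annotation Coconut asks developers to write.
@interface LocationAnnotation {
    String dataGranularity();
    String purpose();
    String visibility();
}

class WeatherScreen {
    @LocationAnnotation(
            dataGranularity = "coarse (city level)",
            purpose = "show local weather",
            visibility = "sent to weather API, not stored")
    void fetchWeatherForCurrentCity() {
        // ... request last known location and call the weather API ...
    }
}

The point of this style is that the data practice is written down next to the code that uses the data, where the IDE can check it and other tools can read it later.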
  49. 49. :49 Coconut Plug-In to Help Devs with Privacy Detect Potential Privacy Issues in Code • Help devs understand design options – Knowledge of APIs limited, typically used first solution they found – Potential issues highlighted in purple – Offers suggestions for alternatives and quick fixes
  50. 50. :50 Coconut Plug-In to Help Devs with Privacy Identifiers and Privacy • Detect inappropriate use of unique identifier based on the purpose specified by the dev • Quick fixes for common problems
  51. 51. :51 Coconut Plug-In to Help Devs with Privacy Aggregate Sensitive Data Usage in One Place • All annotations gathered and categorized in one tool window called PrivacyChecker – Helps with multiple team members and versions – Also makes it easy to jump to that code
  52. 52. :52 Coconut IDE Plug-In Evaluation • Lab study of Coconut – Lab studies: 9 + 9 developers (w/ and w/o plug-in) – Tasks: build a weather app, use 3rd party library for ad monetization, store ID and location locally (analytics) • Ideally: coarse-grained location for weather and ads, private storage for local data, not hardware ID – Participants were informed privacy important here – Could also use any resource (e.g. search engine) – Interview, surveys, answer questions about app behavior, write a 1 paragraph privacy policy for app
  53. 53. :53 Coconut IDE Plug-In Evaluation Results • Participants with plug-in – Better privacy practices (more likely to follow ideal case) – Better at answering questions about their app • Ex. Granularity of location used, frequency, sent • Participants w/o plug-in – Many didn’t realize ad library was sending data • Had two judges evaluate privacy policies – Coconut avg = 5.8, control = 2.8 (out of 10) • Perceived as not too disruptive, also very useful – Med. for “Disruptive” & “Time consuming” = 2 out of 7
  54. 54. :54 Opportunities with Annotations • Use annotations to help other aspects of privacy – Annotations can be embedded into compiled code • Can be used to help with checking • Ex. App says it only uses location for maps, verify that – Use annotations to help generate privacy policies – Use annotations to generate good UIs • Ex. Runtime UIs • Ex. Better explanations • Stepping back: the more value to annotations, more likely to be adopted
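A minimal sketch of why embedding annotations in compiled code matters: with RUNTIME retention an annotation survives compilation and can be read back by a checker or a policy generator. @DataUse below is hypothetical, not an existing library.

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

@Retention(RetentionPolicy.RUNTIME)  // survives compilation, readable via reflection
@interface DataUse {
    String data();
    String purpose();
}

class MapScreen {
    @DataUse(data = "fine location", purpose = "center map on user")
    void centerMapOnUser() { /* ... */ }
}

class PolicyGenerator {
    public static void main(String[] args) {
        // Walk the annotations to check claims or draft privacy-policy / UI text.
        for (Method m : MapScreen.class.getDeclaredMethods()) {
            DataUse use = m.getAnnotation(DataUse.class);
            if (use != null) {
                System.out.println("Uses " + use.data() + " for: " + use.purpose());
            }
        }
    }
}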
  55. 55. :55 PrivacyStreams Programming Model Observation 1: Many Apps Don’t Need Raw Data (Chart: number of apps needing coarse-grained vs fine-grained data, for location, microphone, contacts, and messages.) Based on a manual examination of 99 popular apps in Google Play and 20 apps in research papers. Li et al. PrivacyStreams: Enabling Transparency in Personal Data Processing for Mobile Apps. PACM on Interactive, Mobile, Wearable, and Ubiquitous Technologies (IMWUT) 1(3), 2017.
  56. 56. :56 PrivacyStreams Programming Model Observation 2: Difficult for Devs to Get Sensitive Data
// Deal with encoding, format, etc.
int sampleRate = 8000;
int bufferSize = AudioRecord.getMinBufferSize(sampleRate, AudioFormat.CHANNEL_IN_DEFAULT, AudioFormat.ENCODING_PCM_16BIT);
AudioRecord audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate, AudioFormat.CHANNEL_IN_DEFAULT, AudioFormat.ENCODING_PCM_16BIT, bufferSize);
// Process raw data
audioRecord.startRecording();
long startTime = System.currentTimeMillis();
double rmsAmplitude = 0;
long bufferTotalLen = 0;
while (true) {
    short[] buffer = new short[bufferSize];
    int bufferLen = audioRecord.read(buffer, 0, bufferSize);
    for (int i = 0; i < bufferLen; i++) {
        rmsAmplitude += (double) buffer[i] * buffer[i] / 10000;
    }
    bufferTotalLen += bufferLen;
    long currentTime = System.currentTimeMillis();
    if (currentTime - startTime > DURATION) {
        break;
    }
}
// Handle threads
while (true) {
    // …
    try {
        Thread.sleep(INTERVAL);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
// Handle permissions
if (ContextCompat.checkSelfPermission(this.context, Manifest.permission.RECORD_AUDIO) != PackageManager.PERMISSION_GRANTED) {
    Log.d("Task0", "Permission denied.");
    ActivityCompat.requestPermissions(thisActivity, new String[]{Manifest.permission.RECORD_AUDIO}, 1);
    return;
}
  57. 57. 57 PrivacyStreams Makes Privacy a Side Effect of Helping Developers
UQI.getData(Audio.recordPeriodic(DURATION, INTERVAL), Purpose.HEALTH("monitor sleep"))
   .setField("loudness", calcLoudness(Audio.AUDIO_DATA))
   .forEach("loudness", callback);
(Diagram: the developer writes the audio loudness app above with calcLoudness and a callback; auditors and end-users can then be told “This app will only get access to the microphone loudness.”)
See tutorials and code at privacystreams.github.io
  58. 58. :58 User Study • Goal – Is PrivacyStreams easy to use and liked? – Can we correctly analyze apps? • Study 1: Lab study – 10 Android devs, 5 programming tasks – Use both PrivacyStreams and Android standard APIs • Study 2: Field study – 5 experienced Android devs, 5 real apps (2 weeks) – Write/rewrite an app with PrivacyStreams • Study 3: Privacy analysis – Analyze the 5 apps developed in the field study
  59. 59. 59 Study 1 Results Devs More Efficient Using PrivacyStreams (Chart: average task completion time in minutes for the Contact, Location, SMS, Image, and Geofence tasks, with per-group N between 1 and 6.)
  60. 60. 60 Study 3 Results Analyzing Developed Apps (App – analysis time in seconds – generated description):
• Speedometer – 12.17 – This app requests LOCATION permission to get the speed continuously.
• Lockscreen app – 2.94 – This app requests CALL_LOG permission to get the last missed call.
• Weather app – 14.72 – This app requests LOCATION permission to get the city-level location.
• Sleep monitor – 13.03 – This app requests MICROPHONE permission to get how loud it is.
• Album app – 14.36 – This app requests STORAGE permission to get all local images.
  61. 61. :61 Opportunities for PrivacyStreams • We think this could be a new and general way to manage third-party access to sensitive data – Ex. Browser plug-ins, IoT, databases of sensitive data • Looking at how to incorporate machine learning into pipeline (combining multiple streams) • Looking to integrate this into Privacy-Enhanced Android, DARPA Brandeis project on privacy – And then convince Google, Apple, others that this is the way to go for third-party APIs
  62. 62. :62 Today’s Talk • What is privacy? Why is it hard? • Our team’s work on smartphone privacy – Studies on what developers know about privacy – PrivacyGrade.org for grading apps – Coconut IDE plugin tool – PrivacyStreams programming model • What you can do to help with privacy
  63. 63. :63 Some Reflections on Privacy, and a Call to Action • Smartphone privacy is just one slice of privacy • Devs need privacy help for web, IoT, cloud, backend database processing, and more – Third-party libraries too (both creating and using) • Devs also need help with entire lifecycle of data – Collection, storage, inferencing, usage, sharing, presentation to end-users, auditing, documentation – Distributed teams, turnover, versioning • Close with two frameworks for thinking about research in this space
  64. 64. :64 Allen Newell’s Time Bands of Cognition Applied to Developers and Privacy (time scale in seconds, stratum, examples): • Cognitive band (10^-1 Deliberate Act, 10^0 Operations, 10^1 Unit Task) – Examples: Annotations, API usage, Quick fixes • Rational band (10^2–10^4 Task) – Examples: Understanding a library, Design Patterns, Code documentation • Social band (10^5–10^7) – Examples: Sharing best practices, Defining privacy policies, Code reviews
  65. 65. :65 Allen Newell’s Time Bands of Cognition Applied to Developers and Privacy (same time-band diagram as the previous slide) • Consider how to link your idea across time scales; a single point solution might not have enough value to be adopted
  66. 66. :66 Security Sensitivity Stack Adapted for Developers and Privacy • Awareness: IDE feedback; Notices from GitHub / App Stores; More static / dynamic analysis tools • Knowledge: IDE support; Faster foraging for good examples; Best practices embodied in libraries • Motivation: IDE requires (or app store); Shame (PrivacyGrade); Make life easier (privacy as side effect); Regulatory fines (GDPR)
  67. 67. :67 Security Sensitivity Stack Adapted for Developers and Privacy (same stack as the previous slide) • Consider how to link your idea across this sensitivity stack; addressing one or two may not be enough value to be adopted
  68. 68. :68 Thanks! More info at cmuchimps.org or email jasonh@cs.cmu.edu Special thanks to: • DARPA Brandeis • Google • Yuvraj Agarwal • Shah Amini • Rebecca Balebako • Mike Czapik • Matt Fredrikson • Shawn Hanna • Haojian Jin • Tianshi Li • Yuanchun Li • Jialiu Lin • Song Luan • Swarup Sahoo • Mike Villena • Jason Wiese • Alex Yu • And many more… • CMU Cylab • NQ Mobile
  69. 69. :69
  70. 70. :70 Two Pieces of Advice for Privacy Research • Consider incentives and structure at hand • Ex. Not a lot of formal CS training in industry • Ex. Devs good at functional requirements – App functionality, bandwidth, power, making money…
  71. 71. :71 DARPA Brandeis • There are all these amazing things we could do if we can legitimately address privacy concerns • Four year program seeking to advance privacy – Enterprise privacy – IoT privacy – Smartphone Privacy -> Privacy-enhanced Android • Note: some work I’ll present done before this program, but easier to understand in this context • Also, not presenting in chronological order
  72. 72. :72 DARPA Brandeis Smartphone Privacy • Our approach: have devs declare in apps the purpose of why sensitive data being used – Devs select from a small set of defined purposes • Today: “This app uses location” • Ours: “This app uses location for advertising” – Use these purposes throughout ecosystem • Ex. IDE support for purposes • Ex. New ways of checking purposes • Ex. Use in GUIs to help end-users
