Privacy, Ethics, and Big (Smartphone) Data, at Mobisys 2014


Keynote talk I gave at the Mobile and Cloud Workshop at Mobisys 2014. I talk about my experiences and reflections on privacy, focusing on (1) Urban Analytics, (2) Google Glass, and (3) PrivacyGrade.

Published in: Technology, Business

  1. 1. ©2009CarnegieMellonUniversity:1 Privacy, Ethics, and Big (Smartphone) Data Mobile Cloud Computing and Services June 16, 2014 Jason Hong Computer Human Interaction: Mobility Privacy Security
  2. 2. ©2014CarnegieMellonUniversity:2 Smartphones are Pervasive • 58% penetration in the US as of early 2014 • About 1.2M Android and iOS apps • Over 75 billion apps downloaded on each of Android and iOS
  3. 3. ©2014CarnegieMellonUniversity:3 Smartphones are Intimate • Mobile phones and millennials (Cisco 2012): • 75% use in bed before sleep • 83% sleep with their phones • 90% check first thing in the morning • A third use in bathroom (!!) • A fifth check every ten minutes
  4. 4. ©2014CarnegieMellonUniversity:4 Smartphone Data is Intimate Who we know (contact list) Who we call (call log) Who we text (sms log)
  5. 5. ©2014CarnegieMellonUniversity:5 Smartphone Data is Intimate Where we go (gps, foursquare) Photos (some geotagged) Sensors (accel, sound, light)
  6. 6. ©2014CarnegieMellonUniversity:6 The Opportunity • We are creating a worldwide sensor network with these smartphones • We can now capture and analyze human behavior at unprecedented fidelity and scale
  7. 7. ©2014CarnegieMellonUniversity:7 Understanding Individuals Min et al, Toss ‘n’ Turn: Smartphone as Sleep and Sleep Quality Detector, CHI 2014.
  8. 8. ©2014CarnegieMellonUniversity:8 Understanding Small Groups Better communication Computational Social Science
  9. 9. ©2014CarnegieMellonUniversity:9 Understanding Urban Areas • AT&T Work on Human Mobility Median distance traveled per day
  10. 10. ©2014CarnegieMellonUniversity:10 Laborshed for Morristown NJ
  11. 11. ©2014CarnegieMellonUniversity:11 Eric Fischer’s Maps of Tourists vs Locals
  12. 12. ©2014CarnegieMellonUniversity:12 The Challenge • Mobile and Cloud computing a clear part of the future • Lots of fun technical challenges here – Computer scientists are great here! – We’re awesome at optimizing things • But biggest challenges will likely relate to privacy and ethics
  13. 13. ©2014CarnegieMellonUniversity:13 Three Stories • Reflections and personal experiences • Livehoods – How much can we infer about people? – Who wins, who doesn’t with tech? • Google Glass – Why so much negative feedback? – Are there lessons we can learn? • PrivacyGrade – What are ways of protecting people? – What is the role of developers?
  14. 14. ©2014CarnegieMellonUniversity:14 Story 1: The Challenge of Getting Data About Our Cities • Today’s methods for getting city data slow, costly, limited – Ex. Travel Behavioral Inventory – US Census 2010 cost $13b – Quality of life surveys • Some approaches today: – Call Data Records – Deploy a custom app
  15. 15. ©2014CarnegieMellonUniversity:15 The Vision: Urban Analytics • How can we use smartphones + social media + machine learning to offer new and useful insights about cities in a manner that is cheap, fast, and scalable?
  16. 16. ©2014CarnegieMellonUniversity:16 Livehoods, Our First Urban Analytics Tool • The character of an urban area is defined not just by the types of places found there, but also by the people that make it part of their daily life Cranshaw et al, The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City, ICWSM 2012.
  17. 17. ©2014CarnegieMellonUniversity:17 Two Perspectives • “Politically constructed”: Neighborhoods have fixed borders defined by the city government. • “Socially constructed”: Neighborhoods are organic, cultural artifacts. Borders are blurry, imprecise, and may be different to different people.
  18. 18. ©2014CarnegieMellonUniversity:18 Two Perspectives of Cities “Socially constructed” Neighborhoods are organic, cultural artifacts. Borders are blurry, imprecise, and may be different to different people. Can we discover automated ways of identifying the “organic” boundaries of the city? Can we extract local cultural knowledge from social media? Can we build a collective cognitive map from data?
  19. 19. ©2014CarnegieMellonUniversity:19 Livehoods Data Source • Crawled 18m check-ins from foursquare – Claims 20m users – People who linked their foursquare accts to Twitter • Spectral clustering based on geographic and social proximity
  20. 20. ©2014CarnegieMellonUniversity:20 If you watch check-ins over time, you’ll notice that groups of like-minded people tend to stay in the same areas.
  21. 21. ©2014CarnegieMellonUniversity:21 We can aggregate these patterns to compute relationships between check-in venues.
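One way to make "relationships between check-in venues" concrete is to compare venues by who checks in at them. The sketch below is only an illustration under stated assumptions: it represents each venue as a count of check-ins per user and scores pairs of venues with cosine similarity. The function names and the tiny data set are made up for this example, and the geographic part of the proximity measure is left out.

```python
from collections import Counter
from math import sqrt


def venue_vectors(checkins):
    """Map each venue to a Counter of {user: number of check-ins}."""
    vectors = {}
    for user, venue in checkins:
        vectors.setdefault(venue, Counter())[user] += 1
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse user-count vectors."""
    dot = sum(count * b[user] for user, count in a.items() if user in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up (user, venue) check-ins, for illustration only.
checkins = [
    ("ann", "cafe"), ("ann", "bar"),
    ("bob", "cafe"), ("bob", "bar"),
    ("cat", "gym"), ("cat", "cafe"),
    ("dan", "gym"),
]

vectors = venue_vectors(checkins)
cafe_bar = cosine_similarity(vectors["cafe"], vectors["bar"])
cafe_gym = cosine_similarity(vectors["cafe"], vectors["gym"])
bar_gym = cosine_similarity(vectors["bar"], vectors["gym"])
```

Venues that share many visitors (here the cafe and the bar) score higher than venues with no visitors in common (the bar and the gym).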
  22. 22. ©2014CarnegieMellonUniversity:22 These relationships can then be used to identify natural borders in the urban landscape.
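To show how venue relationships can expose a border, here is a deliberately simplified sketch of a two-way spectral cut. It is not the Livehoods implementation: it assumes a small symmetric affinity matrix, finds the Fiedler vector of the graph Laplacian with shifted power iteration, and splits venues by sign. The function name, the iteration count, and the example weights are assumptions made for this illustration.

```python
def fiedler_partition(weights, iterations=500):
    """Split a small weighted graph in two using the sign of the Fiedler vector.

    weights is a symmetric n x n list of lists of non-negative affinities.
    """
    n = len(weights)
    degrees = [sum(row) for row in weights]
    # 2 * max degree is an upper bound on the Laplacian's largest eigenvalue.
    shift = 2 * max(degrees)

    # Deterministic, non-constant start vector.
    vec = [float(i) for i in range(n)]
    for _ in range(iterations):
        # Remove the constant component (the trivial eigenvector of L).
        mean = sum(vec) / n
        vec = [v - mean for v in vec]
        # Multiply by (shift * I - L), where L = D - W.
        nxt = []
        for i in range(n):
            lap_i = degrees[i] * vec[i] - sum(
                weights[i][j] * vec[j] for j in range(n)
            )
            nxt.append(shift * vec[i] - lap_i)
        norm = sum(v * v for v in nxt) ** 0.5
        vec = [v / norm for v in nxt]
    return [0 if v < 0 else 1 for v in vec]


# Two tightly linked groups of three venues joined by one weak edge (2-3).
weights = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]
labels = fiedler_partition(weights)
```

The weak edge between the two groups is where the cut falls, which is the sense in which clustering can reveal a "natural border". A full spectral clustering would use several eigenvectors and more than two clusters.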
  23. 23. ©2014CarnegieMellonUniversity:23 We call the discovered clusters “Livehoods” reflecting their dynamic character. (Map labels: Livehood 1, Livehood 2)
  24. 24. ©2014CarnegieMellonUniversity:24 Try it out at
  25. 25. ©2014CarnegieMellonUniversity:25 South Side Pittsburgh: Safety (map shows Livehoods LH6, LH7, LH8, LH9) LH7 vs LH8: “Whenever I was living down on 15th Street [LH7] I had to worry about drunk people following me home, but on 23rd [LH8] I need to worry about people trying to mug you... so it’s different. It’s not something I had anticipated, but there is a distinct difference between the two areas of the South Side.”
  26. 26. ©2014CarnegieMellonUniversity:26 South Side Pittsburgh: Demographic Differences (map shows Livehoods LH6, LH7, LH8, LH9) LH6: “There is this interesting mix of people there I don’t see walking around the neighborhood. I think they are coming to the Giant Eagle [grocery store] from lower income neighborhoods... I always assumed they came from up the hill.”
  27. 27. ©2014CarnegieMellonUniversity:27 South Side Pittsburgh “I always assumed they came from up the hill.”
  28. 28. ©2014CarnegieMellonUniversity:28 South Side Pittsburgh
  29. 29. ©2014CarnegieMellonUniversity:29 Other Potential Urban Analytics
  30. 30. ©2014CarnegieMellonUniversity:30 Topic Modeling (LDA)
  31. 31. ©2014CarnegieMellonUniversity:31 Reflections on Urban Analytics Few Privacy Issues • Publicly visible data with no expectations of privacy – No IRB issues • Remove venues labeled as “home” – We only received one request to remove a venue (wasn’t labeled as a home) • We only show data about geographic areas vs individuals • So far, so good
  32. 32. ©2014CarnegieMellonUniversity:32 Reflections on Urban Analytics But Lots of Questions About Ethics • Discussions with lots of different folks on how data like this might be used in the future – Urban planners, Yahoo, Facebook, Google Maps, Startups, and more • Lots of great questions – Not so many great answers – Share some of these with you
  33. 33. ©2014CarnegieMellonUniversity:33 How Much Can Be Inferred?
  34. 34. ©2014CarnegieMellonUniversity:34 How Much Can Be Inferred? • Very likely much more can be inferred using rich data like this – Demographics, socioeconomic, friends – Physical and mental health (depression) – How “risky” you are (bars, clinics, etc) • Unclear how far inferencing can go – Also, not much can stop advertisers, NSA, startups, etc – And, not just what you do, but what others like you are doing
  35. 35. ©2014CarnegieMellonUniversity:35 How Much Can Be Inferred? Built a logistic regression to predict sexuality based on what your friends on Facebook disclosed
  36. 36. ©2014CarnegieMellonUniversity:36 “[An analyst at Target] was able to identify about 25 products that… allowed him to assign each shopper a ‘pregnancy prediction’ score. [H]e could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.” (NYTimes)
  37. 37. ©2014CarnegieMellonUniversity:37 How Much Can Be Inferred?
  38. 38. ©2014CarnegieMellonUniversity:38 A New Kind of Redlining? • “denying, or charging more for, services such as banking, insurance, access to health care, … supermarkets, or denying jobs … against a particular group of people” (Wikipedia)
  39. 39. ©2014CarnegieMellonUniversity:39 Huge Risk of New Kind of Redlining Map of Philadelphia showing redlining of lower income neighborhoods. Households and businesses in the red zones could not get mortgages or business loans. (Wikipedia)
  40. 40. ©2014CarnegieMellonUniversity:40 Huge Risk of New Kind of Redlining Johnson says his jaw dropped when he read one of the reasons American Express gave for lowering his credit limit: "Other customers who have used their card at establishments where you recently shopped have a poor repayment history with American Express."
  41. 41. ©2014CarnegieMellonUniversity:41 Potential Biases in the Data? Pittsburgh’s Hill District • Was ground zero for jazz musicians in the 20th century • Mid-century urban renewal displaced 8,000 residents and 400 businesses, decimating the economic center of African-American Pittsburgh • Median income (2009): $17,939
  42. 42. ©2014CarnegieMellonUniversity:42 Potential Biases in the Data? • Socioeconomic bias – Little data from lower socioeconomic areas • Urban bias – Twitter, Flickr, Foursquare all more active per capita in cities • Age and gender bias – Users are mostly young, male, and technology-savvy • Is this a problem that will solve itself? – Or, can we address this in our models?
  43. 43. ©2014CarnegieMellonUniversity:43 Who Gains From this Data? • Today, most data only flows one way – Mainly to advertisers (and NSA) – Also banks, insurance, credit cards
  44. 44. ©2014CarnegieMellonUniversity:44 Who Gains From this Data? • Can we design systems that share the value across more people? – People co-create data and gain value – Participatory design philosophy • Can we also make people feel more invested in the cities they live in?
  45. 45. ©2014CarnegieMellonUniversity:45 Tiramisu Bus Tracker App • People can see incoming bus data • People can also share info – Got on bus – #seats available • New feature: people can discuss transit
  46. 46. ©2014CarnegieMellonUniversity:46 Story 2: Google Glass • Why has there been so much backlash against Google Glass? • Are there lessons we can learn here about privacy and adoption of tech?
  47. 47. ©2014CarnegieMellonUniversity:47 Origins of Ubicomp
  48. 48. ©2014CarnegieMellonUniversity:48 Popular Press Reaction
  49. 49. ©2014CarnegieMellonUniversity:49 Why Such Negative Press? • PARC knew privacy a big problem – But didn’t know what to do – So didn’t build any privacy protections • Unclear value proposition – Focused on technical aspects – What benefits to end-users? • Google Glass is replaying the past
  50. 50. ©2014CarnegieMellonUniversity:50 Why Such Negative Press? • Grudin’s law – Why does groupware fail? – “When those who benefit are not those who do the work, then the technology is likely to fail, or at least be subverted” • Privacy corollary – When those who bear the privacy risks do not benefit in proportion to the perceived risks, the tech is likely to fail
  51. 51. ©2014CarnegieMellonUniversity:51 But Expectations Can Change • Initial perceptions of mobile phone users – Rude, annoying – Casual chat, driving • Six weeks later… – Had same behaviors • People with more exposure to mobile phones better
  52. 52. ©2014CarnegieMellonUniversity:52 But Expectations Can Change • Famous 1890 article defining privacy as “the right to be let alone” was about photography
  53. 53. ©2014CarnegieMellonUniversity:53 But Expectations Can Change • People objected to having phones in their homes because it “permitted intrusion… by solicitors, purveyors of inferior music, eavesdropping operators, and even wire-transmitted germs”
  54. 54. ©2014CarnegieMellonUniversity:54 But Expectations Can Change One resort felt the trend so heavily that it posted a notice: “PEOPLE ARE FORBIDDEN TO USE THEIR KODAKS ON THE BEACH.” Other locations were no safer. For a time, Kodak cameras were banned from the Washington Monument. The “Hartford Courant” sounded the alarm as well, declaring that “the sedate citizen can’t indulge in any hilariousness without the risk of being caught in the act and having his photograph passed around among his Sunday School children.”
  55. 55. ©2014CarnegieMellonUniversity:55 And Framing Really Matters • Ubicomp -> Invisible Computing – Talked less about the tech – More about how it could help people – More positive press
  56. 56. ©2014CarnegieMellonUniversity:56 Privacy Hump • A lot of our concerns about tech fall under the umbrella term “privacy” – Value, fears, expectations, what others around us think • Before the hump: Many legitimate concerns; many alarmist rants; “Right” way to deploy? Value proposition? Rules on proper use? • After the hump: Things have settled down; few fears materialized; people feel in control; people get tangible value
  57. 57. ©2014CarnegieMellonUniversity:57 But How to Get Over Hump? • Still a big gap in knowledge on best ways of mitigating privacy issues • Prime example: Facebook news feed
  58. 58. ©2014CarnegieMellonUniversity:58 “We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.”
  59. 59. ©2014CarnegieMellonUniversity:59 • Privacy Placebos: Things that make people feel better about privacy but don’t offer much protection. • Other examples: Privacy policies, access logs • How ethical is it to use these approaches?
  60. 60. ©2014CarnegieMellonUniversity:60 Story 3: PrivacyGrade • – every SMS, search, and phone# • • How can we help improve privacy? – Developers – End-users
  61. 61. ©2014CarnegieMellonUniversity:61 What Do Developers Know about Privacy? • Interviews with 13 app developers • Surveys with 228 app developers • What tools? Knowledge? Incentives? • Points of leverage? Balebako et al, The Privacy and Security Behaviors of Smartphone App Developers. USEC 2014.
  62. 62. ©2014CarnegieMellonUniversity:62 Summary of Findings • App developers not evil • But need to monetize – Ads and analytics heavily used – Turns out libraries big source of problems • Also don’t know what to do – “I haven’t even read [our privacy policy]. I mean, it’s just legal stuff that’s required, so I just put in there.” – Often just ask other people around them – Low awareness of guidelines
  63. 63. ©2014CarnegieMellonUniversity:63 Developers Need a Lot of Help! • Good set of best practices for security – SSL, hashing of passwords, randomization – Common attacks: SQL injection, XSS, CSRF • What are best practices for privacy??? • Developers also have many problems – App functionality, bandwidth, power, making money… privacy far down the list
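As a contrast with the open question about privacy, the security practice of "hashing of passwords" with "randomization" is easy to show in a few lines. This is a minimal sketch using Python's standard library (PBKDF2 with a random per-user salt). The iteration count and function names are assumptions for illustration, not a recommendation from the talk.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for this example; tune it for real deployments.
ITERATIONS = 200_000


def hash_password(password):
    """Return (salt, digest) using a fresh random salt for each password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected_digest)


salt, digest = hash_password("correct horse battery staple")
ok = verify_password("correct horse battery staple", salt, digest)
rejected = verify_password("wrong guess", salt, digest)
```

Security has well-known recipes like this one; the slide's point is that privacy has no comparably compact checklist yet.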
  64. 64. ©2014CarnegieMellonUniversity:64 What are your apps really doing? Shares your location, gender, unique phone ID, phone# with advertisers Uploads your entire contact list to their server (including phone #s) Lin et al, Expectation and Purpose: Understanding Users’ Mental Models of Mobile App Privacy through Crowdsourcing. Ubicomp 2012.
  65. 65. ©2014CarnegieMellonUniversity:65 Many Smartphone Apps Have “Unusual” Permissions Location Data Unique device ID Location Data Network Access Unique device ID Location Data Unique device ID
  66. 66. ©2014CarnegieMellonUniversity:66 Expectations vs Reality
  67. 67. ©2014CarnegieMellonUniversity:67 Privacy as Expectations • Apply this same idea of mental models for privacy – Compare what people expect an app to do vs what an app actually does – Emphasize the biggest gaps, misconceptions that many people had App Behavior (What an app actually does) User Expectations (What people think the app does)
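The comparison between what people expect and what an app actually does can be reduced to a simple "surprise rate" per behavior. The sketch below assumes each participant answers yes or no to "did you expect this app to do X?" for a behavior the app really has. The app names, behaviors, and responses are hypothetical.

```python
def surprise_rate(expected_flags):
    """Fraction of respondents who did NOT expect a behavior the app has."""
    if not expected_flags:
        raise ValueError("need at least one response")
    surprised = sum(1 for expected in expected_flags if not expected)
    return surprised / len(expected_flags)


# Hypothetical answers from 20 participants per (app, behavior) pair.
# True means "I expected the app to do this".
responses = {
    ("flashlight app", "sends device ID to ad networks"): [False] * 19 + [True],
    ("maps app", "uses precise location for navigation"): [True] * 20,
}

# Rank behaviors so the biggest expectation gaps come first.
gaps = sorted(
    (
        (surprise_rate(flags), app, behavior)
        for (app, behavior), flags in responses.items()
    ),
    reverse=True,
)
```

Sorting by surprise rate puts the largest misconceptions at the top, which is the part the slide suggests emphasizing to users.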
  68. 68. ©2014CarnegieMellonUniversity:68 10% users were surprised this app wrote contents to their SD card. 25% users were surprised this app sent their approximate location to for searching nearby words. 85% users were surprised this app sent their phone’s unique ID to mobile ads providers. 0% users were surprised this app could control their audio settings. See all 90% users were surprised this app sent their precise location to mobile ads providers. 95% users were surprised this app sent their approximate location to mobile ads providers. 95% users were surprised this app sent their phone’s unique ID to mobile ads providers. 0% users were surprised this app can control camera flashlight.
  69. 69. ©2014CarnegieMellonUniversity:69 Results for Location Data (N=20 per app, Expectations Condition)
      App                         Comfort Level (-2 to 2)
      Maps                         1.52
      GasBuddy                     1.47
      Weather Channel              1.45
      Foursquare                   0.95
      TuneIn Radio                 0.60
      Evernote                     0.15
      Angry Birds                 -0.70
      Brightest Flashlight Free   -1.15
      Toss It                     -1.2
  70. 70. ©2014CarnegieMellonUniversity:70 How to Scale It Up?
  71. 71. ©2014CarnegieMellonUniversity:71 How PrivacyGrade Works • Libraries can offer us insight into the purpose of a permission request – Location used by Google Maps library vs Location used by advertising library • Create a model of people’s concerns of permission by purpose Lin et al, Modeling Users’ Mobile App Privacy Preferences: Restoring Usability in a Sea of Permission Settings. SOUPS 2014.
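The idea that libraries reveal the purpose of a permission request can be sketched as a lookup from the calling code's package to a purpose label. Everything named here is hypothetical: the package prefixes, the purpose labels, and the shape of the call-site data are assumptions for illustration, not the actual PrivacyGrade analysis.

```python
# Hypothetical table mapping third-party package prefixes to a purpose.
# A real system would need a curated list of known libraries.
LIBRARY_PURPOSES = {
    "com.example.maps": "navigation",
    "com.example.ads": "advertising",
    "com.example.analytics": "analytics",
}


def purposes_for_permission(call_sites, permission):
    """Collect the purposes for which an app uses one permission.

    call_sites is a list of (package, permission) pairs found in the app.
    Uses that match no known library are labeled "app internal".
    """
    purposes = set()
    for package, used_permission in call_sites:
        if used_permission != permission:
            continue
        for prefix, purpose in LIBRARY_PURPOSES.items():
            if package == prefix or package.startswith(prefix + "."):
                purposes.add(purpose)
                break
        else:
            purposes.add("app internal")
    return purposes


# Made-up call sites for a single app.
call_sites = [
    ("com.example.ads.banner", "LOCATION"),
    ("com.example.maps", "LOCATION"),
    ("com.mygame.core", "DEVICE_ID"),
    ("com.example.analytics.tracker", "DEVICE_ID"),
]
location_purposes = purposes_for_permission(call_sites, "LOCATION")
device_id_purposes = purposes_for_permission(call_sites, "DEVICE_ID")
```

The same permission can then be judged differently depending on purpose, for example location for navigation versus location for advertising.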
  72. 72. ©2014CarnegieMellonUniversity:72 How PrivacyGrade Works • We crowdsourced people’s expectations of a core set of 837 apps – Ex. “How comfortable are you with Fruit Ninja using your location for ads?” – 20 people per question (as above)
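Crowdsourced answers like these can be aggregated into a mean comfort score per app, data type, and purpose. This sketch assumes ratings on the -2 to 2 scale used in the location-data results; the apps and scores are invented, and averaging is just one plausible way to summarize the responses.

```python
from statistics import mean


def comfort_scores(ratings):
    """Average comfort per (app, resource, purpose) on a -2..2 scale."""
    grouped = {}
    for app, resource, purpose, score in ratings:
        if not -2 <= score <= 2:
            raise ValueError(f"score out of range: {score}")
        grouped.setdefault((app, resource, purpose), []).append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


# Invented ratings; a real study would have about 20 per question.
ratings = [
    ("game", "location", "ads", -2),
    ("game", "location", "ads", -1),
    ("game", "location", "ads", 0),
    ("maps", "location", "navigation", 2),
    ("maps", "location", "navigation", 1),
]
scores = comfort_scores(ratings)

# Flag combinations where people are, on average, uncomfortable.
uncomfortable = sorted(key for key, value in scores.items() if value < 0)
```

Negative averages mark the combinations of app, data, and purpose that people object to most.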
  73. 73. ©2014CarnegieMellonUniversity:73 How PrivacyGrade Works
  74. 74. ©2014CarnegieMellonUniversity:74
  75. 75. ©2014CarnegieMellonUniversity:75
  76. 76. ©2014CarnegieMellonUniversity:76
  77. 77. ©2014CarnegieMellonUniversity:77 Early Discussion of PrivacyGrade • Can we use big data for privacy? • Many points of leverage in ecosystem – Policy makers / third party advocates – Developers – OS / Hardware / app store – End-users • Legal issues – Still under discussion with CMU lawyers – DMCA, liability issues
  78. 78. ©2014CarnegieMellonUniversity:78 How far should we go in inferring behaviors? How can we minimize biases in our data and models? How can we help ensure data doesn’t flow one way?
  79. 79. ©2014CarnegieMellonUniversity:79 What are better ways of going over the privacy hump? How acceptable is it to use privacy placebos?
  80. 80. ©2014CarnegieMellonUniversity:80 How can we help developers with privacy? Can we use big data approaches for privacy? What are ways of getting our work out there?
  81. 81. ©2014CarnegieMellonUniversity:81 How can we work with public policy makers to create better guidelines around privacy?
  82. 82. ©2014CarnegieMellonUniversity:82 How can we create a connected world we would all want to live in? Computer Human Interaction: Mobility Privacy Security
  83. 83. ©2014CarnegieMellonUniversity:83 Thanks! More info at or email Special thanks to: • Army Research Office • NSF • Alfred P. Sloan • NQ Mobile • DARPA • Google • CMU Cylab • Shah Amini • Justin Cranshaw • Afsaneh Doryab • Jialiu Lin • Song Luan • Jun-Ki Min • Dan Tasse • Jason Wiese
  84. 84. ©2014CarnegieMellonUniversity:84
  85. 85. ©2014CarnegieMellonUniversity:85 Summary • Smartphones and cloud computing offer big opportunity to understand human behavior • Also pose many large challenges, in privacy and ethics • But I’m optimistic
  86. 86. ©2014CarnegieMellonUniversity:86 86
  87. 87. ©2014CarnegieMellonUniversity:87 Using Location Data to Infer Friendships • 2.8m location sightings of 489 users of Locaccino friend finder in Pittsburgh • Place entropy for inferring social quality of a place – #unique people seen in a place – 0.0002 x 0.0002 lat/lon grid, ~30m x 30m Cranshaw et al, Bridging the Gap Between Physical Location and Online Social Networks, Ubicomp 2010
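Place entropy can be illustrated with a small calculation. The sketch below assumes one common formulation: sightings are bucketed into the 0.0002-degree grid from the slide, and each cell's entropy is the Shannon entropy of the share of sightings contributed by each user. The coordinates and user names are made up, and the exact definition used in the cited paper may differ in detail.

```python
from collections import Counter, defaultdict
from math import floor, log2

CELL_DEGREES = 0.0002  # grid size from the slide


def grid_cell(lat, lon):
    """Bucket a coordinate into a 0.0002 x 0.0002 degree grid cell."""
    return (floor(lat / CELL_DEGREES), floor(lon / CELL_DEGREES))


def place_entropy(sightings):
    """Shannon entropy (in bits) of the user distribution in each cell."""
    per_cell = defaultdict(Counter)
    for user, lat, lon in sightings:
        per_cell[grid_cell(lat, lon)][user] += 1

    entropy = {}
    for cell, counts in per_cell.items():
        total = sum(counts.values())
        entropy[cell] = -sum(
            (c / total) * log2(c / total) for c in counts.values()
        )
    return entropy


# Made-up sightings: a shared public spot and a spot only one person visits.
sightings = [
    ("ann", 40.44431, -79.94281),
    ("bob", 40.44435, -79.94285),
    ("cat", 40.44433, -79.94283),
    ("dan", 40.44432, -79.94282),
    ("ann", 40.45011, -79.95011),
    ("ann", 40.45011, -79.95011),
    ("ann", 40.45011, -79.95011),
]
entropy = place_entropy(sightings)
public_cell = grid_cell(40.44431, -79.94281)
private_cell = grid_cell(40.45011, -79.95011)
```

A cell visited evenly by many different people has high entropy, while a cell seen by a single person has entropy zero. Co-location in a low-entropy place is therefore a stronger hint of a relationship than co-location in a busy public one.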
  88. 88. ©2014CarnegieMellonUniversity:88 Inferring Friendships • 67 different machine learning features – Location diversity (and entropy) – Intensity and Duration – Specificity (TF-IDF) – Graph structure (overlap in friends) • 92% accuracy in predicting friend/not – Location entropy improves performance over shallow features like #co-locations
  89. 89. ©2014CarnegieMellonUniversity:89 • Insert graph here • Describe entropy Co-location data to infer friendship Using place entropy, accuracy of 92% Can also infer number of friends
  90. 90. ©2014CarnegieMellonUniversity:90
  91. 91. ©2014CarnegieMellonUniversity:91 • Insert graph here • Describe entropy Could infer friend / not-friend based on co- location patterns and place entropy at 92%
  92. 92. ©2014CarnegieMellonUniversity:92 Brooklyn Queens Expressway
  93. 93. ©2014CarnegieMellonUniversity:93 Bezerkeley, CA
  94. 94. ©2014CarnegieMellonUniversity:94 Expectations Condition Why do you think Angry Birds uses your location data? How comfortable are you with Angry Birds using your location data?
  95. 95. ©2014CarnegieMellonUniversity:95 Purpose Condition Angry Birds uses your location data for advertising. How comfortable are you with Angry Birds using your location data?
  96. 96. ©2014CarnegieMellonUniversity:96 Few people read privacy policies • We want to install the app • Reading policies not part of main task • Complexity (the pain!!!) • Clear cost (time) for unclear benefit
  97. 97. ©2014CarnegieMellonUniversity:97 How PrivacyGrade Works (1) • We operationalize privacy as people’s expectations of data use – Ex. Most people don’t expect Fruit Ninja to use location data, but it actually does – Ex. Most people do expect Google Maps to use location data