Privacy-Enhanced Personalization


A talk given by Alfred Kobsa at the IHM'10 conference.


  1. 1. Privacy-Enhanced Personalization
     Alfred Kobsa, Donald Bren School of Information and Computer Sciences, University of California, Irvine
  2. 2. “Traditional” personalization on the World Wide Web
     [Bar chart: percent of 44 companies interviewed offering each personalization feature (multiple responses accepted); features shown: tailored email alerts, customized content, account access, personal productivity tools, wish lists, product recommendations, saved links, express transactions, targeted marketing/advertising, custom pricing, personalized content through non-PC devices, news clipping services; values range from 64% down to 5%. Source: Forrester Research]
  3. 3. More recent deployed personalization
     • Personalized search
     • Web courses that tailor their teaching strategy to each individual student
     • Information and recommendations by portable devices that consider users’ location and habits
     • Personalized news (to mobile devices)
     • Product descriptions whose complexity is geared towards the presumed level of user expertise
     • Tailored presentations that take into account the user’s preferences regarding product presentation and media types (text, graphics, video)
  4. 4. Current personalization methods (in 30 seconds)
     Data sources
     • Explicit user input
     • User interaction logs
     Methods
     • Assignment to user groups
     • Rule-based inferences
     • Machine learning
     Storage of data about users
     • Persistent user profile
     • Updated over time
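The ingredients listed on this slide can be sketched in a few lines of Python: a persistent profile fed by explicit input and an interaction log, with a rule-based inference step that assigns the user to groups. This is a toy illustration only; the field names, rules and group labels are my assumptions, not anything from the talk.

```python
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    explicit: dict = field(default_factory=dict)      # explicit user input
    page_visits: list = field(default_factory=list)   # interaction log
    inferred: dict = field(default_factory=dict)      # rule-based inferences

def apply_rules(profile: UserProfile) -> None:
    """Toy rule base: assign the user to interest groups."""
    sports_hits = sum("/sports/" in u for u in profile.page_visits)
    if sports_hits >= 3:
        profile.inferred["interest_group"] = "sports"
    if profile.explicit.get("age", 0) < 30:
        profile.inferred["segment"] = "young-adult"

def log_visit(profile: UserProfile, url: str) -> None:
    """Append one interaction-log entry, then re-run the inference rules."""
    profile.page_visits.append(url)
    apply_rules(profile)

profile = UserProfile(explicit={"age": 24})
for url in ["/sports/nba", "/sports/soccer", "/sports/tennis"]:
    log_visit(profile, url)
print(profile.inferred)  # {'segment': 'young-adult', 'interest_group': 'sports'}
```

In a deployed system the profile would of course be persisted and updated over time, and the hand-written rules would typically be complemented by learned models; the privacy implications of each variant are the subject of the rest of the talk.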
  5. 5. Web personalization delivers benefits for both users and web vendors
     Jupiter Communications, 1998: Personalization at 25 consumer e-commerce sites increased the number of new customers by 47% in the first year, and revenues by 52%.
     Nielsen NetRatings, 1999:
     • Registered visitors to portal sites spend over 3 times longer at their home portal than other users, and view 3 to 4 times more pages at their portal
     • E-commerce sites offering personalized services convert significantly more visitors into buyers than those that don’t.
     Choicestream 2004, 2005:
     • 80% interested in personalized content
     • 60% willing to spend at least 2 minutes answering questions about themselves
     Downside: Personalized sites collect significantly more personal data than regular websites, and often do this in a very inconspicuous manner.
  6. 6. Many computer users are concerned about their privacy online
     Number of users who reported:
     • being extremely or very concerned about divulging personal information online: 67% (Forrester 1999), 74% (AARP 2000)
     • being (extremely) concerned about being tracked online: 77% (AARP 2000)
     • leaving web sites that required registration information: 41% (Boston Consulting 1997)
     • having entered fake registration information: 40% (GVU 1998), 27% (Boston Consulting 1997), 32% (Forrester 1999)
     • having refrained from shopping online due to privacy concerns, or bought less: 32% (Forrester 1999), 32%/35%/54% (IBM 1999), 24% (AARP 2000)
     • wanting internet sites to ask for permission to use personal data: 81% (Pew 2000)
     • being willing to give out personal data in return for something valuable: 31% (GVU 1998), 30% (Forrester 99), 51% (Personalization Consortium)
  7. 7. Privacy surveys do not seem to predict people’s privacy-related actions very well (Harper and Singleton 2001; Personalization Consortium)
     • In several privacy studies in e-commerce contexts, discrepancies have already been observed between users stating high privacy concerns but subsequently disclosing personal data carelessly.
     • Several authors therefore challenge the genuineness of such reported privacy attitudes and emphasize the need for experiments that allow for an observation of actual online disclosure behavior.
  8. 8. Either Personalization or Privacy? • Personal data of computer users are indispensable for personalized interaction • Computer users are reluctant to give out personal data ☛ Tradeoff between privacy and personalization?
  9. 9. The tension between privacy and personalization is more complex than that… • Indirect relationship between privacy and personalization • Situation-dependent • Many mitigating factors People use “privacy calculus” to decide whether or not to disclose personal data, e.g. for personalization purposes
  10. 10. Privacy-Enhanced Personalization How can personalized systems maximize their personalization benefits, while at the same time being compliant with the privacy constraints that are in effect? Can we have good personalization and good privacy at the same time?
  11. 11. Privacy constraints, and how to deal with them
     Privacy constraints
     A. People’s privacy preferences in a given situation (and factors that influence them)
     B. Privacy norms (laws, self-regulation, principles)
     Reconciliation of privacy and personalization
     1. Use of privacy-enhancing technology
     2. Privacy-minded user interaction design
  12. 12. Privacy constraints, and how to deal with them
     Privacy constraints
     A. People’s privacy preferences in a given situation (and factors that influence them)
     B. Privacy norms (laws, self-regulation, principles)
     Reconciliation of privacy and personalization
     1. Use of privacy-enhancing technology
     2. Privacy-minded user interaction design
  13. 13. A. Factors that influence people’s willingness to disclose information
     Information type
     – Basic demographic and lifestyle information, personal tastes, hobbies
     – Internet behavior and purchases
     – Extended demographic information
     – Financial and contact information
     – Credit card and social security numbers
     Data values
     – Willingness to disclose certain data decreases with deviance from the group average
     – Confirmed for age, weight, salary, spousal salary, credit rating and amount of savings
  14. 14. Valuation of personalization
     “the consumers’ value for personalization is almost two times […] more influential than the consumers’ concern for privacy in determining usage of personalization services” (Chellappa and Sin 2005)
     What types of personalization do users value?
     • time savings, monetary savings, pleasure
     • customized content, remembering preferences
  15. 15. Trust
     Trust positively affects both people’s stated willingness to provide personal information to websites, and their actual information disclosure to experimental websites.
     Antecedents of trust
     • Past experience (of oneself and of others): established, long-term relationships are especially important
     • Design and operation of a website
     • Reputation of the website operator
     • Presence of a privacy statement / seal
     • Awareness of and control over the use of personal data
  16. 16. Privacy constraints, and how to deal with them
     Privacy constraints
     A. People’s privacy preferences in a given situation (and factors that influence them)
     B. Privacy norms (laws, self-regulation, principles)
     Reconciliation of privacy and personalization
     1. Use of privacy-enhancing technology
     2. Privacy-minded user interaction design
  17. 17. B. Privacy norms
     • Privacy laws: more than 40 countries and 100 states and provinces worldwide
     • Industry self-regulation: companies, industry sectors (NAI)
     • Privacy principles
     – supra-national (OECD, APEC)
     – national (Australia, Canada, New Zealand…)
     – member organizations (ACM)
     ☛ Several privacy laws disallow a number of frequently used personalization methods unless the user consents
  18. 18. Privacy constraints, and how to deal with them
     Privacy constraints
     A. People’s privacy preferences in a given situation (and factors that influence them)
     B. Privacy norms (laws, self-regulation, principles)
     Reconciliation of privacy and personalization
     1. Use of privacy-enhancing technology
     2. Privacy-minded user interaction design
  19. 19. 1. Privacy-enhancing personalization technology
     Sometimes variants of personalization methods are available that are more privacy-friendly, with little or even no loss in personalization quality
  20. 20. Privacy-Enhancing Technology for Personalized Systems
     Caveats
     • PETs are not “complete solutions” to the privacy problems posed by personalized systems
     • Their presence is also unlikely to “charm away” users’ privacy concerns.
     ☛ PETs should be used judiciously in a user-oriented system design that takes both privacy attitudes and privacy norms into account
  21. 21. Pseudonymous users and user models
     Users of personalized systems can receive full personalization and remain anonymous (users are unidentifiable, unlinkable and unobservable for third parties, but linkable for the personalized system)
     + In most cases privacy laws no longer apply when users cannot be identified with reasonable means.
     + Pseudonymous users may be more inclined to disclose personal data (but this still needs to be shown!), leading to better personalization.
     – Anonymity is currently difficult and/or tedious to preserve when payments, physical goods and non-electronic services are exchanged.
     – Harbors the risk of misuse.
     – Hinders vendors from cross-channel personalization.
     – The anonymity of database entries, web trails, query terms, ratings and texts can be surprisingly well defeated by an attacker who has identified data available that can be partly matched with the “anonymous” data.
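One common way to realize such pseudonymous user models is to key each profile by a keyed hash of the account identifier: profiles stay linkable across sessions for the personalized system, while the stored data carries no directly identifying name. The sketch below is an illustrative assumption (key handling, hash choice, field names are mine), not a scheme described in the talk.

```python
import hmac
import hashlib

# Server-side secret; in practice stored separately from the profile store,
# so that profiles alone cannot be linked back to identities.
PSEUDONYM_KEY = b"secret key held apart from the profile database"

def pseudonym(user_id: str) -> str:
    """Deterministic pseudonym: same user -> same key, but not invertible
    (and not linkable) by parties who lack PSEUDONYM_KEY."""
    return hmac.new(PSEUDONYM_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

profiles: dict[str, dict] = {}

def record_rating(user_id: str, item: str, rating: int) -> None:
    """Store user data under the pseudonym, never under the raw identifier."""
    profiles.setdefault(pseudonym(user_id), {})[item] = rating

record_rating("alice@example.com", "book-123", 5)
record_rating("alice@example.com", "book-456", 3)

assert pseudonym("alice@example.com") in profiles   # linkable for the system
assert "alice@example.com" not in str(profiles)     # no raw identifier stored
```

As the slide's last caveat warns, this only pseudonymizes the key: the profile contents themselves (ratings, trails, query terms) can still re-identify a user when matched against identified data elsewhere.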
  22. 22. Client-side personalization
     Users’ data are all located at the client rather than the server side. All personalization processes that rely on these data are carried out exclusively at the client side.
     + A website with personalization at the client side only is generally not subject to privacy laws.
     + Users may be more inclined to disclose their personal data if personalization is performed locally, upon locally stored data only.
     – Popular user modeling and personalization methods that rely on an analysis of data from the whole user population cannot be applied any more, or will have to be radically redesigned.
     – Server-side personalization algorithms often incorporate confidential rules / methods, and must be protected from unauthorized access ☛ trusted computing platforms
  23. 23. Privacy-enhancing techniques for collaborative filtering
     • Traditional collaborative filtering systems collect large amounts of user data in a central repository (product ratings, purchased products, visited web pages)
     • Central repositories may not be trustworthy, and can be targets for attacks
     Privacy-protecting techniques
     Distribution: no central repository, but interacting distributed clusters that contain information about a few users only.
     Aggregation of encrypted data:
     – Users privately maintain their own individual ratings
     – A community of users computes an aggregate of their private data with homomorphic encryption and peer-to-peer communication
     – The aggregate allows personalized recommendations to be generated at the client side
     Perturbation: ratings are systematically altered before submission to the central server, to hide users’ true values from the server.
     Obfuscation:
     – A certain percentage of users’ ratings are replaced by different values before the ratings are submitted to a central server for collaborative filtering
     – Users can then “plausibly deny” the accuracy of any of their data
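The obfuscation idea from this slide can be sketched in a few lines: before ratings leave the client, a fixed fraction is replaced by random values, so the server never learns which submitted ratings are genuine. The 20% replacement rate and the 1-5 rating scale below are illustrative assumptions, not parameters from the talk.

```python
import random

def obfuscate(ratings, replace_fraction=0.2, scale=(1, 5), seed=None):
    """Replace a fixed fraction of the ratings with random values in `scale`
    before they are submitted to the collaborative-filtering server."""
    rng = random.Random(seed)
    items = list(ratings)
    swapped = set(rng.sample(items, round(len(items) * replace_fraction)))
    return {item: (rng.randint(*scale) if item in swapped else value)
            for item, value in ratings.items()}

true_ratings = {"book1": 5, "book2": 4, "book3": 1, "book4": 3, "book5": 2,
                "book6": 5, "book7": 4, "book8": 2, "book9": 3, "book10": 1}
submitted = obfuscate(true_ratings, seed=42)

# The server only ever sees `submitted`: it cannot tell which (up to) two of
# the ten values are random, so the user can plausibly deny any single rating.
changed = sum(submitted[k] != true_ratings[k] for k in true_ratings)
print(f"{changed} of {len(true_ratings)} submitted ratings differ")
```

The design trade-off is visible even in this toy: a higher `replace_fraction` gives stronger deniability but degrades the aggregate statistics the recommender relies on.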
  24. 24. Privacy constraints, and how to deal with them
     Privacy constraints
     A. People’s privacy preferences in a given situation (and factors that influence them)
     B. Privacy norms (laws, self-regulation, principles)
     Reconciliation of privacy and personalization
     1. Use of privacy-enhancing technology
     2. Privacy-minded user interaction design
  25. 25. 2. Privacy-minded user interaction design Two experiments on the privacy-minded communication of websites’ privacy practices
  26. 26. Current “industry standard”: privacy statements
     Privacy statements…
     • are regarded as very important by 76% (DTI 2001)
     • make 55% more comfortable providing personal information (Roy Morgan 2001, Gartner 2001)
     • are claimed to be viewed by 73% [“always” by 26%] (Harris Interactive 2001)
     • are effectively viewed by only 0.5% (Kohavi 2001), 1% (Reagan 2001)
     • are several readability levels too difficult (Lutz 04, Jensen & Potts 04)
  27. 27. Our counterproposal: a design pattern for personalized websites that collect user data
     Design patterns constitute descriptions of best practices based on research and application experience. They give designers guidelines for the efficient and effective design of user interfaces.
     Every personalized site that collects user data should include the following elements on every page:
     1. Traditional link to a global privacy statement
     • Still necessary for legal reasons
     2. Additionally, contextualized local communication of privacy practices and personalization benefits
     • Break long privacy policies into small, understandable pieces
     • Relate them specifically to the current context
     • Explain privacy practices as well as personalization benefits
  28. 28. An example webpage based on the proposed design pattern
     • Traditional link to a privacy statement
     • Explanation of privacy practices
     • Explanation of personalization benefits
  29. 29. The tested “experimental new version of an online bookstore”
     • Contextualized short description of relevant privacy practices (taken from the original privacy statement)
     • Contextualized short description of relevant personalization benefits (derived from the original privacy statement)
     • Links to the original privacy statement (split into privacy, security and personalization notices)
     • “Selection counter”
  30. 30. Traditional version, with link to privacy statement only
     • Links to the original privacy statement (split into privacy, security and personalization notices)
     • “Selection counter”
  31. 31. Why not simply ask users what they like better?
     • Inquiry-based empirical studies reveal aspects of users’ rationale that cannot be inferred from mere observation
     • Observational empirical studies reveal actual user behavior, which may differ from users’ stated behavior
     ☛ This divergence seems to be quite substantial in the case of stated privacy preferences vs. actual privacy-related behavior
  32. 32. Experimental procedures (partly based on deception)
     1. Instructions to subjects
     • “Usability test with a new version of a well-known online book retailer”
     • Answering questions to allegedly obtain better book recommendations
     • No obligation to answer any question, but helpful for better recommendations
     • Data that subjects entered would purportedly be available to the company
     • Possibility to buy one of the recommended books at a 70% discount
     • Reminder that if they buy a book, their ID card and bank/credit card would be checked (subjects were instructed beforehand to bring these documents if they wished to buy)
     2. Answering interest questions in order to “filter the selection set” (anonymous)
     • 32 questions with 86/64 answer options are presented (some free-text)
     • Most questions were about users’ interests (a very few were fairly sensitive)
     • All “make sense” in the context of filtering books that are interesting for readers
     • Answering questions decreased the “selection counter” in a systematic manner
     • After nine pages of data entry, users are encouraged to review their entries, and then to view those books that purportedly match their interests
  33. 33. Experimental procedures (cont’d)
     3. “Recommendation” of 50 books (anonymous)
     • 50 predetermined and invariant books are displayed (popular fiction, politics, travel, sex and health advisories)
     • Selected based on their low price and their presumable attractiveness for students
     • Prices of all books are visibly marked down by 70%, resulting in out-of-pocket expenses between €2 and €12 for a book purchase
     • Extensive information on every book available
     4. Purchase of one book (identified)
     • Subjects may purchase one book if they wish
     • Those who do are asked for their names, shipping and payment data (bank account or credit card charge)
     5. Completing questionnaires
     6. Verification of name, address and bank data (if a book was purchased)
  34. 34. Subjects and experimental design
     • 58 economics/business/MIS students of Humboldt University, Berlin, Germany (data of 6 students were eventually discarded)
     • Randomly assigned to one of two system versions:
     a) 26 used the traditional version (which only had links to global descriptions of privacy, personalization and security)
     b) 26 used the contextualized version (with additional local information on privacy practices and personalization benefits for each entry field)
     Hypotheses:
     – Users will be more willing to share personal data
     – Users will regard the enhanced site more favorably in terms of privacy
  35. 35. Results
     Observations:
     • Questions answered: control 84%, with contextual explanations 91% (+8%, p = 0.001)
     • Answers given: control 56%, with contextual explanations 67% (+20%, p = 0.001)
     • Book buyers: control 58%, with contextual explanations 77% (+33%, p = 0.07)
     Perception:
     • “Privacy has priority”: control 3.34, with contextual explanations 3.95 (+18%, p = 0.01)
     • “Data is used responsibly”: control 3.62, with contextual explanations 3.91 (+8%, p = 0.12)
     • “Data allowed store to select better books”: control 2.85, with contextual explanations 3.40 (+19%, p = 0.035)
  36. 36. Privacy, Trust and Personalization
     [Path diagram with all-positive links between: control of own data, understanding, trust, perceived privacy, perceived benefits, willingness to disclose personal data, quality of personalization, and purchases]
  37. 37. Other factors to be studied
     • Site reputation
     • Is privacy or benefit disclosure more important?
     • Stringency of privacy practices
     • Permanent visibility of contextual explanations
     • References to the full privacy policy
  38. 38. Privacy Finder (Cranor et al.)
     P3P
     • P3P is a W3C standard that allows a website to create a machine-readable version of its privacy policy
     • P3P policies can be automatically downloaded and evaluated against an individual’s privacy preferences
     Privacy Finder is a search engine that
     • displays the search results
     • evaluates the P3P policies of the retrieved sites
     • compares them with the user’s individual preferences
     • displays this information graphically
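What Privacy Finder automates can be sketched as a policy-versus-preferences comparison: fetch a site's machine-readable policy, check it against the user's preferences, and produce a verdict to display next to the search result. The flat dictionaries below are a deliberately simplified stand-in for real P3P XML, and the field and purpose names are illustrative assumptions, not the actual P3P vocabulary.

```python
# Simplified stand-in for a site's machine-readable policy (not real P3P).
site_policy = {
    "purposes": {"current-transaction", "tailoring", "telemarketing"},
    "retention": "indefinitely",
}

# Simplified stand-in for a user's privacy preferences.
user_prefs = {
    "forbidden_purposes": {"telemarketing"},
    "allowed_retention": {"no-retention", "stated-purpose"},
}

def evaluate(policy: dict, prefs: dict) -> list[str]:
    """Return the list of conflicts; an empty list means the policy
    is acceptable under the user's preferences."""
    conflicts = []
    for p in sorted(policy["purposes"] & prefs["forbidden_purposes"]):
        conflicts.append(f"data used for disallowed purpose: {p}")
    if policy["retention"] not in prefs["allowed_retention"]:
        conflicts.append(f"retention period not acceptable: {policy['retention']}")
    return conflicts

for conflict in evaluate(site_policy, user_prefs):
    print("!", conflict)
```

A search engine like Privacy Finder would run such an evaluation for every retrieved site and render the result as a graphical rating rather than a textual list.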
  39. 39. Will online shoppers pay a premium for privacy? (Tsai et al., WEIS 2007)
     • Participants were adults from the general Pittsburgh population (the 12.5% who were privacy-unconcerned were screened out)
     • They received a flat fee of $45, and had to buy a battery pack and a sex toy online (each about $15 with shipping). They could keep the rest.
     • Three conditions, each with 16 subjects
     • Subjects had to enter a pre-determined search expression, and were shown a list of real search results that were artificially ranked and supplemented with additional information:
     – in condition no-info: total price only (with shipping)
     – in condition irrelevant-info: total price, accessibility rating
     – in condition privacy-info: total price, privacy rating
  40. 40. Condition “privacy info”
  41. 41. Condition “privacy info”
  42. 42. Results
     1. Between conditions no-info and irrelevant-info: no significant difference in avg. purchase price
     2. Between conditions no-info and privacy-info: users with privacy info paid an average premium of $0.59 for batteries, and $0.62 for sex toys (p < 0.0001)
     3. In the absence of prominent privacy information, people will purchase where the price is lowest.
     Percentage of users who bought at the cheapest site:
     – no-info: batteries 83%, sex toys 80%
     – irrelevant-info: batteries 75%, sex toys 67%
     – privacy-info: batteries 22%, sex toys 33%
  43. 43. Ratio of purchases by level of privacy
  44. 44. There is no magic bullet for reconciling personalization with privacy
     The effort is comparable to…
     … making systems secure
     … making systems fast
     … making systems reliable
  45. 45. Privacy-Enhanced Personalization: need for a process approach
     1. Gain the user’s trust
     – Respect the user’s privacy attitude (and let the user know)
     • Respect privacy laws / industry privacy agreements
     – Provide benefits (including optimal personalization within the given privacy constraints)
     – Increase the user’s understanding (don’t do magic)
     – Use trust-enhancing methods
     – Give users control
     – Use privacy-enhancing technology (and let the user know)
     2. Then be patient, and most users will incrementally come forward with personal data / permissions if the usage purpose for the data and the ensuing benefits are clear and valuable enough to them.
  46. 46. Existing approaches for catering to privacy constraints
     • Largest permissible denominator (e.g., Disney)
     – Infeasible if a large number of jurisdictions are involved, since the largest permissible denominator would be very small
     – Individual preferences not taken into account
     • Different country/region versions (e.g., IBM)
     – Infeasible as soon as the number of countries/regions, and hence the number of different versions of the personalized system, increases
     – Individual preferences not taken into account
     • Anonymous personalization (users are not identified)
     – Nearly full personalization possible
     – Harbors the risk of misuse
     – Slightly difficult to implement if physical shipments are involved
     – Practical extent of protection unclear
     – Individual user preferences not taken into account
  47. 47. Our approach Seeking a mechanism to dynamically select user modeling components that comply with the currently prevailing privacy constraints
  48. 48. User modeling component pool
     Different methods differ in their data requirements, quality of predictions, and also their privacy implications.
     [Table: each user modeling component is marked for the data it uses (demographic data, user-supplied data, visited pages)]
     UMC1: clustering
     UMC2: rule-based reasoning
     UMC3: fuzzy reasoning with uncertainty
     UMC4: rule-based reasoning
     UMC5: fuzzy reasoning with uncertainty
     UMC6: incremental machine learning
     UMC7: one-time machine learning across sessions
     UMC8: one-time machine learning + fuzzy reasoning with uncertainty
  49. 49. Product line architecture
     • “The common architecture for a set of related products or systems developed by an organization.” [Bosch, 2000]
     • A PLA includes
     – Stable core: basic functionalities
     – Options: optional features/qualities
     – Variants: alternative features/qualities
     • Dynamic runtime selection (van der Hoek 2002): a particular architecture instance is selected from the product-line architecture
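The dynamic runtime selection described above can be sketched as filtering the user modeling component pool against the privacy constraints currently in effect: keep only the components whose data requirements and methods are permitted. The constraint encoding and component metadata below are illustrative assumptions, not the Fink & Kobsa (2003) implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    method: str
    data_needed: frozenset  # kinds of user data the component consumes

# A small stand-in for the deck's UMC pool (data assignments are invented).
POOL = [
    Component("UMC2", "rule-based reasoning", frozenset({"user-supplied"})),
    Component("UMC6", "incremental machine learning", frozenset({"usage-logs"})),
    Component("UMC7", "cross-session machine learning",
              frozenset({"usage-logs", "demographics"})),
]

def select(pool, constraints):
    """Keep components whose every data need is permitted and whose
    method is not banned under the current privacy constraints."""
    allowed = constraints["allowed_data"]
    return [c for c in pool
            if c.data_needed <= allowed
            and c.method not in constraints["banned_methods"]]

# e.g. a jurisdiction that forbids demographic profiling and
# cross-session learning without explicit consent:
constraints = {"allowed_data": {"user-supplied", "usage-logs"},
               "banned_methods": {"cross-session machine learning"}}
print([c.name for c in select(POOL, constraints)])  # ['UMC2', 'UMC6']
```

When the user grants consent or the jurisdiction changes, the constraint set changes and a different architecture instance is selected from the same pool at runtime, which is the essence of applying a PLA here.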
  50. 50. Our approach: Selection Component (Fink & Kobsa 2003)
  51. 51. Example: .com with privacy
  52. 52. The privacy constraints
  53. 53. Roadmap for Privacy-Enhanced Personalization Research • Study the impacts of privacy laws, industry regulations and individual privacy preferences on the admissibility of personalization methods • Provide optimal personalization while respecting privacy constraints • Apply state-of-the-art industry practice for managing the combinatorial complexity of privacy constraints
  54. 54. Readings… • A. Kobsa: Privacy-Enhanced Web Personalization. In P. Brusilovsky, A. Kobsa, W. Nejdl, eds.: The Adaptive Web: Methods and Strategies of Web Personalization. Springer Verlag. • A. Kobsa: Privacy-Enhanced Personalization. Communications of the ACM, Aug. 2007
  55. 55. Survey of privacy laws