
vldb02hippocratic.ppt


  1. 1. Hippocratic Databases. Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu
  2. 2. Vision Paper [Slide labels: Algorithms; Performance Graphs; Founding Principles; New Challenges]
  3. 3. The Hippocratic Oath <ul><li>“What I may see or hear in the course of treatment or even outside of the treatment in regard to the life of men, which on no account [ought to be] spread abroad, I will keep to myself, holding such things shameful to be spoken about.” </li></ul><ul><li>– Hippocratic Oath, 8 (circa 400 BC) </li></ul>
  4. 4. Privacy Violations <ul><li>Accidents: </li></ul><ul><ul><li>Kaiser, GlobalHealthtrax </li></ul></ul><ul><li>Lax security: </li></ul><ul><ul><li>Massachusetts govt. </li></ul></ul><ul><li>Ethically questionable behavior: </li></ul><ul><ul><li>Lotus & Equifax, Lexis-Nexis, Medical Marketing Service, Boston University, CVS & Giant Food </li></ul></ul><ul><li>Illegal: </li></ul><ul><ul><li>Toysmart </li></ul></ul>
  5. 5. Growing Privacy Concerns <ul><li>Popular Press: </li></ul><ul><ul><li>Economist: The End of Privacy (May 99) </li></ul></ul><ul><ul><li>Time: The Death of Privacy (Aug 97) </li></ul></ul><ul><li>Govt. legislation </li></ul><ul><li>S. Garfinkel, "Database Nation: The Death of Privacy in the 21st Century", O'Reilly, Jan 2000 </li></ul><ul><li>Special issue on internet privacy, CACM, Feb 99 </li></ul>
  6. 6. Related Work <ul><li>Statistical Databases </li></ul><ul><ul><li>Provide statistical information (sum, count, etc.) without compromising sensitive information about individuals. [AW89] </li></ul></ul><ul><li>Multilevel Secure Databases </li></ul><ul><ul><li>Multilevel relations, e.g., records tagged “secret”, “confidential”, or “unclassified”. [JS91] </li></ul></ul><ul><li>We wish to protect privacy in transactional databases that support daily operations. </li></ul><ul><ul><li>Cannot restrict queries to statistical queries. </li></ul></ul><ul><ul><li>Cannot tag all the records “top secret”. </li></ul></ul>
  7. 7. Current Database Systems <ul><li>Ullman, “Principles of Database and Knowledge-Base Systems” </li></ul><ul><li>Fundamental: </li></ul><ul><ul><li>Manage persistent data. </li></ul></ul><ul><ul><li>Access a large amount of data efficiently. </li></ul></ul><ul><li>Desirable: </li></ul><ul><ul><li>Support for data model, high-level languages, transaction management, access control, and resiliency. </li></ul></ul><ul><li>Similar list in other database textbooks. </li></ul>
  8. 8. The Vision <ul><li>We propose Hippocratic Databases that include responsibility for the privacy of data they manage as a founding tenet. </li></ul>
  9. 9. Approach <ul><li>Derive founding principles from current privacy legislation. </li></ul><ul><li>Strawman Design </li></ul><ul><li>Challenges & Open Problems </li></ul>
  10. 10. Caveats <ul><li>Technology alone cannot address all concerns about privacy. </li></ul><ul><ul><li>Solution has to be a mix of laws, societal norms, markets and technology. </li></ul></ul><ul><ul><li>But by advancing technology, we can influence the overall quality of the solution. </li></ul></ul><ul><li>Not all the world’s data lives in database systems. </li></ul><ul><ul><li>Additional inducement for data to move to its right home. </li></ul></ul><ul><ul><li>Hippocratic databases can serve as guide for other types of data repositories. </li></ul></ul>
  11. 11. Privacy Legislation <ul><li>Fair Information Practices Act (US, 1974) </li></ul><ul><li>OECD Guidelines (Europe, 1980) </li></ul><ul><li>Canadian Standards Association’s Model Code for Protection of Personal Information (1995) </li></ul><ul><li>Australian Privacy Amendment (2000) </li></ul><ul><li>Japan: proposed legislation (2003) </li></ul>
  12. 12. The Ten Principles <ul><li>Collection Group </li></ul><ul><ul><li>Purpose Specification, Consent, Limited Collection </li></ul></ul><ul><li>Use Group </li></ul><ul><ul><li>Limited Use, Limited Disclosure, Limited Retention, Accuracy </li></ul></ul><ul><li>Security & Openness Group </li></ul><ul><ul><li>Safety, Openness, Compliance </li></ul></ul>
  13. 13. Collection Group <ul><li>Purpose Specification </li></ul><ul><ul><li>For personal information stored in the database, the purposes for which the information has been collected shall be associated with that information. </li></ul></ul><ul><li>Consent </li></ul><ul><ul><li>The purposes associated with personal information shall have consent of the donor of the personal information. </li></ul></ul><ul><li>Limited Collection </li></ul><ul><ul><li>The information collected shall be limited to the minimum necessary for accomplishing the specified purposes. </li></ul></ul>
  14. 14. Use Group <ul><li>Limited Use </li></ul><ul><ul><li>The database shall run only those queries that are consistent with the purposes for which the information has been collected. </li></ul></ul><ul><li>Limited Disclosure </li></ul><ul><ul><li>Personal information shall not be communicated outside the database for purposes other than those for which there is consent from the donor of the information. </li></ul></ul>
  15. 15. Use Group (2) <ul><li>Limited Retention </li></ul><ul><ul><li>Personal information shall be retained only as long as necessary for the fulfillment of the purposes for which it has been collected. </li></ul></ul><ul><li>Accuracy </li></ul><ul><ul><li>Personal information stored in the database shall be accurate and up-to-date. </li></ul></ul>
  16. 16. Security & Openness Group <ul><li>Safety </li></ul><ul><ul><li>Personal information shall be protected by security safeguards against theft and other misappropriations. </li></ul></ul><ul><li>Openness </li></ul><ul><ul><li>A donor shall be able to access all information about the donor stored in the database. </li></ul></ul><ul><li>Compliance </li></ul><ul><ul><li>A donor shall be able to verify compliance with the above principles. Similarly, the database shall be able to address a challenge concerning compliance. </li></ul></ul>
  17. 17. Talk Outline <ul><li>Motivation </li></ul><ul><li>Founding Principles </li></ul><ul><li>Strawman Design </li></ul><ul><li>New Challenges </li></ul>
  18. 18. Strawman Architecture [Diagram: Privacy Policy, Data Collection, Queries, and Other components, plus the Store]
  19. 19. Architecture: Policy [Diagram: Privacy Policy, Privacy Metadata Creator, Store, Privacy Metadata] <ul><li>Privacy Metadata Creator: converts privacy policy into privacy metadata tables. </li></ul><ul><li>For each purpose & piece of information (attribute): </li></ul><ul><ul><li>External recipients </li></ul></ul><ul><ul><li>Retention period </li></ul></ul><ul><ul><li>Authorized users </li></ul></ul><ul><li>Different designs possible. </li></ul><ul><li>Principles: Limited Disclosure, Limited Retention </li></ul>
  20. 20. Privacy Policies Table
      Purpose         | Table    | Attribute | External-recipients     | Retention | Authorized-users
      purchase        | customer | name      | {delivery, credit-card} | 1 month   | {shipping, charge}
      purchase        | customer | email     | empty                   | 1 month   | {shipping}
      register        | customer | name      | empty                   | 3 years   | {registration}
      register        | customer | email     | empty                   | 3 years   | {registration}
      recommendations | order    | book      | empty                   | 10 years  | {mining}
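The privacy policies table on this slide can be pictured as a small in-memory structure. The following is only a sketch: the row values follow one reading of the slide's table, and the names `PolicyRow`, `PRIVACY_POLICIES`, and `authorized_users` are hypothetical, not part of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRow:
    # One row of privacy metadata: what may happen to one attribute for one purpose.
    purpose: str
    table: str
    attribute: str
    external_recipients: frozenset
    retention: str
    authorized_users: frozenset


# Rows as read from the slide (assumed reconstruction of the flattened table).
PRIVACY_POLICIES = [
    PolicyRow("purchase", "customer", "name",
              frozenset({"delivery", "credit-card"}), "1 month",
              frozenset({"shipping", "charge"})),
    PolicyRow("purchase", "customer", "email",
              frozenset(), "1 month", frozenset({"shipping"})),
    PolicyRow("register", "customer", "name",
              frozenset(), "3 years", frozenset({"registration"})),
    PolicyRow("register", "customer", "email",
              frozenset(), "3 years", frozenset({"registration"})),
    PolicyRow("recommendations", "order", "book",
              frozenset(), "10 years", frozenset({"mining"})),
]


def authorized_users(purpose, table, attribute):
    """Return the users allowed to read an attribute for a purpose (empty if no row matches)."""
    for row in PRIVACY_POLICIES:
        if (row.purpose, row.table, row.attribute) == (purpose, table, attribute):
            return row.authorized_users
    return frozenset()
```

An attribute with no matching row yields an empty set, so the default in this sketch is to deny access.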
  21. 21. Architecture: Data Collection [Diagram: Data Collection, Privacy Constraint Validator, Audit Info, Audit Trail, Privacy Metadata, Store] <ul><li>Privacy Constraint Validator: is the privacy policy compatible with the user’s privacy preference? (Consent) </li></ul><ul><li>Audit Trail: audit trail for compliance. (Compliance) </li></ul>
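A minimal sketch of the compatibility check made at data collection time, assuming both the policy and the user's preference can be reduced to a map from attribute to a set of purposes. The function name `validate_collection` and the audit record layout are hypothetical.

```python
def validate_collection(policy_purposes, user_preferences, audit_trail):
    """Accept data only if every purpose in the policy is consented to by the donor.

    policy_purposes:  dict attribute -> set of purposes the policy declares
    user_preferences: dict attribute -> set of purposes the donor accepts
    audit_trail:      list that receives one record per decision
    """
    violations = {}
    for attribute, purposes in policy_purposes.items():
        missing = purposes - user_preferences.get(attribute, set())
        if missing:
            violations[attribute] = missing
    accepted = not violations
    # Every decision is logged so that compliance can be checked later.
    audit_trail.append({"event": "collection", "accepted": accepted,
                        "violations": violations})
    return accepted
```

A rejected collection still leaves an audit record, which matches the slide's pairing of consent checking with an audit trail for compliance.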
  22. 22. Architecture: Data Collection [Diagram: Data Collection, Privacy Constraint Validator, Data Accuracy Analyzer, Record Access Control, Audit Info, Audit Trail, Privacy Metadata, Store] <ul><li>Data Accuracy Analyzer: data cleansing, e.g., catch typos in address. (Accuracy) </li></ul><ul><li>Record Access Control: associate set of purposes with each record. (Purpose Specification) </li></ul>
  23. 23. Architecture: Queries [Diagram: Queries, Attribute Access Control, Record Access Control, Privacy Metadata, Store] <ul><li>1. Telemarketing cannot issue query tagged “charge”. </li></ul><ul><li>2. Query tagged “telemarketing” cannot see credit card info. </li></ul><ul><li>3. Telemarketing query only sees records that include “telemarketing” in set of purposes. </li></ul><ul><li>Principles: Safety, Limited Use </li></ul>
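The three query-time checks on this slide can be sketched as one function. This is an illustration under assumptions: the lookup tables `USER_PURPOSES` and `PURPOSE_ATTRIBUTES`, the user names, and the record layout with a `purposes` field are all hypothetical.

```python
# Which purposes each user may tag a query with (hypothetical data).
USER_PURPOSES = {
    "telemarketing_user": {"telemarketing"},
    "billing_user": {"charge"},
}

# Which attributes each purpose may read (hypothetical data).
PURPOSE_ATTRIBUTES = {
    "telemarketing": {"name", "phone"},
    "charge": {"name", "credit_card"},
}


def run_query(user, purpose, attributes, records):
    """Apply the slide's three checks, then project the visible records."""
    # 1. The user must be allowed to issue queries tagged with this purpose.
    if purpose not in USER_PURPOSES.get(user, set()):
        raise PermissionError(f"{user} may not issue queries tagged {purpose!r}")
    # 2. Attribute access control: the purpose must cover every requested attribute.
    denied = set(attributes) - PURPOSE_ATTRIBUTES.get(purpose, set())
    if denied:
        raise PermissionError(f"purpose {purpose!r} may not read {sorted(denied)}")
    # 3. Record access control: only records whose purposes include this one are visible.
    return [{a: r[a] for a in attributes}
            for r in records if purpose in r["purposes"]]
```

Checks 1 and 2 reject the query outright, while check 3 silently filters rows, so a telemarketing query never learns how many non-consenting records exist.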
  24. 24. Architecture: Queries [Diagram: Queries, Query Intrusion Detector, Attribute Access Control, Record Access Control, Audit Info, Audit Trail, Privacy Metadata, Store] <ul><li>Query Intrusion Detector example: telemarketing query that asks for all phone numbers. </li></ul><ul><li>Audit Trail: </li></ul><ul><ul><li>Compliance </li></ul></ul><ul><ul><li>Training data for query intrusion detector </li></ul></ul><ul><li>Principles: Safety, Compliance </li></ul>
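One very simple way to use the audit trail as training data is to learn, per purpose, the largest result size seen so far and flag queries far above it. This is a sketch of that idea only, not the detector from the talk; `learn_profile`, `is_suspicious`, the `rows_returned` field, and the `slack` factor are assumptions.

```python
def learn_profile(audit_trail):
    """Record the largest result size observed for each purpose."""
    profile = {}
    for entry in audit_trail:
        purpose = entry["purpose"]
        profile[purpose] = max(profile.get(purpose, 0), entry["rows_returned"])
    return profile


def is_suspicious(profile, purpose, rows_returned, slack=2.0):
    """Flag a query that returns far more rows than any past query for its purpose."""
    baseline = profile.get(purpose)
    if baseline is None:
        # No history for this purpose: treat as suspicious in this sketch.
        return True
    return rows_returned > slack * baseline
```

A telemarketing query that asks for all phone numbers would stand out against a history of queries that each returned a few dozen rows.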
  25. 25. Architecture: Other [Diagram: Other, Data Retention Manager, Encryption Support, Data Collection Analyzer, Privacy Metadata, Store] <ul><li>Data Retention Manager: delete items in accordance with privacy policy. (Limited Retention) </li></ul><ul><li>Encryption Support: additional security for sensitive data. (Safety) </li></ul><ul><li>Data Collection Analyzer: analyze queries to identify unnecessary collection, retention & authorizations. (Limited Collection) </li></ul>
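A minimal sketch of a retention manager, assuming each record carries its collection date and its set of purposes, and assuming a record is kept until the longest retention period among its purposes has passed. The `RETENTION` values and the function names are hypothetical.

```python
from datetime import date, timedelta

# Retention period per purpose (hypothetical values for illustration).
RETENTION = {
    "purchase": timedelta(days=30),
    "register": timedelta(days=3 * 365),
}


def is_expired(record, today):
    """A record expires once every purpose attached to it has run out."""
    keep_until = max(record["collected"] + RETENTION[p] for p in record["purposes"])
    return today > keep_until


def purge(records, today):
    """Return only the records that may still be retained."""
    return [r for r in records if not is_expired(r, today)]
```

This covers only live records; removing traces from logs and checkpoints is a separate problem.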
  26. 26. Strawman Architecture [Diagram of the full design] <ul><li>Privacy Policy: Privacy Metadata Creator </li></ul><ul><li>Data Collection: Privacy Constraint Validator, Data Accuracy Analyzer </li></ul><ul><li>Queries: Attribute Access Control, Record Access Control, Query Intrusion Detector </li></ul><ul><li>Other: Data Retention Manager, Encryption Support, Data Collection Analyzer </li></ul><ul><li>Store: Privacy Metadata, Audit Info, Audit Trail </li></ul>
  27. 27. Talk Outline <ul><li>Privacy </li></ul><ul><li>Founding Principles </li></ul><ul><li>Strawman Design </li></ul><ul><li>New Challenges </li></ul>
  28. 28. New Challenges <ul><li>General </li></ul><ul><ul><li>Language </li></ul></ul><ul><ul><li>Efficiency </li></ul></ul><ul><li>Use </li></ul><ul><ul><li>Limited Collection </li></ul></ul><ul><ul><li>Limited Disclosure </li></ul></ul><ul><ul><li>Limited Retention </li></ul></ul><ul><li>Security and Openness </li></ul><ul><ul><li>Safety </li></ul></ul><ul><ul><li>Openness </li></ul></ul><ul><ul><li>Compliance </li></ul></ul>
  29. 29. Language <ul><li>Need a language for privacy policies & user preferences. </li></ul><ul><li>P3P can be used as starting point. </li></ul><ul><ul><li>Developed primarily for web shopping. </li></ul></ul><ul><ul><li>What about richer domains? </li></ul></ul><ul><li>How do we balance expressibility and usability? </li></ul><ul><ul><li>Arrange concepts in hierarchy or subsumption relationship. </li></ul></ul><ul><ul><ul><li>P3P recipients: Ours, Same, Delivery, Unrelated, Public </li></ul></ul></ul><ul><ul><ul><li>Purpose: contact, email, phone, home, work </li></ul></ul></ul>
  30. 30. Language (2) <ul><li>How do we accommodate user negotiation models? </li></ul><ul><ul><li>User willing to disclose information only if fairly compensated. </li></ul></ul><ul><ul><li>Value of privacy as coalitional game [KPR2001] </li></ul></ul>
  31. 31. Efficiency <ul><li>How do we minimize the cost of privacy checking? </li></ul><ul><li>How do we incorporate purpose into database design and query optimization? </li></ul><ul><li>Tradeoffs between space & running time. </li></ul><ul><ul><ul><li>Example: tag only the records in the customer table with purpose, not all records. A join is then needed when scanning records in the order table. </li></ul></ul></ul><ul><li>How does the secure-database work on decomposing multilevel relations into single-level relations [JS91] apply here? </li></ul>
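The space versus running time tradeoff on this slide can be illustrated with the standard-library `sqlite3` module. In this sketch purposes are stored once per customer, so filtering orders by purpose requires a join. The schema, table names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
-- Purposes are tagged per customer only, which saves space in the orders table.
CREATE TABLE customer_purpose (customer_id INTEGER, purpose TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, book TEXT);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO customer_purpose VALUES
    (1, 'purchase'), (1, 'recommendations'), (2, 'purchase');
INSERT INTO orders VALUES (10, 1, 'book-a'), (11, 2, 'book-b');
""")


def orders_for_purpose(purpose):
    """Scan orders, joining to the customer's purposes to decide visibility."""
    rows = conn.execute(
        "SELECT o.id, o.book FROM orders o "
        "JOIN customer_purpose p ON p.customer_id = o.customer_id "
        "WHERE p.purpose = ? ORDER BY o.id",
        (purpose,),
    )
    return rows.fetchall()
```

Tagging every order row with its purposes would remove the join at the cost of extra storage, which is the tradeoff the slide raises.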
  32. 32. Limited Collection <ul><li>How do we identify attributes that are collected but not used? </li></ul><ul><ul><li>Assets are only needed for mortgage when salary is below some threshold. </li></ul></ul><ul><li>What’s the needed granularity for numeric attributes? </li></ul><ul><ul><li>Queries only ask “Salary > threshold” for rent application. </li></ul></ul><ul><li>How do we generate minimal queries? </li></ul><ul><ul><li>Redundancy may be hidden in application code. </li></ul></ul>
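Two of the questions on this slide lend themselves to a small sketch: finding attributes that are collected but never queried, and storing only the answer to a threshold test instead of the exact value. The function names and the query-log layout are assumptions made for illustration.

```python
def unused_attributes(collected, query_log):
    """Return attributes that were collected but never read by any logged query."""
    used = set().union(*(q["attributes"] for q in query_log))
    return set(collected) - used


def meets_threshold(salary, threshold):
    """Keep only the coarse fact the queries need, not the exact salary."""
    return salary > threshold
```

If every rent-application query only asks whether salary exceeds a threshold, storing the boolean from `meets_threshold` would be enough for that purpose. Redundancy hidden in application code would not show up in a query log like this one.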
  33. 33. Limited Disclosure <ul><li>Can the user dynamically determine the set of recipients? </li></ul><ul><li>Example: Alice wants to add EasyCredit to set of recipients in EquiRate’s database. </li></ul><ul><li>Digital signatures. </li></ul>
  34. 34. Limited Retention <ul><li>Completely forgetting some information is non-trivial. </li></ul><ul><li>How do we delete a record from the logs and checkpoints, without affecting recovery? </li></ul><ul><li>How do we continue to support historical analysis and statistical queries without incurring privacy breaches? </li></ul>
  35. 35. Safety <ul><li>Encryption provides additional layer of security. </li></ul><ul><li>How do we index encrypted data? </li></ul><ul><li>How do we run queries against encrypted data? </li></ul><ul><li>[SWP00], [HILM02] </li></ul>
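To make the indexing question concrete, here is a sketch of one simple approach: an equality index keyed by a keyed hash (HMAC) of the plaintext value. This is a stand-in chosen for brevity and is not the scheme of [SWP00] or [HILM02]; it supports only exact-match lookups and reveals which rows share a value. The function names and row layout are hypothetical.

```python
import hashlib
import hmac


def token(key: bytes, value: str) -> str:
    """Deterministic keyed hash of a value; the store sees this, not the plaintext."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key, rows, column):
    """Map each value's token to the ids of the rows holding that value."""
    index = {}
    for row_id, row in rows.items():
        index.setdefault(token(key, row[column]), []).append(row_id)
    return index


def lookup(key, index, value):
    """Equality search: recompute the token and probe the index."""
    return index.get(token(key, value), [])
```

Range queries and joins over encrypted columns need more than this.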
  36. 36. Openness <ul><li>A donor shall be able to access all information about the donor stored in the database. </li></ul><ul><li>How does the database check Alice is really Alice and not somebody else? </li></ul><ul><ul><li>Princeton admissions office broke into Yale’s admissions site using applicants’ social security numbers and birth dates. </li></ul></ul><ul><li>How does Alice find out what databases have information about her? </li></ul><ul><ul><li>Symmetrically private information retrieval [GIKM98]. </li></ul></ul>
  37. 37. Compliance <ul><li>Universal Logging </li></ul><ul><ul><li>Can we provide each user whose data is accessed with a log of that access, along with the query reading the data? </li></ul></ul><ul><ul><li>Use intermediaries who aggregate and analyze logs for many users. </li></ul></ul><ul><li>Tracking Privacy Breaches </li></ul><ul><ul><li>Insert “fingerprint” records with emails, telephone numbers, and credit card numbers. </li></ul></ul><ul><ul><li>Some data may be more valuable for spammers or credit card theft. How do we identify categories to do stratified fingerprinting rather than randomly inserting records? </li></ul></ul>
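The fingerprinting idea can be sketched in two parts: deciding how many decoy records to insert per category, and checking leaked data for decoys. The proportional allocation below is one possible reading of "stratified fingerprinting"; the function names, integer weights, and data layout are assumptions.

```python
def allocate_fingerprints(total, weights):
    """Split a budget of decoy records across categories in proportion to integer weights."""
    weight_sum = sum(weights.values())
    return {category: total * w // weight_sum for category, w in weights.items()}


def detect_breach(leaked_values, fingerprints):
    """Return the decoy values that appear in a leaked data set.

    fingerprints: dict decoy value -> category it was planted in
    """
    return sorted(set(leaked_values) & set(fingerprints))
```

Any hit from `detect_breach` indicates that data left the database, and the category of the decoy hints at which part of the data was taken.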
  38. 38. Summary <ul><li>Database systems that take responsibility for the privacy of data they manage. </li></ul><ul><li>Key privacy principles </li></ul><ul><li>Strawman design </li></ul><ul><li>Technical challenges </li></ul>
  39. 39. Closing Thoughts <ul><li>“Code is law … it is all a matter of code: the software and hardware that rule the internet” </li></ul><ul><li>-- L. Lessig </li></ul><ul><li>We can architect cyberspace to protect values we believe are fundamental, or we can architect it to allow those values to disappear. </li></ul><ul><li>Where does the database community want to go from here? </li></ul>
  40. 40. Strawman Architecture [Diagram of the full design] <ul><li>Privacy Policy: Privacy Metadata Creator </li></ul><ul><li>Data Collection: Privacy Constraint Validator, Data Accuracy Analyzer </li></ul><ul><li>Queries: Attribute Access Control, Record Access Control, Query Intrusion Detector </li></ul><ul><li>Other: Data Retention Manager, Encryption Support, Data Collection Analyzer </li></ul><ul><li>Store: Privacy Metadata, Audit Info, Audit Trail </li></ul>
  41. 41. Privacy <ul><li>Privacy is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others. </li></ul><ul><li>-- Alan Westin </li></ul>
