Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Network of Excellence Internet Science Summer School. The theme of the summer school is "Internet Privacy and Identity, Trust and Reputation Mechanisms".
More information: http://www.internet-science.eu/

License: CC Attribution

    Presentation Transcript

    • Privacy-Preserving Data Analysis: Mechanisms and Formal Guarantees. Joss Wright (joss.wright@oii.ox.ac.uk), Oxford Internet Institute, University of Oxford. Outline: Privacy Mechanisms; Notable Cases; State of the Art; Conclusions.
    • Privacy. What is privacy? There are many definitions in different areas of application. A useful one: informational self-determination. Enable data subjects to control how, in what way, and to whom their data is made available.
    • Privacy. Within the privacy enhancing technologies community, privacy means two things: protecting the relations between communicating parties from observation (context privacy; anonymous communications), and preventing the deduction of identities or attributes from collections of data (data privacy). These are strongly related concepts, but surprisingly separate fields of research.
    • Data Privacy. Protection of individual data subjects from identification. Typically we work within the context of statistical queries on databases: counts, averages, histogram queries, etc.
    • Model. Consider a database made up of a number of rows, each representing a single, unique individual, with columns showing attributes. Not all databases are like this, but the model is useful for mechanism design and gives sufficient generality.
    • Model.

        Name     Age  Height
        Joss     31   168
        Alice    30   144
        Bob      25   200
    • Actors. Data subjects; owners of the data; holders and publishers of data; recipients of data; the attacker.
    • Trust in the System. Where do we place trust in the system? Subjects need not be trusted, as they control their own data. Publishers may need to be trusted in how they gather the data, and if you expect them to control release, they must be trusted. Data recipients are treated as adversarial and malicious.
    • Basic Mechanisms. Anonymization: remove explicit identifiers such as names. Privacy-preserving data mining: restrict queries to preserve the privacy of results, preferably enforced by the data publisher. Data perturbation: alter the data to prevent undesirable inferences from being drawn.
    • Anonymization. Remove names or other obvious identifiers from the data. Problems arise with quasi-identifiers: combinations of record values that uniquely identify individuals. These can be difficult to specify or even detect, a problem exacerbated by the fact that data from external sources may combine with the database to form a quasi-identifier. We'll come back to this.
    • Anonymization.

        Name     Age  Height
        Joss     31   168
        Alice    30   144
        Bob      25   200
        Charles  31   187
        David    27   168
    • Anonymization.

        Name     Age  Height
        Joss     31   168
        Alice    30   144
        Bob      25   200
        Charles  31   187
        David    30   168

      Values that are unique on their own (highlighted red on the slides) are quasi-identifiers; unique combinations of values (highlighted blue) are quasi-identifiers too.
    • Anonymization Methods. One of the best-known anonymization mechanisms applied to data is k-anonymity: each unique combination of quasi-identifier values in the database should be shared with at least (k − 1) other records. Any given record therefore describes at least k people, and the probability that you are identified by that record is at most 1/k.
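As an illustration (not from the talk), a minimal Python sketch of measuring the k that a generalized table achieves; the `k_anonymity` helper and the band labels are hypothetical, modelled on the slides' example:

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Return the k for which the table is k-anonymous: the size of the
    smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

# Generalized table from the slides: ages coarsened to a band,
# heights to a threshold.
table = [
    {"age": "25-35", "height": "<=180"},  # Joss
    {"age": "25-35", "height": "<=180"},  # Alice
    {"age": "25-35", "height": ">180"},   # Bob
    {"age": "25-35", "height": ">180"},   # Charles
    {"age": "25-35", "height": "<=180"},  # David
]
print(k_anonymity(table, ["age", "height"]))  # 2: the smallest group has 2 rows
```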
    • k-anonymity.

        Name     Age  Height
        Joss     31   168
        Alice    30   144
        Bob      25   200
        Charles  31   187
        David    27   168
    • k-anonymity.

        Name     Age      Height
        Joss     [25-35]  ≤180
        Alice    [25-35]  ≤180
        Bob      [25-35]  >180
        Charles  [25-35]  >180
        David    [25-35]  ≤180
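The generalization step between the two tables above can be sketched in Python; this is illustrative only, with the band boundaries taken from the slides and the `generalize` helper being a hypothetical name:

```python
def generalize(row):
    """Coarsen exact values into the bands used on the slide."""
    return {
        "age": "25-35" if 25 <= row["age"] <= 35 else "other",
        "height": "<=180" if row["height"] <= 180 else ">180",
    }

raw = [
    {"name": "Joss", "age": 31, "height": 168},
    {"name": "Alice", "age": 30, "height": 144},
    {"name": "Bob", "age": 25, "height": 200},
    {"name": "Charles", "age": 31, "height": 187},
    {"name": "David", "age": 27, "height": 168},
]
generalized = [generalize(row) for row in raw]
print(generalized[0])  # {'age': '25-35', 'height': '<=180'}
```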
    • k-anonymity Applied. This is not a hypothetical issue: when Sweeney proposed k-anonymity, she demonstrated the risks. She took postcode, date of birth and sex from a published voter register, took anonymized published medical records, and identified the record belonging to a former governor of Massachusetts.
    • Beyond k-anonymity. k-anonymity gives a basic level of anonymization that prevents an individual from simply being re-identified from their published attributes. There are, naturally, more subtle issues: we may still be able to infer sensitive information about a person even if we cannot directly identify them.
    • l-diversity. k-anonymity ensures that an individual is indistinguishable from a group of other individuals, preventing their direct re-identification. It could be, however, that attributes shared by the entire group are themselves sensitive.
    • l-diversity.

        Name     Age  Height  Illness
        Joss     31   168     Flu
        Alice    30   144     Flu
        Bob      25   200     HIV
        Charles  31   187     HIV
        David    27   168     Flu
    • l-diversity.

        Name     Age      Height  Illness
        Joss     [25-35]  ≤180    Flu
        Alice    [25-35]  ≤180    Flu
        Bob      [25-35]  >180    HIV
        Charles  [25-35]  >180    HIV
        David    [25-35]  ≤180    Flu
    • l-diversity.

        Name     Age      Height  Illness
        Joss     [25-35]  ≤200    Flu
        Alice    [25-35]  ≤200    Flu
        Bob      [25-35]  ≤200    HIV
        Charles  [25-35]  ≤200    HIV
        David    [25-35]  ≤200    Flu
    • l-diversity. l-diversity ensures not only that all users are k-anonymous, but that each group of users shares a variety of sensitive attributes. Variations ensure that all sensitive attributes are evenly or sufficiently distributed, to avoid a high-probability association of a user with an attribute. One notable extension is t-closeness, which ensures that the distribution of attributes in each group is close to the distribution across the entire table.
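A minimal sketch (again illustrative, not from the talk) of measuring l-diversity on the slides' tables: the `l_diversity` helper is a hypothetical name, and l is taken as the minimum number of distinct sensitive values in any quasi-identifier group.

```python
from collections import defaultdict

def l_diversity(rows, quasi_identifiers, sensitive):
    """Return the minimum number of distinct sensitive values
    found in any group of rows sharing quasi-identifier values."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return min(len(values) for values in groups.values())

table = [
    {"age": "25-35", "height": "<=180", "illness": "Flu"},  # Joss
    {"age": "25-35", "height": "<=180", "illness": "Flu"},  # Alice
    {"age": "25-35", "height": ">180",  "illness": "HIV"},  # Bob
    {"age": "25-35", "height": ">180",  "illness": "HIV"},  # Charles
    {"age": "25-35", "height": "<=180", "illness": "Flu"},  # David
]
print(l_diversity(table, ["age", "height"], "illness"))  # 1: each group shares one illness

# Generalizing height further (<=200 for everyone) merges the groups:
merged = [{**row, "height": "<=200"} for row in table]
print(l_diversity(merged, ["age", "height"], "illness"))  # 2
```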
    • Perturbation. The above approaches maintain the consistency of the database. One of the oldest ideas is simply to replace genuine values with perturbed values that maintain almost-correct desirable properties. For numeric quantities this can simply be the addition of random noise drawn from some appropriate distribution; this obviously works best for numerical data. For categorical data, attributes can instead be re-assigned in a variety of ways.
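Numeric perturbation can be sketched in a few lines of Python (illustrative only; the `perturb` helper and the Gaussian noise scale are assumptions, not the talk's prescription):

```python
import random

random.seed(0)  # fixed seed, for a reproducible illustration

def perturb(values, noise_scale):
    """Replace each genuine value with value + zero-mean Gaussian noise."""
    return [v + random.gauss(0, noise_scale) for v in values]

heights = [168, 144, 200, 187, 168]
noisy_heights = perturb(heights, noise_scale=5.0)
# Aggregates such as the mean are approximately preserved,
# while no individual value matches the original exactly.
```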
    • Permutation. Sensitive attributes can be swapped between data records, maintaining statistical quantities such as aggregate counts, averages and the distribution of the data. This has to be performed sensitively with respect to the required analyses, typically on an ad-hoc, per-database basis.
    • Sweeney's k-anonymity Re-identification. In 2001, Sweeney set out to prove the ideas behind k-anonymity. She took publicly available voter registration data and published, anonymized medical records (the GIC healthcare data). At the time of the data collection, William Weld was the governor of Massachusetts. According to the voter records, only six people in Cambridge, Massachusetts shared his birth date; of those six, three were male; and only one lived within his (5-digit) ZIP code.
    • Sweeney's k-anonymity Re-identification. The anonymized medical records contained over 100 attributes detailing diagnoses, procedures and medications. Sweeney calculated that 87% of US citizens were uniquely identifiable through the quasi-identifier {sex, date of birth, 5-digit ZIP}; 53% from {sex, date of birth, city}; and 18% from {sex, date of birth, county}.
    • Netflix Prize. Netflix wanted to improve its film recommendation algorithm, and published a database of over 100,000,000 film ratings made by roughly 500,000 subscribers between 1999 and 2005. A million-dollar prize was offered for an algorithm that would improve the recommendations given to users by a given degree of accuracy. "...all customer identifying information has been removed."
    • Netflix Prize. Narayanan and Shmatikov disagreed. They combined the Netflix data with IMDb data to re-identify a large number of users: linking Netflix ratings to IMDb profiles exposed the entire viewing history of many users, and demonstrated how information such as political preference could be extracted from the available data. Their proof-of-concept algorithm used IMDb, but is easily adaptable to alternative information sources.
    • Netflix Prize. With 8 film ratings, 96% of subscribers can be uniquely identified. With 2 ratings and their dates, 64% can be completely deanonymized, and 89% can be narrowed to a set of 8 possible users.
    • Netflix Prize Redux. Following this publication, Netflix's response was... to announce a second Netflix Prize containing more data points, including age, ZIP code, gender and previously-chosen films. It was eventually cancelled, but only in response to legal action from customers and concerns from the US Federal Trade Commission.
    • Mechanisms Revisited. The mechanisms we've looked at so far are typically ad hoc, designed around the desired utility (the purpose for which the data will be used); they come without formal guarantees, leave a quantifiable probability that individuals could be re-identified, and are sensitive to auxiliary information from external data sources.
    • Mechanisms Revisited. We can also consider privacy mechanisms as falling into one of two families. Non-interactive: anonymize the data somehow, then release it. Interactive: keep the database secret, and only release results to queries.
    • Non-Interactive Mechanisms. Historically the main way of doing things, including most of the methods we've looked at so far. A major limitation of this approach to anonymization is that it requires you to fix the utility before you release the data: the data ends up either useless and anonymous, or useful and identifiable. It is also difficult to predict interactions with data that might be released in the future.
    • Interactive Mechanisms. In interactive mechanisms the data is never released. Instead, queries are sent to the holder of the database, who releases an answer. This approach is taken by the current state of the art: differential privacy.
    • Differential Privacy. In 1978, Dalenius stated the following desirable property for privacy-preserving statistical databases: "A statistical database should reveal nothing about an individual that could not be learned without access to the database." This is impossible, largely due to the existence of auxiliary external information that can be combined with the data in the database.
    • Differential Privacy. 'Suppose one's height were considered a sensitive piece of information, and that revealing the height of an individual were a privacy breach. Assume that a database yields the average heights of women of different nationalities. An adversary who has access to the statistical database and the auxiliary information "Terry Gross is two inches shorter than the average Lithuanian woman" learns Terry Gross' height, while anyone learning only the auxiliary information, without access to the average heights, learns relatively little.' – Dwork
    • Differential Privacy. Critically, this privacy breach occurs whether or not Terry Gross' data is in the database.
    • Differential Privacy. Rather than guaranteeing that a privacy breach will not occur, differential privacy guarantees that a privacy breach will not occur because of the data in the database. Reformulated: anything that can happen if your data is in the database could have happened even if your data weren't in the database. This neatly accommodates any and all possible auxiliary information available now or in the future. It also divorces the privacy mechanism from the nature of the underlying data, providing a general mechanism.
    • Differential Privacy Core. A randomised function K achieves ε-differential privacy if, for any two databases D1, D2 differing on at most one element, and all S ⊆ Range(K):

        Pr[K(D1) ∈ S] ≤ e^ε × Pr[K(D2) ∈ S]
    • Differential Privacy Core. Alternatively:

        Pr[K(D1) ∈ S] / Pr[K(D2) ∈ S] ≤ e^ε

      The ratio between the two probabilities is bounded by e^ε.
    • The Exponential Function. [Figure: plot of e^ε for ε from 0 to 5. The bound grows rapidly, from 1 at ε = 0 to roughly 148 at ε = 5.]
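To get a feel for how the e^ε bound scales, a few values can be printed directly (a simple illustration, not from the slides):

```python
import math

# The indistinguishability bound e^eps grows quickly with eps:
# eps near 0 forces the two output distributions to be almost identical,
# while eps = 5 allows outcome probabilities to differ by a factor of ~148.
for eps in (0.01, 0.1, 0.5, 1.0, 5.0):
    print(f"eps = {eps}: probability ratio bounded by {math.exp(eps):.3f}")
```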
    • Differential Privacy Core. Translated: for any calculation that you make on a database, any result you get is (almost) equally probable whether or not you add a person, and thus a single record, to that database. Alternatively put: two databases that differ in a single record should be indistinguishable, with given probability, when accessed via the privacy mechanism.
    • Achieving Differential Privacy. How do we achieve this guarantee? A variety of mechanisms have been proposed in the literature, but Dwork's original suggestion remains popular: appropriately chosen random noise is added to the result of a query of arbitrary complexity. Because the noise is added to the result rather than to the data, the original database retains its accuracy. The Laplace distribution provides desirable properties for the appropriate noise.
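The Laplace mechanism the talk describes can be sketched in Python. This is a minimal illustration, not a production implementation; the sampling trick (difference of two exponentials) is one standard way to draw Laplace noise with the standard library:

```python
import random

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Return the query result with Laplace(sensitivity/epsilon) noise added,
    the classic mechanism for epsilon-differential privacy."""
    scale = sensitivity / epsilon
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    noise = scale * (random.expovariate(1.0) - random.expovariate(1.0))
    return true_answer + noise

random.seed(1)  # fixed seed, for a reproducible illustration
# A count query ("how many people are left-handed?") has sensitivity 1.
noisy_count = laplace_mechanism(true_answer=13, sensitivity=1, epsilon=0.5)
```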
    • The Laplace Distribution. [Figure: the Laplace probability density.]
    • Achieving Differential Privacy. How do we know how much noise to add? We use the L1-sensitivity of the function to bound the noise, defined as the amount by which the query result could change if a single record were added to the database. Recall that our guarantee is based on indistinguishability between similar databases. As an example, a count query (e.g. "How many people in the database are left-handed?") can only differ by one. Other query types differ, but many complex queries have manageable L1-sensitivity.
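The count-query example can be made concrete with one neighbouring pair of databases (an illustration of the worst-case change, with hypothetical names; the real L1-sensitivity is the maximum over all such pairs):

```python
def count_left_handed(database):
    """A simple count query over the database."""
    return sum(1 for person in database if person["left_handed"])

db = [{"left_handed": True}, {"left_handed": False}, {"left_handed": True}]
neighbour = db + [{"left_handed": True}]  # the same database plus one record

# Adding (or removing) a single record changes a count by at most 1,
# so the L1-sensitivity of a count query is 1.
sensitivity = abs(count_left_handed(neighbour) - count_left_handed(db))
print(sensitivity)  # 1
```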
    • Properties of Differential Privacy. Using the Laplace distribution to add noise provably adds the smallest amount of noise required to preserve privacy. The ε parameter in the guarantee can be scaled for higher or lower guarantees: lower values of ε decrease the likelihood that the databases can be distinguished as a result of queries, but make results less accurate.
    • Differential Privacy Illustrated: [figure: two Laplace probability densities Pr[x], centred at µ1 and µ2, with two possible noisy outputs a and b marked on the axis]
    • Differential Privacy Illustrated (Explanation): In the previous slide, let µ1 and µ2 be the two “true” results of a query, such as a count, from two databases that differ in a single record. With random noise added, drawn from the Laplace distribution, both a and b are possible “noisy” results of the query for either database. Importantly, the ratio between the probability of a given noisy result, such as a or b, under µ1 and the probability of that result under µ2 is bounded by a constant multiplicative factor: exactly the differential privacy guarantee.
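This bounded ratio can be checked numerically. The sketch below (illustrative, with hypothetical values for µ1, µ2 and the noise scale) verifies that for a count query the Laplace density ratio at any output point never exceeds e^ε:

```python
import math

def laplace_pdf(x: float, mu: float, scale: float) -> float:
    """Density of the Laplace distribution with mean mu and scale b."""
    return math.exp(-abs(x - mu) / scale) / (2 * scale)

# Two "true" counts differing by one record; epsilon = 0.5 gives scale = 2.
mu1, mu2, scale = 100.0, 101.0, 2.0
bound = math.exp(abs(mu1 - mu2) / scale)  # e^epsilon for a count query

# Candidate noisy outputs on both sides of, and between, the two means.
for x in [90.0, 99.5, 100.5, 103.0, 110.0]:
    ratio = laplace_pdf(x, mu1, scale) / laplace_pdf(x, mu2, scale)
    assert 1 / bound - 1e-9 <= ratio <= bound + 1e-9
```

An observer who sees any particular noisy output therefore cannot tell, beyond a factor of e^ε, which of the two neighbouring databases produced it.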
    • Properties of Differential Privacy: Differentially private queries are neatly composable in two senses. First, a complex sequence of queries can be given to the database owner, each of which depends on the accurate result of the previous query; only the final result need be perturbed. Second, each differentially private query exhausts some amount of the privacy guarantee, and further queries can be made until this budget is spent. At that point the database should be destroyed!
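A privacy budget of this kind can be tracked with a simple accountant. This is a minimal sketch assuming basic sequential composition (the epsilons of answered queries add up), not a full implementation:

```python
class PrivacyBudget:
    """Track a privacy budget under sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        """Record one query's epsilon cost, or refuse if over budget."""
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted; no further queries")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.4)  # first query
budget.charge(0.4)  # second query
# budget.charge(0.4) would now raise: only 0.2 of the budget remains.
```

The database owner, not the analyst, holds the accountant; once `charge` starts refusing, no further answers may be released from that database.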
    • Practical Application: Privacy Integrated Queries (PINQ). For practical application, we do not want database owners to need to understand the theory. There is now a simple database query language, similar to SQL, that automatically enforces differential privacy guarantees. It has been used in academic analyses, but not commercially.
    • Practical Application: Smart Grids. Recent work by Danezis demonstrates differentially private smart metering for electrical grids. It injects noise into billing by increasing the amount you pay; this rapidly becomes very expensive, but gives quantifiable privacy goals.
    • Future Work: Differential privacy is a very strong guarantee; how effectively can it be weakened? Other open directions include distributed settings for data sources and noise addition, and streaming, or otherwise changing, data rather than static databases.
    • Lessons: A step back: anonymizing data is hard, and we are only just beginning to realise just how hard. Differential privacy, and PINQ, are good examples of how to go about it and of the limitations we face. Netflix and other examples show that these risks are not isolated or theoretical, and this is before we look at Facebook, Google, and Amazon.
    • Lessons: If you are in a position where you need to anonymize data, think very carefully about how you treat the data and what you release. Eyeballing data and removing obvious linkages is not even close to sufficient: do it if you want to, but don’t claim the result is anonymized. The most important principle is data minimisation. Only gather what you need. Only use it for what you (initially) need. Only share it when you must.