Presented at the PrivateNLP workshop at WSDM 2020: https://sites.google.com/view/wsdm-privatenlp-2020/home/

Amazon prides itself on being the most customer-centric company on earth. That means maintaining the highest possible standards of both security and privacy when dealing with customer data. This month, at the ACM Web Search and Data Mining (WSDM) Conference, my colleagues and I will describe a way to protect privacy during large-scale analyses of textual data supplied by customers. Our method works, essentially, by rephrasing the customer-supplied text and basing the analysis on the new phrasing rather than on the customers’ own language.

- 1. Preserving Privacy and Utility in Text Data Analysis Tom Diethe, Oluwaseyi Feyisetan, Thomas Drake, Borja Balle {sey,tdiethe,draket}@amazon.com borja.balle@gmail.com PrivateNLP Workshop, WSDM February 7 2020
- 2. Outline
  1. Alexa AI
  2. Algorithmic Privacy
  3. Privacy for Text
  4. Differential Privacy in Euclidean Spaces
  5. Differential Privacy in Hyperbolic Spaces
  6. Optimizing the Privacy Utility Trade-off
  7. Summary

  Diethe, Feyisetan, Drake, Balle (Amazon) Privacy and Utility in Text Data Analysis February 7 2020 1 / 41
- 4. Alexa AI
  - What is Alexa? A cloud-based voice service that can help you with tasks, entertainment, general information, shopping, and more
  - The more you talk to Alexa, the more Alexa adapts to your speech patterns, vocabulary, and personal preferences
  - How do we ...
    - create robust and efficient AI systems?
    - maintain the privacy of customer data?
- 6. Failure Modes
  - Unintentional failures: the ML system produces a formally correct but completely unsafe outcome
    - Outliers/anomalies
    - Dataset shift
    - Limited memory
  - Intentional failures: the failure is caused by an active adversary attempting to subvert the system to attain her goals, such as to:
    - misclassify the result
    - infer private training data
    - steal the underlying algorithm
- 8. A first attempt: Can’t I just anonymize my data?
  - k-anonymity: the information for each person cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release
  - Suppose a company is audited for salary discrimination
  - The auditor can see salaries by gender, age and nationality for each department and office
  - If the auditor has a friend, an ex, or a date working for the company, she will learn the salary of that person
  - Reducing data granularity reduces the risk, but also reduces accuracy (fidelity in this case)

  | Granularity | Office | Dept. | Salary | D.O.B. | Nationality | Gender |
  |---|---|---|---|---|---|---|
  | Original | London | IT | £##### | May 1985 | Portuguese | Female |
  | Reduced | UK | IT | £##### | 1980-1985 | - | Female |

  - Still presents a risk of re-identification! If there are 10 females born between 1980 and 1985 in the whole of the UK’s IT department, 9 of them could conspire to learn the salary of the 10th one
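As a rough illustration of the k-anonymity definition above, the following sketch computes the size of the smallest group of records that share the same quasi-identifier values. The records, the field names and the `generalise` helper are hypothetical toy data, not taken from the talk.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records that
    share identical values on all quasi-identifier fields."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical toy records (illustrative only).
records = [
    {"office": "London", "dept": "IT", "dob": "May 1985",
     "nationality": "Portuguese", "gender": "Female"},
    {"office": "London", "dept": "IT", "dob": "Jan 1982",
     "nationality": "British", "gender": "Female"},
    {"office": "Leeds", "dept": "IT", "dob": "Mar 1983",
     "nationality": "British", "gender": "Female"},
]

def generalise(record):
    """Coarsen office, date of birth and nationality, as in the slide's second row."""
    return {"office": "UK", "dept": record["dept"], "dob": "1980-1985",
            "nationality": "-", "gender": record["gender"]}

qi = ["office", "dept", "dob", "nationality", "gender"]
k_before = k_anonymity(records, qi)                          # every record is unique
k_after = k_anonymity([generalise(r) for r in records], qi)  # all records collapse into one group
```

A larger k after generalisation lowers the linkage risk, but as the slide notes it does not stop the other members of a group from colluding.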
- 11. Anonymized Data Isn’t
  - Example 1: In the mid-1990s, the Massachusetts “Group Insurance Commission” (GIC) released “anonymized” data on state employees that showed every hospital visit
  - The goal was to help researchers. All obvious identifiers such as name, address, and social security number were removed
  - MIT PhD student Latanya Sweeney decided to attempt to reverse the anonymization and requested a copy of the data
  - Reidentification: William Weld, then Governor of Massachusetts, assured the public that GIC had protected patient privacy by deleting identifiers. Sweeney started hunting for the Governor’s hospital records in the GIC data. She knew that Governor Weld resided in Cambridge, Massachusetts, population 54,000 and 7 ZIP codes. For $20, she purchased the complete voter rolls from the city of Cambridge, containing the name, address, ZIP code, birth date, and gender of every voter. Crossing this with the GIC records, Sweeney found Governor Weld with ease: only 6 people shared his birth date, only 3 of them were men, and of them, only he lived in his ZIP code. Sweeney sent the Governor’s health records (including diagnoses and prescriptions) to his office.
- 13. Anonymized Data Isn’t
  - Example 2: In 2006, Netflix released data pertaining to how 500,000 of its users rated movies over a six-year period
  - Netflix “anonymized” the data before releasing it by removing usernames, but assigned unique identification numbers to users in order to allow for continuous tracking of user ratings and trends
  - Reidentification: Researchers used this information to uniquely identify individual Netflix users by crossing the data with the public IMDB database. According to the study, if a person has information about when and how a user rated six movies, that person can identify 99% of people in the Netflix database.
- 15. Differential Privacy
  - A randomised mechanism $M : X \to Y$ is $\varepsilon$-differentially private if for all neighbouring inputs $x \simeq x'$ (i.e. $\|x - x'\|_1 = 1$) and for all sets of outputs $E \subseteq Y$ we have

    $$P[M(x) \in E] \le e^{\varepsilon}\, P[M(x') \in E]$$

  - [Figure: output distributions of M(D) and M(D'), with their ratio bounded by $e^{\varepsilon}$]
  - Mechanisms:
    - Randomised response → plausible deniability
    - Laplace mechanism: e.g. $\tilde{\mu} = \mu + \xi$, $\xi \sim \mathrm{Lap}\left(\frac{1}{n\varepsilon}\right)$
    - Output perturbation
    - ...
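The Laplace mechanism mentioned above can be sketched for a private mean. This is a minimal illustration under stated assumptions: values are clipped to [0, 1], so changing one record moves the mean by at most 1/n, and the noise scale is 1/(nε). The function names and the data are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two
    independent exponentials, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, epsilon, rng):
    """Release the mean of values assumed to lie in [0, 1].
    With that assumption the sensitivity of the mean is 1/n."""
    n = len(values)
    clipped = [min(1.0, max(0.0, v)) for v in values]  # enforce the [0, 1] assumption
    return sum(clipped) / n + laplace_noise(1.0 / (n * epsilon), rng)

rng = random.Random(0)
data = [rng.random() for _ in range(10_000)]  # synthetic data, illustration only
true_mean = sum(data) / len(data)
noisy_mean = private_mean(data, epsilon=1.0, rng=rng)
```

With n = 10,000 and ε = 1 the noise scale is 0.0001, so the released mean stays very close to the true one; a smaller ε or a smaller n makes the noise proportionally larger.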
- 17. Randomized Response [Warner ’65]
  - Say you want to release a bit x ∈ {Yes, No}. Do the following:
    1. flip a coin
    2. if tails, respond truthfully with x
    3. if heads, flip a second coin and respond “Yes” if heads; respond “No” if tails
  - Claim: the above algorithm satisfies (log 3)-differential privacy

    $$\frac{\Pr[\text{Response} = \text{Yes} \mid x = \text{Yes}]}{\Pr[\text{Response} = \text{Yes} \mid x = \text{No}]} = \frac{1/2 \times 1 + 1/2 \times 1/2}{1/2 \times 0 + 1/2 \times 1/2} = \frac{3/4}{1/4} = 3 \implies e^{\varepsilon} = 3$$

  - The same bound holds for $\frac{\Pr[\text{Response} = \text{No} \mid x = \text{Yes}]}{\Pr[\text{Response} = \text{No} \mid x = \text{No}]}$.
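The coin-flip protocol above can be simulated directly. The sketch below reproduces the ratio of 3 from the claim and shows how an analyst could still estimate the population rate of "Yes" answers from the noisy responses. The debiasing helper and the 30% ground-truth rate are illustrative assumptions, not part of the slides.

```python
import math
import random

def randomized_response(truth, rng):
    """Warner-style randomized response with two fair coins."""
    if rng.random() < 0.5:       # first coin is tails: answer truthfully
        return truth
    return rng.random() < 0.5    # first coin is heads: second coin decides

def p_yes(truth):
    """Probability of responding Yes given the true bit."""
    return 0.5 * (1.0 if truth else 0.0) + 0.5 * 0.5

# Privacy level implied by the worst-case probability ratio.
epsilon = math.log(p_yes(True) / p_yes(False))

def estimate_true_rate(responses):
    """Invert E[observed] = 0.25 + 0.5 * rate to debias the noisy answers."""
    observed = sum(responses) / len(responses)
    return 2.0 * observed - 0.5

rng = random.Random(1)
truths = [rng.random() < 0.3 for _ in range(100_000)]  # hypothetical population
responses = [randomized_response(t, rng) for t in truths]
estimate = estimate_true_rate(responses)
```

Each individual response is deniable, yet the aggregate estimate converges to the true rate as the number of respondents grows.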
- 19. Important Properties
  - Robustness to post-processing: if $M$ is $(\varepsilon, \delta)$-DP, then $f(M)$ is $(\varepsilon, \delta)$-DP
  - Composition: if $M_1, \dots, M_n$ are $(\varepsilon_i, \delta_i)$-DP, then $g(M_1, \dots, M_n)$ is $\left(\sum_{i=1}^{n} \varepsilon_i, \sum_{i=1}^{n} \delta_i\right)$-DP
  - Protects against arbitrary side knowledge
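The composition property above amounts to adding up the per-mechanism budgets. A small sketch, assuming a hypothetical `compose` helper and an illustrative list of budgets:

```python
import math

def compose(budgets):
    """Basic composition: given (epsilon_i, delta_i) pairs, return the
    (sum of epsilons, sum of deltas) guarantee for the combined release."""
    total_epsilon = sum(eps for eps, _ in budgets)
    total_delta = sum(delta for _, delta in budgets)
    return total_epsilon, total_delta

# Illustrative budgets: four randomized-response answers at epsilon = log 3
# plus one hypothetical (0.5, 1e-6)-DP query.
budgets = [(math.log(3), 0.0)] * 4 + [(0.5, 1e-6)]
total = compose(budgets)
```

This is only the simple additive bound stated on the slide; it says nothing about tighter accounting methods.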
- 21. User-AI system interaction via natural language
  - User’s goal: meet some specific need with respect to an issued query x
  - Agent’s goal: satisfy the user’s request
  - Privacy violation: occurs when x is used to make personal inferences, e.g. when unrestricted PII is present
  - Mechanism: modify the query to protect privacy whilst preserving semantics
  - Our approach: Metric Differential Privacy
- 27. Desired Functionality

  | Intent | Query x | Modified query x' |
  |---|---|---|
  | GetWeather | Will it be colder in Cleveland | Will it be colder in Ohio |
  | PlayMusic | Play Cantopop on lastfm | Play C-pop on lastfm |
  | BookRestaurant | Book a restaurant in Milladore | Book a restaurant in Wood County |
  | SearchCreativeWork | I want to watch Manthan film | I want to watch Hindi film |
- 28. Word Embeddings
  - Mapping from words into vectors of real numbers (many ways to do this!)
  - e.g. neural-network-based models (e.g. Word2Vec, GloVe, fastText)
  - Defines a mapping $\phi : W \to \mathbb{R}^n$
  - Nearest neighbours are often synonyms
- 29. Metric Differential Privacy
  - Recall the definition of DP: $P[M(x) \in E] \le e^{\varepsilon}\, P[M(x') \in E]$ for $x, x' \in X$ s.t. $\|x - x'\|_1 = 1$
  - This can be rewritten into a single equation as:

    $$\frac{P[M(x) \in E]}{P[M(x') \in E]} \le e^{\varepsilon \|x - x'\|_1}$$

  - Metric differential privacy generalises this to use any valid metric $d(x, x')$:

    $$\frac{P[M(x) \in E]}{P[M(x') \in E]} \le e^{\varepsilon\, d(x, x')}$$

  - (easy to see that standard DP is metric DP with $d(x, x') = \|x - x'\|_1$)
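To make the metric DP bound concrete, the sketch below checks it numerically for a one-dimensional Laplace mechanism with the metric d(x, x') = |x − x'|. The mechanism choice, the inputs and the evaluation grid are illustrative assumptions rather than anything specified in the talk.

```python
import math

def laplace_pdf(y, centre, epsilon):
    """Density of centre + Laplace noise with scale 1/epsilon, evaluated at y."""
    return 0.5 * epsilon * math.exp(-epsilon * abs(y - centre))

def max_log_ratio(x, x_prime, epsilon, grid):
    """Largest log density ratio between the two output distributions over a grid."""
    return max(
        math.log(laplace_pdf(y, x, epsilon) / laplace_pdf(y, x_prime, epsilon))
        for y in grid
    )

epsilon = 0.7
x, x_prime = 1.0, 3.5                        # two arbitrary inputs
grid = [i / 10.0 for i in range(-100, 101)]  # outputs from -10 to 10
worst = max_log_ratio(x, x_prime, epsilon, grid)
bound = epsilon * abs(x - x_prime)           # epsilon * d(x, x')
```

The worst-case log ratio never exceeds ε·d(x, x'), and it reaches that value for outputs on the far side of x from x', which is why the guarantee degrades smoothly as the inputs move apart.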
- 32. Privacy in the Space of Word Embeddings [Feyisetan 2019, Feyisetan 2020]
  - Given:
    - $w \in W$: word to be “privatised” from word space $W$ (dictionary)
    - $\phi : W \to Z$: embedding function from word space to embedding space $Z$ (e.g. $\mathbb{R}^n$)
    - $v = \phi(w)$: corresponding word vector
    - $d : Z \times Z \to \mathbb{R}$: distance function in embedding space
    - $\Omega(\varepsilon)$: the D.P. noise sampling distribution (e.g. $\Omega_i(\varepsilon) = \mathrm{Lap}\left(\frac{1}{n\varepsilon}\right)$, $i = 1, \dots, n$ for $\mathbb{R}^n$)
  - Metric DP mechanism for word embeddings:
    1. Perturb the word vector: $v' = v + \xi$ where $\xi \sim \Omega(\varepsilon)$
    2. The new vector $v'$ will not be a word (a.s.)
    3. Project back to $W$: $w' = \arg\min_{w \in W} d(v', \phi(w))$, return $w'$
  - What do we need?
    - $d$ satisfies the axioms of a metric (non-negativity, identity of indiscernibles, symmetry, triangle inequality)
    - A way to sample using $\Omega$ in the metric space that respects $d$ and gives us $\varepsilon$-metric DP
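The three-step mechanism above can be sketched with NumPy. This is a toy illustration under explicit assumptions: the vocabulary and 2-D embeddings are made up, the distance is Euclidean, and the noise is drawn with density proportional to exp(−ε‖z‖) (uniform direction, Gamma-distributed radius) rather than the per-coordinate Laplace example given on the slide. An exact nearest-neighbour search stands in for the projection step.

```python
import numpy as np

def sample_noise(dim, epsilon, rng):
    """Assumed noise distribution: density proportional to exp(-epsilon * ||z||_2).
    Sampled as a uniform direction scaled by a Gamma(dim, 1/epsilon) radius."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return radius * direction

def privatise_word(word, vocab, embeddings, epsilon, rng):
    """Perturb the word vector, then project back to the nearest vocabulary word."""
    v = embeddings[vocab.index(word)]
    v_noisy = v + sample_noise(v.shape[0], epsilon, rng)      # step 1: perturb
    distances = np.linalg.norm(embeddings - v_noisy, axis=1)  # Euclidean d
    return vocab[int(np.argmin(distances))]                   # step 3: project

# Hypothetical toy vocabulary and embeddings (not real GloVe vectors).
vocab = ["cleveland", "ohio", "toledo", "paris"]
embeddings = np.array([[0.0, 0.0], [0.3, 0.1], [0.2, -0.2], [5.0, 5.0]])

rng = np.random.default_rng(0)
outputs = [privatise_word("cleveland", vocab, embeddings, epsilon=5.0, rng=rng)
           for _ in range(200)]
```

A smaller ε spreads the outputs over more neighbouring words, while a very large ε almost always returns the original word.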
- 36. Differential Privacy in the Space of Euclidean Word Embeddings
  - Adding noise to a location always produces a valid location: a point somewhere on the earth’s surface
  - Adding noise to a word embedding produces a new point in the embedding space, but it is almost surely (a.s.) not the location of a valid word embedding
  - We perform an approximate nearest-neighbour search to find the nearest valid embedding
  - The nearest valid embedding could be the original word itself: in that case, the original word is returned
- 37. Practical Considerations
  - To help choose $\varepsilon$, we define:
    - Uncertainty statistics for the adversary over the outputs
    - Indistinguishability statistics: plausible deniability
  - Find a radius of high protection: a guarantee on the likelihood of changing any word in the embedding vocabulary
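One hedged way to read the statistics above is to run the mechanism many times on the same word and count how often the word comes back unchanged and how many distinct outputs appear. The sketch below does this on random stand-in embeddings; the noise distribution (density proportional to exp(−ε‖z‖)), the helper names and the counts `n_unchanged` and `n_distinct` are illustrative assumptions, not the authors' exact definitions.

```python
import numpy as np

def privatise(index, embeddings, epsilon, rng):
    """Perturb one embedding and return the index of the nearest vocabulary vector."""
    dim = embeddings.shape[1]
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=dim, scale=1.0 / epsilon)  # assumed noise radius
    noisy = embeddings[index] + radius * direction
    return int(np.argmin(np.linalg.norm(embeddings - noisy, axis=1)))

def deniability_stats(index, embeddings, epsilon, rng, trials=1000):
    """Count unchanged outputs and distinct outputs over repeated runs."""
    outputs = [privatise(index, embeddings, epsilon, rng) for _ in range(trials)]
    n_unchanged = sum(o == index for o in outputs)
    n_distinct = len(set(outputs))
    return n_unchanged, n_distinct

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 10))  # random stand-in for real word embeddings
stats = {eps: deniability_stats(0, embeddings, eps, rng)
         for eps in (0.5, 5.0, 500.0)}
```

Plotting such counts against ε for a sample of words gives an empirical handle on which values of ε leave most words unchanged and which ones provide many plausible alternatives.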
- 38. Euclidean Experiments: Setup

  | Dataset | IMDb | Enron | InsuranceQA |
  |---|---|---|---|
  | Task type | Sentiment analysis | Author identification | Question answering |
  | Evaluation metric | accuracy | accuracy | MAP, MRR |
  | Training set size | 25,000 | 8,517 | 12,887 |
  | Test set size | 25,000 | 850 | 1,800 |
  | Total word count | 5,958,157 | 307,639 | 92,095 |
  | Vocabulary size | 79,428 | 15,570 | 2,745 |
  | Sentence length | μ = 42.27, σ = 34.38 | μ = 30.68, σ = 31.54 | μ = 7.15, σ = 2.06 |

  - Scenario 1: Train time protection. Little access to public data (10%), but abundant access to private training data (90%); model training is done on the combined dataset (i.e. public subset + perturbed private subset)
  - Scenario 2: Test time protection. Models trained on the complete training set; evaluation on privatized versions of the test sets
  - We used 300-D GloVe word embeddings with biLSTM models
- 39. Results: IMDb reviews, accuracy vs baseline for different values of ε. [Figure: two panels, “Accuracy (at training time)” and “Accuracy (at test time)”; x-axis: epsilon (200 to 1000); y-axis: accuracy (0.0 to 1.0); series: Accuracy, Baseline]
- 40. Results: Enron emails, accuracy vs baseline for different values of ε. [Figure: two panels, “Accuracy (at training time)” and “Accuracy (at test time)”; x-axis: epsilon (200 to 1000); y-axis: accuracy (0.0 to 1.0); series: Accuracy, Baseline]
- 41. Results: InsuranceQA, MAP/MRR scores for different values of ε on the dev set. [Figure: two panels, “Scores for dev at training time” and “Scores for dev at test time”; x-axis: epsilon (200 to 1000); y-axis: score (0.0 to 1.0); series: MAP on dev, MRR on dev, MAP baseline, MRR baseline]
- 42. Privacy Evaluation
  - In the previous experiments, we didn’t explicitly evaluate privacy
  - Problem: ε is an arbitrary number that is hard to interpret
    - This is especially true in metric DP, since ε is on a different scale
    - As we have seen, there are empirical ways to calibrate ε according to statistics of the word embeddings
    - But how do we convince stakeholders that the privacy guarantees are holding, and that there are no bugs?
  - Solution: machine auditors, i.e. machine learning algorithms designed to carry out different types of privacy attacks on the data
- 48. Machine Auditors: probabilistic record linkage auditing attack
  - Objective: link a user in a public dataset to a user in a (leaked) private dataset
  - Attack simulation: simulate public and “leaked” datasets by randomly splitting an initial dataset. The attack takes advantage of rare words and queries issued by users. A vector of word counts can be extracted from user queries and used to perform the linkage
  - Assumptions: the attacker is able to narrow the attack set (using side knowledge)
  - Evaluation: how many accurate links can the attacker reconstruct?
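A much-simplified, deterministic stand-in for the linkage attack described above: build a word-count vector per user and link each public user to the most similar leaked user by cosine similarity. The user names, queries and the choice of cosine similarity are hypothetical; the talk describes a probabilistic record linkage method whose details are not given here.

```python
import math
from collections import Counter

def word_counts(queries):
    """Bag-of-words count vector over all of a user's queries."""
    return Counter(w for q in queries for w in q.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def link_users(public, leaked):
    """Link each public user to the leaked user with the most similar vocabulary."""
    leak_vecs = {u: word_counts(q) for u, q in leaked.items()}
    links = {}
    for user, queries in public.items():
        vec = word_counts(queries)
        links[user] = max(leak_vecs, key=lambda c: cosine(vec, leak_vecs[c]))
    return links

# Hypothetical toy data: the same two people appear in both datasets.
public = {
    "alice": ["play cantopop on lastfm", "weather in cleveland"],
    "bob": ["book a restaurant in milladore", "watch manthan film"],
}
leaked = {
    "u1": ["book restaurant milladore tonight", "manthan film showtimes"],
    "u2": ["cantopop playlist", "will it be colder in cleveland"],
}
links = link_users(public, leaked)
```

Rare words such as place names dominate the similarity, which is the weakness the attack exploits and the reason replacing them with more general terms should reduce the number of correct links.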
- 49. Machine Auditors: membership auditing attack [Shokri et al ’17, Song & Shmatikov ’18]
  - Objective: identify whether an individual’s data (queries) were used in the training set of an ML model
  - Attack simulation: train an ML model on queries from m users. Train “shadow” models using data from a different set of n users. The attack model is a classifier built using the output of the shadow models
  - Assumptions: the attacker is able to narrow the attack set (using side knowledge)
  - Evaluation: can the attacker correctly detect m users inside and outside the model’s dataset?
- 51. Hyperbolic Spaces
  - [Figure: (a) projection of a point in the Lorentz model $\mathbb{H}^n$ to the Poincaré model; (b) WebIsADb is-a relationships in the GloVe vocabulary on the $\mathbb{B}^2$ Poincaré disk]
  - Continuous analog of a tree structure
  - Natural language captures hypernymy and hyponymy → embeddings require fewer dimensions
  - Use models of hyperbolic space: projections into Euclidean space
- 52. Hyperbolic Differential Privacy
  - Distances in the n-dimensional Poincaré ball are given by:

    $$d_{\mathbb{B}^n}(u, v) = \operatorname{arcosh}\left(1 + 2\,\frac{\|u - v\|^2}{(1 - \|u\|^2)(1 - \|v\|^2)}\right)$$

  - Claim: $d_{\mathbb{B}^n}(u, v)$ is a valid metric. Proof (via the Lorentzian model) in the paper
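The Poincaré ball distance above translates directly into code. The sketch below uses plain Python lists as points; the function name and the sample points are illustrative.

```python
import math

def poincare_distance(u, v):
    """Distance between two points strictly inside the unit Poincare ball."""
    def squared_norm(x):
        return sum(c * c for c in x)

    norm_u, norm_v = squared_norm(u), squared_norm(v)
    if norm_u >= 1.0 or norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = [a - b for a, b in zip(u, v)]
    return math.acosh(
        1.0 + 2.0 * squared_norm(diff) / ((1.0 - norm_u) * (1.0 - norm_v))
    )

origin = [0.0, 0.0]
a = [0.5, 0.0]
b = [0.0, 0.9]
d_oa = poincare_distance(origin, a)  # equals ln((1 + 0.5) / (1 - 0.5)) = ln 3
d_ab = poincare_distance(a, b)
d_ob = poincare_distance(origin, b)
```

Points near the boundary are far from everything else in this geometry, which is what lets tree-like hierarchies fit into few dimensions.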
- 53. Hyperbolic Noise
  - Recall that for Euclidean metric DP, we use Laplacian noise to achieve $\varepsilon$-mDP, i.e. $\xi \sim \mathrm{Lap}\left(\frac{1}{n\varepsilon}\right)$
  - We derive the Hyperbolic Laplace distribution:

    $$p(x \mid \mu = 0, \varepsilon) = \frac{1 + \varepsilon}{2\, {}_2F_1(1, \varepsilon; 2 + \varepsilon; -1)} \left(-\frac{2}{x - 1} - 1\right)^{-\varepsilon}$$

    where ${}_2F_1(a, b; c; z)$ is the hypergeometric function
  - For sampling, we developed a Lorentzian Metropolis-Hastings sampler (see paper)
  - [Figure: plot with both axes ranging from −0.4 to 0.4]
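For intuition only, the sketch below draws samples from a one-dimensional version of the density above with a plain random-walk Metropolis-Hastings sampler. It is not the Lorentzian sampler from the paper. It assumes the density is extended symmetrically to negative x by using |x|, and it relies on the identity −2/(x − 1) − 1 = (1 + x)/(1 − x), so the normalising constant is never needed. The step size and burn-in length are arbitrary choices.

```python
import random

def unnormalised_density(x, epsilon):
    """((1 - |x|) / (1 + |x|)) ** epsilon on (-1, 1), zero elsewhere.
    Assumption: symmetric extension of the slide's density to negative x."""
    if abs(x) >= 1.0:
        return 0.0
    return ((1.0 - abs(x)) / (1.0 + abs(x))) ** epsilon

def metropolis_hastings(epsilon, n_samples, step, rng, burn_in=1000):
    """Random-walk Metropolis-Hastings with Gaussian proposals, started at 0."""
    x = 0.0
    samples = []
    for i in range(n_samples + burn_in):
        proposal = x + rng.gauss(0.0, step)
        ratio = unnormalised_density(proposal, epsilon) / unnormalised_density(x, epsilon)
        if rng.random() < ratio:  # proposals outside the ball have ratio 0
            x = proposal
        if i >= burn_in:
            samples.append(x)
    return samples

def mean_abs(xs):
    return sum(abs(s) for s in xs) / len(xs)

samples_loose = metropolis_hastings(1.0, 20_000, 0.5, random.Random(0))
samples_tight = metropolis_hastings(8.0, 20_000, 0.5, random.Random(0))
```

Larger ε concentrates the samples around the origin, mirroring the behaviour of the Euclidean Laplace noise.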
- 56. Hyperbolic Privacy Experiments 1
  - Task: obfuscation vs. Koppel’s authorship attribution algorithm
  - Datasets: PAN@Clef tasks, correct author predictions (lower = better)

  | ε | Pan-11 small | Pan-11 large | Pan-12 set-A | Pan-12 set-C | Pan-12 set-D | Pan-12 set-I |
  |---|---|---|---|---|---|---|
  | 0.5 | 36 | 72 | 4 | 3 | 2 | 5 |
  | 1 | 35 | 73 | 3 | 3 | 2 | 5 |
  | 2 | 40 | 78 | 4 | 3 | 2 | 5 |
  | 8 | 65 | 116 | 4 | 5 | 4 | 5 |
  | ∞ | 147 | 259 | 6 | 6 | 6 | 12 |

  Correct author predictions (lower is better)
- 57. Hyperbolic Privacy Experiments 2
  - Task: expected privacy vs Euclidean baseline
  - Datasets: 100/200/300d GloVe embeddings

  | ε | worst-case Nw | expected Nw: hyp-100 | expected Nw: euc-100 | expected Nw: euc-200 | expected Nw: euc-300 |
  |---|---|---|---|---|---|
  | 0.125 | 134 | 1.25 | 38.54 | 39.66 | 39.88 |
  | 0.5 | 148 | 1.62 | 42.48 | 43.62 | 43.44 |
  | 1 | 172 | 2.07 | 48.80 | 50.26 | 53.82 |
  | 2 | 297 | 3.92 | 92.42 | 93.75 | 90.90 |
  | 8 | 960 | 140.67 | 602.21 | 613.11 | 587.68 |

  Privacy comparisons (lower Nw is better)
- 58. Hyperbolic Utility Experiments
  - 5 classification tasks: sentiment x2, product reviews, opinion polarity, question-type
  - 3 natural language tasks: NL inference, paraphrase detection, semantic textual similarity
  - Baselines: utility results baselined using SentEval against random replacement

  | Task | hyp-100d random | hyp-100d ε = 0.125 | hyp-100d ε = 1 | hyp-100d ε = 8 | original InferSent | original SkipThought | original fastText |
  |---|---|---|---|---|---|---|---|
  | MR | 58.19 | 58.38 | 63.56 | 74.52 | 81.10 | 79.40 | 78.20 |
  | CR | 77.48 | 83.21∗∗ | 83.92∗∗ | 85.19∗∗ | 86.30 | 83.1 | 80.20 |
  | MPQA | 84.27 | 88.53∗ | 88.62∗ | 88.98∗ | 90.20 | 89.30 | 88.00 |
  | SST-5 | 30.81 | 41.76 | 42.40 | 42.53 | 46.30 | − | 45.10 |
  | TREC-6 | 75.20 | 82.40 | 82.40 | 84.20∗ | 88.20 | 88.40 | 83.40 |
  | SICK-E | 79.20 | 81.00∗∗ | 82.38∗∗ | 82.34∗∗ | 86.10 | 79.5 | 78.9 |
  | MRPC | 69.86 | 74.78∗ | 75.07∗ | 75.01∗ | 76.20 | − | 74.40 |
  | STS14 | 0.17/0.16 | 0.44/0.45 | 0.45/0.46∗ | 0.52/0.53∗ | 0.68/0.65 | 0.44/0.45 | 0.65/0.63 |

  Accuracy scores on classification tasks. ∗ indicates results better than 1 baseline, ∗∗ better than 2 baselines.
- 60. [Figure: utility versus privacy]
- 61. Example: Differentially Private SGD
  - Algorithm 1: Differentially Private SGD
    - Input: dataset $z = (z_1, \ldots, z_n)$
    - Hyperparameters: learning rate $\eta$, mini-batch size $m$, number of epochs $T$, noise variance $\sigma^2$, clipping norm $L$
    - Initialize $w \leftarrow 0$
    - for $t \in [T]$ do
      - for $k \in [n/m]$ do
        - Sample $S \subset [n]$ with $|S| = m$ uniformly at random
        - Let $g \leftarrow \frac{1}{m} \sum_{j \in S} \mathrm{clip}_L(\nabla \ell(z_j, w)) + \frac{2L}{m} \mathcal{N}(0, \sigma^2 I)$
        - Update $w \leftarrow w - \eta g$
    - return $w$
  - 5+ hyper-parameters affecting both privacy and utility
  - For deep learning applications we only have empirical utility (not analytic)
  - How do we find the hyperparameters that give us an optimal trade-off?
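  Algorithm 1 can be written out as a short, dependency-free sketch. Everything beyond the update rule itself is an assumption made for illustration: the logistic loss, the toy dataset, the hyperparameter values, and all function names are hypothetical. The sketch also does no privacy accounting, so it does not report the ε that a given σ achieves.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def logistic_grad(example, w):
    """Gradient of the logistic loss for one example (x, y), y in {0, 1}."""
    x, y = example
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]


def dp_sgd(data, dim, lr, batch_size, epochs, sigma, clip_norm, grad_fn, rng):
    """Follow Algorithm 1: clip per-example gradients, average, add noise."""
    w = [0.0] * dim
    for _ in range(epochs):
        for _ in range(len(data) // batch_size):
            batch = rng.sample(data, batch_size)  # uniform, without replacement
            g = [0.0] * dim
            for z in batch:
                for i, gi in enumerate(clip(grad_fn(z, w), clip_norm)):
                    g[i] += gi / batch_size
            noise_scale = 2.0 * clip_norm / batch_size  # the 2L/m factor
            g = [gi + noise_scale * rng.gauss(0.0, sigma) for gi in g]
            w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical toy data: label is 1 when the second feature is positive.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(40)]
data = [((1.0, v), 1 if v > 0 else 0) for v in values]

w_noiseless = dp_sgd(data, 2, 0.5, 8, 5, 0.0, 1.0, logistic_grad, rng)
w_private = dp_sgd(data, 2, 0.5, 8, 5, 1.0, 1.0, logistic_grad, rng)
```

  Even this small version exposes five knobs (`lr`, `batch_size`, `epochs`, `sigma`, `clip_norm`), each of which moves both the privacy loss and the error.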
- 62. The Privacy-Utility Pareto Front
  [Figure: points in hyper-parameter space mapped to the privacy loss vs. error plane, with the Pareto-optimal points highlighted]
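  To make "Pareto-optimal points" concrete, here is a minimal sketch that filters a set of evaluated (privacy loss, error) pairs down to the non-dominated ones, assuming both objectives are to be minimised. The function name and the sample numbers are hypothetical and are not results from the talk.

```python
def pareto_front(points):
    """Return the non-dominated (privacy_loss, error) pairs, sorted by privacy loss.

    A point is dominated if another point is no worse in both objectives
    and strictly better in at least one.
    """
    front = []
    best_error = float("inf")
    # After sorting by privacy loss, a point is on the front exactly when
    # its error is strictly lower than every error seen so far.
    for privacy_loss, error in sorted(set(points)):
        if error < best_error:
            front.append((privacy_loss, error))
            best_error = error
    return front


# Hypothetical evaluations, one per hyper-parameter setting.
evaluated = [(0.5, 0.4), (1, 0.3), (1, 0.35), (2, 0.3), (4, 0.1), (3, 0.5)]
front = pareto_front(evaluated)
```

  Each point on the front is a defensible operating point; which one to pick is an application-specific decision.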
- 71. Bayesian Optimization
  - Gradient-free optimization for black-box functions
  - Widely used in applications (HPO in ML, scheduling & planning, experimental design, ...)
  - In multi-objective problems, BO aims to learn the Pareto front with a minimal number of evaluations
- 81. DPareto
  Repeat until the budget is exhausted:
  1. For each objective (privacy, utility):
     1. Fit a surrogate model (Gaussian process, GP) using the available dataset
     2. Calculate the predictive distribution using the GP mean and variance functions
  2. Use the posterior of the surrogate models to form an acquisition function
  3. Collect the next point at the estimated global maximum of the acquisition function
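  The loop above can be sketched in plain Python for a single hyper-parameter. This is a simplified stand-in rather than DPareto itself: the acquisition function here is the hypervolume gain of an optimistic (mean minus `kappa` standard deviations) prediction, the kernel and its length scale are fixed by hand, and `toy_objectives` is a hypothetical pair of objectives. All names are assumptions.

```python
import math


def rbf(a, b, length=0.3):
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def fit_gp(xs, ys, jitter=1e-4):
    """Fit a GP to centred targets; return predict(x) -> (mean, std)."""
    n, mu = len(xs), sum(ys) / len(ys)
    K = [[rbf(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = [[0.0] * n for _ in range(n)]  # Cholesky factor, K = L L^T
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]

    def forward(b):  # solve L y = b
        y = []
        for i in range(n):
            y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
        return y

    def backward(y):  # solve L^T x = y
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
        return x

    alpha = backward(forward([y - mu for y in ys]))

    def predict(x):
        k = [rbf(xi, x) for xi in xs]
        v = forward(k)
        var = max(1.0 - sum(vi * vi for vi in v), 0.0)
        return mu + sum(ki * ai for ki, ai in zip(k, alpha)), math.sqrt(var)

    return predict


def hypervolume_2d(points, reference):
    """Area dominated by points and bounded by reference (both minimised)."""
    ref_x, ref_y = reference
    inside = sorted(p for p in set(points) if p[0] < ref_x and p[1] < ref_y)
    area, best_y = 0.0, ref_y
    for x, y in inside:
        if y < best_y:
            area += (ref_x - x) * (best_y - y)
            best_y = y
    return area


def bo_pareto(objectives, candidates, initial, budget, reference, kappa=1.0):
    xs = list(initial)
    ys = [objectives(x) for x in xs]
    for _ in range(budget):
        # Step 1: one GP surrogate per objective.
        models = [fit_gp(xs, [y[d] for y in ys]) for d in range(2)]
        current = hypervolume_2d(ys, reference)

        # Step 2: acquisition = hypervolume gain of the optimistic prediction.
        def gain(x):
            optimistic = tuple(m - kappa * s for m, s in (model(x) for model in models))
            return hypervolume_2d(ys + [optimistic], reference) - current

        # Step 3: evaluate the candidate that maximises the acquisition.
        x_next = max((c for c in candidates if c not in xs), key=gain)
        xs.append(x_next)
        ys.append(objectives(x_next))
    return xs, ys


def toy_objectives(x):
    """Hypothetical objectives; x plays the role of a noise level in [0, 1]."""
    return (1.0 - x, 0.1 + 0.8 * x ** 2)  # (privacy loss, error)


candidates = [i / 50 for i in range(51)]
xs, ys = bo_pareto(toy_objectives, candidates, [0.0, 0.5, 1.0], 8, (1.1, 1.0))
```

  The hypervolume of the evaluated points can only grow as points are added, which is why it is a convenient progress measure when comparing against random sampling.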
- 82. DPareto vs Random Sampling
  [Figure, three panels. Hypervolume Evolution: Pareto front hypervolume vs. number of sampled points for MLP1 and MLP2, random sampling (RS) vs. Bayesian optimization (BO). MLP2 Pareto Fronts: classification error vs. ε for Initial, +256 RS, and +256 BO. LogReg+SGD Samples: classification error vs. ε for 1500 RS and 256 BO.]
- 84. Summary: Privacy Enhancing Technologies
  - Privacy
    - Privacy risks can be counter-intuitive and tricky to formalize
    - High-dimensional data and side knowledge make privacy hard
    - Semantic guarantees (e.g. DP) behave better than syntactic ones (e.g. k-anonymization)
  - Differential privacy is a mature privacy enhancing technology
    - Metric DP provides local plausible deniability; accuracy can be good even in cases with an infinite number of outcomes
    - Empirical privacy-utility trade-off evaluation enables application-specific decisions
    - Bayesian optimization provides a computationally efficient method to recover the Pareto front (especially with a large number of hyper-parameters)
- 85. Questions? tdiethe@amazon.com