# Differential privacy without sensitivity [NIPS2016読み会 slides]


Presentation slides for the NIPS2016読み会 (NIPS 2016 paper-reading event).
Event: https://connpass.com/event/47580/


### Slides

1. Differential Privacy without Sensitivity. Kentaro Minami (南 賢太郎), first-year Ph.D. student (D1), Graduate School of Information Science and Technology, The University of Tokyo. 2017/1/19 @ NIPS2016読み会
2. Overview. Differential privacy (DP): a notion of the degree of privacy protection [Dwork+06]. Gibbs posterior: a generalization of the Bayesian posterior. Contribution: we prove $(\varepsilon, \delta)$-DP of the Gibbs posterior without assuming boundedness of the loss.
3. Outline: 1. Differential privacy; 2. Differentially private learning (background; main result: differential privacy of the Gibbs posterior [Minami+16]); 3. Applications (logistic regression; a posterior approximation method).
4. Outline (repeated; entering Part 1: Differential privacy).
5. Privacy constraint in ML & statistics. [Figure: users' data $D = \{X_1, X_2, \dots, X_n\}$ is collected by a curator, who releases a statistic $\theta$.]
6. Privacy constraint in ML & statistics. In many applications of ML & statistics, the data $D = \{X_1, \dots, X_n\}$ contains users' personal information. Problem: compute a statistic of interest $\theta$ privately (formalized below).
7. Adversarial formulation of privacy. Example: mean of a binary-valued query (Yes: 1, No: 0). [Figure: each user holds a bit $X_i$; the curator releases the mean.]
8. Adversarial formulation of privacy (cont.). [Figure: as auxiliary information, the adversary holds a dataset $D'$ that differs from $D$ only in the first record, $X_1'$ in place of $X_1$.]
9. Adversarial formulation of privacy (cont.). [Figure: the curator adds noise to the released mean.]
10. Adversarial formulation of privacy (cont.). [Figure: noisy release, with the adversary again holding the neighboring dataset $D'$.]
11. Adversarial formulation of privacy (cont.). If the noise is small relative to the statistic $\theta$, adding it need not hurt accuracy; if it is large relative to each individual's contribution $X_i$, privacy is preserved.
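
A minimal sketch (my addition, not from the slides) of this noisy release using the standard Laplace mechanism: the mean of $n$ bits changes by at most $1/n$ when one user's bit changes, so Laplace noise of scale $1/(n\varepsilon)$ suffices for $\varepsilon$-DP. All names here are illustrative.

```python
import numpy as np

def private_mean(bits, epsilon, rng):
    """Release the mean of binary values with Laplace noise.

    Changing one user's bit changes the mean by at most 1/n, so
    Laplace noise with scale 1/(n * epsilon) yields epsilon-DP.
    """
    n = len(bits)
    noise = rng.laplace(loc=0.0, scale=1.0 / (n * epsilon))
    return float(np.mean(bits) + noise)

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=1000)          # 1000 users' yes/no answers
print(private_mean(bits, epsilon=0.5, rng=rng))
```
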
12. Differential privacy. Idea 1: generate a random $\theta$ from a data-dependent distribution $\rho_D$. [Figure: data $X_1, \dots, X_n$ feeding the distribution $\rho_D$.]
13. Differential privacy. Idea 2: two "adjacent" datasets differing in a single individual should be statistically indistinguishable, i.e., $\rho_D$ and $\rho_{D'}$ should be close in the sense of a "statistical distance". [Figure: $D$ and $D'$ differ only in the first record.]
14. Differential privacy. Definition (differential privacy [Dwork+06]): let $\varepsilon > 0$ and $\delta \in [0, 1)$ be privacy parameters. $\rho_D$ satisfies $(\varepsilon, \delta)$-differential privacy if, for any adjacent datasets $D, D'$ and any set $A \subset \Theta$ of outputs, $\rho_D(A) \le e^{\varepsilon} \rho_{D'}(A) + \delta$.
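
To make the definition concrete, here is a small numerical check (again my addition) that the Laplace mechanism sketched above satisfies the $(\varepsilon, 0)$-DP inequality: the likelihood ratio of any output under two adjacent datasets never exceeds $e^\varepsilon$.

```python
import numpy as np

# Densities of the noisy mean under two adjacent datasets that
# differ in one bit, so the true means differ by exactly 1/n.
n, epsilon = 1000, 0.5
scale = 1.0 / (n * epsilon)          # Laplace scale from the sketch above
mean_D, mean_Dp = 0.500, 0.501       # adjacent datasets: one bit flipped

def laplace_pdf(x, loc, scale):
    return np.exp(-np.abs(x - loc) / scale) / (2.0 * scale)

outputs = np.linspace(0.4, 0.6, 2001)
ratio = laplace_pdf(outputs, mean_D, scale) / laplace_pdf(outputs, mean_Dp, scale)
# For (epsilon, 0)-DP the likelihood ratio never exceeds e^epsilon.
print(ratio.max() <= np.exp(epsilon) + 1e-9)  # True
```
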
15. Interpretation of DP. DP prevents identification with statistical significance: e.g., the adversary cannot construct a test of power $\gamma$ for $H_0: X_i = x$ vs. $H_1: X_i \ne x$ at the 5% significance level.
16. DP and statistical learning. Example: linear classification. Find an $(\varepsilon, \delta)$-DP distribution over hyperplanes that minimizes the expected classification error.
17. Differentially private learning. Question: what kind of random estimators should we use? 1. Noise addition to a deterministic estimator (e.g., maximum likelihood estimator + noise). 2. Modification of the Bayesian posterior (this work).
18. Outline (repeated; entering Part 2: Differentially private learning, main result).
19. Gibbs posterior. Start from the Bayesian posterior $p(\theta \mid D) \propto \pi(\theta) \prod_{i=1}^{n} p(x_i \mid \theta)$ and introduce a "scale parameter" $\beta > 0$.
20. Gibbs posterior: $G_\beta(\theta \mid D) \propto \exp\bigl(-\beta \sum_{i=1}^{n} \ell(\theta, x_i)\bigr)\, \pi(\theta)$, a natural data-dependent distribution in statistics & ML, built from a loss function $\ell(\theta, x)$, a prior distribution $\pi$, and an inverse temperature $\beta > 0$. It contains the Bayesian posterior ($\ell(\theta, x) = -\log p(x \mid \theta)$, $\beta = 1$) and is important in PAC-Bayes theory [Catoni07][Zhang06].
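
A minimal sketch (illustrative, not from the paper) that evaluates this Gibbs posterior on a 1-D grid; with the negative log-likelihood as the loss and $\beta = 1$ it reproduces the usual Bayesian posterior, and a small $\beta$ shows the flattening discussed on the next slides.

```python
import numpy as np

def gibbs_posterior(theta_grid, data, loss, log_prior, beta):
    """Gibbs posterior on a grid: proportional to
    exp(-beta * sum_i loss(theta, x_i)) * pi(theta)."""
    total_loss = np.array([sum(loss(t, x) for x in data) for t in theta_grid])
    log_density = -beta * total_loss + log_prior(theta_grid)
    density = np.exp(log_density - log_density.max())     # avoid overflow
    dx = theta_grid[1] - theta_grid[0]
    return density / (density.sum() * dx)                 # normalize

# With the negative log-likelihood of N(theta, 1) as the loss and beta = 1,
# this is exactly the Bayesian posterior under a N(0, 1) prior.
loss = lambda t, x: 0.5 * (x - t) ** 2
log_prior = lambda t: -0.5 * t ** 2                       # up to a constant
grid = np.linspace(-3.0, 3.0, 601)
data = [0.8, 1.1, 0.9]
bayes = gibbs_posterior(grid, data, loss, log_prior, beta=1.0)
flat = gibbs_posterior(grid, data, loss, log_prior, beta=0.01)  # near the prior
```
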
21. Gibbs posterior. [Figure: as $\beta \to 0$, the Gibbs posterior density flattens.]
22. Gibbs posterior. Problem: as $\beta \downarrow 0$, $G_\beta(\theta \mid D)$ flattens and gets close to the prior. Is DP satisfied if we choose $\beta > 0$ sufficiently small?
23. Gibbs posterior. Answer: yes, if $\ell$ is bounded (previously known), or if $\nabla \ell$ is bounded (this work).
24. The exponential mechanism. Theorem [MT07]: an algorithm that draws $\theta$ from a distribution proportional to $\exp\bigl(-\tfrac{\varepsilon}{2\Delta\mathcal{L}}\, \mathcal{L}(\theta, D)\bigr)$ satisfies $(\varepsilon, 0)$-DP.
25. The exponential mechanism (cont.). This is the Gibbs posterior if $\mathcal{L}(\theta, D) = \sum_{i=1}^{n} \ell(\theta, x_i)$; $\beta$ has to satisfy $\beta \le \varepsilon / (2\Delta\mathcal{L})$, where $\Delta\mathcal{L}$ is the sensitivity (defined next).
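
A minimal sketch of the McSherry-Talwar exponential mechanism over a finite candidate set; the data, the candidate grid, and the clipped loss are my own illustrative choices, picked so that the sensitivity bound is easy to see (each per-record loss lies in $[0, 1]$, so $\Delta\mathcal{L} \le 1$).

```python
import numpy as np

def exponential_mechanism(candidates, data, loss_fn, epsilon, sensitivity, rng):
    """Draw theta with probability proportional to
    exp(-epsilon * L(theta, D) / (2 * sensitivity)).
    Satisfies (epsilon, 0)-DP when `sensitivity` upper-bounds Delta L."""
    scores = np.array([sum(loss_fn(t, x) for x in data) for t in candidates])
    logits = -epsilon * scores / (2.0 * sensitivity)
    probs = np.exp(logits - logits.max())     # stabilize before normalizing
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

rng = np.random.default_rng(1)
data = rng.normal(1.0, 0.5, size=100)
candidates = np.linspace(-2.0, 2.0, 81)
loss = lambda t, x: min((t - x) ** 2, 1.0)    # clipped loss, bounded by 1
theta = exponential_mechanism(candidates, data, loss, epsilon=1.0,
                              sensitivity=1.0, rng=rng)
```
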
26. Sensitivity. Definition: the sensitivity of $\mathcal{L}: \Theta \times \mathcal{X}^n \to \mathbb{R}$ is $\Delta\mathcal{L} := \sup_{D \sim D'} \sup_{\theta \in \Theta} \lvert \mathcal{L}(\theta, D) - \mathcal{L}(\theta, D') \rvert$ (an $L^\infty$-norm in $\theta$; the outer supremum is taken over adjacent datasets). The exponential mechanism works if $\Delta\mathcal{L} < \infty$!
27. Sensitivity. Theorem [Wang+15]: (A) $\ell(\theta, x) \le A \implies \Delta\mathcal{L} \le 2A$; (B) $\lvert \ell(\theta, x) - \ell(\theta, x') \rvert \le A \implies \Delta\mathcal{L} \le A$. [Figure: the two boundedness conditions on the loss.]
28. A loss function that does not satisfy $(\varepsilon, 0)$-DP. Logistic loss: $\ell(\theta, (z, y)) = \log\bigl(1 + \exp(-y \langle \theta, z \rangle)\bigr)$. The maximum difference of the losses ($\approx M$) grows toward $+\infty$ as $\mathrm{Diam}\,\Theta \to \infty$. [Figure: $\ell(\theta, (z, +1))$ and $\ell(\theta, (z, -1))$ diverge from each other as $\theta$ grows.]
29. A loss function that does not satisfy $(\varepsilon, 0)$-DP (cont.). We need differential privacy without sensitivity!
30. From bounded to Lipschitz. For the logistic loss, the first derivative is bounded, and the Lipschitz constant $L$ is not influenced by the size $\mathrm{Diam}\,\Theta$ of the parameter space.
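
A quick numerical illustration of this point (my addition): the gradient of the logistic loss has norm at most $\lVert z \rVert_2$ no matter how large $\theta$ is, so the loss is $R$-Lipschitz even on an unbounded parameter space.

```python
import numpy as np

def logistic_grad(theta, z, y):
    """Gradient of log(1 + exp(-y <theta, z>)) with respect to theta."""
    # sigmoid(-y <theta, z>), written via tanh to avoid overflow
    s = 0.5 * (1.0 - np.tanh(0.5 * y * (theta @ z)))
    return -y * s * z

rng = np.random.default_rng(2)
R = 1.0
z = rng.normal(size=5)
z *= R / np.linalg.norm(z)                    # ||z||_2 = R
# The gradient norm stays <= R however large theta gets,
# so the Lipschitz constant does not depend on Diam(Theta).
for scale in [1.0, 10.0, 1000.0]:
    theta = scale * rng.normal(size=5)
    print(np.linalg.norm(logistic_grad(theta, z, y=1)))   # all <= R
```
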
31. Main theorem. Theorem [Minami+16]: assume (i) for all $x \in \mathcal{X}$, $\ell(\cdot, x)$ is $L$-Lipschitz and convex; (ii) the prior is log-strongly-concave, i.e., $-\log \pi(\cdot)$ is $m_\pi$-strongly convex; (iii) $\Theta = \mathbb{R}^d$. Then the Gibbs posterior $G_{\beta, D}$ satisfies $(\varepsilon, \delta)$-DP if $\beta > 0$ is chosen as in Eq. (1) of the paper, a threshold depending on $\varepsilon$, $\delta$, $L$, and $m_\pi$, and independent of the sensitivity!
32. Outline (repeated; entering Part 3: Applications).
33. Example: logistic loss. $\ell(\theta, (z, y)) = \log\bigl(1 + \exp(-y(\langle a, z \rangle + b))\bigr)$, with $\mathcal{Z} = \{z \in \mathbb{R}^d : \lVert z \rVert_2 \le R\}$, $\mathcal{X} = \{(z, y) : z \in \mathcal{Z},\ y \in \{-1, +1\}\}$, and $\theta = (a, b)$.
34. Example: logistic loss (cont.). Take the Gaussian prior $\pi(\theta) = N(\theta \mid 0, (n\lambda)^{-1} I)$, so the Gibbs posterior is $G_\beta(\theta \mid D) \propto \exp\bigl(-\beta \sum_{i=1}^{n} \ell(\theta, (z_i, y_i)) - \tfrac{n\lambda}{2}\lVert\theta\rVert_2^2\bigr)$. $G_\beta$ satisfies $(\varepsilon, \delta)$-DP if $\beta$ is below the threshold of the main theorem (here $-\log \pi$ is $m_\pi = n\lambda$-strongly convex).
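
A sketch of the resulting log density and its gradient (my own code, assuming each feature vector already includes a constant 1 entry so that the bias $b$ is absorbed into $\theta$); the gradient is what an approximate sampler such as LMC, below, needs.

```python
import numpy as np

def gibbs_log_density(theta, Z, y, beta, lam):
    """Log of the (unnormalized) Gibbs posterior for the logistic loss
    with Gaussian prior N(0, (n*lam)^{-1} I):
    -beta * sum_i log(1 + exp(-y_i <theta, z_i>)) - (n*lam/2) ||theta||^2."""
    n = len(y)
    margins = y * (Z @ theta)
    return (-beta * np.logaddexp(0.0, -margins).sum()
            - 0.5 * n * lam * theta @ theta)

def gibbs_grad(theta, Z, y, beta, lam):
    """Gradient of gibbs_log_density with respect to theta."""
    n = len(y)
    s = 0.5 * (1.0 - np.tanh(0.5 * y * (Z @ theta)))   # sigmoid(-margin)
    return beta * (Z * (y * s)[:, None]).sum(axis=0) - n * lam * theta
```
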
35. Langevin Monte Carlo method. In practice, sampling from the Gibbs posterior can be computationally hard, so approximate sampling methods are used (e.g., MCMC, variational Bayes).
36. Langevin Monte Carlo (LMC). [Figure: gradient descent vs. LMC.] LMC is a gradient-descent update on the potential $U = -\log(\text{target density})$ plus injected Gaussian noise: $\theta_{t+1} = \theta_t - \eta \nabla U(\theta_t) + \sqrt{2\eta}\, \xi_t$ with $\xi_t \sim N(0, I_d)$; GD keeps only the first two terms.
37. Langevin Monte Carlo method. "Mixing-time" results have been derived for log-concave distributions [Dalalyan14][Durmus & Moulines15]: LMC attains a $\gamma$-approximation after finitely many iterations $T$, polynomial in $n$ and $\gamma^{-1}$: $T \sim O\bigl(\tfrac{n}{\gamma^2} \log \tfrac{n}{\gamma^2}\bigr)$.
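
A minimal sketch of the (unadjusted) LMC iteration described above: gradient ascent on the log density plus Gaussian noise of scale $\sqrt{2\eta}$ at every step. The standard-normal target is just a sanity check; to target the Gibbs posterior one would pass `gibbs_grad` from the previous sketch instead.

```python
import numpy as np

def lmc_sample(grad_log_density, theta0, step, n_iters, rng):
    """Unadjusted Langevin Monte Carlo:
    theta <- theta + step * grad log p(theta) + sqrt(2*step) * N(0, I)."""
    theta = np.array(theta0, dtype=float)
    samples = []
    for _ in range(n_iters):
        noise = rng.normal(size=theta.shape)
        theta = theta + step * grad_log_density(theta) + np.sqrt(2.0 * step) * noise
        samples.append(theta.copy())
    return np.array(samples)

# Sanity check on a standard normal target (log density -||theta||^2 / 2).
rng = np.random.default_rng(3)
samples = lmc_sample(lambda th: -th, np.zeros(2), step=1e-2,
                     n_iters=5000, rng=rng)
print(samples[1000:].std(axis=0))   # approx 1.0 in each coordinate
```
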
38. "I have a Privacy Preservation guarantee. I have an Approximate Posterior. (Ah…)"
39. Privacy Preserving Approximate Posterior (PPAP). We can prove $(\varepsilon, \delta')$-DP of the LMC approximation to the Gibbs posterior. Proposition [Minami+16]: assume $\ell$ and $\pi$ satisfy the assumptions of the main theorem, and additionally that $\ell(\cdot, x)$ is $M$-smooth for every $x \in \mathcal{X}$. Then after $O\bigl(\tfrac{n}{\gamma^2} \log \tfrac{n}{\gamma^2}\bigr)$ iterations, the output of LMC satisfies $(\varepsilon, \delta + (e^{\varepsilon} + 1)\gamma)$-DP.
40. Summary. 1. Differentially private learning = differential privacy + statistical learning. 2. We developed a new method to prove $(\varepsilon, \delta)$-DP for Gibbs posteriors without the "sensitivity": it applies to Lipschitz, convex losses, and it also yields a guarantee for an approximate sampling method. Thank you!