
Differential privacy without sensitivity [NIPS2016 reading group slides]



Presentation slides for the NIPS2016 reading group.
Event: https://connpass.com/event/47580/
Paper: http://papers.nips.cc/paper/6050-differential-privacy-without-sensitivity


  1. Differential Privacy without Sensitivity. 南 賢太郎 (Kentaro Minami), first-year PhD student, Graduate School of Information Science and Technology, The University of Tokyo. 2017/1/19 @ NIPS2016 reading group.
  2. Overview. Differential privacy (DP): a degree of privacy protection [Dwork+06]. Gibbs posterior: a generalization of the Bayesian posterior. Contribution: we prove (𝜀, 𝛿)-DP of the Gibbs posterior without assuming boundedness of the loss.
  3. Outline: 1. Differential privacy; 2. Differentially private learning (2.1 Background, 2.2 Main result: differential privacy of the Gibbs posterior [Minami+16]); 3. Applications (3.1 Logistic regression, 3.2 Posterior approximation method).
  4. Outline (section divider for Part 1: Differential privacy).
  5. Privacy constraint in ML & statistics. (Diagram: users' data 𝐷 = {𝑋_1, 𝑋_2, … , 𝑋_𝑛} is collected by a curator, who releases a statistic 𝜃.)
  6. Privacy constraint in ML & statistics. In many applications of ML & statistics, the data 𝐷 = {𝑋_1, … , 𝑋_𝑛} contains users' personal information. Problem: calculate a statistic of interest 𝜃 "privately" (to be defined).
  7. Adversarial formulation of privacy. Example: mean of a binary-valued query (Yes: 1, No: 0). (Diagram: the curator releases the mean of 𝑋_1, … , 𝑋_𝑛.)
  8. Adversarial formulation of privacy (cont.). (Diagram: an adversary holds auxiliary information 𝐷′ = {𝑋_1′, 𝑋_2, … , 𝑋_𝑛}, a dataset differing from 𝐷 in a single record.)
  9. Adversarial formulation of privacy (cont.). (Diagram: noise is added to the released mean.)
  10. Adversarial formulation of privacy (cont.). (Diagram: with noise added, the outputs computed from 𝐷 and 𝐷′ are hard to distinguish.)
  11. Adversarial formulation of privacy (cont.). Noise that is small relative to 𝜃 ⟹ adding it need not deteriorate accuracy. Noise that is large relative to a single 𝑋_𝑖 ⟹ privacy is preserved.
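
The slides leave the noise distribution unspecified; a standard concrete choice (not from the slides) is the Laplace mechanism. A minimal Python sketch, with the hypothetical helper private_mean: the mean of n binary values moves by at most 1/n when one record changes, so Laplace noise of scale 1/(n𝜀) gives (𝜀, 0)-DP.

    import numpy as np

    def private_mean(bits, epsilon, rng=None):
        """(epsilon, 0)-DP mean of binary values via the Laplace mechanism.

        Changing one record moves the mean by at most 1/n (the sensitivity),
        so Laplace noise of scale (1/n)/epsilon suffices.
        """
        rng = np.random.default_rng() if rng is None else rng
        n = len(bits)
        sensitivity = 1.0 / n
        return float(np.mean(bits) + rng.laplace(0.0, sensitivity / epsilon))

    # 1000 users answering a yes/no question, epsilon = 0.5
    answers = np.random.default_rng(0).binomial(1, 0.3, size=1000)
    print(private_mean(answers, epsilon=0.5))
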
  12. Differential privacy. Idea 1: generate a random 𝜃 from a data-dependent distribution 𝜌_𝐷.
  13. Differential privacy. Idea 2: two "adjacent" datasets differing in a single individual should be statistically indistinguishable, i.e. 𝜌_𝐷 and 𝜌_𝐷′ should be close in the sense of a "statistical distance".
  14. Definition: differential privacy [Dwork+06]. Let 𝜀 > 0 and 𝛿 ∈ [0, 1) be privacy parameters. 𝜌_𝐷 satisfies (𝜀, 𝛿)-differential privacy if, for any adjacent datasets 𝐷, 𝐷′ and any set 𝐴 ⊂ Θ of outputs, 𝜌_𝐷(𝐴) ≤ e^𝜀 𝜌_𝐷′(𝐴) + 𝛿.
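
For a finite output space this definition can be checked exhaustively, since the maximum over events 𝐴 of 𝜌_𝐷(𝐴) − e^𝜀 𝜌_𝐷′(𝐴) is attained at 𝐴* = {𝜃 : 𝜌_𝐷(𝜃) > e^𝜀 𝜌_𝐷′(𝜃)}. A small illustrative sketch (names hypothetical, not from the slides):

    import numpy as np

    def satisfies_dp(p, q, epsilon, delta):
        """Check the (epsilon, delta)-DP inequality for two finite output
        distributions p = rho_D and q = rho_D'.

        max_A [P(A) - e^eps Q(A)] is attained at A* = {theta: p > e^eps q},
        so checking that single event (in both directions) is enough.
        """
        p, q = np.asarray(p, float), np.asarray(q, float)
        e_eps = np.exp(epsilon)
        for a, b in ((p, q), (q, p)):
            mask = a > e_eps * b
            if a[mask].sum() - e_eps * b[mask].sum() > delta:
                return False
        return True

    # Randomized response that reports the true bit w.p. e/(1+e) is 1-DP.
    keep = np.exp(1.0) / (1.0 + np.exp(1.0))
    print(satisfies_dp([keep, 1 - keep], [1 - keep, keep], 1.01, 0.0))  # True
    print(satisfies_dp([keep, 1 - keep], [1 - keep, keep], 0.90, 0.0))  # False
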
  15. Interpretation of DP. DP prevents identification with statistical significance: e.g. the adversary cannot construct a test of 𝐻_0: 𝑋_𝑖 = 𝑥 vs. 𝐻_1: 𝑋_𝑖 ≠ 𝑥 that has high power 𝛾 at the 5% significance level.
  16. DP and statistical learning. Example: linear classification. Find an (𝜀, 𝛿)-DP distribution over hyperplanes that minimizes the expected classification error.
  17. Differentially private learning. Question: what kind of random estimators should we use? 1. Noise addition to a deterministic estimator (e.g. maximum likelihood estimator + noise). 2. Modification of the Bayesian posterior (this work).
  18. Outline (section divider for Part 2: Differentially private learning).
  19. Gibbs posterior. Start from the Bayesian posterior p(𝜃 ∣ 𝐷) ∝ 𝜋(𝜃) ∏_{i=1}^n p(𝑥_i ∣ 𝜃) and introduce a "scale parameter" 𝛽 > 0 on the likelihood part.
  20. Gibbs posterior. G_𝛽(𝜃 ∣ 𝐷) ∝ exp(−𝛽 Σ_{i=1}^n ℓ(𝜃, 𝑥_i)) 𝜋(𝜃), built from a loss function ℓ(𝜃, 𝑥), a prior distribution 𝜋, and an inverse temperature 𝛽 > 0. A natural data-dependent distribution in statistics & ML: it contains the Bayesian posterior (take ℓ(𝜃, 𝑥) = −log p(𝑥 ∣ 𝜃) and 𝛽 = 1) and is important in PAC-Bayes theory [Catoni07][Zhang06].
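
A minimal numerical sketch of this definition (function names hypothetical): the Gibbs posterior on a 1-D grid. With ℓ(𝜃, 𝑥) = −log p(𝑥 ∣ 𝜃) and 𝛽 = 1 it reduces to the Bayesian posterior, matching the slide.

    import numpy as np

    def gibbs_posterior(theta_grid, data, loss, log_prior, beta):
        """Grid approximation of G_beta(theta | D): density proportional to
        exp(-beta * sum_i loss(theta, x_i)) * pi(theta)."""
        log_dens = np.array([log_prior(t) - beta * sum(loss(t, x) for x in data)
                             for t in theta_grid])
        log_dens -= log_dens.max()      # stabilize before exponentiating
        dens = np.exp(log_dens)
        return dens / dens.sum()        # normalize to sum to 1 on the grid

    # Gaussian-mean example: loss = -log N(x | theta, 1) + const, prior N(0, 1).
    grid = np.linspace(-3, 3, 601)
    data = [0.5, 1.2, 0.9]
    nll = lambda t, x: 0.5 * (x - t) ** 2
    post = gibbs_posterior(grid, data, nll, lambda t: -0.5 * t ** 2, beta=1.0)
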
  21. Gibbs posterior. (Figure: the Gibbs posterior G_𝛽(𝜃 ∣ 𝐷) flattens as 𝛽 → 0.)
  22. Gibbs posterior. Problem: as 𝛽 ↓ 0, G_𝛽(𝜃 ∣ 𝐷) is flattened and gets close to the prior. Is DP satisfied if we choose 𝛽 > 0 sufficiently small?
  23. Gibbs posterior. Answer: yes, if ℓ is bounded (previously known), or if ∇ℓ is bounded (this work).
  24. The exponential mechanism. Theorem [MT07]: an algorithm that draws 𝜃 from a distribution with density proportional to exp(−𝛽 ℒ(𝜃, 𝐷)) 𝜋(𝜃), with 𝛽 small enough (next slide), satisfies (𝜀, 0)-DP.
  25. The exponential mechanism. This is the Gibbs posterior if ℒ(𝜃, 𝐷) = Σ_{i=1}^n ℓ(𝜃, 𝑥_i). 𝛽 has to satisfy 𝛽 ≤ 𝜀 / (2Δℒ), where Δℒ is the sensitivity (defined next).
  26. Sensitivity. Definition: the sensitivity of ℒ: Θ × 𝒳^𝑛 → ℝ is Δℒ := sup ‖ℒ(⋅, 𝐷) − ℒ(⋅, 𝐷′)‖_∞, the 𝐿∞-norm over 𝜃, with the supremum taken over adjacent datasets 𝐷, 𝐷′. The exponential mechanism works if Δℒ < ∞!
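
A sketch of the mechanism over a finite candidate set, assuming a uniform prior for simplicity (names hypothetical; the slides' version also allows a general prior 𝜋):

    import numpy as np

    def exponential_mechanism(candidates, data, total_loss, sensitivity,
                              epsilon, rng=None):
        """Draw theta with probability proportional to
        exp(-beta * L(theta, D)), beta = epsilon / (2 * Delta_L),
        which is (epsilon, 0)-DP when Delta_L is finite [MT07]."""
        rng = np.random.default_rng() if rng is None else rng
        beta = epsilon / (2.0 * sensitivity)
        scores = np.array([-beta * total_loss(t, data) for t in candidates])
        scores -= scores.max()          # avoid overflow in exp
        probs = np.exp(scores)
        probs /= probs.sum()
        return candidates[rng.choice(len(candidates), p=probs)]
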
  27. Sensitivity. Theorem [Wang+15]: (A) |ℓ(𝜃, 𝑥)| ≤ 𝐴 ⟹ Δℒ ≤ 2𝐴; (B) |ℓ(𝜃, 𝑥) − ℓ(𝜃, 𝑥′)| ≤ 𝐴 ⟹ Δℒ ≤ 𝐴.
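
These bounds can be sanity-checked by brute force over a parameter grid and a few adjacent dataset pairs; the sketch below (names hypothetical) only lower-bounds the true supremum Δℒ.

    import numpy as np

    def empirical_sensitivity(loss, theta_grid, adjacent_pairs):
        """Lower-bound Delta_L = sup_{D ~ D'} sup_theta |L(theta, D) - L(theta, D')|
        by maximizing over finitely many thetas and adjacent dataset pairs."""
        total = lambda t, D: sum(loss(t, x) for x in D)
        return max(abs(total(t, D) - total(t, Dp))
                   for D, Dp in adjacent_pairs
                   for t in theta_grid)

    # Bounded loss with |l| <= A = 1, so Delta_L <= 2A by (A) above.
    bounded = lambda t, x: np.clip((t - x) ** 2, 0.0, 1.0)
    pairs = [([0.0, 2.0], [0.0, -2.0])]   # datasets differing in one record
    print(empirical_sensitivity(bounded, np.linspace(-3, 3, 61), pairs))
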
  28. A loss function that does not satisfy (𝜀, 0)-DP. Logistic loss: ℓ(𝜃, (𝑧, 𝑦)) = log(1 + exp(−𝑦⟨𝜃, 𝑧⟩)). The maximum difference of losses (≈ 𝑀) grows toward +∞ as Diam Θ → ∞. (Figure: ℓ(𝜃, (𝑧, +1)) and ℓ(𝜃, (𝑧, −1)) diverge from each other as 𝜃 grows.)
  29. A loss function that does not satisfy (𝜀, 0)-DP (cont.). Since the sensitivity is unbounded on an unbounded parameter space, we need differential privacy without sensitivity!
  30. From bounded to Lipschitz. For the logistic loss, the first derivative is bounded even though the loss is not: the Lipschitz constant 𝐿 is not influenced by the size of the parameter space Diam Θ.
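
A quick numerical check of this point (a sketch, not from the slides): the gradient norm of the logistic loss never exceeds ‖𝑧‖_2, no matter how large 𝜃 becomes.

    import numpy as np

    def logistic_grad(theta, z, y):
        """Gradient of log(1 + exp(-y <theta, z>)) w.r.t. theta:
        -y * sigmoid(-y <theta, z>) * z, so its norm is at most ||z||_2."""
        s = np.exp(-np.logaddexp(0.0, y * np.dot(theta, z)))  # stable sigmoid
        return -y * s * z

    rng = np.random.default_rng(0)
    z = rng.normal(size=5)
    z /= np.linalg.norm(z)              # ||z||_2 = 1
    for scale in (1.0, 1e3, 1e6):       # ever larger parameter values
        theta = scale * rng.normal(size=5)
        assert np.linalg.norm(logistic_grad(theta, z, +1)) <= 1.0 + 1e-9
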
  31. Main theorem. Theorem [Minami+16]. Assumptions: 1. for all 𝑥 ∈ 𝒳, ℓ(⋅, 𝑥) is 𝐿-Lipschitz and convex; 2. the prior is log-strongly-concave, i.e. −log 𝜋(⋅) is 𝑚_𝜋-strongly convex; 3. Θ = ℝ^𝑑. Then the Gibbs posterior G_{𝛽,𝐷} satisfies (𝜀, 𝛿)-DP if 𝛽 > 0 is chosen as in Eq. (1) of the paper, a bound that is independent of the sensitivity!
  32. Outline (section divider for Part 3: Applications).
  33. Example: logistic loss. ℓ(𝜃, (𝑧, 𝑦)) = log(1 + exp(−𝑦(⟨𝑎, 𝑧⟩ + 𝑏))), where 𝒵 = {𝑧 ∈ ℝ^𝑑 : ‖𝑧‖_2 ≤ 𝑅}, 𝒳 = {(𝑧, 𝑦) : 𝑧 ∈ 𝒵, 𝑦 ∈ {−1, +1}}, and 𝜃 = (𝑎, 𝑏).
  34. Example: logistic loss (cont.). Take a Gaussian prior 𝜋(𝜃) = 𝑁(𝜃 ∣ 0, (𝑛𝜆)^{−1} 𝐼). The Gibbs posterior is then G_𝛽(𝜃 ∣ 𝐷) ∝ exp(−𝛽 Σ_{i=1}^n ℓ(𝜃, 𝑥_i) − (𝑛𝜆/2)‖𝜃‖_2²), and G_𝛽 satisfies (𝜀, 𝛿)-DP when 𝛽 is set via the main theorem, here with Lipschitz constant 𝐿 ≤ √(𝑅² + 1) and strong-convexity parameter 𝑚_𝜋 = 𝑛𝜆.
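
A sketch of the resulting unnormalized log-density (names hypothetical), folding the intercept 𝑏 into 𝜃 by appending a constant-1 feature to each 𝑧:

    import numpy as np

    def log_gibbs_logistic(theta, Z, y, beta, lam):
        """Unnormalized log G_beta(theta | D) with logistic loss and
        Gaussian prior N(0, (n*lam)^{-1} I):
            -beta * sum_i log(1 + exp(-y_i <theta, z_i>)) - (n*lam/2)||theta||^2
        """
        n = len(y)
        margins = y * (Z @ theta)                 # y_i <theta, z_i>
        loss = np.logaddexp(0.0, -margins).sum()  # stable log(1 + e^{-m})
        return -beta * loss - 0.5 * n * lam * (theta @ theta)

    # Toy data with an appended 1-column, so theta = (a, b).
    rng = np.random.default_rng(0)
    Z = np.hstack([rng.normal(size=(20, 2)), np.ones((20, 1))])
    y = rng.choice([-1.0, 1.0], size=20)
    print(log_gibbs_logistic(np.zeros(3), Z, y, beta=0.5, lam=0.1))
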
  35. Langevin Monte Carlo method. In practice, sampling from the Gibbs posterior can be computationally hard, so approximate sampling methods are used (e.g. MCMC, variational Bayes).
  36. Langevin Monte Carlo method. Langevin Monte Carlo (LMC) perturbs each gradient-descent step with Gaussian noise: GD: 𝜃_{t+1} = 𝜃_t − h ∇U(𝜃_t); LMC: 𝜃_{t+1} = 𝜃_t − h ∇U(𝜃_t) + √(2h) 𝜉_t with 𝜉_t ∼ 𝑁(0, 𝐼), targeting a density proportional to exp(−U(𝜃)).
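
A minimal sketch of this update rule, the unadjusted Langevin algorithm (names hypothetical). For the Gibbs posterior one takes U(𝜃) = 𝛽 Σ_i ℓ(𝜃, 𝑥_i) − log 𝜋(𝜃); dropping the noise term recovers plain gradient descent.

    import numpy as np

    def lmc(grad_U, theta0, step, n_iters, rng=None):
        """Unadjusted Langevin Monte Carlo targeting a density prop. to exp(-U):
            theta_{t+1} = theta_t - step * grad_U(theta_t) + sqrt(2*step) * xi_t,
        with xi_t ~ N(0, I). Without the noise this is gradient descent on U."""
        rng = np.random.default_rng() if rng is None else rng
        theta = np.array(theta0, dtype=float)
        for _ in range(n_iters):
            theta = (theta - step * grad_U(theta)
                     + np.sqrt(2.0 * step) * rng.normal(size=theta.shape))
        return theta

    # Target N(0, 1): U(t) = t^2 / 2, so grad_U(t) = t.
    samples = [lmc(lambda t: t, 0.0, step=0.01, n_iters=500) for _ in range(200)]
    print(np.mean(samples), np.var(samples))   # roughly 0 and 1
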
  37. Langevin Monte Carlo method. "Mixing-time" results have been derived for log-concave distributions [Dalalyan14][Durmus & Moulines15]: LMC attains a 𝛾-approximation after finitely many iterations 𝑇, polynomial in 𝑛 and 𝛾^{−1}: 𝑇 ∼ 𝑂((𝑛/𝛾²) log(𝑛/𝛾²)).
  38. • I have a Privacy Preservation guarantee • I have an Approximate Posterior • (Ah…)
  39. Privacy Preserving Approximate Posterior (PPAP). We can prove (𝜀, 𝛿′)-DP for the LMC approximation of the Gibbs posterior. Proposition [Minami+16]: assume ℓ and 𝜋 satisfy the assumptions of the main theorem, and additionally that ℓ(⋅, 𝑥) is 𝑀-smooth for every 𝑥 ∈ 𝒳. Then after 𝑂((𝑛/𝛾²) log(𝑛/𝛾²)) iterations, the output of LMC satisfies (𝜀, 𝛿 + (e^𝜀 + 1)𝛾)-DP.
  40. Summary. 1. Differentially private learning = differential privacy + statistical learning. 2. We developed a new method to prove (𝜀, 𝛿)-DP for Gibbs posteriors without "sensitivity": it applies to Lipschitz, convex losses, and (+) it yields a guarantee for an approximate sampling method. Thank you!
