20130219 nofreelunch arai


  1. 1. PPDM Study Group, 3rd meeting: “No Free Lunch in Data Privacy” (Daniel Kifer, Ashwin Machanavajjhala, SIGMOD 2011). RIKEN, Hiromi Arai
  2. 2. Contributions of this paper 1.  Simplify the impossibility result* 2.  Present a privacy definition that relies on assumptions about the data-generating mechanism, and compare it with DP 3.  Propose a guideline for determining whether DP is suitable for a given application 4.  Demonstrate cases in which DP does not meet the guideline: 1.  when applied to arbitrary SNS data 2.  when applied to tabular data and the attacker has aggregate-level background knowledge 5.  Propose a modification of DP for tabular data with aggregate-level background knowledge *Briefly: “answering many queries w/ bounded noise does not preserve privacy”
  3. 3. Outline }  Brief review of differential privacy (DP) }  Analysis of the attacker }  Define the discriminant and the non-privacy game, state the no free lunch theorem }  Show the relation between DP and the no free lunch theorem }  Privacy risks of DP algorithms for various attackers }  Unsuitable cases: naïve application to correlated data }  New DP definition subject to background knowledge }  Introducing restrictions that represent previously released exact query answers
  4. 4. Differential privacy: problem setting }  Query-answering mechanism: a DB holds a collection of private data, e.g. (customer, age, Item A, Item B, …): (A, 25, 0, 1, …), (B, 42, 1, 1, …), … }  The DB wants to preserve the privacy of individuals }  It receives a query and returns an answer containing statistical information etc., e.g. customer counts by age group and type (Type X, Type Y, Type Z): 20s: 324, 1, 52; 30s: 34, 34, 13; …
  5. 5. Differential privacy: motivation }  A privacy guarantee that limits the risk incurred by JOINING encourages participation in the dataset. }  Minimize the increased risk to an individual incurred by joining (or leaving) the database (NOT by comparing an adversary's prior and posterior views): the outputs w/ Yuko and w/o Yuko should be not so different. Recall Dalenius’s problem: }  if the statistical database teaches us anything at all, then it should change our beliefs about individuals }  the things that statistical databases are designed to teach can, sometimes indirectly, cause damage to an individual, even if this individual is not in the database.
  6. 6. Differential privacy: definition }  Randomized query-response algorithm K [Dwork06]: for neighboring DBs D and D’ and any set S ⊆ Range(K), P(K(D) ∈ S) and P(K(D’) ∈ S) are almost the same }  The definition of the neighboring DBs is very important in this paper
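For reference, the standard ε-differential privacy condition of [Dwork06], which the "almost same probability" picture refers to, can be written as follows. The symbol ε (the privacy parameter) is not named on this slide and is introduced here as an assumption of notation.

```latex
\Pr[K(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[K(D') \in S]
\qquad \text{for all neighboring } D, D' \text{ and all } S \subseteq \mathrm{Range}(K)
```

Because the neighbor relation is symmetric, the same bound holds with D and D’ swapped, so the two probabilities differ by at most a factor of e^ε in either direction.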
  7. 7. Differential privacy: mechanism •  Density function of the Laplace distribution: p(z) ∝ exp(−ε|z|) •  For any z, z’ such that |z – z’| ≦ 1, the density at z is at most e^ε times the density at z’, satisfying the condition in [Dwork06]
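A minimal sketch of the Laplace mechanism for a count query, assuming sensitivity 1 (so the noise scale is 1/ε). The function names `laplace_noise`, `laplace_density`, and `noisy_count` are hypothetical and chosen only for this illustration; the noise is sampled as the difference of two unit exponentials, which is Laplace-distributed.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_density(z, scale):
    """Density of Laplace(0, scale) at z."""
    return math.exp(-abs(z) / scale) / (2.0 * scale)


def noisy_count(true_count, epsilon, rng):
    """Answer a count query with Laplace noise (assumes sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


epsilon = 0.5
scale = 1.0 / epsilon

# For |z - z'| <= 1 the density ratio never exceeds e^epsilon.
ratio = laplace_density(3.0, scale) / laplace_density(4.0, scale)
sample_answer = noisy_count(100, epsilon, random.Random(0))
```

Here `ratio` equals e^ε, the worst case for points that are exactly 1 apart on the same side of zero.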
  8. 8. Definitions of DP }  Two flavors of DP }  Deleting or inserting a tuple: unbounded }  Changing a tuple's value: bounded •  Note that the existence of a tuple ≠ participation!
  9. 9. The no-free-lunch theorem }  It is not possible to guarantee privacy and utility w/o making assumptions about the data-generating mechanism… }  To discuss this problem: }  Define the discriminant ω as a lower bound on utility }  Analyze ω of the Laplace mechanism }  Define the non-privacy game }  Propose no free lunch theorem }  Free lunch theorem for DP
  10. 10. Discriminant (as a utility measure) }  ω: a measure of query accuracy. If ω ≈ 1*, A answers with reasonable accuracy. }  A: randomized query-answering processor }  Integer k: the number of database instances to tell apart (like an anonymity parameter?) }  Constraint c: a lower bound on the utility of A with parameter k. *Note that the discriminant is 1 for a deterministic algorithm, e.g. a k-anonymity algorithm.
  11. 11. Discriminant }  E.g. k = 2: databases D, D’ and disjoint sets S, S’ ⊂ Range(A) with P(A(D) ∈ S) ≧ c and P(A(D’) ∈ S’) ≧ c
  12. 12. Example of discriminant }  Cancer-patient DB }  # of cancer patients in DB: D1: 0 / D2: 10,000 / D3: 20,000 }  S1 = [0, 1000], S2 = [9000, 11000], S3 = [19000, ∞) }  P(A(Di) ∈ Si) ≧ 0.95 for all i, i.e. the discriminant for k = 3 is at least 0.95
  13. 13. Discriminant of the Laplace mechanism Intuitive description: }  Laplace mechanism w/ sensitivity 0.5 }  Choose n large enough → we can choose {Di} and {Si} so that the distances between the Di’s (∝ n) and the widths of the Si’s are large enough → the discriminant becomes close to 1
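The intuition can be checked numerically on the cancer-patient example from the earlier slide. This is a sketch under stated assumptions, not the paper's computation: the query is a count with sensitivity 1, ε = 0.01 is an arbitrary choice, and S1 is widened to (−∞, 1000] because raw Laplace output can be negative. The helper names are hypothetical.

```python
import math


def laplace_cdf(x, scale):
    """CDF of Laplace(0, scale)."""
    if x < 0:
        return 0.5 * math.exp(x / scale)
    return 1.0 - 0.5 * math.exp(-x / scale)


def prob_in_interval(true_count, lo, hi, scale):
    """P(true_count + Laplace noise lands in [lo, hi])."""
    return laplace_cdf(hi - true_count, scale) - laplace_cdf(lo - true_count, scale)


epsilon = 0.01          # assumed privacy parameter
scale = 1.0 / epsilon   # assumed sensitivity 1

counts = [0, 10_000, 20_000]                                   # D1, D2, D3
intervals = [(-math.inf, 1_000), (9_000, 11_000), (19_000, math.inf)]

probs = [prob_in_interval(c, lo, hi, scale)
         for c, (lo, hi) in zip(counts, intervals)]

# The smallest of the k probabilities is a lower bound on the discriminant.
discriminant_lower_bound = min(probs)
```

Even with this fairly strong ε, every P(A(Di) ∈ Si) is far above 0.95, because the databases are 10,000 apart while the noise scale is only 100.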
  14. 14. Non-privacy game }  Privacy definition as a game: }  Assume a data-generating mechanism P that generates the database D }  Given a sensitive query q, the attacker tries to guess the true answer q(D) from a randomized answer A(D)
  15. 15. No free lunch theorem }  Providing both privacy (in the sense of the game) and utility is impossible if there is no restriction on the data-generating mechanism }  If D is uniformly distributed over {D1, …, Dk}, the attacker’s strategy is to guess q(Di) if A(D) ∈ Si }  W/o A(D), the attacker’s guess is correct w/ probability 1/k. }  W/ A(D), the attacker wins w/ probability ~1!
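The game can be simulated in a few lines. This sketch reuses the cancer-patient counts from the discriminant example and assumes a Laplace mechanism with sensitivity 1 and ε = 0.01; these parameters, the widened S1 = (−∞, 1000], and all function names are assumptions made for illustration only.

```python
import math
import random

COUNTS = [0, 10_000, 20_000]   # q(D1), q(D2), q(D3)
INTERVALS = [(-math.inf, 1_000), (9_000, 11_000), (19_000, math.inf)]  # S1..S3


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def attacker_guess(answer, rng):
    """Guess i when the answer falls in S_i; otherwise guess uniformly."""
    for i, (lo, hi) in enumerate(INTERVALS):
        if lo <= answer <= hi:
            return i
    return rng.randrange(len(COUNTS))


def play(trials, epsilon, rng):
    """Return (win rate with A(D), win rate of a blind uniform guess)."""
    informed = blind = 0
    for _ in range(trials):
        secret = rng.randrange(len(COUNTS))          # D uniform over D1..Dk
        answer = COUNTS[secret] + laplace_noise(1.0 / epsilon, rng)
        informed += attacker_guess(answer, rng) == secret
        blind += rng.randrange(len(COUNTS)) == secret
    return informed / trials, blind / trials


informed_rate, blind_rate = play(5_000, epsilon=0.01, rng=random.Random(0))
```

With the noisy answer the attacker is right almost every time, while the blind guess stays near 1/k = 1/3.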
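  16. 16. No free lunch and differential privacy }  Privacy definition w/o assumptions about the data (ε-free-lunch privacy): P(A(D1) ∈ S) ≦ e^ε P(A(D2) ∈ S) for every pair of databases D1, D2 and every S }  Note: the discriminant ω(k; A) of any algorithm A satisfying ε-free-lunch privacy is bounded by e^ε / (k − 1 + e^ε) }  (my interpretation) Let P(A(Di) ∈ Si) ≧ c for all i. Free-lunch privacy gives P(A(D1) ∈ Si) ≧ e^−ε P(A(Di) ∈ Si) ≧ c·e^−ε for the other k−1 DB instances. Since the Si are disjoint, c + (k−1)·c·e^−ε ≦ Σi P(A(D1) ∈ Si) ≦ 1, so c ≦ e^ε / (k − 1 + e^ε)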
  17. 17. Privacy risks in differential privacy }  General guideline for determining a privacy definition }  Note that DP for a more knowledgeable attacker adds less noise! Consider three kinds of DP algorithms
  18. 18. example }  Consider the table with 1 tuple (Bob) and two 2-bit attributes R1 and R2, so Bob’s record is one of the 16 values 00 00, 00 01, …, 11 11. [Figure: three copies of the 16-value grid marking the neighbors under Bounded DP (tuple), Attribute DP, and Bit DP, with an arrow labeled “Probability of answering the true record”.] Question: isn’t it bounded…?
  19. 19. example }  Consider the table with 1 tuple (Bob) and two 2-bit attributes R1 and R2, so Bob’s record is one of the 16 values 00 00, 00 01, …, 11 11. [Figure: three copies of the 16-value grid marking the neighbors under Bounded DP (tuple), Attribute DP, and Bit DP, with the “Probability of answering the true record” arrow labeled from “Higher” to “lower”.]
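The three neighbor relations in the figure can be enumerated directly. This is a sketch that assumes the usual reading of the three variants: under bounded (tuple) DP any other record is a neighbor, under attribute DP exactly one attribute differs, and under bit DP exactly one bit differs. The function names are hypothetical.

```python
from itertools import product

# All 16 possible records for Bob: R1 is bits 0-1, R2 is bits 2-3.
RECORDS = ["".join(bits) for bits in product("01", repeat=4)]


def tuple_neighbors(rec):
    """Bounded (tuple) DP: the whole tuple may change."""
    return [r for r in RECORDS if r != rec]


def attribute_neighbors(rec):
    """Attribute DP: exactly one of R1, R2 differs."""
    return [r for r in RECORDS
            if (r[:2] != rec[:2]) != (r[2:] != rec[2:])]


def bit_neighbors(rec):
    """Bit DP: exactly one bit differs."""
    return [r for r in RECORDS
            if sum(a != b for a, b in zip(r, rec)) == 1]


bob = "0110"
sizes = (len(tuple_neighbors(bob)),
         len(attribute_neighbors(bob)),
         len(bit_neighbors(bob)))
```

For any record the neighborhoods have 15, 6, and 4 members respectively: the more the attacker is assumed to know already, the fewer alternatives the mechanism has to hide among, which matches the earlier note that DP for a more knowledgeable attacker adds less noise.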
  20. 20. Problem: correlated data }  If several records are known to have the same attribute value, the sensitivity must be larger }  E.g. disease database: Bob and his family might have the same disease }  How should we deal with this problem? }  Hide evidence of participation (any influence of a given individual’s participation) }  Discussion }  Growing SNSs }  Prior knowledge about exact statistics
  21. 21. Growing social networks }  Assume edge-growing SNSs. (Figure: initial state of two clusters, with nodes Bob and Charlie; only Bob has an external link.) }  Let the network grow, after which the attacker asks the query “how many edges are there between the two communities?”. Can we preserve the privacy of Bob’s external link? → This requires making assumptions about the data-generating model: 1.  Forest Fire model 2.  Copying model 3.  MVS model }  From simulation: }  1, 2 → we cannot set the noise parameter ε reliably unless we know the network parameters (model parameters or final number of edges). }  3 → has a steady-state distribution, which is rather favorable
  22. 22. Privacy breach after some exact data releases }  Example: contingency tables (deterministic) followed by an additional differentially private data release }  A demonstration of the additional privacy breach (4.1) }  Consider a table T and an attribute R w/ domain {r1,…,rk} }  k−1 queries have already been answered exactly: }  If we additionally knew the exact answer to “select count(*) from T where R=ri”, we would be able to reconstruct the table exactly. → the tuples are correlated!! }  Additional differentially private answers...
  23. 23. Privacy breach after some exact statistics release (2) }  Consider a table T and an attribute R w/ domain {r1,…,rk} }  k−1 queries: }  Additional k ε-differentially private answers... }  If k is large (e.g. a d-bit vector w/ 2^d possible values), the variance of the attacker’s estimate is small (recall 2.2, knowledge vs. privacy risk) → T is reconstructed w/ very high probability (due to correlation w/ the prior release of information)
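A sketch of how such a reconstruction could work. The k−1 exact queries are not spelled out on the slide, so this example ASSUMES they are the adjacent-pair sums n_j + n_{j+1} of the per-value counts; that choice, the Laplace noise with scale 1/ε, and all names below are hypothetical. Under this assumption every noisy count yields an independent estimate of n_1, and averaging k of them shrinks the variance by a factor of k.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def reconstruct(pair_sums, noisy_counts):
    """Recover all counts from exact pair sums plus noisy per-value counts."""
    # Write n_i = sign_i * n_1 + offset_i, using n_{i+1} = s_i - n_i.
    signs, offsets = [1], [0]
    for s in pair_sums:
        signs.append(-signs[-1])
        offsets.append(s - offsets[-1])
    # Each noisy count gives one estimate of n_1; average and round.
    estimates = [sign * (x - off)
                 for sign, off, x in zip(signs, offsets, noisy_counts)]
    n1 = round(sum(estimates) / len(estimates))
    return [sign * n1 + off for sign, off in zip(signs, offsets)]


rng = random.Random(0)
epsilon = 1.0
k = 2 ** 10                                   # e.g. a 10-bit attribute
true_counts = [rng.randrange(0, 50) for _ in range(k)]
pair_sums = [a + b for a, b in zip(true_counts, true_counts[1:])]  # exact
noisy_counts = [n + laplace_noise(1.0 / epsilon, rng) for n in true_counts]

reconstructed = reconstruct(pair_sums, noisy_counts)
```

Each individual noisy count is protected by Laplace noise, yet the averaged estimate of n_1 has a standard deviation of only about 0.04 here, so rounding recovers the whole table.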
  24. 24. Plausible deniability (idea) }  What should we do to maintain consistency w/ deterministic query answers that have already been released? }  We should choose bounded DP }  If the number of tuples has been answered previously, the number of tuples should stay the same }  In general, we can maintain consistency in several ways… }  e.g. by exchanging attribute values collaboratively
  25. 25. differential privacy subject to background knowledge }  Definitions }  Table T (id, gender, handedness): (taro, male, left), (hana, female, right), … }  Contingency table of cell counts (rows: gender; columns: R, L, total): M: 43, 9, 52; F: 44, 4, 48; total: 87, 13, 100 }  Move: illustrated by a second contingency table: M: 42, 9, 52; F: 44, 4, 48; total: 87, 13, 100
  26. 26. differential privacy subject to background knowledge }  Define DP for neighboring tables
  27. 27. Neighbors induced by other prior statistics }  Example: the exact answer to “select gender, count(*) from T group by gender” has been released }  × unbounded DP: the number of tuples is already published }  × bounded DP: we cannot arbitrarily modify a single tuple }  Define neighbors that maintain consistency with the prior query answers:
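A simplified sketch of neighbors that stay consistent with the released gender counts, using the gender × handedness contingency table from the earlier slide. It assumes the only prior statistic is the per-gender row total, so a neighbor is obtained by changing one tuple's handedness while keeping its gender; the paper's general definition is stated in terms of moves and is not reproduced here. The function name is hypothetical.

```python
def consistent_neighbors(table):
    """Tables reachable by shifting one tuple within a row (row sums fixed)."""
    neighbors = []
    for g, row in enumerate(table):
        for src in range(len(row)):
            for dst in range(len(row)):
                if src != dst and row[src] > 0:
                    new_table = [list(r) for r in table]
                    new_table[g][src] -= 1   # remove the tuple from one cell
                    new_table[g][dst] += 1   # re-insert it with the same gender
                    neighbors.append(new_table)
    return neighbors


# Rows: M, F; columns: R, L (cell counts from the contingency-table slide).
table = [[43, 9], [44, 4]]
neighbors = consistent_neighbors(table)
```

Changing a tuple from male to female would alter the published gender counts, so such tables are excluded; only the four within-row shifts remain as neighbors.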
  28. 28. Neighbor-based algorithm for DP }  Definitions: }  distance function between two contingency tables: }  To achieve 2ε-generic DP, the exponential mechanism [McSherry06] can be used (Δq = d(Ta, Tb)).
  29. 29. Neighbor-based algorithm for DP }  Laplace mechanism: }  Sensitivity }  The Laplace mechanism adds noise to the query answer, drawn from a Laplace density whose scale is set by the sensitivity.
  30. 30. NP-hard problem }  Dealing with neighbors under constraints is an NP-hard problem → the general problem of finding an upper bound on the sensitivity of a query is at least co-NP-hard, and we suspect that the problem is Π^p_2-complete.
  31. 31. The case where efficient algorithms exist… }  Consider a 2-D table }  Let the query qall be “SELECT R1, R2, COUNT(*) FROM T GROUP BY R1, R2”. }  The sensitivity of qall can be computed using the following lemma: }  By removing subsets of paths that form Hamiltonian cycles, it is shown that the original set of moves was the smallest set of moves.
  32. 32. Related work }  Impossibility result [Dwork06, Dinur & Nissim03, etc.] }  Answering many queries w/ bounded noise does not preserve privacy }  SNS privacy }  Relationship privacy [Rastogi09] }  Adversarial privacy: an algorithm is private if the posterior distribution P(t|O) is close to the prior P(t) (weaker than indistinguishability). }  Assumptions about the data (SNSs?) }  Small perturbation, higher utility than the existing Laplace mechanism }  Resistance to various attackers [Kasiviswanathan08]
  33. 33. Summary 1.  They proposed the no free lunch theorem (a simplified version of the impossibility result) based on a privacy game. 2.  They proposed a guideline for applying DP (we must consider not only the data-generating mechanism but also previously released data). 3.  They showed examples where DP does not meet the guideline: 1.  when applied to arbitrary SNS data. 2.  when applied to tabular data and the attacker has aggregate-level background knowledge. 4.  They proposed a modification of DP for tabular data with aggregate-level background knowledge (contingency tables). The knowledge can be described as a constraint on which neighbors exist.