The Measurement of Privacy

We examine the problem of measuring the effect of anonymisation upon a data set by utilising mutual information as a metric and applying varying degrees of differential privacy to causal and non-causal structures.

  1. The Measurement of Privacy. A Lecture to FISMA, 19 April 2017, Espoo. Ian Oliver, Wei Ren, Yoan Miche. Security Research Group (Espoo), Nokia.
  2. Outline of the Problem • Data Collection is Ubiquitous • Balance between... • Privacy Law: GDPR, ePrivacy, SOX, COPPA, HIPAA, ... • Business Need: Data quality, Data quantity, Information content, ... • Consumer Trust: Excessive Data Collection, Sharing, ... • and we want to share and process data (legally, of course) • and we want to defeat machine learning (somewhat...)
  3. Anonymisation. Solution: Anonymise the Data
  4. Anonymisation Techniques. How to Anonymise Data: • Suppression • Hashing • Tokenisation (Equivalence Classes) • Encryption • κ-anonymisation • ℓ-diversity • (ϵ, δ)-Differential Privacy
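
A minimal sketch of two of the techniques listed above: tokenisation by salted hashing and ϵ-differential privacy via the Laplace mechanism on a single numeric field. This is illustrative only and assumes Python with NumPy; the record, salt and parameters are made up, not taken from the study.

```python
# Illustrative only: a salted-hash pseudonym and Laplace noise for
# epsilon-differential privacy on one numeric field.
import hashlib
import numpy as np

def pseudonymise(value: str, salt: str = "example-salt") -> str:
    """Tokenisation by salted hashing: same input always maps to the same token."""
    return hashlib.sha256((salt + value).encode()).hexdigest()

def laplace_mechanism(value: float, sensitivity: float, epsilon: float) -> float:
    """Add Laplace(0, sensitivity/epsilon) noise, giving epsilon-DP for this query."""
    return value + np.random.laplace(0.0, sensitivity / epsilon)

record = {"id": "user-42", "latitude": 60.2055}
anonymised = {
    "id": pseudonymise(record["id"]),
    "latitude": laplace_mechanism(record["latitude"], sensitivity=0.01, epsilon=1.0),
}
print(anonymised)
```
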
  5. Problem. How much anonymisation is required?
  6. What is Anonymisation? A = {(ϵ, δ)-differential privacy, κ-anonymisation, hash(...), ...} (1). Where [α_{π1...πn}]⁺ is one or more applications of an instantiation of an anonymisation function with a given set of parameters, the anonymised output is D_o = [α_{π1...πn}]⁺(D_i), and measuring both with m should give M_i = m(D_i) > m(D_o) = M_o.
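
A toy instance of this model, as a sketch of my own rather than anything from the talk: a pipeline of anonymisation functions is applied to an input dataset and a simple information measure is checked not to increase. Shannon entropy of one column stands in for the mutual-information measure introduced later, and the field names are invented.

```python
# D_o = [alpha]+(D_i), with measurement m (here: Shannon entropy of the
# location-cell column) not increasing under anonymisation.
from collections import Counter
import math

def entropy(values):
    """Shannon entropy (bits) of a discrete column; a stand-in for m()."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def apply_pipeline(rows, functions):
    """[alpha_{pi_1...pi_n}]+ : one or more anonymisation steps applied in sequence."""
    for f in functions:
        rows = [f(row) for row in rows]
    return rows

coarsen_cell = lambda row: {**row, "cell": row["cell"] // 2}   # merge adjacent cells

d_in = [{"id": f"user-{i}", "cell": i % 7} for i in range(100)]
d_out = apply_pipeline(d_in, [coarsen_cell])

m_in = entropy([r["cell"] for r in d_in])
m_out = entropy([r["cell"] for r in d_out])
assert m_out <= m_in   # anonymisation should not increase the measured information
```
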
  7. Desirable Properties of Anonymisation • The output dataset must be legal • The output dataset must be useful
  8. Legal and Useful..? A legal data set might be found by selecting all elements with a given amount (or lack) of information: χ_L(d : D, p : ℝ_[0,1]) = 1 if m(d) ≤ p, 0 otherwise (2). A similar definition follows for χ_U for usefulness; composing the two gives the candidate data sets C by filtering D_o = [α_{π1...πn}]⁺(D_i) through χ_L ∘ χ_U, assuming C exists and at least one entry in D_o is 'reachable'.
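
A sketch of how the two indicator functions might be used in practice, assuming each candidate (anonymised) dataset already has a measured information content; the thresholds p and q and the example MI values are hypothetical, not results from the study.

```python
# chi_L keeps datasets whose measured information content is low enough to be
# legal; chi_U keeps those still high enough to be useful; C is the intersection.
# Thresholds p, q and the measurements below are hypothetical.
def chi_L(measurement: float, p: float) -> bool:
    return measurement <= p

def chi_U(measurement: float, q: float) -> bool:
    return measurement >= q

measured = {"eps=0.1": 0.05, "eps=1.0": 0.30, "eps=10.0": 0.85}  # dataset -> MI estimate
C = {name for name, m in measured.items() if chi_L(m, p=0.5) and chi_U(m, q=0.1)}
print(C)   # {'eps=1.0'}
```
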
  9. The Challenge. Define: 'sufficiently'
  10. The Challenge. Or, can we measure the degree of anonymisation?
  11. The Challenge. And thus, establish which data sets have the 'desirable properties' we require?
  12. So now we need a metric. Which information content (entropy) metric?
  13. So now we need a metric. Which information content (entropy) metric? Which fits our chosen model/framework?
  14. Mutual Information • Basis for machine learning/AI • Well-grounded theory, statistical basis • Used to evaluate internal consistency and relationships in datasets • Degenerates 'nicely' with too little data • x_i, x_j ∈ structs(D) • Extension: MI(D_x, D_y)
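
For reference, the textbook plug-in estimate of mutual information between two discrete columns, as a minimal sketch; the estimators actually needed for continuous location and time fields are more involved (binned or k-NN based), so this is only the simplest possible baseline.

```python
# Plug-in MI estimate (in bits) between two discrete columns of a dataset.
from collections import Counter
import math

def mutual_information(xs, ys):
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Perfectly dependent columns give MI equal to the entropy of either column.
print(mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"]))   # 1.0 bit
```
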
  15. Case Study • Collection of ('anonymised') signalling data • hashed(ID) × LOC × TIMESTAMP
  16. Method • Apply (ϵ, δ)-differential privacy to various combinations of fields • Create measurement mechanisms, easy for Location and Time • Select suitable MI estimator functions and parameters • Invent a few new ways of doing MI and Machine Learning... • Calculate the MI for each dataset • Match the results against the earlier model of anonymisation • Construct χ_L and χ_U
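
One step of such a pipeline, sketched under stated assumptions: synthetic subscriber records, an arbitrary bin width, the classical Gaussian mechanism for (ϵ, δ)-differential privacy, and scikit-learn's discrete MI estimator. None of these choices come from the study; they only show the shape of "perturb a field, then re-estimate MI".

```python
# Perturb a location field with the Gaussian mechanism for (epsilon, delta)-DP,
# then compare MI between subscriber ID and binned location before and after.
import math
import numpy as np
from sklearn.metrics import mutual_info_score   # discrete MI estimate, in nats

def gaussian_mechanism(values, sensitivity, epsilon, delta):
    # Classical bound: sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon
    # (valid for epsilon <= 1).
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon
    return values + np.random.normal(0.0, sigma, size=len(values))

ids = np.repeat(np.arange(50), 20)                           # 50 subscribers, 20 events each
loc = ids * 0.01 + np.random.normal(0.0, 0.001, size=ids.size)  # location correlated with ID
noisy = gaussian_mechanism(loc, sensitivity=0.01, epsilon=0.5, delta=1e-5)

bins = np.linspace(loc.min(), loc.max(), 20)
print(mutual_info_score(ids, np.digitize(loc, bins)))        # MI before anonymisation
print(mutual_info_score(ids, np.digitize(noisy, bins)))      # MI after: expected to drop
```
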
  17. Hashing Considered Harmful (for Anonymisation)
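
An illustration of why: a plain hash is deterministic, so over a small, enumerable identifier space (phone numbers, IMSIs) it can be inverted by a dictionary attack, and the hashed column carries exactly as much information about the subscriber as the original one did. The number format and range below are invented for the example.

```python
# Dictionary attack on a hashed identifier drawn from a small, enumerable space.
import hashlib

def h(value: str) -> str:
    return hashlib.sha256(value.encode()).hexdigest()

observed = h("+358 40 12345")                 # a hash seen in the "anonymised" dataset
dictionary = {h(f"+358 40 {n:05d}"): f"+358 40 {n:05d}" for n in range(100_000)}
print(dictionary[observed])                   # recovers the original identifier
```
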
  18. Reduction in MI. Causal vs non-causal field anonymisation
  19. Rate of Reduction in MI. Sensitivity of ϵ in (ϵ, 0)-differential privacy
  20. MI under (ϵ, δ)-Differential Privacy
  21. MI under (ϵ, δ)-Differential Privacy: [∂MI/∂ϵ, ∂MI/∂δ] (3)
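
One way such a gradient could be read off the experiments, sketched as a hypothetical helper: estimate ∂MI/∂ϵ by a central finite difference, averaging over repeated runs because the mechanisms are randomised. mi_at is a placeholder for "apply (ϵ, δ)-differential privacy to the dataset and estimate MI", as in the earlier sketches; it is not a function from the study.

```python
# Numerical estimate of dMI/d(epsilon) at a given (epsilon, delta).
import statistics

def d_mi_d_epsilon(mi_at, epsilon, delta, h=0.05, runs=20):
    """mi_at(epsilon, delta) -> MI estimate for one randomised anonymised release."""
    plus = statistics.mean(mi_at(epsilon + h, delta) for _ in range(runs))
    minus = statistics.mean(mi_at(epsilon - h, delta) for _ in range(runs))
    return (plus - minus) / (2 * h)
```
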
  22. χ_L and χ_U. This obviously depends upon how we define χ_L and χ_U, but their intersection gives the candidate datasets C; we then look for local maxima/minima within it.
  23. Discussion • Privacy can be meaningfully metricised → evaluation of anonymisation techniques • Non-trivial datasets = MASSIVE amounts of computation • c. 1 × 10⁶ data points = 8-10 hours of computation; optimisations are possible for some estimators • MI estimators and distance functions are a problem (non-Euclidean, non-linear and non-existent in many cases) • Classification functions: χ_L could be very useful (lawyers replaced by algorithms?) • Heuristics for choosing (ϵ, δ) and κ • Causal vs non-causal data points • Units of privacy elude us for the moment: MI per amount of data? • Some surprises with differential privacy's δ, with implications for quality of data and machine learning • Probability spaces, Kullback-Leibler, Earth mover's distance, non-continuous mappings, eigenvectors, comparing matrices...
