
Privacy-Preserving Data Analysis, Adria Gascon

SOCIAM all-hands meeting, September, University of Oxford


  1. Privacy-Preserving Data Analysis. Adria Gascon, The Alan Turing Institute & Warwick University. Based on joint work with Borja Balle, Phillipp Schoppmann, Mariana Raykova, Jack Doerner, Samee Zahur, David Evans, Age Chapman, Alan Davoust, Peter Buneman.
  2. What analysis on what data? ● Fine-grained private data, e.g. tracking for targeted advertising, credit scoring... ● Data held by several organisations, e.g. hospitals? ● Data held by individuals, e.g. on their phones?
  3. Who cares? ● Data owners (of course) ● Data controllers
  4. Privacy-Preserving Distributed Linear Regression on High-Dimensional Data. Adria Gascon, Phillipp Schoppmann, Borja Balle, Mariana Raykova, Jack Doerner, Samee Zahur, David Evans.
  5. Motivation. [Table: example records with Treatment and Outcome columns plus attributes split across three sources: Medical Data (Atr. 1, Atr. 2, …), Census Data (Atr. 4, Atr. 5, …) and Financial Data (Atr. 7, Atr. 8, …).] Note: this is vertically-partitioned data; similar problems arise with horizontally-partitioned data.
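  As a rough illustration of the vertical partitioning above (using made-up record identifiers and a couple of the slide's example values), each party holds different columns of the same records, and the joined view exists only logically:

  medical = {  # columns held by the hospital
      "rec-1": {"treatment": -1.0, "outcome": 0, "atr1": 54.3},
      "rec-2": {"treatment": 1.5, "outcome": 1, "atr1": 0.6},
  }
  census = {  # columns held by the census agency
      "rec-1": {"atr4": "North", "atr5": 34},
      "rec-2": {"atr4": "South", "atr5": 12},
  }
  financial = {  # columns held by the bank
      "rec-1": {"atr7": 5, "atr8": 1},
      "rec-2": {"atr7": 10, "atr8": 0},
  }

  # The "logical" joined record only exists if someone merges all three sources,
  # which is exactly what the privacy constraint forbids.
  logical_rec_1 = {**medical["rec-1"], **census["rec-1"], **financial["rec-1"]}
  print(logical_rec_1)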
  6. Private Multi-Party Machine Learning. Assumptions: • Parameters of the model will be received by all parties • Parties can engage in on-line secure communications • External parties might be used to outsource computation or initialize cryptographic primitives. Problem: • Two or more parties want to jointly learn a model of their data • But they can't share their private data with other parties.
  7. The Trusted Party "Solution". [Diagram: the parties send their data over secure channels to a Trusted Party, which receives the plain-text data, runs the algorithm, and returns the result to the parties.] The Trusted Party assumption: • Introduces a single point of failure • Relies on weak incentives • Requires agreement between all data providers => Useful but unrealistic. Maybe it can be simulated?
  8. Secure Multi-Party Computation (MPC). Public: the function f. Private: x_i (party i's input). Goal: compute y = f(x_1, …, x_n) in a way that each party learns y (and nothing else!)
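  To make the "learns y and nothing else" goal concrete, here is a minimal sketch (not one of the protocols in this talk) of additive secret sharing for a secure sum over a public prime field: any strict subset of the shares is uniformly random, yet all shares together reconstruct the result.

  import secrets

  P = 2**61 - 1  # public prime modulus

  def share(x, n_parties):
      """Split x into n_parties additive shares modulo P."""
      shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
      shares.append((x - sum(shares)) % P)
      return shares

  inputs = [12, 7, 30]                        # each party's private value
  all_shares = [share(x, 3) for x in inputs]  # all_shares[i][j] is sent to party j

  # Each party j locally adds the shares it received (one from every party) ...
  partial = [sum(all_shares[i][j] for i in range(3)) % P for j in range(3)]
  # ... and only these partial sums are published and combined.
  y = sum(partial) % P
  print(y)  # 49 = 12 + 7 + 30; the individual inputs stay hidden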
  9. Our Contribution. A PMPML system for vertically partitioned linear regression. Features: • Scalable to millions of records and hundreds of dimensions • Formal privacy guarantees (semi-honest security) • Open source implementation. Tools: • Combine standard MPC constructions (GC, OT, TI, …) • Efficient private inner product protocols • Conjugate gradient descent robust to fixed-point encodings.
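  The last bullet refers to the fact that MPC natively computes over integers modulo a prime, so real-valued data must be mapped to a fixed-point encoding, and every multiplication needs a rescaling (truncation) step. A minimal sketch with an illustrative precision parameter (not the paper's choice):

  P = 2**61 - 1
  FRAC_BITS = 16
  SCALE = 1 << FRAC_BITS

  def encode(x):
      """Map a real number to a field element (negative values wrap around)."""
      return int(round(x * SCALE)) % P

  def decode(e):
      """Map back, interpreting large residues as negative numbers."""
      signed = e - P if e > P // 2 else e
      return signed / SCALE

  def fp_mul(a, b):
      """Multiply two encoded values and truncate the extra scale factor."""
      prod = (a * b) % P
      signed = prod - P if prod > P // 2 else prod
      return (signed >> FRAC_BITS) % P

  a, b = encode(-1.5), encode(2.25)
  print(decode(fp_mul(a, b)))  # approximately -3.375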
  10. FAQ: Why is PMPML… Exciting? It can provide access to previously "locked" data. Hard? Privacy is tricky to formalize, hard to implement, and inherently interdisciplinary. Worth it? Better models while avoiding legal risks and bad PR.
  11. Read It, Use It. https://github.com/schoppmp/linreg-mpc http://eprint.iacr.org/2016/892 (PETS ’17)
  12. Private Document Classification in Federated Databases. Adria Gascon, Phillipp Schoppmann, Borja Balle.
  13. Secure document classification
  14. Secure document classification
  15. Privacy-Preserving Distributed Hypothesis Testing. Adria Gascon, James Bell, Tejas Kulkarni.
  16. ● Drop-off in Manhattan? ● Tip over 25%? ● Was it a short journey? ● Was the payment method credit card? Drop-off in Manhattan and tip over 25% are significantly correlated events. But this result is differentially private, so I cannot easily tell whether a given journey was included in the training dataset or not.
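  A minimal, hypothetical sketch of the mechanism behind such a statement: release the relevant counts through the Laplace mechanism, then compute any test statistic from the noisy counts (post-processing preserves differential privacy). The epsilon value and the synthetic journeys below are made up:

  import numpy as np

  rng = np.random.default_rng(0)
  n = 10_000
  manhattan_dropoff = rng.random(n) < 0.6                       # synthetic journeys
  tip_over_25 = rng.random(n) < (0.3 + 0.2 * manhattan_dropoff)

  def dp_count(predicate, epsilon):
      """Release a count with Laplace noise calibrated to sensitivity 1."""
      return predicate.sum() + rng.laplace(scale=1.0 / epsilon)

  eps = 0.5  # per-count privacy budget, illustrative only
  noisy_both = dp_count(manhattan_dropoff & tip_over_25, eps)
  noisy_manhattan = dp_count(manhattan_dropoff, eps)
  noisy_tip = dp_count(tip_over_25, eps)

  # Any downstream test (e.g. a chi-squared statistic) built from these noisy
  # counts inherits the differential-privacy guarantee.
  print(noisy_both, noisy_manhattan, noisy_tip)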
  17. Privacy-Preserving Model Checking. Problem: model-check security properties on private source code.
  18. Privacy-Preserving Model Checking. ● Problem: Check security properties on (private) source code. ● "Public" equivalent: MOPS [1], and others. – Security property expressed as a regular expression over sequences of instructions – Find all paths in the control flow graph that match the property ● An application of Private Regular Path Queries. [1] Hao Chen and David Wagner. 2002. MOPS: an infrastructure for examining security properties of software. In Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS '02). ACM, New York, NY, USA, 235-244. DOI: http://dx.doi.org/10.1145/586110.586142
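  In the spirit of MOPS (not its actual implementation), the public version of the check can be sketched as a search over the product of CFG nodes and property-automaton states. The tiny, acyclic CFG and the property below are illustrative only, loosely mirroring the hello.c example on the next slides:

  cfg = {                       # node -> list of (call label, successor node)
      "main":       [("call_drop_priv", "drop_priv")],
      "drop_priv":  [("getpwuid_fails", "main_after"), ("seteuid", "main_after")],
      "main_after": [("execv", "exit")],
      "exit":       [],
  }

  def step(state, label):
      """Property automaton: reaching execv while still root is an error."""
      if state == "root" and label == "seteuid":
          return "dropped"
      if state == "root" and label == "execv":
          return "violation"
      return state

  def violating_paths(node="main", state="root", path=()):
      """DFS over (CFG node, automaton state) pairs, reporting violating paths."""
      if state == "violation":
          yield path
          return
      for label, succ in cfg[node]:
          yield from violating_paths(succ, step(state, label), path + (label,))

  for p in violating_paths():
      print(" -> ".join(p))  # call_drop_priv -> getpwuid_fails -> execv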
  19. Secure queries on graph data
  20. Simple Example: hello.c

  #include <stdio.h>
  #include <sys/types.h>
  #include <unistd.h>
  #include <pwd.h>

  void drop_priv()
  {
      struct passwd *passwd;

      if ((passwd = getpwuid(getuid())) == NULL)
      {
          printf("getpwuid() failed");
          return;
      }
      printf("Drop user %s's privilege\n", passwd->pw_name);
      seteuid(getuid());
  }

  int main(int argc, char *argv[])
  {
      drop_priv();
      printf("About to exec\n");
  21. Simple Example: control flow graph and security property FSA (system call with root privilege)
  22. Interesting case: a distributed private graph (code split across main.c and library.c)
  23. Related Work: Verification Across Intellectual Property Boundaries [2]. [2] Sagar Chaki, Christian Schallhart, and Helmut Veith. "Verification across intellectual property boundaries." ACM Transactions on Software Engineering and Methodology (TOSEM) 22.2 (2013): 15.
  24. Related Work: Verification Across Intellectual Property Boundaries [2]. They also say: "While we are aware of advanced methods such as secure multiparty computation [Goldreich 2002] and zero-knowledge proofs [Ben-Or et al. 1988], we believe that they are impracticable for our problem, as such methods cannot be easily wrapped over given validation tools. Finally, we believe that any advanced method without an intuitive proof for its secrecy will be heavily opposed by the supplier—and might therefore be hard to establish in practice."
  25. Case study: thttpd ● Tiny HTTP server ● 2 main modules: thttp.c (2k LOC) and libhttp.c (4k LOC)
  26. thttpd control flow graph... ● only the 2 main modules ● functions are disconnected
  27. thttpd: next steps ● Adapt the private Regular Path Queries work to pushdown automata ● Find some bugs ● Write the paper ● Voilà!
  28. Thanks!
