1. Privacy-Preserving Data Analysis
Adria Gascon
The Alan Turing Institute & Warwick University
Based on joint work with Borja Balle, Phillipp Schoppmann,
Mariana Raykova, Jack Doerner, Samee Zahur, David
Evans, Age Chapman, Alan Davoust, Peter Buneman
2. What analysis on what data?
Fine-grained private data, e.g. tracking for targeted
advertising, credit scoring...
Data held by several organisations, e.g. hospitals?
Data held by individuals, e.g. on their phones?
4. Adria Gascon Phillipp Schoppmann Borja Balle
Mariana Raykova Jack Doerner Samee Zahur David Evans
Privacy-Preserving
Distributed Linear Regression on
High-Dimensional Data
5. Motivation
The goal is to study the effect of a Treatment on an Outcome using attributes that are
vertically partitioned across three holders: Medical Data (Atr. 1, Atr. 2, …),
Census Data (Atr. 4, Atr. 5, …) and Financial Data (Atr. 7, Atr. 8, …).

Outcome  Atr. 1  Atr. 2  …  Atr. 4  Atr. 5  …  Atr. 7  Atr. 8  …
 -1.0      0      54.3   …  North     34    …    5       1     …
  1.5      1       0.6   …  South     12    …   10       0     …
 -0.3      1      16.0   …  East      56    …    2       0     …
  0.7      0      35.0   …  Centre    67    …   15       1     …
  3.1      1      20.2   …  West      29    …    7       1     …

Note: this is vertically-partitioned data; similar problems arise with horizontally-partitioned data.
6. Private Multi-Party Machine Learning
Problem
• Two or more parties want to jointly learn a model of their data
• But they can’t share their private data with the other parties
Assumptions
• Parameters of the model will be received by all parties
• Parties can engage in on-line secure communications
• External parties might be used to outsource computation or initialize cryptographic primitives
7. The Trusted Party “Solution”
Each data provider sends its plain-text data over a secure channel to a Trusted Party, who runs the algorithm and returns the result to the parties.
The Trusted Party assumption:
• Introduces a single point of failure
• Relies on weak incentives
• Requires agreement between all data providers
=> Useful but unrealistic. Maybe it can be simulated?
8. Secure Multi-Party Computation (MPC)
Public: the function f
Private: input x_i (party i)
Goal: compute y = f(x_1, …, x_n) in a way that each party
learns y (and nothing else!)
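The trusted party can indeed be simulated cryptographically. As a minimal illustration (not the paper's actual protocol; modulus and names are illustrative), additive secret sharing lets parties compute a sum so that only the result y is revealed:

```python
import secrets

Q = 2**61 - 1  # public modulus; all arithmetic is mod Q (illustrative choice)

def share(x, n_parties=3):
    """Split secret x into n_parties additive shares that sum to x mod Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Recombine shares; any strict subset of them reveals nothing about x."""
    return sum(shares) % Q

# Two parties secret-share their private inputs x1, x2 ...
x1, x2 = 42, 100
# ... shares are exchanged, and f(x1, x2) = x1 + x2 is computed share-wise:
sum_shares = [s1 + s2 for s1, s2 in zip(share(x1), share(x2))]
y = reconstruct(sum_shares)  # only y = 142 is revealed, not x1 or x2
```

Addition is "free" on additive shares; multiplications need extra machinery such as garbled circuits or oblivious transfer (the GC/OT mentioned below).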
9. Our Contribution
A PMPML system for vertically-partitioned linear regression
Features:
• Scalable to millions of records and hundreds of dimensions
• Formal privacy guarantees (semi-honest security)
• Open-source implementation
Tools:
• Combines standard MPC constructions (GC, OT, TI, …)
• Efficient private inner-product protocols
• Conjugate gradient descent robust to fixed-point encodings
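Fixed-point encodings matter because MPC protocols work over integers, not reals. A hypothetical encoder (FRAC_BITS and the helper names are illustrative, not the paper's implementation) shows the scaling bookkeeping an inner-product protocol must respect:

```python
FRAC_BITS = 16        # fractional precision (illustrative choice)
SCALE = 1 << FRAC_BITS

def encode(x):
    """Map a real to a scaled integer (truncating toward zero)."""
    return int(x * SCALE)

def decode(z, rescalings=1):
    """Undo the scaling; each multiplication adds one factor of SCALE."""
    return z / SCALE**rescalings

def fixed_point_inner_product(xs, ys):
    # Each product of two encodings carries SCALE**2, hence rescalings=2.
    acc = sum(encode(a) * encode(b) for a, b in zip(xs, ys))
    return decode(acc, rescalings=2)

approx = fixed_point_inner_product([1.5, -0.25], [2.0, 4.0])
exact = 1.5 * 2.0 + (-0.25) * 4.0   # = 2.0
```

Truncation error accumulates across iterations, which is why an iterative solver such as conjugate gradient descent has to be made robust to this encoding.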
10. FAQ: Why is PMPML…
Exciting?
It can provide access to previously "locked" data.
Hard?
Privacy is tricky to formalize, hard to implement,
and inherently interdisciplinary.
Worth it?
Better models while avoiding legal risks and bad PR.
11. Read It, Use It
https://github.com/schoppmp/linreg-mpc
http://eprint.iacr.org/2016/892 (PETS ’17)
15. Adria Gascon James Bell Tejas Kulkarni
Privacy-Preserving Distributed
Hypothesis Testing
16. ● Drop off in Manhattan?
● Tip over 25%?
● Was it a short journey?
● Was payment method
credit card?
Drop-off in Manhattan and tip over 25%
are significantly correlated events.
But this result is differentially private, so I cannot easily tell
if a given journey was included in the training dataset or not.
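The guarantee in the last sentence is differential privacy. A minimal sketch of the Laplace mechanism on a counting query (records, predicate and epsilon are illustrative) conveys why individual journeys stay hidden:

```python
import random

def laplace(scale):
    """Sample Laplace(0, scale): the difference of two i.i.d. exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_count(journeys, predicate, epsilon=0.5):
    """Counting queries have sensitivity 1: adding or removing one journey
    changes the count by at most 1, so Laplace(1/epsilon) noise gives
    epsilon-differential privacy."""
    return sum(map(predicate, journeys)) + laplace(1 / epsilon)

# Hypothetical journey records: (drop-off zone, tip fraction)
journeys = [("Manhattan", 0.30), ("Brooklyn", 0.10), ("Manhattan", 0.28)]
noisy = dp_count(journeys, lambda j: j[0] == "Manhattan" and j[1] > 0.25)
# noisy is centred on the true count 2, but randomised enough that no single
# journey's presence can be inferred from the released value
```

A hypothesis test built on such noisy statistics can still detect the correlation, while any single record's contribution is masked by the noise.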
18. Privacy-Preserving Model Checking
● Problem: check security properties of (private) source code.
● “Public” equivalent: MOPS [1], among others.
– The security property is expressed as a regular expression over sequences of instructions
– Find all paths in the control flow graph that match the expression
● An application of Private Regular Path Queries.
[1] Hao Chen and David Wagner. 2002. MOPS: an infrastructure for examining security properties of software. In Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS '02). ACM, New York, NY, USA, 235-244. DOI: http://dx.doi.org/10.1145/586110.586142
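In the public setting, the MOPS idea fits in a few lines. The CFG and property below are illustrative toys, not taken from MOPS itself (the classic example: chroot must be immediately followed by chdir):

```python
import re

# Toy control flow graph: node -> (operation, successor nodes)
cfg = {
    0: ("open",   [1]),
    1: ("chroot", [2, 3]),
    2: ("chdir",  [4]),   # safe path
    3: ("read",   [4]),   # unsafe: chroot not followed by chdir
    4: ("exit",   []),
}

# Security property as a regex over instruction sequences:
# a path is BAD if chroot is not immediately followed by chdir.
bad = re.compile(r".*\bchroot\b (?!chdir\b).*")

def paths(node, trace=()):
    """Enumerate all instruction sequences from `node` to an exit node."""
    op, succs = cfg[node]
    trace += (op,)
    if not succs:
        yield " ".join(trace)
    for s in succs:
        yield from paths(s, trace)

violations = [p for p in paths(0) if bad.fullmatch(p)]
# violations contains exactly the path through node 3
```

Doing this when the code (hence the CFG) is private is where the private regular path query protocol comes in.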
23. Related Work
Verification Across Intellectual Property Boundaries [2]:
[2] Chaki, Sagar, Christian Schallhart, and Helmut Veith. "Verification across intellectual property boundaries."
ACM Transactions on Software Engineering and Methodology (TOSEM) 22.2 (2013): 15.
24. Related Work
Verification Across Intellectual Property Boundaries [2]
They also say...
“While we are aware of advanced methods such as secure multiparty computation
[Goldreich 2002] and zero-knowledge proofs [Ben-Or et al. 1988], we believe that they are
impracticable for our problem, as such methods cannot be easily wrapped over given
validation tools. Finally, we believe that any advanced method without an intuitive proof for
its secrecy will be heavily opposed by the supplier—and might therefore be hard to
establish in practice.”
25. Case study: thttpd
● Tiny HTTP server
● 2 main modules:
– thttp.c (2k LOC)
– libhttp.c (4k LOC)
26. thttpd control flow graph...
● 2 main modules only
● Functions are disconnected
27. thttpd: next steps
● Adapt the private Regular Path Queries work to pushdown automata
● Find some bugs.
● Write the paper.
● Voilà!