Privacy-Preserving Data Analysis
The Alan Turing Institute & Warwick University
Based on joint work with Borja Balle, Phillipp Schoppmann,
Mariana Raykova, Jack Doerner, Samee Zahur, David
Evans, Age Chapman, Alan Davoust, Peter Buneman
What analysis on what data?
Fine-grained private data, e.g. tracking for targeted
advertising, credit scoring...
Data held by several organisations, e.g. hospitals?
Data held by individuals, e.g. on their phones?
Data owners (of course)
Adria Gascon Phillipp Schoppmann Borja Balle
Mariana Raykova Jack Doerner Samee Zahur David Evans
Distributed Linear Regression on
Atr. 1 Atr. 2 … Atr. 4 Atr. 5 … Atr. 7 Atr. 8 …
-1.0 0 54.3 … North 34 … 5 1 …
1.5 1 0.6 … South 12 … 10 0 …
-0.3 1 16.0 … East 56 … 2 0 …
0.7 0 35.0 … Centre 67 … 15 1 …
3.1 1 20.2 … West 29 … 7 1 …
Note: This is vertically-partitioned data; similar problems arise with horizontally-partitioned data
Private Multi-Party Machine Learning
• Parameters of the model will be received by all parties
• Parties can engage in on-line secure communications
• External parties might be used to outsource
computation or initialize cryptographic primitives
• Two or more parties want to jointly learn a model of
• But they can’t share their private data with other parties
The Trusted Party “Solution”
Receives plain-text data, runs
algorithm, returns result to parties
The Trusted Party assumption:
• Introduces a single point of failure
• Relies on weak incentives
• Requires agreement between all data providers
=> Useful but unrealistic. Maybe can be simulated?
Secure Multi-Party Computation (MPC)
Compute f in a way that each party
learns y (and nothing else!)
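A minimal sketch of the idea, assuming f is a simple sum and all parties are semi-honest: each party splits its input into additive shares, so that every individual share looks uniformly random and only the final y is revealed. The modulus and the function names are illustrative assumptions, not part of the system described in these slides.

```python
import secrets

# Hypothetical modulus; any value larger than the true sum works.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split `value` into n additive shares that sum to it mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_inputs, modulus=MODULUS):
    """Simulate all parties locally: returns y = sum of the private inputs."""
    n = len(private_inputs)
    # Party i sends its j-th share to party j (row i = shares of input i).
    distributed = [share(x, n, modulus) for x in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Combining the published partial sums yields y and nothing else.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    print(secure_sum([12, 30, 7]))
```

General functions need heavier machinery (for example garbled circuits or oblivious transfer), but the goal is the same: the output is learned and the inputs stay hidden.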
A PMPML system for vertically partitioned linear regression
• Scalable to millions of records and hundreds of dimensions
• Formal privacy guarantees (semi-honest security)
• Open source implementation
• Combine standard MPC constructions (GC, OT, TI, …)
• Efficient private inner product protocols
• Conjugate gradient descent robust to fixed-point encodings
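To make the inner product and fixed-point ingredients concrete, here is a hedged sketch of a two-party inner product that uses a trusted initializer (TI) to hand out correlated randomness. It is a textbook-style construction, not necessarily the protocol used in the system above; the ring size `MOD`, the scaling factor `SCALE`, and all function names are assumptions made for illustration.

```python
import secrets

MOD = 2**64      # ring size (assumption)
SCALE = 2**16    # fixed-point scaling factor (assumption)


def encode(x):
    """Encode a real number as a fixed-point element of Z_MOD."""
    return int(round(x * SCALE)) % MOD


def decode(v, scale=SCALE):
    """Decode a ring element, reading the upper half of the ring as negative."""
    if v >= MOD // 2:
        v -= MOD
    return v / scale


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % MOD


def trusted_initializer(n):
    """Deal random masks ra, rb and shares za + zb = <ra, rb> (mod MOD)."""
    ra = [secrets.randbelow(MOD) for _ in range(n)]
    rb = [secrets.randbelow(MOD) for _ in range(n)]
    za = secrets.randbelow(MOD)
    zb = (dot(ra, rb) - za) % MOD
    return (ra, za), (rb, zb)


def private_inner_product(x, y):
    """Alice holds x, Bob holds y; returns additive shares of <x, y>."""
    (ra, za), (rb, zb) = trusted_initializer(len(x))
    x_masked = [(a + r) % MOD for a, r in zip(x, ra)]  # Alice -> Bob
    y_masked = [(b + r) % MOD for b, r in zip(y, rb)]  # Bob -> Alice
    share_a = (za - dot(ra, y_masked)) % MOD           # computed by Alice
    share_b = (dot(x_masked, y) + zb) % MOD            # computed by Bob
    return share_a, share_b


if __name__ == "__main__":
    x = [encode(v) for v in (1.5, -0.3, 0.7)]
    y = [encode(v) for v in (2.0, 4.0, -1.0)]
    sa, sb = private_inner_product(x, y)
    # A product of two encodings carries the scale twice.
    print(decode((sa + sb) % MOD, SCALE * SCALE))
```

Each party only ever sees the other's vector masked by a uniformly random value, and the result stays secret-shared until both shares are combined. The rounding in `encode` is the kind of fixed-point error that an iterative solver running on top of such products has to tolerate.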
FAQ: Why is PMPML…
Can provide access to previously “locked” data
Privacy is tricky to formalize, hard to implement,
and inherently interdisciplinary
Better models while avoiding legal risks and bad
Read It, Use It
Adria Gascon James Bell Tejas Kulkarni
● Drop off in Manhattan?
● Tip over 25 %?
● Was it a short journey?
● Was payment method
Drop-off in Manhattan and tip over 25%
are significantly correlated events.
But this result is differentially private, so I cannot easily tell
if a given journey was included in the training dataset or not.
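The slides do not say which mechanism produces this guarantee. As one illustrative possibility, the sketch below uses randomized response, a classic differentially private mechanism for a single yes/no attribute such as "tip over 25 %?". The function names and the choice of epsilon are assumptions.

```python
import math
import random


def randomized_response(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction(reports, epsilon):
    """Debias the noisy reports to estimate the true fraction of 1s."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = p*f + (1 - p)*(1 - f), solved here for f.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [1] * 6000 + [0] * 14000   # synthetic data: 30% answer "yes"
    reports = [randomized_response(b, 1.0, rng) for b in true_bits]
    print(estimate_fraction(reports, 1.0))
```

Any single report is plausibly deniable, yet aggregate statistics (and hence correlations between attributes) can still be estimated from many reports.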
Problem: model-check security properties on
private source code.
Privacy-Preserving Model Checking
Problem: Check security properties on (private)
“Public” equivalent: MOPS, and some others.
– Security property expressed as regular expression over
sequences of instructions
– Find all paths in control flow graph that match path
Application of Private Regular Path Queries
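The non-private version of this check can be sketched as a search over the product of the control flow graph and an automaton for the security property. The sketch below is a plain, public regular path query; it is not the private protocol, and the tiny CFG, the instruction labels, and the property ("exec while still privileged") are hypothetical examples.

```python
from collections import deque


def find_violation(cfg, label, dfa, start, accepting, entry):
    """Return one CFG path from `entry` whose label sequence the DFA accepts.

    cfg:   node -> list of successor nodes
    label: node -> instruction label
    dfa:   (state, label) -> state, with "*" as a fallback for other labels
    """
    def step(state, node):
        return dfa.get((state, label[node]), dfa.get((state, "*")))

    first = step(start, entry)
    if first is None:
        return None
    queue = deque([(entry, first)])
    parent = {(entry, first): None}
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            # Walk the parent links back to the entry node.
            path, cur = [], (node, state)
            while cur is not None:
                path.append(cur[0])
                cur = parent[cur]
            return path[::-1]
        for succ in cfg.get(node, []):
            nxt = step(state, succ)
            if nxt is not None and (succ, nxt) not in parent:
                parent[(succ, nxt)] = (node, state)
                queue.append((succ, nxt))
    return None


# Hypothetical property: reaching execl while still running as root.
DFA = {
    ("q0", "seteuid_root"): "q1", ("q0", "*"): "q0",
    ("q1", "seteuid_user"): "q0", ("q1", "execl"): "q2", ("q1", "*"): "q1",
}
CFG = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
LABEL = {"a": "seteuid_root", "b": "seteuid_user", "c": "log", "d": "execl"}

if __name__ == "__main__":
    print(find_violation(CFG, LABEL, DFA, "q0", {"q2"}, "a"))
```

On this example the search reports the path through `c`, where privileges are never dropped before `execl`; the path through `b` is safe.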
Hao Chen and David Wagner. 2002. MOPS: an infrastructure for examining security properties of software.
In Proceedings of the 9th ACM Conference on Computer and Communications Security (CCS '02), Vijay Atluri
(Ed.). ACM, New York, NY, USA, 235-244. DOI=http://dx.doi.org/10.1145/586110.586142
Privacy-Preserving Model Checking
Verification Across Intellectual Property Boundaries:
Sagar Chaki, Christian Schallhart, and Helmut Veith. 2013. Verification across intellectual property boundaries.
ACM Transactions on Software Engineering and Methodology (TOSEM) 22, 2 (2013), 15.
Verification Across Intellectual Property Boundaries
They also say...
“While we are aware of advanced methods such as secure multiparty computation
[Goldreich 2002] and zero-knowledge proofs [Ben-Or et al. 1988], we believe that they are
impracticable for our problem, as such methods cannot be easily wrapped over given
validation tools. Finally, we believe that any advanced method without an intuitive proof for
its secrecy will be heavily opposed by the supplier—and might therefore be hard to
establish in practice.”
Case study: thttpd
Tiny http server
2 main modules (thttpd.c and libhttpd.c)
thttpd control flow graph...
2 main modules only
functions are disconnected
thttpd: next steps
Adapt private Regular Path Queries work for
Find some bugs.