Preserving privacy
while sharing data
Gordon Haff
Emerging Technology Evangelist
@ghaff
https://bitmason.blogspot.com
September 2020
The opportunity
Shared data can
accelerate innovation
and improve outcomes
in energy, telecoms,
healthcare...
The problem
But data can be private
and sensitive at the
individual person or
organization level
Source: Andrew Trask
Anonymization
Removing/tokenizing personal
data fields
Encrypting/transforming personal
data fields
Aggregation by trusted agency
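A minimal sketch of field tokenization, assuming a keyed hash (HMAC-SHA256) as the tokenizer. The key, the field list, and the sample record are hypothetical, and this is only one of several ways to tokenize.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would live in a key manager.
SECRET_KEY = b"replace-with-a-real-secret"

# Assumed list of fields treated as directly identifying.
PERSONAL_FIELDS = {"name", "email"}


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a value with a stable, keyed pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def anonymize(record: dict) -> dict:
    """Tokenize the personal fields and pass every other field through."""
    return {
        field: tokenize(str(value)) if field in PERSONAL_FIELDS else value
        for field, value in record.items()
    }


# Made-up record for illustration.
record = {
    "name": "Alice Example",
    "email": "alice@example.com",
    "zip": "02139",
    "diagnosis": "flu",
}
anonymized = anonymize(record)
```

Fields such as zip pass through unchanged, and those untouched fields are what make re-identification possible later.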
Does it work?
Sort of...
What’s personal data?
Who can you really trust?
Lack of data diversity (e.g.
k-anonymity failures)
Susceptibility to attack
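The data diversity point can be sketched as follows. The table, the column names, and the choice of quasi-identifiers are made up for illustration. The table is 3-anonymous, yet one group still leaks its diagnosis because every row in it shares the same sensitive value.

```python
from collections import defaultdict


def group_by_quasi_identifiers(rows, quasi_identifiers):
    """Bucket rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """Return k: the size of the smallest quasi-identifier group."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return min(len(group) for group in groups.values())


def homogeneous_groups(rows, quasi_identifiers, sensitive):
    """Return groups whose members all share one sensitive value."""
    groups = group_by_quasi_identifiers(rows, quasi_identifiers)
    return [
        key
        for key, group in groups.items()
        if len({row[sensitive] for row in group}) == 1
    ]


# Made-up, already generalized table.
rows = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "100**", "age": "40-49", "diagnosis": "asthma"},
    {"zip": "100**", "age": "40-49", "diagnosis": "diabetes"},
    {"zip": "100**", "age": "40-49", "diagnosis": "flu"},
]

k = k_anonymity(rows, ["zip", "age"])
leaky = homogeneous_groups(rows, ["zip", "age"], "diagnosis")
```

Anyone known to be in the first group is revealed to have flu, even though no single row can be pinned to them.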
Reconstruction
Source: US Census
Identification of patterns
Source: https://avtanski.net/
Linkage attacks
Source: Privitar
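A linkage attack can be sketched as a join between a "de-identified" release and a public dataset on shared quasi-identifiers. Both datasets, the names, and the field names below are invented for illustration.

```python
def link(released, public, quasi_identifiers, sensitive):
    """Re-identify released rows that match exactly one public record."""
    # Index the public records by their quasi-identifier values.
    index = {}
    for person in public:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person)

    matches = []
    for row in released:
        key = tuple(row[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique match re-identifies the row
            matches.append((candidates[0]["name"], row[sensitive]))
    return matches


# Made-up "anonymized" release: names removed, quasi-identifiers kept.
released = [
    {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1990, "sex": "M", "diagnosis": "flu"},
]

# Made-up public list (for example, a voter roll) that includes names.
public = [
    {"name": "Alice Example", "zip": "02139", "birth_year": 1985, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1990, "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_year": 1990, "sex": "M"},
]

reidentified = link(released, public, ["zip", "birth_year", "sex"], "diagnosis")
```

Only the first row is re-identified here. The second matches two people, which is the ambiguity k-anonymity tries to guarantee.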
Re-identification
US Census
US Census
Differential Privacy
Response to erosion of traditional Statistical
Disclosure Limitation (SDL) techniques
Widely share statistics over a set of data while
provably limiting what is revealed about any individual
Introduced in 2006 by Dwork, McSherry, Nissim, and Smith
(ε-differential privacy)
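For reference, the standard statement of ε-differential privacy. The symbols are notation chosen here, not taken from the slides: a randomized mechanism M satisfies ε-differential privacy if, for every pair of datasets D and D' that differ in one individual's record and every set S of possible outputs,

```latex
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
\]
```

A smaller ε forces the two output distributions closer together, so any one person's presence in the data changes the result less.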
Requirements
Formal model
Resist linkage attacks
Resist unknown future attacks
Effective in settings in which
extensive external information
may be available
Injects calibrated random noise into
a data set or its query results (in a
mathematically rigorous way) to
protect individual privacy
Amount of randomness trades off
privacy against utility/accuracy
https://www.accessnow.org/understanding-differential-privacy-matters-digital-rights/
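One common way to inject such noise is the Laplace mechanism, sketched below for a counting query. The data, the query, and the function names are assumptions for illustration. A count has sensitivity 1, since adding or removing one person changes it by at most 1.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query with Laplace noise of scale sensitivity/epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # one person changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(values, predicate, epsilon, rng, trials=2000):
    """Average distance between the noisy and the true count."""
    true_count = sum(1 for v in values if predicate(v))
    errors = [
        abs(private_count(values, predicate, epsilon, rng) - true_count)
        for _ in range(trials)
    ]
    return sum(errors) / trials


def over_40(age: int) -> bool:
    return age > 40


ages = [23, 35, 41, 29, 62, 57, 33, 48]  # made-up data
rng = random.Random(42)  # seeded only so the sketch is repeatable

error_strong_privacy = mean_abs_error(ages, over_40, epsilon=0.1, rng=rng)
error_weak_privacy = mean_abs_error(ages, over_40, epsilon=1.0, rng=rng)
```

A smaller ε means more noise and stronger privacy, at the cost of less accurate answers.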
Limitations
Base rate
Noise
Repeated queries
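The repeated-queries limitation can be sketched as an averaging attack, assuming each answer carries fresh Laplace noise. The true count, ε, and query count are made up. Under basic sequential composition the privacy loss adds up across queries, which is why deployments track a total privacy budget.

```python
import random


def noisy_answer(true_value: float, epsilon: float, rng: random.Random) -> float:
    """One Laplace-noised answer to a sensitivity-1 query."""
    scale = 1.0 / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


rng = random.Random(0)  # seeded only so the sketch is repeatable
true_count = 42  # made-up secret statistic
epsilon_per_query = 0.1
n_queries = 10_000

# The attacker asks the same question many times and averages the answers.
answers = [noisy_answer(true_count, epsilon_per_query, rng) for _ in range(n_queries)]
averaged = sum(answers) / n_queries

# Basic sequential composition: the epsilons of the individual queries add up.
total_epsilon = epsilon_per_query * n_queries
```

The average lands very close to the true count, even though each individual answer is off by about 10 on average.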
But what if you don’t
have a trusted
third party?
Multi-Party Computation
Collaborative analysis of siloed datasets
without trusting a third party
● Equivalent to using an incorruptible trusted party
● Parties jointly compute a function on their
inputs using a protocol
● No information is revealed about inputs
beyond what the output itself implies
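A minimal sketch of one building block, additive secret sharing, used here to compute a joint sum. It simulates all parties in one process and assumes honest-but-curious participants. The modulus, function names, and inputs are illustrative, and real protocols add networking and protection against malicious parties.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus; must exceed any possible sum


def share(value: int, n_parties: int) -> list:
    """Split a value into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs: list) -> int:
    """Simulate a joint sum in which no party sees another's raw input."""
    n = len(private_inputs)
    # Row i holds the shares party i produces; share j is sent to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Each party j adds up the shares it received and publishes only that sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Combining the published partial sums reveals only the total.
    return sum(partial_sums) % MODULUS


inputs = [120, 340, 95]  # made-up private values, one per party
total = secure_sum(inputs)
```

Any n-1 shares of a value are uniformly random, so each one reveals nothing on its own. The need to exchange shares between all parties is where the communication overhead comes from.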
Considerations
Preserve privacy and correctness
Adversarial participants
Collusion
Threat models
Overhead
Protocol distributes encrypted
(AES) shares of (masked) data
Implementations and efficiency
depend on threat assumptions
In general, low compute but high
communications overhead
Ongoing research
● Subscribe to:
https://research.redhat.com/quarterly/
● Boston University Red Hat Collaboratory
● openmined.org (PySyft)
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHat
Red Hat is the world’s leading provider of enterprise
open source software solutions. Award-winning support,
training, and consulting services make Red Hat a trusted
adviser to the Fortune 500.
Thank you
