Building AI with Security and Privacy in mind

20 July 2021
BUILDING AI WITH SECURITY AND PRIVACY IN MIND
Geeta Chauhan
PyTorch Partner Engineering, Facebook AI
@chauhang

CTO Connection 2021
AGENDA

01 Privacy Challenges in AI
02 Privacy Preserving ML
03 Tools & Techniques
04 Steps for Starting Your Journey

Privacy Challenges in AI

Centralized AI is like Closed Source of the 90s

PRIVACY CHALLENGES IN AI

• Trade-off between protecting data privacy and training AI/ML models
• Tensions associated with data minimization and retention
• Proliferation of AI/ML models heightens the lack of public understanding
• As artificial intelligence evolves, it magnifies the ability to use information in ways that can intrude on privacy interests
• The increasingly sensitive nature of data used for research raises further privacy challenges
• Sourcing data that is free of bias

Privacy Preserving ML Techniques

HOMOMORPHIC ENCRYPTION
[Diagram: Data x → Encrypted Data c → Function f(.) → Encrypted Output c']
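
The idea of applying a function to encrypted data c and obtaining an encrypted output c' can be illustrated with a toy additively homomorphic scheme. The sketch below is a minimal Paillier-style example written for illustration only; it is not from the talk. The function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`, `scale_encrypted`) and the fixed primes are assumptions, the key size is far too small to be secure, and a real system would rely on a vetted library.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier keys from two fixed Mersenne primes (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so no big exponentiation is needed here.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


def scale_encrypted(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)


n, private_key = keygen()
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
print(decrypt(n, private_key, c_sum))  # prints 42
```

The party performing `add_encrypted` never sees 20, 22, or 42; only the holder of the private key can decrypt the result, which is also where information may leak once the output is revealed.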

DIFFERENTIAL PRIVACY
Promise, made by a data holder, or a curator, to a data subject:

“You will not be affected, adversely or otherwise, by allowing
your data to be used in any study or analysis, no matter what
other studies, data sets, or information sources, are available”

~ Cynthia Dwork

(ε, δ)-DIFFERENTIAL PRIVACY

For all D and D′ that differ in one person’s data, and for all x:

$$\Pr[M(D) = x] \le \exp(\varepsilon) \cdot \Pr[M(D') = x] + \delta$$

The distribution of the output M(D) on database D is (nearly) the same as M(D′), where D and D′ differ in one person’s contributions.

• Parameter ε quantifies information leakage
• Parameter δ gives some slack

[Diagram labels: Data corrupted with noise; Function f(.); Data corrupted with noise]
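
One standard way to obtain a differentially private output is the Laplace mechanism, which adds noise scaled to sensitivity / ε. The following is a minimal sketch for a counting query, assuming pure ε-differential privacy (δ = 0) and a sensitivity of 1; the names `laplace_noise` and `private_count` are hypothetical, and naive floating-point sampling like this is not suitable for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching predicate."""
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0  # one person's data changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 52, 29, 64, 47, 38]
noisy = private_count(ages, lambda age: age >= 40, epsilon=0.5)
print(f"noisy count of people aged 40+: {noisy:.2f}")
```

A smaller ε means more noise and less information leakage; a larger ε gives a more accurate answer with a weaker guarantee.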

SECURE MULTI-PARTY COMPUTATION
[Diagram: two parties each hold a secret share of Data x and jointly compute function f(.); a trusted third party supplies random numbers to both]
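
The notion of a "secret share of Data x" can be made concrete with additive secret sharing. This is a minimal sketch under stated assumptions: arithmetic is modulo a prime `Q` chosen here for illustration, and `share`, `reconstruct`, and `add_shared` are hypothetical helper names rather than any library's API.

```python
import secrets

Q = 2**61 - 1  # prime modulus, an illustrative choice


def share(x, n_parties=2):
    """Split x into n_parties random shares that sum to x modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares


def reconstruct(shares):
    """Recombine all shares; any strict subset reveals nothing about x."""
    return sum(shares) % Q


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % Q for a, b in zip(a_shares, b_shares)]


salary_a = share(120)
salary_b = share(80)
print(reconstruct(add_shared(salary_a, salary_b)))  # prints 200
```

Addition needs no interaction between parties. Multiplication does, which is where the random numbers from the trusted third party in the diagram come in.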


ON-DEVICE COMPUTATION
[Diagram: Function f(.) is trained / evaluated on Data x on the device]

TRUSTED EXECUTION ENVIRONMENT
[Diagram: Data x is sent as Encrypted Data c to Function f(.) running in an enclave, with output and function attestation]

FEDERATED LEARNING
Federated Learning enables devices to collaboratively train global models with privacy by default

[Diagram sequence: clients holding private data, and a server]

1. Check-in: clients ask the server "Need me?", subject to eligibility criteria.
2. Selection: the server selects a subset of devices ("Yes!" / "Not now...").
3. Current model distribution: the current model (model weights and code) is sent to the selected clients.
4. On-device model training: each selected client trains the model and produces an updated model.
5. Model update sharing: clients send a model delta to the server. Collection is focused, and deltas are ephemeral.
6. Global model optimization: the server performs weighted delta aggregation and optimizes the current model using the weighted delta.
7. Monitor and snapshot progress. Repeat until the model converges.
PRIVACY PRESERVING ML TECHNIQUES

Homomorphic Encryption
• Encrypted data, encrypted computations
• Zero-knowledge proof intermediate results
• May leak information when output is revealed

Differential Privacy
• Low-probability guarantees on the output
• Prevents linkage attacks

Secure MPC
• Zero-knowledge proof intermediate results
• No information is leaked through the transcript of a computation
• May leak information when other parties' output is revealed

On-device computation
• Local data privacy
• Limited by the computation and memory available on the device
• Reduced ability to aggregate data from multiple devices

Trusted Execution Environments
• Hardware-isolated environment, limited in memory
• Securing data and models
• Remote attestation

Federated Learning
• Decentralized and heterogeneous; training takes longer
• Aggregates data from multiple devices / datasets without revealing the data
• High network transmission costs for model downloads and gradient updates

WHAT CAN BE SEEN?
[Diagram labels: Training (clients, server) and Inference; Intermediate model state, Encrypted model update, Ephemeral model update, Final model state; "Can we improve?"]

END TO END PRIVACY PRESERVING SYSTEM

Comparison of Federated Learning (FL) with FL + Secure Enclaves + DP + Secure Aggregation:

Device
• FL: Intermediate model state; still prone to remembering
• FL + Secure Enclaves + DP + Secure Aggregation: clients get the secure enclave private key; client clips the model updates before adding random mask; model with DP noise

Server
• FL: Ephemeral model updates; a compromised server can leak details
• FL + Secure Enclaves + DP + Secure Aggregation: logic for computing the sum of the masks and the DP noise runs inside secure enclaves with attestation; only-in-aggregate model updates

Network
• FL: Encrypted model updates
• FL + Secure Enclaves + DP + Secure Aggregation: randomly masked, encrypted, ephemeral model updates

Developer
• FL: Intermediate model state
• FL + Secure Enclaves + DP + Secure Aggregation: intermediate model state with DP

Consumer/World
• FL: Final model state
• FL + Secure Enclaves + DP + Secure Aggregation: final model state, a model with user-level DP
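
The device and server rows of the combined system (clip the update, add a random mask, sum the masks and add DP noise during aggregation) can be sketched as follows. This is a simplified illustration under several assumptions: masks are pairwise floating-point values that cancel in the sum, the "enclave" is modeled as an ordinary function, and the names `clip`, `pairwise_masks`, and `aggregate` are hypothetical. Real secure aggregation uses key agreement and finite-field arithmetic.

```python
import math
import random


def clip(update, max_norm):
    """Scale the update so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def pairwise_masks(n_clients, dim, rng):
    """Each client pair shares a random vector: one adds it, the other subtracts."""
    masks = [[0.0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            shared = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]
                masks[j][k] -= shared[k]
    return masks


def aggregate(masked_updates, max_norm, noise_multiplier, rng):
    """Sum masked updates (masks cancel) and add Gaussian DP noise."""
    dim = len(masked_updates[0])
    total = [sum(u[k] for u in masked_updates) for k in range(dim)]
    sigma = noise_multiplier * max_norm
    return [t + rng.gauss(0.0, sigma) for t in total]


rng = random.Random(42)
raw = [[0.3, -2.0], [5.0, 1.0], [-0.4, 0.2]]
clipped = [clip(u, max_norm=1.0) for u in raw]
masks = pairwise_masks(len(raw), 2, rng)
masked = [[c + m for c, m in zip(cu, mu)] for cu, mu in zip(clipped, masks)]
print(aggregate(masked, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Each individual entry in `masked` looks random to the server, while the sum still equals the sum of the clipped updates (up to floating-point error) before noise is added.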

Tools & Techniques

PRIVACY BY DESIGN
AI Broad Guidelines and Considerations

• Opt-In vs. Opt-Out: Making opt-in the default approach.
• Data Minimization: Collecting only the data that is needed.
• Limited Data Retention: Limiting the amount of time that data is kept.
• Transparency and Education: Ensuring consumers are aware of processes that use their data.
• Privacy Review Boards: Ensuring consumer privacy is prioritized across the organization.
• Responsible AI Principles: Committing AI development to principles that the company abides by.

PRIVACY BY DESIGN

• Understand: How might the product’s goals, its policy, and its implementation affect users from different subgroups? Identify contextual definitions of fairness.
• Align: Stakeholder conversations to find consensus and outline measurement and mitigation plans.
• Measure: Analyze model performance, label bias, outcomes, and other relevant signals.
• Mitigate: Address observed issues in dataset, models, policies, etc.
• Monitor: Monitor effect of mitigations on subgroups, and ensure fairness analysis holds as product adapts.

TOOLS & LIBRARIES

CrypTen is a platform for research in machine learning + MPC

• BGW + Beaver triples
• PyTorch-based
• Reverse-mode autograd
• GPU support
• import torch → import crypten as torch
• Designed to expose MPC in an API familiar to ML researchers that use PyTorch

https://crypten.ai/
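
The Beaver-triple technique named in the list above lets parties multiply two secret-shared values while revealing only masked differences. The sketch below is a plain-Python illustration of that idea for two parties; it is not CrypTen's implementation or API. The modulus `Q` and the helper names are assumptions, and a single function plays the role of the trusted triple provider.

```python
import secrets

Q = 2**61 - 1  # prime modulus, an illustrative choice


def share(x):
    """Two-party additive sharing of x modulo Q."""
    s0 = secrets.randbelow(Q)
    return [s0, (x - s0) % Q]


def reveal(shares):
    return sum(shares) % Q


def beaver_triple():
    """Trusted provider hands out shares of random a, b and of c = a * b."""
    a, b = secrets.randbelow(Q), secrets.randbelow(Q)
    return share(a), share(b), share(a * b % Q)


def multiply(x_sh, y_sh):
    """Return shares of x * y given shares of x and y."""
    a_sh, b_sh, c_sh = beaver_triple()
    # The parties open only the masked values x - a and y - b.
    eps = reveal([(x - a) % Q for x, a in zip(x_sh, a_sh)])
    delta = reveal([(y - b) % Q for y, b in zip(y_sh, b_sh)])
    z_sh = [(c + eps * b + delta * a) % Q for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + eps * delta) % Q  # public term added by one party only
    return z_sh


print(reveal(multiply(share(6), share(7))))  # prints 42
```

Because a and b are uniformly random and used once, the opened values `eps` and `delta` reveal nothing about x and y.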

Opacus: library that enables training PyTorch models with Differential Privacy

• PyTorch-based
• Instantiate Privacy Engine and attach to Optimizer
• Vectorized per-sample gradient computation that is 10x faster than microbatching
• Cryptographically safe pseudo-random number generator
• Extensible API

https://opacus.ai/
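
Per-sample gradients matter because differentially private SGD clips each example's gradient separately before adding noise. The following is a plain-Python sketch of one such step on a small linear model, written to show the mechanics only; it does not use or reproduce the Opacus API, and the function names and the parameters `max_grad_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def per_sample_grads(w, batch):
    """Squared-loss gradient of the model y = w . x for each example."""
    grads = []
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        grads.append([err * xi for xi in x])
    return grads


def dp_sgd_step(w, batch, lr, max_grad_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    clipped_sum = [0.0] * len(w)
    for g in per_sample_grads(w, batch):
        norm = math.sqrt(sum(v * v for v in g))
        scale = min(1.0, max_grad_norm / (norm + 1e-12))
        for k, v in enumerate(g):
            clipped_sum[k] += v * scale
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in clipped_sum]
    return [wi - lr * n / len(batch) for wi, n in zip(w, noisy)]


rng = random.Random(0)
batch = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]
w = dp_sgd_step([0.0, 0.0], batch, lr=0.1, max_grad_norm=1.0,
                noise_multiplier=1.1, rng=rng)
print(w)
```

Clipping bounds how much any single example can move the model, and the noise scale is tied to that bound; tracking the resulting (ε, δ) budget over many steps is a separate accounting task not shown here.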

“The mission of the OpenMined community is to create an accessible ecosystem of privacy tools and education. We do this by extending popular libraries like PyTorch with advanced techniques in cryptography and differential privacy.”

“With OpenMined, people and organizations can host private datasets, allowing data scientists to train or query on data they "cannot see". The data owners retain complete control: data is never copied, moved, or shared.”

Remote Execution, Encrypted Computation, Differential Privacy

PySyft, PyGrid, Duet, TenSEAL…

https://www.openmined.org/

OTHER INDUSTRY SOLUTIONS
• Private Federated Learning
• Azure Confidential Computing

Getting Started Resources

WHERE TO START YOUR JOURNEY?

Level 1: Just Starting
• Intro course from courses.openmined.org
• Simple sample with Confidential VMs

Level 2: Intermediate
• Experiment with server-side FL, DP, Secure MPC
• Use tools like OpenMined, CrypTen, Opacus

Level 3: Advanced
• Experiment with Secure Enclaves, combine multiple techniques
• Experiment with on-device training for decentralized distributed ML

Level 4: Mature
• Advanced techniques for large-scale on-device training HSL, VSL
• Solutions for adversarial attacks

USE CASES
+ COVID-19 solutions
+ Cancer research
+ Integrity (e.g. PhotoDNA project)
+ Federated AI across enterprise silos
+ What problems will you solve?

PAPERS WITH CODE
• Reproducible Research - ArXiv integration
• Datasets
• Federated Learning task

https://paperswithcode.com/task/federated-learning

REFERENCES
• CrypTen: https://crypten.ai/

• CrypTen Tutorials: https://github.com/facebookresearch/CrypTen#how-crypten-works

• Opacus: https://ai.facebook.com/blog/introducing-opacus-a-high-speed-library-for-training-pytorch-mo
with-differential-privacy/

• Opacus Tutorials: https://opacus.ai/tutorials/

• Papers w/ Code- FL task: https://paperswithcode.com/task/federated-learning

• OpenMined for Covid-19 Apps: https://blog.openmined.org/providing-opensource-privacy-for-covid19/

• Udacity Course: https://www.udacity.com/course/secure-and-private-ai--ud185

• Private AI Series, OpenMined: https://courses.openmined.org/

• Active Federated Learning Paper: https://arxiv.org/pdf/1909.12641.pdf

• Fair Resource allocation in FL: https://arxiv.org/pdf/1905.10497.pdf

• Ditto: Fair & Robust FL through Personalization: https://arxiv.org/pdf/2012.04221.pdf

• Resilient: Failure resilient inference: https://arxiv.org/pdf/2002.07386.pdf

• Owkin: https://owkin.com/

QUESTIONS?

Contact:

Email: gchauhan@fb.com

Linkedin: https://www.linkedin.com/in/geetachauhan/

THANK YOU
Big thanks to Brian Knott, Dzmitry Huba, Selena Chan, Ilya Mironov, Laurens Van Der
Maaten, Davide Testuggine, Joe Spisak, Shauna Keller, Christian Keller for inputs
