Building AI with Security and Privacy in mind
1. 20 JULY 2021
BUILDING AI WITH SECURITY AND PRIVACY IN MIND
GEETA CHAUHAN
PyTorch Partner Engineering, Facebook AI
@CHAUHANG
2. CTO Connection 2021
AGENDA
01 PRIVACY CHALLENGES IN AI
02 PRIVACY PRESERVING ML
03 TOOLS & TECHNIQUES
04 STEPS FOR STARTING YOUR JOURNEY
5. CTO Connection 2021
• Privacy tradeoff: protecting data privacy vs. training AI/ML models
• Tensions associated with data minimization and retention
• Proliferation of AI/ML models heightens the lack of public understanding
• As artificial intelligence evolves, it magnifies the ability to use information in ways that can intrude on privacy interests
• Increasingly sensitive nature of data used for research raises other privacy challenges
• Sourcing of data that is free of bias
PRIVACY CHALLENGES IN AI
11. CTO Connection 2021
DIFFERENTIAL PRIVACY
Promise, made by a data holder, or a curator, to a data subject:
“You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, data sets, or information sources, are available”
~ Cynthia Dwork
12. CTO Connection 2021
DIFFERENTIAL PRIVACY
(ε, δ)-differential privacy:
∀ D and D′ that differ in one person’s data, ∀ x: ℙ[M(D) = x] ≤ exp(ε) ⋅ ℙ[M(D′) = x] + δ
The distribution of the output M(D) on database D is (nearly) the same as M(D′), where D and D′ differ in one person’s contributions.
• Parameter ε quantifies information leakage
• Parameter δ gives some slack
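One standard way to satisfy the definition above with δ = 0 is the Laplace mechanism: add Laplace noise with scale sensitivity/ε to a query result. The following is a minimal sketch, not taken from the slides. The names `laplace_noise`, `private_count`, and the `ages` data are hypothetical, and the standard-library generator is used for illustration only (it is not cryptographically secure).

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential samples with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching `predicate` (illustrative only)."""
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person's record changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data: how many people are 40 or older?
ages = [23, 35, 41, 29, 52, 67, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=random.Random(0))
print(f"noisy count: {noisy:.2f}")
```

A smaller ε means a larger noise scale, which gives a stronger privacy guarantee and a less accurate answer.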
13. CTO Connection 2021
[Diagram: each data source is corrupted with noise before function f(.) is computed over it]
DIFFERENTIAL PRIVACY
14. CTO Connection 2021
[Diagram: two parties each hold a secret share of data x and jointly compute function f(.); a trusted third party supplies random numbers to each party]
SECURE MULTI-PARTY COMPUTATION
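The idea of computing on secret shares can be illustrated with additive secret sharing, sketched below under simplifying assumptions. It covers addition only and does not model the trusted third party or its random numbers. The names `share`, `reconstruct`, and `MODULUS`, and the salary values, are hypothetical and not from the slides.

```python
import secrets

# Shares live in the integers modulo a fixed prime (2**61 - 1 is a Mersenne prime).
MODULUS = 2**61 - 1


def share(value, n_parties=2):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares; any strict subset of them looks uniformly random."""
    return sum(shares) % MODULUS


# Two inputs are shared between two parties.
alice_salary, bob_salary = 52_000, 61_000
a_shares = share(alice_salary)
b_shares = share(bob_salary)

# Party i adds the two shares it holds, without seeing either input.
local_sums = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]

# Only the combined result is revealed.
total = reconstruct(local_sums)
print(total)
```

Only the final sum is opened; each party's view before that point is a pair of random-looking numbers.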
17. CTO Connection 2021
[Diagram: data x is sent as encrypted data c to function f(.) running in an enclave, which returns the output together with function attestation]
TRUSTED EXECUTION ENVIRONMENT
18. CTO Connection 2021
FEDERATED LEARNING
Federated Learning enables devices to collaboratively train global models with privacy by default
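A common way to realize this is federated averaging: each device trains on its own data and only model weights are sent back and averaged. The sketch below assumes a tiny linear model y = w·x + b and three hypothetical clients. The names `local_update` and `federated_average` are illustrative and do not come from any specific library.

```python
def local_update(weights, data, lr=0.01, epochs=5):
    """Run a few full-batch gradient steps on one client's private data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / len(data)
        b -= lr * grad_b / len(data)
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Average client models, weighted by the number of local examples."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


# Hypothetical clients; each keeps its (x, y) pairs local. All follow y = 2x + 1.
clients = [
    [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0)],
    [(x, 2.0 * x + 1.0) for x in (3.0, 4.0)],
    [(x, 2.0 * x + 1.0) for x in (-1.0, -2.0, 5.0)],
]

global_weights = (0.0, 0.0)
for _ in range(500):
    # Only model weights leave each client, never the raw data.
    updates = [local_update(global_weights, d) for d in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])

print(global_weights)
```

The raw examples never leave the `clients` lists; the server-side loop only sees per-client weights.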
26. CTO Connection 2021
PRIVACY PRESERVING ML TECHNIQUES
Homomorphic Encryption
• Encrypted data, encrypted computations
• Zero-knowledge proof intermediate results
• May leak information when output is revealed
Differential Privacy
• Low-probability guarantees on the output
• Prevents linkage attacks
Secure MPC
• Zero-knowledge proof intermediate results
• No information is leaked through the transcript of a computation
• May leak information when other parties' output is revealed
On-device computation
• Local data privacy
• Limitations due to computation or memory on device
• Reduced ability to aggregate data from multiple devices
Trusted Execution Environments
• Hardware-isolated environment, limited in memory
• Securing data and models
• Remote attestation
Federated Learning
• Decentralized, training takes longer, heterogeneous
• Aggregate data from multiple devices / datasets, without revealing data
• Network transmission costs high for model downloads, gradient updates
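The "encrypted data, encrypted computations" point under Homomorphic Encryption above can be illustrated with a toy additively homomorphic scheme. The sketch below uses textbook Paillier with generator n + 1. It is an illustration only: the primes are far too small for real use, and the function names are hypothetical rather than taken from any library mentioned in the slides.

```python
import math
import secrets

# Toy parameters: two Mersenne primes. Real deployments need much larger primes.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p, q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # c = (n+1)^m * r^n mod n^2
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    n2 = n * n
    l_value = (pow(c, lam, n2) - 1) // n
    return (l_value * mu) % n


def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts.
    return (c1 * c2) % (n * n)


n, priv = keygen(P, Q)
c_sum = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
total = decrypt(n, priv, c_sum)
print(total)
```

Whoever computes `c_sum` never sees 20 or 22; the sum becomes visible only to the key holder at decryption, which matches the slide's caveat that information may leak once the output is revealed.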
28. CTO Connection 2021
END TO END PRIVACY PRESERVING SYSTEM
Layer | Federated Learning (FL) | FL + Secure Enclaves + DP + Secure Aggregation
Device | Intermediate model state; still prone to remembering | Clients get the secure enclave private key; client clips the model updates before adding random mask; model with DP noise
Server | Ephemeral model updates; compromised server can leak details | Logic for computing the sum of the masks and the DP noise inside secure enclaves with attestation; only-in-aggregate model updates
Network | Encrypted model updates | Randomly masked encrypted ephemeral model updates
Developer | Intermediate model state | Intermediate model state with DP
Consumer/World | Final model state | Final model state: model with user-level DP
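The clip, mask, sum, and add-noise flow described for the combined system can be sketched as follows. This is a simplified illustration under several assumptions: masks are pairwise floats rather than values in a finite field, the enclave and attestation are not modeled, and the Gaussian noise scale (`noise_multiplier * max_norm`) is a hypothetical choice. None of the names come from the slides.

```python
import math
import random


def clip(update, max_norm):
    """Scale an update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * factor for v in update]


def pairwise_masks(n_clients, dim, rng):
    """Each pair (i, j) shares a random vector: i adds it, j subtracts it."""
    masks = [[0.0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            shared = [rng.uniform(-100.0, 100.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]
                masks[j][k] -= shared[k]
    return masks


def aggregate(masked_updates, max_norm, noise_multiplier, rng):
    """Sum masked updates (masks cancel), then add Gaussian DP noise."""
    dim = len(masked_updates[0])
    total = [sum(u[k] for u in masked_updates) for k in range(dim)]
    sigma = noise_multiplier * max_norm
    return [t + rng.gauss(0.0, sigma) for t in total]


rng = random.Random(7)
MAX_NORM = 1.0
raw_updates = [[0.5, -1.5, 3.0], [4.0, 0.0, -3.0], [0.1, 0.2, 0.3]]

# Client side: clip first, then add the random mask.
clipped = [clip(u, MAX_NORM) for u in raw_updates]
masks = pairwise_masks(len(clipped), 3, rng)
masked = [[c + m for c, m in zip(cu, mu)] for cu, mu in zip(clipped, masks)]

# Server side: only the aggregate is meaningful; individual updates stay masked.
noisy_sum = aggregate(masked, MAX_NORM, noise_multiplier=1.0, rng=rng)
exact_sum = aggregate(masked, MAX_NORM, noise_multiplier=0.0, rng=rng)
print(noisy_sum)
```

Clipping bounds each client's contribution, which is what makes the added noise meaningful as a user-level DP guarantee.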
30. CTO Connection 2021
AI Broad Guidelines and Considerations
PRIVACY BY DESIGN
• Opt-In vs. Opt-Out: Making opt-in the default approach.
• Data Minimization: Collecting only the data that is needed.
• Limited Data Retention: Limiting the amount of time that data is kept.
• Transparency and Education: Ensure consumers are aware of processes that use their data.
• Privacy Review Boards: Ensure consumer privacy is prioritized across the organization.
• Responsible AI Principles: Committing AI development to principles that the company abides by.
31. CTO Connection 2021
• Understand: How might the product’s goals, its policy, and its implementation affect users from different subgroups? Identify contextual definitions of fairness
• Align: Stakeholder conversations to find consensus and outline measurement and mitigation plans
• Measure: Analyze model performance, label bias, outcomes, and other relevant signals
• Mitigate: Address observed issues in dataset, models, policies, etc.
• Monitor: Monitor effect of mitigations on subgroups, and ensure fairness analysis holds as product adapts
PRIVACY BY DESIGN
33. CTO Connection 2021
CrypTen is a platform for research in machine learning + MPC
• BGW + Beaver triples
• PyTorch-based
• Reverse-mode autograd
• GPU support
• import torch → import crypten as torch
• Designed to expose MPC in an API familiar to ML researchers that use PyTorch
https://crypten.ai/
34. CTO Connection 2021
Opacus: library that enables training PyTorch models with Differential Privacy
• PyTorch-based
• Instantiate Privacy Engine and attach to Optimizer
• Vectorized per-sample gradient computation that is 10x faster than microbatching
• Cryptographically safe pseudo-random number generator
• Extensible API
https://opacus.ai/
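The per-sample gradient idea behind DP training can be sketched without any library: compute one gradient per example, clip each to a maximum norm, sum, add Gaussian noise, then take the step. The code below is a plain-Python illustration for a model y = w·x + b. It is not the Opacus API, and the function and parameter names are hypothetical.

```python
import math
import random


def per_sample_grads(w, b, batch):
    """Gradient of squared error for each (x, y) example separately."""
    return [(2 * (w * x + b - y) * x, 2 * (w * x + b - y)) for x, y in batch]


def dp_sgd_step(params, batch, lr, max_grad_norm, noise_multiplier, rng):
    w, b = params
    clipped_sum = [0.0, 0.0]
    for g in per_sample_grads(w, b, batch):
        norm = math.hypot(*g)
        # Clip each example's gradient so no single record dominates.
        factor = min(1.0, max_grad_norm / norm) if norm > 0 else 1.0
        clipped_sum[0] += g[0] * factor
        clipped_sum[1] += g[1] * factor
    # Noise is calibrated to the clipping bound.
    sigma = noise_multiplier * max_grad_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in clipped_sum]
    n = len(batch)
    return (w - lr * noisy[0] / n, b - lr * noisy[1] / n)


rng = random.Random(0)
batch = [(1.0, 2.0), (2.0, 4.0)]
params = dp_sgd_step((0.0, 0.0), batch, lr=0.1,
                     max_grad_norm=1.0, noise_multiplier=1.0, rng=rng)
print(params)
```

Computing the per-example gradients in a Python loop, as here, is the slow path; the slide's point is that vectorizing this computation is much faster than microbatching.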
35. CTO Connection 2021
“The mission of the OpenMined community is to create an accessible ecosystem of privacy tools and education. We do this by extending popular libraries like PyTorch with advanced techniques in cryptography and differential privacy.”
“With OpenMined, people and organizations can host private datasets, allowing data scientists to train or query on data they "cannot see". The data owners retain complete control: data is never copied, moved, or shared.”
Remote Execution, Encrypted Computation, Differential Privacy
PySyft, PyGrid, Duet, TenSEAL…
https://www.openmined.org/
36. CTO Connection 2021
OTHER INDUSTRY SOLUTIONS
Private Federated Learning
Azure Confidential Computing
38. CTO Connection 2021
Level 1: Just Starting
• Intro course from courses.openmined.org
• Simple sample with Confidential VMs
Level 2: Intermediate
• Experiment with server-side FL, DP, Secure MPC
• Use tools like OpenMined, CrypTen, Opacus
Level 3: Advanced
• Experiment with Secure Enclaves, combine multiple techniques
• Experiment with on-device training for Decentralized Distributed ML
Level 4: Mature
• Advanced techniques for large-scale on-device training: HSL, VSL
• Solutions for adversarial attacks
WHERE TO START YOUR JOURNEY?
39. USE CASES
+ COVID-19 Solutions
+ Cancer Research
+ Integrity (e.g. PhotoDNA project)
+ Federated AI across Enterprise Silos
+ What problems will you solve?
43. CTO Connection 2021
THANK YOU
Big thanks to Brian Knott, Dzmitry Huba, Selena Chan, Ilya Mironov, Laurens Van Der
Maaten, Davide Testuggine, Joe Spisak, Shauna Keller, Christian Keller for inputs