Privacy in Machine Learning
Sharmistha Chatterjee
Senior Manager, Data Sciences at Publicis Sapient
Author | Speaker | GDE for ML
@sharmichat82
TensorFlow Encrypted and Differential Privacy
Key Takeaways
● Privacy Protection with AI
● Impact of a Privacy Breach
● Attack Types and the Need for Protection in ML
● TensorFlow Packages for Privacy in AI (Standalone and Federated)
● Demo: Classification Model with Neural Structured Learning
● Demo: TensorFlow Encrypted – Federated Learning
● Demo: Differential Privacy & Membership Inference Attacks
Is Companies' Data Safe?
Penalties Incurred
“The magnitude of this penalty resets the baseline for privacy cases — including for any future violation by Facebook — and sends a strong message to every company in America that collects consumers' data.” – cnbc.com
Source - https://www.statista.com/chart/18805/highest-penalties-in-privacy-enforcement-actions-worldwide/
Different Types of Attacks
Private Information Leakage
Source - Simple Demographics Often Identify People Uniquely (L. Sweeney, 2000)
Different Types of Attacks
Private Information Leakage
● Syntactic Attacks
● Linking Attacks
● Machine Learning Attacks
Isolation attack: anonymized Netflix DB linked with publicly available IMDb ratings
Source - https://course.ece.cmu.edu/~ece734/lectures/lecture-2018-10-08-deanonymization.pdf
Different Types of Attacks
Where and How
● Model API (training and prediction) based attacks in the cloud
● Membership Inference Attacks
● Model Inversion and Model Reconstruction Attacks
[Diagram: raw data (db, cache) flows through feature extraction into feature vectors (SVM, KNN); training produces the ML model and its output. Reconstruction attacks (white-box) target the feature vectors, while model inversion attacks (black-box) target the model output.]
Standalone vs Federated Attacks
Learning by differencing two datasets: Active Inference Attacks
● Update the target record's features along the ascending gradients of the global/local model
● Non-member instances don't change the gradients
● Measure the information leakage of model f_Δ, trained on D ∪ D_Δ, relative to model f trained on D, after adding or modifying records in the dataset
Source - Comprehensive Privacy Analysis of Deep Learning, https://arxiv.org/pdf/1812.00910.pdf
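A minimal sketch of the active attack step described above, with a hypothetical toy model and record (names, shapes, and step size are illustrative, not from the source): the adversary ascends the loss gradient with respect to a candidate record's features and watches how later model updates respond.

    import tensorflow as tf

    # Toy target model and candidate record (hypothetical)
    model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation='softmax')])
    model.build(input_shape=(1, 4))
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

    x = tf.Variable(tf.random.normal([1, 4]))  # candidate record under test
    y = tf.constant([1])                       # its class label

    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    x.assign_add(0.1 * grad)  # gradient *ascent* on the input features

    # If x is a training member, subsequent global updates pull its loss
    # back down; for non-members the gradients barely change, which is
    # exactly the signal that leaks membership.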
Privacy Protection Bills and Acts
Integrity, Confidentiality, Authentication, Authorization, Non-repudiation, Availability
● Health Insurance Portability and Accountability Act (HIPAA)
● NIST 800-171
● The Gramm-Leach-Bliley Act (GLB Act or GLBA)
● Federal Information Security Management Act (FISMA)
● GDPR (General Data Protection Regulation)
● EU-US and Swiss-US Privacy Shield Framework
Source - https://wfanet.org/knowledge/item/2018/11/28/GDPR-the-emergence-of-a-global-standard-on-privacy/
Privacy Should Not Be a Luxury Good
Sundar Pichai: “Yes, we use data to make products more helpful for everyone. But we also protect your information.”
Source - https://www.nytimes.com/2019/05/07/opinion/google-sundar-pichai-privacy.html
Data Anonymization at Different Levels
Categories of personal data:
● Direct identifiers (name, address, Social Security number)
● Indirect identifiers (DOB, zip code, medical record number, IP address, geo-location)
● Data linked to multiple individuals (movie preferences, retail preferences)
● Data related to individuals (weather)
● Data not linked to any individual (census data, surveys)
Conventional anonymization techniques:
● K-Anonymity
● L-Diversity
● T-Closeness
Source : https://georgetownlawtechreview.org/re-identification-of-anonymized-data/GLTR-04-2017/
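To make the first technique concrete, here is a minimal sketch (hypothetical column names, pandas assumed) that checks whether a table satisfies k-anonymity over its quasi-identifiers:

    import pandas as pd

    def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list, k: int) -> bool:
        """A table is k-anonymous if every combination of quasi-identifier
        values is shared by at least k rows."""
        group_sizes = df.groupby(quasi_identifiers).size()
        return bool((group_sizes >= k).all())

    # Hypothetical example: DOB and zip code as quasi-identifiers
    df = pd.DataFrame({
        'dob':  ['1980', '1980', '1980', '1991'],
        'zip':  ['94110', '94110', '94110', '10001'],
        'diag': ['flu', 'cold', 'flu', 'asthma'],
    })
    print(is_k_anonymous(df, ['dob', 'zip'], k=2))  # False: the 1991/10001 group has only 1 row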
Neural Structured Learning Framework with TensorFlow
Train with Adversarial Perturbations
● Higher accuracy
● Robustness against adversarial attacks (accuracy of 76.2% vs 79.3%)
● Less labelled data needed
Source : https://www.tensorflow.org/neural_structured_learning/framework#step-by-step_tutori
Neural Structured Learning with TensorFlow
Adversarial Perturbations on Training with Sensitive Personal Information
● AdvRegConfig
● AdvNeighborConfig
● AdversarialRegularization
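A minimal sketch of wiring these NSL pieces together (the toy architecture and hyperparameters are illustrative assumptions, not from the source):

    import neural_structured_learning as nsl
    import tensorflow as tf

    # Toy base classifier (hypothetical architecture)
    base_model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28), name='feature'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

    # Configure and apply adversarial regularization
    adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
    adv_model = nsl.keras.AdversarialRegularization(base_model, adv_config=adv_config)

    adv_model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
    # NSL expects features and labels packed into a dict:
    # adv_model.fit({'feature': x_train, 'label': y_train}, batch_size=32, epochs=5)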
Encryption in Federated Learning
Secure Computation using TensorFlow
[Diagram: devices hold private data and train local models; a federated server aggregates them into a global model under controlled data access. The risk lies in storing and processing the data.]
● Encrypted data set
● Encrypted training
● Encrypted prediction
Source : https://blog.tensorflow.org/2019/03/introducing-tensorflow-federated.html
TensorFlow Encrypted
Public Training vs Private Training
Source : https://conferences.oreilly.com/artificial-intelligence/ai-ny-2019/cdn.oreillystatic.com/en/assets/1/event/291/Privacy-preserving%20machine%20learning%20in%20TensorFlow%20with%20TF%20Encrypted%20Presentation.pdf
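Private training in TF Encrypted rests on secure multi-party computation. A toy sketch of the additive secret sharing idea it builds on (plain Python, for intuition only, not the library's actual protocol):

    import secrets

    MODULUS = 2 ** 64  # shares live in a finite ring

    def share(x, n=3):
        """Split integer x into n additive shares; any n-1 shares reveal nothing."""
        shares = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
        shares.append((x - sum(shares)) % MODULUS)
        return shares

    def reconstruct(shares):
        return sum(shares) % MODULUS

    # Each party holds one share; addition works share-wise, without decryption
    a_shares, b_shares = share(42), share(100)
    sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
    assert reconstruct(sum_shares) == 142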
Differential Privacy
Scalable Distributed Systems
● Encodes general patterns while keeping individual information private
● Protects against differencing, linkage, and reconstruction attacks
● Enables dataset analysis and statistics (mean, variance, median) with no information leakage
● Suitable for large datasets
● Abstraction – no behavior change on addition/removal of a single point
Differentially Private Machine Learning Models
Metrics
● Two adjacent datasets D and D′, differing in a single individual, must yield statistically indistinguishable models: 𝜃 ~ p(D) vs 𝜃′ ~ p(D′)
● Sensitivity / amount of noise added
● Privacy budget / number of queries answered
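For reference, the standard formal statement of this guarantee: a randomized mechanism M satisfies (𝜀, 𝛿)-differential privacy if, for all adjacent datasets D, D′ and every set of outputs S,

    Pr[M(D) ∈ S] ≤ e^𝜀 · Pr[M(D′) ∈ S] + 𝛿

with 𝜀 the privacy budget and 𝛿 the small probability of exceeding it.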
Why Bolt-on DP – Trust and Performance
Post-Convergence Noise Injection
[Diagram: several databases feed an aggregated model; noise is injected only after the model converges.]
TensorFlow Encrypted with Differential Privacy
Hybrid Distributed Systems
[Diagram: a model owner distributes trust across Server-1 and Server-2, combining TF-Encrypted with DP.]
● Multi-party communication (share, serve, shutdown)
● hook = sy.KerasHook(tf.keras)
● Distributed trust and control
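A sketch of the share/serve/shutdown flow through PySyft's TF Encrypted integration, following the tf-encrypted Keras workflow cited in the references; the worker hosts and model are assumptions, and the exact API varies across PySyft versions:

    import tensorflow as tf
    import syft as sy

    hook = sy.KerasHook(tf.keras)  # patches tf.keras with share/serve methods

    # A trained Keras model (hypothetical)
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, input_shape=(784,)),
    ])

    # Three TFE workers jointly hold the model as secret shares
    alice = sy.TFEWorker(host='localhost:4000')
    bob = sy.TFEWorker(host='localhost:4001')
    carol = sy.TFEWorker(host='localhost:4002')

    model.share(alice, bob, carol)   # secret-share weights across the workers
    model.serve(num_requests=3)      # answer encrypted prediction requests
    model.shutdown_workers()         # tear down the multi-party session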
Mechanism with TensorFlow Privacy
Properties
● Gaussian noise (sum of squares)
● Privacy vs utility trade-off
● Joint clipping of gradients; sampling policies for minibatches
● Privacy for groups of vectors
● Gradient descent optimizer (gradient noising)
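A minimal sketch of the DP optimizer in TensorFlow Privacy (hyperparameters are illustrative): it clips per-example gradients and adds Gaussian noise, so the loss must be left unreduced.

    import tensorflow as tf
    from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import DPKerasSGDOptimizer

    optimizer = DPKerasSGDOptimizer(
        l2_norm_clip=1.0,       # joint clipping of per-example gradients
        noise_multiplier=1.1,   # Gaussian noise scale relative to the clip norm
        num_microbatches=256,   # gradients averaged per microbatch
        learning_rate=0.15,
    )

    # Loss must be computed per example (no reduction) so gradients can be clipped
    loss = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.losses.Reduction.NONE)

    # model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])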
Demo TensorFlow Privacy
Use of Rényi Differential Privacy
● Selecting and configuring privacy mechanisms
● Training
● Accounting for the privacy loss (𝜀, 𝛿)
A randomized mechanism f : D → R is said to have Rényi differential privacy of order 𝛼, or (𝛼, 𝜀)-RDP, if for any adjacent D, D′ it holds that D_𝛼(f(D) || f(D′)) ≤ 𝜀.
Source - Machine Learning with Differential Privacy in TensorFlow: http://www.cleverhans.io/privacy/2019/03/26/machine-learning-with-differential-privacy-in-tensorflow.html
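TensorFlow Privacy ships an RDP-based accountant; a minimal sketch of computing the resulting (𝜀, 𝛿) for a training run (dataset size and hyperparameters are illustrative):

    from tensorflow_privacy.privacy.analysis.compute_dp_sgd_privacy import compute_dp_sgd_privacy

    # Returns (epsilon, optimal RDP order) for the given training configuration
    eps, opt_order = compute_dp_sgd_privacy(
        n=60000,               # number of training examples
        batch_size=256,
        noise_multiplier=1.1,
        epochs=15,
        delta=1e-5,            # conventionally much smaller than 1/n
    )
    print(f'DP-SGD achieves ({eps:.2f}, 1e-05)-DP at RDP order {opt_order}')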
Conclusion
Consumable, API-driven AI solution
● Use AI to build scalable distributed systems
● Combine with privacy at deployment
[Diagram: data composition, data obfuscation, and data reconstruction across a pipeline of Cloud Pub/Sub, Cloud Dataflow, Cloud Storage, and BigQuery, feeding AI modeling.]
References
● https://github.com/sharmi1206/nsl_personalinfo_classification
● https://github.com/sharmi1206/Membership_Inference_Attack_DP
● http://techairesearch.com/blogs
● https://github.com/OpenMined/PySyft
● https://github.com/tf-encrypted/tf-encrypted
● https://github.com/sharmi1206/differential-privacy-tensorflow
● https://conferences.oreilly.com/artificial-intelligence/ai-ny-2019/cdn.oreillystatic.com/en/assets/1/event/291/Privacy-preserving%20machine%20learning%20in%20TensorFlow%20with%20TF%20Encrypted%20Presentation.pdf
● https://github.com/work-hard-play-harder/DP-MIA
● https://arxiv.org/pdf/1812.00910.pdf
● https://www.cs.cornell.edu/~shmat/shmat_oak17.pdf
● https://arxiv.org/abs/1810.08130
● https://medium.com/dropoutlabs/encrypted-deep-learning-training-and-predictions-with-tf-encrypted-keras-557193284f44
Appendix – Membership Inference Attacks with TensorFlow Privacy
[Diagram: a (data record, class label) pair is fed to the target model; the predicted class label goes to the attack model, which decides whether the data record is in the training set.]
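A minimal sketch of the simplest loss-threshold variant of this attack (synthetic losses and an illustrative threshold; a full Shokri-style attack trains shadow models and an attack classifier instead):

    import numpy as np

    # Per-example losses of the target model (hypothetical values):
    # members tend to be memorized, so their losses are lower.
    member_losses = np.array([0.05, 0.10, 0.02, 0.08])
    nonmember_losses = np.array([0.90, 1.20, 0.70, 1.50])

    threshold = 0.5  # illustrative; in practice tuned on shadow models

    def infer_membership(loss, threshold):
        """Predict 'member' when the target model's loss on the record is low."""
        return loss < threshold

    print(infer_membership(member_losses, threshold))     # mostly True
    print(infer_membership(nonmember_losses, threshold))  # mostly False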
Bolt-on DP vs DP
Customized Model
Bolt-on DP
Properties
● Convex loss
● Output perturbation
● Input relationships preserved
● Maximized utility
● Less runtime overhead
● Better convergence
Questions?
Follow http://techairesearch.com/blogs
