Alleviating Privacy Attacks via
Causal Learning
Shruti Tople, Amit Sharma, Aditya V. Nori
Microsoft Research
https://arxiv.org/abs/1909.12732
https://github.com/microsoft/robustdg
Motivation: ML models leak information
about data points in the training set
[Figure: health records (HIV/AIDS patients) are used to train a neural network that is offered as ML-as-a-service; each record is either a member of the train dataset or a non-member.]
Membership Inference Attacks
[SP’17][CSF’18][NDSS’19][SP’19]
The likely reason is overfitting
• Neural networks or associational models overfit to the training dataset
• Membership inference attacks exploit differences in prediction score for training and test data [CSF’18]
• Privacy risk can increase when a model is deployed to different distributions
• E.g., a hospital in one region shares the model with other regions
[Figure: example model outputs of 95%, 85%, and 75%, with the labels "Overfitting to dataset" and "Overfitting to distribution".]
Poor generalization across distributions exacerbates
membership inference risk.
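The score gap described above can be turned into a simple attack. The following is a minimal sketch, assuming a confidence-threshold adversary in the spirit of [CSF’18]; the confidence distributions (means of 0.95, 0.85 and 0.75), the threshold of 0.90 and all function names are hypothetical choices for illustration, not values or code from the talk.

```python
import numpy as np

def threshold_attack(member_conf, nonmember_conf, threshold):
    """Guess 'member' whenever the model's confidence exceeds the threshold."""
    tpr = float(np.mean(member_conf > threshold))     # members correctly flagged
    fpr = float(np.mean(nonmember_conf > threshold))  # non-members wrongly flagged
    # Accuracy over an equal mix of members and non-members.
    accuracy = 0.5 * (tpr + (1.0 - fpr))
    return accuracy, tpr, fpr

rng = np.random.default_rng(0)
# Hypothetical confidence scores (assumed, not measured):
member_conf = np.clip(rng.normal(0.95, 0.03, 1000), 0.0, 1.0)   # training points
same_dist_conf = np.clip(rng.normal(0.85, 0.08, 1000), 0.0, 1.0)  # test points from P
shifted_conf = np.clip(rng.normal(0.75, 0.10, 1000), 0.0, 1.0)    # test points from P*

acc_same, _, _ = threshold_attack(member_conf, same_dist_conf, 0.90)
acc_shift, _, _ = threshold_attack(member_conf, shifted_conf, 0.90)
print(f"attack accuracy, same distribution:    {acc_same:.2f}")
print(f"attack accuracy, shifted distribution: {acc_shift:.2f}")
```

Under these assumed scores the attack is easier against non-members drawn from the shifted distribution, because their confidences sit further from those of the training points.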
Can causal ML models help?
Contributions
1. Causal models provide stronger (differential) privacy guarantees than
associational models.
• Due to their better generalizability on new distributions.
2. And hence are more robust to membership inference attacks.
• As the training dataset size → ∞, membership inference attack’s accuracy drops to a
random guess.
3. We empirically demonstrate privacy benefits of causal models across 5 datasets.
• Associational models exhibit up to 80% attack accuracy whereas causal models exhibit
attack accuracy close to 50%.
[Figure: Causal Learning and Privacy]
Background: Causal Learning
[Figure: causal graph for the outcome Y (Disease Severity), with Blood Pressure and Heart Rate labelled X_parent and Weight and Age labelled X_1 and X_2.]
Use a structural causal model (SCM) that defines which conditional probabilities are invariant across different distributions [Pearl’09].
Causal Predictive Model: A prediction model based only on the parents of the outcome Y.
What if the SCM is not known? Learn an invariant feature representation across distributions [ABGD’19, MTS’20].
For ML models, causal learning can be useful for
• fairness [KLRS’17]
• explainability [DSZ’16, MTS’19]
• privacy [this work]
Why is a model based on causal parents invariant across data distributions?
[Figure: causal graph over Y with nodes X_S0, X_PA, X_S1, X_S2, X_CH and X_cp, shown with and without an intervention.]
$P(Y \mid X_{PA})$ is invariant across different distributions, unless there is a change in the true data-generating process for Y.
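The invariance claim can be illustrated with a small simulation. This is a sketch under assumptions that are not in the talk: a hypothetical linear SCM with one parent and one child of Y, an intervention that flips the child's mechanism in the new distribution, and ordinary least squares standing in for both the causal and the associational learner.

```python
import numpy as np

def sample(n, child_coef, rng):
    """Hypothetical SCM: X_PA -> Y -> X_CH; only the child mechanism varies."""
    x_pa = rng.normal(size=n)
    y = 2.0 * x_pa + rng.normal(scale=1.0, size=n)           # P(Y | X_PA) is fixed
    x_ch = child_coef * y + rng.normal(scale=0.1, size=n)    # changes under intervention
    return x_pa, x_ch, y

def fit(features, y):
    """Least-squares weights (no intercept; all variables are zero-mean)."""
    weights, *_ = np.linalg.lstsq(features, y, rcond=None)
    return weights

def mse(features, y, weights):
    return float(np.mean((features @ weights - y) ** 2))

rng = np.random.default_rng(0)
pa, ch, y = sample(5000, 1.0, rng)           # training distribution P
pa_s, ch_s, y_s = sample(5000, -1.0, rng)    # shifted distribution P*

w_causal = fit(pa[:, None], y)                    # uses the parent only
w_assoc = fit(np.column_stack([pa, ch]), y)       # uses every feature

causal_in = mse(pa[:, None], y, w_causal)
causal_out = mse(pa_s[:, None], y_s, w_causal)
assoc_in = mse(np.column_stack([pa, ch]), y, w_assoc)
assoc_out = mse(np.column_stack([pa_s, ch_s]), y_s, w_assoc)
print(f"causal model: in={causal_in:.2f}  out={causal_out:.2f}")
print(f"assoc. model: in={assoc_in:.2f}  out={assoc_out:.2f}")
```

In this toy setting the associational model fits the training distribution better by leaning on the child feature, but its error grows sharply once the child mechanism changes, while the parent-only model's error stays roughly the same.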
Result 1: Worst-case out-of-distribution error of a causal model is lower than that of an associational model.
For any model $h$, and $P^*$ such that $P^*(Y \mid X_{PA}) = P(Y \mid X_{PA})$:
In-Distribution Error (IDE): $\mathrm{IDE}_P(h, y) = L_P(h, y) - L_{S \sim P}(h, y)$
where $L_P(h, y)$ is the expected loss on the same distribution as the train data.
Out-of-Distribution Error (ODE): $\mathrm{ODE}_{P,P^*}(h, y) = L_{P^*}(h, y) - L_{S \sim P}(h, y)$
where $L_{P^*}(h, y)$ is the expected loss on a different distribution $P^*$ than the train data.
Proof Idea. Simple case: Assume $y = f(\mathbf{x})$ is deterministic.
Causal model: $\mathrm{ODE}_{P,P^*}(h_c, y) \le \mathrm{IDE}_P(h_c, y) + \mathrm{disc}_L(P, P^*)$
Associational model: $\mathrm{ODE}_{P,P^*}(h_a, y) \le \mathrm{IDE}_P(h_a, y) + \mathrm{disc}_L(P, P^*) + L_{P^*}(h^{OPT}_{a,P}, y)$
$\Rightarrow \max_{P^*} \mathrm{ODEBound}_{P,P^*}(h_c, y) \le \max_{P^*} \mathrm{ODEBound}_{P,P^*}(h_a, y)$
Here $\mathrm{disc}_L(P, P^*)$ is the discrepancy between the distributions $P$ and $P^*$, and the extra term for the associational model appears because the optimal $h_a$ on $P$ is not optimal on $P^*$.
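One way to read the bound for the causal model is to split the out-of-distribution error into the in-distribution error plus a shift term. The step below is a sketch, assuming that the discrepancy between $P$ and $P^*$ upper-bounds the change in expected loss for the hypothesis in question; the slides do not spell out the exact definition of the discrepancy, so that assumption is marked in the last line.

```latex
\begin{align*}
\mathrm{ODE}_{P,P^*}(h, y)
  &= L_{P^*}(h, y) - L_{S \sim P}(h, y) \\
  &= \bigl[ L_{P}(h, y) - L_{S \sim P}(h, y) \bigr]
     + \bigl[ L_{P^*}(h, y) - L_{P}(h, y) \bigr] \\
  &= \mathrm{IDE}_{P}(h, y) + \bigl[ L_{P^*}(h, y) - L_{P}(h, y) \bigr] \\
  &\le \mathrm{IDE}_{P}(h, y) + \mathrm{disc}_{L}(P, P^*)
  \quad \text{(assumed: } \lvert L_{P^*}(h, y) - L_{P}(h, y) \rvert \le \mathrm{disc}_{L}(P, P^*) \text{)}
\end{align*}
```

The first three lines follow directly from the definitions of IDE and ODE; only the final inequality relies on the assumed property of the discrepancy.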
And better generalization results in lower
sensitivity for a causal model
Sensitivity: If a single data point $(\mathbf{x}, y) \sim P^*$ is added to the train dataset $S$ to create $S'$, how much does the learnt model $h^{\min}_S$ change?
Since the optimal causal model is the same across all $P^*$, adding any $(\mathbf{x}, y) \sim P^*$ has less impact on a trained causal model.
[Figure: sensitivity for a causal model compared with sensitivity for an associational model.]
Main Result: A causal model has stronger
Differential Privacy guarantees
Let $M$ be a mechanism that returns an ML model trained over dataset $S$, $M(S) = h$.
Differential Privacy [DR’14]: A learning mechanism $M$ satisfies $\epsilon$-differential privacy if for any two datasets $S, S'$ that differ in one data point, and any set of models $H$,
$\frac{\Pr(M(S) \in H)}{\Pr(M(S') \in H)} \le e^{\epsilon}$.
(Smaller $\epsilon$ values provide better privacy guarantees)
Since lower sensitivity ⇒ lower $\epsilon$,
Theorem: When equivalent Laplace noise is added and models are trained on the same dataset, causal mechanism $M_C$ provides $\epsilon_C$-DP and associational mechanism $M_A$ provides $\epsilon_A$-DP guarantees such that:
$\epsilon_C \le \epsilon_A$
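The link between sensitivity and the privacy budget can be sketched with the standard Laplace mechanism from [DR’14], where noise of scale b = sensitivity / epsilon gives epsilon-DP for a query with that L1 sensitivity. The sensitivities, noise scale and parameter vector below are hypothetical numbers chosen for illustration, and the mechanism analysed in the paper may differ in its details.

```python
import numpy as np

def epsilon_from_noise(sensitivity, noise_scale):
    """Laplace mechanism: scale b = sensitivity / epsilon, so epsilon = sensitivity / b."""
    return sensitivity / noise_scale

def laplace_mechanism(params, noise_scale, rng):
    """Release model parameters with independent Laplace noise on each coordinate."""
    return params + rng.laplace(loc=0.0, scale=noise_scale, size=params.shape)

rng = np.random.default_rng(0)
noise_scale = 0.5                      # same noise for both models (assumed value)
sens_causal, sens_assoc = 0.1, 0.4     # hypothetical L1 sensitivities

eps_c = epsilon_from_noise(sens_causal, noise_scale)
eps_a = epsilon_from_noise(sens_assoc, noise_scale)
noisy_params = laplace_mechanism(np.array([1.0, -2.0, 0.5]), noise_scale, rng)
print(f"epsilon (causal) = {eps_c:.2f}, epsilon (associational) = {eps_a:.2f}")
```

With the noise scale held fixed, the model with the smaller sensitivity ends up with the smaller epsilon, which is the direction of the inequality in the theorem.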
Therefore, causal models are more robust to
membership inference (MI) attacks
Advantage of an MI adversary: (True Positive Rate − False Positive Rate) in detecting whether $x$ is from the training dataset or not.
[From Yeom et al. CSF’18] Membership advantage of an adversary is bounded by $e^{\epsilon} - 1$.
Since the optimal causal models are the same for $P$ and $P^*$, as $n \to \infty$, membership advantage of a causal model $\to 0$.
Theorem: When trained on the same dataset of size 𝑛, membership
advantage of a causal model is lower than the membership advantage for an
associational model.
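The advantage metric and its bound are easy to compute directly. The sketch below uses a small made-up set of attacker guesses; the helper names are hypothetical, and only the definitions (true positive rate minus false positive rate, and the bound of e^epsilon − 1) come from the slide.

```python
import math

def membership_advantage(is_member, guessed_member):
    """True positive rate minus false positive rate of the attacker's guesses."""
    pairs = list(zip(is_member, guessed_member))
    true_pos = sum(1 for member, guess in pairs if member and guess)
    false_pos = sum(1 for member, guess in pairs if not member and guess)
    n_members = sum(1 for member in is_member if member)
    n_nonmembers = len(is_member) - n_members
    return true_pos / n_members - false_pos / n_nonmembers

def advantage_bound(epsilon):
    """Upper bound on membership advantage for an epsilon-DP mechanism [CSF'18]."""
    return math.exp(epsilon) - 1.0

# Made-up ground truth and attacker guesses (4 members, 4 non-members).
is_member = [True, True, True, True, False, False, False, False]
guessed = [True, True, True, False, True, False, False, False]

adv = membership_advantage(is_member, guessed)   # 3/4 - 1/4
print(f"advantage = {adv:.2f}")
print(f"bound at epsilon=0.2: {advantage_bound(0.2):.3f}")
print(f"bound at epsilon=0.8: {advantage_bound(0.8):.3f}")
```

For an evaluation set with equal numbers of members and non-members, attack accuracy equals 0.5 + advantage / 2, so an attack accuracy of 50% corresponds to zero advantage.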
Empirical Evaluation
Goal: Compare MI attack accuracy between
causal and associational models
[BN] When true causal structure is known
Datasets generated from Bayesian networks: Child, Sachs, Water, Alarm
Causal model: Maximum likelihood estimation (MLE) based on Y’s parents
Associational model: Neural networks with 3 linear layers
$P^*$: Noise added to conditional probabilities (uniform or additive)
[MNIST] When true causal structure is unknown
Colored MNIST dataset (Digits are correlated with color)
Causal Model: Invariant Risk Minimization, which exploits the fact that $P(Y \mid X_{PA})$ is the same across distributions [ABGD’19]
Associational Model: Empirical Risk Minimization using the same NN architecture
$P^*$: Different correlations between color and digit than the train dataset
Attacker Model: Predict whether an input belongs to train dataset or not
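For the Bayesian-network setting, a causal model of the kind described above can be sketched as a conditional probability table estimated by counting. This is an illustrative sketch only: the records, the variable names (`bp`, `hr`, `age`, `severity`) and the `fit_cpt` helper are hypothetical and are not taken from the Child, Sachs, Water or Alarm networks or from the authors' code.

```python
from collections import Counter, defaultdict

def fit_cpt(records, parents, target):
    """Maximum likelihood estimate of P(target | parents) from discrete records."""
    counts = defaultdict(Counter)
    for row in records:
        key = tuple(row[name] for name in parents)
        counts[key][row[target]] += 1
    cpt = {}
    for key, value_counts in counts.items():
        total = sum(value_counts.values())
        # Normalised counts are the MLE for a categorical distribution.
        cpt[key] = {value: n / total for value, n in value_counts.items()}
    return cpt

# Hypothetical records; 'age' is deliberately ignored because it is not a parent of Y.
records = [
    {"bp": "high", "hr": "high", "age": "old", "severity": "severe"},
    {"bp": "high", "hr": "high", "age": "young", "severity": "severe"},
    {"bp": "high", "hr": "high", "age": "old", "severity": "mild"},
    {"bp": "low", "hr": "low", "age": "young", "severity": "mild"},
]

cpt = fit_cpt(records, parents=("bp", "hr"), target="severity")
print(cpt[("high", "high")])
print(cpt[("low", "low")])
```

Because the table conditions only on the parents of the outcome, non-parent features such as `age` cannot influence the prediction, whatever their correlation with the outcome in the training data.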
[BN-Child] With uniform noise, MI attack accuracy for a causal model is near a random guess
[Figure: MI attack accuracy, about 80% for the associational model and about 50% for the causal model.]
For associational models, the attacker can guess membership in training set with 80% accuracy.
Privacy without loss in utility: Causal & DNN models achieve same prediction accuracy.
[BN-Child] MI Attack accuracy increases with
amount of noise for associational models, but
stays constant at 50% for causal models
[BN] Consistent results across all four datasets
High attack accuracy for associational models when $P^*$ (Test2) has uniform noise.
Same classification accuracy between causal and associational models.
[MNIST] MI attack accuracy is lower for invariant
risk minimizer compared to associational model
IRM model motivated by causal reasoning has 53% attack accuracy, close to random.
Associational model also fails to generalize: 16% accuracy on test set.
Model                       Train Accuracy (%)   Test Accuracy (%)   Attack Accuracy (%)
Causal Model (IRM)          70                   69                  53
Associational Model (ERM)   87                   16                  66
Conclusion
• Established theoretical connection between causality and differential privacy.
• Demonstrated the benefits of causal ML models for alleviating privacy attacks,
both theoretically and empirically.
• Code available at https://github.com/microsoft/robustdg
Future work: Investigate robustness of causal models to other kinds of adversarial attacks.
thank you!
Amit Sharma
Microsoft Research
References
• [ABGD’19] Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization. arXiv
preprint arXiv:1907.02893, 2019.
• [CSF’18] Yeom, S., Giacomelli, I., Fredrikson, M., and Jha, S. Privacy risk in machine learning: Analyzing the connection
to overfitting. CSF 2018.
• [DR’14] Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and
Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
• [DSZ’16] Anupam Datta, Shayak Sen, and Yair Zick. Algorithmic transparency via quantitative input influence: Theory
and experiments with learning systems. In Security and Privacy (SP), 2016 IEEE Symposium on, pp. 598–617. IEEE,
2016
• [KLRS’17] Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. Counterfactual fairness. In Advances in
Neural Information Processing Systems, pp. 4066–4076, 2017.
• [MTS’19] Mahajan, Divyat, Chenhao Tan, and Amit Sharma. "Preserving Causal Constraints in Counterfactual
Explanations for Machine Learning Classifiers." arXiv preprint arXiv:1912.03277 (2019).
• [MTS’20] Mahajan, Divyat, Shruti Tople and Amit Sharma. “Domain Generalization using Causal Matching”. arXiv
preprint arXiv:2006.07500, 2020.
• [NDSS’19] Salem, A., Zhang, Y., Humbert, M., Fritz, M., and Backes, M. ML-Leaks: Model and data independent
membership inference attacks and defenses on machine learning models. NDSS 2019.
• [SP’17] Shokri, R., Stronati, M., Song, C., and Shmatikov, V. Membership inference attacks against machine learning
models. Security and Privacy (SP), 2017.
• [SP’19] Nasr, M., Shokri, R., and Houmansadr, A. Comprehensive privacy analysis of deep learning: Stand-alone and
federated learning under passive and active white-box inference attacks. Security and Privacy (SP), 2019.

Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPirithiRaju
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Sérgio Sacani
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxtuking87
 
whole genome sequencing new and its types including shortgun and clone by clone
whole genome sequencing new  and its types including shortgun and clone by clonewhole genome sequencing new  and its types including shortgun and clone by clone
whole genome sequencing new and its types including shortgun and clone by clonechaudhary charan shingh university
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsDobusch Leonhard
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxpriyankatabhane
 

Recently uploaded (20)

Measures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UGMeasures of Central Tendency.pptx for UG
Measures of Central Tendency.pptx for UG
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptx
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdfKDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
KDIGO-2023-CKD-Guideline-Public-Review-Draft_5-July-2023.pdf
 
Probability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UGProbability.pptx, Types of Probability, UG
Probability.pptx, Types of Probability, UG
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
PLASMODIUM. PPTX
PLASMODIUM. PPTXPLASMODIUM. PPTX
PLASMODIUM. PPTX
 
Oxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptxOxo-Acids of Halogens and their Salts.pptx
Oxo-Acids of Halogens and their Salts.pptx
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
FBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptxFBI Profiling - Forensic Psychology.pptx
FBI Profiling - Forensic Psychology.pptx
 
Abnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptxAbnormal LFTs rate of deco and NAFLD.pptx
Abnormal LFTs rate of deco and NAFLD.pptx
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
Interferons.pptx.
Interferons.pptx.Interferons.pptx.
Interferons.pptx.
 
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
Observation of Gravitational Waves from the Coalescence of a 2.5–4.5 M⊙ Compa...
 
Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?Let’s Say Someone Did Drop the Bomb. Then What?
Let’s Say Someone Did Drop the Bomb. Then What?
 
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptxQ4-Mod-1c-Quiz-Projectile-333344444.pptx
Q4-Mod-1c-Quiz-Projectile-333344444.pptx
 
whole genome sequencing new and its types including shortgun and clone by clone
whole genome sequencing new  and its types including shortgun and clone by clonewhole genome sequencing new  and its types including shortgun and clone by clone
whole genome sequencing new and its types including shortgun and clone by clone
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and Pitfalls
 
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptxEnvironmental Acoustics- Speech interference level, acoustics calibrator.pptx
Environmental Acoustics- Speech interference level, acoustics calibrator.pptx
 

Causal Learning Boosts Privacy for ML Models

  • 1. Alleviating Privacy Attacks via Causal Learning Shruti Tople, Amit Sharma, Aditya V. Nori Microsoft Research https://arxiv.org/abs/1909.12732 https://github.com/microsoft/robustdg
  • 2. Motivation: ML models leak information about data points in the training set. [Figure: health records (HIV/AIDS patients) are used to train a neural network offered as ML-as-a-service; an input is classified as a member or non-member of the train dataset.] Membership inference attacks: [SP’17][CSF’18][NDSS’19][SP’19].
  • 3. The likely reason is overfitting. • Neural networks or associational models overfit to the training dataset. • A membership inference adversary exploits differences in prediction scores between training and test data [CSF’18]. [Figure label: overfitting to dataset; outputs of 85% and 95%.]
  • 4. The likely reason is overfitting. • Neural networks or associational models overfit to the training dataset. • Membership inference attacks exploit differences in prediction scores between training and test data [CSF’18]. • Privacy risk can increase when a model is deployed to different distributions, e.g., when a hospital in one region shares its model with other regions. Poor generalization across distributions exacerbates membership inference risk. [Figure labels: overfitting to dataset, overfitting to distribution; outputs of 85%, 95%, and 75%.]
  • 6. Can causal ML models help? Contributions: 1. Causal models provide stronger (differential) privacy guarantees than associational models, due to their better generalizability to new distributions. 2. Hence they are more robust to membership inference attacks: as the training dataset size → ∞, the attack’s accuracy drops to that of a random guess. 3. We empirically demonstrate the privacy benefits of causal models across 5 datasets: associational models exhibit up to 80% attack accuracy, whereas causal models exhibit attack accuracy close to 50%.
  • 7. Background: Causal Learning. Use a structural causal model (SCM) that defines which conditional probabilities are invariant across different distributions [Pearl’09]. [Figure: causal graph over Disease Severity ($Y$), Blood Pressure, Heart Rate, Weight, and Age, with node labels $X_{parent}$, $X_{parent}$, $X_1$, $X_2$.]
  • 8. Background: Causal Learning. Use a structural causal model (SCM) that defines which conditional probabilities are invariant across different distributions [Pearl’09]. Causal predictive model: a prediction model based only on the parents of the outcome $Y$. What if the SCM is not known? Learn a feature representation that is invariant across distributions [ABGD’19, MTS’20]. For ML models, causal learning can be useful for fairness [KLRS’17], explainability [DSZ’16, MTS’19], and privacy [this work]. [Figure: same causal graph as on slide 7.]
  • 9. Why is a model based on causal parents invariant across data distributions? [Figure: causal graph with outcome $Y$ and nodes $X_{S0}$, $X_{PA}$, $X_{S2}$, $X_{S1}$, $X_{CH}$, $X_{cp}$, with an intervention marked.]
  • 10. Why is a model based on causal parents invariant across data distributions? [Figure: the causal graph from slide 9, shown with and without the intervention.] $P(Y \mid X_{PA})$ is invariant across different distributions, unless there is a change in the true data-generating process for $Y$.
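The invariance claim on slides 9 and 10 can be illustrated with a small simulation. The following is a minimal sketch, not the experiment from the slides: it assumes a hypothetical three-variable SCM (`X_pa -> Y -> X_ch`) in which only the child mechanism changes between the train distribution and the shifted one, and it uses a simple majority-vote predictor. All names and probabilities are illustrative assumptions.

```python
import random

def sample(n, child_agree, rng):
    """Draw n rows (x_pa, x_ch, y) from a toy SCM: X_pa -> Y -> X_ch."""
    rows = []
    for _ in range(n):
        x_pa = rng.random() < 0.5
        # P(Y | X_pa) is the same in every distribution: Y copies X_pa 90% of the time.
        y = x_pa if rng.random() < 0.9 else not x_pa
        # The child mechanism is what changes between distributions.
        x_ch = y if rng.random() < child_agree else not y
        rows.append((int(x_pa), int(x_ch), int(y)))
    return rows

def fit(rows, feature_idx):
    """Majority-vote predictor of Y for each value of the chosen features."""
    counts = {}
    for row in rows:
        key = tuple(row[i] for i in feature_idx)
        counts.setdefault(key, [0, 0])[row[2]] += 1
    return {key: int(c[1] > c[0]) for key, c in counts.items()}

def accuracy(model, rows, feature_idx):
    hits = sum(model.get(tuple(r[i] for i in feature_idx), 0) == r[2] for r in rows)
    return hits / len(rows)

rng = random.Random(0)
train = sample(20000, child_agree=0.95, rng=rng)    # P
shifted = sample(20000, child_agree=0.05, rng=rng)  # P*: child mechanism changed

causal = fit(train, (0,))    # uses only the parent of Y
assoc = fit(train, (0, 1))   # uses every available feature

causal_train = accuracy(causal, train, (0,))
causal_shift = accuracy(causal, shifted, (0,))
assoc_train = accuracy(assoc, train, (0, 1))
assoc_shift = accuracy(assoc, shifted, (0, 1))

print("causal:        train", causal_train, "shifted", causal_shift)
print("associational: train", assoc_train, "shifted", assoc_shift)
```

Under these assumptions the parent-only predictor should score roughly the same on both samples, while the predictor that also uses the child scores higher on the train distribution and collapses on the shifted one.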
  • 11. Result 1: The worst-case out-of-distribution error of a causal model is lower than that of an associational model.
  • 12. Result 1: The worst-case out-of-distribution error of a causal model is lower than that of an associational model. For any model $h$ and any $P^*$ such that $P^*(Y \mid X_{PA}) = P(Y \mid X_{PA})$: In-Distribution Error (IDE): $\mathrm{IDE}_P(h, y) = L_P(h, y) - L_{S \sim P}(h, y)$, where $L_P$ is the expected loss on the same distribution as the train data. Out-of-Distribution Error (ODE): $\mathrm{ODE}_{P,P^*}(h, y) = L_{P^*}(h, y) - L_{S \sim P}(h, y)$, where $L_{P^*}$ is the expected loss on a distribution $P^*$ different from that of the train data.
  • 13. Result 1: The worst-case out-of-distribution error of a causal model is lower than that of an associational model. For any model $h$ and any $P^*$ such that $P^*(Y \mid X_{PA}) = P(Y \mid X_{PA})$: $\mathrm{IDE}_P(h, y) = L_P(h, y) - L_{S \sim P}(h, y)$ and $\mathrm{ODE}_{P,P^*}(h, y) = L_{P^*}(h, y) - L_{S \sim P}(h, y)$. Proof idea. Simple case: assume $y = f(\mathbf{x})$ is deterministic. Causal model: $\mathrm{ODE}_{P,P^*}(h_c, y) \le \mathrm{IDE}_P(h_c, y) + \mathrm{disc}_L(P, P^*)$, where $\mathrm{disc}_L(P, P^*)$ is the discrepancy between the distributions $P$ and $P^*$.
  • 14. Result 1: The worst-case out-of-distribution error of a causal model is lower than that of an associational model. For any model $h$ and any $P^*$ such that $P^*(Y \mid X_{PA}) = P(Y \mid X_{PA})$: $\mathrm{IDE}_P(h, y) = L_P(h, y) - L_{S \sim P}(h, y)$ and $\mathrm{ODE}_{P,P^*}(h, y) = L_{P^*}(h, y) - L_{S \sim P}(h, y)$. Proof idea. Simple case: assume $y = f(\mathbf{x})$ is deterministic. Causal model: $\mathrm{ODE}_{P,P^*}(h_c, y) \le \mathrm{IDE}_P(h_c, y) + \mathrm{disc}_L(P, P^*)$. Associational model: $\mathrm{ODE}_{P,P^*}(h_a, y) \le \mathrm{IDE}_P(h_a, y) + \mathrm{disc}_L(P, P^*) + L_{P^*}(h^{OPT}_{a,P}, y)$. Here $\mathrm{disc}_L(P, P^*)$ is the discrepancy between the distributions $P$ and $P^*$, and the extra term appears because the optimal $h_a$ on $P$ is not optimal on $P^*$. $\Rightarrow \max_{P^*} \mathrm{ODE\text{-}Bound}_{P,P^*}(h_c, y) \le \max_{P^*} \mathrm{ODE\text{-}Bound}_{P,P^*}(h_a, y)$.
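The IDE and ODE definitions on slides 12 to 14 can be estimated from data. The sketch below reads $L_{S \sim P}$ as the average loss on the training sample $S$ (an assumption about the slide notation) and approximates the expected losses $L_P$ and $L_{P^*}$ by averages over held-out samples. The model `h`, the 0-1 loss, and the three tiny datasets are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def mean_loss(h, rows):
    """Average 0-1 loss of model h on a list of (x, y) pairs."""
    return sum(h(x) != y for x, y in rows) / len(rows)

def ide(h, train_sample, heldout_p):
    """IDE_P(h, y) ~= L_P(h, y) - L_{S~P}(h, y), with L_P estimated on held-out data from P."""
    return mean_loss(h, heldout_p) - mean_loss(h, train_sample)

def ode(h, train_sample, heldout_pstar):
    """ODE_{P,P*}(h, y) ~= L_{P*}(h, y) - L_{S~P}(h, y), with L_{P*} estimated on data from P*."""
    return mean_loss(h, heldout_pstar) - mean_loss(h, train_sample)

# Hypothetical model and data (not from the slides).
h = lambda x: int(x > 0)
train_sample = [(1, 1), (2, 1), (-1, 0), (-3, 0)]    # S drawn from P: no errors
heldout_p = [(1, 1), (-2, 0), (4, 0), (3, 1)]        # fresh draw from P: one error
heldout_pstar = [(1, 0), (2, 0), (-1, 0), (5, 1)]    # draw from P*: two errors

ide_value = ide(h, train_sample, heldout_p)
ode_value = ode(h, train_sample, heldout_pstar)
print("IDE estimate:", ide_value)
print("ODE estimate:", ode_value)
```

Both quantities subtract the same training loss, so the gap between them is exactly the difference between the loss on $P^*$ and the loss on $P$.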
  • 15. Better generalization results in lower sensitivity for a causal model. Sensitivity: if a single data point $(\mathbf{x}, y) \sim P^*$ is added to the train dataset $S$ to create $S'$, how much does the learnt model $h_S^{\min}$ change? Since the optimal causal model is the same across all $P^*$, adding any $(\mathbf{x}, y) \sim P^*$ has less impact on a trained causal model: sensitivity for a causal model ≤ sensitivity for an associational model.
  • 16. Main Result: A causal model has stronger differential privacy guarantees. Let $M$ be a mechanism that returns an ML model trained over dataset $S$: $M(S) = h$. Differential privacy [DR’14]: a learning mechanism $M$ satisfies $\epsilon$-differential privacy if, for any two datasets $S, S'$ that differ in one data point, $\frac{\Pr(M(S) \in H)}{\Pr(M(S') \in H)} \le e^{\epsilon}$. (Smaller $\epsilon$ values provide better privacy guarantees.) Since lower sensitivity ⇒ lower $\epsilon$: Theorem: when equivalent Laplace noise is added and the models are trained on the same dataset, the causal mechanism $M_C$ provides $\epsilon_C$-DP and the associational mechanism $M_A$ provides $\epsilon_A$-DP guarantees such that $\epsilon_C \le \epsilon_A$.
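The step "lower sensitivity ⇒ lower $\epsilon$" on slide 16 follows from the textbook Laplace mechanism in [DR’14]: adding Laplace noise of scale $b$ to a quantity with sensitivity $\Delta$ gives $\epsilon = \Delta / b$. The sketch below only illustrates that relationship. The slides do not say how noise is applied to the model, and the sensitivity values, noise scale, and parameter value here are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def epsilon(sensitivity, scale):
    """Privacy level of the Laplace mechanism: epsilon = sensitivity / scale."""
    return sensitivity / scale

# Hypothetical sensitivities of the learnt model to one added data point.
sensitivity_causal = 0.1
sensitivity_assoc = 0.4
noise_scale = 0.5  # the same ("equivalent") noise is added to both models

eps_causal = epsilon(sensitivity_causal, noise_scale)
eps_assoc = epsilon(sensitivity_assoc, noise_scale)

# Releasing a single model parameter with the same noise in both cases.
rng = random.Random(0)
theta = 1.25  # hypothetical learnt parameter
released_theta = theta + laplace_noise(noise_scale, rng)

print("epsilon (causal):", eps_causal)
print("epsilon (associational):", eps_assoc)
print("released parameter:", released_theta)
```

With the noise scale held fixed, the ordering of the two $\epsilon$ values is determined entirely by the ordering of the sensitivities, which is the shape of the argument behind $\epsilon_C \le \epsilon_A$.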
  • 17. Therefore, causal models are more robust to membership inference (MI) attacks. Advantage of an MI adversary: (true positive rate − false positive rate) in detecting whether $x$ is from the training dataset or not. [From Yeom et al., CSF’18]: the membership advantage of an adversary is bounded by $e^{\epsilon} - 1$. Theorem: when trained on the same dataset of size $n$, the membership advantage of a causal model is lower than that of an associational model. Since the optimal causal models are the same for $P$ and $P^*$, as $n \to \infty$ the membership advantage of a causal model → 0.
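The membership advantage defined on slide 17 is straightforward to compute for a concrete attack. The sketch below assumes a simple threshold attacker that flags an input as a training member when the model's confidence is at or above a cutoff. This is one common attack form, not necessarily the attacker used in the experiments, and the confidence scores are made-up values.

```python
import math

def membership_advantage(member_scores, nonmember_scores, threshold):
    """True positive rate minus false positive rate of a threshold attacker."""
    tpr = sum(s >= threshold for s in member_scores) / len(member_scores)
    fpr = sum(s >= threshold for s in nonmember_scores) / len(nonmember_scores)
    return tpr - fpr

def dp_advantage_bound(eps):
    """Upper bound e^eps - 1 on the advantage against an eps-DP mechanism."""
    return math.exp(eps) - 1.0

# Hypothetical confidence scores. The overfit model is more confident on members.
overfit_members = [0.95, 0.90, 0.97, 0.60]
overfit_nonmembers = [0.85, 0.70, 0.92, 0.50]
# A model that treats members and non-members alike gives the attacker nothing.
stable_members = [0.80, 0.70, 0.90, 0.60]
stable_nonmembers = [0.80, 0.70, 0.90, 0.60]

adv_overfit = membership_advantage(overfit_members, overfit_nonmembers, 0.9)
adv_stable = membership_advantage(stable_members, stable_nonmembers, 0.9)

print("advantage (overfit model):", adv_overfit)
print("advantage (stable model):", adv_stable)
print("bound for eps = 0.2:", dp_advantage_bound(0.2))
```

On a balanced set of members and non-members, attack accuracy equals 0.5 plus half the advantage, so an advantage of 0 corresponds to a random guess.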
  • 19. Goal: Compare MI attack accuracy between causal and associational models. [BN] When the true causal structure is known: datasets generated from Bayesian networks (Child, Sachs, Water, Alarm). Causal model: maximum likelihood estimation (MLE) based on $Y$’s parents. Associational model: neural network with 3 linear layers. $P^*$: noise added to the conditional probabilities (uniform or additive). [MNIST] When the true causal structure is unknown: Colored MNIST dataset (digits are correlated with color). Causal model: Invariant Risk Minimization (IRM), which uses the fact that $P(Y \mid X_{PA})$ is the same across distributions [ABGD’19]. Associational model: Empirical Risk Minimization (ERM) using the same NN architecture. $P^*$: different correlations between color and digit than in the train dataset. Attacker model: predict whether an input belongs to the train dataset or not.
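To give a feel for what Invariant Risk Minimization penalizes, here is a minimal sketch of the IRMv1-style penalty from [ABGD’19]: the squared gradient of each environment's risk with respect to a fixed scalar "dummy" classifier $w$, evaluated at $w = 1$. It assumes squared loss and scalar predictions so the gradient can be written by hand. The two environments and both candidate predictors are hypothetical toy values, unrelated to the Colored MNIST setup on slide 19.

```python
def irm_penalty(predictions, targets):
    """Squared gradient of mean((w * p - y)^2) with respect to w, evaluated at w = 1."""
    n = len(targets)
    grad = sum(2.0 * (p - y) * p for p, y in zip(predictions, targets)) / n
    return grad ** 2

# Two hypothetical environments: the invariant feature equals y in both,
# while the spurious feature is y scaled by an environment-specific factor.
envs = [
    {"x_inv": [1.0, 2.0, 3.0], "x_spur": [2.0, 4.0, 6.0], "y": [1.0, 2.0, 3.0]},
    {"x_inv": [1.0, 2.0, 3.0], "x_spur": [0.5, 1.0, 1.5], "y": [1.0, 2.0, 3.0]},
]

# Predictor A relies on the invariant feature.
invariant_total = sum(irm_penalty(e["x_inv"], e["y"]) for e in envs)

# Predictor B relies on the spurious feature, scaled to fit the first environment.
spurious_total = sum(
    irm_penalty([0.5 * v for v in e["x_spur"]], e["y"]) for e in envs
)

print("penalty, invariant predictor:", invariant_total)
print("penalty, spurious predictor:", spurious_total)
```

The invariant predictor is simultaneously optimal in every environment, so its penalty vanishes, while the spurious predictor is optimal only in the environment it was tuned on and is penalized in the other.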
  • 20. [BN] With uniform noise, MI attack accuracy for a causal model is near a random guess (50%). For associational models, the attacker can guess membership in the training set with 80% accuracy.
  • 21. [BN-Child] With uniform noise, MI attack accuracy for a causal model is near a random guess (50%). For associational models, the attacker can guess membership in the training set with 80% accuracy. Privacy without loss in utility: causal and DNN models achieve the same prediction accuracy.
  • 22. [BN-Child] MI attack accuracy increases with the amount of noise for associational models, but stays constant at 50% for causal models.
  • 23. [BN] Consistent results across all four datasets High attack accuracy for associational models when 𝑃∗ (Test2) has uniform noise. Same classification accuracy between causal and associational models.
  • 24. [MNIST] MI attack accuracy is lower for the invariant risk minimizer than for the associational model. The IRM model, motivated by causal reasoning, has 53% attack accuracy, close to random. The associational model also fails to generalize: 16% accuracy on the test set. Results (train accuracy / test accuracy / attack accuracy, in %): Causal Model (IRM): 70 / 69 / 53; Associational Model (ERM): 87 / 16 / 66.
  • 25. Conclusion • Established a theoretical connection between causality and differential privacy. • Demonstrated the benefits of causal ML models for alleviating privacy attacks, both theoretically and empirically. • Code available at https://github.com/microsoft/robustdg Future work: investigate the robustness of causal models to other kinds of adversarial attacks. Thank you! Amit Sharma, Microsoft Research
  • 26. References • [ABGD’19] Arjovsky, M., Bottou, L., Gulrajani, I., and Lopez-Paz, D. Invariant risk minimization. arXiv preprint arXiv:1907.02893, 2019. • [CSF’18] Yeom, S., Giacomelli, I., Fredrikson, M., and Jha, S. Privacy risk in machine learning: Analyzing the connection to overfitting. CSF 2018. • [DR’14] Dwork, C., Roth, A., et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014. • [DSZ’16] Datta, A., Sen, S., and Zick, Y. Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. IEEE Symposium on Security and Privacy (SP), pp. 598–617, 2016. • [KLRS’17] Kusner, M. J., Loftus, J., Russell, C., and Silva, R. Counterfactual fairness. Advances in Neural Information Processing Systems, pp. 4066–4076, 2017. • [MTS’19] Mahajan, D., Tan, C., and Sharma, A. Preserving causal constraints in counterfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277, 2019. • [MTS’20] Mahajan, D., Tople, S., and Sharma, A. Domain generalization using causal matching. arXiv preprint arXiv:2006.07500, 2020. • [NDSS’19] Salem, A., Zhang, Y., Humbert, M., Fritz, M., and Backes, M. ML-Leaks: Model and data independent membership inference attacks and defenses on machine learning models. NDSS 2019. • [SP’17] Shokri, R., Stronati, M., Song, C., and Shmatikov, V. Membership inference attacks against machine learning models. IEEE Symposium on Security and Privacy (SP), 2017. • [SP’19] Nasr, M., Shokri, R., and Houmansadr, A. Comprehensive privacy analysis of deep learning: Stand-alone and federated learning under passive and active white-box inference attacks. IEEE Symposium on Security and Privacy (SP), 2019.