How can a learnt model unlearn something
Framework for "Machine Unlearning"
Saradindu Sengupta
Senior ML Engineer @Nunam
Where I work on building learning systems to forecast health and failure of Li-ion batteries.
PyData Global 2023
Overview
1. A brief overview of what is unlearning and why not just retrain.
a. Why we came here - Applications for machine unlearning
b. A brief overview of previous research work
2. Challenges to be encountered for an unlearning algorithm
a. Stochasticity
b. Incrementality
c. Degradation
3. Unlearning framework
a. Technical Design requirements to be met
4. Type of removal requests to be handled
a. Feature-wise
b. Item-wise
c. Class-wise
5. Verification
a. How to define evaluation metrics for the unlearning algorithm
b. How to determine whether the influence of the forget dataset is truly removed
Why we came here
Privacy
● Facebook Privacy Policy Change
● iCloud photo hacking
● “Right to be forgotten” regulations stipulate that “individuals have the right to be forgotten”.
Security
● Polluted training data would pollute model outcome
● Polygraph, a worm detection program, conclusively demonstrated that. [Perdisci, Dagon, et al., “Misleading worm signature generators using deliberate noise injection”]
Usability
● Recommendation engine
● Netflix account sharing would pollute the content recommendation
Why not just retrain?
Training cost for ResNet-50 decreased by 38% overall [Google Cloud Cost] due to hardware optimization and parallelism, but total training cost has increased significantly.
[1] COUNTING THE COST OF TRAINING LARGE LANGUAGE MODELS
[2] (Sharir et al., 2020)
Model        Params (Billions)  Tokens  Days to train  Price to Train  Cost per 1M params
GPT-3XL      1.3                26      0.4            2500            1.92
GPT-J        6                  120     8              45000           7.5
GPT-3 6.7B   6.7                134     11             40000           5.97
T-5 11B      11                 34      9              60000           5.45
GPT-3 13B    13                 260     39             150000          11.54
GPT-3 NeoX   20                 400     47             525000          26.25
GPT 70B      70                 1400    85             2500000         35.71
GPT 175B     175                3500    110.5          8750000         50
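The last column of the table can be reproduced from the other columns. The sketch below assumes that "Cost per 1M params" is the price to train divided by the parameter count expressed in millions; the function name is hypothetical and only three rows are copied in for illustration.

```python
# (model, params in billions, price to train), copied from the table above
rows = [
    ("GPT-3XL", 1.3, 2_500),
    ("GPT-J", 6, 45_000),
    ("GPT 175B", 175, 8_750_000),
]

def cost_per_million_params(params_billions, price):
    # 1 billion parameters = 1000 million parameters
    return price / (params_billions * 1000)

for name, params, price in rows:
    print(name, round(cost_per_million_params(params, price), 2))
```

Under that assumption the three rows come out as 1.92, 7.5 and 50.0, matching the table.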
Research Space - How we came here
[Y. Cao and J. Yang, 2015]
● Introduced the term ‘machine unlearning’
● Provided a deterministic algorithm for unlearning
[A. Ginart et al., 2019]
● Introduced probabilistic unlearning inspired by differential privacy
[Guo et al., 2020] [Izzo et al., 2021] [Neel et al., 2021] [Ullah et al., 2021]
● Provided theoretical error bounds for probabilistic unlearning
[Cauwenberghs and Poggio, 2001] [Tveit et al., 2003]
● Introduced decremental learning
[Du et al., 2019] [Golatkar et al., 2020b,a] [Nguyen et al., 2020]
● Introduced unlearning for deep learning
Challenges for an efficient unlearning algorithm
1. Stochasticity
a. Training is stochastic, which makes it very difficult to identify how a single data point influences the weights.
2. Incrementality
a. Training is incremental: a single data instance influences the updates from subsequent instances and is itself influenced by previous samples, which makes removing its influence tricky.
3. Catastrophic Unlearning
a. When the influence of a subset of data is removed, the model's performance can degrade exponentially, which makes the process hard to quantify.
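The first two challenges can be seen even on a toy problem. The sketch below is an illustration only, not taken from the talk: it runs one pass of plain SGD on a small synthetic linear regression task twice, changing nothing but the order in which samples are visited. All names and hyperparameters are assumptions chosen for the example.

```python
import numpy as np

def sgd_linear(X, y, order, lr=0.05):
    # One pass of plain SGD on squared error, visiting samples in `order`.
    w = np.zeros(X.shape[1])
    for i in order:
        grad = (X[i] @ w - y[i]) * X[i]
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.5 * rng.normal(size=50)

# Same data, two different visiting orders (the only source of randomness here).
w_a = sgd_linear(X, y, np.random.default_rng(1).permutation(50))
w_b = sgd_linear(X, y, np.random.default_rng(2).permutation(50))

print(np.abs(w_a - w_b).max())  # non-zero: the final weights depend on sample order
```

Because each update depends on the weights left behind by all earlier samples, the contribution of any one sample cannot be read off the final weights directly.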
Unlearning Framework
[Diagram] Training Dataset -> Pre-trained Model; Pre-trained Model + Forget Dataset -> Unlearning Model -> Unlearned Model; Training Dataset -> Retraining without forget dataset.
Evaluation Metric: check how good the unlearning algorithm is compared to the retrained model.
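The framework can be sketched end to end on a model where exact unlearning is easy. The example below is an assumption for illustration, not the method from the talk: it uses ridge regression, whose sufficient statistics (X^T X and X^T y) are sums over samples, so the forget dataset's contribution can simply be subtracted. The result is then compared with retraining without the forget dataset. The helper name, the data and the choice of forget set are hypothetical.

```python
import numpy as np

def fit_from_stats(xtx, xty, lam=1e-3):
    # Solve the ridge normal equations (X^T X + lam * I) w = X^T y.
    return np.linalg.solve(xtx + lam * np.eye(xtx.shape[0]), xty)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)

forget_idx = np.arange(20)  # hypothetical forget dataset: the first 20 rows
retain_mask = np.ones(len(X), dtype=bool)
retain_mask[forget_idx] = False

# Pre-trained model: fitted on the full training dataset.
xtx, xty = X.T @ X, X.T @ y
w_full = fit_from_stats(xtx, xty)

# Unlearning step: subtract the forget set's contribution from the statistics.
Xf, yf = X[forget_idx], y[forget_idx]
w_unlearned = fit_from_stats(xtx - Xf.T @ Xf, xty - Xf.T @ yf)

# Baseline: retrain from scratch without the forget dataset.
Xr, yr = X[retain_mask], y[retain_mask]
w_retrained = fit_from_stats(Xr.T @ Xr, Xr.T @ yr)

# Evaluation: how far is the unlearned model from the retrained one?
gap = np.max(np.abs(w_unlearned - w_retrained))
print(gap)
```

For this model the gap is zero up to floating-point error. For deep networks no such closed form exists, which is why the comparison against a retrained model becomes an empirical evaluation.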
Unlearning Framework: Design Requirements
1. Completeness
a. An unlearned model should make the same predictions as a retrained model on incoming samples
b. Metrics can be derived from adversarial attacks
2. Timeliness
a. A retrained and an unlearned model should behave the same, but retraining takes more time than unlearning
b. This metric is a compromise with completeness: a retrained model will have better accuracy, though the difference might be negligible, while the time to retrain is costly
3. Verifiability
a. An unlearned model should provide a mechanism to check the effects of the unlearning request
b. To that end, backdoor attacks can be useful
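Completeness and timeliness can both be measured with very little code. The sketch below assumes completeness is taken as the fraction of incoming samples on which the two models predict the same label, and timeliness as wall-clock time; both helper names are hypothetical and the predictions are made-up values.

```python
import time
import numpy as np

def completeness(preds_unlearned, preds_retrained):
    # Fraction of incoming samples on which both models predict the same label.
    a, b = np.asarray(preds_unlearned), np.asarray(preds_retrained)
    return float(np.mean(a == b))

def timed(fn, *args):
    # Returns (result, elapsed seconds) so unlearn time and retrain time can be compared.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Made-up predictions from an unlearned and a retrained model on four samples.
score = completeness([0, 1, 1, 2], [0, 1, 2, 2])
print(score)  # 0.75: the models agree on three of the four samples

total, seconds = timed(sum, [1, 2, 3])
```

In practice `timed` would wrap the unlearning routine and the retraining routine so that the two durations can be reported side by side.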
Type of removal requests
1. Item-wise
a. Remove certain items/samples from training data
2. Feature-wise
a. Unlearning at the feature level
b. When misclassified samples leak error for specific features
3. Class-wise
a. Unlearning at class level
b. It can be implicit in many scenarios
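The three request types differ in which slice of the training data is removed. As a rough sketch, assuming a tabular dataset held in NumPy arrays, the retained data for a retraining baseline could be built as follows; the indices, the column and the class being forgotten are hypothetical.

```python
import numpy as np

X = np.arange(24, dtype=float).reshape(6, 4)  # 6 samples, 4 features
y = np.array([0, 1, 2, 0, 1, 2])

# Item-wise: forget specific samples (rows 1 and 4 here).
item_keep = np.ones(len(X), dtype=bool)
item_keep[[1, 4]] = False
X_item, y_item = X[item_keep], y[item_keep]

# Feature-wise: forget one feature (column 2 here) across all samples.
X_feat = np.delete(X, 2, axis=1)

# Class-wise: forget every sample of one class (class 2 here).
class_keep = y != 2
X_cls, y_cls = X[class_keep], y[class_keep]

print(X_item.shape, X_feat.shape, X_cls.shape)
```

An unlearning algorithm has to reach the same effect without rebuilding the dataset and retraining, but these slices define what its output is compared against.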
Verification and Evaluation Metrics
Unlearning verification tests and evaluation metrics overlap in some areas, but the former are used for model optimization while the latter are used for model evaluation. Some of the verification tests and evaluation metrics in use are listed below.
Verification
1. Feature Injection test
2. Forgetting Measuring
3. Information Leakage
4. Membership Inference Attack
5. Backdoor Attack
6. Slowdown Attack
7. Interclass Confusion Test
8. Federated Verification
9. Cryptographic Protocol
Evaluation Metric
1. Accuracy
2. Completeness
3. Unlearn Time
4. Relearn Time
5. Layer-wise Distance
6. Activation Distance
7. JS Divergence
8. Membership Inference Attack
9. ZRF Score
10. Model Inversion Attack
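As one concrete case from the list, JS divergence can compare the output distributions of an unlearned model and a retrained model on the same inputs. The sketch below is a plain NumPy version using natural logarithms; the function names and the example probabilities are assumptions.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # KL divergence along the last axis; clipping avoids log(0).
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def js_divergence(p, q):
    # Jensen-Shannon divergence: symmetric, and bounded by ln(2) with natural logs.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Made-up softmax outputs of two models on the same two inputs.
unlearned = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
retrained = np.array([[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]])

per_sample = js_divergence(unlearned, retrained)
print(per_sample.mean())
```

A value near zero means the two models produce nearly the same output distributions on those inputs.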
Verification
Two of the most commonly referenced tests:
Feature Injection test
Membership Inference Attack
Kong, Z., & Alfeld, S. (2022). Approximate Data Deletion in Generative Models. arXiv:2206.14439
Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2016). Membership Inference Attacks against Machine Learning Models. arXiv:1610.05820
Evaluation Metric
Layer-wise Distance
Unlearn Time
Tarun, A. K., Chundawat, V. S., Mandal, M., & Kankanhalli, M. (2021). Fast Yet Effective Machine Unlearning. ArXiv. https://doi.org/10.1109/TNNLS.2023.3266233
Cao, Y., & Yang, J. (2015). Towards Making Systems Forget with Machine Unlearning.
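Layer-wise distance can be sketched as a per-layer norm of the weight difference between two models. The example below assumes the weights are available as dictionaries mapping layer names to NumPy arrays and uses the Euclidean (Frobenius) norm; the layer names and values are hypothetical.

```python
import numpy as np

def layerwise_distance(weights_a, weights_b):
    # Both arguments map layer name -> weight array; shapes must match per layer.
    return {
        name: float(np.linalg.norm(weights_a[name] - weights_b[name]))
        for name in weights_a
    }

# Made-up weights for an unlearned and a retrained model with two layers.
unlearned = {"fc1": np.array([[3.0, 4.0]]), "fc2": np.array([1.0, 1.0])}
retrained = {"fc1": np.array([[0.0, 0.0]]), "fc2": np.array([1.0, 1.0])}

distances = layerwise_distance(unlearned, retrained)
print(distances)  # {'fc1': 5.0, 'fc2': 0.0}
```

Small distances on every layer suggest the unlearned model is close to the retrained one in weight space, although closeness in weights is not by itself proof that the forgotten data has no remaining influence.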
Thank You
/in/saradindusengupta
@iamsaradindu /saradindusengupta
