PyData Global 2023 - How can a learnt model unlearn something

In the recent past, with the explosion of large language and vision models, it has become very costly to train models on new data. On top of that, new and upcoming data privacy legislation makes honoring the "right to be forgotten" costly and time-consuming. In this talk, we will go through the current state of research on "machine unlearning", how a learnt model can forget something without retraining, and a general demonstration of a machine unlearning framework.

How can a learnt model unlearn something
Framework for "Machine Unlearning"
Saradindu Sengupta
Senior ML Engineer @Nunam
Where I work on building learning systems to forecast health and failure of Li-ion batteries.
PyData Global 2023
Overview
1. A brief overview of what is unlearning and why not just retrain.
a. Why we came here - Applications for machine unlearning
b. A brief overview of previous research work
2. Challenges to be encountered for an unlearning algorithm
a. Stochasticity
b. Incrementality
c. Degradation
3. Unlearning framework
a. Technical Design requirements to be met
4. Type of removal requests to be handled
a. Feature-wise
b. Item-wise
c. Class-wise
5. Verification
a. How to define evaluation metric for the unlearning algorithm
b. How to define if the influence of forgetting the dataset is truly removed
Why we came here
Privacy
● Facebook privacy policy change
● iCloud photo hacking
● “Right to be forgotten” regulations stipulate that “individuals have the right to be forgotten”.
Security
● Polluted training data would pollute the model outcome
● Polygraph, a worm detection program, conclusively demonstrated that. [Perdisci, Dagon, et al., “Misleading worm signature generators using deliberate noise injection”]
Usability
● Recommendation engine
● Netflix account sharing would pollute the content recommendations
Why not just retrain?
Why not retrain?
The training cost for ResNet-50 decreased by 38% overall [Google Cloud Cost] due to hardware optimization and parallelism, but total training costs have increased significantly.
[1] Counting the Cost of Training Large Language Models
[2] (Sharir et al., 2020)
Model        Params (billions)  Tokens  Days to train  Price to train  Cost per 1M params
GPT-3XL      1.3                26      0.4            2,500           1.92
GPT-J        6                  120     8              45,000          7.5
GPT-3 6.7B   6.7                134     11             40,000          5.97
T-5 11B      11                 34      9              60,000          5.45
GPT-3 13B    13                 260     39             150,000         11.54
GPT-3 NeoX   20                 400     47             525,000         26.25
GPT 70B      70                 1400    85             2,500,000       35.71
GPT 175B     175                3500    110.5          8,750,000       50
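The last column of the table can be reproduced from the parameter count and the price to train. The sketch below is a minimal check, assuming that "Cost per 1M params" means the price divided by the number of parameters in millions; the variable and function names are illustrative and not from the talk.

```python
# Rows copied from the table: (model, params in billions, price to train).
rows = [
    ("GPT-3XL", 1.3, 2_500),
    ("GPT-J", 6, 45_000),
    ("GPT-3 6.7B", 6.7, 40_000),
    ("T-5 11B", 11, 60_000),
    ("GPT-3 13B", 13, 150_000),
    ("GPT-3 NeoX", 20, 525_000),
    ("GPT 70B", 70, 2_500_000),
    ("GPT 175B", 175, 8_750_000),
]


def cost_per_million_params(params_billions, price):
    # 1 billion parameters = 1,000 million parameters.
    return price / (params_billions * 1_000)


cost_table = {
    name: round(cost_per_million_params(params, price), 2)
    for name, params, price in rows
}

for name, cost in cost_table.items():
    print(f"{name:<12} {cost:>6.2f}")
```

Under that assumption the per-parameter cost grows with model size, which is the point of the slide: larger models are more expensive to retrain per parameter as well as in total.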
Research Space - How we came here
[Y. Cao and J. Yang, 2015]
● Introduced the term ‘machine unlearning’
● Provided a deterministic algorithm for unlearning
[A. Ginart et al., 2019]
● Introduced probabilistic unlearning, inspired by differential privacy
[Guo et al., 2020] [Izzo et al., 2021] [Neel et al., 2021] [Ullah et al., 2021]
● Provided theoretical error bounds for probabilistic unlearning
[Cauwenberghs and Poggio, 2001] [Tveit et al., 2003]
● Introduced decremental learning
[Du et al., 2019] [Golatkar et al., 2020b,a] [Nguyen et al., 2020]
● Introduced unlearning for deep learning
Challenges for an efficient unlearning algorithm
1. Stochasticity
a. The stochasticity of the training process makes it very difficult to identify how a single data point influences the weights.
2. Incrementality
a. Training is incremental: a single data instance influences the updates for subsequent instances and is itself influenced by previous samples, which makes removing its influence tricky.
3. Catastrophic Unlearning
a. While removing the influence of a subset of data, the model's performance can degrade exponentially, which makes the process hard to quantify.
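The incrementality challenge can be illustrated with a toy example. The sketch below is not from the talk: it runs plain SGD on a hypothetical 1-D model y ~ w * x and compares retraining without one sample against naively subtracting only the update that sample caused. All data values, the learning rate, and the function names are assumptions chosen for illustration.

```python
def sgd_trajectory(samples, lr=0.1, w0=0.0):
    """Return the weight after each SGD step for the 1-D model y ~ w * x."""
    w, trajectory = w0, []
    for x, y in samples:
        grad = 2 * (w * x - y) * x  # gradient of the squared error (w*x - y)^2
        w -= lr * grad
        trajectory.append(w)
    return trajectory


data = [(1.0, 2.0), (2.0, 3.0), (1.5, 3.5), (0.5, 1.0)]
forget_index = 1  # hypothetical sample to forget

full = sgd_trajectory(data)
retained = sgd_trajectory(data[:forget_index] + data[forget_index + 1:])

# Naive "undo": subtract only the update that the forgotten sample caused.
update_from_forgotten = full[forget_index] - full[forget_index - 1]
naive_undo = full[-1] - update_from_forgotten

print("retrained without the sample:", retained[-1])
print("naive undo of one update:    ", naive_undo)
```

The two final weights differ because every update after the forgotten sample was computed from a weight that the sample had already shifted, so its influence is spread across all later steps.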
Unlearning Framework
[Diagram labels: Training Dataset, Pre-trained Model, Forget Dataset, Unlearning Model, Unlearned Model, Retraining without forget dataset, Evaluation Metric]
Check how good the unlearning algorithm is compared to the retrained model
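The framework can be sketched end to end on a model simple enough to unlearn exactly. The example below is an illustrative assumption, not the framework demonstrated in the talk: a 1-D least-squares fit is stored as running sums (loosely in the spirit of the summation-form idea of Cao and Yang, 2015), so a forget dataset can be removed by subtracting its contribution. The class name, data, and probe points are hypothetical.

```python
class SummationLinearModel:
    """1-D least-squares fit y ~ a*x + b, stored as running sums."""

    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def _update(self, samples, sign):
        # sign=+1 adds samples to the sums, sign=-1 removes them.
        for x, y in samples:
            self.n += sign
            self.sx += sign * x
            self.sy += sign * y
            self.sxx += sign * x * x
            self.sxy += sign * x * y

    def fit(self, samples):
        self._update(samples, +1)
        return self

    def unlearn(self, samples):
        self._update(samples, -1)
        return self

    def coefficients(self):
        denom = self.n * self.sxx - self.sx ** 2
        a = (self.n * self.sxy - self.sx * self.sy) / denom
        b = (self.sy - a * self.sx) / self.n
        return a, b

    def predict(self, x):
        a, b = self.coefficients()
        return a * x + b


training = [(0, 1), (1, 3), (2, 5), (3, 7), (4, 20)]  # last point is "polluted"
forget = [(4, 20)]
retain = [s for s in training if s not in forget]

pretrained = SummationLinearModel().fit(training)
unlearned = pretrained.unlearn(forget)              # unlearning path
retrained = SummationLinearModel().fit(retain)      # retraining path

# Evaluation: largest prediction gap between the two paths on probe inputs.
probes = [0.5, 2.5, 10.0]
gap = max(abs(unlearned.predict(x) - retrained.predict(x)) for x in probes)
print("coefficients after unlearning:", unlearned.coefficients())
print("max gap vs. retrained model:  ", gap)
```

For this model the unlearned and retrained versions coincide, which is the ideal case. For deep models no such closed form exists, so the gap has to be measured with an evaluation metric instead.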
Unlearning Framework: Design Requirements
1. Completeness
a. An unlearned model should make the same predictions as a retrained model on incoming samples
b. Metrics can be derived from adversarial attacks
2. Timeliness
a. A retrained and an unlearnt model should behave the same, but retraining takes more time than unlearning
b. This metric is a trade-off with completeness: a retrained model will have better accuracy, although the gain might be negligible, while the time to retrain would be costly
3. Verifiability
a. An unlearnt model should provide a mechanism to check the effects of the unlearning request
b. To that end, backdoor attacks can be useful
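One simple way to put a number on completeness is the fraction of incoming samples on which the unlearned and retrained models agree. The sketch below assumes that reading of the requirement; the two threshold classifiers and the sample values are hypothetical stand-ins, not models from the talk.

```python
def completeness(unlearned_predict, retrained_predict, samples):
    """Fraction of samples on which both models predict the same label."""
    agree = sum(unlearned_predict(x) == retrained_predict(x) for x in samples)
    return agree / len(samples)


# Hypothetical stand-in classifiers with slightly different decision thresholds.
def unlearned_model(x):
    return int(x > 0.50)


def retrained_model(x):
    return int(x > 0.55)


incoming = [0.1, 0.3, 0.52, 0.6, 0.9]
score = completeness(unlearned_model, retrained_model, incoming)
print("completeness:", score)
```

A score of 1.0 would mean the unlearned model is indistinguishable from the retrained one on these samples. Here the two disagree on the sample that falls between the two thresholds.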
Type of removal requests
1. Item-wise
a. Remove certain items/samples from the training data
2. Feature-wise
a. Unlearning at the feature level
b. Applies when misclassified samples leak errors for specific features
3. Class-wise
a. Unlearning at the class level
b. It can be implicit in many scenarios
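The three request types differ in which part of the training data they target. The sketch below only builds the retained dataset for each request type (the data a retrained baseline would see); it does not perform unlearning itself. The toy records, field names, and function names are all hypothetical.

```python
# Hypothetical toy training set; field names are illustrative only.
dataset = [
    {"id": 1, "features": {"age": 34, "city": "Pune"}, "label": "A"},
    {"id": 2, "features": {"age": 51, "city": "Delhi"}, "label": "B"},
    {"id": 3, "features": {"age": 27, "city": "Pune"}, "label": "A"},
]


def retain_after_item_request(rows, forget_ids):
    """Item-wise: drop specific samples."""
    return [r for r in rows if r["id"] not in forget_ids]


def retain_after_feature_request(rows, feature):
    """Feature-wise: drop one feature from every sample (rows are copied)."""
    return [
        {**r, "features": {k: v for k, v in r["features"].items() if k != feature}}
        for r in rows
    ]


def retain_after_class_request(rows, label):
    """Class-wise: drop every sample that belongs to one class."""
    return [r for r in rows if r["label"] != label]


item_wise = retain_after_item_request(dataset, {2})
feature_wise = retain_after_feature_request(dataset, "city")
class_wise = retain_after_class_request(dataset, "A")

print([r["id"] for r in item_wise])
print([sorted(r["features"]) for r in feature_wise])
print([r["id"] for r in class_wise])
```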
Verification and Evaluation Metrics
Unlearning verification tests and evaluation metrics overlap in some areas, but the former are used for model optimization while the latter are used for model evaluation. Some commonly used unlearning verification tests and evaluation metrics are listed below.
Verification
1. Feature Injection Test
2. Forgetting Measuring
3. Information Leakage
4. Membership Inference Attack
5. Backdoor Attack
6. Slowdown Attack
7. Interclass Confusion Test
8. Federated Verification
9. Cryptographic Protocol
Evaluation Metric
1. Accuracy
2. Completeness
3. Unlearn Time
4. Relearn Time
5. Layer-wise Distance
6. Activation Distance
7. JS Divergence
8. Membership Inference Attack
9. ZRF Score
10. Model Inversion Attack
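Two of the listed evaluation metrics compare the output distributions of an unlearned and a retrained model. The sketch below implements the standard Jensen-Shannon divergence and, as an assumption about how activation distance is usually defined, the L2 distance between two output vectors. The probability vectors are made-up examples, not results.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by ln(2) with natural log."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def activation_distance(p, q):
    """L2 distance between two output (activation) vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Hypothetical class-probability outputs for the same input sample.
unlearned_out = [0.7, 0.2, 0.1]
retrained_out = [0.6, 0.3, 0.1]

print("JS divergence:      ", js_divergence(unlearned_out, retrained_out))
print("activation distance:", activation_distance(unlearned_out, retrained_out))
```

Both values are 0 when the two models produce identical outputs, so smaller values suggest the unlearned model behaves more like the retrained one.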
Verification
Two of the most commonly referenced tests
● Feature Injection Test
● Membership Inference Attack
Kong, Z., & Alfeld, S. (2022). Approximate Data Deletion in Generative Models. arXiv:2206.14439
Shokri, R., Stronati, M., Song, C., & Shmatikov, V. (2016). Membership Inference Attacks against Machine Learning Models. arXiv:1610.05820
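A membership inference check asks whether an attacker can still tell that the forgotten samples were in the training set. The sketch below is a deliberately simplified confidence-threshold variant, not the shadow-model attack of Shokri et al.: a sample is guessed to be a training member when the model's confidence on it exceeds a threshold. The threshold and all confidence values are made up for illustration.

```python
def flagged_as_member(confidences, threshold=0.9):
    """Fraction of samples the threshold attack would call training members."""
    return sum(c >= threshold for c in confidences) / len(confidences)


# Hypothetical model confidences on the forget-set samples.
before_unlearning = [0.99, 0.97, 0.95]
after_unlearning = [0.62, 0.91, 0.55]

before = flagged_as_member(before_unlearning)
after = flagged_as_member(after_unlearning)

print("flagged before unlearning:", before)
print("flagged after unlearning: ", after)
```

If unlearning worked, the flagged fraction on the forget set should fall towards the rate observed for samples the model never saw.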
Evaluation Metric
● Layer-wise Distance
● Unlearn Time
Tarun, A. K., Chundawat, V. S., Mandal, M., & Kankanhalli, M. (2021). Fast Yet Effective Machine Unlearning. https://doi.org/10.1109/TNNLS.2023.3266233
Cao, Y., & Yang, J. (2015). Towards Making Systems Forget with Machine Unlearning
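Both metrics are straightforward to compute once the weights and the unlearning routine are available. The sketch below assumes layer-wise distance means the per-layer L2 distance between the weights of the unlearned and retrained models, and measures unlearn time as wall-clock time. The weight dictionaries, the layer names, and the stand-in workload passed to `timed` are hypothetical.

```python
import math
import time


def layerwise_distance(model_a, model_b):
    """Per-layer L2 distance between two models given as {layer: [weights]}."""
    return {
        name: math.sqrt(
            sum((a - b) ** 2 for a, b in zip(model_a[name], model_b[name]))
        )
        for name in model_a
    }


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical flattened weights for two small models.
unlearned_weights = {"layer1": [0.0, 0.0], "layer2": [1.0]}
retrained_weights = {"layer1": [3.0, 4.0], "layer2": [1.0]}

distances = layerwise_distance(unlearned_weights, retrained_weights)
print(distances)

# Stand-in workload; in practice fn would be the unlearning or retraining routine.
result, elapsed = timed(sum, [1, 2, 3])
print("elapsed seconds:", elapsed)
```

Timing the unlearning routine and the retraining routine with the same helper gives the two numbers needed to compare unlearn time against retrain time.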