Identifying Critical Neurons in
ANN Architectures using Mixed
Integer Programming
Mostafa ElAraby, Guy Wolf, Margarida Carvalho
[OPTML NeurIPS 2020]
Motivation
Over-parameterized ANNs contain efficient sub-networks that offer
faster inference with only a marginal loss in accuracy compared to
the original network.
Frankle and Carbin (2018) introduced the lottery ticket hypothesis
and empirically showed the existence of a lucky pruned subnetwork,
a winning ticket.
Contents
● Preliminaries on Mixed-Integer Programming (MIP)
● Neuron Importance Score introduction
● MIP formulation
● Proposed Algorithm
● Scalability
● Experiments
● Conclusion
MIP Preliminary
Linear Programming (LP)
A powerful framework for solving optimization problems that consist of the
following components:
Linear Objective
A linear function of the decision variables that we want to minimize or
maximize.
Decision Variables
The variables optimized by the LP. When the optimization finishes, the
solver returns their optimal values.
Linear Constraints
A set of linear constraints on the decision variables that every solution
must satisfy, which narrows the search space. The solver reports the
problem as infeasible if no solution satisfies all of the linear
constraints.
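To make the three components concrete, here is a minimal sketch of a two-variable LP solved by enumerating the vertices of the feasible region. It is not how production solvers work; it assumes the feasible region is bounded, and the function name `solve_lp_2d` and the example numbers are illustrative only.

```python
from itertools import combinations


def solve_lp_2d(c, A, b, eps=1e-9):
    """Maximize c[0]*x + c[1]*y subject to A[i][0]*x + A[i][1]*y <= b[i].

    Returns (best_value, (x, y)), or None if no vertex is feasible.
    Assumes the feasible region is bounded, so an optimum sits at a vertex.
    """
    best = None
    for i, j in combinations(range(len(A)), 2):
        # Intersect constraint lines i and j using Cramer's rule.
        det = A[i][0] * A[j][1] - A[i][1] * A[j][0]
        if abs(det) < eps:
            continue  # parallel lines, no single intersection point
        x = (b[i] * A[j][1] - A[i][1] * b[j]) / det
        y = (A[i][0] * b[j] - b[i] * A[j][0]) / det
        # Keep the point only if it satisfies every constraint.
        if all(r[0] * x + r[1] * y <= rhs + eps for r, rhs in zip(A, b)):
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best


# maximize 3x + 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x <= 3,  x >= 0,  y >= 0
A = [[1, 1], [1, 3], [1, 0], [-1, 0], [0, -1]]
b = [4, 6, 3, 0, 0]
lp_solution = solve_lp_2d([3, 2], A, b)

# Adding x + y >= 5 (written as -x - y <= -5) contradicts x + y <= 4,
# so the solver has nothing to return and reports infeasibility.
infeasible = solve_lp_2d([3, 2], A + [[-1, -1]], b + [-5])
```

Here `c` is the linear objective, `x` and `y` are the decision variables, and each row of `A` with its entry in `b` is one linear constraint.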
Mixed-Integer Programming (MIP)
Similar to linear programming, but some decision variables may be restricted
to integer values alongside the continuous variables used in linear programming.
It is a harder problem, but it can be relaxed into a linear program by dropping
the integrality requirements.
Branch and Bound algorithm
We relax the MIP into an LP and solve it. If
we are lucky, the LP solution is already
integral and is therefore optimal for the MIP.
Otherwise, which is the usual case, we pick
an integer variable whose LP value is
fractional (the branching variable) and add
linear constraints that exclude that
fractional value, resulting in 2 new MIPs.
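The procedure can be sketched on a small 0/1 knapsack problem, chosen here only because its LP relaxation can be solved exactly by a greedy rule without a solver library. The problem, the function names, the depth-first order, and the pruning by bound are assumptions for illustration, not details taken from the slides.

```python
def lp_relaxation(values, weights, capacity, fixed):
    """Solve the LP relaxation (0 <= x_i <= 1) of a 0/1 knapsack exactly.

    `fixed` maps item index -> 0 or 1 for variables pinned by branching.
    Returns (bound, x), or None if the pinned items already exceed capacity.
    Assumes all weights are positive.
    """
    x = [0.0] * len(values)
    remaining = capacity
    for i, v in fixed.items():
        x[i] = float(v)
        remaining -= weights[i] * v
    if remaining < 0:
        return None  # infeasible subproblem
    free = [i for i in range(len(values)) if i not in fixed]
    # Greedy by value density is optimal for the fractional knapsack LP.
    free.sort(key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if weights[i] <= remaining:
            x[i] = 1.0
            remaining -= weights[i]
        else:
            x[i] = remaining / weights[i]  # at most one fractional item
            break
    bound = sum(v * xi for v, xi in zip(values, x))
    return bound, x


def branch_and_bound(values, weights, capacity):
    """Maximize total value with binary item choices under a capacity limit."""
    best_value, best_x = float("-inf"), None
    stack = [{}]  # each entry is a set of branching decisions
    while stack:
        fixed = stack.pop()
        relaxed = lp_relaxation(values, weights, capacity, fixed)
        if relaxed is None:
            continue
        bound, x = relaxed
        if bound <= best_value:
            continue  # this subproblem cannot beat the best solution so far
        fractional = [i for i, xi in enumerate(x) if 0.0 < xi < 1.0]
        if not fractional:
            # The LP solution is integral, so it solves this subproblem.
            best_value, best_x = bound, [int(round(xi)) for xi in x]
            continue
        i = fractional[0]  # branching variable
        # Exclude the fractional value by forcing x_i = 0 or x_i = 1.
        stack.append({**fixed, i: 0})
        stack.append({**fixed, i: 1})
    return best_value, best_x


result = branch_and_bound([60, 100, 120], [10, 20, 30], 50)
```

The two dictionaries pushed onto the stack are the "2 new MIPs": each adds one linear constraint on the branching variable that rules out its fractional LP value.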
Neuron Importance Score
Introduction
A MIP solver computes an importance
score in [0, 1] for each neuron in the
convolutional and fully connected layers.
Neurons with a small importance score
can be safely pruned without a loss in
accuracy.
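As a minimal sketch of how such scores might be used, the snippet below drops the neurons of one fully connected layer whose score falls under a cutoff. The function name `prune_layer`, the row-per-neuron weight layout, and the threshold value are hypothetical; the slides do not state a specific cutoff.

```python
def prune_layer(weights, biases, scores, threshold=0.1):
    """Keep only the neurons whose importance score reaches the threshold.

    weights: one row of incoming weights per neuron (list of lists).
    biases:  one bias per neuron.
    scores:  one importance score in [0, 1] per neuron.
    threshold: hypothetical cutoff, chosen here only for illustration.
    Returns (kept_weights, kept_biases, kept_indices).
    """
    if not (len(weights) == len(biases) == len(scores)):
        raise ValueError("weights, biases and scores must describe the same neurons")
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return [weights[i] for i in keep], [biases[i] for i in keep], keep


layer_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
layer_biases = [0.1, 0.2, 0.3]
layer_scores = [0.9, 0.02, 0.5]
pruned_w, pruned_b, kept = prune_layer(layer_weights, layer_biases, layer_scores)
```

When a neuron is removed, the matching input column of the following layer has to be removed as well so that the shapes stay consistent.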
MIP Formulation
Linear layers with no activation
Let h be a decision variable representing the input value to layer l, which
has weights W and bias b
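A linear layer without activation is commonly encoded as a single equality constraint per layer. A sketch in the notation above, where the layer superscripts are an assumption rather than the slide's exact indexing:

```latex
h^{l+1} = W^{l} h^{l} + b^{l}
```

Because the trained weights and biases are constants in the MIP, this constraint is linear in the decision variables.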
ReLU activated layers
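A ReLU unit is commonly encoded in a MIP with one binary variable z per neuron and known bounds on the pre-activation (a "big-M" encoding). The sketch below checks that standard encoding for a single neuron with pre-activation a in [L, U], L < 0 < U. It is not necessarily the exact constraint set used on the slide, and the bounds and helper names are assumptions.

```python
def relu_bigm_feasible(a, h, z, lower, upper, tol=1e-9):
    """Check the big-M ReLU constraints for pre-activation a, output h, gate z."""
    return (
        h >= -tol                            # h >= 0
        and h >= a - tol                     # h >= a
        and h <= a - lower * (1 - z) + tol   # h <= a - L(1 - z)
        and h <= upper * z + tol             # h <= U z
    )


def feasible_outputs(a, lower, upper, candidates):
    """List the (h, z) pairs with binary z that satisfy the constraints."""
    return [(h, z) for z in (0, 1) for h in candidates
            if relu_bigm_feasible(a, h, z, lower, upper)]


LOWER, UPPER = -4.0, 4.0               # assumed bounds on the pre-activation
grid = [k / 2 for k in range(-8, 9)]   # candidate h values from -4.0 to 4.0

for a in (-3.0, 0.0, 2.5):
    print(a, feasible_outputs(a, LOWER, UPPER, grid))
```

With z = 1 the constraints force h = a (an active neuron), and with z = 0 they force h = 0 (an inactive neuron), so every feasible h equals max(a, 0).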
Relaxing z decision variable
To reduce solving time, we replace the binary decision variable z with a
relaxed approximation
Proposed Constraints with Importance Score S
Representing Convolutional layers
We convert convolutional layers to flattened Toeplitz matrices, which turns the
convolution into a plain matrix multiplication. We then reuse the constraints
introduced for the fully connected layers, with one importance score for
each filter
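The idea can be illustrated in one dimension: a sliding-window convolution equals multiplication by a matrix whose rows are shifted copies of the kernel. This is a simplified sketch; for 2D image convolutions the corresponding matrix is doubly block Toeplitz, and the helper names here are illustrative.

```python
def conv1d_direct(signal, kernel):
    """Valid cross-correlation, the operation computed by convolutional layers."""
    n, k = len(signal), len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(n - k + 1)]


def toeplitz_matrix(kernel, n):
    """Build the (n - k + 1) x n matrix whose rows are shifted copies of the kernel."""
    k = len(kernel)
    rows = []
    for i in range(n - k + 1):
        rows.append([0.0] * i + list(kernel) + [0.0] * (n - k - i))
    return rows


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]
T = toeplitz_matrix(kernel, len(signal))
same = matvec(T, signal) == conv1d_direct(signal, kernel)
```

Once the layer is written as `T` times the flattened input, it has the same form as a fully connected layer, so the same linear constraints apply.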
Objective Function: Softmax
Softmax: the marginal softmax, which penalizes wrong predictions
regardless of the logit value. Y is the one-hot encoded true label.
Objective Function: Sparsity
I represents the scaled-down importance score (s - 2), which was shown empirically
to give non-important neurons a lower score.
When we increase λ, more neurons get a value near zero.
Proposed Algorithm
Scalability
MIP solvers are slow
A MIP that represents a deep neural network is hard to solve even for commercial
solvers, which makes it difficult for our algorithm to scale to large models.
To address this problem we propose 2 solutions:
- Parallelizing computation layer-wise
- Parallelizing computation class-wise
Parallel layers using decoupled greedy learning
Class-wise decoupling
In this experiment, we show that the neuron importance scores can be
approximated by 1) solving the MIP for each class with only one data point
from that class, and then 2) taking the average of the computed scores for each
neuron as the final score estimate. Such a procedure speeds up our methodology
for problems with numerous classes.
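Step 2 can be sketched as a plain per-neuron average. The per-class score lists below are made-up placeholders standing in for the output of the per-class MIP solves, and `average_scores` is a hypothetical helper name.

```python
def average_scores(per_class_scores):
    """Average neuron importance scores over classes.

    per_class_scores: dict mapping a class label to a list of scores, one per
    neuron, each list coming from a MIP solved on one data point of that class.
    """
    runs = list(per_class_scores.values())
    if not runs:
        raise ValueError("at least one class is required")
    n_neurons = len(runs[0])
    if any(len(r) != n_neurons for r in runs):
        raise ValueError("every class must score the same neurons")
    return [sum(r[i] for r in runs) / len(runs) for i in range(n_neurons)]


# Placeholder scores for a layer with three neurons and two classes.
final_scores = average_scores({
    "class_0": [1.0, 0.0, 0.5],
    "class_1": [0.5, 0.0, 1.0],
})
```

Since each class's MIP is independent of the others, the per-class solves can run in parallel before this averaging step.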
Experiments
Pruning Experiments
Robustness Experiments
We show empirically that our framework is robust across different convergence
levels of the trained neural network, as shown in the following figure.
Generalization Experiments
Cross-dataset generalization: sub-network masking is computed on a source
dataset (d1) and then applied to a target dataset (d2) by retraining with the
same early initialization. Test accuracies are presented for masked and
unmasked (REF.) networks on d2, as well as pruning percentage.
Conclusion
We proposed a mixed integer program to compute neuron importance scores in
ReLU-based deep neural networks. Our contributions focus on providing
scalable computation of importance scores in fully connected and
convolutional layers.

