Identifying Critical Neurons in ANN Architectures using Mixed Integer Programming

Mostafa ElAraby Guy Wolf Margarida Carvalho
[OPTML NeurIPS 2020]

Motivation
Over-parameterized ANNs contain efficient
sub-networks that offer faster inference
with only a marginal loss in accuracy
compared to the original network.
Frankle and Carbin (2018) introduced the
lottery ticket hypothesis and showed
empirically that a lucky pruned
subnetwork, a winning ticket, exists.

Contents
● Preliminaries on Mixed-Integer Programming (MIP)
● Introduction to the Neuron Importance Score
● MIP formulation
● Proposed Algorithm
● Scalability
● Experiments
● Conclusion

Linear Programming (LP)
A powerful framework used to solve optimization problems in the following form:
Linear Objective
A linear function of the
decision variables that the
solver either minimizes or
maximizes.
Decision Variables
The variables optimized by the
LP solver. When it finishes,
the solver reports the value
found for each of them.
Linear Constraints
A set of linear constraints on
the decision variables that
every solution must satisfy,
narrowing the search space.
The solver reports the problem
as infeasible if no solution
satisfies all of the linear
constraints.
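
The three ingredients above can be illustrated with a minimal sketch. It assumes only two decision variables and a bounded feasible region, and it finds the optimum by brute-force enumeration of the vertices where two constraints intersect. This is not how production LP solvers work (they use simplex or interior-point methods); the function name `solve_lp_2d` and the example numbers are hypothetical.

```python
from itertools import combinations

def solve_lp_2d(c, A, b, eps=1e-9):
    """Maximize c[0]*x + c[1]*y subject to A[i] . (x, y) <= b[i].

    Toy approach: an optimum of a bounded LP lies at a vertex, so we
    intersect every pair of constraint lines and keep the best feasible
    point. Returns (value, (x, y)), or None if no feasible vertex exists.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(list(zip(A, b)), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) < eps:
            continue  # parallel lines, no unique intersection
        # Cramer's rule for the 2x2 system a1.(x,y)=b1, a2.(x,y)=b2
        x = (b1 * a2[1] - a1[1] * b2) / det
        y = (a1[0] * b2 - a2[0] * b1) / det
        if all(r[0] * x + r[1] * y <= rhs + eps for r, rhs in zip(A, b)):
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best

# Linear objective: maximize 3x + 2y
# Linear constraints: x + y <= 4, x + 3y <= 6, x >= 0, y >= 0
A = [(1, 1), (1, 3), (-1, 0), (0, -1)]
b = [4, 6, 0, 0]
value, (x, y) = solve_lp_2d((3, 2), A, b)
print(value, x, y)  # optimum value 12 at x = 4, y = 0

# Contradictory constraints (x <= 1 and x >= 2) make the problem infeasible.
print(solve_lp_2d((1, 0), [(1, 0), (-1, 0), (0, -1), (0, 1)], [1, -2, 0, 1]))
```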

Mixed-Integer Programming (MIP)
Similar to linear programming, but some decision variables may be restricted
to integer values alongside the continuous variables used in linear programming.
It is a harder problem (NP-hard in general) that can be relaxed into a linear
program by dropping the integrality constraints.

Branch and Bound algorithm
We relax the MIP into an LP and solve it.
If the LP solution happens to be integral,
we are lucky and it is optimal for the MIP.
Otherwise, which is the usual case, we
pick an integer variable with a fractional
value (the branching variable) and add
linear constraints that exclude that
value, resulting in 2 new MIPs.
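
The branching idea can be sketched on a toy 0-1 knapsack problem. The example assumes positive weights and relies on the fact that the LP relaxation of a knapsack can be solved greedily by value-to-weight ratio, so no external solver is needed. The function names and the instance are hypothetical, and real MIP solvers add many refinements (cutting planes, heuristics, smarter branching rules).

```python
def lp_relaxation(values, weights, capacity, fixed):
    """LP relaxation of 0-1 knapsack with some variables fixed to 0 or 1.

    Returns (upper_bound, x) with x possibly fractional, or None if the
    fixed choices already exceed the capacity (infeasible node).
    """
    x = [0.0] * len(values)
    remaining, total = capacity, 0.0
    for i, v in fixed.items():
        x[i] = float(v)
        if v == 1:
            remaining -= weights[i]
            total += values[i]
    if remaining < 0:
        return None
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if remaining <= 0:
            break
        take = min(1.0, remaining / weights[i])
        x[i] = take
        total += take * values[i]
        remaining -= take * weights[i]
        if take < 1.0:
            break  # capacity is used up by this fractional item
    return total, x

def branch_and_bound(values, weights, capacity):
    """Maximize total value; returns (best_value, best_x)."""
    best_value, best_x = 0.0, [0] * len(values)
    stack = [{}]  # each node is a dict {variable index: fixed value}
    while stack:
        fixed = stack.pop()
        relaxed = lp_relaxation(values, weights, capacity, fixed)
        if relaxed is None:
            continue  # infeasible node
        bound, x = relaxed
        if bound <= best_value:
            continue  # the relaxation cannot beat the incumbent: prune
        fractional = [i for i, xi in enumerate(x) if 0.0 < xi < 1.0]
        if not fractional:
            # Integral LP solution: optimal for this node.
            best_value, best_x = bound, [int(round(xi)) for xi in x]
            continue
        j = fractional[0]  # branching variable
        stack.append({**fixed, j: 0})  # child MIP with x_j = 0
        stack.append({**fixed, j: 1})  # child MIP with x_j = 1
    return best_value, best_x

print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))
```

Each child node adds one constraint (x_j = 0 or x_j = 1) that excludes the fractional value found by the parent's relaxation, which is the 2-way split described above.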

Introduction
A MIP solver computes an importance
score in [0, 1] for each neuron in the
convolutional and fully connected layers.
Neurons with a small importance score
can be pruned with only a marginal loss
in accuracy.
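
Once scores are available, pruning reduces to thresholding. The snippet below is a minimal sketch under stated assumptions: the scores are placeholder numbers rather than solver output, the threshold of 0.1 is arbitrary, and the helper names `prune_mask` and `apply_mask` are hypothetical.

```python
from typing import List

def prune_mask(scores: List[float], threshold: float) -> List[bool]:
    """Return True for every neuron whose score is at least the threshold."""
    return [s >= threshold for s in scores]

def apply_mask(weights: List[List[float]], mask: List[bool]) -> List[List[float]]:
    """Keep only the weight rows (one row per neuron) marked True."""
    return [row for row, keep in zip(weights, mask) if keep]

# Placeholder importance scores in [0, 1] for a layer with 4 neurons.
scores = [0.9, 0.05, 0.4, 0.0]
weights = [[0.2, -0.1], [0.7, 0.3], [-0.5, 0.8], [0.1, 0.1]]

mask = prune_mask(scores, threshold=0.1)
pruned = apply_mask(weights, mask)
print(mask)    # [True, False, True, False]
print(pruned)  # [[0.2, -0.1], [-0.5, 0.8]]
```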

Linear layers with no activation
Let h be a decision variable representing the input to layer l, which has
weights W and bias b

Relaxing z decision variable
To reduce solving time, we replace the binary decision variable z with a
continuous relaxation
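
As a minimal sketch of how a binary variable can gate a ReLU and what relaxing it costs, the snippet below assumes the standard big-M encoding from the MIP literature, which may differ from the exact constraints used in this work. Here `a` stands for the pre-activation value, `lower < 0 < upper` are assumed bounds on it, and all function names are hypothetical.

```python
def feasible(a, h, z, lower, upper, eps=1e-9):
    """Big-M ReLU constraints linking output h, pre-activation a and gate z."""
    return (h >= -eps                            # h >= 0
            and h >= a - eps                     # h >= a
            and h <= a - lower * (1 - z) + eps   # h <= a - L(1 - z)
            and h <= upper * z + eps)            # h <= U z

def relu_from_binary_z(a, lower, upper):
    """Enumerate z in {0, 1} and collect every feasible output h.

    With z = 1 the constraints force h = a; with z = 0 they force h = 0,
    so these two candidates are the only ones that need checking.
    """
    candidates = {1: float(a), 0: 0.0}
    return {h for z, h in candidates.items() if feasible(a, h, z, lower, upper)}

def relaxed_h_range(a, lower, upper):
    """Range of h allowed once z is any real number in [0, 1]."""
    low = max(0.0, a)
    high = upper * (a - lower) / (upper - lower)
    return low, high

for a in (-2.0, -0.5, 0.0, 1.5):
    # With binary z the only feasible output is exactly max(0, a).
    print(a, relu_from_binary_z(a, lower=-3.0, upper=3.0))

# With relaxed z the output is no longer pinned down to a single value.
print(relaxed_h_range(0.0, lower=-3.0, upper=3.0))  # (0.0, 1.5)
```

Under this encoding the relaxation trades exactness for speed: the LP is easier to solve, but h may take any value in the printed range rather than exactly max(0, a).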

Proposed Constraints with Importance Score S

Representing Convolutional layers
We flatten convolutional layers into Toeplitz matrices, which turns the
convolution into a plain matrix multiplication, and reuse the constraints
introduced for the fully connected layers, with one importance score for
each filter
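
The conversion can be sketched in one dimension. This example assumes a 1D "valid" convolution without kernel flipping (the cross-correlation used by deep learning frameworks); the 2D case follows the same idea with a larger, block-structured matrix. The helper names are hypothetical.

```python
def toeplitz_matrix(kernel, input_len):
    """Build the (input_len - k + 1) x input_len matrix whose rows are
    shifted copies of the kernel."""
    k = len(kernel)
    rows = []
    for i in range(input_len - k + 1):
        row = [0.0] * input_len
        row[i:i + k] = kernel  # place the kernel at offset i
        rows.append(row)
    return rows

def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def direct_conv(kernel, signal):
    """Sliding-window convolution (no kernel flip, no padding)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

kernel = [1, 0, -1]
signal = [1, 2, 4, 7, 11]
T = toeplitz_matrix(kernel, len(signal))
print(matvec(T, signal))            # [-3.0, -5.0, -7.0]
print(direct_conv(kernel, signal))  # [-3, -5, -7]
```

Because the layer is now a single matrix product, the same linear constraints used for a fully connected layer apply to it unchanged.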

Objective Function: Softmax
The softmax term is the marginal softmax, which penalizes wrong predictions
regardless of the logit value. Y is the one-hot encoded true label.

Objective Function: Sparsity
I represents the scaled-down importance score (s - 2), which was shown
empirically to give non-important neurons a lower score.
As λ increases, more neurons receive a score near zero.

MIP Solvers are slow
A deep neural network represented as a MIP is hard to solve even with
commercial solvers, which makes it difficult for our algorithm to scale to
large models.
We propose 2 solutions to this problem:
- Parallelizing the computation layer-wise
- Parallelizing the computation class-wise

Parallel layers using decoupled greedy learning

Class-wise decoupling
In this experiment, we show that the neuron importance scores can be
approximated by 1) solving, for each class, the MIP with only one data point
from that class, and then 2) taking the average of the computed scores for each
neuron as the final score estimate. Such a procedure speeds up our methodology
for problems with numerous classes.
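
The averaging in step 2 can be sketched as follows. The per-class scores are placeholder numbers standing in for the output of the independent per-class MIP solves, and `average_scores` is a hypothetical helper name.

```python
from typing import Dict, List

def average_scores(per_class_scores: Dict[int, List[float]]) -> List[float]:
    """Average each neuron's importance score over all classes.

    per_class_scores maps a class label to the list of neuron scores
    obtained from the MIP solved with one data point of that class.
    """
    score_lists = list(per_class_scores.values())
    num_neurons = len(score_lists[0])
    return [sum(scores[i] for scores in score_lists) / len(score_lists)
            for i in range(num_neurons)]

# Placeholder scores for 3 neurons, from two independent per-class solves.
per_class = {
    0: [1.0, 0.0, 0.5],
    1: [0.5, 0.0, 1.0],
}
print(average_scores(per_class))  # [0.75, 0.0, 0.75]
```

Since each per-class problem is independent, the solves can run in parallel and only this cheap averaging step needs all of the results.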

Robustness Experiments
We show empirically that our framework is robust across different convergence
levels of the trained neural network, as shown in the following figure.

Generalization Experiments
Cross-dataset generalization: the sub-network mask is computed on a source
dataset (d1) and then applied to a target dataset (d2) by retraining with the
same early initialization. Test accuracies are presented for masked and
unmasked (REF.) networks on d2, as well as the pruning percentage.

Conclusion
We proposed a mixed integer program to compute neuron importance scores in
ReLU-based deep neural networks. Our contributions focus on providing
scalable computation of importance scores in fully connected and
convolutional layers.
