Introducing sparsity in artificial neural networks: a sensitivity-based approach
ENZO TARTAGLIONE
POSTDOC AT UNIVERSITY OF TORINO
Università degli Studi di Torino
Computer Science Dept.
EIDOS group
Deep networks
• High number of hidden layers
• More complex classification tasks (ImageNet)
• Use of convolutional layers, pooling layers and very large fully-connected layers
• Very high number of parameters (hundreds of millions and even more…)
• Is it possible to boost performance by making the ANN robust to noise?
STATE OF THE ART
Size of ANN models vs generalization
Approaches to reduce the size of an ANN
Quantization [Zhou et al., 2016] [Han et al., 2015-1]
Modify the architecture [Howard et al., 2017]
Regularize and prune to achieve sparsity
Why sparse networks?
Less memory required.
Fewer computational resources needed.
Deployability on embedded devices.
Typical architectures are overparameterized!
Some existing pruning strategies…
Design a proxy L0 regularizer [Louizos et al., 2018]
Greedy thresholding after L2+dropout strategy [Han et al., 2015-2]
Grouping for convolutional features [Lebedev and Lempitsky, 2016] [Hadifar et al., 2020]
Dropout-based approaches [Srivastava et al., 2014]
Lasso-based regularizers [Scardapane et al., 2017]
…
When is a parameter necessary?
[Figure: forward propagation through a network from INPUT to OUTPUT 𝑦, with a single parameter 𝑤 highlighted]
If changing 𝑤 changes the output of the network, we do not want to modify it!
If changing 𝑤 does not change the output of the network, we are free to change it!
PUBLISHED
Tartaglione, E., Lepsoy, S., Fiandrotti, A., Francini, G. (2018). Learning sparse neural networks via sensitivity-driven regularization. In Advances in Neural Information Processing Systems (NeurIPS 2018)
PRE-TRAINED MODEL
Definition of sensitivity
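Small perturbation of $w_i$:
$$\Delta y_k \approx \Delta w_i \, \frac{\partial y_k}{\partial w_i}$$
Sensitivity of the output $\mathbf{y}$ to the parameter $w_i$:
$$S(\mathbf{y}, w_i) = \sum_{k=1}^{C} \alpha_k \left| \frac{\partial y_k}{\partial w_i} \right|$$
where $C$ is the size of the output and $\alpha_k$ is a scalar weighting factor.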
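To make the definition concrete, here is a minimal Python sketch that evaluates $S(\mathbf{y}, w_i)$ for every weight of a tiny ReLU network. The network `toy_net`, the weights `W` and the input `X` are hypothetical, and the sketch assumes $\alpha_k = 1/C$ and estimates $\partial y_k / \partial w_i$ by central finite differences for a single input; a practical implementation would obtain the derivatives through back-propagation, typically over many inputs.

```python
from typing import Callable, List, Optional, Sequence

# Type of a network: (weights, input) -> output vector y.
Net = Callable[[Sequence[float], Sequence[float]], List[float]]


def relu(z: float) -> float:
    return max(0.0, z)


def toy_net(w: Sequence[float], x: Sequence[float]) -> List[float]:
    """Hypothetical 2-2-2 ReLU network: w[0:4] input->hidden, w[4:8] hidden->output."""
    h0 = relu(w[0] * x[0] + w[1] * x[1])
    h1 = relu(w[2] * x[0] + w[3] * x[1])
    return [w[4] * h0 + w[5] * h1, w[6] * h0 + w[7] * h1]


def sensitivity(f: Net, w: Sequence[float], x: Sequence[float], i: int,
                alpha: Optional[Sequence[float]] = None, eps: float = 1e-6) -> float:
    """S(y, w_i) = sum_k alpha_k * |dy_k/dw_i|, derivatives by central differences."""
    w_plus, w_minus = list(w), list(w)
    w_plus[i] += eps
    w_minus[i] -= eps
    y_plus, y_minus = f(w_plus, x), f(w_minus, x)
    if alpha is None:
        alpha = [1.0 / len(y_plus)] * len(y_plus)  # assumption: alpha_k = 1/C
    return sum(a * abs((yp - ym) / (2.0 * eps))
               for a, yp, ym in zip(alpha, y_plus, y_minus))


W = [1.0, 0.5, -1.0, -1.0, 2.0, 0.3, -0.5, 0.8]  # hidden unit 1 is inactive for X
X = [1.0, 1.0]
print([round(sensitivity(toy_net, W, X, i), 6) for i in range(len(W))])
# [1.25, 1.25, 0.0, 0.0, 0.75, 0.0, 0.75, 0.0]
```

The weights feeding or leaving the inactive hidden unit ($w_2, w_3, w_5, w_7$ in the sketch) have zero sensitivity: perturbing them leaves $\mathbf{y}$ unchanged, which is exactly the case in which we are free to change them.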
Towards the definition of the update term
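We need an insensitivity measure:
$$\bar{S}(\mathbf{y}, w_i) = 1 - S(\mathbf{y}, w_i)$$
To guarantee that this quantity is never negative, the trivial choice is to bound it from below:
$$\bar{S}_b(\mathbf{y}, w_i) = \max\left[0, \bar{S}(\mathbf{y}, w_i)\right]$$
[Figure: parameter value 𝑤 vs. importance 𝑆]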
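A short sketch of the two definitions above, applied to a few hypothetical sensitivity values; `insensitivity` and `bounded_insensitivity` are illustrative names, not taken from the slides.

```python
from typing import List


def insensitivity(s: float) -> float:
    """S_bar(y, w_i) = 1 - S(y, w_i); negative whenever S > 1."""
    return 1.0 - s


def bounded_insensitivity(s: float) -> float:
    """S_bar_b(y, w_i) = max(0, S_bar(y, w_i)); never negative."""
    return max(0.0, insensitivity(s))


# Hypothetical sensitivities of four parameters.
sensitivities: List[float] = [1.25, 0.0, 0.75, 0.5]
print([bounded_insensitivity(s) for s in sensitivities])  # [0.0, 1.0, 0.25, 0.5]
```

A parameter with $S \ge 1$ gets $\bar{S}_b = 0$, while a parameter the output ignores ($S = 0$) gets $\bar{S}_b = 1$.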
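Sensitivity-based regularization
Weight update proposed:
$$w_i^{t} := w_i^{t-1} - \eta \frac{\partial L}{\partial w_i^{t-1}} - \lambda\, w_i^{t-1}\, \bar{S}_b\left(\mathbf{y}, w_i^{t-1}\right)$$
where
• $\bar{S}_b$ is the bounded insensitivity: it equals 1 when the output $\mathbf{y}$ does not depend on the parameter ($S = 0$) and 0 when $S(\mathbf{y}, w_i) \ge 1$, so the last term shrinks only the parameters that are (nearly) irrelevant to the computation of the output.
• $\mathbf{y}$ is the output of the network.
• $w_i$ is the parameter.
• $L$ is a generic loss function.
• $\eta$ is the learning rate and $\lambda$ weights the regularization term.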
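The update can be sketched in plain Python as follows. The function name `sensitivity_step`, the values of $\eta$ and $\lambda$, and the two-parameter example are assumptions for illustration; the insensitivities $\bar{S}_b$ are passed in already computed.

```python
from typing import List


def sensitivity_step(w: List[float], grad: List[float], s_bar_b: List[float],
                     eta: float, lam: float) -> List[float]:
    """One update: w_i <- w_i - eta * dL/dw_i - lam * w_i * S_bar_b(y, w_i)."""
    return [wi - eta * gi - lam * wi * si for wi, gi, si in zip(w, grad, s_bar_b)]


# Two hypothetical parameters with zero loss gradient: the first is relevant
# (S_bar_b = 0), the second does not affect the output (S_bar_b = 1).
w = [0.5, 0.5]
for _ in range(100):
    w = sensitivity_step(w, grad=[0.0, 0.0], s_bar_b=[0.0, 1.0], eta=0.1, lam=0.1)
print(w)  # first weight unchanged, second decayed by a factor 0.9 per step
```

With a zero loss gradient, the relevant parameter is not touched, while the irrelevant one decays by a factor $(1 - \lambda)$ per step, behaving like weight decay restricted to insensitive parameters.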
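Which function are we minimizing?
We need to solve the integral
$$R = \int w \cdot \bar{S}_b(\mathbf{y}, w)\, dw$$
Math, math, math…
In general (valid for any architecture and any loss function):
$$R = \Theta\left[\bar{S}(\mathbf{y}, w)\right] \frac{w^2}{2} \left[1 - \sum_{k=1}^{C} \alpha_k \,\mathrm{sign}\!\left(\frac{\partial y_k}{\partial w}\right) \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\partial^n y_k}{\partial w^n} \frac{2\, w^{n-1}}{(n+1)!}\right]$$
where $\Theta$ is the Heaviside step function. For ReLU-activated networks, each output is piecewise linear in a single weight $w$, so the derivatives with $n \ge 2$ vanish and
$$R = \frac{w^2}{2}\, \bar{S}_b(\mathbf{y}, w)$$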
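The ReLU closed form can be checked numerically on a one-weight toy model $y(w) = V \cdot \mathrm{ReLU}(wX)$ with hypothetical values $V = 0.4$, $X = 1.5$, $C = 1$ and $\alpha_1 = 1$. The sketch assumes $R(0) = 0$ as the integration constant and compares a midpoint-rule estimate of $\int_0^w u\,\bar{S}_b\,du$ with $\frac{w^2}{2}\bar{S}_b$.

```python
def relu(z: float) -> float:
    return max(0.0, z)


V, X = 0.4, 1.5  # hypothetical: for w > 0, |dy/dw| = V * X = 0.6


def y(w: float) -> float:
    return V * relu(w * X)


def s_bar_b(w: float, eps: float = 1e-6) -> float:
    """Bounded insensitivity with C = 1 and alpha_1 = 1 (central differences)."""
    s = abs((y(w + eps) - y(w - eps)) / (2.0 * eps))
    return max(0.0, 1.0 - s)


def r_numeric(w: float, n: int = 1000) -> float:
    """Midpoint-rule estimate of R(w) = integral from 0 to w of u * S_bar_b(y, u) du."""
    h = w / n
    return h * sum((k + 0.5) * h * s_bar_b((k + 0.5) * h) for k in range(n))


def r_closed_form(w: float) -> float:
    """ReLU closed form R = w^2 / 2 * S_bar_b(y, w)."""
    return 0.5 * w * w * s_bar_b(w)


print(r_numeric(2.0), r_closed_form(2.0))  # both approximately 0.8
```

Here $S = |VX| = 0.6$, so $\bar{S}_b = 0.4$ and both estimates are close to $\frac{2^2}{2} \cdot 0.4 = 0.8$.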
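Thresholding
During training, because of numerical errors and asymptotic behavior, the value of a parameter might never reach exactly zero.
For this reason we introduce a simple thresholding mechanism: every parameter whose magnitude falls below a threshold $T$ is set to zero.
$$|w_i| < T \;\Rightarrow\; w_i := 0$$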
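A minimal sketch of the thresholding step; the weights and the value of `T` are hypothetical, and `sparsity` is an illustrative helper that reports the fraction of exactly-zero parameters.

```python
from typing import List


def threshold(weights: List[float], T: float) -> List[float]:
    """Set to zero every parameter whose magnitude is below T."""
    return [0.0 if abs(w) < T else w for w in weights]


def sparsity(weights: List[float]) -> float:
    """Fraction of parameters that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


w = [0.5, -1e-4, 2e-3, -0.3, 5e-7]  # hypothetical weights after training
pruned = threshold(w, T=1e-3)
print(pruned, sparsity(pruned))  # [0.5, 0.0, 0.002, -0.3, 0.0] 0.4
```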
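Overview of the technique
$$w_i^{t} := w_i^{t-1} - \eta \frac{\partial L}{\partial w_i^{t-1}} - \lambda\, w_i^{t-1}\, \bar{S}_b\left(\mathbf{y}, w_i^{t-1}\right)$$
Forward propagation → Back-propagation (computes $\frac{\partial L}{\partial w}$ and $S(w)$) → Update → Pruning (at the end of the epoch)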
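Putting the pieces together, the loop below is a plain-Python sketch of this overview, not the authors' implementation. Everything concrete in it is an assumption: a 2-2-1 ReLU toy network whose second hidden unit never activates on the hypothetical training data `DATA`, hand-written back-propagation, batch size 1, sensitivity computed per sample with $C = 1$ and $\alpha_1 = 1$, and the values of $\eta$, $\lambda$ and $T$.

```python
from typing import List, Tuple

# Hypothetical training set: positive 2-D inputs, target t = x0 + x1.
DATA: List[Tuple[Tuple[float, float], float]] = [
    ((a, b), a + b) for a, b in [(0.5, 0.5), (1.0, 0.5), (0.5, 1.0), (1.0, 1.0)]
]


def relu(z: float) -> float:
    return max(0.0, z)


def forward(w: List[float], x: Tuple[float, float]):
    """Toy 2-2-1 ReLU net, w = [u00, u01, u10, u11, v0, v1]; returns y and a cache."""
    z0 = w[0] * x[0] + w[1] * x[1]
    z1 = w[2] * x[0] + w[3] * x[1]
    h0, h1 = relu(z0), relu(z1)
    return w[4] * h0 + w[5] * h1, (z0, z1, h0, h1)


def output_grad(w: List[float], x: Tuple[float, float], cache) -> List[float]:
    """Back-propagation written out by hand: dy/dw for every parameter."""
    z0, z1, h0, h1 = cache
    d0 = w[4] if z0 > 0 else 0.0  # dy/dz0
    d1 = w[5] if z1 > 0 else 0.0  # dy/dz1
    return [d0 * x[0], d0 * x[1], d1 * x[0], d1 * x[1], h0, h1]


def loss(w: List[float]) -> float:
    """Mean of 0.5 * (y - t)^2 over DATA."""
    return sum(0.5 * (forward(w, x)[0] - t) ** 2 for x, t in DATA) / len(DATA)


def train(w: List[float], eta: float = 0.1, lam: float = 0.01,
          T: float = 1e-3, epochs: int = 300) -> List[float]:
    w = list(w)
    for _ in range(epochs):
        for x, t in DATA:  # one iteration per sample (batch size 1)
            y, cache = forward(w, x)          # forward propagation
            dy = output_grad(w, x, cache)     # back-propagation
            for i, d in enumerate(dy):
                grad_l = (y - t) * d                   # dL/dw_i
                s_bar_b = max(0.0, 1.0 - abs(d))       # insensitivity, C = 1, alpha_1 = 1
                w[i] = w[i] - eta * grad_l - lam * w[i] * s_bar_b  # update
        w = [0.0 if abs(wi) < T else wi for wi in w]   # pruning at the end of the epoch
    return w


W0 = [0.5, 0.5, -0.4, -0.3, 1.0, 0.7]  # hidden unit 1 never activates on DATA
W = train(W0)
print(W, loss(W0), loss(W))
```

The three parameters of the inactive unit receive no loss gradient, so plain gradient descent would leave them untouched; in this sketch the sensitivity term shrinks them geometrically and the end-of-epoch thresholding sets them exactly to zero, while the active weights still fit the data.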
Sensitivity-based regularization: results on LeNet300-MNIST
Sensitivity-based regularization: results on VGG16-ImageNet
References
[Zhou et al., 2016] Zhou, Shuchang, et al. "DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients." arXiv preprint arXiv:1606.06160 (2016).
[Han et al., 2015-1] Han, Song, Huizi Mao, and William J. Dally. "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding." arXiv preprint arXiv:1510.00149 (2015).
[Howard et al., 2017] Howard, Andrew G., et al. "MobileNets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
[Louizos et al., 2018] Louizos, Christos, Max Welling, and Diederik P. Kingma. "Learning sparse neural networks through L0 regularization." 6th International Conference on Learning Representations (ICLR 2018), Conference Track Proceedings (2018).
[Han et al., 2015-2] Han, Song, Jeff Pool, John Tran, and William J. Dally. "Learning both weights and connections for efficient neural network." Advances in Neural Information Processing Systems (2015): 1135-1143.
[Srivastava et al., 2014] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
References (II)
[Lebedev and Lempitsky, 2016] Lebedev, Vadim, and Victor Lempitsky. "Fast convnets using group-wise brain damage." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016): 2554-2564.
[Scardapane et al., 2017] Scardapane, Simone, et al. "Group sparse regularization for deep neural networks." Neurocomputing 241 (2017): 81-89.
[Hadifar et al., 2020] Hadifar, Amir, et al. "Block-wise Dynamic Sparseness." arXiv preprint arXiv:2001.04686 (2020).

More Related Content

What's hot

HML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep LearningHML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep Learning
Yan Xu
 
Emotion detection using cnn.pptx
Emotion detection using cnn.pptxEmotion detection using cnn.pptx
Emotion detection using cnn.pptx
RADO7900
 
Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing
Sivagowry Shathesh
 
1.Introduction to deep learning
1.Introduction to deep learning1.Introduction to deep learning
1.Introduction to deep learning
KONGU ENGINEERING COLLEGE
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
RADO7900
 
IRJET- Machine Learning V/S Deep Learning
IRJET- Machine Learning V/S Deep LearningIRJET- Machine Learning V/S Deep Learning
IRJET- Machine Learning V/S Deep Learning
IRJET Journal
 
The Deep Learning Glossary
The Deep Learning GlossaryThe Deep Learning Glossary
The Deep Learning Glossary
NVIDIA
 
neural network
neural networkneural network
neural network
STUDENT
 
Intro to deep learning
Intro to deep learning Intro to deep learning
Intro to deep learning
David Voyles
 
Bhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogueBhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogue
Vijayananda Mohire
 
David Barber - Deep Nets, Bayes and the story of AI
David Barber - Deep Nets, Bayes and the story of AIDavid Barber - Deep Nets, Bayes and the story of AI
David Barber - Deep Nets, Bayes and the story of AI
Bayes Nets meetup London
 
Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)
Universitat Politècnica de Catalunya
 
Deep learning - what is it and why now?
Deep learning - what is it and why now?Deep learning - what is it and why now?
Deep learning - what is it and why now?
Natalia Konstantinova
 
State-of-the-art Image Processing across all domains
State-of-the-art Image Processing across all domainsState-of-the-art Image Processing across all domains
State-of-the-art Image Processing across all domains
Knoldus Inc.
 
Deep Learning Using TensorFlow | TensorFlow Tutorial | AI & Deep Learning Tra...
Deep Learning Using TensorFlow | TensorFlow Tutorial | AI & Deep Learning Tra...Deep Learning Using TensorFlow | TensorFlow Tutorial | AI & Deep Learning Tra...
Deep Learning Using TensorFlow | TensorFlow Tutorial | AI & Deep Learning Tra...
Edureka!
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Amr Rashed
 
Neural networks in business forecasting
Neural networks in business forecastingNeural networks in business forecasting
Neural networks in business forecasting
Amir Shokri
 
Neural networks
Neural networksNeural networks
Neural networks
Rizwan Rizzu
 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Sarvesh Kumar
 
Notes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew NgNotes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew Ng
dataHacker. rs
 

What's hot (20)

HML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep LearningHML: Historical View and Trends of Deep Learning
HML: Historical View and Trends of Deep Learning
 
Emotion detection using cnn.pptx
Emotion detection using cnn.pptxEmotion detection using cnn.pptx
Emotion detection using cnn.pptx
 
Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing Unit I & II in Principles of Soft computing
Unit I & II in Principles of Soft computing
 
1.Introduction to deep learning
1.Introduction to deep learning1.Introduction to deep learning
1.Introduction to deep learning
 
Neural networks and deep learning
Neural networks and deep learningNeural networks and deep learning
Neural networks and deep learning
 
IRJET- Machine Learning V/S Deep Learning
IRJET- Machine Learning V/S Deep LearningIRJET- Machine Learning V/S Deep Learning
IRJET- Machine Learning V/S Deep Learning
 
The Deep Learning Glossary
The Deep Learning GlossaryThe Deep Learning Glossary
The Deep Learning Glossary
 
neural network
neural networkneural network
neural network
 
Intro to deep learning
Intro to deep learning Intro to deep learning
Intro to deep learning
 
Bhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogueBhadale group of companies ai neural networks and algorithms catalogue
Bhadale group of companies ai neural networks and algorithms catalogue
 
David Barber - Deep Nets, Bayes and the story of AI
David Barber - Deep Nets, Bayes and the story of AIDavid Barber - Deep Nets, Bayes and the story of AI
David Barber - Deep Nets, Bayes and the story of AI
 
Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)
 
Deep learning - what is it and why now?
Deep learning - what is it and why now?Deep learning - what is it and why now?
Deep learning - what is it and why now?
 
State-of-the-art Image Processing across all domains
State-of-the-art Image Processing across all domainsState-of-the-art Image Processing across all domains
State-of-the-art Image Processing across all domains
 
Deep Learning Using TensorFlow | TensorFlow Tutorial | AI & Deep Learning Tra...
Deep Learning Using TensorFlow | TensorFlow Tutorial | AI & Deep Learning Tra...Deep Learning Using TensorFlow | TensorFlow Tutorial | AI & Deep Learning Tra...
Deep Learning Using TensorFlow | TensorFlow Tutorial | AI & Deep Learning Tra...
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Neural networks in business forecasting
Neural networks in business forecastingNeural networks in business forecasting
Neural networks in business forecasting
 
Neural networks
Neural networksNeural networks
Neural networks
 
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
Model of Differential Equation for Genetic Algorithm with Neural Network (GAN...
 
Notes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew NgNotes from Coursera Deep Learning courses by Andrew Ng
Notes from Coursera Deep Learning courses by Andrew Ng
 

Similar to Learning Sparse Neural Networksvia Sensitivity-Driven Regularization

Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
Massimiliano Patacchiola
 
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning ChallengesIBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM France Lab
 
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
Md Rakibul Hasan
 
X trepan an extended trepan for
X trepan an extended trepan forX trepan an extended trepan for
X trepan an extended trepan for
ijaia
 
A Study of Social Media Data and Data Mining Techniques
A Study of Social Media Data and Data Mining TechniquesA Study of Social Media Data and Data Mining Techniques
A Study of Social Media Data and Data Mining Techniques
IJERA Editor
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
SungminYou
 
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural NetworksQuantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Valentin De Bortoli
 
NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...
NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...
NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...
ijaia
 
Deep Learning for Hidden Signals - Enabling Real-time Multimessenger Astrophy...
Deep Learning for Hidden Signals - Enabling Real-time Multimessenger Astrophy...Deep Learning for Hidden Signals - Enabling Real-time Multimessenger Astrophy...
Deep Learning for Hidden Signals - Enabling Real-time Multimessenger Astrophy...
Daniel George
 
Artifical Neural Network and its applications
Artifical Neural Network and its applicationsArtifical Neural Network and its applications
Artifical Neural Network and its applications
Sangeeta Tiwari
 
Stock Prediction Using Artificial Neural Networks
Stock Prediction Using Artificial Neural NetworksStock Prediction Using Artificial Neural Networks
Stock Prediction Using Artificial Neural Networks
ijbuiiir1
 
Deep randomized neural networks
Deep randomized neural networksDeep randomized neural networks
Deep randomized neural networks
Claudio Gallicchio
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
GauravPandey319
 
Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequences
Claudio Gallicchio
 
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
IJERA Editor
 
Classification Of Iris Plant Using Feedforward Neural Network
Classification Of Iris Plant Using Feedforward Neural NetworkClassification Of Iris Plant Using Feedforward Neural Network
Classification Of Iris Plant Using Feedforward Neural Network
irjes
 
IRJET- Prediction of Autism Spectrum Disorder using Deep Learning: A Survey
IRJET- Prediction of Autism Spectrum Disorder using Deep Learning: A SurveyIRJET- Prediction of Autism Spectrum Disorder using Deep Learning: A Survey
IRJET- Prediction of Autism Spectrum Disorder using Deep Learning: A Survey
IRJET Journal
 
SOCIAL DISTANCING MONITORING IN COVID-19 USING DEEP LEARNING
SOCIAL DISTANCING MONITORING IN COVID-19 USING DEEP LEARNINGSOCIAL DISTANCING MONITORING IN COVID-19 USING DEEP LEARNING
SOCIAL DISTANCING MONITORING IN COVID-19 USING DEEP LEARNING
IRJET Journal
 
Recent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesRecent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectives
Namkug Kim
 
Sign Language Recognition using Deep Learning
Sign Language Recognition using Deep LearningSign Language Recognition using Deep Learning
Sign Language Recognition using Deep Learning
IRJET Journal
 

Similar to Learning Sparse Neural Networksvia Sensitivity-Driven Regularization (20)

Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning ChallengesIBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
IBM Cloud Paris Meetup 20180517 - Deep Learning Challenges
 
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
An Evolutionary-based Neural Network for Distinguishing between Genuine and P...
 
X trepan an extended trepan for
X trepan an extended trepan forX trepan an extended trepan for
X trepan an extended trepan for
 
A Study of Social Media Data and Data Mining Techniques
A Study of Social Media Data and Data Mining TechniquesA Study of Social Media Data and Data Mining Techniques
A Study of Social Media Data and Data Mining Techniques
 
Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
 
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural NetworksQuantitative Propagation of Chaos for SGD in Wide Neural Networks
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
 
NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...
NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...
NETWORK LEARNING AND TRAINING OF A CASCADED LINK-BASED FEED FORWARD NEURAL NE...
 
Deep Learning for Hidden Signals - Enabling Real-time Multimessenger Astrophy...
Deep Learning for Hidden Signals - Enabling Real-time Multimessenger Astrophy...Deep Learning for Hidden Signals - Enabling Real-time Multimessenger Astrophy...
Deep Learning for Hidden Signals - Enabling Real-time Multimessenger Astrophy...
 
Artifical Neural Network and its applications
Artifical Neural Network and its applicationsArtifical Neural Network and its applications
Artifical Neural Network and its applications
 
Stock Prediction Using Artificial Neural Networks
Stock Prediction Using Artificial Neural NetworksStock Prediction Using Artificial Neural Networks
Stock Prediction Using Artificial Neural Networks
 
Deep randomized neural networks
Deep randomized neural networksDeep randomized neural networks
Deep randomized neural networks
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Reservoir computing fast deep learning for sequences
Reservoir computing   fast deep learning for sequencesReservoir computing   fast deep learning for sequences
Reservoir computing fast deep learning for sequences
 
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
Rainfall Prediction using Data-Core Based Fuzzy Min-Max Neural Network for Cl...
 
Classification Of Iris Plant Using Feedforward Neural Network
Classification Of Iris Plant Using Feedforward Neural NetworkClassification Of Iris Plant Using Feedforward Neural Network
Classification Of Iris Plant Using Feedforward Neural Network
 
IRJET- Prediction of Autism Spectrum Disorder using Deep Learning: A Survey
IRJET- Prediction of Autism Spectrum Disorder using Deep Learning: A SurveyIRJET- Prediction of Autism Spectrum Disorder using Deep Learning: A Survey
IRJET- Prediction of Autism Spectrum Disorder using Deep Learning: A Survey
 
SOCIAL DISTANCING MONITORING IN COVID-19 USING DEEP LEARNING
SOCIAL DISTANCING MONITORING IN COVID-19 USING DEEP LEARNINGSOCIAL DISTANCING MONITORING IN COVID-19 USING DEEP LEARNING
SOCIAL DISTANCING MONITORING IN COVID-19 USING DEEP LEARNING
 
Recent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectivesRecent advances of AI for medical imaging : Engineering perspectives
Recent advances of AI for medical imaging : Engineering perspectives
 
Sign Language Recognition using Deep Learning
Sign Language Recognition using Deep LearningSign Language Recognition using Deep Learning
Sign Language Recognition using Deep Learning
 

Recently uploaded

Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
Aftab Hussain
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
Google
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
Philip Schwarz
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppAI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
Google
 
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Need for Speed: Removing speed bumps from your Symfony projects ⚡️
Łukasz Chruściel
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeA Study of Variable-Role-based Feature Enrichment in Neural Models of Code
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code
Aftab Hussain
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 

Recently uploaded (20)

Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Graspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code AnalysisGraspan: A Big Data System for Big Code Analysis
Graspan: A Big Data System for Big Code Analysis
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
A Sighting of filterA in Typelevel Rite of Passage

Learning Sparse Neural Networks via Sensitivity-Driven Regularization

  • 1. Introducing sparsity in artificial neural networks: a sensitivity-based approach. Enzo Tartaglione, postdoc at the University of Torino (Università degli Studi di Torino), Computer Science Dept., EIDOS group
  • 2. Deep networks • High number of hidden layers • More complex classification tasks (ImageNet) • Use of convolutional layers, pooling layers, and very large fully-connected layers • Very high number of parameters (hundreds of millions and even more…) • Is it possible to boost performance by making the ANN robust to noise? STATE OF THE ART
  • 3. Size of ANN models vs. generalization
  • 4. Approaches to reduce the size of an ANN • Quantization [Zhou et al., 2016] [Han et al., 2015-1] • Modifying the architecture [Howard et al., 2017] • Regularizing and pruning to achieve sparsity
  • 5. Why sparse networks? • Less memory required. • Fewer computational resources required. • Deployability on embedded devices. Typical architectures are overparametrized!
  • 6. Some existing pruning strategies… • Designing a proxy L0 regularizer [Louizos et al., 2018] • Greedy thresholding after an L2 + dropout strategy [Han et al., 2015-2] • Grouping for convolutional features [Lebedev and Lempitsky, 2016] [Hadifar et al., 2020] • Dropout-based approaches [Srivastava et al., 2014] • Lasso-based regularizers [Scardapane et al., 2017] • …
  • 7. When is a parameter necessary? [Figure: forward propagation through a pre-trained model, from INPUT to OUTPUT y, highlighting a parameter w] If changing w changes the output of the network, we do not want to modify it. If changing w does not change the output of the network, we are free to change it. Published: Tartaglione, E., Lepsoy, S., Fiandrotti, A., Francini, G. (2018). Learning sparse neural networks via sensitivity-driven regularization. In Advances in Neural Information Processing Systems (NeurIPS 2018)
  • 8. Definition of sensitivity. For a small perturbation of $w_i$, each output changes by $\Delta y_k \approx \Delta w_i \frac{\partial y_k}{\partial w_i}$. The sensitivity of the output $\mathbf{y}$ to $w_i$ is $S(\mathbf{y}, w_i) = \sum_{k=1}^{C} \alpha_k \left|\frac{\partial y_k}{\partial w_i}\right|$, where $C$ is the size of the output and $\alpha_k$ is a scalar weighting factor.
  • 9. Towards the definition of the update term. [Figure: value vs. importance of a parameter w, in terms of S and $\bar{S}$] We need an insensitivity measure: $\bar{S}(\mathbf{y}, w_i) = 1 - S(\mathbf{y}, w_i)$. To guarantee that this quantity is never negative, the trivial choice is $\bar{S}_b(\mathbf{y}, w_i) = \max\left[0, \bar{S}(\mathbf{y}, w_i)\right]$.
  • 10. Sensitivity-based regularization. Proposed weight update: $w_i^{t} := w_i^{t-1} - \eta \frac{\partial L}{\partial w_i^{t-1}} - \lambda\, w_i^{t-1}\, \bar{S}_b\left(\mathbf{y}, w_i^{t-1}\right)$, where • $\bar{S}_b$ is a function that states whether or not the parameter is relevant to the computation of the output $\mathbf{y}$ of the network; • $\mathbf{y}$ is the output of the network; • $w_i$ is the parameter; • $L$ is a generic loss function.
  • 11. Which function are we minimizing? We need to solve the integral $R = \int w \cdot \bar{S}_b(\mathbf{y}, w)\, dw$. Math, math, math… The general result, valid for any architecture and any loss function, is $R = \Theta\left[\bar{S}(\mathbf{y}, w)\right] \frac{w^2}{2} \left[1 - \sum_{k=1}^{C} \alpha_k\, \mathrm{sign}\left(\frac{\partial y_k}{\partial w}\right) \sum_{n=1}^{\infty} (-1)^{n+1} \frac{\partial^n y_k}{\partial w^n} \frac{2\, w^{n-1}}{(n+1)!}\right]$, where $\Theta$ is the Heaviside step function. For ReLU-activated networks, where the derivatives of order $n \geq 2$ vanish, it reduces to $R = \frac{w^2}{2} \bar{S}_b(\mathbf{y}, w)$.
  • 12. Thresholding. During training, because of numerical errors and asymptotic behavior, a parameter's value might never reach exactly zero. For this reason we introduce a simple thresholding mechanism: a parameter is pruned (set to zero) when $|w_i| < T$.
  • 13. Overview of the technique: forward propagation → back-propagation (computing $\frac{\partial L}{\partial w}$ and $S(w)$) → update $w_i^{t} := w_i^{t-1} - \eta \frac{\partial L}{\partial w_i^{t-1}} - \lambda\, w_i^{t-1}\, \bar{S}_b\left(\mathbf{y}, w_i^{t-1}\right)$ → pruning, performed at the end of the epoch.
  • 14. Sensitivity-based regularization: results on LeNet300-MNIST
  • 15. Sensitivity-based regularization: results on VGG16-ImageNet
  • 17. References • [Zhou et al., 2016] S. Zhou et al., "DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients," arXiv preprint arXiv:1606.06160, 2016. • [Han et al., 2015-1] S. Han, H. Mao, and W. J. Dally, "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding," arXiv preprint arXiv:1510.00149, 2015. • [Howard et al., 2017] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017. • [Louizos et al., 2018] C. Louizos, M. Welling, and D. P. Kingma, "Learning sparse neural networks through L0 regularization," 6th International Conference on Learning Representations (ICLR 2018), Conference Track Proceedings, 2018. • [Han et al., 2015-2] S. Han, J. Pool, J. Tran, and W. Dally, "Learning both weights and connections for efficient neural network," in Advances in Neural Information Processing Systems, 2015, pp. 1135–1143. • [Srivastava et al., 2014] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  • 18. References (II) • [Lebedev and Lempitsky, 2016] V. Lebedev and V. Lempitsky, "Fast ConvNets using group-wise brain damage," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2554–2564. • [Scardapane et al., 2017] S. Scardapane et al., "Group sparse regularization for deep neural networks," Neurocomputing, vol. 241, pp. 81–89, 2017. • [Hadifar et al., 2020] A. Hadifar et al., "Block-wise dynamic sparseness," arXiv preprint arXiv:2001.04686, 2020.
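
The sensitivity defined on slide 8 can be illustrated with a minimal sketch, assuming a hypothetical toy layer of two ReLU outputs with a fixed input, and uniform weighting factors $\alpha_k = 1/C$. The derivative $\partial y_k / \partial w_i$ is estimated here with a central finite difference, which mirrors the small-perturbation view $\Delta y_k \approx \Delta w_i \, \partial y_k / \partial w_i$; a real network would obtain it through back-propagation. The names `toy_model` and `sensitivity` are illustrative, not taken from the authors' code.

```python
from typing import Callable, List, Sequence

# Hypothetical fixed input for a toy 2-output ReLU layer.
X = (1.0, -2.0)


def toy_model(w: Sequence[float]) -> List[float]:
    """Two ReLU outputs; w = [w00, w01, w10, w11] in row-major order."""
    y0 = max(0.0, w[0] * X[0] + w[1] * X[1])
    y1 = max(0.0, w[2] * X[0] + w[3] * X[1])
    return [y0, y1]


def sensitivity(model: Callable[[Sequence[float]], List[float]],
                weights: Sequence[float], i: int,
                alphas: Sequence[float], h: float = 1e-6) -> float:
    """S(y, w_i) = sum_k alpha_k * |dy_k/dw_i|.

    dy_k/dw_i is estimated with a central finite difference
    (an assumption for illustration; back-propagation gives it exactly).
    """
    w_plus = list(weights)
    w_plus[i] += h
    w_minus = list(weights)
    w_minus[i] -= h
    y_plus, y_minus = model(w_plus), model(w_minus)
    return sum(a * abs((yp - ym) / (2.0 * h))
               for a, yp, ym in zip(alphas, y_plus, y_minus))


weights = [0.5, 0.1, -1.0, 0.2]
alphas = [0.5, 0.5]  # assumption: uniform alpha_k = 1/C with C = 2
for i in range(len(weights)):
    print(i, round(sensitivity(toy_model, weights, i, alphas), 6))
```

With these values the second output is inactive (its pre-activation is negative), so the two weights feeding it get zero sensitivity: changing them does not change the output, which is exactly the situation slide 7 describes.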
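
The insensitivity and its bounded version from slide 9 are one-liners; the sketch below (function names are hypothetical) shows why the bound is needed: for a very sensitive parameter, $1 - S$ becomes negative, and clipping it at zero switches the regularization off instead of pushing the weight away from zero.

```python
def insensitivity(s: float) -> float:
    """S-bar(y, w_i) = 1 - S(y, w_i): large when w_i barely affects y."""
    return 1.0 - s


def bounded_insensitivity(s: float) -> float:
    """S-bar_b(y, w_i) = max(0, S-bar(y, w_i)), so it is never negative."""
    return max(0.0, insensitivity(s))


for s in (0.0, 0.25, 1.0, 3.0):
    print(f"S = {s:<4} -> S-bar = {insensitivity(s):+.2f}, "
          f"S-bar_b = {bounded_insensitivity(s):.2f}")
```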
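
A single step of the weight update on slide 10 can be sketched for one scalar parameter, assuming its loss gradient and sensitivity are already known. Note that, as written on the slide, $\lambda$ is not multiplied by $\eta$. The function name and the numeric values are hypothetical.

```python
def sensitivity_update(w: float, grad: float, s: float,
                       eta: float, lam: float) -> float:
    """One step of w := w - eta * dL/dw - lam * w * S-bar_b(y, w).

    s is the sensitivity S(y, w) at the current step; grad is dL/dw.
    """
    s_bar_b = max(0.0, 1.0 - s)  # bounded insensitivity
    return w - eta * grad - lam * w * s_bar_b


# Hypothetical values: an irrelevant weight (S = 0, no loss gradient)
# shrinks geometrically; a relevant one (S >= 1) only follows the loss.
w_irrelevant, w_relevant = 0.8, 0.8
for _ in range(10):
    w_irrelevant = sensitivity_update(w_irrelevant, grad=0.0, s=0.0,
                                      eta=0.1, lam=0.1)
w_relevant = sensitivity_update(w_relevant, grad=0.5, s=1.5, eta=0.1, lam=0.1)
print(w_irrelevant, w_relevant)
```

The irrelevant weight is multiplied by $(1 - \lambda)$ at every step, so it approaches zero without ever reaching it exactly, which is the motivation for the thresholding on slide 12.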
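
The regularizer on slide 11 rests on integrating $w \, \partial y_k / \partial w$. Under our reading, treating $\mathrm{sign}(\partial y_k/\partial w)$ and $\Theta$ as locally constant, repeated integration by parts gives $\int_0^w t\, y'(t)\, dt = \sum_{n \geq 1} (-1)^{n+1} y^{(n)}(w)\, \frac{w^{n+1}}{(n+1)!}$. The sketch below checks this identity numerically on a hypothetical smooth output $y(t) = e^{a t}$. It then checks the ReLU case, where only the $n = 1$ term survives and $R = \frac{w^2}{2}\bar{S}_b$ has derivative $w\,\bar{S}_b$, i.e. the regularization term of the update. All names and constants are assumptions for illustration.

```python
import math

A, W = 0.7, 1.3  # hypothetical: y(t) = exp(A * t), integrated up to w = W


def dy(t: float) -> float:
    """First derivative of y(t) = exp(A t)."""
    return A * math.exp(A * t)


def nth_derivative(n: int, t: float) -> float:
    """n-th derivative of y(t) = exp(A t)."""
    return A ** n * math.exp(A * t)


def integral_w_dy(w: float, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of int_0^w t * y'(t) dt."""
    h = w / steps
    return h * sum((j + 0.5) * h * dy((j + 0.5) * h) for j in range(steps))


def series_w_dy(w: float, terms: int = 30) -> float:
    """Truncated sum_{n>=1} (-1)^(n+1) y^(n)(w) w^(n+1) / (n+1)!."""
    return sum((-1) ** (n + 1) * nth_derivative(n, w) * w ** (n + 1)
               / math.factorial(n + 1) for n in range(1, terms + 1))


closed_form = (A * W * math.exp(A * W) - math.exp(A * W) + 1.0) / A
print(integral_w_dy(W), series_w_dy(W), closed_form)


# ReLU case: inside one linear region y'' = 0, so only n = 1 survives and
# R = w^2 / 2 * S-bar_b. Its derivative is w * S-bar_b, the update term.
def relu_regularizer(w: float, s_bar_b: float) -> float:
    return 0.5 * w * w * s_bar_b


step, w0, s_bar_b = 1e-6, 0.4, 0.75
numeric_grad = (relu_regularizer(w0 + step, s_bar_b)
                - relu_regularizer(w0 - step, s_bar_b)) / (2 * step)
print(numeric_grad, w0 * s_bar_b)
```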
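
The thresholding step on slide 12 can be sketched as plain magnitude-based pruning, in line with the editor's note that $T$ is magnitude-based. The weights and the value $T = 10^{-3}$ are hypothetical, as are the helper names.

```python
from typing import List, Sequence


def prune(weights: Sequence[float], threshold: float) -> List[float]:
    """Magnitude-based thresholding: set w_i to exactly 0 when |w_i| < T."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def sparsity(weights: Sequence[float]) -> float:
    """Fraction of parameters that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


# Hypothetical weights after sensitivity-driven training: the regularizer
# has pushed the irrelevant ones close to, but not exactly to, zero.
weights = [0.3, -2e-4, 1e-5, -0.7, 5e-4]
pruned = prune(weights, threshold=1e-3)  # T = 1e-3 is an assumed value
print(pruned, sparsity(pruned))
```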
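
The loop summarized on slide 13 (forward propagation, back-propagation, update, pruning at the end of the epoch) can be sketched end to end on a hypothetical toy problem: a 3-weight linear model $y = \mathbf{w} \cdot \mathbf{x}$ with one output ($C = 1$, $\alpha_1 = 1$) and a mean-squared-error loss. Several choices here are assumptions rather than the authors' setup: the sensitivity is taken as the batch mean of $|\partial y / \partial w_i| = |x_i|$, training is full-batch, pruned weights are kept at zero with a mask, and $\eta$, $\lambda$, $T$ and the epoch counts are arbitrary.

```python
# Hypothetical toy data: the target depends only on the first input,
# and the third input is always zero.
DATA = [((2.0, 0.5, 0.0), 2.0), ((-2.0, 0.5, 0.0), -2.0),
        ((1.5, -0.5, 0.0), 1.5), ((-1.5, -0.5, 0.0), -1.5)]


def train(w, eta=0.1, lam=0.05, threshold=1e-3, epochs=20,
          steps_per_epoch=10):
    w = list(w)
    mask = [True] * len(w)  # False once a weight has been pruned
    n = len(DATA)
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            # Forward propagation: y = w . x for every sample
            preds = [sum(wi * xi for wi, xi in zip(w, x)) for x, _ in DATA]
            # Back-propagation: dL/dw_i for L = mean of 0.5 * (y - t)^2
            grads = [sum((p - t) * x[i] for p, (x, t) in zip(preds, DATA)) / n
                     for i in range(len(w))]
            # Sensitivity: batch mean of |dy/dw_i| = |x_i| (C = 1, alpha_1 = 1)
            sens = [sum(abs(x[i]) for x, _ in DATA) / n for i in range(len(w))]
            # Update: w := w - eta * dL/dw - lam * w * S-bar_b
            for i in range(len(w)):
                if mask[i]:
                    s_bar_b = max(0.0, 1.0 - sens[i])
                    w[i] = w[i] - eta * grads[i] - lam * w[i] * s_bar_b
        # Pruning at the end of the epoch: |w_i| < T  ->  w_i = 0
        for i in range(len(w)):
            if abs(w[i]) < threshold:
                w[i], mask[i] = 0.0, False
    return w


result = train([0.3, 0.8, 0.6])
print(result)
```

With these inputs the first weight has $S \geq 1$, so $\bar{S}_b = 0$ and it is driven only by the loss toward the target slope of 1; the other two weights have $\bar{S}_b > 0$, shrink geometrically, and are pruned once they fall below $T$.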

Editor's Notes

  1. Noise: are some parameters less relevant than others?
  2. Two questions: how do we estimate the change in the output, and where do we drive the unnecessary parameters w?
  3. The threshold T is magnitude-based, as throughout the existing literature.