EE5180: Introduction to Machine Learning - Final Presentation
Powerpropagation: A sparsity inducing
weight reparameterisation
Group 5
Saurabh Chawla (EE21M524)
Mandeep Chaudhary (EE21M523)
Nirmal Mundra (EE21M527)
Rahul Alok Sharma (EE21M519)
Introduction to Neural Networks
• A neural network consists of a large number of highly interconnected processing
elements (neurons) working together to learn from experience.
• Each neuron is connected to other neurons by means of directed communication
links, each with an associated weight. The weights represent the information used
by the network to solve a problem.
Advantages:
• It can model non-linear systems
• The ability to learn allows the network to adapt to changes in
the surrounding environment
• The distributed nature of the NN gives it fault-tolerant capabilities
Powerpropagation
• Powerpropagation is a new weight reparameterisation for neural networks
that leads to inherently sparse models.
• During training, the model can be regularly pruned or sparsified to reduce the
computational burden, eliminating parameters that do not play a vital role in the
functional behaviour of the model.
• During training, we raise the weights of the network to the α-th power (preserving
the sign). As a result, the magnitude of the weight appears in the gradient (chain
rule), encouraging "rich get richer" dynamics (a minimal sketch follows).
Powerpropagation cont.
• The sparse solution is encoded with reduced capacity as a result of the learning
process itself, without any explicit force imposing frugality (i.e. no masking,
regularisation, etc.).
• The focus of the reparameterisation is on sparse representations rather than on
improving convergence.
Notation:
θ = original weight
Φ = reparameterised weight
Ψ = link function, with θ = Ψ(Φ) = Φ|Φ|^(α-1)
α = raising parameter
Gradient of the reparameterised loss L(·, Ψ(Φ)) w.r.t. Φ:
∇_Φ L = ∇_θ L · α|Φ|^(α-1)   (element-wise, with θ = Ψ(Φ))
Properties:
1. α = 1 recovers the baseline, i.e. standard gradient descent.
2. Zero is a critical point of every weight for α > 1.
3. In addition, zero is surrounded by a plateau, so weights are less likely to
change sign.
These properties can be verified numerically, as in the sketch below.
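The sketch below checks the properties with autograd. It uses a trivial surrogate loss so that the upstream gradient dL/dθ is a known constant; it is an illustration, not part of the method:

```python
import torch

def grad_wrt_phi(phi_value, alpha, upstream=1.0):
    """Gradient of a loss L(w) w.r.t. phi, where w = phi * |phi|**(alpha - 1)
    and dL/dw = upstream (a stand-in for the backpropagated gradient)."""
    phi = torch.tensor(phi_value, requires_grad=True)
    w = phi * phi.abs().pow(alpha - 1.0)
    loss = upstream * w          # surrogate loss with dL/dw = upstream
    loss.backward()
    return phi.grad.item()

print(grad_wrt_phi(0.5, alpha=1.0))  # 1.0: alpha = 1 recovers plain gradient descent
print(grad_wrt_phi(0.5, alpha=2.0))  # 1.0 = upstream * alpha * |phi|**(alpha-1) = 2 * 0.5
print(grad_wrt_phi(0.0, alpha=2.0))  # 0.0: zero is a critical point for alpha > 1
```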
Effect on Weight Distribution
Figure: weight distributions under (a) standard learning and (b) Powerpropagation.
• With α = 1.0, the weights follow the standard distribution, spread symmetrically
about the centre (shown in blue in the figure).
• With α = 1.5, most of the weight density is concentrated near the centre (shown
in green).
• Parameters with larger magnitudes are allowed to adapt faster in order to
represent the features required to solve the task, while smaller-magnitude
parameters are restricted, making it more likely that they will be irrelevant to the
learned solution (see the numeric illustration below).
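A toy calculation makes the "rich get richer" effect concrete: under the same upstream gradient and learning rate, a large parameter receives a much larger effective update than a small one. The numbers below are arbitrary illustrative values, not taken from the paper:

```python
import torch

alpha, lr, upstream = 2.0, 0.1, 1.0   # illustrative values only

for phi0 in (0.05, 1.0):              # one small and one large parameter
    phi = torch.tensor(phi0, requires_grad=True)
    w = phi * phi.abs().pow(alpha - 1.0)
    (upstream * w).backward()
    step = lr * phi.grad.item()       # SGD step applied to phi
    print(f"phi={phi0:4.2f}  grad={phi.grad.item():4.2f}  step={step:5.3f}")

# phi=0.05 -> grad=0.10, step=0.010   (small weights barely move)
# phi=1.00 -> grad=2.00, step=0.200   (large weights adapt much faster)
```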
Hyperparameter Selection for MNIST/Fashion MNIST
MNIST: training set of 60,000 images and test set of 10,000 images. Each image is
28 x 28 pixels with the digit centred.
Train batch size: 60
Training steps: 50,000
Learning rate: 0.1
α (Powerpropagation parameter) tested with: 1, 2, 3, 4 and 5.
A sketch of this sweep follows.
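Below is one way the sweep might be wired up, reusing the illustrative PowerPropLinear layer from earlier. The network shape (784-300-10) and the data pipeline are assumptions for illustration, not the presentation's exact setup:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

config = dict(batch_size=60, training_steps=50_000, lr=0.1, alphas=[1, 2, 3, 4, 5])

train_ds = datasets.MNIST("data", train=True, download=True,
                          transform=transforms.ToTensor())
loader = DataLoader(train_ds, batch_size=config["batch_size"], shuffle=True)

for alpha in config["alphas"]:
    model = torch.nn.Sequential(           # stand-in architecture; layers are
        PowerPropLinear(784, 300, alpha),   # the reparameterised ones sketched above
        torch.nn.ReLU(),
        PowerPropLinear(300, 10, alpha),
    )
    opt = torch.optim.SGD(model.parameters(), lr=config["lr"])
    step = 0
    while step < config["training_steps"]:
        for x, y in loader:
            loss = torch.nn.functional.cross_entropy(model(x.flatten(1)), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            step += 1
            if step >= config["training_steps"]:
                break
```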
Results (with MNIST and Fashion MNIST Datasets)

(a) MNIST
Remaining Weights | Raising Parameter (α) | Accuracy
10%               | Baseline              | 70.00%
10%               | 3                     | 96.77%

(b) Fashion MNIST
Remaining Weights | Raising Parameter (α) | Accuracy
10%               | Baseline              | 50.28%
10%               | 3                     | 78.48%
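The 10% rows correspond to keeping only the largest-magnitude 10% of weights after training. A hedged sketch of such one-shot global magnitude pruning is below; with Powerpropagation the ranking can equivalently be done on the trained parameters Φ, since |θ| = |Φ|^α is monotone in |Φ|. The helper name and the global-threshold logic are simplifications, not the authors' code:

```python
import torch

@torch.no_grad()
def prune_to_fraction(model, keep_fraction=0.10):
    """Zero out all but the largest-magnitude `keep_fraction` of weights,
    measured globally over the model's weight matrices."""
    weights = torch.cat([p.abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    k = max(1, int(keep_fraction * weights.numel()))
    threshold = torch.topk(weights, k).values.min()
    for p in model.parameters():
        if p.dim() > 1:
            p.mul_((p.abs() >= threshold).float())
```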
Results (with CIFAR-10 Dataset)
Implementation variations
• Powerpropagation is general, intuitive, cheap and straightforward to implement,
and can readily be combined with various other techniques. It can be used in two
different settings:
1. Combining Powerpropagation with a traditional weight-pruning technique, e.g.
one-shot pruning or iterative pruning.
2. Combining it with recent state-of-the-art sparse-to-sparse algorithms, e.g.
TopKAST (a simplified sketch of this setting follows).
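A much-simplified illustration of setting 2: use only the top-k weights by magnitude in the forward pass while still letting gradients reach the masked weights. Real TopKAST differs in important details (separate forward/backward sparsity levels, periodic mask refresh), so treat this only as a sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKLinear(nn.Module):
    """Linear layer that uses only the top-k fraction of weights (by magnitude)
    in the forward pass; gradients still flow to all weights (straight-through)."""
    def __init__(self, in_features, out_features, keep_fraction=0.1):
        super().__init__()
        self.keep_fraction = keep_fraction
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        k = max(1, int(self.keep_fraction * self.weight.numel()))
        threshold = torch.topk(self.weight.abs().flatten(), k).values.min()
        mask = (self.weight.abs() >= threshold).float()
        # Straight-through: forward uses the masked weight, but the gradient
        # w.r.t. self.weight behaves as if the mask were not there.
        w = self.weight + (mask * self.weight - self.weight).detach()
        return F.linear(x, w, self.bias)
```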
Continual learning
• Continual learning means sequential learning of tasks without forgetting.
• Powerpropagation for continual learning builds on the class of methods that
implement gradient sparsity, i.e. catastrophic forgetting is overcome by masking
the gradients of parameters found to constitute the solution of previous tasks.
• E.g. PackNet (a simplified gradient-masking sketch follows).
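A hedged sketch of the gradient-masking idea behind PackNet-style methods: once a task is solved, the parameters that form its solution are frozen by zeroing their gradients while later tasks are trained. The function and mask names are illustrative:

```python
import torch

def apply_packnet_masks(model, frozen_masks):
    """Zero the gradients of parameters reserved by previous tasks.
    `frozen_masks` maps parameter name -> bool tensor (True = frozen)."""
    for name, p in model.named_parameters():
        if p.grad is not None and name in frozen_masks:
            p.grad.mul_((~frozen_masks[name]).float())

# usage inside a training step for a new task:
#   loss.backward()
#   apply_packnet_masks(model, frozen_masks)
#   optimizer.step()
```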
Problems with PackNet
• The number of tasks must be known beforehand.
• By reserving a fixed fraction of weights for each task, no distinction is made in
terms of task difficulty or relatedness to previous data.
• These two problems are addressed by Efficient PackNet.
How the PackNet problems are resolved
• Resource allocation: a simple search over a range of sparsity rates is performed,
terminated once the model's performance falls below the minimum accepted
target or once the minimal accepted sparsity is reached (see the sketch below).
• The mask for a given task is chosen from among all network parameters,
including those used by previous tasks, which encourages the reuse of existing
parameters.
• As the backward pass becomes sparse, the method becomes more
computationally efficient as the number of tasks grows.
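A sketch of the resource-allocation search described above: try increasingly aggressive sparsity rates for the current task and stop once accuracy falls below the accepted target. The evaluation and pruning helpers (eval_fn, prune_fn) are assumed to exist and are named only for illustration:

```python
def choose_sparsity(model, eval_fn, prune_fn,
                    rates=(0.5, 0.7, 0.8, 0.9, 0.95), min_accuracy=0.95):
    """Return the most aggressive sparsity rate that still meets the target.

    eval_fn(model) -> accuracy on the current task        (assumed helper)
    prune_fn(model, rate) -> copy pruned to `rate` sparsity (assumed helper)
    """
    chosen = None
    for rate in rates:                      # from least to most sparse
        candidate = prune_fn(model, rate)
        if eval_fn(candidate) < min_accuracy:
            break                           # performance fell below the target
        chosen = rate
    return chosen
```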
Related Work
• Modern approaches to sparsity in deep learning are categorised as dense-to-sparse
and sparse-to-sparse methods.
• Dense-to-sparse algorithms instantiate a dense network that is sparsified over the
course of training, e.g. one-shot pruning.
• Sparse-to-sparse algorithms maintain a constant sparsity throughout training, e.g.
single-shot pruning.
Thanks