SlideShare a Scribd company logo
1 of 40
Feedback
1
Interaction Lab., Seoul National University of Science and Technology
■Degree of gaze estimation
 Geometric based
• 0.5°~ 3°
 Appearance based
• 3°~ 12°
Feedback
Interaction Lab., Seoul National University of Science and Technology
■GazeTR-Hybrid and GazeTR-Conv
Feedback
3
Interaction Lab. Seoul National University of Science and Technology
Improving Accuracy of Binary Neural Networks
using Unbalanced Activation Distribution
Jae-Yeop Jeong
Computer Vision and Pattern Recognition 2021
Hyungjun Kim, Jihoon Park, Changhun Lee, Jae-Joon Kim
POSTECH, Korea
Interaction Lab., Seoul National University of Science and Technology
■Intro
■Method
■Conclusion
Agenda
5
Intro
Method
6
Interaction Lab., Seoul National University of Science and Technology
■Lightweight Network
 Network quantization
 Network pruning
 Efficient architecture design
Intro(1/6)
7
Interaction Lab., Seoul National University of Science and Technology
■Network quantization
 Convert floating-point type to integer or fixed point
• Model Lightweight
• Reduced computation
• Efficient hardware use
 The process that reduces represent form and internal composition in Neural Network
Intro(2/6)
8
Interaction Lab., Seoul National University of Science and Technology
■Binary Neural Network
 32-bit floating-point weights can be represented by 1-bit weights
 XNOR-and-popcount logics
 Weight quantization
 Activation quantization
Intro(3/6)
9
-1.21 0.88 -0.11
0.17 1.72 -0.7
0.98 -0.57 0.63
-1 1 -1
1 1 -1
1 -1 1
Interaction Lab., Seoul National University of Science and Technology
■Accuracy of BNNs
 Due to the lightweight nature, BNNs are garnering interests as a promising solution for
DNN computing on edge devices
 BNNs still suffer from the accuracy degradation caused by the aggressive quantization
 Low accuracy than full-precision model
Intro(4/6)
10
Interaction Lab., Seoul National University of Science and Technology
■Problems of BNNs
 Gradient mismatch problem caused by the non-differentiable binary activation function
• Gradients cannot propagate through the quantization layer in the back-propagation process
• Straight-Through-Estimator(STE)
Intro(5/6)
11
Interaction Lab., Seoul National University of Science and Technology
■Technique to improve the accuracy of BNNs
 Straight-Through-Estimator(STE)
 Parameterized clipping activation function(PACT)
• Focused on multi-bit networks (2-bit, 4-bit …)
 Managing activation distribution
• There are only two values (1 or -1)
• Focused on binary activation more balanced
 Additional activation function
• Between the binary convolution layer and the following BN layer
• ReLU, PReLU
• Increasing the non-linearity
Intro(6/6)
12
Method
Conclusion
13
Interaction Lab., Seoul National University of Science and Technology
■Overview
 The activation functions in conventional full precision models to describe the motivation for
using unbalanced activation distribution
 How to make the distribution of the binary activation unbalanced using threshold shifting in
BNNs
 Experimental results on ImageNet data set
 Ineffectiveness of the methodology to train the threshold of binary activation in BNNs
 Additional activation functions
Method(1/25)
14
Interaction Lab., Seoul National University of Science and Technology
■Motivation for using unbalanced activation distribution
 ReLU solved the gradient vanishing problem
• The gradient vanishing problem is largely diminished by using BN layers in recent models
 There might be other reasons for the success of the ReLU function
Method(2/25)
15
Interaction Lab., Seoul National University of Science and Technology
■Make unbalanced activation distribution using hardtanh
 Compare the performance hardtanh function and ReLU6
• Slight variant of the ReLU function where its positive outputs are clipped to 6
 Threshold shifting
• x-axis, y-axis, increased range
Method(3/25)
16
Interaction Lab., Seoul National University of Science and Technology
■Training model
 VGG-small model was trained on CIFAR-10 dataset with different activation functions
 VGG-small
• Consists of 4 Conv with 64, 64, 128, 128 output channel
• Three dense with 512
• BN layer is placed between the Conv layer and the activation layer
• Max pooling layer was used after each of the latter three convolution layers
 Trained the model in the same condition with 10 different seeds
Method(4/25)
17
Interaction Lab., Seoul National University of Science and Technology
■Accuracy gap between hardtanh and ReLU6(1/2)
 Red dotted line represents the test accuracy of the ReLU6
Method(5/25)
18
Interaction Lab., Seoul National University of Science and Technology
■Accuracy gap between hardtanh and ReLU6(2/2)
 The hardtanh function is shifted to the right along the x-axis
• The number of negative outputs increase and the output distribution becomes positively skewed
 The higher performance of ReLU activation function
• Come from the unbalanced distribution of activation outputs
• When sign function is used, the activation distribution becomes even more noticeable in BNNs
Method(6/25)
19
Interaction Lab., Seoul National University of Science and Technology
■BNNs with unbalanced activation distribution(1/4)
 Previous multi-bit quantization methods
• ReLU-based quantization which outputs unbalanced activation distribution
 Previous BNNs quantization methods
• Sign function for binary activation function thus distribution of binary activation is balanced
• The ratio of 1 to -1 is close to 1:1
Method(7/25)
20
Interaction Lab., Seoul National University of Science and Technology
■BNNs with unbalanced activation distribution(2/4)
 By shifting the activation function along the x-axis helps improve the accuracy of a model
 Poor performance was sparked by balanced activation distribution (sign function)
 Shifting the sign function along the x-axis
Method(8/25)
21
Interaction Lab., Seoul National University of Science and Technology
■BNNs with unbalanced activation distribution(3/4)
 VGG-small
• The Max pooling layer makes the distribution positively skewed
Method(9/25)
22
Interaction Lab., Seoul National University of Science and Technology
■BNNs with unbalanced activation distribution(4/4)
 Effect of Max Pooling
 Max pooling layer is the only layer that gives asymmetry in the model
• The accuracy difference between shifting the threshold to the positive direction and the negative direction
• Replace Max pooling to Average pooling
Method(10/25)
23
Interaction Lab., Seoul National University of Science and Technology
■Effect of other conditions
 Datasets
 Models
 Initialization methods
 Optimizers
Method(11/25)
24
Interaction Lab., Seoul National University of Science and Technology
■Dataset
 MNIST
• Binary VGG- small model on MNIST dataset
• 30 different random seeds and average results are presented
Method(12/25)
25
Interaction Lab., Seoul National University of Science and Technology
■Model architecture
 Two additional model architectures
 2-layer MLP and LeNet-5
• 2-layer MLP do not have Max pooling layer in it
Method(13/25)
26
Interaction Lab., Seoul National University of Science and Technology
■Initialization method(1/2)
 Trained the models from scratch using Xavier normal initialization
 Full-precision model is float point 32
 Pretrained full-precision models are often used to initialize BNN models
 Hardtanh + ReLU
 Full- precision VGG-small model on CIFAR-10 dataset
 Binary VGG- small model initialized with full-precision pretrained model
Method(14/25)
27
Interaction Lab., Seoul National University of Science and Technology
■Initialization method(2/2)
Method(15/25)
28
1
0.6
Interaction Lab., Seoul National University of Science and Technology
■Optimizer
 Mostly used Adam optimizer in BNN
 Replace Adam to SGD
Method(16/25)
29
Interaction Lab., Seoul National University of Science and Technology
■Experiments on ImageNet
 Finding shift amount
• Top-1 train accuracy at the first epoch and Top-1 validation accuracy
Method(17/25)
30
Interaction Lab., Seoul National University of Science and Technology
■Results on ImageNet
Method(18/25)
31
Interaction Lab., Seoul National University of Science and Technology
■Effect of training threshold
 Limited effect on the performance of BNNs
 Batch normalization bias
 Limited learning capability
Method(19/25)
32
Interaction Lab., Seoul National University of Science and Technology
■Batch normalization bias
 Trainable threshold is already covered by the bias values in the BN layer
 𝑋 and 𝑌 are inputs to the BN layer and output from the binary activation layer
 𝜇, 𝜎, 𝛾, 𝛽, and 𝑡ℎ
 mean, std, weight, and bias of the BN layer
 𝛽 and 𝑡ℎ are initialized to zero in the beginning
Method(20/25)
33
Interaction Lab., Seoul National University of Science and Technology
■Limited learning capability
 Trainable threshold method
• An initialization point
Method(21/25)
34
Interaction Lab., Seoul National University of Science and Technology
■Effect of additional activation function(1/4)
 Use additional activation function after convolution layer in BNNs
 BNNs usually lack non-linearity in the model
 Additional activation function is also related to the unbalanced activation distribution
Method(22/25)
35
Interaction Lab., Seoul National University of Science and Technology
■Effect of additional activation function(2/4)
 Effect of additional PReLU layers after binary convolution layers in the ResNet20
Method(23/25)
36
Interaction Lab., Seoul National University of Science and Technology
■Effect of additional activation function(3/4)
 There is accuracy degradation with PReLU
• Already play a role in distorting the activation distribution
• Further shifting the threshold of binary activation function is excessive
Method(24/25)
37
Interaction Lab., Seoul National University of Science and Technology
■Effect of additional activation function(4/4)
 Replace to PReLU to LeakyReLU
 When the slope 1
• Identity function
Method(25/25)
38
Interaction Lab., Seoul National University of Science and Technology
■Analyzed the impact of activation distribution on the accuracy of BNN models
■Demonstrated that the accuracy of previous BNN models could be further
improved
Conclusion
39
Q&A
40

More Related Content

What's hot

Algorithm For optimization.pptx
Algorithm For optimization.pptxAlgorithm For optimization.pptx
Algorithm For optimization.pptxKARISHMA JAIN
 
Deep Belief Networks
Deep Belief NetworksDeep Belief Networks
Deep Belief NetworksHasan H Topcu
 
8 queens problem using back tracking
8 queens problem using back tracking8 queens problem using back tracking
8 queens problem using back trackingTech_MX
 
Introduction To Generative Adversarial Networks GANs
Introduction To Generative Adversarial Networks GANsIntroduction To Generative Adversarial Networks GANs
Introduction To Generative Adversarial Networks GANsHichem Felouat
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionJinwon Lee
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Yuta Niki
 
Planning in Artificial Intelligence
Planning in Artificial IntelligencePlanning in Artificial Intelligence
Planning in Artificial Intelligencekitsenthilkumarcse
 
Neural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronNeural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronMostafa G. M. Mostafa
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
Midsquare method- simulation system
Midsquare method- simulation systemMidsquare method- simulation system
Midsquare method- simulation systemArman Hossain
 
Common Problems in Hyperparameter Optimization
Common Problems in Hyperparameter OptimizationCommon Problems in Hyperparameter Optimization
Common Problems in Hyperparameter OptimizationSigOpt
 
Fundamentals of Neural Networks
Fundamentals of Neural NetworksFundamentals of Neural Networks
Fundamentals of Neural NetworksGagan Deep
 
Continual Learning: why, how, and when
Continual Learning: why, how, and whenContinual Learning: why, how, and when
Continual Learning: why, how, and whenGabriele Graffieti
 
Kdd'20 presentation 223
Kdd'20 presentation 223Kdd'20 presentation 223
Kdd'20 presentation 223Manh Tuan Do
 

What's hot (20)

Algorithm For optimization.pptx
Algorithm For optimization.pptxAlgorithm For optimization.pptx
Algorithm For optimization.pptx
 
Deep Belief Networks
Deep Belief NetworksDeep Belief Networks
Deep Belief Networks
 
8 queens problem using back tracking
8 queens problem using back tracking8 queens problem using back tracking
8 queens problem using back tracking
 
Borderline Smote
Borderline SmoteBorderline Smote
Borderline Smote
 
Introduction To Generative Adversarial Networks GANs
Introduction To Generative Adversarial Networks GANsIntroduction To Generative Adversarial Networks GANs
Introduction To Generative Adversarial Networks GANs
 
Learn Latex
Learn LatexLearn Latex
Learn Latex
 
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for VisionPR-317: MLP-Mixer: An all-MLP Architecture for Vision
PR-317: MLP-Mixer: An all-MLP Architecture for Vision
 
Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)Transformer Introduction (Seminar Material)
Transformer Introduction (Seminar Material)
 
Planning in Artificial Intelligence
Planning in Artificial IntelligencePlanning in Artificial Intelligence
Planning in Artificial Intelligence
 
Neural Networks: Multilayer Perceptron
Neural Networks: Multilayer PerceptronNeural Networks: Multilayer Perceptron
Neural Networks: Multilayer Perceptron
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
Introduction to Latex
Introduction to LatexIntroduction to Latex
Introduction to Latex
 
Midsquare method- simulation system
Midsquare method- simulation systemMidsquare method- simulation system
Midsquare method- simulation system
 
Common Problems in Hyperparameter Optimization
Common Problems in Hyperparameter OptimizationCommon Problems in Hyperparameter Optimization
Common Problems in Hyperparameter Optimization
 
Fundamentals of Neural Networks
Fundamentals of Neural NetworksFundamentals of Neural Networks
Fundamentals of Neural Networks
 
Introduction to Genetic Algorithms
Introduction to Genetic AlgorithmsIntroduction to Genetic Algorithms
Introduction to Genetic Algorithms
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Presentation on K-Means Clustering
Presentation on K-Means ClusteringPresentation on K-Means Clustering
Presentation on K-Means Clustering
 
Continual Learning: why, how, and when
Continual Learning: why, how, and whenContinual Learning: why, how, and when
Continual Learning: why, how, and when
 
Kdd'20 presentation 223
Kdd'20 presentation 223Kdd'20 presentation 223
Kdd'20 presentation 223
 

Similar to Improving accuracy of binary neural networks using unbalanced activation distribution

Unsupervised representation learning for gaze estimation
Unsupervised representation learning for gaze estimationUnsupervised representation learning for gaze estimation
Unsupervised representation learning for gaze estimationJaey Jeong
 
Gaze estimation using transformer
Gaze estimation using transformerGaze estimation using transformer
Gaze estimation using transformerJaey Jeong
 
Neural networks for semantic gaze analysis in xr settings
Neural networks for semantic gaze analysis in xr settingsNeural networks for semantic gaze analysis in xr settings
Neural networks for semantic gaze analysis in xr settingsJaey Jeong
 
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...Sunghoon Joo
 
DeepXplore: Automated Whitebox Testing of Deep Learning Systems
DeepXplore: Automated Whitebox Testing of Deep Learning Systems DeepXplore: Automated Whitebox Testing of Deep Learning Systems
DeepXplore: Automated Whitebox Testing of Deep Learning Systems Jeongwhan Choi
 
Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...André Gonçalves
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksSeunghyun Hwang
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...Jinwon Lee
 
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...ssuser4b1f48
 
Master defence 2020 - Oleh Misko - Ensembling and Transfer Learning for Multi...
Master defence 2020 - Oleh Misko - Ensembling and Transfer Learning for Multi...Master defence 2020 - Oleh Misko - Ensembling and Transfer Learning for Multi...
Master defence 2020 - Oleh Misko - Ensembling and Transfer Learning for Multi...Lviv Data Science Summer School
 
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
NS-CUK Joint Journal Club: THLee, Review on "Learning Conjoint Attentions for...
NS-CUK Joint Journal Club: THLee, Review on "Learning Conjoint Attentions for...NS-CUK Joint Journal Club: THLee, Review on "Learning Conjoint Attentions for...
NS-CUK Joint Journal Club: THLee, Review on "Learning Conjoint Attentions for...ssuser4b1f48
 
A novel hybridization of opposition-based learning and cooperative co-evoluti...
A novel hybridization of opposition-based learning and cooperative co-evoluti...A novel hybridization of opposition-based learning and cooperative co-evoluti...
A novel hybridization of opposition-based learning and cooperative co-evoluti...Borhan Kazimipour
 
Graph Attention Networks.pptx
Graph Attention Networks.pptxGraph Attention Networks.pptx
Graph Attention Networks.pptxssuser2624f71
 
Artificial Neural Network Seminar Report
Artificial Neural Network Seminar ReportArtificial Neural Network Seminar Report
Artificial Neural Network Seminar ReportTodd Turner
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Pedro Lopes
 
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...ssuser4b1f48
 
Live to learn: learning rules-based artificial neural network
Live to learn: learning rules-based artificial neural networkLive to learn: learning rules-based artificial neural network
Live to learn: learning rules-based artificial neural networknooriasukmaningtyas
 
Learning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted DropoutLearning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted DropoutSeunghyun Hwang
 

Similar to Improving accuracy of binary neural networks using unbalanced activation distribution (20)

Unsupervised representation learning for gaze estimation
Unsupervised representation learning for gaze estimationUnsupervised representation learning for gaze estimation
Unsupervised representation learning for gaze estimation
 
Gaze estimation using transformer
Gaze estimation using transformerGaze estimation using transformer
Gaze estimation using transformer
 
Neural networks for semantic gaze analysis in xr settings
Neural networks for semantic gaze analysis in xr settingsNeural networks for semantic gaze analysis in xr settings
Neural networks for semantic gaze analysis in xr settings
 
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
PR-187 : MorphNet: Fast & Simple Resource-Constrained Structure Learning of D...
 
DeepXplore: Automated Whitebox Testing of Deep Learning Systems
DeepXplore: Automated Whitebox Testing of Deep Learning Systems DeepXplore: Automated Whitebox Testing of Deep Learning Systems
DeepXplore: Automated Whitebox Testing of Deep Learning Systems
 
230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx
 
Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...Online learning in estimation of distribution algorithms for dynamic environm...
Online learning in estimation of distribution algorithms for dynamic environm...
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
 
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
PR-344: A Battle of Network Structures: An Empirical Study of CNN, Transforme...
 
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
NS-CUK Seminar: S.T.Nguyen, Review on "Hierarchical Graph Convolutional Netwo...
 
Master defence 2020 - Oleh Misko - Ensembling and Transfer Learning for Multi...
Master defence 2020 - Oleh Misko - Ensembling and Transfer Learning for Multi...Master defence 2020 - Oleh Misko - Ensembling and Transfer Learning for Multi...
Master defence 2020 - Oleh Misko - Ensembling and Transfer Learning for Multi...
 
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
NS-CUK Joint Journal Club: THLee, Review on "Learning Conjoint Attentions for...
NS-CUK Joint Journal Club: THLee, Review on "Learning Conjoint Attentions for...NS-CUK Joint Journal Club: THLee, Review on "Learning Conjoint Attentions for...
NS-CUK Joint Journal Club: THLee, Review on "Learning Conjoint Attentions for...
 
A novel hybridization of opposition-based learning and cooperative co-evoluti...
A novel hybridization of opposition-based learning and cooperative co-evoluti...A novel hybridization of opposition-based learning and cooperative co-evoluti...
A novel hybridization of opposition-based learning and cooperative co-evoluti...
 
Graph Attention Networks.pptx
Graph Attention Networks.pptxGraph Attention Networks.pptx
Graph Attention Networks.pptx
 
Artificial Neural Network Seminar Report
Artificial Neural Network Seminar ReportArtificial Neural Network Seminar Report
Artificial Neural Network Seminar Report
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
 
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...
NS-CUK Journal club: HBKim, Review on "Neural Graph Collaborative Filtering",...
 
Live to learn: learning rules-based artificial neural network
Live to learn: learning rules-based artificial neural networkLive to learn: learning rules-based artificial neural network
Live to learn: learning rules-based artificial neural network
 
Learning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted DropoutLearning Sparse Networks using Targeted Dropout
Learning Sparse Networks using Targeted Dropout
 

More from Jaey Jeong

핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNN핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNNJaey Jeong
 
Gaze supported 3 d object manipulation in virtual reality
Gaze supported 3 d object manipulation in virtual realityGaze supported 3 d object manipulation in virtual reality
Gaze supported 3 d object manipulation in virtual realityJaey Jeong
 
hands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model traininghands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model trainingJaey Jeong
 
Deep learning based gaze detection system for automobile drivers using nir ca...
Deep learning based gaze detection system for automobile drivers using nir ca...Deep learning based gaze detection system for automobile drivers using nir ca...
Deep learning based gaze detection system for automobile drivers using nir ca...Jaey Jeong
 
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random foresthands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random forestJaey Jeong
 
Appearance based gaze estimation using deep features and random forest regres...
Appearance based gaze estimation using deep features and random forest regres...Appearance based gaze estimation using deep features and random forest regres...
Appearance based gaze estimation using deep features and random forest regres...Jaey Jeong
 
Tablet gaze unconstrained appearance based gaze estimation in mobile tablets
Tablet gaze unconstrained appearance based gaze estimation in mobile tabletsTablet gaze unconstrained appearance based gaze estimation in mobile tablets
Tablet gaze unconstrained appearance based gaze estimation in mobile tabletsJaey Jeong
 
deep learning from scratch chapter 7.cnn
deep learning from scratch chapter 7.cnndeep learning from scratch chapter 7.cnn
deep learning from scratch chapter 7.cnnJaey Jeong
 
deep learning from scratch chapter 5.learning related skills
deep learning from scratch chapter 5.learning related skillsdeep learning from scratch chapter 5.learning related skills
deep learning from scratch chapter 5.learning related skillsJaey Jeong
 
deep learning from scratch chapter 6.backpropagation
deep learning from scratch chapter 6.backpropagationdeep learning from scratch chapter 6.backpropagation
deep learning from scratch chapter 6.backpropagationJaey Jeong
 
deep learning from scratch chapter 4.neural network learing
deep learning from scratch chapter 4.neural network learingdeep learning from scratch chapter 4.neural network learing
deep learning from scratch chapter 4.neural network learingJaey Jeong
 
deep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural networkdeep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural networkJaey Jeong
 

More from Jaey Jeong (12)

핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNN핵심 딥러닝 입문 4장 RNN
핵심 딥러닝 입문 4장 RNN
 
Gaze supported 3 d object manipulation in virtual reality
Gaze supported 3 d object manipulation in virtual realityGaze supported 3 d object manipulation in virtual reality
Gaze supported 3 d object manipulation in virtual reality
 
hands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model traininghands on machine learning Chapter 4 model training
hands on machine learning Chapter 4 model training
 
Deep learning based gaze detection system for automobile drivers using nir ca...
Deep learning based gaze detection system for automobile drivers using nir ca...Deep learning based gaze detection system for automobile drivers using nir ca...
Deep learning based gaze detection system for automobile drivers using nir ca...
 
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random foresthands on machine learning Chapter 6&7 decision tree, ensemble and random forest
hands on machine learning Chapter 6&7 decision tree, ensemble and random forest
 
Appearance based gaze estimation using deep features and random forest regres...
Appearance based gaze estimation using deep features and random forest regres...Appearance based gaze estimation using deep features and random forest regres...
Appearance based gaze estimation using deep features and random forest regres...
 
Tablet gaze unconstrained appearance based gaze estimation in mobile tablets
Tablet gaze unconstrained appearance based gaze estimation in mobile tabletsTablet gaze unconstrained appearance based gaze estimation in mobile tablets
Tablet gaze unconstrained appearance based gaze estimation in mobile tablets
 
deep learning from scratch chapter 7.cnn
deep learning from scratch chapter 7.cnndeep learning from scratch chapter 7.cnn
deep learning from scratch chapter 7.cnn
 
deep learning from scratch chapter 5.learning related skills
deep learning from scratch chapter 5.learning related skillsdeep learning from scratch chapter 5.learning related skills
deep learning from scratch chapter 5.learning related skills
 
deep learning from scratch chapter 6.backpropagation
deep learning from scratch chapter 6.backpropagationdeep learning from scratch chapter 6.backpropagation
deep learning from scratch chapter 6.backpropagation
 
deep learning from scratch chapter 4.neural network learing
deep learning from scratch chapter 4.neural network learingdeep learning from scratch chapter 4.neural network learing
deep learning from scratch chapter 4.neural network learing
 
deep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural networkdeep learning from scratch chapter 3 neural network
deep learning from scratch chapter 3 neural network
 

Recently uploaded

Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Christo Ananth
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSrknatarajan
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 

Recently uploaded (20)

Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICSUNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
UNIT-IFLUID PROPERTIES & FLOW CHARACTERISTICS
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 

Improving accuracy of binary neural networks using unbalanced activation distribution

  • 2. Interaction Lab., Seoul National University of Science and Technology ■Degree of gaze estimation  Geometric based • 0.5°~ 3°  Appearance based • 3°~ 12° Feedback
  • 3. Interaction Lab., Seoul National University of Science and Technology ■GazeTR-Hybrid and GazeTR-Conv Feedback 3
  • 4. Interaction Lab. Seoul National University of Science and Technology Improving Accuracy of Binary Neural Networks using Unbalanced Activation Distribution Jae-Yeop Jeong Computer Vision and Pattern Recognition 2021 Hyungjun Kim, Jihoon Park, Changhun Lee, Jae-Joon Kim POSTECH, Korea
  • 5. Interaction Lab., Seoul National University of Science and Technology ■Intro ■Method ■Conclusion Agenda 5
  • 7. Interaction Lab., Seoul National University of Science and Technology ■Lightweight Network  Network quantization  Network pruning  Efficient architecture design Intro(1/6) 7
  • 8. Interaction Lab., Seoul National University of Science and Technology ■Network quantization  Convert floating-point type to integer or fixed point • Model Lightweight • Reduced computation • Efficient hardware use  The process that reduces represent form and internal composition in Neural Network Intro(2/6) 8
  • 9. Interaction Lab., Seoul National University of Science and Technology ■Binary Neural Network  32-bit floating-point weights can be represented by 1-bit weights  XNOR-and-popcount logics  Weight quantization  Activation quantization Intro(3/6) 9 -1.21 0.88 -0.11 0.17 1.72 -0.7 0.98 -0.57 0.63 -1 1 -1 1 1 -1 1 -1 1
  • 10. Interaction Lab., Seoul National University of Science and Technology ■Accuracy of BNNs  Due to the lightweight nature, BNNs are garnering interests as a promising solution for DNN computing on edge devices  BNNs still suffer from the accuracy degradation caused by the aggressive quantization  Low accuracy than full-precision model Intro(4/6) 10
  • 11. Interaction Lab., Seoul National University of Science and Technology ■Problems of BNNs  Gradient mismatch problem caused by the non-differentiable binary activation function • Gradients cannot propagate through the quantization layer in the back-propagation process • Straight-Through-Estimator(STE) Intro(5/6) 11
  • 12. Interaction Lab., Seoul National University of Science and Technology ■Technique to improve the accuracy of BNNs  Straight-Through-Estimator(STE)  Parameterized clipping activation function(PACT) • Focused on multi-bit networks (2-bit, 4-bit …)  Managing activation distribution • There are only two values (1 or -1) • Focused on binary activation more balanced  Additional activation function • Between the binary convolution layer and the following BN layer • ReLU, PReLU • Increasing the non-linearity Intro(6/6) 12
  • 14. Interaction Lab., Seoul National University of Science and Technology ■Overview  The activation functions in conventional full precision models to describe the motivation for using unbalanced activation distribution  How to make the distribution of the binary activation unbalanced using threshold shifting in BNNs  Experimental results on ImageNet data set  Ineffectiveness of the methodology to train the threshold of binary activation in BNNs  Additional activation functions Method(1/25) 14
  • 15. Interaction Lab., Seoul National University of Science and Technology ■Motivation for using unbalanced activation distribution  ReLU solved the gradient vanishing problem • The gradient vanishing problem is largely diminished by using BN layers in recent models  There might be other reasons for the success of the ReLU function Method(2/25) 15
  • 16. Interaction Lab., Seoul National University of Science and Technology ■Make unbalanced activation distribution using hardtanh  Compare the performance hardtanh function and ReLU6 • Slight variant of the ReLU function where its positive outputs are clipped to 6  Threshold shifting • x-axis, y-axis, increased range Method(3/25) 16
  • 17. Interaction Lab., Seoul National University of Science and Technology ■Training model  VGG-small model was trained on CIFAR-10 dataset with different activation functions  VGG-small • Consists of 4 Conv with 64, 64, 128, 128 output channel • Three dense with 512 • BN layer is placed between the Conv layer and the activation layer • Max pooling layer was used after each of the latter three convolution layers  Trained the model in the same condition with 10 different seeds Method(4/25) 17
  • 18. Interaction Lab., Seoul National University of Science and Technology ■Accuracy gap between hardtanh and ReLU6(1/2)  Red dotted line represents the test accuracy of the ReLU6 Method(5/25) 18
  • 19. Interaction Lab., Seoul National University of Science and Technology ■Accuracy gap between hardtanh and ReLU6(2/2)  The hardtanh function is shifted to the right along the x-axis • The number of negative outputs increase and the output distribution becomes positively skewed  The higher performance of ReLU activation function • Come from the unbalanced distribution of activation outputs • When sign function is used, the activation distribution becomes even more noticeable in BNNs Method(6/25) 19
  • 20. Interaction Lab., Seoul National University of Science and Technology ■BNNs with unbalanced activation distribution(1/4)  Previous multi-bit quantization methods • ReLU-based quantization which outputs unbalanced activation distribution  Previous BNNs quantization methods • Sign function for binary activation function thus distribution of binary activation is balanced • The ratio of 1 to -1 is close to 1:1 Method(7/25) 20
  • 21. Interaction Lab., Seoul National University of Science and Technology ■BNNs with unbalanced activation distribution(2/4)  By shifting the activation function along the x-axis helps improve the accuracy of a model  Poor performance was sparked by balanced activation distribution (sign function)  Shifting the sign function along the x-axis Method(8/25) 21
  • 22. Interaction Lab., Seoul National University of Science and Technology ■BNNs with unbalanced activation distribution(3/4)  VGG-small • The Max pooling layer makes the distribution positively skewed Method(9/25) 22
  • 23. Interaction Lab., Seoul National University of Science and Technology ■BNNs with unbalanced activation distribution(4/4)  Effect of Max Pooling  Max pooling layer is the only layer that gives asymmetry in the model • The accuracy difference between shifting the threshold to the positive direction and the negative direction • Replace Max pooling to Average pooling Method(10/25) 23
  • 24. Interaction Lab., Seoul National University of Science and Technology ■Effect of other conditions  Datasets  Models  Initialization methods  Optimizers Method(11/25) 24
  • 25. Interaction Lab., Seoul National University of Science and Technology ■Dataset  MNIST • Binary VGG- small model on MNIST dataset • 30 different random seeds and average results are presented Method(12/25) 25
  • 26. Interaction Lab., Seoul National University of Science and Technology ■Model architecture  Two additional model architectures  2-layer MLP and LeNet-5 • 2-layer MLP do not have Max pooling layer in it Method(13/25) 26
  • 27. Interaction Lab., Seoul National University of Science and Technology ■Initialization method(1/2)  Trained the models from scratch using Xavier normal initialization  Full-precision model is float point 32  Pretrained full-precision models are often used to initialize BNN models  Hardtanh + ReLU  Full- precision VGG-small model on CIFAR-10 dataset  Binary VGG- small model initialized with full-precision pretrained model Method(14/25) 27
  • 28. Interaction Lab., Seoul National University of Science and Technology ■Initialization method(2/2) Method(15/25) 28 1 0.6
  • 29. Interaction Lab., Seoul National University of Science and Technology ■Optimizer  Mostly used Adam optimizer in BNN  Replace Adam to SGD Method(16/25) 29
  • 30. Interaction Lab., Seoul National University of Science and Technology ■Experiments on ImageNet  Finding shift amount • Top-1 train accuracy at the first epoch and Top-1 validation accuracy Method(17/25) 30
  • 31. Interaction Lab., Seoul National University of Science and Technology ■Results on ImageNet Method(18/25) 31
  • 32. Interaction Lab., Seoul National University of Science and Technology ■Effect of training threshold  Limited effect on the performance of BNNs  Batch normalization bias  Limited learning capability Method(19/25) 32
  • 33. Interaction Lab., Seoul National University of Science and Technology ■Batch normalization bias  Trainable threshold is already covered by the bias values in the BN layer  𝑋 and 𝑌 are inputs to the BN layer and output from the binary activation layer  𝜇, 𝜎, 𝛾, 𝛽, and 𝑡ℎ  mean, std, weight, and bias of the BN layer  𝛽 and 𝑡ℎ are initialized to zero in the beginning Method(20/25) 33
  • 34. Interaction Lab., Seoul National University of Science and Technology ■Limited learning capability  Trainable threshold method • An initialization point Method(21/25) 34
  • 35. Interaction Lab., Seoul National University of Science and Technology ■Effect of additional activation function(1/4)  Use additional activation function after convolution layer in BNNs  BNNs usually lack non-linearity in the model  Additional activation function is also related to the unbalanced activation distribution Method(22/25) 35
  • 36. Interaction Lab., Seoul National University of Science and Technology ■Effect of additional activation function(2/4)  Effect of additional PReLU layers after binary convolution layers in the ResNet20 Method(23/25) 36
  • 37. Interaction Lab., Seoul National University of Science and Technology ■Effect of additional activation function(3/4)  There is accuracy degradation with PReLU • Already play a role in distorting the activation distribution • Further shifting the threshold of binary activation function is excessive Method(24/25) 37
  • 38. Interaction Lab., Seoul National University of Science and Technology ■Effect of additional activation function(4/4)  Replace to PReLU to LeakyReLU  When the slope 1 • Identity function Method(25/25) 38
  • 39. Interaction Lab., Seoul National University of Science and Technology ■Analyzed the impact of activation distribution on the accuracy of BNN models ■Demonstrated that the accuracy of previous BNN models could be further improved Conclusion 39