Retraining Quantized Neural Network Models with Unlabeled Data
Kundjanasith Thonglek¹, Keichi Takahashi¹, Kohei Ichikawa¹, Chawanat Nakasan², Hidemoto Nakada³, Ryousei Takano³ and Hajimu Iida¹
¹ Nara Institute of Science and Technology, ² Kanazawa University, ³ National Institute of Advanced Industrial Science and Technology
Running models on edge devices
Running models on edge devices does not require transferring the training
and inference datasets between the edge devices and a centralized server.
- Better data privacy
- Less network latency
- Less power consumption
- Specialized neural network
Model compression
Compressing a neural network model reduces its size, but also degrades its accuracy, since compression reduces the precision of the weights in the model.
Model compression techniques are compared in terms of whether they use pre-trained models, support fully connected layers, reduce redundant parameters, and impact accuracy:
- Parameter pruning and sharing
- Low-rank factorization
- Transferred/compact convolutional filters
- Knowledge distillation
Objective
Reduce the size of neural network models without significant accuracy loss
[Diagram: compression reduces the original model's size but also its accuracy; the proposed method then increases the compressed model's accuracy while keeping its reduced size.]
Retraining methods
We cannot always access the original labeled datasets because of privacy policies and license limitations.
A retraining method is necessary to recover the accuracy of compressed models.
Most existing retraining methods require the original labeled dataset to retrain the compressed models.
Using an unlabeled dataset for retraining is therefore highly useful when the original labeled dataset is unavailable.
Proposed method
The proposed method has two steps: quantization and retraining.
- Quantization: the original model is quantized into a compressed (quantized) model, which decreases the model size with a loss of accuracy.
- Retraining: the quantized model is retrained to increase its accuracy while keeping the model size.
Quantization
The quantization step first finds clusters of the weight values and then calculates the centroid of each cluster; each weight is then represented by the centroid of its cluster.
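A minimal sketch of this step is shown below, assuming k-means clustering over a layer's flattened weight values; the helper name and the use of scikit-learn are illustrative and not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_layer_weights(weights, n_centroids):
    """Cluster a layer's weights and replace each weight with its cluster centroid.

    weights: numpy array of the layer's weights (any shape).
    n_centroids: number of clusters, e.g. 32 or 256.
    Returns the quantized weights (same shape) and the centroid table.
    """
    flat = weights.reshape(-1, 1)                          # one scalar weight per row
    kmeans = KMeans(n_clusters=n_centroids, n_init=10).fit(flat)
    centroids = kmeans.cluster_centers_.reshape(-1)        # one representative value per cluster
    labels = kmeans.labels_                                 # cluster index assigned to each weight
    quantized = centroids[labels].reshape(weights.shape)    # look up each weight's centroid
    return quantized, centroids
```

Storing only the centroid table plus a per-weight cluster index is what reduces the model size, since each weight then needs only log2(n_centroids) bits instead of a 32-bit float.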
Proposed retraining method
[Diagram: the unlabeled dataset is fed to both the quantized model (with non-trainable and trainable layers) and the original model; the loss is computed from the two models' output vectors and used to retrain the trainable layers.]
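A minimal sketch of this retraining loop in TensorFlow/Keras follows. It assumes the loss is the mean squared error between the two output vectors and that the quantized layers are frozen so the compressed size is preserved; the function name, optimizer, and loss choice are illustrative rather than the authors' exact setup.

```python
import tensorflow as tf

def retrain_with_unlabeled_data(original_model, quantized_model,
                                unlabeled_ds, frozen_layer_names, epochs=5):
    """Retrain a quantized model toward the original model's outputs, without labels.

    unlabeled_ds: tf.data.Dataset yielding batches of inputs only (no labels).
    frozen_layer_names: names of the quantized layers that must stay fixed
    so that the compressed model size is kept.
    """
    # Freeze the quantized layers; only the remaining layers are updated.
    for layer in quantized_model.layers:
        layer.trainable = layer.name not in frozen_layer_names

    optimizer = tf.keras.optimizers.Adam(1e-4)
    loss_fn = tf.keras.losses.MeanSquaredError()  # distance between the two output vectors

    for _ in range(epochs):
        for batch in unlabeled_ds:
            # The original model's output vector serves as the target (no labels needed).
            targets = original_model(batch, training=False)
            with tf.GradientTape() as tape:
                outputs = quantized_model(batch, training=True)
                loss = loss_fn(targets, outputs)
            grads = tape.gradient(loss, quantized_model.trainable_variables)
            optimizer.apply_gradients(zip(grads, quantized_model.trainable_variables))
    return quantized_model
```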
Case Study of VGG-16
[Figure: VGG-16 model architecture, with the distribution of bias values and weight values in each layer]
Model quantization
[Plots: size and accuracy of quantized VGG-16 models vs. the number of quantized layers]
Model retraining
Retraining Quantized VGG-16 models
[Plot: accuracy of retrained quantized VGG-16 models vs. the number of centroids]
Quantizing the 14th and 15th layers using 32 to 256 centroids achieved nearly the same accuracy as the original model.
The best configuration for quantizing the VGG-16 model:
- Quantize the biases in all layers using one centroid, and
- Quantize the weights in the 14th and 15th layers using 32 centroids.
This compresses the model to the smallest possible size without significant accuracy loss.
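As an illustration only (not the authors' code), the quantize_layer_weights sketch from earlier could be applied to the weight-quantization part of this configuration; the layer names fc1 and fc2 are the two fully connected layers of tf.keras.applications.VGG16, which correspond to the 14th and 15th layers here.

```python
import tensorflow as tf

# Load the pre-trained VGG-16 with ImageNet weights.
vgg16 = tf.keras.applications.VGG16(weights="imagenet")

# Quantize the weights of the two fully connected layers (fc1, fc2)
# with 32 centroids each; the biases are left untouched in this sketch.
for layer in vgg16.layers:
    if layer.name in ("fc1", "fc2"):
        kernel, bias = layer.get_weights()
        quantized_kernel, _ = quantize_layer_weights(kernel, n_centroids=32)
        layer.set_weights([quantized_kernel, bias])
```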
Case Study of ResNet-50
[Figure: ResNet-50 model architecture, with the distribution of bias values and weight values in each layer]
Model quantization
[Plots: size and accuracy of quantized ResNet-50 models vs. the number of quantized layers]
Model retraining
Retraining Quantized ResNet-50 models
[Plot: accuracy of retrained quantized ResNet-50 models vs. the number of centroids]
Quantizing the 13th to 49th layers using 128 or fewer centroids clearly degrades the accuracy of the model.
The best configuration for quantizing the ResNet-50 model:
- Quantize the biases in all layers using one centroid, and
- Quantize the weights in the 13th to 49th layers using 256 centroids.
This compresses the model to the smallest possible size without significant accuracy loss.
Conventional & Proposed retraining
[Plots: accuracy of the quantized models through retraining, and retraining time, for the conventional and proposed retraining methods]
Conclusion
We proposed a novel retraining method with unlabeled data for compressed neural network models that reduces model size without significant accuracy loss.
Experimental results when applying the proposed retraining method:
- The model size of VGG-16 was reduced by 81.10% with only 0.34% loss of accuracy.
- The model size of ResNet-50 was reduced by 52.54% with only 0.71% loss of accuracy.
As future work, the structure of other neural network models should be investigated to make the retraining more efficient. Moreover, we will try to apply compression techniques other than quantization.
Q&A
Thank you for your attention
Email: thonglek.kundjanasith.ti7@is.naist.jp
Experimental setup
Hardware specification[*]
- CPU: Intel Xeon Gold 6148 x 2
- Main memory: 364 GiB
- GPU: NVIDIA Tesla V100 SXM2 x 4
- GPU memory: 16 GiB
Datasets
- The ImageNet dataset is used to train the pre-trained (original) model
- The CIFAR dataset is used to retrain the quantized model with the proposed method
Targeted models
1. VGG-16 model
2. ResNet-50 model
[*] The hardware specification of a compute node in AI Bridging Cloud Infrastructure (ABCI) provided by the National Institute of Advanced Industrial Science and Technology (AIST)
Output vector
[Diagram: each of the N data points in the unlabeled dataset is fed through the neural network; the output layer with M units produces one M-dimensional output vector per data point, giving N output vectors in total.]
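Written as an equation, one plausible reading of this diagram (assuming a mean-squared-error distance between the output vectors; the exact loss used in the paper may differ) is:

```latex
\mathcal{L} \;=\; \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M}
  \left( y^{\mathrm{orig}}_{ij} - y^{\mathrm{quant}}_{ij} \right)^{2}
```

where y^orig_ij and y^quant_ij are the j-th elements of the output vectors produced by the original and quantized models for the i-th unlabeled data point.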
