Retraining Quantized Neural Network Models with Unlabeled Data
Kundjanasith Thonglek¹, Keichi Takahashi¹, Kohei Ichikawa¹, Chawanat Nakasan², Hidemoto Nakada³, Ryousei Takano³ and Hajimu Iida¹
¹ Nara Institute of Science and Technology, ² Kanazawa University, ³ National Institute of Advanced Industrial Science and Technology
Running models on edge devices
Running models on edge devices does not require transferring the training
and inference datasets between the edge devices and a centralized server.
- Better data privacy
- Less network latency
- Less power consumption
- Specialized neural network
Model compression
Compressing a neural network model reduces its size, but also degrades its accuracy, since compression reduces the precision of the weights in the model.
Model compression techniques are compared in terms of whether they use pre-trained models, support fully connected layers, reduce redundant parameters, and impact accuracy:
- Parameter pruning and sharing
- Low-rank factorization
- Transferred/compact convolutional filters
- Knowledge distillation
Objective
Reduce the size of neural network models without significant accuracy loss
[Diagram: compression reduces the original model's size but also its accuracy; the proposed method then increases the compressed model's accuracy while keeping its reduced size.]
Retraining methods
We cannot always access the original labeled datasets because of privacy policies and license limitations.
A retraining method is necessary to recover the accuracy of compressed models.
Most existing retraining methods require the original labeled dataset to retrain the compressed models.
Using an unlabeled dataset for retraining is therefore highly useful when the original labeled dataset is unavailable.
Proposed method
The proposed method has two steps: quantization and retraining.
- Quantization: the original model is quantized into a compressed (quantized) model, which decreases the model size with a loss of accuracy.
- Retraining: the quantized model is retrained to increase its accuracy while keeping the model size.
Quantization
The quantization step first finds clusters of the weight values and then calculates the centroid of each cluster; each weight is then represented by the centroid of its cluster.
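A minimal sketch of this step is shown below, assuming k-means clustering over a layer's flattened weight values; the helper name and the use of scikit-learn are illustrative and not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_layer_weights(weights, n_centroids):
    """Cluster a layer's weights and replace each weight with its cluster centroid.

    weights: numpy array of the layer's weights (any shape).
    n_centroids: number of clusters, e.g. 32 or 256.
    Returns the quantized weights (same shape) and the centroid table.
    """
    flat = weights.reshape(-1, 1)                          # one scalar weight per row
    kmeans = KMeans(n_clusters=n_centroids, n_init=10).fit(flat)
    centroids = kmeans.cluster_centers_.reshape(-1)        # one representative value per cluster
    labels = kmeans.labels_                                 # cluster index assigned to each weight
    quantized = centroids[labels].reshape(weights.shape)    # look up each weight's centroid
    return quantized, centroids
```

Storing only the centroid table plus a per-weight cluster index is what reduces the model size, since each weight then needs only log2(n_centroids) bits instead of a 32-bit float.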
Proposed retraining method
[Diagram: the unlabeled dataset is fed to both the quantized model (with non-trainable and trainable layers) and the original model; the loss is computed from the two models' output vectors and used to retrain the trainable layers.]
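A minimal sketch of this retraining loop in TensorFlow/Keras follows. It assumes the loss is the mean squared error between the two output vectors and that the quantized layers are frozen so the compressed size is preserved; the function name, optimizer, and loss choice are illustrative rather than the authors' exact setup.

```python
import tensorflow as tf

def retrain_with_unlabeled_data(original_model, quantized_model,
                                unlabeled_ds, frozen_layer_names, epochs=5):
    """Retrain a quantized model toward the original model's outputs, without labels.

    unlabeled_ds: tf.data.Dataset yielding batches of inputs only (no labels).
    frozen_layer_names: names of the quantized layers that must stay fixed
    so that the compressed model size is kept.
    """
    # Freeze the quantized layers; only the remaining layers are updated.
    for layer in quantized_model.layers:
        layer.trainable = layer.name not in frozen_layer_names

    optimizer = tf.keras.optimizers.Adam(1e-4)
    loss_fn = tf.keras.losses.MeanSquaredError()  # distance between the two output vectors

    for _ in range(epochs):
        for batch in unlabeled_ds:
            # The original model's output vector serves as the target (no labels needed).
            targets = original_model(batch, training=False)
            with tf.GradientTape() as tape:
                outputs = quantized_model(batch, training=True)
                loss = loss_fn(targets, outputs)
            grads = tape.gradient(loss, quantized_model.trainable_variables)
            optimizer.apply_gradients(zip(grads, quantized_model.trainable_variables))
    return quantized_model
```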
Case Study of VGG-16
[Figure: VGG-16 model architecture, with the distribution of bias values and weight values in each layer]
Model quantization
[Plots: size and accuracy of quantized VGG-16 models vs. the number of quantized layers]
Model retraining
Retraining Quantized VGG-16 models
[Plot: accuracy of retrained quantized VGG-16 models vs. the number of centroids]
Quantizing the 14th and 15th layers using 32 to 256 centroids achieved nearly the same accuracy as the original model.
The best configuration for quantizing the VGG-16 model:
- Quantize the biases in all layers using one centroid, and
- Quantize the weights in the 14th and 15th layers using 32 centroids.
This compresses the model to the smallest possible size without significant accuracy loss.
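As an illustration only (not the authors' code), the quantize_layer_weights sketch from earlier could be applied to the weight-quantization part of this configuration; the layer names fc1 and fc2 are the two fully connected layers of tf.keras.applications.VGG16, which correspond to the 14th and 15th layers here.

```python
import tensorflow as tf

# Load the pre-trained VGG-16 with ImageNet weights.
vgg16 = tf.keras.applications.VGG16(weights="imagenet")

# Quantize the weights of the two fully connected layers (fc1, fc2)
# with 32 centroids each; the biases are left untouched in this sketch.
for layer in vgg16.layers:
    if layer.name in ("fc1", "fc2"):
        kernel, bias = layer.get_weights()
        quantized_kernel, _ = quantize_layer_weights(kernel, n_centroids=32)
        layer.set_weights([quantized_kernel, bias])
```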
Case Study of ResNet-50
[Figure: ResNet-50 model architecture, with the distribution of bias values and weight values in each layer]
Model quantization
[Plots: size and accuracy of quantized ResNet-50 models vs. the number of quantized layers]
Model retraining
Retraining Quantized ResNet-50 models
[Plot: accuracy of retrained quantized ResNet-50 models vs. the number of centroids]
Quantizing the 13th to 49th layers using 128 or fewer centroids clearly degrades the accuracy of the model.
The best configuration for quantizing the ResNet-50 model:
- Quantize the biases in all layers using one centroid, and
- Quantize the weights in the 13th to 49th layers using 256 centroids.
This compresses the model to the smallest possible size without significant accuracy loss.
Conventional & Proposed retraining
[Plots: accuracy of the quantized models through retraining, and retraining time, for the conventional and proposed retraining methods]
Conclusion
We proposed a novel retraining method with unlabeled data for compressed neural network models that reduces model size without significant accuracy loss.
Experimental results when applying the proposed retraining method:
- The model size of VGG-16 was reduced by 81.10% with only 0.34% loss of accuracy.
- The model size of ResNet-50 was reduced by 52.54% with only 0.71% loss of accuracy.
As future work, the structure of other neural network models should be investigated to make the retraining more efficient. Moreover, we will try to apply compression techniques other than quantization.
Q&A
Thank you for your attention
Email: thonglek.kundjanasith.ti7@is.naist.jp
Experimental setup
Hardware specification[*]
- CPU: Intel Xeon Gold 6148 x 2
- Main memory: 364 GiB
- GPU: NVIDIA Tesla V100 SXM2 x 4
- GPU memory: 16 GiB
Datasets
- The ImageNet dataset is used to train the pre-trained (original) model
- The CIFAR dataset is used to retrain the quantized model with the proposed method
Targeted models
1. VGG-16 model
2. ResNet-50 model
[*] The hardware specification of a compute node in AI Bridging Cloud Infrastructure (ABCI) provided by the National Institute of Advanced Industrial Science and Technology (AIST)
Output vector
[Diagram: each of the N data points in the unlabeled dataset is fed through the neural network; the output layer with M units produces one M-dimensional output vector per data point, giving N output vectors in total.]
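Written as an equation, one plausible reading of this diagram (assuming a mean-squared-error distance between the output vectors; the exact loss used in the paper may differ) is:

```latex
\mathcal{L} \;=\; \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M}
  \left( y^{\mathrm{orig}}_{ij} - y^{\mathrm{quant}}_{ij} \right)^{2}
```

where y^orig_ij and y^quant_ij are the j-th elements of the output vectors produced by the original and quantized models for the i-th unlabeled data point.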
