Running neural network models on edge devices is attracting much attention from neural network researchers since edge computing technology is becoming more powerful than ever. However, deploying large neural network models on edge devices is challenging due to limited computing resources and storage space. Therefore, model compression techniques have recently been studied to reduce the model size and fit models on resource-limited edge devices. Compressing a neural network model reduces its size, but also degrades its accuracy since compression reduces the precision of the weights in the model. Consequently, a retraining method is required to recover the accuracy of compressed models. Most existing retraining methods require the original labeled training datasets to retrain the models, but labeling is a time-consuming process. In particular, we cannot always access the original labeled datasets because of privacy policies and license limitations. In this paper, we propose a method to retrain a compressed neural network model with an unlabeled dataset that is different from the original labeled dataset. We compress the neural network model using quantization to decrease the size of the model. Subsequently, the compressed model is retrained by our proposed retraining method without using a labeled dataset to recover the accuracy of the model. We compared the proposed retraining method against conventional retraining. The proposed method reduced the size of VGG-16 and ResNet-50 by 81.10% and 52.45%, respectively, without significant accuracy loss. In addition, our proposed retraining method is clearly faster than the conventional retraining method.
1. Retraining Quantized Neural Network Models with Unlabeled Data
Kundjanasith Thonglek¹, Keichi Takahashi¹, Kohei Ichikawa¹, Chawanat Nakasan², Hidemoto Nakada³, Ryousei Takano³ and Hajimu Iida¹
¹Nara Institute of Science and Technology, ²Kanazawa University, ³National Institute of Advanced Industrial Science and Technology
2. Running models on edge devices
Running models on edge devices does not require transferring the training
and inference datasets between the edge devices and a centralized server.
- Better data privacy
- Lower network latency
- Lower power consumption
- Specialized neural networks
3. Model compression
Compressing a neural network model reduces the size of the model, but also degrades
the accuracy of the model since it reduces the precision of the weights in the model.
[Table: comparison of model compression techniques (parameter pruning and sharing, low-rank factorization, transferred/compact convolutional filters, knowledge distillation) in terms of whether they use pre-trained models, support fully connected layers, reduce redundant parameters, and impact accuracy]
4. Objective
Reduce the size of neural network models without significant accuracy loss.
[Diagram: compressing the original model reduces its size but also its accuracy; the proposed method restores the accuracy of the compressed model while keeping the reduced size]
5. Retraining methods
A retraining method is necessary to recover the accuracy of compressed models.
Most existing retraining methods require the original labeled datasets to retrain the compressed models.
However, we cannot always access the original labeled datasets because of privacy policies and license limitations.
Retraining with an unlabeled dataset is therefore highly useful when the original labeled dataset is unavailable, as sketched below.
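The slides do not show the retraining procedure itself. Below is a minimal sketch of retraining without labels, assuming (as slide 21 later suggests) that the original model's output vectors on the unlabeled dataset are used as soft training targets for the quantized model; the function name, the MSE loss, and the optimizer are illustrative assumptions, not the authors' exact implementation.

```python
import tensorflow as tf

def retrain_without_labels(original_model, quantized_model, unlabeled_x,
                           epochs=5, batch_size=64):
    """Retrain a quantized model using only unlabeled inputs.

    The original (uncompressed) model produces an output vector for every
    unlabeled data point; the quantized model is then fitted to reproduce
    those output vectors, so no ground-truth labels are needed.
    """
    soft_targets = original_model.predict(unlabeled_x, batch_size=batch_size)
    # Loss and optimizer are assumptions; the paper may use a different objective.
    quantized_model.compile(optimizer="adam", loss="mean_squared_error")
    quantized_model.fit(unlabeled_x, soft_targets,
                        epochs=epochs, batch_size=batch_size)
    # Note: this sketch updates weights freely; keeping the shared-centroid
    # structure intact during retraining would need extra bookkeeping.
    return quantized_model
```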
11. Case Study of VGG-16
[Figure: VGG-16 model architecture; bias values and weight values]
12. Model quantization
[Figure: size of quantized VGG-16 models and accuracy of quantized VGG-16 models versus the number of quantized layers]
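The slides only show the resulting size/accuracy curves. As a rough illustration of centroid-based quantization, the sketch below clusters a single layer's weights into k shared values with k-means and replaces each weight by its nearest centroid. This is a minimal sketch under those assumptions; the helper name is illustrative, and a real run on the large fully connected layers would want MiniBatchKMeans or subsampling.

```python
import numpy as np
from sklearn.cluster import KMeans
import tensorflow as tf

def quantize_layer_weights(layer, n_centroids):
    """Cluster a Keras layer's kernel into n_centroids shared values (k-means)
    and overwrite every weight with its nearest centroid."""
    kernel, *rest = layer.get_weights()              # kernel first, then bias, etc.
    flat = kernel.reshape(-1, 1)
    km = KMeans(n_clusters=n_centroids, n_init=10).fit(flat)
    centroids = km.cluster_centers_.flatten()        # the shared weight values
    assignments = km.predict(flat)                   # per-weight centroid index
    layer.set_weights([centroids[assignments].reshape(kernel.shape), *rest])
    return centroids, assignments.reshape(kernel.shape)

# Illustrative usage: quantize the two large fully connected layers of VGG-16
# (the 14th and 15th weight layers) with 32 centroids each.
model = tf.keras.applications.VGG16(weights="imagenet")
weight_layers = [l for l in model.layers if l.get_weights()]  # 13 conv + 3 dense
for layer in (weight_layers[13], weight_layers[14]):
    quantize_layer_weights(layer, n_centroids=32)
```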
13. Model retraining
Retraining quantized VGG-16 models
Quantizing the 14th and 15th layers using 32 to 256 centroids achieves nearly the same accuracy as the original model.
The best configuration for quantizing the VGG-16 model:
- Quantize the biases in all layers using one centroid, and
- Quantize the weights in the 14th and 15th layers using 32 centroids.
This compresses the model to the smallest possible size without significant accuracy loss.
[Figure: accuracy of retrained quantized VGG-16 models versus the number of centroids]
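As a sanity check on the size numbers, the compressed size can be roughly estimated under a simple storage assumption: each quantized weight becomes a ⌈log2 k⌉-bit index into a per-layer codebook of k 32-bit centroids, while everything else stays float32. This is only a first-order sketch; it will not exactly reproduce the 81.10% reduction reported for VGG-16, which depends on how the quantized model is actually serialized.

```python
import math

def estimated_size_bits(layer_param_counts, centroid_config):
    """Rough size estimate for a centroid-quantized model.

    layer_param_counts: {layer_name: number of weights in that layer}
    centroid_config:    {layer_name: k} for quantized layers; others stay float32
    """
    total = 0
    for name, count in layer_param_counts.items():
        k = centroid_config.get(name)
        if k is None:
            total += count * 32                        # untouched float32 weights
        else:
            total += count * math.ceil(math.log2(k))   # per-weight centroid index
            total += k * 32                            # shared codebook
    return total

# Illustrative VGG-16 numbers: the 14th and 15th weight layers are the two
# large fully connected layers (25088*4096 and 4096*4096 weights).
params = {"fc1": 25088 * 4096,
          "fc2": 4096 * 4096,
          "other": 138_357_544 - 25088 * 4096 - 4096 * 4096}
saved = 1 - estimated_size_bits(params, {"fc1": 32, "fc2": 32}) / (sum(params.values()) * 32)
print(f"estimated size reduction: {saved:.1%}")
```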
14. Case Study of ResNet-50
[Figure: ResNet-50 model architecture; bias values and weight values]
15. Model quantization
[Figure: size of quantized ResNet-50 models and accuracy of quantized ResNet-50 models versus the number of quantized layers]
16. Model retraining
Retraining quantized ResNet-50 models
Quantizing the 13th to 49th layers using 128 or fewer centroids clearly degrades the accuracy of the model.
The best configuration for quantizing the ResNet-50 model:
- Quantize the biases in all layers using one centroid, and
- Quantize the weights in the 13th to 49th layers using 256 centroids.
This compresses the model to the smallest possible size without significant accuracy loss.
[Figure: accuracy of retrained quantized ResNet-50 models versus the number of centroids]
17. Conventional & Proposed retraining
[Figure: accuracy of the quantized model through retraining, and retraining time of the quantized model, for the conventional and proposed retraining methods]
18. Conclusion
We proposed a novel retraining method with unlabeled data for compressed neural network models to reduce the model size without significant accuracy loss.
Experimental results of applying the proposed retraining method:
- The model size of VGG-16 was reduced by 81.10% with only 0.34% loss of accuracy.
- The model size of ResNet-50 was reduced by 52.54% with only 0.71% loss of accuracy.
As future work, the structure of other neural network models should be investigated to retrain them efficiently. Moreover, we will try to apply compression techniques other than quantization.
19. Q&A
Thank you for your attention
Email: thonglek.kundjanasith.ti7@is.naist.jp
20. Experimental setup
Hardware specification[*]
- CPU: Intel Xeon Gold 6148 x 2
- Main memory: 364 GiB
- GPU: NVIDIA Tesla V100 SXM2 x 4
- GPU memory: 16 GiB
Datasets
- The ImageNet dataset is used for training the pre-trained (original) model.
- The CIFAR dataset is used for retraining the quantized model with the proposed method.
Targeted models
1. VGG-16
2. ResNet-50
[*] The hardware specification of a compute node in the AI Bridging Cloud Infrastructure (ABCI) provided by the National Institute of Advanced Industrial Science and Technology (AIST)
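For reproducibility, the retraining inputs could be prepared roughly as below. This is a hedged sketch assuming the CIFAR-10 variant, Keras' built-in loader, and resizing to the 224x224 input of ImageNet-pretrained VGG-16/ResNet-50; the slides do not state which CIFAR variant or preprocessing pipeline was actually used.

```python
import tensorflow as tf

# Load CIFAR-10 and discard the labels: the proposed retraining method
# only needs unlabeled inputs (assumption: CIFAR-10 rather than CIFAR-100).
(x_train, _), _ = tf.keras.datasets.cifar10.load_data()

# Resize a subset of the 32x32 images to the 224x224 resolution expected by
# ImageNet-pretrained VGG-16/ResNet-50 (this preprocessing is an assumption).
unlabeled_x = tf.image.resize(x_train[:10_000].astype("float32"), (224, 224))
unlabeled_x = tf.keras.applications.vgg16.preprocess_input(unlabeled_x)
```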
21. Output vector
[Diagram: each of the N data points in the unlabeled dataset is fed to the neural network, and the M values of the output layer are collected, yielding N output vectors of dimension M]
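A minimal sketch of this step, assuming a Keras model and an array of N unlabeled inputs (the helper name is illustrative):

```python
import numpy as np

def collect_output_vectors(model, unlabeled_x, batch_size=64):
    """Feed the N unlabeled data points through the model and return an
    (N, M) matrix of output vectors, where M is the output layer size."""
    outputs = model.predict(unlabeled_x, batch_size=batch_size)
    n, m = len(unlabeled_x), model.output_shape[-1]
    assert outputs.shape == (n, m)
    return np.asarray(outputs)
```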