SlideShare a Scribd company logo
1 of 26
Download to read offline
Retraining Quantized Neural Network Models
with Unlabeled Data
Kundjanasith Thonglek1
, Keichi Takahashi1
, Kohei Ichikawa1
,
Chawanat Nakasan2
, Hidemoto Nakada3
, Ryousei Takano3
and Hajimu Iida1
1
Nara Institute of Science and Technology, 2
Kanazawa University,
3
National Institute of Advanced Industrial Science and Technology
Running models on edge devices
Running models on edge devices does not require transferring the training
and inference datasets between the edge devices and a centralized server.
Better data privacy
Less network latency
Less power consumption
Specialized neural network
Model compression
Compressing neural network models reduces the size of the model, but also degrades
the accuracy of the model since it reduces the precision of the weights in the models.
Model compression
techniques
Uses
pre-trained models
Supports
fully connected layers
Reduces
redundant parameters
Impacts
accuracy
Parameter pruning
and sharing
Low-rank
factorization
Transferred/compact
convolutional filters
Knowledge
distillation
Objective
Reduce the size of neural network models without the significant accuracy loss
Neural Network Model
Model
Size
Model
Accuracy
Original Model Compressed Model
Compression Proposed
method
Neural Network Model
Model
Size
Model
Accuracy
Neural Network Model
Model
Size
Model
Accuracy
Compressed Model
Retraining methods
We can not always access the original labeled datasets because of
privacy policy and license limitation.
Retraining method is necessary to recover the accuracy of the compressed models.
Most existing retraining method require the original labeled datasets
to retrain the compressed models.
Using unlabeled dataset for retraining is highly useful when the
original labeled dataset is unavailable.
Proposed method
Quantization Retraining
Original Model Compressed Model
Quantized Model
Decrease model size
with loss of accuracy
Increase model accuracy
while keeping the model size
Proposed method
Quantization Retraining
Original Model Compressed Model
Decrease model size
with loss of accuracy
Increase model accuracy
while keeping the model size
Quantized Model
Quantization
Calculate clusters Calculate centroids
Proposed method
Quantization Retraining
Original Model Compressed Model
Decrease model size
with loss of accuracy
Increase model accuracy
while keeping the model size
Quantized Model
Proposed retraining method
Unlabeled
Data set
Quantized model
Non-trainable layer
Trainable layer
Original model
Trainable layer
Output vector
Output vector
Loss
Case Study of VGG-16
Model Architecture
Bias Value Weight Value
Model quantization
Size of Quantized VGG-16 models Accuracy of Quantized VGG-16 models
# of
quantized
layers
Model retraining
Retraining Quantized VGG-16 models
Quantizing the 14th
and 15th
layers using 32-256
centroids achieved nearly the accuracy of the original
model.
The best configuration for quantizing
VGG-16 model
- Quantize the biases in all layer using
one centroid and
- Quantize the weights in 14th
and 15th
layers using 32 centroids
It compressed to possible smallest model size without
significant accuracy loss.
# of centroids
Case Study of ResNet-50
Model Architecture
Bias Value Weight Value
Model quantization
Size of Quantized ResNet-50 models Accuracy of Quantized ResNet-50 models
# of
quantized
layers
Model retraining
Retraining Quantized ResNet-50 models
Quantizing the 13th
- 49th
layers using 128 or less
centroids clearly degrades the accuracy of the model.
The best configuration for quantizing
ResNet-50 model
- Quantize the biases in all layer using
one centroid and
- Quantize the weights in 13th
- 49th
layers using 256 centroids
It compressed to possible smallest model size without
significant accuracy loss.
# of centroids
Conventional & Proposed retraining
Accuracy of quantized model through retraining Retraining time of quantized model
Conclusion
We proposed a novel retraining method with unlabeled data for compressed
neural network models to reduce the size of model without significant accuracy loss.
The experimental result when applying the proposed retraining method.
- The model size of VGG-16 was reduced by 81.10% with only 0.34% loss of accuracy.
- The model size of ResNet-50 was reduced by 52.54% with only 0.71% loss of
accuracy.
The structure of other neural network models should be investigated to conduct the
efficient retraining method. Moreover, we will try to apply compression techniques other
than quantization.
Q&A
Thank you for your attention
Email: thonglek.kundjanasith.ti7@is.naist.jp
Experimental setup
Hardware Specification
CPU Intel Xeon Gold 6148 x 2
Main Memory 364 GiB
GPU Nvidia Tesla V100 SXM2 x4
GPU Memory 16 GiB
Datasets
- ImageNet dataset is used for training the pre-trained model or the original model
- CIFAR dataset is used for retraining the quantized model by the proposed method
Hardware specification[*]
Targeted models
1. VGG-16 model
2. ResNet-50 model
[*] The hardware specification of a compute node in AI Bridging Cloud Infrastructure (ABCI)
provided by National Institute of Advanced Industrial Science and Technology (AIST)
Output vector
Unlabeled Data set
N data points
Neural network model
Output Layer
1 2 M
3
1 1 2 3 M
2 1 2 3 M
3 1 2 3 M
1 2 3 M
M
Output vector
N
N
Experimental setup
The experiments were conducted using
the computational resource of AI Bridging
Cloud Infrastructure (ABCI) provided by
the National Institute of Advanced
Industrial Science and Technology (AIST)
Hardware Specification
CPU Intel Xeon Gold 6148 x 2
Main Memory 364 GiB
GPU Nvidia Tesla V100 SXM2 x4
GPU Memory 16 GiB
Datasets
- ImageNet dataset is used for training the pre-trained model or the original model
- CIFAR dataset is used for retraining the quantized model by the proposed method
Quantization
Calculate clusters Calculate centroids
Example Slide
• Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s.
• When an unknown printer took a galley of type and scrambled it to make a type
specimen book. It has survived not only five centuries, but also the leap into
electronic typesetting, remaining essentially unchanged.
• It was popularised in the 1960s with the release of Letraset sheets containing
Lorem Ipsum passages, and more recently with desktop publishing software like
Aldus PageMaker including versions of Lorem Ipsum.
• When an unknown printer took a galley of type and scrambled it to make a type
specimen book. It has survived not only five centuries, but also the leap into
electronic typesetting, remaining essentially unchanged.
Quantization
Find clusters Find centroid of each cluster
Quantization
Find clusters Calculate centroids

More Related Content

Similar to Retraining Quantized Neural Network Models with Unlabeled Data.pdf

AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0
DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0
DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0Sahil Kaw
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdDatabricks
 
Accelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google CloudAccelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google CloudSamantha Guerriero
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...Bomm Kim
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deploymenttaeseon ryu
 
Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...IAEME Publication
 
Vertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Holdings
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignJinwon Lee
 
Deep Learning on Everyday Devices
Deep Learning on Everyday DevicesDeep Learning on Everyday Devices
Deep Learning on Everyday DevicesBrodmann17
 
Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3muayyad alsadi
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Codemotion
 
Something about SSE and beyond
Something about SSE and beyondSomething about SSE and beyond
Something about SSE and beyondLihang Li
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
Neurosynaptic chips
Neurosynaptic chipsNeurosynaptic chips
Neurosynaptic chipsJeffrey Funk
 
ECML PKDD 2021 ML meets IoT Tutorial Part II: Creating ML based Self learning...
ECML PKDD 2021 ML meets IoT Tutorial Part II: Creating ML based Self learning...ECML PKDD 2021 ML meets IoT Tutorial Part II: Creating ML based Self learning...
ECML PKDD 2021 ML meets IoT Tutorial Part II: Creating ML based Self learning...Bharath Sudharsan
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200xIBM Sverige
 

Similar to Retraining Quantized Neural Network Models with Unlabeled Data.pdf (20)

AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0
DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0
DeepLearningAlgorithmAccelerationOnHardwarePlatforms_V2.0
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
 
Accelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google CloudAccelerate Machine Learning on Google Cloud
Accelerate Machine Learning on Google Cloud
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 Once-for-All: Train One Network and Specialize it for Efficient Deployment Once-for-All: Train One Network and Specialize it for Efficient Deployment
Once-for-All: Train One Network and Specialize it for Efficient Deployment
 
Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...Performance boosting of discrete cosine transform using parallel programming ...
Performance boosting of discrete cosine transform using parallel programming ...
 
Project report
Project reportProject report
Project report
 
Vertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part II
 
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network DesignPR-144: SqueezeNext: Hardware-Aware Neural Network Design
PR-144: SqueezeNext: Hardware-Aware Neural Network Design
 
Deep Learning on Everyday Devices
Deep Learning on Everyday DevicesDeep Learning on Everyday Devices
Deep Learning on Everyday Devices
 
Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
 
Something about SSE and beyond
Something about SSE and beyondSomething about SSE and beyond
Something about SSE and beyond
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
DIGEST PODCAST
DIGEST PODCASTDIGEST PODCAST
DIGEST PODCAST
 
Neurosynaptic chips
Neurosynaptic chipsNeurosynaptic chips
Neurosynaptic chips
 
ECML PKDD 2021 ML meets IoT Tutorial Part II: Creating ML based Self learning...
ECML PKDD 2021 ML meets IoT Tutorial Part II: Creating ML based Self learning...ECML PKDD 2021 ML meets IoT Tutorial Part II: Creating ML based Self learning...
ECML PKDD 2021 ML meets IoT Tutorial Part II: Creating ML based Self learning...
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200x
 

More from Kundjanasith Thonglek

Sparse Communication for Federated Learning
Sparse Communication for Federated LearningSparse Communication for Federated Learning
Sparse Communication for Federated LearningKundjanasith Thonglek
 
Improving Resource Availability in Data Center using Deep Learning.pdf
Improving Resource Availability in Data Center using Deep Learning.pdfImproving Resource Availability in Data Center using Deep Learning.pdf
Improving Resource Availability in Data Center using Deep Learning.pdfKundjanasith Thonglek
 
Enhancing the Prediction Accuracy of Solar Power Generation using a Generativ...
Enhancing the Prediction Accuracy of Solar Power Generation using a Generativ...Enhancing the Prediction Accuracy of Solar Power Generation using a Generativ...
Enhancing the Prediction Accuracy of Solar Power Generation using a Generativ...Kundjanasith Thonglek
 
Federated Learning of Neural Network Models with Heterogeneous Structures.pdf
Federated Learning of Neural Network Models with Heterogeneous Structures.pdfFederated Learning of Neural Network Models with Heterogeneous Structures.pdf
Federated Learning of Neural Network Models with Heterogeneous Structures.pdfKundjanasith Thonglek
 
Abnormal Gait Recognition in Real-Time using Recurrent Neural Networks.pdf
Abnormal Gait Recognition in Real-Time using Recurrent Neural Networks.pdfAbnormal Gait Recognition in Real-Time using Recurrent Neural Networks.pdf
Abnormal Gait Recognition in Real-Time using Recurrent Neural Networks.pdfKundjanasith Thonglek
 
Improving Resource Utilization in Data Centers using an LSTM-based Prediction...
Improving Resource Utilization in Data Centers using an LSTM-based Prediction...Improving Resource Utilization in Data Centers using an LSTM-based Prediction...
Improving Resource Utilization in Data Centers using an LSTM-based Prediction...Kundjanasith Thonglek
 
Intelligent Vehicle Accident Analysis System.pdf
Intelligent Vehicle Accident Analysis System.pdfIntelligent Vehicle Accident Analysis System.pdf
Intelligent Vehicle Accident Analysis System.pdfKundjanasith Thonglek
 
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdf
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdfAuto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdf
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdfKundjanasith Thonglek
 

More from Kundjanasith Thonglek (8)

Sparse Communication for Federated Learning
Sparse Communication for Federated LearningSparse Communication for Federated Learning
Sparse Communication for Federated Learning
 
Improving Resource Availability in Data Center using Deep Learning.pdf
Improving Resource Availability in Data Center using Deep Learning.pdfImproving Resource Availability in Data Center using Deep Learning.pdf
Improving Resource Availability in Data Center using Deep Learning.pdf
 
Enhancing the Prediction Accuracy of Solar Power Generation using a Generativ...
Enhancing the Prediction Accuracy of Solar Power Generation using a Generativ...Enhancing the Prediction Accuracy of Solar Power Generation using a Generativ...
Enhancing the Prediction Accuracy of Solar Power Generation using a Generativ...
 
Federated Learning of Neural Network Models with Heterogeneous Structures.pdf
Federated Learning of Neural Network Models with Heterogeneous Structures.pdfFederated Learning of Neural Network Models with Heterogeneous Structures.pdf
Federated Learning of Neural Network Models with Heterogeneous Structures.pdf
 
Abnormal Gait Recognition in Real-Time using Recurrent Neural Networks.pdf
Abnormal Gait Recognition in Real-Time using Recurrent Neural Networks.pdfAbnormal Gait Recognition in Real-Time using Recurrent Neural Networks.pdf
Abnormal Gait Recognition in Real-Time using Recurrent Neural Networks.pdf
 
Improving Resource Utilization in Data Centers using an LSTM-based Prediction...
Improving Resource Utilization in Data Centers using an LSTM-based Prediction...Improving Resource Utilization in Data Centers using an LSTM-based Prediction...
Improving Resource Utilization in Data Centers using an LSTM-based Prediction...
 
Intelligent Vehicle Accident Analysis System.pdf
Intelligent Vehicle Accident Analysis System.pdfIntelligent Vehicle Accident Analysis System.pdf
Intelligent Vehicle Accident Analysis System.pdf
 
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdf
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdfAuto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdf
Auto-Scaling Apache Spark cluster using Deep Reinforcement Learning.pdf
 

Recently uploaded

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Recently uploaded (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Retraining Quantized Neural Network Models with Unlabeled Data.pdf

  • 1. Retraining Quantized Neural Network Models with Unlabeled Data Kundjanasith Thonglek1 , Keichi Takahashi1 , Kohei Ichikawa1 , Chawanat Nakasan2 , Hidemoto Nakada3 , Ryousei Takano3 and Hajimu Iida1 1 Nara Institute of Science and Technology, 2 Kanazawa University, 3 National Institute of Advanced Industrial Science and Technology
  • 2. Running models on edge devices Running models on edge devices does not require transferring the training and inference datasets between the edge devices and a centralized server. Better data privacy Less network latency Less power consumption Specialized neural network
  • 3. Model compression Compressing neural network models reduces the size of the model, but also degrades the accuracy of the model since it reduces the precision of the weights in the models. Model compression techniques Uses pre-trained models Supports fully connected layers Reduces redundant parameters Impacts accuracy Parameter pruning and sharing Low-rank factorization Transferred/compact convolutional filters Knowledge distillation
  • 4. Objective Reduce the size of neural network models without the significant accuracy loss Neural Network Model Model Size Model Accuracy Original Model Compressed Model Compression Proposed method Neural Network Model Model Size Model Accuracy Neural Network Model Model Size Model Accuracy Compressed Model
  • 5. Retraining methods We can not always access the original labeled datasets because of privacy policy and license limitation. Retraining method is necessary to recover the accuracy of the compressed models. Most existing retraining method require the original labeled datasets to retrain the compressed models. Using unlabeled dataset for retraining is highly useful when the original labeled dataset is unavailable.
  • 6. Proposed method Quantization Retraining Original Model Compressed Model Quantized Model Decrease model size with loss of accuracy Increase model accuracy while keeping the model size
  • 7. Proposed method Quantization Retraining Original Model Compressed Model Decrease model size with loss of accuracy Increase model accuracy while keeping the model size Quantized Model
  • 9. Proposed method Quantization Retraining Original Model Compressed Model Decrease model size with loss of accuracy Increase model accuracy while keeping the model size Quantized Model
  • 10. Proposed retraining method Unlabeled Data set Quantized model Non-trainable layer Trainable layer Original model Trainable layer Output vector Output vector Loss
  • 11. Case Study of VGG-16 Model Architecture Bias Value Weight Value
  • 12. Model quantization Size of Quantized VGG-16 models Accuracy of Quantized VGG-16 models # of quantized layers
  • 13. Model retraining Retraining Quantized VGG-16 models Quantizing the 14th and 15th layers using 32-256 centroids achieved nearly the accuracy of the original model. The best configuration for quantizing VGG-16 model - Quantize the biases in all layer using one centroid and - Quantize the weights in 14th and 15th layers using 32 centroids It compressed to possible smallest model size without significant accuracy loss. # of centroids
  • 14. Case Study of ResNet-50 Model Architecture Bias Value Weight Value
  • 15. Model quantization Size of Quantized ResNet-50 models Accuracy of Quantized ResNet-50 models # of quantized layers
  • 16. Model retraining Retraining Quantized ResNet-50 models Quantizing the 13th - 49th layers using 128 or less centroids clearly degrades the accuracy of the model. The best configuration for quantizing ResNet-50 model - Quantize the biases in all layer using one centroid and - Quantize the weights in 13th - 49th layers using 256 centroids It compressed to possible smallest model size without significant accuracy loss. # of centroids
  • 17. Conventional & Proposed retraining Accuracy of quantized model through retraining Retraining time of quantized model
  • 18. Conclusion We proposed a novel retraining method with unlabeled data for compressed neural network models to reduce the size of model without significant accuracy loss. The experimental result when applying the proposed retraining method. - The model size of VGG-16 was reduced by 81.10% with only 0.34% loss of accuracy. - The model size of ResNet-50 was reduced by 52.54% with only 0.71% loss of accuracy. The structure of other neural network models should be investigated to conduct the efficient retraining method. Moreover, we will try to apply compression techniques other than quantization.
  • 19. Q&A Thank you for your attention Email: thonglek.kundjanasith.ti7@is.naist.jp
  • 20. Experimental setup Hardware Specification CPU Intel Xeon Gold 6148 x 2 Main Memory 364 GiB GPU Nvidia Tesla V100 SXM2 x4 GPU Memory 16 GiB Datasets - ImageNet dataset is used for training the pre-trained model or the original model - CIFAR dataset is used for retraining the quantized model by the proposed method Hardware specification[*] Targeted models 1. VGG-16 model 2. ResNet-50 model [*] The hardware specification of a compute node in AI Bridging Cloud Infrastructure (ABCI) provided by National Institute of Advanced Industrial Science and Technology (AIST)
  • 21. Output vector Unlabeled Data set N data points Neural network model Output Layer 1 2 M 3 1 1 2 3 M 2 1 2 3 M 3 1 2 3 M 1 2 3 M M Output vector N N
  • 22. Experimental setup The experiments were conducted using the computational resource of AI Bridging Cloud Infrastructure (ABCI) provided by the National Institute of Advanced Industrial Science and Technology (AIST) Hardware Specification CPU Intel Xeon Gold 6148 x 2 Main Memory 364 GiB GPU Nvidia Tesla V100 SXM2 x4 GPU Memory 16 GiB Datasets - ImageNet dataset is used for training the pre-trained model or the original model - CIFAR dataset is used for retraining the quantized model by the proposed method
  • 24. Example Slide • Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s. • When an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. • It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. • When an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
  • 25. Quantization Find clusters Find centroid of each cluster