Model Compression
2021.1.31 Overfitting, 김난희
Building a model with fewer parameters and less computation while keeping a similar level of performance (accuracy, ...).
Table of Contents
• Why?
• Compression Method
Basic
On-device
• Edge device
• Mobile device
• Embedded device
Why? -> personal reasons
• Academic world -> Industrial world
Training Time && Throughput (Inference Time) && Accuracy == ★
Why? -> future vision
• Communication
• Latency
• Privacy
• Energy Efficiency
• High Speed
• Low Power
Compression Method
Model deployment sequence: Model Training -> Model Compression -> Compile -> Deployment

Lightweight algorithm research (Model Training):
• NAS (Neural Architecture Search)
• Popular lightweight models

Algorithm lightweighting (Model Compression):
• Knowledge Distillation
• Network Pruning
• Quantization
• Joint Approach

Compile / Deployment:
• Kernel Optimization
• Graph Optimization
• Hardware Improvement
• CPU/NPU/TPU/DPU/GPU
Compression Method
Model deployment sequence (Model Training -> Model Compression -> Compile -> Deployment), revisited: the next slides cover the lightweight algorithm research part of the pipeline, i.e. NAS and compact (lightweight) model architectures.
Compression Method
NAS
AutoML:
① Hyper-Parameter Optimization (HPO)
② Feature Learning
③ Neural Architecture Search (NAS)
Barret Zoph et al., “Neural Architecture Search with Reinforcement Learning”
• Example configuration string: [“Filter Width: 5”, “Filter Height: 3”, “Num Filters: 24”]
• The Controller (RNN) is trained with reinforcement learning until it converges.
• The Controller (RNN) parameters are updated based on the child network's accuracy (the reward signal).
• The controller parameters θ_c are optimized to maximize the expected validation accuracy of the sampled child network architectures.
• Parameters are updated with the policy gradient method.
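To make the policy-gradient idea concrete, here is a minimal Python sketch. It replaces the paper's RNN controller with a simple softmax policy over a toy search space, and train_and_evaluate() is a hypothetical stand-in that would train the sampled child network and return its validation accuracy (the reward).

```python
# Minimal sketch of the REINFORCE idea behind RL-based NAS.
# A softmax policy stands in for the RNN controller; train_and_evaluate()
# is a hypothetical stand-in for training a child network and returning
# its validation accuracy (the reward signal).
import numpy as np

SEARCH_SPACE = {"filter_width": [1, 3, 5], "num_filters": [16, 24, 32]}

rng = np.random.default_rng(0)
# One logit vector (theta_c) per architecture decision.
theta = {k: np.zeros(len(v)) for k, v in SEARCH_SPACE.items()}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def train_and_evaluate(config):
    # Hypothetical: train the child network described by `config`
    # and return its validation accuracy in [0, 1].
    return rng.uniform(0.5, 0.9)

baseline, lr = 0.0, 0.1
for step in range(100):
    # Sample one architecture from the controller's current policy.
    choices, grads = {}, {}
    for name, options in SEARCH_SPACE.items():
        probs = softmax(theta[name])
        idx = rng.choice(len(options), p=probs)
        choices[name] = options[idx]
        # grad of log pi(idx) w.r.t. theta for a softmax policy: one-hot(idx) - probs
        g = -probs
        g[idx] += 1.0
        grads[name] = g
    reward = train_and_evaluate(choices)
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    # REINFORCE: push theta_c toward choices with above-baseline reward.
    for name in theta:
        theta[name] += lr * (reward - baseline) * grads[name]
```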
Compression Method
Compact Architecture
① Bottleneck
② Global Average Pooling
③ Filter Factorization
④ Group Convolution
⑤ Depthwise Separable Convolution
⑥ Feature Map Reuse
Compression Method
Compact Architecture: Bottleneck / Global Average Pooling
Christian Szegedy et al., “Going deeper with convolutions”
• Reduce the number of channels
• See also: pointwise (1x1) convolution
• Far fewer parameters than an FC layer
• An early idea in GoogLeNet
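A minimal sketch of the bottleneck idea, assuming PyTorch: a 1x1 (pointwise) convolution shrinks the channel count before an expensive 3x3 convolution, cutting parameters and MACs while keeping the output shape. The layer sizes are arbitrary examples.

```python
# Bottleneck sketch (assuming PyTorch): reduce channels with a 1x1 conv
# before the 3x3 conv, then compare parameter counts.
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)            # N, C, H, W

direct = nn.Conv2d(256, 256, kernel_size=3, padding=1)
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),      # pointwise: 256 -> 64 channels
    nn.Conv2d(64, 256, kernel_size=3, padding=1),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(direct))        # 590,080 parameters
print(count(bottleneck))    # 164,160 parameters, same output shape
print(bottleneck(x).shape)  # torch.Size([1, 256, 28, 28])
```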
Compression Method
Compact Architecture: Filter Factorization / Group Convolution
• 3x3 == 3x1 -> 1x3 (same receptive field)
• Convolution with rectangular kernels
• AlexNet used this because of limited GPU memory
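A minimal sketch, assuming PyTorch, of filter factorization (a 3x3 replaced by a 3x1 followed by a 1x3), with a grouped 3x3 included for comparison with group convolution; the layer sizes are arbitrary examples.

```python
# Filter factorization and group convolution sketch (assuming PyTorch).
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

full = nn.Conv2d(64, 64, kernel_size=3, padding=1)
factorized = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(3, 1), padding=(1, 0)),  # 3x1
    nn.Conv2d(64, 64, kernel_size=(1, 3), padding=(0, 1)),  # then 1x3
)
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=4)  # group convolution

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full), count(factorized), count(grouped))  # 36928, 24704, 9280
print(factorized(x).shape, grouped(x).shape)           # both keep [1, 64, 56, 56]
```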
Compression Method
Compact Architecture: Depthwise Separable Convolution
• Depthwise Separable Convolution = Depthwise Convolution + Pointwise Convolution
• Used in Inception / Xception / SqueezeNet / MobileNet
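A minimal sketch of a depthwise separable convolution, assuming PyTorch: a per-channel (depthwise) 3x3 followed by a 1x1 (pointwise) convolution that mixes channels, compared with a standard convolution of the same input/output shape.

```python
# Depthwise separable convolution sketch (assuming PyTorch).
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                        # pointwise
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # 73856 vs 8960 parameters
print(separable(x).shape)                 # torch.Size([1, 128, 56, 56])
```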
Compression Method
Compact Architecture: Feature Map Reuse
• Reuse earlier feature maps
• Reduce the number of channels in each convolution filter
• Fewer computations and parameters
• DenseNet
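A minimal sketch of feature-map reuse in a DenseNet-style block, assuming PyTorch: each layer concatenates all earlier feature maps, so every convolution only needs to produce a small number of new channels (the growth rate). The sizes here are arbitrary.

```python
# Feature-map reuse sketch (assuming PyTorch): DenseNet-style concatenation.
import torch
import torch.nn as nn

growth_rate, in_channels, num_layers = 12, 24, 3
x = torch.randn(1, in_channels, 32, 32)

layers = nn.ModuleList(
    nn.Conv2d(in_channels + i * growth_rate, growth_rate, kernel_size=3, padding=1)
    for i in range(num_layers)
)

features = [x]
for conv in layers:
    out = conv(torch.cat(features, dim=1))  # reuse all earlier feature maps
    features.append(out)

print(torch.cat(features, dim=1).shape)  # torch.Size([1, 60, 32, 32])
```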
Compression Method
Model deployment sequence (Model Training -> Model Compression -> Compile -> Deployment), revisited: the remaining slides focus on algorithm lightweighting, i.e. compressing a trained model with Knowledge Distillation, Network Pruning, Quantization, and joint approaches.
Compression Method
Knowledge Distillation
• Feature Distillation
• Soft label
• Attention Distillation
• KD
• FitNets
• Overhaul KD
• Relational KD
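A minimal sketch of the soft-label distillation loss, assuming PyTorch: the student matches the teacher's temperature-softened output distribution in addition to the hard labels; T and alpha are illustrative values.

```python
# Soft-label knowledge distillation loss sketch (assuming PyTorch).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-label term: KL divergence between softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels))
```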
Compression Method
Network Pruning
What to prune: weights, filters, or channels?
• Structured Pruning
• Unstructured Pruning
Metrics (pruning criteria)
• L1/L2 norm
• GM (geometric median) Pruning
• BN (batch-norm scaling factor) Pruning
Comparison scope
• Local Pruning
• Global Pruning
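A minimal sketch of unstructured, global-scope magnitude pruning with an L1-style criterion, in plain NumPy: weights below a single global threshold are zeroed. The layer shapes and sparsity level are arbitrary examples.

```python
# Global unstructured magnitude pruning sketch (plain NumPy).
import numpy as np

rng = np.random.default_rng(0)
weights = {
    "conv1": rng.normal(size=(16, 3, 3, 3)),
    "fc": rng.normal(size=(10, 256)),
}

def global_magnitude_prune(weights, sparsity=0.5):
    # One threshold over all layers (global comparison scope).
    all_vals = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    threshold = np.quantile(all_vals, sparsity)
    masks = {name: (np.abs(w) > threshold) for name, w in weights.items()}
    pruned = {name: w * masks[name] for name, w in weights.items()}
    return pruned, masks

pruned, masks = global_magnitude_prune(weights, sparsity=0.5)
for name, m in masks.items():
    print(name, "kept fraction:", round(float(m.mean()), 2))
```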
Compression Method
Quantization
• DoReFa
• PACT
• QAT (Quantization-Aware Training)
• PTQ (Post-Training Quantization)
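A minimal sketch of post-training quantization for a single tensor, in plain NumPy: float32 values are mapped to int8 with an affine scale/zero-point scheme and then dequantized to inspect the rounding error. Real PTQ toolchains add calibration, per-channel scales, and operator fusion.

```python
# Affine int8 post-training quantization sketch (plain NumPy).
import numpy as np

def quantize_int8(x):
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print("max abs error:", np.abs(w - w_hat).max())  # roughly on the order of the quantization step
```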
Compression Method
Joint Approach
Han Cai et al., “Once-for-All: Train One Network and Specialize It for Efficient Deployment”
• Jointly covers pruning, KD, kernel size, and the number of layers
Compression Method
What is a good metric for the “accuracy vs. speed (latency)” trade-off?
Compression Method
Metric
• Information Density
  D(N) = a(N) / p(N)
  D(N): information density, a(N): accuracy, p(N): number of parameters
• NetScore
  Ω(N) = 20 log( a(N)^α / ( p(N)^β · m(N)^γ ) )
  Ω(N): NetScore, a(N): accuracy, p(N): number of parameters, m(N): number of multiply-accumulate (MAC) operations during inference
  α = 2, β = 0.5, γ = 0.5
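A worked example of both metrics in Python; the accuracy, parameter, and MAC values are hypothetical placeholders, not measurements.

```python
# Worked example of Information Density and NetScore with hypothetical numbers.
import math

a = 71.0    # a(N): top-1 accuracy in percent (hypothetical)
p = 5.3e6   # p(N): number of parameters (hypothetical)
m = 0.6e9   # m(N): MACs per inference (hypothetical)

# Information density: accuracy per parameter.
D = a / p

# NetScore with alpha=2, beta=0.5, gamma=0.5. The units chosen for p(N) and
# m(N) (raw counts vs. millions) only shift the score by a constant, so
# rankings between models are unaffected.
alpha, beta, gamma = 2.0, 0.5, 0.5
omega = 20 * math.log10(a**alpha / (p**beta * m**gamma))

print(f"D(N) = {D:.2e}")
print(f"NetScore(N) = {omega:.1f}")
```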
Compression Method
Simone Bianco et al., “Benchmark Analysis of Representative Deep Neural Network Architectures”
Compression Method
AlexNet vs SqueezeNet
Forrest N. Iandola et al., “SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size”
Compression Method
Parameter↓, MAC↓ ∝? Memory↓, Speed↑
Benchmark measured on a Titan Xp GPU
Simone Bianco et al., “Benchmark Analysis of Representative Deep Neural Network Architectures”
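Since parameter and MAC counts do not fully determine speed, latency has to be measured on the target device. A minimal sketch, assuming PyTorch on CPU: time a standard convolution against a depthwise separable one; the actual result depends on memory access patterns, kernel implementations, and hardware.

```python
# Latency measurement sketch (assuming PyTorch, CPU): MAC counts alone do not
# tell you wall-clock speed, so measure it.
import time
import torch
import torch.nn as nn

x = torch.randn(1, 64, 112, 112)
standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                        # pointwise
)

def latency_ms(model, x, iters=50):
    with torch.no_grad():
        model(x)                          # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        end = time.perf_counter()
    return (end - start) / iters * 1e3

print(f"standard : {latency_ms(standard, x):.2f} ms")
print(f"separable: {latency_ms(separable, x):.2f} ms")
```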
Compression Method
Parameter↓, MAC↓ ∝? Memory↓, Speed↑
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a Presentation from MIT 24/25
Compression Method
Parameter↓, MAC↓ ∝? Memory↓, Speed↑
Yang et al., “A Method to Estimate the Energy Consumption of Deep Neural Networks”
• See also the ShuffleNet V2 design guidelines
Next!
• Detail: Compression Method
• Exercise: Pruning, …
• Compression model paper
References and helpful sources
YouTube
• Model compression for on-device AI [MODUCON 2019]
• [Techtonic 2020] Track 2. Deploying and serving deep learning models in production - 최효근 프로
• PR-017: Neural Architecture Search with Reinforcement Learning
• [Full video] Squeezing vision models for mobile ML (백수콘, June 2018)
Blog
• AutoML
  • https://medium.com/daria-blog/automl-%EC%9D%B4%EB%9E%80-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C-1af227af2075
• Knowledge distillation
  • https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764
• Convolution lightweighting techniques
  • https://eehoeskrap.tistory.com/431
Tech Talk
• Nota at AI Expo Korea (국제인공지능대전)
• Nota: recent trends in deep learning model compression and an introduction to the NetsPresso compression platform
• Hyperconnect company introduction


Editor's Notes

  • #2 https://blogs.nvidia.co.kr/2018/05/15/how-do-i-understand-deep-learning-performance/
  • #6 https://v.kakao.com/v/20210119180103786?from=tgt https://towardsdatascience.com/deep-learning-and-carbon-emissions-79723d5bc86e
• #12 Receptive field: the region of input neurons that affects a neuron in the output layer