Model Compression
2021.1.31 Overfitting, 김난희
Building a model with fewer parameters and less computation while keeping a similar level of performance (accuracy, ...).
Table of Contents
• Why?
• Compression Method
Basic
On-device
• Edge device
• Mobile device
• Embedded device
Why? -> personal reasons
• Academic world -> Industrial world
Training Time && Throughput (Inference Time) && Accuracy == ★
Why? -> future vision
• Communication
• Latency
• Privacy
• Energy Efficiency
• High Speed
• Low Power
Compression Method
Model deployment sequence: Model Training -> Model Compression -> Compile -> Deployment

Lightweight algorithm research (Model Training):
• NAS (Neural Architecture Search)
• Popular lightweight models

Algorithm lightweighting (Model Compression):
• Knowledge Distillation
• Network Pruning
• Quantization
• Joint Approach

Compile / Deployment:
• Kernel Optimization
• Graph Optimization
• Hardware Improvement
• CPU/NPU/TPU/DPU/GPU
Compression Method
Model deployment sequence (Model Training -> Model Compression -> Compile -> Deployment), revisited: the next slides cover the lightweight algorithm research part of the pipeline, i.e. NAS and compact (lightweight) model architectures.
Compression Method
NAS
AutoML:
① Hyper-Parameter Optimization (HPO)
② Feature Learning
③ Neural Architecture Search (NAS)
Barret Zoph et al., “Neural Architecture Search with Reinforcement Learning”
• Example configuration string: [“Filter Width: 5”, “Filter Height: 3”, “Num Filters: 24”]
• The Controller (RNN) is trained with reinforcement learning until it converges.
• The Controller (RNN) parameters are updated based on the child network's accuracy (the reward signal).
• The controller parameters θ_c are optimized to maximize the expected validation accuracy of the sampled child network architectures.
• Parameters are updated with the policy gradient method.
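To make the policy-gradient idea concrete, here is a minimal Python sketch. It replaces the paper's RNN controller with a simple softmax policy over a toy search space, and train_and_evaluate() is a hypothetical stand-in that would train the sampled child network and return its validation accuracy (the reward).

```python
# Minimal sketch of the REINFORCE idea behind RL-based NAS.
# A softmax policy stands in for the RNN controller; train_and_evaluate()
# is a hypothetical stand-in for training a child network and returning
# its validation accuracy (the reward signal).
import numpy as np

SEARCH_SPACE = {"filter_width": [1, 3, 5], "num_filters": [16, 24, 32]}

rng = np.random.default_rng(0)
# One logit vector (theta_c) per architecture decision.
theta = {k: np.zeros(len(v)) for k, v in SEARCH_SPACE.items()}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def train_and_evaluate(config):
    # Hypothetical: train the child network described by `config`
    # and return its validation accuracy in [0, 1].
    return rng.uniform(0.5, 0.9)

baseline, lr = 0.0, 0.1
for step in range(100):
    # Sample one architecture from the controller's current policy.
    choices, grads = {}, {}
    for name, options in SEARCH_SPACE.items():
        probs = softmax(theta[name])
        idx = rng.choice(len(options), p=probs)
        choices[name] = options[idx]
        # grad of log pi(idx) w.r.t. theta for a softmax policy: one-hot(idx) - probs
        g = -probs
        g[idx] += 1.0
        grads[name] = g
    reward = train_and_evaluate(choices)
    baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    # REINFORCE: push theta_c toward choices with above-baseline reward.
    for name in theta:
        theta[name] += lr * (reward - baseline) * grads[name]
```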
Compression Method
Compact Architecture
① Bottleneck
② Global Average Pooling
③ Filter Factorization
④ Group Convolution
⑤ Depthwise Separable Convolution
⑥ Feature Map Reuse
Compression Method
Compact Architecture: Bottleneck / Global Average Pooling
Christian Szegedy et al., “Going deeper with convolutions”
• Reduce the number of channels
• See also: pointwise (1x1) convolution
• Far fewer parameters than an FC layer
• An early idea in GoogLeNet
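A minimal sketch of the bottleneck idea, assuming PyTorch: a 1x1 (pointwise) convolution shrinks the channel count before an expensive 3x3 convolution, cutting parameters and MACs while keeping the output shape. The layer sizes are arbitrary examples.

```python
# Bottleneck sketch (assuming PyTorch): reduce channels with a 1x1 conv
# before the 3x3 conv, then compare parameter counts.
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)            # N, C, H, W

direct = nn.Conv2d(256, 256, kernel_size=3, padding=1)
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),      # pointwise: 256 -> 64 channels
    nn.Conv2d(64, 256, kernel_size=3, padding=1),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(direct))        # 590,080 parameters
print(count(bottleneck))    # 164,160 parameters, same output shape
print(bottleneck(x).shape)  # torch.Size([1, 256, 28, 28])
```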
Compression Method
Compact Architecture: Filter Factorization / Group Convolution
• 3x3 == 3x1 -> 1x3 (same receptive field)
• Convolution with rectangular kernels
• AlexNet used this because of limited GPU memory
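A minimal sketch, assuming PyTorch, of filter factorization (a 3x3 replaced by a 3x1 followed by a 1x3), with a grouped 3x3 included for comparison with group convolution; the layer sizes are arbitrary examples.

```python
# Filter factorization and group convolution sketch (assuming PyTorch).
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

full = nn.Conv2d(64, 64, kernel_size=3, padding=1)
factorized = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(3, 1), padding=(1, 0)),  # 3x1
    nn.Conv2d(64, 64, kernel_size=(1, 3), padding=(0, 1)),  # then 1x3
)
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=4)  # group convolution

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full), count(factorized), count(grouped))  # 36928, 24704, 9280
print(factorized(x).shape, grouped(x).shape)           # both keep [1, 64, 56, 56]
```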
Compression Method
Compact Architecture: Depthwise Separable Convolution
• Depthwise Separable Convolution = Depthwise Convolution + Pointwise Convolution
• Used in Inception / Xception / SqueezeNet / MobileNet
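A minimal sketch of a depthwise separable convolution, assuming PyTorch: a per-channel (depthwise) 3x3 followed by a 1x1 (pointwise) convolution that mixes channels, compared with a standard convolution of the same input/output shape.

```python
# Depthwise separable convolution sketch (assuming PyTorch).
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                        # pointwise
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # 73856 vs 8960 parameters
print(separable(x).shape)                 # torch.Size([1, 128, 56, 56])
```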
Compression Method
Compact Architecture: Feature Map Reuse
• Reuse earlier feature maps
• Reduce the number of channels in each convolution filter
• Fewer computations and parameters
• DenseNet
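A minimal sketch of feature-map reuse in a DenseNet-style block, assuming PyTorch: each layer concatenates all earlier feature maps, so every convolution only needs to produce a small number of new channels (the growth rate). The sizes here are arbitrary.

```python
# Feature-map reuse sketch (assuming PyTorch): DenseNet-style concatenation.
import torch
import torch.nn as nn

growth_rate, in_channels, num_layers = 12, 24, 3
x = torch.randn(1, in_channels, 32, 32)

layers = nn.ModuleList(
    nn.Conv2d(in_channels + i * growth_rate, growth_rate, kernel_size=3, padding=1)
    for i in range(num_layers)
)

features = [x]
for conv in layers:
    out = conv(torch.cat(features, dim=1))  # reuse all earlier feature maps
    features.append(out)

print(torch.cat(features, dim=1).shape)  # torch.Size([1, 60, 32, 32])
```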
Compression Method
Model deployment sequence (Model Training -> Model Compression -> Compile -> Deployment), revisited: the remaining slides focus on algorithm lightweighting, i.e. compressing a trained model with Knowledge Distillation, Network Pruning, Quantization, and joint approaches.
Compression Method
Knowledge Distillation
• Feature Distillation
• Soft label
• Attention Distillation
• KD
• FitNets
• Overhaul KD
• Relational KD
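A minimal sketch of the soft-label distillation loss, assuming PyTorch: the student matches the teacher's temperature-softened output distribution in addition to the hard labels; T and alpha are illustrative values.

```python
# Soft-label knowledge distillation loss sketch (assuming PyTorch).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-label term: KL divergence between softened teacher and student outputs.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy with the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels))
```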
Compression Method
Network Pruning
What to prune: weights, filters, or channels?
• Structured Pruning
• Unstructured Pruning
Metrics (pruning criteria)
• L1/L2 norm
• GM (geometric median) Pruning
• BN (batch-norm scaling factor) Pruning
Comparison scope
• Local Pruning
• Global Pruning
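A minimal sketch of unstructured, global-scope magnitude pruning with an L1-style criterion, in plain NumPy: weights below a single global threshold are zeroed. The layer shapes and sparsity level are arbitrary examples.

```python
# Global unstructured magnitude pruning sketch (plain NumPy).
import numpy as np

rng = np.random.default_rng(0)
weights = {
    "conv1": rng.normal(size=(16, 3, 3, 3)),
    "fc": rng.normal(size=(10, 256)),
}

def global_magnitude_prune(weights, sparsity=0.5):
    # One threshold over all layers (global comparison scope).
    all_vals = np.concatenate([np.abs(w).ravel() for w in weights.values()])
    threshold = np.quantile(all_vals, sparsity)
    masks = {name: (np.abs(w) > threshold) for name, w in weights.items()}
    pruned = {name: w * masks[name] for name, w in weights.items()}
    return pruned, masks

pruned, masks = global_magnitude_prune(weights, sparsity=0.5)
for name, m in masks.items():
    print(name, "kept fraction:", round(float(m.mean()), 2))
```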
Compression Method
Quantization
• DoReFa
• PACT
• QAT (Quantization-Aware Training)
• PTQ (Post-Training Quantization)
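A minimal sketch of post-training quantization for a single tensor, in plain NumPy: float32 values are mapped to int8 with an affine scale/zero-point scheme and then dequantized to inspect the rounding error. Real PTQ toolchains add calibration, per-channel scales, and operator fusion.

```python
# Affine int8 post-training quantization sketch (plain NumPy).
import numpy as np

def quantize_int8(x):
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print("max abs error:", np.abs(w - w_hat).max())  # roughly on the order of the quantization step
```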
Compression Method
Joint Approach
Han Cai et al., “Once-for-All: Train One Network and Specialize It for Efficient Deployment”
• Jointly covers pruning, KD, kernel size, and the number of layers
Compression Method
What is a good metric for the “accuracy vs. speed (latency)” trade-off?
Compression Method
Metric
• Information Density
  D(N) = a(N) / p(N)
  D(N): information density, a(N): accuracy, p(N): number of parameters
• NetScore
  Ω(N) = 20 log( a(N)^α / ( p(N)^β · m(N)^γ ) )
  Ω(N): NetScore, a(N): accuracy, p(N): number of parameters, m(N): number of multiply-accumulate (MAC) operations during inference
  α = 2, β = 0.5, γ = 0.5
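A worked example of both metrics in Python; the accuracy, parameter, and MAC values are hypothetical placeholders, not measurements.

```python
# Worked example of Information Density and NetScore with hypothetical numbers.
import math

a = 71.0    # a(N): top-1 accuracy in percent (hypothetical)
p = 5.3e6   # p(N): number of parameters (hypothetical)
m = 0.6e9   # m(N): MACs per inference (hypothetical)

# Information density: accuracy per parameter.
D = a / p

# NetScore with alpha=2, beta=0.5, gamma=0.5. The units chosen for p(N) and
# m(N) (raw counts vs. millions) only shift the score by a constant, so
# rankings between models are unaffected.
alpha, beta, gamma = 2.0, 0.5, 0.5
omega = 20 * math.log10(a**alpha / (p**beta * m**gamma))

print(f"D(N) = {D:.2e}")
print(f"NetScore(N) = {omega:.1f}")
```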
Compression Method
Simone Bianco et al., “Benchmark Analysis of Representative Deep Neural Network Architectures”
Compression Method
AlexNet vs SqueezeNet
Forrest N. Iandola et al., “SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size”
Compression Method
Parameter↓, MAC↓ ∝? Memory↓, Speed↑
Benchmark measured on a Titan Xp GPU
Simone Bianco et al., “Benchmark Analysis of Representative Deep Neural Network Architectures”
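Since parameter and MAC counts do not fully determine speed, latency has to be measured on the target device. A minimal sketch, assuming PyTorch on CPU: time a standard convolution against a depthwise separable one; the actual result depends on memory access patterns, kernel implementations, and hardware.

```python
# Latency measurement sketch (assuming PyTorch, CPU): MAC counts alone do not
# tell you wall-clock speed, so measure it.
import time
import torch
import torch.nn as nn

x = torch.randn(1, 64, 112, 112)
standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                        # pointwise
)

def latency_ms(model, x, iters=50):
    with torch.no_grad():
        model(x)                          # warm-up
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        end = time.perf_counter()
    return (end - start) / iters * 1e3

print(f"standard : {latency_ms(standard, x):.2f} ms")
print(f"separable: {latency_ms(separable, x):.2f} ms")
```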
Compression Method
Parameter↓, MAC↓ ∝? Memory↓, Speed↑
"Approaches for Energy Efficient Implementation of Deep Neural Networks," a Presentation from MIT 24/25
Compression Method
Parameter↓, MAC↓ ∝? Memory↓, Speed↑
Yang et al., “A Method to Estimate the Energy Consumption of Deep Neural Networks”
• See also the ShuffleNet V2 design guidelines
Next!
• Detail: Compression Method
• Exercise: Pruning, …
• Compression model paper
References and helpful sources
YouTube
• Model compression for on-device AI [MODUCON 2019]
• [Techtonic 2020] Track 2. Deploying and serving deep learning models in production - 최효근 프로
• PR-017: Neural Architecture Search with Reinforcement Learning
• [Full video] Squeezing vision models for mobile ML (백수콘, June 2018)
Blog
• AutoML
  • https://medium.com/daria-blog/automl-%EC%9D%B4%EB%9E%80-%EB%AC%B4%EC%97%87%EC%9D%BC%EA%B9%8C-1af227af2075
• Knowledge distillation
  • https://towardsdatascience.com/knowledge-distillation-simplified-dd4973dbc764
• Convolution lightweighting techniques
  • https://eehoeskrap.tistory.com/431
Tech Talk
• Nota at AI Expo Korea (국제인공지능대전)
• Nota: recent trends in deep learning model compression and an introduction to the NetsPresso compression platform
• Hyperconnect company introduction


Editor's Notes

  • #2 https://blogs.nvidia.co.kr/2018/05/15/how-do-i-understand-deep-learning-performance/
  • #6 https://v.kakao.com/v/20210119180103786?from=tgt https://towardsdatascience.com/deep-learning-and-carbon-emissions-79723d5bc86e
• #12 Receptive field: the region of input neurons that affects a neuron in the output layer