
Architecture Design for Deep Neural Networks I

ICME2019 Tutorial: Architecture Design for Deep Neural Networks I


  1. Neural Architectures for Efficient Inference. Gao Huang, Assistant Professor, Department of Automation, Tsinghua University
  2. OUTLINE: 1. Macro-architecture innovations in ConvNets 2. Micro-architecture innovations in ConvNets 3. From static networks to dynamic networks
  3. PART 1: MACRO-ARCH INNOVATIONS IN CNN
  4. CONVOLUTIONAL NETWORKS: LeNet, AlexNet, VGG, Inception, ResNet, DenseNet
  5. LENET [LECUN ET AL. 1998]. Main ideas: ✓ Convolution, local receptive fields, shared weights ✓ Spatial subsampling
  6. ALEXNET [KRIZHEVSKY ET AL. 2012]. Main ideas: ✓ ReLU (Rectified Linear Unit) ✓ Dropout ✓ Local response normalization, overlapping pooling ✓ Data augmentation, training on multiple GPUs
  7. RESNET [HE ET AL. 2016]. Main idea: ✓ Skip connection: identity mappings promote gradient propagation. (⊕: element-wise addition)
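The skip connection above can be sketched in a few lines of plain Python; `half` below is a hypothetical stand-in for the block's learned transformation F:

```python
def residual_forward(x, f):
    """Residual connection: element-wise addition of the identity
    skip path and the block output, y = x + F(x)."""
    return [xi + fi for xi, fi in zip(x, f(x))]

# toy residual branch (hypothetical): scale each element by 0.5
half = lambda x: [0.5 * xi for xi in x]

y = residual_forward([1.0, -2.0, 3.0], half)  # [1.5, -3.0, 4.5]
```

Because dy/dx = 1 + dF/dx, the identity term keeps gradients flowing even when the residual branch saturates, which is the gradient-propagation argument the slide makes.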
  8. DENSENET [HUANG ET AL. 2017]. Main idea: ✓ Dense connectivity: creates short paths in the network and encourages feature reuse. (Contrast with ResNet, GoogLeNet, FractalNet.)
  9. REDUNDANCY IN DEEP MODELS (diagram: input → low-level features → mid-level features → high-level features → classifier → prediction)
  10. REDUCING REDUNDANCY
  11. DENSE CONNECTIVITY (C: channel-wise concatenation)
  12. DENSE AND SLIM (k: growth rate; each layer adds k channels)
  13. DENSE AND SLIM ARCHITECTURE. Model / #Layers / #Parameters / Validation error: ResNet, 18, 11.7M, 30.43%; DenseNet, 121, ??, ??
  14. DENSE AND SLIM ARCHITECTURE. Model / #Layers / #Parameters / Validation error: ResNet, 18, 11.7M, 30.43%; DenseNet, 121, 8.0M, 25.03%. More connections, less computation!
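The "dense and slim" arithmetic can be checked directly: each layer concatenates everything before it but contributes only k new channels, so channel counts grow linearly rather than doubling. A minimal sketch (the 64/32/6 numbers are illustrative, not taken from the slides):

```python
def dense_block_channels(c0, k, num_layers):
    """Input-channel count seen by each layer of a dense block:
    layer l receives the block input plus the k-channel outputs of
    all l earlier layers, concatenated channel-wise."""
    return [c0 + l * k for l in range(num_layers)]

# e.g. a block with 64 input channels, growth rate k = 32, 6 layers
chans = dense_block_channels(64, 32, 6)  # [64, 96, 128, 160, 192, 224]
```

Keeping k small is what lets a 121-layer DenseNet undercut an 18-layer ResNet in parameter count.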
  15. RESULTS ON IMAGENET (plots: top-1 error vs. GFLOPs and vs. #parameters for ResNet-34/50/101/152 and DenseNet-121/169/201/264/232(k=48))
  16. THE LOSS SURFACE (loss-landscape visualizations of VGG-56, VGG-110, ResNet-56, DenseNet-121). Visualizing the Loss Landscape of Neural Nets. Li, Hao, et al. NIPS 2018. https://www.cs.umd.edu/~tomg/projects/landscapes/
  17. MULTI-SCALE NETWORKS. Main idea: ✓ Multi-scale feature fusion: merge signals with different frequencies. Examples: Interlinked CNN (Zhou et al, ISNN'15), Neural Fabric (Saxena & Verbeek, NIPS'16), MSDNet (Huang et al, ICLR'18), HRNet (Sun et al, CVPR'19)
  18. NEURAL ARCHITECTURE SEARCH [Zoph and Le, 2017, and many others]. Main idea: ✓ Automatic architecture search using reinforcement learning, genetic/evolutionary algorithms, or differentiable approaches. AutoML is a very active research field; see www.automl.org
  19. PART 2: MICRO-ARCH INNOVATIONS IN CNN
  20. GROUP CONVOLUTION. Main idea: ✓ Split the convolution into G groups. Cost: standard convolution O(C × C); group convolution O(C × C / G). Networks using group convolution: ✓ AlexNet (Krizhevsky et al, NIPS'12) ✓ ResNeXt (Xie et al, CVPR'17) ✓ CondenseNet (Huang et al, CVPR'18) ✓ ShuffleNet (Zhang et al, CVPR'18) ✓ …
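The O(C × C) vs. O(C × C / G) cost on this slide is easy to verify by counting weights; a short sketch:

```python
def conv_weights(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution: each of the G groups
    connects c_in/G input channels to c_out/G output channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

standard = conv_weights(128, 128, 3)           # O(C x C):     147456
grouped = conv_weights(128, 128, 3, groups=4)  # O(C x C / G):  36864
```

Splitting into G groups divides both parameters and FLOPs by G, at the price of no cross-group information flow, which ShuffleNet restores with channel shuffling.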
  21. DEPTH-WISE SEPARABLE CONVOLUTION (DSC). Main idea: ✓ Split the convolution into groups with one channel each. Networks using DSC: ✓ Xception (Chollet, CVPR'17) ✓ MobileNet (Howard et al, 2017) ✓ MobileNet V2 (Sandler et al, CVPR'18) ✓ ShuffleNet V2 (Ma et al, ECCV'18) ✓ NASNet (Zoph et al, CVPR'18) ✓ …
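DSC pushes group convolution to the extreme of one channel per group and, in the networks listed above, pairs the depth-wise step with a 1×1 point-wise convolution to mix channels. A weight-count sketch:

```python
def standard_weights(c_in, c_out, k):
    """Dense k x k convolution: every input channel feeds every filter."""
    return c_in * c_out * k * k

def dsc_weights(c_in, c_out, k):
    """Depth-wise k x k conv (one filter per input channel)
    followed by a 1 x 1 point-wise conv that mixes channels."""
    return c_in * k * k + c_in * c_out

dense = standard_weights(256, 256, 3)  # 589824
sep = dsc_weights(256, 256, 3)         # 67840, roughly 8.7x fewer
```

For 3×3 filters the saving approaches 9× as the channel count grows, which is why DSC dominates mobile architectures.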
  22. SQUEEZE-AND-EXCITATION NETWORK (Hu et al, CVPR'18). Main idea: ✓ Channel-wise attention via second-order operations
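A squeeze-and-excitation block in pure Python, as a sketch only: the weights `w1` and `w2` are toy placeholders, whereas a real SE block learns a bottleneck of width C/r.

```python
import math

def se_block(feature_maps, w1, w2):
    """Squeeze-and-excitation sketch.
    feature_maps: C channels, each a flat list of spatial values.
    w1: hidden x C weights, w2: C x hidden weights (toy placeholders)."""
    # squeeze: global average pooling gives one statistic per channel
    s = [sum(ch) / len(ch) for ch in feature_maps]
    # excitation: FC -> ReLU -> FC -> sigmoid produces per-channel gates
    hidden = [max(0.0, sum(w * si for w, si in zip(row, s))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # scale: channel-wise reweighting of the original feature maps
    return [[g * v for v in ch] for g, ch in zip(gates, feature_maps)]

# with all-zero second-layer weights every gate is sigmoid(0) = 0.5
out = se_block([[2.0, 2.0], [4.0, 4.0]],
               w1=[[1, 0], [0, 1]], w2=[[0, 0], [0, 0]])
# out == [[1.0, 1.0], [2.0, 2.0]]
```

The "second order" phrasing on the slide refers to this multiplicative interaction: features rescale other features through the learned gates.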
  23. DILATED CONVOLUTION (Yu & Koltun, ICLR'16). Main idea: ✓ Increase the receptive field via filter dilation
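Dilation enlarges the receptive field without adding weights: a k×k filter with dilation d spans k + (k − 1)(d − 1) positions per axis. A sketch of how a stack of stride-1 dilated 3×3 convolutions grows the receptive field, in the style of Yu & Koltun's exponentially increasing dilations:

```python
def effective_kernel(k, d):
    """Span of a k x k filter with dilation d along one axis."""
    return k + (k - 1) * (d - 1)

def receptive_field(dilations, k=3):
    """Receptive field of a stack of stride-1 convolutions:
    each layer extends it by (effective kernel - 1)."""
    rf = 1
    for d in dilations:
        rf += effective_kernel(k, d) - 1
    return rf

rf = receptive_field([1, 2, 4])  # 15: grows exponentially with depth
```

Three plain 3×3 layers would cover only 7 positions with the same parameter budget.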
  24. DEFORMABLE CONVOLUTION (Dai et al, ICCV'17). Main idea: ✓ Learn the offset field for the convolutional filters
  25. PART 3: DYNAMIC NETWORKS
  26. DEVELOPMENT OF DEEP LEARNING (ImageNet top-1 accuracy): AlexNet (2012) 57.0%, GoogleNet (2014) 68.7%, VGG (2014) 70.5%, ResNet-152 (2015) 77.8%, DenseNet-264 (2017) 79.6%
  27. ACCURACY-TIME TRADEOFF
  28. BIGGER IS BETTER. Bigger models are needed for noncanonical images. (*Photo courtesy of Pixel Addict, CC BY-ND 2.0)
  29. BIGGER IS BETTER. (*Photo courtesy of Willian Doyle, CC BY-ND 2.0)
  30. Why do we use the same expensive model for all images?
  31. Can we use small, cheap models for easy images and big, expensive models for hard ones?
  32. A NAIVE IDEA OF ADAPTIVE EVALUATION: send an "easy" horse image to AlexNet and a "hard" horse image to Inception or ResNet
  33. CHALLENGE: LACK OF COARSE-LEVEL FEATURES. Classifiers only work well on coarse-scale feature maps, but nearly all computation has already been done by the time a coarse feature is obtained. (Diagram: input → fine-level features → down-sampling → mid-level features → down-sampling → coarse-level features → classifier → linear output)
  34. 34. SOLUTION: MULTI-SCALE ARCHITECTURE Fine-level features Mid-level features Coarse-level features
  35. SOLUTION: MULTI-SCALE ARCHITECTURE (continued)
  36. MULTI-SCALE FEATURES. Classifiers only operate on high-level (coarse) features! (Diagram: the test input feeds fine-, mid-, and coarse-level feature streams; classifiers 1-4 attach along the coarse stream)
  37. MULTI-SCALE DENSENET (diagram: classifiers 1-4 attached at increasing depth)
  38. MULTI-SCALE DENSENET: early-exit example. Classifier 1: cat: 0.2 (0.2 ≱ threshold, continue); classifier 2: cat: 0.4 (0.4 ≱ threshold, continue); classifier 3: cat: 0.6 (0.6 > threshold, exit)
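The thresholded exits on this slide amount to a short control loop. A sketch with stub classifiers that reproduce the slide's 0.2 / 0.4 / 0.6 confidences (the class dictionaries and stub functions are hypothetical):

```python
def adaptive_inference(x, classifiers, threshold):
    """Evaluate classifiers in order and return (label, exit_index) at
    the first one whose top confidence exceeds the threshold; fall back
    to the final classifier's prediction if none does."""
    for i, clf in enumerate(classifiers):
        probs = clf(x)  # mapping: class name -> confidence
        label, conf = max(probs.items(), key=lambda kv: kv[1])
        if conf > threshold:
            return label, i
    return label, len(classifiers) - 1

# stub exits mirroring the slide (confidences are illustrative)
exits = [lambda x: {"cat": 0.2, "dog": 0.1},
         lambda x: {"cat": 0.4, "dog": 0.1},
         lambda x: {"cat": 0.6, "dog": 0.1}]

label, used = adaptive_inference(None, exits, threshold=0.5)  # ("cat", 2)
```

Easy inputs exit early and spend only a fraction of the network's computation, while hard inputs fall through to the later, more expensive classifiers; that asymmetry is the source of the reported 2x-5x speedup.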
  39. MULTI-SCALE DENSENET: results. 2x-5x speedup over DenseNet
  40. VISUALIZATION (diagram: test input passing classifiers 1-4)
  41. VISUALIZATION: "easy" examples (class: red wine) exit at the first classifier; "hard" examples (class: volcano) exit at the last classifier
  42. MORE RESULTS AND DISCUSSIONS. Please refer to: Multi-Scale Dense Networks for Resource Efficient Image Classification, ICLR 2018 oral (acceptance rate 2.2%, rank 4/935)
  43. ADAPTIVE INFERENCE IS A CHALLENGING PROBLEM. How to design proper network architectures? How to effectively train dynamic networks? How to efficiently perform dynamic evaluation? How to apply adaptive inference to object detection and segmentation? How to make inference spatially and temporally adaptive?
  44. THANK YOU!
