"Even Faster CNNs: Exploring the New Class of Winograd Algorithms," a Presentation from Arm

For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/arm/embedded-vision-training/videos/pages/may-2018-embedded-vision-summit-iodice

For more information about embedded vision, please visit:
http://www.embedded-vision.com

Gian Marco Iodice, Senior Software Engineer in the Machine Learning Group at Arm, presents the "Even Faster CNNs: Exploring the New Class of Winograd Algorithms" tutorial at the May 2018 Embedded Vision Summit.

Over the past decade, deep learning networks have revolutionized classification and recognition across a broad range of applications. Deeper and more accurate networks have been proposed every year, and more recent developments have shown how these workloads can be implemented on modern low-power embedded platforms. This presentation discusses a recently introduced class of algorithms that reduce the arithmetic complexity of convolution layers with small filter sizes. After an introduction to the latest optimization techniques for the most common solutions, such as GEMM, the talk dives deeply into the design of Winograd algorithms, analyzing the complexity and the performance achieved for convolutional neural networks.
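
To make the "arithmetic complexity" claim concrete, here is a minimal sketch that counts multiplications per output tile for a 2D Winograd algorithm F(m x m, r x r) against direct convolution. The function name `multiplications_per_tile` is hypothetical, and the count deliberately ignores the additions spent in the input, filter, and output transforms, so it is an upper bound on the saving rather than a measured speedup.

```python
def multiplications_per_tile(m, r):
    """Return (direct, winograd) multiplication counts for one m x m output tile.

    Assumption: only multiplications are counted; transform additions are ignored.
    """
    # Direct convolution: each of the m*m outputs needs r*r multiplications.
    direct = (m * m) * (r * r)
    # 2D Winograd: one multiplication per element of the (m + r - 1)^2 input tile.
    winograd = (m + r - 1) ** 2
    return direct, winograd


def reduction_factor(m, r):
    """Ratio of direct to Winograd multiplications for F(m x m, r x r)."""
    direct, winograd = multiplications_per_tile(m, r)
    return direct / winograd
```

For a 3x3 filter, `multiplications_per_tile(2, 3)` gives 36 versus 16 multiplications, and `multiplications_per_tile(4, 3)` gives 144 versus 36. This is why the saving matters most for the small filter sizes the talk focuses on.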

  1. Agenda
  2. Arm: Extraordinary Growth From Sensors to Server (50 billion chips shipped)
  3. Layer breakdown for AlexNet (Embedded Vision Summit 2016): conv1 17%, conv2 22%, conv3 18%, conv4 18%, conv5 18%
  4. Even Smaller Convolution Kernels…
  5. CNN Layer Breakdown
  6. Fully Connected Layer Issue (1)
  7. Fully Connected Layer Issue (2)
  8. GEMM-based Convolution
  9. What Did We Do to Improve the Performance?
  10. OpenCL Concepts: Platform Model
  11. OpenCL Concepts: Compute Unit
  12. OpenCL Concepts: Work-items/Work-groups
  13. Improving L1 Cache Utilization: Memory Coalescing
  14. Improving L2 Cache Utilization: Tuning LWS (1)
  15. Improving L2 Cache Utilization: Tuning LWS (2)
  16. Improving L2 Cache Utilization: Tuning LWS (3)
  17. Goal
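  18. Introduction: $r_0 = d_{00} \cdot w_0 + d_{01} \cdot w_1 + d_{02} \cdot w_2$, $r_1 = d_{10} \cdot w_0 + d_{11} \cdot w_1 + d_{12} \cdot w_2$
  19. Winograd’s Minimal Filtering Algorithm (1)
  20. Winograd’s Minimal Filtering Algorithm (2): $m_0 = (k_0 - k_2) \cdot w_0$, $m_1 = (k_1 + k_2) \cdot \frac{w_0 + w_1 + w_2}{2}$, $m_2 = (k_2 - k_1) \cdot \frac{w_0 - w_1 + w_2}{2}$, $m_3 = (k_1 - k_3) \cdot w_2$
  21. 2D Case: Nest Minimal 1D Algorithms (1)
  22. 2D Case: Nest Minimal 1D Algorithms (2)
  23. 2D Case: Nest Minimal 1D Algorithms (3)
  24. Complexity Reduction: $\mathbf{M}_0 = (\mathbf{K}_0 - \mathbf{K}_2) \cdot \mathbf{W}_0$, $\mathbf{M}_1 = (\mathbf{K}_1 + \mathbf{K}_2) \cdot \frac{\mathbf{W}_0 + \mathbf{W}_1 + \mathbf{W}_2}{2}$, $\mathbf{M}_2 = (\mathbf{K}_2 - \mathbf{K}_1) \cdot \frac{\mathbf{W}_0 - \mathbf{W}_1 + \mathbf{W}_2}{2}$, $\mathbf{M}_3 = (\mathbf{K}_1 - \mathbf{K}_3) \cdot \mathbf{W}_2$
  25. Algorithm Design (1): $Y = A^T \left[ (G w G^T) \odot (B^T k B) \right] A$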
  26. Algorithm Design (2)
  27. Input Transform
  28. Filter Transform
  29. Element-wise Multiplication as Batched GEMM
  30. Output Transform
  31. Memory Footprint: GEMM-based vs Winograd-based
  32. Optimizing Input/Output Transform
  33. Optimizing Batched GEMM
  34. VGG16 Convolution Layers Breakdown (CPU)
  35. VGG16 Convolution Layers Breakdown (GPU)
  36. GEMM-based vs Winograd-based Convolution (1)
  37. GEMM-based vs Winograd-based Convolution (2)
  38. Extending Winograd-based Convolution: F(4x4,3x3)
  39. VGG16 Convolution Layers Breakdown (GPU)
  40. GEMM-based vs Winograd-based Convolution (1)
  41. GEMM-based vs Winograd-based Convolution (2)
  42. Accuracy: Absolute Error
  43. Accuracy: ILSVRC2012
  44. Current Investigations
  45. Conclusion
  46. References
  47. © 2018 Arm Limited. The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks
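
The minimal filtering formulas on slide 20 and the matrix form on slide 25 can be sketched in a few lines of plain Python. This is an illustrative sketch, not the presenter's implementation. It assumes `k` is the input tile and `w` the filter, as in the slide notation, and that the two outputs are recombined as r0 = m0 + m1 + m2 and r1 = m1 - m2 - m3. The slide text does not show that recombination; it follows from expanding the products. The transform matrices `BT`, `G`, and `AT` are the standard F(2,3) matrices implied by those formulas, and all function names are hypothetical.

```python
def direct_f23(k, w):
    """Direct 1D filtering: 2 outputs from 4 inputs and 3 taps (6 multiplications)."""
    return [k[0] * w[0] + k[1] * w[1] + k[2] * w[2],
            k[1] * w[0] + k[2] * w[1] + k[3] * w[2]]


def winograd_f23(k, w):
    """Winograd F(2,3): the same 2 outputs with 4 multiplications."""
    # Filter-side factors; in a CNN these can be precomputed once per filter.
    g1 = (w[0] + w[1] + w[2]) / 2
    g2 = (w[0] - w[1] + w[2]) / 2
    m0 = (k[0] - k[2]) * w[0]
    m1 = (k[1] + k[2]) * g1
    m2 = (k[2] - k[1]) * g2
    m3 = (k[1] - k[3]) * w[2]
    # Assumed recombination (not shown in the slide text).
    return [m0 + m1 + m2, m1 - m2 - m3]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Transform matrices for F(2x2, 3x3), consistent with m0..m3 above.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]  # input transform
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]     # filter transform
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]                               # output transform


def winograd_f2x2_3x3(k, w):
    """Y = A^T [(G w G^T) * (B^T k B)] A for a 4x4 input tile k and 3x3 filter w."""
    u = matmul(matmul(G, w), transpose(G))    # transformed filter, 4x4
    v = matmul(matmul(BT, k), transpose(BT))  # transformed input, 4x4
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]  # 16 multiplications
    return matmul(matmul(AT, m), transpose(AT))  # 2x2 output tile


def direct_2x2_3x3(k, w):
    """Reference 2D result: each of the 4 outputs uses 9 multiplications."""
    return [[sum(k[i + p][j + q] * w[p][q] for p in range(3) for q in range(3))
             for j in range(2)] for i in range(2)]
```

Comparing `winograd_f23` with `direct_f23`, or `winograd_f2x2_3x3` with `direct_2x2_3x3`, on the same tile is a quick way to check the transforms. With floating-point inputs the two results can differ by rounding error, which is the effect the accuracy slides examine.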
