
"Efficient Convolutional Neural Network Inference on Mobile GPUs," a Presentation from Imagination Technologies


For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/imagination-technologies/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit

For more information about embedded vision, please visit:
http://www.embedded-vision.com

Paul Brasnett, Principal Research Engineer at Imagination Technologies, presents the "Efficient Convolutional Neural Network Inference on Mobile GPUs" tutorial at the May 2016 Embedded Vision Summit.

GPUs have become established as a key tool for training deep learning algorithms. Deploying those algorithms on end devices is a key enabler of their commercial success, and mobile GPUs are proving to be an efficient target processor that is readily available in end devices today. This talk looks at how to approach the task of deploying convolutional neural networks (CNNs) on mobile GPUs today. Brasnett explores the key primitives for CNN inference and the strategies available for implementing them. He works through alternative options and trade-offs, and provides reference performance analysis on mobile GPUs, using the PowerVR architecture as a case study.


  1. Efficient Convolutional Neural Network Inference on Mobile GPUs. Paul Brasnett, May 3, 2016
  2. Overview
     • About Imagination Technologies
     • PowerVR GPUs
     • Case study: Implementing Convolutions
     • Performance Analysis
     • Conclusions
     • Resources
  3. About Imagination Technologies
     • Imagination Technologies is a leading IP supplier for multimedia, processors and communications
     • More than 8bn units containing Imagination IP shipped
     [Diagram: SoC fabric connecting PowerVR Graphics & GPU Compute Processors, Ensigma Communications Processors, PowerVR Vision Processors, MIPS Processors and PowerVR Video Processors]
  4. What is a Mobile GPU?
     Mobile GPU: optimised for high performance at low power
  5. What is a Mobile GPU?
     Mobile GPU: optimised for high performance at low power
     Target markets: mobile devices, automotive, consumer multimedia, wearables, Internet of Things, augmented reality
  6. Why Mobile GPUs for Vision Processing?
     • CPUs can deliver high peak/burst performance
       • But generate large amounts of heat
     • PowerVR mobile GPUs provide:
       • Lowest-power FP16 and integer pipelines
       • Local memory for highly efficient data access for compute operations
       • Power-saving features such as gating of non-compute parts of the GPU for efficient compute operation
  7. Why Mobile GPUs for Vision Processing?
     Performance relative to CPU (CPU = 100%), PowerVR Series6:
     • Provence (raytracing): 265%
     • Particle Simulation – 32k: 407%
     • Particle Simulation – 4k: 517%
     • Julia Set: 963%
     • Ambient Occlusion: 1126%
     • Denoise: 482%
     • Gaussian Blur: 383%
  8. Moving the CNN Workload to the GPU
     [Block diagram: the CPU (CPU0/CPU1, few threads, large cache) enqueues a compute kernel to the PowerVR GPU (graphics and compute), which comprises multiprocessors (Unified Shading Clusters) with schedulers, residency slots, common/compute stores and texture processing units, a coarse grain scheduler, host interface, L2/system level cache and a system memory interface, all sharing unified system memory]
  9. Evolution of Mobile GPU
     PowerVR Series 6 GPU → PowerVR Series 7 GPU → PowerVR Series 8 GPU → …
  10. Evolution of Mobile GPU
     New APIs: OpenCL 1.2, OpenCV, OpenVX, Vulkan, OpenCL 2.0
  11. GPU Dominates Compute in Modern SoCs
     • Mobile GPU increasingly dominating compute performance in SoCs
     [Illustrative diagram only, to show relative CPU/GPU size]
  12. Why CNNs?
     • State-of-the-art performance
     • Rapid development cycles
     • Range of vision tasks
       • Classification
       • Localisation
       • Other applications…
     Example: camera localisation. PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, Kendall, A., Grimes, M., Cipolla, R., ICCV 2015
  13. What is a CNN?
     CNN architecture building blocks: Convolution, Activation, Normalization, Pooling, Fully Connected, Soft Max
     [Diagram: example network chaining these blocks from an input image, with repeated convolution, activation, normalization and pooling stages followed by a fully connected layer and soft max]
  14. CNN Object Classification
     • Training (offline): architecture + data → CNN library + compute + time → model coefficients
  15. CNN Object Classification
     • Training (offline): architecture + data → CNN library + compute + time → model coefficients
     • Inference (online): uses the trained architecture and model coefficients
  16. CNN Object Classification
     • Training (offline): architecture + data → CNN library + compute + time → model coefficients
     • Inference (online): architecture + model coefficients + image → CNN library + compute on the mobile GPU → classification
  17. Where is the Cost in CNN Inference?
     [Chart: flops by layer type for AlexNet, split across Convolution, Normalisation, Pooling and Fully Connected layers, with convolution accounting for the large majority]
  18. Matrix Multiply: Naïve
     • Create as many work-items as there are elements in the output matrix
     • Each work-item reads its row of A and column of B and produces a dot product
     • Requires a large number of accesses to global memory
     A x B = C
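     As a minimal sketch of the naïve scheme above (the kernel name, argument names and the
     row-major layout are illustrative assumptions, not taken from the deck), one work-item is
     launched per element of C and computes its full dot product straight from global memory,
     which is where the large number of memory accesses comes from:

         // Naïve matrix multiply: one work-item per element of C = A x B.
         // A is M x K, B is K x N, C is M x N, all row-major (assumed layout).
         __kernel void matmul_naive(__global const float *A,
                                    __global const float *B,
                                    __global       float *C,
                                    const int M, const int N, const int K)
         {
             const int col = get_global_id(0);   // column of C handled by this work-item
             const int row = get_global_id(1);   // row of C handled by this work-item
             if (row >= M || col >= N)
                 return;

             float acc = 0.0f;
             for (int k = 0; k < K; ++k)         // K global reads of A and K of B per work-item
                 acc += A[row * K + k] * B[k * N + col];

             C[row * N + col] = acc;
         }

     The kernel would be enqueued with a 2D NDRange of (N, M), so the number of work-items
     matches the number of output elements.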
  19. OpenCL Memory Model
     • The OpenCL memory model closely maps to the GPU architecture
     • Private memory: per work-item
     • Local memory: shared within a work-group
     • Global memory / constant memory: visible to all work-groups
     • Host memory: typically shared between CPU and GPU on a mobile SoC
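     As a small illustration (assumed kernel and argument names, not from the deck), the
     device-side address spaces above appear directly as qualifiers in OpenCL C; host memory
     only appears on the host side, through buffers that on a mobile SoC typically live in
     memory shared by the CPU and GPU:

         __kernel void memory_spaces(__global   const float *in,      // global: visible to all work-groups
                                     __constant       float *coeffs,  // constant: read-only, visible to all
                                     __local          float *scratch, // local: shared within one work-group
                                     __global         float *out)
         {
             const int gid = get_global_id(0);
             const int lid = get_local_id(0);

             float v = in[gid] * coeffs[0];     // 'v' lives in private memory, per work-item
             scratch[lid] = v;                  // stage the value in local memory
             barrier(CLK_LOCAL_MEM_FENCE);      // make it visible to the rest of the work-group
             out[gid] = scratch[lid];
         }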
  20. Matrix Multiply: Tiling Approach
     • Work-items load A data into private memory
     A x B = C
     Tiling approach based on Volkov and Demmel, "Using GPUs to accelerate linear algebra runtime", 2008
  21. Matrix Multiply: Tiling Approach
     • Work-items load A data into private memory
     • Work-groups load B data into local memory
     • Each work-item reads from local memory and produces a dot product
     • Significantly reduces global memory accesses
     A x B = C
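     A minimal sketch of the tiled scheme from the two slides above, under assumed names, tile
     sizes and a row-major layout (none of which come from the deck): each work-item keeps its
     slice of A in private memory, the work-group cooperatively stages a tile of B in local
     memory, and partial dot products accumulate tile by tile, so each staged element of B is
     fetched from global memory once per work-group rather than once per work-item:

         // Tile sizes are illustrative; the work-group is TILE_X x TILE_Y = 32 work-items.
         // Assumes M, N and K are multiples of the tile sizes (no edge handling in this sketch).
         #define TILE_X 8
         #define TILE_Y 4
         #define TILE_K 16

         __kernel void matmul_tiled(__global const float *A,   // M x K, row-major
                                    __global const float *B,   // K x N, row-major
                                    __global       float *C,   // M x N, row-major
                                    const int M, const int N, const int K)
         {
             const int col = get_global_id(0);                 // column of C for this work-item
             const int row = get_global_id(1);                 // row of C for this work-item
             const int lx  = get_local_id(0);
             const int ly  = get_local_id(1);

             __local float Btile[TILE_K][TILE_X];              // tile of B shared by the work-group
             float acc = 0.0f;

             for (int t = 0; t < K; t += TILE_K) {
                 // Each work-item copies its own slice of A into private memory (registers).
                 float Apriv[TILE_K];
                 for (int k = 0; k < TILE_K; ++k)
                     Apriv[k] = A[row * K + t + k];

                 // The work-group cooperatively loads a TILE_K x TILE_X block of B into local memory.
                 for (int k = ly; k < TILE_K; k += TILE_Y)
                     Btile[k][lx] = B[(t + k) * N + col];
                 barrier(CLK_LOCAL_MEM_FENCE);

                 // Dot-product contribution for this tile: private A against local B.
                 for (int k = 0; k < TILE_K; ++k)
                     acc += Apriv[k] * Btile[k][lx];
                 barrier(CLK_LOCAL_MEM_FENCE);                 // finish with the tile before it is overwritten
             }
             C[row * N + col] = acc;
         }

     Each value staged in Btile is reused by the TILE_Y work-items that share its column, which
     is where the reduction in global memory accesses comes from.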
  22. Matrix Multiply: OpenCL Tips
     • Choose a work-group size to fit the GPU; 32 work-items is typically a good choice for PowerVR GPUs
     • Read multiple items (e.g. 4 or 8) into private memory at a time to optimise memory transfers
     • Consider the use of the half data type in place of float
       • Most PowerVR platforms provide up to 2x the flops
     • Define the work-group size at compile time:
       __attribute__((reqd_work_group_size(SIZE, 1, 1)))
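     The fragment below is a small, self-contained illustration of these tips rather than code
     from the talk: it fixes the work-group size at compile time, reads four elements per memory
     transaction, and keeps one operand in half precision as a storage format (widened to float
     on load, which does not require the cl_khr_fp16 extension). Names and the element-wise
     operation are assumptions:

         // Element-wise multiply, 4 elements per work-item; the buffer length is assumed to be
         // a multiple of 4 * 32 so no edge handling is needed.
         __attribute__((reqd_work_group_size(32, 1, 1)))      // work-group size fixed at compile time
         __kernel void mul4_half(__global const float *x,
                                 __global const half  *w,     // half storage: half the bandwidth of float
                                 __global       float *y)
         {
             const int i = get_global_id(0);                  // this work-item handles elements 4*i .. 4*i+3
             const float4 xv = vload4(i, x);                  // one 4-wide read instead of 4 scalar reads
             const float4 wv = vload_half4(i, w);             // half values widened to float on load
             vstore4(xv * wv, i, y);
         }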
  23. Matrix Multiply: Tiling Approach
     [Chart: execution time (s, log scale) against matrix size for the naïve and tiled matrix multiply implementations]
  24. CNN Classification: AlexNet & GoogLeNet
                                        AlexNet   GoogLeNet
     Model coefficients (millions)         60         5.5
     Operations (billions)                  1.3        3.1
     Top-5 error rate (%)                  18.2       10.07
     Model coefficients drive bandwidth; operations drive compute
  25. Performance Analysis: CNN Inference
     • Time consumed by layer type
     [Charts: time by layer type (convolutions, pooling, normalisation, fully connected) for GoogLeNet and AlexNet; reference times* of 1.36 and 1.00]
  26. Performance Analysis: GPU v CPU*
     [Chart: relative FPS performance (higher is better) on AlexNet for a PowerVR 2-cluster GPU (480MHz) versus an ARM A15 CPU (1.6GHz)]
     * CPU results based on Caffe (with ATLAS)
  27. Efficiency Analysis: GPU v CPU
     [Chart: relative efficiency (higher is better) on AlexNet for a PowerVR 2-cluster GPU (480MHz) versus an ARM A15 CPU (1.6GHz)]
  28. Conclusions
     • Mobile GPUs are widely available in a range of SoCs across numerous markets today
     • Compared to mobile CPUs, PowerVR mobile GPUs offer:
       • up to 3x higher efficiency and
       • up to 12x higher performance when deploying CNNs
     • Newer CNN architectures with smaller fully connected layers help to make more efficient use of compute resources
     • PowerVR GPUs scale to allow higher levels of performance and lower power for current and future generations of vision-enabled products
     • COME & SEE THE DEMO DURING THE NEXT BREAK
  29. Resources
     • PowerVR GPU Compute: https://imgtec.com/tools/powervr-gpu-compute/
     • Guide to writing OpenCL: http://blog.imgtec.com/powervr/a-quick-guide-to-writing-opencl-kernels-for-rogue
     • PowerVR Imaging Framework: http://blog.imgtec.com/powervr/powervr-imaging-framework-sdk
     • PowerVR CNN Demo: see our stand
     • OpenCL Tutorial: https://handsonopencl.github.io/
