Copyright © 2016 Imagination Technologies 1
Efficient Convolutional Neural Network Inference on Mobile GPUs
Paul Brasnett
May 3, 2016

Overview
• About Imagination Technologies
• PowerVR GPUs
• Case study: Implementing Convolutions
• Performance Analysis
• Conclusions
• Resources

About Imagination Technologies
• Imagination Technologies is a leading IP supplier for multimedia, processors and communications
• More than 8bn units containing Imagination IP shipped
Diagram: an SoC fabric connecting PowerVR Graphics & GPU Compute Processors, Ensigma Communications Processors, PowerVR Vision Processors, MIPS Processors and PowerVR Video Processors

What is a Mobile GPU?
• Mobile GPU: optimised for high performance at low power

What is a Mobile GPU?
• Mobile GPU: optimised for high performance at low power
• Used across markets including mobile devices, automotive, consumer multimedia, wearables, Internet of Things and augmented reality

Why Mobile GPUs for Vision Processing?
• CPUs can deliver high peak/burst performance
  • But they generate large amounts of heat
• PowerVR mobile GPUs provide
  • Lowest-power FP16 and integer pipelines
  • Local memory for highly efficient data access in compute operations
  • Power-saving features, such as gating of the non-compute parts of the GPU, for efficient compute operation

Why Mobile GPUs for Vision Processing?
Chart: performance of PowerVR Series6 relative to CPU (CPU = 100%)
• Provence (ray tracing): 265%
• Particle simulation, 32k: 407%
• Particle simulation, 4k: 517%
• Julia set: 963%
• Ambient occlusion: 1126%
• Denoise: 482%
• Gaussian blur: 383%

Moving the CNN Workload to the GPU
Diagram: the CPU (CPU0 and CPU1; few threads; large cache) and the PowerVR GPU (graphics and compute) share unified system memory through a system level cache. The host enqueues a compute kernel through the GPU's host interface to a coarse grain scheduler, which distributes work across multiprocessors (Unified Shading Clusters), each shown with residency slots, a common store, a compute store and a scheduler, alongside a texture processing unit, an L2 cache unit and the system memory interface.

Evolution of Mobile GPU
PowerVR Series 6 GPU → PowerVR Series 7 GPU → PowerVR Series 8 GPU → …

Evolution of Mobile GPU
• Supported APIs: OpenCL 1.2, OpenCV, OpenVX
• New APIs: Vulkan, OpenCL 2.0

GPU Dominates Compute in Modern SoCs
• Mobile GPUs increasingly dominate compute performance in SoCs
Diagram (illustrative only): relative CPU and GPU size in an SoC

Why CNNs?
• State-of-the-art performance
• Rapid development cycles
• Range of vision tasks
  • Classification
  • Localisation
  • Other applications…
Example: camera localisation (PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, Kendall, A., Grimes, M., Cipolla, R., ICCV 2015)

What is a CNN?
• CNN architecture building blocks: convolution, activation, normalization, pooling, fully connected
• CNN example network: the input image passes through a sequence of convolution, activation, pooling and normalization stages, followed by fully connected layers and a soft max
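
The case study appears to implement convolution on top of matrix multiply. One common way to do that lowering is im2col: every K x K x C input window becomes one column, so a whole convolutional layer turns into a single matrix product. The slides do not say this is the exact lowering they used, so treat the pure-Python sketch below as an illustration under assumptions (stride 1, no padding; all function names are hypothetical). It computes one layer both directly and via im2col and lets you check that they agree.

```python
# Sketch: one convolutional layer (stride 1, no padding) computed directly
# and via im2col + matrix multiply. Shapes: image [C][H][W],
# filters [F][C][K][K], output [F][H-K+1][W-K+1]. Helper names are hypothetical.

def conv2d_direct(image, filters):
    c_in, h, w = len(image), len(image[0]), len(image[0][0])
    k = len(filters[0][0])
    out_h, out_w = h - k + 1, w - k + 1
    out = []
    for f in filters:
        plane = [[0.0] * out_w for _ in range(out_h)]
        for y in range(out_h):
            for x in range(out_w):
                acc = 0.0
                # Dot product of the filter with the K x K x C window at (y, x).
                for c in range(c_in):
                    for dy in range(k):
                        for dx in range(k):
                            acc += f[c][dy][dx] * image[c][y + dy][x + dx]
                plane[y][x] = acc
        out.append(plane)
    return out


def im2col(image, k):
    # Row index = (c, dy, dx); column index = output position y * out_w + x.
    c_in, h, w = len(image), len(image[0]), len(image[0][0])
    out_h, out_w = h - k + 1, w - k + 1
    return [[image[c][y + dy][x + dx] for y in range(out_h) for x in range(out_w)]
            for c in range(c_in) for dy in range(k) for dx in range(k)]


def matmul(a, b):
    # Plain [M][K] x [K][N] -> [M][N] product.
    inner, cols = len(b), len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(inner)) for j in range(cols)]
            for row in a]


def conv2d_im2col(image, filters):
    k = len(filters[0][0])
    out_h, out_w = len(image[0]) - k + 1, len(image[0][0]) - k + 1
    # Flatten each filter in the same (c, dy, dx) order as the im2col rows,
    # so the whole layer becomes one [F][C*K*K] x [C*K*K][out_h*out_w] product.
    weights = [[f[c][dy][dx] for c in range(len(f)) for dy in range(k) for dx in range(k)]
               for f in filters]
    flat = matmul(weights, im2col(image, k))
    # Reshape each filter's output row back into an out_h x out_w plane.
    return [[row[y * out_w:(y + 1) * out_w] for y in range(out_h)] for row in flat]
```

The payoff of the lowering is that all the effort spent on a fast matrix multiply (tiling, local memory, work-group sizing) carries over to convolution unchanged; the cost is the extra memory for the unrolled im2col matrix, since overlapping windows duplicate input values.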

CNN Object Classification
• Training: offline
  • The architecture and training data are fed to a CNN library, which spends compute and time to produce the model coefficients
• Inference: online
  • The architecture, model coefficients and an input image are fed to a CNN library running on a mobile GPU, which computes the classification

Where is the Cost in CNN Inference?
Chart: FLOPs by layer type (AlexNet) for convolution, normalisation, pooling and fully connected layers
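
A rough count of multiply-accumulates (MACs) shows why convolution dominates. Each output value of a convolutional layer is a dot product over a K x K x C_in window, so a layer costs H_out x W_out x K x K x C_in x C_out MACs while reusing a small set of shared weights; a fully connected layer costs N_in x N_out MACs and needs one distinct coefficient per MAC. The layer shapes below are the commonly quoted AlexNet conv1 and fc6 shapes, used here as assumptions for illustration; the numbers are not read from the slide's chart.

```python
# Rough cost model: MACs and coefficient counts per layer (biases ignored).

def conv_macs(out_h, out_w, k_h, k_w, c_in, c_out):
    # Each of the out_h * out_w * c_out outputs is a k_h * k_w * c_in dot product.
    return out_h * out_w * k_h * k_w * c_in * c_out


def conv_weights(k_h, k_w, c_in, c_out):
    # Weights are shared across every output position.
    return k_h * k_w * c_in * c_out


def fc_macs(n_in, n_out):
    # One MAC, and one distinct weight, per (input, output) pair.
    return n_in * n_out


# Assumed AlexNet shapes: conv1 = 96 filters of 11x11x3 giving 55x55 outputs;
# fc6 = 9216 inputs to 4096 outputs.
conv1 = conv_macs(55, 55, 11, 11, 3, 96)
fc6 = fc_macs(9216, 4096)
print(f"conv1: {conv1:,} MACs from {conv_weights(11, 11, 3, 96):,} weights")
print(f"fc6:   {fc6:,} MACs from {fc6:,} weights")
# conv1: 105,415,200 MACs from 34,848 weights
# fc6:   37,748,736 MACs from 37,748,736 weights
```

In this example the convolution does roughly 3x the arithmetic of fc6 while using fewer than a thousandth as many coefficients, which is why convolutions dominate compute while large fully connected layers dominate coefficient bandwidth.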

Matrix Multiply: Naïve
• Create as many work-items as there are elements in the output matrix
• Each work-item reads its row and column and produces a dot product
• Requires a large number of memory accesses
Diagram: A x B = C
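
The naive scheme can be emulated on the CPU to count how often global memory is touched. The sketch below is a Python model, not OpenCL: the nested loops stand in for an NDRange with one work-item per element of C, and `GlobalBuffer` is a hypothetical wrapper that counts element reads. Each work-item reads K elements of A and K elements of B, so the total is 2 x M x N x K global reads.

```python
class GlobalBuffer:
    """Hypothetical instrumented 'global memory': a flat row-major list that counts reads."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0

    def __getitem__(self, i):
        self.reads += 1
        return self.data[i]


def naive_work_item(a, b, c, row, col, k_dim, n_dim):
    # Body of one work-item: dot product of row `row` of A with column `col` of B.
    acc = 0.0
    for k in range(k_dim):
        acc += a[row * k_dim + k] * b[k * n_dim + col]
    c[row * n_dim + col] = acc


def naive_matmul(a_flat, b_flat, m_dim, k_dim, n_dim):
    # C (m_dim x n_dim) = A (m_dim x k_dim) x B (k_dim x n_dim), all row-major.
    a, b = GlobalBuffer(a_flat), GlobalBuffer(b_flat)
    c = [0.0] * (m_dim * n_dim)
    for row in range(m_dim):          # e.g. get_global_id(1) in a real kernel
        for col in range(n_dim):      # e.g. get_global_id(0) in a real kernel
            naive_work_item(a, b, c, row, col, k_dim, n_dim)
    return c, a.reads + b.reads


# Usage: an 8x8 matrix times the 8x8 identity.
n = 8
a = [float(i) for i in range(n * n)]
identity = [1.0 if r == c else 0.0 for r in range(n) for c in range(n)]
product, reads = naive_matmul(a, identity, n, n, n)
print(reads)  # 1024 == 2 * 8 * 8 * 8
```

Every element of A is re-read N times and every element of B M times; that redundancy is what the tiling approach targets.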

OpenCL Memory Model
• The OpenCL memory model maps closely to the GPU architecture
• Private memory: per work-item
• Local memory: shared within a work-group
• Global memory / constant memory: visible to all work-groups
• Host memory: typically shared between CPU and GPU on a mobile SoC

Matrix Multiply: Tiling Approach
• Work-items load A data into private memory
• Work-groups load B data into local memory
• Each work-item reads from local memory and produces a dot product
• Significantly reduces global memory accesses
Diagram: A x B = C
Tiling approach based on: Using GPUs to Accelerate Linear Algebra Routines, Volkov, V., Demmel, J., 2008
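
A matching CPU emulation of the tiled scheme: each simulated work-group computes a TILE x TILE block of C, stages a TILE x TILE block of B in "local memory" once per step along K, and each work-item keeps its slice of A in "private memory". This is one simple variant written for illustration, not the exact Volkov and Demmel kernel or the one behind the timings in this deck; `TILE = 4` and all helper names are assumptions, and every dimension is assumed to be a multiple of `TILE`. Under those assumptions global reads fall from 2 x M x N x K to 2 x M x N x K / TILE.

```python
TILE = 4  # assumed work-group size and K-step; real PowerVR choices differ (e.g. 32)


class GlobalBuffer:
    """Hypothetical instrumented 'global memory': a flat row-major list that counts reads."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0

    def __getitem__(self, i):
        self.reads += 1
        return self.data[i]


def tiled_matmul(a_flat, b_flat, m_dim, k_dim, n_dim, tile=TILE):
    a, b = GlobalBuffer(a_flat), GlobalBuffer(b_flat)
    c = [0.0] * (m_dim * n_dim)
    for row0 in range(0, m_dim, tile):              # work-group position (rows of C)
        for col0 in range(0, n_dim, tile):          # work-group position (cols of C)
            # Private accumulators: work-item i owns C[row0 + i][col0 : col0 + tile].
            acc = [[0.0] * tile for _ in range(tile)]
            for k0 in range(0, k_dim, tile):
                # The work-group cooperatively copies a tile x tile block of B
                # into local memory (one global read per element).
                local_b = [[b[(k0 + t) * n_dim + col0 + j] for j in range(tile)]
                           for t in range(tile)]
                # A real kernel would call barrier(CLK_LOCAL_MEM_FENCE) here.
                for i in range(tile):               # each work-item in the group
                    # The work-item loads its slice of A into private memory.
                    a_priv = [a[(row0 + i) * k_dim + k0 + t] for t in range(tile)]
                    for j in range(tile):
                        # Dot product served entirely from private and local memory.
                        acc[i][j] += sum(a_priv[t] * local_b[t][j] for t in range(tile))
                # ...and barrier again before local_b is overwritten.
            for i in range(tile):
                for j in range(tile):
                    c[(row0 + i) * n_dim + col0 + j] = acc[i][j]
    return c, a.reads + b.reads


# Usage: the same 8x8 problem size as a naive count of 2 * 8 * 8 * 8 = 1024 reads.
n = 8
a = [float((i * 7) % 5) for i in range(n * n)]
b = [float((i * 3) % 4) for i in range(n * n)]
product, reads = tiled_matmul(a, b, n, n, n)
print(reads)  # 256 == 2 * 8 * 8 * 8 // 4
```

The reduction factor equals the tile size, which is one reason larger work-groups (within the limits of local memory and registers) tend to help.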

Matrix Multiply: OpenCL Tips
• Choose the work-group size to fit the GPU; 32 work-items is typically a good choice for PowerVR GPUs
• Read multiple items (e.g. 4 or 8) into private memory at a time to optimise memory transfers
• Consider using the half data type in place of float
  • Most PowerVR platforms provide up to 2x the FLOPs
• Define the work-group size at compile time
  • __attribute__((reqd_work_group_size(SIZE, 1, 1)))
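
Before switching a kernel from float to half, the accuracy cost can be previewed on the CPU. The sketch below uses Python's `struct` 'e' format (IEEE 754 binary16, available since Python 3.6) to round values the way a `half` stores them. `half_dot` is a hypothetical, deliberately pessimistic model in which every input, product and partial sum is rounded to half; real hardware may keep more precision internally. Half keeps 11 significant bits, so for normal values (up to 65504) each rounding has a relative error of at most 2^-11, about 0.05%.

```python
import struct


def to_half(x):
    # Round-trip through IEEE 754 binary16 (round to nearest).
    return struct.unpack('<e', struct.pack('<e', x))[0]


def half_dot(xs, ys):
    # Pessimistic model: inputs, each product and the running sum all rounded to half.
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = to_half(acc + to_half(to_half(x) * to_half(y)))
    return acc


xs = [0.1 * i for i in range(1, 65)]
ys = [0.01 * i for i in range(1, 65)]
exact = sum(x * y for x, y in zip(xs, ys))
approx = half_dot(xs, ys)
print(to_half(0.1))  # 0.0999755859375
print(exact, approx, abs(approx - exact) / exact)
```

Whether the resulting error is acceptable depends on the layer and the network, so a check like this only indicates where half is worth trying, not whether it is safe.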

Matrix Multiply: Tiling Approach
Chart: time (s, log scale from 0.1 to 1000) against matrix size for naïve and tiled matrix multiply

CNN Classification: AlexNet & GoogLeNet
• Model coefficients (millions), which drive bandwidth: AlexNet 60, GoogLeNet 5.5
• Operations (billions), which drive compute: AlexNet 1.3, GoogLeNet 3.1
• Top-5 error rate (%): AlexNet 18.2, GoogLeNet 10.07
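
The slide's own figures show why the two networks stress the GPU differently. The sketch below assumes 4-byte float and 2-byte half coefficients, and uses operations per coefficient as a crude proxy for how much compute each fetched coefficient supports; that proxy is an assumption, not a metric from the slides. If the coefficients do not stay resident in cache, every inference has to stream them from memory, so AlexNet's much larger coefficient set (held mostly in its fully connected layers) is the bandwidth-heavy one.

```python
# Figures from the slide: (model coefficients, operations per inference).
NETS = {
    "AlexNet": (60e6, 1.3e9),
    "GoogLeNet": (5.5e6, 3.1e9),
}


def weight_megabytes(coeffs, bytes_per_coeff):
    # Storage for the coefficients alone (activations ignored).
    return coeffs * bytes_per_coeff / 1e6


for name, (coeffs, ops) in NETS.items():
    print(f"{name}: {weight_megabytes(coeffs, 4):.0f} MB as float, "
          f"{weight_megabytes(coeffs, 2):.0f} MB as half, "
          f"{ops / coeffs:.0f} operations per coefficient")
# AlexNet: 240 MB as float, 120 MB as half, 22 operations per coefficient
# GoogLeNet: 22 MB as float, 11 MB as half, 564 operations per coefficient
```

GoogLeNet does more than twice the work of AlexNet but does roughly 25x more of it per coefficient fetched, which fits the observation in the conclusions that smaller fully connected layers make more efficient use of compute.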

Performance Analysis: CNN Inference
• Time consumed by layer type (convolutions, pooling, normalisation, fully connected)
Charts: per-layer-type time breakdown for GoogLeNet (reference time 1.36) and AlexNet (reference time 1.00)

Performance Analysis: GPU v CPU*
Chart: relative FPS performance on AlexNet (higher is better; axis 0 to 14) for a PowerVR 2-cluster GPU (480MHz) and an ARM A15 CPU (1.6GHz)
* CPU results based on Caffe (with ATLAS)

Efficiency Analysis: GPU v CPU
Chart: relative efficiency on AlexNet (higher is better; axis 0 to 3.5) for a PowerVR 2-cluster GPU (480MHz) and an ARM A15 CPU (1.6GHz)

Conclusions
• Mobile GPUs are widely available today in a range of SoCs across numerous markets
• Compared to mobile CPUs, PowerVR mobile GPUs offer
  • up to 3x higher efficiency and
  • up to 12x higher performance for CNN deployment
• Newer CNN architectures with smaller fully connected layers help make more efficient use of compute resources
• PowerVR GPUs scale to allow higher levels of performance and lower power for current and future generations of vision-enabled products
• COME & SEE THE DEMO DURING THE NEXT BREAK

Resources
• PowerVR GPU Compute
  • https://imgtec.com/tools/powervr-gpu-compute/
• Guide to writing OpenCL
  • http://blog.imgtec.com/powervr/a-quick-guide-to-writing-opencl-kernels-for-rogue
• PowerVR Imaging Framework
  • http://blog.imgtec.com/powervr/powervr-imaging-framework-sdk
• PowerVR CNN Demo
  • See our stand
• OpenCL Tutorial
  • https://handsonopencl.github.io/


"Efficient Convolutional Neural Network Inference on Mobile GPUs," a Presentation from Imagination Technologies

  • 1. Copyright © 2016 Imagination Technologies 1 Efficient Convolutional Neural Network Inference on Mobile GPUs Paul Brasnett May 3, 2016
  • 2. Copyright © 2016 Imagination Technologies 2 • About Imagination Technologies • PowerVR GPUs • Case study: Implementing Convolutions • Performance Analysis • Conclusions • Resources Overview
  • 3. Copyright © 2016 Imagination Technologies 3 • Imagination Technologies is a leading IP supplier for multimedia, processors and communications • More than 8bn units containing Imagination IP shipped About Imagination Technologies SoCfabric PowerVR Graphics & GPU Compute Processors Ensigma Communications Processors PowerVR Vision Processors MIPS Processors PowerVR Video Processors
  • 4. What is a Mobile GPU? A mobile GPU is optimised for high performance at low power.
  • 5. What is a Mobile GPU? A mobile GPU is optimised for high performance at low power. • Markets: mobile devices, automotive, consumer multimedia, wearables, Internet of Things, augmented reality
  • 6. Why Mobile GPUs for Vision Processing? • CPUs can deliver high peak/burst performance • But they generate large amounts of heat • PowerVR mobile GPUs provide: the lowest-power FP16 and integer pipelines; local memory for highly efficient data access in compute operations; power-saving features such as gating of the non-compute parts of the GPU during compute work
  • 7. Why Mobile GPUs for Vision Processing? Performance of PowerVR Series6 relative to the CPU (CPU = 100%, higher is better): Provence (raytracing) 265% • Particle Simulation – 32k 407% • Particle Simulation – 4k 517% • Julia Set 963% • Ambient Occlusion 1126% • Denoise 482% • Gaussian Blur 383%
  • 8. Moving the CNN Workload to the GPU. Diagram: the CPU (CPU0, CPU1; few threads; large cache) and the PowerVR GPU (graphics and compute) share unified system memory. The CPU enqueues compute kernels through the GPU's host interface; a coarse-grain scheduler dispatches them to multiprocessors (Unified Shading Clusters), each with a scheduler, residency slots, a common store and a compute store, alongside a texture processing unit, a cache unit, an L2 system-level cache and the system memory interface.
  • 9. Evolution of Mobile GPU: PowerVR Series 6 GPU, PowerVR Series 7 GPU, PowerVR Series 8 GPU, …
  • 10. Evolution of Mobile GPU: APIs • OpenCL 1.2 • OpenCV • OpenVX • Vulkan • OpenCL 2.0 • New APIs
  • 11. GPU Dominates Compute in Modern SoCs • Mobile GPUs increasingly dominate compute performance in SoCs (illustrative diagram only, showing relative CPU/GPU size)
  • 12. Why CNNs? • State-of-the-art performance • Rapid development cycles • Range of vision tasks: classification, localisation, other applications… • Example, camera localisation: PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, Kendall, A., Grimes, M., Cipolla, R., ICCV 2015
  • 13. What is a CNN? • CNN architecture building blocks: Convolution, Activation, Normalization, Pooling, Fully Connected, Soft Max • CNN example network: an image passes through repeated Convolution, Activation, Pooling and Normalization stages, followed by a Fully Connected layer and a Soft Max
  • 14. CNN Object Classification • Training (offline): architecture + data → CNN library (compute + time) → model coefficients
  • 15. CNN Object Classification • Training (offline): architecture + data → CNN library (compute + time) → model coefficients • Inference (online): architecture + model coefficients
  • 16. CNN Object Classification • Training (offline): architecture + data → CNN library (compute + time) → model coefficients • Inference (online): architecture + model coefficients + image → CNN library running on the mobile GPU (compute) → classification
  • 17. Where is the Cost in CNN Inference? Chart: FLOPs by layer type for AlexNet (Convolution, Normalisation, Pooling, Fully Connected)
  • 18. Matrix Multiply: Naïve (A × B = C) • Create one work-item per element of the output matrix C • Each work-item reads its row of A and its column of B and computes their dot product • Requires a large number of memory accesses
  • 19. OpenCL Memory Model • The OpenCL memory model maps closely to the GPU architecture • Private memory: per work-item • Local memory: shared within a work-group • Global/constant memory: visible to all work-groups • Host memory: typically shared between CPU and GPU on a mobile SoC
  • 20. Matrix Multiply: Tiling Approach (A × B = C) • Work-items load A data into private memory • Tiling approach based on Volkov and Demmel (2008), "Using GPUs to accelerate linear algebra routines"
  • 21. Matrix Multiply: Tiling Approach (A × B = C) • Work-items load A data into private memory • Work-groups load B data into local memory • Each work-item reads from local memory and produces a dot product • Significantly reduces global memory accesses • Tiling approach based on Volkov and Demmel (2008), "Using GPUs to accelerate linear algebra routines"
  • 22. Matrix Multiply: OpenCL Tips • Choose a work-group size that fits the GPU; 32 work-items is typically a good choice for PowerVR GPUs • Read multiple items (e.g. 4 or 8) into private memory at a time to optimise memory transfers • Consider using the half data type in place of float: most PowerVR platforms provide up to 2x the flops • Define the work-group size at compile time: __attribute__((reqd_work_group_size(SIZE, 1, 1)))
  • 23. Matrix Multiply: Tiling Approach. Chart: time (s, log scale from 0.1 to 1000) against matrix size for naïve and tiled matrix multiply
  • 24. CNN Classification: AlexNet & GoogLeNet • Model coefficients (millions): AlexNet 60, GoogLeNet 5.5 • Operations (billions): AlexNet 1.3, GoogLeNet 3.1 • Top-5 error rate (%): AlexNet 18.2, GoogLeNet 10.07 • Bandwidth • Compute
  • 25. Performance Analysis: CNN Inference • Time consumed by layer type. Charts for GoogLeNet and AlexNet (Convolutions, Pooling, Normalisation, Fully Connected), with reference times* of 1.36 and 1.00
  • 26. Performance Analysis: GPU v CPU* • Chart: relative FPS performance on AlexNet (higher is better), GPU (PowerVR 2-cluster GPU, 480MHz) vs CPU (ARM A15, 1.6GHz) • *CPU results based on Caffe (with ATLAS)
  • 27. Efficiency Analysis: GPU v CPU • Chart: relative efficiency on AlexNet (higher is better), GPU (PowerVR 2-cluster GPU, 480MHz) vs CPU (ARM A15, 1.6GHz)
  • 28. Conclusions • Mobile GPUs are widely available today in a range of SoCs across numerous markets • Compared to mobile CPUs, PowerVR mobile GPUs offer up to 3x higher efficiency and up to 12x higher performance for CNN deployment • Newer CNN architectures with smaller fully connected layers help make more efficient use of compute resources • PowerVR GPUs scale to allow higher performance and lower power for current and future generations of vision-enabled products • COME & SEE THE DEMO DURING THE NEXT BREAK
  • 29. Resources • PowerVR GPU Compute: https://imgtec.com/tools/powervr-gpu-compute/ • Guide to writing OpenCL: http://blog.imgtec.com/powervr/a-quick-guide-to-writing-opencl-kernels-for-rogue • PowerVR Imaging Framework: http://blog.imgtec.com/powervr/powervr-imaging-framework-sdk • PowerVR CNN Demo: see our stand • OpenCL Tutorial: https://handsonopencl.github.io/
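Slide 13 names the CNN building blocks. As a rough illustration of what four of them compute, here is a minimal single-channel C sketch. It is not Imagination's implementation. It assumes row-major float arrays, one channel, a "valid" convolution (no padding, stride 1) written as cross-correlation as CNN libraries usually do, and 2x2 max pooling with stride 2. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Convolution: single-channel "valid" 2D convolution (cross-correlation).
   Input h x w, kernel kh x kw, output (h - kh + 1) x (w - kw + 1). */
void conv2d_valid(const float *in, size_t h, size_t w,
                  const float *k, size_t kh, size_t kw, float *out) {
    assert(kh <= h && kw <= w);
    size_t oh = h - kh + 1, ow = w - kw + 1;
    for (size_t y = 0; y < oh; ++y)
        for (size_t x = 0; x < ow; ++x) {
            float acc = 0.0f;
            for (size_t i = 0; i < kh; ++i)
                for (size_t j = 0; j < kw; ++j)
                    acc += in[(y + i) * w + (x + j)] * k[i * kw + j];
            out[y * ow + x] = acc;
        }
}

/* Activation: ReLU, applied in place (negative values become 0). */
void relu(float *x, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (x[i] < 0.0f) x[i] = 0.0f;
}

/* Pooling: 2x2 max pooling with stride 2; output is (h/2) x (w/2). */
void max_pool_2x2(const float *in, size_t h, size_t w, float *out) {
    assert(h % 2 == 0 && w % 2 == 0);  /* even sizes only, to keep it short */
    for (size_t y = 0; y < h / 2; ++y)
        for (size_t x = 0; x < w / 2; ++x) {
            const float *p = in + 2 * y * w + 2 * x;  /* top-left of window */
            float m = p[0];
            if (p[1] > m) m = p[1];
            if (p[w] > m) m = p[w];
            if (p[w + 1] > m) m = p[w + 1];
            out[y * (w / 2) + x] = m;
        }
}

/* Fully connected: out = W x + b, with W stored as n_out rows of n_in. */
void fully_connected(const float *W, const float *b, const float *x,
                     size_t n_in, size_t n_out, float *out) {
    for (size_t o = 0; o < n_out; ++o) {
        float acc = b[o];
        for (size_t i = 0; i < n_in; ++i)
            acc += W[o * n_in + i] * x[i];
        out[o] = acc;
    }
}
```

The fully connected layer is a matrix-vector product, and a multi-channel convolution can also be arranged as a matrix product. This is why the deck's case study concentrates on matrix multiply.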
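Slide 17 breaks AlexNet's FLOPs down by layer type. The counting behind a chart like this can be sketched in a few lines of C. The layer shapes in the usage note come from the original AlexNet paper, not from the slides: conv1 has 96 filters of 11×11×3 producing a 55×55 output, and fc6 has 9216 inputs and 4096 outputs. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-accumulates (MACs) for a convolution layer: each of the
   out_h * out_w * out_c outputs is a dot product of length k_h * k_w * in_c.
   One MAC is usually counted as 2 FLOPs. Grouped convolution is ignored. */
uint64_t conv_macs(uint64_t out_h, uint64_t out_w, uint64_t out_c,
                   uint64_t k_h, uint64_t k_w, uint64_t in_c) {
    return out_h * out_w * out_c * k_h * k_w * in_c;
}

/* Weights (biases excluded) that a convolution layer must fetch. */
uint64_t conv_weights(uint64_t out_c, uint64_t k_h, uint64_t k_w, uint64_t in_c) {
    return out_c * k_h * k_w * in_c;
}

/* A fully connected layer is a matrix-vector product: one MAC per weight,
   so its MAC count equals its weight count (n_in * n_out). */
uint64_t fc_macs(uint64_t n_in, uint64_t n_out) {
    return n_in * n_out;
}

/* MACs performed per weight fetched: a rough measure of weight reuse. */
double macs_per_weight(uint64_t macs, uint64_t weights) {
    assert(weights > 0);
    return (double)macs / (double)weights;
}
```

With these shapes, conv1 needs 105,415,200 MACs from only 34,848 weights, which is 3,025 MACs per weight. fc6 needs 37,748,736 MACs but fetches one weight per MAC. Convolutions therefore tend to dominate compute, while fully connected layers dominate coefficient bandwidth.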
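The naïve scheme on slide 18 can be emulated on the CPU to make its memory traffic visible. In this sketch, each call to `naive_work_item` stands in for one OpenCL work-item, and a counter (hypothetical, for illustration only) tallies reads of A and B as if they came from global memory. Matrices are assumed to be row-major.

```c
#include <assert.h>
#include <stddef.h>

/* Counts reads of A and B, standing in for global-memory accesses. */
static size_t global_reads = 0;

/* Body of one naive "work-item": C[row][col] is the dot product of row `row`
   of A (M x K) and column `col` of B (K x N). */
static void naive_work_item(const float *A, const float *B, float *C,
                            size_t K, size_t N, size_t row, size_t col) {
    float acc = 0.0f;
    for (size_t k = 0; k < K; ++k) {
        acc += A[row * K + k] * B[k * N + col];
        global_reads += 2;  /* one element of A and one of B per step */
    }
    C[row * N + col] = acc;
}

/* The "NDRange": one work-item per element of the M x N output. On a GPU
   these run in parallel; here they simply run one after another. */
void matmul_naive(const float *A, const float *B, float *C,
                  size_t M, size_t K, size_t N) {
    assert(A && B && C);
    for (size_t row = 0; row < M; ++row)
        for (size_t col = 0; col < N; ++col)
            naive_work_item(A, B, C, K, N, row, col);
}
```

Every output element re-reads a full row of A and a full column of B, for 2·M·N·K global reads in total. For a 2×3 by 3×2 product that is already 24 reads to produce four values.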
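Slides 20 and 21 describe the tiled scheme: A data goes into private memory, a B tile goes into local memory, and dot products are computed from those copies. Here is one simple CPU emulation of that idea. It is not necessarily the layout of Imagination's kernels. It assumes a hypothetical tile size `TS` of 4, matrix dimensions that are multiples of `TS`, and work-groups of `TS` work-items, where each work-item owns one row of a TS×TS output tile.

```c
#include <assert.h>
#include <stddef.h>

#define TS 4  /* tile size: hypothetical, kept small for illustration */

static size_t tiled_global_reads = 0;  /* reads of A and B from "global" memory */

/* C (M x N) = A (M x K) * B (K x N), row-major. */
void matmul_tiled(const float *A, const float *B, float *C,
                  size_t M, size_t K, size_t N) {
    assert(M % TS == 0 && K % TS == 0 && N % TS == 0);  /* no edge handling */
    for (size_t gr = 0; gr < M; gr += TS)            /* work-group: tile row    */
        for (size_t gc = 0; gc < N; gc += TS) {       /* work-group: tile column */
            float acc[TS][TS] = {{0}};  /* private accumulators, row r per work-item */
            for (size_t k0 = 0; k0 < K; k0 += TS) {
                /* The work-group copies a TS x TS tile of B into local memory,
                   each work-item loading one row. On a GPU this load is
                   bracketed by barrier(CLK_LOCAL_MEM_FENCE). */
                float b_local[TS][TS];
                for (size_t r = 0; r < TS; ++r)
                    for (size_t c = 0; c < TS; ++c) {
                        b_local[r][c] = B[(k0 + r) * N + gc + c];
                        ++tiled_global_reads;
                    }
                for (size_t r = 0; r < TS; ++r) {     /* each work-item */
                    /* The work-item loads its slice of A into private memory. */
                    float a_priv[TS];
                    for (size_t k = 0; k < TS; ++k) {
                        a_priv[k] = A[(gr + r) * K + k0 + k];
                        ++tiled_global_reads;
                    }
                    /* Dot products now touch only private and local memory. */
                    for (size_t c = 0; c < TS; ++c)
                        for (size_t k = 0; k < TS; ++k)
                            acc[r][c] += a_priv[k] * b_local[k][c];
                }
            }
            for (size_t r = 0; r < TS; ++r)
                for (size_t c = 0; c < TS; ++c)
                    C[(gr + r) * N + gc + c] = acc[r][c];
        }
}
```

Each value fetched from global memory is reused `TS` times, so total global reads fall from 2·M·N·K to 2·M·N·K/TS. For 8×8 matrices this is 256 reads instead of 1,024. Larger tiles raise the reuse but need more local and private memory per work-group.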
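Slide 22 suggests fixing the work-group size at compile time with `reqd_work_group_size`. One practical consequence is that under OpenCL 1.2 the global work size passed to `clEnqueueNDRangeKernel` must be evenly divisible by the local work size. Problem sizes that are not multiples of 32 therefore have to be padded, and the kernel must let the extra work-items exit. The host-side arithmetic is sketched below. The names `WG_SIZE`, `padded_global_size` and `idle_work_items` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Work-group size baked into the kernel, e.g. with
   __attribute__((reqd_work_group_size(WG_SIZE, 1, 1))).
   32 follows the slide's suggestion for PowerVR GPUs. */
#define WG_SIZE 32

/* Smallest multiple of `local` that is >= n: the padded global work size. */
size_t padded_global_size(size_t n, size_t local) {
    assert(local > 0);
    return (n + local - 1) / local * local;
}

/* Number of padding work-items. Each must do nothing; inside the kernel
   this is the usual guard: if (get_global_id(0) >= n) return; */
size_t idle_work_items(size_t n, size_t local) {
    return padded_global_size(n, local) - n;
}
```

For example, 1,000 outputs with a work-group size of 32 launch 1,024 work-items in 32 work-groups, and 24 of those work-items exit immediately.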
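Slide 24 sets model size against operation count and is labelled "Bandwidth" and "Compute". The arithmetic behind that contrast can be sketched with the slide's own figures. The sketch makes two simplifying assumptions: every coefficient is read once per image (nothing stays cached between images), and activation traffic is ignored. The struct and function names are hypothetical.

```c
#include <assert.h>

/* Figures as given on slide 24. */
typedef struct {
    const char *name;
    double coefficients;  /* model coefficients */
    double operations;    /* operations per image */
} Network;

/* Operations per coefficient: a rough proxy for how much work each fetched
   weight supports. Low values point towards bandwidth-bound inference,
   high values towards compute-bound inference. */
double ops_per_coefficient(Network n) {
    assert(n.coefficients > 0.0);
    return n.operations / n.coefficients;
}

/* Megabytes of coefficients read per image: 4 bytes each for float,
   2 bytes for half. */
double coefficient_megabytes(Network n, double bytes_per_coefficient) {
    return n.coefficients * bytes_per_coefficient / 1e6;
}
```

With `{"AlexNet", 60e6, 1.3e9}`, AlexNet reads about 240 MB of float coefficients per image for roughly 22 operations per coefficient. With `{"GoogLeNet", 5.5e6, 3.1e9}`, GoogLeNet reads about 22 MB for roughly 564 operations per coefficient. Switching to half, as slide 22 suggests, halves the coefficient traffic in both cases.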