Understanding the Role of Integrated GPUs in Vision Applications
Roberto Mijat, Visual Computing Marketing Manager
12 May 2015
Copyright © 2015 ARM

Introduction to ARM and Mali
• World-leading semiconductor IP licensor
• Founded in 1990
• >1K processor licenses (>350 partners)
• >12bn shipments in 2014
• >50bn shipments to date
• Business model
  • Designing and licensing of IP
  • Not manufacturing chips
• Products
  • CPUs
  • Suite of integrated media IP
  • Interconnect
  • Physical IP

What is GPU Compute?
Cost-effective, efficient, and high-performance parallel computation
• 2D/3D Graphics
• Image processing
• Multimedia
• Computer Vision
The GPU is now programmable through C-like high-level languages
Managed as an accelerator or companion processor
[Diagram labels: OS and applications; CPU; GPU]

Comprehensive Heterogeneous Compute
Generic CPU
• Serial workloads
• Task parallel workloads
• < 10 threads
• 1-8 cores
• Short pipeline (generally <20 stages)
• Low latency
• General purpose
• SIMD engine
Generic GPU
• Data parallel workloads
• 100s-1000s threads
• 1-100s cores
• Long pipeline (generally >50 stages)
• Very high latency
• High throughput
• 2D/3D Graphics
• Stream processing

Key Characteristics of Integrated Mobile GPUs
• Reduced power, performance and area compared to desktop/HPC
  • Designed for fan-less mobile devices
  • Optimized for energy efficiency
• Integrated in System-on-Chip
  • Sharing physical main memory with CPU (and other processors)
  • Local caches
  • I/O coherency available in newer platforms
• Primary use case is 3D graphics acceleration
  • Gaming and user interfaces
• Modern designs support GPU Compute (aka GPGPU)
• GPU is a standard feature in mobile devices

Enabling Computer Vision on Integrated GPUs
• Option #1: Manually optimize your code
  • Study your algorithm, determine how to partition it
  • Optimize using low-level NEON™ (CPU) and OpenCL (GPU)
  • Manually fine-tune the load balance between CPU and GPU
• Option #2: Utilize GPU Compute enabled middleware, for example:
  • Gesture UI middleware from eyeSight
  • OpenCL enabled functionality from ArcSoft, Fotonation
  • Computer Vision mobile library from ArrayFire
• But the key question is not how, but: WHEN should you be using the GPU for computer vision?
  • This presentation will try to answer this question through examples

Image Pyramid: The Algorithm
• Set of sub-sampled images
• Each level
  • Apply smoothing filter
  • Sub-sample in both directions
• Widely used in computer vision
  • Feature extraction
  • Stereo vision
  • Object detection
[Figure: pyramid levels of size x × y, x/2 × y/2 and x/4 × y/4]
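The two per-level steps above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the talk: the deck does not say which smoothing filter was used, so a 2x2 box average is assumed here, and the names `downsample` and `build_pyramid` are hypothetical.

```python
def downsample(img):
    """Halve a greyscale image (list of rows).

    Assumption: smoothing is a 2x2 box average, fused with the 2:1
    sub-sampling in both directions. Each output pixel reads only its
    own 2x2 input block, so all outputs can be computed independently.
    """
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(img, levels):
    """Return [level 0, level 1, ...], each level half the size of the last."""
    pyramid = [img]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        if len(prev) < 2 or len(prev[0]) < 2:
            break  # image too small to halve again
        pyramid.append(downsample(prev))
    return pyramid
```

Calling `build_pyramid(image, 3)` on an x × y image yields the three levels shown in the figure. On a GPU, the body of the inner comprehension would typically become one work-item per output pixel.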

Image Pyramid: Optimizing for GPU
• In principle well suited to GPU
  • Embarrassingly parallel problem
  • No data dependencies
• Generally the GPU improves performance
  • Architecture-specific optimizations
  • Algorithm structure changes
• GPU-specific optimization stages
  • Added interleaving conversions to enable planar level operations
  • Consolidated GPU kernels to improve efficiency
  • Used OpenCL data structures and vector maths

Canny Edge Detection: The Algorithm
• Popular tuneable algorithm to extract edges from images
• 4 main stages
  • Gaussian filter (reduce noise)
  • Sobel filter (identify candidate edges)
  • Non-maximum suppression (remove pixels that are not a local maximum)
  • Hysteresis thresholding (to form high-quality edges)
[Figures omitted. Image source: Wikipedia]
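As an illustration of the second stage, the sketch below computes the Sobel gradient magnitude with the standard 3x3 kernels. It is a plain-Python reference under stated assumptions, not the code from the talk: the function name `sobel_magnitude` is hypothetical, and border pixels are simply left at zero.

```python
import math


def sobel_magnitude(img):
    """Gradient magnitude of a greyscale image using 3x3 Sobel kernels.

    Assumption: border pixels are left at 0.0 rather than padded.
    Every interior output pixel depends only on its 3x3 input
    neighbourhood, which is why this stage maps well to a GPU.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = ((img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1])
                  - (img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1]))
            # Vertical gradient: bottom row minus top row.
            gy = ((img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1])
                  - (img[r - 1][c - 1] + 2 * img[r - 1][c] + img[r - 1][c + 1]))
            out[r][c] = math.hypot(gx, gy)
    return out
```

Large values mark candidate edges; the following stages thin and link them.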

Canny Edge Detection: First GPU Port
• Canny edge detection overall adapts well to GPU acceleration
  • Convolution stages map well to parallelism and vectorisation
  • Hysteresis is very serial in nature but constitutes a minor component
• Large performance uplift over the CPU-only reference implementation through an elementary port using OpenCL

Resolution    Speed-up (*)
720p HD       7.48x
1080p HD      7.24x
4K            8.30x
(*) Only kernel execution measured
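To see why hysteresis is described as serial, consider a straightforward reference version. The sketch below is an assumption-laden illustration (hypothetical function name, 8-connected neighbourhood, inclusive thresholds), not the implementation measured above: weak pixels are accepted only if a chain of accepted pixels reaches them, so the result for one pixel can depend on pixels arbitrarily far away.

```python
from collections import deque


def hysteresis(mag, low, high):
    """Keep strong pixels (>= high) and weak pixels (>= low) connected to them.

    Assumptions: 8-connectivity and inclusive thresholds. The edge set
    grows one neighbour at a time from the strong seeds, which is the
    data-dependent, serial part of the algorithm.
    """
    h, w = len(mag), len(mag[0])
    edges = [[False] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            if mag[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < h and 0 <= cc < w
                        and not edges[rr][cc] and mag[rr][cc] >= low):
                    edges[rr][cc] = True
                    queue.append((rr, cc))
    return edges
```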

Canny Edge Detection: Optimized GPU Port
• Optimization stages on GPU (OpenCL)
  • Utilize vector loads to reduce the pressure on the load/store (L/S) pipeline
  • Unroll loops to increase performance of arithmetic-bound kernels
  • Trade off branching against redundant operations
  • Use padding to avoid boundary checks
  • Reduce data type sizes

Resolution    Further improvement of GPU version (*)
720p HD       4.97x
1080p HD      5.67x
4K            6.85x
(*) Only kernel execution measured
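The padding optimization listed above can be illustrated outside OpenCL. In this sketch the image is padded once so that the filter loop never needs a bounds check; the deck does not state which padding scheme was used, so edge replication is assumed, and the helper names and the 3x3 box filter are hypothetical stand-ins for a real kernel.

```python
def pad_replicate(img, p):
    """Return a copy of img with a p-pixel border that repeats the edge pixels.

    Assumption: edge replication; the talk does not specify the scheme.
    """
    h, w = len(img), len(img[0])
    return [
        [img[min(max(r - p, 0), h - 1)][min(max(c - p, 0), w - 1)]
         for c in range(w + 2 * p)]
        for r in range(h + 2 * p)
    ]


def box3x3_padded(img):
    """3x3 box filter whose inner loop has no boundary checks.

    The clamping cost is paid once in pad_replicate, so every output
    pixel runs the same branch-free code path.
    """
    padded = pad_replicate(img, 1)
    h, w = len(img), len(img[0])
    return [
        [sum(padded[r + dr][c + dc] for dr in range(3) for dc in range(3)) / 9.0
         for c in range(w)]
        for r in range(h)
    ]
```

The trade-off is a slightly larger buffer in exchange for uniform control flow across all work-items.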

The Hidden Cost of Using the GPU (1)
[Diagram (conceptual, not to scale): CPU time for the pyramid on the CPU compared with total GPU time for the pyramid on the GPU. Besides the GPU execution time itself, the total GPU time includes driver and kernel setup, cache coherency operations (shown twice) and driver clean-up.]

The Hidden Cost of Using the GPU (2)
• To benefit from GPU acceleration
  • Computational workload must outweigh the overheads
  • Run repeated passes (multiple frames)
  • Use multiple buffers to pipeline read-backs whilst the GPU moves on
[Figures: Canny edge detection, single frame; Canny edge detection, 200 frames]
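The amortization argument can be made concrete with a simple cost model. The sketch below assumes a one-time setup cost plus a fixed per-frame cost on the GPU (per-frame overheads such as cache maintenance are folded into that figure); the function names and every number in the usage note are hypothetical, not measurements from the talk.

```python
import math


def gpu_total_ms(frames, setup_ms, gpu_frame_ms):
    """Total GPU-path time: one-time setup plus a per-frame cost."""
    return setup_ms + frames * gpu_frame_ms


def cpu_total_ms(frames, cpu_frame_ms):
    """Total CPU-only time: no setup, just a per-frame cost."""
    return frames * cpu_frame_ms


def break_even_frames(setup_ms, cpu_frame_ms, gpu_frame_ms):
    """Smallest frame count at which the GPU path is strictly faster.

    Returns None when the GPU is not faster per frame, because the
    setup cost can then never be recovered.
    """
    if gpu_frame_ms >= cpu_frame_ms:
        return None
    return math.floor(setup_ms / (cpu_frame_ms - gpu_frame_ms)) + 1
```

With an assumed 50 ms setup, 10 ms per frame on the CPU and 2 ms per frame on the GPU, a single frame is slower on the GPU (52 ms against 10 ms), and `break_even_frames(50, 10, 2)` shows the GPU path winning from the seventh frame onward.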

Complex Imaging Pipeline Example: HoG
• We examined a complex computer vision pipeline
  • Histogram of Oriented Gradients (HoG) is often used in image recognition pipelines
• We investigated how the GPU can improve computation
  • The CPU version combined many of the stages
  • On the GPU each stage was kept separate for simplicity
[Pipeline diagram stages: Greyscale image; Derivative (Dx and Dy); Phase and Magnitude; Orientation binning; Magnitude block calculation; Normalise; Descriptor Extractor & Classifier. Intermediate data: Dx, Dy, Phase, Magnitude]
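The phase, magnitude and orientation binning stages can be sketched for a single cell as follows. This is an illustration under assumptions the deck does not confirm: 9 unsigned orientation bins over 0 to 180 degrees, magnitude-weighted votes without interpolation, and a hypothetical function name.

```python
import math


def cell_histogram(dx, dy, bins=9):
    """Magnitude-weighted orientation histogram for one cell.

    dx and dy are the per-pixel derivatives of the cell (lists of rows).
    Assumptions: unsigned orientation in [0, 180) degrees, `bins` equal
    bins, and each pixel votes into exactly one bin.
    """
    hist = [0.0] * bins
    width = 180.0 / bins
    for row_dx, row_dy in zip(dx, dy):
        for gx, gy in zip(row_dx, row_dy):
            magnitude = math.hypot(gx, gy)                    # Magnitude stage
            phase = math.degrees(math.atan2(gy, gx)) % 180.0  # Phase stage
            index = min(int(phase // width), bins - 1)        # Orientation binning
            hist[index] += magnitude
    return hist
```

The per-pixel `atan2` and `hypot` calls are the kind of work that a later slide reports as benefiting from GPU built-in functions.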

Histogram of Oriented Gradients: GPU Implementation
• We applied common optimizations, as for the image pyramid and Canny edge detection
• Arctangent function applied to each pixel in the Phase and Magnitude computation
  • Default CPU atan2() library function is slow
  • Approximation version is 2x faster
  • GPU built-in function is 6x faster
• Another built-in function (sqrt) is used by the normalise stage
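The deck does not say which arctangent approximation was used. As one possible example, the sketch below uses a well-known low-order polynomial, atan(r) ≈ r·(π/4 + 0.273·(1 − r)) for 0 ≤ r ≤ 1, and extends it to all quadrants; the function name is hypothetical, and the accuracy (a few thousandths of a radian) is assumed to be adequate for orientation binning.

```python
import math


def approx_atan2(y, x):
    """Approximate atan2(y, x) in radians with one division and a few multiplies.

    Assumption: this polynomial is an illustrative choice, not the
    approximation used in the talk. Returns 0.0 for the (0, 0) input.
    """
    if x == 0.0 and y == 0.0:
        return 0.0
    ax, ay = abs(x), abs(y)
    if ax >= ay:
        r = ay / ax  # ratio in [0, 1]
        angle = r * (math.pi / 4 + 0.273 * (1.0 - r))
    else:
        r = ax / ay  # use the complementary angle to keep r in [0, 1]
        angle = math.pi / 2 - r * (math.pi / 4 + 0.273 * (1.0 - r))
    # angle is now the first-quadrant result; restore the true quadrant.
    if x < 0.0:
        angle = math.pi - angle
    if y < 0.0:
        angle = -angle
    return angle
```

Whether such an approximation beats a library or built-in function depends on the platform, so it is worth measuring both, as the slide's figures suggest.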

Histogram of Oriented Gradients: The Results
• Significant performance improvement on GPU
• Improvement reduced with smaller images
  • When running on the CPU at smaller resolutions, most of the data will be in the cache
  • On CPU we have fewer threads, which means fewer chances to hide latency
• Can we improve further?
[Chart labels: 8.2x, 6.2x, 3.0x]

HoG: Migrate Small Tasks Back to CPU?

CPU and GPU Work Correlation
[Screenshots of ARM DS-5 Streamline tool]

Reducing CPU and GPU Serialization
• More efficient processing is achieved by keeping the GPU busy
Interleaved CPU/GPU activity:
  enqueue Frame 0
  enqueue Frame 1
  wait for Frame 0 to complete…
  enqueue Frame 2
  wait for Frame 1 to complete…
  etc.
Serialised CPU/GPU activity:
  enqueue Frame 0
  wait for Frame 0 to complete…
  enqueue Frame 1
  wait for Frame 1 to complete…
  etc.
[Screenshots of ARM DS-5 Streamline tool]
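The difference between the two enqueue patterns can be estimated with a small timeline model. This is a sketch under simplifying assumptions: fixed CPU preparation and GPU execution times per frame, at most two frames in flight in the interleaved case, and hypothetical function names; it models the schedule only and does not call any GPU API.

```python
def serialised_time(frames, cpu_ms, gpu_ms):
    """Enqueue a frame, then wait for it, before preparing the next one."""
    total = 0.0
    for _ in range(frames):
        total += cpu_ms  # CPU prepares and enqueues the frame
        total += gpu_ms  # CPU idles until the GPU finishes it
    return total


def interleaved_time(frames, cpu_ms, gpu_ms):
    """Enqueue frame i+1 before waiting for frame i (two frames in flight)."""
    cpu_free = 0.0   # time at which the CPU can start preparing a frame
    gpu_free = 0.0   # time at which the GPU can start a new frame
    done = []        # completion time of each frame
    for i in range(frames):
        if i >= 2:
            # "wait for Frame i-2 to complete" before reusing its buffer
            cpu_free = max(cpu_free, done[i - 2])
        cpu_free += cpu_ms
        start = max(cpu_free, gpu_free)
        gpu_free = start + gpu_ms
        done.append(gpu_free)
    return done[-1] if done else 0.0
```

With assumed costs of 2 ms on the CPU and 5 ms on the GPU per frame, four frames take 28 ms serialised and 22 ms interleaved, because only the first frame's CPU preparation is left exposed and the GPU stays busy afterwards.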

Resources
• www.malideveloper.com
  • Download guides, papers, tools, etc.
• http://community.arm.com/welcome
  • Community forums, blogs and more
• malidevelopers@arm.com
  • Graphics and GPU Compute developer support
• http://malideveloper.arm.com/develop-for-mali/opencl-renderscript-tutorials/
  • A range of video and written tutorials for GPU Compute, OpenCL and RenderScript
• http://malideveloper.arm.com/develop-for-mali/features/mali-t6xx-gpu-user-space-drivers/
  • ARM® Mali™-T600 series GPU user-space binary drivers available for download
  • Linaro BSP now available with Mali-T600 series GPU support
• And most importantly:
  • The Mali ecosystem of partners
  • The Embedded Vision Alliance

In Conclusion: The Role of GPU Compute
• The GPU is architecturally suitable for several computer vision algorithms
  • Workload characteristics and size determine the optimal CPU/GPU balance
  • Computation load must outweigh system overheads
  • Kernel and system optimizations extract optimal performance
• Stable, well-understood algorithms typically migrate to dedicated hardware
• If a software solution is needed, by choice (cost) or necessity (time-to-market)
  • GPU can increase performance and reduce power vs. CPU-only
  • Adds flexibility and reduces cost for chip, sensor and ISP vendors
  • Improves performance of software on existing silicon

More Related Content

More from Edge AI and Vision Alliance

“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...Edge AI and Vision Alliance
 
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...Edge AI and Vision Alliance
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...Edge AI and Vision Alliance
 
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic LeapEdge AI and Vision Alliance
 
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ..."Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
 
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
 
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
 
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ..."Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
 

Recently uploaded

Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 

Recently uploaded (20)

Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 

"Understanding the Role of Integrated GPUs in Vision Applications," a Presentation from ARM

  • 1. Copyright © 2015 ARM 1 Roberto Mijat, Visual Computing Marketing Manager 12 May 2015 Understanding the Role of Integrated GPUs in Vision Applications
  • 2. Copyright © 2015 ARM 2 • World leading semiconductor IP licensor • Founded in 1990 • >1K processor licenses (>350 partners) • >12bn shipments in 2014 • >50bn shipments to date • Business model • Designing and licensing of IP • Not manufacturing chips • Products • CPUs • Suite of integrated media IP • Interconnect • Physical IP Introduction to ARM and Mali
  • 3. Copyright © 2015 ARM 3 What is GPU Compute? Cost-effective, efficient, and high-performance parallel computation • 2D/3D Graphics • Image processing • Multimedia • Computer Vision OS and applications The GPU is now programmable through C-like high-level languages Managed as accelerator or companion processor CPU GPU
  • 4. Comprehensive Heterogeneous Compute
      • Generic CPU
          • Serial workloads
          • Task parallel workloads
          • < 10 threads
          • 1-8 cores
          • Short pipeline (generally <20 stages)
          • Low latency
          • General purpose
          • SIMD engine
      • Generic GPU
          • Data parallel workloads
          • 100s-1000s threads
          • 1-100s cores
          • Long pipeline (generally >50 stages)
          • Very high latency
          • High throughput
          • 2D/3D Graphics
          • Stream processing
  • 5. Key Characteristics of Integrated Mobile GPU
      • Reduced power, performance and area compared to desktop/HPC
          • Designed for fan-less mobile devices
          • Optimized for energy efficiency
      • Integrated in System-On-Chip
          • Sharing physical main memory with CPU (and other processors)
          • Local caches
          • I/O coherency available in newer platforms
      • Primary use case is 3D graphics acceleration
          • Gaming and user interfaces
      • Modern designs support GPU Compute (aka GPGPU)
      • GPU is a standard feature in mobile devices
  • 6. Enabling Computer Vision on Integrated GPU
      • Option #1: Manually optimize your code
          • Study your algorithm, determine how to partition it
          • Optimize using low-level NEON™ (CPU) and OpenCL (GPU)
          • Manually fine-tune load balance between CPU and GPU
      • Option #2: Utilize GPU Compute enabled middleware, for example:
          • Gesture UI middleware from eyeSight
          • OpenCL enabled functionality from ArcSoft, Fotonation
          • Computer Vision mobile library from ArrayFire
      • But the key question is not how, but: WHEN should you be using the GPU for computer vision?
      • This presentation will try to answer this question through examples
  • 7. Image Pyramid: The Algorithm
      • Set of sub-sampled images
      • Each level
          • Apply smoothing filter
          • Sub-sample in both directions
      • Widely used in computer vision
          • Feature extraction
          • Stereo vision
          • Object detection
      • Diagram: image levels of size x by y, x/2 by y/2, and x/4 by y/4
  • 8. Image Pyramid: Optimizing for GPU
      • In principle well suited to GPU
          • Embarrassingly parallel problem
          • No data dependencies
      • Generally the GPU improves performance
          • Architecture-specific optimizations
          • Algorithm structure changes
      • GPU-specific optimization stages
          • Added interleaving conversions to enable planar level operations
          • Consolidated GPU kernels to improve efficiency
          • Used OpenCL data structures and vector maths
  • 9. Canny Edge Detection: The Algorithm
      • Popular tuneable algorithm to extract edges from images
      • 4 main stages
          • Gaussian filter (reduce noise)
          • Sobel filter (identify candidate edges)
          • Remove pixels that are not a local maximum
          • Hysteresis thresholding (to form high-quality edges)
      • Image source: Wikipedia
  • 10. Canny Edge Detection: First GPU Port
      • Canny edge detection overall adapts well to GPU acceleration
          • Convolution stages map well to parallelism and vectorisation
          • Hysteresis is very serial in nature but constitutes a minor component
      • Large performance uplift over the CPU-only reference implementation from an elementary port using OpenCL
      • Speed-up by resolution (only kernel execution measured)
          • 720 HD: x7.48
          • 1080p HD: x7.24
          • 4k: x8.30
  • 11. Canny Edge Detection: Optimized GPU Port
      • Optimization stages on GPU (OpenCL)
          • Utilize vector loads to reduce the pressure on the L/S pipeline
          • Loop-unroll to increase performance of arithmetically bound kernels
          • Trade-off between branching and redundant operations
          • Use padding to avoid boundary checks
          • Datatype size reduction
      • Further improvement of GPU version by resolution (only kernel execution measured)
          • 720 HD: x4.97
          • 1080p HD: x5.67
          • 4k: x6.85
  • 12. The Hidden Cost of Using the GPU (1)
      • Diagram (conceptual, not to scale) comparing the pyramid on CPU with the pyramid on GPU
      • Diagram labels: CPU time; GPU time; driver & kernel setup; cache coherency (shown twice); driver clean-up; total GPU time
  • 13. The Hidden Cost of Using the GPU (2)
      • To benefit from GPU acceleration
          • Computational workload must overshadow the overheads
          • Run repeated passes (multiple frames)
          • Use multiple buffers to pipeline read-backs whilst GPU moves on
      • Charts: Canny edge detection, single frame; Canny edge detection, 200 frames
  • 14. Complex Imaging Pipeline Example: HoG
      • We examined a complex computer vision pipeline
          • Histogram of Gradients often used in image recognition pipelines
      • We investigated how the GPU can improve computation
          • CPU version combined many of the stages
          • On GPU each stage was kept separate for simplicity
      • Pipeline diagram stages: greyscale image; derivative Dx and Dy; phase and magnitude; orientation binning; magnitude block calculation; normalise; descriptor extractor & classifier
      • Intermediate data labels: Dx; Dy; Phase; Magnitude
  • 15. Histogram of Gradients: GPU Implementation
      • We applied the same common optimizations as for the image pyramid and Canny edge detection
      • Arctangent function applied to each pixel in Phase and Magnitude computation
          • Default CPU atan2() library function is slow
          • Approximation version 2x faster
          • GPU built-in function 6x faster
      • Another built-in function (sqrt) is used by the normalise stage
  • 16. Histogram of Gradients: The Results
      • Significant performance improvement on GPU
      • Improvement reduced with smaller images
          • When running on the CPU at smaller resolutions, most of the data will be in the cache
          • On CPU we have fewer threads, which means fewer chances to hide latency
      • Can we improve further?
      • Chart speed-up labels: 8.2x; 6.2x; 3.0x
  • 17. HoG: Migrate Small Tasks back to CPU?
  • 18. CPU and GPU Work Correlation
      • Screenshots of ARM DS-5 Streamline Tool
  • 19. Reducing CPU and GPU Serialization
      • More efficient processing is achieved by keeping the GPU busy
      • Interleaved CPU/GPU activity: enqueue Frame 0, enqueue Frame 1, wait for Frame 0 to complete, enqueue Frame 2, wait for Frame 1 to complete, etc.
      • Serialised CPU/GPU activity: enqueue Frame 0, wait for Frame 0 to complete, enqueue Frame 1, wait for Frame 1 to complete, etc.
      • Screenshots of ARM DS-5 Streamline Tool
  • 20. Resources
      • www.malideveloper.com
          • Download guides, papers, tools, etc.
      • http://community.arm.com/welcome
          • Community forums, blogs and more
      • malidevelopers@arm.com
          • Graphics and GPU Compute developer support
      • http://malideveloper.arm.com/develop-for-mali/opencl-renderscript-tutorials/
          • A range of video and written tutorials for GPU Compute, OpenCL and RenderScript
      • http://malideveloper.arm.com/develop-for-mali/features/mali-t6xx-gpu-user-space-drivers/
          • ARM® Mali™-T600 series GPU user-space binary drivers available for download
      • Linaro BSP now available with Mali-T600 series GPU support
      • And most importantly:
          • The Mali ecosystem of partners
          • The Embedded Vision Alliance
  • 21. In Conclusion: The Role of GPU Compute
      • The GPU is architecturally suitable for several computer vision algorithms
      • Workload characteristics & size determine optimal CPU/GPU balance
          • Computation load must overwhelm system overheads
          • Kernel & system optimization extract optimal performance
      • Stable, well-understood algorithms typically evolve to hardware
      • If a software solution is needed by choice (cost) or necessity (time-to-market)
          • GPU can increase performance and reduce power vs. CPU-only
          • Add flexibility and reduce cost for chip, sensor and ISP vendors
          • Improve performance of software on existing silicon
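
The image pyramid on slide 7 (smooth each level, then sub-sample in both directions) can be sketched in plain Python. This is a minimal illustration only, not the presenter's OpenCL implementation: the slides do not name the smoothing filter, so the 3x3 box filter is an assumption, and the function names `smooth`, `subsample` and `build_pyramid` are hypothetical.

```python
def smooth(image):
    """Apply a 3x3 box filter (an assumed smoothing filter).

    Border pixels reuse the nearest valid neighbour (clamped coordinates).
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx]
            out[y][x] = total / 9.0
    return out


def subsample(image):
    """Keep every second pixel in both directions."""
    return [row[::2] for row in image[::2]]


def build_pyramid(image, levels):
    """Return a list of images; each level is half the size of the previous one."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(subsample(smooth(pyramid[-1])))
    return pyramid


# A 16x8 test image gives levels of 16x8, 8x4 and 4x2 pixels.
base = [[float((x + y) % 16) for x in range(16)] for y in range(8)]
pyramid = build_pyramid(base, 3)
sizes = [(len(level[0]), len(level)) for level in pyramid]
```

Every output pixel depends only on the previous level, which is why slide 8 describes the problem as embarrassingly parallel: on a GPU each output pixel could be computed by its own work-item.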
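
Slide 11 lists "use padding to avoid boundary checks" among the Canny optimizations. A rough Python sketch of the idea, applied to the Sobel stage from slide 9, follows. It assumes a one-pixel replicated border; `pad_replicate` and `sobel_magnitude` are hypothetical names, and the presenter's actual kernels are not shown in the slides.

```python
import math


def pad_replicate(image):
    """Return a copy with a one-pixel border that repeats the edge pixels."""
    padded_rows = [[row[0]] + list(row) + [row[-1]] for row in image]
    return [padded_rows[0][:]] + padded_rows + [padded_rows[-1][:]]


def sobel_magnitude(image):
    """Gradient magnitude using 3x3 Sobel operators on a padded copy.

    Because of the padding, the inner loop never tests for image borders.
    """
    p = pad_replicate(image)
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Pixel (x, y) of the input sits at (x + 1, y + 1) in the padded copy.
            gx = (p[y][x + 2] + 2 * p[y + 1][x + 2] + p[y + 2][x + 2]) \
                - (p[y][x] + 2 * p[y + 1][x] + p[y + 2][x])
            gy = (p[y + 2][x] + 2 * p[y + 2][x + 1] + p[y + 2][x + 2]) \
                - (p[y][x] + 2 * p[y][x + 1] + p[y][x + 2])
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical step edge: the response is non-zero only next to the step.
step = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(step)
```

The trade-off is a slightly larger buffer in exchange for a branch-free inner loop, which tends to suit wide, in-order GPU pipelines.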
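
Slides 12 and 13 argue that the computational workload must overshadow the GPU overheads. One way to reason about this is a simple cost model, sketched below. The model itself is an assumption (a one-off setup and clean-up cost plus a fixed per-frame cache-maintenance cost), and all timings are invented for illustration; none of them come from the presentation.

```python
def effective_speedup(frames, cpu_ms, gpu_kernel_ms, setup_ms, per_frame_overhead_ms):
    """Ratio of CPU-only time to total GPU time for a batch of frames.

    setup_ms is paid once (driver and kernel setup, clean-up);
    per_frame_overhead_ms is paid for every frame (e.g. cache maintenance).
    """
    cpu_total = frames * cpu_ms
    gpu_total = setup_ms + frames * (gpu_kernel_ms + per_frame_overhead_ms)
    return cpu_total / gpu_total


# Hypothetical timings: the kernel alone is 8x faster than the CPU version.
timings = dict(cpu_ms=40.0, gpu_kernel_ms=5.0, setup_ms=100.0, per_frame_overhead_ms=1.0)

single_frame = effective_speedup(1, **timings)    # about 0.38: slower than the CPU
many_frames = effective_speedup(200, **timings)   # about 6.15: overhead is amortised
```

Under these assumed numbers a single frame runs slower on the GPU than on the CPU, while 200 frames recover most of the kernel-only speed-up, which matches the qualitative point of the single-frame and 200-frame charts.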
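
Slide 15 mentions replacing the default atan2() with a faster approximation. The slides do not say which approximation was used, so the sketch below swaps in a well-known polynomial form, atan(z) ≈ z·(π/4 + 0.273·(1 − |z|)) for |z| ≤ 1, purely as an example; `fast_atan` and `fast_atan2` are hypothetical names.

```python
import math


def fast_atan(z):
    """Polynomial approximation of atan(z), intended for |z| <= 1."""
    return z * (math.pi / 4 + 0.273 * (1 - abs(z)))


def fast_atan2(y, x):
    """Approximate atan2 by reducing the argument so that |z| <= 1."""
    if x == 0.0 and y == 0.0:
        return 0.0
    if abs(x) >= abs(y):
        angle = fast_atan(y / x)
        if x < 0.0:
            # Move from the right half-plane result to the left half-plane.
            angle += math.pi if y >= 0.0 else -math.pi
        return angle
    # |y| > |x|: use atan2(y, x) = sign(y) * pi/2 - atan(x / y).
    return math.copysign(math.pi / 2, y) - fast_atan(x / y)


# Largest deviation from math.atan2 over a grid of gradient values (Dx, Dy).
max_error = max(
    abs(fast_atan2(dy, dx) - math.atan2(dy, dx))
    for dy in range(-20, 21)
    for dx in range(-20, 21)
    if (dx, dy) != (0, 0)
)
```

The error stays within a few milliradians, which is usually small compared with the width of a HoG orientation bin; whether that accuracy is acceptable depends on the classifier being fed.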
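
The two enqueue orders on slide 19 can be mimicked with a single worker thread standing in for the GPU command queue. This is only an analogy in Python, not OpenCL host code: `process_frame` is a hypothetical placeholder for the per-frame GPU work, and the executor is an assumed stand-in for the driver's queue.

```python
from concurrent.futures import ThreadPoolExecutor


def process_frame(index):
    """Hypothetical stand-in for the GPU work done on one frame."""
    return index * index


def run_serialised(queue, frame_count):
    """Enqueue a frame, wait for it, then enqueue the next one."""
    results = []
    for i in range(frame_count):
        future = queue.submit(process_frame, i)   # enqueue frame i
        results.append(future.result())           # wait for frame i
    return results


def run_interleaved(queue, frame_count):
    """Enqueue frame i before waiting for frame i - 1, so the queue never drains."""
    results = []
    if frame_count == 0:
        return results
    pending = queue.submit(process_frame, 0)             # enqueue frame 0
    for i in range(1, frame_count):
        next_future = queue.submit(process_frame, i)     # enqueue frame i
        results.append(pending.result())                 # wait for frame i - 1
        pending = next_future
    results.append(pending.result())                     # wait for the last frame
    return results


with ThreadPoolExecutor(max_workers=1) as gpu_queue:
    serial = run_serialised(gpu_queue, 5)
    interleaved = run_interleaved(gpu_queue, 5)
```

Both orders produce the same results; the difference is that in the interleaved version the worker always has the next frame queued while the caller is handling the previous one, which is the "keep the GPU busy" point of the slide.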