©2013-2014 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm® Snapdragon™ Processors:
A Super Gaming Platform
Manish Sirdeshmukh, Product Manager, Staff
Todd LeMoine, Engineer, Principal/Manager
Qualcomm Technologies, Inc.
Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.
Total mobile gaming revenues (for all platforms) are projected to grow from $13 billion in 2013 to $22 billion in 2015.
Source: Gartner, October 2013, “Forecast Video Game Ecosystem Worldwide”
Gaming on mobile today
[Figure: side-by-side comparison, Desktop PC and Snapdragon 805]
“Epic now has brought Unreal Engine 4 to Android with the Snapdragon 800 and 805 chipsets from Qualcomm
Technologies,” said Niklas Smedberg, Senior Engine Programmer, Epic Games. “Recently we worked with
Qualcomm [QTI] to elevate graphics to the next level on the Snapdragon Adreno GPU hardware, which delivers
some of the most power-efficient unified shader capabilities we’ve seen yet for Android smartphones and tablets.”
What is involved in games?
Image: Modern Combat 5 by Gameloft
• Gameplay execution (animation): animation for water movement and anchored boat motion
• Gameplay execution (AI): enemy helicopter controlled by AI
• Gameplay execution (physics): particle physics makes explosions look real
• Console-quality graphics: lens effect on the sunlight breaking through the clouds
• Console-quality graphics: hi-res textures provide rich details to the scene
• Console-quality graphics: bloom glare from gunfire provides an immersive experience
• Fast connectivity: play a mission in multi-player gaming
• High-quality video: after completing the level, watch a cut scene transition
• Responsive and accurate control: control the character movement
• Multi-screen experience: mirror your screen to TV
• Cinema-quality sound: hear gunfire, explosions, bullets flying by, and the helicopter’s rotor blades
Snapdragon processors
How is SoC utilized by a game?
Heterogeneous hardware blocks and data flow
[Diagram: hardware blocks and data flow]
• Hardware blocks: quad-core CPU (CPU #1 to CPU #4), GPU, video decoder, video encoder, DSP (audio decoder), sensor engine, display engine, Wi-Fi engine, system memory
• CPU workloads: physics, animation, game logic, artificial intelligence
• Inputs: graphics textures, shaders, and geometry; video data; audio data; input signals
• Memory traffic: GPU reads, graphics pixel writes, video pixel writes, display reads
• Outputs: final frame to display panel, encoded final frame to Wi-Fi display panel, audio to speakers
Qualcomm Gobi, Qualcomm Adreno, Qualcomm Hexagon and Krait are products of Qualcomm Technologies, Inc.
Qualcomm® Adreno™ GPU
• Adreno is Qualcomm Technologies, Inc.’s (QTI) integrated GPU
• Adreno 420 is QTI’s latest integrated GPU shipping in Snapdragon 805
• Adreno GPUs are custom designed for mobile use
[Block diagram]
• Qualcomm® Krait™ 450 quad-core CPU
• Adreno 420 GPU: OpenGL ES 2.0/3.1*, OpenCL 1.2 Full
• Snapdragon Display Engine: 4K, Miracast, picture enhancement
• Dual ISPs (imaging): up to 55MP, 1.2GPix/s bw, camera SW
• Multimedia processing: 4K decode, HEVC decode
• Qualcomm® Hexagon™ DSP; Ultra Low Power Sensor Engine; Fusion 4.5
• Qualcomm® Gobi™ 9x35 modem: 4th gen CAT 6 LTE, up to 3x20MHz CA
• Location: GPS, GLONASS, Beidou, Galileo satellites
• Memory: 2x64 bit LPDDR3
• USB 3.0
• Snapdragon Voice Activation, Gestures, Studio Access Security
*Product is based on a provisional Khronos Specification, and is designed to pass the Khronos Conformance Testing Process when available. Current conformance status can be found at www.khronos.org/conformance.
Adreno 420 GPU highlights
• Desktop and console quality graphics on mobile
• Complete DirectX11 FL 11_2 pipeline, supports OpenGL ES 3.1
• Support for dynamic hardware tessellation & geometry shaders
[Figure: no tessellation vs. tessellation; richer, visually immersive graphics]
Adreno 420 supports the most advanced graphics APIs
Feature/API                                       OpenGL ES 3.0   OpenGL ES 3.1   Android Extension Pack
Compute Shader                                    No              Yes             Yes
Atomics                                           No              Yes             Yes
Image Load/Store                                  No              Yes             Yes
Draw Indirect                                     No              Yes             Yes
Texture Gather                                    No              Yes             Yes
Multisample Textures                              No              Yes             Yes
Stencil Textures                                  No              Yes             Yes
Separate Shader Objects                           No              Yes             Yes
Advanced Blending Modes (Programmable Blending)   No              Yes             Yes
Geometry Shaders                                  No              No              Yes
Tessellation Shaders                              No              No              Yes
Adreno 420 GPU highlights
• Improved architecture for performance & efficiency
• Better performance
• Reduced power consumption
[Figure panels]
• ASTC: original image at 24bpp; ASTC compression at 8bpp, 3.56bpp, and 2bpp
• Unified Shaders: pixel, vertex, compute, tessellation, geometry
• FlexRender™ technology: dynamic switching between direct rendering (Adreno GPU, system memory) and tiled rendering (Adreno GPU, tile buffer, system memory)
FlexRender is a product of Qualcomm Technologies, Inc.
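The ASTC bit rates quoted on this slide (8, 3.56, and 2 bpp) are consistent with ASTC's fixed 128-bit block size: the bit rate is 128 divided by the number of texels a block covers. The Python sketch below works through that arithmetic. Mapping the slide's figures to 4x4, 6x6, and 8x8 block footprints is an inference from the numbers, not something the slide states, and the function names are hypothetical.

```python
ASTC_BLOCK_BITS = 128  # every ASTC block encodes to 128 bits, whatever its footprint


def astc_bits_per_pixel(block_width, block_height):
    """Bit rate of an ASTC texture for a given block footprint."""
    return ASTC_BLOCK_BITS / (block_width * block_height)


def compressed_size_bytes(width, height, block_width, block_height):
    """Size of one mip level; partial blocks at the edges still cost a full block."""
    blocks_x = -(-width // block_width)    # ceiling division
    blocks_y = -(-height // block_height)
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8


# Footprints assumed to correspond to the slide's 8, 3.56, and 2 bpp figures.
for footprint in [(4, 4), (6, 6), (8, 8)]:
    print(footprint, round(astc_bits_per_pixel(*footprint), 2))
```

Under these assumptions a 1024x1024 texture takes 3 MiB uncompressed at 24bpp and 256 KiB with 8x8 blocks.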
Adreno 420 architectural improvements
• DX11.2 3D pipeline
− Hardware tessellation
− Geometry shading
− Stream out from VS, DS, GS
− Programmable blending
• Upgraded compute
− Direct compute, OpenCL 1.2 Full profile
− Faster RenderScript
• Improved texturing
− Improved texture performance
− Support for higher level texture filtering (e.g., Aniso) with less performance impact
− ASTC support, better LOD & filtering quality
− Larger caches: texture cache, L2 cache
• Improved ROPs & Z
− Faster depth rejection
− Designed to achieve peak draw rate more often
[Diagram: hardware tessellation pipeline]
• Pipeline stages: Command Processor (Input Assembler) -> Vertex Shader -> Hull Shader (LOD, Control Patch) -> Tessellator -> Domain Shader (Vertex Calculation & Displacement) -> Geometry Shader -> Rasterizer -> Pixel Shader -> Render Backend
• System memory resources: index buffers, vertex buffers, constant buffers, unordered access resources, texture resources, render targets, textures, buffers
• Other blocks: Unified Shader Processor, Frame Buffer, Stream Out
Adreno GPU architecture
Tiled Rendering architecture
Advantages:
• Designed to minimize unnecessary data traffic to host memory
• Designed to minimize power consumption
• Use of transparency / anti-aliasing is inexpensive
Early Z (Depth) Reject feature
[Figure: objects in background, objects in foreground]
Advantages:
• Designed to prevent unnecessary use of GPU resources in drawing pixels for occluded objects
• Designed to increase overall graphics performance for larger scenes with opaque geometry
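To make the early Z reject advantage concrete, here is a small Python sketch of a one-dimensional depth buffer. It is a simplified model, not a description of the Adreno hardware: it assumes opaque geometry, a "less than" depth test, and that a fragment is only shaded when it passes that test. The function name and the scene are hypothetical.

```python
def shaded_fragments(draws, width):
    """Count fragments that reach shading in a 1D scanline.

    draws: list of (depth, x_start, x_end) opaque spans in submission order;
    a smaller depth is closer to the camera.
    """
    depth_buffer = [float("inf")] * width
    shaded = 0
    for depth, x_start, x_end in draws:
        for x in range(x_start, x_end):
            if depth < depth_buffer[x]:   # depth test passes: shade and write depth
                depth_buffer[x] = depth
                shaded += 1
            # otherwise the fragment is rejected before any shading work
    return shaded


# Three full-width layers at increasing distance from the camera.
front_to_back = [(1.0, 0, 8), (5.0, 0, 8), (9.0, 0, 8)]
back_to_front = list(reversed(front_to_back))
print(shaded_fragments(front_to_back, 8), shaded_fragments(back_to_front, 8))
```

In this model the front-to-back order shades each pixel once, while the back-to-front order shades every layer, which is why submission order matters for occluded geometry.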
Adreno GPU architecture
Dynamic FlexRender technology
[Diagram: FlexRender dynamic switching between direct rendering (Adreno GPU, system memory) and tiled rendering (Adreno GPU, GMEM (Tile Buffer), system memory)]
Advantages:
• Better performance and power for wider range of use cases
• More developer flexibility
Double Rate Half Precision (DRHP) design
[Diagram: 1X speed for “highp” shaders, 2X speed for “mediump” shaders]
Advantages:
• Use additional/complex shaders without compromising performance
• Better performance with power efficiency
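The speed-up from half precision comes at the cost of range and resolution, so it helps to check which values survive a 16-bit float. The Python sketch below rounds a value to IEEE 754 half precision with the standard library's `struct` module. It assumes "mediump" maps to fp16, as the deck states for Adreno; GLSL ES itself only guarantees a minimum precision for mediump. The helper names are hypothetical.

```python
import struct


def to_mediump(value):
    """Round a float to IEEE 754 half precision (fp16) and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fp16_error(value):
    """Absolute error introduced by storing the value as fp16."""
    return abs(to_mediump(value) - value)


# Normalized texture coordinates and colors keep enough precision...
print(to_mediump(0.5), fp16_error(0.1))
# ...but large coordinates lose whole units: fp16 steps by 2 above 2048.
print(to_mediump(2048.6))
```

Values such as large texture or world-space coordinates are therefore candidates for `highp`, while colors and normalized coordinates usually fit in `mediump`.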
OpenGL ES optimizations
Optimization: frame buffer objects
Worst case pattern of FBO usage
[Diagram, frame rendering: frame buffer (clear, draw, store), then FBO 0 (load, draw, store), then frame buffer (load, draw, store)]
Optimization: frame buffer objects
Optimized render order
[Diagram, frame rendering: FBO 0 (invalidate framebuffer, draw, store), then frame buffer (clear, draw, store)]
Optimal rendering order:
FBO0 invalidate, FBO0 draw … FBOn invalidate, FBOn draw, FB clear, FB draw
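The difference between the worst-case and optimized FBO orders can be estimated with a toy cost model, sketched in Python below. The model is an assumption, not the driver's actual behavior: each pass on a render target ends with one store of tile memory to system memory, and begins with one load unless the pass starts with a clear or an invalidate (for example `glInvalidateFramebuffer` in OpenGL ES 3.0). The function and target names are hypothetical.

```python
def count_tile_traffic(commands):
    """Return (loads, stores) for a list of (target, op) commands.

    op is "clear", "invalidate", or "draw". A change of target ends the
    current pass and starts a new one.
    """
    loads = stores = 0
    current = None
    fresh = False   # contents were cleared or invalidated before any draw
    drew = False    # the current pass has rendered something
    for target, op in commands:
        if target != current:
            if drew:
                stores += 1            # store the finished pass
            current, fresh, drew = target, False, False
        if op in ("clear", "invalidate"):
            if not drew:
                fresh = True           # previous contents are not needed
        elif op == "draw":
            if not drew and not fresh:
                loads += 1             # old contents must be loaded first
            drew = True
        else:
            raise ValueError(f"unknown op: {op}")
    if drew:
        stores += 1
    return loads, stores


worst = [("FB", "clear"), ("FB", "draw"), ("FBO0", "draw"), ("FB", "draw")]
optimized = [("FBO0", "invalidate"), ("FBO0", "draw"), ("FB", "clear"), ("FB", "draw")]
print(count_tile_traffic(worst), count_tile_traffic(optimized))
```

Under this model the worst-case order costs two loads and three stores per frame, and the optimized order costs no loads and two stores.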
Optimization: dynamic vertex buffer objects
• In the worst case the complete sequence of VBO updates and draw calls may have to be repeated for each bin
• Even when using glBufferSubData, multiple copies of the entire VBO may need to be maintained by the driver
Worst case pattern of VBO usage
[Diagram, frame rendering: Update VBO0, Draw, Update VBO0, Draw, Update VBO0, Draw]
Optimization: dynamic vertex buffer objects
Optimized dynamic VBO order
[Diagram, frame rendering: Update VBO0, Draw VBO0]
Or if multiple dynamic VBOs are used
[Diagram, frame rendering: Update VBO0, Update VBOn, Draw VBO0, Draw VBOn]
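Why interleaved updates are costly can be illustrated with a simplified Python model of buffer versioning. It assumes that once a draw references a buffer, the driver must keep that data alive until the frame is rendered, so a later update to the same buffer needs a fresh copy. Real driver behavior is not specified in the deck; the function and buffer names are hypothetical.

```python
def buffer_copies_needed(commands):
    """Return {buffer_name: copies kept alive} for one frame.

    commands: list of ("update" | "draw", buffer_name) in submission order.
    """
    copies = {}
    drawn_since_update = {}
    for op, name in commands:
        if op == "update":
            # A buffer already referenced by a pending draw cannot be
            # overwritten in place, so the model allocates another copy.
            if name not in copies or drawn_since_update[name]:
                copies[name] = copies.get(name, 0) + 1
            drawn_since_update[name] = False
        elif op == "draw":
            copies.setdefault(name, 1)
            drawn_since_update[name] = True
        else:
            raise ValueError(f"unknown op: {op}")
    return copies


worst_case = [("update", "VBO0"), ("draw", "VBO0")] * 3
batched = [("update", "VBO0"), ("update", "VBO1"), ("draw", "VBO0"), ("draw", "VBO1")]
print(buffer_copies_needed(worst_case), buffer_copies_needed(batched))
```

In this model the update/draw/update/draw pattern keeps three copies of VBO0 alive, while updating several buffers first and drawing afterwards keeps one copy of each.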
Optimization: sorting
Sorting can reduce both the number of state changes and the amount of overdraw, each of which has a negative impact on GPU performance
• Sort by material
− Reduces shader and texture state changes
• Sort opaque draw calls front-to-back
− Reduces time spent shading fragments which will be overwritten later
− Savings of more than 10 ms/frame have been observed in some fragment-bound content with this optimization alone
• Draw the skybox last
− Typically the skybox is covered by foreground geometry in half or more of the screen
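The three sorting rules can be combined into a single sort key, as in the Python sketch below. This is one possible combination, not the deck's prescribed ordering: it puts the skybox last, groups by material, and orders front-to-back within each material. The `DrawCall` type and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrawCall:
    name: str
    material: int          # stand-in for a shader/texture state bundle
    depth: float           # distance from the camera
    is_skybox: bool = False


def sort_opaque(draw_calls):
    """Skybox last, then group by material, then front-to-back."""
    return sorted(draw_calls, key=lambda d: (d.is_skybox, d.material, d.depth))


def material_changes(draw_calls):
    """Number of state changes between consecutive draw calls."""
    return sum(1 for a, b in zip(draw_calls, draw_calls[1:]) if a.material != b.material)


scene = [
    DrawCall("sky", material=9, depth=1000.0, is_skybox=True),
    DrawCall("rock_far", material=1, depth=50.0),
    DrawCall("tree", material=2, depth=10.0),
    DrawCall("rock_near", material=1, depth=5.0),
]
ordered = sort_opaque(scene)
print([d.name for d in ordered], material_changes(scene), material_changes(ordered))
```

An engine that is fragment bound might instead sort primarily by depth and only secondarily by material; which key comes first is a trade-off between state changes and overdraw.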
Optimization: shader performance
Precision
• Operations on 16 bit floating point (mediump) values are 2x faster than on 32 bit (highp)
− Recommend setting default precision to mediump and promoting only values which require higher precision, e.g.:
precision mediump float; // Set default precision in FS to fp16
out vec2 vSmallTexCoord; // Uses mediump
out highp vec2 vLargeTexCoord; // Uses highp
Scalar architecture
• Adreno 3xx and 4xx GPUs utilize a scalar architecture
• Avoid using components that aren’t needed for the final result
• Wherever possible re-order operations to execute on as few components as possible
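The scalar architecture advice on this slide (run operations on as few components as possible) can be illustrated by counting multiplies. The Python sketch below assumes a simple cost model in which a scalar ALU issues one multiply per vector component; that model and the helper names are assumptions for illustration. It compares `(color * a) * b` with `color * (a * b)` for a four-component color.

```python
class OpCounter:
    """Counts scalar multiplies under a one-op-per-component cost model."""

    def __init__(self):
        self.multiplies = 0

    def mul_scalar(self, a, b):
        self.multiplies += 1
        return a * b

    def mul_vec_scalar(self, vec, s):
        self.multiplies += len(vec)
        return [c * s for c in vec]


def scale_naive(counter, color, a, b):
    # (color * a) * b: two full vector multiplies
    return counter.mul_vec_scalar(counter.mul_vec_scalar(color, a), b)


def scale_reordered(counter, color, a, b):
    # color * (a * b): one scalar multiply, then one vector multiply
    return counter.mul_vec_scalar(color, counter.mul_scalar(a, b))


naive, reordered = OpCounter(), OpCounter()
color = [1.0, 2.0, 3.0, 4.0]
result_naive = scale_naive(naive, color, 0.5, 4.0)
result_reordered = scale_reordered(reordered, color, 0.5, 4.0)
print(naive.multiplies, reordered.multiplies)
```

Reordering drops the count from eight multiplies to five in this model. Floating-point multiplication is not associative in general, so the two forms can differ in the last bit for other inputs.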
Optimization: tessellation
Tessellation allows for incredible levels of detail and can substantially reduce memory bandwidth and CPU cycles by allowing other game sub-systems to operate on low resolution representations of meshes, but …
• High levels of tessellation can generate sub-pixel triangles which cause poor rasterizer utilization
− It is very important to use distance, screen space size or other adaptive metrics to compute tessellation factors that avoid sub-pixel triangles
[Figure: full rasterizer utilization vs. partial rasterizer utilization]
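One way to compute an adaptive tessellation factor from screen-space size is sketched in Python below. It is a minimal example under stated assumptions: a perspective projection, a target of roughly 8 pixels per generated edge, and a cap of 64 on the factor. The target size, the cap, and the function names are illustrative choices, not values from the deck; in a real application this logic would live in the tessellation control shader.

```python
import math


def projected_length_px(world_length, distance, fov_y_radians, viewport_height_px):
    """Approximate on-screen length of an edge facing the camera."""
    visible_height = 2.0 * distance * math.tan(fov_y_radians / 2.0)
    return world_length / visible_height * viewport_height_px


def edge_tess_factor(edge_length_px, target_edge_px=8.0, max_factor=64.0):
    """Split an edge so each generated segment is about target_edge_px long."""
    factor = edge_length_px / target_edge_px
    return max(1.0, min(max_factor, factor))


# A 1-unit edge seen from 10 units away, 90 degree vertical FOV, 1080p viewport.
length_px = projected_length_px(1.0, 10.0, math.pi / 2.0, 1080)
print(round(length_px, 1), round(edge_tess_factor(length_px), 2))
```

Because the factor scales with projected size, distant or small patches fall back to a factor of 1 instead of producing sub-pixel triangles.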
Optimization: tessellation
Culling
• Hardware back-face culling occurs after the tessellation stage, which potentially wastes GPU resources tessellating back-facing primitives
• Back-facing primitives can be identified in the TCS and culled by setting their edge tessellation factors to 0
− A slight “fudge” factor may be needed in this calculation if displacement mapping will be used in the TES, as this technique may change the visibility of primitives
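The culling rule above can be sketched as follows. The Python function mirrors what a tessellation control shader might do for a flat patch: take the dot product of the patch normal and the direction toward the camera, and return an edge factor of 0 when the patch faces away by more than a small bias. The bias value, the flat-patch simplification, and the function name are assumptions for illustration.

```python
def patch_edge_factor(normal, to_camera, base_factor, cull_bias=0.2):
    """Return 0.0 to cull a back-facing patch, otherwise base_factor.

    normal: patch normal; to_camera: direction from the patch to the camera.
    Both are expected to be unit length. cull_bias is the "fudge" factor that
    keeps slightly back-facing patches, since displacement may expose them.
    """
    facing = sum(n * c for n, c in zip(normal, to_camera))
    if facing < -cull_bias:
        return 0.0   # an outer tessellation level of 0 discards the patch
    return base_factor


to_camera = (0.0, 0.0, 1.0)
print(patch_edge_factor((0.0, 0.0, 1.0), to_camera, 8.0))      # front-facing
print(patch_edge_factor((0.0, 0.0, -1.0), to_camera, 8.0))     # back-facing, culled
print(patch_edge_factor((0.0, 0.995, -0.1), to_camera, 8.0))   # near silhouette, kept
```

A larger bias keeps more silhouette patches at the cost of tessellating some that end up hidden; the right value depends on the displacement range.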
General
• Whenever possible disable the TCS and TES stages if the tessellation factor for the mesh would be ~1
− Eliminates the use of unnecessary GPU stages
Adreno tools
Graphics content development & tools
[Workflow: Asset Creation -> Compress/Optimize -> Code -> Emulate -> Compile -> Deploy -> Analyze/Debug]
Adreno SDK and Adreno Profiler are products of Qualcomm Technologies, Inc.
Adreno tools
Available on developer.qualcomm.com
Adreno SDK
• Support for OpenGL ES 3.1, 3.0 & 2.0, DirectX, and OpenCL
• Supported on Windows, Mac OSX, and Linux
• Comprehensive collection of utilities
• Over 100 samples and tutorials
• Thorough documentation
Adreno Profiler
• Comprehensive profiling tool
• Supported on Windows, Mac OSX, and Linux
• Enables detailed analysis of GPU utilization
• Proven effective and easy to use
• Works with commercial devices & apps
Adreno Profiler: introduction
Grapher mode: real-time analysis
Scrubber mode: detailed frame analysis
[Screenshot callouts: API call stack, optimization suggestions, shader stats, shader editor, texture browser, detailed frame stats, overrides, metrics, frame emulation, scrubber metrics]
Adreno Profiler demo
Reign of Amira™
Available on Google Play
Reign of Amira is a product of Qualcomm Technologies, Inc.
Adreno SDK
• Desktop OpenGL ES emulator
− Now supporting OpenGL ES 3.1
• Over 100 samples and tutorials
− Simple tutorials to advanced demos
− Covers OpenGL ES 2.0 and 3.0, DirectX, and OpenCL
• Utilities and libraries
− Texture compression
− Mesh optimization
• Adreno texture tool
• Developer documentation
− Adreno Developer Guide
Shader samples
• Animal materials (fur, elephant skin, fish scales, alligators, etc.)
• General lighting (ambient, diffuse, specular, Blinn-Phong, parallax, etc.)
• Human materials (skin, eye, etc.)
• Other effects (environment mapping, warping, glass distortion, god rays, etc.)
• Other materials (cloth, wood, plastic, marble, leather, metal, etc.)
• Advanced rendering (toon shading, deferred lighting, eye adaptation, etc.)
Adreno SDK demo
Reign of Amira™
Available on Google Play
Special thanks
For more information on Qualcomm, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
©2013-2014 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm, Snapdragon, Adreno, Gobi, Hexagon, FlexRender and Reign of Amira are trademarks
of Qualcomm Incorporated, registered in the United States and other countries. Krait and Uplinq
are trademarks of Qualcomm Incorporated. All Qualcomm Incorporated trademarks are used with
permission. Other products and brand names may be trademarks or registered trademarks of
their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm
Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate
structure, as applicable.
Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of
its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm
Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering,
research and development functions, and substantially all of its product and services businesses,
including its semiconductor business, QCT.
Thank you
