Qualcomm® Snapdragon™ Processors:
A Super Gaming Platform
Manish Sirdeshmukh, Product Manager, Staff
Todd LeMoine, Engineer, Principal/Manager
Qualcomm Technologies, Inc.
Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.
©2013-2014 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Source: Gartner, October 2013, “Forecast Video Game Ecosystem Worldwide”
Total mobile gaming revenues (for all platforms) are projected to grow from $13 billion in 2013 to
$22 billion in 2015
Gaming on mobile today
[Figure: side-by-side comparison images, PC (desktop PC) and mobile (Snapdragon 805)]
“Epic now has brought Unreal Engine 4 to Android with the Snapdragon 800 and 805 chipsets from Qualcomm
Technologies,” said Niklas Smedberg, Senior Engine Programmer, Epic Games. “Recently we worked with
Qualcomm [QTI] to elevate graphics to the next level on the Snapdragon Adreno GPU hardware, which delivers
some of the most power-efficient unified shader capabilities we’ve seen yet for Android smartphones and tablets.”
What is involved in games?
Image: Modern Combat 5 by Gameloft
• Gameplay execution (animation): animation for water movement and anchored boat motion
• Gameplay execution (AI): enemy helicopter controlled by AI
• Gameplay execution (physics): particle physics makes explosions look real
• Console-quality graphics: lens effect on the sunlight breaking through the clouds
• Console-quality graphics: hi-res textures provide rich details to the scene
• Console-quality graphics: bloom glare from gunfire provides an immersive experience
• Fast connectivity: play a multi-player mission
• High-quality video: after completing the level, watch a cut scene transition
• Responsive and accurate control: control the character movement
• Multi-screen experience: mirror your screen to TV
• Cinema-quality sound: hear gunfire, explosions, bullets flying by, and the helicopter’s rotor blades
How is the SoC utilized by a game?
Heterogeneous hardware blocks and data flow
[Figure: SoC block diagram and data flow, from start to final frame.
Hardware blocks: quad-core CPU (CPU #1 to CPU #4, running physics, animation, game logic, and artificial intelligence), GPU (graphics rendering), video decoder, video encoder, DSP (audio decoder), sensor engine, display engine, Wi-Fi engine, system memory.
Data flow labels: graphics textures, shaders, and geometry; video data; audio data; input signals; GPU reads; graphics pixel writes; video pixel writes; display reads; final frame to display panel; encoded final frame to Wi-Fi; audio to speakers.]
Qualcomm Gobi, Qualcomm Adreno, Qualcomm Hexagon and Krait are products of Qualcomm Technologies, Inc.
Qualcomm® Adreno™ GPU
• Adreno is Qualcomm Technologies,
Inc.’s (QTI) integrated GPU
• Adreno 420 is QTI’s latest integrated
GPU shipping in Snapdragon 805
• Adreno GPUs are custom designed
for mobile use
[Figure: processor block diagram with the following blocks]
• Qualcomm® Krait™ 450 quad-core CPU
• Adreno 420 GPU: OpenGL ES 2.0/3.1*, OpenCL 1.2 Full
• Location: GPS, GLONASS, Beidou, Galileo satellites
• Snapdragon Display Engine: 4K, Miracast, picture enhancement
• Dual ISPs (imaging): up to 55MP, 1.2GPix/s bandwidth
• Camera SW
• USB 3.0
• Multimedia processing: 4K decode, HEVC decode
• Snapdragon Voice Activation
• Gestures
• Studio Access Security
• Memory: 2x64 bit LPDDR3
• Qualcomm® Hexagon™ DSP
• Ultra Low Power Sensor Engine: Fusion 4.5
• Qualcomm® Gobi™ 9x35 modem: 4th gen CAT 6 LTE, up to 3x20MHz CA
*Product is based on a provisional Khronos Specification, and is designed to pass the Khronos Conformance
Testing Process when available. Current conformance status can be found at www.khronos.org/conformance.
Adreno 420 GPU highlights
• Desktop and console quality graphics on mobile
• Complete DirectX11 FL 11_2 pipeline, supports OpenGL ES 3.1
• Support for dynamic hardware tessellation & geometry shaders
Richer, visually immersive graphics
[Figure: the same scene rendered with no tessellation and with tessellation]
Adreno 420 supports most advanced graphics APIs
Feature/API                                       OpenGL ES 3.0   OpenGL ES 3.1   Android Extension Pack
Compute Shader                                    No              Yes             Yes
Atomics                                           No              Yes             Yes
Image Load/Store                                  No              Yes             Yes
Draw Indirect                                     No              Yes             Yes
Texture Gather                                    No              Yes             Yes
Multisample Textures                              No              Yes             Yes
Stencil Textures                                  No              Yes             Yes
Separate Shader Objects                           No              Yes             Yes
Advanced Blending Modes (Programmable Blending)   No              Yes             Yes
Geometry Shaders                                  No              No              Yes
Tessellation Shaders                              No              No              Yes
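The feature table can be turned into a small lookup that answers "which API level does my renderer need?". The sketch below is Python used only as a model; the function name minimum_api and the linear ranking of the three columns are illustrative assumptions, and the data is transcribed from the table as presented.

```python
# Columns of the slide's table, ordered from least to most capable (assumed ranking).
API_LEVELS = ["OpenGL ES 3.0", "OpenGL ES 3.1", "Android Extension Pack"]

# Index into API_LEVELS of the first column that says "Yes" for each feature.
MIN_LEVEL = {
    "Compute Shader": 1,
    "Atomics": 1,
    "Image Load/Store": 1,
    "Draw Indirect": 1,
    "Texture Gather": 1,
    "Multisample Textures": 1,
    "Stencil Textures": 1,
    "Separate Shader Objects": 1,
    "Advanced Blending Modes": 1,
    "Geometry Shaders": 2,
    "Tessellation Shaders": 2,
}


def minimum_api(features):
    """Return the lowest table column that covers every requested feature."""
    level = max((MIN_LEVEL[name] for name in features), default=0)
    return API_LEVELS[level]
```

For example, a renderer that needs compute shaders and tessellation shaders would require the Android Extension Pack column under this table.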
Adreno 420 GPU highlights
• Improved architecture for performance & efficiency
• Better performance
• Reduced power consumption
FlexRender is a product of Qualcomm Technologies, Inc.
[Figure: three panels.
ASTC: original image (24bpp) compared with ASTC compression at 8bpp, 3.56bpp, and 2bpp.
Unified Shaders: pixel, vertex, compute, tessellation, and geometry.
FlexRender™ technology: dynamic switching between direct rendering (Adreno GPU to system memory) and tiled rendering (Adreno GPU to tile buffer to system memory).]
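The ASTC bit rates quoted on this slide follow from simple arithmetic: every ASTC block occupies 128 bits, so the rate depends only on the block footprint. The Python sketch below assumes the slide's 8bpp, 3.56bpp, and 2bpp figures correspond to the 4x4, 6x6, and 8x8 footprints (an inference, not stated in the deck); the function names are hypothetical.

```python
ASTC_BLOCK_BITS = 128  # each ASTC block is 128 bits, whatever its footprint


def astc_bits_per_pixel(block_w, block_h):
    """Bit rate of an ASTC texture with the given block footprint."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_texture_bytes(width, height, block_w, block_h):
    """Storage for one mip level; partial blocks at the edges are rounded up."""
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8


def uncompressed_bytes(width, height, bits_per_pixel=24):
    """Storage for the same level at the slide's 24bpp original format."""
    return width * height * bits_per_pixel // 8
```

Under these assumptions a 1024x1024 level shrinks from 3 MiB at 24bpp to 1 MiB with 4x4 blocks and 256 KiB with 8x8 blocks.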
Adreno 420 architectural improvements
• DX11.2 3D pipeline
− Hardware tessellation
− Geometry shading
− Stream out from VS, DS, GS
− Programmable blending
• Upgraded compute
− Direct compute, OpenCL 1.2 Full profile
− Faster RenderScript
• Improved texturing
− Improved texture performance
− Support for higher level texture filtering (e.g.,
Aniso) with less performance impact
− ASTC support, better LOD & filtering quality
− Larger caches: texture cache, L2 cache
• Improved ROPs & Z
− Faster depth rejection
− Designed to achieve peak draw rate more often
[Figure: hardware tessellation pipeline.
Stages: command processor (input assembler), vertex shader, hull shader (LOD, control patch), tessellator, domain shader (vertex calculation & displacement), geometry shader, rasterizer, pixel shader, render backend, with stream out.
Other labels: unified shader processor; system memory holding index buffers, vertex buffers, constant buffers, unordered access resources, texture resources, render targets, textures, buffers, and the frame buffer.]
Adreno GPU architecture
Tiled Rendering architecture
Advantages:
• Designed to minimize unnecessary data traffic to host memory
• Designed to minimize power consumption
• Use of transparency / anti-aliasing is inexpensive
Early Z (Depth) Reject feature
[Figure: objects in background and objects in foreground]
Advantages:
• Designed to prevent unnecessary use of GPU resources in drawing pixels for occluded objects
• Designed to increase overall graphics performance for larger scenes with opaque geometry
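The benefit of early Z reject can be illustrated with a toy depth buffer. This is a Python model, not GPU code: it assumes a one-dimensional row of pixels, opaque geometry, and a "smaller depth is closer" convention, and the function name shaded_fragments is hypothetical. It counts how many fragments survive the depth test and would therefore be shaded.

```python
def shaded_fragments(draws, width):
    """Count fragments that pass the depth test on a row of `width` pixels.

    Each draw is (x_start, x_end, depth); smaller depth is closer to the camera.
    """
    depth_buffer = [float("inf")] * width
    shaded = 0
    for x_start, x_end, depth in draws:
        for x in range(x_start, x_end):
            if depth < depth_buffer[x]:
                # The fragment is visible so far: it is shaded and written.
                depth_buffer[x] = depth
                shaded += 1
            # Otherwise the fragment is rejected before any shading work.
    return shaded


background = (0, 100, 10.0)   # covers the whole row, far away
foreground = (20, 80, 1.0)    # covers 60 pixels, close to the camera
```

Drawing the foreground first lets the depth test reject the 60 hidden background fragments; drawing the background first shades them only to overwrite them.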
Adreno GPU architecture
Dynamic FlexRender technology
[Figure: FlexRender dynamic switching between direct rendering (Adreno GPU to system memory) and tiled rendering (Adreno GPU to GMEM (tile buffer) to system memory)]
Advantages:
• Better performance and power for wider range of use cases
• More developer flexibility
Double Rate Half Precision (DRHP) design
• 1X speed for “highp” shaders
• 2X speed for “mediump” shaders
Advantages:
• Use additional/complex shaders without compromising performance
• Better performance with power efficiency
Optimization: frame buffer objects
Worst case pattern of FBO usage
[Figure: frame rendering timeline. Frame buffer: clear, draw, store. FBO 0: load, draw, store. Frame buffer: load, draw, store.]
Optimization: frame buffer objects
Optimized render order
[Figure: frame rendering timeline. FBO 0: invalidate framebuffer, draw, store. Frame buffer: clear, draw, store.]
Optimal rendering order:
FBO0 invalidate, FBO0 draw … FBOn invalidate, FBOn draw, FB clear, FB draw
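The difference between the worst-case and optimized FBO orderings can be checked with a small model. The sketch below is Python rather than OpenGL ES code and assumes a simplified tiler: switching render targets stores the current target to system memory, and drawing to a target that was not cleared or invalidated first forces a load of its old contents. The function name count_tile_traffic and the command tuples are hypothetical; in an application the "invalidate" step would correspond to a call such as glInvalidateFramebuffer (OpenGL ES 3.0) or a full-target glClear.

```python
def count_tile_traffic(commands):
    """Return (loads, stores) for a list of (target, op) commands.

    op is "clear", "invalidate", or "draw".
    """
    loads = stores = 0
    current = None     # render target currently bound
    defined = False    # True once tile contents no longer depend on memory
    dirty = False      # True once something was drawn to the current target
    for target, op in commands:
        if target != current:
            if dirty:
                stores += 1          # write the previous target back to memory
            current, defined, dirty = target, False, False
        if op in ("clear", "invalidate"):
            defined = True           # old contents are not needed
        elif op == "draw":
            if not defined:
                loads += 1           # old contents must be read back first
                defined = True
            dirty = True
        else:
            raise ValueError("unknown op: " + op)
    if dirty:
        stores += 1                  # final write-back at end of frame
    return loads, stores


worst_case = [("FB", "clear"), ("FB", "draw"), ("FBO0", "draw"), ("FB", "draw")]
optimized = [("FBO0", "invalidate"), ("FBO0", "draw"), ("FB", "clear"), ("FB", "draw")]
```

Under this model the worst-case pattern costs two loads and three stores, while the optimized order costs no loads and two stores.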
Optimization: dynamic vertex buffer objects
• In the worst case the complete sequence of VBO updates
and draw calls may have to be repeated for each bin
• Even when using glBufferSubData multiple copies of the
entire VBO may need to be maintained by the driver
Worst case pattern of VBO usage
[Figure: frame rendering timeline. Update VBO0, draw, update VBO0, draw, update VBO0, draw.]
Optimization: dynamic vertex buffer objects
Optimized dynamic VBO order
[Figure: frame rendering timeline. Update VBO0, draw VBO0.]
Or if multiple dynamic VBOs are used
[Figure: frame rendering timeline. Update VBO0, update VBOn, draw VBO0, draw VBOn.]
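A frame's command list can be checked for the worst-case VBO pattern by counting updates to a buffer that has already been drawn from in the same frame. The Python sketch below is a model of that check only; the function name count_updates_after_draw and the (op, vbo) tuples are hypothetical and do not correspond to any driver API.

```python
def count_updates_after_draw(commands):
    """Count updates to a VBO that was already used by a draw this frame.

    commands is a list of (op, vbo) tuples, where op is "update" or "draw".
    """
    drawn = set()
    hazards = 0
    for op, vbo in commands:
        if op == "draw":
            drawn.add(vbo)
        elif op == "update":
            if vbo in drawn:
                hazards += 1   # earlier draws still need the old contents
        else:
            raise ValueError("unknown op: " + op)
    return hazards


worst_case = [("update", "VBO0"), ("draw", "VBO0")] * 3
single_vbo = [("update", "VBO0"), ("draw", "VBO0")]
multi_vbo = [("update", "VBO0"), ("update", "VBO1"),
             ("draw", "VBO0"), ("draw", "VBO1")]
```

The worst-case pattern reports two hazards; both optimized orderings report none.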
Optimization: sorting
Sorting draw calls can reduce both the number of state changes and the amount of overdraw, both of which
have a negative impact on GPU performance
• Sort by material
− Reduces shader and texture state changes
• Sort opaque draw calls front-to-back
− Reduces time spent shading fragments which will be overwritten later
− We have observed improvements of more than 10 ms/frame in some fragment-bound content with just this optimization
• Draw the skybox last
− Typically the skybox is covered by foreground geometry in half or more of the screen
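The three sorting rules can be combined into a single sort key. The sketch below is Python used as a model; the DrawCall fields and the choice to sort by material first and by depth within each material are illustrative assumptions (the slide lists the rules but does not say how to prioritize them).

```python
from dataclasses import dataclass


@dataclass
class DrawCall:
    name: str
    material: int          # hypothetical material/shader identifier
    depth: float           # view-space distance from the camera
    is_skybox: bool = False


def sort_opaque(draws):
    """Skybox last, then grouped by material, then front-to-back."""
    return sorted(draws, key=lambda d: (d.is_skybox, d.material, d.depth))


def count_material_changes(ordered):
    """Number of times the bound material changes across the list."""
    changes = 0
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.material != cur.material:
            changes += 1
    return changes


frame = [
    DrawCall("far_rock", material=1, depth=50.0),
    DrawCall("sky", material=9, depth=1000.0, is_skybox=True),
    DrawCall("near_tree", material=2, depth=5.0),
    DrawCall("near_rock", material=1, depth=10.0),
]
```

Fragment-bound content may prefer depth as the primary key and material as the tie-breaker; which order wins depends on where the bottleneck is.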
Optimization: shader performance
Precision
• Operations on 16-bit floating point (mediump) values are 2x faster than on 32-bit (highp)
− Recommend setting the default precision to mediump and promoting only values which require higher
precision, e.g.:
    precision mediump float;       // Set default precision in FS to fp16
    in vec2 vSmallTexCoord;        // Uses mediump
    in highp vec2 vLargeTexCoord;  // Uses highp
Scalar architecture
• Adreno 3xx and 4xx GPUs utilize a scalar architecture
• Avoid using components that aren’t needed for the final result
• Wherever possible re-order operations to execute on as few components as possible
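The precision recommendation raises the question of which values need highp. One way to estimate this for texture coordinates is to round them to 16-bit floats on the CPU and measure the error in texels. The Python sketch below uses the standard struct module's half-precision format; it assumes mediump is stored as IEEE fp16, as the slide states for Adreno, and the helper names are hypothetical.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def texel_error(u, texture_size):
    """Error, in texels, caused by storing coordinate u as fp16."""
    return abs(to_fp16(u) - u) * texture_size


# Centre of texel 128 in a 256-wide texture: exactly representable in fp16.
small = texel_error(128.5 / 256, 256)

# Centre of texel 2049 in a 4096-wide texture: fp16 spacing near 0.5 is 1/2048,
# which is coarser than one texel (1/4096).
large = texel_error(2049.5 / 4096, 4096)
```

Here the small texture shows no error, while the large texture is off by half a texel, which is why the larger coordinate in the shader snippet is promoted to highp.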
Optimization: tessellation
Tessellation allows for incredible levels of detail and can substantially reduce memory
bandwidth and CPU cycles by allowing other game sub-systems to operate on low resolution
representations of meshes, but …
• High levels of tessellation can generate sub-pixel triangles which cause poor rasterizer
utilization
− Very important to utilize distance, screen space size or other adaptive metrics for computing tessellation
factors which avoid sub-pixel triangles
[Figure: full rasterizer utilization compared with partial rasterizer utilization]
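A screen-space metric for the tessellation factor can be sketched as follows. This is a Python model of logic that would normally live in the tessellation control shader; the target of 8 pixels per generated segment and the maximum factor of 64 are assumptions chosen for illustration, and the function name is hypothetical.

```python
import math


def edge_tess_factor(p0, p1, target_px=8.0, max_factor=64.0):
    """Tessellation factor for a patch edge from its on-screen length.

    p0 and p1 are the edge endpoints in pixel coordinates. The edge is split
    so that each generated segment is roughly target_px pixels long, which
    keeps triangles from shrinking below a pixel.
    """
    edge_px = math.dist(p0, p1)
    return max(1.0, min(max_factor, edge_px / target_px))
```

An 80-pixel edge gets a factor of 10, an edge shorter than the target is left untessellated at 1, and very long edges are clamped to the maximum.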
Optimization: tessellation
Culling
• Hardware back-face culling occurs after the tessellation stage, which potentially wastes GPU
resources tessellating back facing primitives
• Back-facing primitives can be identified in the TCS and culled by setting their edge
tessellation factors to 0
− A slight “fudge” factor may be needed in this calculation if displacement mapping will be used in the TES as
this technique may change the visibility of primitives
General
• Whenever possible disable the TCS and TES stages if the tessellation factor for the mesh
would be ~1
− Eliminates the use of unnecessary GPU stages
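The back-face test described under Culling can be modelled on the CPU. The Python sketch below assumes triangle patches with counter-clockwise front faces and a fudge value of 0.1, both of which are illustrative choices; in a shader the same logic would run in the TCS and write the outer tessellation levels. All names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def outer_tess_levels(p0, p1, p2, eye, level, fudge=0.1):
    """Return the three outer levels for a triangle patch, or zeros to cull it."""
    normal = _cross(_sub(p1, p0), _sub(p2, p0))
    to_eye = _sub(eye, p0)
    scale = math.sqrt(_dot(normal, normal)) * math.sqrt(_dot(to_eye, to_eye))
    if scale == 0.0:
        return (level, level, level)      # degenerate input: do not cull
    facing = _dot(normal, to_eye) / scale  # cosine of angle to the viewer
    if facing < -fudge:
        return (0.0, 0.0, 0.0)            # clearly back-facing: cull the patch
    # Front-facing, or close enough to the silhouette that displacement in
    # the TES could still make it visible.
    return (level, level, level)


patch = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))  # normal is +Z
```

The fudge value trades wasted tessellation near the silhouette against the risk of culling patches that displacement mapping would have made visible.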
Graphics content development & tools
[Figure: workflow stages: asset creation, compress/optimize, code, emulate, compile, deploy, analyze/debug]
Adreno SDK and Adreno Profiler are products of Qualcomm Technologies, Inc.
Adreno tools
Adreno SDK
• Support for OpenGL ES 3.1, 3.0 & 2.0, DirectX, and OpenCL
• Supported on Windows, Mac OSX, and Linux
• Comprehensive collection of utilities
• Over 100 samples and tutorials
• Thorough documentation
Adreno Profiler
• Comprehensive profiling tool
• Supported on Windows, Mac OSX, and Linux
• Enables detailed analysis of GPU utilization
• Proven effective and easy to use
• Works with commercial devices & apps
Available on developer.qualcomm.com
Adreno Profiler: introduction
Grapher mode: real-time analysis
Scrubber mode: detailed frame analysis
[Screenshot callouts: API call stack, optimization suggestions, shader stats, shader editor, texture browser, detailed frame stats, overrides, metrics, frame emulation, scrubber metrics]
Adreno Profiler demo
Reign of Amira™
Available on Google Play
Reign of Amira is a product of Qualcomm Technologies, Inc.
Adreno SDK
• Desktop OpenGL ES emulator
− Now supporting OpenGL ES 3.1
• Over 100 samples and tutorials
− Simple tutorials to advanced demos
− Covers OpenGL ES 2.0 and 3.0, DirectX,
and OpenCL
• Utilities and libraries
− Texture compression
− Mesh optimization
• Adreno texture tool
• Developer documentation
− Adreno Developer Guide
Shader samples
• Animal materials (fur, elephant skin, fish scales, alligators, etc.)
• General lighting (ambient, diffuse, specular, Blinn-Phong, parallax, etc.)
• Human materials (skin, eye, etc.)
• Other effects (environment mapping, warping, glass distortion, god rays, etc.)
• Other materials (cloth, wood, plastic, marble, leather, metal, etc.)
• Advanced rendering (toon shading, deferred lighting, eye adaption, etc.)
For more information on Qualcomm, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
©2013-2014 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
Qualcomm, Snapdragon, Adreno, Gobi, Hexagon, FlexRender and Reign of Amira are trademarks
of Qualcomm Incorporated, registered in the United States and other countries. Krait and Uplinq
are trademarks of Qualcomm Incorporated. All Qualcomm Incorporated trademarks are used with
permission. Other products and brand names may be trademarks or registered trademarks of
their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm
Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate
structure, as applicable.
Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of
its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm
Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering,
research and development functions, and substantially all of its product and services businesses,
including its semiconductor business, QCT.
Thank you