Droidcon2013 triangles gangolells_imagination

© Imagination Technologies
www.imgtec.com, April 2013
It’s all about triangles!
Understanding the GPU in your pocket to write better code

Introductions
 Who?
 Guillem Vinals Gangolells (guillem.vinalsgangolells@imgtec.com)
 Developer Technology Engineer, PowerVR Graphics
 What?
 It’s all about triangles! Understanding the GPU in your pocket to write better code

Company overview
 Leading silicon, software & cloud IP supplier
 Multimedia: graphics; GPU compute; video; vision
 Communications: demodulation; connectivity; sensors
 Processors: applications CPUs; embedded MCUs
 Cloud: device and user management; services
 Targeting high volume, high growth markets
 Top semis and OEMs for mobile, connected home consumer, automotive and more
 Pure: our strategic product division
 Digital radio, internet connected audio, home automation
 Established technology powerhouse
 Founded 1985; London FTSE 250 (IMG.L); ~1,500 employees
 UK HQ; global operations
Comprehensive IP portfolio for SoCs & cloud connectivity
IP business pathfinder
Market maker/driver

A Crash Course in Graphics Architectures

Immediate Mode Renderer (IMR)
 Buffers kept in system memory
 High bandwidth use, power consumption & latency
 Each triangle is processed to completion in submission order
 Wastes processing time and thus power due to “overdraw”
 ‘Early-Z’ techniques help but are only as good as your geometry sorting

Concept: Tiling
 Frame buffer sub-divided into Tiles
 32x32 pixels per tile, for example
 Varies by device
 Geometry is sorted into affected tiles
 Allows each tile to be processed independently
 Small number of fragments per tile
 Allows on-chip memory to be used
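A minimal Python sketch of the binning step, assuming 32x32 tiles (the slide's example; real sizes vary by device) and a conservative bounding-box overlap test, which is simpler than whatever exact test the hardware uses. The names `tiles_for_triangle` and `bin_triangles` are hypothetical.

```python
TILE_SIZE = 32  # example from the slide; the real tile size varies by device


def tiles_for_triangle(v0, v1, v2, width, height, tile=TILE_SIZE):
    """Return (tx, ty) for every tile overlapped by the triangle's
    screen-space bounding box, clamped to the frame buffer."""
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    min_x, max_x = max(0, min(xs)), min(width - 1, max(xs))
    min_y, max_y = max(0, min(ys)), min(height - 1, max(ys))
    if min_x > max_x or min_y > max_y:
        return []  # entirely off-screen: the triangle lands in no tile
    tx0, tx1 = int(min_x) // tile, int(max_x) // tile
    ty0, ty1 = int(min_y) // tile, int(max_y) // tile
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def bin_triangles(triangles, width, height, tile=TILE_SIZE):
    """Build per-tile lists of triangle indices, kept in submission order."""
    bins = {}
    for index, triangle in enumerate(triangles):
        for key in tiles_for_triangle(*triangle, width, height, tile):
            bins.setdefault(key, []).append(index)
    return bins
```

Each bin can then be rasterised on its own, which is what lets a tile's colour and depth data fit in on-chip memory.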

Tile Based Renderer (TBR)
 Rasterizing performed per-tile
 Allows the use of fast on-chip buffers
 Each triangle is processed to completion in submission order
 Wastes processing time and thus power due to “overdraw”
 ‘Early-Z’ techniques help but are only as good as your geometry sorting

Concept: Deferred Rendering
 Fragments: a two-stage process
 Hidden Surface Removal (HSR)
 Shading
 HSR is pixel perfect
 Only visible fragments pass, no ‘overdraw’
 Only requires position data
 Less bandwidth & processing, saves power
 HSR is submission order independent
 No need for applications to submit geometry front to back
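The difference from the IMR/TBR approach can be shown with a toy model: a list of opaque fragments, each a (pixel, depth) pair in submission order, where a smaller depth means nearer. The model assumes perfect early-Z in the immediate case. The function names are hypothetical, and this is not how PowerVR HSR is implemented, only a model of the result it guarantees.

```python
def shaded_immediate(fragments):
    """Immediate/early-Z model: a fragment is shaded if it is nearer than
    anything already drawn at its pixel, in submission order."""
    depth = {}
    shaded = 0
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z
            shaded += 1
    return shaded


def shaded_deferred(fragments):
    """Deferred model: resolve visibility first (HSR), then shade
    exactly one fragment per covered pixel."""
    nearest = {}
    for pixel, z in fragments:
        if z < nearest.get(pixel, float("inf")):
            nearest[pixel] = z
    return len(nearest)
```

Submitted back to front, as in `[(0, 0.9), (0, 0.5), (0, 0.1), (1, 0.3)]`, the early-Z model shades four fragments for two pixels; reversed, it shades two. The deferred model shades two either way, which is what "submission order independent" means in practice.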

Tile Based Deferred Renderer (TBDR) = PowerVR
 Rasterizing performed per-tile
 Allows the use of fast on-chip buffers
 Hidden Surface Removal (HSR) reduces overdraw
 Pixel perfect, and submission order independent, no geometry sorting needed
 Optimised to only retrieve information required (*), saving even more bandwidth
 Saves power and bandwidth

PowerVR Hardware Overview

Pipeline Summary
Geometry Processing

Pipeline Summary
Fragment Processing

Bandwidth Saving
 Bandwidth usage is the biggest contributor to GPU power consumption
 Saving bandwidth means staying ‘on chip’ as much as possible
 It also means throwing away work you don’t need to do
 PowerVR is designed from the ground up to do all of these

Unified Architecture

Pixel Back End (PBE)
 Combines sub-samples for on-chip MSAA
 MSAA performed per-tile
 Done using sub-sampling
 Negligible impact on bandwidth
 Each sub-sample benefits from HSR
 Series5/5XT: 4x MSAA
 Series6: 8x MSAA
 Performs final format conversions
 Up-scaling, down-scaling, etc. (Internal True Colour)
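A back-of-the-envelope model of why on-chip MSAA has negligible bandwidth impact. It assumes a 32-bit colour buffer and ignores depth, overdraw, blending reads and framebuffer compression. The off-chip path is a simplified stand-in for a renderer that stores every sub-sample in system memory and resolves in a separate pass, so treat the ratio as illustrative. `msaa_colour_traffic` is a hypothetical name.

```python
def msaa_colour_traffic(width, height, samples, bytes_per_sample=4, on_chip=True):
    """Approximate bytes of colour traffic to/from system memory per frame.

    Ignores depth, overdraw, blending reads and framebuffer compression.
    """
    resolved = width * height * bytes_per_sample
    if on_chip:
        # Sub-samples live in the tile buffer; only resolved pixels are written out.
        return resolved
    # Off-chip: every sub-sample is written, read back for the resolve,
    # and the resolved image is written again.
    stored = resolved * samples
    return stored + stored + resolved
```

At 1280x720 with 4x MSAA this gives 3,686,400 bytes for the on-chip case against 33,177,600 bytes for the off-chip case, a factor of nine under these assumptions.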

Further Considerations

Micro Kernel
 Specialised software running on the USSE (Series5) or its own core (Series6)
 Allows the GPU and CPU to operate with minimal synchronisation
 Improves performance by handling interrupts on the GPU
 Competing solutions handle interrupts on the CPU (in the driver)

Multicore
 Near-linear performance scaling
 Small fixed overhead known at design time
 Geometry processing load-balanced
 Cores share the processing effort
 Tiling enables parallel fragment processing
 Any core can work on any tile when available
 Each tile is self-contained
 Multi-core logic is handled by the hardware
 Completely transparent to the developer
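An idealised model of "any core can work on any tile when available": tiles are handed out in order to whichever core becomes free first. It assumes every tile is independent and ignores the geometry phase and the fixed multicore overhead. `schedule_tiles` is hypothetical and is not a description of the actual hardware scheduler.

```python
import heapq


def schedule_tiles(tile_costs, num_cores):
    """Give each tile to the core that becomes free first.

    Returns (finish_time, assignment), where assignment[i] is the core
    that processed tile i.
    """
    cores = [(0, core) for core in range(num_cores)]  # (time the core is free, core id)
    heapq.heapify(cores)
    assignment = []
    for cost in tile_costs:
        free_at, core = heapq.heappop(cores)
        assignment.append(core)
        heapq.heappush(cores, (free_at + cost, core))
    finish_time = max(t for t, _ in cores)
    return finish_time, assignment
```

With eight equal-cost tiles, one core finishes at time 8, two at 4 and four at 2. Uneven tile costs can make the scaling only near linear.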

Alpha Blending
 Tiling GPUs don’t need to reach into system memory to perform an alpha blend
 The colour buffer is on-chip
 This means that alpha blending doesn’t cost you any additional bandwidth
 It also means that alpha blending is fast…very fast
 HSR will also save you some work by throwing away occluded blending work
 Remember: Opaque, Alpha Test, Alpha Blend

Golden Rules

Common Bottlenecks
Based on past observation, from most likely to least likely:
 CPU Usage
 Bandwidth Usage
 CPU/GPU Synchronisation
 Fragment Shader Instructions
 Geometry Upload
 Texture Upload
 Vertex Shader Instructions
 Geometry Complexity

Warning!
Some of these rules may seem obvious to you…
…we still see them broken every day…
…if you know them, please bear with us

Understand Your Target Device
 No two devices are identical
 Even when they look the same
 Different SoCs will have different bottlenecks
 Make sure you test against different chips
 Make sure you understand the hardware
 You don’t want your optimisation to make things worse
 Clearly, you’re already doing this… you’re here ☺
Golden Rule 1

Don’t Waste GPU Time
 The Principle of “Good Enough”
 Don’t waste polygons on unneeded detail
 Textures should never be much larger than their size on screen
 Why waste time loading a 1Kx1K texture if it’s never going to appear bigger than 128x128?
 If the user won't notice it, don’t waste time processing it
Golden Rule 2
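The 1Kx1K example above is easy to quantify. A minimal sketch, assuming uncompressed RGBA8888 (4 bytes per texel) and power-of-two sizes, which OpenGL ES 2.0 requires if you want to mipmap without extensions. Both helper names are hypothetical.

```python
def uncompressed_bytes(width, height, bytes_per_texel=4):
    """Memory for one uncompressed MIP level (4 bytes per texel = RGBA8888)."""
    return width * height * bytes_per_texel


def good_enough_size(largest_on_screen_px):
    """Smallest power-of-two edge length covering the largest on-screen size."""
    size = 1
    while size < largest_on_screen_px:
        size *= 2
    return size
```

A 1024x1024 texture takes 4 MiB against 64 KiB for 128x128, 64 times the memory and upload cost, for a texture that never appears bigger than 128 pixels.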

Promote Calculations Up the Chain
 Don’t do a calculation you don’t need to do
 If you can do it once per scene, do it once per scene
 If you can’t, try to do it per vertex
 There are generally fewer vertices in a scene than fragments
 If you can, pre-bake
 E.g. lighting
 Remember, ‘Good Enough’
Golden Rule 3
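A typical case is the model-view-projection matrix. Multiplying the projection, view and model matrices inside the vertex shader repeats identical work for every vertex, whereas combining them once per draw on the CPU and uploading a single matrix leaves one matrix-vector product per vertex. The Python below models both versions outside any GPU; the function names are hypothetical and matrices are row-major nested lists.

```python
def mat_mul(a, b):
    """Multiply two 4x4 row-major matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def transform_per_vertex(proj, view, model, vertices):
    # Wasteful: rebuilds the same combined matrix for every vertex.
    return [mat_vec(mat_mul(proj, mat_mul(view, model)), v) for v in vertices]


def transform_promoted(proj, view, model, vertices):
    # Promoted: combine once per draw, then one matrix-vector product per vertex.
    mvp = mat_mul(proj, mat_mul(view, model))
    return [mat_vec(mvp, v) for v in vertices]
```

Both give the same result, but each 4x4 product costs 64 multiplies, so the per-vertex version spends 128 extra multiplies on every vertex rebuilding a matrix that never changes within the draw.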

Don’t Access an Active Render Target
 Accessing a render target from the CPU is very bad for performance
 If it’s not done properly, it forces the CPU to wait until the GPU has finished rendering to it… This is Bad™
Golden Rule 4

Accessing Render Targets Safely
 Use EGL_KHR_fence_sync
 Use CPU-side handles to GPU-mapped memory to avoid blocking calls
 E.g. GraphicBuffer (or gralloc) on Android
Golden Rule 4 Cont.

Avoid Updating Active Assets
 Assets may need to stay the same for multiple frames
 We refer to this as an asset’s ‘Lifespan’
 Changing a texture during its lifespan may cause ‘Ghosting’
 Changing a buffer during its lifespan will block
 This can be managed using circular buffers, as with render targets
Golden Rule 5
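A minimal sketch of a circular buffer for one dynamic asset, assuming the driver never lets the CPU run more than `frames_in_flight` frames ahead of the GPU (the real limit depends on the driver). With one more copy than frames in flight, the copy being written is never one the GPU may still be reading. `AssetRing` is hypothetical; in GL each copy would be its own buffer or texture object.

```python
class AssetRing:
    """Round-robin copies of one dynamic asset, e.g. a vertex buffer.

    The GPU may still read a copy for up to `frames_in_flight` frames
    after submission (its lifespan), so that many copies plus the one
    being written are kept.
    """

    def __init__(self, frames_in_flight):
        self.copies = [None] * (frames_in_flight + 1)
        self.index = -1  # the first update writes slot 0

    def update(self, data):
        """Write new data into the oldest copy and return its slot."""
        self.index = (self.index + 1) % len(self.copies)
        self.copies[self.index] = data
        return self.index

    def current(self):
        """The copy the next draw call should use."""
        return self.copies[self.index]
```

With `AssetRing(2)`, six updates use slots 0, 1, 2, 0, 1, 2, so no slot is rewritten until two newer frames have been submitted after it.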

Use VBOs and Indexed Geometry
 VBOs benefit from driver level optimisations
 Vertex Array Objects (VAOs) may be even better
 Index your geometry
 It makes your data smaller
 It also benefits from driver level optimisations
 Ideally, use static VBOs, and consider the asset’s lifespan
 Don’t use a VBO for dynamic data
Golden Rule 6
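To put a number on "it makes your data smaller", this sketch compares a grid of quads stored as an unindexed triangle list with the same grid stored as shared vertices plus 16-bit indices. It assumes 32 bytes per vertex (for example position, normal and one texture coordinate as floats); `grid_sizes` and `grid_indices` are hypothetical helpers.

```python
def grid_sizes(quads_x, quads_y, bytes_per_vertex=32, bytes_per_index=2):
    """Return (unindexed_bytes, indexed_bytes) for a grid of quads,
    two triangles per quad."""
    triangles = quads_x * quads_y * 2
    unindexed = triangles * 3 * bytes_per_vertex
    unique_vertices = (quads_x + 1) * (quads_y + 1)
    indexed = unique_vertices * bytes_per_vertex + triangles * 3 * bytes_per_index
    return unindexed, indexed


def grid_indices(quads_x, quads_y):
    """Triangle-list indices into a (quads_x + 1) by (quads_y + 1) vertex grid.

    16-bit indices limit one buffer to 65,536 vertices; winding order
    depends on your axis conventions and is not checked here.
    """
    stride = quads_x + 1
    indices = []
    for y in range(quads_y):
        for x in range(quads_x):
            i = y * stride + x
            indices += [i, i + stride, i + 1, i + 1, i + stride, i + stride + 1]
    return indices
```

For a 10x10 grid this gives 19,200 bytes unindexed against 5,072 bytes indexed, roughly a quarter of the size.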

Batch Your Draw Calls
 Group static objects, and draw once
 Static objects are objects that are static relative to each other
 Sort objects by render state
 Emphasis on texture and program state changes
 Try using texture atlases
 Remember Golden Rule 5 if you’re going to update the contents
Golden Rule 7
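Sorting by render state can be sketched as follows. Each draw is a dict with hypothetical `program` and `texture` keys, and a bind is counted whenever the (program, texture) pair differs from the previous draw. Putting the program first in the sort key assumes program switches are the more expensive of the two, which is a judgment call to profile on your target.

```python
def state_key(draw, keys=("program", "texture")):
    """The render state a draw needs, most expensive change first."""
    return tuple(draw[k] for k in keys)


def count_state_binds(draws):
    """Number of state binds, counting the first one."""
    binds = 0
    previous = None
    for draw in draws:
        state = state_key(draw)
        if state != previous:
            binds += 1
            previous = state
    return binds


def sort_by_state(draws):
    # Stable sort: draws that share a state keep their relative order.
    return sorted(draws, key=state_key)
```

Blended draws still have to come after opaque ones, and usually in depth order, so this kind of free sorting suits opaque geometry best.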

Compress Your Textures
 The lower the bitrate the less bandwidth consumed
 Use PVRTC & PVRTC2, at 2 & 4bpp RGB/RGBA
 Don’t confuse this with PNG or JPG, which are decompressed in memory
 Usually to 24bpp or 32bpp
 PVRTC is sampled by the GPU directly in its compressed form
 It stays in memory at 2bpp or 4bpp
 Use MIP-Mapping and remember ‘Good Enough’
Golden Rule 8
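The bitrate argument is simple arithmetic. This sketch computes approximate in-memory sizes at the bit rates mentioned above, optionally including a full MIP chain. It ignores any format-specific padding or size restrictions for small MIP levels, so treat the compressed figures as lower bounds. `texture_bytes` is a hypothetical helper.

```python
def texture_bytes(width, height, bits_per_pixel, mip_chain=False):
    """Approximate size in bytes; with mip_chain, sum every level down to 1x1."""
    total_bits = 0
    while True:
        total_bits += width * height * bits_per_pixel
        if not mip_chain or (width == 1 and height == 1):
            break
        width, height = max(1, width // 2), max(1, height // 2)
    return total_bits // 8
```

A 1024x1024 texture drops from 4 MiB at 32bpp (3 MiB at 24bpp) to 512 KiB at 4bpp and 256 KiB at 2bpp, and a full MIP chain adds roughly a third on top.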

Alpha Test/Discard & Alpha Blend
 Alpha Test removes advantages of ‘Early-Z’ techniques and HSR
 Fragment visibility isn’t known until fragment shader is run
 Prefer Alpha Blending, and render in the order Opaque, Alpha Test, Alpha Blend
 Makes best use of HSR
Golden Rule 9
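A minimal sketch of enforcing that order on a draw list. Each draw is a hypothetical (name, Pass) pair. Python's `sorted` is stable, so any order you already established within a group, such as back-to-front for blended objects, is preserved.

```python
from enum import IntEnum


class Pass(IntEnum):
    OPAQUE = 0
    ALPHA_TEST = 1   # shaders that use discard
    ALPHA_BLEND = 2


def order_for_hsr(draws):
    """Opaque first, then alpha-tested, then blended.

    `sorted` is stable, so the existing order inside each group
    (for example back-to-front for blended draws) is kept.
    """
    return sorted(draws, key=lambda draw: draw[1])
```

For example, `[("glass", Pass.ALPHA_BLEND), ("wall", Pass.OPAQUE), ("leaves", Pass.ALPHA_TEST), ("floor", Pass.OPAQUE), ("smoke", Pass.ALPHA_BLEND)]` comes out as wall, floor, leaves, glass, smoke.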

Use ‘Clear’ and ‘DiscardFrameBuffer’
 Calling ‘Clear’ ensures the previous frame’s contents aren’t loaded back from system memory onto the GPU
 By default, the depth/stencil buffers are written to memory at the end of a render
 Calling glDiscardFramebufferEXT(…) ensures these buffers aren’t written to system memory
 Look for the ‘GL_EXT_discard_framebuffer’ extension
Do both if you can!
Golden Rule 10
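A rough model of what the two calls save on a tiler, assuming 32-bit colour and a packed 32-bit depth/stencil buffer, and ignoring MSAA and compression. In OpenGL ES the calls are `glClear` at the start of the frame and `glDiscardFramebufferEXT` at the end; the function below is hypothetical and only does the arithmetic.

```python
def framebuffer_traffic(width, height, cleared, discarded,
                        colour_bpp=4, depth_stencil_bpp=4):
    """Bytes moved between tile memory and system memory for one render.

    Colour is always written out, since it is the image being displayed.
    Without a clear, the previous colour and depth/stencil contents are
    loaded into the tiles first; without a discard, depth/stencil are
    written back at the end of the render.
    """
    pixels = width * height
    traffic = pixels * colour_bpp  # final colour write-out
    if not cleared:
        traffic += pixels * (colour_bpp + depth_stencil_bpp)  # reload previous contents
    if not discarded:
        traffic += pixels * depth_stencil_bpp  # depth/stencil write-back
    return traffic
```

At 1280x720, doing both cuts per-frame traffic in this model from 14,745,600 bytes to 3,686,400 bytes; clearing alone gets it to 7,372,800.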

Questions?
Or drop us an email: devtech@imgtec.com
Download our PowerVR SDK: bit.ly/PVR_SDK
Also, you can download examples, tools and shell as an Android SDK add-on:
http://install.powervrinsider.com/androidsdk.xml
