Droidcon2013 triangles gangolells_imagination


  1. It's all about triangles! Understanding the GPU in your pocket to write better code
     © Imagination Technologies, www.imgtec.com, April 2013
  2. Introductions
     • Who? Guillem Vinals Gangolells (guillem.vinalsgangolells@imgtec.com), Developer Technology Engineer, PowerVR Graphics
     • What? It's all about triangles! Understanding the GPU in your pocket to write better code
  3. Company overview
     • Leading silicon, software & cloud IP supplier
       ◦ Multimedia: graphics; GPU compute; video; vision
       ◦ Communications: demodulation; connectivity; sensors
       ◦ Processors: applications CPUs; embedded MCUs
       ◦ Cloud: device and user management; services
     • Targeting high-volume, high-growth markets
       ◦ Top semis and OEMs for mobile, connected home, consumer, automotive and more
     • Pure: our strategic product division
       ◦ Digital radio, internet-connected audio, home automation
     • Established technology powerhouse
       ◦ Founded 1985; London FTSE 250 (IMG.L); ~1,500 employees
       ◦ UK HQ; global operations
     • Comprehensive IP portfolio for SoCs & cloud connectivity
     • IP business pathfinder; market maker/driver
  4. A Crash Course in Graphics Architectures
  5. Immediate Mode Renderer (IMR)
     • Buffers kept in system memory
       ◦ High bandwidth use, power consumption & latency
     • Each triangle is processed to completion in submission order
       ◦ Wastes processing time, and thus power, due to “overdraw”
       ◦ ‘Early-Z’ techniques help, but are only as good as your geometry sorting
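To see why ‘Early-Z’ is only as good as your sorting, here is a minimal Python model of a single pixel on an IMR. It assumes opaque geometry, a less-than depth test, and that a fragment is shaded whenever it passes that test against what has already been drawn; real hardware rejects fragments in blocks rather than one at a time, so treat this as a sketch of the principle only.

```python
def shaded_fragments(depths):
    """Count fragment-shader runs at one pixel on an IMR with early-Z.

    depths: depth of each opaque triangle covering the pixel, in
    submission order (smaller values are closer to the camera).
    """
    nearest = float("inf")   # depth buffer starts cleared to "far"
    shaded = 0
    for d in depths:
        if d < nearest:      # early-Z passes, so the fragment is shaded
            shaded += 1
            nearest = d      # depth buffer now holds the new nearest value
    return shaded


layers = [0.2, 0.5, 0.8]     # three overlapping opaque surfaces
print(shaded_fragments(sorted(layers)))                # front to back
print(shaded_fragments(sorted(layers, reverse=True)))  # back to front
```

Sorted front to back, only one of the three fragments is shaded; sorted back to front, all three are, which is the overdraw the slide refers to.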
  6. Concept: Tiling
     • Frame buffer sub-divided into tiles
       ◦ 32x32 pixels per tile, for example; varies by device
     • Geometry is sorted into the tiles it affects
       ◦ Allows each tile to be processed independently
     • Small number of fragments per tile
       ◦ Allows on-chip memory to be used
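The binning step can be sketched in Python as below. It is a simplification under stated assumptions: a fixed 32x32 tile, 2D screen-space vertices, and conservative bounding-box binning (a triangle is listed in every tile its bounding box overlaps, even if its edges miss some of them). How PowerVR hardware actually builds its per-tile lists is not described on the slides.

```python
import math

TILE = 32  # pixels per tile edge; the slide notes this varies by device


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of the triangles that may
    touch it, using each triangle's screen-space bounding box."""
    bins = {}
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for i, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Convert the bounding box to tile coordinates, clamped to the screen.
        tx0 = max(0, math.floor(min(xs)) // tile)
        tx1 = min(max_tx, math.floor(max(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        ty1 = min(max_ty, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(i)
    return bins


tris = [
    [(1, 1), (10, 1), (1, 10)],      # fits inside tile (0, 0)
    [(20, 20), (40, 20), (20, 40)],  # straddles all four tiles
]
print(bin_triangles(tris, 64, 64))
```

On a 64x64 screen (2x2 tiles), tile (0, 0) lists both triangles while the other three tiles list only the second one, so each tile can later be rasterised using just its own short list.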
  7. Tile Based Renderer (TBR)
     • Rasterizing performed per-tile
       ◦ Allows the use of fast, on-chip buffers
     • Each triangle is processed to completion in submission order
       ◦ Wastes processing time, and thus power, due to “overdraw”
       ◦ ‘Early-Z’ techniques help, but are only as good as your geometry sorting
  8. Concept: Deferred Rendering
     • Fragments: a two-stage process
       ◦ Hidden Surface Removal (HSR)
       ◦ Shading
     • HSR is pixel perfect
       ◦ Only visible fragments pass, no ‘overdraw’
       ◦ Only requires position data
       ◦ Less bandwidth & processing, saves power
     • HSR is submission order independent
       ◦ No need for applications to submit geometry front to back
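A minimal Python model of the two stages is sketched below, assuming opaque geometry and a strict less-than depth comparison (on exact depth ties the first-submitted fragment wins, so only ties are order dependent in this model). The shading cost depends on how many pixels are covered, not on how many layers were submitted or in what order.

```python
def deferred_render(fragments):
    """Two-stage deferred rendering for opaque geometry.

    fragments: (pixel, depth, triangle_id) tuples in submission order.
    Stage 1 (HSR) uses only position/depth to find the nearest fragment
    per pixel; stage 2 shades each surviving fragment exactly once.
    """
    visible = {}
    for pixel, depth, tri in fragments:          # stage 1: HSR
        if pixel not in visible or depth < visible[pixel][0]:
            visible[pixel] = (depth, tri)
    image = {}
    shader_runs = 0
    for pixel, (_, tri) in visible.items():      # stage 2: shading
        image[pixel] = tri                       # stand-in for running the fragment shader
        shader_runs += 1
    return image, shader_runs


frags = [((0, 0), 0.8, "far"), ((0, 0), 0.2, "near"), ((1, 0), 0.5, "far")]
print(deferred_render(frags))
print(deferred_render(list(reversed(frags))))
```

Both orders give the same image and exactly two shader runs for two covered pixels, even though three fragments were submitted.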
  9. Tile Based Deferred Renderer (TBDR) = PowerVR
     • Rasterizing performed per-tile
       ◦ Allows the use of fast, on-chip buffers
     • Hidden Surface Removal (HSR) reduces overdraw
       ◦ Pixel perfect and submission order independent: no geometry sorting needed
       ◦ Optimised to only retrieve the information required (*), saving even more bandwidth
     • Saves power and bandwidth
  10. PowerVR Hardware Overview
  11. Pipeline Summary: Geometry Processing
  12. Pipeline Summary: Fragment Processing
  13. Bandwidth Saving
      • Bandwidth usage is the biggest contributor to GPU power consumption
      • Saving bandwidth means staying ‘on chip’ as much as possible
      • It also means throwing away work you don’t need to do
      • PowerVR is designed from the ground up to do all of these
  14. Unified Architecture
  15. Pixel Back End (PBE)
      • Combines sub-samples for on-chip MSAA
      • MSAA
        ◦ Performed per-tile, using sub-sampling
        ◦ Negligible impact on bandwidth
        ◦ Each sub-sample benefits from HSR
        ◦ Series5/5XT: 4x MSAA; Series6: 8x MSAA
      • Performs final format conversions
        ◦ Up-scaling, down-scaling, etc. (internal TrueColour)
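To put rough numbers on “negligible impact on bandwidth”, the Python sketch below resolves sub-samples with a simple box filter (an assumption; the slides do not say which filter the PBE uses) and compares how much colour data leaves the chip when the resolve happens on-chip versus when every sub-sample has to be written out first. It assumes 32-bit colour and ignores framebuffer compression and any extra reads.

```python
def resolve(tile_samples):
    """Box-filter resolve: average the sub-samples of each pixel.

    tile_samples: list of pixels, each a list of (r, g, b) sub-samples.
    """
    resolved = []
    for samples in tile_samples:
        n = len(samples)
        resolved.append(tuple(sum(s[c] for s in samples) / n for c in range(3)))
    return resolved


def colour_bytes_written(width, height, samples, on_chip_resolve, bytes_per_px=4):
    """Colour data written to system memory for one frame (simplified)."""
    per_px = bytes_per_px if on_chip_resolve else bytes_per_px * samples
    return width * height * per_px


edge_pixel = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
print(resolve(edge_pixel))   # a half-covered edge pixel
print(colour_bytes_written(1920, 1080, 4, on_chip_resolve=True))
print(colour_bytes_written(1920, 1080, 4, on_chip_resolve=False))
```

At 1920x1080 with 4x MSAA this model gives 8,294,400 bytes per frame when the resolve stays on-chip, against 33,177,600 bytes when all sub-samples are written out.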
  16. Further Considerations
  17. Micro Kernel
      • Specialised software running on the USSE (Series5) or on its own core (Series6)
      • Allows the GPU and CPU to operate with minimal synchronisation
      • Improves performance by handling interrupts on the GPU
        ◦ Competing solutions handle interrupts on the CPU (in the driver)
  18. Multicore
      • Near-linear performance scaling
        ◦ Small fixed overhead, known at design time
      • Geometry processing is load-balanced
        ◦ Cores share the processing effort
      • Tiling enables parallel fragment processing
        ◦ Any core can work on any tile when available
        ◦ Each tile is self-contained
      • Multi-core logic is handled by the hardware
        ◦ Completely transparent to the developer
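What makes this transparent to developers is that each tile is self-contained. As a loose software analogy only (the hardware does this in silicon, not with threads), the Python sketch below hands tiles to a pool of workers in whatever order they become free; render_tile is a hypothetical stand-in for per-tile fragment work.

```python
from concurrent.futures import ThreadPoolExecutor


def render_tile(tile):
    """Hypothetical per-tile work: its result depends only on the tile itself."""
    tx, ty = tile
    return tile, (tx * 31 + ty * 17) % 256


def render_frame(tiles, cores):
    # Any free worker picks up the next tile; no tile depends on another.
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return dict(pool.map(render_tile, tiles))


tiles = [(tx, ty) for ty in range(4) for tx in range(8)]
print(render_frame(tiles, cores=1) == render_frame(tiles, cores=4))
```

Because no tile reads another tile's result, the frame is identical however many workers there are, which is why the scaling can be near-linear.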
  19. Alpha Blending
      • Tiling GPUs don’t need to reach into system memory to perform an alpha blend
        ◦ The colour buffer is on-chip
      • This means that alpha blending doesn’t cost you any additional bandwidth
        ◦ It also means that alpha blending is fast… very fast
      • HSR will also save you some work by throwing away occluded blending work
      • Remember the order: Opaque, Alpha Test, Alpha Blend
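The read-modify-write that an alpha blend performs is shown in the Python sketch below, using the common “over” blend as an example. The list-of-lists tile is an assumption standing in for the on-chip colour buffer; on an IMR the same read of the destination colour would go to the frame buffer in system memory.

```python
def blend_over(tile, x, y, src_rgb, src_alpha):
    """'Over' blend into an on-chip tile buffer:
    dst = src * alpha + dst * (1 - alpha),
    i.e. what glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) configures.
    Reading dst here touches only the tile, not system memory."""
    dst = tile[y][x]
    tile[y][x] = tuple(s * src_alpha + d * (1.0 - src_alpha)
                       for s, d in zip(src_rgb, dst))


tile = [[(0.0, 0.0, 1.0)]]                     # 1x1 "tile", currently blue
blend_over(tile, 0, 0, (1.0, 0.0, 0.0), 0.5)   # 50% red on top
print(tile[0][0])
```

The pixel ends up as (0.5, 0.0, 0.5); the only memory touched is the tile itself.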
  20. Golden Rules
  21. Common Bottlenecks (based on past observation, from most to least likely)
      • CPU usage (most likely)
      • Bandwidth usage
      • CPU/GPU synchronisation
      • Fragment shader instructions
      • Geometry upload
      • Texture upload
      • Vertex shader instructions
      • Geometry complexity (least likely)
  22. Warning!
      • Some of these rules may seem obvious to you…
      • …we still see them broken every day…
      • …if you know them, please bear with us
  23. Golden Rule 1: Understand Your Target Device
      • No two devices are identical
        ◦ Even when they look the same
        ◦ Different SoCs will have different bottlenecks
        ◦ Make sure you test against different chips
      • Make sure you understand the hardware
        ◦ You don’t want your optimisation to make things worse
      • Clearly, you’re already doing this… you’re here
  24. Golden Rule 2: Don’t Waste GPU Time
      • The principle of “Good Enough”
      • Don’t waste polygons on unneeded detail
      • Textures should never be much larger than their size on screen
        ◦ Why waste time loading a 1Kx1K texture if it’s never going to appear bigger than 128x128?
      • If the user won’t notice it, don’t waste time processing it
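The 1Kx1K versus 128x128 example is easy to quantify. The Python sketch below assumes uncompressed 32-bit RGBA and power-of-two texture sizes; the power-of-two choice is an assumption made to suit MIP-mapping, not a rule stated on the slides.

```python
def texture_bytes(width, height, bits_per_pixel=32):
    """Uncompressed size of a single MIP level, in bytes."""
    return width * height * bits_per_pixel // 8


def good_enough_edge(max_on_screen_px):
    """Smallest power-of-two edge length covering the largest on-screen size."""
    edge = 1
    while edge < max_on_screen_px:
        edge *= 2
    return edge


print(texture_bytes(1024, 1024))   # 1Kx1K RGBA8
print(texture_bytes(128, 128))     # 128x128 RGBA8
print(good_enough_edge(100))       # object never drawn wider than 100 px
```

That is 4,194,304 bytes against 65,536 bytes: 64 times the data for a texture that never shows more than 128x128 pixels of detail.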
  25. Golden Rule 3: Promote Calculations up the Chain
      • Don’t do a calculation you don’t need to do
      • If you can do it once per scene, do it once per scene
      • If you can’t, try to do it per vertex
        ◦ There are generally fewer vertices in a scene than fragments
      • If you can, pre-bake it (e.g. lighting)
      • Remember, ‘Good Enough’
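The Python sketch below makes the “up the chain” idea concrete in two ways: it counts how often a calculation would run at each frequency, and it hoists a scene-constant step (normalising the light direction) out of a per-vertex diffuse term. The 2,000-vertex mesh and 1280x720 coverage are illustrative assumptions.

```python
import math


def invocations(vertices, covered_pixels, frames=1):
    """How often a calculation runs at each frequency."""
    return {
        "per_scene": frames,
        "per_vertex": vertices * frames,
        "per_fragment": covered_pixels * frames,
    }


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def diffuse_per_vertex(normals, light_dir):
    """Lambert term per vertex; the scene-constant light direction is
    normalised once, outside the per-vertex loop."""
    l = normalize(light_dir)   # once per scene, not once per vertex
    return [max(0.0, sum(a * b for a, b in zip(n, l))) for n in normals]


print(invocations(vertices=2000, covered_pixels=1280 * 720))
print(diffuse_per_vertex([(0, 0, 1), (0, 0, -1)], (0, 0, 5)))
```

Moving the work from per-fragment to per-vertex here cuts it from 921,600 runs to 2,000, and the interpolated per-vertex result is often ‘good enough’.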
  26. Golden Rule 4: Don’t Access an Active Render Target
      • Accessing a render target from the CPU is very bad for performance
      • If it’s not done properly, it will synchronise the GPU and CPU… This is Bad™
  27. Golden Rule 4 (cont.): Accessing Render Targets Safely
      • Use EGL_KHR_fence_sync
      • Use CPU-side handles to GPU-mapped memory to avoid blocking calls
        ◦ E.g. GraphicBuffer (or gralloc) on Android
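The non-blocking pattern behind EGL_KHR_fence_sync can be modelled in Python as below. The Fence class is a hypothetical stand-in built on threading.Event, not the EGL API. In EGL the fence is created with eglCreateSyncKHR (type EGL_SYNC_FENCE_KHR) after the render commands, and polling it corresponds to eglClientWaitSyncKHR with a timeout of 0, which returns EGL_TIMEOUT_EXPIRED_KHR while the GPU is still busy.

```python
import threading


class Fence:
    """Hypothetical stand-in for an EGL fence sync object. The GPU side
    calls signal() once the commands issued before the fence complete."""

    def __init__(self):
        self._done = threading.Event()

    def signal(self):
        self._done.set()

    def is_signalled(self):
        # Non-blocking status check, like a wait with a zero timeout.
        return self._done.is_set()


def try_read(render_target, fence):
    """Copy the render target only once its fence has signalled; otherwise
    return None so the CPU can do other work instead of stalling."""
    if fence.is_signalled():
        return list(render_target)
    return None


pixels = [0, 255, 128]
fence = Fence()
print(try_read(pixels, fence))   # GPU still rendering
fence.signal()                   # GPU finished the commands before the fence
print(try_read(pixels, fence))
```

The first read returns None rather than blocking; the second returns the pixel data once the fence has been signalled.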
  28. Golden Rule 5: Avoid Updating Active Assets
      • Assets may need to stay the same for multiple frames
        ◦ We refer to this as an asset’s ‘lifespan’
      • Changing a texture during its lifespan may cause ‘ghosting’
      • Changing a buffer during its lifespan is blocking
      • This can be managed using circular buffers, similarly to render targets
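A minimal Python sketch of the circular-buffer approach follows, assuming the application knows (or conservatively guesses) each asset’s lifespan in frames; AssetRing and next_writable are hypothetical names. Each frame writes to the next copy in the ring, so an update never touches a copy that an earlier, still-in-flight frame may be using.

```python
class AssetRing:
    """Round-robin copies of a dynamic asset (buffer, texture, render target).

    With `copies` at least the asset's lifespan in frames, the copy written
    this frame is never one the GPU may still be reading for an earlier
    frame. Choosing that count is an assumption the application must make.
    """

    def __init__(self, copies, make):
        self.slots = [make() for _ in range(copies)]
        self.frame = 0

    def next_writable(self):
        """Return the copy that is safe to update for the current frame."""
        slot = self.slots[self.frame % len(self.slots)]
        self.frame += 1
        return slot


ring = AssetRing(copies=3, make=list)
frames = [ring.next_writable() for _ in range(4)]
print(frames[0] is frames[3], frames[0] is frames[1])
```

With three copies, frame 3 reuses the copy from frame 0, while consecutive frames always get different copies.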
  29. Golden Rule 6: Use VBOs and Indexed Geometry
      • VBOs benefit from driver-level optimisations
        ◦ Vertex Array Objects (VAOs) may be even better
      • Index your geometry
        ◦ It makes your data smaller
        ◦ It also benefits from driver-level optimisations
      • Use static VBOs ideally, and consider the asset’s lifespan
        ◦ Don’t use a VBO for dynamic data
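To see how much indexing saves, the Python sketch below builds a 10x10 grid of quads as indexed triangles and compares its size with the same triangles stored without an index buffer. The 32-byte vertex (position, normal and UV as 32-bit floats) and 16-bit indices are assumptions chosen for illustration.

```python
VERTEX_BYTES = 32  # assumed layout: position (3) + normal (3) + UV (2) floats
INDEX_BYTES = 2    # 16-bit indices, e.g. GL_UNSIGNED_SHORT


def grid_mesh(n):
    """An n x n grid of quads: shared vertices plus two triangles per quad."""
    verts = [(x, y) for y in range(n + 1) for x in range(n + 1)]
    indices = []
    for y in range(n):
        for x in range(n):
            i = y * (n + 1) + x          # top-left corner of this quad
            right, below = i + 1, i + n + 1
            indices += [i, right, below, right, below + 1, below]
    return verts, indices


verts, indices = grid_mesh(10)
indexed = len(verts) * VERTEX_BYTES + len(indices) * INDEX_BYTES
unindexed = len(indices) * VERTEX_BYTES   # every triangle corner stored in full
print(len(verts), len(indices), indexed, unindexed)
```

Under these assumptions the indexed grid needs 5,072 bytes against 19,200 bytes unindexed, because each interior vertex is stored once instead of up to six times.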
  30. Golden Rule 7: Batch Your Draw Calls
      • Group static objects, and draw them once
        ◦ Static objects are objects that are static relative to each other
      • Sort objects by render state
        ◦ Emphasis on texture and program state changes
      • Try using texture atlases
        ◦ Remember Golden Rule 5 if you’re going to update the contents
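The effect of sorting by render state can be counted directly. The Python sketch below counts program and texture binds for a made-up list of draws before and after sorting by (program, texture); putting the program first in the key is an assumption that program changes cost more. Sorting like this only applies to draws whose order does not matter, such as opaque geometry on a TBDR.

```python
def state_changes(draws):
    """Count program and texture binds needed to issue draws in order."""
    changes, program, texture = 0, None, None
    for d in draws:
        if d["program"] != program:
            changes += 1
            program = d["program"]
        if d["texture"] != texture:
            changes += 1
            texture = d["texture"]
    return changes


draws = [
    {"program": "lit", "texture": "rock"},
    {"program": "sky", "texture": "clouds"},
    {"program": "lit", "texture": "grass"},
    {"program": "lit", "texture": "rock"},
]
sorted_draws = sorted(draws, key=lambda d: (d["program"], d["texture"]))
print(state_changes(draws), state_changes(sorted_draws))
```

The unsorted list needs 7 binds and the sorted one 5; the two “lit/rock” draws also end up adjacent, which makes them candidates for merging into a single batch.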
  31. Golden Rule 8: Compress Your Textures
      • The lower the bitrate, the less bandwidth consumed
      • Use PVRTC & PVRTC2, at 2 & 4 bpp RGB/RGBA
      • Don’t confuse this with PNG or JPG, which are decompressed in memory
        ◦ Usually to 24 bpp or 32 bpp
      • PVRTC is read directly from the compressed form
        ◦ It stays in memory at 2 bpp or 4 bpp
      • Use MIP-mapping, and remember ‘Good Enough’
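The memory (and hence bandwidth) difference between formats is plain arithmetic. The Python sketch below computes the size of a full MIP chain at a given bit rate; it ignores per-format minimum sizes and block padding (real PVRTC data has them), so the smallest levels are only approximate.

```python
def mip_chain_bytes(width, height, bits_per_pixel):
    """Approximate size of a full MIP chain down to 1x1, in bytes."""
    total_bits = 0
    w, h = width, height
    while True:
        total_bits += w * h * bits_per_pixel
        if w == 1 and h == 1:
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total_bits // 8


for label, bpp in [("RGBA8 (decoded PNG)", 32), ("PVRTC 4bpp", 4), ("PVRTC 2bpp", 2)]:
    print(label, mip_chain_bytes(1024, 1024, bpp))
```

For a 1024x1024 texture with MIP maps, that is roughly 5.6 MB at 32 bpp against about 0.7 MB at 4 bpp and 0.35 MB at 2 bpp; in each case the full chain adds about a third to the size of the top level.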
  32. Golden Rule 9: Alpha Test/Discard & Alpha Blend
      • Alpha Test removes the advantages of ‘Early-Z’ techniques and HSR
        ◦ Fragment visibility isn’t known until the fragment shader is run
      • Prefer Alpha Blending, and render in the order Opaque, Alpha Test, Alpha Blend
        ◦ Makes best use of HSR
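A simple way to enforce the recommended order is a stable sort on a per-draw pass category, sketched in Python below. The Pass enum and the (name, pass) tuples are hypothetical. A stable sort keeps the application’s own order inside each group, which matters because blended draws usually still need to be drawn back to front among themselves.

```python
from enum import IntEnum


class Pass(IntEnum):
    OPAQUE = 0
    ALPHA_TEST = 1    # shaders that use discard
    ALPHA_BLEND = 2


def submission_order(draws):
    """Opaque first, then alpha-tested, then blended (sorted() is stable)."""
    return sorted(draws, key=lambda draw: draw[1])


draws = [("glass", Pass.ALPHA_BLEND), ("tree", Pass.ALPHA_TEST),
         ("wall", Pass.OPAQUE), ("floor", Pass.OPAQUE)]
print([name for name, _ in submission_order(draws)])
```

The result is wall, floor, tree, glass: the opaque draws keep their relative order and come first, so HSR can reject as much as possible before any discard or blend work is done.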
  33. Golden Rule 10: Use ‘Clear’ and ‘DiscardFrameBuffer’
      • Calling ‘Clear’ ensures the previous render isn’t reloaded from system memory onto the GPU
      • By default, the depth/stencil buffers are written to memory at the end of a render
        ◦ Calling glDiscardFramebufferEXT(…) ensures these buffers aren’t written to system memory
        ◦ Look for the ‘GL_EXT_discard_framebuffer’ extension
      • Do both if you can!
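A rough model of what ‘Clear’ and discard save per second is sketched in Python below. It assumes 32-bit colour and a 32-bit packed depth/stencil buffer; that without a clear both colour and depth/stencil are reloaded into the tile buffers; that without a discard depth/stencil is written back; and that the final colour is always written. Real drivers may behave differently. In OpenGL ES the discard itself is a call such as glDiscardFramebufferEXT(GL_FRAMEBUFFER, 2, attachments) with GL_DEPTH_ATTACHMENT and GL_STENCIL_ATTACHMENT for a framebuffer object.

```python
def memory_traffic_per_second(width, height, fps, cleared, discarded,
                              colour_bytes=4, depth_stencil_bytes=4):
    """System-memory bytes/s for one render target on a tiler (simplified)."""
    px = width * height
    per_frame = px * colour_bytes                               # final colour store
    if not cleared:
        per_frame += px * (colour_bytes + depth_stencil_bytes)  # reload old contents
    if not discarded:
        per_frame += px * depth_stencil_bytes                   # depth/stencil store
    return per_frame * fps


best = memory_traffic_per_second(1280, 720, 60, cleared=True, discarded=True)
worst = memory_traffic_per_second(1280, 720, 60, cleared=False, discarded=False)
print(best, worst)
```

At 1280x720 and 60 fps this model gives about 221 MB/s with both a clear and a discard, against about 885 MB/s with neither.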
  34. Questions?
      • Or drop us an email: devtech@imgtec.com
      • Download our PowerVR SDK: bit.ly/PVR_SDK
      • Also, you can download examples, tools and shell as an Android SDK add-on: http://install.powervrinsider.com/androidsdk.xml