Droidcon2013 triangles gangolells_imagination

Transcript

  • 1. It’s all about triangles! Understanding the GPU in your pocket to write better code
    © Imagination Technologies, www.imgtec.com, April 2013
  • 2. Introductions
    Who? Guillem Vinals Gangolells (guillem.vinalsgangolells@imgtec.com), Developer Technology Engineer, PowerVR Graphics
    What? It’s all about triangles! Understanding the GPU in your pocket to write better code
  • 3. Company overview
    Leading silicon, software & cloud IP supplier
    Multimedia: graphics; GPU compute; video; vision
    Communications: demodulation; connectivity; sensors
    Processors: applications CPUs; embedded MCUs
    Cloud: device and user management; services
    Targeting high volume, high growth markets
    Top semis and OEMs for mobile, connected home consumer, automotive and more
    Pure: our strategic product division
    Digital radio, internet connected audio, home automation
    Established technology powerhouse
    Founded 1985; London FTSE 250 (IMG.L); ~1,500 employees
    UK HQ; global operations
    Comprehensive IP portfolio for SoCs & cloud connectivity
    IP business pathfinder
    Market maker/driver
  • 4. A Crash Course in Graphics Architectures
  • 5. Immediate Mode Renderer (IMR)
    Buffers kept in system memory
    High bandwidth use, power consumption & latency
    Each triangle is processed to completion in submission order
    Wastes processing time and thus power due to “overdraw”
    ‘Early-Z’ techniques help but are only as good as your geometry sorting
  • 6. Concept: Tiling
    Frame buffer sub-divided into tiles
    32x32 pixels per tile, for example; varies by device
    Geometry is sorted into affected tiles
    Allows each tile to be processed independently
    Small number of fragments per tile
    Allows on-chip memory to be used
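As a rough illustration of "geometry is sorted into affected tiles", the Python sketch below bins triangles by a conservative screen-space bounding-box test. The function name `bin_triangles`, its arguments and the bounding-box approach are assumptions made for this example; the slides do not describe how the hardware performs binning, and the 32x32 tile size is only the example figure given above.

```python
from collections import defaultdict

TILE_SIZE = 32  # example tile size from the slide; varies by device


def bin_triangles(triangles, width, height, tile_size=TILE_SIZE):
    """Map each tile (tx, ty) to the indices of the triangles whose
    screen-space bounding box overlaps it (a conservative test)."""
    bins = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles that lie entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the frame buffer, then convert to tile coordinates.
        x0 = max(int(min(xs)), 0) // tile_size
        x1 = min(int(max(xs)), width - 1) // tile_size
        y0 = max(int(min(ys)), 0) // tile_size
        y1 = min(int(max(ys)), height - 1) // tile_size
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(index)
    return dict(bins)


triangles = [
    [(0, 0), (20, 0), (0, 20)],      # fits inside tile (0, 0)
    [(30, 30), (40, 30), (30, 40)],  # straddles four tiles
]
bins = bin_triangles(triangles, width=64, height=64)
```

Each entry in `bins` is a self-contained work list, which is what lets a tile be processed independently of the others.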
  • 7. Tile Based Renderer (TBR)
    Rasterizing performed per-tile
    Allows the use of fast, on-chip, buffers
    Each triangle is processed to completion in submission order
    Wastes processing time and thus power due to “overdraw”
    ‘Early-Z’ techniques help but are only as good as your geometry sorting
  • 8. Concept: Deferred Rendering
    Fragments go through a two stage process: Hidden Surface Removal (HSR), then shading
    HSR is pixel perfect
    Only visible fragments pass, no ‘overdraw’
    Only requires position data
    Less bandwidth & processing, saves power
    HSR is submission order independent
    No need for applications to submit geometry front to back
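To make the overdraw argument concrete, here is a small Python model that counts how many fragments would be shaded with an early depth test in submission order versus with visibility resolved before shading. It is a toy model under stated assumptions (opaque geometry only, one depth value per draw, a hypothetical `count_shaded_fragments` helper), not a description of the PowerVR hardware.

```python
def count_shaded_fragments(draws, num_pixels):
    """draws: list of (depth, covered_pixels) for opaque geometry, in
    submission order; a smaller depth is nearer to the camera.
    Returns (early_z_count, deferred_count)."""
    nearest = [float("inf")] * num_pixels
    early_z = 0
    for depth, pixels in draws:
        for p in pixels:
            if depth < nearest[p]:  # passes the depth test, so it is shaded
                nearest[p] = depth
                early_z += 1
    # Deferred: visibility is resolved first, so each covered pixel is shaded once.
    deferred = sum(1 for d in nearest if d != float("inf"))
    return early_z, deferred


full_screen = range(16)  # a 16-pixel "screen" fully covered by every draw
back_to_front = [(3.0, full_screen), (2.0, full_screen), (1.0, full_screen)]
front_to_back = list(reversed(back_to_front))

worst_case = count_shaded_fragments(back_to_front, 16)
best_case = count_shaded_fragments(front_to_back, 16)
```

In this model the early depth test only matches the deferred count when the application happens to submit front to back, while the deferred count is the same for either order.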
  • 9. Tile Based Deferred Renderer (TBDR) = PowerVR
    Rasterizing performed per-tile
    Allows the use of fast, on-chip, buffers
    Hidden Surface Removal (HSR) reduces overdraw
    Pixel perfect, and submission order independent, no geometry sorting needed
    Optimised to only retrieve information required (*), saving even more bandwidth
    Saves power and bandwidth
  • 10. PowerVR Hardware Overview
  • 11. Pipeline Summary: Geometry Processing
  • 12. Pipeline Summary: Fragment Processing
  • 13. Bandwidth Saving
    Bandwidth usage is the biggest contributor to GPU power consumption
    Saving bandwidth means staying ‘on chip’ as much as possible
    It also means throwing away work you don’t need to do
    PowerVR is designed from the ground up to do all of these
  • 14. Unified Architecture
  • 15. Pixel Back End (PBE)
    Combines sub-samples for on-chip MSAA
    MSAA performed per-tile
    Done using sub-sampling
    Negligible impact on bandwidth
    Each sub-sample benefits from HSR
    Series5/5XT: 4x MSAA
    Series6: 8x MSAA
    Performs final format conversions
    Up scaling, down scaling etc.
    (Internal TrueColour)
  • 16. Further Considerations
  • 17. Micro Kernel
    Specialised software running on the USSE (Series5) or its own core (Series6)
    Allows the GPU and CPU to operate with minimal synchronisation
    Improves performance by handling interrupts on the GPU
    Competing solutions handle interrupts on CPU (in the driver)
  • 18. Multicore
    Near linear performance scaling
    Small fixed overhead known at design time
    Geometry processing load-balanced
    Cores share the processing effort
    Tiling enables parallel fragment processing
    Any core can work on any tile when available
    Each tile is self-contained
    Multi-core logic is handled by the hardware
    Completely transparent to the developer
  • 19. Alpha Blending
    Tiling GPUs don’t need to reach into system memory to perform an alpha blend
    The colour buffer is on-chip
    This means that alpha blending doesn’t cost you any additional bandwidth
    It also means that alpha blending is fast… very fast
    HSR will also save you some work by throwing away occluded blending work
    Remember: Opaque, Alpha Test, Alpha Blend
  • 20. Golden Rules
  • 21. Common Bottlenecks
    Based on past observation, from most likely to least likely:
    CPU Usage
    Bandwidth Usage
    CPU/GPU Synchronisation
    Fragment Shader Instructions
    Geometry Upload
    Texture Upload
    Vertex Shader Instructions
    Geometry Complexity
  • 22. Warning!
    Some of these rules may seem obvious to you…
    …we still see them broken every day…
    …if you know them, please bear with us
  • 23. Golden Rule 1: Understand Your Target Device
    No two devices are identical
    Even when they look the same
    Different SoCs will have different bottlenecks
    Make sure you test against different chips
    Make sure you understand the hardware
    You don’t want your optimisation to make things worse
    Clearly, you’re already doing this… you’re here
  • 24. Golden Rule 2: Don’t Waste GPU Time
    The Principle of “Good Enough”
    Don’t waste polygons on un-needed detail
    Textures should never be much larger than their size on screen
    Why waste time loading a 1Kx1K texture if it’s never going to appear bigger than 128x128?
    If the user won’t notice it, don’t waste time processing it
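The 1Kx1K versus 128x128 question can be put into numbers with a short Python sketch. Both helpers are hypothetical names introduced for this example, and rounding the on-screen size up to a power of two is an assumption about how assets are authored, not something the slide states.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def texel_overhead(texture_size, on_screen_size):
    """How many times more texels a square texture holds than its
    largest on-screen footprint can ever show."""
    return (texture_size * texture_size) / (on_screen_size * on_screen_size)


overhead = texel_overhead(1024, 128)    # the slide's 1Kx1K vs 128x128 case
good_enough = next_power_of_two(200)    # a texture shown at most 200 pixels wide
```

Here `overhead` works out to 64, meaning the 1Kx1K texture carries 64 times more texels than can appear on screen, and `good_enough` suggests a 256x256 asset for a 200 pixel footprint.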
  • 25. Golden Rule 3: Promote Calculations Up the Chain
    Don’t do a calculation you don’t need to do
    If you can do it once per scene, do it once per scene
    If you can’t, try to do it per vertex
    There are generally fewer vertices in a scene than fragments
    If you can, pre-bake, e.g. lighting
    Remember, ‘Good Enough’
  • 26. Golden Rule 4: Don’t Access an Active Render Target
    Accessing a render target from the CPU is very bad for performance
    If it’s not done properly it will synchronise the GPU and CPU… This is Bad™
  • 27. Golden Rule 4 (cont.): Accessing Render Targets Safely
    Use EGL_KHR_fence_sync
    Use CPU side handles to GPU mapped memory to avoid blocking calls
    E.g. GraphicBuffer (or gralloc) on Android
  • 28. Golden Rule 5: Avoid Updating Active Assets
    Assets may need to stay the same for multiple frames
    We refer to this as an asset’s ‘Lifespan’
    Changing a texture during its lifespan may cause ‘Ghosting’
    Changing a buffer during its lifespan is blocking
    This can be managed using circular buffers, similarly to render targets
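One way to picture the circular-buffer approach is the Python sketch below, which hands out a different buffer each frame so that a buffer still inside its lifespan is not overwritten. The `BufferRing` class, the `make_buffer` callback and the default of three buffers are all assumptions for illustration; in a real renderer the entries would be GPU buffer objects and the ring size would depend on how many frames the driver keeps in flight.

```python
class BufferRing:
    """Hands out a different buffer on every call, cycling through a
    fixed pool, so a buffer is only reused after the others have been used."""

    def __init__(self, make_buffer, count=3):
        # count is a guess at the asset lifespan in frames (an assumption).
        self._buffers = [make_buffer() for _ in range(count)]
        self._next = 0

    def acquire(self):
        """Return the buffer to fill for the current frame."""
        buf = self._buffers[self._next]
        self._next = (self._next + 1) % len(self._buffers)
        return buf


# Plain bytearrays stand in for GPU buffers in this sketch.
ring = BufferRing(lambda: bytearray(16), count=3)
frames = [ring.acquire() for _ in range(4)]
```

The first three frames each receive their own buffer, and the fourth wraps around to the first one, by which point its earlier contents are assumed to be out of use.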
  • 29. Golden Rule 6: Use VBOs and Indexed Geometry
    VBOs benefit from driver level optimisations
    Vertex Array Objects (VAOs) may be even better
    Index your geometry
    It makes your data smaller
    It also benefits from driver level optimisations
    Use static VBOs ideally, and consider the asset’s lifespan
    Don’t use a VBO for dynamic data
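The "it makes your data smaller" point about indexing can be sketched in Python by collapsing repeated vertices into a unique list plus an index list. `build_index_buffer` is a hypothetical helper written for this example; a real pipeline would usually do this in the asset exporter and compare full vertex attributes, not just positions.

```python
def build_index_buffer(vertices):
    """Collapse duplicate vertices and return (unique_vertices, indices)."""
    unique = []
    lookup = {}
    indices = []
    for vertex in vertices:
        if vertex not in lookup:
            lookup[vertex] = len(unique)
            unique.append(vertex)
        indices.append(lookup[vertex])
    return unique, indices


# A quad drawn as two triangles: six vertices, two of them repeated.
quad = [(0, 0), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1)]
unique, indices = build_index_buffer(quad)
```

For the quad this leaves four stored vertices and six small indices in place of six full vertices, and the saving grows with meshes where most vertices are shared by several triangles.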
  • 30. Golden Rule 7: Batch Your Draw Calls
    Group static objects, and draw once
    Static objects are objects that are static relative to each other
    Sort objects by render state
    Emphasis on texture and program state changes
    Try using texture atlases
    Remember Golden Rule 5 if you’re going to update the contents
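The effect of sorting by render state can be shown with a small Python count of how often the (program, texture) pair changes between consecutive draws. The state names and the `count_state_changes` helper are made up for this sketch, and it assumes opaque objects whose draw order can be rearranged freely.

```python
def count_state_changes(draws):
    """draws: list of (program, texture) pairs in submission order.
    Counts how many times the pair differs from the previous draw."""
    changes = 0
    previous = None
    for state in draws:
        if state != previous:
            changes += 1
            previous = state
    return changes


draws = [
    ("lit", "wood"), ("unlit", "sky"), ("lit", "wood"),
    ("lit", "stone"), ("unlit", "sky"), ("lit", "stone"),
]
unsorted_changes = count_state_changes(draws)
sorted_changes = count_state_changes(sorted(draws))
```

In this example the submission order as given switches state on every draw, while the sorted order switches only once per distinct state pair.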
  • 31. Golden Rule 8: Compress Your Textures
    The lower the bitrate the less bandwidth consumed
    Use PVRTC & PVRTC2, at 2 & 4bpp RGB/RGBA
    Don’t confuse this with PNG or JPG which are decompressed in memory
    Usually to 24bpp or 32bpp
    PVRTC is read directly from the compressed form
    It stays in memory at 2bpp or 4bpp
    Use MIP-Mapping and remember ‘Good Enough’
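The bitrates on this slide translate directly into memory footprints. The Python sketch below works them out for a single 1024x1024 texture level; the texture size is an assumption chosen for the example, and the calculation ignores MIP levels and any per-format minimum sizes.

```python
BITS_PER_PIXEL = {
    "32bpp uncompressed": 32,
    "24bpp uncompressed": 24,
    "PVRTC 4bpp": 4,
    "PVRTC 2bpp": 2,
}


def texture_bytes(width, height, bits_per_pixel):
    """Bytes needed for one texture level at the given bitrate."""
    return width * height * bits_per_pixel // 8


sizes = {
    name: texture_bytes(1024, 1024, bpp)
    for name, bpp in BITS_PER_PIXEL.items()
}
```

Under these assumptions the same image occupies 4 MiB at 32bpp but 512 KiB at 4bpp and 256 KiB at 2bpp, and every texture fetch moves correspondingly less data.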
  • 32. Golden Rule 9: Alpha Test/Discard & Alpha Blend
    Alpha Test removes advantages of ‘Early-Z’ techniques and HSR
    Fragment visibility isn’t known until fragment shader is run
    Prefer Alpha Blending, and render in the order Opaque, Alpha Test, Alpha Blend
    Makes best use of HSR
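The recommended submission order can be expressed as a stable sort on a per-object pass tag, as in the Python sketch below. The constants, the dictionary layout and `order_for_submission` are hypothetical names for this example; how an engine actually tags its objects is outside what the slides cover.

```python
OPAQUE, ALPHA_TEST, ALPHA_BLEND = 0, 1, 2


def order_for_submission(objects):
    """Return objects ordered Opaque, then Alpha Test, then Alpha Blend.
    sorted() is stable, so objects keep their relative order within a pass."""
    return sorted(objects, key=lambda obj: obj["pass"])


scene = [
    {"name": "window", "pass": ALPHA_BLEND},
    {"name": "terrain", "pass": OPAQUE},
    {"name": "fence", "pass": ALPHA_TEST},
    {"name": "rock", "pass": OPAQUE},
]
ordered = [obj["name"] for obj in order_for_submission(scene)]
```

With this scene the opaque terrain and rock are submitted first, followed by the alpha-tested fence and finally the blended window.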
  • 33. Golden Rule 10: Use ‘Clear’ and ‘DiscardFrameBuffer’
    Calling ‘Clear’ ensures the previous render isn’t uploaded to the GPU
    By default, the depth/stencil buffers are written to memory at the end of a render
    Calling DiscardFrameBufferExt(…) ensures these buffers aren’t written to system memory
    Look for the ‘GL_EXT_discard_framebuffer’ extension
    Do both if you can!
  • 34. Questions?
    Or drop us an email: devtech@imgtec.com
    Download our PowerVR SDK: bit.ly/PVR_SDK
    Also, you can download examples, tools and shell as an Android SDK add-on: http://install.powervrinsider.com/androidsdk.xml
  • 35. www.imgtec.com, April 2013