
Advanced Mobile Optimizations.ppt



Published in: Technology

  1. 1. Advanced Mobile Optimizations How to go to 60 fps after you have removed all Sleep calls ;-)
  2. 2. Disclaimer <ul><li>The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity </li></ul>
  3. 3. Optimization Mindset <ul><li>you can't just make your game faster </li></ul><ul><ul><li>there is no magic bullet </li></ul></ul><ul><ul><li>very specific stuff </li></ul></ul><ul><ul><ul><li>not the same as scripting a character </li></ul></ul></ul>
  4. 4. Optimization Mindset <ul><li>not in specific order </li></ul><ul><li>know </li></ul><ul><li>think </li></ul><ul><li>measure </li></ul>
  5. 5. Optimization Mindset <ul><li>You can't avoid any of that </li></ul><ul><ul><li>no, really </li></ul></ul>
  6. 6. Optimization Mindset <ul><li>know + think = shoot in the dark </li></ul><ul><ul><li>you just write code hoping for the best </li></ul></ul><ul><li>know + measure = shoot in the dark </li></ul><ul><ul><li>you are missing the &quot;understand&quot; part </li></ul></ul><ul><li>think + measure = shoot in the dark </li></ul><ul><ul><li>you solve an abstract problem, not the real one </li></ul></ul>
  7. 7. Optimization Mindset: know + think <ul><li>hardware is more complex than you think </li></ul><ul><ul><li>highly parallel </li></ul></ul><ul><ul><li>deep pipelining </li></ul></ul><ul><ul><li>even when you write asm, it is already high-level </li></ul></ul>
  8. 8. Optimization Mindset: know + measure <ul><li>knowledge is static </li></ul><ul><li>knowledge comes from the past </li></ul><ul><li>knowledge is general </li></ul>
  9. 9. Optimization Mindset: know + measure <ul><li>qsort vs bubble sort </li></ul><ul><ul><li>sure, qsort is faster </li></ul></ul><ul><li>but you are missing the point </li></ul><ul><ul><li>maybe radix? </li></ul></ul><ul><ul><li>maybe no need to sort? </li></ul></ul><ul><ul><li>maybe insertion? </li></ul></ul><ul><ul><li>parallel sorting network? </li></ul></ul>
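The "maybe radix?" question above can be made concrete. This is a minimal sketch, assuming the keys are non-negative integers below 2**16 (for example, hypothetical per-object depth or material keys); the function name is made up for illustration. When the data has that shape, two counting passes can replace a comparison sort entirely.

```python
def radix_sort_u16(keys):
    """Sort integers in [0, 65535] with two 8-bit counting passes (LSD radix)."""
    for shift in (0, 8):  # low byte first, then high byte
        counts = [0] * 256
        for k in keys:
            counts[(k >> shift) & 0xFF] += 1
        # Turn per-digit counts into start offsets (exclusive prefix sum).
        total = 0
        for i in range(256):
            counts[i], total = total, total + counts[i]
        # Scatter keys to their slots; each pass is stable.
        out = [0] * len(keys)
        for k in keys:
            d = (k >> shift) & 0xFF
            out[counts[d]] = k
            counts[d] += 1
        keys = out
    return keys
```

Whether this beats a library sort on a given device is exactly the kind of thing that has to be measured; the point is only that knowing the data opens up options that "qsort vs bubble sort" never considers.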
  10. 10. Optimization Mindset: think + measure <ul><li>solving abstract problem </li></ul><ul><ul><li>example: GPU </li></ul></ul><ul><ul><ul><li>optimizing for RIVA TNT and GTX is different </li></ul></ul></ul>
  11. 11. Optimization Mindset <ul><li>well, if you are missing two of the three </li></ul><ul><ul><li>no comments </li></ul></ul>
  12. 12. Know <ul><li>your hardware </li></ul><ul><li>your data </li></ul><ul><ul><li>knowing data is interleaved with think </li></ul></ul><ul><ul><li>we will talk more of it in &quot;think&quot; </li></ul></ul>
  13. 13. Know your hardware <ul><li>GPU </li></ul><ul><li>CPU </li></ul><ul><li>whatever </li></ul><ul><ul><li>e.g. disk load speed </li></ul></ul>
  14. 14. Know your hardware: GPU <ul><li>Pipeline </li></ul><ul><ul><li>meaning - slow step = slow everything </li></ul></ul><ul><ul><li>you are as slow as your bottleneck </li></ul></ul><ul><li>Know your pipeline </li></ul><ul><li>Won't go into full pipeline spec </li></ul><ul><ul><li>Resources section </li></ul></ul><ul><li>Just common/biggest problems </li></ul>
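A rough way to see "you are as slow as your bottleneck": if the stages run concurrently as a pipeline, the steady-state frame time is set by the slowest stage, not by the sum. The stage names and timings below are invented for illustration.

```python
def frame_time_ms(stage_ms):
    """Return (bottleneck stage name, steady-state ms per frame) for pipelined stages."""
    name = max(stage_ms, key=stage_ms.get)
    return name, stage_ms[name]


# Hypothetical per-frame costs in milliseconds.
stages = {"cpu_submit": 9.0, "vertex": 4.0, "fragment": 21.0}
bottleneck, ms = frame_time_ms(stages)

# Optimizing a stage that is not the bottleneck changes nothing.
faster_vertex = dict(stages, vertex=2.0)
```

In this made-up case, halving the vertex cost leaves the frame at 21 ms, which is why the bottleneck has to be found before anything is optimized.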
  15. 15. Know your hardware: GPU Geometry <ul><li>pre/post tnl cache </li></ul><ul><ul><li>should use indexed geometry or not </li></ul></ul><ul><li>cache hit rate </li></ul><ul><ul><li>strips vs tri list </li></ul></ul><ul><li>memory throughput </li></ul><ul><ul><li>vertex size </li></ul></ul><ul><li>fetch cost (memory) </li></ul><ul><ul><li>pack attributes or not </li></ul></ul>
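The post-transform cache hit rate can be estimated offline from an index buffer. The sketch below assumes a simple FIFO cache of 16 entries; real cache sizes and replacement policies vary by GPU, so both are assumptions here. It reports ACMR, the average number of vertex transforms per triangle (lower is better; 3.0 means no reuse at all).

```python
from collections import deque


def acmr(indices, cache_size=16):
    """Average cache miss ratio of a triangle list under a FIFO vertex cache."""
    cache = deque(maxlen=cache_size)  # appending to a full deque evicts the oldest entry
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses / (len(indices) // 3)
```

Two triangles sharing an edge cost 4 transforms when indexed (`[0, 1, 2, 2, 1, 3]`) and 6 when every vertex is duplicated, which is the difference the "indexed geometry or not" bullet is pointing at.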
  16. 16. Know your hardware: GPU Textures <ul><li>Texture Cache </li></ul><ul><ul><li>swizzle </li></ul></ul><ul><ul><li>compression </li></ul></ul><ul><ul><li>mip-maps </li></ul></ul><ul><li>Biggest memory hog </li></ul>
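To put a number on "biggest memory hog", texture size can be estimated from dimensions and bits per pixel. This is a back-of-the-envelope sketch: it ignores the minimum block sizes, padding, and alignment that real compressed formats and drivers impose, and the function name is made up.

```python
def texture_bytes(width, height, bits_per_pixel, mipmaps=True):
    """Approximate texture memory in bytes, optionally including the full mip chain."""
    total = 0
    while True:
        total += (width * height * bits_per_pixel + 7) // 8  # round each level up to whole bytes
        if not mipmaps or (width == 1 and height == 1):
            return total
        width, height = max(1, width // 2), max(1, height // 2)
```

Under these assumptions a 1024x1024 texture at 32 bits per pixel takes 4 MiB without mips and about a third more with them, while the same texture at 4 bits per pixel (a rate offered by PVRTC, for example) fits in well under 1 MiB including mips.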
  17. 17. Know your hardware: GPU Shaders <ul><li>VertexProgram vs FragmentShader </li></ul><ul><ul><li>balancing </li></ul></ul><ul><ul><li>attributes </li></ul></ul><ul><li>Unified Shaders </li></ul><ul><ul><li>load balancing </li></ul></ul><ul><li>Precision </li></ul><ul><ul><li>gles: highp/mediump/lowp </li></ul></ul><ul><ul><li>Cg: float/half/fixed (iirc) </li></ul></ul>
  18. 18. Know your hardware: GPU Rasterization <ul><li>Fillrate (memory speed) </li></ul><ul><ul><li>alpha </li></ul></ul><ul><li>2x2 samples (or more) </li></ul><ul><ul><li>why geometry LOD matters </li></ul></ul>
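Two bits of arithmetic behind this slide, as a sketch with assumed numbers. The first estimates how many pixels per second a given resolution, overdraw factor, and frame rate demand. The second shows why tiny triangles are expensive when fragments are shaded in 2x2 quads: a triangle covering one pixel still pays for four. Function names and example values are illustrative only.

```python
def pixels_per_second(width, height, overdraw, fps):
    """Fragments that must be shaded per second for a given average overdraw."""
    return width * height * overdraw * fps


def quad_shading_efficiency(covered_pixels, touched_quads):
    """Fraction of shaded fragments that are visible, assuming 2x2 quad shading."""
    return covered_pixels / (4 * touched_quads)
```

For example, a 960x640 screen with an average overdraw of 3 at 60 fps needs about 110 million shaded pixels per second, and a one-pixel triangle runs at 25% shading efficiency, which is the case geometry LOD helps avoid.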
  19. 19. Know your hardware: CPU <ul><li>Mobile = in-order RISC </li></ul><ul><ul><li>for stupid code far worse than CISC </li></ul></ul><ul><li>2 main issues: </li></ul><ul><ul><li>Memory speed </li></ul></ul><ul><ul><li>Computation speed </li></ul></ul>
  20. 20. Know your hardware: CPU Memory <ul><li>This is the single most important factor </li></ul><ul><ul><li>memory access is far slower than computation </li></ul></ul><ul><li>Latency vs Throughput </li></ul><ul><li>Caches </li></ul><ul><ul><li>fast memory </li></ul></ul><ul><ul><li>your best friend </li></ul></ul><ul><ul><li>L1/L2/whatever </li></ul></ul><ul><li>LHS (load-hit-store) </li></ul>
  21. 21. Know your hardware: CPU Computations <ul><li>SIMD </li></ul><ul><ul><li>better memory usage </li></ul></ul><ul><ul><li>better arithmetic usage (4 vals instead of 1) </li></ul></ul>
  22. 22. Know your target hardware <ul><li>Those were general rules </li></ul><ul><li>But you are running on that particular piece of sh... hardware </li></ul>
  23. 23. Know your target hardware: PowerVR <ul><li>TBDR </li></ul><ul><ul><li>perfect hidden surface removal </li></ul></ul><ul><ul><li>Alpha-Test/discard </li></ul></ul><ul><li>shader precision </li></ul><ul><li>unified shaders </li></ul><ul><li>Tegra / ATI-AMD / Adreno more common </li></ul>
  24. 24. Know your target hardware: ARM <ul><li>VFP = FPU on steroids (not real SIMD) </li></ul><ul><ul><li>scalar instructions at same speed as vectorized </li></ul></ul><ul><li>NEON = SIMD </li></ul><ul><ul><li>more registers </li></ul></ul><ul><ul><li>awesome load/store instructions </li></ul></ul><ul><ul><li>not as cool as Altivec but cool enough for mobiles </li></ul></ul>
  25. 25. Know your target hardware: ARM <ul><li>Conditional execution of most instructions </li></ul><ul><li>Fold shifts and rotates into the &quot;data processing&quot; instructions </li></ul><ul><ul><li>load structure from array by index </li></ul></ul><ul><li>Thumb + float = disaster </li></ul><ul><ul><li>switch back and forth between Thumb mode and regular 32-bit mode </li></ul></ul>
  26. 26. Know your hardware: Resources <ul><li>RTR </li></ul><ul><li>lots of whitepapers: </li></ul><ul><ul><li>powerVR (imgtech), tegra (nvidia), adreno (qualcomm) </li></ul></ul><ul><ul><li>AMD/ATI - basically the same as X360, but much smaller tiles </li></ul></ul><ul><li>ARM dev center </li></ul>
  27. 27. Think <ul><li>Think about your data </li></ul><ul><li>Think about your algorithms </li></ul><ul><li>Think about your constraints </li></ul><ul><li>Think about your hardware </li></ul>
  28. 28. Think Basics <ul><li>CPU vs GPU </li></ul><ul><ul><li>e.g. draw calls </li></ul></ul><ul><ul><ul><li>pure CPU cost </li></ul></ul></ul><ul><li>CPU: </li></ul><ul><ul><li>memory vs arithmetic </li></ul></ul><ul><ul><ul><li>memory slower </li></ul></ul></ul><ul><li>GPU: </li></ul><ul><ul><li>vprog vs fshader </li></ul></ul><ul><ul><li>memory vs arithmetic </li></ul></ul>
  29. 29. Think Memory <ul><li>fragmentation </li></ul><ul><li>data organization </li></ul><ul><ul><li>AOS vs SOA </li></ul></ul><ul><ul><li>hot/cold split </li></ul></ul><ul><li>data structures </li></ul><ul><ul><li>linear vs random </li></ul></ul><ul><ul><li>array vs list </li></ul></ul><ul><ul><li>map vs hashtable </li></ul></ul><ul><ul><li>allocators </li></ul></ul>
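The AOS vs SOA and hot/cold ideas can be sketched as a layout change. Python will not show the cache benefit the way C would, so treat this purely as an illustration of the data organization; the particle fields and function names are hypothetical. Hot fields that are touched every frame go into contiguous per-field arrays, while cold fields (here, a name) stay behind in the original records.

```python
from array import array

HOT_FIELDS = ("x", "y", "vx", "vy")


def to_soa(particles):
    """Copy the hot fields of a list of dicts (AoS) into contiguous float arrays (SoA)."""
    return {f: array("f", (p[f] for p in particles)) for f in HOT_FIELDS}


def integrate(soa, dt):
    """Per-frame update that walks only the hot arrays, linearly."""
    x, y, vx, vy = (soa[f] for f in HOT_FIELDS)
    for i in range(len(x)):
        x[i] += vx[i] * dt
        y[i] += vy[i] * dt
```

The update loop never touches the cold data, and each array is read front to back, which is the access pattern the "linear vs random" bullet favors.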
  30. 30. Think Constraints <ul><li>GPU: will you see the difference? </li></ul><ul><ul><li>really? </li></ul></ul><ul><ul><li>on mobile screen? </li></ul></ul><ul><ul><li>on that one small thingy in the corner? </li></ul></ul><ul><li>CPU: will you need that? </li></ul><ul><ul><li>e.g. physics in casual game? </li></ul></ul><ul><li>Memory: will you need that? </li></ul><ul><ul><li>will you need more than XXX actors? </li></ul></ul>
  31. 31. Measure <ul><li>you didn't optimize anything if you didn't measure difference </li></ul><ul><li>you can't optimize if you don't know what needs to be optimized </li></ul><ul><ul><li>if you can't measure what takes time </li></ul></ul>
  32. 32. Measure Tools <ul><li>there are lots of tools </li></ul><ul><ul><li>instruments (ios) </li></ul></ul><ul><ul><li>perfhud (tegra) </li></ul></ul><ul><ul><li>adreno profiler (qualcomm) </li></ul></ul><ul><ul><li>some more probably </li></ul></ul><ul><li>Poor-man profiler </li></ul><ul><ul><li>timers </li></ul></ul>
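A poor-man profiler built from timers can be as small as the sketch below. The names `timed`, `timings`, and `report` are made up for this example; it uses only the standard library and records wall-clock time per labeled block.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # label -> list of durations in seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label].append(time.perf_counter() - start)


def report():
    """Return {label: (call count, average seconds)} for everything recorded so far."""
    return {label: (len(s), sum(s) / len(s)) for label, s in timings.items()}
```

Usage is `with timed("skinning"): ...` around the code under suspicion, followed by `report()` after a few hundred frames. It is crude compared to a vendor profiler, but it answers the first question: what takes the time.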
  33. 33. Unity use case: random bits <ul><li>Mobile shaders </li></ul><ul><ul><li>specialized versions of the usual built-ins </li></ul></ul><ul><li>Skinning </li></ul><ul><ul><li>full NEON/VFP impl </li></ul></ul><ul><ul><ul><li>usually 10-15% of C code time </li></ul></ul></ul><ul><ul><ul><ul><li>and we are not done optimizing it ;-) </li></ul></ul></ul></ul><ul><li>Rej's baking material to texture and coming soon BRDF baking to texture </li></ul>
  34. 34. Unity use case: random bits <ul><li>Remote Profiler </li></ul><ul><ul><li>run on target hw, data is transferred over wifi </li></ul></ul><ul><ul><li>collect in Editor and show pretty graphs ;-) </li></ul></ul><ul><li>Sort alpha-test *after* opaque </li></ul><ul><li>check *lots* of extensions </li></ul><ul><li>LODs - almost done </li></ul><ul><li>Vertex Cache optimization - after LODs ;-) </li></ul>
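"Sort alpha-test *after* opaque" amounts to ordering the draw list by queue before submission. This is a minimal sketch with a hypothetical draw record format, not Unity's actual implementation. The sort is stable, so draws keep their submission order within each queue.

```python
OPAQUE, ALPHA_TEST = 0, 1  # hypothetical queue ids; lower values draw first


def sort_draws(draws):
    """Return draws with all opaque objects ahead of all alpha-tested ones."""
    return sorted(draws, key=lambda d: d["queue"])


draws = [
    {"name": "fence", "queue": ALPHA_TEST},
    {"name": "ground", "queue": OPAQUE},
    {"name": "leaves", "queue": ALPHA_TEST},
    {"name": "house", "queue": OPAQUE},
]
ordered = [d["name"] for d in sort_draws(draws)]
```

On a TBDR GPU such as PowerVR, alpha-test/discard interferes with hidden surface removal, so it tends to pay off to get the plain opaque geometry down first.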
  35. 35. Closing Words <ul><li>Know hardware </li></ul><ul><li>Know data </li></ul><ul><li>Think data </li></ul><ul><li>Think constraints </li></ul><ul><li>Measure always </li></ul><ul><ul><li>You'd better know earlier </li></ul></ul><ul><li>You should always be optimizing </li></ul>
  36. 36. Questions