Advanced Mobile Optimizations.ppt

Published in: Technology

  1. 1. Advanced Mobile Optimizations: How to go to 60 fps after you have removed all Sleep calls ;-)
  2. 2. Disclaimer <ul><li>The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity </li></ul>
  3. 3. Optimization Mindset <ul><li>you can't just make your game faster </li></ul><ul><ul><li>there is no magic bullet </li></ul></ul><ul><ul><li>very specific stuff </li></ul></ul><ul><ul><ul><li>not the same as scripting a character </li></ul></ul></ul>
  4. 4. Optimization Mindset <ul><li>in no specific order </li></ul><ul><li>know </li></ul><ul><li>think </li></ul><ul><li>measure </li></ul>
  5. 5. Optimization Mindset <ul><li>You can't avoid any of that </li></ul><ul><ul><li>no, really </li></ul></ul>
  6. 6. Optimization Mindset <ul><li>know + think = shoot in the dark </li></ul><ul><ul><li>you just write code hoping for the best </li></ul></ul><ul><li>know + measure = shoot in the dark </li></ul><ul><ul><li>you are missing the &quot;understand&quot; part </li></ul></ul><ul><li>think + measure = shoot in the dark </li></ul><ul><ul><li>you solve an abstract problem, not the real one </li></ul></ul>
  7. 7. Optimization Mindset: know + think <ul><li>hardware is more complex than you think </li></ul><ul><ul><li>highly parallel </li></ul></ul><ul><ul><li>deep pipelining </li></ul></ul><ul><ul><li>even when you write asm, it is already high-level </li></ul></ul>
  8. 8. Optimization Mindset: know + measure <ul><li>knowledge is static </li></ul><ul><li>knowledge comes from the past </li></ul><ul><li>knowledge is general </li></ul>
  9. 9. Optimization Mindset: know + measure <ul><li>qsort vs bubble sort </li></ul><ul><ul><li>sure, qsort is faster </li></ul></ul><ul><li>but you are missing the point </li></ul><ul><ul><li>maybe radix? </li></ul></ul><ul><ul><li>maybe no need to sort? </li></ul></ul><ul><ul><li>maybe insertion? </li></ul></ul><ul><ul><li>parallel sorting network? </li></ul></ul>
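
Slide 9 argues that the useful question is about the data, not about which general-purpose sort wins. As one hedged illustration: if the keys are known to be non-negative integers below 2**32 (an assumption made for this sketch, not something the slides state), an LSD radix sort needs no comparisons at all. The name `radix_sort_u32` is hypothetical.

```python
def radix_sort_u32(keys):
    """LSD radix sort for integers in [0, 2**32), one byte per pass."""
    out = list(keys)
    for shift in (0, 8, 16, 24):
        # Counting pass: histogram of the current byte.
        counts = [0] * 256
        for k in out:
            counts[(k >> shift) & 0xFF] += 1
        # Prefix sum turns counts into the start offset of each bucket.
        offsets = [0] * 256
        total = 0
        for b in range(256):
            offsets[b] = total
            total += counts[b]
        # Stable scatter into a fresh buffer.
        nxt = [0] * len(out)
        for k in out:
            b = (k >> shift) & 0xFF
            nxt[offsets[b]] = k
            offsets[b] += 1
        out = nxt
    return out
```

Whether this beats a comparison sort on a given device is exactly what the "measure" step is for.
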
  10. 10. Optimization Mindset: think + measure <ul><li>solving an abstract problem </li></ul><ul><ul><li>example: GPU </li></ul></ul><ul><ul><ul><li>optimizing for RIVA TNT and GTX is different </li></ul></ul></ul>
  11. 11. Optimization Mindset <ul><li>well, if you are missing two of the three </li></ul><ul><ul><li>no comments </li></ul></ul>
  12. 12. Know <ul><li>your hardware </li></ul><ul><li>your data </li></ul><ul><ul><li>knowing data is interleaved with think </li></ul></ul><ul><ul><li>we will talk more about it in &quot;think&quot; </li></ul></ul>
  13. 13. Know your hardware <ul><li>GPU </li></ul><ul><li>CPU </li></ul><ul><li>whatever </li></ul><ul><ul><li>e.g. disk load speed </li></ul></ul>
  14. 14. Know your hardware: GPU <ul><li>Pipeline </li></ul><ul><ul><li>meaning - slow step = slow everything </li></ul></ul><ul><ul><li>you are as slow as your bottleneck </li></ul></ul><ul><li>Know your pipeline </li></ul><ul><li>Won't go into full pipeline spec </li></ul><ul><ul><li>Resources section </li></ul></ul><ul><li>Just common/biggest problems </li></ul>
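
A minimal sketch of "you are as slow as your bottleneck", assuming the stages overlap perfectly so that frame time equals the slowest stage. The stage names and timings are hypothetical.

```python
def frame_estimate(stage_ms):
    """Return (bottleneck stage, frame time in ms, fps) for overlapping stages."""
    # Assumption: stages run fully in parallel, so the slowest one sets the pace.
    name = max(stage_ms, key=stage_ms.get)
    frame_ms = stage_ms[name]
    return name, frame_ms, 1000.0 / frame_ms

# Hypothetical timings, in milliseconds per frame.
stages = {"cpu": 9.0, "vertex": 4.0, "fragment": 25.0}
print(frame_estimate(stages))

# Halving the vertex cost changes nothing: the fragment stage still dominates.
faster_vertex = dict(stages, vertex=2.0)
print(frame_estimate(faster_vertex))
```

Under this model, only work that shortens the bottleneck stage moves the frame rate.
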
  15. 15. Know your hardware: GPU Geometry <ul><li>pre/post tnl cache </li></ul><ul><ul><li>should use indexed geometry or not </li></ul></ul><ul><li>cache hit rate </li></ul><ul><ul><li>strips vs tri list </li></ul></ul><ul><li>memory throughput </li></ul><ul><ul><li>vertex size </li></ul></ul><ul><li>fetch cost (memory) </li></ul><ul><ul><li>pack attributes or not </li></ul></ul>
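
The cache hit rate mentioned on slide 15 can be estimated offline. The sketch below simulates a post-transform vertex cache and reports the average number of vertices transformed per triangle. The FIFO policy and the cache size of 16 are assumptions; real hardware varies.

```python
from collections import deque

def acmr(indices, cache_size=16):
    """Average cache miss ratio: vertices transformed per triangle (FIFO cache)."""
    cache = deque(maxlen=cache_size)  # the oldest entry drops out when full
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses / (len(indices) // 3)

# Four triangles sharing vertices, as an indexed triangle list.
indexed = [0, 1, 2,  2, 1, 3,  2, 3, 4,  4, 3, 5]
# The same four triangles with every vertex duplicated (non-indexed).
unindexed = list(range(12))
print(acmr(indexed), acmr(unindexed))
```

Here the indexed list transforms 1.5 vertices per triangle, while the non-indexed one always pays 3.
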
  16. 16. Know your hardware: GPU Textures <ul><li>Texture Cache </li></ul><ul><ul><li>swizzle </li></ul></ul><ul><ul><li>compression </li></ul></ul><ul><ul><li>mip-maps </li></ul></ul><ul><li>Biggest memory hog </li></ul>
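
To put rough numbers on "biggest memory hog", the sketch below adds up the size of a texture and its mip chain. It ignores the minimum block sizes that real compressed formats impose, so treat the results as approximations. The helper name is hypothetical.

```python
def texture_bytes(width, height, bits_per_pixel, mipmaps=True):
    """Approximate texture size; ignores per-format minimum block sizes."""
    total = 0
    w, h = width, height
    while True:
        # Round each level up to a whole number of bytes.
        total += (w * h * bits_per_pixel + 7) // 8
        if not mipmaps or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total

rgba8 = texture_bytes(1024, 1024, 32)                    # uncompressed, with mips
rgba8_no_mips = texture_bytes(1024, 1024, 32, mipmaps=False)
four_bpp = texture_bytes(1024, 1024, 4, mipmaps=False)   # e.g. a 4 bpp compressed format
print(rgba8, rgba8_no_mips, four_bpp)
```

A full mip chain costs about one third extra, and a 4 bits-per-pixel format such as PVRTC 4bpp is an eighth of the uncompressed 32-bit size.
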
  17. 17. Know your hardware: GPU Shaders <ul><li>VertexProgram vs FragmentShader </li></ul><ul><ul><li>balancing </li></ul></ul><ul><ul><li>attributes </li></ul></ul><ul><li>Unified Shaders </li></ul><ul><ul><li>load balancing </li></ul></ul><ul><li>Precision </li></ul><ul><ul><li>gles: highp/mediump/lowp </li></ul></ul><ul><ul><li>CG: float/half/fixed (iirc) </li></ul></ul>
  18. 18. Know your hardware: GPU Rasterization <ul><li>Fillrate (memory speed) </li></ul><ul><ul><li>alpha </li></ul></ul><ul><li>2x2 samples (or more) </li></ul><ul><ul><li>why geometry LOD matters </li></ul></ul>
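
A back-of-the-envelope sketch of the fill-rate point: every layer of alpha-blended geometry multiplies the number of pixels shaded per frame. The resolution, overdraw factor, and frame rate below are hypothetical.

```python
def shaded_pixels_per_second(width, height, overdraw, fps):
    """Pixels the GPU must shade per second for a given average overdraw."""
    return width * height * overdraw * fps

# Hypothetical 480x320 screen at 60 fps.
opaque_only = shaded_pixels_per_second(480, 320, 1, 60)
with_particles = shaded_pixels_per_second(480, 320, 3, 60)
print(opaque_only, with_particles)
```

With an average overdraw of 3, the same screen costs three times the shading work, which is why full-screen blended effects are expensive.
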
  19. 19. Know your hardware: CPU <ul><li>Mobile = in-order RISC </li></ul><ul><ul><li>for stupid code far worse than CISC </li></ul></ul><ul><li>2 main issues: </li></ul><ul><ul><li>Memory speed </li></ul></ul><ul><ul><li>Computation speed </li></ul></ul>
  20. 20. Know your hardware: CPU Memory <ul><li>This is the single most important factor </li></ul><ul><ul><li>memory access is far slower than computation </li></ul></ul><ul><li>Latency vs Throughput </li></ul><ul><li>Caches </li></ul><ul><ul><li>fast memory </li></ul></ul><ul><ul><li>your best friend </li></ul></ul><ul><ul><li>L1/L2/whatever </li></ul></ul><ul><li>LHS (load-hit-store) </li></ul>
  21. 21. Know your hardware: CPU Computations <ul><li>SIMD </li></ul><ul><ul><li>better memory usage </li></ul></ul><ul><ul><li>better arithmetic usage (4 vals instead of 1) </li></ul></ul>
  22. 22. Know your target hardware <ul><li>Those were general rules </li></ul><ul><li>But you are running on that particular piece of sh... hardware </li></ul>
  23. 23. Know your target hardware: PowerVR <ul><li>TBDR </li></ul><ul><ul><li>perfect hidden surface removal </li></ul></ul><ul><ul><li>Alpha-Test/discard </li></ul></ul><ul><li>shader precision </li></ul><ul><li>unified shaders </li></ul><ul><li>Tegra / ATI-AMD / Adreno more common </li></ul>
  24. 24. Know your target hardware: ARM <ul><li>VFP = FPU on steroids (not real SIMD) </li></ul><ul><ul><li>scalar instructions at same speed as vectorized </li></ul></ul><ul><li>NEON = SIMD </li></ul><ul><ul><li>more registers </li></ul></ul><ul><ul><li>awesome load/store instructions </li></ul></ul><ul><ul><li>not as cool as Altivec but cool enough for mobiles </li></ul></ul>
  25. 25. Know your target hardware: ARM <ul><li>Conditional execution of most instructions </li></ul><ul><li>Fold shifts and rotates into the &quot;data processing&quot; instructions </li></ul><ul><ul><li>load structure from array by index </li></ul></ul><ul><li>Thumb + float = disaster </li></ul><ul><ul><li>switch back and forth between Thumb mode and regular 32-bit mode </li></ul></ul>
  26. 26. Know your hardware: Resources <ul><li>RTR </li></ul><ul><li>lots of whitepapers: </li></ul><ul><ul><li>PowerVR (ImgTec), Tegra (NVIDIA), Adreno (Qualcomm) </li></ul></ul><ul><ul><li>AMD/ATI - basically the same as X360, but much smaller tiles </li></ul></ul><ul><li>ARM dev center </li></ul>
  27. 27. Think <ul><li>Think about your data </li></ul><ul><li>Think about your algorithms </li></ul><ul><li>Think about your constraints </li></ul><ul><li>Think about your hardware </li></ul>
  28. 28. Think Basics <ul><li>CPU vs GPU </li></ul><ul><ul><li>e.g. draw calls </li></ul></ul><ul><ul><ul><li>pure CPU cost </li></ul></ul></ul><ul><li>CPU: </li></ul><ul><ul><li>memory vs arithmetic </li></ul></ul><ul><ul><ul><li>memory slower </li></ul></ul></ul><ul><li>GPU: </li></ul><ul><ul><li>vprog vs fshader </li></ul></ul><ul><ul><li>memory vs arithmetic </li></ul></ul>
  29. 29. Think Memory <ul><li>fragmentation </li></ul><ul><li>data organization </li></ul><ul><ul><li>AOS vs SOA </li></ul></ul><ul><ul><li>hot/cold split </li></ul></ul><ul><li>data structures </li></ul><ul><ul><li>linear vs random </li></ul></ul><ul><ul><li>array vs list </li></ul></ul><ul><ul><li>map vs hashtable </li></ul></ul><ul><ul><li>allocators </li></ul></ul>
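
The AOS vs SOA and hot/cold split bullets come down to how many cache lines a loop touches. The sketch below counts them for a loop that reads one 4-byte field per actor. The 64-byte record, the 64-byte cache line, and the actor count are all hypothetical.

```python
def cache_lines_touched(addresses, access_size=4, line_size=64):
    """Count distinct cache lines covered by a set of byte accesses."""
    lines = set()
    for addr in addresses:
        first = addr // line_size
        last = (addr + access_size - 1) // line_size
        lines.update(range(first, last + 1))
    return len(lines)

N = 1000            # number of actors (hypothetical)
RECORD_SIZE = 64    # bytes per actor in the AoS layout (hypothetical)

# AoS: the 4-byte field sits at the start of each 64-byte record.
aos = cache_lines_touched(i * RECORD_SIZE for i in range(N))
# SoA: the same field lives in its own contiguous array.
soa = cache_lines_touched(i * 4 for i in range(N))
print(aos, soa)
```

Under these assumptions the AoS loop pulls in 1000 cache lines and the SoA loop 63, for the same useful data.
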
  30. 30. Think Constraints <ul><li>GPU: will you see the difference? </li></ul><ul><ul><li>really? </li></ul></ul><ul><ul><li>on mobile screen? </li></ul></ul><ul><ul><li>on that one small thingy in the corner? </li></ul></ul><ul><li>CPU: will you need that? </li></ul><ul><ul><li>e.g. physics in casual game? </li></ul></ul><ul><li>Memory: will you need that? </li></ul><ul><ul><li>will you need more than XXX actors? </li></ul></ul>
  31. 31. Measure <ul><li>you didn't optimize anything if you didn't measure difference </li></ul><ul><li>you can't optimize if you don't know what needs to be optimized </li></ul><ul><ul><li>if you can't measure what takes time </li></ul></ul>
  32. 32. Measure Tools <ul><li>there are lots of tools </li></ul><ul><ul><li>instruments (ios) </li></ul></ul><ul><ul><li>perfhud (tegra) </li></ul></ul><ul><ul><li>adreno profiler (qualcomm) </li></ul></ul><ul><ul><li>some more probably </li></ul></ul><ul><li>Poor-man profiler </li></ul><ul><ul><li>timers </li></ul></ul>
  33. 33. Unity use case: random bits <ul><li>Mobile shaders </li></ul><ul><ul><li>specialized versions of the usual built-ins </li></ul></ul><ul><li>Skinning </li></ul><ul><ul><li>full NEON/VFP impl </li></ul></ul><ul><ul><ul><li>usually 10-15% of the C code's time </li></ul></ul></ul><ul><ul><ul><ul><li>and we are not done optimizing it ;-) </li></ul></ul></ul></ul><ul><li>Rej's material-to-texture baking and, coming soon, BRDF baking to texture </li></ul>
  34. 34. Unity use case: random bits <ul><li>Remote Profiler </li></ul><ul><ul><li>run on target hw, data is transferred over wifi </li></ul></ul><ul><ul><li>collect in Editor and show pretty graphs ;-) </li></ul></ul><ul><li>Sort alpha-test *after* opaque </li></ul><ul><li>check *lots* of extensions </li></ul><ul><li>LODs - almost done </li></ul><ul><li>Vertex Cache optimization - after LODs ;-) </li></ul>
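
A sketch of the draw ordering implied by "sort alpha-test after opaque". Slide 23 lists alpha-test/discard as a concern on PowerVR's TBDR, so opaque geometry is submitted first. The slide says nothing about blended geometry or depth order within each group; those parts, along with the `Draw` type itself, are assumptions of this example.

```python
from dataclasses import dataclass

@dataclass
class Draw:
    name: str
    kind: str      # "opaque", "alpha_test", or "transparent"
    depth: float   # distance from the camera

_QUEUE = {"opaque": 0, "alpha_test": 1, "transparent": 2}

def sort_draws(draws):
    """Order draws: opaque, then alpha-tested, then blended geometry."""
    def key(d):
        # Assumption: blended geometry goes back to front, the rest front to back.
        depth = -d.depth if d.kind == "transparent" else d.depth
        return (_QUEUE[d.kind], depth)
    return sorted(draws, key=key)

scene = [
    Draw("glass", "transparent", 5.0),
    Draw("fence", "alpha_test", 3.0),
    Draw("wall", "opaque", 10.0),
    Draw("floor", "opaque", 2.0),
    Draw("smoke", "transparent", 8.0),
]
print([d.name for d in sort_draws(scene)])
```

For this scene the order is floor, wall, fence, smoke, glass.
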
  35. 35. Closing Words <ul><li>Know hardware </li></ul><ul><li>Know data </li></ul><ul><li>Think data </li></ul><ul><li>Think constraints </li></ul><ul><li>Measure always </li></ul><ul><ul><li>You'd better know earlier </li></ul></ul><ul><li>You should always be optimizing </li></ul>
  36. 36. Questions
