Advanced Mobile Optimizations


How to get 60 fps in iPhone games

  1. Advanced Mobile Optimizations
     How to get to 60 fps after you have removed all Sleep calls ;-)
  2. Disclaimer
     The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity.
  3. Optimization Mindset
     • you can't just "make your game faster"
     • there is no magic bullet
     • it is very specific work
     • not the same as scripting a character
  4. Optimization Mindset
     in no particular order:
     • know
     • think
     • measure
  5. Optimization Mindset
     • you can't skip any of them
     • no, really
  6. Optimization Mindset
     • know + think = shooting in the dark
       you just write code hoping for the best
     • know + measure = shooting in the dark
       you are missing the "understand" part
     • think + measure = shooting in the dark
       you solve an abstract problem, not the real one
  7. Optimization Mindset: know + think
     • hardware is more complex than you think
     • highly parallel
     • deep pipelining
     • even when you write asm, you are already working at a high level
  8. Optimization Mindset: know + measure
     • knowledge is static
     • knowledge comes from the past
     • knowledge is general
  9. Optimization Mindset: know + measure
     • qsort vs bubble sort
     • sure, qsort is faster
     • but you are missing the point:
       maybe radix sort?
       maybe there is no need to sort?
       maybe insertion sort?
       a parallel sorting network?
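A small sketch of the "maybe insertion?" point, assuming the list being sorted is last frame's draw order (the `DrawItem` type and its `depth` key are hypothetical). When objects move only slightly between frames, the data is nearly sorted and insertion sort does close to linear work, so it can beat a general-purpose sort here even though its worst case is quadratic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical draw item, sorted back to front by view depth.
struct DrawItem {
    float depth;
    int id;
};

// Insertion sort by descending depth (farthest first). The cost is
// O(n + number of out-of-order pairs): if `items` still holds last
// frame's order and objects moved only a little, this is close to a
// single linear pass. Returns the number of element shifts performed.
std::size_t insertionSortByDepth(std::vector<DrawItem>& items) {
    std::size_t shifts = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        const DrawItem key = items[i];
        std::size_t j = i;
        // Strict comparison keeps equal depths in their old order (stable).
        while (j > 0 && items[j - 1].depth < key.depth) {
            items[j] = items[j - 1];
            --j;
            ++shifts;
        }
        items[j] = key;
    }
    // Debug check: result is ordered farthest first.
    assert(std::is_sorted(items.begin(), items.end(),
                          [](const DrawItem& a, const DrawItem& b) { return a.depth > b.depth; }));
    return shifts;
}
```

With depths `{10, 9, 7, 8, 5}` only one pair is out of order, so the sort finishes after a single shift.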
  10. Optimization Mindset: think + measure
      • solving an abstract problem
      • example: GPU
        optimizing for a RIVA TNT and for a GTX is different
  11. Optimization Mindset
      • well, if you are missing two of the three...
      • no comment
  12. Know
      • your hardware
      • your data
        knowing your data is interleaved with thinking
        more on this in "Think"
  13. Know your hardware
      • GPU
      • CPU
      • whatever else matters, e.g. disk load speed
  14. Know your hardware: GPU
      • it is a pipeline: one slow stage makes everything slow
        you are only as fast as your bottleneck
      • know your pipeline
      • I won't go into the full pipeline spec (see the Resources section)
        just the most common/biggest problems
  15. Know your hardware: GPU Geometry
      • pre/post-T&L cache
        whether to use indexed geometry
        cache hit rate
        strips vs triangle lists
      • memory throughput
        vertex size
        fetch cost (memory)
        whether to pack attributes
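To get a feel for the post-transform (post-T&L) cache, you can simulate it. This sketch assumes a FIFO cache holding `cacheSize` vertices, which is a simplification: real cache sizes and replacement policies vary by GPU. It reports ACMR (average cache miss ratio), the number of vertex shader runs per triangle: 3.0 means no reuse at all, and lower is better.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Vertex transforms per triangle for an indexed triangle list, assuming
// a FIFO post-transform cache of `cacheSize` entries (a model, not any
// specific GPU).
double simulateAcmr(const std::vector<std::uint16_t>& indices, std::size_t cacheSize) {
    assert(indices.size() % 3 == 0);  // triangle list
    assert(cacheSize > 0);
    std::deque<std::uint16_t> fifo;
    std::size_t misses = 0;
    for (std::uint16_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end()) {
            continue;  // hit: the already transformed vertex is reused
        }
        ++misses;  // miss: vertex is fetched and transformed again
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) fifo.pop_front();  // evict the oldest entry
    }
    const std::size_t triangles = indices.size() / 3;
    return triangles ? static_cast<double>(misses) / static_cast<double>(triangles) : 0.0;
}
```

A quad drawn as the indexed list `{0,1,2, 2,1,3}` gives 2.0 (4 transforms for 2 triangles), while the same quad with six unique indices gives 3.0. Reordering triangles so that shared vertices are reused while still in the cache is what vertex cache optimization aims for.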
  16. Know your hardware: GPU Textures
      • texture cache
        swizzling
        compression
        mipmaps
      • textures are the biggest memory hog
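Because textures are the biggest memory hog, it helps to do the arithmetic before choosing formats. A rough estimator, assuming size scales with bits per pixel and ignoring driver padding and the minimum block sizes of compressed formats such as PVRTC; the function name is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Approximate memory footprint of a 2D texture in bytes.
// bitsPerPixel: e.g. 32 for RGBA8888, 16 for RGB565, 4 for PVRTC 4bpp.
// withMips: include the full mip chain down to 1x1.
std::uint64_t textureBytes(std::uint32_t width, std::uint32_t height,
                           std::uint32_t bitsPerPixel, bool withMips) {
    assert(width > 0 && height > 0 && bitsPerPixel > 0);
    std::uint64_t bits = 0;
    for (;;) {
        bits += static_cast<std::uint64_t>(width) * height * bitsPerPixel;
        if (!withMips || (width == 1 && height == 1)) break;
        // Each mip level halves both dimensions, never below 1.
        width = std::max<std::uint32_t>(1, width / 2);
        height = std::max<std::uint32_t>(1, height / 2);
    }
    return (bits + 7) / 8;  // round up to whole bytes
}
```

For a 1024x1024 RGBA8888 texture this gives 4,194,304 bytes without mipmaps and 5,592,404 bytes with the full chain (about 4/3 of the base level), while the base level in PVRTC 4bpp is 524,288 bytes, an eighth of the uncompressed size.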
  17. Know your hardware: GPU Shaders
      • VertexProgram vs FragmentShader
        balancing the work between them
        attributes
      • unified shaders
        load balancing
      • precision
        OpenGL ES: highp/mediump/lowp
        Cg: float/half/fixed (iirc)
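As an illustration of why precision qualifiers matter, the sketch below models the *minimum* that a GLSL ES 1.00 implementation must provide for `lowp` floats: range (-2, 2) with an absolute precision of 2^-8. Real hardware may use more bits, so treat this as a worst case; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Worst-case model of a GLSL ES lowp float: clamp to the minimum
// guaranteed range and round to the minimum guaranteed step (1/256).
float toLowpWorstCase(float v) {
    const float step = 1.0f / 256.0f;
    const float limit = 2.0f - step;  // the guaranteed range (-2, 2) is open
    v = std::clamp(v, -limit, limit);
    return std::round(v / step) * step;
}

// Can a worst-case lowp texture coordinate tell neighbouring texels of
// a texture `size` texels wide apart? The step must not exceed one texel.
bool lowpCanAddressTexels(int size) {
    assert(size > 0);
    return 1.0f / 256.0f <= 1.0f / static_cast<float>(size);
}
```

Colors in [0, 1] survive this quantization well, but under these minimums a `lowp` texture coordinate cannot address individual texels of a texture wider than 256, which is why UVs typically need at least `mediump`.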
  18. Know your hardware: GPU Rasterization
      • fillrate (memory speed)
      • alpha
      • pixels are processed in 2x2 quads (or larger)
      • why geometry LOD matters
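Since pixels are shaded in 2x2 quads, a triangle pays for every quad it touches, not only for the pixels it covers. This simplified model (pixel-centre sampling, no fill rules, no MSAA, no clipping; all names hypothetical) counts both for a single screen-space triangle.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <set>
#include <utility>

struct Vec2 { float x, y; };

// Signed edge function: its sign tells which side of edge a->b point p is on.
float edgeFn(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

struct QuadStats {
    int coveredPixels;  // pixel centres inside the triangle
    int shadedPixels;   // 4 for every 2x2 quad touched
};

QuadStats quadStats(Vec2 a, Vec2 b, Vec2 c) {
    const float area = edgeFn(a, b, c);
    assert(area != 0.0f);  // degenerate triangles are not rasterized
    const int minX = static_cast<int>(std::floor(std::min({a.x, b.x, c.x})));
    const int minY = static_cast<int>(std::floor(std::min({a.y, b.y, c.y})));
    const int maxX = static_cast<int>(std::ceil(std::max({a.x, b.x, c.x})));
    const int maxY = static_cast<int>(std::ceil(std::max({a.y, b.y, c.y})));
    assert(minX >= 0 && minY >= 0);  // screen space, no clipping modelled

    std::set<std::pair<int, int>> quads;
    int covered = 0;
    for (int y = minY; y < maxY; ++y) {
        for (int x = minX; x < maxX; ++x) {
            const Vec2 p{x + 0.5f, y + 0.5f};
            const float w0 = edgeFn(b, c, p);
            const float w1 = edgeFn(c, a, p);
            const float w2 = edgeFn(a, b, p);
            // Inside when all three edge functions agree with the winding.
            const bool inside = area > 0.0f ? (w0 >= 0.0f && w1 >= 0.0f && w2 >= 0.0f)
                                            : (w0 <= 0.0f && w1 <= 0.0f && w2 <= 0.0f);
            if (inside) {
                ++covered;
                quads.insert({x / 2, y / 2});  // the 2x2 quad this pixel belongs to
            }
        }
    }
    return {covered, static_cast<int>(quads.size()) * 4};
}
```

A triangle covering a single pixel still costs a whole quad (25% efficiency), while a right triangle with 64-pixel legs shades 2,112 pixels for 2,080 covered (about 98%). Dense meshes seen from far away turn into many tiny triangles, which is one reason geometry LOD matters.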
  19. Know your hardware: CPU
      • mobile = in-order RISC
        for naive code, far worse than a desktop CISC CPU
      • 2 main issues:
        memory speed
        computation speed
  20. Know your hardware: CPU Memory
      • this is the single most important factor
        memory access is far slower than computation
      • latency vs throughput
      • caches
        fast memory
        your best friend
        L1/L2/whatever
      • LHS (load-hit-store)
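Both functions below do the same arithmetic on a row-major grid; only the access order differs. The first walks memory linearly and uses every byte of each fetched cache line; the second jumps a whole row per access and, on grids larger than the cache, can miss on almost every read. How large the gap is depends on the device and grid size, so measure it on your target.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Visits elements in memory order: consecutive addresses, cache friendly.
long long sumRowMajor(const std::vector<int>& grid, std::size_t width, std::size_t height) {
    assert(grid.size() == width * height);
    long long sum = 0;
    for (std::size_t y = 0; y < height; ++y)
        for (std::size_t x = 0; x < width; ++x)
            sum += grid[y * width + x];
    return sum;
}

// Same result, but each access is `width` elements away from the last,
// so cache lines are fetched and evicted before they are fully used.
long long sumColumnMajor(const std::vector<int>& grid, std::size_t width, std::size_t height) {
    assert(grid.size() == width * height);
    long long sum = 0;
    for (std::size_t x = 0; x < width; ++x)
        for (std::size_t y = 0; y < height; ++y)
            sum += grid[y * width + x];
    return sum;
}
```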
  21. Know your hardware: CPU Computations
      • SIMD
        better memory usage
        better arithmetic usage (4 values instead of 1)
  22. Know your target hardware
      • those were general rules
      • but you are running on one particular piece of sh... hardware
  23. Know your target hardware: PowerVR
      • TBDR (tile-based deferred rendering)
        perfect hidden surface removal
        alpha test/discard
      • shader precision
      • unified shaders
      • Tegra / ATI-AMD / Adreno are more common
  24. Know your target hardware: ARM
      • VFP = FPU on steroids (not real SIMD)
        scalar instructions run at the same speed as vectorized ones
      • NEON = SIMD
        more registers
        awesome load/store instructions
        not as cool as AltiVec, but cool enough for mobile
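A sketch of the NEON load/store point, using intrinsics from `arm_neon.h`: `vld3q_f32` loads four interleaved xyz points and de-interleaves them into separate x, y and z vectors in one instruction. The function and its data layout are illustrative assumptions; the scalar loop covers non-NEON builds and any leftover points.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Squared length of each point in an interleaved xyz array:
// out[i] = x*x + y*y + z*z, where `xyz` holds 3 * count floats.
void lengthsSquared(const float* xyz, float* out, std::size_t count) {
    assert(xyz != nullptr && out != nullptr);
    std::size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    // Four points per iteration: one de-interleaving load, three
    // multiply(-accumulate)s and one store for 12 input floats.
    for (; i + 4 <= count; i += 4) {
        const float32x4x3_t p = vld3q_f32(xyz + 3 * i);  // val[0]=x, val[1]=y, val[2]=z
        float32x4_t acc = vmulq_f32(p.val[0], p.val[0]);  // x * x
        acc = vmlaq_f32(acc, p.val[1], p.val[1]);         // += y * y
        acc = vmlaq_f32(acc, p.val[2], p.val[2]);         // += z * z
        vst1q_f32(out + i, acc);
    }
#endif
    // Scalar path: the remainder on NEON builds, everything elsewhere.
    for (; i < count; ++i) {
        const float* p = xyz + 3 * i;
        out[i] = p[0] * p[0] + p[1] * p[1] + p[2] * p[2];
    }
}
```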
  25. Know your target hardware: ARM
      • conditional execution of most instructions
      • shifts and rotates folded into the "data processing" instructions
      • load a structure from an array by index
      • Thumb + float = disaster
        switching back and forth between Thumb mode and regular 32-bit ARM mode
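Conditional execution lets the compiler turn short if/else decisions into predicated instructions instead of branches. Writing small decisions as selects gives it that chance; whether it really emits conditional instructions (or `csel` on 64-bit ARM) depends on the compiler, flags and ARM/Thumb mode, so check the disassembly. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Clamp written as two selects. Compilers commonly lower this to a
// compare plus conditional move/predicated instructions rather than
// branches, avoiding mispredict stalls on an in-order core.
std::int32_t clampSelect(std::int32_t v, std::int32_t lo, std::int32_t hi) {
    assert(lo <= hi);
    v = (v < lo) ? lo : v;
    v = (v > hi) ? hi : v;
    return v;
}

// Counts values above a threshold without a data-dependent branch:
// the comparison result (0 or 1) is added directly.
std::int32_t countAbove(const std::int32_t* values, int n, std::int32_t threshold) {
    std::int32_t count = 0;
    for (int i = 0; i < n; ++i)
        count += static_cast<std::int32_t>(values[i] > threshold);
    return count;
}
```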
  26. Know your hardware: Resources
      • RTR (Real-Time Rendering)
      • lots of whitepapers:
        PowerVR (Imagination Technologies), Tegra (NVIDIA), Adreno (Qualcomm)
        AMD/ATI: basically the same as X360, but with much smaller tiles
      • ARM developer center
  27. Think
      • about your data
      • about your algorithms
      • about your constraints
      • about your hardware
  28. Think Basics
      • CPU vs GPU
        e.g. draw calls are a pure CPU cost
      • CPU: memory vs arithmetic
        memory is slower
      • GPU: vertex program vs fragment shader
        memory vs arithmetic
  29. Think Memory
      • fragmentation
      • data organization
        AOS vs SOA
        hot/cold split
      • data structures
        linear vs random access
        array vs list
        map vs hashtable
      • allocators
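AOS vs SOA and the hot/cold split, sketched for hypothetical actor data: the per-frame update only needs positions and velocities, so in the SoA layout it streams through packed float arrays and never pulls names or health into the cache. Which fields are hot is game-specific, so measure before reorganizing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Array of structures: updating positions also drags the cold fields
// (name, health) through the cache, because they share cache lines.
struct ActorAoS {
    float x, y, vx, vy;  // hot: touched every frame
    std::string name;    // cold: touched rarely
    int health;          // cold
};

void integrateAoS(std::vector<ActorAoS>& actors, float dt) {
    for (ActorAoS& a : actors) {
        a.x += a.vx * dt;
        a.y += a.vy * dt;
    }
}

// Structure of arrays with a hot/cold split: each field is contiguous.
struct ActorsSoA {
    std::vector<float> x, y, vx, vy;  // hot
    std::vector<std::string> name;    // cold
    std::vector<int> health;          // cold

    std::size_t add(float px, float py, float pvx, float pvy, std::string n, int hp) {
        x.push_back(px); y.push_back(py);
        vx.push_back(pvx); vy.push_back(pvy);
        name.push_back(std::move(n)); health.push_back(hp);
        return x.size() - 1;
    }
};

// The update reads and writes only the hot arrays.
void integrateSoA(ActorsSoA& a, float dt) {
    assert(a.x.size() == a.vx.size() && a.y.size() == a.vy.size());
    for (std::size_t i = 0; i < a.x.size(); ++i) {
        a.x[i] += a.vx[i] * dt;
        a.y[i] += a.vy[i] * dt;
    }
}
```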
  30. Think Constraints
      • GPU: will you see the difference?
        really?
        on a mobile screen?
        on that one small thingy in the corner?
      • CPU: will you need that?
        e.g. physics in a casual game?
      • Memory: will you need that?
        will you need more than XXX actors?
  31. Measure
      • you haven't optimized anything if you didn't measure the difference
      • you can't optimize if you don't know what needs to be optimized,
        i.e. if you can't measure what takes time
  32. Measure Tools
      • there are lots of tools:
        Instruments (iOS)
        PerfHUD (Tegra)
        Adreno Profiler (Qualcomm)
        probably some more
      • poor-man's profiler
        timers
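A poor-man's profiler can be as simple as a scoped timer. This sketch uses `std::chrono::steady_clock` (monotonic, so safe for measuring intervals) and prints the elapsed time when the scope ends; the class name and the optional output pointer are illustrative.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Measures the lifetime of a scope and reports it on destruction.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string label, double* outMs = nullptr)
        : label_(std::move(label)), outMs_(outMs),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        assert(end >= start_);  // steady_clock never goes backwards
        const double ms = std::chrono::duration<double, std::milli>(end - start_).count();
        if (outMs_) *outMs_ = ms;
        std::printf("%s: %.3f ms\n", label_.c_str(), ms);
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string label_;
    double* outMs_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrap a suspect block in `{ ScopedTimer t("skinning"); ... }` (the label is just an example). The timer and the `printf` have their own cost, so time whole systems or frames rather than individual iterations of a tight loop.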
  33. Unity use case: random bits
      • mobile shaders
        specialized versions of the usual built-ins
      • skinning
        full NEON/VFP implementation
        usually takes 10-15% of the time of the C code
        and we are not done optimizing it ;-)
      • Rej's baking material to texture, and BRDF baking to texture coming soon
  34. Unity use case: random bits
      • Remote Profiler
        runs on the target hardware, data is transferred over Wi-Fi
        collected in the Editor and shown as pretty graphs ;-)
      • sort alpha-test *after* opaque
      • check *lots* of extensions
      • LODs: almost done
      • vertex cache optimization: after LODs ;-)
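The "sort alpha-test *after* opaque" rule matters especially on PowerVR's TBDR: alpha-tested (discard) geometry gets in the way of hidden surface removal, so drawing all fully opaque geometry first lets HSR reject as much as possible before alpha-tested pixels arrive. One way to enforce the order is a render-queue sort key; the queue categories and key layout below are a hypothetical sketch, not Unity's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

enum class Queue : std::uint8_t { Opaque = 0, AlphaTest = 1, Transparent = 2 };

struct DrawCall {
    Queue queue;
    std::uint16_t material;  // grouping by material reduces state changes
    float depth;             // view-space distance, >= 0
    int id;
};

// 64-bit key: the queue in the top bits gives opaque < alpha-test <
// transparent; the lower bits order draws within a queue.
std::uint64_t sortKey(const DrawCall& d) {
    assert(d.depth >= 0.0f);
    const std::uint64_t queueBits = static_cast<std::uint64_t>(d.queue) << 62;
    if (d.queue == Queue::Transparent) {
        // Back to front: quantize the (clamped) depth and invert it so
        // that farther draws get smaller keys.
        const float clamped = std::min(d.depth, 65535.0f);
        const auto depthBits = static_cast<std::uint32_t>(clamped * 65536.0f);
        return queueBits | (0xFFFFFFFFull - depthBits);
    }
    // Opaque and alpha-tested: group by material.
    return queueBits | (static_cast<std::uint64_t>(d.material) << 32);
}

void sortDrawCalls(std::vector<DrawCall>& calls) {
    std::stable_sort(calls.begin(), calls.end(),
                     [](const DrawCall& a, const DrawCall& b) { return sortKey(a) < sortKey(b); });
}
```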
  35. Closing Words
      • know the hardware
      • know the data
      • think about the data
      • think about the constraints
      • always measure
      • the earlier you know, the better
      • you should always be optimizing
  36. Questions