Advanced Mobile Optimizations: how to get to 60 fps after you have removed all Sleep calls ;-)
Disclaimer: The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity.
Optimization Mindset: you can't just "make your game faster"; there is no magic bullet. It is very specific work, not the same as scripting a character.
Optimization Mindset (in no specific order): know, think, measure.
Optimization Mindset: you can't skip any of these. No, really.
Optimization Mindset: know + think = shoot in the dark (you just write code hoping for the best); know + measure = shoot in the dark (you are missing the "understand" part); think + measure = shoot in the dark (you solve an abstract problem, not the real one).
Optimization Mindset: know + think. Hardware is more complex than you think: highly parallel, deeply pipelined. Even when you write asm, it is already high-level.
Optimization Mindset: know + measure. Knowledge is static; knowledge comes from the past; knowledge is general.
Optimization Mindset: know + measure. qsort vs bubble sort: sure, qsort is faster, but you are missing the point. Maybe radix? Maybe no need to sort? Maybe insertion? A parallel sorting network?
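To make the "maybe radix?" question concrete, here is a minimal sketch of an LSD radix sort for 32-bit unsigned keys. It is only an illustration of an alternative to comparison sorting, not something taken from the talk; the function name radix_sort_u32 is hypothetical, and whether it beats qsort on your data and hardware is exactly what has to be measured.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort for 32-bit unsigned keys: four passes, one byte per pass.
// Each pass is a stable counting sort, so no key comparisons are made.
// (Hypothetical helper, written for illustration only.)
void radix_sort_u32(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> scratch(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // Histogram of the current byte.
        std::array<std::size_t, 256> count{};
        for (std::uint32_t k : keys) ++count[(k >> shift) & 0xFFu];

        // Exclusive prefix sum turns the counts into output offsets.
        std::size_t offset = 0;
        for (std::size_t& c : count) {
            const std::size_t n = c;
            c = offset;
            offset += n;
        }

        // Scatter in input order, which keeps the pass stable.
        for (std::uint32_t k : keys) scratch[count[(k >> shift) & 0xFFu]++] = k;

        // The sorted-by-this-byte output becomes the input of the next pass.
        keys.swap(scratch);
    }
}
```

The trade-off is linear, cache-friendly passes over the data against an extra scratch buffer of the same size, which may or may not be acceptable under mobile memory constraints.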
Optimization Mindset: think + measure. You end up solving an abstract problem. Example: GPU optimization for a RIVA TNT and for a GTX is different.
Optimization Mindset: well, if you are missing two of the three, no comment.
Know: your hardware, your data. Knowing your data is interleaved with "think"; we will talk more about it in "Think".
Know your hardware: GPU, CPU, whatever else (e.g. disk load speed).
Know your hardware: GPU. It is a pipeline, meaning slow step = slow everything; you are as slow as your bottleneck. Know your pipeline. I won't go into the full pipeline spec (see the Resources section), just the common/biggest problems.
Know your hardware: GPU Geometry. Pre/post-T&L cache: should you use indexed geometry or not, cache hit rate, strips vs tri lists. Memory throughput: vertex size, fetch cost (memory), pack attributes or not.
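One way to put a number on "cache hit rate" is to simulate the post-transform cache offline. The sketch below assumes a simple FIFO cache and reports the average cache miss ratio (transformed vertices per triangle); the function name, the FIFO policy and the cache size are all assumptions for illustration, since real GPUs differ and the actual behaviour has to be checked against the vendor documentation.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Simulates a FIFO post-transform vertex cache over a triangle-list index
// buffer and returns the average cache miss ratio (ACMR): vertices
// transformed per triangle. Lower is better; 3.0 means no reuse at all.
// cache_size is an assumed parameter, not a value from any hardware spec.
double average_cache_miss_ratio(const std::vector<std::uint16_t>& indices,
                                std::size_t cache_size) {
    if (indices.size() < 3 || cache_size == 0) return 0.0;

    std::deque<std::uint16_t> fifo;
    std::size_t misses = 0;
    for (std::uint16_t idx : indices) {
        bool hit = false;
        for (std::uint16_t cached : fifo) {
            if (cached == idx) { hit = true; break; }
        }
        if (!hit) {
            ++misses;                                        // vertex must be transformed
            fifo.push_back(idx);
            if (fifo.size() > cache_size) fifo.pop_front();  // evict the oldest entry
        }
    }
    const std::size_t triangles = indices.size() / 3;
    return static_cast<double>(misses) / static_cast<double>(triangles);
}
```

Running this over the same mesh before and after reordering its index buffer gives a quick, hardware-independent first estimate; the real answer still comes from measuring on the target device.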
Know your hardware: GPU Textures. Texture cache: swizzle, compression, mip-maps. Textures are the biggest memory hog.
Know your hardware: GPU Shaders. VertexProgram vs FragmentShader: balancing, attributes. Unified shaders: load balancing. Precision: GLES highp/mediump/lowp; Cg float/half/fixed (iirc).
Know your hardware: GPU Rasterization. Fillrate (memory speed); alpha; 2x2 samples (or more): why GeometryLOD matters.
Know your hardware: CPU. Mobile = in-order RISC; for stupid code it is far worse than CISC. 2 main issues: memory speed, computation speed.
Know your hardware: CPU Memory. This is the single most important factor: memory access is far slower than computation. Latency vs throughput. Caches: fast memory, your best friend (L1/L2/whatever). LHS (load-hit-store).
Know your hardware: CPU Computations. SIMD: better memory usage, better arithmetic usage (4 values instead of 1).
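A small sketch of "4 values instead of 1": the loop below computes out[i] = a[i] * scale + b[i], four floats per iteration with NEON intrinsics when the compiler targets NEON, and one float per iteration otherwise. The function name scale_add is hypothetical, and this is an illustration under those assumptions, not the skinning code mentioned later in the talk.

```cpp
#include <cstddef>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// out[i] = a[i] * scale + b[i] for i in [0, n).
// Hypothetical example function; the arrays are assumed not to overlap.
void scale_add(const float* a, const float* b, float scale,
               float* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // NEON path: one load/multiply-add/store handles four floats.
    const float32x4_t vscale = vdupq_n_f32(scale);
    for (; i + 4 <= n; i += 4) {
        const float32x4_t va = vld1q_f32(a + i);
        const float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(out + i, vmlaq_f32(vb, va, vscale));  // vb + va * vscale
    }
#endif
    // Scalar path: handles the tail, or everything when NEON is unavailable.
    for (; i < n; ++i) out[i] = a[i] * scale + b[i];
}
```

The wide loads are also where the "better memory usage" comes from: contiguous data is fetched in full vectors rather than one element at a time.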
Know your target hardware: those were the general rules, but you are running on that particular piece of sh... hardware.
Know your target hardware: PowerVR. TBDR: perfect hidden surface removal; alpha-test/discard; shader precision; unified shaders. Tegra / ATI-AMD / Adreno: more common.
Know your target hardware: ARM. VFP = FPU on steroids (not real SIMD): scalar instructions run at the same speed as vectorized ones. NEON = SIMD: more registers, awesome load/store instructions; not as cool as AltiVec, but cool enough for mobiles.
Know your target hardware: ARM. Conditional execution of most instructions. Shifts and rotates fold into the "data processing" instructions (e.g. load a structure from an array by index). Thumb + float = disaster: the code switches back and forth between Thumb mode and regular 32-bit mode.
Know your hardware: Resources. RTR; lots of whitepapers: PowerVR (imgtech), Tegra (NVIDIA), Adreno (Qualcomm); AMD/ATI: basically the same as X360, but much smaller tiles; ARM dev center.
Think: think about your data, your algorithms, your constraints, your hardware.
Think: Basics. CPU vs GPU: e.g. draw calls are pure CPU cost. CPU: memory vs arithmetic (memory is slower). GPU: vprog vs fshader, memory vs arithmetic.
Think: Memory. Fragmentation. Data organization: AOS vs SOA, hot/cold split. Data structures: linear vs random access, array vs list, map vs hashtable. Allocators.
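The AOS vs SOA and hot/cold split points can be sketched with a particle update. All type and field names below are hypothetical; the idea is only that the SOA layout keeps the fields touched every frame contiguous, while the AOS layout drags rarely used (cold) fields through the cache alongside them.

```cpp
#include <cstddef>
#include <vector>

// AOS: hot and cold fields sit together in each element, so a position
// update also pulls the cold fields into cache. (Hypothetical layout.)
struct ParticleAOS {
    float x, y, z;
    float vx, vy, vz;
    int   debug_id;   // cold: rarely touched
    char  name[32];   // cold: rarely touched
};

// SOA: one contiguous array per hot field; cold data would live elsewhere.
struct ParticlesSOA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
};

void integrate_aos(std::vector<ParticleAOS>& ps, float dt) {
    for (ParticleAOS& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
        p.z += p.vz * dt;
    }
}

void integrate_soa(ParticlesSOA& ps, float dt) {
    // All arrays are assumed to have the same length.
    const std::size_t n = ps.x.size();
    for (std::size_t i = 0; i < n; ++i) ps.x[i] += ps.vx[i] * dt;
    for (std::size_t i = 0; i < n; ++i) ps.y[i] += ps.vy[i] * dt;
    for (std::size_t i = 0; i < n; ++i) ps.z[i] += ps.vz[i] * dt;
}
```

Both functions compute the same result; which layout is faster depends on the access pattern and the target CPU, so it is worth timing both on the device before committing to either.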
Think: Constraints. GPU: will you see the difference? Really? On a mobile screen? On that one small thingy in the corner? CPU: will you need that? E.g. physics in a casual game? Memory: will you need that? Will you need more than XXX actors?
Measure: you didn't optimize anything if you didn't measure the difference. You can't optimize if you don't know what needs to be optimized, or if you can't measure what takes time.
Measure: Tools. There are lots of tools: Instruments (iOS), PerfHUD (Tegra), Adreno Profiler (Qualcomm), probably some more. Poor-man's profiler: timers.
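A "poor-man's profiler" can be as small as a scoped timer. The sketch below uses std::chrono::steady_clock; the class name ScopedTimer and its optional out_ms parameter are hypothetical, and printing with printf is just the simplest possible sink.

```cpp
#include <chrono>
#include <cstdio>

// Measures the wall-clock time between construction and destruction and
// prints it. Optionally also writes the result (in milliseconds) to *out_ms.
// Hypothetical helper for illustration.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label, double* out_ms = nullptr)
        : label_(label),
          out_ms_(out_ms),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start_).count();
        if (out_ms_) *out_ms_ = ms;
        std::printf("%s: %.3f ms\n", label_, ms);
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* label_;
    double* out_ms_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping a block in braces with a ScopedTimer at the top is enough to time it. A single sample is noisy, so in practice you would accumulate several frames before drawing conclusions.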
Unity use case: random bits. Mobile shaders: specialized versions of the usual built-ins. Skinning: full NEON/VFP implementation, usually 10-15% of the C-code time, and we are not done optimizing it ;-) Rej's baking of material to texture and, coming soon, BRDF baking to texture.
Unity use case: random bits. Remote Profiler: runs on target hw, data is transferred over wifi, collected in the Editor and shown as pretty graphs ;-) Sort alpha-test *after* opaque. Check *lots* of extensions. LODs: almost done. Vertex cache optimization: after LODs ;-)
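"Sort alpha-test *after* opaque" can be sketched as a stable partition of the draw list. The DrawItem type and its fields are hypothetical and this is not Unity's actual implementation; it only shows the ordering, presumably motivated by the alpha-test/discard cost on tile-based deferred renderers listed earlier.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical draw record: an id plus a flag saying whether the material
// uses alpha-test (discard).
struct DrawItem {
    int  id;
    bool alpha_test;
};

// Moves fully opaque items in front of alpha-tested ones. The partition is
// stable, so any existing order inside each group (e.g. by material) is kept.
void draw_opaque_first(std::vector<DrawItem>& items) {
    std::stable_partition(items.begin(), items.end(),
                          [](const DrawItem& d) { return !d.alpha_test; });
}
```

A real renderer would typically fold this into a wider sort key (material, depth, and so on); the partition is just the smallest form of the idea.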
Closing Words: Know hardware. Know data. Think data. Think constraints. Measure always. You better know earlier. You should be always optimizing.
Questions
