Advanced Mobile Optimizations.ppt

Advanced Mobile Optimizations
How to get to 60 fps after you have removed all Sleep calls ;-)

Disclaimer
The views expressed here are my personal views and do not necessarily reflect the thoughts, opinions, intentions, plans or strategies of Unity.

Optimization Mindset
- you can't just make your game faster
- there is no magic bullet
- very specific stuff
- not the same as scripting a character

Optimization Mindset
- in no specific order:
  - know
  - think
  - measure

Optimization Mindset
- You can't avoid any of that
- no, really

Optimization Mindset
- know + think = shoot in the dark
  - you just write code hoping for the best
- know + measure = shoot in the dark
  - you are missing the "understand" part
- think + measure = shoot in the dark
  - you solve an abstract problem, not the real one

Optimization Mindset: know + think
- hardware is more complex than you think
  - highly parallel
  - deep pipelining
- even when you write asm, it is already high-level

Optimization Mindset: know + measure
- knowledge is static
- knowledge comes from the past
- knowledge is general

Optimization Mindset: know + measure
- qsort vs bubble sort
  - sure, qsort is faster
  - but you are missing the point
  - maybe radix?
  - maybe no need to sort?
  - maybe insertion?
  - parallel sorting network?
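
The "maybe insertion?" bullet can be made concrete with a small experiment. This is only a sketch under an assumed scenario: the data is nearly sorted already (for instance a list that was sorted last frame and barely changed since). The function name and the test data are made up for illustration.

```python
def insertion_sort(values):
    """Sort the list in place and return how many element shifts were needed."""
    shifts = 0
    for i in range(1, len(values)):
        item = values[i]
        j = i - 1
        # Move larger elements one slot to the right until item fits.
        while j >= 0 and values[j] > item:
            values[j + 1] = values[j]
            shifts += 1
            j -= 1
        values[j + 1] = item
    return shifts


# Assumed input 1: sorted data with a single adjacent pair swapped.
nearly_sorted = list(range(1000))
nearly_sorted[10], nearly_sorted[11] = nearly_sorted[11], nearly_sorted[10]

# Assumed input 2: the worst case, fully reversed data.
reversed_data = list(range(999, -1, -1))

shifts_nearly = insertion_sort(nearly_sorted)
shifts_reversed = insertion_sort(reversed_data)
```

The same algorithm goes from almost free to quadratic depending on the data, which is why general knowledge ("qsort is faster") has to be checked against the data you actually have.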

Optimization Mindset: think + measure
- you are solving an abstract problem
- example: GPU
  - optimizing for RIVA TNT and for GTX is different

Optimization Mindset
- well, if you are missing two of the three
- no comment

Know
- your hardware
- your data
  - knowing your data is interleaved with "think"
  - we will talk more about it in "think"

Know your hardware
- GPU
- CPU
- whatever
  - e.g. disk load speed

Know your hardware: GPU
- Pipeline
  - meaning: slow step = slow everything
  - you are as slow as your bottleneck
- Know your pipeline
  - Won't go into the full pipeline spec
  - see the Resources section
  - Just the common/biggest problems
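
"You are as slow as your bottleneck" can be written down as a tiny model. A minimal sketch, assuming the stages overlap perfectly so that frame time equals the slowest stage; the stage names and millisecond values are invented for illustration, not measurements.

```python
def frame_time_ms(stage_times_ms):
    """Return (bottleneck stage, frame time) for an idealized, fully overlapped pipeline."""
    name = max(stage_times_ms, key=stage_times_ms.get)
    return name, stage_times_ms[name]


# Hypothetical per-frame stage costs in milliseconds.
stages = {"vertex": 4.0, "fragment": 22.0, "cpu_submit": 9.0}
before = frame_time_ms(stages)

# Halving a stage that is not the bottleneck changes nothing.
stages["vertex"] = 2.0
after_wrong = frame_time_ms(stages)

# Optimizing the bottleneck stage is what moves the frame time.
stages["fragment"] = 12.0
after_right = frame_time_ms(stages)
```

Real pipelines overlap imperfectly, so treat this as a way to reason about where to look first, not as a predictor.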

Know your hardware: GPU
- Geometry
  - pre/post T&L cache
    - should you use indexed geometry or not
    - cache hit rate
    - strips vs tri list
  - memory throughput
    - vertex size
    - fetch cost (memory)
    - pack attributes or not
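
Cache hit rate for the post-T&L cache can be estimated offline from an index buffer. The sketch below assumes a simple FIFO cache with 16 entries; real GPUs differ in size and replacement policy, so both are assumptions. It reports the average cache miss ratio (vertex transforms per triangle) for a strip of quads, drawn once with shared indices and once with every vertex duplicated.

```python
from collections import deque


def acmr(indices, cache_size=16):
    """Average cache miss ratio: vertex transforms per triangle, FIFO cache model."""
    cache = deque(maxlen=cache_size)
    misses = 0
    for index in indices:
        if index not in cache:
            misses += 1
            cache.append(index)  # the oldest entry drops out automatically
    return misses / (len(indices) // 3)


def quad_strip_indices(quads):
    """Indexed triangle list for a row of quads sharing vertices with their neighbors."""
    indices = []
    for k in range(quads):
        a, b, c, d = 2 * k, 2 * k + 1, 2 * k + 2, 2 * k + 3
        indices += [a, b, c, c, b, d]
    return indices


indexed = quad_strip_indices(100)          # 200 triangles, 202 unique vertices
non_indexed = list(range(len(indexed)))    # same triangle count, no sharing

acmr_indexed = acmr(indexed)
acmr_non_indexed = acmr(non_indexed)
```

In this model the indexed mesh transforms roughly one vertex per triangle, the non-indexed one exactly three.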

Know your hardware: GPU
- Textures
  - Texture Cache
    - swizzle
    - compression
    - mip-maps
  - Biggest memory hog
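
To see why textures are the biggest memory hog, it helps to do the arithmetic. A rough sketch: the function below sums the mip chain at a given bit rate. 32 bits per pixel stands for uncompressed RGBA8 and 4 bits per pixel is used as a stand-in for a block-compressed format; real compressed formats have minimum block sizes, so the smallest mip levels cost slightly more than this estimate.

```python
def texture_bytes(width, height, bits_per_pixel, mipmaps=False):
    """Approximate texture size in bytes, optionally including the full mip chain."""
    total_bits = 0
    w, h = width, height
    while True:
        total_bits += w * h * bits_per_pixel
        if not mipmaps or (w == 1 and h == 1):
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return total_bits // 8


rgba8 = texture_bytes(1024, 1024, 32)                      # 4 MiB
rgba8_mips = texture_bytes(1024, 1024, 32, mipmaps=True)   # about 1/3 more
compressed = texture_bytes(1024, 1024, 4)                  # 8x smaller than RGBA8
```

Mip-maps cost about a third more memory, but compression more than pays for it, and both tend to help the texture cache.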

Know your hardware: GPU
- Shaders
  - VertexProgram vs FragmentShader
    - balancing
    - attributes
  - Unified Shaders
    - load balancing
  - Precision
    - GLES: highp/mediump/lowp
    - Cg: float/half/fixed (iirc)

Know your hardware: GPU
- Rasterization
  - Fillrate (memory speed)
  - alpha
  - 2x2 samples (or more)
    - why geometry LOD matters
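
One way to read the "2x2 samples" bullet: fragments are shaded in 2x2 pixel quads, so a triangle that covers a single pixel still pays for four. The sketch below is a simplified model of that, assuming quads are aligned to even pixel coordinates; the pixel sets are made up for illustration.

```python
def quad_shading_cost(covered_pixels):
    """Return (pixels covered, pixels shaded) when shading happens in 2x2 quads."""
    quads = {(x // 2, y // 2) for x, y in covered_pixels}
    return len(covered_pixels), 4 * len(quads)


tiny_triangle = {(5, 5)}                                  # a distant, unLODed triangle
thin_sliver = {(i, i) for i in range(8)}                  # a long diagonal sliver
big_block = {(x, y) for x in range(8) for y in range(8)}  # a large, well-filled area

tiny_cost = quad_shading_cost(tiny_triangle)
sliver_cost = quad_shading_cost(thin_sliver)
block_cost = quad_shading_cost(big_block)
```

Tiny and sliver triangles waste most of the shading work they trigger, which is one reason geometry LOD helps fill rate and not only vertex cost.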

Know your hardware: CPU
- Mobile = in-order RISC
  - for stupid code, far worse than CISC
- 2 main issues:
  - Memory speed
  - Computation speed

Know your hardware: CPU
- Memory
  - This is the single most important factor
  - memory access is far slower than computation
  - Latency vs Throughput
  - Caches
    - fast memory
    - your best friend
    - L1/L2/whatever
  - LHS (load-hit-store)

Know your hardware: CPU
- Computations
  - SIMD
    - better memory usage
    - better arithmetic usage (4 vals instead of 1)

Know your target hardware
- Those were the general rules
- But you are running on that particular piece of sh... hardware

Know your target hardware: PowerVR
- TBDR
  - perfect hidden surface removal
  - Alpha-Test/discard
- shader precision
- unified shaders
- Tegra / ATI-AMD / Adreno
  - more common

Know your target hardware: ARM
- VFP = FPU on steroids (not real SIMD)
  - scalar instructions run at the same speed as vectorized ones
- NEON = SIMD
  - more registers
  - awesome load/store instructions
  - not as cool as AltiVec
  - but cool enough for mobiles

Know your target hardware: ARM
- Conditional execution of most instructions
- Fold shifts and rotates into the "data processing" instructions
  - load structure from array by index
- Thumb + float = disaster
  - switching back and forth between Thumb mode and regular 32-bit mode

Know your hardware: Resources
- RTR (Real-Time Rendering)
- lots of whitepapers:
  - PowerVR (ImgTec)
  - Tegra (NVIDIA)
  - Adreno (Qualcomm)
  - AMD/ATI: basically the same as X360, but much smaller tiles
- ARM dev center

Think
- Think about your data
- Think about your algorithms
- Think about your constraints
- Think about your hardware

Think
- Basics
  - CPU vs GPU
    - e.g. draw calls: pure CPU cost
  - CPU: memory vs arithmetic
    - memory is slower
  - GPU:
    - vprog vs fshader
    - memory vs arithmetic

Think
- Memory
  - fragmentation
  - data organization
    - AOS vs SOA
    - hot/cold split
  - data structures
    - linear vs random
    - array vs list
    - map vs hashtable
  - allocators
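
AOS vs SOA is easiest to appreciate by counting cache lines. The sketch below is a model, not a benchmark: it assumes a 64-byte cache line and a hypothetical 64-byte particle record whose first 12 bytes are the position, and counts how many distinct lines a loop touches when it reads only the positions.

```python
CACHE_LINE = 64  # bytes; an assumption, check your target CPU


def lines_touched(count, stride, field_offset, field_size):
    """Count distinct cache lines read when one field is read from `count` records.

    Assumes field_size <= CACHE_LINE, so checking the first and last byte is enough.
    """
    lines = set()
    for i in range(count):
        start = i * stride + field_offset
        lines.add(start // CACHE_LINE)
        lines.add((start + field_size - 1) // CACHE_LINE)
    return len(lines)


# AOS: positions are buried inside 64-byte records (hypothetical layout).
aos_lines = lines_touched(count=1000, stride=64, field_offset=0, field_size=12)

# SOA: positions are packed back to back in their own array.
soa_lines = lines_touched(count=1000, stride=12, field_offset=0, field_size=12)
```

With this layout the SOA loop pulls in about a fifth of the memory the AOS loop does. A hot/cold split applies the same idea inside a record: keep the fields the inner loop reads together and move the rest elsewhere.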

Think
- Constraints
  - GPU: will you see the difference?
    - really?
    - on a mobile screen?
    - on that one small thingy in the corner?
  - CPU: will you need that?
    - e.g. physics in a casual game?
  - Memory: will you need that?
    - will you need more than XXX actors?

Measure
- you didn't optimize anything if you didn't measure the difference
- you can't optimize
  - if you don't know what needs to be optimized
  - if you can't measure what takes time

Measure
- Tools
  - there are lots of tools
    - Instruments (iOS)
    - PerfHUD (Tegra)
    - Adreno Profiler (Qualcomm)
    - some more probably
  - Poor-man profiler
    - timers
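
A poor-man profiler needs nothing more than a clock and a dictionary. This is a minimal sketch of the idea; the names `timed`, `totals`, `calls` and `report` are made up for illustration, and the workload is a placeholder.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

totals = defaultdict(float)  # label -> accumulated seconds
calls = defaultdict(int)     # label -> number of timed sections


@contextmanager
def timed(label):
    """Accumulate wall-clock time spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start
        calls[label] += 1


def report():
    """Return (label, seconds) pairs, most expensive first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder workload standing in for one frame's update step.
for _ in range(3):
    with timed("update"):
        sum(i * i for i in range(10000))
```

Timers like this are coarse and add their own overhead, but they run anywhere, including on devices where no vendor tool is available.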

Unity use case: random bits
- Mobile shaders
  - specialized versions of the usual built-ins
- Skinning
  - full NEON/VFP impl
  - usually 10-15% of the C code's time
  - and we are not done optimizing it ;-)
- Rej's baking of material to texture
  - and coming soon: BRDF baking to texture

Unity use case: random bits
- Remote Profiler
  - runs on target hw, data is transferred over wifi
  - collect in Editor and show pretty graphs ;-)
- Sort alpha-test *after* opaque
- check *lots* of extensions
- LODs - almost done
- Vertex Cache optimization - after LODs ;-)
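
"Sort alpha-test after opaque" amounts to bucketing draws by material kind before submission. The sketch below only illustrates the ordering idea and is not Unity's implementation; the names and the scene are hypothetical. The usual motivation, as far as I understand it, is that discard on a TBDR gets in the way of hidden surface removal, so opaque geometry should go first.

```python
OPAQUE, ALPHA_TEST = 0, 1


def draw_order(objects):
    """Order (name, kind) pairs so that all opaque draws come before alpha-tested ones.

    sorted() is stable, so submission order is preserved within each kind.
    """
    return [name for name, kind in sorted(objects, key=lambda obj: obj[1])]


# Hypothetical scene in submission order.
scene = [
    ("fence", ALPHA_TEST),
    ("ground", OPAQUE),
    ("leaves", ALPHA_TEST),
    ("house", OPAQUE),
]
order = draw_order(scene)
```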

Closing Words
- Know hardware
- Know data
- Think data
- Think constraints
- Measure always
- You'd better know earlier
- You should always be optimizing
