Tales from the Optimization Trenches - Unite Copenhagen 2019

In this talk, you'll learn about the tools and techniques that Unity's Consulting and Development team uses to identify and fix performance issues. The team travels the world visiting customers and conducting Project Reviews, in-depth engagements to locate and resolve performance bottlenecks. This session is designed to help you apply their knowledge to your Unity projects, so you'll see examples of real-life performance problems, their solutions, and receive up-to-date best practice advice.

Speaker: Ignacio Liverotti – Unity

Watch the session on YouTube: https://youtu.be/GuODu4-cXXQ

Published in: Technology

  1. 1. Tales from the Optimization Trenches Ignacio Liverotti Unity Technologies
  2. 2. About me and what I do here at Unity
      - Joined Unity as a Software Engineer in 2015
      - Became a Developer Relations Engineer in 2018
      - I visit our Enterprise customers and help them resolve technical issues affecting their projects
  3. 3. Project Reviews
      - Multi-day engagement:
        - We travel to our customers’ offices
        - Review their projects
        - Identify problems
        - Some of them are resolved onsite
        - We investigate and recommend solutions for the rest
  4. 4. Project Reviews
      - Types of problems:
        - Runtime performance
        - Build/patch size
        - Load times
        - Workflow issues
        - Build times
  5. 5. Today’s plan
      - Introduction to optimization and profiling in Unity
      - CPU optimization
      - GPU optimization
      - Memory footprint optimization
      - (Optimization) rules to live by
  6. 6. Introduction to optimization and profiling
  7. 7. What is optimization?
      - Modifying a project so that an aspect of it is more efficient or uses fewer resources (CPU, GPU, etc.)
  8. 8. Why do we want to optimize?
      - To pass the certification requirements imposed by the various distribution platforms
      - To reduce battery consumption
      - To deploy our project to a wider range of target devices
      - To streamline our production process
  9. 9. Optimization involves a lot more than rewriting code!
  10. 10. What else does it involve?
      - Reviewing assets (texture size and format, poly count, audio sample rate, etc.)
      - Reviewing the Project Settings
      - Reviewing the assets’ settings
      - Simplifying our solutions
  11. 11. The first step in our (optimization) journey: profiling!
      - Using tools to gather actual data on how resources are being used
      - This data will drive our optimization efforts
      - We don’t want to optimize based on “guesses”
      - We want the tools to let us know where the problems are
  12. 12. A note about optimization ‘tips’ and ‘advice’
      - Don’t apply optimization advice blindly
        - Certain pieces of advice are always applicable
        - But good advice applied in the wrong situation can make things worse
        - A technique that worked in a certain project and platform might not work for you
  13. 13. CPU optimization
  14. 14. What is the goal of CPU optimization?
      - To reduce the stress on the CPU
        - Because the CPU is the actual performance bottleneck
        - Or to free the CPU so that we can do more
  15. 15. How do we achieve that?
      - By using more efficient combinations of algorithms and data structures
      - By aiming for nearly zero per-frame allocations
        - The GC algorithm can be quite CPU intensive!
  16. 16. Tools of the trade: CPU
      - Unity Profiler
      - Unity Profile Analyzer
        - Don’t miss the next talk!
      - Xcode Instruments
      - Intel VTune Amplifier
      - Consoles have their own proprietary tools
  17. 17. Tools of the trade: CPU
  18. 18. Tools of the trade: CPU
  19. 19. Example 1: Per-frame memory allocations
      - Scenario: Management game for mobile platforms that ‘feels’ slow during gameplay
      - We take a capture using the Unity Profiler and uncheck all items, except for ‘GarbageCollector’:
  20. 20. The Garbage Collector is running once every three frames
  21. 21. Example 1: Per-frame memory allocations
      - Our hypothesis: Our code is allocating memory on a per-frame basis
      - Let’s select a random frame in the Unity Profiler:
  22. 22. Example 1: Per-frame memory allocations
      - What about 10 frames later?
  23. 23. 400 KB of allocations per frame @60 FPS ~= 24 MB of allocations per second
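
The per-second figure on this slide is the per-frame allocation multiplied by the frame rate. Written out, assuming decimal units (1 MB = 1000 KB):

```latex
\[
400\ \tfrac{\text{KB}}{\text{frame}} \times 60\ \tfrac{\text{frames}}{\text{s}}
  = 24\,000\ \tfrac{\text{KB}}{\text{s}}
  \approx 24\ \tfrac{\text{MB}}{\text{s}}
\]
```
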
  24. 24. Example 1: Per-frame memory allocations
      - Let’s dig into the ‘GC Alloc’ column:
  25. 25. Example 1: Per-frame memory allocations

          void Update()
          {
              ProcessScore();
              ProcessHitPoints();
              // More game logic methods.
          }

          private void ProcessScore()
          {
              Debug.Log("GameLogic.ProcessScore(). Score: " + Score.ToString("00000"));
              // Score processing logic.
          }

          private void ProcessHitPoints()
          {
              Debug.Log("GameLogic.ProcessHitPoints(). HitPoints: " + HitPoints.ToString("00000"));
              // Hit points processing logic.
          }

  26. 26. Example 1: Per-frame memory allocations
      - From https://docs.unity3d.com/Manual/PlatformDependentCompilation.html

          using System.Diagnostics;

          public static class Logging
          {
              [Conditional("ENABLE_LOG")]
              public static void Log(object message)
              {
                  UnityEngine.Debug.Log(message);
              }
          }

  27. 27. Example 1: Per-frame memory allocations
      - If we want to see the log messages, we need to add ENABLE_LOG to the list of defined symbols:
  28. 28. Example 1: Per-frame memory allocations
      - Let’s remove ENABLE_LOG and reprofile:
  29. 29. Example 1: Per-frame memory allocations
      - Zero per-frame allocations
      - No GC spikes!
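
Why does removing ENABLE_LOG eliminate the allocations, and not just the log output? When a method is marked with `[Conditional]` and the symbol is not defined, the C# compiler omits the call at each call site, including the evaluation of its arguments, so the string concatenation never runs. The sketch below shows this in plain C# outside Unity. `ConditionalLogDemo`, `BuildMessage` and `MessagesBuilt` are hypothetical names used only for illustration, and `Console.WriteLine` stands in for `UnityEngine.Debug.Log`.

```csharp
using System;
using System.Diagnostics;

public static class ConditionalLogDemo
{
    // Counts how many times the log message string was actually built.
    public static int MessagesBuilt;

    // Same pattern as the Logging wrapper from the slides;
    // Console.WriteLine stands in for UnityEngine.Debug.Log here.
    [Conditional("ENABLE_LOG")]
    public static void Log(object message)
    {
        Console.WriteLine(message);
    }

    // Builds the message the same way the slide code does (allocates a string).
    public static string BuildMessage(int score)
    {
        MessagesBuilt++;
        return "Score: " + score.ToString("00000");
    }

    public static void ProcessScore(int score)
    {
        // Without ENABLE_LOG defined, the compiler removes this entire call,
        // so BuildMessage is never invoked and nothing is allocated.
        Log(BuildMessage(score));
    }
}
```

Compiled without `ENABLE_LOG`, calling `ConditionalLogDemo.ProcessScore` leaves `MessagesBuilt` unchanged. With the symbol defined, the counter increments and the message is printed.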
  30. 30. Takeaway: use the Unity Profiler to understand where your managed allocations are coming from and fix them
  31. 31. Example 2: GC spikes in a fast-paced game
      - Scenario: Mobile racing game where the frame rate needs to be steady
      - Common approach: let the GC do its work
        - The problem it causes: When the GC kicks in, program execution actually stops
        - Also, the larger the managed heap, the longer it takes for the GC algorithm to complete
  32. 32. Example 2: GC spikes in a fast-paced game
      - GC capture:
  33. 33. Example 2: GC spikes in a fast-paced game
      - What we recommend:
        - Unload all resources when transitioning from the menu to the ‘racing’ scene
        - Allocate a pool of objects
        - Optimize the frame time as much as possible
        - Enable the incremental garbage collector
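
The ‘allocate a pool of objects’ recommendation can be sketched as follows. This is a minimal illustration in plain C#, not the customer's implementation. `SimplePool`, `Rent` and `Return` are hypothetical names, and a real Unity pool would typically also deactivate and reset the objects it hands back.

```csharp
using System;
using System.Collections.Generic;

// Minimal object pool: all instances are created up front (for example while
// loading the racing scene), so gameplay code reuses them instead of
// allocating new ones and feeding the garbage collector.
public sealed class SimplePool<T> where T : class
{
    private readonly Stack<T> _free;
    private readonly Func<T> _factory;

    public SimplePool(int capacity, Func<T> factory)
    {
        _factory = factory;
        _free = new Stack<T>(capacity);
        for (int i = 0; i < capacity; i++)
        {
            _free.Push(factory());
        }
    }

    // Number of instances currently available for reuse.
    public int FreeCount => _free.Count;

    // Hands out a pooled instance; only allocates if the pool has run dry.
    public T Rent()
    {
        return _free.Count > 0 ? _free.Pop() : _factory();
    }

    // Gives an instance back so a later Rent() can reuse it.
    public void Return(T item)
    {
        _free.Push(item);
    }
}
```

Sizing the pool for the worst case seen during profiling keeps `Rent` from ever reaching the fallback allocation during a race.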
  34. 34. Example 2: GC spikes in a fast-paced game
      - The incremental GC was introduced in the early 2019 development cycle
      - Instead of causing a single, long interruption, it splits the work across multiple slices
  35. 35. Example 2: GC spikes in a fast-paced game
      - Incremental GC capture:
  36. 36. Example 2: GC spikes in a fast-paced game
      - Enable it via the Player Settings menu:
  37. 37. Takeaway: use the incremental GC and remember to optimize the frame time as much as possible so that we can give it room to do its job
  38. 38. GPU optimization
  39. 39. What are the goals of GPU optimization?
      - Reduce the stress on the GPU so that we can render our scene at the target frame rate
      - Free up the GPU for performing other tasks (including offloading work from the CPU via compute shaders)
  40. 40. How do we achieve that?
      - Minimizing the number of unnecessary rendering operations
      - Reducing the amount of data sent to the GPU
      - Minimizing the number of state changes (‘set pass’ calls)
      - Optimizing our most expensive shaders
  41. 41. Tools of the trade: GPU
      - Unity Frame Debugger
      - RenderDoc
      - NVIDIA Nsight
      - Xcode Frame Capture
      - Intel GPA
      - Consoles have their own proprietary tools
  42. 42. Tools of the trade: GPU
  43. 43. Example 3: Strategy game for mobile
      - Scenario: A customer working on a strategy game for iOS/Android was experiencing frame rate issues
      - We profiled it using Xcode Frame Capture and saw a warning message saying that we were sending too much geometry to the GPU
  44. 44. Example 3: Strategy game for mobile
  45. 45. Example 3: Strategy game for mobile
  46. 46. Example 3: Strategy game for mobile
      - Both draw calls have the same geometry as input:
  47. 47. Example 3: Strategy game for mobile
      - But the output of one of them takes up significantly more screen real estate in the final frame than the other one!
  48. 48. Example 3: Strategy game for mobile
  49. 49. Do we need that much geometry for the model in the background? We probably don’t.
  50. 50. Example 3: Strategy game for mobile
      - Our advice to the team: create LODs for the assets
  51. 51. Example 3: Strategy game for mobile
  52. 52. Without LODs: 55.1K tris. With LODs: 34.4K tris. Tris reduction: ~38%
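
The ~38% figure follows directly from the two triangle counts reported on this slide:

```latex
\[
\frac{55.1\text{K} - 34.4\text{K}}{55.1\text{K}}
  = \frac{20.7}{55.1}
  \approx 0.376
  \approx 38\%
\]
```
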
  53. 53. Takeaway: by understanding our requirements and the tools and techniques at our disposal, we’ve achieved a nearly 40% reduction in triangle count without observable differences in the final output
  54. 54. Example 4: Sprite rendering
      - Scenario: A customer working on a top-down tile-based strategy game for PC experienced very low frame rates when deploying to mobile and WebGL
  55. 55. Example 4: Sprite rendering
      - We profiled the game using the Unity Profiler and saw high frame times
      - Most of that time was spent on rendering
      - The CPU was too busy creating and sending rendering commands to the GPU
  56. 56. Example 4: Sprite rendering
      - The Frame Debugger revealed one draw call per tile (several hundred draw calls in the real project!)
  57. 57. Example 4: Sprite rendering
  58. 58. Example 4: Sprite rendering
  59. 59. Example 4: Sprite rendering
      - Let’s look at the SpriteRenderer components:
        - They all share the same material!
  60. 60. Let’s look at the code…
  61. 61. Example 4: Sprite rendering

          using UnityEngine;

          public class Tile : MonoBehaviour
          {
              private Material _spriteRendererMaterial;

              void PreprocessMethod()
              {
                  var spriteRenderer = GetComponent<SpriteRenderer>();
                  _spriteRendererMaterial = spriteRenderer.material;
              }
          }

      - The `spriteRenderer.material` statement creates a copy of the first material from this SpriteRenderer, assigns it to the SpriteRenderer and returns it.
  62. 62. Example 4: Sprite rendering

          using UnityEngine;

          public class Tile : MonoBehaviour
          {
              private Material _spriteRendererMaterial;

              void PreprocessMethod()
              {
                  var spriteRenderer = GetComponent<SpriteRenderer>();
                  _spriteRendererMaterial = spriteRenderer.sharedMaterial;
              }
          }

  63. 63. Example 4: Sprite rendering
      - All tiles are now drawn in the same batch
      - The game was successfully deployed to mobile and WebGL
      - And the performance of the standalone version improved as well!
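
To see why reading `.material` splits the tiles into separate draw calls, it can help to model the behaviour outside Unity. The sketch below is a toy model only. `ToyMaterial`, `ToyRenderer` and `BatchingDemo` are hypothetical stand-ins, not UnityEngine types, and counting distinct material instances is just a rough proxy for how many batches the renderer needs.

```csharp
using System.Collections.Generic;

// Toy stand-ins for illustration only; these are NOT the UnityEngine types.
public sealed class ToyMaterial { }

public sealed class ToyRenderer
{
    private bool _instanced;

    public ToyRenderer(ToyMaterial shared)
    {
        SharedMaterial = shared;
    }

    // Reading this never creates a copy.
    public ToyMaterial SharedMaterial { get; private set; }

    // Mimics the behaviour described on the slide: the first read creates a
    // private copy, assigns it to this renderer and returns it.
    public ToyMaterial Material
    {
        get
        {
            if (!_instanced)
            {
                SharedMaterial = new ToyMaterial();
                _instanced = true;
            }
            return SharedMaterial;
        }
    }
}

public static class BatchingDemo
{
    // Distinct material instances in use: a rough proxy for batch count.
    public static int CountDistinctMaterials(IEnumerable<ToyRenderer> renderers)
    {
        var seen = new HashSet<ToyMaterial>();
        foreach (var renderer in renderers)
        {
            seen.Add(renderer.SharedMaterial);
        }
        return seen.Count;
    }
}
```

In this model, a set of tiles created from one material counts as a single distinct material until each tile reads `Material`, at which point every tile owns its own copy and the count equals the number of tiles.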
  64. 64. Takeaway: identifying the problem and understanding the underlying issue allowed us to trim several hundred draw calls per frame
  65. 65. Memory footprint optimization
  66. 66. What are the goals of memory footprint reduction?
      - Fit on devices that don’t have a large amount of memory
      - Improve loading times
      - Be able to add more content
      - Avoid hard crashes due to out-of-memory situations
      - Improve overall performance by shuffling around less data during runtime
  67. 67. Tools of the trade: memory footprint
      - Unity Memory Profiler
      - Xcode Instruments Allocations
      - Xcode Instruments VM Tracker
      - Consoles have their own proprietary tools
  68. 68. Tools of the trade: memory footprint
  69. 69. Tools of the trade: memory footprint
  70. 70. Example 5: Built-in shaders duplication
      - A customer project has a large number of materials that use Unity’s Standard shader:
  71. 71. Example 5: Built-in shaders duplication
      - Each Material is stored in its own AssetBundle:
  72. 72. Example 5: Built-in shaders duplication
      - A memory snapshot of the project reveals that there are multiple instances of the Standard shader in memory:
  73. 73. Example 5: Built-in shaders duplication
      - This happens because the Standard shader is one of Unity’s built-in shaders
      - As such, it cannot be explicitly included in an AssetBundle
      - And it will be implicitly included in every AssetBundle that has a material with a reference to it
  74. 74. Example 5: Built-in shaders duplication
      - What we recommend instead:
        - Download a copy of the built-in shaders
        - Make a copy of the Standard Shader, rename it (e.g., ‘Unite 2019 Standard’) and add it to its own AssetBundle
        - Fix the materials so that they use the new renamed shader
        - Rebuild the AssetBundles
  75. 75. Example 5: Built-in shaders duplication
      - After rebuilding and taking a new memory snapshot:
        - We now have a single instance of our custom ‘Unite 2019 Standard’ shader
  76. 76. Takeaway: by using the right tools and understanding the internals of the engine, we were able to eliminate duplicates in memory
  77. 77. Example 6: Unable to run the game on mobile
      - Scenario: A customer is porting a desktop game to mobile platforms and it keeps crashing on low-end devices due to its high memory footprint
  78. 78. Example 6: Unable to run the game on mobile
      - A memory snapshot of the project reveals this:
  79. 79. Example 6: Unable to run the game on mobile
  80. 80. Example 6: Unable to run the game on mobile
      - Let’s look at the settings for these assets:
  81. 81. Example 6: Unable to run the game on mobile
      - ‘Decompress on load’ option description from the Unity manual:
        - Audio files will be decompressed as soon as they are loaded
        - This option should be used for smaller compressed sounds to avoid the performance overhead of decompressing on the fly
        - Be aware that decompressing Vorbis-encoded sounds on load will use about ten times more memory than keeping them compressed, so don’t use this option for large files
  82. 82. Do we have other options? We do! There’s a ‘Streaming’ option
  83. 83. Example 6: Unable to run the game on mobile
      - Description from the Unity manual:
        - Decode audio on the fly
        - Uses a minimal amount of memory to buffer compressed data
        - The data is incrementally read from the disk and decoded on the fly
  84. 84. Example 6: Unable to run the game on mobile
      - Let’s change the load type to ‘Streaming’:
  85. 85. Example 6: Unable to run the game on mobile
      - And take another memory snapshot:
  86. 86. Set to ‘Decompress on load’: 53 MB. Set to ‘Streaming’: 0.5 MB. Memory footprint reduction: ~99%
  87. 87. Takeaway: understanding our requirements and using the correct settings allowed us to reduce the audio memory footprint by ~99%
  88. 88. Example 6: Unable to run the game on mobile
      - These problems can be avoided!
      - Let’s not catch them via the Memory Profiler
      - Instead, let’s use an AssetPostprocessor and create rules:
        - Background music assets should be set to streaming
        - SFX should be set to decompress on load
        - Etc.
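
The rules on this slide can be expressed as a small function that maps an asset path to a load type. The sketch below is plain C#, so the rule logic can be read and tested outside the editor. The folder convention (`Assets/Audio/Music/`), the `LoadTypeRule` enum and the `AudioImportRules` class are assumptions made for illustration. In a Unity project, one plausible wiring is to call this from the `OnPreprocessAudio` method of an `AssetPostprocessor` subclass, passing `assetPath`, and to apply the result through the `AudioImporter`'s `defaultSampleSettings.loadType`.

```csharp
using System;

// Hypothetical mirror of the two load types discussed in the talk.
public enum LoadTypeRule
{
    DecompressOnLoad,
    Streaming
}

public static class AudioImportRules
{
    // Assumed project convention: background music lives in this folder.
    private const string MusicFolder = "Assets/Audio/Music/";

    // Background music is streamed; everything else (e.g. short SFX)
    // is decompressed on load.
    public static LoadTypeRule ForPath(string assetPath)
    {
        return assetPath.StartsWith(MusicFolder, StringComparison.Ordinal)
            ? LoadTypeRule.Streaming
            : LoadTypeRule.DecompressOnLoad;
    }
}
```

Enforcing the rule at import time means a mis-configured clip is corrected when it enters the project, rather than being discovered later in a memory snapshot.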
  89. 89. Bonus tip: Snapshot diffs
  90. 90. (Optimization) rules to live by
  91. 91. (Optimization) rules to live by
      - Don’t assume where the bottlenecks are; always profile first
      - Profile on the target device
      - Profile early, profile often
      - Don’t apply several fixes simultaneously; tackle one problem at a time
      - Apply the ‘optimization triad’:
        - Optimize your assets
        - Update fewer things
        - Draw less stuff
  92. 92. Thank you
