Hi, my name is Kim Steen Riber, and I will talk about memory profiling in Unity.
I work at Unity as a core developer. Before joining Unity I worked on a number of games, including LIMBO, Watchmen and Total Overdose. At Unity I focus primarily on performance and memory optimizations, and on making tools that give you the ability to make your games as performant as possible. And I like my LEGOs.
There are several things you want to address when optimizing your game. The obvious one, of course, is frames per second. This is where the user gets the smooth experience of a game that runs at 30 or 60 fps. There are two limiting factors here. The first is the time the game takes on the CPU: things like your game code, physics, skinning and other tasks that run on the CPU side. The second is the GPU side, where things like draw calls, shader complexity and fill rate matter. Your game will never run faster than the slower of the two. Another thing that can degrade the user experience is hiccups, which make the game feel like it stutters. Hiccups are caused by long-running tasks such as a garbage collection, or a physics world rebuild when, for example, a static collider is moved. Memory is also important to keep in mind when developing your game, for two main reasons: keeping the runtime memory low to avoid running out of memory, especially on mobile devices, and keeping the memory activity low so that the garbage collector kicks in less often.
In Unity Pro we have a tool for this: the Unity profiler. It has several focus areas, but I will only talk about the CPU and memory profilers. In Unity 4.1 and 4.2 we have added extended functionality to give you a more detailed view of your memory usage.
The CPU profiler lets you see the time consumption for each frame. This is displayed in a hierarchy view and can be sorted with the most time-consuming methods at the top, which lets you pinpoint where to focus your efforts when optimizing. There is also a column that shows how much Mono memory is allocated by each entry in the profiler. I'll get to that later. The profiler can connect to a running player on a device or on your computer. This allows you to profile the actual running game on, for example, an iPhone or in the web player. This is done using the 'Active Profiler' dropdown at the top. In the editor you can also turn on deep profiling. This instruments all calls in managed code and gives you a more detailed call graph. However, it has a large performance overhead, so it is only feasible on small scenes.
The GPU profiler measures the time taken by individual draw calls and lets you drill down into the hierarchy to find the most time-consuming calls. As with CPU profiling, it is best to record the profile when running on the device, since the overhead caused by the editor can be significant.
In Unity 4.1 we have made some improvements to the memory profiler that let you inspect memory at a much finer level than before. There is a simple view and a detailed view. The detailed view is expensive to calculate, so it is implemented as the ability to take a snapshot of the current state. The detailed view shows a list of Unity objects such as assets and scene objects, and also includes a calculation of some core areas of Unity, like webstreams, the Mono managed heap, ShaderLab and many more. This view lets you find memory-consuming objects, such as an uncompressed or excessively large texture accidentally added to your game, and focus your efforts on reducing these. In Unity 4.2 we have also implemented a reference view, which lets you see where a given asset or game object is referenced from. This can help you find out why an asset is not unloaded, by pinpointing where a script is holding a reference to it.
The simple view in the memory profiler shows some key numbers for the used memory. It has the used size and the reserved size of various parts of the engine. Both Mono and Unity reserve blocks of memory ahead of time and then allocate the requested bytes from these. The GfxDriver usage is an estimate calculated when textures and meshes are uploaded to the driver.
As you could see in the simple memory profiler, the memory in Unity is split into managed and native memory. The managed memory is what is used by Mono, and includes what you allocate in scripts and the wrappers for Unity objects. This memory is garbage collected by Mono. The native memory is what Unity allocates to hold everything in the engine. This includes asset data like textures, meshes, audio and animation. It includes game objects and components, and it covers engine internals like rendering, culling, ShaderLab, particles, webstreams, files, physics, etc. This memory is not garbage collected, and sometimes needs a call to UnloadUnusedAssets to be freed.
For managed memory, Mono reserves a block of system memory to use for the allocations requested from your scripts. When the heap space is exhausted, Mono runs the garbage collector to reclaim memory that is no longer referenced. If more memory is needed, Mono allocates more heap space. The heap will grow to fit the peak memory usage of your application. Garbage collection can be a time-consuming operation, especially in large games. This is one of the reasons to keep the memory activity low. Another reason is that with high memory activity, memory is likely to fragment, and the small memory fragments will be unusable and cause memory to grow. When memory is freed, Mono can in some cases give the heap space back to the system, although this is less likely if memory is fragmented.
So the basics of fragmentation are that if there are a lot of allocations, where some are retained and some are freed, memory gets filled with small holes. This looks like free memory, but when you try to make a larger allocation, there is not enough contiguous memory to service that request. This means that Mono will have to allocate more heap blocks even though it seems like there is enough free space.
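The gap between "free" and "usable" memory can be sketched with a toy model. This is only an illustration under simplifying assumptions (one slot per allocation, a fixed-size heap); the class and method names are made up for the example and it is not how Mono's allocator works internally.

```csharp
using System;

// Toy model of a heap: each slot is one unit of memory, true = in use.
// It only illustrates the idea of fragmentation, not Mono's real allocator.
public static class FragmentationDemo
{
    // Total number of free slots in the heap.
    public static int FreeSlots(bool[] heap)
    {
        int free = 0;
        foreach (bool used in heap)
            if (!used) free++;
        return free;
    }

    // Length of the longest run of adjacent free slots.
    public static int LargestFreeRun(bool[] heap)
    {
        int best = 0, run = 0;
        foreach (bool used in heap)
        {
            run = used ? 0 : run + 1;
            if (run > best) best = run;
        }
        return best;
    }

    // Fill the heap with one-slot allocations, then free every second one.
    public static bool[] BuildFragmentedHeap(int size)
    {
        bool[] heap = new bool[size];
        for (int i = 0; i < size; i++) heap[i] = true;
        for (int i = 0; i < size; i += 2) heap[i] = false;
        return heap;
    }
}
```

For a 16-slot heap built this way, FreeSlots reports 8 while LargestFreeRun reports 1, so a request for just two adjacent slots could not be served without growing the heap.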
Even though scripting happens in a managed language like C#, there are a number of ways to control the memory and reduce the number of allocations made by your code. If there are routines in your code that need temporary buffers to process data, it is worth considering keeping these buffers around instead of reallocating them every time the routine runs. Another thing to consider is creating object pools if temporary objects are short-lived and needed often. An example could be bullets: having a pool of bullets, grabbing one when needed and putting it back when done, saves the object instantiation and reduces the allocation activity. A third thing to consider is using structs instead of classes. A struct is a value type, so a local struct lives on the stack (or inline in the object or array that contains it) instead of being a separate heap allocation. Also, avoid OnGUI where you can: even an empty OnGUI method is memory intensive, since GUIUtility will kick in. These are all things that will reduce the memory activity, and thereby reduce the need for garbage collections and the risk of fragmentation.
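A bullet pool along these lines could look like the following minimal sketch. The Bullet and BulletPool types and their members are hypothetical names chosen for illustration; a real game would pool its own bullet type (in Unity, typically game objects that are deactivated rather than destroyed).

```csharp
using System.Collections.Generic;

// Hypothetical bullet type; a real one would carry velocity, damage, etc.
public class Bullet
{
    public float X, Y;
    public bool Active;
}

// Minimal free-list pool: bullets are created up front and recycled.
public class BulletPool
{
    private readonly Stack<Bullet> free = new Stack<Bullet>();

    public BulletPool(int capacity)
    {
        for (int i = 0; i < capacity; i++)
            free.Push(new Bullet());
    }

    // Number of bullets currently waiting in the pool.
    public int FreeCount { get { return free.Count; } }

    // Take a bullet from the pool; only allocates if the pool has run dry.
    public Bullet Get()
    {
        Bullet b = free.Count > 0 ? free.Pop() : new Bullet();
        b.Active = true;
        return b;
    }

    // Hand a bullet back so that a later Get() can reuse it.
    public void Release(Bullet b)
    {
        b.Active = false;
        free.Push(b);
    }
}
```

Sizing the pool for the peak number of simultaneous bullets means Get() never has to allocate during gameplay.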
This is a small example where on the left side we have a script that needs a data buffer to process some data. It is written without regard to memory usage: as you can see, it uses a class for the work data, and the work buffer is allocated every time the function runs. On the right side is a version where the work data is a struct, and the work buffer is placed on the class so that it is allocated only once and then reused. In the CPU profiler you can see how the Mono memory usage is excessively high in the allocating script and how a lot of time is spent allocating the work buffer. In the non-allocating script, no memory is allocated and the script takes no time, as it should, since no real work is done.
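The slide code is not reproduced in these notes, so the following is only a sketch of what the two versions might look like. All type and member names are assumptions made for illustration, and the "work" is a trivial sum so that the allocation pattern is the only difference between the two.

```csharp
using System;

// Allocating version: the work data is a class and the buffer is recreated on every call.
public class WorkDataClass
{
    public int Value;
}

public class AllocatingWorker
{
    public int Process(int count)
    {
        WorkDataClass[] buffer = new WorkDataClass[count];  // new array on every call
        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            buffer[i] = new WorkDataClass { Value = i };    // one heap object per element
            sum += buffer[i].Value;
        }
        return sum;
    }
}

// Non-allocating version: the work data is a struct and the buffer is a reused field.
public struct WorkDataStruct
{
    public int Value;
}

public class ReusingWorker
{
    private readonly WorkDataStruct[] buffer;

    public ReusingWorker(int capacity)
    {
        buffer = new WorkDataStruct[capacity];              // allocated once
    }

    public int Process(int count)
    {
        if (count > buffer.Length) throw new ArgumentOutOfRangeException("count");
        int sum = 0;
        for (int i = 0; i < count; i++)
        {
            buffer[i].Value = i;                            // written in place, no allocation
            sum += buffer[i].Value;
        }
        return sum;
    }
}
```

Both workers return the same result; only the first one produces garbage on each call, which is what shows up in the Mono memory column of the CPU profiler.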
Some objects that you can allocate in scripts have large native memory footprints in Unity. An example is the WWW class, which in Mono is only a small wrapper, but in Unity contains some very large allocations. This backing memory is not cleaned up until the finalizers are run or Dispose is called manually. If this is left to the garbage collector and the finalizers, the memory can potentially live long after the reference is removed from Mono.
So the way the garbage collector works is shown here. A reference is removed from Mono by setting the reference to null or by the object going out of scope. After a while the garbage collector will run, either because the heap space gets exhausted or because we manually call GC.Collect(). When the memory is collected by the garbage collector, the object is put on a queue for the finalizer. This is handled by another thread, and eventually the finalizer will call Dispose on the object. At this point the Unity native memory is deallocated. To skip this round trip, you should call Dispose manually on the object. The object will then not be given to the finalizers, and the memory will be cleaned up immediately. I have a small example that shows this.
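The difference between the two paths can be sketched in plain C#. NativeBackedResource below is a hypothetical stand-in for a wrapper like WWW, and the static counter merely simulates native memory; it is not Unity's implementation.

```csharp
using System;
using System.Threading;

// Stand-in for an object like WWW: a small managed wrapper around a large
// native allocation. NativeBytesInUse simulates the native side.
public sealed class NativeBackedResource : IDisposable
{
    public static long NativeBytesInUse;

    private readonly long size;
    private bool disposed;

    public NativeBackedResource(long size)
    {
        this.size = size;
        Interlocked.Add(ref NativeBytesInUse, size);
    }

    // Deterministic path: releases the native memory immediately.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this); // the finalizer thread no longer needs to visit this object
    }

    // Fallback path: runs on the finalizer thread some time after a collection.
    ~NativeBackedResource()
    {
        Release();
    }

    private void Release()
    {
        if (disposed) return;
        disposed = true;
        Interlocked.Add(ref NativeBytesInUse, -size);
    }
}
```

Calling Dispose() (or wrapping the object in a using block) brings the counter down right away, whereas simply setting the reference to null leaves it unchanged until a collection has happened and the finalizer thread has caught up.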
Garbage collect / Dispose demo
DEMO:
Load www, set it to null and notice a quick release.
Increase the heap and do the same. Notice that memory does not decrease.
Use the detailed view to show that the webstream is still around.
Do GC.Collect and see memory drop.
Load www and use Dispose. See that memory drops.
When using asset bundles, it is important to know what memory Unity has allocated at the various stages. For the webstream data there are several buffers, depending on the settings of the stream. If it is a compressed stream loading from disk, it will have these buffers: the compressed file, some decompression buffers used by LZMA, and the final uncompressed data. The asset bundle created from this www object allocates a map of the objects in the asset bundle and their offsets in the decompressed data. When loading objects from the asset bundle, these objects are created in memory from the decompressed data.
When you have your asset bundle URL or file, you can load this file asynchronously by calling new WWW(). This loads the compressed file into memory and starts the decompression. The decompression algorithm needs working buffers of 8 MB. The decompression constructs the unpacked data, and after it is done it deallocates its decompression buffers. To avoid reallocating the 8 MB decompression buffer for every file being loaded, we keep one buffer in memory that we reuse. Keep in mind that several parallel loads will require more of these buffers to be allocated simultaneously, which results in a large memory spike. To avoid this, load one file at a time instead of starting multiple www requests. For asset bundles created with Unity 4.2 we have changed the compression a bit, which has reduced the decompression buffer requirement to half a megabyte.
After the file is loaded, you construct the asset bundle from the file by calling www.assetBundle. This loads the map of objects and their offsets into the decompressed data. The asset bundle retains a pointer to the WebStream. This prevents the data from being unloaded even if Dispose is called on the www object.
When objects are loaded from the asset bundle, the data from the webstream is used to construct the requested objects. The asset bundle holds references to these objects.
To load objects from the asset bundle, use the Load, LoadAll or mainAsset methods. In the example I will show later, I load a texture. This instantiates the texture from the asset bundle data. Then the texture is uploaded to the GPU, and on player builds the main data is then deallocated. On platforms that use our multithreaded renderer, the texture is transferred to the render thread and the transfer buffer grows to fit the texture data. I will show that later.
When all the objects you need have been loaded from the file, you can get rid of the webstream data and the asset bundle. The www object should be deleted by calling Dispose, and not left for the garbage collector to clean up. As long as the asset bundle is still around, the webstream will not be unloaded, so the asset bundle needs to be unloaded as well in order to release its handle to the webstream. Unloading the asset bundle is done by calling Unload with either true or false. Calling it with false releases the webstream and deletes the map in the asset bundle. If you instead call Unload(true), the asset bundle will also traverse its list of loaded objects and destroy these. If there are still references to these objects in scripts, they will be null after this.
When you are done using your assets, you can unload them by keeping the asset bundle around and calling Unload(true), or you can call UnloadUnusedAssets, and all assets that are no longer referenced will be removed from memory. If an object is not unloaded when calling UnloadUnusedAssets, it is because it is still being referenced. In the profiler in Unity 4.1 you can see whether the object is referenced from script or whether it is, for example, marked as HideAndDontSave. In Unity 4.2 we have added additional functionality, so you can see what other objects are holding a reference to your object. This should help you find the references and unload the object.
So the most memory-efficient way of loading asset bundles is, if possible, to load just one asset bundle at a time, load what is needed and destroy the asset bundle again. Here is a small example that loads a texture from an asset bundle. When this coroutine exits, both the webstream and the asset bundle are unloaded, and there is no memory left other than the instantiated texture. To get rid of this texture later on, set the reference to null and call UnloadUnusedAssets.
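The coroutine from the slide is not included in these notes. A minimal sketch of the sequence described, written against the Unity 4.x scripting API used throughout this talk (WWW, AssetBundle.Load, AssetBundle.Unload, Resources.UnloadUnusedAssets), might look like this. The class name, the URL and the asset name are placeholders, and later Unity versions have renamed or replaced several of these calls, so treat it as an illustration rather than the original demo code.

```csharp
using System.Collections;
using UnityEngine;

public class TextureFromBundle : MonoBehaviour
{
    // Placeholder values, for illustration only.
    public string bundleUrl = "http://example.com/mybundle.unity3d";
    public string textureName = "MyTex";

    private Texture2D tex;

    IEnumerator Start()
    {
        // Loads the compressed file and decompresses it into the WebStream.
        WWW www = new WWW(bundleUrl);
        yield return www;

        if (www.error != null)
        {
            Debug.LogError(www.error);
            www.Dispose();
            yield break;
        }

        // Builds the object map; the bundle now retains the WebStream.
        AssetBundle ab = www.assetBundle;

        // Instantiates the texture from the decompressed data.
        tex = ab.Load(textureName, typeof(Texture2D)) as Texture2D;

        // Release the bundle's map and its handle to the WebStream,
        // then release the WWW object's own handle without waiting for the GC.
        ab.Unload(false);
        www.Dispose();
    }

    // Later, when the texture is no longer needed.
    public void ReleaseTexture()
    {
        tex = null;
        Resources.UnloadUnusedAssets();
    }
}
```

Unload(false) is used here rather than Unload(true) so that the loaded texture survives the bundle; it is only removed once nothing references it and UnloadUnusedAssets runs.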
I have made a small example that shows the asset bundle loading and unloading, and how you can use the memory profiler to see what is happening and what memory is allocated. This is demoed on Unity 4.2, as this has some extra features that I would also like to show you.
DEMO:
Load new (Unity 4.2) www. Notice that memory increases. Dispose removes all memory again.
Load old (pre 4.2) www. After Dispose, there is +8MB left. Use the detailed view to investigate, and show the cached buffer.
Load www, ab and texture. Detailed view: show webstream, asset bundle and texture. Notice the gfxClientBuffer.
Dispose www -> nothing happens, because ab has retained its handle.
Unload(true). See memory fall to 40MB -> gfxClient and cached decompression buffer.
Load all. Unload(false) and Dispose().
See detailed view -> tex referenced by script.
Set tex = null and see that nothing happens. See that tex is still in the detailed view, but not referenced.
Call UnloadUnusedAssets. See gfx memory go down and tex is gone from the detailed view.
[UniteKorea2013] Memory profiling in Unity
Memory profiling in Unity
Kim Steen Riber (firstname.lastname@example.org)
Who am I?
• Core developer @ Unity
• Released games
  • LIMBO
  • Watchmen
  • Total Overdose
• Unity focus areas
  • CPU performance
  • Memory optimizations
09-05-2013
Optimizing your game
• FPS
  • CPU usage (Gamecode, Physics, Skinning, Particles, …)
  • GPU usage (Drawcalls, Shader usage, Image effects, …)
• Hiccups
  • Spikes in framerate caused by heavy tasks (e.g. GC.Collect)
  • Physics world rebuild due to moved static colliders
• Memory
  • Maintaining small runtime memory on device
  • Avoid GC hiccups by reducing memory activity
  • Leak detection
CPU Profiler
• CPU time consumption for methods
• Mono memory activity
• Remote profiling of your game running on target device
• Deep profile (editor only)
  • Detailed view, but large overhead, only usable for very small scenes
Memory Profiler
• Simple and Detailed memory view (Unity 4.1)
  • Detailed view is taken as a snapshot (too expensive for per frame)
• Reference view (Unity 4.2)
  • See where an object is referenced: good for leak hunting
Memory Profiler
• Simple memory view
  • Unity reserves chunks of memory from the OS
  • Mono allocates from a reserved heap
  • GfxMemory is an estimate
Mono vs. Unity Memory
• Managed - Mono
  • Script objects
  • Wrappers for Unity objects
    • Game objects
    • Assets
    • Components
    • …
  • Memory is garbage collected
• Native - Unity internal
  • Asset data
    • Textures
    • Meshes
    • Audio
    • Animation
  • Game objects
  • Engine internals
    • Rendering
    • Particles
    • Webstreams
    • Physics
    • …
Mono Memory Internals
• Allocates system heap blocks for allocations
• Garbage collector cleans up
• Will allocate new heap blocks when needed
• Fragmentation can cause new heap blocks even though memory is not exhausted
• Heap blocks are kept in Mono for later use
  • Memory can be given back to the system after a while
Fragmentation
• Memory will get fragmented if there is a lot of activity
[Diagram: four heap snapshots labelled Mono: 16K / System: 64K, Mono: 32K / System: 64K, Mono: 32K / System: 64K, Mono: 48K / System: 128K]
Avoiding Allocations
• Reuse temporary buffers
  • If buffers for data processing are needed every frame, allocate the buffer once and reuse it
• Allocate pools of reusable objects
  • Create free lists for objects that are needed often
• Use structs instead of classes
  • Structs are value types (locals are placed on the stack), while classes use the heap
• Don't use OnGUI
  • Even empty OnGUI calls are very memory intensive
Avoiding Allocations
• Use the CPU profiler to identify Mono allocations
Unity Object wrapper
• Some objects used in scripts have large native backing memory in Unity
  • Memory not freed until finalizers have run
[Diagram: managed WWW wrapper backed by native memory: decompression buffer, compressed file, decompressed file]
Mono Garbage Collection
• Object goes out of scope
• GC.Collect runs when
  • Mono exhausts the heap space
  • Or user calls System.GC.Collect()
• Finalizers
  • Run on a separate thread
• Unity native memory
  • Dispose() cleans up internal memory
  • Eventually called from finalizer
  • Manually call Dispose() to clean up
[Diagram: main thread: www = null; new(someclass); // no more heap -> GC.Collect(); finalizer thread: www.Dispose();]
Garbage collect and Dispose
Demo
Assetbundle memory usage
• WebStream
  • Compressed file
  • Decompression buffers
  • Uncompressed file
• Assetbundle
  • Map of objects and offsets in WebStream
• Instantiated objects
  • Textures, Meshes, etc.
Assetbundle memory usage
• WWW www = new WWW(assetbundle_url);
  • Loads the compressed file into memory
  • Constructs decompression buffers (8MB per file)
  • Decompresses the file into memory
  • Deallocates the decompression buffers
  • One decompression buffer of 8MB is reused and never deallocated
Assetbundle memory usage
• AssetBundle ab = www.assetBundle;
  • Loads the map of where objects are in the webstream
  • Retains the www WebStream
[Diagram: WebStream (Tex, Tex, Mesh) referenced by AssetBundle]
Assetbundle memory usage
• AssetBundle ab = www.assetBundle;
  • Loads the map of where objects are in the webstream
  • Retains the www WebStream
[Diagram: WebStream (Tex, Tex, Mesh) referenced by AssetBundle, which also references the loaded objects]
Assetbundle memory usage
• Texture2D tex = ab.Load("MyTex", typeof(Texture2D)) as Texture2D;
  • Instantiates a texture from the assetbundle
  • Uploads the texture to the GPU
  • In the editor a system copy is retained (2x memory)
  • Transfer buffer for the RenderThread will grow to fit the largest Texture or Vertexbuffer (never shrinks)
Assetbundle memory usage
• Deleting the WebStream
  • www.Dispose(); // will count down the retain count
  • If not called, finalizers will clean the memory eventually
• Deleting the Assetbundle
  • ab.Unload(false);
    • Unloads the map and counts down the www retain count
  • ab.Unload(true);
    • Will also force unload all assets created from the assetbundle
Assetbundle memory usage
• Removing the loaded objects from memory
  • ab.Unload(true)
    • Force unload objects loaded by assetbundle
  • UnloadUnusedAssets()
    • Unloads when there are no more references to the object
• Use the memory profiler to find remaining references
  • In Unity 4.1 the reason why an object is still loaded is given
  • In Unity 4.2 specific scripts or objects referencing a given object are shown
Assetbundle memory usage
• For best memory efficiency do one assetbundle at a time
• If more assetbundles are needed at the same time, load one www object into memory at a time, to reduce decompression buffer usage
Conclusions
• Avoid memory activity
• Use the memory profiler to monitor memory usage
• Load one WWW object at a time
• Use Dispose() on objects that implement IDisposable
  • Especially if they have a large native memory footprint
• Use UnloadUnusedAssets() to clean up assets