Confidential © 2018 Arm Limited
Mobile Graphics Optimization Guides
Owen Wu (owen.wu@arm.com)
Developer Relations Engineer
Who We Are
Arm Developer Relations Introduction
Who We Are
• Arm provides CPU and GPU IP to silicon vendors
• We don't make chips
• Clients include Apple, Samsung, Nintendo, Qualcomm, MediaTek, etc.
• SoftBank acquired us in 2016
• >95% market share of smartphone CPUs
• >1 billion GPUs shipped in 2018
• We help developers make better games
How We Can Help
• Developer education
• Game issue investigation
• Game performance analysis
• Game performance optimization
• Deep collaboration for Mali GPU
• Game promotion at Arm/partner global events
• Development devices (in the future)
Optimization Guides
Why Optimize
• Retain users with reliable performance, smooth gameplay, and long battery life
• Widen market by ensuring good user experience on a wide range of consumer devices
• Enhance visuals by maximizing rendering effectiveness inside a 2.5 Watt system-wide power budget
Best Practices of Optimization
• Write code with optimization in mind
• Hardware knowledge
• Check for performance regressions every day
• Arm Performance Advisor
• Don't do deep optimization too early
• Arm Mobile Studio
How To Do Optimization
• Know the hardware
• Gather reliable and stable data
• Identify the bottleneck first
• Find out why the bottleneck occurs
• Figure out the solution
• Verify the solution
• Optimize one thing at a time
Basic Hardware Concepts
Pipeline Stages
[Diagram: Application → Graphics Driver → Vertex Shader → Primitive Assembler → Rasterizer → Early Frag Op → Fragment Shader → Late Frag Op → Blending. Outputs: Color Output, Depth Output, Stencil Output]
The GPU Is Multi-threaded
• Traditional APIs are single-threaded
• The driver hides the complexity
• The CPU and GPU run in parallel
• Tasks inside the GPU also run in parallel
• Optimization goal: keep the GPU as busy as possible
Desktop GPU – Immediate Rendering Mode
[Diagram: Application Draw Calls → GPU (Vertex Shader, Fragment Shader) → Frame Buffer 0 / Frame Buffer 1]
Mobile GPU – Tiled Rendering Mode
[Diagram: Application Draw Calls → GPU (Vertex Shader, Fragment Shader) → Frame Buffer 0 / Frame Buffer 1]
External Memory / Memory Bandwidth
• External memory bandwidth on mobile is much lower than on desktop
• Reading from or writing to external memory consumes a lot of power
• Tile-based rendering reduces the memory bandwidth requirement
• GPU computing power advances faster than memory bandwidth
• A bandwidth budget of about 2GB/s is usually recommended
[Diagram: GPU with on-chip Tile Memory; access to External memory is the Slow Path]
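To make the 2GB/s figure more concrete, it can be converted into a budget per pixel per frame. The sketch below is plain arithmetic; the 1920x1080 resolution, the 60 FPS target, and reading "GB" as 10^9 bytes are assumptions chosen for illustration and do not come from the slides.

```python
def bytes_per_pixel_budget(bandwidth_bytes_per_s, fps, width, height):
    """External-memory bytes available per screen pixel per frame."""
    per_frame = bandwidth_bytes_per_s / fps
    return per_frame / (width * height)


# Assumed target: 2 GB/s budget, 60 FPS, 1920x1080 screen.
budget = bytes_per_pixel_budget(2e9, 60, 1920, 1080)
print(f"{budget:.1f} bytes per pixel per frame")
```

Under these assumptions the whole frame (geometry, textures, and frame buffer traffic together) gets roughly 16 bytes of external memory traffic per pixel, which is why every avoidable read or write matters.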
Cache
• Keep data in fast memory
• Cache size is small
• Keep data small so cache can keep more data
• Sequential access can increase cache hit rate
Pixel/Fragment/Texel
• A pixel is the data at a single position in the frame buffer
• A fragment is a GPU thread that outputs a pixel
• A texel is the data at a single position in a texture
Texture Compression
• ASTC can give better quality than ETC at the same memory size.
• Or the same quality as ETC at a smaller memory size.
• ASTC takes longer to encode than ETC, which can lengthen the game packaging process. For this reason it is better to use it for the final packaging of the game.
• ASTC gives more control over quality through its configurable block size. There is no single best block size, but 5x5 or 6x6 is generally a good default.
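The effect of the block size on memory can be worked out directly, because every ASTC block occupies 128 bits regardless of its footprint. The sketch below estimates the size of a single mip level; the function names and the 1024x1024 example texture are illustrative assumptions, not part of the talk.

```python
import math

ASTC_BLOCK_BYTES = 16  # every ASTC block is 128 bits, whatever its footprint


def astc_bits_per_texel(block_w, block_h):
    """Average storage cost of one texel for a given block footprint."""
    return 128 / (block_w * block_h)


def astc_size_bytes(width, height, block_w, block_h):
    """Size of one mip level; partial blocks at the edges are stored whole."""
    blocks_x = math.ceil(width / block_w)
    blocks_y = math.ceil(height / block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BYTES


for block in ((4, 4), (5, 5), (6, 6), (8, 8)):
    size_kib = astc_size_bytes(1024, 1024, *block) / 1024
    print(block, f"{astc_bits_per_texel(*block):.2f} bpp", f"{size_kib:.0f} KiB")
```

Larger blocks spread the same 128 bits over more texels, so quality drops as memory drops; 5x5 and 6x6 sit between the 8 bits per texel of 4x4 and the 2 bits per texel of 8x8.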
Texture Color Space
• Use linear color space rendering if using lighting
• Check sRGB in the texture inspector window
• Textures that are not processed as color (such as metallic, roughness, and normal maps) should NOT be marked as sRGB
• Current hardware supports sRGB formats and performs gamma correction automatically for free
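The reason non-color textures must not be flagged as sRGB becomes visible when the conversion is written out. The sketch below implements the standard sRGB-to-linear transfer function that the hardware applies when sampling an sRGB texture; the roughness value of 0.5 is an illustrative assumption.

```python
def srgb_to_linear(c):
    """Standard sRGB decoding for a channel value in the range [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


# A roughness value of 0.5 stored in a texture wrongly flagged as sRGB
# would reach the shader as roughly 0.214 instead of 0.5.
stored_roughness = 0.5
print(round(srgb_to_linear(stored_roughness), 3))
```

Color textures are authored in sRGB, so decoding them is correct; data textures already hold linear values, so decoding them changes the material.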
Texture Filtering
• Trilinear: like bilinear, but also blends between mipmap levels
• Don't use trilinear on textures that have no mipmaps
• This filtering removes the noticeable change between mipmap levels by adding a smooth transition.
Texture Filtering
• Anisotropic: makes textures look better when viewed at an oblique angle, which is good for ground-level textures
• Higher anisotropic levels cost more
Texture Filtering
• Use bilinear for a balance between performance and visual quality
• Trilinear costs more memory bandwidth than bilinear and needs to be used selectively
• Bilinear + 2x anisotropic will most of the time look and perform better than trilinear + 1x anisotropic, so this combination can be a better solution than trilinear.
• Keep the anisotropic level low. Use a level higher than 2 very selectively, for critical game assets only.
• This is because a higher anisotropic level costs a lot more bandwidth and affects device battery life.
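A rough way to reason about these recommendations is to count how many texels one texture sample may touch. The sketch below uses a simplified model that is an assumption, not something stated in the slides: a bilinear probe reads 4 texels, a trilinear probe reads 8 (4 from each of two mipmap levels), and an anisotropy level of N allows up to N probes. Real hardware adapts the number of probes per sample, so these are upper bounds only.

```python
# Texels read by a single probe in the simplified model described above.
TEXELS_PER_PROBE = {"bilinear": 4, "trilinear": 8}


def max_texel_fetches(filter_mode, max_anisotropy=1):
    """Upper bound on texels touched by one texture sample."""
    if max_anisotropy < 1:
        raise ValueError("max_anisotropy must be at least 1")
    return TEXELS_PER_PROBE[filter_mode] * max_anisotropy


for mode, aniso in (("bilinear", 1), ("bilinear", 2), ("trilinear", 1), ("trilinear", 2)):
    print(mode, f"{aniso}x", max_texel_fetches(mode, aniso))
```

In this model bilinear + 2x anisotropic never costs more than trilinear + 1x, and the cost doubles with each doubling of the anisotropic level, which matches the advice to keep the level low.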
Best Optimization Practices for Mali GPU
Reduce Render State Switch
• A render state switch is a very expensive operation
• Render as many primitives as possible in one draw call
• Don't just check the number of draw calls or batches
• The number of render state switches is also an important index
• Tris/SetPass (e.g. 95.2K/34) is a more accurate measure
• Batch as many draw calls as possible
• Static batching
• GPU instancing
• Dynamic batching
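The Tris/SetPass index is simply the triangle count divided by the number of SetPass calls, both of which can be read from the engine's statistics. A minimal sketch, using the 95.2K/34 example from the slide; the function name is hypothetical.

```python
def tris_per_setpass(triangles, setpass_calls):
    """Average number of triangles rendered per render state switch."""
    if setpass_calls <= 0:
        raise ValueError("setpass_calls must be positive")
    return triangles / setpass_calls


# Example from the slide: 95.2K triangles drawn with 34 SetPass calls.
print(tris_per_setpass(95_200, 34))
```

A higher value means more geometry is drawn for each state change, so tracking this number over time shows whether batching work is paying off even when the draw call count looks unchanged.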
Reduce Frame Buffer Switch
• Bind each frame buffer object only once
• Make all required draw calls before switching to the next
• Avoid unnecessary context switches
• Use Unity Frame Debugger to check
• Use Arm Mobile Studio to do an API-level check
Clear Frame Buffer Before Rendering
• Before rendering, the GPU reads the frame buffer from external memory into tile memory
• Minimize start-of-tile loads
• A clear can cheaply initialize the tile memory to a clear color value
• Ensure that you clear or invalidate all of your attachments at the start of each render pass
• Use Unity Frame Debugger to check
• Use Arm Mobile Studio to do an API-level check
[Screenshot: no clear before switching render target, which is bad for performance.]
Reduce Frame Buffer Write
• After rendering, the GPU writes the result from tile memory to external memory
• Minimize end-of-tile stores
• Avoid writing back to external memory whenever possible
• Disable writing to the depth/stencil buffer if the depth/stencil value is not used
• Use Unity Frame Debugger to check
• Use Arm Mobile Studio to do an API-level check
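The saving from skipping an unneeded depth/stencil store can be estimated with simple arithmetic. The sketch below assumes a 1920x1080 render target, a 4-byte packed depth/stencil format, 60 FPS, and a 2GB/s (2 x 10^9 bytes) bandwidth budget; all of these are illustrative assumptions rather than measured values.

```python
def attachment_traffic_bytes_per_s(width, height, bytes_per_pixel, fps):
    """External-memory traffic caused by storing one attachment every frame."""
    return width * height * bytes_per_pixel * fps


# Assumed: full-screen depth/stencil attachment, 4 bytes per pixel, 60 FPS.
store_cost = attachment_traffic_bytes_per_s(1920, 1080, 4, 60)
fraction_of_budget = store_cost / 2e9
print(f"{store_cost / 1e6:.0f} MB/s, {fraction_of_budget:.0%} of a 2 GB/s budget")
```

Under these assumptions, writing back a depth/stencil buffer that nothing reads later consumes about a quarter of the budget, and the same calculation applies to any start-of-tile load that a clear or invalidate would have avoided.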
Avoid Rendering Small Triangles
• The bandwidth and processing cost of a vertex is typically orders of magnitude higher than the cost of processing a fragment
• Make sure that you get many pixels' worth of fragment work for each primitive
• Make sure each model creates at least 10-20 fragments per primitive
• Use dynamic mesh level-of-detail, with simpler meshes when objects are further away from the camera
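The 10-20 fragments-per-primitive guideline can be turned into a simple level-of-detail selection rule. The sketch below is hypothetical: it assumes the application can estimate how many pixels an object covers on screen, and it picks the most detailed mesh that still averages at least the minimum number of fragments per triangle. Game engines usually drive LOD from screen-size thresholds instead, so this only illustrates the reasoning.

```python
def pick_lod(lods, covered_pixels, min_fragments_per_triangle=10):
    """Pick the most detailed LOD that keeps enough fragments per triangle.

    lods: list of (name, triangle_count), ordered from most to least detailed.
    covered_pixels: estimated number of screen pixels the object covers.
    """
    for name, triangle_count in lods:
        if covered_pixels / triangle_count >= min_fragments_per_triangle:
            return name
    # Nothing meets the guideline: fall back to the coarsest mesh.
    return lods[-1][0]


# Hypothetical meshes for one model.
lods = [("LOD0", 20_000), ("LOD1", 5_000), ("LOD2", 1_000)]
for pixels in (400_000, 60_000, 5_000):
    print(pixels, pick_lod(lods, pixels))
```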
Take Advantage of Early-Z
• Many fragments are occluded by other fragments
• Running the fragment shader for an occluded fragment wastes GPU power
• Render opaque objects from front to back
• Occluded fragments are rejected early if:
• The fragment shader doesn't use discard
• The fragment shader doesn't write a value to the depth buffer
• Alpha-to-coverage is OFF
• Otherwise the fragment takes the Late-Z path, which rejects occluded fragments only after the fragment shader has run
[Diagram: Early Frag Op → Fragment Shader → Late Frag Op]
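Front-to-back ordering of opaque objects amounts to sorting the draw list by distance from the camera before submitting it. A minimal sketch; the `Drawable` type and the object names are hypothetical, and an engine may already do this sorting itself.

```python
from dataclasses import dataclass


@dataclass
class Drawable:
    name: str
    position: tuple  # world-space (x, y, z)


def front_to_back(drawables, camera_position):
    """Return opaque drawables ordered nearest-first.

    Squared distance is enough for ordering, so no square root is needed.
    """
    def squared_distance(drawable):
        return sum((p - c) ** 2 for p, c in zip(drawable.position, camera_position))

    return sorted(drawables, key=squared_distance)


scene = [
    Drawable("mountain", (0.0, 0.0, 500.0)),
    Drawable("player", (0.0, 0.0, 5.0)),
    Drawable("house", (10.0, 0.0, 50.0)),
]
print([d.name for d in front_to_back(scene, (0.0, 0.0, 0.0))])
```

Drawing the nearest objects first fills the depth buffer early, so fragments of the objects behind them can fail the early depth test before their fragment shader runs.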
Always Use Mipmaps If the Camera Is Not Still
• Using mipmapping improves GPU performance
• Fewer cache misses
• Mipmapping also reduces texture aliasing and improves final image quality
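The price of mipmapping is extra texture memory, and it is modest: a full mip chain adds about one third to the base level. The slides do not state this figure; the sketch below derives it by summing the texel counts of every level for an assumed 1024x1024 texture.

```python
def mip_chain_texels(width, height):
    """Total texel count of a full mip chain, down to the 1x1 level."""
    total = 0
    while True:
        total += width * height
        if width == 1 and height == 1:
            break
        width = max(1, width // 2)
        height = max(1, height // 2)
    return total


base = 1024 * 1024
total = mip_chain_texels(1024, 1024)
print(f"mip chain is {total / base:.3f}x the size of the base level")
```

In exchange, distant surfaces sample from small mip levels whose texels sit close together in memory, which is where the reduction in cache misses comes from.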
Don't Use Pre-Z Pass
• In a Z-prepass, the opaque geometry is drawn twice: first as a depth-only update, and then as a color render that uses an EQUALS depth test to reduce redundant fragment processing
• Mali GPUs already include optimizations such as FPK (Forward Pixel Kill) that reduce redundant fragment processing automatically
• The cost of the additional draw calls, vertex shading, and memory bandwidth nearly always outweighs the benefits of a Z-prepass
Shader Floating-point Precision
• Use the mediump and highp keywords
• Full FP32 precision for vertex attributes is unnecessary for many uses of attribute data
• Keep the data at the minimum precision needed to produce an acceptable final output
• Use FP32 for computing vertex positions only
• Use the lowest possible precision for other attributes
• Don't use FP32 for everything by default
• Don't upload FP32 data into a buffer and then read it as a mediump attribute
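Why positions need FP32 while many other attributes do not can be seen by rounding a few values to 16-bit floating point. The sketch below assumes that mediump is implemented as FP16, which the slide's contrast between FP32 and mediump suggests but does not state; it uses the half-precision format code of Python's `struct` module to do the rounding.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# A normalized attribute such as a color or UV survives with a small error.
print(to_fp16(0.1))     # 0.0999755859375
# A large world-space coordinate loses whole units.
print(to_fp16(2049.0))  # 2048.0
```

FP16 has only 11 significant bits, so above 2048 it cannot even represent every integer; that is acceptable for colors and normals but visibly wrong for vertex positions.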
Arm Mobile Studio Introduction
What is in the box?
• Streamline
• Graphics Analyzer
• Mali Offline Compiler (separate download)
• Performance Advisor (closed beta)
Download Arm Mobile Studio: http://developer.arm.com/mobile-studio
Streamline: Performance Analyzer
Mali GPU support
• Analyze and optimize Mali™ GPU graphics and compute workloads
• Accelerate your workflow using built-in analysis templates
Optimize for energy
• Move beyond simple frame time and FPS tracking
• Monitor overall usage of processor cycles and memory bandwidth
Speed up your app
• Find out where the system is spending the most time
• Tune code for cache efficiency
Native code profiling
• Break performance down by function
• View cost alongside disassembly listing
Arm CPU support
• Profile 32-bit and 64-bit apps for ARMv7-A and ARMv8-A cores
• Tune multi-threading for DynamIQ multi-core systems
Application event trace
• Annotate software workloads
• Define logical event channel structure
• Trace cross-channel task dependencies
Tune your rendering
• Identify critical-path GPU shader core resources
• Detect content inefficiency
Streamline
Triage nurse scenarios
• Vsync bound
• Fragment bound
• CPU bound
• Serialization problems
• Thermally bound
[Chart marker: 16.6 ms]
Graphics Analyzer: GPU API Debugger
Shader analysis
• Capture and view all shaders used
• Optimize shader performance using integrated Mali Offline Compiler
Cross platform
• Host support for Windows, macOS, and Linux
• Target support for any Android GPU
Rendering API debug
• Graphics debug for content developers
• Support for all versions of OpenGL ES and Vulkan
Visual analysis views
• Native mode
• Overdraw mode
• Shader map mode
• Fragment count mode
State visibility
• Show API state after every API call
• Trace backwards from point-of-use to API call responsible for state set
Android utility app
• Manage on-device connection
• Select and launch user application
Frame analysis
• Diagnose root causes of rendering errors
• Identify sources of rendering inefficiency
[Screenshot of Graphics Analyzer with labeled panels: Trace outline; Frame capture; Vertex data; API calls; Statistics; Target state; Shaders; Textures, Buffers, Uniforms, …]
Mali Offline Compiler: Shader static analysis
Rapid iteration
• Verify impact of shader changes without needing whole application rebuild
Profile for any Mali GPU
• Cost shader code for every Mali GPU without needing hardware
Mali GPU aware
• Support for all actively shipping Mali GPUs
• Cycle counts reflect specific microarchitecture
Control flow aware
• Best case control flow
• Worst case control flow
Syntax verification
• Verify correctness of code changes
• Receive clear error diagnostics for invalid shaders
Critical path analysis
• Identify dominant shader resource
• Target this for optimization!
Register usage
• Work registers
• Uniform registers
• Stack spilling
Mali Offline Compiler
• Use Graphics Analyzer (GA) to capture the API calls and shaders to understand the application's behavior.
• Use the Offline Compiler to profile instruction counts for ALU, L/S, and TEX
• If a shader needs more registers than are available, the GPU has to perform register spilling
• Register spilling causes big inefficiencies and higher Load/Store utilization
Mali_Offline_Compiler_v4.3.0$ ./malisc --core Mali-T600 --revision 0p0_15dev0 --driver Mali-T600_r4p0-00rel0 --vertex shader-176.vert -V
ARM Mali Offline Shader Compiler v4.3.0 (C) Copyright 2007-2014 ARM Limited. All rights reserved.
Compilation successful. 3 work registers used, 16 uniform registers used, spilling not used.
                A     L/S   T     Total   Bound
Cycles:         9     5     0     14      A
Shortest Path:  4.5   5     0     9.5     L/S
Longest Path:   4.5   5     0     9.5     L/S
Note: The cycles counts do not include possible stalls due to cache misses.
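When many shaders are checked this way, the report can be read by a script to find the bounding unit and to flag spilling. The sketch below parses only the v4.3.0 text reproduced above; the function names are hypothetical, and other versions of the compiler may print a different report layout.

```python
import re

# Sample copied from the report above (v4.3.0 layout).
SAMPLE = """\
Compilation successful. 3 work registers used, 16 uniform registers used, spilling not used.
A L/S T Total Bound
Cycles: 9 5 0 14 A
Shortest Path: 4.5 5 0 9.5 L/S
Longest Path: 4.5 5 0 9.5 L/S
"""

ROW = re.compile(
    r"^(Cycles|Shortest Path|Longest Path):\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$",
    re.MULTILINE,
)


def parse_report(text):
    """Return {row label: {"A", "L/S", "T", "Total", "Bound"}} for each row found."""
    rows = {}
    for label, a, ls, t, total, bound in ROW.findall(text):
        rows[label] = {
            "A": float(a), "L/S": float(ls), "T": float(t),
            "Total": float(total), "Bound": bound,
        }
    return rows


def uses_spilling(text):
    """True or False from the summary line, or None if the line is missing."""
    match = re.search(r"spilling (not )?used", text)
    return None if match is None else match.group(1) is None


report = parse_report(SAMPLE)
print(report["Longest Path"]["Bound"], uses_spilling(SAMPLE))
```

For this shader the longest path is bound by the load/store unit, so reducing load/store work is the place to start, and no register spilling needs to be fixed first.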
Performance Advisor: Analysis Reports
Fast workflow
• Integrate data capture and analysis into nightly CI test workflow
• Read results over a nice cup of tea
Caveats …
• Still under development
• Currently in closed beta
Region views
• Split by dynamic behavior
• Split by application annotation
Executive dashboard
• Show high level status summary
• Show status breakdown by regions of interest
Overview charts
• See performance trends over time
• See region splits
Summary reports
• Easy-to-use performance status reports
• Integrated initial root cause analysis
Performance Advisor
• An automated performance triage nurse
• Move beyond simple FPS-based regression tracking
• Perform an automated first-pass analysis
• Generate easy-to-read performance reports
• Route common issues directly to the team for review
• Free up performance experts to focus on the difficult problems
• Integrate into nightly continuous integration
• Catch major issues early
• Detect gradual regressions before they start impacting users
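The idea of catching a gradual regression in nightly runs can be illustrated with a small check on recorded frame times. This is a hypothetical sketch of the general approach and not how Performance Advisor works internally; the window size, the 5% tolerance, and the sample data are all assumptions.

```python
from statistics import median


def has_regressed(nightly_frame_ms, window=5, tolerance=0.05):
    """Compare the median of the latest runs against the median of the earliest.

    nightly_frame_ms: one frame-time figure per nightly run, oldest first.
    Medians are used so that a single noisy night does not trigger an alert.
    """
    if len(nightly_frame_ms) < 2 * window:
        return False  # not enough history to compare two full windows
    baseline = median(nightly_frame_ms[:window])
    recent = median(nightly_frame_ms[-window:])
    return recent > baseline * (1 + tolerance)


# Hypothetical history: each night is only slightly slower than the last.
history = [16.0, 16.1, 15.9, 16.0, 16.1, 16.5, 16.8, 17.0, 17.2, 17.4]
print(has_regressed(history))
```

No single night in this history looks alarming compared with the previous one, yet the recent median is more than 5% above the baseline, which is the kind of drift a nightly check is meant to surface.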
Region-by-region analysis
Thank You
Danke
Merci
谢谢
ありがとう
Gracias
Kiitos
감사합니다
धन्यवाद
‫תודה‬
The Arm trademarks featured in this presentation are registered trademarks or
trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights
reserved. All other marks featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 

[Unity Forum 2019] Mobile Graphics Optimization Guides

  • 1. Mobile Graphics Optimization Guides
      • Owen Wu (owen.wu@arm.com), Developer Relations Engineer
      • Confidential © 2018 Arm Limited
  • 2. Who We Are: Arm Developer Relations Introduction
  • 3. Who We Are
      • Arm provides CPU and GPU IP to silicon vendors
      • We don’t make chips
      • Clients include Apple, Samsung, Nintendo, Qualcomm, MediaTek, etc.
      • SoftBank acquired us in 2016
      • >95% market share of smartphone CPUs
      • >1 billion GPUs shipped in 2018
      • We help developers make better games
  • 4. What We Can Help With
      • Developer education
      • Game issue investigation
      • Game performance analysis
      • Game performance optimization
      • Deep collaboration for Mali GPU
      • Game promotion at Arm/partner global events
      • Development devices (in the future)
  • 5. Optimization Guides
  • 6. Why Optimize
      • Retain users with reliable performance, smooth gameplay, and long battery life
      • Widen market by ensuring a good user experience on a wide range of consumer devices
      • Enhance visuals by maximizing rendering effectiveness inside a 2.5 Watt system-wide power budget
  • 7. Best Practices of Optimization
      • Write code with optimization in mind
      • Hardware knowledge
      • Check for performance regressions every day
      • Arm Performance Advisor
      • Don’t do deep optimization too early
      • Arm Mobile Studio
  • 8. How To Do Optimization
      • Know the hardware
      • Gather reliable and stable data
      • Identify the bottleneck first
      • Find out why the bottleneck happened
      • Figure out the solution
      • Verify the solution
      • Optimize one thing at a time
  • 9. Basic Hardware Concepts
  • 10. Pipeline Stages
      • Application → Graphics Driver → Vertex Shader → Primitive Assembler → Rasterizer → Early Frag Op → Fragment Shader → Late Frag Op → Blending
      • Outputs: Color Output, Depth Output, Stencil Output
  • 11. GPU Is Multi-threading
      • Traditional APIs are single-threaded
      • The driver hides the complexity
      • The CPU and GPU run in parallel
      • The tasks inside the GPU run in parallel too
      • Optimization goal: keep the GPU as busy as possible
  • 12. Desktop GPU: Immediate Rendering Mode
      • Diagram labels: Application, Draw Calls, GPU (Vertex Shader, Fragment Shader), Frame Buffer 0, Frame Buffer 1
  • 13. Mobile GPU: Tiled Rendering Mode
      • Diagram labels: Application, Draw Calls, GPU (Vertex Shader, Fragment Shader), Frame Buffer 0, Frame Buffer 1
  • 14. External Memory / Memory Bandwidth
      • Mobile external memory bandwidth is much smaller than desktop bandwidth
      • Reading from or writing to external memory consumes a lot of power
      • Tile-based rendering reduces the memory bandwidth requirement
      • GPU computing power advances faster than memory bandwidth
      • Usually 2GB/s is recommended
      • Diagram labels: GPU, Tile Memory, External memory, Slow Path
  • 15. Cache
      • Keeps data in fast memory
      • Cache size is small
      • Keep data small so the cache can hold more of it
      • Sequential access can increase the cache hit rate
  • 16. Pixel/Fragment/Texel
      • A pixel is the data at a single position in the frame buffer
      • A fragment is a GPU thread that outputs a pixel
      • A texel is the data at a single position in a texture
  • 17. Texture Compression
      • ASTC can give better quality than ETC at the same memory size
      • Or the same quality as ETC at a smaller memory size
      • ASTC takes longer to encode than ETC and can make game packaging slower, so it is better to use it for the final packaging of the game
      • ASTC gives more control over quality by letting you set the block size. There is no single best block size, but 5x5 or 6x6 is generally a good default
  • 18. Texture Color Space
      • Use linear color space rendering if using lighting
      • Check sRGB in the texture inspector window
      • Textures that are not processed as color (such as metallic, roughness, normal map, etc.) should NOT be used in sRGB color space
      • Current hardware supports sRGB formats and does gamma correction automatically for free
  • 19. Texture Filtering
      • Trilinear: like bilinear, but with added blur between mipmap levels
      • Don’t use trilinear without mipmaps
      • This filtering removes the noticeable change between mipmap levels by adding a smooth transition
  • 20. Texture Filtering
      • Anisotropic: makes textures look better when viewed from different angles, which is good for ground-level textures
      • Higher anisotropic levels cost more
  • 21. Texture Filtering
      • Use bilinear for a balance between performance and visual quality
      • Trilinear costs more memory bandwidth than bilinear and needs to be used selectively
      • Bilinear + 2x anisotropic will most of the time look and perform better than trilinear + 1x anisotropic, so this combination can be a better solution than trilinear
      • Keep the anisotropic level low. Use a level higher than 2 very selectively, for critical game assets only, because higher anisotropic levels cost a lot more bandwidth and affect device battery life
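
The block-size trade-off from slide 17 can be made concrete with a little arithmetic. The sketch below relies on one fact about the format, that every ASTC block occupies 128 bits whatever its footprint, and uses hypothetical helper names (`astc_bits_per_texel`, `astc_texture_bytes`) that do not come from the deck. It counts a single mip level only and ignores file container overhead.

```python
# Sketch: how the ASTC block footprint changes compressed texture size.
# Helper names are illustrative and not part of any Arm or Unity API.

ASTC_BLOCK_BITS = 128  # an ASTC block is always 128 bits, for every footprint


def astc_bits_per_texel(block_w: int, block_h: int) -> float:
    """Bits spent per texel for a given block footprint."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_texture_bytes(width: int, height: int, block_w: int, block_h: int) -> int:
    """Size in bytes of one mip level; edge texels still occupy whole blocks."""
    blocks_x = -(-width // block_w)   # ceiling division
    blocks_y = -(-height // block_h)
    return blocks_x * blocks_y * ASTC_BLOCK_BITS // 8


if __name__ == "__main__":
    for bw, bh in [(4, 4), (5, 5), (6, 6), (8, 8)]:
        size_kib = astc_texture_bytes(1024, 1024, bw, bh) / 1024
        print(f"{bw}x{bh}: {astc_bits_per_texel(bw, bh):.2f} bits/texel, "
              f"{size_kib:.0f} KiB for a 1024x1024 level")
```

For comparison, ETC2 RGB uses 4 bits per texel, so a 6x6 ASTC footprint (about 3.56 bits per texel) is already smaller, while 5x5 (5.12 bits per texel) spends somewhat more memory for higher quality.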
  • 22. Best Optimization Practices of Mali GPU
  • 23. Reduce Render State Switch
      • A render state switch is a very expensive operation
      • Render as many primitives as possible in one draw call
      • Don’t just check the number of draw calls or batches
      • The number of render state switches is also an important index
      • Tris/SetPass (e.g. 95.2K/34) is a more accurate measure
      • Batch as many draw calls as possible
          • Static batch
          • GPU instancing
          • Dynamic batch
  • 24. Reduce Frame Buffer Switch
      • Bind each frame buffer object only once
      • Make all required draw calls before switching to the next one
      • Avoid unnecessary context switches
      • Use the Unity frame debugger to check
      • Use Arm Mobile Studio to do an API-level check
  • 25. Clear Frame Buffer Before Rendering
      • Before rendering, the GPU reads the frame buffer from external memory into tile memory
      • Minimize start-of-tile loads
      • The GPU can cheaply initialize the tile memory to a clear color value
      • Ensure that you clear or invalidate all of your attachments at the start of each render pass
      • Use the Unity frame debugger to check
      • Use Arm Mobile Studio to do an API-level check
      • Screenshot caption: No clear before switching render target. Bad for performance.
  • 26. Reduce Frame Buffer Write
      • After rendering, the GPU writes the result from tile memory to external memory
      • Minimize end-of-tile stores
      • Avoid writing back to external memory whenever possible
      • Disable writing to the depth/stencil buffer if the depth/stencil value is not used
      • Use the Unity frame debugger to check
      • Use Arm Mobile Studio to do an API-level check
  • 27. Avoid Rendering Small Triangles
      • The bandwidth and processing cost of a vertex is typically orders of magnitude higher than the cost of processing a fragment
      • Make sure that you get many pixels’ worth of fragment work for each primitive
      • Make sure each model produces at least 10-20 fragments per primitive
      • Use dynamic mesh level-of-detail, with simpler meshes when objects are further away from the camera
  • 28. Take Advantage of Early-Z
      • Many fragments are occluded by other fragments
      • Running the fragment shader for an occluded fragment wastes GPU power
      • Render opaque objects from front to back
      • An occluded fragment will be rejected early if
          • The fragment shader doesn’t use discard
          • The fragment shader doesn’t write a value to the depth buffer
          • Alpha-to-coverage is OFF
      • Otherwise the fragment takes the Late-Z path, which rejects occluded fragments after the fragment shader has run
      • Diagram labels: Early Frag Op, Fragment Shader, Late Frag Op
  • 29. Always Use Mipmap If Camera Is Not Still
      • Mipmapping improves GPU performance
      • Fewer cache misses
      • Mipmapping also reduces texture aliasing and improves final image quality
  • 30. Don't Use Pre-Z Pass
      • In a pre-Z pass the opaque geometry is drawn twice: first as a depth-only update, and then as a color render that uses an EQUALS depth test to reduce redundant fragment processing
      • Mali GPUs already include optimizations such as FPK (Forward Pixel Kill) that reduce redundant fragment processing automatically
      • The cost of the additional draw calls, vertex shading, and memory bandwidth nearly always outweighs the benefits of a Z-prepass
  • 31. Shader Floating-point Precision
      • Use the mediump and highp keywords
      • Full FP32 precision for vertex attributes is unnecessary for many uses of attribute data
      • Keep the data at the minimum precision needed to produce an acceptable final output
      • Use FP32 for computing vertex positions only
      • Use the lowest possible precision for other attributes
      • Don’t use FP32 for everything
      • Don’t upload FP32 data into a buffer and then read it as a mediump attribute
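
Slide 28's advice to render opaque objects from front to back comes down to sorting the opaque draw list by distance from the camera before submission. The following is a minimal sketch of that ordering step only; `DrawCall`, `squared_distance`, and `sort_front_to_back` are hypothetical names, the object center is assumed to stand in for the whole mesh, and a real engine would typically also weigh render state changes (slide 23) when choosing the final order.

```python
# Sketch: order opaque draws nearest-first so Early-Z can reject occluded
# fragments before the fragment shader runs. All names are illustrative.
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class DrawCall:
    name: str
    position: Vec3  # world-space object center, assumed to represent the mesh


def squared_distance(a: Vec3, b: Vec3) -> float:
    """Squared Euclidean distance; avoids a square root since only order matters."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def sort_front_to_back(opaque_draws: List[DrawCall], camera: Vec3) -> List[DrawCall]:
    """Return a new list with the draws closest to the camera first."""
    return sorted(opaque_draws, key=lambda d: squared_distance(d.position, camera))


if __name__ == "__main__":
    draws = [
        DrawCall("mountain", (0.0, 0.0, 50.0)),
        DrawCall("player", (0.0, 0.0, 2.0)),
        DrawCall("house", (0.0, 0.0, 10.0)),
    ]
    for draw in sort_front_to_back(draws, camera=(0.0, 0.0, 0.0)):
        print(draw.name)
```

The sort only helps draws that meet the Early-Z conditions on the slide: shaders that use discard or write depth still go through the Late-Z path regardless of submission order.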
  • 32. Arm Mobile Studio Introduction
  • 33. What is in the box?
      • Streamline
      • Graphics Analyzer
      • Mali Offline Compiler (separate download)
      • Performance Advisor (closed beta)
      • Download Arm Mobile Studio: http://developer.arm.com/mobile-studio
  • 34. Streamline: Performance Analyzer
      • Mali GPU support
          • Analyze and optimize Mali™ GPU graphics and compute workloads
          • Accelerate your workflow using built-in analysis templates
      • Optimize for energy
          • Move beyond simple frame time and FPS tracking
          • Monitor overall usage of processor cycles and memory bandwidth
      • Speed up your app
          • Find out where the system is spending the most time
          • Tune code for cache efficiency
      • Native code profiling
          • Break performance down by function
          • View cost alongside disassembly listing
      • Arm CPU support
          • Profile 32-bit and 64-bit apps for ARMv7-A and ARMv8-A cores
          • Tune multi-threading for DynamIQ multi-core systems
      • Application event trace
          • Annotate software workloads
          • Define logical event channel structure
          • Trace cross-channel task dependencies
      • Tune your rendering
          • Identify critical-path GPU shader core resources
          • Detect content inefficiency
  • 35. Streamline
  • 36. Triage nurse scenarios
      • Vsync bound
      • Fragment bound
      • CPU bound
      • Serialization problems
      • Thermally bound
      • Chart label: 16.6 ms
  • 37. Graphics Analyzer: GPU API Debugger
      • Shader analysis
          • Capture and view all shaders used
          • Optimize shader performance using the integrated Mali Offline Compiler
      • Cross platform
          • Host support for Windows, macOS, and Linux
          • Target support for any Android GPU
      • Rendering API debug
          • Graphics debug for content developers
          • Support for all versions of OpenGL ES and Vulkan
      • Visual analysis views
          • Native mode
          • Overdraw mode
          • Shader map mode
          • Fragment count mode
      • State visibility
          • Show API state after every API call
          • Trace backwards from point-of-use to the API call responsible for the state set
      • Android utility app
          • Manage on-device connection
          • Select and launch user application
      • Frame analysis
          • Diagnose root causes of rendering errors
          • Identify sources of rendering inefficiency
  • 38. Graphics Analyzer screenshot labels: Trace outline, Frame capture, Vertex data, API calls, Statistics, Target state, Shaders, Textures, Buffers, Uniforms, …
  • 39. Mali Offline Compiler: Shader static analysis
      • Rapid iteration
          • Verify the impact of shader changes without needing a whole application rebuild
      • Profile for any Mali GPU
          • Cost shader code for every Mali GPU without needing hardware
      • Mali GPU aware
          • Support for all actively shipping Mali GPUs
          • Cycle counts reflect the specific microarchitecture
      • Control flow aware
          • Best case control flow
          • Worst case control flow
      • Syntax verification
          • Verify correctness of code changes
          • Receive clear error diagnostics for invalid shaders
      • Critical path analysis
          • Identify the dominant shader resource
          • Target this for optimization!
      • Register usage
          • Work registers
          • Uniform registers
          • Stack spilling
  • 40. Mali Offline Compiler
      • Use Graphics Analyzer (GA) to capture the API calls and shaders to understand the application's behavior
      • Use the Offline Compiler to profile instruction counts for ALU (A), load/store (L/S), and texture (T)
      • If the shader needs more registers than are available, the GPU has to spill registers
      • Register spilling causes big inefficiencies and higher load/store utilization
      • Example run:
          Mali_Offline_Compiler_v4.3.0$ ./malisc --core Mali-T600 --revision 0p0_15dev0 --driver Mali-T600_r4p0-00rel0 --vertex shader-176.vert -V
          ARM Mali Offline Shader Compiler v4.3.0
          (C) Copyright 2007-2014 ARM Limited. All rights reserved.
          Compilation successful.
          3 work registers used, 16 uniform registers used, spilling not used.
                            A     L/S   T     Total   Bound
          Cycles:           9     5     0     14      A
          Shortest Path:    4.5   5     0     9.5     L/S
          Longest Path:     4.5   5     0     9.5     L/S
          Note: The cycles counts do not include possible stalls due to cache misses.
  • 41. Performance Advisor: Analysis Reports
      • Fast workflow
          • Integrate data capture and analysis into the nightly CI test workflow
          • Read results over a nice cup of tea
      • Caveats …
          • Still under development
          • Currently in closed beta
      • Region views
          • Split by dynamic behavior
          • Split by application annotation
      • Executive dashboard
          • Show high-level status summary
          • Show status breakdown by regions of interest
      • Overview charts
          • See performance trends over time
          • See region splits
      • Summary reports
          • Easy-to-use performance status reports
          • Integrated initial root cause analysis
  • 42. Performance Advisor
      • An automated performance triage nurse
          • Move beyond simple FPS-based regression tracking
          • Perform an automated first-pass analysis
      • Generate easy-to-read performance reports
          • Route common issues directly to the team to review
          • Free up performance experts to focus on the difficult problems
      • Integrate into nightly continuous integration
          • Catch major issues early
          • Detect gradual regressions before they start impacting users
  • 43. Region-by-region analysis
  • 45. The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks
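
The nightly regression check described for Performance Advisor can be approximated, in its simplest form, by comparing each night's frame times against a stored baseline. The sketch below is not Performance Advisor and does not use its data format; `check_frame_times`, the 5% tolerance, and the 60 Hz display assumption behind the roughly 16.6 ms budget are all illustrative choices. The real tool goes further than this kind of frame-time tracking by adding region-by-region root cause analysis.

```python
# Sketch: a minimal nightly frame-time regression gate for a CI job.
# Function name, tolerance, and refresh rate are assumptions for illustration.
from statistics import median
from typing import Dict, List, Union

VSYNC_BUDGET_MS = 1000.0 / 60.0  # about 16.6 ms per frame, assuming a 60 Hz display


def check_frame_times(
    baseline_ms: List[float],
    nightly_ms: List[float],
    tolerance: float = 0.05,
) -> Dict[str, Union[float, bool]]:
    """Compare median frame times; flag regressions beyond the tolerance."""
    base = median(baseline_ms)
    current = median(nightly_ms)
    return {
        "baseline_ms": base,
        "current_ms": current,
        "regressed": current > base * (1.0 + tolerance),
        "over_budget": current > VSYNC_BUDGET_MS,
    }


if __name__ == "__main__":
    report = check_frame_times(
        baseline_ms=[14.0, 14.2, 13.9],
        nightly_ms=[15.1, 15.0, 15.3],
    )
    print(report)
```

Using the median rather than the mean keeps a single slow frame from tripping the gate, and comparing against a baseline rather than only the vsync budget is what lets gradual regressions show up before they cost a dropped frame.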

Editor's Notes

  1. Mobile Studio consists of four component tools, although at the moment only two are actually in the public tool bundle. Streamline, a system profiler for CPU and GPU performance. Graphics Analyzer, an API debugger for OpenGL ES and Vulkan rendering APIs. In addition we have: Mali Offline Compiler, a syntax checker and static analysis tool for GPU shader programs, which is currently available as a separate download. Performance Advisor, a new tool which places automated performance analysis into a continuous integration workflow. This is currently still in development in a closed beta, but expect to see this joining the Studio release early next year.
  2. This is how the tool looks. All of the views are customizable, so you can show only the data you need per API. API Trace lists every single call that you make to your chosen API, which can easily run into the millions. Dynamic Help is static analysis: our experts have compiled a list of things to watch out for, so it gives you pointers. Textures and Shaders show every single asset in your application, and we run the shaders through the offline compiler, which makes them easily sortable. Frame Outline lets you quickly navigate the whole trace to find your problem area fast.
  3. Each region defined has its own analysis section, with advice and links to further actions that can be taken. All of this information is packaged into one report, which can be integrated into CI systems or run manually, and which reduces the reliance on technical experts spending long periods of time determining why applications have performance issues. This enables teams to move forward, empowering them with deeper knowledge to understand where the application needs attention. In turn, it frees up the individual expert to concentrate on other areas.