Confidential Restricted to Arm Exec Committee © 2021 Arm
Owen Wu (owen.wu@arm.com)
2021/07/10
TGDF 2021
Unity Mobile Game Performance Profiling – Using Arm Mobile Studio
Test Content
Unity Frame Debugger and Profiler
Unity Profiler
• It is very useful for finding out which task is the most expensive
• We can see that Render Camera took 73.04 ms
• Within Render Camera, Post-Processing is the most expensive task (62.17 ms)
• Finally, we found that the Bloom effect took 57.89 ms, so it is likely the bottleneck
Frame Debugger
• It is very useful for understanding how a frame is rendered
• The first thing we found is that the camera renders at the full screen resolution, 2340x1080
• Modern mobile devices usually have very high resolution screens, so we can render to a smaller texture and then upscale
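As a rough illustration of why rendering to a smaller texture helps, the sketch below computes the render-target size and the share of pixels that remain for a given render scale. The 0.75 scale and the function name are assumptions chosen for illustration, not values from the talk.

```python
def scaled_resolution(width, height, scale):
    """Return the render-target size for a given render scale."""
    return round(width * scale), round(height * scale)

native = (2340, 1080)                      # full screen resolution from the slide
scaled = scaled_resolution(*native, 0.75)  # 0.75 is a hypothetical render scale
pixel_ratio = (scaled[0] * scaled[1]) / (native[0] * native[1])

print(scaled, f"{pixel_ratio:.4f} of the native pixel count")
```

Because the pixel count falls with the square of the scale, even a modest reduction removes a large share of the fragment work.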
Post-Processing
• As bloom is the most expensive post-processing effect, we can check what it is doing
• We can see there are 25 draw calls for bloom, each at half the resolution of the previous draw call
• So we can reduce the Render Camera resolution, which also reduces the number of draw calls
Fixed Timestep
• The default is 50 Hz, i.e. a 0.02 s timestep
• Our fixed timestep is short enough that FixedUpdate() is called multiple times per frame
• A 0.04 s timestep is suggested when targeting 30 fps
• A 0.033 s timestep may sometimes still end up with two updates per frame
• Make sure FixedUpdate() is called only once per frame
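The effect of the timestep can be sketched with a simplified accumulator model of a fixed-update loop. This is only an approximation of how an engine schedules fixed updates, not Unity's actual implementation; the constant 33,333 µs frame time (about 30 fps) and the function name are assumptions. Integer microseconds are used to avoid floating-point drift.

```python
def fixed_updates_per_frame(frame_us, step_us, frames):
    """Count how many fixed steps run in each of `frames` rendered frames."""
    accumulator = 0
    counts = []
    for _ in range(frames):
        accumulator += frame_us            # time elapsed since the last frame
        steps = accumulator // step_us     # whole fixed steps that now fit
        accumulator -= steps * step_us     # carry the remainder forward
        counts.append(steps)
    return counts

FRAME_US = 33_333  # assumed constant frame time, about 30 fps

for step_us in (20_000, 33_000, 40_000):
    counts = fixed_updates_per_frame(FRAME_US, step_us, 300)
    print(f"{step_us / 1e6:.3f} s step: max {max(counts)} updates in one frame")
```

In this model the 0.02 s and 0.033 s steps both reach two updates in some frames, while the 0.04 s step never exceeds one, which is the behavior the bullets describe.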
Arm Mobile Studio
Low Level Profiling with Arm Mobile Studio
• The Unity Profiler and Frame Debugger are good for high-level profiling
• But we still need low-level profiling on a real device
• Arm Mobile Studio is a free tool for low-level profiling on Mali GPUs
• Arm Mobile Studio includes 4 tools that offer different perspectives
• Performance Advisor
– Generate easy-to-read reports and get optimization advice
• Streamline
– Comprehensive performance profiler, to capture all the counter activity
• Mali Offline Compiler
– Check how a shader program would perform on a Mali GPU
• Graphics Analyzer
– Debug graphics API calls and analyze how content was rendered
Performance Advisor
Analysis Reports
Fast workflow
• Integrate data capture and analysis into nightly CI test workflow
• Read results over a nice cup of tea
Caveats …
• Still under development, so not included in 2019.0
• Limited access beta program if you are happy to provide input
Overview charts
• See performance trends over time
• See region splits basis
Region views
• Split by dynamic behavior
• Split by application annotation
Executive dashboard
• Show high level status summary
• Show status breakdown by regions of interest
Summary reports
• Easy-to-use performance status reports
• Integrated initial root cause analysis
Performance Advisor
• Shows average FPS data
• Shows that the game is fragment bound
• Shows that the GPU is busy but the CPU is idle most of the time
• Shows what the game is bound by along the timeline
• Captures the screen when FPS is low
GPU Cycles per Frame
We can add a GPU cycle budget for 30 fps. Here we set 28M cycles as the budget.
850 MHz / 30 ≈ 28M cycles per frame
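The budget arithmetic above can be written as a small helper, which makes it easy to recompute for other clock speeds or frame-rate targets. The function name is an assumption; the 850 MHz and 30 fps figures come from the slide.

```python
def gpu_cycle_budget(gpu_hz, target_fps):
    """Cycles the GPU can spend on one frame at the target frame rate."""
    return gpu_hz / target_fps

budget = gpu_cycle_budget(850e6, 30)  # 850 MHz GPU clock, 30 fps target
print(f"{budget / 1e6:.1f}M cycles per frame")
```

Any frame whose GPU cycle count exceeds this budget cannot be completed in time at the target frame rate.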
Shader Cycles per Frame
The execution engine is the busiest unit in the shader core. It is responsible for shader execution and arithmetic operations.
Vertices per Frame
We can also set a budget here, shown by the green line.
We can see that the number of vertices increased over time.
Primitives per Frame
The total number of primitives increased over time, but the number of visible primitives did not. It turns out that dead enemies are not destroyed even though they are invisible.
Streamline
Performance Analyzer
Mali GPU support
• Analyze and optimize Mali™ GPU graphics and compute workloads
• Accelerate your workflow using built-in analysis templates
Optimize for energy
• Move beyond simple frame time and FPS tracking
• Monitor overall usage of processor cycles and memory bandwidth
Speed up your game
• Find out where the system is spending the most time
• Tune code for cache efficiency
Application event trace
• Annotate software workloads
• Define logical event channel structure
• Trace cross-channel task dependencies
Native code profiling
• Break performance down by function
• View cost alongside disassembly listing
Arm CPU support
• Profile 32-bit and 64-bit apps for ARMv7-A and ARMv8-A cores
• Tune multi-threading for multi-core and big.LITTLE systems
Tune your rendering
• Identify critical-path GPU shader core resources
• Detect content inefficiency
Streamline Analysis
The visible primitive rate is only 34.4%, which means we sent many invisible primitives to the GPU. This matches the result from Performance Advisor. We should cull those invisible primitives on the CPU.
The back-facing primitive rate is 49.5%. This is reasonable, since normally about 50% of primitives are back facing.
The sampling cull rate is 31.9%. This usually indicates a micro-triangle issue.
Streamline Analysis
The partial coverage rate is 72.3%, which is quite high. It means many quads are only partially covered, which is usually caused by micro triangles. Partial coverage is quite expensive for the hardware.
Streamline Analysis - Shader Precision
This counter shows that most interpolations are 32-bit, which matches the result from Performance Advisor. 32-bit interpolation is half as fast as 16-bit interpolation.
Mali Offline Compiler
Shader static analysis
Rapid iteration
• Verify impact of shader changes without needing whole application rebuild
Profile for any Mali GPU
• Cost shader code for every Mali GPU without needing hardware
Mali GPU aware
• Support for all actively shipping Mali GPUs
• Cycle counts reflect specific microarchitecture
Critical path analysis
• Identify dominant shader resource
• Target this for optimization!
Control flow aware
• Best case control flow
• Worst case control flow
Syntax verification
• Verify correctness of code changes
• Receive clear error diagnostics for invalid shaders
Register usage
• Work registers
• Uniform registers
• Stack spilling
Shader Analysis with Mali Offline Compiler
Mali Offline Compiler v7.3.0 (Build 002baf)
Copyright 2007-2020 Arm Limited, all rights reserved
Configuration
=============
Hardware: Mali-G72 r0p3
Architecture: Bifrost
Driver: r25p0-00rel0
Shader type: OpenGL ES Fragment
The Hardware and Architecture lines give the GPU information; the Driver line gives the driver information.
Shader Analysis with Mali Offline Compiler
Main shader
===========
Work registers: 64
Uniform registers: 26
Stack spilling: false
16-bit arithmetic: 2%
                              A       LS      V       T       Bound
Total instruction cycles:     17.33   0.00    3.75    2.00    A
Shortest path cycles:         11.27   0.00    0.50    0.00    A
Longest path cycles:          17.33   0.00    3.75    2.00    A
A = Arithmetic, LS = Load/Store, V = Varying, T = Texture
Only 2% of the arithmetic is mediump. Mediump is 2 times faster than highp.
The shader is arithmetic bound, which matches the result from Performance Advisor.
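One way to read the cycle table is that the Bound column names the pipeline with the highest cycle count on each row. The sketch below reproduces that reading for the figures shown above; the function name and dictionary layout are assumptions for illustration and are not part of the Mali Offline Compiler.

```python
def bound_unit(cycles):
    """Return the pipeline with the highest cycle count."""
    return max(cycles, key=cycles.get)

# Cycle counts copied from the report: A = Arithmetic, LS = Load/Store,
# V = Varying, T = Texture.
longest_path = {"A": 17.33, "LS": 0.00, "V": 3.75, "T": 2.00}
shortest_path = {"A": 11.27, "LS": 0.00, "V": 0.50, "T": 0.00}

print(bound_unit(longest_path), bound_unit(shortest_path))
```

Both paths are dominated by arithmetic, so reducing arithmetic cost (for example by using more mediump) is the first thing to try.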
Shader Analysis with Mali Offline Compiler
Shader properties
=================
Has uniform computation: true
Has side-effects: false
Modifies coverage: true
Uses late ZS test: false
Uses late ZS update: true
Reads color buffer: false
Note: This tool shows only the shader-visible property state.
API configuration may also impact the value of some properties.
There is unnecessary uniform computation in the shader; it should be moved to the CPU.
Statements such as discard force the shader to do a late Z update.
The shader logic is forcing a late Z update.
Graphics Analyzer
GPU API Debugger
Shader analysis
• Capture and view all shaders used
• Debug and optimize performance issues using the integrated Mali Offline Compiler
Cross platform
• Host support for Windows, Linux and Mac
Advanced API debug
• Graphics debug for content developers
• Support for all versions of OpenGL ES and Vulkan
Android application
• Manage on-device connection
• Select and launch user application
Visual analysis views
• Native mode
• Overdraw mode
• Shader map mode
• Fragment count mode
State visibility
• Show API state after every API call
• Trace backwards from point-of-use to triggering API call which caused a specific state to change
Frame analysis
• Diagnose root causes of rendering errors
• Identify sources of rendering inefficiency
Graphics API Call Analysis with Graphics Analyzer
• Use mesh LODs to avoid the micro-triangle issue.
Arm Mobile Studio
• Starter Edition
• https://developer.arm.com/mobile-studio
• Professional Edition
• https://developer.arm.com/mobile-studio-pro
Q & A
• Send your questions to: owen.wu@arm.com
• Visit developer.arm.com: https://developer.arm.com/solutions/graphics-and-gaming
• More Unity resources: https://developer.arm.com/solutions/graphics-and-gaming/gaming-engine/unity
Thank You
Danke
Gracias
谢谢
ありがとう
Asante
Merci
감사합니다
धन्यवाद
Kiitos
شكرًا
ধন্যবাদ
תודה
The Arm trademarks featured in this presentation are registered
trademarks or trademarks of Arm Limited (or its subsidiaries) in
the US and/or elsewhere. All rights reserved. All other marks
featured may be trademarks of their respective owners.
www.arm.com/company/policies/trademarks
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 

Unity Mobile Game Performance Profiling – Using Arm Mobile Studio

  • 1. Confidential Restricted to Arm Exec Committee © 2021 Arm Owen Wu(owen.wu@arm.com) 2021/07/10 TGDF 2021 Unity Mobile Game Performance Profiling – Using Arm Mobile Studio
  • 2. Test Content
  • 3. Unity Frame Debugger and Profiler
  • 4. Unity Profiler
      • The Unity Profiler is very useful for finding which task is the most expensive.
      • Render Camera took 73.04 ms.
      • Post-Processing is the most expensive task within Render Camera (62.17 ms).
      • The Bloom effect took 57.89 ms, so it is the likely bottleneck.
  • 5. Frame Debugger
      • The Frame Debugger is very useful for seeing how a frame is rendered.
      • The first thing we found is that the camera renders at the full screen resolution, 2340x1080.
      • Modern mobile devices usually have very high-resolution screens, so we can render to a smaller texture and then upscale.
  • 6. Post-Processing
      • As bloom is the most expensive post-processing effect, we check what it is doing.
      • There are 25 draw calls for bloom, each at half the resolution of the previous draw call.
      • So we can reduce the Render Camera resolution, which also reduces the number of draw calls.
  • 7. Fixed Timestep
      • The default is 50 Hz, a timestep of 0.02 s.
      • Our FixedUpdate() is short enough to be called multiple times per frame.
      • A timestep of 0.04 s is suggested when aiming for 30 fps.
      • A timestep of 0.033 s may sometimes end up with two updates per frame.
      • Make sure FixedUpdate() is called only once per frame.
  • 8. Arm Mobile Studio
  • 9. Low Level Profiling with Arm Mobile Studio
      • The Unity Profiler and Frame Debugger are good for high-level profiling.
      • But we still need low-level profiling on a real device.
      • Arm Mobile Studio is a free tool for low-level profiling on Mali GPUs.
      • Arm Mobile Studio includes 4 tools for different perspectives:
          • Performance Advisor: generate easy-to-read reports and get optimization advice
          • Streamline: comprehensive performance profiler, to capture all the counter activity
          • Mali Offline Compiler: check how a shader program would perform on a Mali GPU
          • Graphics Analyzer: debug graphics API calls and analyze how content was rendered
  • 10. Performance Advisor Analysis Reports
      • Fast workflow
          • Integrate data capture and analysis into nightly CI test workflow
          • Read results over a nice cup of tea
      • Caveats …
          • Still under development, so not included in 2019.0
          • Limited access beta program if you are happy to provide input
      • Overview charts
      • Region views
          • Split by dynamic behavior
          • Split by application annotation
      • Executive dashboard
          • Show high level status summary
          • Show status breakdown by regions of interest
          • See performance trends over time
          • See region splits basis
      • Summary reports
          • Easy-to-use performance status reports
          • Integrated initial root cause analysis
  • 11. Performance Advisor
      • Shows average FPS data.
      • Shows that the game is fragment bound.
      • Shows that the GPU is busy but the CPU is idle most of the time.
      • Shows the bound based on the timeline.
      • Captures the screen when FPS is low.
  • 12. GPU Cycles per Frame
      • We can add a GPU cycle budget for 30 fps.
      • Here we set 28M cycles as the budget: 850 MHz / 30 ≈ 28M cycles per frame.
  • 13. Shader Cycles per Frame
      • The execution engine is the busiest unit in the shader core.
      • The execution engine is responsible for shader execution and arithmetic operations.
  • 14. Vertices per Frame
      • We can also set a budget here, as the green line shows.
      • The number of vertices increases over time.
  • 15. Primitives per Frame
      • The total number of primitives increases over time, but the number of visible primitives does not.
      • It turns out that dead enemies are not destroyed even though they are invisible.
  • 16. Streamline Performance Analyzer
      • Mali GPU support
          • Analyze and optimize Mali™ GPU graphics and compute workloads
          • Accelerate your workflow using built-in analysis templates
      • Optimize for energy
          • Move beyond simple frame time and FPS tracking
          • Monitor overall usage of processor cycles and memory bandwidth
      • Speed up your game
          • Find out where the system is spending the most time
          • Tune code for cache efficiency
      • Application event trace
          • Annotate software workloads
          • Define logical event channel structure
          • Trace cross-channel task dependencies
      • Native code profiling
          • Break performance down by function
          • View cost alongside disassembly listing
      • Arm CPU support
          • Profile 32-bit and 64-bit apps for ARMv7-A and ARMv8-A cores
          • Tune multi-threading for multi-core and big.LITTLE systems
      • Tune your rendering
          • Identify critical-path GPU shader core resources
          • Detect content inefficiency
  • 17. Streamline Analysis
      • The visible primitive rate is only 34.4%. It means we sent many invisible primitives to the GPU, which matches the Performance Advisor result. We should cull those invisible primitives on the CPU.
      • The back-facing primitive rate is 49.5%. This is reasonable, since normally about 50% of primitives are back facing.
      • The sampling cull rate is 31.9%. This usually indicates a micro-triangle issue.
  • 18. Streamline Analysis
      • The partial coverage rate is 72.3%, which is pretty high.
      • It means many quads are only partially covered, which is usually caused by micro triangles.
      • Partial coverage is pretty expensive for the hardware.
  • 19. Streamline Analysis - Shader Precision
      • This counter shows that most interpolations are 32-bit, which matches the Performance Advisor result.
      • 32-bit interpolation runs at half the speed of 16-bit interpolation.
  • 20. Mali Offline Compiler: Shader static analysis
      • Rapid iteration
          • Verify impact of shader changes without needing whole application rebuild
      • Profile for any Mali GPU
          • Cost shader code for every Mali GPU without needing hardware
      • Mali GPU aware
          • Support for all actively shipping Mali GPUs
          • Cycle counts reflect specific microarchitecture
      • Critical path analysis
          • Identify dominant shader resource
          • Target this for optimization!
      • Control flow aware
          • Best case control flow
          • Worst case control flow
      • Syntax verification
          • Verify correctness of code changes
          • Receive clear error diagnostics for invalid shaders
      • Register usage
          • Work registers
          • Uniform registers
          • Stack spilling
  • 21. Shader Analysis with Mali Offline Compiler
      Mali Offline Compiler v7.3.0 (Build 002baf)
      Copyright 2007-2020 Arm Limited, all rights reserved
      Configuration
      =============
      Hardware: Mali-G72 r0p3
      Architecture: Bifrost
      Driver: r25p0-00rel0
      Shader type: OpenGL ES Fragment
      • Callouts: GPU information (Hardware and Architecture lines); driver information (Driver line).
  • 22. Shader Analysis with Mali Offline Compiler
      Main shader
      ===========
      Work registers: 64
      Uniform registers: 26
      Stack spilling: false
      16-bit arithmetic: 2%
                                      A      LS       V       T    Bound
      Total instruction cycles:   17.33    0.00    3.75    2.00        A
      Shortest path cycles:       11.27    0.00    0.50    0.00        A
      Longest path cycles:        17.33    0.00    3.75    2.00        A
      A = Arithmetic, LS = Load/Store, V = Varying, T = Texture
      • Only 2% of the arithmetic is mediump. Mediump is 2 times faster than highp.
      • The shader is arithmetic bound, which matches the Performance Advisor result.
  • 23. Shader Analysis with Mali Offline Compiler
      Shader properties
      =================
      Has uniform computation: true
      Has side-effects: false
      Modifies coverage: true
      Uses late ZS test: false
      Uses late ZS update: true
      Reads color buffer: false
      Note: This tool shows only the shader-visible property state. API configuration may also impact the value of some properties.
      • There is unnecessary uniform computation in the shader. It should be moved to the CPU.
      • Instructions like discard force the shader to do a late Z update.
      • Here, shader logic is forcing the shader to do a late Z update.
  • 24. Graphics Analyzer: GPU API Debugger
      • Shader analysis
          • Capture and view all shaders used
          • Debug and optimize performance issues using the integrated Mali Offline Compiler
      • Cross platform
          • Host support for Windows, Linux and Mac
      • Advanced API debug
          • Graphics debug for content developers
          • Support for all versions of OpenGL ES and Vulkan
      • Android application
          • Manage on-device connection
          • Select and launch user application
      • Visual analysis views
          • Native mode
          • Overdraw mode
          • Shader map mode
          • Fragment count mode
      • State visibility
          • Show API state after every API call
          • Trace backwards from point-of-use to triggering API call which caused a specific state to change
      • Frame analysis
          • Diagnose root causes of rendering errors
          • Identify sources of rendering inefficiency
  • 25. Graphics API Call Analysis with Graphics Analyzer
  • 26. Graphics API Call Analysis with Graphics Analyzer
  • 27. Graphics API Call Analysis with Graphics Analyzer
  • 28. Graphics API Call Analysis with Graphics Analyzer
      • Use mesh LOD to avoid the micro-triangle issue.
  • 29. Arm Mobile Studio
      • Starter Edition: https://developer.arm.com/mobile-studio
      • Professional Edition: https://developer.arm.com/mobile-studio-pro
  • 30. Q & A
      • Send your questions to: owen.wu@arm.com
      • Visit developer.arm.com: https://developer.arm.com/solutions/graphics-and-gaming
      • More Unity resources: https://developer.arm.com/solutions/graphics-and-gaming/gaming-engine/unity
  • 31. Thank You, Danke, Gracias, 谢谢, ありがとう, Asante, Merci, 감사합니다, धन्यवाद, Kiitos, شكرًا, ধন্যবাদ, תודה
  • 32. The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. www.arm.com/company/policies/trademarks
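
Slides 7 and 12 both rest on arithmetic around the 30 fps target, and a short sketch may make it easier to check. The Python below is an illustration only: `gpu_cycle_budget` and `fixed_updates_per_frame` are hypothetical helper names, the accumulator loop is a simplified model of a fixed-timestep scheduler rather than Unity's actual implementation, and it assumes a perfectly steady 1/30 s frame time.

```python
from fractions import Fraction


def gpu_cycle_budget(clock_hz, target_fps):
    """GPU cycles available per frame at the given clock and frame rate."""
    return clock_hz / target_fps


def fixed_updates_per_frame(fixed_step, frame_time=Fraction(1, 30), frames=300):
    """Count fixed updates in each frame of a simplified accumulator loop."""
    now = Fraction(0)         # game time at the end of the current frame
    fixed_time = Fraction(0)  # game time already covered by fixed updates
    counts = []
    for _ in range(frames):
        now += frame_time
        calls = 0
        # Step the fixed clock until the next step would pass the frame's end.
        while fixed_time + fixed_step <= now:
            fixed_time += fixed_step
            calls += 1
        counts.append(calls)
    return counts


# Slide 12: 850 MHz GPU clock, 30 fps target.
print(f"cycle budget: {gpu_cycle_budget(850e6, 30) / 1e6:.1f}M cycles/frame")

# Slide 7: the default, 0.033 s, and the suggested 0.04 s timestep.
for step in (Fraction(1, 50), Fraction(33, 1000), Fraction(1, 25)):
    counts = fixed_updates_per_frame(step)
    print(f"timestep {float(step):.3f} s: max {max(counts)} update(s) per frame")
```

In this model only the 0.04 s timestep never runs more than one update in a frame, at the cost of some frames running none. The 0.033 s timestep is slightly shorter than the 1/30 s frame, so the remainder accumulates into an occasional second update.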

Editor's Notes

  1. Mali Offline Compiler shader properties:
      • Has uniform computation: Shows if there was any optimized uniform computation. This is computation that depends only on literal constants or uniform values, and therefore produces the same result for every thread in a draw call or compute dispatch. While the drivers can optimize this, it still has a cost, so where possible, move it from the shader into application logic on the CPU.
      • Has side-effects: Shows if this shader has side-effects that are visible in memory, outside of the fixed graphics pipeline. They can be caused by writes into shader storage buffers, stores into images, and uses of atomics. Side-effecting shaders cannot be optimized away by techniques such as hidden surface removal, so their use should be minimized.
      • Modifies coverage: Shows if a fragment shader has a coverage mask that can be changed by shader execution, for example by using a discard statement. Shaders with modifiable coverage must use a late-ZS update, which reduces efficiency of early ZS testing for later fragments at the same coordinate. Note: Other API-side behaviors, such as setting of alpha-to-coverage, can also impact coverage masks and are not considered here.
      • Uses late ZS test: Shows if a fragment shader contains logic that forces a late ZS test, for example by writing to gl_FragDepth. This disables use of early-ZS testing and hidden surface removal, which can be a significant efficiency loss. Note: Other API-side behaviors, such as disabling depth testing, can override this behavior.
      • Uses late ZS update: Shows if a fragment shader contains logic that forces a late ZS update, for example by reading the old depth value in the shader by using gl_LastFragDepthARM. This can reduce efficiency of early ZS testing for later fragments at the same coordinate.
      • Reads color buffer: Shows if a fragment shader contains logic that programmatically reads from the color buffer, for example by reading from gl_LastFragColorARM. Shaders that read from the color buffer in this manner are treated as transparent, and cannot be used as hidden-surface removal occluders.
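
The report on slides 22 and 23 can also be read programmatically. The sketch below is a rough illustration, not part of Mali Offline Compiler: `bound_unit` and `property_hints` are hypothetical helper names, the values are typed in by hand from the slides, and it assumes the bound unit is simply the one with the highest cycle count, which agrees with the Bound column on slide 22. The hint strings paraphrase the notes above.

```python
def bound_unit(cycles):
    """Return the pipeline unit with the highest cycle count."""
    return max(cycles, key=cycles.get)


def property_hints(properties):
    """Return one hint per shader property that is set to True."""
    advice = {
        "Has uniform computation": "move uniform-only computation to the CPU",
        "Has side-effects": "minimize side effects; they block hidden surface removal",
        "Modifies coverage": "coverage changes (e.g. discard) force a late ZS update",
        "Uses late ZS test": "a forced late ZS test disables early ZS testing",
        "Uses late ZS update": "late ZS update slows early ZS for later fragments",
        "Reads color buffer": "shader is treated as transparent and cannot occlude",
    }
    return [
        f"{name}: {advice[name]}"
        for name, flag in properties.items()
        if flag and name in advice
    ]


# Values transcribed by hand from slides 22 and 23.
total_cycles = {"A": 17.33, "LS": 0.00, "V": 3.75, "T": 2.00}
shader_properties = {
    "Has uniform computation": True,
    "Has side-effects": False,
    "Modifies coverage": True,
    "Uses late ZS test": False,
    "Uses late ZS update": True,
    "Reads color buffer": False,
}

print("bound unit:", bound_unit(total_cycles))
for hint in property_hints(shader_properties):
    print(hint)
```

For the shader on the slides this flags three properties (uniform computation, modified coverage, late ZS update), the same three that the slide 23 callouts point at.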