Debug, Analyze and Optimize Games with Intel Tools
Surviving the apocalypse on mainstream graphics
Matteo Valoriani, FifthIngenium CEO
Intel Software Innovator
Nice to Meet You
www.slideshare.net/MatteoValoriani
https://it.linkedin.com/in/matteovaloriani
http://fifthingenium.com/blog
https://github.com/mvaloriani
mvaloriani at gmail dot com
@MatteoValoriani
Matteo Valoriani
CEO of FifthIngenium
PhD at Politecnico of Milano
Speaker and Consultant
Agenda
• Introduction
• Intel® Graphics Performance Analyzers: what are they?
• Intel® GPA Live Demo
• Optimizations
• Conclusion
PC, Virtual Reality, Mobile
Getting Started with Intel® Graphics Performance Analyzers (Intel® GPA)
https://goo.gl/2cKmMa
Intel® Graphics Performance Analyzers 2017 R1
How to Get Started?
Download for FREE at https://software.intel.com/gpa/
• No Code Changes Needed
• No Environment Changes
• No IDE Necessary
Intel® GPA - Optimize Your Graphics Applications!
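[Figure: frame-rate comparison between a high-end GPU and mainstream graphics, with Intel® Graphics Performance Analyzers as the optimization tool. Values shown: 10 fps, 5 fps, 60+ fps, 5 fps, 30 fps, 30+ fps.]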
Optimize Windows*, Android*, and Ubuntu* Games!
Windows Gaming
Host OS
▪ Win 7, Win 8.1, Win 10 (64-bit)
DirectX*
▪ DX 9, 9EX, 10.x, 11.0, 11.1, 12
Target Hardware
▪ Intel, NVIDIA* & AMD* GPUs
▪ Windows x86 Tablets
▪ HTC Vive*, Oculus Rift*
Android Gaming
Host OS
▪ Windows, Ubuntu, Mac* OSX
OpenGL* ES
▪ 1.x, 2.x, 3.x
Target Hardware
▪ Intel Atom®
Android OS
▪ 4.x, 5.x, 6.x
Ubuntu Gaming
Host OS
▪ Ubuntu 16.04
OpenGL
▪ 3.2, 3.3, 4.0, 4.1 (Core Profile)
Target Hardware
▪ Intel® HD Graphics 4k-6k
Target OS
▪ Ubuntu 16.04
Host/Target Architecture
[Diagram: Host System and Target System]
What’s Inside Intel® GPA?
• Graphics Monitor: launch & config tool
• System Analyzer / HUD: in-game analysis
• Graphics Frame Analyzer: single frame analysis
• Graphics Trace Analyzer: timeline analysis
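How to Use Intel® GPA?
1. Run the game with Intel® GPA.
2. In-game analysis with the HUD / System Analyzer: is the game CPU limited or GPU limited?
3. Offline analysis: if CPU limited, capture a trace and open it in the Trace Analyzer; if GPU limited, capture a frame and open it in the Frame Analyzer.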
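The CPU limited / GPU limited decision in this workflow can be sketched in a few lines. This is only an illustration, not part of Intel® GPA: the function names, the 10% tolerance, and the idea of comparing per-frame CPU and GPU times read off the HUD are all assumptions made for the example.

```python
def classify_bottleneck(cpu_ms: float, gpu_ms: float, tolerance: float = 0.10) -> str:
    """Say which side limits the frame, given per-frame CPU and GPU times in ms.

    Hypothetical heuristic: when CPU and GPU work overlap, the longer of the
    two bounds the frame time, so the longer one is treated as the limiter.
    """
    if cpu_ms <= 0 or gpu_ms <= 0:
        raise ValueError("frame times must be positive")
    # Treat the two as balanced when they are within `tolerance` of each other.
    if abs(cpu_ms - gpu_ms) <= tolerance * max(cpu_ms, gpu_ms):
        return "balanced"
    return "CPU limited" if cpu_ms > gpu_ms else "GPU limited"


def next_step(bottleneck: str) -> str:
    """Map the classification to the offline tool named on the slide."""
    steps = {
        "CPU limited": "capture a trace and open it in the Trace Analyzer",
        "GPU limited": "capture a frame and open it in the Frame Analyzer",
    }
    return steps.get(bottleneck, "capture both and compare")


if __name__ == "__main__":
    verdict = classify_bottleneck(cpu_ms=20.0, gpu_ms=8.0)
    print(verdict, "->", next_step(verdict))
```

In practice the metrics shown by the System Analyzer are richer than two numbers, so treat this as a way to reason about the decision rather than a rule.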
Graphics Monitor
Launch & config tool
Define profiles & preferences
System Analyzer / HUD
In-game analysis
• Get metrics for CPU, GPU, graphics drivers, DirectX*,
OpenGL*, or OpenGL* ES
• Experiment with override modes that quickly isolate
common performance bottlenecks
• Capture frames and traces for further analysis
• Display up to 16 performance metrics
simultaneously
• Monitor the current, minimum, and maximum frame
rate
• Use without code modifications or special libraries
System Analyzer / HUD
Live Analysis
Graphics Frame Analyzer
Single frame analysis
• Use the API log to trace visual errors to specific function calls and to
review the errors and warnings returned by graphics APIs
• Select a draw call and verify its contribution to the frame,
alpha channel, color, format, and depth buffers
• Quantify performance optimization opportunities with
render experiments per draw call
• Solve issues with shadowing, lighting, or color schemes by
locating misplaced objects
Graphics Frame Analyzer
Performance Analysis with Hardware Metrics
Evaluation Flow to find 3D hotspots
Published in the 6th generation graphics API dev guide:
https://software.intel.com/en-us/articles/6th-gen-graphics-api-dev-guide
Graphics Trace Analyzer
Timeline analysis
Optimizations
Script Frustum Culling and Coroutines
Use the following MonoBehaviour callbacks to cull scripts that are
outside the camera frustum and do not need to
update while out of view.
[Code screenshot: MonoBehaviour callbacks that trigger when the object carrying the
script leaves or enters the camera frustum; in Unity these are OnBecameInvisible() and OnBecameVisible().]
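The idea behind script culling can be shown without the engine. The sketch below is an engine-independent simulation in Python, not Unity code: the `Enemy` class, its `on_became_visible` / `on_became_invisible` methods, and `run_frame` are hypothetical names that only mirror the visibility callbacks described above.

```python
class Enemy:
    """Hypothetical game object whose per-frame logic only matters while visible."""

    def __init__(self, name):
        self.name = name
        self.enabled = True   # whether update() should run this frame
        self.updates = 0      # how many times update() has actually run

    def on_became_invisible(self):
        # Called by the (simulated) engine when the object leaves the frustum.
        self.enabled = False

    def on_became_visible(self):
        # Called by the (simulated) engine when the object enters the frustum.
        self.enabled = True

    def update(self):
        self.updates += 1     # stand-in for expensive per-frame script work


def run_frame(objects):
    """One frame of the simulated game loop: skip scripts that are culled."""
    for obj in objects:
        if obj.enabled:
            obj.update()


if __name__ == "__main__":
    on_screen, off_screen = Enemy("on_screen"), Enemy("off_screen")
    off_screen.on_became_invisible()
    for _ in range(3):
        run_frame([on_screen, off_screen])
    print(on_screen.updates, off_screen.updates)
```

The culled object costs nothing per frame until the visibility callback re-enables it, which is the saving the slide is after.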
Script Frustum Culling and Coroutines (2)
Coroutines are essentially functions that can
pause and resume execution.
To take advantage of them, remove the original Update()
function from your script and replace it with a coroutine.
You can then control how often the coroutine
executes with the yield statement.
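The pause-and-resume behaviour described above can be imitated with a Python generator. This is a sketch of the concept only, not Unity's coroutine API: `patrol`, `run`, and the convention of yielding a number of frames to wait are assumptions made for the example.

```python
def patrol(log, interval_frames):
    """Coroutine-style routine: do a bit of work, then ask to be resumed later."""
    step = 0
    while True:
        log.append(step)        # the work that used to live in a per-frame update
        step += 1
        yield interval_frames   # pause here; resume after this many frames


def run(routine, total_frames):
    """Minimal scheduler: resume the routine only when its wait has elapsed.

    Returns how many times the routine was resumed. Assumes the routine always
    yields a wait of at least 1 frame.
    """
    wait = 0
    resumes = 0
    for _ in range(total_frames):
        if wait > 0:
            wait -= 1           # still paused: no work done this frame
            continue
        wait = next(routine) - 1
        resumes += 1
    return resumes


if __name__ == "__main__":
    log = []
    print(run(patrol(log, interval_frames=10), total_frames=60), log)
```

With a 10-frame interval the routine does its work on frames 0, 10, 20 and so on, instead of on every frame as a per-frame update would.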
Memory Management Optimization
A good way to get an overview of how you are managing memory is to check the ‘GC
Alloc’ section of the Overview window in the Unity Profiler and step through your frames until
you see a significant allocation.
• To avoid frequent heap allocations, prefer structs over classes for small, short-lived data:
structs are value types, so local instances can live on the stack instead of the heap.
• Frequent heap allocations can lead to significant memory fragmentation and
frequent garbage collections.
Occlusion Culling
Occlusion culling is a feature available in Unity that enables you to cull out objects that
are occluded by other objects with respect to the camera.
1. Go through your entire scene to multi-select
any objects that should be included in
occlusion culling calculations and mark them
as “Occluder Static” and “Occludee Static”.
2. When setting up your occlusion culling system,
set your occlusion areas carefully. By default,
Unity uses the entire scene as the occlusion
area, which can lead to unnecessary computation.
3. To make sure that the entire scene isn’t used,
create an occlusion area manually and
surround only the area to be included in the
calculation.
LOD
Level of Detail (LOD) lets you attach multiple meshes to a game object and switch between
them based on camera distance, so that an object far from the camera is drawn with a
simpler mesh.
LOD   L0    L1    L2
fps   160   180   220
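Switching meshes by camera distance comes down to a threshold lookup. The following Python sketch illustrates that selection step only; the switch distances, the `select_lod` name, and the use of a plain Euclidean distance are assumptions for the example, and in Unity the switch points are configured on the object rather than written by hand.

```python
import math

# Hypothetical switch distances in world units: up to 20 use L0, up to 60 use L1.
LOD_THRESHOLDS = [(20.0, "L0"), (60.0, "L1")]
FALLBACK_LOD = "L2"  # simplest mesh, used beyond the last threshold


def select_lod(camera_pos, object_pos):
    """Pick the LOD level for an object from its distance to the camera."""
    distance = math.dist(camera_pos, object_pos)
    for max_distance, lod in LOD_THRESHOLDS:
        if distance <= max_distance:
            return lod
    return FALLBACK_LOD


if __name__ == "__main__":
    camera = (0.0, 0.0, 0.0)
    for z in (10.0, 40.0, 100.0):
        print(z, select_lod(camera, (0.0, 0.0, z)))
```

The farther levels use cheaper meshes, which is consistent with the higher frame rates listed for L1 and L2 in the table above.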
Terrain Optimization
• Sampler limited
• No dynamic branching
• Optimized for legacy hardware, where sampling was faster than computing LODs
• Implementing dynamic branching improved performance by 2x (3 ms -> 1.5 ms)
• Using SampleGrad for dynamic LOD selection
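To put the 3 ms -> 1.5 ms figure in context, the arithmetic can be sketched as follows. The 2x speedup applies to the terrain pass itself; its effect on frame rate depends on the total frame time, which the slides do not give, so the 16.5 ms frame used below is purely a hypothetical value and the function names are made up for the example.

```python
def speedup(before_ms, after_ms):
    """Speedup factor of a single pass, e.g. 3 ms -> 1.5 ms."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("times must be positive")
    return before_ms / after_ms


def fps_after_saving(frame_ms, saved_ms):
    """Frame rate after removing `saved_ms` from a frame that took `frame_ms`."""
    if not 0 <= saved_ms < frame_ms:
        raise ValueError("saving must be non-negative and smaller than the frame time")
    return 1000.0 / (frame_ms - saved_ms)


if __name__ == "__main__":
    print(speedup(3.0, 1.5))                # speedup of the terrain pass alone
    # Hypothetical 16.5 ms frame: saving 1.5 ms leaves a 15 ms frame.
    print(round(fps_after_saving(16.5, 3.0 - 1.5), 1))
```

A pass that doubles in speed improves the whole frame only by the milliseconds it frees up, which is why measuring the full frame before and after matters.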
Conclusion
With the right tools and the right methodology, finding performance bottlenecks is easy!
Questions?
References
https://software.intel.com/gpa/
https://software.intel.com/en-us/articles/6th-gen-graphics-api-dev-guide
https://software.intel.com/en-us/articles/how-to-plan-optimizations-with-unity
https://software.intel.com/en-us/android/articles/unity-optimization-guide-for-x86-android-part-2
https://software.intel.com/en-us/android/articles/unity-optimization-guide-for-x86-android-part-3
https://software.intel.com/en-us/android/articles/unity-optimization-guide-for-x86-android-part-4
https://x-team.com/blog/unity-3d-optimisation-and-best-practices-part-1/
http://docs.unity3d.com/Manual/class-OcclusionArea.html
Legal Notices and Disclaimers
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well
as any warranty arising from course of performance, course of dealing, or usage in trade.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel
a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.
The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are
available on request.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on
system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at [intel.com].
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark,
are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should
consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products. For more complete information visit www.intel.com/benchmarks.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any
optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain
optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice.
Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your
system hardware, software or configuration may affect your actual performance.
Intel, Atom and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
© Intel Corporation.
Debug, Analyze and Optimize Games with Intel Tools - Matteo Valoriani - Codemotion Rome 2017
