1. The document outlines profiling tools and techniques for optimizing game performance in Unreal Engine 4, including tools for profiling CPU, GPU, memory, and content.
2. It provides guidance on using tools like the profiler, stat commands, VTune, rendering stats, and view modes to identify optimization opportunities in areas such as animation updates, materials, lighting, and culling.
3. The document highlights recent improvements in Unreal Engine 4.19 like improved cloth simulation and worker thread scaling that can enhance performance.
Optimizing the Graphics Pipeline with Compute (GDC 2016) — Graham Wihlidal
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation targets seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
Scalability for All: Unreal Engine* 4 with Intel — Intel® Software
Unreal Engine* 4 is a high-performance game engine for game developers. Learn how Intel and Epic Games* worked together to improve engine performance both for CPUs and GPUs and how developers can take advantage of it.
Epic Games Japan held a meeting called "Lightmass Deep Dive" on July 30, 2016.
Osamu Satio of Square Enix Osaka gave a presentation about their Lightmass operation for large console games. EGJ translated the slides into English and published them.
The slides contain several videos, so we recommend downloading them.
UE4 Large World AI Navigation — Mieszko Zielinski, Lead AI Programmer, Epic Games
Game engines have long been at the forefront of exploiting the ever-increasing parallel compute power of both CPUs and GPUs. This talk covers how parallel compute is used in practice on multiple platforms today in the Frostbite game engine, and how we think the industry's parallel programming models, hardware, and software should look in the next five years to help us make the best games possible.
A technical deep dive into the DX11 rendering in Battlefield 3, the first title to use the new Frostbite 2 Engine. Topics covered include DX11 optimization techniques, efficient deferred shading, high-quality rendering and resource streaming for creating large and highly-detailed dynamic environments on modern PCs.
Presented by Ken Kuwano (Epic Games Japan)
These slides are a translation of the presentation material from the "UE4 Localization Deep Dive" held on October 31, 2019.
The presentation describes the physically based lighting pipeline of Killzone: Shadow Fall, a PlayStation 4 launch title. The talk covers the studio's transition to a new asset creation pipeline based on physical properties, and describes the light rendering systems used in a new 3D engine built from the ground up for the upcoming PlayStation 4 hardware. A novel real-time lighting model simulating physically accurate area lights is introduced, as well as a hybrid ray-traced / image-based reflection system.
We believe that physically based rendering is a viable way to optimize asset creation pipeline efficiency and quality. It also enables the rendering quality to reach a new level that is highly flexible depending on art direction requirements.
In this AMD technology presentation from the 2014 Game Developers Conference (San Francisco, March 17-21), Bill explains some of the ways the vertex shader can be used to improve performance by taking a fast path through the vertex shader rather than generating vertices with other parts of the pipeline. Check out more technical presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Course presentation at SIGGRAPH 2014 by Charles de Rousiers and Sébastien Lagarde of Electronic Arts about transitioning the Frostbite game engine to physically based rendering.
Make sure to check out the 118-page course notes at http://www.frostbite.com/2014/11/moving-frostbite-to-pbr/
During the last few months, we have revisited the concept of image quality in Frostbite. The core of our approach was to be as close as possible to a cinematic look. We used the concept of a reference to evaluate the accuracy of produced images. Physically based rendering (PBR) was the natural way to achieve this. This talk covers all the different steps needed to switch a production engine to PBR, including the small details often bypassed in the literature.
The state of the art of real-time PBR techniques allowed us to achieve good overall results but not without production issues. We present some techniques for improving convolution time for image based reflection, proper ambient occlusion handling, and coherent lighting units which are mandatory for level editing.
Moreover, we have managed to reduce the quality gap, highlighted by our systematic reference comparison, in particular related to rough material handling, glossy screen space reflection, and area lighting.
The technical part of PBR is crucial for achieving good results, but represents only the tip of the iceberg. Frostbite has become the de facto high-end game engine within Electronic Arts and is now used by a large number of game teams. Moving all these teams from “old-fashioned” lighting to PBR has required a lot of education, which was done in parallel with the technical development. We have provided editing and validation tools to help the transition of art production. In addition, we have built a flexible material parametrisation framework to adapt to the various authoring tools and game teams’ requirements.
5. Target Hardware - Research
1. Gather data from as many sources as you can for GPUs and CPUs.
2. Create tables of benchmark scores to card/chip name.
3. Determine target benchmark scores that include a supported % of the population.
4. Make histograms of population by benchmark.
5. Distribute into buckets of roughly equal size.
6. Target Hardware - Decisions
• For each bucket, find a popular CPU and GPU that is near the weaker side of the bucket’s range.
• Do the research YOURSELF. Existing data will likely be out of date since desktop hardware changes frequently.
• Every platform and bucket you support is another configuration to maintain and test.
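The bucketing procedure from the two slides above can be sketched in a few lines. This is an illustrative sketch, not code from the talk; the function name and the flat list-of-scores input are assumptions.

```python
def make_buckets(scores, num_buckets):
    """Distribute a surveyed population of benchmark scores into
    `num_buckets` buckets of roughly equal population, returning the
    (lowest, highest) score covered by each bucket."""
    ordered = sorted(scores)
    size = len(ordered) // num_buckets
    buckets = []
    for i in range(num_buckets):
        # The last bucket absorbs any remainder so no machine is dropped.
        end = len(ordered) if i == num_buckets - 1 else (i + 1) * size
        chunk = ordered[i * size:end]
        buckets.append((chunk[0], chunk[-1]))
    return buckets

# e.g. make_buckets(survey_scores, 4) -> four score ranges, one per
# scalability tier; per the next slide, you would then pick a popular
# CPU/GPU near the weak end of each range as the test configuration.
```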
7. Shadows/Lighting
• Static lighting was not possible due to the building and destruction features.
• Dynamic lighting is expensive for low-end machines, but can be awesome on high end.
• We use simple forward shading for Save the World* mode, but not Battle Royale*.
• High-end machines look much better with DistanceFieldAO enabled, so we optimized it so it can be enabled on consoles as well.
8. Render Resolution
• Resolution dramatically affects GPU performance.
• During development we used discrete resolutions to make comparing performance easier.
• This was very effective, but ultimately the end-user experience is better with a slider.
• Render resolution does not affect UI.
9. Animation (URO)/Significance Manager
• Update Rate Optimization (URO) reduces the tick frequency of animations.
• In the engine, URO is purely distance based. In Fortnite*, there is a budget for characters.
• The significance manager scores players and enemies to make sure more important characters animate at higher rates.
• It is also used to score other things including particle systems and levels.
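The budgeted approach described above can be sketched as follows. This is a simplified illustration, not Fortnite's actual significance manager: here significance is just inverse distance to the viewer, whereas the real system also weights gameplay importance and covers particles and levels.

```python
def assign_update_rates(characters, budget):
    """Rank characters by significance (here: nearest first) and let only
    the `budget` most significant ones animate at full rate; the rest get
    a reduced tick frequency, URO-style."""
    ranked = sorted(characters, key=lambda c: c["distance"])
    rates = {}
    for i, c in enumerate(ranked):
        rates[c["name"]] = "full" if i < budget else "reduced"
    return rates
```

The key difference from stock URO is the budget: instead of every character independently choosing a rate by distance, a fixed number of full-rate slots is distributed to whoever matters most this frame.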
10. Material Quality
• Adding material quality nodes to materials greatly improves GPU time on low-end machines.
• Artists must maintain them.
• Adding a material quality node will triple the number of shaders it generates.
• Try to reduce dependent texture fetching using this node.
11. HLOD/Distance Culling
• Save the World* uses distance culling a lot due to the densely populated levels.
• We use a shader to cause objects to animate into view. This allowed us to be more aggressive on low-end machines.
• Set a range of object sizes and cull distances.
• Battle Royale* required long-distance visibility, so we used HLODs to represent faraway geometry.
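The "range of object sizes and cull distances" idea maps small objects to short cull distances and large ones to long distances. A minimal sketch under assumed numbers (the size/distance table below is illustrative, not from the talk):

```python
def cull_distance_for_size(bounds_radius,
                           table=((50, 3000), (200, 8000), (500, 15000))):
    """Pick a cull distance from a size -> distance table: small props cull
    close to the camera, larger objects stay visible longer, and anything
    bigger than the table covers is never distance-culled (returned as 0,
    matching the convention of UE4's Cull Distance Volumes)."""
    for max_size, distance in table:
        if bounds_radius <= max_size:
            return distance
    return 0
```

In practice this table lives in a Cull Distance Volume and is tuned per scalability bucket, with lower-end machines getting more aggressive (shorter) distances.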
13. CPU Profiling - Stat Unit
• “Am I CPU bound, or GPU bound?”
• Displays the overall frame time, the CPU time taken by the game thread, the render thread, and the GPU time.
• Unreal Tournament* target on PC is 8 ms.
• Fortnite* target on consoles is 16 ms.
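Reading stat unit comes down to one observation: the frame can only be as fast as its slowest unit, so whichever of the three times sits closest to the overall frame time is your bottleneck. A small sketch of that reasoning (the function name is illustrative):

```python
def bottleneck(game_ms, render_ms, gpu_ms):
    """Given the per-unit times from stat unit, name the likely bottleneck:
    the unit with the largest time is the one the frame is waiting on."""
    times = {"game thread": game_ms, "render thread": render_ms, "gpu": gpu_ms}
    return max(times, key=times.get)
```

For example, a 16.7 ms frame with Game at 8 ms, Draw at 6 ms, and GPU at 16.5 ms is GPU bound; the CPU threads are idling while the GPU finishes.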
14. CPU Profiling - Stat StartFile / Stat StopFile
• Creates a stats file using UE4’s stat system with both native code timings and blueprints.
• Can be useful to find general performance issues as well as one-time hitches.
• Opened with the Stats Viewer in the UE4 Editor’s Session Frontend.
• Easy to find ticking objects that should not be ticking.
15. CPU Profiling - Stat Dumphitches
• Helps find spikes in CPU time that are harder to find with targeted tools.
• Callstacks are printed to the game log.
• The cost of running it is typically minor, so it can be left on during internal playtests.
• Used on Fortnite* to find synchronous loads.
16. CPU Profiling - Windows Performance Recorder and Analyzer
• Used to find issues like unexpected disk I/O during gameplay.
• Unreal Tournament* was calling LoadLibrary every frame and not finding the file.
• Issues like that can account for a large amount of the frame time on lower-end systems.
17. Intel® VTune™ Amplifier
• Intel® VTune™ Amplifier enables deep profiling and problem identification.
• Hotspots, locks, syncs, multithreading, even GPU data!
• With 4.19, new support for event-based CPU sampling using the itt_notify framework.
• VTune™ is now free!
18. CPU Profiling - Stat RHI
• Triangle count display.
• Unreal Tournament* has a triangle count budget of around 5 million for low end.
• DM-Underland* had a landscape mesh that was 7 million alone.
19. CPU Profiling - LOD Colorization View Mode
• Identify meshes with no LODs.
• Identify LODs with wrong transition points.
• On Unreal Tournament*, helped find rocks in the distance that had no LODs.
20. GPU Profiling - ProfileGPU
• Our built-in tool for displaying the GPU time breakdown.
• r.ProfileGPU.ShowUI can be used to suppress the popup window.
• Use different values of r.SetRes and r.ScreenPercentage to determine if you are vertex or pixel bound.
• UT switched to Simple Forward Shading for low end.
21. Intel® Graphics Performance Analyzers
• Use the ToggleDrawEvents and r.ShowMaterialDrawEvents commands.
• Frame debugging / live mode.
• Experiments!
22. GPU Profiling - Shader Complexity View Mode
• Helps track materials that may be over budget by visualizing their cost.
• Green is good; white is bad.
• DM-Underland* has coral foliage that is white hot.
• Lowered the draw distance and simplified the shader.
23. Memory Profiling - Common Tools
• Memreport -full: generates a log file with a breakdown of memory usage.
• Listtextures: generates a log file or a CSV.
• Keep a spreadsheet of textures each release to watch for usage changes.
Look out for
• Overly large assets
• Content that does not belong
Count lists times used in map
Triangle count per asset
Unreal Tournament* modified the panel
to show LOD count for static meshes
Memory Profiling
Primitive Stats Viewer
Look out for
• Wrong group
• Wrong LODBias
• Uncompressed textures
• Non-mipping textures
• Bad dimensions
Memory Profiling
Texture Stats Viewer
Unreal Engine* 4.19 Goodness
Worker threads scale with CPU. No
more idle cores!
Cloth throughput improved ~30%.
Intel® VTune™ Amplifier Support –
Gives deep insight into what the
engine is doing at all times. Enables
profiling of task scheduler that was
previously opaque.
4.19 is available now! Upgrade to take
advantage of these improvements!
Call To Action
Scalability is a question of quality. Make your game look as good as possible on
as many machines as you can!
Check out the docs and videos for all of our profiling tools!
Check out the Unreal* demos in the Intel and Epic booths!
Hi everyone, thanks for coming. This is Forts and Fights: Scaling performance on Unreal Engine. Today we’re going to dig in on how Intel and Epic worked together to optimize Fortnite and Unreal Tournament. This talk is a culmination of 5 years of Intel / Epic collaboration with lots of optimization trips in between and the learnings that came from them. We’ll start off with a short video and then get into scalability, profiling tools, and a bit about the work that we’ve done on the 4.19 release.
[Introductions]
That brings me to the introductions. I’m Jeff Rous, a senior developer relations engineer at Intel. What my team does is work closely with folks like Epic here to optimize their games for the CPU and GPU. I’ve had the pleasure of working on Paragon, Unreal Tournament and Fortnite over the years. We also do quite a bit of engine optimization work, which I’ll talk about towards the end. I’ve been with Intel for 14 years now.
Hi, my name is Peter Knepley, I'm Technical Lead on Fortnite Battle Royale and previous to that I was Technical Lead on Unreal Tournament. I've worked at Epic Games for 8 years as a gameplay programmer and I've spent a lot of time profiling our games. I've shipped Gears 3, Gears Judgment, Unreal Tournament, Paragon, and Fortnite with Epic.
Hey everyone! I'm Bob Tellez, and I've been working on Fortnite as a Technical Lead for about four and a half years. Previously, I've worked as an engine programmer on Unreal Engine for about two years. I've spent a lot of time profiling, tweaking, and optimizing various aspects of Fortnite and I'd love to tell you all about it! In this presentation I'm going to be sharing how we chose the target hardware for Fortnite, and what features we configured in order to both run well and look great on these machines.
[Target Hardware - Research]
The first thing you need to know when optimizing your game is what machines will be running it. It may be tempting to just make some gut decisions about what platforms, GPUs and CPUs are popular, but I encourage you to do some research to make the best possible decisions to make your game look as good as it can for as many people as possible. To do this you should first try to find as many sources of data about the hardware that your *potential* gamers are using. The Steam Hardware & Software Survey is a great public source of information but you may also have private information as well. For example, when evaluating Fortnite, we gathered data about those who have opened the Epic Games Launcher to play other Epic games. We also got some data from Tencent about users in different regions of the world.
What you are looking for in this data is a list of unique video cards and CPUs that are actually used by a non-trivial number of people. Do the best you can. I have found that there are cards and chips that misreport their names or have inconsistent names, but they should not be very common and I just trimmed the dirty data. You should then create a table of benchmark scores for each of these GPUs and CPUs. You can find these benchmarks on the website of your choice. I used videocardbenchmark.net.
Sort this table by benchmark score so you can now see how low you need to go to hit a target percentage of the population. Your target percentage might depend on the size of your project/company/budget. It can be challenging to make a modern game run on very old hardware, so make sure you are up to the task! To make something really EPIC like Fortnite, you'll probably want to support at least 90% of potential users. Trim all hardware that is below the benchmark scores for your target percentage.
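The cutoff calculation being described is simple enough to sketch in code. Below is an illustrative C++ helper (not anything shipped with UE4 or Fortnite's actual tooling); the `GpuEntry` record and the share numbers in the test are assumptions standing in for real survey data:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record: one GPU model from survey data, with its synthetic
// benchmark score and the fraction of surveyed users that own it.
struct GpuEntry {
    double score;  // benchmark score (higher is faster)
    double share;  // fraction of surveyed users, 0.0 - 1.0
};

// Returns the lowest benchmark score you must still support to cover at
// least `targetCoverage` of the population (e.g. 0.90 for 90%).
double MinSupportedScore(std::vector<GpuEntry> entries, double targetCoverage) {
    // Sort fastest-first, then walk down until the cumulative user share
    // reaches the target; everything below that score gets trimmed.
    std::sort(entries.begin(), entries.end(),
              [](const GpuEntry& a, const GpuEntry& b) { return a.score > b.score; });
    double covered = 0.0;
    for (const GpuEntry& e : entries) {
        covered += e.share;
        if (covered >= targetCoverage)
            return e.score;
    }
    return entries.empty() ? 0.0 : entries.back().score;
}
```

Sorting descending first means the walk always keeps the fastest hardware and trims from the bottom, matching the process described above.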
Now you'll need to visualize the data to divide it up appropriately into a few discrete buckets, so you'll want to make a histogram like the one shown on this slide. Play around with the histogram bin size until you have a good feeling for the population distribution. At this point you have enough information to determine your target spec machines!
Try to divide the population MOSTLY equally, but feel free to move the division lines around a little to put a popular set of cards with similar strength near the bottom of a bucket. The number of buckets to choose very much depends on how much work you want to put into supporting settings configurations. Having more buckets allows more machines to look awesome, but it can be quite costly for many people on your team to support tons of configurations. For Fortnite, we decided to have four: Low, Medium, High, and EPIC.
[Target Hardware - Decisions]
At this point in the process, you should now have a good idea of what people are using. It's time to choose some hardware that represents each bucket. You'll likely need to purchase this hardware, and sometimes it's hard to get a hold of older chips and cards, which is why the hardware you choose to represent each bucket will need to have been somewhat popular when it was released. While working on Fortnite, I found that old hardware tends to break, so there is a good chance you'll need to buy a replacement. If your choice was popular enough, you have a good shot at being able to find an exact replacement. Otherwise, you will need to change your choice, which will affect your ability to compare performance between builds.
You may have noticed on my previous slide that I did not show any numbers in the histogram. This is because I would like to encourage you do this research yourself! You may find some canned research available online, but desktop hardware changes very frequently so I suggest you go through this process with as fresh data as you can, and re-evaluate from time to time. Depending on the scope of your project you may only need to do this once, or if you are a live game like Fortnite you may need to do this every 6 months or so. Remember that changing your target hardware for any reason is pretty disruptive since you will need to change many settings, so I would recommend sticking with your decisions for a good while before re-evaluating.
Speaking of maintainability, keep in mind the overall number of combinations of settings that you will need to support. This presentation, so far, has largely been about desktop hardware. If you plan on supporting consoles or mobile platforms, know that each of them will also have one or more buckets. All buckets will need to be tested so adding platforms greatly increases the amount of work that needs to be done. Luckily for Fortnite, the settings for "High" desktop were somewhat close to PS4 and XB1, so those consoles use these settings with some tweaks. Xbox One X and PS4 Pro use "Epic" desktop settings with tweaks. Keeping the settings similar made it easier to keep track of the quality of consoles even while testing on desktop and vice versa.
[Shadows/Lighting]
Once you know your target hardware, it's time to start optimizing and configuring settings. There are very many settings to tweak, but in this presentation I will only be talking about a few of the settings that have the highest impact on framerate. Let's start with shadows and lighting. One of the best things you can do is use static lighting in your game where possible. Unfortunately, in Fortnite we could not do this for pretty much anything due to the player's ability to create or destroy nearly all of the environment combined with the continuous day/night cycle. We chose to disable static lighting entirely and go fully dynamic!
Dynamic lighting can be somewhat expensive for weak GPUs, but it looks and works great in our game so we looked into some options.
Initially, we turned on simple forward shading for low end machines, which was a rendering mode that allowed us to have very simple lighting but skip many of the expensive parts of our deferred renderer. This is very effective for framerate and to get things working, but the visual quality was a little lacking. In Battle Royale we optimized other parts of our GPU usage and turned this feature back off, but in Save the World it is still on. In the future we intend to have it off in both modes once we find some more GPU time in Save the World.
One of the biggest quality improvements to Fortnite was when we started using distance field ambient occlusion. This was initially very expensive and only enabled when using Epic settings, but we optimized it and added a lower resolution mode to allow it to be used in High settings and consoles as well. Depending on your game, you will probably want this setting enabled on the lowest bucket you can to make it look awesome. Try to keep actors that move a lot from affecting distance fields by setting a flag on them that disables their contribution, and let the mostly static parts of the environment be affected.
[Render Resolution]
Render resolution dramatically affects GPU performance in Fortnite. We use very many postprocess render targets, and some of them have somewhat pricey pixel shaders.
In the early stages of the game's development, we just had a slider to control the percentage scale of your monitor you wanted your render resolution to be. This was very bad when comparing performance between machines because your monitor's supported resolution would greatly affect your framerate, and people rarely report their monitor resolution when listing hardware specs. To combat this, we changed over to having discrete buttons in the settings screen to set your resolution to 480p, 700p, 1080p, and 1440p, regardless of your monitor size. We quickly learned that the "Epic" 1440p size was not enough for folks who had very fast GPUs, so, trading practical measurement for user experience, we changed "Epic" to just be the full resolution of your monitor. While this made comparing Epic machines hard again, it was an acceptable compromise for a long time.
Eventually we brought back a slider where you would choose the size instead of a percentage, because once our game was made available to far more people, the end-user convenience outweighed the practical benefit. On consoles we recently started using dynamic scaling so the game can look as good as it can given the other things happening in the scene.
One last thing to keep in mind about render resolution is that it does not affect the UI for obvious reasons, so while this is still a powerful setting if you happen to have a UI heavy game this setting may do less for you than you think.
[Animation (URO)/Significance Manager]
So far I have been talking about scaling based on GPU performance, but let's not neglect the CPU! One of the large CPU costs in Fortnite is updating animations. The Unreal Engine provides a way to reduce the frequency of animation updates called Update Rate Optimization, or URO.
By default, URO is purely based on distance, which works pretty well in general cases. In Fortnite, however, we changed this behavior to use a budget for characters that is based on a score we call significance, which is calculated by a significance manager. The budget only allows a certain number of characters to update at full rate while the others update at reduced rates, which lets scenes with many characters on screen still maintain your target framerate even if they are all close to you. This budget scales to be stricter on weaker hardware.
In Fortnite, a character's score is based not only on distance to the camera, but also on screen space size. This gracefully handles cases like using a sniper scope to zoom in on a player.
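A toy version of that scoring and budgeting can be sketched as follows. This is illustrative C++ only — the weights, the score formula, and the helper names are invented for the sketch, not Fortnite's actual significance functions:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Made-up significance score: closer characters and characters that are
// larger on screen score higher. Weights are arbitrary for illustration.
double SignificanceScore(double distance, double screenFraction) {
    double distanceTerm = 1.0 / (1.0 + distance);            // decays with distance
    double sizeTerm = std::clamp(screenFraction, 0.0, 1.0);  // bigger on screen = more significant
    return 0.5 * distanceTerm + 0.5 * sizeTerm;
}

// Given scores, mark which characters may tick at full rate under a budget
// of `fullRateBudget`; everyone else ticks at a reduced rate.
std::vector<bool> ApplyUroBudget(const std::vector<double>& scores,
                                 std::size_t fullRateBudget) {
    std::vector<std::size_t> order(scores.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    // Sort indices by descending score so the most significant win the budget.
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    std::vector<bool> fullRate(scores.size(), false);
    for (std::size_t i = 0; i < order.size() && i < fullRateBudget; ++i)
        fullRate[order[i]] = true;
    return fullRate;
}
```

Note how combining distance with screen fraction handles the sniper-scope case described above: a distant but zoomed-in player still gets a large `sizeTerm`.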
You can use the significance manager to score more things than just characters. The significance manager works nicely with particle effects for cases where many are used at once. It is also used in Fortnite to handle level streaming.
[Material Quality]
If you have a lot of complex materials, you should consider adding material quality nodes to them. Making simpler versions of materials can greatly improve GPU time on low end machines. You can use the "Shader Complexity" viewmode to find your worst offenders and focus on those first; Pete will be talking about that viewmode later in the presentation.
There are a couple downsides to using quality nodes. Those who work on the materials will have to maintain them, and this can be hard if you have very many. Also every time a quality node shows up in a material it triples the number of shaders the material makes. This normally is not a big deal but keep it in mind and don't go crazy with them.
When you use quality nodes, you'll generally want to focus on reducing the number of instructions that are generated, but also generally expensive operations like dependent texture fetching which we found several cases of in the Fortnite terrain materials.
[HLOD/Distance Culling]
The last big performance settings I'm going to be talking about are Hierarchical Level of Detail (HLOD) and distance culling. In Save the World we had somewhat densely populated levels with many small actors that are all independently destroyed, and this put a strain on occlusion culling. We combat this by bringing in distance culling quite a bit on weaker machines, but as expected it let users see objects disappearing in the distance.
Since Fortnite is a fun and "bouncy" game, instead of trying to hide the culling we added some shader logic to cause actors to animate in and out when culling happened. This is much less jarring and allowed us to be very aggressive.
To make sure that the general shape of the level far away from you remained mostly intact, we made the cull distance of all actors in our levels based on the size of the object so that trees and buildings would be the last to cull.
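The size-scaled culling idea can be sketched like this. The function name, multiplier, and clamp values below are invented for illustration — the point is only that cull distance grows with bounds size, so trees and buildings are the last to cull:

```cpp
#include <algorithm>
#include <cassert>

// Sketch of size-scaled distance culling: larger objects (by bounding-sphere
// radius, in engine units) get proportionally larger cull distances, clamped
// to a sane range. All three constants are made-up tuning values.
double CullDistanceForSize(double boundsRadius) {
    const double distancePerRadiusUnit = 40.0;  // assumed tuning constant
    const double minCull = 2000.0;              // even pebbles survive briefly
    const double maxCull = 20000.0;             // buildings still cull eventually
    return std::clamp(boundsRadius * distancePerRadiusUnit, minCull, maxCull);
}
```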
In Battle Royale, combat and skydiving is done over large distances and we could not be very aggressive with cull distance at all. Instead we use HLOD to represent large portions of the map, which involves a process that automatically generates a mesh representing the actors in a level. These levels are completely unloaded from memory and streamed in as the camera comes close to them. Once streamed in the HLOD mesh is removed revealing the loaded level.
[Overview of Profiling] -X seconds
This is a screenshot of the DM-Underland map that shipped with the latest Unreal Tournament game. The goal was to get this map running at 120 fps on discrete-GPU computers and at least 30 fps on laptops with HD-4000. Any given view could be several million polys.
I'm here today to talk about how Intel and Epic came together to optimize this map. Hopefully you can apply any or all of these techniques on your UE4 game to make sure that it not only uses your desired amount of CPU and GPU, but also fits into memory on your platform of choice.
[CPU Profiling]
[stat unit]
I'm going to start with CPU profiling. The first step in our profiling journey is stat unit. It displays the overall frame time, which is then broken down by CPU time spent in the game and render threads and the GPU time. This helps answer the age old question "Am I CPU bound, or GPU bound?" We need to know the answer to that question to know where to start optimizing. The threads are fairly parallel, so we only have to worry about the one with the longest time currently shown in stat unit. After every big change, I come back to stat unit to make sure that the frame time is going down and to know where to optimize next. On Unreal Tournament, which was a PC only title, the high end target was 8 ms of frame time to hit 120 frames per second, and for HD-4000 class hardware the target was more lenient at 33 ms of frame time, or 30 frames per second. On Fortnite, we're targeting 16 ms of frame time on consoles to hit 60 frames per second. For DM-Underland, the game thread time was over the render thread time, so I'm going to start there.
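The arithmetic behind those targets is worth making explicit. A minimal C++ sketch (illustrative only, not engine code) converting an fps target into a per-frame millisecond budget and picking which of the three mostly-parallel timings to optimize first:

```cpp
#include <cassert>

// Which of stat unit's three timings is the bottleneck this frame.
enum class Bound { Game, Render, Gpu };

// A target framerate translates directly into a per-frame time budget:
// 120 fps -> ~8.3 ms, 60 fps -> ~16.7 ms, 30 fps -> ~33.3 ms.
double FrameBudgetMs(double targetFps) { return 1000.0 / targetFps; }

// Because the threads run largely in parallel, the longest of the three
// determines frame time, so that's the one to attack first.
Bound Bottleneck(double gameMs, double renderMs, double gpuMs) {
    if (gameMs >= renderMs && gameMs >= gpuMs) return Bound::Game;
    if (renderMs >= gpuMs) return Bound::Render;
    return Bound::Gpu;
}
```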
[Stat StartFile / Stat StopFile]
To measure CPU usage, I typically start with our stats file captures.
The stat startfile command will tell UE4’s stat system to start grabbing timings for both native code and blueprints.
This can be useful to inspect general performance issues as well as one time hitches.
The capture files can be really large when measuring over long periods of time so it's not always the right tool to find random hitches though.
Once we have our desired capture, the stat stopfile command will write out the stats capture.
Then using the Stats Viewer in UE4 Editor’s Session Frontend, we open the capture.
You will see that we get measurements of the game thread, render thread and other worker threads.
UFUNCTIONs are automatically marked up in the trace, but it's possible to add manual tracing to non UFUNCTIONs.
We use it extensively to find any blueprints that are ticking that should not be ticking. In Underland, some environmental items made by designers were set to tick and they showed up in the game thread of the stats capture. We set those blueprints to never tick, and rerunning the profile showed a decrease in our CPU time.
This system has been quite valuable on Battle Royale to optimize our dedicated server performance. One of the easy things to see in this profiler is when components belonging to pawns are updated. We had a lot of cosmetic only components, like trail particles, that were updating their positions on dedicated servers even though they were never rendered. At 100 players for Battle Royale, we need all that time back. We have code that detaches them at runtime on dedicated servers, and now they no longer show up on the profile for dedicated servers.
[stat dumphitches]
When we're looking for CPU hitches without an exact repro, Stat dumphitches is my go to tool.
The cost of running it is typically minor so we will leave it on during internal playtests if we're looking for a hitch.
When a frame goes long in a playtest, the callstacks are printed to the game log so a programmer can look afterwards without disturbing the rest of the playtest.
I recommend launching the game with -noverifygc to cut down on garbage collection (or GC) times showing up in your dumphitches logs. GC verification won't be on during shipping, so seeing it in the hitch log is not very helpful.
Stat dumphitches has been very valuable on Fortnite when trying to find synchronous loads of assets that should've been preloaded or async loaded. I also used it extensively on Gears of War and Unreal Tournament. In our DM-Underland investigation, it showed hitching on the low end laptop in a tick function of a plugin that I had written.
[Windows Performance Recorder and Analyzer]
To further investigate the hitching, we used Microsoft Windows Performance Recorder and Windows Performance Analyzer. WPR and WPA have been helpful for finding many types of issues in our games when it comes to interacting with the rest of the computer. In the case of Unreal Tournament, we used it to find the source of some unwanted disk I/O that was really killing frame rate on a low end laptop. We had a plugin for lighting up keyboards that wanted to call LoadLibrary for a third party dll. For many machines, especially a laptop without an external keyboard, the dll doesn’t exist. I wrote some code that would retry every frame to load that dll, and that caused a lot of frame rate drops on that laptop. On my high end dev machine, I never even noticed the performance hit. We used Windows Performance Recorder to find out that I was trying and failing to load that specific dll every frame. Changing that code to only try once removed the hitching, and it no longer showed up in stat dumphitches or WPR.
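The shape of that fix — attempt an optional library load once and remember the outcome — can be sketched in portable C++. `OptionalLibrary` is a hypothetical helper name, and the `tryLoad` callback stands in for the platform's LoadLibrary call:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Remember that an optional library failed to load and never retry,
// instead of paying a disk-touching load attempt every frame.
class OptionalLibrary {
public:
    explicit OptionalLibrary(std::function<bool()> tryLoad)
        : tryLoad_(std::move(tryLoad)) {}

    // Safe to call from the per-frame path: only the first call can
    // actually hit the disk; every later call returns the cached result.
    bool IsAvailable() {
        if (!attempted_) {
            attempted_ = true;
            loaded_ = tryLoad_();
        }
        return loaded_;
    }

private:
    std::function<bool()> tryLoad_;
    bool attempted_ = false;
    bool loaded_ = false;
};
```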
[Intel VTune] - 30 seconds
VTune is Intel's CPU profiling tool. It's a good next step after Unreal’s internal profiling tools have identified the problem functions. It helps to determine thread bottlenecks, sync points and the way work is given to TaskGraph threads on the CPU. For 4.19, Intel worked closely with Epic engineers to implement support for ITT markers in Unreal Engine 4. This added much needed contextual data to the VTune graphical visualizations and was extremely beneficial in profiling the engine’s thread scheduler for some of our other 4.19 work, which I’ll talk about later. We also used VTune on Fortnite to find that we’re somewhat render thread bound. We’re addressing this in the game and I’ll talk more about it later.
Oh by the way, VTune is now free if you hadn’t heard!
[Render Thread]
[Stat RHI]
Once the game thread performance was under control, I checked stat unit again and now the render thread time was over the game thread time.
Stat RHI is my first stop when profiling the render thread. It has a lot of good info, but most importantly for me it has the number of triangles drawn. This can help narrow down which portions of the maps are over the triangle budget. On Unreal Tournament, I was very particular about the poly count because on our target HD 4000, we had to keep the polygon budget around 5 million to get a good framerate. When we started profiling DM-Underland to run on HD-4000 machines, we noticed that the polycount with no characters was over 7 million and sometimes up to 10 million. Our first hunch was that the landscape might have too many polys. We found that the tessellation of the landscape piece was set very high and it was using 5 million triangles on its own. We got the level designer to change the section size from 255x255 to 63x63. This dropped the landscape to well below 1 million triangles. The level designer had to repaint a few bits of the landscape to make up for the resolution change.
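For intuition on why landscape dominates so easily: a regular grid of quads renders two triangles per quad, so a section's full-detail cost grows with the square of its side resolution (and smaller sections also let distant parts of the landscape drop to lower LODs sooner). A back-of-the-envelope helper, illustrative only — the section counts in the test are not DM-Underland's actual values:

```cpp
#include <cassert>

// Two triangles per quad: a section of quadsPerSide x quadsPerSide quads
// costs 2 * quadsPerSide^2 triangles at full LOD, times however many
// sections are rendered at that LOD.
long long LandscapeTriangles(int quadsPerSide, int sectionsAtFullLod) {
    return 2LL * quadsPerSide * quadsPerSide * sectionsAtFullLod;
}
```

A single 255x255-quad section at full detail is already 130,050 triangles; a 63x63 section is under 8,000, which is why shrinking the section size (and letting more of the landscape LOD down) cut the count so dramatically.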
[LOD Colorization view mode]
Even after all those savings on the landscape, we still had too many polys. The next tool that I used was the level of detail (or LOD) colorization view. When that viewmode is enabled, instead of being textured and lit, meshes at LOD 0 will be gray, LOD 1 will be green, LOD 2 red, and LOD 3 blue. If you look at the screenshot on the left, the background above the play area is quite gray. The rocks there are also used around the map 47 times. Their top LOD is 9272 triangles, so we're at about half a million triangles in just that rock mesh. Luckily, the editor has automatic LOD generation built in, so I was able to create LODs all the way down to 184 polys without any artist help. If you look at the screenshot on the right, now the rocks are red, showing that they have LODs and are currently rendering LOD 3. Using the same technique looking around the map, I was able to identify a bunch of other rock meshes that needed LOD creation, and I was able to get the poly count within the five million poly budget.
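The rock math works out like this — a trivial helper, shown only to make the scale concrete: per-mesh scene cost is instance count times the triangle count of the LOD actually being rendered.

```cpp
#include <cassert>

// Scene cost contributed by one mesh: how many instances are visible,
// times the triangle count of the LOD each instance renders at.
long long SceneTriangles(int instances, int trianglesPerInstance) {
    return static_cast<long long>(instances) * trianglesPerInstance;
}
```

47 instances at the 9,272-triangle LOD 0 is roughly 436k triangles; the same 47 instances at the generated 184-triangle LOD are under 9k.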
[GPU Profiling]
[ProfileGPU]
Once we're good on the CPU side, we moved on to the GPU. I like to use ProfileGPU which is our built-in tool for displaying the GPU time breakdown.
r.ProfileGPU.ShowUI can be used to suppress the popup window and only print to the log, but typically I use the GUI version.
I used different resolutions and screen percentages in the same scene with multiple runs of ProfileGPU to determine if we're able to hit frame rate targets with low pixel counts. The frame rate did increase as the resolution decreased, so we were pixel bound.
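The reason this experiment isolates pixel cost: r.ScreenPercentage scales both axes, so shaded pixel count falls with its square while vertex and CPU work stay roughly constant. A small illustrative helper (not engine code):

```cpp
#include <cassert>

// Shaded pixel count for a given output resolution and r.ScreenPercentage.
// Halving the percentage quarters the pixels; if frame time tracks this
// number you are pixel bound, otherwise look at vertex or CPU cost.
long long ShadedPixels(int width, int height, double screenPercentage) {
    double s = screenPercentage / 100.0;
    return static_cast<long long>(width * s) * static_cast<long long>(height * s);
}
```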
But even with low resolution and low screen percentage, we were still having trouble making the desired framerate with the deferred renderer. We ended up using our Simple Forward Shading when a player chooses low settings. The GPU profiles were much more favorable after switching renderers, but it comes at the cost of visual fidelity. It was fine for Unreal Tournament, but for Fortnite Battle Royale, we decided that it was going to provide too much advantage and changed the look of the game too much. The perf gains alone were not worth it so we still use the deferred renderer on FN:BR for low settings.
Intel GPA is a tool that helps developers identify where their apps are slow on Intel graphics. It contains both a live mode and a frame debugger. These help narrow down whether you’re bottlenecked in shadows, geometry, post processing, etc. ToggleDrawEvents is a console command in UE4 that turns on annotations to help identify where in the scene you are. r.ShowMaterialDrawEvents will mark up each draw call with the material name so you can tie it back to your blueprints. Both of these are super helpful for identifying expensive parts of the scene like landscape in both Fortnite and Unreal Tournament, which we’ll talk about a bit later.
Here at Intel, GPA is our bread and butter tool to profile games on Intel Graphics and identify where targeted optimizations can be made. It also works with other hardware, although you won’t get the same depth of hardware data that you will on Intel. We used it to profile both Unreal Tournament and Fortnite, and it was instrumental in identifying things like the landscape tessellation issue on the Underland map Pete mentioned before.
[Shader Complexity View Mode]
Even with simple forward shading, some areas of the map still had framerate issues due to overly complex materials. To find those complex materials, I used the shader complexity viewmode.
This view mode shows good materials in green and bad materials in white. On the DM-Underland map in UT, we used it to identify a couple hot spots. The underwater area in general was expensive because of the overdraw on the transparent water, but there were some areas that were showing up white hot. It turns out that there's some very expensive coral foliage at the bottom of the lake. You can barely see it on high end machines from the normal play area and almost never see it on low end. It's also an area that doesn't really see that much gameplay, so toning it down wouldn't affect the overall scene. We ended up lowering the draw distance on that foliage and simplifying the shader on low detail to win back frame time and get back into budget.
[Memory Profiling Tools]
Once we were done with CPU and GPU optimizations, we moved on to memory optimizations. Some platforms, like consoles, have hard memory caps: they will crash when you use too much memory. Others, like low end PCs, have soft memory caps: virtual memory takes over once you hit the physical RAM limit. Hitting the soft memory cap can cause compressed memory or virtual memory paging, which will kill your performance.
I use the Memreport -full console command to get a list of everything in memory. It generates a text file that contains information about all the static meshes, skeletal meshes, sounds and textures that are loaded.
Listtextures is included in the memreport, but I typically will have QA run it on its own regularly once I’ve already done a pass on other memory.
-csv can be used to make it easier to import into a spreadsheet which makes keeping track of memory trends in textures easy.
I’ll do passes through the spreadsheet to look for textures in the wrong group or using too much memory due to forced sizes.
We keep spreadsheets of texture usage for every release that way any major swing in texture memory can be investigated without a ton of effort.
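A sketch of what that release-over-release comparison might look like in code, assuming a simple texture-name-to-kilobytes mapping already parsed out of the Listtextures CSVs (the parsing itself is omitted, and the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Flag textures whose memory changed by more than `thresholdKb` between two
// releases, including textures that are new in this release or were removed.
std::vector<std::string> FlagTextureSwings(
    const std::map<std::string, int>& prevKb,
    const std::map<std::string, int>& currKb,
    int thresholdKb) {
    std::vector<std::string> flagged;
    for (const auto& [name, kb] : currKb) {
        auto it = prevKb.find(name);
        int before = (it == prevKb.end()) ? 0 : it->second;  // new texture counts from 0
        if (std::abs(kb - before) > thresholdKb) flagged.push_back(name);
    }
    for (const auto& [name, kb] : prevKb) {
        // Removed textures are worth a look too (e.g. an accidental rename).
        if (currKb.count(name) == 0 && kb > thresholdKb) flagged.push_back(name);
    }
    return flagged;
}
```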
[Primitive Stats]
The editor's primitive statistics panel is another tool that I use when trying to optimize memory. I can sort by size in memory and see if anything is an outlier. It's also a good place to look for assets that don't belong in the current scene but are getting loaded anyway. On Fortnite Battle Royale, we use it to watch for any assets from Save the World that might be getting loaded accidentally. The primitive stats also list triangle count, which makes the panel useful for trying to optimize the number of triangles in the scene. In the case of the Landscape in DM-Underland that used to be over 5 million triangles, you can see that it's now only 127 thousand triangles. The count stat can help you decide if you have too many unique hero meshes and would be able to save some memory by duplicating some. On Unreal Tournament, I modified the panel to show LOD count to help deal with the issues we talked about before with rocks that had no LODs.
[Texture Stats]
The statistics panel also has a mode that shows textures. It's similar to the view that memreport provides but can be refreshed in real time. In UE4, the texture pool saves us from having to worry about being pushed over memory limits, but it's in our best interest to make sure the texture pool is being used optimally. The tighter we can make our texture pool, the more space we have for other things. I like to use the texture statistics panel to verify that textures are all in the correct group, are properly mipping, have power-of-2 dimensions, and have the right LODBias. On Unreal Tournament and Gears of War, we would routinely have cinematic sized textures showing up during gameplay, so we needed to keep a close watch on this list. Another common mixup is normal maps ending up in World or Character instead of WorldNormalMap or CharacterNormalMap.
[Summary]
Using the techniques I have described, we achieved our goal of 30 frames per second on an HD-4000 in DM-Underland. We’ve also applied them to get to 60 frames per second on Fortnite console builds. Our next optimization target is getting Fortnite Battle Royale dedicated servers up to a steady 20 Hz and then hopefully 30 Hz. We have a lot of optimizations coming in Unreal Engine 4.20, but many of these optimizations are already in 4.19.
Going back to Bob’s section about having performance buckets, we often have high end CPUs where a lot of the time is idle. For 4.19, we added some great stuff for developers to take advantage of.
Prior to 4.19, Unreal Engine 4 did not create enough worker threads to fully utilize a CPU beyond 6 cores. This has been fixed to allow the Task Graph system to detect the number of cores on a CPU and scale the number of worker threads available accordingly. This lets developers take full advantage of high core count CPUs, creating more visual realism through systems such as cloth physics, environment destruction, CPU based particles and advanced 3D audio.
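A minimal sketch of core-count-based pool sizing, using standard C++ rather than the engine's Task Graph API. The reserve-one-core heuristic, the fallback value, and the optional cap are assumptions for illustration, not the engine's actual policy:

```cpp
#include <cassert>
#include <thread>

// Size a worker pool from the detected core count instead of a hard cap,
// so high-core-count CPUs are not left idle. hardCap == 0 means uncapped.
unsigned WorkerThreadCount(unsigned hardCap = 0) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;  // detection can fail; fall back to something sane
    // Leave one core for the main/game thread (assumed heuristic).
    unsigned workers = cores > 1 ? cores - 1 : 1;
    return (hardCap != 0 && workers > hardCap) ? hardCap : workers;
}
```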
The cloth system allows for dynamic simulation of meshes that respond to the player, wind or other environmental factors. Typical cloth workloads include player capes or flags. Cloth is simulated every frame, even if the player is not looking at it, because the simulation results determine if it shows up in the player's view. Cloth performance improved by about 30% in 4.19.
VTune is an important tool to determine thread bottlenecks, sync points and the effectiveness of a thread scheduler. Intel worked closely with Epic engineers to implement support for ITT markers in Unreal Engine 4. This adds much needed contextual data to the VTune graphical visualizations.
The picture on the right was taken with a test level with an absurd amount of cloth rendering. This was run on an i7-6950X 10 core 20 thread extreme edition CPU. Now you can make use of all of that CPU power in your games too!
Looking forward, Fortnite* is focusing on a consistent framerate on consoles and dedicated servers. The RHI thread in DX11 is being enabled for extra headroom.
Take advantage of all system resources in your games. CPU is often overlooked but can add some great eye candy if extra cycles are available, especially with 4.19.