1. The document outlines profiling tools and techniques for optimizing game performance in Unreal Engine 4, including tools for profiling CPU, GPU, memory, and content.
2. It provides guidance on using tools like the profiler, stat commands, VTune, rendering stats, and view modes to identify optimization opportunities in areas such as animation updates, materials, lighting, and culling.
3. The document highlights recent improvements in Unreal Engine 4.19 like improved cloth simulation and worker thread scaling that can enhance performance.
Optimizing the Graphics Pipeline with Compute (GDC 2016) — Graham Wihlidal
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation targets seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
Scalability for All: Unreal Engine* 4 with Intel — Intel® Software
Unreal Engine* 4 is a high-performance game engine for game developers. Learn how Intel and Epic Games* worked together to improve engine performance both for CPUs and GPUs and how developers can take advantage of it.
Epic Games Japan held a meeting called "Lightmass Deep Dive" on July 30, 2016.
Osamu Satio of Square Enix Osaka gave a presentation about their Lightmass operation for large console games. EGJ translated the slides into English and published them.
The slides contain several videos, so we recommend downloading them.
UE4 Large World AI Navigation — Mieszko Zielinski, Lead AI Programmer, Epic Games
Game engines have long been at the forefront of exploiting the ever-increasing parallel compute power of both CPUs and GPUs. This talk covers how parallel compute is used in practice on multiple platforms today in the Frostbite game engine, and how we think the industry's parallel programming models, hardware, and software should look in the next five years to help us make the best games possible.
A technical deep dive into the DX11 rendering in Battlefield 3, the first title to use the new Frostbite 2 Engine. Topics covered include DX11 optimization techniques, efficient deferred shading, high-quality rendering and resource streaming for creating large and highly-detailed dynamic environments on modern PCs.
Presented by Ken Kuwano (Epic Games Japan)
These slides are a translation of the presentation material from the "UE4 Localization Deep Dive" held on October 31, 2019.
The presentation describes the physically based lighting pipeline of Killzone: Shadow Fall, a PlayStation 4 launch title. The talk covers the studio's transition to a new asset creation pipeline based on physical properties, and describes the light rendering systems used in a new 3D engine built from the ground up for the upcoming PlayStation 4 hardware. A novel real-time lighting model simulating physically accurate area lights is introduced, as well as a hybrid ray-traced / image-based reflection system.
We believe that physically based rendering is a viable way to optimize asset creation pipeline efficiency and quality. It also enables the rendering quality to reach a new level that is highly flexible depending on art direction requirements.
In this AMD technology presentation from the 2014 Game Developers Conference (San Francisco, March 17-21), Bill explains some of the ways the vertex shader can be used to improve performance by taking a fast path through the vertex shader rather than generating vertices with other parts of the pipeline. Check out more technical presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Course presentation at SIGGRAPH 2014 by Charles de Rousiers and Sébastien Lagarde of Electronic Arts about transitioning the Frostbite game engine to physically based rendering.
Make sure to check out the 118-page course notes at http://www.frostbite.com/2014/11/moving-frostbite-to-pbr/
During the last few months, we have revisited the concept of image quality in Frostbite. The core of our approach was to be as close as possible to a cinematic look. We used the concept of a reference to evaluate the accuracy of produced images. Physically based rendering (PBR) was the natural way to achieve this. This talk covers all the different steps needed to switch a production engine to PBR, including the small details often bypassed in the literature.
The state of the art of real-time PBR techniques allowed us to achieve good overall results but not without production issues. We present some techniques for improving convolution time for image based reflection, proper ambient occlusion handling, and coherent lighting units which are mandatory for level editing.
Moreover, we have managed to reduce the quality gap, highlighted by our systematic reference comparison, in particular related to rough material handling, glossy screen space reflection, and area lighting.
The technical part of PBR is crucial for achieving good results, but represents only the tip of the iceberg. Frostbite has become the de facto high-end game engine within Electronic Arts and is now used by a large number of game teams. Moving all these teams from “old-fashioned” lighting to PBR has required a lot of education, which was done in parallel with the technical development. We have provided editing and validation tools to help the transition of art production. In addition, we have built a flexible material parametrisation framework to adapt to the various authoring tools and game teams’ requirements.
5. Target Hardware - Research
1. Gather data from as many sources as you can for GPUs and CPUs.
2. Create tables of benchmark scores to card/chip name.
3. Determine target benchmark scores that include a supported % of the population.
4. Make histograms of population by benchmark.
5. Distribute into buckets of roughly equal size.
6. Target Hardware - Decisions
• For each bucket, find a popular CPU and GPU that is near the weaker side of the bucket’s range.
• Do the research YOURSELF. Existing data will likely be out of date since desktop hardware changes frequently.
• Every platform and bucket you support is another configuration to maintain and test.
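The bucketing procedure from the two slides above can be sketched in a few lines. This is an illustrative sketch, not code from the talk; the function name and the flat list-of-scores input are assumptions.

```python
def make_buckets(scores, num_buckets):
    """Distribute a surveyed population of benchmark scores into
    `num_buckets` buckets of roughly equal population, returning the
    (lowest, highest) score covered by each bucket."""
    ordered = sorted(scores)
    size = len(ordered) // num_buckets
    buckets = []
    for i in range(num_buckets):
        # The last bucket absorbs any remainder so no machine is dropped.
        end = len(ordered) if i == num_buckets - 1 else (i + 1) * size
        chunk = ordered[i * size:end]
        buckets.append((chunk[0], chunk[-1]))
    return buckets

# e.g. make_buckets(survey_scores, 4) -> four score ranges, one per
# scalability tier; per the next slide, you would then pick a popular
# CPU/GPU near the weak end of each range as the test configuration.
```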
7. Shadows/Lighting
• Static lighting was not possible due to the building and destruction features.
• Dynamic lighting is expensive for low-end machines, but can be awesome on high end.
• We use simple forward shading for Save the World* mode, but not Battle Royale*.
• High-end machines look much better with DistanceFieldAO enabled, so we optimized it so it can be enabled on consoles as well.
8. Render Resolution
• Resolution dramatically affects GPU performance.
• During development we used discrete resolutions to make comparing performance easier.
• This was very effective, but ultimately the end-user experience is better with a slider.
• Render resolution does not affect UI.
9. Animation (URO)/Significance Manager
• Update Rate Optimization (URO) reduces the tick frequency of animations.
• In the engine, URO is purely distance based. In Fortnite*, there is a budget for characters.
• The significance manager scores players and enemies to make sure more important characters animate at higher rates.
• It is also used to score other things including particle systems and levels.
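The budgeted approach described above can be sketched as follows. This is a simplified illustration, not Fortnite's actual significance manager: here significance is just inverse distance to the viewer, whereas the real system also weights gameplay importance and covers particles and levels.

```python
def assign_update_rates(characters, budget):
    """Rank characters by significance (here: nearest first) and let only
    the `budget` most significant ones animate at full rate; the rest get
    a reduced tick frequency, URO-style."""
    ranked = sorted(characters, key=lambda c: c["distance"])
    rates = {}
    for i, c in enumerate(ranked):
        rates[c["name"]] = "full" if i < budget else "reduced"
    return rates
```

The key difference from stock URO is the budget: instead of every character independently choosing a rate by distance, a fixed number of full-rate slots is distributed to whoever matters most this frame.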
10. Material Quality
• Adding material quality nodes to materials greatly improves GPU time on low-end machines.
• Artists must maintain them.
• Adding a material quality node will triple the number of shaders it generates.
• Try to reduce dependent texture fetching using this node.
11. HLOD/Distance Culling
• Save the World* uses distance culling a lot due to the densely populated levels.
• We use a shader to cause objects to animate into view. This allowed us to be more aggressive on low-end machines.
• Set a range of object sizes and cull distances.
• Battle Royale* required long-distance visibility, so we used HLODs to represent faraway geometry.
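The "range of object sizes and cull distances" idea maps small objects to short cull distances and large ones to long distances. A minimal sketch under assumed numbers (the size/distance table below is illustrative, not from the talk):

```python
def cull_distance_for_size(bounds_radius,
                           table=((50, 3000), (200, 8000), (500, 15000))):
    """Pick a cull distance from a size -> distance table: small props cull
    close to the camera, larger objects stay visible longer, and anything
    bigger than the table covers is never distance-culled (returned as 0,
    matching the convention of UE4's Cull Distance Volumes)."""
    for max_size, distance in table:
        if bounds_radius <= max_size:
            return distance
    return 0
```

In practice this table lives in a Cull Distance Volume and is tuned per scalability bucket, with lower-end machines getting more aggressive (shorter) distances.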
13. CPU Profiling - Stat Unit
• “Am I CPU bound, or GPU bound?”
• Displays the overall frame time, the CPU time taken by the game thread, the render thread, and the GPU time.
• Unreal Tournament* target on PC is 8 ms.
• Fortnite* target on consoles is 16 ms.
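Reading stat unit comes down to one observation: the frame can only be as fast as its slowest unit, so whichever of the three times sits closest to the overall frame time is your bottleneck. A small sketch of that reasoning (the function name is illustrative):

```python
def bottleneck(game_ms, render_ms, gpu_ms):
    """Given the per-unit times from stat unit, name the likely bottleneck:
    the unit with the largest time is the one the frame is waiting on."""
    times = {"game thread": game_ms, "render thread": render_ms, "gpu": gpu_ms}
    return max(times, key=times.get)
```

For example, a 16.7 ms frame with Game at 8 ms, Draw at 6 ms, and GPU at 16.5 ms is GPU bound; the CPU threads are idling while the GPU finishes.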
14. CPU Profiling - Stat StartFile / Stat StopFile
• Creates a stats file using UE4’s stat system with both native code timings and blueprints.
• Can be useful to find general performance issues as well as one-time hitches.
• Opened with the Stats Viewer in the UE4 Editor’s Session Frontend.
• Easy to find ticking objects that should not be ticking.
15. CPU Profiling - Stat Dumphitches
• Helps find spikes in CPU time that are harder to find with targeted tools.
• Callstacks are printed to the game log.
• The cost of running it is typically minor, so it can be left on during internal playtests.
• Used on Fortnite* to find synchronous loads.
16. CPU Profiling - Windows Performance Recorder and Analyzer
• Used to find issues like unexpected disk I/O during gameplay.
• Unreal Tournament* was calling LoadLibrary every frame and not finding the file.
• Issues like that can account for a large amount of the frame time on lower-end systems.
17. Intel® VTune™ Amplifier
• Intel® VTune™ Amplifier enables deep profiling and problem identification.
• Hotspots, locks, syncs, multithreading, even GPU data!
• With 4.19, new support for event-based CPU sampling using the itt_notify framework.
• VTune™ is now free!
18. CPU Profiling - Stat RHI
• Triangle count display.
• Unreal Tournament* has a triangle count budget of around 5 million for low end.
• DM-Underland* had a landscape mesh that was 7 million alone.
19. CPU Profiling - LOD Colorization View Mode
• Identify meshes with no LODs.
• Identify LODs with wrong transition points.
• On Unreal Tournament*, helped find rocks in the distance that had no LODs.
20. GPU Profiling - ProfileGPU
• Our built-in tool for displaying the GPU time breakdown.
• r.ProfileGPU.ShowUI can be used to suppress the popup window.
• Use different values of r.SetRes and r.ScreenPercentage to determine if you are vertex or pixel bound.
• UT switched to Simple Forward Shading for low end.
21. Intel® Graphics Performance Analyzers
• Use the ToggleDrawEvents and r.ShowMaterialDrawEvents commands.
• Frame debugging / live mode.
• Experiments!
22. GPU Profiling - Shader Complexity View Mode
• Helps track materials that may be over budget by visualizing their cost.
• Green is good; white is bad.
• DM-Underland* has coral foliage that is white hot.
• Lowered the draw distance and simplified the shader.
23. Memory Profiling - Common Tools
• Memreport -full: generates a log file with a breakdown of memory usage.
• Listtextures: generates a log file or a CSV.
• Keep a spreadsheet of textures each release to watch for usage changes.
Look out for
• Overly large assets
• Content that does not belong
Count lists times used in map
Triangle count per asset
Unreal Tournament* modified the panel
to show LOD count for static meshes
Memory Profiling
Primitive Stats Viewer
Look out for
• Wrong group
• Wrong LODBias
• Uncompressed textures
• Non-mipping textures
• Bad dimensions
Memory Profiling
Texture Stats Viewer
Unreal Engine* 4.19 Goodness
Worker threads scale with CPU. No
more idle cores!
Cloth throughput improved ~30%.
Intel® VTune™ Amplifier Support –
Gives deep insight into what the
engine is doing at all times. Enables
profiling of task scheduler that was
previously opaque.
4.19 is available now! Upgrade to take
advantage of these improvements!
Call To Action
Scalability is a question of quality. Make your game look as good as possible on
as many machines as you can!
Check out the docs and videos for all of our profiling tools!
Check out the Unreal* demos in the Intel and Epic booths!
Hi everyone, thanks for coming. This is Forts and Fights: Scaling performance on Unreal Engine. Today we’re going to dig in on how Intel and Epic worked together to optimize Fortnite and Unreal Tournament. This talk is a culmination of 5 years of Intel / Epic collaboration with lots of optimization trips in between and the learnings that came from them. We’ll start off with a short video and then get into scalability, profiling tools, and a bit about the work that we’ve done on the 4.19 release.
[Introductions]
That brings me to the introductions. I’m Jeff Rous, a senior developer relations engineer at Intel. What my team does is work closely with folks like Epic here to optimize their games for the CPU and GPU. I’ve had the pleasure of working on Paragon, Unreal Tournament and Fortnite over the years. We also do quite a bit of engine optimization work, which I’ll talk about towards the end. I’ve been with Intel for 14 years now.
Hi, my name is Peter Knepley, I'm Technical Lead on Fortnite Battle Royale and previous to that I was Technical Lead on Unreal Tournament. I've worked at Epic Games for 8 years as a gameplay programmer and I've spent a lot of time profiling our games. I've shipped Gears 3, Gears Judgment, Unreal Tournament, Paragon, and Fortnite with Epic.
Hey everyone! I'm Bob Tellez, and I've been working on Fortnite as a Technical Lead for about four and a half years. Previously, I've worked as an engine programmer on Unreal Engine for about two years. I've spent a lot of time profiling, tweaking, and optimizing various aspects of Fortnite and I'd love to tell you all about it! In this presentation I'm going to be sharing how we chose the target hardware for Fortnite, and what features we configured in order to both run well and look great on these machines.
[Target Hardware - Research]
The first thing you need to know when optimizing your game is what machines will be running it. It may be tempting to just make some gut decisions about what platforms, GPUs and CPUs are popular, but I encourage you to do some research to make the best possible decisions to make your game look as good as it can for as many people as possible. To do this you should first try to find as many sources of data about the hardware that your *potential* gamers are using. The Steam Hardware & Software Survey is a great public source of information but you may also have private information as well. For example, when evaluating Fortnite, we gathered data about those who have opened the Epic Games Launcher to play other Epic games. We also got some data from Tencent about users in different regions of the world.
What you are looking for in this data is a list of unique video cards and CPUs that are actually used by a non-trivial number of people. Do the best you can. I have found that there are cards and chips that misreport their names or have inconsistent names, but they should not be very common and I just trimmed the dirty data. You should then create a table of benchmark scores for each of these GPUs and CPUs. You can find these benchmarks on the website of your choice. I used videocardbenchmark.net.
Sort this table by benchmark score so you can now see how low you need to go to hit a target percentage of the population. Your target percentage might depend on the size of your project/company/budget. It can be challenging to make a modern game run on very old hardware, so make sure you are up to the task! To make something really EPIC like Fortnite, you'll probably want to support at least 90% of potential users. Trim all hardware that is below the benchmark scores for your target percentage.
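The cutoff calculation being described is simple enough to sketch in code. Below is an illustrative C++ helper (not anything shipped with UE4 or Fortnite's actual tooling); the `GpuEntry` record and the share numbers in the test are assumptions standing in for real survey data:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical record: one GPU model from survey data, with its synthetic
// benchmark score and the fraction of surveyed users that own it.
struct GpuEntry {
    double score;  // benchmark score (higher is faster)
    double share;  // fraction of surveyed users, 0.0 - 1.0
};

// Returns the lowest benchmark score you must still support to cover at
// least `targetCoverage` of the population (e.g. 0.90 for 90%).
double MinSupportedScore(std::vector<GpuEntry> entries, double targetCoverage) {
    // Sort fastest-first, then walk down until the cumulative user share
    // reaches the target; everything below that score gets trimmed.
    std::sort(entries.begin(), entries.end(),
              [](const GpuEntry& a, const GpuEntry& b) { return a.score > b.score; });
    double covered = 0.0;
    for (const GpuEntry& e : entries) {
        covered += e.share;
        if (covered >= targetCoverage)
            return e.score;
    }
    return entries.empty() ? 0.0 : entries.back().score;
}
```

Sorting descending first means the walk always keeps the fastest hardware and trims from the bottom, matching the process described above.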
Now you'll need to visualize the data to divide it up appropriately into a few discrete buckets, so you'll want to make a histogram like the one shown on this slide. Play around with the histogram bin size until you have a good feeling for the population distribution. At this point you have enough information to determine your target spec machines!
Try to divide the population MOSTLY equally, but feel free to move the division lines around a little to put a popular set of cards with similar strength near the bottom of a bucket. The number of buckets to choose very much depends on how much work you want to put into supporting settings configurations. Having more buckets allows more machines to look awesome, but it can be quite costly for many people on your team to support tons of configurations. For Fortnite, we decided to have four: Low, Medium, High, and EPIC.
[Target Hardware - Decisions]
At this point in the process, you should now have a good idea of what people are using. It's time to choose some hardware that represents each bucket. You'll likely need to purchase this hardware, and sometimes it's hard to get a hold of older chips and cards, which is why the hardware you choose to represent each bucket will need to have been somewhat popular when it was released. While working on Fortnite, I found that old hardware tends to break, so there is a good chance you'll need to buy a replacement. If your choice was popular enough, you have a good shot at being able to find an exact replacement. Otherwise, you will need to change your choice, which will affect your ability to compare performance between builds.
You may have noticed on my previous slide that I did not show any numbers in the histogram. This is because I would like to encourage you do this research yourself! You may find some canned research available online, but desktop hardware changes very frequently so I suggest you go through this process with as fresh data as you can, and re-evaluate from time to time. Depending on the scope of your project you may only need to do this once, or if you are a live game like Fortnite you may need to do this every 6 months or so. Remember that changing your target hardware for any reason is pretty disruptive since you will need to change many settings, so I would recommend sticking with your decisions for a good while before re-evaluating.
Speaking of maintainability, keep in mind the overall number of combinations of settings that you will need to support. This presentation, so far, has largely been about desktop hardware. If you plan on supporting consoles or mobile platforms, know that each of them will also have one or more buckets. All buckets will need to be tested so adding platforms greatly increases the amount of work that needs to be done. Luckily for Fortnite, the settings for "High" desktop were somewhat close to PS4 and XB1, so those consoles use these settings with some tweaks. Xbox One X and PS4 Pro use "Epic" desktop settings with tweaks. Keeping the settings similar made it easier to keep track of the quality of consoles even while testing on desktop and vice versa.
[Shadows/Lighting]
Once you know your target hardware, it's time to start optimizing and configuring settings. There are very many settings to tweak, but in this presentation I will only be talking about a few of the settings that have the highest impact on framerate. Let's start with shadows and lighting. One of the best things you can do is use static lighting in your game where possible. Unfortunately, in Fortnite we could not do this for pretty much anything due to the player's ability to create or destroy nearly all of the environment combined with the continuous day/night cycle. We chose to disable static lighting entirely and go fully dynamic!
Dynamic lighting can be somewhat expensive for weak GPUs, but it looks and works great in our game so we looked into some options.
Initially, we turned on simple forward shading for low end machines, which was a rendering mode that allowed us to have very simple lighting but skip many of the expensive parts of our deferred renderer. This is very effective for framerate and to get things working, but the visual quality was a little lacking. In Battle Royale we optimized other parts of our GPU usage and turned this feature back off, but in Save the World it is still on. In the future we intend to have it off in both modes once we find some more GPU time in Save the World.
One of the biggest quality improvements to Fortnite was when we started using distance field ambient occlusion. This was initially very expensive and only enabled when using Epic settings, but we optimized it and added a lower resolution mode to allow it to be used in High settings and consoles as well. Depending on your game, you will probably want this setting enabled on the lowest bucket you can to make it look awesome. Try to keep actors that move a lot from affecting distance fields by setting a flag on them that disables their contribution, and let the mostly static parts of the environment be affected.
[Render Resolution]
Render resolution dramatically affects GPU performance in Fortnite. We use very many postprocess render targets, and some of them have somewhat pricey pixel shaders.
In the early stages of the game's development, we just had a slider to control the percentage scale of your monitor you wanted your render resolution to be. This was very bad when comparing performance between machines because your monitor's supported resolution would greatly affect your framerate, and people rarely report their monitor resolution when listing hardware specs. To combat this, we changed over to having discrete buttons in the settings screen to set your resolution to 480p, 700p, 1080p, and 1440p, regardless of your monitor size. We quickly learned that the "Epic" 1440p size was not enough for folks who had very fast GPUs, so, trading practical measurement for user experience, we changed "Epic" to just be the full resolution of your monitor. While this made comparing Epic machines hard again, it was an acceptable compromise for a long time.
Eventually we brought back a slider where you would choose the size instead of a percentage, because once our game was made available to far more people, the end-user convenience outweighed the practical benefit. On consoles we recently started using dynamic scaling so the game can look as good as it can given the other things happening in the scene.
One last thing to keep in mind about render resolution is that it does not affect the UI for obvious reasons, so while this is still a powerful setting if you happen to have a UI heavy game this setting may do less for you than you think.
[Animation (URO)/Significance Manager]
So far I have been talking about scaling based on GPU performance, but let's not neglect the CPU! One of the large CPU costs in Fortnite is updating animations. The Unreal Engine provides a way to reduce the frequency of animation updates called Update Rate Optimization, or URO.
By default, URO is purely based on distance, which works pretty well in general cases. In Fortnite, however, we changed this behavior to use a budget for characters that is based on a score we call significance, which is calculated by a significance manager. The budget only allows a certain number of characters to update at full rate while the others update at reduced rates, which lets scenes with many characters on screen still maintain your target framerate even if they are all close to you. This budget scales to be stricter on weaker hardware.
In Fortnite, a character's score is based not only on distance to the camera, but also on screen space size. This gracefully handles cases like using a sniper scope to zoom in on a player.
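A toy version of that scoring and budgeting can be sketched as follows. This is illustrative C++ only — the weights, the score formula, and the helper names are invented for the sketch, not Fortnite's actual significance functions:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Made-up significance score: closer characters and characters that are
// larger on screen score higher. Weights are arbitrary for illustration.
double SignificanceScore(double distance, double screenFraction) {
    double distanceTerm = 1.0 / (1.0 + distance);            // decays with distance
    double sizeTerm = std::clamp(screenFraction, 0.0, 1.0);  // bigger on screen = more significant
    return 0.5 * distanceTerm + 0.5 * sizeTerm;
}

// Given scores, mark which characters may tick at full rate under a budget
// of `fullRateBudget`; everyone else ticks at a reduced rate.
std::vector<bool> ApplyUroBudget(const std::vector<double>& scores,
                                 std::size_t fullRateBudget) {
    std::vector<std::size_t> order(scores.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    // Sort indices by descending score so the most significant win the budget.
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });
    std::vector<bool> fullRate(scores.size(), false);
    for (std::size_t i = 0; i < order.size() && i < fullRateBudget; ++i)
        fullRate[order[i]] = true;
    return fullRate;
}
```

Note how combining distance with screen fraction handles the sniper-scope case described above: a distant but zoomed-in player still gets a large `sizeTerm`.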
You can use the significance manager to score more things than just characters. The significance manager works nicely with particle effects for cases where many are used at once. It is also used in Fortnite to handle level streaming.
[Material Quality]
If you have a lot of complex materials, you should consider adding material quality nodes to them. Making simpler versions of materials can greatly improve GPU time on low end machines. You can use the "Shader Complexity" viewmode to find your worst offenders and focus on those first; Pete will be talking about that viewmode later in the presentation.
There are a couple downsides to using quality nodes. Those who work on the materials will have to maintain them, and this can be hard if you have very many. Also every time a quality node shows up in a material it triples the number of shaders the material makes. This normally is not a big deal but keep it in mind and don't go crazy with them.
When you use quality nodes, you'll generally want to focus on reducing the number of instructions that are generated, but also generally expensive operations like dependent texture fetching which we found several cases of in the Fortnite terrain materials.
[HLOD/Distance Culling]
The last big performance settings I'm going to be talking about are Hierarchical Level of Detail (HLOD) and distance culling. In Save the World we had somewhat densely populated levels with many small actors that are all independently destroyed, and this put a strain on occlusion culling. We combat this by bringing in distance culling quite a bit on weaker machines, but as expected it let users see objects disappearing in the distance.
Since Fortnite is a fun and "bouncy" game, instead of trying to hide the culling we added some shader logic to cause actors to animate in and out when culling happened. This is much less jarring and allowed us to be very aggressive.
To make sure that the general shape of the level far away from you remained mostly intact, we made the cull distance of all actors in our levels based on the size of the object so that trees and buildings would be the last to cull.
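The size-scaled culling idea can be sketched like this. The function name, multiplier, and clamp values below are invented for illustration — the point is only that cull distance grows with bounds size, so trees and buildings are the last to cull:

```cpp
#include <algorithm>
#include <cassert>

// Sketch of size-scaled distance culling: larger objects (by bounding-sphere
// radius, in engine units) get proportionally larger cull distances, clamped
// to a sane range. All three constants are made-up tuning values.
double CullDistanceForSize(double boundsRadius) {
    const double distancePerRadiusUnit = 40.0;  // assumed tuning constant
    const double minCull = 2000.0;              // even pebbles survive briefly
    const double maxCull = 20000.0;             // buildings still cull eventually
    return std::clamp(boundsRadius * distancePerRadiusUnit, minCull, maxCull);
}
```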
In Battle Royale, combat and skydiving is done over large distances and we could not be very aggressive with cull distance at all. Instead we use HLOD to represent large portions of the map, which involves a process that automatically generates a mesh representing the actors in a level. These levels are completely unloaded from memory and streamed in as the camera comes close to them. Once streamed in the HLOD mesh is removed revealing the loaded level.
[Overview of Profiling] -X seconds
This is a screenshot of the DM-Underland map that shipped with the latest Unreal Tournament game. The goal was to get this map running at 120 fps on discrete-GPU computers and at least 30 fps on laptops with HD-4000. Any given view could be several million polys.
I'm here today to talk about how Intel and Epic came together to optimize this map. Hopefully you can apply any or all of these techniques on your UE4 game to make sure that it not only uses your desired amount of CPU and GPU, but also fits into memory on your platform of choice.
[CPU Profiling]
[stat unit]
I'm going to start with CPU profiling. The first step in our profiling journey is stat unit. It displays the overall frame time, which is then broken down by CPU time spent in the game and render threads and the GPU time. This helps answer the age old question "Am I CPU bound, or GPU bound?" We need to know the answer to that question to know where to start optimizing. The threads are fairly parallel, so we only have to worry about the one with the longest time currently shown in stat unit. After every big change, I come back to stat unit to make sure that the frame time is going down and to know where to optimize next. On Unreal Tournament, which was a PC only title, the high end target was 8 ms of frame time to hit 120 frames per second, and for HD-4000 class hardware the target was more lenient at 33 ms of frame time, or 30 frames per second. On Fortnite, we're targeting 16 ms of frame time on consoles to hit 60 frames per second. For DM-Underland, the game thread time was over the render thread time, so I'm going to start there.
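The arithmetic behind those targets is worth making explicit. A minimal C++ sketch (illustrative only, not engine code) converting an fps target into a per-frame millisecond budget and picking which of the three mostly-parallel timings to optimize first:

```cpp
#include <cassert>

// Which of stat unit's three timings is the bottleneck this frame.
enum class Bound { Game, Render, Gpu };

// A target framerate translates directly into a per-frame time budget:
// 120 fps -> ~8.3 ms, 60 fps -> ~16.7 ms, 30 fps -> ~33.3 ms.
double FrameBudgetMs(double targetFps) { return 1000.0 / targetFps; }

// Because the threads run largely in parallel, the longest of the three
// determines frame time, so that's the one to attack first.
Bound Bottleneck(double gameMs, double renderMs, double gpuMs) {
    if (gameMs >= renderMs && gameMs >= gpuMs) return Bound::Game;
    if (renderMs >= gpuMs) return Bound::Render;
    return Bound::Gpu;
}
```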
[Stat StartFile / Stat StopFile]
To measure CPU usage, I typically start with our stats file captures.
The stat startfile command will tell UE4’s stat system to start grabbing timings for both native code and blueprints.
This can be useful to inspect general performance issues as well as one time hitches.
The capture files can be really large when measuring over long periods of time so it's not always the right tool to find random hitches though.
Once we have our desired capture, the stat stopfile command will write out the stats capture.
Then using the Stats Viewer in UE4 Editor’s Session Frontend, we open the capture.
You will see that we get measurements of the game thread, render thread and other worker threads.
UFUNCTIONs are automatically marked up in the trace, but it's possible to add manual tracing to non UFUNCTIONs.
We use it extensively to find any blueprints that are ticking that should not be ticking. In Underland, some environmental items made by designers were set to tick and they showed up in the game thread of the stats capture. We set those blueprints to never tick, and rerunning the profile showed a decrease in our CPU time.
This system has been quite valuable on Battle Royale to optimize our dedicated server performance. One of the easy things to see in this profiler is when components belonging to pawns are updated. We had a lot of cosmetic only components, like trail particles, that were updating their positions on dedicated servers even though they were never rendered. At 100 players for Battle Royale, we need all that time back. We have code that detaches them at runtime on dedicated servers, and now they no longer show up on the profile for dedicated servers.
[stat dumphitches]
When we're looking for CPU hitches without an exact repro, Stat dumphitches is my go to tool.
The cost of running it is typically minor so we will leave it on during internal playtests if we're looking for a hitch.
When a frame goes long in a playtest, the callstacks are printed to the game log so a programmer can look afterwards without disturbing the rest of the playtest.
I recommend launching the game with -noverifygc to cut down on garbage collection (or GC) times showing up in your dumphitches logs. GC verification won't be on during shipping, so seeing it in the hitch log is not very helpful.
Stat dumphitches has been very valuable on Fortnite when trying to find synchronous loads of assets that should've been preloaded or async loaded. I also used it extensively on Gears of War and Unreal Tournament. In our DM-Underland investigation, it showed hitching on the low end laptop in a tick function of a plugin that I had written.
[Windows Performance Recorder and Analyzer]
To further investigate the hitching, we used Microsoft Windows Performance Recorder and Windows Performance Analyzer. WPR and WPA have been helpful for finding many types of issues in our games when it comes to interacting with the rest of the computer. In the case of Unreal Tournament, we used it to find the source of some unwanted disk I/O that was really killing frame rate on a low end laptop. We had a plugin for lighting up keyboards that wanted to call LoadLibrary for a third party dll. For many machines, especially a laptop without an external keyboard, the dll doesn’t exist. I wrote some code that would retry every frame to load that dll, and that caused a lot of frame rate drops on that laptop. On my high end dev machine, I never even noticed the performance hit. We used Windows Performance Recorder to find out that I was trying and failing to load that specific dll every frame. Changing that code to only try once removed the hitching, and it no longer showed up in stat dumphitches or WPR.
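The shape of that fix — attempt an optional library load once and remember the outcome — can be sketched in portable C++. `OptionalLibrary` is a hypothetical helper name, and the `tryLoad` callback stands in for the platform's LoadLibrary call:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Remember that an optional library failed to load and never retry,
// instead of paying a disk-touching load attempt every frame.
class OptionalLibrary {
public:
    explicit OptionalLibrary(std::function<bool()> tryLoad)
        : tryLoad_(std::move(tryLoad)) {}

    // Safe to call from the per-frame path: only the first call can
    // actually hit the disk; every later call returns the cached result.
    bool IsAvailable() {
        if (!attempted_) {
            attempted_ = true;
            loaded_ = tryLoad_();
        }
        return loaded_;
    }

private:
    std::function<bool()> tryLoad_;
    bool attempted_ = false;
    bool loaded_ = false;
};
```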
[Intel VTune] - 30 seconds
VTune is Intel's CPU profiling tool. It's a good next step after Unreal’s internal profiling tools have identified the problem functions. It helps to determine thread bottlenecks, sync points and the way work is given to TaskGraph threads on the CPU. For 4.19, Intel worked closely with Epic engineers to implement support for ITT markers in Unreal Engine 4. This added much needed contextual data to the VTune graphical visualizations and was extremely beneficial in profiling the engine’s thread scheduler for some of our other 4.19 work, which I’ll talk about later. We also used VTune on Fortnite to find that we’re somewhat render thread bound. We’re addressing this in the game and I’ll talk more about it later.
Oh by the way, VTune is now free if you hadn’t heard!
[Render Thread]
[Stat RHI]
Once the game thread performance was under control, I checked stat unit again and now the render thread time was over the game thread time.
Stat RHI is my first stop when profiling the render thread. It has a lot of good info, but most importantly for me it has the number of triangles drawn. This can help narrow down which portions of the maps are over the triangle budget. On Unreal Tournament, I was very particular about the poly count because on our target HD 4000, we had to keep the polygon budget around 5 million to get a good framerate. When we started profiling DM-Underland to run on HD-4000 machines, we noticed that the polycount with no characters was over 7 million and sometimes up to 10 million. Our first hunch was that the landscape might have too many polys. We found that the tessellation of the landscape piece was set very high and it was using 5 million triangles on its own. We got the level designer to change the section size from 255x255 to 63x63. This dropped the landscape to well below 1 million triangles. The level designer had to repaint a few bits of the landscape to make up for the resolution change.
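For intuition on why landscape dominates so easily: a regular grid of quads renders two triangles per quad, so a section's full-detail cost grows with the square of its side resolution (and smaller sections also let distant parts of the landscape drop to lower LODs sooner). A back-of-the-envelope helper, illustrative only — the section counts in the test are not DM-Underland's actual values:

```cpp
#include <cassert>

// Two triangles per quad: a section of quadsPerSide x quadsPerSide quads
// costs 2 * quadsPerSide^2 triangles at full LOD, times however many
// sections are rendered at that LOD.
long long LandscapeTriangles(int quadsPerSide, int sectionsAtFullLod) {
    return 2LL * quadsPerSide * quadsPerSide * sectionsAtFullLod;
}
```

A single 255x255-quad section at full detail is already 130,050 triangles; a 63x63 section is under 8,000, which is why shrinking the section size (and letting more of the landscape LOD down) cut the count so dramatically.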
[LOD Colorization view mode]
Even after all those savings on the landscape, we still had too many polys. The next tool that I used was the level of detail (or LOD) colorization view. When that viewmode is enabled, instead of being textured and lit, meshes at LOD 0 will be gray, LOD 1 will be green, LOD 2 red, and LOD 3 blue. If you look at the screenshot on the left, the background above the play area is quite gray. The rocks there are also used around the map 47 times. Their top LOD is 9272 triangles, so we're at about half a million triangles in just that rock mesh. Luckily, the editor has automatic LOD generation built in, so I was able to create LODs all the way down to 184 polys without any artist help. If you look at the screenshot on the right, now the rocks are red, showing that they have LODs and are currently rendering LOD 3. Using the same technique looking around the map, I was able to identify a bunch of other rock meshes that needed LOD creation, and I was able to get the poly count within the five million poly budget.
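The rock math works out like this — a trivial helper, shown only to make the scale concrete: per-mesh scene cost is instance count times the triangle count of the LOD actually being rendered.

```cpp
#include <cassert>

// Scene cost contributed by one mesh: how many instances are visible,
// times the triangle count of the LOD each instance renders at.
long long SceneTriangles(int instances, int trianglesPerInstance) {
    return static_cast<long long>(instances) * trianglesPerInstance;
}
```

47 instances at the 9,272-triangle LOD 0 is roughly 436k triangles; the same 47 instances at the generated 184-triangle LOD are under 9k.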
[GPU Profiling]
[ProfileGPU]
Once we're good on the CPU side, we moved on to the GPU. I like to use ProfileGPU which is our built-in tool for displaying the GPU time breakdown.
r.ProfileGPU.ShowUI can be used to suppress the popup window and only print to the log, but typically I use the GUI version.
I used different resolutions and screen percentages in the same scene with multiple runs of ProfileGPU to determine if we're able to hit frame rate targets with low pixel counts. The frame rate did increase as the resolution decreased, so we were pixel bound.
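The reason this experiment isolates pixel cost: r.ScreenPercentage scales both axes, so shaded pixel count falls with its square while vertex and CPU work stay roughly constant. A small illustrative helper (not engine code):

```cpp
#include <cassert>

// Shaded pixel count for a given output resolution and r.ScreenPercentage.
// Halving the percentage quarters the pixels; if frame time tracks this
// number you are pixel bound, otherwise look at vertex or CPU cost.
long long ShadedPixels(int width, int height, double screenPercentage) {
    double s = screenPercentage / 100.0;
    return static_cast<long long>(width * s) * static_cast<long long>(height * s);
}
```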
But even with low resolution and low screen percentage, we were still having trouble making the desired framerate with the deferred renderer. We ended up using our Simple Forward Shading when a player chooses low settings. The GPU profiles were much more favorable after switching renderers, but it comes at the cost of visual fidelity. It was fine for Unreal Tournament, but for Fortnite Battle Royale, we decided that it was going to provide too much advantage and changed the look of the game too much. The perf gains alone were not worth it so we still use the deferred renderer on FN:BR for low settings.
Intel GPA is a tool that helps developers identify where their apps are slow on Intel graphics. It contains both a live mode and a frame debugger. These help narrow down whether you’re bottlenecked in shadows, geometry, post processing, etc. ToggleDrawEvents is a console command in UE4 that turns on annotations to help identify where in the scene you are. r.ShowMaterialDrawEvents will mark up each draw call with the material name so you can tie it back to your blueprints. Both of these are super helpful for identifying expensive parts of the scene like landscape in both Fortnite and Unreal Tournament, which we’ll talk about a bit later.
Here at Intel, GPA is our bread and butter tool to profile games on Intel Graphics and identify where targeted optimizations can be made. It also works with other hardware, although you won’t get the same depth of hardware data that you will on Intel. We used it to profile both Unreal Tournament and Fortnite, and it was instrumental in identifying things like the landscape tessellation issue on the Underland map Pete mentioned before.
[Shader Complexity View Mode]
Even with simple forward shading, some areas of the map still had framerate issues due to overly complex materials. To find those complex materials, I used the shader complexity viewmode.
This view mode shows good materials in green and bad materials in white. On the DM-Underland map in UT, we used it to identify a couple hot spots. The underwater area in general was expensive because of the overdraw on the transparent water, but there were some areas that were showing up white hot. It turns out that there's some very expensive coral foliage at the bottom of the lake. You can barely see it on high end machines from the normal play area and almost never see it on low end. It's also an area that doesn't really see that much gameplay, so toning it down wouldn't affect the overall scene. We ended up lowering the draw distance on that foliage and simplifying the shader on low detail to win back frame time and get back into budget.
[Memory Profiling Tools]
Once we were done with CPU and GPU optimizations, we moved on to memory optimizations. Some platforms, like consoles, have hard memory caps: they will crash when you use too much memory. Others, like low end PCs, have soft memory caps: virtual memory takes over once you hit the physical RAM limit. Hitting the soft memory cap can cause compressed memory or virtual memory paging, which will kill your performance.
I use the Memreport -full console command to get a list of everything in memory. It generates a text file that contains information about all the static meshes, skeletal meshes, sounds and textures that are loaded.
Listtextures is included in the memreport, but I typically will have QA run it on its own regularly once I’ve already done a pass on other memory.
-csv can be used to make it easier to import into a spreadsheet which makes keeping track of memory trends in textures easy.
I’ll do passes through the spreadsheet to look for textures in the wrong group or using too much memory due to forced sizes.
We keep spreadsheets of texture usage for every release that way any major swing in texture memory can be investigated without a ton of effort.
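A sketch of what that release-over-release comparison might look like in code, assuming a simple texture-name-to-kilobytes mapping already parsed out of the Listtextures CSVs (the parsing itself is omitted, and the function name is hypothetical):

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Flag textures whose memory changed by more than `thresholdKb` between two
// releases, including textures that are new in this release or were removed.
std::vector<std::string> FlagTextureSwings(
    const std::map<std::string, int>& prevKb,
    const std::map<std::string, int>& currKb,
    int thresholdKb) {
    std::vector<std::string> flagged;
    for (const auto& [name, kb] : currKb) {
        auto it = prevKb.find(name);
        int before = (it == prevKb.end()) ? 0 : it->second;  // new texture counts from 0
        if (std::abs(kb - before) > thresholdKb) flagged.push_back(name);
    }
    for (const auto& [name, kb] : prevKb) {
        // Removed textures are worth a look too (e.g. an accidental rename).
        if (currKb.count(name) == 0 && kb > thresholdKb) flagged.push_back(name);
    }
    return flagged;
}
```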
[Primitive Stats]
The editor's primitive statistics panel is another tool that I use when trying to optimize memory. I can sort by size in memory and see if anything is an outlier. It's also a good place to look for assets that don't belong in the current scene but are getting loaded anyway. On Fortnite Battle Royale, we use it to watch for any assets from Save the World that might be getting loaded accidentally. The primitive stats also list triangle count, which makes the panel useful for trying to optimize the number of triangles in the scene. In the case of the Landscape in DM-Underland that used to be over 5 million triangles, you can see that it's now only 127 thousand triangles. The count stat can help you decide if you have too many unique hero meshes and would be able to save some memory by duplicating some. On Unreal Tournament, I modified the panel to show LOD count to help deal with the issues we talked about before with rocks that had no LODs.
[Texture Stats]
The statistics panel also has a mode that shows textures. It's similar to the view that memreport provides but can be refreshed in real time. In UE4, the texture pool saves us from having to worry about being pushed over memory limits, but it's in our best interest to make sure the texture pool is being used optimally. The tighter we can make our texture pool, the more space we have for other things. I like to use the texture statistics panel to verify that textures are all in the correct group, are properly mipping, have power-of-2 dimensions, and have the right LODBias. On Unreal Tournament and Gears of War, we would routinely have cinematic sized textures showing up during gameplay, so we needed to keep a close watch on this list. Another common mixup is normal maps ending up in World or Character instead of WorldNormalMap or CharacterNormalMap.
[Summary]
Using the techniques I have described, we achieved our goal of 30 frames per second on an HD-4000 in DM-Underland. We’ve also applied them to get to 60 frames per second on Fortnite console builds. Our next optimization target is getting Fortnite Battle Royale dedicated servers up to a steady 20 Hz and then hopefully 30 Hz. We have a lot of optimizations coming in Unreal Engine 4.20, but many of these optimizations are already in 4.19.
Going back to Bob’s section about having performance buckets, we often have high end CPUs where a lot of the time is idle. For 4.19, we added some great stuff for developers to take advantage of.
Prior to 4.19, Unreal Engine 4 did not create enough worker threads to fully utilize a CPU beyond 6 cores. This has been fixed to allow the Task Graph system to detect the number of cores on a CPU and scale the number of worker threads available accordingly. This lets developers take full advantage of high core count CPUs, creating more visual realism through systems such as cloth physics, environment destruction, CPU based particles and advanced 3D audio.
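A minimal sketch of core-count-based pool sizing, using standard C++ rather than the engine's Task Graph API. The reserve-one-core heuristic, the fallback value, and the optional cap are assumptions for illustration, not the engine's actual policy:

```cpp
#include <cassert>
#include <thread>

// Size a worker pool from the detected core count instead of a hard cap,
// so high-core-count CPUs are not left idle. hardCap == 0 means uncapped.
unsigned WorkerThreadCount(unsigned hardCap = 0) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;  // detection can fail; fall back to something sane
    // Leave one core for the main/game thread (assumed heuristic).
    unsigned workers = cores > 1 ? cores - 1 : 1;
    return (hardCap != 0 && workers > hardCap) ? hardCap : workers;
}
```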
The cloth system allows for dynamic simulation of meshes that respond to the player, wind or other environmental factors. Typical cloth workloads include player capes or flags. Cloth is simulated every frame, even if the player is not looking at it, because the simulation results determine if it shows up in the player's view. Cloth performance improved by about 30% in 4.19.
VTune is an important tool to determine thread bottlenecks, sync points and the effectiveness of a thread scheduler. Intel worked closely with Epic engineers to implement support for ITT markers in Unreal Engine 4. This adds much needed contextual data to the VTune graphical visualizations.
The picture on the right was taken with a test level with an absurd amount of cloth rendering. This was run on an i7-6950X 10 core 20 thread extreme edition CPU. Now you can make use of all of that CPU power in your games too!
Looking forward, Fortnite* is focusing on a consistent framerate on consoles and dedicated servers. The RHI thread in DX11 is being enabled for extra headroom.
Take advantage of all system resources in your games. CPU is often overlooked but can add some great eye candy if extra cycles are available, especially with 4.19.