Download the full presentation here: http://www.guerrilla-games.com/read/decima-engine-visibility-in-horizon-zero-dawn
Abstract: Horizon Zero Dawn presented the Decima engine with new challenges in rendering large and dense environments. In particular, we needed to be able to quickly query a very large set of potential objects to find which should be visible. This talk looks at the problems we faced moving from more constrained Killzone levels to Horizon's open world, and our approach to fast visibility queries using the PS4's asynchronous compute hardware. It also covers our recent work on efficiently collecting batches of object instances during the query to reduce load on the entire rendering pipeline. A basic familiarity with GPU compute will be helpful to get the most out of this talk.
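The GPU-driven, asynchronous-compute query the abstract describes is far more involved than anything shown here, but the core visibility test it accelerates can be sketched in a few lines. This is a hedged, CPU-side illustration only; every name and data layout below is an assumption for the example, not Decima's code.

```python
# Illustrative sketch: a minimal visibility query over bounding spheres,
# conceptually similar to (but far simpler than) GPU-driven culling.

def sphere_visible(center, radius, planes):
    """A sphere is visible if it is not fully behind any frustum plane.
    Each plane is (nx, ny, nz, d) with the normal pointing into the frustum."""
    for nx, ny, nz, d in planes:
        dist = nx * center[0] + ny * center[1] + nz * center[2] + d
        if dist < -radius:          # fully outside this plane
            return False
    return True

def query_visible(objects, planes):
    """Return the indices of objects whose bounding spheres pass the test."""
    return [i for i, (center, radius) in enumerate(objects)
            if sphere_visible(center, radius, planes)]

# A degenerate "frustum": the single half-space x >= 0.
planes = [(1.0, 0.0, 0.0, 0.0)]
objects = [((2.0, 0.0, 0.0), 1.0),   # inside
           ((-5.0, 0.0, 0.0), 1.0)]  # fully behind the plane
print(query_visible(objects, planes))  # [0]
```

In the talk's setting this loop runs as a compute shader over many thousands of instances, with survivors compacted into draw batches; the sketch only shows the per-object test itself.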
Rendering Technologies from Crysis 3 (GDC 2013) - Tiago Sousa
This talk covers changes in CryENGINE 3 technology during 2012, with DX11 related topics such as moving to deferred rendering while maintaining backward compatibility on a multiplatform engine, massive vegetation rendering, MSAA support and how to deal with its common visual artifacts, among other topics.
Talk by Fabien Christin from DICE at GDC 2016.
Designing a big city that players can explore by day and by night, while improving on the unique visuals of the first Mirror's Edge game, isn't an easy task.
In this talk, the tools and technology used to render Mirror's Edge: Catalyst will be discussed. From the physical sky to the reflection tech, the speakers will show how they tamed the new Frostbite 3 PBR engine to deliver realistic images with stylized visuals.
They will talk about the artistic and technical challenges they faced and how they tried to overcome them, from the simple light settings and Enlighten workflow to character shading and color grading.
Takeaway
Attendees will gain insight into the technical and artistic techniques used to create a dynamic time-of-day system with updating radiosity and reflections.
Intended Audience
This session is targeted at game artists, technical artists and graphics programmers who want to know more about Mirror's Edge: Catalyst's rendering technology, lighting tools and shading tricks.
Taking Killzone Shadow Fall Image Quality Into The Next Generation - Guerrilla
This talk focuses on the technical side of Killzone Shadow Fall, the platform exclusive launch title for PlayStation 4.
We present the details of several new techniques that were developed in the quest for next-generation image quality, and the talk uses key locations from the game as examples. We discuss interesting aspects of the new content pipeline, the next-gen lighting engine, usage of indirect lighting and various shadow rendering optimizations. We also describe the details of volumetric lighting, the real-time reflections system, and the new anti-aliasing solution, and include some details about the image-quality-driven streaming system. A common and very important theme of the talk is temporal coherency and how it was utilized to reduce aliasing and improve rendering quality and image stability above the baseline 1080p resolution seen in other games.
Past, Present and Future Challenges of Global Illumination in Games - Colin Barré-Brisebois
Global illumination (GI) has been an ongoing quest in games. The perpetual tug-of-war between visual quality and performance often forces developers to take the latest and greatest from academia and tailor it to push the boundaries of what has been realized in a game product. Many elements need to align for success, including image quality, performance, scalability, interactivity, ease of use, as well as game-specific and production challenges.
First we will paint a picture of the current state of global illumination in games, addressing how the state of the union compares to the latest and greatest research. We will then explore various GI challenges that game teams face from the art, engineering, pipelines and production perspective. The games industry lacks an ideal solution, so the goal here is to raise awareness by being transparent about the real problems in the field. Finally, we will talk about the future. This will be a call to arms, with the objective of uniting game developers and researchers on the same quest to evolve global illumination in games from being mostly static, or sometimes perceptually real-time, to fully real-time.
Talk by Graham Wihlidal (Frostbite Labs) at GDC 2017.
Checkerboard rendering is a relatively new technique, popularized recently by the introduction of the PlayStation 4 Pro. Many modern game engines are adding support for it right now, and in this talk, Graham will present an in-depth look at the new implementation in Frostbite, which is used in shipping titles like 'Battlefield 1' and 'Mass Effect Andromeda'. Despite being conceptually simple, checkerboard rendering requires deep integration into the post-processing chain, in particular temporal anti-aliasing and dynamic resolution scaling, and poses various challenges to existing effects. This presentation will cover the basics of checkerboard rendering, explain the impact on a game engine that powers a wide range of titles, and provide a detailed look at how the current implementation in Frostbite works, including topics like object IDs, alpha unrolling, gradient adjust, and a highly efficient depth resolve.
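As a rough illustration of the pattern itself (not Frostbite's implementation, whose reconstruction pass is the hard part), the checkerboard alternation at 2x2-quad granularity can be sketched as:

```python
# Hedged sketch: the classic checkerboard pattern at 2x2-pixel-quad
# granularity. The TAA-style frame reconstruction is omitted entirely.

def rendered_this_frame(x, y, frame):
    """True if the 2x2 pixel quad containing (x, y) is shaded this frame.
    Quads alternate in a checkerboard whose parity flips every frame."""
    qx, qy = x // 2, y // 2
    return (qx + qy + frame) % 2 == 0

# Each frame shades half the quads; the other half is reconstructed from
# the previous frame plus motion vectors.
frame0 = sum(rendered_this_frame(x, y, 0) for y in range(8) for x in range(8))
frame1 = sum(rendered_this_frame(x, y, 1) for y in range(8) for x in range(8))
print(frame0, frame1)  # 32 32 -- exactly half of the 64 pixels each frame
```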
Game engines have long been at the forefront of taking advantage of the ever-increasing parallel compute power of both CPUs and GPUs. This talk is about how parallel compute is utilized in practice on multiple platforms today in the Frostbite game engine, and how we think the parallel programming models, hardware and software in the industry should look in the next 5 years to help us make the best games possible.
The presentation describes the physically based lighting pipeline of Killzone: Shadow Fall, a PlayStation 4 launch title. The talk covers the studio's transition to a new asset creation pipeline based on physical properties. Moreover, it describes the light rendering systems used in the new 3D engine built from the ground up for the upcoming PlayStation 4 hardware. A novel real-time lighting model simulating physically accurate area lights will be introduced, as well as a hybrid ray-traced / image-based reflection system.
We believe that physically based rendering is a viable way to optimize asset creation pipeline efficiency and quality. It also enables the rendering quality to reach a new level that is highly flexible depending on art direction requirements.
Volumetric Lighting for Many Lights in Lords of the Fallen - Benjamin Glatzel
In this session I’m going to give you an in-depth insight into the design and the implementation of the volumetric lighting system we’ve developed for ‘Lords of the Fallen’. The system allows the simulation of countless volumetric lighting effects in parallel while still being a feasible solution on next-gen consoles.
This presentation was held at the Digital Dragons 2014 conference.
Videos shown during the talk are available here: http://bglatzel.movingblocks.net/publications
Secrets of CryENGINE 3 Graphics Technology - Tiago Sousa
In this talk, the authors give an overview of the deferred lighting approach used in CryENGINE 3, along with an in-depth description of the many techniques used. Original files and videos at http://crytek.com/cryengine/presentations
There are a lot of articles about games. Most of these are about particular aspects of a game like rendering or physics. All engines, however, have a binding structure that ties all aspects of the game together. Usually there is a base class (Object, Actor or Entity are common names) that all objects in the game derive from, but very little is written on the subject. Only very recently a couple of talks on game|tech have briefly touched on the subject. Still, choosing a structure to build your game on is very important. The end user might not “see” the difference between a good and a bad structure, but this choice will affect many aspects of the development process. A good structure will reduce risk and increase the efficiency of the team.
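The binding structure the paragraph describes can be made concrete with a very small sketch: one base class that everything in the world derives from, and a loop that ticks them uniformly. This is purely illustrative; the names (Entity, World, Door) are generic conventions, not taken from any particular engine.

```python
# Minimal sketch of a base-class-driven game object structure.

class Entity:
    """Common base for every object living in the game world."""
    def update(self, dt):
        pass  # derived classes override this

class Door(Entity):
    """Example derived object with its own per-frame behavior."""
    def __init__(self):
        self.open_amount = 0.0
    def update(self, dt):
        self.open_amount = min(1.0, self.open_amount + dt)

class World:
    """Owns all entities and ticks them uniformly each frame."""
    def __init__(self):
        self.entities = []
    def update(self, dt):
        for e in self.entities:
            e.update(dt)

world = World()
door = Door()
world.entities.append(door)
for _ in range(3):
    world.update(0.5)
print(door.open_amount)  # 1.0 (clamped after 1.5s of opening)
```

The design choice the paragraph alludes to lives exactly here: whether behavior goes into deep inheritance hierarchies under `Entity`, or into components attached to a flat entity type, changes how the whole team works even though players never see it.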
Graphics Gems from CryENGINE 3 (SIGGRAPH 2013) - Tiago Sousa
This lecture covers rendering topics related to Crytek's latest engine iteration, the technology which powers titles such as Ryse, Warface, and Crysis 3. Among the covered topics, Sousa presented SMAA 1TX, an update featuring a robust and simple temporal anti-aliasing component; performant and physically plausible camera-related post-processing techniques, such as motion blur and depth of field, were also covered.
Course presentation at SIGGRAPH 2014 by Charles de Rousiers and Sébastien Lagarde at Electronic Arts about transitioning the Frostbite game engine to physically-based rendering.
Make sure to check out the 118 page course notes on: http://www.frostbite.com/2014/11/moving-frostbite-to-pbr/
During the last few months, we have revisited the concept of image quality in Frostbite. The core of our approach was to be as close as possible to a cinematic look. We used the concept of reference to evaluate the accuracy of produced images. Physically based rendering (PBR) was the natural way to achieve this. This talk covers all the different steps needed to switch a production engine to PBR, including the small details often bypassed in the literature.
The state of the art of real-time PBR techniques allowed us to achieve good overall results, but not without production issues. We present some techniques for improving convolution time for image-based reflections, proper ambient occlusion handling, and coherent lighting units, which are mandatory for level editing.
Moreover, we have managed to reduce the quality gap, highlighted by our systematic reference comparison, in particular related to rough material handling, glossy screen space reflection, and area lighting.
The technical part of PBR is crucial for achieving good results, but represents only the tip of the iceberg. Frostbite has become the de facto high-end game engine within Electronic Arts and is now used by a large number of game teams. Moving all these game teams from “old-fashioned” lighting to PBR has required a lot of education, which has been done in parallel with the technical development. We have provided editing and validation tools to help the transition of art production. In addition, we have built a flexible material parametrisation framework to adapt to the various authoring tools and game teams’ requirements.
We present the technology and ideas behind the unique lighting in MIRRORS EDGE from DICE, covering how DICE adopted global illumination into their lighting process, and Illuminate Labs' current toolbox of state-of-the-art lighting technology.
Progressive Lightmapper: An Introduction to Lightmapping in Unity - Unity Technologies
In 2018.1 we removed the preview label from the Progressive Lightmapper – we’ve made memory improvements, optimizations, and have had customers battle-test it. We are now also working on a GPU-accelerated version of the lightmapper. In this session, Tobias and Kuba will provide an intro to the basics of lightmapping and address some of the most common issues that users struggle with, and how to solve them. They will also provide an update on the future roadmap for lightmapping in Unity.
Tobias Alexander Franke & Kuba Cupisz (Unity Technologies)
This talk provides additional details around the hybrid real-time rendering pipeline we developed at SEED for Project PICA PICA.
At Digital Dragons 2018, we presented how leveraging Microsoft's DirectX Raytracing enables intuitive implementations of advanced lighting effects, including soft shadows, reflections, refractions, and global illumination. We also dove into the unique challenges posed by each of those domains, discussed the tradeoffs, and evaluated where raytracing fits in the spectrum of solutions.
The Real-time Volumetric Cloudscapes of Horizon Zero Dawn - Guerrilla
Download the full presentation (including movies) from http://www.guerrilla-games.com/publications.html#schneider1508.
Abstract: Real-time volumetric clouds in games usually pay for fast performance with a reduction in quality. The most successful approaches are limited to low-altitude, fluffy and translucent stratus-type clouds. For Horizon: Zero Dawn, Guerrilla needed a solution that could fill a sky with evolving and realistic results closely matching highly detailed reference images, representing high-altitude cirrus clouds and all of the major low-level cloud types, including thick, billowy cumulus clouds. These clouds need to be lit correctly according to the time of day, along with other cloud-specific lighting effects. Additionally, we are targeting a GPU cost of 2 ms. Our solution is a volumetric cloud shader which handles the aspects of modeling, animation and lighting logically without sacrificing quality or draw time. Special emphasis will be placed on our solutions for directability of cloud shapes and formations, as well as on our lighting model and optimizations.
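One widely cited ingredient of this kind of cloud modeling is a "remap" operation: a base cloud shape from low-frequency noise is carved to a coverage amount, then eroded at its edges by higher-frequency detail noise. The sketch below is a hedged paraphrase of that idea in scalar Python rather than shader code; the constants and noise inputs are invented placeholders, not Guerrilla's values.

```python
# Hedged sketch of remap-based cloud density shaping (illustrative only).

def remap(value, old_min, old_max, new_min, new_max):
    """Linearly remap value from [old_min, old_max] to [new_min, new_max]."""
    t = (value - old_min) / (old_max - old_min)
    return new_min + t * (new_max - new_min)

def cloud_density(base_noise, coverage, detail_noise):
    """Carve a base shape to the coverage amount, then erode its edges."""
    # Keep only the part of the base noise above (1 - coverage).
    base = max(0.0, remap(base_noise, 1.0 - coverage, 1.0, 0.0, 1.0))
    # Subtract detail noise mostly at the edges, where base is small.
    return max(0.0, remap(base, detail_noise * (1.0 - base), 1.0, 0.0, 1.0))

print(round(cloud_density(0.9, 0.5, 0.3), 3))
```

In a real shader this runs per raymarch sample in 3D, with the noise coming from precomputed Perlin-Worley textures; the scalar version just shows why more detail noise thins the cloud edges.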
SIGGRAPH 2016 - The Devil is in the Details: idTech 666 - Tiago Sousa
A behind-the-scenes look into the latest renderer technology powering the critically acclaimed DOOM. The lecture will cover how the technology was designed to balance visual quality against performance. Numerous topics will be covered, among them details about the lighting solution, techniques for decoupling costs from frequency, and GCN-specific approaches.
Unity - Internals: Memory and Performance - Codemotion
by Marco Trivellato - In this presentation we will provide in-depth knowledge about the Unity runtime. The first part will focus on memory and how to deal with fragmentation and garbage collection. The second part will cover implementation details and their memory-vs-cycles tradeoffs in both Unity 4 and the upcoming Unity 5.
The current HDFS NameNode stores all of its metadata in RAM. This has allowed Hadoop clusters to scale to 100K concurrent tasks. However, the memory limits the total number of files that a single NameNode can store. While Federation allows one to create multiple volumes with additional NameNodes, there is a need to scale a single namespace and also to store multiple namespaces in a single NameNode.
This talk describes a project that removes the space limits while maintaining similar performance by caching only the working set or hot metadata in Namenode memory. We believe this approach will be very effective because the subset of files that is frequently accessed is much smaller than the full set of files stored in HDFS.
In this talk we will describe our overall approach and give details of our implementation along with some early performance numbers.
Speaker: Lin Xiao, PhD student at Carnegie Mellon University, intern at Hortonworks
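The caching idea in the abstract, keeping only the hot working set of metadata in memory and falling back to slower storage on a miss, can be sketched with a bounded LRU cache. This is an assumption-laden illustration of the general technique, not the project's implementation; the class and field names are invented.

```python
# Sketch: bounded LRU cache over file metadata, with a slow backing store.
from collections import OrderedDict

class MetadataCache:
    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.cache = OrderedDict()          # path -> metadata, in LRU order
        self.backing_store = backing_store  # full namespace, e.g. on disk
        self.misses = 0

    def get(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)    # mark as recently used
            return self.cache[path]
        self.misses += 1
        meta = self.backing_store[path]     # slow path: read-through
        self.cache[path] = meta
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return meta

store = {f"/f{i}": {"size": i} for i in range(100)}
cache = MetadataCache(capacity=10, backing_store=store)
for _ in range(1000):                       # hot working set of 5 files
    for i in range(5):
        cache.get(f"/f{i}")
print(cache.misses)  # 5 -- only the initial cold misses
```

The key property matches the abstract's argument: when the frequently accessed subset is much smaller than the full namespace, nearly all lookups hit the in-memory cache even though the cache holds a tiny fraction of the files.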
Jay Kreps on Project Voldemort: Scaling Simple Storage at LinkedIn - LinkedIn
This was a presentation given at QCon 2009, embedded on LinkedIn's blog: http://blog.linkedin.com/
Cassandra Day Atlanta 2015: Diagnosing Problems in Production - DataStax Academy
This session covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course in basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This talk is intended for people with a general understanding of Cassandra, but it is not required to have experience running it in production.
Cassandra Day Chicago 2015: Diagnosing Problems in Production - DataStax Academy
Speaker(s): Jon Haddad, Apache Cassandra Evangelist at DataStax
This session covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course in basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This talk is intended for people with a general understanding of Cassandra, but it is not required to have experience running it in production.
Cassandra Day London 2015: Diagnosing Problems in Production - DataStax Academy
Speaker(s): Jon Haddad, Apache Cassandra Evangelist at DataStax
This session covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course in basic JVM garbage collection tuning. Attendees will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This talk is intended for people with a general understanding of Cassandra, but it is not required to have experience running it in production.
Intro deck from Cassandra Day Atlanta. Covers the evolution of data storage and analysis, the architecture of Cassandra, the read & write path, and using Cassandra for analytics. By Jon Haddad & Luke Tillman
Accelerating DL Inference with (Open)CAPI and Posit Numbers - Yutaka Kawai
This was presented by Louis Ledoux and Marc Casas at OpenPOWER summit EU 2019. The original one is uploaded at:
https://static.sched.com/hosted_files/opeu19/1a/presentation_louis_ledoux_posit.pdf
Diagnosing Problems in Production - Cassandra - Jon Haddad
This presentation covers diagnosing and solving common problems encountered in production, using performance profiling tools. We’ll also give a crash course in basic JVM garbage collection tuning. Readers will leave with a better understanding of what they should look for when they encounter problems with their in-production Cassandra cluster. This presentation is intended for people with a general understanding of Cassandra, but it is not required to have experience running it in production.
Scaling with sync_replication using Galera and EC2 - Marco Tusa
A challenging architecture design and proof of concept on a real case study using a synchronous replication solution.
A customer asked me to investigate and design a MySQL architecture to support his application serving shops around the globe,
scaling out and in based on sales seasons.
Slides from a talk given at Java/Scala Lab 2014 in Odessa, Ukraine. Describes how Java can be used as a platform for latency-restricted applications such as high-frequency trading, and demonstrates how latencies of 15-30 µs can be achieved on vanilla Oracle JDK.
Horizon Zero Dawn: An Open World QA Case Study - Guerrilla
Download the full presentation here: http://www.guerrilla-games.com/
Abstract: A retrospective of how Developer (Guerrilla) and Publisher QA (SIE) worked together in partnership to help deliver a AAA experience. The presentation will focus on the victories and the challenges we faced as part of testing such an ambitious open-world title, and on leveraging automated test solutions and telemetry to inform exploratory test strategy. It will discuss how we managed teams over a 21-month testing lifecycle, starting with small internal test teams before scaling up to 70+ people across the globe, with tens of thousands of test hours invested. The presentation will cover the successes and challenges relating to our people: early engagement, communication, trust, agility and collaboration; our processes: risk management, test strategy and post-launch support; and our tools: telemetry, risk registers and worldwide build delivery.
Horizon Zero Dawn: A Game Design Post-Mortem (Guerrilla)
Download the full presentation here: http://www.guerrilla-games.com/read/horizon-zero-dawn-a-game-design-postmortem
Abstract: Going through early prototypes and delving into design decisions and processes that shaped development of the game, this talk gives insight into the journey game design went through while moving from an ambitious paper concept to a finished open world action RPG, with all of the small and large design decisions and choices that have to be made along the way.
Putting the AI Back Into Air: Navigating the Air Space of Horizon Zero Dawn (Guerrilla)
Download the full presentation here: http://www.guerrilla-games.com/read/putting-the-ai-back-into-air
Abstract: In this talk, we explain the technology behind the aerial navigation in Horizon Zero Dawn. In Horizon, we've represented the flyable air space by use of a run-time generated height map. Queries can be done on this height map for positional information and navigation. We present a hierarchical path planning algorithm for finding a progressively more detailed path between two points. Additionally, we will touch on some gameplay related subjects, to show the additional challenges we faced in implementing the different flying behaviors, such as transitioning from air to ground and guided crash-landing.
Building Non-Linear Narratives in Horizon Zero Dawn (Guerrilla)
Download the original PowerPoint presentation here: http://www.guerrilla-games.com/read/building-non-linear-narratives-in-horizon-zero-dawn
Open world RPGs require deep stories, which emphasize player choice and freedom. The bigger the games grow, the more complex systems managing these stories get. See how Guerrilla tackled the issue in Horizon Zero Dawn by creating a quest system which has non-linearity at its base.
Player Traversal Mechanics in the Vast World of Horizon Zero Dawn (Guerrilla)
Download the original PowerPoint presentation here: http://www.guerrilla-games.com/read/player-traversal-mechanics-in-the-vast-world-of-horizon-zero-dawn
Paul van Grinsven shows what is needed to make Aloy traverse the vast world of Horizon Zero Dawn, with its complex and organic environments. Various traversal mechanics are covered from a gameplay programmer's perspective, focusing on the interaction between code and animations. The different systems and techniques involved in the implementation of these mechanics are explained, and Van Grinsven looks at the underlying reasoning and design decisions.
The Production and Visual FX of Killzone Shadow Fall (Guerrilla)
With the arrival of next generation consoles comes more processing power and memory... and with that, a whole lot more possibilities. Take an exclusive look behind the scenes of this PlayStation 4 launch title and hear about the challenges the team faced during production. Learn more about the engine's new lighting and shading tech and take a closer look at how some of the effects like dynamic dust, rain and various particle effects are achieved.
Out of Sight, Out of Mind: Improving Visualization of AI Info (Guerrilla)
When AI was simple, debugging consisted of confirming that the character was simply doing the one thing you expected. Over time, debugging moved away from "what" and became more about "why?" or "why not?" The collision of information about the agents, the environment, the player and the game state creates an enormous amount of data that can affect the decisions that the characters make. In this presentation some features of ReView will be demonstrated, showing how it was used for debugging the multi-player bots in Killzone Shadow Fall.
The Next-Gen Dynamic Sound System of Killzone Shadow Fall (Guerrilla)
We'll describe our new audio run-time and toolset that was built specifically for Killzone Shadow Fall on PlayStation 4. However, the ideas used are widely applicable and the focus is on integration with the game engine and fast iteration, with special attention to shortcuts to get your creative spark translated into in-game sounds as quickly as possible. We will demonstrate our implementation of these demands, which is a next-gen sound system that was designed to combine artistic freedom with high run-time performance. It should be interesting to both creative as well as technical minds who are looking for inspiration on what to expect from a modern sound design environment. To emphasize the performance advantage, we will show the point of view of both the sound designer and the programmer simultaneously. We will use examples starting with simple sounds and build up to increasingly more complex dynamic ones to illustrate the benefits of this unique approach.
Killzone Shadow Fall: Creating Art Tools For A New Generation Of Games (Guerrilla)
This talk describes the tool improvements Guerrilla Games implemented to make Killzone Shadow Fall shine on the PlayStation 4. It highlights additions to the Maya pipeline, such as Viewport 2.0, Maya's coupling with in-game updates and in-engine deferred renderer features including real-time shadow-casting, volumetric lighting, hardware instancing, lens flares and color grading.
This talk is about our experiences gained during the making of the Killzone Shadow Fall announcement demo.
We’ve gathered hard data about our assets, memory, CPU and GPU usage, along with a whole bunch of tricks.
The goal of the talk is to help you form a clear picture of what’s already possible to achieve on PS4.
A Hierarchically-Layered Multiplayer Bot System for a First-Person Shooter (Guerrilla)
This thesis research was the basis for Killzone 2’s multi-player bots. It proposes a hierarchically structured system to control the multiplayer bots, using one AI commander for each faction, each commanding several group leaders, each commanding several individual bots. The system is evaluated on the basis of a fully implemented AI for one of the multiplayer game modes.
Games always have too much stuff to render, but are often set in occluded environments, where much of what is in the view frustum is not visible to the camera. Occlusion culling tries to avoid rendering objects which the player can't see.
Discussion of occlusion for games has tended to focus on the use of GPU pixel counters to determine whether or not an object is visible. This places additional stress on a limited resource, and has latency which has to be worked around, requiring a more complex pipeline.
For KILLZONE 3 we took a different approach - we use the SPUs to render a conservative depth buffer and perform queries against it. This allows us to cull objects very early in the frame, avoiding any pipeline costs for invisible objects.
This presentation talks about the ideas (and dead ends) we explored along the way, as well as explaining in detail what we ended up with.
This presentation describes the architecture behind the multiplayer bot AI for Killzone 2. As part of this, it describes the hierarchical task network (HTN) planner, terrain analysis, and the way commander, squad and individual bot AI work together for dynamic tactical gameplay.
Automatic Annotations in Killzone 3 and Beyond (Guerrilla)
Voxel-based technologies have been successfully used to extract navigation data for NPCs. This talk shows how voxel data was used to automatically extract cover planes and more information in Killzone 3’s toolset.
This talk details various aspects of designing and developing videogames at Guerrilla. It highlights methods that are very similar to methods used in the CGI industry, and it illuminates some of the most important differences. And it covers the complete breadth of videogame development from artistic design to production pipelines and tool and engine development.
The PlayStation®3’s SPUs in the Real World: A KILLZONE 2 Case Study (Guerrilla)
This session describes many of the SPU techniques used in the engine behind KILLZONE 2 for the PlayStation 3. It first focuses on individual SPU techniques, then covers how these techniques work together in the game engine each frame.
Dynamic tactical position evaluation functions are procedures that use static and dynamic information about the world to make tactical decisions at run-time. By having NPCs use a procedural description of the solution to a tactical decision, the solution can vary depending on the inputs. Using static information about the world as input makes sure the tactics are applicable at any place on the map. Using dynamic inputs makes sure they are applicable in multiple situations.
Practical Occlusion Culling in Killzone 3 (Guerrilla)
Killzone 3 features complex occluded environments. To cull non-visible geometry early in the frame, the game uses PlayStation 3 SPUs to rasterize a conservative depth buffer and perform fast synchronous occlusion queries against it. This talk presents an overview of the approach and key lessons learned during its development.
This presentation gives an overview of the rendering techniques used in KILLZONE 2. We put the main focus on the lighting and shadowing techniques of our deferred shading engine and how we made them play nicely with anti-aliasing.
Providing Globus Services to Users of JASMIN for Environmental Data Analysis (Globus)
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
First Steps with Globus Compute Multi-User Endpoints (Globus)
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... (Globus)
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Globus Connect Server Deep Dive - GlobusWorld 2024 (Globus)
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam (takuyayamamoto1800)
In this slide, we show a simulation example and how to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
SOCRadar Research Team: Latest Activities of IntelBroker (SOCRadar)
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled for you what happened over the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... (Globus)
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus... (Globus)
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR (Tier1 app)
Even though at surface level ‘java.lang.OutOfMemoryError’ appears to be one single error, there are in fact 9 types of OutOfMemoryError underneath it. Each type has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
A Comprehensive Look at Generative AI in Retail App Testing.pdf (kalichargn70th171)
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
3. About me
Making games since 19(ZX)81
Getting paid for it since 1998
Moved to New Zealand in 2004
Contracting for Guerrilla since 2005, mostly full time
character arc
4. About Guerrilla
Over 200 people in Amsterdam Studio
Good at art
Good at technology
Getting good at workflow
Optimising for workflow is key
• Don’t stop (or crash) the ship!
Optimising for performance is still important
• Good workflow gives artists time to make more content…
5. About Horizon Zero Dawn
New IP was a big departure from the Killzone FPS series
New challenges
• Large world size
• Huge content size
• Content in small pieces
• See all the way to the horizon
• Variable density (settlements, cauldrons)
• Continuous streaming
• Random encounters
11. Existing system – Models
Our model data is represented by MeshResources
• Actually an arbitrary tree of other MeshResources
• Level of detail (LOD) switch and multi-mesh internal nodes
• Static and skinned mesh leaf nodes with primitive data
[Diagram: example MeshResource tree: a LodMeshResource root with MultiMeshResource and StaticMeshResource internal nodes, and StaticMeshResource leaves holding PrimitiveResource data]
12. Existing system – Instances
Meshes placed in the world as DrawableObjects
• Each has its own MeshInstanceTree
• Encodes resource tree in efficient flat form
MeshInstanceTree leaf nodes contain DrawableSetups
• Primitive geometry (vertex & index arrays)
• Shaders and rendering options
• Local-to-world transform(s)
K-d tree of DrawableObjects provides spatial hierarchy
13. Existing system – Queries
Find visible DrawableObjects
• Walk k-d tree
• Frustum cull each DrawableObject
Find visible DrawableSetups
• Walk each visible DrawableObject’s MeshInstanceTree
• Descend into relevant LODs
• Frustum cull each DrawableSetup
Output list of visible DrawableSetups to renderer
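As a sketch, the two-phase query above might look like this on the CPU (hypothetical, simplified types; the real system walks a k-d tree of DrawableObjects and a flattened MeshInstanceTree per object):

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, simplified stand-ins for the engine types.
struct AABB { float min[3], max[3]; };
struct DrawableSetup { AABB bounds; };
struct DrawableObject {
    AABB bounds;
    // Flattened MeshInstanceTree: one list of DrawableSetups per LOD level.
    std::vector<std::vector<DrawableSetup>> lods;
    int selectLod() const { return 0; }   // distance-based in the real system
};

using FrustumTest = std::function<bool(const AABB&)>;

// Phase 1: frustum-cull whole DrawableObjects (the k-d tree walk that prunes
// entire subtrees is elided here). Phase 2: descend into the selected LOD and
// frustum-cull each DrawableSetup, outputting the visible list for the renderer.
std::vector<const DrawableSetup*>
query(const std::vector<DrawableObject>& objects, const FrustumTest& visible) {
    std::vector<const DrawableSetup*> out;
    for (const DrawableObject& obj : objects) {
        if (!visible(obj.bounds))
            continue;                                  // cull whole object
        for (const DrawableSetup& setup : obj.lods[obj.selectLod()])
            if (visible(setup.bounds))
                out.push_back(&setup);                 // visible setup
    }
    return out;
}
```

The same structure runs once per query (camera, sun shadow map, other shadow maps), which is why its cost scales with both object count and setup count.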
14. Existing system – Queries
Same system for all geometry
• Static and dynamic treated the same way
Run two or more queries per frame
• Player camera (perspective)
• Sunlight shadow map (orthographic)
• Other shadow maps (perspective, smaller frusta)
15. Existing system – Problems
Knew there were several scaling issues
• K-d tree rebuilds expensive
• MeshInstanceTree queries expensive
• Despite parallel jobs
• Mix of large and small trees unbalance jobs
• K-d tree queries relatively fast though
• API aimed at fewer, larger objects
• Building blocks are many many small objects
• Interface overhead (mainly from locking)
17. Basic goals
No offline precomputation
• Seriously, none
Support existing content
Handle far more content than KZ4
Cut the query time below KZ4
• Query is critical path for all rendering
18. New system – StaticScene
Handle static data only
• Far more static data than dynamic data
• Existing system works well for dynamic data
• Don’t overcomplicate things
• Run both jobs in parallel
Use asynchronous compute hardware
• Synchronise with CPU not rendering
• Like PS3 SPU sync, which we knew we liked
19. Input constraints
Most static resource trees are like this
• High LOD(s): Lists of building blocks, placed by artists
• Low LOD(s): Collapsed geometry created by tools from building blocks
[Diagram: typical resource tree: a LodMeshResource root whose lod0 is multiple building blocks (a MultiMeshResource of LodMeshResources, each with internal LODs) and whose lod1 and lod2 are collapsed-geometry StaticMeshResources]
20. Input constraints
Want to flatten this tree into something more uniform
• For feeding compute
Can represent completely as two LOD levels
• Call these parent & child
• Bounding box plus [min, max) LOD distances
• Leaf is visible iff both parent and child LODs are selected
• If tree doesn’t have two LOD levels, pad with “always on” levels
• No special cases
Outlier content moved to dynamic
• Works fine within the old system
• Artists tweaked workflow to get rid of it over time
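The flattened representation makes leaf selection a pair of interval tests. A minimal sketch (the names and the half-open [min, max) convention are assumptions based on the bounds described above):

```cpp
#include <cassert>

// A flattened leaf is selected when the view distance falls inside BOTH its
// parent's and its own [min, max) LOD range. Trees with fewer than two LOD
// levels are padded with an "always on" range, so there is no special case.
struct LodRange { float min, max; };        // half-open: [min, max)

constexpr LodRange kAlwaysOn{0.0f, 1e30f};  // padding for shallow trees

bool lodSelected(LodRange r, float dist) {
    return dist >= r.min && dist < r.max;
}

bool leafSelected(LodRange parent, LodRange child, float dist) {
    return lodSelected(parent, dist) && lodSelected(child, dist);
}
```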
21. High level structure: StaticTile
Split world into tiles
• Defined by streaming groups, not spatially
• Some groups are true spatial tiles
• Other groups things like settlements, encounters
Once created, a tile is immutable
• Dynamic updates would complicate things
• Destroy and recreate to change
22. High level data: Tile contents
GPU buffers
• QueryObjects (representing DrawableObjects)
• QuerySetups (representing DrawableSetups)
• QueryInstances (Connect one QueryObject and one QuerySetup)
• Map 1:1 to compute threads
• Matrices & bounding boxes
• Loaded indirectly from instances
CPU clusters
• Spatially coherent ranges of instances with bounds & LOD
23. Low level data: QueryInstance
Most numerous data
• Hence aggressive packing
Filter allows fast rejection of instances
• E.g. for selecting just shadow casters
• Or just visual meshes
• No need to load more data
Indices used for indirect loads
• Matrices (48 byte 4x3 float matrix)
• Bounds (12 byte half-precision AABB)
QueryInstance layout (16 bytes total):
• Filter mask: 3 bits
• Flags: 2 bits
• Setup index: 12 bits
• Object index: 17 bits
• Parent & child bounds indices: 2 @ 15 bits
• Matrix index: 14 bits
• Parent LOD range: 2 @ 12 bits
• Child LOD range: 2 @ 12 bits
• Future proofing: 2 whole bits
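The 16-byte layout can be expressed with C++ bitfields. This is only a sketch (field order and exact semantics are assumptions; the compute shader unpacks the same bits manually), but the bit budget checks out:

```cpp
#include <cassert>
#include <cstdint>

// First 64-bit unit:  3 + 2 + 12 + 17 + 15 + 15          = 64 bits.
// Second 64-bit unit: 14 + 12 + 12 + 12 + 12 + 2         = 64 bits.
struct QueryInstance {
    uint64_t filter_mask       : 3;   // fast rejection (shadow casters / visual meshes)
    uint64_t flags             : 2;
    uint64_t setup_index       : 12;  // indirect load of QuerySetup
    uint64_t object_index      : 17;  // indirect load of QueryObject
    uint64_t parent_bounds_idx : 15;  // 12-byte half-precision AABBs
    uint64_t child_bounds_idx  : 15;
    uint64_t matrix_index      : 14;  // 48-byte 4x3 float matrices
    uint64_t parent_lod_min    : 12;
    uint64_t parent_lod_max    : 12;
    uint64_t child_lod_min     : 12;
    uint64_t child_lod_max     : 12;
    uint64_t future_proofing   : 2;   // two whole bits
};
static_assert(sizeof(QueryInstance) == 16, "QueryInstance must pack to 16 bytes");
```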
24. Low level data: QuerySetup/QueryObject
Both stored exactly once per tile
• Packing much less important
Renderer uses camera-relative floating space
• Store snapped integer position for objects
• Object-to-snapped matrix relative to that
• Construct and output object-to-floating
• Maintaining precision far from origin
Setup has accurate vertex bounds for geometry
• LOD bounds are always an overestimate
• Aggregated across LODs, stored in object space
QueryObject layout (64 bytes total):
• Object to snapped: 48 bytes
• Snapped position: 12 bytes
• LOD scale, flags: 4 bytes

QuerySetup layout (32 bytes total):
• Local bounds: 24 bytes
• CPU pointer: 8 bytes
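Why snap to integers? float32 has a 24-bit significand, so converting large world positions to float rounds them; subtracting snapped integer positions first makes the camera-relative offset exact. A minimal one-dimensional illustration (hypothetical names and units):

```cpp
#include <cassert>
#include <cstdint>

// Naive: convert world positions to float, then subtract. Far from the origin
// both conversions round, and the small difference between them is destroyed.
float relativeNaive(int32_t objectPos, int32_t cameraPos) {
    return static_cast<float>(objectPos) - static_cast<float>(cameraPos);
}

// Snapped: subtract on integers first, then convert. The (small) offset is
// exactly representable, maintaining precision far from the origin.
float relativeSnapped(int32_t objectPos, int32_t cameraPos) {
    return static_cast<float>(objectPos - cameraPos);
}
```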
25. Building tiles: Loading
On streaming thread:
• Split adds into multiple StaticTiles if change is large
• Add objects to “orphan” tile if change is small
• Add/remove sets match, so remove whole tiles (apart from orphans)
• Aim to get consistent tile size for load balancing & scaling
• Generate spatial partition, fill buffers
On main thread:
• Just make ready tiles active
• No heavy lifting - avoid streaming hitches
26. Building tiles: Spatial partition
Use tiles as first level of partition
• Generally spatially coherent already
Create spatial+ partition within each tile
• Sort instances by filter, LOD range and Morton number
• Filter allows rejection of entire clusters on CPU
• LOD range does the same thing for e.g. detailed dense areas
• Morton numbers provide reasonable spatial coherence
Sort keys define clusters
• Along with minimum size
• Clusters map 1:1 to compute jobs
27. Aside: Morton numbers
A simple but useful concept:
• Quantise position to integers
• Interleave components bit-by-bit
• Generates a 1D Z-order curve from N-dimensional points
• In 3D, equivalent to building an octree
Morton numbers close?
• Positions are close (mostly)
• And vice versa
Can compute fairly quickly with bit tricks
• Spread bits of each component by two
Useful for quick and dirty spatial structure
• Hilbert curve has better locality, but more expensive to compute
[Diagram: Morton numbers visualised as an 8×8 grid traversed in Z-order, cells numbered 0 to 63]
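A standard implementation of the bit-spreading trick for 10 bits per axis (a generic sketch, not Guerrilla's exact code):

```cpp
#include <cassert>
#include <cstdint>

// Spread the low 10 bits of x so consecutive bits land 3 positions apart.
uint32_t part1by2(uint32_t x) {
    x &= 0x3FF;                          // quantised coordinate, 10 bits
    x = (x | (x << 16)) & 0x030000FF;
    x = (x | (x << 8))  & 0x0300F00F;
    x = (x | (x << 4))  & 0x030C30C3;
    x = (x | (x << 2))  & 0x09249249;
    return x;
}

// Interleave three 10-bit quantised coordinates into a 30-bit Morton number.
// Equivalent to the cell index of an implicit octree traversal.
uint32_t morton3(uint32_t x, uint32_t y, uint32_t z) {
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2);
}
```

Sorting instances by `morton3` of their quantised positions yields the 1D Z-order curve: nearby positions mostly get nearby keys.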
28. CPU job for query
Run one CPU job per static query
• Part of overall job graph for visibility query
• In parallel with existing CPU jobs for dynamic query
Test tile and cluster visibility on CPU
Kick one compute job for each visible cluster
CPU job waits for GPU jobs
[Diagram: frame timeline. Game code on the CPU processes tiles/clusters (visible, skipped, visible), issuing one compute dispatch per visible cluster; the cluster compute jobs run on the GPU while the CPU continues; the CPU then waits at a compute sync point before sort and render]
29. GPU compute job for cluster
Uber-shader with some compile-time specialisations
• Shadow queries and visual queries have different options
• Compile a couple of “fast path” shaders with reduced code size
• Make sure we don’t fall off the fast path…
30. GPU compute thread: Input and tests
Load QueryInstance and test filter
• Early out if not selected
Load contents (matrices, bounds, etc.)
Compute parent world bounds and test LOD
• Skip visibility tests if not selected
Compute child world bounds and test LOD
Load accurate local bounds and test visibility
• Frustum
• Size threshold
• Occlusion cull
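The frustum part of these tests is commonly a centre/extent test of the world-space box against each plane. A generic sketch (not the shipped shader, which also applies the size threshold and occlusion test):

```cpp
#include <cassert>
#include <cmath>

struct Plane { float nx, ny, nz, d; };   // nx*x + ny*y + nz*z + d >= 0 is "inside"

// Conservative AABB-vs-frustum: reject only if the box is fully outside some
// plane. The projected radius uses |n| component-wise against the extents.
bool aabbInFrustum(const float center[3], const float extent[3],
                   const Plane* planes, int numPlanes) {
    for (int i = 0; i < numPlanes; ++i) {
        const Plane& p = planes[i];
        float dist = p.nx * center[0] + p.ny * center[1] + p.nz * center[2] + p.d;
        float radius = std::fabs(p.nx) * extent[0] + std::fabs(p.ny) * extent[1]
                     + std::fabs(p.nz) * extent[2];
        if (dist + radius < 0.0f) return false;   // fully outside this plane
    }
    return true;   // intersects or inside (may be a false positive at corners)
}
```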
31. GPU compute thread: Output
Load, update and store LOD fade state
• For player camera query only
• Needs LOD and frustum visibility to update
Skip if faded out (or invisible, for shadows)
Allocate space in output buffer
• Shared by all threads/jobs for one query
• Use aggregated atomics and global counter
Write DrawableSetup pointer and transform
32. Aside: Some compute terminology
Wavefront
• Indivisible block of threads executing in lock-step (64 threads on PS4)
LDS
• Local Data Store - fast memory shared by wavefront threads
VGPR
• Vector General Purpose Register, one value per thread in wavefront
• Also have SGPRs, scalar registers, with one value for the entire wavefront
Vector Lane
• One element of a vector register
Ballot(predicate)
• Bit-mask of active threads where (predicate) is true (64 bits on PS4)
Atomic
• Memory operation which can’t be interrupted by another thread
33. Aside: Aggregated atomics
Compute threads often want to use global atomics
• E.g. to append to buffers, count things
Useful to aggregate these across a wavefront
• One atomic per wavefront instead of one per thread
• Saves memory traffic
• As a bonus, gives fixed append order within wavefront
Drop-in replacement for existing atomics
For more information, see [Adinetz14]
34. Aside: Aggregated atomics
[Diagram: worked example over time. Eight threads A–H want to append an item each to a shared output buffer, using a counter. The ballot bit-mask of active threads is 0 1 0 0 1 1 0 1 (scalar), its count is 4 (scalar), and the per-thread prefix sums n of active bits are 0 0 1 1 1 2 3 3 (vector). The first active thread performs the single global atomic add of 4, retrieving the counter's previous value m; m is distributed to all threads, and each active thread writes its item to address m+n in the output buffer]
35. Aside: Aggregated atomics
// Which threads are active now? This is scalar, i.e. the same for the entire wavefront
ulong active_mask = ballot(true);

// If the ID of this thread is the lowest active ID, then this is the first active thread
uint wavefront_old_value;
if (ReadFirstLane(thread_id_in_wavefront) == thread_id_in_wavefront)
{
    // Count bits in scalar mask to get total increment for the wavefront.
    uint increment = popcnt(active_mask);

    // Perform global atomic add, retrieve original value
    OutputBuffer.AtomicAdd(address, increment, wavefront_old_value);
}

// Distribute original value from first active thread to all originally active threads
wavefront_old_value = ReadFirstLane(wavefront_old_value);

// Add prefix sum to get each thread's value. This is usually used as a destination address.
uint thread_value = wavefront_old_value + MaskBitCnt(active_mask);
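A CPU-side emulation of the same index arithmetic (hypothetical, just to verify the scheme; on the GPU, ballot, popcnt and MaskBitCnt do this per wavefront without any loop):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

static int popcount8(uint8_t m) { int c = 0; for (; m; m >>= 1) c += m & 1; return c; }

// Emulate one 8-thread wavefront appending items to a shared buffer.
// Returns each active lane's destination address (-1 for inactive lanes);
// the counter is bumped by ONE add for the whole wavefront.
std::vector<int> aggregatedAppend(uint8_t active_mask, int& global_counter) {
    int base = global_counter;                      // "previous counter value m"
    global_counter += popcount8(active_mask);       // the single global atomic
    std::vector<int> addr(8, -1);
    for (int lane = 0; lane < 8; ++lane) {
        if (!((active_mask >> lane) & 1)) continue;
        // MaskBitCnt: number of active lanes below this one (prefix sum n)
        int n = popcount8(active_mask & ((1u << lane) - 1));
        addr[lane] = base + n;                      // each lane writes to m + n
    }
    return addr;
}
```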
37. Batching during query
In Horizon Zero Dawn renderer:
• Sort DrawableSetup by geometry and shader (as well as depth etc.)
• Batch together close to the end of the CPU render pipeline
• Perfect batching as global list of DrawableSetups is available
• Pay CPU cost for each DrawableSetup in the pipeline
• Only support this for key passes (deferred geometry, shadowmaps)
Latest work:
• Create and output batches from visibility query
• Renderer only sees batches, saves CPU time
• Imperfect batching as GPU only sees one cluster at a time
• Currently use both systems at once
38. GPU compute thread: Batched output
Write two buffers, one for DrawableSetupBatches
• Batch header
And one for DrawableSetupInstances
• Local to floating space transform & state
• Previous transform (only dynamic geometry)
DrawableSetupBatch (48 bytes):
• Sort key — 8 bytes
• Pointer to setup — 8 bytes
• Bounding sphere — 16 bytes
• Pointer to instances — 8 bytes
• Count and stride — 8 bytes
DrawableSetupInstance (64–112 bytes):
• Instance state — 16 bytes
• Local to floating — 48 bytes
• (Previous L2F) — (48 bytes)
39. GPU compute thread: Batched output
On the CPU
• Add DrawableSetup index to the spatial sort key for QueryInstances
• So QueryInstances now sorted by DrawableSetup as well
• Mark QueryInstance with a bit when DrawableSetup changes
In the shader
• Use these bits to group threads by DrawableSetup
• (Do visibility tests for each thread)
• Each active thread appends DrawableSetupInstance to output
• Each group of threads accumulates DrawableSetupBatch to LDS
• E.g. overall bounds, batch length
• First active thread in each group appends DrawableSetupBatch to output
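The steps above can be sketched as a CPU model of one wavefront (a hypothetical Python simulation, not the shipped shader; names are illustrative):

```python
# Per-wavefront batching: group end bits come from the CPU sort; visibility is
# the result of the per-thread tests. Output is the instance list plus one
# batch header (here just a length) per group with visible instances.

def batch_wavefront(end_bits, visible):
    # Group index = exclusive prefix sum over the group end bits.
    group, g = [], 0
    for bit in end_bits:
        group.append(g)
        g += bit
    # Active threads append their instances in lane order (stable aggregated append).
    instances = [lane for lane, v in enumerate(visible) if v]
    # Collate batch lengths per group in an "LDS" array via per-group accumulation.
    lds = [0] * (g if g else 1)
    for lane, v in enumerate(visible):
        if v:
            lds[group[lane]] += 1
    # First active thread per group emits its batch header (length only, here).
    batches = [n for n in lds if n > 0]
    return instances, batches

# The slide's example: groups {A,B}, {C,D,E}, {F}, {G,H}; threads B, C, E, H visible.
instances, batches = batch_wavefront([0, 1, 0, 0, 1, 1, 0, 1],
                                     [0, 1, 1, 0, 1, 0, 0, 1])
print(instances)  # [1, 2, 4, 7]  (lanes B, C, E, H)
print(batches)    # [1, 2, 1]     (batch lengths, matching the diagram)
```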
40. GPU compute thread: Batching
[Diagram: one simplified eight-thread wavefront batching its output, over time]
• Threads A B C D E F G H: wavefront sorted by DrawableSetup (colours show groups of same setup)
• Group end bit from CPU (four groups of threads): 0 1 0 0 1 1 0 1
• Group index (prefix sum over end bits): 0 0 1 1 1 2 3 3
• Do visibility tests; active threads B, C, E, H are visible
• Active threads append their instances (B C E H) to the instance buffer
• And collate into LDS using group indices (LDS atomics): LDS array holds batch lengths 1 2 1
• First active thread per group reads back from LDS and appends its batch to the batch buffer (again showing lengths: 1 2 1), with pointers from batches back to their instances
41. GPU compute thread: One big batch
[Diagram: the same wavefront when every thread has the same setup]
• Threads A B C D E F G H: wavefront sorted by DrawableSetup (here each thread has the same setup)
• Group end bit from CPU (here just one big group): 0 0 0 0 0 0 0 1
• Group index (prefix sum over end bits): 0 0 0 0 0 0 0 0
• Do visibility tests; active threads B, C, E, H are visible
• Active threads append their instances (B C E H) to the instance buffer
• And collate into LDS using group indices (here one batch of four visible instances): LDS array holds 4
• First active thread per group reads back from LDS and appends its batch to the batch buffer (showing batch length: 4), with a pointer from the batch back to its instances
42. GPU compute thread: Batches of one
[Diagram: the same wavefront when every thread has a unique setup]
• Threads A B C D E F G H: wavefront sorted by DrawableSetup (here every thread has a unique setup)
• Group end bit from CPU (here eight groups of one setup each): 1 1 1 1 1 1 1 1
• Group index (prefix sum over end bits): 0 1 2 3 4 5 6 7
• Do visibility tests; active threads B, C, E, H are visible
• Active threads append their instances (B C E H) to the instance buffer
• And collate into LDS using group indices (here four visible batches of one instance): LDS array holds 1 1 1 1
• First active thread per group reads back from LDS and appends its batch to the batch buffer (showing batch lengths: 1 1 1 1), with pointers from batches back to their instances
43. GPU compute thread: Batching
Advantages
• Writes directly to output arrays, no extra storage
• Stable ordering of writes within wavefront
• Single pass through single shader
Disadvantages
• Some additional setup on CPU
• Conflicts with spatial sort to some extent
• Max batch size limited to wavefront size
• Uses about 2KB of LDS
• New data format with bigger QueryInstances
45. Statistics
Typically have 500K – 1.5M StaticInstances
StaticScene GPU data uses ~10-30MB
Main static query time is ~1-2ms
• Some of that time the CPU busy-waits for GPU
• Shadow query times generally less
Batching changes are valuable
• Items in render pipeline reduced by ~60-70%
• No meaningful change to query times
52. Future work
Original design was conservative with compute
• PS4 compute very fast, very flexible
Move more work from CPU
• Move tile/cluster cull to compute
• Let compute produce shader constants directly
Move dynamic content to (Static)Scene
• Needs faster object addition and removal
• Needs faster (and/or simpler) spatial structure
• Particularly needed for placement (vegetation etc.) meshes
53. Conclusion
PS4 compute is great
Simple solutions are great
• Do dumb stuff, lots of it, but really really fast
Work within existing workflow
Build to scale
• Content grew dramatically
towards the end of the project
• Tweaked the size and distribution of clusters
• New system generally coped
54. Special thanks to Michiel, Jeroen, Roland and the
Guerrilla Tech Team for making this talk possible today!
will@secondintention.com
QUESTIONS?
55. Reference
[Adinetz14] Optimized filtering with warp-aggregated atomics
• https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/
56. Bonus slides: Occlusion culling
Test against previous frame’s depth buffer
• On PS4 we read compressed depth tiles
• Generate conservative MIP chain
Simple and surprisingly effective
• Reproject and test bounding boxes against one MIP
• Constant-time test of four texels for small objects
• Scan box of texels for larger objects
• Tricky to schedule with GPU rendering
• Not perfect, but good enough
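The conservative MIP chain and the constant-time four-texel test can be sketched as follows. This is an illustrative Python model under simple assumptions (row-major depth lists, larger depth = farther); the PS4 version reads compressed depth tiles instead.

```python
# Build one conservative MIP level by taking the farthest depth of each 2x2
# block, then test a small object's screen box against at most four texels.

def build_conservative_mip(depth, w, h):
    """Reduce a w x h depth buffer to (w/2) x (h/2), keeping the farthest
    depth per 2x2 block so the test never falsely culls."""
    mip = []
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            mip.append(max(depth[y * w + x],       depth[y * w + x + 1],
                           depth[(y + 1) * w + x], depth[(y + 1) * w + x + 1]))
    return mip

def occluded(mip, mip_w, x0, y0, nearest_depth):
    # Constant-time test: at a suitable MIP a small object covers at most 2x2
    # texels; it is occluded only if its nearest point is behind all four.
    texels = [mip[(y0 + dy) * mip_w + (x0 + dx)] for dy in (0, 1) for dx in (0, 1)]
    return nearest_depth > max(texels)

depth = [0.5] * 16                          # 4x4 depth buffer, occluders at 0.5
mip = build_conservative_mip(depth, 4, 4)   # 2x2 MIP, all texels 0.5
print(occluded(mip, 2, 0, 0, 0.9))          # True: object behind the occluders
print(occluded(mip, 2, 0, 0, 0.3))          # False: object in front, keep it
```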
58. Bonus slides: Occlusion culling
We occlusion cull everywhere we frustum cull
• For every dynamic k-d tree node, DrawableObject, DrawableSetup
• For every static QueryInstance
• Same data and algorithm on CPU and GPU
• Used for player camera queries, not shadow maps
Straightforward compute implementation
• More complex SIMD implementation for CPU
Editor's Notes
Hi, I’m Will Vale and I’ve been making games for a while. I worked in the UK for several years at Particle Systems in Sheffield, and started contracting for Guerrilla fairly soon after emigrating to this hemisphere.
Guerrilla is a first party Sony studio that’s been traditionally known for really amazing art and technology. While we haven’t always had the most efficient workflow we’ve paid a lot of attention to that over the development of KZ4 and Horizon and we’re getting good at it.
Because the studio is big, workflow is probably the most important optimisation criterion. Any wasted time is scaled up by the large team size.
But performance is important too, since good workflow tends to mean more and better content for the engine to deal with.
Our in-house engine is Decima, it’s been used to create the Killzone series of games across multiple platforms, and Horizon and Kojima Productions’ Death Stranding on Playstation 4.
The Killzone games are relatively linear first person shooters with narrower areas connecting wider play spaces. Horizon was a huge departure from this – we wanted a large open world full of detail, and this obviously created some challenges for the engine. The variable content density and the need to keep headroom for spontaneous enemy encounters made for an interesting problem.
Before talking about what we did, I’ll talk a little about where we started from with Horizon’s visibility.
Our 3D models as seen by the game are a tree of MeshResources which can be arbitrarily complex. We have Lod meshes which select between different branches, and Multimeshes to place branches relative to the root. And at the leaves we have static meshes containing the geometry. There are other leaf node types but they’re not relevant to this talk.
It’s a very flexible system, which is good for building different content workflows, but makes it harder to nail down and extract typical behaviours when optimising.
We place instances of mesh resources in the world as DrawableObjects. These have a more efficient encoding of the MeshResource tree which is used at runtime.
One particular issue is that no mesh resource knows that it’s the root of a tree, it’s only when placed by a DrawableObject that this is made clear.
The leaf nodes of this tree contain DrawableSetups, which are what we feed the renderer. Each is a chunk of geometry with the shaders (and state) needed to render it. They isolate the renderer from the rest of the content.
In KZ3 we also occlusion-culled each object and setup using software occlusion culling on the PS3 SPUs.
In KZ4 we used middleware for static content, which handled all visibility including occlusion.
Queries are CPU jobs which are part of the general rendering job graph. We have a flexible job architecture on the CPU – most code (and all rendering code) runs in jobs.
This system wasn’t good enough for the world size of Horizon, we needed to reduce the complexity of the Kd-Tree since we’d be rebuilding it often as content streamed, and we wanted to run fewer MeshInstanceTree queries since these were expensive.
We also had the problem that the scene API was really aimed at few, large objects, and the building blocks which create Horizon’s world are a vast number of tiny objects.
So what did we build instead?
We had visibility middleware in KZ4 which caused workflow problems as we needed to precompute visibility data and this just didn’t fit into how we were working. It increased the time needed to properly test changes – we could start the game without it, but then artists had to wait for the baking process to finish in the background before they could see things as they should be. We still shipped the game with it, but we didn’t think we could fit it into Horizon.
I found it quite hard to give up the idea of precomputation, but it is liberating – if you don’t have to wait for baking processes you can test code changes quickly as well.
Obviously we also needed to handle new content and handle it faster.
We built a single-purpose system to ease the load on the existing multi-purpose system. The StaticScene handles just static geometry, since that’s what we have vastly more of.
We could use the existing system to handle the remaining dynamic geometry and run the jobs in parallel.
We also aimed to use the PS4’s asynchronous compute capability, since we thought that would better handle the amount of content, and would be relatively easy to synchronize with the CPU.
Since we wanted to feed relatively flat data to compute, we had to constrain the complex MeshResource trees.
It turns out that many of our static resources looked like this, with a top-level LodMeshResource selecting between one or more LODs of building blocks, and one or more LODs of collapsed geometry created by export tools. The building blocks themselves have LOD nodes, but in the collapsed geometry that’s simplified away.
The key observation was that two LOD levels were enough to represent all those trees. We call these parent and child, and they have the bounds of the LodMeshResource and the LOD bracket for one of the branches.
A leaf is visible if and only if both parent and child LODs are selected.
To avoid special cases, we add “empty” LOD levels so everything has two.
Any content that didn’t fit this moved to the dynamic system and the artists removed it over time.
To cull all this data we needed some spatial structure. At the top level we split the world into StaticTiles, which are sometimes true spatial tiles, and sometimes other things like settlements or encounters. These are defined by the streaming system, and the StaticScene doesn’t get to choose.
The tiles are created immutable and can’t be updated, which simplifies things.
A tile is mostly flat buffers of GPU data, with a list of clusters used on the CPU for first-pass culling.
Since shipping Horizon we’ve made some changes to support batch rendering, which I’ll talk about later. The data formats I’m describing are the new ones since I didn’t think there’d be time to cover both. Briefly, we used to have smaller QueryInstances and larger QuerySetups since each DrawableSetup had its own local-to-object space transform. For batching we remove the transforms from the DrawableSetups and instead use the DrawableSetup’s location in the MeshResourceTree to define a local-to-object space transform.
To save space, all the elements other than the QueryInstance are hashed and stored exactly once per tile, with the instances looking them up by index. Again, this is a post-shipping change.
The data is mostly indices, some filter bits to indicate visual meshes and different types of shadowcaster, and the parent and child LOD ranges for the leaf the instance is in.
As it stands now there’s not much room to grow!
Each unique QuerySetup and QueryObject is stored once per tile, so packing these structures isn’t as important.
The game world uses a high-precision coordinate system to avoid problems caused by floating point precision loss away from origin, and the renderer works in a floating space which follows the camera. We pass in a high precision object transform based on a 1m integer grid and the query shader subtracts the snapped camera origin and outputs floating space transforms to the renderer.
The QuerySetup has accurate local bounding information which we use for frustum culling. We could use the LOD bounds but they’re always an overestimate.
Our streaming system loads and unloads objects on a background thread, since loading can happen over several frames. The StaticScene receives sets of added and removed objects, with the adds and removes guaranteed to match up so we don’t have to deal with partial unloads.
Generally a group of added objects creates a single StaticTile, but if the tile would have a lot of QueryInstances (more than 24K) we split it up. Likewise if it would have very few (under 1K) we add these to a special “orphan” tile.
All the heavy lifting (spatial partitioning, memory allocation etc.) happens on the streaming thread, the main thread just has to make tiles active when they’re available.
We need a spatial partition to break up the data, and we have a couple of levels for this. The StaticTiles provide the first level since they’re generally coherent already, and we create a partially spatial partition within each tile to define clusters.
We generate a sort key for each QueryInstance using filter bits, maximum LOD range, and Morton number. The filter allows us to quickly discard clusters that aren't relevant to a query, and the LOD range does a similar thing for dense areas of content. Below this the Morton numbers give a degree of spatial coherence.
After sorting, we use the changes in sort keys to define clusters, again with a minimum size of 4K instances to help with load balancing. Each cluster is a potential compute job.
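The key packing and cluster splitting might look like the following sketch. The field widths and the minimum size of 4 (standing in for the 4K instances above) are illustrative assumptions, not the shipped layout:

```python
# Pack filter bits, max LOD range and Morton number into one integer sort key
# (high bits sort first), then cut clusters where the non-Morton bits change,
# respecting a minimum cluster size for load balancing.

MORTON_BITS = 30
LOD_BITS = 8

def make_sort_key(filter_bits, max_lod, morton):
    key = filter_bits
    key = (key << LOD_BITS) | (max_lod & ((1 << LOD_BITS) - 1))
    key = (key << MORTON_BITS) | (morton & ((1 << MORTON_BITS) - 1))
    return key

def split_into_clusters(sorted_keys, min_size=4):
    """Return (start, end) index ranges; each range is a potential compute job."""
    clusters, start = [], 0
    for i in range(1, len(sorted_keys)):
        same_group = (sorted_keys[i] >> MORTON_BITS) == (sorted_keys[i - 1] >> MORTON_BITS)
        if not same_group and i - start >= min_size:
            clusters.append((start, i))
            start = i
    clusters.append((start, len(sorted_keys)))
    return clusters

# Five instances with filter 1 and five with filter 2 end up in two clusters.
keys = sorted(make_sort_key(f, 0, m) for f in (1, 2) for m in range(5))
clusters = split_into_clusters(keys)
print(clusters)  # [(0, 5), (5, 10)]
```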
Morton numbers, introduced by Guy Macdonald Morton, are a way of mapping between N-dimensional coordinates and a one-dimensional value.
We start by quantising a position to some integer grid, and interleave the components bit-by-bit to produce a single number. In 3D this is like building an octree, and the increasing Morton numbers follow a Z-order curve like the one in the picture.
They’re pretty easy to compute with bit tricks, and if the Morton numbers are close together, then the positions are generally close together too so these are useful for quick and dirty spatial structuring. There are better but more expensive curves available, such as the Hilbert curve.
Having built our data we want to execute queries on it, and like the old system we create a CPU job to do this within the render job graph. This can run in parallel with the dynamic query job so the latencies don’t add up.
We use the tile and cluster hierarchy to skip as many clusters as possible, and dispatch one compute job for each visible cluster. Then the CPU job waits for compute to finish. This is nice and simple and avoids having our CPU job scheduler synchronise mixed job types.
Usually when waiting for a result like this, we’d pick up more work from the scheduler to avoid idling, but in this case we opt not to since the wait is generally short and we’re really after minimal latency.
The GPU jobs we dispatch use a single uber shader which is specialised into a handful of compile time variants. These all have reduced instruction and register count versus the generic shader, which helps the GPU compute units run more wavefronts at a time.
We only use the generic shader when query options are switched from their default settings, which is a debug only thing.
The shader is pretty long, around 1500 instructions in most cases.
Within the compute job, each thread loads a QueryInstance and tests the filter bits to see if it can skip all the work.
Otherwise it goes ahead and indirectly loads the bounds and matrices in order to perform the LOD and visibility tests. If at least the parent LOD test passes, we do both child LOD and accurate visibility tests since we need both results to update the LOD fade states.
We also check mesh streaming availability at this point – if vertex and index data is streamed out we can’t draw the object.
Once the tests are complete, we can update the LOD fade state using the results and the previous frame’s value. This only happens for the player camera queries at the moment.
If the instance is invisible, we stop, otherwise we allocate space for it in the output buffer. Since this is shared by all jobs and threads in the query, we have to synchronise access using a global counter updated atomically.
Once we have an address, each thread can write the DrawableSetup and transform to the output buffer.
I’m going to talk in more detail about our compute work now, so I’ll just go over some basic terms.
All our atomic operations are aggregated across a wavefront’s threads. This means that instead of using one atomic per thread we use one per wavefront.
This saves a lot of memory traffic, and is a drop-in replacement for standard atomics.
In this diagram, the rows of boxes show threads within a single simplified eight-thread wavefront, and the contents show what that thread is looking at during a particular stage of the process. So each column represents values in a single thread. This is obviously simplified since each thread will have several live registers at any time rather than just the one-per-row shown here.
The rounded boxes on the right show global buffers elsewhere in GPU memory, which are being operated on by this wavefront and many others at the same time.
We start with some active threads (coloured boxes) which want to write to the output buffer.
We create a bitmask of the active threads using ballot(), which is a scalar value, i.e. the same for each thread.
We also create a prefix sum over the bitmask, by adding up all the bit values to the left of the thread in question. This is different for each thread as you can see.
Lastly we sum the bits in the bitmask, which is scalar again.
We then select the first thread and have it perform the global atomic, adding the sum of bits to the global counter value.
We get back the original value and distribute it to each thread, and finally each thread adds the prefix sum and writes to that address. As you can see the writes are compact and ordered by the prefix sum.
And here’s the code for that. ballot(true) gives us a mask for active threads.
We select the first active thread by comparing the current thread’s ID to the first active thread’s value for that ID.
Then the first thread counts the bits in the mask and adds that to the global counter.
Each thread then retrieves the first active thread's result, and finally adds the prefix sum to get its own counter value.
So that covers how the system is put together,
I’m now going to talk about some brand new work which didn’t ship in Horizon.
In the Horizon renderer, after the query is complete we sort the visible DrawableSetups by their geometry and shader (as well as the usual sort criteria like depth)
At the back end of the renderer, we collect batches with consistent geometry and shader, and draw those as a single draw call. This gives us good batching because we can see all the DrawableSetups, but means we paid the CPU cost for all the individual DrawableSetups in the render pipeline.
We also only supported this for key passes like deferred geometry and shadowmap rendering, since it needed more machinery in the back end.
We wanted to remove this extra CPU and scratch memory cost by doing the batching in the query instead, at the very front of the pipeline. The batch quality is lower since the GPU only sees a cluster of objects at a time, but it’s still pretty good. Currently we’re using both systems but I’m hoping we can turn off the last minute batching in future.
To do batching well, we removed transforms from the DrawableSetups entirely so they could be shared more effectively between instances and resources. Without this we didn’t have a good enough sharing rate.
Instead of writing DrawableSetups and transforms, we write DrawableSetupBatches, which are effectively heads for batch lists of DrawableSetupInstances. The batches contain bounds for all the instances, batch lengths etc. and the instances have the transforms and some other renderer state.
These go in scratch memory and they're only written for what we can see, so again dense packing isn't super critical.
On the CPU, we extend the QueryInstance sort key to sort by DrawableSetup, and flag QueryInstances with a bit when this changes. So now we only have one bit free of our two!
On the GPU, we use these bits to group threads by DrawableSetup, do the normal visibility tests, have each active thread write a DrawableSetupInstance, and then have the first active thread in each group append a DrawableSetupBatch to the output.
Again we’re looking at a simplified wavefront here. The letters show the QueryInstance being processed by each thread, and the colours the groups of DrawableSetups.
We start by loading the group end bit from the QueryInstance, and taking a prefix sum over this to get a group index. You can see that each coloured group has been identified by a consistent index.
We then do the normal visibility testing and end up with a subset of active threads representing visible instances.
The active threads then write their instances to the first output buffer, using aggregated atomics as before.
They also collate their batch properties in LDS arrays shared by the entire wavefront. These are the circles, and here I’m showing each thread adding the value one to the group’s array slot to create the batch lengths. In the real shader we also collate the bounds and other per-batch information.
The first active thread in each group reads back the batch data and writes the batch to the output buffer. It knows how to point to the correct instance since it was the thread that wrote the first instance in the batch.
I know this is a bit confusing, here are a couple more examples that might help.
This example shows what happens if there’s one big batch. Everything gets collated to group index zero and we output one batch for the whole wavefront.
The group index is zero for each thread, since they’re all in the same group.
The instances written are the same
All threads collate to group zero’s entry so we get a count of four instances in the batch
And finally there’s one group so we only write one entry to the batch buffer, pointing again to the start of the list of four instances.
And this is when all the DrawableSetups are different. Here the group indices are the same as the thread indices. This is basically the setup before we had batching in the shader.
Here all the group end bits are set, so each thread gets a different group index.
We write the same list of instances as before.
But when we collate, each thread has its own group so we get batches of length one.
And each active thread writes out its batch, giving four batches of one instance each.
Phew.
This scheme is pretty simple, and that gives us some big advantages – we don’t need any extra machinery, storage, or passes. For example if we generated linked lists of setups in each batch, we’d need another pass to flatten those into something the renderer can consume. Here we don’t.
We do lose some of the spatial coherence since we have to sort by setup first, but that only affects the efficiency of the culling hierarchy, not the renderer.
The downside is really that the max batch size is limited to the wavefront size, but this doesn’t hurt much in practice – it’s still a 64-fold saving.
We also use a chunk of LDS, but since our shader uses a lot of registers the LDS isn’t hurting occupancy – we can still run as many wavefronts.
So that’s where we are now.
The static scene handles a lot of data – usually 500K up to 1.5 million instances which could potentially be visible to a single camera position.
The GPU data to represent this isn’t too big, but as always we’d like it to be smaller.
The query time is also pretty consistent, the shadow queries are usually faster since they have different frusta and somewhat relaxed LOD criteria.
Changes we made for batching were very valuable, removing about two thirds of the work from the render pipeline without changing query times much if at all.
I thought I’d show some pics to finish up, to give an idea of what the static content actually looks like. Here’s a chunk of random jungle, which doesn’t have a lot of static content.
It’s mostly terrain and vegetation and those are both dynamic systems with a lot of regeneration at runtime.
If we remove the static geometry, we can see the rocks and some hand-placed trees vanish.
But if we visualise the static DrawableSetups there are still an awful lot of them!
Meridian is the opposite, it’s nearly all static geometry
As you can see, there’s just terrain, a bit of vegetation, and the animated characters left.
And the static scene density is so high that you can’t really make it out!
Where are we going from here? Well, we were pretty cautious with compute since this was one of the first complicated things we did with it. So we want to move more of the system from the CPU to the GPU, and ideally keep much of the query output on the GPU so the renderer doesn't need to touch it, just feed it forward to the shaders later.
More importantly, we want to have the static scene take on more dynamic content, in particular the placed meshes, which don’t change all the time but often enough that the current tile creation wouldn’t be quick enough.
To conclude, compute is great on PS4 and well worth making use of.
I also think that simple solutions are great – do things that aren’t complicated, but do lots of them fast.
Work within the workflow to avoid slowing down or annoying your content creators, and build to scale for all the content they’re going to create. For Horizon the content grew very fast at the end of the project but by tweaking our minimum sizes we could generally cope with this quite well.
Thanks to my colleagues at Guerrilla, particularly Michiel, Jeroen and the tech team for help and support, and Roland for the awesome slide designs.