This document provides an overview of the PLAYSTATION®Edge components for animation, geometry processing, and compression on the PlayStation 3. It describes how animation is processed using blend trees and compressed data formats. Geometry processing involves skinning, triangle culling, and compressed vertex outputs on the SPUs. The offline tools pipeline generates hierarchy and compressed animation/geometry data. Runtime processing involves decompression, skinning, culling, and output on the SPUs in a pipelined manner. The goal is to offload work from the PPU and RSX for improved performance.
Your Game Needs Direct3D 11, So Get Started Now!Johan Andersson
Direct3D 11 will have tessellation for smoother curves and finer details. The new compute shader will make postprocessing faster and easier. You'll need Direct3D 11 to have the best graphics, and this talk will show you how you can get started using current generation hardware.
BitSquid Tech: Benefits of a data-driven renderertobias_persson
BitSquid Tech is a data-driven game engine developed by BitSquid that aims to provide flexibility and fast iteration. Key aspects include supporting hot-reloading of content, utilizing multi-core processors, and having multiple specialized tools instead of a single large editor. The data-driven renderer uses a JSON configuration file to define the entire rendering pipeline, including resources, layers, and resource generators for post-processing, allowing the pipeline to vary dramatically between projects or hardware while remaining high-level and hot-reloadable. This provides significant benefits for flexibility, experimentation, and scaling to different hardware.
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barré-Brisebois
Presentation from Game Developers Conference 2011 (GDC2011), presented by Colin Barre-Brisebois and Marc Bouchard. A rendering technique for real-time translucency rendering, implemented in DICE's Frostbite 2 engine, featured in Battlefield 3.
In this presentation we discuss the need to decouple game code and game data in current games. Viewing games as
data-independent, data-processing systems can empower the designer and artist to easily add new content and to
experiment and fine-tune the game mechanics potentially leading to better and more expandable games. To put the
discussion into context we present the way we have structured the data-driven system in Space Debris and the flexibility it
provided during level design.
The document discusses techniques for destruction masking in the Frostbite game engine. It describes using signed volume distance fields to define destruction mask shapes, and techniques like deferred decals and triangle culling using distance fields to efficiently render the masks with good performance. Triangle culling showed the best GPU performance on the PS3, allowing destruction masks to be rendered in a more optimized way than traditional techniques.
The document provides an overview of next-generation graphics on the Xbox 360, including details about the Xbox 360 system architecture, GPU, and graphics APIs like Direct3D. The Xbox 360 GPU was custom designed by ATI Technologies and includes features like 10MB of embedded DRAM, 48 shader ALUs, and hardware support for tessellation, textures, and multi-sampling anti-aliasing. Direct3D on Xbox 360 exposes low-level access to the GPU while supporting familiar features and adding extensions for the custom hardware.
This document provides techniques for optimizing games for the PlayStation platforms. It discusses case studies from Motorstorm: Arctic Edge including using lower polygon counts and LOD for landscapes. It also covers maintaining framerate on PS3 through dynamic resolution and reducing GPU load. The document emphasizes optimizing for the PS3's multi-core architecture including offloading tasks to SPUs like physics and using the SPUs to assist the GPU with tasks like post-processing.
Your Game Needs Direct3D 11, So Get Started Now!Johan Andersson
Direct3D 11 will have tessellation for smoother curves and finer details. The new compute shader will make postprocessing faster and easier. You'll need Direct3D 11 to have the best graphics, and this talk will show you how you can get started using current generation hardware.
BitSquid Tech: Benefits of a data-driven renderertobias_persson
BitSquid Tech is a data-driven game engine developed by BitSquid that aims to provide flexibility and fast iteration. Key aspects include supporting hot-reloading of content, utilizing multi-core processors, and having multiple specialized tools instead of a single large editor. The data-driven renderer uses a JSON configuration file to define the entire rendering pipeline, including resources, layers, and resource generators for post-processing, allowing the pipeline to vary dramatically between projects or hardware while remaining high-level and hot-reloadable. This provides significant benefits for flexibility, experimentation, and scaling to different hardware.
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barré-Brisebois
Presentation from Game Developers Conference 2011 (GDC2011), presented by Colin Barre-Brisebois and Marc Bouchard. A rendering technique for real-time translucency rendering, implemented in DICE's Frostbite 2 engine, featured in Battlefield 3.
In this presentation we discuss the need to decouple game code and game data in current games. Viewing games as
data-independent, data-processing systems can empower the designer and artist to easily add new content and to
experiment and fine-tune the game mechanics potentially leading to better and more expandable games. To put the
discussion into context we present the way we have structured the data-driven system in Space Debris and the flexibility it
provided during level design.
The document discusses techniques for destruction masking in the Frostbite game engine. It describes using signed volume distance fields to define destruction mask shapes, and techniques like deferred decals and triangle culling using distance fields to efficiently render the masks with good performance. Triangle culling showed the best GPU performance on the PS3, allowing destruction masks to be rendered in a more optimized way than traditional techniques.
The document provides an overview of next-generation graphics on the Xbox 360, including details about the Xbox 360 system architecture, GPU, and graphics APIs like Direct3D. The Xbox 360 GPU was custom designed by ATI Technologies and includes features like 10MB of embedded DRAM, 48 shader ALUs, and hardware support for tessellation, textures, and multi-sampling anti-aliasing. Direct3D on Xbox 360 exposes low-level access to the GPU while supporting familiar features and adding extensions for the custom hardware.
This document provides techniques for optimizing games for the PlayStation platforms. It discusses case studies from Motorstorm: Arctic Edge including using lower polygon counts and LOD for landscapes. It also covers maintaining framerate on PS3 through dynamic resolution and reducing GPU load. The document emphasizes optimizing for the PS3's multi-core architecture including offloading tasks to SPUs like physics and using the SPUs to assist the GPU with tasks like post-processing.
Gamebryo LightSpeed provides improved runtime performance, a modular game framework, entity modeling tools, Lua scripting and debugging, and rapid iteration capabilities. New features include deferred lighting for improved rendering, a decoration system for terrain customization, terrain streaming for unlimited map sizes, and an enhanced water editor. It offers an integrated development environment with tools in Visual Studio, 3DS Max, and the Toolbench plugin suite.
This paper presents a technique called Clustered Shading for deferred and forward rendering. In Clustered Shading, samples from the view are grouped into clusters based on their 3D position and normal, rather than just 2D position as in tiled shading. This creates a better mapping of lights to samples, reducing lighting computations. Clustered Shading also enables back-face culling of lights per cluster. The technique results in significantly fewer lighting computations than tiled shading. It allows real-time rendering of scenes with thousands more lights than previously possible.
Built for performance: the UIElements Renderer – Unite Copenhagen 2019Unity Technologies
In this technical talk, we will describe the science behind the UIElements rendering system, built from the ground up for retained-mode UI. It uses every CPU/GPU trick in the book to render thousands of different elements onscreen in a fraction of a millisecond, all on one thread. This powerful UI performance and optimization tool also supports complex features like clipping and vector graphics, even on low-end devices.
Speaker: Wessam Bahnassi – Unity
Watch the session on YouTube: https://youtu.be/zeCdVmfGUN0
Scene Graphs & Component Based Game EnginesBryan Duggan
A presentation I made at the Fermented Poly meetup in Dublin about Scene Graphs & Component Based Game Engines. Lots of examples from my own game engine BGE - where almost everything is a component. Get the code and the course notes here: https://github.com/skooter500/BGE
Talk by Graham Wihlidal (Frostbite Labs) at GDC 2017.
Checkerboard rendering is a relatively new technique, popularized recently by the introduction of the PlayStation 4 Pro. Many modern game engines are adding support for it right now, and in this talk, Graham will present an in-depth look at the new implementation in Frostbite, which is used in shipping titles like 'Battlefield 1' and 'Mass Effect Andromeda'. Despite being conceptually simple, checkerboard rendering requires a deep integration into the post-processing chain, in particular temporal anti-aliasing, dynamic resolution scaling, and poses various challenges to existing effects. This presentation will cover the basics of checkerboard rendering, explain the impact on a game engine that powers a wide range of titles, and provide a detailed look at how the current implementation in Frostbite works, including topics like object id, alpha unrolling, gradient adjust, and a highly efficient depth resolve.
This talk will present a novel technique for the rendering of surfaces covered with fallen deformable snow featured in Batman: Arkham Origins. Scalable from current generation consoles to high-end PCs, as well as next-generation consoles, the technique allows for visually convincing and organically interactive deformable snow surfaces everywhere characters can stand/walk/fight/fall, is extremely fast, has a low memory footprint, and can be used extensively in an open world game. We will explain how this technique is novel in its approach of acquiring arbitrary deformation, as well as present all the details required for implementation. Moreover, we will share the results of our collaboration with NVIDIA, and how it allowed us to bring this technique to the next level on PC using DirectX 11 tessellation.
This document provides recommendations for optimizing DirectX 11 performance. It separates the graphics pipeline process into offline and runtime stages. For the offline stage, it recommends creating resources like buffers, textures and shaders on multiple threads. For the runtime stage, it suggests culling unused objects, minimizing state changes, and pushing commands to the driver quickly. It also provides tips for updating dynamic resources efficiently and grouping related constants together. The goal is to keep the CPU and GPU pipelines fully utilized for maximum performance.
In the session from Game Developers Conference 2011, we'll take a complete look at the terrain system in Frostbite 2 as it was applied in Battlefield 3. The session is partitioned into three parts. We begin with the scalability aspects and discuss how consistent use of hierarchies allowed us to combine high resolutions with high view distances. We then turn towards workflow aspects and describe how we achieved full in-game realtime editing. A fair amount of time is spent describing how issues were addressed.
Finally, we look at the runtime side. We describe usage of CPU, GPU and memory resources and how it was kept to a minimum. We discuss how the GPU is offloaded by caching intermediate results in a procedural virtual texture and how prioritization was done to allow for work throttling without sacrificing quality. We also go into depth about the flexible streaming system that work with both FPS and driving games.
Oracle Exadata is the equivalent of a F1 car in terms of performance but are you sure your application is driving it at its full potential? A simple "lift&shift" approach to Exadata migration might lose significant opportunities for improvements. This session highlights a few example where making little changes dramatically changed the application performance
GDC 2012: Advanced Procedural Rendering in DX11smashflt
The document discusses procedural rendering techniques in DirectX 11. It introduces signed distance fields (SDFs) as a useful tool for procedural geometry creation. SDFs define the distance to surfaces and can be used to create complex shapes by combining simple primitives and applying operations like unions and differences. The document shows how SDFs can be generated from triangle meshes or particle systems and used for effects like cutting holes and repeating patterns. It also covers techniques for visualizing SDFs through raycasting and polygonization via marching cubes. The document concludes by discussing how these techniques can be applied to smoothed particle hydrodynamics simulations.
Track 2 session 6 db2 utilities update and best practices v2IBMSystemzEvents
The document provides an overview of updates and best practices for DB2 utilities. Key enhancements include improved performance for online REORG utilities, reduced impact of the SWITCH phase for partition-level REORGs, automated handling of mapping tables during REORG, and faster partition-level recovery using inline image copies. Statistics collection and LOAD/UNLOAD processing were also enhanced. Deprecated items are noted. Contacting IBM for a DB2 Utilities Workshop is recommended for learning more about optimizing utility usage.
More Performance! Five Rendering Ideas From Battlefield 3 and Need For Speed:...Colin Barré-Brisebois
This talk covers techniques from Battlefield 3 and Need for Speed: The Run. Includes chroma sub-sampling for faster full-screen effects, a novel DirectX 9+ scatter-gather approach to bokeh rendering, HiZ reverse-reload for faster shadow, improved temporally-stable dynamic ambient occlusion, and tile-based deferred shading on Xbox 360.
ne of the most sought after features in PostgreSQL is a scalable multi-master replication solution. While there does exists some tools to create multi-master clusters such as Bucardo and pgpool-II, they may not be the right fit for an application. In this session, you will learn some of the strengths and weaknesses of these more popular multi-master solutions for PostgreSQL and how they compare to using Slony for your multi-master needs. We will explore the types of deployments best suited for a Slony deployment and the steps necessary to configure a multi-master solution for PostgreSQL.
A Bizarre Way to do Real-Time LightingSteven Tovey
This document provides a 10 step guide for implementing real-time lighting on the PlayStation 3 (PS3) using its parallel architecture of 6 Synergistic Processing Units (SPUs). It discusses rendering a pre-pass to extract normals and depth, calculating lighting in a tile-based parallel manner on the SPUs, and compositing the final lighting texture. Special techniques like using atomics, striping data across SPUs, and maintaining pipeline balance are needed to optimize performance on the PS3's parallel architecture. The goal is to achieve real-time lighting for a game with 20 cars racing at night, while preserving picture quality and reducing frame latency to acceptable levels.
GridSQL is an open source distributed database built on PostgreSQL that allows it to scale horizontally across multiple servers by partitioning and distributing data and queries. It provides significantly improved performance over a single PostgreSQL instance for large datasets and queries by parallelizing processing across nodes. However, it has some limitations compared to PostgreSQL such as lack of support for advanced SQL features, slower transactions, and need for downtime to add nodes.
Graham Wihlidal from SEED attended the Munich Khronos Meetup and presented some aspects of Halcyon's rendering architecture, as well as details of the Vulkan implementation. Graham presented components like high-level render command translation, render graph, and shader compilation.
pSX is a PlayStation 1 emulator that runs under Windows and Linux. It emulates most aspects of the PlayStation and allows users to play PS1 games by loading various disc image file formats. The emulator is designed to be easy to use with minimal configuration required. It supports features like save states, memory cards, and compressed disc images to provide an authentic PlayStation experience.
Stado is an open source, shared-nothing architecture for scaling PostgreSQL across multiple servers. It partitions and distributes PostgreSQL tables and queries them in parallel. Stado provides linear scalability for read queries as more nodes are added. However, writes are slower than a single PostgreSQL instance due to additional network hops. Stado also has limitations in transaction performance, high availability and backup/restore capabilities compared to a single PostgreSQL database.
Introduction to the Graphics Pipeline of the PS3Slide_N
This document provides an overview of the graphics pipeline in the PlayStation 3 (PS3) as presented by Cedric Perthuis. It describes the key hardware components like the Cell processor with its SPEs and RSX graphics processor. It also discusses the software APIs like PSGL, a version of OpenGL ES, and extensions developed for the PS3. Examples are provided of how to utilize the unique hardware architecture, such as using SPEs to preprocess particle system data.
The document summarizes the key features and capabilities of Direct3D 10, which was designed to maximize GPU performance by reducing CPU overhead and enabling more work to be done on the GPU. Some of the main features discussed include constant buffers, geometry shaders, texture arrays, and other capabilities that reduce draw calls and state changes. Direct3D 10 also provides a standardized, consistent API and enables new visual effects by exposing more of the GPU's programmability and functionality to developers.
Gamebryo LightSpeed provides improved runtime performance, a modular game framework, entity modeling tools, Lua scripting and debugging, and rapid iteration capabilities. New features include deferred lighting for improved rendering, a decoration system for terrain customization, terrain streaming for unlimited map sizes, and an enhanced water editor. It offers an integrated development environment with tools in Visual Studio, 3DS Max, and the Toolbench plugin suite.
This paper presents a technique called Clustered Shading for deferred and forward rendering. In Clustered Shading, samples from the view are grouped into clusters based on their 3D position and normal, rather than just 2D position as in tiled shading. This creates a better mapping of lights to samples, reducing lighting computations. Clustered Shading also enables back-face culling of lights per cluster. The technique results in significantly fewer lighting computations than tiled shading. It allows real-time rendering of scenes with thousands more lights than previously possible.
Built for performance: the UIElements Renderer – Unite Copenhagen 2019Unity Technologies
In this technical talk, we will describe the science behind the UIElements rendering system, built from the ground up for retained-mode UI. It uses every CPU/GPU trick in the book to render thousands of different elements onscreen in a fraction of a millisecond, all on one thread. This powerful UI performance and optimization tool also supports complex features like clipping and vector graphics, even on low-end devices.
Speaker: Wessam Bahnassi – Unity
Watch the session on YouTube: https://youtu.be/zeCdVmfGUN0
Scene Graphs & Component Based Game EnginesBryan Duggan
A presentation I made at the Fermented Poly meetup in Dublin about Scene Graphs & Component Based Game Engines. Lots of examples from my own game engine BGE - where almost everything is a component. Get the code and the course notes here: https://github.com/skooter500/BGE
Talk by Graham Wihlidal (Frostbite Labs) at GDC 2017.
Checkerboard rendering is a relatively new technique, popularized recently by the introduction of the PlayStation 4 Pro. Many modern game engines are adding support for it right now, and in this talk, Graham will present an in-depth look at the new implementation in Frostbite, which is used in shipping titles like 'Battlefield 1' and 'Mass Effect Andromeda'. Despite being conceptually simple, checkerboard rendering requires a deep integration into the post-processing chain, in particular temporal anti-aliasing, dynamic resolution scaling, and poses various challenges to existing effects. This presentation will cover the basics of checkerboard rendering, explain the impact on a game engine that powers a wide range of titles, and provide a detailed look at how the current implementation in Frostbite works, including topics like object id, alpha unrolling, gradient adjust, and a highly efficient depth resolve.
This talk will present a novel technique for the rendering of surfaces covered with fallen deformable snow featured in Batman: Arkham Origins. Scalable from current generation consoles to high-end PCs, as well as next-generation consoles, the technique allows for visually convincing and organically interactive deformable snow surfaces everywhere characters can stand/walk/fight/fall, is extremely fast, has a low memory footprint, and can be used extensively in an open world game. We will explain how this technique is novel in its approach of acquiring arbitrary deformation, as well as present all the details required for implementation. Moreover, we will share the results of our collaboration with NVIDIA, and how it allowed us to bring this technique to the next level on PC using DirectX 11 tessellation.
This document provides recommendations for optimizing DirectX 11 performance. It separates the graphics pipeline process into offline and runtime stages. For the offline stage, it recommends creating resources like buffers, textures and shaders on multiple threads. For the runtime stage, it suggests culling unused objects, minimizing state changes, and pushing commands to the driver quickly. It also provides tips for updating dynamic resources efficiently and grouping related constants together. The goal is to keep the CPU and GPU pipelines fully utilized for maximum performance.
In the session from Game Developers Conference 2011, we'll take a complete look at the terrain system in Frostbite 2 as it was applied in Battlefield 3. The session is partitioned into three parts. We begin with the scalability aspects and discuss how consistent use of hierarchies allowed us to combine high resolutions with high view distances. We then turn towards workflow aspects and describe how we achieved full in-game realtime editing. A fair amount of time is spent describing how issues were addressed.
Finally, we look at the runtime side. We describe usage of CPU, GPU and memory resources and how it was kept to a minimum. We discuss how the GPU is offloaded by caching intermediate results in a procedural virtual texture and how prioritization was done to allow for work throttling without sacrificing quality. We also go into depth about the flexible streaming system that work with both FPS and driving games.
Oracle Exadata is the equivalent of a F1 car in terms of performance but are you sure your application is driving it at its full potential? A simple "lift&shift" approach to Exadata migration might lose significant opportunities for improvements. This session highlights a few example where making little changes dramatically changed the application performance
GDC 2012: Advanced Procedural Rendering in DX11smashflt
The document discusses procedural rendering techniques in DirectX 11. It introduces signed distance fields (SDFs) as a useful tool for procedural geometry creation. SDFs define the distance to surfaces and can be used to create complex shapes by combining simple primitives and applying operations like unions and differences. The document shows how SDFs can be generated from triangle meshes or particle systems and used for effects like cutting holes and repeating patterns. It also covers techniques for visualizing SDFs through raycasting and polygonization via marching cubes. The document concludes by discussing how these techniques can be applied to smoothed particle hydrodynamics simulations.
Track 2 session 6 db2 utilities update and best practices v2IBMSystemzEvents
The document provides an overview of updates and best practices for DB2 utilities. Key enhancements include improved performance for online REORG utilities, reduced impact of the SWITCH phase for partition-level REORGs, automated handling of mapping tables during REORG, and faster partition-level recovery using inline image copies. Statistics collection and LOAD/UNLOAD processing were also enhanced. Deprecated items are noted. Contacting IBM for a DB2 Utilities Workshop is recommended for learning more about optimizing utility usage.
More Performance! Five Rendering Ideas From Battlefield 3 and Need For Speed:...Colin Barré-Brisebois
This talk covers techniques from Battlefield 3 and Need for Speed: The Run. Includes chroma sub-sampling for faster full-screen effects, a novel DirectX 9+ scatter-gather approach to bokeh rendering, HiZ reverse-reload for faster shadow, improved temporally-stable dynamic ambient occlusion, and tile-based deferred shading on Xbox 360.
ne of the most sought after features in PostgreSQL is a scalable multi-master replication solution. While there does exists some tools to create multi-master clusters such as Bucardo and pgpool-II, they may not be the right fit for an application. In this session, you will learn some of the strengths and weaknesses of these more popular multi-master solutions for PostgreSQL and how they compare to using Slony for your multi-master needs. We will explore the types of deployments best suited for a Slony deployment and the steps necessary to configure a multi-master solution for PostgreSQL.
A Bizarre Way to do Real-Time LightingSteven Tovey
This document provides a 10 step guide for implementing real-time lighting on the PlayStation 3 (PS3) using its parallel architecture of 6 Synergistic Processing Units (SPUs). It discusses rendering a pre-pass to extract normals and depth, calculating lighting in a tile-based parallel manner on the SPUs, and compositing the final lighting texture. Special techniques like using atomics, striping data across SPUs, and maintaining pipeline balance are needed to optimize performance on the PS3's parallel architecture. The goal is to achieve real-time lighting for a game with 20 cars racing at night, while preserving picture quality and reducing frame latency to acceptable levels.
GridSQL is an open source distributed database built on PostgreSQL that allows it to scale horizontally across multiple servers by partitioning and distributing data and queries. It provides significantly improved performance over a single PostgreSQL instance for large datasets and queries by parallelizing processing across nodes. However, it has some limitations compared to PostgreSQL such as lack of support for advanced SQL features, slower transactions, and need for downtime to add nodes.
Graham Wihlidal from SEED attended the Munich Khronos Meetup and presented some aspects of Halcyon's rendering architecture, as well as details of the Vulkan implementation. Graham presented components like high-level render command translation, render graph, and shader compilation.
pSX is a PlayStation 1 emulator that runs under Windows and Linux. It emulates most aspects of the PlayStation and allows users to play PS1 games by loading various disc image file formats. The emulator is designed to be easy to use with minimal configuration required. It supports features like save states, memory cards, and compressed disc images to provide an authentic PlayStation experience.
Stado is an open source, shared-nothing architecture for scaling PostgreSQL across multiple servers. It partitions and distributes PostgreSQL tables and queries them in parallel. Stado provides linear scalability for read queries as more nodes are added. However, writes are slower than a single PostgreSQL instance due to additional network hops. Stado also has limitations in transaction performance, high availability and backup/restore capabilities compared to a single PostgreSQL database.
Introduction to the Graphics Pipeline of the PS3Slide_N
This document provides an overview of the graphics pipeline in the PlayStation 3 (PS3) as presented by Cedric Perthuis. It describes the key hardware components like the Cell processor with its SPEs and RSX graphics processor. It also discusses the software APIs like PSGL, a version of OpenGL ES, and extensions developed for the PS3. Examples are provided of how to utilize the unique hardware architecture, such as using SPEs to preprocess particle system data.
The document summarizes the key features and capabilities of Direct3D 10, which was designed to maximize GPU performance by reducing CPU overhead and enabling more work to be done on the GPU. Some of the main features discussed include constant buffers, geometry shaders, texture arrays, and other capabilities that reduce draw calls and state changes. Direct3D 10 also provides a standardized, consistent API and enables new visual effects by exposing more of the GPU's programmability and functionality to developers.
Frostbite Rendering Architecture and Real-time Procedural Shading & Texturing...Johan Andersson
The document discusses procedural shading and texturing techniques used in game engines. It describes the Frostbite game engine which uses procedural generation to create terrain, foliage and other game assets in real-time. Surface shaders are created as graphs and allow procedural definition of material properties. The world renderer handles multi-threaded rendering of game worlds using procedural techniques.
Talk by Yuriy O’Donnell at GDC 2017.
This talk describes how Frostbite handles rendering architecture challenges that come with having to support a wide variety of games on a single engine. Yuriy describes their new rendering abstraction design, which is based on a graph of all render passes and resources. This approach allows implementation of rendering features in a decoupled and modular way, while still maintaining efficiency.
A graph of all rendering operations for the entire frame is a useful abstraction. The industry can move away from “immediate mode” DX11 style APIs to a higher level system that allows simpler code and efficient GPU utilization. Attendees will learn how it worked out for Frostbite.
Parallel Graphics in Frostbite - Current & Future (Siggraph 2009)Johan Andersson
1. The document discusses parallel graphics techniques used in the Frostbite game engine, both currently and potentially in the future. It describes using job-based parallelism to utilize multiple CPU cores and the PS3 SPUs.
2. One technique is parallel command buffer recording to dispatch draw calls to multiple command buffers and scale linearly with core count. Another is software occlusion culling using the SPUs/CPU to rasterize a coarse z-buffer.
3. Potential future techniques discussed include deferred shading using compute shaders, with the compute shader culling lights and accumulating lighting per screen-space tile.
The document discusses best practices for using the RSX graphics chip with the SPUs on the PlayStation 3. It describes how the SPUs can be used to offload geometry processing tasks like skinning, blend shapes, level of detail generation, shadow volume generation, and triangle culling to improve performance. This allows the RSX to focus on rendering the processed geometry from the SPUs. Various compression techniques are also discussed to reduce data transferred to the SPUs and RSX.
Presented as a pre-conference tutorial at the GPU Technology Conference in San Jose on September 20, 2010.
Learn about NVIDIA's OpenGL 4.1 functionality available now on Fermi-based GPUs.
Unite 2013 optimizing unity games for mobile platformsナム-Nam Nguyễn
The document discusses optimizing Unity games for mobile platforms. It begins with an introduction to the author and ARM. It then covers identifying performance bottlenecks like CPU, vertex processing, fragment processing, and bandwidth usage. Tools for profiling like the Unity Profiler and ARM DS-5 Streamline Performance Analyzer are presented. An example application is overviewed to demonstrate optimization techniques for geometry, level of detail, particles, textures, overdraw, lights, and texture atlases. The document provides guidance on using tools and techniques to improve performance and optimize games for mobile hardware.
The document discusses different types of graphics display systems including raster scan displays, random scan displays, and flat panel displays. It describes the key components of cathode ray tube (CRT) displays such as the electron gun and phosphor screen and how they generate images. It also covers color reproduction methods for CRTs like beam penetration and three color guns.
The document discusses graphics processing units (GPUs). It begins with an introduction and definition of GPUs as processors designed specifically for processing 3D graphics. It then covers the components of a GPU and compares GPU and CPU architectures. Specifically, it notes that GPUs have many parallel execution units while CPUs have few, and that GPUs have significantly faster memory interfaces than CPUs. The document concludes by noting that GPU development is ongoing and faster GPUs can be expected in the future.
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
The document discusses current and future uses of graphics processing units (GPUs) in game engines. It covers topics like shader programming, parallel rendering, texture techniques, raytracing, and general purpose GPU (GPGPU) computing. The author envisions future improvements like more robust shader subroutines, enhanced texture sampling capabilities, hardware-accelerated sparse textures, and limited case raytracing integrated into game engines.
The document compares the hardware architectures and graphics processing capabilities of the Xbox 360 and PlayStation 3 game consoles. Both consoles use procedural synthesis to optimize rendering of 3D scenes from stored data. The Xbox 360 uses a GPU designed specifically for gaming by Microsoft and ATI, while the PS3 uses a GPU not primarily designed for gaming. As a result, the Xbox 360's graphics rendering is generally more optimized for games. However, both consoles have trade-offs, so the best choice depends on intended use and price.
09 basics operating and monitoring v1.00_enconfidencial
The document discusses the basics of operating and monitoring a PCS 7 system. It describes the general functions of the operator station (OS) and how it can be configured as a single station or multiple station system. It also covers plant hierarchy settings, the OS-AS connection, compiling projects, layouts, block icons and faceplates. The key points are:
- The OS is based on WinCC and used for process visualization, alarm logging, tag logging, and more.
- A system can be a single OS or multiple OSs connected to one or more automation stations. Redundant servers provide high availability.
- Plant hierarchy settings determine how data is structured in pictures and tag names on the
A technical deep dive into the DX11 rendering in Battlefield 3, the first title to use the new Frostbite 2 Engine. Topics covered include DX11 optimization techniques, efficient deferred shading, high-quality rendering and resource streaming for creating large and highly-detailed dynamic environments on modern PCs.
The document provides an overview of graphics programming on the Xbox 360, including details about the system and GPU architecture, graphics APIs like Direct3D, shader development, and tools for graphics debugging and optimization like PIX. Key points include that the Xbox 360 GPU is designed by ATI and includes 10MB of EDRAM, supports shader model 3.0, and has dedicated hardware for features like tessellation, procedural geometry, and anti-aliasing. Direct3D is optimized for the Xbox 360 hardware and exposes new features. PIX is a powerful tool for performance analysis and debugging graphics applications on the Xbox 360.
The document discusses several advanced use cases for profiling applications using the XDS560 Trace tool and Advanced Event Triggering (AET) logic on Texas Instruments processors, including interrupt profiling to analyze interrupt servicing times, statistical profiling to identify functions consuming the most cycles, thread-aware profiling to generate a cycle-accurate execution graph of thread-based applications, and generating a thread-based dynamic call graph from captured trace data.
This document provides tips and tricks for achieving the best performance with Intel graphics. It discusses optimizing various aspects of an application including the platform, sampler, fillrate, arithmetic logic, and geometry. It emphasizes starting optimizations at the application level and being conscious of memory access patterns and cache locality. It also discusses scaling a game across different platforms by establishing performance ceilings and optimizing for varying memory bandwidth, sampler throughput, fillrate, and geometry throughput. The overall goal is to create a game that runs well on a wide range of systems.
Allegorithmic developed Substance, a middleware for procedurally generating textures on CPUs to reduce texture memory and streaming bottlenecks. Substance uses a node-based graph to procedurally generate textures. It is designed to take advantage of multi-core CPUs through techniques like task parallelism, data parallelism, and lockless synchronization to efficiently generate textures across CPU cores in parallel. Testing showed Substance could utilize 4 CPU cores to generate textures 3.8 times faster than a single core, helping to maintain high framerates during texture streaming.
[Unite Seoul 2019] Mali GPU Architecture and Mobile Studio Owen Wu
The document discusses Mali GPU architecture and Arm Mobile Studio. It provides details on Mali GPU components like Bifrost shader cores and tile-based rendering. It also describes features such as index-driven vertex shading, forward pixel kill, and efficient render passes. The document concludes with an overview of the Arm Mobile Studio tools for profiling GPU and CPU performance on mobile devices.
New Millennium for Computer Entertainment - KutaragiSlide_N
This document discusses the next generation of computer entertainment and Sony's vision for the future. It summarizes Sony's development of new technologies including the Emotion Engine processor and Graphics Synthesizer that will power the next PlayStation console. These new components provide significantly more processing and graphics capabilities compared to existing consoles and PCs. Sony aims to advance from sound and graphics synthesis to emotion synthesis by using these technologies to generate realistic animations and simulate human emotions in games.
Ken Kutaragi was the Executive Deputy President and COO in charge of Home, Broadband and Semiconductor Solutions Network Companies, and Game Business Group at Sony. He saw digital consumer electronics like digital flat TVs, home servers, and digital cameras as the new driving force for next generation technologies. He believed future homes would be powered by technologies like artificial intelligence, broadband networks, optical/wireless connectivity, and the PlayStation portable game console. Semiconductors would be the "heart" powering various digital devices, and entertainment was viewed as the "key" application that would drive new digital content and computing platforms like Sony's CELL processor.
The document outlines Nobuyuki Idei's transformation plan for Sony to improve profitability through structural reform. The plan involves two phases from FY2003-FY2006: 1) reducing fixed costs by 330 billion yen through streamlining operations and headcount reductions, and 2) implementing "convergence strategies" across businesses to enhance core businesses and create new areas of growth. The goal is to increase the group operating profit margin to over 10% by FY2006.
Moving Innovative Game Technology from the Lab to the Living RoomSlide_N
Richard Marks discusses moving innovative game technology from research labs into consumer living rooms. He provides examples of how Sony has developed new input and sensing technologies like the EyeToy webcam and PlayStation Move motion controller through research and then incorporated them into popular gaming products. Marks explains the process from initial research concepts and prototypes to mass production and commercial launches. He also looks at future trends in areas like immersive displays, life gaming, and haptic feedback.
Cell Technology for Graphics and VisualizationSlide_N
The document discusses Cell technology for graphics and visualization. It provides an overview of the Cell architecture including its Power Processor Element (PPE) and Synergistic Processor Elements (SPEs). The PPE handles operating system tasks while the SPEs provide computational performance. The document outlines programming models for the Cell including function offload, application specific accelerators, computational acceleration, streaming, and a shared memory multiprocessor model. It also discusses heterogeneous threading and a single source compiler approach.
This document summarizes an IBM presentation on industry trends in microprocessor design. It discusses how single-thread performance growth has slowed due to power limitations, leading chipmakers to adopt multi-core designs. It then outlines IBM's Cell/B.E. microprocessor and roadmap, including its heterogeneous multi-core architecture combining general-purpose and specialized processing elements. Finally, it notes both AMD and Intel are moving toward heterogeneous designs that integrate CPU and GPU capabilities to better handle high-performance computing workloads.
Translating GPU Binaries to Tiered SIMD Architectures with OcelotSlide_N
The document discusses Ocelot, a binary translation framework that allows architectures other than NVIDIA GPUs to execute programs written in PTX, an intermediate representation used by NVIDIA GPUs. It describes how Ocelot maps the PTX thread hierarchy to different architectures, uses translation techniques to hide memory latency, and emulates GPU data structures. It also provides details on the implementation of the translator and a case study of translating a PTX program to IBM Cell Processor assembly code.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Webinar: Designing a schema for a Data WarehouseFederico Razzoli
Are you new to data warehouses (DWH)? Do you need to check whether your data warehouse follows the best practices for a good design? In both cases, this webinar is for you.
A data warehouse is a central relational database that contains all measurements about a business or an organisation. This data comes from a variety of heterogeneous data sources, which includes databases of any type that back the applications used by the company, data files exported by some applications, or APIs provided by internal or external services.
But designing a data warehouse correctly is a hard task, which requires gathering information about the business processes that need to be analysed in the first place. These processes must be translated into so-called star schemas, which means, denormalised databases where each table represents a dimension or facts.
We will discuss these topics:
- How to gather information about a business;
- Understanding dictionaries and how to identify business entities;
- Dimensions and facts;
- Setting a table granularity;
- Types of facts;
- Types of dimensions;
- Snowflakes and how to avoid them;
- Expanding existing dimensions and facts.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Project Management Semester Long Project - Acuityjpupo2018
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
9. PLAYSTATION®Edge
Component Overview
Geometry Processing
Skinning on SPUs
Offload the RSX
Triangle Culling on SPUs
Remove unnecessary RSX processing
Blend Shapes on SPUs
Offload the PPU
Compressed Data formats
SPUs can use better data compression than the RSX
10. PLAYSTATION®Edge
Component Overview
Compression
Fast zlib decompression implemented for the
SPUs
Increases effective bandwidth
from BD-ROM
Useful for high speed streaming
40MB/sec with ~25% of an SPU
12. Full source code available
PLAYSTATION®Edge
Component Overview
SPU code
Runs as SPURS jobs
C with Intrinsics
PPU and tools code written in C with some C++
libGCM used as RSX interface
33. Two modes of usage
Primary mode
Use PLAYSTATION®Edge offline tools
Partition into vertex sets
Use indexed triangles
All features of pipeline can be used
SPU
34. Two modes of usage (cont)
Secondary mode
Data generated by other tools
Formats other than indexed triangles
Non-partitioned objects
Subset of pipeline features can be used
SPU
37. Vertex attributes can be input into the
SPUs in multiple arrays
Unique Vertex
Array 0
Instance Vertex
Array 1
38. Vertex information is decompressed
into tables of floats
Float TablesUnique Vertex
Array 0
Instance Vertex
Array 1
39. 24bit Unit Vector
Smallest 2 compression
Two smallest components with 10 bits each
Encoded from –sqrt(2)/2 to +sqrt(2)/2
Largest component reconstructed via
Largest = sqrt(1 – smallestA2 – smallestB2)
40. 24bit Unit Vector
Smallest 2 compression
Two smallest components with 10 bits each
Encoded from –sqrt(2)/2 to +sqrt(2)/2
Largest component reconstructed via
Largest = sqrt(1 – smallestA2 – smallestB2)
One additional bit to represent W as +1 or -1
For constructing bi-normal from normal and
tangent.
41. N-bit Fixed Point
with integer offsets
Simple n.x fixed point values
Per-segment integer offset
Bit count may vary from attribute to
attribute
43. Index Table Construction
Index table is created by a vertex cache
optimizer
RSX Best Practices
Thursday 2:30 pm – 3:30 pm
Room 3001, West Hall
Supplied in PlayStation 3 SDK
First party research
Importance of mini-cache
72. “Just in Time” Single Buffer Strategy
SPUs generate data in same frame as
RSX consumes it
System tuned so that the RSX rarely
waits on SPUs
SPU RSX synchronization in place
to handle rare cases
73. Geometry System
Rendering Sequence
On the PPU
Create a SPURS job
Place most RSX commands in the command
buffer
Leave space in the RSX command buffer for the
SPU to fill in later
On the SPU
Process geometry
Write final commands to RSX command buffer
74. Synchronization Techniques
RSX SPU synchronization
by manipulation of put pointer
RSX Best Practices
Thursday 2:30 pm – 3:30 pm
Room 3001, West Hall
RSX SPU synchronization
through “local stalls”
87. GCM Replay
GCM Replay is a new tool for RSX
Analysis
Debugging
Profiling
88. GCM Replay - Overview
GCM Replay consists of two parts
Small PS3 runtime library
Main Application - runs on a Windows PC + PS3 Dev-Tool
Host PC
GCM Replay
Game + libgcmReplay.a
PS3 Dev-ToolGCM Replay
PS3 Dev-Tool
89. GCM Replay - Overview
Uses RSX rather than simulation
Supports highly detailed analysis
Far greater than a typical real-time profiler would allow
Supporting scene-wide analysis
To the analysis of individual draw calls, vertices and pixels
Focus on off-line performance analysis features
Many of which have never been available before
91. GCM Replay - Overview
Run your game with the GCM Replay runtime linked in
Once you reach a point of interest…
Hit Capture
92. Capturing the Command Buffer
GCM Replay will traverse the Command Buffer
Transfer Command Buffer Memory to the PC
Each command is analysed
Command Buffer
Host PC
GCM Replay
Command Buffer
Memory
93. Command Buffer
Host PC
GCM Replay
Render Targets
Textures
Graphics Resource
Memory
Capturing the Command Buffer
Geometry
Vertex & Fragment
Programs
94. Capturing the Command Buffer
Once the process is complete - GCM Replay
has all the data it needs to
REPLAY your Command Buffer
95. Capturing the Command Buffer
Command Buffer
Ring Buffer
Character 0
Character 1
Character 2
Character 3
Character 4
Character 5
Character 6
Character 7
Character 8
Character 9
Character 10
Character 11
Character 12
Character 13
Capturing the Command Buffer can be very complex
96. Capturing the Command Buffer
All RSX usage models can be captured with GCM Replay
Command Buffer
Ring Buffer
97. GCM Replay Integration
Only takes a few minutes
At initialisation
// Initialise the capture API
cellGcmReplay::Network::Init();
cellGcmReplay::Capture::Init();
98. GCM Replay Integration
Then every frame…
// Call a single heartbeat function
cellGcmReplay::Heartbeat(&yourContext);
Add optional annotations
// Useful for adding semantics
cellGcmReplay::InsertDebugString(“Bloom Pass”);
109. Render Targets View
Puts internal analysis results at your finger tips
So you can instantly answer…
110. Render Targets View
What Draw Contexts write to this Render Target?
Is this Render Target aliased as a Texture?
Is this Render Target setup for
Double-Speed rendering?
Early-Z optimisation?
112. Render Target Refresh
1. Transfer Command Buffer and Resources
2. Kick Command Buffer up to the current Draw Context
Allows you to single step your rendering process
Both forwards and backwards in time
130. Draw Data View
See vertices kicked by current Draw Context
Each element of each referenced attribute
131. Vertex Programs View
Full disassembly
Stalls highlighted in Red
Optionally show
Instruction latencies
Dual issue
132. Vertex Program Constants View
See Vertex Program Constants
Used by current Draw Context
Colour-coded by analysis passes
Blue - newly modified
Green - inherited from previous
Red – redundant sets
138. GPU Registers View
Brief Mode - registers set
in current Draw Context
Descriptive Mode -
entire register state
Verify RSX state is what
you expect
139. Preview Resource Views
Exist for all Resource types
Analyse
Draw Context
Entire Scene
All Preview Views have unique features
Share common functionality - including
Cross-referencing
Search
Memory Dump
Export
140. Memory Layout View
See your Memory map
Resource locations in
Local Memory
Host Memory
143. GCM Replay Profiling
Supports a number of flavours
RSX executes your Command Buffer many times
Use of RSX hardware counters and timing facilities
Ensures timings and event counts are accurate
145. Sub-Unit Utilisation
On a Draw Context basis
Each utilisation is drawn overlapped
(not stacked)
Result - a sorted list of optimisation
targets
Clear indication of the bottleneck
Smallest
utilisation
Biggest
utilisation -
Bottleneck!
153. RSX
Deep and complex pipeline
Large array of rendering options
Difficult to predict
Effects of your engine changes?
What optimisations matter most?
154. What-If… I change the anisotropic
filtering level on my race track?
How much time would that buy back?
What would it look like?
What’s the best compromise?
155. What-If… I re-optimise my meshes?
For example
Convert triangle strips to triangle lists
Interleave all attribute streams
Optimise index tables for all vertex caches
How would these effect RSX performance?
156. What-If… If I write a near perfect
visibility culler?
That culls on a per triangle basis
Removes all triangles outside the viewport
All back-facing triangles
Zero-area degenerate triangles
Micro-triangles that miss all pixel centres
157. And…
It runs on the SPUs and removes all triangles
before they hit the RSX
How much time would that buy back?
158. With GCM Replay you can answer all these
questions…
Without touching a single line of code
159. GCM Replay What-Ifs
A new form of Conditional Profiling
Make fundamental changes to your
Command Buffer
Resources
All from within GCM Replay
Measure observed performance difference
See % Gain or Loss
160. What-if… Example
Through analysis I realise…
Not all my textures are
compressed
Not all my textures have a
mip-chain
My Scene
?
161. What-If… Workflow
Apply to
A Draw Context OR all Draw Contexts
A Resource OR all Resources of that type
Define scope
164. What-Ifs – Behind the Scenes
Profile baseline
For each What-If condition
Make modifications
Profile
Save results
In this example
Generate mip-chains
Compress Textures
Modify Texture state in Command Buffer
165. What-If… Results
Summarise
% gain for each What-If
Comments on actual modifications made
Total % gain for all What-Ifs
Instantly see the change in performance
166. What-If… Workflow
Two options
Profile – with new What-Ifs, same baseline
Commit – make new baseline
Iterate Process
Performance target reached
We’re as close as we can get
Incorporate the optimisations into your game
168. What-If… Optimise FP Constant Patching
What if all constants are set externally?
Directly patched by PPU or SPUs
169. Moroccan Scene Results
Applying all four What-Ifs…
Complete Texture Mip-chains +11.39%
Compress Textures +0.82%
Remove Global Redundancy +4.09%
Optimise Fragment Constant Patching +4.73%
Total Performance Gain
~21%
faster than original scene
171. What-If… Trim Triangles
What if we trim all triangles
Off-screen
Back facing
Degenerate
Don’t hit any pixel centres
Set scope to all Draw Contexts
Enable all triangle tests
Hit Profile
173. GCM Replay What-Ifs
Evaluate fundamental engine changes
Without actually having to make them
Provide rapid feedback
See What-If… results within minutes
Help you make informed decisions
What optimisations matter most?
How close are we to theoretical maximums?
Help avoid wasting time on fruitless changes
Save precious development time
174. The What-Ifs
Global
Remove Redundancy
For each Draw Context
Optimise all Triangle Lists
Convert Strips to Lists
Change Stream Interleaving
Trim Triangles
Trim Batches
Depth-only Pass
Disable unused Attributes
Disable unused Interpolators
Sort Batches Front to Back
Replace with Single Colour FP
Non-disclosed x3
Perfect Early-Z Settings
Convert to Indexed Drawing
Disable unused Clear Components
Remove redundant Clears
Remove completely filled Clears
Remove non-varying Attributes
For each Render Target
Remove redundancy
Non-disclosed
For each Texture
Switch Memory Context
Remove redundancy
Convert to DXT1
Convert to DXT5
Complete mip-chain
Override filtering modes
Override LOD bias
For each Vertex Program
Remove redundancy
Non-disclosed
For each Fragment Program
Optimise Constant Patching
Remove redundancy
Non-disclosed x2
175. GCM Replay Experiments
Extend the What-If concept
Experiments – automatically replay selected
What-Ifs with many times
Finds the optimal settings for your game
Example – Texture Placement Experiment
176. GCM Replay – The Future
More What-Ifs
More Experiments
Extend Edit-and-Continue
Modify all Resource types
Hot-load replacement Resources
Vertex and Fragment Program Debugging