Johan Andersson will show how the Frostbite 3 game engine is using the low-level graphics API Mantle to deliver significantly improved performance in Battlefield 4 on PC and future games from Electronic Arts in this presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Also view this and other presentations on our developer website at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Talk by Yuriy O’Donnell at GDC 2017.
This talk describes how Frostbite handles rendering architecture challenges that come with having to support a wide variety of games on a single engine. Yuriy describes their new rendering abstraction design, which is based on a graph of all render passes and resources. This approach allows implementation of rendering features in a decoupled and modular way, while still maintaining efficiency.
A graph of all rendering operations for the entire frame is a useful abstraction. The industry can move away from “immediate mode” DX11 style APIs to a higher level system that allows simpler code and efficient GPU utilization. Attendees will learn how it worked out for Frostbite.
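To make the idea of a frame-wide graph concrete, here is a minimal sketch, not Frostbite's actual design. It assumes each pass declares the resources it reads and writes, and that passes are declared in execution order. The names `RenderPass` and `compile_frame` are hypothetical. Culling passes that do not contribute to the final output is an illustrative choice that the abstract does not confirm.

```python
from dataclasses import dataclass, field


@dataclass
class RenderPass:
    """Hypothetical pass declaration: a name plus the resources it touches."""
    name: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def compile_frame(passes, final_outputs):
    """Return the names of passes that contribute to final_outputs.

    Assumes `passes` is already in execution order (producers before consumers).
    """
    needed = set(final_outputs)
    kept = []
    # Walk backwards: keep a pass only if it writes a resource that is still needed,
    # then mark everything it reads as needed too.
    for render_pass in reversed(passes):
        if render_pass.writes & needed:
            kept.append(render_pass.name)
            needed |= render_pass.reads
    kept.reverse()
    return kept


passes = [
    RenderPass("gbuffer", writes={"depth", "normals", "albedo"}),
    RenderPass("debug_overlay", reads={"depth"}, writes={"debug_view"}),
    RenderPass("lighting", reads={"depth", "normals", "albedo"}, writes={"hdr"}),
    RenderPass("tonemap", reads={"hdr"}, writes={"backbuffer"}),
]
frame_order = compile_frame(passes, {"backbuffer"})
print(frame_order)
```

Because every pass states its inputs and outputs up front, a feature such as the debug overlay can be added without the other passes knowing about it, and it costs nothing when nothing consumes its output.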
Graham Wihlidal from SEED attended the Munich Khronos Meetup and presented some aspects of Halcyon's rendering architecture, as well as details of the Vulkan implementation. Graham presented components like high-level render command translation, render graph, and shader compilation.
A Certain Slant of Light - Past, Present and Future Challenges of Global Illu... | Electronic Arts / DICE
Global illumination (GI) has been an ongoing quest in games. The perpetual tug-of-war between visual quality and performance often forces developers to take the latest and greatest from academia and tailor it to push the boundaries of what has been realized in a game product. Many elements need to align for success, including image quality, performance, scalability, interactivity, ease of use, as well as game-specific and production challenges.
First we will paint a picture of the current state of global illumination in games, addressing how the state of the union compares to the latest and greatest research. We will then explore various GI challenges that game teams face from the art, engineering, pipelines and production perspective. The games industry lacks an ideal solution, so the goal here is to raise awareness by being transparent about the real problems in the field. Finally, we will talk about the future. This will be a call to arms, with the objective of uniting game developers and researchers on the same quest to evolve global illumination in games from being mostly static, or sometimes perceptually real-time, to fully real-time.
This presentation was given at SIGGRAPH 2017 by Colin Barré-Brisebois (EA SEED) as part of the Open Problems in Real-Time Rendering course.
Bindless Deferred Decals in The Surge 2 | Philip Hammer
These are the slides for my talk at Digital Dragons 2019 in Krakow.
Update: The recordings are now online on YouTube:
https://www.youtube.com/watch?v=e2wPMqWETj8
The presentation describes the physically based lighting pipeline of Killzone: Shadow Fall, a PlayStation 4 launch title. The talk covers the studio's transition to a new asset creation pipeline based on physical properties. It also describes the light rendering systems used in a new 3D engine built from the ground up for the upcoming PlayStation 4 hardware. A novel real-time lighting model that simulates physically accurate area lights will be introduced, as well as a hybrid ray-traced / image-based reflection system.
We believe that physically based rendering is a viable way to optimize asset creation pipeline efficiency and quality. It also enables the rendering quality to reach a new level that is highly flexible depending on art direction requirements.
Screen Space Decals in Warhammer 40,000: Space Marine | Pope Kim
My SIGGRAPH 2012 presentation slides on Screen Space Decals in Warhammer 40,000: Space Marine.
SSD is similar to Deferred Decals, so I focused more on the problems we had and how we solved (or avoided) them.
Physically Based Lighting in Unreal Engine 4 | Lukas Lang
Talk held at Unreal Meetup Munich on 15th May 2019.
I talked about some of the theoretical background of physically based lighting and demonstrated a workflow, including the value tables needed to use that workflow easily.
Course presentation at SIGGRAPH 2014 by Charles de Rousiers and Sébastien Lagarde at Electronic Arts about transitioning the Frostbite game engine to physically-based rendering.
Make sure to check out the 118 page course notes on: http://www.frostbite.com/2014/11/moving-frostbite-to-pbr/
During the last few months, we have revisited the concept of image quality in Frostbite. The core of our approach was to be as close as possible to a cinematic look. We used the concept of reference to evaluate the accuracy of produced images. Physically based rendering (PBR) was the natural way to achieve this. This talk covers all the different steps needed to switch a production engine to PBR, including the small details often bypassed in the literature.
The state of the art of real-time PBR techniques allowed us to achieve good overall results but not without production issues. We present some techniques for improving convolution time for image based reflection, proper ambient occlusion handling, and coherent lighting units which are mandatory for level editing.
Moreover, we have managed to reduce the quality gap, highlighted by our systematic reference comparison, in particular related to rough material handling, glossy screen space reflection, and area lighting.
The technical part of PBR is crucial for achieving good results, but represents only the tip of the iceberg. Frostbite has become the de facto high-end game engine within Electronic Arts and is now used by a large number of game teams. Moving all these game teams from “old-fashioned” lighting to PBR has required a lot of education, which has been done in parallel with the technical development. We have provided editing and validation tools to help the transition of art production. In addition, we have built a flexible material parametrisation framework to adapt to the various authoring tools and game teams’ requirements.
Talk by Graham Wihlidal (Frostbite Labs) at GDC 2017.
Checkerboard rendering is a relatively new technique, popularized recently by the introduction of the PlayStation 4 Pro. Many modern game engines are adding support for it right now, and in this talk, Graham will present an in-depth look at the new implementation in Frostbite, which is used in shipping titles like 'Battlefield 1' and 'Mass Effect Andromeda'. Despite being conceptually simple, checkerboard rendering requires deep integration into the post-processing chain, in particular with temporal anti-aliasing and dynamic resolution scaling, and it poses various challenges to existing effects. This presentation will cover the basics of checkerboard rendering, explain the impact on a game engine that powers a wide range of titles, and provide a detailed look at how the current implementation in Frostbite works, including topics like object id, alpha unrolling, gradient adjust, and a highly efficient depth resolve.
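The basic idea can be illustrated with a deliberately simplified sketch. It assumes a per-pixel alternating pattern and a static image, so the pixels not shaded this frame are copied straight from the previous one. The function names are hypothetical, and none of the Frostbite specifics listed above (object id, gradient adjust, depth resolve) are modelled here.

```python
def checkerboard_mask(width, height, frame_index):
    """True where a pixel is shaded this frame; the pattern flips every frame."""
    return [[(x + y + frame_index) % 2 == 0 for x in range(width)]
            for y in range(height)]


def resolve(previous_frame, shaded, mask):
    """Naive resolve: take freshly shaded pixels, reuse the rest from last frame.

    Assumption: no motion, so no reprojection or neighbour interpolation is needed.
    """
    return [[shaded[y][x] if mask[y][x] else previous_frame[y][x]
             for x in range(len(mask[0]))]
            for y in range(len(mask))]


width, height = 4, 4
mask_even = checkerboard_mask(width, height, 0)
mask_odd = checkerboard_mask(width, height, 1)

# Two consecutive frames together touch every pixel exactly once.
covered_once = all(mask_even[y][x] != mask_odd[y][x]
                   for y in range(height) for x in range(width))

# Shade half the pixels with value 1 on top of an all-zero previous frame.
previous = [[0] * width for _ in range(height)]
shaded = [[1] * width for _ in range(height)]
frame = resolve(previous, shaded, mask_even)
shaded_count = sum(sum(row) for row in frame)
print(covered_once, shaded_count)
```

Each frame pays for only half of the shading work, which is why the resolve step and its interaction with temporal anti-aliasing carry most of the complexity in a real implementation.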
Rendering Technologies from Crysis 3 (GDC 2013) | Tiago Sousa
This talk covers changes in CryENGINE 3 technology during 2012, with DX11 related topics such as moving to deferred rendering while maintaining backward compatibility on a multiplatform engine, massive vegetation rendering, MSAA support and how to deal with its common visual artifacts, among other topics.
Bill explains some of the ways that the Vertex Shader can be used to improve performance by taking a fast path through the Vertex Shader rather than generating vertices with other parts of the pipeline in this AMD technology presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Check out more technical presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Taking Killzone Shadow Fall Image Quality Into The Next Generation | Guerrilla
This talk focuses on the technical side of Killzone Shadow Fall, the platform exclusive launch title for PlayStation 4.
We present the details of several new techniques that were developed in the quest for next generation image quality, and the talk uses key locations from the game as examples. We discuss interesting aspects of the new content pipeline, next-gen lighting engine, usage of indirect lighting and various shadow rendering optimizations. We also describe the details of volumetric lighting, the real-time reflections system, and the new anti-aliasing solution, and include some details about the image-quality driven streaming system. A common, very important, theme of the talk is the temporal coherency and how it was utilized to reduce aliasing, and improve the rendering quality and image stability above the baseline 1080p resolution seen in other games.
Optimizing the Graphics Pipeline with Compute, GDC 2016 | Graham Wihlidal
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation is targeting seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
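As a rough illustration of what a per-primitive filter might do, the sketch below drops back-facing and zero-area triangles from an index buffer before they would reach the rasterizer. This is CPU-side Python for readability, not a GPU compute kernel. The choice of filters, the counter-clockwise front-face convention, and the function names are all assumptions rather than details taken from the talk.

```python
def signed_area(a, b, c):
    """Signed area of a 2D screen-space triangle; positive means counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))


def filter_triangles(vertices, indices):
    """Return a compacted index list with back-facing and degenerate triangles removed.

    Assumption: counter-clockwise winding is front-facing.
    """
    kept = []
    for i in range(0, len(indices), 3):
        tri = indices[i:i + 3]
        a, b, c = (vertices[j] for j in tri)
        if signed_area(a, b, c) > 0:
            kept.extend(tri)
    return kept


vertices = [(0, 0), (1, 0), (0, 1), (1, 1)]
indices = [
    0, 1, 2,  # front-facing (counter-clockwise)
    0, 2, 1,  # back-facing (clockwise)
    1, 3, 2,  # front-facing
    1, 1, 3,  # degenerate, zero area
]
visible = filter_triangles(vertices, indices)
print(visible)
```

Running such a filter every frame, just before drawing, is one way to read the "just-in-time" geometry optimization mentioned above: triangles that cannot produce pixels are removed before they consume fixed-function throughput.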
Checkerboard Rendering in Dark Souls: Remastered by QLOC | QLOC
This is a talk on checkerboard rendering Markus & Andreas held at Digital Dragons 2019.
In it they quickly go through the history of Checkerboard Rendering before taking a deep dive into how it works and how it is implemented in Dark Souls: Remastered. Lastly, they present the quality and performance improvements they got from using it and their conclusion.
PS: The PDF file includes useful in-depth notes from both authors.
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant... | AMD Developer Central
This presentation discusses the Mantle API: what it is, why to choose it, its abstraction level, small batch performance, and platform efficiency.
Download the presentation from the AMD Developer website here: http://bit.ly/TrEUeC
This webinar explores a variety of new and updated features in Java 8, and discusses how these changes can positively impact your day-to-day programming.
Watch the video replay here: http://bit.ly/1vStxKN
Your Webinar presenter, Marnie Knue, is an instructor for Develop Intelligence and has taught Sun & Oracle certified Java classes, RedHat JBoss administration, Spring, and Hibernate. Marnie also has spoken at JavaOne.
Inside XBox One by Martin Fuller from the Sweden Game Developers Conference, June 2, 2014, Stockholm, Sweden. View other presentations here: http://bit.ly/TrEUeC
Learn more about DirectGMA in this blog post: bit.ly/AMDDirectGMA
AMD has introduced Direct Graphics Memory Access (DirectGMA), which:
‒ Makes a portion of the GPU memory accessible to other devices
‒ Allows devices on the bus to write directly into this area of GPU memory
‒ Allows GPUs to write directly into the memory of remote devices on the bus supporting DirectGMA
‒ Provides a driver interface to allow 3rd party hardware vendors to support data exchange with an AMD GPU using DirectGMA
‒ and more
View the accompanying blog post here: bit.ly/AMDDirectGMA
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha... | AMD Developer Central
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Harris Gasparakis, AMD, at the Embedded Vision Alliance Summit, May 2014.
Harris Gasparakis, Ph.D., is AMD’s OpenCV manager. In addition to enhancing OpenCV with OpenCL acceleration, he is engaged in AMD’s Computer Vision strategic planning, ISVs, and AMD Ventures engagements, including technical leadership and oversight in the AMD Gesture product line. He holds a Ph.D. in theoretical high energy physics from YITP at SUNYSB. He is credited with enabling real-time volumetric visualization and analysis in Radiology Information Systems (Terarecon), including the first commercially available virtual colonoscopy system (Vital Images). He was responsible for cutting edge medical technology (Biosense Webster, Stereotaxis, Boston Scientific), incorporating image and signal processing with AI and robotic control.
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array... | AMD Developer Central
In this webinar presentation, ArrayFire COO Oded Green demonstrates best practices to help you quickly get started with OpenCL™ programming. Learn how to get the best performance from AMD hardware in various programming languages using ArrayFire. Oded discusses the latest advancements in the OpenCL™ ecosystem, including cutting edge OpenCL™ libraries such as clBLAS, clFFT, clMAGMA and ArrayFire. Examples are shown in real code for common application domains.
Watch the webinar here: http://bit.ly/1obT0M2
For more developer resources, visit:
http://arrayfire.com/
http://developer.amd.com/
Follow us on Twitter: https://twitter.com/AMDDevCentral
See info in the slides for more contact information and resource links!
This is the slide deck from the popular "Introduction to Node.js" webinar with AMD and DevelopIntelligence, presented by Joshua McNeese. Watch our AMD Developer Central YouTube channel for the replay at https://www.youtube.com/user/AMDDevCentral.
This presentation accompanies the webinar replay located here: http://bit.ly/1zmvlkL
AMD Media SDK Software Architect Mikhail Mironov shows you how to leverage an AMD platform for multimedia processing using the new Media Software Development Kit. He discusses how to use a new set of C++ interfaces for easy access to AMD hardware blocks, and shows you how to leverage the Media SDK in the development of video conferencing, wireless display, remote desktop, video editing, transcoding, and more.
AMD’s math libraries can support a range of programmers from hobbyists to ninja programmers. Kent Knox from AMD’s library team introduces you to OpenCL libraries for linear algebra, FFT, and BLAS, and shows you how to leverage the speed of OpenCL through the use of these libraries.
Review the material presented in the AMD Math libraries webinar in this deck.
For more:
Visit the AMD Developer Forums: http://devgurus.amd.com/welcome
Watch the replay: www.youtube.com/user/AMDDevCentral
Follow us on Twitter: https://twitter.com/AMDDevCentral
Vulkan and DirectX12 share many common concepts, but differ vastly from the APIs most game developers are used to. As a result, developing for DX12 or Vulkan requires a new approach to graphics programming and in many cases a redesign of the Game Engine. This lecture will teach the basic concepts common to Vulkan and DX12 and help developers overcome the main problems that often appear when switching to one of the new APIs. It will explain how those new concepts will help games utilize the hardware more efficiently and discuss best practices for game engine development.
For more, visit http://developer.amd.com/
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar | AMD Developer Central
This deck presents highlights from the Introduction to OpenCL™ Programming Webinar presented by Acceleware & AMD on Sept. 17, 2014. Watch a replay of this popular webinar on the AMD Dev Central YouTube channel here: https://www.youtube.com/user/AMDDevCentral or here for the direct link: http://bit.ly/1r3DgfF
Presentation & discussion around low-level graphics APIs. This was a quickly made presentation that I put together for a discussion with Intel and fellow ISVs; I thought it could be worth sharing.
Slides from a talk given to coursemates about my university final year project on the UWE CRTS course which involved porting uCLinux to the Pluto 6 gaming control board.
UKUUG presentation about µCLinux on Pluto 6 | edlangley
Slides from a talk given at the UKUUG 2006 conference, derived from my final year project on the UWE CRTS degree, which involved porting uCLinux to the Pluto 6 gaming control board.
DB2 for z/OS - Starter's guide to memory monitoring and control | Florence Dubois
DB2 for z/OS makes more and more use of REAL memory to improve performance and reduce cost. But if you don't carefully budget and monitor the use of REAL memory on your system, you could be putting your applications at risk. This presentation will go back to the basics and answer the most common questions about REAL memory management, including: How does DB2 use virtual and REAL memory? How do you build a budget based on system settings and buffer pool sizes? How do you size the LFAREA? What are the key performance indicators, and how do I know I am running 'safely'? What can be done to protect the system?
Yesterday's thinking may still believe NVMe (NVM Express) is in transition to a production ready solution. In this session, we will discuss how the evolution of NVMe is ready for production, the history and evolution of NVMe and the Linux stack to address where NVMe has progressed today to become the low latency, highly reliable database key value store mechanism that will drive the future of cloud expansion. Examples of protocol efficiencies and types of storage engines that are optimizing for NVMe will be discussed. Please join us for an exciting session where in-memory computing and persistence have evolved.
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14 | AMD Developer Central
Learn more about how AMD’s RapidFire SDK simplifies the delivery of multi-game streaming from a single GPU while minimizing latency to ensure one of the best cloud gaming experiences in this presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Also view this and other presentations on our developer website at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM... | AMD Developer Central
Oxide Games Partners Dan Baker and Tim Kipp will show you how to build a high throughput renderer using the Mantle API in this AMD technology presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Also view this and other presentations on our developer website at http://developer.amd.com/resources/documentation-articles/conference-presentations/
This AMD technology presentation from the 2014 Game Developers Conference in San Francisco March 17-21 explains how Mantle features can enable developers to improve both CPU and GPU performance in their titles. Also view this and other presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
A look at how new Direct3D advancements enhance efficiency and enable fully-threaded building of command buffers in this presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Also view this and other presentations on our developer website at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ... | AMD Developer Central
Keynote presentation, The Role of Java in Heterogeneous Computing, and How You Can Help, by Nandini Ramani, VP, Java Platform, Oracle Corporation, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by... | AMD Developer Central
Keynote presentation, Is There Anything New in Heterogeneous Computing, by Mike Muller, Chief Technology Officer, ARM, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
Keynote (Dr. Lisa Su) - Developers: The Heart of AMD Innovation - by Dr. Lisa... | AMD Developer Central
Keynote, Developers: The Heart of AMD Innovation, by Dr. Lisa Su, Senior VP and GM, Global Business Units, AMD, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf | 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Accelerate your Kubernetes clusters with Varnish Caching | Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Neuro-symbolic is not enough, we need neuro-*semantic* | Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
4. 4
BF4 MANTLE GOALS
Goals:
– Significantly improve CPU performance
– More consistent & stable performance
– Improve GPU performance where possible
– Add support for a new Mantle rendering backend in a live game
   Minimize changes to engine interfaces
   Compatible with built PC content
– Work on a wide set of hardware
   APU to quad-GPU
   But x64 only (32-bit Windows needs to die)
Non-goals:
– Design new renderer from scratch for Mantle
– Take advantage of asymmetric MGPU (APU+discrete)
– Optimize video memory consumption
5. 5
BF4 MANTLE STRATEGIC GOALS
Prove that low-level graphics APIs work outside of consoles
Push the industry towards low-level graphics APIs everywhere
Build a foundation for the future that we can build great games on
7. 7
SHADERS
Shader resource bind points replaced with a resource table object - descriptor set
– This is how the hardware accesses the shader resources
– Flat list of images, buffers and samplers used by any of the shader stages
– Vertex shader streams converted to vertex shader buffer loads
Engine assigns each shader resource to a specific slot in the descriptor set(s)
– Can share slots between shader stages = smaller descriptor sets
– The mapping takes a while to wrap one’s head around
8. 8
SHADER CONVERSION
DX11 bytecode shaders get converted to AMDIL & mapping applied using ILC tool
– Done at load time
– Don’t have to change our shaders!
Have full source & control over the process
Could write AMDIL directly or use other frontends if wanted
9. 9
DESCRIPTOR SETS
Very simple usage in BF4: for each draw call write flat list of resources
– Essentially direct replacement of SetTexture/SetConstantBuffer/SetInputStream
Single dynamic descriptor set object per frame
Sub-allocate for each draw call and write list of resources
~15000 resource slots written per frame in BF4, still very fast
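The per-draw usage described above can be sketched as follows. This is a minimal illustration in Python rather than engine code: the `FrameDescriptorSet` class, its capacity and the resource names are assumptions made for the example, not Mantle or Frostbite APIs.

```python
class FrameDescriptorSet:
    """Hypothetical stand-in for one large dynamic descriptor set per frame."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # flat list of resource slots
        self.used = 0                   # index of the next free slot

    def reset(self):
        # Called once per frame: all ranges from the previous use are discarded.
        self.used = 0

    def write_draw(self, resources):
        """Sub-allocate a contiguous range and write one draw call's resources.

        Returns (offset, count), which the draw call would then bind.
        """
        count = len(resources)
        if self.used + count > len(self.slots):
            raise MemoryError("per-frame descriptor set exhausted")
        offset = self.used
        self.slots[offset:offset + count] = resources
        self.used += count
        return offset, count


# Illustrative resource names only.
frame_set = FrameDescriptorSet(capacity=16384)
first = frame_set.write_draw(["albedo_tex", "normal_tex", "linear_sampler", "per_draw_cb"])
second = frame_set.write_draw(["shadow_tex", "per_draw_cb"])
```

Because every draw simply appends to the range, there is nothing to free individually; resetting the counter at the start of the frame reclaims everything at once.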
11. 11
DESCRIPTOR SETS – FUTURE OPTIMIZATIONS
Use static descriptor sets when possible
Reduce resource duplication by reusing & sharing more across shader stages
Nested descriptor sets
12. 12
COMPUTE PIPELINES
1:1 mapping between pipeline & shader
No state built into pipeline
Can execute in parallel with rendering
~100 compute pipelines in BF4
13. 13
GRAPHICS PIPELINES
All graphics shader stages combined to a single pipeline object together with important graphics state
~10000 graphics pipelines in BF4 on a single level, ~25 MB of video memory
Could use smaller working pool of active state objects to keep reasonable amount in memory
– Has not been required for us
14. 14
PRE-BUILDING PIPELINES
Graphics pipeline creation is an expensive operation, do it at load time instead of runtime!
– Creating one of our graphics pipelines takes ~10-60 ms
– Pre-build using N parallel low-priority jobs
– Avoid 99.9% of runtime stalls caused by pipeline creation!
Requires knowing the graphics pipeline state that will be used with the shaders
– Primitive type
– Render target formats
– Render target write masks
– Blend modes
Not fully trivial to know all state, may require engine changes / pre-defining use cases
– Important to design for!
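One way to picture the pre-build step is a set of pipeline keys, each combining a shader with the state listed above, built by a pool of parallel jobs. The sketch below is an assumption-laden illustration: `PipelineKey`, `create_pipeline` and `prebuild` are hypothetical names, the expensive driver call is replaced by a placeholder, and Python's thread pool cannot express the low job priority mentioned in the slide.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import NamedTuple, Tuple


class PipelineKey(NamedTuple):
    """Everything that must be known up front to build a pipeline (hypothetical)."""
    shader: str
    primitive_type: str
    render_target_formats: Tuple[str, ...]
    write_masks: Tuple[int, ...]
    blend_mode: str


def create_pipeline(key):
    # Placeholder for the expensive driver call (~10-60 ms each per the slide).
    return "pipeline:" + "/".join(map(str, key))


def prebuild(keys, workers=4):
    """Build each unique pipeline once, using N parallel jobs at load time."""
    unique = list(dict.fromkeys(keys))  # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(unique, pool.map(create_pipeline, unique)))


# Illustrative keys; the third repeats the first and is built only once.
keys = [
    PipelineKey("opaque_ps", "triangle_list", ("RGBA8",), (0xF,), "opaque"),
    PipelineKey("opaque_ps", "triangle_list", ("RGBA8",), (0xF,), "alpha_blend"),
    PipelineKey("opaque_ps", "triangle_list", ("RGBA8",), (0xF,), "opaque"),
]
pipelines = prebuild(keys)
```

The hard part in practice is producing the list of keys, which is why the slide stresses designing the engine so that the state combinations are known ahead of time.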
15. 15
PIPELINE CACHE
Cache built pipelines both in memory cache and disk cache
– Improved loading times
– Max 300 MB
– Simple LRU policy
– LZ4 compressed (free)
Database signature:
– Driver version
– Vendor ID
– Device ID
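The in-memory side of such a cache could look roughly like the sketch below: a size-bounded LRU map whose contents are only valid for a matching driver/vendor/device signature. The class and method names are hypothetical, the signature values are examples only, and the disk cache and LZ4 compression from the slide are left out to keep the example within the standard library.

```python
from collections import OrderedDict


class PipelineCache:
    """Size-bounded LRU cache of built pipeline blobs (illustrative only)."""

    def __init__(self, signature, max_bytes):
        self.signature = signature     # (driver version, vendor ID, device ID)
        self.max_bytes = max_bytes
        self.size = 0
        self.entries = OrderedDict()   # key -> blob, least recently used first

    def is_compatible(self, stored_signature):
        # A cache written by another driver or device must be discarded.
        return stored_signature == self.signature

    def get(self, key):
        blob = self.entries.get(key)
        if blob is not None:
            self.entries.move_to_end(key)  # mark as most recently used
        return blob

    def put(self, key, blob):
        if key in self.entries:
            self.size -= len(self.entries.pop(key))
        self.entries[key] = blob
        self.size += len(blob)
        # Evict least recently used entries until the budget is met.
        while self.size > self.max_bytes and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.size -= len(evicted)


# Example signature values and a tiny 10-byte budget to make eviction visible.
cache = PipelineCache(signature=("example-driver-1", 0x1002, 0x67B0), max_bytes=10)
cache.put("a", b"aaaa")
cache.put("b", b"bbbb")
cache.get("a")            # "a" is now more recently used than "b"
cache.put("c", b"cccc")   # exceeds the budget, so "b" is evicted
```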
17. 17
MEMORY MANAGEMENT
Mantle devices expose multiple memory heaps with different characteristics
– Can be different between devices, drivers and OSes
User explicitly places resources in wanted heaps
– Driver suggests preferred heaps when creating objects, not a requirement
Type    Size      Page   CPU access                                              GPU Read  GPU Write  CPU Read  CPU Write
Local   256 MB    65535  CpuVisible|CpuGpuCoherent|CpuUncached|CpuWriteCombined  130       170        0.0058    2.8
Local   4096 MB   65535  -                                                       130       180        0         0
Remote  16106 MB  65535  CpuVisible|CpuGpuCoherent|CpuUncached|CpuWriteCombined  2.6       2.6        0.1       3.3
Remote  16106 MB  65535  CpuVisible|CpuGpuCoherent                               2.6       2.6        3.2       2.9
18. 18
FROSTBITE MEMORY HEAPS
System Shared Mapped
– CPU memory that is GPU visible.
– Write combined & persistently mapped = easy & fast to write to in parallel at any time
System Shared Pinned
– CPU cached for readback.
– Not used much
Video Shared
– GPU memory accessible by CPU. Used for descriptor sets and dynamic buffers
– Max 256 MB (legacy constraint)
– Avoid keeping persistently mapped as WDDM doesn’t like this and can decide to move it back to CPU memory
Video Private
– GPU private memory.
– Used for render targets, textures and other resources CPU does not need to access
19. 19
MEMORY REFERENCES
WDDM needs to know which memory allocations are referenced for each command buffer
– In order to make sure they are resident and not paged out
– Max ~1700 memory references are supported
– Overhead with having lots of references
Engine needs to keep track of what memory is referenced while building the command buffers
– Easy & fast to do
– Each reference is either read-only or read/write
– We use a simple global list of references shared for all command buffers.
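A global reference list of this kind might be tracked as in the sketch below. It is only an illustration: the `MemoryReferenceList` class and the allocation names are hypothetical, and the rule that a read/write reference is never downgraded is an assumption about how such a list would reasonably behave, not something the slide states.

```python
READ_ONLY, READ_WRITE = "read-only", "read/write"
MAX_REFERENCES = 1700  # approximate limit quoted on the slide


class MemoryReferenceList:
    """One global list of referenced allocations shared by all command buffers."""

    def __init__(self):
        self.refs = {}  # allocation handle -> access mode

    def reference(self, allocation, writable=False):
        if allocation not in self.refs and len(self.refs) >= MAX_REFERENCES:
            raise RuntimeError("too many memory references")
        # Assumption: once any user writes, the reference stays read/write.
        if writable or self.refs.get(allocation) == READ_WRITE:
            self.refs[allocation] = READ_WRITE
        else:
            self.refs[allocation] = READ_ONLY


# Illustrative allocation names.
references = MemoryReferenceList()
references.reference("texture_pool")
references.reference("render_target_pool", writable=True)
references.reference("render_target_pool")  # a later read does not downgrade it
```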
20. 20
MEMORY POOLING
Pooling memory allocations was required for us
– Sub-allocate within larger 1 – 32 MB chunks
– All resources store a memory handle + offset
– Not as elegant as just void* on consoles
– Fragmentation can be a concern, but has not been much of an issue for us in practice
GPU virtual memory mapping is fully supported, can simplify & optimize management
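The "memory handle + offset" scheme can be illustrated with a small pool that carves allocations out of fixed-size chunks. Everything here is an assumption for the sake of the example: the `MemoryPool` class, the 256-byte default alignment and the chunk index used as a handle. Freeing and fragmentation handling are deliberately omitted.

```python
def align_up(value, alignment):
    return (value + alignment - 1) // alignment * alignment


class MemoryPool:
    """Sub-allocates from fixed-size chunks; a resource is (handle, offset)."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunk_used = []  # bytes used per chunk; the index is the handle

    def allocate(self, size, alignment=256):
        if size > self.chunk_size:
            raise ValueError("allocation larger than a chunk")
        # First fit: try the existing chunks before creating a new one.
        for handle, used in enumerate(self.chunk_used):
            offset = align_up(used, alignment)
            if offset + size <= self.chunk_size:
                self.chunk_used[handle] = offset + size
                return handle, offset
        self.chunk_used.append(size)
        return len(self.chunk_used) - 1, 0


pool = MemoryPool(chunk_size=1 << 20)       # 1 MB chunks
a = pool.allocate(1000)
b = pool.allocate(1000)
c = pool.allocate(1 << 20)                  # does not fit in chunk 0
d = pool.allocate(100, alignment=4096)      # still fits in chunk 0
```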
21. 21
OVERCOMMITTING VIDEO MEMORY
Avoid overcommitting video memory!
– Will lead to severe stalls as VidMM moves memory blocks back and forth
– VidMM is a black box
– One of the biggest issues we ran into during development
Recommendations
– Balance memory pools
– Make sure to use read-only memory references
– Use memory priorities
22. 22
MEMORY PRIORITIES
Setting priorities on the memory allocations helps VidMM choose what to page out when it has to
5 priority levels
– Very high = Render targets with MSAA
– High = Render targets and UAVs
– Normal = Textures
– Low = Shader & constant buffers
– Very low = vertex & index buffers
23. 23
MEMORY RESIDENCY FUTURE
For best results manage which resources are in video memory yourself & keep only ~80% used
– Avoid all stalls
– Can async DMA in and out
We are thinking of redesigning to fully avoid possibility of overcommitting
Hoping WDDM’s memory residency management can be simplified & improved in the future
25. 25
RESOURCE LIFETIMES
App manages lifetime of all resources
– Have to make sure GPU is not using an object or memory while we are freeing it on the CPU
– How we’ve always worked with GPUs on the consoles
– Multi-GPU adds some additional complexity that consoles do not have
We keep track of lifetimes at per-frame granularity
– Queues for object destruction & free memory operations
– Add to queue at any time on the CPU
– Process queues when GPU command buffers for the frame are done executing
– Tracked with command buffer fences
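The per-frame destruction queues can be sketched as below. `Fence` and `DeferredDeleter` are hypothetical names, and the fence is a plain flag standing in for a real command buffer fence; the point is only that destruction requested on the CPU is delayed until the GPU has finished the frame that might still use the object.

```python
from collections import deque


class Fence:
    """Stand-in for a command buffer fence (hypothetical)."""

    def __init__(self):
        self.signaled = False

    def is_signaled(self):
        return self.signaled


class DeferredDeleter:
    def __init__(self):
        self.current = []          # destroy callbacks queued this frame
        self.in_flight = deque()   # (fence, callbacks) per submitted frame

    def queue_destroy(self, destroy):
        # Safe to call at any time on the CPU; nothing is freed yet.
        self.current.append(destroy)

    def end_frame(self, fence):
        # The fence signals when this frame's command buffers are done.
        self.in_flight.append((fence, self.current))
        self.current = []

    def collect(self):
        # Frames retire in submission order, so stop at the first busy one.
        while self.in_flight and self.in_flight[0][0].is_signaled():
            _, callbacks = self.in_flight.popleft()
            for destroy in callbacks:
                destroy()


destroyed = []
deleter = DeferredDeleter()
deleter.queue_destroy(lambda: destroyed.append("old_texture"))
frame_fence = Fence()
deleter.end_frame(frame_fence)
deleter.collect()                  # GPU still busy: nothing is destroyed
destroyed_while_busy = list(destroyed)
frame_fence.signaled = True        # GPU finished the frame
deleter.collect()
```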
26. 26
LINEAR FRAME ALLOCATOR
We use multiple linear allocators with Mantle for both transient buffers & images
– Used for huge amount of small constant data and other GPU frame data that CPU writes
– Easy to use and very low overhead
– Don’t have to care about lifetimes or state
Fixed memory buffers for each frame
– Super cheap sub-allocation from any thread
– If full, use heap allocation (also fast due to pooling)
Alternative: ring buffers
– Requires being able to stall & drain pipeline at any allocation if full, additional complexity for us
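A linear frame allocator with a heap fallback might look like the following sketch. The class name, the 16-byte default alignment and the `fallback` callable are assumptions for illustration. A lock is used for thread safety because that is the simplest option in Python; a native engine would more likely bump the offset with an atomic add.

```python
import threading


class LinearFrameAllocator:
    """Bump allocator over a fixed per-frame buffer (illustrative only)."""

    def __init__(self, size, fallback):
        self.size = size
        self.offset = 0
        self.fallback = fallback       # callable(size) used when full
        self.lock = threading.Lock()   # assumption: stands in for an atomic add

    def allocate(self, size, alignment=16):
        with self.lock:
            start = (self.offset + alignment - 1) // alignment * alignment
            if start + size <= self.size:
                self.offset = start + size
                return ("frame", start)
        # Frame buffer is full: fall back to the pooled heap allocation.
        return ("heap", self.fallback(size))

    def reset(self):
        # Called once the GPU is done with the frame; no per-object lifetimes.
        with self.lock:
            self.offset = 0


allocator = LinearFrameAllocator(size=64, fallback=lambda size: "heap-block")
a = allocator.allocate(10)
b = allocator.allocate(10)
c = allocator.allocate(40)   # would end at 72 > 64, so it uses the fallback
allocator.reset()
d = allocator.allocate(40)
```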
27. 27
TILING
Textures should be tiled for performance
– Explicitly handled in Mantle, user selects linear or tiled
– Some formats (BC) can’t be accessed as linear by the GPU
On consoles we handle tiling offline as part of our data processing pipeline
– We know the exact tiling formats and have separate resources per platform
For Mantle
– Tiling formats are opaque, can be different between GPU architectures and image types
– Tile textures with DMA image upload from SystemShared to VideoPrivate
Linear source, tiled destination
Free
29. 29
COMMAND BUFFERS
Command buffers are the atomic unit of work dispatched to the GPU
– Separate creation from execution
– No “immediate context” a la DX11 that can execute work at any call
– Makes resource synchronization and setup significantly easier & faster
Typical BF4 scenes have around 50 command buffers per frame
– Reasonable tradeoff for us with submission overhead vs CPU load-balancing
30. 30
COMMAND BUFFER SOURCES
Frostbite has 2 separate sources of command buffers
– World rendering
Rendering the world with tons of objects, lots of draw calls. Have all frame data up front
All resources except for render targets are read-only
Generated in parallel up front each frame
– Immediate rendering (“the rest”)
Setting up rendering and doing lighting, post-fx, virtual texturing, compute, etc
Managing resource state, memory and running on different queues (graphics, compute, DMA)
Sequentially generated in a single job, simulate an immediate context by splitting the command buffer
Both are very important and have different requirements
31. 31
RESOURCE TRANSITIONS
Key design in Mantle to significantly lower driver overhead & complexity
– Explicit hazard tracking by the app/engine
– Drives architecture-specific caches & compression
– AMD: FMASK, CMASK, HTILE
– Enables explicit memory management
Examples:
– Optimal render target writes → Graphics shader read-only
– Compute shader write-only → DrawIndirect arguments
Mantle has a strong validation layer that tracks transitions, which is a major help
32. 32
MANAGING RESOURCE TRANSITIONS
Engines need a clear design on how to handle state transitions
Multiple approaches possible:
– Sequential in-order command buffers
Generate one command buffer at a time, in order
Transition resources on-demand when operating on them, very simple
Recommendation: start with this
– Out-of-order multiple command buffers
Track state per command buffer, fix up transitions when order of command buffers is known
– Hybrid approaches & more
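The recommended starting point, sequential command buffers with on-demand transitions, can be sketched as follows. The `CommandBuffer` class and the state names are hypothetical and do not correspond to Mantle's actual state enums; the sketch only shows the idea of keeping one tracked state per resource and emitting a transition when an operation needs a different one.

```python
class CommandBuffer:
    """Records commands in order, inserting transitions on demand."""

    def __init__(self, resource_states):
        self.states = resource_states  # resource -> current state (shared)
        self.commands = []

    def require(self, resource, state):
        current = self.states.get(resource)
        if current != state:
            self.commands.append(("transition", resource, current, state))
            self.states[resource] = state

    def draw(self, render_target, textures):
        self.require(render_target, "render_target_write")
        for texture in textures:
            self.require(texture, "shader_read")
        self.commands.append(("draw", render_target))


# Illustrative resources: render the gbuffer, then read it in a later pass.
states = {"gbuffer": "render_target_write"}
cb = CommandBuffer(states)
cb.draw("gbuffer", [])
cb.draw("backbuffer", ["gbuffer"])
```

This only works because the command buffers are generated in the same order they execute; out-of-order generation needs the per-command-buffer tracking and fix-up described above.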
33. 33
MANAGING RESOURCE TRANSITIONS IN FROSTBITE
Current approach in Frostbite is quite basic:
– We keep track of a single state for each resource (not subresource)
– The “immediate rendering” path transitions resources as needed depending on the operation
– The out of order “world rendering” command buffers don’t need to transition states
Already have write access to MRTs and read-access to all resources set up outside them
Avoids the problem of them not knowing the state during generation
Works now but as we do more general parallel rendering it will have to change
– Track resource state for each command buffer & fixup between command buffers
34. 34
DYNAMIC STATE OBJECTS
Graphics state is only set with the pipeline object and 5 dynamic state objects
– State objects: color blend, raster, viewport, depth-stencil, MSAA
– No other parameters such as in DX11 with stencil ref or SetViewport functions
Frostbite use case:
– Pre-create when possible
– Otherwise on-demand creation (hash map)
– Only ~100 state objects!
Still possible to end up with lots of state objects
– Esp. with state object float & integer values (depth bounds, depth bias, viewport)
– But no need to store all permutations in memory, objects are fast to create & app manages lifetimes
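On-demand creation through a hash map could be sketched like this. `StateObjectCache` and the `create` factory are hypothetical; the factory stands in for whatever API call creates a viewport, raster or other dynamic state object.

```python
class StateObjectCache:
    """Creates dynamic state objects on demand and reuses identical ones."""

    def __init__(self, create):
        self.create = create   # hypothetical factory wrapping the API call
        self.objects = {}      # (kind, sorted parameters) -> state object

    def get(self, kind, **params):
        key = (kind, tuple(sorted(params.items())))
        obj = self.objects.get(key)
        if obj is None:
            obj = self.objects[key] = self.create(kind, params)
        return obj


created = []


def create_state_object(kind, params):
    created.append(kind)   # record how many real creations happened
    return object()


state_cache = StateObjectCache(create_state_object)
full_hd = state_cache.get("viewport", width=1920, height=1080)
full_hd_again = state_cache.get("viewport", height=1080, width=1920)
hd = state_cache.get("viewport", width=1280, height=720)
```

Since the objects are cheap to create, such a cache can also evict entries freely instead of keeping every float or integer permutation alive.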
36. 36
QUEUES
Universal queue can do graphics, compute and presents
We also use additional queues to parallelize GPU operations:
– DMA queue – Improve perf with faster transfers & avoiding idling graphics while transferring
– Compute queue - Improve perf by utilizing idle ALU and update resources simultaneously with gfx
More GPUs = more queues!
37. 37
QUEUES SYNCHRONIZATION
Order of execution within a queue is sequential
Synchronize multiple queues with GPU semaphores (signal & wait)
Also works across multiple GPUs
[Diagram: Compute and Graphics queue timelines synchronized with semaphore signal (S) and wait (W)]
38. 38
QUEUES SYNCHRONIZATION CONT
Started out with explicit semaphores
– Error prone to handle when having lots of different semaphores & queues
– Difficult to visualize & debug
Switched to a representation more similar to a job graph
Just a model on top of the semaphores
39. 39
GPU JOB GRAPH
Each GPU job has a list of dependencies (other command buffers)
Dependencies have to finish first before the job can run on its queue
The dependencies can be from any queue
Was easier to work with, debug and visualize
Really extendable going forward
[Diagram: GPU job graph with Graphics 1, Graphics 2, DMA and Compute jobs]
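As a rough illustration of a job graph layered on top of semaphores, the sketch below orders jobs topologically, groups them per queue, and derives a signal/wait pair for every dependency that crosses queues. The job names and the `schedule` function are made up for the example. Dependencies within the same queue need no semaphore here because execution within a queue is sequential and jobs are submitted in dependency order.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical frame: job name -> (queue, dependencies)
jobs = {
    "gbuffer": ("graphics", []),
    "upload": ("dma", []),
    "lighting": ("compute", ["gbuffer"]),
    "composite": ("graphics", ["lighting", "upload"]),
}


def schedule(jobs):
    """Return per-queue submission order and the semaphores that are needed."""
    graph = {name: deps for name, (_, deps) in jobs.items()}
    order = list(TopologicalSorter(graph).static_order())
    submissions = {}   # queue -> job names in submission order
    semaphores = []    # (signalling job, waiting job) pairs
    for name in order:
        queue, deps = jobs[name]
        for dep in deps:
            if jobs[dep][0] != queue:
                # Cross-queue dependency: dep signals, this job waits.
                semaphores.append((dep, name))
        submissions.setdefault(queue, []).append(name)
    return submissions, semaphores


submissions, semaphores = schedule(jobs)
```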
40. 40
ASYNC DMA
AMD GPUs have dedicated hardware DMA engines, let’s use them!
– Uploading through DMA is faster than on universal queue, even if blocking
– DMA has alignment restrictions, have to support falling back to copies on universal queue
Use case: Frame buffer & texture uploads
– Used by resource initial data uploads and our UpdateSubresource
– Guaranteed to be finished before the GPU universal queue starts rendering the frame
Use case: Multi-GPU frame buffer copy
– Peer-to-peer copy of the frame buffer to the GPU that will present it
41. 41
ASYNC COMPUTE
Frostbite has lots of compute shader passes that could run in parallel with graphics work
– HBAO, blurring, classification, tile-based lighting, etc
Running as async compute can improve GPU performance by utilizing “free” ALU
– For example while doing shadowmap rendering (ROP bound)
42. 42
ASYNC COMPUTE – TILE-BASED LIGHTING
3 sequential compute shaders
– Input: zbuffer & gbuffer
– Output: HDR texture/UAV
Runs in parallel with graphics pipeline that renders to other targets
[Diagram: Compute and Graphics queue timelines. Labels: TileZ, Cull lights, Lighting; Gbuffer, Shadowmaps, Reflection, Distort, Transp; semaphore signal (S) and wait (W)]
43. 43
ASYNC COMPUTE – TILE-BASED LIGHTING
We manually prepare the resources for the async compute
– Important to not access the resources on other queues at the same time (unless read-only state)
– Have to transition resources on the queue that last used them
Up to 80% faster in our initial tests, but not fully reliable
– But is a pretty small part of the frame time
– Not in BF4 yet
45. 45
MULTI-GPU
Multi-GPU alternatives:
– AFR – Alternate Frame Rendering (1-4 GPUs of the same power)
– Heterogeneous AFR – 1 small + 1 big GPU (APU + Discrete)
– SFR – Split Frame Rendering
– Multi-GPU Job Graph – Primary strong GPU + slave GPUs helping
Frostbite supports AFR natively
– No synchronization points within the frame
– For resources that are not rendered every frame: re-render resources for each GPU
Example: sky envmap update on weather change
With Mantle multi-GPU is explicit and we have to build support for it ourselves
46. 46
MULTI-GPU AFR WITH MANTLE
All resources explicitly duplicated on each GPU with async DMA
– Hidden internally in our rendering abstraction
Every frame alternate which GPU we build command buffers for and are using resources from
Our UpdateSubresource has to make sure it updates resources on all GPUs
Presenting the screen has to, in some modes, copy the frame buffer to the GPU that owns the display
Bonus:
– Can simulate multi-GPU mode even with single GPU!
– Multi-GPU works in windowed mode!
47. 47
MULTI-GPU ISSUES
GPUs are independently rendering & presenting to the screen – can cause micro-stuttering
– Frames are not presented at regular intervals
– Frame rate can be high but presentation & gameplay is not smooth
– FCAT is a good tool to analyse this
[Diagram: GPU0 and GPU1 timelines for frames 0-3 with present points (P), showing an irregular presentation interval]
48. 48
MULTI-GPU ISSUES
We need to introduce dependency & dampening between the GPUs to alleviate this – frame pacing
[Diagram: GPU0 and GPU1 timelines for frames 0-3 with present points (P), showing the ideal presentation interval]
49. 49
FRAME PACING
Measure average frame rate on each GPU
– Short history (10-30 frames)
– Filter out spikes
Insert delay on the GPU before each present
– Force the frame times to become more regular and GPUs to align
– Delay value is based on the calculated average frame rate
[Diagram: GPU0 and GPU1 timelines for frames 0-3 with present points (P) and an inserted delay (D) before presents]
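The slides do not give the exact pacing formula, so the following is only one plausible sketch under stated assumptions: frame times are kept in a short history, samples more than twice the median are treated as spikes and ignored, and with N GPUs in AFR the target interval between presents is the average per-GPU frame time divided by N. `FramePacer`, the history length and the spike factor are all hypothetical.

```python
from collections import deque
from statistics import mean, median


class FramePacer:
    """Computes a delay to insert before a present (illustrative only)."""

    def __init__(self, gpu_count, history=20, spike_factor=2.0):
        self.gpu_count = gpu_count
        self.frame_times = deque(maxlen=history)  # short history, in ms
        self.spike_factor = spike_factor          # assumed spike threshold

    def add_frame_time(self, ms):
        self.frame_times.append(ms)

    def average_frame_time(self):
        if not self.frame_times:
            return 0.0
        # Filter out spikes: ignore samples far above the median.
        limit = median(self.frame_times) * self.spike_factor
        return mean(t for t in self.frame_times if t <= limit)

    def delay_before_present(self, ms_since_last_present):
        # Assumption: presents should be evenly spaced across the GPUs.
        target = self.average_frame_time() / self.gpu_count
        return max(0.0, target - ms_since_last_present)


pacer = FramePacer(gpu_count=2)
for frame_time_ms in [20.0, 20.0, 20.0, 20.0, 100.0]:  # one spike
    pacer.add_frame_time(frame_time_ms)

early_delay = pacer.delay_before_present(4.0)    # presented too soon: wait
late_delay = pacer.delay_before_present(15.0)    # already late: no delay
```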
51. 51
MANTLE DEV RECOMMENDATIONS
The validation layer is a critical friend!
You’ll end up with a lot of object & memory management code, try to share it with console code
Make sure you have control over memory usage and can avoid overcommitting video memory
Build a robust solution for resource state management early
Figure out how to pre-create your graphics pipelines, can require engine design changes
Build for multi-GPU support from the start, easier than retrofitting
52. 52
FUTURE
Second wave of Frostbite Mantle titles
Adapt Frostbite core rendering layer based on learnings from Mantle
– Refine binding & buffer updates to further reduce overhead
– Virtual memory management
– More async compute & async DMAs
– Multi-GPU job graph R&D
Linux
– Would like to see how our Mantle renderer behaves with different memory management & driver model