This presentation introduces a new NVIDIA extension called Command-list.
The purpose of this presentation is to explain the basic concepts on how to use it and show what are the benefits.
The sample I used for the talk is here: https://github.com/nvpro-samples/gl_commandlist_bk3d_models
The driver for trying should be PreRelease 347.09
http://www.nvidia.com/download/driverResults.aspx/80913/en-us
OpenGL 4.4 provides new features for accelerating scenes with many objects, which are typically found in professional visualization markets. This talk will provide details on the usage of the features and their effect on real-life models. Furthermore we will showcase how more work for rendering a scene can be off-loaded to the GPU, such as efficient occlusion culling or matrix calculations.
Video presentation here: http://on-demand.gputechconf.com/gtc/2014/video/S4379-opengl-44-scene-rendering-techniques.mp4
The goal of this session is to demonstrate techniques that improve GPU scalability when rendering complex scenes. This is achieved through a modular design that separates the scene graph representation from the rendering backend. We will explain how the modules in this pipeline are designed and give insights to implementation details, which leverage GPU''s compute capabilities for scene graph processing. Our modules cover topics such as shader generation for improved parameter management, synchronizing updates between scenegraph and rendering backend, as well as efficient data structures inside the renderer.
Video here: http://on-demand.gputechconf.com/gtc/2013/video/S3032-Advanced-Scenegraph-Rendering-Pipeline.mp4
Ever wondered how to use modern OpenGL in a way that radically reduces driver overhead? Then this talk is for you.
John McDonald and Cass Everitt gave this talk at Steam Dev Days in Seattle on Jan 16, 2014.
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation is targeting seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
presented at SIGGRAPH 2014 in Vancouver during NVIDIA's "Best of GTC" sponsored sessions
http://www.nvidia.com/object/siggraph2014-best-gtc.html
Watch the replay that includes a demo of GPU-accelerated Illustrator and several OpenGL 4 demos running on NVIDIA's Tegra Shield tablet.
http://www.ustream.tv/recorded/51255959
Find out more about the OpenGL examples for GameWorks:
https://developer.nvidia.com/gameworks-opengl-samples
OpenGL 4.4 provides new features for accelerating scenes with many objects, which are typically found in professional visualization markets. This talk will provide details on the usage of the features and their effect on real-life models. Furthermore we will showcase how more work for rendering a scene can be off-loaded to the GPU, such as efficient occlusion culling or matrix calculations.
Video presentation here: http://on-demand.gputechconf.com/gtc/2014/video/S4379-opengl-44-scene-rendering-techniques.mp4
The goal of this session is to demonstrate techniques that improve GPU scalability when rendering complex scenes. This is achieved through a modular design that separates the scene graph representation from the rendering backend. We will explain how the modules in this pipeline are designed and give insights to implementation details, which leverage GPU''s compute capabilities for scene graph processing. Our modules cover topics such as shader generation for improved parameter management, synchronizing updates between scenegraph and rendering backend, as well as efficient data structures inside the renderer.
Video here: http://on-demand.gputechconf.com/gtc/2013/video/S3032-Advanced-Scenegraph-Rendering-Pipeline.mp4
Ever wondered how to use modern OpenGL in a way that radically reduces driver overhead? Then this talk is for you.
John McDonald and Cass Everitt gave this talk at Steam Dev Days in Seattle on Jan 16, 2014.
Optimizing the Graphics Pipeline with Compute, GDC 2016Graham Wihlidal
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation is targeting seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
presented at SIGGRAPH 2014 in Vancouver during NVIDIA's "Best of GTC" sponsored sessions
http://www.nvidia.com/object/siggraph2014-best-gtc.html
Watch the replay that includes a demo of GPU-accelerated Illustrator and several OpenGL 4 demos running on NVIDIA's Tegra Shield tablet.
http://www.ustream.tv/recorded/51255959
Find out more about the OpenGL examples for GameWorks:
https://developer.nvidia.com/gameworks-opengl-samples
Talk by Yuriy O’Donnell at GDC 2017.
This talk describes how Frostbite handles rendering architecture challenges that come with having to support a wide variety of games on a single engine. Yuriy describes their new rendering abstraction design, which is based on a graph of all render passes and resources. This approach allows implementation of rendering features in a decoupled and modular way, while still maintaining efficiency.
A graph of all rendering operations for the entire frame is a useful abstraction. The industry can move away from “immediate mode” DX11 style APIs to a higher level system that allows simpler code and efficient GPU utilization. Attendees will learn how it worked out for Frostbite.
Siggraph 2016 - Vulkan and nvidia : the essentialsTristan Lorach
This presentation introduces Vulkan components, what you must know to start using this new API. And what you must know when using it on NVIDIA hardware
Bindless Deferred Decals in The Surge 2Philip Hammer
These are the slides for my talk at Digital Dragons 2019 in Krakow.
Update: The recordings are online on youtube now:
https://www.youtube.com/watch?v=e2wPMqWETj8
Graphics Gems from CryENGINE 3 (Siggraph 2013)Tiago Sousa
This lecture covers rendering topics related to Crytek’s latest engine iteration, the technology which powers titles such as Ryse, Warface, and Crysis 3. Among covered topics, Sousa presented SMAA 1TX: an update featuring a robust and simple temporal antialising component; performant and physically-plausible camera related post-processing techniques such as motion blur and depth of field were also covered.
Talk by Graham Wihlidal (Frostbite Labs) at GDC 2017.
Checkerboard rendering is a relatively new technique, popularized recently by the introduction of the PlayStation 4 Pro. Many modern game engines are adding support for it right now, and in this talk, Graham will present an in-depth look at the new implementation in Frostbite, which is used in shipping titles like 'Battlefield 1' and 'Mass Effect Andromeda'. Despite being conceptually simple, checkerboard rendering requires a deep integration into the post-processing chain, in particular temporal anti-aliasing, dynamic resolution scaling, and poses various challenges to existing effects. This presentation will cover the basics of checkerboard rendering, explain the impact on a game engine that powers a wide range of titles, and provide a detailed look at how the current implementation in Frostbite works, including topics like object id, alpha unrolling, gradient adjust, and a highly efficient depth resolve.
Siggraph2016 - The Devil is in the Details: idTech 666Tiago Sousa
A behind-the-scenes look into the latest renderer technology powering the critically acclaimed DOOM. The lecture will cover how technology was designed for balancing a good visual quality and performance ratio. Numerous topics will be covered, among them details about the lighting solution, techniques for decoupling costs frequency and GCN specific approaches.
In the session from Game Developers Conference 2011, we'll take a complete look at the terrain system in Frostbite 2 as it was applied in Battlefield 3. The session is partitioned into three parts. We begin with the scalability aspects and discuss how consistent use of hierarchies allowed us to combine high resolutions with high view distances. We then turn towards workflow aspects and describe how we achieved full in-game realtime editing. A fair amount of time is spent describing how issues were addressed.
Finally, we look at the runtime side. We describe usage of CPU, GPU and memory resources and how it was kept to a minimum. We discuss how the GPU is offloaded by caching intermediate results in a procedural virtual texture and how prioritization was done to allow for work throttling without sacrificing quality. We also go into depth about the flexible streaming system that work with both FPS and driving games.
Decima Engine: Visibility in Horizon Zero DawnGuerrilla
Download the full presentation here: http://www.guerrilla-games.com/read/decima-engine-visibility-in-horizon-zero-dawn
Abstract: Horizon Zero Dawn presented the Decima engine with new challenges in rendering large and dense environments. In particular, we needed to be able to quickly query a very large set of potential objects to find which should be visible. This talk looks at the problems we faced moving from more constrained Killzone levels to Horizon's open world, and our approach to fast visibility queries using the PS4's asynchronous compute hardware. It also covers our recent work on efficiently collecting batches of object instances during the query to reduce load on the entire rendering pipeline. A basic familiarity with GPU compute will be helpful to get the most out of this talk.
Bill explains some of the ways that the Vertex Shader can be used to improve performance by taking a fast path through the Vertex Shader rather than generating vertices with other parts of the pipeline in this AMD technology presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Check out more technical presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
A description of the next-gen rendering technique called Triangle Visibility Buffer. It offers up to 10x - 20x geometry compared to Deferred rendering and much higher resolution. Generally it aligns better with memory access patterns in modern GPUs compared to Deferred Lighting like Clustered Deferred Lighting etc.
Presented September 30, 2009 in San Jose, California at GPU Technology Conference.
Describes the new features of OpenGL 3.2 and NVIDIA's extensions beyond 3.2 such as bindless graphics, direct state access, separate shader objects, copy image, texture barrier, and Cg 2.2.
NVIDIA OpenGL and Vulkan Support for 2017Mark Kilgard
Learn how NVIDIA continues improving both Vulkan and OpenGL for cross-platform graphics and compute development. This high-level talk is intended for anyone wanting to understand the state of Vulkan and OpenGL in 2017 on NVIDIA GPUs. For OpenGL, the latest standard update maintains the compatibility and feature-richness you expect. For Vulkan, NVIDIA has enabled the latest NVIDIA GPU hardware features and now provides explicit support for multiple GPUs. And for either API, NVIDIA's SDKs and Nsight tools help you develop and debug your application faster.
NVIDIA booth theater presentation at SIGGRAPH in Los Angeles, August 1, 2017.
http://www.nvidia.com/object/siggraph2017-schedule.html?id=sig1732
Get your SIGGRAPH driver release with OpenGL 4.6 and the latest Vulkan functionality from
https://developer.nvidia.com/opengl-driver
The past few years have seen a sharp increase in the complexity of rendering algorithms used in modern game engines. Large portions of the rendering work are increasingly written in GPU computing languages, and decoupled from the conventional “one-to-one” pipeline stages for which shading languages were designed. Following Tim Foley’s talk from SIGGRAPH 2016’s Open Problems course on shading language directions, we explore example rendering algorithms that we want to express in a composable, reusable and performance-portable manner. We argue that a few key constraints in GPU computing languages inhibit these goals, some of which are rooted in hardware limitations. We conclude with a call to action detailing specific improvements we would like to see in GPU compute languages, as well as the underlying graphics hardware.
This talk was originally given at SIGGRAPH 2017 by Andrew Lauritzen (EA SEED) for the Open Problems in Real-Time Rendering course.
Taking Killzone Shadow Fall Image Quality Into The Next GenerationGuerrilla
This talk focuses on the technical side of Killzone Shadow Fall, the platform exclusive launch title for PlayStation 4.
We present the details of several new techniques that were developed in the quest for next generation image quality, and the talk uses key locations from the game as examples. We discuss interesting aspects of the new content pipeline, next-gen lighting engine, usage of indirect lighting and various shadow rendering optimizations. We also describe the details of volumetric lighting, the real-time reflections system, and the new anti-aliasing solution, and include some details about the image-quality driven streaming system. A common, very important, theme of the talk is the temporal coherency and how it was utilized to reduce aliasing, and improve the rendering quality and image stability above the baseline 1080p resolution seen in other games.
Talk by Yuriy O’Donnell at GDC 2017.
This talk describes how Frostbite handles rendering architecture challenges that come with having to support a wide variety of games on a single engine. Yuriy describes their new rendering abstraction design, which is based on a graph of all render passes and resources. This approach allows implementation of rendering features in a decoupled and modular way, while still maintaining efficiency.
A graph of all rendering operations for the entire frame is a useful abstraction. The industry can move away from “immediate mode” DX11 style APIs to a higher level system that allows simpler code and efficient GPU utilization. Attendees will learn how it worked out for Frostbite.
Siggraph 2016 - Vulkan and nvidia : the essentialsTristan Lorach
This presentation introduces Vulkan components, what you must know to start using this new API. And what you must know when using it on NVIDIA hardware
Bindless Deferred Decals in The Surge 2Philip Hammer
These are the slides for my talk at Digital Dragons 2019 in Krakow.
Update: The recordings are online on youtube now:
https://www.youtube.com/watch?v=e2wPMqWETj8
Graphics Gems from CryENGINE 3 (Siggraph 2013)Tiago Sousa
This lecture covers rendering topics related to Crytek’s latest engine iteration, the technology which powers titles such as Ryse, Warface, and Crysis 3. Among covered topics, Sousa presented SMAA 1TX: an update featuring a robust and simple temporal antialising component; performant and physically-plausible camera related post-processing techniques such as motion blur and depth of field were also covered.
Talk by Graham Wihlidal (Frostbite Labs) at GDC 2017.
Checkerboard rendering is a relatively new technique, popularized recently by the introduction of the PlayStation 4 Pro. Many modern game engines are adding support for it right now, and in this talk, Graham will present an in-depth look at the new implementation in Frostbite, which is used in shipping titles like 'Battlefield 1' and 'Mass Effect Andromeda'. Despite being conceptually simple, checkerboard rendering requires a deep integration into the post-processing chain, in particular temporal anti-aliasing, dynamic resolution scaling, and poses various challenges to existing effects. This presentation will cover the basics of checkerboard rendering, explain the impact on a game engine that powers a wide range of titles, and provide a detailed look at how the current implementation in Frostbite works, including topics like object id, alpha unrolling, gradient adjust, and a highly efficient depth resolve.
Siggraph2016 - The Devil is in the Details: idTech 666Tiago Sousa
A behind-the-scenes look into the latest renderer technology powering the critically acclaimed DOOM. The lecture will cover how technology was designed for balancing a good visual quality and performance ratio. Numerous topics will be covered, among them details about the lighting solution, techniques for decoupling costs frequency and GCN specific approaches.
In the session from Game Developers Conference 2011, we'll take a complete look at the terrain system in Frostbite 2 as it was applied in Battlefield 3. The session is partitioned into three parts. We begin with the scalability aspects and discuss how consistent use of hierarchies allowed us to combine high resolutions with high view distances. We then turn towards workflow aspects and describe how we achieved full in-game realtime editing. A fair amount of time is spent describing how issues were addressed.
Finally, we look at the runtime side. We describe usage of CPU, GPU and memory resources and how it was kept to a minimum. We discuss how the GPU is offloaded by caching intermediate results in a procedural virtual texture and how prioritization was done to allow for work throttling without sacrificing quality. We also go into depth about the flexible streaming system that work with both FPS and driving games.
Decima Engine: Visibility in Horizon Zero DawnGuerrilla
Download the full presentation here: http://www.guerrilla-games.com/read/decima-engine-visibility-in-horizon-zero-dawn
Abstract: Horizon Zero Dawn presented the Decima engine with new challenges in rendering large and dense environments. In particular, we needed to be able to quickly query a very large set of potential objects to find which should be visible. This talk looks at the problems we faced moving from more constrained Killzone levels to Horizon's open world, and our approach to fast visibility queries using the PS4's asynchronous compute hardware. It also covers our recent work on efficiently collecting batches of object instances during the query to reduce load on the entire rendering pipeline. A basic familiarity with GPU compute will be helpful to get the most out of this talk.
Bill explains some of the ways that the Vertex Shader can be used to improve performance by taking a fast path through the Vertex Shader rather than generating vertices with other parts of the pipeline in this AMD technology presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Check out more technical presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
A description of the next-gen rendering technique called Triangle Visibility Buffer. It offers up to 10x - 20x geometry compared to Deferred rendering and much higher resolution. Generally it aligns better with memory access patterns in modern GPUs compared to Deferred Lighting like Clustered Deferred Lighting etc.
Presented September 30, 2009 in San Jose, California at GPU Technology Conference.
Describes the new features of OpenGL 3.2 and NVIDIA's extensions beyond 3.2 such as bindless graphics, direct state access, separate shader objects, copy image, texture barrier, and Cg 2.2.
NVIDIA OpenGL and Vulkan Support for 2017Mark Kilgard
Learn how NVIDIA continues improving both Vulkan and OpenGL for cross-platform graphics and compute development. This high-level talk is intended for anyone wanting to understand the state of Vulkan and OpenGL in 2017 on NVIDIA GPUs. For OpenGL, the latest standard update maintains the compatibility and feature-richness you expect. For Vulkan, NVIDIA has enabled the latest NVIDIA GPU hardware features and now provides explicit support for multiple GPUs. And for either API, NVIDIA's SDKs and Nsight tools help you develop and debug your application faster.
NVIDIA booth theater presentation at SIGGRAPH in Los Angeles, August 1, 2017.
http://www.nvidia.com/object/siggraph2017-schedule.html?id=sig1732
Get your SIGGRAPH driver release with OpenGL 4.6 and the latest Vulkan functionality from
https://developer.nvidia.com/opengl-driver
The past few years have seen a sharp increase in the complexity of rendering algorithms used in modern game engines. Large portions of the rendering work are increasingly written in GPU computing languages, and decoupled from the conventional “one-to-one” pipeline stages for which shading languages were designed. Following Tim Foley’s talk from SIGGRAPH 2016’s Open Problems course on shading language directions, we explore example rendering algorithms that we want to express in a composable, reusable and performance-portable manner. We argue that a few key constraints in GPU computing languages inhibit these goals, some of which are rooted in hardware limitations. We conclude with a call to action detailing specific improvements we would like to see in GPU compute languages, as well as the underlying graphics hardware.
This talk was originally given at SIGGRAPH 2017 by Andrew Lauritzen (EA SEED) for the Open Problems in Real-Time Rendering course.
Taking Killzone Shadow Fall Image Quality Into The Next GenerationGuerrilla
This talk focuses on the technical side of Killzone Shadow Fall, the platform exclusive launch title for PlayStation 4.
We present the details of several new techniques that were developed in the quest for next generation image quality, and the talk uses key locations from the game as examples. We discuss interesting aspects of the new content pipeline, next-gen lighting engine, usage of indirect lighting and various shadow rendering optimizations. We also describe the details of volumetric lighting, the real-time reflections system, and the new anti-aliasing solution, and include some details about the image-quality driven streaming system. A common, very important, theme of the talk is the temporal coherency and how it was utilized to reduce aliasing, and improve the rendering quality and image stability above the baseline 1080p resolution seen in other games.
Presented as a pre-conference tutorial at the GPU Technology Conference in San Jose on September 20, 2010.
Learn about NVIDIA's OpenGL 4.1 functionality available now on Fermi-based GPUs.
A technical deep dive into the DX11 rendering in Battlefield 3, the first title to use the new Frostbite 2 Engine. Topics covered include DX11 optimization techniques, efficient deferred shading, high-quality rendering and resource streaming for creating large and highly-detailed dynamic environments on modern PCs.
Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator...Akihiro Hayashi
Third Workshop on Accelerator Programming Using Directives (WACCPD2016, co-located with SC16)
While GPUs are increasingly popular for high-performance
computing, optimizing the performance of GPU programs is a time-consuming and non-trivial process in general. This complexity stems from the low abstraction level of standard
GPU programming models such as CUDA and OpenCL:
programmers are required to orchestrate low-level operations
in order to exploit the full capability of GPUs. In terms of
software productivity and portability, a more attractive approach
would be to facilitate GPU programming by providing high-level
abstractions for expressing parallel algorithms.
OpenMP is a directive-based shared memory parallel programming model and has been widely used for many years.
From OpenMP 4.0 onwards, GPU platforms are supported
by extending OpenMP’s high-level parallel abstractions with
accelerator programming. This extension allows programmers to
write GPU programs in standard C/C++ or Fortran languages,
without exposing too many details of GPU architectures.
However, such high-level parallel programming strategies generally impose additional program optimizations on compilers,
which could result in lower performance than fully hand-tuned
code with low-level programming models.To study potential
performance improvements by compiling and optimizing high-level GPU programs, in this paper, we 1) evaluate a set of
OpenMP 4.x benchmarks on an IBM POWER8 and NVIDIA
Tesla GPU platform and 2) conduct a comparable performance
analysis among hand-written CUDA and automatically-generated
GPU programs by the IBM XL and clang/LLVM compilers.
This presentation made at TI Developer Conference 2008, introduces the options available for developers to create User Interfaces on TI SGX based platforms.
Your Game Needs Direct3D 11, So Get Started Now!Johan Andersson
Direct3D 11 will have tessellation for smoother curves and finer details. The new compute shader will make postprocessing faster and easier. You'll need Direct3D 11 to have the best graphics, and this talk will show you how you can get started using current generation hardware.
FGS 2011: Making A Game With Molehill: Zombie Tycoonmochimedia
Luc Beaulieu and Jean-Philipe Auclair from Frima Studio share their experience working with Adobe's new Molehill API's in making their new game "Zombie Tycoon".
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Planning Of Procurement o different goods and services
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead
1. Siggraph Asia 2014
Tristan Lorach
Manager of Devtech for Professional Visualization group
OPENGL NVIDIA "COMMAND-LIST": "APPROACHING ZERO DRIVER OVERHEAD"
2. 2
Siggraph Asia 2014
GPU Scalability GPUs are extremely powerful But only if used correctly Mix of Application and Graphic API (Driver) responsibility Increase amount of work per batch (Job) Minimize CPUGPU interactions Lower Memory traffic Lower API calls: Batch things together Factorize Data Re-use data uploaded to Video memory (instancing…)
3. 3
Siggraph Asia 2014
Use of the GPU Games ~1500 to 3000 Drawcalls for a scene Intensive use of Multi-layer image processing over the scene Heavy shaders CAD/FCC/professional applications Hard to batch user’s works (CAD applications) ~10,000 to… 300,000 Drawcalls for a scene: heavy CPU workload for our driver Shaders simpler than games (but catching-up these days...) Post-Processing more and more used (SSAO)
4. 4
Siggraph Asia 2014
Use of the GPU New Graphic APIs are trying to address these concerns Khronos (Announced it at Siggraph Vancouver 2014) Metal Mantle DX12 All propose better ways to issue commands and render-states But can we improve OpenGL to go the same path ? Yes: NV_command_list
5. 5
Siggraph Asia 2014
OpenGL OpenGL is ~25 years old (!!) 1.x: fixed function 2.x: Shading language (GLSL) 3.x/4.x: more as a massive compute system More graphic stages (Geom.Shader; Tessellation) High flexibility & efficiency on memory accesses High programability Design "flaws": still “State Machine”; ServerClient; poor multi-threading management
1992
2012
6. 6
Siggraph Asia 2014
Challenge of Issuing Commands Issuing drawcalls and state changes can be a real bottleneck
650,000 Triangles
68,000 Parts
~ 10 Triangles per part
3,700,000 Triangles
98 000 Parts
~ 37 Triangles per part
14,338,275 Triangles/lines
300,528 drawcalls (parts)
~ 48 Triangles per part
App + driver
GPU
GPU
idle
CPU
Excessive Work from App & Driver On CPU
!
!
courtesy of PTC
9. 9
Siggraph Asia 2014
Application
Command-list
Token-buffer
OpenGL
Driver
Big Picture
Vertex Puller (IA)
Vertex Shader
TCS (Tessellation)
TES (Tessellation)
Tessellator
Geometry Shader
Transform Feedback
Rasterization
Fragment Shader
Per-Fragment Ops
Framebuffer
Tr. Feedback buffer
Uniform Block
Texture Fetch
Image Load/Store
Atomic Counter
Shader Storage
Element buffer (EBO)
Draw Indirect Buffer
Vertex Buffer (VBO)
Front-End
(decoder)
Cmd bundles
Push-Buffer
(FIFO)
cmds
FBO resources
(Textures / RB)
OpenGL
Commands
Token-buffers
+ state objects
resources
64 bits
Pointers
(bindless)
State Object
More work for
FE – but fast !
10. 10
Siggraph Asia 2014
Demo – Quadro K5000 CAD Car model 14,338,275 Primitives 300,528 drawcalls 348,862 attribute update 12,004 uniform update Set of CAD models together 29,344,075 Primitives 26,144 drawcalls 18,371 attribute update 13,632 uniform update
11. 11
Siggraph Asia 2014
DemoS – Quadro K5000 CAD Car model K5000: 920,363,200 primitives/S 19,290,668 drawcalls/S
12. 12
Siggraph Asia 2014
DemoS – Quadro K5000
Set of CAD models together K5000 1,211,684,096 primitives/S 1,079,545 drawcalls/S
13. 13
Siggraph Asia 2014
DemoS – Maxwell CAD Car model K5000: 920,363,200 primitives/S 19,290,668 drawcalls/S Maxwell GM204 (GeForce): 1,782,257,920 primitives/S 37,355,848 drawcalls/S Drawcall time : 8 Micro- seconds on CPU (!) Set of CAD models together K5000 1,211,684,096 primitives/S 1,079,545 drawcalls/S Maxwell GM204 (GeForce) 3,012,795,392 primitives/S 2,684,239 drawcalls/S Drawcall time: 42 Micro- seconds on CPU
!
14. 14
Siggraph Asia 2014
More Performances Another test: always using bindless UBO and VBO CPU-Emulation: Token stream executed as CPU "switch-case" Strategies Un-grouped: ~ 1 MB, 79,000 Tokens, ~68,000 drawcalls Material-grouped: ~300 KB, 22,000 Tokens, ~11,000 drawcalls
Strategy
Draw time Material-grouped
0.73
HW TOKEN-Buffers
0.56
Technique
GPU (ms)
1.4 1.14 x
~0 BIG x
CPU (ms)
Draw time un-grouped
2.4
1.1
GPU (ms)
4.5 1.6 x
~0 BIG x
CPU (ms)
CPU-emulated TOKEN-buffers
glBindBufferRange + drawcalls
0.73
1.6
3.5
7.2
Preliminary results K6000
15. 15
Siggraph Asia 2014
More Performances 5 000 shader changes: toggling between two shaders in „shaded & edges“ 5 000 fbo changes: similar as above but with fbo toggle instead of shader Almost no additional cost compared to rendering without fbo changes
Timing
GPU (ms)
CPU (ms)
CPU-emulated TOKEN-buffers
12.7
15.1
2.9 4.3 x
0.4 37 x
HW TOKEN-buffers
2.8 4.5 x
0.005 BIG x
Command-Lists
Timer
GPU (ms)
CPU (ms)
CPU-emulated TOKEN-buffers
60.0
60.0
1.8 33 x
0.9 66 x
HW TOKEN-buffers
1.7 35 x
0.022 BIG x
Command-Lists
Preliminary results on K5000
16. 16
Siggraph Asia 2014
GPU Virtual Memory
Bindless Technology What is it about? Work from native GPU pointers/handles (NVIDIA pioneered this technology) A lot less CPU work (memory hopping, validation...) Allow GPU to use flexible data structures Bindless Buffers Vertex & Global memory since Tesla Generation (CUDA capable) Bindless Textures Since Kepler Bindless Constants (UBO) New driver feature, support for Fermi and above Bindless plays a central role for Command-List
Vertex Puller (IA)
Vertex Shader
TCS (Tessellation)
TES (Tessellation)
Tessellator
Geometry Shader
Transform Feedback
Rasterization
Fragment Shader
Per-Fragment Ops
Framebuffer
Uniform Block
Texture Fetch
Element buffer (EBO)
Vertex Buffer (VBO)
Front-End (decoder)
Push buffer
64 bits address
Attr.s
Idx
Uniform Block
Send Ptrs
17. 17
Siggraph Asia 2014
#define UBA UNIFORM_BUFFER_ADDRESS_NV UpdateBuffers(); glEnableClientState(UNIFORM_BUFFER_UNIFIED_NV); glBufferAddressRangeNV (UBA, 0, addrView, viewSize); foreach (obj in scene) { ... // glBindBufferRange (UBO, 1, uboMatrices, obj.matrixOffset, maSize); glBufferAddressRangeNV(UBA, 1, addrMatrices + obj.matrixOffset, maSize); foreach ( batch in obj.primitive_group_material) { // glBindBufferRange (UBO, 2, uboMaterial, batch.materialOffset, mtlSize); glBufferAddressRangeNV(UBA, 2, addrMaterial + batch.materialOffset, mtlSize); ... } }
Example On Using Bindless UBO
New pointer for UBO#1 updated per Object
New pointer for UBO#2 updated per Primitive group
pointer for UBO#0 updated once for all objects
regular UBO binds are now ignored!
18. 18
Siggraph Asia 2014 Pointers must be aligned So do Ptr Offsets glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &offsetAlignment); Normally: 256 bytes
gaps between each item
Try to fit as much data as possible… Not the case if passing an index for array access (MDI + “Base-instance”) But requires special GLSL code (indexing)
Bindless And Memory Alignement
GPU
Virtual
Memory
64 bits address
Uniform Material 0
Uniform Material 1
Uniform Material 2
…
glBufferAddressRangeNV (UBA, 2, addrMaterial + batch.materialOffset , mtlSize);
256b
Uniform array materials[]
Material 0
Material 1
Material 2
…
Uniform …
19. 19
Siggraph Asia 2014
NV_command_list Key Concepts
1.Tokenized Rendering: some state changes and draw commands are encoded into binary data stream Depends on bindless technology
2.State Objects Whole OpenGL States (program, blending...) captured as an object Allows pre-validation of state combinations, later reuse of objects is very fast
3.Command List Object „Display-list“ paradigm but more flexible: buffer are outside (referenced by address), so content can still be modified (matrics, vertices...)
20. 20
Siggraph Asia 2014
UpdateBuffers();
glBindBufferBase (UBO, 0, uboView);
foreach (obj in scene) {
// redundancy filter for these (if (used != last)...
glBindVertexBuffer (0, obj.geometry->vbo, 0, vtxSize);
glBindBuffer (ELEMENT, obj.geometry->ibo);
glBindBufferRange (UBO, 1, uboMatrices, obj.matrixOffset, maSize);
// iterate over cached material groups
foreach ( batch in obj.materialGroups) {
glBindBufferRange (UBO, 2, uboMaterial, batch.materialOffset, mtlSize);
glMultiDrawElements (...);
}
}
Typical Scene Drawing Example
~ 2 500 api drawcalls
~11 000 drawcalls
~55 triangles per call
21. 21
Siggraph Asia 2014
UpdateBuffers();
glBindBufferBase (UBO, 0, uboView);
foreach (obj in scene) {
// redundancy filter for these (if (used != last)...
glBindVertexBuffer (0, obj.geometry->vbo, 0, vtxSize);
glBindBuffer (ELEMENT, obj.geometry->ibo);
glBindBufferRange (UBO, 1, uboMatrices, obj.matrixOffset, maSize);
// iterate over all parts individually
foreach ( part in obj.parts) {
if (part.material != lastMaterial){
glBindBufferRange (UBO, 2, uboMaterial, part.materialOffset, mtlSize);
}
glDrawElements (...);
}
}
Typical Scene Drawing Example
~68 000 drawcalls ~10 triangles per call
22. 22
Siggraph Asia 2014
Scene Drawing Converted To Token Buffer
glDrawCommandsNV (GL_TRIANGLES, tokenBuffer , offsets[], sizes[], count); // {0}, {bufferSize}, 1
foreach (obj in scene) {
glBufferAddressRangeNV(VERTEX.., 0, obj.geometry->addrVBO, ...);
glBufferAddressRangeNV(ELEMENT..., 0, obj.geometry->addrIBO, ...);
glBufferAddressRangeNV(UNIFORM..., 1, addrMatrices ...);
foreach ( batch in obj.materialGroups) {
glBufferAddressRangeNV(UNIFORM, 2, addrMaterial ...);
glMultiDrawElements(...)
}
}
becomes a single drawcall
replaces 80k calls to GL
(for graphic-card model)
Token buffer
Set Attr#0 on VBO address …
Set Attr#1 on VBO address …
Set Elements on EBO address …
Uniform Matrix on UBO address …
Uniform Material
on UBO address …
DrawElements
Uniform Material
on UBO address …
DrawElements
Obj#1
Obj#2
…
For all objects
…
23. 23
Siggraph Asia 2014
Tokens-buffers are tightly packed structs in linear memory
Token Buffer Structures
TokenUbo
{
GLuint header;
UniformAddressCommandNV
{
GLushort index;
GLushort stage;
GLuint64 address;
} cmd;
}
TokenVbo { GLuint header; AttributeAddressCommandNV { GLuint index; GLuint64 address; } cmd; }
TokenIbo
{
GLuint header;
ElementAddressCommandNV
{
GLuint64 address;
GLuint typeSizeInByte;
} cmd;
}
TokenDrawElements
{
Gluint header;
DrawElementsCommandNV
{
GLuint count;
GLuint firstIndex;
GLuint baseVertex;
} cmd;
}
Token buffer
Set Attr#0 on VBO address …
Set Attr#1 on VBO address …
Set Elements on EBO address …
Uniform Matrix
on UBO address …
Uniform Material on UBO address …
DrawElements
Uniform Material
on UBO address …
DrawElements
…
Uniform Material
on UBO address …
DrawElements
Uniform Material
on UBO address …
DrawElements
Uniform Material
on UBO address …
DrawElements
Uniform Material
on UBO address …
DrawElements
Uniform Material
on UBO address …
DrawElements
24. 24
Siggraph Asia 2014
Tokenized Rendering
Token Headers are GPU commands, followed by arguments
FRONTFACE_COMMAND_NV
BLEND_COLOR_COMMAND_NV
STENCIL_REF_COMMAND_NV
LINE_WIDTH_COMMAND_NV
POLYGON_OFFSET_COMMAND_NV
ALPHA_REF_COMMAND_NV
VIEWPORT_COMMAND_NV
SCISSOR_COMMAND_NV
ELEMENT_ADDRESS_COMMAND_NV
ATTRIBUTE_ADDRESS_COMMAND_NV
UNIFORM_ADDRESS_COMMAND_NV
DRAW_ELEMENTS_COMMAND_NV
DRAW_ARRAYS_COMMAND_NV
DRAW_ELEMENTS_STRIP_COMMAND_NV
DRAW_ARRAYS_STRIP_COMMAND_NV
DRAW_ELEMENTS_INSTANCED_COMMAND_NV
DRAW_ARRAYS_INSTANCED_COMMAND_NV
TERMINATE_SEQUENCE_COMMAND_NV
NOP_COMMAND_NV
Final list might change
Render States
Index Buffer;
Position Attribute; Normals; Texcoords...
Matrices; Material data...
Draw Commands !
Special Commands (see later)
25. 25
Siggraph Asia 2014
Tokenized Rendering
Example: UBO Address assignment structure
FRONTFACE_COMMAND_NV
BLEND_COLOR_COMMAND_NV
STENCIL_REF_COMMAND_NV
LINE_WIDTH_COMMAND_NV
POLYGON_OFFSET_COMMAND_NV
ALPHA_REF_COMMAND_NV
VIEWPORT_COMMAND_NV
SCISSOR_COMMAND_NV
ELEMENT_ADDRESS_COMMAND_NV
ATTRIBUTE_ADDRESS_COMMAND_NV
UNIFORM_ADDRESS_COMMAND_NV
DRAW_ELEMENTS_COMMAND_NV
DRAW_ARRAYS_COMMAND_NV
DRAW_ELEMENTS_STRIP_COMMAND_NV
DRAW_ARRAYS_STRIP_COMMAND_NV
DRAW_ELEMENTS_INSTANCED_COMMAND_NV
DRAW_ARRAYS_INSTANCED_COMMAND_NV
TERMINATE_SEQUENCE_COMMAND_NV
NOP_COMMAND_NV
TokenUbo
{
GLuint header;
UniformAddressCommandNV
{
GLushort index;
GLushort stage;
GLuint64 address;
} cmd;
}
Example
Real Token to be queried
header = glGetCommandHeaderNV(
UNIFORM_ADDRESS_COMMAND_NV, Sizeof(TokenUbo) )
One Shading stage to target (not ‘program’)
Stage = glGetStageIndexNV(S)
(S: STAGE_VERTEX|GEOMETRY|FRAGMENT|TESS_CONTROL/EVAL)
Hardware Index
*not* Uniform Program index
26. 26
Siggraph Asia 2014
TOKENIZED RENDERING Allows rendering of a mix of LIST, STRIP, FAN, LOOP modes in one go Pass „basic type“ as Token, use „mode“ for more in "Instanced" drawing
TokenDrawElements
{
Gluint header;
DrawElementsCommandNV
{
GLuint count;
GLuint firstIndex;
GLuint baseVertex;
} cmd;
}
TokenDrawElementsInstanced
{
Gluint header;
DrawElementsInstancedCommandNV
{
GLuint mode;
GLuint count;
GLuint instanceCount;
GLuint firstIndex;
GLuint baseVertex;
GLuint baseInstance;
} cmd;
}
LIST, STRIP,
FAN, LOOP types
(GL_LINE_LOOP...)
DRAW_ELEMENTS_COMMAND_NV
DRAW_ARRAYS_COMMAND_NV
DRAW_ELEMENTS_STRIP_COMMAND_NV
DRAW_ARRAYS_STRIP_COMMAND_NV
glGetCommandHeaderNV()
DRAW_ELEMENTS_INSTANCED_COMMAND_NV
DRAW_ARRAYS_INSTANCED_COMMAND_NV
glGetCommandHeaderNV()
27. 27
Siggraph Asia 2014
TERMINATE_SEQUENCE: Occlusion Culling Now overcomes deficit of previous methods http://on-demand.gputechconf.com/gtc/2014/presentations/S4379-opengl-44-scene-rendering- techniques.pdf Enqueue all tokens for entire object Shader to Check Visibility; Shader for SCAN operation; Add TERMINATE_SEQUENCE_COMMAND_NV to end
Technique
Draw time GROUPED
Timing
GPU (ms)
CPU (ms)
6.3 2.6 x
0.9 6 x
CULL Old Bindless MDI (*)
CULL NEW TOKEN native
0.2 27 x
glBindBufferRange + drawcalls
16.8
5.4
6.2 2.7 x
6.3 0.85 x
CULL Old Readback (stalls pipe) (*)
K5000, Preliminary results, (*) taken from slightly differnt framework
HW TOKEN-buffers
~0 BIG x
No instancing!
(synthetic test, data replication)
SCENE:
materials: 138
objects: 13,032
geometries: 3,312
parts: 789,464
triangles: 29,527,840
vertices: 27,584,376
16.8
5.3 3.1 x
28. 28
Siggraph Asia 2014 Just a little... “commandBindableNV” turns binding IDs to Hardware IDs Texturing must go through bindless
ARB_bindless_texture Regular Texture binding Turned Off Changing UBO Ptr Addresses == texture changes (here ‘myBuf’)
Does Shaders Need Special Code ?
#extension GL_ARB_bindless_texture : require #extension GL_NV_command_list : enable #if GL_NV_command_list
layout(commandBindableNV) uniform; #endif layout(std140,binding=2) uniform myBuf {
sampler2D mySampler;
//or:
in int whichSampler;
}; layout(std140,binding=1) uniform samplers { sampler2D allSamplers[NUM_TEXTURES]; };
…
c = texture(mySampler, tc);
//or
c = texture(allSamplers [ whichSampler ] , tc);
…
Example:
29. 29
Siggraph Asia 2014
State Objects StateObject Encapsulates majority of state (fbo format, active shader, blend, depth ...) State Object immutable: gives more control over validation cost no bindings captured: bindless used instead; textures passed via UBO Primitive type is a state: participates to important validation work
Cannot be put in the token buffer
glCaptureStateNV(stateobject, GL_TRIANGLES );
Primitive type as argument:
because it is part of states
30. 30
Siggraph Asia 2014 Render entire scenes with different shaders/fbos... in one go
Driver caches state transitions !
Draw With State Objects
glDrawCommandsStatesNV(
tokenBuffer,
offsets[], sizes[],
states[], fbos[],
count);
Token buffer
VBO address …
VBO address …
EBO address …
Uniform Matrix
Uniform Material
DrawElements
Uniform Material
DrawElements
…
EBO address …
Uniform Matrix
Uniform Material
DrawElements
Uniform Material
DrawElements
FBO 1
FBO 2
fbos[]
FBO 1
FBO 1
State Obj 1
State Obj 2
State Obj 1
State Obj 4
Count = 4
States[]
offsets[]
sizes[0]
sizes[1]
sizes[2]
sizes[3]
31. 31
Siggraph Asia 2014 Driver bakes „Diff“ between states and apply only the difference
Pseudo Code of glDrawCommandsStatesNV
for i < count {
if (i == 0) set state from states[i];
else set state transition states[i-1] to states[i]
if (fbo[i]) {
// fbo[i] must be compatible with states[i].fbo
glBindFramebuffer( fbo[i] )
} else glBindFramebuffer( states[i].fbo )
ProcessCommandSequence(...tokenBuffer, offsets[i], sizes[i])
}
32. 32
Siggraph Asia 2014 Rendering must always happen in a FBO Backbuffer Ptr. Address is un-reliable (dynamic re-alloc when resizing…) Use glBlitFramebuffer() for FBO final Backbuffer result FBO Resources Must be Resident (glMakeTextureHandleResidentARB()) Resources can be replaced/swapped if they keep the same base configuration (attachment formats, # drawbuffers...) I.e. Same configuration as captured FBO settings from stateobject This is what happens when resizing:
Detach & Destroy resource create new ones attach them to FBO(s)
Frame-Buffer Objects And State Objects
33. 33
Siggraph Asia 2014 Combine multiple token buffers & states into a CommandList object glCreateCommandListsNV(N, &list) glCommandListSegmentsNV(list, segs) Convenient for concatenation glListDrawCommandsStatesClientNV(list, segment, token-buffer-ptrs, token-buffer- sizes, states, FBOs, count) glCompileCommandListNV(list); glCallCommandListNV(list)
Command-List Object
System mem.
VBO address …
VBO address …
EBO address …
Uniform Matrix
Uniform Material
DrawElements
…
Uniform Material
DrawElements
FBO
State Object
System mem.
VBO address …
VBO address …
EBO address …
Uniform Matrix
Uniform Material
DrawElements
…
FBO
State Object
System mem.
VBO address …
VBO address …
Uniform Matrix
Uniform Material
DrawArrays
…
Uniform Material
DrawArrays
FBO
State Object
…
Command-List Segment #0 Segment #1
34. 34
Siggraph Asia 2014
Command-List Object Less flexibilty compared to token buffer Token buffers taken from client pointers (CPU memory) Driver will optimize these token buffers and related state-objects Optimize State transitions; token cmds for the GPU Front-End Resulting command-list is immutable If any resource pointer changed, the command-list must be recompiled But still rather flexible possible to change content of referenced buffers! (matrices, materials, vertices...) Animation; skinning... No need for command-list recompilation
35. 35
Siggraph Asia 2014 Command-List doesn’t pretend to solve any OpenGL multi-threading Multi-threading can help if complex CPU work going-on update of Vertex / Element buffers: transform. Matrices; skinning… Update of Token Buffers: adding/removing characters in a game scene… OpenGL & Multithreading issues: OpenGL is bad at sharing its access from various threads Make-Current or Multiple Contexts aren’t good solutions Prefer to keep context ownership to one main thread
OpenGL And Multi-threading ?
36. 36
Siggraph Asia 2014 Token Buffers and Vertex/Elements/Uniform Buffers (VBO/UBO) Token Buffer Objects; VBOs and UBOs owned by OpenGL Multi-threading can be used for data update if: Work on CPU system memory; then push data back to OpenGL (glBufferSubData) Or Request for a pointer (glMapBuffers); then work with it from the thread State Objects: State Object and their 'Capture' must be handled in OpenGL context No way to do it on a separate threads that don’t have OpenGL context
Multi-threading And Command-List
37. 37
Siggraph Asia 2014
Multi-threading Use-Case Example #1
Thread #0
OpenGL Ctxt
Thread #1
Thread #1
Submit
States
PushAttrib()
set states & and capture
PopAttribs()
Get State ID
PushAttrib()
set states & and capture
PopAttribs()
Submit States
unmap Token buf.
ask for
Token buf.
Ptr
Build state list
ask for
Token buf.
Ptr
unmap
Token buf.
Build token cmds
Get State ID
Build token cmds
Build token cmds
ask for
Token buf.
Ptr
unmap Token buf.
ask for
Token buf.
Ptr
...
ask for
Token buf.
Ptr
Build token
cmds
unmap Token buf.
Build token cmds
Build state
list
38. 38
Siggraph Asia 2014
Multi-threading Use-Case Example #2
Split state capture from Token-buffer creation
Thread 1
Thread 2
Thread 3
Build Token buffers
in CPU memory
Thread 0
OpenGL
Ctxt
Walk through whole scene
and build State-Objects
Receive Token-Buffers
and issue them
as Token Buffer Objects
to OpenGL
39. 39
Siggraph Asia 2014
Migration Steps All rendering via FBO (statecapture doesn‘t support default backbuffer) No legacy states in GLSL use custom uniforms no built-in uniforms (gl_ModelView...) use glVertexAttribPointer (no glTexCoordPointer etc.) No classic uniforms, pack them all in UBO ARB_bindless_texture for texturing Bindless Buffers ARB_vertex_attrib_binding combined with NV_vertex_buffer_unified_memory (VBUM)... Organize for StateObject reuse state capture expensive: Capture them at init. time; Factorize their use in the scene
40. 40
Siggraph Asia 2014
Migration Tips Vertex Attributes and bindless VBO http://on-demand.gputechconf.com/siggraph/2014/presentation/SG4117- OpenGL-Scene-Rendering-Techniques.pdf (slide 11-16) GLSL
// classic attributes
...
normal = gl_Normal;
gl_Position = gl_Vertex;
// generic attributes
// ideally share this definition across C and GLSL
#define VERTEX_POS 0
#define VERTEX_NORMAL 1
in layout(location= VERTEX_POS) vec4 attr_Pos;
in layout(location= VERTEX_NORMAL) vec3 attr_Normal;
...
normal = attr_Normal;
gl_Position = attr_Pos;
41. 41
Siggraph Asia 2014
Migration Tips UBO Parameter management http://on-demand.gputechconf.com/siggraph/2014/presentation/SG4117- OpenGL-Scene-Rendering-Techniques.pdf ( 18-27, 44-47) Ideally group by frequency of change
// classic uniforms
uniform samplerCube viewEnvTex;
uniform vec4 materialColor;
uniform sampler2D materialTex;
...
// UBO usage, bindless texture inside UBO, grouped by change layout(std140, binding=0, commandBindableNV) uniform view { samplerCube texViewEnv; }; layout(std140, binding=1, commandBindableNV) uniform material { vec4 materialColor; sampler2D texMaterialColor; }; ...
42. 42
Siggraph Asia 2014 OpenGL Devtech 'Proviz' samples soon available on GitHub ! check https://github.com/nvpro-samples on a regular basis working on ramping-up https://developer.nvidia.com/samples & https://developer.nvidia.com/professional-graphics command-lists examples to come here this Dec. 2014 Many other samples will be pushed here as standalone repositories Beta Driver for Command-Lists: Release 346 for Dec.2014 Special Thanks to Pierre Boudier and Christoph Kubisch Questions/Feedback: tlorach@nvidia.com / ckubisch@nvidia.com
Thanks!
Editor's Notes
New APIs are meant to increase the efficiency of drivers by lowering their cost on CPU and thus shifting the workload over the GPU. This can translate to a better use of the GPU. But it can also translate to simply lowering the pressure over the CPU.