These slides cover the techniques used to port a large, commercial AAA engine from Windows to Linux, along with the lessons learned and the pitfalls we ran into, as a warning to other developers.
4. Why port?
Linux is open
Linux (for gaming) is growing, and quickly
Stepping stone to mobile
Performance
Steam for Linux
% December January February
Windows 94.79 94.56 94.09
Mac 3.71 3.56 3.07
Linux 0.79 1.12 2.01
[Chart: share by platform (Linux, Mac, Windows), Nov to Feb, percentage axis]
5. Why port? – cont’d
GL exposes functionality by hardware capability, not by OS.
China tends to have equivalent GPUs, but overwhelmingly still runs XP
OpenGL can expose DX10/DX11 (and beyond) features to all of those users
6. Why port? – cont’d
Specifications are public.
GL is owned by committee; membership is available to anyone with interest (and some, but not a lot, of $).
GL can be extended quickly, starting with a single vendor.
GL is extremely powerful
8. Windowing issues
Consider SDL!
Handles all cross-platform windowing issues, including on mobile OSes.
Tight C implementation—everything you need, nothing you don’t.
Used for all Valve ports, and Linux Steam
http://www.libsdl.org/
9. Filesystem issues
Linux filesystems are case-sensitive
Windows is not
Not a big issue for deployment (because everyone ships packs of some sort)
But an issue during development, with loose files
Solution 1: Slam all assets to lower case, including directories, then tolower all file lookups (only adjust below root)
Solution 2: Build file cache, look for similarly named files
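The two solutions above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: `NormalizeBelowRoot` and `FileCache` are hypothetical names, and converting backslashes to forward slashes is an extra assumption about Windows-style paths in the calling code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Lowercase a string byte by byte (ASCII only; an assumption of this sketch).
static std::string ToLowerCopy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Solution 1: assets on disk are already lowercased, so lowercase every
// lookup too. Only the part below the root is adjusted; the root is kept.
std::string NormalizeBelowRoot(const std::string& root, const std::string& relative) {
    std::string lowered = ToLowerCopy(relative);
    // Assumption: callers may still pass Windows-style separators.
    std::replace(lowered.begin(), lowered.end(), '\\', '/');
    if (root.empty()) return lowered;
    return root.back() == '/' ? root + lowered : root + "/" + lowered;
}

// Solution 2: remember the real on-disk spelling of each file, keyed by its
// lowercased name, and resolve requests through that table.
class FileCache {
public:
    void Add(const std::string& onDiskPath) {
        cache_[ToLowerCopy(onDiskPath)] = onDiskPath;
    }
    // Returns the on-disk spelling, or an empty string when nothing matches.
    std::string Resolve(const std::string& requested) const {
        auto it = cache_.find(ToLowerCopy(requested));
        return it == cache_.end() ? std::string() : it->second;
    }
private:
    std::unordered_map<std::string, std::string> cache_;
};
```

In practice the cache would be filled by walking the asset directory once at startup; that walk is left out here to keep the sketch short.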
10. Other issues
Bad Defines
E.g. Assuming that LINUX meant DEDICATED_SERVER
Locale issues
locale can break printf/scanf round-tripping
Solution: Set locale to en_US.utf8, handle internationalization internally
One problem: Not everyone has en_US.utf8, so pop up a warning in that case.
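A minimal sketch of the locale fix described above. Under a locale whose decimal separator is a comma (de_DE, for example), printf writes "1,5" and strtod stops at the '.' in "1.5", so data no longer round-trips. The function names are hypothetical, and falling back to the "C" locale after printing the warning is an assumption of this sketch; the slide only says to warn.

```cpp
#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <string>

// Try the locale named on the slide. Returns true when it was available.
// Assumption: when it is missing, warn and fall back to the "C" locale,
// which also uses '.' as the decimal separator.
bool SetEngineLocale() {
    if (std::setlocale(LC_ALL, "en_US.utf8") != nullptr) return true;
    std::fprintf(stderr, "warning: locale en_US.utf8 is not installed; using \"C\"\n");
    std::setlocale(LC_ALL, "C");
    return false;
}

// Format a number with printf-style formatting under the current locale.
std::string FormatFloat(double value) {
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "%g", value);
    return buffer;
}

// Parse a number under the current locale.
double ParseFloat(const std::string& text) {
    return std::strtod(text.c_str(), nullptr);
}
```

Calling `SetEngineLocale()` once at startup, before any config or save data is read, keeps every later printf/scanf call on the same decimal separator.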
11. More Other Issues
Font
Consider freetype and fontconfig
Translating font sizes to Linux still takes work
RDTSC (use clock_gettime(CLOCK_MONOTONIC) instead)
Raw Mouse input
Great, but some window managers also grab the keyboard
This breaks alt-tab. Grr.
Multi-monitor is less polished than Windows
SDL mostly handles this for you
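For the RDTSC item on this slide, a timer built on `clock_gettime(CLOCK_MONOTONIC)` might look like the sketch below. The helper names are hypothetical; only `clock_gettime` and `CLOCK_MONOTONIC` come from the slide.

```cpp
#include <cstdint>
#include <time.h>

// Nanoseconds since an arbitrary fixed starting point. CLOCK_MONOTONIC is
// not affected by wall-clock adjustments, so it is safe for measuring
// intervals.
std::uint64_t MonotonicNanoseconds() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull +
           static_cast<std::uint64_t>(ts.tv_nsec);
}

// Convert a pair of readings into elapsed milliseconds.
double ElapsedMilliseconds(std::uint64_t startNs, std::uint64_t endNs) {
    return static_cast<double>(endNs - startNs) / 1.0e6;
}
```

A frame timer would take one reading at the start of the frame, one at the end, and pass both to `ElapsedMilliseconds`.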
13. Steam Linux Runtime (and SDK)
Runtime provides binary compatibility across many Linux distros for end users
SDK has everything you’ll need to target the runtime in one convenient set of packages
Debug versions available, too
For both developers and end users
http://media.steampowered.com/client/runtime/steam-runtime-sdk_latest.tar.xz
https://github.com/ValveSoftware/steam-runtime
14. Tools – CPU Compilation/Debug
Compilation / Debug
gcc – compilation
gdb – debugging from 1970
cgdb – debugging from 2000
ldd – dumpbin for linux
nm – for symbol information
objdump – disassembler / binary details
readelf – more details about binaries
make – no, really
We’ll talk about GPU Debug tools later
15. Tools – CPU Perf analysis
perf – free sampling profiler
vtune – Intel’s tool works on Linux, too!
Telemetry – You’re using this already, right?
Again, we’ll talk about GPU perf tools later
16. Telemetry
Telemetry is a performance visualization system on steroids, created by RAD Game Tools.
Very low overhead (so you can leave it on all through development)
Quickly identify long frames
Then dig into guts of that frame
21. OpenGL Support
[Chart, Sep 2011 to Feb 2013, with bands labeled D3D9, D3D10, and D3D11. Series: D3D11 GPU / D3D11 Capable OS; D3D10 GPU / D3D10 Capable OS; D3D10 GPU / D3D9 Capable OS; D3D9 (or below) GPU / All OSes.]
22. togl
“to GL”
A D3D9/10/11 implementation using
OpenGL
In application, using a DLL.
Engine code is overwhelmingly
(99.9%) unaware of which API is
being used—even rendering.
Source Engine
Matsys Shaderlib ShaderAPI
Direct3D
GPU
23. togl
Perf was a concern, but not a problem—this stack beats the
shorter stack by ~20% in apples:apples testing.
Source Engine
Matsys Shaderlib ShaderAPI
“CDirect3D9” (togl)
OpenGL
GPU
25. GL / D3D differences
GL has thread local data
A thread can have at most one Context current
A Context can be current on at most one thread
Calls into the GL from a thread that has no current Context are specified
to “have no effect”
MakeCurrent affects relationship between current thread and a Context.
[Diagram: example pairings of Contexts and Threads]
26. GL / D3D differences
GL is C based, objects referenced by handle
Many functions don’t take a handle at all, act on currently selected object
Handle is usually a GLuint.
GL supports extensions
GL is chatty, but shockingly efficient.
Do not judge a piece of code by the number of function calls.
Profile, profile, profile!
GL doesn’t suffer lost devices
27. GL extensions
NV|AMD|APPLE extensions are vendor specific (but may still be
supported cross-vendor)
Ex: NV_bindless_texture
EXT are multi-vendor specs
Ex: EXT_separate_shader_objects
ARB are ARB-approved
Ex: ARB_multitexture
Core extensions
A core feature from a later GL version exposed as an extension to an earlier GL
version.
Platform extensions (WGL|GLX|AGL|EGL)
Consider GLEW or similar to wrangle extensions
http://www.opengl.org/wiki/OpenGL_Extension
28. GL tricks
When googling for GL functions, enums, etc, search with and
without the leading gl or GL_
Reading specs will make you more powerful than you can possibly
imagine
Don’t like where GL is heading? Join Khronos Group and shape
your destiny.
29. GL objects
GL has many objects: textures, buffers, FBOs, etc.
Current object reference unit is selected using a selector, then
the object is bound.
Modifications then apply to the currently bound object.
Most object types have a default object 0.
30. GL Object Model (cont’d)
// Select texture unit 3.
glActiveTexture( GL_TEXTURE0 + 3 );
// bind texture object 7, which is a 2D texture.
glBindTexture( GL_TEXTURE_2D, 7 );
// Texture object 7 will now use nearest filtering for
// minification.
glTexParameteri( GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
GL_NEAREST );
31. Core vs Compatibility
Some IHVs assert Core will be faster
No actual driver implementations have demonstrated this
Tools starting with Core, but will add Compat features as needed.
Some extensions / behaviors are outlawed by Core.
Recommendation: Use what you need.
33. EXT_direct_state_access
Common functions take an object name directly, no binding
needed for manipulation.
Code is easier to read, less switching needed.
More similar to D3D usage patterns
http://www.opengl.org/registry/specs/EXT/direct_state_access.txt
35. DSA when DSA is unavailable
DSA is a driver-only extension—hardware is irrelevant.
Write client code that assumes DSA
Provide your own DSA function(s) when DSA is unavailable
When resolving functions, use a pointer to your function if
extension is unavailable.
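void myTextureParameteriEXT( GLuint texture, GLenum target,
                             GLenum pname, GLint param)
{
    // Save the current binding. Note: this query only matches
    // GL_TEXTURE_2D; other targets need their own GL_TEXTURE_BINDING_* enum.
    GLint curTex;
    glGetIntegerv( GL_TEXTURE_BINDING_2D, &curTex );
    glBindTexture( target, texture );
    glTexParameteri( target, pname, param );
    // Restore the previous binding.
    glBindTexture( target, (GLuint)curTex );
}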
36. EXT_swap_control
Vsync, but can be changed dynamically at any time.
Actually a WGL/GLX extension.
wglSwapIntervalEXT(1); // Enable VSYNC
wglSwapIntervalEXT(0); // Disable VSYNC
http://www.opengl.org/wiki/Swap_Interval
http://www.opengl.org/registry/specs/EXT/wgl_swap_control.txt
http://www.opengl.org/registry/specs/EXT/swap_control.txt
37. EXT_swap_control_tear
XBox-style Swap-tear for the PC.
Requested by John Carmack.
First driver support a few weeks later
All vendors supported within a few months
wglSwapIntervalEXT(-1); // Try to vsync, but tear if late!
http://www.opengl.org/registry/specs/EXT/wgl_swap_control_tear.txt
http://www.opengl.org/registry/specs/EXT/glx_swap_control_tear.txt
38. ARB_debug_output
You provide a callback; when the driver detects an error, the callback is fed a
message.
When the driver is in single-threaded mode, you can see all the way back into
your own stack.
Supports fine-grained message control.
You can also insert your own messages into the error stream from client code.
Quality varies by vendor, but is getting better.
40. More Useful GL Extensions
NVX_gpu_memory_info / GL_ATI_meminfo
Get memory info about the underlying GPU
GL_GREMEDY_string_marker
D3DPERF-equivalent
GL_ARB_vertex_array_bgra
better matches UINT-expectations of D3D
GL_APPLE_client_storage / GL_APPLE_texture_range
Not for linux, but useful for Mac.
41. GL Pitfalls
Several pitfalls along the way
Functional
Texture State
Handedness
Texture origin differences
Pixel Center Convention (D3D9->GL only)
Performance
MakeCurrent issues
Driver Serialization
Vendor differences—be sure to test your code on multiple vendors
42. Texture State
By default, GL stores information about how to access a texture in
a header that is directly tied to the texture.
This code doesn’t do what you want:
[Diagram, not to scale: a Texture object holds Sampler Info (the header) alongside its Image Data]
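The pitfall can be illustrated with a toy model in plain C++. None of the names below are real GL types or calls; they only mimic the rule that filtering state lives in the texture object's header rather than in the texture unit. The scenario follows the speaker notes: the same texture is bound to two units with the intent of nearest filtering through unit 0 and linear through unit 1.

```cpp
#include <array>

enum Filter { NEAREST, LINEAR };

// Toy stand-ins for GL objects; these are not real GL types or calls.
struct ToyTexture { Filter minFilter = LINEAR; };

struct ToyContext {
    std::array<ToyTexture*, 2> units{};   // texture bound to each unit
    int activeUnit = 0;

    void ActiveTexture(int unit) { activeUnit = unit; }
    void BindTexture(ToyTexture* tex) { units[activeUnit] = tex; }
    // Like glTexParameteri: writes into the bound texture's header.
    void SetMinFilter(Filter f) { units[activeUnit]->minFilter = f; }
    Filter SampleFilter(int unit) const { return units[unit]->minFilter; }
};

// Intent: nearest filtering through unit 0, linear through unit 1.
Filter FilterSeenByUnit0()
{
    ToyTexture tex7;
    ToyContext gl;
    gl.ActiveTexture(0); gl.BindTexture(&tex7); gl.SetMinFilter(NEAREST);
    gl.ActiveTexture(1); gl.BindTexture(&tex7); gl.SetMinFilter(LINEAR);
    // The second call overwrote the shared header, so unit 0 sees linear too.
    return gl.SampleFilter(0);
}
```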
44. ARB_sampler_objects
With ARB_sampler_objects, textures can now be accessed
different ways through different units.
Samplers take precedence over texture headers
If sampler 0 is bound, the texture header will be read.
No shader changes required
http://www.opengl.org/registry/specs/ARB/sampler_objects.txt
46. Other GL/D3D differences (cont’d)
Handedness
D3D is left-handed everywhere, GL is right-handed everywhere
Texture origin is lower-left in GL (flip coordinates about v)
Consider rendering upside-down, flipping at the end.
GLSL uses column-major matrices by default
Including when specifying constants/uniforms
Pixel Centers
OpenGL matches D3D10+
47. MakeCurrent issues
Responsible for several bugs on TF2
Font rendering glitches (the thread creating text tried to update
the texture page, but didn’t own the context)
50. Driver Serialization
Modern OpenGL drivers are dual-core / multithreaded
Your application speaks to a thin shim
The shim moves data over to another thread to prepare for submission
Similar to D3D
Issuing certain calls causes the shim to need to flush all
work, then synchronize with the server thread.
This is very expensive
51. Known naughty functions
glGet(…) – Most of these cause serialization; shadow state (just
like D3D)
glGetError - use ARB_debug_output!
Functions that return a value
Functions that copy an amount of client memory that cannot be
determined up front, or that would be very hard to determine
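Shadowing state, as suggested for glGet above, can be sketched as a small wrapper that remembers the last value sent to the driver. ShadowedState is a hypothetical helper, and the `apply` callback stands in for whichever GL setter is being wrapped. The sketch assumes the initial value matches the driver's actual default.

```cpp
#include <functional>
#include <utility>

// Caches the last value sent to the driver so reads never need a glGet*.
// Hypothetical helper; `apply` stands in for the real GL setter.
template <typename T>
class ShadowedState {
public:
    ShadowedState(T initial, std::function<void(const T&)> apply)
        : value_(std::move(initial)), apply_(std::move(apply)) {}

    // Forwards to the driver only when the value actually changes.
    void Set(const T& v)
    {
        if (v == value_) return;
        value_ = v;
        apply_(value_);
    }

    // Answered from the CPU-side copy; no driver round trip.
    const T& Get() const { return value_; }

private:
    T value_;
    std::function<void(const T&)> apply_;
};
```

A side benefit is that redundant sets are filtered out before they reach the driver.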
52. Detecting Driver Serialization
ARB_debug_output to the rescue!
Place a breakpoint in your callback, look up the callstack to see
which call is causing the problem
Message in ARB_debug_output to look for: “Synchronous call:
stalling threaded optimizations.”
53. Device (Context) Creation in GL
Creating a simple context in GL is easy:
Create a Window
Create a Context
Whether this gets you a Core or Compatibility context is
unspecified, but most vendors give you Compatibility.
Creating a “robust” context with a specific GL-support version
requires using a WGL/GLX extension, and is trickier:
54. Context Creation – Cont’d
1. Create a window (don’t show)
2. Create a context
3. Query for window-specific extensions
4. Create another window (this will be the application window)
5. Create a context using extension function from step 3.
6. Destroy Context from step 2.
7. Destroy window from step 1.
Yuck.
With SDL, SDL_GL_SetAttribute + SDL_CreateWindow.
55. Common D3D Idioms in GL
Vertex Attributes
Vertex Buffers
Textures
Render to texture
Shaders
57. Vertex Attribs – Alternative #1
Vertex Attribute Objects (VAOs)
Good mapping for D3D (seductive!)
Slower than glVertexAttribPointer on all implementations
Recommendation: Skip it
58. ARB_vertex_attrib_binding
Separates Format from Binding
Code is easy to read
glVertexAttribFormat( 0, 4, GL_FLOAT, GL_FALSE, 0 );
glVertexAttribBinding( 0, 0 );
glBindVertexBuffer( 0, buffer0, 0, 24 );
http://www.opengl.org/registry/specs/ARB/vertex_attrib_binding.txt
61. Vertex (and Index) Buffer Using
// Binding VBs also involves setting up VB attributes.
glBindBuffer( GL_ARRAY_BUFFER, vb );
glVertexAttribPointer( mProgram_pos, 3, GL_FLOAT, GL_FALSE, 24, (const void*)0 );
glVertexAttribPointer( mProgram_n, 3, GL_FLOAT, GL_FALSE, 24, (const void*)12 );
glEnableVertexAttribArray( mProgram_pos );
glEnableVertexAttribArray( mProgram_n );
// We finally know what the type is!
glBindBuffer( GL_ELEMENT_ARRAY_BUFFER, ib );
62. Dynamic Buffer Updates
Don’t use MapBuffer—because it returns a pointer, it causes
driver serialization.
Even worse, it probably causes a CPU-GPU sync point.
Instead, use BufferSubData on subsequent regions, then
BufferData when it’s time to discard.
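The bookkeeping behind this pattern can be sketched without any GL calls. StreamCursor and StreamAlloc are hypothetical names: the cursor hands out consecutive regions for BufferSubData and reports when the buffer is full, at which point the caller would issue BufferData to discard the old storage and start again at offset 0.

```cpp
#include <cstddef>

// Result of reserving space in a streaming buffer (hypothetical type).
struct StreamAlloc {
    std::size_t offset;  // byte offset to pass to BufferSubData
    bool discard;        // true: call BufferData first to discard old storage
};

class StreamCursor {
public:
    explicit StreamCursor(std::size_t capacity) : capacity_(capacity), head_(0) {}

    // Reserves `size` bytes. Assumes size <= capacity.
    StreamAlloc Reserve(std::size_t size)
    {
        bool discard = false;
        if (head_ + size > capacity_) {  // out of room: restart in fresh storage
            head_ = 0;
            discard = true;
        }
        StreamAlloc alloc{head_, discard};
        head_ += size;
        return alloc;
    }

private:
    std::size_t capacity_;
    std::size_t head_;
};
```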
63. Render to Texture
Render-to-texture in GL utilizes Frame Buffer Objects (FBOs)
FBOs are created like other objects, and have attachment points.
Many color points, one depth, one stencil, one depth-stencil
FBOs must be “framebuffer complete” to be rendered to.
FBOs, like other “container objects,” are not shared between
contexts.
http://www.opengl.org/registry/specs/ARB/framebuffer_object.txt
64. Frame Buffers
Spec has fantastic examples for creation, updating, etc, so not
replicating here
Watch BindRenderTarget (and BindDepthStencil) etc calls
At draw time, check whether render targets are in an existing
FBO configuration (exactly) via hash lookup
If so, use it.
If not, create a new FBO, bind attachments, check for
completeness and store in cache.
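The lookup described above might be sketched like this. FboKey, FboKeyHash, and FboCache are hypothetical names, and the `create` callback stands in for the real work of generating an FBO, binding attachments, and checking completeness. The hash is a simple illustrative combine, not a tuned one.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Attachment set: texture names for the color targets plus depth-stencil.
struct FboKey {
    std::vector<std::uint32_t> color;
    std::uint32_t depthStencil = 0;
    bool operator==(const FboKey& o) const
    { return color == o.color && depthStencil == o.depthStencil; }
};

struct FboKeyHash {
    std::size_t operator()(const FboKey& k) const
    {
        std::size_t h = std::hash<std::uint32_t>{}(k.depthStencil);
        for (std::uint32_t c : k.color)
            h = h * 31 + std::hash<std::uint32_t>{}(c);
        return h;
    }
};

class FboCache {
public:
    using CreateFn = std::function<std::uint32_t(const FboKey&)>;
    explicit FboCache(CreateFn create) : create_(std::move(create)) {}

    // Returns the FBO for exactly this attachment set, creating it on a miss.
    std::uint32_t Get(const FboKey& key)
    {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        std::uint32_t fbo = create_(key);
        cache_.emplace(key, fbo);
        return fbo;
    }

private:
    CreateFn create_;
    std::unordered_map<FboKey, std::uint32_t, FboKeyHash> cache_;
};
```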
65. Frame Buffers – Don’ts
Do not create a single FBO and then swap out attachments on it.
This causes lots of validation in the driver, which in turn leads to
poor performance.
66. Shaders/Programs
In GL, Shaders are attached to a Program.
Each Shader covers a single shader stage (VS, PS, etc)
Shaders are Compiled
Programs are Linked
The Program is “used”
This clearly doesn’t map particularly well to D3D, which supports
mix-and-match.
67. Shaders/Programs cont’d
GL Uniforms == D3D Constants
Uniforms are part of program state
Swapping out programs also swaps uniforms
This also maps poorly to D3D.
68. Uniform problem
To solve the uniform problem, consider uniform buffer objects
Create a single buffer, bind to all programs
Modify parameters in the buffer
Or, keep track of “global” uniform state and set values just prior
to draw time
If you’re coming from D3D11, Uniform Buffers ARE Constant
Buffers—no problems there.
http://www.opengl.org/wiki/Uniform_Buffer_Object
http://www.opengl.org/registry/specs/ARB/uniform_buffer_object.txt
69. Shader Approach #1: Program Hash
Pay attention to shaders that get set.
At draw time, hash the names of the shaders to see if an existing
program object has been linked
Otherwise, link and store in the hash
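A rough sketch of the program lookup for the two-stage (vertex plus pixel shader) case. ProgramCache is a hypothetical name, the `link` callback stands in for attaching the shaders and linking a program, and an ordered std::map keyed on the pair of shader names is used here in place of a hash table to keep the example short.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

class ProgramCache {
public:
    // (vertex shader name, fragment shader name)
    using ShaderPair = std::pair<std::uint32_t, std::uint32_t>;
    using LinkFn = std::function<std::uint32_t(std::uint32_t vs, std::uint32_t fs)>;

    explicit ProgramCache(LinkFn link) : link_(std::move(link)) {}

    // Returns the linked program for this shader pair, linking on first use.
    std::uint32_t Get(std::uint32_t vs, std::uint32_t fs)
    {
        ShaderPair key(vs, fs);
        auto it = programs_.find(key);
        if (it == programs_.end())
            it = programs_.emplace(key, link_(vs, fs)).first;
        return it->second;
    }

private:
    LinkFn link_;
    std::map<ShaderPair, std::uint32_t> programs_;
};
```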
70. Shader Translation
You have a pile of HLSL. You need to give GL GLSL.
ARB_vertex_program / ARB_fragment_program is a possible
alternative, but only for DX9.
No *_tessellation_program
71. Shader Translation cont’d
One approach: compile HLSL, then translate the bytecode to simple,
asm-like GLSL.
Pro: One set of shaders goes public
Pro: Can be fast
Con: Can be hard to debug problems
Con: Potentially slow fxc idioms end up in generated GLSL
Con: Debugging requires heavy cognitive load
72. Other Translation Approaches
Open Source Alternatives
HLSLCrossCompiler – D3D11 only (SM4/5)
MojoShader – SM1/2/3
Shipped in several games and engines, including Unreal Tournament 3, Unity.
https://github.com/James-Jones/HLSLCrossCompiler
http://icculus.org/mojoshader/
74. Performance tips – cont’d
For best performance, you will have to write vendor-specific code
in some cases.
But you were probably doing this anyways
And now behavior is specified in a public specification.
75. GL Debugging and Perf Tools
NVIDIA Nsight supports GL 4.2 Core.
With some specific extensions
More extensions / features coming!
PerfStudio and gDEBugger
CodeXL
Apitrace
Open Source api tracing tool—has scaling issues which Valve is working to
fix.
76. GL Debugging Tricks
Compare D3D to GL images
Keep them both working on the same platform
Bonus points: Have the game running on two machines, broadcast inputs to
both, compare images in realtime.
79. Magic Symbol Resolution
Linux equivalent of _NT_SYMBOL_PATH
In ~/.gdbinit:
set debug-file-directory /usr/lib/debug:/mnt/symstore/debug
/mnt/symstore/debug is a shared, remotely mounted share with your
symbols
Populate that server with symbols
Currently only applied to gdb, should also apply to Google’s perf tool
“soon”
http://randomascii.wordpress.com/2013/02/20/symbols-on-linux-part-three-linux-versus-windows/
http://fedoraproject.org/wiki/Releases/FeatureBuildId
http://randomascii.wordpress.com/category/symbols-2/
80. Performance tips
Force-inline is your friend—many of the functions you’ll be
implementing are among the most-called functions in the
application.
With few exceptions, you can maintain a GL:D3D call ratio of 1:1
or less.
For example, use glBindMultiTextureEXT instead of
glActiveTexture/glBindTexture.
glBindMultiTextureEXT(texUnit, target, texture)
82. Sampler gotchas…
On certain drivers, GL_TEXTURE_COMPARE_MODE (for shadow
map lookups) is buggy when set via sampler.
For robustness, use texture setting on those particular drivers.
83. Latched State
Recall that GL is very stateful.
State set by an earlier call is often captured (latched) by a later
call.
Vertex Attributes are the prime example of this, but there are
numerous other examples.
84. Textures (Creation)
GLuint texId = 0;
// Says “This handle is a texture”
glGenTextures(1, &texId);
// Allocates memory
glTextureStorage2DEXT( texId, GL_TEXTURE_2D, mipCount,
texFmt, mip0Width, mip0Height );
// Pushes data—note that conversion is performed if necessary
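// mipData[] is assumed to hold one source pointer per mip level.
for (GLint mipLevel = 0; mipLevel < mipCount; ++mipLevel) {
    // Each level is half the previous size, clamped to 1.
    GLsizei mipWidth = (mip0Width >> mipLevel) > 0 ? (mip0Width >> mipLevel) : 1;
    GLsizei mipHeight = (mip0Height >> mipLevel) > 0 ? (mip0Height >> mipLevel) : 1;
    glTextureSubImage2DEXT( texId, GL_TEXTURE_2D, mipLevel,
                            0, 0, mipWidth, mipHeight,
                            srcFmt, srcType, mipData[mipLevel] );
}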
85. Textures (Updating)
With TexStorage, updates are just like initial data specification
(glTextureSubImage or glCompressedTextureSubImage).
Texture->Texture updates are covered later
On-GPU compression is straightforward, implemented in
https://code.google.com/p/nvidia-texture-tools/
MIT License, use freely!
Or copy Simon Green’s technique:
http://developer.download.nvidia.com/SDK/10/opengl/samples.html#compress_YCoCgDXT
86. Textures (Setting State)
// Sets minification filtering on texture 7
// This parameter will be ignored if a sampler is bound.
glTextureParameteriEXT( 7, GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
GL_NEAREST );
88. StretchRect
Implementing StretchRect in GL involves using Read/Write FBOs.
Bind source as a read target
Bind destination as a write target
Draw!
Alternatives:
No stretching/format conversion? EXT_copy_texture
Stretching / format conversion? NV_draw_texture
89. StretchRect – MSAA case
When MSAA is involved, use
EXT_framebuffer_multisample_blit_scaled
Allows resolving and resizing in a single blit
Otherwise two blits needed (one for resolve, one for resize)
90. Other GL/D3D differences
Clip Space
D3D:
-w <= x <= w
-w <= y <= w
0 <= z <= w
GL
-w <= x <= w
-w <= y <= w
-w <= z <= w
But anything with w < 0 still clipped by W=0 clipping
Latched State – let’s get back to this.
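For the clip-space z ranges listed above, one way to remap a D3D-convention z into GL's range is the linear map below. D3DClipZToGL is a hypothetical helper name; in practice the same remap is usually folded into the projection matrix or the end of the vertex shader rather than applied per value on the CPU.

```cpp
// Converts a D3D-convention clip-space z (0 <= z <= w) into GL's
// convention (-w <= z <= w). x, y, and w are unchanged.
inline float D3DClipZToGL(float zD3D, float w)
{
    return 2.0f * zD3D - w;
}
```

The endpoints line up as expected: z = 0 maps to -w and z = w maps to w.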
Editor's Notes
This talk is primarily aimed at people who have written significant Windows and D3D code and are looking to port to Linux / GL. We decided to port Source to Linux early last year in order to better understand the problem domain. This talk will focus on many of the key things we learned, sometimes after a significant amount of profiling and experimentation or interacting with vendors. We realize it’s not as easy as it could be to port to Linux, but we’re doing what we can to identify and work with vendors on the key problems. We’re also working internally on contributing back to several open source tools we’ve found particularly useful, such as apitrace. For us, a significant amount of the effort was porting several different titles from D3D to GL. We learned some surprising things about the state of the various proprietary and open source drivers.
Steam for Linux went live last month (Feb. 14). Log scale. Access to an extremely quickly growing open platform – tripled userbase in the last 3 months, lots of new customers, can’t be explained by cannibalizing from other OSes. Technical perks: Significant amount of overlap with porting to Mac, especially if your port uses SDL for windowing/input/sound/etc. Also useful as a stepping stone to port to mobile: Porting to Linux separates OS issues from architecture issues. In our case the process of shipping on Linux actually improved the reliability of our games. We discovered and fixed a significant number of heap related bugs that occurred more consistently or frequently on Linux.
Figures current as of 3/6/2013 - http://store.steampowered.com/hwsurvey
D3D11 support: 47.9%
D3D10 support: 29.47%
D3D9 support: 20.78% -- High because of people on WinXP with Dx10 or even Dx11 class hardware
Figures current as of 3/6/2013 - http://store.steampowered.com/hwsurvey
D3D11 support: 47.9%
D3D10 support: 46.3% (+17% from Direct3D)
D3D9 support: 4% (-17% from Direct3D)
Elaborate on platform extensions, how to get them
Current Game Industry Members: EA, Epic, Futuremark, Sony, Transgaming, Unity, Valve
Our goal was to access texture 7 with nearest sampling when accessed through unit 0, but linear sampling with unit 1. What actually happens is we get linear filtering no matter what, because we twiddled the header with the second call. Eek!
Even though this is ugly, it’s only slightly uglier than D3D device creation.
Note that at this point, we can’t actually tell that a buffer is a VB or an IB. GL doesn’t care. Also note the pattern: we gen, we specify data. The usage parameter is one of nine flags, but most common for game devs is GL_STATIC_DRAW or GL_DYNAMIC_DRAW.
This utilizes latched state
Pros:
Relatively easy to implement
Performant (with tuning)
Easy to output a cache of things to precompile to avoid dynamic runtime linking
Cons:
Can be a large number of pairs of shaders
Not really practical for D3D10 (triplets) or D3D11 (up to 5 shader stages).