Key developments in graphics hardware design over the last 20 years
GPU Programmability: “User-Programmable Vertex Engine” and “Cg” SIGGRAPH papers
“How GPUs Work” (Luebke, Humphreys)
Modern OpenGL
Mark Kilgard, Principal System Software Engineer, NVIDIA
Modern OpenGL
History
How did OpenGL get where it is now?
Present
Version 3.0
Functionality beyond 3.0
An Overview History of OpenGL
Pre-history 1991
IRIS GL, a proprietary Graphics Library by SGI
OpenGL, an open standard for 3D
Focus: procedural hardware-accelerated 3D graphics
Governed by Architectural Review Board (ARB)
Extensibility planned into design
Competition
Proprietary APIs (1991-1995)
PHIGS & PEX for X Window System (1992-1997)
Microsoft’s Direct3D (1998-)
OpenGL’s Pre-history
IRIS GL 1 (1983): window system MEX
IRIS GL 2 (1986): window system MEX, operating system UNIX
IRIS GL 3 (1988): window system NeWS/X11, operating system IRIX 3.x
First work on GL 5.0 proposal (1989)
IRIS GL 4 (1991): window system native X11, operating system IRIX 4.3
OpenGL 1.0 (1993): window system native X11 with GLX, operating system IRIX 5.1
Dates are for shipping commercial SGI implementation
1983-2008 = 25 years
OpenGL’s Design Philosophy
High-performance
Assumes hardware acceleration
Defined by a specification
Rather than a de-facto implementation
Rendering state machine
Procedural
Not a window system, not a scene graph
No initial sub-setting
Extensible
Data type rich
Cross-platform
Window system-independent core
X Window System, Microsoft Windows, OS/2, OS X, etc.
Multi-language bindings
C, FORTRAN, etc.
Not merely an API, rather a system
Timeline of OpenGL’s Development (1992-2008)
Versions: OpenGL 1.0 approved, OpenGL 1.1, OpenGL 1.2, multitexture added (1.2.1), OpenGL 1.3, OpenGL 1.4, OpenGL 1.5, OpenGL 2.0, OpenGL 2.1, OpenGL 3.0
Milestones: 1st commercial OpenGL implementation (DEC); NT 3.51 brings OpenGL to PCs; OpenGL Utility Toolkit (GLUT) released; Mesa 3D open source; SGI InfiniteReality; 1st GPU for PCs with single-chip transform & lighting for OpenGL (GeForce); OpenGL ES for embedded devices; Khronos controls OpenGL
Competitive 3D APIs
OpenGL has always existed in competition with other APIs
Strengthened OpenGL by driving feature parity
OpenGL’s competitive strengths:
Cross platform, open process
API stability, extensibility
Clean initial design & specification
Timeline of competing APIs (1992-2008):
Proprietary Unix workstation 3D APIs: XGL, Doré, Starbase, IRIS GL
X Consortium 3D standard: PEX
Microsoft Direct3D: DirectX 3, DirectX 5, DirectX 6, DirectX 7, DirectX 8, DirectX 9, DirectX 10
OpenGL 1.0
Immediate mode
Vertex transformation and lighting
Points, lines, polygons
Stippling, wide points and lines
Bitmaps, image rectangles, and pixel reads
Pixel store and transfer
1D and 2D textures, fog, and scissor
Display lists and evaluators
RGBA and color index color models
Color, depth, stencil, and accumulation buffers
Selection and feedback modes
Queries
OpenGL State Machine
From OpenGL 3.0 specification, unchanged since 1.0
SGI “Classic” Hardware View of OpenGL
Entirely fixed-function, no programmability
High-end SGI hardware manifested functionality in distinct chips
Pipeline diagram (1992): 3D Application or Game; OpenGL API; graphics hardware boundary; Front End; Vertex Assembly; Vertex Transform & Lighting; Primitive Assembly, Clipping, Setup, and Rasterization; Texture & Fog; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface
Diagram legend: graphics data flow, memory operations, fixed-function unit, programmable unit
OpenGL 1.1
Vertex arrays
Texture objects
Texture internal formats
Texture sub-image updates
Texture proxies
Copy framebuffer-to-texture
Polygon offset
RGBA logical operations
The Look of OpenGL 1.1
Figures: SGI skyfly demo; stenciled shadow volumes; Ideas in Motion
OpenGL 1.2
OpenGL 1.4
Automatic mipmap generation
Shadow-mapping
Depth textures and shadow comparisons
Texture level-of-detail bias
Texture mirrored repeat wrap mode
Multi-texture combination
Fog coordinate
Secondary color
Configurable point size attenuation
Color blending improvements
Stencil wrap operations
Window-space raster position specification
Hardware Shadow Mapping
Figures: without shadow mapping; with shadow mapping; depth map from light source’s view (darker is closer); light position
Projective Texturing (1.0) & Polygon Offset (1.1) key enablers
Shadow Mapping Explained
Figures: planar distance from light; depth map projected onto scene; comparison ≤ (less than or equals)
True “un-shadowed” region shown green
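The comparison on the slide above can be sketched in a few lines of C. This is a minimal sketch, not the hardware's implementation: the function name and the `bias` parameter are assumptions made for illustration, with `bias` standing in for the tolerance that polygon offset provides.

```c
#include <assert.h>

/* Returns 1 when the fragment is lit (un-shadowed), 0 when it is in shadow.
 * frag_light_depth: the fragment's planar distance from the light.
 * map_depth:        the depth-map value projected onto that fragment.
 * bias:             hypothetical tolerance, standing in for polygon offset. */
int shadow_map_lit(float frag_light_depth, float map_depth, float bias)
{
    assert(bias >= 0.0f);
    /* The slide's test: distance from light <= stored depth means un-shadowed. */
    return frag_light_depth <= map_depth + bias;
}
```

A fragment that is exactly the closest surface to the light passes the test because of the "or equals" half of the comparison; the bias only matters when precision errors push it slightly behind its own depth-map entry.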
OpenGL 1.5
Vertex buffer objects (VBOs)
Occlusion queries
Generalized shadow mapping functions
GeForce FX (NV3x) View of OpenGL
Programmable fragment processing
16 texture units, IEEE 754 32-bit floating-point
Vertex program branching
Pipeline diagram (2003): 3D Application or Game; OpenGL API; CPU – GPU boundary; GPU Front End; Vertex Assembly; Attribute Fetch; Vertex Program; Primitive Assembly, Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface
Floating-point Fragment Programmability
OpenGL Fragment Program Flowchart
1. Begin fragment
2. Initialize parameters: temporary registers initialized to 0,0,0,0; output depth & color registers initialized to 0,0,0,1
3. Fetch & decode next instruction (from fragment program instruction memory)
4. Read interpolants and/or registers (primitive interpolants are an input)
5. Map input values: swizzle, negate, etc.
6. Texture fetch instruction? If yes: compute texture address & level-of-detail, fetch texels (from texture images), and filter texels. If no: perform instruction math / operation
7. Write output register with masking
8. More instructions? If yes, return to step 3 (the fragment program instruction loop); if no, continue
9. Emit output depth & color registers
10. End fragment
Core OpenGL fragment texturing & coloring
Diagram: point, line, polygon, pixel rectangle (DrawPixels), and bitmap (Bitmap) rasterization, fed from primitive assembly; conventional texture fetching (texture units 0 and 1); texture environment application (texture units 0 and 1); color sum; fog; coverage application; to raster operations
NV1x OpenGL fragment texturing & coloring
Diagram: as above, plus register combiners (general stage 0, general stage 1, final stage) selected by the GL_REGISTER_COMBINERS_NV enable
NV2x OpenGL fragment texturing & coloring
Diagram: as above with texture units 0 to 3, plus texture shaders 0 to 3 selected by the GL_TEXTURE_SHADER_NV enable, and register combiners with general stages 0 to 7 and a final combiner selected by the GL_REGISTER_COMBINERS_NV enable
NV3x OpenGL fragment texturing & coloring
Diagram: as above, plus a fragment program (instructions 0 to 1023) selected by the GL_FRAGMENT_PROGRAM_NV enable; !!FP1.0 or !!ARBfp1.0 programs
OpenGL 2.0
Programmable shading
OpenGL Shading Language (GLSL)
Multiple color buffer rendering targets
Non-power-of-two texture dimensions
Point sprites
Separate blend equation
Two-sided stencil testing
GeForce 6 & 7 (NV4x/G7x) View of OpenGL
Limited vertex texturing
Fragment branching
Multiple render targets & floating-point blending
Pipeline diagram (2004): 3D Application or Game; OpenGL API; CPU – GPU boundary; GPU Front End; Vertex Assembly; Attribute Fetch; Vertex Program; Primitive Assembly, Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface
GeForce 8 & 9 (G8x/G9x) View of OpenGL
Primitive (geometry) programs
Parameter reads from buffer objects
Transform feedback (stream out)
Pipeline diagram (2006): 3D Application or Game; OpenGL API; CPU – GPU boundary; GPU Front End; Vertex Assembly; Attribute Fetch; Vertex Program; Primitive Assembly; Primitive Program; Parameter Buffer Read; Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface
OpenGL Pipeline Fixed-function Steps
Much of functional pipeline remains fixed-function
Vital to maintaining performance and data flow
Hard to compete with hard-wired rasterization, Zcull, and pixel compression
Pipeline diagram (2006): GPU Front End; Vertex Assembly; Attribute Fetch; Vertex Program; Primitive Assembly; Primitive Program; Parameter Buffer Read; Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface
OpenGL Pipeline Programmable Domains
New geometry shader domain for per-primitive programmable processing
Unified Streaming Processor Array (SPA) architecture means same capabilities for all domains
Pipeline diagram (2006): GPU Front End; Vertex Assembly; Attribute Fetch; Vertex Program; Primitive Assembly; Primitive Program; Parameter Buffer Read; Clipping, Setup, and Rasterization; Fragment Program; Texture Fetch; Raster Operations; Framebuffer Access; Memory Interface
Programmable domains can be unified hardware!
OpenGL 2.1
OpenGL Shading Language (GLSL) improvements
Non-square matrices
Pixel buffer objects (PBOs)
sRGB color space texture formats
OpenGL 3.0
OpenGL Shading Language (GLSL) improvements
New texture fetches
True integer data types and operators
switch/case/default flow control statements
Conditional rendering based on occlusion query results
Transform feedback
Vertex array objects
Floating-point textures, color buffers, and depth buffers
Half-precision vertex arrays
Texture arrays
Integer textures
Red and red-green texture formats
Compressed red and red-green formats
Framebuffer objects (FBOs)
Packed depth-stencil pixel formats
Per-color buffer clearing, blending, and masking
sRGB color space color buffers
Fine-grain buffer mapping and flushing
Areas of 3.0 Functionality Improvement
Programmability
Shader Model 4.0 features
OpenGL Shading Language (GLSL) 1.30
Texturing
New texture representations and formats
Framebuffer operations
Framebuffer objects
New formats
New copy (blit), clear, blend, and masking operations
Buffer management
Non-blocking and fine-grain update of buffer object data stores
Interpolation modifiers: centroid, noperspective, and flat
Vertex array element number: gl_VertexID
OpenGL Shading Language (GLSL) improvements
## concatenation in pre-processor for macros
switch / case / default statements
OpenGL 3.0 Texturing Functionality
Texture representation
Texture arrays: indexed access to a set of 1D or 2D texture images
Texture formats
Floating-point texture formats
Single-precision (32-bit, IEEE s23e8)
Half-precision (16-bit, s10e5)
Red & red/green texture formats
Intended as FBO framebuffer formats too
Compressed red & red/green texture formats
Shared exponent texture formats
Packed floating-point texture formats
Texture Arrays
Conventional texture = One logical pre-filtered image
Texture array = index-able plurality of pre-filtered images
Rationale is fewer texture object binds when drawing different objects
No filtering between mipmap sets in a texture array
All mipmap sets in array share same format/border & base dimensions
Both 1D and 2D texture arrays
Require shaders, no fixed-function support
Texture image specification
Use glTexImage3D, glTexSubImage3D, etc. to load 2D texture arrays
No new OpenGL commands for texture arrays
3rd dimension specifies integer array index
No halving in 3rd dimension for mipmaps
So 64×128×17 reduces to 32×64×17 all the way to 1×1×17
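The mipmap rule for 2D texture arrays can be checked with a short calculation. This is a sketch only: the struct and function names are hypothetical and it assumes no texture border. Width and height halve per level (never dropping below 1) while the layer count stays fixed.

```c
#include <assert.h>

typedef struct { int width, height, layers; } MipSize;

/* Size of mipmap level `level` for a 2D texture array whose base level is
 * width x height x layers. Only width and height are halved. */
MipSize texture_array_mip_size(int width, int height, int layers, int level)
{
    MipSize s;
    assert(width > 0 && height > 0 && layers > 0);
    assert(level >= 0 && level < 31);
    s.width = width >> level;
    if (s.width < 1) s.width = 1;
    s.height = height >> level;
    if (s.height < 1) s.height = 1;
    s.layers = layers;              /* no halving in the 3rd dimension */
    return s;
}

/* Number of levels in the complete chain, down to 1 x 1 x layers. */
int texture_array_mip_count(int width, int height)
{
    int largest = width > height ? width : height;
    int count = 1;
    assert(width > 0 && height > 0);
    while (largest > 1) {
        largest >>= 1;
        count++;
    }
    return count;
}
```

For the slide's 64×128×17 example this gives a chain of 8 levels, each of which would be loaded with its own glTexImage3D call whose depth argument stays 17.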
Texture Arrays Example
Multiple skins packed in texture array
Motivation: binding to one multi-skin texture array avoids texture bind per object
Figure axes: texture array index (0 to 4) versus mipmap level index (0 to 4)
Compact Floating-point Textures
Shared exponent & packed float representations are ideal for High Dynamic Range (HDR) applications
Compact Floating-point Texture Formats
Packed float format
No sign bit, independent exponents
Shared exponent format
No sign bit, shared exponent, no implied leading 1
Packed float layout (32 bits, bit 31 to bit 0): three components, each with its own 5-bit exponent, with 5-bit, 6-bit, and 6-bit mantissas
Shared exponent layout (32 bits, bit 31 to bit 0): one 5-bit shared exponent and three 9-bit mantissas
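Decoding the shared exponent format is a small amount of arithmetic. The sketch below assumes the bit assignment and exponent bias used by the EXT_texture_shared_exponent extension listed later in these slides (exponent in bits 31 to 27, then blue, green, and red mantissas, bias 15); the function names are hypothetical. Because there is no implied leading 1, each component is simply mantissa × 2^(exponent − 15 − 9).

```c
#include <assert.h>
#include <stdint.h>

/* 2^(e - 24) for e in 0..31, computed without the math library. */
static float rgb9e5_scale(unsigned e)
{
    assert(e < 32u);
    return e >= 24u ? (float)(1u << (e - 24u))
                    : 1.0f / (float)(1u << (24u - e));
}

/* Decode a packed shared-exponent value into three non-negative floats.
 * Assumed layout: bits 31..27 exponent, 26..18 blue, 17..9 green, 8..0 red. */
void rgb9e5_decode(uint32_t packed, float rgb[3])
{
    unsigned exponent = (unsigned)(packed >> 27);
    float scale = rgb9e5_scale(exponent);

    rgb[0] = (float)(packed & 0x1FFu) * scale;          /* red   */
    rgb[1] = (float)((packed >> 9) & 0x1FFu) * scale;   /* green */
    rgb[2] = (float)((packed >> 18) & 0x1FFu) * scale;  /* blue  */
}
```

Sharing one exponent means a bright component forces the darker ones to lose precision, which is the trade-off the format makes to fit HDR color in 32 bits.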
1- and 2-component Block Compression Scheme
Basic 1-component block compression format
Borrowed from alpha compression scheme of S3TC (DXT5)
Encoded block: 2 min/max values (8-bit A, 8-bit B) = 16 bits, plus sixteen 3-bit per-pixel selectors = 48 bits, for 64 bits total per block
Decoded block: 4×4 pixels, 16 pixels × 8 bits/component = 128 bits decoded, so effectively 2:1 compression
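A decoder for one such block fits in a few lines. This is a hedged sketch of the unsigned 1-component case, following the DXT5 alpha rules as I understand them: the byte order (endpoint 0, endpoint 1, then six bytes of 3-bit selectors with texel 0 in the lowest bits) and the function name are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 64-bit block into 16 values in [0,1]. */
void rgtc1_decode_block(const uint8_t block[8], float texels[16])
{
    float palette[8];
    float r0 = block[0] / 255.0f;
    float r1 = block[1] / 255.0f;
    uint64_t bits = 0;
    int i;

    palette[0] = r0;
    palette[1] = r1;
    if (block[0] > block[1]) {
        /* Six interpolated values between the two endpoints. */
        for (i = 1; i <= 6; i++)
            palette[i + 1] = ((7 - i) * r0 + i * r1) / 7.0f;
    } else {
        /* Four interpolated values plus explicit minimum and maximum. */
        for (i = 1; i <= 4; i++)
            palette[i + 1] = ((5 - i) * r0 + i * r1) / 5.0f;
        palette[6] = 0.0f;
        palette[7] = 1.0f;
    }

    /* Gather the 48 selector bits, least significant byte first. */
    for (i = 0; i < 6; i++)
        bits |= (uint64_t)block[2 + i] << (8 * i);

    for (i = 0; i < 16; i++)
        texels[i] = palette[(bits >> (3 * i)) & 7u];
}
```

The 2-component (red-green) format would apply the same decoding to two such blocks, one per component.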
Framebuffer Operations
Framebuffer objects
Standardized framebuffer objects (FBOs) for rendering to textures and renderbuffers
Render-to-texture
Multisample renderbuffers for FBOs
Framebuffer operations
Copies from one FBO to another, including multisample data
Per-color attachment color clears, blending, and write masking
Framebuffer formats
Floating-point color buffers
Floating-point depth buffers
Rendering into framebuffer format with 3 small unsigned floating-point values packed in a 32-bit value
Rendering into sRGB color space framebuffers
Framebuffer Object Example
Depth peeling for correctly ordered transparency
Great render-to-texture application for FBOs
Depth Peeling Behind the Scenes
Depth buffer has closest fragment at all pixels
Save depth buffer
Render again, but use depth buffer as shadow map
Discard fragment in front of shadow map’s depth value
Effectively peels one layer of depth!
Resulting color buffer is 2 nd closest fragment
And depth buffer for 2 nd closest fragments’ depth
Now repeat peeling more layers
Use ping-pong depth buffer scheme
Use occlusion query to detect when no more fragments to peel
Composite color layers front-to-back (or back-to-front)
Front-to-back peeling can be done during the peeling process
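The peeling loop described above can be modeled on the CPU for a single pixel. This is a minimal sketch under stated assumptions: depths lie in [0,1], the function name is hypothetical, and the "no fragment found" check plays the role of the occlusion query.

```c
#include <assert.h>

/* Peel depth layers for one pixel.
 * depths:     depth of every fragment that lands on the pixel, any order.
 * layers_out: receives the peeled depths, nearest layer first.
 * Returns the number of layers written (at most max_layers). */
int depth_peel_pixel(const float *depths, int count,
                     float *layers_out, int max_layers)
{
    float peeled = -1.0f;   /* saved depth of the previous layer; none yet */
    int layer, i;

    assert(count >= 0 && max_layers >= 0);
    for (layer = 0; layer < max_layers; layer++) {
        int found = 0;
        float nearest = 0.0f;

        for (i = 0; i < count; i++) {
            /* "Shadow map" test: discard fragments at or in front of
             * the previously peeled layer. */
            if (depths[i] <= peeled)
                continue;
            /* Ordinary depth test keeps the closest survivor. */
            if (!found || depths[i] < nearest) {
                nearest = depths[i];
                found = 1;
            }
        }
        if (!found)
            break;          /* nothing left to peel */
        layers_out[layer] = nearest;
        peeled = nearest;   /* ping-pong: this pass's depth feeds the next */
    }
    return layer;
}
```

On a GPU each iteration of the outer loop is a full render pass over the scene; the model only shows which fragment survives each pass.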
Delicate Color Fidelity with sRGB
Problem: PC display devices have a non-linear (sRGB) display gamut, so delicate color shading looks wrong
Figures (NVIDIA’s Adriana GeForce 8 launch demo): conventional rendering (uncorrected color) shows unnaturally deep facial shadows; gamma correct (sRGB rendered) is softer and more natural
What is sRGB?
A standard color space
Intended for monitors, printers, and the Internet
Created cooperatively by HP and Microsoft
Non-linear, roughly gamma of 2.2
Intuitively “encodes more dark values”
OpenGL 2.1 already added sRGB texture formats
Texture fetch converts sRGB to linear RGB, then filters
Result takes more than 8-bit fixed-point to represent in shader
3.0 adds complementary sRGB framebuffer support
“sRGB correct blending” converts framebuffer sRGB to linear, blends with the linear color from the shader, then converts back to sRGB
Works with framebuffer objects (FBOs)
sRGB chromaticity
So why sRGB? Standard Windows Display is Not Gamma Corrected
25+ years of PC graphics, icons, and images depend on not gamma correcting displays
sRGB textures and color buffers compensate for this
Figures: Gamma 1.0 shows the “expected” appearance of Windows desktop & icons, but 3D lighting is too dark; Gamma 2.2 (linear color response) shows a washed-out desktop appearance, but 3D lighting is correct
Vertex Processing
Vertex array configuration
ARB_vertex_array_object: objects to manage vertex array configuration client state
Beyond OpenGL 3.0
OpenGL 3.0
EXT_gpu_shader4
NV_conditional_render
ARB_color_buffer_float
NV_depth_buffer_float
ARB_texture_float
EXT_packed_float
EXT_texture_shared_exponent
NV_half_float
ARB_half_float_pixel
EXT_framebuffer_object
EXT_framebuffer_multisample
EXT_framebuffer_blit
EXT_texture_integer
EXT_texture_array
EXT_packed_depth_stencil
EXT_draw_buffers2
EXT_texture_compression_rgtc
EXT_transform_feedback
APPLE_vertex_array_object
EXT_framebuffer_sRGB
APPLE_flush_buffer_range (modified)
In GeForce 8, 9, & 2xx Series but not yet core
EXT_geometry_shader4 (now ARB)
EXT_bindable_uniform
NV_gpu_program4
NV_parameter_buffer_object
EXT_texture_compression_latc
EXT_texture_buffer_object (now ARB)
NV_framebuffer_multisample_coverage
NV_transform_feedback2
NV_explicit_multisample
NV_multisample_coverage
EXT_draw_instanced (now ARB)
EXT_direct_state_access
EXT_vertex_array_bgra
EXT_texture_swizzle
Plenty of proven OpenGL extensions for OpenGL Working Group to draw upon for OpenGL 3.1
OpenGL Version Evolution
Now OpenGL is part of Khronos Group
Previously OpenGL’s evolution was governed by the OpenGL Architectural Review Board (ARB)
Now officially a Khronos working group
Khronos also standardizes OpenCL, OpenVG, etc.
How OpenGL version updates happen
OpenGL participants proposing extensions
Successful extensions are polished and incorporated into core
OpenGL 3.0 is great example of this process
Roughly 20 extensions folded into “core”
Just 3 of those previously unimplemented
OpenGL Extensions by Source
44% of extensions are “core” or multi-vendor
Lots of vendors have initiated extensions
Extending OpenGL is industry-wide collaboration
Chart segments: EXT, SGI, SGIS, SGIX, ARB, NV, ATI, APPLE, MESA, Others
Source: http://www.opengl.org/registry (Dec 2008)
What’s Driving OpenGL Modernization?
Human desire for Visual Intuition and Entertainment
Particularly interactive video games
Embarrassing Parallelism of Graphics
Particularly the hardware-amenable, latency tolerant nature of rasterization
Increasing Semiconductor Density
OpenGL’s Evolution: A Personal Retrospective
Kurt Akeley, Principal Researcher, Microsoft Research Silicon Valley
OpenGL is an architecture
Property | Blaauw/Brooks | OpenGL
Different implementations | IBM 360 30/40/50/65/75, Amdahl | SGI Indy/Indigo/InfiniteReality; NVIDIA GeForce, ATI Radeon, …
Compatibility | Code runs equivalently on all implementations | Top-level goal; conformance tests, …
Intentional design | It’s an architecture, whether it was planned or not. | Carefully planned, though mistakes were made
Configuration | Can vary amount of resource (e.g., memory) | No feature sub-setting; configuration attributes (e.g., framebuffer)
Speed | Not a formal aspect of architecture | No performance queries
Enforcement | When implementation errors are found, they are fixed. | Specification rules!
Validity of inputs | No undefined operation | All errors specified; no side effects; little undefined operation
But OpenGL is an API (Application Programming Interface)
Yes, Blaauw and Brooks talk about (computer) architecture as though it is always expressed as ISA (Instruction-Set Architecture)
But …
API is just a higher-level programming interface
“Instruction-Set” Architecture implies other types of computer architectures (such as “API” Architecture)
OpenGL has evolved to include ISA-like interfaces (e.g., the interface below GLSL)
We didn’t know …
No mention in spec (even 3.0)
“We view OpenGL as a state …”
First use in “ARB”
Architecture Review Board
Coined by Bill Glazier from “Palo Alto Architecture Review Board”
First formal usage (I know of)
Mark J. Kilgard, Realizing OpenGL: two implementations of one architecture, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware, p.45-55, August 03-04, 1997, Los Angeles, California, United States.
Fred is magnanimous
What is implied by “programmable”?
What does it mean to teach programming?
Does running a microwave oven count?
Does defining the geometry of a game “level” count?
Does specifying OpenGL modes count?
This seems to be a somewhat open question
Butler Lampson couldn’t tell me.
Microsoft developers of teaching tools couldn’t tell me.
An online search wasn’t very helpful.
Do we just “know it when we see it”?
Justice Potter Stewart’s definition of pornography
My try at some formalization
Key ideas:
Composition: choice of placement, sequence
Non-obvious: semantics are interesting and novel
Imperative: maybe there are other kinds of programming
“Composition, the organization of elemental operations into a non-obvious whole, is the essence of imperative programming.” -- Kurt Akeley (Foreword to GPU Gems 3)
OpenGL has always been programmable
Follows directly from being an “architecture”
OpenGL commands are instructions (API as an ISA)
They can be “composed” to create programs
Multi-pass rendering is the prototypical example
But Peercy et al. implemented a RenderMan shader compiler
Invariance was specified from the start (e.g., same fragments)
We set out to enable “usage that we didn’t anticipate”
Obvious for a traditional ISA (e.g., IA32)
Not so obvious for a graphics API
Example: texture applies to all primitives, not just triangles
Example multi-pass OpenGL “program”: silhouette rendering
Additions to the hidden-line algorithm (previous slide) highlighted in red
    /* Pass 1: fill the depth buffer only, pushed back by polygon offset */
    glEnable(GL_DEPTH_TEST);
    glDisable(GL_LIGHTING);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glEnable(GL_POLYGON_OFFSET_FILL);
    glPolygonOffset(maxwidth / 2, 1);
    /* draw solid objects */
    /* Pass 2: draw back faces as white lines against the saved depth */
    glDepthMask(GL_FALSE);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glColor3f(1, 1, 1);
    glDisable(GL_POLYGON_OFFSET_FILL);
    glPolygonMode(GL_FRONT_AND_BACK, GL_LINE);
    glEnable(GL_CULL_FACE);
    glCullFace(GL_FRONT);
    /* draw solid objects again */
    /* draw true edges, for a complete hidden-line drawing */
    /* Restore state */
    glDisable(GL_DEPTH_TEST);
    glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
    glDepthMask(GL_TRUE);
    glDisable(GL_CULL_FACE);
Invariance
Corollary 1: Fragment generation is invariant with respect to the state values marked with • in Rule 2.
Intended to capture complete sequence of operations
Also inspired design changes
Diagram, vertex pipeline: Application; vertex assembly; vertex operations; primitive assembly; primitive operations; rasterization; fragment operations; framebuffer; display
Diagram, pixel pipeline: Application; pixel assembly (unpack); pixel operations; pixel pack; texture memory
All primitives (including pixels) are rasterized
All vertexes are treated equally (e.g., lighted)
All fragments are treated equally (e.g., texture mapped and depth-buffered)
Not a required implementation, but “abstraction distance” matters
Culture and Process
Suppose …
http://www.opengl.org/registry/
Name: ARB_texture_cube_map
Name Strings: GL_ARB_texture_cube_map
Notice: Copyright OpenGL Architectural Review Board, 1999.
Contact: Michael Gold, NVIDIA (gold 'at' nvidia.com)
Status: Complete. Approved by ARB on 12/8/1999
Version: Last Modified Date: December 14, 1999
Number: ARB Extension #7
Dependencies: None. Written based on the wording of the OpenGL 1.2.1 specification but not dependent on it.
Overview: This extension provides a new texture generation scheme for cube map textures. Instead of the current texture providing a 1D, 2D, or 3D lookup into a 1D, 2D, or 3D texture image, the texture is a set of six 2D images representing the faces of a cube. The (s,t,r) texture coordinates …
Complete specification
Name
Name Strings
Notice
Contact
Status
Version
Number
Dependencies
Overview
Issues
New Procedures and Functions
New Tokens
Additions to Chapter 2 of the OpenGL Specification
Additions to Chapter 3 of the OpenGL Specification
Additions to Chapter 4 of the OpenGL Specification
Additions to Chapter 5 of the OpenGL Specification
Additions to Chapter 6 of the OpenGL Specification
Additions to the GLX Specification
Errors
New State (type, query mechanism, initial value, attribute set, specification section)
Usage Examples
19 issues
The spec just linearly interpolates the reflection vectors computed per-vertex across polygons. Is there a problem interpolating reflection vectors in this way?
Probably. The better approach would be to interpolate the eye vector and normal vector over the polygon and perform the reflection vector computation on a per-fragment basis. Not doing so is likely to lead to artifacts because angular changes in the normal vector result in twice as large a change in the reflection vector as normal vector changes. The effect is likely to be reflections that become glancing reflections too fast over the surface of the polygon. Note that this is an issue for REFLECTION_MAP_ARB, but not NORMAL_MAP_ARB.
19 issues …
What happens if an (s,t,q) is passed to cube map generation that is close to (0,0,0), i.e. a degenerate direction vector?
RESOLUTION: Leave undefined what happens in this case (but may not lead to GL interruption or termination). Note that a vector close to (0,0,0) may be generated as a result of the per-fragment interpolation of (s,t,r) between vertices.
Trust and integrity
Lots of collaboration during the initial design
But final decisions made by a small group
SGI played fair
OpenGL 1.0 didn’t favor SGI equipment (our ports were late)
SGI obeyed all conformance rules
SGI didn’t adjust the spec to match our equipment
The ARB avoided marketing tasks such as benchmarks
Client Memory Vertex Attribute Transfer
Diagram: application (client) memory holding the vertex array; CPU; command queue; GPU with command processor, vertex puller, and hardware rendering pipeline
Data flow: memory reads by the CPU; CPU writes of command + vertex data; GPU DMA transfer of command + vertex data; vertex data travels through the CPU
Vertex Buffer Object Vertex Attribute Pulling
Diagram: application (client) memory; vertex array in an OpenGL (vertex) buffer object; CPU; command queue; GPU with command processor, vertex puller, and hardware rendering pipeline
Data flow: CPU writes of command + vertex indices; GPU DMA transfer of command data; GPU DMA transfer of vertex data, so the CPU never reads the data
Initializing Vertex Buffer Objects (VBOs)
Once using vertex arrays, easy to switch to VBOs
Make the vertex array as before
Then bind to buffer object and copy data to the buffer
Different treatment of the “pointer” parameter to vertex array specification commands
When the current array buffer binding is zero, the pointer value is a client memory pointer
When the current array buffer binding is non-zero (meaning it names a buffer object), the pointer value is “recast” as an offset from the beginning of the buffer
Once again
The glBindBuffer(GL_ARRAY_BUFFER, 0) call alone doesn’t change any vertex array buffer bindings
It takes a vertex array specification command such as glColorPointer to latch the zero
This latching ensures compatibility with pre-VBO OpenGL
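The latching rule is easier to see in a small model of the state involved. The sketch below is not OpenGL code: the struct and the `model_` functions are hypothetical stand-ins for the array buffer binding, glBindBuffer, and glColorPointer, written only to show when the binding is captured and how the pointer is reinterpreted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the relevant state, not real OpenGL types. */
typedef struct {
    unsigned array_buffer_binding;    /* set by the bind call              */
    unsigned color_array_buffer;      /* binding latched by the pointer call */
    const void *color_array_pointer;  /* client pointer, or a byte offset  */
} VertexArrayState;

/* Stands in for glBindBuffer(GL_ARRAY_BUFFER, name). */
void model_bind_array_buffer(VertexArrayState *s, unsigned name)
{
    s->array_buffer_binding = name;   /* the color array is untouched here */
}

/* Stands in for glColorPointer: latches the current binding. */
void model_color_pointer(VertexArrayState *s, const void *pointer)
{
    s->color_array_buffer = s->array_buffer_binding;
    s->color_array_pointer = pointer;
}

/* Where the color data is read from. buffer_base is the (hypothetical)
 * start of the latched buffer object's data store. */
const unsigned char *model_resolve_color_array(const VertexArrayState *s,
                                               const unsigned char *buffer_base)
{
    if (s->color_array_buffer == 0)
        return (const unsigned char *)s->color_array_pointer;
    /* Non-zero binding: the pointer value is recast as an offset. */
    return buffer_base + (uintptr_t)s->color_array_pointer;
}
```

Passing `(const void *)(uintptr_t)16` while a buffer is bound selects byte 16 of that buffer; binding zero afterwards has no effect on the color array until the pointer command is issued again.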
Texture Coordinate Set Selector
A selector in OpenGL is
A state variable that controls what state a subsequent command updates
Single-threaded CPU performance trends are stalled
Multi-core is CPU designer response
GPU performance continues on-trend
What does this mean for graphics API design?
CPUs must generate more visually rich API command streams to saturate GPUs
Can’t just send more commands faster
Single-threaded CPUs can only do so much
So must send more powerful commands
Déjà vu
We’ve been here before
Early 1980s: Graphics terminals used to be connected to minicomputers by slow speed interconnects
CPUs themselves far too slow for real-time rendering
Resulting rendering model
Download scene database to graphics terminal
Adjust viewing and modeling parameters
Send “redraw scene” command
What Happened
Such “scene processor” hardware not very flexible
Difficult to animate anything beyond rigid dynamics
Eventually SGI and others matched CPUs and interconnects to graphics performance
Result was IRIS GL’s immediate mode
CPU fast enough to send geometry every frame
OpenGL took this model
Over time added vertex arrays, vertex buffers, texturing, programmable shading, and more performance
CPU performance still became the limiter
Better graphics driver tuning helped
Dual-core drivers help some more
OpenGL’s Most Powerful Command
Available since OpenGL 1.0
Can render essentially anything OpenGL can render!
Takes just one parameter
The command
glCallList(GLuint displayListName);
Power of display lists comes from
Playing back arbitrary compiled commands
Allowing for hierarchical calling of display list
A display list can contain glCallList or glCallLists
Ability of application to re-define display lists
No editing, but can be re-defined
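The three sources of power listed above (playback, hierarchical calls, re-definition) can be sketched with a toy interpreter in plain C. This is a simplified model, not the OpenGL implementation: the Cmd and ToyList types, the toy_ functions, and all size limits are assumptions made for illustration. Nested calls are resolved by name at execute time, so re-defining a child list changes what every parent list renders.

```c
#include <assert.h>
#include <string.h>

#define MAX_LISTS 8  /* arbitrary limits chosen for this sketch */
#define MAX_CMDS  8
#define MAX_DEPTH 64 /* guards against runaway recursive nesting */

typedef enum { CMD_DRAW, CMD_CALL_LIST } CmdKind;
typedef struct { CmdKind kind; int arg; } Cmd; /* arg: draw id or list name */
typedef struct { Cmd cmds[MAX_CMDS]; int count; } ToyList;

static ToyList toy_lists[MAX_LISTS];

/* Re-define (not edit) a list: the old contents are discarded. */
static void toy_new_list(int name, const Cmd *cmds, int count) {
    assert(name >= 0 && name < MAX_LISTS);
    assert(count >= 0 && count <= MAX_CMDS);
    memcpy(toy_lists[name].cmds, cmds, (size_t)count * sizeof(Cmd));
    toy_lists[name].count = count;
}

/* Play back a list, appending draw ids to out; returns the new length.
   Nested CMD_CALL_LIST entries are looked up by name at execute time. */
static int toy_call_list(int name, int *out, int len, int cap, int depth) {
    assert(name >= 0 && name < MAX_LISTS);
    if (depth >= MAX_DEPTH) return len; /* stop instead of recursing forever */
    for (int i = 0; i < toy_lists[name].count; i++) {
        Cmd c = toy_lists[name].cmds[i];
        if (c.kind == CMD_CALL_LIST)
            len = toy_call_list(c.arg, out, len, cap, depth + 1);
        else if (len < cap)
            out[len++] = c.arg;
    }
    return len;
}
```

For example, a "car" list that calls a "wheel" list twice picks up a new wheel definition on its next playback without being rebuilt itself.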
Enhanced Display Lists
OpenGL 1.0 display lists are too inflexible
Pixel & vertex data “compiled into” display lists
Binding objects always “by name”
Rather than “by reference”
These problems can be fixed
Modern OpenGL supports buffers for transferring vertices and pixels
Compile commands into display lists that defer vertex and pixel transfers until execute-time
Rather than compile-time
Allow objects (textures, buffers, programs) to be bound “by reference” or “by name”
Other Display List Enhancements
Conditional display list execution
Relaxed vertex index and command order
Parallel construction of display lists by multiple threads
General insight: It is easier for the driver to optimize an application’s graphics command stream if it gets to 1) see the repetition in the command stream clearly and 2) take time to analyze and optimize usage
Conditional Display List Execution
Today’s occlusion query
Application must “query” to learn occlusion result
Latency too great to respond
Application can use OpenGL 3.0’s conditional render capability
But just skips vertex pulling, not state changes
Conditional display list execution
Allow a glCallList to depend on the occlusion result from an occlusion query object
Allows in-band occlusion querying
Skip both vertex pulling and state changes
Relaxed Vertex Index and Command Order
OpenGL today always executes commands “in order”
Sequential execution requirement
Provide compile-time specification of re-ordering allowances
Allows GL implementation to re-order
Vertex indices within display list’s vertex batch
Commands within display list
Key rule: the state vector a rendering command executes in must match the state it would have seen if the commands were executed sequentially
Allow static or dynamic re-ordering
Static re-ordering needed for multi-pass invariances
Past practice
IRIS Performer would sort rendering by state changes for performance
[Sander 2007] shows substantial benefit for vertex ordering
Parallel Display List Construction
Today’s model
Single thread makes all OpenGL rendering calls
Minimizes GPU context switch overhead
Ties command generation rate to single core’s CPU performance
Enhanced display list model
Multiple threads can build display lists in parallel
Single thread still executes display lists
Countable semaphore objects used to synchronize hand-off of display lists built by other threads with main rendering thread
Rethinking Display Lists
Display lists have been proposed for deprecation
Right as we really need them!
Much more interesting to enhance display lists
Dual-core driver already off-loads display list traversal to driver’s thread
Multi-core driver could scan frequently executed display lists to optimize their order and error processing
Includes adding pre-fetching to avoid stalling CPU on cache misses for object accesses
Direct3Disms
Developing a shader-rich game title costs $$$
For top titles, often US$ 5,000,000+
Investment typically amortized over multiple platforms
Consoles are primary target, then PCs
PC version typically developed for Direct3D
Reality: OpenGL is often 3rd or worse priority
API differences = porting & performance pitfalls
Stops or slows Direct3D-developed 3D content from working easily on OpenGL platforms
Supporting Direct3D: Not New
OpenGL has always supported multiple formats well
OpenGL’s plethora of pixel and vertex formats
Very first OpenGL extension: EXT_bgra
Provides a pixel component ordering to match the color component ordering of Windows for 2D GDI rendering
Made core functionality by OpenGL 1.2
Many OpenGL extensions have embraced Direct3Disms
Secondary color
Fog coordinate
Point sprites
Direct3D vs. OpenGL Coordinate System Conventions
Window origin conventions
Direct3D = upper-left origin
OpenGL = lower-left origin
Pixel center conventions
Direct3D9 = pixel centers at integer locations
OpenGL (and Direct3D 10) = pixel centers at half-pixel locations
Clip space conventions
Direct3D = [-1,+1] for XY, [0,1] for Z
OpenGL = [-1,+1] range for XYZ
Affects
How projection matrix is loaded
Fragment shaders that access the window position
Point sprites have upper-left texture coordinate origin
OpenGL already lets application choose lower-left or upper-left
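The convention differences above reduce to small remappings. The sketch below, in plain C, shows two of them under stated assumptions: depth is remapped in normalized device coordinates, and a Direct3D 9 pixel is addressed by integer (column, row) counted from the upper-left corner. The function and type names are hypothetical and belong to neither API.

```c
#include <assert.h>

typedef struct { double x, y; } Vec2;

/* Depth in normalized device coordinates:
   OpenGL's [-1,+1] range mapped onto Direct3D's [0,1] range. */
static double gl_ndc_z_to_d3d(double z_gl) {
    return 0.5 * z_gl + 0.5;
}

/* Center of the Direct3D 9 pixel (col, row), where row 0 is the top row
   and centers sit at integer locations, expressed in OpenGL window
   coordinates (lower-left origin, centers at half-pixel locations). */
static Vec2 d3d9_pixel_center_to_gl(int col, int row, int height) {
    Vec2 p;
    assert(height > 0 && col >= 0 && row >= 0 && row < height);
    p.x = col + 0.5;            /* shift to the half-pixel center */
    p.y = height - (row + 0.5); /* flip the origin, then shift */
    return p;
}
```

The same depth remapping is what a ported projection matrix has to absorb, and the window-position mapping is what a fragment shader reading the window position has to account for.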
Direct3D vs. OpenGL Provoking Vertex Conventions
Direct3D uses “first” vertex of a triangle or line to determine which color is used for flat shading
OpenGL uses “last” vertex for lines, triangles, and quads
Except for polygon (GL_POLYGON) mode, which uses the first vertex
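The two conventions can be compared with a small index calculation. This plain C sketch covers only independent lines, independent triangles, and GL_POLYGON, with 0-based primitive and vertex indices; the enum and function names are hypothetical and belong to neither API.

```c
#include <assert.h>

typedef enum { CONV_FIRST_VERTEX, CONV_LAST_VERTEX } Convention;

/* Index of the vertex whose color flat-shades the i-th independent line.
   CONV_FIRST_VERTEX models Direct3D, CONV_LAST_VERTEX models OpenGL. */
static int provoking_vertex_line(Convention conv, int i) {
    assert(i >= 0);
    return conv == CONV_FIRST_VERTEX ? 2 * i : 2 * i + 1;
}

/* Same question for the i-th independent triangle. */
static int provoking_vertex_triangle(Convention conv, int i) {
    assert(i >= 0);
    return conv == CONV_FIRST_VERTEX ? 3 * i : 3 * i + 2;
}

/* The OpenGL exception: a GL_POLYGON primitive uses its first vertex. */
static int provoking_vertex_gl_polygon(void) {
    return 0;
}
```

Content authored for one convention and flat-shaded under the other picks up a different vertex color per primitive, which is why the mismatch shows up as a visible porting bug rather than a crash.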
A long-time implementer of OpenGL (Mark Kilgard, NVIDIA) and the system's original architect (Kurt Akeley, Microsoft) explain OpenGL's design and evolution. OpenGL's state machine is now a complex data-flow with multiple programmable stages. OpenGL practitioners can expect candid design explanations, advice for programming modern GPUs, and insight into OpenGL's future.
These slides were presented at SIGGRAPH Asia 2008 for the "Modern OpenGL: Its Design and Evolution" course.
Course abstract: OpenGL was conceived in 1991 to provide an industry standard for programming the hardware graphics pipeline. The original design has evolved considerably over the last 17 years. Whereas capabilities mandated by OpenGL such as texture mapping and a stencil buffer were present only on the world's most expensive graphics hardware back in 1991, now these features are completely pervasive in PCs and now even available in several hand-held devices. Over that time, OpenGL's original fixed-function state machine has evolved into a complex data-flow including several application-programmable stages. And the performance of OpenGL has increased from 100x to over 1,000x in many important raw graphics operations.
In this course, a long-time implementer of OpenGL and the system's original architect explain OpenGL's design and evolution.
You will learn how the modern (post-2006) graphics hardware pipeline is exposed through OpenGL. You will hear Kurt Akeley's personal retrospective on OpenGL's development. You will learn nine ways to write better OpenGL programs. You will learn how modern OpenGL implementations operate. Finally we discuss OpenGL's future evolution.
Whether you program with OpenGL or program with another API such as Direct3D, this course will give you new insights into graphics hardware architecture, programmable shading, and how to best take advantage of modern GPUs.