Winter 2011 – Beyond Programmable Shading
Beyond Programmable Shading
Introduction
Aaron Lefohn - Intel / University of Washington
Mike Houston – AMD / Stanford
Course Goals, To Learn…
• The state-of-the-art in real-time rendering hardware,
programming models, and algorithms
• The open research problems in real-time rendering algorithms
and programming models
Course Goals (Detail). To Learn…
• Architecture / programming models
– How does a GPU work, and what is the difference between a CPU and a GPU?
– How to use all CPU and GPU FLOPs
– Parallel programming models and algorithms for graphics on CPU and GPU
– Open research problems in parallel programming models for graphics
• Rendering
– The modern real-time rendering pipeline (DirectX11)
– Latest CPU and GPU parallel programming models
– How to solve rendering problems by mixing traditional 3D pipeline with your
own task- or data-parallel algorithms
– State-of-the-art in real-time rendering algorithms
– Open research problems in real-time rendering
Course History
• Beyond Programmable Shading SIGGRAPH courses
– 2008, 2009, 2010
• Beyond Programmable Shading Stanford course
– Spring 2010 (5-student “pilot” course)
• Winter 2011
– First offered at University of Washington
– To be offered at Stanford in spring 2011
Instructor Biographies
• Aaron Lefohn
– Director of research for the Advanced Rendering Technologies team at Intel
– Real-time rendering and rendering programming model research
– Neoptica, graphics startup acquired by Intel in 2007
– Rendering R&D at Pixar
– Ph.D. University of California, Davis with John Owens
– Early GPGPU researcher (beginning in 2002)
• Mike Houston
– Principal Architect for AMD Accelerated Parallel Processing
– One of the OpenCL and Fusion SW technical leads
– Heavily involved in GPU and Fusion hardware for parallel computation
– Ph.D. Stanford with Pat Hanrahan
– Early GPGPU researcher (beginning in 2002)
Syllabus Overview
• Section 1: Review modern real-time rendering
– DirectX 9 pipeline (ca. 2003)
– DirectX 11 pipeline (ca. 2009)
– Rendering effects / terms (shadow maps, cube maps, *-buffers, …)
• Section 2: GPU architecture and parallel programming
– Interleaved lectures on arch., prog. models, rendering applications
– Focus on GPU (because it is new to you), but discuss CPU and GPU
• Section 3: Putting it all together
– Advanced rendering techniques that use 3D pipeline plus parallel algorithms
– New real-time rendering pipelines
– How games use these architectures, programming models, and techniques
– Open research problems
Assignment (Rough) Overview
• Assignment 1 (15%, individual)
– Warm-up “programmable shading” project using DirectX11
– I’ll provide skeleton code; you write shaders and add rendering passes
• Assignment 2 (35%, individual)
– Heterogeneous CPU + GPU rendering project using CPU and GPU languages
– I’ll provide skeleton code, but this assignment will involve a lot of coding
• Assignment 3, term project (50%, ~2 people / project)
– “Render pretty pictures using all CPU and GPU cores”
– New solutions to open problems in rendering or parallel algorithms
Logistics I
• Computers
– Each registered student will be loaned a new computer for the term
– Location
– Grad students may keep their machine at their UW desk (not at home)
– Other machines will be in Sieg 327
– Hardware details
– 12 CPU cores / 24 CPU threads (3.33 GHz Xeon) with 12 GB of RAM
– AMD 5870 GPU with 1 GB of RAM
– Software
– Intel Parallel Studio (including Cilk and many other parallelism tools)
– Microsoft DirectX SDK
– AMD StreamSDK (OpenCL)
– AMD GPU drivers (OpenGL, Direct3D, DirectCompute)
– HW and SW donated by Intel, AMD, and Microsoft
Logistics II
• We will have several guest lecturers
– Natasha Tatarchuk from Bungie
– Andrew Lauritzen from Intel
– Likely a couple of others later in the term (TBA)
• Website
– UW: http://www.cs.washington.edu/education/courses/cse558/11wi/
• Mailing list
– UW: https://mailman.cs.washington.edu/mailman/listinfo/cse558
A Brief History of the Real-Time Rendering Pipeline
Early Software Renderers: Fixed Function
• From the 1970s through the mid-1980s, most graphics pipelines were
fixed-function
• Must change core rendering code to change
geometry/lighting/surface model
Mid-’80s – Early ’90s: Programmable Shading
• Most of graphics pipeline is fixed-function, highly optimized
architecture, but shaders allow users to customize small portions
of pipeline with simple language
– Shade Trees, Cook 1984
– An Image Synthesizer, Perlin 1985
– RenderMan Shading Language, Hanrahan/Lawson 1990
Cook Approach
Shader:
• Basic operations: dot products, norms, etc.
• Operations organized into trees
- Separated light source specification, surface reflectance, and
atmospheric effects
- Multiple trees: shade trees, light trees, atmosphere trees,
displacement maps
- Simple language
Slide courtesy of John Owens, UC Davis
Shade Trees
[Figure: shade tree for the Phong shading model. Color = Diffuse (N · L) + Specular ((N · H)^n)]
Slide courtesy of John Owens, UC Davis
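A shade tree can be sketched as a small expression tree whose leaves read shading inputs and whose inner nodes combine their children. The example below is only an illustration of the Phong tree on this slide, not Cook's implementation: the `Node`, `leaf`, and `shade` names are hypothetical, colors are reduced to a single scalar intensity, and material coefficients are omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

class Node:
    """A shade-tree node: an operation applied to the results of its children."""
    def __init__(self, op, *children):
        self.op = op
        self.children = children

    def evaluate(self, inputs):
        args = [child.evaluate(inputs) for child in self.children]
        return self.op(inputs, *args)

def leaf(name):
    """Leaf node that reads a named shading input such as "N", "L", "H" or "n"."""
    return Node(lambda inputs: inputs[name])

# The tree from the slide: Color = Diffuse(N . L) + Specular((N . H)^n)
diffuse = Node(lambda _, n, l: max(dot(n, l), 0.0), leaf("N"), leaf("L"))
specular = Node(lambda _, n, h, e: max(dot(n, h), 0.0) ** e,
                leaf("N"), leaf("H"), leaf("n"))
color = Node(lambda _, d, s: d + s, diffuse, specular)

def shade(normal, to_light, to_eye, exponent):
    """Evaluate the tree for one surface point (scalar intensity only)."""
    n = normalize(normal)
    l = normalize(to_light)
    # H is the half vector between the light and eye directions.
    h = normalize(tuple(a + b for a, b in zip(l, normalize(to_eye))))
    return color.evaluate({"N": n, "L": l, "H": h, "n": exponent})
```

Swapping a subtree (for example, replacing `specular` with a different node) changes the shading model without touching the code that evaluates the tree, which is the modularity the slide attributes to Cook.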
Cook: Arbitrary Trees
Slide courtesy of John Owens, UC Davis
Images from Perlin
Contributions
Cook: Separates / modularizes factors that determine shading
– Geometric
– Material
– Environmental
Perlin: Language definition, conditionals, …
Leads to …
Slide courtesy of John Owens, UC Davis
RenderMan—Hanrahan/Lawson
1. Abstract shading model based on optics for either
global/local illumination
2. Define interface between rendering program and shading
modules
3. High-level shading language
Three main kinds of shaders—light source, surface reflectance,
volume
Slide courtesy of John Owens, UC Davis
Shading Language Summary
• Expert rendering engineers write highly optimized rendering
architecture
• Shading languages allow non-expert users to customize
renderer
• Consequence
– Renderers specify their own compilers, language runtimes, memory models
– User’s shading code runs as the innermost loop, so JIT compilation ends up
being very important
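The split between the expert-written renderer and the user-written shader can be sketched as a fixed loop that calls user code once per pixel. This is a toy illustration under stated assumptions: `render` and `checker` are hypothetical names, and normalized pixel coordinates stand in for the surface parameters a real renderer would pass to a shader.

```python
def render(width, height, surface_shader):
    """Fixed 'renderer' loop: the engine owns iteration, the user owns shading."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Normalized pixel coordinates stand in for real surface parameters.
            u = x / (width - 1) if width > 1 else 0.0
            v = y / (height - 1) if height > 1 else 0.0
            # The user's code runs here, in the innermost loop.
            row.append(surface_shader(u, v))
        image.append(row)
    return image

def checker(u, v, tiles=4):
    """Example user shader: a checkerboard returned as a gray level."""
    return 1.0 if (int(u * tiles) + int(v * tiles)) % 2 == 0 else 0.0

image = render(8, 8, checker)
```

Because `surface_shader` is called width × height times per frame, its cost dominates, which is why the slide stresses compiling user shaders well.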
A Few (but not all) Seminal Papers
• Depth Buffer: Catmull 1978
• A-Buffer: Carpenter 1984
• Shade Trees: Cook 1984
• Alpha Blending: Porter and Duff 1984
• REYES: Cook, Carpenter, Catmull 1987
• RenderMan: Hanrahan, Lawson 1990
• Reality Engine Graphics: Akeley 1993
• …
Early 90s, Interactive Rendering Started Over
• Interactive software rendering (no GPUs yet)
• NOTE: SGI was building interactive rendering supercomputers,
but this was beginning of interactive 3D graphics on PC
Wolfenstein 3D, 1992 Doom I, 1993
By Late 90s, Graphics Hardware Emerging
• “Hey, let’s build hardware to make a highly constrained
graphics pipeline go really fast”
• 3DFX, NVIDIA, ATI, and countless others
• Eventually replaced SGI’s supercomputers with plug-in boards
for PC
• OpenGL and Direct3D gained dominance as they provided the
only programming interface to GPUs
OpenGL and DirectX (’90s to early 2000s)
• Fixed function pipeline
– Configure options via API
– Fixed per-vertex lighting model
– Fixed vertex transforms
– Limited texturing
• Rendering pipeline designed by expert HW architects with little
programmability available to end-user
2001+: GPUs “Rediscover” Programmable Shading
• NVIDIA GeForce 3 and ATI Radeon 8500
• Programmable vertex computations
– Up to 128 instructions
• Limited programmable fragment computations
– 8-16 instructions
2008 (DirectX 10) Programmable Pipeline
• High-level shading languages
– GL Shading Language (GLSL) for OpenGL
– High Level Shading Language (HLSL) for DirectX
• 1000s of instructions permitted per stage
• Flow control, integers, arrays, temporary storage, large
number of textures, …
[Figure: pipeline stages. Input Assembler → Vertex Shader → Geometry Shader → Setup / Rasterizer → Pixel Shader → Output Merger]
Beyond Programmable Shading:
Parallel Programming for
Graphics
Completing the Circle
• Beginning in 2001/2002, researchers realized that
programmable GPUs could do more than graphics
– GPUs were becoming data-parallel co-processors
– A research field was born: General Purpose Programming on GPUs
(GPGPU)
– Scores of papers about data-parallel algorithms on GPUs
– Finance, physical simulation, medical imaging, …
Non-Graphics GPU Programming Models
• From GPGPU arose parallel programming models that let
users program GPUs without using the 3D APIs (OGL / D3D)
– Brook, Sh / RapidMind, PeakStream, CUDA, OpenCL, DirectCompute, …
• Focus on data-parallelism but task-parallelism is coming
– Nascent task parallelism in OpenCL
– Researchers have found ways to build task systems in GPU compute
languages
“The Killer App of GPGPU is Graphics”
• But then researchers started writing rendering papers that
combined data-parallel GPU algorithms with the GPU
rendering pipeline
– Ray tracing and photon mapping, Purcell 2002-2003
– Summed Area Table Generation, Hensley 2005
– Approximate global illumination, Bunnell 2005
– Resolution Matched Shadow Maps, Lefohn 2007
– …
Early “Beyond Programmable Shading”
“Fast Summed-Area Table Generation and its
Applications,”
Hensley et al., Eurographics 2005
“Resolution Matched Shadow Maps,”
Lefohn et al., ACM Transactions on Graphics 2007
“Dynamic Ambient Occlusion and Indirect
Lighting,” Bunnell, GPU Gems II, 2005
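As a reminder of what a summed-area table is, here is a serial CPU sketch of the data structure and its four-lookup box query. It is not the parallel GPU construction described by Hensley et al.; the function names and the zero-border layout are assumptions chosen for clarity.

```python
def summed_area_table(image):
    """Return a table with a zero border: sat[y][x] = sum of image[0:y][0:x]."""
    height, width = len(image), len(image[0])
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            # Sum above this row plus the running sum along this row.
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat

def box_sum(sat, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, using four lookups."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
sat = summed_area_table(image)
```

Once the table is built, any axis-aligned box filter costs the same four lookups regardless of its size, which is what makes the structure attractive for per-frame use.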
Today: Creating New Rendering Algorithms
• What?
– Design interactive rendering algorithms that
– Adapt dynamically by performing per-frame analysis of rendering results
– Build dynamic per-frame data structures
– Use an alternate graphics pipeline
• How?
– Tools
– Use OpenGL/Direct3D for “standard rendering” portion of algorithm
– Use OpenCL/DirectCompute for “GPU parallel algorithm” portion
– Use Cilk, OpenCL, ArBB, … for “CPU parallel algorithm” portion
– Graphics programming uses mix of
– Data-parallelism
– Task-parallelism
– Pipeline parallelism
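The three kinds of parallelism listed above can be contrasted with a small CPU-side sketch. It shows program structure only, not a speedup: Python threads are used because they are in the standard library, and `shade`, `build_shadow_map`, `cull_objects`, and the two stage functions are hypothetical stand-ins for real rendering work.

```python
from concurrent.futures import ThreadPoolExecutor

def shade(pixel):
    """Hypothetical per-pixel work."""
    return pixel * 2

def build_shadow_map():
    """Hypothetical independent task."""
    return "shadow map"

def cull_objects():
    """Another hypothetical independent task."""
    return "visible set"

pixels = list(range(8))
with ThreadPoolExecutor(max_workers=4) as pool:
    # Data parallelism: the same function applied to every element.
    shaded = list(pool.map(shade, pixels))
    # Task parallelism: different functions running concurrently.
    tasks = [pool.submit(build_shadow_map), pool.submit(cull_objects)]
    task_results = [task.result() for task in tasks]

# Pipeline parallelism: stages connected in sequence. Generators model the
# dataflow only; a real pipeline would run the stages concurrently.
def vertex_stage(vertices):
    for v in vertices:
        yield v + 1

def pixel_stage(fragments):
    for f in fragments:
        yield f * 10

pipelined = list(pixel_stage(vertex_stage([0, 1, 2])))
```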
“Sample Distribution Shadow Maps,”
Lauritzen et al., I3D 2011
“Render-analyze-render”
“Real-Time Concurrent Linked List Construction on the GPU,”
Yang et al., EGSR 2010
“Render to user-defined data structure”
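The “render-analyze-render” pattern can be sketched as three steps: render a depth buffer, reduce it to find the depth range that is actually visible, then feed that range into the next pass. The code below is a loose illustration in the spirit of that idea, not the algorithm from the paper; the function names, the `EMPTY` sentinel, and the choice of logarithmic splits are assumptions.

```python
EMPTY = float("inf")  # assumed marker for pixels that hit no geometry

def analyze_depth(depth_buffer):
    """Analysis pass: min/max view-space depth over pixels that hit geometry."""
    visible = [d for row in depth_buffer for d in row if d != EMPTY]
    return min(visible), max(visible)

def logarithmic_splits(z_min, z_max, count):
    """Split [z_min, z_max] into `count` ranges with logarithmic spacing."""
    ratio = z_max / z_min
    return [z_min * ratio ** (i / count) for i in range(count + 1)]

# Pass 1 (stand-in): a depth buffer as a first render pass might produce it.
depth_buffer = [[EMPTY, 0.2, 0.2],
                [0.1, 0.4, EMPTY],
                [0.1, 0.8, 0.8]]

# Analysis: a reduction over the rendered result.
z_min, z_max = analyze_depth(depth_buffer)

# Pass 2 input: shadow-map partitions fitted to what the camera actually sees.
splits = logarithmic_splits(z_min, z_max, 3)
```

On a GPU the analysis step would be a parallel reduction in a compute language, run between the two rendering passes.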
“OptiX: A General Purpose Ray Tracing
Engine,” Parker et al., SIGGRAPH 2010
“Real-Time Stochastic Rasterization
on Conventional GPU Architectures,” McGuire et al.,
High Performance Graphics 2010
Building New Real-Time
Rendering Pipelines
Mainstream “Beyond Programmable Shading”
• In late 2009, Microsoft and Apple/Khronos released
cross-architecture APIs for programming GPUs in ways other than
through the 3D rendering API
– DirectX11: Direct3D plus DirectCompute
– OpenCL, interoperates with OpenGL
• DirectCompute designed to work closely with 3D rendering API
– Same language (HLSL)
– Same CPU-side API
– Same memory space
DirectCompute in Graphics Products
(as of August 2010)
• Screen-Space Ambient Occlusion
– BattleForge
– Colin McRae Dirt 2
• Depth of Field
– Metro 2033
– Just Cause 2
• FFT Lens Effects
– FutureMark 3DMark 11
Slide from Chas Boyd,
Beyond Programmable Shading, SIGGRAPH 2010
DirectCompute in Shipping
Graphics Games/Demos
Ambient Occlusion
DirectCompute Use in Real-Time Rendering Products
Slide from Chas Boyd,
Beyond Programmable Shading, SIGGRAPH 2010
Screen-Space Ambient Occlusion
• Conventional technique uses pixel shaders
– http://sites.google.com/site/perumaal/ao.pdf
• DirectCompute shaders enable more control of convolution filter cache
Slide from Chas Boyd,
Beyond Programmable Shading, SIGGRAPH 2010
HDSSAO Off
Image credit:
Codemasters
Slide from Chas Boyd,
Beyond Programmable Shading, SIGGRAPH 2010
HDSSAO On
Image credit:
Codemasters
Slide from Chas Boyd,
Beyond Programmable Shading, SIGGRAPH 2010
Depth of Field
DirectCompute Use in Real-Time Rendering Products
Slide from Chas Boyd,
Beyond Programmable Shading, SIGGRAPH 2010
Diffusion Post-Process Depth-of-Field
• Model the problem as a heat-flow simulation
– Blur radius analogous to heat distance traveled
– Controlled by ‘thermal conductivity’ of image, which is defined by distance
from focal plane
– Solve thousands of tridiagonal linear systems in parallel each frame
• Original work at Pixar
– http://graphics.pixar.com/library/DepthOfField/paper.pdf
– Michael Kass, Aaron Lefohn, John Owens
• Metro 2033
– GDC 2010 Presentation by Oles Shishkovtsov, 4A Games and Ashu Rege, NVIDIA
– http://developer.nvidia.com/object/gdc-2010.html#metro
Slide from Chas Boyd,
Beyond Programmable Shading, SIGGRAPH 2010
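Each tridiagonal system mentioned on this slide corresponds to one row (or column) of the image. The sketch below solves one such system serially with the Thomas algorithm and uses it for a single implicit diffusion step along a row. It illustrates the idea under stated assumptions: `blur_row`, the per-link `conductivity` values, and the matrix layout are illustrative choices, and a GPU implementation would use a parallel solver across thousands of rows rather than this serial one.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] (lower[0] is unused), diag[i] multiplies x[i],
    and upper[i] multiplies x[i+1] (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def blur_row(pixels, conductivity):
    """One implicit diffusion step; conductivity[i] links pixel i and i + 1.

    Larger conductivity means more blur; zero leaves the pixels untouched.
    """
    n = len(pixels)
    lower = [0.0] + [-conductivity[i - 1] for i in range(1, n)]
    upper = [-conductivity[i] for i in range(n - 1)] + [0.0]
    diag = [1.0 - lower[i] - upper[i] for i in range(n)]
    return solve_tridiagonal(lower, diag, upper, pixels)

sharp = blur_row([0.0, 0.0, 1.0, 0.0, 0.0], [0.0] * 4)    # in focus
blurred = blur_row([0.0, 0.0, 1.0, 0.0, 0.0], [2.0] * 4)  # out of focus
```

In a depth-of-field filter the conductivity would be derived per pixel from distance to the focal plane, so in-focus regions stay sharp while out-of-focus regions spread.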
Image credit:
4A Games
Slide from Chas Boyd,
Beyond Programmable Shading, SIGGRAPH 2010
FFT Lens Effects
DirectCompute Use in Real-Time Rendering Products
FFT Lens Effects in 3DMark11
• FFT rendered image
• Apply lens effect in frequency domain
• Inverse FFT
Slide from Chas Boyd,
Beyond Programmable Shading, SIGGRAPH 2010
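The three steps above (transform, multiply in the frequency domain, inverse transform) can be sketched on a single scanline. This is a minimal 1D illustration with a hand-written radix-2 FFT so that it has no dependencies; a real lens effect would apply a 2D FFT to the whole image on the GPU, and the `fft` and `apply_lens_kernel` names and the kernel values are assumptions.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two. Unnormalized."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def apply_lens_kernel(signal, kernel):
    """Circular convolution of signal with kernel via the frequency domain."""
    n = len(signal)
    spectrum = fft(signal)                                   # 1. FFT the image
    kernel_spectrum = fft(kernel)
    product = [s * k for s, k in zip(spectrum, kernel_spectrum)]  # 2. multiply
    return [value.real / n for value in fft(product, inverse=True)]  # 3. inverse

# A single bright pixel spread by a small 3-tap kernel (indices wrap around).
scanline = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
result = apply_lens_kernel(scanline, kernel)
```

The payoff is that the multiply step costs the same for any kernel size, so very wide glare or bloom kernels become affordable.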
Recap:
How did/do programmers
create new real-time
rendering algorithms?
Fixed Function Pipelines (DX7)
• Writing new rendering algorithms means
– Tricks with multitexture, stencil buffer, depth buffer, blending …
• Examples
– Stencil shadow volumes
– Hidden line removal
– …
Programmable Shaders (DX8-10)
• Writing new rendering algorithms means
– Tricks with stencil buffer, depth buffer, blending …
– Plus: Writing shaders
• Examples
– User-defined materials
– User-defined lights
– User-defined data structures (built in texture memory)
Software Graphics: Part I (DX11)
• Writing new rendering algorithms means
– Tricks with stencil buffer, depth buffer, blending …
– Plus: Writing shaders
– Plus: Writing data- and task-parallel algorithms
– Analyze results of rendering pipeline
– Synthesize data structures
• Examples
– Dynamic summed area table
– Dynamic quadtree adaptive shadow map
– Dynamic histogram-analysis shadow map
– Dynamic ambient occlusion
– Order-independent transparency
– …
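One concrete case of synthesizing a data structure from rendering results is a per-pixel linked list of fragments, which can then be sorted and blended for order-independent transparency. The sketch below is a serial CPU illustration of that layout, not a GPU implementation: the `PerPixelLists` class and its method names are hypothetical, and colors are single scalars.

```python
NULL = -1  # end-of-list marker

class PerPixelLists:
    """A head-pointer buffer plus one shared node pool for all pixels."""
    def __init__(self, width, height):
        self.width = width
        self.heads = [NULL] * (width * height)  # one head index per pixel
        self.nodes = []                          # (depth, color, alpha, next)

    def insert(self, x, y, depth, color, alpha):
        """Push a fragment on the front of its pixel's list.

        On a GPU the node index would come from an atomic counter and the
        head update would be an atomic exchange; here it is serial.
        """
        pixel = y * self.width + x
        self.nodes.append((depth, color, alpha, self.heads[pixel]))
        self.heads[pixel] = len(self.nodes) - 1

    def resolve(self, x, y, background=0.0):
        """Gather a pixel's fragments, sort far-to-near, and blend with 'over'."""
        fragments = []
        index = self.heads[y * self.width + x]
        while index != NULL:
            depth, color, alpha, index = self.nodes[index]
            fragments.append((depth, color, alpha))
        result = background
        for _, color, alpha in sorted(fragments, reverse=True):
            result = alpha * color + (1.0 - alpha) * result
        return result

lists = PerPixelLists(2, 2)
lists.insert(0, 0, depth=0.5, color=1.0, alpha=0.5)  # near, half transparent
lists.insert(0, 0, depth=0.9, color=0.0, alpha=1.0)  # far, opaque
pixel_value = lists.resolve(0, 0)
```

Fragments can arrive in any order because sorting happens at resolve time, which is what makes the result independent of draw order.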
Software Graphics: Part II (DX11+)
• Writing new rendering algorithms means
– Tricks with stencil buffer, depth buffer, blending …
– Plus: Writing shaders
– Plus: Writing data- and task-parallel algorithms
– Analyze results of rendering pipeline
– Synthesize data structures
– Plus: Creating new and extended rendering pipelines
• Examples
– Stochastic rasterization rendering
– Ray tracing pipelines
– …
What Next? Full Software Graphics?
• Intel’s Larrabee
– Advertised to come full-circle to “full software” real-time graphics
– Unfortunately, the product was cancelled
• Yet
– “Mixing SW graphics with the HW pipeline” is becoming mainstream
– GPUs are becoming increasingly programmable
– CPUs are becoming more parallel/powerful
– CPUs and GPUs are getting “closer” with integrated CPU-GPU chips
• So…
– Will fixed-function graphics hardware and fixed pipelines go away?
The Wheel of Reincarnation
Gradually the processor became more complex.... Finally the
display processor came to resemble a full-fledged computer
with some special graphics features. And then a strange thing
happened. We felt compelled to add to the processor a second,
subsidiary processor, which, itself, began to grow in
complexity. It was then that we discovered a disturbing truth.
Designing a display processor can become a never-ending
cyclical process. In fact, we found the process so frustrating
that we have come to call it the “wheel of reincarnation.”
- Myer and Sutherland, “On the Design of Display Processors,”
Communications of the ACM, 1968
Will There Be Another Turn of The
Wheel of Reincarnation?
• Is “the rise of SW graphics” a temporary (5-10 year) window
as we go around the wheel of reincarnation or has the wheel
stopped turning?
• If it has stopped turning, why?
• If it hasn’t stopped turning, what will be the next fixed-function hardware?
– The next turn of the wheel will not look like today’s graphics hardware
– It is a great time to be a graphics researcher because the killer-app SW
rendering pipelines/capabilities created now may define future
fixed-function hardware
(A Few) Open Real-Time Rendering Problems
• More realistic “eye rays”
– Anti-aliasing
– Motion blur and depth-of-field
– Transparency
– Curved and highly-detailed geometry
– …
• More realistic illumination
– Shadows (hard, soft, volumetric)
– Global illumination
– Reflections
From Johan Andersson’s SIGGRAPH 2010 talk
(A Few) Open Programming Model Problems
• Consistent programming models for CPU and GPU
• Consistent programming model for pipeline, data, and task
parallelism (“braided parallelism”)
• More flexible rendering pipelines
• (Many more will come up throughout the course as you use
the state-of-the-art)
Conclusions
• SW + HW graphics is here today (beginning “for real” in DX11)
– Graphics programming is no longer simply a single pre-defined pipeline
– The field is ablaze with SW real-time rendering research on GPUs and CPUs
• Future real-time rendering programming will likely consist of
– A pre-defined (Direct3D/OpenGL) rendering pipeline
– User-defined software pipelines
– User-defined data- and task-parallel code tightly coupled to graphics pipelines
• Is the wheel of reincarnation still turning?
Questions?
• Email Aaron: alefohn@cs.washington.edu
• Email Mike: mhouston@graphics.stanford.edu
• Next lecture: “A trip down the 2003 rasterization pipeline”

More Related Content

What's hot

PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...AMD Developer Central
 
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon WoodsWT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon WoodsAMD Developer Central
 
Intel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOFIntel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOFOfer Rosenberg
 
Resource Oriented Future for Geospatial Data
Resource Oriented Future for Geospatial DataResource Oriented Future for Geospatial Data
Resource Oriented Future for Geospatial DataArnulf Christl
 
D11: a high-performance, protocol-optional, transport-optional, window system...
D11: a high-performance, protocol-optional, transport-optional, window system...D11: a high-performance, protocol-optional, transport-optional, window system...
D11: a high-performance, protocol-optional, transport-optional, window system...Mark Kilgard
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeOfer Rosenberg
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsShinya Takamaeda-Y
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...AMD Developer Central
 
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengHC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengAMD Developer Central
 
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingSyysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingElectronic Arts / DICE
 
Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...waqarnabi
 

What's hot (15)

PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
PT-4102, Simulation, Compilation and Debugging of OpenCL on the AMD Southern ...
 
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon WoodsWT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
WT-4073, ANGLE and cross-platform WebGL support, by Shannon Woods
 
Intel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOFIntel's Presentation in SIGGRAPH OpenCL BOF
Intel's Presentation in SIGGRAPH OpenCL BOF
 
GPU Ecosystem
GPU EcosystemGPU Ecosystem
GPU Ecosystem
 
Resource Oriented Future for Geospatial Data
Resource Oriented Future for Geospatial DataResource Oriented Future for Geospatial Data
Resource Oriented Future for Geospatial Data
 
D11: a high-performance, protocol-optional, transport-optional, window system...
D11: a high-performance, protocol-optional, transport-optional, window system...D11: a high-performance, protocol-optional, transport-optional, window system...
D11: a high-performance, protocol-optional, transport-optional, window system...
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universe
 
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAsScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
ScalableCore System: A Scalable Many-core Simulator by Employing Over 100 FPGAs
 
The GPGPU Continuum
The GPGPU ContinuumThe GPGPU Continuum
The GPGPU Continuum
 
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
Keynote (Nandini Ramani) - The Role of Java in Heterogeneous Computing & How ...
 
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu FengHC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
HC-4022, Towards an Ecosystem for Heterogeneous Parallel Computing, by Wu Feng
 
Sai Dheeraj_Resume
Sai Dheeraj_ResumeSai Dheeraj_Resume
Sai Dheeraj_Resume
 
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray TracingSyysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
Syysgraph 2018 - Modern Graphics Abstractions & Real-Time Ray Tracing
 
Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...
 
Br4201458461
Br4201458461Br4201458461
Br4201458461
 

Similar to 01 intro-bps-2011

Compute API –Past & Future
Compute API –Past & FutureCompute API –Past & Future
Compute API –Past & FutureOfer Rosenberg
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLinaro
 
Architecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureArchitecting Solutions for the Manycore Future
Architecting Solutions for the Manycore FutureTalbott Crowell
 
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張Unity Technologies Japan K.K.
 
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張
【Unite 2017 Tokyo】スクリプタブル・レンダーパイプラインのカスタマイズと拡張Unite2017Tokyo
 
Computing Without Computers - Oct08
Computing Without Computers - Oct08Computing Without Computers - Oct08
Computing Without Computers - Oct08Ian Page
 
OpenCL & the Future of Desktop High Performance Computing in CAD
OpenCL & the Future of Desktop High Performance Computing in CADOpenCL & the Future of Desktop High Performance Computing in CAD
OpenCL & the Future of Desktop High Performance Computing in CADDesign World
 
OpenHPI - Parallel Programming Concepts - Week 4
OpenHPI - Parallel Programming Concepts - Week 4OpenHPI - Parallel Programming Concepts - Week 4
OpenHPI - Parallel Programming Concepts - Week 4Peter Tröger
 
OliverStoneSWResume2015-05
OliverStoneSWResume2015-05OliverStoneSWResume2015-05
OliverStoneSWResume2015-05Oliver Stone
 
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono..."The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...
"The Vision API Maze: Options and Trade-offs," a Presentation from the Khrono...Edge AI and Vision Alliance
 
Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Intel® Software
 
Cockatrice: A Hardware Design Environment with Elixir
Cockatrice: A Hardware Design Environment with ElixirCockatrice: A Hardware Design Environment with Elixir
Cockatrice: A Hardware Design Environment with ElixirHideki Takase
 
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...
Keynote (Johan Andersson) - Mantle for Developers - by Johan Andersson, Techn...AMD Developer Central
 
Apache Spark and Python: unified Big Data analytics
Apache Spark and Python: unified Big Data analyticsApache Spark and Python: unified Big Data analytics
Apache Spark and Python: unified Big Data analyticsJulien Anguenot
 
2013 Hello GCC:The Theory, History and Future of System Linkers
2013 Hello GCC:The Theory, History and Future of System Linkers2013 Hello GCC:The Theory, History and Future of System Linkers
2013 Hello GCC:The Theory, History and Future of System LinkersChing-Yi Chen
 
Exploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spaceExploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spacejsvetter
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...chiportal
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...Edge AI and Vision Alliance
 
01 intro-bps-2011

  • 1. Winter 2011 – Beyond Programmable Shading
      Beyond Programmable Shading
      Introduction
      Aaron Lefohn - Intel / University of Washington
      Mike Houston – AMD / Stanford
  • 2. Course Goals, To Learn…
      • The state of the art in real-time rendering hardware, programming models, and algorithms
      • The open research problems in real-time rendering algorithms and programming models
  • 3. Course Goals (Detail). To Learn…
      • Architecture / programming models
          – How does a GPU work, and what is the difference between a CPU and a GPU?
          – How to use all CPU and GPU FLOPs
          – Parallel programming models and algorithms for graphics on CPU and GPU
          – Open research problems in parallel programming models for graphics
      • Rendering
          – The modern real-time rendering pipeline (DirectX 11)
          – Latest CPU and GPU parallel programming models
          – How to solve rendering problems by mixing the traditional 3D pipeline with your own task- or data-parallel algorithms
          – State of the art in real-time rendering algorithms
          – Open research problems in real-time rendering
  • 4. Course History
      • Beyond Programmable Shading SIGGRAPH courses
          – 2008, 2009, 2010
      • Beyond Programmable Shading Stanford course
          – Spring 2010 (5-student "pilot" course)
      • Winter 2011
          – First offered at University of Washington
          – To be offered at Stanford in spring 2011
  • 5. Instructor Biographies
      • Aaron Lefohn
          – Director of research for the Advanced Rendering Technologies team at Intel
          – Real-time rendering and rendering programming model research
          – Neoptica, graphics startup acquired by Intel in 2007
          – Rendering R&D at Pixar
          – Ph.D., University of California, Davis, with John Owens
          – Early GPGPU researcher (beginning in 2002)
      • Mike Houston
          – Principal Architect for AMD Accelerated Parallel Processing
          – One of the OpenCL and Fusion SW technical leads
          – Heavily involved in GPU and Fusion hardware for parallel computation
          – Ph.D., Stanford, with Pat Hanrahan
          – Early GPGPU researcher (beginning in 2002)
  • 6. Syllabus Overview
      • Section 1: Review modern real-time rendering
          – DirectX 9 pipeline (ca. 2003)
          – DirectX 11 pipeline (ca. 2009)
          – Rendering effects / terms (shadow maps, cube maps, *-buffers, …)
      • Section 2: GPU architecture and parallel programming
          – Interleaved lectures on architecture, programming models, and rendering applications
          – Focus on GPU (because it is new to you), but discuss CPU and GPU
      • Section 3: Putting it all together
          – Advanced rendering techniques that use the 3D pipeline plus parallel algorithms
          – New real-time rendering pipelines
          – How games use these architectures, programming models, and techniques
          – Open research problems
  • 7. Assignment (Rough) Overview
      • Assignment 1 (15%, individual)
          – Warm-up "programmable shading" project using DirectX 11
          – I'll provide skeleton code; you write shaders and add rendering passes
      • Assignment 2 (35%, individual)
          – Heterogeneous CPU + GPU rendering project using CPU and GPU languages
          – I'll provide skeleton code, but this assignment will involve a lot of coding
      • Assignment 3, term project (50%, ~2 people / project)
          – "Render pretty pictures using all CPU and GPU cores"
          – New solutions to open problems in rendering or parallel algorithms
  • 8. Logistics I
      • Computers
          – Each registered student will be loaned a new computer for the term
          – Location
              – Grad students may keep the machine at their UW desk (not at home)
              – Other machines will be in Sieg 327
          – Hardware details
              – 12 CPU cores / 24 CPU threads (3.33 GHz Xeon) with 12 GB of RAM
              – AMD 5870 GPU with 1 GB of RAM
          – Software
              – Intel Parallel Studio (including Cilk and many other parallelism tools)
              – Microsoft DirectX SDK
              – AMD StreamSDK (OpenCL)
              – AMD GPU drivers (OpenGL, Direct3D, DirectCompute)
          – HW and SW donated by Intel, AMD, and Microsoft
  • 9. Logistics II
      • We will have several guest lecturers
          – Natasha Tatarchuk from Bungie
          – Andrew Lauritzen from Intel
          – Likely a couple of others later in the term (TBA)
      • Website
          – UW: http://www.cs.washington.edu/education/courses/cse558/11wi/
      • Mailing list
          – UW: https://mailman.cs.washington.edu/mailman/listinfo/cse558
  • 10. A Brief History of the Real-Time Rendering Pipeline
  • 11. Early Software Renderers: Fixed Function
      • From the 1970s through the mid-1980s, most graphics pipelines were fixed-function
      • Must change core rendering code to change the geometry/lighting/surface model
  • 12. Mid-'80s – Early '90s: Programmable Shading
      • Most of the graphics pipeline is a fixed-function, highly optimized architecture, but shaders allow users to customize small portions of the pipeline with a simple language
          – Shade Trees, Cook 1984
          – An Image Synthesizer, Perlin 1985
          – RenderMan Shading Language, Hanrahan/Lawson 1990
  • 13. Cook Approach
      Shader:
      • Basic operations: dot products, norms, etc.
      • Operations organized into trees
      - Separated light source specification, surface reflectance, and atmospheric effects
      - Multiple trees: shade trees, light trees, atmosphere trees, displacement maps
      - Simple language
      Slide courtesy of John Owens, UC Davis
  • 14. Shade Trees
      Phong shading model:
      [Figure: shade tree in which a "+" node combines a Diffuse node (inputs N, L) and a Specular node (inputs N, H, n) to produce Color]
      Slide courtesy of John Owens, UC Davis
  • 15. Cook: Arbitrary Trees
      Slide courtesy of John Owens, UC Davis
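The Phong shade tree on the slides above can be sketched in a few lines of Python. This is only an illustration of the idea of organizing shading operations into a tree, not Cook's original system: the `Leaf` and `Op` classes, the `shade` helper, and the choice to compute H as the normalized half-vector between the light and eye directions are all assumptions made for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

class Leaf:
    """Leaf node (hypothetical): looks up a named input such as N, L, H, or n."""
    def __init__(self, name):
        self.name = name

    def eval(self, inputs):
        return inputs[self.name]

class Op:
    """Interior node (hypothetical): applies a function to its children's values."""
    def __init__(self, fn, *children):
        self.fn = fn
        self.children = children

    def eval(self, inputs):
        return self.fn(*(child.eval(inputs) for child in self.children))

def diffuse(n, l):
    # Lambertian term: clamped N . L
    return max(dot(n, l), 0.0)

def specular(n, h, exponent):
    # Phong-style highlight: clamped (N . H) raised to the exponent n
    return max(dot(n, h), 0.0) ** exponent

# The tree from the slide: Color = Diffuse(N, L) + Specular(N, H, n)
phong_tree = Op(lambda d, s: d + s,
                Op(diffuse, Leaf("N"), Leaf("L")),
                Op(specular, Leaf("N"), Leaf("H"), Leaf("n")))

def shade(normal, to_light, to_eye, exponent):
    """Evaluate the tree for one surface point (scalar intensity only)."""
    n = normalize(normal)
    l = normalize(to_light)
    v = normalize(to_eye)
    h = normalize(tuple(a + b for a, b in zip(l, v)))  # half-vector (assumed definition)
    return phong_tree.eval({"N": n, "L": l, "H": h, "n": exponent})
```

Swapping a subtree (for example, replacing `diffuse` with another reflectance function) changes the surface model without touching the evaluator, which is the modularity the slides attribute to Cook's approach.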
  • 16. Images from Perlin
  • 17. Contributions
      Cook: Separates / modularizes factors that determine shading
          – Geometric
          – Material
          – Environmental
      Perlin: Language definition, conditionals, …
      Leads to …
      Slide courtesy of John Owens, UC Davis
  • 18. RenderMan: Hanrahan/Lawson
      1. Abstract shading model based on optics for either global or local illumination
      2. Define interface between rendering program and shading modules
      3. High-level shading language
      Three main kinds of shaders: light source, surface reflectance, volume
      Slide courtesy of John Owens, UC Davis
  • 19. Shading Language Summary
      • Expert rendering engineers write a highly optimized rendering architecture
      • Shading languages allow non-expert users to customize the renderer
      • Consequence
          – Renderers specify their own compilers, language runtimes, memory models
          – Users' shading code runs as the innermost loop, so JIT compilation ends up being very important
  • 20. A Few (but not all) Seminal Papers
      • Depth Buffer: Catmull 1978
      • A-Buffer: Carpenter 1984
      • Shade Trees: Cook 1984
      • Alpha Blending: Porter and Duff 1984
      • REYES: Cook, Carpenter, Catmull 1987
      • RenderMan: Hanrahan, Lawson 1990
      • Reality Engine Graphics: Akeley 1993
      • …
  • 21. Early '90s, Interactive Rendering Started Over
      • Interactive software rendering (no GPUs yet)
      • NOTE: SGI was building interactive rendering supercomputers, but this was the beginning of interactive 3D graphics on the PC
      Images: Wolfenstein 3D, 1992; Doom I, 1993
  • 22. By Late '90s, Graphics Hardware Emerging
      • "Hey, let's build hardware to make a highly constrained graphics pipeline go really fast"
      • 3DFX, NVIDIA, ATI, and countless others
      • Eventually replaced SGI's supercomputers with plug-in boards for PCs
      • OpenGL and Direct3D gained dominance as they provided the only programming interface to GPUs
  • 23. OpenGL and DirectX ('90s to early 2000s)
      • Fixed-function pipeline
          – Configure options via API
          – Fixed per-vertex lighting model
          – Fixed vertex transforms
          – Limited texturing
      • Rendering pipeline designed by expert HW architects with little programmability available to the end user
  • 24. 2001+: GPUs "Rediscover" Programmable Shading
      • NVIDIA GeForce 3 and ATI Radeon 8500
      • Programmable vertex computations
          – Up to 128 instructions
      • Limited programmable fragment computations
          – 8-16 instructions
  • 25. 2008 (DirectX 10) Programmable Pipeline
      • High-level shading languages
          – GL Shading Language (GLSL) for OpenGL
          – High Level Shading Language (HLSL) for DirectX
      • 1000s of instructions permitted per stage
      • Flow control, integers, arrays, temporary storage, large number of textures, …
      [Figure: pipeline stages: Input Assembler → Vertex Shader → Geometry Shader → Setup / Rasterizer → Pixel Shader → Output Merger]
  • 26. Beyond Programmable Shading: Parallel Programming for Graphics
  • 27. Completing the Circle
      • Beginning in 2001/2002, researchers realized that programmable GPUs could do more than graphics
          – GPUs were becoming data-parallel co-processors
          – A research field was born: General Purpose Programming on GPUs (GPGPU)
          – Scores of papers about data-parallel algorithms on GPUs
          – Finance, physical simulation, medical imaging, …
  • 28. Non-Graphics GPU Programming Models
      • From GPGPU arose parallel programming models that let users program GPUs without using the 3D APIs (OGL / D3D)
          – Brook, Sh / RapidMind, PeakStream, CUDA, OpenCL, DirectCompute, …
      • Focus on data-parallelism, but task-parallelism is coming
          – Nascent task parallelism in OpenCL
          – Researchers have found ways to build task systems in GPU compute languages
  • 29. "The Killer App of GPGPU is Graphics"
      • But then researchers started writing rendering papers that combined data-parallel GPU algorithms with the GPU rendering pipeline
          – Ray tracing and photon mapping, Purcell 2002-2003
          – Summed Area Table Generation, Hensley 2005
          – Approximate global illumination, Bunnell 2005
          – Resolution Matched Shadow Maps, Lefohn 2007
          – …
  • 30. Early "Beyond Programmable Shading"
      "Fast Summed-Area Table Generation and its Applications," Hensley et al., Eurographics 2005
      "Resolution Matched Shadow Maps," Lefohn et al., ACM Transactions on Graphics 2007
      "Dynamic Ambient Occlusion and Indirect Lighting," Bunnell, GPU Gems 2, 2005
  • 31. Today: Creating New Rendering Algorithms
      • What?
          – Design interactive rendering algorithms that
              – Adapt dynamically by performing per-frame analysis of rendering results
              – Build dynamic per-frame data structures
              – Use an alternate graphics pipeline
      • How?
          – Tools
              – Use OpenGL/Direct3D for the "standard rendering" portion of the algorithm
              – Use OpenCL/DirectCompute for the "GPU parallel algorithm" portion
              – Use Cilk, OpenCL, ArBB, … for the "CPU parallel algorithm" portion
          – Graphics programming uses a mix of
              – Data-parallelism
              – Task-parallelism
              – Pipeline parallelism
  • 32. "Sample Distribution Shadow Maps," Lauritzen et al., I3D 2011
      "Render-analyze-render"
      "Real-Time Concurrent Linked List Construction on the GPU," Yang et al., EGSR 2010
      "Render to user-defined data structure"
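To make "render to user-defined data structure" concrete, here is a minimal sequential sketch of a per-pixel linked list of fragments, followed by a resolve pass that sorts and blends them for order-independent transparency. It is written in the spirit of the linked-list idea cited above but is not taken from that paper: the class and function names are hypothetical, larger depth is assumed to mean farther from the camera, and the two steps marked as atomic would need real atomic operations on a GPU.

```python
from dataclasses import dataclass

@dataclass
class Node:
    depth: float
    color: tuple       # (r, g, b), not premultiplied by alpha
    alpha: float
    next_index: int    # index of the next node in the pool, or -1 at the end

class FragmentLists:
    """Per-pixel linked lists stored in one shared node pool (hypothetical layout)."""

    def __init__(self, width, height):
        self.width = width
        self.heads = [-1] * (width * height)  # one head pointer per pixel
        self.pool = []                        # node buffer shared by all pixels

    def insert(self, x, y, depth, color, alpha):
        pixel = y * self.width + x
        new_index = len(self.pool)            # on a GPU: atomic counter increment
        self.pool.append(Node(depth, color, alpha, self.heads[pixel]))
        self.heads[pixel] = new_index         # on a GPU: atomic exchange on the head

    def fragments(self, x, y):
        """Walk the list for one pixel and return its nodes."""
        result = []
        index = self.heads[y * self.width + x]
        while index != -1:
            result.append(self.pool[index])
            index = self.pool[index].next_index
        return result

def resolve(lists, x, y, background):
    """Sort a pixel's fragments back to front and blend them with 'over'."""
    ordered = sorted(lists.fragments(x, y), key=lambda f: f.depth, reverse=True)
    color = background
    for frag in ordered:
        color = tuple(frag.alpha * c + (1.0 - frag.alpha) * b
                      for c, b in zip(frag.color, color))
    return color
```

Because sorting happens in the resolve pass, fragments can be inserted in any order during rendering and the blended result is the same.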
  • 33. Building New Real-Time Rendering Pipelines
      "OptiX: A General Purpose Ray Tracing Engine," Parker et al., SIGGRAPH 2010
      "Real-Time Stochastic Rasterization on Conventional GPU Architectures," McGuire et al., High Performance Graphics 2010
  • 34. Mainstream "Beyond Programmable Shading"
      • In late 2009, Microsoft and Apple/Khronos released cross-architecture APIs for programming GPUs in ways other than through the 3D rendering API
          – DirectX 11: Direct3D plus DirectCompute
          – OpenCL, interoperates with OpenGL
      • DirectCompute designed to work closely with the 3D rendering API
          – Same language (HLSL)
          – Same CPU-side API
          – Same memory space
  • 35. DirectCompute in Graphics Products (as of August 2010)
      • Screen-Space Ambient Occlusion
          – BattleForge
          – Colin McRae Dirt 2
      • Depth of Field
          – Metro 2033
          – Just Cause 2
      • FFT Lens Effects
          – Futuremark 3DMark 11
      Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
  • 36. DirectCompute in Shipping Graphics Games/Demos
  • 37. Ambient Occlusion
      DirectCompute Use in Real-Time Rendering Products
      Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
  • 38. Screen-Space Ambient Occlusion
      • Conventional technique uses pixel shaders
          – http://sites.google.com/site/perumaal/ao.pdf
      • DirectCompute shaders enable more control of the convolution filter cache
      Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
  • 39. HDSSAO Off
      Image credit: Codemasters
      Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
  • 40. HDSSAO On
      Image credit: Codemasters
      Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
  • 41. Depth of Field
      DirectCompute Use in Real-Time Rendering Products
      Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
  • 42. Diffusion Post-Process Depth-of-Field
      • Model the problem as a heat-flow simulation
          – Blur radius analogous to distance traveled by heat
          – Controlled by the "thermal conductivity" of the image, which is defined by distance from the focal plane
          – Solve thousands of tridiagonal linear systems in parallel each frame
      • Original work at Pixar
          – http://graphics.pixar.com/library/DepthOfField/paper.pdf
          – Michael Kass, Aaron Lefohn, John Owens
      • Metro 2033
          – GDC 2010 presentation by Oles Shishkovtsov, 4A Games, and Ashu Rege, NVIDIA
          – http://developer.nvidia.com/object/gdc-2010.html#metro
      Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
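A minimal sketch of the "heat-flow" formulation for a single scanline may help here. It assumes one implicit (backward Euler) diffusion step, where `coupling[i]` is a hypothetical conductivity between pixels i and i+1 that would be derived from distance to the focal plane; the function names and this exact discretization are assumptions for illustration, not the formulation in the cited paper or games. Each row yields one tridiagonal system, solved here with the standard Thomas algorithm.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system.

    lower[i] multiplies x[i-1], diag[i] multiplies x[i], upper[i] multiplies x[i+1].
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def diffuse_row(pixels, coupling):
    """One implicit diffusion step along a scanline.

    coupling has len(pixels) - 1 entries; zero coupling means no blur across
    that pixel boundary (in focus), larger values mean more blur.
    """
    n = len(pixels)
    lower = [0.0] * n
    diag = [1.0] * n
    upper = [0.0] * n
    for i, b in enumerate(coupling):
        upper[i] = -b
        lower[i + 1] = -b
        diag[i] += b
        diag[i + 1] += b
    return solve_tridiagonal(lower, diag, upper, pixels)
```

For example, `diffuse_row([0.0, 1.0, 0.0], [2.0, 2.0])` spreads the bright center pixel into its neighbors while keeping the total intensity unchanged. A full image needs one such solve per row and per column, which is where the "thousands of tridiagonal systems" per frame come from; a GPU version would use a parallel solver rather than this sequential one.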
  • 43. Image credit: 4A Games
      Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
  • 44. FFT Lens Effects
      DirectCompute Use in Real-Time Rendering Products
  • 45. FFT Lens Effects in 3DMark 11
      • FFT the rendered image
      • Apply the lens effect in the frequency domain
      • Inverse FFT
      Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
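The three steps above (forward FFT, modify in the frequency domain, inverse FFT) can be illustrated in one dimension. This is a sketch under stated assumptions: it uses a textbook recursive radix-2 FFT on a power-of-two-length signal, treats the "lens effect" as a circular convolution with a hypothetical kernel, and says nothing about how 3DMark 11 actually implements its effect. An image version would apply the same idea along rows and then columns.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def apply_lens_kernel(signal, kernel):
    """Circularly convolve signal with kernel by multiplying their spectra."""
    n = len(signal)
    spectrum = fft(signal)                                   # step 1: forward FFT
    response = fft(kernel)
    filtered = [s * r for s, r in zip(spectrum, response)]   # step 2: frequency domain
    # step 3: inverse FFT, with the 1/n normalization applied here
    return [value.real / n for value in fft(filtered, inverse=True)]
```

The appeal of this route is cost: a large lens kernel costs the same in the frequency domain as a small one, whereas a direct spatial convolution grows with the kernel size.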
  • 46. 46Winter 2011 – Beyond Programmable Shading Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
  • 47. 47Winter 2011 – Beyond Programmable Shading Slide from Chas Boyd, Beyond Programmable Shading, SIGGRAPH 2010
  • 48. 48Winter 2011 – Beyond Programmable Shading Recap: How did/do programmers create new real-time rendering algorithms?
  • 49. 49Winter 2011 – Beyond Programmable Shading Fixed Function Pipelines (DX7) • Writing new rendering algorithms means – Tricks with multitexture, stencil buffer, depth buffer, blending … • Examples – Stencil shadow volumes – Hidden line removal – …
  • 50. 50Winter 2011 – Beyond Programmable Shading Programmable Shaders (DX8-10) • Writing new rendering algorithms means – Tricks with stencil buffer, depth buffer, blending … – Plus: Writing shaders • Examples – User-defined materials – User-defined lights – User-defined data structures (built in texture memory)
  • 51. 51Winter 2011 – Beyond Programmable Shading Software Graphics: Part I (DX11) • Writing new rendering algorithms means – Tricks with stencil buffer, depth buffer, blending … – Plus: Writing shaders – Plus: Writing data- and task-parallel algorithms – Analyze results of rendering pipeline – Synthesize data structures • Examples – Dynamic summed area table – Dynamic quadtree adaptive shadow map – Dynamic histogram-analysis shadow map – Dynamic ambient occlusion – Order-independent transparency – …
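As one concrete instance of "synthesize data structures" from this slide, here is a minimal sketch of a summed area table. It is serial CPU code; a GPU version would typically build the table with parallel prefix sums (scans) over rows and then columns. The names `summed_area_table` and `box_sum` are hypothetical.

```python
def summed_area_table(image):
    """Build a summed area table with one extra row and column of zeros.

    sat[y][x] holds the sum of image[j][i] for all j < y and i < x.
    """
    rows, cols = len(image), len(image[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, in constant time."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
```

Once the table exists, any axis-aligned box filter costs four lookups regardless of its size, which is what makes rebuilding the table dynamically each frame attractive for variable-width blurs and soft shadows.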
  • 52. 52Winter 2011 – Beyond Programmable Shading Software Graphics: Part II (DX11+) • Writing new rendering algorithms means – Tricks with stencil buffer, depth buffer, blending … – Plus: Writing shaders – Plus: Writing data- and task-parallel algorithms – Analyze results of rendering pipeline – Synthesize data structures – Plus: Creating new and extended rendering pipelines • Examples – Stochastic rasterization rendering – Ray tracing pipelines – …
  • 53. 53Winter 2011 – Beyond Programmable Shading What Next? Full Software Graphics? • Intel's Larrabee – Advertised to come full-circle to "full software" real-time graphics – Unfortunately, the product was cancelled • Yet – "Mixing SW graphics with the HW pipeline" is becoming mainstream – GPUs are becoming increasingly programmable – CPUs are becoming more parallel/powerful – CPUs and GPUs are getting "closer" with integrated CPU-GPU chips • So… – Will fixed-function graphics hardware and fixed pipelines go away?
  • 54. 54Winter 2011 – Beyond Programmable Shading The Wheel of Reincarnation Gradually the processor became more complex.... Finally the display processor came to resemble a full-fledged computer with some special graphics features. And then a strange thing happened. We felt compelled to add to the processor a second, subsidiary processor, which, itself, began to grow in complexity. It was then that we discovered a disturbing truth. Designing a display processor can become a never-ending cyclical process. In fact, we found the process so frustrating that we have come to call it the "wheel of reincarnation." - Myer and Sutherland, "On the Design of Display Processors", Communications of the ACM, 1968
  • 55. 55Winter 2011 – Beyond Programmable Shading Will There Be Another Turn of The Wheel of Reincarnation? • Is "the rise of SW graphics" a temporary (5-10 year) window as we go around the wheel of reincarnation, or has the wheel stopped turning? • If it has stopped turning, why? • If it hasn't stopped turning, what will the next fixed-function hardware be? – The next turn of the wheel will not look like today's graphics hardware – It is a great time to be a graphics researcher because the killer-app SW rendering pipelines/capabilities created now may define future fixed-function hardware
  • 56. 56Winter 2011 – Beyond Programmable Shading (A Few) Open Real-Time Rendering Problems • More realistic "eye rays" – Anti-aliasing – Motion blur and depth-of-field – Transparency – Curved and highly-detailed geometry – … • More realistic illumination – Shadows (hard, soft, volumetric) – Global illumination – Reflections From Johan Andersson's SIGGRAPH 2010 talk
  • 57. 57Winter 2011 – Beyond Programmable Shading (A Few) Open Programming Model Problems • Consistent programming models for CPU and GPU • Consistent programming model for pipeline, data, and task parallelism ("braided parallelism") • More flexible rendering pipelines • (Many more will come up throughout the course as you use the state-of-the-art)
  • 58. 58Winter 2011 – Beyond Programmable Shading Conclusions • SW + HW graphics is here today (beginning "for real" in DX11) – Graphics programming is no longer simply a single pre-defined pipeline – The field is ablaze with SW real-time rendering research on GPUs and CPUs • Future real-time rendering programming will likely consist of – A pre-defined (Direct3D/OpenGL) rendering pipeline – User-defined software pipelines – User-defined data- and task-parallel code tightly coupled to graphics pipelines • Is the wheel of reincarnation still turning?
  • 59. 59Winter 2011 – Beyond Programmable Shading Questions? • Email Aaron: alefohn@cs.washington.edu • Email Mike: mhouston@graphics.stanford.edu • Next lecture: "A trip down the 2003 rasterization pipeline"