Primitive processing and advanced shading architecture for embedded space
Maxim Kazakov, Eisaku Ohbuchi
Digital Media Professionals, Inc.
Contributions
• Vertex cache-accelerated processing of fixed- and variable-size primitives
• One-pass on-chip implementation of various geometry processing algorithms
• Enables geometry reconstruction from a compact description
• Configurable per-fragment shading
• Dot product + LUT machine
• Various shading models can be mapped
• On-chip material description
• Reduced memory bandwidth requirements
• Realized in an embedded-space architecture
Motivation
• Bring appealing shading to embedded space
• Make complex geometry processing available for embedded applications
• Sufficient performance
• Meet embedded space limitations
• Minimize growth in gate size, power consumption and memory traffic
• Heterogeneous architecture as a solution
Related work
• Partially programmable IPs [Woo et al. 2004; Donghyun Kim 2005; Imai et al. 2004; Kameyama et al. 2003] gradually replaced by programmable ones [IMG, NVIDIA, ARM, etc.]
• Subdivision/high-order surface tessellation as geometry compression
• Specifically tailored solutions [Uesaki et al. 2004; Pedersen 2004]
• Multipass techniques [Shiue et al. 2005; Andrews and Baker 2006]
• Geometry shader-based ones [Loop et al. 2009]
• Real-time BRDF rendering
• BRDF factorization into 2D functions [Heidrich and Seidel 1999]
• Half vector-based parametrizations [Kautz and McCool 1999]
• Factorization into a combination of 1D functions [Lawrence et al. 2004; Lawrence et al. 2006]
• Fixed HW implementation for certain shading models [Ohbuchi and Unno 2002]
Architecture overview
• Programmable geometry processing + fixed-function fragment shader
• Augmented OpenGL ES 1.X pipeline
• Primitive Engine
• Extended VP
• Fixed-function fragment shader
• Calculates shading based on interpolated local frame and view info
• Consumes bump/tangent/shadow map samples
• Provides extra inputs to texture environment
[Figure: pipeline block diagram with the blocks Primitive Engine, Transform & Lighting VPs, Post-TnL vertex cache, Rasterizer, Texture sampling, Per-fragment shading, Color sum, Fog, Alpha/Depth/Stencil tests, Color buffer blend, Dither and Framebuffer]
Geometry engine
• SM 3.0-level Vertex Processors (VPs)
• Secondary vertex cache (SVC) + Primitive Engine (PE) combination
• PE is a VP with programmable primitive output
• Fixed- and variable-size geometry primitives
• Up to SVC size per primitive
[Figure: geometry engine block diagram with the blocks Command interface, VP, Cache controller, SVC, PE and Rasterizer; vertex data arrives over the SoC memory bus]
Geometry engine
• All of the geometry shader's geometry input comes from the SVC
• No need for texture access
• SVC exploits spatial coherency
• Vertex buffer (VB) traffic reduction
• Important for multivertex primitives and complex geometry processing algorithms
• Optional reduction of internal vertex attribute traffic
• Full set of attribs for a few initial primitive vertices
• Marginal gate size growth
• Gate estate sharing with VPs
• Limited modification of SVC logic
[Figure: geometry engine block diagram (Command interface, VP, Cache controller, SVC, PE, Rasterizer, SoC memory bus)]
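The effect of a vertex cache on vertex buffer traffic can be illustrated with a small simulation. The sketch below is not the SVC design from the slides: it assumes a plain FIFO replacement policy and a hypothetical capacity (`CACHE_SIZE`), and `count_vertex_fetches` is a name made up for this example. It only counts how many vertex fetches an index stream would cause.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 16  /* hypothetical capacity, not the real SVC size */

/* Count vertex fetches (cache misses) for an index stream, assuming a
   simple FIFO replacement policy. */
static size_t count_vertex_fetches(const unsigned short *indices, size_t count)
{
    unsigned short cache[CACHE_SIZE];
    size_t used = 0, next = 0, misses = 0;

    assert(indices != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (size_t k = 0; k < used; ++k) {
            if (cache[k] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            ++misses;                        /* vertex must be fetched */
            cache[next] = indices[i];        /* overwrite the oldest slot */
            next = (next + 1) % CACHE_SIZE;
            if (used < CACHE_SIZE) ++used;
        }
    }
    return misses;
}
```

For two triangles sharing an edge, `{0, 1, 2, 2, 1, 3}`, the function reports 4 fetches for 6 indices. Primitives that share many vertices with their neighbors benefit correspondingly more.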
Variable-size primitives
• Primitive size prefixes the vertices in the index buffer
• Variable primitive size for GS
• Subdivision implementation
• Supports a sequence of varying patch sizes naturally
• One shader for all patches
• No coherency breaks
• No texture access for connectivity information
• Subdivision patches are big and share a lot of vertices, so they are greatly accelerated by the vertex cache
[Figure: vertex numbering of a Catmull-Clark patch and of a Loop patch]
Catmull-Clark: 18 9 5 6 10 8 4 0 1 2 3 7 11 17 16 15 14 13 12
Loop: 21 5 19 7 7 9 8 4 0 1 19 5 1 2 3 6 7 19 6 10 9 5 (v5 neighborhood, v19 neighborhood, v7 neighborhood)
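The size-prefix convention can be sketched in C as follows. This is only an illustration of an index buffer in which each primitive is stored as its size followed by that many vertex indices; it is not DMP driver code, and `walk_primitives` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Walk an index buffer in which every primitive is stored as
   [size, i0, i1, ..., i(size-1)]. Returns the number of primitives found,
   or -1 if the buffer ends in the middle of a primitive.
   If max_size is non-NULL it receives the largest primitive size seen. */
static int walk_primitives(const unsigned short *buf, size_t len,
                           unsigned short *max_size)
{
    size_t pos = 0;
    int count = 0;
    unsigned short largest = 0;

    assert(buf != NULL || len == 0);

    while (pos < len) {
        unsigned short size = buf[pos];
        if (size > len - pos - 1)      /* truncated primitive */
            return -1;
        if (size > largest)
            largest = size;
        pos += (size_t)size + 1;       /* skip the prefix and its vertices */
        ++count;
    }
    if (max_size != NULL)
        *max_size = largest;
    return count;
}
```

Fed with the two sequences from the figure back to back (an 18-vertex patch followed by a 21-vertex patch), the walker finds two primitives with a maximum size of 21. Because sizes travel with the data, patches of different sizes can follow each other in a single draw call.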
Fragment shader
• OpenGL ES 1.X shader + per-fragment shading module
• Primary and secondary color outputs
• Combines several 1D shading functions stored in on-chip LUTs
• Configurable LUT inputs
• N·V, N·L, N·H, V·H, cos(φ), spot
• Alpha output from LUT
• Used for Fresnel-like reflection
• LUT output can be disabled (constant)
• Physically-based and NPR shading models
• Multilayer reflection can be approximated as well
G0,1 = G or 1, where G = (L·N) / |L + V|^2
[Figure: angle φ, drawn with the vectors L, V, N, H, T and B]
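As a worked example of the geometric factor, the sketch below evaluates G = (L·N) / |L + V|^2 for given vectors. The `vec3` type and the function names are assumptions made for illustration; in the architecture described here this term is computed by a fixed-function circuit, not by C code.

```c
#include <assert.h>

typedef struct { float x, y, z; } vec3;  /* hypothetical helper type */

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* G = (L.N) / |L + V|^2.
   The caller must ensure that L + V is not the zero vector. */
static float geometric_factor(vec3 l, vec3 v, vec3 n)
{
    vec3 s = { l.x + v.x, l.y + v.y, l.z + v.z };
    float len2 = dot3(s, s);             /* |L + V|^2 */
    assert(len2 > 0.0f);
    return dot3(l, n) / len2;
}
```

For unit vectors with L = V = N, the factor is 1/4, since L·N = 1 and |L + V|^2 = 4.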
Fragment shader
• Multiple lights
• Perturbation by bump/tangent map
• Attenuation by shadow map
• Local frame reconstruction from quaternion
• A long fixed-function pipeline
• 30-50 stages
• Aligns with texture access latency
• Matches three 24-bit 4-way SIMD units in size
• All LUTs are on-chip
• Zero external memory access during rendering
• Fixed time per fragment
• 1-4 clocks/fragment/light depending on configuration
• Predictable performance
[Figure: fragment pipeline block diagram. Blocks: Rasterizer, Texture circuit, Per-fragment shading (cos φ and G factor circuit, Normal and tangent circuit, Dot product and LUT circuit, Primary and secondary color calculation circuit), Color blending and buffer updating circuit, SoC memory bus. Signal labels include U, V, Q, N, T, L, cos φ, the geometric factor term, Bump, Shadow, Fresnel, Spot, D, Rrgb, primary and secondary colors, texture colors, texture read and color R/W]
Shading performance
Shading model                                                                  | Clk/frag | SM 3.0 asm steps
Phong: D0 = cos^s(N·L), G0,1 = 1                                               | 1        | 35
Phong + bump: D0 = cos^s(N'·L), G0,1 = 1                                       | 1        | 38
Schlick anisotropic: D1 = Z(N·H), Rλ = Fλ(V·H), S = A(cos φ), G0,1 = G'        | 4        | 61
Cook-Torrance: D1 = D(N·H), Rλ = Fλ(V·H), G0,1 = G'                            | 2        | 48
1.X API
• Dedicated APIs in the case of the 1.X library
• In the spirit of the 1.X API for FS
• Light reflection environment (similar to texture environment)
• 8 API functions
• Per-fragment shading and LUT management
• A lot of extra tokens
• Preconfigured geometry shaders selected according to primitive type
• Subdivision, silhouette, particle systems
glActiveLightDMP ( GL_LIGHT0_DMP ) ;
glLightEnviDMP ( GL_LIGHT_ENV_LUT_INPUT_SELECTOR_D0_DMP
               , GL_LIGHT_ENV_LN_DMP ) ;
glLightEnviDMP ( GL_LIGHT_ENV_LAYER_CONFIG_DMP
               , GL_LIGHT_ENV_LAYER_CONFIG0_DMP ) ;
glLightEnviDMP ( GL_LIGHT_ENV_GEOM_FACTOR0_DMP, GL_FALSE ) ;
/* Fill the LUT with x^30. Zero-initialized so that lut[0] and the
   entries the loop does not touch are defined. */
GLfloat lut[512] = { 0.0f } ;
int j ;
for ( j = 1 ; j < 128 ; j++ ){
    lut[j] = powf( (float)j/127.f, 30.f ) ;
    lut[j+255] = lut[j] - lut[j-1] ;   /* difference to the previous entry */
}
glMaterialLutDMP ( 2, lut ) ;
glMaterialfv ( GL_FRAGMENT_FRONT_AND_BACK_DMP
             , GL_MATERIAL_LUT_D0_DMP, 2 ) ;
/* The first element (14) is the primitive size; 14 vertex indices follow. */
GLushort indices[] = {
    14, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
} ;
glDrawElements ( GL_SUBD_PRIM_DMP, 15
               , GL_UNSIGNED_SHORT, indices ) ;
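How a table of this shape might be sampled can be sketched as follows. The layout (values in the lower half, per-segment differences in the upper half) is inferred from the indexing in the loop above, and the interpolation scheme is an assumption for illustration; the slides do not describe the hardware lookup itself. `lut_build`, `lut_sample` and `ipow` are hypothetical names, and an integer power helper replaces `powf` only to keep the sketch free of library dependencies.

```c
#include <assert.h>

#define LUT_HALF    256  /* values in [0, LUT_HALF), differences after that */
#define LUT_SAMPLES 128  /* sample points covering the input range [0, 1] */

/* base^exponent for a non-negative integer exponent. */
static float ipow(float base, int exponent)
{
    float r = 1.0f;
    for (int i = 0; i < exponent; ++i)
        r *= base;
    return r;
}

/* Fill the table with x^exponent sampled at j/127, plus the difference
   between neighboring samples, mirroring the loop in the example above. */
static void lut_build(float lut[2 * LUT_HALF], int exponent)
{
    assert(exponent >= 1);
    for (int i = 0; i < 2 * LUT_HALF; ++i)
        lut[i] = 0.0f;
    for (int j = 0; j < LUT_SAMPLES; ++j)
        lut[j] = ipow((float)j / (LUT_SAMPLES - 1), exponent);
    for (int j = 1; j < LUT_SAMPLES; ++j)
        lut[LUT_HALF + j - 1] = lut[j] - lut[j - 1];
}

/* Sample the table at x in [0, 1] as value + fraction * difference. */
static float lut_sample(const float lut[2 * LUT_HALF], float x)
{
    assert(x >= 0.0f && x <= 1.0f);
    float pos = x * (LUT_SAMPLES - 1);
    int idx = (int)pos;
    if (idx >= LUT_SAMPLES - 1)        /* x == 1: last sample, no segment */
        return lut[LUT_SAMPLES - 1];
    return lut[idx] + (pos - (float)idx) * lut[LUT_HALF + idx];
}
```

Storing the difference next to each value means a lookup needs one multiply-add and no second table read, which fits a fixed-time pipeline.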
2.0 API
• Preinstalled fragment shader object
• Exposes predefined (~200) uniforms for all parameters of the fragment pipeline
• Predefined attributes for binding with VS/GS
• GL program objects allow setting all params with a single glUseProgram call
• Other extensions to switch a set of LUTs in one call
• Minimal modifications of app/content creation chain
glAttachShader( progid, GL_DMP_FRAGMENT_SHADER_DMP );
/* Uniform names are passed to glGetUniformLocation as strings. */
glUniform1i( glGetUniformLocation( progid
           , "dmp_LightEnv.lutInputD0")
           , GL_LIGHT_ENV_LN_DMP );
glUniform1i( glGetUniformLocation( progid
           , "dmp_LightEnv.config")
           , GL_LIGHT_ENV_LAYER_CONFIG0_DMP );
glUniform1i( glGetUniformLocation( progid
           , "dmp_FragmentLightSource[0].geomFactor0"), GL_FALSE);
/* Fill the LUT with x^30. Zero-initialized so that lut[0] and the
   entries the loop does not touch are defined. */
GLfloat lut[512] = { 0.0f } ;
int j ;
for ( j = 1 ; j < 128 ; j++ ){
    lut[j] = powf( (float)j/127.f, 30.f ) ;
    lut[j+255] = lut[j] - lut[j-1] ;   /* difference to the previous entry */
}
glBindTexture(GL_LUT_TEXTURE0_DMP, lutid);
glTexImage1D( GL_LUT_TEXTURE0_DMP, 0, GL_LUMINANCEF_DMP, 512, 0
            , GL_LUMINANCEF_DMP, GL_FLOAT, lut);
glUniform1i( glGetUniformLocation( progid
           , "dmp_FragmentMaterial.samplerD0"), 0);

// Vertex shader excerpt
varying vec3 dmp_lrView;
varying vec3 dmp_lrQuat;
....
dmp_lrView = -gl_Position.xyz;   // taken before the projection is applied
gl_Position = u_projection_matrix * gl_Position;
gl_TexCoord[1] = vec4(a_texcoord1.x, a_texcoord1.y, 0.0, 1.0);
gl_FrontColor = u_material_constant_color0;
2.0 API
• Standard VS shader API
• GL 3.2-like GS API
• Extended primitive type for variable-size and non-standard fixed-size ones
• GLSL's gl_VerticesIn is not a constant
/* The first element (14) is the primitive size; 14 vertex indices follow. */
GLushort indices[] = {
    14, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
} ;
glUniform1f( glGetUniformLocation(progid
           , "subdivisionlevel"), 2);
glDrawElements( GL_GEOMETRY_PRIMITIVE_DMP, 15
              , GL_UNSIGNED_SHORT, indices ) ;

// Geometry shader excerpt
void main(void){
    vec4 sc, se, v20 ;
    // Valence of the irregular vertex, derived from the primitive size
    float val = (gl_VerticesIn-8)/2 ;
    if ( 3.0==val ) { // valence 3
        sc = gl_PositionIn[2] + gl_PositionIn[4] + gl_PositionIn[12] ;
        se = gl_PositionIn[1] + gl_PositionIn[3] +
             gl_PositionIn[gl_VerticesIn-1] ;
        e00 = gl_PositionIn[gl_VerticesIn-1] ;
        e0k04 = gl_PositionIn[3] ;
        c0k04 = gl_PositionIn[12];
    } else { // 4 or more
        sc = sumc() ;
        se = sume() ;
        e00 = gl_PositionIn[13];
        e0k04 = gl_PositionIn[3] ;
        c0k04 = gl_PositionIn[gl_VerticesIn-2];
    }
    ...
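The relation between primitive size and valence used in the shader can be checked on the host side. The sketch assumes, as the shader expression `(gl_VerticesIn-8)/2` suggests, a Catmull-Clark patch with a single irregular vertex whose size is 2 * valence + 8; the function names are hypothetical.

```c
#include <assert.h>

/* Number of vertices in a Catmull-Clark patch with one irregular vertex,
   assuming size = 2 * valence + 8 (the inverse of the shader expression
   (gl_VerticesIn - 8) / 2). */
static int cc_patch_size(int valence)
{
    assert(valence >= 3);
    return 2 * valence + 8;
}

/* Valence recovered from the patch size; returns -1 for sizes that do not
   correspond to any valence >= 3 under the assumption above. */
static int cc_valence(int patch_size)
{
    if (patch_size < 14 || (patch_size - 8) % 2 != 0)
        return -1;
    return (patch_size - 8) / 2;
}
```

Under this assumption the 14-vertex patch in the draw call corresponds to valence 3, the regular case (valence 4) gives a 16-vertex patch, and an 18-vertex patch has valence 5. This is why an application can feed patches of different valence to one shader.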
Proļ¬ling results
[Figure: chart on a 0% to 150% scale showing total traffic, performance, and VB, ZB, texture and CB traffic for the configurations GL ES, + Subdivision, + LR shading, + Shadow, + Soft shadow]
Results-subdivision
[Figure: chart on a 0x to 5x scale showing VB traffic, output verts and setup triangles/s for the control mesh, the 1st level and the 2nd level; a second panel is labelled "No interpolation" with the same three categories]
Results-subdivision
• Lower performance than pretessellated rendering
• Single PE is a bottleneck for heavy shaders
• 800+ instructions in the CC shader
• One irregular vertex only
• 700+ instructions in the Loop shader
• Up to 3 irregular vertices
• ~50% of interpolation instructions
• Explains HW tessellators in desktop accelerators
• ~2x vertex buffer traffic growth compared to control mesh rendering
• Vertex cache exploits a great portion of spatial coherency
• 8x less than pretessellated mesh rendering (2 levels)
• Patch sorting causes a 7-70% increase in VB traffic
• Depending on the object and subdivision scheme
• Sort breaks coherency as same-size primitives are not necessarily neighbors
• Loop primitive is bigger, so the sort impact is heavier
Conclusion
• Hybrid architecture for embedded space
• Predictable fragment shader performance
• Complex geometry processing capabilities
• Vertex cache-accelerated processing of fixed- and variable-size primitives
• Reduced VB traffic due to preserved spatial coherency
• On-chip subdivision and silhouette rendering as illustrations
• Bump/Tangent/Shadow-mapped shading at a few clk/fragment
• Support for complex shading models
• No extra memory access due to on-chip material data
• Extended functionality exposed via both the 1.X and 2.0 OpenGL ES APIs
• Enables short porting times for OpenGL ES apps/content creation chains

Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
Ā 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
Ā 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
Ā 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
Ā 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
Ā 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
Ā 
Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)
Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)
Bun (KitWorks Team Study ė…øė³„ė§ˆė£Ø ė°œķ‘œ 2024.4.22)
Ā 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
Ā 

Hpg2011 papers kazakov

  • 1. Primitive processing and advanced shading architecture for embedded space Maxim Kazakov Eisaku Ohbuchi Digital Media Professionals, Inc
  • 2. Contributions • Vertex cache-accelerated fixed and variable size primitive processing • One-pass on-chip implementation of various geometry processing algorithms • Enables geometry reconstruction from compact description • Configurable per-fragment shading • Dot product+LUT machine • Various shading models can be mapped • On-chip material description • Reduced memory bandwidth requirements • Realized in embedded-space architecture
  • 3. Motivation • Bring appealing shading to embedded space • Make complex geometry processing available for embedded applications • Sufficient performance • Meet embedded space limitations • Minimize gate size, power consumption and memory traffic growth • Heterogeneous architecture as a solution
  • 4. Related work • Partially programmable IPs [Woo et al. 2004; Donghyun Kim 2005; Imai et al. 2004; Kameyama et al. 2003] gradually replaced by programmable [IMG, NVIDIA, ARM, etc.] ones • Subdivision/high-order surfaces tessellation as geometry compression • Specifically tailored solutions [Uesaki et al. 2004; Pedersen 2004] • Multipass techniques [Shiue et al. 2005; Andrews and Baker 2006] • Geometry shader-based ones [Loop et al. 2009] • Real-time BRDF rendering • BRDF factorization into 2D functions [Heidrich and Seidel 1999] • Half vector-based parametrizations [Kautz and McCool 1999] • Factorization into combination of 1D functions [Lawrence et al. 2004; Lawrence et al. 2006] • Fixed HW implementation for certain shading models [Ohbuchi and Unno 2002]
  • 5. Architecture overview • Programmable geometry processing + fixed function fragment shader • Augmented OpenGL ES 1.X pipeline • Primitive Engine • Extended VP • Fixed-function fragment shader • Calculates shading based on interpolated local frame and view info • Consumes bump/tangent/shadow map samples • Provides extra inputs to texture environment [Pipeline diagram blocks: Primitive Engine, Transform & Lighting VPs, Post-TnL vertex cache, Rasterizer, Texture sampling, Per-fragment shading, Color sum, Fog, Alpha/Depth/Stencil tests, Color buffer blend, Dither, Framebuffer]
  • 6. Geometry engine • SM 3.0-level Vertex Processors • Secondary vertex cache (SVC) + Primitive Engine (PE) combination • PE is a VP with programmable primitive output • Fixed and variable size geometry primitives • Up to SVC size per primitive [Block diagram: Command interface, VP, Cache controller, SVC, PE, Rasterizer, SoC memory bus, vertex data paths]
  • 7. Geometry engine • All of the geometry shader's geometry input comes from SVC • No need for texture access • SVC exploits spatial coherency • VB traffic reduction • Important for multivertex primitives and complex geometry processing algorithms • Optional reduction of internal vertex attribute traffic • Full set of attribs for a few initial primitive vertices • Marginal gate size growth • Gate estate sharing with VPs • Limited modification of SVC logic [Block diagram: Command interface, VP, Cache controller, SVC, PE, Rasterizer, SoC memory bus, vertex data paths]
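The traffic argument on this slide can be illustrated with a small simulation. The sketch below counts how many vertex fetches an index stream causes when a cache sits in front of the vertex buffer. The FIFO replacement policy, the `MAX_CACHE_SLOTS` limit and the function name `count_vertex_fetches` are assumptions made for illustration; the slides do not state how the SVC replaces entries.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE_SLOTS 64  /* hypothetical upper bound, not from the slides */

/* Counts the indices that miss a FIFO cache holding `slots` vertices.
 * Each miss stands for one vertex fetched from the vertex buffer. */
size_t count_vertex_fetches(const unsigned short *indices, size_t count,
                            size_t slots)
{
    unsigned short cache[MAX_CACHE_SLOTS];
    size_t used = 0, next = 0, misses = 0;

    assert(slots > 0 && slots <= MAX_CACHE_SLOTS);
    for (size_t i = 0; i < count; i++) {
        int hit = 0;
        for (size_t s = 0; s < used; s++) {
            if (cache[s] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (hit)
            continue;
        misses++;
        cache[next] = indices[i];      /* overwrite the oldest entry */
        next = (next + 1) % slots;
        if (used < slots)
            used++;
    }
    return misses;
}
```

With two neighbouring four-vertex primitives that share two vertices, a sufficiently large cache fetches six vertices instead of eight, which is the kind of saving the slide attributes to spatial coherency.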
  • 8. Variable-size primitives • Primitive size prefixes vertices in index buffer • Variable primitive size for GS • Subdivision implementation • Supports varying patch size sequence naturally • One shader for all patches • No coherency breaks • No texture access for connectivity information • Subdivision patches are big and share a lot - greatly accelerated by vertex cache [Figure: numbered control-mesh vertices with Catmull-Clark and Loop patch examples and the v5, v19 and v7 neighborhoods marked; size-prefixed index sequences shown: 18 9 5 6 10 8 4 0 1 2 3 7 11 17 16 15 14 13 12 and 21 5 19 7 7 9 8 4 0 1 19 5 1 2 3 6 7 19 6 10 9 5]
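The size-prefixed layout described on this slide can be walked with a few lines of code. The following is a minimal sketch, assuming each primitive is stored as one size entry followed by that many vertex indices; the function name `count_variable_size_primitives` and the error convention are hypothetical and not part of the DMP API.

```c
#include <assert.h>
#include <stddef.h>

/* Walks an index buffer in which every primitive is stored as
 * [size, v0, v1, ..., v(size-1)].
 * Returns the number of primitives, or -1 if the buffer is malformed.
 * If `largest` is not NULL it receives the biggest primitive size. */
int count_variable_size_primitives(const unsigned short *indices,
                                   size_t count, size_t *largest)
{
    size_t pos = 0, biggest = 0;
    int primitives = 0;

    assert(indices != NULL);
    while (pos < count) {
        size_t size = indices[pos];
        /* The prefix must be followed by `size` indices in the buffer. */
        if (size == 0 || size > count - pos - 1)
            return -1;
        if (size > biggest)
            biggest = size;
        pos += 1 + size;
        primitives++;
    }
    if (largest != NULL)
        *largest = biggest;
    return primitives;
}
```

The largest size matters because, per the geometry engine slide, a primitive can hold at most as many vertices as the SVC.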
  • 9. Fragment shader • OpenGL ES 1.X shader + per-fragment shading module • Primary and secondary color outputs • Combines several 1D shading functions stored in on-chip LUTs • Configurable LUT inputs • N·V, N·L, N·H, V·H, cos(φ), spot • Alpha output from LUT • Used for Fresnel-like reflection • LUT output can be disabled (constant) • Physically-based and NPR shading models • Multilayer reflection can be approximated as well • Geometric factor: G0,1 = G or 1, where G = (L·N)/|L + V|^2 [Figure: definition of the angle φ using the vectors L, V, N, H, T and B]
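The geometric factor from this slide, G = (L·N)/|L + V|^2, is simple enough to evaluate directly. The sketch below is only an illustration in C; the `Vec3` type and the function names are hypothetical, and the hardware computes this term in a fixed-function circuit rather than in software.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* G = (L . N) / |L + V|^2, with L the light, V the view and N the
 * normal vector. L + V must not be the zero vector. */
float geometric_factor(Vec3 l, Vec3 v, Vec3 n)
{
    Vec3 sum = { l.x + v.x, l.y + v.y, l.z + v.z };
    float len2 = dot3(sum, sum);

    assert(len2 > 0.0f);
    return dot3(l, n) / len2;
}
```

When light, view and normal all coincide, L·N is 1 and |L + V|^2 is 4, so G evaluates to 0.25.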
  • 10. Fragment shader • Multiple lights • Perturbation by bump/tangent map • Attenuation by shadow map • Local frame reconstruction from quaternion • A long fixed function pipeline • 30-50 stages • Aligns with texture access latency • Matches 3 24-bit 4-way SIMD units in size • All LUTs are on-chip • Zero external memory access during rendering • Fixed time per fragment • 1-4 clocks/fragment/light depending on configuration • Predictable performance [Block diagram: Rasterizer; Texture circuit; per-fragment shading module made of the cos φ and G factor circuit, the normal and tangent circuit, the dot product and LUT circuit, and the primary and secondary color calculation circuit; Color blending and buffer updating circuit; SoC memory bus carrying texture reads and color reads/writes]
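"Local frame reconstruction from quaternion" can be sketched as rotating the canonical axes by an interpolated unit quaternion. The mapping of tangent, bitangent and normal to the rotated x, y and z axes is an assumption made here for illustration, as are the type and function names; the slides do not spell out the convention the hardware uses.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { float x, y, z, w; } Quat;   /* assumed to be unit length */
typedef struct { Vec3 t, b, n; } Frame;

/* Rebuilds an orthonormal frame from a unit quaternion.
 * Assumption: T, B and N are the rotated x, y and z axes. */
Frame frame_from_quaternion(Quat q)
{
    Frame f;

    f.t.x = 1.0f - 2.0f * (q.y * q.y + q.z * q.z);
    f.t.y = 2.0f * (q.x * q.y + q.w * q.z);
    f.t.z = 2.0f * (q.x * q.z - q.w * q.y);

    f.b.x = 2.0f * (q.x * q.y - q.w * q.z);
    f.b.y = 1.0f - 2.0f * (q.x * q.x + q.z * q.z);
    f.b.z = 2.0f * (q.y * q.z + q.w * q.x);

    f.n.x = 2.0f * (q.x * q.z + q.w * q.y);
    f.n.y = 2.0f * (q.y * q.z - q.w * q.x);
    f.n.z = 1.0f - 2.0f * (q.x * q.x + q.y * q.y);
    return f;
}
```

Passing one quaternion (four values) instead of two or three basis vectors is one plausible reason for this encoding, since it reduces the number of interpolated attributes per fragment.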
  • 11. Shading performance
      Shading model (LUT configuration) | Clk/frag | SM 3.0 asm steps
      Phong shading model: D0 = cos^s(N·L), G0,1 = 1 | 1 | 35
      Phong + bump: D0 = cos^s(N'·L), G0,1 = 1 | 1 | 38
      Schlick anisotropic model: D1 = Z(N·H), Rλ = Fλ(V·H), S = A(cos φ), G0,1 = G' | 4 | 61
      Cook-Torrance shading model: D1 = D(N·H), Rλ = Fλ(V·H), G0,1 = G' | 2 | 48
  • 12. 1.X API • Dedicated APIs in the case of 1.X library • In spirit of 1.X API for FS • Light reflection environment (similar to texture environment) • 8 API functions • Per-fragment shading and LUT management • A lot of extra tokens • Preconfigured geometry shaders selected according to a primitive type • Subdivision, silhouette, particle systems
      /* Select light 0 and route N·L into LUT D0, geometric factor disabled */
      glActiveLightDMP( GL_LIGHT0_DMP );
      glLightEnviDMP( GL_LIGHT_ENV_LUT_INPUT_SELECTOR_D0_DMP, GL_LIGHT_ENV_LN_DMP );
      glLightEnviDMP( GL_LIGHT_ENV_LAYER_CONFIG_DMP, GL_LIGHT_ENV_LAYER_CONFIG0_DMP );
      glLightEnviDMP( GL_LIGHT_ENV_GEOM_FACTOR0_DMP, GL_FALSE );
      /* Values go in the first half of the table, differences in the second */
      GLfloat lut[512] = { 0 };
      for ( j = 1 ; j < 128 ; j++ ) {
          lut[j] = powf( (float)j/127.f, 30.f );
          lut[j+255] = lut[j] - lut[j-1];
      }
      glMaterialLutDMP( 2, lut );
      glMaterialfv( GL_FRAGMENT_FRONT_AND_BACK_DMP, GL_MATERIAL_LUT_D0_DMP, 2 );
      /* One subdivision primitive: size prefix 14, then 14 vertex indices */
      GLushort indices[] = { 14, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 };
      glDrawElements( GL_SUBD_PRIM_DMP, 15, GL_UNSIGNED_SHORT, indices );
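The LUT-filling loop on this slide stores function values in the lower entries and the difference between neighbouring values 255 entries further on. A plausible reading is that the hardware uses the stored difference for linear interpolation between entries; that reading is an assumption, not something the slides state. The sketch below rebuilds the same table and samples it that way. The helper names are hypothetical, and an integer-power helper replaces `powf` so the example needs only the C standard headers shown.

```c
#include <assert.h>

#define LUT_SIZE     512
#define LUT_ENTRIES  128   /* entries filled by the slide's loop */
#define DELTA_OFFSET 256   /* lut[j + 255] for j = 1 is lut[256] */

static float int_pow(float base, int exponent)
{
    float result = 1.0f;
    for (int i = 0; i < exponent; i++)
        result *= base;
    return result;
}

/* Fills values at lut[0..127] and differences at lut[256..382],
 * the same layout as the slide's loop. */
void build_specular_lut(float lut[LUT_SIZE], int shininess)
{
    assert(shininess > 0);
    for (int j = 0; j < LUT_SIZE; j++)
        lut[j] = 0.0f;
    for (int j = 0; j < LUT_ENTRIES; j++)
        lut[j] = int_pow((float)j / 127.0f, shininess);
    for (int j = 1; j < LUT_ENTRIES; j++)
        lut[j + 255] = lut[j] - lut[j - 1];
}

/* Assumed lookup: value plus fraction times stored difference. */
float sample_lut(const float lut[LUT_SIZE], float x)
{
    if (x < 0.0f) x = 0.0f;
    if (x > 1.0f) x = 1.0f;
    float pos = x * 127.0f;
    int i = (int)pos;
    if (i >= LUT_ENTRIES - 1)
        return lut[LUT_ENTRIES - 1];
    return lut[i] + (pos - (float)i) * lut[DELTA_OFFSET + i];
}
```

Storing the difference next to the value means a lookup needs one multiply-add and no second table read, which would fit a fixed time per fragment.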
  • 13. 2.0 API • Preinstalled fragment shader object • Exposes predefined (~200) uniforms for all parameters of fragment pipeline • Predefined attributes for binding with VS/GS • GL program objects are great in setting all params with a single glUseProgram call • Other extensions to switch a set of LUTs in one call • Minimal modifications of app/content creation chain
      /* Attach the preinstalled fragment shader and configure it through uniforms */
      glAttachShader( progid, GL_DMP_FRAGMENT_SHADER_DMP );
      glUniform1i( glGetUniformLocation( progid, "dmp_LightEnv.lutInputD0" ), GL_LIGHT_ENV_LN_DMP );
      glUniform1i( glGetUniformLocation( progid, "dmp_LightEnv.config" ), GL_LIGHT_ENV_LAYER_CONFIG0_DMP );
      glUniform1i( glGetUniformLocation( progid, "dmp_FragmentLightSource[0].geomFactor0" ), GL_FALSE );
      GLfloat lut[512] = { 0 };
      for ( j = 1 ; j < 128 ; j++ ) {
          lut[j] = powf( (float)j/127.f, 30.f );
          lut[j+255] = lut[j] - lut[j-1];
      }
      /* Upload the table as a LUT texture and bind it to sampler D0 */
      glBindTexture( GL_LUT_TEXTURE0_DMP, lutid );
      glTexImage1D( GL_LUT_TEXTURE0_DMP, 0, GL_LUMINANCEF_DMP, 512, 0, GL_LUMINANCEF_DMP, GL_FLOAT, lut );
      glUniform1i( glGetUniformLocation( progid, "dmp_FragmentMaterial.samplerD0" ), 0 );

      // Vertex shader side: predefined varyings consumed by the fragment pipeline
      varying vec3 dmp_lrView;
      varying vec3 dmp_lrQuat;
      ....
      dmp_lrView = -gl_Position.xyz;
      gl_Position = u_projection_matrix * gl_Position;
      gl_TexCoord[1] = vec4(a_texcoord1.x, a_texcoord1.y, 0.0, 1.0);
      gl_FrontColor = u_material_constant_color0;
  • 14. 2.0 API • Standard VS shader API • GL 3.2-like GS API • Extended primitive type for variable-size and non-standard fixed-size ones • GLSL's gl_VerticesIn is not a constant
      /* One variable-size primitive: size prefix 14, then 14 vertex indices */
      GLushort indices[] = { 14, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 };
      glUniform1f( glGetUniformLocation( progid, "subdivisionlevel" ), 2 );
      glDrawElements( GL_GEOMETRY_PRIMITIVE_DMP, 15, GL_UNSIGNED_SHORT, indices );

      void main(void){
          vec4 sc, se, v20;
          // Valence of the irregular vertex follows from the primitive size
          float val = (gl_VerticesIn-8)/2;
          if ( 3.0==val ) { // valence 3
              sc = gl_PositionIn[2] + gl_PositionIn[4] + gl_PositionIn[12];
              se = gl_PositionIn[1] + gl_PositionIn[3] + gl_PositionIn[gl_VerticesIn-1];
              e00 = gl_PositionIn[gl_VerticesIn-1];
              e0k04 = gl_PositionIn[3];
              c0k04 = gl_PositionIn[12];
          } else { // 4 or more
              sc = sumc();
              se = sume();
              e00 = gl_PositionIn[13];
              e0k04 = gl_PositionIn[3];
              c0k04 = gl_PositionIn[gl_VerticesIn-2];
          }
          ...
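The shader on this slide derives the valence of the irregular vertex from the primitive size with `(gl_VerticesIn-8)/2`. The relation can be checked on the host side with a small helper. The sketch below is illustrative only; the function names, the rejection of odd sizes and the lower bound of valence 3 are assumptions based on the slide's "valence 3" and "4 or more" branches.

```c
#include <assert.h>

/* Valence of the single irregular vertex of a patch with the given
 * number of input vertices, following (gl_VerticesIn - 8) / 2.
 * Returns -1 for sizes that cannot come from that relation. */
int patch_valence(int vertices_in)
{
    if (vertices_in < 14 || (vertices_in - 8) % 2 != 0)
        return -1;
    return (vertices_in - 8) / 2;
}

/* Inverse relation: number of patch vertices for a given valence. */
int patch_vertex_count(int valence)
{
    assert(valence >= 3);
    return 2 * valence + 8;
}
```

The 14-index primitive drawn on this slide therefore corresponds to valence 3, and a regular patch with valence 4 would carry 16 vertices.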
  • 15. Profiling results [Chart, scale 0% to 150%: total traffic, performance, VB traffic, ZB traffic, texture traffic and CB traffic for the configurations GL ES, + Subdivision, + LR shading, + Shadow, + Soft shadow]
  • 16. Results-subdivision [Chart, scale 0x to 5x: VB traffic, output verts and setup triangle/s for the control mesh, 1st level and 2nd level, with a "No interpolation" series; renderings of the control mesh, level 1 and level 2]
  • 17. Results-subdivision • Lower performance than pretessellated rendering • Single PE is a bottleneck for heavy shaders • 800+ instructions in CC shader • One irregular vertex only • 700+ instructions in Loop shader • Up to 3 irregular vertices • ~50% of instructions are interpolation instructions • Explains HW tessellators in desktop accelerators • ~2x vertex buffer traffic growth compared to control mesh rendering • Vertex cache exploits a great portion of spatial coherency • 8x less than pretessellated mesh rendering (2 levels) • Patch sorting causes 7-70% increase in VB traffic • Depending on the object and subdivision scheme • Sort breaks coherency as same-size primitives are not necessarily neighbors • Loop primitive is bigger - sort impact is heavier
  • 18. Conclusion • Hybrid architecture for embedded space • Predictable fragment shader performance • Complex geometry processing capabilities • Vertex cache-accelerated processing of fixed- and variable-size primitives • Reduced VB traffic due to preserved spatial coherency • On-chip subdivision and silhouette rendering as illustrations • Bump/Tangent/Shadow-mapped shading at few clk/fragment • Support for complex shading models • No extra memory access due to on-chip material data • Extended functionality exposed via both 1.X and 2.0 OpenGL ES API • Enables short porting times for OpenGL ES apps/content creation chains