Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Like this presentation? Why not share!

- Oit And Indirect Illumination Using... by Holger Gruen 15201 views
- Shadow Mapping with Today's OpenGL... by Mark Kilgard 8787 views
- Z Buffer Optimizations by pjcozzi 12493 views
- Photogrammetry and Star Wars Battle... by Electronic Arts /... 56035 views
- SIGGRAPH Asia 2008 Modern OpenGL by Mark Kilgard 350431 views
- Borderless Per Face Texture Mapping by basisspace 1161 views

9,280 views

9,107 views

9,107 views

Published on

No Downloads

Total views

9,280

On SlideShare

0

From Embeds

0

Number of Embeds

18

Shares

0

Downloads

168

Comments

0

Likes

9

No embeds

No notes for slide

- 1. presented for Dr. Bajaj’s graphics class University of Texas, Austin April 29, 2010 Anatomy of a Texture Fetch Mark J. Kilgard mjk@nvidia.com
- 2. About Me Principal System Software Engineer NVIDIA Distinguished Inventor Native Texan, living in Austin But instead of UT, attended Rice (C.S. 1991) Yet I root for Texas in football (unless playing Rice) Interested in Programmable shading languages for GPUs Novel GPU-accelerated rendering paradigms OpenGL & GPU Architecture Improvements
- 3. Our Topic: the Texture Fetch Dissected Texture Fetch All modern 3D games rely on texture-rich rendering Basis for all sorts of richness and detail More than graphics—GPU compute supports texture fetches Conceptually An image load with built-in re-sampling GeForce 480 capable of 42,000,000,000 per second! * * 700 Mhz clock × 15 Streaming Multiprocessors × 4 fetches per clock
- 4. A Texture Fetch Simplified Fetch at Seems pretty (s,t) = (0.6, 0.25) simple… Given 1. An image 2. A position Return the color of image at position RGBA Result is 0.95,0.4,0.24,1.0)
- 5. Texture Supplies Detail to Rendered Scenes With texture Without texture
- 6. Textures Make Graphics Pretty Texture → detail, detail → immersion, immersion → fun Microsoft Flight Simulator X
- 7. Shaders Combine Multiple Textures decal only (modulate) lightmaps only = * Id Software’s Quake 2 combined scene circa 1997
- 8. Projected Texturing for Shadow Mapping without light with position shadows shadows “what the light sees” Depth map from light’s point of view is re-used as a texture and re-projected into eye’s view to generate shadows
- 9. Shadow Mapping Explained Planar distance from light Depth map projected onto scene ≤ = less equals than True “un-shadowed” region shown green
- 10. Texture’s Not All Fun and Games
- 11. What’s so hard? Filtering Poor quality results if you just return the closest color sample in the image Bilinear filtering + mipmapping needed Complications Wrap modes, formats, compression, color spaces, other dimensionalities (1D, 3D, cube maps), etc. Gotta be quick Applications desire billions of fetches per second What’s done per-fragment in the shader, must be done per-texel in the texture fetch—so 8x times as much work! Essentially a miniature, real-time re-sampling kernel
- 12. Anatomy of a Texture Fetch Texture images Texel Texel offsets data Texture coordinate Filtered vector Texel Texel texel vector Selection Combination Combination parameters Texture parameters
- 13. Texture Fetch Functionality (1) Texture coordinate processing Projective texturing (OpenGL 1.0) Cube map face selection (OpenGL 1.3) Texture array indexing (OpenGL 2.1) Coordinate scale: normalization (ARB_texture_rectangle) Level-of-detail (LOD) computation Log of maximum texture coordinate partial derivative (OpenGL 1.0) LOD clamping (OpenGL 1.2) LOD bias (OpenGL 1.3) Anisotropic scaling of partial derivatives (SGIX_texture_lod_bias) Wrap modes Repeat, clamp (OpenGL 1.0) Clamp to edge (OpenGL 1.2), Clamp to border (OpenGL 1.3) Mirrored repeat (OpenGL 1.4) Fully generalized clamped mirror repeat (EXT_texture_mirror_clamp) Wrap to adjacent cube map face Region clamp & mirror (PlayStation 2)
- 14. Arrays of 2D Textures Multiple skins packed in texture array Motivation: binding to one multi-skin texture array avoids texture bind per object Texture array index 0 1 2 3 4 Mipmap level index 0 1 2 3 4
- 15. Texture Fetch Functionality (2) Filter modes Minification / magnification transition (OpenGL 1.0) Nearest, linear, mipmap (OpenGL 1.0) 1D & 2D (OpenGL 1.0), 3D (OpenGL 1.2), 4D (SGIS_texture4D) Anisotropic (EXT_texture_filter_anisotropic) Fixed-weights: Quincunx, 3x3 Gaussian Used for multi-sample resolves Detail texture magnification (SGIS_detail_texture) Sharpen texture magnification (SGIS_sharpen_texture) 4x4 filter (SGIS_texture_filter4) Sharp-edge texture magnification (E&S Harmony) Floating-point texture filtering (ARB_texture_float, OpenGL 3.0)
- 16. Texture Fetch Functionality (3) Texture formats Uncompressed Packing: RGBA8, RGB5A1, etc. (OpenGL 1.1) Type: unsigned, signed (NV_texture_shader) Normalized: fixed-point vs. integer (OpenGL 3.0) Compressed DXT compression formats (EXT_texture_compression_s3tc) 4:2:2 video compression (various extensions) 1- and 2-component compression (EXT_texture_compression_latc, OpenGL 3.0) Other approaches: IDCT, VQ, differential encoding, normal maps, separable decompositions Alternate encodings RGB9 with 5-bit shared exponent (EXT_texture_shared_exponent) Spherical harmonics Sum of product decompositions
- 17. Texture Fetch Functionality (4) Pre-filtering operations Gamma correction (OpenGL 2.1) Table: sRGB / arbitrary Shadow map comparison (OpenGL 1.4) Compare functions: LEQUAL, GREATER, etc. (OpenGL 1.5) Needs “R” depth value per texel Palette lookup (EXT_paletted_texture) Thresh-holding Color key Generalized thresh-holding
- 18. Delicate Color Fidelity with sRGB Problem: PC display devices have a legacy non-linear (sRGB) display gamut—delicate color shading looks wrong Conventional Gamma correct rendering (sRGB rendered) (uncorrected color) Softer and more natural Unnaturally Implication for texture deep facial hardware: Should perform shadows sRGB-to-linear color space convert per-texel, so 24 scalar conversions—or more—per fetch NVIDIA’s Adriana GeForce 8 Launch Demo
- 19. Texture Fetch Functionality (5) Optimizations Level-of-detail weighting adjustments Mid-maps (extra pre-filtered levels in-between existing levels) Unconventional uses Bitmap textures for fonts with large filters (Direct3D 10) Rip-mapping Non-uniform texture border color Clip-mapping (SGIX_clipmap) Multi-texel borders Silhouette maps (Pardeep Sen’s work) Shadow mapping Sharp piecewise linear magnification
- 20. Phased Data Flow Must hide long memory read latency between Selection and Combination phases Texture images Memory Texel Texel reads for offsets data samples Filtered texel Texture vector Texel Texel coordinate Selection Combination vector Combination parameters FIFOing of combination parameters Texture parameters
- 21. What really happens? Let’s consider a simple tri-linear mip-mapped 2D projective texture fetch High-level language Logically just one instruction statement (Cg/HLSL) float4 color = tex2Dproj(decalSampler, st); TXP o[COLR], f[TEX3], TEX2, 2D; Logically Assembly instruction Texel selection (NV_fragment_program) Texel combination How many operations are involved?
- 22. Medium-Level Dissection of a Texture Fetch texture images interpolated texture coords vector texel texel data offsets integer integer / coords & fixed-point filtered Convert Convert texel texel fractional texel texture weights texel integer / intermediates floating-point vector coords coords floor / coords fixed-point scaling to frac to texel and texel texel combination combination combination parameters coords offsets texture parameters
- 23. Interpolation First we need to interpolate (s,t,r,q) This is the f[TEX3] part of the TXP instruction Projective texturing means we want (s/q, t/q) And possible r/q if shadow mapping In order to correct for perspective, hardware actually interpolates (s/w, t/w, r/w, q/w) If not projective texturing, could linearly interpolate inverse w (or 1/w) Then compute its reciprocal to get w Observe projective Since 1/(1/w) equals w texturing is same cost Then multiply (s/w,t/w,r/w,q/w) times w as perspective correction To get (s,t,r,q) If projective texturing, we can instead Compute reciprocal of q/w to get w/q Then multiple (s/w,t/w,r/w) by w/q to get (s/q, t/q, r/q)
- 24. Interpolation Operations Ax + By + C per scalar linear interpolation 2 MADs One reciprocal to invert q/w for projective texturing Or one reciprocal to invert 1/w for perspective texturing Then 1 MUL per component for s/w * w/q Or s/w * w For (s,t) means 4 MADs, 2 MULs, & 1 RCP (s,t,r) requires 6 MADs, 3 MULs, & 1 RCP All floating-point operations
- 25. Texture Space Mapping Have interpolated & projected coordinates Now need to determine what texels to fetch Multiple (s,t) by (width,height) of texture base level Could convert (s,t) to fixed-point first Or do math in floating-point Say based texture is 256x256 so So compute (s*256, t*256)=(u,v)
- 26. Mipmap Level-of-detail Selection Tri-linear mip-mapping means compute appropriate mipmap level Hardware rasterizes in 2x2 pixel entities Typically called quad-pixels or just quad Finite difference with neighbors to get change in u and v with respect to window space Approximation to ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y one-pixel separation Means 4 subtractions per quad (1 per pixel) Now compute approximation to gradient length p = max(sqrt((∂u/∂x)2+(∂u/∂y)2), sqrt((∂v/∂x)2+(∂v/∂y)2))
- 27. Level-of-detail Bias and Clamping Convert p length to power-of-two level-of-detail and apply LOD bias λ = log2(p) + lodBias Now clamp λ to valid LOD range λ’ = max(minLOD, min(maxLOD, λ))
- 28. Determine Mipmap Levels and Level Filtering Weight Determine lower and upper mipmap levels b = floor(λ’)) is bottom mipmap level t = floor(λ’+1) is top mipmap level Determine filter weight between levels w = frac(λ’) is filter weight
- 29. Determine Texture Sample Point Get (u,v) for selected top and bottom mipmap levels Consider a level l which could be either level t or b With (u,v) locations (ul,vl) Perform GL_CLAMP_TO_EDGE wrap modes uw = max(1/2*widthOfLevel(l), min(1-1/2*widthOfLevel(l), u)) vw = max(1/2*heightOfLevel(l), min(1-1/2*heightOfLevel(l), v)) t edge Get integer location (i,j) within each level s border (i,j) = ( floor(uw* widthOfLevel(l)), floor(vw* ) )
- 30. Determine Texel Locations Bilinear sample needs 4 texel locations (i0,j0), (i0,j1), (i1,j0), (i1,j1) With integer texel coordinates i0 = floor(i-1/2) i1 = floor(i+1/2) j0 = floor(j-1/2) j1 = floor(j+1/2) Also compute fractional weights for bilinear filtering a = frac(i-1/2) b = frac(j-1/2)
- 31. Determine Texel Addresses Assuming a texture level image’s base pointer, compute a texel address of each texel to fetch Assume bytesPerTexel = 4 bytes for RGBA8 texture Example addr00 = baseOfLevel(l) + bytesPerTexel*(i0+j0*widthOfLevel(l)) addr01 = baseOfLevel(l) + bytesPerTexel*(i0+j1*widthOfLevel(l)) addr10 = baseOfLevel(l) + bytesPerTexel*(i1+j0*widthOfLevel(l)) addr11 = baseOfLevel(l) + bytesPerTexel*(i1+j1*widthOfLevel(l)) More complicated address schemes are needed for good texture locality!
- 32. Initiate Texture Reads Initiate texture memory reads at the 8 texel addresses addr00, addr01, addr10, addr11 for the upper level addr00, addr01, addr10, addr11 for the lower level Queue the weights a, b, and w Latency FIFO in hardware makes these weights available when texture reads complete
- 33. Phased Data Flow Must hide long memory read latency between Selection and Combination phases Texture images Memory Texel Texel reads for offsets data samples Filtered texel Texture vector Texel Texel coordinate Selection Combination vector Combination parameters FIFOing of combination parameters Texture parameters
- 34. Texel Combination When texels reads are returned, begin filtering Assume results are Top texels: t00, t01, t10, t11 Bottom texels: b00, b01, b10, b11 Per-component filtering math is tri-linear filter RGBA8 is four components result = (1-a)*(1-b)*(1-w)*b00 + (1-a)*b*(1-w)*b*b01 + a*(1-b)*(1-w)*b10 + a*b*(1-w)*b11 + (1-a)*(1-b)*w*t00 + (1-a)*b*w*t01 + a*(1-b)*w*t10 + a*b*w*t11; 24 MADs per component, or 96 for RGBA Lerp-tree could do 14 MADs per component, or 56 for RGBA
- 35. Total Texture Fetch Operations Interpolation 6 MADs, 3 MULs, & 1 RCP (floating-point) Texel selection Texture space mapping 2 MULs (fixed-point) LOD determination (floating-point) 1 pixel difference, 2 SQRTs, 4 MULs, 1 LOG2 LOD bias and clamping (fixed-point) 1 ADD, 1 MIN, 1 MAX Assuming a fixed-point RGBA Level determination and level weighting (fixed-point) tri-linear mipmap filtered 1 FLOOR, 1 ADD, 1 FRAC projective texture fetch Texture sample point 4 MAXs, 4 MINs, 2 FLOORs (fixed-point) Texel locations and bi-linear weights 8 FLOORs, 4 FRACs, 8 ADDs (fixed-point) Addressing 16 integer MADs (integer) Texel combination 56 fixed-point MADs (fixed-point)
- 36. Observations about the Texture Fetch Lots of ways to implement the math Lots of clever ways to be efficient Lots more texture operations not considered in this analysis Compression Anisotropic filtering sRGB Shadow mapping Arguably TEX instructions are “world’s most CISC instructions” Texture fetches are incredibly complex instructions Good deal of GPU’s superiority at graphics operations over CPUs is attributable to TEX instruction efficiency Good for compute too
- 37. Take Away Information The GPU texture fetch is about two orders of magnitude more complex than the most complex CPU instruction And texture fetches are extremely common Dozens of billions of texture fetches are expected by modern GPU applications Not just a graphics thing Using CUDA, you can access textures from within your compute- and bandwidth-intensive parallel kernels
- 38. GPU Technology Conference 2010 Monday, Sept. 20 - Thurs., Sept. 23, 2010 Opportunities San Jose Convention Center, San Jose, California Call for Submissions The most important event in the GPU ecosystem Sessions & posters Learn about seismic shifts in GPU computing Preview disruptive technologies and emerging applications Sponsors / Exhibitors Reach decision makers Get tools and techniques to impact mission critical projects Network with experts, colleagues, and peers across industries “CEO on Stage” Showcase for Startups “I consider the GPU Technology Conference to be the single best place to see the amazing Tell your story to VCs and work enabled by the GPU. It’s a great venue for meeting researchers, developers, scientists, and entrepreneurs from around the world.” analysts -- Professor Hanspeter Pfister, Harvard University and GTC 2009 keynote speaker

No public clipboards found for this slide

Be the first to comment