Anatomy of a Texture Fetch
Mark J. Kilgard
mjk@nvidia.com
Presented for Dr. Bajaj’s graphics class
University of Texas, Austin
April 29, 2010
About Me

• Principal System Software Engineer
    NVIDIA Distinguished Inventor
• Native Texan, living in Austin
    But instead of UT, attended Rice (C.S. 1991)
    Yet I root for Texas in football (unless playing Rice)
• Interested in
    Programmable shading languages for GPUs
    Novel GPU-accelerated rendering paradigms
    OpenGL & GPU architecture improvements
Our Topic: the Texture Fetch Dissected

• Texture fetch
    All modern 3D games rely on texture-rich rendering
    Basis for all sorts of richness and detail
    More than graphics: GPU compute supports texture fetches too
• Conceptually
    An image load with built-in re-sampling
• GeForce GTX 480 capable of 42,000,000,000 per second! *

         * 700 MHz clock × 15 Streaming Multiprocessors × 4 fetches per clock
A Texture Fetch Simplified

  Seems pretty simple…
  Given
  1. An image
  2. A position, e.g. fetch at (s,t) = (0.6, 0.25)
  Return the color of the image at that position
  RGBA result is (0.95, 0.4, 0.24, 1.0)
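The “seems pretty simple” version above amounts to a nearest-texel lookup. A minimal Python sketch, assuming a hypothetical image stored as a list of rows of RGBA tuples with row 0 at t = 0; it does no filtering at all, which is exactly the quality problem raised under “What’s so hard?”.

```python
import math

def fetch_nearest(image, s, t):
    """Return the texel whose square contains the normalized position (s, t).

    image: hypothetical layout, a list of rows (row 0 at t = 0), each row a
    list of RGBA tuples. No filtering, no mipmapping.
    """
    height = len(image)
    width = len(image[0])
    # Scale to texel space, pick the containing texel, and keep it in range.
    i = min(max(math.floor(s * width), 0), width - 1)
    j = min(max(math.floor(t * height), 0), height - 1)
    return image[j][i]

# A 4x4 test image whose texel (i, j) stores (i/3, j/3, 0.0, 1.0).
image = [[(i / 3, j / 3, 0.0, 1.0) for i in range(4)] for j in range(4)]
print(fetch_nearest(image, 0.6, 0.25))   # texel (2, 1)
```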
Texture Supplies Detail to Rendered Scenes

[Figure: the same scene rendered with texture and without texture]
Textures Make Graphics Pretty

                  Texture → detail,
                  detail → immersion,
                  immersion → fun

[Figure: Microsoft Flight Simulator X]
Shaders Combine Multiple Textures

[Figure: decal only × lightmaps only (modulate) = combined scene;
 id Software’s Quake 2, circa 1997]
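The modulate operation in this figure multiplies each decal component by the corresponding lightmap component, so a lightmap value of 1.0 leaves the decal unchanged and 0.0 darkens it to black. A minimal Python sketch with made-up texel values:

```python
def modulate(decal, lightmap):
    """Modulate combine: per-component product of two texel colors in [0, 1]."""
    return tuple(d * l for d, l in zip(decal, lightmap))

brick = (0.8, 0.4, 0.2)       # hypothetical decal texel (RGB)
dim_light = (0.5, 0.5, 0.5)   # hypothetical lightmap texel: half intensity
print(modulate(brick, dim_light))   # (0.4, 0.2, 0.1)
```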
Projected Texturing for Shadow Mapping

[Figure: scene without shadows, the light position, and the scene with shadows]

  “What the light sees”: the depth map from the light’s point of view
  is re-used as a texture and re-projected into the eye’s view
  to generate shadows
Shadow Mapping Explained

[Figure: planar distance from light ≤ (less than or equals) depth map
 projected onto scene = true “un-shadowed” region, shown green]
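The comparison pictured above can be sketched per texel as a LEQUAL test: a fragment is un-shadowed when its distance from the light is no greater than the depth the light recorded there. A minimal Python sketch with hypothetical values; real renderers usually also add a small depth offset to avoid self-shadowing, which is omitted here.

```python
def shadow_compare(distance_from_light, stored_depth):
    """LEQUAL shadow-map test for one texel.

    distance_from_light: the fragment's planar distance from the light.
    stored_depth: the depth-map texel projected onto that fragment.
    Returns 1.0 for un-shadowed (lit) and 0.0 for shadowed.
    """
    return 1.0 if distance_from_light <= stored_depth else 0.0

# Hypothetical row of fragments along a floor, and the depth map projected onto them.
distances = [0.70, 0.70, 0.70, 0.70]
depth_map = [0.70, 0.45, 0.45, 0.90]   # an occluder closer to the light covers the middle
print([shadow_compare(d, z) for d, z in zip(distances, depth_map)])
```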
Texture’s Not All Fun and Games
What’s so hard?

• Filtering
     Poor quality results if you just return the closest color sample in the
     image
     Bilinear filtering + mipmapping needed
• Complications
     Wrap modes, formats, compression, color spaces, other dimensionalities
     (1D, 3D, cube maps), etc.
• Gotta be quick
     Applications desire billions of fetches per second
     What’s done per-fragment in the shader must be done per-texel in the
     texture fetch: a tri-linear fetch reads 8 texels, so 8x as much work!
• Essentially a miniature, real-time re-sampling kernel
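Anatomy of a Texture Fetch

[Diagram: texture coordinate vector → Texel Selection → texel offsets →
 texture images → texel data → Texel Combination → filtered texel vector.
 Texel Selection also passes combination parameters to Texel Combination;
 both stages take the texture parameters as input.]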
Texture Fetch Functionality (1)
• Texture coordinate processing
     Projective texturing (OpenGL 1.0)
     Cube map face selection (OpenGL 1.3)
     Texture array indexing (EXT_texture_array, OpenGL 3.0)
     Coordinate scale: normalization (ARB_texture_rectangle)
• Level-of-detail (LOD) computation
     Log of maximum texture coordinate partial derivative (OpenGL 1.0)
     LOD clamping (OpenGL 1.2)
     LOD bias (OpenGL 1.4)
     Anisotropic scaling of partial derivatives (SGIX_texture_lod_bias)
• Wrap modes
     Repeat, clamp (OpenGL 1.0)
     Clamp to edge (OpenGL 1.2), clamp to border (OpenGL 1.3)
     Mirrored repeat (OpenGL 1.4)
     Fully generalized clamped mirror repeat (EXT_texture_mirror_clamp)
     Wrap to adjacent cube map face
     Region clamp & mirror (PlayStation 2)
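Three of the wrap modes listed above can be sketched as functions that map an integer texel index, possibly outside the image, back into [0, size - 1]. A minimal Python sketch; applying the wrap to integer texel indices rather than to the texture coordinate (as the OpenGL specification phrases mirrored repeat) is a simplification that suits the bilinear case.

```python
def wrap_repeat(i, size):
    # GL_REPEAT: tile the image. Python's % is already non-negative for size > 0.
    return i % size

def wrap_clamp_to_edge(i, size):
    # GL_CLAMP_TO_EDGE: reuse the outermost texel.
    return min(max(i, 0), size - 1)

def wrap_mirrored_repeat(i, size):
    # GL_MIRRORED_REPEAT: tiles alternate between normal and flipped copies.
    m = i % (2 * size)
    return m if m < size else 2 * size - 1 - m

indices = range(-2, 10)
print([wrap_repeat(i, 4) for i in indices])
print([wrap_clamp_to_edge(i, 4) for i in indices])
print([wrap_mirrored_repeat(i, 4) for i in indices])
```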
Arrays of 2D Textures

  Multiple skins packed in a texture array
      Motivation: binding to one multi-skin texture array avoids a
      texture bind per object

[Figure: grid of skins; columns are texture array index 0 to 4,
 rows are mipmap level index 0 to 4]
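Selecting the layer is the extra coordinate-processing step that arrays add. A minimal Python sketch of the OpenGL rule (round the unnormalized array coordinate, then clamp); the list of skin names is a hypothetical stand-in for real images, each of which would carry its own mipmap chain.

```python
import math

def array_layer(r, num_layers):
    """Pick the texture array layer for array coordinate r.

    As in OpenGL, r is not normalized and layers are never blended with each
    other: r is rounded to the nearest integer and clamped to the valid range.
    """
    return max(0, min(num_layers - 1, math.floor(r + 0.5)))

# Hypothetical: five skins in one array; each layer has its own mipmap chain.
skins = ["skin0", "skin1", "skin2", "skin3", "skin4"]
print(skins[array_layer(2.4, len(skins))])   # skin2
```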
Texture Fetch Functionality (2)

• Filter modes
     Minification / magnification transition (OpenGL 1.0)
     Nearest, linear, mipmap (OpenGL 1.0)
     1D & 2D (OpenGL 1.0), 3D (OpenGL 1.2), 4D (SGIS_texture4D)
     Anisotropic (EXT_texture_filter_anisotropic)
     Fixed weights: Quincunx, 3x3 Gaussian
         Used for multi-sample resolves
     Detail texture magnification (SGIS_detail_texture)
     Sharpen texture magnification (SGIS_sharpen_texture)
     4x4 filter (SGIS_texture_filter4)
     Sharp-edge texture magnification (E&S Harmony)
     Floating-point texture filtering (ARB_texture_float, OpenGL 3.0)
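The first entry, the minification / magnification transition, is a comparison of the level of detail λ against a threshold. A minimal Python sketch of the rule in the OpenGL specification; passing filter names as plain strings is an assumption of the sketch, not an API.

```python
def is_minified(lod, mag_filter, min_filter):
    """Decide whether the minification or magnification filter applies.

    OpenGL compares the level of detail against a transition point c:
    c = 0.5 when the magnification filter is LINEAR and the minification
    filter is NEAREST_MIPMAP_NEAREST or NEAREST_MIPMAP_LINEAR, else c = 0.
    """
    c = 0.0
    if mag_filter == "LINEAR" and min_filter in ("NEAREST_MIPMAP_NEAREST",
                                                 "NEAREST_MIPMAP_LINEAR"):
        c = 0.5
    return lod > c
```

The 0.5 offset keeps a minified texture from looking sharper than the same texture magnified when the two filters differ.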
Texture Fetch Functionality (3)

• Texture formats
     Uncompressed
         Packing: RGBA8, RGB5A1, etc. (OpenGL 1.1)
         Type: unsigned, signed (NV_texture_shader)
         Normalized: fixed-point vs. integer (OpenGL 3.0)
     Compressed
         DXT compression formats (EXT_texture_compression_s3tc)
         4:2:2 video compression (various extensions)
         1- and 2-component compression (EXT_texture_compression_latc;
         the equivalent RGTC formats are in OpenGL 3.0)
         Other approaches: IDCT, VQ, differential encoding, normal maps,
         separable decompositions
     Alternate encodings
         RGB9 with 5-bit shared exponent (EXT_texture_shared_exponent)
         Spherical harmonics
         Sum-of-products decompositions
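A shared-exponent texel stores one 5-bit exponent for all three 9-bit mantissas, so decoding is part of the fetch's format work. A minimal Python decoder following the formula in the EXT_texture_shared_exponent specification (exponent bias 15, 9 mantissa bits, no implicit leading one); the bit layout assumed is the GL_UNSIGNED_INT_5_9_9_9_REV packing.

```python
def decode_rgb9e5(packed):
    """Decode one RGB9_E5 texel stored as a 32-bit unsigned integer.

    Bits 0-8: red mantissa, 9-17: green, 18-26: blue, 27-31: shared exponent
    (the GL_UNSIGNED_INT_5_9_9_9_REV packing). Each component equals
    mantissa * 2**(exponent - 15 - 9): exponent bias 15, 9 mantissa bits.
    """
    exponent = (packed >> 27) & 0x1F
    scale = 2.0 ** (exponent - 15 - 9)
    return tuple(((packed >> shift) & 0x1FF) * scale for shift in (0, 9, 18))

texel = 256 | (128 << 9) | (16 << 27)   # red mantissa 256, green 128, exponent 16
print(decode_rgb9e5(texel))              # (1.0, 0.5, 0.0)
```

Because the exponent is shared, the smaller components of a texel lose precision relative to the largest one; that is the trade for fitting a high-dynamic-range RGB value in 32 bits.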
Texture Fetch Functionality (4)

• Pre-filtering operations
     Gamma correction (OpenGL 2.1)
        Table: sRGB / arbitrary
     Shadow map comparison (OpenGL 1.4)
        Compare functions: LEQUAL, GREATER, etc. (OpenGL 1.5)
        Needs the “R” coordinate as a reference depth to compare against
        each texel’s depth value
     Palette lookup (EXT_paletted_texture)
     Thresholding
        Color key
        Generalized thresholding
Delicate Color Fidelity with sRGB

  Problem: PC display devices have a legacy non-linear (sRGB) display
  response, so delicate color shading looks wrong

[Figure: NVIDIA’s Adriana GeForce 8 launch demo. Conventional rendering
 (uncorrected color) shows unnaturally deep facial shadows; the gamma-correct
 (sRGB-rendered) image is softer and more natural.]

  Implication for texture hardware: should perform the sRGB-to-linear
  color space conversion per texel, so 24 scalar conversions (8 texels ×
  3 RGB components), or more, per fetch
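The conversion has to happen per texel, before filtering, because filtering should average linear intensities rather than sRGB-encoded values. A minimal Python sketch using the standard sRGB decoding curve (linear segment at or below 0.04045, power 2.4 above), with hypothetical texel values:

```python
def srgb_to_linear(c):
    """Decode one sRGB-encoded component in [0, 1] to linear intensity."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def decode_srgb_texel(rgba):
    # Only R, G and B are sRGB-encoded; alpha is already linear.
    r, g, b, a = rgba
    return (srgb_to_linear(r), srgb_to_linear(g), srgb_to_linear(b), a)

# A tri-linear fetch reads 8 texels and each needs 3 conversions before filtering.
texels = [(0.5, 0.5, 0.5, 1.0)] * 8          # hypothetical texel values
linear_texels = [decode_srgb_texel(t) for t in texels]
print(len(texels) * 3)                        # 24 scalar conversions
```

Note how far mid-gray moves: an sRGB value of 0.5 decodes to roughly 0.214 in linear intensity, which is why filtering the encoded values darkens soft gradients.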
Texture Fetch Functionality (5)

• Optimizations
     Level-of-detail weighting adjustments
     Mid-maps (extra pre-filtered levels in between existing levels)
• Unconventional uses
     Bitmap textures for fonts with large filters (Direct3D 10)
     Rip-mapping
     Non-uniform texture border color
     Clip-mapping (SGIX_clipmap)
     Multi-texel borders
     Silhouette maps (Pradeep Sen’s work)
         Shadow mapping
         Sharp piecewise linear magnification
Phased Data Flow
  Must hide long memory read latency between the Selection
  and Combination phases

[Diagram: texture coordinate vector → Texel Selection → texel offsets →
 memory reads for samples from the texture images → texel data →
 Texel Combination → filtered texel vector. Combination parameters travel
 from Selection to Combination through a FIFO; both phases take the
 texture parameters as input.]
What really happens?

• Let’s consider a simple tri-linear mip-mapped 2D
  projective texture fetch
• Logically just one instruction
     High-level language statement (Cg/HLSL):
        float4 color = tex2Dproj(decalSampler, st);
     Assembly instruction (NV_fragment_program):
        TXP o[COLR], f[TEX3], TEX2, 2D;
• Logically
     Texel selection
     Texel combination
• How many operations are involved?
Medium-Level Dissection of a Texture Fetch

[Diagram: interpolated texture coords vector → convert texture coords to
 texel coords → texel coords → floor / frac → integer coords & fractional
 weights → convert texel coords to texel offsets → texel offsets → texture
 images → integer / fixed-point texel data → integer / fixed-point texel
 combination (using the combination parameters) → texel intermediates →
 floating-point scaling and combination → filtered texel vector.
 Texture parameters feed the stages.]
Interpolation

• First we need to interpolate (s,t,r,q)
• This is the f[TEX3] part of the TXP instruction
• Projective texturing means we want (s/q, t/q)
      And possibly r/q if shadow mapping
• In order to correct for perspective, hardware actually interpolates
      (s/w, t/w, r/w, q/w)
• If not projective texturing, could linearly interpolate inverse w (1/w)
      Then compute its reciprocal to get w
         Since 1/(1/w) equals w
      Then multiply (s/w, t/w, r/w, q/w) by w
         To get (s,t,r,q)
• If projective texturing, we can instead
      Compute the reciprocal of q/w to get w/q
      Then multiply (s/w, t/w, r/w) by w/q to get (s/q, t/q, r/q)
• Observe that projective texturing costs the same as perspective correction
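Both paths above cost one reciprocal followed by a multiply per component, which is why projective texturing comes at the same cost as perspective correction. A minimal Python sketch, assuming the rasterizer has already produced the interpolated values s/w, t/w, r/w, q/w (and 1/w for the non-projective path):

```python
def perspective_correct(sw, tw, rw, qw, inv_w):
    """Non-projective path: 1 RCP of the interpolated 1/w, then 4 MULs."""
    w = 1.0 / inv_w                            # since 1/(1/w) equals w
    return (sw * w, tw * w, rw * w, qw * w)    # (s, t, r, q)

def projective(sw, tw, rw, qw):
    """Projective path: 1 RCP of the interpolated q/w, then 3 MULs."""
    w_over_q = 1.0 / qw                        # reciprocal of q/w is w/q
    return (sw * w_over_q, tw * w_over_q, rw * w_over_q)   # (s/q, t/q, r/q)

# Hypothetical fragment with w = 2 and (s, t, r, q) = (1.2, 0.4, 0.8, 2.0),
# so the rasterizer hands us (s/w, t/w, r/w, q/w) = (0.6, 0.2, 0.4, 1.0) and 1/w = 0.5.
print(perspective_correct(0.6, 0.2, 0.4, 1.0, 0.5))   # (s, t, r, q)
print(projective(0.6, 0.2, 0.4, 1.0))                 # (s/q, t/q, r/q)
```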
Interpolation Operations

• Ax + By + C per scalar linear interpolation
    2 MADs
• One reciprocal to invert q/w for projective texturing
    Or one reciprocal to invert 1/w for perspective correction
• Then 1 MUL per component for s/w * w/q
    Or s/w * w
• For (s,t), this means
    4 MADs, 2 MULs, & 1 RCP
    (s,t,r) requires 6 MADs, 3 MULs, & 1 RCP
• All floating-point operations
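Each interpolated scalar (s/w, t/w, r/w, q/w, or 1/w) is a plane equation over window coordinates. A minimal Python sketch, assuming the per-triangle setup solves for A, B, C from the three vertices with Cramer's rule (hardware setup may differ); only the per-pixel evaluation, 2 MADs per scalar, is charged to the texture fetch in the counts above.

```python
def mad(a, b, c):
    # Multiply-add, the basic GPU arithmetic operation: a*b + c.
    return a * b + c

def plane_from_vertices(v0, v1, v2):
    """Solve A, B, C so that A*x + B*y + C reproduces the attribute at each
    vertex. Each vertex is (x, y, value), e.g. value = s/w at that vertex."""
    (x0, y0, a0), (x1, y1, a1), (x2, y2, a2) = v0, v1, v2
    dx1, dy1, da1 = x1 - x0, y1 - y0, a1 - a0
    dx2, dy2, da2 = x2 - x0, y2 - y0, a2 - a0
    det = dx1 * dy2 - dx2 * dy1          # zero only for a degenerate triangle
    A = (da1 * dy2 - da2 * dy1) / det
    B = (dx1 * da2 - dx2 * da1) / det
    C = a0 - A * x0 - B * y0
    return A, B, C

def interpolate(plane, x, y):
    # Per-pixel cost: exactly 2 MADs per scalar.
    A, B, C = plane
    return mad(B, y, mad(A, x, C))

plane = plane_from_vertices((0, 0, 1.0), (4, 0, 5.0), (0, 4, 9.0))
print(plane)                       # (1.0, 2.0, 1.0)
print(interpolate(plane, 2, 3))    # 9.0
```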
Texture Space Mapping

• Have interpolated & projected coordinates
• Now need to determine what texels to fetch

• Multiply (s,t) by (width,height) of texture base level
    Could convert (s,t) to fixed-point first
       Or do math in floating-point
    Say the base texture is 256x256
       So compute (s*256, t*256) = (u,v)
Mipmap Level-of-detail Selection

• Tri-linear mip-mapping means computing the
  appropriate mipmap level
• Hardware rasterizes in 2x2 pixel entities
    Typically called quad-pixels or just quads
    Finite differences with neighbors (one-pixel separation) give the
    change in u and v with respect to window space
       Approximation to ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y
       Means 4 subtractions per quad (1 per pixel)
• Now compute an approximation to the scale factor p: the longer of the
  texel-space footprint lengths along window x and window y
    p = max(sqrt((∂u/∂x)² + (∂v/∂x)²),
            sqrt((∂u/∂y)² + (∂v/∂y)²))
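The quad-based finite differences can be sketched in a few lines. This minimal Python sketch groups the derivatives as the OpenGL specification does (length of the footprint along window x, and along window y) and takes all differences from the quad's top-left pixel, which is an assumption; which pixel pairs real hardware differences may vary.

```python
import math

def quad_scale_factor(u, v):
    """Approximate the LOD scale factor p for one 2x2 quad.

    u, v: texel-space coordinates of the quad's pixels, indexed [y][x].
    Differences are taken from the top-left pixel (4 subtractions per quad).
    """
    dudx = u[0][1] - u[0][0]
    dvdx = v[0][1] - v[0][0]
    dudy = u[1][0] - u[0][0]
    dvdy = v[1][0] - v[0][0]
    # Longer of the footprint lengths along window x and window y.
    return max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))

# Hypothetical quad: one pixel step moves 4 texels in u along x, 2 texels in v along y.
u = [[10.0, 14.0], [10.0, 14.0]]
v = [[20.0, 20.0], [22.0, 22.0]]
print(quad_scale_factor(u, v))   # 4.0
```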
Level-of-detail Bias and Clamping

• Convert length p to a power-of-two level-of-detail and
  apply LOD bias
    λ = log2(p) + lodBias
• Now clamp λ to the valid LOD range
    λ’ = max(minLOD, min(maxLOD, λ))
Determine Mipmap Levels and
Level Filtering Weight

• Determine lower and upper mipmap levels
    b = floor(λ’) is the bottom mipmap level
    t = floor(λ’+1) is the top mipmap level
• Determine the filter weight between levels
    w = frac(λ’) is the filter weight
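The two steps above, from p to the bottom level, top level, and inter-level weight, fit in a few lines of Python. A minimal sketch; it assumes minLOD ≥ 0 (minification only, since magnification is handled separately) and adds a hypothetical max_level parameter, the index of the smallest mipmap, to keep t inside the mipmap chain.

```python
import math

def select_levels(p, lod_bias, min_lod, max_lod, max_level):
    """Return (b, t, w): bottom level, top level, and the weight of level t.

    Sketch assumptions: min_lod >= 0 (minification only), and max_level is
    the index of the smallest mipmap, used to keep t inside the chain.
    """
    lam = math.log2(p) + lod_bias                # λ = log2(p) + lodBias
    lam_c = max(min_lod, min(max_lod, lam))      # λ' = max(minLOD, min(maxLOD, λ))
    b = math.floor(lam_c)                        # b = floor(λ')
    t = min(b + 1, max_level)                    # t = floor(λ' + 1), kept in range
    w = lam_c - b                                # w = frac(λ')
    return b, t, w

print(select_levels(4.0, 0.0, 0.0, 8.0, 8))      # (2, 3, 0.0)
print(select_levels(1024.0, 0.0, 0.0, 8.0, 8))   # (8, 8, 0.0)
```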
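Determine Texture Sample Point

• Get (u,v) for the selected top and bottom mipmap levels
    Consider a level l, which could be either level t or b
      With (u,v) locations (ul,vl)
• Perform GL_CLAMP_TO_EDGE wrap mode (here u and v are normalized to [0,1])
    uw = max(1/(2*widthOfLevel(l)),
             min(1 - 1/(2*widthOfLevel(l)), u))
    vw = max(1/(2*heightOfLevel(l)),
             min(1 - 1/(2*heightOfLevel(l)), v))
    [Figure: (s,t) texture square marking the edge texels and the border]
• Get the texel-space location (i,j) within each level
    (i,j) = ( uw*widthOfLevel(l),
              vw*heightOfLevel(l) )
    Not yet floored: the next step derives the integer texel coordinates
    and bilinear weights from (i,j)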
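The clamp bounds are 1/(2·width) and 1 − 1/(2·width) in normalized coordinates, which are the centers of the outermost texels, so the later bilinear step never reaches the border. A minimal Python sketch that returns the unfloored texel-space location, since the following step subtracts 1/2 before flooring:

```python
def clamp_to_edge_texel_space(u, v, width, height):
    """GL_CLAMP_TO_EDGE on normalized (u, v), then scale to texel space.

    Returns the unfloored texel-space location (i, j) in a width x height level.
    """
    uw = max(1.0 / (2 * width), min(1.0 - 1.0 / (2 * width), u))
    vw = max(1.0 / (2 * height), min(1.0 - 1.0 / (2 * height), v))
    return uw * width, vw * height

print(clamp_to_edge_texel_space(0.6, 0.25, 256, 256))   # about (153.6, 64.0)
print(clamp_to_edge_texel_space(-0.3, 1.5, 4, 4))       # (0.5, 3.5): outermost texel centers
```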
Determine Texel Locations

• Bilinear sample needs 4 texel locations
    (i0,j0), (i0,j1), (i1,j0), (i1,j1)
• With integer texel coordinates
    i0 = floor(i-1/2)
    i1 = floor(i+1/2)
    j0 = floor(j-1/2)
    j1 = floor(j+1/2)
• Also compute fractional weights for bilinear filtering
    a = frac(i-1/2)
    b = frac(j-1/2)
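Starting from an unfloored texel-space location (i, j), the four texel coordinates and the two bilinear weights follow directly. A minimal Python sketch; the extra clamp on i1 and j1 is an added assumption for the clamp-to-edge case, where i can equal width − 1/2 so that floor(i + 1/2) equals width: its weight a is 0 there, but the address must still stay inside the image.

```python
import math

def frac(x):
    return x - math.floor(x)

def bilinear_footprint(i, j, width, height):
    """Return the texel coordinates (i0, j0, i1, j1) and weights (a, b) for an
    unfloored texel-space location (i, j) in a width x height level."""
    i0, i1 = math.floor(i - 0.5), math.floor(i + 0.5)
    j0, j1 = math.floor(j - 0.5), math.floor(j + 0.5)
    # Keep the far texels inside the image at the right/top edge (weight is 0 there).
    i1, j1 = min(i1, width - 1), min(j1, height - 1)
    a, b = frac(i - 0.5), frac(j - 0.5)
    return (i0, j0, i1, j1), (a, b)

print(bilinear_footprint(153.6, 64.0, 256, 256))   # texels 153/154 and 63/64, a ≈ 0.1, b = 0.5
print(bilinear_footprint(3.5, 0.5, 4, 4))          # right edge: i1 clamped to 3, a = 0.0
```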
Determine Texel Addresses

• Given a texture level image’s base pointer, compute the
  address of each texel to fetch
    Assume bytesPerTexel = 4 bytes for an RGBA8 texture
• Example
    addr00 = baseOfLevel(l) +
             bytesPerTexel*(i0+j0*widthOfLevel(l))
    addr01 = baseOfLevel(l) +
             bytesPerTexel*(i0+j1*widthOfLevel(l))
    addr10 = baseOfLevel(l) +
             bytesPerTexel*(i1+j0*widthOfLevel(l))
    addr11 = baseOfLevel(l) +
             bytesPerTexel*(i1+j1*widthOfLevel(l))
• More complicated address schemes are needed for good texture
  locality!
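The address arithmetic above is a row-major layout, where the two rows of a bilinear footprint sit a whole row apart in memory. A minimal Python sketch contrasting it with Morton (Z-order) addressing, one commonly cited example of a more locality-friendly scheme; no claim is made here about which layout any particular GPU uses.

```python
def linear_address(base, i, j, width, bytes_per_texel=4):
    # Row-major layout, as on the slide.
    return base + bytes_per_texel * (i + j * width)

def morton_address(base, i, j, bytes_per_texel=4, bits=16):
    """Z-order (Morton) layout: interleave the bits of i (even positions) and
    j (odd positions), so every aligned 2x2, 4x4, ... block of texels is
    contiguous in memory."""
    index = 0
    for bit in range(bits):
        index |= ((i >> bit) & 1) << (2 * bit)
        index |= ((j >> bit) & 1) << (2 * bit + 1)
    return base + bytes_per_texel * index

# The 2x2 bilinear footprint at (i0, j0) = (2, 2) in a 256-texel-wide level.
footprint = [(2, 2), (3, 2), (2, 3), (3, 3)]
print([linear_address(0, i, j, 256) for i, j in footprint])  # rows 1024 bytes apart
print([morton_address(0, i, j) for i, j in footprint])       # 16 contiguous bytes
```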
Initiate Texture Reads

• Initiate texture memory reads at the 8 texel addresses
    addr00, addr01, addr10, addr11 for the upper level
    addr00, addr01, addr10, addr11 for the lower level
• Queue the weights a, b, and w
    A latency FIFO in hardware makes these weights available
    when the texture reads complete
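Queueing the weights is what lets Selection keep issuing reads while earlier reads are still in flight. A toy cycle-level Python model, assuming one fetch issued per cycle, a fixed memory latency, and a stand-in “memory” that returns address × 10 (all hypothetical): with the weights parked in a FIFO, N fetches finish in N + latency cycles instead of paying the latency once per fetch.

```python
from collections import deque

def run_pipeline(fetches, latency):
    """Toy cycle-level model of the phased texture fetch.

    fetches: list of (address, weight) pairs. Each cycle, Selection issues at
    most one memory read and pushes that fetch's weight into a FIFO; the read
    returns `latency` cycles later, and Combination pops weights in order.
    Hypothetical memory: the "texel" at an address is address * 10.
    Returns (results, cycles_used).
    """
    pending = deque(fetches)
    in_flight = deque()      # (cycle the data arrives, texel data)
    weight_fifo = deque()    # combination parameters waiting for their data
    results = []
    cycle = 0
    while pending or in_flight:
        if pending:                                   # Selection phase
            address, weight = pending.popleft()
            in_flight.append((cycle + latency, address * 10))
            weight_fifo.append(weight)
        if in_flight and in_flight[0][0] <= cycle:    # Combination phase
            _, texel = in_flight.popleft()
            results.append(texel * weight_fifo.popleft())
        cycle += 1
    return results, cycle

results, cycles = run_pipeline([(1, 0.5), (2, 0.25), (3, 1.0)], latency=100)
print(results, cycles)   # 3 fetches overlap: 103 cycles, not 3 * 101
```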
Phased Data Flow
  Must hide long memory read latency between the Selection
  and Combination phases

[Diagram: texture coordinate vector → Texel Selection → texel offsets →
 memory reads for samples from the texture images → texel data →
 Texel Combination → filtered texel vector. Combination parameters travel
 from Selection to Combination through a FIFO; both phases take the
 texture parameters as input.]
Texel Combination
• When texel reads are returned, begin filtering
     Assume results are
        Top texels: t00, t01, t10, t11
        Bottom texels: b00, b01, b10, b11
• Per-component filtering math is the tri-linear filter
     RGBA8 is four components
• result = (1-a)*(1-b)*(1-w)*b00 +
           (1-a)*b*(1-w)*b01 +
           a*(1-b)*(1-w)*b10 +
           a*b*(1-w)*b11 +
           (1-a)*(1-b)*w*t00 +
           (1-a)*b*w*t01 +
           a*(1-b)*w*t10 +
           a*b*w*t11;
• 24 MADs per component, or 96 for RGBA
     A lerp tree could do 14 MADs per component, or 56 for RGBA
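The lerp tree mentioned above filters each level bilinearly with three lerps and then blends the two levels with one more: 7 lerps at 2 operations each gives the 14 per component. A minimal Python sketch with hypothetical texel values, checked against the 8-term weighted sum:

```python
def lerp(x, y, f):
    # Linear interpolation, 2 operations: x + f*(y - x).
    return x + f * (y - x)

def bilinear(x00, x01, x10, x11, a, b):
    # xIJ: I selects i0/i1 (weight a), J selects j0/j1 (weight b).
    return lerp(lerp(x00, x10, a), lerp(x01, x11, a), b)

def trilinear(bottom, top, a, b, w):
    """Lerp tree: 3 lerps per level plus 1 between levels = 7 lerps (14 ops)."""
    return lerp(bilinear(*bottom, a, b), bilinear(*top, a, b), w)

def trilinear_direct(bottom, top, a, b, w):
    # The 8-term weighted sum from the slide, for comparison.
    b00, b01, b10, b11 = bottom
    t00, t01, t10, t11 = top
    return ((1-a)*(1-b)*(1-w)*b00 + (1-a)*b*(1-w)*b01 +
            a*(1-b)*(1-w)*b10 + a*b*(1-w)*b11 +
            (1-a)*(1-b)*w*t00 + (1-a)*b*w*t01 +
            a*(1-b)*w*t10 + a*b*w*t11)

bottom = (0.0, 1.0, 2.0, 3.0)      # hypothetical b00, b01, b10, b11
top = (10.0, 20.0, 30.0, 40.0)     # hypothetical t00, t01, t10, t11
print(trilinear(bottom, top, 0.25, 0.5, 0.75))          # 15.25
print(trilinear_direct(bottom, top, 0.25, 0.5, 0.75))   # 15.25
```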
Total Texture Fetch Operations
(assuming a fixed-point RGBA tri-linear mipmap filtered projective texture fetch)

• Interpolation
       6 MADs, 3 MULs, & 1 RCP (floating-point)
• Texel selection
       Texture space mapping
            2 MULs (fixed-point)
       LOD determination (floating-point)
            1 pixel difference, 2 SQRTs, 4 MULs, 1 LOG2
       LOD bias and clamping (fixed-point)
            1 ADD, 1 MIN, 1 MAX
       Level determination and level weighting (fixed-point)
            1 FLOOR, 1 ADD, 1 FRAC
       Texture sample point (fixed-point)
            4 MAXs, 4 MINs, 2 FLOORs
       Texel locations and bilinear weights (fixed-point)
            8 FLOORs, 4 FRACs, 8 ADDs
       Addressing (integer)
            16 integer MADs
• Texel combination (fixed-point)
       56 MADs
Observations about the Texture Fetch

• Lots of ways to implement the math
    Lots of clever ways to be efficient
    Lots more texture operations not considered in this analysis
       Compression
       Anisotropic filtering
       sRGB
       Shadow mapping
• Arguably TEX instructions are the “world’s most CISC instructions”
    Texture fetches are incredibly complex instructions
• A good deal of the GPU’s superiority over CPUs at graphics operations
  is attributable to TEX instruction efficiency
    Good for compute too
Take Away Information

• The GPU texture fetch is about two orders of magnitude
  more complex than the most complex CPU instruction
    And texture fetches are extremely common
    Modern GPU applications expect tens of billions of texture
    fetches per second
• Not just a graphics thing
    Using CUDA, you can access textures from within your
    compute- and bandwidth-intensive parallel kernels
GPU Technology Conference 2010
Monday, Sept. 20 - Thurs., Sept. 23, 2010
San Jose Convention Center, San Jose, California

The most important event in the GPU ecosystem
     Learn about seismic shifts in GPU computing
     Preview disruptive technologies and emerging applications
     Get tools and techniques to impact mission critical projects
     Network with experts, colleagues, and peers across industries

Opportunities
     Call for Submissions: sessions & posters
     Sponsors / Exhibitors: reach decision makers
     “CEO on Stage”: showcase for startups; tell your story to VCs and analysts

  “I consider the GPU Technology Conference to be the single best place to see the amazing
  work enabled by the GPU. It’s a great venue for meeting researchers, developers, scientists,
  and entrepreneurs from around the world.”

  -- Professor Hanspeter Pfister, Harvard University and GTC 2009 keynote speaker
