GPU Data Structures
for Graphics and Vision
Promotionskolloquium (doctoral defense colloquium), May 6th, 2011
Gernot Ziegler
Dept. of Computer Graphics
(3D Video and Vision-Based Graphics Group)
Outline
Graphics Hardware:
Original Purpose and Recent Development
Classical Usage in Visual Computing
 Free Viewpoint Video Compression
 Color and Depth Reprojection
 Hierarchical Image Processing
General Data Processing
 Data Compaction with the HistoPyramid
 Quadtree and Octree Generation
 Data Expansion with the HistoPyramid
Conclusion
Graphics Hardware: Original Purpose
 Graphics hardware accelerates typical data operations of
computer graphics (pixel moves, triangle rasterization)
 GPU is simpler in design than CPU, but massively parallel.
Graphics Hardware: Capabilities
 ~2003: Graphics Hardware becomes programmable:
GPU (Graphics Processing Unit)
Graphics Hardware: Capabilities
 Data can now be anything (floating point & integer)
 General Purpose Computing on GPU = GPGPU
"Classical Usage" in Visual Computing (still graphics-related)
– Computer Vision
– Video processing
– Volume analysis
General Data Processing
– PDE / ODE solver
– Spatial Data Structure Generation
– Database Ops
– Etc…
Game of Life
(Early GPGPU by S. Green)
Classical Usage in Visual Computing
Free Viewpoint Video Compression
(Chapter 3)
 Map video footage into texture domain via proxy 3D model
Free Viewpoint Video Compression
(Chapter 3)
 Obtain texture surface masking via shadow mapping
Free Viewpoint Video Compression: Publications
 G. Ziegler, H. Lensch, N. Ahmed, M. Magnor, H.-P. Seidel.
Multi-Video Compression in Texture Space.
11th IEEE Intl Conference on Image Processing (ICIP 2004),
Singapore, pp. 2467-2470, 2004.
 G. Ziegler, H. Lensch, M. Magnor, H.-P. Seidel. Multi-Video
Compression in Texture Space using 4D SPIHT.
6th IEEE Workshop on Multimedia Signal Processing, Siena,
Italy, pp. 39-42, 2004.
Color and Depth Reprojection
(Chapter 4)
 Depth-map "Projection" via proxy mesh & vertex shader
Novel View reconstruction
from partial depth camera views
Color and Depth Reprojection
(Chapter 4)
[Figure: blending by view angle vs. our per-pixel approach (purple: blended areas)]
Hierarchical Image Processing: Stereo reconstruction
(Chapter 5.1)
 Projective texturing in plane-sweep (GPU feedback, coarse-to-fine)
Hierarchical Image Processing: Reduction
(Chapter 5.2)
 Mipmap-like reduction: Dominant feature region, noise reduction
Hierarchical Image Processing: Reduction
(Thesis Chapter 5.3)
 Histogram of local gradients guides lens warp compensation
General Data Processing
Graphics Hardware: Capabilities
 GPU has massive computation and memory throughput
Graphics Hardware: Limitations
 GPU is connected to the CPU via a narrow data bus
(bandwidth bottleneck, approx. 4 GB/s)
 GPU is a stream processor:
– A workload of ~10K threads is needed to keep hundreds of cores busy
(data parallelization!)
– Thread switching is lightweight, but synchronization is expensive!
– Each thread can only write to a fixed output position
 Algorithms must be redesigned for GPU!
General Data Processing
Data Compaction
(Chapter 6)
Data-Parallel Algorithm Challenges
 Example from Computer Vision: List of all black pixels in an image
 Step 1: Detect black pixels:
 Step 2: Create a list of detected pixels
Previous approach to feature list generation
 Step 2 (List generation) was not possible on GPU!
 2a: GPU marks local features (e.g. thresholding, filtering)
 2b: CPU searches image and generates feature list
 But: Bus transfers expensive:
GPU useful only for complex feature isolation.
(e.g. large filter convolution & thresholding)
Our approach: Feature list generation on GPU
 We generate feature lists on the GPU using data compaction.
 Pixel/voxel/feature input is abstracted as a "data element stream".
 Compaction keeps only elements deemed relevant for output.
1D example (keep all elements that are blue):
 Data flow:
Massive speedup due to strongly reduced bus dataflow!
[Figure: input stream A B C D E F with 0/1 classifier flags; the kept elements A B D E form the compacted output]
Data Compaction: Problem task in 1D
 Keep a subset of the input elements, selected by a classifier:
 Implementation is trivial on CPU, single-thread.
 On GPU: Need to parallelize into 10k threads!
 First count number of output elements
using data-parallel reduction!
Data Compaction via HistoPyramid: Buildup
 First, count number of output elements,
e.g. 4:1 data-parallel reduction
 (Note that the reduction pyramid is retained: this is the HistoPyramid)
 Can now allocate compact output, no spill.
 But how are output elements generated?
Histogram pyramid /
HistoPyramid
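To make the buildup concrete, here is a minimal CPU reference sketch of the 1D case (illustrative names, not the thesis implementation); every iteration of the inner loops corresponds to one GPU thread. Level 0 holds the classifier output per input element (1 = keep, e.g. "pixel is black"), and each higher level sums four cells of the level below, so the single top cell contains the total output count.

// CPU reference sketch of HistoPyramid buildup (1D, 4:1 reduction).
#include <cstdint>
#include <vector>

// Level 0 holds one classifier value (0 or 1) per input element;
// every higher level sums groups of four cells of the level below.
using Level = std::vector<uint32_t>;

std::vector<Level> buildHistoPyramid(const Level& classifier)
{
    std::vector<Level> hp;
    hp.push_back(classifier);                    // level 0: 0/1 per element
    while (hp.back().size() > 1) {
        const Level& below = hp.back();
        Level above((below.size() + 3) / 4, 0);  // 4:1 reduction
        for (size_t parent = 0; parent < above.size(); ++parent) {
            uint32_t sum = 0;
            for (size_t c = 0; c < 4; ++c) {     // sum up to four children
                size_t child = 4 * parent + c;
                if (child < below.size()) sum += below[child];
            }
            above[parent] = sum;                 // partial element count
        }
        hp.push_back(above);
    }
    return hp;                                   // hp.back()[0] == output count
}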
Data Compaction via HistoPyramid: Traversal
 Output generation: start one thread per output element
 Each output thread traverses reduction pyramid (read-only)
 No read/write hazards = Data-parallel output writing!
 As many threads as output elements
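A matching CPU reference sketch of the traversal (again illustrative, one call per would-be GPU thread): each output element carries a key, starts at the top cell, and at every level subtracts child counts until its key falls into one child, descending until it reaches the level-0 cell of the input element it must copy.

// CPU reference sketch of HistoPyramid traversal for data compaction (1D).
// The pyramid 'hp' is read-only; no thread writes outside its own slot.
#include <cstdint>
#include <vector>

using Level = std::vector<uint32_t>;

// Returns the input index that output element 'key' (0 <= key < total count)
// maps to.
size_t traverse(const std::vector<Level>& hp, uint32_t key)
{
    size_t cell = 0;                                   // start at the top cell
    for (size_t level = hp.size() - 1; level > 0; --level) {
        const Level& below = hp[level - 1];
        size_t child = 4 * cell;                       // first of four children
        // Walk the children left to right, narrowing the key range.
        while (key >= below[child]) {
            key -= below[child];
            ++child;
        }
        cell = child;                                  // descend into this child
    }
    return cell;   // level-0 cell index == position of the kept input element
}

// Usage sketch: compact all kept elements into a tight output array.
// for (uint32_t k = 0; k < hp.back()[0]; ++k)         // one GPU thread per k
//     output[k] = input[traverse(hp, k)];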
HistoPyramid: 2D Data Compaction
 The 1D case was for illustration; the actual implementation is 2D!
 Dataflow diagram:
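The 2D descent step can be sketched as follows (assuming a row-major child order; the actual implementation may scan the 2x2 block in a different order): a parent cell at (x, y) covers the four children at (2x, 2y) through (2x+1, 2y+1), and the key is narrowed to exactly one of them.

// CPU reference sketch of one 2D HistoPyramid descent step (illustrative).
#include <cstdint>
#include <vector>

struct Level2D {
    size_t width = 0, height = 0;
    std::vector<uint32_t> cells;                        // row-major counts
    uint32_t at(size_t x, size_t y) const { return cells[y * width + x]; }
};

void descend(const Level2D& below, size_t& x, size_t& y, uint32_t& key)
{
    // Children of parent (x, y), visited in a fixed order (here: row-major).
    const size_t cx[4] = { 2 * x, 2 * x + 1, 2 * x,     2 * x + 1 };
    const size_t cy[4] = { 2 * y, 2 * y,     2 * y + 1, 2 * y + 1 };
    for (int i = 0; i < 4; ++i) {
        uint32_t n = below.at(cx[i], cy[i]);
        if (key < n) { x = cx[i]; y = cy[i]; return; }  // key falls in this child
        key -= n;                                       // skip this child's range
    }
}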
GPU Data Compaction: Publications
 Data Compaction fast enough for real-time volume analysis
 First application: Mesh-to-volume-to-point cloud in real-time!
 G. Ziegler, A. Tevs, C. Theobalt, H.-P. Seidel
On-the-fly Point Clouds through Histogram Pyramids
11th International Fall Workshop on Vision, Modeling and
Visualization 2006 (VMV2006), 2006, pp. 137-144.
GPU Data Compaction: Publications
 Vector Field Contours: view-dependent vector field analysis to
visualize contour lines throughout the volume
 Data Compaction delivers seed points for contour lines in milliseconds!
 T. Annen, H. Theisel, C. Rössl, G. Ziegler, H.-P. Seidel
Vector Field Contours
Graphics Interface 2008, Windsor/Canada, 2008, pp. 97-105
General Data Processing
Quadtree and Octree Generation
(Chapter 8 and 9)
GPU Quadtrees: Introduction
 2D Reduction follows a quadtree-like reduction pattern.
 By tracking feature similarity in reduction,
quadtrees can be created from the reduction pyramid!
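One plausible formulation of this idea (a sketch, not the thesis algorithm): during the 4:1 reduction, each cell additionally records whether its image block is uniform, and the counts become quadtree leaf counts; four uniform children with equal value merge into a single leaf.

// Sketch of the quadtree idea: track block uniformity during reduction and
// count quadtree leaves instead of plain elements.
#include <cstdint>

struct Cell {
    bool     uniform;   // block contains only one feature value
    uint32_t value;     // that value (valid if uniform)
    uint32_t leaves;    // number of quadtree leaves inside this block
};

Cell reduce4(const Cell c[4])
{
    bool merge = c[0].uniform && c[1].uniform && c[2].uniform && c[3].uniform
              && c[0].value == c[1].value
              && c[0].value == c[2].value
              && c[0].value == c[3].value;
    if (merge)
        return { true, c[0].value, 1 };                 // four blocks fuse into one leaf
    return { false, 0,
             c[0].leaves + c[1].leaves + c[2].leaves + c[3].leaves };
}
// Base level: every pixel is a uniform block with one leaf, { true, pixel, 1 }.
// The top cell's 'leaves' field tells how many quadtree leaves to extract,
// and a HistoPyramid-style traversal over the 'leaves' counts can list them.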
GPU QuadTree: Publications
 Millisecond runtimes enable real-time quadtree processing from video,
e.g. for compression and vision.
 G. Ziegler, R. Dimitrov, C. Theobalt, H-P. Seidel.
Real-time Quadtree Analysis using HistoPyramids.
SPIE Electronic Imaging conference, San Jose/USA, 2007.
GPU Octree
(Chapter 9)
 Feature Clustering extended to 3D volumes
 Octrees from Volume Data
 New algorithm, pointer octrees
(e.g. for spatial data structures)
 Real-time creation of
high-resolution octrees
from meshes possible!
General Data Processing
Data Expansion
(Chapter 7)
Data Expansion via HistoPyramid:
Problem task
 We have a function that determines, for each input element,
how many output copies to create:
 Implementation is trivial on CPU
 GPU: Input can be divided amongst threads, but:
Where shall each thread write its output?
 Insight:
HistoPyramid traversal works even here!
Data Expansion via HistoPyramid:
HP Buildup
 First, count number of output elements, e.g. via 4:1 reduction:
Data Expansion via HistoPyramid:
HP Traversal (single output copy)
 Traversal for single output elements:
 Exactly like data compaction, but mind the local key index k_L
Data Expansion via HistoPyramid:
HP Traversal (multiple output copies)
 Traversal for multiple output elements:
 k_L determines which copy is produced. Still: one thread for each copy!
Data Expansion via HistoPyramid:
HP Traversal (multiple output copies)
 Traversal for multiple output elements:
 k_L determines which copy is produced.
 Observation: a thread can modify its input element before write-out!
 Thus: the output can be a modified version of the input, based on k_L.
 e.g. Geometry Creation:
 (Generic algorithm…)
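A CPU reference sketch of the expansion traversal (illustrative; makeVariant is a hypothetical helper standing in for whatever per-copy processing the application needs): level 0 stores a multiplicity per input element, the descent is identical to compaction, and the key that remains at the bottom is the local copy index k_L.

// CPU reference sketch of HistoPyramid-based data expansion (1D).
// Level 0 stores how many output copies each input element produces;
// buildup is the same summing reduction as for compaction.
#include <cstdint>
#include <vector>

using Level = std::vector<uint32_t>;

struct Hit { size_t input; uint32_t kL; };   // source element + local copy index

Hit traverseExpand(const std::vector<Level>& hp, uint32_t key)
{
    size_t cell = 0;
    for (size_t level = hp.size() - 1; level > 0; --level) {
        const Level& below = hp[level - 1];
        size_t child = 4 * cell;
        while (key >= below[child]) { key -= below[child]; ++child; }
        cell = child;
    }
    return { cell, key };   // the leftover key is k_L, i.e. which copy this is
}

// Usage sketch: one GPU thread per output element.
// for (uint32_t k = 0; k < hp.back()[0]; ++k) {
//     Hit h = traverseExpand(hp, k);
//     output[k] = makeVariant(input[h.input], h.kL);  // hypothetical helper
// }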
Data Expansion: Eikonal Rendering (Publication I)
 Compute light transport through volume objects of varying refraction
 Both real-time rendering and precomputed lighting simulation
 Lighting simulation requires adaptive light wavefront simulation
 I. Ihrke, G. Ziegler, A. Tevs, C. Theobalt, M. Magnor, H.-P. Seidel
Eikonal Rendering: Efficient Light Transport in Refractive Objects
ACM Transactions on Graphics 26 (3): 59-1 - 59-8, 2007
http://www.mpi-inf.mpg.de/resources/EikonalRendering/
Eikonal Rendering: Lighting Simulation
 For given light-object position, precompute lighting inside
the volumetric object for real-time novel view rendering.
 Lighting simulation implements numerical ODE solver on GPU.
 Subdivide light's wavefront into a set of patches
 Patch corners move as GPU particle system
– Each particle follows ray optics
 During update, some patches:
– weaken too much (discard)
– leave volume (discard)
– grow too large (tessellate)
 Since patch list is on GPU:
– Discard: Data Compaction
– Tessellate: Data Expansion
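The patch-list update can be sketched as a single multiplicity function fed into the expansion HistoPyramid (a sketch only; the Patch fields, the thresholds and the 1-to-4 split are assumptions, not the parameters used in the paper):

// Sketch: discard and tessellation expressed as one multiplicity function.
#include <cstdint>

struct Patch {
    float energy;          // remaining wavefront energy carried by this patch
    float area;            // current patch area
    bool  insideVolume;
};

uint32_t copiesToEmit(const Patch& p, float minEnergy, float maxArea)
{
    if (!p.insideVolume)      return 0;   // left the volume: discard (compaction)
    if (p.energy < minEnergy) return 0;   // weakened too much: discard
    if (p.area > maxArea)     return 4;   // grew too large: tessellate (expansion)
    return 1;                             // keep as-is
}
// Feeding copiesToEmit() into the expansion HistoPyramid updates the whole
// patch list in one data-parallel pass; k_L then selects which of the four
// sub-patches a thread generates.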
Eikonal Rendering: Wavefront Propagation
Eikonal Rendering: Short Demo
Data Expansion: Marching Cubes (Publication II)
 Marching Cubes algorithm extracts iso-surfaces from volumes
 Reformulate: Stream of voxels ...
– is first compacted to the relevant iso-surface voxels
– then expanded, becoming a stream of triangle vertices
 C. Dyken, G. Ziegler, C. Theobalt, and H.-P. Seidel
High-speed Marching Cubes using HistoPyramids
Computer Graphics Forum 27 (8): 2028-2039, 2008
http://www.sintef.no/hpmc
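A sketch of the per-cell multiplicity behind this reformulation (illustrative; vertexCount and emitVertex stand in for the usual Marching Cubes case tables and are not the paper's actual interfaces):

// Step 1 (multiplicity): classify the cell against the iso-value and return
// how many triangle vertices it will emit (0, 3, 6, ...).
#include <cstdint>

uint32_t cellMultiplicity(const float corner[8], float iso,
                          const uint8_t vertexCount[256])
{
    uint8_t mcCase = 0;
    for (int i = 0; i < 8; ++i)
        if (corner[i] < iso) mcCase |= uint8_t(1u << i);   // 8-bit MC case code
    return vertexCount[mcCase];        // 0 for empty/full cells -> compacted away
}

// Step 2 (expansion traversal, one thread per output vertex): the HistoPyramid
// maps output index k to a cell plus a local index k_L in [0, vertexCount);
// k_L selects which edge-interpolated vertex of that cell's triangles to emit.
// vertex = emitVertex(cell, mcCase, kL, iso);   // hypothetical helper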
Performance of the OpenGL approach (2007):
 The geometry shader (GS), introduced e.g. with the NVIDIA GeForce 8, enabled
hardware data compaction & expansion for geometry and should have made
HistoPyramids obsolete - but HP-MC outperforms the geometry shader
variant (HP-GS)!
In 2007, HP-MC was the fastest known Marching Cubes algorithm.
[Chart: frames per second]
Conclusion and Outlook
(Chapter 10)
Conclusion and Outlook
 GPUs increasingly useful in general data processing
 Programming Model Restrictions not always bad
– Force programmer to change thought model
– E.g.: the fixed output location restriction led to the HistoPyramid traversal concept
– Can be more efficient, even on more capable hardware!
(atomic counters and geometry shaders perform worse)
 Data-Parallel Algorithm Design is hard
– But once done, parallelizable over any number of available cores
(if sufficient data available)
– Hard to imagine that auto-parallelization can achieve this
 Future work
– Connected components, distance transforms, SATs…
– Accelerate further using CUDA C and OpenCL
Other work based on presented algorithms
Quadtree
 C. N. Vasconcelos, A. Sá, P. C. Carvalho, M. Gattass.
QuadN4tree: A GPU-Friendly Quadtree Leaves Neighborhood Structure.
Proc. of Computer Graphics International Conference (CGI) 2008.
 C. N. Vasconcelos, A. Sá, P. C. Carvalho, M. Gattass.
Using Quadtrees for Energy Minimization Via Graph Cuts.
Proc. of VMV - 12th Vision, Modeling, and Visualization Workshop, pp. 71-80.
Data Expansion
 C. Dyken, M. Reimers, J. Seland.
Real-time GPU Silhouette Refinement using adaptively blended Bézier patches.
Computer Graphics Forum, Volume 27, number 1, pp. 1-12, 2007.
Data Compaction (Implementation)
 J. Fung, S. Mann.
OpenVIDIA: parallel GPU computer vision.
Proc. of 13th annual ACM international conference on Multimedia, pp. 849-852.
http://openvidia.sf.net
End of Presentation
Recent Work
San Jose (CA) | September 23rd, 2010
Christopher Dyken, SINTEF Norway
Gernot Ziegler, NVIDIA UK
GPU-accelerated data expansion
for the Marching Cubes algorithm
HistoPyramid performance
 Accelerated HistoPyramids using CUDA C
 HistoPyramid BuildUp
— Reduce 5-to-1, but store only the first four sums!
— Build several levels at once via on-GPU shared memory
(fewer video memory transactions)
 Marching Cubes specific
— Share scalar input data amongst neighbouring MC cells
(through shared memory)
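Why four sums suffice for a 5-to-1 reduction can be sketched as follows (illustrative C++, not the CUDA kernel): during traversal the fifth child is simply the fall-through case, so its count never needs to be read or stored.

// 'sums' holds the counts of children 0..3 of the current cell; 'key' is known
// to be smaller than the cell's total count. Returns the chosen child (0..4)
// and reduces 'key' to an offset local to that child.
#include <cstdint>

int descend5(const uint32_t sums[4], uint32_t& key)
{
    for (int child = 0; child < 4; ++child) {
        if (key < sums[child]) return child;   // key falls into a stored child
        key -= sums[child];
    }
    return 4;   // otherwise it must fall into the fifth, unstored child
}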
512³-ish 16-bit volume performance:

Dataset (iso-value, source)                 Size                    Triangles (tris/cell)   OpenGL HP4MC        CUDA-OpenGL HP5MC    Speedup
Backpack (iso=0.4, www.volvis.org)          512x512x373 (187 MB)    3 745 320 (0.039)       13 fps (1291 mvps)  43 fps (4129 mvps)   3.2x
Head aneurysm (iso=0.4, www.volvis.org)     512x512x512 (256 MB)    583 610 (0.004)         15 fps (2034 mvps)  78 fps (10399 mvps)  5.1x
Christmas tree (iso=0.05, TU Wien)          512x499x512 (250 MB)    5 629 532 (0.043)       10 fps (1358 mvps)  28 fps (3704 mvps)   2.7x
End of Presentation
