Alternative Rendering Pipelines on NVIDIA CUDA

Loading...

Flash Player 9 (or above) is needed to view presentations.
We have detected that you do not have it on your computer. To install it, go here.

0 comments

Post a comment

    Post a comment
    Embed Video
    Edit your comment Cancel

    Favorites, Groups & Events

    Alternative Rendering Pipelines on NVIDIA CUDA - Presentation Transcript

    1. Alternative Rendering Pipelines Using NVIDIA CUDA Andrei Tatarinov Alexander Kharlamov
    2. Outline CUDA overview Ray-tracing REYES pipeline Future ideas
    3. CUDA Overview
    4. Compute Unified Device Architecture (CUDA) Parallel computing Application architecture DirectX … Allows easy access to GPU C/C++ OpenCL Fortran Compute A back-end for different APIs CUDA
    5. Streaming Multiprocessor Streaming Multiprocessor Texture Processing Cluster Instruction $ Constant $ Shared Memory SM SP SP SP SP Texture SM SFU SFU SP SP SP SP SM Register File Double Precision
    6. Threads and Blocks One block is executed on Streaming Multiprocessor one SM Instruction $ Constant $ Threads within a block can Shared Memory cooperate SP SP SP SP Shared memory SFU SFU SP SP __syncthreads() SP SP Register File Double Precision
    7. Multiprocessor Occupancy Registers (r.) & Threads 8192 r. per Streaming Multiprocessor on 8800GTX 128 r. – way too many registers r. ≤ 40: 6 active warps r. ≤ 32: 8 active warps r. ≤ 24: 10 active warps r. ≤ 20: 12 active warps r. ≤ 16: 16 active warps
    8. Usecases
    9. Ray tracing
    10. Ray tracing Natural rendering pipeline Important tool for determining visibility
    11. Research goals Investigate rendering pipelines Collaborative research with Moscow State University
    12. Path of a ray Tree traversal Primitive Material & Light Shading kernel intersection Kernel Kernel kernel Select K Leaves Select Next Leaf Intersection found: Compute light Primitive ID equation Ray Ray-triangle Generate Shadow Tree traversal Shading intersect Rays Generate Secondary Rays Select Next Primitive Shaded cluster is sampled
    13. Path of a ray Unknown number of rays Ray workload and memory access is highly irregular Register & Bandwidth pressure is high
    14. Kd-tree
    15. Kd-tree LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL L R LL LR RL RR LLL LLR LRL RRL RRR LLLL LLLR LRLL LRLR
    16. Kd-tree A tmax t* Registers – 13 min: Ray – 6 tmin t, tmin, tmax – 3 node – 2 B tmax tid, stack_top – 2 19 registers – is a practical number Stack in local memory t* tmin C tmax t* tmin
    17. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: Current Node:
    18. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: R Current Node: L
    19. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: R Current Node: LL
    20. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: LLR, R Current Node: LLL
    21. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: LLR, R Current Node: LLLR
    22. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: R We could stop here! Current Node: LLR
    23. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: Current Node: R
    24. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: RR Current Node: RL
    25. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: Current Node: RR
    26. Kd-tree Tree traversing LRLRL LRLRR LRR RRR LRLL RL LLLL LLLR LLR RRL Stack: Result: Current Node: RRR LLR, RRR
    27. Tree traversal Different rays may run for different time One thread can stall a whole block Each thread needs a buffer to store all possible leafs Worst case: a ray intersects all possible leafs of a tree
    28. Tree traversal Different rays may run for different time Solution: Persistent threads Each thread needs a buffer to store all possible leafs Solution: Screen tiling
    29. Persistent threads Launch as many threads as possible Depends on HW architecture and kernel requisites Keep all threads busy Create a pool of rays to traverse a tree
    30. Regular execution Disadvantages Waiting until all threads finish execution to launch new block time Warp 0 Warp 1 Warp 2 Warp 3 Block 0 Block 1
    31. Regular execution Disadvantages Waiting until all threads finish execution to launch new block time Warp 0 Warp 1 Warp 2 Warp 3 Block 0 Block 1
    32. Persistent threads execution Advantages Workload is balanced between warps time Warp 0 Warp 1 Warp 2 Warp 3 Block 0
    33. Screen Tiling Split the screen into multiple tiles Render tiles separately Tiles of 128x128 / 256x256 work well 128x128 is still 16K of threads! Allows easy multi-GPU performance scaling Control over memory
    34. Tree traversal Screen is split into tiles (256x256) Reserve place for a number of non-empty leafs Launch fixed number of threads
    35. Path of a ray Tree traversal Primitive Material & Light Shading kernel intersection Kernel Kernel kernel Select K Leaves Select Next Leaf Intersection found: Compute light Primitive ID equation Ray Ray-triangle Generate Shadow Tree traversal Shading intersect Rays Generate Secondary Rays Select Next Primitive Shaded cluster is sampled
    36. Ray-triangle intersection Minimum storage ray-triangle intersection t   dot (Q, E2 )  u  = 1  dot ( P, T )  v1   dot ( P, E1 )   v     dot (Q, D )    E1 = v1 − v0 u z v E2 = v2 − v0 v0 t1 T = p − v0 D v2 P = cross ( D, E2 ) Q = cross (T , E1 ) p
    37. Ray-triangle intersection Computational complexity (>30 MADs) Register Pressure (>23) v1 6 r. per ray 9 r. per triangle u z v 3 r. for intersection result (t, u, v) 1 r. for Triangle Count v0 t1 1 r. for loop index D v2 1 r. for thread ID (tid) p 2 r. min_t и min_id
    38. Ray-triangle kernel Each thread is mapped to a ray Each ray operates on its Block of threads shares triangle triangles (packet)
    39. Ray-triangle intersection Each thread is mapped to a ray triangles texture threads for (int i=0;i<triNum;i++) { (A,B,C) = tex1Dfetch(tex,i); // intersection code Kernel takes 32 registers }
    40. Ray-triangle intersection Each thread is mapped to a ray triangles texture threads for (int i=0;i<triNum;i++) { (A,B,C) = tex1Dfetch(tex,i); // intersection code Kernel takes 32 registers }
    41. Ray-triangle intersection Each thread is mapped to a ray triangles texture threads for (int i=0;i<triNum;i++) { (A,B,C) = tex1Dfetch(tex,i); // intersection code Kernel takes 32 registers }
    42. Ray-triangle intersection Each thread is mapped to a ray triangles texture threads for (int i=0;i<triNum;i++) { (A,B,C) = tex1Dfetch(tex,i); // intersection code Kernel takes 32 registers }
    43. Ray-triangle intersection Packet tracing triangles texture shared memory __shared__ sABC[N]; threads sABC[id] = tex1Dfetch(tex,i); __syncthreads(); for (int i=0;i<N;i++) { (A,B,C) = sABC[i]; // intersection code Kernel takes 20 registers }
    44. Ray-triangle intersection Packet tracing triangles texture shared memory __shared__ sABC[N]; threads sABC[id] = tex1Dfetch(tex,i); __syncthreads(); for (int i=0;i<N;i++) { (A,B,C) = sABC[i]; // intersection code Kernel takes 20 registers }
    45. Ray-triangle intersection Packet tracing triangles texture shared memory __shared__ sABC[N]; threads sABC[id] = tex1Dfetch(tex,i); __syncthreads(); for (int i=0;i<N;i++) { (A,B,C) = sABC[i]; // intersection code Kernel takes 20 registers }
    46. Ray-triangle intersection Performance comparison One thread per ray: 1x Packet tracing: 1.3x
    47. Path of a ray Tree traversal Primitive Material & Light Shading kernel intersection Kernel Kernel kernel Select K Leaves Select Next Leaf Intersection found: Compute light Primitive ID equation Ray Ray-triangle Generate Shadow Tree traversal Shading intersect Rays Generate Secondary Rays Select Next Primitive Shaded cluster is sampled
    48. Path of a ray Tree traversal Primitive Material & Light Shading kernel intersection Kernel Kernel kernel Select K Leaves Select Next Leaf Intersection found: Compute light Primitive ID equation Ray Ray-triangle Generate Shadow Tree traversal Shading intersect Rays Generate Secondary Rays Select Next Primitive Shaded cluster is sampled
    49. Uber-kernel Uber kernel – a kernel of the following structure: if ( condition_A ) { Do_Work_A(); } else if ( condition_B ) { Do_Work_B(); } … Ideally condition_A / condition_B … are constant per block
    50. Ray tracing Uber-kernel Uber kernel – a kernel of the following structure: if ( Leaf_List_is_Empty ) { Select_K_Leafs(); } else { Intersect_Trianles(); } …
    51. Ray-tracing: separate kernels Disadvantages Waiting until traversal kernel finishes execution time Blocks 0 Blocks 1 Blocks 2 Blocks 3 Tree Traversal Ray-Triangle Intersection
    52. Ray-tracing: separate kernels Disadvantages Waiting until traversal kernel finishes execution time Blocks 0 Blocks 1 Blocks 2 Blocks 3 Tree Traversal Ray-Triangle Intersection
    53. Ray-tracing uber-kernel Advantages Waiting until a block reaches the barrier time Blocks 0 Blocks 1 Blocks 2 Blocks 3 Tree Traversal Ray-Triangle Intersection
    54. Uber-kernel vs. Separate kernels Pros Work switching Memory savings Caution! Hard to profile Poor resources utilization
    55. Path of a ray Tree traversal Primitive Material & Light Shading kernel intersection Kernel Kernel kernel Select K Leaves Select Next Leaf Intersection found: Compute light Primitive ID equation Ray Ray-triangle Generate Shadow Tree traversal Shading intersect Rays Generate Secondary Rays Select Next Primitive Shaded cluster is sampled
    56. Material & Light Kernel Ray-tracing result: primitive ID and depth buffer Points to primitive matrix, material, etc For each pixel inside the tile evaluate: Number of secondary rays Number of shadow rays Get total number of rays
    57. Material & Light Kernel Number of rays depends on material and light properties Primitive ID 0 0 0 0 5 5 5 5 1 1 1 1 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 0 9 9 9 1 1 0 0 0 0 1 1 1 1 1 1 0 0 9 9 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 Refraction Reflection Shadow Diffuse ray count ray count ray count ray count
    58. Histogram Pyramid Use histogram pyramid for stream compaction 10 6 4 1 5 3 1 0 1 3 2 1 2 1 0
    59. Histogram Pyramid Use histogram pyramid for stream compaction 10 6 4 1 5 3 1 0 1 3 2 1 2 1 0 0 0 0 0 6 6 6 6
    60. Histogram Pyramid Use histogram pyramid for stream compaction 10 6 4 1 5 3 1 0 1 3 2 1 2 1 0 0 0 1 1 6 6 9 9
    61. Histogram Pyramid Use histogram pyramid for stream compaction 10 6 4 1 5 3 1 0 1 3 2 1 2 1 0 0 0 1 4 6 7 9 10
    62. Material & Light Kernel Read back the total number of rays required Check if resources are available Send generated rays to tree traversal & primitive intersection stage
    63. Path of a ray Tree traversal Primitive Material & Light Shading kernel intersection Kernel Kernel kernel Select K Leaves Select Next Leaf Intersection found: Compute light Primitive ID equation Ray Ray-triangle Generate Shadow Tree traversal Shading intersect Rays Generate Secondary Rays Select Next Primitive Shaded cluster is sampled
    64. Shading Deferred rendering style
    65. Ray-tracing & Global Illumination To simulate global illumination we can: Path-tracing: trace multiple rays through each pixel Photon mapping: trace photons from light source
    66. Photon mapping Brute force solution: Trace N photons from the light source Create lists of photons for each Kd- tree node For each visible pixel: Collect photons that are within radius R
    67. REYES Pipeline
    68. REYES Micropolygonal rendering pipeline Industrial standard for high-quality rendering Widely used in film production
    69. Research [Zhou09] [Patney08] Problems of implementing REYES on CUDA are common to many tasks in parallel computing
    70. Peculiarities A lot of uniform, non-divergent computations Natural parallelism exposed through bucketing Amount of work per input data element may vary significantly Some stages of pipeline can generate enormous amounts of data
    71. Implementation Design The idea is to implement REYES as a set of kernels which communicate through queues This allows implementing advanced scheduling schemes output Queue Kernel input input Kernel Queue output
    72. Pipeline overview input Transformation kernel Transforms control points Subdivision kernel Bounds and splits patches Dice/Shade/Sample kernel Uberkernel which can do dicing, shading and sampling Map A-buffer Map A-buffer to screen buffer to screen output
    73. Why queues? High-level memory access interface Push()/Pop()-style access Encapsulates complex memory management schemes Queue-specific tricks Recursive processing Workload management
    74. Path of the Patch Transformation Subdivision Dice/Shade/Sample Mapping kernel kernel kernel kernel Model-view Subdivision patch transformations check failed Transformation Screen Subdivision queue Dicing queue queue buffer Transformed to a set of diced microquads Shading Subdivision check queue passed The microquad is shaded Sampling ABuffer queue Shaded quad is sampled
    75. Queues High-level memory access interface Map()/Unmap() routines Implicitly perform scan inside Map() calls All threads get access to memory synchronously
    76. Scan operation Prefix sum Fundamental operation used in GPU computing Allows managing the memory depending on the needs of each thread
    77. Scan operation Can be performed by sequentially summing elements 1 1 2 2 1 1 4 1 0 1 2 4 6 7 8 12 data buffer This implementation is not suited for parallel execution
    78. Scan operation The naïve (n log(n) complexity) scan approach is used 1 1 2 2 1 1 4 1 0 1 1 2 2 1 1 4 Scan 0 1 2 3 4 3 2 5 0 1 2 4 6 6 6 8 0 1 2 4 6 7 8 12 0 1 2 4 6 7 8 12
    79. Scan operation The naïve (n log(n) complexity) scan approach is used 1 1 2 2 1 1 4 1 0 1 1 2 2 1 1 4 Scan 0 1 2 3 4 3 2 5 0 1 2 4 6 6 6 8 0 1 2 4 6 7 8 12 0 1 2 4 6 7 8 12
    80. Scan operation For more work efficient Scan approaches, see [Harris07]
    81. Queues Thread Thread Thread Thread Thread Thread Thread Thread 0 1 2 3 4 5 6 7 Calling Map() Map(1) Map(1) Map(2) Map(2) Map(1) Map(1) Map(4) Map(1)
    82. Queues Thread Thread Thread Thread Thread Thread Thread Thread 0 1 2 3 4 5 6 7 Calling Map() Map(1) Map(1) Map(2) Map(2) Map(1) Map(1) Map(4) Map(1) Performing scan 1 1 2 2 1 1 4 1 0 1 2 4 6 7 8 12
    83. Queues Thread Thread Thread Thread Thread Thread Thread Thread 0 1 2 3 4 5 6 7 Calling Map() Map(1) Map(1) Map(2) Map(2) Map(1) Map(1) Map(4) Map(1) Performing scan 1 1 2 2 1 1 4 1 0 1 2 4 6 7 8 12 Accessing data Read(0) Read(1) Read(2) Read(4) Read(6) Read(7) Read(8) Read(12)
    84. Advantages of using scan Used when a lot of threads need to access one shared resource Advantages: All threads get access to data synchronously No racing conditions and deadlocks
    85. Kernels in detail
    86. Bucketed rendering Each block in a kernel grid processes one screen tile Grid Block (0, 0) Block (1, 0) Blo Block (0, 1) Block (1, 1) Blo Blo Block (0, 2) Block (1, 2)
    87. Bucketed rendering Each block in a kernel grid processes one screen tile Grid Block (0, 0) Block (1, 0) Blo Block (0, 1) Block (1, 1) Blo Blo Block (0, 2) Block (1, 2)
    88. Bucketed rendering Each block in a kernel grid processes one screen tile Grid Block (0, 0) Block (1, 0) Blo Block (0, 1) Block (1, 1) Blo Blo Block (0, 2) Block (1, 2)
    89. Bucketed rendering Each block in a kernel grid processes one screen tile Grid Block (0, 0) Block (1, 0) Blo Block (0, 1) Block (1, 1) Blo Blo Block (0, 2) Block (1, 2)
    90. Bucketed rendering Input Input Input queue queue queue Subdivision block Subdivision block ... Subdivision block Dice/Shade/Sample block Dice/Shade/Sample block ... Dice/Shade/Sample block A-buffer tile 0 A-buffer tile 1 A-buffer tile N-1 A-buffer to screen mapping output
    91. Undetermined workload Data parallelism is not well-suited when amount of input data is undetermined Need to schedule kernel execution on CPU 308 1562 quads quads 1298 420 quads quads
    92. Persistent threads Launch as many threads as possible on GPU Use work stealing scheme to balance workload between threads Kernel works until the job is done
    93. Persistent threads Advantages Workload is balanced between warps automatically time Warp 0 Warp 1 Warp 2 Warp 3
    94. Persistent threads Trick Can write data back to the input buffer Allows performing recursion Input buffer Input Processed Persistent data kernel Output
    95. Subdivision kernel Take a patch from input FIFO Perform bucket check Perform subdivision check Input queue Subdivide and store the results Bucket fail check Discard OK OK Subdivision Subdivide check fail Output queue
    96. Working within limited memory Dicing kernel can generate enormous amounts of data Need to pipeline the computations to fit the limited amount of memory
    97. Uber-kernel Kernel capable of doing multiple types of job Can pipeline the computations Regular kernel Dicing Dicing
    98. Dice/Shade/Sample kernel Uber-kernel capable of dicing, shading and sampling Switch between jobs is based on the state of input queues Uber kernel Dicing Shading Sampling
    99. Dicing queue Regular kernels Do dicing Dicing Requires feedback from kernel Shading queue GPU to CPU and some scheduling logic on the host side Do shading Shading kernel Sampling queue Do sampling Sampling kernel
    100. Dicing queue Uber-kernel Do dicing When shading queue Shading queue Do shading is full, switch to shading When sampling queue If thereproceed with is full, is something Sampling queue left insampling dicing queue, Do sampling return to dicing Dicing queue Do dicing Shading queue Uber-kernel
    101. Dice/Shade/Sample kernel Allows to pipeline the computation process within the limited amount of memory Doesn’t require CPU readbacks and additional host- side scheduling
    102. A-buffer A-buffer is a set of Pixel is a set of Each sample can pixels samples contain up to N color/depth pairs Color = 1.0, 0.0, 1.0, 0.7 Depth = 0.9 Color = 1.0, 1.0, 1.0, 0.1 Depth = 0.5 Color = 1.0, 0.0, 1.0, 0.5 Depth = 0.7 Color = 0.0, 1.0, 0.0, 1.0 Depth = 0.1 empty … empty
    103. Synchronizing access to samples A-buffer is a shared resource Use atomics to secure the slot Count = 1 Thread 0 Thread 1 Thread 2 . . . Color; Depth . . . . . . empty empty empty empty … empty
    104. Synchronizing access to samples A-buffer is a shared resource Use atomics to secure the slot Count = 2 Thread 0 Thread 1 Thread 2 . . . Color; Depth . . . . . . Color; Depth Write . . Inc( count ) . . empty empty empty … empty
    105. Synchronizing access to samples A-buffer is a shared resource Use atomics to secure the slot Count = 2 Thread 0 Thread 1 Thread 2 . . . Color; Depth . . . . . . Color; Depth Write . . Inc( count ) . . empty . . . . empty Write Write empty … empty
    106. Synchronizing access to samples A-buffer is a shared resource Use atomics to secure the slot Count = 3 Thread 0 Thread 1 Thread 2 . . . Color; Depth . . . . . . Color; Depth Write . . Inc( count ) . . Color; Depth . . . . empty Write Write Inc( count ) empty … empty
    107. Synchronizing access to samples A-buffer is a shared resource Use atomics to secure the slot Count = 4 Thread 0 Thread 1 Thread 2 . . . Color; Depth . . . . . . Color; Depth Write . . Inc( count ) . . Color; Depth . . . . Color; Depth Write Write Inc( count ) empty Inc( count ) … empty
    108. Mapping A-buffer to screen Sort and blend all color/depth pairs per sample Sum all samples in pixel to get final pixel color 4 samples per pixel
    109. Mapping A-buffer to screen Sort and blend all color/depth pairs per sample Sum all samples in pixel to get final pixel color 9 samples per pixel
    110. Demo Vortigaunt model Resolution: 512x512 pixels 9 samples per pixel Shading rate: 0.25 ~4 microquads per pixel Rendering time: ~150ms Vortigaunt model © Valve Software
    111. Performance: persistent threads Subdivision kernel via persistent threads vs Regular threads with CPU readbacks Regular threads Persistent threads Perf improvement Bucket time 110ms 40ms 3x
    112. Performance: uber-kernel Dice/Shade/Sample kernel vs Separate kernels and CPU scheduling Separate kernels Uber-kernel Perf improvement Bucket time 500ms 110ms 5x
    113. Future work
    114. Future work Implement completely GPU-accelerated photorealistic renderer Use REYES pipeline to generate fine picture Use photon mapping to compute global illumination
    115. Idea Use REYES to rasterize a picture Compute eye-rays during rasterization During subdivision stage REYES pipeline would generate a set of triangles Use these triangles and eye-rays to compute GI
    116. Idea input Triangles Subdivision Photon mapping Subdivided Photon map triangles Eye-rays REYES Ray tracing dice/sample Combine final image output
    117. Conclusion
    118. Conclusion CUDA has proven to be a powerful instrument for computation-heavy tasks in computer graphics CUDA can also be used in tasks which generate non-uniform work and which are not easy to parallelize Scheduling mechanisms can be implemented in CUDA to allow better workload and memory balancing
    119. Keep in mind Persistent threads Uber-kernels Work stealing Scan (Prefix sum) Histogram pyramids Tiling / Bucketing
    120. Thanks!
    121. Ray-tracing references [ZTTS06] Gernot Ziegller et al. “GPU Point List Generation through Histogram Pyramids” [HSHH07] Daniel Reiter Horn et al. “Interactive k-D Tree GPU Raytracing” [PH04] Matt Pharr, Greg Humphreys “Physically Based Rendering” [AL09] Timo Aila Samuli Laine. “Understanding the Efficiency of Ray Traversal on GPUs” [MT97] Tomas Moller, Ben Trumbore. “Fast Minimum Storage Ray / Triangle Intersection”
    122. REYES references [Zhou09] Kun Zhou et al, “RenderAnts: Interactive REYES Rendering on GPUs” [Patney08] Anjul Patney, John D. Owens, “Real-Time Reyes-Style Adaptive Surface Subdivision” [Harris07] Mark Harris, “Parallel Prefix Sum (Scan) with CUDA” available at developer.nvidia.com
    123. GPU Technology Conference Sept 30 – Oct 2, 2009 – The Fairmont San Jose, California Learn about the latest breakthroughs developers, engineers and researchers are achieving on the GPU Learn about the seismic shifts happening in computing Preview disruptive technologies and emerging applications Get tools/techniques to impact mission critical projects now Network with experts and peers from across several industries Full Conference Pass for only $400! Special for (offer expires 8/21) SIGGRAPH Register and Learn More: www.nvidia.com/gtc Attendees! Use registration code FC300SIG90 NVIDIA Confidential
    124. Confirmed Sessions @ GPU Tech Conf Sept 30 – Oct 2, 2009 – The Fairmont San Jose, California KEYNOTES: Conference Sessions Opening Day 2 Day 3 Covered Topics:  3D  Algorithms & Numerical Techniques  Astronomy & Astrophysics  Computational Finance  Computational Fluid Dynamics  Computational Imaging  Computer Vision  Databases & Data Mining Jen-Hsun Huang Hanspeter Pfister Richard Kerris  Embedded & Mobile Co-Founder / CEO Professor CTO  Energy Exploration NVIDIA Harvard Lucasfilm  Film  GPU Clusters  High Performance Computing  Life Sciences  GENERAL SESSIONS: Machine Learning & Artificial Intelligence • Hot Trends in Visual Computing: Computer Vision,  Medical Imaging & Visualization Augmented Reality, Visual Analytics, Interactive Ray  Molecular Dynamics  Physical Simulation Tracing  Programming Languages & APIs • Breakthroughs in High Performance Computing:  Advanced Visualization Energy, Medical Science, Supercomputing, Research Learn more and register: www.nvidia.com/gtc NVIDIA Confidential

    + NVIDIANVIDIA, 3 months ago

    custom

    1714 views, 0 favs, 3 embeds more stats

    This session shows how NVIDIA CUDA can be used for more

    More info about this document

    © All Rights Reserved

    Go to text version

    • Total Views 1714
      • 1280 on SlideShare
      • 434 from embeds
    • Comments 0
    • Favorites 0
    • Downloads 26
    Most viewed embeds
    • 421 views on http://developer.nvidia.com
    • 9 views on http://www.developer.nvidia.com
    • 4 views on http://origin-developer.nvidia.com

    more

    All embeds
    • 421 views on http://developer.nvidia.com
    • 9 views on http://www.developer.nvidia.com
    • 4 views on http://origin-developer.nvidia.com

    less

    Flagged as inappropriate Flag as inappropriate
    Flag as inappropriate

    Select your reason for flagging this presentation as inappropriate. If needed, use the feedback form to let us know more details.

    Cancel
    File a copyright complaint
    Having problems? Go to our helpdesk?

    Categories