
Bp.On.Cuda


Belief Propagation Algorithm using CUDA

Published in: Education, Technology

  1. Disparity-Map Generation using GPUs
     Yan Xu
     Tutor: Hui Chen
     School of Information Science and Engineering
     Aug. 1, 2009
  2. Tsukuba Right Image
     Tsukuba Left Image
     Ground Truth
  3. Overview
     Disparity-Map in Stereo Vision
     Parallel Programming
     Programming on GPUs
     Belief Propagation
     BP on CUDA
     Experiment Results
     Conclusions and Future Work
  4. Disparity-Map Generation
     Pipeline: Calibration → Rectification → Stereo Matching → Disparity Map
  5. Disparity-Map Generation
     Local Algorithms
     Belief Propagation
     Graph Cut
     Dynamic Programming
  6. Tsukuba Left Image
     Tsukuba Right Image
     Ground Truth
     Disparity Image by BP (P. Felzenszwalb)
     Disparity Image by DP
     Disparity Image by GC (Kolmogorov)
  7. Parallel Programming
     Serial Programming vs. Parallel Programming
  8. Parallel Programming
     Traditionally, software has been written for serial computation:
     • To be run on a single computer having a single Central Processing Unit (CPU).
     • A problem is broken into a discrete series of instructions.
     • Instructions are executed one after another.
     • Only one instruction may execute at any moment in time.
  9. Parallel Programming
     In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
     • To be run using multiple CPUs.
     • A problem is broken into discrete parts that can be solved concurrently.
     • Each part is further broken down into a series of instructions.
     • Instructions from each part execute simultaneously on different CPUs.
  10. Serial vs. Parallel (diagram)
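To make the serial/parallel contrast concrete, here is a minimal sketch (not from the slides) of the same element-wise addition written once as a serial CPU loop and once as a CUDA kernel, where each GPU thread handles a single element:

```cuda
// Serial version: one CPU thread walks the array element by element.
void add_serial(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Parallel version: each GPU thread computes exactly one element.
__global__ void add_parallel(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard: the grid may be larger than n
        c[i] = a[i] + b[i];
}
```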
  11. Programming on GPUs
      CPU (Host)
      GPU (Device)
  12. Programming on GPUs
      (Diagram: CUDA execution and memory hierarchy. The host launches kernels on the device as a grid of thread blocks; each thread has its own registers and local memory, each block shares per-block shared memory, and all blocks access the device's global, constant, and texture memory.)
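A small sketch (illustrative, not from the slides) showing this hierarchy in use: each thread reads from global memory, the block stages values in shared memory, and `__syncthreads()` orders accesses within the block:

```cuda
// Per-block reduction: global memory -> shared memory -> one result per block.
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float tile[256];          // per-block shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;  // global -> shared
    __syncthreads();                             // wait for the whole block

    // Tree reduction inside the block, entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];       // one partial sum per block -> global
}
```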
  13. Programming on GPUs
      int main() {
          // Allocate memory on the GPU
          float *Md;
          cudaMalloc((void**)&Md, size);
          // Copy data from CPU to GPU
          cudaMemcpy(Md, M, size, cudaMemcpyHostToDevice);
          // Call the GPU kernel function
          kernel<<<dimGrid, dimBlock>>>(arguments);
          // Copy the result from GPU back to CPU
          cudaMemcpy(M, Md, size, cudaMemcpyDeviceToHost);
          // Free device memory
          cudaFree(Md);
      }
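The skeleton on this slide leaves the kernel, the data, and the launch configuration abstract. A self-contained version, assuming a trivial element-doubling kernel purely for illustration, might look like:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the slides): doubles each element in place.
__global__ void double_elements(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    const size_t size = n * sizeof(float);
    float M[1024];
    for (int i = 0; i < n; ++i) M[i] = 1.0f;

    float *Md;
    cudaMalloc((void**)&Md, size);                     // allocate on GPU
    cudaMemcpy(Md, M, size, cudaMemcpyHostToDevice);   // host -> device
    double_elements<<<(n + 255) / 256, 256>>>(Md, n);  // launch kernel
    cudaMemcpy(M, Md, size, cudaMemcpyDeviceToHost);   // device -> host
    cudaFree(Md);                                      // free GPU memory

    printf("M[0] = %f\n", M[0]);
    return 0;
}
```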
  14. Programming on GPUs
      • CUDA (Compute Unified Device Architecture) is a computing architecture developed by NVIDIA to use the graphics processing unit as a general-purpose parallel processor.
      NVIDIA GeForce 8800
  15. Belief Propagation Algorithm
      m labels, s sites
      Energy = data costs + discontinuity costs
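In the standard MRF formulation that this slide refers to (the notation below is the conventional one from Felzenszwalb and Huttenlocher, not taken from the slides), the energy over a labeling $f$ of the sites $\mathcal{P}$ with neighborhood system $\mathcal{N}$ is

```latex
E(f) = \sum_{p \in \mathcal{P}} D_p(f_p) \;+\; \sum_{(p,q) \in \mathcal{N}} V(f_p, f_q)
```

where $D_p(f_p)$ is the data cost of assigning label $f_p$ to site $p$ and $V(f_p, f_q)$ is the discontinuity cost between neighboring labels. Min-sum belief propagation minimizes this energy approximately by iterating the message update

```latex
m^{t}_{p \to q}(f_q) = \min_{f_p} \Bigl( D_p(f_p) + V(f_p, f_q)
    + \sum_{s \in \mathcal{N}(p) \setminus \{q\}} m^{t-1}_{s \to p}(f_p) \Bigr)
```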
  16. 16. Belief Propagation Algorithm<br />
  17. Belief Propagation on CUDA
      1. Allocate GPU global memory
      2. Load the original images (left and right) into GPU global memory
      3. (If real-world images) Pre-process the images with Sobel / Residual filters
      4. Calculate the data cost
      5. Build the data (Gaussian) pyramid
      6. Pass messages over the created pyramid
      7. Compute the disparity map from the messages and the data cost
      8. Copy the disparity map back to local (host) memory
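Step 4 above maps naturally to one GPU thread per pixel. A sketch of such a data-cost kernel (illustrative only, not the authors' code) using a truncated absolute-difference cost, with the label count and truncation threshold as assumed constants:

```cuda
// Illustrative data-cost kernel: one thread per pixel computes the cost of
// every disparity hypothesis and writes it to global memory.
__global__ void data_cost_kernel(const unsigned char *left,
                                 const unsigned char *right,
                                 float *cost,           // [height][width][LABELS]
                                 int width, int height) {
    const int LABELS = 16;      // number of disparity labels (assumed)
    const float TRUNC = 15.0f;  // data-cost truncation threshold (assumed)

    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    for (int d = 0; d < LABELS; ++d) {
        int xr = max(x - d, 0);  // matching pixel in the right image, shifted by d
        float diff = fabsf((float)left[y * width + x] -
                           (float)right[y * width + xr]);
        cost[(y * width + x) * LABELS + d] = fminf(diff, TRUNC);  // truncated cost
    }
}
```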
  18. Experiment Results (video)
  19. Experiment Results: Original
  20. Experiment Results: Sobel
  21. Experiment Results (video): Residual
  22. Conclusions and Future Work
      • Improve Belief Propagation (faster and better)
      • Implement other stereo algorithms in parallel (such as DP, GC…)
      • Apply the algorithm to stereo images captured by Truck
  23. Thank you for your attention!
      Questions?
