
Generation of Planar Radiographs from 3D Anatomical Models Using the GPU

  1. Generation of planar radiographs from 3D anatomical models using the GPU
      André dos Santos Cardoso
      Supervisor: Jorge M. G. Barbosa
      Faculty of Engineering, University of Porto
      11th February 2011
      André Cardoso (andre.cardoso@fe.up.pt), DRR Synthesis Algorithms, 1/27
  2. Contents
      • Introduction and Context
      • CUDA Platform
      • Input Data
      • Pre-Processing Steps
      • Developed Algorithms
      • Conclusion
  4. DRRs
      • Digitally Reconstructed Radiographs (DRRs)
      • Artificial radiographs generated from vertebra models
      Figure: L3 vertebra, frontal DRR
      Figure: L3 vertebra, lateral DRR
  5. DRRs – Why?
      • Shape recovery of the human spine
          ◦ Requires hundreds of DRRs per second
      • Scoliosis evaluation
          ◦ Alternative to MRI and CT scans
  6. Project’s Objective: Build Fast DRR Algorithms
      • DRR generation is a common bottleneck!
          ◦ Medical applications demand high throughput
      • Take advantage of new GPUs and APIs
          ◦ Common workstations could do the job!
  7. Existing Solution – GLSL
      • GLSL implementation: a working multi-pass solution
      • Based on depth peeling (Cass Everitt, Interactive Order-Independent Transparency)
      • Goal: improve its performance!
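  8. Algorithm Concepts
      Figure: a ray from the X-ray source crosses the object surface at P1, P2, P3 and P4 before reaching the image plane; callout: "Problem! Potential artifact generation!"
      • Each ray traverses the object
          ◦ Its energy is attenuated according to the length of material traversed:
            $\mathrm{PixelColor} = \exp\big(-(\lVert P_2 - P_1 \rVert + \lVert P_4 - P_3 \rVert) \times \mathrm{AttenuationFactor}\big)$, with AttenuationFactor > 0
      • Common edges may lead to artifact generation!
          ◦ A ray passing exactly through an edge shared by two triangles may register two hits or none, breaking the entry/exit pairing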
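A minimal C++ sketch of the attenuation formula on slide 8, assuming the points where a ray crosses the model surface are already known and ordered along the ray as (entry, exit, entry, exit, ...), and that one constant attenuation factor applies to the whole model. The names `Vec3`, `segmentLength` and `pixelIntensity` are hypothetical, not taken from the original implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

double segmentLength(const Vec3& a, const Vec3& b) {
    return std::sqrt((b.x - a.x) * (b.x - a.x) +
                     (b.y - a.y) * (b.y - a.y) +
                     (b.z - a.z) * (b.z - a.z));
}

// Intensity reaching one pixel. Each (entry, exit) pair bounds one segment
// of material; the segment lengths are summed and attenuated exponentially
// (Beer-Lambert style), so more material means a smaller value.
double pixelIntensity(const std::vector<Vec3>& crossings, double attenuationFactor) {
    assert(crossings.size() % 2 == 0 && "crossings must come in entry/exit pairs");
    double traversed = 0.0;
    for (std::size_t i = 0; i + 1 < crossings.size(); i += 2) {
        traversed += segmentLength(crossings[i], crossings[i + 1]);
    }
    return std::exp(-traversed * attenuationFactor);
}
```

For example, crossings at z = 1, 3, 4 and 5 on the z axis give 2 + 1 = 3 units of material, so with a factor of 0.5 the pixel receives exp(-1.5) of the source intensity; a ray that misses the model receives exp(0) = 1.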
  10. CUDA Platform
      • Compute Unified Device Architecture
          ◦ Parallel computing architecture
          ◦ Exposes GPU functions and memory
          ◦ SIMT (single instruction, multiple threads) execution model
          ◦ Allows hierarchical configuration of threads (grids of blocks of threads)
      • Cheap threads; dozens to hundreds of cores
          ◦ Thousands of concurrent threads!
      • GeForce GT 240
          ◦ 96 cores
          ◦ 12288 active threads
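To illustrate the hierarchical thread configuration, here is a host-side C++ emulation of the index arithmetic a CUDA kernel typically uses to give each thread of a 2D grid of 2D blocks one pixel (the one-thread-per-pixel layout of the image-order approach). `Dim2`, `globalPixel`, `gridFor` and `visitCounts` are hypothetical stand-ins for CUDA's built-in `blockIdx`, `blockDim` and `threadIdx`; the 16 × 16 block size is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 (only x and y are needed here).
struct Dim2 { int x, y; };

// The arithmetic a kernel performs with blockIdx/blockDim/threadIdx: returns
// the (u, v) pixel owned by this thread, or nothing for the padding threads
// of edge blocks that fall outside the image.
std::optional<std::pair<int, int>> globalPixel(Dim2 blockIdx, Dim2 blockDim,
                                               Dim2 threadIdx, int width, int height) {
    const int u = blockIdx.x * blockDim.x + threadIdx.x;
    const int v = blockIdx.y * blockDim.y + threadIdx.y;
    if (u >= width || v >= height) return std::nullopt;
    return std::make_pair(u, v);
}

// Number of blocks needed to cover the image: ceil(width / blockDim.x), etc.
Dim2 gridFor(int width, int height, Dim2 blockDim) {
    assert(blockDim.x > 0 && blockDim.y > 0);
    return {(width + blockDim.x - 1) / blockDim.x,
            (height + blockDim.y - 1) / blockDim.y};
}

// Walks every (block, thread) pair on the CPU and counts visits per pixel;
// a correct mapping visits each pixel exactly once.
std::vector<int> visitCounts(int width, int height, Dim2 blockDim) {
    std::vector<int> counts(static_cast<std::size_t>(width) * height, 0);
    const Dim2 grid = gridFor(width, height, blockDim);
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < blockDim.y; ++ty)
                for (int tx = 0; tx < blockDim.x; ++tx)
                    if (auto p = globalPixel({bx, by}, blockDim, {tx, ty}, width, height))
                        ++counts[static_cast<std::size_t>(p->second) * width + p->first];
    return counts;
}
```

With 16 × 16 blocks, a 265 × 137 image needs a 17 × 9 grid; the surplus threads in the last row and column of blocks simply return without work, which is why the bounds check matters.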
  11. CUDA Platform – Threading and Memory
  13–16. Inputs for Our Algorithms
      • Geometry file: the vertebra models
      • Camera calibration matrix (pinhole model)
          $$C = \begin{pmatrix} \alpha_u & \lambda & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
            P = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad
            K = \begin{pmatrix} R & t \\ 0_3^T & 1 \end{pmatrix}$$
          $$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = C \, P \, K \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$
          ◦ where $\alpha_u, \alpha_v$ are the scale factors along the image axes, $\lambda$ the skew, $(u_0, v_0)$ the principal point, $f$ the focal length, $R$ and $t$ the camera rotation and translation, and $s$ a projective scale factor
      Figure: Pinhole model
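The pinhole model on slide 16 can be sketched in C++ as three successive matrix products followed by the division by $s$. This is an illustration under stated assumptions, not the project's code: the matrix aliases and the helpers `intrinsics`, `perspective` and `project` are hypothetical, and matrices are stored row-major.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Mat34 = std::array<std::array<double, 4>, 3>;
using Mat4 = std::array<std::array<double, 4>, 4>;

struct Pixel { double u, v; };

// C: scale factors, skew and principal point.
Mat3 intrinsics(double au, double av, double skew, double u0, double v0) {
    return {{{au, skew, u0}, {0, av, v0}, {0, 0, 1}}};
}

// P: perspective projection with focal length f.
Mat34 perspective(double f) {
    return {{{f, 0, 0, 0}, {0, f, 0, 0}, {0, 0, 1, 0}}};
}

// Computes s * [u v 1]^T = C * P * K * [X Y Z 1]^T and divides by s.
Pixel project(const Mat3& C, const Mat34& P, const Mat4& K, double X, double Y, double Z) {
    const std::array<double, 4> world = {X, Y, Z, 1.0};
    std::array<double, 4> cam{};                 // K: world -> camera coordinates
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) cam[r] += K[r][c] * world[c];
    std::array<double, 3> img{};                 // P: camera -> image plane (homogeneous)
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c) img[r] += P[r][c] * cam[c];
    std::array<double, 3> pix{};                 // C: image plane -> pixel coordinates
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) pix[r] += C[r][c] * img[c];
    const double s = pix[2];
    assert(s != 0.0 && "point lies in the camera's focal plane");
    return {pix[0] / s, pix[1] / s};
}
```

With no rotation, the camera 10 units from the origin along the optical axis, $\alpha_u = \alpha_v = 100$, $f = 1$ and principal point (50, 40), the world point (1, 2, 0) lands at pixel (60, 60).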
  18–23. Pre-Processing Steps
      1. 2D bounding box
      2. (Projection source)
      3. Ray direction (for each pixel)
          ◦ $R(t) = O + tD$, with origin $O$ at the projection source and direction $D$ through the pixel
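The three pre-processing steps can be sketched as follows, assuming the model vertices have already been projected to pixel coordinates with the pinhole model, and using the standard results that the camera centre is $O = -R^T t$ and that the ray through pixel $(u, v)$ has camera-space direction $M^{-1}(u, v, 1)^T$ with $M = C \cdot \mathrm{diag}(f, f, 1)$. All names (`boundingBox`, `projectionSource`, `rayDirection`, `BBox`) are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
struct Pixel { double u, v; };
struct BBox { int uMin, vMin, uMax, vMax; };   // inclusive pixel range
using Mat3 = std::array<std::array<double, 3>, 3>;

// Step 1: 2D bounding box of the projected vertices, rounded outwards to
// whole pixels; only pixels inside it need rays.
BBox boundingBox(const std::vector<Pixel>& projected) {
    assert(!projected.empty());
    double uLo = projected[0].u, uHi = uLo, vLo = projected[0].v, vHi = vLo;
    for (const Pixel& p : projected) {
        uLo = std::min(uLo, p.u); uHi = std::max(uHi, p.u);
        vLo = std::min(vLo, p.v); vHi = std::max(vHi, p.v);
    }
    return {static_cast<int>(std::floor(uLo)), static_cast<int>(std::floor(vLo)),
            static_cast<int>(std::ceil(uHi)), static_cast<int>(std::ceil(vHi))};
}

// Step 2: projection source O = -R^T t (camera centre in world coordinates).
Vec3 projectionSource(const Mat3& R, const Vec3& t) {
    return {-(R[0][0] * t.x + R[1][0] * t.y + R[2][0] * t.z),
            -(R[0][1] * t.x + R[1][1] * t.y + R[2][1] * t.z),
            -(R[0][2] * t.x + R[1][2] * t.y + R[2][2] * t.z)};
}

// Step 3: unit direction D of the ray R(t) = O + tD through pixel (u, v).
// M = C * diag(f, f, 1) is upper triangular, so it is inverted by back
// substitution; R^T then rotates the direction into world space.
Vec3 rayDirection(double au, double av, double skew, double u0, double v0,
                  double f, const Mat3& R, double u, double v) {
    const double yc = (v - v0) / (av * f);
    const double xc = (u - u0 - skew * f * yc) / (au * f);
    const Vec3 dc = {xc, yc, 1.0};
    const Vec3 d = {R[0][0] * dc.x + R[1][0] * dc.y + R[2][0] * dc.z,
                    R[0][1] * dc.x + R[1][1] * dc.y + R[2][1] * dc.z,
                    R[0][2] * dc.x + R[1][2] * dc.y + R[2][2] * dc.z};
    const double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return {d.x / len, d.y / len, d.z / len};
}
```

Since $O$ does not depend on the pixel it is computed once, while $D$ is computed per pixel of the bounding box, which maps naturally onto one GPU thread per pixel.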
  25. Image Order Approach: 1 Thread for Each Pixel
      • Thread ⇔ ray
      • Each thread loops over ALL triangles
      • Ray casting!
          ◦ Tests intersections between the ray and each triangle
          ◦ Accumulates distances to the source along the ray path
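The body of one image-order thread can be sketched on the CPU as below. The slides do not say which ray/triangle test was used; this sketch swaps in the well-known Möller–Trumbore algorithm, then sorts the hit distances and sums consecutive (entry, exit) pairs. `Triangle`, `intersect` and `traversedLength` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };
struct Triangle { Vec3 a, b, c; };

Vec3 sub(Vec3 p, Vec3 q) { return {p.x - q.x, p.y - q.y, p.z - q.z}; }
Vec3 cross(Vec3 p, Vec3 q) {
    return {p.y * q.z - p.z * q.y, p.z * q.x - p.x * q.z, p.x * q.y - p.y * q.x};
}
double dot(Vec3 p, Vec3 q) { return p.x * q.x + p.y * q.y + p.z * q.z; }

// Möller–Trumbore ray/triangle test. On a hit in front of the source, stores
// the ray parameter t of R(t) = O + tD and returns true.
bool intersect(Vec3 O, Vec3 D, const Triangle& tri, double& t) {
    const double eps = 1e-12;
    const Vec3 e1 = sub(tri.b, tri.a), e2 = sub(tri.c, tri.a);
    const Vec3 p = cross(D, e2);
    const double det = dot(e1, p);
    if (std::fabs(det) < eps) return false;       // ray parallel to the triangle
    const double inv = 1.0 / det;
    const Vec3 s = sub(O, tri.a);
    const double u = dot(s, p) * inv;             // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return false;
    const Vec3 q = cross(s, e1);
    const double v = dot(D, q) * inv;             // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return false;
    t = dot(e2, q) * inv;
    return t > eps;
}

// One thread = one ray: test ALL triangles, sort the hits along the ray and
// sum the lengths of the (entry, exit) segments.
double traversedLength(Vec3 O, Vec3 D, const std::vector<Triangle>& mesh) {
    std::vector<double> hits;
    for (const Triangle& tri : mesh) {
        double t;
        if (intersect(O, D, tri, t)) hits.push_back(t);
    }
    std::sort(hits.begin(), hits.end());
    double length = 0.0;
    for (std::size_t i = 0; i + 1 < hits.size(); i += 2) length += hits[i + 1] - hits[i];
    return length * std::sqrt(dot(D, D));         // t is in units of |D|
}
```

Every thread runs the full loop over the mesh even when its ray misses the model, which is exactly the cost flagged on the next slide.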
  26–28. Image Order Approach – Problems
      1. Many threads looping over many triangles
          L3 vertebra model
          • 776 vertices, 1552 triangles
          • PA perspective: 266 × 138 pixels = 36708 threads
          • Hence 36708 × 1552 = 56,970,816 (about 57 million) ray–triangle tests per DRR
      2. Useless intersection tests: heavy operations!
      3. Artifacts: hard to take care of!
  29. Image Order Approach – Results
      • L3 vertebra model
      • PA camera: 265 × 137 pixels
      • GPU time only
      • Incomplete implementation
      Result: SLOW!
  30. Object Order Approach: 1 Thread for Each Triangle
      • Threads spawned for each triangle
          ◦ Reverses the approach of the former (image order) algorithm
      • Ray casting!
          ◦ Each thread loops over every pixel covered by its triangle's bounding box
          ◦ Tests intersections between the pixel's ray and the triangle
          ◦ Accumulates distances to the source along the ray path
      • Concurrency problems!
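A sequential sketch of one object-order thread: it scans the pixels of its triangle's 2D bounding box and records a fragment wherever the triangle covers the pixel. As a simplification (our assumption, not the slide's method), the per-pixel ray/triangle test is replaced by a 2D edge-function test on the projected triangle, which for a triangle in front of the camera selects the same pixels as casting a ray through each pixel centre. `edge`, `rasterizeTriangle` and `fragmentCount` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point2 { double u, v; };

// Signed edge function: positive when p lies to the left of edge a->b.
double edge(Point2 a, Point2 b, Point2 p) {
    return (b.u - a.u) * (p.v - a.v) - (b.v - a.v) * (p.u - a.u);
}

// Body of one object-order "thread". `fragmentCount` is shared by all
// triangles, which is where the concurrency problem appears on the GPU;
// this sequential sketch sidesteps it.
void rasterizeTriangle(Point2 a, Point2 b, Point2 c, int width, int height,
                       std::vector<int>& fragmentCount) {
    assert(fragmentCount.size() == static_cast<std::size_t>(width) * height);
    if (edge(a, b, c) < 0) std::swap(b, c);              // enforce a consistent winding
    const int uMin = std::max(0, static_cast<int>(std::floor(std::min({a.u, b.u, c.u}))));
    const int vMin = std::max(0, static_cast<int>(std::floor(std::min({a.v, b.v, c.v}))));
    const int uMax = std::min(width - 1, static_cast<int>(std::ceil(std::max({a.u, b.u, c.u}))));
    const int vMax = std::min(height - 1, static_cast<int>(std::ceil(std::max({a.v, b.v, c.v}))));
    for (int v = vMin; v <= vMax; ++v)
        for (int u = uMin; u <= uMax; ++u) {
            const Point2 centre = {u + 0.5, v + 0.5};
            // Inclusive test: a centre lying exactly on an edge counts as covered.
            if (edge(a, b, centre) >= 0 && edge(b, c, centre) >= 0 && edge(c, a, centre) >= 0)
                ++fragmentCount[static_cast<std::size_t>(v) * width + u];
        }
}
```

The inclusive `>= 0` test means a pixel centre lying exactly on an edge shared by two triangles is counted by both, one concrete form of the "common edges" artifact; a strict filling convention (such as a top-left rule) assigns such pixels to exactly one triangle.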
  31–33. Object Order Approach – Problems
      1. Concurrency problems on pixel data
          ◦ Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
      2. Still many intersection tests
      3. Artifacts still hard to avoid or correct
      Figure: concurrent threads each reserving a slot in a pixel buffer with index = atomicInc(&sharedCounter, INT_MAX)
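The FreePipe-style fix for the concurrency problem cited on slide 31 is that each fragment reserves its own slot with an atomic increment before writing, so concurrent threads never write the same memory location. Below is a CPU emulation with `std::atomic` and `std::thread`; the fixed number of slots per pixel and the class and method names are our assumptions. For counts below `INT_MAX`, CUDA's `atomicInc(&counter, INT_MAX)` behaves like the `fetch_add(1)` used here.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Per-pixel fragment buffer: slotsPerPixel slots per pixel plus one atomic
// counter per pixel (hypothetical layout).
class PixelBuffer {
public:
    PixelBuffer(std::size_t pixels, std::size_t slotsPerPixel)
        : slots_(slotsPerPixel), counters_(pixels), depths_(pixels * slotsPerPixel, 0) {
        for (auto& c : counters_) c.store(0);
    }

    // Safe to call from many threads: each call reserves a unique slot, so no
    // two threads write the same element. Returns false when the pixel is full.
    bool append(std::size_t pixel, int depth) {
        const unsigned index = counters_[pixel].fetch_add(1);
        if (index >= slots_) return false;        // fragment dropped
        depths_[pixel * slots_ + index] = depth;
        return true;
    }

    // Read only after all writers have finished (on the GPU: in a later kernel).
    std::size_t count(std::size_t pixel) const {
        const unsigned n = counters_[pixel].load();
        return n < slots_ ? n : slots_;
    }
    int depth(std::size_t pixel, std::size_t i) const { return depths_[pixel * slots_ + i]; }

private:
    std::size_t slots_;
    std::vector<std::atomic<unsigned>> counters_;
    std::vector<int> depths_;
};
```

Four threads appending 16 depths each to the same pixel end up with all 64 values stored, in an unpredictable order; any ordering needed afterwards must be done separately.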
  34. Object Order Approach – Results
      • L3 vertebra model
      • PA camera: 265 × 137 pixels
      • GPU time only
      • Incomplete implementation
      Result: SLOW!
  35. Multi-depth Approach – Principle: Assume a Simplification
      • Discard the Euclidean distance between intersections!
      • Consider only the distance between fragments along the depth axis!
      Figure: rays from the source cross the object at P1/P2 and P'1/P'2; d1 and d2 are the corresponding distances along the depth axis
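A short derivation of what this simplification gives up, assuming Z is camera-space depth measured along the optical axis and $\theta$ is the angle between a pixel's ray and that axis (both assumptions, not stated on the slide). Two points on the same ray that are $\ell$ apart differ in depth by $\ell\cos\theta$, so for entry/exit pairs $(P_1, P_2)$ and $(P_3, P_4)$:

```latex
\begin{aligned}
z_2 - z_1 &= \lVert P_2 - P_1 \rVert \cos\theta \\
d_{\text{depth}} &= (z_2 - z_1) + (z_4 - z_3)
  = \cos\theta \,\bigl(\lVert P_2 - P_1 \rVert + \lVert P_4 - P_3 \rVert\bigr) \\
1 - \cos\theta &\approx \tfrac{\theta^2}{2}, \qquad
  \theta = 5^\circ \;\Rightarrow\; 1 - \cos\theta \approx 0.0038
\end{aligned}
```

Because $\theta$ is fixed for a given pixel, the error is a per-pixel scale factor: small near the image centre and, in principle, removable by dividing the depth-only thickness by $\cos\theta$.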
  36. Multi-depth Approach – Pipeline
      • Rasterization done with a scanline + Bresenham algorithm
          ◦ A filling convention avoids artifacts on shared edges :)
      • Depth interpolated into an integer interval
          ◦ $\mathrm{Depth} = \dfrac{Z - Z_{\min}}{Z_{\max} - Z_{\min}} \times \mathrm{INT\_MAX}$
          ◦ Integer depths can be ordered with CUDA's atomicMin, which operates on integer types
      • Saving depths in a per-pixel array raises concurrency problems (again)!
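The integer depth mapping on slide 36 as a small C++ function. One deviation is deliberate and is our assumption: the sketch scales by `INT_MAX - 1` instead of `INT_MAX`, so that `INT_MAX` stays free as the "empty slot" marker used by the depth-array insertion; otherwise a fragment exactly at $Z_{\max}$ would be indistinguishable from an empty slot. `quantizeDepth` is a hypothetical name.

```cpp
#include <cassert>
#include <climits>

// Maps Z in [zMin, zMax] linearly onto the integers [0, INT_MAX - 1].
int quantizeDepth(double z, double zMin, double zMax) {
    assert(zMax > zMin);
    double ratio = (z - zMin) / (zMax - zMin);
    if (ratio < 0.0) ratio = 0.0;              // clamp round-off outside the range
    if (ratio > 1.0) ratio = 1.0;
    return static_cast<int>(ratio * (INT_MAX - 1.0));
}
```

Because the mapping is linear and monotonic, differences between quantized depths are proportional to differences in Z, which is what the depth-only thickness of slide 35 requires.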
  37. Multi-depth Approach – Depth Array Ordering
      atomicMin inserts each depth in the right place:
          initializeDepthArrays(MAX_INTEGER)
          Znew ← interpolateDepth()
          for i = 0 to DEPTH_ARRAY_SIZE − 1 do
              Zold ← atomicMin(&(getPixelDepthArray(u, v, i)), Znew)
              if Zold == MAX_INTEGER then
                  break
              end if
              Znew ← max(Znew, Zold)    // integer max: depths are integers
          end for
      • Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
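A CPU emulation of the atomicMin-based sorted insertion on slide 37, for a single pixel. `std::atomic<int>` has no min operation before C++26, so `atomicMinEmu` (a hypothetical helper) builds one from a compare-exchange loop with the same return value as CUDA's `atomicMin`: the previous slot content. The slot count and the other names are illustrative assumptions.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <climits>
#include <cstddef>
#include <thread>
#include <vector>

constexpr int kEmpty = INT_MAX;          // marks an unused slot
constexpr std::size_t kSlots = 32;       // DEPTH_ARRAY_SIZE (illustrative value)
using DepthArray = std::array<std::atomic<int>, kSlots>;

void initializeDepthArray(DepthArray& slots) {
    for (auto& s : slots) s.store(kEmpty);
}

// Stand-in for CUDA's atomicMin: atomically stores min(slot, value) and
// returns the value the slot held before.
int atomicMinEmu(std::atomic<int>& slot, int value) {
    int old = slot.load();
    while (value < old && !slot.compare_exchange_weak(old, value)) {
        // on failure `old` is reloaded; retry while value is still smaller
    }
    return old;
}

// Inserts z keeping the slots in ascending order: each slot keeps the smaller
// value and the larger one is carried to the next slot. Returns false when
// every slot was occupied, in which case the largest carried depth is dropped.
bool insertDepth(DepthArray& slots, int z) {
    assert(z != kEmpty && "INT_MAX is reserved for empty slots");
    for (std::size_t i = 0; i < kSlots; ++i) {
        const int old = atomicMinEmu(slots[i], z);
        if (old == kEmpty) return true;  // z (or a carried value) filled a free slot
        z = std::max(z, old);            // integer max of the two
    }
    return false;
}
```

Each slot only ever decreases and the carried value is never smaller than the previous slot, so the array stays sorted even with many threads inserting into the same pixel at once. The price is up to one atomic operation per occupied slot per fragment.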
  38. Multi-depth Approach – Results
      • Best time:
          ◦ 202 × 132 pixels
          ◦ GPU + CPU time
          ◦ Performance with and without DRR transfer to host
      Result: BETTER!
  39–40. Multi-depth Optimization
      • The multi-depth approach keeps an ordered set of depths per pixel
          ◦ More depths ⇒ more atomicMin() calls
      • Depth ordering can be postponed:
          index ← atomicInc(&counter, INT_MAX)
          depthArray[index] ← Znew    // each thread gets a unique index, so writes never collide
      • depthArray then holds all the depth values, unordered
          ◦ Ordering can be done in a post-processing kernel!
      Figure: concurrent threads each reserving a slot in a pixel buffer with index = atomicInc(&sharedCounter, INT_MAX)
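A sketch of the post-processing step that the deferred ordering on slide 39 relies on, written as a per-pixel C++ function (on the GPU it would run as one thread per pixel). It sorts the unordered depths, pairs them as (entry, exit), and sums the gaps; a second helper converts the result back to scene units and applies the attenuation formula. The names and the assumption that depths were scaled linearly to `[0, scale]` over `[zMin, zMax]` are ours.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sorts one pixel's depths and sums (exit - entry) over consecutive pairs.
// An odd count usually means a missed or duplicated surface crossing; the
// unmatched last depth is ignored.
long long resolveThickness(std::vector<int> depths) {
    std::sort(depths.begin(), depths.end());
    long long thickness = 0;
    for (std::size_t i = 0; i + 1 < depths.size(); i += 2)
        thickness += static_cast<long long>(depths[i + 1]) - depths[i];
    return thickness;
}

// Converts a quantized thickness to scene units and attenuates exponentially.
double pixelIntensity(long long thickness, double zMin, double zMax, double scale,
                      double attenuationFactor) {
    assert(scale > 0.0 && zMax > zMin);
    const double length = static_cast<double>(thickness) / scale * (zMax - zMin);
    return std::exp(-length * attenuationFactor);
}
```

For example, depths appended in the order 40, 10, 30, 20 sort to 10, 20, 30, 40 and give a thickness of (20 - 10) + (40 - 30) = 20. Moving the sort here trades the many atomicMin calls of slide 37 for one atomicInc per fragment plus a single sort per pixel.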
  41. Multi-depth Optimization – Results
      • A-buffer scheme versus GLSL solution
      • 202 × 132 pixels
  42. Multi-depth Optimization – Results
      Better than the current (GLSL) solution
  44. Conclusion
      • CUDA implementations for DRR extraction
          ◦ Both pre-processing and main computation tasks
          ◦ Artifact-free
      • Single geometry pass
      • Shared memory model
          ◦ May be adapted to other technologies
      • Final implementation shows better performance than the GLSL solution
  45. Future Work: There’s a Big Chart to Fill Up...
  46. Future Work
      • Still some artifacts
      • Memory operation optimizations
      • Comparisons with other implementations and other geometry models
      • Build a DRR generation library
          ◦ Possibly an open-source project
      • Participation in IJUP’11
      • Paper preparation for VIPIMAGE 2011 (abstract deadline: 15th March)
  47. Thank You for Listening! Ask Away!
