Generation of Planar Radiographs from 3D Anatomical Models Using the GPU

Transcript

  • 1. Generation of planar radiographs from 3D anatomical models using the GPU André dos Santos Cardoso Supervisor: Jorge M. G. Barbosa University of Porto Faculty of Engineering of University of Porto 11th February, 2011 1/André Cardoso andre.cardoso@fe.up.pt DRR Synthesis Algorithms 27 1/27
  • 2. Contents • Introduction and Context • CUDA Platform • Input Data • Pre-Processing Steps • Developed Algorithms • Conclusion
  • 4. DRRs • Digitally Reconstructed Radiographs (DRRs) • Artificial radiographs taken from vertebrae models [Figure: L3 vertebra, frontal DRR] [Figure: L3 vertebra, lateral DRR]
  • 5. DRRs – Why? • Shape recovery of the human spine ◦ 100s of DRRs per second • Scoliosis evaluation ◦ Alternative to MRIs and CTs
  • 6. Project’s Objective: Build Fast DRR Algorithms • Common bottleneck! ◦ Applications in the medical area demand high throughput • Take advantage of new GPUs and APIs ◦ Common workstations could do the job!
  • 7. Existing Solution – GLSL • GLSL implementation – multi-pass working solution • Depth-peeling based – Cass Everitt, Interactive Order-Independent Transparency • Let’s try to enhance its performance!
  • 8. Algorithm Concepts [Figure: a ray from the X-ray source crosses the object at points P1, P2, P3 and P4 before reaching the image plane; a shared edge is marked “Problem! Potential Artifact Generation!”] • Each ray traverses the object ◦ Energy is attenuated: PixelColor = exp((||P2 − P1|| + ||P4 − P3||) × AttenuationFactor) • Common edges may lead to artifact generation!
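The pixel formula on slide 8 can be checked with a small CPU-side sketch. This is only an illustration in Python, not the presenter's CUDA code: the function names are hypothetical, and the formula is applied literally, so the caller is assumed to supply the sign of the attenuation factor (a negative value makes longer paths darker).

```python
import math

def pixel_value(intersections, attenuation_factor):
    """Apply the slide's formula to ordered entry/exit points P1, P2, P3, P4, ...

    Each consecutive pair (entry, exit) contributes its Euclidean length.
    The result is exp(total_length * attenuation_factor), taken literally
    from the slide; the sign of the factor is an assumption left to the caller.
    """
    if len(intersections) % 2 != 0:
        raise ValueError("expected an even number of intersection points")
    total = 0.0
    for i in range(0, len(intersections), 2):
        total += math.dist(intersections[i], intersections[i + 1])
    return math.exp(total * attenuation_factor)

# Hypothetical ray along +z crossing the object twice: lengths 2 and 1.
points = [(0, 0, 1), (0, 0, 3), (0, 0, 5), (0, 0, 6)]
value = pixel_value(points, -0.5)
```

An odd number of intersection points is rejected here; in practice that situation is exactly what a ray grazing a shared edge can produce, which is the artifact risk the slide warns about.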
  • 10. CUDA Platform • Compute Unified Device Architecture ◦ Parallel computing architecture ◦ Exposes GPU functions and memory ◦ SIMT execution model ◦ Allows hierarchical configuration of threads • Cheap threads, dozens/hundreds of cores ◦ Thousands of concurrent threads! • GeForce GT 240 ◦ 96 cores ◦ 12288 active threads
  • 11. CUDA Platform – Threading and Memory [Figure: CUDA threading and memory hierarchy]
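To make the hierarchical thread configuration concrete, the following Python sketch emulates how a 2D grid of thread blocks could be sized to cover an image, and how each thread would derive its pixel from its block and thread indices. The 16 × 16 block size and all function names are assumptions chosen for illustration; the 266 × 138 image size is the PA view quoted elsewhere in the slides.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return (a + b - 1) // b

def launch_grid(width, height, block=(16, 16)):
    """Number of blocks per axis needed to cover a width x height image."""
    return (ceil_div(width, block[0]), ceil_div(height, block[1]))

def pixel_of_thread(block_idx, thread_idx, block=(16, 16)):
    """Pixel (u, v) of one thread, mirroring blockIdx * blockDim + threadIdx."""
    return (block_idx[0] * block[0] + thread_idx[0],
            block_idx[1] * block[1] + thread_idx[1])

def active_threads(width, height, block=(16, 16)):
    """Count threads whose pixel lies inside the image; the rest would exit early."""
    grid = launch_grid(width, height, block)
    count = 0
    for by in range(grid[1]):
        for bx in range(grid[0]):
            for ty in range(block[1]):
                for tx in range(block[0]):
                    u, v = pixel_of_thread((bx, by), (tx, ty), block)
                    if u < width and v < height:
                        count += 1
    return count

grid = launch_grid(266, 138)
launched = grid[0] * grid[1] * 16 * 16
useful = active_threads(266, 138)
```

Because the image size is not a multiple of the block size, more threads are launched than there are pixels, so a kernel written this way would need a bounds check before touching pixel data.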
  • 13. Inputs for Our Algorithms • Geometry file – the vertebrae models
  • 15. Inputs for Our Algorithms • Camera Calibration Matrix
  • 16. Inputs for Our Algorithms • Camera Calibration Matrix: $C = \begin{bmatrix} \alpha_u & \lambda & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}$, $P = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$, $K = \begin{bmatrix} R & t \\ 0_3^T & 1 \end{bmatrix}$, with $s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = C \cdot P \cdot K \cdot \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$ [Figure: Pinhole Model]
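The pinhole projection s·(u, v, 1)ᵀ = C·P·K·(X, Y, Z, 1)ᵀ can be traced numerically. The sketch below is plain Python with a hand-written matrix product; every numeric value (focal length, principal point, translation) is made up for illustration and does not come from the slides.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def project(C, P, K, point):
    """Project a 3D point to pixel coordinates (u, v) using C . P . K."""
    X = [[point[0]], [point[1]], [point[2]], [1.0]]
    su, sv, s = (row[0] for row in matmul(C, matmul(P, matmul(K, X))))
    return (su / s, sv / s)  # divide out the scale factor s

# Hypothetical calibration: unit pixel scale, no skew, principal point (100, 50).
C = [[1.0, 0.0, 100.0],
     [0.0, 1.0, 50.0],
     [0.0, 0.0, 1.0]]
# Hypothetical focal length f = 500.
P = [[500.0, 0.0, 0.0, 0.0],
     [0.0, 500.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
# Hypothetical pose: R = identity, t = (0, 0, 1000).
K = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1000.0],
     [0.0, 0.0, 0.0, 1.0]]

u, v = project(C, P, K, (10.0, -20.0, 0.0))
```

With these values the point lands at pixel (105, 40): the extrinsic matrix moves it to depth 1000, the projection scales x and y by f, and the intrinsic matrix shifts the result to the principal point.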
  • 18. Pre-Processing Steps 1. 2D Bounding Box
  • 20. Pre-Processing Steps 1. 2D Bounding Box 2. (Projection Source)
  • 22. Pre-Processing Steps 1. 2D Bounding Box 2. (Projection Source) 3. Ray Direction (for each pixel) ◦ R(t) = O + tD
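Two of the pre-processing steps lend themselves to a short sketch: the 2D bounding box of the projected model and the per-pixel ray R(t) = O + tD. The Python below is an illustrative assumption about how these could be computed on the CPU; the helper names are hypothetical, and the ray is built from the source through a known 3D point on the image plane rather than from the calibration matrices.

```python
import math

def bounding_box_2d(projected):
    """Smallest integer pixel rectangle (u_min, v_min, u_max, v_max)
    containing all projected vertices."""
    us = [u for u, _ in projected]
    vs = [v for _, v in projected]
    return (math.floor(min(us)), math.floor(min(vs)),
            math.ceil(max(us)), math.ceil(max(vs)))

def ray_direction(origin, target):
    """Unit direction D of the ray from the source O through a target point."""
    d = [t - o for o, t in zip(origin, target)]
    norm = math.sqrt(sum(c * c for c in d))
    return tuple(c / norm for c in d)

def point_on_ray(origin, direction, t):
    """Evaluate R(t) = O + t * D."""
    return tuple(o + t * d for o, d in zip(origin, direction))

box = bounding_box_2d([(10.2, 5.7), (30.9, 2.1), (20.0, 40.5)])
D = ray_direction((0.0, 0.0, 0.0), (3.0, 0.0, 4.0))
R5 = point_on_ray((0.0, 0.0, 0.0), D, 5.0)
```

Restricting work to the bounding box means only pixels that can possibly see the model get a ray, and normalizing D makes the parameter t a true distance from the source.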
  • 25. Image Order Approach: 1 Thread for Each Pixel • Thread ⇐⇒ Ray • Thread loops over ALL triangles • Ray casting! ◦ Tests intersections between ray and triangle ◦ Accumulates distances to source along ray path
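The work done by one thread in the image order approach can be sketched on the CPU as follows. The slides do not name the intersection test, so the Möller–Trumbore algorithm is used here as a stand-in, and pairing sorted hit distances into entry/exit intervals is likewise an assumption about how the distances are accumulated.

```python
EPS = 1e-9

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, tri):
    """Moller-Trumbore test: distance t along the ray, or None if no hit."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < EPS:          # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > EPS else None

def path_length_inside(origin, direction, triangles):
    """One 'thread': loop over ALL triangles, then sum entry/exit intervals."""
    hits = sorted(t for t in (ray_triangle(origin, direction, tri)
                              for tri in triangles) if t is not None)
    return sum(hits[i + 1] - hits[i] for i in range(0, len(hits) - 1, 2))

def flat_triangle(z):
    """Hypothetical test triangle in the plane at depth z, covering the z axis."""
    return ((-1.0, -1.0, z), (3.0, -1.0, z), (-1.0, 3.0, z))

length = path_length_inside((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                            [flat_triangle(2.0), flat_triangle(5.0)])
```

The sketch makes no attempt to handle a ray that passes exactly through an edge shared by two triangles. That case yields duplicate hits and breaks the entry/exit pairing, which is the artifact problem the slides mention.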
  • 26. Image Order Approach – Problems 1. Many threads looping over many triangles ◦ L3 vertebra model: 776 vertices, 1552 triangles ◦ PA perspective: 266 × 138 pixels = 36708 threads
  • 27. Image Order Approach – Problems 1. Many threads looping over many triangles 2. Useless intersection tests – heavy operations!
  • 28. Image Order Approach – Problems 1. Many threads looping over many triangles 2. Useless intersection tests – heavy operations! 3. Artifacts – hard to take care of!
  • 29. Image Order Approach – Results • L3 vertebra model • PA camera – 265 × 137 pixels • GPU time only! • Incomplete implementation • SLOW!
  • 30. Object Order Approach: 1 Thread for Each Triangle • Ray casting! • Threads spawned for each triangle ◦ Reverse the approach of the former algorithm! • Thread loops over each pixel covered by the triangle bounding box ◦ Tests intersections between ray and triangle ◦ Accumulates distances to source along ray path • Concurrency problems!
  • 31. Object Order Approach – Problems 1. Concurrency problems on pixel data. ◦ Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects [Figure: concurrent threads each execute int index = atomicInc(sharedCounter); to claim a slot in the pixel buffer]
  • 32. Object Order Approach – Problems 1. Concurrency problems on pixel data. 2. Still many intersection tests
  • 33. Object Order Approach – Problems 1. Concurrency problems on pixel data. 2. Still many intersection tests 3. Artifacts still hard to avoid or correct
  • 34. Object Order Approach – Results • L3 vertebra model • PA camera – 265 × 137 pixels • GPU time only! • Incomplete implementation • SLOW!
  • 35. Multi-depth Approach - Principle: Assume a Simplification • Discard the Euclidean distance between intersections! • Consider only the distance between fragments, along the depth axis! [Figure: rays from the source; one crosses the object at P1 and P2 with depth gap d1, another at P’1 and P’2 with depth gap d2]
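The effect of this simplification can be seen by comparing the two measures for a single oblique ray. The Python sketch below uses hypothetical fragment positions, and the function name is an assumption; it only illustrates that the depth-axis gap is shorter than the Euclidean gap whenever the ray is not parallel to the depth axis.

```python
import math

def thickness_along_depth(depths):
    """Sum of depth gaps between consecutive (entry, exit) fragment pairs."""
    ordered = sorted(depths)
    return sum(ordered[i + 1] - ordered[i]
               for i in range(0, len(ordered) - 1, 2))

# Hypothetical ray from a source at the origin with direction (0.6, 0, 0.8),
# entering the object at P1 and leaving at P2.
P1 = (3.0, 0.0, 4.0)
P2 = (6.0, 0.0, 8.0)

euclidean = math.dist(P1, P2)                      # exact path length
depth_only = thickness_along_depth([P2[2], P1[2]])  # simplified measure
```

Here the exact length is 5 while the depth-only measure is 4, so the simplification trades some accuracy for never having to compute 3D intersection points.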
  • 36. Multi-depth Approach - Pipeline • Rasterization done using Scanline + Bresenham algorithm ◦ Filling convention avoids artifacts :) ! • Interpolation in integer interval ◦ Depth = (Z − Zmin) / (Zmax − Zmin) × INT_MAX • Saving depth in the pixel array raises concurrency problems (again)!
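The integer depth mapping Depth = (Z − Zmin) / (Zmax − Zmin) × INT_MAX can be sketched as below. Storing depths as integers is what allows integer atomic operations to be used on them later. The clamping step and the inverse mapping are additions made for illustration, not something the slides describe, and INT_MAX is assumed to be the 32-bit signed maximum.

```python
INT_MAX = 2**31 - 1  # assumed 32-bit signed maximum

def quantize_depth(z, z_min, z_max):
    """Map z in [z_min, z_max] to an integer in [0, INT_MAX]."""
    if z_max <= z_min:
        raise ValueError("z_max must be greater than z_min")
    z = min(max(z, z_min), z_max)  # clamping is an assumption of this sketch
    return int((z - z_min) / (z_max - z_min) * INT_MAX)

def dequantize_depth(q, z_min, z_max):
    """Approximate inverse of quantize_depth."""
    return z_min + q / INT_MAX * (z_max - z_min)

near = quantize_depth(0.0, 0.0, 10.0)
far = quantize_depth(10.0, 0.0, 10.0)
mid = quantize_depth(5.0, 0.0, 10.0)
```

The mapping preserves ordering, so the smallest integer still corresponds to the fragment nearest the source.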
  • 37. Multi-depth Approach - Depth Array Ordering • atomicMin inserts each depth in the right place:
        initializeDepthArrays(MAX_INTEGER)
        Znew ← interpolateDepth()
        for i = 0 to DEPTH_ARRAY_SIZE − 1 do
            Zold ← atomicMin(&(getPixelDepthArray(u, v, i)), Znew)
            if Zold == MAX_INTEGER then
                break
            end if
            Znew ← max(Znew, Zold)
        end for
    • Fang Liu et al., FreePipe: a programmable parallel rendering architecture for efficient multi-fragment effects
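The insertion loop can be traced with a sequential Python emulation. The `atomic_min` helper below is a single-threaded stand-in that mimics the return-old-value behaviour of CUDA's atomicMin; it does not exercise real concurrency, and the array size of 4 is an arbitrary choice for the example.

```python
MAX_INTEGER = 2**31 - 1

def atomic_min(array, index, value):
    """Sequential stand-in for atomicMin: store min(old, value), return old."""
    old = array[index]
    array[index] = min(old, value)
    return old

def insert_depth(depth_array, z_new):
    """Insert z_new so that the per-pixel depth array stays sorted ascending."""
    for i in range(len(depth_array)):
        z_old = atomic_min(depth_array, i, z_new)
        if z_old == MAX_INTEGER:   # slot was empty: the value is placed, stop
            break
        z_new = max(z_new, z_old)  # carry the larger value to the next slot

depths = [MAX_INTEGER] * 4         # initializeDepthArrays(MAX_INTEGER)
for z in (70, 20, 50):
    insert_depth(depths, z)
```

After the three insertions the array holds 20, 50, 70 followed by one untouched sentinel. Each insertion may touch every occupied slot, which is the cost that grows with the number of depths per pixel.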
  • 38. Multi-depth Approach - Results • Best time: ◦ 202 × 132 pixels ◦ GPU + CPU time! ◦ Performance with and without DRR transfer to host! • BETTER!
  • 39. Multi-depth Optimization • Multi-depth allows for an ordered set of depths ◦ More depths =⇒ more atomicMin() calls • We can postpone depth ordering... 1: index ← atomicInc(&counter, INT_MAX) 2: depthArray[index] ← Znew // RAW-hazard free! • depthArray has all the depth values ◦ Ordering can be done in a post-processing kernel!
  • 40. Multi-depth Optimization [Figure: concurrent threads each execute int index = atomicInc(sharedCounter); to reserve a slot in the pixel buffer]
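The postponed-ordering scheme can likewise be emulated sequentially. In the Python sketch below, `atomic_inc` is a single-threaded stand-in for the counter increment (CUDA's atomicInc returns the old value and wraps to zero once the limit is reached, so a limit of INT_MAX makes it behave as a plain counter here). The bounds check and the sorting helper are assumptions added for illustration.

```python
def atomic_inc(counter):
    """Sequential stand-in: return the old counter value, then increment it."""
    old = counter[0]
    counter[0] = old + 1
    return old

def store_fragment(depth_array, counter, z_new):
    """Each fragment reserves a unique slot, so writes never collide."""
    index = atomic_inc(counter)
    if index < len(depth_array):   # bounds check is an assumption of this sketch
        depth_array[index] = z_new

def sort_pixel(depth_array, counter):
    """Post-processing step: order only the slots that were actually written."""
    n = min(counter[0], len(depth_array))
    depth_array[:n] = sorted(depth_array[:n])
    return n

depths = [0] * 4
counter = [0]
for z in (70, 20, 50):
    store_fragment(depths, counter, z)
unsorted_copy = list(depths)
written = sort_pixel(depths, counter)
```

Each fragment now costs one atomic operation regardless of how many depths the pixel already holds, and the ordering work moves to a separate pass over each pixel's slots.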
  • 41. Multi-depth Optimization – Results • A-buffer scheme versus GLSL solution • 202 × 132 pixels
  • 42. Multi-depth Optimization – Results • Better than the current solution
  • 44. Conclusion • CUDA implementations for DRR extraction ◦ Both pre-processing and main computation tasks ◦ Artifact-free • Single geometry pass • Shared memory model ◦ May be adapted to other technologies • Final implementation shows better performance than GLSL
  • 45. Future Work • There’s a big chart to fill up...
  • 46. Future Work • Still some artifacts • Memory operation optimizations • Comparisons with other implementations and other geometry models • Build a DRR generation library ◦ Possibly an open-source project • Participation in IJUP’11 • Paper preparation for VIPIMAGE 2011. Abstract deadline: 15th March.
  • 47. Thank You for Listening! Ask Away!
