• Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads


Total Views
On Slideshare
From Embeds
Number of Embeds



Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

    No notes for slide


  • 1. Phase Unwrapping and Affine Transformations using CUDA Perhaad Mistry, Sherman Braganza, David Kaeli, Miriam Leeser ECE Department Northeastern University Boston, MA
  • 2. Keck 3D Fusion Microscope
  • 3. Motivation Unwrapped Phase Image Wrapped Phase Image 3
  • 4. Motivation Phase Unwrapping used for research in In-Vitro Fertilization (IVF) in Optical Quadrature Microscopy Cell counts obtained after unwrapping determine viability of embryos Results required at close to real time to see changes in sample Improve cost-effectiveness of OQM and quality of research 4
  • 5. Outline Optical Quadrature Microscopy Affine Transformations Minimum LP Norm Phase Unwrapping MEX Interface Usage Performance Affine Phase OQM Frame Transforms Unwrapping Grabber Future Work Conclusion OQM Imaging System Layout 5
  • 6. OQM Imaging Pattern determined by phase Camera 0 PBS difference between laser beams Camera 1 Camera 3 Mirror Phase calculated using magnitudes captured from four cameras NPBS Camera 2 Mixed (M), signal only(S), REF SIG reference(R) and dark (D) images Phase difference measured by Sample angle(E) ranges from –π to π 1 n=3 n M n − S n − Rn E = ∑i * 4 n=0 Rn From Mirror Laser angle(E) yields phase that is required for unwrapping Optical Quadrature Microscopy Layout 6
  • 7. Affine Transformation Align images to fix sample’s position in different images Phase Affine Microscope’s Unwrapping Transforms Frame 2x2 and 2x1 matrices Grabber obtained from microscope Image pixels divided among Algorithm for Affine Transform [1] blocks of threads (x,y) = f(BlockId, ThreadID) Each thread calculates one Load Data = Image[x][y] pixel Includes - Rotation (θ), Scaling (s), Shearing (sh) ⎛ x' ⎞ ⎛ s(x)*cos(θ) sh(y)*sin(θ) ⎞ ⎛ x ⎞ ⎛ a ⎞ Implemented using CUDA ⎜ ⎟=⎜ ⎟⎜ ⎟+⎜ ⎟ y' ⎠ ⎝ -sh(x)*sin(θ) s(y)*cos(θ) ⎠ ⎝ y ⎠ ⎝ b ⎠ ⎝ since result needed for Phase Store Image[x'][y'] = Data Unwrapping (Uncoalesced W rites) [1] A. K. Jain, Fundamentals of digital image processing, Prentice Hall 7
  • 8. Noise Removal & Phase Information Separate “noise” only image available (Dark Image) Dark voltage image subtracted from the mixed, signal and reference images Subtraction of signal image (Sn) and reference image (Rn) due to fixed pattern levels of individual cameras Mn = Mn − Dn and so on for Sn and Rn 1 n=3 n Mn − Sn − Rn E = ∑i * n: camera number 4 n =0 Rn phase = angle(E ) Square root of Rn to balance intensities for detection 8
  • 9. Example Images for Affine Transform Mixed Image Signal only Image After Affine Transform and Noise Removal 9 Dark Image Reference Image
  • 10. Phase Unwrapping Frame Phase Affine Grabber Unwrapping Transforms Optical phase difference from interferometric image is shown Parameter of interest is Optical path difference which can be more than π Unwrapping along a line: Sum till a gradient of π then, add 2π In 2D if noise is present Path dependencies while summing Wrapped Image Causes accumulation errors (note range of values) 10
  • 11. MATLAB’s Unwrap v/s Minimum LP Norm Unwrap Using Minimum Lp Unwrap Using MATLAB’s unwrap() function Norm Algorithm [1] 11 [1] Dennis C. Ghiglia, Mark D. Pritt: Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software
  • 12. P Minimum L Norm Algorithm Minimize difference = (φi +1, j − φi , j )Ui , j + (φi +1, j − φi , j )Vi , j ci , j between gradients of −(φi , j − φi −1, j )Ui −1, j − (φi , j − φi , j −1 )Vi , j wrapped and unwrapped = ∆ ix, j U (i , j ) − ∆ ix−1, j U (i − 1, j ) + ci , j data [1] ∆ iy, jV (i , j ) − ∆ iy−1, jV (i − 1 j ) , Nonlinear PDE since U and where V are data dependant φx ,y is the unwrapped phase at (x,y) weights ∆ x ,y and ∆ y ,y denote the wrapped phase in the PDE solved iteratively x x x and y direction respectively using the Preconditioned Conjugate Gradient Method Can be expressed as Qφ = c [1] Dennis C. Ghiglia, Mark D. Pritt. Two- Dimensional Phase Unwrapping: Theory, 12 Algorithms, and Software, Wiley New York
  • 13. Preconditioned Conjugate Gradient Solve un-weighted least ε 2 = M −2 N−2 (φ − φ − ∆x )2 ∑ ∑ i +1, j i , j i , j square phase unwrapping i =0 j =0 problem M − 2 N −2 + ∑ ∑ (φi +1, j − φi , j − ∆iy, j )2 Minimize ε2 i =0 j =0 where φi , j is the wrapped phase at (i,j) and Preconditioning using a DCT needed ∆ix, j is the unwrapped phase at (i,j) After discretizing the equation Conjugate gradient and taking the 2D Fourier Transform method called after DCT Ρ m,n preconditioning Φ= 2cos(π m ) + 2cos(π n ) − 4 m,n M N where Φ and Ρ are the Fourier Transformed versions of φ and ∆ respectively 13
  • 14. Better DCT Implementation Implementing N point DCT using DFT uses 2N point DFT Another method seen in [1] Rearrange x(n) into v(n) such that ⎡ N − 1⎤ v ( n ) = x (2 n ) 0≤n≤⎢ ⎥ ⎣2⎦ ⎡ N + 1⎤ = x (2 N − 2 n − 1) ⎢ 2 ⎥ ≤ n ≤ N −1 ⎣ ⎦ W4kN * Real(v(n))) DCT(x(n)) = 2*DFT( For 2D DCT: 4x less work since N*N instead of 2N*2N PCG is ~ 90% of LP Norm Execution time Shuffle kernel before CUFFT call [1] Makhoul J. A fast cosine transform in one and two dimensions. 14
  • 15. Minimum L Norm Algorithm P for k ← 0 to LP Norm Count Calculate Data Derived Weights Preconditioning for j ← 0 to PCG Iteration Count before CG DCT To DFT Steps (Shuffle Kernel) Execute CUFFT Call Point-Wise Complex Multiplication to Get DCT Result PCG Scaling Step Algorithm Execute CUFFT Call to do iDCT Conjugate Gradient (Point Wise Matrix Operations) end for if Convergence exit end for 15
  • 16. Example Images – Phase Unwrapping Wrapped Phase Data Unwrapped Phase Data Images of Glass Bead and water 16
  • 17. Example Images – Mouse Embryo Wrapped Phase Data Unwrapped Phase Data 17
  • 18. MATLAB External Interface (MEX) MEX produces linked Frame Grabber objects from C code Image Acquisition Tool Box MATLAB present in our system due to the frame MATLAB Legacy grabbers MATLAB Legacy Interfaces Interface Gateway function in C Yes makes MATLAB data (mxArray) visible to our Affine Transform Phase Wrapper(C) Using Unwrapping Save linked C code MEX Wrapper (C) Affine ? Using MEX No Affine Transform Phase & Noise Removal Unwrapping GPU GPU code Code(CUDA) (CUDA) 18 OQM Processing Control Flow
  • 19. Performance (Phase Unwrapping) Baseline GPU MEX and GPU Mex & GPU GPU Time Time Speedup Time(sec) Speedup Preconditioning 11.17 1.2 9.3X 1.2 9.3X Conjugate Grad. 2.89 0.55 5.25X 0.55 5.25X IO Activity 0.7 0.7 1X 0.02 NA Miscellaneous 0.79 0.51 1.53X 0.41 1.92X 7.20X Total for Unwrap 15.55 2.97 5.24X 2.16 Baseline : Serial implementation using FFTW for the Fourier Transform 19
  • 20. Performance Results
  • 21. Future Work Study implementation of Conjugate Gradient Study scatter operations in CUDA Present literature shows improvements only for larger data sizes (may be better on G200) Multi-Gpu version for imaging scenarios where images may be grabbed at faster rates Presently only one image at a time acquired. Phase Unwrapping also required for applications like Synthetic Aperture Radar (SAR) 21
  • 22. Conclusions Implemented the computationally intensive Minimum Lp Norm Phase Unwrapping and Affine Transforms on the GPU Not a batch-oriented process Latency was critical, single image used at a time Reducing latency throughout the system improves quality of research and time to discovery in Optical Science Laboratory 22
  • 23. Thank You Questions? Perhaad Mistry (pmistry@ece.neu.edu)