# Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor


1. Implementing 3D SPHARM Surfaces Registration on Cell Processor. Huian Li (huili@indiana.edu), Mi Yan (miyan@us.ibm.com), Robert Henschel (rhensche@indiana.edu), Li Shen (shenli@iupui.edu). July 29, 2009
2. Contents
    - SPHARM registration
    - Matlab implementation
    - Cell implementation
    - Performance analysis
    - Conclusion
3. SPHARM Surfaces
    - Radial and stellar surfaces
    - Simply connected, arbitrarily shaped
    - Vision, graphics, imaging, bioinformatics
4. SPHARM Expansion: area-preserving mapping $(\theta, \phi) \rightarrow (x, y, z)$
5. SHREC: (a) template, (b) object, (c) after ICP, (d) after registration of parameterization
6. Calculation of coefficients
    - After rotating the parameter net on the surface by Euler angles $(\alpha, \beta, \gamma)$, the new coefficients are

    $$c_l^m(\alpha\beta\gamma) = \sum_{n=-l}^{l} D^l_{mn}(\alpha\beta\gamma)\, c_l^n$$

    where

    $$D^l_{mn}(\alpha\beta\gamma) = e^{-i\alpha m - i\gamma n} \sum_{t=\max(0,\,n-m)}^{\min(l+n,\,l-m)} (-1)^t\, d^l_{mnt}(\beta)$$

    and

    $$d^l_{mnt}(\beta) = \frac{\sqrt{(l+n)!\,(l-n)!\,(l+m)!\,(l-m)!}}{(l+n-t)!\,(l-m-t)!\,(t+m-n)!\,t!} \left(\cos\frac{\beta}{2}\right)^{2l+n-m-2t} \left(\sin\frac{\beta}{2}\right)^{2t+m-n}$$
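The rotation formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Matlab or Cell code: the function names `wigner_d` and `rotate_coefficients` and the dictionary layout for the coefficients are assumptions made for the example, and the pairing of $\alpha$ with $m$ and $\gamma$ with $n$ in the phase factor follows the usual Wigner D-matrix convention.

```python
import cmath
import math


def wigner_d(l, m, n, beta):
    """Sum over t of (-1)^t * d^l_{mnt}(beta), using exact factorials."""
    c = math.cos(beta / 2.0)
    s = math.sin(beta / 2.0)
    pref = math.sqrt(
        math.factorial(l + n) * math.factorial(l - n)
        * math.factorial(l + m) * math.factorial(l - m)
    )
    total = 0.0
    # Bounds keep every factorial argument in the denominator non-negative.
    for t in range(max(0, n - m), min(l + n, l - m) + 1):
        denom = (math.factorial(l + n - t) * math.factorial(l - m - t)
                 * math.factorial(t + m - n) * math.factorial(t))
        total += ((-1) ** t * pref / denom
                  * c ** (2 * l + n - m - 2 * t) * s ** (2 * t + m - n))
    return total


def rotate_coefficients(coeffs, l, alpha, beta, gamma):
    """coeffs maps n in [-l, l] to c_l^n (hypothetical layout); returns c_l^m."""
    rotated = {}
    for m in range(-l, l + 1):
        acc = 0j
        for n in range(-l, l + 1):
            phase = cmath.exp(-1j * (alpha * m + gamma * n))
            acc += phase * wigner_d(l, m, n, beta) * coeffs[n]
        rotated[m] = acc
    return rotated
```

With all three angles set to zero the function returns the input coefficients unchanged, which is a quick sanity check on the summation bounds.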
7. RMSD
    - RMSD (Root Mean Square Distance): distance between two SPHARM models

    $$\mathrm{RMSD} = \sqrt{\frac{1}{4\pi} \sum_{l=0}^{L_{\max}} \sum_{m=-l}^{l} \left\| c_{1,l}^{m} - c_{2,l}^{m} \right\|^{2}}$$

    where $c_{1,l}^{m}$ and $c_{2,l}^{m}$ are the coefficients of the two SPHARM models.
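As a small illustration of the RMSD definition, the sketch below sums the squared coefficient differences and applies the $1/(4\pi)$ factor. The function name `spharm_rmsd` and the storage layout (a dictionary keyed by `(l, m)` whose values are tuples of complex x, y, z components) are assumptions for the example; the slides do not say how the coefficients are stored.

```python
import math


def spharm_rmsd(c1, c2):
    """RMSD between two coefficient sets keyed by (l, m).

    Each value is assumed to be a tuple of complex components (x, y, z).
    """
    total = 0.0
    for key, a in c1.items():
        b = c2[key]
        # Squared Euclidean norm of the coefficient difference.
        total += sum(abs(ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.sqrt(total / (4.0 * math.pi))
```

Two identical models give a distance of zero, so during registration the rotation that minimizes this value is the best alignment found.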
8. Matlab implementation
    - A straightforward implementation in Matlab:

          for l = 0, Lmax
            for m = -l, l
              for n = -l, l
                for t = max(0, n-m), min(l+n, l-m)
                  ... performing calculations ...

    - One rotation for Lmax = 50 took 823 seconds on a 2 GHz quad-core Intel Xeon E5335.
9. Cell B.E.
10. Cell implementation
    - Domain decomposition:

          for l = 0, Lmax
            for m = -l, l
              for n = -l, l
                for t = max(0, n-m), min(l+n, l-m)
                  ... calculations ...

    - Decomposition along l leads to workload imbalance among SPUs.
    - Decomposition along m creates unnecessary data communication.
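The imbalance from splitting along l can be made concrete by counting innermost iterations per value of l. The sketch below is an illustration only: the helper names are hypothetical, the t bounds are taken from the factorial arguments of the $d^l_{mnt}$ formula, and the contiguous-block split is an assumption, since the slides do not say how the l range was divided.

```python
def inner_iterations(l):
    """Count the (m, n, t) triples visited by the three inner loops for one l."""
    count = 0
    for m in range(-l, l + 1):
        for n in range(-l, l + 1):
            lo, hi = max(0, n - m), min(l + n, l - m)
            count += hi - lo + 1
    return count


def block_loads(l_max, num_workers):
    """Split l = 0..l_max into contiguous blocks and sum the work in each."""
    per_l = [inner_iterations(l) for l in range(l_max + 1)]
    size = -(-len(per_l) // num_workers)  # ceiling division
    return [sum(per_l[i:i + size]) for i in range(0, len(per_l), size)]


loads = block_loads(50, 8)
# Ratio of the busiest block to the average block; 1.0 would be perfect balance.
imbalance = max(loads) / (sum(loads) / len(loads))
```

Because the work per l grows quickly with l, the blocks holding the largest l values dominate the run time under this kind of split.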
11. Cell implementation
    - Loop fusion:

          for l = 0, Lmax
            for m = -l, l
              for n = -l, l
                for t = max(0, n-m), min(l+n, l-m)
                  ... calculations ...

    - Unique index for the combined loop: $f(l, m) = l^2 + m + l$
    - Workload for each SPE: $(L_{\max} + 1)^2 / (\text{total number of SPEs})$
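The fused index $f(l, m) = l^2 + m + l$ runs from 0 to $(L_{\max}+1)^2 - 1$ without gaps, so it can be inverted and cut into equal ranges. The following sketch shows one way to do that; `unfuse` and `spe_range` are hypothetical helpers, and the ceiling-division split is an assumption about how the $(L_{\max}+1)^2$ index values might be handed out.

```python
import math


def fused_index(l, m):
    """Unique index for the combined (l, m) loop, as given on the slide."""
    return l * l + m + l


def unfuse(k):
    """Recover (l, m) from a fused index k."""
    l = math.isqrt(k)       # indices for a given l span l^2 .. (l+1)^2 - 1
    m = k - l * l - l
    return l, m


def spe_range(spe, num_spes, l_max):
    """Contiguous slice of fused indices assigned to one SPE."""
    total = (l_max + 1) ** 2
    chunk = -(-total // num_spes)  # ceiling division
    start = spe * chunk
    return range(min(start, total), min(start + chunk, total))
```

Each SPE would then loop over its own `spe_range`, call `unfuse` on every index, and run the n and t loops for that (l, m) pair.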
12. Cell implementation
    - Lookup table T for factorials; the expression below implies $T(k) = \log k!$.
    - Transform exponentials and multiplications into multiplications and additions, respectively:

    $$
    \begin{aligned}
    d^{l}_{mnt}(\beta) &= \frac{\sqrt{(l+n)!\,(l-n)!\,(l+m)!\,(l-m)!}}{(l+n-t)!\,(l-m-t)!\,(t+m-n)!\,t!} \left(\cos\frac{\beta}{2}\right)^{2l+n-m-2t} \left(\sin\frac{\beta}{2}\right)^{2t+m-n} \\
    &= \exp\Big( \tfrac{1}{2}\big(T(l+n) + T(l-n) + T(l+m) + T(l-m)\big) \\
    &\qquad - T(l+n-t) - T(l-m-t) - T(t+m-n) - T(t) \\
    &\qquad + (2l+n-m-2t)\,\log\cos\tfrac{\beta}{2} + (2t+m-n)\,\log\sin\tfrac{\beta}{2} \Big)
    \end{aligned}
    $$
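A minimal sketch of the lookup-table idea follows, assuming the table stores $\log k!$ and that $0 < \beta < \pi$ so both logarithms are defined. The names `build_log_factorial_table` and `d_term_log` are made up for the example and are not taken from the authors' code.

```python
import math


def build_log_factorial_table(max_arg):
    """T[k] = log(k!) for k = 0..max_arg, built with one addition per entry."""
    table = [0.0] * (max_arg + 1)
    for k in range(1, max_arg + 1):
        table[k] = table[k - 1] + math.log(k)
    return table


def d_term_log(T, l, m, n, t, beta):
    """Evaluate d^l_{mnt}(beta) through a single exp of summed logarithms."""
    log_c = math.log(math.cos(beta / 2.0))
    log_s = math.log(math.sin(beta / 2.0))
    exponent = (0.5 * (T[l + n] + T[l - n] + T[l + m] + T[l - m])
                - T[l + n - t] - T[l - m - t] - T[t + m - n] - T[t]
                + (2 * l + n - m - 2 * t) * log_c
                + (2 * t + m - n) * log_s)
    return math.exp(exponent)
```

For a given maximum degree the largest factorial argument is twice that degree, so a table of that length is enough. Working in the log domain also avoids forming the very large intermediate products that the factorials produce at high degree.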
13. Cell implementation
    - Other techniques that are specific to Cell:
        - Vectorization and data alignment
        - DMA data transfer between main memory and local store
        - SPU decrementer
14. Cell implementation
    - Single precision vs. double precision: all data in single precision
15. Cell implementation
    - Single precision vs. double precision: partial data in double precision
16. Cell implementation
    - Single precision vs. double precision: all critical data in double precision
17. Performance analysis: performance of one rotation on Cell BE. [Plot: time (seconds, axis from 0 to 1.8) versus number of SPEs (1, 2, 4, 8, 16)]
18. Performance analysis: performance of finding the shortest distance at Level 3 on Cell BE. [Plot: time (seconds, axis from 0 to 7000) versus number of SPEs (4, 8, 12, 16), with one series for GNU gcc and one for IBM xlc]
19. Conclusion
    - Performance increases dramatically on Cell due to its unique architecture and algorithm optimization.
    - Care must be taken with data placement because of the limited local store.
    - Care must also be taken with data transfer between local store and main memory.
20. The End. Questions?