Implementing 3D SPHARM Surfaces Registration on Cell B.E. Processor
 

    Presentation Transcript

    • Implementing 3D SPHARM Surfaces Registration on Cell Processor • Huian Li (huili@indiana.edu), Mi Yan (miyan@us.ibm.com), Robert Henschel (rhensche@indiana.edu), Li Shen (shenli@iupui.edu) • July 29, 2009
    • Contents • SPHARM registration • Matlab implementation • Cell implementation • Performance Analysis • Conclusion
    • SPHARM Surfaces • Radial and stellar surfaces • Simply connected, arbitrarily shaped • Vision, graphics, imaging, bioinformatics
    • SPHARM Expansion • [Figure: spherical parameter pairs mapped to surface points (x, y, z)] • Area-preserving mapping
    • SHREC • (a) template, (b) object, (c) after ICP, (d) after registration of parameterization
    • Calculation of coefficients • After rotating the parameter net on the surface by Euler angles (α, β, γ), the new coefficients will be:
      $$c_l^m(\alpha\beta\gamma) = \sum_{n=-l}^{l} D^l_{mn}(\alpha\beta\gamma)\, c_l^n$$
      where
      $$D^l_{mn}(\alpha\beta\gamma) = e^{(-i\alpha m - i\gamma n)} \sum_{t=\max(0,\,n-m)}^{\min(l+n,\,l-m)} (-1)^t\, d^l_{mnt}(\beta)$$
      and
      $$d^l_{mnt}(\beta) = \frac{\sqrt{(l+n)!\,(l-n)!\,(l+m)!\,(l-m)!}}{(l+n-t)!\,(l-m-t)!\,(t+m-n)!\,t!} \left(\cos\frac{\beta}{2}\right)^{(2l+n-m-2t)} \left(\sin\frac{\beta}{2}\right)^{(2t+m-n)}$$
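A minimal Python sketch of the rotation on this slide may make the index ranges easier to follow. It assumes the standard form of the sum (upper limit min(l+n, l-m), which keeps every factorial argument non-negative) and a hypothetical data layout in which the coefficients of one degree l are stored in a dict keyed by n. The function names are illustrative and do not come from the talk.

```python
import cmath
import math


def wigner_d(l, m, n, beta):
    """Sum over t of (-1)^t * d^l_{mnt}(beta), following the slide's formula."""
    c, s = math.cos(beta / 2), math.sin(beta / 2)
    # Direct factorials are fine for small l; large l needs a log-domain form.
    norm = math.sqrt(math.factorial(l + n) * math.factorial(l - n)
                     * math.factorial(l + m) * math.factorial(l - m))
    total = 0.0
    for t in range(max(0, n - m), min(l + n, l - m) + 1):
        denom = (math.factorial(l + n - t) * math.factorial(l - m - t)
                 * math.factorial(t + m - n) * math.factorial(t))
        total += ((-1) ** t * norm / denom
                  * c ** (2 * l + n - m - 2 * t) * s ** (2 * t + m - n))
    return total


def rotate_degree(coeffs, l, alpha, beta, gamma):
    """coeffs maps n in [-l, l] to a complex c_l^n; returns the rotated c_l^m."""
    rotated = {}
    for m in range(-l, l + 1):
        acc = 0j
        for n in range(-l, l + 1):
            phase = cmath.exp(-1j * (alpha * m + gamma * n))
            acc += phase * wigner_d(l, m, n, beta) * coeffs[n]
        rotated[m] = acc
    return rotated
```

For l = 1 and m = n = 0 the sum reduces to cos²(β/2) − sin²(β/2) = cos β, which is a convenient sanity check.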
    • RMSD • RMSD (Root Mean Square Distance): distance between two SPHARM models
      $$\mathrm{RMSD} = \sqrt{\frac{1}{4\pi} \sum_{l=0}^{L_{max}} \sum_{m=-l}^{l} \left\| c_{1,l}^m - c_{2,l}^m \right\|^2}$$
      where $c_{1,l}^m$ and $c_{2,l}^m$ are coefficients of two SPHARM models
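The RMSD definition can be sketched in a few lines of Python. The layout is an assumption made for illustration: each model is a dict keyed by (l, m), and each value is a 3-tuple of complex numbers for the x, y and z components of the coefficient.

```python
import math


def spharm_rmsd(c1, c2, l_max):
    """RMSD between two coefficient sets; c1 and c2 map (l, m) to 3 complex values."""
    total = 0.0
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            # Squared Euclidean norm of the coefficient difference.
            total += sum(abs(a - b) ** 2 for a, b in zip(c1[(l, m)], c2[(l, m)]))
    return math.sqrt(total / (4 * math.pi))
```

Two identical models give 0, and a single coefficient component that differs by sqrt(4π) gives an RMSD of 1.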
    • Matlab implementation • A straightforward implementation in Matlab:
        for l = 0, Lmax
          for m = -l, l
            for n = -l, l
              for t = max(0, n-m), min(l+n, l-m)
                ... performing calculations ...
      • One rotation for Lmax = 50 took 823 seconds on a 2 GHz quad-core Intel Xeon E5335
    • Cell B.E.
    • Cell implementation • Domain decomposition:
        for l = 0, Lmax
          for m = -l, l
            for n = -l, l
              for t = max(0, n-m), min(l+n, l-m)
                ... calculations ...
      • Decomposition along l leads to workload imbalance among SPUs • Decomposition along m creates unnecessary data communication
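The imbalance from decomposing along l can be illustrated with a rough count. This sketch assumes each worker receives a contiguous block of l values and counts only the (m, n) pairs per degree, ignoring the inner t loop. The helper names are hypothetical.

```python
def pairs_per_degree(l):
    """Number of (m, n) pairs visited for degree l: (2l + 1)^2."""
    return (2 * l + 1) ** 2


def block_loads(l_max, workers):
    """Split l = 0..l_max into contiguous blocks and return each block's pair count."""
    degrees = list(range(l_max + 1))
    size = -(-len(degrees) // workers)  # ceiling division
    return [sum(pairs_per_degree(l) for l in degrees[i:i + size])
            for i in range(0, len(degrees), size)]


print(block_loads(7, 4))  # [10, 74, 202, 394]
```

With Lmax = 7 and four workers, the last block already carries about 39 times the pairs of the first.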
    • Cell implementation • Loop fusion:
        for l = 0, Lmax
          for m = -l, l
            for n = -l, l
              for t = max(0, n-m), min(l+n, l-m)
                ... calculations ...
      • Unique index for combined loop: f(l, m) = l^2 + m + l • Workload for each SPE: (Lmax + 1)^2 / (total # of SPEs)
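The fused index f(l, m) = l^2 + m + l runs from 0 to (Lmax + 1)^2 - 1, so it can be inverted with an integer square root and split into near-equal contiguous ranges. The sketch below shows one way to do that; `unfuse` and `spe_range` are hypothetical helpers, not names from the talk.

```python
import math


def fused_index(l, m):
    """Map (l, m) with -l <= m <= l to a unique index in [0, (l_max + 1)^2)."""
    return l * l + m + l


def unfuse(k):
    """Inverse of fused_index: degree l owns the indices l^2 .. (l + 1)^2 - 1."""
    l = math.isqrt(k)
    return l, k - l * l - l


def spe_range(spe, num_spes, l_max):
    """Half-open range of fused indices assigned to one SPE."""
    total = (l_max + 1) ** 2
    return spe * total // num_spes, (spe + 1) * total // num_spes
```

For Lmax = 50 and 16 SPEs there are 2601 fused iterations, so each SPE gets 162 or 163 of them.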
    • Cell implementation • Lookup table T for factorial, with T(k) = log(k!) • Transform exponentials & multiplications into multiplications & additions, respectively:
      $$d^l_{mnt}(\beta) = \frac{\sqrt{(l+n)!\,(l-n)!\,(l+m)!\,(l-m)!}}{(l+n-t)!\,(l-m-t)!\,(t+m-n)!\,t!} \left(\cos\frac{\beta}{2}\right)^{(2l+n-m-2t)} \left(\sin\frac{\beta}{2}\right)^{(2t+m-n)}$$
      $$= \exp\Big( \tfrac{1}{2}\big(T(l+n) + T(l-n) + T(l+m) + T(l-m)\big) - T(l+n-t) - T(l-m-t) - T(t+m-n) - T(t) + (2l+n-m-2t)\,\log\cos\tfrac{\beta}{2} + (2t+m-n)\,\log\sin\tfrac{\beta}{2} \Big)$$
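A short Python sketch of the lookup-table form follows. It assumes T(k) = log(k!), which is what the identity on this slide implies, and it assumes 0 < β < π so that both logarithms are defined. The function names are illustrative.

```python
import math


def log_factorial_table(k_max):
    """T[k] = log(k!), built by accumulating log(k)."""
    table = [0.0] * (k_max + 1)
    for k in range(1, k_max + 1):
        table[k] = table[k - 1] + math.log(k)
    return table


def d_mnt_log(T, l, m, n, t, beta):
    """d^l_{mnt}(beta) evaluated in the log domain; assumes 0 < beta < pi."""
    log_term = (0.5 * (T[l + n] + T[l - n] + T[l + m] + T[l - m])
                - T[l + n - t] - T[l - m - t] - T[t + m - n] - T[t]
                + (2 * l + n - m - 2 * t) * math.log(math.cos(beta / 2))
                + (2 * t + m - n) * math.log(math.sin(beta / 2)))
    return math.exp(log_term)
```

For Lmax = 50 the table needs entries up to T[100]. The individual factorials never appear as numbers, so intermediate values stay in a range that floating point handles comfortably.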
    • Cell implementation • Other Cell-specific considerations: • Vectorization & data alignment • DMA data transfer between main memory & local store • SPU decrementer
    • Cell implementation • Single precision vs. double precision: all data in single precision
    • Cell implementation • Single precision vs. double precision: partial data in double precision
    • Cell implementation • Single precision vs. double precision: all critical data in double precision
    • Performance analysis • [Figure: Performance of one rotation on Cell BE; x-axis: Number of SPEs (1, 2, 4, 8, 16); y-axis: Time (seconds)]
    • Performance analysis • [Figure: Performance of finding the shortest distance at Level 3 on Cell BE; x-axis: Number of SPEs (4, 8, 12, 16); y-axis: Time (seconds); series: GNU gcc, IBM xlc]
    • Conclusion • Performance increases dramatically on Cell, owing to its unique architecture and to algorithm optimization. • Care must be taken with data placement because the local store is limited. • Care must also be taken with data transfer between local store and main memory.
    • The End Questions?