Parallel Iterative Solution of the Hermite Collocation Equations on GPUs
Transcript

  • 1. Parallel Iterative Solution of the Hermite Collocation Equations on GPUs. Emmanuel N. Mathioudakis. Co-authors: Elena Papadopoulou, Yiannis Saridakis, Nikolaos Vilanakis. TECHNICAL UNIVERSITY OF CRETE, DEPARTMENT OF SCIENCES, APPLIED MATHEMATICS AND COMPUTERS LABORATORY, 73100 CHANIA - CRETE - GREECE
  • 2. Talk Overview: Hermite Collocation for elliptic BVPs and derivation of the Collocation linear system; development of a parallel algorithm for the Schur Complement method on Shared Memory Parallel Architectures; parallel implementation on multicore computers with GPUs
  • 3. BVP: $\mathcal{L}u(x,y) = f(x,y),\ (x,y) \in \Omega$; $\mathcal{B}u(x,y) = g(x,y),\ (x,y) \in \partial\Omega$. Hermite Collocation Method: the approximate solution is required to satisfy $\mathcal{L}u - f = 0$ at the interior collocation points and $\mathcal{B}u - g = 0$ at the boundary collocation points, indexed by $(i,j)$ (the symbols for the collocation points did not survive extraction)
  • 4. Hermite Collocation Method…
  • 5. Hermite Collocation Method…
  • 6. Red – Black Collocation Linear system
  • 7. Red – Black Collocation Linear system
  • 8. Red – Black Collocation Linear system
  • 9. Red – Black Collocation Linear system
  • 10. Red – Black Collocation Linear system
  • 11. Model Problem: $\nabla^2 u(x,y) + \lambda\,u(x,y) = f(x,y),\ (x,y) \in \Omega$; $\mathcal{B}u(x,y) = g(x,y),\ (x,y) \in \partial\Omega$ (the coefficient symbol, written here as $\lambda$, and its sign condition relative to 0 did not survive extraction)
  • 12. Helmholtz Collocation Matrix
  • 13. Red – Black Collocation Linear system
  • 14. The collocation matrix is large and sparse, and has no convenient properties (e.g. it is neither symmetric nor definite), hence: Iterative + Parallel
  • 15. Iterative Solution with
  • 16. Iterative Solution
  • 17. Eigenvalues of Collocation matrix
  • 18. Schur Complement Iterative Solution
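The body of the Schur complement slide did not survive in the transcript. As a generic illustration only, and assuming the red-black ordered collocation system can be written in 2x2 block form, the reduction might look as follows. The block names $D_R$, $D_B$, $H_R$, $H_B$ are hypothetical and are not taken from the slides.

```latex
% Generic 2x2 block system after a red-black ordering (block names are assumptions)
\begin{pmatrix} D_R & H_B \\ H_R & D_B \end{pmatrix}
\begin{pmatrix} x_R \\ x_B \end{pmatrix}
=
\begin{pmatrix} b_R \\ b_B \end{pmatrix}

% Eliminate the red unknowns, assuming D_R is invertible:
x_R = D_R^{-1}\left( b_R - H_B\, x_B \right)

% Schur complement system for the black unknowns, solved iteratively:
\left( D_B - H_R\, D_R^{-1} H_B \right) x_B = b_B - H_R\, D_R^{-1} b_R
```

Once $x_B$ is known from the reduced system, $x_R$ follows from the second line with one solve against $D_R$.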
  • 19. Parallel Iterative Solution of Collocation Linear system on Shared Memory Architectures: uniform load balancing between core threads; minimal idle cycles of core threads; minimal communication
  • 20. case of ns = 2p: red block vectors $x^{(R)}_1, x^{(R)}_2, x^{(R)}_3, x^{(R)}_4$
  • 21. case of ns = 2p: four black block vectors $x^{(B)}$, with subscripts involving $p$ and 1 to 4 (exact subscripts lost in extraction)
  • 22. case of ns = 2p: eight virtual threads V1, …, V8, shown with the four red block vectors $x^{(R)}_1, \dots, x^{(R)}_4$ and the four black block vectors $x^{(B)}$
  • 23. [Figure: virtual threads V1, V2, V3, V4, V5]
  • 24. Mapping onto a fixed-size architecture of N cores, case of k = 2p/N even: each core Pj, j = 1,…,N, runs 2k virtual threads and handles the block vectors $x^{(B)}_l$ and $x^{(R)}_l$ for l = (j-1)k+1,…,jk and l = 2p+(j-1)k+1,…,2p+jk
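The index ranges on this slide can be checked with a small sketch. The function name `block_indices_for_core` is hypothetical, and the sketch only enumerates the two ranges of `l` stated on the slide; which range belongs to the red and which to the black vectors is not recoverable from the transcript, so the code does not assume it.

```python
def block_indices_for_core(j, p, n_cores):
    """Return the two ranges of block indices l handled by core j (1-based).

    Follows the slide's case k = 2p/N even:
      l = (j-1)k+1, ..., jk   and   l = 2p+(j-1)k+1, ..., 2p+jk
    """
    if not 1 <= j <= n_cores:
        raise ValueError("core index j must satisfy 1 <= j <= N")
    if (2 * p) % n_cores != 0:
        raise ValueError("2p must be divisible by the number of cores N")
    k = (2 * p) // n_cores
    if k % 2 != 0:
        raise ValueError("this sketch covers only the slide's case of even k")
    first = list(range((j - 1) * k + 1, j * k + 1))
    second = list(range(2 * p + (j - 1) * k + 1, 2 * p + j * k + 1))
    return first, second


# Example: p = 4 and N = 2 cores gives k = 4, i.e. 2k = 8 virtual threads per core.
for core in (1, 2):
    print(core, block_indices_for_core(core, p=4, n_cores=2))
```

Across all N cores the two ranges together cover l = 1,…,4p exactly once, which is the uniform load balance the talk aims for.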
  • 25. Parallel Schur Complement Iterative Solution
  • 26. Parallel BiCGSTAB
  • 27. Parallel BiCGSTAB
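The algorithm listings on the BiCGSTAB slides were lost in extraction. For orientation, here is a minimal sequential sketch of the textbook unpreconditioned BiCGSTAB iteration in pure Python. It is not the authors' parallel variant: the dense list-of-lists matrix, the function names, and the stopping rule are all assumptions made for illustration. The dot products and matrix-vector products are the operations a parallel version would distribute across threads or offload to a GPU.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def bicgstab(A, b, tol=1e-10, max_iter=1000):
    """Textbook unpreconditioned BiCGSTAB with x0 = 0.

    Stops when the recurrence residual satisfies ||r|| <= tol * ||b||.
    Returns (x, iterations_used).
    """
    n = len(b)
    x = [0.0] * n
    stop = tol * math.sqrt(dot(b, b))
    if stop == 0.0:
        return x, 0                      # b = 0, so x = 0 is the solution
    r = list(b)                          # r0 = b - A*x0 with x0 = 0
    r_hat = list(r)                      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= stop:  # converged after the half step
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            return x, it
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if math.sqrt(dot(r, r)) <= stop:
            return x, it
        rho = rho_new
    return x, max_iter


# Small nonsymmetric example system (made up for illustration).
A = [[4.0, 1.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
b = [6.0, 15.0, 11.0]
solution, iterations = bicgstab(A, b)
print(solution, iterations)
```

BiCGSTAB needs no symmetry or definiteness of the matrix, which matches the earlier remark that the collocation matrix has neither property.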
  • 28. The Dirichlet Helmholtz Problem with $u(x,y) = 10\,\phi(x)\,\phi(y),\ (x,y) \in [0,1] \times [0,1]$, where $\phi(x) = e^{-100(x-0.1)^2}\,(x^2 - x)$ (signs restored from the standard form of this test function)
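The formula on this slide is damaged in the transcript. Assuming it is the commonly used test solution u(x,y) = 10 φ(x) φ(y) with φ(x) = e^{-100(x-0.1)^2} (x^2 - x), where the signs in particular are an assumption, a short sketch shows how the exact solution could be evaluated, for example to generate Dirichlet boundary data or measure errors. The names `phi` and `u_exact` are hypothetical.

```python
import math


def phi(x):
    """Assumed one-dimensional factor: exp(-100 (x - 0.1)^2) * (x^2 - x)."""
    return math.exp(-100.0 * (x - 0.1) ** 2) * (x * x - x)


def u_exact(x, y):
    """Assumed exact solution on the unit square: 10 * phi(x) * phi(y)."""
    return 10.0 * phi(x) * phi(y)


# phi vanishes at x = 0 and x = 1, so u is zero on the whole boundary
# of [0,1] x [0,1]; the solution has a sharp peak near (0.1, 0.1).
print(u_exact(0.0, 0.3), u_exact(1.0, 0.7), u_exact(0.1, 0.1))
```

Under this assumed form the Dirichlet data are homogeneous, and the steep peak near (0.1, 0.1) is what makes the problem a demanding test for the discretization.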
  • 29. HP SL390s – Tesla M2070 GPUs: HP SL390s, 6 core Xeon@2.8GHz, 24GB memory, PCI-e gen2 x16, Oracle Linux 6.3 x64, PGI 13.5 Fortran; the slide diagram also carries the labels "+" and "2 x"
  • 30. Realization on HP SL390s Tesla GPU machine: Iterations / Error measurements
        ns      BiCGSTAB iterations    || b – A x_n ||_2
        256     294                    6.06e-11
        512     589                    2.85e-11
        1024    1161                   1.39e-11
        2048    3726                   9.59e-12
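To read the iteration counts more easily, the following short sketch computes how the number of BiCGSTAB iterations grows each time ns doubles. The data are copied from the slide; the helper name `iteration_growth` is hypothetical.

```python
# (ns, BiCGSTAB iterations, final residual norm) as reported on the slide
results = [
    (256, 294, 6.06e-11),
    (512, 589, 2.85e-11),
    (1024, 1161, 1.39e-11),
    (2048, 3726, 9.59e-12),
]


def iteration_growth(rows):
    """Return (ns, ratio) pairs: iterations at ns divided by iterations at ns/2."""
    return [(cur[0], cur[1] / prev[1]) for prev, cur in zip(rows, rows[1:])]


for ns, ratio in iteration_growth(results):
    print(f"ns = {ns}: iterations grew by a factor of {ratio:.2f}")
```

The iteration count roughly doubles from ns = 256 to 512 and from 512 to 1024, and then grows by more than a factor of three from 1024 to 2048.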
  • 31. Realization on HP SL390s Tesla GPU machine: Time measurements
  • 32. Realization on HP SL390s Tesla GPU machine: Time measurements
  • 33. Realization on HP SL390s Tesla GPU machine: Time measurements
  • 34. Realization on HP SL390s Tesla GPU machine: Time measurements
  • 35. Realization on HP SL390s Tesla GPU machine: Time measurements
  • 36. Realization on HP SL390s Tesla GPU machine: Time measurements
  • 37. Conclusions: A new parallel algorithm implementing the Schur complement method with the BiCGSTAB iterative method for the Hermite Collocation equations has been developed. The algorithm is realized on Shared Memory multi-core machines with GPUs. A performance acceleration of up to 30% is observed.
  • 38. Future work: Design an efficient parallel Schur complement algorithm for the Hermite Collocation equations on Multiprocessor/Grid machines with GPUs.