Parallel Iterative Solution of the Hermite Collocation Equations on GPUs

  1. Parallel Iterative Solution of the Hermite Collocation Equations on GPUs. Emmanuel N. Mathioudakis. Co-Authors: Elena Papadopoulou, Yiannis Saridakis, Nikolaos Vilanakis. TECHNICAL UNIVERSITY OF CRETE, DEPARTMENT OF SCIENCES, APPLIED MATHEMATICS AND COMPUTERS LABORATORY, 73100 CHANIA - CRETE - GREECE
  2. Talk Overview
       • Hermite Collocation for elliptic BVPs and derivation of the Collocation linear system
       • Development of a parallel algorithm for the Schur Complement method on Shared Memory Parallel Architectures
       • Parallel implementation on multicore computers with GPUs
  3. BVP: $\mathcal{L}u(x,y) = f(x,y)$ for $(x,y) \in \Omega$, and $\mathcal{B}u(x,y) = g(x,y)$ for $(x,y) \in \partial\Omega$. Hermite Collocation Method: $\mathcal{L}u(\sigma_i^x,\sigma_j^y) - f(\sigma_i^x,\sigma_j^y) = 0$ at the interior collocation points $(\sigma_i^x,\sigma_j^y)$, and $\mathcal{B}u(\sigma_i^x,\sigma_j^y) - g(\sigma_i^x,\sigma_j^y) = 0$ at the boundary collocation points.
  4. Hermite Collocation Method…
  5. Hermite Collocation Method…
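The formulas on the Hermite Collocation slides did not survive in this transcript. As a rough illustration of the building blocks such a method typically uses, the sketch below evaluates the four cubic Hermite basis functions on a reference interval [0, 1] and lists the two Gauss points of that interval. The choice of Gauss points as collocation points, and the function names `hermite_basis` and `hermite_interpolant`, are assumptions made for this example and are not taken from the slides.

```python
import math

def hermite_basis(t):
    """Cubic Hermite basis on [0, 1].

    Returns the four functions associated with, in order:
    value at t=0, slope at t=0, value at t=1, slope at t=1.
    """
    return (
        1 - 3 * t**2 + 2 * t**3,
        t - 2 * t**2 + t**3,
        3 * t**2 - 2 * t**3,
        -(t**2) + t**3,
    )

# Two Gauss points of [0, 1]; assumed here as collocation points.
GAUSS_POINTS = (
    (1 - 1 / math.sqrt(3)) / 2,
    (1 + 1 / math.sqrt(3)) / 2,
)

def hermite_interpolant(u0, du0, u1, du1, t):
    """Evaluate the cubic matching values and slopes at both end points."""
    h = hermite_basis(t)
    return u0 * h[0] + du0 * h[1] + u1 * h[2] + du1 * h[3]

if __name__ == "__main__":
    # q(t) = t^3 - 2t + 1 is itself a cubic, so it is reproduced exactly.
    for t in GAUSS_POINTS:
        print(t, hermite_interpolant(1.0, -2.0, 0.0, 1.0, t))
```

In two dimensions, a tensor product of such one-dimensional functions is the usual construction, which is why every mesh node carries several unknowns.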
  6. Red - Black Collocation Linear system
  7. Red - Black Collocation Linear system
  8. Red - Black Collocation Linear system
  9. Red - Black Collocation Linear system
  10. Red - Black Collocation Linear system
  11. Model Problem: a Helmholtz-type equation of the form $\nabla^2 u(x,y) + \lambda\, u(x,y) = f(x,y)$ for $(x,y) \in \Omega$, with $u(x,y) = g(x,y)$ for $(x,y) \in \partial\Omega$, together with a sign condition comparing the coefficient with 0.
  12. Helmholtz Collocation Matrix
  13. Red - Black Collocation Linear system
  14. The collocation matrix is large and sparse, and it has none of the convenient properties (e.g. symmetry, definiteness). This motivates an Iterative + Parallel solution.
  15. Iterative Solution with …
  16. Iterative Solution
  17. Eigenvalues of Collocation matrix
  18. Schur Complement Iterative Solution
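The formulas of the Schur Complement slide are missing from this transcript. As a reminder of the general technique only, the following block writes out the textbook Schur complement reduction for a generic 2 × 2 red-black block partition. The block names $A_{RR}$, $A_{RB}$, $A_{BR}$, $A_{BB}$ and the choice to eliminate the red unknowns first are assumptions made for illustration and need not match the authors' notation.

```latex
\begin{aligned}
\begin{pmatrix} A_{RR} & A_{RB} \\ A_{BR} & A_{BB} \end{pmatrix}
\begin{pmatrix} x^{(R)} \\ x^{(B)} \end{pmatrix}
  &= \begin{pmatrix} b^{(R)} \\ b^{(B)} \end{pmatrix}, \\
S &= A_{BB} - A_{BR}\, A_{RR}^{-1} A_{RB}, \\
S\, x^{(B)} &= b^{(B)} - A_{BR}\, A_{RR}^{-1} b^{(R)}, \\
x^{(R)} &= A_{RR}^{-1} \bigl( b^{(R)} - A_{RB}\, x^{(B)} \bigr).
\end{aligned}
```

The reduced system in $S$ has half as many block unknowns, and an iterative method only needs products with $S$, so $S$ itself never has to be formed explicitly as long as systems with $A_{RR}$ are cheap to solve.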
  19. Parallel Iterative Solution of Collocation Linear system on Shared Memory Architectures
       • Uniform Load Balancing between core threads
       • Minimal Idle Cycles of core threads
       • Minimal Communication
  20. Case of $n_s = 2p$: red unknown blocks $x^{(R)}_1$, $x^{(R)}_2$, $x^{(R)}_3$, $x^{(R)}_4$
  21. Case of $n_s = 2p$: black unknown blocks $x^{(B)}_{p+1}$, $x^{(B)}_{p+2}$, $x^{(B)}_{p+3}$, $x^{(B)}_{p+4}$
  22. Case of $n_s = 2p$: virtual threads V1, V2, …, V8 shown alongside the blocks $x^{(R)}_1, \dots, x^{(R)}_4$ and $x^{(B)}_{p+1}, \dots, x^{(B)}_{p+4}$
  23. Virtual threads V1, V2, V3, V4, V5 (diagram)
  24. Mapping into a fixed size Architecture of N Cores, case of $k = 2p/N$ even: each core $P_j$, $j = 1, \dots, N$, runs $2k$ virtual threads, with block index ranges $l = (j-1)k+1, \dots, jk$ and $l = 2p+(j-1)k+1, \dots, 2p+jk$ for the blocks $x^{(B)}_l$ and $x^{(R)}_l$.
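The two index ranges on the mapping slide can be spelled out with a few lines of code. The sketch below lists, for every core j = 1, …, N, the 2k virtual-thread indices given by l = (j-1)k+1, …, jk and l = 2p+(j-1)k+1, …, 2p+jk. The function name `virtual_thread_indices` is hypothetical, and the code only checks that N divides 2p; the slide additionally restricts itself to the case of k even.

```python
def virtual_thread_indices(p, n_cores):
    """Map each core j (1-based) to its 2k virtual-thread indices.

    Uses k = 2p / N and the two ranges quoted on the slide:
    l = (j-1)k+1, ..., jk  and  l = 2p+(j-1)k+1, ..., 2p+jk.
    """
    if (2 * p) % n_cores != 0:
        raise ValueError("N must divide 2p so that k = 2p/N is an integer")
    k = (2 * p) // n_cores
    mapping = {}
    for j in range(1, n_cores + 1):
        first = list(range((j - 1) * k + 1, j * k + 1))
        second = list(range(2 * p + (j - 1) * k + 1, 2 * p + j * k + 1))
        mapping[j] = first + second
    return mapping

if __name__ == "__main__":
    # Example: p = 4 and N = 2 cores, hence k = 4 and 8 virtual threads per core.
    for core, indices in virtual_thread_indices(4, 2).items():
        print(f"P{core}: {indices}")
```

Every index from 1 to 4p is assigned to exactly one core and every core receives the same number of virtual threads, which matches the uniform load balancing goal stated earlier in the talk.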
  25. Parallel Schur Complement Iterative Solution
  26. Parallel BiCGSTAB
  27. Parallel BiCGSTAB
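The algorithm listings of the Parallel BiCGSTAB slides are not part of this transcript. For orientation, here is a minimal serial sketch of the standard unpreconditioned BiCGSTAB iteration in plain Python. It is not the authors' parallel version: in a parallel or GPU setting the matrix-vector products and inner products below are the operations that get distributed. The function names and the small test matrix are assumptions made for this example.

```python
import math

def matvec(A, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def norm2(u):
    return math.sqrt(dot(u, u))

def true_residual(A, x, b):
    """Euclidean norm of b - A x."""
    return norm2([b_i - ax_i for b_i, ax_i in zip(b, matvec(A, x))])

def bicgstab(A, b, tol=1e-10, max_iter=200):
    """Unpreconditioned BiCGSTAB with zero initial guess.

    Returns (x, iterations). Stops when the recursively updated
    residual drops below tol * ||b||_2.
    """
    n = len(b)
    x = [0.0] * n
    if norm2(b) == 0.0:
        return x, 0
    r = list(b)              # residual for x = 0
    r_hat = list(r)          # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    stop = tol * norm2(b)
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        if rho_new == 0.0:
            raise ArithmeticError("BiCGSTAB breakdown: rho = 0")
        beta = (rho_new / rho) * (alpha / omega)
        p = [r_i + beta * (p_i - omega * v_i) for r_i, p_i, v_i in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho_new / dot(r_hat, v)
        s = [r_i - alpha * v_i for r_i, v_i in zip(r, v)]
        if norm2(s) <= stop:
            return [x_i + alpha * p_i for x_i, p_i in zip(x, p)], it
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [x_i + alpha * p_i + omega * s_i for x_i, p_i, s_i in zip(x, p, s)]
        r = [s_i - omega * t_i for s_i, t_i in zip(s, t)]
        if norm2(r) <= stop:
            return x, it
        rho = rho_new
    return x, max_iter

if __name__ == "__main__":
    # Small nonsymmetric test system with exact solution (1, 2, 3).
    A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
    b = [6.0, 15.0, 11.0]
    x, its = bicgstab(A, b)
    print(x, its, true_residual(A, x, b))
```

BiCGSTAB is a natural fit for a matrix that is neither symmetric nor definite, since it needs only products with the matrix itself and no products with its transpose.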
  28. The Dirichlet Helmholtz Problem with $u(x,y) = 10\,\phi(x)\,\phi(y)$, $(x,y) \in [0,1] \times [0,1]$, where $\phi(x) = e^{-100(x-0.1)^2}\,(x^2 - x)$
  29. HP SL390s - Tesla M2070 GPUs: HP SL390s, 6 core Xeon @ 2.8 GHz, 24 GB memory, PCI-e gen2 x16, Oracle Linux 6.3 x64, PGI 13.5 Fortran (+ 2 x, as labeled in the slide diagram)
  30. Realization on HP SL390s Tesla GPU machine: Iterations / Error measurements
       n_s     BiCGSTAB iterations    ||b - A x_n||_2
       256     294                    6.06e-11
       512     589                    2.85e-11
       1024    1161                   1.39e-11
       2048    3726                   9.59e-12
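One way to read the iteration table is to look at how the BiCGSTAB iteration count grows each time n_s doubles. The short sketch below computes these growth factors from the four table rows. The numbers are copied from the table; the helper name `growth_factors` and the interpretation are illustrative additions and not part of the slides.

```python
# Values copied from the iterations / error table.
ns_values = [256, 512, 1024, 2048]
iterations = [294, 589, 1161, 3726]

def growth_factors(counts):
    """Ratio of each entry to the previous one."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]

if __name__ == "__main__":
    for ns, factor in zip(ns_values[1:], growth_factors(iterations)):
        print(f"n_s = {ns}: iterations grew by a factor of {factor:.2f}")
```

With these figures the count roughly doubles for the first two refinements and grows by a larger factor for the last one.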
  31. Realization on HP SL390s Tesla GPU machine: Time measurements
  32. Realization on HP SL390s Tesla GPU machine: Time measurements
  33. Realization on HP SL390s Tesla GPU machine: Time measurements
  34. Realization on HP SL390s Tesla GPU machine: Time measurements
  35. Realization on HP SL390s Tesla GPU machine: Time measurements
  36. Realization on HP SL390s Tesla GPU machine: Time measurements
  37. Conclusions
       • A new parallel algorithm implementing the Schur complement with the BiCGSTAB iterative method for the Hermite Collocation equations has been developed.
       • The algorithm is realized on Shared Memory multi-core machines with GPUs.
       • A performance acceleration of up to 30% is observed.
  38. Future work
       • Design an efficient parallel Schur complement algorithm for the Hermite Collocation equations on Multiprocessor/Grid machines with GPUs.
