Like this presentation? Why not share!

# Parallel iterative solution of the hermite collocation equations on gpus

## on Jul 26, 2013

• 94 views

### Views

Total Views
94
Views on SlideShare
94
Embed Views
0

Likes
0
0
0

No embeds

### Report content

• Comment goes here.
Are you sure you want to

## Parallel iterative solution of the hermite collocation equations on gpusPresentation Transcript

• Parallel Iterative solution of the Hermite Collocation Equations on GPUs Emmanuel N. Mathioudakis Co-Authors : Elena Papadopoulou – Yiannis Saridakis – Nikolaos Vilanakis TECHNICAL UNIVERSITY OF CRETE DEPARTMENT OF SCIENCES APPLIED MATHEMATICS AND COMPUTERS LABORATORY 73100 CHANIA - CRETE - GREECE
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Talk Overview  Hermite Collocation for elliptic BVPs & Derivation of the Collocation linear system  Development of a parallel algorithm for the Schur Complement method on Shared Memory Parallel Architectures  Parallel implementation on multicore computers with GPUs
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY ( , ) ( , ) , ( , ) ( , ) ( , ) , ( , ) u x y f x y x y u x y g x y x y     L BBVP ( , ) ( , ) 0 , ( , ) : ( , ) ( , ) 0 , ( , , ) y y yx x x i j i j i j y y yx x x i j i j i j a i j u f u g                   L B Hermite Collocation Method
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Hermite Collocation Method…
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Hermite Collocation Method…
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
• with 0 ( , ) ( , ) ( , ) , ( , ) ( , ) ( , ) , ( , ) u x y u x y f x y x y u x y g x y x y         2 Model Problem TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
• Helmholtz Collocation Matrix TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
•  The collocation matrix is large, sparse and enjoys no pleasant properties (e.g. symmetric, definite) Iterative + Parallel TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Iterative Solution with
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Iterative Solution
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Eigenvalues of Collocation matrix
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Schur Complement Iterative Solution
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel Iterative Solution of Collocation Linear system on Shared Memory Architectures  Uniform Load Balancing between core threads  Minimal Idle Cycles of core threads  Minimal Communication
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY case of ns = 2p ( ) 1 Rx ( ) 2 Rx ( ) 3 Rx ( ) 4 Rx
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY case of ns = 2p ( ) 1 B p x  ( ) 2 B p x  ( ) 3 B p x  ( ) 4 B p x 
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY case of ns = 2p V1 ( ) 1 Rx ( ) 2 Rx ( ) 3 Rx ( ) 4 Rx( ) 1 B p x  ( ) 2 B p x  ( ) 3 B p x  ( ) 4 B p x  V2 V3 V4 V5 V6 V7 V8
• V1 V2 V3 V4 V5
• Mapping into a fixed size Architecture of N Cores case of k = 2p/N even 2k virtual threads l=(j-1)k+1,…,jk l=2p+(j-1)k+1,…,2p+jk j=1,…,N ( )B l x ( )R l x V2 . . . Pj
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel Schur Complement Iterative Solution
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel BiCGSTAB
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel BiCGSTAB
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY The Dirichlet Helmholtz Problem 2100( 0.1)2with ( , ) 10 ( ) ( ) , ( , ) [0,1] [0,1] ( ) ( ) x u x y x y x y x x x e          
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY HP SL390s 6 core Xeon@2.8GHz 24GB memory Oracle Linux 6.3 x64 PGI 13.5 Fortran PCI-e gen2 x16 HP SL390s – Tesla M2070 GPUs + 2 x
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Iterations / Error measurements ns BiCGSTAB Iterations || b – Axn||2 256 294 6.06e-11 512 589 2.85e-11 1024 1161 1.39e-11 2048 3726 9.59e-12
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
• TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
• Conclusions • A new parallel algorithm implementing the Schur complement with BiCGSTAB iterative method for Hermite Collocation equations has been developed. • The algorithm is realized on Shared Memory multi-core machines with GPUs . • A performance acceleration of up to 30% is observed. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
• Future work • Design an efficient parallel Schur complement algorithm of the Hermite Collocation equations for Multiprocessor /Grid machines with GPUs. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY