
Using GPUs to accelerate nonstiff and stiff chemical kinetics in combustion simulations


Presented at the 15th International Conference on Numerical Combustion in Avignon, France (19–22 April 2015).

Combustion simulations with detailed chemical kinetics require the integration of a large number of ordinary differential equations (ODEs), with at least one ODE system per spatial location solved every time step. This task is well-suited to the massively parallel processing capabilities of graphics processing units (GPUs), where individual GPU threads concurrently integrate independent ODE systems for different spatial locations. However, the high-order implicit algorithms typically used in combustion modeling to handle stiffness (e.g., VODE, LSODE) involve complex logical flow that causes severe thread divergence when implemented on GPUs, which limits performance. Alternative algorithms are therefore needed. This talk will discuss strategies and results using integration algorithms for nonstiff and stiff chemical kinetics on GPUs.
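
The cost of thread divergence described above can be illustrated with a toy model. The sketch below is not from the talk: it assumes a simplified lockstep picture in which every thread of a 32-thread warp stays occupied until the slowest thread in that warp has finished its integration steps. The function name `warp_efficiency` and the step counts are hypothetical.

```python
WARP_SIZE = 32  # threads per warp on the GPUs discussed in the slides


def warp_efficiency(steps_per_thread, warp_size=WARP_SIZE):
    """Fraction of occupied thread-slots that do useful work.

    Assumed toy model: each thread integrates one ODE system and needs
    steps_per_thread[i] solver steps; a warp is busy until its slowest
    thread is done. Expects a non-empty list of positive step counts.
    """
    useful = 0
    occupied = 0
    for start in range(0, len(steps_per_thread), warp_size):
        warp = steps_per_thread[start:start + warp_size]
        useful += sum(warp)                # steps the threads actually need
        occupied += max(warp) * len(warp)  # slots held until the slowest finishes
    return useful / occupied


# Hypothetical workloads: identical step counts vs. one slow thread per warp.
uniform = [10] * 64
one_slow_thread = [10] * 31 + [100]
print(warp_efficiency(uniform))
print(warp_efficiency(one_slow_thread))
```

In this model a single thread that needs ten times more steps than its neighbors leaves most of the warp idle, which is the qualitative effect the talk attributes to solvers with complex logical flow.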

Published in: Engineering

  1. Using GPUs to accelerate nonstiff and stiff chemical kinetics in combustion simulations
     Kyle Niemeyer, School of Mechanical, Industrial, and Manufacturing Eng., Oregon State University, 20 April 2015
  2. Benefit of GPU computing (for combustion)
     Two avenues:
     • Exascale science on supercomputers (https://www.olcf.ornl.gov/titan/)
     • High-fidelity engineering on workstations
  3. Challenges of GPU computing (for combustion)
     Two challenges:
     • Design algorithms/strategies to reduce computational expense
     • Identify appropriate algorithms for equal or better performance
  4. GPU
     • Graphics Processing Unit
     • Developed to process & display 1000s of pixels
     • Throughput over latency: massive parallelism
     [Image: GPU technical specifications excerpt: 9.75” PCIe x16 form factor, 448 CUDA cores]
  5. Modern GPU hardware architecture: streaming multiprocessor (SM)
     [Figure: Fermi-class GPU hardware, consisting of up to 16 streaming multiprocessors (SMs)]
     Source: Brodtkorb AR, Hagen TR, Sætra ML. J Parallel Distrib Comput 2013;73:4–13.
  6. Using GPUs
     • Parallel function: “kernel”
     • Hundreds–millions of concurrent threads
     • Executed in 32-thread “warps”
     • Challenge: thread divergence
  7. [Figure: threads taking different if/else branches]
  8. [Figure: threads taking different if/else branches, continued]
  9. Problem Description
     [Figure: array of repeated “ODE” labels]
  10. • Large number of independent ODEs to solve
      • Can be even more for turbulent combustion!
      $$\frac{dY_i}{dt} = \frac{W_i}{\rho}\,\dot{\omega}_i, \qquad \begin{pmatrix} \dfrac{dY_1}{dt} \\ \dfrac{dY_2}{dt} \\ \vdots \\ \dfrac{dY_k}{dt} \end{pmatrix}$$
  11. GPU Algorithms
      • Traditionally solved with implicit algorithms: complex logical flow, hence significant thread divergence
  12. Prior Efforts: Implicit
      Implicit algorithms*: not well-suited for GPU acceleration

      | Study | Algorithm | Speedup over single-core CPU |
      |---|---|---|
      | Stone & Davis 2013 | DVODE | 7.7× |
      | Sewerin & Rigopoulos 2015 | Implicit Runge–Kutta (Radau5) | 4.8× |

      *Thus far
  13. GPU Algorithms
      • Traditionally solved with implicit algorithms: complex logical flow, hence significant thread divergence
      • Instead, what about explicit algorithms?
  14. Prior Efforts: Explicit

      | Study | Algorithm | Speedup over single-core CPU | # Species |
      |---|---|---|---|
      | Spafford et al. 2010 | 4th-order Runge–Kutta* | 9× | 22 |
      | Niemeyer et al. 2011 | 4th-order Runge–Kutta | 75× | 9 |
      | Shi et al. 2012 | CHEMEQ2 | 3–13× | 39 & 117 |
      | Stone & Davis 2013 | 4th-order Runge–Kutta–Fehlberg | 28.6× | 19 (reduced) |

      *Species production terms only
  15. GPU Algorithms
      • For nonstiff chemistry: explicit Runge–Kutta–Cash–Karp
        - H2: 9 species & 38 reactions
      • Moderately stiff* chemistry: stabilized explicit Runge–Kutta–Chebyshev
        - H2/CO: 13 species & 27 reactions
        - CH4: 53 species & 325 reactions
        - C2H4: 111 species & 784 reactions
      • Custom CPU & GPU source code
  16. Initial Condition Sampling
  17. Runge–Kutta–Cash–Karp
      • Fifth-order accuracy
      • Adaptive time stepping
      • Global time step: 1×10⁻⁸ s
      • “Nonstiff” hydrogen mechanism¹
      • Number of ODEs ranging from 10 to 10⁷
      ¹Yetter, Dryer, and Rabitz, CST 79 (1991) 97–128
  18. [Figure: computing time per global time step (s) vs. number of independent ODEs; series: RKCK-CPU, RKCK-CPU × 6, RKCK-GPU; annotated speedups: 25× and 126×]
  19. Dealing with Stiffness
      • Standard explicit algorithms fail
      • “Stabilized” explicit methods
  20. Runge–Kutta–Chebyshev
      • AKA “stabilized” Runge–Kutta
      • Explicit, but capable of handling mild stiffness
        - Extended stability domain along real axis
      • Second-order accuracy
      • Jacobian free
      • Time step: 1×10⁻⁶ s
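  21. H2/CO – RKC [Figure; annotated speedups: 59× and 10×]
  22. CH4 – RKC [Figure; annotated speedups: 69× and 13×]
  23. C2H4 – RKC [Figure; annotated speedups: 18× and 4.5×]
  24. CH4 – RKC vs. VODE [Figure; annotated speedup: 57×]
  25. C2H4 – RKC vs. VODE (stiff) [Figure; annotated speedup: 2.5×; time step: 1×10⁻⁴ s]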
  26. Takeaways
      • For exascale (i.e., DNS):
        - Explicit algorithms significantly faster on GPUs
      • For high-fidelity engineering (i.e., LES):
        - Implicit algorithms perform comparably to CPU, so far… (but not much better)
        - Stabilized explicit algorithms offer attractive alternative
        - Greater stiffness still a problem
      More details: see Niemeyer & Sung, J Comput Phys 256 (2014):854–871
  27. Thank you! Questions?
      Acknowledgements: Chih-Jen Sung & Nick Curtis @ UConn
  28. CH4 – Thread divergence
      [Figure: computing time per global time step (s) vs. number of independent ODEs; series: similar conditions, randomized conditions; annotated difference: 4×]
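
The Runge–Kutta–Chebyshev slide lists the method's properties (explicit, second-order, Jacobian free, extended stability along the real axis) without showing the scheme. The following is a minimal sketch of one step of the standard damped second-order RKC recurrences, not the custom CPU/GPU code used in the talk. Several things here are assumptions: the damping value 2/13, the approximate real-axis stability bound 0.653 (s² − 1) used to pick the stage count, the function names, and the linear test problem y' = −1000 y, which stands in for a stiff system and is not a chemistry mechanism.

```python
def rkc_coefficients(s, eps=2.0 / 13.0):
    """Coefficients of the damped second-order RKC scheme with s >= 2 stages."""
    w0 = 1.0 + eps / s**2
    # Chebyshev polynomials T_j(w0) and their first two derivatives,
    # built with the three-term recurrence T_j = 2 x T_{j-1} - T_{j-2}.
    T, dT, d2T = [1.0, w0], [0.0, 1.0], [0.0, 0.0]
    for j in range(2, s + 1):
        T.append(2 * w0 * T[j - 1] - T[j - 2])
        dT.append(2 * T[j - 1] + 2 * w0 * dT[j - 1] - dT[j - 2])
        d2T.append(4 * dT[j - 1] + 2 * w0 * d2T[j - 1] - d2T[j - 2])
    w1 = dT[s] / d2T[s]
    b = [0.0] * (s + 1)
    for j in range(2, s + 1):
        b[j] = d2T[j] / dT[j] ** 2
    b[0] = b[1] = b[2]
    a = [1.0 - b[j] * T[j] for j in range(s + 1)]
    return w0, w1, a, b


def rkc_step(f, t, y, h, s):
    """Advance y (a list of floats) from t to t + h; f(t, y) returns dy/dt."""
    w0, w1, a, b = rkc_coefficients(s)
    f0 = f(t, y)
    mu1 = b[1] * w1
    y_jm2, c_jm2 = list(y), 0.0                                     # stage 0
    y_jm1, c_jm1 = [yi + mu1 * h * fi for yi, fi in zip(y, f0)], mu1  # stage 1
    for j in range(2, s + 1):
        mu = 2 * b[j] * w0 / b[j - 1]
        nu = -b[j] / b[j - 2]
        mu_t = 2 * b[j] * w1 / b[j - 1]
        gam_t = -a[j - 1] * mu_t
        fj = f(t + c_jm1 * h, y_jm1)  # only right-hand-side calls, no Jacobian
        y_j = [(1 - mu - nu) * y0 + mu * y1 + nu * y2 + mu_t * h * f1 + gam_t * h * g0
               for y0, y1, y2, f1, g0 in zip(y, y_jm1, y_jm2, fj, f0)]
        c_j = mu * c_jm1 + nu * c_jm2 + mu_t + gam_t
        y_jm2, c_jm2, y_jm1, c_jm1 = y_jm1, c_jm1, y_j, c_j
    return y_jm1


def stages_for(h, spectral_radius):
    """Smallest s >= 2 whose assumed stability bound 0.653 (s^2 - 1) covers h * rho."""
    s = 2
    while 0.653 * (s * s - 1) < h * spectral_radius:
        s += 1
    return s


# Hypothetical stiff test problem: y' = -1000 y with a step far beyond the
# forward Euler stability limit (h * 1000 = 10 > 2).
lam, h = 1000.0, 0.01
decay = lambda t, y: [-lam * y[0]]
s = stages_for(h, lam)

t, y_rkc, y_euler = 0.0, [1.0], 1.0
for _ in range(10):
    y_rkc = rkc_step(decay, t, y_rkc, h, s)
    y_euler *= 1.0 - lam * h  # forward Euler on the same problem, for contrast
    t += h
print(s, y_rkc[0], y_euler)
```

On this test problem the RKC solution decays while forward Euler grows without bound, at the cost of s right-hand-side evaluations per step. The stage loop has a fixed trip count for a given s, which is one reason such schemes have simpler control flow than implicit solvers with Newton iterations.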
