Using GPUs to accelerate nonstiff and stiff chemical kinetics in combustion simulations

Kyle Niemeyer
School of Mechanical, Industrial, and Manufacturing Eng.
Oregon State University
20 April 2015

Benefit of GPU computing (for combustion)
Two avenues:
• Exascale science on supercomputers (https://www.olcf.ornl.gov/titan/)
• High-fidelity engineering on workstations

Challenges of GPU computing (for combustion)
Two challenges:
• Design algorithms/strategies to reduce computational expense
• Identify appropriate algorithms for equal or better performance

GPU
• Graphics Processing Unit
• Developed to process & display thousands of pixels
• Throughput over latency: massive parallelism
[Figure: GPU technical specifications, including a 9.75” PCIe x16 form factor and 448 CUDA cores]

Modern GPU hardware architecture
[Figure: Fermi-class GPU hardware, consisting of up to 16 streaming multiprocessors (SMs). From Brodtkorb AR, Hagen TR, Sætra ML. J Parallel Distrib Comput 2013;73:4–13.]

Using GPUs
• Parallel function: “kernel”
• Hundreds to millions of concurrent threads
• Threads executed in groups of 32 called “warps”
• Challenge: thread divergence (threads in a warp that take different branches run one path after another)
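To make the divergence penalty concrete, here is a toy cost model in Python. It is a simplification that ignores scheduling and reconvergence details, the cost units are arbitrary, and `warp_branch_cost` is a hypothetical helper, not part of any CUDA API.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warp_branch_cost(takes_branch, cost_if, cost_else):
    """Toy SIMT cost model for one if/else inside a kernel.

    takes_branch: one bool per thread in the warp.
    A warp executes every path that at least one of its threads takes,
    one after the other, with the non-participating threads masked off.
    """
    assert len(takes_branch) == WARP_SIZE
    cost = 0
    if any(takes_branch):        # some thread needs the "if" path
        cost += cost_if
    if not all(takes_branch):    # some thread needs the "else" path
        cost += cost_else
    return cost


uniform = [True] * WARP_SIZE                 # all threads agree: no divergence
divergent = [True] * 31 + [False]            # a single thread disagrees
print(warp_branch_cost(uniform, 10, 10))     # 10
print(warp_branch_cost(divergent, 10, 10))   # 20: both paths are executed
```

A single disagreeing thread is enough to make the whole warp pay for both paths.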

Problem Description
[Figure: array of many independent ODE systems]

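• Large number of independent ODEs to solve
• Can be even more for turbulent combustion!

$$\frac{dY_i}{dt} = \frac{W_i}{\rho}\,\omega_i, \qquad i = 1, \ldots, k$$

$$\begin{pmatrix} dY_1/dt \\ dY_2/dt \\ \vdots \\ dY_k/dt \end{pmatrix}$$

where Y_i is the mass fraction, W_i the molecular weight, and ω_i the molar production rate of species i, ρ is the mixture density, and k is the number of species.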
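Once the production rates are known, the right-hand side of each system is a per-species scaling by W_i/ρ, and no system reads another's data. A minimal Python sketch, assuming illustrative function names, units and numbers; computing ω_i from the reaction mechanism (the expensive part) is not shown.

```python
def species_rates(omega, W, rho):
    """dY_i/dt = W_i * omega_i / rho for every species of one ODE system.

    omega: molar production rates, kmol/(m^3 s)
    W:     molecular weights, kg/kmol
    rho:   mixture density, kg/m^3
    Returns mass-fraction rates in 1/s.
    """
    return [w_i * om_i / rho for w_i, om_i in zip(W, omega)]


def all_rates(systems, W):
    """Evaluate many independent systems. No system depends on another,
    so each loop iteration could map to one GPU thread."""
    return [species_rates(omega, W, rho) for omega, rho in systems]


W = [2.016, 31.998]                                  # H2 and O2 molecular weights
systems = [([1.0, -0.5], 0.4), ([0.2, -0.1], 1.1)]   # hypothetical (omega, rho) pairs
print(all_rates(systems, W))
```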

GPU Algorithms
• Traditionally solved with implicit algorithms: complex logical flow
- Significant thread divergence
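One source of that divergence: an implicit step solves a nonlinear equation iteratively, and the number of Newton iterations depends on the local state. The sketch below uses a hypothetical scalar toy ODE y' = -k y² with backward Euler. Production solvers such as DVODE add Jacobian evaluation, linear solves, error tests and order changes, each a further data-dependent branch.

```python
def backward_euler_newton(y_n, a, tol=1e-12, max_iter=50):
    """One backward-Euler step for the toy ODE y' = -k*y**2, where a = h*k.

    Solves g(u) = u - y_n + a*u**2 = 0 with Newton's method and returns
    (u, iterations). How many iterations are needed depends on the state.
    """
    u = y_n                                # initial guess: the previous value
    for it in range(1, max_iter + 1):
        g = u - y_n + a * u * u            # residual of the implicit equation
        dg = 1.0 + 2.0 * a * u             # its derivative (1x1 "Jacobian")
        du = g / dg
        u -= du
        if abs(du) <= tol * max(1.0, abs(u)):
            return u, it
    raise RuntimeError("Newton iteration did not converge")


# Two threads of a warp with different (hypothetical) conditions:
for a in (0.01, 1000.0):
    u, iters = backward_euler_newton(1.0, a)
    print(f"h*k = {a:g}: {iters} Newton iterations")
```

If both cases landed in the same warp, the warp would keep iterating until the slower one converged.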

Prior Efforts: Implicit
Implicit algorithms*: not well-suited for GPU acceleration
• Stone & Davis 2013: DVODE, 7.7× speedup over single-core CPU
• Sewerin & Rigopoulos 2015: implicit Runge–Kutta (Radau5), 4.8× speedup over single-core CPU
*Thus far

GPU Algorithms
• Instead, what about explicit algorithms?

Prior Efforts: Explicit
• Spafford et al. 2010: 4th-order Runge–Kutta*, 9× speedup, 22 species
• Niemeyer et al. 2011: 4th-order Runge–Kutta, 75× speedup, 9 species
• Shi et al. 2012: CHEMEQ2, 3–13× speedup, 39 & 117 species
• Stone & Davis 2013: 4th-order Runge–Kutta–Fehlberg, 28.6× speedup, 19 species (reduced mechanism)
Speedups are relative to a single-core CPU.
*Species production terms only
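Part of what makes explicit methods GPU-friendly is their fixed control flow. The classical fourth-order Runge–Kutta step below performs the same four right-hand-side evaluations for every system, with no data-dependent branching. It is a generic pure-Python sketch with a hypothetical function name, not the code used in the studies listed above.

```python
import math


def rk4_step(f, t, y, h):
    """One classical fourth-order Runge–Kutta step for y' = f(t, y).

    y and f(t, y) are lists of floats. Every call performs the same four
    right-hand-side evaluations, with no data-dependent branching.
    """
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


# Usage: y' = -y, y(0) = 1, integrated to t = 1 with 100 fixed steps
t, y, h = 0.0, [1.0], 0.01
for _ in range(100):
    y = rk4_step(lambda t, u: [-u[0]], t, y, h)
    t += h
print(y[0], math.exp(-1.0))
```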

GPU Algorithms
• For nonstiff chemistry: explicit Runge–Kutta–Cash–Karp
- H2: 9 species & 38 reactions
• Moderately stiff* chemistry: stabilized explicit Runge–Kutta–Chebyshev
- H2/CO: 13 species & 27 reactions
- CH4: 53 species & 325 reactions
- C2H4: 111 species & 784 reactions
• Custom CPU & GPU source code

Runge–Kutta–Cash–Karp
• Fifth-order accuracy
• Adaptive time stepping
• Global time step: 1×10⁻⁸ s
• “Nonstiff” hydrogen mechanism¹
• Number of ODEs ranges from 10 to 10⁷
¹Yetter, Dryer, and Rabitz, CST 79 (1991) 97–128
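The Cash–Karp pair embeds a fourth-order solution in the fifth-order one, so their difference estimates the local error and drives the adaptive substeps within each global time step. A minimal pure-Python sketch, assuming the standard Cash–Karp coefficients and a common step-size controller; the function names, tolerances (`rtol`, `atol`), safety factor 0.9 and growth limits [0.2, 5] are illustrative choices, not taken from the authors' CPU/GPU code. On a GPU, each thread would run such a loop for its own ODE system.

```python
import math

# Cash–Karp embedded 5(4) Runge–Kutta tableau
C = [0.0, 1 / 5, 3 / 10, 3 / 5, 1.0, 7 / 8]
A = [
    [],
    [1 / 5],
    [3 / 40, 9 / 40],
    [3 / 10, -9 / 10, 6 / 5],
    [-11 / 54, 5 / 2, -70 / 27, 35 / 27],
    [1631 / 55296, 175 / 512, 575 / 13824, 44275 / 110592, 253 / 4096],
]
B5 = [37 / 378, 0.0, 250 / 621, 125 / 594, 0.0, 512 / 1771]                # 5th order
B4 = [2825 / 27648, 0.0, 18575 / 48384, 13525 / 55296, 277 / 14336, 1 / 4]  # 4th order


def rkck_step(f, t, y, h):
    """One Cash–Karp step: returns (5th-order solution, error estimate y5 - y4)."""
    n = len(y)
    k = []
    for i in range(6):
        yi = [y[j] + h * sum(A[i][m] * k[m][j] for m in range(i)) for j in range(n)]
        k.append(f(t + C[i] * h, yi))
    y5 = [y[j] + h * sum(B5[i] * k[i][j] for i in range(6)) for j in range(n)]
    y4 = [y[j] + h * sum(B4[i] * k[i][j] for i in range(6)) for j in range(n)]
    return y5, [a - b for a, b in zip(y5, y4)]


def rkck_integrate(f, t0, t1, y0, h=1e-3, rtol=1e-8, atol=1e-12):
    """Advance y' = f(t, y) from t0 to t1 with adaptive Cash–Karp substeps."""
    t, y = t0, list(y0)
    while t < t1:
        h = min(h, t1 - t)                      # never step past t1
        y_new, err = rkck_step(f, t, y, h)
        # error scaled by a mixed absolute/relative tolerance (max norm)
        scale = max(abs(e) / (atol + rtol * max(abs(a), abs(b)))
                    for e, a, b in zip(err, y, y_new))
        if scale <= 1.0:                        # accept the substep
            t = t1 if h >= t1 - t else t + h
            y = y_new
        # grow or shrink h (safety factor 0.9, limited to [0.2, 5] x h)
        factor = 0.9 * scale ** -0.2 if scale > 0.0 else 5.0
        h *= min(5.0, max(0.2, factor))
    return y


# Usage: y' = -y over one "global" step from t = 0 to t = 1
y = rkck_integrate(lambda t, u: [-u[0]], 0.0, 1.0, [1.0])
print(y[0], math.exp(-1.0))
```

Because the number of accepted and rejected substeps depends on each system's state, even this explicit method can diverge across a warp when neighboring systems differ.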

[Figure: computing time per global time step (s) vs. number of independent ODEs (log–log axes) for RKCK-CPU, RKCK-CPU × 6, and RKCK-GPU; annotated speedups: 25× and 126×]

Dealing with Stiffness
• Standard explicit algorithms fail: stability limits force impractically small time steps
• “Stabilized” explicit methods
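Why standard explicit methods fail shows up already on the linear test problem y' = λy with a large negative λ: forward Euler multiplies y by (1 + hλ) each step, so it is stable only for h ≤ 2/|λ|, however smooth the true solution is. A minimal sketch; λ = -1000 is a hypothetical stiff rate.

```python
def explicit_euler(lam, h, n_steps, y0=1.0):
    """Forward Euler on the linear test problem y' = lam * y.

    Each step multiplies y by (1 + h*lam); the method is stable only if
    |1 + h*lam| <= 1, i.e. h <= 2/|lam| for real negative lam.
    """
    y = y0
    for _ in range(n_steps):
        y = y + h * lam * y
    return y


lam = -1000.0                           # fast, stiff decay (hypothetical rate)
print(explicit_euler(lam, 1e-2, 10))    # |1 + h*lam| = 9: grows without bound
print(explicit_euler(lam, 1e-4, 10))    # |1 + h*lam| = 0.9: decays as it should
```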

Runge–Kutta–Chebyshev
• AKA “stabilized” Runge–Kutta
• Explicit, but capable of handling mild stiffness
- Extended stability domain along the real axis
• Second-order accuracy
• Jacobian-free (needs only an estimate of the Jacobian's spectral radius)
• Time step: 1×10⁻⁶ s
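A minimal pure-Python sketch of a damped second-order Runge–Kutta–Chebyshev step, following the scheme used in the RKC solver of Sommeijer, Shampine and Verwer (damping ε = 2/13). Stage coefficients come from Chebyshev polynomials evaluated at w0, and the number of stages s is chosen so that the real stability interval (1 + w0)/w1 covers h times the spectral radius. Assumptions: the spectral radius is passed in directly (the parameter `spectral_radius` and all function names are hypothetical), and there is no error control or step-size adaptation.

```python
import math

EPS = 2.0 / 13.0  # damping parameter of the second-order RKC scheme


def chebyshev_tables(s, x):
    """T_j(x), T_j'(x), T_j''(x) for j = 0..s via the three-term recurrence."""
    T, dT, d2T = [1.0, x], [0.0, 1.0], [0.0, 0.0]
    for j in range(2, s + 1):
        T.append(2 * x * T[j - 1] - T[j - 2])
        dT.append(2 * T[j - 1] + 2 * x * dT[j - 1] - dT[j - 2])
        d2T.append(4 * dT[j - 1] + 2 * x * d2T[j - 1] - d2T[j - 2])
    return T, dT, d2T


def rkc_coefficients(s):
    """w0, w1, T_j(w0) and b_j (j = 0..s) for an s-stage second-order RKC step."""
    w0 = 1.0 + EPS / s**2
    T, dT, d2T = chebyshev_tables(s, w0)
    w1 = dT[s] / d2T[s]
    b = [0.0] * (s + 1)
    for j in range(2, s + 1):
        b[j] = d2T[j] / dT[j] ** 2
    b[0] = b[1] = b[2]
    return w0, w1, T, b


def rkc_stages(h, spectral_radius, s_max=500):
    """Fewest stages s >= 2 whose real stability interval (1 + w0)/w1 covers h*rho."""
    for s in range(2, s_max + 1):
        w0, w1, _, _ = rkc_coefficients(s)
        if (1.0 + w0) / w1 >= h * spectral_radius:
            return s
    raise ValueError("h * spectral_radius is too large for s_max stages")


def rkc_step(f, t, y, h, s):
    """One s-stage second-order Runge–Kutta–Chebyshev step for y' = f(t, y)."""
    w0, w1, T, b = rkc_coefficients(s)
    f0 = f(t, y)
    mut1 = b[1] * w1
    y_prev2 = y                                             # stage Y_0
    y_prev1 = [u + mut1 * h * fu for u, fu in zip(y, f0)]  # stage Y_1
    c_prev2, c_prev1 = 0.0, mut1                            # stage times / h
    for j in range(2, s + 1):
        mu = 2.0 * b[j] * w0 / b[j - 1]
        nu = -b[j] / b[j - 2]
        mut = 2.0 * b[j] * w1 / b[j - 1]
        gamt = -(1.0 - b[j - 1] * T[j - 1]) * mut
        f_prev1 = f(t + c_prev1 * h, y_prev1)
        y_j = [(1.0 - mu - nu) * u0 + mu * u1 + nu * u2 + mut * h * fu1 + gamt * h * fu0
               for u0, u1, u2, fu1, fu0 in zip(y, y_prev1, y_prev2, f_prev1, f0)]
        c_j = mu * c_prev1 + nu * c_prev2 + mut + gamt
        y_prev2, y_prev1 = y_prev1, y_j
        c_prev2, c_prev1 = c_prev1, c_j
    return y_prev1


# Nonstiff check: y' = -y to t = 1 with s = 2 (second-order accurate)
y = [1.0]
for n in range(100):
    y = rkc_step(lambda t, u: [-u[0]], n * 0.01, y, 0.01, 2)
print(abs(y[0] - math.exp(-1.0)))

# Stiff linear test y' = -1000*y at h = 0.01 (forward Euler would need h <= 0.002)
lam, h = -1000.0, 0.01
s = rkc_stages(h, abs(lam))          # spectral radius of this 1x1 Jacobian is |lam|
z = [1.0]
for n in range(100):
    z = rkc_step(lambda t, u: [lam * u[0]], n * h, z, h, s)
print(s, z[0])                       # decays instead of blowing up
```

The stage count grows only like the square root of h times the spectral radius, which is why RKC copes with mild stiffness but becomes expensive as stiffness increases.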

C2H4 – RKC vs. VODE (stiff)
[Figure: RKC vs. VODE performance for the C2H4 mechanism; annotated speedup: 2.5×]
• Time step: 1×10⁻⁴ s

Takeaways
• For exascale (i.e., DNS):
- Explicit algorithms are significantly faster on GPUs
• For high-fidelity engineering (i.e., LES):
- Implicit algorithms on GPUs perform comparably to CPUs so far… (but not much better)
- Stabilized explicit algorithms offer an attractive alternative
- Greater stiffness is still a problem
More details: see Niemeyer & Sung, J Comput Phys 256 (2014) 854–871

Acknowledgements: Chih-Jen Sung & Nick Curtis @ UConn
Thank you!
Questions?

CH4 – Thread divergence
[Figure: computing time per global time step (s) vs. number of independent ODEs (log–log axes) for similar vs. randomized conditions; annotated difference: 4×]
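Adaptive integrators also diverge through loops: threads of a warp advance in lockstep, so a warp stays busy until its slowest thread finishes. The sketch below is a hypothetical toy model of that effect with made-up substep counts, not the benchmark behind the figure; `simd_efficiency` is an illustrative helper.

```python
import random

WARP_SIZE = 32


def simd_efficiency(steps, warp_size=WARP_SIZE):
    """Toy model of loop divergence for an adaptive integrator.

    steps[i] is the number of substeps ODE system i needs. Threads of a warp
    run in lockstep, so each warp stays busy for as long as its slowest
    thread; the result is useful thread-steps / issued lane-steps.
    """
    warp_time = sum(max(steps[i:i + warp_size])
                    for i in range(0, len(steps), warp_size))
    return sum(steps) / (warp_size * warp_time)


rng = random.Random(0)
n = WARP_SIZE * 1000
similar = [100] * n                                     # every system needs 100 substeps
randomized = [rng.randint(10, 190) for _ in range(n)]   # same mean, wide spread
print(simd_efficiency(similar))      # 1.0
print(simd_efficiency(randomized))   # well below 1: lanes idle waiting for the slowest thread
```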
