We have implemented a multiple precision ODE solver based on high-order fully implicit Runge-Kutta (IRK) methods. This ODE solver uses Gauss-type formulas of any order, and can be accelerated by (1) using MPFR as the multiple precision floating-point arithmetic library, (2) the real tridiagonalization adopted in SPARK3 for the linear equations solved in the simplified Newton method used as inner iteration, (3) the mixed precision iterative refinement method\cite{mixed_prec_iterative_ref}, (4) parallelization with OpenMP, and (5) embedded formulas for IRK methods. In this talk, we describe why we adopt these accelerations, and show the efficiency of the ODE solver through numerical experiments such as the Kuramoto-Sivashinsky equation.
Talk at SciCADE2013 about "Accelerated Multiple Precision ODE Solver Based on Fully Implicit Runge-Kutta Methods"
1. On Numerical Properties of Accelerated Multiple
Precision Implicit Runge-Kutta Methods
Shizuoka Institute of Science and Technology
Tomonori Kouya
http://na-inet.jp/na/birk/
SciCADE2013 in Valladolid, SPAIN
2013-09-16(Mon) – 20(Fri)
2. Abstract
Abstract
Motivation
IRK method with simplified Newton method
Acceleration of inner iteration and stepsize selection
Performance check by solving linear ODE
Numerical experiments of Evolutionary PDEs
Conclusion and Future work
3. Motivation
BNCpack
▶ provides double and multiple precision numerical algorithms
based on MPFR/GMP.
▶ has simple explicit and implicit Runge-Kutta (IRK) methods
and extrapolation methods for solving ODEs.
⇓
In SciCADE 2007, a gentleman suggested to us that the
Kuramoto-Sivashinsky (K-S) equation is suitable for our multiple
precision ODE solvers, since it is a chaotic, stiff, and large-scale
example of ODEs.
⇓
Accelerated multiple precision IRK methods based on MPFR/GMP
are necessary to solve it.
4. The Features of Accelerated IRK methods
1. It uses the Gauss formulas, which are of order 2m for m stages,
A-stable, P-stable, and symplectic.
2. Supporting the mixed precision iterative refinement method in the
simplified Newton iteration of the IRK process can drastically
reduce computational time.
3. Parallelization with OpenMP further improves performance.
5. IVP of n dimensional ODE to be solved
dy/dt = f(t, y) ∈ Rⁿ,  y(t₀) = y₀,  Integration interval: [t₀, α]   (1)

We suppose that the above ODE has a unique solution, so that a
Lipschitz constant L > 0 exists satisfying

‖f(t, v) − f(t, w)‖ ≤ L ‖v − w‖   (2)

for ∀v, w ∈ Rⁿ, ∀t ∈ [t₀, α].
⇓
The 1D Brusselator problem and the K-S equation have large L ≫ 1, so
they are called "stiff problems."
6. Skeleton of m stages IRK methods
Discretization: t₀, t₁ := t₀ + h₀, ..., t_{k+1} := t_k + h_k, ...
When we calculate the approximation y_{k+1} ≈ y(t_{k+1}) from the
former y_k ≈ y(t_k), the following two steps are executed:
(A) Inner iteration: Solve the nonlinear equation for the unknown
Y = [Y₁ ... Y_m]ᵀ ∈ R^{mn}:

Y_i = y_k + h_k Σ_{j=1}^m a_ij f(t_k + c_j h_k, Y_j)   (i = 1, ..., m)
⇕
F(Y) = 0   (3)

(B) Calculate the next approximation y_{k+1} with the above Y:

y_{k+1} := y_k + h_k Σ_{j=1}^m b_j f(t_k + c_j h_k, Y_j)
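As an illustration, the two steps above can be sketched for a scalar ODE with the 2-stage Gauss formula of the next slide (order 4). As a simplification, the stage equations (A) are solved here by plain fixed-point iteration instead of the simplified Newton method used by the actual solver; all names below are invented for the example.

```python
import math

# 2-stage Gauss formula (order 4): Butcher coefficients
s3 = math.sqrt(3.0)
c = [0.5 - s3/6.0, 0.5 + s3/6.0]
A = [[0.25, 0.25 - s3/6.0],
     [0.25 + s3/6.0, 0.25]]
b = [0.5, 0.5]

def irk_step(f, t, y, h, tol=1e-14, maxit=100):
    """One IRK step for a scalar ODE: (A) solve the stage equations,
    here by plain fixed-point iteration, then (B) combine the stages."""
    Y = [y, y]                       # initial guess for the stage values
    for _ in range(maxit):
        Ynew = [y + h*sum(A[i][j]*f(t + c[j]*h, Y[j]) for j in range(2))
                for i in range(2)]
        done = max(abs(Ynew[i] - Y[i]) for i in range(2)) < tol
        Y = Ynew
        if done:
            break
    return y + h*sum(b[j]*f(t + c[j]*h, Y[j]) for j in range(2))

# integrate y' = -y, y(0) = 1 on [0, 1] with h = 0.1 (exact value: e^{-1})
y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    y = irk_step(lambda t, y: -y, t, y, h)
    t += h
```

Fixed-point iteration only converges for small enough h·L, which is exactly why stiff problems need the Newton-type inner iteration of the following slides.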
7. Coefficients of m stages Runge-Kutta method
We use IRK coefficients arranged in the Butcher tableau:

c₁ | a₁₁ · · · a₁m
 ⋮ |  ⋮         ⋮
c_m | a_m1 · · · a_mm
----+--------------
    | b₁  · · · b_m

i.e. (c, A, bᵀ).   (4)

Our IRK solver uses only the Gauss formula family, which consists of
fully implicit formulas (a_ij ≠ 0 for i ≤ j).
8. Simplified Newton method as inner iteration of IRK
method
RADAU5 (by Hairer) and SPARK3 (by Jay) use the simplified Newton
method as inner iteration to solve the nonlinear equation (3).
Simplified Newton method:

Y_{l+1} := Y_l − (I_m ⊗ I_n − h_k A ⊗ J)⁻¹ F(Y_l)   (5)

where I_n and I_m are the n × n and m × m identity matrices,
respectively, and J = ∂f/∂y(t_k, y_k) ∈ R^{n×n} is the Jacobian
matrix of f.
⇒ We must solve the following linear equation in each iteration of
the simplified Newton method (5):

(I_m ⊗ I_n − h_k A ⊗ J) Z = −F(Y_l)   (6)

and then obtain the solution Z and calculate Y_{l+1} := Y_l + Z.
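A minimal illustration of iteration (5)-(6) for a scalar ODE (n = 1), where I_m ⊗ I_n − h_k A ⊗ J collapses to an m × m matrix. This sketch uses the 2-stage Gauss formula and solves the resulting 2 × 2 system directly; the real solver works on the mn-dimensional Kronecker system instead.

```python
import math

# 2-stage Gauss coefficients, as in slide 7
s3 = math.sqrt(3.0)
c = [0.5 - s3/6.0, 0.5 + s3/6.0]
A = [[0.25, 0.25 - s3/6.0],
     [0.25 + s3/6.0, 0.25]]
b = [0.5, 0.5]

def solve2(M, r):
    """Solve a 2x2 linear system M z = r by Cramer's rule."""
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [(M[1][1]*r[0] - M[0][1]*r[1])/det,
            (M[0][0]*r[1] - M[1][0]*r[0])/det]

def irk_step_newton(f, dfdy, t, y, h, tol=1e-14, maxit=50):
    """One 2-stage Gauss step for a scalar ODE using the simplified
    Newton iteration (5): the matrix I - h*A*J is frozen at (t, y)."""
    J = dfdy(t, y)
    M = [[(1.0 if i == j else 0.0) - h*A[i][j]*J for j in range(2)]
         for i in range(2)]
    Y = [y, y]                       # initial stage guess
    for _ in range(maxit):
        F = [Y[i] - y - h*sum(A[i][j]*f(t + c[j]*h, Y[j]) for j in range(2))
             for i in range(2)]
        Z = solve2(M, [-F[0], -F[1]])          # eq. (6) for n = 1
        Y = [Y[i] + Z[i] for i in range(2)]
        if max(abs(z) for z in Z) < tol:
            break
    return y + h*sum(b[j]*f(t + c[j]*h, Y[j]) for j in range(2))

# y' = -y again: f is linear, so the frozen-Jacobian Newton step is exact
y, t, h = 1.0, 0.0, 0.1
for _ in range(10):
    y = irk_step_newton(lambda t, y: -y, lambda t, y: -1.0, t, y, h)
    t += h
```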
9. Why do we select SPARK3 reduction, not RADAU5?
RADAU5: complex diagonalization of A by a complex similarity
transformation matrix S:

(S ⊗ I_n)(I_m ⊗ I_n − hA ⊗ J)(S⁻¹ ⊗ I_n) = I_m ⊗ I_n − hΛ ⊗ J
 = diag(I_n − hλ₁J, ..., I_n − hλ_m J).

SPARK3: real tridiagonalization of A by a real similarity
transformation matrix W:

X = Wᵀ B A W = tridiagonal matrix with diagonal (1/2, 0, ..., 0),
superdiagonal (−ζ₁, ..., −ζ_{m−1}), and subdiagonal (ζ₁, ..., ζ_{m−1}),

where W = [w_ij] = [P̃_{j−1}(c_i)] (i, j = 1, 2, ..., m),
ζ_i = (2√(4i² − 1))⁻¹ (i = 1, 2, ..., m − 1),
B = diag(b), and I_m = Wᵀ B W = diag(1 1 · · · 1).
10. Condition numbers of two kinds of similarity
transformation matrices
m      |  3   |  5   |  10    |  15    |  20     |  50
κ∞(S)  | 22.0 | 388  | 3×10⁵  | 3×10⁸  | 2×10¹¹  | 4×10²⁸
κ∞(W)  | 3.24 | 6.27 | 16.4   | 29.3   | 44.5    | 172

▶ RADAU5's S has condition numbers (κ∞(S) = ∥S∥∞∥S⁻¹∥∞) that grow
rapidly with the number of stages of the IRK formulas.
▶ SPARK3's W has condition numbers (κ∞(W) = ∥W∥∞∥W⁻¹∥∞) that grow
only mildly.
⇒ SPARK3 reduction is the only practical choice for many-stage
IRK formulas.
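The κ∞(W) row of the table can be reproduced independently. A small check for m = 3 (Gauss nodes and weights hard-coded), using the identity W⁻¹ = WᵀB, which follows from WᵀBW = I_m on the previous slide:

```python
import math

# 3-stage Gauss nodes and weights on [0, 1]
s15 = math.sqrt(15.0)
c = [(5.0 - s15)/10.0, 0.5, (5.0 + s15)/10.0]
b = [5.0/18.0, 8.0/18.0, 5.0/18.0]

def p_tilde(k, x):
    """Shifted Legendre polynomial of degree k on [0,1], normalized so
    that the W built from it satisfies W^T B W = I."""
    t = 2.0*x - 1.0
    P = [1.0, t, 0.5*(3.0*t*t - 1.0)][k]
    return math.sqrt(2.0*k + 1.0)*P

W = [[p_tilde(j, c[i]) for j in range(3)] for i in range(3)]
Winv = [[W[j][i]*b[j] for j in range(3)] for i in range(3)]  # W^T B

def norm_inf(M):
    """Infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

kappa = norm_inf(W)*norm_inf(Winv)   # should match the 3.24 table entry
```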
11. SPARK3 Reduction(1/3)
The coefficient matrix of the linear equation after SPARK3 reduction
is:

(Wᵀ B ⊗ I_n)(I_m ⊗ I_n − h_k A ⊗ J)(W ⊗ I_n) = I_m ⊗ I_n − h_k X ⊗ J,

a block tridiagonal matrix with diagonal blocks E₁, ..., E_m,
superdiagonal blocks F₁, ..., F_{m−1}, and subdiagonal blocks
G₁, ..., G_{m−1}, where

E₁ = I_n − (1/2) h_k J,  E₂ = · · · = E_m = I_n,
F_i = h_k ζ_i J,  G_i = −h_k ζ_i J  (i = 1, 2, ..., m − 1).
12. SPARK3 Reduction(2/3)
Jay proposed a left preconditioning matrix P for the linear solver
such that

P = block tridiagonal matrix with diagonal blocks Ẽ₁, ..., Ẽ_m,
superdiagonal blocks F₁, ..., F_{m−1}, and subdiagonal blocks
G₁, ..., G_{m−1}
  ≈ I_m ⊗ I_n − h_k X ⊗ J,

so the preconditioned linear equation to be solved for Z is

P⁻¹(I_m ⊗ I_n − h_k X ⊗ J) Z = P⁻¹(Wᵀ B ⊗ I_n)(−F(Y)).
13. SPARK3 Reduction (3/3)
We use the LU decomposed P:

P = L · U

where L is block unit lower bidiagonal with subdiagonal blocks
G_i H̃_i⁻¹ (i = 1, ..., m − 1), U is block upper bidiagonal with
diagonal blocks H̃₁, ..., H̃_m and superdiagonal blocks
F₁, ..., F_{m−1}, and

H̃_i := I_n − (2(2i − 1))⁻¹ h J  (i = 1, 2, ..., m).

(cf.) "A Parallelizable Preconditioner for the Iterative Solution of Implicit Runge-Kutta-type Methods", Journal of
Computational and Applied Mathematics 111 (1999), pp. 63-76.
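For the scalar case n = 1, where each block collapses to a number, applying P⁻¹ by this factorization is a short forward/back substitution. A sketch with ζ_i and H̃_i as defined above (the inputs m, h, J are illustrative):

```python
import math

def precond_solve(h, J, r):
    """Solve P z = r for the scalar case n = 1, via forward substitution
    on the unit lower bidiagonal factor (subdiagonal entries G_i/H_i)
    and back substitution on the upper bidiagonal factor (diagonal H_i,
    superdiagonal F_i)."""
    m = len(r)
    zeta = [1.0/(2.0*math.sqrt(4.0*i*i - 1.0)) for i in range(1, m)]
    H = [1.0 - h*J/(2.0*(2.0*i - 1.0)) for i in range(1, m + 1)]
    F = [h*z*J for z in zeta]       # superdiagonal entries
    G = [-h*z*J for z in zeta]      # subdiagonal entries
    v = [0.0]*m                     # forward sweep: L v = r
    v[0] = r[0]
    for i in range(1, m):
        v[i] = r[i] - G[i-1]/H[i-1]*v[i-1]
    z = [0.0]*m                     # back substitution: U z = v
    z[m-1] = v[m-1]/H[m-1]
    for i in range(m - 2, -1, -1):
        z[i] = (v[i] - F[i]*z[i+1])/H[i]
    return z
```

Each H̃_i is a small independent factorization, which is what makes this preconditioner parallelizable across stages.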
14. Mixed precision iterative refinement method
The mixed precision iterative refinement method reduces
computational cost by combining short S-digit arithmetic with long
L-digit arithmetic (S ≪ L).
The linear equation to be solved: Cx = d, C ∈ R^{N×N}, d, x ∈ R^N
⇒
(L) Solve Cx₀ = d for x₀.
For ν = 0, 1, 2, ...
  (L) r_ν := d − Cx_ν
  (S) r′_ν := r_ν/∥r_ν∥
  (S) Solve Cz = r′_ν for z.
  (L) x_{ν+1} := x_ν + ∥r_ν∥ z
  Check convergence.
⇒ x := x_{ν_stop}
(cf.) Buttari, Alfredo, et al. International Journal of High Performance
Computing Applications 21.4 (2007): 457-466.
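As a toy illustration of the loop above, IEEE single precision can stand in for the short S-digit format and double precision for the long L-digit format; the 2 × 2 system and all helper names are invented for the example.

```python
import struct

def f32(x):
    """Round a double to IEEE binary32: a stand-in for short S-digit
    arithmetic (long L-digit arithmetic is plain double here)."""
    return struct.unpack('f', struct.pack('f', x))[0]

def solve2_short(C, d):
    """Solve a 2x2 system by Cramer's rule with every operation rounded
    to the short format."""
    a, b = f32(C[0][0]), f32(C[0][1])
    c, e = f32(C[1][0]), f32(C[1][1])
    d0, d1 = f32(d[0]), f32(d[1])
    det = f32(f32(a*e) - f32(b*c))
    return [f32(f32(f32(e*d0) - f32(b*d1))/det),
            f32(f32(f32(a*d1) - f32(c*d0))/det)]

def mixed_refine(C, d, sweeps=5):
    """Mixed precision iterative refinement: solves run in (S), the
    residual and the update run in (L), as on the slide."""
    x = solve2_short(C, d)                       # (S) initial solve
    for _ in range(sweeps):
        r = [d[i] - sum(C[i][j]*x[j] for j in range(2))   # (L) residual
             for i in range(2)]
        nr = max(abs(v) for v in r)
        if nr == 0.0:
            break
        rs = [v/nr for v in r]                   # scale, as on the slide
        z = solve2_short(C, rs)                  # (S) correction solve
        x = [x[i] + nr*z[i] for i in range(2)]   # (L) update
    return x

C = [[4.0, 1.0], [1.0, 3.0]]
d = [1.0, 2.0]
x = mixed_refine(C, d)    # exact solution of this system: [1/11, 7/11]
```

Although every linear solve is done in short precision, the long-precision residuals drive the iterates down to long-precision accuracy, which is the whole point of the method.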
15. The whole algorithm of accelerated IRK method
Initial guess: Y₋₁ ∈ R^{mn}
For l = 0, 1, 2, ... (simplified Newton iteration)
 (1) Y_l := [Y₁^{(l)} Y₂^{(l)} ... Y_m^{(l)}]ᵀ
 (2) C := I_m ⊗ I_n − h_k X ⊗ J, compute ∥C∥_F
 (3) d := (Wᵀ B ⊗ I_n)(−F(Y_l))
 (4) Solve Cx₀ = d for x₀ (S)
 For ν = 0, 1, 2, ... (mixed precision iterative refinement)
  (5) r_ν := d − Cx_ν
  (6-1) r′_ν := r_ν/∥r_ν∥ (S)
  (6-2) Solve Cz = r′_ν for z (S)
  (6-3) x_{ν+1} := x_ν + ∥r_ν∥ z
  (6-4) Check convergence ⇒ x_{ν_stop}
 (7) Y_{l+1} := Y_l + (W ⊗ I_n) x_{ν_stop}
 Check convergence ⇒ Y_{l_stop}
Y := Y_{l_stop} = [Y₁ Y₂ ... Y_m]ᵀ
y_{k+1} := y_k + h_k Σ_{j=1}^m b_j f(t_k + c_j h_k, Y_j)
16. Computational environment
H/W Intel Core i7 3820 (4 cores) 3.6GHz + 64GB RAM
OS Scientific Linux 6.3 x86 64
S/W Intel C++ 13.0.1, MPFR 3.1.1/GMP 5.1.1,
BNCpack 0.8
▶ OpenMP as provided by the Intel C++ compiler.
▶ Block parallelization of the parallelizable parts of the IRK
methods, except the left preconditioning and the direct linear
solver.
18. Stepsize selection by embedded formula (1/2)
Embedded formula for IRK methods (by Hairer): the following
(m + 1)-stage IRK formula for a given constant γ₀:

0 | 0   0ᵀ
c | 0   A
--+---------
  | γ₀  b̂ᵀ

In order to extend the A-stable region, we select γ₀ = 1/8, where
b̂ = [b̂₁ · · · b̂_m]ᵀ is obtained by solving the following linear
equation derived from the simplifying assumption B(m):

Σ_{j=1}^m b̂_j = 1 − γ₀,  Σ_{j=1}^m b̂_j c_j^{k−1} = 1/k  (k = 2, ..., m).
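For m = 2 the system B(m) is tiny and can be checked directly. The sketch below computes b̂ for the 2-stage Gauss nodes with γ₀ = 1/8 (an illustrative case; the talk uses far larger m):

```python
import math

gamma0 = 1.0/8.0
s3 = math.sqrt(3.0)
c = [0.5 - s3/6.0, 0.5 + s3/6.0]   # 2-stage Gauss nodes

# B(2) conditions: bhat_1 + bhat_2 = 1 - gamma0,
#                  bhat_1*c_1 + bhat_2*c_2 = 1/2.
# Solve the 2x2 Vandermonde system [[1, 1], [c1, c2]] bhat = rhs.
rhs = [1.0 - gamma0, 0.5]
det = c[1] - c[0]
bhat = [(rhs[0]*c[1] - rhs[1])/det,
        (rhs[1] - rhs[0]*c[0])/det]
```

For large m a Vandermonde system like this is itself badly conditioned, which is one more place where multiple precision arithmetic pays off.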
19. Stepsize selection by embedded formula (2/2)
By using this embedded formula, we can obtain ŷ_{k+1} as follows:

ŷ_{k+1} := y_k + h_k γ₀ f(t_k, y_k) + h_k Σ_{j=1}^m b̂_j f(t_k + c_j h_k, Y_j).

We then use ŷ_{k+1} in the following local error estimator err_k:

∥err_k∥ = √( (1/n) Σ_{i=1}^n ( |ŷ_i^{(k+1)} − y_i^{(k+1)}| / (ATOL + RTOL max(|y_i^{(k)}|, |y_i^{(k+1)}|)) )² )

where ATOL is the absolute tolerance and RTOL the relative tolerance
given by the user.
This estimator is used to predict the next stepsize h_{k+1}:

h_{k+1} := 0.9 ∥err_k∥^{−1/(m+1)} h_k
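The estimator and the controller can be sketched as follows; the estimator is written as the usual weighted RMS norm, and the controller exponent −1/(m+1) is assumed from the standard stepsize prediction formula. All names are invented for the example.

```python
import math

def error_norm(y_k, y_k1, y_hat, atol, rtol):
    """Weighted RMS local error estimator built from y_{k+1} and the
    embedded solution yhat_{k+1}."""
    n = len(y_k1)
    s = 0.0
    for i in range(n):
        w = atol + rtol*max(abs(y_k[i]), abs(y_k1[i]))
        s += ((y_hat[i] - y_k1[i])/w)**2
    return math.sqrt(s/n)

def next_stepsize(err, h, m, fac=0.9):
    """Controller: h_{k+1} = fac * err^(-1/(m+1)) * h_k, so err = 1
    keeps the stepsize up to the safety factor fac."""
    return fac*err**(-1.0/(m + 1.0))*h
```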
22. 1D Kuramoto-Sivashinsky Equation: Discretization
method(2/2)
⇓ Discretization by the pseudospectral method

Û_j(t) = (1/L) ∫₀^L U(x, t) exp(−i q_j x) dx
U(x, t) = Σ_{j∈Z} Û_j(t) exp(i q_j x)

dÛ_j/dt = ((q_j)² − (q_j)⁴) Û_j − (i q_j / 2) (U · U)_j  (j ∈ Z)

⇓ Truncating at N = 1024, we obtain an ODE system for y(t) = {y_j(t)}:

dy_j/dt = ((q_j)² − (q_j)⁴) y_j − (i q_j / 2) F_N(F_N⁻¹ y · F_N⁻¹ y)_j  (j = 1, 2, ..., N/2 − 1)

where F_N and F_N⁻¹ denote the FFT and inverse FFT, respectively.
⋆ The multiple precision real FFT and inverse real FFT routines are
derived from Ooura's double precision C routines:
http://www.kurims.kyoto-u.ac.jp/~ooura/fftman/ftmn2_12.htm
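A schematic sketch of the truncated right-hand side, with a naive O(N²) complex DFT standing in for the multiple precision real FFT. The real-transform symmetry and the exact mode layout of the actual solver are simplified away; the domain length L, the wavenumber convention q_j = 2πj/L, and the test sizes are illustrative assumptions.

```python
import cmath

def dft(u):
    """Naive O(N^2) discrete Fourier transform (F_N)."""
    N = len(u)
    return [sum(u[n]*cmath.exp(-2j*cmath.pi*k*n/N) for n in range(N))
            for k in range(N)]

def idft(U):
    """Naive inverse DFT (F_N^{-1})."""
    N = len(U)
    return [sum(U[k]*cmath.exp(2j*cmath.pi*k*n/N) for k in range(N))/N
            for n in range(N)]

def ks_rhs(yhat, L):
    """dy_j/dt = (q_j^2 - q_j^4) y_j - (i q_j / 2) F_N(u*u)_j, where
    u = F_N^{-1}(yhat): linear term diagonal in Fourier space, nonlinear
    term formed on the grid and transformed back."""
    N = len(yhat)
    u = idft(yhat)
    uu = dft([v*v for v in u])
    out = []
    for j in range(N):
        q = 2.0*cmath.pi*j/L
        out.append((q**2 - q**4)*yhat[j] - 0.5j*q*uu[j])
    return out
```

Because the linear term (q² − q⁴) grows like q⁴, high modes make this system extremely stiff, which is why the IRK machinery of the previous slides is needed here.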
23. K-S eq. : Numerical values by Multiple precision and
RADAU5(Double precision)
24. K-S eq. : Relative Errors of RADAU5 (Double precision)
25. K-S eq.: Computational Times by using variable #stages
IRK formulas
4 threads; the true solution computed with the 80-stage formula in
100 decimal digits; t = 10

80 dec. digits, RTOL = ATOL = 10⁻⁶⁰
# stages (m)    |    20    |    30    |    40    |    50
Comp. time (s)  | 130165.4 | 160601.8 | 133541.0 | 190131.4
# steps         |   6911   |   2667   |   1103   |    856
Average (s)     |   18.8   |   60.2   |  121.1   |  222.1
Max. rel. error |  4.2E-38 |  2.7E-38 |  1.4E-38 |  2.0E-36
Min. rel. error |  1.1E-54 |  1.8E-50 |  1.7E-52 |  1.0E-63

RTOL = ATOL = 10⁻⁷⁰
# stages (m)    |    20    |    30    |    40    |    50
Comp. time (s)  | 100695.2 |  86331.4 | 137232.9 | 200454.8
# steps         |   6978   |   1738   |   1175   |    918
Average (s)     |   14.4   |   49.7   |  116.8   |  218.4
Max. rel. error |  4.4E-48 |  1.9E-49 |  2.4E-47 |  5.1E-47
Min. rel. error |  2.7E-68 |  5.9E-68 |  1.3E-62 |  1.2E-68
26. Conclusion
▶ We implemented accelerated multiple precision IRK methods with
the DP-MP mixed precision iterative refinement method and the
SPARK3 reduction in the inner simplified Newton iteration.
▶ Parallelization can reduce the computational cost.
▶ Our ODE solver is suitable for solving complex evolutionary PDEs
such as the Brusselator problem and the 1D Kuramoto-Sivashinsky
equation.
27. Future work
We have the following plans:
1. Seek a higher performance ODE solver in massively parallel
computing environments such as GPGPU or Intel MIC.
2. Implement stable double precision linear solvers such as
GMRES(m) or other stable Krylov subspace methods.
3. Solve many other problems with our ODE solver.
A part of our implemented ODE solver is published as BIRK (extended
BNCpack for Implicit Runge-Kutta methods) on our Web site:
http://na-inet.jp/na/birk/