On Numerical Properties of Accelerated Multiple
Precision Implicit Runge-Kutta Methods
Shizuoka Institute of Science and Technology
Tomonori Kouya
http://na-inet.jp/na/birk/
SciCADE2013 in Valladolid, SPAIN
2013-09-16(Mon) – 20(Fri)
Abstract
Motivation
IRK method with simplified Newton method
Acceleration of inner iteration and stepsize selection
Performance check by solving linear ODE
Numerical experiments of Evolutionary PDEs
Conclusion and Future work
Motivation
BNCpack
▶ provides double and multiple precision numerical algorithms
based on MPFR/GMP.
▶ has simple explicit and implicit Runge-Kutta (IRK) methods
and extrapolation methods for solving ODEs.
⇓
At SciCADE 2007, a gentleman suggested to us that the
Kuramoto-Sivashinsky (K-S) equation would suit our multiple
precision ODE solvers, because it is a chaotic, stiff and
large-scale example of an ODE system.
⇓
Accelerated multiple precision IRK methods based on MPFR/GMP
are necessary to solve it.
The Features of Accelerated IRK methods
1. They use the Gauss formulas, which have order 2m for m stages,
are A-stable and P-stable, and are symplectic.
2. Mixed precision iterative refinement inside the simplified
Newton iteration of the IRK process can drastically reduce
computational time.
3. Parallelization with OpenMP can raise performance further.
IVP of n dimensional ODE to be solved
$$
\begin{cases}
\dfrac{dy}{dt} = f(t, y) \in \mathbb{R}^n \\
y(t_0) = y_0
\end{cases}
\qquad \text{Integration interval: } [t_0, \alpha]
\tag{1}
$$
We suppose that the ODE above has a unique solution; that is, there
exists a Lipschitz constant $L > 0$ satisfying
$$
\|f(t, v) - f(t, w)\| \le L \|v - w\| \tag{2}
$$
for $\forall v, w \in \mathbb{R}^n$, $\forall t \in [t_0, \alpha]$.
⇓
The 1D Brusselator problem and the K-S equation have large Lipschitz
constants (L >> 1), so they are called "stiff problems."
Skeleton of m stages IRK methods
Discretization: $t_0$, $t_1 := t_0 + h_0$, ..., $t_{k+1} := t_k + h_k$, ...
When we calculate the approximation $y_{k+1} \approx y(t_{k+1})$ from the
previous $y_k \approx y(t_k)$, the following two steps are executed:
(A) Inner iteration: Solve the nonlinear equation for the unknown
$Y = [Y_1 \ \ldots \ Y_m]^T \in \mathbb{R}^{mn}$.
$$
\begin{cases}
Y_1 = y_k + h_k \sum_{j=1}^{m} a_{1j} f(t_k + c_j h_k, Y_j) \\
\quad \vdots \\
Y_m = y_k + h_k \sum_{j=1}^{m} a_{mj} f(t_k + c_j h_k, Y_j)
\end{cases}
\iff F(Y) = 0 \tag{3}
$$
(B) Calculate the next approximation $y_{k+1}$ with the above Y.
$$
y_{k+1} := y_k + h_k \sum_{j=1}^{m} b_j f(t_k + c_j h_k, Y_j)
$$
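Coefficients of m stages Runge-Kutta method
We use IRK coefficients such as:
$$
\begin{array}{c|ccc}
c_1 & a_{11} & \cdots & a_{1m} \\
\vdots & \vdots & & \vdots \\
c_m & a_{m1} & \cdots & a_{mm} \\ \hline
 & b_1 & \cdots & b_m
\end{array}
=
\begin{array}{c|c}
c & A \\ \hline
 & b^T
\end{array}
\tag{4}
$$
Our IRK solver uses only the Gauss formula family, which is one family of
fully implicit IRK formulas ($a_{ij} \neq 0$ for $i \le j$).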
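A minimal double precision sketch of how the tableau (4) can be generated for the Gauss family. It assumes the standard construction: c and b are the Gauss-Legendre nodes and weights shifted to [0, 1], and A follows from the collocation condition C(m). The function name `gauss_irk_coefficients` is hypothetical, and the talk's solver computes these coefficients in multiple precision instead.

```python
import numpy as np

def gauss_irk_coefficients(m):
    """Return (c, A, b) of the m-stage Gauss formula in double precision."""
    # Gauss-Legendre nodes and weights on [-1, 1], shifted to [0, 1].
    x, w = np.polynomial.legendre.leggauss(m)
    c = 0.5 * (x + 1.0)
    b = 0.5 * w
    # Collocation condition C(m): sum_j a_ij c_j^(k-1) = c_i^k / k, k = 1..m.
    V = np.vander(c, m, increasing=True)      # V[j, k-1] = c_j^(k-1)
    k = np.arange(1, m + 1)
    R = c[:, None] ** k / k                   # R[i, k-1] = c_i^k / k
    A = np.linalg.solve(V.T, R.T).T           # solves A V = R
    return c, A, b

c, A, b = gauss_irk_coefficients(2)
print(c)
print(A)
print(b)
```

The Vandermonde solve is adequate for small m in double precision; for the large stage numbers used later in the talk it would need higher precision arithmetic.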
Simplified Newton method as inner iteration of IRK method
RADAU5 (by Hairer) and SPARK3 (by Jay) use the simplified Newton
method as the inner iteration to solve the nonlinear equation (3).
Simplified Newton Method:
$$
Y_{l+1} := Y_l - (I_m \otimes I_n - h_k A \otimes J)^{-1} F(Y_l) \tag{5}
$$
where $I_n$ and $I_m$ are the n × n and m × m unit matrices respectively, and
$J = \partial f/\partial y(t_k, y_k) \in \mathbb{R}^{n \times n}$ is the Jacobian matrix of f.
⇒ We must solve the following linear equation in each iteration of
the simplified Newton method (5):
$$
(I_m \otimes I_n - h_k A \otimes J) Z = -F(Y_l) \tag{6}
$$
and then, with the solution Z, calculate $Y_{l+1} := Y_l + Z$.
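The iteration (5)-(6) can be sketched in double precision as follows. This is an illustration only: the 2-stage Gauss coefficients are hard-coded, $F(Y)$ is written as $Y - (\mathbf{1} \otimes y_k) - h_k (A \otimes I_n) f(Y)$ (one way of reading (3)), the Kronecker matrix is formed densely without any reduction, and the names `irk_step`, `tol` and `max_iter` are hypothetical.

```python
import numpy as np

# 2-stage Gauss formula (order 4), hard-coded for this sketch.
S3 = np.sqrt(3.0)
C_NODES = np.array([0.5 - S3 / 6, 0.5 + S3 / 6])
A = np.array([[0.25, 0.25 - S3 / 6], [0.25 + S3 / 6, 0.25]])
B = np.array([0.5, 0.5])

def irk_step(f, jac, t, y, h, tol=1e-13, max_iter=50):
    """One IRK step: simplified Newton for F(Y) = 0, then the update (B)."""
    m, n = len(B), len(y)
    J = jac(t, y)                               # Jacobian frozen at (t_k, y_k)
    M = np.eye(m * n) - h * np.kron(A, J)       # I_m (x) I_n - h A (x) J
    Y = np.tile(y, m)                           # initial guess Y_j = y_k

    def stage_f(Y):
        return np.concatenate(
            [f(t + C_NODES[j] * h, Y[j * n:(j + 1) * n]) for j in range(m)])

    for _ in range(max_iter):
        F = Y - np.tile(y, m) - h * np.kron(A, np.eye(n)) @ stage_f(Y)
        Z = np.linalg.solve(M, -F)              # linear equation (6)
        Y = Y + Z
        if np.linalg.norm(Z) <= tol * max(1.0, np.linalg.norm(Y)):
            break
    return y + h * B @ stage_f(Y).reshape(m, n)

# Usage: one step of y' = -y from y(0) = 1, compared with exp(-h).
y1 = irk_step(lambda t, y: -y, lambda t, y: -np.eye(1), 0.0, np.array([1.0]), 0.1)
print(y1[0], np.exp(-0.1))
```

For a linear problem the frozen Jacobian is exact, so the loop stops after the second pass; for nonlinear f the convergence is only linear, which is the usual price of the simplified Newton method.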
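Why do we select SPARK3 reduction, not RADAU5?
RADAU5: Complex diagonalization of A by a complex similarity
transformation matrix S
$$
(S \otimes I_n)(I_m \otimes I_n - hA \otimes J)(S^{-1} \otimes I_n)
= I_m \otimes I_n - h\Lambda \otimes J
= \begin{bmatrix}
I_n - h\lambda_1 J & & \\
 & \ddots & \\
 & & I_n - h\lambda_m J
\end{bmatrix}.
$$
SPARK3: Real tridiagonalization of A by a real similarity transformation
matrix W
$$
X = W^T B A W =
\begin{bmatrix}
1/2 & -\zeta_1 & & & \\
\zeta_1 & 0 & \ddots & & \\
 & \ddots & \ddots & -\zeta_{m-2} & \\
 & & \zeta_{m-2} & 0 & -\zeta_{m-1} \\
 & & & \zeta_{m-1} & 0
\end{bmatrix}
$$
where
$$
W = [w_{ij}] = [\tilde{P}_{j-1}(c_i)] \quad (i, j = 1, 2, ..., m), \qquad
\zeta_i = \left( 2\sqrt{4i^2 - 1} \right)^{-1} \quad (i = 1, 2, ..., m - 1),
$$
$$
B = \mathrm{diag}(b), \qquad I_m = W^T B W = \mathrm{diag}(1 \ 1 \ \cdots \ 1).
$$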
Condition numbers of two kinds of similarity transformation matrices
m        3      5      10         15         20          50
κ∞(S)    22.0   388    3 × 10^5   3 × 10^8   2 × 10^11   4 × 10^28
κ∞(W)    3.24   6.27   16.4       29.3       44.5        172
▶ The condition number of RADAU5's S
(κ∞(S) = ∥S∥∞∥S−1∥∞) grows rapidly with the number of stages
of the IRK formula.
▶ The condition number of SPARK3's W (κ∞(W) = ∥W∥∞∥W−1∥∞)
grows only mildly.
=⇒ SPARK3 reduction is the only practical choice for IRK formulas
with many stages.
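The κ∞(W) row can be checked with a short double precision sketch. It assumes that $\tilde{P}_k$ is the shifted Legendre polynomial normalized as $\tilde{P}_k(x) = \sqrt{2k+1}\, P_k(2x - 1)$, which the slides do not state explicitly; the function names are hypothetical.

```python
import numpy as np

def w_matrix(m):
    """W = [P~_{j-1}(c_i)] at the Gauss nodes, plus the Gauss weights b."""
    x, w = np.polynomial.legendre.leggauss(m)          # nodes/weights on [-1, 1]
    b = 0.5 * w                                        # weights on [0, 1]
    # Since c_i = (x_i + 1) / 2, we have 2 c_i - 1 = x_i.
    P = np.polynomial.legendre.legvander(x, m - 1)     # P[i, k] = P_k(x_i)
    # Assumed normalization: P~_k(c) = sqrt(2k + 1) * P_k(2c - 1).
    return P * np.sqrt(2.0 * np.arange(m) + 1.0), b

def kappa_inf(M):
    """Condition number in the infinity norm."""
    return np.linalg.norm(M, np.inf) * np.linalg.norm(np.linalg.inv(M), np.inf)

for m in (3, 5, 10):
    W, b = w_matrix(m)
    orth_err = np.abs(W.T @ np.diag(b) @ W - np.eye(m)).max()   # W^T B W = I_m
    print(m, kappa_inf(W), orth_err)
```

Under this normalization the values for m = 3 and m = 5 come out at about 3.24 and 6.27, in line with the table, which supports the assumed definition of $\tilde{P}_k$.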
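SPARK3 Reduction (1/3)
The coefficient matrix of the linear equation for SPARK3 reduction
is:
$$
(W^T B \otimes I_n)(I_m \otimes I_n - h_k A \otimes J)(W \otimes I_n)
= I_m \otimes I_n - h_k X \otimes J =
\begin{bmatrix}
E_1 & F_1 & & & \\
G_1 & E_2 & F_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & G_{m-2} & E_{m-1} & F_{m-1} \\
 & & & G_{m-1} & E_m
\end{bmatrix}
$$
where
$$
E_1 = I_n - \frac{1}{2} h_k J, \qquad E_2 = \cdots = E_m = I_n,
$$
$$
F_i = h_k \zeta_i J, \qquad G_i = -h_k \zeta_i J \quad (i = 1, 2, ..., m - 1).
$$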
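The block tridiagonal structure above can be verified numerically. The sketch below is a double precision check under two assumptions: A is built from the collocation condition C(m) at the Gauss nodes, and $\tilde{P}_k(x) = \sqrt{2k+1}\, P_k(2x - 1)$. The random matrix `J` is only a stand-in for a Jacobian, and the helper name `spark3_matrices` is hypothetical.

```python
import numpy as np

def spark3_matrices(m):
    """Return (A, b, W, X) for the m-stage Gauss formula, X = W^T B A W."""
    x, w = np.polynomial.legendre.leggauss(m)
    c, b = 0.5 * (x + 1.0), 0.5 * w
    V = np.vander(c, m, increasing=True)
    k = np.arange(1, m + 1)
    A = np.linalg.solve(V.T, (c[:, None] ** k / k).T).T        # condition C(m)
    W = np.polynomial.legendre.legvander(x, m - 1) * np.sqrt(2.0 * np.arange(m) + 1.0)
    return A, b, W, W.T @ np.diag(b) @ A @ W

m, n, h = 4, 3, 0.1
J = np.random.default_rng(0).standard_normal((n, n))            # stand-in Jacobian
A, b, W, X = spark3_matrices(m)

# Expected tridiagonal X: 1/2 in the corner, -zeta_i above, +zeta_i below.
i = np.arange(1, m)
zeta = 1.0 / (2.0 * np.sqrt(4.0 * i**2 - 1.0))
X_expected = np.diag(-zeta, 1) + np.diag(zeta, -1)
X_expected[0, 0] = 0.5

# Both sides of the SPARK3 identity.
I_n = np.eye(n)
lhs = np.kron(W.T @ np.diag(b), I_n) @ (np.eye(m * n) - h * np.kron(A, J)) @ np.kron(W, I_n)
C = np.eye(m * n) - h * np.kron(X_expected, J)

print(np.abs(X - X_expected).max(), np.abs(lhs - C).max())
```

Both printed differences should be at rounding level, which confirms the signs of $F_i$ and $G_i$: the entry $-\zeta_i$ of X produces $+h_k \zeta_i J$ in the upper block.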
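SPARK3 Reduction (2/3)
Jay proposed the following left preconditioner matrix P for the linear
solver:
$$
P =
\begin{bmatrix}
\tilde{E}_1 & F_1 & & & \\
G_1 & \tilde{E}_2 & F_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & G_{m-2} & \tilde{E}_{m-1} & F_{m-1} \\
 & & & G_{m-1} & \tilde{E}_m
\end{bmatrix}
\approx I_m \otimes I_n - h_k X \otimes J
$$
so the preconditioned linear equation to be solved for Z is
$$
P^{-1}(I_m \otimes I_n - h_k X \otimes J) Z = P^{-1}(W^T B \otimes I_n)(-F(Y)).
$$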
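SPARK3 Reduction (3/3)
We use P in LU-decomposed form:
$$
P =
\begin{bmatrix}
I_n & & & & \\
G_1 \tilde{H}_1^{-1} & I_n & & & \\
 & \ddots & \ddots & & \\
 & & G_{m-2} \tilde{H}_{m-2}^{-1} & I_n & \\
 & & & G_{m-1} \tilde{H}_{m-1}^{-1} & I_n
\end{bmatrix}
\times
\begin{bmatrix}
\tilde{H}_1 & F_1 & & & \\
 & \tilde{H}_2 & F_2 & & \\
 & & \ddots & \ddots & \\
 & & & \tilde{H}_{m-1} & F_{m-1} \\
 & & & & \tilde{H}_m
\end{bmatrix}
$$
where
$$
\tilde{H}_i := I_n - (2(2i - 1))^{-1} hJ \quad (i = 1, 2, ..., m).
$$
(cf.) "A Parallelizable Preconditioner for the Iterative Solution of Implicit Runge-Kutta-type Methods", Journal of
Computational and Applied Mathematics 111 (1999), pp. 63-76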
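Multiplying the two factors shows that the off-diagonal blocks of P are exactly $F_i$ and $G_i$, while the diagonal blocks are $\tilde{E}_1 = \tilde{H}_1$ and $\tilde{E}_i = \tilde{H}_i + G_{i-1} \tilde{H}_{i-1}^{-1} F_{i-1}$ for $i \ge 2$. Applying $P^{-1}$ therefore needs only solves with the n × n matrices $\tilde{H}_i$. A double precision sketch of that block forward and back substitution (the function name `apply_P_inverse` is hypothetical, and a real implementation would factor each $\tilde{H}_i$ once and reuse it):

```python
import numpy as np

def apply_P_inverse(h, J, m, r):
    """Solve P z = r with P = L U as above, using only solves with H~_i."""
    n = J.shape[0]
    I_n = np.eye(n)
    zeta = [1.0 / (2.0 * np.sqrt(4.0 * i * i - 1.0)) for i in range(1, m)]
    H = [I_n - h / (2.0 * (2 * i - 1)) * J for i in range(1, m + 1)]   # H~_1..H~_m
    F = [h * z * J for z in zeta]                                      # F_1..F_{m-1}
    G = [-h * z * J for z in zeta]                                     # G_1..G_{m-1}
    r = r.reshape(m, n)
    # Forward substitution with L: w_1 = r_1, w_i = r_i - G_{i-1} H~_{i-1}^{-1} w_{i-1}.
    w = [r[0]]
    for i in range(1, m):
        w.append(r[i] - G[i - 1] @ np.linalg.solve(H[i - 1], w[i - 1]))
    # Back substitution with U: H~_m z_m = w_m, H~_i z_i = w_i - F_i z_{i+1}.
    z = [None] * m
    z[m - 1] = np.linalg.solve(H[m - 1], w[m - 1])
    for i in range(m - 2, -1, -1):
        z[i] = np.linalg.solve(H[i], w[i] - F[i] @ z[i + 1])
    return np.concatenate(z)
```

The m matrices $\tilde{H}_i$ are independent of each other, so their factorizations can be computed in parallel, which is presumably the sense of "parallelizable" in the cited title.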
Mixed precision iterative refinement method
The mixed precision iterative refinement method reduces
computational cost by combining short S-digit arithmetic with
long L-digit arithmetic (S << L).
The linear equation to be solved: $Cx = d$, $C \in \mathbb{R}^{N \times N}$, $d, x \in \mathbb{R}^N$
=⇒
(S) Solve $C x_0 = d$ for $x_0$.
For ν = 0, 1, 2, ...
  (L) $r_\nu := d - C x_\nu$
  (S) $r'_\nu := r_\nu / \|r_\nu\|$
  (S) Solve $C z = r'_\nu$ for z.
  (L) $x_{\nu+1} := x_\nu + \|r_\nu\| z$
  Check convergence.
=⇒ $x := x_{\nu_{\mathrm{stop}}}$
(cf.) Buttari, Alfredo, et al. International Journal of High Performance
Computing Applications 21.4 (2007): 457-466.
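The refinement loop can be illustrated with NumPy by letting `float32` play the role of the short precision S and `float64` the role of the long precision L. This is a stand-in only, since the talk combines double precision with MPFR multiple precision. The initial solve is done in short precision here, as in step (4) of the whole-algorithm slide; the function name, `tol` and `max_iter` are hypothetical.

```python
import numpy as np

def mixed_precision_refine(C, d, tol=1e-14, max_iter=30):
    """Iterative refinement with float32 as 'S' and float64 as 'L'."""
    C_s = C.astype(np.float32)
    # (S) initial solve in short precision.
    x = np.linalg.solve(C_s, d.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = d - C @ x                               # (L) residual
        norm_r = np.linalg.norm(r)
        if norm_r <= tol * np.linalg.norm(d):       # convergence check
            break
        r_s = (r / norm_r).astype(np.float32)       # (S) normalized residual
        z = np.linalg.solve(C_s, r_s)               # (S) correction
        x = x + norm_r * z.astype(np.float64)       # (L) update
    return x

rng = np.random.default_rng(0)
C = rng.standard_normal((20, 20)) + 20.0 * np.eye(20)   # well-conditioned test matrix
d = rng.standard_normal(20)
x = mixed_precision_refine(C, d)
print(np.linalg.norm(C @ x - d) / np.linalg.norm(d))
```

Normalizing the residual before the short precision solve keeps it inside the short format's range even when $\|r_\nu\|$ is far below its smallest representable number. For brevity the sketch calls `np.linalg.solve` in every pass; in practice C is factored once in short precision and the factors are reused.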
The whole algorithm of accelerated IRK method
Initial guess: $Y_{-1} \in \mathbb{R}^{mn}$
For l = 0, 1, 2, ... (Simplified Newton iteration)
  (1) $Y_l := [Y_1^{(l)} \ Y_2^{(l)} \ \ldots \ Y_m^{(l)}]^T$
  (2) $C := I_m \otimes I_n - h_k X \otimes J$, compute $\|C\|_F$
  (3) $d := (W^T B \otimes I_n)(-F(Y_l))$
  (4) Solve $C x_0 = d$ for $x_0$ (S)
  For ν = 0, 1, 2, ... (Mixed precision iterative refinement)
    (5) $r_\nu := d - C x_\nu$
    (6-1) $r'_\nu := r_\nu / \|r_\nu\|$ (S)
    (6-2) Solve $C z = r'_\nu$ for z (S)
    (6-3) $x_{\nu+1} := x_\nu + \|r_\nu\| z$
    (6-4) Check convergence ⇒ $x_{\nu_{\mathrm{stop}}}$
  (7) $Y_{l+1} := Y_l + (W \otimes I_n) x_{\nu_{\mathrm{stop}}}$
  Check convergence ⇒ $Y_{l_{\mathrm{stop}}}$
$Y := Y_{l_{\mathrm{stop}}} = [Y_1 \ Y_2 \ \ldots \ Y_m]^T$
$y_{k+1} := y_k + h_k \sum_{j=1}^{m} b_j f(t_k + c_j h_k, Y_j)$
Computational environment
H/W Intel Core i7 3820 (4 cores) 3.6GHz + 64GB RAM
OS Scientific Linux 6.3 x86_64
S/W Intel C++ 13.0.1, MPFR 3.1.1/GMP 5.1.1, BNCpack 0.8
▶ The standard OpenMP of Intel C++ is used.
▶ Block parallelization is applied to the parts of the IRK methods
that allow it,
▶ except for left preconditioning and the direct method.
Performance check by solving a 128-dimensional constant
linear ODE (50 decimal digits)
[Figure: Comp. Time (s) and Relative Error (log scale) versus the number of stages m = 3, ..., 12. Series: Iter.Ref-DM, W-Trans., W-Iter.Ref-MM, W-Iter.Ref-DM, Max.Rel.Err]
Iter.Ref-DM: No reduction + quasi-Newton + Double Precision (DP) -
Multiple Precision (MP) mixed precision iterative
refinement method (based on direct method)
W-Trans.: SPARK3 reduction + MP direct method
W-Iter.Ref-MM: SPARK3 + MP(S = L/2)-MP iterative refinement
W-Iter.Ref-DM: SPARK3 + DP-MP iterative refinement
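Stepsize selection by embedded formula (1/2)
Embedded formula for IRK methods (by Hairer): the following
(m + 1)-stage IRK formula for a given constant $\gamma_0$:
$$
\begin{array}{c|cc}
0 & 0 & 0^T \\
c & 0 & A \\ \hline
 & \gamma_0 & \hat{b}^T
\end{array}
$$
To enlarge the A-stability region, we select $\gamma_0 = 1/8$, where
$\hat{b} = [\hat{b}_1 \ \cdots \ \hat{b}_m]^T$ is obtained by solving the following linear equation
so that the simplifying assumption B(m) is satisfied:
$$
\begin{bmatrix}
1 & \cdots & 1 \\
c_1 & \cdots & c_m \\
\vdots & & \vdots \\
c_1^{m-1} & \cdots & c_m^{m-1}
\end{bmatrix}
\begin{bmatrix}
\hat{b}_1 \\ \hat{b}_2 \\ \vdots \\ \hat{b}_m
\end{bmatrix}
=
\begin{bmatrix}
1 - \gamma_0 \\ 1/2 \\ \vdots \\ 1/m
\end{bmatrix}.
$$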
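The linear system for $\hat{b}$ is a transposed Vandermonde system in the nodes c. A double precision sketch, with the hypothetical name `embedded_weights` and the Gauss nodes taken from NumPy:

```python
import numpy as np

def embedded_weights(c, gamma0=0.125):
    """Solve the Vandermonde system above for b_hat (gamma0 = 1/8 by default)."""
    m = len(c)
    V = np.vander(c, m, increasing=True).T      # V[k, j] = c_j^k, k = 0..m-1
    rhs = 1.0 / np.arange(1, m + 1)             # 1, 1/2, ..., 1/m
    rhs[0] -= gamma0                            # first entry becomes 1 - gamma0
    return np.linalg.solve(V, rhs)

# Usage with the 3-stage Gauss nodes shifted to [0, 1].
x, _ = np.polynomial.legendre.leggauss(3)
c = 0.5 * (x + 1.0)
b_hat = embedded_weights(c)
print(b_hat, b_hat.sum())
```

The first row of the system makes $\gamma_0 + \sum_j \hat{b}_j = 1$, so the embedded formula, including its extra explicit stage at $t_k$, is consistent.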
Stepsize selection by embedded formula (2/2)
By using this embedded formula, we can get $\hat{y}_{k+1}$ as follows:
$$
\hat{y}_{k+1} := y_k + h_k \gamma_0 f(t_k, y_k) + h_k \sum_{j=1}^{m} \hat{b}_j f(t_k + c_j h_k, Y_j).
$$
We use $\hat{y}_{k+1}$ in the following local error estimator $err_k$:
$$
\|err_k\| = \frac{1}{n} \sum_{i=1}^{n}
\left(
\frac{|\hat{y}_i^{(k+1)} - y_i^{(k+1)}|}{ATOL + RTOL \max(|y_i^{(k)}|, |y_i^{(k+1)}|)}
\right)^2
$$
where ATOL is the absolute tolerance and RTOL the relative
tolerance given by the user.
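A small sketch of the two formulas above. The estimator is coded literally as written on the slide, as a mean of squared scaled differences with no square root applied; the function and argument names are hypothetical.

```python
import numpy as np

def embedded_solution(y, h, gamma0, b_hat, f0, fY):
    """y_hat_{k+1}; f0 = f(t_k, y_k), fY[j] = f(t_k + c_j h_k, Y_j), shape (m, n)."""
    return y + h * gamma0 * f0 + h * b_hat @ fY

def error_estimate(y_hat_next, y_next, y_prev, atol, rtol):
    """Mean of squared scaled differences, as in the formula above."""
    scale = atol + rtol * np.maximum(np.abs(y_prev), np.abs(y_next))
    return np.mean((np.abs(y_hat_next - y_next) / scale) ** 2)

# Usage with made-up numbers: only the first component differs, by 1e-6.
y_prev = np.array([1.0, 2.0])
y_next = np.array([1.5, 1.0])
y_hat = np.array([1.5 + 1e-6, 1.0])
err = error_estimate(y_hat, y_next, y_prev, atol=1e-6, rtol=0.0)
print(err)
```

With these numbers the scaled differences are 1 and 0, so the estimate is about 0.5.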
This estimator is used to predict the next stepsize $h_{k+1}$ as follows:
hk+1 := 0.9||errk||m+1
hk
Numerical experiments of Evolutionary PDEs
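▶ 1D Brusselator Problem (omitted in this talk)
$$
\begin{cases}
\dfrac{\partial u}{\partial t} = 1 + u^2 v - 4u + 0.02 \cdot \dfrac{\partial^2 u}{\partial x^2} \\
\dfrac{\partial v}{\partial t} = 3u - u^2 v + 0.02 \cdot \dfrac{\partial^2 v}{\partial x^2}
\end{cases}
\tag{7}
$$
▶ 1D Kuramoto-Sivashinsky (K-S) equation
$$
\frac{\partial U}{\partial t}
= -\frac{\partial^2 U}{\partial x^2}
- \frac{\partial^4 U}{\partial x^4}
- \frac{1}{2} \frac{\partial U^2}{\partial x}
\tag{8}
$$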
1D Kuramoto-Sivashinsky Equation: Discretization method (1/2)
cf. Hairer & Wanner, Solving ODE II, Chap. IV, pp. 148-149.
$$
\frac{\partial U}{\partial t}
= -\frac{\partial^2 U}{\partial x^2}
- \frac{\partial^4 U}{\partial x^4}
- \frac{1}{2} \frac{\partial U^2}{\partial x}
$$
Periodic boundary condition: $U(x + L, t) = U(x, t)$
Initial value:
$$
U(x, 0) = 16 \max\bigl(0,\ \min(x/L,\ 0.1 - x/L),\ 20(x/L - 0.2)(0.3 - x/L),\ \min(x/L - 0.6,\ 0.7 - x/L),\ \min(x/L - 0.9,\ 1 - x/L)\bigr)
$$
Parameters: $L = 2\pi/q$, $q = 0.025$
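The initial value is a piecewise profile with four bumps on one period. A direct NumPy transcription, with the hypothetical name `ks_initial_value` and in double precision only:

```python
import numpy as np

Q = 0.025                    # q from the slide
L = 2.0 * np.pi / Q          # period length

def ks_initial_value(x):
    """U(x, 0) for 0 <= x <= L, following the formula above."""
    s = np.asarray(x, dtype=float) / L
    eta1 = np.minimum(s, 0.1 - s)
    eta2 = 20.0 * (s - 0.2) * (0.3 - s)
    eta3 = np.minimum(s - 0.6, 0.7 - s)
    eta4 = np.minimum(s - 0.9, 1.0 - s)
    return 16.0 * np.maximum.reduce([np.zeros_like(s), eta1, eta2, eta3, eta4])

# Usage: sample the profile on a uniform grid over one period.
x_grid = np.arange(64) * L / 64
u0 = ks_initial_value(x_grid)
print(u0.max())
```

Each bump peaks at 0.8 (for example at x = 0.05 L and x = 0.25 L), and the profile is zero between the bumps.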
1D Kuramoto-Sivashinsky Equation: Discretization method (2/2)
⇓ Discretization by using pseudospectral method
$$
\hat{U}_j(t) = \frac{1}{L} \int_0^L U(x, t) \exp(-iqjx)\, dx, \qquad
U(x, t) = \sum_{j \in \mathbb{Z}} \hat{U}_j(t) \exp(iqjx)
$$
$$
\frac{d\hat{U}_j}{dt} = \left((qj)^2 - (qj)^4\right) \hat{U}_j - \frac{iqj}{2} (\widehat{U \cdot U})_j \quad (j \in \mathbb{Z})
$$
⇓ Truncating at N = 1024, we make an ODE for $y(t) = \{y_j(t)\}$
$$
\frac{dy_j}{dt} = \left((qj)^2 - (qj)^4\right) y_j - \frac{iqj}{2} \left[ F_N\left(F_N^{-1} y \cdot F_N^{-1} y\right) \right]_j \quad (j = 1, 2, ..., N/2 - 1)
$$
where $F_N$ and $F_N^{-1}$ denote the FFT and the inverse FFT, respectively.
⋆ The multiple precision real FFT and inverse real FFT routines are
derived from Ooura's double precision C routines.
http://www.kurims.kyoto-u.ac.jp/~ooura/fftman/ftmn2_12.htm.
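A double precision sketch of the truncated right-hand side using NumPy's real FFT in place of the multiple precision Ooura-based routines. The normalization $\hat{U}_j$ = `rfft(u)[j] / N` is an assumption chosen to match the integral definition of $\hat{U}_j$; the function name `ks_rhs` is hypothetical, and the sketch returns all modes j = 0, ..., N/2 rather than only j = 1, ..., N/2 - 1.

```python
import numpy as np

Q = 0.025                                    # q from the slides

def ks_rhs(y, n_grid):
    """dy_j/dt for the Fourier coefficients y_j, j = 0..n_grid/2."""
    modes = np.arange(len(y))
    u = np.fft.irfft(y, n=n_grid) * n_grid   # F_N^{-1} y: U on the grid
    uu_hat = np.fft.rfft(u * u) / n_grid     # F_N(U * U): coefficients of U^2
    qj = Q * modes
    return (qj**2 - qj**4) * y - 0.5j * qj * uu_hat

# Usage: U(x) = a cos(q x), i.e. only y_1 = a / 2 is nonzero.
n_grid, a = 16, 0.3
y = np.zeros(n_grid // 2 + 1, dtype=complex)
y[1] = a / 2
rhs = ks_rhs(y, n_grid)
print(rhs[1], rhs[2])
```

For this input $U^2 = (a^2/2)(1 + \cos 2qx)$, so the nonlinear term feeds only mode 2: `rhs[1]` equals $(q^2 - q^4)\,a/2$ and `rhs[2]` equals $-i q a^2 / 4$ up to rounding.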
[Figure] K-S eq.: Numerical values by multiple precision and RADAU5 (double precision)
[Figure] K-S eq.: Relative errors of RADAU5 (double precision)
K-S eq.: Computational times by using variable #stages IRK formulas
4 threads; the 80-stage formula in 100 decimal digits is used as the true
solution; t = 10
80 dec. digits, RTOL = ATOL = 10^-60
# stages (m)      20         30         40         50
Comp. Time (s)    130165.4   160601.8   133541.0   190131.4
# steps           6911       2667       1103       856
Average (s)       18.8       60.2       121.1      222.1
Max. Rel. Error   4.2E-38    2.7E-38    1.4E-38    2.0E-36
Min. Rel. Error   1.1E-54    1.8E-50    1.7E-52    1.0E-63
RTOL = ATOL = 10^-70
# stages (m)      20         30         40         50
Comp. Time (s)    100695.2   86331.4    137232.9   200454.8
# steps           6978       1738       1175       918
Average (s)       14.4       49.7       116.8      218.4
Max. Rel. Error   4.4E-48    1.9E-49    2.4E-47    5.1E-47
Min. Rel. Error   2.7E-68    5.9E-68    1.3E-62    1.2E-68
Conclusion
▶ We can implement accelerated multiple precision IRK
methods with the DP-MP mixed precision iterative refinement
method and SPARK3 reduction in the inner simplified Newton
iteration.
▶ Parallelization can reduce the computational cost.
▶ Our ODE solver can be used to solve complex
evolutionary PDEs such as the Brusselator problem or the 1D
Kuramoto-Sivashinsky equation.
Future work
We plan to:
1. Seek a higher performance ODE solver in massively parallel
computing environments such as GPGPU or Intel MIC.
2. Implement stable double precision linear solvers such as
GMRES(m) or other stable Krylov subspace methods.
3. Solve many other problems with our ODE solver.
A part of our ODE solver is published as
BIRK (extended BNCpack for Implicit Runge-Kutta methods) on our
Web site.
http://na-inet.jp/na/birk/

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 

Talk at SciCADE2013 about "Accelerated Multiple Precision ODE solver based on Fully Implicit Runge-Kutta Methods"
