SlideShare a Scribd company logo
1 of 1
Download to read offline
Center for Uncertainty
Quantification
Scalable Hierarchical Algorithms for
stochastic PDEs and UQ
Alexander Litvinenko1
, Gustavo Ch´avez2
, David Keyes2
, Hatem Ltaief2
, Rio Yokota2
SRI Uncertainty Quantification1
and Extreme Computing Research
Centers at KAUST2
alexander.litvinenko@kaust.edu.sa
Center for Uncertainty
Quantification
Center for Uncertainty
Quantification
1. Abstract
H-matrices and Fast Multipole (FMM) are powerful methods to approximate linear operators coming
from partial differential and integral equations as well as speed up computational cost from quadratic
or cubic to log-linear (O(n log n)), where n number of degrees of freedom in the discretization. The
storage is reduced to the log-linear as well. This hierarchical structure is a good starting point for
parallel algorithms. Parallelization on shared and distributed memory systems was pioneered by
Kriemann [1,2]. Since 2005, the area of parallel architectures and software is developing very fast.
Progress in GPUs and Many-Core Systems (e.g. XeonPhi with 64 cores) motivated us to extend
work started in [1,2,7,8].
2. Fast Multipole (FMM) and Hierarchical (H-) matrices
Communication is the bottleneck for any algorithm as it approaches the limit of its parallel scalability.
We provide new upper bounds for the communication complexity of FMM
Reference Processes Data per Process Communication complexity
Teng 98 O(P) O (N/P)2/3(log N + µ)1/3 O P(N/P)2/3(log N + µ)1/3
Lashuk 09 O(
√
P) O (N/P)2/3 O
√
P(N/P)2/3
Yokota et al. 14
Global Local Global Local Global + Local
O(log P) O(1) O(1) O (N/P)2/3 O log P + (N/P)2/3
Table 1: Communication complexity of FMM, N is problem size, P is number of processors.
Aim:
To improve results of Kriemann [1,2] about parallel H-matrix implementation on shared memory sys-
tems on new architectures.
25 9
9 20 9
9
20 7
7 16
9
9
20 9
9 20 9
9 32
9
9
20 9
9 20 9
9 32 9
9
32 9
9 32
9
9
20 9
9 20 9
9 32 9
9
20 9
9 20 9
9 32
9
9
32 9
9 32 9
9
32 9
9 32
9
9
20 9
9 20 9
9 32 9
9
32 9
9 32
9
9
20 9
9 20 9
9 32 9
9
32 9
9 32
9
9
32 9
9 32 9
9
32 9
9 32
9
9
32 9
9 32 9
9
32 9
9 32
25 20
20 20
20 16
20 16
20 20
16 16
20 16
16 16
9 9
20 9 32
9 9
16 9 32
9 20
9 9
9 16
9 9
32 32
20 20
20 20 32
32 32
9 9
9 9 32
20 9
16 9 32
32 9
32 32
9 32
32 32
32 9
32 32
9 9
9 9
20 16
9 9
32 32
9 32
32 32
32 32
9 32
32 32
9 32
20 20
20 20 32
32 32
32 32
32 32
32 32
32 32
32 32
32 32
9 9
9 9
20 9 32
32 32 9
9 9
32 9
32 32 9
9 9
32 32
9 32 9
9 9
32 32
32 32 9
9
9 20
9 9 32
32 32
9 9
9
32 9
32 32
9 9
9
32 32
9 32
9 9
9
32 32
32 32
9 9
20 20
20 20 32
32 32
9 9
20 9 32
32 32
9 20
9 9 32
32 32
20 20
20 20 32
32 32
32 9
32 32
32 9
32 32
32 9
32 32
32 9
32 32
32 32
9 32
32 32
9 32
32 32
9 32
32 32
9 32
32 32
32 32
32 32
32 32
32 32
32 32
32 32
32 32
9 9
9 9 99 9
20 9 32
32 32
32 9
32 32
9 32
32 32
32 9
32 32
9 9
9 9
9 9
9 9 9
9 9
32 9
32 32 9
9 9
9 9
9 9
9 9 9
9
32 9
32 32
9 9
9 9
9 9
9 9
9 9 9
32 9
32 32
32 9
32 32
32 9
32 32
32 9
32 32
9 9
9 9
9 9
9 9
9 20
9 9 32
32 32
9 32
32 32
32 32
9 32
32 32
9 32
9
9 9
9 9
9 9
9 9
9 9
32 32
9 32 9
9
9 9
9 9
9 9
9 9
9
32 32
9 32
9 9
9
9 9
9 9
9 9
9 9
32 32
9 32
32 32
9 32
32 32
9 32
32 32
9 32
9
9 9
9 9
20 20
20 20 32
32 32
32 32
32 32
32 32
32 32
32 32
32 32
9 9
20 9 32
32 32
32 9
32 32
9 32
32 32
32 9
32 32
9 20
9 9 32
32 32
9 32
32 32
32 32
9 32
32 32
9 32
20 20
20 20 32
32 32
32 32
32 32
32 32
32 32
32 32
32 32
9 9
32 32
32 32 9
9 9
32 9
32 32 9
9 9
32 32
9 32 9
9 9
32 32
32 32 9
9
32 32
32 32
9 9
9
32 9
32 32
9 9
9
32 32
9 32
9 9
9
32 32
32 32
9 9
32 32
32 32
32 32
32 32
32 32
32 32
32 32
32 32
32 9
32 32
32 9
32 9
32 9
32 32
32 9
32 9
32 32
9 32
32 32
9 32
32 32
9 9
32 32
9 9
32 32
32 32
32 32
32 32
32 32
32 32
32 32
32 32
4 3
3 3 5
5 5
9 9
5 9
9 5
9 9
5 4
4 4
6 9
9 9
6 9
9 9
6 5
5 5
9 9
9 9
4 9 9 9
9 9
9
9 4
9 9
9 9 9
9 9
4 4
4 5
9 9
5 9
9 5
9 9
5 5
5
4 3
3 3
9 9
6 9
9 6
9 9
6 5
5 5
9 9
9 9
5 9 9
9 9
9 9
9 9
9 9
9 9
9 9
5 9 9
9
9 5
9 9
9 9
9 9
9 9
9 9
9 9
9
9 5
9 9
9 9
5 5
5 6
9 9
6 9
9 6
9 9
3 3
3 4 5
5 5
9 9
5 9
9 5
9 9
5 4
4 4
9 9
9
9 9
9 9
4 9 9
9 9
9
9
9 4
9 9
9 9
5 5
5 6
9 9
9 6
9 9
9 6
4 4
4 5
9 9
5 9
9 5
9 9
5 5
5
3 3
3 4
Figure 1: Examples of H-matrix approximation of exponential covariance matrix with weak and stan-
dard admissibility conditions, BEM matrix
Methodology:
1. Build cluster tree TI and block cluster tree TI×I.
I
I
I I
I
I
I I I I1
1
2
2
11 12 21 22
I11
I12
I21
I22
2. For each (t × s) ∈ TI×I, t, s ∈ TI, check admissibility condition
min{diam(Qt), diam(Qs)} ≤ η · dist(Qt, Qs).
if(adm=true) then M|t×s is a rank-k matrix block
if(adm=false) then divide M|t×s further or define as a dense matrix block, if small enough. Grid
→ cluster tree (TI) + admissibility condition → block cluster tree (TI×I) → H-matrix → H-matrix
arithmetics.
Q
Qt
S
dist
H=
t
s
4 2
2 2 3
3 3
4 2
2 2
4 2
2 2
4
Operation Sequential Complexity Parallel Complexity
(Hackbusch et al. ’99-’06) (Kriemann ’05)
storage(M) N = O(kn log n) N
P
Mx N = O(kn log n) N
P
M1 ⊕ M2 N = O(k2n log n) N
P
M1 M2, M−1 N = O(k2n log2 n) N
P + O(n)
H-LU N = O(k2n log2 n) N
P + O(k2
n log2
n
n1/d )
H-matrix conversion N = O(k2n log2 n) N
P
Further Applications: Matrix exponential allows us to solve ODEs
˙x(t) = Ax(t), x(0) = x0, → x(t) = exp (tA)x0
Other matrix function: use representation by the Cauchy integral
f(A) =
1
2πi Γ
f(t)(A − tI)−1dtf(A) ≈
k
j=1
wjf(tj)(A − tjI)−1
[Hackbusch, TR 4/2005, MPI]
Matrix equations (Arises in the context of optimal control problems)
(Lyapunov) AX+XAT +C = 0, (Sylvester) AX−XB+C = 0, (Riccati) AX+XAT −XFX+C = 0.
(1)
where A, B, C, F are given matrices and X is the solution (a matrix).
Solution of the Ricati equations Let A ∈ Rn×n, B ∈ Rm×m, and let σ(A) < σ(B), then
X :=
∞
0
exp(tA)C exp(−tB)dt, Xk :=
k
j=0
wj exp(tjA)C exp(−tjB) (2)
solves the Riccati equation [Hackbusch, TR 4/2005, MPI].
3. HiCMA: Hierarchical Computations on Multicore with
hardware Accelerators
HiCMA aims to tackle the challenge that is facing the linear algebra community due to an unprece-
dented level of on-chip concurrency, introduced by the manycore era. HiCMA is a high performance
numerical library designed for efficient compressions and fast implementations of (H-matrix) algo-
rithms across a range of architectures: multicore, accelerators and ARM processors. The core
idea is to redesign the numerical algorithms (as implemented in H-Lib) and to formulate them as
successive calls to computational tasks, which are then scheduled on the underlying system us-
ing a runtime system to ensure load balancing. The algorithm is then represented as a Directed
Acyclic Graph (DAG), where nodes represent hierarchical tasks and edges show the data depen-
dencies between them. There are various paramount factors such as admissibility condition (block
partition), the minimal leaf size (diagonal block) and the H-matrix rank (numerical accuracy), which
influence the performance of HiCMA. An auto-tuning framework is therefore critical to select the op-
timal parameters for a given application (FEM, BEM), which exhibits low rank matrix computations.
1:1 SPOTRF
STRSMSTRSMSTRSM STRSM
SPOTRF
2:4
SGEMM SSYRKSGEMM SGEMMSSYRKSGEMM SGEMMSSYRK SGEMM SSYRK
STRSMSTRSM STRSM
SPOTRF
3:10
SSYRKSGEMM SGEMMSSYRK SGEMM SSYRK
4:1
SSYRK
STRSM
SSYRK
STRSM
5:3
SPOTRF
SGEMM
6:6
7:1
SSYRK
8:2
STRSM
SPOTRF
9:3
10:1
11:1
12:1
13:1
Figure 3: Directed Acyclic Graph for a Cholesky factorization: DAG height corresponds to the length of the critical path
and the DAG width to the degree of concurrency.
4. New algorithmic ideas for parallel H-matrices
An hierarchical matrix is composed of a mixture of low-rank blocks (green) and dense blocks (red).
For maximum performance (c.f. Figure 4 left), the size of the dense blocks is typically a small constant
(8, 16, 32, or 64); the definitive number is processor-dependent.
Figure 4: Performance effect for different leaf sizes (left), and the projected performance using ac-
celerators (right). H-LU factorization of the 2D Poisson model problem on a 10242 points grid.
The main observation is that multi-processors are coupled with accelerators. Furthermore, there are
mature dense linear algebra libraries that run in such hardware, and in some cases the low-rank and
dense computations can be done concurrently, so there is an opportunity of hiding the communica-
tion with the accelerator. This will enable to work with larger dense blocks, at minimal burden to the
overall performance for kernels that demand larger dense blocks, or with improved performance for
the typical type of kernels (c.f. Figure 4 right).
5. Applications
Parallel H-matrix library will allow us to model very large, detailed covariance matrices which are
used in simulation of oil reservoirs, climate forecast, data assimilation on irregular grids etc. Addi-
tionally we will able to solve numerically convection-diffusion, integral, acoustic, Helmholtz, Maxwell
and matrix equations much faster with lower storage cost (all these examples are already available
in HLIB [9] and HLIBPro [1]).
References
1. R. Kriemann, H-LibPro documentation, http://www.hlibpro.com
2. R. Kriemann, Dissertation, University of Kiel, Germany, 2004.
3. R. Yokota, G. Turkiyyah, D. Keyes, Communication Complexity of Fast Multipole Method and its Algeb. Variants, 2014.
4. R. Yokota et al, Scalable Hierarchical Algorithms for eXtreme Computing’, Supercomp. Frontiers and Innov., 2014.
5. R. Yokota, J. Pestana, H. Ibeid, and D. E. Keyes. Fast Multipole Preconditioners for Sparse Matrices Arising from
Elliptic Equations. arXiv:1308.3339v2, 2014.
6. E. Agullo et al., Num. Lin. Alg. on Emerging Architec.: PLASMA and MAGMA Projects. J. of Ph. 180.1, 2009.
7. R. Kriemann, H-LU Factorisation on Many-Core Systems, MIS MPG preprint 5, 2014.
8. M. Izadi, Dissertation, Hierarchical Matrix Techniques on Massive Parallel Computers, Uni Leipzig, Germany, 2012.
9. L. Grasedyck, S. Boerm, H-matrix library, http://www.hlib.org

More Related Content

Viewers also liked

Likelihood approximation with parallel hierarchical matrices for large spatia...
Likelihood approximation with parallel hierarchical matrices for large spatia...Likelihood approximation with parallel hierarchical matrices for large spatia...
Likelihood approximation with parallel hierarchical matrices for large spatia...Alexander Litvinenko
 
Tensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsTensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsAlexander Litvinenko
 
Jeni Paula Eric
Jeni Paula EricJeni Paula Eric
Jeni Paula Ericjennihr
 
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Alexander Litvinenko
 
Low-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsLow-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsAlexander Litvinenko
 
My PhD talk "Application of H-matrices for computing partial inverse"
My PhD talk "Application of H-matrices for computing partial inverse"My PhD talk "Application of H-matrices for computing partial inverse"
My PhD talk "Application of H-matrices for computing partial inverse"Alexander Litvinenko
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT  posterLitvinenko low-rank kriging +FFT  poster
Litvinenko low-rank kriging +FFT posterAlexander Litvinenko
 
My paper for Domain Decomposition Conference in Strobl, Austria, 2005
My paper for Domain Decomposition Conference in Strobl, Austria, 2005My paper for Domain Decomposition Conference in Strobl, Austria, 2005
My paper for Domain Decomposition Conference in Strobl, Austria, 2005Alexander Litvinenko
 
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...Alexander Litvinenko
 
Data sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansionData sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansionAlexander Litvinenko
 
2º de Bachillerato HES - Tema 0 - Introducción a la Historia de España
2º de Bachillerato HES - Tema 0 - Introducción a la Historia de España2º de Bachillerato HES - Tema 0 - Introducción a la Historia de España
2º de Bachillerato HES - Tema 0 - Introducción a la Historia de EspañaSergio García Arama
 
Taller 10 practica innovador grupo 1 2 APRENDIZAJE VIVENCIAL
Taller 10 practica innovador   grupo 1  2 APRENDIZAJE VIVENCIALTaller 10 practica innovador   grupo 1  2 APRENDIZAJE VIVENCIAL
Taller 10 practica innovador grupo 1 2 APRENDIZAJE VIVENCIALEdwin Riascos
 
Machine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By ExamplesMachine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By ExamplesMario Cartia
 

Viewers also liked (20)

Likelihood approximation with parallel hierarchical matrices for large spatia...
Likelihood approximation with parallel hierarchical matrices for large spatia...Likelihood approximation with parallel hierarchical matrices for large spatia...
Likelihood approximation with parallel hierarchical matrices for large spatia...
 
The Heist Simulation - Participants' Briefing Pack
The Heist Simulation - Participants' Briefing PackThe Heist Simulation - Participants' Briefing Pack
The Heist Simulation - Participants' Briefing Pack
 
Tensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEsTensor train to solve stochastic PDEs
Tensor train to solve stochastic PDEs
 
Empresa.
Empresa.Empresa.
Empresa.
 
Jeni Paula Eric
Jeni Paula EricJeni Paula Eric
Jeni Paula Eric
 
RS
RSRS
RS
 
My PhD on 4 pages
My PhD on 4 pagesMy PhD on 4 pages
My PhD on 4 pages
 
Litvinenko nlbu2016
Litvinenko nlbu2016Litvinenko nlbu2016
Litvinenko nlbu2016
 
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
Possible applications of low-rank tensors in statistics and UQ (my talk in Bo...
 
add_2_diplom_main
add_2_diplom_mainadd_2_diplom_main
add_2_diplom_main
 
Low-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problemsLow-rank tensor methods for stochastic forward and inverse problems
Low-rank tensor methods for stochastic forward and inverse problems
 
My PhD talk "Application of H-matrices for computing partial inverse"
My PhD talk "Application of H-matrices for computing partial inverse"My PhD talk "Application of H-matrices for computing partial inverse"
My PhD talk "Application of H-matrices for computing partial inverse"
 
Litvinenko low-rank kriging +FFT poster
Litvinenko low-rank kriging +FFT  posterLitvinenko low-rank kriging +FFT  poster
Litvinenko low-rank kriging +FFT poster
 
My paper for Domain Decomposition Conference in Strobl, Austria, 2005
My paper for Domain Decomposition Conference in Strobl, Austria, 2005My paper for Domain Decomposition Conference in Strobl, Austria, 2005
My paper for Domain Decomposition Conference in Strobl, Austria, 2005
 
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
Application H-matrices for solving PDEs with multi-scale coefficients, jumpin...
 
Data sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansionData sparse approximation of the Karhunen-Loeve expansion
Data sparse approximation of the Karhunen-Loeve expansion
 
2º de Bachillerato HES - Tema 0 - Introducción a la Historia de España
2º de Bachillerato HES - Tema 0 - Introducción a la Historia de España2º de Bachillerato HES - Tema 0 - Introducción a la Historia de España
2º de Bachillerato HES - Tema 0 - Introducción a la Historia de España
 
Taller 10 practica innovador grupo 1 2 APRENDIZAJE VIVENCIAL
Taller 10 practica innovador   grupo 1  2 APRENDIZAJE VIVENCIALTaller 10 practica innovador   grupo 1  2 APRENDIZAJE VIVENCIAL
Taller 10 practica innovador grupo 1 2 APRENDIZAJE VIVENCIAL
 
S4 line business platform
S4 line business platformS4 line business platform
S4 line business platform
 
Machine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By ExamplesMachine Learning Real Life Applications By Examples
Machine Learning Real Life Applications By Examples
 

Similar to Scalable hierarchical algorithms for stochastic PDEs and UQ

Breaking the 49 qubit barrier in the simulation of quantum circuits
Breaking the 49 qubit barrier in the simulation of quantum circuitsBreaking the 49 qubit barrier in the simulation of quantum circuits
Breaking the 49 qubit barrier in the simulation of quantum circuitshquynh
 
Interpreting the data parallel analysis with sawzall
Interpreting the data  parallel analysis with sawzallInterpreting the data  parallel analysis with sawzall
Interpreting the data parallel analysis with sawzallLee David
 
A robust blind and secure watermarking scheme using positive semi definite ma...
A robust blind and secure watermarking scheme using positive semi definite ma...A robust blind and secure watermarking scheme using positive semi definite ma...
A robust blind and secure watermarking scheme using positive semi definite ma...ijcsit
 
Jgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging componentJgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging componentNiccolò Tubini
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paperprreiya
 
Sparse Random Network Coding for Reliable Multicast Services
Sparse Random Network Coding for Reliable Multicast ServicesSparse Random Network Coding for Reliable Multicast Services
Sparse Random Network Coding for Reliable Multicast ServicesAndrea Tassi
 
A Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentA Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentKelly Taylor
 
Project seminar ppt_steelcasting
Project seminar ppt_steelcastingProject seminar ppt_steelcasting
Project seminar ppt_steelcastingRudra Narayan Paul
 
IRJET - Candle Stick Chart for Stock Market Prediction
IRJET - Candle Stick Chart for Stock Market PredictionIRJET - Candle Stick Chart for Stock Market Prediction
IRJET - Candle Stick Chart for Stock Market PredictionIRJET Journal
 
Computational Fluid Dynamics for Aerodynamics
Computational Fluid Dynamics for AerodynamicsComputational Fluid Dynamics for Aerodynamics
Computational Fluid Dynamics for AerodynamicsIRJET Journal
 
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...IRJET Journal
 
|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and BlockchainKan Yuenyong
 
Design of High Order Multirate GARK Schemes
Design of High Order Multirate GARK SchemesDesign of High Order Multirate GARK Schemes
Design of High Order Multirate GARK SchemesVirginiaTech
 
A survey of indexing techniques for sparse matrices
A survey of indexing techniques for sparse matricesA survey of indexing techniques for sparse matrices
A survey of indexing techniques for sparse matricesunyil96
 
Hamming Distance and Data Compression of 1-D CA
Hamming Distance and Data Compression of 1-D CAHamming Distance and Data Compression of 1-D CA
Hamming Distance and Data Compression of 1-D CAcsitconf
 

Similar to Scalable hierarchical algorithms for stochastic PDEs and UQ (20)

community detection
community detectioncommunity detection
community detection
 
Breaking the 49 qubit barrier in the simulation of quantum circuits
Breaking the 49 qubit barrier in the simulation of quantum circuitsBreaking the 49 qubit barrier in the simulation of quantum circuits
Breaking the 49 qubit barrier in the simulation of quantum circuits
 
Interpreting the data parallel analysis with sawzall
Interpreting the data  parallel analysis with sawzallInterpreting the data  parallel analysis with sawzall
Interpreting the data parallel analysis with sawzall
 
A robust blind and secure watermarking scheme using positive semi definite ma...
A robust blind and secure watermarking scheme using positive semi definite ma...A robust blind and secure watermarking scheme using positive semi definite ma...
A robust blind and secure watermarking scheme using positive semi definite ma...
 
Jgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging componentJgrass-NewAge: Kriging component
Jgrass-NewAge: Kriging component
 
cug2011-praveen
cug2011-praveencug2011-praveen
cug2011-praveen
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paper
 
Ijetr042170
Ijetr042170Ijetr042170
Ijetr042170
 
Sparse Random Network Coding for Reliable Multicast Services
Sparse Random Network Coding for Reliable Multicast ServicesSparse Random Network Coding for Reliable Multicast Services
Sparse Random Network Coding for Reliable Multicast Services
 
A Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic AssignmentA Strategic Model For Dynamic Traffic Assignment
A Strategic Model For Dynamic Traffic Assignment
 
Project seminar ppt_steelcasting
Project seminar ppt_steelcastingProject seminar ppt_steelcasting
Project seminar ppt_steelcasting
 
Economia01
Economia01Economia01
Economia01
 
Economia01
Economia01Economia01
Economia01
 
IRJET - Candle Stick Chart for Stock Market Prediction
IRJET - Candle Stick Chart for Stock Market PredictionIRJET - Candle Stick Chart for Stock Market Prediction
IRJET - Candle Stick Chart for Stock Market Prediction
 
Computational Fluid Dynamics for Aerodynamics
Computational Fluid Dynamics for AerodynamicsComputational Fluid Dynamics for Aerodynamics
Computational Fluid Dynamics for Aerodynamics
 
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
Development, Optimization, and Analysis of Cellular Automaton Algorithms to S...
 
|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain|QAB> : Quantum Computing, AI and Blockchain
|QAB> : Quantum Computing, AI and Blockchain
 
Design of High Order Multirate GARK Schemes
Design of High Order Multirate GARK SchemesDesign of High Order Multirate GARK Schemes
Design of High Order Multirate GARK Schemes
 
A survey of indexing techniques for sparse matrices
A survey of indexing techniques for sparse matricesA survey of indexing techniques for sparse matrices
A survey of indexing techniques for sparse matrices
 
Hamming Distance and Data Compression of 1-D CA
Hamming Distance and Data Compression of 1-D CAHamming Distance and Data Compression of 1-D CA
Hamming Distance and Data Compression of 1-D CA
 

More from Alexander Litvinenko

litvinenko_Intrusion_Bari_2023.pdf
litvinenko_Intrusion_Bari_2023.pdflitvinenko_Intrusion_Bari_2023.pdf
litvinenko_Intrusion_Bari_2023.pdfAlexander Litvinenko
 
Density Driven Groundwater Flow with Uncertain Porosity and Permeability
Density Driven Groundwater Flow with Uncertain Porosity and PermeabilityDensity Driven Groundwater Flow with Uncertain Porosity and Permeability
Density Driven Groundwater Flow with Uncertain Porosity and PermeabilityAlexander Litvinenko
 
Uncertain_Henry_problem-poster.pdf
Uncertain_Henry_problem-poster.pdfUncertain_Henry_problem-poster.pdf
Uncertain_Henry_problem-poster.pdfAlexander Litvinenko
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfAlexander Litvinenko
 
Litv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdfLitv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdfAlexander Litvinenko
 
Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...Alexander Litvinenko
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Alexander Litvinenko
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Alexander Litvinenko
 
Identification of unknown parameters and prediction of missing values. Compar...
Identification of unknown parameters and prediction of missing values. Compar...Identification of unknown parameters and prediction of missing values. Compar...
Identification of unknown parameters and prediction of missing values. Compar...Alexander Litvinenko
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
 
Identification of unknown parameters and prediction with hierarchical matrice...
Identification of unknown parameters and prediction with hierarchical matrice...Identification of unknown parameters and prediction with hierarchical matrice...
Identification of unknown parameters and prediction with hierarchical matrice...Alexander Litvinenko
 
Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)Alexander Litvinenko
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
 
Application of parallel hierarchical matrices for parameter inference and pre...
Application of parallel hierarchical matrices for parameter inference and pre...Application of parallel hierarchical matrices for parameter inference and pre...
Application of parallel hierarchical matrices for parameter inference and pre...Alexander Litvinenko
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Alexander Litvinenko
 
Propagation of Uncertainties in Density Driven Groundwater Flow
Propagation of Uncertainties in Density Driven Groundwater FlowPropagation of Uncertainties in Density Driven Groundwater Flow
Propagation of Uncertainties in Density Driven Groundwater FlowAlexander Litvinenko
 
Simulation of propagation of uncertainties in density-driven groundwater flow
Simulation of propagation of uncertainties in density-driven groundwater flowSimulation of propagation of uncertainties in density-driven groundwater flow
Simulation of propagation of uncertainties in density-driven groundwater flowAlexander Litvinenko
 
Approximation of large covariance matrices in statistics
Approximation of large covariance matrices in statisticsApproximation of large covariance matrices in statistics
Approximation of large covariance matrices in statisticsAlexander Litvinenko
 

More from Alexander Litvinenko (20)

litvinenko_Intrusion_Bari_2023.pdf
litvinenko_Intrusion_Bari_2023.pdflitvinenko_Intrusion_Bari_2023.pdf
litvinenko_Intrusion_Bari_2023.pdf
 
Density Driven Groundwater Flow with Uncertain Porosity and Permeability
Density Driven Groundwater Flow with Uncertain Porosity and PermeabilityDensity Driven Groundwater Flow with Uncertain Porosity and Permeability
Density Driven Groundwater Flow with Uncertain Porosity and Permeability
 
litvinenko_Gamm2023.pdf
litvinenko_Gamm2023.pdflitvinenko_Gamm2023.pdf
litvinenko_Gamm2023.pdf
 
Litvinenko_Poster_Henry_22May.pdf
Litvinenko_Poster_Henry_22May.pdfLitvinenko_Poster_Henry_22May.pdf
Litvinenko_Poster_Henry_22May.pdf
 
Uncertain_Henry_problem-poster.pdf
Uncertain_Henry_problem-poster.pdfUncertain_Henry_problem-poster.pdf
Uncertain_Henry_problem-poster.pdf
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 
Litv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdfLitv_Denmark_Weak_Supervised_Learning.pdf
Litv_Denmark_Weak_Supervised_Learning.pdf
 
Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...Computing f-Divergences and Distances of High-Dimensional Probability Density...
Computing f-Divergences and Distances of High-Dimensional Probability Density...
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...
 
Identification of unknown parameters and prediction of missing values. Compar...
Identification of unknown parameters and prediction of missing values. Compar...Identification of unknown parameters and prediction of missing values. Compar...
Identification of unknown parameters and prediction of missing values. Compar...
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...
 
Identification of unknown parameters and prediction with hierarchical matrice...
Identification of unknown parameters and prediction with hierarchical matrice...Identification of unknown parameters and prediction with hierarchical matrice...
Identification of unknown parameters and prediction with hierarchical matrice...
 
Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...
 
Application of parallel hierarchical matrices for parameter inference and pre...
Application of parallel hierarchical matrices for parameter inference and pre...Application of parallel hierarchical matrices for parameter inference and pre...
Application of parallel hierarchical matrices for parameter inference and pre...
 
Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...Computation of electromagnetic fields scattered from dielectric objects of un...
Computation of electromagnetic fields scattered from dielectric objects of un...
 
Propagation of Uncertainties in Density Driven Groundwater Flow
Propagation of Uncertainties in Density Driven Groundwater FlowPropagation of Uncertainties in Density Driven Groundwater Flow
Propagation of Uncertainties in Density Driven Groundwater Flow
 
Simulation of propagation of uncertainties in density-driven groundwater flow
Simulation of propagation of uncertainties in density-driven groundwater flowSimulation of propagation of uncertainties in density-driven groundwater flow
Simulation of propagation of uncertainties in density-driven groundwater flow
 
Approximation of large covariance matrices in statistics
Approximation of large covariance matrices in statisticsApproximation of large covariance matrices in statistics
Approximation of large covariance matrices in statistics
 

Recently uploaded

Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 

Recently uploaded (20)

Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 

Scalable hierarchical algorithms for stochastic PDEs and UQ

  • 1. Center for Uncertainty Quantification Scalable Hierarchical Algorithms for stochastic PDEs and UQ Alexander Litvinenko1 , Gustavo Ch´avez2 , David Keyes2 , Hatem Ltaief2 , Rio Yokota2 SRI Uncertainty Quantification1 and Extreme Computing Research Centers at KAUST2 alexander.litvinenko@kaust.edu.sa Center for Uncertainty Quantification Center for Uncertainty Quantification 1. Abstract H-matrices and Fast Multipole (FMM) are powerful methods to approximate linear operators coming from partial differential and integral equations as well as speed up computational cost from quadratic or cubic to log-linear (O(n log n)), where n number of degrees of freedom in the discretization. The storage is reduced to the log-linear as well. This hierarchical structure is a good starting point for parallel algorithms. Parallelization on shared and distributed memory systems was pioneered by Kriemann [1,2]. Since 2005, the area of parallel architectures and software is developing very fast. Progress in GPUs and Many-Core Systems (e.g. XeonPhi with 64 cores) motivated us to extend work started in [1,2,7,8]. 2. Fast Multipole (FMM) and Hierarchical (H-) matrices Communication is the bottleneck for any algorithm as it approaches the limit of its parallel scalability. We provide new upper bounds for the communication complexity of FMM Reference Processes Data per Process Communication complexity Teng 98 O(P) O (N/P)2/3(log N + µ)1/3 O P(N/P)2/3(log N + µ)1/3 Lashuk 09 O( √ P) O (N/P)2/3 O √ P(N/P)2/3 Yokota et al. 14 Global Local Global Local Global + Local O(log P) O(1) O(1) O (N/P)2/3 O log P + (N/P)2/3 Table 1: Communication complexity of FMM, N is problem size, P is number of processors. Aim: To improve results of Kriemann [1,2] about parallel H-matrix implementation on shared memory sys- tems on new architectures. 25 9 9 20 9 9 20 7 7 16 9 9 20 9 9 20 9 9 32 9 9 20 9 9 20 9 9 32 9 9 32 9 9 32 9 9 20 9 9 20 9 9 32 9 9 20 9 9 20 9 9 32 9 9 32 9 9 32 9 9 32 9 9 32 9 9 20 9 9 20 9 9 32 9 9 32 9 9 32 9 9 20 9 9 20 9 9 32 9 9 32 9 9 32 9 9 32 9 9 32 9 9 32 9 9 32 9 9 32 9 9 32 9 9 32 9 9 32 25 20 20 20 20 16 20 16 20 20 16 16 20 16 16 16 9 9 20 9 32 9 9 16 9 32 9 20 9 9 9 16 9 9 32 32 20 20 20 20 32 32 32 9 9 9 9 32 20 9 16 9 32 32 9 32 32 9 32 32 32 32 9 32 32 9 9 9 9 20 16 9 9 32 32 9 32 32 32 32 32 9 32 32 32 9 32 20 20 20 20 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 9 9 9 9 20 9 32 32 32 9 9 9 32 9 32 32 9 9 9 32 32 9 32 9 9 9 32 32 32 32 9 9 9 20 9 9 32 32 32 9 9 9 32 9 32 32 9 9 9 32 32 9 32 9 9 9 32 32 32 32 9 9 20 20 20 20 32 32 32 9 9 20 9 32 32 32 9 20 9 9 32 32 32 20 20 20 20 32 32 32 32 9 32 32 32 9 32 32 32 9 32 32 32 9 32 32 32 32 9 32 32 32 9 32 32 32 9 32 32 32 9 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 9 9 9 9 99 9 20 9 32 32 32 32 9 32 32 9 32 32 32 32 9 32 32 9 9 9 9 9 9 9 9 9 9 9 32 9 32 32 9 9 9 9 9 9 9 9 9 9 9 32 9 32 32 9 9 9 9 9 9 9 9 9 9 9 32 9 32 32 32 9 32 32 32 9 32 32 32 9 32 32 9 9 9 9 9 9 9 9 9 20 9 9 32 32 32 9 32 32 32 32 32 9 32 32 32 9 32 9 9 9 9 9 9 9 9 9 9 9 32 32 9 32 9 9 9 9 9 9 9 9 9 9 9 32 32 9 32 9 9 9 9 9 9 9 9 9 9 9 32 32 9 32 32 32 9 32 32 32 9 32 32 32 9 32 9 9 9 9 9 20 20 20 20 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 9 9 20 9 32 32 32 32 9 32 32 9 32 32 32 32 9 32 32 9 20 9 9 32 32 32 9 32 32 32 32 32 9 32 32 32 9 32 20 20 20 20 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 9 9 32 32 32 32 9 9 9 32 9 32 32 9 9 9 32 32 9 32 9 9 9 32 32 32 32 9 9 32 32 32 32 9 9 9 32 9 32 32 9 9 9 32 32 9 32 9 9 9 32 32 32 32 9 9 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 9 32 32 32 9 32 9 32 9 32 32 32 9 32 9 32 32 9 32 32 32 9 32 32 32 9 9 32 32 9 9 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 4 3 3 3 5 5 5 9 9 5 9 9 5 9 9 5 4 4 4 6 9 9 9 6 9 9 9 6 5 5 5 9 9 9 9 4 9 9 9 9 9 9 9 4 9 9 9 9 9 9 9 4 4 4 5 9 9 5 9 9 5 9 9 5 5 5 4 3 3 3 9 9 6 9 9 6 9 9 6 5 5 5 9 9 9 9 5 9 9 9 9 9 9 9 9 9 9 9 9 9 9 5 9 9 9 9 5 9 9 9 9 9 9 9 9 9 9 9 9 9 9 5 9 9 9 9 5 5 5 6 9 9 6 9 9 6 9 9 3 3 3 4 5 5 5 9 9 5 9 9 5 9 9 5 4 4 4 9 9 9 9 9 9 9 4 9 9 9 9 9 9 9 4 9 9 9 9 5 5 5 6 9 9 9 6 9 9 9 6 4 4 4 5 9 9 5 9 9 5 9 9 5 5 5 3 3 3 4 Figure 1: Examples of H-matrix approximation of exponential covariance matrix with weak and stan- dard admissibility conditions, BEM matrix Methodology: 1. Build cluster tree TI and block cluster tree TI×I. I I I I I I I I I I1 1 2 2 11 12 21 22 I11 I12 I21 I22 2. For each (t × s) ∈ TI×I, t, s ∈ TI, check admissibility condition min{diam(Qt), diam(Qs)} ≤ η · dist(Qt, Qs). if(adm=true) then M|t×s is a rank-k matrix block if(adm=false) then divide M|t×s further or define as a dense matrix block, if small enough. Grid → cluster tree (TI) + admissibility condition → block cluster tree (TI×I) → H-matrix → H-matrix arithmetics. Q Qt S dist H= t s 4 2 2 2 3 3 3 4 2 2 2 4 2 2 2 4 Operation Sequential Complexity Parallel Complexity (Hackbusch et al. ’99-’06) (Kriemann ’05) storage(M) N = O(kn log n) N P Mx N = O(kn log n) N P M1 ⊕ M2 N = O(k2n log n) N P M1 M2, M−1 N = O(k2n log2 n) N P + O(n) H-LU N = O(k2n log2 n) N P + O(k2 n log2 n n1/d ) H-matrix conversion N = O(k2n log2 n) N P Further Applications: Matrix exponential allows us to solve ODEs ˙x(t) = Ax(t), x(0) = x0, → x(t) = exp (tA)x0 Other matrix function: use representation by the Cauchy integral f(A) = 1 2πi Γ f(t)(A − tI)−1dtf(A) ≈ k j=1 wjf(tj)(A − tjI)−1 [Hackbusch, TR 4/2005, MPI] Matrix equations (Arises in the context of optimal control problems) (Lyapunov) AX+XAT +C = 0, (Sylvester) AX−XB+C = 0, (Riccati) AX+XAT −XFX+C = 0. (1) where A, B, C, F are given matrices and X is the solution (a matrix). Solution of the Ricati equations Let A ∈ Rn×n, B ∈ Rm×m, and let σ(A) < σ(B), then X := ∞ 0 exp(tA)C exp(−tB)dt, Xk := k j=0 wj exp(tjA)C exp(−tjB) (2) solves the Riccati equation [Hackbusch, TR 4/2005, MPI]. 3. HiCMA: Hierarchical Computations on Multicore with hardware Accelerators HiCMA aims to tackle the challenge that is facing the linear algebra community due to an unprece- dented level of on-chip concurrency, introduced by the manycore era. HiCMA is a high performance numerical library designed for efficient compressions and fast implementations of (H-matrix) algo- rithms across a range of architectures: multicore, accelerators and ARM processors. The core idea is to redesign the numerical algorithms (as implemented in H-Lib) and to formulate them as successive calls to computational tasks, which are then scheduled on the underlying system us- ing a runtime system to ensure load balancing. The algorithm is then represented as a Directed Acyclic Graph (DAG), where nodes represent hierarchical tasks and edges show the data depen- dencies between them. There are various paramount factors such as admissibility condition (block partition), the minimal leaf size (diagonal block) and the H-matrix rank (numerical accuracy), which influence the performance of HiCMA. An auto-tuning framework is therefore critical to select the op- timal parameters for a given application (FEM, BEM), which exhibits low rank matrix computations. 1:1 SPOTRF STRSMSTRSMSTRSM STRSM SPOTRF 2:4 SGEMM SSYRKSGEMM SGEMMSSYRKSGEMM SGEMMSSYRK SGEMM SSYRK STRSMSTRSM STRSM SPOTRF 3:10 SSYRKSGEMM SGEMMSSYRK SGEMM SSYRK 4:1 SSYRK STRSM SSYRK STRSM 5:3 SPOTRF SGEMM 6:6 7:1 SSYRK 8:2 STRSM SPOTRF 9:3 10:1 11:1 12:1 13:1 Figure 3: Directed Acyclic Graph for a Cholesky factorization: DAG height corresponds to the length of the critical path and the DAG width to the degree of concurrency. 4. New algorithmic ideas for parallel H-matrices An hierarchical matrix is composed of a mixture of low-rank blocks (green) and dense blocks (red). For maximum performance (c.f. Figure 4 left), the size of the dense blocks is typically a small constant (8, 16, 32, or 64); the definitive number is processor-dependent. Figure 4: Performance effect for different leaf sizes (left), and the projected performance using ac- celerators (right). H-LU factorization of the 2D Poisson model problem on a 10242 points grid. The main observation is that multi-processors are coupled with accelerators. Furthermore, there are mature dense linear algebra libraries that run in such hardware, and in some cases the low-rank and dense computations can be done concurrently, so there is an opportunity of hiding the communica- tion with the accelerator. This will enable to work with larger dense blocks, at minimal burden to the overall performance for kernels that demand larger dense blocks, or with improved performance for the typical type of kernels (c.f. Figure 4 right). 5. Applications Parallel H-matrix library will allow us to model very large, detailed covariance matrices which are used in simulation of oil reservoirs, climate forecast, data assimilation on irregular grids etc. Addi- tionally we will able to solve numerically convection-diffusion, integral, acoustic, Helmholtz, Maxwell and matrix equations much faster with lower storage cost (all these examples are already available in HLIB [9] and HLIBPro [1]). References 1. R. Kriemann, H-LibPro documentation, http://www.hlibpro.com 2. R. Kriemann, Dissertation, University of Kiel, Germany, 2004. 3. R. Yokota, G. Turkiyyah, D. Keyes, Communication Complexity of Fast Multipole Method and its Algeb. Variants, 2014. 4. R. Yokota et al, Scalable Hierarchical Algorithms for eXtreme Computing’, Supercomp. Frontiers and Innov., 2014. 5. R. Yokota, J. Pestana, H. Ibeid, and D. E. Keyes. Fast Multipole Preconditioners for Sparse Matrices Arising from Elliptic Equations. arXiv:1308.3339v2, 2014. 6. E. Agullo et al., Num. Lin. Alg. on Emerging Architec.: PLASMA and MAGMA Projects. J. of Ph. 180.1, 2009. 7. R. Kriemann, H-LU Factorisation on Many-Core Systems, MIS MPG preprint 5, 2014. 8. M. Izadi, Dissertation, Hierarchical Matrix Techniques on Massive Parallel Computers, Uni Leipzig, Germany, 2012. 9. L. Grasedyck, S. Boerm, H-matrix library, http://www.hlib.org