AN EFFICIENT IMPLEMENTATION OF AN ALGEBRAIC MULTIGRID SOLVER
ImplementaciĆ³n Eficiente de un Solver Multigrid Algebraico
JORGE A. CASTELLANOS D., JOSÉ L. RAMƍREZ and GERMƁN A. LARRAZƁBAL S.
Centro Multidisciplinario de VisualizaciĆ³n y CĆ³mputo CientĆ­fico (CeMViCC)
Universidad de Carabobo. Facultad Experimental de Ciencia y TecnologĆ­a.
Carabobo. Venezuela.
{jcasteld, jbarrios, glarraza}@uc.edu.ve
Received: 15/01/2010. Revised: 01/03/2010. Accepted: 15/07/2010
Abstract
In this work, we present an efficient implementation of an algebraic multigrid (AMG) method to solve large sparse systems of linear equations. Multigrid (MG) methods, and AMG methods in particular, exhibit a theoretical linear complexity in the number of floating point operations (FLOP) with respect to the problem size. In practice, the problem is to determine when it is preferable to use an AMG instead of some other solver, and that question is the main focus of this work. We use a set of linear systems arising from a 3D scalar elliptic operator discretized by the finite difference method. In order to evaluate the implementation, a set of 8 linear systems is generated; the maximum order of the matrices associated with these systems is 2,744,000. The experimental results show the good performance of the AMG implementation, observed in the linear behavior of the AMG on our test problems.
Keywords: Sparse Linear Solvers, Algebraic Multigrid, AMG, Code Optimization.
Resumen
This work presents an efficient implementation of the Algebraic Multigrid Method (AMG) for the solution of large sparse systems of linear equations. Multilevel methods (MG), and AMG in particular, show a linear complexity with respect to the number of floating point operations (FLOP) and the problem size. In practice, the problem consists of determining when it is preferable to use AMG instead of another method, and this work focuses on answering that question. A set of linear systems arising from the finite difference discretization of a 3D scalar elliptic operator was used. In order to evaluate the presented implementation, a set of 8 linear systems was generated; the maximum order of the matrices associated with them was 2,744,000. The experimental results show the good behavior of the implementation, for which a linear behavior of the Algebraic Multigrid Method (AMG) was obtained.
Keywords: AMG, Algebraic Multilevel, Code Optimization, Sparse Systems.
1. Introduction
The computational solution of systems of linear equations is one of the most important research areas nowadays, especially for systems that come from modeling physical problems of a certain complexity. Examples are the systems associated with industrial applications such as fluid dynamics and structural mechanics. The typical computational core to solve these problems is the linear system:

Ax = b   (1)

where the matrix A is non-singular, large and sparse. It is therefore very important to find a numerical method that solves Eq. 1 in such a manner that the number of floating point operations is proportional to the problem size. This feature has been achieved with the multigrid (MG) methods (Brandt, 1977), which exhibit a theoretical linear complexity with respect to the number of operations and the problem size. Other methods, such as the classical iterative methods, present quadratic complexity. However, multigrid methods require more memory and have limited applicability. Currently, there are multigrid methods that can be used with any kind of discretization: the Algebraic Multigrid (AMG) methods (Axelsson & Vassilevski, 1989; Axelsson & Vassilevski, 1990; Vanek et al., 1996; StĆ¼ben, 1999). AMG algorithms are composed of two stages, the setup phase and the solving phase. The latter shows linear complexity (Axelsson & Vassilevski, 1991; Cela & Navarro, 1992), but the setup phase represents an overhead that can only be amortized on large problems. In Iwamura et al. (2003), solvers have been developed based on an AMG with a quick setup phase and a fast iteration cycle; these characteristics make AMG suitable for medium-size problems. In Mo & Xu (2007), a parallel coarsening strategy is presented that improves the convergence and obtains reasonable CPU times with typical AMG algorithms. A recent work (Pereira et al., 2006) shows that it is possible to reduce the AMG memory use by applying the discrete wavelet transform (DWT) in the setup phase; there, the DWT was applied to build up the matrix hierarchy of the wavelet multiresolution decomposition process. In another recent work (Joubert & Cullum, 2006), a parallel scalable AMG is presented, whose scalability is achieved through a new parallel coarsening technique in addition to an aggressive coarsening and multipass interpolation technique.

In this work, we show that it is possible, in practice, to efficiently solve the linear system of Eq. 1 arising from a 3D scalar convection-diffusion operator using an AMG with a CPU time proportional to the problem size.

The present paper is organized in the following manner. In section 2, the theoretical foundations of the AMG methods are presented. In section 3, details of the AMG implementation are shown. Section 4 contains the techniques used to optimize the codes presented in section 3. Section 5 presents the experimental results obtained with the optimized solver. Finally, in section 6, conclusions are discussed.
2. Multigrid Methods
A multigrid method consists of the following elements:
1. A sequence of meshes, with a matrix associated with each grid.
2. Intergrid transfer operators between the meshes (interpolator and restrictor).
3. A classical iterative method (Gauss-Seidel, Jacobi, SSOR, etc.), which is called the relaxer.
In order to explain the multigrid method, suppose that there exist two grids, M^h (fine grid) and M^H (coarse grid). A^h and A^H are the matrices that come from a discretization on M^h and M^H; their dimensions are n x n and N x N, respectively. The problem is then to solve a linear system on the fine grid M^h as follows:

A^h u^h = f^h   (2)

where u^h, f^h \in R^n. We will denote V^h = R^n and, analogously, V^H = R^N. Two transfer operators between V^h and V^H must be built. These are the interpolation operator I : V^H \to V^h and the restriction operator R : V^h \to V^H. In the algebraic multigrid (AMG) methods, A^H is defined by Eq. 3:

A^H = R A^h I   (3)
2.1. Multigrid Algorithm
The multigrid method is based on Algorithm 1. Because the problem usually involves more than two meshes, step 3 of the algorithm is applied recursively until the problem is reduced to a sufficiently coarse mesh. This algorithm is known as the V-cycle algorithm (Fig. 1). Different multigrid algorithms exist, and they are named according to the order in which each level is visited.
1. Relax \nu times A^h u^h = f^h on V^h, with initial solution u^h_0.
2. Compute r^H = R(f^h - A^h u^h_\nu).
3. Solve A^H e^H = r^H on V^H.
4. Correct the approximation on the fine grid: u^h_\nu = u^h_\nu + I e^H.
5. Relax \mu times A^h u^h = f^h with initial solution u^h_\nu.
Algorithm 1: V-cycle multigrid algorithm
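Algorithm 1 maps naturally onto a short recursive routine. The sketch below, in C (the implementation language used in this work), is only illustrative: the Level structure and the smooth, residual, restrict_to_coarse, interpolate_add and coarse_solve kernels are hypothetical placeholders, not the actual UCSparseLib interfaces.

```c
#include <stdlib.h>

/* Hypothetical level structure; the real data layout is not shown
 * in the paper. */
typedef struct Level {
    struct Level *coarser;   /* NULL on the coarsest level       */
    int n;                   /* number of unknowns on this level */
    /* the sparse matrix A and the operators R and I live here */
} Level;

/* Placeholder kernels, assumed to exist elsewhere. */
void smooth(Level *lv, double *u, const double *f, int sweeps);
void residual(Level *lv, const double *u, const double *f, double *r);
void restrict_to_coarse(Level *lv, const double *r, double *rH);
void interpolate_add(Level *lv, const double *eH, double *u);
void coarse_solve(Level *lv, double *u, const double *f);

/* Recursive V-cycle: steps 1-5 of Algorithm 1, with step 3 applied
 * recursively until the coarsest level is reached. */
void vcycle(Level *lv, double *u, const double *f, int nu, int mu)
{
    if (lv->coarser == NULL) {           /* coarsest level: direct solve */
        coarse_solve(lv, u, f);
        return;
    }
    double *r  = malloc(lv->n * sizeof *r);
    double *rH = malloc(lv->coarser->n * sizeof *rH);
    double *eH = calloc(lv->coarser->n, sizeof *eH); /* zero initial guess */

    smooth(lv, u, f, nu);                /* 1. relax nu times   */
    residual(lv, u, f, r);               /* 2. r = f - A u ...  */
    restrict_to_coarse(lv, r, rH);       /*    ... and rH = R r */
    vcycle(lv->coarser, eH, rH, nu, mu); /* 3. solve AH eH = rH */
    interpolate_add(lv, eH, u);          /* 4. u = u + I eH     */
    smooth(lv, u, f, mu);                /* 5. relax mu times   */

    free(r); free(rH); free(eH);
}
```

The W-cycle and F-cycle differ only in how many times (and in which order) the recursive call of step 3 is issued per level.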
Fig. 1 shows the scheme for the V-cycle, although other schemes such as the W-cycle and the F-cycle are also used.

Fig. 1. V-cycle.

The relaxation methods, such as Jacobi or Gauss-Seidel, effectively reduce only certain components of the error, in particular those components associated with the relaxer eigenvalues that are close to zero. These are called the high frequency components of the error. In Fig. 2, the smoothing effect after one relaxation step can be observed.

Fig. 2. Effect of smoothing.
The key to success for multigrid methods is to find adequate R and I operators, so that the coarse-grid correction operator is able to correct the errors that relaxation cannot attenuate. Thus, it is possible to obtain the solution of the system with little relaxation work. If this property is achieved, the complexity of the method as a whole will be an expression of the type:

Complexity = O(k(n + N^p))

with p = 2 or 3, depending on the method used to solve the system on the coarse grid. If N^p \ll n, the multigrid method presents linear complexity, that is to say, O(kn). The main focus of this work is to determine the value of k for which it is preferable to use the algebraic multigrid method instead of some classical iterative method.
2.2. Algebraic Multigrid
Formally, an AMG cycle can be described in the same way as a geometric multilevel cycle, except that the terms mesh, submesh, node, etc. are replaced by set of variables, subset of variables, variable, etc. The formal components of an AMG can be described by means of Eq. 4:

\sum_{j \in V^h} a^h_{ij} u^h_j = f^h_i \quad (i \in V^h)   (4)
In Eq. 4, V^h denotes the set of indices 1, 2, ..., n, and it is assumed that A^h is a sparse matrix. In order to generate a coarse system from Eq. 4, it is necessary to perform a partition of the set V^h into two disjoint subsets, V^h = C^h \cup F^h, where the subset C^h contains the variables that are on the coarse level (coarse nodes) and the subset F^h (fine nodes) is the complement of C^h. Assuming that such a partition is given, and defining V^H = C^h, the coarse system is shown in Eq. 5:

A^H u^H = f^H, or \sum_{l \in V^H} a^H_{kl} u^H_l = f^H_k \quad (k \in V^H)   (5)
where A^H = R A^h I, R : V^h \to V^H is the restriction operator, and I : V^H \to V^h is the interpolation operator, with R = I^t. Finally, as in any multilevel method, a smoothing process S^h is required. One step of this smoothing process is of the form:

u^h \to \bar{u}^h where \bar{u}^h = S^h u^h + (I_h - S^h) A_h^{-1} f^h   (6)
In Eq. 6, I_h denotes the identity operator. Consequently, the error e^h = u^h_* - u^h, where u^h_* denotes the exact solution of Eq. 4, is transformed according to Eq. 7:

e^h \to \bar{e}^h where \bar{e}^h = S^h e^h   (7)
The AMG uses a simple smoothing process such as Gauss-Seidel (S^h = I_h - Q_h^{-1} A_h, where Q_h is the lower triangular part of A_h, including the diagonal) or weighted-Jacobi relaxation (S^h = I_h - w D_h^{-1} A_h, where D_h = diag(A_h)). The matrix A_h should be diagonally dominant in order to use these methods; otherwise, additional hypotheses must be formulated on A_h.
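As an illustration, one sweep of the weighted-Jacobi relaxer can be written in a few lines of C. The CsrMatrix type below is a generic compressed-sparse-row layout assumed for all the sketches in this paper; it is not necessarily the storage scheme used by UCSparseLib.

```c
/* Illustrative CSR storage; field names are assumptions. */
typedef struct {
    int n;          /* matrix order                    */
    int *ia;        /* row pointers, length n+1        */
    int *ja;        /* column indices of the non-zeros */
    double *a;      /* non-zero values                 */
} CsrMatrix;

/* One weighted-Jacobi sweep: u_new = u + w D^{-1} (f - A u),
 * which realizes S^h = I_h - w D_h^{-1} A_h acting on the error. */
void jacobi_sweep(const CsrMatrix *A, const double *f,
                  const double *u, double *u_new, double w)
{
    for (int i = 0; i < A->n; i++) {
        double diag = 0.0, sum = 0.0;
        for (int k = A->ia[i]; k < A->ia[i + 1]; k++) {
            if (A->ja[k] == i) diag = A->a[k];
            else               sum += A->a[k] * u[A->ja[k]];
        }
        /* solve the i-th equation with off-diagonal terms frozen */
        u_new[i] = (1.0 - w) * u[i] + w * (f[i] - sum) / diag;
    }
}
```

Gauss-Seidel differs only in updating u in place, so that components already updated in the current sweep are used immediately (see section 4.2).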
The partition C/F and the transfer operators I and R must be built explicitly. The construction of these components is the most important task of an AMG. These components must be built so that an efficient relationship exists between the smoothing process and the coarse-grid correction operator. This relationship depends on the algebraic sense of the smooth errors under a given smoothing process. Nevertheless, the algebraic interpretation of the low frequency errors for the Gauss-Seidel and Jacobi relaxers has been studied for symmetric positive definite M-matrices with weak diagonal dominance, as mentioned in (Ruge & StĆ¼ben, 1993). For these cases, the construction of coarse spaces is based on characterizing how the errors vary over the matrix graph. This characterization is achieved if the following expression is satisfied:

\sum_{j \ne i} \frac{|a_{ij}|}{a_{ii}} \frac{(e_i - e_j)^2}{e_i^2} \ll 1   (8)

where a_{ij} are the matrix coefficients and e_i, e_j are components of the error. For Eq. 8 to be fulfilled, the matrix must be weakly diagonally dominant.
Similarly, it is important that the partition and the transfer operators are such that A^H is reasonably sparse and much smaller than A^h.

According to the construction phase, AMG methods are divided into two groups: methods based on interpolation techniques and methods based on aggregation techniques. The difference between these methods is the manner in which they build the partition C/F and the transfer operators.
2.2.1. Interpolation Methods
In 1987, J. Ruge and K. StĆ¼ben (Ruge & StĆ¼ben, 1993) proposed an algebraic multigrid method for symmetric positive definite M-matrices with weak diagonal dominance, using either the Jacobi or the Gauss-Seidel method as relaxer. Here, the interpolation is based on the concept of strong connections between nodes. In that work, the convergence and the linear complexity order of the method are demonstrated. In 1991, W. Huang (Huang, 1991) built on the results of Ruge and StĆ¼ben to demonstrate AMG convergence for symmetric positive definite matrices with weak diagonal dominance. In 1994, Reusken (Reusken, 1994) proposed a method similar to that of Ruge and StĆ¼ben, based on an approximation to the Schur complement. In 1991, Wagner (Wagner et al., 1991) presented a modified version of the method of Reusken. In this version, a modified linear system Mx = Ff is solved by means of a simple block elimination, as in the method of Reusken; here, the coefficients of M are defined in a different manner. These last two proposals work well for positive definite matrices with a weakly dominant diagonal.

In what follows, the Ruge and StĆ¼ben method, with some alternative proposals from subsequent works (Krechel & StĆ¼ben, 1997; StĆ¼ben, 1999), is described, because it is used in the implementation of this work. In this method, the goal is to build the sets C and F so that the F-nodes are obtained by interpolating the C-nodes.
Standard Coarsening. This method is implemented in this study because of its low memory consumption in comparison with the other methods described in the subsequent paragraphs. In this method, the following sets are defined in Eqs. 9 and 10:

N_i = \{ j \in V : j \ne i, a_{ij} \ne 0 \}   (9)

S_i = \{ j \in N_i : -a_{ij} \ge \eta \max_{l \ne i} \{ -a_{il} \} \}   (10)

The set S_i is known as the set of strong connections of node i. The typical value of \eta is 0.25. The interpolation nodes C_i are defined as C_i = C \cap S_i.
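A hedged sketch of how the sets S_i of Eq. 10 can be computed in one pass over a CSR matrix follows; the output layout (row pointers sp and concatenated strong-neighbor lists sj) is an illustrative choice using the CsrMatrix type introduced above, not the paper's actual data structure.

```c
#include <stdlib.h>

/* Build S_i = { j in N_i : -a_ij >= eta * max_{l != i} (-a_il) }
 * for every row i (Eq. 10).  Results come back as CSR-style
 * arrays: sp (row pointers) and sj (strong neighbor indices). */
void strong_connections(const CsrMatrix *A, double eta,
                        int **sp_out, int **sj_out)
{
    int *sp = malloc((A->n + 1) * sizeof *sp);
    int *sj = malloc(A->ia[A->n] * sizeof *sj); /* worst case: all of N_i */
    int nnz = 0;

    for (int i = 0; i < A->n; i++) {
        sp[i] = nnz;
        double max_neg = 0.0;                   /* max_{l != i} (-a_il) */
        for (int k = A->ia[i]; k < A->ia[i + 1]; k++)
            if (A->ja[k] != i && -A->a[k] > max_neg)
                max_neg = -A->a[k];
        double thresh = eta * max_neg;
        for (int k = A->ia[i]; k < A->ia[i + 1]; k++)
            if (A->ja[k] != i && max_neg > 0.0 && -A->a[k] >= thresh)
                sj[nnz++] = A->ja[k];           /* j is strong for i   */
    }
    sp[A->n] = nnz;
    *sp_out = sp;
    *sj_out = sj;
}
```

With the typical \eta = 0.25, node j is strongly connected to i when its negative coupling is at least a quarter of the strongest negative coupling in row i.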
Aggressive Coarsening. This type of coarsening is used when standard coarsening can cause a relatively high complexity, that is to say, when the memory requirement is excessive due to the Galerkin operators of the coarse levels. In these cases, the Galerkin matrix is much denser than the original matrix.
Positive Strong Connections. In both types of coarsening, the building of the subsets C and F is based on negative couplings. It is assumed that relatively small positive couplings can be ignored in the processing as well as in the interpolation. Nevertheless, this cannot be assumed in all cases. A process of building the subsets C and F must ensure that, for all F-nodes that have negative and positive strong couplings, a minimum number of both kinds of couplings is represented in C. Building such subsets within a single step is relatively complicated, since in a great variety of situations most of the strong connections are negative. In StĆ¼ben (1999), a simple alternative implementation is proposed.
Interpolation. It is assumed that the subsets C and F have been constructed using either standard or aggressive coarsening. In the first case, direct or standard interpolation is used. In the second case, multistep interpolation is used. In both cases, the interpolation can be improved by further relaxation steps.
Direct Interpolation. For each i \in F, the set of interpolation nodes P_i = C^s_i (C^s_i = C \cap S_i) is defined, and the i-th equation is approximated as:

a_{ii} e_i + \sum_{j \in N_i} a_{ij} e_j = 0 \implies a_{ii} e_i + \alpha_i \sum_{k \in P_i} a^-_{ik} e_k + \beta_i \sum_{k \in P_i} a^+_{ik} e_k = 0   (11)

with

\alpha_i = \frac{\sum_{j \in N_i} a^-_{ij}}{\sum_{k \in P_i} a^-_{ik}} \quad and \quad \beta_i = \frac{\sum_{j \in N_i} a^+_{ij}}{\sum_{k \in P_i} a^+_{ik}}   (12)

where a^-_{ij} denotes a negative matrix entry and a^+_{ij} a positive one. This immediately leads to the following interpolation formula:

e_i = \sum_{k \in P_i} w_{ik} e_k, with w_{ik} = -\alpha_i a_{ik}/a_{ii} for k \in P^-_i and w_{ik} = -\beta_i a_{ik}/a_{ii} for k \in P^+_i   (13)

where P^-_i and P^+_i denote the nodes of P_i with negative and positive couplings, respectively.
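The weights of Eq. 13 can be evaluated row by row. A minimal sketch over the illustrative CsrMatrix type follows, assuming a flag array in_Pi that marks the interpolation nodes P_i of the F-node i (hypothetical helpers, not the UCSparseLib code):

```c
/* Compute the direct-interpolation weights w_ik (Eq. 13) for row i.
 * in_Pi[j] != 0 marks j as an interpolation node (j in P_i).
 * Weights are written to w[] with their column indices in cols[];
 * the number of weights is returned.  a_ii is assumed non-zero. */
int direct_interp_row(const CsrMatrix *A, int i, const char *in_Pi,
                      int *cols, double *w)
{
    double aii = 0.0;
    double neg_N = 0.0, pos_N = 0.0;   /* sums over all of N_i (Eq. 12) */
    double neg_P = 0.0, pos_P = 0.0;   /* sums over P_i only            */

    for (int k = A->ia[i]; k < A->ia[i + 1]; k++) {
        int j = A->ja[k]; double v = A->a[k];
        if (j == i) { aii = v; continue; }
        if (v < 0.0) { neg_N += v; if (in_Pi[j]) neg_P += v; }
        else         { pos_N += v; if (in_Pi[j]) pos_P += v; }
    }
    double alpha = (neg_P != 0.0) ? neg_N / neg_P : 0.0;
    double beta  = (pos_P != 0.0) ? pos_N / pos_P : 0.0;

    int nw = 0;
    for (int k = A->ia[i]; k < A->ia[i + 1]; k++) {
        int j = A->ja[k]; double v = A->a[k];
        if (j == i || !in_Pi[j]) continue;
        cols[nw] = j;
        w[nw++]  = (v < 0.0 ? -alpha : -beta) * v / aii;   /* Eq. 13 */
    }
    return nw;
}
```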
Standard Interpolation. The direct interpolation can be modified so that the strong F-connections are (indirectly) included in the interpolation of each i \in F. This is achieved by approximating the i-th equation (the right side of Eq. 11) as follows. First, all e_j (j \in F^s_i = F \cap S_i) are eliminated approximately by means of the corresponding j-th equation; specifically, for each j \in F^s_i, e_j is replaced by

e_j \to -\sum_{k \in N_j} a_{jk} e_k / a_{jj}   (14)

which yields a new equation for e_i:

\hat{a}_{ii} e_i + \sum_{j \in \hat{N}_i} \hat{a}_{ij} e_j = 0, with \hat{N}_i = \{ j \ne i : \hat{a}_{ij} \ne 0 \}   (15)

Defining P_i as the union of C_i and all the C_j (j \in F^s_i), the interpolation is defined exactly as in Eqs. 11-12, with every a replaced by \hat{a} and N_i replaced by \hat{N}_i.

This modification usually improves the quality of the interpolation. The main reason is that the approximation of Eq. 11 introduces less error when it is applied to Eq. 15. However, the main goal is to incorporate the strong F-connections into the interpolation.
Multistep Interpolation. This interpolation method is applied when the subsets C and F are constructed through aggressive coarsening. The interpolation is done in several steps, using direct interpolation whenever possible and, for the rest of the nodes, using interpolation formulas in the F-node neighborhoods.
Jacobi Interpolation. Given any of the interpolation formulas outlined above, it is possible to obtain an additional improvement if a Jacobi relaxation is applied a posteriori. Only one or two Jacobi steps are worthwhile in practice. Depending on the situation, the relaxation of the interpolation can significantly improve the convergence. This improvement of the interpolation operator was proposed by Krechel & StĆ¼ben (1997).
2.2.2. Aggregation Methods
In 1995, D. Braess (Braess, 1995) proposed an aggregation method that groups nodes into sets whose sizes range from one to four nodes. The sets are constructed in two steps: in the first step, pairs of strongly connected nodes are grouped, and in the second step, the pairs formed in the previous step are grouped.

In 1996, P. Vanek, J. Mandel and M. Brezina (Vanek et al., 1996) developed an AMG based on interpolation by smoothed aggregation. They used their algorithm to solve second- and fourth-order elliptic problems.

In 1997, F. Kickinger (Kickinger, 1997) proposed an AMG where the coarsening strategy is independent of the strong connections; this method is based only on the graph of the matrix. The coarsening strategy is aggressive and fast. The algorithm is based on coloring the graph of the matrix, so that a node labelled as coarse has all of its neighbors labelled as fine. Fig. 3 shows an example of this strategy.
Fig. 3. Graph coloring: (a) original graph, (b) resulting graph.
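The coloring idea can be sketched as a greedy independent-set pass over the matrix graph: visit the nodes in order, make every undecided node coarse, and force all of its neighbors to be fine. This is a simplified illustration of a Kickinger-style coarsening for a symmetric sparsity pattern, not the actual red-black routine of the implementation.

```c
enum { UNDECIDED = 0, COARSE = 1, FINE = 2 };

/* Greedy coloring of the matrix graph: every coarse node ends up
 * with all of its graph neighbors labelled fine.  Assumes a
 * symmetric sparsity pattern (true for the SPD test matrices). */
void color_coarsen(const CsrMatrix *A, char *label)
{
    for (int i = 0; i < A->n; i++) label[i] = UNDECIDED;
    for (int i = 0; i < A->n; i++) {
        if (label[i] != UNDECIDED) continue;
        label[i] = COARSE;                       /* i joins C          */
        for (int k = A->ia[i]; k < A->ia[i + 1]; k++)
            if (A->ja[k] != i) label[A->ja[k]] = FINE;  /* neighbors to F */
    }
}
```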
2.2.3. Complexity Operators
There are two complexity operators that indicate approximately the memory requirements that arise when using any of the AMG methods. These requirements are expressed in terms of the grid complexity operator C_G and the algebraic complexity operator C_A:

C_G = \sum_l \frac{n_l}{n_1}   (16)

and

C_A = \sum_l \frac{nz_l}{nz_1}   (17)

where n_l and nz_l denote the number of variables and the number of non-zero elements of the matrix A on level l, respectively.
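Both operators are a single pass over the level hierarchy; a minimal sketch, assuming per-level counts n[] and nz[] with the finest level stored at index 0:

```c
/* Grid complexity C_G (Eq. 16) and algebraic complexity C_A (Eq. 17)
 * from per-level sizes; level 0 is the finest grid. */
void complexity_operators(const int *n, const int *nz, int nlevels,
                          double *cg, double *ca)
{
    *cg = 0.0; *ca = 0.0;
    for (int l = 0; l < nlevels; l++) {
        *cg += (double)n[l]  / n[0];
        *ca += (double)nz[l] / nz[0];
    }
}
```

Since both sums run over quantities already known after the setup, the operators are essentially free diagnostics.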
3. AMG Linear Solver
In this work, the main purpose is to implement a linear solver, based on an AMG, that exhibits a linear behavior when it is applied to large input matrices. The AMG solver implemented here incorporates two implementations of the setup phase, with the intention of illustrating the behavior of both the interpolation and the aggregation approach. As interpolation method, the standard coarsening proposed by Ruge & StĆ¼ben (1993) (section 2.2.1) was selected. As aggregation method, the red-black graph coloring method proposed by Kickinger (1997) (section 2.2.2) was chosen. In order to evaluate the methods under consideration, an existing implementation of the numerical library UCSparseLib (LarrazĆ”bal, 2004) was used.
3.1. Test Matrices
The input matrices for the linear solver were generated by discretizing a 3D scalar elliptic operator by means of a second-order, 7-point-stencil finite difference method. This matrix generation is characteristic of a great variety of industrial problems, and the operator is defined as follows:

L(u) = \Delta u + C_c \nabla \cdot u   (18)

where C_c = 0 (convection coefficient), with Dirichlet boundary conditions and the unit cube as computational domain. This ensures that the generated matrices are symmetric positive definite.
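With C_c = 0 the operator reduces to the Laplacian, whose 7-point stencil contributes 6 on the diagonal and -1 for each of the six axis neighbors. A minimal sketch of the row assembly under lexicographic ordering with Dirichlet boundaries follows (an illustration of the construction, not the generator actually used):

```c
/* Assemble row (i,j,k) of the 7-point finite-difference Laplacian
 * on an nx*ny*nz grid with lexicographic index i + nx*(j + ny*k).
 * Dirichlet boundaries: neighbors outside the cube are dropped.
 * Returns the number of non-zeros written into cols[]/vals[]. */
int stencil_row(int i, int j, int k, int nx, int ny, int nz,
                int *cols, double *vals)
{
    int row = i + nx * (j + ny * k), n = 0;
    cols[n] = row; vals[n++] = 6.0;                      /* diagonal */
    if (i > 0)      { cols[n] = row - 1;       vals[n++] = -1.0; }
    if (i < nx - 1) { cols[n] = row + 1;       vals[n++] = -1.0; }
    if (j > 0)      { cols[n] = row - nx;      vals[n++] = -1.0; }
    if (j < ny - 1) { cols[n] = row + nx;      vals[n++] = -1.0; }
    if (k > 0)      { cols[n] = row - nx * ny; vals[n++] = -1.0; }
    if (k < nz - 1) { cols[n] = row + nx * ny; vals[n++] = -1.0; }
    return n;
}
```

Counting the couplings dropped at the boundary reproduces Table 1: for a 70 x 70 x 70 grid this gives 7 x 343,000 - 6 x 70 x 70 = 2,371,600 non-zeros.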
3.2. Setup Phase
The function mgrid is called a first time to carry out the setup phase. During this stage, the levels are generated as explained in section 2.2, using either the StĆ¼ben method or the red-black coloring method, according to the values defined in the parameter array. This stage concludes by factorizing the resulting matrix at the coarsest level using a direct method. Among the operations carried out in the setup phase when the red-black coloring method is used, the generation of the levels has the highest computational cost. This is because, in order to generate each level, it is necessary to carry out a product of three matrices; this triple matrix product has cubic complexity with respect to the number of rows of the matrix. Even though a direct method is used to factorize the matrix at the coarsest level, its computational cost is relatively small, since the order of this matrix is generally smaller than 5,000.
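The cost referred to above comes from the Galerkin product A^H = R A^h I of Eq. 3, computed once per level. A dense sketch makes the structure of the computation explicit; the production code performs the same product in sparse format, and the optimization of section 4.1 folds the transposition into it.

```c
/* Dense sketch of the Galerkin product AH = R * Ah * P (Eq. 3),
 * where R is N x n, Ah is n x n and P is the n x N interpolation
 * operator (called I in the text).  tmp is N x n scratch space. */
void galerkin_product(int n, int N, const double *R, const double *Ah,
                      const double *P, double *tmp, double *AH)
{
    for (int i = 0; i < N; i++)          /* tmp = R * Ah */
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += R[i * n + k] * Ah[k * n + j];
            tmp[i * n + j] = s;
        }
    for (int i = 0; i < N; i++)          /* AH = tmp * P */
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += tmp[i * n + k] * P[k * N + j];
            AH[i * N + j] = s;
        }
}
```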
3.3. Solver Phase
The function mgrid is called a second time to execute the solution phase of the AMG method. According to Algorithm 1, this phase consists of a loop that is repeated until the prescribed tolerance is met or the maximum number of iterations indicated in the parameter array is reached. In each iteration, a cycle is performed, which can be a V-cycle, W-cycle or F-cycle. As indicated in section 2.1, when each level of the cycle is visited, a relaxation process is performed; the relaxation can be either weighted-Jacobi or Gauss-Seidel, according to the parameter array, and the smoothing is repeated the number of times configured there. The efficiency of the solver code is closely related to the implementation of the matrix-vector product.
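The kernel in question is the standard CSR matrix-vector product, shown here over the illustrative CsrMatrix type introduced in section 2.2 (the UCSparseLib routine itself is not listed in the paper):

```c
/* y = A * x for a CSR matrix: one pass over the non-zeros,
 * O(nnz) floating point operations. */
void csr_matvec(const CsrMatrix *A, const double *x, double *y)
{
    for (int i = 0; i < A->n; i++) {
        double s = 0.0;
        for (int k = A->ia[i]; k < A->ia[i + 1]; k++)
            s += A->a[k] * x[A->ja[k]];
        y[i] = s;
    }
}
```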
4. Code Optimization
In this section, the improvements made to the first version of the code implemented for the algebraic multigrid method in UCSparseLib (LarrazĆ”bal, 2004) are described. Tests were run on a Sun Fire V40z server, using one AMD Opteron 885, 2.6 GHz processor with 1 MB cache and 16 GB of main memory. The codes were implemented in ANSI C and compiled using gcc 3.6 under GNU/Linux. The compiler optimization flags selected were -march=opteron, -O2, -funroll-loops and -fprefetch-loop-arrays. The use of these optimizations allowed not only obtaining a linear behavior but also reducing the CPU time by approximately 35% in comparison with the codes compiled without optimization flags.
4.1. Setup Phase Optimization
At the beginning of the tests of the AMG implementation, it was observed that the setup phase demanded a CPU time greater than that of the solution phase, and that when the problem size was increased this ratio did not improve as the AMG theory predicts. An analysis of the CPU time of the setup phase was then carried out using the GNU gprof (Fenlason & Stallman, 1998) and Valgrind (Weidendorfer et al., 2004) tools.

In the setup phase using the strong connections method, with an input matrix of order N = 64,000, 87% of the CPU time was spent in the C-node selection algorithm (section 2.2.1). It was possible to improve the performance of this algorithm by marking early the C-nodes initially processed in the selection loop. This change represented a 32% improvement in the CPU time of the selection algorithm, which translated into 55% of the complete setup phase time. For the matrix of N = 64,000, the setup time before the improvement was 11.67 seconds; after the improvement it was 4.09 seconds.
In the setup phase using the red-black coloring algorithm, with an input matrix of size N = 64,000, 58% of the CPU time was spent in the matrix-matrix product, while 37% corresponded to matrix transpositions. Both the matrix-matrix product and the transposition were used to generate the A^H matrices, as explained in section 2.2.2. The matrix-matrix product algorithm was optimized to avoid the transpose operation, and it was also improved to reduce its total execution time. After the improvement, the time needed to calculate the matrix-matrix product represented 68% of the setup time. The setup time before the improvement was 7.93 seconds, and after the change it was 0.785 seconds, using an input matrix with N = 64,000.
4.2. Solver Phase Optimization
In order to evaluate the solver phase, a set of tests was executed using either the Gauss-Seidel or the weighted-Jacobi relaxer. Execution profiles were generated for the V-cycle configuration using the previously mentioned tools, gprof and Valgrind.

In the tests with the Gauss-Seidel relaxer, 48% of the time was spent in the matrix-vector operation, whereas 51% was spent in the scalar product routine. Because the scalar product routine was called from within the function that calculates the matrix-vector product, and the code of this routine is very simple, the calls to the scalar product were replaced by the code itself. This represented a saving of approximately 15% in the run time of the solver phase.
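A hedged sketch of the effect of this change on the Gauss-Seidel sweep: the per-row scalar product is written inline in the CSR traversal instead of being delegated to a separate routine (illustrative code, not the actual UCSparseLib kernels):

```c
/* One forward Gauss-Seidel sweep, u updated in place.  The row
 * product is inlined rather than dispatched to a dot-product
 * routine, removing the per-row call overhead. */
void gauss_seidel_sweep(const CsrMatrix *A, const double *f, double *u)
{
    for (int i = 0; i < A->n; i++) {
        double diag = 0.0, sum = 0.0;
        for (int k = A->ia[i]; k < A->ia[i + 1]; k++) {
            int j = A->ja[k];
            if (j == i) diag = A->a[k];
            else        sum += A->a[k] * u[j];  /* inlined row product */
        }
        u[i] = (f[i] - sum) / diag;
    }
}
```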
Finally, it was determined experimentally that two iterations of the Gauss-Seidel relaxer improved the solver time. When the weighted-Jacobi relaxer was used, it was determined experimentally that the best value for the weight constant was w = 0.9.
In the tests with both types of relaxers, a limit for the size of the matrix on the coarsest level (N = 5,000) was used, because this value allows a more nearly linear response. When N > 5,000 was used, an appreciable increase in the setup phase was observed, due to the high computational cost of using a direct method to factorize the matrix on the coarsest level. The solver selected for the coarsest level was the Cholesky factorization method, since the input matrix was symmetric positive definite (section 3.1) for all the test cases. The stopping criterion eps was 10^{-12} for all cases.
5. Experimental Results
In order to evaluate the run time complexity order, a set of 8 matrices was generated. The order N of these matrices goes from a minimum of 343,000 (70 x 70 x 70) to a maximum of 2,744,000 (140 x 140 x 140), with the idea of covering a range of at least one order of magnitude. The matrices were generated from a 3D elliptic operator according to section 3.1. The main characteristics of these matrices can be observed in Table 1.
nx Ɨ ny Ɨ nz Size Non-zeros
70 Ɨ 70 Ɨ 70 343,000 2,371,600
80 Ɨ 80 Ɨ 80 512,000 3,545,600
90 Ɨ 90 Ɨ 90 729,000 5,054,400
100 Ɨ 100 Ɨ 100 1,000,000 6,940,000
110 Ɨ 110 Ɨ 110 1,331,000 9,244,400
120 Ɨ 120 Ɨ 120 1,728,000 12,009,600
130 Ɨ 130 Ɨ 130 2,197,000 15,277,600
140 Ɨ 140 Ɨ 140 2,744,000 19,090,400
Table 1. Test matrices.
According to the discussion in section 4, the linear systems represented by the test matrices were solved using three strategies: an iterative method (Conjugate Gradient), a strong-connections-based AMG, and an aggregation-based AMG (red-black coloring). In order to corroborate that, in the case of AMG, the CPU time is proportional to the order of the linear system (N), the solution times of the aggregation-based AMG (red-black coloring) and of the Conjugate Gradient were plotted for the test matrix set. As is known, the CPU time of the Conjugate Gradient method follows a quadratic behavior with respect to the order of the linear system matrix, because its number of floating point operations (FLOP) is of order O(N^2). In order to determine the influence of the cycle type on the execution time, tests using the V-cycle, F-cycle and W-cycle (Fig. 1 serves as a reference for the V-cycle) were carried out. In these tests, weighted-Jacobi and Gauss-Seidel were used as relaxers.

In Table 2, the results for the V-cycle using the weighted-Jacobi relaxer with a weight constant w = 0.9 are shown; this value was selected because the best times were obtained for this specific case. Also, in the same table, results for the F-cycle and W-cycle using the Gauss-Seidel relaxer with two iterations are shown, since under this specific condition better results are obtained. The operator complexity, the grid complexity and the number of levels were determined in order to evaluate the quality of the matrix systems generated at each level for both the interpolation and the aggregation method. These results can be seen in Table 3.
Fig. 4 is a graphical representation of Table 2 that illustrates the linear behavior of the algebraic multigrid with respect to the matrix size N. This AMG uses a setup phase based on the red-black coloring aggregation method. In the graph, the AMG performance can be compared against the Conjugate Gradient. Note that as the problem size increases (N > 500,000), the multigrid method exhibits better times for both the W-cycle and the F-cycle. The worst case for the AMG occurs when using the V-cycle; in this situation the Conjugate Gradient method shows better times for N < 1,000,000. The straight line that represents the behavior of the multigrid method was obtained from a linear regression of the points that indicate the CPU times for the various problem sizes solved. It is also observed that the best times were obtained for the F-cycle, as foreseen in section 2.1.
N           Conj. Grad.   Red-Black Coloring               Strong-Connection
                          V-cycle    F-cycle    W-cycle    V-cycle    F-cycle    W-cycle
343,000     13.269        17.557     14.688     15.304     44.198     39.348     37.677
512,000     22.612        28.799     22.602     21.804     80.275     74.205     70.663
729,000     36.470        40.075     29.999     33.714     122.209    114.262    108.783
1,000,000   55.401        49.817     37.305     39.334     202.765    192.443    183.698
1,331,000   76.063        78.637     54.827     61.665     280.646    267.269    253.981
1,728,000   106.442       89.627     64.640     73.796     425.041    409.450    392.990
2,197,000   145.802       123.651    90.532     102.007    579.050    563.157    541.644
2,744,000   188.627       142.499    105.295    116.760    832.633    806.268    789.846
Table 2. AMG vs. Conjugate Gradient CPU times (secs).
N           Red-Black Coloring                   Strong-Connection
            Levels   G. Complex.   A. Complex.   Levels   G. Complex.   A. Complex.
343,000     4        1.57          2.53          4        1.60          2.83
512,000     4        1.57          2.53          5        1.60          3.10
729,000     5        1.57          2.54          6        1.61          3.41
1,000,000   5        1.57          2.54          6        1.61          3.47
1,331,000   5        1.57          2.54          7        1.61          3.67
1,728,000   5        1.57          2.54          7        1.61          3.74
2,197,000   5        1.57          2.54          8        1.61          4.02
2,744,000   5        1.57          2.54          8        1.61          3.98
Table 3. Complexity operators.
Fig. 4. Solution times for the linear systems.
In order to estimate the constant k that relates the number of floating point operations (FLOP) to the order N, the average floating point rate (FLOPS) of the test computer system was determined by means of the DGEMM benchmark (Luszczek et al., 2005), which measures the floating point rate of execution of double-precision real matrix-matrix multiplication. The computer system used in the tests shows an average value of 4.76 GFLOPS. Fig. 5, which shows the average number of GFLOP employed to solve the test linear systems, was generated using this estimated rate and the data of Table 2.

Fig. 5. GFLOP in solving linear systems.

The approximate constants for the different AMG cycles were then determined using Fig. 5. These constants are: 253,757 FLOP for the V-cycle, 208,528 FLOP for the W-cycle, and 183,771 FLOP for the F-cycle. As a consistency check, the largest F-cycle run gives 4.76 GFLOPS x 105.295 s / 2,744,000 ā‰ˆ 183,000 FLOP per unknown, in agreement with the fitted constant. The behavior of the strong-connections-based AMG was not plotted because it was not linear, due to the long times of its setup phase.
6. Conclusions
In this work, we have presented an efficient implementation of an algebraic multigrid method (AMG) to solve large sparse systems of linear equations. We have used a set of linear systems arising from a 3D scalar elliptic operator discretized by the finite difference method. In order to evaluate the implementation, a set of 8 linear systems was generated; the maximum order of the matrices associated with these systems was 2,744,000. The AMG performance was compared against the Conjugate Gradient. The AMG needed a fast setup phase to obtain good performance. As the problem size increases (N > 500,000), the multigrid method exhibits better times for both the W-cycle and the F-cycle; the worst case for the AMG occurs when using the V-cycle, in which case the Conjugate Gradient method shows better times for N < 1,000,000. We have also observed that the best CPU times were obtained for the F-cycle. The approximate constants k for the different AMG cycles were: 253,757 FLOP for the V-cycle, 208,528 FLOP for the W-cycle, and 183,771 FLOP for the F-cycle. In general, our AMG implementation showed good performance, and we have obtained a linear AMG solver.
Acknowledgments
This work was supported by the Consejo de Desarrollo CientĆ­fico y HumanĆ­stico de la Universidad de Carabobo under projects CDCH-UC No. 2004-002 and CDCH-UC No. 2004-011. We also want to thank Pedro Linares for his helpful suggestions regarding this paper.
7. Bibliography
Axelsson, O. & P. Vassilevski. (1989). Algebraic Multilevel Preconditioning Methods I. Journal on Numer. Anal. Math. 56(2-3): 157-177.
Axelsson, O. & P. Vassilevski. (1990). Algebraic Multilevel Preconditioning Methods II. Journal on Numer. Anal. 27(6): 1564-1590.
Axelsson, O. & P. Vassilevski. (1991). Asymptotic Work Estimates for AML Methods. Appl. Numer. Math. 7(5): 437-451.
Braess, D. (1995). Towards Algebraic Multigrid for Elliptic Problems of Second Order. Computing. 55(4): 379-393.
Brandt, A. (1977). Multi-Level Adaptive Solutions to Boundary-Value Problems. Math. Comput. 31(138): 333-390.
Cela, J. & J. Navarro. (1992). Performance Model for Algebraic Multilevel Preconditioner on a Shared Memory Multicomputers. PACTA92.
Fenlason, J. & R. Stallman. (1998). The GNU Profiler. Free Software Foundation. [cited 08 January 2010; 15:30 VET]. Also available at http://www.gnu.org/software/binutils/manual/gprof-2.9.1/html_chapter/gprof_toc.html.
Huang, W. (1991). Convergence of Algebraic Multigrid Methods for Symmetric Positive Definite Matrices with Weak Diagonal Dominance. Appl. Math. Comp. 46(2): 145-164.
Iwamura, C., F. Costa, I. Sbarski, A. Easton & N. Li. (2003). An Efficient Algebraic Multigrid Preconditioned Conjugate Gradient Solver. Comput. Meth. Appl. Mech. Eng. 192(20): 2299-2318.
Joubert, W. & J. Cullum. (2006). Scalable Algebraic Multigrid on 3500 Processors. Electron. Trans. Numer. Anal. 23: 105-128.
Kickinger, F. (1997). Algebraic Multigrid for Discrete Elliptic Second-Order Problems. Technical Report. Institute for Mathematics, Johannes Kepler University Linz, Austria.
Krechel, A. & K. StĆ¼ben. (1997). Operator Dependent Interpolation in Algebraic Multigrid. Technical Report, GMD Report 1048.
LarrazĆ”bal, G. (2004). UCSparseLib: A Numerical Library to Solve Sparse Linear Systems. SimulaciĆ³n NumĆ©rica y Simulado Computacional, pp. TC19-TC25. Eds. J. Rojo, M. Torres y M. Cerrolaza. ISBN 980-6745-00-0, SVMNI, Venezuela.
Luszczek, P., J. Dongarra, R. Rabenseifner, B. Lucas, J. Kepner, J. McCalpin, D. Bailey & D. Takahashi. (2005). Introduction to the HPC Challenge Benchmark Suite. [cited 08 January 2010; 14:30 VET]. Also available at http://icl.cs.utk.edu/projectsfiles/hpcc/pubs/hpcc-challenge-benchmark05.pdf.
Mo, Z. & X. Xu. (2007). Relaxed RS0 and CLJP Coarsening Strategy for Parallel AMG. Parallel Comput. 33(3): 174-185.
Pereira, F., S. Lopes & S. Nabeta. (2006). A Wavelet-Based Algebraic Multigrid Preconditioner for Sparse Linear Systems. Appl. Math. Comput. 182(2): 1098-1107.
Reusken, A. (1994). Multigrid with Matrix-Dependent Transfer Operators for Convection-Diffusion Problems. In: Multigrid Methods, vol. IV, Internat. Ser. Numer. Math. 116, BirkhƤuser-Verlag, Basel, pp. 269-280.
Ruge, J. & K. StĆ¼ben. (1993). Algebraic Multigrid (AMG). In: Multigrid Methods (McCormick, S.F., ed). SIAM, Frontiers in Applied Mathematics, Philadelphia, USA.
StĆ¼ben, K. (1999). Algebraic Multigrid (AMG): An Introduction with Applications. Technical Report, GMD Report 53.
Vanek, P., J. Mandel & M. Brezina. (1996). Algebraic Multigrid by Smoothed Aggregation for Second Order and Fourth Order Elliptic Problems. Computing. 56(3): 179-196.
Wagner, C., W. Kinzelbach & G. Wittum. (1991). Schur-Complement Multigrid: A Robust Method for Groundwater Flow and Transport Problems. Numer. Math. 75(4): 523-545.
View publication stats
View publication stats

More Related Content

Similar to An Efficient Implementation Of An Algebraic Multigrid Solver

A Self-Tuned Simulated Annealing Algorithm using Hidden Markov Mode
A Self-Tuned Simulated Annealing Algorithm using Hidden Markov ModeA Self-Tuned Simulated Annealing Algorithm using Hidden Markov Mode
A Self-Tuned Simulated Annealing Algorithm using Hidden Markov ModeIJECEIAES
Ā 
2007 santiago marchi_cobem_2007
2007 santiago marchi_cobem_20072007 santiago marchi_cobem_2007
2007 santiago marchi_cobem_2007CosmoSantiago
Ā 
Plan economico
Plan economicoPlan economico
Plan economicoCrist Oviedo
Ā 
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...Thomas Templin
Ā 
2005 pinto santiago_marchi_cobem_2005
2005 pinto santiago_marchi_cobem_20052005 pinto santiago_marchi_cobem_2005
2005 pinto santiago_marchi_cobem_2005CosmoSantiago
Ā 
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATIONA MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATIONijaia
Ā 
Numerical disperison analysis of sympletic and adi scheme
Numerical disperison analysis of sympletic and adi schemeNumerical disperison analysis of sympletic and adi scheme
Numerical disperison analysis of sympletic and adi schemexingangahu
Ā 
A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...
A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...
A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...Sheila Sinclair
Ā 
Acceleration Schemes Of The Discrete Velocity Method Gaseous Flows In Rectan...
Acceleration Schemes Of The Discrete Velocity Method  Gaseous Flows In Rectan...Acceleration Schemes Of The Discrete Velocity Method  Gaseous Flows In Rectan...
Acceleration Schemes Of The Discrete Velocity Method Gaseous Flows In Rectan...Monique Carr
Ā 
Slides TSALBP ACO 2008
Slides TSALBP ACO 2008Slides TSALBP ACO 2008
Slides TSALBP ACO 2008Manuel ChiSe
Ā 
saad faim paper3
saad faim paper3saad faim paper3
saad faim paper3Saad Farooq
Ā 
Parellelism in spectral methods
Parellelism in spectral methodsParellelism in spectral methods
Parellelism in spectral methodsRamona Corman
Ā 
15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdf15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdfAllanKelvinSales
Ā 
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMTHE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMIJCSEA Journal
Ā 
Gy3312241229
Gy3312241229Gy3312241229
Gy3312241229IJERA Editor
Ā 

Similar to An Efficient Implementation Of An Algebraic Multigrid Solver (20)

A Self-Tuned Simulated Annealing Algorithm using Hidden Markov Mode
A Self-Tuned Simulated Annealing Algorithm using Hidden Markov ModeA Self-Tuned Simulated Annealing Algorithm using Hidden Markov Mode
A Self-Tuned Simulated Annealing Algorithm using Hidden Markov Mode
Ā 
RS
RSRS
RS
Ā 
2007 santiago marchi_cobem_2007
2007 santiago marchi_cobem_20072007 santiago marchi_cobem_2007
2007 santiago marchi_cobem_2007
Ā 
Plan economico
Plan economicoPlan economico
Plan economico
Ā 
Plan economico
Plan economicoPlan economico
Plan economico
Ā 
Plan economico del 2017
Plan economico del 2017Plan economico del 2017
Plan economico del 2017
Ā 
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Using the Componentwise Metropolis-Hastings Algorithm to Sample from the Join...
Ā 
2005 pinto santiago_marchi_cobem_2005
2005 pinto santiago_marchi_cobem_20052005 pinto santiago_marchi_cobem_2005
2005 pinto santiago_marchi_cobem_2005
Ā 
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATIONA MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
A MODIFIED VORTEX SEARCH ALGORITHM FOR NUMERICAL FUNCTION OPTIMIZATION
Ā 
Numerical disperison analysis of sympletic and adi scheme
Numerical disperison analysis of sympletic and adi schemeNumerical disperison analysis of sympletic and adi scheme
Numerical disperison analysis of sympletic and adi scheme
Ā 
A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...
A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...
A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...
Ā 
Acceleration Schemes Of The Discrete Velocity Method Gaseous Flows In Rectan...
Acceleration Schemes Of The Discrete Velocity Method  Gaseous Flows In Rectan...Acceleration Schemes Of The Discrete Velocity Method  Gaseous Flows In Rectan...
Acceleration Schemes Of The Discrete Velocity Method Gaseous Flows In Rectan...
Ā 
Slides TSALBP ACO 2008
Slides TSALBP ACO 2008Slides TSALBP ACO 2008
Slides TSALBP ACO 2008
Ā 
saad faim paper3
saad faim paper3saad faim paper3
saad faim paper3
Ā 
Ijetr021210
Ijetr021210Ijetr021210
Ijetr021210
Ā 
Ijetr021210
Ijetr021210Ijetr021210
Ijetr021210
Ā 
Parellelism in spectral methods
Parellelism in spectral methodsParellelism in spectral methods
Parellelism in spectral methods
Ā 
15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdf15.sp.dictionary_draft.pdf
15.sp.dictionary_draft.pdf
Ā 
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHMTHE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
THE RESEARCH OF QUANTUM PHASE ESTIMATION ALGORITHM
Ā 
Gy3312241229
Gy3312241229Gy3312241229
Gy3312241229
Ā 

More from Angela Shin

PPT Writing A Narrative Essay PowerPoint Presentation Free To
PPT Writing A Narrative Essay PowerPoint Presentation Free ToPPT Writing A Narrative Essay PowerPoint Presentation Free To
PPT Writing A Narrative Essay PowerPoint Presentation Free ToAngela Shin
Ā 
The Archives The College Board Essays, Part 3 Sam
The Archives The College Board Essays, Part 3 SamThe Archives The College Board Essays, Part 3 Sam
The Archives The College Board Essays, Part 3 SamAngela Shin
Ā 
Health Care Essay. Online assignment writing service.
Health Care Essay. Online assignment writing service.Health Care Essay. Online assignment writing service.
Health Care Essay. Online assignment writing service.Angela Shin
Ā 
PDF A Manual For Writers Of Term Papers, Theses, And D
PDF A Manual For Writers Of Term Papers, Theses, And DPDF A Manual For Writers Of Term Papers, Theses, And D
PDF A Manual For Writers Of Term Papers, Theses, And DAngela Shin
Ā 
Writing Topics For Kids Writing Topics, Journal Pro
Writing Topics For Kids Writing Topics, Journal ProWriting Topics For Kids Writing Topics, Journal Pro
Writing Topics For Kids Writing Topics, Journal ProAngela Shin
Ā 
Summary Essay. Online assignment writing service.
Summary Essay. Online assignment writing service.Summary Essay. Online assignment writing service.
Summary Essay. Online assignment writing service.Angela Shin
Ā 
College Essays, College Application Essays - The C
College Essays, College Application Essays - The CCollege Essays, College Application Essays - The C
College Essays, College Application Essays - The CAngela Shin
Ā 
Sample Essay Topics For College.. Online assignment writing service.
Sample Essay Topics For College.. Online assignment writing service.Sample Essay Topics For College.. Online assignment writing service.
Sample Essay Topics For College.. Online assignment writing service.Angela Shin
Ā 
Thematic Essay Writing Steps By. Online assignment writing service.
Thematic Essay Writing Steps By. Online assignment writing service.Thematic Essay Writing Steps By. Online assignment writing service.
Thematic Essay Writing Steps By. Online assignment writing service.Angela Shin
Ā 
DBQSEssays - UShistory. Online assignment writing service.
DBQSEssays - UShistory. Online assignment writing service.DBQSEssays - UShistory. Online assignment writing service.
DBQSEssays - UShistory. Online assignment writing service.Angela Shin
Ā 
007 Essay Example Writing App The Best For Mac Ipa
007 Essay Example Writing App The Best For Mac Ipa007 Essay Example Writing App The Best For Mac Ipa
007 Essay Example Writing App The Best For Mac IpaAngela Shin
Ā 
How To Write An Abstract For A Research Paper Fast And Easy
How To Write An Abstract For A Research Paper Fast And EasyHow To Write An Abstract For A Research Paper Fast And Easy
How To Write An Abstract For A Research Paper Fast And EasyAngela Shin
Ā 
How To Become A Better, Faster, And More Efficient
How To Become A Better, Faster, And More EfficientHow To Become A Better, Faster, And More Efficient
How To Become A Better, Faster, And More EfficientAngela Shin
Ā 
Narrative Essay Presentation. Online assignment writing service.
Narrative Essay Presentation. Online assignment writing service.Narrative Essay Presentation. Online assignment writing service.
Narrative Essay Presentation. Online assignment writing service.Angela Shin
Ā 
Describing People - All Things Topics Learn Engli
Describing People - All Things Topics Learn EngliDescribing People - All Things Topics Learn Engli
Describing People - All Things Topics Learn EngliAngela Shin
Ā 
Custom, Cheap Essay Writing Services - Essay Bureau Is
Custom, Cheap Essay Writing Services - Essay Bureau IsCustom, Cheap Essay Writing Services - Essay Bureau Is
Custom, Cheap Essay Writing Services - Essay Bureau IsAngela Shin
Ā 
Hypothesis Example In Research Paper - The Res
Hypothesis Example In Research Paper - The ResHypothesis Example In Research Paper - The Res
Hypothesis Example In Research Paper - The ResAngela Shin
Ā 
Social Science Research Paper Example - Mariah E
Social Science Research Paper Example - Mariah ESocial Science Research Paper Example - Mariah E
Social Science Research Paper Example - Mariah EAngela Shin
Ā 
Write Esse Best Websites For Essays In English
Write Esse Best Websites For Essays In EnglishWrite Esse Best Websites For Essays In English
Write Esse Best Websites For Essays In EnglishAngela Shin
Ā 
Brilliant How To Write A Good Conclusion With Examples Re
Brilliant How To Write A Good Conclusion With Examples ReBrilliant How To Write A Good Conclusion With Examples Re
Brilliant How To Write A Good Conclusion With Examples ReAngela Shin
Ā 

More from Angela Shin (20)

PPT Writing A Narrative Essay PowerPoint Presentation Free To
PPT Writing A Narrative Essay PowerPoint Presentation Free ToPPT Writing A Narrative Essay PowerPoint Presentation Free To
PPT Writing A Narrative Essay PowerPoint Presentation Free To
Ā 
The Archives The College Board Essays, Part 3 Sam
The Archives The College Board Essays, Part 3 SamThe Archives The College Board Essays, Part 3 Sam
The Archives The College Board Essays, Part 3 Sam
Ā 
Health Care Essay. Online assignment writing service.
Health Care Essay. Online assignment writing service.Health Care Essay. Online assignment writing service.
Health Care Essay. Online assignment writing service.
Ā 
PDF A Manual For Writers Of Term Papers, Theses, And D
PDF A Manual For Writers Of Term Papers, Theses, And DPDF A Manual For Writers Of Term Papers, Theses, And D
PDF A Manual For Writers Of Term Papers, Theses, And D
Ā 
Writing Topics For Kids Writing Topics, Journal Pro
Writing Topics For Kids Writing Topics, Journal ProWriting Topics For Kids Writing Topics, Journal Pro
Writing Topics For Kids Writing Topics, Journal Pro
Ā 
Summary Essay. Online assignment writing service.
Summary Essay. Online assignment writing service.Summary Essay. Online assignment writing service.
Summary Essay. Online assignment writing service.
Ā 
College Essays, College Application Essays - The C
College Essays, College Application Essays - The CCollege Essays, College Application Essays - The C
College Essays, College Application Essays - The C
Ā 
Sample Essay Topics For College.. Online assignment writing service.
Sample Essay Topics For College.. Online assignment writing service.Sample Essay Topics For College.. Online assignment writing service.
Sample Essay Topics For College.. Online assignment writing service.
Ā 
Thematic Essay Writing Steps By. Online assignment writing service.
Thematic Essay Writing Steps By. Online assignment writing service.Thematic Essay Writing Steps By. Online assignment writing service.
Thematic Essay Writing Steps By. Online assignment writing service.
Ā 
DBQSEssays - UShistory. Online assignment writing service.
DBQSEssays - UShistory. Online assignment writing service.DBQSEssays - UShistory. Online assignment writing service.
DBQSEssays - UShistory. Online assignment writing service.
Ā 
007 Essay Example Writing App The Best For Mac Ipa
007 Essay Example Writing App The Best For Mac Ipa007 Essay Example Writing App The Best For Mac Ipa
007 Essay Example Writing App The Best For Mac Ipa
Ā 
How To Write An Abstract For A Research Paper Fast And Easy
How To Write An Abstract For A Research Paper Fast And EasyHow To Write An Abstract For A Research Paper Fast And Easy
How To Write An Abstract For A Research Paper Fast And Easy
Ā 
How To Become A Better, Faster, And More Efficient
How To Become A Better, Faster, And More EfficientHow To Become A Better, Faster, And More Efficient
How To Become A Better, Faster, And More Efficient
Ā 
Narrative Essay Presentation. Online assignment writing service.
Narrative Essay Presentation. Online assignment writing service.Narrative Essay Presentation. Online assignment writing service.
Narrative Essay Presentation. Online assignment writing service.
Ā 
Describing People - All Things Topics Learn Engli
Describing People - All Things Topics Learn EngliDescribing People - All Things Topics Learn Engli
Describing People - All Things Topics Learn Engli
Ā 
Custom, Cheap Essay Writing Services - Essay Bureau Is
Custom, Cheap Essay Writing Services - Essay Bureau IsCustom, Cheap Essay Writing Services - Essay Bureau Is
Custom, Cheap Essay Writing Services - Essay Bureau Is
Ā 
Hypothesis Example In Research Paper - The Res
Hypothesis Example In Research Paper - The ResHypothesis Example In Research Paper - The Res
Hypothesis Example In Research Paper - The Res
Ā 
Social Science Research Paper Example - Mariah E
Social Science Research Paper Example - Mariah ESocial Science Research Paper Example - Mariah E
Social Science Research Paper Example - Mariah E
Ā 
Write Esse Best Websites For Essays In English
Write Esse Best Websites For Essays In EnglishWrite Esse Best Websites For Essays In English
Write Esse Best Websites For Essays In English
Ā 
Brilliant How To Write A Good Conclusion With Examples Re
Brilliant How To Write A Good Conclusion With Examples ReBrilliant How To Write A Good Conclusion With Examples Re
Brilliant How To Write A Good Conclusion With Examples Re
Ā 

Recently uploaded

Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
Ā 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
Ā 
number of operations and the problem size. Other methods, such as the classical iterative methods, present a quadratic complexity. However, multigrid methods use more memory and have a more limited applicability. Currently, there are multigrid methods that can be applied with any kind of discretization: the algebraic multigrid methods (AMG) (Axelsson & Vassilevski, 1989; Axelsson & Vassilevski, 1990; Vanek et al., 1996; Stüben, 1999). AMG algorithms are composed of two stages, the setup phase and the solving phase. The latter shows linear complexity (Axelsson & Vassilevski, 1991; Cela & Navarro, 1992), but the setup phase represents an overhead that can only be amortized on large problems. In Iwamura et al. (2003), solvers were developed based on an AMG with a quick setup phase and a fast iteration cycle; these characteristics make AMG suitable for medium-size problems. In Mo & Xu (2007), a parallel coarsening strategy is presented that improves convergence while obtaining reasonable CPU times with typical AMG algorithms. A recent work (Pereira et al., 2006) shows that it is possible to reduce the memory use of AMG by applying the discrete wavelet transform (DWT) in the setup phase; there, the DWT is used to build up the matrix hierarchy of the wavelet multiresolution decomposition process. In another recent work (Joubert & Cullum, 2006), a parallel scalable AMG is presented; its scalability is achieved through a new parallel coarsening technique combined with aggressive coarsening and a multipass interpolation technique.

In this work, we show that it is possible, in practice, to solve efficiently the linear system Eq. 1 arising from a 3D scalar convection-diffusion operator using an AMG, with a CPU time proportional to the problem size.

The paper is organized as follows. Section 2 presents the theoretical foundations of AMG methods. Section 3 gives the details of the AMG implementation. Section 4 describes the techniques used to optimize the codes of Section 3. Section 5 presents the experimental results obtained with the optimized solver. Finally, Section 6 discusses the conclusions.

2. Multigrid Methods

A multigrid method consists of the following elements:

1. A sequence of meshes, with a matrix associated to each grid.
2. Intergrid transfer operators between the meshes (interpolator and restrictor).
3. A classical iterative method (Gauss-Seidel, Jacobi, SSOR, etc.), which is called the relaxer.

To explain the multigrid method, suppose that there exist two grids, M_h (fine grid) and M_H (coarse grid).
A_h and A_H are the matrices that come from the discretization on M_h and M_H; their dimensions are n x n and N x N, respectively. The problem is then to solve a linear system on the fine grid M_h:

    A_h u_h = f_h    (2)
where u_h, f_h \in R^n; we denote V^h = R^n and, analogously, V^H = R^N. Two transfer operators between V^h and V^H must be built up: the interpolation operator I : V^H -> V^h and the restriction operator R : V^h -> V^H. In the algebraic multigrid methods (AMG), A_H is defined by Eq. 3:

    A_H = R A_h I    (3)

2.1. Multigrid Algorithm

The multigrid method is based on Algorithm 1. Because the problem usually involves more than two meshes, step 3 of the algorithm is applied recursively until the problem is reduced to a sufficiently coarse mesh. This algorithm is known as the V-cycle algorithm (Fig. 1). Different multigrid algorithms exist, and they are named depending on the order in which the levels are visited.

1. Relax ν times A_h u_h = f_h in V^h, with initial solution u_h^0.
2. Compute r_H = R(f_h - A_h u_h^ν).
3. Solve A_H e_H = r_H in V^H.
4. Correct the approximation on the fine grid: u_h^ν <- u_h^ν + I e_H.
5. Relax µ times A_h u_h = f_h with initial solution u_h^ν.

Algorithm 1: V-cycle multigrid algorithm

Fig. 1 shows the scheme of the V-cycle; other schemes, such as the W-cycle and the F-cycle, are also used.

Fig. 1. V-cycle scheme.
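To make the recursion in step 3 of Algorithm 1 concrete, the following C sketch implements one V-cycle over a hierarchy of levels. It is only an illustration under our own assumptions: the Level structure, the routine names (csr_matvec, jacobi_sweep, vcycle) and the CSR storage layout (row pointers ia, column indices ja, values a) are ours and do not match the UCSparseLib interface, and the coarsest-level direct factorization of the paper is replaced here by repeated smoothing so that the sketch stays self-contained.

#include <stdlib.h>

/* One level of the hierarchy: the system matrix A (n x n), the
 * restrictor R (N x n) and the interpolator I (n x N), all in CSR. */
typedef struct {
    int n, N;
    int *Aia, *Aja; double *Aa;
    int *Ria, *Rja; double *Ra;
    int *Iia, *Ija; double *Ia;
} Level;

static void csr_matvec(int rows, const int *ia, const int *ja,
                       const double *a, const double *x, double *y)
{
    for (int i = 0; i < rows; i++) {
        double s = 0.0;
        for (int k = ia[i]; k < ia[i + 1]; k++) s += a[k] * x[ja[k]];
        y[i] = s;
    }
}

/* One weighted-Jacobi sweep: u <- u + w D^{-1} (f - A u). */
static void jacobi_sweep(const Level *L, double *u, const double *f, double w)
{
    double *r = malloc(L->n * sizeof *r);
    csr_matvec(L->n, L->Aia, L->Aja, L->Aa, u, r);
    for (int i = 0; i < L->n; i++) {
        double d = 1.0;
        for (int k = L->Aia[i]; k < L->Aia[i + 1]; k++)
            if (L->Aja[k] == i) d = L->Aa[k];   /* find the diagonal */
        u[i] += w * (f[i] - r[i]) / d;
    }
    free(r);
}

/* Steps 1-5 of Algorithm 1, applied recursively; levels[nlev-1] is the
 * coarsest level, where many smoothing sweeps stand in for the direct
 * factorization used in the paper. */
void vcycle(Level *levels, int lev, int nlev, double *u, const double *f,
            int nu1, int nu2, double w)
{
    Level *L = &levels[lev];
    if (lev == nlev - 1) {                                   /* coarsest  */
        for (int s = 0; s < 50; s++) jacobi_sweep(L, u, f, w);
        return;
    }
    for (int s = 0; s < nu1; s++) jacobi_sweep(L, u, f, w);  /* step 1    */

    double *r  = malloc(L->n * sizeof *r);                   /* step 2    */
    double *rH = calloc(L->N, sizeof *rH);
    double *eH = calloc(L->N, sizeof *eH);
    csr_matvec(L->n, L->Aia, L->Aja, L->Aa, u, r);
    for (int i = 0; i < L->n; i++) r[i] = f[i] - r[i];       /* f - A u   */
    csr_matvec(L->N, L->Ria, L->Rja, L->Ra, r, rH);          /* r_H = R r */

    vcycle(levels, lev + 1, nlev, eH, rH, nu1, nu2, w);      /* step 3    */

    csr_matvec(L->n, L->Iia, L->Ija, L->Ia, eH, r);          /* step 4    */
    for (int i = 0; i < L->n; i++) u[i] += r[i];             /* u += I e_H */

    for (int s = 0; s < nu2; s++) jacobi_sweep(L, u, f, w);  /* step 5    */
    free(r); free(rH); free(eH);
}

Calling vcycle repeatedly until the residual norm drops below the prescribed tolerance reproduces the iteration loop described later in Section 3.3.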
The relaxation methods, such as Jacobi or Gauss-Seidel, effectively reduce only certain components of the error, in particular those components associated with the eigenvalues of the relaxer that are close to zero. These are called the high-frequency components of the error. Fig. 2 shows the smoothing effect after one relaxation step.

Fig. 2. Smoothing effect of one relaxation step.

The key to success for multigrid methods is to find adequate R and I operators, so that the coarse-grid correction operator is able to correct the errors that relaxation cannot attenuate. Thus, it is possible to obtain an exact solution of the system with little relaxation. If this property is achieved, the complexity of the method as a whole is an expression of the type

    Complexity = O(k(n + N^p))

with p = 2 or 3, depending on the method used to solve the system on the coarsest grid. If N^p << n, the multigrid method presents a linear complexity, that is to say, O(kn). The main focus of this work is to determine the value of k for which it is preferable to use the algebraic multigrid method instead of some classical iterative method.

2.2. Algebraic Multigrid

Formally, an AMG cycle can be described in the same way as a geometric multilevel cycle, except that the terms mesh, submesh, node, etc. are replaced by a set of variables, a subset of variables, a variable, etc. The formal components of an AMG can be described by means of Eq. 4:

    \sum_{j \in V^h} a^h_{ij} u^h_j = f^h_i ,   i \in V^h    (4)

In Eq. 4, V^h denotes the set of indices 1, 2, ..., n; it is assumed that A_h is a sparse matrix. In order to generate a coarse system from Eq. 4, it is necessary to perform a partition of the set V^h into two disjoint subsets, V^h = C^h \cup F^h, where the subset C^h contains the variables that belong to the coarse level (coarse nodes) and the subset F^h (fine nodes) is the complement of C^h. Assuming that such a partition is given, and defining V^H = C^h, the coarse system is shown in Eq. 5:

    A_H u_H = f_H ,  i.e.  \sum_{l \in \Omega^H} a^H_{kl} u^H_l = f^H_k ,   k \in V^H    (5)

where A_H = R A_h I, R : V^h -> V^H is the restriction operator and I : V^H -> V^h is the interpolation operator, with R = I^t. Finally, as in any multilevel method, a smoothing process S_h is required. One step of this smoothing process is of the form u^h -> \bar{u}^h, where

    \bar{u}^h = S_h u^h + (I_h - S_h) A_h^{-1} f^h    (6)

In Eq. 6, I_h denotes the identity operator. Consequently, the error e^h = u^h_* - u^h, where u^h_* denotes the exact solution of Eq. 4, is transformed according to Eq. 7:

    e^h -> \bar{e}^h ,   \bar{e}^h = S_h e^h    (7)

The AMG uses a simple smoothing process such as Gauss-Seidel (S_h = I_h - Q_h^{-1} A_h, where Q_h is the lower triangle of A_h, including the diagonal) or weighted-Jacobi relaxation (S_h = I_h - w D_h^{-1} A_h, where D_h = diag(A_h)). The matrix A_h should be diagonally dominant in order to use these methods; otherwise, additional hypotheses must be formulated on A_h.
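As an illustration of the Gauss-Seidel smoother S_h = I_h - Q_h^{-1} A_h, the following C sketch performs one forward sweep on a CSR matrix. The interface (gauss_seidel_sweep and the ia/ja/a arrays) is ours, not the library's.

/* One forward Gauss-Seidel sweep on a CSR matrix: for each row i, solve
 * the i-th equation for u[i], using already-updated values u[j], j < i. */
void gauss_seidel_sweep(int n, const int *ia, const int *ja, const double *a,
                        double *u, const double *f)
{
    for (int i = 0; i < n; i++) {
        double s = f[i], diag = 1.0;
        for (int k = ia[i]; k < ia[i + 1]; k++) {
            int j = ja[k];
            if (j == i) diag = a[k];        /* keep the diagonal apart   */
            else        s  -= a[k] * u[j];  /* lower part already updated */
        }
        u[i] = s / diag;
    }
}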
The C/F partition and the transfer operators I and R must be built up explicitly. The construction of these components is the most important task of an AMG. They must be built so that an efficient relationship exists between the smoothing process and the coarse-grid correction operator. This relationship depends on the algebraic sense of the smooth errors under a given smoothing process. The algebraic interpretation of the low-frequency errors for the Gauss-Seidel and Jacobi relaxers has been studied for symmetric positive definite M-matrices with weak diagonal dominance, as mentioned in (Ruge & Stüben, 1993). For these cases, the construction of the coarse spaces is based on characterizing the smooth errors over the matrix graph. This characterization is achieved if the following expression is satisfied:

    \sum_{j \ne i} \frac{|a_{ij}|}{a_{ii}} \frac{(e_i - e_j)^2}{e_i^2} << 1    (8)

where the a_{ij} are the matrix coefficients and e_i, e_j are components of the error. For Eq. 8 to be fulfilled, the matrix must be weakly diagonally dominant. Similarly, it is important that the partition and the transfer operators are such that A_H is reasonably sparse and much smaller than A_h.

According to the construction phase, AMG methods are divided into two groups: methods based on interpolation techniques and methods based on aggregation techniques. The difference between these methods is the manner in which they build the C/F partition and the transfer operators.

2.2.1. Interpolation Methods

In 1987, J. Ruge and K. Stüben (Ruge & Stüben, 1993) proposed an algebraic multigrid method for symmetric positive definite M-matrices with weak diagonal dominance, using either the Jacobi or the Gauss-Seidel method as relaxer. Here, the interpolation is based on the concept of strong connections between nodes. In that work, the convergence and the linear complexity order of the method are demonstrated. In 1991, W. Huang (Huang, 1991) built on the results of Ruge and Stüben to demonstrate the AMG convergence for symmetric positive definite matrices with weak diagonal dominance. In 1994, Reusken (Reusken, 1994) proposed a method similar to that of Ruge and Stüben, based on an approximation to the Schur complement.
In 1991, Wagner (Wagner et al., 1991) presented a modified version of the method of Reusken. In this version, a modified linear system Mx = Ff is solved by means of a simple block elimination, as in the method of Reusken, but the coefficients of M are defined in a different manner. These last two proposals work well for positive definite matrices with weak diagonal dominance.

In what follows, the Ruge and Stüben method, with some alternative proposals from subsequent works (Krechel & Stüben, 1997; Stüben, 1999), is described, because it is the one used in the implementation of this work. In this method, the goal is to build the sets C and F so that the F-nodes can be obtained by interpolating the C-nodes.

Standard Coarsening. This method is implemented in this study because of its low memory consumption in comparison with the other methods shown in the subsequent paragraphs. In this method, the following sets are defined:

    N_i = { j \in V : j \ne i, a_{ij} \ne 0 }    (9)

    S_i = { j \in N_i : -a_{ij} \ge \eta \max_{l \ne i} (-a_{il}) }    (10)

The set S_i is known as the set of strong connections of the node i. The typical value of \eta is 0.25. The interpolation nodes C_i are defined as C_i = C \cap S_i.
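The test in Eq. 10 is a simple per-row scan. The following C sketch collects the strong connections of one node; the routine name and CSR interface are illustrative, not taken from the paper's code.

/* Strong connections of node i (Eq. 10): -a_ij >= eta * max_{l!=i}(-a_il),
 * with the typical eta = 0.25.  Column indices of S_i are written into Si;
 * the function returns their count. */
int strong_connections(int i, const int *ia, const int *ja, const double *a,
                       double eta, int *Si)
{
    double amax = 0.0;
    for (int k = ia[i]; k < ia[i + 1]; k++)   /* max_{l != i} (-a_il) */
        if (ja[k] != i && -a[k] > amax) amax = -a[k];

    int ns = 0;
    for (int k = ia[i]; k < ia[i + 1]; k++)   /* apply the threshold  */
        if (ja[k] != i && a[k] != 0.0 && -a[k] >= eta * amax)
            Si[ns++] = ja[k];
    return ns;
}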
Aggressive Coarsening. This type of coarsening is used when standard coarsening would cause a relatively high complexity, that is to say, when the memory requirement is excessive due to the Galerkin operators of the coarse levels. In these cases, the Galerkin matrix is much denser than the original matrix.

Positive Strong Connections. In both types of coarsening, the building of the subsets C and F is based on negative couplings; it is assumed that relatively small positive couplings can be ignored in the coarsening as well as in the interpolation. Nevertheless, this cannot be assumed in all cases. A process for building the subsets C and F must ensure that, for all F-nodes that have both negative and positive strong couplings, a minimum number of couplings of both signs is represented in C. Building such subsets within a single step is relatively complicated, since in a great variety of situations most of the strong connections are negative. In Stüben (1999), a simple alternative implementation is proposed.

Interpolation. It is assumed that the subsets C and F have been constructed using standard or aggressive coarsening. In the first case, direct or standard interpolation is used. In the second case, multistep interpolation is used. In both cases, the interpolation can be improved by further steps of relaxation.

Direct Interpolation. For each i \in F, the set of interpolation nodes P_i = C^s_i (C^s_i = C \cap S_i) is defined, and the i-th equation is approximated as

    a_{ii} e_i + \sum_{j \in N_i} a_{ij} e_j = 0
      ==>  a_{ii} e_i + \alpha_i \sum_{k \in P_i} a^-_{ik} e_k + \beta_i \sum_{k \in P_i} a^+_{ik} e_k = 0    (11)

with

    \alpha_i = \frac{\sum_{j \in N_i} a^-_{ij}}{\sum_{k \in P_i} a^-_{ik}}   and
    \beta_i  = \frac{\sum_{j \in N_i} a^+_{ij}}{\sum_{k \in P_i} a^+_{ik}}    (12)

where a^-_{ij} denotes the negative entries and a^+_{ij} the positive entries of the i-th row. This immediately leads to the following interpolation formula:

    e_i = \sum_{k \in P_i} w_{ik} e_k ,  with
    w_{ik} = -\alpha_i a_{ik}/a_{ii}  (k \in P^-_i),   w_{ik} = -\beta_i a_{ik}/a_{ii}  (k \in P^+_i)    (13)

Standard Interpolation. The direct interpolation can be modified so that the strong F-connections are (indirectly) included in the interpolation of each i \in F. This is achieved by improving the approximation of the i-th equation (right side of Eq. 11). First, all e_j (j \in F^s_i = F \cap S_i) are eliminated approximately by means of the corresponding j-th equations. Specifically, for each j \in F^s_i, e_j is replaced by

    e_j -> -\sum_{k \in N_j} a_{jk} e_k / a_{jj}    (14)

which gives the new equation for e_i:

    \hat{a}_{ii} e_i + \sum_{j \in \hat{N}_i} \hat{a}_{ij} e_j = 0 ,  with  \hat{N}_i = { j \ne i : \hat{a}_{ij} \ne 0 }    (15)

Defining P_i as the union of C_i and all the C_j (j \in F^s_i), the interpolation is defined exactly as in Eq. 11 and Eq. 12, with every a replaced by \hat{a} and N_i replaced by \hat{N}_i. This modification usually improves the quality of the interpolation. The main reason is that the approximation of Eq. 11 introduces less error when it is applied to Eq. 15. However, the main goal is to have F-nodes as interpolation nodes.

Multistep Interpolation. This interpolation method is applied when the subsets C and F are constructed through aggressive coarsening. The interpolation is done in several steps, using direct interpolation whenever possible and, for the remaining nodes, interpolation formulas in the neighborhoods of the F-nodes.

Jacobi Interpolation. Given any of the interpolation formulas outlined above, an additional improvement can be obtained if a Jacobi relaxation is applied a posteriori. Only one or two Jacobi steps are worthwhile in practice. Depending on the situation, relaxing the interpolation can significantly improve the convergence. This improvement of the interpolation operator was proposed in 1997 by Krechel and Stüben (Krechel & Stüben, 1997).

2.2.2. Aggregation Methods

In 1995, D. Braess (Braess, 1995) proposed an aggregation method that groups nodes into sets whose sizes range from one to four nodes. The sets are constructed in two steps: first, pairs of strongly connected nodes are grouped; then, the pairs formed in the previous step are grouped again.

In 1996, P. Vanek, J. Mandel and M. Brezina (Vanek et al., 1996) developed an AMG based on interpolation by smoothed aggregation. They used their algorithm to solve second- and fourth-order elliptic problems.

In 1997, F. Kickinger (Kickinger, 1997) proposed an AMG whose coarsening strategy is independent of the strong connections; the method is based only on the graph of the matrix. The coarsening strategy is aggressive and fast. The algorithm colors the graph of the matrix so that a node labelled as coarse has all its neighbors labelled as fine. Fig. 3 shows an example of this strategy.

Fig. 3. Graph coloring: (a) original graph, (b) resulting graph.
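A single greedy sweep over the matrix graph suffices to obtain such a coloring. The C sketch below is our own reading of the strategy of Kickinger (1997): it uses only the adjacency pattern (ia, ja), ignoring connection strengths; names and interface are illustrative.

enum { UNDECIDED = 0, COARSE = 1, FINE = 2 };

/* Graph-coloring coarsening: sweep the nodes once; every still-undecided
 * node becomes coarse and all its neighbors become fine, so no two coarse
 * nodes are adjacent (cf. Fig. 3). */
void color_coarsening(int n, const int *ia, const int *ja, int *mark)
{
    for (int i = 0; i < n; i++) mark[i] = UNDECIDED;
    for (int i = 0; i < n; i++) {
        if (mark[i] != UNDECIDED) continue;
        mark[i] = COARSE;                 /* i joins the coarse level */
        for (int k = ia[i]; k < ia[i + 1]; k++)
            if (ja[k] != i && mark[ja[k]] == UNDECIDED)
                mark[ja[k]] = FINE;       /* its neighbors stay fine  */
    }
}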
2.2.3. Complexity Operators

There are two complexity operators that indicate, approximately, the memory requirements that arise when using any of the AMG methods. These requirements are expressed in terms of the grid complexity operator C_G and the algebraic complexity operator C_A:

    C_G = \sum_l n_l / n_1    (16)

    C_A = \sum_l nz_l / nz_1    (17)

where n_l and nz_l denote the number of variables and the number of non-zero elements of the matrix A at level l, respectively.
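Both operators are simple ratios over the level hierarchy; a minimal C sketch (our own naming, with level 0 as the finest level) is:

/* Complexity operators of Eqs. 16-17, given the number of variables nl[]
 * and of non-zeros nzl[] per level; nl[0], nzl[0] belong to the finest
 * level, so C_G >= 1 and C_A >= 1. */
void complexity_operators(int nlev, const int *nl, const long *nzl,
                          double *CG, double *CA)
{
    *CG = 0.0; *CA = 0.0;
    for (int l = 0; l < nlev; l++) {
        *CG += (double)nl[l]  / nl[0];    /* sum_l n_l  / n_1  */
        *CA += (double)nzl[l] / nzl[0];   /* sum_l nz_l / nz_1 */
    }
}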
3. AMG Linear Solver

The main purpose of this work is to implement a linear solver, based on an AMG, that exhibits a linear behavior when applied to large input matrices. The AMG solver implemented here incorporates two implementations of the setup phase, with the intention of illustrating the behavior of both the interpolation and the aggregation approach. As interpolation method, the standard coarsening proposed by Ruge & Stüben (1993) (Section 2.2.1) was selected. As aggregation method, the red-black graph-coloring method proposed by Kickinger (1997) (Section 2.2.2) was chosen. In order to evaluate the methods under consideration, an existing implementation in the numerical library UCSparseLib (Larrazábal, 2004) was used.

3.1. Test Matrices

The input matrices for the linear solver were generated by discretizing a 3D scalar elliptic operator with a second-order, 7-point-stencil finite difference method. The generated operator characterizes a great variety of industrial problems and is defined as follows:

    L(u) = \Delta u + C_c \nabla \cdot u    (18)

where C_c = 0 (convection coefficient), with Dirichlet boundary conditions and the unit cube as computational domain. This ensures that the generated matrices are symmetric positive definite.
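For reference, the following C sketch assembles such a 7-point matrix in CSR form, under the assumption that the grid spacing is absorbed into the coefficients so that interior rows carry the classical (6, -1, ..., -1) stencil; the routine name and interface are illustrative.

#include <stdlib.h>

/* 7-point finite-difference Laplacian on an nx x ny x nz grid of the unit
 * cube with Dirichlet boundaries (C_c = 0).  Arrays are over-allocated to
 * 7n entries; boundary rows simply have fewer neighbors. */
void assemble_poisson7(int nx, int ny, int nz,
                       int **ia_out, int **ja_out, double **a_out)
{
    int n = nx * ny * nz, cnt = 0;
    int *ia = malloc((n + 1) * sizeof *ia);
    int *ja = malloc((size_t)7 * n * sizeof *ja);
    double *a = malloc((size_t)7 * n * sizeof *a);
    ia[0] = 0;
    for (int k = 0; k < nz; k++)
      for (int j = 0; j < ny; j++)
        for (int i = 0; i < nx; i++) {
            int row = i + nx * (j + ny * k);
            /* neighbors in ascending column order; a missing neighbor
             * corresponds to a Dirichlet boundary node */
            if (k > 0)      { ja[cnt] = row - nx*ny; a[cnt++] = -1.0; }
            if (j > 0)      { ja[cnt] = row - nx;    a[cnt++] = -1.0; }
            if (i > 0)      { ja[cnt] = row - 1;     a[cnt++] = -1.0; }
            ja[cnt] = row; a[cnt++] = 6.0;           /* diagonal */
            if (i < nx - 1) { ja[cnt] = row + 1;     a[cnt++] = -1.0; }
            if (j < ny - 1) { ja[cnt] = row + nx;    a[cnt++] = -1.0; }
            if (k < nz - 1) { ja[cnt] = row + nx*ny; a[cnt++] = -1.0; }
            ia[row + 1] = cnt;
        }
    *ia_out = ia; *ja_out = ja; *a_out = a;
}

As a sanity check, for nx = ny = nz = 70 this assembly produces 343,000 unknowns and 2,371,600 non-zeros, in agreement with Table 1 below.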
3.2. Setup Phase

The function mgrid is called a first time to carry out the setup phase. During this stage, the levels are generated, as explained in Section 2.2, using either the Stüben method or the red-black coloring method, according to the values defined in the parameter array. The stage concludes by factorizing the resulting matrix at the coarsest level with a direct method. Among the operations carried out in the setup phase when the red-black coloring method is used, the generation of the levels is the one with the highest computational cost: to generate each level, a product of three matrices must be computed, which has a cubic complexity order with respect to the number of rows of the matrix. Even though a direct method is used to factorize the matrix at the coarsest level, its computational cost is relatively small, since the order of this matrix is generally smaller than 5,000.

3.3. Solver Phase

The function mgrid is called a second time to execute the solution phase of the AMG method. According to Algorithm 1, this phase consists of a loop that is repeated until the prescribed tolerance is met or the maximum number of iterations indicated in the parameter array is reached. In each iteration, a cycle is performed, which can be a V-cycle, a W-cycle or an F-cycle. As indicated in Section 2.1, when each level of the cycle is visited, a relaxation process is performed; the relaxer can be either weighted-Jacobi or Gauss-Seidel, according to the parameter array, and the smoothing is repeated the number of times configured there. The efficiency of the solver code is closely related to the implementation of the matrix-vector product.
4. Code Optimization

This section describes the improvements made to the first version of the code implemented for the algebraic multigrid method in UCSparseLib (Larrazábal, 2004). Tests were run on a Sun Fire V40z server, using one AMD Opteron 885 processor at 2.6 GHz, with 1 MB of cache and 16 GB of main memory. The codes were implemented in ANSI C and compiled using gcc 3.6 under GNU/Linux. The selected compiler optimization flags were -march=opteron, -O2, -funroll-loops and -fprefetch-loop-arrays. The use of these optimizations allowed not only to obtain a linear behavior but also to reduce the CPU time by approximately 35% in comparison with the codes compiled without optimization flags.

4.1. Setup Phase Optimization

At the beginning of the tests of the AMG implementation, it was observed that the setup phase demanded more CPU time than the solution phase, and that this ratio did not improve when the problem size was increased, contrary to what AMG theory predicts. An analysis of the CPU time of the setup phase was then carried out using the GNU gprof (Fenlason & Stallman, 1998) and Valgrind (Weidendorfer et al., 2004) tools.

In the setup phase using the strong-connections method, with an input matrix of order N = 64,000, 87% of the CPU time was spent in the C-node selection algorithm (Section 2.2.1). The performance of this algorithm was improved by marking early the C-nodes initially processed in the selection loop. This change represented a 32% improvement in the CPU time of the selection algorithm, and translated into 55% of the complete setup phase time. For the matrix of N = 64,000, the setup time was 11.67 seconds before the improvement and 4.09 seconds after it.

In the setup phase using the red-black coloring algorithm, with an input matrix of size N = 64,000, 58% of the CPU time was spent in the matrix-matrix product, while 37% corresponded to matrix transpositions. Both the matrix-matrix product and the transposition were used to generate the A_H matrices, as explained in Section 2.2.2. The matrix-matrix product algorithm was optimized to avoid the explicit transpose operation, and it was also improved to reduce its total execution time. After the improvement, the time needed to compute the matrix-matrix product represented 68% of the setup time. The setup time was 7.93 seconds before the improvement and 0.785 seconds after it, for an input matrix with N = 64,000.

4.2. Solver Phase Optimization

In order to evaluate the solver phase, a set of tests was executed using either the Gauss-Seidel or the weighted-Jacobi relaxer. Execution profiles were generated for the V-cycle configuration using the previously mentioned tools, gprof and Valgrind. In the tests with the Gauss-Seidel relaxer, 48% of the time was spent in the matrix-vector operation, whereas 51% was spent in the scalar-product algorithm. Because the scalar-product routine was called from within the function that computes the matrix-vector product, and since the code of this routine is very simple, the calls to the scalar product were replaced by the code itself. This represented a saving of approximately 15% in the run time of the solver phase.

Finally, it was determined experimentally that two iterations of the Gauss-Seidel relaxer improved the solver time. When the weighted-Jacobi relaxer was used, it was determined experimentally that the best value for the weight constant was w = 0.9.

In the tests with both types of relaxers, a limit for the size of the matrix at the coarsest level (N = 5,000) was used, because this value yields a more linear response. When N > 5,000 was used, an appreciable increase in the setup phase was observed, due to the high computational cost of factorizing the coarsest-level matrix with a direct method. The solver selected for the coarsest level was the Cholesky factorization, since the input matrix is symmetric positive definite (Section 3.1) in all the test cases. The stopping tolerance eps was 10^-12 in all cases.
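The inlining optimization described above can be sketched as follows; the function names and CSR interface are ours, for illustration only.

/* Before: one function call per row to a scalar-product routine. */
double dot(int len, const int *ja, const double *a, const double *x)
{
    double s = 0.0;
    for (int k = 0; k < len; k++) s += a[k] * x[ja[k]];
    return s;
}
void matvec_calls(int n, const int *ia, const int *ja, const double *a,
                  const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] = dot(ia[i + 1] - ia[i], ja + ia[i], a + ia[i], x);
}

/* After: the same loop with the scalar product expanded in place,
 * removing the call overhead from the innermost loop. */
void matvec_inline(int n, const int *ia, const int *ja, const double *a,
                   const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double s = 0.0;
        for (int k = ia[i]; k < ia[i + 1]; k++) s += a[k] * x[ja[k]];
        y[i] = s;
    }
}

With the call removed, the compiler is also free to unroll and prefetch the inner loop, which is consistent with the gains observed from the -funroll-loops and -fprefetch-loop-arrays flags.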
5. Experimental Results

In order to evaluate the run-time complexity order, a set of 8 matrices was generated. The order N of these matrices goes from a minimum of 343,000 (70 x 70 x 70) to a maximum of 2,744,000 (140 x 140 x 140), so that the range covers at least one order of magnitude. The matrices were generated from a 3D elliptic operator according to Section 3.1. Their main characteristics can be observed in Table 1.

Table 1. Test matrices.

    nx x ny x nz           Size       Non-zeros
    70 x 70 x 70        343,000       2,371,600
    80 x 80 x 80        512,000       3,545,600
    90 x 90 x 90        729,000       5,054,400
    100 x 100 x 100   1,000,000       6,940,000
    110 x 110 x 110   1,331,000       9,244,400
    120 x 120 x 120   1,728,000      12,009,600
    130 x 130 x 130   2,197,000      15,277,600
    140 x 140 x 140   2,744,000      19,090,400

According to the discussion of Section 4, the linear systems represented by the test matrices were solved using three strategies: an iterative method (Conjugate Gradient), the strong-connections-based AMG, and the aggregation-based AMG (red-black coloring). In order to corroborate that, in the case of AMG, the CPU time is proportional to the order N of the linear system, the solution times of the aggregation-based AMG (red-black coloring), the strong-connections-based AMG and the Conjugate Gradient were plotted for the set of test matrices. As is known, the CPU time of the Conjugate Gradient method follows a quadratic behavior with respect to the order of the matrix, because its number of floating point operations (FLOP) is of order O(N^2). In order to determine the influence of the cycle type on the execution time, tests using the V-cycle, the F-cycle and the W-cycle were performed (see Fig. 1 for the V-cycle), with weighted-Jacobi and Gauss-Seidel as relaxers.

Table 2 shows the results for the V-cycle using the weighted-Jacobi relaxer with weight constant w = 0.9; this value was selected because the best times were obtained with it for this specific case. The same table shows the results for the F-cycle and the W-cycle using the Gauss-Seidel relaxer with two iterations, since under this specific condition better results are obtained. The operator complexity, the grid complexity and the number of levels were determined in order to evaluate the quality of the matrix systems generated at each level for both the interpolation and the aggregation method; these results can be seen in Table 3.

Fig. 4 is a graphical representation of Table 2 that illustrates the linear behavior of the algebraic multigrid with respect to the matrix size N; this AMG uses a setup phase based on the red-black coloring method. In the graph, the AMG performance can be compared against the Conjugate Gradient. Note that, as the size of the problem increases (N > 500,000), the multigrid method exhibits better times for both the W-cycle and the F-cycle. The worst case for the AMG occurs with the V-cycle approach; in this situation, the Conjugate Gradient method shows better times for N < 1,000,000. The straight line that represents the behavior of the multigrid method was obtained from the linear regression of the points that give the CPU times for the various problem sizes. It can also be observed that the best times were obtained for the F-cycle, as foreseen in Section 2.1.
Table 2. AMG vs. Conjugate Gradient CPU times (secs).

          N      Conj. Grad.     Red-Black Coloring             Strong-Connection
                                V-cycle   F-cycle   W-cycle    V-cycle   F-cycle   W-cycle
      343,000       13.269      17.557    14.688    15.304     44.198    39.348    37.677
      512,000       22.612      28.799    22.602    21.804     80.275    74.205    70.663
      729,000       36.470      40.075    29.999    33.714    122.209   114.262   108.783
    1,000,000       55.401      49.817    37.305    39.334    202.765   192.443   183.698
    1,331,000       76.063      78.637    54.827    61.665    280.646   267.269   253.981
    1,728,000      106.442      89.627    64.640    73.796    425.041   409.450   392.990
    2,197,000      145.802     123.651    90.532   102.007    579.050   563.157   541.644
    2,744,000      188.627     142.499   105.295   116.760    832.633   806.268   789.846

Table 3. Complexity operators.

          N      Red-Black Coloring                   Strong-Connection
                 Levels   G. Complex.   A. Complex.   Levels   G. Complex.   A. Complex.
      343,000      4         1.57          2.53         4         1.60          2.83
      512,000      4         1.57          2.53         5         1.60          3.10
      729,000      5         1.57          2.54         6         1.61          3.41
    1,000,000      5         1.57          2.54         6         1.61          3.47
    1,331,000      5         1.57          2.54         7         1.61          3.67
    1,728,000      5         1.57          2.54         7         1.61          3.74
    2,197,000      5         1.57          2.54         8         1.61          4.02
    2,744,000      5         1.57          2.54         8         1.61          3.98

Fig. 4. Solution times for the linear systems.

In order to estimate the constant k that relates the number of floating point operations (FLOP) to the order N, the average floating point rate (FLOPS) of the test computer system was determined by means of the DGEMM benchmark (Luszczek et al., 2005), which measures the floating point rate of execution of a double-precision real matrix-matrix multiplication. The computer system used in the tests shows an average value of 4.76 GFLOPS. Fig. 5, which shows the average number of GFLOP employed to solve the test linear systems, was generated using this estimated rate and the data of Table 2.

Fig. 5. GFLOP employed in solving the linear systems.

The approximate constants for the different AMG cycles were then determined from Fig. 5. These constants are: 253,757 FLOP for the V-cycle, 208,528 FLOP for the W-cycle and 183,771 FLOP for the F-cycle. The behavior of the strong-connections-based AMG was not plotted, because it was not linear due to the long times of its setup phase.
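As a rough consistency check (our own arithmetic, not taken from the paper), the constant for the F-cycle with red-black coloring can be estimated directly from the largest case of Table 2 and the measured 4.76 GFLOPS rate:

    k \approx \frac{(4.76 \times 10^{9}\ \mathrm{FLOP/s}) \times 105.295\ \mathrm{s}}{2{,}744{,}000}
      \approx 1.83 \times 10^{5}\ \mathrm{FLOP}

which is in good agreement with the 183,771 FLOP obtained from the regression over all problem sizes.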
6. Conclusions

In this work, we have presented an efficient implementation of an algebraic multigrid method (AMG) to solve large sparse systems of linear equations. We used a set of linear systems arising from a 3D scalar elliptic operator discretized by the finite difference method. In order to evaluate the implementation, a set of 8 linear systems was generated; the maximum order of the matrices associated to these linear systems was 2,744,000. The AMG performance was compared against the Conjugate Gradient. The AMG needed a fast setup phase to obtain a good performance. As the size of the problem increases (N > 500,000), the multigrid method exhibits better times for both the W-cycle and the F-cycle. The worst case for the AMG occurs with the V-cycle approach; in this situation, the Conjugate Gradient method shows better times for N < 1,000,000. We have also observed that the best CPU times were obtained for the F-cycle. The approximate constants k for the different AMG cycles were: 253,757 FLOP for the V-cycle, 208,528 FLOP for the W-cycle and 183,771 FLOP for the F-cycle. In general, our AMG implementation had a good performance, and we have obtained a linear AMG solver.

Acknowledgments

This work was supported by the Consejo de Desarrollo Científico y Humanístico de la Universidad de Carabobo under the projects CDCH-UC No. 2004-002 and CDCH-UC No. 2004-011. We also want to thank Pedro Linares for his helpful suggestions regarding this paper.

7. Bibliography

Axelsson, O. & P. Vassilevski. (1989). Algebraic Multilevel Preconditioning Methods I. Numer. Math. 56(2-3): 157-177.

Axelsson, O. & P. Vassilevski. (1990). Algebraic Multilevel Preconditioning Methods II. SIAM J. Numer. Anal. 27(6): 1564-1590.

Axelsson, O. & P. Vassilevski. (1991). Asymptotic Work Estimates for AMLI Methods. Appl. Numer. Math. 7(5): 437-451.

Braess, D. (1995). Towards Algebraic Multigrid for Elliptic Problems of Second Order. Computing 55(4): 379-393.
Brandt, A. (1977). Multi-Level Adaptive Solutions to Boundary-Value Problems. Math. Comput. 31(138): 333-390.

Cela, J. & J. Navarro. (1992). Performance Model for Algebraic Multilevel Preconditioners on Shared Memory Multicomputers. PACTA'92.

Fenlason, J. & R. Stallman. (1998). The GNU Profiler. Free Software Foundation. [cited 08 January 2010; 15:30 VET]. Available at http://www.gnu.org/software/binutils/manual/gprof-2.9.1/html_chapter/gprof_toc.html.

Huang, W. (1991). Convergence of Algebraic Multigrid Methods for Symmetric Positive Definite Matrices with Weak Diagonal Dominance. Appl. Math. Comput. 46(2): 145-164.

Iwamura, C., F. Costa, I. Sbarski, A. Easton & N. Li. (2003). An efficient algebraic multigrid preconditioned conjugate gradient solver. Comput. Meth. Appl. Mech. Eng. 192(20): 2299-2318.

Joubert, W. & J. Cullum. (2006). Scalable algebraic multigrid on 3500 processors. Electron. Trans. Numer. Anal. 23: 105-128.

Kickinger, F. (1997). Algebraic Multigrid for Discrete Elliptic Second-Order Problems. Technical Report, Institute for Mathematics, Johannes Kepler University Linz, Austria.

Krechel, A. & K. Stüben. (1997). Operator Dependent Interpolation in Algebraic Multigrid. Technical Report, GMD Report 1048.

Larrazábal, G. (2004). UCSparseLib: A numerical library to solve sparse linear systems. Simulación Numérica y Simulado Computacional, pp. TC19-TC25. Eds. J. Rojo, M. Torres y M. Cerrolaza. ISBN 980-6745-00-0, SVMNI, Venezuela.

Luszczek, P., J. Dongarra, R. Rabenseifner, B. Lucas, J. Kepner, J. McCalpin, D. Bailey & D. Takahashi. (2005). Introduction to the HPC Challenge Benchmark Suite. [cited 08 January 2010; 14:30 VET]. Available at http://icl.cs.utk.edu/projectsfiles/hpcc/pubs/hpcc-challenge-benchmark05.pdf.

Mo, Z. & X. Xu. (2007). Relaxed RS0 and CLJP coarsening strategy for parallel AMG. Parallel Comput. 33(3): 174-185.

Pereira, F., S. Lopes & S. Nabeta. (2006). A wavelet-based algebraic multigrid preconditioner for sparse linear systems. Appl. Math. Comput. 182(2): 1098-1107.

Reusken, A. (1994). Multigrid with Matrix-Dependent Transfer Operators for Convection-Diffusion Problems. In: Multigrid Methods IV, Internat. Ser. Numer. Math. 116, Birkhäuser-Verlag, Basel, pp. 269-280.

Ruge, J. & K. Stüben. (1993). Algebraic Multigrid (AMG). In: Multigrid Methods (McCormick, S.F., ed.). SIAM Frontiers in Applied Mathematics, Philadelphia, USA.

Stüben, K. (1999). Algebraic Multigrid (AMG): An Introduction with Applications. Technical Report, GMD Report 53.

Vanek, P., J. Mandel & M. Brezina. (1996). Algebraic Multigrid by Smoothed Aggregation for Second Order and Fourth Order Elliptic Problems. Computing 56(3): 179-196.

Wagner, C., W. Kinzelbach & G. Wittum. (1991). Schur-Complement Multigrid: a Robust Method for Groundwater Flow and Transport Problems. Numer. Math. 75(4): 523-545.

Weidendorfer, J., M. Kowarschik & C. Trinitis. (2004). A Tool Suite for Simulation Based Analysis of Memory Access Behavior. Proceedings of the 4th International Conference on Computational Science (ICCS 2004), Kraków, Poland.