An Efficient Implementation Of An Algebraic Multigrid Solver

AN EFFICIENT IMPLEMENTATION OF AN ALGEBRAIC MULTIGRID
SOLVER
Implementación Eficiente de un Solver Multigrid Algebraico
JORGE A. CASTELLANOS D., JOSÉ L. RAMÍREZ y GERMÁN A. LARRAZÁBAL S.
Centro Multidisciplinario de Visualización y Cómputo Cientı́fico (CeMViCC)
Universidad de Carabobo. Facultad Experimental de Ciencia y Tecnologı́a.
Carabobo. Venezuela.
{jcasteld, jbarrios, glarraza}@uc.edu.ve
Fecha de Recepción: 15/01/2010, Fecha de Revisión: 01/03/2010, Fecha de Aceptación: 15/07/2010
Abstract
In this work, we present an efficient implementation of an algebraic multigrid method (AMG) to solve
large sparse systems of linear equations. The multigrid methods (MG), in particular the AMG’s, exhibit
a theoretical linear complexity with respect to the number of floating point operations (FLOP) and the
problem size. In practice, the problem is to determine when it is preferable to use an AMG instead of
some other solver. For this reason, in this work the main focus is to solve this problem. We use a set of
linear systems arising from a 3D scalar elliptic operator discretized by finite difference method. In order
to evaluate the implementation, a set of 8 linear systems is generated. The maximum order of the matrices
associated to these linear systems is 2,744,000. The experimental results show the good performance of
the AMG’s implementation. This performance is observed in the linear behavior of the AMG in our test
problems.
Keywords: Sparse Linear Solvers, Algebraic Multigrid, AMG, Code Optimization.
Resumen
En este trabajo se presenta una implementación eficiente del Método Multinivel Algebraico (AMG)
para la resolución de grandes sistemas de ecuaciones lineales esparcidos. Los métodos multinivel (MG),
en particular AMG, muestran una complejidad lineal con respecto al número de operaciones de punto
flotante (FLOP) y el tamaño del problema. En la práctica, el problema consiste en determinar cuando es
preferible emplear AMG en lugar de otro método. Por este motivo, este trabajo se enfoca en resolver este
problema. Se empleó un conjunto de sistemas lineales provenientes de la discretización de un operador
3D escalar elı́ptico mediante diferencias finitas. Con el objetivo de evaluar la implementación presentada
se generó un conjunto de 8 sistemas lineales. El orden máximode las matrices asociadas a los sistemas
lineales generados, fue de 2.744.000. Los resultados experimentales muestran un buen comportamiento
de la implementación, donde se obtuvo un comportamiento lineal del Método Multinivel Algebraico
(AMG).
Palabra Claves: AMG, Multinivel Algebraico, Optimización de Código, Sistemas Esparcidos.

1. Introduction
The computational solution of linear equa-
tion systems is one of the most important re-
search areas nowadays, specially those systems
that come from modeling physical problems with
certain complexity. Examples are those systems
associated with industrial applications, such as
fluid dynamics and structural mechanics. The ty-
pical computational core, to solve these problems
is the linear system:
Ax = b (1)
where the A matrix is non-singular, large and
sparse. Then, it is very important to find a nu-
merical method to solve Eq.1 in such a man-
ner that the number of floating point operations
is proportional to problem size. This feature has
been reached with the multigrid (MG) methods
(Brandt, 1977). The multigrid methods exhibit a
theoretical linear complexity with respect to the
number of operations and the problem size. Ot-
hers methods, such as the classical iterative met-
hods, present a cuadratic complexity. But, the
multigrid methods spend more memory and have
a limited applicability. Currently, there are multi-
grid methods that can be used applying any kind
of discretization. These are the Algebraic Multi-
grid Methods (AMG) (Axelsson & Vassilevski,
1989; Axelsson & Vassilevski, 1990; Vanek et
al., 1996; Stüben, 1999). The AMG algorithms
are composed by two stages, the setup phase and
the solving phase. The latter shows linear com-
plexity (Axelsson & Vassilevski, 1991; Cela &
Navarro, 1992), but the setup phase represents
an overhead. This overhead can only be decrea-
sed on large size problems. In Iwamura et al.
(2003), solvers has been developed based on an
AMG with a quick setup phase and a fast itera-
tion cycle. These characteristics make AMG sui-
table to solve medium size problems. In Mo & Xu
(2007), a parallel coarsening strategy that allows
to improve the convergence as well as to obtain
reasonable CPU times using tipical AMG algo-
rithms is presented. A recent work (Pereira et al.,
2006) shows that it is possible to reduce the AMG
memory use using the discrete wavelet transform
(DWT) in the setup phase. In that work, the DWT
was applied to build up the matrices hierarchy of
the wavelets multiresolution decomposition pro-
cess. In other recent work (Joubert & Cullum,
2006), a parallel scalable AMG is presented. This
scalability is achieved through the use of a new
parallel coarsening technique in addition to an
agressive coarsening and multipass interpolation
technique.
In this work, we show that it is possible,
in practice, to solve efficiently the linear system
Eq.1 that arises from a 3D scalar convection-
diffusion operator using an AMG with a CPU ti-
me proportional to the problem size.
The present paper is organized in the follo-
wing manner. In section 2, the theoretical foun-
dations of the AMG methods are presented. In
section 3, details of the AMG implementation are
shown. section 4 contains the techniques that we-
re used to optimize the codes presented in section
3. section 5 presents the experimental results ob-
tained by using the optimized solver. Finally, in
section 6, conclusions are discussed.
2. Multigrid Methods
A multigrid method consists of the follo-
wing elements:
1. A sequence of meshes with a matrix asso-
ciated to each grid.
2. Intergrid transfer operators between the
meshes (Interpolator and Restrictor).
3. A classical iterative method (Gauss-Seidel,
Jacobi, SSOR, etc.), which is called relaxer.
In order to explain the multigrid method,
suppose that there exist two grids called Mh
(fine-grid) and MH
(coarse-grid). Ah and AH are
matrices that come from a discretization in Mh
and MH
. The dimensions of Ah and AH are n×n
and N × N, respectively. The problem is then to
solve a linear system in the fine-grid Mh
as fo-
llows:
Ahuh
= fh
(2)

where uh
, fh
∈ Rn
, we will denominate V h
=
Rn
, and similarly V H
= RN
. Two transference
operators between V h
and V H
must be built up.
These are the interpolation operator I : V H
→
V h
and the restriction operator R : V h
→ V H
.
In the algebraic multigrid methods (AMG) AH is
defined as equation 3. Analogously V H
= RN
.
AH = RAhI (3)
2.1. Multigrid Algorithm
The multigrid method is based on the Al-
gorithm 1. Because usually the problem contains
more than two meshes, step 3 of the algorithm is
applied recursively until the problem is reduced
to a sufficiently coarse mesh. The previous algo-
rithm is known as the V-cycle algorithm (Fig. 1).
Different multigrid algorithms exist and they are
named depending on the form in which each level
is visited.
1. Relax v times Ahuh
= fh
in V h
, with
initial solution uh
0.
2. Compute rH
= R(fh
− Ahuh
v ).
3. Solve AHeH
= rH
in V H
.
4. Correct the approximation in the fine
grid: uh
v = uh
v + IeH
.
5. Relax µ times Ahuh
= fh
with initial
solution uh
v
Algorithm 1: V-cycle multigrid algorithm
The Fig. 1 shows the scheme for the V-
cycle, although there are other schemes used such
as W-cycle and F-cycle.
Fig. 1. v-cycle.
The relaxation methods, such as Jacobi
or Gauss-Seidel, reduce effectively only certain
components of the error, in particular those com-
ponents associated with the relaxer eigenvalues
that are close to zero. These are called high fre-
quency components of the error. In Fig. 2 the ef-
fect of smoothing, after a relaxation step, can be
observed.
Fig. 2. Effect of smoothed.
The key to success for multigrid methods is
to find adequate R and I operators, so that the co-
rrection operator of coarse grid is able to correct
the errors that relaxation is not able to attenuate.
Thus, it is possible to obtain an exact solution of
the system with little relaxation. If this property
is achieved, the complexity of the method as a
whole, will be an expression of the type:
Complexity = O(k(n + Np
))
with p = 2 or 3 depending on the method used to
solve the system in the coarse grid. If Np
n,
the multigrid method presents a linear comple-
xity, that is to say, O(kn). The main focus of this
work is to determine the value of k for which it is
preferable to use the algebraic multigrid method
instead of some classical iterative method.
2.2. Algebraic Multigrid
Formally, a cycle AMG can be described in
the same way that a multilevel geometric cycle,
except that the terms mesh, submesh, node, etc.
are replaced by a set of variables, subset of va-
riables, variables, etc. Formal components of an
AMG can be described by means of the Eq. 4.
X
j∈V h
ah
ijuh
j = fh
i (i ∈ V h
) (4)

In Eq. 4, V h
denotes the set of indices
1, 2, . . . , n; it is assumed that Ah is a sparse ma-
trix. In order to generate a coarse system from
Eq.4, it is necessary to perfom a partition of the
set V h
in two disjoints subsets, V h
= Ch
∪ Fh
,
where the subset Ch
contains the variables that
are in the coarse level (coarse nodes) and the sub-
set Fh
(fine nodes) is the complement of Ch
. As-
suming that such partition is given, and defining
V H
= Ch
, the coarse system is shown in Eq.5.
AHuH
= fH
o
X
l∈ΩH
aH
kluH
l = fH
k (k ∈ V H
) (5)
where AH = RAhI, R : V h
→ V H
is the res-
triction operator and I : V H
→ V h
is the in-
terpolation operator R = It
. Finally, as in any
multilevel method a process of smoothing Sh is
required. One step of this smoothing process is
of the form:
uh
→ uh
where
uh
= Shuh
+ (Ih − Sh)A−1
h fh
(6)
In Eq. 6, Ih denotes the identity operator.
Consequently, the error eh
= uh
∗ − uh
, where uh
∗
denotes the exact solution of Eq.4, is transformed
accordingly equation 7.
eh
→ eh
where eh
= Sheh
(7)
The AMG uses a simple process of smoot-
hing such as Gauss-Seidel (Sh = Ih − Q−1
h Ah,
where Qh is the lower triangular of Ah, inclu-
ding the diagonal) or weighted-Jacobi relaxation
(Sh = Ih − wD−1
h Ah, where Dh = diag(Ah)).
The matrix Ah should be diagonally dominant in
order to use the aforesaid methods, otherwise ad-
ditional hypotheses must be formulated on Ah.
The partition C/F and the transference ope-
rators I and R must be built up explicitly. The
construction of these components is the most im-
portant task of an AMG. These components must
be built so that an efficient relationship between
the smoothing process and the coarse grid correc-
tion operator exists. This relationship depends on
the algebraic sense of the smoothing errors under
certain smoothing process. Nevertheless, the al-
gebraic interpretation of the low frequency errors
for the relaxers, Gauss-Seidel and Jacobi, has
been studied for M symmetrical positive definite
with weakly diagonal dominance, as it is men-
tioned (Ruge Stüben, 1993). For these cases
the construction of coarse spaces is based on cha-
racterizing the errors approaching over the matrix
graph. This characterization is achieved if the fo-
llowing expression is satisfied:
X
j6=i
|aij|
aii
(ei − ej)2
e2
i
1 (8)
where aij are the matrix coefficients and ei, ej
are components of the error. Then, for Eq.8 to be
fulfilled, the matrix must be weakly diagonal do-
minant.
Similarly, it is important that the partition
and the transference operators are such that AH
is reasonably sparse and much smaller than Ah.
According to the construction phase, AMG
are divided into two groups, methods based on in-
terpolation techniques and methods based on ad-
dition techniques. The difference between these
methods is the manner in which they build the
partition C/F and the transference operators.
2.2.1. Interpolation Methods
In 1987, J. Ruge and K. Stüben (Ruge
Stüben, 1993) proposed an algebraic mul-
tigrid method for symmetrical positive definite
M−matrices with weakly diagonal dominance,
using either the Jacobi or Gauss-Seidel method
as relaxer. Here, the interpolation is based on the
concept of the strong connection between nodes.
In this work, the convergence and linear com-
plexity order of the method is demonstrated. In
1991, W. Huang (Huang, 1991) supported on the
results of Ruge and Stüben to demonstrate the
AMG convergence for symmetrical positive de-
finite matrices with weak dominant diagonal. In
1994, Reusken (Reusken, 1994) proposed a met-

hod similar to the one of Ruge and Stüben ba-
sed on an approach to the Schur complement. In
1991, Wagner (Wagner et al., 1991) presented a
modified version of the method of Reusken. In
this version, a modified linear system Mx = Ff
is solved by means of a simple blocks elimina-
tion, as in the method of Reusken. Here, the coef-
ficients of M are defined in a different manner.
These two last proposals work well for positive
definite matrices with weak dominant diagonal.
In what follows, the Ruge and Stüben met-
hod, with some alternative proposals in subse-
quent works (Krechel Stüben, 1997; Stüben,
1999), is described because it is going to be used
in the implementation of this work. In this met-
hod the goal is to build the sets C and F so
that the F-nodes are obtained interpolating the
C-nodes.
Standard Coarsening This method is
implemented in this study because of its low me-
mory consumption in comparison with the other
methods shown in the subsequent paragraphs. In
this method, the following sets are defined in both
equations, 9 and 10.
Ni = {j ∈ V : j 6= i, aij 6= 0} (9)
Si =

j ∈ Ni : −aij ≥ η máx
l6=i
{−ail}

(10)
The set Si is known as the set of strong con-
nections of the node i. The typical value of η is
0,25. The interpolation nodes Ci are defined as
follows:
Ci = C ⊂ Si
Aggressive Coarsening This type of
coarsening is used when the standard coarsening
can cause a relatively high complexity, that is
to say, that the memory requirement is excessi-
ve due to the Galerkin operators of the coarse le-
vels. In these cases, the Galerkin matrix is much
denser than the original matrix.
Positive Strong Connections In both ty-
pes of coarsening, the building of the subsets C
and F is based on negative couplings. It is assu-
med that relatively small positive couplings can
be ignored in the processing as well as in the in-
terpolation. Nevertheless, this cannot be assumed
in all the cases. A process of building the subsets
C and F must ensure that for all F-nodes, which
have negative and positive strong couplings, a mi-
nimum number of both couplings should be re-
presented in C. Building such subsets within a
single step is relatively complicated, since for a
great variety of situations most of the strong con-
nections are negative. In Stüben (1999) a simple
alternative implementation is proposed.
Interpolation It is assumed that subsets C and
F have been constructed using a standard or ag-
gressive coarsening. In the first case, direct or
standard interpolation is used. In the second case,
multistep interpolation is used. In both cases, the
interpolation can be improved by further steps of
relaxation.
Direct Interpolation For each i ∈ F, the
set of interpolation nodes Pi = Cs
i (Cs
i = C ∩Si)
is defined and it is approached as:
aiiei +
X
j∈Ni
aijej = 0 =⇒
aiiei + αi
X
k∈Pi
a−
ikek + βi
X
k∈Pi
a+
ikek = 0 (11)
with
αi =
X
j∈Ni
a−
ij
X
k∈Pi
a−
ik
and βi =
X
j∈Ni
a+
ij
X
k∈Pi
a+
ik
(12)
Where, a−
ij matrix element below the diagonal
and a−
ij matrix element above the diagonal. This
immediately leads to the following interpolation
formula:

ei =
X
k∈Pi
wikek with
wik =

−αiaik/aii (k ∈ P−
i )
−βiaik/aii (k ∈ P+
i )
(13)
Standard Interpolation The direct inter-
polation can be modified so that the strong F-
connections are (indirectly) included in the inter-
polation of each i ∈ F. This is achieved imme-
diately by approximating the i-th equation (right
side of the Eq.11). First, all ej (j ∈ Fs
i =
F ∩ Si) are eliminated, approximately by means
of the corresponding j-th equation. Specifically,
for each j ∈ Fs
i , ej are replaced by
ej −→ −
X
k∈Nj
ajkek/ajj (14)
in the new equation for ei
âiiei +
X
j∈N̂i
âijej = 0 with
N̂i = {j 6= i : âij 6= 0} (15)
Defining Pi as the union between Ci and
all the Cj (j ∈ Fs
i ), the interpolation is defined
exactly as in Eq.11 - Eq.12 with all the a replaced
by â and Ni replaced by N̂i.
This modification usually improves the qua-
lity of interpolation. The main reason is that
this type of approximation Eq.11 introduces less
error, when it is applied to the Eq. 15. However,
the main goal is to have F-nodes as interpolation
nodes.
Multistep Interpolation This interpola-
tion method is applied when subsets C and F
are constructed through an aggressive coarse-
ning. This interpolation is done in several steps,
using direct interpolation whenever it’s possible
and, for the rest of the nodes, using interpolation
formulas in the F-nodes neighborhoods.
Jacobi Interpolation Given any of the in-
terpolation formulas outlined above, it is possi-
ble to obtain an additional improvement if a Ja-
cobi relaxation is applied a posteriori. Only one
or two steps of Jacobi are worthwhile in prac-
tice. Depending on the situation, the relaxation
of the interpolation can significantly improve the
convergence. This improvement of the interpola-
tion operator was proposed in 1997 by Krechel
Stüben ( Krechel Stüben, 1997).
2.2.2. Aggregation Methods
In 1995, D. Braess (Braess, 1995) proposed
an aggregation method that grouped several no-
des in sets with sizes that range from one to four
nodes. The sets are constructed in two steps: in
the first step, pairs of nodes strongly connected
are grouped and in the second step, pairs formed
in the previous step are grouped.
In 1996, P. Vanek, J. Mandel and M. Brezi-
na (Vanek et al., 1996) developed an AMG based
on interpolation by smoothed aggregations. They
used their algorithm to solve second and fourth
order elliptic problems.
In 1997, F. Kickinger (Kickinger, 1997)
proposed an AMG where the coarsening strategy
is independent of the strong connections. This
means that this method is based only on the graph
of the matrix. The coarsening strategy is aggres-
sive and fast. This algorithm is based on coloring
the graph of the matrix, so that a node labelled as
coarse will have all neighbors labelled as fine. In
the Fig. 3, an example of this strategy is shown.
(a) Original Graph (b) Resulting Graph
Fig. 3. Graph Coloring.

2.2.3. Complexity Operators
There are two complexity operators that in-
dicate approximately the memory requirements
that arise when using any of the AMG methods.
These requirements are expressed in terms of
both the complexity grid operator CG and the al-
gebraic complexity operator CA,
CG =
X
l
nl
n1
(16)
and
CA =
X
l
nzl
nz1
(17)
where nl and nzl denote the number of variables
and the number of non-zero elements of the ma-
trix A in the level l, respectively.
3. AMG Linear Solver
In this work, the main purpose is to imple-
ment a linear solver, based on an AMG, that ex-
hibits a linear behavior when it is applied to large
size input matrices. The AMG solver implemen-
ted here incorporates two implementations for the
setup phase with the intention of illustrating the
behavior of both the interpolation and the ag-
gregation method. As interpolation method, the
standard coarsening proposed by Ruge Stüben
(1993) (section 2.2.1) was selected. As aggrega-
tion method, the red-black coloring method of
graphs proposed by Kickinger (1997) (section
2.2.1) was chosen. In order to evaluate the met-
hods under consideration, an existing implemen-
tation of the numerical library UCSparseLib (La-
rrazábal, 2004) was used.
3.1. Test Matrices
The input matrices for the linear solver were
generated discretizing a 3D scalar elliptic opera-
tor by means of a second order 7 − stencil finite
differences method.
The matrices generation characterizes a
great variety of industrial problems, and is defi-
ned as follows:
L(u) = ∆u + Cc∇ · u (18)
Where Cc = 0 (Convection coeficient), with Di-
richlet boundary conditions and a unit cube as
computacional domain. This ensures that the ma-
trices generated are symmetrical positive definite.
3.2. Setup Phase
The function mgrid is called in a first ins-
tance to carry out the setup phase. During this sta-
ge, the levels are generated according to what was
explained in 2.2.1 using either, the Stüben met-
hod or the Red-Black coloring method in agree
to the values defined in the array of parameters.
This stage concludes factorizing the resulting ma-
trix in the coarsest level using a direct method.
Among the operations carried out in the setup
phase, when the red-black coloring method was
used, the generation of the levels is the one with
the highest computational cost. This is because,
in order to generate each level, it is necessary to
carry out a product of three matrices. The three
matrix product has a cubic complexity order in
relation to number of rows of the matrix. Even
though a direct method is still used to factorize
the matrix in the coarsest level, its computational
cost is relatively smaller since the order of this
matrix is generally smaller than 5,000.
3.3. Solver Phase
The function mgrid is called for a second
time to execute the solution phase of the AMG
method. This phase consists, according to algo-
rithm 1, of a loop that is repeated until mee-
ting the set up tolerance or reaching the maxi-
mum number of iterations indicated in the para-
meter array. In each iteration, a cycle that can be
a V-cycle, W-cycle or F-cycle is made happen.
According to what was indicated in section 2.1,
when each one of the levels of the cycle is visi-
ted, a relaxation process is performed. The rela-
xation process can be either Weighted-Jacobi or
Gauss-Seidel according to the parameters array.
The smoothing process is repeated a number of

times according to the value configured in the pa-
rameters array. The efficiency of the solver co-
de is closely related to the implementation of the
product matrix-vector.
4. Code Optimization
In this section the improvements to the
first version of the code implemented for the
algebraic method multigrid, in UCSparseLib
(Larrazábal, 2004). Test were run on a Sun
Fire V40z server, using one AMD Opteron
885, 2.6 GHz processor with 1MB cache and
main memory of 16GB. The codes were im-
plemented in ANSI C, and compiled using gcc
3.6 under GNU/Linux. The compiler optimiza-
tion flags selected were -march=opteron,
-O2, -funroll-loops, -fprefetch,
-loop-arrays. The use of these optimization
allowed: not only to obtain a linear behavior but
to reduce the CPU time in 35 % approximately
in comparison with the codes compiled without
optimization flags.
4.1. Setup Phase Optimization
At the beginning of the tests of the AMG
implementation, it was observed that the setup
phase demanded a CPU time greater than the
solution phase and that when the problem size
was increased there was not an improvement in
this ratio according to the AMG theory. Then
an analysis of the CPU time of the setup phase
was carried out using the GNU-gprof (Fenlason
Stallman, 1998) and Valgrind (Weidendorfer
et al., 2004) tools.
In the setup phase, using the strong con-
nections method, with an input matrix of order
N = 64, 000, 87 % of CPU time was represen-
ted by the C-nodes selection algorithm (section
2.2.1). It was possible to improve the performan-
ce of the C-nodes selection algorithm by marking
early the C-nodes initially processed in the se-
lection loop. This change represented 32 % im-
provement in the CPU time of the selection algo-
rithm and was translated as a 55 % of the comple-
te setup phase time. The setup time before the im-
provement was 11.67 seconds and, after the im-
provement it was 4.09 seconds for the matrix of
N = 64, 000.
In the setup phase, using the red-black colo-
ring algorithm, with an input matrix of size N =
64, 000, 58 % of CPU time was represented by
the matrix-matrix product, while 37 % correspon-
ded to the matrix transpositions. Both the matrix-
matrix product and the transposition were used to
generate the AH matrices such as it was explai-
ned in section 2.2.2. The matrix-matrix product
algorithm was optimized to avoid the transpose
matrix operation. Also the matrix-matrix product
algorithm was improved to reduce its total exe-
cution time. After the improvement, time needed
to calculate the matrix-matrix product represen-
ted 68 % of the setup time. The setup time before
the improvement was 7.93 seconds and after the
change it was 0.785 seconds, using an input ma-
trix with N = 64, 000.
4.2. Solver Phase Optimization
In order to evaluate the solver phase, a set of
tests were executed using either the Gauss-Seidel
or the Weighted-Jacobi relaxer. Execution profi-
les were generated configuring the V-cycle using
the previously mentioned tools G-Prof and Val-
grind.
When the tests with the Gauss-Seidel rela-
xer were made, the 48 % of the time was repre-
sented by the matrix-vector operation, whereas
51 % was spent by the scalar vector product algo-
rithm. Because the procedure for the calculation
of the scalar product was called within the fun-
ction that calculates the matrix-vector product,
and furthermore the code of this function is very
simple, the calls to the scalar product were repla-
ced by the code itself. This represented a save in
the run time for the solver phase of approxima-
tely 15 %.
Finally, it was determined experimentally
that two iterations of Gauss-Seidel relaxer impro-
ved the solver time. When the Weighted-Jacobi
relaxer was used, it was determined experimenta-
lly that the best value for the weight constant was

w = 0,9.
In the tests with both types of relaxers, a li-
mit for the size of the matrix in the coarsest level
(N = 5, 000) was used because this value allows
to have a more linear response.
When N 5, 000 was used, an appreciable
increase in the setup phase was seen due to the
high computational cost of using a direct method
to factorizate a matrix in the coarsest level. The
solver selected for the coarsest level was the Cho-
lesky factorization method since the input matrix
was symmetrical positive definite (section 3.1)
for all the test cases. The stop criteria eps was
10−12
for all the cases.
5. Experimental Results
In order to evaluate the run time complexity
order, a set of 8 matrices was generated. The or-
der N of these matrices goes from a minimum
of 343, 000 (70 × 70 × 70) to a maximum of
2, 744, 000 (140 × 140 × 140) with the idea of
including a range of at least one order of magni-
tude. The matrices were generated from an ellip-
tical 3D operator according to section 3.1. The
main characteristics of these matrices can be ob-
served in the Table. 1.
nx × ny × nz Size Non-zeros
70 × 70 × 70 343,000 2,371,600
80 × 80 × 80 512,000 3,545,600
90 × 90 × 90 729,000 5,054,400
100 × 100 × 100 1,000,000 6,940,000
110 × 110 × 110 1,331,000 9,244,400
120 × 120 × 120 1,728,000 12,009,600
130 × 130 × 130 2,197,000 15,277,600
140 × 140 × 140 2,744,000 19,090,400
Table 1. Test matrices.
According to the discussion tackled in sec-
tion 4, the linearsystems represented by the test
matrices were solved using three strategies: an
iterative method (Conjugate Gradient), strong
connections based AMG method and an aggre-
gation based AMG (Red-black coloring). In or-
der to corroborate that the CPU time is propor-
tional to the order of the linear system (N), in
the case of AMG, the solution times for aggre-
gation based AMG, AMG (Red-black coloring)
and Conjugated Gradient were plotted for the test
matrices set. As it is known, the CPU time of the
method Conjugate Gradient follows a quadratic
behavior with respect to the order of the linear
system matrix because the floating point opera-
tions number (FLOP) is of order O(N2
). In order
to determine the influence of the cycle type in the
execution time tests using V-cycle, F-cycle and
W-cycle (Fig. 1 as a reference for the V-cycle)
were effected. In these tests Weighted-Jacobi and
Gauss-Seidel were used as relaxers.
In Table. 2, the results for V-cycle using
the Weighted-Jacobi relaxer with a weight cons-
tant w = 0,9 are shown. This value was selec-
ted because the best times were obtained for this
specific case. Also, in the same table, results for
F-cycle and W-cycle using Gauss-Seidel relaxer
with two iterations are shown, since in this spe-
cific condition better results are obtained. The
operator complexity, the grid complexity and the
number of levels were determined, in order to
evaluate the quality of the matrix systems gene-
rated in each level for both the interpolation and
aggregation methods. These results can be seen
in Table. 3.
The Fig. 4 is a graphical representation of
Table. 2 that illustrates a linear behavior of the
algebraic multigrid in relation to the matrix size,
N. This AMG uses a setup phase based on the
red-black coloring interpolation method. In the
graph, the AMG performance can be compared
against the Conjugate Gradient. Note that as the
size of the problem (N 500, 000) increases, the
multigrid method exhibits better times for both
W-cycle and F-cycle. The worst case for an AMG
occurs when using the V-cycle approach; in this
situation the Conjugate Gradient method shows
better times for N 1, 000, 000. The straight li-
ne that represents the behavior of the multigrid
method was obtained from the linear regression
of the coordinates of the points that indicate the
CPU times for the diverse sizes of the problem
solved. It is also to be observed that the best ti-
mes were obtained for the F-cycle, as foreseen in

N Conj. Grad.
Red-Black Coloring Strong-Connection
V-cycle F-cycle W-cycle V-cycle F-cycle W-cycle
343,000 13.269 17.557 14.688 15.304 44.198 39.348 37.677
512,000 22.612 28.799 22.602 21.804 80.275 74.205 70.663
729,000 36.470 40.075 29.999 33.714 122.209 114.262 108.783
1,000,000 55.401 49.817 37.305 39.334 202.765 192.443 183.698
1,331,000 76.063 78.637 54.827 61.665 280.646 267.269 253.981
1,728,000 106.442 89.627 64.640 73.796 425.041 409.450 392.990
2,197,000 145.802 123.651 90.532 102.007 579.050 563.157 541.644
2,744,000 188.627 142.499 105.295 116.760 832.633 806.268 789.846
Table 2. AMG vs. Conjugate Gradient CPU times (secs).
N
Red-Black Coloring Strong-Connection
Levels G. Complex. A. Complex Levels G. Complex. A. Complex
343,000 4 1.57 2.53 4 1.60 2.83
512,000 4 1.57 2.53 5 1.60 3.10
729,000 5 1.57 2.54 6 1.61 3.41
1,000,000 5 1.57 2.54 6 1.61 3.47
1,331,000 5 1.57 2.54 7 1.61 3.67
1,728,000 5 1.57 2.54 7 1.61 3.74
2,197,000 5 1.57 2.54 8 1.61 4.02
2,744,000 5 1.57 2.54 8 1.61 3.98
Table 3. Complexity operators.
Fig. 4. Solution times for Linear systems.
section 2.1.
In order to estimate the constant K, that
relates the number of floating point operations
(FLOP) with the order N, it was determined the
average of floating point operations per second
(FLOPS) for the test computer system. This was
done by means of DGEMM benchmark (Luszc-
zek et al., 2005), which measures the floating
point rate of execution of double real precision
matrix-matrix multiplication. The computer sys-
tem used in the tests shows an average value of
4.76 GFLOPS. Fig. 5.2, that shows the average
number of GFLOP, employed to solve the test li-
near systems, was generated using the estimated
value and the data of the Table. 2. Then, the ap-
proximate constants for the different AMG cy-

Fig. 5. GFLOP in solving linear systems.
cles were determined using Fig. 5. These cons-
tants are: 253,757 FLOP for the V-cycle, 208,528
FLOP for W-cycle and 183,771 FLOP for the F-
cycle. The behavior of the strong connections ba-
sed AMG was not plotted because it was not li-
near due to the long times of the setup phase.
6. Conclusions
In this work, we have presented an efficient
implementation of an algebraic multigrid method
(AMG) to solve large sparse systems of linear
equations. We have used a set of linear systems
arising from a 3D scalar elliptic operator discre-
tized by finite difference method. In order to eva-
luate the implementation, a set of 8 linear sys-
tems was generated. The maximum order of the
matrices associated to these linear systems was
2,744,000. The AMG performance was compa-
red against the Conjugate Gradient. The AMG
needed a fast setup phase to obtain a good per-
formance. Note that as the size of the problem
(N 500, 000) increases, the multigrid met-
hod exhibits better times for both W-cycle and F-
cycle. The worst case for an AMG occurs when
using the V-cycle approach; in this situation the
Conjugate Gradient method shows better times
for N 1, 000, 000. Also, we have observed
that the best CPU times were obtained for the F-
cycle. The approximate constants, k, for the dif-
ferent AMG cycles were: 253,757 FLOP for the
V-cycle, 208,528 FLOP for W-cycle and 183,771
FLOP for the F-cycle. In general, our AMG im-
plementation had a good perfomance and we ha-
ve obtained a linear AMG solver.
Acknowledgments
This work was supported by the Consejo de
Desarrollo Cientı́fico y Humanı́stico de la Uni-
versidad de Carabobo under the projects CDCH-
UC No. 2004-002. and CDCH-UC No. 2004-
011. Also, we want to thank Pedro Linares for
his right suggestions with regard to this paper.
7. Bibliography
Axelsson, O. P. Vassilevski. (1989). Al-
gebraic Multilevel Preconditioning Methods I.
Journal on Numer. Anal. Math. 56(2-3): 157-177.
Axelsson, O. P. Vassilevski. (1990). Algebraic
Multilevel Preconditioning Methods II. Journal
on Numer. Anal. 27(6): 1564-1590.
Axelsson, O. P. Vassilevski. (1991). Asym-
ptotic Work Estimates for AML Methods. Appl.
Numer. Math. 7(5): 437-451.
Braess, D. (1995). Towards Algebraic Multigrid
for Elliptic Problems of Second Order. Compu-
ting. 55(4): 379-393.

Brandt, A. (1977). Multi-Level Adaptive So-
lutions to Boundary-Value Problems. Math.
Comput. 31(138): 333-390.
Cela, J. J. Navarro.(1992). Performance Model
for Algebraic Multilevel Preconditioner on a
Shared Memory Multicomputers. PACTA92.
Fenlason, J. R. Stallman. (1998). The
GNU Profiler. Free Software Founda-
tion. [cited 08 enero 2010; 15:30 VET].
Also available at http://www.gnu.org/
software/binutils/manual/gprof-2.9.1/html chap-
ter/gprof toc.html.
Huang, W. (1991). Convergence of Algebraic
Multigrid Methods for Symmetric Positive Defi-
nite Matrices with Weak Diagonal Dominance.
Appl. Math. Comp. 46(2): 145-164.
Iwamura, C., F. Costa, I. Sbarski, A. Easton
N. Li,(2003). An efficient algebraic multigrid
preconditioned conjugate gradient solver. Com-
put. Meth. Appl. Mech. Eng. 192(20): 2299-2318.
Joubert, W. J. Cullum.(2006). Scalable alge-
braic multigrid on 3500 processors. Electron.
Trans. Numer. Anal. 23: 105-128.
Kickinger, F. (1997). Algebraic Multigrid for
Discrete Elliptic Second-Order Problems. Tech-
nical Report. Institute for Mathematics. Johannes
Kepler University Linz. Austria.
Krechel, A. K. Stüben. (1997). Operator
Dependent Interpolation in Algebraic Multigrid.
Technical Report. GMD Report 1048.
Larrazábal, G. (2004). UCSparseLib: A nu-
merical library to solve sparse linear systems.
Simulación Numérica y Simulado Computacio-
nal, pp. TC19-TC25. Eds. J. Rojo, M. Torres y
M. Cerrolaza. ISBN 980-6745-00-0, SVMNI,
Venezuela.
Luszczek, P., J. Dongarra, R. Rabenseifner,
B. Lucas, J. Kepner, J. McCalpin, D. Bai-
ley D. Takahashi. (2005). Introduction to
the HPC Challenge Benchmark Suite. [cited
08 enero 2010; 14:30 VET]. Also available in
http://icl.cs.utk.edu/projectsfiles/hpcc/pubs/hpcc-
challenge-benchmark05.pdf.
Mo, Z. X. Xu. (2007). Relaxed RS0 and CLJP
coarsening strategy for parallel AMG. Parallel
Comput. 33(3): 174-185.
Pereira, F., S. Lopes S. Nabeta. (2006). A
wavelet-based algebraic multigrid preconditioner
for sparse linear systems. Appl. Math. Comput.
182(2): 1098-1107.
Reusken, A. (1994). Multigrid with Matrix-
Dependent Transfer Operators for Convection-
Diffusion Problems, in Multigrid Methods, vol.
IV, Internat. Ser. Numer. Math. 116, Birkhüser-
Verlag, Basel, pp. 269-280.
Ruge, J. K. Stüben. (1993). Algebraic
Multigrid (AMG). In: Multigrid Methods (Mc-
Cormick, S.F., ed). SIAM. Frontiers in Applied
Mathematics. Philadelphia, USA.
Stüben, K. (1999). Algebraic Multigrid (AMG):
An Introduction with Applications, Technical
Report, GMD Report 53.
Vanek, P., J. Mandel M. Brezina. (1996).
Algebraic Multigrid by Smoothed Aggregation
for Second Order and Fourth Order Elliptic
Problems. Computing. 56(3): 179-196.
Wagner, C., W. Kinzelbach G. Wittum. (1991).
Schur-Complement Multigrid - a Robust Method
for Groundwater Flow and Transport Problems.
Numer. Math. 75(4): 523-545.
Weidendorfer, J., M. Kowarschik C. Trinitis.
(2004). A Tool Suite for Simulation Based
Analysis of Memory Access Behavior. Procee-
dings of the 4th International Conference on
Computational Science (ICCS 2004). Krakow.
Poland.

View publication stats
View publication stats

An Efficient Implementation Of An Algebraic Multigrid Solver

Recommended

Recommended

More Related Content

Similar to An Efficient Implementation Of An Algebraic Multigrid Solver

Similar to An Efficient Implementation Of An Algebraic Multigrid Solver (20)

More from Angela Shin

More from Angela Shin (20)

Recently uploaded

Recently uploaded (20)

An Efficient Implementation Of An Algebraic Multigrid Solver