PARALLELISM IN SPECTRAL METHODS
C. CANUTO (1)
ABSTRACT - Several strategies of parallelism for spectral algorithms are discussed.
The investigation shows that, despite the intrinsic lack of locality of spectral
methods, they are amenable to parallel implementations, even on fine grain
architectures. Typical algorithms for the spectral approximation of the viscous,
incompressible Navier-Stokes equations serve as examples in the discussion.
SOMMARIO - Several strategies for parallelizing spectral algorithms are discussed.
The analysis shows that spectral methods can be efficiently implemented on parallel
architectures, even fine-grained ones, despite their non-local character. Some
well-known spectral algorithms for the approximation of the viscous, incompressible
Navier-Stokes equations are used as examples in the discussion.
Introduction.
Since their origin in the late sixties, spectral methods in their modern form have
been designed and developed with the aim of solving problems which could not
be tackled by more conventional numerical methods (finite differences and, later,
finite elements). The direct simulation of turbulence for incompressible flows is
the most popularly known example of such applications: the range of phenomena
amenable to a satisfactory numerical simulation has widened during the years
(1) Dipartimento di Matematica, Università di Parma, 43100 Parma, Italy and
Istituto di Analisi Numerica del C.N.R., Corso C. Alberto, 5 - 27100 Pavia, Italy.
Invited paper at the International Symposium on «Vector and Parallel Proces-
sors for Scientific Computation - 2», held by the Accademia Nazionale dei Lincei and
IBM, Rome, September 1987.
under the twofold effect of the increase in computer power and the development
of sophisticated algorithms of spectral type. The simulation of the same
phenomena by other techniques would have required a computer power larger by
orders of magnitude; hence, it would not have been feasible on the currently
available machines (a discussion of the most significant achievements of spectral
methods in fluid dynamics can be found, e.g., in Chapter 1 of ref. [1]).
Since spectral methods have been constantly used in «extreme» applications,
their implementation has taken place on state-of-the-art computer architectures.
The vectorization of spectral algorithms was a fairly easy task. Nowadays, spectral
codes for fluid dynamics run on vector supercomputers such as the Cray
family or the Cyber 205, taking full advantage of their pipeline architectures and
reaching rates of vectorization well above 80% (we refer, e.g., to Appendix B in
ref. [1]).
On the contrary, the implementation of spectral algorithms on parallel com-
puters is still in its infancy. This is partly due to the fact that multiprocessor
supercomputers are only now becoming available to the scientific community.
But there is also a deeper motivation: it is not yet clear whether and how the
global character of spectral methods will efficiently fit into a highly granular
parallel architecture. Thus, a deep investigation - of both a theoretical and ex-
perimental nature - is needed. As a testimony of the present uncertainty on this
topic, we quote the point of view of researchers working at the development of a
multipurpose parallel supercomputer, especially tailored for fluid-dynamics ap-
plications, known as the Navier-Stokes Computer (NSC). This is a joint project
between Princeton University and the NASA Langley Research Center, aimed at
building a parallel supercomputer made up of a fairly small number of powerful
nodes. Each node has the performance of a class VI vector supercomputer; the
initial configuration will have 64 such nodes. Despite the superior accuracy of
spectral methods over finite difference methods, the scientists involved in this
project have chosen to employ low-order finite differences at least in the initial
investigation on how well transition and turbulence algorithms can exploit the
NSC architecture. Indeed, «the much greater communication demands of the
global discretization may well tip the balance in favor of the less accurate, but
simpler local discretizations» ([12]).
Currently, a number of implementations of spectral algorithms on parallel
architectures are documented. Let us refer here to the work done at the IBM
European Center for Scientific and Engineering Computing (ECSEC) in Rome,
at the NASA Langley Research Center by Erlebacher, Bokhari and Hussaini [5],
and at ONERA (France) by Leca and Sacchi-Landriani [11]. The IBM con-
tributions are described in detail by P. Sguazzero in a paper in this volume. The
latter contributions will be briefly reviewed in the present paper.
The purpose of this paper is to discuss where and to what extent it is possible
to introduce parallelism in spectral algorithms. We will also try to indicate which
communication networks are suitable for the implementation of spectral methods
on fine grain, local memory architectures.
1. Basic aspects of spectral methods.
Let us briefly review the fundamental properties of spectral methods for the
approximation of boundary value problems. We will focus on those aspects of the
methods which are more related to their implementation in a multiprocessor
environment. For complete details we refer, e.g., to refs. [1], [6], [15].
Let us assume that we are interested in approximating a boundary value
problem, which we write as
(1.1)   L(u) = f in Ω,   plus boundary conditions on ∂Ω,

in a d-dimensional box Ω = ∏_{i=1}^{d} (a_i, b_i). We approximate the solution u by a
finite expansion

(1.2)   u_N = Σ_{|k| ≤ N} û_k φ_k(x),

where k = (k_1, ..., k_d) and

(1.3)   φ_k(x) = ∏_{i=1}^{d} φ_{k_i}^{(i)}(x_i).
Each φ_m^{(i)} is a smooth global basis function on (a_i, b_i), satisfying the orthogonality
condition

(1.4)   ∫_{a_i}^{b_i} φ_m^{(i)}(x) φ_n^{(i)}(x) w^{(i)}(x) dx = c_m δ_{mn}

with respect to a weight function w^{(i)}. In most applications, the one dimensional
basis functions are trigonometric polynomials in the space directions where a
periodicity boundary condition is enforced, and orthogonal algebraic polyno-
mials (Chebyshev, or Legendre polynomials) in the remaining directions.
The boundary value problem is discretized by a suitable projection process,
which can be represented as
(1.5)   u_N ∈ X_N,   (L_N(u_N), v)_N = (f, v)_N   ∀ v ∈ Y_N.
Here X_N is the space of trial functions, Y_N is the space of test functions, L_N is an
approximation of the differential operator L and (u, v)_N is an inner product,
which may depend upon the cut-off number N. In general, when X_N = Y_N and
the inner product is the L^2(Ω) inner product we speak of a Galerkin method; this is
quite common for periodic boundary value problems. Otherwise, for non-
periodic boundary conditions, we have a tau-method when the inner product is the
L^2-inner product and Y_N is a space of test functions which do not individually
satisfy the boundary conditions, or a collocation method when the inner product is
an approximation of the L^2(Ω)-inner product based on a Gaussian quadrature
rule.
In order to have a genuine spectral method, the basis functions in the expan-
sion (1.2) must satisfy a supplementary property, in addition to the orthogonality
condition (1.4): if one expands a smooth function according to this basis, the
«Fourier» coefficients of the function should decay at a rate which is a monotoni-
cally increasing divergent function of the degree of regularity of the function. This
occurs if we approximate a periodic function by the trigonometric system (if
u ∈ C^s(0, 2π), then û_k = O(|k|^{-s})). The same property holds if we expand a non-
periodic function according to the eigenfunctions of a singular Sturm-Liouville
problem (such as Jacobi polynomials). The above mentioned property is known
as the spectral accuracy property. When it is satisfied, one is in a position to prove
an error estimate for the approximation (1.5) of problem (1.1) of the form

(1.6)   ||u - u_N||_{H^m} ≤ C(m, r) N^{m-r} ||u||_{H^r}   for all r ≥ r_0 fixed,

where the spaces H^r form a scale of Hilbert spaces in which the regularity of u is
measured. Estimate (1.6) gives theoretical evidence of the fundamental property
of spectral methods, namely, they guarantee an accurate representation of
smooth, although highly structured, phenomena by a «minimal» number of un-
knowns.
Spectral methods owe their success to the availability of «fast algorithms» to
handle complex problems. The discrete solution u_N is determined by the set of its
«Fourier coefficients» {û_k : |k| ≤ N} according to the expansion (1.2), but it can
also be uniquely defined by the set of its values {u_j = u_N(x_j)} at a selected
set G_N = {x_j} in Ω̄. The points in G_N are usually the nodes of Gaussian
formulae in Ω, such as the points x_j = jπ/N, j = 0, ..., 2N-1 in [0, 2π] for the
trigonometric system, or the points x_j = cos(jπ/N), j = 0, ..., N in [-1, 1] for the
Chebyshev system. Thus, we have a double representation of u_N, one in transform
space, the other in physical space. The discrete transform

(1.7)   {û_k} ⟷ {u_j}

is a global transformation (each û_k depends upon all the u_j's, and conversely). For
the Fourier and Chebyshev systems, fast transform algorithms are available to
carry out the transformation in a cheap way. Thus, one can use either representa-
tion of the discrete solution within a spectral scheme, depending upon which is
the most appropriate and efficient.
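As an illustration of the double representation, the following sketch (in Python; the function names are mine) maps a set of Chebyshev coefficients to grid values at the Gauss-Lobatto points and back, using the standard quadrature-based discrete transform pair; in an actual code the dense matrix products would be replaced by FFT-based fast transforms.

import numpy as np

def chebyshev_grid(N):
    # Chebyshev Gauss-Lobatto points x_j = cos(j*pi/N), j = 0, ..., N
    return np.cos(np.pi * np.arange(N + 1) / N)

def cheb_to_physical(u_hat):
    # u_j = sum_k u_hat_k T_k(x_j) = sum_k u_hat_k cos(k*j*pi/N)
    N = u_hat.size - 1
    j = np.arange(N + 1)
    C = np.cos(np.pi * np.outer(j, j) / N)   # C[j, k] = T_k(x_j)
    return C @ u_hat

def cheb_to_transform(u):
    # inverse transform via Gauss-Lobatto quadrature:
    # u_hat_k = (2/(N c_k)) * sum_j u_j cos(k*j*pi/N) / c_j,
    # with c_0 = c_N = 2 and c_j = 1 otherwise
    N = u.size - 1
    c = np.ones(N + 1); c[0] = c[N] = 2.0
    j = np.arange(N + 1)
    C = np.cos(np.pi * np.outer(j, j) / N)
    return (2.0 / (N * c)) * (C @ (u / c))

As written, both directions cost O(N^2) operations and make the global character of the transform evident: every grid value enters every coefficient, and conversely.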
Numerical differentiation, a crucial ingredient of any numerical method for
differential problems, can be executed within a spectral method either in trans-
form space or in physical space. Let us confine ourselves to one dimensional
Fourier or Chebyshev methods.
If u(x) = Σ_{k=-N}^{N-1} û_k e^{ikx} is a trigonometric polynomial, then

(1.8)   du/dx = Σ_{m=-N}^{N-1} û_m^{(1)} e^{imx},   with û_m^{(1)} = i m û_m.

In physical space, setting x_j = jπ/N, j = 0, ..., 2N-1, we have

du/dx (x_l) = Σ_{j=0}^{2N-1} d_{lj} u(x_j),   0 ≤ l ≤ 2N-1,

with

(1.9)   d_{lj} = (1/2) (-1)^{l+j} cot((l-j)π/(2N)),   l ≠ j,
        d_{lj} = 0,                                    l = j.
On the other hand, if u(x) = Σ_{k=0}^{N} û_k T_k(x) (T_k(x) denoting the k-th Chebyshev
polynomial of the first kind), then

du/dx = Σ_{m=0}^{N} û_m^{(1)} T_m(x),   with

(1.10)   û_m^{(1)} = (2/c_m) Σ_{k=m+1, k+m odd}^{N} k û_k

(here c_0 = 2, c_m = 1 for m ≥ 1). In physical space, setting x_j = cos(jπ/N), j = 0, ...,
N, we have

du/dx (x_l) = Σ_{j=0}^{N} d_{lj} u(x_j),   0 ≤ l ≤ N,

with

(1.11)   d_{lj} = (c_l/c_j) (-1)^{l+j} / (x_l - x_j),   l ≠ j,
         d_{lj} = -x_j / (2(1 - x_j^2)),                1 ≤ l = j ≤ N-1,
         d_{lj} = (2N^2 + 1)/6,                          l = j = 0,
         d_{lj} = -(2N^2 + 1)/6,                         l = j = N

(here c_0 = c_N = 2, c_j = 1 for 1 ≤ j ≤ N-1).
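For concreteness, here is a minimal sketch (function name and layout are mine) that assembles the matrix (1.11) and could be applied to grid values, so that du/dx(x_l) ≈ Σ_j D[l, j] u(x_j); in practice the same derivative is usually obtained through fast transforms rather than by storing the full matrix.

import numpy as np

def chebyshev_diff_matrix(N):
    # entries d_{lj} of (1.11) for the Gauss-Lobatto points x_j = cos(j*pi/N)
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.ones(N + 1); c[0] = c[N] = 2.0
    D = np.zeros((N + 1, N + 1))
    for l in range(N + 1):
        for j in range(N + 1):
            if l != j:
                D[l, j] = (c[l] / c[j]) * (-1.0) ** (l + j) / (x[l] - x[j])
    # diagonal entries
    for j in range(1, N):
        D[j, j] = -x[j] / (2.0 * (1.0 - x[j] ** 2))
    D[0, 0] = (2.0 * N ** 2 + 1.0) / 6.0
    D[N, N] = -(2.0 * N ** 2 + 1.0) / 6.0
    return D

The matrix is full, which is the concrete expression of the globality discussed below.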
The previous relations show that spectral differentiation - like a discrete trans-
form - is again a global transformation (with the lucky exception of Fourier dif-
ferentiation in transform space). The global character of spectral methods is cohe-
rent with the global structure of the basis functions which are used in the expan-
sion.
Globality is the first feature of spectral methods we have to cope with in
discussing vectorization and parallelization. If we represent the previous trans-
forms in a matrix-times-vector form, they can be easily implemented on a vector
computer, and they take advantage of this architecture because matrices are
either diagonal, or upper triangular, or full. When the transforms are realized
through the Fast Fourier Transform algorithm, one can use efficiently vectorized
FFT's (see, e.g., [14]).
Conversely, if we are concerned with parallelization, globality implies grea-
ter communication demand among processors. This may not be a major problem
on coarse grain, large shared memory architectures, such as the now commercial-
ly available supercomputers (e.g., Cray XMP, Cray 2, ETA 10, ...). We expect
difficulties on the future fine grain, local memory architectures, where informa-
tion will be spread over the memories of tens or hundreds of processors.
In order to make our analysis more precise, let us focus on perhaps the most
significant application of spectral methods given so far, i.e., the numerical
simulation of a viscous, incompressible flow. Let us assume we want to discretize
the time-dependent Navier-Stokes equations in primitive variables
(1.12)   u_t - ν Δu + ∇p + (u·∇)u = f   in Ω × (0, T],
         div u = 0                       in Ω × (0, T],
         u = g (or u periodic)           on ∂Ω × (0, T],
         u(x, 0) = u_0(x)                in Ω,

in a bounded domain Ω ⊂ R^d (d = 2 or 3).
So far, nearly all the methods which have been proposed in the literature
(see, e.g., Chapter 7 in [1] for a review) use a spectral method for the discretiza-
tion in space, and a finite difference scheme to advance the solution in time.
Typically, the convective term is advanced by an explicit scheme (e.g., second order
Adams-Bashforth, or fourth order Runge-Kutta) for two reasons: the stability
limit is always larger than the accuracy limit required to preserve overall spectral
accuracy, and the nonlinear terms are easily handled by the pseudospectral tech-
nique (see below). Conversely, the viscous and pressure terms are advanced by an
implicit scheme (e.g., Crank-Nicolson), in order to avoid too strict stability limits.
Thus, at each time level, one has to
i) evaluate the convective term (u·∇)u for one, or several, known velocity fields.
The computed terms appear on the right-hand side G of a Stokes-like problem

(1.13)   αu - ν Δu + ∇p = G       in Ω,
         div u = 0                 in Ω,
         u = g (or u periodic)     on ∂Ω,

where α = 1/Δt;
ii) solve the spectral discretization of problem (1.13).
In most cases, problem (1.13) is reduced to a sequence of Helmholtz prob-
lems. These, in turn, are solved by a direct method or an iterative one. In the
latter case, one has to evaluate residuals of spectral approximation of Helmholtz
problems.
We conclude that the main steps in a spectral algorithm are:
A) calculation of differential operators on given functions;
B) solution of linear systems.
When the geometry of the physical domain is not Cartesian, one first has to
reduce the computational domain to a simple geometry. In this case, one has to
resort to one of the existing
C) domain decomposition techniques.
In the next sections, we will examine these three steps in some detail in view
of their implementation on a multiprocessor architecture.
2. Spectral calculation of differential operators.
Let us consider the following problem: «given the representation of a finite-
dimensional velocity field u, either in Physical or in Transform space, compute
the representation of (u·∇)u in the same space». We recall that by representa-
tion of a given function v in Transform space we mean the finite set of its
«Fourier» coefficients according to the expansion (1.2); this set will be denoted by
v̂. Similarly, by representation of v in Physical space we mean the set of its values
on a grid G_N in the physical domain, which uniquely determines v; this repre-
sentation will be denoted by v.
Each component of (u·∇)u is a sum of terms of the form

(2.1)   v Dw,

where v and w are components of u and D denotes differentiation along one space
direction.
An «approximate» representation of (2.1) in Transform space can be com-
puted by the so-called pseudospectral technique, which can be described as fol-
lows:
" ~' V'.....
"'".............
....... " vDw , (vDw) ^(2.2) ..,,
......
.......'""
"& .... --9 D~' ~ Dw .........
The solid line denotes discrete transform (computable by FFT), the dashed
line indicates differentiation in transform space, and the dotted line means point-
wise multiplication in physical space. The result of (2.2) is not the exact repre-
sentation of (2.1) in transform space due to the presence of an aliasing error; howev-
er, it is possible to eliminate such an error, using again transformations similar to
the previous ones (see, e.g., Chapter 3 in [1]).
The representation of (2.1) in physical space is as follows (with the same
meaning of the arrows as before):
(2.3)      w ────→ ŵ ╌╌╌→ Dŵ ────→ Dw
           (v, Dw) ┄┄┄→ vDw
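A minimal one-dimensional Fourier sketch of scheme (2.2), assuming 2N equally spaced points on [0, 2π] and the NumPy FFT ordering of wavenumbers; the aliasing error mentioned above is not removed here, and the helper names are mine.

import numpy as np

def fourier_diff_hat(w_hat):
    # differentiation in Fourier transform space: (Dw)^_k = i*k*w^_k (a diagonal operation)
    n = w_hat.size
    k = np.fft.fftfreq(n, d=1.0 / n)   # integer wavenumbers 0, 1, ..., -1
    return 1j * k * w_hat

def pseudospectral_vDw_hat(v_hat, w_hat):
    # scheme (2.2): transform v^ and (Dw)^ to physical space, multiply pointwise,
    # and transform the product back to Transform space (aliasing not removed)
    v = np.fft.ifft(v_hat)
    Dw = np.fft.ifft(fourier_diff_hat(w_hat))
    return np.fft.fft(v * Dw)

The physical-space variant (2.3) simply stops before the last forward transform and returns the pointwise product v * Dw.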
62 C. CANUTO"Parallelism in
We are now in a position to discuss how to introduce parallelism in the
calculation of (u 9 ~7)u. Two conceptually different forms of parallelism can be
considered:
a) Mathematical Parallelism: assign the calculation of different terms vDw to diffe-
rent processors;
b) Numerical Parallelism: assign different portions of the computational domain (in
Physical space or in Transform space) to different processors, with the same
mathematical task.
The mathematical parallelism is the simplest to conceive and even to code;
however, it suffers from a number of drawbacks. Since the same component of u
may be needed by different processors, a large shared memory is necessary, and/or
large data transfers occur. Different processors may require the same data at the
same time, leading to severe memory bank conflicts. Furthermore, problems of
synchronisation and balancing may arise if the different mathematical terms do
not require the same computational effort, or if their number is not a multiple of
the number of processors. The strategy becomes definitely useless on fine grain
architectures. However, it can represent a first level of parallelism, if a hierarchy
of parallelisms is available.
Leca and Sacchi-Landriani [11] report their experience of parallelization of
a mixed Fourier-Chebyshev Navier-Stokes algorithm, known as the Kleiser-
Schumann method (see the next section). They use a multi-AP system at
ONERA (France), four AP-120 processors having access to a «large» shared
memory (compared to the «local» memories). Starting from a single-processor
pre-existent code, Leca and Sacchi-Landriani simply send different subroutines -
computing different contributions to the equation - to different processors. The
largest observed speed-up is 2.78 out of the maximal 4.
From now on, we will discuss strategy b) of parallelization, i.e., Numerical
Parallelism. The question is: how do we split the computational domain among
the processors in order to get the highest degree of parallelism with the lowest
communication costs? The first answer to this question comes from the following
fundamental observation:
Spectral methods for multi-dimensional boundary value problems are inherently tensor products
of one-dimensional spectral methods.
This means that the orthogonal basis functions and the computational grids
which define a multidimensional spectral method are obtained by taking tensor
products of suitable orthogonal basis functions and grids on intervals of the real
line.
It follows that the elementary transformations (discrete transforms, dif-
ferentiation, pointwise product, ...) which constitute a spectral method can be
obtained as a sequence (a cascade) of one dimensional transformations of the
same nature. Each of these transformations (e.g., differentiation in the x direc-
tion) can be carried out in parallel over parallel rows or columns of the computa-
tional domain (either in Physical space or in Transform Space).
Therefore, the simplest strategy of domain decomposition will consist of
assigning «slices» of the computational domain (i.e., groups of contiguous rows or
columns, in a two dimensional geometry) to different processors. Once again we
stress that we consider slices both in Physical Space (i.e., rows/columns of grid va-
lues) and in Transform Space (i.e., rows/columns of «Fourier» coefficients). After
a transformation along one space direction has been completed, one has to trans-
pose the computational lattice in order to carry out transformations along the
other directions. Transposition should not be a major problem on architectures
with large shared memory or wide-band buses.
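The tensor-product structure can be mimicked in a few lines; the serial sketch below (names are mine, and for brevity the same one-dimensional transformation is used in both directions) applies a given 1-D transformation to every row of a two-dimensional lattice - each row being a «slice» that could be assigned to a different processor - and then transposes the lattice to sweep the other direction.

import numpy as np

def apply_along_rows(transform_1d, U):
    # apply a 1-D transformation (FFT, Chebyshev differentiation, ...) to each row;
    # the rows are independent, so they can be processed in parallel
    return np.array([transform_1d(row) for row in U])

def apply_both_directions(transform_1d, U):
    # x-direction sweep, then transpose the lattice and sweep the y-direction
    V = apply_along_rows(transform_1d, U)
    return apply_along_rows(transform_1d, V.T).T

For example, apply_both_directions(np.fft.fft, U) performs a two-dimensional Fourier transform as a cascade of one-dimensional ones.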
Erlebacher, Bokhari and Hussaini [5] report preliminary experiences of cod-
ing a Fourier-Chebyshev method for compressible Navier-Stokes simulations on
a 20 processor Flex/32 computer at the NASA Langley Research Center. Since
the time marching scheme is fully explicit, almost all the work is spent in comput-
ing convective or diffusive terms by the spectral technique. Parallelization is
achieved by the strategy described above. The physical variables on the com-
putational domain are stored in shared memory; slices of them are sent to the
processors, which write the results of their computation in shared memory. The
authors' conclusions are summarized in Table 1, where speed-ups (Sp) and effi-
ciencies (Ep) are documented for different choices of the computational grid.
According to the authors, moving variables between shared and local memory
should not cause major overheads even on such a supercomputer as the ETA 10.
Indeed, quoting from [5], «a good algorithm [on the ETA 10] should perform at
least 5 floating point operations per word transferred one way from common
memory». This minimum work is certainly achieved within a spectral code:
think, for instance, of differentiation in physical space via FFT.
Transposition of the computational lattice will eventually become prohibi-
tive on fine grain, local memory architectures. In this case, small portions of the
computational domain will be permanently resident in local memories, and inter-
communication among processors will be the major issue. In order to understand
the communication needs of a spectral method, let us observe that if L(u) is any
differential operator (of any order, with variable coefficients or non-linear terms,
etc.) then one can compute a spectral approximation to L(u) at a point P of the
computational domain using only information at the points lying on the rows and
columns meeting at P (see Figure 1.a). This means that spectral methods,
although global methods in one space dimension, exhibit a precise sparse struc-
ture in multidimensional problems.

Table 1. Performance data of one residual calculation (courtesy of Erlebacher,
Bokhari and Hussaini [5]).

Grid         N_tot x 2^(-..)    p      T_p     S_p      E_p (%)
128x16x8     12.7               8      365     7.55      94.3
                                4      705     3.91      97.6
                                2     1386     1.99      99.4
                                1     2757     1.00     100.0
8x64x32       9.0               8      269     7.52      94.0
                                4      510     3.87      96.8
                                2     1003     1.97      98.5
                                1     1977     1.00     100.0
64x16x16      8.0              16      138    13.01      81.3
                                8      242     7.41      92.6
                                4      466     3.84      95.9
                                2      916     1.95      97.7
                                1     1786     1.00     100.0
32x32x16      6.7              16      118    12.82      80.1
                                8      202     7.45      93.2
                                4      388     3.89      97.3
                                2      759     1.99      99.3
                                1     1511     1.00     100.0
32x16x16      2.7              16       62     9.98      62.4
                                8       92     6.72      84.0
                                4      168     3.67      91.7
                                2      321     1.92      96.0
                                1      616     1.00     100.0
16x16x16      1.0              16       37     6.54      40.9
                                8       43     5.60      70.0
                                4       71     3.44      86.0
                                2      127     1.92      95.8
                                1      244     1.00     100.0

Figure 1.a - The spectral «stencil» at P in the computational domain.
Let us confine ourselves to the 2-dimensional case. If we partition
the computational domain among an array of n^2 processors (see Figure 1.b),
then processor P_{i_0 j_0} will need to exchange data only with processors P_{i_0 j} (j
varying) and P_{i j_0} (i varying). Thus, information will be exchanged among at most
O(n) processors.
Note that differentiation in Fourier space and evaluation of non-linear terms
in physical space require no communication among processors. Thus, the com-
munication demand in the spectral calculation of differential operators is dictated
by the two following one dimensional transformations:
(2.4) Fast Fourier transforms;
(2.5) Differentiation in Chebyshev transform space, according to (1.10).
Figure 1.b - Communications in a lattice of processors.
3. Solution of linear systems.
Hereafter, we will discuss two examples of solution of linear systems arising from
spectral approximations.
3.1. Solving a Stokes problem via an influence matrix technique.
We consider the Kleiser-Schumann algorithm for solving problem (1.13) in
the infinite slab Ω = R^2 × (-1, 1), with g = 0 and u 2π-periodic in the x and y
directions (see [10] for the details). The basic idea is that (1.13) is equivalent to
(3.1)   αu - ν Δu + ∇p = G   in Ω,
        Δp = div G            in Ω,
        u = 0,  div u = 0     on ∂Ω;
this, in turn, is equivalent to
(3.2)   Δp = div G            in Ω,
        p = λ                 on ∂Ω,
        αu - ν Δu = G - ∇p    in Ω,
        u = 0                 on ∂Ω,
provided λ is chosen in such a way that div u = 0 on ∂Ω. If we project (3.2) on
each Fourier mode in the x and y directions, we get a family of one dimensional
Helmholtz problems in the interval (-1, 1), where the unknowns are the Fourier
coefficients of p and u along the chosen mode. The boundary values λ_+ and λ_- for
the Fourier coefficient of p are obtained by solving a 2x2 linear system, whose
matrix - the influence matrix - is computed once and for all in a pre-processing
stage.
Thus, one is reduced to solving Helmholtz problems of the form
(3.3)   -w'' + β w = h   for -1 < z < 1,  β ≥ 0,
        w(1) = a,  w(-1) = b.
A Chebyshev approximation w_N(z) = Σ_{k=0}^{N} ŵ_k T_k(z) to w(z) is defined through
the tau-method as

(3.4)   -ŵ_m^{(2)} + β ŵ_m = ĥ_m,   0 ≤ m ≤ N-2,
        Σ_{m=0}^{N} ŵ_m = a;   Σ_{m=0}^{N} (-1)^m ŵ_m = b.

Here ŵ_m^{(2)} = (1/c_m) Σ_{k=m+2, k+m even}^{N} k(k^2 - m^2) ŵ_k is the m-th Chebyshev
coefficient of w''.
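A direct (dense) sketch of the tau system (3.4), with the second-derivative coefficients assembled from the formula above; in the actual algorithm one exploits the even/odd decoupling and the tridiagonal form discussed next, so this is only meant to fix the ideas, and the names are mine.

import numpy as np

def chebyshev_tau_helmholtz(h_hat, beta, a, b):
    # tau system (3.4): -w2_m + beta*w_m = h_m for 0 <= m <= N-2,
    # plus the two boundary rows sum_k w_k = a and sum_k (-1)^k w_k = b
    N = h_hat.size - 1
    c = np.ones(N + 1); c[0] = 2.0
    # second-derivative operator in Chebyshev coefficient space:
    # w2_m = (1/c_m) * sum_{k=m+2, k+m even} k*(k^2 - m^2) * w_k
    D2 = np.zeros((N + 1, N + 1))
    for m in range(N + 1):
        for k in range(m + 2, N + 1):
            if (k + m) % 2 == 0:
                D2[m, k] = k * (k**2 - m**2) / c[m]
    A = np.zeros((N + 1, N + 1)); rhs = np.zeros(N + 1)
    A[:N-1, :] = -D2[:N-1, :] + beta * np.eye(N + 1)[:N-1, :]
    rhs[:N-1] = h_hat[:N-1]
    A[N-1, :] = 1.0;                       rhs[N-1] = a   # w(1)  = a, since T_k(1) = 1
    A[N, :] = (-1.0) ** np.arange(N + 1);  rhs[N] = b     # w(-1) = b, since T_k(-1) = (-1)^k
    return np.linalg.solve(A, rhs)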
Several levels of parallelism can be exploited in the Kleiser-Schumann algor-
ithm. The most obvious one consists of splitting the Fourier modes among the
processors. There is no communication needed to solve (3.2), and a perfect ba-
lance of work among processors can be easily achieved. This strategy has been
followed by Leca and Sacchi-Landriani [11]. The next level of parallelism origin-
ates from the observation that in each tau system (3.4), the odd Chebyshev mod-
es are uncoupled from the even ones. Hence, the task of solving (3.4) can be split
over two processors. Finally, each of the resulting linear systems can be written in
tridiagonal form (see e.g., [1], Chapter 5 for more details).
The last property also holds for tau approximations of the Poisson equation
in several space dimensions, provided a preliminary diagonalization has been
carried out (see, again, [1], Chapter 5). Thus, the communication demand when
solving linear systems originated by tau approximation is essentially related to
the
(3.5) solution of tridiagonal systems.
3.2. Solving an elliptic boundary value problem by an iterative technique.
Let us assume we want to solve the model problem
(3.6)   -Δu = f   in the cube Ω = (-1, 1)^3,
        u = 0     on ∂Ω,

by a Chebyshev collocation method. For a fixed N > 0, we define the Chebyshev
grid G_N = {(x_i, y_j, z_k) : 0 ≤ i, j, k ≤ N}, with x_l = y_l = z_l = cos(lπ/N), 0 ≤ l ≤ N. We seek
a polynomial u_N of degree N in each space variable, such that

(3.7)   -Δu_N(x_i, y_j, z_k) = f(x_i, y_j, z_k)   ∀ (x_i, y_j, z_k) ∈ G_N ∩ Ω,
        u_N(x_i, y_j, z_k) = 0                     ∀ (x_i, y_j, z_k) ∈ G_N ∩ ∂Ω.

Setting u = {u_N(x_i, y_j, z_k) : 1 ≤ i, j, k ≤ N-1} and f = {f(x_i, y_j, z_k) : 1 ≤ i, j, k ≤ N-1},
let us write (3.7) in matrix form as

(3.8)   L_sp u = f.
The resulting matrix, built up from the one-dimensional matrices (1.11), is
not banded. Hence, one has to resort to an iterative technique in order to solve
(3.8). Among the most popular schemes is the preconditioned Richardson itera-
tive method
(3.9)   u^{n+1} = u^n - α_n A^{-1} [L_sp u^n - f],   n = 0, 1, 2, ...,

where α_n > 0 is an acceleration parameter which can be dynamically chosen at
each iteration, and A is a preconditioning matrix. Note that the spectral re-
sidual r_n = L_sp u^n - f can be efficiently computed by the transform techniques de-
scribed in the previous section, for which several strategies of parallelism have
been discussed.
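A schematic version of iteration (3.9): apply_Lsp and solve_A are placeholders (names are mine) for, respectively, the transform-based evaluation of the spectral operator and the approximate preconditioner solve; a constant acceleration parameter is used here, whereas in practice α_n is chosen dynamically.

import numpy as np

def preconditioned_richardson(apply_Lsp, solve_A, f, u0, alpha=1.0, tol=1e-10, maxit=200):
    # u^{n+1} = u^n - alpha * A^{-1} (Lsp u^n - f)
    u = u0.copy()
    for n in range(maxit):
        r = apply_Lsp(u) - f          # spectral residual, computable by fast transforms
        if np.linalg.norm(r) < tol * max(np.linalg.norm(f), 1.0):
            break
        u -= alpha * solve_A(r)       # preconditioning step (finite difference / finite element solve)
    return u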
The matrix A is an easily «invertible» approximation of L_sp, such that
λ_max/λ_min (A^{-1} L_sp) ≈ 1, as opposed to λ_max/λ_min (L_sp) = O(N^4). (Here λ_max, resp. λ_min,
denote the largest, resp. the smallest, eigenvalue of the indicated matrix.) This is
achieved, for instance, if A is the matrix of a low order finite difference or finite
element method for the Laplace operator on the Chebyshev grid G_N. Multilinear
finite elements (Deville and Mund [3]) guarantee exceedingly good precon-
ditioning properties.
The direct solution of the finite difference or finite element system at each
Richardson iteration may be prohibitive for large problems. An approximate
solution is usually enough for preconditioning purposes. Most of the algorithms
proposed in the literature (see, e.g., [1], Chapter 5 for a review) are global se-
quential algorithms (say, an LU incomplete factorization).
Recently, Pietra and the author [2] have proposed solving approximately the
trilinear finite element system by a small number of ADI iterations. They use an
efficient ADI scheme for tensor product finite elements in dimension three intro-
duced by Douglas [4]. The method can be easily extended to handle general
variable coefficients. As usual, efficiency in an ADI scheme is gained by cycling
the parameters. The ADI parameters can be automatically chosen in such a way
that the cycle length l_c(ε) needed to reduce the error by a factor ε satisfies

l_c(ε) ~ log(λ_max/λ_min) = log(N^4) = 4 log N.
It follows that for a fixed ε, 0 < ε < 1, there exists a cycle length l_c(ε), satis-
fying l_c(ε) = C(ε) log N, such that

λ_max/λ_min (Ã^{-1} L_sp) ≤ (1/(1-ε)) λ_max/λ_min (A^{-1} L_sp),

where A is the exact finite element matrix and Ã is its ADI approximation correspon-
ding to a length l_c(ε) of the parameter cycle. In other words, one can get nearly
the same preconditioning power as that of the exact finite element matrix, provi-
ded the number of ADI iterations is increased with N at a mere logarithmic rate.
The choice of ADI iterations is quite appropriate in conjunction with spec-
tral methods. Indeed, ADI share with spectral methods the tensor product struc-
ture, which is the basis for the alternate sweeps in the rows and the columns of the
computational domain. Furthermore, the Douglas version of ADI for finite el-
ments reduces each sweep to the solution of a set of independent tridiagonal sy-
stems of linear equations, the same kind of system which originates from a one
dimensional tau spectral method. Thus, ADI and spectral methods share most of
the pros and the cons with respect to the problem of their parallelization. Johns-
son, Saad and Schultz [9] discuss highly efficient implementations of ADI met-
hods on several parallel architectures.
3.3. Communication needs.
We have explored the structure of several spectral type algorithms, pointing out
the most significant features in view of their implementation on parallel
architectures. We first stressed the tensor product structure of spectral methods,
next we indicated the one-dimensional transformations which more frequently
occur in these methods: they are given in (2.4), (2.5) and (3.5).
It is outside the scope of this paper to discuss in detail the implementation of
these transformations on specific parallel architectures. Here, we simply recall
the most suitable interconnection networks for each of these transformations re-
ferring for a deeper analysis to classical books on parallel computers such as [8],
or to review papers such as [13].
Fast Fourier Transforms play a fundamental rôle in spectral methods. The
Perfect Shuffle interconnection network (Pease (1968), Stone (1971)) is the
optimal communication scheme for this class of transforms.
Differentiation in Chebyshev transform space essentially amounts to a mat-
rix-vector multiplication, where the matrix is upper triangular Toeplitz (see
(1.10)). Thus, it can be written as a recursion relation as follows:

c_m û_m^{(1)} = û_{m+2}^{(1)} + 2(m+1) û_{m+1},   m = N-1, ..., 0;
û_{N+1}^{(1)} = û_N^{(1)} = 0.
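A minimal sketch of this backward recursion (the function name is mine); it produces the same coefficients as the sum (1.10) in O(N) operations per vector.

import numpy as np

def chebyshev_derivative_coeffs(u_hat):
    # backward recursion: c_m*d_m = d_{m+2} + 2*(m+1)*u_{m+1},  d_{N+1} = d_N = 0
    N = u_hat.size - 1
    d = np.zeros(N + 2)
    for m in range(N - 1, -1, -1):
        c_m = 2.0 if m == 0 else 1.0
        d[m] = (d[m + 2] + 2.0 * (m + 1) * u_hat[m + 1]) / c_m
    return d[:N + 1]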
Cyclic Reduction (Golub-Hockney, 1965) or Cyclic Elimination (Heller,
1976) are the implementations of recursive algorithms which are suggested for
parallel architectures. Several interconnection networks have been proposed for
these transformations (see again [8], [13]).
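As an illustration, here is a serial sketch of cyclic elimination (also known as parallel cyclic reduction) for a tridiagonal system; every row update inside the inner loop is independent of the others, which is what makes the scheme attractive on parallel architectures. The implementation below is mine and is written for clarity, not performance.

import numpy as np

def cyclic_elimination(a, b, c, d):
    # solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with a[0] = c[n-1] = 0;
    # at each step every equation eliminates its two neighbours at distance s,
    # so after O(log n) fully parallel sweeps the system becomes diagonal
    a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
    n = d.size
    s = 1
    while s < n:
        na, nb, nc, nd = a.copy(), b.copy(), c.copy(), d.copy()
        for i in range(n):                       # all i can be updated concurrently
            alpha = -a[i] / b[i - s] if i - s >= 0 else 0.0
            beta = -c[i] / b[i + s] if i + s < n else 0.0
            na[i] = alpha * a[i - s] if i - s >= 0 else 0.0
            nc[i] = beta * c[i + s] if i + s < n else 0.0
            nb[i] = (b[i]
                     + (alpha * c[i - s] if i - s >= 0 else 0.0)
                     + (beta * a[i + s] if i + s < n else 0.0))
            nd[i] = (d[i]
                     + (alpha * d[i - s] if i - s >= 0 else 0.0)
                     + (beta * d[i + s] if i + s < n else 0.0))
        a, b, c, d = na, nb, nc, nd
        s *= 2
    return d / b                                 # diagonal system: x_i = d_i / b_i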
The tridiagonal systems arising from tau methods or finite-order precon-
ditioners can be efficiently solved on parallel machines by a variety of substruc-
turing algorithms, which include Cyclic Reduction or Cyclic Elimination. Johns-
son, Saad and Schultz [9] discuss the implementation of ADI methods on the
hypercube architecture. Often, it is advisable to invert the tridiagonal matrix
once and for all in a preprocessing stage and then solve the linear systems by
matrix-vector multiplication. In this case, the Nearest Neighbor Network pro-
vides the optimal communication scheme.
It is clear from the previous discussion that several intercommunication
paths should co-exist in order to allow an optimal implementation of spectral
algorithms on parallel architectures. The union of the Perfect Shuffle Network
with the Nearest Neighbor Network (PSNN) is an example of a multi-path
scheme, quite appropriate for spectral methods. The PSNN was first proposed by
Grosch [7] for an efficient parallel implementation of fast Poisson solvers.
4. Domain decompositions in spectral methods.
The parallel implementation of different domain decomposition techniques
for general boundary value problems is discussed in the paper by A. Quarteroni
in this volume; we refer to it for the details. Hereafter, we confine ourselves to
some basic considerations about the use of a domain decomposition strategy with
spectral methods.
Partitioning the domain is an absolute need if the geometry of the domain is
complex, i.e., if it cannot be easily mapped into a Cartesian region. In this case,
one breaks the domain into simple pieces, and sets up a separate scheme in each
subdomain; suitable continuities are enforced at the interfaces, usually by an
iterative procedure. (We refer to [1], Chapter 13, for a review of the existing
domain decomposition techniques for spectral methods).
The same strategy can be applied, even on a simple geometry, with the
primary purpose of splitting the computational effort over several processors.
This «route to parallelism» - which is quite successful when the discretization
scheme is of finite order - may contain a potential source of inefficiency if used in
the context of spectral methods. Indeed, it leads to breaking the globality of the
expansion, which - as we know - is a necessary ingredient in order to have high
accuracy for regular solutions. We stress here one of the crucial differences be-
tween local, finite order approximations and spectral approximations to the same
boundary value problem, produced by a domain decomposition technique. In the
former case, the solution obtained at convergence of the iterative procedure coin-
cides with the solution obtained by a single-domain method of the same type,
which employs the union of the grids on the subdomains. In the latter case, the
single-domain solution is a global polynomial function, whereas the final multi-
domain solution is merely a piecewise polynomial function, with finite order
smoothness at the interfaces. Although this does not prevent asymptotic spectral
accuracy for the multi-domain solution, its actual accuracy may be severely de-
graded if compared to that of the single-domain solution defined by the same
total number of degrees of freedom.
Let us illustrate the situation with a model problem, taken from [2]. Consid-
er the Dirichlet problem for the Poisson equation in the square (-1, 1)^2, whose
exact solution is u(x, y) = cos 2πx cos 2πy. We divide the domain into four equal
squares, on each of which we set a Chebyshev collocation method, plus we en-
force C^1 continuity at the interfaces. The results are compared with those pro-
duced by a Chebyshev collocation method on the original square, which uses the
same total number of unknowns. The relative L^∞ errors are reported in Table 2.
Table 2. Relative maximum-norm errors for a Chebyshev collocation method (from
[2]).

                      u(x, y) = cos 2πx cos 2πy
  4 DOM, 4x4          .62 E0
  1 DOM, 8x8          .35 E-1
  4 DOM, 8x8          .12 E-2
  1 DOM, 16x16        .11 E-6
  4 DOM, 16x16        .49 E-10
  1 DOM, 32x32        .38 E-14
Note the loss of four orders of magnitude in replacing the single domain with
16x16 nodes by the four domains, each with an 8x8 grid. Of course, if we have
four processors and we can reach the theoretical speed-up of four in the domain
decomposition technique, we can run four 16x16 subdomains in parallel at the
cost of a single 16x16 domain on a single processor, and gain four orders of mag-
nitude in accuracy. However, if we seek parallelism through the splitting techni-
ques described in Sections 2 and 3, and we maintain a speed-up of four, we can
run for the same cost a 32x32 grid on the single domain, yielding a superior
accuracy again by a factor of 10^{-4}. Thus, it appears that it is better to keep the
spectral expansion as global as possible, and look for parallelism at the level of the
solution of the algebraic system originated from the discretization method.
We conclude by going back to domain decompositions in which a spectral
scheme is set on each «simple» subdomain, supplemented by suitable continuity con-
ditions at the interfaces. Deville and Mund [3] indicated that this can be done by
an iterative procedure such as (3.9), where A^{-1} is a «global preconditioner», i.e.,
an approximation of the differential problem over the whole domain. If the pre-
conditioner is of finite element type, the interface conditions can be incorporated
in the variational formulation, as shown in [2]. Thus, at each iteration, one has to
compute the spectral residuals separately on each subdomain. This can be done
in parallel. Next, one has to (approximately) solve a finite element system. Again,
this can be carried out in parallel using one of the existing domain decom-
position techniques for finite element methods. Note that in principle the do-
main decomposition used at this stage may be totally independent of the one
introduced for setting the spectral approximation.
REFERENCES

[1] C. Canuto, M. Y. Hussaini, A. Quarteroni, T. A. Zang, Spectral Methods in
Fluid Dynamics, Springer-Verlag, New York, 1988.
[2] C. Canuto, P. Pietra, Boundary and interface conditions within a finite element precon-
ditioner for spectral methods, I.A.N.-C.N.R. Report n. 555, Pavia, 1987.
[3] M. Deville, E. Mund, Chebyshev pseudospectral solution of second-order elliptic equa-
tions with finite element preconditioning, J. Comput. Phys., 60 (1985), 517-533.
[4] J. Douglas, Jr., Alternating direction methods for three space variables, Numer. Math.,
4 (1962), 41-63.
[5] G. Erlebacher, S. Bokhari, M. Y. Hussaini, Three dimensional compressible transi-
tion on a 20 processor Flex/32 multicomputer, preprint, NASA Langley Research
Center, 1987.
[6] D. Gottlieb, S. A. Orszag, Numerical Analysis of Spectral Methods: Theory and
Applications, SIAM-CBMS, Philadelphia, 1977.
[7] C. E. Grosch, Performance analysis of Poisson solvers on array computers, in Infotech
State of the Art Report: Supercomputers (C. Jesshope and R. Hockney,
eds.), Infotech, Maidenhead, 1979, 147-181.
[8] R. Hockney, C. Jesshope, Parallel Computers: Architecture, Programming and Algor-
ithms, Adam Hilger, Bristol, 1981.
[9] S. L. Johnsson, Y. Saad, M. H. Schultz, Alternating direction methods on multip-
rocessors, Report YALEU/DCS/RR-382, October 1985.
[10] L. Kleiser, U. Schumann, Treatment of incompressibility and boundary conditions in
3-D numerical spectral simulations of plane channel flows, Proc. 3rd GAMM
Conf. Numerical Methods in Fluid Mechanics (E. H. Hirschel, ed.),
Vieweg Verlag, Braunschweig, 1980, 165-173.
[11] P. Leca, G. Sacchi-Landriani, Parallélisation d'un algorithme de matrice d'influence
pour la résolution des équations de Navier-Stokes par méthodes spectrales, La Recher-
che Aérospatiale, 6 (1987), 35-42.
[12] D. M. Nosenchuck, S. E. Krist, T. A. Zang, On multigrid methods for the Navier-
Stokes Computer, paper presented at the 3rd Copper Mountain Conference
on Multigrid Methods, Copper Mountain, Colorado, April 6-10, 1987.
[13] J. M. Ortega, R. G. Voigt, Solution of partial differential equations on vector and
parallel computers, SIAM Review, 27 (1985), 149-240.
[14] C. Temperton, Self-sorting mixed-radix fast Fourier transforms, J. Comput. Phys., 52
(1983), 1-23.
[15] R. G. Voigt, D. Gottlieb, M. Y. Hussaini (eds.), Spectral Methods for Partial
Differential Equations, SIAM, Philadelphia, 1984.

Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
IJECEIAES
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
aryanpankaj78
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
upoux
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
Nada Hikmah
 

Recently uploaded (20)

Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...Advanced control scheme of doubly fed induction generator for wind turbine us...
Advanced control scheme of doubly fed induction generator for wind turbine us...
 
Gas agency management system project report.pdf
Gas agency management system project report.pdfGas agency management system project report.pdf
Gas agency management system project report.pdf
 
Generative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdfGenerative AI Use cases applications solutions and implementation.pdf
Generative AI Use cases applications solutions and implementation.pdf
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
AI for Legal Research with applications, tools
AI for Legal Research with applications, toolsAI for Legal Research with applications, tools
AI for Legal Research with applications, tools
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
 
Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...Software Engineering and Project Management - Introduction, Modeling Concepts...
Software Engineering and Project Management - Introduction, Modeling Concepts...
 
CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1CEC 352 - SATELLITE COMMUNICATION UNIT 1
CEC 352 - SATELLITE COMMUNICATION UNIT 1
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 08 Doors and Windows.pdf
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
Welding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdfWelding Metallurgy Ferrous Materials.pdf
Welding Metallurgy Ferrous Materials.pdf
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 
Digital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptxDigital Twins Computer Networking Paper Presentation.pptx
Digital Twins Computer Networking Paper Presentation.pptx
 
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
一比一原版(osu毕业证书)美国俄勒冈州立大学毕业证如何办理
 
Curve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods RegressionCurve Fitting in Numerical Methods Regression
Curve Fitting in Numerical Methods Regression
 

Parallelism in spectral methods

The NSC project aims at building a parallel supercomputer made up of a fairly small number of powerful nodes. Each node has the performance of a class VI vector supercomputer; the initial configuration will have 64 such nodes. Despite the superior accuracy of spectral methods over finite difference methods, the scientists involved in this project have chosen to employ low-order finite differences, at least in the initial investigation of how well transition and turbulence algorithms can exploit the NSC architecture. Indeed, "the much greater communication demands of the global discretization may well tip the balance in favor of the less accurate, but simpler local discretizations" ([12]).

Currently, a number of implementations of spectral algorithms on parallel architectures are documented. Let us refer here to the work done at the IBM European Center for Scientific and Engineering Computing (ECSEC) in Rome, at the NASA Langley Research Center by Erlebacher, Bokhari and Hussaini [5], and at ONERA (France) by Leca and Sacchi-Landriani [11].
The IBM contributions are described in detail by P. Sguazzero in a paper in this volume. The latter contributions will be briefly reviewed in the present paper.

The purpose of this paper is to discuss where, and to what extent, it is possible to introduce parallelism in spectral algorithms. We will also try to indicate which communication networks are suitable for the implementation of spectral methods on fine grain, local memory architectures.

1. Basic aspects of spectral methods.

Let us briefly review the fundamental properties of spectral methods for the approximation of boundary value problems. We will focus on those aspects of the methods which are most related to their implementation in a multiprocessor environment. For complete details we refer, e.g., to refs. [1], [6], [15].

Let us assume that we are interested in approximating a boundary value problem, which we write as

(1.1)    L(u) = f  in \Omega,  plus boundary conditions on \partial\Omega,

in a d-dimensional box \Omega = \prod_{i=1}^{d} (a_i, b_i). We approximate the solution u by a finite expansion

(1.2)    u_N = \sum_{|k| \le N} \hat{u}_k \, \phi_k(x),

where k = (k_1, ..., k_d) and

(1.3)    \phi_k(x) = \prod_{i=1}^{d} \phi^{(i)}_{k_i}(x_i).

Each \phi^{(i)}_m is a smooth global basis function on (a_i, b_i), satisfying the orthogonality condition

(1.4)    \int_{a_i}^{b_i} \phi^{(i)}_m(x) \, \phi^{(i)}_n(x) \, w^{(i)}(x) \, dx = c_m \delta_{mn}

with respect to a weight function w^{(i)}. In most applications, the one-dimensional basis functions are trigonometric polynomials in the space directions where a periodicity boundary condition is enforced, and orthogonal algebraic polynomials (Chebyshev or Legendre polynomials) in the remaining directions.
The boundary value problem is discretized by a suitable projection process, which can be represented as

(1.5)    find u_N \in X_N such that  (L_N(u_N), v)_N = (f, v)_N   for all v \in Y_N.

Here X_N is the space of trial functions, Y_N is the space of test functions, L_N is an approximation of the differential operator L, and (u, v)_N is an inner product, which may depend upon the cut-off number N. In general, when X_N = Y_N and the inner product is the L^2(\Omega) inner product, we speak of a Galerkin method; this is quite common for periodic boundary value problems. Otherwise, for non-periodic boundary conditions, we have a tau method when the inner product is the L^2 inner product and Y_N is a space of test functions which do not individually satisfy the boundary conditions, or a collocation method when the inner product is an approximation of the L^2(\Omega) inner product based on a Gaussian quadrature rule.

In order to have a genuine spectral method, the basis functions in the expansion (1.2) must satisfy a supplementary property, in addition to the orthogonality condition (1.4): if one expands a smooth function according to this basis, the "Fourier" coefficients of the function should decay at a rate which is a monotonically increasing, divergent function of the degree of regularity of the function. This occurs if we approximate a periodic function by the trigonometric system (if u \in C^s(0, 2\pi), then \hat{u}_k = O(|k|^{-s})). The same property holds if we expand a non-periodic function according to the eigenfunctions of a singular Sturm-Liouville problem (such as Jacobi polynomials). The above mentioned property is known as the spectral accuracy property. When it is satisfied, one is in a position to prove an error estimate for the approximation (1.5) of problem (1.1) of the form

(1.6)    \| u - u_N \|_{H^m} \le C(m, r) \, N^{m-r} \, \| u \|_{H^r}

for all fixed r \ge r_0, where the spaces H^r form a scale of Hilbert spaces in which the regularity of u is measured. Estimate (1.6) gives theoretical evidence of the fundamental property of spectral methods, namely, that they guarantee an accurate representation of smooth, although highly structured, phenomena by a "minimal" number of unknowns.
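To make the spectral accuracy property concrete, here is a minimal numerical sketch (not from the paper; the test function, the routine name chebyshev_coefficients and the use of Python with numpy are illustrative assumptions). It computes the coefficients of the Chebyshev interpolant of a smooth function at the Gauss-Lobatto nodes and shows that the trailing coefficients shrink rapidly as N grows, which is exactly the decay that estimate (1.6) rests on.

    import numpy as np

    def chebyshev_coefficients(f, N):
        """Coefficients a_k of the degree-N Chebyshev interpolant of f on [-1, 1],
        computed from its values at the Gauss-Lobatto nodes x_j = cos(pi*j/N)."""
        j = np.arange(N + 1)
        x = np.cos(np.pi * j / N)
        cbar = np.ones(N + 1)
        cbar[0] = cbar[N] = 2.0                 # quadrature weights \bar{c}_j
        fx = f(x) / cbar
        k = j[:, None]
        return (2.0 / (N * cbar)) * (np.cos(np.pi * k * j / N) @ fx)

    f = lambda x: np.exp(x) * np.sin(5 * x)     # an arbitrary smooth test function
    for N in (8, 16, 32):
        a = chebyshev_coefficients(f, N)
        print(N, np.abs(a[-3:]))                # trailing coefficients decay rapidly with N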
Spectral methods owe their success to the availability of "fast algorithms" to handle complex problems. The discrete solution u_N is determined by the set of its "Fourier" coefficients {\hat{u}_k, |k| \le N} according to the expansion (1.2), but it can also be uniquely defined by the set of its values {u_j = u_N(x_j)} at a selected set of points G_N = {x_j} in \Omega. The points in G_N are usually the nodes of Gaussian quadrature formulae in \Omega, such as the points x_j = j\pi/N, j = 0, ..., 2N-1, in [0, 2\pi] for the trigonometric system, or the points x_j = \cos(j\pi/N), j = 0, ..., N, in [-1, 1] for the Chebyshev system. Thus, we have a double representation of u_N, one in transform space, the other in physical space. The discrete transform

(1.7)    \{\hat{u}_k\} \Longleftrightarrow \{u_j\}

is a global transformation (each \hat{u}_k depends upon all the u_j's, and conversely). For the Fourier and Chebyshev systems, fast transform algorithms are available to carry out the transformation cheaply. Thus, one can use either representation of the discrete solution within a spectral scheme, depending upon which is the most appropriate and efficient.

Numerical differentiation, a crucial ingredient of any numerical method for differential problems, can be executed within a spectral method either in transform space or in physical space. Let us confine ourselves to one-dimensional Fourier or Chebyshev methods. If

u(x) = \sum_{k=-N}^{N-1} \hat{u}_k e^{ikx}

is a trigonometric polynomial, then

(1.8)    \frac{du}{dx} = \sum_{m=-N}^{N-1} \hat{u}^{(1)}_m e^{imx},   with   \hat{u}^{(1)}_m = i m \hat{u}_m.

In physical space, setting x_j = j\pi/N, j = 0, ..., 2N-1, we have

\frac{du}{dx}(x_l) = \sum_{j=0}^{2N-1} d_{lj} \, u(x_j),   0 \le l \le 2N-1,

with
(1.9)    d_{lj} = \frac{1}{2} (-1)^{l+j} \cot\frac{(l-j)\pi}{2N}  for l \ne j,   d_{ll} = 0.

On the other hand, if u(x) = \sum_{k=0}^{N} \hat{u}_k T_k(x) (T_k(x) denoting the k-th Chebyshev polynomial of the first kind), then

\frac{du}{dx} = \sum_{m=0}^{N-1} \hat{u}^{(1)}_m T_m(x),   with

(1.10)    \hat{u}^{(1)}_m = \frac{2}{c_m} \sum_{k=m+1,\; k+m \text{ odd}}^{N} k \, \hat{u}_k

(here c_0 = 2, c_m = 1 for m \ge 1). In physical space, setting x_j = \cos(j\pi/N), j = 0, ..., N, we have

\frac{du}{dx}(x_l) = \sum_{j=0}^{N} d_{lj} \, u(x_j),   0 \le l \le N,

with

(1.11)    d_{lj} = \frac{c_l}{c_j} \frac{(-1)^{l+j}}{x_l - x_j}   for l \ne j,
          d_{ll} = -\frac{x_l}{2(1 - x_l^2)}   for 1 \le l = j \le N-1,
          d_{00} = \frac{2N^2 + 1}{6},   d_{NN} = -\frac{2N^2 + 1}{6}

(here c_0 = c_N = 2, c_j = 1 for 1 \le j \le N-1).
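The two transform-space differentiation rules (1.8) and (1.10) can be sketched in a few lines of code. The following Python/numpy fragment is illustrative (function names and test data are my own, not the paper's): the Fourier derivative is an FFT, a diagonal scaling by i k, and an inverse FFT, while the Chebyshev derivative coefficients are obtained from the backward recursion equivalent to (1.10).

    import numpy as np

    def fourier_derivative(u):
        """Derivative of a 2*pi-periodic grid function on 2N equispaced points:
        FFT, multiply by i*k, inverse FFT (the case in which differentiation is
        local, i.e. diagonal, in transform space)."""
        n = u.size
        k = np.fft.fftfreq(n, d=1.0 / n)
        return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

    def chebyshev_derivative_coeffs(a):
        """Coefficients of du/dx from Chebyshev coefficients a_0..a_N via the
        recursion equivalent to (1.10):
            c_m a^(1)_m = c_{m+2} a^(1)_{m+2} + 2(m+1) a_{m+1}."""
        N = a.size - 1
        b = np.zeros(N + 1)
        for m in range(N - 1, -1, -1):
            b[m] = (b[m + 2] if m + 2 <= N else 0.0) + 2 * (m + 1) * a[m + 1]
        b[0] *= 0.5                         # divide by c_0 = 2
        return b

    # Checks: d/dx sin(3x) = 3 cos(3x);  d/dx T_3 = 3 T_0 + 6 T_2.
    N = 16
    x = np.pi * np.arange(2 * N) / N
    print(np.max(np.abs(fourier_derivative(np.sin(3 * x)) - 3 * np.cos(3 * x))))
    print(chebyshev_derivative_coeffs(np.array([0.0, 0.0, 0.0, 1.0])))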
The previous relations show that spectral differentiation, like a discrete transform, is again a global transformation (with the lucky exception of Fourier differentiation in transform space). The global character of spectral methods is coherent with the global structure of the basis functions which are used in the expansion.

Globality is the first feature of spectral methods we have to cope with in discussing vectorization and parallelization. If we represent the previous transforms in matrix-times-vector form, they can be easily implemented on a vector computer, and they take advantage of this architecture because the matrices are either diagonal, or upper triangular, or full. When the transforms are realized through the Fast Fourier Transform algorithm, one can use efficiently vectorized FFTs (see, e.g., [14]).

Conversely, if we are concerned with parallelization, globality implies a greater communication demand among processors. This may not be a major problem on coarse grain, large shared memory architectures, such as the now commercially available supercomputers (e.g., Cray X-MP, Cray 2, ETA10, ...). We expect difficulties on the future fine grain, local memory architectures, where information will be spread over the memories of tens or hundreds of processors.

In order to make our analysis more precise, let us focus on perhaps the most significant application of spectral methods given so far, i.e., the numerical simulation of a viscous, incompressible flow. Let us assume we want to discretize the time-dependent Navier-Stokes equations in primitive variables

(1.12)    u_t - \nu \Delta u + \nabla p + (u \cdot \nabla) u = f    in \Omega \times (0, T],
          div u = 0                                                in \Omega \times (0, T],
          u = g (or u periodic)                                    on \partial\Omega \times (0, T],
          u(x, 0) = u_0(x)                                         in \Omega,

in a bounded domain \Omega \subset R^d (d = 2 or 3).

So far, nearly all the methods which have been proposed in the literature (see, e.g., Chapter 7 in [1] for a review) use a spectral method for the discretization in space, and a finite difference scheme to advance the solution in time. Typically, the convective term is advanced by an explicit scheme (e.g., second order Adams-Bashforth, or fourth order Runge-Kutta) for two reasons: the stability limit is always larger than the accuracy limit required to preserve overall spectral accuracy, and the nonlinear terms are easily handled by the pseudospectral technique (see below).
Conversely, the viscous and pressure terms are advanced by an implicit scheme (e.g., Crank-Nicolson), in order to avoid too strict stability limits. Thus, at each time level, one has to

i) evaluate the convective term (u \cdot \nabla) u for one, or several, known velocity fields; the computed terms appear on the right-hand side G of a Stokes-like problem

(1.13)    \alpha u - \nu \Delta u + \nabla p = G    in \Omega,
          div u = 0                                 in \Omega,
          u = g (or u periodic)                     on \partial\Omega,

where \alpha = 1/\Delta t;

ii) solve the spectral discretization of problem (1.13).

In most cases, problem (1.13) is reduced to a sequence of Helmholtz problems. These, in turn, are solved by a direct method or an iterative one. In the latter case, one has to evaluate residuals of spectral approximations of Helmholtz problems.

We conclude that the main steps in a spectral algorithm are:

A) calculation of differential operators on given functions;

B) solution of linear systems.

When the geometry of the physical domain is not Cartesian, one first has to reduce the computational domain to a simple geometry. In this case, one has to resort to one of the existing

C) domain decomposition techniques.

In the next sections, we will examine these three steps in some detail in view of their implementation on a multiprocessor architecture. A sketch of the semi-implicit time-advancing strategy described above, applied to a one-dimensional model problem, is given below.
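The following is a hedged one-dimensional sketch of that strategy, not the paper's algorithm: it integrates the periodic Burgers equation u_t + u u_x = \nu u_xx (a simple stand-in for (1.12)) with a pseudospectral Adams-Bashforth treatment of the convective term and Crank-Nicolson for the viscous term, which is diagonal in Fourier space. All parameters and names are illustrative, and dealiasing is omitted for brevity.

    import numpy as np

    N = 64                                     # Fourier modes / grid points
    nu, dt, nsteps = 0.05, 1e-3, 2000
    x = 2 * np.pi * np.arange(N) / N
    k = np.fft.fftfreq(N, d=1.0 / N)           # integer wavenumbers

    def convective(u_hat):
        """Pseudospectral evaluation of -u u_x, returned in transform space."""
        u = np.real(np.fft.ifft(u_hat))
        ux = np.real(np.fft.ifft(1j * k * u_hat))
        return -np.fft.fft(u * ux)

    u_hat = np.fft.fft(np.sin(x))              # initial condition u(x, 0) = sin x
    n_old = convective(u_hat)                  # first step reduces to explicit Euler
    for step in range(nsteps):
        n_new = convective(u_hat)
        ab2 = 1.5 * n_new - 0.5 * n_old        # second order Adams-Bashforth
        # Crank-Nicolson for diffusion, mode by mode:
        # (1 + dt*nu*k^2/2) u^{n+1} = (1 - dt*nu*k^2/2) u^n + dt * AB2 term
        u_hat = ((1 - 0.5 * dt * nu * k**2) * u_hat + dt * ab2) / (1 + 0.5 * dt * nu * k**2)
        n_old = n_new

    print(np.max(np.abs(np.real(np.fft.ifft(u_hat)))))   # amplitude has decayed below 1

Only the explicit convective term requires the transform-space/physical-space traffic discussed in the next section; the implicit viscous update is a diagonal operation on the coefficients.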
2. Spectral calculation of differential operators.

Let us consider the following problem: "given the representation of a finite-dimensional velocity field u, either in Physical or in Transform space, compute the representation of (u \cdot \nabla) u in the same space". We recall that by representation of a given function v in Transform space we mean the finite set of its "Fourier" coefficients according to the expansion (1.2); this set will be denoted by \hat{v}. Similarly, by representation of v in Physical space we mean the set of its values on a grid G_N in the physical domain, which uniquely determines v; this representation will be denoted by v.

Each component of (u \cdot \nabla) u is a sum of terms of the form

(2.1)    v \, Dw,

where v and w are components of u and D denotes differentiation along one space direction.

An "approximate" representation of (2.1) in Transform space can be computed by the so-called pseudospectral technique, which can be described as follows (here "FFT" denotes a discrete transform, computable by the Fast Fourier Transform; differentiation is performed in transform space, and the product is formed pointwise in physical space):

(2.2)    \hat{v} --FFT--> v
         \hat{w} --differentiation--> \widehat{Dw} --FFT--> Dw
         (v, Dw) --pointwise product--> v\,Dw --FFT--> \widehat{(v\,Dw)}

The result of (2.2) is not the exact representation of (2.1) in transform space, due to the presence of an aliasing error; however, it is possible to eliminate such an error using again transformations similar to the previous ones (see, e.g., Chapter 3 in [1]). The representation of (2.1) in physical space is obtained as follows:

(2.3)    w --FFT--> \hat{w} --differentiation--> \widehat{Dw} --FFT--> Dw
         (v, Dw) --pointwise product--> v\,Dw
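The paper only notes that the aliasing error of (2.2) can be removed with additional transforms. One common way of doing so, which is my illustration rather than a procedure spelled out in the text, is the 3/2-rule: the spectra are zero-padded before transforming to physical space, so that the retained modes of the product are alias-free. The sketch below compares the aliased and de-aliased products of cos 6x with itself on 16 points; the function names are illustrative.

    import numpy as np

    def pad(a_hat, M):
        """Zero-pad a length-N spectrum (numpy FFT ordering) to length M."""
        N = a_hat.size
        out = np.zeros(M, dtype=complex)
        out[:N // 2] = a_hat[:N // 2]
        out[-(N // 2):] = a_hat[-(N // 2):]
        return out

    def dealiased_product_hat(v_hat, w_hat):
        """Fourier coefficients of v*w computed with the 3/2-rule."""
        N = v_hat.size
        M = 3 * N // 2
        v = np.fft.ifft(pad(v_hat, M)) * (M / N)     # account for FFT normalization
        w = np.fft.ifft(pad(w_hat, M)) * (M / N)
        p_hat = np.fft.fft(v * w) * (N / M)
        return np.concatenate([p_hat[:N // 2], p_hat[-(N // 2):]])

    def aliased_product_hat(v_hat, w_hat):
        """Plain pseudospectral product (2.2), with its aliasing error."""
        return np.fft.fft(np.real(np.fft.ifft(v_hat)) * np.real(np.fft.ifft(w_hat)))

    N = 16
    x = 2 * np.pi * np.arange(N) / N
    v_hat = np.fft.fft(np.cos(6 * x))
    print(dealiased_product_hat(v_hat, v_hat)[0].real / N)   # mean of cos^2(6x) = 0.5
    print(aliased_product_hat(v_hat, v_hat)[4].real / N)     # spurious energy folded onto k = 4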
We are now in a position to discuss how to introduce parallelism in the calculation of (u \cdot \nabla) u. Two conceptually different forms of parallelism can be considered:

a) Mathematical parallelism: assign the calculation of different terms v\,Dw to different processors;

b) Numerical parallelism: assign different portions of the computational domain (in Physical space or in Transform space) to different processors, with the same mathematical task.

Mathematical parallelism is the simplest to conceive and even to code; however, it suffers from a number of drawbacks. Since the same component of u may be needed by different processors, a large shared memory is necessary, and/or large data transfers occur. Different processors may require the same data at the same time, leading to severe memory bank conflicts. Furthermore, problems of synchronisation and balancing may arise if the different mathematical terms do not require the same computational effort, or if their number is not a multiple of the number of processors. The strategy becomes definitely useless on fine grain architectures. However, it can represent a first level of parallelism, if a hierarchy of parallelisms is available.

Leca and Sacchi-Landriani [11] report their experience of parallelization of a mixed Fourier-Chebyshev Navier-Stokes algorithm, known as the Kleiser-Schumann method (see the next section). They use a multi-AP system at ONERA (France): four AP-120 processors having access to a "large" shared memory (compared to the "local" memories). Starting from a pre-existing single-processor code, Leca and Sacchi-Landriani simply send different subroutines, computing different contributions to the equation, to different processors. The largest observed speed-up is 2.78 out of the maximal 4.

From now on, we will discuss strategy b) of parallelization, i.e., Numerical Parallelism. The question is: how do we split the computational domain among the processors in order to get the highest degree of parallelism with the lowest communication costs?

The first answer to this question comes from the following fundamental observation: spectral methods for multi-dimensional boundary value problems are inherently tensor products of one-dimensional spectral methods.
This means that the orthogonal basis functions and the computational grids which define a multidimensional spectral method are obtained by taking tensor products of suitable orthogonal basis functions and grids on intervals of the real line. It follows that the elementary transformations (discrete transforms, differentiation, pointwise products, ...) which constitute a spectral method can be obtained as a sequence (a cascade) of one-dimensional transformations of the same nature. Each of these transformations (e.g., differentiation in the x direction) can be carried out in parallel over parallel rows or columns of the computational domain (either in Physical space or in Transform space).

Therefore, the simplest strategy of domain decomposition consists of assigning "slices" of the computational domain (i.e., groups of contiguous rows or columns, in a two-dimensional geometry) to different processors. Once again we stress that we consider slices both in Physical space (i.e., rows/columns of grid values) and in Transform space (i.e., rows/columns of "Fourier" coefficients). After a transformation along one space direction has been completed, one has to transpose the computational lattice in order to carry out transformations along the other directions. Transposition should not be a major problem on architectures with large shared memory or wide-band buses. (A sketch of this slice-and-transpose strategy is given below, after Table 1.)

Erlebacher, Bokhari and Hussaini [5] report preliminary experiences of coding a Fourier-Chebyshev method for compressible Navier-Stokes simulations on a 20 processor Flex/32 computer at the NASA Langley Research Center. Since the time marching scheme is fully explicit, almost all the work is spent in computing convective or diffusive terms by the spectral technique. Parallelization is achieved by the strategy described above. The physical variables on the computational domain are stored in shared memory; slices of them are sent to the processors, which write the results of their computation back into shared memory. The authors' conclusions are summarized in Table 1, where speed-ups (S_p) and efficiencies (E_p) are documented for different choices of the computational grid. According to the authors, moving variables between shared and local memory should not cause major overheads even on such a supercomputer as the ETA10. Indeed, quoting from [5], "a good algorithm [on the ETA10] should perform at least 5 floating point operations per word transferred one way from common memory". This minimum work is certainly achieved within a spectral code: think, for instance, of differentiation in physical space via FFT.

Transposition of the computational lattice will eventually become prohibitive on fine grain, local memory architectures. In this case, small portions of the computational domain will be permanently resident in local memories, and intercommunication among processors will be the major issue.
Table 1. Performance data of one residual calculation (courtesy of Erlebacher, Bokhari and Hussaini [5]).

    Grid            N_tot x 2^-14    P     T_p     S_p     E_p (%)
    128 x 16 x 8        12.7         8     365    7.55     94.3
                                     4     705    3.91     97.6
                                     2    1386    1.99     99.4
                                     1    2757    1.00    100.0
    8 x 64 x 32          9.0         8     269    7.52     94.0
                                     4     510    3.87     96.8
                                     2    1003    1.97     98.5
                                     1    1977    1.00    100.0
    64 x 16 x 16         8.0        16     138   13.01     81.3
                                     8     242    7.41     92.6
                                     4     466    3.84     95.9
                                     2     916    1.95     97.7
                                     1    1786    1.00    100.0
    32 x 32 x 16         6.7        16     118   12.82     80.1
                                     8     202    7.45     93.2
                                     4     388    3.89     97.3
                                     2     759    1.99     99.3
                                     1    1511    1.00    100.0
    32 x 16 x 16         2.7        16      62    9.98     62.4
                                     8      92    6.72     84.0
                                     4     168    3.67     91.7
                                     2     321    1.92     96.0
                                     1     616    1.00    100.0
    16 x 16 x 16         1.0        16      37    6.54     40.9
                                     8      43    5.60     70.0
                                     4      71    3.44     86.0
                                     2     127    1.92     95.8
                                     1     244    1.00    100.0
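Here is a minimal sketch of the slice-and-transpose strategy referred to above (assuming a doubly periodic grid and Fourier differentiation; the "processors" are simulated by ordinary loop iterations, and the routine names are illustrative): every one-dimensional transform operates on whole rows, so row slices can be assigned to different processors, and a transposition of the lattice re-slices the data before the transforms in the other direction.

    import numpy as np

    def derivative_along_rows(slab):
        """Differentiate each row of `slab` via FFT: purely local work on one slice."""
        n = slab.shape[1]
        k = np.fft.fftfreq(n, d=1.0 / n)
        return np.real(np.fft.ifft(1j * k * np.fft.fft(slab, axis=1), axis=1))

    def sliced_derivative(u, nproc=4):
        """Row-slice u among `nproc` hypothetical processors, differentiate, reassemble."""
        slabs = np.array_split(u, nproc, axis=0)       # contiguous row slices
        return np.vstack([derivative_along_rows(s) for s in slabs])

    N = 32
    x = 2 * np.pi * np.arange(N) / N
    X, Y = np.meshgrid(x, x, indexing="ij")            # u[i, j] = u(x_i, y_j)
    u = np.sin(2 * X) * np.cos(3 * Y)

    uy = sliced_derivative(u)                          # y runs along the rows of u
    ux = sliced_derivative(u.T).T                      # transpose, differentiate, transpose back
    print(np.max(np.abs(ux - 2 * np.cos(2 * X) * np.cos(3 * Y))),
          np.max(np.abs(uy + 3 * np.sin(2 * X) * np.sin(3 * Y))))

On a real machine each slab would reside with a different processor, and the transpose between the two derivative evaluations would be the only communication step.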
In order to understand the communication needs of a spectral method, let us observe that if L(u) is any differential operator (of any order, with variable coefficients or nonlinear terms, etc.), then one can compute a spectral approximation to L(u) at a point P of the computational domain using only information at the points lying on the rows and columns meeting at P (see Figure 1.a). This means that spectral methods, although global methods in one space dimension, exhibit a precise sparse structure in multidimensional problems. (A matrix-form illustration of this structure is given below, after (2.5).)

[Figure 1.a: the spectral "stencil" at a point P of the computational domain, consisting of the grid points on the row and the column through P.]

Let us confine ourselves to the two-dimensional case. If we assume to partition the computational domain among an array of m^2 processors (see Figure 1.b), then processor P_{i_0 j_0} will need to exchange data only with the processors P_{i_0 j} (j varying) and P_{i j_0} (i varying). Thus, information will be exchanged among at most O(m) processors.

Note that differentiation in Fourier space and evaluation of nonlinear terms in physical space require no communication among processors. Thus, the communication demand in the spectral calculation of differential operators is dictated by the two following one-dimensional transformations:

(2.4) fast Fourier transforms;

(2.5) differentiation in Chebyshev transform space, according to (1.10).
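The row-and-column structure of Figure 1.a can be seen directly in matrix form. In the sketch below (illustrative, not the paper's code), the one-dimensional matrix (1.11) is built with the standard negative-row-sum device for its diagonal, and the two-dimensional collocation Laplacian on the tensor-product Chebyshev grid is evaluated as D^2 U + U (D^2)^T; its value at a grid point therefore involves only that point's row and column of grid values.

    import numpy as np

    def cheb_diff_matrix(N):
        """Chebyshev collocation derivative matrix (1.11) and nodes x_l = cos(l*pi/N)."""
        l = np.arange(N + 1)
        x = np.cos(np.pi * l / N)
        c = np.where((l == 0) | (l == N), 2.0, 1.0) * (-1.0) ** l
        D = np.outer(c, 1.0 / c) / (x[:, None] - x[None, :] + np.eye(N + 1))
        D -= np.diag(D.sum(axis=1))          # diagonal from the negative row-sum identity
        return D, x

    N = 16
    D, x = cheb_diff_matrix(N)
    D2 = D @ D                                # 1D second-derivative collocation matrix
    U = np.outer(np.cos(np.pi * x), np.exp(x))    # u(x, y) = cos(pi x) e^y on the tensor grid
    lap = D2 @ U + U @ D2.T                   # entry (i, j) uses only row i and column j of U
    print(np.max(np.abs(lap - (1 - np.pi**2) * U)))   # error decays spectrally fast with N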
[Figure 1.b: communications in a lattice of processors; processor P_{i_0 j_0} exchanges data only with the processors P_{i_0 j} on its row and P_{i j_0} on its column.]

3. Solution of linear systems.

Hereafter, we will discuss two examples of the solution of linear systems arising from spectral approximations.

3.1. Solving a Stokes problem via an influence matrix technique.

We consider the Kleiser-Schumann algorithm for solving problem (1.13) in the infinite slab \Omega = R^2 \times (-1, 1), with g = 0 and u 2\pi-periodic in the x and y directions (see [10] for the details). The basic idea is that (1.13) is equivalent to

(3.1)    \alpha u - \nu \Delta u + \nabla p = G,   \Delta p = div G    in \Omega,
         u = 0,   div u = 0                                            on \partial\Omega;

this, in turn, is equivalent to

(3.2)    \Delta p = div G                       in \Omega,      p = \lambda    on \partial\Omega,
         \alpha u - \nu \Delta u = G - \nabla p in \Omega,      u = 0          on \partial\Omega,
provided \lambda is chosen in such a way that div u = 0 on \partial\Omega. If we project (3.2) on each Fourier mode in the x and y directions, we get a family of one-dimensional Helmholtz problems in the interval (-1, 1), where the unknowns are the Fourier coefficients of p and u along the chosen mode. The boundary values \lambda_+ and \lambda_- for the Fourier coefficient of p are obtained by solving a 2x2 linear system, whose matrix, the influence matrix, is computed once and for all in a pre-processing stage. Thus, one is reduced to solving Helmholtz problems of the form

(3.3)    -w'' + \beta w = h   for -1 < z < 1,  \beta \ge 0,
         w(1) = a,  w(-1) = b.

A Chebyshev approximation w_N(z) = \sum_{k=0}^{N} \hat{w}_k T_k(z) to w(z) is defined through the tau-method as

(3.4)    -\hat{w}^{(2)}_m + \beta \hat{w}_m = \hat{h}_m,   0 \le m \le N-2,
         \sum_{m=0}^{N} \hat{w}_m = a;   \sum_{m=0}^{N} (-1)^m \hat{w}_m = b.

Here

\hat{w}^{(2)}_m = \frac{1}{c_m} \sum_{k=m+2,\; k+m \text{ even}}^{N} k (k^2 - m^2) \hat{w}_k

is the m-th Chebyshev coefficient of w''.

Several levels of parallelism can be exploited in the Kleiser-Schumann algorithm. The most obvious one consists of splitting the Fourier modes among the processors. No communication is needed to solve (3.2), and a perfect balance of work among processors can be easily achieved. This strategy has been followed by Leca and Sacchi-Landriani [11]. The next level of parallelism originates from the observation that in each tau system (3.4), the odd Chebyshev modes are uncoupled from the even ones; hence, the task of solving (3.4) can be split over two processors (see the sketch following (3.5)). Finally, each of the resulting linear systems can be written in tridiagonal form (see, e.g., [1], Chapter 5 for more details). The last property also holds for tau approximations of the Poisson equation in several space dimensions, provided a preliminary diagonalization has been carried out (see, again, [1], Chapter 5). Thus, the communication demand when solving linear systems originating from tau approximations is essentially related to the

(3.5) solution of tridiagonal systems.
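The following is a small sketch of assembling and solving the tau system (3.4) as a dense linear system; the routine names and the polynomial test problem are my own illustrative choices. In practice, as stated in the text, the odd and even modes decouple and each half can be rewritten in tridiagonal form; the dense solve below is only meant to exhibit the structure of (3.4).

    import numpy as np
    from numpy.polynomial import chebyshev as C

    def second_derivative_operator(N):
        """W2[m, k] = k*(k^2 - m^2)/c_m for k >= m+2 with k+m even (c_0 = 2)."""
        W2 = np.zeros((N + 1, N + 1))
        for m in range(N + 1):
            cm = 2.0 if m == 0 else 1.0
            for k in range(m + 2, N + 1):
                if (k + m) % 2 == 0:
                    W2[m, k] = k * (k**2 - m**2) / cm
        return W2

    def tau_helmholtz(h_cheb, beta, a, b):
        """Chebyshev coefficients of the tau solution of (3.3)-(3.4)."""
        N = h_cheb.size - 1
        A = np.zeros((N + 1, N + 1))
        rhs = np.zeros(N + 1)
        A[:N - 1, :] = -second_derivative_operator(N)[:N - 1, :]
        A[:N - 1, :] += beta * np.eye(N + 1)[:N - 1, :]
        rhs[:N - 1] = h_cheb[:N - 1]
        A[N - 1, :] = 1.0                             # sum of coefficients = w(+1) = a
        A[N, :] = (-1.0) ** np.arange(N + 1)          # alternating sum = w(-1) = b
        rhs[N - 1], rhs[N] = a, b
        return np.linalg.solve(A, rhs)

    # Check on a problem with polynomial solution w(z) = 1 - z^4 (so w(+-1) = 0):
    beta, N = 3.0, 8
    w_exact = C.poly2cheb([1, 0, 0, 0, -1])                   # Chebyshev coefficients of 1 - z^4
    h_cheb = np.zeros(N + 1)
    h_cheb[:5] = C.poly2cheb([beta, 0, 12, 0, -beta])         # h = -w'' + beta*w
    w_tau = tau_helmholtz(h_cheb, beta, 0.0, 0.0)
    print(np.max(np.abs(w_tau[:5] - w_exact)), np.max(np.abs(w_tau[5:])))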
3.2. Solving an elliptic boundary value problem by an iterative technique.

Let us assume we want to solve the model problem

(3.6)    -\Delta u = f   in the cube \Omega = (-1, 1)^3,
         u = 0           on \partial\Omega,

by a Chebyshev collocation method. For a fixed N > 0, we define the Chebyshev grid G_N = {(x_i, y_j, z_k) | 0 \le i, j, k \le N}, with x_l = y_l = z_l = \cos(l\pi/N), 0 \le l \le N. We seek a polynomial u_N of degree N in each space variable, such that

(3.7)    -\Delta u_N(x_i, y_j, z_k) = f(x_i, y_j, z_k)   for all (x_i, y_j, z_k) \in G_N \cap \Omega,
         u_N(x_i, y_j, z_k) = 0                          for all (x_i, y_j, z_k) \in G_N \cap \partial\Omega.

Setting u = {u_N(x_i, y_j, z_k) | 1 \le i, j, k \le N-1} and f = {f(x_i, y_j, z_k) | 1 \le i, j, k \le N-1}, let us write (3.7) in matrix form as

(3.8)    L_{sp} u = f.

The resulting matrix, built up from the one-dimensional matrices (1.11), is not banded. Hence, one has to resort to an iterative technique in order to solve (3.8). Among the most popular schemes is the preconditioned Richardson iterative method

(3.9)    u^{n+1} = u^n - \alpha_n A^{-1} [L_{sp} u^n - f],   n = 0, 1, 2, ...,

where \alpha_n > 0 is an acceleration parameter which can be dynamically chosen at each iteration, and A^{-1} is a preconditioning matrix. Note that the spectral residual r^n = L_{sp} u^n - f can be efficiently computed by the transform techniques described in the previous section, for which several strategies of parallelism have been discussed. A one-dimensional sketch of this preconditioned iteration is given below.
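Below is a one-dimensional sketch of the preconditioned Richardson iteration (3.9), under assumptions of my own: the spectral matrix is the Chebyshev collocation approximation of -u'' with homogeneous Dirichlet data, the preconditioner A is the second-order finite difference operator on the same (non-uniform) Chebyshev grid, and the acceleration parameter alpha_n is chosen dynamically by a minimal-residual rule, which is one common choice rather than necessarily the paper's. Names, the test problem and N are illustrative.

    import numpy as np

    def cheb_diff_matrix(N):
        """Chebyshev collocation derivative matrix (1.11) and nodes x_l = cos(l*pi/N)."""
        l = np.arange(N + 1)
        x = np.cos(np.pi * l / N)
        c = np.where((l == 0) | (l == N), 2.0, 1.0) * (-1.0) ** l
        D = np.outer(c, 1.0 / c) / (x[:, None] - x[None, :] + np.eye(N + 1))
        D -= np.diag(D.sum(axis=1))
        return D, x

    def fd_laplacian(x):
        """Second-order finite differences for -u'' at the interior nodes of the
        (non-uniform, decreasing) grid x, with zero Dirichlet data at x[0], x[-1]."""
        n = x.size - 2
        A = np.zeros((n, n))
        for i in range(1, n + 1):
            hl, hr = x[i - 1] - x[i], x[i] - x[i + 1]
            al, ar = -2.0 / (hl * (hl + hr)), -2.0 / (hr * (hl + hr))
            A[i - 1, i - 1] = -(al + ar)
            if i > 1:
                A[i - 1, i - 2] = al
            if i < n:
                A[i - 1, i] = ar
        return A

    N = 32
    D, x = cheb_diff_matrix(N)
    L_sp = -(D @ D)[1:N, 1:N]                 # 1D analogue of the spectral matrix in (3.8)
    A = fd_laplacian(x)                       # the preconditioner of (3.9)
    f = np.pi**2 * np.sin(np.pi * x[1:N])     # exact solution u(x) = sin(pi x)

    u = np.zeros(N - 1)
    for it in range(40):
        r = L_sp @ u - f                      # spectral residual
        z = np.linalg.solve(A, r)             # preconditioning step: A z = r
        w = L_sp @ z
        alpha = (r @ w) / (w @ w)             # minimal-residual choice of alpha_n
        u -= alpha * z
    print(np.max(np.abs(u - np.sin(np.pi * x[1:N]))), np.linalg.norm(L_sp @ u - f))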
The matrix A is an easily "invertible" approximation of L_{sp}, such that

\frac{\lambda_{max}}{\lambda_{min}} (A^{-1} L_{sp}) \approx 1,   as opposed to   \frac{\lambda_{max}}{\lambda_{min}} (L_{sp}) = O(N^4)

(here \lambda_{max}, resp. \lambda_{min}, denotes the largest, resp. the smallest, eigenvalue of the indicated matrix). This is achieved, for instance, if A is the matrix of a low order finite difference or finite element method for the Laplace operator on the Chebyshev grid G_N. Multilinear finite elements (Deville and Mund [3]) guarantee exceedingly good preconditioning properties.

The direct solution of the finite difference or finite element system at each Richardson iteration may be prohibitive for large problems. An approximate solution is usually enough for preconditioning purposes. Most of the algorithms proposed in the literature (see, e.g., [1], Chapter 5 for a review) are global sequential algorithms (say, an incomplete LU factorization). Recently, Pietra and the author [2] have proposed solving the trilinear finite element system approximately by a small number of ADI iterations. They use an efficient ADI scheme for tensor product finite elements in dimension three introduced by Douglas [4]. The method can be easily extended to handle general variable coefficients. As usual, efficiency in an ADI scheme is gained by cycling the parameters. The ADI parameters can be automatically chosen in such a way that the cycle length l_c(\epsilon) needed to reduce the error by a factor \epsilon satisfies

l_c(\epsilon) \sim \log (\lambda_{max}/\lambda_{min}) = \log (O(N^4)) \sim 4 \log N.

It follows that for a fixed \epsilon, 0 < \epsilon < 1, there exists a cycle length l_c(\epsilon), satisfying l_c(\epsilon) = C(\epsilon) \log N, such that

\frac{\lambda_{max}}{\lambda_{min}} (\tilde{A}^{-1} L_{sp}) \le \frac{1}{1-\epsilon} \, \frac{\lambda_{max}}{\lambda_{min}} (A^{-1} L_{sp}),

where A is the exact finite element matrix and \tilde{A} is its ADI approximation corresponding to a parameter cycle of length l_c(\epsilon). In other words, one can get nearly the same preconditioning power as that of the exact finite element matrix, provided the number of ADI iterations is increased with N at a mere logarithmic rate.

The choice of ADI iterations is quite appropriate in conjunction with spectral methods. Indeed, ADI shares with spectral methods the tensor product structure, which is the basis for the alternate sweeps along the rows and the columns of the computational domain. Furthermore, the Douglas version of ADI for finite elements reduces each sweep to the solution of a set of independent tridiagonal systems of linear equations, the same kind of system which originates from a one-dimensional tau spectral method. Thus, ADI and spectral methods share most of the pros and cons with respect to the problem of their parallelization.
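The paper's preconditioner solve uses Douglas's ADI scheme for trilinear finite elements in three dimensions with a cycled sequence of parameters. As a simplified stand-in (not the paper's method), the sketch below applies a fixed-parameter Peaceman-Rachford ADI iteration to a two-dimensional finite-difference Laplacian: each half-sweep reduces to a set of independent tridiagonal solves along the rows or the columns of the lattice, which is exactly the source of parallelism discussed in the text. The fixed parameter and the manufactured test problem are illustrative choices.

    import numpy as np

    def fd_1d(n, h):
        """Tridiagonal matrix of -d^2/dx^2 on n interior points of a uniform grid."""
        return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

    def adi_solve(F, n, h, rho=10.0, sweeps=20):
        """Approximate solution of (Ax + Ay) V = F by Peaceman-Rachford sweeps."""
        T = fd_1d(n, h)
        I = np.eye(n)
        V = np.zeros((n, n))
        for _ in range(sweeps):
            # x half-sweep: (Ax + rho I) V* = F + (rho I - Ay) V, one tridiagonal
            # solve per column, all columns independent
            rhs = F + rho * V - V @ T.T
            V = np.linalg.solve(T + rho * I, rhs)
            # y half-sweep: (Ay + rho I) V = F + (rho I - Ax) V*, one solve per row
            rhs = F + rho * V - T @ V
            V = np.linalg.solve(T + rho * I, rhs.T).T
        return V

    # Manufactured problem: -Laplace(u) = f with u = sin(pi x) sin(pi y) on (0, 1)^2.
    n = 31
    h = 1.0 / (n + 1)
    x = h * np.arange(1, n + 1)
    X, Y = np.meshgrid(x, x, indexing="ij")
    U_exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
    F = 2 * np.pi**2 * U_exact
    print(np.max(np.abs(adi_solve(F, n, h) - U_exact)))   # of the order of the FD error

Cycling rho over a sequence of values, as described in the text, is what keeps the number of sweeps at the logarithmic level l_c(\epsilon); a single fixed parameter is used here only for brevity.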
Johnsson, Saad and Schultz [9] discuss highly efficient implementations of ADI methods on several parallel architectures.

3.3. Communication needs.

We have explored the structure of several algorithms of spectral type, pointing out the most significant features in view of their implementation on parallel architectures. We first stressed the tensor product structure of spectral methods; next we indicated the one-dimensional transformations which most frequently occur in these methods: they are given in (2.4), (2.5) and (3.5).

It is outside the scope of this paper to discuss in detail the implementation of these transformations on specific parallel architectures. Here, we simply recall the most suitable interconnection networks for each of these transformations, referring for a deeper analysis to classical books on parallel computers such as [8], or to review papers such as [13].

Fast Fourier Transforms play a fundamental role in spectral methods. The Perfect Shuffle interconnection network (Pease (1968), Stone (1971)) is the optimal communication scheme for this class of transforms.

Differentiation in Chebyshev transform space essentially amounts to a matrix-vector multiplication, where the matrix is upper triangular Toeplitz (see (1.10)). Thus, it can be written as a recursion relation as follows:

c_m \hat{u}^{(1)}_m = c_{m+2} \hat{u}^{(1)}_{m+2} + 2(m+1) \hat{u}_{m+1},   m = N-1, ..., 0;   \hat{u}^{(1)}_{N+1} = \hat{u}^{(1)}_N = 0.

Cyclic Reduction (Golub-Hockney, 1965) or Cyclic Elimination (Heller, 1976) are the implementations of recursive algorithms which are suggested for parallel architectures. Several interconnection networks have been proposed for these transformations (see again [8], [13]).

The tridiagonal systems arising from tau methods or finite-order preconditioners can be efficiently solved on parallel machines by a variety of substructuring algorithms, which include Cyclic Reduction and Cyclic Elimination. Johnsson, Saad and Schultz [9] discuss the implementation of ADI methods on the hypercube architecture.
Often, it is advisable to invert the tridiagonal matrix once and for all in a preprocessing stage, and then solve the linear systems by matrix-vector multiplication. In this case, the Nearest Neighbor Network provides the optimal communication scheme.

It is clear from the previous discussion that several intercommunication paths should co-exist in order to allow an optimal implementation of spectral algorithms on parallel architectures. The union of the Perfect Shuffle Network with the Nearest Neighbor Network (PSNN) is an example of a multi-path scheme, quite appropriate for spectral methods. The PSNN was first proposed by Grosch [7] for an efficient parallel implementation of fast Poisson solvers.

4. Domain decompositions in spectral methods.

The parallel implementation of different domain decomposition techniques for general boundary value problems is discussed in the paper by A. Quarteroni in this volume; we refer to it for the details. Hereafter, we confine ourselves to some basic considerations about the use of a domain decomposition strategy with spectral methods.

Partitioning the domain is an absolute need if the geometry of the domain is complex, i.e., if it cannot be easily mapped into a Cartesian region. In this case, one breaks the domain into simple pieces and sets up a separate scheme in each subdomain; suitable continuity conditions are enforced at the interfaces, usually by an iterative procedure. (We refer to [1], Chapter 13, for a review of the existing domain decomposition techniques for spectral methods.)

The same strategy can be applied, even on a simple geometry, with the primary purpose of splitting the computational effort over several processors. This "route to parallelism", which is quite successful when the discretization scheme is of finite order, may contain a potential source of inefficiency if used in the context of spectral methods. Indeed, it leads to breaking the globality of the expansion, which, as we know, is a necessary ingredient in order to have high accuracy for regular solutions. We stress here one of the crucial differences between local, finite order approximations and spectral approximations to the same boundary value problem, produced by a domain decomposition technique. In the former case, the solution obtained at convergence of the iterative procedure coincides with the solution obtained by a single-domain method of the same type, which employs the union of the grids on the subdomains. In the latter case, the single-domain solution is a global polynomial function, whereas the final multi-domain solution is merely a piecewise polynomial function, with finite order smoothness at the interfaces.
Although this does not prevent asymptotic spectral accuracy for the multi-domain solution, its actual accuracy may be severely degraded compared to that of the single-domain solution defined by the same total number of degrees of freedom.

Let us illustrate the situation with a model problem, taken from [2]. Consider the Dirichlet problem for the Poisson equation in the square (-1, 1)^2, whose exact solution is u(x, y) = \cos 2\pi x \cos 2\pi y. We divide the domain into four equal squares, on each of which we set up a Chebyshev collocation method, and we enforce C^1 continuity at the interfaces. The results are compared with those produced by a Chebyshev collocation method on the original square, which uses the same total number of unknowns. The relative L^\infty errors are reported in Table 2.

Table 2. Relative maximum-norm errors for a Chebyshev collocation method (from [2]); exact solution u(x, y) = \cos 2\pi x \cos 2\pi y.

    Discretization                     Relative error
    4 domains,  4x4 grid each          0.62 E+0
    1 domain,   8x8 grid               0.35 E-1
    4 domains,  8x8 grid each          0.12 E-2
    1 domain,  16x16 grid              0.11 E-6
    4 domains, 16x16 grid each         0.49 E-10
    1 domain,  32x32 grid              0.38 E-14

Note the loss of four orders of magnitude in replacing the single domain with 16x16 nodes by the four domains, each with an 8x8 grid. Of course, if we have four processors and we can reach the theoretical speed-up of four in the domain decomposition technique, we can run four 16x16 subdomains in parallel at the cost of a single 16x16 domain on a single processor, and gain four orders of magnitude in accuracy. However, if we seek parallelism through the splitting techniques described in Sections 2 and 3, and we maintain a speed-up of four, we can run for the same cost a 32x32 grid on the single domain, yielding a superior accuracy, again by a factor of 10^{-4}.
Thus, it appears that it is better to keep the spectral expansion as global as possible, and to look for parallelism at the level of the solution of the algebraic system originating from the discretization method.

We conclude by going back to domain decompositions, where one has to solve the spectral scheme on each "simple" subdomain, supplemented by suitable continuity conditions at the interfaces. Deville and Mund [3] indicated that this can be done by an iterative procedure such as (3.9), where A^{-1} is a "global preconditioner", i.e., an approximation of the differential problem over the whole domain. If the preconditioner is of finite element type, the interface conditions can be incorporated in the variational formulation, as shown in [2]. Thus, at each iteration, one has to compute the spectral residuals separately on each subdomain. This can be done in parallel. Next, one has to (approximately) solve a finite element system. Again, this can be carried out in parallel using one of the existing domain decomposition techniques for finite element methods. Note that, in principle, the domain decomposition used at this stage may be totally independent of the one introduced for setting up the spectral approximation.

REFERENCES

[1] C. Canuto, M. Y. Hussaini, A. Quarteroni, T. A. Zang, Spectral Methods in Fluid Dynamics, Springer-Verlag, New York, 1988.
[2] C. Canuto, P. Pietra, Boundary and interface conditions within a finite element preconditioner for spectral methods, I.A.N.-C.N.R. Report n. 555, Pavia, 1987.
[3] M. Deville, E. Mund, Chebyshev pseudospectral solution of second-order elliptic equations with finite element preconditioning, J. Comput. Phys., 60 (1985), 517-533.
[4] J. Douglas, Jr., Alternating direction methods for three space variables, Numer. Math., 4 (1962), 41-63.
[5] G. Erlebacher, S. Bokhari, M. Y. Hussaini, Three-dimensional compressible transition on a 20 processor Flex/32 multicomputer, preprint, NASA Langley Research Center, 1987.
[6] D. Gottlieb, S. A. Orszag, Numerical Analysis of Spectral Methods: Theory and Applications, SIAM-CBMS, Philadelphia, 1977.
[7] C. E. Grosch, Performance analysis of Poisson solvers on array computers, in: Infotech State of the Art Report: Supercomputers (C. Jesshope and R. Hockney, eds.), Infotech, Maidenhead, 1979, 147-181.
[8] R. Hockney, C. Jesshope, Parallel Computers: Architecture, Programming and Algorithms, Adam Hilger, Bristol, 1981.
[9] S. L. Johnsson, Y. Saad, M. H. Schultz, Alternating direction methods on multiprocessors, Report YALEU/DCS/RR-382, October 1985.
[10] L. Kleiser, U. Schumann, Treatment of incompressibility and boundary conditions in 3-D numerical spectral simulations of plane channel flows, Proc. 3rd GAMM Conf. Numerical Methods in Fluid Mechanics (E. H. Hirschel, ed.), Vieweg Verlag, Braunschweig, 1980, 165-173.
[11] P. Leca, G. Sacchi-Landriani, Parallelisation d'un algorithme de matrice d'influence pour la resolution des equations de Navier-Stokes par methodes spectrales, La Recherche Aerospatiale, 6 (1987), 35-42.
[12] D. M. Nosenchuck, S. E. Krist, T. A. Zang, On multigrid methods for the Navier-Stokes Computer, paper presented at the 3rd Copper Mountain Conference on Multigrid Methods, Copper Mountain, Colorado, April 6-10, 1987.
[13] J. M. Ortega, R. G. Voigt, Solution of partial differential equations on vector and parallel computers, SIAM Review, 27 (1985), 149-240.
[14] C. Temperton, Self-sorting mixed-radix fast Fourier transforms, J. Comput. Phys., 52 (1983), 1-23.
[15] R. G. Voigt, D. Gottlieb, M. Y. Hussaini (eds.), Spectral Methods for Partial Differential Equations, SIAM, Philadelphia, 1984.