Writing distributed N-body code using distributed FFT
Use of distributed FFT for writing fully
distributed N-body code for cosmological simulations
Supervisors: Dr. S. Sanyal, IIIT Allahabad,
and Dr. J. S. Bagla, HRI Allahabad
An N-body simulation follows the evolution of a
system of N bodies, where the force exerted on each body
arises from its interaction with all the other bodies in the
system. In cosmology it is used to study structure
formation, and it is also applied to processes such as the dynamical
evolution of star clusters under gravity.
Given the initial conditions of the bodies, i.e. their initial masses,
positions and velocities, an N-body code calculates
their positions and velocities at later times by evaluating the forces
at each timestep and updating the particles accordingly.
Computing all particle-particle interactions directly requires on the order of N²
calculations, which is prohibitively expensive and practically not feasible for large N.
Hence the need for optimisation: Fast Fourier
Transforms (FFTs) are used, which reduce the time required for the
calculation to order N log N.
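To make the N² cost concrete, here is a minimal direct-summation sketch in Python with NumPy (used for all examples here as a stand-in for the project's C/FFTW code). It visits all N(N-1)/2 pairs. The function name `direct_accelerations` and the Plummer-style `softening` parameter are illustrative assumptions, not taken from the project.

```python
import numpy as np


def direct_accelerations(pos, mass, G=1.0, softening=1e-3):
    """Direct-summation gravitational accelerations, O(N^2) pair visits.

    pos: (N, 3) float array of positions; mass: (N,) array of masses.
    softening (an assumed Plummer-style length) avoids division by zero
    when two particles come very close.
    Returns (acc, n_pairs), where n_pairs = N(N-1)/2.
    """
    n = pos.shape[0]
    acc = np.zeros_like(pos, dtype=float)
    n_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[j] - pos[i]                    # separation vector from i to j
            r2 = float(d @ d) + softening**2
            inv_r3 = r2 ** -1.5
            acc[i] += G * mass[j] * d * inv_r3     # i is pulled towards j
            acc[j] -= G * mass[i] * d * inv_r3     # equal and opposite pull on j
            n_pairs += 1
    return acc, n_pairs
```

Doubling N roughly quadruples the number of pairs. The FFT-based particle-mesh approach avoids this scaling.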
Even so, large volumes of data are generated, and an N-body
run takes an excessively long time even
on the fastest computers.
As a solution, the computations are done on distributed
systems. The task is divided among the available
processors/systems, each of which performs calculations on
its local data. Because the calculations run in parallel, the total time is reduced.
Hence, using a distributed FFT to write a fully distributed N-body
code provides faster calculations at a
comparatively lower cost.
Each N-body code has two basic modules: one calculates
the total force acting on each body for a given configuration
of particles, and the other moves the particles in this force field.
The project deals with calculating the force field from the
initial conditions and moving the particles according to this force field.
The data will be decomposed and stored in the local memory
of each distributed machine, where it is processed.
The processed local data of all the machines will then be
combined to produce the output of the N-body code.
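Flow chart of the N-body code:
1. Set up the initial conditions for the model.
2. Compute the forces for the given particle configuration.
3. Move the particles by one timestep.
4. If t = tfin, go to step 5; otherwise, return to step 2.
5. Write output to file.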
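The flow chart maps onto a short time loop. The sketch below is a hypothetical driver, `run_nbody`. The kick-drift-kick leapfrog integrator and the `accel` callback interface are assumptions chosen for illustration, not the project's method. Any force routine, direct-sum or particle-mesh, can be passed in as `accel`.

```python
import numpy as np


def run_nbody(x, v, accel, t0, t_fin, dt, output_path=None):
    """Follow the flow chart: compute forces, move particles, stop at t_fin.

    x, v: (N, 3) initial positions and velocities.
    accel: callable returning (N, 3) accelerations for positions x.
    Uses kick-drift-kick leapfrog (an assumed integrator choice).
    """
    x = np.array(x, dtype=float)
    v = np.array(v, dtype=float)
    # Fixed step count; t ends at t_fin when (t_fin - t0) is a multiple of dt.
    n_steps = int(round((t_fin - t0) / dt))
    a = accel(x)                  # forces for the initial configuration
    t = t0
    for _ in range(n_steps):
        v += 0.5 * dt * a         # half kick
        x += dt * v               # drift: move the particles
        a = accel(x)              # forces for the new configuration
        v += 0.5 * dt * a         # second half kick
        t += dt
    if output_path is not None:
        np.savetxt(output_path, np.hstack([x, v]))   # write output to file
    return x, v, t
```

Leapfrog is exact for a constant acceleration, which makes a uniform field a convenient check of the loop.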
FFTW (Fastest Fourier Transform in the West) is a C
subroutine library for computing the discrete Fourier transform
(DFT) in one or more dimensions, of arbitrary input size, and
of both real and complex data. The FFTW package was
developed at MIT by Matteo Frigo and Steven G. Johnson.
FFTW libraries can be used for writing codes in C, C++ and Fortran.
In this project it is used to solve the Poisson equation for the gravitational
potential and to calculate the force using Fourier transforms.
FFTW can perform both forward and inverse Fourier transforms
out-of-place, with separate input and output arrays.
It also provides in-place transforms, in which the same array
is used for input and output.
The FFTW routines store multi-dimensional data in row-major (C) order.
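Row-major storage can be illustrated with two small helpers that have hypothetical names. The first computes the flat offset of element (i, j, k), where the last index varies fastest. The second applies the rule described in the FFTW manual for in-place real-to-complex transforms: the last dimension of the real array is padded to 2(n/2 + 1) elements, which leaves room for n/2 + 1 complex outputs.

```python
def rowmajor_offset(i, j, k, n1, n2):
    """Flat offset of element (i, j, k) in a row-major n0 x n1 x n2 array.

    The last index k varies fastest; n0 does not enter the formula.
    """
    return (i * n1 + j) * n2 + k


def inplace_r2c_last_dim(n2):
    """Padded length of the real array's last dimension for an in-place
    real-to-complex transform: 2 * (n2 // 2 + 1) reals."""
    return 2 * (n2 // 2 + 1)
```

For an in-place transform, the real element (i, j, k) therefore sits at `rowmajor_offset(i, j, k, n1, inplace_r2c_last_dim(n2))`, not at the unpadded offset.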
FFTW does not normalise the data implicitly. If we
perform a forward transform of some data and then an inverse transform
of the result, we get the original data multiplied by the size of
the transform (the total number of points N).
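By default, NumPy's `numpy.fft` normalises the inverse transform, unlike FFTW. The sketch below reproduces FFTW's unnormalised round trip, so the factor N has to be divided out by hand. It assumes NumPy 1.20 or later, which provides the `norm="forward"` option.

```python
import numpy as np


def fftw_style_roundtrip(x):
    """Forward then inverse DFT with no normalisation on either side.

    numpy's default ifft divides by N. With norm="forward" the 1/N is moved
    to the forward direction, so ifft(..., norm="forward") is unscaled.
    Paired with the default (unscaled) fft, this mimics FFTW.
    """
    X = np.fft.fft(x)                            # unscaled forward transform
    return np.fft.ifft(X, norm="forward").real   # unscaled inverse: N * x
```

Dividing the result by `x.size` recovers the original data.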
FFTW also supports MPI (Message Passing Interface),
allowing distributed-memory parallelism, in which
each CPU has its own separate memory; this can scale up
to clusters of many thousands of processors. This is needed in
this project because the data is too large to fit in the
memory of a single processor.
In MPI, the data is divided among a set of “processes”, each of which
runs in its own memory address space.
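One way to divide a grid among MPI processes is to give each process a contiguous slab along the first dimension, which is the layout FFTW's MPI interface uses for multi-dimensional transforms. The names `local_n0` and `local_0_start` echo the outputs of `fftw_mpi_local_size_3d`, but the even split below is an assumption, and real code should use the split that FFTW reports. A second hypothetical helper estimates each slab's memory for an in-place real-to-complex grid of doubles.

```python
def slab_decomposition(n0, nprocs):
    """Split n0 grid planes (first dimension) into contiguous slabs.

    Returns one (local_n0, local_0_start) pair per rank; the first
    n0 % nprocs ranks get one extra plane. An assumed even split:
    FFTW's MPI interface may distribute planes differently.
    """
    base, extra = divmod(n0, nprocs)
    slabs, start = [], 0
    for rank in range(nprocs):
        local_n0 = base + (1 if rank < extra else 0)
        slabs.append((local_n0, start))
        start += local_n0
    return slabs


def slab_bytes(local_n0, n1, n2, bytes_per_value=8):
    """Bytes for one rank's slab of an in-place real-to-complex grid,
    with the last dimension padded to 2 * (n2 // 2 + 1) values."""
    return local_n0 * n1 * 2 * (n2 // 2 + 1) * bytes_per_value
```

For a 1024³ grid on 32 processes, each process holds 32 planes. `slab_bytes(32, 1024, 1024)` is 268,959,744 bytes (about 0.25 GiB) per process, compared with 8,606,711,808 bytes (about 8.0 GiB) for the whole grid on a single processor.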
PMFAST is a particle-mesh N-body code written in Fortran
90, aimed at large-scale structure cosmological simulations.
It supports distributed-memory systems through MPI
and includes a parallel initial-condition generator.
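A particle-mesh code first assigns particle masses to a grid so that the density can be Fourier transformed. One common scheme is cloud-in-cell (CIC) assignment. The sketch below assumes CIC on a periodic cubic grid and uses a hypothetical function name. It is not taken from PMFAST, whose own assignment scheme may differ.

```python
import numpy as np


def cic_density(positions, n, box_size=1.0, mass=1.0):
    """Cloud-in-cell density on a periodic n x n x n grid.

    positions: (P, 3) array in [0, box_size). Each particle's mass is
    shared among the 8 surrounding grid points with trilinear weights.
    Returns mass per cell volume.
    """
    dx = box_size / n
    s = np.asarray(positions, dtype=float) / dx   # positions in grid units
    i0 = np.floor(s).astype(int)                   # lower neighbouring grid point
    w_hi = s - i0                                  # weight for the upper neighbour
    w_lo = 1.0 - w_hi                              # weight for the lower neighbour
    rho = np.zeros((n, n, n))
    for ox in (0, 1):
        for oy in (0, 1):
            for oz in (0, 1):
                w = ((w_hi[:, 0] if ox else w_lo[:, 0])
                     * (w_hi[:, 1] if oy else w_lo[:, 1])
                     * (w_hi[:, 2] if oz else w_lo[:, 2]))
                idx = (i0 + np.array([ox, oy, oz])) % n   # periodic wrap
                # np.add.at accumulates correctly when indices repeat
                np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), mass * w)
    return rho / dx**3
```

The total mass on the grid equals the total particle mass, and a particle sitting exactly on a grid point deposits all its mass there.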
Plan of Work
The project consists of writing an N-body code that takes the initial
conditions, solves the potential equation in k-space, calculates
the force and advances the particles over timesteps, computing their intermediate
positions and other attributes. As the major task here is solving
the equation in k-space using Fourier transforms, the following steps were carried out.
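The force and the gravitational potential are related by
$$\mathbf{g} = -\nabla\Phi,$$
where g is the gravitational field (force per unit mass).
Finding the gravitational potential Φ is easy because it satisfies the Poisson equation,
$$\nabla^2\Phi = 4\pi G\rho,$$
where G is Newton's constant and ρ is the density (number of particles at the mesh points).
Φ is obtained by using the fast Fourier transform to go to the frequency
domain, where the Poisson equation has the simple form
$$\hat{\Phi}(\mathbf{k}) = -\frac{4\pi G\,\hat{\rho}(\mathbf{k})}{k^2},$$
where k is the wavevector and hats denote Fourier transforms.
The gravitational field can now be found by multiplying by $-i\mathbf{k}$,
$$\hat{\mathbf{g}}(\mathbf{k}) = -i\mathbf{k}\,\hat{\Phi}(\mathbf{k}),$$
and computing the inverse Fourier transform.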
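These equations translate almost line by line into array operations. The following is a serial NumPy sketch on a periodic cubic grid, using a hypothetical helper, `pm_potential_and_field`. In the project the transforms would be FFTW's distributed ones, which, unlike NumPy's `ifftn`, would also require dividing by the number of grid points. Setting the k = 0 mode of Φ to zero is an assumed convention.

```python
import numpy as np


def pm_potential_and_field(rho, box_size=1.0, G=1.0):
    """Solve nabla^2 phi = 4 pi G rho with FFTs on a periodic cubic grid,
    then g = -grad(phi), i.e. g_hat = -i k phi_hat.

    The k = 0 mode of phi is set to zero (the mean potential is arbitrary
    in a periodic box). Returns phi, gx, gy, gz.
    """
    n = rho.shape[0]
    # Wavenumbers in FFT order: 0, +k, ..., then negative k for the upper half.
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    rho_k = np.fft.fftn(rho)
    phi_k = np.zeros_like(rho_k)
    mask = k2 > 0
    phi_k[mask] = -4.0 * np.pi * G * rho_k[mask] / k2[mask]
    phi = np.fft.ifftn(phi_k).real
    g = [np.fft.ifftn(-1j * kc * phi_k).real for kc in (kx, ky, kz)]
    return phi, g[0], g[1], g[2]
```

A single cosine density mode is a convenient check: for ρ = cos(k₀x), the sketch should return Φ = -4πG cos(k₀x)/k₀² and g_x = -4πG sin(k₀x)/k₀.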
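• The first step of the project was to take 1-dimensional real data
and measure the error obtained when FFTW is used to solve a 1-d
Poisson equation (forward transform, division by $-k^2$, inverse transform):
– $g(x) = \exp\left(-\frac{(x - N/2)^2}{2\sigma^2}\right)$, with x ranging from 1 to N
– $\partial^2 g = \left(\frac{(x - N/2)^2}{\sigma^2} - 1\right)\frac{g(x)}{\sigma^2} = f(x)$, say
– $f(x) \rightarrow F(k)$ [forward Fourier transform]
– $F(k)/(-k^2) \rightarrow g(x)$ [inverse Fourier transform]
where $k^2 = k_x^2 + k_y^2 + k_z^2$ for 3-dimensional data
– in the current 1-d case, $k^2 = k_x^2$, with mode index $i = 0, \dots, N-1$ and
$$k_x = \begin{cases} \dfrac{2\pi}{N}\, i, & i \le N/2, \\[4pt] -\dfrac{2\pi}{N}\,(N - i), & i > N/2. \end{cases}$$
(The sign of the second branch does not affect $k^2$ but matters when multiplying by k to take a gradient.)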
• Calculated the dependence of the error on σ and N, with the error defined as
$$\text{Error} = \sum_{i=1}^{N} \frac{\left(g_{\text{obtained}}(i) - g(i)\right)^2}{g(i)^2}$$
– Keeping σ = 5 constant:
Error(N = 256) = 0.077926
Error(N = 512) = 0.043631
Error(N = 1024) = 0.0264835
– Keeping N = 1024 constant:
Error(σ = 5) = 0.0264835
Error(σ = 10) = 0.043631
Error(σ = 15) = 0.0607785
Hence, the error increases with
increasing σ but decreases as N increases.
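A NumPy sketch of the 1-d test follows, under a hypothetical name, `poisson_1d_test`. The division by -k² leaves the k = 0 mode undefined, so the sketch sets that mode to zero and compares against the zero-mean part of g. It also uses a relative L2 error. Both choices are assumptions, so the values the sketch returns need not match those listed above.

```python
import numpy as np


def poisson_1d_test(N, sigma):
    """Recover g from f = g'' by forward FFT, division by -k^2, inverse FFT.

    k = 0 is skipped (set to zero), so the result is compared with the
    zero-mean part of g. Error measure (an assumption):
    sum((g_obt - g_ref)^2) / sum(g_ref^2).
    """
    x = np.arange(1, N + 1, dtype=float)                  # x = 1..N
    c = N / 2.0
    g = np.exp(-(x - c) ** 2 / (2.0 * sigma**2))
    f = ((x - c) ** 2 / sigma**2 - 1.0) * g / sigma**2    # analytic second derivative
    F = np.fft.fft(f)
    i = np.arange(N)
    kx = np.where(i <= N // 2, 2.0 * np.pi * i / N, -2.0 * np.pi * (N - i) / N)
    G_k = np.zeros(N, dtype=complex)
    G_k[1:] = F[1:] / (-kx[1:] ** 2)                      # k = 0 mode left at zero
    g_obtained = np.fft.ifft(G_k).real                    # numpy divides by N here; FFTW would not
    g_ref = g - g.mean()                                  # zero-mean part of g
    return np.sum((g_obtained - g_ref) ** 2) / np.sum(g_ref ** 2)
```

For well-resolved Gaussians, the k = 0 treatment and the error definition dominate what this kind of test reports. It is worth fixing both explicitly before comparing runs.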
• Performed multi-dimensional fast Fourier transforms of real and
complex data. The complex data's real part was set
equal to the real data and its imaginary part was set to zero, so that
both the real and complex transforms were done on the same data.
[Figure: 2-d complex transform (upper panel) and real transform (lower panel)]
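NumPy can reproduce this comparison. The real-to-complex transform keeps only the non-redundant half of the last dimension (n/2 + 1 entries). Those entries should equal the corresponding columns of the full complex transform of the same data with zero imaginary part. The helper name is hypothetical.

```python
import numpy as np


def compare_complex_and_real_2d(data):
    """Check that the r2c transform equals the first n1//2 + 1 columns of
    the complex transform of the same real data (imaginary part zero)."""
    full = np.fft.fft2(data.astype(complex))   # complex-to-complex transform
    half = np.fft.rfft2(data)                  # real-to-complex transform
    n1 = data.shape[1]
    same = bool(np.allclose(full[:, : n1 // 2 + 1], half))
    return same, full.shape, half.shape
```

The omitted columns of the complex transform are complex conjugates of the kept ones (Hermitian symmetry of real input). This symmetry is why the real transform needs roughly half the output storage.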
• After the out-of-place transforms were completed successfully, in-place
transforms were done, as they are useful in the project (they avoid storing a second copy of the data).
• The next step is to perform the in-place transforms using MPI.
The work above was completed before mid-semester.
• The remaining work is to run the same MPI programs with
very large N on a 32-node cluster, each node having
16 GB RAM and a quad-core processor. The task will be to plot
execution time against the number of processes for a given N
and find the optimal number of processes, for which execution
time is minimised.
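Once the timings have been collected (for example with `MPI_Wtime` around the transform), a small helper can pick the fastest process count and report speedup and parallel efficiency. `analyse_scaling` is a hypothetical name. It assumes a single-process (p = 1) run is available as the baseline.

```python
def analyse_scaling(times):
    """times: dict {number of processes p: measured wall-clock time T(p)}.

    Returns (best_p, speedup, efficiency), where best_p minimises T(p),
    speedup[p] = T(1) / T(p) and efficiency[p] = speedup[p] / p.
    """
    if 1 not in times:
        raise ValueError("need a p = 1 timing to compute speedup")
    t1 = times[1]
    speedup = {p: t1 / t for p, t in times.items()}
    efficiency = {p: s / p for p, s in speedup.items()}
    best_p = min(times, key=times.get)
    return best_p, speedup, efficiency
```

Efficiency falling well below 1 as p grows indicates that communication, not computation, has started to dominate.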
• The next step is to store the data required by each process in the
local memory of that process and then repeat the above.
This reduces the storage needed per process, so the total data size
can be very large, since it no longer depends on the memory of
a single processor.
• After the Fourier transform functions have been optimised, the particle-mesh
N-body code PMFAST will be used, with the force
computations done by the developed distributed-memory
Fourier transform codes.
• Using the computed forces, the particles will be moved,
and the calculation will be repeated over timesteps
to obtain the final attributes of the particles.
References
1. J. S. Bagla (2001), Cosmological N-Body Simulations, Resource Summary, Khagol 48, 5.
2. J. S. Bagla, Cosmological N-Body Simulations, Gravitational Clustering in an Expanding Universe.
3. FFTW: Fastest Fourier Transform in the West.
4. The Message Passing Interface (MPI) standard.
5. PMFAST, http://www.cita.utoronto.ca/~merz/pmfast/
6. Wikipedia, the online encyclopedia.