Use of distributed FFT for writing fully
distributed N-body code for cosmological
               applications


             Supervisors: Dr. S. Sanyal, IIIT Allahabad
                     and Dr. J. S. Bagla, HRI Allahabad

                                          -Kalpana Roy
                                              R200513
Motivation
The classical N-body problem simulates the evolution of a
system of N bodies, where the force exerted on each body
arises due to its interaction with all the other bodies in the
system. It is used in cosmology to study processes of structure
formation like the dynamical evolution of star clusters under
the influence of physical forces.
Given the initial conditions of the bodies, i.e. their initial
masses, positions and velocities, an N-body code calculates
their positions and velocities at later times, evaluating and
updating the intermediate values over successive timesteps.
Direct particle-particle interactions require of the order of N²
force calculations per timestep, which is not feasible for
large N.
Hence optimisation is needed: Fast Fourier Transforms
reduce the cost of the force calculation to the order of
N log N.
Even then, large volumes of data are generated, and an
N-body calculation takes an excessively long time even
on the fastest computers [2].
As a solution, the computations are done on distributed
systems. The task is divided among the available
processors/systems, each of which performs calculations on
its local data. As the calculations run in parallel, the time
required decreases.
Hence, using a distributed FFT to write a fully distributed
N-body code gives faster calculations at a comparatively
lower cost.
Problem Definition
Each N-body code has two basic modules: one calculates the
total force acting on each body for a given configuration of
particles, and the other moves the particles in this force
field.
The project deals with calculating the force field from the
initial conditions and moving the particles according to that
force.
The data will be decomposed, stored in the local memory of
each distributed machine, and processed there.
The processed local data of all the machines will then be
combined to give the result of the N-body code.
N-body code flow:
1. Set up initial conditions for the model of interest.
2. Compute forces for the given particle positions.
3. Move the particles by one step.
4. If t = tfin, go to step 5; otherwise return to step 2.
5. Write output to file.
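The loop in this flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the force module here is a softened direct summation (order N², not the FFT-based method of the project), the integrator is a kick-drift-kick leapfrog, and the names compute_accelerations, run, G and eps are hypothetical.

```python
import numpy as np

def compute_accelerations(pos, mass, G=1.0, eps=1e-3):
    """Force per unit mass on each particle by direct summation (O(N^2))."""
    diff = pos[None, :, :] - pos[:, None, :]        # diff[i, j] = pos[j] - pos[i]
    dist2 = np.sum(diff**2, axis=-1) + eps**2       # softened squared distances
    np.fill_diagonal(dist2, np.inf)                 # no self-interaction
    inv_d3 = dist2 ** -1.5
    return G * np.sum(mass[None, :, None] * diff * inv_d3[:, :, None], axis=1)

def run(pos, vel, mass, dt, t_fin):
    """Advance the particles from t = 0 to t = t_fin in steps of dt."""
    n_steps = int(round(t_fin / dt))
    acc = compute_accelerations(pos, mass)          # compute forces
    for _ in range(n_steps):
        vel = vel + 0.5 * dt * acc                  # half kick
        pos = pos + dt * vel                        # move particles by one step
        acc = compute_accelerations(pos, mass)      # recompute forces
        vel = vel + 0.5 * dt * acc                  # half kick
    return pos, vel                                 # caller writes output to file

# Assumed test case: two equal masses with zero total momentum.
mass = np.array([1.0, 1.0])
pos0 = np.array([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]])
vel0 = np.array([[0.0, -0.5, 0.0], [0.0, 0.5, 0.0]])
pos1, vel1 = run(pos0, vel0, mass, dt=0.01, t_fin=1.0)
```

Because the pairwise forces are equal and opposite, the total momentum of this test case should stay at zero up to rounding error, which gives a quick sanity check on the two modules.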
Technologies Used
FFTW – Fastest Fourier Transform in the West is a C
subroutine library for computing the discrete Fourier transform
(DFT) in one or more dimensions, of arbitrary input size, and
of both real and complex data. The FFTW package was
developed at MIT by Matteo Frigo and Steven G. Johnson.
FFTW libraries can be used for writing codes in C, C++ and
Fortran.
In this project it is used to solve the Poisson equation for the
gravitational potential and to calculate the force using Fourier
transforms.
When separate input and output arrays are supplied, both the
forward and inverse Fourier transforms are done out-of-place.
FFTW also provides in-place transforms, in which the input
and output arrays are the same.
The FFTW routines store the data in row-major format for
multi-dimensional arrays.
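As an illustration of row-major storage, the flat offset of element (i, j, k) in an nx × ny × nz array can be computed as below. The helper row_major_index is a hypothetical name, and NumPy is used only because its default layout is also row-major (C order); this is a sketch, not FFTW code.

```python
import numpy as np

def row_major_index(i, j, k, ny, nz):
    """Flat offset of element (i, j, k): the last index varies fastest."""
    return (i * ny + j) * nz + k

nx, ny, nz = 4, 3, 5
grid = np.arange(nx * ny * nz, dtype=float).reshape(nx, ny, nz)  # C (row-major) order
flat = grid.ravel()                                              # the underlying 1-d layout

offset = row_major_index(2, 1, 3, ny, nz)
matches = flat[offset] == grid[2, 1, 3]
```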
FFTW does not normalise the data, so if we perform a forward
transform of some data and then an inverse transform of the
result, we get the original data multiplied by the size of
the array.
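The scaling can be checked with a small sketch. NumPy stands in for FFTW here (an assumption made for brevity): its inverse transform normalises by default, so norm="forward" (NumPy 1.20 or later) is passed to leave the inverse unscaled and mimic the behaviour described above.

```python
import numpy as np

n = 8
x = np.arange(n, dtype=float)

X = np.fft.fft(x)                          # forward transform, unscaled
y = np.fft.ifft(X, norm="forward").real    # inverse left unscaled, as in FFTW

scaled_by_n = np.allclose(y, n * x)        # round trip returns n * x
x_back = y / n                             # explicit normalisation recovers x
```

In a C program using FFTW the same division by the array size has to be done by hand after the inverse transform.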
FFTW also supports MPI (Message Passing Interface)
operations, allowing distributed-memory parallelism in which
each CPU has its own separate memory, and which can scale up
to clusters of many thousands of processors. This is desirable
for the project because the data is huge and will not fit in the
memory of a single processor.
In MPI, the data is divided among a set of "processes", each
of which runs in its own memory address space.
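One common way to divide a 3-d mesh among processes is by contiguous slabs along the first dimension. The sketch below is a hypothetical helper showing an even split and the resulting memory per process; it does not call MPI or FFTW, and FFTW's MPI interface reports the actual layout it uses, which may differ from this assumption.

```python
def slab_decomposition(n0, n_procs):
    """Split n0 planes into contiguous slabs; returns (local_start, local_n0) per rank."""
    base, extra = divmod(n0, n_procs)
    slabs = []
    start = 0
    for rank in range(n_procs):
        local_n0 = base + (1 if rank < extra else 0)   # first `extra` ranks get one more plane
        slabs.append((start, local_n0))
        start += local_n0
    return slabs

def slab_bytes(local_n0, n1, n2, bytes_per_value=16):
    """Memory for one slab of double-precision complex values (16 bytes each)."""
    return local_n0 * n1 * n2 * bytes_per_value

slabs = slab_decomposition(10, 4)
per_rank_bytes = [slab_bytes(n, 10, 10) for _, n in slabs]
```

Each process then holds only its own slab, so the memory needed per process shrinks roughly in proportion to the number of processes.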
PMFAST is a particle-mesh N-body code, written in Fortran
90 and aimed at large-scale structure cosmological
simulations [5].
It supports distributed-memory systems through MPI and
includes a parallel initial condition generator.
Plan of Work
The project consists of writing an N-body code that takes the input
conditions, solves the potential equation in k-space, calculates
the force, and simulates over timesteps, calculating the intermediate
positions and other attributes. As the major task here is solving
the equation in k-space using Fourier transforms, the following steps
are followed:
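 The force per unit mass (the gravitational field) g and the gravitational
 potential Φ are related to each other as

     $\mathbf{g} = -\nabla \Phi$

 Finding the potential Φ is easy, because the Poisson equation,

     $\nabla^2 \Phi = 4 \pi G \rho$

 where G is Newton's constant and ρ is the density (number of particles at the
 mesh points), is trivial to solve by using the fast Fourier transform to go to
 the frequency domain, where the Poisson equation has the simple form

     $\hat{\Phi}(\mathbf{k}) = -4 \pi G \, \hat{\rho}(\mathbf{k}) / k^2$

 where k is the wavenumber and hats denote Fourier transforms.
 The gravitational field can now be found by multiplying $\hat{\Phi}$ by
 $-i\mathbf{k}$ and computing the inverse Fourier transform.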
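The steps above can be sketched on a periodic cubic mesh as follows. NumPy's FFT stands in for FFTW, and the function name solve_poisson_periodic, the unit value of G and the choice of setting the k = 0 mode to zero (the potential is only defined up to a constant) are assumptions of this sketch.

```python
import numpy as np

def solve_poisson_periodic(rho, box_size, G=1.0):
    """Return the potential phi and the field g = -grad(phi) on a periodic mesh."""
    n = rho.shape[0]
    k1d = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)    # signed wavenumbers
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                     # placeholder to avoid dividing by zero
    rho_k = np.fft.fftn(rho)
    phi_k = -4.0 * np.pi * G * rho_k / k2
    phi_k[0, 0, 0] = 0.0                  # drop the undetermined mean of phi
    phi = np.fft.ifftn(phi_k).real
    g = np.stack([np.fft.ifftn(-1j * k * phi_k).real for k in (kx, ky, kz)])
    return phi, g

# Assumed test density: a single cosine mode along x in a unit box.
n, box = 16, 1.0
x = np.arange(n) * box / n
rho = np.cos(2.0 * np.pi * x / box)[:, None, None] * np.ones((n, n, n))
phi, g = solve_poisson_periodic(rho, box)
```

For this density the analytic answer is Φ = −(4πG/k²) cos(kx) with k = 2π, so the numerical potential can be compared against it directly.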
• The first step of the project was to take 1-dimensional real data
  and calculate the error obtained by using FFTW for a forward
  transform and a subsequent inverse transform, followed by
  normalisation.
   – $g(x) = \exp\left(-(x - N/2)^2 / (2\sigma^2)\right)$, x ranging from 1 to N
   – $\partial^2 g = \left((x - N/2)^2/\sigma^2 - 1\right) g(x)/\sigma^2 = f(x)$, say
   – $f(x) \rightarrow F(k)$ [forward Fourier transform]
   – $F(k)/(-k^2) \rightarrow g(x)$ [inverse Fourier transform]
   where $k^2 = k_x^2 + k_y^2 + k_z^2$ for 3-dimensional data
   – in the current 1-d case, $k^2 = k_x^2$
   – $k_x = (2\pi/N)\, i$ for $i \le N/2$
   – $k_x = (2\pi/N)(N - i)$ for $i > N/2$
• Calculated the dependence of the error on the values of σ and N.
                  $\mathrm{Error} = \sum_{i=1}^{N} (g_{\mathrm{obtained}}(i) - g(i))^2 / g(i)^2$

   Keeping σ = 5 constant:
   – Error(N=256) = 0.077926
   – Error(N=512) = 0.043631
   – Error(N=1024) = 0.0264835
   Keeping N = 1024 constant:
   – Error(σ=5) = 0.0264835
   – Error(σ=10) = 0.043631
   – Error(σ=15) = 0.0607785

  Hence, it is deduced that the error value increases with
  increasing σ but decreases as N increases.
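The test described above can be sketched as follows, with NumPy's FFT standing in for FFTW. Several choices here are assumptions rather than the project's method: the k = 0 mode, which F(k)/(−k²) cannot determine, is restored from the known mean of g, and the error is an aggregate relative error (sum of squared differences over sum of squares). The sketch is therefore not expected to reproduce the error values quoted above.

```python
import numpy as np

def roundtrip_error(n, sigma):
    """Recover a Gaussian g from its second derivative f via F(k) / (-k^2)."""
    x = np.arange(1, n + 1, dtype=float)
    g = np.exp(-((x - n / 2) ** 2) / (2.0 * sigma**2))
    f = ((x - n / 2) ** 2 / sigma**2 - 1.0) * g / sigma**2   # analytic second derivative
    k = 2.0 * np.pi * np.fft.fftfreq(n)          # signed wavenumbers for unit spacing
    F = np.fft.fft(f)                            # forward transform
    G_k = np.zeros_like(F)
    nonzero = k != 0
    G_k[nonzero] = F[nonzero] / (-(k[nonzero] ** 2))
    g_rec = np.fft.ifft(G_k).real                # NumPy's inverse already divides by n
    g_rec += g.mean() - g_rec.mean()             # restore the undetermined mean (k = 0)
    return np.sum((g_rec - g) ** 2) / np.sum(g**2)

errors_vs_n = {n: roundtrip_error(n, 5.0) for n in (256, 512, 1024)}
```

Since only k² enters, the sign of the wavenumber does not matter, which is why the folded form of k_x given in the list and the signed values from fftfreq give the same result.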

• Performed multi-dimensional fast Fourier transforms of real and
  complex data. In this case the real part of the complex data was
  set equal to the real data and the imaginary part was set to zero,
  so that both the real and complex transforms were done on the
  same data.
[Figure: 2-d complex transform (above) and real transform (below)]
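A small sketch of this comparison, using NumPy's FFT in place of FFTW and an assumed 8 × 8 random test array: the complex transform of real data with zero imaginary part returns the full spectrum, while the real transform returns only the non-redundant half along the last dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
real_data = rng.standard_normal((8, 8))
complex_data = real_data.astype(complex)      # same values, imaginary part zero

spec_c = np.fft.fft2(complex_data)            # full 8 x 8 complex spectrum
spec_r = np.fft.rfft2(real_data)              # 8 x 5: last dimension is n/2 + 1

half = real_data.shape[1] // 2 + 1
same_half = np.allclose(spec_c[:, :half], spec_r)

# The omitted half is redundant for real input: X[-i, -j] = conj(X[i, j]).
mirrored = np.roll(spec_c[::-1, ::-1], 1, axis=(0, 1))
hermitian = np.allclose(spec_c, np.conj(mirrored))
```

The real transform therefore needs roughly half the storage of the complex one, which matters when the mesh is large.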
• After successful completion of the out-of-place transforms,
  in-place transforms were done, as they are useful in the project.
• The next step is to perform the in-place transforms using
  distributed-memory parallelism.
  The aforementioned work was done before mid-semester.
• The work to be done now is to run the same MPI programs with
  very large N values on a 32-node cluster, each node having
  16 GB RAM and a quad-core processor. The task will be to plot
  time against the number of processes for a particular N value
  and find the number of processes for which execution time is
  minimised.
• The next step is to store the data required by each process in the
  local memory of the process itself and then repeat the above.
  This will reduce the storage required per process, so the data
  size can be extremely large, as it will no longer be limited by
  the storage of a single processor.
• After the optimisation of the Fourier transform functions, a
  particle-mesh based N-body code, PMFAST, will be used and the
  force computations will be done using the developed
  distributed-memory Fourier transform codes.
• With the help of the force computations, the particles will be
  moved accordingly and subsequent calculations will be done
  iteratively over timesteps to obtain the final attributes of the
  particles.
References
1. J. S. Bagla, 2001, Cosmological N-Body Simulations, Resource
   Summary, Khagol 48, 5
2. J. S. Bagla, Cosmological N-Body Simulations, Gravitational
   Clustering in an Expanding Universe -
   http://www.hri.res.in/~jasjeet/thesis.html
3. FFTW – Fastest Fourier Transform in the West -
   http://www.fftw.org
4. The Message Passing Interface (MPI) standard -
   http://www-unix.mcs.anl.gov/mpi/
5. PMFAST - http://www.cita.utoronto.ca/~merz/pmfast/
6. Wikipedia – The online encyclopedia

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Use of distributed FFT for fully distributed N-body cosmological code

  • 1. Use of distributed FFT for writing fully distributed N-body code for cosmological applications. Supervisors: Dr. S. Sanyal, IIIT Allahabad, and Dr. J. S. Bagla, HRI Allahabad. Kalpana Roy, R200513
  • 2. Motivation The classical N-body problem simulates the evolution of a system of N bodies, where the force exerted on each body arises from its interaction with all the other bodies in the system. It is used in cosmology to study structure formation processes such as the dynamical evolution of star clusters under the influence of physical forces. Given the initial conditions of the bodies, i.e. their initial masses, positions and velocities, an N-body code calculates their current positions and motions by evaluating and updating the intermediate values over successive timesteps. Direct particle-particle interactions require of the order of N² force calculations, which is not feasible in practice for large N.
  • 3. Hence the need for optimisation: Fast Fourier Transforms are used, which reduce the time required for the calculation to the order of N log N. Even then, large volumes of data are generated and an N-body calculation takes an excessively long time even on the fastest computers [2]. As a solution, the computations are done on distributed systems. The task is divided among the available processors/systems, each of which performs calculations on its local data. As the calculations run in parallel, the time required decreases. Hence, using a distributed FFT to write a fully distributed N-body code gives faster calculations at a comparatively lower cost.
  • 4. Problem Definition Each N-body code has two basic modules: one calculates the total force acting on each body, given the configuration of particles, and the other moves the particles in this force field. The project deals with calculating the force field from the initial conditions and moving the particles according to that force. The data will be decomposed, stored in the local memory of each distributed machine, and processed there. The processed local data of all the machines will then be combined to give the result of the N-body calculation.
  • 5. N-body flowchart: Set up the initial conditions for the model of interest → Compute forces for the given particle positions → Move the particles by one step → Is t = tfin? If no, return to the force computation; if yes, write output to file.
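The flow on this slide can be sketched as a short driver loop. This is a minimal illustration under stated assumptions, not the project's code: the names `run_nbody` and `compute_forces`, the 1-D positions, and the kick-then-drift update are all hypothetical choices made for brevity, and the final state is returned instead of being written to a file.

```python
def run_nbody(positions, velocities, masses, compute_forces, dt, t_fin):
    """Advance a 1-D particle system from t = 0 to t = t_fin.

    compute_forces is a caller-supplied (hypothetical) function that maps a
    list of positions to a list of forces, one per particle.
    """
    # Use an integer step count so the loop does not depend on float equality.
    n_steps = round(t_fin / dt)
    for _ in range(n_steps):
        # Step 1: compute forces for the given particle positions.
        forces = compute_forces(positions)
        # Step 2: move the particles by one step (kick, then drift).
        velocities = [v + f / m * dt for v, f, m in zip(velocities, forces, masses)]
        positions = [x + v * dt for x, v in zip(positions, velocities)]
    # Step 3: t has reached t_fin; a real code would write the output here.
    return positions, velocities
```

As a quick check of the loop, a single unit-mass particle with the restoring force `lambda xs: [-x for x in xs]`, started at x = 1 with zero velocity, should approximately follow x(t) = cos(t).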
  • 6. Technologies Used FFTW (the Fastest Fourier Transform in the West) is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data. The FFTW package was developed at MIT by Matteo Frigo and Steven G. Johnson. FFTW can be called from codes written in C, C++ and Fortran. In this project it is used to solve the Poisson equation for the gravitational potential and to calculate the force using Fourier transforms. Both the forward and inverse transforms can be done out-of-place, with separate input and output arrays, or in-place, with the same array used for input and output.
  • 7. The FFTW routines store multi-dimensional arrays in row-major format. FFTW does not normalise the data implicitly, so a forward transform followed by an inverse transform of the result returns the original data multiplied by the size of the array. FFTW also supports MPI (Message Passing Interface), allowing distributed-memory parallelism in which each CPU has its own separate memory, and which can scale up to clusters of many thousands of processors. This is desirable for the project because the data is huge and will not fit in the memory of a single processor. In MPI, the data is divided among a set of “processes”, each of which runs in its own memory address space.
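The two conventions described on this slide, unnormalised transforms and row-major storage, can be illustrated without FFTW itself. The sketch below is an assumption-laden stand-in: `dft` is a plain O(N²) transform written for this example (not an FFTW call), and `round_trip` and `row_major_index` are hypothetical helper names.

```python
import cmath
import math


def dft(values, sign):
    """Unnormalised DFT: sign=-1 for the forward transform, sign=+1 for the backward one."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * j * k / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def round_trip(values):
    """Forward then backward transform with no normalisation applied."""
    return dft(dft(values, -1), +1)


def row_major_index(i, j, k, n1, n2):
    """Flat offset of element (i, j, k) in an n0 x n1 x n2 row-major array."""
    # The last index varies fastest in memory.
    return (i * n1 + j) * n2 + k
```

Dividing each element of `round_trip(data)` by `len(data)` recovers the input, which is the normalisation step the caller has to apply explicitly.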
  • 8. PMFAST is a particle-mesh N-body code, written in Fortran 90 and aimed at large-scale structure cosmological simulations [5]. It supports distributed-memory systems through MPI and includes a parallel initial condition generator.
  • 9. Plan of Work The project consists of writing an N-body code that takes the initial conditions as input, solves the potential equation in k-space, calculates the force, and simulates over timesteps, calculating the intermediate positions and other attributes. As the major task is solving the equation in k-space using the Fourier transform, the following steps are followed: The force and gravitational potential are related to each other as $\vec{F} = -\nabla\Phi$. Finding the potential $\Phi$ is easy, because it satisfies the Poisson equation, $\nabla^2\Phi = 4\pi G\rho$, where $G$ is Newton's constant and $\rho$ is the density (number of particles at the mesh points).
  • 10. It is straightforward to solve for $\Phi$ by using the fast Fourier transform to go to the frequency domain, where the Poisson equation has the simple form $\hat{\Phi} = -4\pi G\hat{\rho}/k^2$, with hats denoting Fourier transforms and $k$ the wavenumber. The gravitational field can now be found by multiplying $\hat{\Phi}$ by $-i\vec{k}$ (the Fourier-space form of $-\nabla$ for modes $e^{i\vec{k}\cdot\vec{x}}$) and computing the inverse Fourier transform. • The first step of the project was to take 1-dimensional real data and calculate the error obtained by using FFTW for a forward transform and a subsequent inverse transform, followed by normalisation. – $g(x) = \exp\left(-(x-N/2)^2/(2\sigma^2)\right)$, x ranging from 1 to N – $\partial^2 g = \left((x-N/2)^2/\sigma^2 - 1\right) g(x)/\sigma^2 = f(x)$, say – $f(x) \rightarrow F(k)$ [forward Fourier transform] – $F(k)/(-k^2) \rightarrow g(x)$ [inverse Fourier transform], where $k^2 = k_x^2 + k_y^2 + k_z^2$ for 3-dimensional data – in the current 1-d case, $k^2 = k_x^2$ – $k_x = (2\pi/N)\,i$ for $i \le N/2$ – $k_x = (2\pi/N)(N-i)$ for $i > N/2$
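The 1-d test on this slide can be sketched in a few lines. This is an illustrative reconstruction under assumptions, not the project's FFTW program: a plain O(N²) `dft` stands in for FFTW, indices run from 0 instead of 1, and the k = 0 mode (which cannot be divided by k²) is set to zero and the mean of g is added back afterwards. The helper names `wavenumber` and `recover_gaussian` are hypothetical. The negative-frequency branch uses i - N where the slide uses N - i; the two agree once squared.

```python
import cmath
import math


def dft(values, sign):
    """Unnormalised DFT: sign=-1 forward, sign=+1 backward (stand-in for FFTW)."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2j * math.pi * j * k / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def wavenumber(i, n):
    """Wavenumber for 0-based index i; indices above n/2 are negative frequencies."""
    return 2.0 * math.pi * (i if i <= n // 2 else i - n) / n


def recover_gaussian(n, sigma):
    """Return (g, g_recovered) after solving g'' = f in k-space."""
    g = [math.exp(-((x - n / 2) ** 2) / (2 * sigma**2)) for x in range(n)]
    # Analytic second derivative of the Gaussian, sampled on the grid.
    f = [(((x - n / 2) ** 2) / sigma**2 - 1.0) * g[x] / sigma**2 for x in range(n)]
    f_hat = dft(f, -1)
    # Divide by -k^2; the k = 0 mode is undetermined and is set to zero (assumption).
    g_hat = [0j] + [f_hat[i] / -(wavenumber(i, n) ** 2) for i in range(1, n)]
    # Backward transform, then normalise by n because the DFT is unnormalised.
    g_rec = [value.real / n for value in dft(g_hat, +1)]
    # Restore the mean that was lost with the k = 0 mode.
    mean_g = sum(g) / n
    return g, [value + mean_g for value in g_rec]
```

Calling `recover_gaussian(64, 5.0)` and comparing the two returned lists element by element gives one way to measure how well the round trip reproduces g; the comparison used here is a choice made for this sketch and is not the error measure defined on the following slide.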
  • 11. • Calculated the dependence of the error on the values of σ and N. $\mathrm{Error} = \sum_{i=1}^{N} \left(g_{\mathrm{obtained}}(i) - g(i)\right)^2 / g(i)^2$ – Error(N=256) = 0.077926 – Error(N=512) = 0.043631 – Error(N=1024) = 0.0264835, keeping σ = 5 constant.
  • 12. – Error(σ=5) = 0.0264835 – Error(σ=10) = 0.043631 – Error(σ=15) = 0.0607785, keeping N = 1024 constant. Hence, it is deduced that the error increases with increasing σ but decreases as N increases. • Performed multi-dimensional fast Fourier transforms of real and complex data. In this case the real part of the complex data was set equal to the real data and the imaginary part was set to zero, so that the real and complex transforms were both done on the same data.
  • 13. 2-d complex transform (above) and real transform (below)
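The relationship between the complex and real transforms of the same data can be shown in one dimension. The sketch below is a simplified illustration, assuming a plain O(N²) `dft` in place of FFTW and hypothetical helper names: for real input the outputs obey F[n-k] = conj(F[k]), so a real-to-complex transform only needs to keep n/2 + 1 of them (in FFTW's multi-dimensional real transforms this halving applies to the last dimension).

```python
import cmath
import math


def dft(values):
    """Unnormalised forward DFT of a list of complex numbers."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * j * k / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def real_half_spectrum(real_values):
    """Keep only the n//2 + 1 non-redundant outputs of a transform of real data."""
    # Imaginary part set to zero, as in the comparison described on the slides.
    full = dft([complex(v, 0.0) for v in real_values])
    return full[: len(real_values) // 2 + 1]


def rebuild_full_spectrum(half, n):
    """Recover all n outputs from the stored half using conjugate symmetry."""
    return [half[k] if k <= n // 2 else half[n - k].conjugate() for k in range(n)]
```

Rebuilding the full spectrum from the stored half and comparing it with the complex transform of the zero-padded data is a compact way to confirm that both transforms carry the same information.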
  • 14. • After successful completion of the out-of-place transforms, in-place transforms were done, as they are useful in the project. • The next step is to perform the in-place transforms using distributed-memory parallelism. The work described above was completed before mid-semester. • The work to be done now is to run the same MPI programs with very large N values on a 32-node cluster, each node having 16 GB of RAM and a quad-core processor. The task will be to plot execution time against the number of processes for a particular N value and find the number of processes that minimises execution time.
  • 15. • The next step is to store the data required by each process in the local memory of that process and then repeat the above. This will reduce the storage requirement per process, so the data size can be extremely large, as it will no longer be limited by the storage of a single processor. • After the Fourier transform functions are optimised, a particle-mesh N-body code, PMFAST, will be used, and the force computations will be done using the developed distributed-memory Fourier transform codes. • Using the force computations, the particles will be moved accordingly, and subsequent calculations will be done iteratively over timesteps to obtain the final attributes of the particles.
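Storing only the local part of the grid on each process can be sketched as a slab decomposition along the first dimension, which is the dimension FFTW's MPI interface distributes. The even split below is an assumption made for illustration; FFTW reports its own split through routines such as `fftw_mpi_local_size_3d`, and that split may differ. The names `slab_bounds` and `local_bytes` are hypothetical.

```python
def slab_bounds(n0, n_procs, rank):
    """Return (start, count) of the planes along dimension 0 owned by this rank."""
    base, extra = divmod(n0, n_procs)
    # The first `extra` ranks take one additional plane each.
    count = base + (1 if rank < extra else 0)
    start = rank * base + min(rank, extra)
    return start, count


def local_bytes(count, n1, n2, bytes_per_value=16):
    """Memory for `count` planes of n1 x n2 values (16 bytes per double-precision complex)."""
    return count * n1 * n2 * bytes_per_value
```

As an illustrative case, a 1024³ grid of double-precision complex values occupies 16 GiB in total, while `local_bytes(slab_bounds(1024, 32, 0)[1], 1024, 1024)` gives the 0.5 GiB share held by one of 32 processes.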
  • 16. References 1. J. S. Bagla, 2001, Cosmological N-Body Simulations, Resource Summary, Khagol 48, 5. 2. J. S. Bagla, Cosmological N-Body Simulations, Gravitational Clustering in an Expanding Universe, http://www.hri.res.in/~jasjeet/thesis.html 3. FFTW, the Fastest Fourier Transform in the West, http://www.fftw.org 4. The Message Passing Interface (MPI) standard, http://www-unix.mcs.anl.gov/mpi/ 5. PMFAST, http://www.cita.utoronto.ca/~merz/pmfast/ 6. Wikipedia, the online encyclopedia