Implementation and Optimization of FDTD Kernels by Using Cache-Aware Time-Skewing Algorithms

THESIS PRESENTATION
SERHAN OZBEY
WARSAW UNIVERSITY OF TECHNOLOGY
INSTITUTE OF TELECOMMUNICATIONS 16/03/2017

ABSTRACT
• The main goal of this thesis was to implement and optimize cache-aware time-skewing algorithms for
FDTD kernels in order to reduce cache misses and processor idle time.
• Electromagnetic simulations require large-scale discretization of space and a large amount of computation.
• An efficient memory access pattern is therefore important.
• A naive implementation of the FDTD method is a kernel of nested loops that reads from and writes to
memory to calculate the EM fields.
• By exploiting the data dependencies and locality of the FDTD kernel and making better use of the memory
hierarchy, processor idle time can be reduced.
• FDTD execution time can be long if the nested loops are not iterated in an order that uses the data
dependencies efficiently.
• This idle time can be reduced by skewing and blocking the time and space domains, so that loop
iterations follow the data dependencies and make better use of the fast CPU cache memories.

TOPICS
1. INTRODUCTION
2. LITERATURE REVIEW
3. METHODOLOGY
4. RESULTS AND DISCUSSION
5. CONCLUSIONS

INTRODUCTION
• Sustainable and reliable telecommunication networks demand efficient and durable network
components. This requires modelling and producing devices that cope well with the electromagnetic
disturbances that affect their performance.
• Factors such as electromagnetic radiation and scattering should be accounted for through
electromagnetic modelling, which simulates how devices interact with environmental conditions and
with the materials present in the environment.

INTRODUCTION
• Computational electromagnetics (electromagnetic modelling) is the process of modelling the interaction
of electromagnetic fields with physical objects and the environment. Maxwell's equations are solved to
evaluate the electric and magnetic fields under the given boundary conditions and constitutive
relations.
• Using computationally efficient approximations to Maxwell's equations, it is used to calculate:
  • antenna performance
  • electromagnetic compatibility
  • radar cross section
  • electromagnetic wave propagation when not in free space

INTRODUCTION
• Computational electromagnetics meets the need for electromagnetic simulation using the latest
available technology. Many methods now exist in the field, such as integral-form solvers of Maxwell's
equations like MoM and differential-form solvers like FEM and FDTD.
• To achieve high detail and accuracy, these solvers need a very fine discretization of space and
time.
  • Memory must therefore be used efficiently, exchanging spatial and temporal data quickly
to calculate the field values from Maxwell's equations until the end of the simulated time.

INTRODUCTION
• FDTD, a numerical analysis technique widely
used in computational electromagnetics,
belongs to the general class of grid-based
differential numerical modelling methods. The
time-dependent Maxwell's equations (in partial
differential form) are discretized using
central-difference approximations to the space and
time partial derivatives.
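For reference, the central-difference approximations mentioned above can be written generically as below. This is a sketch only: the symbols F (any field component), x_i, t_n, \Delta x and \Delta t are notation assumed here and do not come from the slides. Both approximations are second-order accurate in the step size.

```latex
\left.\frac{\partial F}{\partial x}\right|_{x_i,\,t_n}
  \approx \frac{F\!\left(x_i + \tfrac{\Delta x}{2},\, t_n\right)
              - F\!\left(x_i - \tfrac{\Delta x}{2},\, t_n\right)}{\Delta x},
\qquad
\left.\frac{\partial F}{\partial t}\right|_{x_i,\,t_n}
  \approx \frac{F\!\left(x_i,\, t_n + \tfrac{\Delta t}{2}\right)
              - F\!\left(x_i,\, t_n - \tfrac{\Delta t}{2}\right)}{\Delta t}
```

The half-step offsets are what lead to E and H being stored on staggered points in space and time.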

FDTD METHOD
• Solves Maxwell's equations in the time domain.
• Each frame (one time iteration of the
code) can be saved to build a movie.
• An electric field changing at a particular point
induces a curling (circulating) magnetic field.
• Likewise, a changing magnetic field induces a
curling electric field.
• This leads to a leapfrog calculation
scheme, as shown in the figure on the
right-hand side.

FDTD METHOD
for t in 0 to NT-1
    // E[i+1] below reaches index N at i = N-1, so E needs N+1 entries
    for i in 1 to N-1
        E[i] = k1 * E[i] + k2 * (H[i] - H[i-1])
    end for
    for i in 1 to N-1
        H[i] += E[i] - E[i+1]
    end for
end for
• A naive 1D FDTD algorithm.
• It updates all N field values at each of the NT
time steps.
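As a runnable counterpart to the pseudocode above, here is a minimal Python sketch of the same two loops. Several things are assumptions made for illustration and are not taken from the thesis: the function name, the array sizes (E with N+1 entries, H with N entries, so that E[i+1] stays in bounds), and the coefficient values used in the demo.

```python
def fdtd_1d_naive(E, H, k1, k2, nt):
    """Advance E and H in place for nt time steps using the slide's loops.

    Assumed layout: len(E) == len(H) + 1. E[0], E[N] and H[0] are never
    written, so they act as fixed boundary values.
    """
    n = len(H)
    for _ in range(nt):
        # Update every interior E value from the neighbouring H values.
        for i in range(1, n):
            E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1])
        # Update H from the E values that were just computed.
        for i in range(1, n):
            H[i] += E[i] - E[i + 1]
    return E, H


if __name__ == "__main__":
    n, nt = 200, 100                 # illustrative problem size
    E = [0.0] * (n + 1)
    H = [0.0] * n
    E[n // 2] = 1.0                  # single-cell initial excitation
    fdtd_1d_naive(E, H, k1=1.0, k2=-0.5, nt=nt)  # illustrative coefficients
    print("largest |E| after", nt, "steps:", max(abs(v) for v in E))
```

Each time step sweeps the whole of E and then the whole of H, so for large N every value is fetched from main memory again on every step.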

INTRODUCTION
• FDTD remains a challenging task for
the computers and devices running it because of
its high demands on computational power
and memory bandwidth.
• Programs cannot take full advantage of
processor improvements that follow
Moore's Law, as processors spend
more than 80% of their time waiting for
data to be received from
main memory.

INTRODUCTION
• Stencil codes such as FDTD kernels consist of
nested loops that force the processor to perform many
memory reads and writes, because
problem sizes are generally too big to fit inside
the largest cache of the processor.
• The defining feature of stencil codes is that
each data element is related to its neighbours.
• In FDTD kernels, this relation holds
between the E-fields and H-fields. As a result of
Maxwell's equations, each space and time
element depends on nearby elements.
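To make the "too big to fit in cache" point concrete, the sketch below estimates the working set of a grid and compares it with a cache capacity. The function names, the 8-byte value size, the 1024 x 1024 and 4096 x 4096 grids with three field arrays, and the 32 KiB and 8 MiB cache sizes are all illustrative assumptions, not figures from the thesis.

```python
def working_set_bytes(cells, field_arrays, bytes_per_value=8):
    """Bytes needed to hold every field value of the grid once."""
    return cells * field_arrays * bytes_per_value


def fits_in_cache(cells, field_arrays, cache_bytes, bytes_per_value=8):
    """True if the whole grid could stay resident in a cache of cache_bytes."""
    return working_set_bytes(cells, field_arrays, bytes_per_value) <= cache_bytes


if __name__ == "__main__":
    l1_bytes = 32 * 1024            # hypothetical L1 data cache size
    l3_bytes = 8 * 1024 * 1024      # hypothetical last-level cache size
    for side in (64, 1024, 4096):   # 2D grids with three field arrays (assumed)
        cells = side * side
        size = working_set_bytes(cells, field_arrays=3)
        print(f"{side} x {side}: {size / 2**20:.2f} MiB,",
              "fits L1:", fits_in_cache(cells, 3, l1_bytes),
              "fits L3:", fits_in_cache(cells, 3, l3_bytes))
```

Once the working set exceeds the largest cache, a sweep over the whole grid per time step evicts data before it can be reused in the next step.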

A data dependency graph showing how elements at different points in space and time depend on
each other's computations, following the FDTD formula.

Values that can be computed within a tile once some values have been loaded initially.

• As programs cannot take full advantage of processor improvements that follow
Moore's Law, one factor that is becoming more and more important is how well an algorithm takes
advantage of the memory hierarchy, that is, its memory performance.
• Memory access speed is very important in modern microprocessors. For this reason, this work
focuses on cache memory hierarchies and makes the most of effective cache replacement methods
to:
  • reduce cache miss rates
  • improve data locality
  • enable fast data access between processor and memory through effective cache usage.
INTRODUCTION
• Cache-aware time-skewing algorithms take advantage of explicitly specified details of the processor they
run on. Because the algorithm stores related data together in the same block,
the processor's memory page size and cache line size must be built into the algorithm.
• This is vital, as the algorithm exploits the processor's cache behaviour: its main objective
is to minimize the movement of memory pages in the processor's cache.
• The objectives focus on loop tiling, time skewing, and reducing CPU stalls through data locality
optimizations. A significant rise in performance is expected as a result of these optimization
steps.
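The combination of loop tiling and time skewing can be sketched for the naive 1D loops from the FDTD METHOD slide as follows. This is a minimal illustration, not the thesis implementation: the function names, the fusion of the E and H loops, the array sizes (E with N+1 entries, H with N) and the tile parameters tile_t and tile_x are assumptions. In a cache-aware version, the tile sizes would be chosen from the cache capacity of the target processor.

```python
def fdtd_1d_naive(E, H, k1, k2, nt):
    """Reference version: full E sweep, then full H sweep, per time step."""
    n = len(H)
    for _ in range(nt):
        for i in range(1, n):
            E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1])
        for i in range(1, n):
            H[i] += E[i] - E[i + 1]


def fdtd_1d_time_skewed(E, H, k1, k2, nt, tile_t, tile_x):
    """Same updates, reordered into tiles of the skewed space s = i + t.

    Fused iteration (t, i) performs E(t, i) and then H(t, i - 1). It depends
    only on (t, i - 1), (t - 1, i) and (t - 1, i + 1), so after skewing by t
    every dependence points to an equal or smaller s, and rectangular tiles
    can be run in increasing order.
    """
    n = len(H)
    for t0 in range(0, nt, tile_t):              # blocks of time steps
        steps = min(tile_t, nt - t0)
        for s0 in range(1, n + steps, tile_x):   # tiles in skewed space
            for shift in range(steps):           # time step inside the block
                lo = max(1, s0 - shift)          # tile slides left as t grows
                hi = min(n, s0 + tile_x - 1 - shift)
                for i in range(lo, hi + 1):
                    if i < n:                    # E[n] is a fixed boundary value
                        E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1])
                    if i > 1:                    # H[0] is never updated
                        H[i - 1] += E[i - 1] - E[i]


if __name__ == "__main__":
    n, nt = 64, 40                               # illustrative sizes
    start = [0.0] * (n + 1)
    start[n // 2] = 1.0
    Ea, Ha = list(start), [0.0] * n
    Eb, Hb = list(start), [0.0] * n
    fdtd_1d_naive(Ea, Ha, 1.0, -0.5, nt)
    fdtd_1d_time_skewed(Eb, Hb, 1.0, -0.5, nt, tile_t=8, tile_x=16)
    print("results match:", Ea == Eb and Ha == Hb)
```

Each tile advances a small block of cells through several time steps before moving on, so values are reused while they are still in cache. Python itself will not show the speed-up; the sketch only demonstrates that the reordering keeps the results identical.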

INTRODUCTION
• FDTD solvers demand expensive hardware with parallelism features to run smoothly and accurately.
  • Our objective was to extend previous research that proposed ideas to address this.
• The main objective of this thesis is to achieve better reliability, cache usage
and execution times for FDTD codes, so that given problems run smoothly and accurately,
while also taking into account the physics and engineering aspects of the problem, which
previous research has lacked.
  • As a contribution, previously known work on code optimizations such as loop blocking, cache-aware
algorithms and time-skewing techniques has been extended and presented in detail, instead
of only including implicit information.

LITERATURE REVIEW
• FDTD method
• References for understanding the problem and implementing the theory in code
• Changes and proposals for new FDTD techniques
• Solving FDTD problems for extreme conditions and specific problems
• Photonics, biomedicine
• Solving Schrödinger equations with a generalized FDTD approach
• Different software implementations, such as V2D

LITERATURE REVIEW
• Memory hierarchy and the "memory wall"
• Important concepts of memory management and optimization, such as:
• Memory hierarchy
• The "memory wall"
• Von Neumann bottleneck
• Roofline model
• Memory mountain
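As a brief illustration of the roofline model listed above: attainable performance is bounded by the smaller of the peak compute rate and the memory bandwidth multiplied by the arithmetic intensity. The sketch below applies this to the naive 1D kernel under rough assumptions that are not from the thesis: 6 floating-point operations and 32 bytes of memory traffic per cell per time step (two 8-byte reads and two 8-byte writes, with no reuse), and a hypothetical machine with 100 GFLOP/s peak and 20 GB/s bandwidth.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline bound: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


if __name__ == "__main__":
    flops_per_cell = 6         # assumed: 4 for the E update, 2 for the H update
    bytes_per_cell = 32        # assumed: E and H each read and written once
    intensity = flops_per_cell / bytes_per_cell
    bound = attainable_gflops(peak_gflops=100.0, bandwidth_gb_s=20.0,
                              flops_per_byte=intensity)
    print(f"intensity = {intensity} flop/byte, bound = {bound} GFLOP/s")
```

Under these assumptions the kernel sits far below the compute roof, which is why reusing data in cache matters more than adding arithmetic throughput.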

LITERATURE REVIEW
• Stencil codes and data dependencies
• Definition and types of stencils
• Expressing a problem as a stencil code
• Methodology for determining data dependencies
• Other terms, such as parallelism and GPU
• Locality optimizations
• Understanding the "principle of locality"
• Important terms related to the locality of codes (machine balance, computer balance, scalable locality)
• Studies of different code optimization algorithms

METHODOLOGY
• Research design
• Code generation and validation
• Dependence and loop iteration analysis
• Finding optimal tiling and skewing
• Methodological assumptions

METHODOLOGY
• Instrumentation
• Hardware
• Software
• Computer benchmark
• Data processing and analysis

DATA PROCESSING AND ANALYSIS
Example

RESULTS AND DISCUSSION
• Generation and validation of codes
  • 1D-FDTD
  • 2D-FDTD

Summarizing, for both 1D FDTD and 2D FDTD:
• Cache profiling
• Execution time
• Data types and programming languages
• Compiler optimizations
• Future work

CONCLUSIONS
• Computational electromagnetics has gained much more importance with the advances in, and demands of,
related technologies such as antenna design, biomedicine and wireless communications.
• A good software implementation is a must for a highly memory- and computation-intensive kernel
such as FDTD.
• This thesis extended previous work from the literature and demonstrated the improvements obtained with
software optimizations such as loop blocking, cache-aware algorithms and time skewing for 1D and 2D
FDTD kernels.

CONCLUSIONS
• The differences between the naive FDTD codes and the codes with the algorithms applied were shown in
the results for the 1D and 2D cases.
• The results indicate that applying time-skewing algorithms, in the way it was
done in this thesis, increases the total number of data references but gives a much better cache hit rate
than the other codes.
• The benefit of time skewing in terms of cache misses is most visible in the 2D code.
• Run-time graphs and improved L1 and L3 cache miss rates were obtained for the 1D and 2D cases and
demonstrated in the results.
• Line-by-line cache misses are explained throughout the thesis.
