IMPLEMENTATION AND OPTIMIZATION OF
FDTD KERNELS BY USING CACHE-AWARE
TIME-SKEWING ALGORITHMS
THESIS PRESENTATION
SERHAN OZBEY
WARSAW UNIVERSITY OF TECHNOLOGY
INSTITUTE OF TELECOMMUNICATIONS
16/03/2017
ABSTRACT
• The main goal of this thesis was to implement and optimize cache-aware time-skewing algorithms for
FDTD kernels in order to reduce cache misses and processor idle time.
• Electromagnetic simulations require large-scale discretization of space and a correspondingly large
number of computations.
• An efficient memory access pattern is therefore important.
• A naive implementation of the FDTD method is a kernel of nested loops that reads from and writes to
memory to calculate the EM fields.
• By exploiting the data dependencies and locality of the FDTD kernel and making better use of the
memory hierarchy, the processor's idle time can be reduced.
• FDTD execution time can be long if the nested loops are not traversed in an order that uses the data
dependencies efficiently.
• This idle time can be reduced by skewing and blocking the time and space domains so that the loop
iterations follow the data dependencies, giving an access pattern that makes better use of the fast
CPU cache memories.
TOPICS
1. INTRODUCTION
2. LITERATURE REVIEW
3. METHODOLOGY
4. RESULTS AND DISCUSSION
5. CONCLUSIONS
INTRODUCTION
• Sustainable and reliable telecommunication networks demand efficient and durable network
components. This requires modelling and producing devices that cope well with the electromagnetic
disturbances that affect the performance of such components.
• Factors such as electromagnetic radiation and scattering should be taken into account through
electromagnetic modelling of devices, which simulates how the devices interact with natural conditions
and with the materials present in the environment.
INTRODUCTION
• Computational electromagnetics (electromagnetic modelling) is the process of modelling the
interaction of electromagnetic fields with physical objects and the environment. It requires solving
Maxwell's equations to obtain the electric and magnetic fields subject to the given boundary conditions
and constitutive relations.
• Using computationally efficient approximations to Maxwell's equations, it is used to calculate:
  • antenna performance
  • electromagnetic compatibility
  • radar cross section
  • electromagnetic wave propagation when not in free space
INTRODUCTION
• Computational electromagnetics has become the standard approach to electromagnetic simulation using
the latest available technology. Many methods now exist in this domain, such as solvers for the integral
form of Maxwell's equations (e.g. MoM) and solvers for the differential form (e.g. FEM and FDTD).
• Achieving high detail and accuracy with these solvers requires a very fine discretization of space
and time.
• Memory must therefore be used efficiently, exchanging spatial and temporal data quickly so that the
field values can be calculated from Maxwell's equations up to the end of the simulated time.
INTRODUCTION
• FDTD, a numerical analysis technique widely used in computational electromagnetics, belongs to the
general class of grid-based differential numerical modelling methods. The time-dependent Maxwell's
equations (in partial differential form) are discretized using central-difference approximations to
the space and time partial derivatives.
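To make the central-difference discretization concrete, the block below writes it out for a one-dimensional, lossless, source-free case. The choice of the field pair (E_x, H_y) propagating along z, the half-step staggering of H in space and time, and the symbols \Delta z, \Delta t, \varepsilon and \mu are assumptions made for illustration; the slides do not state them.

```latex
\begin{align}
% Central difference about z_k (second-order accurate in \Delta z)
\left.\frac{\partial f}{\partial z}\right|_{z_k}
  &\approx \frac{f(z_k + \Delta z/2) - f(z_k - \Delta z/2)}{\Delta z} \\
% 1D curl equations for the assumed (E_x, H_y) pair
\frac{\partial H_y}{\partial t} &= -\frac{1}{\mu}\,\frac{\partial E_x}{\partial z},
\qquad
\frac{\partial E_x}{\partial t} = -\frac{1}{\varepsilon}\,\frac{\partial H_y}{\partial z} \\
% Leapfrog updates after applying the central difference in space and time
H_y^{\,n+1/2}\!\left(k+\tfrac12\right)
  &= H_y^{\,n-1/2}\!\left(k+\tfrac12\right)
   - \frac{\Delta t}{\mu\,\Delta z}\left[E_x^{\,n}(k+1) - E_x^{\,n}(k)\right] \\
E_x^{\,n+1}(k)
  &= E_x^{\,n}(k)
   - \frac{\Delta t}{\varepsilon\,\Delta z}
     \left[H_y^{\,n+1/2}\!\left(k+\tfrac12\right) - H_y^{\,n+1/2}\!\left(k-\tfrac12\right)\right]
\end{align}
```

Each field value depends only on its own previous value and on the two nearest values of the other field, which is the neighbour dependency that tiling and skewing later exploit.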
FDTD METHOD
• Maxwell's equations are solved in the time domain.
• Each frame (one time iteration of the code) is saved to form a movie.
• A changing electric field at a particular point induces a curling (circulating) magnetic field.
• Likewise, a changing magnetic field induces a curling electric field.
• This leads to a leapfrog calculation scheme, as shown in the figure on the right-hand side.
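FDTD METHOD
for t in 0 to NT-1
    // update E from the two neighbouring H values
    for i in 1 to N-1
        E[i] = k1*E[i] + k2*(H[i] - H[i-1])
    end for
    // update H from the freshly updated E values; reads E[N] when i = N-1,
    // so E must hold at least N+1 entries
    for i in 1 to N-1
        H[i] += E[i] - E[i+1]
    end for
end for
• A naïve 1D FDTD algorithm.
• It calculates all N field values at each of the NT time steps.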
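The loop nest above translates almost line for line into runnable code. The sketch below is one possible Python rendering, not the thesis code. It assumes that E holds N+1 samples and H holds N samples so that the read of E[i+1] stays in bounds. The function names, the coefficient values and the Gaussian starting pulse are illustrative choices.

```python
import math


def fdtd_1d_naive(E, H, k1, k2, nt):
    """Advance the 1D fields in place by nt time steps.

    Assumption: len(E) == len(H) + 1, so E[i + 1] is valid for i = N - 1.
    E[0], E[N] and H[0] are never written and act as fixed boundary values.
    """
    n = len(H)
    for _ in range(nt):
        # E sweep: each E[i] uses the two neighbouring H values
        for i in range(1, n):
            E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1])
        # H sweep: each H[i] uses the E values that were just updated
        for i in range(1, n):
            H[i] += E[i] - E[i + 1]
    return E, H


def gaussian_pulse(n, centre, width):
    """Illustrative initial E field with n + 1 samples (not from the slides)."""
    return [math.exp(-((i - centre) / width) ** 2) for i in range(n + 1)]


if __name__ == "__main__":
    N, NT = 200, 100
    E = gaussian_pulse(N, centre=N / 2, width=10.0)
    H = [0.0] * N
    # k2 is taken negative here so that the two updates together behave like a
    # wave equation; the magnitude 0.5 is an arbitrary stable choice.
    fdtd_1d_naive(E, H, k1=1.0, k2=-0.5, nt=NT)
    print("peak |E| after", NT, "steps:", max(abs(v) for v in E))
```

Every time step sweeps both arrays from end to end twice, so once the arrays outgrow the cache nothing loaded in one step is still cached in the next.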
INTRODUCTION
• FDTD remains a challenging task for the computers and devices running it because of its high
demands on computational power and memory bandwidth.
• Programs cannot take full advantage of the processor performance gains that follow Moore's Law,
as processors spend more than 80% of their time waiting for data to process or for data to arrive
from main memory.
INTRODUCTION
• Stencil codes such as FDTD kernels consist of nested loops that force the processor to perform
many memory reads and writes. This is because problem sizes are generally too big to fit inside
the largest cache of the processor.
• The defining feature of stencil codes is that each data element is related to its neighbours.
• In FDTD kernels, this relation holds between the E-fields and H-fields. As a result of Maxwell's
equations, each element in space and time depends on nearby elements.
Figure: A data dependency graph showing how the computations of elements at different points in space
and time depend on one another, as given by the FDTD formula.
Figure: Values that can be computed within a tile once an initial set of values has been loaded.
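The idea in this caption can be checked for the 1D update loops shown earlier in the deck. The sketch below is an illustration only: the function name and the inclusive index range are hypothetical. It assumes a tile in which E[lo..hi] and H[lo..hi] have been loaded, and tracks which indices can still be advanced at each further time step without touching anything outside the tile.

```python
def computable_after(lo, hi, steps):
    """Indices of E and H that can be advanced using only the loaded tile.

    Assumption: E[lo..hi] and H[lo..hi] (inclusive) are available at the
    starting time level; the dependencies follow the 1D loops
        E[i] <- E[i], H[i], H[i-1]
        H[i] <- H[i], E[i], E[i+1]   (using the new E values)
    Returns one (E indices, H indices) pair per time step.
    """
    e_ok = set(range(lo, hi + 1))
    h_ok = set(range(lo, hi + 1))
    history = []
    for _ in range(steps):
        # E[i] can be updated only if both H neighbours are still valid
        e_ok = {i for i in e_ok if i in h_ok and (i - 1) in h_ok}
        # H[i] can be updated only if both new E neighbours are valid
        h_ok = {i for i in h_ok if i in e_ok and (i + 1) in e_ok}
        history.append((sorted(e_ok), sorted(h_ok)))
    return history


if __name__ == "__main__":
    for step, (e_idx, h_idx) in enumerate(computable_after(0, 9, 3), start=1):
        print("step", step, "E:", e_idx, "H:", h_idx)
```

The valid region shrinks by one cell on each side per time step, giving the trapezoid shape that the tiles in time-skewing schemes are built from.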
• As programs cannot take full advantage of the processor performance gains that follow Moore's Law,
one factor that is becoming more and more important is the algorithm's memory performance, that is,
how well it exploits the memory hierarchy.
• Memory access speed is very important in modern microprocessors. This work therefore focuses on the
cache memory hierarchy, aiming to make the most of effective cache replacement in order to:
  • reduce cache miss rates
  • improve data locality
  • enable fast data access between processor and memory through effective cache usage
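The effect of these goals can be illustrated with a toy cache model. The sketch below is not the profiling method used in the thesis. It assumes a fully associative cache with LRU replacement, and the cache size, line size and array sizes are hypothetical values chosen only to show the contrast between an array that fits in the cache and one that does not.

```python
from collections import OrderedDict


def lru_misses(addresses, n_lines, line_size):
    """Count misses of a fully associative LRU cache (a toy model)."""
    cache = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > n_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def repeated_sweeps(n_elems, n_sweeps):
    """Address trace of n_sweeps sequential passes over n_elems elements."""
    for _ in range(n_sweeps):
        yield from range(n_elems)


if __name__ == "__main__":
    # Hypothetical cache: 16 lines of 8 elements each (128 elements in total)
    fits = lru_misses(repeated_sweeps(128, 10), n_lines=16, line_size=8)
    too_big = lru_misses(repeated_sweeps(1024, 10), n_lines=16, line_size=8)
    print("misses, array fits in cache:   ", fits)
    print("misses, array larger than cache:", too_big)
```

When the array fits, only the first sweep misses. When it does not, every sweep misses on every line, which is the behaviour of a naive time loop over a large grid.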
INTRODUCTION
• Cache-aware time-skewing algorithms take advantage of explicitly specified details of the processor
they run on. Because the algorithm stores related data together in the same block, the processor's
memory page size and cache line size should be built into the algorithm.
• This is vital because the algorithm exploits the processor's cache behaviour: its main objective
is to minimize the movement of memory pages through the processor's cache.
• The objectives focus on loop tiling, time skewing, and reducing CPU stalls through data locality
optimizations. A significant rise in performance is expected as a result of these optimization
steps.
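A minimal sketch of loop tiling combined with time skewing for the 1D kernel is given below. It is an illustration under stated assumptions, not the implementation from the thesis. The function names and the parameters tile (tile width in cells) and tb (time steps per block) are hypothetical, E is assumed to hold one more sample than H, and no cache line or page size is modelled. The skew used here moves each tile one cell to the left per time step, with the H sweep shifted one further cell because H[i] needs the new E[i+1].

```python
import math


def gaussian_fields(n, width=5.0):
    """Illustrative initial state: Gaussian E pulse, zero H (an assumption)."""
    E = [math.exp(-((i - n / 2) / width) ** 2) for i in range(n + 1)]
    H = [0.0] * n
    return E, H


def fdtd_naive(E, H, k1, k2, nt):
    """Reference ordering: full E sweep, then full H sweep, per time step."""
    n = len(H)  # assumption: len(E) == n + 1
    for _ in range(nt):
        for i in range(1, n):
            E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1])
        for i in range(1, n):
            H[i] += E[i] - E[i + 1]


def fdtd_time_skewed(E, H, k1, k2, nt, tile, tb):
    """Same arithmetic as fdtd_naive, reordered into skewed space-time tiles."""
    n = len(H)
    for t0 in range(0, nt, tb):          # blocks of at most tb time steps
        steps = min(tb, nt - t0)
        left, right = 0, tile            # tile bounds at local step s = 0
        while True:
            for s in range(steps):
                # E sweep of local step s: tile shifted left by s cells
                for i in range(max(1, left - s), min(n, right - s)):
                    E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1])
                # H sweep: one cell further left, since H[i] needs new E[i+1]
                for i in range(max(1, left - s - 1), min(n, right - s - 1)):
                    H[i] += E[i] - E[i + 1]
            if right - steps >= n:       # this tile reached the right edge
                break                    # at every local step of the block
            left, right = right, right + tile


if __name__ == "__main__":
    E1, H1 = gaussian_fields(256)
    E2, H2 = gaussian_fields(256)
    fdtd_naive(E1, H1, 1.0, -0.5, nt=40)
    fdtd_time_skewed(E2, H2, 1.0, -0.5, nt=40, tile=32, tb=8)
    print("identical results:", E1 == E2 and H1 == H2)
```

Each tile advances several time steps while its data is still in use, so the arrays are streamed once per time block instead of twice per time step. Python is used here only to check that the reordering preserves the results. Any speed-up would have to be measured in a compiled language, with tile and tb chosen so that a tile's footprint fits the targeted cache level.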
INTRODUCTION
• FDTD solvers demand expensive hardware with parallelism features to run smoothly and accurately.
• Our objective was to extend previous research that proposed ideas for addressing these demands.
• The main objective of this thesis is to achieve better reliability, cache usage and execution
times for FDTD codes, so that they run the given problems smoothly and accurately, while also taking
into account the physics and engineering aspects of the problem, which previous research has lacked.
• Previously known work on code optimizations such as loop blocking, cache-aware algorithms and
time-skewing techniques has been extended and presented in detail as a contribution, rather than
only implicitly.
LITERATURE REVIEW
• FDTD method
  • References for understanding the problem and implementing the theory in code
  • Changes and proposals for new FDTD techniques
  • Solving FDTD problems for extreme conditions and specific problems
  • Photonics, biomedicine
  • Solving Schrödinger equations with a generalized FDTD approach
  • Different software implementations, such as V2D
LITERATURE REVIEW
• Memory hierarchy and the "memory wall"
  • Important concepts of memory management and optimization, such as:
    • Memory hierarchy
    • The "memory wall"
    • Von Neumann bottleneck
    • Roofline model
    • Memory mountain
LITERATURE REVIEW
• Stencil codes and data dependencies
  • Definition and types of stencils
  • Approximating a problem as a stencil code
  • Methodology for determining data dependencies
  • Other terms such as parallelism and GPU
• Locality optimizations
  • Understanding the "principle of locality"
  • Important terms related to the locality features of codes (machine balance, computer balance, scalable locality)
  • Studies of different code optimization algorithms
METHODOLOGY
• Research design
• Code generation and validation
• Dependence and loop iteration analysis
• Finding optimal tiling and skewing
• Methodological assumptions
METHODOLOGY
• Instrumentation
• Hardware
• Software
• Computer benchmark
• Data processing and analysis
DATA PROCESSING AND ANALYSIS
[Figure slides: two examples from the methodology]
RESULTS AND DISCUSSIONS
• Generation and validation of codes
• 1D-FDTD
[Figure slides: 1D-FDTD results]
• 2D-FDTD
[Figure slides: 2D-FDTD results]
RESULTS AND DISCUSSIONS
• Outputs and Discussion
Summarizing, for both 1D FDTD and 2D FDTD:
• Cache profiling
• Execution time
• Data types and programming languages
• Compiler optimizations
• Future work
CONCLUSIONS
• Computational electromagnetics has gained much more importance with the improvements in, and
demands of, related technologies such as antenna design, biomedicine and wireless communications.
• A good software implementation is a must for a highly memory- and computation-intensive code kernel
such as FDTD.
• In this thesis, previous work from the literature was extended, and the improvements achieved with
software optimizations such as loop blocking, cache-aware algorithms and time-skewing were
demonstrated for 1D and 2D FDTD kernels.
CONCLUSIONS
• The differences between the naive FDTD codes and the codes with the applied algorithms were shown
in the results for the 1D and 2D cases.
• The results indicate that applying time-skewing algorithms in the way done in this thesis increases
the total number of data references but gives a much better cache hit rate than the other codes.
• The benefit of time-skewing is clearly visible in the 2D code in terms of cache misses.
• Run-time graphs and improved L1 and L3 cache miss rates for the 1D and 2D cases have been obtained
and demonstrated in the results.
• Line-by-line cache misses are explained throughout the thesis.
Benefits Of Flutter Compared To Other Frameworks
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

Implementation and Optimization of FDTD Kernels by Using Cache-Aware Time-Skewing Algorithms

  • 1. IMPLEMENTATION AND OPTIMIZATION OF FDTD KERNELS BY USING CACHE-AWARE TIME-SKEWING ALGORITHMS. THESIS PRESENTATION. SERHAN OZBEY, WARSAW UNIVERSITY OF TECHNOLOGY, INSTITUTE OF TELECOMMUNICATIONS, 16/03/2017
  • 2. ABSTRACT
      - The main goal of this thesis was to implement and optimize cache-aware time-skewing algorithms for FDTD kernels in order to reduce cache misses and processor idle time.
      - Electromagnetic simulations require a large-scale discretization of space and a correspondingly large number of computations.
      - An efficient memory access pattern is therefore essential.
      - A naive implementation of the FDTD method is a kernel of nested loops that reads and writes memory to compute the EM fields.
      - Processor idle time can be reduced by exploiting the data dependencies and locality of the FDTD kernel and by making better use of the memory hierarchy.
      - FDTD execution time can be long if the nested loops do not traverse the iteration space in an order that follows the data dependencies.
      - Idle time can be reduced by skewing and blocking the time and space domains so that loop iterations follow the data dependencies, giving an access pattern that makes better use of the fast CPU caches.
  • 3. TOPICS
      1. INTRODUCTION
      2. LITERATURE REVIEW
      3. METHODOLOGY
      4. RESULTS AND DISCUSSION
      5. CONCLUSIONS
  • 4. INTRODUCTION
      - Sustainable and reliable telecommunication networks demand efficient and durable network components. This is achieved by modelling and producing devices that cope well with the electromagnetic disturbances that affect their performance.
      - Factors such as electromagnetic radiation and scattering should be taken into account through electromagnetic modelling, which simulates how devices interact with environmental conditions and with the materials present in the environment.
  • 5. INTRODUCTION
      - Computational electromagnetics (electromagnetic modelling) is the process of modelling the interaction of electromagnetic fields with physical objects and the environment. Maxwell's equations are solved to evaluate the electric and magnetic fields subject to the given boundary conditions and constitutive relations.
      - Using computationally efficient approximations to Maxwell's equations, it is used to calculate:
          - antenna performance
          - electromagnetic compatibility
          - radar cross section
          - electromagnetic wave propagation when not in free space.
  • 6. INTRODUCTION
      - Computational electromagnetics is the established approach to electromagnetic simulation with the latest available technology. Many methods exist, such as integral-form Maxwell's equation solvers like MoM and differential-form Maxwell's equation solvers like FEM and FDTD.
      - High detail and accuracy in these solvers require a fine discretization of space and time, and hence a very large number of grid elements.
      - Memory must therefore be used efficiently: spatial and temporal data have to be exchanged quickly to compute the field values from Maxwell's equations up to the final time step.
  • 7. INTRODUCTION
      - FDTD, a numerical analysis technique widely used in computational electromagnetics, belongs to the general class of grid-based differential numerical modelling methods. The time-dependent Maxwell's equations (in partial differential form) are discretized using central-difference approximations to the space and time partial derivatives.
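As an illustration of the central-difference idea mentioned on slide 7, the generic textbook form is shown here. It is not taken from the slides, and the symbols $f$, $x_0$, $t_0$, $\Delta x$ and $\Delta t$ are chosen only for this example:

```latex
\left.\frac{\partial f}{\partial x}\right|_{x_0}
  \approx \frac{f(x_0 + \Delta x/2) - f(x_0 - \Delta x/2)}{\Delta x},
\qquad
\left.\frac{\partial f}{\partial t}\right|_{t_0}
  \approx \frac{f(t_0 + \Delta t/2) - f(t_0 - \Delta t/2)}{\Delta t}
```

Both approximations are second-order accurate in the step size. Because each derivative is evaluated halfway between two samples, the electric and magnetic fields are naturally stored half a step apart.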
  • 8. FDTD METHOD
      - Maxwell's equations are solved in the time domain.
      - Each frame (one time iteration of the code) can be saved, so that the sequence of frames forms a movie.
      - An electric field changing at a particular point induces a curling (circulating) magnetic field.
      - Likewise, a changing magnetic field induces a curling electric field.
      - This leads to a leapfrog calculation scheme, as shown in the figure on the right-hand side.
  • 9. FDTD METHOD
        for t in 0 to NT-1
            for i in 1 to N-1
                E[i] = k1*E[i] + k2 * ( H[i] - H[i-1] )
            end for
            for i in 1 to N-1
                H[i] += E[i] - E[i+1]
            end for
        end for
      - A naive 1D FDTD algorithm.
      - It computes all N field values in each of the NT time steps.
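The pseudocode on slide 9 can be transcribed into C roughly as follows. This is a minimal sketch under assumptions that are not in the slides: the function name `fdtd1d_naive` is hypothetical, the fields are single-precision, and each array holds N + 1 elements so that `E[N]` and `H[0]` are only read and act as fixed boundary values.

```c
#include <assert.h>
#include <stddef.h>

/* Naive 1D FDTD sweep, transcribed from the slide pseudocode.
 * Assumption (not in the slides): E and H each hold n + 1 floats,
 * so E[n] and H[0] are never written and act as fixed boundaries. */
void fdtd1d_naive(float *E, float *H, size_t n, size_t nt, float k1, float k2)
{
    assert(E != NULL && H != NULL && n >= 2);
    for (size_t t = 0; t < nt; ++t) {
        /* Update every interior E value from the previous H values. */
        for (size_t i = 1; i < n; ++i)
            E[i] = k1 * E[i] + k2 * (H[i] - H[i - 1]);
        /* Update every interior H value from the freshly updated E values. */
        for (size_t i = 1; i < n; ++i)
            H[i] += E[i] - E[i + 1];
    }
}
```

Each time step sweeps both arrays from end to end, so once the arrays no longer fit in cache every step reloads them from main memory.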
  • 10. INTRODUCTION
      - FDTD remains a challenging task for the computers and devices running it because of its high demands on computational power and memory bandwidth.
      - Programs cannot fully benefit from the processor performance gains that follow Moore's Law, as processors spend more than 80% of their time waiting for data to process or to be received from main memory.
  • 11. INTRODUCTION
      - Stencil codes such as FDTD kernels consist of nested loops that force the processor to perform many memory reads and writes, because problem sizes are generally too large to fit in even the largest cache of the processor.
      - A characteristic feature of stencil codes is that each data element is related to its neighbours.
      - In FDTD kernels this relation holds between the E-fields and H-fields: as a result of Maxwell's equations, elements in space and time depend on nearby elements.
  • 12. A data dependency graph showing how elements at different points in space and time depend on each other's computations, as given by the FDTD formula.
  • 13. Values that can be computed within a tile once some values have been loaded initially.
  • 14. INTRODUCTION
      - As programs cannot fully benefit from the processor performance gains that follow Moore's Law, one factor that is becoming more and more important is how well an algorithm takes advantage of the memory hierarchy, i.e. its memory performance.
      - Memory access speed is very important in modern microprocessors. For this reason the work focuses on cache memory hierarchies, making the most of effective cache replacement methods in order to:
          - reduce cache miss rates
          - improve data locality
          - enable fast data access between processor and memory through effective cache usage.
  • 15. INTRODUCTION
      - Cache-aware time-skewing algorithms take advantage of explicitly specified details of the processor they run on. Because the algorithm stores data together in the same block, the processor's memory page size and cache line size have to be built into the algorithm.
      - This is vital because the algorithm exploits the processor's cache behaviour: its main objective is to minimize the movement of memory pages in the processor's cache.
      - The objectives focus on loop tiling, time skewing, and reducing CPU stalls through data locality optimizations. A significant performance gain is expected from these optimization steps.
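One possible way to combine loop tiling and time skewing for the 1D kernel of slide 9 is sketched below in C. It is an illustration under stated assumptions, not the implementation from the thesis: the function names and the block sizes `bt` (time steps per tile) and `bx` (skewed cells per tile) are hypothetical, and the arrays are assumed to hold N + 1 floats as fixed-boundary storage. The E update at index j is fused with the H update at index j-1 into one "cell", and the space index is shifted by one position per time step so that every tile only needs values produced by itself or by tiles that ran earlier.

```c
#include <assert.h>
#include <stddef.h>

/* One fused cell j of a time step: update E[j], then H[j-1].
 * H[j-1] needs the new E[j-1] and E[j], so it can directly follow E[j]. */
static void fdtd1d_cell(float *E, float *H, long j, long n, float k1, float k2)
{
    if (j >= 1 && j <= n - 1)
        E[j] = k1 * E[j] + k2 * (H[j] - H[j - 1]);
    if (j - 1 >= 1 && j - 1 <= n - 1)
        H[j - 1] += E[j - 1] - E[j];
}

/* Time-skewed, tiled variant of the naive kernel.
 * bt and bx are hypothetical tuning parameters; a cache-aware version
 * would derive them from the cache size, which is not shown here.
 * E and H are assumed to hold n + 1 floats each. */
void fdtd1d_time_skewed(float *E, float *H, long n, long nt,
                        float k1, float k2, long bt, long bx)
{
    assert(E != NULL && H != NULL && n >= 2 && nt >= 0 && bt >= 1 && bx >= 1);
    for (long tt = 0; tt < nt; tt += bt) {              /* blocks of time steps */
        long t_end = (tt + bt < nt) ? tt + bt : nt;
        /* Skewed coordinate s = j + (t - tt), for cells j = 1..n. */
        long s_max = n + (t_end - 1 - tt);
        for (long ss = 1; ss <= s_max; ss += bx) {     /* blocks in skewed space */
            for (long t = tt; t < t_end; ++t) {
                for (long s = ss; s < ss + bx; ++s) {
                    long j = s - (t - tt);             /* undo the skew */
                    fdtd1d_cell(E, H, j, n, k1, k2);   /* no-op outside 1..n */
                }
            }
        }
    }
}
```

With `bt = 1` the loop nest degenerates to the fused naive sweep. Larger `bt` values let one tile advance several time steps while its working set of roughly `bx + bt` elements per array stays in cache.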
  • 16. INTRODUCTION
      - FDTD solvers demand expensive hardware with parallelism features to run smoothly and accurately.
      - Our objective was to extend previous research that proposed alternatives to such solutions.
      - The main objective of this thesis is to achieve better reliability, cache usage and execution times for FDTD codes, so that given problems run smoothly and accurately, while also taking into account the physics and engineering aspects of the problem, which previous research has lacked.
      - Previously known code optimizations such as loop blocking, cache-aware algorithms and time-skewing techniques are extended and presented in detail as a contribution, rather than only implicitly.
  • 17. LITERATURE REVIEW
      - FDTD method
      - References for understanding the problem and translating the theory into code
      - Changes and proposals for new FDTD techniques
      - Solving FDTD problems for extreme conditions and specific problems
      - Photonics, biomedicine
      - Solving Schrödinger equations with a generalized FDTD approach
      - Different software implementations, such as V2D
  • 18. LITERATURE REVIEW
      - Memory hierarchy and the "memory wall"
      - Important concepts of memory management and optimization, such as:
          - Memory hierarchy
          - The "memory wall" term
          - Von Neumann bottleneck
          - Roofline model
          - Memory mountain
  • 19. LITERATURE REVIEW
      - Stencil codes and data dependencies
      - Definition and types of stencils
      - Approximating a problem as a stencil code
      - Methodology for determining data dependencies
      - Other terms such as parallelism and GPU
      - Locality optimizations
      - Understanding the "principle of locality"
      - Important terms related to the locality features of codes (machine balance, computer balance, scalable locality)
      - Studies of different code optimization algorithms
  • 20. METHODOLOGY
      - Research design
      - Code generation and validation
      - Dependence and loop iteration analysis
      - Finding optimal tiling and skewing
      - Methodological assumptions
  • 21. METHODOLOGY
      - Instrumentation
      - Hardware
      - Software
      - Computer benchmark
      - Data processing and analysis
  • 22. DATA PROCESSING AND ANALYSIS: Example
  • 25. RESULTS AND DISCUSSIONS
      - Generation and validation of codes
      - 1D-FDTD
  • 32. Outputs and Discussion
  • 33. RESULTS AND DISCUSSIONS
      Summarizing, for both 1D FDTD and 2D FDTD:
      - Cache profiling
      - Execution time
      - Data types and programming languages
      - Compiler optimizations
      - Future works
  • 34. CONCLUSIONS
      - Computational electromagnetics has gained importance with the progress and demands of related technologies such as antenna design, biomedicine and wireless communications.
      - A good software implementation is a must for a highly memory- and compute-intensive code kernel such as FDTD.
      - In this thesis, previous work from the literature was extended, and the improvements obtained with software optimizations such as loop blocking, cache-aware algorithms and time skewing were demonstrated for 1D and 2D FDTD kernels.
  • 35. CONCLUSIONS
      - The differences between the naive FDTD codes and the codes with the algorithms applied were shown in the results for the 1D and 2D cases.
      - The results indicate that applying time-skewing algorithms in the way done in this thesis increases the total number of data references but gives a much better cache hit rate than the other codes.
      - The benefit of time skewing is clearly visible in the 2D code in terms of cache misses.
      - Run-time graphs and improved L1 and L3 cache miss rates were obtained and demonstrated for the 1D and 2D cases.
      - Line-by-line cache misses are explained throughout the thesis.

Editor's Notes

  1. Hello, dear Professors and valued members of our Institute of Telecommunications. I am Serhan Ozbey, a graduate of Electrical & Electronics Engineering from Yasar University in Turkey. Today I will be presenting my Master's thesis in partial fulfilment of the requirements for the degree of Master of Science in Telecommunications.
  2. FDTD meaning. Large-scale discretization of space and computations: with the FDTD technique, we handle a time-domain problem by gridding both space and time. For each time step, we perform calculations for every field grid point. Loop optimizations: the process of increasing execution speed and reducing the overheads associated with loops. Most of the execution time of a scientific program is spent in loops. Cache-aware algorithm: Time skewing:
  3. In my thesis I structured the topics in this way, and I will follow the same structure in today's presentation to clarify this complex problem. 1) A brief introduction summarizing the problem, its diagnosis and the proposed solutions, together with background information on the terms used frequently throughout the thesis. 2) A review of the previous literature that was used to gain a deeper understanding of the problem and to continue the thesis with the proposed techniques. 3) Methodology, summarizing the steps taken to obtain the results. 4) Results and discussion, presenting the results obtained by following the proposed methodology, with a discussion of possible future work in light of the previous literature. 5) Conclusion.
  4. Electromagnetic interference, also called radio-frequency interference (RFI) when in the radio frequency spectrum, is a disturbance generated by an external source that affects an electrical circuit by electromagnetic induction, electrostatic coupling, or conduction.
  5. Gauss's law - The electric flux leaving a volume is proportional to the charge inside. Gauss's law for magnetism - There are no magnetic monopoles; the total magnetic flux through a closed surface is zero. Maxwell–Faraday equation (Faraday's law of induction) - The voltage induced in a closed circuit is proportional to the rate of change of the magnetic flux it encloses. Ampère's law (with Maxwell's extension) - The magnetic field integrated around a closed loop is proportional to the electric current plus displacement current (rate of change of electric field) it encloses.
  6. What are the options? What can be better? MoM, FEM, FDTD. Although in principle these technologies could be used to solve the same problems, there are often good practical reasons why one particular simulator is better suited to a particular problem type.
  7. Finite-difference approximations are an effective way to deal with the complex geometries of real-life problems by solving Maxwell's equations in the time domain. Time-domain modelling is well suited to observing the transient phenomena of such problems. A basic example is the detection of a moving plane by radar, which emits electromagnetic radiation for detection, as in the figure. The method can be thought of as making a movie of the electric and magnetic fields flowing through a medium or device, since it is a time-domain method: each iteration of Maxwell's equations for the fields is one frame of the movie. From the electric and magnetic fields, together with the coefficients of the medium, device or environment under study, many measurable quantities can be calculated.
  8. Yee's grid: Yee's grid was chosen because of its structure, in which different field components are stored at different grid locations, so that no field values intersect. The grid is built by dividing space into discrete cells; since each cell would still contain infinite information, the information is stored at a single point in each cell. Modes of FDTD. Time-dependent curl equations: used because of Maxwell's differential equations.
  9. A basic kernel of an FDTD algorithm. It is a heavily simplified version: in these first basic implementations we do not include parameters such as grid widths and heights or wavelengths. We excite the field with a basic pulse, such as a Gaussian, to see the response of the field. The realization starts with pulse propagation in free space.
  10. What are the challenges of electromagnetic modelling? Moore's Law: the transistor count of integrated circuits doubles approximately every two years. On most modern microprocessors, the majority of transistors are contained in caches. This comes with improved power efficiency, higher core counts, and bigger last-level caches.
  11. Memory hierarchy: many studies have shown that solutions can be found by optimizing the memory accesses of programs to make the best of the memory architecture of the system they run on. The structure of FDTD allows the naive code to be replaced with an optimized one that leverages spatial and temporal locality. This thesis focuses on the time-skewing principle and evaluates it through experiments.
  12. This graph is really important in our case: it shows that we only need to reach the values of the last time step, while the other time steps are computed along the way. These intermediate data can therefore be stored temporarily.
  13. Temporal locality: recently referenced data or instructions are likely to be referenced again in the near future. Spatial locality: data or instructions with nearby addresses tend to be referenced together.
  14. Objective. The effects of memory-bound and computationally intensive codes on the memory architecture will also be investigated by running modified benchmark tools and through further theoretical calculations. Validation of the generated FDTD codes that use locality optimization algorithms will also be investigated.
  15. Reducing problems to 1D or 2D; free-space formulation of FDTD. Transverse magnetic (TM) modes: no magnetic field in the direction of propagation. Detection of breast cancer using the 2D FDTD method to identify malignant tumours. An example of changing the FDTD solution theory is [37], where the authors devised a hybrid FEM-FDTD method. Other research implemented non-uniform mesh grids for FDTD to lower the resolution in parts of the problem space that are of less interest. V2D was created to solve specific axisymmetric devices with a unique 2D approach; by simulating both circular TE and TM waveguide modes instead of one, modelling was shown to be faster than with 3D FDTD. Schrödinger equation: a mathematical equation that describes the evolution over time of a physical system in which quantum effects, such as wave–particle duality, are significant; it is a mathematical formulation for studying quantum mechanical systems.
  16. Memory hierarchy: to avoid relying on a single, very expensive memory component, several memory components such as registers, caches, GPU caches and main memory are used. Memory wall: processor speeds improving faster than main memory. Von Neumann bottleneck: the idea that computer system throughput is limited because processors can work faster than data can be transferred, even at the top transfer rates. Roofline model: this graph is a function of machine peak performance, machine peak bandwidth and computational intensity. Plotting it gives a basic idea of the performance to expect for the computational intensity that a program demands.
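The roofline relation described in note 16 can be written as a one-line bound: attainable performance is the smaller of the machine's peak performance and the product of peak bandwidth and computational intensity. The C sketch below uses a hypothetical function name and leaves the units to the caller (for example GFLOP/s, GB/s and FLOP/byte); it illustrates the basic model only and does not reproduce any measurement from the thesis.

```c
#include <assert.h>

/* Attainable performance under the basic roofline model:
 * min(peak compute, peak bandwidth * computational intensity).
 * Units are the caller's choice, e.g. GFLOP/s, GB/s and FLOP/byte. */
double roofline_attainable(double peak_perf, double peak_bandwidth,
                           double intensity)
{
    assert(peak_perf >= 0.0 && peak_bandwidth >= 0.0 && intensity >= 0.0);
    double memory_bound = peak_bandwidth * intensity;
    return memory_bound < peak_perf ? memory_bound : peak_perf;
}
```

For example, with an assumed peak of 100 GFLOP/s and 10 GB/s of bandwidth, a kernel with an intensity of 2 FLOP/byte is limited to 20 GFLOP/s by memory, while one with 20 FLOP/byte reaches the compute peak.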
  17. Research design:
      - Generation of FDTD code from FDTD theory and sources
      - Analysis of FDTD theory and code to obtain data dependencies and the iteration space
      - Construction of data dependency graphs
      - Design of optimal tiling and skewing implementations
      - Generation of optimized source code
      - Comparisons and tests with the output codes
      Methodological assumptions:
      - Free space (vacuum), which means that an external array to hold field coefficients was omitted
      - Simulated materials are assumed to be non-magnetic
      - Boundary conditions were not set
      - Normalization of Gaussian units
      - Impulse-response hard source as the excitation
      - The grid resolution factor was neglected in this thesis
      - Courant's stability condition is defined as
      - Calculation of cache rates and cache associativity factors
      - The slight offset in time and distance observed between the Ex and Hy sources is normal
  18. Why was it chosen? Hardware: Dell XPS. I selected it because I own it and believe that it is a below-average PC that should be able to run these experiments at acceptable rates. Component details are listed in the thesis. Software: Ubuntu Linux, Compiler Explorer, valgrind. Computer benchmark: several reliable computer memory benchmarks were run throughout the thesis to determine the performance of the memory hierarchy of the computer used. Data processing and analysis: the most important metrics of this thesis were the memory events occurring in the processor and the execution time, as the two are directly related. After generation of the optimized codes, the data was analysed in the following order:
  19. We store elements as float. One float is 4 bytes (single precision). Since one cache line is 64 bytes, a single cache line transfer actually brings in 16 float elements. To take this into account, we increment the loop by 64.
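The arithmetic in note 19 can be checked with a small C sketch. The function names are hypothetical, and the second helper additionally assumes that the array starts on a cache line boundary, which the note does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements that fit in one cache line,
 * e.g. 64-byte line / 4-byte float = 16 elements. */
size_t elems_per_cache_line(size_t line_bytes, size_t elem_bytes)
{
    assert(elem_bytes > 0 && line_bytes % elem_bytes == 0);
    return line_bytes / elem_bytes;
}

/* Number of cache lines spanned by n contiguous elements.
 * Assumption: the array starts on a cache line boundary. */
size_t cache_lines_touched(size_t n, size_t line_bytes, size_t elem_bytes)
{
    size_t per_line = elems_per_cache_line(line_bytes, elem_bytes);
    return (n + per_line - 1) / per_line;   /* round up */
}
```

Under these assumptions, a sequential sweep over 1000 floats touches 63 cache lines, so at most about one access in sixteen can miss because the line has not been loaded yet.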
  20. As in our research design, the codes were written in C and C++11, using code optimizations and the dependency relations of FDTD stencil codes. Then a variety of benchmark tools, modified to work on our specific machine, were used to learn our computer's capabilities. These experiments were run to set metrics and to learn the room for improvement. Next, calculations were made for cache hit/miss scenarios and expectations for execution times were listed. These calculations and expectations were then compared with the actual results from the tools used to test the codes, such as valgrind, cachegrind, perf and our own execution-time test functions. Finally, all findings were compared and discussed against other related work in the field.
  22. Future work: prefetching; parallel processing; including GPU and other hardware specifications in the problem; outer tiling to make the most of the L2 and L3 caches. A simple approach was taken in our experiment by keeping this measure at the L1 cache only, as the valgrind tool does not provide L2 memory access event information.