Stencil Computation
Research Project
Jishnu P | Reshmi Mitra
Presentation #1 | Date: 11-Jul-2017
The Agenda
Discuss two or three optimization techniques from "An Auto-Tuning Framework for Parallel Multicore Stencil Computations"
Optimizations
Techniques used in the auto-tuning framework.
Several common optimizations have been implemented in the framework as AST transformations, including:
● Loop Unrolling
● Cache Blocking
● Arithmetic Simplification
Loop Unrolling
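Loop unrolling replicates the loop body so that each iteration computes several output points. This reduces loop overhead and gives the compiler more independent work to schedule. The original slides illustrated this with figures. The example below is a minimal hand-written sketch. It assumes a 1D three-point stencil with hypothetical coefficients alpha and beta and hypothetical function names. In the framework, the unrolled variant is generated automatically as an AST transformation rather than written by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Reference kernel: 3-point stencil, one interior point per iteration.
   out must not alias in (Jacobi-style, out-of-place update). */
void stencil_1d(const double *in, double *out, size_t n,
                double alpha, double beta)
{
    assert(in != out);
    for (size_t i = 1; i + 1 < n; i++)
        out[i] = alpha * in[i] + beta * (in[i - 1] + in[i + 1]);
}

/* Same kernel unrolled by a factor of 4, followed by a remainder loop. */
void stencil_1d_unroll4(const double *in, double *out, size_t n,
                        double alpha, double beta)
{
    assert(in != out);
    size_t i = 1;
    /* Unrolled body: points i..i+3 are all interior while i + 4 < n. */
    for (; i + 4 < n; i += 4) {
        out[i]     = alpha * in[i]     + beta * (in[i - 1] + in[i + 1]);
        out[i + 1] = alpha * in[i + 1] + beta * (in[i]     + in[i + 2]);
        out[i + 2] = alpha * in[i + 2] + beta * (in[i + 1] + in[i + 3]);
        out[i + 3] = alpha * in[i + 3] + beta * (in[i + 2] + in[i + 4]);
    }
    /* Remainder loop: at most 3 leftover interior points. */
    for (; i + 1 < n; i++)
        out[i] = alpha * in[i] + beta * (in[i - 1] + in[i + 1]);
}
```

The two functions compute exactly the same values. The unroll factor of 4 is arbitrary here. Choosing the factor per machine is the kind of parameter an auto-tuner searches over.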
Cache Blocking
To expose temporal locality and increase cache reuse
Cache Blocking
● An important class of algorithmic changes involves blocking data structures so that they fit in cache.
● By organizing memory accesses, one can load the cache with a small subset (a block) of a much larger data set.
● The idea is then to work on this block of data while it is in cache.
● Reusing the data in cache reduces the need to go to main memory (that is, it reduces memory-bandwidth pressure).
An example.
Example contd...
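The example on these slides was shown as figures. Below is a minimal sketch of the same idea, assuming a 2D five-point Jacobi sweep on a row-major grid. The function names and tile sizes are hypothetical. The blocked version performs exactly the same updates as the naive one but visits them one tile at a time, so the rows a tile touches can stay in cache while the tile is computed. For a single 2D sweep, the benefit mainly appears when rows are too long to fit in cache. The effect is larger for 3D grids.

```c
#include <assert.h>
#include <stddef.h>

/* 5-point Jacobi update of interior point (i, j); grid has ny columns. */
static double update(const double *in, size_t i, size_t j, size_t ny)
{
    size_t c = i * ny + j;
    return 0.2 * (in[c] + in[c - ny] + in[c + ny] + in[c - 1] + in[c + 1]);
}

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Unblocked sweep: walks every interior row from left to right. */
void jacobi_naive(const double *in, double *out, size_t nx, size_t ny)
{
    assert(in != out);
    for (size_t i = 1; i + 1 < nx; i++)
        for (size_t j = 1; j + 1 < ny; j++)
            out[i * ny + j] = update(in, i, j, ny);
}

/* Blocked sweep: same updates, visited one bi x bj tile at a time. */
void jacobi_blocked(const double *in, double *out, size_t nx, size_t ny,
                    size_t bi, size_t bj)
{
    assert(in != out && bi > 0 && bj > 0);
    if (nx < 3 || ny < 3)
        return;                          /* no interior points */
    for (size_t ii = 1; ii < nx - 1; ii += bi)
        for (size_t jj = 1; jj < ny - 1; jj += bj) {
            size_t iend = min_size(ii + bi, nx - 1);   /* clip tile at edge */
            size_t jend = min_size(jj + bj, ny - 1);
            for (size_t i = ii; i < iend; i++)
                for (size_t j = jj; j < jend; j++)
                    out[i * ny + j] = update(in, i, j, ny);
        }
}
```

Because both sweeps perform identical arithmetic per point, their outputs match bit for bit. The best tile shape depends on the cache sizes, which is why the framework tunes it rather than fixing it.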
Arithmetic simplification
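This slide had no text. The sketch below shows one common form of arithmetic simplification for stencils, assuming a five-point kernel in which all four neighbours share the same coefficient beta. Factoring beta out of the neighbour sum cuts the multiplies per point from 5 to 2. The kernel shape and names are hypothetical and may not match the framework's exact rewrite rules. The results can differ in the last bits because floating-point multiplication is not exactly distributive. That is one reason compilers typically do not apply this rewrite under strict IEEE semantics (for example, without GCC's -ffast-math).

```c
#include <assert.h>
#include <stddef.h>

/* Before: beta multiplies each neighbour separately (5 mul, 4 add). */
void stencil_plain(const double *in, double *out, size_t nx, size_t ny,
                   double alpha, double beta)
{
    assert(in != out);
    for (size_t i = 1; i + 1 < nx; i++)
        for (size_t j = 1; j + 1 < ny; j++) {
            size_t c = i * ny + j;
            out[c] = alpha * in[c] + beta * in[c - ny] + beta * in[c + ny]
                   + beta * in[c - 1] + beta * in[c + 1];
        }
}

/* After: beta is factored out of the neighbour sum (2 mul, 4 add). */
void stencil_simplified(const double *in, double *out, size_t nx, size_t ny,
                        double alpha, double beta)
{
    assert(in != out);
    for (size_t i = 1; i + 1 < nx; i++)
        for (size_t j = 1; j + 1 < ny; j++) {
            size_t c = i * ny + j;
            out[c] = alpha * in[c]
                   + beta * (in[c - ny] + in[c + ny] + in[c - 1] + in[c + 1]);
        }
}
```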
AST - Abstract Syntax Tree
● Abstract syntax trees are data structures widely used in compilers because they represent the structure of program code.
● An AST is usually the output of a compiler's syntax analysis (parsing) phase.
● It often serves as the intermediate representation of the program across several compiler stages, so it strongly affects the compiler's final output.
AST example
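The AST example on this slide was a figure. The toy sketch below illustrates both the AST idea and the earlier point that optimizations are implemented as AST transformations. It defines an expression AST in C and one bottom-up transformation that folds constant subexpressions and removes multiplications by 1. The node kinds, the constructors (num, var, add, mul) and simplify() are hypothetical and far simpler than the framework's own AST. Folding x * 0 to 0 is deliberately left out because it is not valid when x is NaN or infinite.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node kinds for a tiny expression language. */
typedef enum { NUM, VAR, ADD, MUL } Kind;

typedef struct Node {
    Kind kind;
    double value;              /* used when kind == NUM */
    const char *name;          /* used when kind == VAR */
    struct Node *lhs, *rhs;    /* used when kind == ADD or MUL */
} Node;

static Node *mk(Kind k, double v, const char *name, Node *l, Node *r)
{
    Node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->kind = k; n->value = v; n->name = name; n->lhs = l; n->rhs = r;
    return n;
}

Node *num(double v)        { return mk(NUM, v, NULL, NULL, NULL); }
Node *var(const char *s)   { return mk(VAR, 0.0, s, NULL, NULL); }
Node *add(Node *l, Node *r) { return mk(ADD, 0.0, NULL, l, r); }
Node *mul(Node *l, Node *r) { return mk(MUL, 0.0, NULL, l, r); }

static int is_num(const Node *n, double v) { return n->kind == NUM && n->value == v; }

/* Bottom-up rewrite: fold operations on two constants and drop "* 1".
   Nodes that become unreachable are leaked to keep the sketch short. */
Node *simplify(Node *n)
{
    if (n->kind == NUM || n->kind == VAR)
        return n;
    n->lhs = simplify(n->lhs);
    n->rhs = simplify(n->rhs);
    Node *l = n->lhs, *r = n->rhs;
    if (l->kind == NUM && r->kind == NUM)
        return num(n->kind == ADD ? l->value + r->value : l->value * r->value);
    if (n->kind == MUL) {
        if (is_num(l, 1.0)) return r;
        if (is_num(r, 1.0)) return l;
    }
    return n;
}

/* Count nodes, to show how much the tree shrinks. */
int count_nodes(const Node *n)
{
    if (n->kind == NUM || n->kind == VAR)
        return 1;
    return 1 + count_nodes(n->lhs) + count_nodes(n->rhs);
}
```

For example, the tree for 1 * (x + 2 * 3) has 7 nodes. simplify() rewrites it to x + 6, which has 3.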
These were some of the serial optimizations.
● Although the current optimizations may look identical to existing compiler optimizations, future strategies such as memory-structure transformations will be beyond the scope of compilers, because they are specific to stencil computations.
● Additionally, the framework's transformations produce code that outperforms compiler-only optimized versions. This shows that compiler analyses cannot always prove that these (safe) optimizations are legal.
● Thus, a domain-specific code generator run by the user is free to apply transformations that a compiler cannot prove safe and therefore will not apply.
Parallel Optimization
● The shared-memory parallel code generators leverage the serial code
generation routines to produce the version run by each individual
thread.
● Since the parallelization strategy influences code structure, the AST —
which represents code run on each individual thread — must be
modified to reflect the chosen parallelization strategy.
● The parallel code generators make the necessary modifications to the
AST before passing it to the serial code generator.
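This division of labour can be sketched by hand. A serial kernel updates a range of rows, and a parallel driver splits the domain into slabs and runs the serial kernel once per slab. This illustration rests on assumptions and is not the framework's generated code. The row-slab decomposition, the use of OpenMP, and the function names are choices made here. If the file is compiled without OpenMP support (for example, without GCC's -fopenmp), the pragma is ignored and the slabs run one after another with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Serial kernel: updates interior rows [i0, i1) of a grid with ny columns.
   Requires 1 <= i0 and i1 <= nx - 1. This is the per-thread code. */
static void jacobi_rows(const double *in, double *out, size_t ny,
                        size_t i0, size_t i1)
{
    for (size_t i = i0; i < i1; i++)
        for (size_t j = 1; j + 1 < ny; j++) {
            size_t c = i * ny + j;
            out[c] = 0.2 * (in[c] + in[c - ny] + in[c + ny]
                            + in[c - 1] + in[c + 1]);
        }
}

/* Parallel driver: splits the interior rows into nchunks contiguous,
   non-overlapping slabs. Each slab writes disjoint rows of out, so the
   slabs can run concurrently without races. */
void jacobi_parallel(const double *in, double *out, size_t nx, size_t ny,
                     int nchunks)
{
    assert(in != out && nchunks > 0);
    if (nx < 3 || ny < 3)
        return;                              /* no interior points */
    size_t rows = nx - 2;                    /* interior rows 1 .. nx-2 */
    #pragma omp parallel for schedule(static)
    for (int t = 0; t < nchunks; t++) {
        size_t i0 = 1 + rows * (size_t)t / (size_t)nchunks;
        size_t i1 = 1 + rows * (size_t)(t + 1) / (size_t)nchunks;
        jacobi_rows(in, out, ny, i0, i1);    /* empty if i0 == i1 */
    }
}
```

The serial kernel is reused unchanged. Only the outer decomposition differs, which mirrors the idea of modifying the AST for the chosen parallelization strategy before handing it to the serial code generator.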
Stencil auto-tuning framework flow
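The framework flow was shown as a figure. The sketch below covers only the search step of auto-tuning: run several code variants and keep the fastest. Here the variants are blocked Jacobi sweeps that differ only in tile shape. The exhaustive search, the hypothetical Tile type, the candidate list, and clock()-based timing are assumptions for illustration. The framework's actual search strategy and timing method may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical tile shape: bi rows by bj columns. */
typedef struct { size_t bi, bj; } Tile;

/* Blocked 5-point Jacobi sweep: the code variant being tuned. */
static void sweep_blocked(const double *in, double *out, size_t nx,
                          size_t ny, Tile t)
{
    for (size_t ii = 1; ii + 1 < nx; ii += t.bi)
        for (size_t jj = 1; jj + 1 < ny; jj += t.bj)
            for (size_t i = ii; i < ii + t.bi && i + 1 < nx; i++)
                for (size_t j = jj; j < jj + t.bj && j + 1 < ny; j++) {
                    size_t c = i * ny + j;
                    out[c] = 0.2 * (in[c] + in[c - ny] + in[c + ny]
                                    + in[c - 1] + in[c + 1]);
                }
}

/* Exhaustive search: time `reps` sweeps per candidate tile and return
   the index of the fastest candidate (ties keep the earlier one). */
size_t autotune_tile(const double *in, double *out, size_t nx, size_t ny,
                     const Tile *cand, size_t ncand, int reps)
{
    assert(in != out && ncand > 0 && reps > 0);
    size_t best = 0;
    double best_time = 0.0;
    for (size_t k = 0; k < ncand; k++) {
        assert(cand[k].bi > 0 && cand[k].bj > 0);
        clock_t start = clock();
        for (int r = 0; r < reps; r++)
            sweep_blocked(in, out, nx, ny, cand[k]);
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (k == 0 || elapsed < best_time) {
            best = k;
            best_time = elapsed;
        }
    }
    return best;
}
```

On a real machine the grid should be much larger than the caches, and each timing should be repeated enough to rise well above clock() resolution. Otherwise the choice is dominated by noise.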
References
● http://people.csail.mit.edu/cychan/papers/ipdps10.pdf
● https://en.wikipedia.org/wiki/Abstract_syntax_tree
● https://www.youtube.com/watch?v=SfV8aRX0YY0
● https://software.intel.com/en-us/articles/cache-blocking-techniques
It is good to revisit what we have learned from time to time. It keeps us competitive and prepared to seize opportunities.
Thank you
