SlideShare a Scribd company logo
Parallel Iterative solution of the
Hermite Collocation Equations
on GPUs
Emmanuel N. Mathioudakis
Co-Authors : Elena Papadopoulou – Yiannis Saridakis – Nikolaos Vilanakis
TECHNICAL UNIVERSITY OF CRETE
DEPARTMENT OF SCIENCES
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
73100 CHANIA - CRETE - GREECE
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Talk Overview
 Hermite Collocation for elliptic BVPs &
Derivation of the Collocation linear system
 Development of a parallel algorithm for the
Schur Complement method on Shared
Memory Parallel Architectures
 Parallel implementation on multicore computers
with GPUs
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
( , ) ( , ) , ( , )
( , ) ( , ) , ( , )
u x y f x y x y
u x y g x y x y
 
 
L
BBVP
( , ) ( , ) 0 , ( , )
:
( , ) ( , ) 0 , (
,
, )
y y yx x x
i j i j i j
y y yx x x
i j i j i j
a
i j
u f
u g
     
     
  
  
L
B
Hermite Collocation Method
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Hermite Collocation Method…
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Hermite Collocation Method…
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
with 0
( , ) ( , ) ( , ) , ( , )
( , ) ( , ) , ( , )
u x y u x y f x y x y
u x y g x y x y 


  
 
2
Model
Problem
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Helmholtz
Collocation
Matrix
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Red – Black Collocation Linear system
 The collocation matrix is large, sparse and enjoys no pleasant
properties (e.g. symmetric, definite)
Iterative
+
Parallel
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Iterative Solution
with
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Iterative Solution
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Eigenvalues of Collocation matrix
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Schur Complement Iterative Solution
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Parallel Iterative Solution of Collocation Linear system
on Shared Memory Architectures
 Uniform Load Balancing between core threads
 Minimal Idle Cycles of core threads
 Minimal Communication
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
case of ns = 2p
( )
1
Rx ( )
2
Rx ( )
3
Rx ( )
4
Rx
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
case of ns = 2p
( )
1
B
p
x 
( )
2
B
p
x 
( )
3
B
p
x 
( )
4
B
p
x 
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
case of ns = 2p
V1
( )
1
Rx ( )
2
Rx ( )
3
Rx ( )
4
Rx( )
1
B
p
x 
( )
2
B
p
x 
( )
3
B
p
x 
( )
4
B
p
x 
V2 V3 V4 V5 V6 V7 V8
V1 V2 V3 V4 V5
Mapping into a fixed size Architecture of N Cores
case of k = 2p/N even 2k virtual threads
l=(j-1)k+1,…,jk
l=2p+(j-1)k+1,…,2p+jk
j=1,…,N
( )B
l
x
( )R
l
x
V2 . . .
Pj
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Parallel Schur Complement Iterative Solution
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Parallel BiCGSTAB
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Parallel BiCGSTAB
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
The Dirichlet Helmholtz Problem
2100( 0.1)2with
( , ) 10 ( ) ( ) , ( , ) [0,1] [0,1]
( ) ( ) x
u x y x y x y
x x x e
 
  
  
 
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
HP SL390s
6 core Xeon@2.8GHz
24GB memory
Oracle Linux 6.3 x64
PGI 13.5 Fortran
PCI-e gen2 x16
HP SL390s – Tesla M2070 GPUs
+ 2 x
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Iterations / Error measurements
ns
BiCGSTAB
Iterations
|| b – Axn||2
256 294 6.06e-11
512 589 2.85e-11
1024 1161 1.39e-11
2048 3726 9.59e-12
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Realization on HP SL390s Tesla GPU machine
Time measurements
Conclusions
• A new parallel algorithm implementing the Schur
complement with BiCGSTAB iterative method for
Hermite Collocation equations has been developed.
• The algorithm is realized on Shared Memory multi-core
machines with GPUs .
• A performance acceleration of up to 30% is observed.
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY
Future work
• Design an efficient parallel Schur complement
algorithm of the Hermite Collocation equations
for Multiprocessor /Grid machines with GPUs.
TECHNICAL UNIVERSITY OF CRETE
APPLIED MATHEMATICS AND COMPUTERS LABORATORY

More Related Content

What's hot

Novel reconfigurable hardware architecture for polynomial matrix multiplications
Novel reconfigurable hardware architecture for polynomial matrix multiplicationsNovel reconfigurable hardware architecture for polynomial matrix multiplications
Novel reconfigurable hardware architecture for polynomial matrix multiplications
I3E Technologies
 
A Gomez T Tat Cesga
A Gomez T Tat CesgaA Gomez T Tat Cesga
A Gomez T Tat Cesga
Miguel Morales
 
Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...
Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...
Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...
Tom Hubregtsen
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
David Gleich
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
NECST Lab @ Politecnico di Milano
 
User guided discovery of declarative process models
User guided discovery of declarative process modelsUser guided discovery of declarative process models
User guided discovery of declarative process models
fmaggi
 
FMRIPREP - robust and easy to use fMRI preprocessing pipeline
FMRIPREP - robust and easy to use fMRI preprocessing pipelineFMRIPREP - robust and easy to use fMRI preprocessing pipeline
FMRIPREP - robust and easy to use fMRI preprocessing pipeline
Krzysztof Gorgolewski
 
6. Implementation
6. Implementation6. Implementation
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine Learning
Fabian Pedregosa
 

What's hot (9)

Novel reconfigurable hardware architecture for polynomial matrix multiplications
Novel reconfigurable hardware architecture for polynomial matrix multiplicationsNovel reconfigurable hardware architecture for polynomial matrix multiplications
Novel reconfigurable hardware architecture for polynomial matrix multiplications
 
A Gomez T Tat Cesga
A Gomez T Tat CesgaA Gomez T Tat Cesga
A Gomez T Tat Cesga
 
Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...
Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...
Aritra Sarkar - Search and Optimisation Algorithms for Genomics on Quantum Ac...
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
A Highly Parallel Semi-Dataflow FPGA Architecture for Large-Scale N-Body Simu...
 
User guided discovery of declarative process models
User guided discovery of declarative process modelsUser guided discovery of declarative process models
User guided discovery of declarative process models
 
FMRIPREP - robust and easy to use fMRI preprocessing pipeline
FMRIPREP - robust and easy to use fMRI preprocessing pipelineFMRIPREP - robust and easy to use fMRI preprocessing pipeline
FMRIPREP - robust and easy to use fMRI preprocessing pipeline
 
6. Implementation
6. Implementation6. Implementation
6. Implementation
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine Learning
 

Viewers also liked

Prototyping is an attitude
Prototyping is an attitudePrototyping is an attitude
Prototyping is an attitude
With Company
 
10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience
Yuan Wang
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming Convention
In a Rocket
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media Plan
Post Planner
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting Personal
Kirsty Hulse
 
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
ux singapore
 
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job? Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Stanford GSB Corporate Governance Research Initiative
 

Viewers also liked (7)

Prototyping is an attitude
Prototyping is an attitudePrototyping is an attitude
Prototyping is an attitude
 
10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience10 Insightful Quotes On Designing A Better Customer Experience
10 Insightful Quotes On Designing A Better Customer Experience
 
Learn BEM: CSS Naming Convention
Learn BEM: CSS Naming ConventionLearn BEM: CSS Naming Convention
Learn BEM: CSS Naming Convention
 
How to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media PlanHow to Build a Dynamic Social Media Plan
How to Build a Dynamic Social Media Plan
 
SEO: Getting Personal
SEO: Getting PersonalSEO: Getting Personal
SEO: Getting Personal
 
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika AldabaLightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
Lightning Talk #9: How UX and Data Storytelling Can Shape Policy by Mika Aldaba
 
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job? Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
Succession “Losers”: What Happens to Executives Passed Over for the CEO Job?
 

Similar to Parallel iterative solution of the hermite collocation equations on gpus

Parallelising Dynamic Programming
Parallelising Dynamic ProgrammingParallelising Dynamic Programming
Parallelising Dynamic Programming
Raphael Reitzig
 
Experiences of numerical simulations on a PC cluster
Experiences of numerical simulations on a PC clusterExperiences of numerical simulations on a PC cluster
Experiences of numerical simulations on a PC cluster
Antti Vanne
 
Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18
Aritra Sarkar
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...
NECST Lab @ Politecnico di Milano
 
Evolutionary Algorithms and Self-Organised Systems
Evolutionary Algorithms and Self-Organised SystemsEvolutionary Algorithms and Self-Organised Systems
Evolutionary Algorithms and Self-Organised Systems
Natalio Krasnogor
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
IRJET Journal
 
epscor_talk_2.pptx
epscor_talk_2.pptxepscor_talk_2.pptx
epscor_talk_2.pptx
ShadowCon
 
autoTVM
autoTVMautoTVM
autoTVM
Yi-Wen Hung
 
B Eng Final Year Project Presentation
B Eng Final Year Project PresentationB Eng Final Year Project Presentation
B Eng Final Year Project Presentation
jesujoseph
 
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
Victor Asanza
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
NVIDIA Japan
 
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
AIST
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
Wireilla
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ijfls
 
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction ChallengeESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
Francisco Zamora-Martinez
 
CV_Akram_Malak
CV_Akram_MalakCV_Akram_Malak
CV_Akram_Malak
Akram Malak
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Naoki (Neo) SATO
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with R
Abhirup Mallik
 
Ijetr042170
Ijetr042170Ijetr042170
Weight watcher Bay Area ACM Feb 28, 2022
Weight watcher Bay Area ACM Feb 28, 2022 Weight watcher Bay Area ACM Feb 28, 2022
Weight watcher Bay Area ACM Feb 28, 2022
Charles Martin
 

Similar to Parallel iterative solution of the hermite collocation equations on gpus (20)

Parallelising Dynamic Programming
Parallelising Dynamic ProgrammingParallelising Dynamic Programming
Parallelising Dynamic Programming
 
Experiences of numerical simulations on a PC cluster
Experiences of numerical simulations on a PC clusterExperiences of numerical simulations on a PC cluster
Experiences of numerical simulations on a PC cluster
 
Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18Virus, Vaccines, Genes and Quantum - 2020-06-18
Virus, Vaccines, Genes and Quantum - 2020-06-18
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...
 
Evolutionary Algorithms and Self-Organised Systems
Evolutionary Algorithms and Self-Organised SystemsEvolutionary Algorithms and Self-Organised Systems
Evolutionary Algorithms and Self-Organised Systems
 
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
An Optimized Parallel Algorithm for Longest Common Subsequence Using Openmp –...
 
epscor_talk_2.pptx
epscor_talk_2.pptxepscor_talk_2.pptx
epscor_talk_2.pptx
 
autoTVM
autoTVMautoTVM
autoTVM
 
B Eng Final Year Project Presentation
B Eng Final Year Project PresentationB Eng Final Year Project Presentation
B Eng Final Year Project Presentation
 
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
⭐⭐⭐⭐⭐ Device Free Indoor Localization in the 28 GHz band based on machine lea...
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction ChallengeESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
 
CV_Akram_Malak
CV_Akram_MalakCV_Akram_Malak
CV_Akram_Malak
 
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning ...
 
Parallel Computing with R
Parallel Computing with RParallel Computing with R
Parallel Computing with R
 
Ijetr042170
Ijetr042170Ijetr042170
Ijetr042170
 
Weight watcher Bay Area ACM Feb 28, 2022
Weight watcher Bay Area ACM Feb 28, 2022 Weight watcher Bay Area ACM Feb 28, 2022
Weight watcher Bay Area ACM Feb 28, 2022
 

Recently uploaded

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 

Recently uploaded (20)

Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 

Parallel iterative solution of the hermite collocation equations on gpus

  • 1. Parallel Iterative solution of the Hermite Collocation Equations on GPUs Emmanuel N. Mathioudakis Co-Authors : Elena Papadopoulou – Yiannis Saridakis – Nikolaos Vilanakis TECHNICAL UNIVERSITY OF CRETE DEPARTMENT OF SCIENCES APPLIED MATHEMATICS AND COMPUTERS LABORATORY 73100 CHANIA - CRETE - GREECE
  • 2. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Talk Overview  Hermite Collocation for elliptic BVPs & Derivation of the Collocation linear system  Development of a parallel algorithm for the Schur Complement method on Shared Memory Parallel Architectures  Parallel implementation on multicore computers with GPUs
  • 3. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY ( , ) ( , ) , ( , ) ( , ) ( , ) , ( , ) u x y f x y x y u x y g x y x y     L BBVP ( , ) ( , ) 0 , ( , ) : ( , ) ( , ) 0 , ( , , ) y y yx x x i j i j i j y y yx x x i j i j i j a i j u f u g                   L B Hermite Collocation Method
  • 4. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Hermite Collocation Method…
  • 5. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Hermite Collocation Method…
  • 6. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 7. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 8. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 9. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 10. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 11. with 0 ( , ) ( , ) ( , ) , ( , ) ( , ) ( , ) , ( , ) u x y u x y f x y x y u x y g x y x y         2 Model Problem TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
  • 12. Helmholtz Collocation Matrix TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
  • 13. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Red – Black Collocation Linear system
  • 14.  The collocation matrix is large, sparse and enjoys no pleasant properties (e.g. symmetric, definite) Iterative + Parallel TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
  • 15. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Iterative Solution with
  • 16. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Iterative Solution
  • 17. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Eigenvalues of Collocation matrix
  • 18. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Schur Complement Iterative Solution
  • 19. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel Iterative Solution of Collocation Linear system on Shared Memory Architectures  Uniform Load Balancing between core threads  Minimal Idle Cycles of core threads  Minimal Communication
  • 20. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY case of ns = 2p ( ) 1 Rx ( ) 2 Rx ( ) 3 Rx ( ) 4 Rx
  • 21. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY case of ns = 2p ( ) 1 B p x  ( ) 2 B p x  ( ) 3 B p x  ( ) 4 B p x 
  • 22. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY case of ns = 2p V1 ( ) 1 Rx ( ) 2 Rx ( ) 3 Rx ( ) 4 Rx( ) 1 B p x  ( ) 2 B p x  ( ) 3 B p x  ( ) 4 B p x  V2 V3 V4 V5 V6 V7 V8
  • 23. V1 V2 V3 V4 V5
  • 24. Mapping into a fixed size Architecture of N Cores case of k = 2p/N even 2k virtual threads l=(j-1)k+1,…,jk l=2p+(j-1)k+1,…,2p+jk j=1,…,N ( )B l x ( )R l x V2 . . . Pj
  • 25. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel Schur Complement Iterative Solution
  • 26. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel BiCGSTAB
  • 27. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Parallel BiCGSTAB
  • 28. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY The Dirichlet Helmholtz Problem 2100( 0.1)2with ( , ) 10 ( ) ( ) , ( , ) [0,1] [0,1] ( ) ( ) x u x y x y x y x x x e          
  • 29. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY HP SL390s 6 core Xeon@2.8GHz 24GB memory Oracle Linux 6.3 x64 PGI 13.5 Fortran PCI-e gen2 x16 HP SL390s – Tesla M2070 GPUs + 2 x
  • 30. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Iterations / Error measurements ns BiCGSTAB Iterations || b – Axn||2 256 294 6.06e-11 512 589 2.85e-11 1024 1161 1.39e-11 2048 3726 9.59e-12
  • 31. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 32. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 33. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 34. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 35. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 36. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY Realization on HP SL390s Tesla GPU machine Time measurements
  • 37. Conclusions • A new parallel algorithm implementing the Schur complement with BiCGSTAB iterative method for Hermite Collocation equations has been developed. • The algorithm is realized on Shared Memory multi-core machines with GPUs . • A performance acceleration of up to 30% is observed. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY
  • 38. Future work • Design an efficient parallel Schur complement algorithm of the Hermite Collocation equations for Multiprocessor /Grid machines with GPUs. TECHNICAL UNIVERSITY OF CRETE APPLIED MATHEMATICS AND COMPUTERS LABORATORY