SlideShare a Scribd company logo
1 of 20
Download to read offline
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
CUDA-based Linear Solvers for Stable
Fluids
G. Amador and A. Gomes
Departamento de Inform´atica
Universidade da Beira Interior
Covilh˜a, Portugal
m1420@ubi.pt, agomes@di.ubi.pt
April, 2010
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
1 Introduction
2 Stable Fluids
The Eulerian approach
Physics Model
3 NVIDIA Compute Unified Device Architecture (CUDA)
Workflow
Iterative solvers
Jacobi
Gauss-Seidel red-black
Conjugate gradient
4 Results
Jacobi performance
Gauss-Seidel performance
Conjugate gradient performance
5 Conclusions
Conclusions
Future Work
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Overview
The study of fluid simulation (e.g., water) is important
for two industries:
(real-time ≥ 30 fps) (off-line ≤ 30 fps)
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Overview
The study of fluid simulation (e.g., water) is important
for two industries:
(real-time ≥ 30 fps) (off-line ≤ 30 fps)
Problems:
How to implement (specifically for 3D stable fluids) the
CUDA-based versions of the Jacobi, Gauss-Seidel,
and conjugate gradient iterative solvers?
What are the real-time performance limitations of
these solvers implementations?
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
The Eulerian approach
The Eulerian approach
Space partitioning:
Variations of velocity and density are observed at the
center of each cell.
Velocities and densities are updated through an im-
plicit method (Stam stable fluids, 1999), i.e., uncondi-
tionally stable for any time step.
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Physics Model
Navier-Stokes equations for incompressible fluids
Mass conservation: −→
u = 0
Velocity evolution:
∂
−→
u
∂t
= −
−→
u ·
−→
u + v 2−→
u +
−→
f
Density evolution:
∂ρ
∂t
= −
−→
u · ρ + k 2
ρ + S
−→
u : velocity field.
v: fluids viscosity.
ρ: density of the field.
k: density diffusion rate.
−→
f : external forces added to the velocity field.
S: external sources added to the density field.
=
∂
∂x
,
∂
∂y
,
∂
∂z
: gradient.
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Physics Model
Navier-Stokes equations implementation
Update velocity:
Add external forces (
−→
f ).
Velocity Diffusion (v 2−→
u ).
Move (−
−→
u .
−→
u e
−→
u = 0).
Update density:
Add external sources (S).
Density advection (−
−→
u . ρ).
Density diffusion (k 2
ρ).
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Physics Model
Navier-Stokes equations implementation
Update velocity:
Add external forces (
−→
f ).
Velocity Diffusion (v 2−→
u ).
Move (−
−→
u .
−→
u e
−→
u = 0).
Update density:
Add external sources (S).
Density advection (−
−→
u . ρ).
Density diffusion (k 2
ρ).
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Physics Model
Diffusion
Exchanges of density
or velocity between
neighbours (2D).
Solve a sparse linear system (Ax = b), using an iter-
ative method (e.g., Jacobi, Gauss-Seidel, conjugate
gradient, etc.).
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Physics Model
Move
Ensure mass conservation and the fluid’s incom-
pressibility.
Hodge decomposition:
Conservative field = our field - gradient
Determine the gradient using diffusion’s iterative
method (e.g., Jacobi, Gauss-Seidel, conjugate gradi-
ent, etc.).
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Workflow
Workflow
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Iterative solvers
Jacobi
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Iterative solvers
Gauss-Seidel red-black
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Iterative solvers
Conjugate gradient
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Jacobi performance
Jacobi performance
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Gauss-Seidel performance
Gauss-Seidel performance
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Conjugate gradient performance
Conjugate gradient performance
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Conclusions
Conclusions
The CUDA-based implementation of the Gauss-
Seidel solver allows more iterations than the CPU-
based implementation, however it converges two
times slower.
The CUDA-based implementations of the Jacobi and
Gauss-Seidel iterative solvers achieved better perfor-
mances (i.e. faster in processing time) than the CPU-
based implementations.
The CUDA-based implementation of the conjugate
gradient, for grid sizes superior to 643, due to global
memory latency, performs worst than the CPU-based
version.
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Future Work
Future Work
Search ways, implementable using CUDA, to reduce
global memory accesses (e.g., data structures, dy-
namic memory, etc.).
Implement the CPU-based multi-core versions of
the solvers and compare their performance with the
CUDA-based versions.
Search new solvers implementable using CUDA, with
better convergence rate than relaxation techniques
(Jacobi and Gauss-Seidel), with no significant extra
computational effort such as the conjugate gradient.
ubi-logo
Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions
Future Work
Questions???

More Related Content

Viewers also liked

Schulung: Einführung in das GPU-Computing mit NVIDIA CUDA
Schulung: Einführung in das GPU-Computing mit NVIDIA CUDASchulung: Einführung in das GPU-Computing mit NVIDIA CUDA
Schulung: Einführung in das GPU-Computing mit NVIDIA CUDAJörn Dinkla
 
GPU, CUDA, OpenCL and OpenACC for Parallel Applications
GPU, CUDA, OpenCL and OpenACC for Parallel ApplicationsGPU, CUDA, OpenCL and OpenACC for Parallel Applications
GPU, CUDA, OpenCL and OpenACC for Parallel ApplicationsMarcos Gonzalez
 
PL/CUDA - GPU Accelerated In-Database Analytics
PL/CUDA - GPU Accelerated In-Database AnalyticsPL/CUDA - GPU Accelerated In-Database Analytics
PL/CUDA - GPU Accelerated In-Database AnalyticsKohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
MUE 2011 Conference Presentation
MUE 2011 Conference PresentationMUE 2011 Conference Presentation
MUE 2011 Conference PresentationGonçalo Amador
 

Viewers also liked (7)

Schulung: Einführung in das GPU-Computing mit NVIDIA CUDA
Schulung: Einführung in das GPU-Computing mit NVIDIA CUDASchulung: Einführung in das GPU-Computing mit NVIDIA CUDA
Schulung: Einführung in das GPU-Computing mit NVIDIA CUDA
 
CUDA-Aware MPI
CUDA-Aware MPICUDA-Aware MPI
CUDA-Aware MPI
 
GPU, CUDA, OpenCL and OpenACC for Parallel Applications
GPU, CUDA, OpenCL and OpenACC for Parallel ApplicationsGPU, CUDA, OpenCL and OpenACC for Parallel Applications
GPU, CUDA, OpenCL and OpenACC for Parallel Applications
 
PL/CUDA - GPU Accelerated In-Database Analytics
PL/CUDA - GPU Accelerated In-Database AnalyticsPL/CUDA - GPU Accelerated In-Database Analytics
PL/CUDA - GPU Accelerated In-Database Analytics
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
MUE 2011 Conference Presentation
MUE 2011 Conference PresentationMUE 2011 Conference Presentation
MUE 2011 Conference Presentation
 
Presentation visapp
Presentation visappPresentation visapp
Presentation visapp
 

Similar to ICISA 2010 Conference Presentation

A location-aware embedding technique for accurate landmark recognition
A location-aware embedding technique for accurate landmark recognitionA location-aware embedding technique for accurate landmark recognition
A location-aware embedding technique for accurate landmark recognitionFederico Magliani
 
CFD Cornell Energy Workshop - M.F. Campuzano Ochoa
CFD Cornell Energy Workshop - M.F. Campuzano OchoaCFD Cornell Energy Workshop - M.F. Campuzano Ochoa
CFD Cornell Energy Workshop - M.F. Campuzano OchoaMario Felipe Campuzano Ochoa
 
State of GeoServer 2.10
State of GeoServer 2.10State of GeoServer 2.10
State of GeoServer 2.10Jody Garnett
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DLLeapMind Inc
 
CUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesCUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesSubhajit Sahu
 
Design Verification Using SystemC
Design Verification Using SystemCDesign Verification Using SystemC
Design Verification Using SystemCDVClub
 
Evolution is Continuous, and so are Big Data and Streaming Pipelines
Evolution is Continuous, and so are Big Data and Streaming PipelinesEvolution is Continuous, and so are Big Data and Streaming Pipelines
Evolution is Continuous, and so are Big Data and Streaming PipelinesDatabricks
 
Accelerate-your-AI-Cloud-infrastructure.pdf
Accelerate-your-AI-Cloud-infrastructure.pdfAccelerate-your-AI-Cloud-infrastructure.pdf
Accelerate-your-AI-Cloud-infrastructure.pdfLiang Yan
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...NVIDIA Taiwan
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Clusterairbots
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsGanesan Narayanasamy
 
Converter Simulation - Beyond the Evaluation Board
Converter Simulation - Beyond the Evaluation BoardConverter Simulation - Beyond the Evaluation Board
Converter Simulation - Beyond the Evaluation BoardAnalog Devices, Inc.
 
2020 03-26 - meet up - zparkio
2020 03-26 - meet up - zparkio2020 03-26 - meet up - zparkio
2020 03-26 - meet up - zparkioLeo Benkel
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorDatabricks
 
CUDA Sessions You Won't Want to Miss at GTC 2019
CUDA Sessions You Won't Want to Miss at GTC 2019CUDA Sessions You Won't Want to Miss at GTC 2019
CUDA Sessions You Won't Want to Miss at GTC 2019NVIDIA
 
the usefulness of ldope tools
the usefulness of ldope toolsthe usefulness of ldope tools
the usefulness of ldope toolsliang0816
 
Some Domestic & Global projects executed by our team.
Some Domestic & Global projects executed by our team.Some Domestic & Global projects executed by our team.
Some Domestic & Global projects executed by our team.Vasant Bhanushali
 
Meeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with ScyllaMeeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with ScyllaScyllaDB
 

Similar to ICISA 2010 Conference Presentation (20)

A location-aware embedding technique for accurate landmark recognition
A location-aware embedding technique for accurate landmark recognitionA location-aware embedding technique for accurate landmark recognition
A location-aware embedding technique for accurate landmark recognition
 
CFD Cornell Energy Workshop - M.F. Campuzano Ochoa
CFD Cornell Energy Workshop - M.F. Campuzano OchoaCFD Cornell Energy Workshop - M.F. Campuzano Ochoa
CFD Cornell Energy Workshop - M.F. Campuzano Ochoa
 
State of GeoServer 2.10
State of GeoServer 2.10State of GeoServer 2.10
State of GeoServer 2.10
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DL
 
Cuda
CudaCuda
Cuda
 
CUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : NotesCUDA by Example : The Final Countdown : Notes
CUDA by Example : The Final Countdown : Notes
 
Design Verification Using SystemC
Design Verification Using SystemCDesign Verification Using SystemC
Design Verification Using SystemC
 
Evolution is Continuous, and so are Big Data and Streaming Pipelines
Evolution is Continuous, and so are Big Data and Streaming PipelinesEvolution is Continuous, and so are Big Data and Streaming Pipelines
Evolution is Continuous, and so are Big Data and Streaming Pipelines
 
Accelerate-your-AI-Cloud-infrastructure.pdf
Accelerate-your-AI-Cloud-infrastructure.pdfAccelerate-your-AI-Cloud-infrastructure.pdf
Accelerate-your-AI-Cloud-infrastructure.pdf
 
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
Recent Progress in SCCS on GPU Simulation of Biomedical and Hydrodynamic Prob...
 
CUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce ClusterCUDA performance study on Hadoop MapReduce Cluster
CUDA performance study on Hadoop MapReduce Cluster
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Application Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systemsApplication Optimisation using OpenPOWER and Power 9 systems
Application Optimisation using OpenPOWER and Power 9 systems
 
Converter Simulation - Beyond the Evaluation Board
Converter Simulation - Beyond the Evaluation BoardConverter Simulation - Beyond the Evaluation Board
Converter Simulation - Beyond the Evaluation Board
 
2020 03-26 - meet up - zparkio
2020 03-26 - meet up - zparkio2020 03-26 - meet up - zparkio
2020 03-26 - meet up - zparkio
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS Accelerator
 
CUDA Sessions You Won't Want to Miss at GTC 2019
CUDA Sessions You Won't Want to Miss at GTC 2019CUDA Sessions You Won't Want to Miss at GTC 2019
CUDA Sessions You Won't Want to Miss at GTC 2019
 
the usefulness of ldope tools
the usefulness of ldope toolsthe usefulness of ldope tools
the usefulness of ldope tools
 
Some Domestic & Global projects executed by our team.
Some Domestic & Global projects executed by our team.Some Domestic & Global projects executed by our team.
Some Domestic & Global projects executed by our team.
 
Meeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with ScyllaMeeting the challenges of OLTP Big Data with Scylla
Meeting the challenges of OLTP Big Data with Scylla
 

Recently uploaded

Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxnada99848
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
software engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptxsoftware engineering Chapter 5 System modeling.pptx
software engineering Chapter 5 System modeling.pptx
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 

ICISA 2010 Conference Presentation

  • 1. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions CUDA-based Linear Solvers for Stable Fluids G. Amador and A. Gomes Departamento de Inform´atica Universidade da Beira Interior Covilh˜a, Portugal m1420@ubi.pt, agomes@di.ubi.pt April, 2010
  • 2. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions 1 Introduction 2 Stable Fluids The Eulerian approach Physics Model 3 NVIDIA Compute Unified Device Architecture (CUDA) Workflow Iterative solvers Jacobi Gauss-Seidel red-black Conjugate gradient 4 Results Jacobi performance Gauss-Seidel performance Conjugate gradient performance 5 Conclusions Conclusions Future Work
  • 3. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Overview The study of fluid simulation (e.g., water) is important for two industries: (real-time ≥ 30 fps) (off-line ≤ 30 fps)
  • 4. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Overview The study of fluid simulation (e.g., water) is important for two industries: (real-time ≥ 30 fps) (off-line ≤ 30 fps) Problems: How to implement (specifically for 3D stable fluids) the CUDA-based versions of the Jacobi, Gauss-Seidel, and conjugate gradient iterative solvers? What are the real-time performance limitations of these solvers implementations?
  • 5. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions The Eulerian approach The Eulerian approach Space partitioning: Variations of velocity and density are observed at the center of each cell. Velocities and densities are updated through an im- plicit method (Stam stable fluids, 1999), i.e., uncondi- tionally stable for any time step.
  • 6. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Physics Model Navier-Stokes equations for incompressible fluids Mass conservation: −→ u = 0 Velocity evolution: ∂ −→ u ∂t = − −→ u · −→ u + v 2−→ u + −→ f Density evolution: ∂ρ ∂t = − −→ u · ρ + k 2 ρ + S −→ u : velocity field. v: fluids viscosity. ρ: density of the field. k: density diffusion rate. −→ f : external forces added to the velocity field. S: external sources added to the density field. = ∂ ∂x , ∂ ∂y , ∂ ∂z : gradient.
  • 7. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Physics Model Navier-Stokes equations implementation Update velocity: Add external forces ( −→ f ). Velocity Diffusion (v 2−→ u ). Move (− −→ u . −→ u e −→ u = 0). Update density: Add external sources (S). Density advection (− −→ u . ρ). Density diffusion (k 2 ρ).
  • 8. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Physics Model Navier-Stokes equations implementation Update velocity: Add external forces ( −→ f ). Velocity Diffusion (v 2−→ u ). Move (− −→ u . −→ u e −→ u = 0). Update density: Add external sources (S). Density advection (− −→ u . ρ). Density diffusion (k 2 ρ).
  • 9. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Physics Model Diffusion Exchanges of density or velocity between neighbours (2D). Solve a sparse linear system (Ax = b), using an iter- ative method (e.g., Jacobi, Gauss-Seidel, conjugate gradient, etc.).
  • 10. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Physics Model Move Ensure mass conservation and the fluid’s incom- pressibility. Hodge decomposition: Conservative field = our field - gradient Determine the gradient using diffusion’s iterative method (e.g., Jacobi, Gauss-Seidel, conjugate gradi- ent, etc.).
  • 11. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Workflow Workflow
  • 12. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Iterative solvers Jacobi
  • 13. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Iterative solvers Gauss-Seidel red-black
  • 14. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Iterative solvers Conjugate gradient
  • 15. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Jacobi performance Jacobi performance
  • 16. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Gauss-Seidel performance Gauss-Seidel performance
  • 17. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Conjugate gradient performance Conjugate gradient performance
  • 18. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Conclusions Conclusions The CUDA-based implementation of the Gauss- Seidel solver allows more iterations than the CPU- based implementation, however it converges two times slower. The CUDA-based implementations of the Jacobi and Gauss-Seidel iterative solvers achieved better perfor- mances (i.e. faster in processing time) than the CPU- based implementations. The CUDA-based implementation of the conjugate gradient, for grid sizes superior to 643, due to global memory latency, performs worst than the CPU-based version.
  • 19. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Future Work Future Work Search ways, implementable using CUDA, to reduce global memory accesses (e.g., data structures, dy- namic memory, etc.). Implement the CPU-based multi-core versions of the solvers and compare their performance with the CUDA-based versions. Search new solvers implementable using CUDA, with better convergence rate than relaxation techniques (Jacobi and Gauss-Seidel), with no significant extra computational effort such as the conjugate gradient.
  • 20. ubi-logo Introduction Stable Fluids NVIDIA Compute Unified Device Architecture (CUDA) Results Conclusions Future Work Questions???