Parallel Computing: Perspectives for more efficient
hydrological modeling

Grigorios Anagnostopoulos

Internal Seminar, 11.10.2011
General Concepts

GPU Programming

CA Parallel implementation

What is parallel computing?

Simultaneous use of multiple computing resources to solve a single
computational problem.
The computing resources can be:
A single computer with multiple processors.
A number of computers connected to a network.
A combination of both.

Benefits of parallel computing:
The computational load is broken into discrete pieces of work
that can be processed simultaneously.
The total simulation time can be much shorter when multiple computing
resources are used.


Parallel Computer Classification
Flynn's taxonomy: a widely used classification.
Classify along two independent dimensions:
Instruction and Data.
Each dimension can have two possible states:
Single or Multiple.

SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
MIMD: Multiple Instruction, Multiple Data
The most common type of parallel computer (most modern
computers fall into this category).
Consists of a collection of fully independent processing units or
cores, each having its own control unit and its own ALU.
Execution can be synchronous or asynchronous, as the processors
can operate at their own pace.

[Figure 2.3: A shared-memory system. Several CPUs are connected to a single memory through an interconnect.]
[Figure 2.4: A distributed-memory system. Several CPU and memory pairs are connected through an interconnect.]

Parallelism: An everyday example

Task parallelism: the ability to execute different tasks within a
problem at the same time.

Data parallelism: the ability to execute parts of the same task on
different data at the same time.

As an analogy, think about a farmer who hires workers to
pick apples from his trees:
Worker = hardware (processing element).
Trees = tasks.
Apples = data.


Sequential approach
The sequential approach would be to have one worker pick all of
the apples from each tree.

Parallelism: More workers

Data parallelism: workers work on the same tree, which allows
each task to be completed quicker.
How many workers should there be per tree?
What if some trees have few apples, while others have many?
Parallelism: More workers

Task parallelism: each worker picks apples from a different tree.
Although each task takes the same time as in the sequential version,
many tasks are accomplished in parallel.
What if there are only a few densely populated trees?

Algorithm Decomposition

Most engineering problems are non-trivial and it is crucial to
have more formal concepts for determining parallelism.
The concept of decomposition:
Task decomposition: dividing the algorithm into individual tasks,
which are functionally independent. Tasks which don't have
dependencies (or whose dependencies are completed) can be
executed at any time to achieve parallelism.
Data decomposition: dividing a data set into discrete chunks that
can be processed in parallel.

Task Decomposition
Task decomposition reduces an algorithm to functionally independent parts.
Tasks may have dependencies on other tasks.
If the input of task B is dependent on the output of task A, then task
B is dependent on task A.
Task dependency graphs are used to describe the relationship
between tasks.
[Figure: task dependency graphs. A -> B: B is dependent on A. A, B -> C: A and B are independent of each other; C is dependent on A and B.]
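
The dependency graph in which A and B are independent and C depends on both can be written down directly in CUDA. The following is only a sketch of one possible approach that is not part of the slides: it uses CUDA streams and events, and the kernels `task_a`, `task_b`, `task_c` and the helper `run_task_graph` are hypothetical stand-ins for real tasks.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Three single-thread kernels standing in for tasks A, B and C (hypothetical).
__global__ void task_a(int *a) { *a = 2; }
__global__ void task_b(int *b) { *b = 3; }
__global__ void task_c(const int *a, const int *b, int *c) { *c = *a + *b; }

// Runs A and B in separate streams (no dependency between them) and
// runs C only after both have finished. Returns C's result, or -1 on error.
int run_task_graph()
{
    int *d = nullptr;
    int h_c = -1;
    if (cudaMalloc(&d, 3 * sizeof(int)) != cudaSuccess) return -1;

    cudaStream_t s1, s2;
    cudaEvent_t b_done;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    cudaEventCreate(&b_done);

    task_a<<<1, 1, 0, s1>>>(d);                // A in stream 1
    task_b<<<1, 1, 0, s2>>>(d + 1);            // B in stream 2, may overlap with A
    cudaEventRecord(b_done, s2);               // marks the completion of B

    cudaStreamWaitEvent(s1, b_done, 0);        // stream 1 waits for B ...
    task_c<<<1, 1, 0, s1>>>(d, d + 1, d + 2);  // ... and C is ordered after A

    cudaError_t err = cudaStreamSynchronize(s1);
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_c, d + 2, sizeof(int), cudaMemcpyDeviceToHost);

    cudaEventDestroy(b_done);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d);
    return err == cudaSuccess ? h_c : -1;
}

int main()
{
    printf("C = %d\n", run_task_graph());
    return 0;
}
```

Within one stream, work is executed in issue order, so the edge from A to C needs no extra code; only the edge from B to C, which crosses streams, needs an event.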




Why GPU Programming?

[Figure 1.1: Enlarging performance gap between GPUs and CPUs (many-core GPU vs. multi-core CPU). Calculation: TFLOPS vs. 100 GFLOPS. Memory bandwidth: ~10x. Courtesy: John Owens.]
A quiet revolution and potential build-up: GPU in every PC, massive volume and potential impact.

Parallel programming is easier than ever because it can be done on
relatively low-end PCs.
Cards such as the Nvidia Tesla C1060, built on the GT200 GPU, contain 240
cores, each of which is highly multithreaded.

GPU vs CPU

GPU: Few instructions but very fast execution. Uses very fast
GDDR3 RAM. Most die area is used for ALUs and the caches are
relatively small.

CPU: Lots of instructions but slower execution. Uses slower DDR2
or DDR3 RAM (but it has direct access to more memory than
GPUs). Most die area is used for memory cache and there are
relatively few transistors for ALUs.


GPU is fast


CUDA: Compute Unified Device Architecture
CUDA Program: Consists of phases that are executed on either
the host (CPU) or a device (GPU).
No data parallelism = the code is executed at the host.
Data parallelism = the code is executed at the device.

Data-parallel portions of an application are expressed as device
kernels which run on the device.

GPU kernels are written using the Single Program Multiple Data
(SPMD) programming model.
SPMD executes multiple instances of the same program
independently, where each program works on a different portion of
the data.

Arrays of Parallel Threads
A CUDA kernel is executed by an array of threads.
All threads run the same code (SPMD).
Each thread has an ID that it uses to compute memory addresses and
make control decisions.
Example for threadID = 0, 1, ..., 7:

…
float x = input[threadID];
float y = func(x);
output[threadID] = y;
…
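
The three-line fragment above can be expanded into a complete program. This is a minimal sketch rather than the author's code: `func` is assumed here to be a simple squaring function, and the names `apply_func` and `run_apply` and the block size of 256 threads are illustrative choices.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-element function standing in for func() on the slide.
__device__ float func(float x) { return x * x; }

// SPMD kernel: every thread runs this same code on a different element.
__global__ void apply_func(const float *input, float *output, int n)
{
    int threadID = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (threadID < n) {              // control decision based on the thread ID
        float x = input[threadID];
        float y = func(x);
        output[threadID] = y;
    }
}

// Host helper: copies data to the device, launches the kernel, copies back.
// Returns 0 on success, 1 if a CUDA call fails.
int run_apply(const float *h_in, float *h_out, int n)
{
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return 1; }
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                          // threads per block
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    apply_func<<<blocks, threads>>>(d_in, d_out, n);

    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    const int n = 8;
    float in[n], out[n];
    for (int i = 0; i < n; ++i) in[i] = (float)i;
    if (run_apply(in, out, n) != 0) { printf("CUDA error\n"); return 1; }
    for (int i = 0; i < n; ++i) printf("%g -> %g\n", in[i], out[i]);
    return 0;
}
```

The bounds check `threadID < n` is needed because the number of launched threads is rounded up to a whole number of blocks.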


CUDA: Compute Unified Device Architecture

A CUDA kernel is executed
by an array of threads.
Each thread has an ID,
which is used to compute
memory addresses and make
control decisions.
CUDA threads are organized
into multiple blocks.
Threads within a block
cooperate via shared
memory, atomic operations
and barrier synchronization.
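
The three cooperation mechanisms named above can be seen together in a block-wise sum. This is a hedged sketch, not taken from the slides: the kernel `block_sum`, the helper `run_block_sum` and the block size of 128 are assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BLOCK_SIZE 128  // threads per block (illustrative choice, power of two)

// Each block sums its slice of the input in shared memory, then one thread
// per block adds the partial sum to the global total with an atomic operation.
__global__ void block_sum(const int *data, int n, int *total)
{
    __shared__ int partial[BLOCK_SIZE];        // visible to this block only

    int tid = threadIdx.x;                     // index inside the block
    int gid = blockIdx.x * blockDim.x + tid;   // index inside the grid
    partial[tid] = (gid < n) ? data[gid] : 0;
    __syncthreads();                           // barrier: all loads are done

    // Tree reduction inside the block.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();                       // barrier after every step
    }

    if (tid == 0) atomicAdd(total, partial[0]);  // combine across blocks
}

// Host helper: writes the sum of h_data[0..n-1] to *h_total.
// Returns false if a CUDA call fails.
bool run_block_sum(const int *h_data, int n, int *h_total)
{
    int *d_data = nullptr, *d_total = nullptr;
    bool ok = cudaMalloc(&d_data, n * sizeof(int)) == cudaSuccess &&
              cudaMalloc(&d_total, sizeof(int)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemset(d_total, 0, sizeof(int));
        int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
        block_sum<<<blocks, BLOCK_SIZE>>>(d_data, n, d_total);
        ok = cudaMemcpy(h_total, d_total, sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);   // cudaFree on a null pointer does nothing
    cudaFree(d_total);
    return ok;
}

int main()
{
    const int n = 1000;
    int data[n], total = 0;
    for (int i = 0; i < n; ++i) data[i] = 1;
    if (!run_block_sum(data, n, &total)) { printf("CUDA error\n"); return 1; }
    printf("sum = %d\n", total);
    return 0;
}
```

Threads of different blocks cannot use `__syncthreads()` to wait for each other, which is why the per-block results are combined with an atomic operation on global memory.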

[Figure 2-1. Grid of Thread Blocks: a grid of 3 x 2 blocks, Block (0, 0) to Block (2, 1), with Block (1, 1) expanded into 4 x 3 threads, Thread (0, 0) to Thread (3, 2).]

CUDA memory types

Global memory: Low bandwidth but large space.
Fastest read/write calls if they are coalesced.

Texture memory: Cache optimized for 2D spatial patterns.

Constant memory: Slow, but with cache (8 KB).

Shared memory: Fast, but it can be used only by the
threads of the same block.

Registers: 32768 32-bit registers per multiprocessor.
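
Several of these memory types can appear in one small kernel. The sketch below is illustrative only: the constant array `c_params` (an assumed scale and offset), the kernel `scale_offset` and the helper `run_scale_offset` are hypothetical names, and texture and shared memory are left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Constant memory: small, read-only for kernels, served by the constant cache.
__constant__ float c_params[2];   // hypothetical {scale, offset}

__global__ void scale_offset(const float *in, float *out, int n)
{
    // Consecutive threads touch consecutive addresses, so the global
    // memory reads and writes can be coalesced.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = in[i];                         // local scalar, kept in a register
        out[i] = c_params[0] * v + c_params[1];  // reads through the constant cache
    }
}

// Host helper: computes out[i] = scale * in[i] + offset on the device.
// Returns false if a CUDA call fails.
bool run_scale_offset(const float *h_in, float *h_out, int n,
                      float scale, float offset)
{
    float h_params[2] = { scale, offset };
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMemcpyToSymbol(c_params, h_params, sizeof(h_params)) == cudaSuccess &&
              cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
        scale_offset<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        ok = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);   // cudaFree on a null pointer does nothing
    cudaFree(d_out);
    return ok;
}

int main()
{
    float in[4] = { 1.0f, 2.0f, 3.0f, 4.0f }, out[4];
    if (!run_scale_offset(in, out, 4, 2.0f, 1.0f)) { printf("CUDA error\n"); return 1; }
    for (int i = 0; i < 4; ++i) printf("%g ", out[i]);
    printf("\n");
    return 0;
}
```

Constant memory is written from the host with `cudaMemcpyToSymbol`; kernels can only read it.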

[Figure 4-2. Hardware Model: a set of SIMT multiprocessors with on-chip shared memory.]

CA Parallel implementation
A parallel version of the Cellular Automata algorithm for variably
saturated flow in soils was developed using the CUDA API.
The infiltration experiment of Vauclin et al. (1979) was chosen as a
benchmark test for the accuracy and the speed of the algorithm.
[Figure: Water Depth (m) versus Distance (m) at t = 2 hrs, 3 hrs, 4 hrs and 8 hrs, together with experimental data.]


Why is parallel code important?

In real-case scenarios, where the 3-D simulation of large areas is
needed, the grid sizes are excessively large.
In natural hazards assessment the simulations should be fast in order
to be useful (the prediction should come before the actual event!).
Fast simulations allow us to calibrate the model parameters more easily
and investigate the physical phenomena more efficiently.

The natural parallelism inherent in the CA concept makes the
parallel implementation of the algorithm easier.


Technical details
Difficulties
The most challenging issue was the irregular geometry of the
domain, which made it more difficult to exploit locality in
the thread computations and to use the shared memory.
The cell values were stored in a 1D array and, for each cell, the
indexes of its neighboring cells were also stored.
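
A storage scheme of this kind might look as follows. The slides do not give the actual layout or the transition rule, so everything here is an assumption: a fixed `MAX_NEIGHBORS` slots per cell, `-1` for an unused slot, and a placeholder rule that simply averages the neighbors.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define MAX_NEIGHBORS 4    // hypothetical upper bound per cell
#define NO_NEIGHBOR  (-1)  // marks an unused slot in the neighbor list

// One thread per cell. A cell reaches its neighbors through the stored
// index list, so the cells can form an arbitrarily shaped domain.
__global__ void ca_step(const float *old_val, float *new_val,
                        const int *neighbors, int n_cells)
{
    int cell = blockIdx.x * blockDim.x + threadIdx.x;
    if (cell >= n_cells) return;

    float sum = 0.0f;
    int count = 0;
    for (int k = 0; k < MAX_NEIGHBORS; ++k) {
        int nb = neighbors[cell * MAX_NEIGHBORS + k];
        if (nb != NO_NEIGHBOR) { sum += old_val[nb]; ++count; }
    }
    // Placeholder transition rule (not the author's): mean of the neighbors.
    new_val[cell] = (count > 0) ? sum / count : old_val[cell];
}

// Host helper: runs one step and copies the result back.
// Returns false if a CUDA call fails.
bool run_ca_step(const float *h_old, float *h_new, const int *h_nb, int n_cells)
{
    float *d_old = nullptr, *d_new = nullptr;
    int *d_nb = nullptr;
    size_t vbytes = n_cells * sizeof(float);
    size_t nbytes = (size_t)n_cells * MAX_NEIGHBORS * sizeof(int);
    bool ok = cudaMalloc(&d_old, vbytes) == cudaSuccess &&
              cudaMalloc(&d_new, vbytes) == cudaSuccess &&
              cudaMalloc(&d_nb, nbytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_old, h_old, vbytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_nb, h_nb, nbytes, cudaMemcpyHostToDevice);
        ca_step<<<(n_cells + 127) / 128, 128>>>(d_old, d_new, d_nb, n_cells);
        ok = cudaMemcpy(h_new, d_new, vbytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_old);   // cudaFree on a null pointer does nothing
    cudaFree(d_new);
    cudaFree(d_nb);
    return ok;
}

int main()
{
    // Three cells in a row: 0 - 1 - 2.
    float old_val[3] = { 1.0f, 2.0f, 5.0f }, new_val[3];
    int nb[3 * MAX_NEIGHBORS] = { 1, -1, -1, -1,
                                  0,  2, -1, -1,
                                  1, -1, -1, -1 };
    if (!run_ca_step(old_val, new_val, nb, 3)) { printf("CUDA error\n"); return 1; }
    printf("%g %g %g\n", new_val[0], new_val[1], new_val[2]);
    return 0;
}
```

Reading neighbors through an index list means that neighboring threads may access scattered addresses, which is one reason locality is harder to exploit than on a regular grid.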

Code structure
Simulation constants are stored in the constant memory.
Soil properties for each soil class are stored in the texture memory.
Atomic operations are used in order to check for convergence at
every iteration.
The shared memory is used to accelerate the atomic operations and
the block’s memory accesses.
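
One way such a convergence check could be organized is sketched below. It is an assumption about the general pattern, not the author's implementation: the tolerance `c_tol`, the kernel `check_convergence` and the helper `is_converged` are hypothetical, and convergence is taken to mean that no cell changed by more than the tolerance.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__constant__ float c_tol;   // simulation constant kept in constant memory

// Sets *not_converged to 1 if any cell changed by more than c_tol.
__global__ void check_convergence(const float *old_val, const float *new_val,
                                  int n, int *not_converged)
{
    __shared__ int block_flag;               // one flag per block
    if (threadIdx.x == 0) block_flag = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && fabsf(new_val[i] - old_val[i]) > c_tol)
        atomicOr(&block_flag, 1);            // atomic on fast shared memory
    __syncthreads();

    // At most one global atomic per block instead of one per thread.
    if (threadIdx.x == 0 && block_flag) atomicOr(not_converged, 1);
}

// Returns 1 if every cell changed by at most tol, 0 if not, -1 on a CUDA error.
int is_converged(const float *h_old, const float *h_new, int n, float tol)
{
    float *d_old = nullptr, *d_new = nullptr;
    int *d_flag = nullptr, h_flag = 0;
    size_t bytes = n * sizeof(float);
    bool ok = cudaMemcpyToSymbol(c_tol, &tol, sizeof(float)) == cudaSuccess &&
              cudaMalloc(&d_old, bytes) == cudaSuccess &&
              cudaMalloc(&d_new, bytes) == cudaSuccess &&
              cudaMalloc(&d_flag, sizeof(int)) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_old, h_old, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_new, h_new, bytes, cudaMemcpyHostToDevice);
        cudaMemset(d_flag, 0, sizeof(int));
        check_convergence<<<(n + 127) / 128, 128>>>(d_old, d_new, n, d_flag);
        ok = cudaMemcpy(&h_flag, d_flag, sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_old);   // cudaFree on a null pointer does nothing
    cudaFree(d_new);
    cudaFree(d_flag);
    if (!ok) return -1;
    return h_flag ? 0 : 1;
}

int main()
{
    float old_val[3] = { 1.0f, 1.0f, 1.0f };
    float new_val[3] = { 1.0f, 1.0f, 1.5f };
    printf("tol 0.1: %d\n", is_converged(old_val, new_val, 3, 0.1f));
    printf("tol 1.0: %d\n", is_converged(old_val, new_val, 3, 1.0f));
    return 0;
}
```

Collecting the result per block in shared memory first keeps the number of atomic operations on global memory small.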

Results of the numerical tests
Nvidia Quadro 2000:
192 CUDA cores.
1 GB of GDDR5 memory.

[Figure, first panel: Speed (cells/sec) versus Number of Cells (1000 to 10000000) for CPU and GPU.]
[Figure, second panel: Speed Up (0 to 90) versus Number of Cells (1000 to 10000000).]

Thanks for your attention!

Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
Krassimira Luka
 
How to Download & Install Module From the Odoo App Store in Odoo 17
How to Download & Install Module From the Odoo App Store in Odoo 17How to Download & Install Module From the Odoo App Store in Odoo 17
How to Download & Install Module From the Odoo App Store in Odoo 17
Celine George
 
MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025
khuleseema60
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
danielkiash986
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
blueshagoo1
 

Recently uploaded (20)

CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
CHUYÊN ĐỀ ÔN TẬP VÀ PHÁT TRIỂN CÂU HỎI TRONG ĐỀ MINH HỌA THI TỐT NGHIỆP THPT ...
 
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
 
How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17
 
How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17How to Manage Reception Report in Odoo 17
How to Manage Reception Report in Odoo 17
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
 
How to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in useHow to Fix [Errno 98] address already in use
How to Fix [Errno 98] address already in use
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
 
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
Elevate Your Nonprofit's Online Presence_ A Guide to Effective SEO Strategies...
 
Bonku-Babus-Friend by Sathyajith Ray (9)
Bonku-Babus-Friend by Sathyajith Ray  (9)Bonku-Babus-Friend by Sathyajith Ray  (9)
Bonku-Babus-Friend by Sathyajith Ray (9)
 
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptxRESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 8 - CẢ NĂM - FRIENDS PLUS - NĂM HỌC 2023-2024 (B...
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 
How to Download & Install Module From the Odoo App Store in Odoo 17
How to Download & Install Module From the Odoo App Store in Odoo 17How to Download & Install Module From the Odoo App Store in Odoo 17
How to Download & Install Module From the Odoo App Store in Odoo 17
 
MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
 
CIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdfCIS 4200-02 Group 1 Final Project Report (1).pdf
CIS 4200-02 Group 1 Final Project Report (1).pdf
 

Parallel Computing: Perspectives for more efficient hydrological modeling

  • 1. Parallel Computing: Perspectives for more efficient hydrological modeling. Grigorios Anagnostopoulos. Internal Seminar, 11.10.2011
  • 2. What is parallel computing?
      Simultaneous use of multiple computing resources to solve a single computational problem.
      The computing resources can be:
        ◦ A single computer with multiple processors.
        ◦ A number of computers connected by a network.
        ◦ A combination of both.
      Benefits of parallel computing:
        ◦ The computational load is broken into discrete pieces of work that can be treated simultaneously.
        ◦ The total simulation time can be much shorter when multiple computing resources are used.
  • 3. Parallel Computer Classification
      Flynn's taxonomy: a widely used classification.
        ◦ Classify along two independent dimensions: Instruction and Data.
        ◦ Each dimension can have two possible states: Single or Multiple.
      The four resulting classes:
        ◦ SISD: Single Instruction, Single Data
        ◦ SIMD: Single Instruction, Multiple Data
        ◦ MISD: Multiple Instruction, Single Data
        ◦ MIMD: Multiple Instruction, Multiple Data
  • 4. MIMD: Multiple Instruction, Multiple Data
        ◦ The most common type of parallel computer (most modern computers fall into this category).
        ◦ Consists of a collection of fully independent processing units or cores, each having its own control unit and its own ALU.
        ◦ Execution can be synchronous or asynchronous, as the processors can operate at their own pace.
      [Figure 2.3: A shared-memory system (CPUs connected to a common memory through an interconnect)]
      [Figure 2.4: A distributed-memory system (each CPU has its own memory; the CPU-memory pairs are connected through an interconnect)]
  • 5. Parallelism: An everyday example
        ◦ Task parallelism: the ability to execute different tasks within a problem at the same time.
        ◦ Data parallelism: the ability to execute parts of the same task on different data at the same time.
      As an analogy, think about a farmer who hires workers to pick apples from his trees:
        ◦ Worker = hardware (processing element).
        ◦ Trees = tasks.
        ◦ Apples = data.
  • 6. Parallelism: Sequential approach. The sequential approach would be to have one worker pick all of the apples from each tree.
  • 7. Parallelism: More workers
        ◦ Data parallelism: the workers work on the same tree, which allows each task to be completed quicker.
        ◦ How many workers should there be per tree?
        ◦ What if some trees have few apples, while others have many?
  • 8. Parallelism: More workers
        ◦ Task parallelism: each worker picks apples from a different tree.
        ◦ Although each task takes the same time as in the sequential version, many tasks are accomplished in parallel.
        ◦ What if there are only a few densely populated trees?
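The load-balance questions raised by the apple-picking analogy can be made concrete with a small calculation. The following Python sketch is illustrative only: it assumes every worker picks one apple per time unit, that trees are handed out round-robin in the task-parallel case, and that all workers share each tree in turn in the data-parallel case. The function names and the apple counts are hypothetical and do not come from the slides.

```python
import math

def task_parallel_time(apples_per_tree, n_workers):
    """Each tree is picked by a single worker; trees are dealt out round-robin."""
    loads = [0] * n_workers
    for i, apples in enumerate(apples_per_tree):
        loads[i % n_workers] += apples
    # The harvest finishes when the most heavily loaded worker finishes.
    return max(loads)

def data_parallel_time(apples_per_tree, n_workers):
    """All workers pick from the same tree, one tree after another."""
    return sum(math.ceil(apples / n_workers) for apples in apples_per_tree)

# Hypothetical orchard: one densely populated tree among sparse ones.
orchard = [10, 200, 15, 5]
print(task_parallel_time(orchard, 4))  # limited by the worker on the dense tree
print(data_parallel_time(orchard, 4))  # load is shared on every tree
```

Under these assumptions the task-parallel split is bounded by the single worker assigned to the dense tree, whereas sharing each tree spreads the work evenly at the cost of coordinating several workers per tree.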
  • 9. Algorithm Decomposition
      Most engineering problems are non-trivial, and it is crucial to have more formal concepts for determining parallelism.
      The concept of decomposition:
        ◦ Task decomposition: dividing the algorithm into individual tasks which are functionally independent. Tasks which don't have dependencies (or whose dependencies are completed) can be executed at any time to achieve parallelism.
        ◦ Data decomposition: dividing a data set into discrete chunks that can be processed in parallel.
      Tasks may have dependencies on other tasks:
        ◦ If the input of task B is dependent on the output of task A, then task B is dependent on task A.
        ◦ Task dependency graphs are used to describe the relationship between tasks.
      [Figure: two task dependency graphs. A → B: B is dependent on A. A, B → C: A and B are independent of each other; C is dependent on A and B.]
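A task dependency graph like the A, B, C example on this slide can be turned into an execution order mechanically. The sketch below is one possible illustration using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later); the helper name `parallel_waves` is hypothetical. It groups tasks into "waves", where every task in a wave has all of its dependencies completed and could therefore run in parallel with the others in that wave.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def parallel_waves(dependencies):
    """Group tasks into waves; tasks in one wave have no pending dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all of these could run concurrently
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock successors
    return waves

# Each key maps a task to the tasks it depends on: C needs both A and B.
graph = {"A": [], "B": [], "C": ["A", "B"]}
print(parallel_waves(graph))  # [['A', 'B'], ['C']]
```

A and B land in the first wave because they are independent of each other; C has to wait for both.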
  • 10. Why GPU Programming?
      A quiet revolution and potential build-up:
        ◦ Calculation: TFLOPS vs. 100 GFLOPS.
        ◦ Memory bandwidth: ~10x.
        ◦ GPU in every PC: massive volume and potential impact.
      [Figure 1.1: Enlarging performance gap between GPUs and CPUs (many-core GPU vs. multi-core CPU). Courtesy: John Owens]
      Parallel programming is easier than ever because it can be done on relatively low-end PCs.
      Cards such as the Nvidia Tesla C1060 and GT200 contain 240 cores, each of which is highly multithreaded.
  • 11. GPU vs CPU
      GPU:
        ◦ Few instructions but very fast execution.
        ◦ Uses very fast GDDR3 RAM.
        ◦ Most die area is used for ALUs and the caches are relatively small.
      CPU:
        ◦ Lots of instructions but slower execution.
        ◦ Uses slower DDR2 or DDR3 RAM (but it has direct access to more memory than GPUs).
        ◦ Most die area is used for memory cache and there are relatively few transistors for ALUs.
  • 12. GPU is fast
  • 13. CUDA: Compute Unified Device Architecture
      CUDA program: consists of phases that are executed on either the host (CPU) or a device (GPU).
        ◦ No data parallelism = the code is executed on the host.
        ◦ Data parallelism = the code is executed on the device.
      Data-parallel portions of an application are expressed as device kernels which run on the device.
      GPU kernels are written using the Single Program Multiple Data (SPMD) programming model. SPMD executes multiple instances of the same program independently, where each program works on a different portion of the data.
      Arrays of parallel threads:
        ◦ A CUDA kernel is executed by an array of threads.
        ◦ All threads run the same code (SPMD).
        ◦ Each thread has an ID that it uses to compute memory addresses and make control decisions.
      threadID: 0 1 2 3 4 5 6 7
        …
        float x = input[threadID];
        float y = func(x);
        output[threadID] = y;
        …
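The SPMD idea on this slide (every thread runs the same code and differs only in its ID) can be mimicked on the CPU. The following Python sketch is a sequential stand-in, not CUDA: the names `kernel` and `launch` are hypothetical, and the loop replaces what a GPU would execute concurrently. The kernel body mirrors the three-line fragment from the slide.

```python
def kernel(thread_id, input_data, output_data, func):
    """Body executed by every thread; only thread_id differs between instances."""
    x = input_data[thread_id]
    y = func(x)
    output_data[thread_id] = y

def launch(n_threads, input_data, func):
    """Sequential stand-in for a kernel launch over n_threads threads."""
    output_data = [None] * n_threads
    for thread_id in range(n_threads):  # on a GPU these instances run concurrently
        kernel(thread_id, input_data, output_data, func)
    return output_data

values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(launch(len(values), values, lambda x: x * x))
```

Because each instance writes only to `output_data[thread_id]`, the instances do not interfere with one another, which is what makes this pattern safe to run in parallel.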
  • 14. CUDA: Compute Unified Device Architecture
        ◦ A CUDA kernel is executed by an array of threads.
        ◦ Each thread has an ID, which is used to compute memory addresses and make control decisions.
        ◦ CUDA threads are organized into multiple blocks.
        ◦ Threads within a block cooperate via shared memory, atomic operations and barrier synchronization.
      [Figure 2-1: Grid of Thread Blocks. A grid of 3 x 2 blocks, Block (0, 0) to Block (2, 1); Block (1, 1) is expanded into 4 x 3 threads, Thread (0, 0) to Thread (3, 2).]
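How a thread combines its block index and its thread index into a global position can be sketched as follows. This is a CPU-side Python illustration with hypothetical function names; the tuples stand in for CUDA's built-in `blockIdx`, `blockDim` and `threadIdx` variables, and the grid and block sizes are taken from the figure (3 x 2 blocks of 4 x 3 threads).

```python
def global_coords(block_idx, block_dim, thread_idx):
    """Map (block, thread) indices to a global (x, y) position."""
    x = block_idx[0] * block_dim[0] + thread_idx[0]
    y = block_idx[1] * block_dim[1] + thread_idx[1]
    return x, y

def enumerate_grid(grid_dim, block_dim):
    """List the global position of every thread in the grid."""
    coords = []
    for by in range(grid_dim[1]):
        for bx in range(grid_dim[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    coords.append(global_coords((bx, by), block_dim, (tx, ty)))
    return coords

grid_dim, block_dim = (3, 2), (4, 3)  # sizes as drawn in the figure
coords = enumerate_grid(grid_dim, block_dim)
print(len(coords))                               # 72 threads in total
print(global_coords((1, 1), block_dim, (1, 1)))  # (5, 4)
```

Every thread obtains a distinct position, so the 72 threads of this grid cover a 12 x 6 domain exactly once.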
  • 15. CUDA memory types
        ◦ Global memory: Low bandwidth but large space. Fastest read/write calls if they are coalesced.
        ◦ Texture memory: Cache optimized for 2D spatial patterns.
        ◦ Constant memory: Slow, but with cache (8 KB).
        ◦ Shared memory: Fast, but it can be used only by the threads of the same block.
        ◦ Registers: 32768 32-bit registers per multiprocessor.
      [Figure 4-2: Hardware Model. A set of SIMT multiprocessors with on-chip shared memory: a device with multiprocessors 1 to N, each containing processors 1 to M with their registers, an instruction unit, shared memory, a constant cache and a texture cache, backed by device memory.]
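Why coalesced global-memory accesses are the fastest can be illustrated with a deliberately simplified counting model. The Python sketch below assumes a warp of 32 threads, 4-byte words and 128-byte aligned memory segments, and counts how many segments one warp touches for a given access stride. The segment size and the function name are assumptions made for illustration; the actual coalescing rules depend on the GPU generation.

```python
def segments_touched(stride, warp_size=32, word_bytes=4, segment_bytes=128):
    """Count the distinct memory segments read by one warp.

    Thread t reads the word at index t * stride. Fewer segments means
    fewer memory transactions in this simplified model.
    """
    addresses = [t * stride * word_bytes for t in range(warp_size)]
    return len({address // segment_bytes for address in addresses})

for stride in (1, 2, 32):
    print(stride, segments_touched(stride))
```

In this model, consecutive threads reading consecutive words (stride 1) are served from a single segment, while a stride of 32 words makes every thread touch a different segment.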
  • 16. CA Parallel implementation
        ◦ A parallel version of the Cellular Automata (CA) algorithm for variably saturated flow in soils was developed using the CUDA API.
        ◦ The infiltration experiment of Vauclin et al. (1979) was chosen as a benchmark test for the accuracy and the speed of the algorithm.
      [Figure: Water Depth (m) versus Distance (m) at t = 2, 3, 4 and 8 hrs, plotted together with the experimental data]
  • 17. Why is parallel code important?
        ◦ In real-case scenarios, where the 3-D simulation of large areas is needed, the grid sizes are excessively large.
        ◦ In natural hazards assessment the simulations should be fast in order to be useful (the prediction should come before the actual event!).
        ◦ Fast simulations allow us to calibrate the model parameters more easily and investigate the physical phenomena more efficiently.
        ◦ The natural parallelism inherent in the CA concept makes the parallel implementation of the algorithm easier.
  • 18. Technical details
      Difficulties:
        ◦ The most challenging issue was the irregular geometry of the domain, which made it more difficult to exploit locality in the thread computations and to use the shared memory.
        ◦ The cell values were stored in a 1D array and, for each cell, the indexes of its neighboring cells were also stored.
      Code structure:
        ◦ Simulation constants are stored in the constant memory.
        ◦ Soil properties for each soil class are stored in the texture memory.
        ◦ Atomic operations are used in order to check for convergence at every iteration.
        ◦ The shared memory is used to accelerate the atomic operations and the block's memory accesses.
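The storage scheme described above (cell values in a 1D array plus a stored list of neighbor indexes per cell) can be sketched on the CPU as follows. This Python example is only an illustration of the data layout and of a per-iteration convergence check: the update rule is a placeholder that averages the neighbor values, not the CA rule for variably saturated flow used in the talk, and the small branched domain, the fixed boundary cells and all names are hypothetical.

```python
def ca_step(values, neighbors, fixed):
    """One synchronous update; returns the new values and the largest change."""
    new_values = list(values)
    max_change = 0.0
    for cell, nbrs in enumerate(neighbors):  # a GPU version would use one thread per cell
        if cell in fixed:
            continue                         # boundary cells keep their value
        # Placeholder rule: average of the neighbors (NOT the author's CA rule).
        new_values[cell] = sum(values[n] for n in nbrs) / len(nbrs)
        max_change = max(max_change, abs(new_values[cell] - values[cell]))
    return new_values, max_change

def run_ca(values, neighbors, fixed, tol=1e-6, max_iter=10_000):
    """Iterate until the largest change in any cell drops below tol."""
    for iteration in range(1, max_iter + 1):
        values, max_change = ca_step(values, neighbors, fixed)
        if max_change < tol:                 # convergence check at every iteration
            break
    return values, iteration

# Irregular domain: a chain 0-1-2-3-4 with an extra cell 5 attached to cell 2,
# so cells have different numbers of neighbors.
neighbors = [[1], [0, 2], [1, 3, 5], [2, 4], [3], [2]]
values = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
fixed = {0, 4}                               # cells 0 and 4 act as fixed boundaries
result, n_iter = run_ca(values, neighbors, fixed)
print([round(v, 3) for v in result])
```

Because each cell reads only the previous values of the cells listed in its own neighbor list, the same update can be applied to every cell independently, regardless of how irregular the domain is. In this sketch the maximum over cells plays the role that the atomic operations play in the convergence check mentioned on the slide.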
  • 19. Results of the numerical tests
      Nvidia Quadro 2000: 192 CUDA cores, 1 GB of GDDR5 RAM.
      [Figure: two plots against Number of Cells (1000 to 10000000): Speed (cells/sec) for the CPU and the GPU, and Speed Up]
  • 20. Thanks for your attention!