Data Parallel Model
An Overview
Content taken from Advanced Computer Architecture (ACA),
a book by Naresh Jotwani & Kai Hwang
Part IV Software for Parallel Programming
Chapter 10: Parallel Models, Languages, and Compilers
Parallel Programming Models
10.1.1 Shared Variable Model
10.1.2 Message Passing Model
Special Thanks to
Dr. Preeti Aggarwal, Assistant Professor
Tutorials Point
Data-Parallel Model
Created and presented by:
Nikhil Sharma
M.E. CSE, 1st Year
Roll No. 19-311
July 2019- Dec 2019 Session
Title Anatomy
• Data
• Parallel Model
Programming Model
A collection of program abstractions providing the programmer a
simplified and transparent view of the computer hardware/software
system.
Parallel programming models are specifically designed for
multiprocessors, multicomputers, or vector SIMD computers.
Lockstep Operation
• Lockstep execution provides ATOMICITY: the system moves from
STATE 1 directly to STATE 2, with nothing in between.
• Each step applies a complete set of changes (new inputs, new
outputs, new state).
SIMD
• Parallelism is explicitly handled by hardware synchronization and
flow control.
• The choice of data structure matters.
• Main focus: local computations and data routing operations.
Figure courtesy: [2]
Where is data-parallel programming used?
• Fine-grain parallelism
Figure courtesy: ACA, Naresh Jotwani
Single Instruction, Multiple Data (SIMD)
• Single instruction: All processing units execute the same
instruction at any given clock cycle
• Multiple data: Each processing unit can operate on a different data element
• Best suited for specialized problems characterized by a high degree of regularity, such as
image processing.
• Two varieties: Processor Arrays and Vector Pipelines
• Examples:
• Processor Arrays: Connection Machine CM-2, Maspar MP-1, MP-2
• Vector Pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820
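The SIMD idea above can be sketched in a few lines of Python (a conceptual model only, not tied to any of the machines listed): one "instruction" is applied to every data element in a single conceptual step, as each processing unit would do in lockstep.

```python
# Conceptual SIMD sketch: the same instruction (a function) is applied
# to every data element at once, as each processing unit would do.

def simd_apply(instruction, data):
    """Apply one instruction to all data elements 'simultaneously'."""
    return [instruction(x) for x in data]

pixels = [10, 20, 30, 40]                       # e.g. image brightness values
brightened = simd_apply(lambda p: p + 5, pixels)
print(brightened)                               # [15, 25, 35, 45]
```

The regularity of image processing is exactly what makes it a good fit: every pixel receives the same operation.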
Simplified View
Figure courtesy: [1]
Data parallelism
• Data parallelism can be seen in
• SIMD machines
• SPMD multicomputers
Data parallelism
• Challenge
• Matching the problem size with a fixed machine size
• Ex.: partitioning large arrays or matrices into 64-element
segments.
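A minimal sketch of that partitioning step (the 64-element segment size comes from the slide; the padding value is an assumption for illustration):

```python
def partition(data, segment=64, pad=0):
    """Split data into fixed-size segments, padding the last one
    so every segment matches the fixed machine size."""
    out = []
    for i in range(0, len(data), segment):
        chunk = data[i:i + segment]
        if len(chunk) < segment:
            chunk = chunk + [pad] * (segment - len(chunk))
        out.append(chunk)
    return out

segments = partition(list(range(150)), segment=64)
print(len(segments))                    # 3 segments
print(all(len(s) == 64 for s in segments))  # True: each padded to 64
```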
Synchronization
• Synchronization of data-parallel operations is done at
COMPILE TIME
instead of
RUN TIME.
• Hardware synchronization: enforced by the control unit and
LOCKSTEP execution.
Synchronous SIMD
• PEs operate in lockstep fashion.
• No mutual-exclusion or synchronization problems of the kind
associated with multiprocessors or multicomputers.
• Inter-PE communication is directly controlled by hardware.
• Inter-PE data communication is also carried out in lockstep.
• Spatial parallelism
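The lockstep discipline above can be simulated in a few lines (a hypothetical sketch, not any real SIMD machine): one control unit steps through the instruction stream, and every PE applies the same instruction to its own local operand before any PE moves on.

```python
# Hypothetical lockstep simulation: the control unit issues one
# instruction at a time; all PEs execute it on their local data
# before the next instruction is issued.

def run_lockstep(program, pe_data):
    for instruction in program:                       # control unit
        pe_data = [instruction(x) for x in pe_data]   # all PEs in step
    return pe_data

program = [lambda x: x * 2, lambda x: x + 1]
print(run_lockstep(program, [1, 2, 3]))               # [3, 5, 7]
```

Because no PE can race ahead, there is nothing for mutual exclusion to protect: synchronization is implicit in the step structure.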
Parallelism Example
• What is the latency and throughput if Ben uses parallelism?
• Spatial parallelism: Ben asks Alyssa to help, using her own oven.
• Temporal parallelism: Ben breaks the task into two stages: rolling
and baking. He uses two trays. While the first batch is baking he
rolls the second batch, and so on.
Spatial Parallelism
[Figure: Gantt chart of spatial parallelism. Rolling and baking of
trays 1-4 by Ben and Alyssa, plotted against time (0-50 minutes).
Latency is the time to the first tray.]
Latency = ?
Throughput = ?
Spatial Parallelism
[Figure: the same Gantt chart, now annotated with the answers.]
Latency = 5 + 15 = 20 minutes = 1/3 hour (same)
Throughput = 2 trays / (1/3 hour) = 6 trays/hour (doubled)
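The arithmetic above can be checked directly (the 5-minute roll and 15-minute bake times come from the example):

```python
roll, bake = 5, 15                 # minutes per tray, from the example
latency_min = roll + bake          # still 20 minutes: one tray's path is unchanged
trays_per_latency = 2              # Ben and Alyssa each finish a tray in parallel
throughput_per_hour = trays_per_latency * 60 / latency_min
print(latency_min)                 # 20
print(throughput_per_hour)         # 6.0
```

Spatial parallelism doubles throughput without touching latency: adding a second worker and oven duplicates the whole pipeline rather than shortening it.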
SIMD: Scalar vs. Vector Instructions
• Scalar instructions are executed directly by the control unit.
• Vector instructions are broadcast to all processing elements
(because vector operands are located in different PEs).
Array Language Extensions
• Array extensions in data-parallel languages are represented by
high-level data types.
• They enable the removal of some nested loops in the code.
• They present a global address space, which obviates the need for
explicit data routing between PEs.
• In computer science, array programming refers to solutions which
allow the application of operations to an entire set of values at
once. Such solutions are commonly used in scientific and
engineering settings. (Source: Wikipedia)
• Example: Fortran 77
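The "removal of nested loops" can be illustrated with plain Python standing in for an array language (in Fortran-style array syntax this would simply be `C = A + B`; the list comprehension below is a stand-in, not real array-language code):

```python
# Element-wise matrix addition: the nested-loop way vs. the
# whole-array-expression way that array languages provide.

A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]

# Nested-loop version: explicit index bookkeeping.
C_loops = [[0, 0], [0, 0]]
for i in range(2):
    for j in range(2):
        C_loops[i][j] = A[i][j] + B[i][j]

# Whole-array version: one expression, no explicit loop nest.
C_array = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

print(C_loops == C_array)   # True
```

The array form is what a data-parallel compiler can map directly onto the PE array, one element per PE.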
Compiler Support
• Array extensions facilitate precise control of massively parallel
hardware and enable incremental migration to data-parallel
execution.
• Compiler-optimized control of SIMD machine hardware allows the
programmer to drive the PE array transparently. The compiler must
separate the program into scalar and parallel components and
integrate with the OS environment.
• The compiler uses array extensions to optimize data placement,
minimize data movement, and virtualize the dimensions of the PE
array. It generates data-parallel machine code to perform
operations on arrays.
Array Sectioning
• Allows a programmer to reference a section or a region of a
multidimensional array.
• Array sections are designated by specifying a start index, a
bound, and a stride.
• Vector-valued subscripts are often used to construct arrays from
arbitrary permutations of another array.
These expressions are vectors that map the desired elements into the
target array. They facilitate the implementation of gather and
scatter operations on a vector of indices.
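Python slicing happens to use the same start/bound/stride triple, so array sectioning and vector-valued subscripts can be sketched as follows (a stand-in for Fortran-style `A(0:8:2)` sections, not real array-language code):

```python
A = list(range(10))

# Section A(0:8:2): start index 0, bound 8, stride 2.
section = A[0:8:2]
print(section)            # [0, 2, 4, 6]

# Vector-valued subscript: GATHER elements of A at arbitrary indices.
idx = [7, 1, 4]
gathered = [A[i] for i in idx]
print(gathered)           # [7, 1, 4]

# SCATTER: write values back to those same positions in a target array.
target = [0] * 10
for i, v in zip(idx, gathered):
    target[i] = v
print(target[7], target[1], target[4])   # 7 1 4
```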
• SPMD programs are a special class of SIMD programs which
emphasize medium-grain parallelism and synchronization at the
subprogram level rather than at the instruction level.
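The SPMD style above can be sketched with Python's standard library (`concurrent.futures` stands in for the nodes of a multicomputer; this is an illustration of the medium-grain idea, not a real multicomputer program):

```python
# SPMD sketch: the SAME subprogram runs on every data partition;
# synchronization happens once, at the subprogram level, when the
# partial results are combined.
from concurrent.futures import ThreadPoolExecutor

def subprogram(chunk):
    return sum(x * x for x in chunk)   # identical code on every "node"

data = list(range(100))
chunks = [data[i:i + 25] for i in range(0, 100, 25)]  # one per node
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(subprogram, chunks))
total = sum(partials)                  # the single synchronization point
print(total)                           # 328350, the sum of squares 0..99
```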
What was covered?
• Lockstep operation
• SIMD (data-parallel model)
• Synchronization
• Spatial parallelism
• SIMD: scalar vs. vector instructions
• Array languages (Fortran 77) and extensions
• Compiler support
Q & A
Part IV Software for Parallel Programming
Chapter 10: Parallel Models, Languages, and Compilers
Parallel Programming Models
10.1.1 Shared Variable Model
10.1.2 Message Passing Model
10.1.3 Data-Parallel Model (covered)
10.1.4 Object-Oriented Model
In this model
• Objects are dynamically created and manipulated.
• Concurrent programming models are built up from low-level objects
such as processes, queues, and semaphores into high-level objects
such as monitors and program modules.
Concurrent Objects
Three trends motivate them:
First
• Increased use of interacting processes by individual users.
Second
• Workstation networks have become a cost-effective mechanism for
resource sharing and distributed problem solving.
Third
• Multiprocessor technology in several variants has advanced to the
point of providing supercomputing power at a fraction of the
traditional cost.
Program Abstraction
• Program modularity and software reusability, as commonly
experienced with OOP.
Objects
• Program entities which encapsulate data and operations into single
computational units.
The Actor Model
• Message passing has attached semantics:
• Create: creating an actor from a behavior description and a set of
parameters.
• Send-to: sending a message to another actor.
• Become: an actor replacing its own behavior with a new behavior.
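The three primitives can be sketched in a few lines of Python (a minimal hypothetical sketch, not a real actor library; the class and method names are illustrative):

```python
# Minimal actor sketch: each actor has a mailbox and a behavior.
# __init__ models "create", send models "send-to", become models "become".
from queue import Queue

class Actor:
    def __init__(self, behavior):          # create: behavior + empty mailbox
        self.behavior = behavior
        self.mailbox = Queue()

    def send(self, msg):                   # send-to: enqueue a message
        self.mailbox.put(msg)

    def become(self, behavior):            # become: replace own behavior
        self.behavior = behavior

    def step(self):                        # process one queued message
        self.behavior(self, self.mailbox.get())

log = []
def greeter(actor, msg):
    log.append("hello " + msg)
    actor.become(lambda a, m: log.append("again " + m))

a = Actor(greeter)
a.send("world"); a.send("world")
a.step(); a.step()
print(log)   # ['hello world', 'again world']
```

Note how the second message is handled by the *new* behavior: "become" changes how future messages are processed without changing the actor's identity.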
Parallelism in COOP
• Pipeline concurrency involves the overlapped enumeration of
successive solutions and concurrent testing of the solutions as they
emerge from an evaluation pipeline.
• Divide-and-conquer concurrency splits a problem into independent
subproblems that can be solved concurrently.
Example
A prime-number generation pipeline:
integer numbers are generated and successively tested for
divisibility by previously generated primes in a linear pipeline of
primes.
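The pipeline's logic can be sketched sequentially in Python (each prime acts as one filtering stage; a real COOP version would make each stage a concurrent object, which this sketch does not attempt):

```python
# Prime-number pipeline sketch: each discovered prime becomes a stage
# that filters out its multiples; a number that survives every stage
# is itself prime and is appended as a new stage.

def prime_pipeline(limit):
    primes = []                               # the pipeline of stages
    for n in range(2, limit + 1):             # the generated integers
        if all(n % p != 0 for p in primes):   # passes every stage
            primes.append(n)                  # new stage joins the pipe
    return primes

print(prime_pipeline(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```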
Example 10.2 Concurrency in object-oriented programming
• Integer numbers are generated and successively tested for
divisibility by previously generated primes in a linear pipeline of
primes.
• Multiplication of a list of numbers [10, 7, -2, 3, 4, -11, -3]
using a divide-and-conquer approach.
Figure courtesy: ACA, a book by Jotwani & Hwang
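The divide-and-conquer multiplication of the slide's list can be sketched as a recursive split, where each half could in principle be evaluated by a concurrent object:

```python
# Divide-and-conquer product: split the list in half, compute each
# half's product (independently, hence concurrently in COOP), combine.

def dc_product(nums):
    if len(nums) == 1:
        return nums[0]
    mid = len(nums) // 2
    return dc_product(nums[:mid]) * dc_product(nums[mid:])

print(dc_product([10, 7, -2, 3, 4, -11, -3]))   # -55440
```

The two recursive calls share no data, which is exactly what makes this pattern a natural source of concurrency.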
Q & A
References
[1] Introduction to Parallel Computing, https://computing.llnl.gov/tutorials/parallel_comp/#ModelsData
[2] Programming Models, https://ict.senecacollege.ca/~gpu621/pages/content/model.html
[3] Data parallelism, https://en.wikipedia.org/wiki/Data_parallelism
End of Presentation
