Parallel and Distributed
Computing
Dr. Naween Kumar
Learning Outcomes
At the end of the course, the students will be able to
• define Parallel Algorithms
• recognize parallel speedup and performance analysis
• identify task decomposition techniques
• perform Parallel Programming
• apply acceleration strategies for algorithms
Four decades of computing
• Batch Era
• Time-sharing Era
• Desktop Era
• Network Era
Batch era
• Batch processing is the execution of a series of programs on a computer
without manual intervention
• The term originated in the days when users entered
programs on punch cards
Time-sharing Era
• Time-sharing is the sharing of a computing
resource among many users by means of
multiprogramming and multitasking
• The era was defined by developing systems that
supported multiple users at the same time
Desktop Era
• Personal Computers (PCs)
• PCs connected via wide area networks (WANs)
Network Era
• Systems with:
• Shared memory
• Distributed memory
• Examples of parallel computers: Intel iPSC, nCUBE
Parallel Computing
Parallel computing is a form of computation in which many instructions are carried out simultaneously,
operating on the principle that large problems can often be divided into smaller ones, which are then
solved concurrently (in parallel).
With the increased use of computers in every sphere of human activity, computer scientists are
faced with two crucial issues today:
 Processing has to be done faster than ever before
 Larger or more complex computation problems need to be solved
Increasing the number of transistors as per Moore's Law isn't a solution, as it also increases
frequency scaling and power consumption.
Power consumption has been a major issue recently, as it causes the problem of processor heating.
The perfect solution is PARALLELISM, in hardware as well as in software.
Parallel Computing
Difference between Parallel Computing & Distributed Computing
When different processors/computers work on a single common goal, it is parallel computing. E.g., ten men pulling a
rope to lift one rock; supercomputers implement parallel computing. Distributed computing is where several
different computers work separately on a multi-faceted computing workload. E.g., ten men pulling ten ropes to lift ten
different rocks, or employees in an office each doing their own work.
Difference between Parallel Computing & Cluster Computing
A computer cluster is a group of linked computers working together so closely that in many respects they form a
single computer. E.g., in an office of 50 employees, a group of 15 does some work, 25 some other work, and the remaining 10
something else. Similarly, in a network of 20 computers, 16 work on a common goal while 4 work on some other
common goal. Cluster computing is a specific case of parallel computing.
Difference between Parallel Computing & Grid Computing
Grid computing makes use of computers communicating over the Internet to work on a given problem.
E.g., three people, one from the USA, another from Japan, and a third from Norway, working together
online on a common project. Websites like Wikipedia, Yahoo! Answers, YouTube, Flickr, or an open source OS like Linux
are examples of grid computing. Again, it serves as an example of parallel computing.
Flynn's taxonomy of computer architecture
Two types of information flow into a processor:
 Instructions
 Data
What are instructions and data?
Flynn's taxonomy of computer architecture
1. single-instruction single-data streams (SISD)
2. single-instruction multiple-data streams (SIMD)
3. multiple-instruction single-data streams (MISD)
4. multiple-instruction multiple-data streams (MIMD)
Parallel computing?
(Figure slides contrasting serial computing with parallel computing)
Parallel Computers
• All stand-alone computers today are parallel from a hardware
perspective
Parallel Computers
• Networks connect multiple stand-alone computers (nodes) to make
larger parallel computer clusters.
Why Use Parallel Computing?
• SAVE TIME AND/OR MONEY:
Why Use Parallel Computing?
• SOLVE LARGER / MORE COMPLEX PROBLEMS
Grand Challenge Problems?
Why Use Parallel Computing?
• PROVIDE CONCURRENCY
Why Use Parallel Computing?
• TAKE ADVANTAGE OF NON-LOCAL RESOURCES:
Why Use Parallel Computing?
• MAKE BETTER USE OF UNDERLYING PARALLEL HARDWARE
• Modern computers, even laptops, are parallel in architecture with multiple
processors/cores
BACK to Flynn's Classical Taxonomy
Single Instruction Single Data
(SISD)
• A serial (non-parallel) computer
• This is the oldest type of computer
Examples: UNIVAC 1, IBM 360, CRAY-1, CDC 7600, PDP-1
Single Instruction Multiple Data
(SIMD)
Examples: ILLIAC IV, MasPar, Cray X-MP, Cray Y-MP, Cell Processor (GPU)
Multiple Instruction Single Data (MISD)
Example: the Space Shuttle flight control computers
Multiple Instruction Multiple Data
(MIMD)
Examples: IBM POWER5, HP/Compaq AlphaServer, Intel IA-32, AMD Opteron
What are we going to learn?
Factors influencing Parallel Computing
• Increased scientific & business computing
• Sequential architectures are constrained by the speed of light and the laws of thermodynamics
• Hardware improvements such as pipelining and superscalar execution require sophisticated compilers
• Vector processing works well for matrix/graphics processing
• Parallel processing has matured and can be exploited commercially
Shared Memory System
• A shared memory system typically accomplishes
interprocessor coordination through a global memory shared by all
processors.
Easier to program, less fault tolerant, limited scalability
A failure affects the entire system
• Ex: Server systems, GPGPU
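A minimal sketch of shared-memory coordination (illustrative only, not taken from the slides), written in Python with threads: every worker sees the same global counter, and a lock provides the interprocessor coordination through that shared memory.

import threading

counter = 0                # shared state, visible to every thread
lock = threading.Lock()    # protects the shared counter

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:         # coordinate through the shared memory
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)             # 40000: all threads updated one global memory

(CPython threads interleave rather than run truly in parallel because of the GIL; the point of the sketch is only that coordination happens through memory shared by all workers.)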
Message Passing System
(Distributed Memory)
• These systems typically combine local memory and a
processor at each node of the
interconnection network
• There is no global memory; a message-passing
technique is used to move data from one local memory to
another
• Harder to program, more fault tolerant, higher scalability
A failure affects the system only partially
Superior price/performance ratio
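A minimal message-passing sketch (an assumption for illustration; real distributed-memory systems would typically use something like MPI): two Python processes, each with its own local memory, exchange data only through an explicit queue.

from multiprocessing import Process, Queue

def producer(q):
    local_data = [i * i for i in range(5)]   # lives in this process's local memory
    q.put(local_data)                        # explicit message to the other process

def consumer(q):
    received = q.get()                       # data arrives only via message passing
    print("received:", received)

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start(); p2.start()
    p1.join(); p2.join()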
Limits and Costs of Parallel Programming
• Amdahl's Law:
Amdahl's Law states that potential program speedup is defined by the
fraction of code (P) that can be parallelized:
Speedup = 1 / (1 − P)
• If none of the code can be parallelized, P = 0 and the speedup = 1 (no
speedup).
• If all of the code is parallelized, P = 1 and the speedup is infinite (in
theory).
Limits and Costs of Parallel Programming
• If 50% of the code can be parallelized, maximum speedup = 2,
meaning the code will run twice as fast.
Limits and Costs of Parallel Programming
• Introducing the number of processors performing the parallel fraction
of work, the relationship can be modeled by:
Speedup = 1 / (P/N + S)
• where P = parallel fraction, N = number of processors and S = serial
fraction
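A small, hypothetical helper that evaluates the formula above for a given parallel fraction P and processor count N (with S = 1 − P):

def amdahl_speedup(p, n):
    """Speedup = 1 / (P/N + S), where S = 1 - P."""
    s = 1.0 - p
    return 1.0 / (p / n + s)

print(amdahl_speedup(0.5, 2))      # ~1.33 with 2 processors
print(amdahl_speedup(0.5, 10**6))  # approaches 2 as N grows
print(amdahl_speedup(0.95, 8))     # ~5.9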
Limits and Costs of Parallel Programming
Types Of Parallelism
• Bit-Level
• Instructional
• Data
• Task
Bit-Level Parallelism
When an 8-bit processor needs to add two 16-bit integers, the addition
has to be done in two steps:
 The processor must first add the 8 lower-order bits from each
integer using the standard addition instruction,
 then add the 8 higher-order bits using an add-with-carry
instruction and the carry bit from the lower-order addition.
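A sketch of the two-step addition described above, emulated in Python; the function name and bit masks are illustrative and assume unsigned 16-bit operands.

def add16_on_8bit(a, b):
    # step 1: add the 8 lower-order bits with the standard addition instruction
    lo = (a & 0xFF) + (b & 0xFF)
    carry = lo >> 8                                        # carry out of the low byte
    # step 2: add the 8 higher-order bits with an add-with-carry
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry
    return ((hi & 0xFF) << 8) | (lo & 0xFF)                # recombine into a 16-bit result

print(hex(add16_on_8bit(0x12FF, 0x0001)))                  # 0x1300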
Instruction Level Parallelism
The instructions given to a computer for processing can be
divided into groups, or reordered, and then processed
without changing the final result.
This is known as instruction-level parallelism (ILP).
An Example
1. e = a + b
2. f = c + d
3. g = e * f
Here, instruction 3 depends on instructions 1 and 2.
However, instructions 1 and 2 can be
processed independently.
Data Parallelism
Data parallelism focuses on distributing the data across
different parallel computing nodes.
It is also called loop-level parallelism.
An Illustration
In a data parallel implementation, CPU A could add all
elements from the top half of the matrices, while CPU B could
add all elements from the bottom half of the matrices.
Since the two processors work in parallel, the job of
performing matrix addition would take one half the time of
performing the same operation in serial using one CPU
alone.
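A minimal data-parallel sketch of this illustration using Python's multiprocessing.Pool; splitting the matrices row-wise into a top half and a bottom half is an assumption made for the example.

from multiprocessing import Pool

def add_rows(pair):
    rows_a, rows_b = pair
    # each worker adds its own chunk of rows, element by element
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(rows_a, rows_b)]

if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]
    B = [[10, 20], [30, 40], [50, 60], [70, 80]]
    half = len(A) // 2
    chunks = [(A[:half], B[:half]),    # "CPU A": top half of the matrices
              (A[half:], B[half:])]    # "CPU B": bottom half of the matrices
    with Pool(processes=2) as pool:
        top, bottom = pool.map(add_rows, chunks)
    print(top + bottom)                # the full sum, assembled from both halves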
Task Parallelism
Task Parallelism focuses on distribution of tasks across
different processors.
It is also known as functional parallelism or control
parallelism
An Example
As a simple example, if we are running code
on a 2-processor system (CPUs "a" & "b") in a
parallel environment and we wish to do tasks
"A" and "B", it is possible to tell CPU "a" to
do task "A" and CPU "b" to do task "B"
simultaneously, thereby reducing the
runtime of the execution.
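A minimal task-parallel sketch in Python: two placeholder functions stand in for tasks "A" and "B" and are started as separate processes so they can run simultaneously.

from multiprocessing import Process

def task_a():
    print("task A:", sum(range(100_000)))   # some independent work

def task_b():
    print("task B:", max(range(100_000)))   # a different, unrelated task

if __name__ == "__main__":
    pa = Process(target=task_a)    # CPU "a" can run this ...
    pb = Process(target=task_b)    # ... while CPU "b" runs this
    pa.start(); pb.start()
    pa.join(); pb.join()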
Key Difference Between Data And Task
Parallelism
Data Parallelism
 It is the division of threads (processes), instructions, or tasks internally into sub-parts for execution.
 A task 'A' is divided into sub-parts and then processed.
Task Parallelism
 It is the division among the threads (processes), instructions, or tasks themselves for execution.
 A task 'A' and a task 'B' are processed separately by different processors.
Implementation Of Parallel Computing
In Software
When implemented in software (or rather in algorithms), it is
called 'parallel programming'.
An algorithm is split into pieces and then executed, as
seen earlier.
Important Points In Parallel
Programming
Dependencies - a typical scenario is when line 6 of an algorithm
depends on lines 2, 3, 4 and 5.
Application checkpoints - like saving the state of the algorithm, or
creating a backup point.
Automatic parallelisation - identifying dependencies and
parallelising algorithms automatically; this has achieved only
limited success.
Implementation Of Parallel Computing
In Hardware
When implemented in hardware, it is called 'parallel
processing'.
Typically, a chunk of the workload is divided up for
execution by units such as cores, processors, CPUs, etc.
Next
• Parallel Computer Memory Architectures