Lecture2

CS-416 Parallel and Distributed Systems
Jawwad Shamsi
Lecture #2, 19th January 2010

How Much Parallelism
Decomposition: the process of partitioning a computer program into independent pieces that can be run simultaneously (in parallel).
Two forms of decomposition: Data Parallelism and Task Parallelism.
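As a rough illustration of task parallelism (different, independent pieces of work running at the same time), here is a minimal sketch. The function name `run_tasks` and the choice of three tasks (minimum, maximum, mean) are assumptions made for this example and do not come from the lecture. Threads are used only to show the structure; for CPU-bound work in Python a process pool would usually be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor
import statistics


def run_tasks(data):
    """Run three different, independent tasks on the same data concurrently."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Each submit() starts a different task; none depends on another.
        f_min = pool.submit(min, data)
        f_max = pool.submit(max, data)
        f_mean = pool.submit(statistics.mean, data)
        # result() blocks until the corresponding task has finished.
        return f_min.result(), f_max.result(), f_mean.result()


lowest, highest, mean = run_tasks([4, 8, 15, 16, 23, 42])
```

The decomposition here is by function: each worker does something different. Data parallelism instead splits the data and runs the same function on every piece.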

Scalar Instruction
A single instruction at a time, operating on one data item, e.g. Load R1 @ 1000
Vector Instruction
One instruction operates on a whole vector, e.g. C(1:100)=A(1:100)+B(1:100)
Supercomputer
Vector instructions with pipelined floating-point arithmetic
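The contrast between the two instruction styles can be modeled in a few lines. This is only a sketch: Python itself has no vector instructions, so the code counts how many add operations each style would issue rather than executing real hardware instructions. The names `scalar_add` and `vector_add` are hypothetical.

```python
def scalar_add(a, b):
    """Scalar style: one add is issued per element."""
    c = [0] * len(a)
    issued = 0
    for i in range(len(a)):
        c[i] = a[i] + b[i]  # one scalar add on one pair of operands
        issued += 1
    return c, issued


def vector_add(a, b):
    """Vector style: modeled as ONE operation over the whole range,
    in the spirit of C(1:100)=A(1:100)+B(1:100)."""
    c = [x + y for x, y in zip(a, b)]
    return c, 1


A = list(range(1, 101))   # 100 elements
B = [10] * 100
c_scalar, n_scalar = scalar_add(A, B)
c_vector, n_vector = vector_add(A, B)
```

Both functions produce the same result vector; the difference is the number of instructions issued (100 versus 1 in this model).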

Pipelining
Pipelining overlaps the stages of instruction execution to improve performance. At a high level of abstraction, one instruction can be executed while the next one is being decoded and the one after that is being fetched. This is akin to an assembly line for manufacturing cars.
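The benefit of overlapping stages can be estimated with a small calculation. The sketch below assumes an ideal pipeline: every stage takes exactly one cycle and there are no stalls. The three-stage split (fetch, decode, execute) follows the slide; the instruction count of 100 is an arbitrary example value.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction needs n_stages cycles to fill the pipeline;
    after that, one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


n, k = 100, 3  # 100 instructions, 3 stages: fetch, decode, execute
speedup = unpipelined_cycles(n, k) / pipelined_cycles(n, k)
```

For long instruction streams the speedup approaches the number of stages, which is why the ideal case motivates deeper pipelines.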

Pipeline Limitations
Pipelining, however, has several limitations. The speed of a pipeline is ultimately limited by its slowest stage. For this reason, conventional processors rely on very deep pipelines (20-stage pipelines in state-of-the-art Pentium processors). However, in typical program traces, every fifth or sixth instruction is a conditional jump! This requires very accurate branch prediction. The penalty of a misprediction grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.
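Both limitations can be put into numbers with a simple model. This is a sketch under stated assumptions: the stage latencies and the 5% misprediction rate are hypothetical values, and the flush penalty is assumed to equal the pipeline depth in cycles. Only the "every fifth instruction is a branch" figure and the 20-stage depth come from the slide.

```python
def clock_period(stage_latencies_ns):
    """The clock cannot tick faster than the slowest stage allows."""
    return max(stage_latencies_ns)


def effective_cpi(branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Ideal CPI of 1 plus the average flush cost per instruction."""
    return 1.0 + branch_fraction * mispredict_rate * flush_penalty_cycles


stages = [0.5, 0.5, 0.8, 0.5]   # hypothetical stage latencies in ns
period = clock_period(stages)    # set by the 0.8 ns stage

# Every 5th instruction is a branch; 5% of branches are mispredicted (assumed).
shallow = effective_cpi(1 / 5, 0.05, 5)    # 5-stage pipeline
deep = effective_cpi(1 / 5, 0.05, 20)      # 20-stage pipeline
```

With the same predictor accuracy, the deeper pipeline loses more cycles per instruction (about 1.2 versus about 1.05 in this model), which is why deep pipelines depend on very accurate branch prediction.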

Pipelining and Superscalar Execution
One simple way of alleviating these bottlenecks is to use multiple pipelines. The question then becomes one of selecting which instructions to issue to them in the same cycle.

Superscalar Execution
Issue multiple instructions during the same CPU cycle.
How? By simultaneously dispatching multiple instructions to redundant functional units.
Each unit is not a separate CPU core but an execution resource within a single CPU, e.g. an ALU.
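How many instructions can actually be issued together depends on the dependencies between them. The following sketch models a simple in-order, dual-issue machine. Its assumptions are not from the lecture: every instruction has a one-cycle latency, an instruction cannot issue in the same cycle as an earlier instruction whose result it reads or overwrites, and the function name `issue_cycles` and the register names are hypothetical.

```python
def issue_cycles(instrs, width=2):
    """Count cycles needed to issue `instrs` in program order.

    instrs: list of (dest_register, source_registers) tuples.
    width:  maximum number of instructions issued per cycle.
    """
    cycles = 0
    i = 0
    while i < len(instrs):
        written = set()   # registers written by instructions issued this cycle
        issued = 0
        while i < len(instrs) and issued < width:
            dest, srcs = instrs[i]
            if dest in written or any(s in written for s in srcs):
                break     # depends on an instruction issued in this cycle
            written.add(dest)
            issued += 1
            i += 1
        cycles += 1
    return cycles


independent = [("r1", ("a", "b")), ("r2", ("c", "d")),
               ("r3", ("e", "f")), ("r4", ("g", "h"))]
dependent = [("r1", ("a", "b")), ("r2", ("r1", "c")),
             ("r3", ("r2", "d")), ("r4", ("r3", "e"))]

cycles_independent = issue_cycles(independent)  # pairs issue together
cycles_dependent = issue_cycles(dependent)      # chain forces one per cycle
```

Four independent instructions issue in two cycles, while a chain of four dependent instructions still needs four: the extra execution unit helps only when independent instructions are available.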

Difference Between Vector Instruction and Pipelining

Parallel Processing
Computer speed and throughput are increased through parallel processing.
More hardware means more cost.
With technological advancements, the increase in cost is minimal.

Types of Parallelism - explained

Data Parallelism
The same code segment runs concurrently on each processor.
Each processor is assigned its own part of the data to work on.
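The pattern described above can be sketched as follows: the data is split into chunks and the same function runs on every chunk. The names `partial_sum`, `split`, and `parallel_sum_of_squares`, and the sum-of-squares workload, are assumptions chosen for illustration. Threads stand in for processors here; for CPU-bound work in Python, `ProcessPoolExecutor` from the same module would usually be used instead.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """The code segment that every worker runs on its own part of the data."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Split data into at most `parts` contiguous chunks of near-equal size."""
    if not data:
        return []
    size = (len(data) + parts - 1) // parts  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=4):
    chunks = split(data, workers)
    if not chunks:
        return 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() applies the same function to each chunk concurrently.
        return sum(pool.map(partial_sum, chunks))


data = list(range(1000))
total = parallel_sum_of_squares(data)
```

The result is identical to the sequential computation; only the way the work is distributed changes.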
