3. WHAT IS A PARALLEL INSTRUCTION?
Parallel instructions are a set of instructions that do not depend on each other's
results, so they can be executed at the same time.
Hierarchy
Bit-level parallelism
• a 16-bit add needs two instructions on an 8-bit processor but only one on a 16-bit processor
Instruction-level parallelism (ILP)
Loop-level parallelism
• for (i=1; i<=1000; i= i+1)
      x[i] = x[i] + y[i];
Thread-level parallelism
• multi-core computers
4. EXAMPLE
Consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f
Operation 3 depends on the results "e" and "f", which are calculated by operations 1 and
2, so "g" cannot be calculated until both "e" and "f" have been computed.
However, operations 1 and 2 do not depend on any other operation, so they can be
computed simultaneously.
If we assume that each operation completes in one unit of time, these three
instructions can be completed in a total of two units of time, giving an ILP of 3/2 = 1.5:
on average 1.5 instructions complete per unit of time, a 1.5x speedup over executing
them one after another.
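The arithmetic above can be checked with a minimal sketch. It assumes one unit of time per operation and unlimited functional units; the `program` table and the `schedule` helper are illustrative names, not part of the original example.

```python
# Dependency table for the three operations in the example.
# Each result maps to the values it reads.
program = {
    "e": ["a", "b"],   # 1. e = a + b
    "f": ["c", "d"],   # 2. f = c + d
    "g": ["e", "f"],   # 3. g = e * f
}

def schedule(program):
    """Return {result: time unit in which it completes}.

    Assumes one time unit per operation and unlimited functional units
    (both are simplifying assumptions)."""
    finish = {}

    def time_of(name):
        if name not in program:      # plain inputs (a, b, c, d) are ready at time 0
            return 0
        if name not in finish:
            finish[name] = 1 + max(time_of(src) for src in program[name])
        return finish[name]

    for result in program:
        time_of(result)
    return finish

times = schedule(program)            # e and f finish in unit 1, g in unit 2
total_time = max(times.values())     # 2 units in total
ilp = len(program) / total_time      # 3 instructions / 2 units = 1.5
print(times, total_time, ilp)
```

Operations that end up with the same completion time are the ones that can run simultaneously.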
5. WHY ILP?
One of the goals of compiler and processor designers is to identify and exploit as much
ILP as possible.
Ordinary programs are written to execute instructions in sequence, one after the other,
in the order written by the programmer.
ILP allows the compiler and the processor to overlap the execution of multiple
instructions or even to change the order in which instructions are executed.
7. INSTRUCTION PIPELINE
An instruction pipeline is a technique
used in the design of modern
microprocessors, microcontrollers and
CPUs to increase their instruction
throughput (the number of instructions
that can be executed in a unit of time).
8. PIPELINING
The main idea is to divide the processing of a CPU instruction
into a series of independent steps ("microinstructions"), with
storage at the end of each step.
This allows the CPU's control logic to issue instructions at the
processing rate of the slowest step, which is much faster than
the time needed to process the instruction as a single step.
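A rough sketch of why the slowest step sets the rate. The stage latencies in `stage_ns` are made-up values for illustration, and the model assumes no stalls or hazards.

```python
# Hypothetical stage latencies in nanoseconds (illustrative values only).
stage_ns = [2, 1, 3, 2, 1]

def unpipelined_time(n_instructions, stages):
    # Each instruction runs all of its steps before the next one starts.
    return n_instructions * sum(stages)

def pipelined_time(n_instructions, stages):
    # The clock period is set by the slowest stage. The first instruction
    # needs one period per stage; every later instruction completes one
    # period after the previous one.
    period = max(stages)
    return (len(stages) + n_instructions - 1) * period

n = 1000
slow = unpipelined_time(n, stage_ns)   # 1000 * 9 = 9000 ns
fast = pipelined_time(n, stage_ns)     # (5 + 999) * 3 = 3012 ns
print(slow, fast)
```

Under these assumptions a single instruction takes longer from start to finish (15 ns instead of 9 ns), but throughput improves because a new instruction completes every 3 ns.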
9. EXAMPLE
For example, the RISC pipeline is broken into five stages with a set of flip-flops between
each stage, as follows:
Instruction fetch
Instruction decode & register fetch
Execute
Memory access
Register write back
In the pipeline diagram, the vertical axis shows successive instructions and the horizontal
axis shows time. So in the green column, the earliest instruction is in the WB stage, and the
latest instruction is undergoing instruction fetch.
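The layout described above (instructions down, time across) can be reproduced with a short sketch. The abbreviations IF, ID, EX, MEM and WB stand for the five stages listed; the function name `pipeline_table` is illustrative, and the model assumes an ideal pipeline with no stalls.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_table(n_instructions):
    """Row i holds the stage occupied by instruction i in each clock cycle,
    or None when that instruction is not in the pipeline. Assumes no stalls."""
    n_cycles = n_instructions + len(STAGES) - 1
    table = []
    for i in range(n_instructions):
        row = []
        for cycle in range(n_cycles):
            stage_index = cycle - i   # instruction i enters IF in cycle i
            in_pipeline = 0 <= stage_index < len(STAGES)
            row.append(STAGES[stage_index] if in_pipeline else None)
        table.append(row)
    return table

table = pipeline_table(5)
for row in table:
    print(" ".join(f"{stage or '.':>3}" for stage in row))

# The fifth clock cycle (index 4) is the first one in which every stage is busy.
column = [row[4] for row in table]
```

In that column the oldest instruction is in WB and the newest is in IF, matching the description of the diagram.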
10. SUPERSCALAR
A superscalar CPU architecture
implements ILP inside a single processor
which allows faster CPU throughput at the
same clock rate.
11. WHY SUPERSCALAR?
A superscalar processor executes more than one instruction during a clock
cycle.
It simultaneously dispatches multiple instructions to multiple redundant
functional units built inside the processor.
Each functional unit is not a separate CPU core but an execution resource
inside the CPU such as an arithmetic logic unit, floating point unit (FPU), a
bit shifter, or a multiplier.
12. EXAMPLE
Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a
maximum of two instructions per cycle can be completed.
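As a back-of-the-envelope sketch of that caption, the cycle count for an idealised pipeline can be estimated as follows. The function `cycles_needed` is a hypothetical helper, and the model assumes fully independent instructions with no stalls.

```python
import math

def cycles_needed(n_instructions, width, n_stages=5):
    """Cycles to complete n independent instructions on an idealised pipeline
    that fetches and dispatches `width` instructions per cycle.
    Assumes no dependencies, stalls, or cache misses."""
    groups = math.ceil(n_instructions / width)   # fetch groups entering the pipeline
    return groups + n_stages - 1                 # last group still needs to drain

scalar = cycles_needed(10, width=1)        # 10 + 4 = 14 cycles
superscalar = cycles_needed(10, width=2)   # 5 + 4 = 9 cycles
print(scalar, superscalar)
```

Real programs rarely reach this bound, because dependent instructions cannot be dispatched together.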
13. OUT-OF-ORDER EXECUTION
Out-of-order execution (OoOE) is a technique
used in most high-performance microprocessors.
The key concept is to allow the processor to
avoid a class of delays that occur when the data
needed to perform an operation are unavailable.
Most modern CPU designs include support for
out-of-order execution.
14. STEPS
Out-of-order processors break up the processing of instructions into these steps:
Instruction fetch.
Instruction dispatch to an instruction queue (also called an instruction buffer).
The instruction waits in the queue until its input operands are available.
The instruction is issued to the appropriate functional unit and executed by that unit.
The results are queued in the re-order buffer.
Only after all older instructions have had their results written back to the register file
is this result written back to the register file.
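The steps above can be illustrated with a deliberately simplified timing model. It assumes unlimited functional units, that every instruction is already waiting in the queue at cycle 0, and that each destination register is written only once; the instruction tuples and latencies are made up for illustration.

```python
def simulate(instructions):
    """instructions: list of (dest, [srcs], latency) in program order.

    Returns three lists indexed by program position:
    issue cycle, completion cycle, and in-order write-back cycle."""
    ready_at = {}                 # register -> cycle its value becomes available
    issue, done, retire = [], [], []
    for dest, srcs, latency in instructions:
        # Issue as soon as all input operands are available
        # (registers never written here are treated as ready at cycle 0).
        start = max((ready_at.get(r, 0) for r in srcs), default=0)
        finish = start + latency
        ready_at[dest] = finish
        issue.append(start)
        done.append(finish)
        # Write back only after all older instructions have written back.
        retire.append(max(finish, retire[-1] if retire else 0))
    return issue, done, retire

program = [
    ("r1", [], 3),             # 0: r1 = load ...   (slow, e.g. a memory access)
    ("r2", ["r1"], 1),         # 1: r2 = r1 + 1     (must wait for r1)
    ("r3", ["r4", "r5"], 1),   # 2: r3 = r4 + r5    (independent of 0 and 1)
]
issue, done, retire = simulate(program)
print(issue, done, retire)
```

Instruction 2 issues and finishes before instruction 1, but its result is held until the older instructions have written back, so the register file is still updated in program order.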
15. OTHER ILP TECHNIQUES
Register renaming is a technique used to avoid unnecessary serialization of
program operations caused by the reuse of registers by those operations, in order to
enable out-of-order execution.
Speculative execution allows the execution of complete instructions or parts of
instructions before it is certain whether this execution is required.
Branch prediction is used to avoid stalling while control dependencies are
resolved. It determines whether a conditional branch (jump) in the
instruction flow of a program is likely to be taken or not.
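Of the techniques listed above, register renaming is the easiest to sketch in a few lines. This is a minimal illustration, not how any particular processor implements it: the physical register names `p0`, `p1`, ... and the `rename` helper are hypothetical, and the supply of physical registers is assumed to be unlimited.

```python
def rename(instructions):
    """instructions: list of (dest, [srcs]) using architectural register names.

    Gives every destination a fresh physical register so that reusing an
    architectural register no longer forces instructions to run in order."""
    mapping = {}        # architectural register -> latest physical register
    next_phys = 0
    renamed = []
    for dest, srcs in instructions:
        # Sources read the most recent version of each register; registers
        # not yet written here keep their architectural name.
        new_srcs = [mapping.get(s, s) for s in srcs]
        new_dest = f"p{next_phys}"
        next_phys += 1
        mapping[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed

program = [
    ("r1", ["r2", "r3"]),   # r1 = r2 + r3
    ("r4", ["r1"]),         # r4 = r1 * 2    (true dependency on r1)
    ("r1", ["r5", "r6"]),   # r1 = r5 + r6   (reuses r1: false dependency)
    ("r7", ["r1"]),         # r7 = r1 - 1
]
renamed = rename(program)
for line in renamed:
    print(line)
```

After renaming, the third instruction writes `p2` instead of overwriting the register that the second instruction still needs to read, so the two pairs of instructions can execute independently.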