ComputerComputer
ArchitectureArchitecture
Instruction-Level Parallel
Processors
Improve CPU performance by
• increasing clock rates
• (CPU running at even higher frequencies.)
• increasing the number of instructions to be executed in parallel
• (executing 6-10 instructions at the same time)
• What is the limit for these?
Pipeline (assembly line)
Result of pipeline (e.g.)
• Very long instruction word (VLIW) refers to processor
architectures designed to exploit instruction level
parallelism (ILP).
• VLIW processor allows programs to explicitly specify
instructions to execute at the same time, concurrently, in
parallel
VLIW (very long instruction
word,1024 bits!)
VLIW (very long instruction
word,1024 bits!)
Superscalar (sequential stream of
instructions)
superscalar processor is a CPU that
implements a form of parallelism called
instruction-level parallelism within a single
processor. It therefore allows for more
throughput (the number of instructions that
can be executed in a unit of time) than would
otherwise be possible at a given clock rate
Scalar Vs Superscalar
•In contrast to a scalar processor that can execute at
most one single instruction per clock cycle
• superscalar processor can execute more than one
instruction during a clock cycle by simultaneously
dispatching multiple instructions to different
execution units on the processor.
Superscalar (sequential stream of
instructions)
Flynn's taxonomy,
From Sequential instructions
to parallel execution
• Dependencies between instructions
• Instruction scheduling
• Preserving sequential consistency
Dependencies between instructions
Instructions often depend on each other in such a way that a particular
instruction cannot be executed until a preceding instruction or even two or
three preceding instructions have been executed.
1 Data dependencies
2 Control dependencies
3 Resource dependencies
Data dependencies
• Read after Write (RAW)
• Write after Read (WAR)
• Write after Write (WAW)
• Recurrences
Data dependencies in straight-line code
(RAW)
•RAW dependencies
•i1: load r1, a
•r2: add r2, r1, r1
•flow dependencies
•true dependencies
•cannot be abandoned
Data dependencies in straight-line code
(WAR)
•WAR dependencies
• i1: mul r1, r2, r3
• r2: add r2, r4, r5
•anti-dependencies
•false dependencies
•can be eliminated through register renaming
• i1: mul r1, r2, r3
• r2: add r6, r4, r5
• by using compiler or ILP-processor
Data dependencies in straight-line code
(WAW)
•WAW dependencies
• i1: mul r1, r2, r3
• r2: add r1, r4, r5
•output dependencies
•false dependencies
•can be eliminated through register renaming
• i1: mul r1, r2, r3
• r2: add r6, r4, r5
• by using compiler or ILP-processor
Data dependencies in loops
(recurrences)
for (int i=2; i<10; i++) {
x[i] = a*x[i-1] + b
}
• cannot be executed in parallel
S1. if (a == b)
S2. a = a + b
S3. b = a + b
Data dependencies in conditional
statement
Data dependency graphs
• i1: load r1, a;
• i2: load r2, b;
• i3: add r3, r1, r2; RAW -> δt
• i4: mul r1, r2, r4; WAR -> δa
• i5: div r1, r2, r4; WAW -> δo
i1 i2
i3
i4
i5
δt δt
δa
δo
Control dependencies
mul r1, r2, r3
jz zproc
:
zproc: load r1, x
:
•actual path of execution depends on the outcome
of multiplication
•impose dependencies on the logical subsequent
instructions
Control Dependency Graph
Resource dependencies
•An instruction is resource-dependent on a
previously issued instruction if it requires a
hardware resource which is still being used by a
previously issued instruction.
•e.g.
• div r1, r2, r3
• div r4, r2, r5

Computer Architecture Instruction-Level paraallel processors

  • 1.
  • 2.
    Improve CPU performanceby • increasing clock rates • (CPU running at even higher frequencies.) • increasing the number of instructions to be executed in parallel • (executing 6-10 instructions at the same time) • What is the limit for these?
  • 3.
  • 4.
  • 5.
    • Very longinstruction word (VLIW) refers to processor architectures designed to exploit instruction level parallelism (ILP). • VLIW processor allows programs to explicitly specify instructions to execute at the same time, concurrently, in parallel VLIW (very long instruction word,1024 bits!)
  • 6.
    VLIW (very longinstruction word,1024 bits!)
  • 7.
    Superscalar (sequential streamof instructions) superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. It therefore allows for more throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate
  • 8.
    Scalar Vs Superscalar •Incontrast to a scalar processor that can execute at most one single instruction per clock cycle • superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor.
  • 9.
  • 11.
  • 12.
    From Sequential instructions toparallel execution • Dependencies between instructions • Instruction scheduling • Preserving sequential consistency
  • 13.
    Dependencies between instructions Instructionsoften depend on each other in such a way that a particular instruction cannot be executed until a preceding instruction or even two or three preceding instructions have been executed. 1 Data dependencies 2 Control dependencies 3 Resource dependencies
  • 14.
    Data dependencies • Readafter Write (RAW) • Write after Read (WAR) • Write after Write (WAW) • Recurrences
  • 15.
    Data dependencies instraight-line code (RAW) •RAW dependencies •i1: load r1, a •r2: add r2, r1, r1 •flow dependencies •true dependencies •cannot be abandoned
  • 16.
    Data dependencies instraight-line code (WAR) •WAR dependencies • i1: mul r1, r2, r3 • r2: add r2, r4, r5 •anti-dependencies •false dependencies •can be eliminated through register renaming • i1: mul r1, r2, r3 • r2: add r6, r4, r5 • by using compiler or ILP-processor
  • 17.
    Data dependencies instraight-line code (WAW) •WAW dependencies • i1: mul r1, r2, r3 • r2: add r1, r4, r5 •output dependencies •false dependencies •can be eliminated through register renaming • i1: mul r1, r2, r3 • r2: add r6, r4, r5 • by using compiler or ILP-processor
  • 19.
    Data dependencies inloops (recurrences) for (int i=2; i<10; i++) { x[i] = a*x[i-1] + b } • cannot be executed in parallel S1. if (a == b) S2. a = a + b S3. b = a + b Data dependencies in conditional statement
  • 20.
    Data dependency graphs •i1: load r1, a; • i2: load r2, b; • i3: add r3, r1, r2; RAW -> δt • i4: mul r1, r2, r4; WAR -> δa • i5: div r1, r2, r4; WAW -> δo i1 i2 i3 i4 i5 δt δt δa δo
  • 21.
    Control dependencies mul r1,r2, r3 jz zproc : zproc: load r1, x : •actual path of execution depends on the outcome of multiplication •impose dependencies on the logical subsequent instructions
  • 22.
  • 23.
    Resource dependencies •An instructionis resource-dependent on a previously issued instruction if it requires a hardware resource which is still being used by a previously issued instruction. •e.g. • div r1, r2, r3 • div r4, r2, r5