Superscalar Processor
By
Manash Kumar Mondal
M.Tech. CSE. KUCSE
Contents
 What is superscalar processor ?
 Why Superscalar?
 Organization of superscalar processor
 Instruction dispatch
 Reservation station
 Reservation station: Centralized vs distributed
 Recorder buffer
 Instruction completion and Retire
 Limitations of superscalar processor
 References
Supersalar processor
A superscalar processor is a CPU that
implements a form of parallelism called
instruction-level parallelism within a single
processor.
Simple superscalar pipeline
By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be
completed. (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access,
WB = Register write back, i = Instruction number, t = Clock cycle [i.e., time])
Why superscalar?
 Most operations are on scalar quantities
 Improve these operations to get an overall improvement
 Superscalar processor executes multiple independent instructions in
parallel.
Superscalar Organization
Instruction Dispatch
Instructing
fetching
Instruction
decoding
FU2FU1 FUnFU5FU3 FU4
Instruction Dispatch
 Route decoded instructions to appropriate functional units
Reservation Station
 Reservation station decouple instruction decoding and instruction
execution .
 Main task: Dispatching -- Waiting --Issuing
Allocate unit
Issuing unit
ReadyDispatch
Issuing
Busy
Entrytobeissued
Entrytobeallocated
Reservation station
Reservation station: Centralised Vs Distributed
Dispatch
(Issue)
Dispatch buffer
Issue
Execute
Execute
Completion buffer
Fig: Centralize reservation station (Intel P6) Fig: Distributed reservation station (Power PC 620)
Reorder Buffer
 Contain all in–flight instruction
Includes instruction in RS + instruction executing in FUs + instruction
which are finished execution but waiting to be completed in program order
 Only finished and non-speculative instructions can be completed
Next instruction to
complete (head pointer)
Next entry to be allocated
(tail pointer)
0 0 0 1 1 1 1 1 1 1
In-flight-instruction
Busy
Issued
Finished
Instruction address
Rename register
Speculative
Valid
Instruction completion and Retire
Completion – finish the execution and update the machine state
Retire - update the memory
 A store may complete by writing to store buffer, but it retire
only when the data is written into the memory
 When an interrupt occurs, stop fetching new instructions and finish the
execution of all-in-flight instructions
 When an exception occurs, the result of the completion may no longer be
valid.
Limitation of superscalar processor
Instruction-fetch inefficiencies caused by both branch delays and
instruction misalignment
 not worthwhile to explore highly- concurrent execution
hardware, rather, it is more appropriate to explore economical
execution hardware
 degree of intrinsic parallelism in the instruction stream
(instructions requiring the same computational resources from
the CPU)
 complexity and time cost of the dispatcher and associated
dependency checking logic
 branch instruction processing.
References
1. Nptel lecture IIT Madras (online certification course) Superscalar processor
organization
2. https://en.wikipedia.org/wiki/Supercalar
3. https://www.slideshare.net/
4. [John_L._Hennessy,_David_A._Patterson]_Computer_Architecture , A
Quantitative
Approach, 4th Edition , (Digital copy) from bookfi.net.
Thank you !

Superscalar Processor

  • 1.
    Superscalar Processor By Manash KumarMondal M.Tech. CSE. KUCSE
  • 2.
    Contents  What issuperscalar processor ?  Why Superscalar?  Organization of superscalar processor  Instruction dispatch  Reservation station  Reservation station: Centralized vs distributed  Recorder buffer  Instruction completion and Retire  Limitations of superscalar processor  References
  • 3.
    Supersalar processor A superscalarprocessor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor.
  • 4.
    Simple superscalar pipeline Byfetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed. (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back, i = Instruction number, t = Clock cycle [i.e., time])
  • 5.
    Why superscalar?  Mostoperations are on scalar quantities  Improve these operations to get an overall improvement  Superscalar processor executes multiple independent instructions in parallel.
  • 6.
  • 7.
    Instruction Dispatch Instructing fetching Instruction decoding FU2FU1 FUnFU5FU3FU4 Instruction Dispatch  Route decoded instructions to appropriate functional units
  • 8.
    Reservation Station  Reservationstation decouple instruction decoding and instruction execution .  Main task: Dispatching -- Waiting --Issuing Allocate unit Issuing unit ReadyDispatch Issuing Busy Entrytobeissued Entrytobeallocated Reservation station
  • 9.
    Reservation station: CentralisedVs Distributed Dispatch (Issue) Dispatch buffer Issue Execute Execute Completion buffer Fig: Centralize reservation station (Intel P6) Fig: Distributed reservation station (Power PC 620)
  • 10.
    Reorder Buffer  Containall in–flight instruction Includes instruction in RS + instruction executing in FUs + instruction which are finished execution but waiting to be completed in program order  Only finished and non-speculative instructions can be completed Next instruction to complete (head pointer) Next entry to be allocated (tail pointer) 0 0 0 1 1 1 1 1 1 1 In-flight-instruction Busy Issued Finished Instruction address Rename register Speculative Valid
  • 11.
    Instruction completion andRetire Completion – finish the execution and update the machine state Retire - update the memory  A store may complete by writing to store buffer, but it retire only when the data is written into the memory  When an interrupt occurs, stop fetching new instructions and finish the execution of all-in-flight instructions  When an exception occurs, the result of the completion may no longer be valid.
  • 12.
    Limitation of superscalarprocessor Instruction-fetch inefficiencies caused by both branch delays and instruction misalignment  not worthwhile to explore highly- concurrent execution hardware, rather, it is more appropriate to explore economical execution hardware  degree of intrinsic parallelism in the instruction stream (instructions requiring the same computational resources from the CPU)  complexity and time cost of the dispatcher and associated dependency checking logic  branch instruction processing.
  • 13.
    References 1. Nptel lectureIIT Madras (online certification course) Superscalar processor organization 2. https://en.wikipedia.org/wiki/Supercalar 3. https://www.slideshare.net/ 4. [John_L._Hennessy,_David_A._Patterson]_Computer_Architecture , A Quantitative Approach, 4th Edition , (Digital copy) from bookfi.net.
  • 14.