SlideShare a Scribd company logo
Chapter 4 The Processor Cheng-Jung Tsai  Assistant Professor Department of Mathematics, National Changhua University of Education, Taiwan, R.O.C. Email:  [email_address] Modified from the instructor’s teaching material provided by Elsevier inc. Copyright 2009
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],§ 5 .1 Introduction
1.1  Introduction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],§4.1 Introduction
A basic MIPS implementation ,[object Object],[object Object],[object Object],[object Object],[object Object]
An overview of the implementation:  Instruction Execution ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
FIGURE 4.1 An abstract view of the implementation of the MIPS subset   branch on equal beq $1,$2,25 if ($1 == $2) go to PC+4+100 This figure  showing the major functional units and the major connections between them .  All instructions start by using the   program counter  to supply the instruction address to the  instruction memory . After the instruction is fetched,  the  register operands  used by an instruction are specified by fields of that instruction . Once the register operands have been fetched, they can be operated on  by  ALU   to  compute a memory address (for a load or store), to compute an arithmetic result (for an integer arithmetic-logical instruction), or a compare (for a branch).   If the instruction is an arithmetic-logical instruction, the result from the ALU must be  written to a register . If the operation is a load or store, the ALU result is used as an address to either  store a value from the registers  or  load a value from memory into the registers .  The result from the ALU or memory is written back into the register file .  Branches require the use of the ALU output to  determine the next instruction address , which  comes either from the ALU  (where the PC and branch offset are summed) or from an adder that  increments the current PC by 4 . The thick lines interconnecting the functional units represent  buses , which consist of multiple signals.
CPU Overview  &  Multiplexers  ( 多工器 ) ,[object Object],[object Object]
Multiplexors   in appendix C ,[object Object],[object Object],[object Object],[object Object],[object Object],AND OR S = 1    C = B S = 0    C = A
FIGURE 4.2 The basic implementation of the MIPS subset, including the necessary  multiplexors  and  control lines .   The  top multiplexor (“Mux”)  controls what value replaces the PC ( PC + 4 or the branch destination address ); the multiplexor  is controlled by the  gate that “ANDs ” together the  Zero output of the ALU  and  a control signal  that indicates that the instruction is a branch .  The middle multiplexor , whose output returns to the register file, is used to  steer the output of the ALU (in the case of an arithmetic-logical instruction) or the output of the data memory (in the case of a load) for writing into the register file . Finally, the  bottommost multiplexor  is used to  determine whether the second ALU input is from the registers (for an arithmetic-logical instruction OR a branch) or from the offset field of the instruction (for a load or store).  The added control lines are straightforward and determine the operation performed at the ALU, whether the data memory should read or write, and whether the registers should perform a write operation. The control lines are shown in color to make them easier to see.
4.2 Logic Design Conventions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],§4.2 Logic Design Conventions
Recall in Computer Science: Logic operations at the bit level Truth table:  a truth table defines the values of the output for each possible input or inputs. exclusive-or
Combinational Elements ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],A B Y I0 I1 Y M u x S A B Y + A B Y ALU F
Sequential Elements ,[object Object],[object Object],[object Object],D Clk Q Clk D Q
Clocking Methodology ,[object Object],[object Object],[object Object],[object Object],An edge-triggered methodology  allows a state element to be read and written in the same clock cycle  without creating a race that could lead to indeterminate data values.
4.3  Building a Datapath ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],§ 4 .3 Building a Datapath
Datapath element for each instruction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
FIGURE 4.6 A portion of the datapath used for fetching instructions and incrementing the program counter 32-bit register Increment by 4 for next instruction ,[object Object]
Two elements needed to implement  R-type  ALU operations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],op rs rt rd shamt funct 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
Two elements needed to implement R-type ALU operations ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],32 The operation to be performed by the ALU is controlled with the ALU operation signal, which will be  4 bits  wide use the Zero detection output of the ALU shortly to implement  branches  in this simple implementation .  5 bits  wide to specify one of the 32 registers
Two extra units needed to implement and  Load/Store   ,[object Object],[object Object],[object Object],[object Object],[object Object],I-type Has a 16-bit input that is sign-extended into a 32-bit result appearing on the output op rs rt constant or address 6 bits 5 bits 5 bits 16 bits
Recall:  Useful 3 shortcuts of 2’s complement numbers ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Datapath for  Branch Instructions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],op address 6 bits 26 bits
FIGURE 4.9: The datapath for a branch   Just re-routes wires Sign-bit wire replicated PC+4  or branch target
Creating a Single datapath:  Composing the Elements ,[object Object],[object Object],[object Object],[object Object]
Example in p.313 : building a datapath:  R-Type/Load/Store Datapath ,[object Object],[object Object],[object Object]
Figure 4.11  in p.315   Full Datapath PC+ 4 or branch target Arithmetic or address calculation  load or ALU result Give you 5 minutes to be familiar with this datapath & Please try the “Check Yourself” in P.315 What is missed in this Figure?  Control Unit
4.4 A Simple Implementation Scheme:  The  ALU Control ,[object Object],[object Object],[object Object],[object Object],[object Object],§ 4 .4 A Simple Implementation Scheme How to generate above 4-bit  ALU Control  input ? See next slide...... NOR 1100 set-on-less-than 0111 subtract 0110 add 0010 OR 0001 AND 0000 Function ALU control  lines
Generate the 4-bit  ALU Control  input ,[object Object],[object Object],[object Object],[object Object],6-bit
FIGURE 4.13: The truth table for the 4 ALU control bits  ,[object Object],[object Object],[object Object],[object Object]
Designing t he Main Control Unit ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],opcode always read read, except for load write for R-type and load
FIGURE 4.15 The datapath with all necessary multiplexors and all control lines   identified ,[object Object],[object Object],Used to decide  which operation is performed rs rt rd
FIGURE 4.16 & 4.18: The effect of each of the seven control signals 0 1
FIGURE 4.17: The simple datapath with the control unit.  The  PCsrc control line should be set if the instruction is branch on equal and the Zero output of ALU is true. Used to decide  which operation is performed Used to decide  if a branch is taken rs rt rd op address or f-code
FIGURE 4.22: implementation the truth table ,[object Object],[object Object]
Implementing Main Control in Appendix D
Operation of the Datapath for an R-type instruction:  add ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],op rs rt rd shamt funct 0 6 11 16 21 26 31 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
Fig. 4. 19 in p.324 Operation of Datapath:  add  Instruction Fetch at Start of  Add ,[object Object]
Operation of Datapath:  add Instruction Decode of Add ,[object Object],rs rt rd op address or shamt + f-code op rs rt rd shamt funct 0 6 11 16 21 26 31 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
Operation of Datapath:  add ALU Operation during Add ,[object Object]
Operation of Datapath:  add Write Back at the End of   Add ,[object Object]
Figure 4.19: Operation of Datapath in textbook: add
Figure 4.20: Datapath Operation for  I-type  instruction:  lw ,[object Object]
Figure 4.21:   Datapath Operation for  I-type  instruction:  beq ,[object Object],[object Object]
Implementing Jumps ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],J -type 000010 address  (26-bit immediate) 31:26 25:0
Figure 4.24:   Datapath With  Jumps Added
Why a Single-Cycle Implementation Is Not Used Today ? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Performance of Single-Cycle Machine-  Example  in p.315 of the 3-ed. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],Performance of Single-Cycle Machine-  Example  in p.315 of the 3-ed. Instruction Fetch Jump ALU Register Access Instruction Fetch Branch Memory Access ALU Register Access Instruction Fetch Store Word Register Access Memory Access ALU Register Access Instruction Fetch Load Word Register Access ALU Register Access Instruction Fetch R-Format Used Function Units Instruction class
Performance of Single-Cycle Machine-  Example  in p.315 of the 3-ed. ,[object Object],Memory units: 2 ns, ALU and adders:  2 ns, Register file (read or write):  1 ns. 2 ns 2 Jump 5 ns 2 1 2 Branch 7 ns 2 2 1 2 Store Word 8 ns 1 2 2 1 2 Load Word 6 ns 1 0 2 1 2 R-Format Total Register Write Data Memory ALU Operation Register Read Instruction Memory Instruction Class
Performance of Single-Cycle Machine-  Example  in p.315 of the 3-ed. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
4.5 An Overview of Pipelining ,[object Object],[object Object],[object Object],§4.5 An Overview of Pipelining ,[object Object],[object Object],[object Object],[object Object]
Pipelining Lessons ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],6 PM 7 8 9 T a s k O r d e r Time A B C D 30 40 40 40 40 20
MIPS Pipeline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Pipeline Performance : Example  in P.333 ,[object Object],[object Object],[object Object],[object Object],500ps 200ps 100 ps 200ps beq 600ps 100 ps 200ps 100 ps 200ps R-format 700ps 200ps 200ps 100 ps 200ps sw 800ps 100 ps 200ps 200ps 100 ps 200ps lw Total time Register write Memory access ALU op Register read Instr fetch Instr
Pipeline Performance : Fig. 4.27  in P.333 Single-cycle (T c = 800ps) Pipelined (T c = 200ps) Unbalanced stage length  reduce the speedup of pipeline
Pipeline Speedup ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Pipelining and ISA Design ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hazards ( 危障 ) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Structure Hazards ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Lw $4, 400 (s0) stall
Structural Hazard: Single Memory Mem I n s t r. O r d e r Time Load Instr 1 Instr 2 Instr 3 Instr 4 Reg Mem Reg Reg Mem Reg ALU Mem ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg ALU ALU Mem Reg Mem Reg
Data Hazards ,[object Object],[object Object],[object Object],means write means read Three bubbles Need $ s0
Solution for data hazard ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Load-Use Data Hazard ,[object Object],[object Object],[object Object],[object Object]
Complier:   Code Scheduling to Avoid Stalls  –  example in p.338 ,[object Object],[object Object],[object Object],lw $t1, 0($t0) lw $t2 , 4($t0) add $t3, $t1,  $t2 sw $t3, 12($t0) lw $t4 , 8($t0) add $t5, $t1,  $t4 sw $t5, 16($t0) stall stall lw $t1, 0($t0) lw $t2 , 4($t0) lw $t4 , 8($t0) add $t3, $t1,  $t2 sw $t3, 12($t0) add $t5, $t1,  $t4 sw $t5, 16($t0) 11 cycles 13 cycles
Control Hazards  (Branch  Hazards ) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Control Hazards  (Branch  Hazards ) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Solution 1:  Stall on Branch ,[object Object],[object Object]
Solution 2:  Branch Prediction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
More-Realistic Branch Prediction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
MIPS with  Predict Not Taken Prediction correct Prediction incorrect
Solution 3:  Delayed Branch Solution ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Pipeline Summary ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],The BIG Picture
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],4.6 Pipelined Datapath and Control §4.6 Pipelined Datapath and Control
[object Object],[object Object],Why all instructions have the same number of stages Clock Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 R-type R-type Load R-type R-type Ifetch Reg/Dec Exec Wr Ifetch Reg/Dec Exec Wr Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Wr Ifetch Reg/Dec Exec Wr Ops!  We have a problem!
[object Object],[object Object],[object Object],[object Object],Important Observation for pipeline Ifetch Reg/Dec Exec Mem Wr Load 1 2 3 4 5 Ifetch Reg/Dec Exec Wr R-type 1 2 3 4
MIPS Pipelined Datapath WB MEM Right-to-left flow leads to hazards FIGURE  4 . 33   Each step of the instruction can be mapped onto the datapath  from left to right . The only exceptions  (left-to-right)  are the update of the PC and the write-back step, which sends either the ALU result  (can lead to control hazard)  or the data from memory to the left to be written into the register file  (can lead to data hazard) Control hazard data hazard
FIGURE 4.35 Pipeline registers ,[object Object],[object Object],FIGURE 4.35  The  pipeline registers   are labeled by the stages that they separate; for example, the first is labeled   IF/ID .  The registers must be wide enough to store all the data corresponding to the lines that go through them . For example,  the  IF/ID register must be 64 bits wide,  because it must hold both the 32-bit instruction fetched from memory and the incremented 32-bit PC address . We will expand these registers over the course of this chapter, but for now  the other three pipeline registers contain 128, 97, and 64 bits, respectively .
Graphically representing pipelines ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Considering  load Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Load IF ID Exe Mem Wr
single-clock-cycle diagrams   Figure 4.36  IF for Load, Store, … instruction Read, PC+4 M eans read
ID for Load, Store, … FIGURE 4.36  Although the load  needs only the top register in stage 2, the processor  doesn’t know what instruction is being decoded , so it  sign-extends the 16-bit constant  and  reads both registers  into the ID/EX pipeline register.
EX E  for Load FIGURE 4.37  The register is added to the sign-extended immediate, and the sum is placed in the EX/MEM pipeline register.
MEM for Load FIGURE 4.38  Data memory is read using the address in the EX/MEM pipeline registers, and the data is placed in the MEM/WB pipeline register.
WB for Load FIGURE 4.38  Next, data is read from the MEM/WB pipeline register and written into the register file in the middle of the datapath.  Note: there is a bug in this design that is repaired in Figure 4.41. Who will supply this address?
FIGURE  4 . 41  Corrected Datapath for Load The write register number now comes from the MEM/WB pipeline register along with the data.  The register number is passed from the ID pipe stage until it reaches the MEM/WB pipeline register, adding  5  more bits to the last three pipeline registers . I-type
FIGURE 4.39  EX for Store Unlike the third stage of the load instruction,  the second register value is loaded into the EX/MEM  pipeline register to be used in the next stage .  ,[object Object],I-type
FIGURE 4.40  MEM for Store In the fourth stage, the data is  written into data memory for the store .  Note that the data comes from the EX/MEM pipeline register and that nothing is changed in the MEM/WB pipeline register.
FIGURE 4.40  WB for Store Once the data is written in memory,  there is nothing left for the store instruction to do, so   nothing happens in stage 5 .
Figure 4.43 in p.357 Multi-Cycle Pipeline  Diagram ,[object Object]
Figure 4.44 in p.357 Multi-Cycle Pipeline  Diagram ,[object Object],[object Object]
Figure 4.45 in p.358 Single-Cycle Pipeline  Diagram ,[object Object],[object Object],[object Object],[object Object]
FIGURE  4 . 46  Pipelined Control (Simplified  version ) FIGURE 4.46 The pipelined datapath of Figure 4.41 with the control signals identified.  This datapath borrows the control logic for PC source, register destination number, and ALU control from Section 4.4. Note that  we now  need the 6-bit function code of the instruction in the  EX stage  as input to ALU control , so these bits must also be included in the ID/EX pipeline register .  ,[object Object]
Recall:  FIGURE 4.17: The simple datapath with the control unit.  The  PCsrc control line should be set if the instruction is branch on equal and the Zero output of ALU is true. Used to decide  which operation is performed rs rt rd op address or f-code Used to decide  if a branch is taken
Pipelined Control ,[object Object],[object Object],[object Object],[object Object],[object Object],FIGURE 4.50 The control lines for the final three stages.  Note that  four of the nine  control lines are used in the  EX  phase, with  the remaining five control lines passed on  to the  EX/MEM  pipeline register extended to hold the control lines;  three  are used during the  MEM  stage, and the last  two  are passed to  MEM/WB  for use in the  WB  stage.
Pipelined Control   (cont.) ,[object Object],[object Object],[object Object],IF/ID Register ID/Ex Register Ex/MEM Register MEM/WB Register ID EX MEM ExtOp ALUOp RegDst ALUSrc Branch MemWr MemtoReg RegWr Main Control ExtOp ALUOp RegDst ALUSrc MemtoReg RegWr MemtoReg RegWr MemtoReg RegWr Branch MemWr Branch MemW WB
FIGURE 4.51  The pipelined datapath of Figure 4.46 with the control signals connected to the control portions of the pipe line registers.   The control values for the last three stages are  created during the   ID  stage and then placed in the  ID/EX pipeline register . The control lines for each pipe stage are used, and remaining control lines are then passed to the next pipeline stage.
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Supplement: Pipelined control (Write the control signals) -- Let’s Try it Out
Cycle 1
Cycle 2
Cycle 3
Cycle 4
Cycle 5
Cycle 6
Cycle 7 Fig. 6.34
Cycle 8 Fig. 6.34
Cycle 9
4.7  Data Hazards :  Forwarding  vs.  Stalling Data Hazards in ALU Instructions ,[object Object],[object Object],[object Object],[object Object],[object Object],§4.7 Data Hazards: Forwarding vs. Stalling
Dependencies & Forwarding ,[object Object]
Detecting the Need to Forward ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Fwd from EX/MEM pipeline reg Fwd from MEM/WB pipeline reg
Recall:  the example in slide 108 Detecting the Need to Forward ,[object Object],[object Object],[object Object]
Detecting the Need to Forward ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
FIGURE 4.54  Forwarding Paths  hardware On the top are the ALU and pipeline registers before adding forwarding.  On the bottom, the  multiplexors  have been expanded to add the forwarding paths , and we show the forwarding unit. This figure is a stylized drawing, how ever, leaving out details from the full datapath such as the sign extension hardware. Note that the ID/EX. RegisterRt field is shown twice, once to connect to the mux and once to the forwarding unit, but it is a single signal.
FIGURE  4 . 55  The control values for the forwarding multiplexors
Forwarding Conditions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
More complicated  Data Hazard : Double Data Hazard ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Revised Forwarding Condition  for double data hazard ,[object Object],[object Object],[object Object],[object Object],[object Object]
FIGURE  4 . 56  The datapath modified to resolve   hazards via forwarding
Load-Use Data Hazard Need to stall for one cycle
Load-Use Hazard Detection ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How to Stall the Pipeline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Stall/Bubble in the Pipeline Stall inserted here
Stall/Bubble in the Pipeline Or, more accurately…
Datapath with Hazard Detection
Stalls and Performance ,[object Object],[object Object],[object Object],[object Object],The BIG Picture
Branch Hazards ,[object Object],§4.8 Control Hazards PC Flush these instructions (Set control values to 0)
Reducing Branch Delay ,[object Object],[object Object],[object Object],[object Object],[object Object]
Example: Branch Taken
Example: Branch Taken
Data Hazards for Branches ,[object Object],… add  $4 , $5, $6 add  $1 , $2, $3 beq  $1 ,  $4 , target ,[object Object],IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB
Data Hazards for Branches ,[object Object],[object Object],beq  stalled IF ID ID EX MEM WB add  $4 , $5, $6 lw  $1 , addr beq  $1 ,  $4 , target IF ID EX MEM WB IF ID EX MEM WB
Data Hazards for Branches ,[object Object],[object Object],beq  stalled IF ID ID ID EX MEM WB beq  stalled lw  $1 , addr beq  $1 ,  $0 , target IF ID EX MEM WB
Dynamic Branch Prediction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
1-Bit Predictor: Shortcoming ,[object Object],outer: …   … inner: … … beq …, …, inner   …   beq …, …, outer ,[object Object],[object Object]
2-Bit Predictor ,[object Object]
Calculating the Branch Target ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Exceptions and Interrupts ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],§4.9 Exceptions
Handling Exceptions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
An Alternate Mechanism ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Handler Actions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Exceptions in a Pipeline ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Pipeline with Exceptions
Exception Properties ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Exception Example ,[object Object],[object Object],[object Object],[object Object]
Exception Example
Exception Example
Multiple Exceptions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Imprecise Exceptions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Instruction-Level Parallelism (ILP) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],§4.10  Parallelism and Advanced Instruction Level Parallelism
Multiple Issue ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Speculation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Compiler/Hardware Speculation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Speculation and Exceptions ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Static Multiple Issue ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Scheduling Static Multiple Issue ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
MIPS with Static Dual Issue ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],n + 20 n + 16 n + 12 n + 8 n + 4 n Address WB MEM EX ID IF Load/store WB MEM EX ID IF ALU/branch WB MEM EX ID IF Load/store WB MEM EX ID IF ALU/branch WB MEM EX ID IF Load/store WB MEM EX ID IF ALU/branch Pipeline Stages Instruction type
MIPS with Static Dual Issue
Hazards in the Dual-Issue MIPS ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Scheduling Example ,[object Object],Loop: lw  $t0 , 0($s1)  # $t0=array element   addu  $t0 ,  $t0 , $s2  # add scalar in $s2   sw  $t0 , 0($s1)  # store result   addi  $s1 , $s1,–4  # decrement pointer   bne  $s1 , $zero, Loop # branch $s1!=0 ,[object Object],4 sw  $t0 , 4($s1) bne  $s1 , $zero, Loop 3 nop addu  $t0 ,  $t0 , $s2 2 nop addi  $s1 , $s1,–4 1 lw  $t0 , 0($s1) nop Loop: cycle Load/store ALU/branch
Loop Unrolling ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Loop Unrolling Example ,[object Object],[object Object],3 lw  $t2 , 8($s1) addu  $t0 ,  $t0 , $s2 4 lw  $t3 , 4($s1) addu  $t1 ,  $t1 , $s2 5 sw  $t0 , 16($s1) addu  $t2 ,  $t2 , $s2 6 sw  $t1 , 12($s1) addu  $t3 ,  $t4 , $s2 8 sw  $t3 , 4($s1) bne  $s1 , $zero, Loop 7 sw  $t2 , 8($s1) nop 2 lw  $t1 , 12($s1) nop 1 lw  $t0 , 0($s1) addi  $s1 , $s1,–16 Loop: cycle Load/store ALU/branch
Dynamic Multiple Issue ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Dynamic Pipeline Scheduling ,[object Object],[object Object],[object Object],[object Object],[object Object]
Dynamically Scheduled CPU Results also sent to any waiting reservation stations Reorders buffer for register writes Can supply operands for issued instructions Preserves dependencies Hold pending operands
Register Renaming ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Speculation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Why Do Dynamic Scheduling? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Does Multiple Issue Work? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],The BIG Picture
Power Efficiency ,[object Object],[object Object],70W 8 No 1 6 1200MHz 2005 UltraSparc T1 90W 1 No 4 14 1950MHz 2003 UltraSparc III 75W 2 Yes 4 14 2930MHz 2006 Core 103W 1 Yes 3 31 3600MHz 2004 P4 Prescott 75W 1 Yes 3 22 2000MHz 2001 P4 Willamette 29W 1 Yes 3 10 200MHz 1997 Pentium Pro 10W 1 No 2 5 66MHz 1993 Pentium 5W 1 No 1 5 25MHz 1989 i486 Power Cores Out-of-order/ Speculation Issue width Pipeline Stages Clock Rate Year Microprocessor
The Opteron X4 Microarchitecture §4.11 Real Stuff: The AMD Opteron X4 (Barcelona) Pipeline 72 physical registers
The Opteron X4 Pipeline Flow ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Fallacies ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],§4.13 Fallacies and Pitfalls
Pitfalls ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Concluding Remarks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],§4.14 Concluding Remarks

More Related Content

What's hot

ADDRESSING MODES
ADDRESSING MODESADDRESSING MODES
ADDRESSING MODES
Sadaf Rasheed
 
Pipelining
PipeliningPipelining
Pipelining
AJAL A J
 
Processor Organization
Processor OrganizationProcessor Organization
Processor Organization
Dominik Salvet
 
Virtual memory
Virtual memoryVirtual memory
Virtual memory
tamizh arthanari
 
Cache memory
Cache memoryCache memory
Cache memory
Shailesh Tanwar
 
Processor organization & register organization
Processor organization & register organizationProcessor organization & register organization
Processor organization & register organization
Ghanshyam Patel
 
Micro programmed control
Micro programmed  controlMicro programmed  control
Micro programmed control
Shashank Singh
 
Computer architecture pipelining
Computer architecture pipeliningComputer architecture pipelining
Computer architecture pipelining
Mazin Alwaaly
 
Control Unit Design
Control Unit DesignControl Unit Design
Control Unit Design
Vinit Raut
 
Computer Organisation & Architecture (chapter 1)
Computer Organisation & Architecture (chapter 1) Computer Organisation & Architecture (chapter 1)
Computer Organisation & Architecture (chapter 1)
Subhasis Dash
 
Lecture1 - Computer Architecture
Lecture1 - Computer ArchitectureLecture1 - Computer Architecture
Lecture1 - Computer Architecture
Volodymyr Ushenko
 
Pipeline hazard
Pipeline hazardPipeline hazard
Pipeline hazard
AJAL A J
 
Processor Organization and Architecture
Processor Organization and ArchitectureProcessor Organization and Architecture
Processor Organization and Architecture
Vinit Raut
 
Instruction Set Architecture
Instruction Set ArchitectureInstruction Set Architecture
Instruction Set Architecture
Dilum Bandara
 
Microoperations
MicrooperationsMicrooperations
Microoperations
Rakesh Pillai
 
Micro Programmed Control Unit
Micro Programmed Control UnitMicro Programmed Control Unit
Micro Programmed Control Unit
Kamal Acharya
 
04 cache memory.ppt 1
04 cache memory.ppt 104 cache memory.ppt 1
04 cache memory.ppt 1
Anwal Mirza
 

What's hot (20)

ADDRESSING MODES
ADDRESSING MODESADDRESSING MODES
ADDRESSING MODES
 
Pipelining
PipeliningPipelining
Pipelining
 
Chapter 5 c
Chapter 5 cChapter 5 c
Chapter 5 c
 
Processor Organization
Processor OrganizationProcessor Organization
Processor Organization
 
Virtual memory
Virtual memoryVirtual memory
Virtual memory
 
Cache memory
Cache memoryCache memory
Cache memory
 
Processor organization & register organization
Processor organization & register organizationProcessor organization & register organization
Processor organization & register organization
 
Micro programmed control
Micro programmed  controlMicro programmed  control
Micro programmed control
 
Computer architecture pipelining
Computer architecture pipeliningComputer architecture pipelining
Computer architecture pipelining
 
Control Unit Design
Control Unit DesignControl Unit Design
Control Unit Design
 
Computer Organisation & Architecture (chapter 1)
Computer Organisation & Architecture (chapter 1) Computer Organisation & Architecture (chapter 1)
Computer Organisation & Architecture (chapter 1)
 
Lecture1 - Computer Architecture
Lecture1 - Computer ArchitectureLecture1 - Computer Architecture
Lecture1 - Computer Architecture
 
Pipeline hazard
Pipeline hazardPipeline hazard
Pipeline hazard
 
Processor Organization and Architecture
Processor Organization and ArchitectureProcessor Organization and Architecture
Processor Organization and Architecture
 
Instruction Set Architecture
Instruction Set ArchitectureInstruction Set Architecture
Instruction Set Architecture
 
Microoperations
MicrooperationsMicrooperations
Microoperations
 
Risc
RiscRisc
Risc
 
Micro Programmed Control Unit
Micro Programmed Control UnitMicro Programmed Control Unit
Micro Programmed Control Unit
 
Instruction code
Instruction codeInstruction code
Instruction code
 
04 cache memory.ppt 1
04 cache memory.ppt 104 cache memory.ppt 1
04 cache memory.ppt 1
 

Similar to Chapter 4 the processor

Hennchthree 161102111515
Hennchthree 161102111515Hennchthree 161102111515
Hennchthree 161102111515
marangburu42
 
Hennchthree
HennchthreeHennchthree
Hennchthree
marangburu42
 
Addressing modes (detailed data path)
Addressing modes (detailed data path)Addressing modes (detailed data path)
Addressing modes (detailed data path)Mahesh Kumar Attri
 
Unit iii
Unit iiiUnit iii
Unit iii
Janani S
 
Chapter 04 the processor
Chapter 04   the processorChapter 04   the processor
Chapter 04 the processorBảo Hoang
 
pdfslide.net_morris-mano-ppt.ppt
pdfslide.net_morris-mano-ppt.pptpdfslide.net_morris-mano-ppt.ppt
pdfslide.net_morris-mano-ppt.ppt
SaurabhPorwal14
 
CAO-Unit-I.pptx
CAO-Unit-I.pptxCAO-Unit-I.pptx
CAO-Unit-I.pptx
ClassicFUKRA
 
Datapath Design of Computer Architecture
Datapath Design of Computer ArchitectureDatapath Design of Computer Architecture
Datapath Design of Computer Architecture
Abu Zaman
 
IntroductionCPU performance factorsInstruction countDeterm.docx
IntroductionCPU performance factorsInstruction countDeterm.docxIntroductionCPU performance factorsInstruction countDeterm.docx
IntroductionCPU performance factorsInstruction countDeterm.docx
normanibarber20063
 
Computer Organization and 8085 microprocessor notes
Computer Organization and 8085 microprocessor notesComputer Organization and 8085 microprocessor notes
Computer Organization and 8085 microprocessor notes
Lakshmi Sarvani Videla
 
4bit PC report
4bit PC report4bit PC report
4bit PC report
tanvin
 
4bit pc report[cse 08-section-b2_group-02]
4bit pc report[cse 08-section-b2_group-02]4bit pc report[cse 08-section-b2_group-02]
4bit pc report[cse 08-section-b2_group-02]
shibbirtanvin
 
THE PROCESSOR
THE PROCESSORTHE PROCESSOR
THE PROCESSOR
Jai Sudhan
 
Data Manipulation
Data ManipulationData Manipulation
Data ManipulationAsfi Bhai
 
Unit 3 The processor
Unit 3 The processorUnit 3 The processor
Unit 3 The processor
Balaji Vignesh
 
ITEC582-Chapter 12.pptx
ITEC582-Chapter 12.pptxITEC582-Chapter 12.pptx
ITEC582-Chapter 12.pptx
SabaNaeem26
 
basic-processing-unit computer organ.ppt
basic-processing-unit computer organ.pptbasic-processing-unit computer organ.ppt
basic-processing-unit computer organ.ppt
ssuser702532
 
Computer Organization for third semester Vtu SyllabusModule 4.ppt
Computer Organization  for third semester Vtu SyllabusModule 4.pptComputer Organization  for third semester Vtu SyllabusModule 4.ppt
Computer Organization for third semester Vtu SyllabusModule 4.ppt
ShilpaKc3
 

Similar to Chapter 4 the processor (20)

Hennchthree 161102111515
Hennchthree 161102111515Hennchthree 161102111515
Hennchthree 161102111515
 
Hennchthree
HennchthreeHennchthree
Hennchthree
 
Addressing modes (detailed data path)
Addressing modes (detailed data path)Addressing modes (detailed data path)
Addressing modes (detailed data path)
 
Unit iii
Unit iiiUnit iii
Unit iii
 
Chapter 04 the processor
Chapter 04   the processorChapter 04   the processor
Chapter 04 the processor
 
pdfslide.net_morris-mano-ppt.ppt
pdfslide.net_morris-mano-ppt.pptpdfslide.net_morris-mano-ppt.ppt
pdfslide.net_morris-mano-ppt.ppt
 
CAO-Unit-I.pptx
CAO-Unit-I.pptxCAO-Unit-I.pptx
CAO-Unit-I.pptx
 
Datapath Design of Computer Architecture
Datapath Design of Computer ArchitectureDatapath Design of Computer Architecture
Datapath Design of Computer Architecture
 
IntroductionCPU performance factorsInstruction countDeterm.docx
IntroductionCPU performance factorsInstruction countDeterm.docxIntroductionCPU performance factorsInstruction countDeterm.docx
IntroductionCPU performance factorsInstruction countDeterm.docx
 
Computer Organization and 8085 microprocessor notes
Computer Organization and 8085 microprocessor notesComputer Organization and 8085 microprocessor notes
Computer Organization and 8085 microprocessor notes
 
4bit PC report
4bit PC report4bit PC report
4bit PC report
 
4bit pc report[cse 08-section-b2_group-02]
4bit pc report[cse 08-section-b2_group-02]4bit pc report[cse 08-section-b2_group-02]
4bit pc report[cse 08-section-b2_group-02]
 
THE PROCESSOR
THE PROCESSORTHE PROCESSOR
THE PROCESSOR
 
Data Manipulation
Data ManipulationData Manipulation
Data Manipulation
 
Unit 3 The processor
Unit 3 The processorUnit 3 The processor
Unit 3 The processor
 
Bc0040
Bc0040Bc0040
Bc0040
 
ITEC582-Chapter 12.pptx
ITEC582-Chapter 12.pptxITEC582-Chapter 12.pptx
ITEC582-Chapter 12.pptx
 
basic-processing-unit computer organ.ppt
basic-processing-unit computer organ.pptbasic-processing-unit computer organ.ppt
basic-processing-unit computer organ.ppt
 
Computer Organization for third semester Vtu SyllabusModule 4.ppt
Computer Organization  for third semester Vtu SyllabusModule 4.pptComputer Organization  for third semester Vtu SyllabusModule 4.ppt
Computer Organization for third semester Vtu SyllabusModule 4.ppt
 
Lect11 organization
Lect11 organizationLect11 organization
Lect11 organization
 

Recently uploaded

Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
amberjdewit93
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
Dr. Shivangi Singh Parihar
 
What is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptxWhat is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptx
christianmathematics
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
ak6969907
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
IreneSebastianRueco1
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
Celine George
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
deeptiverma2406
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
Bisnar Chase Personal Injury Attorneys
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Excellence Foundation for South Sudan
 
Fresher’s Quiz 2023 at GMC Nizamabad.pptx
Fresher’s Quiz 2023 at GMC Nizamabad.pptxFresher’s Quiz 2023 at GMC Nizamabad.pptx
Fresher’s Quiz 2023 at GMC Nizamabad.pptx
SriSurya50
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 

Recently uploaded (20)

Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
What is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptxWhat is the purpose of studying mathematics.pptx
What is the purpose of studying mathematics.pptx
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
 
World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024World environment day ppt For 5 June 2024
World environment day ppt For 5 June 2024
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
 
Top five deadliest dog breeds in America
Top five deadliest dog breeds in AmericaTop five deadliest dog breeds in America
Top five deadliest dog breeds in America
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
Your Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective UpskillingYour Skill Boost Masterclass: Strategies for Effective Upskilling
Your Skill Boost Masterclass: Strategies for Effective Upskilling
 
Fresher’s Quiz 2023 at GMC Nizamabad.pptx
Fresher’s Quiz 2023 at GMC Nizamabad.pptxFresher’s Quiz 2023 at GMC Nizamabad.pptx
Fresher’s Quiz 2023 at GMC Nizamabad.pptx
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 

Chapter 4 the processor

  • 1. Chapter 4 The Processor Cheng-Jung Tsai Assistant Professor Department of Mathematics, National Changhua University of Education, Taiwan, R.O.C. Email: [email_address] Modified from the instructor’s teaching material provided by Elsevier inc. Copyright 2009
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. FIGURE 4.1 An abstract view of the implementation of the MIPS subset branch on equal beq $1,$2,25 if ($1 == $2) go to PC+4+100 This figure showing the major functional units and the major connections between them . All instructions start by using the program counter to supply the instruction address to the instruction memory . After the instruction is fetched, the register operands used by an instruction are specified by fields of that instruction . Once the register operands have been fetched, they can be operated on by ALU to compute a memory address (for a load or store), to compute an arithmetic result (for an integer arithmetic-logical instruction), or a compare (for a branch). If the instruction is an arithmetic-logical instruction, the result from the ALU must be written to a register . If the operation is a load or store, the ALU result is used as an address to either store a value from the registers or load a value from memory into the registers . The result from the ALU or memory is written back into the register file . Branches require the use of the ALU output to determine the next instruction address , which comes either from the ALU (where the PC and branch offset are summed) or from an adder that increments the current PC by 4 . The thick lines interconnecting the functional units represent buses , which consist of multiple signals.
  • 7.
  • 8.
  • 9. FIGURE 4.2 The basic implementation of the MIPS subset, including the necessary multiplexors and control lines . The top multiplexor (“Mux”) controls what value replaces the PC ( PC + 4 or the branch destination address ); the multiplexor is controlled by the gate that “ANDs ” together the Zero output of the ALU and a control signal that indicates that the instruction is a branch . The middle multiplexor , whose output returns to the register file, is used to steer the output of the ALU (in the case of an arithmetic-logical instruction) or the output of the data memory (in the case of a load) for writing into the register file . Finally, the bottommost multiplexor is used to determine whether the second ALU input is from the registers (for an arithmetic-logical instruction OR a branch) or from the offset field of the instruction (for a load or store). The added control lines are straightforward and determine the operation performed at the ALU, whether the data memory should read or write, and whether the registers should perform a write operation. The control lines are shown in color to make them easier to see.
  • 10.
  • 11. Recall in Computer Science: Logic operations at the bit level Truth table: a truth table defines the values of the output for each possible input or inputs. exclusive-or
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23. FIGURE 4.9: The datapath for a branch Just re-routes wires Sign-bit wire replicated PC+4 or branch target
  • 24.
  • 25.
  • 26. Figure 4.11 in p.315 Full Datapath PC+ 4 or branch target Arithmetic or address calculation load or ALU result Give you 5 minutes to be familiar with this datapath & Please try the “Check Yourself” in P.315 What is missed in this Figure? Control Unit
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32. FIGURE 4.16 & 4.18: The effect of each of the seven control signals 0 1
  • 33. FIGURE 4.17: The simple datapath with the control unit. The PCsrc control line should be set if the instruction is branch on equal and the Zero output of ALU is true. Used to decide which operation is performed Used to decide if a branch is taken rs rt rd op address or f-code
  • 34.
  • 35. Implementing Main Control in Appendix D
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41. Figure 4.19: Operation of Datapath in textbook: add
  • 42.
  • 43.
  • 44.
  • 45. Figure 4.24: Datapath With Jumps Added
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55. Pipeline Performance : Fig. 4.27 in P.333 Single-cycle (T c = 800ps) Pipelined (T c = 200ps) Unbalanced stage length reduce the speedup of pipeline
  • 56.
  • 57.
  • 58.
  • 59.
  • 60. Structural Hazard: Single Memory Mem I n s t r. O r d e r Time Load Instr 1 Instr 2 Instr 3 Instr 4 Reg Mem Reg Reg Mem Reg ALU Mem ALU Mem Reg Mem Reg ALU Mem Reg Mem Reg ALU ALU Mem Reg Mem Reg
  • 61.
  • 62.
  • 63.
  • 64.
  • 65.
  • 66.
  • 67.
  • 68.
  • 69.
  • 70. MIPS with Predict Not Taken Prediction correct Prediction incorrect
  • 71.
  • 72.
  • 73.
  • 74.
  • 75.
  • 76. MIPS Pipelined Datapath WB MEM Right-to-left flow leads to hazards FIGURE 4 . 33 Each step of the instruction can be mapped onto the datapath from left to right . The only exceptions (left-to-right) are the update of the PC and the write-back step, which sends either the ALU result (can lead to control hazard) or the data from memory to the left to be written into the register file (can lead to data hazard) Control hazard data hazard
  • 77.
  • 78.
  • 79.
  • 80. single-clock-cycle diagrams Figure 4.36 IF for Load, Store, … instruction Read, PC+4 M eans read
  • 81. ID for Load, Store, … FIGURE 4.36 Although the load needs only the top register in stage 2, the processor doesn’t know what instruction is being decoded , so it sign-extends the 16-bit constant and reads both registers into the ID/EX pipeline register.
  • 82. EX E for Load FIGURE 4.37 The register is added to the sign-extended immediate, and the sum is placed in the EX/MEM pipeline register.
  • 83. MEM for Load FIGURE 4.38 Data memory is read using the address in the EX/MEM pipeline registers, and the data is placed in the MEM/WB pipeline register.
  • 84. WB for Load FIGURE 4.38 Next, data is read from the MEM/WB pipeline register and written into the register file in the middle of the datapath. Note: there is a bug in this design that is repaired in Figure 4.41. Who will supply this address?
  • 85. FIGURE 4 . 41 Corrected Datapath for Load The write register number now comes from the MEM/WB pipeline register along with the data. The register number is passed from the ID pipe stage until it reaches the MEM/WB pipeline register, adding 5 more bits to the last three pipeline registers . I-type
  • 86.
  • 87. FIGURE 4.40 MEM for Store In the fourth stage, the data is written into data memory for the store . Note that the data comes from the EX/MEM pipeline register and that nothing is changed in the MEM/WB pipeline register.
  • 88. FIGURE 4.40 WB for Store Once the data is written in memory, there is nothing left for the store instruction to do, so nothing happens in stage 5 .
  • 89.
  • 90.
  • 91.
  • 92.
  • 93. Recall: FIGURE 4.17: The simple datapath with the control unit. The PCsrc control line should be set if the instruction is branch on equal and the Zero output of ALU is true. Used to decide which operation is performed rs rt rd op address or f-code Used to decide if a branch is taken
  • 94.
  • 95.
  • 96. FIGURE 4.51 The pipelined datapath of Figure 4.46 with the control signals connected to the control portions of the pipe line registers. The control values for the last three stages are created during the ID stage and then placed in the ID/EX pipeline register . The control lines for each pipe stage are used, and remaining control lines are then passed to the next pipeline stage.
  • 97.
  • 104. Cycle 7 Fig. 6.34
  • 105. Cycle 8 Fig. 6.34
  • 107.
  • 108.
  • 109.
  • 110.
  • 111.
  • 112. FIGURE 4.54 Forwarding Paths hardware On the top are the ALU and pipeline registers before adding forwarding. On the bottom, the multiplexors have been expanded to add the forwarding paths , and we show the forwarding unit. This figure is a stylized drawing, how ever, leaving out details from the full datapath such as the sign extension hardware. Note that the ID/EX. RegisterRt field is shown twice, once to connect to the mux and once to the forwarding unit, but it is a single signal.
  • 113. FIGURE 4 . 55 The control values for the forwarding multiplexors
  • 114.
  • 115.
  • 116.
  • 117. FIGURE 4 . 56 The datapath modified to resolve hazards via forwarding
  • 118. Load-Use Data Hazard Need to stall for one cycle
  • 119.
  • 120.
  • 121. Stall/Bubble in the Pipeline Stall inserted here
  • 122. Stall/Bubble in the Pipeline Or, more accurately…
  • 123. Datapath with Hazard Detection
  • 124.
  • 125.
  • 126.
  • 129.
  • 130.
  • 131.
  • 132.
  • 133.
  • 134.
  • 135.
  • 136.
  • 137.
  • 138.
  • 139.
  • 140.
  • 142.
  • 143.
  • 146.
  • 147.
  • 148.
  • 149.
  • 150.
  • 151.
  • 152.
  • 153.
  • 154.
  • 155.
  • 156. MIPS with Static Dual Issue
  • 157.
  • 158.
  • 159.
  • 160.
  • 161.
  • 162.
  • 163. Dynamically Scheduled CPU Results also sent to any waiting reservation stations Reorders buffer for register writes Can supply operands for issued instructions Preserves dependencies Hold pending operands
  • 164.
  • 165.
  • 166.
  • 167.
  • 168.
  • 169. The Opteron X4 Microarchitecture §4.11 Real Stuff: The AMD Opteron X4 (Barcelona) Pipeline 72 physical registers
  • 170.
  • 171.
  • 172.
  • 173.

Editor's Notes

  1. 20 May 2010
  2. 20 May 2010
  3. 20 May 2010 OK, let get on with today lecture by looking at the simple add instruction. In terms of Register Transfer Language, this is what the Add instruction need to do. First, you need to fetch the instruction from Memory. Then you perform the actual add operation. More specifically: (a) You add the contents of the register specified by the Rs and Rt fields of the instruction. (b) Then you write the results to the register specified by the Rd field. And finally, you need to update the program counter to point to the next instruction. Now, let take a detail look at the datapath during various phase of this instruction. +2 = 10 min. (X:50)
  4. 20 May 2010
  5. 20 May 2010
  6. 20 May 2010
  7. 20 May 2010
  8. 20 May 2010
  9. 20 May 2010
  10. 20 May 2010 Well, the last slide pretty much illustrate one of the biggest disadvantage of the single cycle implementation: it has a long cycle time. More specifically, the cycle time must be long enough for the load instruction which has the following components: Clock to Q time of the PC, .... Having a long cycle time is a big problem but not the the only problem. Another problem of this single cycle implementation is that this cycle time, which is long enough for the load instruction, is too long for all other instructions. We will show you why this is bad and what we can do about it in the next few lectures. That all for today. +2 = 79 min (Y:59)
  11. 20 May 2010
  12. 20 May 2010
  13. 20 May 2010
  14. 20 May 2010 What happened if we try to pipeline the R-type instructions with the Load instructions? Well, we have a problem here!!! We end up having two instructions trying to write to the register file at the same time! Why do we have this problem (the write ubble?? Well, the reason for this problem is that there is something I have not yet told you. +1 = 16 min. (X:56)
  15. 20 May 2010 I already told you that in order for pipeline to work perfectly, each functional unit can ONLY be used once per instruction. What I have not told you is that this (1st bullet) is a necessary but NOT sufficient condition for pipeline to work. The other condition to prevent pipeline hiccup is that each functional unit must be used at the same stage for all instructions. For example here, the load instruction uses the Register File Wr port during its 5th stage but the R-type instruction right now will use the Register File port during its 4th stage. This (5 versus 4) is what caused our problem. How do we solve it? We have 2 solutions. +1 = 17 min. (X:57)
  16. 20 May 2010 As shown here, each of these five steps will take one clock cycle to complete. And in pipeline terminology, each step is referred to as one stage of the pipeline. +1 = 8 min. (X:48)
  17. 20 May 2010 The main control here is identical to the one in the single cycle processor. It generate all the control signals necessary for a given instruction during that instruction Reg/Decode stage. All these control signals will be saved in the ID/Exec pipeline register at the end of the Reg/Decode cycle. The control signals for the Exec stage (ALUSrc, ... etc.) come from the output of the ID/Exec register. That is they are delayed ONE cycle from the cycle they are generated. The rest of the control signals that are not used during the Exec stage is passed down the pipeline and saved in the Exec/Mem register. The control signals for the Mem stage (MemWr, Branch) come from the output of the Exec/Mem register. That is they are delayed two cycles from the cycle they are generated. Finally, the control signals for the Wr stage (MemtoReg & RegWr) come from the output of the Exec/Wr register: they are delayed three cycles from the cycle they are generated. +2 = 45 min. (Y:45)
  18. 20 May 2010