2. 11/17/2019 2
The CPU
• Processor (CPU): the active part of the computer,
which does all the work (data manipulation and
decision-making)
– Datapath: portion of the processor which contains
hardware necessary to perform all operations
required by the computer
– Control: portion of the processor (also in
hardware) which tells the datapath what needs to
be done (the brain)
4. 11/17/2019 4
Abstract View of the DataPath
• The data path contains 2 types of logic elements:
– Combinational: Elements that operate on data values. Their
outputs depend only on their current inputs. The ALU is a combinational
element.
– State: Elements with internal storage. Their state is defined
by the values they contain (memory and registers).
[Figure: abstract view of the datapath, showing the PC, the instruction memory (Address, Instruction), the registers (Register #, Data), the ALU, and the data memory (Address, Data).]
9. 11/17/2019 9
Instruction Datapath
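[Figure: instruction fetch datapath. The PC drives the Read address input of the instruction memory, which outputs the Instruction; an Add unit adds 4 to the PC.]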
• Instructions will be held in
the instruction memory
• The instruction to fetch is at
the location specified by the
PC
– Instr. = M[PC]
Note: The regular instruction width
(32 bits for MIPS) makes this easy
• After we fetch one
instruction, the PC must be
incremented to the next
instruction
• All instructions are 4 bytes
• PC = PC + 4
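The fetch step above can be sketched in a few lines of Python. This is only an illustration: the dictionary-based instruction memory and the name `fetch` are assumptions made for the example, and the two stored words are arbitrary placeholders.

```python
INSTRUCTION_BYTES = 4  # all instructions are 4 bytes, so the PC advances by 4


def fetch(instr_mem, pc):
    """Return (instruction, next_pc): Instr = M[PC], then PC = PC + 4."""
    instruction = instr_mem[pc]
    next_pc = (pc + INSTRUCTION_BYTES) & 0xFFFFFFFF  # keep the PC within 32 bits
    return instruction, next_pc


# Hypothetical instruction memory: byte address -> 32-bit instruction word.
instr_mem = {0x0000: 0x01285020, 0x0004: 0x8E6901C2}

instr, pc = fetch(instr_mem, 0x0000)
second_instr, second_pc = fetch(instr_mem, pc)
```

Calling `fetch` repeatedly with the returned PC walks through the instruction memory one word at a time, which is exactly what the PC and the "+4" adder do in hardware.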
10. 11/17/2019 10
R-type Instruction Datapath
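[Figure: R-type datapath. The Instruction supplies Read reg. num A, Read reg. num B, and Write reg. num to the Registers block; Read reg. data A and Read reg. data B feed the ALU, whose Result returns to Write reg. data. The ALU also has a Zero output.]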
• R-type Instructions have three registers
– Two read (Rs, Rt) to provide data to the ALU
– One write (Rd) to receive data from the ALU
• We’ll need to specify the operation to the ALU (later...)
• We might be interested if the result of the ALU is zero (later...)
11. 11/17/2019 11
Memory Operations
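[Figure: memory-operation datapath. The 16-bit immediate is sign-extended to 32 bits and added to a register value by the ALU; the ALU Result drives the Read address / Write address of the Data Memory, which also has Write data and Read data ports.]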
• Memory operations first need to compute the effective address
– LW $t1, 450($s3) # E.A. = 450 + $s3
– Add together one register and 16 bits of immediate data
– Immediate data needs to be sign-extended from 16 bits to 32 bits
• Memory then performs the load or store: a load writes the data read into
the destination register, a store writes a register's value to memory
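As a rough sketch of the effective-address computation, the following Python fragment sign-extends a 16-bit immediate and adds it to a base register value. The function names are made up for this example, and the base value 1000 is an arbitrary assumption standing in for the contents of $s3.

```python
def sign_extend_16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    """E.A. = base register + sign-extended 16-bit immediate (32-bit wraparound)."""
    return (base + sign_extend_16(imm16)) & 0xFFFFFFFF


# LW $t1, 450($s3) with an assumed value of 1000 in $s3.
ea_positive = effective_address(1000, 450)
# A negative offset (0xFFFC is -4 in 16-bit two's complement).
ea_negative = effective_address(1000, 0xFFFC)
```

Sign extension matters for the second case: without it, 0xFFFC would be treated as 65532 rather than -4.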
12. 11/17/2019 12
Branches
[Figure: branch datapath. The 16-bit offset from the Instruction is sign-extended to 32 bits, shifted left by 2, and added to PC + 4; the ALU's Zero output goes to the control logic.]
• Branches conditionally
change the next instruction
– BEQ $2, $1, 42
– The offset is specified as
the number of words to
be added to the address of
the next instruction (PC+4)
• Control logic has to decide if
the branch is taken
– Uses ‘zero’ output of ALU
• Take offset, multiply by 4
– Shift left two
• Add this to PC+4 (from PC
logic)
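The branch-target arithmetic described above can be sketched as follows. The function names and the `is_beq` / `zero` parameters are assumptions chosen for this illustration; the starting PC value is arbitrary.

```python
def sign_extend_16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, offset16):
    """Target = (PC + 4) + (sign-extended word offset shifted left by two)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    byte_offset = sign_extend_16(offset16) << 2  # multiply the word offset by 4
    return (pc_plus_4 + byte_offset) & 0xFFFFFFFF


def next_pc(pc, offset16, is_beq, zero):
    """Take the branch only when the instruction is BEQ and the ALU reports zero."""
    if is_beq and zero:
        return branch_target(pc, offset16)
    return (pc + 4) & 0xFFFFFFFF


# BEQ $2, $1, 42 located at an assumed address of 0x1000.
taken = next_pc(0x1000, 42, is_beq=True, zero=True)
not_taken = next_pc(0x1000, 42, is_beq=True, zero=False)
```

With an offset of 42 words, the taken target is 0x1004 + 168 bytes; when the registers differ (Zero is 0), execution simply continues at PC + 4.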
13. 11/17/2019 13
Integrating the R-types and Memory
• R-types and Load/Stores are similar in many respects
• Differences:
– 2nd ALU source: R-types use a register, I-types (loads/stores) use the
immediate
– Register write data: R-types use the ALU result, loads use data memory
• Mux the conflicting datapaths together
[Figure: combined R-type and memory datapath. One mux selects the second ALU operand (Read reg. data B or the sign-extended immediate); a second mux selects the register write data (ALU Result or Data Memory Read data).]
14. 11/17/2019 14
Adding the instruction memory
[Figure: the combined datapath with the PC, the "+4" adder, and the instruction memory (Read address, Instruction [31-0]) placed in front of the register file, ALU, and data memory.]
Simply add the instruction memory
and PC to the beginning of the datapath.
15. 11/17/2019 15
Adding the Branch Datapath
[Figure: the datapath with the branch logic added. The sign-extended immediate is shifted left by 2 and added to PC + 4; a mux selects between PC + 4 and the branch target as the next PC.]
Now we have the datapath for R-type, I-type, and branch instructions.
On to the control logic!
16. 11/17/2019 16
When does everything happen?
[Figure: the complete single-cycle datapath, with its clk inputs marked.]
• Combinational logic: just does it! Outputs are
always a function of the inputs (with some delay)
• Registers: written at the end of the clock cycle
(rising-edge triggered)
• Single-cycle design
17. 11/17/2019 17
What do we need to control?
[Figure: the single-cycle datapath, annotated with the points that need control:]
– ALU: what is the operation?
– Memory: read, write, or neither?
– Mux: are we branching or not?
– Mux: where does the 2nd ALU operand come from?
– Registers: should we write data?
– Mux: result from ALU or memory?
Almost all of the information we need is in the instruction!
18. 11/17/2019 18
The ALU
• The ALU is stuck right in the middle of everything...
• It must:
– Add, subtract, AND, or OR for arithmetic and logic instructions
– Subtract for a branch on equal
– Subtract and set for a SLT
– Add for a memory access
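[Figure: one bit slice of the ALU, with inputs A, B, BInvert, CarryIn, and Less, an adder producing CarryOut, and a mux controlled by Operation that selects the Result.]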
Function   BInvert  Op  CarryIn  Result
And        0        00  0        R = A • B
Or         0        01  0        R = A ∨ B
Add        0        10  0        R = A + B
Subtract   1        10  1        R = A - B
SLT        1        11  1        R = 1 if A < B, 0 if A ≥ B
BInvert and CarryIn are always the same: combine them into one signal called “sub”
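A behavioural sketch of this ALU in Python may make the table easier to check. It models a 32-bit ALU driven by the combined "sub" signal and the 2-bit Op field. The function names are assumptions for this example, and the SLT case is simplified to a direct signed comparison rather than modelling the Less input of each bit slice.

```python
MASK = 0xFFFFFFFF  # 32-bit datapath


def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK
    return x - 0x100000000 if x & 0x80000000 else x


def alu(a, b, sub, op):
    """Return (result, zero) for the given 'sub' (BInvert = CarryIn) and Op code."""
    a &= MASK
    b &= MASK
    b_in = (~b & MASK) if sub else b      # BInvert complements B
    adder = (a + b_in + sub) & MASK       # CarryIn = sub, so sub=1 gives A - B
    if op == 0b00:
        result = a & b_in                 # And
    elif op == 0b01:
        result = a | b_in                 # Or
    elif op == 0b10:
        result = adder                    # Add or Subtract
    else:
        # SLT: simplified here to a signed comparison (an assumption of this sketch).
        result = 1 if to_signed(a) < to_signed(b) else 0
    return result, result == 0


add_result = alu(5, 3, sub=0, op=0b10)
sub_result = alu(5, 3, sub=1, op=0b10)
equal_check = alu(7, 7, sub=1, op=0b10)   # BEQ uses subtract and looks at zero
slt_result = alu(3, 5, sub=1, op=0b11)
```

Inverting B and setting the carry-in to 1 computes A + (not B) + 1, which is A - B in two's complement; that is why the two signals can share one "sub" wire.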
19. 11/17/2019 19
Setting the ALU controls
• The instruction Opcode and Function give us the info we need
– For R-type instructions, Opcode is zero, function code
determines ALU controls
– For I-type instructions, Opcode determines ALU controls
• New control signal: ALUOp is 00 for memory, 01 for Branch, and 10 for R-type
                                                            ALU control
Instruction    Opcode  ALUOp  Funct. Code  ALU action    sub  op
add            R-type  10     100000       add           0    10
sub            R-type  10     100010       subtract      1    10
and            R-type  10     100100       and           0    00
or             R-type  10     100101       or            0    01
SLT            R-type  10     101010       SLT           1    11
load word      LW      00     xxxxxx       add           0    10
store word     SW      00     xxxxxx       add           0    10
branch equal   BEQ     01     xxxxxx       subtract      1    10
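The table above amounts to a small lookup. The sketch below encodes it in Python; the names `alu_control` and `FUNCT_TO_CONTROL` are invented for this example, and each result is a `(sub, op)` pair as in the two rightmost columns.

```python
# Function code -> (sub, op), used only when ALUOp says "R-type".
FUNCT_TO_CONTROL = {
    0b100000: (0, 0b10),  # add
    0b100010: (1, 0b10),  # sub
    0b100100: (0, 0b00),  # and
    0b100101: (0, 0b01),  # or
    0b101010: (1, 0b11),  # SLT
}


def alu_control(alu_op, funct):
    """Map the 2-bit ALUOp and 6-bit function code to the (sub, op) ALU controls."""
    if alu_op == 0b00:        # memory access: always add
        return (0, 0b10)
    if alu_op == 0b01:        # branch on equal: always subtract
        return (1, 0b10)
    if alu_op == 0b10:        # R-type: the function code decides
        return FUNCT_TO_CONTROL[funct]
    raise ValueError("unsupported ALUOp")


lw_controls = alu_control(0b00, 0b000000)   # function code is a don't-care here
beq_controls = alu_control(0b01, 0b111111)  # likewise a don't-care
slt_controls = alu_control(0b10, 0b101010)
```

Splitting the decoding this way keeps the main control small: it only has to produce the 2-bit ALUOp, and the function code is examined only for R-type instructions.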
20. 11/17/2019 20
Decoding the Instruction - Data
The instruction holds the key to all of the data signals
R-type:
Bits    31-26          25-21        20-16        15-11       10-6      5-0
Field   Opcode         RS           RT           RD          ShAmt     Function
Use     To ctrl logic  Read reg. A  Read reg. B  Write reg.  Not used  To ALU control

Memory, Branch:
Bits    31-26          25-21        20-16                    15-0
Field   Opcode         RS           RT                       Immediate Data
Use     To ctrl logic  Read reg. A  Write reg./Read reg. B   Memory address or Branch Offset
One problem - Write register number must come from two different places.
21. 11/17/2019 21
Instruction Decoding
[Figure: the single-cycle datapath with the instruction bus split into fields. Op:[31-26] feeds the Ctrl block, and a mux selects the write register number from Rt or Rd.]
– Read Reg A: Rs [25-21]
– Read Reg B: Rt [20-16]
– Write Reg: Either Rd [15-11] or Rt [20-16]
– Immediate Data: [15-0]
– Opcode: [31-26]
We can decode the data simply
by dividing up the instruction bus
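In software, "dividing up the instruction bus" corresponds to shifting and masking. The sketch below extracts the fields named on the slides; the function name `decode` and the dictionary keys are assumptions for this example, and the two test words were hand-encoded for it.

```python
def decode(instr):
    """Split a 32-bit instruction word into its fixed-position fields."""
    return {
        "op":    (instr >> 26) & 0x3F,  # bits 31-26
        "rs":    (instr >> 21) & 0x1F,  # bits 25-21
        "rt":    (instr >> 16) & 0x1F,  # bits 20-16
        "rd":    (instr >> 11) & 0x1F,  # bits 15-11
        "shamt": (instr >> 6) & 0x1F,   # bits 10-6
        "funct": instr & 0x3F,          # bits 5-0
        "imm":   instr & 0xFFFF,        # bits 15-0
    }


# 0x8E6901C2 was hand-encoded as opcode 100011, rs = 19, rt = 9, immediate = 450.
load_fields = decode(0x8E6901C2)
# 0x01285020 was hand-encoded as opcode 0, rs = 9, rt = 8, rd = 10, funct = 100000.
rtype_fields = decode(0x01285020)
```

All fields are extracted for every instruction, just as the hardware wires carry every bit range at once; the control logic decides which of them are actually used.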
22. 11/17/2019 22
Control Signals
[Figure: the single-cycle datapath with its control signals. The instruction is split into Op:[31-26], Rs:[25-21], Rt:[20-16], Rd:[15-11], Imm:[15-0], and FC:[5-0]; the Ctrl block generates the signals listed below.]
• ALU Control: a function of ALUOp and the 6-bit function code
• Control signals and the instructions that assert them:
– RegWrite: Load, R-type
– MemToReg: Load
– MemWrite: Store
– MemRead: Load
– ALUSrc: Memory
– PCSrc: BEQ and zero
– RegDest: R-type
– ALUOp: 00: Memory, 01: Branch, 10: R-type
23. 11/17/2019 23
Inside the control oval
                      Reg    ALU  Mem     Reg   Mem   Mem
Instruction  Opcode   Write  Src  To Reg  Dest  Read  Write  PCSrc  ALUOp
R-format     000000   1      0    0       1     0     0      0      10
LW           100011   1      1    1       0     1     0      0      00
SW           101011   0      1    x       x     0     1      0      00
BEQ          000100   0      0    x       x     0     0      1      01
Signal encodings:
– RegDest: 0:Rt, 1:Rd
– ALUSrc: 0:Reg, 1:Imm
– MemToReg: 0:ALU, 1:Mem
– PCSrc: 1:Branch
– ALUOp: 00:Mem, 01:Branch, 10:R-type
• This control logic can be decoded in several ways:
– Random logic, PLA, PAL
• Just build hardware that looks for the 4 opcodes
– For each opcode, assert the appropriate signals
Note: BEQ must also check the zero output of the ALU...
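One way to picture the control oval is as a lookup table keyed by opcode. The following Python sketch transcribes the four rows of the control table; the names `control_signals` and `branch_taken` are assumptions for this example, and don't-care entries (x) are represented as `None`.

```python
X = None  # don't-care entry from the table

SIGNAL_NAMES = ("RegWrite", "ALUSrc", "MemToReg", "RegDest",
                "MemRead", "MemWrite", "PCSrc", "ALUOp")

# Opcode -> signal values, in the same column order as SIGNAL_NAMES.
CONTROL_ROWS = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 0b10),  # R-format
    0b100011: (1, 1, 1, 0, 1, 0, 0, 0b00),  # LW
    0b101011: (0, 1, X, X, 0, 1, 0, 0b00),  # SW
    0b000100: (0, 0, X, X, 0, 0, 1, 0b01),  # BEQ
}


def control_signals(opcode):
    """Look up the control signals asserted for one of the four opcodes."""
    return dict(zip(SIGNAL_NAMES, CONTROL_ROWS[opcode]))


def branch_taken(opcode, zero):
    """BEQ must also check the ALU zero output before the PC mux selects the branch."""
    return bool(control_signals(opcode)["PCSrc"] and zero)


lw_signals = control_signals(0b100011)
beq_equal = branch_taken(0b000100, zero=1)
beq_not_equal = branch_taken(0b000100, zero=0)
```

A PLA or random-logic implementation computes the same function; the table form just makes it easy to compare against the slide row by row.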
25. 11/17/2019 25
Control Signals
[Figure: the single-cycle datapath with control signals RegWrite, MemToReg, MemWrite, MemRead, ALUSrc, PCSrc, RegDest, and ALUOp; the BEQ output of the Ctrl block is combined with the ALU Zero output to drive PCSrc.]
We must AND BEQ and Zero.
26. 11/17/2019 26
Jumping
[Figure: the datapath extended for jumps. Instruction bits J:[25-0] (26 bits) are shifted left by 2 to give 28 bits, then concatenated with bits [31-28] (4 bits) from the PC path to form a 32-bit jump address; a mux controlled by the Jump signal selects it as the next PC.]
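The shift-and-concatenate step for jumps can be sketched as below. The function name is an assumption for this example, and the sketch takes the upper four bits from PC + 4, following the standard MIPS definition of the jump instruction; the example addresses are arbitrary.

```python
def jump_target(pc, instr):
    """Form the 32-bit jump address from the 26-bit J field and the upper PC bits."""
    j_field = instr & 0x03FFFFFF        # instruction bits 25-0 (26 bits)
    low_28 = j_field << 2               # shift left 2 -> 28 bits
    upper_4 = (pc + 4) & 0xF0000000     # bits 31-28 of PC + 4 (assumed source)
    return upper_4 | low_28             # concatenate 4 + 28 = 32 bits


# A jump word with J field 0x0100004, fetched from an assumed PC of 0x00400000.
target_low = jump_target(0x00400000, 0x08100004)
# The upper four PC bits are preserved when the PC is in a different region.
target_high = jump_target(0x90000000, 0x08000001)
```

Because only 26 bits fit in the instruction, a jump can reach any word within the current 256 MB region but cannot change the top four address bits.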
28. 11/17/2019 28
Operation of the Datapath
• Let's see the stages of execution of an R-type instruction
add $t1,$t2,$t3:
1. An instruction is fetched from memory, the PC is incremented
2. Two registers $t2 and $t3 are read from the register file.
3. The ALU operates on the data read from the register file.
4. The result of the ALU is written into the register $t1.
• Let's look at lw $t1,offset($t2)
1. An instruction is fetched from memory, the PC is incremented
2. The register $t2 is read from the register file.
3. The ALU computes the sum of $t2 and the sign-extended offset.
4. The sum from the ALU is used as the address for the data memory.
5. The data from memory is written into register $t1.
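The numbered steps for both instructions can be traced with a small Python sketch. It skips step 1 (the fetch) and models the register file and data memory as dictionaries; the function names, register contents, and memory contents are all assumptions made for this illustration.

```python
MASK = 0xFFFFFFFF


def sign_extend_16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_add(regs, rd, rs, rt):
    """add rd, rs, rt: steps 2 to 4 of the R-type walk-through."""
    a, b = regs[rs], regs[rt]       # 2. read two registers
    result = (a + b) & MASK         # 3. the ALU operates on the data
    regs[rd] = result               # 4. write the ALU result to rd


def run_lw(regs, data_mem, rt, offset16, rs):
    """lw rt, offset(rs): steps 2 to 5 of the load walk-through."""
    base = regs[rs]                                     # 2. read the base register
    address = (base + sign_extend_16(offset16)) & MASK  # 3. ALU adds the offset
    data = data_mem[address]                            # 4. the sum addresses memory
    regs[rt] = data                                     # 5. write memory data to rt


# Assumed initial state for the example.
regs = {"$t1": 0, "$t2": 100, "$t3": 23}
data_mem = {108: 0xCAFE}

run_add(regs, "$t1", "$t2", "$t3")
after_add = regs["$t1"]
run_lw(regs, data_mem, "$t1", 8, "$t2")
after_lw = regs["$t1"]
```

The two functions differ only in where the second ALU operand comes from and where the written data comes from, which are the two muxes added when the R-type and memory datapaths were merged.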
30. 11/17/2019 30
Performance of Single-Cycle
Machines
• Let's assume that the operation time for the following units is:
Memory - 2 nanoseconds (ns), ALU and adders - 2 ns, Register
file - 1 ns. We will assume that MUXs, control, sign-extension,
PC accesses, and wires have no delays.
• Which implementation is faster?
1. Every instruction operates in 1 clock cycle of fixed length.
2. Every instruction operates in a varying length clock cycle.
• Let's look at the time needed by each instruction (in ns):
Inst.    Fetch  Reg. Rd  ALU op  Memory  Reg. Wr  Total
R-Type   2      1        2       0       1        6 ns
Load     2      1        2       2       1        8 ns
Store    2      1        2       2       -        7 ns
Branch   2      1        2       -       -        5 ns
Jump     2      -        -       -       -        2 ns
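The totals in the table can be reproduced by summing the delays of the units each instruction class uses. The sketch below does that in Python; the dictionary and function names are assumptions for this example, while the delays and unit lists are those stated on the slide.

```python
# Unit delays in nanoseconds, as assumed on the slide.
UNIT_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "memory": 2, "reg_write": 1}

# Functional units used by each instruction class, in order.
UNITS_USED = {
    "R-type": ["fetch", "reg_read", "alu", "reg_write"],
    "Load":   ["fetch", "reg_read", "alu", "memory", "reg_write"],
    "Store":  ["fetch", "reg_read", "alu", "memory"],
    "Branch": ["fetch", "reg_read", "alu"],
    "Jump":   ["fetch"],
}


def latency_ns(kind):
    """Total time for one instruction: the sum of the unit delays on its path."""
    return sum(UNIT_NS[unit] for unit in UNITS_USED[kind])


latencies = {kind: latency_ns(kind) for kind in UNITS_USED}
fixed_cycle_ns = max(latencies.values())  # a fixed clock must fit the slowest one
```

The load is the longest path because it is the only class that uses all five units, so it sets the cycle time of a fixed-length clock.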
31. 11/17/2019 31
Fixed vs. Variable Cycle Length
• Let's assume a program has the following instruction mix: 24%
loads, 12% stores, 44% R-type, 18% branches, 2% jumps.
• For the fixed cycle length the cycle time is 8 ns, long enough for
the longest instruction (load). Thus each instruction takes 8 ns
to execute.
• For the variable cycle time the average CPU clock cycle is:
8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.34 ns
• The variable clock implementation is clearly faster, but
it is extremely hard to implement.
• Variable clock implementation is 8/6.34 = 1.26 times faster
• When instructions such as multiply and divide are added, which
take many times longer than a load, the fixed-cycle scheme is too slow:
every instruction would have to take as long as the slowest one.
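The comparison can be checked with a short calculation. The sketch below weights each instruction time by its share of the mix; the variable names are assumptions for this example, and the percentages are kept as integers so the arithmetic stays exact until the final division.

```python
# Per-instruction times in ns and the instruction mix in percent.
LATENCY_NS = {"Load": 8, "Store": 7, "R-type": 6, "Branch": 5, "Jump": 2}
MIX_PERCENT = {"Load": 24, "Store": 12, "R-type": 44, "Branch": 18, "Jump": 2}

# Fixed clock: every instruction takes as long as the slowest one.
fixed_ns = max(LATENCY_NS.values())

# Variable clock: average of the instruction times, weighted by the mix.
weighted_sum = sum(LATENCY_NS[k] * MIX_PERCENT[k] for k in LATENCY_NS)
variable_ns = weighted_sum / 100

speedup = fixed_ns / variable_ns

print(f"fixed: {fixed_ns} ns, variable: {variable_ns:.2f} ns, speedup: {speedup:.2f}")
```

The weighted sum evaluates to 6.34 ns, giving a ratio of roughly 1.26 in favour of the variable clock for this particular mix; a different mix would shift the result.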
32. 11/17/2019 32
Observations on the Single Cycle
Design
• The single-cycle datapath is straightforward, but...
– It has to use 3 separate arithmetic units (the ALU plus two adders)
– It has separate Instruction and Data memories
– Cycle time is determined by worst-case path
• A multi-cycle datapath might be better
– We can reuse some of the hardware
– We can combine the memories
– Cycle time is still constant, but instructions may
take differing numbers of cycles