verilog_case_study

ES6102
Advanced Digital Systems Design

Complex Sequential systems

Module 6
MIPS Datapath (Case Study)

School of Computer Engineering ES6102: Advanced Digital Systems Design 2011
Adapted from Patterson and Hennessy, “Computer Organization and Design: The hardware/software interface”, 2nd Ed., MKP, 1998. Copyright 1998 Morgan Kaufmann Publishers

The Five Classic Components of a Computer

[Figure: the five classic components: Input, Output, Memory, and the Processor, which contains Control and Datapath]


The Performance Perspective
• Performance of a machine is determined by:
– Instruction count
– Clock cycle time
– Clock cycles per instruction (CPI)
• Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction
• Single cycle processor:
– Advantage: One clock cycle per instruction
– Disadvantage: long cycle time
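
The three factors listed above combine in the standard performance relation from Patterson and Hennessy. It is restated here as a reminder, not as a new result:

```latex
\[
T_{\text{exec}} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
\]
```

For a single cycle processor CPI = 1, so the execution time is set by the instruction count and by the (long) cycle time alone.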


The Processor: Datapath & Control
• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
– memory-reference instructions: lw, sw
– arithmetic-logical instructions: add, sub, and, or, slt
– control flow instructions: beq, j
• Generic Implementation:
– use the program counter (PC) to supply instruction address
– get the instruction from memory
– read registers
– use the instruction to decide exactly what to do
• All instructions use the ALU after reading the registers


The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. The three instruction formats:
– R-type: op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
(6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits)
– I-type: op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
(6 bits | 5 bits | 5 bits | 16 bits)
– J-type: op [31:26] | target address [25:0]
(6 bits | 26 bits)
• The different fields are:
– op: operation of the instruction
– rs, rt, rd: the source and destination register specifiers
– shamt: shift amount
– funct: selects the variant of the operation in the “op” field
– address / immediate: address offset or immediate value
– target address: target address of the jump instruction
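
The field boundaries can be written down directly as bit slices. The following is a minimal sketch, assuming a hypothetical helper module named `instr_fields` that is not part of the course code:

```verilog
// Hypothetical helper (not from the course code): splits a 32-bit MIPS
// instruction word into the fields named on the slide.
module instr_fields (
    input  wire [31:0] instr,
    output wire [5:0]  op,      // all formats
    output wire [4:0]  rs,      // R-type and I-type
    output wire [4:0]  rt,      // R-type and I-type
    output wire [4:0]  rd,      // R-type only
    output wire [4:0]  shamt,   // R-type only
    output wire [5:0]  funct,   // R-type only
    output wire [15:0] imm16,   // I-type only
    output wire [25:0] target   // J-type only
);
    assign op     = instr[31:26];
    assign rs     = instr[25:21];
    assign rt     = instr[20:16];
    assign rd     = instr[15:11];
    assign shamt  = instr[10:6];
    assign funct  = instr[5:0];
    assign imm16  = instr[15:0];
    assign target = instr[25:0];
endmodule
```

All fields are extracted unconditionally; the control logic decides which of them are meaningful for a given op.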

Let's look at a MIPS subset
• ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
– add rd, rs, rt
– sub rd, rs, rt
• OR Immediate (I-type: op | rs | rt | immediate; 6 bits, 5 bits, 5 bits, 16 bits)
– ori rt, rs, imm16
• LOAD and STORE Word (I-type)
– lw rt, rs, imm16
– sw rt, rs, imm16
• BRANCH (I-type)
– beq rs, rt, imm16


Register Transfers
• Process starts by fetching the instruction
op | rs | rt | rd | shamt | funct <= MEM[ PC ]
op | rs | rt | Imm16 <= MEM[ PC ]

inst     Register Transfers
ADDU     R[rd] <– R[rs] + R[rt]; PC <– PC + 4
SUBU     R[rd] <– R[rs] – R[rt]; PC <– PC + 4
ORi      R[rt] <– R[rs] | zero_ext(Imm16); PC <– PC + 4
LOAD     R[rt] <– MEM[ R[rs] + sign_ext(Imm16) ]; PC <– PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt]; PC <– PC + 4
BEQ      if ( R[rs] == R[rt] ) then PC <– PC + 4 + (sign_ext(Imm16) x 4)
         else PC <– PC + 4


Requirements of the Instruction Set

• Memory
– instruction & data
• Registers (32 x 32)
– read RS
– read RT
– Write RT or RD
• PC
• Extender
• Add and Sub register or extended immediate
• Add 4 or extended immediate to PC
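
The Extender in the list above has to support both zero extension (ori) and sign extension (lw, sw, beq). A minimal sketch, assuming a hypothetical module name `extender` and an assumed encoding of the select input (1 = sign-extend, 0 = zero-extend):

```verilog
// Hypothetical sketch of the 16-to-32-bit extender.
// Assumption: ext_op = 1 selects sign extension, ext_op = 0 zero extension.
module extender (
    input  wire [15:0] imm16,
    input  wire        ext_op,
    output wire [31:0] ext_imm
);
    // Sign extension replicates bit 15 into the upper 16 bits;
    // zero extension fills the upper 16 bits with zeros.
    assign ext_imm = ext_op ? {{16{imm16[15]}}, imm16}
                            : {16'b0, imm16};
endmodule
```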


Need a Storage Element: Register File
• Register File consists of 32 registers:
– Two 32-bit output busses: busA and busB
– One 32-bit input bus: busW
• Register is selected by:
– RA (number) selects the register to put on busA (data)
– RB (number) selects the register to put on busB (data)
– RW (number) selects the register to be written via busW (data) when Write Enable is 1
• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
i.e. RA or RB valid => busA or busB valid after “access time.”
[Figure: 32 x 32-bit register file with 5-bit RW, RA and RB inputs, Write Enable, Clk, 32-bit input busW, and 32-bit outputs busA and busB]
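
The behaviour described above (clocked write, combinational read) can be sketched as follows. The module and port names are illustrative assumptions, not the course implementation:

```verilog
// Hypothetical sketch of the 32 x 32-bit register file described above.
module regfile (
    input  wire        clk,
    input  wire        write_enable,
    input  wire [4:0]  ra,      // selects the register driven onto bus_a
    input  wire [4:0]  rb,      // selects the register driven onto bus_b
    input  wire [4:0]  rw,      // selects the register written from bus_w
    input  wire [31:0] bus_w,
    output wire [31:0] bus_a,
    output wire [31:0] bus_b
);
    reg [31:0] regs [0:31];

    // Reads are combinational: the outputs follow ra and rb without a clock.
    assign bus_a = regs[ra];
    assign bus_b = regs[rb];

    // The clock matters only for writes.
    always @(posedge clk) begin
        if (write_enable)
            regs[rw] <= bus_w;
    end
endmodule
```

This sketch does not hard-wire register 0 to zero, and in simulation the registers read as X until they are first written.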


Basic Building Blocks
• Adder: 32-bit inputs A and B, CarryIn; outputs 32-bit Sum and Carry
• MUX: 32-bit inputs A and B, Select; 32-bit output Y
• ALU: 32-bit inputs A and B, OP; 32-bit output Result
• Registers
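
The combinational blocks above are short to describe in Verilog. The sketch below is illustrative only: the 3-bit operation encoding is the one used in the Patterson and Hennessy textbook and is an assumption here, as are the module names and the choice that Select = 1 picks input B.

```verilog
// Hypothetical 2-to-1 MUX. Assumption: select = 1 chooses b.
module mux2 (
    input  wire [31:0] a,
    input  wire [31:0] b,
    input  wire        select,
    output wire [31:0] y
);
    assign y = select ? b : a;
endmodule

// Hypothetical ALU covering the subset and, or, add, sub, slt.
// Assumption: textbook encoding 000 and, 001 or, 010 add, 110 sub, 111 slt.
module alu (
    input  wire [31:0] a,
    input  wire [31:0] b,
    input  wire [2:0]  op,
    output reg  [31:0] result,
    output wire        zero
);
    always @(*) begin
        case (op)
            3'b000:  result = a & b;
            3'b001:  result = a | b;
            3'b010:  result = a + b;
            3'b110:  result = a - b;
            3'b111:  result = ($signed(a) < $signed(b)) ? 32'd1 : 32'd0;
            default: result = 32'd0;
        endcase
    end

    // zero is asserted when the result is 0; beq can use it after a subtract.
    assign zero = (result == 32'd0);
endmodule
```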


So what do we need?
[Figure: the datapath elements]
a. Instruction memory: instruction address in, instruction out
b. Program counter (PC)
c. Adder: Add, output Sum
d. Data memory unit: Address, Write data, Read data; controls MemWrite and MemRead
e. Sign-extension unit: 16 bits in, 32 bits out
f. Registers: two 5-bit read register numbers, one 5-bit write register number, Write data, Read data 1, Read data 2; control RegWrite
g. ALU: 3-bit ALU control, outputs Zero and ALU result
h. Selector (MUX): 32-bit inputs A and B, Select, 32-bit output Y


How do we connect them?
• Register Transfer Requirements -> Datapath Assembly
– Instruction Fetch
– Then Read Operand and Execute Operation

• Instruction fetch
– Fetch the Instruction: mem[PC]
– Update the program counter:
• Sequential Code: PC <- PC + 4
• Branch and Jump: PC <- “something else”
[Figure: the PC, clocked by Clk, supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word; the Next Address Logic updates the PC]

Execution (Add and Subtract)
• R[rd] <- R[rs] op R[rt] Example: addu rd, rs, rt
– Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
– ALUctr and RegWr: control logic after decoding the instruction
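– Instruction format: op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
[Figure: 32 x 32-bit register file with Rw = Rd, Ra = Rs, Rb = Rt (5 bits each); busA and busB feed the ALU, whose operation is set by ALUctr; the 32-bit Result returns on busW and is written under control of RegWr and Clk]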


Execution (Logical with Immediate)

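• R[rt] <- R[rs] op ZeroExt[imm16]
– Instruction format: op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
[Figure: a Mux controlled by RegDst selects Rw from Rd or Rt; Ra = Rs; imm16 passes through ZeroExt (16 to 32 bits); a Mux controlled by ALUSrc selects the second ALU operand from busB or the extended immediate; the ALU Result returns on busW. Control signals: RegDst, RegWr, ALUctr, ALUSrc]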


Execution (Load Operations)
• R[rt] <- Mem[ R[rs] + SignExt[imm16] ] Example: lw rt, rs, imm16
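– Instruction format: op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0] (6 bits | 5 bits | 5 bits | 16 bits)
[Figure: the Extender, controlled by ExtOp, extends imm16 to 32 bits; the ALUSrc Mux feeds it to the ALU, which adds it to busA to form the Data Memory address (Adr); a Mux controlled by W_Src selects busW from the ALU output or the Data Memory output. Control signals: RegDst, RegWr, ALUctr, ALUSrc, ExtOp, MemWr, W_Src]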


Execution (Store Operations)
• Mem[ R[rs] + SignExt[imm16] ] <- R[rt] Example: sw rt, rs, imm16
– Instruction format: op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0] (6 bits | 5 bits | 5 bits | 16 bits)
[Figure: same datapath as for the load; busB (R[rt]) is connected to Data In of the Data Memory, and MemWr drives the memory write enable (WrEn). Control signals: RegDst, RegWr, ALUctr, ALUSrc, ExtOp, MemWr, W_Src]


Execution (Branch Operations)
• beq rs, rt, imm16 Datapath generates condition (equal)
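– Instruction format: op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
[Figure: busA and busB from the register file (Ra = Rs, Rb = Rt) feed an “Equal?” comparator that generates Cond; one Adder computes PC + 4 and a second Adder adds the extended imm16 (PC Ext); a Mux controlled by nPC_sel chooses the next PC, which also supplies the Inst Address]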


Putting it all together
• A single cycle implementation
[Figure: single-cycle datapath. The PC addresses the instruction memory; Instruction [25–21] and Instruction [20–16] select read registers 1 and 2; a RegDst mux chooses the write register from Instruction [20–16] or Instruction [15–11]; Instruction [15–0] is sign-extended from 16 to 32 bits; an ALUSrc mux chooses the second ALU operand; ALU control decodes Instruction [5–0] together with ALUOp; the data memory is controlled by MemRead and MemWrite; a MemtoReg mux chooses the register write data; one adder computes PC + 4, a second adds the sign-extended immediate shifted left 2, and a PCSrc mux chooses the next PC. RegWrite enables the register write.]


An Abstract View of the Implementation

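[Figure: abstract view. The PC and Next Address logic supply the Instruction Address to the ideal Instruction Memory; the Instruction provides Rd, Rs and Rt (5 bits each) to the 32 x 32-bit register file (Rw, Ra, Rb); register outputs A and B feed the ALU; the ALU output is the Data Address of the ideal Data Memory (Data In, Data Out). These blocks form the Datapath. The Control block takes the Instruction and the Conditions from the datapath and produces the Control Signals.]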


Control of the Datapath
• Control is the hard part
• MIPS makes control easier
– Instructions same size
– Source registers always in same place
– Immediates same size, location
– Operations always on registers/immediates

• Let's skip control until later


An Abstract View of the Critical Path
• Register file and ideal memory:
– The CLK input is a factor ONLY during write operation
– During read operation, behave as combinational logic:
• Address valid => Output valid after “access time.”
• Critical Path (Load) = PC’s Clk-to-Q + Instruction Memory Access Time + Register File Access Time + ALU (32-bit Add) + Data Memory Access Time + Setup Time for Register File Write + Clock Skew
[Figure: abstract datapath showing the PC, Next Address logic, ideal Instruction Memory, register file (Rd, Rs, Rt, Imm16), ALU and ideal Data Memory along the load path]


Critical Path (Load Instruction)
[Figure: single-cycle datapath during a load. The instruction memory output Instruction<31:0> is split into Rs <25:21>, Rt <20:16>, Rd <15:11> and Imm16 <15:0>. Control signals shown: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. For the load, nPC_sel = +4, RegDst = rt, ExtOp = sign ext and ALUctr = add.]


Worst Case Timing (Load)
[Timing diagram: each signal changes from its old value to its new value in the following order after the rising edge of Clk]
1. Clk-to-Q: PC becomes valid
2. Instruction Memory Access Time: Rs, Rt, Rd, Op, Func become valid
3. Delay through Control Logic: ALUctr, ExtOp, ALUSrc, MemtoReg, RegWr become valid
4. Register File Access Time: busA becomes valid
5. Delay through Extender & Mux: busB becomes valid
6. ALU Delay: Address becomes valid
7. Data Memory Access Time: busW becomes valid
8. Register Write Occurs


Single cycle (CPI=1) processor: The problem

• Long Cycle Time
• All instructions take as much time as the slowest
• Real memory is not so nice as our idealized memory
– cannot always get the job done in one (short) cycle
[Figure: units used by each instruction class, in order]
– Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
– Load (Critical Path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
– Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
– Branch: PC, Inst Memory, Reg File, cmp, mux


Time is the problem

• For a single cycle implementation, the time from when one
instruction is started till it completes (cycle time) is long.
– Cycle time must be long enough for the load instruction:
– Cycle time for load is much longer than needed for all other
instructions
• Instead consider a multi-cycle approach.
– We will be reusing functional units
• ALU used to compute address and to increment PC
• Memory used for instruction and data
– Our control signals will not be determined solely by instruction
• We’ll use a finite state machine for control


Multicycle Approach
• Break up the instructions into steps, each step takes a cycle
– balance the amount of work to be done
– restrict each cycle to use only one major functional unit
• At the end of a cycle
– store values for use in later cycles (easiest thing to do)
– introduce additional “internal” registers


Five Execution Steps
• Instruction Fetch

• Instruction Decode and Register Fetch

• Execution, Memory Address Computation, or Branch Completion

• Memory Access or R-type instruction completion

• Write-back step

INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Load instruction is longest and uses all of the above steps.


Step 1: Instruction Fetch
• Use PC to get instruction and put it in the Instruction Register.
IR <- Memory[PC];
• Increment the PC by 4 and put the result back in the PC. (But what about branches or jumps?)
– Sequential Code:
PC <- PC + 4;
– Branch and Jump:
PC <- “something else”;
[Figure: the PC, clocked by Clk, supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word; the Next Address Logic updates the PC]


Step 2: Inst. Decode and Register Fetch

• Read registers rs and rt in case we need them
A <- Reg[IR[25-21]];
B <- Reg[IR[20-16]];

• Compute the branch address in case the instruction is a branch
ALUOut <- PC + (sign-extend(IR[15-0]) << 2);

Note: <<2 is the same as a multiply by 4


Step 3 (instruction dependent)
• ALU is performing one of two functions, based on instruction type.
• Memory Reference:

ALUOut <- A + sign-extend(IR[15-0]);

• R-type:

ALUOut <- A op B;

• Note that in the Basic MIPS (MIPS_Basic.zip) the PC for a branch is
calculated in this stage (and not the ID stage as in the previous slide)
– Branch:
if (A==B) PC <- PC + (signext(IR[15-0]) << 2);


Step 4 & 5 (R-type or memory-access)
• Loads and stores access memory
MDR = Memory[ALUOut];
or
Memory[ALUOut] = B;

• R-type instructions finish (write back to register file)

Reg[IR[15-11]] = ALUOut;
The write actually takes place at the end of the cycle on the edge

Step 5 (The write-back step)
A load from memory to the register file needs an extra cycle to complete.

Reg[IR[20-16]]= MDR;


Summary

Step name: action for each instruction class
• Instruction fetch (all instructions):
IR = Memory[PC]
PC = PC + 4
• Instruction decode/register fetch (all instructions):
A = Reg[IR[25-21]]
B = Reg[IR[20-16]]
ALUOut = PC + (sign-extend(IR[15-0]) << 2)
• Execution, address computation, branch/jump completion:
– R-type: ALUOut = A op B
– Memory-reference: ALUOut = A + sign-extend(IR[15-0])
– Branches: if (A == B) then PC = ALUOut
– Jumps: PC = PC[31-28] || (IR[25-0] << 2)
• Memory access or R-type completion:
– R-type: Reg[IR[15-11]] = ALUOut
– Load: MDR = Memory[ALUOut]
– Store: Memory[ALUOut] = B
• Memory read completion:
– Load: Reg[IR[20-16]] = MDR
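
The step sequence in the summary can be driven by a small finite state machine. The sketch below only tracks which step is active; it does not generate datapath control signals. The module name, state names and state encoding are assumptions; the opcodes are the standard MIPS values for lw, beq and j.

```verilog
// Hypothetical step sequencer for the multicycle approach.
// Branches and jumps take 3 cycles, R-type and sw take 4, lw takes 5.
module multicycle_steps (
    input  wire       clk,
    input  wire       reset,   // assumption: synchronous, active high
    input  wire [5:0] op,      // assumed to come from IR[31:26]
    output reg  [2:0] state
);
    localparam [2:0] S_IF  = 3'd0,  // instruction fetch
                     S_ID  = 3'd1,  // decode / register fetch
                     S_EX  = 3'd2,  // execution, address or branch/jump completion
                     S_MEM = 3'd3,  // memory access or R-type completion
                     S_WB  = 3'd4;  // memory read completion (write-back)

    localparam [5:0] OP_LW  = 6'b100011,
                     OP_BEQ = 6'b000100,
                     OP_J   = 6'b000010;

    always @(posedge clk) begin
        if (reset)
            state <= S_IF;
        else begin
            case (state)
                S_IF:    state <= S_ID;
                S_ID:    state <= S_EX;
                // Branches and jumps complete in the third step.
                S_EX:    state <= (op == OP_BEQ || op == OP_J) ? S_IF : S_MEM;
                // Only a load needs the fifth step.
                S_MEM:   state <= (op == OP_LW) ? S_WB : S_IF;
                S_WB:    state <= S_IF;
                default: state <= S_IF;
            endcase
        end
    end
endmodule
```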


How can we reduce the cycle time?

• Cut combinational dependency graph and insert register / latch
• Do same work in two fast cycles, rather than one slow one
[Figure: on the left, one block of acyclic combinational logic between two storage elements; on the right, the same logic cut into Acyclic Combinational Logic (A) and Acyclic Combinational Logic (B) with an extra storage element inserted between them]
This is pipelining.
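
A minimal sketch of the cut described above. Logic (A) and logic (B) are placeholders (an increment and a doubling) chosen only to have something to register; the module name and the width parameter are assumptions.

```verilog
// Hypothetical two-stage pipeline: a storage element is inserted between
// combinational logic (A) and combinational logic (B).
module two_stage #(parameter W = 32) (
    input  wire         clk,
    input  wire [W-1:0] x,
    output reg  [W-1:0] y
);
    reg [W-1:0] mid;                    // the inserted storage element

    wire [W-1:0] a_out = x + 1'b1;      // placeholder for logic (A)
    wire [W-1:0] b_out = mid << 1;      // placeholder for logic (B)

    always @(posedge clk) begin
        mid <= a_out;                   // end of the first fast cycle
        y   <= b_out;                   // end of the second fast cycle
    end
endmodule
```

Each input now takes two clock cycles to reach `y`, but the clock period only has to cover the slower of (A) and (B), and a new input can be accepted every cycle.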


How can we improve instruction throughput?

Ideal speedup is number of stages in the pipeline. Do we achieve this?


Why Pipeline? Because the resources are there!
[Figure: pipeline diagram with time (clock cycles) across and instruction order down. Inst 0 to Inst 4 each pass through Im, Reg, ALU, Dm, Reg, each starting one cycle after the previous instruction. Annotation: “But aren’t we using two resources here?”]


Pipelining
• What makes it easy
– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores

• What makes it hard?
– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on a previous instruction

• We’ll build a simple pipeline and look at these issues


Basic Idea

What do we need to add to actually split the datapath into stages?

Pipelined Datapath
But which value do we write back? (See next slide.)


Corrected Datapath
• The problem with the previous implementation:
– What happens when we write back to the register file? Which instruction
supplies the write register value (destination register)?
– Solution: We must forward (preserve) the destination register value by passing it along the pipeline registers to the write-back stage.


Graphically Representing Pipelines

• Can help with answering questions like:
– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths


Load word instruction
• The load word (lw) instruction is the most complicated as it uses all
stages of the datapath. Consider:
lw $10, 20($1) # R10 <- Mem[R1+20]
1. Instruction fetch:
2. Instruction Decode:
– Immediate value (20) is sign extended. src & dest. Reg values forwarded.
3. Execution:
– Immed. & src Reg values added to generate address , dest Reg forwarded.
4. Memory
– Data read from memory.
5. Writeback
– Memory data is written to register file at dest. Reg location.


Pipeline control
• We have 5 stages. What needs to be controlled in each stage?
– Instruction Fetch and PC Increment
– Instruction Decode / Register Fetch
– Execution
– Memory Stage
– Write Back


Pipeline Control
• Pass control signals along just like the data
             | Execution/Address Calculation   | Memory access stage       | Write-back stage
             | stage control lines             | control lines             | control lines
Instruction  | RegDst  ALUOp1  ALUOp0  ALUSrc  | Branch  MemRead  MemWrite | RegWrite  MemtoReg
R-format     | 1       1       0       0       | 0       0        0        | 1         0
lw           | 0       0       0       1       | 0       1        0        | 1         1
sw           | X       0       0       1       | 0       0        1        | 0         X
beq          | X       0       1       0       | 1       0        0        | 0         X
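
The table maps directly onto a combinational decoder. The sketch below is illustrative: the module and port names are assumptions, the opcodes are the standard MIPS values (R-format 000000, lw 100011, sw 101011, beq 000100), and the don't-care (X) entries are arbitrarily driven to 0.

```verilog
// Hypothetical main control decoder for the four instruction classes
// in the table. Don't-care entries (X) are driven to 0 here.
module pipeline_control (
    input  wire [5:0] op,
    output reg        reg_dst,    // EX stage
    output reg        alu_op1,    // EX stage
    output reg        alu_op0,    // EX stage
    output reg        alu_src,    // EX stage
    output reg        branch,     // MEM stage
    output reg        mem_read,   // MEM stage
    output reg        mem_write,  // MEM stage
    output reg        reg_write,  // WB stage
    output reg        mem_to_reg  // WB stage
);
    always @(*) begin
        // Default (unknown opcode or nop-like): everything deasserted.
        {reg_dst, alu_op1, alu_op0, alu_src,
         branch, mem_read, mem_write,
         reg_write, mem_to_reg} = 9'b0000_000_00;

        case (op)
            6'b000000: {reg_dst, alu_op1, alu_op0, alu_src,
                        branch, mem_read, mem_write,
                        reg_write, mem_to_reg} = 9'b1100_000_10; // R-format
            6'b100011: {reg_dst, alu_op1, alu_op0, alu_src,
                        branch, mem_read, mem_write,
                        reg_write, mem_to_reg} = 9'b0001_010_11; // lw
            6'b101011: {reg_dst, alu_op1, alu_op0, alu_src,
                        branch, mem_read, mem_write,
                        reg_write, mem_to_reg} = 9'b0001_001_00; // sw
            6'b000100: {reg_dst, alu_op1, alu_op0, alu_src,
                        branch, mem_read, mem_write,
                        reg_write, mem_to_reg} = 9'b0010_100_00; // beq
            default: ;
        endcase
    end
endmodule
```

In a pipelined design these nine bits would be generated in the ID stage and carried through the ID/EX, EX/MEM and MEM/WB registers until the stage that uses them.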


Datapath with Control


Can pipelining get us into trouble?

• Yes: Pipeline Hazards
– structural hazards: attempt to use the same resource two different
ways at the same time
• Only one memory system and we want to access data and instruction
memory in same cycle.
– data hazards: attempt to use item before it is ready
• instruction depends on result of prior instruction still in the pipeline
– control hazards: attempt to make a decision before condition is
evaluated
• branch instructions
• Can always resolve hazards by waiting
– pipeline control must detect the hazard
– take action (or delay action) to resolve hazards


Consider the code
PC  OP C  Instruction
16  4905  add R2, R2, #2
17  6D08  lw R3, #4(R2)
18  0D95  add R3, R3, R1
19  0000  nop
Note: R2 should be 1, but it has not been updated by this point.


Single Memory is a Structural Hazard
• Trying to perform two reads from the one memory at the same time.
• Thus we need 2 separate memories: Instruction memory and Data memory.
[Figure: pipeline diagram with time (clock cycles) across and instruction order down: Load, Instr 1, Instr 2, Instr 3, Instr 4, each passing through Mem, Reg, ALU, Mem, Reg. The data access of the Load falls in the same cycle as the instruction fetch of a later instruction.]


Data Hazards
• Consider R2. Note: Dependencies backwards in time are hazards
[Figure: pipeline diagram with time (clock cycles) across and instruction order down. Stages IF, ID/RF, EX, MEM, WB are drawn as Im, Reg, ALU, Dm, Reg. Instructions in order:]
sub r2,r1,r3
and r4,r2,r5
or r8,r2,r6
and r9,r4,r2
slt r1,r6,r7


Data hazards: Forwarding
• Use temporary results, don’t wait for them to be written
– register file forwarding to handle read/write to same register
– ALU forwarding
Time (in clock cycles):  CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2:    10    10    10    10    10/–20  –20   –20   –20   –20
Value of EX/MEM:         X     X     X     –20   X       X     X     X     X
Value of MEM/WB:         X     X     X     X     –20     X     X     X     X

Program execution order (in instructions), each drawn as IM, Reg, DM, Reg and starting one cycle after the previous one:
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)


Data path with forwarding
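
A minimal sketch of a forwarding unit for the two ALU operands, following the textbook conditions from Patterson and Hennessy. The module name, port names and the 2-bit select encoding (00 = register file value, 10 = EX/MEM result, 01 = MEM/WB value) are assumptions and are not taken from the course code.

```verilog
// Hypothetical forwarding unit for the two ALU inputs.
// Assumed encoding: 00 = use register file value,
//                   10 = forward from EX/MEM, 01 = forward from MEM/WB.
module forwarding_unit (
    input  wire [4:0] id_ex_rs,
    input  wire [4:0] id_ex_rt,
    input  wire [4:0] ex_mem_rd,
    input  wire [4:0] mem_wb_rd,
    input  wire       ex_mem_reg_write,
    input  wire       mem_wb_reg_write,
    output reg  [1:0] forward_a,
    output reg  [1:0] forward_b
);
    always @(*) begin
        forward_a = 2'b00;
        forward_b = 2'b00;

        // Older result, sitting in MEM/WB.
        if (mem_wb_reg_write && (mem_wb_rd != 5'd0)) begin
            if (mem_wb_rd == id_ex_rs) forward_a = 2'b01;
            if (mem_wb_rd == id_ex_rt) forward_b = 2'b01;
        end

        // Newer result, sitting in EX/MEM. Checked last so that it
        // overrides the MEM/WB match when both stages write the same register.
        if (ex_mem_reg_write && (ex_mem_rd != 5'd0)) begin
            if (ex_mem_rd == id_ex_rs) forward_a = 2'b10;
            if (ex_mem_rd == id_ex_rt) forward_b = 2'b10;
        end
    end
endmodule
```

Register 0 is excluded because in MIPS it always reads as zero and must not be forwarded.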


Can't always forward
• Load word can still cause a hazard:
– an instruction tries to read a register following a load instruction that writes to
the same register.

• Thus, we need a hazard detection unit to “stall” the instruction that follows the load

Solution: Stalling
• We can stall the pipeline by keeping an instruction in the same stage


Hazard Detection Unit
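
A minimal sketch of the load-use check, following the textbook condition from Patterson and Hennessy. The module and port names are assumptions; how `stall` freezes the PC and the IF/ID register and inserts a bubble is left out.

```verilog
// Hypothetical load-use hazard detection.
// stall = 1 when the instruction in EX is a load whose destination (rt)
// is a source register of the instruction currently in ID.
module hazard_detection (
    input  wire       id_ex_mem_read,  // instruction in EX is a load
    input  wire [4:0] id_ex_rt,        // destination register of that load
    input  wire [4:0] if_id_rs,        // source registers of the
    input  wire [4:0] if_id_rt,        // instruction in ID
    output wire       stall
);
    assign stall = id_ex_mem_read &&
                   ((id_ex_rt == if_id_rs) || (id_ex_rt == if_id_rt));
endmodule
```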


Control Hazards:- Branch Hazards

• Stall: wait until decision is clear
– It's possible to move the decision up to the 2nd stage by adding hardware to
check the registers as they are being read. How?
• Impact: 2 clock cycles per branch instruction => slow
[Figure: pipeline diagram with time (clock cycles) across and instruction order down: Add, Beq, Load, each passing through Mem, Reg, ALU, Mem, Reg. The Load after the Beq needs to stall.]


Control Hazards:- Branch Hazards
• MIPS uses a “branch delay slot”
– the next instruction after a branch is always executed
– rely on compiler to “fill” the slot with something useful
• Works about 50% of the time. Rest must be NOPs.
[Figure: pipeline diagram with time (clock cycles) across and instruction order down: Add, Beq, Misc (the instruction in the delay slot), Load, each passing through Mem, Reg, ALU, Mem, Reg and starting one cycle after the previous one]


Some MIPS Instructions
• Consider the following MIPS instructions (Note: add $2, $1, $3 is $2 <= $1 + $3)
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000001000110001000000100000"; -- add $2, $1, $3 -- 00231020
b"00000000001001100010000000100101"; -- or $4, $1, $6 -- 00262025
b"00000000010000110010100000100000"; -- add $5, $2, $3 -- 00432820
b"00010000001000010000000000000100"; -- beq $1, $1, #4 -- 10210004 (jumps 4 instructions)
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000010001100010000000100100"; -- and $4, $2, $6 -- 00462024
b"00000000001001010011000000100101"; -- or $6, $1, $5 -- 00253025
b"00000000111001110010000000100000"; -- add $4, $7, $7 -- 00E72020
b"00000000001000100001100000100000"; -- add $3, $1, $2 -- 00221820
b"00000000001001100000100000100101"; -- or $1, $1, $6 -- 00260825
b"00000000010001010001100000100000"; -- add $3, $2, $5 -- 00451820
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000000000000000000000000000"; -- nop -- 00000000


Some MIPS Instructions
• Remember the R format instruction

opcode  rs     rt     rd     shamt  funct
000000  00001  00011  00010  00000  100000    -- add $2, $1, $3
                                              -- add rd, rs, rt


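Simulator Results
[Figure: simulator waveform for beq $1, $1, #4 -- 10210004, showing the decoded instruction and the register file contents (ID Section). Annotations in order: Branch instruction (IF Section); Branch decision made (Ex Section); New PC updated (Mem Section); Correct instruction decoded (IF Section). The instructions fetched in between are marked “These should not be there”.]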


Datapath with Control

• The branch decision is made in the MEM stage.
• We need to move the branch decision to the ID stage.
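
One way to make the decision in the ID stage is to compare the two register values as soon as they are read. The sketch below is only an illustration of that idea: the module and port names are assumptions, and it uses 32-bit register values, whereas the course module headers declare some data ports as 16 bits wide.

```verilog
// Hypothetical branch decision logic placed in the ID stage.
module id_branch (
    input  wire        branch,       // control: instruction is a beq
    input  wire [31:0] read_data1,   // R[rs], read in ID
    input  wire [31:0] read_data2,   // R[rt], read in ID
    input  wire [31:0] pc4,          // PC + 4 of the branch
    input  wire [31:0] ext_imm,      // sign-extended immediate
    output wire        pc_src,       // 1 = take the branch
    output wire [31:0] branch_pc
);
    // Equality is checked directly on the register outputs,
    // without waiting for the ALU in the EX stage.
    assign pc_src    = branch && (read_data1 == read_data2);
    assign branch_pc = pc4 + (ext_imm << 2);
endmodule
```

This sketch ignores forwarding into the comparator, which a complete design would need when the compared registers are still being computed by earlier instructions.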


Let's look at the whole system


VERILOG code
• Instruction Fetch
module Stage_IF(IF_PC4_Out, IF_Instruction_Out, IF_BranchPC_In,
IF_PCSrc_In, IF_Clk_In, IF_Reset_In);
output [31:0] IF_PC4_Out;
output [31:0] IF_Instruction_Out;
input [31:0] IF_BranchPC_In;
input IF_PCSrc_In;
input IF_Clk_In;
input IF_Reset_In;

Note: The Instruction memory is in this module.
Currently, the next address logic implements:
PC <- PC+4;
or: PC <- PC+4+branch_offset;
The branch offset is calculated in a later section
(ID or EX, depending on version)


Verilog Code (cont)
• Instruction Decode
module Stage_ID(ID_RegWrite_Out, ID_MemToReg_Out, ID_Branch_Out, ID_MemRead_Out,
ID_MemWrite_Out, ID_RegDst_Out, ID_ALUOp_Out, ID_ALUSrc_Out,
ID_PC4_Out, ID_ReadData1_Out, ID_ReadData2_Out, ID_Immediate_Out,
ID_rt_Out, ID_rd_Out, ID_RegWrite_In, ID_PC4_In, ID_Instruction_In,
ID_WriteRegister_In, ID_WriteData_In, ID_Clk_In, ID_Reset_In);

output ID_RegWrite_Out, ID_MemToReg_Out, ID_Branch_Out, ID_RegDst_Out;
output ID_MemRead_Out, ID_ALUSrc_Out, ID_MemWrite_Out;
output [3:0] ID_ALUOp_Out;
output [31:0] ID_PC4_Out;
output [15:0] ID_ReadData1_Out, ID_ReadData2_Out;
output [31:0] ID_Immediate_Out;
output [4:0] ID_rt_Out, ID_rd_Out;

input [31:0] ID_PC4_In;
input ID_RegWrite_In;
input [31:0] ID_Instruction_In;
input [4:0] ID_WriteRegister_In;
input [15:0] ID_WriteData_In;
input ID_Clk_In;
input ID_Reset_In;


Verilog Code (cont)
• EX section
module Stage_EX(EX_RegWrite_Out, EX_MemToReg_Out, EX_Branch_Out, EX_MemRead_Out,
EX_MemWrite_Out, EX_BranchPC_Out, EX_Zero_Out, EX_ALUResult_Out,
EX_ReadData2_Out, EX_WriteRegister_Out, EX_RegWrite_In,
EX_MemToReg_In, EX_Branch_In, EX_MemRead_In, EX_MemWrite_In,
EX_RegDst_In, EX_ALUOp_In, EX_ALUSrc_In, EX_PC4_In, EX_ReadData1_In,
EX_ReadData2_In, EX_Immediate_In, EX_rt_In,
EX_rd_In, EX_Clk_In, EX_Reset_In);

output EX_RegWrite_Out, EX_MemToReg_Out, EX_Branch_Out;
output EX_MemRead_Out, EX_MemWrite_Out;
output [31:0] EX_BranchPC_Out;
output EX_Zero_Out;
output [15:0] EX_ALUResult_Out;
output [15:0] EX_ReadData2_Out;
output [4:0] EX_WriteRegister_Out;

input EX_RegWrite_In, EX_MemToReg_In, EX_Branch_In;
input EX_MemRead_In, EX_MemWrite_In;
input [31:0] EX_PC4_In, EX_Immediate_In;
input EX_RegDst_In, EX_ALUSrc_In, EX_Clk_In, EX_Reset_In;
input [3:0] EX_ALUOp_In;
input [15:0] EX_ReadData1_In, EX_ReadData2_In;
input [4:0] EX_rt_In, EX_rd_In;


Verilog Code (cont)
• Data Memory system
module Stage_MEM(MEM_PCSrc_Out, MEM_BranchPC_Out, MEM_RegWrite_Out, MEM_MemToReg_Out,
MEM_ReadData_Out, MEM_ALUResult_Out, MEM_WriteRegister_Out,
MEM_RegWrite_In, MEM_MemToReg_In, MEM_Branch_In, MEM_MemRead_In,
MEM_MemWrite_In, MEM_BranchPC_In, MEM_Zero_In, MEM_ALUResult_In,
MEM_WriteData_In, MEM_WriteRegister_In, MEM_Clk_In, MEM_Reset_In);

output MEM_PCSrc_Out, MEM_RegWrite_Out, MEM_MemToReg_Out;
output [31:0] MEM_BranchPC_Out;
output [15:0] MEM_ReadData_Out, MEM_ALUResult_Out;
output [4:0] MEM_WriteRegister_Out;

input MEM_RegWrite_In, MEM_MemToReg_In, MEM_Branch_In, MEM_Zero_In;
input [31:0] MEM_BranchPC_In;
input MEM_MemRead_In, MEM_MemWrite_In, MEM_Clk_In, MEM_Reset_In;
input [15:0] MEM_ALUResult_In, MEM_WriteData_In;
input [4:0] MEM_WriteRegister_In;


Verilog Code (cont)
• Write Back system
module Stage_WB(WB_RegWrite_Out, WB_WriteRegister_Out, WB_WriteData_Out,
WB_RegWrite_In, WB_MemToReg_In, WB_ReadData_In, WB_ALUResult_In,
WB_WriteRegister_In);

output WB_RegWrite_Out;
output [4:0] WB_WriteRegister_Out;
output [15:0] WB_WriteData_Out;

input WB_RegWrite_In;
input WB_MemToReg_In;
input [15:0] WB_ReadData_In;
input [15:0] WB_ALUResult_In;
input [4:0] WB_WriteRegister_In;
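
The body of this stage is not shown on the slide. A plausible minimal sketch, reusing the port names and widths above but under a different module name to make clear that it is not the course implementation:

```verilog
// Hypothetical body for the write-back stage (not the course code).
// Assumption: MemToReg = 1 selects the data read from memory (as for lw
// in the control table), MemToReg = 0 selects the ALU result.
module Stage_WB_sketch(WB_RegWrite_Out, WB_WriteRegister_Out, WB_WriteData_Out,
                       WB_RegWrite_In, WB_MemToReg_In, WB_ReadData_In,
                       WB_ALUResult_In, WB_WriteRegister_In);

    output        WB_RegWrite_Out;
    output [4:0]  WB_WriteRegister_Out;
    output [15:0] WB_WriteData_Out;

    input         WB_RegWrite_In;
    input         WB_MemToReg_In;
    input  [15:0] WB_ReadData_In;
    input  [15:0] WB_ALUResult_In;
    input  [4:0]  WB_WriteRegister_In;

    // RegWrite and the destination register number pass straight through
    // to the register file in the ID stage.
    assign WB_RegWrite_Out      = WB_RegWrite_In;
    assign WB_WriteRegister_Out = WB_WriteRegister_In;

    // The MemToReg mux chooses the value to write back.
    assign WB_WriteData_Out = WB_MemToReg_In ? WB_ReadData_In
                                             : WB_ALUResult_In;
endmodule
```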


Verilog Code (cont)
• IF/ID pipeline register (#1)
module Reg_IF_ID(PC4_Out, Instruction_Out, PC4_In,
Instruction_In, Clk_In, Reset_In);

This simply passes the PC and the Instruction from
the IF to the ID stage.
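
A minimal sketch of such a pipeline register, using the port names above under a different module name. The 32-bit widths and the synchronous, active-high reset are assumptions; see the course code for the actual implementation.

```verilog
// Hypothetical IF/ID pipeline register (not the course code).
module Reg_IF_ID_sketch(PC4_Out, Instruction_Out, PC4_In,
                        Instruction_In, Clk_In, Reset_In);

    output reg [31:0] PC4_Out;
    output reg [31:0] Instruction_Out;
    input      [31:0] PC4_In;
    input      [31:0] Instruction_In;
    input             Clk_In;
    input             Reset_In;

    always @(posedge Clk_In) begin
        if (Reset_In) begin
            PC4_Out         <= 32'd0;
            Instruction_Out <= 32'd0;   // all zeros decodes as a nop
        end else begin
            PC4_Out         <= PC4_In;
            Instruction_Out <= Instruction_In;
        end
    end
endmodule
```

The other three pipeline registers follow the same pattern with more signals, including the control lines and the destination register number.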


Verilog Code (cont)

• Also have
– ID/EX pipeline register (#2)
module Reg_ID_EX(…)
– EX/MEM pipeline register (#3)
module Reg_EX_MEM(…)
– MEM/WB pipeline register (#4)
module Reg_MEM_WB(…)

• See the code for more detail


The End

