The document describes the design of a MIPS processor datapath. It discusses the basic components needed for a processor including a register file, ALU, program counter, and data memory. It then shows how these components can be connected to implement the execution of MIPS instructions, including register transfers, arithmetic/logical operations, loads/stores, and branches. The datapath design is able to execute a subset of the MIPS instruction set in a single clock cycle.
1. ES6102
Advanced Digital Systems Design
Complex Sequential systems
Module 6
MIPS Datapath (Case Study)
School of Computer Engineering ES6102: Advanced Digital Systems Design 2011
Adapted from Patterson and Hennessy, “Computer Organization and Design: The Hardware/Software Interface”, 2nd Ed., MKP, 1998. Copyright 1998 Morgan Kaufmann Publishers
2. The Five Classic Components of a Computer
• Processor
– Control
– Datapath
• Memory
• Input
• Output
3. The Performance Perspective
• Performance of a machine is determined by:
– Instruction count
– Clock cycle time
– Clock cycles per instruction (CPI)
• Processor design (datapath and control) will determine:
– Clock cycle time
– Clock cycles per instruction
• Single cycle processor:
– Advantage: One clock cycle per instruction
– Disadvantage: long cycle time
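The three factors above multiply to give execution time. The following is a minimal sketch of that product; the function name `cpu_time` and all numbers are assumptions chosen only for illustration, and the second design is a hypothetical alternative that is not described in these slides.

```python
def cpu_time(instruction_count, cpi, cycle_time_ns):
    """Execution time in ns: instructions x cycles per instruction x ns per cycle."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical numbers, not measurements.
# Single-cycle design: CPI is 1, but the cycle must fit the slowest instruction.
single_cycle_ns = cpu_time(1_000_000, cpi=1, cycle_time_ns=10)

# A hypothetical alternative with a shorter cycle but more cycles per instruction.
alternative_ns = cpu_time(1_000_000, cpi=4, cycle_time_ns=2)

print(single_cycle_ns, alternative_ns)
```

The point of the comparison is only that a CPI of 1 does not by itself guarantee the shortest execution time; the cycle time matters equally.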
4. The Processor: Datapath & Control
• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
– memory-reference instructions: lw, sw
– arithmetic-logical instructions: add, sub, and, or, slt
– control flow instructions: beq, j
• Generic Implementation:
– use the program counter (PC) to supply instruction address
– get the instruction from memory
– read registers
– use the instruction to decide exactly what to do
• All instructions except j use the ALU after reading the registers
5. The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. The three instruction formats:
– R-type: op (6 bits, 31:26) | rs (5 bits, 25:21) | rt (5 bits, 20:16) | rd (5 bits, 15:11) | shamt (5 bits, 10:6) | funct (6 bits, 5:0)
– I-type: op (6 bits, 31:26) | rs (5 bits, 25:21) | rt (5 bits, 20:16) | immediate (16 bits, 15:0)
– J-type: op (6 bits, 31:26) | target address (26 bits, 25:0)
• The different fields are:
– op: operation of the instruction
– rs, rt, rd: the source and destination register specifiers
– shamt: shift amount
– funct: selects the variant of the operation in the “op” field
– address / immediate: address offset or immediate value
– target address: target address of the jump instruction
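As a rough illustration of the field layout, the sketch below extracts every field from a 32-bit instruction word with shifts and masks. The function name `decode` and the dictionary keys are assumptions made for this example; which fields are meaningful depends on the format selected by `op`.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into all possible fields."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31:26
        "rs": (word >> 21) & 0x1F,      # bits 25:21
        "rt": (word >> 16) & 0x1F,      # bits 20:16
        "rd": (word >> 11) & 0x1F,      # bits 15:11 (R-type only)
        "shamt": (word >> 6) & 0x1F,    # bits 10:6  (R-type only)
        "funct": word & 0x3F,           # bits 5:0   (R-type only)
        "imm16": word & 0xFFFF,         # bits 15:0  (I-type only)
        "target": word & 0x3FFFFFF,     # bits 25:0  (J-type only)
    }

# 0x00221820 encodes add $3, $1, $2 (op = 0, funct = 0x20).
fields = decode(0x00221820)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```

Because the register specifiers sit at the same bit positions in every format, the hardware can read the registers before it knows which instruction it is executing.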
6. Let's look at a MIPS subset
• ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
– add rd, rs, rt
– sub rd, rs, rt
• OR Immediate (I-type: op | rs | rt | immediate)
– ori rt, rs, imm16
• LOAD and STORE Word (I-type: op | rs | rt | immediate)
– lw rt, rs, imm16
– sw rt, rs, imm16
• BRANCH (I-type: op | rs | rt | immediate)
– beq rs, rt, imm16
7. Register Transfers
• Process starts by fetching the instruction
op | rs | rt | rd | shamt | funct <= MEM[ PC ]
op | rs | rt | Imm16 <= MEM[ PC ]
inst Register Transfers
ADDU R[rd] <– R[rs] + R[rt]; PC <– PC + 4
SUBU R[rd] <– R[rs] – R[rt]; PC <– PC + 4
ORi R[rt] <– R[rs] | zero_ext(Imm16); PC <– PC + 4
LOAD R[rt] <– MEM[ R[rs] + sign_ext(Imm16) ]; PC <– PC + 4
STORE MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt]; PC <– PC + 4
BEQ if ( R[rs] == R[rt] ) then PC <– PC + 4 + ( sign_ext(Imm16) x 4 )
else PC <– PC + 4
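The register transfers above can be read as a tiny interpreter. The sketch below is one possible rendering under stated assumptions: instructions arrive pre-decoded as a hypothetical tuple `(name, rs, rt, rd, imm16)`, memory is a dictionary keyed by byte address, unwritten memory reads as 0, and the hard-wired zero register of real MIPS is not enforced.

```python
MASK32 = 0xFFFFFFFF

def zero_ext(imm16):
    return imm16 & 0xFFFF

def sign_ext(imm16):
    """Return the 16-bit immediate as a signed Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def step(inst, R, MEM, pc):
    """Apply one register transfer and return the next PC."""
    name, rs, rt, rd, imm16 = inst
    if name == "addu":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif name == "subu":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif name == "ori":
        R[rt] = R[rs] | zero_ext(imm16)
    elif name == "lw":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & MASK32, 0)
    elif name == "sw":
        MEM[(R[rs] + sign_ext(imm16)) & MASK32] = R[rt]
    elif name == "beq":
        if R[rs] == R[rt]:
            return (pc + 4 + sign_ext(imm16) * 4) & MASK32
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return (pc + 4) & MASK32

# Small illustrative program: R1 = 5, R2 = 7, R3 = R1 + R2, store and reload R3.
R, MEM, pc = [0] * 32, {}, 0
program = [
    ("ori", 0, 1, 0, 5),
    ("ori", 0, 2, 0, 7),
    ("addu", 1, 2, 3, 0),
    ("sw", 0, 3, 0, 16),
    ("lw", 0, 4, 0, 16),
]
for inst in program:
    pc = step(inst, R, MEM, pc)
print(R[3], R[4], MEM[16], pc)
```

Every branch of `step` ends by producing a next PC, which mirrors the fact that each register transfer in the table includes a PC update.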
8. Requirements of the Instruction Set
• Memory
– instruction & data
• Registers (32 x 32)
– read RS
– read RT
– Write RT or RD
• PC
• Extender
• Add and Sub register or extended immediate
• Add 4 or extended immediate to PC
9. Need a Storage Element: Register File
• Register File consists of 32 registers:
– Two 32-bit output busses: busA and busB
– One 32-bit input bus: busW
• Register is selected by:
– RA (number) selects the register to put on busA (data)
– RB (number) selects the register to put on busB (data)
– RW (number) selects the register to be written via busW (data) when Write Enable is 1
• Clock input (CLK)
– The CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block: i.e. RA or RB valid => busA or busB valid after “access time.”
[Figure: register file of 32 32-bit registers with 5-bit select inputs RW, RA and RB, Write Enable, Clk, 32-bit input busW, and 32-bit outputs busA and busB]
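The read and write behaviour described above can be modelled in a few lines. This is a behavioural sketch only, not a hardware description; the class name `RegisterFile` and the method names `read` and `clock_edge` are assumptions introduced for the example.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: busA and busB follow RA and RB without a clock."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """On a clock edge, write busW into register RW only if Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=0x1234, write_enable=1)   # written
rf.clock_edge(rw=6, bus_w=0xFFFF, write_enable=0)   # ignored: Write Enable is 0
bus_a, bus_b = rf.read(5, 6)
print(hex(bus_a), hex(bus_b))
```

Separating `read` from `clock_edge` reflects the slide's point that the clock matters only for writes.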
10. Basic Building Blocks
• Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and Carry
• MUX: 32-bit inputs A and B plus Select; 32-bit output Y
• ALU: 32-bit inputs A and B plus OP; 32-bit output Result
• Registers
11. So what do we need?
[Figure: the datapath building blocks]
a. Instruction memory (Instruction address in, Instruction out)
b. Program counter
c. Adder (Add, Sum)
d. Data memory unit (Address, Write data, Read data; controls MemRead and MemWrite)
e. Sign-extension unit (16 bits in, 32 bits out)
f. Registers (5-bit Read register 1, Read register 2 and Write register numbers; Write data, Read data 1, Read data 2; control RegWrite)
g. ALU (3-bit ALU control; outputs Zero and ALU result)
h. Selector (MUX with 32-bit inputs A and B, Select, and 32-bit output Y)
12. How do we connect them?
• Register Transfer Requirements -> Datapath Assembly
– Instruction Fetch
– Then Read Operand and Execute Operation
• Instruction fetch
– Fetch the Instruction: mem[PC]
– Update the program counter:
• Sequential Code: PC <- PC + 4
• Branch and Jump: PC <- “something else”
[Figure: the PC (clocked by Clk) and the Next Address Logic supply the Address to the Instruction Memory, which outputs the 32-bit Instruction Word]
13. Execution (Add and Subtract)
• R[rd] <- R[rs] op R[rt] Example: addu rd, rs, rt
– Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
– ALUctr and RegWr: control logic after decoding the instruction
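[Figure: R-type format op | rs | rt | rd | shamt | funct (6, 5, 5, 5, 5, 6 bits). The register file (32 32-bit registers, clocked by Clk) takes Rw = Rd, Ra = Rs and Rb = Rt (5 bits each) and RegWr; busA and busB feed the ALU, which is controlled by ALUctr; the 32-bit Result returns on busW]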
14. Execution (Logical with Immediate)
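• R[rt] <- R[rs] op ZeroExt[imm16]
[Figure: I-type format op | rs | rt | immediate (6, 5, 5, 16 bits). A mux controlled by RegDst selects Rd or Rt as Rw; Ra = Rs; imm16 passes through ZeroExt (16 to 32 bits), and a mux controlled by ALUSrc selects busB or the extended immediate as the second ALU input; the ALU is controlled by ALUctr and its 32-bit Result returns on busW]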
15. Execution (Load Operations)
• R[rt] <- Mem[ R[rs] + SignExt[imm16] ] Example: lw rt, rs, imm16
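[Figure: I-type format op | rs | rt | immediate (6, 5, 5, 16 bits). The RegDst mux selects Rd or Rt as Rw; the Extender (controlled by ExtOp) widens imm16 to 32 bits and the ALUSrc mux feeds it to the ALU; the ALU output drives the Data Memory address (Adr); the W_Src mux selects the ALU output or the Data Memory output for busW; MemWr drives WrEn on the Data Memory; the register file and the Data Memory are clocked by Clk]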
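The difference between the zero extension used by the logical-immediate path and the sign extension used for load addresses can be sketched as follows. The function names and the encoding of `ext_op` (0 for zero extension, 1 for sign extension) are assumptions for this example; the slides name the ExtOp signal but do not give its encoding.

```python
MASK32 = 0xFFFFFFFF

def zero_ext(imm16):
    """Pad a 16-bit value with zeros to 32 bits."""
    return imm16 & 0xFFFF

def sign_ext(imm16):
    """Copy bit 15 into bits 31..16, giving a 32-bit two's complement pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def extender(imm16, ext_op):
    """Assumed encoding: ext_op = 0 selects zero extension, 1 selects sign extension."""
    return sign_ext(imm16) if ext_op else zero_ext(imm16)

def load_address(rs_value, imm16):
    """Effective address of lw: R[rs] + SignExt[imm16], wrapped to 32 bits."""
    return (rs_value + extender(imm16, 1)) & MASK32

print(hex(extender(0x8001, 0)))          # 0x8001
print(hex(extender(0x8001, 1)))          # 0xffff8001
print(hex(load_address(0x1000, 0xFFFC))) # offset of -4 gives 0xffc
```

Sign extension is what allows a load to reach addresses below the base register, since a 16-bit offset such as 0xFFFC then acts as -4.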
16. Execution (Store Operations)
• Mem[ R[rs] + SignExt[imm16] ] <- R[rt] Example: sw rt, rs, imm16
[Figure: I-type format op | rs | rt | immediate (6, 5, 5, 16 bits). Ra = Rs and Rb = Rt; the Extender (controlled by ExtOp) widens imm16 to 32 bits and the ALUSrc mux feeds it to the ALU to form the address (Adr); busB supplies Data In to the Data Memory, which is written when MemWr asserts WrEn; RegDst, RegWr, ALUctr and W_Src are the remaining control signals]
17. Execution (Branch Operations)
• beq rs, rt, imm16
• Datapath generates condition (equal)
[Figure: I-type format op | rs | rt | immediate (6, 5, 5, 16 bits). busA and busB from the register file are compared (Equal?) to produce Cond; one adder computes PC + 4; a second adder adds the extended imm16 (PC Ext) with “00” appended; a mux controlled by nPC_sel selects the next PC, which supplies the Inst Address]
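The next-address logic for a branch amounts to two adders and a mux. The sketch below models it under the assumption that `npc_sel` is 1 for a branch instruction and 0 otherwise; the slides show the nPC_sel signal but do not state its encoding, and the function name `next_pc` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def next_pc(pc, imm16, npc_sel, equal):
    """Select between PC + 4 and the branch target.

    npc_sel: assumed to be 1 when the instruction is beq.
    equal:   condition generated by the datapath (R[rs] == R[rt]).
    """
    pc_plus_4 = (pc + 4) & MASK32                            # first adder
    imm16 &= 0xFFFF
    offset = imm16 - 0x10000 if imm16 & 0x8000 else imm16    # sign extend
    branch_target = (pc_plus_4 + (offset << 2)) & MASK32     # second adder, "00" appended
    return branch_target if (npc_sel and equal) else pc_plus_4  # mux

print(hex(next_pc(0x100, 3, 1, 1)))       # taken: 0x100 + 4 + 12
print(hex(next_pc(0x100, 3, 1, 0)))       # not taken: 0x104
print(hex(next_pc(0x100, 0xFFFF, 1, 1)))  # offset -1 word: back to 0x100
```

Note that the offset is relative to PC + 4, not to the address of the branch itself, which is why an offset of -1 lands on the branch instruction.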
18. Putting it all together
• A single cycle implementation
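[Figure: the single-cycle datapath. The PC supplies the Read address to the Instruction memory. Instruction [25–21] and Instruction [20–16] select Read register 1 and Read register 2; a mux controlled by RegDst selects Instruction [20–16] or Instruction [15–11] as the Write register. Instruction [15–0] is sign-extended from 16 to 32 bits, and a mux controlled by ALUSrc selects Read data 2 or the extended immediate as the second ALU input. ALU control takes Instruction [5–0] and ALUOp. The ALU result addresses the Data memory (MemRead, MemWrite), and a mux controlled by MemtoReg selects the memory Read data or the ALU result as the register Write data (RegWrite). One adder computes PC + 4, a second adds the immediate shifted left 2, and a mux controlled by PCSrc selects the next PC]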
19. An Abstract View of the Implementation
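[Figure: the Control block receives the Instruction from the Ideal Instruction Memory and Conditions from the Datapath, and returns Control Signals. The Datapath contains the PC with its Next Address logic (supplying the Instruction Address), the register file of 32 32-bit registers (Rw, Ra, Rb driven by the 5-bit Rd, Rs, Rt fields), the ALU with 32-bit inputs A and B, and the Ideal Data Memory (Data Address, Data In, Data Out). The PC, the register file and the data memory are clocked by Clk]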
20. Control of the Datapath
• Control is the hard part
• MIPS makes control easier
– Instructions same size
– Source registers always in same place
– Immediates same size, location
– Operations always on registers/immediates
• Let's skip control until later
21. An Abstract View of the Critical Path
• Register file and ideal memory:
– The CLK input is a factor ONLY during write operation
– During read operation, behave as combinational logic:
• Address valid => Output valid after “access time.”
Critical Path (Load) = PC's Clk-to-Q
+ Instruction Memory Access Time
+ Register File Access Time
+ ALU (32-bit Add)
+ Data Memory Access Time
+ Setup Time for Register File Write
+ Clock Skew
[Figure: abstract datapath. PC with Next Address logic, Ideal Instruction Memory (Instruction Address in; fields Rd, Rs, Rt of 5 bits each and Imm of 16 bits out), 32 32-bit Registers (Rw, Ra, Rb), ALU (inputs A and B), Ideal Data Memory (Data Address, Data In). PC, Registers and Data Memory are clocked by Clk.]
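The critical-path expression above is a plain sum of delays along one register-to-register path. A minimal Python sketch of that sum follows; the delay values are hypothetical numbers chosen only for illustration and do not come from the slides.

```python
# Hypothetical component delays in nanoseconds (illustrative values only;
# the slides do not give numbers).
LOAD_PATH_NS = {
    "pc_clk_to_q": 0.5,
    "instruction_memory_access": 2.0,
    "register_file_access": 1.0,
    "alu_32bit_add": 2.0,
    "data_memory_access": 2.0,
    "register_file_write_setup": 0.5,
    "clock_skew": 0.25,
}


def critical_path_ns(delays):
    """Sum the delays along one path; the clock period must be at least this long."""
    return sum(delays.values())


load_cycle_ns = critical_path_ns(LOAD_PATH_NS)
print(f"Minimum clock period for lw: {load_cycle_ns} ns")
```

With these assumed delays the load path needs 8.25 ns, so the single-cycle clock could not run faster than that.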
22. Critical Path (Load Instruction)
[Figure: single-cycle datapath with the load critical path. Instruction<31:0> from Inst Memory (Adr) is split into Rs <25:21>, Rt <20:16>, Rd <15:11> and Imm16 <15:0>. Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. Blocks: PC, PC Ext and two Adders (one with the constant 4), 32 32-bit Registers (Rw, Ra, Rb, busW, busA, busB), Extender (sign ext, 16 to 32 bits), ALU (with Equal output), Data Memory (WrEn, Adr, Data In) and Muxes; clocked by Clk.]
23. Worst Case Timing (Load)
[Figure: timing diagram for load. Each signal changes from Old Value to New Value after the delay listed, in this order:]
• PC: after Clk-to-Q
• Rs, Rt, Rd, Op, Func: after Instruction Memory Access Time
• ALUctr, ExtOp, ALUSrc, MemtoReg, RegWr: after Delay through Control Logic
• busA: after Register File Access Time
• busB: after Delay through Extender & Mux
• Address: after ALU Delay
• busW: after Data Memory Access Time
• Register Write Occurs at the end of the cycle
24. Single cycle (CPI=1) processor: The problem
• Long Cycle Time
• All instructions take as much time as the slowest
• Real memory is not as nice as our idealized memory
– it cannot always get the job done in one (short) cycle
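Path through the datapath for each instruction class:
• Arithmetic & Logical: PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
• Load (Critical Path): PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup
• Store: PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
• Branch: PC -> Inst Memory -> Reg File -> cmp -> mux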
25. Time is the problem
• For a single-cycle implementation, the time from when one
instruction starts until it completes (the cycle time) is long.
– The cycle time must be long enough for the load instruction.
– The cycle time needed for load is much longer than that needed by
all other instructions.
• Instead consider a multi-cycle approach.
– We will be reusing functional units
• ALU used to compute address and to increment PC
• Memory used for instruction and data
– Our control signals will not be determined solely by instruction
• We’ll use a finite state machine for control
26. Multicycle Approach
• Break up the instructions into steps, each step takes a cycle
– balance the amount of work to be done
– restrict each cycle to use only one major functional unit
• At the end of a cycle
– store values for use in later cycles (easiest thing to do)
– introduce additional “internal” registers
27. Five Execution Steps
• Instruction Fetch
• Instruction Decode and Register Fetch
• Execution, Memory Address Computation, or Branch Completion
• Memory Access or R-type instruction completion
• Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Load instruction is longest and uses all of the above steps.
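To make the 3 to 5 cycle range concrete, the sketch below tabulates how many of the five steps each instruction class uses (branches and jumps finish in step 3, R-type and stores in step 4, loads in step 5) and computes an average CPI. The instruction mix is a hypothetical assumption, not data from the slides.

```python
# Cycles per instruction class in the multicycle design, following the
# five execution steps listed above.
CYCLES = {"r_type": 4, "load": 5, "store": 4, "branch": 3, "jump": 3}

# Hypothetical instruction mix (fractions sum to 1); NOT from the slides.
MIX = {"r_type": 0.5, "load": 0.25, "store": 0.125, "branch": 0.125, "jump": 0.0}


def average_cpi(cycles, mix):
    """Weighted average of cycles per instruction for a given mix."""
    return sum(cycles[kind] * mix[kind] for kind in cycles)


cpi = average_cpi(CYCLES, MIX)
print(f"Average CPI for the assumed mix: {cpi}")
```

For the assumed mix the average is 4.125 cycles per instruction. Each of those cycles is much shorter than the single long cycle of the CPI = 1 design, which is where any gain has to come from.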
28. Step 1: Instruction Fetch
• Use PC to get instruction and put it in the Instruction Register.
IR <- Memory[PC];
• Increment the PC by 4 and put the result back in the PC. (But what
about branches or jumps?)
– Sequential code:
PC <- PC + 4;
– Branch and jump:
PC <- "something else";
[Figure: Clk clocks the PC; Next Address Logic updates the PC; the PC supplies the Address to Instruction Memory, which returns the 32-bit Instruction Word.]
29. Step 2: Inst. Decode and Register Fetch
• Read registers rs and rt in case we need them
A <- Reg[IR[25-21]];
B <- Reg[IR[20-16]];
• Compute the branch address in case the instruction is a branch. The
result is held in ALUOut; the PC itself is not changed in this step.
ALUOut <- PC + (sign-extend(IR[15-0]) << 2);
Note: << 2 is the same as multiplying by 4.
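The branch-address computation can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the PC value passed in has already been incremented by 4 in the fetch step.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, instruction):
    """Return pc + (sign-extend(IR[15-0]) << 2), wrapped to 32 bits.

    pc is assumed to be the already-incremented PC (PC + 4 of the branch).
    """
    offset_words = sign_extend16(instruction & 0xFFFF)
    return (pc + (offset_words << 2)) & 0xFFFFFFFF


# beq with a 16-bit immediate of 4: the target is 4 instructions (16 bytes) ahead.
print(hex(branch_target(0x14, 0x10210004)))
```

A negative immediate such as 0xFFFF sign-extends to -1, so the target moves back by one word.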
30. Step 3 (instruction dependent)
• ALU is performing one of two functions, based on instruction type.
• Memory Reference:
ALUOut <- A + sign-extend(IR[15-0]);
• R-type:
ALUOut <- A op B;
• Note that in the Basic MIPS (MIPS_Basic.zip) the PC for a branch is
calculated in this stage (and not the ID stage as in the previous slide)
– Branch:
if (A==B) PC <- PC + (signext(IR[15-0]) << 2);
31. Step 4 & 5 (R-type or memory-access)
• Loads and stores access memory
MDR = Memory[ALUOut];
or
Memory[ALUOut] = B;
• R-type instructions finish (write back to register file)
Reg[IR[15-11]] = ALUOut;
The write actually takes place at the end of the cycle, on the clock edge.
Step 5 (the write-back step)
A load from memory to the register file needs an extra cycle to complete.
Reg[IR[20-16]] = MDR;
32. Summary
Actions by step name and instruction class (R-type, memory-reference, branches, jumps):
• Instruction fetch (all instructions):
IR = Memory[PC]
PC = PC + 4
• Instruction decode/register fetch (all instructions):
A = Reg[IR[25-21]]
B = Reg[IR[20-16]]
ALUOut = PC + (sign-extend(IR[15-0]) << 2)
• Execution, address computation, branch/jump completion:
– R-type: ALUOut = A op B
– Memory-reference: ALUOut = A + sign-extend(IR[15-0])
– Branches: if (A == B) then PC = ALUOut
– Jumps: PC = PC[31-28] || (IR[25-0] << 2)
• Memory access or R-type completion:
– R-type: Reg[IR[15-11]] = ALUOut
– Load: MDR = Memory[ALUOut]
– Store: Memory[ALUOut] = B
• Memory read completion:
– Load: Reg[IR[20-16]] = MDR
33. How can we reduce the cycle time?
• Cut combinational dependency graph and insert register / latch
• Do same work in two fast cycles, rather than one slow one
[Figure: a single block of Acyclic Combinational Logic between two storage elements is split into Acyclic Combinational Logic (A) and Acyclic Combinational Logic (B), with an additional storage element inserted between them.]
This is pipelining.
34. How can we improve instruction throughput?
Ideal speedup is number of stages in the pipeline. Do we achieve this?
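A rough way to see why the ideal speedup is approached but not reached: with k stages and n instructions, the pipeline needs k cycles for the first instruction and one more cycle for each later one. The sketch below assumes equal-length stages and no hazards or stalls, which is an idealisation rather than a claim about the course datapath.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction occupies all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """First instruction fills the pipeline; each later one finishes a cycle later."""
    return n_stages + n_instructions - 1


def speedup(n_instructions, n_stages):
    return unpipelined_cycles(n_instructions, n_stages) / pipelined_cycles(
        n_instructions, n_stages
    )


for n in (5, 100, 1000):
    print(n, round(speedup(n, 5), 2))
```

The ratio stays below 5 for any finite instruction count because of the fill time, and real pipelines lose more to hazards.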
35. Why Pipeline? Because the resources are there!
[Figure: pipeline diagram over time (clock cycles) for Inst 0 to Inst 4 in instruction order, each passing through Im, Reg, ALU, Dm, Reg, offset by one cycle. Annotation: "But aren't we using two resources here?"]
36. Pipelining
• What makes it easy?
– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores
• What makes it hard?
– structural hazards: suppose we had only one memory
– control hazards: need to worry about branch instructions
– data hazards: an instruction depends on a previous instruction
• We’ll build a simple pipeline and look at these issues
37. Basic Idea
What do we need to add to actually split the datapath into stages?
38. Pipelined Datapath
But which value do we write back? (See next slide.)
39. Corrected Datapath
• The problem with the previous implementation:
– What happens when we write back to the register file? Which instruction
supplies the write register number (destination register)?
– Solution: We must carry the destination register number forward (preserve
it) through the pipeline registers so that it arrives with the data.
40. Graphically Representing Pipelines
• Can help with answering questions like:
– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths
41. Load word instruction
• The load word (lw) instruction is the most complicated as it uses all
stages of the datapath. Consider:
lw $10, 20($1) # R10 <- Mem[R1+20]
1. Instruction fetch:
2. Instruction Decode:
– The immediate value (20) is sign-extended. The source and destination register values are passed on.
3. Execution:
– The immediate and source register values are added to generate the address; the destination register is passed on.
4. Memory
– Data read from memory.
5. Writeback
– Memory data is written to register file at dest. Reg location.
42. Pipeline control
• We have 5 stages. What needs to be controlled in each stage?
– Instruction Fetch and PC Increment
– Instruction Decode / Register Fetch
– Execution
– Memory Stage
– Write Back
43. Pipeline Control
• Pass control signals along just like the data
Control lines grouped by stage: Execution/Address Calculation stage (RegDst, ALUOp1, ALUOp0, ALUSrc), Memory access stage (Branch, MemRead, MemWrite), Write-back stage (RegWrite, MemtoReg).

Instruction  RegDst ALUOp1 ALUOp0 ALUSrc | Branch MemRead MemWrite | RegWrite MemtoReg
R-format       1      1      0      0    |   0       0       0     |    1        0
lw             0      0      0      1    |   0       1       0     |    1        1
sw             X      0      0      1    |   0       0       1     |    0        X
beq            X      0      1      0    |   1       0       0     |    0        X
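One way to picture "passing control signals along like the data" is to group each instruction's signals by the stage that consumes them and drop a group once its stage has used it. The sketch below encodes the table above; the dictionary layout and the helper name `remaining_after` are illustrative assumptions, not the course's Verilog.

```python
X = None  # "don't care" entries from the table

# Control signals grouped by the stage that consumes them (values from the table).
CONTROL = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp1": 1, "ALUOp0": 0, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
    "beq": {
        "EX": {"RegDst": X, "ALUOp1": 0, "ALUOp0": 1, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": X},
    },
}

STAGE_ORDER = ["EX", "MEM", "WB"]


def remaining_after(bundle, stage):
    """Return the control groups that still travel down the pipeline after `stage`."""
    keep = STAGE_ORDER[STAGE_ORDER.index(stage) + 1:]
    return {name: bundle[name] for name in keep}


# After EX, an lw only needs to carry its MEM and WB control groups.
print(sorted(remaining_after(CONTROL["lw"], "EX")))
```

Each pipeline register therefore gets narrower in its control portion as the instruction moves toward write-back.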
44. Datapath with Control
45. Can pipelining get us into trouble?
• Yes: Pipeline Hazards
– structural hazards: attempt to use the same resource two different
ways at the same time
• Only one memory system and we want to access data and instruction
memory in same cycle.
– data hazards: attempt to use item before it is ready
• instruction depends on result of prior instruction still in the pipeline
– control hazards: attempt to make a decision before condition is
evaluated
• branch instructions
• Can always resolve hazards by waiting
– pipeline control must detect the hazard
– take action (or delay action) to resolve hazards
46. Consider the code
PC   OP Code   Instruction
16   4905      add R2, R2, #2
17   6D08      lw R3, #4(R2)
18   0D95      add R3, R3, R1
19   0000      nop
Note: R2 should be 1, but it has not been updated by this point.
47. Single Memory is a Structural Hazard
[Figure: pipeline diagram over time (clock cycles) for Load, Instr 1, Instr 2, Instr 3 and Instr 4 in instruction order, each passing through Mem, Reg, ALU, Mem, Reg, offset by one cycle.]
• The diagram shows an attempt to perform two reads from the one memory at the same time.
• Thus we need 2 separate memories: Instruction memory and Data memory.
48. Data Hazards
• Consider R2. Note: Dependencies backwards in time are hazards
[Figure: pipeline diagram over time (clock cycles) with stages IF, ID/RF, EX, MEM, WB (drawn as Im, Reg, ALU, Dm, Reg) for the following instructions in order:]
sub r2,r1,r3
and r4,r2,r5
or r8,r2,r6
and r9,r4,r2
slt r1,r6,r7
49. Data hazards: Forwarding
• Use temporary results, don’t wait for them to be written
– register file forwarding to handle read/write to same register
– ALU forwarding
Time (in clock cycles):   CC 1  CC 2  CC 3  CC 4  CC 5     CC 6  CC 7  CC 8  CC 9
Value of register $2:     10    10    10    10    10/-20   -20   -20   -20   -20
Value of EX/MEM:          X     X     X     -20   X        X     X     X     X
Value of MEM/WB:          X     X     X     X     -20      X     X     X     X
Program execution order (in instructions), each passing through IM, Reg, ALU, DM, Reg, offset by one cycle:
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
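The ALU forwarding decision for one source operand can be sketched as below. This follows the usual textbook formulation (forward from EX/MEM first, then from MEM/WB, never from register 0); the function name and the 0b10 / 0b01 / 0b00 encoding are assumptions and may differ from the course's Verilog.

```python
def forward_select(id_ex_src, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Choose the ALU operand source for the register number id_ex_src.

    Returns 0b10 to take the value from EX/MEM, 0b01 to take it from MEM/WB,
    and 0b00 to use the value read from the register file.
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return 0b10  # most recent result wins
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_src:
        return 0b01
    return 0b00


# "and $12, $2, $5" in EX while "sub $2, $1, $3" is in EX/MEM: forward $2.
print(forward_select(2, True, 2, False, 0))
# "or $13, $6, $2" in EX one cycle later: sub is now in MEM/WB.
print(forward_select(2, True, 12, True, 2))
```

In the sequence above this is what lets the -20 result reach the `and` and `or` instructions before it is written to register $2 in CC 5.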
50. Data path with forwarding
51. Can't always forward
• Load word can still cause a hazard:
– an instruction tries to read a register following a load instruction that writes to
the same register.
• Thus, we need a hazard detection unit to “stall” the instruction that follows the load
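The load-use check can be sketched as a single predicate. It follows the usual textbook condition: the instruction in EX is a load, and its destination (rt) matches a source register of the instruction in decode. The name `must_stall` and the argument names are hypothetical.

```python
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register a load in EX will write."""
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) in EX, followed by and $4, $2, $5 in ID: one stall cycle is needed.
print(must_stall(True, 2, 2, 5))
# The same register match without a load in EX can be handled by forwarding alone.
print(must_stall(False, 2, 2, 5))
```

After the one-cycle stall the loaded value is available in MEM/WB, so forwarding can supply it.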
52. Solution: Stalling
• We can stall the pipeline by keeping an instruction in the same stage
53. Hazard Detection Unit
54. Control Hazards:- Branch Hazards
• Stall: wait until decision is clear
– It's possible to move the decision up to the 2nd stage by adding hardware to
check the registers as they are being read. How?
• Impact: 2 clock cycles per branch instruction => slow
[Figure: pipeline diagram over time (clock cycles) for Add, Beq and Load in instruction order, each passing through Mem, Reg, ALU, Mem, Reg. Annotation on the Load after the branch: "Need to stall".]
55. Control Hazards:- Branch Hazards
• MIPS uses a “branch delay slot”
– the next instruction after a branch is always executed
– rely on compiler to “fill” the slot with something useful
• Works about 50% of the time. Rest must be NOPs.
[Figure: pipeline diagram over time (clock cycles) for Add, Beq, Misc and Load in instruction order, each passing through Mem, Reg, ALU, Mem, Reg. Misc occupies the branch delay slot after Beq.]
56. Some MIPS Instructions
• Consider the following MIPS instructions (Note: add $2, $1, $3 is $2 <= $1 + $3)
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000001000110001000000100000"; -- add $2, $1, $3 -- 00231020
b"00000000001001100010000000100101"; -- or $4, $1, $6 -- 00262025
b"00000000010000110010100000100000"; -- add $5, $2, $3 -- 00432820
b"00010000001000010000000000000100"; -- beq $1, $1, #4 -- 10210004 (jumps 4 instructions)
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000010001100010000000100100"; -- and $4, $2, $6 -- 00462024
b"00000000001001010011000000100101"; -- or $6, $1, $5 -- 00253025
b"00000000111001110010000000100000"; -- add $4, $7, $7 -- 00E72020
b"00000000001000100001100000100000"; -- add $3, $1, $2 -- 00221820
b"00000000001001100000100000100101"; -- or $1, $1, $6 -- 00260825
b"00000000010001010001100000100000"; -- add $3, $2, $5 -- 00451820
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000000000000000000000000000"; -- nop -- 00000000
b"00000000000000000000000000000000"; -- nop -- 00000000
57. Some MIPS Instructions
• Remember the R format instruction
opcode rs    rt    rd    0     funct
000000 00001 00011 00010 00000 100000 -- add $2, $1, $3
-- add rd, rs, rt
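The R-format field layout can be checked with a small encoder. The sketch below packs opcode (0), rs, rt, rd, a zero shift field and funct into a 32-bit word; the function name and the small funct table are illustrative, covering only the three operations used in the listing.

```python
# funct codes for the R-format operations used in the listing
FUNCT = {"add": 0b100000, "and": 0b100100, "or": 0b100101}


def encode_r_type(mnemonic, rd, rs, rt, shamt=0):
    """Encode `mnemonic rd, rs, rt` as a 32-bit R-format word (opcode = 0)."""
    for reg in (rd, rs, rt, shamt):
        if not 0 <= reg < 32:
            raise ValueError("register and shift fields are 5 bits wide")
    opcode = 0
    return (
        (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6)
        | FUNCT[mnemonic]
    )


word = encode_r_type("add", rd=2, rs=1, rt=3)
print(f"{word:032b}")
print(f"{word:08X}")
```

For add $2, $1, $3 this reproduces the bit pattern shown above, which is 00231020 in hexadecimal.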
58. Simulator Results
[Figure: simulator output for beq $1, $1, #4 -- 10210004. Annotations on the figure:]
• The decoded branch instruction (IF section)
• The register file contents (ID section)
• Branch decision made (EX section)
• New PC updated (MEM section)
• Correct instruction decoded (IF section)
• "These should not be there"
59. Datapath with Control
[Figure: pipelined datapath with control. Annotations:]
• The branch decision is made in the MEM stage.
• We need to move the branch decision to the ID stage.
60. Let's look at the whole system
61. VERILOG code
• Instruction Fetch
module Stage_IF(IF_PC4_Out, IF_Instruction_Out, IF_BranchPC_In,
IF_PCSrc_In, IF_Clk_In, IF_Reset_In);
output [31:0] IF_PC4_Out;
output [31:0] IF_Instruction_Out;
input [31:0] IF_BranchPC_In;
input IF_PCSrc_In;
input IF_Clk_In;
input IF_Reset_In;
Note: The Instruction memory is in this module.
Currently, the next address logic implements:
PC <- PC+4;
or: PC <- PC+4+branch_offset;
The branch offset is calculated in a later section
(ID or EX, depending on version)
64. Verilog Code (cont)
• Data Memory system
module Stage_MEM(MEM_PCSrc_Out, MEM_BranchPC_Out, MEM_RegWrite_Out, MEM_MemToReg_Out,
MEM_ReadData_Out, MEM_ALUResult_Out, MEM_WriteRegister_Out,
MEM_RegWrite_In, MEM_MemToReg_In, MEM_Branch_In, MEM_MemRead_In,
MEM_MemWrite_In, MEM_BranchPC_In, MEM_Zero_In, MEM_ALUResult_In,
MEM_WriteData_In, MEM_WriteRegister_In, MEM_Clk_In, MEM_Reset_In);
output MEM_PCSrc_Out, MEM_RegWrite_Out, MEM_MemToReg_Out;
output [31:0] MEM_BranchPC_Out;
output [15:0] MEM_ReadData_Out, MEM_ALUResult_Out;
output [4:0] MEM_WriteRegister_Out;
input MEM_RegWrite_In, MEM_MemToReg_In, MEM_Branch_In, MEM_Zero_In;
input [31:0] MEM_BranchPC_In;
input MEM_MemRead_In, MEM_MemWrite_In, MEM_Clk_In, MEM_Reset_In;
input [15:0] MEM_ALUResult_In, MEM_WriteData_In;
input [4:0] MEM_WriteRegister_In;
65. Verilog Code (cont)
• Write Back system
module Stage_WB(WB_RegWrite_Out, WB_WriteRegister_Out, WB_WriteData_Out,
WB_RegWrite_In, WB_MemToReg_In, WB_ReadData_In, WB_ALUResult_In,
WB_WriteRegister_In);
output WB_RegWrite_Out;
output [4:0] WB_WriteRegister_Out;
output [15:0] WB_WriteData_Out;
input WB_RegWrite_In;
input WB_MemToReg_In;
input [15:0] WB_ReadData_In;
input [15:0] WB_ALUResult_In;
input [4:0] WB_WriteRegister_In;
66. Verilog Code (cont)
• IF/ID pipeline register (#1)
module Reg_IF_ID(PC4_Out, Instruction_Out, PC4_In,
Instruction_In, Clk_In, Reset_In);
This simply passes the PC and the Instruction from
the IF to the ID stage.
67. Verilog Code (cont)
• Also have
– ID/EX pipeline register (#2)
module Reg_ID_EX(…)
– EX/MEM pipeline register (#3)
module Reg_EX_MEM(…)
– MEM/WB pipeline register (#4)
module Reg_MEM_WB(…)
• See the code for more detail
68. The End