Machine Purpose


Transcript

  • 1. 1-1 Chapter 1: The General Purpose Machine. Topics: 1.1 The General Purpose Machine • 1.2 The User’s View • 1.3 The Machine/Assembly Language Programmer’s View • 1.4 The Computer Architect’s View • 1.5 The Computer System Logic Designer’s View • 1.6 Historical Perspective • 1.7 Trends and Research • 1.8 Approach of the Text. Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 2. Looking Ahead: Chapter 2. Explores the nature of machines and machine languages • Relationship of machines and languages • Generic 32-bit Simple RISC Computer (SRC) • Register transfer notation (RTN) • The main function of the CPU is the register transfer • RTN provides a formal specification of machine structure and function • Maps directly to hardware • RTN and SRC will be used for examples in subsequent chapters • Provides a general discussion of addressing modes • Presents a view of logic design aimed at implementing registers and register transfers
  • 3. Looking Ahead: Chapter 3 • Treats 2 real machines of different types, CISC and RISC, in some depth • Discusses general machine characteristics and performance • Differences in design philosophies of CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer) architectures • CISC machine: Motorola MC68000 • Applies RTN to the description of real machines • RISC machine: SPARC
  • 4. Looking Ahead: Chapter 4. This keystone chapter describes processor design at the logic gate level • Describes the connection between the instruction set and the hardware • Develops alternative 1-, 2-, and 3-bus designs of SRC at the gate level • RTN provides description of structure and function at low and high levels • Shows how to design the control unit that makes it all run • Describes two additional machine features: implementation of exceptions (interrupts) and machine reset capability
  • 5. Looking Ahead: Chapter 5. Important advanced topics in CPU design • General discussion of pipelining (having more than one instruction executing simultaneously): requirements on the instruction set; how instruction classes influence design; pipeline hazards, their detection and management • Design of a pipelined version of SRC • Instruction-level parallelism (issuing more than one instruction simultaneously) • Superscalar and VLIW designs • Microcoding as a way to implement control
  • 6. Looking Ahead: Chapter 6. The arithmetic and logic unit (ALU) • Impact on system performance • Digital number systems and arithmetic in an arbitrary radix: number systems and radix conversion; integer add, subtract, multiply, and divide • Time/space trade-offs: fast parallel arithmetic • Floating point representations and operations • Branching and the ALU • Logic operations • ALU hardware design
  • 7. Looking Ahead: Chapter 7. The memory subsystem of the computer • Structure of 1-bit RAM and ROM cells • RAM chips, boards, and modules • Concept of a memory hierarchy: nature of different levels; interaction of adjacent levels • Virtual memory • Cache design: matching cache and main memory • Memory as a complete system
  • 8. Looking Ahead: Chapter 8. Computer input and output (I/O) • Kinds of system buses, signals and timing • Serial and parallel interfaces • Interrupts and the I/O system • Direct memory access (DMA) • DMA, interrupts, and the I/O system • The hardware/software interface: device drivers
  • 9. Looking Ahead: Chapter 9. Structure, function, and performance of peripheral devices • Disk drives: organization; static and dynamic properties • Video display terminals: memory-mapped video • Printers • Mouse and keyboard • Interfacing to the analog world
  • 10. Looking Ahead: Chapter 10. Computer communications, networking, and the Internet • Communications protocols; layered networks • The OSI layer model • Point to point communication: RS-232 and ASCII • Local area networks (LANs) • Example: Ethernet • Internetworking and the Internet • TCP/IP protocol stack • Packet routing and routers • IP addresses: assignment and use • Nets and subnets: subnet masks • Internet applications and futures
  • 11. Chapter 1: A Perspective • Alan Turing showed that an abstract computer, a Turing machine, can compute any function that is computable by any means • A general purpose computer with enough memory is equivalent to a Turing machine • Over 50 years, computers have evolved from a memory size of 1 kiloword (1,024 words) and clock periods of 1 millisecond (0.001 s) to a memory size of a terabyte (2^40 bytes) and clock periods of 1 ns (10^-9 s) • More speed and capacity are needed for many applications, such as real-time 3D animation
  • 12. Scales, Units, and Conventions
    Term         Normal usage   As a power of 2
    K (kilo-)    10^3           2^10 = 1,024
    M (mega-)    10^6           2^20 = 1,048,576
    G (giga-)    10^9           2^30 = 1,073,741,824
    T (tera-)    10^12          2^40 = 1,099,511,627,776
    Term         Usage
    m (milli-)   10^-3
    µ (micro-)   10^-6
    n (nano-)    10^-9
    p (pico-)    10^-12
    Note the differences between usages. You should commit the powers of 2 and 10 to memory.
    Units: Bit (b), Byte (B), Nibble, Word (w), Double Word, Long Word, Second (s), Hertz (Hz)
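The difference between the two usages grows with each prefix, from roughly 2.4% at kilo to roughly 10% at tera. A minimal Python sketch of that arithmetic follows; the names `PREFIXES` and `relative_gap` are illustrative and do not come from the slides.

```python
# (decimal meaning, power-of-2 meaning) for each prefix listed on the slide.
PREFIXES = {
    "K": (10**3, 2**10),
    "M": (10**6, 2**20),
    "G": (10**9, 2**30),
    "T": (10**12, 2**40),
}

def relative_gap(prefix):
    """Return how much larger the power-of-2 value is, as a fraction of the decimal value."""
    decimal, binary = PREFIXES[prefix]
    return (binary - decimal) / decimal

for name in PREFIXES:
    print(f"{name}: {relative_gap(name):.1%}")
```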
  • 13. Fig 1.1 The User’s View of a Computer [figure: a computer displaying a passage headed “1.10 Looking Ahead”] The user sees software, speed, storage capacity, and peripheral device functionality.
  • 14. Machine/Assembly Language Programmer’s View • Machine language: set of fundamental instructions the machine can execute, expressed as a pattern of 1’s and 0’s • Assembly language: alphanumeric equivalent of machine language; mnemonics are more human-oriented than 1’s and 0’s • Assembler: computer program that transliterates (one-to-one mapping) assembly to machine language • The computer’s native language is machine/assembly language • “Programmer,” as used in this course, means machine/assembly language programmer
  • 15. Machine and Assembly Language • The assembler converts assembly language to machine language. You must also know how to do this.
    Tbl 1.2 Two Motorola MC68000 Instructions
    MC68000 Assembly Language   Machine Language
    MOVE.W D4, D5               0011 101 000 000 100
    ADDI.W #9, D2               0000 000 010 111 100 0000 0000 0000 1001
    In the MOVE.W encoding, 0011 is the op code, 101 selects data reg. #5, and 100 selects data reg. #4.
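The MOVE.W row can be reproduced by packing the fields by hand. The sketch below handles only the data-register-to-data-register form shown on the slide; the function names are made up for illustration, and the field layout (op code, destination register, destination mode, source mode, source register) is the one implied by the slide's bit grouping.

```python
def encode_move_w(src_reg, dst_reg):
    """Encode MOVE.W Dsrc, Ddst for data-register-direct operands only."""
    if not (0 <= src_reg <= 7 and 0 <= dst_reg <= 7):
        raise ValueError("data registers are D0..D7")
    opcode = 0b0011      # MOVE, word size
    dst_mode = 0b000     # data register direct
    src_mode = 0b000     # data register direct
    return (opcode << 12) | (dst_reg << 9) | (dst_mode << 6) | (src_mode << 3) | src_reg

def as_fields(word):
    """Format a 16-bit word using the slide's 4-3-3-3-3 bit grouping."""
    bits = f"{word:016b}"
    return " ".join([bits[0:4], bits[4:7], bits[7:10], bits[10:13], bits[13:16]])

print(as_fields(encode_move_w(4, 5)))  # MOVE.W D4, D5 -> 0011 101 000 000 100
```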
  • 16. The Stored Program Concept. The stored program concept says that the program is stored with data in the computer’s memory. The computer is able to manipulate it as data: for example, to load it from disk, move it in memory, and store it back on disk. • It is the basic operating principle for every computer. • It is so common that it is taken for granted. • Without it, every instruction would have to be initiated manually.
  • 17. Fig 1.2 The Fetch-Execute Process [figure: an MC68000 CPU and main memory. Main memory spans addresses 0 to 2^31 – 1 and holds the 16-bit word 0011 101 000 000 100 at address 4000. In the CPU, the PC holds 4000, the 16-bit IR holds 0011 101 000 000 100, there are various other CPU registers (bits 31 to 0), and the control unit issues control signals.]
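The fetch-execute process in Fig 1.2 can be sketched as a loop that copies the word at the PC into the IR, advances the PC, and then acts on the IR. This toy Python version is only an illustration: it understands a single instruction form (MOVE.W between data registers), and the all-zero "halt" word is invented for the sketch, not an MC68000 encoding.

```python
def run(memory, pc, regs):
    """Toy fetch-execute loop: supports MOVE.W Dn, Dm and a made-up halt word (0)."""
    while True:
        ir = memory[pc]              # fetch: IR <- M[PC]
        pc += 2                      # a 16-bit instruction word occupies 2 bytes
        if ir == 0:                  # hypothetical halt, for this sketch only
            return pc, regs
        # execute: op code 0011 with both operand modes 000 (data register direct)
        if ir >> 12 == 0b0011 and (ir >> 3) & 0b111111 == 0:
            src = ir & 0b111
            dst = (ir >> 9) & 0b111
            # a word move replaces only the low 16 bits of the destination
            regs[dst] = (regs[dst] & ~0xFFFF) | (regs[src] & 0xFFFF)
        else:
            raise ValueError(f"unsupported instruction {ir:016b}")

memory = {4000: 0b0011101000000100, 4002: 0}   # MOVE.W D4, D5 at address 4000
regs = [0] * 8
regs[4] = 0x1234
pc, regs = run(memory, 4000, regs)
print(hex(regs[5]), pc)
```

Because the program lives in the same `memory` dictionary as any data would, the sketch also reflects the stored program concept: the instruction word is just another value that could be read or overwritten.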
  • 18. Programmer’s Model: Instruction Set Architecture (ISA) • Instruction set: the collection of all machine operations. • The programmer sees the set of instructions, along with the machine resources manipulated by them. • The ISA includes the instruction set, memory, and the programmer-accessible registers of the system. • Temporary or scratch-pad memory used to implement some function is not part of the ISA, because it is not programmer accessible.
  • 19. Fig 1.3 Programmer’s Models of 4 Commercial Machines
    M6800 (introduced 1975): 6 special purpose registers: A and B (8-bit), IX, SP, PC (16-bit), and Status; 2^16 bytes of main memory capacity; fewer than 100 instructions
    I8086 (introduced 1979): 16-bit data registers AX, BX, CX, DX; address and count registers SP, BP, SI, DI; memory segment registers CS, DS, SS, ES; IP and Status; 2^20 bytes of main memory capacity; more than 120 instructions
    VAX11 (introduced 1981): 12 general purpose 32-bit registers R0 to R11; AP, FP, SP, PC; PSW; 2^32 bytes of main memory capacity; more than 300 instructions
    PPC601 (introduced 1993): 32 64-bit floating point registers; 32 32-bit general purpose registers; more than 50 32-bit special purpose registers; 2^52 bytes of main memory capacity; more than 250 instructions
  • 20. Machine, Processor, and Memory State • The Machine State: contents of all registers in system, accessible to programmer or not • The Processor State: registers internal to the CPU • The Memory State: contents of registers in the memory system • “State” is used in the formal finite state machine sense • Maintaining or restoring the machine and processor state is important to many operations, especially procedure calls and interrupts
  • 21. Data Type: HLL Versus Machine Language • HLLs provide type checking • Verifies proper use of variables at compile time • Allows compiler to determine memory requirements • Helps detect bad programming practices • Most machines have no type checking • The machine sees only strings of bits • Instructions interpret the strings as a type: usually limited to signed or unsigned integers and FP numbers • A given 32-bit word might be an instruction, an integer, a FP number, or 4 ASCII characters
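The last point can be seen directly by reinterpreting one 32-bit pattern several ways. This is a small sketch using Python's standard `struct` module; the choice of the bytes "ABCD" and of big-endian byte order are assumptions made for the illustration.

```python
import struct

word = b"ABCD"   # one 32-bit pattern: 0x41424344

as_unsigned = struct.unpack(">I", word)[0]   # read as an unsigned 32-bit integer
as_float = struct.unpack(">f", word)[0]      # read as an IEEE single-precision float
as_text = word.decode("ascii")               # read as 4 ASCII characters

print(hex(as_unsigned), as_float, as_text)
```

Nothing in the bits says which reading is intended; the instruction that consumes the word decides.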
  • 22. Tbl 1.3 Instruction Classes
    Instruction class   C             VAX assembly language
    Data movement       a = b         MOV b, a
    Arithmetic/logic    b = c + d*e   MPY d, e, b
                                      ADD c, b, b
    Control flow        goto LBL      BR LBL
    • This compiler: maps C integers to 32-bit VAX integers; maps C assign, *, and + to VAX MOV, MPY, and ADD; maps C goto to the VAX BR instruction • The compiler writer must develop this mapping for each language-machine pair
  • 23. Tools of the Assembly Language Programmer’s Trade • The assembler • The linker • The debugger or monitor • The development system
  • 24. Who Uses Assembly Language • The machine designer: must implement and trade off instruction functionality • The compiler writer: must generate machine language from a HLL • The writer of time- or space-critical code: performance goals may force program-specific optimizations of the assembly language • Special purpose or embedded processor programmers: special functions and heavy dependence on unique I/O devices can make HLLs useless
  • 25. The Computer Architect’s View • Architect is concerned with design and performance • Designs the ISA for optimum programming utility and optimum performance of implementation • Designs the hardware for best implementation of the instructions • Uses performance measurement tools, such as benchmark programs, to see that goals are met • Balances performance of building blocks such as CPU, memory, I/O devices, and interconnections • Meets performance goals at lowest cost
  • 26. Buses as Multiplexers • Interconnections are very important to a computer • Most connections are shared • A bus is a time-shared connection or multiplexer • A bus provides a data path and control • Buses may be serial, parallel, or a combination • Serial buses transmit one bit at a time • Parallel buses transmit many bits simultaneously on many wires
  • 27. Fig 1.4 Simple One- and Two-Bus Architectures [figure: (a) One bus: the CPU, memory, and the input/output subsystem with its input/output devices share a single n-bit system bus. (b) Two buses: the CPU reaches memory over a memory bus and the input/output subsystem with its input/output devices over a separate I/O bus.]
  • 28. Fig 1.5 The Apple Quadra 950 Bus System (Simplified) [figure: the CPU and memory sit on the system bus, which connects to: a LocalTalk interface and LocalTalk bus (printers, other computers); an ADB transceiver and ADB bus (keyboard, mouse, bit pads); a SCSI interface and SCSI bus (disk drives, CD ROM drives); a NuBus interface and NuBus (video and special purpose cards); an Ethernet transceiver and Ethernet (other computers).]
  • 29. Fig 1.6 The Memory Hierarchy • Modern computers have a hierarchy of memories • Allows tradeoffs of speed/cost/volatility/size, etc. • CPU sees common view of levels of the hierarchy. [figure: CPU, cache memory, main memory, disk memory, tape memory]
  • 30. Tools of the Architect’s Trade • Software models, simulators and emulators • Performance benchmark programs • Specialized measurement programs • Data flow and bottleneck analysis • Subsystem balance analysis • Parts, manufacturing, and testing cost analysis
  • 31. Logic Designer’s View • Designs the machine at the logic gate level • The design determines whether the architect meets cost and performance goals • Architect and logic designer may be a single person or team
  • 32. Implementation Domains. An implementation domain is the collection of devices, logic levels, etc. which the designer uses. Possible implementation domains: • VLSI on silicon • TTL or ECL chips • Gallium arsenide chips • PLAs or sea-of-gates arrays • Fluidic logic or optical switches
  • 33. Fig 1.7 Three Implementation Domains for the 2-1 Multiplexer • 2-1 multiplexer in three different implementation domains: generic logic gates (abstract domain); National Semiconductor FAST Advanced Schottky TTL (VLSI on Si); fiber optic directional coupler switch (optical signals in LiNbO3) [figure: (a) abstract view of Boolean logic, with inputs I0 and I1, select S, and output O; (b) TTL implementation domain, a 74F257N chip; (c) optical switch implementation]
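In the abstract Boolean domain, the 2-1 multiplexer of Fig 1.7(a) reduces to one expression over the inputs I0, I1 and the select S. The sketch below assumes that S = 0 selects I0 and S = 1 selects I1; the figure itself is not legible in this transcript, so that polarity is an assumption.

```python
def mux2(s, i0, i1):
    """2-1 multiplexer on single bits: O = (NOT S AND I0) OR (S AND I1)."""
    return ((1 - s) & i0) | (s & i1)

# Truth table over all input combinations.
for s in (0, 1):
    for i0 in (0, 1):
        for i1 in (0, 1):
            print(s, i0, i1, "->", mux2(s, i0, i1))
```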
  • 34. The Distinction Between Classical Logic Design and Computer Logic Design • The entire computer is too complex for traditional FSM design techniques • FSM techniques can be used “in the small” • There is a natural separation between data and control • Data path: storage cells, arithmetic, and their connections • Control path: logic that manages data path information flow • Well defined logic blocks are used repeatedly: multiplexers, decoders, adders, etc.
  • 35. Two Views of the CPU PC Register [figure: Programmer: the PC as a 32-bit register, bits 31 to 0. Logic designer (Fig 1.8): the PC as a 32-bit register of D flip-flops (D, Q, CK) with 32-bit connections to the A bus and B bus and control signals PCin and PCout.]
  • 36. Tools of the Logic Designer’s Trade • Computer-aided design tools • Logic design and simulation packages • Printed circuit layout tools • IC (integrated circuit) design and layout tools • Logic analyzers and oscilloscopes • Hardware development system
  • 37. Historical Generations • 1st generation: 1946–59, vacuum tubes, relays, mercury delay lines • 2nd generation: 1959–64, discrete transistors and magnetic cores • 3rd generation: 1964–75, small- and medium-scale integrated circuits • 4th generation: 1975–present, single-chip microcomputer • Integration scale (components per chip): small, 10–100; medium, 100–1,000; large, 1,000–10,000; very large, greater than 10,000
  • 38. Chapter 1 Summary • Three different views of machine structure and function • Machine/assembly language view: registers, memory cells, instructions • PC, IR • Fetch-execute cycle • Programs can be manipulated as data • No, or almost no, data typing at machine level • Architect views the entire system • Concerned with price/performance, system balance • Logic designer sees system as collection of functional logic blocks • Must consider implementation domain • Tradeoffs: speed, power, gate fan-in, fan-out
  • 39. 2-1 Chapter 2: Machines, Machine Languages, and Digital Logic. Topics: 2.1 Classification of Computers and Their Instructions • 2.2 Computer Instruction Sets • 2.3 Informal Description of the Simple RISC Computer, SRC • 2.4 Formal Description of SRC Using Register Transfer Notation, RTN • 2.5 Describing Addressing Modes with RTN • 2.6 Register Transfers and Logic Circuits: From Behavior to Hardware
  • 40. What Are the Components of an ISA? • Sometimes known as The Programmer’s Model of the machine • Storage cells: general and special purpose registers in the CPU; many general purpose cells of same size in memory; storage associated with I/O devices • The machine instruction set: the instruction set is the entire repertoire of machine operations; it makes use of storage cells, formats, and results of the fetch/execute cycle, i.e., register transfers • The instruction format: size and meaning of fields within the instruction • The nature of the fetch-execute cycle: things that are done before the operation code is known
  • 41. Fig. 2.1 Programmer’s Models of Various Machines. We saw in Chap. 1 a variation in number and type of storage cells. [figure: the programmer’s models of the M6800 (introduced 1975), I8086 (introduced 1979), VAX11 (introduced 1981), and PPC601 (introduced 1993), with main memory capacities of 2^16, 2^20, 2^32, and 2^52 bytes and instruction counts of fewer than 100, more than 120, more than 300, and more than 250, respectively]
  • 42. What Must an Instruction Specify? (Data flow; running examples: add r0, r1, r3 and br endloop) • Which operation to perform: the op code (add, load, branch, etc.) • Where to find the operand or operands: in CPU registers, memory cells, I/O locations, or part of the instruction • Place to store the result: again a CPU register or memory cell • Location of the next instruction: almost always the memory cell pointed to by the program counter (PC) • Sometimes there is no operand, or no result, or no next instruction. Can you think of examples?
  • 43. Instructions Can Be Divided into 3 Classes • Data movement instructions: move data from a memory location or register to another memory location or register without changing its form; Load (source is memory and destination is register); Store (source is register and destination is memory) • Arithmetic and logic (ALU) instructions: change the form of one or more operands to produce a result stored in another location; Add, Sub, Shift, etc. • Branch instructions (control flow instructions): alter the normal flow of control from executing the next instruction in sequence; Br Loc, Brz Loc2 (unconditional or conditional branches)
  • 44. Tbl 2.1 Examples of Data Movement Instructions
    Instruction      Meaning                                                      Machine
    MOV A, B         Move 16 bits from memory location A to location B            VAX11
    LDA A, Addr      Load accumulator A with the byte at memory location Addr     M6800
    lwz R3, A        Move 32-bit data from memory location A to register R3       PPC601
    li $3, 455       Load the 32-bit integer 455 into register $3                 MIPS R3000
    mov R4, dout     Move 16-bit data from R4 to output port dout                 DEC PDP11
    IN, AL, KBD      Load a byte from in port KBD to accumulator                  Intel Pentium
    LEA.L (A0), A2   Load the address pointed to by A0 into A2                    M68000
    • Lots of variation, even with one instruction type
  • 45. Tbl 2.2 Examples of ALU Instructions
    Instruction       Meaning                                                                            Machine
    MULF A, B, C      Multiply the 32-bit floating point values at memory locations A and B, store at C   VAX11
    nabs r3, r1       Store the negative absolute value of r1 in r3                                       PPC601
    ori $2, $1, 255   Store logical OR of reg $1 with 255 into reg $2                                     MIPS R3000
    DEC R2            Decrement the 16-bit value stored in reg R2                                         DEC PDP11
    SHL AX, 4         Shift the 16-bit value in reg AX left by 4 bit positions                            Intel 8086
    • Notice again the complete dissimilarity of both syntax and semantics.
  • 46. Tbl 2.3 Examples of Branch Instructions
    Instruction      Meaning                                                                                                  Machine
    BLBS A, Tgt      Branch to address Tgt if the least significant bit of memory location A is set (i.e. = 1)                 VAX11
    bun r2           Branch to location in R2 if result of previous floating point computation was Not a Number (NaN)          PPC601
    beq $2, $1, 32   Branch to location (PC + 4 + 32) if contents of $1 and $2 are equal                                       MIPS R3000
    SOB R4, Loop     Decrement R4 and branch to Loop if R4 ≠ 0                                                                 DEC PDP11
    JCXZ Addr        Jump to Addr if contents of register CX = 0                                                               Intel 8086
  • 47. CPU Registers Associated with Flow of Control: Branch Instructions • Program counter usually locates next instruction • Condition codes may control branch • Branch targets may be separate registers [figure: processor state consisting of the program counter, the condition codes C, N, V, Z, and branch target registers]
  • 48. HLL Conditionals Implemented by Control Flow Change • Conditions are computed by arithmetic instructions • Program counter is changed to execute only instructions associated with true conditions
    C language:
        if (NUM == 5) SET = 7;
    Assembly language:
            CMP.W #5, NUM   ;the comparison
            BNE   L1        ;conditional branch
            MOV.W #7, SET   ;action if true
        L1  ...             ;action if false
  • 49. CPU Registers May Have a “Personality” • Architecture classes are often based on where the operands and result are located and how they are specified by the instruction. • They can be in CPU registers or main memory. [figure: three machine types: a stack machine, with push and pop acting on a stack (top, second, ...); an accumulator machine, with arithmetic registers and address registers; a general register machine, with general purpose registers]
  • 50. 3-, 2-, 1-, & 0-Address ISAs • The classification is based on arithmetic instructions that have two operands and one result • The key issue is how many of these are specified by memory addresses, as opposed to being specified implicitly • A 3-address instruction specifies memory addresses for both operands and the result: R ← Op1 op Op2 • A 2-address instruction overwrites one operand in memory with the result: Op2 ← Op1 op Op2 • A 1-address instruction has a processor register, called the accumulator, to hold one operand and the result (no address needed): Acc ← Acc op Op1 • A 0-address instruction uses a CPU register stack to hold both operands and the result: TOS ← TOS op SOS (where TOS is Top Of Stack, SOS is Second On Stack) • The 4-address instruction, hardly ever seen, also allows the address of the next instruction to be specified explicitly
  • 51. Fig 2.2 The 4-Address Machine and Instruction Format
    Instruction: add, Res, Op1, Op2, Nexti (Res ← Op1 + Op2)
    Memory: Op1 at Op1Addr, Op2 at Op2Addr, Res at ResAddr, Nexti at NextiAddr
    Instruction format (bits): add (8) | ResAddr (24) | Op1Addr (24) | Op2Addr (24) | NextiAddr (24)
    Field meanings: which operation | where to put result | where to find operands | where to find next instruction
    • Explicit addresses for operands, result, and next instruction • Example assumes 24-bit addresses • Discuss: size of instruction in bytes
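For the discussion point on instruction size, the arithmetic follows directly from the field widths in the figure: an 8-bit op code plus one 24-bit address per explicit address. A minimal sketch, with a hypothetical helper name `instruction_bytes`:

```python
def instruction_bytes(num_addresses, opcode_bits=8, address_bits=24):
    """Size in bytes of an instruction with the given number of explicit addresses."""
    bits = opcode_bits + num_addresses * address_bits
    return (bits + 7) // 8   # round up to whole bytes

print(instruction_bytes(4))  # 8 + 4*24 = 104 bits -> 13 bytes
```

Calling the same function with fewer addresses shows how much each implicit address saves, for example `instruction_bytes(3)` gives 10 bytes.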
  • 52. Fig 2.3 The 3-Address Machine and Instruction Format
    Instruction: add, Res, Op1, Op2 (Res ← Op2 + Op1)
    Memory: Op1 at Op1Addr, Op2 at Op2Addr, Res at ResAddr, Nexti at NextiAddr
    CPU: a 24-bit program counter says where to find the next instruction
    Instruction format (bits): add (8) | ResAddr (24) | Op1Addr (24) | Op2Addr (24)
    Field meanings: which operation | where to put result | where to find operands
    • Address of next instruction kept in a processor state register, the PC (except for explicit branches/jumps) • Rest of addresses in instruction • Discuss: savings in instruction word size
  • 53. 2-15 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.4 The 2-Address Machine and Instruction Format
    [Figure: memory holding Op1 at Op1Addr, Op2 and then Res at Op2Addr, and Nexti at NextiAddr; CPU with a 24-bit program counter that says where to find the next instruction]
    Instruction: add Op2, Op1 (Op2 ← Op2 + Op1)
    Instruction format:
      Bits:   8     24        24
      Field:  add   Op2Addr   Op1Addr
      Role:   add = which operation; Op2Addr, Op1Addr = where to find operands; Op2Addr = where to put result
    • Result overwrites Operand 2
    • Needs only 2 addresses in instruction but less choice in placing data
  • 54. 2-16 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.5 1-Address Machine and Instruction Format
    [Figure: memory holding Op1 at Op1Addr and Nexti at NextiAddr; CPU with an accumulator (where to find operand 2, and where to put result) and a 24-bit program counter (where to find next instruction)]
    Instruction: add Op1 (Acc ← Acc + Op1)
    Instruction format:
      Bits:   8     24
      Field:  add   Op1Addr
      Role:   add = which operation; Op1Addr = where to find operand 1
    Need instructions to load and store operands: LDA OpAddr, STA OpAddr
    • Special CPU register, the accumulator, supplies 1 operand and stores result
    • One memory address used for other operand
  • 55. 2-17 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.6 The 0-Address, or Stack, Machine and Instruction Format
    [Figure: memory holding Op1 at Op1Addr and Nexti at NextiAddr; CPU with a stack (TOS, SOS, etc.) that says where to find operands and where to put the result, and a 24-bit program counter that says where to find the next instruction]
    Instruction formats:
      push Op1 (TOS ← Op1)      Bits: 8 24    Format: push | Op1Addr    (operation, result)
      add (TOS ← TOS + SOS)     Bits: 8       Format: add               (which operation)
    • Uses a push-down stack in CPU
    • Arithmetic uses stack for both operands and the result
    • Computer must have a 1-address instruction to push and pop operands to and from the stack
  • 56. 2-18 Chapter 2—Machines, Machine Languages, and Digital Logic Example 2.1 Expression Evaluation for 3-, 2-, 1-, and 0-Address Machines
    Evaluate a = (b+c)*d - e
      3-address      2-address    1-address   Stack
      add a, b, c    load a, b    load b      push b
      mpy a, a, d    add a, c     add c       push c
      sub a, a, e    mpy a, d     mpy d       add
                     sub a, e     sub e       push d
                                  store a     mpy
                                              push e
                                              sub
                                              pop a
    • Number of instructions & number of addresses both vary
    • Discuss as examples: size of code in each case
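The "size of instruction" and "size of code" discussion points can be worked through numerically. The following is a minimal sketch, assuming the 8-bit opcode and 24-bit address fields shown in the instruction-format figures; the names `instruction_bytes`, `programs`, and `code_size` are made up for this illustration.

```python
# Assumed field widths, taken from the instruction-format figures.
OPCODE_BITS = 8
ADDRESS_BITS = 24

def instruction_bytes(num_addresses: int) -> int:
    """Size in bytes of one instruction carrying this many memory addresses."""
    return (OPCODE_BITS + num_addresses * ADDRESS_BITS) // 8

# Programs for a = (b+c)*d - e, listed as the address count of each instruction.
programs = {
    "3-address": [3, 3, 3],                 # add, mpy, sub
    "2-address": [2, 2, 2, 2],              # load, add, mpy, sub
    "1-address": [1, 1, 1, 1, 1],           # load, add, mpy, sub, store
    "stack":     [1, 1, 0, 1, 0, 1, 0, 1],  # push, push, add, push, mpy, push, sub, pop
}

def code_size(name: str) -> int:
    """Total bytes of code for the named program."""
    return sum(instruction_bytes(n) for n in programs[name])

for name in programs:
    print(f"{name}: {len(programs[name])} instructions, {code_size(name)} bytes")
```

Under these assumptions a 4-address instruction would take 13 bytes and a 3-address instruction 10 bytes. Fewer addresses per instruction means shorter instructions but more of them, so total code size does not fall in step with instruction size.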
  • 57. 2-19 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.7 General Register Machine and Instruction Formats
    [Figure: memory holding Op1 at Op1Addr and Nexti; CPU with registers (R2, R4, R6, R8 shown) and a program counter]
    Instruction formats:
      load R8, Op1 (R8 ← Op1)          Format: load | R8 | Op1Addr
      add R2, R4, R6 (R2 ← R4 + R6)    Format: add | R2 | R4 | R6
    • It is the most common choice in today’s general-purpose computers
    • Which register is specified by small “address” (3 to 6 bits for 8 to 64 registers)
    • Load and store have one long & one short address: 1-1/2 addresses
    • Arithmetic instruction has 3 “half” addresses
  • 58. 2-20 Chapter 2—Machines, Machine Languages, and Digital Logic Real Machines Are Not So Simple • Most real machines have a mixture of 3, 2, 1, 0, and 1-1/2 address instructions • A distinction can be made on whether arithmetic instructions use data from memory • If ALU instructions only use registers for operands and result, machine type is load-store • Only load and store instructions reference memory • Other machines have a mix of register-memory and memory-memory instructions
  • 59. 2-21 Chapter 2—Machines, Machine Languages, and Digital Logic Addressing Modes • An addressing mode is hardware support for a useful way of determining a memory address • Different addressing modes solve different HLL problems • Some addresses may be known at compile time, e.g., global variables • Others may not be known until run time, e.g., pointers • Addresses may have to be computed. Examples include: • Record (struct) components: • variable base (full address) + constant (small) • Array components: • constant base (full address) + index variable (small) • Possible to store constant values w/o using another memory cell by storing them with or adjacent to the instruction itself
  • 60. 2-22 Chapter 2—Machines, Machine Languages, and Digital Logic HLL Examples of Structured Addresses • C language: rec->count • rec is a pointer to a record: full address variable • count is a field name: fixed byte offset, say 24 • C language: v[i] • v is fixed base address of array: full address constant • i is name of variable index: no larger than array size • Variables must be contained in registers or memory cells • Small constants can be contained in the instruction • Result: need for “address arithmetic.” • E.g., the address of rec->count is the address held in rec + the offset of count.
  • 61. 2-23 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.8 Common Addressing Modes
    a) Immediate addressing (instruction contains the operand): LOAD #3, ...
    b) Direct addressing (instruction contains memory address of operand): LOAD A, ...
    c) Indirect addressing (instruction contains memory address of address of operand): LOAD (A), ...
    d) Register indirect addressing (register contains address of operand): LOAD [R2], ...
    e) Displacement (based) (indexed) addressing (address of operand = register + constant): LOAD 4[R2], ...
    f) Relative addressing (address of operand = PC + constant): LOADRel 4[PC], ...
  • 62. 2-24 Chapter 2—Machines, Machine Languages, and Digital Logic Example: Computer, SRC Simple RISC Computer
    • 32 general purpose registers of 32 bits
    • 32-bit program counter, PC, and instruction register, IR
    • 2^32 bytes of memory address space
    [Figure: the SRC CPU with 32 32-bit general purpose registers R0 to R31 (bits 31..0), PC, and IR; main memory of 2^32 bytes (bits 7..0) at addresses 0 to 2^32 – 1]
    R[7] means contents of register 7
    M[32] means contents of memory location 32
  • 63. 2-25 Chapter 2—Machines, Machine Languages, and Digital Logic SRC Characteristics • Load-store design: only way to access memory is through load and store instructions • Only a few addressing modes are supported • ALU instructions are 3-register type • Branch instructions can branch unconditionally or conditionally on whether the value in a specified register is = 0, <> 0, >= 0, or < 0 • Branch and link instructions are similar, but leave the value of current PC in any register, useful for subroutine return • All instructions are 32 bits (1 word) long
  • 64. 2-26 Chapter 2—Machines, Machine Languages, and Digital Logic SRC Basic Instruction Formats
    • There are three basic instruction format types
    • The number of register specifier fields and length of the constant field vary
    • Other formats result from unused fields or parts
    • Details of formats on next slide
      Type 1:  op 〈31..27〉 | ra 〈26..22〉 | c1 〈21..0〉
      Type 2:  op 〈31..27〉 | ra 〈26..22〉 | rb 〈21..17〉 | c2 〈16..0〉
      Type 3:  op 〈31..27〉 | ra 〈26..22〉 | rb 〈21..17〉 | rc 〈16..12〉 | c3 〈11..0〉
  • 65. 2-27 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.9 (Partial) Total of 7 Detailed Formats
    Field positions: op 〈31..27〉, ra 〈26..22〉, rb 〈21..17〉, rc 〈16..12〉, c1 〈21..0〉, c2 〈16..0〉, c3 〈11..0〉; Cond is bits 〈2..0〉 and Count is bits 〈4..0〉
    Instruction formats (fields, msb to lsb) and examples:
      1. ld, st, la, addi, andi, ori:  op | ra | rb | c2
           ld r3, A          (R[3] = M[A])
           ld r3, 4(r5)      (R[3] = M[R[5] + 4])
           addi r2, r4, #1   (R[2] = R[4] + 1)
      2. ldr, str, lar:  op | ra | c1
           ldr r5, 8         (R[5] = M[PC + 8])
           lar r6, 45        (R[6] = PC + 45)
      3. neg, not:  op | ra | unused | rc | unused
           neg r7, r9        (R[7] = – R[9])
      4. br:  op | unused | rb | rc | (c3) unused | Cond
           brzr r4, r0       (branch to R[4] if R[0] == 0)
      5. brl:  op | ra | rb | rc | (c3) unused | Cond
           brlnz r6, r4, r0  (R[6] = PC; branch to R[4] if R[0] ≠ 0)
      6. add, sub, and, or:  op | ra | rb | rc | unused
           add r0, r2, r4    (R[0] = R[2] + R[4])
      7. shr, shra, shl, shc:
         7a:  op | ra | rb | (c3) unused | Count
           shr r0, r1, #4    (R[0] = R[1] shifted right by 4 bits)
         7b:  op | ra | rb | rc | (c3) unused | 00000
           shl r2, r4, r6    (R[2] = R[4] shifted left by count in R[6])
      8. nop, stop:  op | unused
           stop
  • 66. 2-28 Chapter 2—Machines, Machine Languages, and Digital Logic Tbl 2.4 Example SRC Load and Store Instructions
    • Address can be constant, constant + register, or constant + PC
    • Memory contents or address itself can be loaded
      Instruction       op   ra   rb   c1    Meaning                 Addressing Mode
      ld r1, 32         1    1    0    32    R[1] ← M[32]            Direct
      ld r22, 24(r4)    1    22   4    24    R[22] ← M[24+R[4]]      Displacement
      st r4, 0(r9)      3    4    9    0     M[R[9]] ← R[4]          Register indirect
      la r7, 32         5    7    0    32    R[7] ← 32               Immediate
      ldr r12, -48      2    12   –    -48   R[12] ← M[PC -48]       Relative
      lar r3, 0         6    3    –    0     R[3] ← PC               Register (!)
    (note use of la to load a constant)
  • 67. 2-29 Chapter 2—Machines, Machine Languages, and Digital Logic Assembly Language Forms of Arithmetic and Logic Instructions
      Format             Example            Meaning
      neg ra, rc         neg r1, r2         ;Negate (r1 = -r2)
      not ra, rc         not r2, r3         ;Not (r2 = r3´ )
      add ra, rb, rc     add r2, r3, r4     ;2’s complement addition
      sub ra, rb, rc                        ;2’s complement subtraction
      and ra, rb, rc                        ;Logical and
      or ra, rb, rc                         ;Logical or
      addi ra, rb, c2    addi r1, r3, #1    ;Immediate 2’s complement add
      andi ra, rb, c2                       ;Immediate logical and
      ori ra, rb, c2                        ;Immediate logical or
    • Immediate subtract not needed since constant in addi may be negative
  • 68. 2-30 Chapter 2—Machines, Machine Languages, and Digital Logic Branch Instruction Format
    There are actually only two branch instructions:
      br rb, rc, c3<2..0>        ; branch to R[rb] if R[rc] meets the condition defined by c3<2..0>
      brl ra, rb, rc, c3<2..0>   ; R[ra] ← PC; branch as above
    • It is c3<2..0>, the 3 lsbs of c3, that governs what the branch condition is:
      lsbs   condition    Assy language form   Example
      000    never        brlnv                brlnv r6
      001    always       br, brl              br r5, brl r5
      010    if rc = 0    brzr, brlzr          brzr r2, r4, r5
      011    if rc ≠ 0    brnz, brlnz
      100    if rc ≥ 0    brpl, brlpl
      101    if rc < 0    brmi, brlmi
    • Note that branch target address is always in register R[rb].
    • It must be placed there explicitly by a previous instruction.
  • 69. 2-31 Chapter 2—Machines, Machine Languages, and Digital Logic Tbl 2.6 Forms and Formats of the br and brl Instructions
      Ass’y lang.   Example instr.     Meaning                              op   ra   rb   rc   c3〈2..0〉   Branch Cond’n.
      brlnv         brlnv r6           R[6] ← PC                            9    6    —    —    000        never
      br            br r4              PC ← R[4]                            8    —    4    —    001        always
      brl           brl r6,r4          R[6] ← PC; PC ← R[4]                 9    6    4    —    001        always
      brzr          brzr r5,r1         if (R[1]=0) PC ← R[5]                8    —    5    1    010        zero
      brlzr         brlzr r7,r5,r1     R[7] ← PC; if (R[1]=0) PC ← R[5]     9    7    5    1    010        zero
      brnz          brnz r1, r0        if (R[0]≠0) PC ← R[1]                8    —    1    0    011        nonzero
      brlnz         brlnz r2,r1,r0     R[2] ← PC; if (R[0]≠0) PC ← R[1]     9    2    1    0    011        nonzero
      brpl          brpl r3, r2        if (R[2]≥0) PC ← R[3]                8    —    3    2    100        plus
      brlpl         brlpl r4,r3,r2     R[4] ← PC; if (R[2]≥0) PC ← R[3]     9    4    3    2    100        plus
      brmi          brmi r0, r1        if (R[1]<0) PC ← R[0]                8    —    0    1    101        minus
      brlmi         brlmi r3,r0,r1     R[3] ← PC; if (R[1]<0) PC ← R[0]     9    3    0    1    101        minus
  • 70. 2-32 Chapter 2—Machines, Machine Languages, and Digital Logic Branch Instructions—Example
    C:    goto Label3
    SRC:          lar r0, Label3   ; put branch target address into tgt reg.
                  br r0            ; and branch
                  • • •
          Label3  • • •
  • 71. 2-33 Chapter 2—Machines, Machine Languages, and Digital Logic Example of Conditional Branch
    in C:
      #define Cost 125
      if (X<0) X = -X;
    in SRC:
      Cost  .equ 125       ;define symbolic constant
            .org 1000      ;next word will be loaded at address 1000 (base 10)
      X:    .dw 1          ;reserve 1 word for variable X
            .org 5000      ;program will be loaded at location 5000 (base 10)
            lar r0, Over   ;load address of “false” jump location
            ld r1, X       ;load value of X into r1
            brpl r0, r1    ;branch to Over if r1≥0
            neg r1, r1     ;negate value
      Over: • • •          ;continue
  • 72. 2-34 Chapter 2—Machines, Machine Languages, and Digital Logic RTN (Register Transfer Notation) • Provides a formal means of describing machine structure and function • Is at the “just right” level for machine descriptions • Does not replace hardware description languages • Can be used to describe what a machine does (an abstract RTN) without describing how the machine does it • Can also be used to describe a particular hardware implementation (a concrete RTN)
  • 73. 2-35 Chapter 2—Machines, Machine Languages, and Digital Logic RTN (cont’d.) • At first you may find this “meta description” confusing, because it is a language that is used to describe a language • You will find that developing a familiarity with RTN will aid greatly in your understanding of new machine design concepts • We will describe RTN by using it to describe SRC
  • 74. 2-36 Chapter 2—Machines, Machine Languages, and Digital Logic Some RTN Features— Using RTN to Describe a Machine’s Static Properties Static Properties • Specifying registers • IR〈31..0〉 specifies a register named “IR” having 32 bits numbered 31 to 0 • “Naming” using the := naming operator: • op〈4..0〉 := IR〈31..27〉 specifies that the 5 msbs of IR be called op, with bits 4..0 • Notice that this does not create a new register, it just generates another name, or “alias,” for an already existing register or part of a register
  • 75. 2-37 Chapter 2—Machines, Machine Languages, and Digital Logic Using RTN to Describe Dynamic Properties Dynamic Properties • Conditional expressions: (op=12) → R[ra] ← R[rb] + R[rc]: ; defines the add instruction • Here (op=12) is the “if” condition, → reads as “then,” and ← is the RTN assignment operator • This fragment of RTN describes the SRC add instruction. It says, “when the op field of IR = 12, then store in the register specified by the ra field, the result of adding the register specified by the rb field to the register specified by the rc field.”
  • 76. 2-38 Chapter 2—Machines, Machine Languages, and Digital Logic Using RTN to Describe the SRC (Static) Processor State
    Processor state
      PC〈31..0〉:          program counter (memory addr. of next inst.)
      IR〈31..0〉:          instruction register
      Run:                one bit run/halt indicator
      Strt:               start signal
      R[0..31]〈31..0〉:    general purpose registers
  • 77. 2-39 Chapter 2—Machines, Machine Languages, and Digital Logic RTN Register Declarations
    • The general register specification shows some features of the notation
    • Describes a set of 32 32-bit registers with names R[0] to R[31]
      R[0..31]〈31..0〉:
    Annotations on the declaration: R is the name of the registers; the register # goes in square brackets; the bit # goes in angle brackets (msb # first, lsb # last); .. specifies a range of indices; the colon separates statements with no ordering
  • 78. 2-40 Chapter 2—Machines, Machine Languages, and Digital Logic Memory Declaration: RTN Naming Operator
    • Defining names with formal parameters is a powerful formatting tool
    • Used here to define word memory (big-endian)
    Main memory state
      Mem[0..2^32 - 1]〈7..0〉:    2^32 addressable bytes of memory
      M[x]〈31..0〉 := Mem[x]#Mem[x+1]#Mem[x+2]#Mem[x+3]:
    Annotations on the definition: x is a dummy parameter; := is the naming operator; # is the concatenation operator; all bits in a register are meant if no bit index is given
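The word-memory definition M[x] := Mem[x]#Mem[x+1]#Mem[x+2]#Mem[x+3] can be mirrored in a few lines of code. This is a minimal sketch, assuming a small `bytearray` as a stand-in for the full byte memory; `read_word` is a name made up for this illustration.

```python
def read_word(mem: bytearray, x: int) -> int:
    """M[x]: concatenate Mem[x]..Mem[x+3], most significant byte first."""
    word = 0
    for offset in range(4):
        # '#' (concatenation) appends the next 8 bits on the right.
        word = (word << 8) | mem[x + offset]
    return word

mem = bytearray(16)                          # tiny stand-in for the byte memory
mem[4:8] = bytes([0x12, 0x34, 0x56, 0x78])   # Mem[4] holds the most significant byte
print(hex(read_word(mem, 4)))
```

Because the byte at the lowest address ends up in the most significant position, the word read at address 4 is 0x12345678, which is what big-endian means here.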
  • 79. 2-41 Chapter 2—Machines, Machine Languages, and Digital Logic RTN Instruction Formatting Uses Renaming of IR Bits
    Instruction formats
      op〈4..0〉 := IR〈31..27〉:    operation code field
      ra〈4..0〉 := IR〈26..22〉:    target register field
      rb〈4..0〉 := IR〈21..17〉:    operand, address index, or branch target register
      rc〈4..0〉 := IR〈16..12〉:    second operand, conditional test, or shift count register
      c1〈21..0〉 := IR〈21..0〉:    long displacement field
      c2〈16..0〉 := IR〈16..0〉:    short displacement or immediate field
      c3〈11..0〉 := IR〈11..0〉:    count or modifier field
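The field renamings are plain bit-slices of IR, so they translate directly into shifts and masks. This is a minimal sketch: `bits` and `decode` are hypothetical helper names, and the sample instruction word is chosen only for illustration.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return word<hi..lo> as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode(ir: int) -> dict:
    """Give names to the fields of a 32-bit IR value (no new storage is implied)."""
    return {
        "op": bits(ir, 31, 27),
        "ra": bits(ir, 26, 22),
        "rb": bits(ir, 21, 17),
        "rc": bits(ir, 16, 12),
        "c1": bits(ir, 21, 0),
        "c2": bits(ir, 16, 0),
        "c3": bits(ir, 11, 0),
    }

# Sample word: op = 00001, ra = 00101, rb = 00011, c2 = 17 bits holding 11.
ir = int("00001" "00101" "00011" "00000000000001011", 2)
fields = decode(ir)
print(fields["op"], fields["ra"], fields["rb"], fields["c2"])
```

The fields overlap on purpose: c1, c2, c3, rb, and rc are different views of the same low-order bits, and the opcode decides which view is meaningful.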
  • 80. 2-42 Chapter 2—Machines, Machine Languages, and Digital Logic Specifying Dynamic Properties of SRC: RTN Gives Specifics of Address Calculation
    Effective address calculations (occur at runtime):
      disp〈31..0〉 := ((rb=0) → c2〈16..0〉 {sign extend}:                       displacement
                      (rb≠0) → R[rb] + c2〈16..0〉 {sign extend, 2’s comp.} ):   address
      rel〈31..0〉 := PC〈31..0〉 + c1〈21..0〉 {sign extend, 2’s comp.}:           relative address
    • Renaming defines displacement and relative addresses
    • New RTN notation is used
      • condition → expression means if condition then expression
      • modifiers in { } describe type of arithmetic or how short numbers are extended to longer ones
      • arithmetic operators (+ - * / etc.) can be used in expressions
    • Register R[0] cannot be added to a displacement
  • 81. 2-43 Chapter 2—Machines, Machine Languages, and Digital Logic Detailed Questions Answered by the RTN for Addresses • What set of memory cells can be addressed by direct addressing (displacement with rb=0)? • If c2〈16〉=0 (positive displacement) absolute addresses range from 00000000H to 0000FFFFH • If c2〈16〉=1 (negative displacement) absolute addresses range from FFFF0000H to FFFFFFFFH • What range of memory addresses can be specified by a relative address? • The largest positive value of c1〈21..0〉 is 2^21 - 1 and its most negative value is -2^21, so addresses up to 2^21 - 1 forward and 2^21 backward from the current PC value can be specified • Note the difference between rb and R[rb]
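The address ranges quoted above follow from sign extension and 32-bit wraparound, which can be checked directly. The following is a minimal sketch of the disp and rel calculations; the function names and the list `R` standing in for the register file are assumptions made for this illustration.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a 2's complement number."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value

def disp(R: list, rb: int, c2: int) -> int:
    """Displacement address: c2 alone when rb = 0, else R[rb] + c2."""
    offset = sign_extend(c2, 17)
    base = 0 if rb == 0 else R[rb]   # rb = 0 means "no base register", not R[0]
    return (base + offset) & MASK32

def rel(pc: int, c1: int) -> int:
    """Relative address: PC + sign-extended 22-bit c1."""
    return (pc + sign_extend(c1, 22)) & MASK32

R = [0] * 32
R[0] = 999                           # ignored by disp when rb = 0
R[4] = 100
print(hex(disp(R, 0, 0x0FFFF)))      # top of the positive direct range
print(hex(disp(R, 0, 0x10000)))      # bottom of the negative direct range
print(disp(R, 4, 24), rel(1000, -48 & 0x3FFFFF))
```

The check `rb == 0` tests the register number, not the register contents, which is the distinction between rb and R[rb] that the slide points out.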
  • 82. 2-44 Chapter 2—Machines, Machine Languages, and Digital Logic Instruction Interpretation: RTN Description of Fetch-Execute
    • Need to describe actions (not just declarations)
    • Some new notation
      instruction_interpretation := (
          ¬Run∧Strt → Run ← 1:
          Run → (IR ← M[PC]: PC ← PC + 4; instruction_execution) );
    Notation: ¬ is logical NOT; ∧ is logical AND; ← is a register transfer; ; separates statements that occur in sequence
  • 83. 2-45 Chapter 2—Machines, Machine Languages, and Digital Logic RTN Sequence and Clocking • In general, RTN statements separated by : take place during the same clock pulse • Statements separated by ; take place on successive clock pulses • This is not entirely accurate since some things written with one RTN statement can take several clocks to perform • More precise difference between : and ; • The order of execution of statements separated by : does not matter • If statements are separated by ; the one on the left must be complete before the one on the right starts
  • 84. 2-46 Chapter 2—Machines, Machine Languages, and Digital Logic More About Instruction Interpretation RTN • In the expression IR ← M[PC]: PC ← PC + 4; which value of PC applies to M[PC] ? • The rule in RTN is that all right hand sides of “:” - separated RTs are evaluated before any LHS is changed • In logic design, this corresponds to “master-slave” operation of flip-flops • We see what happens when Run is true and when Run is false but Strt is true. What about the case of Run and Strt both false? • Since no action is specified for this case, the RTN implicitly says that no action occurs in this case
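The rule that all right-hand sides are evaluated before any left-hand side changes has a close analogue in Python's tuple assignment. This is a small sketch with made-up memory contents, contrasting the concurrent form with a sequential ordering that advances PC first.

```python
M = {100: 0xAAAA, 104: 0xBBBB}   # hypothetical memory: address -> word

# Concurrent transfers (":"): both right-hand sides are read with the old PC,
# which is also how Python evaluates a tuple assignment.
PC = 100
IR, PC = M[PC], PC + 4
concurrent = (IR, PC)

# Sequential transfers (";") with PC advanced first would fetch the wrong word.
PC = 100
PC = PC + 4
IR = M[PC]
sequential = (IR, PC)

print(hex(concurrent[0]), concurrent[1])
print(hex(sequential[0]), sequential[1])
```

Both versions leave PC at 104, but only the concurrent form loads IR from address 100, which is the behavior the RTN rule guarantees.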
  • 85. 2-47 Chapter 2—Machines, Machine Languages, and Digital Logic Individual Instructions • instruction_interpretation contained a forward reference to instruction_execution • instruction_execution is a long list of conditional operations • The condition is that the op code specifies a given instruction • The operation describes what that instruction does • Note that the operations of the instruction are done after (;) the instruction is put into IR and the PC has been advanced to the next instruction
  • 86. 2-48 Chapter 2—Machines, Machine Languages, and Digital Logic RTN Instruction Execution for Load and Store Instructions
      instruction_execution := (
        ld (:= op= 1) → R[ra] ← M[disp]:     load register
        ldr (:= op= 2) → R[ra] ← M[rel]:     load register relative
        st (:= op= 3) → M[disp] ← R[ra]:     store register
        str (:= op= 4) → M[rel] ← R[ra]:     store register relative
        la (:= op= 5 ) → R[ra] ← disp:       load displacement address
        lar (:= op= 6) → R[ra] ← rel:        load relative address
    • The in-line definition (:= op=1) saves writing a separate definition ld := op=1 for the ld mnemonic
    • The previous definitions of disp and rel are needed to understand all the details
  • 87. 2-49 Chapter 2—Machines, Machine Languages, and Digital Logic SRC RTN—The Main Loop
      ii := instruction_interpretation:
      ie := instruction_execution :
      ii := ( ¬Run∧Strt → Run ← 1:
              Run → (IR ← M[PC]: PC ← PC + 4; ie) );
      ie := ( ld (:= op= 1) → R[ra] ← M[disp]:
              ldr (:= op= 2) → R[ra] ← M[rel]:
              ...
              stop (:= op= 31) → Run ← 0:
            ); ii
    The body of ie is a big switch statement on the opcode.
    Thus ii and ie invoke each other, as coroutines.
  • 88. 2-50 Chapter 2—Machines, Machine Languages, and Digital Logic Use of RTN Definitions: Text Substitution Semantics
    The instruction and the definition it uses:
      ld (:= op= 1) → R[ra] ← M[disp]:
      disp〈31..0〉 := ((rb=0) → c2〈16..0〉 {sign extend}:
                      (rb≠0) → R[rb] + c2〈16..0〉 {sign extend, 2’s comp.} ):
    After substituting the text of disp:
      ld (:= op= 1) → R[ra] ← M[ ((rb=0) → c2〈16..0〉 {sign extend}:
                                  (rb≠0) → R[rb] + c2〈16..0〉 {sign extend, 2’s comp.} ): ]:
    • An example:
      • If IR = 00001 00101 00011 00000000000001011
      • then ld → R[5] ← M[ R[3] + 11 ]:
  • 89. 2-51 Chapter 2—Machines, Machine Languages, and Digital Logic RTN Descriptions of SRC Branch Instructions
    • Branch condition determined by 3 lsbs of instruction
    • Link register (R[ra]) set to point to next instruction
      cond := ( c3〈2..0〉=0 → 0:                never
                c3〈2..0〉=1 → 1:                always
                c3〈2..0〉=2 → R[rc]=0:          if register is zero
                c3〈2..0〉=3 → R[rc]≠0:          if register is nonzero
                c3〈2..0〉=4 → R[rc]〈31〉=0:      if positive or zero
                c3〈2..0〉=5 → R[rc]〈31〉=1 ):    if negative
      br (:= op= 8) → (cond → PC ← R[rb]):                     conditional branch
      brl (:= op= 9) → (R[ra] ← PC: cond → (PC ← R[rb]) ):     branch and link
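The cond definition reads naturally as a small decision function. This is a minimal sketch, assuming registers are held as Python integers in a list; `cond` and `branch` are names chosen for this illustration, and treating the undefined codes 6 and 7 as "never" is an assumption, since the RTN does not define them.

```python
def cond(c3: int, rc_value: int) -> bool:
    """Branch condition from c3<2..0> and the 32-bit contents of R[rc]."""
    code = c3 & 0b111                  # c3<2..0>
    value = rc_value & 0xFFFFFFFF
    sign = (value >> 31) & 1           # R[rc]<31>
    if code == 1:
        return True                    # always
    if code == 2:
        return value == 0              # zero
    if code == 3:
        return value != 0              # nonzero
    if code == 4:
        return sign == 0               # positive or zero
    if code == 5:
        return sign == 1               # negative
    return False                       # 0 is "never"; 6 and 7 assumed "never"

def branch(pc: int, R: list, ra: int, rb: int, rc: int, c3: int, link: bool = False) -> int:
    """Return the new PC; with link=True (brl), first save the old PC in R[ra]."""
    if link:
        R[ra] = pc
    return R[rb] if cond(c3, R[rc]) else pc

R = [0] * 32
R[5] = 2000                                  # branch target
new_pc = branch(1004, R, 0, 5, 1, 0b010)     # brzr r5, r1 with R[1] = 0
print(new_pc)
```

Note that the link register is written whether or not the branch is taken, matching the ":" between R[ra] ← PC and the conditional transfer in the brl definition.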
  • 90. 2-52 Chapter 2—Machines, Machine Languages, and Digital Logic RTN for Arithmetic and Logic
      add (:= op=12) → R[ra] ← R[rb] + R[rc]:
      addi (:= op=13) → R[ra] ← R[rb] + c2〈16..0〉 {2’s comp. sign ext.}:
      sub (:= op=14) → R[ra] ← R[rb] - R[rc]:
      neg (:= op=15) → R[ra] ← -R[rc]:
      and (:= op=20) → R[ra] ← R[rb] ∧ R[rc]:
      andi (:= op=21) → R[ra] ← R[rb] ∧ c2〈16..0〉 {sign extend}:
      or (:= op=22) → R[ra] ← R[rb] ∨ R[rc]:
      ori (:= op=23) → R[ra] ← R[rb] ∨ c2〈16..0〉 {sign extend}:
      not (:= op=24) → R[ra] ← ¬R[rc]:
    • Logical operators: and ∧ or ∨ and not ¬
  • 91. 2-53 Chapter 2—Machines, Machine Languages, and Digital Logic RTN for Shift Instructions
    • Count may be 5 lsbs of a register or the instruction
    • Notation: @ - replication, # - concatenation
      n := ( (c3〈4..0〉=0) → R[rc]〈4..0〉:
             (c3〈4..0〉≠0) → c3〈4..0〉 ):
      shr (:= op=26) → R[ra]〈31..0〉 ← (n @ 0) # R[rb]〈31..n〉:
      shra (:= op=27) → R[ra]〈31..0〉 ← (n @ R[rb]〈31〉) # R[rb]〈31..n〉:
      shl (:= op=28) → R[ra]〈31..0〉 ← R[rb]〈31-n..0〉 # (n @ 0):
      shc (:= op=29) → R[ra]〈31..0〉 ← R[rb]〈31-n..0〉 # R[rb]〈31..32-n〉:
  • 92. 2-54 Chapter 2—Machines, Machine Languages, and Digital Logic Example of Replication and Concatenation in Shift
    • Arithmetic shift right by 13 concatenates 13 copies of the sign bit with the upper 19 bits of the operand
      shra r1, r2, 13
      R[2] = 1001 0111 1110 1010 1110 1100 0001 0110
      R[1] = 13@R[2]〈31〉 # R[2]〈31..13〉
           = 1111 1111 1111 1 # 100 1011 1111 0101 0111
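Replication and concatenation map onto masks and shifts. The sketch below models the four SRC shifts on 32-bit values for counts 0 to 31 and reruns the shra example; the Python function names simply reuse the mnemonics and are not part of any library.

```python
MASK32 = 0xFFFFFFFF

def shr(x: int, n: int) -> int:
    """(n @ 0) # x<31..n>: logical shift right."""
    return (x & MASK32) >> n

def shra(x: int, n: int) -> int:
    """(n @ x<31>) # x<31..n>: arithmetic shift right."""
    x &= MASK32
    sign = x >> 31
    fill = (((1 << n) - 1) << (32 - n)) if sign else 0   # n copies of the sign bit
    return (x >> n) | fill

def shl(x: int, n: int) -> int:
    """x<31-n..0> # (n @ 0): shift left."""
    return (x << n) & MASK32

def shc(x: int, n: int) -> int:
    """x<31-n..0> # x<31..32-n>: circular shift left."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

r2 = 0b1001_0111_1110_1010_1110_1100_0001_0110
r1 = shra(r2, 13)
print(f"{r1:032b}")
```

The printed bit string matches the slide's result: thirteen 1s followed by the upper 19 bits of R[2].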
  • 93. 2-55 Chapter 2—Machines, Machine Languages, and Digital Logic Assembly Language for Shift
    • Form of assembly language instruction tells whether to set c3=0
      shr ra, rb, rc        ;Shift rb right into ra by 5 lsbs of rc
      shr ra, rb, count     ;Shift rb right into ra by 5 lsbs of inst
      shra ra, rb, rc       ;Arithmetic shift rb right into ra by 5 lsbs of rc
      shra ra, rb, count    ;Arithmetic shift rb right into ra by 5 lsbs of inst
      shl ra, rb, rc        ;Shift rb left into ra by 5 lsbs of rc
      shl ra, rb, count     ;Shift rb left into ra by 5 lsbs of inst
      shc ra, rb, rc        ;Shift rb circ. into ra by 5 lsbs of rc
      shc ra, rb, count     ;Shift rb circ. into ra by 5 lsbs of inst
  • 94. 2-56 Chapter 2—Machines, Machine Languages, and Digital Logic End of RTN Definition of instruction_execution
      nop (:= op= 0) → :             No operation
      stop (:= op= 31) → Run ← 0:    Stop instruction
      );                             End of instruction_execution
      instruction_interpretation.
    • We will find special use for nop in pipelining
    • The machine waits for Strt after executing stop
    • The long conditional statement defining instruction_execution ends with a direction to go repeat instruction_interpretation, which will fetch and execute the next instruction (if Run still =1)
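The fetch-execute structure can be tried out with a toy interpreter. This is a rough sketch covering only a handful of opcodes (nop, ld, st, la, add, addi, stop), assuming a small byte-addressed big-endian memory; the class `SRC`, the helper `encode`, and the sample program are inventions for this illustration, not the textbook's simulator.

```python
MASK32 = 0xFFFFFFFF

def sext(v: int, width: int) -> int:
    """Sign-extend the low `width` bits of v."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v

def encode(op: int, ra: int = 0, rb: int = 0, c2: int = 0) -> int:
    """Pack op<31..27>, ra<26..22>, rb<21..17>, c2<16..0> into one word."""
    return (op << 27) | (ra << 22) | (rb << 17) | (c2 & 0x1FFFF)

class SRC:
    def __init__(self, mem_size: int = 256):
        self.mem = bytearray(mem_size)
        self.R = [0] * 32
        self.PC = 0
        self.run = True

    def read(self, addr: int) -> int:
        return int.from_bytes(self.mem[addr:addr + 4], "big")

    def write(self, addr: int, value: int) -> None:
        self.mem[addr:addr + 4] = (value & MASK32).to_bytes(4, "big")

    def step(self) -> None:
        ir = self.read(self.PC)                  # IR <- M[PC]:
        self.PC = (self.PC + 4) & MASK32         # PC <- PC + 4;
        op, ra = ir >> 27, (ir >> 22) & 31
        rb, rc = (ir >> 17) & 31, (ir >> 12) & 31
        c2 = sext(ir, 17)
        disp = (c2 if rb == 0 else self.R[rb] + c2) & MASK32
        if op == 1:                              # ld
            self.R[ra] = self.read(disp)
        elif op == 3:                            # st
            self.write(disp, self.R[ra])
        elif op == 5:                            # la
            self.R[ra] = disp
        elif op == 12:                           # add
            self.R[ra] = (self.R[rb] + self.R[rc]) & MASK32
        elif op == 13:                           # addi
            self.R[ra] = (self.R[rb] + c2) & MASK32
        elif op == 31:                           # stop
            self.run = False
        # op == 0 (nop) and opcodes not modeled here have no effect

    def run_until_stop(self, max_steps: int = 1000) -> None:
        steps = 0
        while self.run and steps < max_steps:    # repeat instruction_interpretation
            self.step()
            steps += 1

program = [
    encode(5, ra=1, c2=5),               # la r1, 5
    encode(13, ra=2, rb=1, c2=-2),       # addi r2, r1, -2
    encode(12, ra=3, rb=1, c2=2 << 12),  # add r3, r1, r2 (rc sits in bits 16..12)
    encode(3, ra=3, c2=64),              # st r3, 64
    encode(31),                          # stop
]
cpu = SRC()
for i, word in enumerate(program):
    cpu.write(4 * i, word)
cpu.run_until_stop()
print(cpu.R[1], cpu.R[2], cpu.R[3], cpu.read(64), cpu.PC)
```

The `while` loop plays the role of the closing reference back to instruction_interpretation, and the `if`/`elif` chain is the long conditional on the opcode. The Strt signal is not modeled; the machine simply starts with Run set.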
  • 95. 2-57 Chapter 2—Machines, Machine Languages, and Digital Logic Confused about RTN and SRC? • SRC is a Machine Language • It can be interpreted by either hardware or software simulator. • RTN is a Specification Language • Specification languages are languages that are used to specify other languages or systems—a metalanguage. • Other examples: LEX, YACC, VHDL, Verilog Figure 2.10 may help clear this up...
  • 96. 2-58 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.10 The Relationship of RTN to SRC
    [Figure: an SRC specification written in RTN is input to an RTN compiler, which produces a generated processor (an SRC interpreter or simulator); that processor takes an SRC program and data as input and produces data output]
  • 97. 2-59 Chapter 2—Machines, Machine Languages, and Digital Logic A Note About Specification Languages • They allow the description of what without having to specify how. • They allow precise and unambiguous specifications, unlike natural language. • They reduce errors: • Errors due to misinterpretation of imprecise specifications written in natural language. • Errors due to confusion in design and implementation—“human error.” • Now the designer must debug the specification! • Specifications can be automatically checked and processed by tools. • An RTN specification could be input to a simulator generator that would produce a simulator for the specified machine. • An RTN specification could be input to a compiler generator that would generate a compiler for the language, whose output could be run on the simulator.
  • 98. 2-60 Chapter 2—Machines, Machine Languages, and Digital Logic Addressing Modes Described in RTN (Not SRC)
    (R[t] is the target register)
      Mode name                          Assembler syntax   RTN meaning                         Use
      Register                           Ra                 R[t] ← R[a]                         Tmp. var.
      Register indirect                  (Ra)               R[t] ← M[R[a]]                      Pointer
      Immediate                          #X                 R[t] ← X                            Constant
      Direct, absolute                   X                  R[t] ← M[X]                         Global var.
      Indirect                           (X)                R[t] ← M[ M[X] ]                    Pointer var.
      Indexed, based, or displacement    X(Ra)              R[t] ← M[X + R[a]]                  Arrays, structs
      Relative                           X(PC)              R[t] ← M[X + PC]                    Vals stored w pgm
      Autoincrement                      (Ra)+              R[t] ← M[R[a]]; R[a] ← R[a] + 1     Sequential access
      Autodecrement                      - (Ra)             R[a] ← R[a] - 1; R[t] ← M[R[a]]     Sequential access
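Each row of the table is a different recipe for producing the value that lands in R[t]. The following is a minimal sketch, assuming a word-addressed memory modeled as a dictionary and registers as a list; the function `load`, the mode names as strings, and the sample contents are all made up for this illustration.

```python
def load(mode: str, M: dict, R: list, x: int = 0, a: int = 0, pc: int = 0) -> int:
    """Return the value R[t] would receive under the given addressing mode."""
    if mode == "register":
        return R[a]
    if mode == "register_indirect":
        return M[R[a]]
    if mode == "immediate":
        return x
    if mode == "direct":
        return M[x]
    if mode == "indirect":
        return M[M[x]]
    if mode == "displacement":
        return M[x + R[a]]
    if mode == "relative":
        return M[x + pc]
    if mode == "autoincrement":
        value = M[R[a]]      # use the address first,
        R[a] += 1            # then advance the register
        return value
    if mode == "autodecrement":
        R[a] -= 1            # step the register back first,
        return M[R[a]]       # then use the new address
    raise ValueError(f"unknown mode: {mode}")

M = {10: 20, 20: 77, 24: 88, 30: 99}   # hypothetical memory contents
R = [0] * 8
R[2] = 20
print(load("direct", M, R, x=10), load("indirect", M, R, x=10))
print(load("displacement", M, R, x=4, a=2), load("relative", M, R, x=6, pc=24))
```

The increment of 1 follows the table as written; a byte-addressed machine accessing words would step by the operand size instead.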
  • 99. 2-61 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.11 Register Transfers Hardware and Timing for a Single-Bit Register Transfer: A ← B
    • Implementing the RTN statement A ← B
    [Figure: (a) Hardware: D flip-flops B and A with a Strobe input; (b) Timing: waveforms of B, Strobe, and A]
  • 100. 2-62 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.12 Multiple Bit Register Transfer: A〈m..1〉 ← B〈m..1〉
    [Figure: (a) Individual flip-flops for bits 1 to m of B and A with a common Strobe; (b) Abbreviated notation showing m-bit registers B〈m..1〉 and A〈m..1〉 with Strobe]
  • 101. 2-63 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.13 Data Transmission View of Logic Gates
      • Logic gates can be used to control the transmission of data:
      [Figure: three panels. Data gate: the output follows data or is forced to 0, depending on gate. Data merge: the output is data1(2), provided data2(1) is zero. Controlled complement: the output is data or its complement, depending on control.]
  • 102. 2-64 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.14 Two-Way Gated Merge, or Multiplexer
      • Data from multiple sources can be selected for transmission
      [Figure: m-bit inputs x and y, gated by Gx and Gy, merge onto one m-bit output; the timing diagram shows x and y appearing on the output at different times]
  • 103. 2-65 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.15 Basic Multiplexer and Symbol Abbreviation
      [Figure: (a) Multiplexer in terms of gates: an n-way gated merge of m-bit inputs D0..Dn–1 with gate signals G0..Gn–1; (b) Symbol abbreviation: an n-way multiplexer with decoder and a k-bit Select input]
      • Multiplexer gate signals Gi may be produced by a binary to one-out-of-n decoder
  • 104. 2-66 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.16 Separating Merged Data
      [Figure: merged m-bit data gated by Gx; the timing diagram shows x recovered while Gx is asserted and 0 otherwise]
      • Merged data can be separated by gating at the right time
      • It can also be strobed into a flip-flop when valid
  • 105. 2-67 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.17 Multiplexed Register Transfers Using Gates and Strobes
      [Figure: m-bit source registers C and D drive a shared path through gates GC and GD; destination registers A and B are loaded by strobes SA and SB; the timing diagram marks propagation time and hold time]
      • Selected gate and strobe determine which RT
      • A←C and B←C can occur together, but not A←C and B←D
  • 106. 2-68 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.18 Open-Collector NAND Gate Output Circuit
      [Figure: (a) Open-collector NAND; (b) Open-collector NAND truth table; (c) Symbol, marked o.c.]
      Inputs      Output
      0v   0v     Open (Out = +V)
      0v   +V     Open (Out = +V)
      +V   0v     Open (Out = +V)
      +V   +V     Closed (Out = 0v)
  • 107. 2-69 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.19 Wired AND Connection of Open-Collector Gates
      [Figure: (a) Wired AND connection; (b) With symbols; (c) Truth table]
      Switch a     Switch b     Wired AND output
      Closed (0)   Closed (0)   0v (0)
      Closed (0)   Open (1)     0v (0)
      Open (1)     Closed (0)   0v (0)
      Open (1)     Open (1)     +V (1)
  • 108. 2-70 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.20 Open-Collector Wired OR Bus
      • By DeMorgan’s law, OR is the NOT of the AND of NOTs
      • Pull-up resistor removed from each gate (open collector)
      • One pull-up resistor for whole bus
      • Forms an OR distributed over the connection
      [Figure: open-collector gates with data inputs D0..Dn–1 and gate inputs G0..Gn–1 share one bus line with a single pull-up to +V]
  • 109. 2-71 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.21 Tri-State Gate Internal Structure and Symbol
      [Figure: (a) Tri-state gate structure; (b) Tri-state gate symbol; (c) Tri-state gate truth table]
      Enable   Data   Output
      0        0      Hi-Z
      0        1      Hi-Z
      1        0      0
      1        1      1
  • 110. 2-72 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.22 Registers Connected by a Tri-State Bus
      [Figure: m-bit registers R[0], R[1], ..., R[n – 1] with strobes S0..Sn–1 and tri-state output gates G0..Gn–1 on a shared m-bit tri-state bus]
      • Can make any register transfer R[i]←R[j]
      • Can’t have Gi = Gj = 1 for i≠j
      • Violating this constraint gives a low resistance path from power supply to ground, with predictable results!
  • 111. 2-73 Chapter 2—Machines, Machine Languages, and Digital Logic Fig 2.23 Registers and Arithmetic Units Connected by One Bus
      [Figure: registers R[0]..R[n – 1], an Incrementer with result register W, and an Adder with operand register Y and result register Z, all on one m-bit bus; each register has an “in” strobe and an “out” gate; the arithmetic units are combinational logic with no memory]
      Example:
      Abstract RTN:       R[3] ← R[1]+R[2];
      Concrete RTN:       Y ← R[2]; Z ← R[1]+Y; R[3] ← Z;
      Control Sequence:   R[2]out, Yin; R[1]out, Zin; Zout, R[3]in;
      Notice that what could be described in one step in the abstract RTN took three steps on this particular hardware.
  • 112. 2-74 Chapter 2—Machines, Machine Languages, and Digital Logic RTs Possible with the One-Bus Structure
      • R[i] or Y can get the contents of anything but Y
      • Since the result is different from the operand, it cannot go on the bus that is carrying the operand
      • Arithmetic units thus have result registers
      • Only one of two operands can be on the bus at a time, so the adder has a register for one operand
      • R[i] ← R[j] + R[k] is performed in 3 steps: Y←R[k]; Z←R[j] + Y; R[i]←Z;
      • R[i] ← R[j] + R[k] is the high level RTN description
      • Y←R[k]; Z←R[j] + Y; R[i]←Z; is the concrete RTN
      • The map to a control sequence (for i = 3, j = 1, k = 2) is: R[2]out, Yin; R[1]out, Zin; Zout, R[3]in;
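The three-step control sequence can be traced with a few lines of Python. This is a minimal sketch under stated assumptions: the function `run`, the register names as dictionary keys, and the rule that asserting Zin latches bus + Y into Z are modelling choices for this example, not a description of the actual hardware in Fig 2.23.

```python
# Sketch of a one-bus datapath: only one value can be on the bus in each step.

def run(control_sequence, regs):
    """Execute (out_signal, in_signal) steps and return the final register state."""
    state = dict(regs, Y=0, Z=0)
    for out_sig, in_sig in control_sequence:
        bus = state[out_sig]               # exactly one source drives the bus
        if in_sig == "Z":
            state["Z"] = bus + state["Y"]  # adder output (bus + Y) is strobed into Z
        else:
            state[in_sig] = bus            # plain register transfer over the bus
    return state

# R[3] <- R[1] + R[2] as: R[2]out, Yin; R[1]out, Zin; Zout, R[3]in
steps = [("R2", "Y"), ("R1", "Z"), ("Z", "R3")]
final = run(steps, {"R0": 0, "R1": 7, "R2": 5, "R3": 0})
```

Because each step reads exactly one register onto the bus, the model makes it visible why the adder needs both an operand register (Y) and a result register (Z).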
  • 113. 2-75 Chapter 2—Machines, Machine Languages, and Digital Logic From Abstract RTN to Concrete RTN to Control Sequences
      • The ability to begin with an abstract description, then describe a hardware design and the resulting concrete RTN and control sequence, is powerful.
      • We shall use this method in Chapter 4 to develop various hardware designs for SRC.
  • 114. 2-76 Chapter 2—Machines, Machine Languages, and Digital Logic Chapter 2 Summary
      • Classes of computer ISAs
      • Memory addressing modes
      • SRC: a complete example ISA
      • RTN as a description method for ISAs
      • RTN description of addressing modes
      • Implementation of RTN operations with digital logic circuits
          • Gates, strobes, and multiplexers
  • 115. 3-1 Chapter 3—Some Real Machines Chapter 3: Some Real Machines
      Topics
      3.1 Machine Characteristics and Performance
      3.2 RISC versus CISC
      3.3 A CISC Microprocessor: The Motorola MC68000
      3.4 A RISC Architecture: The SPARC
  • 116. 3-2 Chapter 3—Some Real Machines Practical Aspects of Machine Cost-Effectiveness
      • Cost for useful work is the fundamental issue
      • Mounting, case, keyboard, etc. are dominating the cost of integrated circuits
      • Upward compatibility preserves software investment
          • Binary compatibility
          • Source compatibility
          • Emulation compatibility
      • Performance: strong function of application
  • 117. 3-3 Chapter 3—Some Real Machines Performance Measures
      • MIPS: Millions of Instructions Per Second
          • Same job may take more instructions on one machine than on another
      • MFLOPS: Million Floating Point OPs Per Second
          • Other instructions counted as overhead for the floating point
      • Whetstones: Synthetic benchmark
          • A program made up to test specific performance features
      • Dhrystones: Synthetic competitor for Whetstone
          • Made up to “correct” Whetstone’s emphasis on floating point
      • SPEC: Selection of “real” programs
          • Taken from the C/Unix world
  • 118. 3-4 Chapter 3—Some Real Machines CISC Versus RISC Designs
      • CISC: Complex Instruction Set Computer
          • Many complex instructions and addressing modes
          • Some instructions take many steps to execute
          • Not always easy to find best instruction for a task
      • RISC: Reduced Instruction Set Computer
          • Few, simple instructions, addressing modes
          • Usually one word per instruction
          • May take several instructions to accomplish what CISC can do in one
          • Complex address calculations may take several instructions
          • Usually has load-store, general register ISA
  • 119. 3-5 Chapter 3—Some Real Machines Design Characteristics of RISCs
      • Simple instructions can be done in few clocks
          • Simplicity may even allow a shorter clock period
      • A pipelined design can allow an instruction to complete in every clock period
      • Fixed length instructions simplify fetch and decode
      • The rules may allow starting next instruction without necessary results of the previous
          • Unconditionally executing the instruction after a branch
          • Starting next instruction before register load is complete
  • 120. 3-6 Chapter 3—Some Real Machines Other RISC Characteristics
      • Prefetching of instructions. (Similar to I8086.)
      • Pipelining: beginning execution of an instruction before the previous instruction(s) have completed. (Will cover in detail in Chapter 5.)
      • Superscalar operation: issuing more than one instruction simultaneously. (Instruction-level parallelism. Also covered in Chapter 5.)
      • Delayed loads, stores, and branches. Operands may not be available when an instruction attempts to access them.
      • Register windows: ability to switch to a different set of CPU registers with a single command. Alleviates procedure call/return overhead. Discussed with SPARC in this chapter.
  • 121. 3-7 Chapter 3—Some Real Machines Tbl 3.1 Order of Presenting or Developing a Computer ISA
      • Memories: structure of data storage in the computer
          • Processor-state registers
          • Main memory organization
      • Formats and their interpretation: meanings of register fields
          • Data types
          • Instruction format
          • Instruction address interpretation
      • Instruction interpretation: things done for all instructions
          • The fetch-execute cycle
          • Exception handling (sometimes deferred)
      • Instruction execution: behavior of individual instructions
          • Grouping of instructions into classes
          • Actions performed by individual instructions
  • 122. 3-8 Chapter 3—Some Real Machines CISC: The Motorola MC68000
      • Introduced in 1979
      • One of first 32-bit microprocessors
          • Means that most operations are on 32-bit internal data
          • Some operations may use different number of bits
          • External data paths may not all be 32 bits wide
          • MC68000 had a 24-bit address bus
      • Complex Instruction Set Computer (CISC)
          • Large instruction set
          • 14 addressing modes
  • 123. 3-9 Chapter 3—Some Real Machines Fig 3.1 The MC68000 Processor State
      [Figure: the MC68000 processor state, summarized below]
      • 8 general purpose data registers D0..D7, 32 bits each (addressable as bits 31..16, 15..8, 7..0)
      • 8 address registers A0..A7, 32 bits each; A7 is SP/USP, with a separate A7/SSP
      • PC (32 bits) and IR (16 bits)
      • Status (16 bits): system byte and user byte
          • Bit 15: T, Trace mode; bit 13: S, Supervisor state; bits 10..8: I2 I1 I0, Interrupt mask
          • Bit 4: X, Extend; bit 3: N, Negative; bit 2: Z, Zero; bit 1: V, Overflow; bit 0: C, Carry (the condition codes, CC)
      • Main memory: 2^24 bytes, or 2^23 16-bit words, or 2^22 longwords
  • 124. 3-10 Chapter 3—Some Real Machines Features of the 68000 Processor State
      • Distinction between 32-bit data registers and 32-bit address registers
      • 16-bit instruction register
          • Variable length instructions handled 16 bits at a time
      • Stack pointer registers
          • User stack pointer is one of the address registers
          • System stack pointer is a separate single register
          • Discuss: Why a separate system stack?
      • Condition code register: System and user bytes
          • Arithmetic status (N, Z, V, C, X) is in user status byte
          • System status has supervisor and trace mode flags, as well as the interrupt mask
  • 125. 3-11 Chapter 3—Some Real Machines RTN Processor State for the MC68000
      D[0..7]〈31..0〉:                       General purpose data registers
      A[0..7]〈31..0〉:                       Address registers
      A7´〈31..0〉:                           System stack pointer
      PC〈31..0〉:                            Program counter
      IR〈15..0〉:                            Instruction register
      Status〈15..0〉:                        System status byte and user status byte
      SP := A[7]:                           User stack pointer, also called USP
      SSP := A7´:                           System stack pointer
      C := Status〈0〉: V := Status〈1〉:       Carry and Overflow flags
      Z := Status〈2〉: N := Status〈3〉:       Zero and Negative flags
      X := Status〈4〉:                       Extend flag
      INT〈2..0〉 := Status〈10..8〉:           Interrupt mask in system status byte
      S := Status〈13〉: T := Status〈15〉:     Supervisor state and Trace mode flags
  • 126. 3-12 Chapter 3—Some Real Machines Main Memory in the MC68000
      Main memory:
      Mb[0..2^24–1]〈7..0〉:                   Memory as bytes
      Mw[ad]〈15..0〉 := Mb[ad]#Mb[ad+1]:      Memory as words
      Ml[ad]〈31..0〉 := Mw[ad]#Mw[ad+2]:      Memory as long words
      • The word and longword forms are “big-endian”
          • The lowest numbered byte contains the most significant bit (big end) of the word
      • Words and longwords have “hard” alignment constraints not described in the above RTN
          • Word addresses must end in one binary 0
          • Longword addresses must end in two binary zeros
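The concatenation operator # in the RTN above maps directly onto shifts and ORs. The following Python sketch mirrors the three memory views; the 16-byte memory size is an assumption to keep the example small (the real Mb has 2^24 entries), and alignment checks are deliberately omitted, just as they are in the RTN.

```python
# Byte-addressed memory. Tiny here for illustration; the MC68000 has 2**24 bytes.
Mb = bytearray(16)

def Mw(ad):
    """Mw[ad] := Mb[ad] # Mb[ad+1], a big-endian 16-bit word."""
    return (Mb[ad] << 8) | Mb[ad + 1]

def Ml(ad):
    """Ml[ad] := Mw[ad] # Mw[ad+2], a big-endian 32-bit longword."""
    return (Mw(ad) << 16) | Mw(ad + 2)

# Store the bytes 12 34 56 78 (hex) at addresses 0..3.
Mb[0:4] = bytes([0x12, 0x34, 0x56, 0x78])
word_at_0 = Mw(0)      # most significant byte comes from the lowest address
longword_at_0 = Ml(0)
```

Reading the same four bytes with Python's own `int.from_bytes(..., "big")` gives the same longword, which is a quick way to confirm that the definition really is big-endian.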
  • 127. 3-13 Chapter 3—Some Real Machines MC68000 Supports Several Operand Types
      • Like many CISC machines, the 68000 allows one instruction to operate on several types
          • MOVE.B for bytes, MOVE.W for words, and MOVE.L for longwords; also ADD.B, ADD.W, ADD.L, etc.
      • Operand length is coded as bits of the instruction word
          • Bits coding operand type vary with instruction
      • For use with RTN descriptions, we assume a function d := datalen(IR) that returns 1, 2, or 4 for operand length
  • 128. 3-14 Chapter 3—Some Real Machines Fig 3.2 Some MC68000 Instruction Formats
      [Figure: four instruction layouts, each built from 16-bit words (bits 15..0)]
      (a) A 1-word move instruction: IR holds the fields op, rg2, md2, md1, rg1
      (b) A 2-word instruction: IR (with md1, rg1) plus one extra word holding a 16-bit constant
      (c) A 3-word instruction: IR (with md1, rg1) plus two extra words, each holding a 16-bit constant
      (d) Instruction with indexed address: IR (with mode 110 and Reg) plus an extra word holding d/a, Index reg, w/l, 000, disp8
  • 129. 3-15 Chapter 3—Some Real Machines General Form of Addressing Modes in the MC68000
      • A general address of an operand or result is specified by a 6-bit field with mode and register numbers
          • Bits 5..3 = mode, bits 2..0 = reg
          • Provides access paths to operands
      • Not all operands and results can be specified by a general address: some must be in registers
      • Not all modes are legal in all parts of an instruction
  • 130. 3-16 Chapter 3—Some Real Machines Tbl 3.2 MC68000 Addressing Modes
      Address field: bits 5..3 = mode, bits 2..0 = reg
      Name                  Mode  Reg.  Assembler        Extra words  Brief description
      Data reg. direct      0     0-7   Dn               0            Dn
      Addr. reg. direct     1     0-7   An               0            An
      Addr. reg. indirect   2     0-7   (An)             0            M[An]
      Autoincrement         3     0-7   (An)+            0            M[An]; An←An+d
      Autodecrement         4     0-7   -(An)            0            An←An-d; M[An]
      Based                 5     0-7   disp16(An)       1            M[An+disp16]
      Based indexed short   6     0-7   disp8(An,XnLo)   1            M[An+XnLo+disp8]
      Based indexed long    6     0-7   disp8(An,Xn)     1            M[An+Xn+disp8]
      Absolute short        7     0     addr16           1            M[addr16]
      Absolute long         7     1     addr32           2            M[addr32]
      Relative              7     2     disp16(PC)       1            M[PC+disp16]
      Rel. indexed short    7     3     disp8(PC,XnLo)   1            M[PC+XnLo+disp8]
      Rel. indexed long     7     3     disp8(PC,Xn)     1            M[PC+Xn+disp8]
      Immediate             7     4     #data            1-2          data
  • 131. 3-17 Chapter 3—Some Real Machines RTN Description of MC68000 Addressing
      Address field: bits 5..3 = mode, bits 2..0 = reg
      • The addressing modes interpret many items
          • The instruction: in the IR register
          • The following 16-bit word: described as Mw[PC]
          • The D and A registers in the CPU
      • Many addressing modes calculate an effective memory address
      • Some modes designate a register
      • Some modes result in a constant operand
      • There are restrictions on the use of some modes
  • 132. 3-18 Chapter 3—Some Real Machines RTN Formatting for Effective Address Calculation
      XR[0..15]〈31..0〉 := D[0..7]〈31..0〉 # A[0..7]〈31..0〉:   Index register can be D or A;
      xr〈3..0〉 := Mw[PC]〈15..12〉:                           Index specifier for index mode;
      wl := Mw[PC]〈11〉:                                     Short or long index flag;
      dsp8〈7..0〉 := Mw[PC]〈7..0〉:                           Displacement for index mode;
      index := ( (wl=0) → XR[xr]〈15..0〉:                    Short or
                 (wl=1) → XR[xr]〈31..0〉 ):                  long index value;
      • Either an A or a D register can be used as an index
      • A 4-bit field in the 2nd instruction word specifies the index register
      • Low order 8 bits of 2nd word are used as offset
      • Either 16 or 32 bits of index register may be used
      Layout of the 2nd word: bit 15 = d/a (0: index is in data register, 1: index is in address register); bits 14..12 = Index reg; bit 11 = w/l (0 = 16 bit index, 1 = 32 bit index); bits 10..8 = 000; bits 7..0 = disp8 (= ldisp)
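The field extraction for the second instruction word amounts to a few shifts and masks. The Python sketch below follows the RTN on this slide; the function names `decode_index_word` and `index_value` are hypothetical, and, like the RTN as written here, the sketch does not sign-extend the 16-bit index.

```python
def decode_index_word(ext):
    """Split the 16-bit second word into the RTN fields xr, wl, and dsp8."""
    xr = (ext >> 12) & 0xF   # bits 15..12: d/a bit followed by 3-bit register number
    wl = (ext >> 11) & 0x1   # bit 11: 0 = 16-bit index, 1 = 32-bit index
    dsp8 = ext & 0xFF        # bits 7..0: displacement
    return xr, wl, dsp8

def index_value(ext, D, A):
    """Return the index selected by the second word (no sign extension modelled)."""
    xr, wl, _ = decode_index_word(ext)
    XR = list(D) + list(A)   # XR[0..15] := D[0..7] # A[0..7]
    value = XR[xr]
    return value & 0xFFFF if wl == 0 else value & 0xFFFFFFFF
```

For example, the word 0xC810 has d/a = 1, register number 4, w/l = 1, and disp8 = 0x10, so it selects all 32 bits of A4 as the index.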
  • 133. 3-19 Chapter 3—Some Real Machines Modes That Calculate a Memory Address Using a Register
      Address field: bits 5..3 = mode (010 - 110), bits 2..0 = reg (000 - 111)
      • md and rg are the 3-bit mode and register fields
      • ea stands for effective address
      ea(md, rg) := (
        (md = 2) → A[rg〈2..0〉]:                                     Mode 2 is A register indirect;
        (md = 3) → (A[rg〈2..0〉]; A[rg〈2..0〉] ← A[rg〈2..0〉] + d):    Mode 3 is autoincrement;
        (md = 4) → (A[rg〈2..0〉] ← A[rg〈2..0〉] - d; A[rg〈2..0〉]):    Mode 4 is autodecrement;
        (md = 5) → (A[rg〈2..0〉] + Mw[PC]; PC ← PC + 2):             Mode 5 is based or offset addressing;
        (md = 6) → (A[rg〈2..0〉] + index + dsp8; PC ← PC + 2):       Mode 6 is based indexed addressing;
  • 134. 3-20 Chapter 3—Some Real Machines Mode 7 Uses the Register Field to Expand the Number of Modes
      Address field: bits 5..3 = 111, bits 2..0 = reg
      • These modes still calculate a memory address
      ea(md, rg) := ...
        (md = 7 ∧ rg = 0) → (Mw[PC]{sign extend to 32 bits}; PC ← PC + 2):        Mode 7, register 0 is short absolute;
        (md = 7 ∧ rg = 1) → (Ml[PC]; PC ← PC + 4):                                Mode 7, register 1 is long absolute;
        (md = 7 ∧ rg = 2) → (PC + Mw[PC]{sign extend to 32 bits}; PC ← PC + 2):   Mode 7, register 2 is program counter relative addressing;
        (md = 7 ∧ rg = 3) → (PC + index + dsp8; PC ← PC + 2) ):                   Mode 7, register 3 is relative indexed.
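The ea() definition can be read as a function with side effects on A and PC. The Python sketch below covers a subset of the modes (2 through 5, and mode 7 with registers 0 and 2); the class name `CPU`, its attributes, and the helper `sign16` are hypothetical. One assumption is made loudly: the 16-bit displacement in mode 5 is sign-extended here, as the RTN states explicitly for the mode 7 entries.

```python
def sign16(w):
    """Interpret a 16-bit value as a signed number."""
    return w - 0x10000 if w & 0x8000 else w

class CPU:
    """Hypothetical container for the state that ea() reads and updates."""
    def __init__(self, mem):
        self.A = [0] * 8    # address registers
        self.PC = 0         # points at the word after the instruction word
        self.mem = mem      # bytearray, big-endian

    def mw(self, ad):
        return (self.mem[ad] << 8) | self.mem[ad + 1]

    def ea(self, md, rg, d=2):
        """Effective address for a subset of modes; d is the operand length in bytes."""
        if md == 2:                       # A register indirect
            return self.A[rg]
        if md == 3:                       # autoincrement: use address, then add d
            addr = self.A[rg]
            self.A[rg] += d
            return addr
        if md == 4:                       # autodecrement: subtract d, then use address
            self.A[rg] -= d
            return self.A[rg]
        if md == 5:                       # based: A[rg] + 16-bit displacement (assumed signed)
            addr = self.A[rg] + sign16(self.mw(self.PC))
            self.PC += 2
            return addr
        if md == 7 and rg == 0:           # short absolute, sign-extended to 32 bits
            addr = sign16(self.mw(self.PC)) & 0xFFFFFFFF
            self.PC += 2
            return addr
        if md == 7 and rg == 2:           # PC relative
            addr = self.PC + sign16(self.mw(self.PC))
            self.PC += 2
            return addr
        raise NotImplementedError("mode not covered by this sketch")
```

Every mode that consumes an extension word also advances PC by 2, which is why the order in which an instruction evaluates its operand addresses matters.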
  • 135. 3-21 Chapter 3—Some Real Machines Fig 3.3 Address Register Indirect Addressing
      Address field: bits 5..3 = 010, bits 2..0 = reg
      [Figure: the Reg field of the instruction (... 010 Reg) selects one of the 68000 registers A0..A7; that register holds the Address of the Operand in main memory]
      Ex: MOVE (A6), ...
      • Same picture for autoincrement or decrement
          • Address register incremented after address obtained in autoincrement
          • Address register decremented before address obtained in autodecrement
  • 136. 3-22 Chapter 3—Some Real Machines Fig 3.4 Mode 6: Based Indexed Addressing
      Address field: bits 5..3 = 110, bits 2..0 = reg
      [Figure: Mode 6, based indexed addressing. The Reg field (... 110 Reg) selects a Base address from A0..A7; the extension word supplies the index register and disp8; base, Index (16 or 32 bits, from D0-D7 or A0-A7), and displacement are added to give the address of the Operand in main memory]
      Extension word: bit 15 = d/a (0: index is in data reg., 1: index is in address reg.); bits 14..12 = Index reg; bit 11 = w/l (0 = 16 bit index, 1 = 32 bit index); bits 10..8 = 000; bits 7..0 = disp8 (= ldisp)
      Ex: MOVE.W LDISP (A6, D4), ...
      • Three things are added to get the address
  • 137. 3-23 Chapter 3—Some Real Machines Mode 7-0,1: Absolute Addressing
      Address field: bits 5..3 = 111, bits 2..0 = 000 (16-bit) or 001 (32-bit)
      [Figure: the absolute address taken from the extension word(s) locates the Operand in main memory]
      Absolute short addressing: ... 111 000, followed by addr16 (sign extended to 32 bits)
      Ex: MOVE.B PRINTERPORT.W, ...
      Absolute long addressing: ... 111 001, followed by addr32Hi and addr32Lo, which are concatenated
      Ex: MOVE.W INTVECT.L, ...
      • Absolute addresses can be 16 or 32 bits
  • 138. 3-24 Chapter 3—Some Real Machines Mode 7, Reg 3: Relative Indexed Addressing
      Address field: bits 5..3 = 111, bits 2..0 = 011
      [Figure: Relative indexed addressing. The Program counter, the Index (16 or 32 bits, from D0-D7 or A0-A7), and disp8 are added to give the address of the Operand in main memory]
      Extension word: bit 15 = d/a (0: index is in data reg., 1: index is in address reg.); bits 14..12 = Index reg; bit 11 = w/l (0 = 16 bit index, 1 = 32 bit index); bits 10..8 = 000; bits 7..0 = disp8 (= ldisp)
      Ex: MOVE.W LDISP (PC, D4), ...
      • Same as indexed mode but uses PC instead of A register as base
  • 139. 3-25 Chapter 3—Some Real Machines Operands in Registers or Memory Can Have Different Lengths
      memval(md, rg) := ( (md〈2..1〉 = 1) ∨ (md〈2..1〉 = 2) ∨ (md〈2..0〉 = 6) ∨ ((md〈2..0〉 = 7) ∧ (rg〈2〉 = 0)) ):
          A memory address is used with these modes only.
      opnd(md, rg) := ( (d=1) → opndb(md, rg): (d=2) → opndw(md, rg): (d=4) → opndl(md, rg) ):
          The operand length in the instruction tells which to use.
      opndl(md, rg)〈31..0〉 := ( ... ):
          A long operand can be ...
      opndw(md, rg)〈15..0〉 := (
        memval(md, rg) → Mw[ea(md, rg)]〈15..0〉:
        md = 0 → D[rg]〈15..0〉:
        md = 1 → A[rg]〈15..0〉:
        (md = 7 ∧ rg = 4) → (Mw[PC]〈15..0〉: PC ← PC+2) ):
          A word operand is similar but needs only a 16-bit immediate following the instruction word.
      opndb(md, rg)〈7..0〉 := ( ...
        (md = 7 ∧ rg = 4) → (Mw[PC]〈7..0〉: PC ← PC+2) ):
          Byte operands ... instruction word.
  • 140. 3-26 Chapter 3—Some Real Machines Modes 0 and 1: Register Direct Addressing
      Address field: bits 5..3 = 000 (D) or 001 (A), bits 2..0 = reg
      [Figure: Data register direct: ... 000 Reg selects the Operand among data registers D0..D7. Address register direct: ... 001 Reg selects the Operand among address registers A0..A7]
      Ex: MOVE D6, ...
      Ex: MOVE A6, ...
      • The register itself provides a place to store a result or a place to get an operand
      • There is no memory address with this mode
  • 141. 3-27 Chapter 3—Some Real Machines Fig 3.5 Mode 7, Reg 4: Immediate Addressing
      Address field: bits 5..3 = 111, bits 2..0 = 100
      • Operands are stored in the instruction: the instruction word and 1 or 2 following words
      Byte:       ... 111 100, followed by one word holding 00000000 (bits 15..8) and value8 (bits 7..0)     Ex: MOVE.B #12, ...
      Word:       ... 111 100, followed by one word holding value16                                         Ex: MOVE.W #1234, ...
      Longword:   ... 111 100, followed by two words holding value16Hi and value16Lo                        Ex: MOVE.L #12348678, ...
      • Data length is specified by the opcode field, not the Mode/Reg field
  • 142. 3-28 Chapter 3—Some Real Machines Not Every Addressing Mode Can Be Used for Results
      rsltadr(md, rg) := memval(md, rg) ∧ ¬(md=7 ∧ (rg=2 ∨ rg=3)):
      • The MC68000 disallows relative addressing for results
      • This is captured in RTN by defining a function that is true (= 1) if the memory address specified by the mode is legal for results
      • Register direct is also legal for results, but will be handled separately
  • 143. 3-29 Chapter 3—Some Real Machines Result Modes Must Have a Place to Write Data: Memory or Register
      rsltl(md, rg)〈31..0〉 := (                             32-bit result
        rsltadr(md, rg) → Ml[ea(md, rg)]〈31..0〉:
        md = 0 → D[rg]〈31..0〉:
        md = 1 → A[rg]〈31..0〉 ):
      rsltw(md, rg)〈15..0〉 := (                             16-bit result
        rsltadr(md, rg) → Mw[ea(md, rg)]〈15..0〉:
        md = 0 → D[rg]〈15..0〉:
        md = 1 → A[rg]〈15..0〉 ):
      rsltb(md, rg)〈7..0〉 := (                              8-bit result
        rsltadr(md, rg) → Mb[ea(md, rg)]〈7..0〉:
        md = 0 → D[rg]〈7..0〉:
        md = 1 → A[rg]〈7..0〉 ):
      rslt(md, rg) := (                                     The result length in the instruction tells which to use
        (d=1) → rsltb(md, rg):
        (d=2) → rsltw(md, rg):
        (d=4) → rsltl(md, rg) ):
  • 144. 3-30 Chapter 3—Some Real Machines MC68000 Instruction Interpretation
      • Instruction interpretation is simple when exceptions are ignored
      Instruction_interpretation := (
        Run → ( (IR〈15..0〉 ← Mw[PC]〈15..0〉: PC ← PC + 2); instruction_execution ); ):
      • Instructions are fetched 16 bits at a time
      • PC is advanced by 2 as each 16-bit word is fetched
      • Addressing mode may advance it a total of 2 or 4 or more words, under command from the control unit
  • 145. 3-31 Chapter 3—Some Real Machines Tbl 3.3 MC68000 Data Movement Instructions
      Inst.     Operands   1st word           X N Z V C   Operation      Size
      MOVE.B    EAs, EAd   0001ddddddssssss   - x x 0 0   dst ← src      byte
      MOVE.W    EAs, EAd   0011ddddddssssss   - x x 0 0   dst ← src      word
      MOVE.L    EAs, EAd   0010ddddddssssss   - x x 0 0   dst ← src      long
      MOVEA.W   EAs, An    0011rrr001ssssss   - - - - -   An ← src       word
      MOVEA.L   EAs, An    0010rrr001ssssss   - - - - -   An ← src       long
      LEA.L     EAc, An    0100aaa111ssssss   - - - - -   An ← EA addr.  long
      EXG       Dx, Dy     1100xxx1mmmmmyyy   - - - - -   Dx ↔ Dy        long
      • The op code location and size depend on the instruction (compare to SRC)
  • 146. 3-32 Chapter 3—Some Real Machines RTN for a Typical MC68000 Move Instruction
      • The instruction format for Move includes mode and register for source and destination addresses
      op〈3..0〉 := IR〈15..12〉: rg1〈2..0〉 := IR〈2..0〉: md1〈2..0〉 := IR〈5..3〉:
      rg2〈2..0〉 := IR〈11..9〉: md2〈2..0〉 := IR〈8..6〉:
      tmp〈31..0〉:
      move (:= op〈3..2〉 = 0) → (
        tmp ← opnd(md1, rg1);
        ( Z ← (tmp=0): N ← (tmp<0): V ← 0: C ← 0 ):
        rslt(md2, rg2) ← tmp ):
      • The temporary register tmp is used because every invocation of opnd() causes another fetch
  • 147. 3-33 Chapter 3—Some Real Machines Tbl 3.4 MC68000 Integer Arithmetic and Logic Instructions
      Op.    Operands   Inst. word         X N Z V C   Operation          Sizes
      ADD    EA,Dn      1101rrrmmmaaaaaa   x x x x x   dst ← dst + src    b, w, l
      SUB    EA,Dn      1001rrrmmmaaaaaa   x x x x x   dst ← dst - src    b, w, l
      CMP    EA,Dn      1011rrrmmmaaaaaa   - x x x x   dst-src            b, w, l
      CMPI   #dat,EA    00001100wwaaaaaa   - x x x x   dst-immed.data     b, w, l
      MULS   EA,Dn      1100rrr111aaaaaa   - x x 0 0   Dn←Dn*src          l←w*w
      DIVS   EA,Dn      1000rrr111aaaaaa   - x x x 0   Dn←Dn/src          l←l/w
      AND    EA,Dn      1100rrrmmmaaaaaa   - x x 0 0   dst←dst∧src        b, w, l
      OR     EA,Dn      1000rrrmmmaaaaaa   - x x 0 0   dst←dst∨src        b, w, l
      EOR    EA,Dn      1011rrrmmmaaaaaa   - x x 0 0   dst←dst⊕src        b, w, l
      CLR    EAs        01000010wwaaaaaa   - 0 1 0 0   dst∧dst            b, w, l
      NEG    EAs        01000100wwaaaaaa   - x x x x   dst←0 - dst        b, w, l
      TST    EAs        01001010wwaaaaaa   - x x 0 0   dst−0              b, w, l
      NOT    EAs        01000110wwaaaaaa   - x x x x   dst← ¬dst          b, w, l
  • 148. 3-34 Chapter 3—Some Real Machines Notes on MC68000 Arithmetic and Logic Instructions
      All 2-operand ALU instructions are either D → EA or EA → D. Which is it?
      • Only one operand uses EA
      • The other operand is always accessed by Data register direct
      • The 3-bit mmm field specifies whether D is the source or destination, and whether it is B, W, or L
      Byte   Word   Long   Destination
      000    001    010    Dn
      100    101    110    EA
      Ex: SUB EA, Dn: 1001 rrr mmm aaaaaa (fields: op, Dn, mmm from the table above, EA)
      Note: There are several exceptions to the rule above. See text and mfr. data sheet.
  • 149. 3-35 Chapter 3—Some Real Machines RTN Description of a Typical MC68000 Arithmetic Instruction
      • Subtract is a typical arithmetic instruction
      • Need a temporary register to hold an address
      tmp〈31..0〉:     temporary register for address
      sub (:= op=9) → (
        (md2〈2〉 = 0) → D[rg2] ← D[rg2] - opnd(md1, rg1):
        (md2〈2〉 = 1) → (
          memval(md1, rg1) → (tmp ← ea(md1, rg1); M[tmp] ← M[tmp] - D[rg2] ):
          ¬memval(md1, rg1) → rslt(md1, rg1) ← rslt(md1, rg1) - D[rg2]) ):
      • This definition does not handle the condition codes
  • 150. 3-36 Chapter 3—Some Real Machines MC68000 Arithmetic Shifts and Single Word Rotates
      Op.   Operands   Inst. word         X V
      ASd   EA         1110000d11aaaaaa   x x
      ASd   #cnt,Dn    1110cccdww000rrr   x x
      ASd   Dm,Dn      1110RRRdww100rrr   x x
      ROd   EA         1110011d11aaaaaa   - 0
      ROd   #cnt,Dn    1110cccdww011rrr   - 0
      ROd   Dm,Dn      1110RRRdww111rrr   - 0
      [Figure: bit-flow diagrams for ASL, ASR, ROL, and ROR, showing how Dn, c, x, and a shifted-in 0 are connected]
      • d is L or R for left or right shift, respectively
      • EA form has shift count of 1
  • 151. 3-37 Chapter 3—Some Real Machines MC68000 Logical Shifts and Extended Rotates
      Op.    Operands   Inst. word         X V
      LSd    EA         1110001d11aaaaaa   x 0
      LSd    #cnt,Dn    1110cccdww001rrr   x 0
      LSd    Dm,Dn      1110RRRdww101rrr   x 0
      ROXd   EA         1110010d11aaaaaa   x 0
      ROXd   #cnt,Dn    1110cccdww010rrr   x 0
      ROXd   Dm,Dn      1110RRRdww110rrr   x 0
      [Figure: bit-flow diagrams for LSL, LSR, ROXL, and ROXR, showing how Dn, c, x, and a shifted-in 0 are connected]
      • Field ww specifies byte, word, or longword
      • N and Z set according to result, C = last bit shifted out
  • 152. 3-38 Chapter 3—Some Real Machines MC68000 Conditional Branch and Test Instructions
      Op.    Operands   Inst. word                          Operation
      Bcc    disp       0110ccccdddddddd DDDDDDDDDDDDDDDD   if (cond) then PC ← PC + disp
      DBcc   Dn,disp    0101cccc11001rrr                    if ¬(cond) then (Dn←Dn-1; if (Dn≠-1) then PC←PC+disp) else PC ← PC + 2
      Scc    EA         0101cccc11aaaaaa                    if (cond) then (EA) ← FFH else (EA) ← 00H
      • DBcc is used for counted loops with an optional end condition
      • Scc sets a byte to the outcome of a test
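The DBcc rule is easier to follow when traced step by step. The Python sketch below models one execution of DBcc as described in the table; the function name `dbcc` is hypothetical, the counter is modelled as a 16-bit value (an assumption about how Dn is treated), and the program counter arithmetic simply follows the table (PC + disp when the branch is taken, PC + 2 otherwise).

```python
def dbcc(cond, dn, pc, disp):
    """One execution of DBcc. Returns (new Dn, new PC)."""
    if not cond:
        dn = (dn - 1) & 0xFFFF        # decrement the 16-bit counter
        if dn != 0xFFFF:              # 0xFFFF is -1 as a 16-bit value
            return dn, pc + disp      # loop again
    return dn, pc + 2                 # fall through: cond was true or counter expired

def loop_iterations(start):
    """Count how many times a loop body closed by DBcc (with cond false) runs."""
    dn, n = start, 0
    while True:
        n += 1                        # the loop body executes once per pass
        dn, pc = dbcc(False, dn, 10, -6)
        if pc == 12:                  # fell through to the next instruction
            return n
```

Because the test is for −1 rather than 0, a counter initialized to k runs the loop body k + 1 times, which is why such loops usually start with the count minus one in Dn.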
  • 153. 3-39 Chapter 3—Some Real Machines Conditions That Can Be Evaluated for Branch, Etc.
      Code  Meaning           Name  Flag expression
      0000  true              T     1
      0001  false             F     0
      0100  carry clear       CC    ¬C
      0101  carry set         CS    C
      0111  equal             EQ    Z
      0110  not equal         NE    ¬Z
      1011  minus             MI    N
      1010  plus              PL    ¬N
      0011  low or same       LS    C+Z
      1101  less than         LT    N·¬V+¬N·V
      1100  greater or equal  GE    N·V+¬N·¬V
      1110  greater than      GT    N·V·¬Z+¬N·¬V·¬Z
      1111  less or equal     LE    N·¬V+¬N·V+Z
      0010  high              HI    ¬C·¬Z
      1000  overflow clear    VC    ¬V
      1001  overflow set      VS    V
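The flag expressions in this table can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the function name `cond_true` is hypothetical, and each entry is the standard MC68000 condition written as a Boolean expression over N, Z, V, and C (with "not" standing for the complemented flag).

```python
def cond_true(code: int, n: bool, z: bool, v: bool, c: bool) -> bool:
    """Evaluate a 4-bit MC68000 condition field against the N, Z, V, C flags.

    Hypothetical helper for illustration; the encodings follow the table above.
    """
    table = {
        0b0000: True,                  # T   true
        0b0001: False,                 # F   false
        0b0010: (not c) and (not z),   # HI  high
        0b0011: c or z,                # LS  low or same
        0b0100: not c,                 # CC  carry clear
        0b0101: c,                     # CS  carry set
        0b0110: not z,                 # NE  not equal
        0b0111: z,                     # EQ  equal
        0b1000: not v,                 # VC  overflow clear
        0b1001: v,                     # VS  overflow set
        0b1010: not n,                 # PL  plus
        0b1011: n,                     # MI  minus
        0b1100: n == v,                # GE  N·V + ¬N·¬V
        0b1101: n != v,                # LT  N·¬V + ¬N·V
        0b1110: (n == v) and (not z),  # GT  GE and not zero
        0b1111: (n != v) or z,         # LE  LT or zero
    }
    return table[code]
```

Writing GE as `n == v` and LT as `n != v` makes it easy to see that the two conditions are complements of each other, as are GT and LE.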
  • 154. 3-40 Chapter 3—Some Real Machines Conditional Branches First Set Condition Codes, Then Branch
      if ( X = 0 ) goto LOC
            TST X      ;ands X with itself and sets N and Z
            BEQ LOC    ;branch to LOC if X = 0
            . . .
      LOC:
    • EQ tests the right condition codes for = 0, as above, or A = B following a compare, CMP A, B
  • 155. 3-41 Chapter 3—Some Real Machines MC68000 Unconditional Control Transfers Op. Operands Inst. word Operation BRA disp 01100000dddddddd PC ← PC + disp DDDDDDDDDDDDDDDD BSR disp 01100001dddddddd -(SP) ← PC; PC ← PC + disp DDDDDDDDDDDDDDDD JMP EA 0100111011aaaaaa PC ← EA JSR EA 0100111010aaaaaa -(SP) ← PC; PC ← EA • Subroutine links push the return address onto the stack pointed to by A7 = SPComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 156. 3-42 Chapter 3—Some Real Machines MC68000 Subroutine Return Instructions Op. Operands Inst. word Operation RTR 0100111001110111 CC ← (SP)+; PC ← (SP)+ RTS 0100111001110101 PC ← (SP)+ LINK An,disp 0100111001010rrr -(SP) ← An; An ← SP; DDDDDDDDDDDDDDDD SP ← SP + disp UNLK An 0100111001011rrr SP ← An; An ← (SP)+ • Subroutine linkage uses stack for return address • LINK and UNLK allocate and de-allocate multiple word stack framesComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 157. 3-43 Chapter 3—Some Real Machines MC68000 Assembly Code Example: Search an Array
      CR    EQU     13              ;Define return character.
      LEN   EQU     132             ;Define line length.
            ORG     $1000           ;Locate LINE at 1000H.
      LINE  DS.B    LEN             ;Reserve LEN bytes of storage.
            MOVE.W  #LEN-1,D0       ;Initialize D0 to count-1 (DBcc counts in the low word of D0).
            MOVEA.L #LINE,A0        ;A0 gets start address of array.
      LOOP  CMPI.B  #CR,(A0)+       ;Make the comparison (immediate source operand comes first).
            DBEQ    D0,LOOP         ;Double test: if LINE[131-D0]≠13
            <next instruction>      ; then decr. D0; if D0≠-1 branch
                                    ; to LOOP, else to next inst.
    • Program searches an array of bytes to find the first carriage return, ASCII code 13
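The double test performed by DBEQ can be traced in a high-level model. This is a minimal Python sketch under stated assumptions: `find_cr` is a hypothetical name, the array is a non-empty `bytes` object standing in for LINE, and `d0` and `a0` play the roles of the registers D0 and A0 (with A0 held as an index rather than an address).

```python
def find_cr(line: bytes, cr: int = 13) -> int:
    """Return the index of the first byte equal to cr, or -1 if none is found.

    Mirrors the compare / DBEQ loop; assumes len(line) >= 1.
    """
    d0 = len(line) - 1           # D0 initialized to count-1
    a0 = 0                       # A0 points at the start of the array
    while True:
        equal = line[a0] == cr   # the compare sets Z when the byte matches
        a0 += 1                  # (A0)+ postincrement
        if equal:                # DBEQ: condition true, fall through
            return a0 - 1
        d0 -= 1                  # condition false: decrement the counter
        if d0 == -1:             # counter exhausted, fall through
            return -1
        # otherwise branch back to LOOP
```

For example, `find_cr(b"ab\rcd")` stops at index 2, while a buffer with no carriage return runs the counter down to -1 and falls through.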
  • 158. 3-44 Chapter 3—Some Real Machines Pseudo-Operations in the MC68000 Assembler • A pseudo-operation is one that is performed by the assembler at assembly time, not by the CPU at run time • EQU defines a symbol to be equal to a constant. Substitution is made at assemble time Pi EQU 3.14 • DS.B (.W or .L) defines a block of storage • Any label is associated with the first word of the block Line DS.B 132 • The program loader (part of the operating system) accomplishes this -more-Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 159. 3-45 Chapter 3—Some Real Machines Pseudo Operations in the MC68000 Assembler (cont’d.) • # symbol indicates the value of the symbol instead of a location addressed by the symbol MOVE.L #1000, D0 ;moves 1000 to D0 MOVE.L 1000, D0 ;moves value at addr. 1000 to D0 • The assembler detects the difference and assembles the appropriate instruction • ORG specifies a memory address as the origin where the following code will be stored Start ORG $4000 ;next instruction/data will be loaded at ;address 4000H. • The Motorola assembler uses $ in front of a number to indicate hexadecimal • Character constants are in single quotes: ‘X’Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 160. 3-46 Chapter 3—Some Real Machines Review of Assembly, Link, Load, and Run Times • At assemble time, assembly language text is converted to (binary) machine language • The binary words may be generated by translating instructions, hexadecimal or decimal numbers, characters, etc. • Addresses are translated by way of a symbol table • Addresses are adjusted to allow for blocks of memory reserved for arrays, etc. • At link time, separately assembled modules are combined and absolute addresses assigned • At load time, the binary words are loaded into memory • At run time, the PC is set to the starting address of the loaded module (usually the OS makes a jump or procedure call to that address)
  • 161. 3-47 Chapter 3—Some Real Machines MC68000 Assembly Language Example: Clear a Block
      MAIN    …
              MOVE.L  #ARRAY, A0   ;Base of array
              MOVE.W  #COUNT, D0   ;Number of words to clear
              JSR     CLEARW       ;Make the call
              …
      CLEARW  BRA     LOOPE        ;Branch for init. decr.
      LOOPS   CLR.W   (A0)+        ;Autoincrement by 2.
      LOOPE   DBF     D0, LOOPS    ;Dec. D0, fall through if -1
              RTS                  ;Finished.
    • Subroutine expects block base in A0, count in D0 • Linkage uses the stack pointer, so A7 cannot be used for anything else
  • 162. 3-48 Chapter 3—Some Real Machines Exceptions: Changes to Sequential Instruction Execution • Exceptions, also called interrupts, cause next instruction fetch from other than PC location • Address supplying next instruction called exception vector • Exceptions can arise from instruction execution, hardware faults, and external conditions • Externally generated exceptions usually called interrupts • Arithmetic overflow, power failure, I/O operation completion, and out of range memory access are some causes • A trace bit =1 causes an exception after every instruction • Used for debugging purposesComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 163. 3-49 Chapter 3—Some Real Machines Steps in Handling MC68000 Exceptions • (1) Status change • Temporary copy of status register is made • Supervisor mode bit S is set, trace bit T is reset • (2) Exception vector address is obtained • Small address made by shifting 8 bit vector number left 2 • Contents of the longword at this vector address is the address of the next instruction to be executed • The exception handler or interrupt service routine starts there • (3) Old PC and status register are pushed onto supervisor stack, addressed by A7 = SSP • (4) PC is loaded from exception vector address • Return from handler is done by RTE • Like RTR except restores status register instead of CCsComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 164. 3-50 Chapter 3—Some Real Machines Exception Priorities • When several exceptions occur at once, which exception vector is used? • Exceptions have priorities, and highest priority exception supplies the vector • MC68000 allows 7 levels of priority • Status register contains current priority • Exceptions with priority ≤ current are ignoredComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 165. 3-51 Chapter 3—Some Real Machines Exceptions and Reset Both Affect Instruction Interpretation • More processor state needed to describe reset and exception processing
      Reset: Reset input
      exc_req: Single bit exception request
      exc_lev〈2..0〉: Exception level
      vect〈7..0〉: Vector address for this exception
      exc := exc_req ∧ (exc_lev〈2..0〉 > INT〈2..0〉): There is a request, and the request level is > current mask in status reg.
    • exc_lev is the highest priority of any pending exception
  • 166. 3-52 Chapter 3—Some Real Machines Exceptions Are Sensed Before Fetching Next Instruction
      Instruction_interpretation :=
        (Run ∧ ¬(Reset ∨ exc) → (IR ← Mw[PC] : PC ← PC + 2);            Normal execution state
         Reset → (INT〈2..0〉 ← 7 : S ← 1 : T ← 0 :                        Machine reset
                  SSP ← Ml[0] : PC ← Ml[4] : Reset ← 0 : Run ← 1 );
         Run ∧ ¬Reset ∧ exc → (SSP ← SSP - 4; Ml[SSP] ← PC;             Exception handling
                  SSP ← SSP - 2; Mw[SSP] ← Status;
                  S ← 1 : T ← 0 : INT〈2..0〉 ← exc_lev〈2..0〉 :
                  PC ← Ml[vect〈7..0〉#00₂] );
         Instruction_execution ).
    • Reset loads the stack pointer from location 0 and starts execution at the address held in location 4
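The exception branch of this RTN can be followed step by step in a small model. This is a minimal Python sketch, not the authors' code: `exception_entry` is a hypothetical name, the processor state is a plain dictionary with the keys shown, and memory is a dictionary that stores one whole value per address (so the longword/word size distinction of Ml and Mw is deliberately ignored).

```python
def exception_entry(cpu, memory, exc_req, exc_lev, vect):
    """Model the exception-handling branch of instruction interpretation.

    cpu: dict with keys PC, SSP, Status, S, T, INT (illustrative layout).
    memory: dict mapping an address to the value stored there (sizes ignored).
    Returns True if the exception was taken, False otherwise.
    """
    # exc := exc_req AND (exc_lev > INT): requests at or below the mask are ignored
    exc = bool(exc_req) and exc_lev > cpu["INT"]
    if not exc:
        return False
    cpu["SSP"] -= 4                      # SSP <- SSP - 4
    memory[cpu["SSP"]] = cpu["PC"]       # push old PC
    cpu["SSP"] -= 2                      # SSP <- SSP - 2
    memory[cpu["SSP"]] = cpu["Status"]   # push old status register
    cpu["S"], cpu["T"] = 1, 0            # supervisor mode on, trace off
    cpu["INT"] = exc_lev                 # raise the mask to the accepted level
    cpu["PC"] = memory[vect << 2]        # vector number shifted left 2 gives the vector address
    return True
```

The final step shows why the vector table entry is found at four times the vector number: appending two zero bits to the 8-bit number is the same as shifting it left by 2.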
  • 167. 3-53 Chapter 3—Some Real Machines Memory-Mapped I/O • No separate I/O space. Part of cpu memory space is devoted/ reserved for I/O instead of RAM or ROM. • Example: MC68000 has a total 24-bit address space. Suppose the top 32K is reserved for I/O: FFFFFFH . . . FF8000H } I/O Space FF7FFFH . . . 000000H } Memory Space Notice that top 32K can be addressed by a negative 16-bit value.Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
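The remark about negative 16-bit values can be verified with a short calculation. This is a minimal Python sketch; the function name `sign_extend_16_to_24` is hypothetical, and it simply extends a 16-bit address constant to the 24-bit address width stated above.

```python
def sign_extend_16_to_24(value16: int) -> int:
    """Sign-extend a 16-bit address constant to a 24-bit address."""
    value16 &= 0xFFFF
    if value16 & 0x8000:             # negative: copy the sign bit into bits 23..16
        return value16 | 0xFF0000
    return value16                   # positive: upper bits are zero
```

Every 16-bit constant with its sign bit set lands in FF8000H through FFFFFFH, which is exactly the top 32K reserved for I/O in the example.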
  • 168. 3-54 Chapter 3—Some Real Machines Memory-Mapped I/O in the MC68000 • Memory-mapped I/O allows µprocessor chip to have one bus for both memory and I/O • Multiple wires for both address and data • I/O uses address space that could otherwise contain memory • Not popular with machines having limited address bits • Sizes of I/O and memory “spaces” independent • Many or few I/O devices may be installed • Much or little memory may be installed • Spaces are separated by putting I/O at top end of the address spaceComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 169. 3-55 Chapter 3—Some Real Machines Fig 3.8 A Memory-Mapped Keyboard Interface [Figure: CPU, memory (000000H to FF7FFFH), and a keyboard interface on an n-bit system bus; the interface has a KBSTATUS register at FF8006H with a character-available bit and a KBDATA register at FF8008H] • MC68000 has a 24-bit address bus. Address space runs from 000000H up to FFFFFFH. • A 16-bit address constant can be positive, and sign extend to an address running from 000000H up to the maximum positive value, or negative, and sign extend to an address running from FFFFFFH down to the last negative 16-bit value. • I/O addresses in the latter range can be accessed by a 16-bit constant.
  • 170. 3-56 Chapter 3—Some Real Machines The SPARC (Scalable Processor ARChitecture) as a RISC Microprocessor Architecture • The SPARC is a general register, load-store architecture • It has only two addressing modes. Address = • (Reg + Reg) or (Reg + 13-bit constant) • Instructions are all 32 bits in length • SPARC has 69 basic instructions • Separate floating-point register set • First implementation had a 4-stage pipeline • Some important features not inherently RISC • Register windows: Separate but overlapping register sets available to calling and called routines • 32-bit address, big-endian organization of memory
  • 171. 3-57 Chapter 3—Some Real Machines Fig 3.9 Simplified SPARC Processor State [Figure: 32-bit registers PC (program counter), nPC (next program counter), IR (instruction register), Y (multiply step register), TBR (trap base register), WIM (window-invalid mask), and the processor-status register with condition codes n, z, v, c; integer registers r0–r31, with r0 = 0, r1–r7 global registers, r8–r15 out parameters, r16–r23 local registers, r24–r31 in parameters; floating-point registers f0–f31]
  • 172. 3-58 Chapter 3—Some Real Machines Fig 3.10 SPARC Register Windows Mechanism [Figure: three views of the register file at CWP = N, CWP = N – 1 (after save), and CWP = N (after restore); the out parameters r8–r15 of the window at CWP = N are the same registers as the in parameters r24–r31 of the window at CWP = N – 1; each window has its own local registers r16–r23; global registers r0–r7 are shared by all windows]
  • 173. 3-59 Chapter 3—Some Real Machines SPARC Memory RTN for the SPARC memory:
      Mb[0..2^32-1]〈7..0〉: Byte memory
      Mh[a]〈15..0〉 := Mb[a]〈7..0〉#Mb[a + 1]〈7..0〉: Halfword memory
      M[a]〈31..0〉 := Mh[a]〈15..0〉#Mh[a + 2]〈15..0〉: Word memory
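These definitions build halfwords and words by concatenating bytes with the lowest address first, which is the big-endian ordering. This is a minimal Python sketch of the same composition; `read_halfword` and `read_word` are hypothetical names, and byte memory is modeled as a `bytes` object.

```python
def read_halfword(mb: bytes, a: int) -> int:
    """Mh[a] := Mb[a] # Mb[a + 1]; the byte at the lower address is most significant."""
    return (mb[a] << 8) | mb[a + 1]


def read_word(mb: bytes, a: int) -> int:
    """M[a] := Mh[a] # Mh[a + 2]; the halfword at the lower address is most significant."""
    return (read_halfword(mb, a) << 16) | read_halfword(mb, a + 2)
```

The result agrees with Python's built-in `int.from_bytes(..., "big")` on the same four bytes, which is a convenient cross-check.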
  • 174. 3-60 Chapter 3—Some Real Machines Register Windows Format the General Registers • 32 general integer and address registers are accessible at any one time • Global registers G0..G7 are not in any window • G0 is always zero: writes to G0 are ignored, reads return 0 • The other 24 are in a movable window from a total set of 120 • On subroutine call, the starting point changes so that 8–15 before call become 24–31 after • Registers 24–31 are used for incoming parameters • Registers 8–15 are for outgoing parameters • Current Window Pointer CWP locates register 8 • Overflow of register space causes trap
  • 175. 3-61 Chapter 3—Some Real Machines save, restore, and the Current Window Pointer • CWP points to the register currently called G8 • save moves it so that the old G8 becomes the new G24 • This makes the old G8..G15 into the new G24..G31 • If parameters are placed in G8..G15 by the caller, the callee can get them from G24..G31 • When all windows are used, save traps to a routine that saves registers to memory • Windows wrap around in the available registers • Window overflow “spills” the first window and reuses its space
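The overlap between windows can be made concrete with a small model. This is a minimal Python sketch that follows the labeling of Fig 3.9 and Fig 3.10 (out parameters in r8–r15, locals in r16–r23, in parameters in r24–r31, and save moving from CWP = N to CWP = N – 1). The class name, the default of 7 windows, and the layout of the physical register array are assumptions for illustration, and overflow traps are not modeled.

```python
class WindowedRegisters:
    """Illustrative model of SPARC register windows (no overflow/underflow traps)."""

    def __init__(self, nwindows: int = 7):
        self.nwindows = nwindows
        self.globals = [0] * 8                 # r0..r7, shared by all windows
        self.windowed = [0] * (16 * nwindows)  # 16 new registers per window
        self.cwp = nwindows - 1                # current window pointer

    def _index(self, r: int) -> int:
        if 8 <= r <= 15:                       # out parameters of this window
            return self.cwp * 16 + (r - 8)
        if 16 <= r <= 23:                      # local registers of this window
            return self.cwp * 16 + 8 + (r - 16)
        if 24 <= r <= 31:                      # in parameters = caller's outs
            return ((self.cwp + 1) % self.nwindows) * 16 + (r - 24)
        raise ValueError("register number out of range")

    def read(self, r: int) -> int:
        if r == 0:
            return 0                           # r0 always reads as zero
        return self.globals[r] if r < 8 else self.windowed[self._index(r)]

    def write(self, r: int, value: int) -> None:
        if r == 0:
            return                             # writes to r0 are ignored
        if r < 8:
            self.globals[r] = value & 0xFFFFFFFF
        else:
            self.windowed[self._index(r)] = value & 0xFFFFFFFF

    def save(self) -> None:
        self.cwp = (self.cwp - 1) % self.nwindows     # windows wrap around

    def restore(self) -> None:
        self.cwp = (self.cwp + 1) % self.nwindows
```

A caller that writes r8 and then executes `save()` lets the callee read the same value from r24, and a result the callee leaves in r24 is visible to the caller in r8 after `restore()`, without any register copying.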
  • 176. 3-62 Chapter 3—Some Real Machines SPARC Operand Addressing • One mode computes address as sum of 2 registers; G0 gives zero if used • The other mode adds sign-extended 13-bit constant to a register • These can serve several purposes • Indexed: base in one register, index in another • Register indirect: G0 + Gn • Displacement: Gn + const, n ≠ 0 • Absolute: G0 + constant • Absolute addressing can only reach the bottom or top 4K bytes of memoryComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
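Both addressing modes reduce to one small computation. This is a minimal Python sketch under stated assumptions: `effective_address` and `sign_extend` are hypothetical names, `regs` is a list of 32 register values, `i` selects the mode (0 for register plus register, 1 for register plus constant), and register 0 is forced to read as zero.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def effective_address(regs, i: int, rs1: int, rs2: int = 0, simm13: int = 0) -> int:
    """Compute a 32-bit SPARC operand address for either addressing mode."""
    base = 0 if rs1 == 0 else regs[rs1]          # G0 gives zero if used
    if i == 0:
        offset = 0 if rs2 == 0 else regs[rs2]    # register + register
    else:
        offset = sign_extend(simm13, 13)         # register + 13-bit constant
    return (base + offset) & MASK32
```

With `rs1 = 0` and `i = 1` the result is an absolute address, and because the constant spans -4096 to +4095 it can only reach 00000000H–00000FFFH or FFFFF000H–FFFFFFFFH, matching the last bullet.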
  • 177. 3-63 Chapter 3—Some Real Machines RTN for SPARC Instruction Format op〈1..0〉 := IR〈31..30〉: Instruction class, op code for format 1; disp30〈29..0〉 := IR〈29..0〉: Word displacement for call, format 1; a := IR〈29〉: Annul bit for branches, format 2a; cond〈3..0〉 := IR〈28..25〉: Branch condition select, format 2a; rd〈4..0〉 := IR〈29..25〉: Destination register for formats 2b & 3; op2〈2..0〉 := IR〈24..22〉: Op code for format 2; disp22〈21..0〉 := IR〈21..0〉: Constant for branch displacement or sethi; op3〈5..0〉 := IR〈24..19〉: Op code for format 3; rs1〈4..0〉 := IR〈18..14〉: Source register 1 for format 3; opf〈8..0〉 := IR〈13..5〉: Sub-op code for floating point, format 3a; i := IR〈13〉: Immediate operand indicator, formats 3b & c; simm13〈12..0〉 := IR〈12..0〉: Signed immediate operand for format 3c; rs2〈4..0〉 := IR〈4..0〉: Source register 2 for format 3b.Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
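Each RTN definition above names a bit slice of IR, so the whole set can be extracted with one helper. This is a minimal Python sketch; `bits` and `decode_fields` are hypothetical names, and the bit positions are copied from the definitions on this slide. All fields are extracted regardless of format, so only those belonging to the instruction's actual format are meaningful.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return the bit field word<hi..lo> as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_fields(ir: int) -> dict:
    """Extract every SPARC instruction field named in the RTN definitions."""
    return {
        "op": bits(ir, 31, 30),
        "disp30": bits(ir, 29, 0),
        "a": bits(ir, 29, 29),
        "cond": bits(ir, 28, 25),
        "rd": bits(ir, 29, 25),
        "op2": bits(ir, 24, 22),
        "disp22": bits(ir, 21, 0),
        "op3": bits(ir, 24, 19),
        "rs1": bits(ir, 18, 14),
        "opf": bits(ir, 13, 5),
        "i": bits(ir, 13, 13),
        "simm13": bits(ir, 12, 0),
        "rs2": bits(ir, 4, 0),
    }
```

For a format 3 instruction, a decoder would read `op`, `rd`, `op3`, and `rs1`, then use `i` to choose between `rs2` and `simm13`.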
  • 178. 3-64 Chapter 3—Some Real Machines Fig 3.11 SPARC Instruction Formats Format number SPARC instruction formats 31 30 29 0 1. Call 01 disp30 31 30 29 28 25 24 22 21 0 2a. Branches 00 a cond op2 disp22 2b. sethi 00 rd op2 disp22 31 30 29 25 24 19 18 14 13 12 54 0 3a. Floating point op rd op3 rs1 opf rs2 3b. Data movement op rd op3 rs1 0 asi rs2 3c. ALU op rd op3 rs1 1 simm13 i (register or immediate) • Three basic formats with variationsComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 179. 3-65 Chapter 3—Some Real Machines RTN For SPARC Addressing Modes
      adr〈31..0〉 := (i=0 → r[rs1] + r[rs2]:                               Address for load, store, and jump
                     i=1 → r[rs1] + simm13〈12..0〉 {sign ext.}):
      calladr〈31..0〉 := PC〈31..0〉 + disp30〈29..0〉#00₂:                   Call relative address
      bradr〈31..0〉 := PC〈31..0〉 + disp22〈21..0〉#00₂ {sign ext.}:         Branch address
  • 180. 3-66 Chapter 3—Some Real Machines RTN For SPARC Instruction Interpretation instruction_interpretation := (IR ← M[PC]; instruction_execution; update_PC_and_nPC; instruction_interpretation):Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 181. 3-67 Chapter 3—Some Real Machines Tbl 3.8 SPARC Data Movement Instructions Inst. Op. OPCODE Meaning ldsb 11 00 1001 Load signed byte ldsh 11 00 1010 Load signed halfword ldsw 11 00 1000 Load signed word ldub 11 00 0001 Load unsigned byte lduh 11 00 0010 Load unsigned halfword ldd 11 00 0011 Load doubleword stb 11 00 0101 Store byte sth 11 00 0110 Store halfword stw 11 00 0100 Store word std 11 00 0111 Store double word swap 11 00 1111 Swap register with memory or 10 00 0010 r[d] ← r[s1] OR (r[rs2] or immediate) sethi 00 Op2=100 High order 22 bits of Rdst ← disp22Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 182. 3-68 Chapter 3—Some Real Machines Register and Immediate Moves in the SPARC • OR is used with a G0 operand to do register-to-register moves • To load a register with a 32-bit constant, a 2-instruction sequence is used SETHI R17, #upper22 OR R17, R17, #lower10 • Doublewords are loaded into an even register and the next higher odd one • Floating-point instructions are not covered, but the 32 FP registers can hold single-length numbers, or 16 64-bit FP, or 8 128-bit FP numbersComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
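The two-instruction constant load splits a 32-bit value into a 22-bit upper part and a 10-bit lower part. This is a minimal Python sketch; `split_constant` and `sethi_or` are hypothetical names, and the model assumes that SETHI clears the low 10 bits of the destination, so that the following OR can fill them in.

```python
def split_constant(value: int):
    """Split a 32-bit constant into (upper22, lower10) for a SETHI/OR pair."""
    value &= 0xFFFFFFFF
    return value >> 10, value & 0x3FF


def sethi_or(upper22: int, lower10: int) -> int:
    """Model SETHI followed by OR on the same register."""
    reg = (upper22 << 10) & 0xFFFFFFFF   # SETHI: high-order 22 bits set, low 10 assumed zero
    return reg | (lower10 & 0x3FF)       # OR: fill in the low-order 10 bits
```

Because the lower part is at most 10 bits, it always fits in the 13-bit immediate field of the OR instruction without being affected by sign extension.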
  • 183. 3-69 Chapter 3—Some Real Machines Tbl 3.9 SPARC Arithmetic Instructions Inst. Op. OPCODE Meaning add 10 0S 0000 Add or add and set condition codes addx 10 0S 1000 Add with carry: set CCs or not sub 10 0S 0100 Subtract: subtract and set CCs or not subx 10 0S 1100 Subtract with borrow: set CCs or not mulscc 10 10 1100 Do one step of multiply • All are format 3, Op = 10 • CCs are set if S = 1 and not if S = 0 • Both register and immediate forms are available • Multiply is done by software using MULSCC or using floating- point instructions • Multiply is hard to do in one clock but multiply step is notComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 184. 3-70 Chapter 3—Some Real Machines Tbl 3.10 SPARC Logical and Shift Instructions
      Inst.  Op.  OPCODE   Meaning
      AND    10   0S 0001  AND, set CCs if S=1 or not if S=0
      ANDN   10   0S 0101  AND with complemented second operand, set CCs or not
      OR     10   0S 0010  OR, set CCs or not
      ORN    10   0S 0110  OR with complemented second operand, set CCs or not
      XOR    10   0S 0011  XOR, set CCs or not
      SLL    10   10 0101  Shift left logical, count in RSRC2 or imm13
      SRL    10   10 0110  Shift right logical, count in RSRC2 or imm13
      SRA    10   10 0111  Shift right arithmetic, count as above
    • All instructions use format 3 with op = 10 • Both register and immediate forms are available • Condition codes set if S = 1 and undisturbed if S = 0
  • 185. 3-71 Chapter 3—Some Real Machines Tbl 3.11 SPARC Branch and Control Transfer Instructions Inst. Format Op Op2 or Op3 Meaning ba 2 00 010 Unconditional branch bcc 2 00 010 Conditional branch call 1 01 Call & save PC in R15 jmpl 3 10 11 1000 Jmp to EA, save PC in Rdst save 3 10 11 1100 New register window, & ADD restore 3 10 11 1101 Restore reg. window, & ADD Some condition fields: Inst. COND Inst. COND Inst. COND Inst. COND ba 1000 bne 1001 be 0001 ble 0010 bcc 1101 bcs 0101 bneg 0110 bvc 1111 bvs 0111Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 186. 3-72 Chapter 3—Some Real Machines Fig 3.12 Example SPARC Assembly Program
              .begin
              .org
      prog:   ld    [x], %r1         ! Load a word from M[x] into register %r1.
              ld    [y], %r2         ! Load a word from M[y] into register %r2.
              addcc %r1, %r2, %r3    ! %r3 ← %r1 + %r2 ; set CCs.
              st    %r3, [z]         ! Store sum into M[z].
              jmpl  %r15, +8, %r0    ! Return to caller.
              nop                    ! Branch delay slot.
      x:      15                     ! Reserve storage for x, y, and z.
      y:      9
      z:      0
              .end
    • Note different syntax for SPARC. • Note r15 contains return address, placed there by the OS in this case.
  • 187. 3-73 Chapter 3—Some Real Machines Fig 3.13 Example of Subroutine Linkage in the SPARC
              .begin
              .org
      prog:   ld    [x], %o0             !Pass parameters in
              ld    [y], %o1             ! first 3 output registers.
              call  add3                 !Call subroutine to put result in %o0.
              mov   -17, %o2             !Set last parameter in delay slot
              st    %o0, [z]             !Store returned result.
              ...
      x:      15
      y:      9
      z:      0
      add3:   save  %sp,-(16*4),%sp      !Get new window and adjust stack pointer.
              add   %i0, %i1, %l0        !Add parameters that now appear in
              add   %l0, %i2, %l0        ! input registers using a local.
              ret                        !Return. Short for jmp %i7+8.
              restore %l0, 0, %o0        !Result moved to caller’s %o0.
              .end
  • 188. 3-74 Chapter 3—Some Real Machines Pipelining of the SPARC Architecture • Many aspects of the SPARC design are in support of a pipelined implementation • Simple addressing modes, simple instructions, delayed branches, load-store architecture • Simplest form of pipelining is fetch-execute overlap—fetching next instruction while executing current instruction • Pipelining breaks instruction processing into steps • A step of one instruction overlaps different steps for others • A new instruction is started (issued) before previously issued instructions are complete • Instructions guaranteed to complete in orderComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 189. 3-75 Chapter 3—Some Real Machines Fig 3.14 The SPARC MB86900 Pipeline Clock Cycle 1 2 3 4 5 6 7 Instr. 1 Fetch Dec. Exec. Write Instr. 2 Fetch Dec. Exec. Write Instr. 3 Fetch Dec. Exec. Write Instr. 4 Fetch Dec. Exec. Write • 4 pipeline stages are Fetch, Decode, Execute, and Write • Results are written to registers in Write stageComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
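The timing in Fig 3.14 follows a simple rule when nothing stalls: instruction k enters Fetch in cycle k, and each later stage takes one more cycle. This is a minimal Python sketch of that ideal case; `pipeline_schedule` and `total_cycles` are hypothetical names, and hazards are deliberately not modeled.

```python
STAGES = ["Fetch", "Dec.", "Exec.", "Write"]


def pipeline_schedule(n_instructions: int):
    """Return, per instruction, the clock cycle (1-based) in which each stage runs."""
    schedule = []
    for k in range(n_instructions):
        # instruction k (0-based) is issued one cycle after instruction k-1
        schedule.append({stage: k + 1 + s for s, stage in enumerate(STAGES)})
    return schedule


def total_cycles(n_instructions: int) -> int:
    """Cycles needed to finish n instructions in an ideal pipeline."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1
```

Four instructions finish in 7 cycles, as in the figure, instead of the 16 cycles that strictly sequential execution of four 4-step instructions would take.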
  • 190. 3-76 Chapter 3—Some Real Machines Pipeline Hazards • Will be discussed later, but main issue is: • Branch or jump change the PC as late as Exec or Write, but next instruction has already been fetched • One solution is delayed branch • One (maybe 2) instruction following branch is always executed, regardless of whether branch is taken • SPARC has a delayed branch with one delay slot, but also allows the delay slot instruction to be annulled (have no effect on the machine state) if the branch is not taken • Registers to be written by one instruction may be needed by another already in the pipeline, before the update has happened (data hazard)Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 191. 3-77 Chapter 3—Some Real Machines CISC versus RISC: Recap • CISCs supply powerful instructions tailored to commonly used operations, stack operations, subroutine linkage, etc. • RISCs require more instructions to do the same job • CISC instructions take varying lengths of time • RISC instructions can all be executed in the same few-cycle pipeline • RISCs should be able to finish (nearly) one instruction per clock cycleComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 192. 3-78 Chapter 3—Some Real Machines Key Concepts: RISC versus CISC • While a RISC machine may possibly have fewer instructions than a CISC, the instructions are always simpler. Multistep arithmetic operations are confined to special units. • Like all RISCs, the SPARC is a load-store machine. Arithmetic operates only on values in registers. • A few regular instruction formats and limited addressing modes make instruction decode and operand determination fast. • Branch delays are quite typical of RISC machines and arise from the way a pipeline processes branch instructions. • The SPARC does not have a load delay, which some RISCs do, and does have register windows, which many RISCs do not.Computer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 193. 3-79 Chapter 3—Some Real Machines Chapter 3 Summary • Machine price/performance are the driving forces. • Performance can be measured in many ways: MIPS, execution time, Whetstone, Dhrystone, SPEC benchmarks. • CISC programs need fewer instructions, each of which does more. • Instruction word length may vary widely • Addressing modes encourage memory traffic • CISC instructions are hard to map onto modern architectures • RISC machines usually have • One word per instruction • Load/store memory access • Simple instructions and addressing modes • Result in allowing higher clock rates, prefetching, etc.
  • 194. 4-1 Chapter 4—Processor Design Chapter 4: Processor Design Topics 4.1 The Design Process 4.2 A 1-Bus Microarchitecture for the SRC 4.3 Data Path Implementation 4.4 Logic Design for the 1-Bus SRC 4.5 The Control Unit 4.6 The 2- and 3-Bus Processor Designs 4.7 The Machine Reset 4.8 Machine ExceptionsComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 195. 4-2 Chapter 4—Processor Design Abstract and Concrete Register Transfer Descriptions • The abstract RTN for SRC in Chapter 2 defines “what,” not “how” • A concrete RTN uses a specific set of real registers and buses to accomplish the effect of an abstract RTN statement • Several concrete RTNs could implement the same ISAComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 196. 4-3 Chapter 4—Processor Design A Note on the Design Process • This chapter presents several SRC designs • We started in Chapter 2 with an informal description • In this chapter we will propose several block diagram architectures to support the abstract RTN, then we will: • Write concrete RTN steps consistent with the architecture • Keep track of demands made by concrete RTN on the hardware • Design data path hardware and identify needed control signals • Design a control unit to generate control signalsComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
  • 197. 4-4 Chapter 4—Processor Design Fig 4.1 Block Diagram of 1-Bus SRC [Figure: the CPU contains the control unit (Figure 4.11), which takes control unit inputs and produces control signals such as ADD, Wait, PCin, and Gra, and the data path (Figures 4.2, 4.3) with the general purpose registers, PC, IR, A, ALU, C, MA, and MD on one 32-bit bus; the CPU connects over the memory bus to main memory and input/output]
  • 198. 4-5 Chapter 4—Processor Design Fig 4.2 High-Level View of the 1-Bus SRC Design [Figure: a single 32-bit bus 〈31..0〉 connects the 32 32-bit general purpose registers R0–R31, PC, IR, MA, MD, the A register, and the C register; the ALU takes its A input from the A register and its B input from the bus, and its result goes to C; MA and MD connect to the memory subsystem]
  • 199. 4-6 Chapter 4—Processor Design Constraints Imposed by the Microarchitecture [Figure: the 1-bus SRC data path of Fig 4.2] • One bus connecting most registers allows many different RTs, but only one at a time • Memory address must be copied into MA by CPU • Memory data written from or read into MD • First ALU operand always in A, result goes to C • Second ALU operand always comes from bus • Information only goes into IR and MA from bus • A decoder (not shown) interprets contents of IR • MA supplies address to memory, not to CPU bus
  • 200. 4-7 Chapter 4—Processor Design Abstract and Concrete RTN for SRC add Instruction
      Abstract RTN:
        (IR ← M[PC]: PC ← PC + 4; instruction_execution);
        instruction_execution := ( • • •
          add (:= op= 12) → R[ra] ← R[rb] + R[rc]:
      Tbl 4.1 Concrete RTN for the add Instruction
        Step  RTN
        T0    MA ← PC: C ← PC + 4;     (T0–T2: instruction fetch, IF)
        T1    MD ← M[MA]: PC ← C;
        T2    IR ← MD;
        T3    A ← R[rb];               (T3–T5: instruction execution, IEx.)
        T4    C ← A + R[rc];
        T5    R[ra] ← C;
    • Parts of 2 RTs (IR ← M[PC]: PC ← PC + 4;) done in T0 • Single add RT takes 3 concrete RTs (T3, T4, T5)
  • 201. 4-8 Chapter 4—Processor Design Concrete RTN Gives Information About Sub-units • The ALU must be able to add two 32-bit values • ALU must also be able to increment B input by 4 • Memory read must use address from MA and return data to MD • Two RTs separated by : in the concrete RTN, as in T0 and T1, are operations at the same clock • Steps T0, T1, and T2 constitute instruction fetch, and will be the same for all instructions • With this implementation, fetch and execute of the add instruction takes 6 clock cyclesComputer Systems Design and Architecture by V. Heuring and H. Jordan © 1997 V. Heuring and H. Jordan http://krimo666.mylivepage.com/
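The six concrete steps for add can be traced with one variable per data path register. This is a minimal Python sketch, not the authors' design: `run_add` is a hypothetical name, memory is a dictionary from address to 32-bit word, and the field positions ra = IR〈26..22〉, rb = IR〈21..17〉, rc = IR〈16..12〉 are assumed from the SRC definition in Chapter 2, which is not reproduced in this section.

```python
MASK32 = 0xFFFFFFFF


def run_add(regs, mem, pc):
    """Fetch and execute one SRC add located at address pc.

    Returns (new_pc, clock_steps). regs is a list of 32 register values.
    """
    steps = 0
    # T0: MA <- PC : C <- PC + 4   (ALU increments the bus value by 4)
    ma, c = pc, (pc + 4) & MASK32
    steps += 1
    # T1: MD <- M[MA] : PC <- C    (memory read uses MA, returns data to MD)
    md, pc = mem[ma], c
    steps += 1
    # T2: IR <- MD
    ir = md
    steps += 1
    # Assumed SRC field positions (see lead-in)
    ra = (ir >> 22) & 0x1F
    rb = (ir >> 17) & 0x1F
    rc = (ir >> 12) & 0x1F
    # T3: A <- R[rb]               (first ALU operand must be in A)
    a = regs[rb]
    steps += 1
    # T4: C <- A + R[rc]           (second operand comes from the bus)
    c = (a + regs[rc]) & MASK32
    steps += 1
    # T5: R[ra] <- C
    regs[ra] = c
    steps += 1
    return pc, steps
```

Each comment block corresponds to one clock step, so the returned step count of 6 matches the cycle count stated in the last bullet.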
  • 202. 4-9 Chapter 4—Processor Design Concrete RTN for Arithmetic Instructions: addi
      Abstract RTN:
        addi (:= op= 13) → R[ra] ← R[rb] + c2〈16..0〉 {2s complement sign extend} :
      Concrete RTN for addi:
        Step  RTN
        T0.   MA ← PC: C ← PC + 4;                (T0–T2: instruction fetch)
        T1.   MD ← M[MA]: PC ← C;
        T2.   IR ← MD;
        T3.   A ← R[rb];                          (T3–T5: instruction execution)
        T4.   C ← A + c2〈16..0〉 {sign ext.};
        T5.   R[ra] ← C;
    • Differs from add only in step T4 • Establishes requirement for sign extend hardware
  • 203. 4-10 Chapter 4—Processor Design Fig 4.3 More Complete View of Registers and Buses in the 1-Bus SRC Design, Including Some Control Signals [Figure: the 1-bus data path extended with register select logic (Figure 4.4), IR select logic producing c1〈31..0〉 and c2〈31..0〉 (Figure 4.5), the ALU (Figure 4.6), the MA and MD memory interface (Figures 4.7, 4.8), the condition logic and 1-bit CON flip-flop with CONin (Figure 4.9), and a 5-bit shift count register n with decrement and n=0 detection] • Concrete RTN lets us add detail to the data path – Instruction register logic and new paths – Condition bit flip-flop – Shift count register • Keep this slide in mind as we discuss concrete RTN of instructions.
  • 204. 4-11 Chapter 4—Processor Design Abstract and Concrete RTN for Load and Store
    ld (:= op = 1) → R[ra] ← M[disp] :
    st (:= op = 3) → M[disp] ← R[ra] :
    where disp〈31..0〉 := ((rb=0) → c2〈16..0〉 {sign ext.} :
                          (rb≠0) → R[rb] + c2〈16..0〉 {sign extend, 2s comp.} ) :
    Tbl 4.3 The ld and st (load/store register from memory) Instructions
      Step   RTN for ld                           RTN for st
      T0–T2  Instruction fetch
      T3     A ← (rb = 0 → 0: rb ≠ 0 → R[rb]);
      T4     C ← A + (16@IR〈16〉#IR〈15..0〉);
      T5     MA ← C;
      T6     MD ← M[MA];                          MD ← R[ra];
      T7     R[ra] ← MD;                          M[MA] ← MD;
  • 205. 4-12 Chapter 4—Processor Design Notes for Load and Store RTN • Steps T0 through T2 are the same as for add and addi, and for all instructions • In addition, steps T3 through T5 are the same for ld and st, because they calculate disp • A way is needed to use 0 for R[rb] when rb = 0 • 15-bit sign extension is needed for IR〈16..0〉 • Memory read into MD occurs at T6 of ld • Write of MD into memory occurs at T7 of st
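The displacement calculation in steps T3 through T5 can be sketched in Python. This is an illustrative model only, not part of the original slides: the names `sign_extend` and `disp`, and the use of a Python list as the register file, are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits=17):
    """Sign-extend the low `bits` bits of value to a 32-bit word."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):        # sign bit set (IR<16> for c2)
        value -= 1 << bits               # interpret as a negative number
    return value & MASK32

def disp(regs, rb, c2):
    """Effective address: c2 alone when rb = 0, otherwise R[rb] + c2."""
    a = 0 if rb == 0 else regs[rb]       # step T3: use 0 in place of R[0]
    return (a + sign_extend(c2)) & MASK32    # step T4: 32-bit add
```

With 17 bits in c2, extending to 32 bits copies the sign bit into the 15 upper positions, which is the 15-bit sign extension mentioned in the notes.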
  • 206. 4-13 Chapter 4—Processor Design Concrete RTN for Conditional Branch
    br (:= op = 8) → (cond → PC ← R[rb]):
    cond := ( c3〈2..0〉=0 → 0:               never
              c3〈2..0〉=1 → 1:               always
              c3〈2..0〉=2 → R[rc]=0:         if register is zero
              c3〈2..0〉=3 → R[rc]≠0:         if register is nonzero
              c3〈2..0〉=4 → R[rc]〈31〉=0:     if positive or zero
              c3〈2..0〉=5 → R[rc]〈31〉=1 ):   if negative
    Tbl 4.4 The Branch Instruction, br
      Step   RTN
      T0–T2  Instruction fetch
      T3     CON ← cond(R[rc]);
      T4     CON → PC ← R[rb];
  • 207. 4-14 Chapter 4—Processor Design Notes on Conditional Branch RTN • c3〈2..0〉 are just the low-order 3 bits of IR • cond() is evaluated by a combinational logic circuit having inputs from R[rc] and c3〈2..0〉 • The one bit register CON is not accessible to the programmer and only holds the output of the combinational logic for the condition • If the branch succeeds, the program counter is replaced by the contents of a general register
  • 208. 4-15 Chapter 4—Processor Design Abstract and Concrete RTN for SRC Shift Right
    shr (:= op = 26) → R[ra]〈31..0〉 ← (n @ 0) # R[rb]〈31..n〉 :
    n := ( (c3〈4..0〉 = 0) → R[rc]〈4..0〉 :     Shift count in register
           (c3〈4..0〉 ≠ 0) → c3〈4..0〉 ):       or constant field of instruction
    Tbl 4.5 The shr Instruction
      Step   Concrete RTN
      T0–T2  Instruction fetch
      T3     n ← IR〈4..0〉;
      T4     (n = 0) → (n ← R[rc]〈4..0〉);
      T5     C ← R[rb];
      T6     Shr (:= (n ≠ 0) → (C〈31..0〉 ← 0#C〈31..1〉: n ← n - 1; Shr) );
      T7     R[ra] ← C;
    Step T6 is repeated n times
  • 209. 4-16 Chapter 4—Processor Design Notes on SRC Shift RTN • In the abstract RTN, n is defined with := • In the concrete RTN, it is a physical register • n not only holds the shift count but is used as a counter in step T6 • Step T6 is repeated n times as shown by the recursion in the RTN • The control for such repeated steps will be treated later
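The role of n as both shift count and loop counter can be sketched in Python. This is a hedged model of the concrete RTN steps T3 through T7, not the authors' code; the function name `shr` and the list-based register file are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF

def shr(regs, ra, rb, rc, c3):
    """Model of the concrete RTN for shr; returns the value written to R[ra]."""
    n = c3 & 0x1F                  # T3: n <- IR<4..0>
    if n == 0:                     # T4: zero count field, use R[rc]<4..0>
        n = regs[rc] & 0x1F
    c = regs[rb] & MASK32          # T5: C <- R[rb]
    while n != 0:                  # T6: repeated while n != 0
        c >>= 1                    #     C<31..0> <- 0 # C<31..1>
        n -= 1                     #     n <- n - 1
    regs[ra] = c                   # T7: R[ra] <- C
    return c
```

The `while` loop plays the part of the recursive `Shr` definition: each pass is one clock step, so a shift by n takes n passes through T6.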
  • 210. 4-17 Chapter 4—Processor Design Data Path/Control Unit Separation • Interface between data path and control consists of gate and strobe signals • A gate selects one of several values to apply to a common point, say a bus • A strobe changes the values of the flip-flops in a register to match new inputs • The type of flip-flop used in registers has much influence on control and some on data path • Latch: simpler hardware, but more complex timing • Edge triggering: simpler timing, but about twice the hardware
  • 211. 4-18 Chapter 4—Processor Design Reminder on Latch- and Edge-Triggered Operation
    • Latch output follows input while strobe is high
    • Edge-triggering samples input at edge time
    [Figure: D, C, and Q signals for a latch and for an edge-triggered flip-flop]
  • 212. 4-19 Chapter 4—Processor Design Fig 4.4 The Register File and Its Control Signals
    [Figure: IR fields Op〈31..27〉, ra〈26..22〉, rb〈21..17〉, rc〈16..12〉; Gra, Grb, and Grc gate a 5-bit field into a 5-to-32 decoder that selects one of the 32 32-bit general purpose registers R0..R31; control signals Rin, Rout, BAout]
    • Rout gates selected register onto bus
    • Rin strobes selected register from bus
    • BAout differs from Rout by gating 0 when R[0] is selected
    BA = Base Address
  • 213. 4-20 Chapter 4—Processor Design Fig 4.5 Extracting c1, c2, and Op from the Instruction Register, IR〈31..0〉
    [Figure: IR split into fields 〈31..27〉 (Op, to control unit), 〈26..22〉, 〈21〉, 〈20..17〉, 〈16〉, and 〈15..0〉; c1out gates c1〈31..0〉 and c2out gates c2〈31..0〉 onto the bus; IRin strobes IR from the bus]
    • IR〈21〉 is the sign bit of c1 that must be extended
    • IR〈16〉 is the sign bit of c2 that must be extended
    • Sign bits are fanned out from one to several bits and gated to bus
  • 214. 4-21 Chapter 4—Processor Design Fig 4.6 The CPU–Memory Interface: Memory Address and Memory Data Registers, MA〈31..0〉 and MD〈31..0〉
    [Figure: MD with signals MDbus, MDrd, MDwr, MDout, and Strobe; MA with MAin; memory subsystem signals Read, Write, Done, data〈31..0〉, addr〈31..0〉]
    • MD is loaded from memory bus or from CPU bus
    • MD can drive CPU bus or memory bus
  • 215. 4-22 Chapter 4—Processor Design Fig 4.7 The ALU and Its Associated Registers
    [Figure: A register (strobe Ain) feeding ALU input A, bus feeding ALU input B, ALU output strobed into the C register (Cin) and gated to the bus (Cout); ALU function signals shown include ADD, SUB, AND, NOT, C=B, INC4]
  • 216. 4-23 Chapter 4—Processor Design From Concrete RTN to Control Signals: The Control Sequence
    Tbl 4.6 The Instruction Fetch
      Step  Concrete RTN              Control Sequence
      T0    MA ← PC: C ← PC + 4;      PCout, MAin, INC4, Cin
      T1    MD ← M[MA]: PC ← C;       Read, Cout, PCin, Wait
      T2    IR ← MD;                  MDout, IRin
      T3    Instruction_execution
    • The register transfers are the concrete RTN
    • The control signals that cause the register transfers make up the control sequence
    • Wait prevents the control from advancing to step T3 until the memory asserts Done
  • 217. 4-24 Chapter 4—Processor Design Control Steps, Control Signals, and Timing • Within a given time step, the order in which control signals are written is irrelevant • In step T0, Cin, Inc4, MAin, PCout == PCout, MAin, INC4, Cin • The only timing distinction within a step is between gates and strobes • The memory read should be started as early as possible to reduce the wait • MA must have the right value before being used for the read • Depending on memory timing, Read could be in T0
  • 218. 4-25 Chapter 4—Processor Design Control Sequence for the SRC add Instruction
    add (:= op = 12) → R[ra] ← R[rb] + R[rc]:
    Tbl 4.7 The add Instruction
      Step  Concrete RTN              Control Sequence
      T0    MA ← PC: C ← PC + 4;      PCout, MAin, INC4, Cin, Read
      T1    MD ← M[MA]: PC ← C;       Cout, PCin, Wait
      T2    IR ← MD;                  MDout, IRin
      T3    A ← R[rb];                Grb, Rout, Ain
      T4    C ← A + R[rc];            Grc, Rout, ADD, Cin
      T5    R[ra] ← C;                Cout, Gra, Rin, End
    • Note the use of Gra, Grb, and Grc to gate the correct 5-bit register select code to the registers
    • End signals the control to start over at step T0
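As a rough illustration, the six concrete RTN steps of add can be traced in Python. This sketch is not from the slides: the dictionary-based `state`, the word-per-address `memory` dictionary, and the function name `run_add` are assumptions. The field positions (ra in IR〈26..22〉, rb in IR〈21..17〉, rc in IR〈16..12〉) follow the register file figure, Fig 4.4.

```python
MASK32 = 0xFFFFFFFF

def run_add(state, memory):
    """Trace steps T0-T5 of the 1-bus add; returns the number of clock steps."""
    s = state
    # T0: MA <- PC : C <- PC + 4   (both transfers in the same clock)
    s["MA"], s["C"] = s["PC"], (s["PC"] + 4) & MASK32
    # T1: MD <- M[MA] : PC <- C
    s["MD"], s["PC"] = memory[s["MA"]], s["C"]
    # T2: IR <- MD
    s["IR"] = s["MD"]
    ra = (s["IR"] >> 22) & 0x1F
    rb = (s["IR"] >> 17) & 0x1F
    rc = (s["IR"] >> 12) & 0x1F
    # T3: A <- R[rb]
    s["A"] = s["R"][rb]
    # T4: C <- A + R[rc]
    s["C"] = (s["A"] + s["R"][rc]) & MASK32
    # T5: R[ra] <- C
    s["R"][ra] = s["C"]
    return 6
```

Each commented line corresponds to one row of the table, so the one-register-transfer-per-bus-use constraint of the 1-bus design shows up as one bus value per step.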
  • 219. 4-26 Chapter 4—Processor Design Control Sequence for the SRC addi Instruction
    addi (:= op = 13) → R[ra] ← R[rb] + c2〈16..0〉 {2’s comp., sign ext.} :
    Tbl 4.8 The addi Instruction
      Step  Concrete RTN                        Control Sequence
      T0.   MA ← PC: C ← PC + 4;                PCout, MAin, Inc4, Cin, Read
      T1.   MD ← M[MA]: PC ← C;                 Cout, PCin, Wait
      T2.   IR ← MD;                            MDout, IRin
      T3.   A ← R[rb];                          Grb, Rout, Ain
      T4.   C ← A + c2〈16..0〉 {sign ext.};      c2out, ADD, Cin
      T5.   R[ra] ← C;                          Cout, Gra, Rin, End
    • The c2out signal sign extends IR〈16..0〉 and gates it to the bus
  • 220. 4-27 Chapter 4—Processor Design Control Sequence for the SRC st Instruction
    st (:= op = 3) → M[disp] ← R[ra] :
    disp〈31..0〉 := ((rb=0) → c2〈16..0〉 {sign extend} :
                    (rb≠0) → R[rb] + c2〈16..0〉 {sign extend, 2’s complement} ) :
    The st Instruction
      Step   Concrete RTN                          Control Sequence
      T0–T2  Instruction fetch                     Instruction fetch
      T3     A ← (rb = 0 → 0: rb ≠ 0 → R[rb]);     Grb, BAout, Ain
      T4     C ← A + c2〈16..0〉 {sign-extend};      c2out, ADD, Cin
      T5     MA ← C;                               Cout, MAin
      T6     MD ← R[ra];                           Gra, Rout, MDin, Write
      T7     M[MA] ← MD;                           Wait, End
    • Note BAout in T3 compared to Rout in T3 of addi
  • 221. 4-28 Chapter 4—Processor Design Fig 4.8 The Shift Counter
    • The concrete RTN for shr relies upon a 5-bit register to hold the shift count
    • It must load, decrement, and have an = 0 test
    [Figure: 5-bit down counter n = Q4..Q0, loaded from bus bits 〈4..0〉 by Ld, decremented by Decr, with an n = 0 output]
  • 222. 4-29 Chapter 4—Processor Design Tbl 4.10 Control Sequence for the SRC shr Instruction—Looping
      Step   Concrete RTN                           Control Sequence
      T0–T2  Instruction fetch                      Instruction fetch
      T3     n ← IR〈4..0〉;                          c1out, Ld
      T4     (n=0) → (n ← R[rc]〈4..0〉);             n=0 → (Grc, Rout, Ld)
      T5     C ← R[rb];                             Grb, Rout, C=B, Cin
      T6     Shr (:= (n≠0) →                        n≠0 → (Cout, SHR, Cin,
               (C〈31..0〉 ← 0#C〈31..1〉:                Decr, Goto6)
               n ← n-1; Shr) );
      T7     R[ra] ← C;                             Cout, Gra, Rin, End
    • Conditional control signals and repeating a control step are new concepts
  • 223. 4-30 Chapter 4—Processor Design Branching
    cond := ( c3〈2..0〉 = 0 → 0:
              c3〈2..0〉 = 1 → 1:
              c3〈2..0〉 = 2 → R[rc] = 0:
              c3〈2..0〉 = 3 → R[rc] ≠ 0:
              c3〈2..0〉 = 4 → R[rc]〈31〉 = 0:
              c3〈2..0〉 = 5 → R[rc]〈31〉 = 1 ):
    • This is equivalent to the logic expression
    cond = (c3〈2..0〉 = 1)
         ∨ (c3〈2..0〉 = 2)∧(R[rc] = 0)
         ∨ (c3〈2..0〉 = 3)∧¬(R[rc] = 0)
         ∨ (c3〈2..0〉 = 4)∧¬R[rc]〈31〉
         ∨ (c3〈2..0〉 = 5)∧R[rc]〈31〉
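The logic expression for cond translates directly into a small Python function. This is a sketch for checking the truth table by hand; the function name `cond` and its argument names are chosen for the example and are not part of the SRC definition.

```python
def cond(c3, r):
    """Branch condition from c3<2..0> and the 32-bit contents of R[rc]."""
    code = c3 & 0b111                      # only the low 3 bits of c3 are used
    zero = (r & 0xFFFFFFFF) == 0           # R[rc] = 0
    negative = bool((r >> 31) & 1)         # R[rc]<31> = 1
    return (code == 1                      # always
            or (code == 2 and zero)        # if register is zero
            or (code == 3 and not zero)    # if register is nonzero
            or (code == 4 and not negative)    # if positive or zero
            or (code == 5 and negative))   # if negative
```

Codes 0, 6, and 7 match none of the terms, so the function returns False for them, just as the OR of the five product terms does.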
  • 224. 4-31 Chapter 4—Processor Design Fig 4.9 Computation of the Conditional Value CON
    [Figure: decoder on IR〈2..0〉 (c3〈2..0〉) with outputs 0..5 selecting among the = 0, ≠ 0, ≥ 0, and < 0 tests of bus bits 〈31..0〉; the result is strobed into the CON flip-flop by CONin]
    • NOR gate does = 0 test of R[rc] on bus
  • 225. 4-32 Chapter 4—Processor Design Tbl 4.11 Control Sequence for SRC Branch Instruction, br
    br (:= op = 8) → (cond → PC ← R[rb]):
      Step   Concrete RTN              Control Sequence
      T0–T2  Instruction fetch         Instruction fetch
      T3     CON ← cond(R[rc]);        Grc, Rout, CONin
      T4     CON → PC ← R[rb];         Grb, Rout, CON → PCin, End
    • Condition logic is always connected to CON, so R[rc] only needs to be put on bus in T3
    • Only PCin is conditional in T4 since gating R[rb] to bus makes no difference if it is not used
  • 226. 4-33 Chapter 4—Processor Design Summary of the Design Process Informal description ⇒ formal RTN description ⇒ block diagram architecture ⇒ concrete RTN steps ⇒ hardware design of blocks ⇒ control sequences ⇒ control unit and timing • At each level, more decisions must be made • These decisions refine the design • Also place requirements on hardware still to be designed • The nice one-way process above has circularity • Decisions at later stages cause changes in earlier ones • Happens less in a text than in reality because • Can be fixed on re-reading • Confusing to first-time student
  • 227. 4-34 Chapter 4—Processor Design Fig 4.10 Clocking the Data Path: Register Transfer Timing
    [Figure: source register R1, bus gate (Rout), n-bit bus, combinational logic block, and destination register R2 (Rin), with timing of the gate signal Rout and the strobe signal Rin]
    Delays shown: gate propagation delay tg; bus propagation time tbp; ALU etc. delay tcomb; latch propagation delay tl; latch setup time tsu; latch hold time th; minimum pulse width tw; minimum clock period tmin
    • tR2valid is the period from the beginning of the gate signal until the inputs to R2 are valid
    • tcomb is delay through combinational logic, such as ALU or cond logic
  • 228. 4-35 Chapter 4—Processor Design Signal Timing on the Data Path
    • Several delays occur in getting data from R1 to R2
      • Gate delay through the 3-state bus driver—tg
      • Worst case propagation delay on bus—tbp
      • Delay through any logic, such as ALU—tcomb
      • Set up time for data to affect state of R2—tsu
    • Data can be strobed into R2 after this time:
      tR2valid = tg + tbp + tcomb + tsu
    • Diagram shows strobe signal in the form for a latch. It must be high for a minimum time—tw
    • There is a hold time, th, for data after strobe ends
  • 229. 4-36 Chapter 4—Processor Design Effect of Signal Timing on Minimum Clock Cycle • A total latch propagation delay is the sum tl = tsu + tw + th • All above times are specified for latch • th may be very small or zero • The minimum clock period is determined by finding longest path from ff output to ff input • This is usually a path through the ALU • Conditional signals add a little gate delay • Using this path, the minimum clock period is tmin = tg + tbp + tcomb + tl
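A short worked example of the two timing formulas, written as Python so the arithmetic can be checked. The function name and the delay values in the usage note are made-up illustrations (in nanoseconds), not figures from the text.

```python
def min_clock_period(tg, tbp, tcomb, tsu, tw, th):
    """tmin = tg + tbp + tcomb + tl, where tl = tsu + tw + th (all in one unit)."""
    tl = tsu + tw + th          # total latch propagation delay
    return tg + tbp + tcomb + tl
```

For instance, with hypothetical delays tg = 2, tbp = 1, tcomb = 10, tsu = 1, tw = 3, and th = 0, the latch term is 4 and the minimum clock period is 17. The combinational term dominates, which is why the longest path usually runs through the ALU.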
  • 230. 4-37 Chapter 4—Processor Design Latches Versus Edge-Triggered or Master-Slave Flip-Flops • During the high part of a strobe a latch changes its output • If this output can affect its input, an error can occur • This can influence even the kind of concrete RTs that can be written for a data path • If the C register is implemented with latches, then C ← C + MD; is not legal • If the C register is implemented with master-slave or edge-triggered flip-flops, it is OK
  • 231. 4-38 Chapter 4—Processor Design The Control Unit • The control unit’s job is to generate the control signals in the proper sequence • Things the control signals depend on • The time step Ti • The instruction opcode (for steps other than T0, T1, T2) • Some few data path signals like CON, n = 0, etc. • Some external signals: reset, interrupt, etc. (to be covered) • The components of the control unit are: a time state generator, instruction decoder, and combinational logic to generate control signals
  • 232. 4-39 Chapter 4—Processor Design Fig 4.11 Control Unit Detail with Inputs and Outputs
    [Figure: clocking logic (master clock, Strt, Wait, Done, Enable); step generator (4-bit counter with CountIn, Load, and Reset, and a control step decoder producing T0..Tn−1); opcode decoder on IR (ld, add, ..., shc, br); other signals from the data path (CON, n=0); interrupts and other external signals; control signal encoder producing the generated control signals, such as PCout, Rout, ADD, PCin, Gra, Wait]
  • 233. 4-40 Chapter 4—Processor Design Synthesizing Control Signal Encoder Logic
    Instruction fetch (all instructions):
      Step  Control Sequence
      T0.   PCout, MAin, Inc4, Cin, Read
      T1.   Cout, PCin, Wait
      T2.   MDout, IRin
    Instruction execution:
      Step  add                   addi                  st                       shr
      T3.   Grb, Rout, Ain        Grb, Rout, Ain        Grb, BAout, Ain          c1out, Ld
      T4.   Grc, Rout, ADD, Cin   c2out, ADD, Cin       c2out, ADD, Cin          n=0 → (Grc, Rout, Ld)
      T5.   Cout, Gra, Rin, End   Cout, Gra, Rin, End   Cout, MAin               Grb, Rout, C=B, Cin
      T6.                                               Gra, Rout, MDin, Write   n≠0 → (Cout, SHR, Cin, Decr, Goto6)
      T7.                                               Wait, End                Cout, Gra, Rin, End
    Design process:
    • Comb through the entire set of control sequences.
    • Find all occurrences of each control signal.
    • Write an equation describing that signal.
    Example: Gra = T5·(add + addi) + T6·st + T7·shr + ...
  • 234. 4-41 Chapter 4—Processor Design Use of Data Path Conditions in Control Signal Logic
    Instruction fetch (all instructions):
      Step  Control Sequence
      T0.   PCout, MAin, Inc4, Cin, Read
      T1.   Cout, PCin, Wait
      T2.   MDout, IRin
    Instruction execution:
      Step  add                   addi                  st                       shr
      T3.   Grb, Rout, Ain        Grb, Rout, Ain        Grb, BAout, Ain          c1out, Ld
      T4.   Grc, Rout, ADD, Cin   c2out, ADD, Cin       c2out, ADD, Cin          n=0 → (Grc, Rout, Ld)
      T5.   Cout, Gra, Rin, End   Cout, Gra, Rin, End   Cout, MAin               Grb, Rout, C=B, Cin
      T6.                                               Gra, Rout, MDin, Write   n≠0 → (Cout, SHR, Cin, Decr, Goto6)
      T7.                                               Wait, End                Cout, Gra, Rin, End
    Example: Grc = T4·add + T4·(n=0)·shr + ...
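The "comb through the control sequences" procedure is mechanical enough to sketch in Python. The data below transcribes the control sequences for add, addi, st, and shr (with the shr steps as given in Tbl 4.10); the dictionary layout and the function name `equation` are assumptions made for this example, and the output is left unfactored.

```python
# Each step maps to (data path condition or None, control signals asserted).
FETCH = {
    0: (None, ["PCout", "MAin", "Inc4", "Cin", "Read"]),
    1: (None, ["Cout", "PCin", "Wait"]),
    2: (None, ["MDout", "IRin"]),
}
EXECUTE = {
    "add":  {3: (None, ["Grb", "Rout", "Ain"]),
             4: (None, ["Grc", "Rout", "ADD", "Cin"]),
             5: (None, ["Cout", "Gra", "Rin", "End"])},
    "addi": {3: (None, ["Grb", "Rout", "Ain"]),
             4: (None, ["c2out", "ADD", "Cin"]),
             5: (None, ["Cout", "Gra", "Rin", "End"])},
    "st":   {3: (None, ["Grb", "BAout", "Ain"]),
             4: (None, ["c2out", "ADD", "Cin"]),
             5: (None, ["Cout", "MAin"]),
             6: (None, ["Gra", "Rout", "MDin", "Write"]),
             7: (None, ["Wait", "End"])},
    "shr":  {3: (None, ["c1out", "Ld"]),
             4: ("n=0", ["Grc", "Rout", "Ld"]),
             5: (None, ["Grb", "Rout", "C=B", "Cin"]),
             6: ("n≠0", ["Cout", "SHR", "Cin", "Decr", "Goto6"]),
             7: (None, ["Cout", "Gra", "Rin", "End"])},
}

def equation(signal):
    """Sum-of-products equation for one control signal, as a string."""
    # Fetch steps are common to all instructions, so they need no opcode factor.
    terms = [f"T{t}" for t, (_, sigs) in FETCH.items() if signal in sigs]
    for op, steps in EXECUTE.items():
        for t, (condition, sigs) in steps.items():
            if signal in sigs:
                factors = [f"T{t}"]
                if condition:
                    factors.append(f"({condition})")
                factors.append(op)
                terms.append("·".join(factors))
    return f"{signal} = " + " + ".join(terms)
```

Calling `equation("Gra")` yields `Gra = T5·add + T5·addi + T6·st + T7·shr`, the unfactored form of the slide's example, and `equation("Grc")` yields `Grc = T4·add + T4·(n=0)·shr`. Only four instructions are included here, so the trailing "+ ..." of the slide examples is not reproduced.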
  • 235. 4-42 Chapter 4—Processor Design Fig 4.12 Generation of the logic for PCin and Gra
    [Figure: gate-level logic combining time steps (T1, T5, T7) with decoded opcodes (add, addi, ld) to produce the control signals PCin and Gra]
  • 236. 4-43 Chapter 4—Processor Design Fig 4.13 Branching in the Control Unit
    [Figure: step generator with a 4-bit counter (Mck, Enable, CountIn, Load, Reset) and control step decoder; Goto6 gates the constant 0110 onto the counter input]
    • 3-state gates allow 6 to be applied to counter input
    • Reset will synchronously reset counter to step T0
  • 237. 4-44 Chapter 4—Processor Design Fig 4.14 The Clocking Logic: Start, Stop, and Memory Synchronization
    [Figure: flip-flops generating Run, SDone, Enable, R, and W from Strt, Stop, Done, Wait, Read, Write, and Mck; R and W go to the memory system]
    Signals: Run (G), Strt (E), Done (E), Stop (C), SDone (G), Mck (I), Enable (G), Wait (C), Read (C), R (G), Write (C), W (G)
    Legend: E – External, G – Generated, C – Control signal, I – Internal
    • Mck is master clock oscillator
  • 238. 4-45 Chapter 4—Processor Design The Complete 1-Bus Design of SRC • High-level architecture block diagram • Concrete RTN steps • Hardware design of registers and data path logic • Revision of concrete RTN steps where needed • Control sequences • Register clocking decisions • Logic equations for control signals • Time step generator design • Clock run, stop, and synchronization logic
  • 239. 4-46 Chapter 4—Processor Design Other Architectural Designs Will Require a Different RTN • More data paths allow more things to be done in one step • Consider a two bus design • By separating input and output of ALU on different buses, the C register is eliminated • Steps can be saved by strobing ALU results directly into their destinations
  • 240. 4-47 Chapter 4—Processor Design Fig 4.15 The 2-Bus SRC Microarchitecture
    [Figure: 32-bit A bus (“In bus”) and B bus (“Out bus”) connecting the 32 general purpose registers R0..R31, IR, PC, MA, MD (with memory bus), the A register, and the ALU]
    • Bus A carries data going into registers
    • Bus B carries data being gated out of registers
    • ALU function C = B is used for all simple register transfers
  • 241. 4-48 Chapter 4—Processor Design Tbl 4.13 The 2-Bus add Instruction
      Step  Concrete RTN                    Control Sequence
      T0    MA ← PC;                        PCout, C = B, MAin, Read
      T1    PC ← PC + 4: MD ← M[MA];        PCout, INC4, PCin, Wait
      T2    IR ← MD;                        MDout, C = B, IRin
      T3    A ← R[rb];                      Grb, Rout, C = B, Ain
      T4    R[ra] ← A + R[rc];              Grc, Rout, ADD, Sra, Rin, End
    • Note the appearance of Grc to gate the output of the register rc onto the B bus and Sra to select ra to receive data strobed from the A bus
    • Two register select decoders will be needed
    • Transparent latches will be required at step T2
  • 242. 4-49 Chapter 4—Processor Design Performance and Design
    %Speedup = ((T(1-bus) − T(2-bus)) / T(2-bus)) × 100
    Where T = Execution Time = IC × CPI × τ
  • 243. 4-50 Chapter 4—Processor Design Speedup By Going to 2 Buses
    • Assume for now that IC and τ don’t change in going from 1 bus to 2 buses
    • Naively assume that CPI goes from 8 to 7 clocks.
    %Speedup = ((T(1-bus) − T(2-bus)) / T(2-bus)) × 100
             = ((IC × 8 × τ − IC × 7 × τ) / (IC × 7 × τ)) × 100
             = ((8 − 7) / 7) × 100 = 14%
    Class Problem: How will this speedup change if clock period of 2-bus machine is increased by 10%?
  • 244. 4-51 Chapter 4—Processor Design 3-Bus Architecture Shortens Sequences Even More • A 3-bus architecture allows both operand inputs and the output of the ALU to be connected to buses • Both the C output register and the A input register are eliminated • Careful connection of register inputs and outputs can allow multiple RTs in a step
  • 245. 4-52 Chapter 4—Processor Design Fig 4.16 The 3-Bus SRC Design
    [Figure: 32-bit A bus, B bus, and C bus connecting the 32 general purpose registers R0..R31, IR, PC, MA, MD (with memory bus), and the ALU]
    • A-bus is ALU operand 1, B-bus is ALU operand 2, and C-bus is ALU output
    • Note MA input connected to the B-bus
  • 246. 4-53 Chapter 4—Processor Design Tbl 4.15 The 3-Bus add Instruction
      Step  Concrete RTN                              Control Sequence
      T0    MA ← PC: MD ← M[MA]: PC ← PC + 4;         PCout, MAin, INC4, PCin, Read, Wait
      T1    IR ← MD;                                  MDout, C = B, IRin
      T2    R[ra] ← R[rb] + R[rc];                    GArc, RAout, GBrb, RBout, ADD, Sra, Rin, End
    • Note the use of 3 register selection signals in step T2: GArc, GBrb, and Sra
    • In step T0, PC moves to MA over bus B and goes through the ALU INC4 operation to reach PC again by way of bus C
    • PC must be edge-triggered or master-slave
    • Once more MA must be a transparent latch
  • 247. 4-54 Chapter 4—Processor Design Performance and Design
    • How does going to three buses affect performance?
    • Assume average CPI goes from 8 to 4, while τ increases by 10%:
    %Speedup = ((IC × 8 × τ − IC × 4 × 1.1τ) / (IC × 4 × 1.1τ)) × 100
             = ((8 − 4.4) / 4.4) × 100 = 82%
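The speedup arithmetic on the performance slides can be reproduced with a few lines of Python. This is only a sketch: the function name `percent_speedup` and the `tau_ratio` parameter (new clock period divided by old) are names invented here, and the instruction count IC is assumed unchanged so that it cancels.

```python
def percent_speedup(cpi_old, cpi_new, tau_ratio=1.0):
    """Percent speedup of the new design over the old one.

    T = IC * CPI * tau; IC is assumed equal for both designs and cancels,
    and tau_ratio = tau_new / tau_old.
    """
    t_old = cpi_old * 1.0
    t_new = cpi_new * tau_ratio
    return (t_old - t_new) / t_new * 100
```

`percent_speedup(8, 7)` gives about 14.3, and `percent_speedup(8, 4, 1.1)` gives about 81.8, which round to the 14% and 82% figures quoted on the slides. Changing `tau_ratio` shows how quickly a slower clock erodes the gain from fewer steps.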
  • 248. 4-55 Chapter 4—Processor Design Processor Reset Function • Reset sets program counter to a fixed value • May be a hardwired value, or • contents of a memory cell whose address is hardwired • The control step counter is reset • Pending exceptions are prevented, so initialization code is not interrupted • It may set condition codes (if any) to known state • It may clear some processor state registers • A “soft” reset makes minimal changes: PC, T (trace) • A “hard” reset initializes more processor state
  • 249. 4-56 Chapter 4—Processor Design SRC Reset Capability • We specify both a hard and soft reset for SRC • The Strt signal will do a hard reset • It is effective only when machine is stopped • It resets the PC to zero • It resets all 32 general registers to zero • The Soft Reset signal is effective when the machine is running • It sets PC to zero • It restarts instruction fetch • It clears the Reset signal • Actions are described in instruction_interpretation
  • 250. 4-57 Chapter 4—Processor Design Abstract RTN for SRC Reset and Start
    Processor State
      Strt: Start signal
      Rst: External reset signal
    instruction_interpretation := (
      ¬Run∧Strt → (Run ← 1: PC, R[0..31] ← 0);
      Run∧¬Rst → (IR ← M[PC]: PC ← PC + 4; instruction_execution):
      Run∧Rst → ( Rst ← 0: PC ← 0); instruction_interpretation):
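The three cases of the abstract RTN can be sketched as one Python function that performs a single pass of instruction_interpretation. This is a simplified model under stated assumptions: the `cpu` dictionary, the `memory` dictionary, the returned labels, and the function name are all hypothetical, and instruction_execution is not modeled.

```python
def interpretation_step(cpu, memory, strt=False):
    """One pass through instruction_interpretation with Strt and Rst."""
    if not cpu["Run"]:
        if strt:                         # ¬Run∧Strt: hard reset, then run
            cpu["Run"] = 1
            cpu["PC"] = 0
            cpu["R"] = [0] * 32
            return "start"
        return "stopped"                 # Strt has no effect unless stopped
    if cpu["Rst"]:                       # Run∧Rst: soft reset
        cpu["Rst"] = 0
        cpu["PC"] = 0
        return "reset"
    cpu["IR"] = memory[cpu["PC"]]        # Run∧¬Rst: normal fetch
    cpu["PC"] = (cpu["PC"] + 4) & 0xFFFFFFFF
    return "fetch"                       # instruction_execution would follow
```

Because Rst is only tested at the top of each pass, this model resets between instructions, which matches what the abstract RTN implies.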
  • 251. 4-58 Chapter 4—Processor Design Resetting in the Middle of Instruction Execution • The abstract RTN implies that reset takes effect after the current instruction is done • To describe reset during an instruction, we must go from abstract to concrete RTN • Questions for discussion: • Why might we want to reset in the middle of an instruction? • How would we reset in the middle of an instruction?
  • 252. 4-59 Chapter 4—Processor Design Tbl 4.17 The add Instruction with Reset Processing
      Step  Concrete RTN
      T0    ¬Rst → (MA ← PC: C ← PC + 4):   Rst → (Rst ← 0: PC ← 0: T ← 0):
      T1    ¬Rst → (MD ← M[MA]: PC ← C):    Rst → (Rst ← 0: PC ← 0: T ← 0):
      T2    ¬Rst → (IR ← MD):               Rst → (Rst ← 0: PC ← 0: T ← 0):
      T3    ¬Rst → (A ← R[rb]):             Rst → (Rst ← 0: PC ← 0: T ← 0):
      T4    ¬Rst → (C ← A + R[rc]):         Rst → (Rst ← 0: PC ← 0: T ← 0):
      T5    ¬Rst → (R[ra] ← C):             Rst → (Rst ← 0: PC ← 0: T ← 0):
    • See text for the corresponding control signals
  • 253. 4-60 Chapter 4—Processor Design Control Sequences Including the Reset Function
      Step  Control Sequence
      T0    ¬Reset → (PCout, MAin, Inc4, Cin, Read):   Reset → (ClrPC, ClrR, Goto0):
      T1    ¬Reset → (Cout, PCin, Wait):               Reset → (ClrPC, ClrR, Goto0):
      •••
    • ClrPC clears the program counter to all zeros, and ClrR clears the 1-bit Reset flip-flop
    • Because the same reset actions are in every step of every instruction, their control signals are independent of time step or opcode
  • 254. 4-61 Chapter 4—Processor Design General Comments on Exceptions • An exception is an event that causes a change in the program specified flow of control • Because normal program execution is interrupted, they are often called interrupts • We will use exception for the general term and use interrupt for an exception caused by an external event, such as an I/O device condition • The usage is not standard. Other books use these words with other distinctions, or none
  • 255. 4-62 Chapter 4—Processor Design Combined Hardware/Software Response to an Exception • The system must control the type of exceptions it will process at any given time • The state of the running program is saved when an allowed exception occurs • Control is transferred to the correct software routine, or “handler,” for this exception • This exception, and others of less or equal importance, are disallowed during the handler • The state of the interrupted program is restored at the end of execution of the handler
  • 256. 4-63 Chapter 4—Processor Design Hardware Required to Support Exceptions • To determine relative importance, a priority number is associated with every exception • Hardware must save and change the PC, since without it no program execution is possible • Hardware must disable the current exception lest it interrupt the handler before it can start • Address of the handler is called the exception vector and is a hardware function of the exception type • Exceptions must access a save area for PC and other hardware saved items • Choices are special registers or a hardware stack
  • 257. 4-64 Chapter 4—Processor Design New Instructions Needed to Support Exceptions • An instruction executed at the end of the handler must reverse the state changes done by hardware when the exception occurred • There must be instructions to control what exceptions are allowed • The simplest of these enable or disable all exceptions • If processor state is stored in special registers on an exception, instructions are needed to save and restore these registers
  • 258. 4-65 Chapter 4—Processor Design Kinds of Exceptions • System reset • Exceptions associated with memory access • Machine check exceptions • Data access exceptions • Instruction access exceptions • Alignment exceptions • Program exceptions • Miscellaneous hardware exceptions • Trace and debugging exceptions • Nonmaskable exceptions • External exceptions—interrupts
  • 259. 4-66 Chapter 4—Processor Design An Interrupt Facility for SRC • The exception mechanism for SRC handles external interrupts • There are no priorities, but only a simple enable and disable mechanism • The PC and information about the source of the interrupt are stored in special registers • Any other state saving is done by software • The interrupt source supplies 8 bits that are used to generate the interrupt vector • It also supplies a 16-bit code carrying information about the cause of the interrupt
  • 260. 4-67 Chapter 4—Processor Design SRC Processor State Associated with Interrupts • Processor interrupt mechanism:
      From device → ireq: interrupt request signal
      To device → iack: interrupt acknowledge signal
      Internal to CPU → IE: 1-bit interrupt enable flag
      Internal to CPU → IPC〈31..0〉: storage for PC saved upon interrupt
      Internal to CPU → II〈31..0〉: information on source of last interrupt
      From device → Isrc_info〈15..0〉: information from interrupt source
      From device → Isrc_vect〈7..0〉: type code from interrupt source
      Internal → Ivect〈31..0〉 := 20@0#Isrc_vect〈7..0〉#4@0
    • Ivect〈31..0〉 layout: bits 31..12 are zero, bits 11..4 hold Isrc_vect〈7..0〉, bits 3..0 are zero
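The vector formation on this slide can be sketched in a few lines of Python. This is only an illustration of the bit layout 20@0#Isrc_vect〈7..0〉#4@0; the function name `interrupt_vector` is hypothetical and not part of SRC.

```python
def interrupt_vector(isrc_vect: int) -> int:
    """Form Ivect<31..0> = 20@0 # Isrc_vect<7..0> # 4@0.

    The 8-bit type code lands in bits 11..4; bits 31..12 and 3..0 are zero.
    """
    if not 0 <= isrc_vect <= 0xFF:
        raise ValueError("Isrc_vect is an 8-bit field")
    return isrc_vect << 4


if __name__ == "__main__":
    # Each handler entry point is 16 bytes (4 SRC instructions) from the next.
    print(hex(interrupt_vector(0x2A)))
```

Because the low 4 bits are zero, consecutive type codes map to addresses 16 bytes apart.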
  • 261. 4-68 Chapter 4—Processor Design SRC Instruction Interpretation Modified for Interrupts instruction_interpretation := (¬Run∧Strt → Run ← 1: Run∧¬(ireq∧IE) → (I ← M[PC]: PC ← PC + 4; instruction_execution): Run∧(ireq∧IE) → (IPC ← PC〈31..0〉: II〈15..0〉 ← Isrc_info〈15..0〉: iack ← 1: IE ← 0: PC ← Ivect〈31..0〉; iack ← 0); instruction_interpretation); • If interrupts are enabled, PC and interrupt information are stored in IPC and II, respectively • With multiple requests, external priority circuit (discussed in later chapter) determines which vector and information are returned • Interrupts are disabled • The acknowledge signal is pulsed
  • 262. 4-69 Chapter 4—Processor Design SRC Instructions to Support Interrupts • Return from interrupt instruction: rfi (:= op = 29) → (PC ← IPC: IE ← 1): • Save and restore interrupt state: svi (:= op = 16) → (R[ra]〈15..0〉 ← II〈15..0〉: R[rb] ← IPC〈31..0〉): ri (:= op = 17) → (II〈15..0〉 ← R[ra]〈15..0〉: IPC〈31..0〉 ← R[rb]): • Enable and disable interrupt system: een (:= op = 10) → (IE ← 1): edi (:= op = 11) → (IE ← 0): • The 2 rfi actions are indivisible: a separate een followed by a branch cannot replace rfi, because an interrupt could arrive between the two instructions
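As a rough illustration of how interrupt entry and rfi mirror each other, the following Python sketch models only the four pieces of state named on the slides (PC, IE, IPC, II). The class and function names are assumptions made for this example, and the iack pulse is omitted.

```python
from dataclasses import dataclass


@dataclass
class InterruptState:
    pc: int = 0   # program counter
    ie: int = 1   # 1-bit interrupt enable flag
    ipc: int = 0  # PC saved upon interrupt
    ii: int = 0   # information on source of last interrupt


def take_interrupt(s: InterruptState, isrc_info: int, isrc_vect: int) -> None:
    """Hardware actions when ireq and IE are both 1."""
    s.ipc = s.pc                    # IPC <- PC
    s.ii = isrc_info & 0xFFFF       # II<15..0> <- Isrc_info<15..0>
    s.ie = 0                        # further interrupts are disabled
    s.pc = (isrc_vect & 0xFF) << 4  # PC <- Ivect


def rfi(s: InterruptState) -> None:
    """Return from interrupt: both actions happen as one indivisible step."""
    s.pc = s.ipc
    s.ie = 1
```

A handler that wants to allow nested interrupts would first copy II and IPC to general registers (the job of svi) before executing een.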
  • 263. 4-70 Chapter 4—Processor Design Concrete RTN for SRC Instruction Fetch with Interrupts
      Step T0, when ¬(ireq∧IE): MA ← PC: C ← PC+4;
      Step T0, when (ireq∧IE): IPC ← PC: II ← Isrc_info: IE ← 0: PC ← 20@0#Isrc_vect〈7..0〉#0000: iack ← 1; iack ← 0: End;
      Step T1: MD ← M[MA]: PC ← C;
      Step T2: IR ← MD;
    • PC could be transferred to IPC over the bus • II and IPC probably have separate inputs for the externally supplied values • iack is pulsed, described as ←1; ←0, which is easier as a control signal than in RTN
  • 264. 4-71 Chapter 4—Processor Design Exceptions During Instruction Execution • Some exceptions occur in the middle of instructions • Some CISCs have very long instructions, like string move • Some exception conditions prevent instruction completion, like uninstalled memory • To handle this sort of exception, the CPU must make special provision for restarting • Partially completed actions must be reversed so the instruction can be re-executed after exception handling • Information about the internal CPU state must be saved so that the instruction can resume where it left off • We will see that this problem is acute with pipeline designs—always in middle of instructions
  • 265. 4-72 Chapter 4—Processor Design Recap of the Design Process: the Main Topic of Chapter 4 SRC Informal description Chapter 2 Formal RTN description Block diagram architecture Concrete RTN steps Chapter 4 Hardware design of blocks Control sequences Control unit and timing
  • 266. 4-73 Chapter 4—Processor Design Chapter 4 Summary • Chapter 4 has done a nonpipelined data path and a hardwired controller design for SRC • The concepts of data path block diagrams, concrete RTN, control sequences, control logic equations, step counter control, and clocking have been introduced • The effect of different data path architectures on the concrete RTN was briefly explored • We have begun to make simple, quantitative estimates of the impact of hardware design on performance • Hard and soft resets were designed • A simple exception mechanism was supplied for SRC
  • 267. 5-1 Chapter 5—Processor Design—Advanced Topics Chapter 5: Processor Design— Advanced Topics Topics 5.1 Pipelining • A pipelined design of SRC • Pipeline hazards 5.2 Instruction-Level Parallelism • Superscalar processors • Very Long Instruction Word (VLIW) machines 5.3 Microprogramming • Control store and microbranching • Horizontal and vertical microprogramming
  • 268. 5-2 Chapter 5—Processor Design—Advanced Topics Fig 5.1 Executing Machine Instructions versus Manufacturing Small Parts • [Figure: (a) without pipelining/assembly line, (b) with pipelining/assembly line. The instruction steps (fetch instruction, fetch operands, ALU operation, memory access, register write) are paired with the part manufacturing steps (select part, drill part, cut part, polish part, package part). In (b) the five stages hold ld r2, addr2; st r4, addr1; add r4, r3, r2; sub r2, r5, 1; and shr r3, r3, 2, while the assembly line works on the cover, end, top, bottom, and center plates]
  • 269. 5-3 Chapter 5—Processor Design—Advanced Topics The Pipeline Stages • 5 pipeline stages are shown • 1. Fetch instruction • 2. Fetch operands • 3. ALU operation • 4. Memory access • 5. Register write • 5 instructions are executing • shr r3, r3, #2 ;Storing result into r3 • sub r2, r5, #1 ;Idle—no memory access needed • add r4, r3, r2 ;Performing addition in ALU • st r4, addr1 ;Accessing r4 and addr1 • ld r2, addr2 ;Fetching instruction
  • 270. 5-4 Chapter 5—Processor Design—Advanced Topics Notes on Pipelining Instruction Processing • Pipeline stages are shown top to bottom in order traversed by one instruction • Instructions listed in order they are fetched • Order of instructions in pipeline is reverse of listed • If each stage takes 1 clock: • every instruction takes 5 clocks to complete • some instruction completes every clock tick • Two performance issues: instruction latency and instruction bandwidth
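The latency and bandwidth remarks can be made concrete with a small calculation. The sketch below assumes an ideal pipeline: one clock per stage and no stalls or hazards. The function names are illustrative only.

```python
def pipelined_clocks(n_instructions: int, n_stages: int = 5) -> int:
    """Clocks to finish n instructions in an ideal pipeline.

    The first instruction needs n_stages clocks (latency); after that,
    one instruction completes on every clock tick (bandwidth).
    """
    if n_instructions <= 0:
        return 0
    return n_stages + (n_instructions - 1)


def unpipelined_clocks(n_instructions: int, n_stages: int = 5) -> int:
    """Clocks when each instruction must finish before the next starts."""
    return max(n_instructions, 0) * n_stages


if __name__ == "__main__":
    for n in (1, 5, 100):
        print(n, pipelined_clocks(n), unpipelined_clocks(n))
```

For long instruction streams the ratio of the two counts approaches the number of stages, even though the latency of any single instruction is unchanged.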
  • 271. 5-5 Chapter 5—Processor Design—Advanced Topics Dependence Among Instructions • Execution of some instructions can depend on the completion of others in the pipeline • One solution is to “stall” the pipeline • early stages stop while later ones complete processing • Dependences involving registers can be detected and data “forwarded” to instruction needing it, without waiting for register write • Dependence involving memory is harder and is sometimes addressed by restricting the way the instruction set is used • “Branch delay slot” is example of such a restriction • “Load delay” is another example
  • 272. 5-6 Chapter 5—Processor Design—Advanced Topics Branch and Load Delay Examples
    Branch delay:
      brz r2, r3
      add r6, r7, r8    ;This instruction always executed
      st r6, addr1      ;Only done if r2 ≠ 0
    Load delay:
      ld r2, addr
      add r5, r1, r2    ;This instruction gets “old” value of r2
      shr r1, r1, #4
      sub r6, r8, r2    ;This instruction gets r2 value loaded from addr
    • Working of instructions is not changed, but way they work together is
  • 273. 5-7 Chapter 5—Processor Design—Advanced Topics Characteristics of Pipelined Processor Design • Main memory must operate in one cycle • This can be accomplished by expensive memory, but • It is usually done with cache, to be discussed in Chap. 7 • Instruction and data memory must appear separate • Harvard architecture has separate instruction and data memories • Again, this is usually done with separate caches • Few buses are used • Most connections are point to point • Some few-way multiplexers are used • Data is latched (stored in temporary registers) at each pipeline stage—called “pipeline registers” • ALU operations take only 1 clock (esp. shift)
  • 274. 5-8 Chapter 5—Processor Design—Advanced Topics Adapting Instructions to Pipelined Execution • All instructions must fit into a common pipeline stage structure • We use a 5-stage pipeline for the SRC (1) Instruction fetch (2) Decode and operand access (3) ALU operations (4) Data memory access (5) Register write • We must fit load/store, ALU, and branch instructions into this pattern
  • 275. 5-9 Chapter 5—Processor Design—Advanced Topics Fig 5.2 ALU Instructions: ALU operations including shifts • [Figure: data path for ALU instructions through the 5 stages: 1. instruction fetch (instruction memory, PC, Inc4); 2. decode and operand read (IR2, register file, Mp4); 3. ALU operation (X3, Y3, ALU); 4. memory access (Z4); 5. ra write] • Instructions fit into 5 stages • Second ALU operand comes either from a register or instruction register c2 field • Opcode must be available in stage 3 to tell ALU what to do • Result register, ra, is written in stage 5 • No memory operation
  • 276. 5-10 Chapter 5—Processor Design—Advanced Topics Logic Expressions Defining Pipeline Stage Activity
      branch := br ∨ brl :
      cond := (IR2〈2..0〉 = 1) ∨ ((IR2〈2..1〉 = 1) ∧ (IR2〈0〉 ⊕ R[rc] = 0)) ∨ ((IR2〈2..1〉 = 2) ∧ (IR2〈0〉 ⊕ R[rc]〈31〉)) :
      sh := shr ∨ shra ∨ shl ∨ shc :
      alu := add ∨ addi ∨ sub ∨ neg ∨ and ∨ andi ∨ or ∨ ori ∨ not ∨ sh :
      imm := addi ∨ andi ∨ ori ∨ (sh ∧ (IR2〈4..0〉 ≠ 0)) :
      load := ld ∨ ldr :
      ladr := la ∨ lar :
      store := st ∨ str :
      l-s := load ∨ ladr ∨ store :
      regwrite := load ∨ ladr ∨ brl ∨ alu : Instructions that write to the register file
      dsp := ld ∨ st ∨ la : Instructions that use disp addressing
      rl := ldr ∨ str ∨ lar : Instructions that use rel addressing
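The instruction-class signals above are simply ORs over decoded opcodes, which can be mirrored with set membership. The following Python sketch is an illustration only: it works on mnemonic strings rather than opcode bits, and the function names are assumptions.

```python
# Instruction classes, copied from the logic expressions on the slide.
BRANCH = {"br", "brl"}
SHIFT = {"shr", "shra", "shl", "shc"}
ALU = {"add", "addi", "sub", "neg", "and", "andi", "or", "ori", "not"} | SHIFT
LOAD = {"ld", "ldr"}
LADR = {"la", "lar"}
STORE = {"st", "str"}


def is_load_store(op: str) -> bool:
    """l-s := load OR ladr OR store."""
    return op in (LOAD | LADR | STORE)


def regwrite(op: str) -> bool:
    """regwrite := load OR ladr OR brl OR alu."""
    return op in (LOAD | LADR | ALU) or op == "brl"


if __name__ == "__main__":
    for op in ("add", "ld", "st", "br", "brl"):
        print(op, regwrite(op))
```

In hardware each predicate is evaluated separately per stage from that stage's copy of the opcode, which is why the slides later write names such as regwrite5.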
  • 277. 5-11 Chapter 5—Processor Design—Advanced Topics Notes on the Equations and Different Stages • The logic equations are based on the instruction in the stage where they are used • When necessary, we append a digit to a logic signal name to specify it is computed from values in that stage • Thus regwrite5 is true when the opcode in stage 5 is load5 ∨ ladr5 ∨ brl5 ∨ alu5, all of which are determined from op5
  • 278. 5-12 Chapter 5—Processor Design—Advanced Topics Fig 5.4 The Memory Access Instructions: ld, ldr, st, and str • [Figure: two data path panels, one for ld, ldr, la, and lar and one for st and str, each showing the 5 stages: 1. instruction fetch; 2. decode and operand read (IR2, PC2, register file, Mp3, Mp4); 3. ALU operation (X3, Y3, MD3, ALU set to add); 4. memory access (Z4, data memory, Mp5); 5. ra write (Z5)] • ALU computes effective addresses • Stage 4 does read or write • Result register written only on load
  • 279. 5-13 Chapter 5—Processor Design—Advanced Topics Fig 5.5 The Branch Instructions: br and brl • [Figure: data path for branches through the 5 stages: 1. instruction fetch (instruction memory, PC, Inc4, Mp1); 2. decode and operand read (IR2, PC2, register file, branch logic producing cond); 3. ALU operation (brl only); 4. memory access; 5. ra write] • The new program counter value is known in stage 2, but not in stage 1 • Only branch and link does a register write in stage 5 • There is no ALU or memory operation
  • 280. 5-14 Chapter 5—Processor Design—Advanced Topics Fig 5.6 The SRC Pipeline Registers and RTN Specification
      1. Instruction fetch: IR2 ← M[PC] : PC2 ← PC + 4 ;
      2. Decode and operand read: X3 ← l-s2 → (rel2 → PC2 : disp2 → R[rb]) : brl2 → PC2 : alu2 → R[rb] : Y3 ← l-s2 → (rel2 → c1 : disp2 → c2) : branch2 → : alu2 → (imm2 → c2 : ¬imm2 → R[rc]) : MD3 ← store2 → R[ra] : IR3 ← IR2 : stop2 → Run ← 0 : PC ← ¬branch2 → PC + 4 : branch2 → (cond(IR2, R[rc]) → R[rb] ; ¬cond(IR2, R[rc]) → PC + 4) ;
      3. ALU operation: Z4 ← (l-s3 → X3 + Y3 : brl3 → X3 : alu3 → X3 op Y3) : MD4 ← MD3 : IR4 ← IR3 ;
      4. Memory access: Z5 ← (load4 → M[Z4] : ladr4 ∨ branch4 ∨ alu4 → Z4) : store4 → (M[Z4] ← MD4) : IR5 ← IR4 ;
      5. Register write: regwrite5 → (R[ra] ← Z5) ;
    • The pipeline registers pass information from stage to stage • RTN specifies output register values in terms of input register values for stage • Discuss RTN at each stage on blackboard
  • 281. 5-15 Chapter 5—Processor Design—Advanced Topics Global State of the Pipelined SRC • PC, the general registers, instruction memory, and data memory represent the global machine state • PC is accessed in stage 1 (and stage 2 on branch) • Instruction memory is accessed in stage 1 • General registers are read in stage 2 and written in stage 5 • Data memory is only accessed in stage 4
  • 282. 5-16 Chapter 5—Processor Design—Advanced Topics Restrictions on Access to Global State by Pipeline • We see why separate instruction and data memories (or caches) are needed • When a load or store accesses data memory in stage 4, stage 1 is accessing an instruction • Thus two memory accesses occur simultaneously • Two operands may be needed from registers in stage 2 while another instruction is writing a result register in stage 5 • Thus as far as the registers are concerned, 2 reads and a write happen simultaneously • Increment of PC in stage 1 must be overridden by a successful branch in stage 2
  • 283. 5-17 Chapter 5—Processor Design—Advanced Topics Fig 5.7 The Pipeline Data Path with Selected Control Signals • [Figure: 5-stage data path with pipeline registers IR2/PC2, IR3/X3/Y3/MD3, IR4/Z4/MD4, and IR5/Z5, a decode block per stage, and multiplexers Mp1 to Mp5]
      Mp1 ← (¬(branch2 ∨ cond) → Inc4): ((branch2 ∨ cond) → PC2):
      Mp2 ← (¬store → rc): (store → ra):
      Mp3 ← (rl ∨ branch → PC2): (dsp ∨ alu → R1):
      Mp4 ← (rl → c1): (dsp ∨ imm → c2): (alu ∧ ¬imm → R2):
      Mp5 ← (¬load → Z4): (load → mem data):
      Stage 5 register write is enabled by load ∨ ladr ∨ brl ∨ alu
    • Most control signals shown and given values • Multiplexer control is stressed in this figure
  • 284. 5-18 Chapter 5—Processor Design—Advanced Topics Example of Propagation of Instructions Through Pipe
      100: add r4, r6, r8;    R[4] ← R[6] + R[8]
      104: ld r7, 128(r5);    R[7] ← M[R[5]+128]
      108: brl r9, r11, 001;  PC ← R[11]: R[9] ← PC
      112: str r12, 32;       M[PC+32] ← R[12]
      . . .
      512: sub ...            next instr. ...
    • It is assumed that R[11] contains 512 when the brl instruction is executed • R[6] = 4 and R[8] = 5 are the add operands • R[5] = 16 for the ld and R[12] = 23 for the str
  • 285. 5-19 Chapter 5—Processor Design—Advanced Topics Fig 5.8 First Clock Cycle: add Enters Stage 1 of Pipeline • [Figure: pipeline data path; stage 1 fetches 100: add r4, r6, r8 while the PC advances from 100 to 104; stages 2 to 5 are empty] • Program counter is incremented to 104
  • 286. 5-20 Chapter 5—Processor Design—Advanced Topics Fig 5.9 Second Clock Cycle: add Enters Stage 2, While ld is Being Fetched at Stage 1 • [Figure: stage 1 fetches 104: ld r7, 128(r5) while the PC advances from 104 to 108; stage 2 holds add r4, r6, r8 with PC2 = 104 and reads R[6] = 4 and R[8] = 5] • add operands are fetched in stage 2
  • 287. 5-21 Chapter 5—Processor Design—Advanced Topics Fig 5.10 Third Clock Cycle: brl Enters the Pipeline • [Figure: stage 1 fetches 108: brl r9, r11, 001 while the PC advances from 108 to 112; stage 2 holds ld r7, 128(r5) with PC2 = 108, R[5] = 16, and c2 = 128; stage 3 holds add r4 with X3 = 4 and Y3 = 5 and produces 9] • add performs its arithmetic in stage 3
  • 288. 5-22 Chapter 5—Processor Design—Advanced Topics Fig 5.11 Fourth Clock Cycle: str Enters the Pipeline • [Figure: stage 1 fetches 112: str r12, 32 while the PC changes from 112 to 512; stage 2 holds brl r9, r11, 001 with PC2 = 112, R[11] = 512, and c2〈2..0〉 = 001; stage 3 holds ld r7 with X3 = 16 and Y3 = 128 and produces 144; stage 4 holds add r4 with Z4 = 9] • add is idle in stage 4 • Success of brl changes program counter to 512
  • 289. 5-23 Chapter 5—Processor Design—Advanced Topics Fig 5.12 Fifth Clock Cycle: add Completes, sub Enters the Pipeline • [Figure: stage 1 fetches 512: sub ... while the PC advances from 512 to 516; stage 2 holds str r12, 32 with PC2 = 116, c1 = 32, and R[12] = 23; stage 3 holds brl r9 and passes X3 = 112 through the ALU; stage 4 holds ld r7 and reads the value 55 from data memory address 144; stage 5 holds add r4 and writes the value 9 to r4] • add completes in stage 5 • sub is fetched from location 512 after successful brl
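The sequence of fetch addresses in Figs 5.8 to 5.12 follows from one rule: a branch is resolved in stage 2, one cycle after it was fetched, so the instruction after it is always fetched. The Python sketch below reproduces only that fetch-address behavior under the assumption that every listed branch is taken; `fetch_trace` and its parameters are hypothetical names for this illustration.

```python
def fetch_trace(start_pc: int, taken_branches: dict, n_cycles: int) -> list:
    """Return the address fetched in stage 1 on each clock cycle.

    taken_branches maps the address of a taken branch to its target.
    """
    pc = start_pc
    in_stage2 = None  # address of the instruction now in stage 2
    trace = []
    for _ in range(n_cycles):
        fetched = pc
        if in_stage2 in taken_branches:
            # A successful branch in stage 2 overrides the PC increment.
            pc = taken_branches[in_stage2]
        else:
            pc = fetched + 4
        trace.append(fetched)
        in_stage2 = fetched
    return trace


if __name__ == "__main__":
    # brl at 108 jumps to 512; str at 112 occupies the branch delay slot.
    print(fetch_trace(100, {108: 512}, 5))
```

With the example program, the str at 112 is fetched in the fourth cycle even though the brl at 108 succeeds, and sub at 512 is fetched in the fifth.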
  • 290. 5-24 Chapter 5—Processor Design—Advanced Topics Functions of the Pipeline Registers in SRC • Registers between stages 1 and 2: • IR2 holds full instruction including any register fields and constant • PC2 holds the incremented PC from instruction fetch • Registers between stages 2 and 3: • IR3 holds opcode and ra (needed in stage 5) • X3 holds PC or a register value (for link or 1st ALU operand) • Y3 holds c1 or c2 or a register value as 2nd ALU operand • MD3 is used for a register value to be stored in memory
  • 291. 5-25 Chapter 5—Processor Design—Advanced Topics Functions of the Pipeline Registers in SRC (cont’d) • Registers between stages 3 and 4: • IR4 has opcode and ra • Z4 has memory address or result register value • MD4 has value to be stored in data memory • Registers between stages 4 and 5: • IR5 has opcode and destination register number, ra • Z5 has value to be stored in destination register: from ALU result, PC link value, or fetched data
  • 292. 5-26 Chapter 5—Processor Design—Advanced Topics Functions of the SRC Pipeline Stages • Stage 1: fetches instruction • PC incremented or replaced by successful branch in stage 2 • Stage 2: decodes instruction and gets operands • Load or store gets operands for address computation • Store gets register value to be stored as 3rd operand • ALU operation gets 2 registers or register and constant • Stage 3: performs ALU operation • Calculates effective address or does arithmetic/logic • May pass through link PC or value to be stored in memory
  • 293. 5-27 Chapter 5—Processor Design—Advanced Topics Functions of the SRC Pipeline Stages (cont’d) • Stage 4: accesses data memory • Passes Z4 to Z5 unchanged for nonmemory instructions • Load fills Z5 from memory • Store uses address from Z4 and data from MD4 (no longer needed) • Stage 5: writes result register • Z5 contains value to be written, which can be ALU result, effective address, PC link value, or fetched data • ra field always specifies result register in SRC
  • 294. 5-28 Chapter 5—Processor Design—Advanced Topics Dependence Between Instructions in Pipe: Hazards • Instructions that occupy the pipeline together are being executed in parallel • This leads to the problem of instruction dependence, well known in parallel processing • The basic problem is that an instruction depends on the result of a previously issued instruction that is not yet complete • Two categories of hazards • Data hazards: incorrect use of old and new data • Branch hazards: fetch of wrong instruction on a change in PC
  • 295. 5-29 Chapter 5—Processor Design—Advanced Topics Classification of Data Hazards • A read after write hazard (RAW) arises from a flow dependence, where an instruction uses data produced by a previous one • A write after read hazard (WAR) comes from an anti-dependence, where an instruction writes a new value over one that is still needed by a previous instruction • A write after write hazard (WAW) comes from an output dependence, where two parallel instructions write the same register and must do it in the order in which they were issued
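The three hazard classes can be told apart by comparing the register sets read and written by an earlier and a later instruction. The Python sketch below is a generic illustration of that classification, not a description of SRC hardware; the function name and the set-based interface are assumptions.

```python
def classify_dependences(first_reads: set, first_writes: set,
                         second_reads: set, second_writes: set) -> set:
    """Name the data dependences of a later instruction on an earlier one."""
    found = set()
    if first_writes & second_reads:
        found.add("RAW")  # flow dependence: second uses first's result
    if first_reads & second_writes:
        found.add("WAR")  # anti-dependence: second overwrites first's input
    if first_writes & second_writes:
        found.add("WAW")  # output dependence: both write the same register
    return found


if __name__ == "__main__":
    # add r2, r3, r4 followed by add r1, r2, r3
    print(classify_dependences({"r3", "r4"}, {"r2"}, {"r2", "r3"}, {"r1"}))
```

Whether a dependence becomes an actual hazard depends on the pipeline: it matters only if the two accesses could happen in the wrong order.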
  • 296. 5-30 Chapter 5—Processor Design—Advanced Topics Data Hazards in SRC • Since all data memory access occurs in stage 4, memory writes and reads are sequential and give rise to no hazards • Since all registers are written in the last stage, WAW and WAR hazards do not occur • Two writes always occur in the order issued, and a write always follows a previously issued read • SRC hazards on register data are limited to RAW hazards coming from flow dependence • Values are written into registers at the end of stage 5 but may be needed by a following instruction at the beginning of stage 2
  • 297. 5-31 Chapter 5—Processor Design—Advanced Topics Possible Solutions to the Register Data Hazard Problem • Detection: • The machine manual could list rules specifying that a dependent instruction cannot be issued less than a given number of steps after the one on which it depends • This is usually too restrictive • Since the operation and operands are known at each stage, dependence on a following stage can be detected • Correction: • The dependent instruction can be “stalled” and those ahead of it in the pipeline allowed to complete • Result can be “forwarded” to a following inst. in a previous stage without waiting to be written into its register • Preferred SRC design will use detection, forwarding and stalling only when unavoidable
  • 298. 5-32 Chapter 5—Processor Design—Advanced Topics Detecting Hazards and Dependence Distance • To detect hazards, pairs of instructions must be considered • Data is normally available after being written to register • Can be made available for forwarding as early as the stage where it is produced • Stage 3 output for ALU results, stage 4 for memory fetch • Operands normally needed in stage 2 • Can be received from forwarding as late as the stage in which they are used • Stage 3 for ALU operands and address modifiers, stage 4 for stored register, stage 2 for branch target
  • 299. 5-33 Chapter 5—Processor Design—Advanced Topics Instruction Pair Hazard Interaction
      Columns: class of instruction that writes to the register file, with the stage at which its result is normally/earliest available (N/E)
      Rows: class of instruction that reads from the register file, with the stage at which the value is normally/latest needed (N/L)
      Entries: instruction separation to eliminate hazard, normal/forwarded

      Read class   N/L    alu (6/4)   load (6/5)   ladr (6/4)   brl (6/2)
      alu          2/3    4/1         4/2          4/1          4/1
      load         2/3    4/1         4/2          4/1          4/1
      ladr         2/3    4/1         4/2          4/1          4/1
      store        2/3    4/1         4/2          4/1          4/1
      branch       2/2    4/2         4/3          4/2          4/1
    • Latest needed stage 3 for store is based on address modifier register. The stored value is not needed until stage 4 • Store also needs an operand from ra. See Text Tbl 5.1
  • 300. 5-34 Chapter 5—Processor Design—Advanced Topics Delays Unavoidable by Forwarding • In the Table 5.1 “Load” column, we see the value loaded cannot be available to the next instruction, even with forwarding • Can restrict compiler not to put a dependent instruction in the next position after a load (next 2 positions if the dependent instruction is a branch) • Target register cannot be forwarded to branch from the immediately preceding instruction • Code is restricted so that branch target must not be changed by instruction preceding branch (previous 2 instructions if loaded from memory) • Do not confuse this with the branch delay slot, which is a dependence of instruction fetch on branch, not a dependence of branch on something else
  • 301. 5-35 Chapter 5—Processor Design—Advanced Topics Stalling the Pipeline on Hazard Detection • Assuming hazard detection, the pipeline can be stalled by inhibiting earlier stage operation and allowing later stages to proceed • A simple way to inhibit a stage is a pause signal that turns off the clock to that stage so none of its output registers are changed • If stages 1 and 2, say, are paused, then something must be delivered to stage 3 so the rest of the pipeline can be cleared • Insertion of nop into the pipeline is an obvious choice
  • 302. 5-36 Chapter 5—Processor Design—Advanced Topics Example of Detecting ALU Hazards and Stalling Pipeline • The following expression detects hazards between ALU instructions in stages 2 and 3 and stalls the pipeline: (alu3 ∧ alu2 ∧ ((ra3 = rb2) ∨ (ra3 = rc2) ∧ ¬imm2)) → (pause2: pause1: op3 ← 0): • After such a stall, the hazard will be between stages 2 and 4, detected by (alu4 ∧ alu2 ∧ ((ra4 = rb2) ∨ (ra4 = rc2) ∧ ¬imm2)) → (pause2: pause1: op3 ← 0): • Hazards between stages 2 & 5 require (alu5 ∧ alu2 ∧ ((ra5 = rb2) ∨ (ra5 = rc2) ∧ ¬imm2)) → (pause2: pause1: op3 ← 0): • Fig 5.13 Pipeline Clocking Signals [Figure: the clock Ck is gated by pause1 on its way to stage 1 and by pause2 on its way to stage 2]
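The three detection expressions share one shape, differing only in which later stage is compared with stage 2. The Python sketch below captures that shared predicate; it assumes ∧ binds tighter than ∨, so the ¬imm2 term qualifies only the rc comparison. The `Instr` record and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    alu: bool          # instruction is in the alu class
    ra: int            # result register number
    rb: int            # first source register number
    rc: int            # second source register number (unused if imm)
    imm: bool = False  # second operand is an immediate constant


def alu_hazard(later: Instr, stage2: Instr) -> bool:
    """True if the stage 2 ALU instruction reads a register that an ALU
    instruction in a later stage (3, 4, or 5) has not yet written."""
    return (later.alu and stage2.alu and
            (later.ra == stage2.rb or
             (later.ra == stage2.rc and not stage2.imm)))


if __name__ == "__main__":
    producer = Instr(alu=True, ra=2, rb=3, rc=4)  # add r2, r3, r4
    consumer = Instr(alu=True, ra=1, rb=2, rc=3)  # add r1, r2, r3
    print(alu_hazard(producer, consumer))
```

When the predicate is true, the control unit would pause stages 1 and 2 and feed a nop (op3 ← 0) into stage 3.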
  • 303. 5-37 Chapter 5—Processor Design—Advanced Topics Fig 5.14 Stall Due to a Data Dependence Between Two ALU Instructions
      Stage               Clock cycle 1     Clock cycle 2               Clock cycle 3               Clock cycle 4               Clock cycle 5
      Fetch instruction   ld r8, addr2      ld r8, addr2 (stalled)      ld r8, addr2 (stalled)      ld r8, addr2 (stalled)      add r5, r8, r6 (new)
      Fetch operands      add r1, r2, r3    add r1, r2, r3 (stalled)    add r1, r2, r3 (stalled)    add r1, r2, r3 (stalled)    ld r8, addr2
      ALU operation       add r2, r3, r4    nop (new)                   nop (new)                   nop (new)                   add r1, r2, r3
      Memory access       sub r6, r5, #1    add r2, r3, r4              nop                         nop                         nop
      Register write      shr r7, r7, #2    sub r6, r5, #1              add r2, r3, r4              nop                         nop
    • The instructions in the register write stage complete in clock cycles 1 to 3; the nops that follow write nothing
  • 304. 5-38 Chapter 5—Processor Design—Advanced Topics Data Forwarding: from ALU Instruction to ALU Instruction • The pair table for data dependencies says that if forwarding is done, dependent ALU instructions can be adjacent, not 4 apart • For this to work, dependences must be detected and data sent from where it is available directly to X or Y input of ALU • For a dependence of an ALU instruction in stage 3 on an ALU instruction in stage 5 the equation is alu5 ∧ alu3 → ((ra5 = rb3) → X ← Z5: (ra5 = rc3) ∧ ¬imm3 → Y ← Z5):
  • 305. 5-39 Chapter 5—Processor Design—Advanced Topics Data Forwarding: ALU to ALU Instruction (cont’d)
      • For an ALU instruction in stage 3 depending on one in stage 4, the equation is:
        alu4 ∧ alu3 → ( (ra4 = rb3) → X ← Z4: (ra4 = rc3) ∧ ¬imm3 → Y ← Z4 ):
      • We can see that the rb and rc fields must be available in stage 3 for hazard detection
      • Multiplexers must be put on the X and Y inputs to the ALU so that Z4 or Z5 can replace either X3 or Y3 as inputs
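The two forwarding equations can be read as the select logic for the multiplexers on the ALU inputs. The Python sketch below is illustrative only; the function name and argument layout are assumptions. It also assumes that when stages 4 and 5 both hold a result for the same register, the stage 4 value (the more recent one) wins, a priority the slides do not spell out.

```python
def forward_operands(x3, y3, z4, z5, alu3, alu4, alu5,
                     ra4, ra5, rb3, rc3, imm3):
    """Choose the X and Y inputs of the ALU for the instruction in stage 3."""
    x, y = x3, y3
    # Stage 5 is checked first so that a match in stage 4 overrides it
    # (assumed priority: the newest result is the correct one).
    if alu5 and alu3:
        if ra5 == rb3:
            x = z5
        if ra5 == rc3 and not imm3:
            y = z5
    if alu4 and alu3:
        if ra4 == rb3:
            x = z4
        if ra4 == rc3 and not imm3:
            y = z4
    return x, y


# Stage 4 is writing r1, and stage 3 reads r1 as rb: X comes from Z4.
print(forward_operands(10, 20, 99, 77, True, True, False,
                       ra4=1, ra5=0, rb3=1, rc3=2, imm3=False))
```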
  • 306. 5-40 Chapter 5—Processor Design—Advanced Topics Fig 5.15 Hazard Detection and Forwarding
      • [Figure: five-stage pipelined data path (1. instruction fetch, 2. decode and operand read, 3. ALU operation, 4. memory access, 5. register write) with hazard detection and forward units in stages 4 and 5 driving multiplexers on the X and Y inputs of the ALU]
      • Forwarding can be from either Z4 or Z5 to either the X or Y input of the ALU
      • rb and rc are needed in stage 3 for detection
  • 307. 5-41 Chapter 5—Processor Design—Advanced Topics Restrictions Left If Forwarding Done Wherever Possible
      (1) Branch delay slot
      • The instruction after a branch is always executed, whether the branch succeeds or not. Example: br r4; add . . .; •••
      (2) Load delay slot
      • A register loaded from memory cannot be used as an operand in the next instruction. Example: ld r4, 4(r5); nop; neg r6, r4
      • A register loaded from memory cannot be used as a branch target for the next two instructions. Example: ld r0, 1000; nop; nop; br r0
      (3) Branch target
      • The result register of an ALU or ladr instruction cannot be used as a branch target by the next instruction. Example: not r0, r1; nop; br r0
  • 308. 5-42 Chapter 5—Processor Design—Advanced Topics Questions for Discussion
      • How and when would you debug this design?
      • How do RTN and similar hardware description languages fit into testing and debugging?
      • What tools would you use, and at which stage?
      • What kind of software test routines would you use?
      • How would you correct errors at each stage in the design?
  • 309. 5-43 Chapter 5—Processor Design—Advanced Topics Instruction-Level Parallelism
      • A pipeline that is full of useful instructions completes at most one instruction every clock cycle
          • Sometimes called the Flynn limit
      • If there are multiple function units and multiple instructions have been fetched, then it is possible to start several at once
      • Two approaches are:
          • Superscalar: dynamically issue as many prefetched instructions to idle function units as possible
          • Very Long Instruction Word (VLIW): statically compile long instruction words with many operations in a word, each for a different function unit
  • 310. 5-44 Chapter 5—Processor Design—Advanced Topics Character of the Function Units in Multiple Issue Machines
      • There may be different types of function units
          • Floating-point
          • Integer
          • Branch
      • There can be more than one of the same type
      • Each function unit is itself pipelined
      • Branches become more of a problem
          • There are fewer clock cycles between branches
          • Branch units try to predict branch direction
          • Instructions at the branch target may be prefetched, and even executed speculatively, in hopes the branch goes that way
  • 311. 5-45 Chapter 5—Processor Design—Advanced Topics Microprogramming: Basic Idea
      • Recall the control sequence for the 1-bus SRC
        Step | Concrete RTN | Control Sequence
        T0 | MA ← PC: C ← PC + 4; | PCout, MAin, INC4, Cin, Read
        T1 | MD ← M[MA]: PC ← C; | Cout, PCin, Wait
        T2 | IR ← MD; | MDout, IRin
        T3 | A ← R[rb]; | Grb, Rout, Ain
        T4 | C ← A + R[rc]; | Grc, Rout, ADD, Cin
        T5 | R[ra] ← C; | Cout, Gra, Rin, End
      • The control unit's job is to generate the sequence of control signals
      • How about building a computer to do this?
  • 312. 5-46 Chapter 5—Processor Design—Advanced Topics The Microcode Engine
      • A computer to generate control signals is much simpler than an ordinary computer
      • At the simplest, it just reads the control signals in order from a read-only memory
      • The memory is called the control store
      • A control store word, or microinstruction, contains a bit pattern telling which control signals are true in a specific step
      • The major issue is determining the order in which microinstructions are read
  • 313. 5-47 Chapter 5—Processor Design—Advanced Topics Fig 5.16 Block Diagram of Microcoded Control Unit
      • [Figure: a sequencer, driven by Ck, the condition codes (CCs), and other signals, controls a 4–1 mux that loads the n-bit µPC from the incrementer, the PLA (which computes the start address from the IR opcode), an external source, or the branch address; the µPC addresses the control store, whose output word in the µIR holds k µbranch control bits, an n-bit branch address, and the control signals (PCout, etc.)]
      • A microinstruction has branch control, branch address, and control signal fields
      • The microprogram counter can be set from several sources to do the required sequencing
  • 314. 5-48 Chapter 5—Processor Design—Advanced Topics Parts of the Microprogrammed Control Unit
      • Since the control signals are just read from memory, the main function is sequencing
      • This is reflected in the several ways the µPC can be loaded
          • Output of incrementer: µPC + 1
          • PLA output: start address for a macroinstruction
          • Branch address from µinstruction
          • External source: say, for exception or reset
      • Micro conditional branches can depend on condition codes, data path state, external signals, etc.
  • 315. 5-49 Chapter 5—Processor Design—Advanced Topics Contents of a Microinstruction
      • Microinstruction format: Branch control | Control signals (PCout, MAin, PCin, End, Cout, Ain, ...) | Branch address
      • Main component is the list of 1/0 control signal values
      • There is a branch address in the control store
      • There are branch control bits to determine when to use the branch address and when to use µPC + 1
  • 316. 5-50 Chapter 5—Processor Design—Advanced Topics Fig 5.17 The Control Store
      • [Figure: control store of 2^n words, microaddresses 0 to 2^n − 1: µcode for instruction fetch at 0, µcode for add at a1, µcode for br at a2, µcode for shr at a3; each word is m bits wide, made up of k µbranch control bits, c control signals, and n branch address bits]
      • Common instruction fetch sequence
      • Separate sequences for each (macro) instruction
      • Wide words
  • 317. 5-51 Chapter 5—Processor Design—Advanced Topics Tbl 5.2 Control Signals for the add Instruction
        Addr. | Control signal bits (one column per control signal)
        101 | ••• 1 0 0 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0
        102 | ••• 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
        103 | ••• 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
        200 | ••• 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0
        201 | ••• 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 0 1 0
        202 | ••• 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1
      • Addresses 101–103 are the instruction fetch
      • Addresses 200–202 do the add
      • Change of µcontrol from 103 to 200 uses a kind of µbranch
  • 318. 5-52 Chapter 5—Processor Design—Advanced Topics Uses for µbranching in the Microprogrammed Control Unit
      (1) Branch to start of µcode for a specific instruction
      (2) Conditional control signals, e.g. CON → PCin
      (3) Looping on conditions, e.g. n ≠ 0 → ... Goto6
      • Conditions will control µbranches instead of being ANDed with control signals
      • Microbranches are frequent and control store addresses are short, so it is reasonable to have a µbranch address field in every µinstruction
  • 319. 5-53 Chapter 5—Processor Design—Advanced Topics Illustration of µbranching Control Logic
      • We illustrate a µbranching control scheme by a machine having condition code bits N and Z
      • Branch control has 2 parts: (1) selecting the input applied to the µPC and (2) specifying whether this input or µPC + 1 is used
      • We allow 4 possible inputs to µPC
          • The incremented value µPC + 1
          • The PLA lookup table for the start of a macroinstruction
          • An externally supplied address
          • The branch address field in the µinstruction word
  • 320. 5-54 Chapter 5—Processor Design—Advanced Topics Fig 5.18 Branching Controls in the Microcoded Control Unit
      • [Figure: the sequencer combines condition codes Z and N with the branch control bits BrUn, BrNotZ, BrZ, BrNotN, and BrN of the µinstruction; a 4–1 mux selects the µPC input from the incrementer, the PLA, the external address, or the branch address field]
      • 5 branch conditions: NotN, N, NotZ, Z, Unconditional
      • To 1 of 4 places: next µinstruction, PLA, external address, branch address
      • Mux Ctl select: 00 Increment µPC | 01 PLA address | 10 External address | 11 Branch address
  • 321. 5-55 Chapter 5—Processor Design—Advanced Topics Some Possible µbranches Using the Illustrated Logic (Refer to Tbl 5.3)
        Mux Ctl | BrUn BrNotZ BrZ BrNotN BrN | Control Signals | Branch Address | Branching action
        00 | 0 0 0 0 0 | ••• | XXX | None (next instruction)
        01 | 1 0 0 0 0 | ••• | XXX | Branch to output of PLA
        10 | 0 0 1 0 0 | ••• | XXX | Br if Z to Extern. Addr.
        11 | 0 0 0 0 1 | ••• | 300 | Br if N to 300 (else next)
        11 | 0 0 0 1 0 | 0 ••• 0 | 206 | Br if not N to 206 (else next)
        11 | 1 0 0 0 0 | ••• | 204 | Br to 204
      • If the control signals are all zero, the µinstruction only does a test
      • Otherwise the test is combined with data path activity
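The branching rows above can be checked with a small Python model of the sequencer. This is a sketch under assumptions: the name `next_upc` and its keyword arguments are illustrative, and the rule "use the mux-selected input when any enabled condition is satisfied, otherwise use µPC + 1" is my reading of the two-part branch control described on the earlier slide.

```python
def next_upc(upc, mux_ctl, br_un, br_not_z, br_z, br_not_n, br_n,
             n, z, pla_addr=0, ext_addr=0, branch_addr=0):
    """Next value of the microprogram counter.

    mux_ctl: 0b00 increment, 0b01 PLA, 0b10 external, 0b11 branch address.
    br_*: branch control bits of the microinstruction.
    n, z: current condition code bits.
    """
    taken = (br_un
             or (br_not_z and not z) or (br_z and z)
             or (br_not_n and not n) or (br_n and n))
    if not taken:
        return upc + 1
    sources = {0b00: upc + 1, 0b01: pla_addr,
               0b10: ext_addr, 0b11: branch_addr}
    return sources[mux_ctl]


# "Br if N to 300 (else next)" with N set, then with N clear.
print(next_upc(50, 0b11, 0, 0, 0, 0, 1, n=1, z=0, branch_addr=300))
print(next_upc(50, 0b11, 0, 0, 0, 0, 1, n=0, z=0, branch_addr=300))
```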
  • 322. 5-56 Chapter 5—Processor Design—Advanced Topics Horizontal versus Vertical Microcode Schemes
      • In horizontal microcode, each control signal is represented by a bit in the µinstruction
      • In vertical microcode, a set of true control signals is represented by a shorter code
      • The name horizontal implies fewer control store words of more bits per word
      • Vertical µcode only allows RTs in a step for which there is a vertical µinstruction code
      • Thus vertical µcode may take more control store words of fewer bits
  • 323. 5-57 Chapter 5—Processor Design—Advanced Topics Fig 5.19 A Somewhat Vertical Encoding
      • [Figure: two fields of the µIR, labeled F5 and F8: a 4-bit ALU ops field feeds a 4–16 decoder producing 16 ALU control signals, and a 3-bit register-out field feeds a 3–8 decoder producing 7 Regout control signals]
      • The scheme would save (16 + 7) − (4 + 3) = 16 bits/word in the case illustrated
  • 324. 5-58 Chapter 5—Processor Design—Advanced Topics Fig 5.20 Completely Horizontal and Vertical Microcoding
      • [Figure: in the horizontal scheme, the µPC addresses a control store whose word bits drive PCout, MAin, Inc4, Cin, ... in the data path directly; in the vertical scheme, the µPC addresses a control store whose n-bit word passes through an n-to-2^n decoder to produce the same signals]
  • 325. 5-59 Chapter 5—Processor Design—Advanced Topics Saving Control Store Bits with Horizontal Microcode
      • Some control signals cannot possibly be true at the same time
          • One and only one ALU function can be selected
          • Only one register out gate can be true with a single bus
          • Memory read and write cannot be true at the same step
      • A set of $m$ such signals can be encoded using $\lceil \log_2 m \rceil$ bits ($\lceil \log_2 (m + 1) \rceil$ to allow for no signal true)
      • The raw control signals can then be generated by a $k$-to-$2^k$ decoder, where $2^k \ge m$ (or $2^k \ge m + 1$)
      • This is a compromise between horizontal and vertical encoding
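As a quick check of the bit counts, the following Python sketch computes the field width for a set of mutually exclusive signals. The helper name `encoded_bits` is illustrative. The figures used are those of the somewhat vertical encoding in Fig 5.19: 16 ALU signals with no "none" code, and 7 register-out signals plus a "none" code.

```python
def encoded_bits(m, allow_none=False):
    """Bits needed to encode m mutually exclusive control signals.

    With allow_none=True one extra code is reserved for "no signal true".
    (states - 1).bit_length() equals ceil(log2(states)) for states >= 1.
    """
    states = m + 1 if allow_none else m
    return (states - 1).bit_length()


alu_bits = encoded_bits(16)                      # 4-to-16 decoder
regout_bits = encoded_bits(7, allow_none=True)   # 3-to-8 decoder
saved = (16 + 7) - (alu_bits + regout_bits)
print(alu_bits, regout_bits, saved)
```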
  • 326. 5-60 Chapter 5—Processor Design—Advanced Topics A Microprogrammed Control Unit for the 1-Bus SRC
      • Using the 1-bus SRC data path design gives a specific set of control signals
      • There are no condition codes, but data path signals CON and n = 0 will need to be tested
      • We will use µbranches BrCON, Brn = 0, and Brn ≠ 0
      • We adopt the clocking logic of Fig. 4.14
      • Logic for exception and reset signals is added to the microcode sequencer logic
      • Exception and reset are assumed to have been synchronized to the clock
  • 327. 5-61 Chapter 5—Processor Design—Advanced Topics Tbl 5.4 The add Instruction
        Addr. | Control bits (other control signals shown as •••) | Br Addr. | Actions
        100 | 00 0 0 0 0 0 1 1 ••• | XXX | MA ← PC: C ← PC + 4;
        101 | 00 0 0 0 0 0 0 0 ••• | XXX | MD ← M[MA]: PC ← C;
        102 | 01 1 0 0 0 0 0 0 ••• | XXX | IR ← MD; µPC ← PLA;
        200 | 00 0 0 0 0 0 0 0 ••• | XXX | A ← R[rb];
        201 | 00 0 0 0 0 0 0 0 ••• | XXX | C ← A + R[rc];
        202 | 11 1 0 0 0 1 0 0 ••• | 100 | R[ra] ← C: µPC ← 100;
      • Microbranching to the output of the PLA is shown at 102
      • Microbranch to 100 at 202 starts next fetch
  • 328. 5-62 Chapter 5—Processor Design—Advanced Topics Getting the PLA Output in Time for the Microbranch
      • For the input to the PLA to be correct for the µbranch in 102, it has to come from MD, not IR
      • An alternative is to use see-through latches for IR so the opcode can pass through IR to the PLA before the end of the clock cycle
  • 329. 5-63 Chapter 5—Processor Design—Advanced Topics See-Through Latch Hardware for IR So µPC Can Load Immediately
      • [Figure: the bus feeds the see-through latch IR〈31..27〉 (5 bits, strobe S), whose output P drives the PLA; the 10-bit PLA output R is strobed into µPC〈9..0〉. The timing diagram shows one clock cycle: bus delay, valid data on the bus, latch delay to valid data at P, PLA delay to valid data at R, then the PLA output strobed into the µPC]
      • Data must have time to get from MD across the bus, through IR, through the PLA, and satisfy the µPC setup time before the trailing edge of S
  • 330. 5-64 Chapter 5—Processor Design—Advanced Topics Fig 5.21 SRC Microcode Sequencer
      • [Figure: the sequencer takes CON, n = 0, Exception, and Reset together with the branch control bits BrUn, BrCON, BrN ≠ 0, BrN = 0, and End; a 2–1 mux chooses the external address (400 or 000), and a 4–1 mux under Mux control loads the 10-bit µPC from the incrementer, the PLA, the external address, or the branch address]
  • 331. 5-65 Chapter 5—Processor Design—Advanced Topics Tbl 5.6 Somewhat Vertical Encoding of the SRC Microinstruction
      • F1 Mux Ctl (2 bits): 00, 01, 10, 11
      • F2 Branch control (3 bits): 000 BrUn | 001 Br¬CON | 010 BrCON | 011 Br n=0 | 100 Br n≠0 | 101 None
      • F3 End (1 bit): 0 Cont. | 1 End
      • F4 Out signals (3 bits): 000 PCout | 001 Cout | 010 MDout | 011 Rout | 100 BAout | 101 c1out | 110 c2out | 111 None
      • F5 In signals (3 bits): 000 MAin | 001 PCin | 010 IRin | 011 Ain | 100 Rin | 101 MDin | 110 None
      • F6 Misc. (3 bits): 000 Read | 001 Wait | 010 Ld | 011 Decr | 100 CONin | 101 Cin | 110 Stop | 111 None
      • F7 Gate regs. (2 bits): 00 Gra | 01 Grb | 10 Grc | 11 None
      • F8 ALU (4 bits): 0000 ADD | 0001 C=B | 0010 SHR | 0011 Inc4 | ••• | 1111 NOT
      • F9 Branch address (10 bits)
  • 332. 5-66 Chapter 5—Processor Design—Advanced Topics Other Microprogramming Issues
      • Multiway branches: often an instruction can have 4–8 cases, say address modes
          • Could take 2–3 successive µbranches, i.e. clock pulses
          • The bits selecting the case can be ORed into the branch address of the µinstruction to get a several-way branch
          • Say if 2 bits were ORed into the 3rd and 4th bits from the low end, 4 possible addresses ending in 0000, 0100, 1000, and 1100 would be generated as branch targets
          • Advantage is a multiway branch in one clock
      • A hardware push-down stack for the µPC can turn repeated µsequences into µsubroutines
      • Vertical µcode can be implemented using a horizontal µengine, sometimes called nanocode
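The multiway branch by ORing can be sketched in a few lines of Python. The function name, the default `shift` and `width`, and the sample base address are illustrative assumptions; the sketch assumes the branch address field has zeros in the bit positions that receive the case bits.

```python
def multiway_target(branch_addr, case_bits, shift=2, width=2):
    """OR `width` case-select bits into the branch address.

    shift=2 places them in the 3rd and 4th bits from the low end,
    as in the example on the slide.
    """
    mask = (1 << width) - 1
    return branch_addr | ((case_bits & mask) << shift)


base = 0b0101000000  # hypothetical branch address with low four bits zero
for case in range(4):
    print(format(multiway_target(base, case), "010b"))
```

The four printed addresses end in 0000, 0100, 1000, and 1100, so each case gets four control store words before the next target begins.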
  • 333. 5-67 Chapter 5—Processor Design—Advanced Topics Chapter 5 Summary
      • This chapter has dealt with some alternative ways of designing a computer
      • A pipelined design is aimed at making the computer fast, with a target of one instruction per clock
      • Forwarding, branch delay slot, and load delay slot are steps in approaching this goal
      • More than one issue per clock is possible, but beyond the scope of this text
      • Microprogramming is a design method with a target of easing the design task and allowing for easy design change or multiple compatible implementations of the same instruction set
  • 334. 6-1 Chapter 6—Computer Arithmetic and the Arithmetic Unit Chapter 6: Computer Arithmetic and the Arithmetic Unit
      Topics
      6.1 Number Systems and Radix Conversion
      6.2 Fixed-Point Arithmetic
      6.3 Seminumeric Aspects of ALU Design
      6.4 Floating-Point Arithmetic
  • 335. 6-2 Chapter 6—Computer Arithmetic and the Arithmetic Unit Digital Number Systems
      • Digital number systems have a base or radix $b$
      • Using positional notation, an $m$-digit base $b$ number is written
        $x = x_{m-1} x_{m-2} \ldots x_1 x_0$, with $0 \le x_i \le b-1$, $0 \le i < m$
      • The value of this unsigned integer is
        $\mathrm{value}(x) = \sum_{i=0}^{m-1} x_i \cdot b^i$   (Eq. 6.1)
  • 336. 6-3 Chapter 6—Computer Arithmetic and the Arithmetic Unit Range of Unsigned m Digit Base b Numbers
      • The largest number has all of its digits equal to $b-1$, the largest possible base $b$ digit
      • Its value can be calculated in closed form:
        $x_{\max} = \sum_{i=0}^{m-1} (b-1) \cdot b^i = (b-1) \cdot \sum_{i=0}^{m-1} b^i = b^m - 1$   (Eq. 6.2)
      • An important summation, the geometric series:
        $\sum_{i=0}^{m-1} b^i = \frac{b^m - 1}{b - 1}$   (Eq. 6.3)
  • 337. 6-4 Chapter 6—Computer Arithmetic and the Arithmetic Unit Radix Conversion: General Matters
      • Converting from one number system to another involves computation
      • We call the base in which calculation is done $c$ and the other base $b$
      • Calculation is based on the division algorithm: for integers $a$ and $b$, with $b > 0$, there exist integers $q$ and $r$ such that $a = q \cdot b + r$, with $0 \le r \le b-1$
      • Notation: $q = \lfloor a/b \rfloor$, $r = a \bmod b$
  • 338. 6-5 Chapter 6—Computer Arithmetic and the Arithmetic Unit Digit Symbol Correspondence Between Bases
      • Each base has $b$ (or $c$) different symbols to represent the digits
      • If $b < c$, there is a table of $b + 1$ entries giving base $c$ symbols for each base $b$ symbol and $b$
          • If the same symbol is used for the first $b$ base $c$ digits as for the base $b$ digits, the table is implicit
      • If $c < b$, there is a table of $b + 1$ entries giving a base $c$ number for each base $b$ symbol and $b$
          • For base $b$ digits ≥ $c$, the base $c$ numbers have more than one digit
      • Example table for $b = 12$, $c = 3$:
        Base 12: 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | 10
        Base 3: 0 | 1 | 2 | 10 | 11 | 12 | 20 | 21 | 22 | 100 | 101 | 102 | 110
  • 339. 6-6 Chapter 6—Computer Arithmetic and the Arithmetic Unit Convert Base b Integer to Calculator’s Base, c
      1) Start with base $b$ number $x = x_{m-1} x_{m-2} \ldots x_1 x_0$
      2) Set $x = 0$ in base $c$
      3) Left to right, get next symbol $x_i$
      4) Look up base $c$ number $D_i$ for symbol $x_i$
      5) Calculate in base $c$: $x = x \cdot b + D_i$
      6) If there are more digits, repeat from step 3
      • Example: convert $\mathrm{3AF}_{16}$ to base 10
        x = 0
        x = 16⋅0 + 3 = 3
        x = 16⋅3 + 10 (= A) = 58
        x = 16⋅58 + 15 (= F) = 943
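The six steps above can be sketched in Python, with Python's own integers playing the role of the calculator's base c. The names `DIGITS` and `to_int` are illustrative, and the digit table assumes the usual symbols 0–9 followed by A–Z.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed symbol table


def to_int(symbols, b):
    """Convert a string of base-b digit symbols to an integer."""
    x = 0                                 # step 2
    for sym in symbols:                   # step 3: left to right
        d = DIGITS.index(sym.upper())     # step 4: look up D_i
        if d >= b:
            raise ValueError(f"{sym!r} is not a base {b} digit")
        x = x * b + d                     # step 5
    return x


print(to_int("3AF", 16))
```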
  • 340. 6-7 Chapter 6—Computer Arithmetic and the Arithmetic Unit Convert Calculator’s Base Integer to Base b
      1) Let $x$ be the base $c$ integer
      2) Initialize $i = 0$ and $v = x$; digits are produced right to left
      3) Set $D_i = v \bmod b$ and $v = \lfloor v/b \rfloor$. Look up $D_i$ to get $x_i$
      4) $i = i + 1$; if $v \ne 0$, repeat from step 3
      • Example: convert $3587_{10}$ to base 12
        3587 ÷ 12 = 298 (rem = 11) ⇒ x0 = B
        298 ÷ 12 = 24 (rem = 10) ⇒ x1 = A
        24 ÷ 12 = 2 (rem = 0) ⇒ x2 = 0
        2 ÷ 12 = 0 (rem = 2) ⇒ x3 = 2
        Thus $3587_{10} = \mathrm{20AB}_{12}$
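A short Python sketch of the repeated-division procedure follows. The names `DIGITS` and `from_int` are illustrative, and the symbol table (0–9, then A–Z) is an assumption; zero is handled as a special case so that it yields one digit.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed symbol table


def from_int(x, b):
    """Convert a non-negative integer to a string of base-b digit symbols."""
    if x == 0:
        return "0"
    symbols = []
    v = x                         # step 2
    while v != 0:                 # step 4
        v, d = divmod(v, b)       # step 3: quotient and remainder
        symbols.append(DIGITS[d])
    # digits were produced right to left, so reverse them
    return "".join(reversed(symbols))


print(from_int(3587, 12))
```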
  • 341. 6-8 Chapter 6—Computer Arithmetic and the Arithmetic Unit Fractions and Fixed-Point Numbers
      • The value of the base $b$ fraction $.f_{-1} f_{-2} \ldots f_{-m}$ is the value of the integer $f_{-1} f_{-2} \ldots f_{-m}$ divided by $b^m$
      • The value of a mixed fixed-point number $x_{n-1} x_{n-2} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-m}$ is the value of the $(n+m)$-digit integer $x_{n-1} x_{n-2} \ldots x_1 x_0 x_{-1} x_{-2} \ldots x_{-m}$ divided by $b^m$
      • Moving the radix point one place left divides by $b$
          • For a fixed radix point position in the word, this is a right shift of the word
      • Moving the radix point one place right multiplies by $b$
          • For a fixed radix point position in the word, this is a left shift of the word
  • 342. 6-9 Chapter 6—Computer Arithmetic and the Arithmetic Unit Converting Fraction to Calculator’s Base
      • Can use integer conversion and divide the result by $b^m$
      • Alternative algorithm
        1) Let the base $b$ number be $.f_{-1} f_{-2} \ldots f_{-m}$
        2) Initialize $f = 0.0$ and $i = -m$
        3) Find the base $c$ equivalent $D$ of $f_i$
        4) $f = (f + D)/b$; $i = i + 1$
        5) If $i = 0$, the result is $f$. Otherwise repeat from step 3
      • Example: convert $.413_8$ to base 10
        f = (0 + 3)/8 = 0.375
        f = (0.375 + 1)/8 = 0.171875
        f = (0.171875 + 4)/8 = 0.521484375
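The alternative algorithm can be sketched in Python as below. The function name `fraction_value` is illustrative. The sketch uses `fractions.Fraction` rather than floating point so that each division by b is exact; that choice is mine and not part of the slide's algorithm.

```python
from fractions import Fraction


def fraction_value(symbols, b):
    """Value of the base-b fraction .f-1 f-2 ... f-m given as a digit string.

    Digits are consumed right to left (i = -m up to -1), applying
    f = (f + D) / b at each step.
    """
    f = Fraction(0)
    for sym in reversed(symbols):
        d = int(sym, b)        # base-c equivalent D of the digit
        f = (f + d) / b
    return f


value = fraction_value("413", 8)
print(value, float(value))
```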
  • 343. 6-10 Chapter 6—Computer Arithmetic and the Arithmetic Unit Nonterminating Fractions
      • The division in the algorithm may give a nonterminating fraction in the calculator’s base
      • This is a general problem: a fraction of $m$ digits in one base may have any number of digits in another base
      • The calculator will normally keep only a fixed number of digits
          • The number of digits kept should make the base $c$ accuracy about that of base $b$
      • This problem appears in generating base $b$ digits of a base $c$ fraction
          • The algorithm can continue to generate digits unless terminated
  • 344. 6-11 Chapter 6—Computer Arithmetic and the Arithmetic Unit Convert Fraction from Calculator’s Base to Base b
      1) Start with exact fraction $f$ in base $c$
      2) Initialize $i = 1$ and $v = f$
      3) $D_{-i} = \lfloor b \cdot v \rfloor$; $v = b \cdot v - D_{-i}$; get base $b$ digit $f_{-i}$ for $D_{-i}$
      4) $i = i + 1$; repeat from step 3 unless $v = 0$ or enough base $b$ digits have been generated
      • Example: convert $0.31_{10}$ to base 8
        0.31 × 8 = 2.48 ⇒ f-1 = 2
        0.48 × 8 = 3.84 ⇒ f-2 = 3
        0.84 × 8 = 6.72 ⇒ f-3 = 6
      • Since $8^3 > 10^2$, $0.236_8$ has more accuracy than $0.31_{10}$
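A minimal Python sketch of the multiply-and-take-integer-part procedure is given below. The name `fraction_digits` and the `max_digits` cutoff parameter are illustrative. `fractions.Fraction` is used (my choice) so that the starting fraction is exact, as step 1 requires, and so that rounding error cannot disturb the digits.

```python
from fractions import Fraction


def fraction_digits(f, b, max_digits):
    """First base-b digits of a fraction 0 <= f < 1, as a list of integers.

    Stops early if the fraction terminates (v becomes 0).
    """
    v = Fraction(f)
    digits = []
    while v != 0 and len(digits) < max_digits:
        v *= b
        d = int(v)       # integer part is the next digit D
        digits.append(d)
        v -= d           # keep only the fractional part
    return digits


print(fraction_digits(Fraction(31, 100), 8, 3))
```

Without the `max_digits` limit the loop would not end for 0.31, since that fraction does not terminate in base 8.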
  • 345. 6-12 Chapter 6—Computer Arithmetic and the Arithmetic Unit Conversion Between Related Bases by Digit Grouping
      • Let base $b = c^k$; for example $b = c^2$
      • Then base $b$ number $x_1 x_0$ is base $c$ number $y_3 y_2 y_1 y_0$, where $x_1$ base $b$ = $y_3 y_2$ base $c$ and $x_0$ base $b$ = $y_1 y_0$ base $c$
      • Examples:
        $102130_4$ = 10 21 30 (base 4) = $\mathrm{49C}_{16}$
        $\mathrm{49C}_{16}$ = 0100 1001 1100 (base 2)
        $102130_4$ = 01 00 10 01 11 00 (base 2)
        $010010011100_2$ = 010 010 011 100 (base 2) = $2234_8$
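Digit grouping in the direction base c to base b = c^k can be sketched in Python as follows. The names `DIGITS` and `group_digits` are illustrative; the sketch assumes c^k ≤ 36 so that each group maps to a single symbol from the assumed table, and it pads on the left with zeros when the digit count is not a multiple of k.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed symbol table


def group_digits(symbols, c, k):
    """Convert a base-c digit string to base c**k by grouping k digits."""
    pad = (-len(symbols)) % k
    symbols = "0" * pad + symbols
    groups = [symbols[i:i + k] for i in range(0, len(symbols), k)]
    # each group of k base-c digits is exactly one base c**k digit
    return "".join(DIGITS[int(g, c)] for g in groups)


print(group_digits("102130", 4, 2))        # base 4 to base 16
print(group_digits("010010011100", 2, 3))  # base 2 to base 8
```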
  • 346. 6-13 Chapter 6—Computer Arithmetic and the Arithmetic Unit Negative Numbers, Complements, and Complement Representations
      We will:
      • Define two complement operations
      • Define two complement number systems
          • Systems represent both positive and negative numbers
      • Give a relation between complement and negate in a complement number system
      • Show how to compute the complements
      • Explain the relation between shifting and scaling a number by a power of the base
      • Lead up to the use of complement number systems in signed addition hardware
  • 347. 6-14 Chapter 6—Computer Arithmetic and the Arithmetic Unit Complement Operations for m-Digit Base b Numbers
      • Radix complement of $m$-digit base $b$ number $x$: $x^c = (b^m - x) \bmod b^m$
      • Diminished radix complement of $x$: $x^c = b^m - 1 - x$
      • The complement of a number in the range $0 \le x \le b^m - 1$ is in the same range
      • The mod $b^m$ in the radix complement definition makes this true for $x = 0$; it has no effect for any other value
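The two definitions translate directly into Python; the function names below are illustrative. The example values (3-digit decimal and 4-bit binary) are chosen here only to show the range property and the role of the mod at x = 0.

```python
def radix_complement(x, b, m):
    """Radix complement of an m-digit base-b number x."""
    return (b**m - x) % b**m


def diminished_radix_complement(x, b, m):
    """Diminished radix complement of an m-digit base-b number x."""
    return b**m - 1 - x


# 3-digit decimal: ten's complement and nine's complement of 25
print(radix_complement(25, 10, 3), diminished_radix_complement(25, 10, 3))
# 4-bit binary: two's complement and one's complement of 0101
print(format(radix_complement(0b0101, 2, 4), "04b"),
      format(diminished_radix_complement(0b0101, 2, 4), "04b"))
# Without the mod, the radix complement of 0 would be b**m, out of range
print(radix_complement(0, 10, 3))
```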