Bangladesh University of Engineering and Technology
Digital System Design Sessional
CSE 404

Report on Very Simple Computer Design & Implementation

Section: B2 - Group No.: 02

Submitted by:
Md. Abul Hashem, Student Id: 0805092
Sirat Samyoun, Student Id: 0805096
Shibbir Ahmed, Student Id: 0805097
Imtiaz Ahmad, Student Id: 0805100
Md. Rashidujjaman Rifat, Student Id: 0805107

Date of Submission: June 22, 2013

Submitted to:
Nashid Shahriar, Assistant Professor, CSE, BUET
Md. Mustafizur Rahman, Lecturer, CSE, BUET
Shagufta Mehnaz, Lecturer, CSE, BUET
Sarah Masud Preum, Lecturer, CSE, BUET

Problem Description

A very simple computer system has to be designed and implemented in this assignment. The control unit should be micro-programmed. The efficiency of the design is judged on the two principles stated below:
- Firstly, the fewer the T-states of the instruction cycle, the better the design.
- Secondly, for the architecture, the fewer MSI (Medium Scale Integrated) or LSI (Large Scale Integrated) chips required, the better the design.

Some other important specifications have to be met in designing the 4-bit computer system. These are stated below:
- The bus consists of an 8-bit address bus and a 4-bit data bus.
- There will be two independent units inside the processor: a Fetch Unit and an Execution Unit. So a two-stage pipeline is preferable.
- In addition, programs (comprising instructions from the assigned instruction set) have to be written on a normal PC, and the corresponding machine code then has to be sent through the communication ports to the implemented microcomputer.

Assigned Instruction Set

Section: B2
Group: 02

LDA address; STA address; MOV Acc, B; MOV B, Acc; MOV Acc, immediate; IN; OUT; ADD B; ADC B; SUB B; SBB B; ADC immediate; SBB immediate; CMP B; XCHG; JC address; JE address; PUSH; POP; CALL address; RET; JMP; HLT; NOP; STZ; CLZ; AND immediate; OR [address];

Solution Approach

In designing our 4-bit computer system we have followed the specific approaches stated below:
- To execute the 28 instructions in few clock cycles on average, the entire system is designed around a 2-stage pipeline: an instruction fetch & decode unit and a 4-bit execution unit.
- The execution unit has been designed so that it has a shared 4-bit data bus.
- The memory system is organized as a Harvard architecture. That is, the instruction memory and the data memory are separate units.

Figure: Harvard architecture

- Programs are stored in instruction memory and data in data memory. Data memory also contains the program stack.
- All instructions are either 1 or 2 bytes long. Some instructions require 1 clock cycle on average and others require 2 clock cycles on average.
- The computer is designed in such a way that it can also communicate with external devices through the input and output port registers.
- The instruction set covers data transfer, arithmetic, logic, stack and control transfer instructions, which makes the computer general enough to run a wide range of programs. However, the 8-bit address bus restricts the program length to 256 bytes. The 4-bit arithmetic and logic unit can perform addition, subtraction and several logical operations.

Instruction Set

Instruction          Description
LDA address          Acc ← Memory[address]
STA address          Memory[address] ← Acc
MOV Acc, B           Acc ← B
MOV B, Acc           B ← Acc
MOV Acc, immediate   Acc ← immediate
IN                   Acc ← input_port
OUT                  Output_port ← Acc
ADD B                Acc ← Acc + B
ADC B                Acc ← Acc + B + C (contents of carry flag)
SUB B                Acc ← Acc - B
SBB B                Acc ← Acc - B - Bo (contents of carry flag)
ADC immediate        Acc ← Acc + immediate + C (contents of carry flag)
SBB immediate        Acc ← Acc - immediate - Bo (contents of carry flag)
CMP B                Accumulator is unchanged. Sets flags according to (Acc - B).
XCHG                 Acc ↔ B (exchanges contents of accumulator and B)
JC address           Jumps to the address if the carry flag is set
JE address           Jumps to the address if equal
PUSH                 Pushes the content of the accumulator onto the stack
POP                  Pops the top of the stack into the accumulator
CALL address         Calls a subroutine (at the specified address) unconditionally
RET                  Returns from the current subroutine to the caller unconditionally
JMP address          Jumps unconditionally to the address
HLT                  Halts execution
NOP                  No operation
STZ                  Sets the zero flag
CLZ                  Clears the zero flag
AND immediate        Acc ← Acc . immediate
OR [address]         Acc ← Acc | Memory[address]

Description of Important Components

1. Fetch and decode unit

This unit supplies instructions in program order to the execution unit. It performs the following functions:
- Fetches the instructions that will be executed next
- Decodes each instruction into micro-instructions
- Generates the micro-code for executing the micro-instructions

I. Program counter register
The program counter register contains the address of the next instruction to be executed. It is incremented automatically at each clock cycle to contain the next instruction address or operand address. Sometimes it is directly loaded with an address value by the JMP, CALL, RET, JE and JC instructions. The output of the program counter register is directly fed to the address input of the instruction memory.

II. Instruction memory
Instruction memory is a 256 x 8 RAM. It has 8 address pins and 8 output pins, so it is byte addressable. Each byte contains an instruction byte. The read signal of the instruction memory is always activated. Since the address inputs are directly connected to the outputs of the program counter register, the memory output is always available without requiring any extra clock cycle. As long as the program counter outputs are valid, the memory outputs are also valid. There is no need to write to the instruction memory, because we use separate memories for data and instructions. Data writes and stack writes occur on the data memory. Since program code is read only, the write signal of the instruction memory is deactivated permanently. The program code only needs to be loaded into the instruction memory at the start of the simulation.

III. Instruction register
The instruction register holds the opcode of the instruction that is being executed in the execution unit. This register is 8 bits long. The instruction register content is directly fed to the instruction decoder unit to generate the micro-operations for each instruction.

IV. Instruction memory buffer register
This register holds the second byte of all 2-byte instructions. The first byte, which is the opcode byte, is stored in the instruction register. During the first clock cycle the opcode byte is fetched into the instruction register. During the second clock cycle the second byte is fetched and stored in the instruction memory buffer register. The instruction register is not loaded during the second clock cycle.

V. MUX1, data memory address selector
This MUX selects the address input of the data memory. The possible inputs are:
- SP value, required in the PUSH instruction.
- SP value + 1, required in the POP instruction.
- Instruction memory buffer register output, required for data memory accesses whose address is given as the second byte of the instruction (direct addressing).

VI. Adder for computing PC + 1 and output buffer
This adder computes the value of the current program counter + 1. It is required in the CALL instruction, where we need to save the address of the next instruction on the stack. The 3-state buffer in front of the adder's PC + 1 output is necessary so that the adder output does not load the data memory output bus while the data memory is accessed for other purposes.
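
The address selection done by MUX1 can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the circuit itself: the names `mux1` and `StackModel` are made up for illustration, the stack pointer is assumed to start at FFH and to move toward lower addresses (as described later in the timing section), and the post-decrement on PUSH and pre-increment on POP are inferred from the "SP" and "SP + 1" inputs listed above.

```python
# Illustrative labels for the three MUX1 inputs (not signal names from the design).
IMBR, SP, SP_PLUS_1 = "imbr", "sp", "sp+1"


def mux1(select, sp, imbr):
    """Return the 8-bit data memory address chosen by MUX1."""
    if select == IMBR:
        return imbr & 0xFF          # address taken from the instruction's second byte
    if select == SP:
        return sp & 0xFF            # used by PUSH
    if select == SP_PLUS_1:
        return (sp + 1) & 0xFF      # used by POP
    raise ValueError(f"unknown MUX1 input: {select}")


class StackModel:
    """Toy model of the data memory and stack pointer (assumed behaviour)."""

    def __init__(self):
        self.mem = [0] * 256        # 8-bit address space
        self.sp = 0xFF              # assumed start-up value of SP

    def push(self, acc):
        # Write the 4-bit accumulator at address SP, then decrement SP.
        self.mem[mux1(SP, self.sp, 0)] = acc & 0xF
        self.sp = (self.sp - 1) & 0xFF

    def pop(self):
        # Read from address SP + 1, which becomes the new SP.
        addr = mux1(SP_PLUS_1, self.sp, 0)
        self.sp = addr
        return self.mem[addr]
```

Pushing 5 and then 9 leaves SP at FDH; two pops return 9 and then 5 and bring SP back to FFH.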

VII. Bidirectional transceiver
- In some instructions the data memory output has to be passed to the shared data bus.
- In some other instructions the shared bus output has to be loaded onto the data memory output bus.
- In all other cases the data memory must not load the shared bus while other units are using it.
To meet all of the above requirements we need a bi-directional 3-state buffer, which is exactly what the bus transceiver provides. It has a chip select pin. If this pin is inactive, the transceiver does not load the bus on either side. When the chip select pin is active, the transceiver transfers data from one side to the other, in the direction chosen by a second pin (the data direction pin).

VIII. Instruction decoder unit
The instruction decoder unit decodes each instruction according to its type. It is a ROM that generates the appropriate micro-operation depending on the value in the instruction register. The decoder actually determines the control ROM address at which the instruction is handled. The ROM address thus found generates the control signals for executing the instruction. However, at the next clock cycle the control ROM address might be selected from a source other than the instruction decoder output, as determined by the instruction type and the number of clock cycles required to execute it. The important point is that the decoder maps many instructions to the same initial address of the control ROM, since all those instructions have the same control signals in their first clock cycle. This reduces the number of rows needed in the control ROM and thus the ROM size.

2. Execution unit

The execution unit can perform arithmetic and logic operations and transfer values between the general purpose registers, the data memory and the input and output port registers.

I. Arithmetic and logic unit
This is a 4-bit arithmetic and logic unit that supports the following operations:
- Arithmetic operations: addition, subtraction, increment, decrement and transfer
- Logical operations: logical OR, XOR, AND and complement
The operation performed by the arithmetic logic unit is determined by its function pins (ALU.S3:S0, ALUM and the carry-in; see the control ROM pin descriptions).

One operand of the arithmetic logic unit is the accumulator register. The other operand is selected by MUX2. The two selector pins S1, S0 of MUX2 select the second operand as follows:

S1  S0  Operand
0   0   B register
0   1   1
1   0   Data memory output
1   1   Instruction memory buffer register output

- Immediate operands are selected from the instruction memory buffer register output.
- Memory operands are selected from the data memory output.
- During the NEG instruction the value 1 is selected.
- During all other arithmetic and logic instructions the B register is the second operand.

II. Registers
The entire register set consists of:
- Two general purpose registers: the accumulator register and the B register
- One program status register or flags register
- One temporary register
- One input port register
- One output port register

III. Functions of each register
- Accumulator: For any arithmetic or logic operation one operand is by default the accumulator register, and the result also goes to this register.
- B register: This works as a general purpose temporary storage register. Values can be moved between this register and the accumulator. This register can also be selected as the second operand for some arithmetic and logic operations.
- Temp register: The temporary register is used by the XCHG instruction for swapping the content of the accumulator with the content of the B register.
- Input port register: The input port register is used for interfacing with devices outside the computer. It receives input values from devices such as a keyboard.
- Output port register: The output port register is used to send data to devices external to the computer, for example a display.
- Flags register: Stores the status of the most recent arithmetic or logic operation.

IV. The Status Flags Register
The status flags of the 4-bit flags register indicate the results of arithmetic and logic operations such as ADD, SUB, OR, XOR. The status flag functions are:

CF (bit 0)  Carry flag: Set if an arithmetic operation generates a carry or a borrow out of the most significant bit of the result; cleared otherwise. This flag indicates an overflow for unsigned-integer arithmetic.
SF (bit 1)  Sign flag: Set equal to the most significant bit of the result, which is the sign bit of a signed integer.
ZF (bit 2)  Zero flag: Set if the result is zero; cleared otherwise.
VF (bit 3)  Overflow flag: Set if the integer result is too large a positive number or too small a negative number (excluding the sign bit) to fit in the destination operand; cleared otherwise. This flag indicates an overflow for signed-integer arithmetic.

The functions of the status flags register are given below:
- Arithmetic operations modify all flags.
- CF and VF are not changed during logic operations.
- The status flags allow a single arithmetic or logic operation to produce results for two different data types: unsigned integers and signed integers.
- When performing multiple-precision arithmetic on integers, the CF flag is used in conjunction with the add with carry (ADC) and subtract with borrow (SBB) instructions to propagate a carry or borrow from one computation to the next.
- The conditional branch instructions (such as JE, JC) use one or more of the status flags as condition codes and test them for branching.
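
The flag definitions and the role of CF in multiple-precision arithmetic can be made concrete with a short model of a 4-bit addition. This is a sketch of the behaviour described above, not of the ALU circuit; the function names `add4` and `add8` are made up for illustration.

```python
def add4(a, b, carry_in=0):
    """Add two 4-bit values plus a carry-in; return (result, flags)."""
    a &= 0xF
    b &= 0xF
    total = a + b + carry_in
    result = total & 0xF
    flags = {
        "CF": int(total > 0xF),           # carry out of bit 3 (unsigned overflow)
        "SF": (result >> 3) & 1,          # most significant bit of the result
        "ZF": int(result == 0),           # result is zero
        # Signed overflow: both operands have the same sign, the result differs.
        "VF": int(((a ^ result) & (b ^ result) & 0x8) != 0),
    }
    return result, flags


def add8(x, y):
    """8-bit addition built from two 4-bit steps: ADD on the low nibbles,
    then ADC on the high nibbles using the carry of the first step."""
    low, flags = add4(x & 0xF, y & 0xF)
    high, flags = add4((x >> 4) & 0xF, (y >> 4) & 0xF, flags["CF"])
    return (high << 4) | low, flags["CF"]
```

For example, `add8(0x3A, 0x59)` first adds A + 9, which gives 3 with CF = 1, and then adds 3 + 5 + 1 = 9, so the 8-bit result is 0x93 with no final carry.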

Description of Control Unit

The control unit consists of five 32K (4K x 8) ROMs for the control circuit, one 32K (4K x 8) ROM for the instruction decoder, and three 4-to-1 MUXes. The control ROM generates the control word for each micro-instruction, and a 2-bit flip-flop stores the next address selector value, which is fed to the MUX selector pins to select the control ROM address.

The full block diagram of the control unit is given below:

Figure: Block Diagram of Control ROM

Control State Diagram:

Figure: State Diagram for Control ROM

The working principle of the control unit is stated below:
- For all instructions, at the first clock cycle the control ROM address is selected from the instruction decoder output.
- The next address selector field of the control ROM output selects the appropriate next address of the control ROM through the MUX.
- For some 2-cycle instructions, at the second clock cycle the control ROM address input is selected from the instruction register output. These instructions have the same control signals in their first clock cycle but different control signals in their second clock cycle.
- For some other 2-cycle instructions, at the second clock cycle the control ROM address input is selected as the constant value 31. This is because all these instructions have the same control signals in their second clock cycle, although they have different control signals in their first clock cycle.
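
The address selection described above amounts to a three-way multiplexer in front of the control ROM. The sketch below models it in a few lines; the selector encoding (0, 1, 2) and the names are assumptions made for illustration, since the report does not list the actual NS.S1:S0 encoding.

```python
# Hypothetical encoding of the next-address selector (not taken from the design).
FROM_DECODER, FROM_INSTR_REG, CONSTANT_31 = 0, 1, 2


def next_rom_address(selector, decoder_out, instr_reg):
    """Return the control ROM address for the coming clock cycle."""
    if selector == FROM_DECODER:
        return decoder_out      # first cycle of every instruction
    if selector == FROM_INSTR_REG:
        return instr_reg        # second cycle differs per instruction
    if selector == CONSTANT_31:
        return 31               # second cycle shared by a group of instructions
    raise ValueError(f"unused selector value: {selector}")
```

Sharing the constant address 31 means that one ROM row serves the second cycle of a whole group of instructions, which is the same row-saving idea the instruction decoder applies to the first cycle.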

Pin Designation & Description of Control ROM

      Bit7          Bit6     Bit5      Bit4         Bit3   Bit2      Bit1      Bit0
ROM1  Pcops1        Pcops0   pcinsel   Iopcodeload  Dms1   Dms0      Spcntsel  Spcnt
ROM2  Dmdbbirenble  Dmdb     Accout    Accload      Dmrws  pcnextdm  Immout    Aluout
ROM3  Alubs1        Alubs0   Baseload  Baseout      Inout  Outload   Nss1      Nss0
ROM4  Alus3         Alus2    Alus1     Alus0        Alum   ALUCNS0   Zfs0      Tmpout
ROM5  Flagload      Tmpload  Zfs1

- PCOPS.1:0: these two bits select the PC operation at the next clock cycle according to the following table:

  S1  S0  Operation
  0   0   Increment PC
  0   1   Load PC
  1   0   Load PC if V = 1
  1   1   Load PC if Z = 1

- PCINSEL: this pin selects the value to be loaded into the program counter. If 1, the PC is loaded with the value from the instruction memory output; otherwise the PC is loaded from the data memory output.
- IOPCODELOAD: load signal for the instruction register. If 1, the instruction register is loaded with the value of the instruction memory at the next clock cycle.
- DM.S1:S0: these two bits select the address that is input to the data memory according to the following table:

  S1  S0  Address input to data memory
  0   0   Instruction memory buffer register
  0   1   SP
  1   0   SP+1
  1   1   Unused
- SPCNTSEL: if 1, increments the stack pointer register at the next clock cycle; otherwise decrements it. Note that this pin has an effect only when SP.CEN = 1.
- SPCNT: count enable signal for SP. This pin enables the SP.U/D# signal.
- DMDBBIRENBLE: this pin enables the bus transceiver.
- DMDB: if 1, the bus transceiver sends data (from data memory to the data bus); otherwise it receives data (from the data bus to data memory).
- ACCOUT: output enable signal for the accumulator register. If active, the accumulator drives the data bus with its contents.
- ACCLOAD: load signal for the accumulator register. If active, the accumulator is loaded with the value on the data bus at the next clock cycle.
- DMRWS: data memory read/write# signal. If 1, the data memory is selected for a read operation; otherwise it is selected for a write operation.
- PCNEXTDM: this pin enables the output buffer of the PC + 1 adder, which then drives the data memory output bus with PC + 1.
- IMMOUT: this pin enables the output buffer of the instruction memory buffer register, which then drives the data bus with the value of this register.
- ALUOUT: output enable signal for the arithmetic and logic unit.
- ALUB.S1:S0: these two bits select the second operand for the arithmetic and logic operation according to the following table:

  S1  S0  Operand
  0   0   B register
  0   1   1
  1   0   Data memory output
  1   1   Instruction memory buffer register output

- BASELOAD: load signal for the B register.
- BASEOUT: output enable signal for the B register.
- INOUT: output enable signal for the input port register.
- OUTLOAD: load signal for the output port register.
- NS.S1:S0: selects the source of the next control ROM address.
- ALU.S3:S0: these four bits select the ALU operation to be performed.
- ALUM: selects the mode of the ALU operation. If 1, the operation is logic; otherwise the operation is arithmetic.
- ALUCNS0: this bit, along with some other bits, determines the carry-in signal of the ALU.
- ZFS0: clears the zero flag.
- ZFS1: sets the zero flag.
- TMPOUT: output enable signal for the temp register.
- FLAGLOAD: load signal for the flags register.
- TMPLOAD: load signal for the temp register.
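
To show how one byte of the control word maps onto the pins described above, the following sketch unpacks the ROM1 byte using the bit positions from the ROM1 row of the bit table (Bit7 = Pcops1 down to Bit0 = Spcnt). The function and dictionary names are made up for illustration, and the example bytes are arbitrary values, not entries of the actual control ROM.

```python
# Meaning of the two-bit fields, copied from the PCOPS and DM.S1:S0 tables.
PC_OPERATIONS = {
    0: "increment PC",
    1: "load PC",
    2: "load PC if V = 1",
    3: "load PC if Z = 1",
}
DM_ADDRESS_SOURCES = {
    0: "instruction memory buffer register",
    1: "SP",
    2: "SP+1",
    3: "unused",
}


def decode_rom1(word):
    """Split the 8-bit ROM1 output into its control fields."""
    return {
        "pcops": (word >> 6) & 0b11,     # Bit7..Bit6
        "pcinsel": (word >> 5) & 1,      # Bit5
        "iopcodeload": (word >> 4) & 1,  # Bit4
        "dms": (word >> 2) & 0b11,       # Bit3..Bit2
        "spcntsel": (word >> 1) & 1,     # Bit1
        "spcnt": word & 1,               # Bit0
    }
```

For instance, the arbitrary byte 00001011 decodes to pcops = 0 (increment PC), dms = 2 (address SP+1), spcntsel = 1 and spcnt = 1, which is the combination one would expect for a POP-like step.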

Instructions' Steps (Timing Diagram)

After power-on the computer system starts executing the program at address 00H of the instruction memory. Hence the first instruction of the program must be loaded at this address. An unconditional jump instruction placed there can then be used to run programs starting at any other address.

The instruction memory must be loaded with the program to be executed. The data memory should be loaded with initial data values if the program requires them. The stack pointer (SP register) is loaded with the value FFH at start-up. Note that all data reads and stack pops fetch their values from data memory. The stack grows from the higher end (address FF) of the data memory toward the lower addresses. After start-up the contents of all other registers are undefined.

The entire program must end with the HLT instruction. Otherwise the processor will not stop executing instructions and thus no output can be observed.

The two-stage pipeline architecture allows one instruction to execute while another instruction is fetched from the instruction memory. Some instructions are executed in two clock cycles (1 fetch cycle and 1 execution cycle) and others require three clock cycles (2 fetch cycles and 1 execution cycle).

A two-cycle instruction execution requires the following steps:

Before the first clock cycle occurs, the program counter contains the address of the instruction to be executed. This address value is directly connected to the address input of the instruction memory. Hence the op-code value is available at the output of the instruction memory.
- Clock cycle 1: At the first clock cycle (low to high transition) the op-code value is loaded into the instruction register. The output of the instruction register is directly fed to the input of the instruction decoder. Hence the decoded output of the decoder is available before the next clock cycle occurs. The decoded output is fed to the input of the control ROM, which generates the control word before the next clock cycle occurs.
- Clock cycle 2: At this clock cycle the instruction is executed. Since the control word was generated before this clock cycle, the result of the arithmetic or logical operation is also computed before this clock cycle and appears on the shared data bus. Only the output of the instruction is loaded at the low to high transition of this clock cycle. The fetch of the next instruction also occurs in this clock cycle.

Clock cycle 1:    Clock cycle 2:
Fetches op-code   Executes instruction
                  Fetches next instruction

A three-cycle instruction execution (except for the control transfer instructions) requires the following steps:

Before the first clock cycle occurs, the program counter contains the address of the instruction to be executed. This address value is directly connected to the address input of the instruction memory. Hence the op-code value is available at the output of the instruction memory.
- Clock cycle 1: At the first clock cycle (low to high transition) the op-code value (first byte of the instruction) is loaded into the instruction register. The output of the instruction register is directly fed to the input of the instruction decoder. Hence the decoded output of the decoder is available before the next clock cycle occurs. The decoded output is fed to the input of the control ROM, which generates the control word before the next clock cycle occurs.
- Clock cycle 2: At the second clock cycle (low to high transition) the second byte of the instruction is loaded into the instruction memory buffer register (the instruction register remains unchanged and contains the first byte of this instruction, loaded at the previous clock cycle). The second byte can be either an immediate operand or a data memory address operand.
- Clock cycle 3: At the third clock cycle the instruction is executed. Since the control word was generated before this clock cycle, the result of the arithmetic or logical operation is also computed before this clock cycle and appears on the shared data bus. Only the output of the instruction is loaded at the low to high transition of this clock cycle. The fetch of the next instruction also occurs in this clock cycle.

Clock cycle 1:    Clock cycle 2:          Clock cycle 3:
Fetches op-code   Fetches operand value   Executes
                                          Fetches next instruction

For control transfer instructions (such as JMP, CALL) the following steps occur:
- Clock cycle 1: same as for the other instructions.
- Clock cycle 2: At this clock cycle the program counter is loaded with the address value selected by MUX3, either from instruction memory (during JMP, JE, JC and CALL instructions) or from data memory (during the RET instruction). In the case of a CALL instruction, the address of the next sequential instruction (current program counter value + 1) is also stored on the stack (in data memory) at this clock cycle. The selector pin S0 determines the address value that is loaded into the program counter as follows:

  S0  Value
  0   Instruction memory output
  1   Data memory output

- Clock cycle 3: At the third clock cycle no operation is performed. Only the fetch of the next instruction occurs. This can be considered a pipeline stall.

Clock cycle 1:    Clock cycle 2:         Clock cycle 3:
Fetches op-code   Executes instruction   Fetches the op-code for next instruction

Example:
Code segment for the example:

Address  Instruction
00       IN
01       ADD B
02       SUB 09
04       PUSH
05       ADD B
06       POP
07       JMP 09

Figure: Actual timing diagram for example program with control signals

Another example, a program that has several pipeline stalls due to JMP, CALL and RET, is given below (instructions are listed in execution order):

Address  Instruction
00       IN
01       ADD B
02       CALL 08
08       RET
04       JMP 0A
0A       ADD B

Figure: Time Line diagram for the example program

F1  Fetch op-code
F2  Fetch operand
E   Execution
E1  Execution & Load PC
S   Stall of execution pipeline
C   Clock Cycle
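
The stage labels in the legend above can be used to rebuild a time line for a short program from the step descriptions in this section. The sketch below is a simplified model derived from that text, not a reproduction of the figure: the function name `timeline` and the instruction kinds "one", "two" and "ctrl" are made up for illustration, and XCHG (which needs an extra execution cycle) is not modelled.

```python
def timeline(program):
    """Assign pipeline stages to clock cycles.

    program: list of (mnemonic, kind) in execution order, where kind is
      "one"  - single fetch cycle, then execute
      "two"  - op-code fetch, operand fetch, then execute
      "ctrl" - control transfer: op-code fetch, execute and load PC, stall
    Returns a list of (mnemonic, {cycle: stage}) rows.
    """
    rows = []
    cycle = 1                      # cycle in which the next op-code fetch happens
    for name, kind in program:
        if kind == "one":
            stages = {cycle: "F1", cycle + 1: "E"}
            cycle += 1             # next F1 overlaps with this E
        elif kind == "two":
            stages = {cycle: "F1", cycle + 1: "F2", cycle + 2: "E"}
            cycle += 2             # next F1 overlaps with this E
        elif kind == "ctrl":
            stages = {cycle: "F1", cycle + 1: "E1", cycle + 2: "S"}
            cycle += 2             # next F1 happens during the stall
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
        rows.append((name, stages))
    return rows


example = [
    ("IN", "one"), ("ADD B", "one"), ("CALL 08", "ctrl"),
    ("RET", "ctrl"), ("JMP 0A", "ctrl"), ("ADD B", "one"),
]
for name, stages in timeline(example):
    print(f"{name:8s}", " ".join(f"C{c}:{s}" for c, s in sorted(stages.items())))
```

Under this model each control transfer costs one extra cycle, because the op-code fetched while it executes cannot be used, so the final ADD B of the second example is fetched in cycle 9 and executes in cycle 10.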

Average CPI

To calculate the average CPI (Cycles Per Instruction) we have to consider the clock cycles of each instruction separately. For 2-cycle instructions, thanks to the pipelined architecture, clock cycle 2 of the instruction being executed and clock cycle 1 of the next instruction are performed in parallel in a single clock cycle. For 3-cycle instructions, clock cycle 3 of the instruction being executed and clock cycle 1 of the next instruction are performed in parallel in a single clock cycle. This reduces the average number of clock cycles required per executed instruction. The following table shows the required number of clock cycles for each instruction (considering the pipelined architecture):

No.  Instruction      Clock cycles
1    ADD B            1
2    ADC B            1
3    SUB B            1
4    SBB B            1
5    CMP B            1
6    MOV Acc, B       1
7    MOV B, Acc       1
8    IN               1
9    OUT              1
10   STZ              1
11   CLZ              1
12   LDA [addr]       2
13   STA [addr]       2
14   MOV Acc, imm     2
15   ADC imm          2
16   SBB imm          2
17   AND imm          2
18   OR [addr]        2
19   JC addr          2
20   JE addr          2
21   CALL addr        2
22   JMP addr         2
23   XCHG             2
24   RET              2
25   PUSH             1
26   POP              1
27   NOP              1
28   HLT              1

Of the 28 instructions, 13 take 2 clock cycles and 15 take 1 clock cycle. So,

Average CPI = ((13 x 2) + (15 x 1)) / 28 = 41 / 28 ≈ 1.4643
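
The average can be checked directly from the table. The short script below copies the per-instruction cycle counts and divides their sum by the number of instructions; it assumes, as the report's calculation does, that every instruction is weighted equally.

```python
# Clock cycles per instruction, copied from the table above.
CYCLES = {
    "ADD B": 1, "ADC B": 1, "SUB B": 1, "SBB B": 1, "CMP B": 1,
    "MOV Acc, B": 1, "MOV B, Acc": 1, "IN": 1, "OUT": 1, "STZ": 1, "CLZ": 1,
    "LDA [addr]": 2, "STA [addr]": 2, "MOV Acc, imm": 2, "ADC imm": 2,
    "SBB imm": 2, "AND imm": 2, "OR [addr]": 2, "JC addr": 2, "JE addr": 2,
    "CALL addr": 2, "JMP addr": 2, "XCHG": 2, "RET": 2,
    "PUSH": 1, "POP": 1, "NOP": 1, "HLT": 1,
}

total_cycles = sum(CYCLES.values())          # 13 * 2 + 15 * 1 = 41
average_cpi = total_cycles / len(CYCLES)     # 41 / 28
print(f"Average CPI = {total_cycles}/{len(CYCLES)} = {average_cpi:.4f}")
```

An equal-weight average ignores how often each instruction actually occurs in a program, so the CPI of a real program can be higher or lower than this figure.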

IC used for 4-bit PC Design

IC Number   Name                                                              Pcs
74126       Quadruple Bus Buffer Gate With Tristate Outputs                   8
74194       4-Bit Bidirectional Universal Shift Registers                     6
74LS181     ALU / Function Generators                                         2
7404        Hex Inverter                                                      4
74LS86      Quadruple 2-input Exclusive-OR Gates                              1
4002        Dual 4-Input NOR Gate                                             1
74153       Dual 1-of-4 Data Selectors/Multiplexers                           11
6116        16K (2K x 8) Static RAM                                           1
74LS245     Octal Bus Transceivers with Tristate Outputs                      1
7408        Quadruple 2-input Positive-AND Gates                              2
4555        Dual 1-to-4 Line Decoder/Demultiplexer                            1
Counter_8   Universal Counter Digital Primitive Model With Up/Down Clock      2
7483        4-bit Binary Full Adder With Fast Carry                           4
2732        32K (4K x 8) EPROM                                                7
74157       Quadruple 1-of-2 Data Selectors/Multiplexers                      2
74198       8-bit Bidirectional Universal Shift Register With Asynchronous Reset  2

Total IC used: 55

How to simulate our 4-bit PC

1. First, compile and build assembler.cpp, the program written to generate the binary file of hex values from the instructions in the input.txt file.

Figure: Compiling assembler.cpp

2. After the program has been compiled and run, 'out.bin' is created in the same directory where 'in.txt' is located.

Figure: Input Code Segment & Corresponding Generated output Hex value
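
The translation step performed by the assembler can be pictured with a very small sketch. This is not the assembler.cpp from the project: the opcode values below are hypothetical placeholders (the report does not list the real encodings), only a handful of instructions are covered, and the use of ';' for comments and of hexadecimal operands is assumed.

```python
# Hypothetical opcode values, chosen only for this illustration.
ONE_BYTE = {"IN": 0x01, "ADD B": 0x02, "PUSH": 0x03, "POP": 0x04, "HLT": 0x05}
TWO_BYTE = {"JMP": 0x10, "LDA": 0x11, "STA": 0x12}


def assemble(source):
    """Translate assembly text into machine code bytes (1 or 2 bytes per line)."""
    out = bytearray()
    for raw in source.splitlines():
        line = raw.split(";")[0].strip().upper()   # drop comments and blanks
        if not line:
            continue
        if line in ONE_BYTE:
            out.append(ONE_BYTE[line])
            continue
        mnemonic, _, operand = line.partition(" ")
        if mnemonic in TWO_BYTE and operand.strip():
            out.append(TWO_BYTE[mnemonic])
            out.append(int(operand.strip(), 16) & 0xFF)   # second byte: address
            continue
        raise ValueError(f"cannot assemble line: {raw!r}")
    return bytes(out)


program = """
IN
ADD B      ; accumulate
JMP 09
HLT
"""
print(assemble(program).hex(" "))
```

Writing the returned bytes to a file would give something comparable to the 'out.bin' produced in step 2, which is then loaded into the instruction memory before the simulation starts.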

Discussion

- We have used Proteus Design Suite, which is user friendly and convenient for designing the 4-bit PC, so any modification to the design can be made easily. We have drawn the circuit diagram with the easy-to-handle wireless terminal method, labelling the terminals and marking the important components, so that anyone can easily relate the circuit diagram to the block diagram.
- We have separated the data memory from the instruction memory. This reduced the average number of clock cycles of the affected instructions to nearly one. Writing to data memory was handled easily by using a 16K (2K x 8) static RAM.
- The design contains a 4-bit data bus. There is no need for a shared address bus in our design, as addresses do not need to be shared in the way the data does. There are two independent units inside the processor, the Fetch unit and the Execution unit, so a two-stage pipeline has been successfully implemented.
- The average CPI is calculated as approximately 1.4643 and the number of ICs used is only 55. So it can be stated that our 4-bit PC design satisfies both conditions: lowering the T-states of the instruction cycle and using fewer MSI/LSI chips.

The circuit diagram (.DSN file), assembler (.cpp file), block diagram (.pptx file) and other necessary (.bin) files are available to anyone for further use and modification at the following link:
4-bit PC Contents Available Here for You.