2. TOPICS COVERED
FUNDAMENTAL CONCEPTS
INSTRUCTION CYCLE
EXECUTION OF A COMPLETE INSTRUCTION
MULTIPLE BUS ORGANIZATION
SEQUENCING OF CONTROL SIGNALS
2
3. 3
INTRODUCTION
This unit is about the processing unit, which executes machine
instructions and coordinates the activities of other units.
This is also called a processor or instruction set processor (ISP).
We understand its internal structure and how it performs the tasks of
fetching, decoding, and executing instructions of a program.
The processing unit is called central processing unit (CPU).
We explore the organization of the hardware that enables a CPU to
perform its main function.
We learn how the execution of a complete instruction takes place and
we also learn Branch Instructions
4. 4
SOME FUNDAMENTAL CONCEPTS
A program, a set of instructions, to be executed by a computer is
loaded in sequential locations in the main memory.
To execute this program, the CPU fetches one instruction at a time and
performs the functions specified.
Until a branch or a jump instruction is executed, instructions are
fetched from successive memory locations.
The address of the next instruction to be executed is kept by the CPU
in a dedicated register called program counter (PC).
The contents of the PC are updated to point to the next instruction in
the sequence.
5. 5
Assume that each instruction occupies one memory word.
Therefore, one instruction execution requires the CPU to perform the
following 3 steps:
1. Fetch the contents of the memory location by the PC into
instruction register (IR).
Symbolically, this can be written as:
IR ← [[PC]]
2. Increment the contents of the PC by 1, i.e., (assuming word
addressable) PC ← [PC]+1
3. Carry out the actions specified by the instruction in the IR.
Steps 1 and 2 are called the fetch phase and step 3 is called the
execution phase.
7. 7
Figure shows an organization in which the arithmetic and logic unit
(ALU) and all the registers are interconnected via a single common
bus.
This bus is internal to the processor
The address and data lines of the external memory bus are connected
to the internal processor bus via the memory data register, MDR, and
the memory address register, MAR, respectively.
Register MDR has two inputs and two outputs.
Data may be loaded into MDR either from the memory bus or from
the internal processor bus.
8. 8
The data stored in MDR may be placed on either bus. The input of
MAR is connected to the internal bus, and its output is connected to
the external bus.
The control lines of the memory bus are connected to the instruction
decoder and control logic block.
This unit is responsible for issuing the signals that control the
operation of all the units inside the processor and for interacting with
the memory bus.
The use and number of the processor registers R0 through R(n-1) vary
considerably from one processor to another
Three registers, Y, S, and TEMP are used by the processor for
temporary storage during execution of some instructions
9. 9
The multiplexer MUX selects either the output of register Y or a
constant value 4 to be provided as input A of the ALU.
The constant 4 is used to increment the contents of the program
counter.
We will refer to the two possible values of the MUX control input
Select as Select4 and Select Y for selecting the constant 4 or register
Y, respectively.
10. 10
As instruction execution progresses, data are transferred from one
register to another, often passing through the ALU to perform some
arithmetic or logic operation.
The instruction decoder and control logic unit is responsible for
implementing the actions specified by the instruction loaded in the IR
register.
The decoder generates the control signals needed to select the registers
involved and direct the transfer of data.
The registers, the ALU, and the interconnecting bus are collectively
referred to as the datapath
11. 11
With few exceptions, an instruction can be executed by performing
one or more of the following operations in some specified sequence:
1. Fetch the contents of a given memory location and load them into a
processor register.
2. Store a word of data from a processor register into a given memory
location.
3. Transfer a word of data from one processor register to another or
to the ALU.
4. Perform an arithmetic or logic operation and store the result in a
processor register
12. 1. Fetching a Word from memory
To fetch a word from memory, the CPU has to specify the address of
the memory location where this information is stored and request a
read operation.
The CPU transfers the address of the required word of information to
the MAR, which is connected to address lines of the memory bus.
The CPU uses the control lines of the memory bus to indicate a Read
operation is needed.
Then the CPU waits for Read operation completion, which is indicated
by Memory-Function Completed (MFC) signal set. When the MFC is
set, the information on the data lines is loaded into MDR
13. The connections for register MDR has four control signals: MDRin
and MDRout control the connection to the internal bus, and MDRin E
and MDRout E control the connection to the external bus.
Example :how to fetch a word from memory location, whose
address is specified in R1, and place the word fetched in R2
1 MAR ← [R1]
2 Request memory READ and put the data to the address register
3 Wait for the Memory Fetch Cycle (MFC) signal
4 Load MDR from the memory
5 R2 ←[MDR]
14. The memory read operation requires three steps:
Which can be described by the signals being activated as follows:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
Where WMFC is the control signal that causes the processor’s control
circuitry to wait for the arrival of the MFC signal.
14
15. 2. Storing a Word in Memory
After the address is loaded into MAR and data into MDR, The CPU
uses the control lines of the memory bus to indicate a Write operation
is needed.
The example below shows how the machine store a word in R2 into a
memory location, whose address is specified in R1
1 MAR ← [R1]
2 MDR ←[R2]
3 Request memory write
4 Wait for MFC signal
15
16. Executing the instruction Move R2,(R1) requires the following steps:
1. R1out, MARin,
2. R2out, MDRin, Write
3. MDRoutE, WMFC
As in the case of the Read operation, the Write control signal causes
the memory bus interface hardware to issue a Write command on the
memory bus.
The processor remains in step 3 until the memory operation is
completed and an MFC response is received
16
17. 3. Register Transfers
Instruction execution involves a sequence of steps in which data are
transferred from one register to another.
For each register, two control signals are used to place the contents of
that register on the bus or to load the data on the bus into the register.
This is represented symbolically in Figure the input and output of
register Ri are connected to the bus via switches controlled by the
signals Riinn and Riout respectively.
When Riinn is set to 1, the data on the bus are loaded into Ri.
Similarly, when Riout is set to 1, the contents of register Ri are placed
on the bus.
While Riout is equal to 0, the bus can be used for transferring data
17
19. Suppose that ,to transfer the contents of register R1 to register R4.
This can be accomplished as follows:
Enable the output of register R1 by setting R1out to 1.
• This places the contents of R1 on the processor bus.
Enable the input of register R4 by setting R4in to 1.
• This loads data from the processor bus into register R4.
19
20. All operations and data transfers within the processor take place
within time periods defined by the processor clock.
The control signals that govern a particular transfer are asserted at the
start of the clock cycle.
In our example, R1out and R4in are set to 1.
The registers consist of edge-triggered flip-flops.
Hence, at the next active edge of the clock, the fip-flops that
constitute R4 will load the data present at their inputs.
At the same time, the controls signals R1out and R4in will return to 0.
20
21. 4. Performing an Arithmetic or Logic Operation
The ALU is a combinational circuit that has no internal storage.
It performs arithmetic and logic operations on the two operands
applied to its‘ A and B inputs.
To add two numbers, the two operands have to be made available at
the inputs of the ALU simultaneously.
In Figures one of the operands is the output of the multiplexer MUX
and the other operand is obtained directly from the bus.
The result produced by the ALU is stored temporarily in register Z.
21
22. A sequence of operations to add the contents of register R1 to those of
register R2 and store the result in register R3 is:
1. R1out, Yin
2. R2out, Select Y, Add, Zin
3. Zout, R3in
The signals whose names are given in any step are activated for the
duration of the clock cycle corresponding to that step.
All other signals are inactive.
Hence, in step 1, the output of register R1 and the input of register Y
are enabled, causing the contents of R1 to be transferred over the bus
to Y.
22
23. In step 2, the multiplexer‘s Select signal is set to Select Y, causing the
multiplexer to gate the contents of register Y to input A of the ALU.
At the same time, the contents of register R2 are gated onto the bus
and, hence, to input B.
The function performed by the ALU depends on the signals applied to
its control lines.
In this case, the Add line is set to 1, causing the output of the ALU to
be the sum of the two numbers at inputs A and B.
This sum is loaded into register Z, because its input control signal is
activated.
In step 3, the contents of register Z are transferred to the destination
register, R3.
This last transfer cannot be earned out during step 2, because only one
register output can be connected to the bus during any clock cycle.
23
24. EXECUTION OF A COMPLETE INSTRUCTION
Let us now consider the sequence of elementary operations required to
execute one instruction.
Consider the instruction,
Add (R3), R1 which adds the contents of a memory location
pointed to by R3 to register R1.
Executing this instruction requires the following actions:
1. Fetch the instruction
2. Fetch the first operand (the contents of the memory
location pointed to by R3)
3. Perform the addition
4. Load the result into R1
24
25. Instruction execution is as follows:
In step1, the instruction fetch operation is initiated by loading the
contents of the PC into the MAR and sending a Read request to the
memory.
The Select signal is set to Select1, which causes the multiplexer MUX
to select the constant 1.
This value is added to the operand at input B, which is the contents of
the PC and the result is stored in register.
The updated value is moved from register back into the PC during step
2, while waiting for the memory to respond, the word fetched from the
memory is loaded into the IR 25
26. Control Sequence for Execution of the
instruction ADD (R3), R1
Step Action
1. PCout , MARin , Read, Select1, Add, Zin
2. Zout , PCin , Y in , WMFC
3. MDRout , IRin
4. R3out , MARin , Read
5. R1out , Yin , WMFC
6. MDRout , SelectY, Add Zin
7. Zout , R1in , End
26
27. Steps 1 through 3 constitute the instruction fetch phase, which is the
same for all instructions.
The instruction decoding circuit interprets the contents of the IR at the
beginning of step 4.
This enables the control circuitry to activate the control signals for
steps 4 through 7, which constitute the execution phase.
The contents of register R3 are transferred to the MAR in step 4 and a
memory Read operation is initiated.
Then, the contents of R1 are transferred to register Y in step 5, to
prepare for the addition operation.
27
28. When the Read operation is completed, the memory operand is
available in register MDR and the addition operation is performed in
step 6.
The contents of MDR are gated to the bus and thus also to the B input
of the ALU and register Y is selected as the second input to the ALU
by choosing Select Y.
The sum is stored in register Z and then transferred toR1 in step 7.
The End signal causes a new instruction fetch cycle to begin by
returning to step 1.
28
29. Branch Instructions
A branch instruction replaces the contents of the PC with the branch
target address.
This address is usually obtained by adding an offset X which is given
in the branch instruction to the updated value of the PC.
Control sequence that implements an unconditional branch instruction
is as follows:
29
31. Processing starts, as usual with the fetch phase.
This phase ends when the instruction is loaded into the IR in step 3.
The offset value is extracted from the IR by the instruction decoding
circuit which will also perform sign extension if required.
Since the value of the updated PC is already available in register Y,
the offset X is gated onto the bus in step 4 and an addition operation is
performed.
The result, which is the branch target address, is loaded into the PC in
step 5
31
32. The offset X used in branch instruction is usually the difference
between the branch target address and the address immediately
following the branch instruction.
For example, if the branch instruction is at location 2000 and if the
branch target address is 2050, the value of X must be 49.
The PC is incremented during the fetch phase, before knowing the
type of instruction being executed.
Thus, when the branch address is computed in step 4, the PC value
used is the updated value, which points to the instruction following the
branch instruction in the memory.
32
33. Consider now a conditional branch.
In this case, we need to check the status of the condition codes before
loading a new value into the PC.
For example, for a Branch-on negative (Branch<0) instruction, step 4
in Figure 5.5 is replaced with:
Offset-field-of-IRout, Add, Zin , If N=0 then End
Thus, if N=0, the processor returns to step 1 immediately after step 4.
If N=1, step 5 is performed to load a new value into the PC, thus
performing the branch operation
33
34. MULTIPLE BUS ORGANIZATION
Three-bus structure used to connect the registers and the ALU of a
processor.
All general-purpose registers are combined into a single block called
the register file.
In VLSI technology, the most efficient way to implement a number of
registers is in the form of an array of memory cells similar to those
used in the implementation of random-access memories (RAMs).
34
35. The register file have three ports.
There are two outputs, allowing the contents of two different registers
to be accessed simultaneously and have their contents placed on buses
A and B.
The third port allows the data on bus C to be loaded into a third
register during the same clock cycle
35
37. Buses A and B are used to transfer the source operands to the A and B
inputs of the ALU where an arithmetic or logic operation may be
performed.
The result is transferred to the destination over bus C.
If needed, the ALU may simply pass one of its two input operands
unmodified to bus C.
We will call the ALU control signals for such an operation R=A or
R=B.
37
38. A second feature is the introduction of the Incrementer unit, which is
used to increment the PC by 4.
Using the Incrementer, eliminates the need to add 4 to the PC.
The source for the constant 4 at the ALU input multiplexer is still
useful.
It can be used to increment other addresses such as the memory
addresses in Load Multiple and Store Multiple instructions
38
39. The structure in Figure requires significantly fewer control steps to
execute instructions .
Consider the three-operand instruction of the form
OP Rsrc1, Rsrc2, Rdst
in which an operation is performed on the contents of
two source registers, and the result is placed into a destination register.
Buses A and B are used to transfer the source operands, and bus C
provides the path to the destination.
The path from the source buses to the destination bus goes through the
ALU, where the required operation is performed.
39
40. Thus, assuming that the operation to be performed can be completed in
one pass through the ALU, the structure of Figure allows the
execution phase of an instruction to be performed in one cycle.
Note that if it is merely necessary to copy the contents of one register
into another, then the transfer is also done through the ALU, but no
arithmetic or logic operation is performed
40
41. The temporary storage registers Y and Z are not required .
Register Y is not needed because both inputs to the ALU are provided
simultaneously via buses A and B.
Register Z is not needed because the output from the ALU is
transferred to the destination register via the third bus, C.
In this structure it is essential to ensure that the same register can serve
as both the source and the destination in a given instruction.
41
42. Example:
Add R4, R5, R6
The control sequence for executing this instruction is:
In step 1,the contents of the PC are passed through the ALU using the
R=B control signal and loaded into the MAR to start a memory read
operation.
At the same time, the PC is incremented by 4.
Note that the value loaded into MAR is the original contents of the
PC
42
43. The incremented value is loaded into the PC at the end of the clock
cycle and will not affect the contents of MAR.
In step 2, the processor waits for MFC and loads the data received
into MDR and then transfers them to IR in step 3.
Finally, the execution phase of the instruction requires only one
control step to complete step 4.
43
44. Control Sequence for the instruction Add R4, R5, R6
Step Action
1. PC out , R=B, MAR in , Read, IncPC
2. WMFC
3. MDR out , R=B, IR in
4. R4outA , R5outBt, SelectA, Add, R6in , End
44
45. By providing more paths for data transfer, a significant reduction in
the number of clock cycles needed to execute an instruction is
achieved.
The three-bus structure allows execution of register-to-register
operation in a single clock cycle.
This is particularly well suited to the requirements of RISC processors,
in which most arithmetic and logic instructions have register operands.
45
Editor's Notes
Since the instructions and data need to be feteched from the memory in order to perform a task, the time it takes to access and fetch this information will be one factor influencing how fast a given task will complete. In order to increase the speed of performing a task, one way is to reduce the amount of time it takes to fetch the data and the instructions. This time is called as “access time”.
Suppose if we want to fetch the data at memory location with the address 10. In case of sequential access, we have to access locations 1-9, and then access location 10. Clearly, in case of sequential access the access times increase as memory locations with higher access times are accessed. We need some kind of memory which provides fixed and short access time irrespective of the memory location being accessed. That is, it provides random access.
Why is the access time faster for the Cache than it is for primary storage? I haven’t yet discussed how the various units communicate with each other. In a few minutes I will discuss that, and it will become clear.