1. [CO Notes]
[By Prof. Narendra Kumar_CSE] Page 1
Design of Program Control Unit
The Control Unit is classified into two major categories:
1. Hardwired Control
2. Micro programmed Control
Hardwired Control
The Hardwired Control organization involves the control logic to
be implemented with gates, flip-flops, decoders, and other digital
circuits.
o A Hard-wired Control consists of two decoders, a
sequence counter, and a number of logic gates.
o An instruction fetched from the memory unit is placed in
the instruction register (IR).
o The instruction register is divided into three parts: the I bit,
the operation code, and bits 0 through 11.
o The operation code in bits 12 through 14 is decoded with a
3 x 8 decoder.
o The outputs of the decoder are designated by the symbols
D0 through D7.
o Bit 15 of the instruction is transferred to a flip-flop
designated by the symbol I.
o Bits 0 through 11 are applied
to the control logic gates.
o The Sequence counter (SC) can count in binary from 0
through 15.
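The field split described above can be sketched in a few lines of Python. This is only an illustration under the bit layout given in these notes; the function name decode_instruction and the sample word 0xA123 are made up for the example.

```python
def decode_instruction(word):
    """Split a 16-bit instruction word into the fields described above."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("instruction must fit in 16 bits")
    i_bit = (word >> 15) & 0x1      # bit 15 goes to flip-flop I
    opcode = (word >> 12) & 0x7     # bits 12-14 feed the 3 x 8 decoder
    low_bits = word & 0x0FFF        # bits 0-11 go to the control logic gates
    # 3 x 8 decoder: exactly one of the outputs D0..D7 is 1
    d = [1 if n == opcode else 0 for n in range(8)]
    return i_bit, opcode, low_bits, d


i_bit, opcode, low_bits, d = decode_instruction(0xA123)
print(i_bit, opcode, hex(low_bits), d)
```

For the sample word, the I bit is 1, the opcode is 2 (so only D2 is active), and bits 0 through 11 hold 0x123.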
Micro-programmed Control
The Micro-programmed Control organization is implemented by
using the programming approach.
In Micro-programmed Control, the micro-operations are
performed by executing a program consisting of micro-
instructions.
o The Control memory address register specifies the
address of the micro-instruction.
o The Control memory is assumed to be a ROM, within
which all control information is permanently stored.
o The control register holds the microinstruction fetched
from the memory.
o The micro-instruction contains a control word that
specifies one or more micro-operations for the data
processor.
o While the micro-operations are being executed, the next
address is computed in the next address generator circuit
and then transferred into the control address register to
read the next microinstruction.
o The next address generator is often referred to as a
micro-program sequencer, as it determines the address
sequence that is read from control memory.
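The fetch loop described above can be modeled roughly as follows. This is a minimal sketch, not a real control unit: the contents of CONTROL_MEMORY, the micro-operation labels, and the function name run_microprogram are all assumptions made for illustration, and the next address is simply stored alongside each control word.

```python
# Hypothetical control memory (ROM): address -> (control_word, next_address).
# Control words are text labels here; real ones would be bit patterns.
CONTROL_MEMORY = {
    0: ("AR <- PC", 1),
    1: ("IR <- M[AR], PC <- PC + 1", 2),
    2: ("decode IR", 0),  # loop back to the start of the routine
}


def run_microprogram(start_address, steps):
    """Return the micro-operations issued over `steps` control memory reads."""
    car = start_address  # control address register
    issued = []
    for _ in range(steps):
        control_register = CONTROL_MEMORY[car]  # microinstruction read from ROM
        control_word, next_address = control_register
        issued.append(control_word)  # micro-operations for the data processor
        car = next_address           # next address generator loads the CAR
    return issued


print(run_microprogram(0, 4))
```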
Difference between Hardwired Control and Micro-programmed Control
Hardwired Control | Micro-programmed Control
Technology is circuit based. | Technology is software based.
It is implemented through flip-flops, gates, decoders etc. | Microinstructions generate signals to control the execution of instructions.
Fixed instruction format. | Variable instruction format (16-64 bits per instruction).
Instructions are register based. | Instructions are not register based.
ROM is not used. | ROM is used.
It is used in RISC. | It is used in CISC.
Faster decoding. | Slower decoding.
Difficult to modify. | Easily modified.
Chip area is less. | Chip area is large.
Computer Instructions
Computer instructions are a set of machine language instructions
that a particular processor understands and executes. A computer
performs tasks on the basis of the instruction provided.
An instruction is made up of groups of bits called fields. These fields
include:
o The Operation code (Opcode) field which specifies the
operation to be performed.
o The Address field which contains the location of the
operand, i.e., register or memory location.
o The Mode field which specifies how the operand will be
located.
A basic computer has three instruction code formats which are:
1. Memory - reference instruction
2. Register - reference instruction
3. Input-Output instruction
Memory - reference instruction
In a Memory-reference instruction, 12 bits of the instruction are used to
specify an address and one bit to specify the addressing mode 'I'.
Register - reference instruction
The Register-reference instructions are represented by the Opcode
111 with a 0 in the leftmost bit (bit 15) of the instruction.
Note: The Operation code (Opcode) of an instruction refers
to a group of bits that define arithmetic and logic
operations such as add, subtract, multiply, shift, and
complement.
A Register-reference instruction specifies an operation on or a test
of the AC (Accumulator) register.
Input-Output instruction
Just like the Register-reference instruction, an Input-Output
instruction does not need a reference to memory and is recognized
by the operation code 111 with a 1 in the leftmost bit of the
instruction. The remaining 12 bits are used to specify the type of
the input-output operation or test performed.
Note
o If the three operation code bits in positions 12 through 14
are not equal to 111, the instruction is a
memory-reference type, and the bit in position 15 is
taken as the addressing mode I.
o When the three operation code bits are equal to 111, the
control unit inspects the bit in position 15. If the bit is 0,
the instruction is a register-reference type. Otherwise (bit 15
is 1), the instruction is an input-output type.
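The decision rule in the note above can be written as a small function. This is a sketch only; the name classify_instruction and the returned labels are chosen for this example.

```python
def classify_instruction(word):
    """Classify a 16-bit instruction word by the rules in the note above."""
    opcode = (word >> 12) & 0x7  # bits 12 through 14
    bit15 = (word >> 15) & 0x1   # leftmost bit
    if opcode != 0b111:
        # Memory-reference: bit 15 is the addressing mode I
        return "memory-reference", bit15
    if bit15 == 0:
        return "register-reference", None
    return "input-output", None


for word in (0x1234, 0x9234, 0x7800, 0xF800):
    print(hex(word), classify_instruction(word))
```

The four sample words cover a memory-reference instruction with I = 0, one with I = 1, a register-reference instruction, and an input-output instruction.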
Instruction types
Examples of operations common to many instruction sets include:
1. Data handling and memory operations
• Set a register to a fixed constant value.
• Copy data from a memory location or a register to a memory
location or a register (a machine instruction is often
called move; however, the term is misleading). Used to store
the contents of a register, the result of a computation, or to
retrieve stored data to perform a computation on it later.
Often called load and store operations.
• Read and write data from hardware devices.
2. Arithmetic and logic operations
• Add, subtract, multiply, or divide the values of two registers,
placing the result in a register, possibly setting one or
more condition codes in a status register.
o increment, decrement in some ISAs, saving operand
fetch in trivial cases.
• Perform bitwise operations, e.g., taking
the conjunction and disjunction of corresponding bits in a
pair of registers, taking the negation of each bit in a register.
• Compare two values in registers (for example, to see if one is
less, or if they are equal).
• Floating-point instructions for arithmetic on floating-point
numbers.
3. Control flow operations
• Branch to another location in the program and execute
instructions there.
• Conditionally branch to another location if a certain
condition holds.
• Indirectly branch to another location.
• Call another block of code, while saving the location of the
next instruction as a point to return to.
4. Coprocessor instructions
• Load/store data to and from a coprocessor or exchanging
with CPU registers.
• Perform coprocessor operations.
Instructions Format –
An instruction format defines the different components of
an instruction. The main components of an instruction
are the opcode (which operation is to be executed) and the
operands (the data on which the operation is executed).
Here are the different terms related to instruction
format:
• Instruction set size – It tells the total number of
instructions defined in the processor.
• Opcode size – It is the number of bits occupied by
the opcode, calculated by taking log base 2 of the
instruction set size and rounding up.
• Operand size – It is the number of bits occupied by
the operand.
• Instruction size – It is calculated as sum of bits
occupied by opcode and operands.
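The size calculations above can be checked with a short sketch. It assumes the logarithm is base 2 and is rounded up to a whole number of bits; the function names and the sample values (25 instructions, two 4-bit operands) are made up for illustration.

```python
def opcode_size(instruction_set_size):
    """Bits needed for the opcode: ceil(log2(instruction_set_size))."""
    if instruction_set_size < 1:
        raise ValueError("instruction set must contain at least one instruction")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding
    return (instruction_set_size - 1).bit_length()


def instruction_size(instruction_set_size, operand_sizes):
    """Instruction size = opcode bits + sum of operand bits."""
    return opcode_size(instruction_set_size) + sum(operand_sizes)


print(opcode_size(25))               # 25 instructions need 5 opcode bits
print(instruction_size(25, [4, 4]))  # 5 + 4 + 4 = 13 bits
```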
Instruction set- The instruction set, also
called ISA (instruction set architecture), is the part of a
computer that pertains to programming, which is
more or less the machine language.
Examples of instruction set
• ADD – Add two numbers together.
• COMPARE - Compare numbers.
• IN – Input information from a device, e.g.,
keyboard.
• JUMP – Jump to designated RAM address.
• JUMP IF - Conditional statement that
jumps to a designated RAM address.
• LOAD - Load information from RAM to the
CPU.
• OUT - Output information to device, e.g.,
monitor.
• STORE - Store information to RAM.
Instruction Cycle
A program residing in the memory unit of a computer
consists of a sequence of instructions. Instructions are
processed under direction of the control unit in step-by-
step manner. Each step is referred to as a phase.
There are six fundamental phases of the instruction cycle:
1. fetch instruction (aka pre-fetch)
2. decode instruction
3. evaluate address (address generation)
4. fetch operands (read memory data)
5. execute (ALU access)
6. store result (write back memory data)
[Figure: Instruction cycle]
[Figure: Pentium 4 instruction cycle]
Fetch Instruction Phase
• Obtain next instruction from memory.
• Load instruction into instruction register IR.
• MAR is loaded with instruction pointer.
• The instruction is loaded through the MDR.
• Increment the program counter (PC), that is, update the
instruction pointer address while reading the
instruction from memory.
[Figure: Memory circuitry]
[Figure: Memory operations]
Decode Instruction Phase
• Decoder circuit examines opcode of the
instruction.
• The result is the selection of a unique decoder output line.
• Output line signals a circuit which implements the
corresponding operation.
[Figure: Instruction decoder]
Evaluate Operand Address Phase
Compute address of the memory location of the
instruction operand.
[Figure: Memory circuitry]
Fetch Operands Phase
• Load MAR with address calculated.
• Read memory into MDR, making data available
as input to the processing unit.
• Memory Operations:
Steps in a Typical Read Cycle
1. Place the address of the location to be read on the
address bus via MAR.
2. Activate the memory read control signal on the
control bus.
3. Wait for the memory to retrieve the data from the
addressed memory location.
4. Read the data from the data bus into MDR.
5. Drop the memory read control signal
to terminate the read cycle.
A simple Pentium memory read cycle takes 3 clocks:
• Steps 1-2 and then 4-5 are done in one clock cycle
each.
• For slower memories, wait cycles will have to be
inserted.
Execute phase
Microcode for the instruction, selected by the decoder
output line, is executed by the ALU.
[Figure: The ALU]
Store Result Phase
• Result is written to the designated destination of
the instruction operand
• Instruction Cycle begins anew.
• Memory Operations:
Steps in a Typical Write Cycle
1. Place the address of the location to be written on
the address bus via MAR.
2. Place the data to be written on the data bus
via MDR.
3. Activate the memory write control signal on the
control bus.
4. Wait for the memory to store the data at the
addressed location.
5. Drop the memory write control signal
to terminate the write cycle.
A simple Pentium memory write cycle takes 3 clocks:
• Steps 1-2 and 4-5 are done in one clock cycle
each.
• For slower memories, wait cycles will have to be
inserted.
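The read and write cycles listed in these notes can be mimicked with a toy model. This is only a sketch: the class SimpleMemory and its log of control signals are invented for illustration, and clock timing and wait cycles are not modeled.

```python
class SimpleMemory:
    """Toy memory with MAR and MDR; bus signals are recorded in a log."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0   # memory address register
        self.mdr = 0   # memory data register
        self.log = []  # control signals, in the order they occur

    def read(self, address):
        self.mar = address                 # 1. address on the address bus via MAR
        self.log.append("READ asserted")   # 2. activate the memory read signal
        data = self.cells[self.mar]        # 3. memory retrieves the data
        self.mdr = data                    # 4. data bus into MDR
        self.log.append("READ dropped")    # 5. drop the read signal
        return self.mdr

    def write(self, address, value):
        self.mar = address                 # 1. address on the address bus via MAR
        self.mdr = value                   # 2. data on the data bus via MDR
        self.log.append("WRITE asserted")  # 3. activate the memory write signal
        self.cells[self.mar] = self.mdr    # 4. memory stores the data
        self.log.append("WRITE dropped")   # 5. drop the write signal


mem = SimpleMemory(16)
mem.write(3, 42)
print(mem.read(3), mem.log)
```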
Micro Operations-
In computer central processing units, micro-operations (micro-
ops or μops) are detailed low-level instructions used in some
designs to implement complex machine instructions (sometimes
termed macro-instructions in this context).
Usually, micro-operations perform basic operations on data stored
in one or more registers, including transferring data between
registers or between registers and external buses of the central
processing unit (CPU), and performing arithmetic or logical
operations on registers.
Execution of Complete Instructions:
1. Fetch information from memory to the CPU.
2. Store information from a CPU register to memory.
3. Transfer of data between CPU registers.
4. Perform arithmetic or logic operation and store the result
in CPU registers.
RISC and CISC Processors
RISC Processor
RISC stands for Reduced Instruction Set Computer. It is a
type of microprocessor that has a limited number of
instructions. It can execute its instructions very fast
because the instructions are small and simple.
RISC chips require fewer transistors, which makes them
cheaper to design and produce. In RISC, the instruction
set contains simple and basic instructions from which
more complex operations can be built. Most
instructions complete in one cycle, which allows the
processor to handle many instructions at the same time.
Instructions are register based, and data transfer
takes place from register to register.
CISC Processor
• CISC stands for Complex Instruction Set Computer.
• The Intel x86 family is a well-known example.
• It contains a large number of complex instructions.
• Instructions are not register based.
• Instructions cannot be completed in one machine
cycle.
• Data transfer is from memory to memory.
• A micro-programmed control unit is found in CISC.
• It also has variable instruction formats.
Difference between CISC and RISC
Architectural Characteristics | Complex Instruction Set Computer (CISC) | Reduced Instruction Set Computer (RISC)
Instruction size and format | Large set of instructions with variable formats (16-64 bits per instruction). | Small set of instructions with fixed format (32 bit).
Data transfer | Memory to memory. | Register to register.
CPU control | Most micro coded using control memory (ROM) but modern CISC use hardwired control. | Mostly hardwired without control memory.
Instruction type | Not register based instructions. | Register based instructions.
Memory access | More memory access. | Less memory access.
Clocks | Includes multi-clocks. | Includes single clock.
Instruction nature | Instructions are complex. | Instructions are reduced and simple.
Design of a Basic Computer
A basic computer consists of the following hardware
components.
1. A memory unit with 4096 words of 16 bits each
2. Registers: AC (Accumulator), DR (Data register),
AR (Address register), IR (Instruction register),
PC (Program counter), TR (Temporary register),
SC (Sequence Counter), INPR (Input register),
and OUTR (Output register).
3. Flip-Flops: I, S, E, R, IEN, FGI and FGO
Note: FGI and FGO are corresponding input and output
flags which are considered as control flip-flops.
4. Two decoders: a 3 x 8 operation decoder and a 4 x
16 timing decoder
5. A 16-bit common bus
6. Control Logic Gates
7. The Logic and Adder circuits connected to the
input of AC.
The cycle is then repeated by fetching the next
instruction. Thus in this way the instruction cycle is
repeated continuously.
Parallel Processing and Data Transfer Modes in a Computer System
Instead of processing each instruction sequentially, a parallel
processing system provides concurrent data processing to reduce
the execution time.
Such a system may have two or more ALUs and should be able to
execute two or more instructions at the same time. The purpose of
parallel processing is to speed up the computer processing capability
and increase its throughput.
NOTE: Throughput is the number of instructions that can be
executed in a unit of time.
Parallel processing can be viewed from various levels of complexity.
At the lowest level, we distinguish between parallel and serial
operations by the type of registers used. At the higher level of
complexity, parallel processing can be achieved by using multiple
functional units that perform many operations simultaneously.
Data Transfer Modes of a Computer System
According to the number of instruction and data streams handled
simultaneously (the data transfer mode), computers can be divided
into 4 major groups:
1. SISD
2. SIMD
3. MISD
4. MIMD
SISD (Single Instruction Stream, Single Data Stream)
It represents the organization of a single computer containing a
control unit, processor unit and a memory unit. Instructions are
executed sequentially. Parallel processing in this case can be
achieved by pipelining or multiple functional units.
SIMD (Single Instruction Stream, Multiple Data Stream)
It represents an organization that includes multiple processing
units under the control of a common control unit. All
processors receive the same instruction from control unit but
operate on different parts of the data.
They are highly specialized computers. They are mainly used
for numerical problems that are expressed in the form of vectors
or matrices, and they are not suitable for other types of
computations.
MISD (Multiple Instruction Stream, Single Data Stream)
It consists of a single computer containing multiple processors
connected with multiple control units and a common memory
unit. It is capable of processing several instructions over single
data stream simultaneously. MISD structure is only of
theoretical interest since no practical system has been
constructed using this organization.
MIMD (Multiple Instruction Stream, Multiple Data Stream)
It represents the organization which is capable of processing
several programs at same time. It is the organization of a single
computer containing multiple processors connected with
multiple control units and a shared memory unit. The shared
memory unit contains multiple modules to communicate with
all processors simultaneously. Multiprocessors and
multicomputers are examples of MIMD. It fulfills the
demand of large-scale computations.
Micro program sequencing-
Micro Instructions Sequencer is a combination of all
hardware for selecting the next micro-instruction
address. The micro-instruction in control memory
contains a set of bits to initiate micro operations in
computer registers and other bits to specify the method
by which the address is obtained.
Implementation of Micro Instructions Sequencer –
• Control Address Register (CAR): Control address
register receives the address from four different paths.
For receiving the addresses from four different paths,
Multiplexer is used.
• Multiplexer: A multiplexer is a combinational circuit
which has many data inputs and a single data output; the
control or select inputs determine which input is passed to the output.
• Branching: Branching is achieved by specifying the
branch address in one of the fields of the micro
instruction. Conditional branching is obtained by using
part of the micro-instruction to select a specific status bit
in order to determine its condition.
• Mapping Logic: An external address is transferred into
control memory via a mapping logic circuit.
• Incrementer: Incrementer increments the content of the
control address register by one, to select the next micro-
instruction in sequence.
• Subroutine Register (SBR): The return address for a
subroutine is stored in a special register called Subroutine
Register whose value is then used when the micro-
program wishes to return from the subroutine.
• Control Memory: Control memory is a type of memory
which contains addressable storage registers. It holds the
microprogram (the microinstructions). Control memory
can be accessed more quickly than main memory.
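The selection of the next micro-instruction address can be sketched as a 4-to-1 multiplexer feeding the CAR. The sketch assumes the four paths are the incrementer, the branch address field, the mapping logic, and the SBR, as the component list above suggests; the select codes 0 to 3 and the function names are made up for this example.

```python
def next_address(select, car, branch_address, mapped_address, sbr):
    """4-to-1 multiplexer that chooses the next value of the CAR."""
    sources = {
        0: car + 1,         # incrementer: next micro-instruction in sequence
        1: branch_address,  # branch address field of the micro-instruction
        2: mapped_address,  # external address from the mapping logic
        3: sbr,             # return address held in the subroutine register
    }
    return sources[select]


def call_subroutine(car, branch_address):
    """On a call, save the return address in SBR and branch."""
    sbr = car + 1
    return branch_address, sbr


car, sbr = call_subroutine(10, 40)         # call: CAR becomes 40, SBR holds 11
car = next_address(0, car, 0, 0, sbr)      # step sequentially to 41
car = next_address(3, car, 0, 0, sbr)      # return: CAR is loaded from SBR
print(car)
```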
Pipelining-
Pipelining is the process of feeding instructions to the
processor through a pipeline. It allows instructions to be stored
and executed in an orderly process. It is also known as pipeline
processing.
Pipelining is a technique where multiple instructions are
overlapped during execution. Pipeline is divided into stages
and these stages are connected with one another to form a pipe
like structure. Instructions enter from one end and exit from
another end.
Pipelining increases the overall instruction throughput.
In pipeline system, each segment consists of an input register
followed by a combinational circuit. The register is used to
hold data and combinational circuit performs operations on it.
The output of combinational circuit is applied to the input
register of the next segment.
A pipeline system is like a modern-day assembly line
in a factory. For example, in the car manufacturing industry, huge
assembly lines are set up, and at each point there are robotic
arms that perform a certain task before the car moves on
to the next arm.
Types of Pipeline
It is divided into 2 categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic pipelines are found in most
computers. They are used for floating point operations,
multiplication of fixed point numbers, etc. For example, the
inputs to the floating point adder pipeline are:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (the significant digits of the floating point
numbers), while a and b are exponents.
The floating point addition and subtraction is done in 4 parts:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract mantissas
4. Produce the result.
Registers are used for storing the intermediate results between
the above operations.
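The four parts of the floating point addition can be traced with a small sketch. It assumes the mantissas are binary fractions and that "produce the result" means normalizing the mantissa into the range 0.5 <= |mantissa| < 1; that convention, the function name fp_add, and the sample operands are assumptions for illustration. Each step runs in turn here, whereas a real pipeline would overlap the steps of different additions.

```python
def fp_add(A, a, B, b):
    """Add X = A * 2**a and Y = B * 2**b; return (mantissa, exponent)."""
    # Part 1: compare the exponents
    diff = a - b
    exponent = max(a, b)
    # Part 2: align the mantissa of the operand with the smaller exponent
    if diff > 0:
        B = B / (2 ** diff)
    elif diff < 0:
        A = A / (2 ** -diff)
    # Part 3: add the mantissas
    mantissa = A + B
    # Part 4: produce the result (normalize so 0.5 <= |mantissa| < 1)
    if mantissa == 0:
        return 0.0, 0
    while abs(mantissa) >= 1.0:
        mantissa /= 2
        exponent += 1
    while abs(mantissa) < 0.5:
        mantissa *= 2
        exponent -= 1
    return mantissa, exponent


m, e = fp_add(0.5, 3, 0.75, 1)  # X = 4.0, Y = 1.5
print(m, e, m * 2 ** e)
```

For the sample operands, the sum 5.5 comes out as mantissa 0.6875 with exponent 3.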
Instruction Pipeline
In an instruction pipeline, a stream of instructions is executed by
overlapping the fetch, decode and execute phases of the
instruction cycle. This technique is used to increase
the throughput of the computer system.
An instruction pipeline reads instruction from the memory
while previous instructions are being executed in other
segments of the pipeline. Thus we can execute multiple
instructions simultaneously. The pipeline will be more
efficient if the instruction cycle is divided into segments
of equal duration.
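The throughput gain can be estimated with a short calculation. This is an idealized sketch that assumes every segment takes exactly one clock cycle and the pipeline never stalls, so a k-segment pipeline finishes n instructions in k + n - 1 cycles; the function names and the sample numbers are chosen for illustration.

```python
def pipeline_cycles(num_segments, num_instructions):
    """Cycles for an ideal pipeline: fill it once, then finish one per cycle."""
    return num_segments + num_instructions - 1


def sequential_cycles(num_segments, num_instructions):
    """Cycles without overlap: every instruction passes through all segments alone."""
    return num_segments * num_instructions


def speedup(num_segments, num_instructions):
    return (sequential_cycles(num_segments, num_instructions)
            / pipeline_cycles(num_segments, num_instructions))


print(pipeline_cycles(4, 100))    # 103 cycles with overlap
print(sequential_cycles(4, 100))  # 400 cycles without overlap
print(round(speedup(4, 100), 2))
```

As the number of instructions grows, the speedup approaches the number of segments, which is why segments of equal duration matter.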
Pipeline Conflicts
Some factors cause the pipeline to deviate from
its normal performance. Some of these factors are given
below:
1. Timing Variations
Not all stages take the same amount of time. This
problem generally occurs in instruction processing, where
different instructions have different operand requirements
and thus different processing times.
2. Data Hazards
When several instructions are in partial execution and
they reference the same data, a problem arises. We
must ensure that the next instruction does not attempt to
access the data before the current instruction has finished
with it, because this would lead to incorrect results.
3. Branching
In order to fetch and execute the next instruction, we
must know what that instruction is. If the present
instruction is a conditional branch, and its result will lead
us to the next instruction, then the next instruction may
not be known until the current one is processed.
4. Interrupts
Interrupts insert unwanted instructions into the instruction
stream. Interrupts affect the execution of instructions.
5. Data Dependency
It arises when an instruction depends upon the result of a
previous instruction but this result is not yet available.
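A data dependency of this kind can be detected by comparing the destination of one instruction with the sources of the instructions that follow it. The sketch below is a simplified illustration: the Instr tuple, the register names, and the window parameter (how many following instructions may still see a missing result) are assumptions, not a description of any particular processor.

```python
from collections import namedtuple

# Hypothetical instruction record: name, destination register, source registers
Instr = namedtuple("Instr", ["name", "dest", "sources"])


def find_dependencies(program, window=1):
    """Return (producer_index, consumer_index, register) for each case where an
    instruction reads a register written by one of the `window` previous
    instructions."""
    found = []
    for i, instr in enumerate(program):
        for j in range(max(0, i - window), i):
            producer = program[j]
            if producer.dest in instr.sources:
                found.append((j, i, producer.dest))
    return found


program = [
    Instr("ADD", "R1", ("R2", "R3")),
    Instr("SUB", "R4", ("R1", "R5")),  # needs R1 produced by ADD
    Instr("AND", "R6", ("R7", "R8")),
]
print(find_dependencies(program))
```

Here SUB depends on ADD through R1, so the pipeline would have to delay SUB or forward the result.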
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system
3. It makes the system reliable.
Disadvantages of Pipelining
1. The design of pipelined processor is complex and
costly to manufacture.
2. The instruction latency is higher.
What is Vector (Array) Processing?
There is a class of computational problems that are
beyond the capabilities of a conventional computer.
These problems require vast number of computations on
multiple data items that will take a conventional
computer (with scalar processor) days or even weeks to
complete.
Such problems, which operate on multiple
data items at the same time, require a better way of executing
instructions, which is provided by vector processors.
Scalar CPUs can manipulate only one or two data items at a
time, which is not very efficient. Also, simple instructions
like ADD A to B, and store into C are not practically
efficient.
Addresses are used to point to the memory location where
the data to be operated on will be found, which adds the
overhead of data lookup. Until the data is found,
the CPU sits idle, which is a big
performance issue.
Hence, the concept of Instruction Pipeline comes into
picture, in which the instruction passes through several
sub-units in turn. These sub-units perform various
independent functions, for example: the first one
decodes the instruction, the second sub-unit fetches the
data and the third sub-unit performs the math itself.
Therefore, while the data is fetched for one instruction,
the CPU does not sit idle; it works on decoding the
next instruction, ending up working like an assembly
line.
A vector processor not only uses an instruction pipeline, but
also pipelines the data, working on multiple data items at the
same time.
A normal scalar processor instruction would be ADD A,
B, which adds two operands. But what if
we could instruct the processor to ADD a group of
numbers (memory locations 0 to n) to another group
of numbers (say, memory locations n to k)? This can be
achieved by vector processors.
In a vector processor, a single instruction can ask for
multiple data operations, which saves time because the instruction
is decoded once and then keeps operating on
different data items.
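The difference in decoding work can be illustrated with a toy model. This sketch only counts how often an instruction would be decoded; it runs as ordinary Python and does not model real vector hardware. The function names, the flat memory list, and the sample addresses are made up for the example.

```python
def scalar_add(memory, src_a, src_b, dest, length):
    """One ADD per element: the instruction is decoded `length` times."""
    decodes = 0
    for i in range(length):
        decodes += 1  # each scalar ADD is fetched and decoded separately
        memory[dest + i] = memory[src_a + i] + memory[src_b + i]
    return decodes


def vector_add(memory, src_a, src_b, dest, length):
    """One vector ADD: decoded once, then applied to every element."""
    decodes = 1
    a = memory[src_a:src_a + length]
    b = memory[src_b:src_b + length]
    memory[dest:dest + length] = [x + y for x, y in zip(a, b)]
    return decodes


memory = [1, 2, 3, 4, 10, 20, 30, 40, 0, 0, 0, 0]
print(scalar_add(list(memory), 0, 4, 8, 4))  # 4 decodes
print(vector_add(memory, 0, 4, 8, 4))        # 1 decode
print(memory[8:12])
```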
Applications of Vector Processors
Computers with vector processing capabilities are in
demand in specialized applications. The following are
some areas where vector processing is used:
1. Petroleum exploration.
2. Medical diagnosis.
3. Data analysis.
4. Weather forecasting.
5. Aerodynamics and space flight simulations.
6. Image processing.
7. Artificial intelligence.
Superscalar Processors
The term superscalar was first used in 1987. It is a machine which is
designed to improve the performance of the scalar
processor. In most applications, most of the operations
are on scalar quantities. Superscalar approach produces
the high performance general purpose processors.
The main principle of superscalar approach is that it
executes instructions independently in different pipelines.
As we already know, that Instruction pipelining leads to
parallel processing thereby speeding up the processing of
instructions. In Superscalar processor, multiple such
pipelines are introduced for different operations, which
further improves parallel processing.
There are multiple functional units each of which is
implemented as a pipeline. Each pipeline consists of
multiple stages to handle multiple instructions at a time
which support parallel execution of instructions.
It increases the throughput because the CPU can execute
multiple instructions per clock cycle. Thus, superscalar
processors are much faster than scalar processors.
A scalar processor works on one or two data items,
while the vector processor works with multiple data
items. A superscalar processor is a combination of both.
Each instruction processes one data item, but there are
multiple execution units within each CPU thus multiple
instructions can be processing separate data items
concurrently.
While a superscalar CPU is usually also pipelined, superscalar
execution and pipelining are two different performance
enhancement techniques. It is possible to have a non-pipelined
superscalar CPU or a pipelined non-superscalar CPU.
The superscalar technique is associated with the following
characteristics:
1. Instructions are issued from a sequential
instruction stream.
2. CPU must dynamically check for data
dependencies.
3. Should accept multiple instructions per clock
cycle.
Horizontal and Vertical Microprogramming-
Micro-programmed control unit can be classified
into two types based on the type of Control Word
stored in the Control Memory, viz., horizontal and vertical:
• In Horizontal micro-programmed control unit,
the control signals are represented in the
decoded binary format, i.e., 1 bit per control signal (CS).
Here ‘n’ control signals require n bits.
• In Vertical micro-programmed control unit, the
control signals are represented in the encoded
binary format. Here ‘n’ control signals require
log2(n) bits.
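The two encodings can be compared with a short sketch. It assumes log2(n) is rounded up when n is not a power of two and that only one signal is active at a time in the vertical format; the function names and the sample values are made up for illustration.

```python
def horizontal_bits(num_signals):
    """Decoded format: one bit per control signal."""
    return num_signals


def vertical_bits(num_signals):
    """Encoded format: ceil(log2(n)) bits select one control signal."""
    return (num_signals - 1).bit_length()


def decode_vertical(code, num_signals):
    """External decoder: expand an encoded field into individual control lines."""
    return [1 if i == code else 0 for i in range(num_signals)]


print(horizontal_bits(64))    # 64 bits for 64 signals
print(vertical_bits(64))      # 6 bits for 64 signals
print(decode_vertical(2, 4))  # code 2 activates only the third line
```

The decoder step is the additional hardware that the vertical format needs and the horizontal format avoids.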
Horizontal µ-programmed CU | Vertical µ-programmed CU
It supports longer control word. | It supports shorter control word.
It allows higher degree of parallelism. If degree is n, then n Control Signals are enabled at a time. | It allows low degree of parallelism i.e., degree of parallelism is either 0 or 1.
No additional hardware is required. | Additional hardware in the form of decoders is required to generate control signals.
It is faster than Vertical micro-programmed control unit. | It is slower than Horizontal micro-programmed control unit.
It is less flexible than Vertical micro-programmed control unit. | It is more flexible than Horizontal micro-programmed control unit.
Horizontal micro-programmed control unit uses horizontal microinstruction, where every bit in the control field attaches to a control line. | Vertical micro-programmed control unit uses vertical microinstruction, where a code is used for each action to be performed and the decoder translates this code into individual control signals.
Horizontal micro-programmed control unit makes less use of ROM encoding than vertical micro-programmed control unit. | Vertical micro-programmed control unit makes more use of ROM encoding to reduce the length of the control word.
Example: Consider a hypothetical Control Unit which
supports 4 k words. The Hardware contains 64 control signals
and 16 Flags. What is the size of control word used in bits
and control memory in byte using:
a) Horizontal Programming
b) Vertical programming
Solution:
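a. For Horizontal
64 bits for 64 signals, 4 bits to select one of 16 flags (log2(16)),
and 12 bits to address 4 k words (log2(4096))
Control Word Size = 4 + 64 + 12 = 80 bits
Control Memory = 4 k words x 80 bits = ( (4 k * 80) / 8 ) bytes = 40 kByte
b. For Vertical
6 bits for 64 signals, i.e., log2(64)
Control Word Size = 4 + 6 + 12 = 22 bits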