Data path of Computer Architecture ALU and other components

Computer Architecture
Course EE3213

Basic Computer Architecture
• Overview
– CPU
Register organisation, instruction cycle, control unit
– Memory
Hierarchy, main memory, cache memory
– Input/Output
Programmed I/O, interrupt-driven I/O, DMA
– Interconnection structures
Bus systems

Introduction
• von Neumann computer model
All modern computer systems are built upon the von Neumann
model which consists of
– A CPU to perform data processing
– A memory system to store programs and data
– Input/Output (I/O) devices for communication between the
computer and the outside world
[Figure: von Neumann computer architecture. The CPU, memory and I/O devices are connected by a bus.]

CPU organisation
• The functions performed by the CPU:
– Fetch instructions
– Instruction Decode
– Process data/ Execute
– Memory access
– Write data
• Organizational requirements that are derived from these functions:
– ALU
– Control logic
– Temporary storage
– Means to move data and instructions in and around the CPU
CPU Architecture

External view of the CPU
[Figure: the CPU (ALU, registers, control unit) connected to the control bus, data bus and address bus.]

Internal view of MC68000
Including: CU; ALU; general-purpose registers: An, Dn; special-
purpose registers: PC, IR, CCR, MAR, MBR; internal buses: A~F;
external buses – address, data, control; ALU2: address in-/de-crement
[Figure: internal view of the MC68000. PC, A0~An, D0~Dm, IR (op-code, operand), ALU, ALU2, MAR, MBR, CCR and CU are linked by internal buses A~F; the CU issues control signals; the address bus and data bus lead to memory.]

Register organisation
• Registers form the highest level of the memory hierarchy
– Small set of high speed (but electronically complex) storage
locations inside CPU
– Temporary storage for data and control information
• Two types of registers
– User-visible
» May be referenced by assembly-level instructions and are
thus “visible” to the user
– Control and status registers
» Used to control the operation of the CPU
» Most are not visible to the user

User-visible registers
• General-purpose registers
– Data registers (e.g. D0~D7 in MC68000)
» These registers only hold data
– Address registers (e.g. A0~A7 in MC68000)
» These registers hold address information
» Examples: address register indirect addressing (A0), (A1)+,
–(A2), –6(A3), stack pointers
» They may also be used for holding data
• Condition code registers
– Visible to the user but values set by the CPU as the result of
performing operations
– Example code bits: zero (Z), negative (N), overflow (V)
– Bit values are used as the basis for conditional jump instructions

Control and status registers
• These registers are used during the fetching, decoding and
execution of instructions
– Many are not visible to the user/programmer
– Some are visible but cannot be (easily) modified
• Typical registers
– Program counter (PC)
» stores the memory address of the instruction to be executed
next
» automatically updated to the next instruction position after an
instruction is fetched from memory
» branches, subroutine calls and interrupts force PC to change
to a value different from its routine increment, thereby changing
the flow of the program

– Instruction register (IR)
» Contains the instruction currently being executed
– Memory address register (MAR) and Memory buffer register
(MBR)
» These buffers are needed to accommodate the speed mismatch
between the external, slower memory and the internal, fast CPU
– Program status register
» Condition code register
» Interrupt masks, supervisory modes, etc.
» Status information

The instruction cycle
• How does the CPU execute a program? It repeatedly fetches an
instruction and then executes it until the end of the program. This is
called the instruction cycle, or fetch-execute cycle.
• First, in outline.
Fetch an instruction from memory
Decode and execute the instruction
The address of the instruction to be fetched within each cycle is
given by the contents of the program counter (PC). Thus, after
fetching an instruction, the contents of PC must be updated to point
to the next instruction. This is called PC increment.

• In more detail – the fetch-execute cycle.
– Fetch:
• Copy the instruction pointed to by the PC from memory into
the instruction register IR
• Increment the PC to the address of the next instruction
– Execute:
• Decode the op-code into the set of signals needed to control the
ALU and other components
• If the instruction requires operands find them (from memory,
I/O) and load them into CPU’s internal registers
• Execute the instruction under controlling signals from the CU
• If required, write result into appropriate register, memory
address or I/O
• Recognize pending interrupts
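The cycle above can be sketched as a loop. This is a minimal sketch of a toy machine, not the MC68000: the op-codes LOAD, ADD, STORE and HALT, the single register D0 and the memory layout are all assumptions made for illustration, and interrupt recognition is left out.

```python
def run(memory, pc=0):
    """Repeat fetch then execute until a HALT instruction is reached."""
    regs = {"D0": 0}
    while True:
        ir = memory[pc]            # fetch: copy the instruction at PC into IR
        pc += 2                    # PC increment (each instruction is 2 bytes)
        op, operand = ir           # decode the op-code and operand address
        if op == "LOAD":           # execute under control of the op-code
            regs["D0"] = memory[operand]
        elif op == "ADD":
            regs["D0"] += memory[operand]
        elif op == "STORE":
            memory[operand] = regs["D0"]   # write result back to memory
        elif op == "HALT":
            return regs, pc
        else:
            raise ValueError(f"unknown op-code {op!r}")


# Hypothetical program: Mem[0x104] = Mem[0x100] + Mem[0x102]
program = {
    0: ("LOAD", 0x100),
    2: ("ADD", 0x102),
    4: ("STORE", 0x104),
    6: ("HALT", None),
    0x100: 7, 0x102: 5, 0x104: 0,
}
regs, pc = run(program)
print(regs["D0"], program[0x104], pc)
```

Instructions and data share one memory here, as in the von Neumann model; only the PC decides which cells are treated as instructions.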

Register transfer language (RTL)
• To describe the details of operation of the CPU, we use a simple
language called RTL. The notations are as follows:
– Reg denotes contents of register Reg; Reg may be, for example,
MAR, MBR, PC, IR, D0, A5, ….
– Mem[x] denotes contents of a memory cell with address x;
sometimes [x].
– Mem[Reg] denotes contents of a memory cell whose address is
given by the contents of register Reg; sometimes [Reg].
– We use ← to denote transfer of contents: MAR ← PC means that
the contents of PC are transferred into MAR; PC ← Mem[x]
means that the contents of address x are transferred into PC; and,
Mem[x] ← PC means that the contents of PC are stored in
memory address x.

• Example: In RTL, the fetch-execute cycle for the instruction
ADD.W ($2000), D0 (in MC68000 assembly language)
i.e. add the contents of memory address $2000 to the contents of a
data register D0 and store the result in D0 – can be expressed as
(assume that each instruction is 16 bits wide, or 2 bytes wide):
MAR ← PC             move contents of PC to MAR
PC ← PC + 2          increment PC by 2 (why 2 ?)
MBR ← Mem[MAR]       read instruction from memory
IR ← MBR             move instruction to IR
CU ← IR(op-code)     move op-code to CU
MAR ← IR(address)    operand address ($2000) to MAR
MBR ← Mem[MAR]       read operand from memory
ALU ← MBR, D0        perform addition
D0 ← ALU             move output of ALU to D0
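The register transfers for ADD.W can be traced one statement per RTL step. This is a sketch under stated assumptions: the CPU is a plain dictionary, an instruction is stored as an (op-code, address) pair in one memory cell, and the start address $1000 and the operand values are made up for the example.

```python
def execute_add_w(mem, cpu):
    """Trace ADD.W (addr), D0 through the RTL steps, one per statement."""
    cpu["MAR"] = cpu["PC"]                    # MAR <- PC
    cpu["PC"] += 2                            # PC <- PC + 2
    cpu["MBR"] = mem[cpu["MAR"]]              # MBR <- Mem[MAR]
    cpu["IR"] = cpu["MBR"]                    # IR <- MBR
    opcode, address = cpu["IR"]               # CU <- IR(op-code)
    if opcode != "ADD.W":
        raise ValueError("this sketch only handles ADD.W")
    cpu["MAR"] = address                      # MAR <- IR(address)
    cpu["MBR"] = mem[cpu["MAR"]]              # MBR <- Mem[MAR]
    alu = (cpu["MBR"] + cpu["D0"]) & 0xFFFF   # ALU <- MBR, D0 (16-bit result)
    cpu["D0"] = alu                           # D0 <- ALU
    return cpu


mem = {0x1000: ("ADD.W", 0x2000), 0x2000: 0x0005}
cpu = {"PC": 0x1000, "D0": 0x0003, "MAR": 0, "MBR": 0, "IR": None}
execute_add_w(mem, cpu)
print(hex(cpu["PC"]), cpu["D0"])
```

PC advances by 2 because memory is byte-addressed and the instruction is assumed to occupy 2 bytes.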

• Example: Describe, in register transfer language (RTL), the
micro-operations involved in executing the following instructions:
MOVE.W (A0), D5      * reading a word from memory
                     * address of the word to MAR
                     * read word from memory address MAR
                     * send the word to D5
MOVE.W D2, (A7)      * writing a word into memory
                     * address of memory to MAR
                     * copy the word to MBR
                     * write word into memory address MAR
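One possible reading of the commented steps above, written as Python so that it can be run. It is a sketch, not the definitive answer: the function names, the dictionary model of the CPU and the example addresses are assumptions, and condition codes are ignored.

```python
def move_w_mem_to_reg(mem, cpu, areg, dreg):
    """MOVE.W (An), Dm: read a word from memory."""
    cpu["MAR"] = cpu[areg]           # MAR <- An       address of the word to MAR
    cpu["MBR"] = mem[cpu["MAR"]]     # MBR <- Mem[MAR] read word from memory
    cpu[dreg] = cpu["MBR"]           # Dm <- MBR       send the word to Dm


def move_w_reg_to_mem(mem, cpu, dreg, areg):
    """MOVE.W Dm, (An): write a word into memory."""
    cpu["MAR"] = cpu[areg]           # MAR <- An       address of memory to MAR
    cpu["MBR"] = cpu[dreg]           # MBR <- Dm       copy the word to MBR
    mem[cpu["MAR"]] = cpu["MBR"]     # Mem[MAR] <- MBR write word into memory


mem = {0x3000: 0x1234, 0x4000: 0x0000}
cpu = {"A0": 0x3000, "A7": 0x4000, "D2": 0xBEEF, "D5": 0, "MAR": 0, "MBR": 0}
move_w_mem_to_reg(mem, cpu, "A0", "D5")
move_w_reg_to_mem(mem, cpu, "D2", "A7")
print(hex(cpu["D5"]), hex(mem[0x4000]))
```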

The control unit (CU)
• The CU is the most complex section in a CPU. It decodes the
instruction in the instruction register (IR), and generates signals that
control all parts of the CPU to execute the instruction.
• Two different approaches have been used to implement the CU:
hardwiring – seen in early computers, and microprogramming –
adopted by most modern computers.
Hardwiring
• Each instruction is executed directly by logic circuit – hardware. To
do this, all one has to do is ask what sequence of logical and
arithmetic operations are needed to carry out an instruction, and
then to design the appropriate logic circuit to bring this about.
• This is the technique used until the early 1960s.

Microprogramming
• This method was introduced in 1951 by Wilkes.
• Microprogramming means that each instruction could be translated
into a sequence of even more primitive instructions called
microinstructions, which specify the control signals for the
electronic components in every detail.
• For example, the operation ‘fetch an instruction’ in a fetch-execute
cycle can be decomposed into microinstructions as follows
MAR ← PC             move contents of PC to MAR
PC ← PC + 2          increment PC by 2 (for 16 bit machine)
MBR ← Mem[MAR]       read instruction from memory
IR ← MBR             move instruction to IR
CU ← IR(op-code)     move op-code to CU

• The microprogram is classified at the second lowest machine level,
just above the hardware (the logic circuits). It is mainly the province
of the computer designer, and it comes as part of the CPU. For this
reason, it is often referred to as firmware, between hardware and
software.

Microprogram control vs. hardware control
• Microprogrammed solution
– allows arbitrarily complex instructions to be built-up;
– may also be more flexible, for example, there were some
machines that users could microprogram themselves;
– and, there were computers which differed only by their
microprograms, perhaps one optimised for execution of C
programs, another for PASCAL programs.
• Hardware solution
– hardware circuit is capable of decoding an instruction in ONE
clock period, i.e. a lot faster than the microprogrammed solution.

Memory Hierarchy
• The time spent for memory access has been a major factor limiting
the speed of a computer
– Memory speed is slow compared to the speed of the processor
– A process could be bottlenecked due to the inability of the
memory system to “keep up” with the processor
• Major design objective - to provide adequate storage capacity
– at an acceptable level of performance
– at a reasonable cost
Memory

Typical memory hierarchy (technology, size, access time):
Level            Technology   Size                Access time
Registers        (in CPU)
Cache            RAM          100s KB             ~10 ns
Main memory      RAM          100s MB ~ GB        ~50 ns
Magnetic disk    Hard disk    10s GB ~ 100s GB    ~10 ms
Optical disk     CD-ROM       GB                  ~100 ms
Magnetic tape    Tape         100s MB             sec ~ min

Addressing main memory
• We draw a memory as an array of 16-bit (2-byte) words, and
consider the addresses for storing bytes, words and long words in it.
• Bytes. A byte is the smallest unit that can be addressed. Bytes are
addressed as follows:
Address   Bits 0~7   Bits 8~15   Address
0         Byte 0     Byte 1      1
2         Byte 2     Byte 3      3
4         Byte 4     Byte 5      5
6         Byte 6     Byte 7      7
8         Byte 8     Byte 9      9
A         etc.                   B
• Successive bytes in memory are stored at consecutive byte
addresses, e.g. 0, 1, 2, 3…, as above.

• Words. Each consists of 2 bytes, addressed as follows:
Address   Bits 0~15                                         Address
0         Word 0 (first byte: Byte 0, second byte: Byte 1)   1
2         Word 2                                             3
4         Word 4                                             5
6         Word 6                                             7
8         Word 8                                             9
A         etc.                                               B
• Words are stored and accessed at even addresses (at even byte
numbers), e.g. 0, 2, 4, 6…, as above.

• Long words. Each consists of 4 bytes, addressed as follows:
Address   Bits 0~15                                   Address
0         Long word 0 (Byte 0 = first byte, Byte 1)   1
2         Long word 0 (Byte 2, Byte 3 = last byte)    3
4         Long word 4                                 5
6         Long word 4                                 7
8         Long word 8                                 9
A         Long word 8                                 B
          etc.
• Long words are stored and accessed at even addresses that are
multiples of 4, e.g. 0, 4, 8, C…, as above.
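The addressing rules stated on these slides (bytes at any address, words at even addresses, long words at multiples of 4) reduce to one remainder test. A minimal sketch; the names SIZES and is_aligned are made up for the example.

```python
# Size in bytes of each data type, as given on the slides.
SIZES = {"byte": 1, "word": 2, "long": 4}


def is_aligned(address, kind):
    """True if `address` is a legal start address for `kind` under the slides' rule."""
    return address % SIZES[kind] == 0


for addr in (0x0, 0x3, 0x6, 0x8):
    legal = [kind for kind in SIZES if is_aligned(addr, kind)]
    print(hex(addr), legal)
```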

Microsoft vs. Unix, or Intel vs. Motorola
• The arrangement of items of multi-byte data in memory follows one
of two conventions set independently by Intel and Motorola.
• Intel policy (little-endian): store the least significant (LS) byte first
and the most significant (MS) byte last.
• Motorola policy (big-endian): store the MS byte first and the LS
byte last.
• Example: Difference in byte ordering in multi-byte numbers.
The region of memory being viewed contains two bytes: $01, $02;
one word: $0580; and one long word: $89ABCDEF.
Address   Intel Pentium   Motorola 68k
2         01 02           01 02
4         80 05           05 80
6         EF CD           89 AB
8         AB 89           CD EF
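The two byte orders can be reproduced with Python's built-in int.to_bytes, which takes "little" or "big" as its byte order. The helper name memory_bytes is an assumption made for this sketch.

```python
def memory_bytes(value, size, byteorder):
    """Return the bytes of `value` in storage order, as upper-case hex pairs."""
    return value.to_bytes(size, byteorder).hex(" ").upper()


for name, order in (("Intel (little-endian)", "little"),
                    ("Motorola (big-endian)", "big")):
    word = memory_bytes(0x0580, 2, order)        # the word $0580
    long_word = memory_bytes(0x89ABCDEF, 4, order)  # the long word $89ABCDEF
    print(f"{name}: word = {word}, long word = {long_word}")
```

Single bytes such as $01 and $02 are unaffected, which is why address 2 looks the same on both machines.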

Cache Memory
• People observe empirically
– Temporal Locality: The word referenced now is likely to be
referenced again soon. Hence it is wise to keep the currently
accessed word handy (high in the memory hierarchy) for a
while.
– Spatial Locality: Words near the currently referenced word are
likely to be referenced soon. Hence it is wise to prefetch words
near the currently referenced word and keep them handy (high in
the memory hierarchy) for a while.
• A cache is a small fast memory between the processor and the main
memory. It contains a subset of the contents of the main memory.
• Because of the high speeds involved with the cache, management of
the data transfer and storage in the cache is done in hardware

• A Cache is organised in units of blocks. Common block sizes are
16, 32, and 64 bytes. This is the smallest unit we can move to/from
a cache.
• We view memory as organised in blocks as well. If the block size is
16, then bytes 0-15 of memory are in block 0, bytes 16-31 are in
block 1, etc.
• Transfers from memory to cache and back are made one block at a time.
• A hit occurs when a memory reference is found in the cache.
– A miss is a non-hit.
– The hit rate is the fraction of memory references that are hits.
– The miss rate is 1 - hit rate, which is the fraction of references
that are misses.
– The hit time is the time required for a hit.
– The miss time is the time required for a miss.
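These definitions combine into an average access time. The sketch below assumes the usual weighted average (hit rate × hit time + miss rate × miss time), which the slides do not state explicitly, and the reference counts and timings are hypothetical.

```python
def hit_rate(hits, references):
    """Fraction of memory references that are hits."""
    return hits / references


def average_access_time(rate, hit_time, miss_time):
    """Weighted average of hit time and miss time (assumed formula)."""
    miss_rate = 1 - rate
    return rate * hit_time + miss_rate * miss_time


# Hypothetical figures: 950 hits in 1000 references, 10 ns hit, 60 ns miss.
rate = hit_rate(950, 1000)
print(rate, average_access_time(rate, hit_time=10, miss_time=60))
```

Even a small miss rate matters: here 5% of the references account for almost a quarter of the average time.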

Example: a cache with 4 blocks and a memory with 16 blocks
Cache block#:        0 1 2 3
Main memory block#:  0 1 2 3 4 5 6 7 8 9 A B C D E F
Direct mapping: memory blocks 0, 4, 8, C → cache block 0
memory blocks 1, 5, 9, D → cache block 1
memory blocks 2, 6, A, E → cache block 2
memory blocks 3, 7, B, F → cache block 3
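The mapping in this example is memory block number modulo the number of cache blocks. A minimal sketch of a direct-mapped cache that counts hits and misses; the reference sequence is made up for illustration.

```python
NUM_CACHE_BLOCKS = 4


def cache_block(memory_block):
    """Direct mapping: each memory block has exactly one possible cache block."""
    return memory_block % NUM_CACHE_BLOCKS


def simulate(block_refs):
    """Return (hits, misses) for a sequence of memory block numbers."""
    cache = [None] * NUM_CACHE_BLOCKS      # which memory block each slot holds
    hits = 0
    for block in block_refs:
        slot = cache_block(block)
        if cache[slot] == block:
            hits += 1                      # block already in its slot
        else:
            cache[slot] = block            # miss: replace whatever was there
    return hits, len(block_refs) - hits


refs = [0x0, 0x1, 0x0, 0x4, 0x0, 0x9, 0x1]   # hypothetical reference string
print(simulate(refs))
```

Blocks 0 and 4 compete for cache block 0, so alternating between them evicts each in turn even though three cache blocks may be empty.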

Software Classification
[Figure: software classification. Labels: Software; System; Application; Utility; Operating System; Kernel; Drivers; Hardware.]

Hardware - RAM
[Figure: storage levels. Labels: Processor; CPU; L1 Cache; L2 Cache; Memory; Disk.]

I/O Modules (or I/O Interfaces)
• External devices are not generally connected directly into the bus
structure of the computer. A wide variety of devices require their
own respective logic interfaces because of:
– Mismatch of data rates
– Different data representations
• The I/O module
– Provides a standard interface to the CPU and the bus
– Tailored to specific I/O device and its interface requirements
– Relieves the CPU of the management of the I/O devices
– Interface consists of
» Control signals
» Status signals
» Data signals
Input / Output

Programmed I/O
• Refers to I/O operations in which the CPU issues the I/O command to
the I/O module
• CPU is in direct control of the operation
– CPU waits until the I/O operation is completed before it can
perform other tasks
– Completion indicated by a change in the module status bits
– CPU must periodically poll the module to check its status
• As a result of the speed difference between a CPU and the
peripheral devices, programmed I/O wastes an enormous amount of
CPU processing power.
– Very inefficient
– CPU slowed to the speed of the peripheral

Programmed I/O: writing an array of characters to I/O (flowchart)
1. Issue Write command to I/O module (CPU → I/O)
2. Read status of I/O module (I/O → CPU)
3. Check status: not ready, read the status again; error condition; ready, continue
4. Read character from memory (Memory → CPU)
5. Write character to I/O module (CPU → I/O)
6. Done? n: repeat for the next character; y: next instruction
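The cost of polling can be made visible with a simulated device. This is a sketch only: FakeIOModule is a hypothetical stand-in for an I/O module that stays busy for a fixed number of status reads after each character, and no real hardware interface is implied.

```python
class FakeIOModule:
    """Hypothetical device: busy for `busy_polls` status reads after each write."""

    def __init__(self, busy_polls=3):
        self.busy_polls = busy_polls
        self.remaining = 0
        self.output = []

    def ready(self):
        """Status read: False while the device is still busy."""
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True

    def write(self, ch):
        self.output.append(ch)
        self.remaining = self.busy_polls


def programmed_write(chars, module):
    """Write every character, polling the status before each one."""
    wasted_polls = 0
    for ch in chars:                   # read character from memory
        while not module.ready():      # read and check status
            wasted_polls += 1          # the CPU does no useful work here
        module.write(ch)               # write character to I/O module
    return wasted_polls


device = FakeIOModule(busy_polls=3)
wasted = programmed_write("abc", device)
print(device.output, wasted)
```

The count of wasted polls grows with both the array length and the slowness of the device, which is the inefficiency described above.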

Interrupt-Driven I/O
• To reduce the time spent on I/O operations, the CPU can use an
interrupt-driven approach
– CPU issues I/O command to the module
– CPU continues with its other tasks while the module performs its
task
– Module signals the CPU when the I/O operation is finished (the
interrupt)
– CPU responds to the interrupt by executing an interrupt service
routine (e.g. issuing another I/O command) and then continues
on with its primary task
• CPU recognizes and responds to interrupts at the end of an
instruction execution cycle
• Interrupt technique is used to support a wide variety of devices

Interrupt-driven I/O: writing an array of characters to I/O (flowchart)
1. Issue Write command to I/O module (CPU → I/O), then do something else
2. On interrupt, read status of I/O module (I/O → CPU)
3. Check status: error condition; ready, continue
4. Read character from memory (Memory → CPU)
5. Write character to I/O module (CPU → I/O)
6. Done? n: repeat for the next character; y: next instruction

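[Figure: interrupt occurs after the instruction at location n. Main memory holds the user's program (n, n+1), the interrupt service routine (start at k, return at k+L) and the system stack (i-m to i). The return address n+1 and the registers are saved on the stack, SP moves from i to i-m, and PC is loaded with k.]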

[Figure: return from interrupt. With PC at k+L, the saved registers and the return address are restored from the system stack, SP moves from i-m back to i, and PC is loaded with n+1 so the user's program resumes.]
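The save and restore shown in the two figures can be sketched with a Python list as the system stack. The addresses n and k, the single register D0 and the function names are assumptions made for illustration; a real CPU would move SP through memory rather than grow a list.

```python
def enter_interrupt(cpu, stack, isr_start):
    """Push the return address and registers, then jump to the service routine."""
    stack.append((cpu["PC"], dict(cpu["regs"])))
    cpu["PC"] = isr_start


def return_from_interrupt(cpu, stack):
    """Pop the saved state so the user's program resumes where it left off."""
    cpu["PC"], cpu["regs"] = stack.pop()


n, k = 100, 500                          # hypothetical addresses
cpu = {"PC": n + 1, "regs": {"D0": 5}}   # instruction at n has just completed
stack = []

enter_interrupt(cpu, stack, k)
pc_in_isr = cpu["PC"]                    # now k, the start of the routine
cpu["regs"]["D0"] = 99                   # the service routine overwrites D0

return_from_interrupt(cpu, stack)
print(pc_in_isr, cpu["PC"], cpu["regs"]["D0"], len(stack))
```

Because the registers are copied onto the stack, the service routine can use them freely without disturbing the interrupted program.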

Direct Memory Access - DMA
• Both programmed and interrupt driven I/O require the continued
involvement of the CPU in ongoing I/O operations.
• DMA takes the CPU out of the task except for initialisation of the
operation.
• Large amounts of data can be transferred between memory and the
peripheral without severely impacting CPU performance:
– CPU initialises DMA module:
» Read or write operation defined
» I/O device involved
» Starting address of memory block
» Number of words (size of block) to be transferred
– CPU then continues with other work.

• DMA is possible because both the device (e.g. disk) and memory
are connected to the system buses.
• Hence a third device - a DMA controller may be used to take over
the system buses for data transfer, freeing CPU for its primary task.
• DMA controller uses the bus when the CPU is not using it - no
impact on the CPU performance.
• DMA controller is nothing but another processor which
– accesses memory to retrieve a data word
– forwards the word to the I/O peripheral
• DMA provides the fastest possible means of transferring data
between a device and memory.
[Figure: the CPU, the DMA module, the I/O modules and memory are all attached to the system buses.]

CPU involvement in DMA (flowchart)
1. Initialisation: issue Read/Write block command to DMA module (CPU → DMA), then do something else
2. Data transfer between memory and I/O proceeds under the control of the DMA controller
3. Completion: on interrupt, read status of DMA module (DMA → CPU), then next instruction

Computer modules and their interconnection requirements
Interconnection Structures
[Figure: the signals exchanged by each module. Memory (N words, addresses 0 to N-1): Read, Write, Address, Data. I/O module: Read, Write, Address, Data, Interrupt signals. CPU: Instructions, Data, Interrupt signals, Address, Control signals.]

Bus system
• Computer systems contain a number of buses that provide pathways
between components.
• Typical buses consist of 50-100 lines and are divided into 3 parts:
– Address bus
» Address information
– specifies source/destination of data transfer
– Data bus
» Instruction/Data information
– Control bus
» Control information, controlling access to and use of the
address and data bus

Example: MC68000 computer system
A computer system based on MC68000 may be constructed as
follows.
[Figure: the MC68000 (64 pins) is connected to main memory and I/O devices by a 23-line address bus (A01~A23), a 16-line data bus (D00~D15) and the control bus.]
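The bus widths fix the size of the address space. A short arithmetic sketch, assuming each combination of the 23 address lines selects one 16-bit word:

```python
ADDRESS_LINES = 23      # A01~A23
DATA_LINES = 16         # D00~D15

words = 2 ** ADDRESS_LINES                 # number of addressable words
total_bytes = words * (DATA_LINES // 8)    # 2 bytes per word

print(words, "words =", total_bytes // 2**20, "MB")
```

There is no A00 line in the figure because addresses on the bus select whole words; the byte within a word is chosen separately.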

Introduction
• Measuring the performance of a computer (how fast it is) is complex.
• Clock speed or clock rate – cycles per second – is important. This is
usually measured in MHz, e.g. 450 MHz, 866 MHz, etc.
– Now we have a few machines which have a clock rate exceeding
a gigahertz (GHz) or even 3.8 GHz (e.g. Pentium 4).
• But clearly performance depends on design of CPU too.
– e.g. Pentium III faster than Pentium II faster than Pentium even
if clock speed was the same.
• Crucially, speed depends on how fast the whole computer system,
including CPU, memory, I/O devices, etc., functions together.
– e.g. if CPU very fast but memory slow, whole system is slowed.
Computer Performance Analysis

MIPS
• MIPS stands for Millions of Instructions Per Second.
MIPS = Number of Machine Instructions / (Execution Time × 10^6)
• In general, faster machines will have higher MIPS ratings and appear
to have better performance.
• Problems:
– Rating of a machine based on its instruction set.
» For the same program, in Java, or C, or Ada, etc., different
machines may need different numbers of instructions due to
different instruction sets;
» For example, one VAX instruction might correspond to 2
instructions on a PowerPC and 3 instructions on a Pentium.
– For the same program on the same machine, different compilers
may generate different numbers of instructions.
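The formula and its weakness can be shown with a small calculation. The instruction counts and the 2-second run time are hypothetical; only the 1-to-3 ratio between VAX and Pentium instructions comes from the example above.

```python
def mips(instruction_count, execution_time_s):
    """MIPS = number of machine instructions / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


time_s = 2.0                          # both machines finish the program in 2 s
vax_count = 100_000_000               # hypothetical instruction count on a VAX
pentium_count = 3 * vax_count         # 3 Pentium instructions per VAX instruction

print(mips(vax_count, time_s))        # 50.0
print(mips(pentium_count, time_s))    # 150.0
```

Both machines complete the same work in the same time, yet one is rated at three times the MIPS of the other, so MIPS ratings are only comparable between machines with the same instruction set.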

Benchmark Tests
• Test how long a computer takes to run a whole program involving
different sorts of instructions and memory accesses.
• The test programs may include
• Still have difficulties:
– It is hard to find benchmarks that represent your future usage.
– Compilers can be “tuned” for important benchmarks.
– Benchmarks can be chosen to favour certain architectures.
