Instruction Sets: Addressing Modes and Formats
Chapter 2

Addressing Modes
Addressing modes are illustrated in the figure
above and we use the following notation:
 (X) = contents of memory location X or register
X
 EA = actual (effective) address of the location
containing the referenced operand
 R = contents of an address field in the
instruction that refers to a register
 A = contents of an address field in the instruction
Basic Addressing Modes
Immediate
The simplest form of addressing is immediate
addressing, in which the operand value is
present in the instruction
Operand = A
This mode can be used to define and use
constants or set initial values of variables
 Merit : No memory reference
 Demerit: Limited operand magnitude
Basic Addressing Modes
 Direct:
A very simple form of addressing is direct
addressing, in which the address field contains
the effective address of the operand:
EA = A
 It requires only one memory reference and no
special calculation.
Merit: Simple
Demerit: Limited address space
Indirect Addressing (Addressing
Modes)
 In direct addressing, the length of the address
field is usually less than the word length, thus
limiting the address range. One solution is to
have the address field refer to the address of a
word in memory, which in turn contains a full-length address of the operand; hence, indirect addressing.
Algorithm: EA = (A).
Merit: Large address space
 Demerit: Multiple memory references (instruction
execution requires two memory references to
fetch the operand: one to get its address and a
second to get its value.)
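A minimal sketch (not from the slides) of indirect addressing, EA = (A): a small array stands in for main memory, and the two reads correspond to the two memory references noted above. All addresses and values are invented.

    /* Hedged sketch of indirect addressing, EA = (A). */
    #include <stdio.h>

    int main(void) {
        int mem[16] = {0};
        int A = 3;                /* address field from the instruction */
        mem[3] = 9;               /* location 3 holds the operand's address */
        mem[9] = 42;              /* location 9 holds the operand itself */

        int EA = mem[A];          /* first memory reference: fetch the address */
        int operand = mem[EA];    /* second memory reference: fetch the operand */

        printf("EA = %d, operand = %d\n", EA, operand);   /* EA = 9, operand = 42 */
        return 0;
    }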
Register Addressing
It is similar to direct addressing; the only difference is that the address field refers to a register rather than a main memory address:
Algorithm: EA = R
Merit: No memory reference
Demerit: Limited address space
Register Indirect Addressing
Register indirect addressing is analogous to indirect addressing in the same way that register addressing is analogous to direct addressing. In both cases, the only difference is whether the address field refers to a memory location or a register. Thus, for register indirect addressing:
Algorithm: EA = (R)
Merit: Large address space
Demerit: Extra memory reference
In both cases, the address space limitation (limited range of
addresses) of the address field is overcome by having
that field refer to a wordlength location containing an
address. In addition, register indirect addressing uses
one less memory reference than indirect addressing.
Displacement Addressing
It is a very powerful mode of addressing that combines the capabilities of direct addressing and register indirect addressing.
Algorithm: EA = A + (R)
 Displacement addressing requires that the instruction have two address fields, at least one of which is explicit. The value contained in one address field (value = A) is used directly. The other address field, or an implicit reference based on the opcode, refers to a register whose contents are added to A to produce the effective address.
Merit: Flexibility
Demerit: Complexity
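A minimal sketch of displacement addressing, EA = A + (R). The register file, memory size, and contents below are invented for illustration.

    /* Hedged sketch of displacement addressing, EA = A + (R). */
    #include <stdio.h>

    int main(void) {
        int mem[32] = {0};
        int reg[4]  = {0};

        int A = 8;                /* displacement from the instruction's address field */
        reg[1] = 4;               /* value held in the referenced register R1 */
        mem[12] = 77;             /* the operand lives at 8 + 4 = 12 */

        int EA = A + reg[1];      /* effective address */
        printf("EA = %d, operand = %d\n", EA, mem[EA]);
        return 0;
    }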
Stack Addressing
 A stack is a linear array of locations. It is sometimes referred to as a pushdown list or last-in-first-out queue.
The stack is a reserved block of locations. Items are
appended to the top of the stack so that, at any given
time, the block is partially filled. Associated with the
stack is a pointer whose value is the address of the top
of the stack. Alternatively, the top two elements of the
stack may be in processor registers, in which case the
stack pointer references the third element of the stack.
The stack mode of addressing is a form of implied
addressing. The machine instructions need not include a
memory reference but implicitly operate on the top of the
stack.
 Merit: No memory reference
 Demerit: Limited applicability
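A sketch of stack addressing as implied addressing: push and pop operate on the location referenced by the stack pointer, so no explicit address appears at the call site. The stack size and downward growth direction are assumptions.

    /* Hedged sketch of stack (implied) addressing via a stack pointer. */
    #include <stdio.h>

    #define STACK_SIZE 16

    static int stack[STACK_SIZE];
    static int sp = STACK_SIZE;                     /* stack grows downward */

    static void push(int v) { stack[--sp] = v; }    /* decrement SP, then store */
    static int  pop(void)   { return stack[sp++]; } /* load, then increment SP */

    int main(void) {
        push(7);
        push(9);
        int first  = pop();       /* 9: the last item pushed comes off first */
        int second = pop();       /* 7 */
        printf("%d %d\n", first, second);
        return 0;
    }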
Summary
 Immediate: Value of operand is included in the instruction
 Direct (Absolute): Address of operand is included in the instruction
 Register indirect: Operand is in a memory location whose address is in the register specified in the instruction
 Memory indirect: Operand is in a memory location whose address is in the memory location specified in the instruction
 Indexed: Address of operand is the sum of an index value and the contents of an index register
 Relative: Address of operand is the sum of an index value and the contents of the program counter
 Autoincrement: Address of operand is in a register whose value is incremented after fetching the operand (see the sketch after this list)
 Autodecrement: Address of operand is in a register whose value is decremented before fetching the operand
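A small sketch of the autoincrement and autodecrement entries in the summary above. The register number and memory contents are made up.

    /* Hedged sketch of autoincrement vs. autodecrement semantics. */
    #include <stdio.h>

    int main(void) {
        int mem[8] = {10, 20, 30, 40, 50, 60, 70, 80};
        int r = 2;                          /* register holding the operand address */

        /* Autoincrement: use (r) as the address, then increment r. */
        int op1 = mem[r];
        r = r + 1;                          /* fetched mem[2] = 30; r is now 3 */

        /* Autodecrement: decrement r first, then use (r) as the address. */
        r = r - 1;
        int op2 = mem[r];                   /* fetches mem[2] = 30 again */

        printf("autoincrement fetched %d, autodecrement fetched %d, r = %d\n",
               op1, op2, r);
        return 0;
    }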
Pentium Addressing Modes
 Virtual or effective address is offset into segment
 Starting address plus offset gives linear address
 This goes through page translation if paging enabled
 Addressing modes available
 Immediate
 Register operand
 Displacement
 Base
 Base with displacement
 Scaled index with displacement
 Base with index and displacement
 Base scaled index with displacement
 Relative
Immediate Addressing
 Immediate addressing is so-named because the
value to be referenced immediately follows the
operation code in the instruction. That is to say,
the data to be operated on is part of the
instruction. The 12 bits of the operand field do
not specify an address-they specify the actual
operand the instruction requires. Immediate
addressing is very fast because the value to be
loaded is included in the instruction. However,
because the value to be loaded is fixed at
compile time it is not very flexible.
Direct Addressing
 Direct addressing is so-named because the
value to be referenced is obtained by specifying
its memory address directly in the instruction.
Direct addressing is typically quite fast because,
although the value to be loaded is not included
in the instruction, it is quickly accessible. It is
also much more flexible than immediate
addressing because the value to be loaded is
whatever is found at the given address, which
may be variable.
Register Addressing
 In register addressing, a register, instead
of memory, is used to specify the operand.
This is very similar to direct addressing,
except that instead of a memory address,
the address field contains a register
reference. The contents of that register are used as the operand.
Indirect Addressing
 Indirect addressing is a very powerful addressing mode
that provides an exceptional level of flexibility. In this
mode, the bits in the address field specify a memory
address that is to be used as a pointer. The effective
address of the operand is found by going to this memory
address.
 In a variation on this scheme, the operand bits specify a
register instead of a memory address. This mode, known
as register indirect addressing, works exactly the same
way as indirect addressing mode, except it uses a
register instead of a memory address to point to the
data. For example, if the instruction is Load R1 and we
are using register indirect addressing mode, we would
find the effective address of the desired operand in R1.
Indexed and Based Addressing
 In indexed addressing mode, an index register (either explicitly or implicitly designated) is used to store an offset (or displacement), which is added to the address field of the instruction, resulting in the effective address of the data.
 Based addressing mode is similar, except a base address register,
rather than an index register, is used. In theory, the difference
between these two modes is in how they are used, not how the
operands are computed. An index register holds an index that is
used as an offset, relative to the address given in the address field
of the instruction. A base register holds a base address, where the
address field represents a displacement from this base. These two
addressing modes are quite useful for accessing array elements as
well as characters in strings. In fact, most assembly languages
provide special index registers that are implied in many string
operations. Depending on the instruction-set design, general-
purpose registers may also be used in this mode.
Pentium Addressing Mode
Calculation
 There are six segment registers; the one being
used for a particular reference depends on the
context of execution and the instruction.
Associated with each user-visible segment
register is a segment descriptor register (not
programmer visible), which records the access
rights for the segment as well as the starting
address and limit (length) of the segment. In
addition, there are two registers that may be
used in constructing an address: the base
register and the index register
X86 Addressing modes (Mode and
Algorithm)
 Immediate: Operand = A
 Register Operand: LA = R
 Displacement: LA = (SR) + A
 Base: LA = (SR) + (B)
 Base with Displacement: LA = (SR) + (B) + A
 Scaled Index with Displacement: LA = (SR) + (I) * S + A
 Base with Index and Displacement: LA = (SR) + (B) + (I) + A
 Base with Scaled Index and Displacement: LA = (SR) + (I) * S + (B) + A
 Relative: LA = (PC) + A
Addressing modes
 S = scaling factor
 I = index register
 B = base register
 R = register
 A = contents of an address field in the instruction
 PC = program counter
 SR = segment register
 (X) = contents of X
 LA = linear address
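A hedged sketch of one of the x86 calculations listed above, base with scaled index and displacement: LA = (SR) + (B) + (I) * S + A. The segment base, register contents, and displacement are invented values.

    /* Sketch of LA = (SR) + (B) + (I) * S + A with made-up values. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t segment_base = 0x00400000;   /* (SR): starting address of the segment */
        uint32_t B = 0x1000;                  /* (B): contents of the base register */
        uint32_t I = 5;                       /* (I): contents of the index register */
        uint32_t S = 4;                       /* scaling factor for 4-byte elements */
        uint32_t A = 0x20;                    /* displacement from the instruction */

        uint32_t LA = segment_base + B + I * S + A;   /* linear address */
        printf("LA = 0x%08X\n", (unsigned)LA);        /* 0x00401034 */
        return 0;
    }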
 For the immediate mode, the operand is included in the instruction.
The operand can be a byte, word, or doubleword of data.
 For register operand mode, the operand is located in a register.
For general instructions, such as data transfer, arithmetic, and
logical instructions, the operand can be one of the 32-bit general
registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), one of the 16-
bit general registers (AX, BX, CX, DX, SI, DI, SP, BP), or one of the
8- bit general registers (AH, BH, CH, DH, AL, BL, CL, DL). There
are also some instructions that reference the segment selector
registers (CS, DS, ES, SS, FS, GS).
 The remaining addressing modes reference locations in memory.
The memory location must be specified in terms of the segment
containing the location and the offset from the beginning of the
segment. In some cases, a segment is specified explicitly; in others,
the segment is specified by simple rules that assign a segment by
default.
 In the displacement mode, the operand’s offset in the figure above
is contained as part of the instruction as an 8-, 16-, or 32-bit
displacement.
 With segmentation, all addresses in instructions refer merely to an
offset in a segment.
 The base mode specifies that one of the 8-, 16-, or 32-bit registers contains the effective address. This is equivalent to what we have referred to as register indirect addressing.
 In the base with displacement mode, the instruction includes a displacement to be added to a base register, which may be any of the general-purpose registers. Examples of uses of this mode are as follows:
• Used by a compiler to point to the start of a local
variable area. For example, the base register
could point to the beginning of a stack frame,
which contains the local variables for the
corresponding procedure.
In the scaled index with displacement mode, the instruction includes a displacement to be added to a register, in this case called an index register. The index register may be any of the general-purpose registers except the one called ESP, which is generally used for stack processing. In calculating the effective address, the contents of the index register are multiplied by a scaling factor of 1, 2, 4, or 8, and then added to a displacement. This mode is very convenient for indexing
arrays. A scaling factor of 2 can be used for an array of
16-bit integers. A scaling factor of 4 can be used for 32-
bit integers or floating-point numbers. Finally, a scaling
factor of 8 can be used for an array of double-precision
floating-point numbers.
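The connection between the scaling factor and the element size can be made concrete in C, where array indexing implicitly multiplies the index by the element size. The arrays below are only illustrative.

    /* Why scaling factors 1, 2, 4, 8 match common element sizes. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int32_t a32[10];
        double  a64[10];

        printf("scale for 32-bit integers: %zu\n", sizeof a32[0]);   /* 4 */
        printf("scale for doubles        : %zu\n", sizeof a64[0]);   /* 8 */

        uintptr_t base = (uintptr_t)a32;
        uintptr_t ea   = base + 5 * sizeof a32[0];     /* base + index * scale */
        printf("&a32[5] equals base + 5*4: %d\n", (uintptr_t)&a32[5] == ea);
        return 0;
    }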
 The base with index and displacement mode
sums the contents of the base register, the index
register, and a displacement to form the
effective address. Again, the base register can
be any general-purpose register and the index
register can be any general-purpose register
except ESP. As an example, this addressing
mode could be used for accessing a local array
on a stack frame. This mode can also be used to
support a two-dimensional array; in this case,
the displacement points to the beginning of the
array, and each register handles one dimension
of the array.
 The base with scaled index and displacement mode sums the contents of the index register multiplied by a scaling factor, the contents of the base register,
and the displacement. This is useful if an
array is stored in a stack frame; in this
case, the array elements would be 2, 4, or
8 bytes each in length. This mode also
provides efficient indexing of a two-
dimensional array when the array
elements are 2, 4, or 8 bytes in length.
 Relative addressing can be used in transfer-of-control instructions. A
displacement is added to the value of the
program counter, which points to the next
instruction. In this case, the displacement
is treated as a signed byte, word, or
doubleword value, and that value either
increases or decreases the address in the
program counter.
ARM Addressing Modes
 LOAD/STORE ADDRESSING Load and store instructions are the only instructions that reference
memory. This is always done indirectly through a base
register plus offset. There are three alternatives with
respect to indexing:
 Offset: For this addressing method, indexing is not
used. An offset value is added to or subtracted from the
value in the base register to form the memory address.
As an example, Figure 11.3a illustrates this method with the assembly language instruction STRB r0, [r1, #12]. This is the store byte instruction. In this case, the base address is in register r1 and the displacement is an immediate value of decimal 12. The resulting address (base plus offset) is the location where the least significant byte from r0 is to be stored.
 Preindex: The memory address is formed in the
same way as for offset addressing. The memory
address is also written back to the base register.
In other words, the base register value is
incremented or decremented by the offset value.
Figure below illustrates this method with the
assembly language instruction STRB r0, [r1,
#12]!. The exclamation point signifies
preindexing.
 Postindex: The memory address is the base register value. An offset is added to or subtracted from the base register value and the result is written back to the base register. Figure below
illustrates this method with the assembly
language instruction STRB r0, [r1], #12.
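A hedged sketch comparing the three index alternatives just described, using STRB r0, [r1, #12] as the running example. Only the address arithmetic and the base-register update are modeled; the register values are invented.

    /* Offset vs. preindex vs. postindex address arithmetic (ARM-style). */
    #include <stdio.h>
    #include <stdint.h>

    static void show(const char *name, uint32_t addr, uint32_t r1_after) {
        printf("%-9s stores at 0x%04X, r1 afterwards = 0x%04X\n",
               name, (unsigned)addr, (unsigned)r1_after);
    }

    int main(void) {
        uint32_t r1 = 0x0200;    /* base register */
        uint32_t off = 12;       /* immediate offset */

        show("offset",    r1 + off, r1);        /* STRB r0, [r1, #12]  : r1 unchanged    */
        show("preindex",  r1 + off, r1 + off);  /* STRB r0, [r1, #12]! : r1 written back */
        show("postindex", r1,       r1 + off);  /* STRB r0, [r1], #12  : address is r1   */
        return 0;
    }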
 Note that what ARM refers to as a base register
acts as an index register for preindex and
postindex addressing. The offset value can
either be an immediate value stored in the
instruction or it can be in another register. If the
offset value is in a register, another useful
feature is available: scaled register
addressing. The value in the offset register is scaled by one of the shift operators: Logical Shift Left, Logical Shift Right, Arithmetic Shift Right, Rotate Right, or Rotate Right Extended (which includes the carry bit in the rotation). The amount
of the shift is specified as an immediate value in
the instruction.
 DATA PROCESSING INSTRUCTION
ADDRESSING Data processing instructions use
either register addressing or a mixture of register and immediate addressing. For register
addressing, the value in one of the register
operands may be scaled using one of the five
shift operators defined in the preceding
paragraph.
 BRANCH INSTRUCTIONS The only form of
addressing for branch instructions is immediate
addressing. The branch instruction contains a 24-bit value. For address calculation, this value is shifted left 2 bits, so that the address is on a word boundary. Thus the effective address range is ±32 MB from the program counter.
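A hedged sketch of that branch address calculation: a 24-bit signed field is shifted left 2 bits and added to the program counter. The PC value and branch field below are arbitrary examples.

    /* Sign-extend a 24-bit branch field, shift left 2, add to PC. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t pc = 0x00008000;
        uint32_t field = 0xFFFFFC;                   /* 24-bit two's-complement -4 */

        int32_t offset = (field & 0x800000)          /* sign-extend 24 bits to 32 */
                       ? (int32_t)(field | 0xFF000000u)
                       : (int32_t)field;
        offset *= 4;                                 /* shift left 2 bits: word aligned */

        uint32_t target = pc + (uint32_t)offset;
        printf("branch target = 0x%08X\n", (unsigned)target);   /* pc - 16 = 0x7FF0 */
        return 0;
    }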
PowerPC Addressing Modes
 Load/store architecture
 Indirect
 Instruction includes 16 bit displacement to be added to base
register (may be GP register)
 Can replace base register content with new address
 Indirect indexed
 Instruction references base register and index register (both
may be GP)
 EA is sum of contents
 Branch address
 Absolute
 Relative
 Indirect
 Arithmetic
 Operands in registers or part of instruction
 Floating point is register only
Questions
 Briefly define immediate addressing, direct
addressing, indirect addressing, register
addressing, register indirect addressing,
displacement addressing, relative addressing.
 An address field in an instruction contains
decimal value 14. Where is the corresponding
operand located for :
 immediate addressing?
 direct addressing?
 indirect addressing?
 register addressing?
 register indirect addressing?
PROCESSOR
STRUCTURE
AND FUNCTION
Lesson 3
PROCESSOR ORGANIZATION
To understand the organization of the processor, we
consider the requirements placed on the processor, the
things that it must do:
 Fetch instruction: The processor reads an instruction
from memory (register, cache, main memory).
 Interpret instruction: The instruction is decoded to
determine what action is required
 Fetch data: The execution of an instruction may require
reading data from memory or an I/O module.
 Process data: The execution of an instruction may
require performing some arithmetic or logical operation
on data.
 Write data: The results of an execution may require
writing data to memory or an I/O module
 To do these things, the processor needs
to store some data temporarily. It must
remember the location of the last
instruction so that it can know where to get
the next instruction. It needs to store
instructions and data temporarily while an
instruction is being executed. In other
words, the processor needs a small
internal memory.
Register Organization
 A computer system employs a memory hierarchy. At higher levels of the
hierarchy, memory is faster, smaller, and more expensive (per bit). Within
the processor, there is a set of registers that function as a level of memory
above main memory and cache in the hierarchy. The registers in the
processor perform two roles:
 • User-visible registers: Enable the machine- or assembly language
programmer to minimize main memory references by optimizing use of
registers.
 • Control and status registers: Used by the control unit to control the
operation of the processor and by privileged, operating system programs to
control the execution of programs.
 There is not a clean separation of registers into these two categories. For
example, on some machines the program counter is user visible (e.g., x86),
but on many it is not. For purposes of the following discussion, however, we
will use these categories.
 User-Visible Registers
A user-visible register is one that may be
referenced by means of the machine
language that the processor executes. We
can characterize these in the following
categories:
 General purpose
 Data
 Address
 Condition codes
 General-purpose registers can be assigned to
a variety of functions by the programmer.
Sometimes their use within the instruction set is
orthogonal to the operation. That is, any general-
purpose register can contain the operand for any
opcode. This provides true general-purpose
register use. Often, however, there are
restrictions.
 For example, there may be dedicated registers
for floating-point and stack operations.
 In some cases, these registers can be used for
addressing functions (e.g., register indirect,
displacement). In other cases, there is a partial
or clean separation between data registers and
address registers
 Data registers may be used only to hold data and
cannot be employed in the calculation of an operand
address.
 Address registers may themselves be somewhat
general purpose, or they may be devoted to a particular
addressing mode.
Examples include the following:
 Segment pointers: In a machine with segmented
addressing, a segment register holds the address of the
base of the segment. There may be multiple registers:
for example, one for the operating system and one for
the current process.
 Index registers: These are used for indexed addressing
and may be auto indexed.
 Stack pointer: If there is user-visible stack addressing,
then typically there is a dedicated register that points to
the top of the stack. This allows implicit addressing; that
is, push, pop, and other stack instructions need not
contain an explicit stack operand
 A final category of registers, which is at least
partially visible to the user, holds condition
codes (also referred to as flags). Condition
codes are bits set by the processor hardware as
the result of operations.
Advantages of Condition Codes:
1. Because condition codes are set by normal
arithmetic and data movement instructions, they
should reduce the number of COMPARE and
TEST instructions needed.
2. Conditional instructions, such as BRANCH are
simplified relative to composite instructions,
such as TEST AND BRANCH.
3. Condition codes facilitate multiway branches.
For example, a TEST instruction can be followed
by two branches, one on less than or equal to
zero and one on greater than zero.
Disadvantages of Condition Codes:
1. Condition codes add complexity, both to the hardware and software. Condition code bits are
often modified in different ways by different
instructions, making life more difficult for both the
microprogrammer and compiler writer.
2. Condition codes are irregular; they are typically not
part of the main data path, so they require extra
hardware connections.
3. Often condition code machines must add special
non-condition-code instructions for special
situations anyway, such as bit checking, loop
control, and atomic semaphore operations.
4. In a pipelined implementation, condition codes
require special synchronization to avoid conflicts.
Control and Status Registers
 There are a variety of processor registers
that are employed to control the operation
of the processor. Most of these, on most
machines, are not visible to the user.
Some of them may be visible to machine
instructions executed in a control or
operating system mode
Four registers are essential to instruction
execution:
 Program counter (PC): Contains the address of
an instruction to be fetched
 Instruction register (IR): Contains the
instruction most recently fetched
 Memory address register (MAR): Contains the
address of a location in memory
 Memory buffer register (MBR): Contains a
word of data to be written to memory or the word
most recently read
 Many processor designs include a register
or set of registers, often known as the
program status word (PSW), that contain
status information. The PSW typically
contains condition codes plus other status
information. Common fields or flags
include the following
 Sign: Contains the sign bit of the result of the
last arithmetic operation.
 Zero: Set when the result is 0.
 Carry: Set if an operation resulted in a carry
(addition) into or borrow (subtraction) out of a
high-order bit. Used for multiword arithmetic
operations.
 Equal: Set if a logical compare result is equality.
 Overflow: Used to indicate arithmetic overflow.
 Interrupt Enable/Disable: Used to enable or
disable interrupts.
 Supervisor: Indicates whether the processor is
executing in supervisor or user mode. Certain
privileged instructions can be executed only in
supervisor mode, and certain areas of memory
can be accessed only in supervisor mode.
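A hedged sketch of how several of the PSW flags listed above could be derived after an 8-bit addition. The operand values are examples only; real hardware sets these bits directly.

    /* Deriving Sign, Zero, Carry, and Overflow after an 8-bit add. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint8_t a = 0x02, b = 0x03;
        uint16_t wide = (uint16_t)a + (uint16_t)b;    /* keep the carry-out visible */
        uint8_t  sum  = (uint8_t)wide;

        int sign     = (sum & 0x80) != 0;             /* sign bit of the result */
        int zero     = (sum == 0);                    /* result is 0 */
        int carry    = (wide & 0x100) != 0;           /* carry out of the high-order bit */
        int overflow = (~(a ^ b) & (a ^ sum) & 0x80) != 0;   /* signed overflow */

        printf("sum = 0x%02X  S=%d Z=%d C=%d V=%d\n", sum, sign, zero, carry, overflow);
        return 0;
    }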
Instruction Cycle
An instruction cycle includes the following stages:
 Fetch: Read the next instruction from memory
into the processor.
 Execute: Interpret the opcode and perform the
indicated operation.
 Interrupt: If interrupts are enabled and an
interrupt has occurred, save the current process
state and service the interrupt.
Instruction Cycle diagram
The Indirect Cycle
 The execution of an instruction may involve one or more
operands in memory, each of which requires a memory
access. Further, if indirect addressing is used, then
additional memory accesses are required.
 We can think of the fetching of indirect addresses as one more instruction stage. The main line of activity consists
of alternating instruction fetch and instruction execution
activities. After an instruction is fetched, it is examined to
determine if any indirect addressing is involved. If so, the
required operands are fetched using indirect addressing.
Following execution, an interrupt may be processed
before the next instruction fetch.
Data Flow
 The exact sequence of events during an
instruction cycle depends on the design of the
processor. We can, however, indicate in general
terms what must happen.
 Let us assume a processor that employs a memory address register (MAR), a memory
buffer register (MBR), a program counter (PC),
and an instruction register (IR).
 During the fetch cycle, an instruction is
read from memory. Figure below shows
the flow of data during this cycle. The PC
contains the address of the next
instruction to be fetched. This address is
moved to the MAR and placed on the
address bus
Instruction Cycle State Diagram
Data Flow, Fetch Cycle diagram
 The control unit requests a memory read,
and the result is placed on the data bus
and copied into the MBR and then moved
to the IR. Meanwhile, the PC is
incremented by 1, preparatory for the next
fetch.
 Once the fetch cycle is over, the control
unit examines the contents of the IR to
determine if it contains an operand
specifier using indirect addressing.
 If so, an indirect cycle is performed. The rightmost N bits of the MBR, which contain the
address reference, are transferred to the MAR.
Then the control unit requests a memory read,
to get the desired address of the operand into
the MBR.
 The fetch and indirect cycles are simple and
predictable. The execute cycle takes many
forms; the form depends on which of the various
machine instructions is in the IR. This cycle may
involve transferring data among registers, reading from or writing to memory or I/O, and/or the invocation of the ALU.
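A minimal sketch of the fetch-cycle data flow just described: PC to MAR, memory to MBR to IR, then the PC incremented. The 16-bit word size and toy memory contents are assumptions.

    /* Fetch cycle modeled with the registers named in the text. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint16_t mem[8] = {0x1003, 0x2004, 0x0000, 0x002A};   /* toy program + data */
        uint16_t PC = 0, MAR, MBR, IR;

        MAR = PC;             /* address of the next instruction goes to the MAR  */
        MBR = mem[MAR];       /* control unit requests a memory read into the MBR */
        IR  = MBR;            /* the fetched instruction is moved to the IR       */
        PC  = PC + 1;         /* PC incremented, ready for the next fetch         */

        printf("IR = 0x%04X, PC = %u\n", (unsigned)IR, (unsigned)PC);
        return 0;
    }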
Instruction Pipelining
 Pipelining Strategy
Instruction pipelining is similar to the use of an assembly
line in a manufacturing plant. An assembly line takes
advantage of the fact that a product goes through
various stages of production. By laying the production
process out in an assembly line, products at various
stages can be worked on simultaneously. This process is
also referred to as pipelining, because, as in a pipeline,
new inputs are accepted at one end before previously
accepted inputs appear as outputs at the other end.
 To apply this concept to instruction execution, we must
recognize that, in fact, an instruction has a number of
stages.
Two-Stage Instruction pipeline
Expanded view
 As a simple approach, consider subdividing
instruction processing into two stages: fetch
instruction and execute instruction. There are
times during the execution of an instruction
when main memory is not being accessed. This
time could be used to fetch the next instruction
in parallel with the execution of the current one.
 The pipeline has two independent stages. The
first stage fetches an instruction and buffers it.
When the second stage is free, the first stage
passes it the buffered instruction. While the
second stage is executing the instruction, the
first stage takes advantage of any unused
memory cycles to fetch and buffer the next
instruction. This is called instruction prefetch or
fetch overlap
 In general, pipelining requires registers to
store data between stages. It should be
clear that this process will speed up
instruction execution. If the fetch and
execute stages were of equal duration, the
instruction cycle time would be halved.
However, if we look more closely at this
pipeline, we will see that this doubling of
execution rate is unlikely for two reasons:
1. The execution time will generally be longer than
the fetch time. Execution will involve reading and
storing operands and the performance of some
operation. Thus, the fetch stage may have to
wait for some time before it can empty its buffer.
2. A conditional branch instruction makes the
address of the next instruction to be fetched
unknown. Thus, the fetch stage must wait until it
receives the next instruction address from the
execute stage. The execute stage may then
have to wait while the next instruction is fetched.
 While these factors reduce the potential effectiveness of the two-
stage pipeline, some speedup occurs. To gain further speedup, the
pipeline must have more stages. Let us consider the following
decomposition of the instruction processing.
 Fetch instruction (FI): Read the next expected instruction into a
buffer.
 Decode instruction (DI): Determine the opcode and the operand
specifiers.
 Calculate operands (CO): Calculate the effective address of each
source operand. This may involve displacement, register indirect,
indirect, or other forms of address calculation.
 Fetch operands (FO): Fetch each operand from memory.
Operands in registers need not be fetched.
 Execute instruction (EI): Perform the indicated operation and store
the result, if any, in the specified destination operand location.
 Write operand (WO): Store the result in memory.
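A back-of-the-envelope sketch of ideal timing for the six-stage decomposition above: k stages and n instructions need k + (n - 1) cycles, versus n * k cycles without pipelining (hazards and stage overhead ignored).

    /* Ideal pipeline timing for k stages and n instructions. */
    #include <stdio.h>

    int main(void) {
        int k = 6;                       /* FI, DI, CO, FO, EI, WO */
        int n = 9;                       /* number of instructions, chosen arbitrarily */

        int pipelined  = k + (n - 1);
        int sequential = n * k;

        printf("sequential: %d cycles, pipelined: %d cycles, speedup ~ %.2f\n",
               sequential, pipelined, (double)sequential / pipelined);
        return 0;
    }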
Six-stage CPU Instruction Pipeline
An alternative Pipeline Depiction
 Some of the IBM S/360 designers pointed out two factors that
frustrate this seemingly simple pattern for high performance design
[ANDE67a], and they remain elements that designers must still consider:
1. At each stage of the pipeline, there is some overhead involved in
moving data from buffer to buffer and in performing various
preparation and delivery functions. This overhead can appreciably
lengthen the total execution time of a single instruction. This is
significant when sequential instructions are logically dependent,
either through heavy use of branching or through memory access
dependencies.
2. The amount of control logic required to handle memory and register
dependencies and to optimize the use of the pipeline increases
enormously with the number of stages. This can lead to a situation
where the logic controlling the gating between stages is more
complex than the stages being controlled.
Another consideration is latching delay: It takes time for pipeline buffers
to operate and this adds to instruction cycle time. Instruction
pipelining is a powerful technique for enhancing performance but
requires careful design to achieve optimum results with reasonable
complexity
Pipeline Hazards
 A pipeline hazard occurs when the pipeline, or
some portion of the pipeline, must stall because
conditions do not permit continued execution.
Such a pipeline stall is also referred to as a
pipeline bubble. There are three types of
hazards:
 resource,
 data,
 control
 RESOURCE HAZARDS A resource hazard occurs when
two (or more) instructions that are already in the pipeline
need the same resource. The result is that the
instructions must be executed in serial rather than
parallel for a portion of the pipeline. A resource hazard is sometimes referred to as a structural hazard.
 DATA HAZARDS A data hazard occurs when there is a
conflict in the access of an operand location. In general
terms, we can state the hazard in this form: Two
instructions in a program are to be executed in sequence
and both access a particular memory or register
operand. If the two instructions are executed in strict
sequence, no problem occurs. However, if the
instructions are executed in a pipeline, then it is possible
for the operand value to be updated in such a way as to
produce a different result than would occur with strict
sequential execution. In other words, the program
produces an incorrect result because of the use of
pipelining.
There are three types of data hazards;
 Read after write (RAW), or true dependency: An
instruction modifies a register or memory location and a
succeeding instruction reads the data in that memory or
register location. A hazard occurs if the read takes place
before the write operation is complete.
 Write after read (WAR), or antidependency: An
instruction reads a register or memory location and a
succeeding instruction writes to the location. A hazard
occurs if the write operation completes before the read
operation takes place.
 Write after write (WAW), or output dependency: Two
instructions both write to the same location. A hazard
occurs if the write operations take place in the reverse
order of the intended sequence
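A hedged sketch of the three data-hazard patterns above, written as ordinary C statements on shared variables. Executed sequentially, as here, the results are correct; the hazards arise only when such pairs overlap inside a pipeline.

    /* RAW, WAR, and WAW dependency patterns. */
    #include <stdio.h>

    int main(void) {
        int x = 0, y = 0;

        x = 5;          /* write */
        y = x + 1;      /* read  -> RAW (true dependency): the read must see the 5 */

        y = x + 2;      /* read  */
        x = 9;          /* write -> WAR (antidependency): the write must not finish first */

        x = 1;          /* write */
        x = 2;          /* write -> WAW (output dependency): the final value must be 2 */

        printf("x = %d, y = %d\n", x, y);   /* x = 2, y = 7 */
        return 0;
    }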
 CONTROL HAZARDS A control hazard,
also known as a branch hazard, occurs
when the pipeline makes the wrong
decision on a branch prediction and
therefore brings instructions into the
pipeline that must subsequently be
discarded. We discuss approaches to
dealing with control hazards next.
Dealing with Branches
 One of the major problems in designing an instruction
pipeline is assuring a steady flow of instructions to the
initial stages of the pipeline. The primary impediment, as
we have seen, is the conditional branch instruction. Until
the instruction is actually executed, it is impossible to
determine whether the branch will be taken or not. A
variety of approaches have been taken for dealing with
conditional branches:
 Multiple streams
 Prefetch branch target
 Loop buffer
 Branch prediction
 Delayed branch
 MULTIPLE STREAMS A simple pipeline suffers
a penalty for a branch instruction because it
must choose one of two instructions to fetch next
and may make the wrong choice. A brute-force
approach is to replicate the initial portions of the
pipeline and allow the pipeline to fetch both
instructions, making use of two streams. There
are two problems with this approach:
 With multiple pipelines there are contention
delays for access to the registers and to
memory.
 Additional branch instructions may enter the
pipeline (either stream) before the original
branch decision is resolved. Each such
instruction needs an additional stream
 PREFETCH BRANCH TARGET When a
conditional branch is recognized, the
target of the branch is prefetched, in
addition to the instruction following the
branch. This target is then saved until the
branch instruction is executed. If the
branch is taken, the target has already
been prefetched. The IBM 360/91 uses
this approach
 LOOP BUFFER A loop buffer is a small, very-
high-speed memory maintained by the
instruction fetch stage of the pipeline and
containing the n most recently fetched
instructions, in sequence. If a branch is to be
taken, the hardware first checks whether the
branch target is within the buffer. If so, the next
instruction is fetched from the buffer. The loop buffer has three benefits:
 1. With the use of prefetching, the loop buffer will contain some instructions sequentially ahead of the current instruction fetch address. Thus, instructions fetched in sequence will be available without the usual memory access time.
2. If a branch occurs to a target just a few locations
ahead of the address of the branch instruction,
the target will already be in the buffer. This is
useful for the rather common occurrence of IF–
THEN and IF–THEN–ELSE sequences.
3. This strategy is particularly well suited to dealing
with loops, or iterations; hence the name loop
buffer. If the loop buffer is large enough to
contain all the instructions in a loop, then those
instructions need to be fetched from memory
only once, for the first iteration. For subsequent
iterations, all the needed instructions are already
in the buffer.
 BRANCH PREDICTION Various
techniques can be used to predict whether
a branch will be taken. Among the more
common are the following:
 Predict never taken
 Predict always taken
 Predict by opcode
 Taken/not taken switch
 Branch history table
Intel 80486 Pipelining
An instructive example of an instruction pipeline is that of
the Intel 80486. The 80486 implements a five-stage
pipeline:
 Fetch: Instructions are fetched from the cache or from
external memory and placed into one of the two 16-byte prefetch buffers. The objective of the fetch stage is to fill
the prefetch buffers with new data as soon as the old
data have been consumed by the instruction decoder.
Because instructions are of variable length (from 1 to 11
bytes not counting prefixes), the status of the prefetcher
relative to the other pipeline stages varies from
instruction to instruction. On average, about five
instructions are fetched with each 16-byte load [CRAW90]. The fetch stage operates independently of the other stages to keep the prefetch buffers full.
 Decode stage 1: All opcode and addressing-mode
information is decoded in the D1 stage. The required
information, as well as instruction-length information, is
included in at most the first 3 bytes of the instruction.
Hence, 3 bytes are passed to the D1 stage from the
prefetch buffers. The D1 decoder can then direct the D2
stage to capture the rest of the instruction (displacement
and immediate data), which is not involved in the D1
decoding.
 Decode stage 2: The D2 stage expands each opcode
into control signals for the ALU. It also controls the
computation of the more complex addressing modes.
 Execute: This stage includes ALU operations, cache
access, and register update.
 Write back: This stage, if needed, updates registers and
status flags modified during the preceding execute
stage. If the current instruction updates memory, the
computed value is sent to the cache and to the bus-
interface write buffers at the same time.
Course work
questions
1. List and briefly explain five important instruction set
design issues
2. What is the difference between an arithmetic shift and
a logical shift?
3. Suppose we wished to apply the x86 CMP instruction to
32-bit operands that contained numbers in a floating-
point format. For correct results, what requirements
have to be met in the following areas?
a. The relative position of the significand, sign, and
exponent fields.
b. The representation of the value zero.
c. The representation of the exponent.
d. Does the IEEE format meet these requirements?
Explain.
4. Briefly define:
 immediate addressing.
 direct addressing.
 indirect addressing.
 register addressing.
 register indirect addressing.
 displacement addressing.
 relative addressing
5. If the last operation performed on a computer
with an 8-bit word was an addition in which the
two operands were 00000010 and 00000011,
what would be the value of the following flags?
 Carry
 Zero
 Overflow
 Sign
 Even Parity
 Half-Carry
6. a. What are some typical characteristics of a
RISC instruction set architecture?
b. What is a delayed branch?
7. In many cases, common machine instructions that are
not listed as part of the MIPS instruction set can be
synthesized with a single MIPS instruction. Show this for
the following:
a. Register-to-register move
b. Increment, decrement
c. Complement
d. Negate
e. Clear
8. Briefly define the following terms:
 True data dependency
 Procedural dependency
 Resource conflicts
 Output dependency
 Antidependency
9.a. Explain the distinction between the
written sequence and the time sequence
of an instruction.
b. What is the relationship between
instructions and micro-operations?
10. Your ALU can add its two input
registers, and it can logically complement
the bits of either input register, but it
cannot subtract. Numbers are to be stored
in two’s complement representation. List
the micro-operations your control unit
must perform to cause a subtraction

Advanced computer architect lesson 3 and 4

  • 1.
  • 2.
  • 3.
    Addressing Modes Addressing modesare illustrated in the figure above and we use the following notation:  (X) = contents of memory location X or register X  EA = actual (effective) address of the location containing the referenced operand  R = contents of an address field in the instruction that refers to a register  A = contents of an address field in the instruction
  • 4.
    Basic Addressing Modes Immediate Thesimplest form of addressing is immediate addressing, in which the operand value is present in the instruction Operand = A This mode can be used to define and use constants or set initial values of variables  Merit : No memory reference  Demerit: Limited operand magnitude
  • 5.
    Basic Addressing Modes Direct: A very simple form of addressing is direct addressing, in which the address field contains the effective address of the operand: EA = A  It requires only one memory reference and no special calculation. Merit: Simple Demerit: Limited address space
  • 6.
    Indirect Addressing (Addressing Modes) In direct addressing, the length of the address field is usually less than the word length, thus limiting the address range. One solution is to have the address field refer to the address of a word in memory, which in turn contains a full- length address of the operand hence Indirect Addressing Algorithm: EA = (A). Merit: Large address space  Demerit: Multiple memory references (instruction execution requires two memory references to fetch the operand: one to get its address and a second to get its value.)
  • 7.
    Register Addressing It issimilar to direct addressing but the only difference is that the address field refers to a register rather than a main memory address: Algorithm: EA = R Merit: No memory reference Demerit: Limited address space
  • 8.
    Register Indirect Addressing Registeraddressing is analogous to direct addressing as register indirect addressing is analogous to indirect addressing. In both cases, the only difference is whether the address field refers to a memory location or a register. Thus, for register indirect address Algorithm: EA = (R) Merit: Large address space Demerit: Extra memory reference In both cases, the address space limitation (limited range of addresses) of the address field is overcome by having that field refer to a wordlength location containing an address. In addition, register indirect addressing uses one less memory reference than indirect addressing.
  • 9.
    Displacement Addressing It isa very powerful mode of addressing combines the capabilities of direct addressing and register indirect addressing. Algorithm: EA = A + (R)  Displacement addressing requires that the instruction have two address fields, at least one of which is explicit. The value contained in one address field (value = A) is used directly.The other address field, or an implicit reference based on opcode, refers to a register whose contents are added to A to produce the effective address. Merit: Flexibility Demerit: Complexity
  • 10.
    Stack Addressing  astack is a linear array of locations. It is sometimes referred to as a pushdown list or last-in-first-out queue. The stack is a reserved block of locations. Items are appended to the top of the stack so that, at any given time, the block is partially filled. Associated with the stack is a pointer whose value is the address of the top of the stack. Alternatively, the top two elements of the stack may be in processor registers, in which case the stack pointer references the third element of the stack. The stack mode of addressing is a form of implied addressing. The machine instructions need not include a memory reference but implicitly operate on the top of the stack.  Merit: No memory reference  Demerit: Limited applicability
  • 11.
    Summary  Immediate :Valueof operand is included in the instruction  Direct (Absolute): Address of operand is included in the instruction  Register indirect: Operand is in a memory location whose address is in the register specified in the instruction  Memory indirect: Operand is in a memory location whose address is in the memory location specified in the instruction  Indexed : Address of operand is the sum of an index value and the contents of an index register  Relative: Address of operand is the sum of an index value and the contents of the program counter  Autoincrement : Address of operand is in a register whose value is incremented after fetching the operand  Autodecrement: Address of operand is in a register whose value is decremented before fetching the operand
  • 12.
    Pentium Addressing Modes Virtual or effective address is offset into segment  Starting address plus offset gives linear address  This goes through page translation if paging enabled  Addressing modes available  Immediate  Register operand  Displacement  Base  Base with displacement  Scaled index with displacement  Base with index and displacement  Base scaled index with displacement  Relative
  • 13.
    Immediate Addressing  Immediateaddressing is so-named because the value to be referenced immediately follows the operation code in the instruction. That is to say, the data to be operated on is part of the instruction. The 12 bits of the operand field do not specify an address-they specify the actual operand the instruction requires. Immediate addressing is very fast because the value to be loaded is included in the instruction. However, because the value to be loaded is fixed at compile time it is not very flexible.
  • 14.
    Direct Addressing  Directaddressing is so-named because the value to be referenced is obtained by specifying its memory address directly in the instruction. Direct addressing is typically quite fast because, although the value to be loaded is not included in the instruction, it is quickly accessible. It is also much more flexible than immediate addressing because the value to be loaded is whatever is found at the given address, which may be variable.
  • 15.
    Register Addressing  Inregister addressing, a register, instead of memory, is used to specify the operand. This is very similar to direct addressing, except that instead of a memory address, the address field contains a register reference. The contents of that register are used as the operand
  • 16.
    Indirect Addressing  Indirectaddressing is a very powerful addressing mode that provides an exceptional level of flexibility. In this mode, the bits in the address field specify a memory address that is to be used as a pointer. The effective address of the operand is found by going to this memory address.  In a variation on this scheme, the operand bits specify a register instead of a memory address. This mode, known as register indirect addressing, works exactly the same way as indirect addressing mode, except it uses a register instead of a memory address to point to the data. For example, if the instruction is Load R1 and we are using register indirect addressing mode, we would find the effective address of the desired operand in R1.
  • 17.
    Indexed and BasedAddressing  In indexed addressing mode, an index register (either explicitly or implicitly designated) is used to store an offset (or displacement), which is added to the operand, resulting in the effective address of the data.  Based addressing mode is similar, except a base address register, rather than an index register, is used. In theory, the difference between these two modes is in how they are used, not how the operands are computed. An index register holds an index that is used as an offset, relative to the address given in the address field of the instruction. A base register holds a base address, where the address field represents a displacement from this base. These two addressing modes are quite useful for accessing array elements as well as characters in strings. In fact, most assembly languages provide special index registers that are implied in many string operations. Depending on the instruction-set design, general- purpose registers may also be used in this mode.
  • 18.
  • 19.
     There aresix segment registers; the one being used for a particular reference depends on the context of execution and the instruction. Associated with each user-visible segment register is a segment descriptor register (not programmer visible), which records the access rights for the segment as well as the starting address and limit (length) of the segment. In addition, there are two registers that may be used in constructing an address: the base register and the index register
  • 20.
    X86 Addressing modes(Mode and Algorithm)  Immediate: Operand = A  Register Operand : LA = R  Displacement: LA = (SR) + A  Base: LA = (SR) + (B)  Base with Displacement: LA = (SR) + (B) + A  Scaled Index with Displacement: LA = (SR) + (I) * S + A  Base with Index and Displacement: LA = (SR) + (B) + (I) + A  Base with Scaled Index and Displacement : LA = (SR) + (I) * S + (B) + A  Relative: LA = (PC) + A
  • 21.
    Addressing modes  S= scaling factor  I = index register  B = base register  R = register  A = contents of an address field in the instruction  PC = program counter  SR = segment register  (X) = contents of X  LA = linear address
  • 22.
     For theimmediate mode, the operand is included in the instruction. The operand can be a byte, word, or doubleword of data.  For register operand mode, the operand is located in a register. For general instructions, such as data transfer, arithmetic, and logical instructions, the operand can be one of the 32-bit general registers (EAX, EBX, ECX,EDX,ESI, EDI, ESP, EBP), one of the 16- bit general registers (AX, BX, CX, DX, SI, DI, SP, BP), or one of the 8- bit general registers (AH, BH, CH, DH, AL, BL, CL, DL). There are also some instructions that reference the segment selector registers (CS, DS, ES, SS, FS, GS).  The remaining addressing modes reference locations in memory. The memory location must be specified in terms of the segment containing the location and the offset from the beginning of the segment. In some cases, a segment is specified explicitly; in others, the segment is specified by simple rules that assign a segment by default.  In the displacement mode, the operand’s offset in the figure above is contained as part of the instruction as an 8-, 16-, or 32-bit displacement.  With segmentation, all addresses in instructions refer merely to an offset in a segment.
  • 23.
     The basemode specifies that one of the 8-, 16-, or 32-bit registers contains the effective address.This is equivalent to what we have referred to as register indirect addressing.  The base with displacement mode, the instruction includes a displacement to be added to a base register, which may be any of the general-purpose registers. Examples of uses of this mode are as follows: • Used by a compiler to point to the start of a local variable area. For example, the base register could point to the beginning of a stack frame, which contains the local variables for the corresponding procedure.
  • 24.
    The scaled indexwith displacement mode, the instruction includes a displacement to be added to a register, in this case called an index register.The index register may be any of the general-purpose registers except the one called ESP, which is generally used for stack processing. In calculating the effective address, the contents of the index register are multiplied by a scaling factor of 1, 2, 4, or 8, and then added to a displacement.This mode is very convenient for indexing arrays. A scaling factor of 2 can be used for an array of 16-bit integers. A scaling factor of 4 can be used for 32- bit integers or floating-point numbers. Finally, a scaling factor of 8 can be used for an array of double-precision floating-point numbers.
  • 25.
     The basewith index and displacement mode sums the contents of the base register, the index register, and a displacement to form the effective address. Again, the base register can be any general-purpose register and the index register can be any general-purpose register except ESP. As an example, this addressing mode could be used for accessing a local array on a stack frame. This mode can also be used to support a two-dimensional array; in this case, the displacement points to the beginning of the array, and each register handles one dimension of the array.
  • 26.
     The basedscaled index with displacement mode sums the contents of the index register multiplied by a scaling factor, the contents of the base register, and the displacement. This is useful if an array is stored in a stack frame; in this case, the array elements would be 2, 4, or 8 bytes each in length. This mode also provides efficient indexing of a two- dimensional array when the array elements are 2, 4, or 8 bytes in length.
  • 27.
     relative addressingcan be used in transfer-of-control instructions. A displacement is added to the value of the program counter, which points to the next instruction. In this case, the displacement is treated as a signed byte, word, or doubleword value, and that value either increases or decreases the address in the program counter.
  • 28.
     LOAD/STORE ADDRESSINGLoad and store instructions are the only instructions that reference memory. This is always done indirectly through a base register plus offset. There are three alternatives with respect to indexing:  Offset: For this addressing method, indexing is not used. An offset value is added to or subtracted from the value in the base register to form the memory address. As an example Figure 11.3a illustrates this method with the assembly language instruction STRB r0, [r1, #12]. This is the store byte instruction. In this case the base address is in register r1 and the displacement is an immediate value of decimal 12. The resulting address (base plus offset) is the location where the least significant byte from r0 is to be stored
  • 29.
     Preindex: Thememory address is formed in the same way as for offset addressing. The memory address is also written back to the base register. In other words, the base register value is incremented or decremented by the offset value. Figure below illustrates this method with the assembly language instruction STRB r0, [r1, #12]!. The exclamation point signifies preindexing.  Postindex:The memory address is the base register value.An offset is added to or subtracted from the base register value and the result is written back to the base register. Figure below illustrates this method with the assembly language instruction STRB r0, [r1], #12.
  • 30.
     Note thatwhat ARM refers to as a base register acts as an index register for preindex and postindex addressing. The offset value can either be an immediate value stored in the instruction or it can be in another register. If the offset value is in a register, another useful feature is available: scaled register addressing.The value in the offset register is scaled by one of the shift operators: Logical Shift Left, Logical Shift Right, Arithmetic Shift Right, Rotate Right, or Rotate Right Extended (which includes the carry bit in the rotation).The amount of the shift is specified as an immediate value in the instruction.
  • 31.
     DATA PROCESSINGINSTRUCTION ADDRESSING Data processing instructions use either register addressing of a mixture of register and immediate addressing. For register addressing, the value in one of the register operands may be scaled using one of the five shift operators defined in the preceding paragraph.  BRANCH INSTRUCTIONS The only form of addressing for branch instructions is immediate addressing.The branch instruction contains a 24-bit value. For address calculation, this value is shifted left 2 bits, so that the address is on a word boundary. Thus the effective address range is ;32 MB from the program counter.
  • 32.
    PowerPC Addressing Modes Load/store architecture  Indirect  Instruction includes 16 bit displacement to be added to base register (may be GP register)  Can replace base register content with new address  Indirect indexed  Instruction references base register and index register (both may be GP)  EA is sum of contents  Branch address  Absolute  Relative  Indirect  Arithmetic  Operands in registers or part of instruction  Floating point is register only
  • 33.
    Questions  Briefly defineimmediate addressing, direct addressing, indirect addressing, register addressing, register indirect addressing, displacement addressing, relative addressing.  An address field in an instruction contains decimal value 14. Where is the corresponding operand located for :  immediate addressing?  direct addressing?  indirect addressing?  register addressing?  register indirect addressing?
  • 34.
  • 35.
    PROCESSOR ORGANIZATION To understandthe organization of the processor, we consider the requirements placed on the processor, the things that it must do:  Fetch instruction: The processor reads an instruction from memory (register, cache, main memory).  Interpret instruction: The instruction is decoded to determine what action is required  Fetch data: The execution of an instruction may require reading data from memory or an I/O module.  Process data: The execution of an instruction may require performing some arithmetic or logical operation on data.  Write data: The results of an execution may require writing data to memory or an I/O module
 To do these things, the processor needs to store some data temporarily. It must remember the location of the last instruction so that it can know where to get the next instruction. It needs to store instructions and data temporarily while an instruction is being executed. In other words, the processor needs a small internal memory.
Register Organization
 A computer system employs a memory hierarchy. At higher levels of the hierarchy, memory is faster, smaller, and more expensive (per bit). Within the processor, there is a set of registers that function as a level of memory above main memory and cache in the hierarchy. The registers in the processor perform two roles:
 User-visible registers: Enable the machine- or assembly-language programmer to minimize main memory references by optimizing use of registers.
 Control and status registers: Used by the control unit to control the operation of the processor and by privileged operating system programs to control the execution of programs.
 There is not a clean separation of registers into these two categories. For example, on some machines the program counter is user visible (e.g., x86), but on many it is not. For purposes of the following discussion, however, we will use these categories.
User-Visible Registers
A user-visible register is one that may be referenced by means of the machine language that the processor executes. We can characterize these in the following categories:
 General purpose
 Data
 Address
 Condition codes
 General-purpose registers can be assigned to a variety of functions by the programmer. Sometimes their use within the instruction set is orthogonal to the operation. That is, any general-purpose register can contain the operand for any opcode. This provides true general-purpose register use. Often, however, there are restrictions.
 For example, there may be dedicated registers for floating-point and stack operations.
 In some cases, these registers can be used for addressing functions (e.g., register indirect, displacement). In other cases, there is a partial or clean separation between data registers and address registers.
 Data registers may be used only to hold data and cannot be employed in the calculation of an operand address.
 Address registers may themselves be somewhat general purpose, or they may be devoted to a particular addressing mode. Examples include the following:
 Segment pointers: In a machine with segmented addressing, a segment register holds the address of the base of the segment. There may be multiple registers: for example, one for the operating system and one for the current process.
 Index registers: These are used for indexed addressing and may be autoindexed.
 Stack pointer: If there is user-visible stack addressing, then typically there is a dedicated register that points to the top of the stack. This allows implicit addressing; that is, push, pop, and other stack instructions need not contain an explicit stack operand (see the sketch below).
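As a sketch of why a dedicated stack pointer permits implicit addressing, the following Python fragment (the growth direction, initial SP value, and one-value-per-location memory are assumptions made for illustration) models PUSH and POP with no explicit memory operand:

    memory = {}
    SP = 0x1000              # dedicated stack-pointer register; stack grows downward

    def push(value):
        global SP
        SP -= 1              # the instruction names no address: SP supplies it
        memory[SP] = value

    def pop():
        global SP
        value = memory[SP]
        SP += 1
        return value

    push(7)
    push(9)
    print(pop(), pop())      # 9 7 -- last in, first out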
 A final category of registers, which is at least partially visible to the user, holds condition codes (also referred to as flags). Condition codes are bits set by the processor hardware as the result of operations.
Advantages of condition codes:
1. Because condition codes are set by normal arithmetic and data movement instructions, they should reduce the number of COMPARE and TEST instructions needed.
2. Conditional instructions, such as BRANCH, are simplified relative to composite instructions, such as TEST AND BRANCH.
3. Condition codes facilitate multiway branches. For example, a TEST instruction can be followed by two branches, one on less than or equal to zero and one on greater than zero.
Disadvantages of condition codes:
1. Condition codes add complexity, both to the hardware and the software. Condition code bits are often modified in different ways by different instructions, making life more difficult for both the microprogrammer and the compiler writer.
2. Condition codes are irregular; they are typically not part of the main data path, so they require extra hardware connections.
3. Often condition code machines must add special non-condition-code instructions for special situations anyway, such as bit checking, loop control, and atomic semaphore operations.
4. In a pipelined implementation, condition codes require special synchronization to avoid conflicts.
Control and Status Registers
 There are a variety of processor registers that are employed to control the operation of the processor. Most of these, on most machines, are not visible to the user. Some of them may be visible to machine instructions executed in a control or operating system mode.
Four registers are essential to instruction execution:
 Program counter (PC): Contains the address of an instruction to be fetched.
 Instruction register (IR): Contains the instruction most recently fetched.
 Memory address register (MAR): Contains the address of a location in memory.
 Memory buffer register (MBR): Contains a word of data to be written to memory or the word most recently read.
 Many processor designs include a register or set of registers, often known as the program status word (PSW), that contain status information. The PSW typically contains condition codes plus other status information. Common fields or flags include the following:
 Sign: Contains the sign bit of the result of the last arithmetic operation.
 Zero: Set when the result is 0.
 Carry: Set if an operation resulted in a carry (addition) into or borrow (subtraction) out of a high-order bit. Used for multiword arithmetic operations.
 Equal: Set if a logical compare result is equality.
 Overflow: Used to indicate arithmetic overflow.
 Interrupt Enable/Disable: Used to enable or disable interrupts.
 Supervisor: Indicates whether the processor is executing in supervisor or user mode. Certain privileged instructions can be executed only in supervisor mode, and certain areas of memory can be accessed only in supervisor mode.
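The arithmetic flags above can be illustrated with a small Python sketch (the 8-bit width and two's-complement interpretation are assumptions; the Equal, Interrupt, and Supervisor fields are not modeled):

    def psw_after_add(a, b, width=8):
        mask = (1 << width) - 1
        sign_bit = 1 << (width - 1)
        raw = a + b
        result = raw & mask
        return result, {
            "Sign": bool(result & sign_bit),
            "Zero": result == 0,
            "Carry": raw > mask,
            # Overflow: the operands share a sign but the result's sign differs
            "Overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
        }

    print(psw_after_add(0b01111111, 0b00000001))   # 127 + 1: Sign and Overflow set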
Instruction Cycle
An instruction cycle includes the following stages:
 Fetch: Read the next instruction from memory into the processor.
 Execute: Interpret the opcode and perform the indicated operation.
 Interrupt: If interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt.
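A minimal Python sketch of this fetch-execute-interrupt loop (the instruction format, opcodes, and memory contents are invented purely for illustration):

    memory = {0: ("LOAD", 10), 1: ("ADD", 11), 2: ("HALT", None), 10: 5, 11: 7}
    acc, pc, running, interrupt_pending = 0, 0, True, False

    while running:
        opcode, operand = memory[pc]      # Fetch: read the next instruction
        pc += 1
        if opcode == "LOAD":              # Execute: interpret and perform
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "HALT":
            running = False
        if interrupt_pending:             # Interrupt: save state and service it
            pass                          # (not modeled in this sketch)

    print(acc)                            # 12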
The Indirect Cycle
The execution of an instruction may involve one or more operands in memory, each of which requires a memory access. Further, if indirect addressing is used, then additional memory accesses are required.
 We can think of the fetching of indirect addresses as one more instruction stage. The main line of activity consists of alternating instruction fetch and instruction execution activities. After an instruction is fetched, it is examined to determine if any indirect addressing is involved. If so, the required operands are fetched using indirect addressing. Following execution, an interrupt may be processed before the next instruction fetch.
Data Flow
 The exact sequence of events during an instruction cycle depends on the design of the processor. We can, however, indicate in general terms what must happen.
 Let us assume a processor that employs a memory address register (MAR), a memory buffer register (MBR), a program counter (PC), and an instruction register (IR).
 During the fetch cycle, an instruction is read from memory. The figure below shows the flow of data during this cycle. The PC contains the address of the next instruction to be fetched. This address is moved to the MAR and placed on the address bus.
Data Flow, Fetch Cycle (diagram)
 The control unit requests a memory read, and the result is placed on the data bus and copied into the MBR and then moved to the IR. Meanwhile, the PC is incremented by 1 in preparation for the next fetch.
 Once the fetch cycle is over, the control unit examines the contents of the IR to determine if it contains an operand specifier using indirect addressing.
 If so, an indirect cycle is performed. The rightmost N bits of the MBR, which contain the address reference, are transferred to the MAR. Then the control unit requests a memory read, to get the desired address of the operand into the MBR.
 The fetch and indirect cycles are simple and predictable. The execute cycle takes many forms; the form depends on which of the various machine instructions is in the IR. This cycle may involve transferring data among registers, reading from or writing to memory or I/O, and/or the invocation of the ALU.
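The register transfers described above can be sketched as follows (Python; the encoding of an instruction as an (opcode, address) pair, the word-addressed memory, and the PC increment of one word per instruction are illustrative assumptions):

    memory = {100: ("ADD_INDIRECT", 200), 200: 300, 300: 42}
    PC, MAR, MBR, IR = 100, None, None, None

    # Fetch cycle: PC -> MAR -> address bus; memory read -> MBR -> IR; PC incremented
    MAR = PC
    MBR = memory[MAR]
    IR = MBR
    PC += 1

    # Indirect cycle: the address field holds the address of the operand's address,
    # so one more memory read is needed before execution.
    opcode, address_field = IR
    if opcode.endswith("_INDIRECT"):
        MAR = address_field
        MBR = memory[MAR]                 # MBR now holds the operand's actual address

    print(memory[MBR])                    # 42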
Instruction Pipelining
 Pipelining Strategy: Instruction pipelining is similar to the use of an assembly line in a manufacturing plant. An assembly line takes advantage of the fact that a product goes through various stages of production. By laying the production process out in an assembly line, products at various stages can be worked on simultaneously. This process is also referred to as pipelining, because, as in a pipeline, new inputs are accepted at one end before previously accepted inputs appear as outputs at the other end.
 To apply this concept to instruction execution, we must recognize that, in fact, an instruction has a number of stages.
 As a simple approach, consider subdividing instruction processing into two stages: fetch instruction and execute instruction. There are times during the execution of an instruction when main memory is not being accessed. This time could be used to fetch the next instruction in parallel with the execution of the current one.
 The pipeline has two independent stages. The first stage fetches an instruction and buffers it. When the second stage is free, the first stage passes it the buffered instruction. While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction. This is called instruction prefetch or fetch overlap.
 In general, pipelining requires registers to store data between stages. It should be clear that this process will speed up instruction execution. If the fetch and execute stages were of equal duration, the instruction cycle time would be halved. However, if we look more closely at this pipeline, we will see that this doubling of execution rate is unlikely, for two reasons:
1. The execution time will generally be longer than the fetch time. Execution will involve reading and storing operands and the performance of some operation. Thus, the fetch stage may have to wait for some time before it can empty its buffer.
2. A conditional branch instruction makes the address of the next instruction to be fetched unknown. Thus, the fetch stage must wait until it receives the next instruction address from the execute stage. The execute stage may then have to wait while the next instruction is fetched.
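Reason 1 can be quantified with a small sketch (Python; the stage times and instruction count are arbitrary illustrative units): when execute takes twice as long as fetch, the two-stage pipeline gives roughly a 1.5x speedup rather than 2x.

    def sequential_time(n, fetch, execute):
        return n * (fetch + execute)

    def two_stage_pipeline_time(n, fetch, execute):
        stage = max(fetch, execute)       # the pipeline moves at the pace of its slowest stage
        return (fetch + execute) + (n - 1) * stage

    n, fetch, execute = 100, 1, 2
    print(sequential_time(n, fetch, execute))              # 300
    print(two_stage_pipeline_time(n, fetch, execute))      # 201
    print(sequential_time(n, fetch, execute) /
          two_stage_pipeline_time(n, fetch, execute))      # about 1.5x, not 2x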
 While these factors reduce the potential effectiveness of the two-stage pipeline, some speedup occurs. To gain further speedup, the pipeline must have more stages. Let us consider the following decomposition of the instruction processing:
 Fetch instruction (FI): Read the next expected instruction into a buffer.
 Decode instruction (DI): Determine the opcode and the operand specifiers.
 Calculate operands (CO): Calculate the effective address of each source operand. This may involve displacement, register indirect, indirect, or other forms of address calculation.
 Fetch operands (FO): Fetch each operand from memory. Operands in registers need not be fetched.
 Execute instruction (EI): Perform the indicated operation and store the result, if any, in the specified destination operand location.
 Write operand (WO): Store the result in memory.
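Assuming equal stage times and no hazards or branches (simplifications the following slides immediately qualify), the cycle count for this six-stage pipeline works out as in the sketch below:

    STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

    def unpipelined_cycles(n, k=len(STAGES)):
        return n * k                      # every instruction runs all k stages alone

    def pipelined_cycles(n, k=len(STAGES)):
        return k + (n - 1)                # first result after k cycles, then one per cycle

    for n in (1, 9, 100):
        print(n, unpipelined_cycles(n), pipelined_cycles(n))
    # e.g., 9 instructions: 54 cycles unpipelined versus 14 cycles pipelined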
 Some of the IBM S/360 designers pointed out two factors that frustrate this seemingly simple pattern for high-performance design [ANDE67a], and they remain elements that designers must still consider:
1. At each stage of the pipeline, there is some overhead involved in moving data from buffer to buffer and in performing various preparation and delivery functions. This overhead can appreciably lengthen the total execution time of a single instruction. This is significant when sequential instructions are logically dependent, either through heavy use of branching or through memory access dependencies.
2. The amount of control logic required to handle memory and register dependencies and to optimize the use of the pipeline increases enormously with the number of stages. This can lead to a situation where the logic controlling the gating between stages is more complex than the stages being controlled.
Another consideration is latching delay: it takes time for the pipeline buffers to operate, and this adds to instruction cycle time. Instruction pipelining is a powerful technique for enhancing performance but requires careful design to achieve optimum results with reasonable complexity.
Pipeline Hazards
 A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must stall because conditions do not permit continued execution. Such a pipeline stall is also referred to as a pipeline bubble. There are three types of hazards:
 Resource
 Data
 Control
 RESOURCE HAZARDS: A resource hazard occurs when two (or more) instructions that are already in the pipeline need the same resource. The result is that the instructions must be executed in serial rather than in parallel for a portion of the pipeline. A resource hazard is sometimes referred to as a structural hazard.
 DATA HAZARDS: A data hazard occurs when there is a conflict in the access of an operand location. In general terms, we can state the hazard in this form: two instructions in a program are to be executed in sequence and both access a particular memory or register operand. If the two instructions are executed in strict sequence, no problem occurs. However, if the instructions are executed in a pipeline, then it is possible for the operand value to be updated in such a way as to produce a different result than would occur with strict sequential execution. In other words, the program produces an incorrect result because of the use of pipelining.
There are three types of data hazards:
 Read after write (RAW), or true dependency: An instruction modifies a register or memory location and a succeeding instruction reads the data in that memory or register location. A hazard occurs if the read takes place before the write operation is complete.
 Write after read (WAR), or antidependency: An instruction reads a register or memory location and a succeeding instruction writes to the location. A hazard occurs if the write operation completes before the read operation takes place.
 Write after write (WAW), or output dependency: Two instructions both write to the same location. A hazard occurs if the write operations take place in the reverse order of the intended sequence.
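The three dependency types can be classified mechanically from the register sets each instruction reads and writes, as in this Python sketch (the example instruction pairs are invented; a real pair can exhibit more than one dependency, and the function reports only the first match):

    def hazard_type(first, second):
        """first/second = (reads, writes) register sets, in program order."""
        reads1, writes1 = first
        reads2, writes2 = second
        if writes1 & reads2:
            return "RAW (true dependency)"
        if reads1 & writes2:
            return "WAR (antidependency)"
        if writes1 & writes2:
            return "WAW (output dependency)"
        return "no data hazard"

    # ADD r1, r2, r3 then SUB r4, r1, r5 -- the second reads what the first wrote
    print(hazard_type(({"r2", "r3"}, {"r1"}), ({"r1", "r5"}, {"r4"})))
    # ADD r1, r2, r3 then MOV r2, r6     -- the second writes what the first read
    print(hazard_type(({"r2", "r3"}, {"r1"}), ({"r6"}, {"r2"})))
    # MOV r1, r2     then MOV r1, r3     -- both write r1
    print(hazard_type(({"r2"}, {"r1"}), ({"r3"}, {"r1"})))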
 CONTROL HAZARDS: A control hazard, also known as a branch hazard, occurs when the pipeline makes the wrong decision on a branch prediction and therefore brings instructions into the pipeline that must subsequently be discarded. We discuss approaches to dealing with control hazards next.
Dealing with Branches
One of the major problems in designing an instruction pipeline is assuring a steady flow of instructions to the initial stages of the pipeline. The primary impediment, as we have seen, is the conditional branch instruction. Until the instruction is actually executed, it is impossible to determine whether the branch will be taken or not. A variety of approaches have been taken for dealing with conditional branches:
 Multiple streams
 Prefetch branch target
 Loop buffer
 Branch prediction
 Delayed branch
 MULTIPLE STREAMS: A simple pipeline suffers a penalty for a branch instruction because it must choose one of two instructions to fetch next and may make the wrong choice. A brute-force approach is to replicate the initial portions of the pipeline and allow the pipeline to fetch both instructions, making use of two streams. There are two problems with this approach:
 With multiple pipelines there are contention delays for access to the registers and to memory.
 Additional branch instructions may enter the pipeline (either stream) before the original branch decision is resolved. Each such instruction needs an additional stream.
 PREFETCH BRANCH TARGET: When a conditional branch is recognized, the target of the branch is prefetched, in addition to the instruction following the branch. This target is then saved until the branch instruction is executed. If the branch is taken, the target has already been prefetched. The IBM 360/91 uses this approach.
 LOOP BUFFER: A loop buffer is a small, very high-speed memory maintained by the instruction fetch stage of the pipeline and containing the n most recently fetched instructions, in sequence. If a branch is to be taken, the hardware first checks whether the branch target is within the buffer. If so, the next instruction is fetched from the buffer. The loop buffer has three benefits:
1. With the use of prefetching, the loop buffer will contain some instructions sequentially ahead of the current instruction fetch address. Thus, instructions fetched in sequence will be available without the usual memory access time.
2. If a branch occurs to a target just a few locations ahead of the address of the branch instruction, the target will already be in the buffer. This is useful for the rather common occurrence of IF-THEN and IF-THEN-ELSE sequences.
3. This strategy is particularly well suited to dealing with loops, or iterations; hence the name loop buffer. If the loop buffer is large enough to contain all the instructions in a loop, then those instructions need to be fetched from memory only once, for the first iteration. For subsequent iterations, all the needed instructions are already in the buffer.
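A minimal Python sketch of the loop-buffer lookup described above (the capacity, the 4-byte instruction granularity, and the eviction rule are assumptions; a real loop buffer is a hardware structure, not a dictionary):

    class LoopBuffer:
        def __init__(self, capacity=64):
            self.capacity = capacity
            self.buffer = {}                     # address -> instruction (most recent fetches)

        def record_fetch(self, address, instruction):
            self.buffer[address] = instruction
            if len(self.buffer) > self.capacity:
                self.buffer.pop(min(self.buffer))   # drop the lowest (oldest) address

        def lookup(self, target):
            return self.buffer.get(target)       # None means: fetch from memory instead

    buf = LoopBuffer()
    for addr in range(0x1000, 0x1040, 4):        # sequentially fetched loop body
        buf.record_fetch(addr, f"instr@{addr:#x}")
    print(buf.lookup(0x1008))                    # hit: a branch back into the loop
    print(buf.lookup(0x2000))                    # miss: target outside the buffer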
 BRANCH PREDICTION: Various techniques can be used to predict whether a branch will be taken. Among the more common are the following:
 Predict never taken
 Predict always taken
 Predict by opcode
 Taken/not taken switch
 Branch history table
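As one concrete example, the taken/not taken switch from the list above is commonly implemented with two history bits per branch (a saturating counter). The sketch below is a minimal Python model of that idea; the outcome sequence is invented and no association of counters with branch addresses is modeled.

    class TwoBitPredictor:
        """States 0-1 predict not taken; states 2-3 predict taken."""
        def __init__(self):
            self.state = 2                       # start weakly "taken"

        def predict(self):
            return self.state >= 2

        def update(self, taken):
            if taken:
                self.state = min(3, self.state + 1)
            else:
                self.state = max(0, self.state - 1)

    p = TwoBitPredictor()
    outcomes = [True, True, False, True, True]   # e.g., a loop branch that exits once
    correct = sum(p.predict() == actual or p.update(actual) for actual in [])  # placeholder removed below
    correct = 0
    for actual in outcomes:
        correct += (p.predict() == actual)
        p.update(actual)
    print(f"{correct}/{len(outcomes)} predictions correct")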
Intel 80486 Pipelining
An instructive example of an instruction pipeline is that of the Intel 80486. The 80486 implements a five-stage pipeline:
 Fetch: Instructions are fetched from the cache or from external memory and placed into one of the two 16-byte prefetch buffers. The objective of the fetch stage is to fill the prefetch buffers with new data as soon as the old data have been consumed by the instruction decoder. Because instructions are of variable length (from 1 to 11 bytes, not counting prefixes), the status of the prefetcher relative to the other pipeline stages varies from instruction to instruction. On average, about five instructions are fetched with each 16-byte load [CRAW90]. The fetch stage operates independently of the other stages to keep the prefetch buffers full.
 Decode stage 1: All opcode and addressing-mode information is decoded in the D1 stage. The required information, as well as instruction-length information, is included in at most the first 3 bytes of the instruction. Hence, 3 bytes are passed to the D1 stage from the prefetch buffers. The D1 decoder can then direct the D2 stage to capture the rest of the instruction (displacement and immediate data), which is not involved in the D1 decoding.
 Decode stage 2: The D2 stage expands each opcode into control signals for the ALU. It also controls the computation of the more complex addressing modes.
 Execute: This stage includes ALU operations, cache access, and register update.
 Write back: This stage, if needed, updates registers and status flags modified during the preceding execute stage. If the current instruction updates memory, the computed value is sent to the cache and to the bus-interface write buffers at the same time.
1. List and briefly explain five important instruction set design issues.
2. What is the difference between an arithmetic shift and a logical shift?
3. Suppose we wished to apply the x86 CMP instruction to 32-bit operands that contained numbers in a floating-point format. For correct results, what requirements have to be met in the following areas?
a. The relative position of the significand, sign, and exponent fields.
b. The representation of the value zero.
c. The representation of the exponent.
d. Does the IEEE format meet these requirements? Explain.
4. Briefly define:
 immediate addressing.
 direct addressing.
 indirect addressing.
 register addressing.
 register indirect addressing.
 displacement addressing.
 relative addressing.
5. If the last operation performed on a computer with an 8-bit word was an addition in which the two operands were 00000010 and 00000011, what would be the value of the following flags?
 Carry
 Zero
 Overflow
 Sign
 Even Parity
 Half-Carry
6. a. What are some typical characteristics of a RISC instruction set architecture?
b. What is a delayed branch?
7. In many cases, common machine instructions that are not listed as part of the MIPS instruction set can be synthesized with a single MIPS instruction. Show this for the following:
a. Register-to-register move
b. Increment, decrement
c. Complement
d. Negate
e. Clear
8. Briefly define the following terms:
 True data dependency
 Procedural dependency
 Resource conflicts
 Output dependency
 Antidependency
9. a. Explain the distinction between the written sequence and the time sequence of an instruction.
b. What is the relationship between instructions and micro-operations?
10. Your ALU can add its two input registers, and it can logically complement the bits of either input register, but it cannot subtract. Numbers are to be stored in two's complement representation. List the micro-operations your control unit must perform to cause a subtraction.