2. The x86 provides a complex array of operation types, including a number of
specialized instructions. The intent was to provide tools for the compiler writer to
produce optimized machine language translations of high-level language programs.
Instruction Description
Data Movement
MOV Move operand, between registers or between register and memory
PUSH Push operand onto stack
PUSHA Push all registers on stack
MOVSX Move byte, word, dword, sign extended. Moves a byte to a word or a word to a
doubleword with twos-complement sign extension.
LEA Load effective address. Loads the offset of the source operand, rather than its value, to
the destination operand.
XLAT Table lookup translation. Replaces a byte in AL with a byte from a user-coded
translation table. When XLAT is executed, AL should have an unsigned index into the
table. XLAT changes the content of AL from the table index to the table entry.
IN,OUT Input, output operand from I/O space.
Arithmetic
ADD Add operands
SUB Subtract operands
MUL Unsigned integer multiplication, with byte, word, or doubleword operands and word,
doubleword, or quadword result.
IDIV Signed divide
Table 10.8 Operation Types (with Examples of Typical Operations)
3. Logical
AND AND operands
BTS Bit test and set. Operates on a bit field operand. The instruction copies the current
value of a bit to flag CF and sets the original bit to 1.
BSF Bit scan forward. Scans a word or doubleword for a 1-bit and stores the number of the
first 1-bit into a register.
SHL/SHR Shift logical left or right
SAL/SAR Shift arithmetic left or right
ROL/ROR Rotate left or right
SETcc Sets a byte to zero or one depending on any of the 16 conditions defined by status
flags.
Control Transfer
JMP Unconditional jump
CALL Transfer control to another location. Before transfer, the address of the instruction
following the CALL is placed on the stack.
JE/JZ Jump if equal/zero
LOOPE/LOOPZ Loops if equal/zero. This is a conditional jump using a value stored in register ECX. The
instruction first decrements ECX before testing ECX for the branch condition.
INT/INTO Interrupt/Interrupt if overflow. Transfer control to an interrupt service routine.
String Operations
MOVS Move byte, word, dword string. The instruction operates on one element of a string,
indexed by registers ESI and EDI. After each string operation, the registers are
automatically incremented or decremented to point to the next element of the string.
LODS Load byte, word, dword of string.
4. High-Level Language Support
ENTER Creates a stack frame that can be used to implement the rules of a block-structured
high-level language.
LEAVE Reverses the action of the previous ENTER.
BOUND Check array bounds. Verifies that the value in operand 1 is within lower and upper limits.
The limits are in two adjacent memory locations referenced by operand 2. An interrupt
occurs if the value is out of bounds. This instruction is used to check an array index.
Flag Control
STC Set Carry flag
LAHF Load AH register from flags. Copies SF, ZF, AF, PF, and CF into the AH register.
Segment Register
LDS Load pointer into DS and another register.
System Control
HLT Halt.
LOCK Asserts a hold on shared memory so that the Pentium has exclusive use of it during the
instruction that immediately follows the LOCK.
ESC Processor extension escape. An escape code that indicates the succeeding instructions are to
be executed by a numeric coprocessor that supports high-precision integer and floating-point
calculations.
WAIT Wait until BUSY# negated. Suspends Pentium program execution until the processor
detects that the BUSY pin is inactive, indicating that the numeric coprocessor has finished
execution.
Protection
SGDT Store global descriptor table.
LSL Load segment limit. Loads a user-specified register with a segment limit.
VERR/VERW Verify segment for reading/writing.
Cache Management
INVD Flushes the internal cache memory.
WBINVD Flushes the internal cache memory after writing dirty lines to memory.
INVLPG Invalidates a translation lookaside buffer (TLB) entry.
5. CALL/RETURN INSTRUCTIONS
The x86 provides four instructions to support procedure call/return: CALL, ENTER, LEAVE,
RETURN. It will be instructive to look at the support provided by these instructions.
Upon entry to a new procedure, the following steps must be performed:
• Push the return point on the stack.
• Push the current frame pointer on the stack.
• Copy the stack pointer as the new value of the frame pointer.
• Adjust the stack pointer to allocate the frame.
The CALL instruction pushes the current instruction pointer value onto the stack and causes
a jump to the entry point of the procedure by placing the address of the entry point in the instruction pointer.
In the 8088 and 8086 machines, the typical procedure began with the sequence:
PUSH EBP
MOV EBP, ESP
SUB ESP, space_for_locals
where EBP is the frame pointer and ESP is the stack pointer. In the 80286 and later
machines, the ENTER instruction performs all of the aforementioned operations in a single instruction. The
ENTER instruction was added to the instruction set to provide direct support for the compiler.
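The entry sequence above can be traced with a small simulation. This is only a sketch: the dictionary-based machine state, the helper names `push`, `call`, and `prologue`, and all starting addresses are assumptions made for illustration, not part of the x86 definition.

```python
# Toy model of CALL followed by PUSH EBP / MOV EBP, ESP / SUB ESP, space_for_locals.
# The state layout and the starting values are illustrative assumptions.

def push(state, value):
    """Decrement ESP by 4 and store a 32-bit value at the new top of stack."""
    state["ESP"] -= 4
    state["mem"][state["ESP"]] = value

def call(state, return_address, entry_point):
    """CALL: push the return point, then jump to the procedure entry point."""
    push(state, return_address)
    state["EIP"] = entry_point

def prologue(state, space_for_locals):
    """The three-instruction entry sequence that ENTER performs in one step."""
    push(state, state["EBP"])          # PUSH EBP
    state["EBP"] = state["ESP"]        # MOV EBP, ESP
    state["ESP"] -= space_for_locals   # SUB ESP, space_for_locals

state = {"ESP": 0x1000, "EBP": 0x2000, "EIP": 0x400, "mem": {}}
call(state, return_address=0x405, entry_point=0x800)
prologue(state, space_for_locals=16)
print(hex(state["ESP"]), hex(state["EBP"]))
```

After the prologue, the saved frame pointer sits at the address held in EBP and the return point sits one word above it, which is what lets LEAVE and RETURN undo the sequence.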
MEMORY MANAGEMENT
Another set of specialized instructions deals with memory segmentation. These are
privileged instructions that can be executed only from the operating system. They allow local and global
segment tables (called descriptor tables) to be loaded and read, and the privilege level of a segment
to be checked and altered.
6. Status Bit Name Description
C Carry Indicates carrying or borrowing out of the left-most bit position
following an arithmetic operation. Also modified by some of the
shift and rotate operations.
P Parity Parity of the least significant byte of the result of an arithmetic or logic
operation. 1 indicates even parity; 0 indicates odd parity.
A Auxiliary Carry Represents carrying or borrowing between half-bytes of an 8-bit
arithmetic or logic operation. Used in binary-coded decimal arithmetic.
Z Zero Indicates that the result of an arithmetic or logic operation is 0.
S Sign Indicates the sign of the result of an arithmetic or logic operation.
O Overflow Indicates an arithmetic overflow after an addition or subtraction for
twos complement arithmetic.
Table 10.9 x86 Status Flags
7. STATUS FLAGS AND CONDITION CODES
Status flags are bits in special registers that may be set by certain operations and used in conditional
branch instructions. The term condition code refers to the settings of one or more status flags.
Several interesting observations can be made about the list of condition codes in Table 10.10.
First, we may wish to test two operands to determine if one number is bigger than another. But this will depend on
whether the numbers are signed or unsigned.
Example:
The 8-bit number 11111111 is bigger than 00000000 if the two numbers are interpreted as unsigned integers
(255 > 0) but is less if they are considered as 8-bit twos complement numbers (-1 < 0). Many assembly languages therefore
introduce two sets of terms to distinguish the two cases:
1. If we are comparing two numbers as signed integers, we use the terms less than and greater than.
2. If we are comparing them as unsigned integers, we use the terms below and above.
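The example can be checked with a short sketch that reads the same bit pattern both ways. The helper names `as_unsigned` and `as_signed` are hypothetical, chosen only for this illustration.

```python
def as_unsigned(bits, width=8):
    """Interpret the low `width` bits as an unsigned integer."""
    return bits & ((1 << width) - 1)

def as_signed(bits, width=8):
    """Interpret the low `width` bits as a twos complement integer."""
    value = bits & ((1 << width) - 1)
    if value & (1 << (width - 1)):     # sign bit set: value is negative
        return value - (1 << width)
    return value

a, b = 0b11111111, 0b00000000
print(as_unsigned(a) > as_unsigned(b))   # a is "above" b
print(as_signed(a) < as_signed(b))       # a is "less than" b
```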
The second observation concerns the complexity of comparing signed integers. A signed result is greater than or equal
to zero if:
1. The sign bit is zero and there is no overflow (S = 0 AND O = 0), or
2. The sign bit is one and there is an overflow (S = 1 AND O = 1).
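A minimal sketch of this rule, assuming the flags are produced by an 8-bit subtraction a - b as a compare instruction would do. The function names and the dictionary of flags are illustrative assumptions.

```python
def cmp_flags(a, b, width=8):
    """Flags after computing a - b on width-bit operands."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    s = int(bool(result & sign))       # sign bit of the result
    z = int(result == 0)
    c = int(a < b)                     # borrow out of the left-most bit
    # Overflow on subtraction: the operands differ in sign and the
    # result's sign differs from the sign of the first operand.
    o = int(bool((a ^ b) & (a ^ result) & sign))
    return {"S": s, "Z": z, "C": c, "O": o}

def signed_ge(flags):
    """Greater than or equal (signed): S = O covers both cases above."""
    return flags["S"] == flags["O"]

def unsigned_ae(flags):
    """Above or equal (unsigned): no borrow occurred."""
    return flags["C"] == 0

flags = cmp_flags(0b11111111, 0b00000000)
print(flags, signed_ge(flags), unsigned_ae(flags))
```

For the operands 11111111 and 00000000 the sketch reports "above or equal" for the unsigned test and fails the signed test, matching the 255 > 0 and -1 < 0 readings.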
x86 SIMD INSTRUCTIONS
In 1996, Intel introduced MMX technology into its Pentium product line. MMX is a set of highly optimized
instructions for multimedia tasks. There are 57 new instructions that treat data in a SIMD (single-instruction, multiple-data)
fashion, which makes it possible to perform the same operation, such as addition or multiplication, on multiple data elements at
once. Each instruction typically takes a single clock cycle to execute. For the proper application, these fast parallel
operations can yield a speedup of two to eight times over comparable algorithms that do not use the MMX instructions
[ATK196]. With the introduction of the 64-bit x86 architecture, Intel has expanded this extension to include double quadword
(128 bits) operands and floating-point operations.
8. Symbol      Condition Tested                                        Comment
A, NBE         C = 0 AND Z = 0                                         Above; Not below or equal (greater than, unsigned)
AE, NB, NC     C = 0                                                   Above or equal; Not below (greater than or equal, unsigned); Not carry
B, NAE, C      C = 1                                                   Below; Not above or equal (less than, unsigned); Carry set
BE, NA         C = 1 OR Z = 1                                          Below or equal; Not above (less than or equal, unsigned)
E, Z           Z = 1                                                   Equal; Zero (signed or unsigned)
G, NLE         [(S = 1 AND O = 1) OR (S = 0 AND O = 0)] AND [Z = 0]    Greater than; Not less than or equal (signed)
GE, NL         (S = 1 AND O = 1) OR (S = 0 AND O = 0)                  Greater than or equal; Not less than (signed)
L, NGE         (S = 1 AND O = 0) OR (S = 0 AND O = 1)                  Less than; Not greater than or equal (signed)
LE, NG         (S = 1 AND O = 0) OR (S = 0 AND O = 1) OR (Z = 1)       Less than or equal; Not greater than (signed)
NE, NZ         Z = 0                                                   Not equal; Not zero (signed or unsigned)
NO             O = 0                                                   No overflow
NS             S = 0                                                   Not sign (not negative)
NP, PO         P = 0                                                   Not parity; Parity odd
O              O = 1                                                   Overflow
P              P = 1                                                   Parity; Parity even
S              S = 1                                                   Sign (negative)
Table 10.10 x86 Condition Codes for Conditional Jump and SETcc Instructions
9. MMX instructions operate on packed data types. The types are as follows:
• Packed byte: Eight bytes packed into one 64-bit quantity
• Packed word: Four 16-bit words packed into 64 bits
• Packed doubleword: Two 32-bit doublewords packed into 64 bits
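As a rough illustration of the packed byte type, the sketch below packs eight bytes into one 64-bit integer and adds two such quantities lane by lane with wraparound. It models the idea in plain Python rather than real MMX hardware; the function names and the choice to place element 0 in the low-order byte are assumptions.

```python
def pack_bytes(values):
    """Pack eight 8-bit values into one 64-bit integer (element 0 in the low byte)."""
    if len(values) != 8:
        raise ValueError("a packed byte quantity holds exactly eight bytes")
    packed = 0
    for i, v in enumerate(values):
        packed |= (v & 0xFF) << (8 * i)
    return packed

def unpack_bytes(packed):
    """Split a 64-bit integer back into its eight bytes."""
    return [(packed >> (8 * i)) & 0xFF for i in range(8)]

def packed_add_bytes(x, y):
    """Add corresponding bytes independently; each lane wraps around at 256."""
    lanes = [(a + b) & 0xFF for a, b in zip(unpack_bytes(x), unpack_bytes(y))]
    return pack_bytes(lanes)

x = pack_bytes([1, 2, 3, 4, 5, 6, 7, 250])
y = pack_bytes([10] * 8)
print(unpack_bytes(packed_add_bytes(x, y)))
```

The key point is that a carry out of one byte lane never spills into its neighbour, which is what distinguishes a packed add from an ordinary 64-bit add.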
Table 10.11 MMX Instruction Set
Category Instruction Description
Arithmetic
PADD [B, W, D] Parallel add of packed eight bytes, four 16-bit words, or two 32-bit doublewords, with wraparound.
PADDS [B, W] Add signed with saturation.
PADDUS [B, W] Add unsigned with saturation.
PSUB [B, W, D] Subtract with wraparound.
PSUBS [B, W] Subtract signed with saturation.
PSUBUS [B, W] Subtract unsigned with saturation.
PMULHW Parallel multiply of four signed 16-bit words, with high-order 16 bits of 32-bit result chosen.
PMULLW Parallel multiply of four signed 16-bit words, with low-order 16 bits of 32-bit result chosen.
PMADDWD Parallel multiply of four signed 16-bit words; add together adjacent pairs of 32-bit results.
Comparison
PCMPEQ [B, W, D] Parallel compare for equality; result is mask of 1s if true or 0s if false.
PCMPGT [B, W, D] Parallel compare for greater than; result is mask of 1s if true or 0s if false.
Conversion
PACKUSWB Pack words into bytes with unsigned saturation.
PACKSS [WB, DW] Pack words into bytes, or doublewords into words, with signed saturation.
PUNPCKH [BW, WD, DQ] Parallel unpack (interleaved merge) high-order bytes, words, or doublewords from MMX register.
PUNPCKL [BW, WD, DQ] Parallel unpack (interleaved merge) low-order bytes, words, or doublewords from MMX register.
10. Logical
PAND 64-bit bitwise logical AND
PANDN 64-bit bitwise logical AND NOT
POR 64-bit bitwise logical OR
PXOR 64-bit bitwise logical XOR
Shift
PSLL [W, D, Q] Parallel logical left shift of packed words, doublewords, or quadword by amount specified in MMX register or immediate value.
PSRL [W, D, Q] Parallel logical right shift of packed words, doublewords, or quadword.
PSRA [W, D] Parallel arithmetic right shift of packed words or doublewords.
Data Transfer
MOV [D, Q] Move doubleword or quadword to/from MMX register.
State Mgt
EMMS Empty MMX state (empty FP register tag bits).
11. One unusual feature of the new instruction set is the introduction of saturation arithmetic for byte
and 16-bit word operands. With ordinary unsigned arithmetic, when an operation overflows (i.e., a carry
out of the most significant bit), the extra bit is truncated. This is referred to as wraparound.
Example:
F000h = 1111 0000 0000 0000
+3000h = 0011 0000 0000 0000
1 0010 0000 0000 0000 = 2000h after truncation
With saturation arithmetic, if addition results in overflow or subtraction results in underflow, the result is set to the
largest or smallest value representable.
Example:
F000h = 1111 0000 0000 0000
+3000h = 0011 0000 0000 0000
1 0010 0000 0000 0000
1111 1111 1111 1111 = FFFFh
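The two behaviours can be compared side by side in a small sketch for unsigned 16-bit operands. The function names are hypothetical and the code models only the arithmetic, not any particular MMX instruction.

```python
def add_wraparound(a, b, width=16):
    """Ordinary unsigned add: the carry out of the top bit is discarded."""
    return (a + b) & ((1 << width) - 1)

def add_saturate_unsigned(a, b, width=16):
    """Unsigned saturating add: clamp to the largest representable value."""
    return min(a + b, (1 << width) - 1)

print(hex(add_wraparound(0xF000, 0x3000)))
print(hex(add_saturate_unsigned(0xF000, 0x3000)))
```

Saturation is usually the better fit for pixel or audio data, where clamping to the maximum looks or sounds far less wrong than wrapping to a small value.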
To provide a feel for the use of MMX instructions, we look at an example, taken from [PELE97]. A
common video application is the fade-out, fade-in effect, in which one scene gradually dissolves into
another. Two images are combined with a weighted average:
Result_pixel = A_pixel × fade + B_pixel × (1 - fade)
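A scalar sketch of the weighted average, assuming 8-bit pixel values held in plain Python lists and a fade factor between 0 and 1. It does not reproduce the MMX code from [PELE97]; the function names and the rounding choice are assumptions.

```python
def fade_pixel(a_pixel, b_pixel, fade):
    """Result_pixel = A_pixel * fade + B_pixel * (1 - fade), rounded to an integer."""
    return round(a_pixel * fade + b_pixel * (1 - fade))

def fade_image(image_a, image_b, fade):
    """Apply the same weighted average to every pair of corresponding pixels."""
    return [fade_pixel(pa, pb, fade) for pa, pb in zip(image_a, image_b)]

image_a = [200, 200, 200, 200]
image_b = [100, 100, 100, 100]
for step in (1.0, 0.5, 0.0):           # dissolve from image A to image B
    print(step, fade_image(image_a, image_b, step))
```

Because the same operation is applied to every pixel, the loop body is a natural candidate for the packed multiply and add instructions listed in Table 10.11.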
12. ARM Operation Types
The ARM architecture provides a large collection of operation types. The
following are the principal categories.
• Load and store instructions: In the ARM architecture, only load and store
instructions access memory locations; arithmetic and logical instructions are
performed only on registers and immediate values encoded in the instruction.
• Branch instructions: ARM supports a branch instruction that allows a conditional
branch forwards or backwards up to 32 MB. As the program counter is one of the
general-purpose registers (R15), a branch or jump can also be generated by writing a
value to R15.
• Data-processing instructions: This category includes logical instructions (AND, OR,
XOR), add and subtract instructions, and test and compare instructions.
• Multiply instructions: The integer multiply instructions operate on word or halfword
operands and can produce normal or long results.
• Parallel addition and subtraction instructions: In addition to the normal data
processing and multiply instructions, there is a set of parallel addition and
subtraction instructions, in which portions of the two operands are operated on in parallel.
• Extend instructions: There are several instructions for unpacking data by sign or
zero extending bytes to halfwords or words, and halfwords to words.
• Status register access instructions: ARM provides the ability to read and also to
write portions of the status register.
13. Condition Codes
The ARM architecture defines four condition flags that are
stored in the program status register: N, Z, C, and V (Negative, Zero,
Carry and oVerflow), with meanings essentially the same as the S, Z, C,
and O flags in the x86 architecture.
There are two unusual aspects to the use of condition codes in ARM:
1. All instructions, not just branch instructions, include a condition
code field, which means that virtually all instructions may be
conditionally executed. Any combination of flag settings except
1110 or 1111 in an instruction's condition code field signifies that
the instruction will be executed only if the condition is met.
2. All data processing instructions (arithmetic, logical) include an S bit that
signifies whether the instruction updates the condition flags.
14. Code  Symbol  Condition Tested                                   Comment
0000      EQ      Z = 1                                              Equal
0001      NE      Z = 0                                              Not equal
0010      CS/HS   C = 1                                              Carry set/unsigned higher or same
0011      CC/LO   C = 0                                              Carry clear/unsigned lower
0100      MI      N = 1                                              Minus/negative
0101      PL      N = 0                                              Plus/positive or zero
0110      VS      V = 1                                              Overflow
0111      VC      V = 0                                              No overflow
1000      HI      C = 1 AND Z = 0                                    Unsigned higher
1001      LS      C = 0 OR Z = 1                                     Unsigned lower or same
1010      GE      N = V [(N = 1 AND V = 1) OR (N = 0 AND V = 0)]     Signed greater than or equal
1011      LT      N ≠ V [(N = 1 AND V = 0) OR (N = 0 AND V = 1)]     Signed less than
1100      GT      (Z = 0) AND (N = V)                                Signed greater than
1101      LE      (Z = 1) OR (N ≠ V)                                 Signed less than or equal
1110      AL      -                                                  Always (unconditional)
1111      -       -                                                  This instruction can only be executed unconditionally.
Table 10.12 ARM Conditions for Conditional Instruction Execution
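The table can be read as a lookup from the 4-bit condition field to a test on the N, Z, C, and V flags. The sketch below encodes it that way; the function name `condition_passed` and the decision to raise an error for code 1111 are assumptions made for illustration.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with condition field `cond` would execute."""
    tests = {
        0b0000: z == 1,                 # EQ
        0b0001: z == 0,                 # NE
        0b0010: c == 1,                 # CS/HS
        0b0011: c == 0,                 # CC/LO
        0b0100: n == 1,                 # MI
        0b0101: n == 0,                 # PL
        0b0110: v == 1,                 # VS
        0b0111: v == 0,                 # VC
        0b1000: c == 1 and z == 0,      # HI
        0b1001: c == 0 or z == 1,       # LS
        0b1010: n == v,                 # GE
        0b1011: n != v,                 # LT
        0b1100: z == 0 and n == v,      # GT
        0b1101: z == 1 or n != v,       # LE
        0b1110: True,                   # AL
    }
    if cond not in tests:
        # 1111 marks instructions that can only be executed unconditionally,
        # so this sketch does not treat it as a testable condition.
        raise ValueError("condition code has no flag test")
    return tests[cond]

print(condition_passed(0b1010, n=1, z=0, c=0, v=1))   # GE with N = V
print(condition_passed(0b1000, n=0, z=1, c=1, v=0))   # HI fails when Z = 1
```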
15. Addressing
The address field or fields in an instruction format
are relatively small. We would like to be able to reference
a large range of locations in main memory or, for some
systems, virtual memory. To achieve this, a variety of
addressing techniques has been employed:
• Immediate
• Direct
• Indirect
• Register
• Register Indirect
• Displacement
• Stack
16. Mode            Algorithm            Principal Advantage    Principal Disadvantage
Immediate           Operand = A          No memory reference    Limited operand magnitude
Direct              EA = A               Simple                 Limited address space
Indirect            EA = (A)             Large address space    Multiple memory references
Register            EA = R               No memory reference    Limited address space
Register Indirect   EA = (R)             Large address space    Extra memory reference
Displacement        EA = A + (R)         Flexibility            Complexity
Stack               EA = top of stack    No memory reference    Limited applicability
Table 11.1 Basic Addressing Modes
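The algorithms in the table can be sketched as a small operand-fetch routine, where A is the address field, R names a register, and parentheses mean "contents of". The dictionaries standing in for memory and registers, the register names, and the use of an "SP" register for the top of stack are all assumptions made for illustration.

```python
def effective_address(mode, memory, registers, a=None, r=None):
    """Compute EA for each basic mode; immediate mode forms no address."""
    if mode == "immediate":
        return None                     # Operand = A
    if mode == "direct":
        return a                        # EA = A
    if mode == "indirect":
        return memory[a]                # EA = (A)
    if mode == "register":
        return r                        # EA = R (the register itself)
    if mode == "register_indirect":
        return registers[r]             # EA = (R)
    if mode == "displacement":
        return a + registers[r]         # EA = A + (R)
    if mode == "stack":
        return registers["SP"]          # EA = top of stack
    raise ValueError("unknown addressing mode: " + mode)

def fetch_operand(mode, memory, registers, a=None, r=None):
    """Fetch the operand that the given mode refers to."""
    if mode == "immediate":
        return a
    ea = effective_address(mode, memory, registers, a, r)
    if mode == "register":
        return registers[ea]
    return memory[ea]

memory = {100: 200, 200: 7, 300: 9, 308: 11, 500: 13}
registers = {"R1": 300, "SP": 500}
print(fetch_operand("indirect", memory, registers, a=100))
print(fetch_operand("displacement", memory, registers, a=8, r="R1"))
```

Counting the dictionary lookups in each branch mirrors the advantage and disadvantage columns: immediate and register modes touch no memory, while indirect mode needs two memory references to reach the operand.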