x86 architecture

x86
Features and Instruction Set Architecture

Stored-program computer
Stores program instructions in electronic memory, where programs and data in memory can be treated
interchangeably or uniformly.
von Neumann architecture
• also known as the von Neumann model and Princeton architecture, after 1945 work by John von
Neumann and others in the First Draft of a Report on the EDVAC
• stores program data and instruction data in the same memory
• consists of:
processing unit (arithmetic logic unit and processor registers)
control unit (instruction register and program counter)
memory (for data and instructions)
external mass storage
input and output mechanisms
• instruction fetch and a data operation cannot occur concurrently because they share a common bus;
referred to as the von Neumann bottleneck which often limits system performance.

Harvard architecture
• Based on the Harvard Mark I
• Data and instructions are stored in entirely separate memory systems
• CPU can fetch next instruction and load or store data simultaneously and independently

Modified Harvard architecture
• loose separation between code and data
• contents of the instruction memory can be accessed as if it were data.
• implemented on most modern CPU architectures
Implementation Modifications
• Split-cache (or Almost-von-Neumann) architecture
  • builds a memory hierarchy with a CPU cache that separates instructions and data;
  • unifies all except small portions of the data and instruction address spaces, providing the von
    Neumann model
  • cache coherency issues matter because they can greatly affect performance
• Instruction-memory-as-data architecture
  • preserves Harvard memory separation, but provides special machine operations to access the
    contents of the instruction memory as data.
• Data-memory-as-instruction architecture
  • can execute instructions fetched from any memory segment
  • can read an instruction and read a data value simultaneously if they're in separate memory segments
    with independent data buses (like Harvard).
  • when executing an instruction from one memory segment, the same memory segment cannot be
    simultaneously accessed as data

Three characteristics to distinguish modified Harvard machines from pure Harvard and von
Neumann machines:
• Instruction and data memories occupy different address spaces
  Pure Harvard: separate address "zero" in instruction space and in data space
  Von Neumann: stores both instructions and data in a single address space
  Modified Harvard: separate address "zero" in instruction space and in data space
• Instruction and data memories have separate hardware pathways to the central processing unit (CPU)
  Pure Harvard: separate pathways for instruction and data memories to the CPU
  Von Neumann: unified address space
  Modified Harvard: separate access paths for CPU caches or other tightly coupled memories, but a
  unified address space covers the rest of the memory hierarchy
• Instruction and data memories may be accessed in different ways
  Pure Harvard: the Harvard Mark I stored instructions on a punched paper tape and data in
  electro-mechanical counters
  Von Neumann and Modified Harvard: provide uniform access to flash memory and SRAM

Basic properties of the x86 architecture
• General consensus suggests that x86 is a modified Harvard architecture.
• The x86 architecture uses variable-length instructions (typically 2 or 3 bytes; some are single-byte, others
up to 15 bytes).
• Primarily "CISC" design with emphasis on backward compatibility.
• The instruction set is not typical CISC, but an extended version of the simple eight-bit 8008 and 8080
architectures
• Byte-addressing is enabled and words are stored in memory with little-endian byte order (LSB first)
• Memory access to unaligned addresses is allowed for all valid word sizes
• Native integer sizes for arithmetic and memory addresses (or offsets) are 16, 32 or 64 bits, depending on
architecture generation
• Multiple scalar values can be handled simultaneously via the SIMD unit (starting with Pentium 3)
• Floating-point instructions and registers for floating-point operations (provided by a separate
co-processor prior to the 80486, built in ever since)
• SIMD (single instruction, multiple data) instructions work on (one or two) 128-bit words, each containing
two or four floating point numbers (each 64 or 32 bits wide respectively), or alternatively, 2, 4, 8 or 16
integers (each 64, 32, 16 or 8 bits wide respectively).
• Pipelining and Superscalar features (starting with Pentium) added extra decoding steps to split most
instructions into micro-operations buffered and scheduled by a control unit to be executed, partly in
parallel, by one of several execution units.
• Out-of-order and speculative execution uses branch prediction, register renaming, and memory
dependence prediction to allow execution of multiple x86 instructions simultaneously and not in the same
order as given in the instruction stream.
• Simultaneous multithreading
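
The little-endian byte order listed above can be observed with a minimal Python sketch. The value 0x12345678 and the helper name hexbytes are arbitrary illustrations, not part of the notes.

```python
import struct

# Pack a 32-bit value the way x86 lays it out in memory: least significant byte first.
value = 0x12345678
little = struct.pack("<I", value)  # "<" = little-endian, "I" = unsigned 32-bit
big = struct.pack(">I", value)     # big-endian, shown only for contrast


def hexbytes(data):
    """Format bytes as space-separated hex, lowest address first."""
    return " ".join(f"{byte:02x}" for byte in data)


print(hexbytes(little))  # 78 56 34 12
print(hexbytes(big))     # 12 34 56 78
```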

x86 REGISTERS
16-bit
• The original Intel 8086 and 8088 have fourteen 16-bit registers.
• Four are general-purpose registers (GPRs): AX, BX, CX, DX; Each can be accessed as two separate bytes
(the high byte and low byte)
• Two pointer registers have special roles: SP (stack pointer) points to the "top" of the stack, BP (base
pointer) is used to point anywhere on the stack.
• The address/index registers: SI, DI, BX and BP
• Four segment registers : CS, DS, SS and ES (used to form a memory address in segmented memory
mode)
• The FLAGS register contains, among others, carry flag (CF), overflow flag (OF) and zero flag (ZF).
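• The instruction pointer (IP) points to the next instruction that will be fetched from memory and then
executed; software cannot read or write it directly, and it changes only through control-transfer
instructions such as jumps, calls and returns.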
• three special registers (GDTR, LDTR, IDTR) hold descriptor table addresses to support protected mode in
80286 and a fourth task register (TR) is used for task switching.

32-bit
• 32-bit processor (starting with 80386) expanded the 16-bit GPRs, base and index registers, instruction
pointer, and FLAGS register to 32 bits (segment registers not affected)
• Represented by prefixing an "E" (for "extended") to the register names in x86 assembly language.
• The general-purpose, base, and index registers can all be used as the base in addressing modes, and all of
those registers except for the stack pointer can be used as the index in addressing modes.
• Two new segment registers (FS and GS) were added.
• the machine code format was expanded to accommodate expanded registers.
• a 32-bit control/status register (MXCSR) for Streaming SIMD Extensions (SSE) was added starting with
the Pentium III.
64-bit
• 32-bit registers are expanded into 64-bit registers (introduced with AMD Opteron)
• addressing extended to 64 bits
• An R-prefix identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP),
• eight additional 64-bit general registers (R8-R15) were also introduced (only usable in 64-bit mode, which
is one of the two sub-modes of long mode)
• extra addressing mode allows memory references relative to RIP (the instruction pointer), to ease the
implementation of position-independent code, used in shared libraries in some operating systems.
Miscellaneous/special purpose
• 32-bit x86 processors (starting with the 80386) also include various special/miscellaneous registers:
• control registers (CR0 through 4, CR8 for 64-bit only)
• debug registers (DR0 through 3, plus 6 and 7)
• test registers (TR3 through 7; 80486 only)
• model-specific registers (MSRs, appearing with the Pentium)

80-bit
• Available in all floating point units (FPUs), also known as math co-processors
• The FPU registers appear to software as part of the CPU
• 8087 (8086, 8088, 80186, and 80188), 80287 (80286), 80387 (80386), built-in starting with 80486
• eight 80-bit wide registers: st(0) to st(7)
• each register holds numeric data in one of seven formats: 32-, 64-, or 80-bit floating point, 16-, 32-, or 64-
bit (binary) integer, and 80-bit packed decimal integer
• The Pentium MMX added eight 64-bit MMX integer registers (MM0 to MM7, which share their bits with the
low 64 bits of the 80-bit-wide FPU stack registers).
128-bit
• SIMD registers XMM0–XMM15.
256-bit
• SIMD registers YMM0–YMM15.
• introduced with Intel's Sandy Bridge processors, SIMD registers widened to 256 bits; AVX (Advanced
Vector Extensions) instructions also introduced.
512-bit
• SIMD registers ZMM0–ZMM31. Used by Knights Corner (on Intel Xeon Phi co-processors)

General Purpose Registers (A, B, C and D)
Register   Bits
R?X        63-0
E?X        31-0
?X         15-0
?H         15-8
?L         7-0
(? stands for A, B, C or D)
General Purpose
• AL/AH/AX/EAX/RAX: Accumulator
• BL/BH/BX/EBX/RBX: Base index (for use with arrays)
• CL/CH/CX/ECX/RCX: Counter (for use with loops and strings)
• DL/DH/DX/EDX/RDX: Extend the precision of the accumulator (e.g. combine 32-bit EAX and EDX for 64-bit
integer operations in 32-bit code)
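
As a rough illustration of how the narrower names alias parts of the same register, the following Python sketch extracts each view from a 64-bit value and combines EDX:EAX into one 64-bit integer. The function names and the sample value are hypothetical.

```python
def views(rax):
    """Return the named views of a 64-bit accumulator value (illustrative helper)."""
    rax &= 0xFFFFFFFFFFFFFFFF
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFFFFFF,   # bits 31-0
        "AX": rax & 0xFFFF,        # bits 15-0
        "AH": (rax >> 8) & 0xFF,   # bits 15-8
        "AL": rax & 0xFF,          # bits 7-0
    }


def edx_eax(edx, eax):
    """Combine EDX (high half) and EAX (low half) into one 64-bit integer."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


v = views(0x1122334455667788)
for name, val in v.items():
    print(name, hex(val))
print(hex(edx_eax(0x00000001, 0x00000002)))
```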
R8-R15 (for 64-bit CPUs)
64-bit mode-only General Purpose Registers
(R8, R9, R10, R11, R12, R13, R14, R15)
Register   Bits
?          63-0   (e.g. R8)
?D         31-0   (e.g. R8D)
?W         15-0   (e.g. R8W)
?B         7-0    (e.g. R8B)

Address/Index Registers
• SI/ESI/RSI: Source index for string operations.
• DI/EDI/RDI: Destination index for string operations.
Index Registers (S and D)
Register   Bits
R?I        63-0
E?I        31-0
?I         15-0
?IL        7-0
Note: The ?IL registers are only available in 64-bit mode.
Stack Pointer Register
• SP/ESP/RSP: Stack pointer for top address of the stack.
• BP/EBP/RBP: Stack base pointer for holding the address of the current stack frame.
Pointer Registers (S and B)
Register   Bits
R?P        63-0
E?P        31-0
?P         15-0
?PL        7-0
Note: The ?PL registers are only available in 64-bit mode.

Instruction Pointer Register
• IP/EIP/RIP: Instruction pointer. Acts as the program counter, holding the address of the next instruction
to be executed.
Instruction Pointer Register (I)
Register   Bits
RIP        63-0
EIP        31-0
IP         15-0
Segment registers
• CS: Code
• DS: Data
• SS: Stack
• ES: Extra data
• FS: Extra data #2
• GS: Extra data #3
Segment Registers (C, D, S, E, F and G)
Register   Bits
?S         15-0

x86 INSTRUCTION SET ARCHITECTURE
• First introduced with the Intel 8086 and 8088 16-bit CPUs.
• Used by Intel, AMD, Cyrix, NEC, and Zilog
• Inherited many characteristics and instructions from the previous generation of 8-bit CPUs such as the
8080.
• The modern x86 instruction set is a superset of the 8086 instructions and the result of a series of
extensions to an instruction set that began with the Intel 8008 microprocessor.
• Nearly full binary backward compatibility from the Intel 8086 chip through to the current generation of
x86 processors, with certain exceptions.
• Software typically uses instructions that will execute on either anything later than an Intel 80386 (or fully
compatible clone) processor or else anything later than an Intel Pentium (or compatible clone) processor.
In recent years, various software has required at least support for later specific extensions to the
instruction set, e.g., MMX or other SIMD extensions.

Basic Instruction Format
• most registers are expressed in opcodes using three or four bits to conserve encoding space;
• at most one operand to an instruction can be a memory location
• memory operand may also be the destination (or a combined source and destination), while the other
operand, the source, can be either register or immediate.
• The relatively small number of general registers (also inherited from its 8-bit ancestors) has made
register-relative addressing (using small immediate offsets) an important method of accessing
operands, especially on the stack, making such accesses as fast as register accesses, i.e. a one cycle
instruction throughput, in most circumstances where the accessed data is available in the top-level
cache.

• IA-32e mode
Sub-modes: compatibility mode (runs legacy 16-bit and 32-bit protected-mode code under a 64-bit operating system) and
64-bit mode (full access to the 64-bit address space)
REX Prefixes
REX prefixes are instruction-prefix bytes used in 64-bit mode. They do the following:
• Specify GPRs and SSE registers.
• Specify 64-bit operand size.
• Specify extended control registers.
Not all instructions require a REX prefix in 64-bit mode. A prefix is necessary only if an instruction references one of the extended
registers or uses a 64-bit operand. If a REX prefix is used when it has no meaning, it is ignored. Only one REX prefix is allowed
per instruction. If used, the prefix must immediately precede the opcode byte or the two-byte opcode escape prefix (if present).
Other placements are ignored. The instruction-size limit of 15 bytes still applies to instructions with a REX prefix.
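
A minimal Python sketch of how a REX prefix byte is put together, assuming the standard layout of a fixed high nibble 0100 followed by the W, R, X and B bits. The helper name rex is hypothetical; the two sample encodings in the comments are given for orientation only.

```python
def rex(w=0, r=0, x=0, b=0):
    """Build a REX prefix byte: 0100 W R X B.

    w: 1 selects a 64-bit operand size
    r: extends the ModR/M reg field
    x: extends the SIB index field
    b: extends the ModR/M r/m field (or SIB base, or opcode register field)
    """
    for bit in (w, r, x, b):
        if bit not in (0, 1):
            raise ValueError("each REX field is a single bit")
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


# mov rax, rbx  -> 48 89 D8  (REX.W only: 64-bit operand, no extended registers)
# mov r8, rax   -> 49 89 C0  (REX.W + REX.B: r/m field refers to R8)
print(hex(rex(w=1)))
print(hex(rex(w=1, b=1)))
```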
• Instruction format for protected mode, real-address mode, and virtual-8086 mode
The Intel 64 and IA-32 architectures instruction encodings are subsets of the format shown. Instructions consist of optional
instruction prefixes (in any order), primary opcode bytes (up to three bytes), an addressing-form specifier (if required) consisting
of the ModR/M byte and sometimes the SIB (Scale-Index-Base) byte, a displacement (if required), and an immediate data field (if
required)
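
The ModR/M byte mentioned above packs three fields: mod (bits 7-6), reg (bits 5-3) and r/m (bits 2-0). The sketch below splits one such byte in Python; the function name and the register table are illustrative, and it deliberately ignores REX extensions and the SIB byte.

```python
# Register numbers 0..7 for 32-bit operands, in encoding order.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


# The bytes 89 D8 encode "mov eax, ebx": opcode 89 is MOV r/m32, r32,
# and mod = 3 means the r/m field names a register rather than memory.
mod, reg, rm = decode_modrm(0xD8)
print(mod, REG32[reg], REG32[rm])
```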

Mnemonics and opcodes
• Each x86 assembly instruction is represented by a mnemonic which, often combined with one or more
operands, translates to one or more bytes called an opcode;
NOP : 0x90
HLT : 0xF4
There are potential opcodes with no documented mnemonic which different processors may interpret
differently, making a program using them behave inconsistently or even generate an exception on some
processors. These opcodes often turn up in code writing competitions as a way to make the code smaller,
faster, more elegant or just show off the author's prowess.
Demonstrates how to find undocumented opcodes in x86 CPUs:
https://www.youtube.com/watch?v=KrksBdWcZgQ

Syntax
• x86 assembly language has two main syntax branches:
Intel syntax was originally used for documentation of the x86 platform and is dominant in the MS-DOS and
Windows world (many x86 assemblers use Intel syntax, including NASM, FASM, MASM, TASM, and YASM).
AT&T syntax is dominant in the Unix world, since Unix was created at AT&T Bell Labs.
Summary of the main differences between Intel syntax and AT&T syntax:
Parameter order
  AT&T:  Source before the destination.
         mov $5, %eax
  Intel: Destination before source.
         mov eax, 5
Parameter size
  AT&T:  Mnemonics are suffixed with a letter indicating the size of the operands: q for qword, l for
         long (dword), w for word, and b for byte.
         addl $4, %esp
  Intel: Derived from the name of the register that is used (e.g. rax, eax, ax, al imply q, l, w, b,
         respectively).
         add esp, 4
Sigils
  AT&T:  Immediate values prefixed with a "$", registers prefixed with a "%".
  Intel: The assembler automatically detects the type of symbols; i.e., whether they are registers,
         constants or something else.
Effective addresses
  AT&T:  General syntax of DISP(BASE,INDEX,SCALE).
         Example:
         movl mem_location(%ebx,%ecx,4), %eax
  Intel: Arithmetic expressions in square brackets; additionally, size keywords like byte, word, or
         dword have to be used if the size cannot be determined from the operands.
         Example:
         mov eax, [ebx + ecx*4 + mem_location]
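
Both effective-address notations describe the same computation, DISP + BASE + INDEX * SCALE. A small Python sketch of that arithmetic follows; the register and symbol values are made up for illustration, and the result is truncated to 32 bits on the assumption of 32-bit addressing.

```python
def effective_address(disp=0, base=0, index=0, scale=1, bits=32):
    """Compute DISP + BASE + INDEX*SCALE, wrapped to the address width."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (disp + base + index * scale) & ((1 << bits) - 1)


# Hypothetical values standing in for mem_location, EBX and ECX.
mem_location = 0x1000
ebx = 0x20
ecx = 3

# Corresponds to: movl mem_location(%ebx,%ecx,4), %eax
#            and: mov eax, [ebx + ecx*4 + mem_location]
addr = effective_address(disp=mem_location, base=ebx, index=ecx, scale=4)
print(hex(addr))
```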

Execution modes
• Real mode (16-bit)
• Original operating mode of early generation x86 CPUs
• Protected mode (16-bit and 32-bit)
• 16-bit subset of instructions are available on the 16-bit x86 processors. These instructions are
available in real mode on all x86 processors, and in 16-bit protected mode (80286 onwards),
additional instructions relating to protected mode are available. On the 80386 and later, 32-bit
instructions (including later extensions) are also available in all modes, including real mode.
• The protected mode of the 80286 was extended to allow the 80386 to address up to 4 GB of memory.
• The 32-bit flat memory model of the 80386 helped drive large-scale adoption of Windows 3.1
(which relied on protected mode), since Windows could now run many applications at once,
including DOS applications, by using virtual memory and simple multitasking.
• Virtual 8086 mode (16-bit)
• Virtual 8086 mode (VM86) made it possible to run one or more real mode programs in a protected
environment which emulated real mode, although some programs were not fully compatible.
• System Management Mode (16-bit)
• SMM, with some of its own special instructions, is available on some Intel i386SL, i486 and later
CPUs
• Long mode (64-bit)
• 64-bit instructions, and more registers, are also available. The instruction set is similar in each mode
but memory addressing and word size vary, requiring different programming strategies.

Segmented addressing (real, vm86, 80286 protected modes)
• uses a process known as segmentation to address memory
• Segmentation composes a memory address from two parts: a segment and an offset; the segment points to
the beginning of a 64 KB group of addresses and the offset determines how far from this beginning address
the desired address is.
• In segmented addressing, two registers are required for a complete memory address: one to hold the
segment, the other to hold the offset. In order to translate back into a flat address, the segment value is
shifted four bits left (equivalent to multiplication by 2^4, or 16), then added to the offset to form the full address,
which allows breaking the 64k barrier through clever choice of addresses, though it makes programming
considerably more complex.
Example:
DS = 0xDEAD, DX = 0xCAFE
memory address = 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE.
Therefore, the CPU can address up to 1,048,576 bytes (1 MB) in real mode.
• By combining segment and offset values we find a 20-bit address.
• When referring to an address with a segment and an offset the notation of segment:offset is used, so in the
above example the flat address 0xEB5CE can be written as 0xDEAD:0xCAFE or as a segment and offset
register pair; DS:DX.
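
The translation above can be sketched in a few lines of Python. The function name is hypothetical, and the 20-bit mask is an assumption that models an 8086-style 20-bit address bus. The second call shows that different segment:offset pairs can name the same flat address.

```python
def linear_address(segment, offset):
    """Real-mode translation: (segment << 4) + offset, wrapped to 20 bits."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


addr = linear_address(0xDEAD, 0xCAFE)    # the example from the text
alias = linear_address(0xEB5C, 0x000E)   # a different pair, same flat address
print(hex(addr), hex(alias))
```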

• There are some special combinations of segment registers and general registers that point to important
addresses:
CS:IP (CS is Code Segment, IP is Instruction Pointer)
points to the address where the processor will fetch the next byte of code.
SS:SP (SS is Stack Segment, SP is Stack Pointer)
points to the address of the top of the stack, i.e. the most recently pushed byte.
DS:SI (DS is Data Segment, SI is Source Index)
is often used to point to string data that is about to be copied to ES:DI.
ES:DI (ES is Extra Segment, DI is Destination Index)
is typically used to point to the destination for a string copy, as mentioned above.
• In 80286 protected mode (utilized by OS/2)
The 80286 had 16-bit address registers, limiting a single offset to 2^16 bytes (64 kilobytes) of addressable space.
In protected mode, the CPU can use 24-bit addressing to access 2^24 bytes of memory (16 megabytes).
• In protected mode, the segment selector can be broken down into three parts: a 13-bit index, a Table
Indicator bit that determines whether the entry is in the GDT or LDT and a 2-bit Requested Privilege
Level
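
A minimal Python sketch of the three-part split described above, assuming the usual bit layout (RPL in bits 1-0, Table Indicator in bit 2, index in bits 15-3). The function name and the sample selector value are illustrative only.

```python
def decode_selector(selector):
    """Split a 16-bit protected-mode segment selector into its fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,        # 13-bit descriptor table index
        "ti": (selector >> 2) & 1,     # 0 = GDT, 1 = LDT
        "rpl": selector & 0b11,        # Requested Privilege Level
    }


fields = decode_selector(0x0023)
print(fields)
```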

Stack instructions
PUSH src/immed
Decrements SP by the size of the operand (two or four, byte values are sign extended) and transfers one word
from source to the stack top (SS:SP).
POP dest
Transfers word at the current stack top (SS:SP) to the destination then increments SP by two to point to the
new stack top. CS is not a valid destination.
PUSHA
PUSHAD (386+)
Pushes all general purpose registers onto the stack in the following order: (E)AX, (E)CX, (E)DX, (E)BX, (E)SP,
(E)BP, (E)SI, (E)DI. The value of SP is the value before the actual push of SP.
POPA
POPAD (386+)
Pops the top 8 words off the stack into the 8 general purpose 16/32 bit registers. Registers are popped in the
following order: (E)DI, (E)SI, (E)BP, (E)SP, (E)BX, (E)DX, (E)CX and (E)AX. The (E)SP value popped from
the stack is actually discarded.
POPF
POPFD (386+)
Pops word / doubleword from stack into the Flags Register and then increments SP by 2 (for POPF) or 4 (for
POPFD).
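
To make the SP bookkeeping concrete, here is a small Python model of 16-bit PUSH and POP on a single 64 KB stack segment. The class name, the initial SP value and the pushed words are hypothetical; the model covers word-sized operands only.

```python
class Stack16:
    """Toy model of a 16-bit stack: SP grows downward, words are little-endian."""

    def __init__(self, sp=0x0100):
        self.mem = bytearray(0x10000)  # one 64 KB stack segment
        self.sp = sp

    def push(self, word):
        # PUSH: decrement SP by two, then store the word at SS:SP.
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self.sp] = word & 0xFF
        self.mem[(self.sp + 1) & 0xFFFF] = (word >> 8) & 0xFF

    def pop(self):
        # POP: read the word at SS:SP, then increment SP by two.
        low = self.mem[self.sp]
        high = self.mem[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF
        return low | (high << 8)


stack = Stack16()
stack.push(0x1234)
stack.push(0xABCD)
sp_after_pushes = stack.sp
first = stack.pop()
second = stack.pop()
print(hex(sp_after_pushes), hex(first), hex(second), hex(stack.sp))
```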

Integer ALU instructions
standard mathematical operations:
ADD dest,src Modifies flags: AF CF OF PF SF ZF
Adds "src" to "dest", replacing the original contents of "dest". Both operands are binary.
SUB dest,src Modifies flags: AF CF OF PF SF ZF
The source is subtracted from the destination and the result is stored in the destination.
MUL src Modifies flags: CF OF (AF,PF,SF,ZF undefined)
Unsigned multiply of the accumulator by the source. If "src" is a byte value, then AL is used as the other
multiplicand and the result is placed in AX. If "src" is a word value, then AX is multiplied by "src" and DX:AX
receives the result. If "src" is a double word value, then EAX is multiplied by "src" and EDX:EAX receives the
result. The 386+ uses an early out algorithm which makes multiplying any size value in EAX as fast as in the
8 or 16 bit registers.
DIV src Modifies flags: (AF,CF,OF,PF,SF,ZF undefined)
Unsigned binary division of accumulator by source. If the source divisor is a byte value then AX is divided by
"src" and the quotient is placed in AL and the remainder in AH. If source operand is a word value, then DX:AX
is divided by "src" and the quotient is stored in AX and the remainder in DX. If the source operand is a double
word value, then EDX:EAX is divided by "src" and the quotient is stored in EAX and the remainder in EDX.
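
The word-sized forms of MUL and DIV can be modelled in Python as below. This is a sketch with hypothetical function names: it reports CF/OF for MUL as set when the upper half of the product is non-zero, and it raises Python exceptions where the CPU would raise a divide error.

```python
def mul16(ax, src):
    """Unsigned 16-bit MUL: returns (DX, AX, CF/OF) for AX * src."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    dx = product >> 16
    ax = product & 0xFFFF
    return dx, ax, int(dx != 0)  # CF and OF are set when DX is non-zero


def div16(dx, ax, src):
    """Unsigned 16-bit DIV: returns (quotient in AX, remainder in DX)."""
    src &= 0xFFFF
    if src == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    quotient, remainder = divmod(dividend, src)
    if quotient > 0xFFFF:
        raise OverflowError("divide error: quotient does not fit in AX")
    return quotient, remainder


product = mul16(0x1234, 0x0100)
result = div16(0x0012, 0x3400, 0x0100)
print(product, result)
```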

logical operators:
AND dest,src Modifies flags: CF OF PF SF ZF (AF undefined)
Performs a logical AND of the two operands replacing the destination with the result.
OR dest,src Modifies flags: CF OF PF SF ZF (AF undefined)
Logical inclusive OR of the two operands returning the result in the destination. Any bit set in either
operand will be set in the destination.
XOR dest,src Modifies flags: CF OF PF SF ZF (AF undefined)
Performs a bitwise exclusive OR of the operands and returns the result in the destination.
NEG dest Modifies flags: AF CF OF PF SF ZF
Subtracts the destination from 0 and saves the 2s complement of "dest" back into "dest"

bitshift arithmetic and logical:
SAL dest,count Modifies flags: CF OF PF SF ZF (AF undefined)
SHL dest,count
.-. .---------------. .-.
|C|<----|7 <---------- 0|<----|0|
'-' '---------------' '-'
Shifts the destination left by "count" bits with zeroes shifted in on right. The Carry Flag contains the last bit
shifted out.
SAR dest,count Modifies flags: CF OF PF SF ZF (AF undefined)
.---------------. .-.
.--|7 ----------> 0|---->|C|
| '---------------' '-'
'---^
Shifts the destination right by "count" bits with the current sign bit replicated in the leftmost bit. The Carry
Flag contains the last bit shifted out.
SHR dest,count Modifies flags: CF OF PF SF ZF (AF undefined)
.-. .---------------. .-.
|0|---->|7 ----------> 0|---->|C|
'-' '---------------' '-'
Shifts the destination right by "count" bits with zeroes shifted in on the left. The Carry Flag contains the last
bit shifted out.
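
The three shifts differ only in what is shifted in and which bit lands in the Carry Flag. A minimal 8-bit Python model, one bit per step, is sketched below. Function names are illustrative, and the model assumes a count of at least 1 and does not reproduce the count masking that later processors apply.

```python
def shl8(value, count):
    """SHL/SAL on 8 bits: zeroes enter on the right, CF = last bit shifted out."""
    carry = 0
    for _ in range(count):
        carry = (value >> 7) & 1
        value = (value << 1) & 0xFF
    return value, carry


def shr8(value, count):
    """SHR on 8 bits: zeroes enter on the left, CF = last bit shifted out."""
    carry = 0
    for _ in range(count):
        carry = value & 1
        value >>= 1
    return value, carry


def sar8(value, count):
    """SAR on 8 bits: the sign bit is replicated, CF = last bit shifted out."""
    carry = 0
    for _ in range(count):
        carry = value & 1
        value = (value >> 1) | (value & 0x80)
    return value, carry


print(shl8(0x96, 1), shr8(0x96, 1), sar8(0x96, 1))
```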

rotate with and without carry:
RCL dest,count Modifies flags: CF OF
.-. .---------------.
.--|C|<----|7 <---------- 0|<-.
| '-' '---------------' |
'-----------------------------'
Rotates the bits in the destination to the left "count" times through the Carry Flag: each bit pushed out the left
side enters the Carry Flag, and the previous Carry Flag value re-enters on the right. The Carry Flag holds the
last bit rotated out.
RCR dest,count Modifies flags: CF OF
.---------------. .-.
.->|7 ----------> 0|---->|C|--.
| '---------------' '-' |
'-----------------------------'
Rotates the bits in the destination to the right "count" times through the Carry Flag: each bit pushed out the right
side enters the Carry Flag, and the previous Carry Flag value re-enters on the left. The Carry Flag holds the
last bit rotated out.
ROL dest,count Modifies flags: CF OF
.-. .---------------.
|C|<-.--|7 <---------- 0|<-.
'-' | '---------------' |
'---------------------'
Rotates the bits in the destination to the left "count" times with all data pushed out the left side re-entering on
the right. The Carry Flag will contain the value of the last bit rotated out.
ROR dest,count Modifies flags: CF OF
.---------------. .-.
.->|7 ----------> 0|--.->|C|
| '---------------' | '-'
'---------------------'
Rotates the bits in the destination to the right "count" times with all data pushed out the right side re-entering on
the left. The Carry Flag will contain the value of the last bit rotated out.
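
The difference between rotating through carry (RCL) and rotating without it (ROL) is easiest to see side by side. The following Python sketch models both on 8 bits; the names are hypothetical and the incoming Carry Flag is passed in explicitly.

```python
def rol8(value, count, carry=0):
    """ROL on 8 bits: the bit leaving on the left re-enters on the right; CF copies it."""
    for _ in range(count):
        carry = (value >> 7) & 1
        value = ((value << 1) & 0xFF) | carry
    return value, carry


def rcl8(value, count, carry=0):
    """RCL on 8 bits: a 9-bit rotation through CF (old CF enters on the right)."""
    for _ in range(count):
        outgoing = (value >> 7) & 1
        value = ((value << 1) & 0xFF) | carry
        carry = outgoing
    return value, carry


print(rol8(0x81, 1))  # bit 7 wraps around to bit 0
print(rcl8(0x81, 1))  # bit 7 goes to CF, the old CF (0) enters bit 0
```

Because RCL rotates nine bits (eight data bits plus CF), nine steps restore the starting state, whereas ROL restores the value after eight.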

complement of BCD arithmetic instructions / others
AAA Modifies flags: AF CF (OF,PF,SF,ZF undefined)
Changes contents of AL to valid unpacked decimal. The high order nibble is zeroed.
AAD Modifies flags: SF ZF PF (AF,CF,OF undefined)
Used before dividing unpacked decimal numbers. Multiplies AH by 10, then adds the result to AL. Sets AH to
zero. This instruction is also known to have an undocumented behavior.
AL := 10*AH+AL
AH := 0
AAM Modifies flags: PF SF ZF (AF,CF,OF undefined)
AH := AL / 10
AL := AL mod 10
Used after multiplication of two unpacked decimal numbers, this instruction adjusts an unpacked decimal
number. The high order nibble of each byte must be zeroed before using this instruction. This instruction is
also known to have an undocumented behavior.
AAS Modifies flags: AF CF (OF,PF,SF,ZF undefined)
Corrects result of a previous unpacked decimal subtraction in AL. High order nibble is zeroed.
DAA Modifies flags: AF CF PF SF ZF (OF undefined)
Corrects result (in AL) of a previous BCD addition operation. Contents of AL are changed to a pair of packed
decimal digits.
DAS Modifies flags: AF CF PF SF ZF (OF undefined)
Corrects result (in AL) of a previous BCD subtraction operation. Contents of AL are changed to a pair of
packed decimal digits.
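
The AAM and AAD formulas given above translate directly into Python. This sketch uses hypothetical function names and models only the documented base-10 behavior, with unpacked decimal meaning one digit per byte.

```python
def aam(al):
    """AAM: split a binary product in AL into unpacked digits (AH, AL)."""
    al &= 0xFF
    return al // 10, al % 10


def aad(ah, al):
    """AAD: fold unpacked digits AH:AL back into binary; returns (AH, AL)."""
    return 0, (10 * (ah & 0xFF) + (al & 0xFF)) & 0xFF


# 7 * 8 = 56 leaves 56 in AL; AAM turns it into the digits 5 and 6.
digits = aam(7 * 8)
# AAD reverses the step before a division.
binary = aad(*digits)
print(digits, binary)
```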

Data manipulation instructions
data transfer instructions
MOV dest,src
Copies byte or word from the source operand to the destination operand. If the destination is SS, interrupts
are inhibited until after the next instruction, except on early buggy 808x CPUs. Some CPUs disable interrupts
if the destination is any of the segment registers.
XCHG dest,src
Exchanges contents of source and destination.
MOVSX dest,src
Copies the value of the source operand to the destination register with the sign extended.
MOVZX dest,src
Copies the value of the source operand to the destination register, zero-extending it to the destination size.
CMPXCHG dest,src (486+) Modifies flags: AF CF OF PF SF ZF
Compares the accumulator (8-32 bits) with "dest". If equal the "dest" is loaded with "src", otherwise the
accumulator is loaded with "dest".
CWD
Extends sign of word in register AX throughout register DX forming a doubleword quantity in DX:AX.
CDQ
Converts signed DWORD in EAX to a signed quad word in EDX:EAX by extending the high order bit of EAX
throughout EDX
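
Sign extension versus zero extension can be shown with a short Python sketch. The helper names mirror the mnemonics but are otherwise hypothetical, and only the byte-to-word and word-to-doubleword cases are modelled.

```python
def movsx_8_to_16(value):
    """MOVSX byte -> word: copy bit 7 into the upper byte."""
    value &= 0xFF
    return value | 0xFF00 if value & 0x80 else value


def movzx_8_to_16(value):
    """MOVZX byte -> word: the upper byte is filled with zeroes."""
    return value & 0xFF


def cwd(ax):
    """CWD: extend the sign of AX into DX, returning (DX, AX)."""
    ax &= 0xFFFF
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax


print(hex(movsx_8_to_16(0xFE)), hex(movzx_8_to_16(0xFE)), cwd(0x8000))
```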

string/array instructions
MOVS dest,src
MOVSB
MOVSW
MOVSD (386+)
Copies data from the location addressed by DS:SI (even if operands are given) to the destination location
ES:DI and updates SI and DI based on the size of the operand or instruction used. SI and DI are incremented
when the Direction Flag is cleared and decremented when the Direction Flag is set. Use with REP prefixes.
CMPS dest,src Modifies flags: AF CF OF PF SF ZF
CMPSB
CMPSW
CMPSD (386+)
Subtracts destination value from source without saving results. Updates flags based on the subtraction and
the index registers (E)SI and (E)DI are incremented or decremented depending on the state of the Direction
Flag. CMPSB inc/decrements the index registers by 1, CMPSW inc/decrements by 2, while CMPSD
increments or decrements by 4. The REP prefixes can be used to process entire data items.

SCAS string Modifies flags: AF CF OF PF SF ZF
SCASB
SCASW
SCASD (386+)
Compares the value at ES:DI (even if an operand is specified) with the accumulator and sets the flags as for a
subtraction. DI is incremented/decremented based on the instruction format (or operand size) and the state
of the Direction Flag. Use with REP prefixes.
LODS src
LODSB
LODSW
LODSD (386+)
Transfers string element addressed by DS:SI (even if an operand is supplied) to the accumulator. SI is
incremented based on the size of the operand or based on the instruction used. If the Direction Flag is set
SI is decremented, if the Direction Flag is clear SI is incremented. Use with REP prefixes.
STOS dest
STOSB
STOSW
STOSD
Stores value in accumulator to location at ES:(E)DI (even if operand is given). (E)DI is
incremented/decremented based on the size of the operand (or instruction format) and the state of the
Direction Flag. Use with REP prefixes.

REP
Repeats execution of string instructions while CX != 0. After each string operation, CX is decremented; the
Zero Flag is tested only by the REPE/REPNE forms. The combination of a repeat prefix and a segment override
on CPUs before the 386 may result in errors if an interrupt occurs before CX=0. The following code is
susceptible to this and shows how to avoid it:
again:  rep movs byte ptr ES:[DI],ES:[SI]   ; vulnerable instr.
        jcxz next                           ; continue if REP successful
        loop again                          ; interrupt goofed count
next:
REPE
REPZ
Repeats execution of string instructions while CX != 0 and the Zero Flag is set. CX is decremented and the
Zero Flag tested after each string operation. The combination of a repeat prefix and a segment override on
processors other than the 386 may result in errors if an interrupt occurs before CX=0.
REPNE
REPNZ
Repeats execution of string instructions while CX != 0 and the Zero Flag is clear. CX is decremented and
the Zero Flag tested after each string operation. The combination of a repeat prefix and a segment override
on processors other than the 386 may result in errors if an interrupt occurs before CX=0.
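
As an illustration of how a repeat prefix drives a string instruction, here is a Python model of REP MOVSB. It is a simplification under stated assumptions: DS and ES are taken to refer to the same 64 KB segment (one bytearray), and the function name and buffer contents are hypothetical.

```python
def rep_movsb(mem, si, di, cx, direction_flag=0):
    """Model REP MOVSB: copy CX bytes from [SI] to [DI], stepping by the Direction Flag."""
    step = -1 if direction_flag else 1
    while cx != 0:
        mem[di] = mem[si]          # MOVSB: one byte from DS:SI to ES:DI
        si = (si + step) & 0xFFFF  # index registers wrap at 16 bits
        di = (di + step) & 0xFFFF
        cx -= 1                    # REP: decrement CX after each operation
    return si, di, cx


memory = bytearray(0x10000)
memory[0:5] = b"hello"
final = rep_movsb(memory, si=0, di=16, cx=5)
print(final, bytes(memory[16:21]))
```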

Program flow
conditional jumps
Mnemonic Meaning Jump Condition
JA Jump if Above CF=0 and ZF=0
JAE Jump if Above or Equal CF=0
JB Jump if Below CF=1
JBE Jump if Below or Equal CF=1 or ZF=1
JC Jump if Carry CF=1
JCXZ Jump if CX Zero CX=0
JE Jump if Equal ZF=1
JG Jump if Greater (signed) ZF=0 and SF=OF
JGE Jump if Greater or Equal (signed) SF=OF
JL Jump if Less (signed) SF != OF
JLE Jump if Less or Equal (signed) ZF=1 or SF != OF
JNA Jump if Not Above CF=1 or ZF=1
JNAE Jump if Not Above or Equal CF=1
JNB Jump if Not Below CF=0
JNBE Jump if Not Below or Equal CF=0 and ZF=0
JNC Jump if Not Carry CF=0
JNE Jump if Not Equal ZF=0
JNG Jump if Not Greater (signed) ZF=1 or SF != OF
JNGE Jump if Not Greater or Equal (signed) SF != OF
JNL Jump if Not Less (signed) SF=OF
JNLE Jump if Not Less or Equal (signed) ZF=0 and SF=OF
JNO Jump if Not Overflow (signed) OF=0
JNP Jump if No Parity PF=0
JNS Jump if Not Signed (signed) SF=0
JNZ Jump if Not Zero ZF=0
JO Jump if Overflow (signed) OF=1
JP Jump if Parity PF=1
JPE Jump if Parity Even PF=1
JPO Jump if Parity Odd PF=0
JS Jump if Signed (signed) SF=1
JZ Jump if Zero ZF=1
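
The jump conditions in the table are evaluated against flags left by an earlier instruction, typically CMP. The Python sketch below computes the flags of an 8-bit CMP and checks a handful of the conditions; all names are illustrative and only five mnemonics are modelled.

```python
def cmp8(a, b):
    """Return the flags an 8-bit CMP a, b would leave (computes a - b, discards it)."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    return {
        "CF": int(a < b),                     # unsigned borrow
        "ZF": int(result == 0),
        "SF": result >> 7,                    # sign bit of the result
        "OF": ((a ^ b) & (a ^ result)) >> 7,  # signed overflow
    }


# Jump conditions copied from the table above.
CONDITIONS = {
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JB": lambda f: f["CF"] == 1,
    "JE": lambda f: f["ZF"] == 1,
    "JG": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "JL": lambda f: f["SF"] != f["OF"],
}


def taken(mnemonic, a, b):
    """Would the jump be taken after CMP a, b?"""
    return CONDITIONS[mnemonic](cmp8(a, b))


# 0xFF is 255 unsigned but -1 signed, so "above" and "greater" disagree.
print(taken("JA", 0xFF, 0x01), taken("JG", 0xFF, 0x01), taken("JL", 0xFF, 0x01))
```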

JCXZ label
JECXZ label (386+)
Causes execution to branch to "label" if register CX (ECX for JECXZ) is zero. Uses unsigned comparison.
JMP target
Unconditionally transfers control to "target". Jumps by default are within -32768 to 32767 bytes from the
instruction following the jump. NEAR and SHORT jumps cause the IP to be updated while FAR jumps
cause CS and IP to be updated.
LEAVE
Releases the local variables created by the previous ENTER instruction by restoring SP and BP to their
condition before the procedure stack frame was initialized.
ENTER locals, level
Modifies stack for entry to procedure for high level language.
Operand "locals" specifies the amount of storage to be allocated on the stack. "Level" specifies the nesting
level of the routine. Paired with the LEAVE instruction, this is an efficient method of entry and exit to
procedures.

LOOP label
Decrements CX by 1 and transfers control to "label" if CX is not zero. The "label" operand must be within
-128 to +127 bytes of the instruction following the loop instruction.
LOOPE label
LOOPZ label
Decrements CX by 1 (without modifying the flags) and transfers control to "label" if CX != 0 and the Zero Flag
is set. The "label" operand must be within -128 to +127 bytes of the instruction following the loop instruction.
LOOPNZ label
LOOPNE label
Decrements CX by 1 (without modifying the flags) and transfers control to "label" if CX != 0 and the Zero Flag
is clear. The "label" operand must be within -128 to +127 bytes of the instruction following the loop instruction.
INT num Modifies flags: TF IF
Initiates a software interrupt by pushing the flags, clearing the Trap and Interrupt Flags, pushing CS followed
by IP and loading CS:IP with the value found in the interrupt vector table. Execution then begins at the
location addressed by the new CS:IP
CALL destination
Pushes Instruction Pointer (and Code Segment for far calls) onto stack and loads Instruction Pointer with the
address of "destination". Code continues with execution at CS:IP.
RET/RETF/RETN nBytes
Transfers control from a procedure back to the instruction address saved on the stack. "nBytes" is an optional
number of bytes to release. Far returns pop the IP followed by the CS, while near returns pop only the IP
register.

Segment Register Instructions
The segment register instructions load a far pointer (a segment:offset pair) from memory into a segment
register and a general-purpose register.
LDS dest,src
Loads 32-bit pointer from memory source to destination register and DS. The offset is placed in the
destination register and the segment is placed in DS. To use this instruction the word at the lower memory
address must contain the offset and the word at the higher address must contain the segment. This simplifies
the loading of far pointers from the stack and the interrupt vector table.
LFS dest,src
Loads 32-bit pointer from memory source to destination register and FS. The offset is placed in the
destination register and the segment is placed in FS. To use this instruction the word at the lower memory
address must contain the offset and the word at the higher address must contain the segment. This simplifies
the loading of far pointers from the stack and the interrupt vector table.
LEA dest,src
Transfers offset address of "src" to the destination register.
LES dest,src
Loads 32-bit pointer from memory source to destination register and ES. The offset is placed in the destination
register and the segment is placed in ES. To use this instruction the word at the lower memory address must
contain the offset and the word at the higher address must contain the segment. This simplifies the loading of
far pointers from the stack and the interrupt vector table.
LSS dest,src
Loads 32-bit pointer from memory source to destination register and SS. The offset is placed in the destination
register and the segment is placed in SS. To use this instruction the word at the lower memory address must
contain the offset and the word at the higher address must contain the segment. This simplifies the loading of
far pointers from the stack and the interrupt vector table.
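
The memory layout these instructions expect can be sketched in Python. The sketch assumes a 16-bit offset and little-endian byte order; load_far_pointer is a hypothetical helper name.

```python
import struct


def load_far_pointer(memory, address):
    """Return (offset, segment) from the 4 bytes at address.

    The word at the lower address is the offset (goes to the destination
    register); the word at the higher address is the segment (goes to
    DS, ES, FS or SS depending on the instruction).
    """
    offset, segment = struct.unpack_from("<HH", memory, address)
    return offset, segment
```

With the bytes 34 12 00 B8 in memory, the destination register would receive 1234h and the segment register B800h.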

I/O Instructions
These instructions move data between the processor’s I/O ports and a register or memory.
IN accum,port
A byte, word or dword is read from "port" and placed in AL, AX or EAX respectively. If the port number is in the range of 0-255 it
can be specified as an immediate, otherwise the port number must be specified in DX. Valid port numbers on the PC are 0-1023,
though values up to 65535 may be specified and are recognized by third-party hardware and PS/2 systems.
OUT port,accum
Transfers the byte in AL, the word in AX or the dword in EAX to the specified hardware port address. If the port number is in the
range of 0-255 it can be specified as an immediate. If greater than 255 then the port number must be specified in DX. Since the PC
only decodes 10 bits of the port address, values over 1023 can only be decoded by third-party vendor equipment; on the PC itself
they map to the port range 0-1023.
INS dest,port
INSB
INSW
INSD (386+)
Loads data from port to the destination ES:(E)DI (even if a destination operand is supplied). (E)DI is adjusted by the size of the
operand: it is increased if the Direction Flag is clear and decreased if the Direction Flag is set. For INSB, INSW and INSD no
operands are allowed and the size is determined by the mnemonic.
OUTS port,src
OUTSB
OUTSW
OUTSD (386+)
Transfers a byte, word or doubleword from "src" to the hardware port specified in DX. For instructions with no operands the "src"
is located at DS:SI and SI is incremented or decremented by the size of the operand or the size dictated by the instruction
format. When the Direction Flag is set SI is decremented; when clear, SI is incremented. Unlike OUT, the port cannot be given as
an immediate and is always taken from DX. Since the PC only decodes 10 bits of the port address, values over 1023 can only be
decoded by third-party vendor equipment; on the PC itself they map to the port range 0-1023.
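
The two port-number rules described for IN and OUT can be sketched in Python. The function names needs_dx and decoded_port are hypothetical; the 10-bit mask follows the statement that the PC decodes only 10 bits of the port address.

```python
def needs_dx(port):
    """True if the port cannot be encoded as an 8-bit immediate for IN/OUT."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port numbers are 16 bits wide")
    return port > 0xFF


def decoded_port(port):
    """Port as seen by PC hardware that decodes only address bits 0-9."""
    return port & 0x3FF
```

Under this model, port 7F8h and port 3F8h select the same device on hardware that ignores the upper address bits.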

Flag Control (EFLAGS) Instructions
The flag control instructions operate on the flags in the EFLAGS register.
STC Modifies flags: CF
Sets the Carry Flag to 1.
STD Modifies flags: DF
Sets the Direction Flag to 1, causing string instructions to auto-decrement SI and DI instead of auto-incrementing them.
STI Modifies flags: IF
Sets the Interrupt Flag to 1, which enables recognition of maskable hardware interrupts. If an interrupt is generated by a hardware
device, an End of Interrupt (EOI) must also be issued to enable other hardware interrupts of the same or lower priority.
SAHF Modifies flags: AF CF PF SF ZF
Transfers bits 0-7 of AH into the Flags Register. This includes AF, CF, PF, SF and ZF.
LAHF
Copies bits 0-7 of the flags register into AH. This includes flags AF, CF, PF, SF and ZF; the other bits are undefined.
AH := SF ZF xx AF xx PF xx CF
CLC Modifies flags: CF
Clears the Carry Flag.
CLD Modifies flags: DF
Clears the Direction Flag causing string instructions to increment the SI and DI index registers.
CLI Modifies flags: IF
Disables the maskable hardware interrupts by clearing the Interrupt flag. NMI's and software interrupts are not inhibited.
CLTS
Clears the Task Switched Flag in the Machine Status Word (CR0). This is a privileged operation and is generally used only by
operating system code.
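
The AH layout given for LAHF (SF ZF xx AF xx PF xx CF) can be sketched in Python, together with the matching SAHF update. This is an illustrative model; the function names and the MASK constant are hypothetical.

```python
SF, ZF, AF, PF, CF = 0x80, 0x40, 0x10, 0x04, 0x01
MASK = SF | ZF | AF | PF | CF  # the five flag bits carried in AH


def lahf(flags):
    """AH receives the low byte of the flags register."""
    return flags & 0xFF


def sahf(flags, ah):
    """Replace only SF, ZF, AF, PF and CF in flags with the bits from AH."""
    return ((flags & ~MASK) | (ah & MASK)) & 0xFFFF


def decode_ah(ah):
    """Name the five defined flag bits of an AH value."""
    bits = {"SF": SF, "ZF": ZF, "AF": AF, "PF": PF, "CF": CF}
    return {name: bool(ah & bit) for name, bit in bits.items()}
```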

Miscellaneous Instructions
The miscellaneous instructions provide such functions as loading an effective address, executing a “no-
operation,” and retrieving processor identification information.
NOP
This is a do-nothing instruction. It occupies both space and time and is most useful for
patching code segments. (It was originally the encoding of XCHG AX,AX.)
XLAT translation-table
XLATB (masm 5.x)
Replaces the byte in AL with a byte from a user table addressed by BX. The original value of AL is the index
into the translate table. The best way to describe this is MOV AL,[BX+AL], although that form is not itself a valid instruction.
CPUID
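Returns processor identification and feature information in EAX, EBX, ECX and EDX, selected by the value
placed in EAX before the instruction executes.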
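
The table lookup performed by XLAT can be sketched in Python. The sketch assumes the table is a bytes object standing in for the memory at DS:BX; the function name xlat and the hex-digit table are illustrative only.

```python
HEX_DIGITS = b"0123456789ABCDEF"  # example translate table, one byte per entry


def xlat(al, table):
    """Return the new AL: the table byte indexed by the old (unsigned) AL."""
    return table[al & 0xFF]


def nibble_to_ascii(value):
    """Translate a 4-bit value to its ASCII hex digit using the table."""
    return chr(xlat(value & 0x0F, HEX_DIGITS))
```

A typical use of this kind of table is converting a nibble to its printable hex digit, as nibble_to_ascii does here.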

OTHERS
Floating-point instructions
• instructions for a stack-based floating-point unit (FPU).
• The FPU instructions:
addition, subtraction, negation, multiplication, division, remainder, square roots,
integer truncation, fraction truncation, and scale by power of two.
The operations also include conversion instructions, which can load or store a value from memory in any of
the following formats: binary-coded decimal, 32-bit integer, 64-bit integer, 32-bit floating-point, 64-bit floating-
point or 80-bit floating-point (upon loading, the value is converted to the currently used floating-point mode).
• transcendental functions: sine, cosine, tangent, arctangent, exponentiation with the base 2 and logarithms to
bases 2, 10, or e.
• The stack register to stack register format of the instructions:
fop st, st(n) or fop st(n), st
where st is equivalent to st(0), and st(n) is one of the 8 stack registers (st(0),st(1),…, st(7)).
As with the integer instructions, the first operand is both the first source operand and the destination operand.
fsubr and fdivr should be singled out as first swapping the source operands before performing the
subtraction or division. The addition, subtraction, multiplication, division, store and comparison instructions
include instruction modes that pop the top of the stack after their operation is complete.
So, for example, faddp st(1), st performs the calculation st(1) = st(1) + st(0), then removes st(0) from
the top of stack, thus making what was the result in st(1) the top of the stack in st(0).
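
The register-stack behaviour described above can be sketched in Python. This is a toy model, not an emulator: X87Stack and its methods are hypothetical names, and values are ordinary Python floats rather than 80-bit extended values.

```python
class X87Stack:
    """Toy model of the x87 register stack; regs[0] plays the role of st(0)."""

    DEPTH = 8

    def __init__(self):
        self.regs = []

    def fld(self, value):
        # Loading pushes a new top of stack; the old st(0) becomes st(1).
        if len(self.regs) == self.DEPTH:
            raise OverflowError("the stack holds at most 8 values")
        self.regs.insert(0, float(value))

    def fsub(self, dst, src):
        # fsub dst, src: dst = dst - src
        self.regs[dst] = self.regs[dst] - self.regs[src]

    def fsubr(self, dst, src):
        # fsubr dst, src: source operands swapped, dst = src - dst
        self.regs[dst] = self.regs[src] - self.regs[dst]

    def faddp(self, n):
        # faddp st(n), st: st(n) = st(n) + st(0), then pop st(0)
        self.regs[n] = self.regs[n] + self.regs[0]
        self.regs.pop(0)
```

After loading 2.0 and then 3.0, faddp(1) leaves a single value, 5.0, at the top of the stack.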

SIMD instructions
• Modern x86 CPUs contain SIMD instructions, which largely perform the same operation in parallel on many
values encoded in a wide SIMD register.
• Various instruction technologies support different operations on different register sets, but taken as a
complete whole (from MMX to SSE4.2) they include general computations on integer or floating-point
arithmetic (addition, subtraction, multiplication, shift, minimization, maximization, comparison, division or
square root).
So for example, paddw mm0, mm1 performs 4 parallel 16-bit (indicated by the w) integer adds (indicated by
the padd) of mm0 values to mm1 and stores the result in mm0.
• Streaming SIMD Extensions or SSE also includes a scalar floating-point mode in which only the lowest value of
the register is actually modified (expanded in SSE2).
• Some other unusual instructions have been added including a sum of absolute differences (used for motion
estimation in video compression, such as is done in MPEG) and a 16-bit multiply accumulation instruction
(useful for software-based alpha-blending and digital filtering).
• SSE (since SSE3) and 3DNow! extensions include addition and subtraction instructions for treating paired
floating point values like complex numbers.
These instruction sets also include numerous fixed sub-word instructions for shuffling, inserting and
extracting the values around within the registers. In addition there are instructions for moving data between
the integer registers and XMM (used in SSE)/FPU (used in MMX) registers.
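
The paddw example above can be sketched in Python. The sketch models a 64-bit MMX register as a Python integer holding four little-endian 16-bit lanes and uses wrap-around addition; the function name paddw mirrors the mnemonic but is otherwise a hypothetical helper.

```python
import struct


def paddw(a, b):
    """Add four packed 16-bit lanes of a and b; each lane wraps independently."""
    lanes_a = struct.unpack("<4H", a.to_bytes(8, "little"))
    lanes_b = struct.unpack("<4H", b.to_bytes(8, "little"))
    summed = [(x + y) & 0xFFFF for x, y in zip(lanes_a, lanes_b)]
    return int.from_bytes(struct.pack("<4H", *summed), "little")
```

The point of the lane mask is that an overflow in one 16-bit lane does not carry into its neighbour, unlike an ordinary 64-bit add.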

Sources:
https://en.wikipedia.org/wiki/Stored-program_computer
https://en.wikipedia.org/wiki/Von_Neumann_architecture
https://en.wikipedia.org/wiki/Harvard_architecture
https://en.wikipedia.org/wiki/X86
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
http://www.masm32.com/
