x86
Features and Instruction Set Architecture
Stored-program computer
A stored-program computer keeps program instructions in electronic memory, where programs and data can be treated
interchangeably or uniformly.
von Neumann architecture
• also known as the von Neumann model or Princeton architecture, after the 1945 description by John von
Neumann and others in the First Draft of a Report on the EDVAC
• stores program data and instruction data in the same memory
• consists of:
processing unit (arithmetic logic unit and processor registers)
control unit (instruction register and program counter)
memory (for data and instructions)
external mass storage
input and output mechanisms
• an instruction fetch and a data operation cannot occur at the same time because they share a common bus;
this limitation, known as the von Neumann bottleneck, often limits system performance.
Harvard architecture
• Based on the Harvard Mark I
• Data and instructions are stored in entirely separate memory systems
• The CPU can fetch the next instruction and load or store data simultaneously and independently
Modified Harvard architecture
• loose separation between code and data
• the contents of the instruction memory can be accessed as if they were data.
• implemented on most modern CPU architectures
Implementation Modifications
• Split-cache (or Almost-von-Neumann) architecture
• builds a memory hierarchy with separate CPU caches for instructions and data;
• unifies all but small portions of the data and instruction address spaces (the split caches), providing the von
Neumann model
• cache coherency between the instruction and data caches matters, since it can greatly affect performance
• Instruction-memory-as-data architecture
• Preserves Harvard memory separation, but provides special machine operations to access the
contents of the instruction memory as data.
• Data-memory-as-instruction architecture
• can execute instructions fetched from any memory segment
• can read an instruction and read a data value simultaneously if they're in separate memory segments
with independent data buses (like Harvard).
• when executing an instruction from one memory segment, the same memory segment cannot be
simultaneously accessed as data
Three characteristics distinguish modified Harvard machines from pure Harvard and von Neumann machines:
Instruction and data memories occupy different address spaces
• Pure Harvard: separate address "zero" in instruction space and in data space
• Von Neumann: instructions and data are stored in a single address space
• Modified Harvard: separate address "zero" in instruction space and in data space
Instruction and data memories have separate hardware pathways to the central processing unit (CPU)
• Pure Harvard: separate pathways for instruction and data memories to the CPU
• Von Neumann: a single pathway to a unified address space
• Modified Harvard: separate access paths for CPU caches or other tightly coupled memories, but a unified address space covers the rest of the memory hierarchy
Instruction and data memories may be accessed in different ways
• Pure Harvard: e.g., the Harvard Mark I stored instructions on punched paper tape and data in electro-mechanical counters
• Von Neumann and Modified Harvard: uniform access to flash memory and SRAM
Basic properties of the x86 architecture
• x86 is generally regarded as a modified Harvard architecture.
• x86 uses variable-length instructions (typically 2 or 3 bytes; some are a single byte, others up to 15 bytes).
• Primarily "CISC" design with emphasis on backward compatibility.
• The instruction set is not typical CISC, however, but an extended version of the simple eight-bit 8008 and 8080
architectures.
• Memory is byte-addressed, and multi-byte words are stored in little-endian byte order (least significant byte first).
• Unaligned memory accesses are allowed for all valid word sizes (although some SIMD instructions, such as MOVAPS,
require aligned operands).
• Native integer sizes for arithmetic and memory addresses (or offsets) are 16, 32 or 64 bits, depending on the
architecture generation.
• Multiple scalar values can be handled simultaneously via the SIMD unit (SSE, starting with the Pentium III).
• Floating-point instructions and registers are provided for floating-point operations (the FPU was a separate chip
before the 80486 and has been built in ever since).
• SIMD (single instruction, multiple data) instructions work on one or two 128-bit words, each containing
two or four floating-point numbers (64 or 32 bits wide, respectively) or, alternatively, 2, 4, 8 or 16
integers (64, 32, 16 or 8 bits wide, respectively).
• Pipelining and superscalar execution (superscalar starting with the Pentium); from the Pentium Pro onward, extra
decoding steps split most instructions into micro-operations, which a control unit buffers and schedules for
execution, partly in parallel, by one of several execution units.
• Out-of-order and speculative execution uses branch prediction, register renaming, and memory
dependence prediction to allow execution of multiple x86 instructions simultaneously and not in the same
order as given in the instruction stream.
• Simultaneous multithreading
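The little-endian rule above can be checked with Python's standard `struct` module. This is a minimal sketch (the helper names `le_bytes32` and `from_le_bytes32` are hypothetical) that lays out a 32-bit value the way an x86 CPU stores it in memory, least significant byte at the lowest address:

```python
import struct

def le_bytes32(value: int) -> bytes:
    """Bytes of a 32-bit value in x86 memory order (lowest address first)."""
    return struct.pack("<I", value)  # "<" = little-endian, "I" = unsigned 32-bit

def from_le_bytes32(data: bytes) -> int:
    """Reassemble a 32-bit value from 4 bytes stored in little-endian order."""
    return struct.unpack("<I", data)[0]

stored = le_bytes32(0x12345678)
print(stored.hex(" "))               # 78 56 34 12
print(hex(from_le_bytes32(stored)))  # 0x12345678
```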
x86 REGISTERS
16-bit
• The original Intel 8086 and 8088 have fourteen 16-bit registers.
• Four are general-purpose registers (GPRs): AX, BX, CX and DX; each can be accessed as two separate bytes,
the high byte and the low byte (e.g. AH and AL for AX).
• Two pointer registers have special roles: SP (stack pointer) points to the "top" of the stack, BP (base
pointer) is used to point anywhere on the stack.
• The address/index registers: SI, DI, BX and BP
• Four segment registers: CS, DS, SS and ES (used to form a memory address in segmented memory
mode)
• The FLAGS register contains, among others, the carry flag (CF), overflow flag (OF) and zero flag (ZF).
• The instruction pointer (IP) points to the next instruction that will be fetched from memory and then
executed; it cannot be read or written directly by software.
• The 80286 added three special registers (GDTR, LDTR, IDTR) that hold descriptor table addresses to support
protected mode, and a fourth, the task register (TR), used for task switching.
32-bit
• 32-bit processors (starting with the 80386) expanded the 16-bit GPRs, base and index registers, instruction
pointer and FLAGS register to 32 bits (the segment registers were not affected).
• Represented by prefixing an "E" (for "extended") to the register names in x86 assembly language.
• The general-purpose, base, and index registers can all be used as the base in addressing modes, and all of
those registers except for the stack pointer can be used as the index in addressing modes.
• Two new segment registers (FS and GS) were added.
• The machine code format was expanded to accommodate the extended registers.
• A 32-bit control/status register (MXCSR) for the Streaming SIMD Extensions (SSE) was added starting with the
Pentium III.
64-bit
• The 32-bit registers were expanded into 64-bit registers (introduced with the AMD Opteron)
• addressing extended to 64 bits
• An R prefix identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP).
• Eight additional 64-bit general registers (R8–R15) were also introduced (usable only in 64-bit mode, which
is one of the two sub-modes of long mode)
• extra addressing mode allows memory references relative to RIP (the instruction pointer), to ease the
implementation of position-independent code, used in shared libraries in some operating systems.
Miscellaneous/special purpose
• 32-bit x86 processors (starting with the 80386) also include various special/miscellaneous registers:
• control registers (CR0 through 4, CR8 for 64-bit only)
• debug registers (DR0 through 3, plus 6 and 7)
• test registers (TR3 through 7; 80486 only)
• model-specific registers (MSRs, appearing with the Pentium)
80-bit
• Available in all floating-point units (FPUs), also known as math coprocessors
• They appear as part of the CPU
• 8087 (8086, 8088, 80186, and 80188), 80287 (80286), 80387 (80386), built-in starting with 80486
• eight 80-bit wide registers: st(0) to st(7)
• each register holds numeric data in one of seven formats: 32-, 64-, or 80-bit floating point, 16-, 32-, or 64-
bit (binary) integer, and 80-bit packed decimal integer
• The Pentium MMX added eight 64-bit MMX integer registers (MMX0 to MMX7, which share lower bits with
the 80-bit-wide FPU stack).
128-bit
• SIMD registers XMM0–XMM15.
256-bit
• SIMD registers YMM0–YMM15.
• introduced with Intel's Sandy Bridge processors, SIMD registers widened to 256 bits; AVX (Advanced
Vector Extensions) instructions also introduced.
512-bit
• SIMD registers ZMM0–ZMM31, used by AVX-512 and earlier by Knights Corner (on Intel Xeon Phi coprocessors).
General Purpose Registers (A, B, C and D)
R?X: bits 63–0 (64-bit)
E?X: bits 31–0 (32-bit)
?X: bits 15–0 (16-bit)
?H: bits 15–8 and ?L: bits 7–0 (8-bit)
General Purpose
• AL/AH/AX/EAX/RAX: Accumulator
• BL/BH/BX/EBX/RBX: Base index (for use with arrays)
• CL/CH/CX/ECX/RCX: Counter (for use with loops and strings)
• DL/DH/DX/EDX/RDX: Extend the precision of the accumulator (e.g. combine 32-bit EAX and EDX for 64-bit
integer operations in 32-bit code)
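The overlapping register names in the layout above are just bit fields of one physical register. The following Python model is a sketch for illustration only (`Reg64` is a hypothetical class, not a library API); besides the AL/AH/AX/EAX views, it encodes the x86-64 rule that writing a 32-bit register zero-extends into the full 64-bit register, whereas 8- and 16-bit writes leave the other bits unchanged.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

class Reg64:
    """Hypothetical model of one x86-64 GPR (e.g. RAX) and its narrower views."""

    def __init__(self, value: int = 0):
        self.value = value & MASK64

    # Read views: each narrower name is a bit field of the same register.
    def r32(self) -> int:                  # EAX: bits 31..0
        return self.value & 0xFFFF_FFFF

    def r16(self) -> int:                  # AX: bits 15..0
        return self.value & 0xFFFF

    def hi8(self) -> int:                  # AH: bits 15..8
        return (self.value >> 8) & 0xFF

    def lo8(self) -> int:                  # AL: bits 7..0
        return self.value & 0xFF

    # Write views
    def write_lo8(self, b: int) -> None:   # e.g. MOV AL, imm8: only bits 7..0 change
        self.value = (self.value & ~0xFF) | (b & 0xFF)

    def write_hi8(self, b: int) -> None:   # e.g. MOV AH, imm8: only bits 15..8 change
        self.value = (self.value & ~0xFF00) | ((b & 0xFF) << 8)

    def write_r16(self, w: int) -> None:   # e.g. MOV AX, imm16: bits 63..16 are kept
        self.value = (self.value & ~0xFFFF) | (w & 0xFFFF)

    def write_r32(self, d: int) -> None:   # e.g. MOV EAX, imm32 in 64-bit mode:
        self.value = d & 0xFFFF_FFFF       # the upper 32 bits are zeroed

rax = Reg64(0x1122_3344_5566_7788)
rax.write_lo8(0xAA)
rax.write_hi8(0xBB)
print(hex(rax.value))   # 0x112233445566bbaa
rax.write_r32(0xDEADBEEF)
print(hex(rax.value))   # 0xdeadbeef
```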
R8-R15 (for 64-bit CPUs)
64-bit mode-only General Purpose Registers
(R8, R9, R10, R11, R12, R13, R14, R15)
?: bits 63–0 (64-bit), e.g. R8
?D: bits 31–0 (32-bit), e.g. R8D
?W: bits 15–0 (16-bit), e.g. R8W
?B: bits 7–0 (8-bit), e.g. R8B
Address/Index Registers
• SI/ESI/RSI: Source index for string operations.
• DI/EDI/RDI: Destination index for string operations.
Index Registers (S and D)
R?I: bits 63–0 (64-bit)
E?I: bits 31–0 (32-bit)
?I: bits 15–0 (16-bit)
?IL: bits 7–0 (8-bit)
Note: The ?IL registers are only available in 64-bit mode.
Stack Pointer Register
• SP/ESP/RSP: Stack pointer for top address of the stack.
• BP/EBP/RBP: Stack base pointer for holding the address of the current stack frame.
Pointer Registers (S and B)
R?P: bits 63–0 (64-bit)
E?P: bits 31–0 (32-bit)
?P: bits 15–0 (16-bit)
?PL: bits 7–0 (8-bit)
Note: The ?PL registers are only available in 64-bit mode.
Instruction Pointer Register
• IP/EIP/RIP: Instruction pointer. Holds the program counter, the address of the next instruction to be executed.
Instruction Pointer Register (I)
RIP: bits 63–0 (64-bit)
EIP: bits 31–0 (32-bit)
IP: bits 15–0 (16-bit)
Segment registers
• CS: Code
• DS: Data
• SS: Stack
• ES: Extra data
• FS: Extra data #2
• GS: Extra data #3
Segment Registers (C, D, S, E, F and G)
?S: bits 15–0 (16-bit), for each of CS, DS, SS, ES, FS and GS
MODERN x86 REGISTER MAP
• First introduced with Intel 8086 and 8088 16-bit CPUs.
• Used by Intel, AMD, Cyrix, NEC, and Zilog
• Inherited many characteristics and instructions from the previous generation of 8-bit CPUs such as the
8080.
• modern x86 instruction set is a superset of 8086 instructions and a series of extensions to this instruction
set that began with the Intel 8008 microprocessor.
• Nearly full binary backward compatibility (between the Intel 8086 chip through to the current generation of
x86 processors, with certain exceptions)
• In practice, software typically uses instructions that will execute either on anything later than an Intel 80386
(or fully compatible clone) or on anything later than an Intel Pentium (or compatible clone); in recent years,
however, various software has come to require at least support for specific later extensions to the instruction set,
e.g., MMX or SSE.
x86 INSTRUCTION SET ARCHITECTURE
Basic Instruction Format
• most registers are expressed in opcodes using three or four bits to conserve encoding space;
• at most one operand to an instruction can be a memory location
• memory operand may also be the destination (or a combined source and destination), while the other
operand, the source, can be either register or immediate.
• The relatively small number of general registers (also inherited from its 8-bit ancestors) has made
register-relative addressing (using small immediate offsets) an important method of accessing
operands, especially on the stack, making such accesses as fast as register accesses, i.e. a one cycle
instruction throughput, in most circumstances where the accessed data is available in the top-level
cache.
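• IA-32e mode (Intel's name for long mode) has two sub-modes: compatibility mode, in which a 64-bit operating
system runs legacy 16- and 32-bit protected-mode software, and 64-bit mode, which gives full access to the 64-bit address space.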
REX Prefixes
REX prefixes are instruction-prefix bytes used in 64-bit mode. They do the following:
• Specify GPRs and SSE registers.
• Specify 64-bit operand size.
• Specify extended control registers.
Not all instructions require a REX prefix in 64-bit mode. A prefix is necessary only if an instruction references one of the extended
registers or uses a 64-bit operand. If a REX prefix is used when it has no meaning, it is ignored. Only one REX prefix is allowed
per instruction. If used, the prefix must immediately precede the opcode byte or the two-byte opcode escape prefix (if present).
Other placements are ignored. The instruction-size limit of 15 bytes still applies to instructions with a REX prefix.
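The REX byte has the bit layout 0100WRXB: W selects a 64-bit operand size, while R, X and B supply a fourth bit for the ModR/M reg field, the SIB index field and the ModR/M r/m (or SIB base) field, which is how R8–R15 become encodable. The Python sketch below (`rex` and `encode_mov_r64_r64` are hypothetical helpers) encodes only register-to-register MOV with opcode 89 /r; memory operands, SIB bytes and other prefixes are deliberately left out.

```python
# Register numbers used in ModR/M and REX fields; R8-R15 need the extra REX bit.
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def rex(w: int, r: int, x: int, b: int) -> int:
    """Build a REX prefix byte with the layout 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

def encode_mov_r64_r64(dst: str, src: str) -> bytes:
    """Encode MOV dst, src for two 64-bit GPRs with opcode 89 /r (MOV r/m64, r64)."""
    d, s = REGS64.index(dst), REGS64.index(src)
    # ModR/M: mod=11 (register operand), reg = low 3 bits of src, r/m = low 3 bits of dst
    modrm = 0b11_000_000 | ((s & 7) << 3) | (d & 7)
    # REX.W=1 selects the 64-bit operand size; REX.R and REX.B carry bit 3 of src and dst
    return bytes([rex(1, s >> 3, 0, d >> 3), 0x89, modrm])

print(encode_mov_r64_r64("rax", "rbx").hex(" "))  # 48 89 d8
print(encode_mov_r64_r64("r8", "rax").hex(" "))   # 49 89 c0
```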
• Instruction format for protected mode, real-address mode, and virtual-8086 mode
The Intel 64 and IA-32 architectures instruction encodings are subsets of the format shown. Instructions consist of optional
instruction prefixes (in any order), primary opcode bytes (up to three bytes), an addressing-form specifier (if required) consisting
of the ModR/M byte and sometimes the SIB (Scale-Index-Base) byte, a displacement (if required), and an immediate data field (if
required)
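The addressing-form specifier can be taken apart with a few shifts and masks: ModR/M holds mod (bits 7–6), reg (bits 5–3) and r/m (bits 2–0), and when r/m is 100b and mod is not 11b, a SIB byte follows with scale (bits 7–6), index (bits 5–3) and base (bits 2–0). A minimal sketch with hypothetical helper names, decoding the 32-bit instruction 8B 44 8B 08; special cases such as mod = 00 with r/m = 101 (a bare displacement) are not handled.

```python
def split_modrm(byte: int):
    """Split a ModR/M byte into (mod, reg, rm): bits 7-6, 5-3 and 2-0."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111

def split_sib(byte: int):
    """Split a SIB byte into (scale factor, index, base); scale = 2 ** (bits 7-6)."""
    return 1 << (byte >> 6), (byte >> 3) & 0b111, byte & 0b111

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# 8B 44 8B 08 = MOV EAX, [EBX + ECX*4 + 8] in 32-bit mode:
# opcode 8B (MOV r32, r/m32), ModR/M 44, SIB 8B, 8-bit displacement 08.
mod, reg, rm = split_modrm(0x44)    # mod=1 (disp8 follows), reg=0 (EAX), rm=4 (SIB follows)
scale, index, base = split_sib(0x8B)
print(f"{REG32[reg]}, [{REG32[base]} + {REG32[index]}*{scale} + 8]")
# eax, [ebx + ecx*4 + 8]
```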
Mnemonics and opcodes
• Each x86 assembly instruction is represented by a mnemonic which, often combined with one or more
operands, translates to one or more bytes called an opcode;
NOP : 0x90
HLT : 0xF4
There are potential opcodes with no documented mnemonic which different processors may interpret
differently, making a program using them behave inconsistently or even generate an exception on some
processors. These opcodes often turn up in code writing competitions as a way to make the code smaller,
faster, more elegant or just show off the author's prowess.
A demonstration of how to find undocumented opcodes in x86 CPUs:
https://www.youtube.com/watch?v=KrksBdWcZgQ
Syntax
• x86 assembly language has two main syntax branches:
Intel syntax: originally used for documentation of the x86 platform and dominant in the MS-DOS and
Windows world (many x86 assemblers use Intel syntax, including NASM, FASM, MASM, TASM and YASM)
AT&T syntax: dominant in the Unix world, since Unix was created at AT&T Bell Labs
Summary of the main differences between Intel syntax and AT&T syntax:
Parameter order
• AT&T: source before the destination.
    mov $5, %eax
• Intel: destination before source.
    mov eax, 5
Parameter size
• AT&T: mnemonics are suffixed with a letter indicating the size of the operands: q for qword, l for long (dword), w for word and b for byte.
    addl $4, %esp
• Intel: derived from the name of the register that is used (e.g. rax, eax, ax, al imply q, l, w, b, respectively).
    add esp, 4
Sigils
• AT&T: immediate values are prefixed with a "$", registers with a "%".
• Intel: the assembler automatically detects the type of symbols, i.e., whether they are registers, constants or something else.
Effective addresses
• AT&T: general syntax DISP(BASE,INDEX,SCALE). Example:
    movl mem_location(%ebx,%ecx,4), %eax
• Intel: arithmetic expressions in square brackets; additionally, size keywords like byte, word or dword have to be used if the size cannot be determined from the operands. Example:
    mov eax, [ebx + ecx*4 + mem_location]
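Both effective-address notations describe the same arithmetic, DISP + BASE + INDEX * SCALE, where SCALE is 1, 2, 4 or 8. A minimal Python sketch, assuming a 32-bit address size; `effective_address` is a hypothetical helper and the register values (EBX = 0x1000, ECX = 3, mem_location = 0x20) are made up for illustration:

```python
def effective_address(disp: int = 0, base: int = 0, index: int = 0, scale: int = 1,
                      bits: int = 32) -> int:
    """DISP + BASE + INDEX*SCALE, wrapped to the address size.

    AT&T:  disp(base, index, scale)
    Intel: [base + index*scale + disp]
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    return (disp + base + index * scale) & ((1 << bits) - 1)

# movl mem_location(%ebx,%ecx,4), %eax  ==  mov eax, [ebx + ecx*4 + mem_location]
print(hex(effective_address(disp=0x20, base=0x1000, index=3, scale=4)))  # 0x102c
```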
Execution modes
• Real mode (16-bit)
• Original operating mode of early generation x86 CPUs
• Protected mode (16-bit and 32-bit)
• The 16-bit instructions of the 16-bit x86 processors are available in real mode on all x86 processors, and
in 16-bit protected mode (80286 onward) additional instructions relating to protected mode are available.
On the 80386 and later, 32-bit instructions (including later extensions) are also available in all modes,
including real mode.
• The protected mode of the 80286 was extended to allow the 80386 to address up to 4 GB of memory.
• The 80386's 32-bit flat memory model helped drive large-scale adoption of Windows 3.1 (which relied on
protected mode), since Windows could now run many applications at once, including DOS applications, by
using virtual memory and simple multitasking.
• Virtual 8086 mode (16-bit)
• Virtual 8086 mode (VM86) made it possible to run one or more real-mode programs in a protected
environment that emulated real mode, although some programs were not fully compatible with it.
• System Management Mode (16-bit)
• SMM, with some of its own special instructions, is available on some Intel i386SL, i486 and later
CPUs
• Long mode (64-bit)
• 64-bit instructions, and more registers, are also available. The instruction set is similar in each mode
but memory addressing and word size vary, requiring different programming strategies.
Segmented addressing (real, vm86, 80286 protected modes)
• uses a process known as segmentation to address memory
• Segmentation composes a memory address from two parts, a segment and an offset: the segment points to
the beginning of a 64 KB group of addresses, and the offset determines how far from this beginning address
the desired address is.
• In segmented addressing, two registers are required for a complete memory address: one to hold the
segment, the other to hold the offset. To translate this into a flat address, the segment value is
shifted four bits left (equivalent to multiplying by 2^4 = 16) and then added to the offset to form the full address.
This allows breaking the 64 KB barrier through clever choice of addresses, though it makes programming
considerably more complex.
Example:
DS = 0xDEAD, DX = 0xCAFE
memory address = 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE.
• Combining segment and offset values in this way yields a 20-bit address, so the CPU can address up to
2^20 = 1,048,576 bytes (1 MB) in real mode.
• When referring to an address with a segment and an offset the notation of segment:offset is used, so in the
above example the flat address 0xEB5CE can be written as 0xDEAD:0xCAFE or as a segment and offset
register pair; DS:DX.
• There are some special combinations of segment registers and general registers that point to important
addresses:
CS:IP (CS is Code Segment, IP is Instruction Pointer)
points to the address where the processor will fetch the next byte of code.
SS:SP (SS is Stack Segment, SP is Stack Pointer)
points to the address of the top of the stack, i.e. the most recently pushed byte.
DS:SI (DS is Data Segment, SI is Source Index)
is often used to point to string data that is about to be copied to ES:DI.
ES:DI (ES is Extra Segment, DI is Destination Index)
is typically used to point to the destination for a string copy, as mentioned above.
• In 80286 protected mode (used by OS/2):
The 80286 had 16-bit address registers, limiting each segment offset to 2^16 bytes (64 kilobytes) of addressable space.
In protected mode, the CPU can use 24-bit addressing to access 2^24 bytes (16 megabytes) of memory.
• In protected mode, the segment selector can be broken down into three parts: a 13-bit index, a Table
Indicator bit that determines whether the entry is in the GDT or the LDT, and a 2-bit Requested Privilege
Level.
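The two addressing schemes above differ in what a segment register holds: a paragraph number in real mode, a selector in protected mode. A minimal Python sketch of both (the function names are hypothetical); the 20-bit mask models the 8086, whose addresses wrap at 1 MB, whereas later CPUs with the A20 line enabled can reach slightly beyond it, and the protected-mode descriptor lookup itself is not modelled.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Real-mode physical address: segment * 16 + offset."""
    # Shifting left by 4 bits multiplies by 2**4 = 16. The 20-bit mask models the
    # 8086's 20 address lines, on which addresses above 1 MB wrap around to 0.
    return ((segment << 4) + offset) & 0xFFFFF

def decode_selector(selector: int) -> dict:
    """Split a 16-bit protected-mode segment selector into its three fields."""
    return {
        "index": selector >> 3,                         # bits 15-3: descriptor number
        "table": "LDT" if selector & 0b100 else "GDT",  # bit 2: Table Indicator (TI)
        "rpl": selector & 0b11,                         # bits 1-0: Requested Privilege Level
    }

print(hex(real_mode_address(0xDEAD, 0xCAFE)))   # 0xeb5ce
# Different segment:offset pairs can name the same byte:
print(real_mode_address(0x1000, 0x0010) == real_mode_address(0x1001, 0x0000))  # True
print(decode_selector(0x001B))                  # {'index': 3, 'table': 'GDT', 'rpl': 3}
```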
BASIC x86 INSTRUCTIONS
Stack instructions
PUSH src/immed
Decrements SP by the size of the operand (two or four, byte values are sign extended) and transfers one word
from source to the stack top (SS:SP).
POP dest
Transfers the word at the current stack top (SS:SP) to the destination, then increments SP by two (four for 32-bit
operands) to point to the new stack top. CS is not a valid destination.
PUSHA
PUSHAD (386+)
Pushes all general purpose registers onto the stack in the following order: (E)AX, (E)CX, (E)DX, (E)BX, (E)SP,
(E)BP, (E)SI, (E)DI. The value pushed for (E)SP is the value it had before the instruction began.
POPA
POPAD (386+)
Pops the top 8 words off the stack into the 8 general purpose 16/32 bit registers. Registers are popped in the
following order: (E)DI, (E)SI, (E)BP, (E)SP, (E)DX, (E)CX and (E)AX. The (E)SP value popped from the stack
is actually discarded.
POPF
POPFD (386+)
Pops word / doubleword from stack into the Flags Register and then increments SP by 2 (for POPF) or 4 (for
POPFD).
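The stack rules above (decrement before storing, increment after loading, the PUSHA/POPA order and the discarded SP) can be sketched as a toy 16-bit model. `Stack16`, `pusha` and `popa` are hypothetical names; the model ignores SS, assumes SP stays even, and stores words in little-endian order.

```python
class Stack16:
    """Toy 16-bit stack: one 64 KB stack segment addressed by SP (SS is ignored)."""

    def __init__(self, sp: int = 0x0000):
        self.mem = bytearray(0x10000)
        self.sp = sp                     # assumed to be even

    def push(self, word: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF                                     # decrement SP first...
        self.mem[self.sp:self.sp + 2] = (word & 0xFFFF).to_bytes(2, "little")  # ...then store

    def pop(self) -> int:
        word = int.from_bytes(self.mem[self.sp:self.sp + 2], "little")  # read the top word...
        self.sp = (self.sp + 2) & 0xFFFF                                 # ...then increment SP
        return word

def pusha(stack: Stack16, regs: dict) -> None:
    """PUSHA: push AX, CX, DX, BX, SP (its value before PUSHA), BP, SI, DI."""
    original_sp = stack.sp
    for name in ("ax", "cx", "dx", "bx", "sp", "bp", "si", "di"):
        stack.push(original_sp if name == "sp" else regs[name])

def popa(stack: Stack16) -> dict:
    """POPA: pop DI, SI, BP, SP, BX, DX, CX, AX; the popped SP value is discarded."""
    regs = {}
    for name in ("di", "si", "bp", "sp", "bx", "dx", "cx", "ax"):
        value = stack.pop()
        if name != "sp":
            regs[name] = value
    return regs

s = Stack16(sp=0x0100)
s.push(0x1234)
print(hex(s.sp), s.mem[0xFE:0x100].hex())   # 0xfe 3412
print(hex(s.pop()), hex(s.sp))              # 0x1234 0x100
```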
Integer ALU instructions
standard mathematical operations:
ADD dest,src Modifies flags: AF CF OF PF SF ZF
Adds "src" to "dest", replacing the original contents of "dest". Both operands are binary.
SUB dest,src Modifies flags: AF CF OF PF SF ZF
The source is subtracted from the destination and the result is stored in the destination.
MUL src Modifies flags: CF OF (AF,PF,SF,ZF undefined)
Unsigned multiply of the accumulator by the source. If "src" is a byte value, then AL is used as the other
multiplicand and the result is placed in AX. If "src" is a word value, then AX is multiplied by "src" and DX:AX
receives the result. If "src" is a double word value, then EAX is multiplied by "src" and EDX:EAX receives the
result. The 386+ uses an early out algorithm which makes multiplying any size value in EAX as fast as in the
8 or 16 bit registers.
DIV src Modifies flags: (AF,CF,OF,PF,SF,ZF undefined)
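Unsigned binary division of the accumulator by the source. If the divisor is a byte value, AX is divided by
"src" and the quotient is placed in AL and the remainder in AH. If the source operand is a word value, DX:AX
is divided by "src" and the quotient is stored in AX and the remainder in DX. If it is a doubleword value (386+),
EDX:EAX is divided by "src", with the quotient in EAX and the remainder in EDX. A zero divisor, or a quotient too
large for the destination, raises a divide error (#DE).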
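The register pairing for the 16-bit forms of MUL and DIV can be sketched in Python. The helper names are hypothetical, and Python's `ZeroDivisionError`/`OverflowError` stand in for the CPU's divide error; MUL's rule that CF and OF are set only when the upper half (DX) is non-zero is included.

```python
def mul16(ax: int, src: int):
    """MUL r/m16 (unsigned): DX:AX = AX * src. Returns (dx, ax, cf_of)."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    dx, ax = product >> 16, product & 0xFFFF
    return dx, ax, dx != 0            # CF = OF = 1 only if the upper half (DX) is non-zero

def div16(dx: int, ax: int, src: int):
    """DIV r/m16 (unsigned): DX:AX / src. Returns (quotient -> AX, remainder -> DX)."""
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    src &= 0xFFFF
    if src == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    quotient, remainder = divmod(dividend, src)
    if quotient > 0xFFFF:
        raise OverflowError("#DE: quotient does not fit in AX")
    return quotient, remainder

dx, ax, overflow = mul16(0x1234, 0x0100)
print(hex(dx), hex(ax), overflow)             # 0x12 0x3400 True
print(div16(dx, ax, 0x0100) == (0x1234, 0))   # True
```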
logical operators:
AND dest,src Modifies flags: CF OF PF SF ZF (AF undefined)
Performs a logical AND of the two operands replacing the destination with the result.
OR dest,src Modifies flags: CF OF PF SF ZF (AF undefined)
Logical inclusive OR of the two operands returning the result in the destination. Any bit set in either
operand will be set in the destination.
XOR dest,src Modifies flags: CF OF PF SF ZF (AF undefined)
Performs a bitwise exclusive OR of the operands and returns the result in the destination.
NEG dest Modifies flags: AF CF OF PF SF ZF
Subtracts the destination from 0 and saves the two's complement of "dest" back into "dest".
bitshift arithmetic and logical:
SAL dest,count Modifies flags: CF OF PF SF ZF (AF undefined)
SHL dest,count
.-.     .---------------.     .-.
|C|<----|7 <---------- 0|<----|0|
'-'     '---------------'     '-'
Shifts the destination left by "count" bits with zeroes shifted in on right. The Carry Flag contains the last bit
shifted out.
SAR dest,count Modifies flags: CF OF PF SF ZF (AF undefined)
    .---------------.     .-.
 .--|7 ----------> 0|---->|C|
 |  '---------------'     '-'
 '---^
Shifts the destination right by "count" bits with the current sign bit replicated in the leftmost bit. The Carry
Flag contains the last bit shifted out.
SHR dest,count Modifies flags: CF OF PF SF ZF (AF undefined)
.-.     .---------------.     .-.
|0|---->|7 ----------> 0|---->|C|
'-'     '---------------'     '-'
Shifts the destination right by "count" bits with zeroes shifted in on the left. The Carry Flag contains the last
bit shifted out.
rotate with and without carry:
RCL dest,count Modifies flags: CF OF
    .-.     .---------------.
 .--|C|<----|7 <---------- 0|<-.
 |  '-'     '---------------'  |
 '-----------------------------'
Rotates the bits in the destination to the left "count" times through the Carry Flag: each bit pushed out the left
side goes into the Carry Flag, while the previous Carry Flag value re-enters on the right. The Carry Flag holds the
last bit rotated out.
RCR dest,count Modifies flags: CF OF
    .---------------.     .-.
 .->|7 ----------> 0|---->|C|--.
 |  '---------------'     '-'  |
 '-----------------------------'
Rotates the bits in the destination to the right "count" times through the Carry Flag: each bit pushed out the right
side goes into the Carry Flag, while the previous Carry Flag value re-enters on the left. The Carry Flag holds the
last bit rotated out.
ROL dest,count Modifies flags: CF OF
.-.     .---------------.
|C|<-.--|7 <---------- 0|<-.
'-'  |  '---------------'  |
     '---------------------'
Rotates the bits in the destination to the left "count" times with all data pushed out the left side re-entering on
the right. The Carry Flag will contain the value of the last bit rotated out.
ROR dest,count Modifies flags: CF OF
    .---------------.     .-.
 .->|7 ----------> 0|--.->|C|
 |  '---------------'  |  '-'
 '---------------------'
Rotates the bits in the destination to the right "count" times with all data pushed out the right side re-entering on
the left. The Carry Flag will contain the value of the last bit rotated out.
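For 8-bit operands, the shift and rotate diagrams above can be modelled bit by bit. The functions below are a hypothetical sketch: they track only the result and CF (not OF), return the incoming CF unchanged for a count of 0, and ignore the hardware's masking of the count. RCR and ROR are the mirror images of RCL and ROL.

```python
MASK8 = 0xFF  # model 8-bit operands (bits 7..0)

def shl(v, n, cf=0):
    """SHL/SAL: zeroes enter on the right; CF = last bit shifted out of bit 7."""
    for _ in range(n):
        cf = (v >> 7) & 1
        v = (v << 1) & MASK8
    return v, cf

def shr(v, n, cf=0):
    """SHR: zeroes enter on the left; CF = last bit shifted out of bit 0."""
    for _ in range(n):
        cf = v & 1
        v >>= 1
    return v, cf

def sar(v, n, cf=0):
    """SAR: the sign bit (bit 7) is replicated; CF = last bit shifted out of bit 0."""
    for _ in range(n):
        cf = v & 1
        v = (v >> 1) | (v & 0x80)
    return v, cf

def rol(v, n, cf=0):
    """ROL: bit 7 re-enters at bit 0 and is also copied into CF."""
    for _ in range(n):
        cf = (v >> 7) & 1
        v = ((v << 1) & MASK8) | cf
    return v, cf

def rcl(v, n, cf=0):
    """RCL: rotate through carry; bit 7 goes to CF while the old CF enters at bit 0."""
    for _ in range(n):
        v, cf = ((v << 1) & MASK8) | cf, (v >> 7) & 1
    return v, cf

for name, (value, cf) in [("SHL", shl(0x81, 1)), ("SHR", shr(0x81, 1)),
                          ("SAR", sar(0x81, 1)), ("ROL", rol(0x81, 1)),
                          ("RCL", rcl(0x81, 1, 0))]:
    print(f"{name} 0x81 by 1 -> 0x{value:02X}, CF={cf}")
# SHL -> 0x02, SHR -> 0x40, SAR -> 0xC0, ROL -> 0x03, RCL -> 0x02; CF=1 in every case
```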
complement of BCD arithmetic instructions / others
AAA Modifies flags: AF CF (OF,PF,SF,ZF undefined)
Changes contents of AL to valid unpacked decimal. The high order nibble is zeroed.
AAD Modifies flags: SF ZF PF (AF,CF,OF undefined)
Used before dividing unpacked decimal numbers. Multiplies AH by 10, adds the result into AL, and sets AH to
zero. This instruction is also known to have an undocumented behavior.
AL := 10*AH+AL
AH := 0
AAM Modifies flags: PF SF ZF (AF,CF,OF undefined)
AH := AL / 10
AL := AL mod 10
Used after multiplication of two unpacked decimal numbers, this instruction adjusts an unpacked decimal
number. The high order nibble of each byte must be zeroed before using this instruction. This instruction is
also known to have an undocumented behavior.
AAS Modifies flags: AF CF (OF,PF,SF,ZF undefined)
Corrects result of a previous unpacked decimal subtraction in AL. High order nibble is zeroed.
DAA Modifies flags: AF CF PF SF ZF (OF undefined)
Corrects result (in AL) of a previous BCD addition operation. Contents of AL are changed to a pair of packed
decimal digits.
DAS Modifies flags: AF CF PF SF ZF (OF undefined)
Corrects result (in AL) of a previous BCD subtraction operation. Contents of AL are changed to a pair of
packed decimal digits.
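The decimal adjustments above can be written out step by step. This sketch follows the DAA pseudocode in Intel's Software Developer's Manual; the helper names and the (AL, CF, AF) tuple convention are hypothetical. `aam`/`aad` take the base as a parameter because the instruction encodes it as an immediate byte (normally 10); using other values is commonly identified as the historically undocumented behavior.

```python
def add8(a, b):
    """8-bit ADD returning (AL, CF, AF); AF is the carry out of the low nibble."""
    total = a + b
    return total & 0xFF, total > 0xFF, (a & 0x0F) + (b & 0x0F) > 0x0F

def daa(al, cf, af):
    """DAA after adding two packed-BCD bytes; returns the adjusted (AL, CF, AF)."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:          # low digit out of range: add 6
        al = (al + 0x06) & 0xFF
        af = True
    else:
        af = False
    if old_al > 0x99 or old_cf:        # high digit out of range: add 0x60, set carry
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af

def aam(al, base=10):
    """AAM: AH = AL / base, AL = AL mod base. Returns (AH, AL)."""
    return al // base, al % base

def aad(ah, al, base=10):
    """AAD: AL = AH * base + AL (kept to 8 bits), AH = 0. Returns (AH, AL)."""
    return 0, (ah * base + al) & 0xFF

al, cf, af = add8(0x38, 0x45)        # binary sum 0x7D is not valid BCD
print(hex(daa(al, cf, af)[0]))       # 0x83, i.e. 38 + 45 = 83 in BCD
print(aam(7 * 9), aad(6, 3))         # (6, 3) (0, 63)
```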
Data manipulation instructions
data transfer instructions
MOV dest,src
Copies a byte, word or doubleword from the source operand to the destination operand. If the destination is SS,
interrupts are disabled until after the next instruction (except on early, buggy 808x CPUs), so that SS and SP can be
loaded as a pair. Some CPUs disable interrupts if the destination is any of the segment registers.
XCHG dest,src
Exchanges contents of source and destination.
MOVSX dest,src
Copies the value of the source operand to the destination register, sign-extended to the destination size.
MOVZX dest,src
Copies the value of the source operand to the destination register, zero-extended to the destination size.
CMPXCHG dest,src (486+) Modifies flags: AF CF OF PF SF ZF
Compares the accumulator (8-32 bits) with "dest". If they are equal, ZF is set and "dest" is loaded with "src";
otherwise ZF is cleared and the accumulator is loaded with "dest".
CWD
Extends sign of word in register AX throughout register DX forming a doubleword quantity in DX:AX.
CDQ
Converts signed DWORD in EAX to a signed quad word in EDX:EAX by extending the high order bit of EAX
throughout EDX
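Sign and zero extension reduce to masks on the raw register bits. A minimal sketch with hypothetical helper names, operating on unsigned Python integers that hold the bit patterns (CWD and CDQ return only the new DX or EDX, since AX or EAX is unchanged):

```python
def movzx(value, src_bits):
    """MOVZX: keep the source bits; every higher destination bit is zero."""
    return value & ((1 << src_bits) - 1)

def movsx(value, src_bits, dst_bits):
    """MOVSX: keep the source bits and copy the source sign bit into every higher bit."""
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1):                                   # sign bit set
        value |= ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
    return value

def cwd(ax):
    """CWD: returns the new DX (0xFFFF if AX is negative, else 0)."""
    return 0xFFFF if ax & 0x8000 else 0x0000

def cdq(eax):
    """CDQ: returns the new EDX (all ones if EAX is negative, else 0)."""
    return 0xFFFF_FFFF if eax & 0x8000_0000 else 0

print(hex(movsx(0xF0, 8, 32)), hex(movzx(0xF0, 8)))  # 0xfffffff0 0xf0
print(hex(cwd(0x8000)), hex(cdq(0x7FFF_FFFF)))       # 0xffff 0x0
```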
string/array instructions
MOVS dest,src
MOVSB
MOVSW
MOVSD (386+)
Copies data addressed by DS:SI (even if operands are given) to the destination ES:DI and updates SI and DI
based on the size of the operand or instruction used. SI and DI are incremented when the Direction Flag is
clear and decremented when it is set. Use with REP prefixes.
CMPS dest,src Modifies flags: AF CF OF PF SF ZF
CMPSB
CMPSW
CMPSD (386+)
Subtracts the destination value from the source without saving the result. Updates the flags based on the
subtraction, and the index registers (E)SI and (E)DI are incremented or decremented depending on the state of
the Direction Flag. CMPSB increments or decrements the index registers by 1, CMPSW by 2 and CMPSD by 4.
The REPE/REPNE prefixes can be used to compare entire strings.
SCAS string Modifies flags: AF CF OF PF SF ZF
SCASB
SCASW
SCASD (386+)
Compares the value at ES:DI (even if an operand is specified) with the accumulator and sets the flags as for a
subtraction (accumulator minus ES:DI). DI is incremented or decremented based on the instruction format (or
operand size) and the state of the Direction Flag. Use with REPE/REPNE prefixes.
LODS src
LODSB
LODSW
LODSD (386+)
Transfers string element addressed by DS:SI (even if an operand is supplied) to the accumulator. SI is
incremented based on the size of the operand or based on the instruction used. If the Direction Flag is set
SI is decremented, if the Direction Flag is clear SI is incremented. Use with REP prefixes.
STOS dest
STOSB
STOSW
STOSD (386+)
Stores value in accumulator to location at ES:(E)DI (even if operand is given). (E)DI is
incremented/decremented based on the size of the operand (or instruction format) and the state of the
Direction Flag. Use with REP prefixes.
REP
Repeats execution of string instructions while CX != 0. After each string operation, CX is decremented; unlike
REPE/REPNE, plain REP does not test the Zero Flag. The combination of a repeat prefix and a segment override on
CPUs before the 386 may result in errors if an interrupt occurs before CX=0. The following code is susceptible to
this and shows how to avoid it:
again: rep movs byte ptr ES:[DI],ES:[SI] ; vulnerable instr.
jcxz next ; continue if REP successful
loop again ; interrupt goofed count
next:
REPE
REPZ
Repeats execution of string instructions while CX != 0 and the Zero Flag is set. CX is decremented and the
Zero Flag tested after each string operation. The combination of a repeat prefix and a segment override on
processors other than the 386 may result in errors if an interrupt occurs before CX=0.
REPNE
REPNZ
Repeats execution of string instructions while CX != 0 and the Zero Flag is clear. CX is decremented and
the Zero Flag tested after each string operation. The combination of a repeat prefix and a segment override
on processors other than the 386 may result in errors if an interrupt occurs before CX=0.
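A rough Python model of REP MOVSB and REPNE SCASB makes the stopping rules concrete: REP stops only when CX reaches 0, while REPNE also stops once a comparison sets ZF. The helper names are hypothetical; segment registers, the interrupt hazard described above and operand sizes other than bytes are not modelled.

```python
def rep_movsb(mem, si, di, cx, df=False):
    """REP MOVSB: copy CX bytes from mem[SI] to mem[DI]; returns the final (SI, DI, CX)."""
    step = -1 if df else 1              # DF=0 walks upwards, DF=1 downwards
    while cx != 0:                      # REP checks only CX, never ZF
        mem[di] = mem[si]
        si, di, cx = si + step, di + step, cx - 1
    return si, di, cx

def repne_scasb(mem, di, cx, al, df=False):
    """REPNE SCASB: scan mem[DI] for AL; returns the final (DI, CX, ZF)."""
    step = -1 if df else 1
    zf = False
    while cx != 0:
        zf = mem[di] == al              # compare AL with the byte at ES:DI
        di, cx = di + step, cx - 1      # DI and CX are updated even on a match
        if zf:                          # REPNE stops as soon as ZF is set
            break
    return di, cx, zf

buf = bytearray(b"abc___")
print(rep_movsb(buf, si=0, di=3, cx=3), buf)   # (3, 6, 0) bytearray(b'abcabc')

text = bytearray(b"hello\0")
di, cx, zf = repne_scasb(text, di=0, cx=0xFFFF, al=0)
print(zf, 0xFFFF - cx - 1)                     # True 5 (a classic strlen idiom)
```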
Program flow
conditional jumps
Mnemonic Meaning Jump Condition
JA Jump if Above CF=0 and ZF=0
JAE Jump if Above or Equal CF=0
JB Jump if Below CF=1
JBE Jump if Below or Equal CF=1 or ZF=1
JC Jump if Carry CF=1
JCXZ Jump if CX Zero CX=0
JE Jump if Equal ZF=1
JG Jump if Greater (signed) ZF=0 and SF=OF
JGE Jump if Greater or Equal (signed) SF=OF
JL Jump if Less (signed) SF != OF
JLE Jump if Less or Equal (signed) ZF=1 or SF != OF
JNA Jump if Not Above CF=1 or ZF=1
JNAE Jump if Not Above or Equal CF=1
JNB Jump if Not Below CF=0
JNBE Jump if Not Below or Equal CF=0 and ZF=0
JNC Jump if Not Carry CF=0
JNE Jump if Not Equal ZF=0
JNG Jump if Not Greater (signed) ZF=1 or SF != OF
JNGE Jump if Not Greater or Equal (signed) SF != OF
JNL Jump if Not Less (signed) SF=OF
JNLE Jump if Not Less or Equal (signed) ZF=0 and SF=OF
JNO Jump if Not Overflow (signed) OF=0
JNP Jump if No Parity PF=0
JNS Jump if Not Signed (signed) SF=0
JNZ Jump if Not Zero ZF=0
JO Jump if Overflow (signed) OF=1
JP Jump if Parity PF=1
JPE Jump if Parity Even PF=1
JPO Jump if Parity Odd PF=0
JS Jump if Signed (signed) SF=1
JZ Jump if Zero ZF=1
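The signed and unsigned conditions in the table can be checked by computing the flags of a CMP, a subtraction whose result is discarded. A minimal 8-bit sketch with hypothetical names (`cmp8`, `JUMPS`); PF and AF are omitted, and operands are assumed to be in the range 0 to 255.

```python
def cmp8(a: int, b: int) -> dict:
    """Flags set by CMP a, b on 8-bit operands (a - b is computed, then discarded)."""
    result = (a - b) & 0xFF
    return {
        "CF": a < b,                                   # unsigned borrow
        "ZF": result == 0,
        "SF": bool(result & 0x80),                     # sign bit of the result
        # signed overflow: a and b differ in sign and the result's sign differs from a
        "OF": bool((a ^ b) & (a ^ result) & 0x80),
    }

JUMPS = {                                              # conditions from the table above
    "JA": lambda f: not f["CF"] and not f["ZF"],       # unsigned >
    "JB": lambda f: f["CF"],                           # unsigned <
    "JE": lambda f: f["ZF"],
    "JG": lambda f: not f["ZF"] and f["SF"] == f["OF"],  # signed >
    "JL": lambda f: f["SF"] != f["OF"],                # signed <
}

f = cmp8(0xFF, 0x01)          # 255 vs 1 unsigned, but -1 vs 1 signed
print({name: cond(f) for name, cond in JUMPS.items()})
# {'JA': True, 'JB': False, 'JE': False, 'JG': False, 'JL': True}
```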
JCXZ label
JECXZ label (386+)
Causes execution to branch to "label" if register CX (ECX for JECXZ) is zero. No flags are tested, and "label" must be within -128 to +127 bytes of the instruction following the jump.
JMP target
Unconditionally transfers control to "target". Jumps by default are within -32768 to 32767 bytes of the
instruction following the jump. NEAR and SHORT jumps update only IP, while FAR jumps update both CS
and IP.
LEAVE
Releases the local variables created by the previous ENTER instruction by restoring SP and BP to their
condition before the procedure stack frame was initialized.
ENTER locals, level
Modifies stack for entry to procedure for high level language.
Operand "locals" specifies the amount of storage to be allocated on the stack. "Level" specifies the nesting
level of the routine. Paired with the LEAVE instruction, this is an efficient method of entry and exit to
procedures.
LOOP label
Decrements CX by 1 (without modifying the flags) and transfers control to "label" if CX is not zero. The "label"
operand must be within -128 to +127 bytes of the instruction following the loop instruction.
LOOPE label
LOOPZ label
Decrements CX by 1 (without modifying the flags) and transfers control to "label" if CX != 0 and the Zero Flag
is set. The "label" operand must be within -128 to +127 bytes of the instruction following the loop instruction.
LOOPNZ label
LOOPNE label
Decrements CX by 1 (without modifying the flags) and transfers control to "label" if CX != 0 and the Zero Flag
is clear. The "label" operand must be within -128 to +127 bytes of the instruction following the loop instruction.
INT num Modifies flags: TF IF
Initiates a software interrupt by pushing the flags, clearing the Trap and Interrupt Flags, pushing CS followed
by IP and loading CS:IP with the value found in the interrupt vector table. Execution then begins at the
location addressed by the new CS:IP.
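The INT sequence can be sketched for real mode, where the interrupt vector table starts at physical address 0 and each 4-byte entry holds the handler's offset (new IP) followed by its segment (new CS). The `cpu` dictionary and the function name are hypothetical; the sketch assumes IP already points past the INT instruction and ignores stack wraparound.

```python
IF_BIT, TF_BIT = 1 << 9, 1 << 8     # Interrupt Flag and Trap Flag in FLAGS

def real_mode_int(cpu: dict, memory: bytearray, vector: int) -> None:
    """Sketch of INT n in real mode: push FLAGS, CS, IP; clear IF/TF; jump via the IVT."""
    def push(word: int) -> None:
        cpu["sp"] = (cpu["sp"] - 2) & 0xFFFF
        addr = (cpu["ss"] << 4) + cpu["sp"]          # SS:SP as a physical address
        memory[addr:addr + 2] = word.to_bytes(2, "little")

    push(cpu["flags"])
    cpu["flags"] &= ~(IF_BIT | TF_BIT) & 0xFFFF       # clear IF and TF
    push(cpu["cs"])
    push(cpu["ip"])                                  # assumed to point past the INT instruction
    entry = vector * 4                               # 4-byte IVT entries starting at address 0
    cpu["ip"] = int.from_bytes(memory[entry:entry + 2], "little")
    cpu["cs"] = int.from_bytes(memory[entry + 2:entry + 4], "little")

memory = bytearray(0x100000)                             # 1 MB of real-mode memory
memory[0x84:0x88] = bytes([0x78, 0x56, 0x34, 0x12])      # vector 21h -> 1234:5678
cpu = {"cs": 0x0700, "ip": 0x0102, "ss": 0x0900, "sp": 0x0100, "flags": 0x0202}
real_mode_int(cpu, memory, 0x21)
print(f"{cpu['cs']:04X}:{cpu['ip']:04X}", hex(cpu["sp"]), hex(cpu["flags"]))
# 1234:5678 0xfa 0x2
```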
CALL destination
Pushes the Instruction Pointer (and the Code Segment for far calls) onto the stack and loads the Instruction Pointer
with the address of "destination". Execution continues at the new CS:IP.
RET/RETF/RETN nBytes
Transfers control from a procedure back to the return address saved on the stack. "nBytes" is an optional
number of bytes to release from the stack after the return address is popped (typically the procedure's
arguments). Far returns pop IP followed by CS, while near returns pop only IP.
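The push/pop order of CALL and RET, including the optional byte count, can be traced with a toy model. This sketch assumes 16-bit registers and tracks only CS, IP and SP; the class, its methods and the address values are hypothetical.

```python
class Cpu:
    """Toy 16-bit CPU state: CS, IP, SP and a word-addressed stack."""

    def __init__(self, cs: int, ip: int, sp: int) -> None:
        self.cs, self.ip, self.sp = cs, ip, sp
        self.stack: dict[int, int] = {}

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF
        self.stack[self.sp] = value & 0xFFFF

    def pop(self) -> int:
        value = self.stack[self.sp]
        self.sp = (self.sp + 2) & 0xFFFF
        return value

    def call_near(self, next_ip: int, target: int) -> None:
        self.push(next_ip)  # return address = instruction after CALL
        self.ip = target

    def call_far(self, next_ip: int, seg: int, off: int) -> None:
        self.push(self.cs)  # CS first, then IP
        self.push(next_ip)
        self.cs, self.ip = seg, off

    def ret_near(self, n_bytes: int = 0) -> None:
        self.ip = self.pop()
        self.sp = (self.sp + n_bytes) & 0xFFFF  # release the arguments

    def ret_far(self, n_bytes: int = 0) -> None:
        self.ip = self.pop()  # IP is popped first, then CS
        self.cs = self.pop()
        self.sp = (self.sp + n_bytes) & 0xFFFF


cpu = Cpu(cs=0x1000, ip=0x0100, sp=0x2000)
cpu.push(0xAAAA)  # caller pushes two word arguments
cpu.push(0xBBBB)
cpu.call_far(next_ip=0x0105, seg=0x2000, off=0x0010)
cpu.ret_far(4)    # like RETF 4: return and discard both arguments
print(f"{cpu.cs:04X}:{cpu.ip:04X} SP={cpu.sp:04X}")  # 1000:0105 SP=2000
```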
Segment Register Instructions
The segment register instructions allow far pointers (segment:offset pairs) to be loaded into a general register
and a segment register in one instruction.
LDS dest,src
Loads 32-bit pointer from memory source to destination register and DS. The offset is placed in the
destination register and the segment is placed in DS. To use this instruction the word at the lower memory
address must contain the offset and the word at the higher address must contain the segment. This simplifies
the loading of far pointers from the stack and the interrupt vector table.
LFS dest,src (386+)
Loads 32-bit pointer from memory source to destination register and FS. The offset is placed in the
destination register and the segment is placed in FS. To use this instruction the word at the lower memory
address must contain the offset and the word at the higher address must contain the segment. This simplifies
the loading of far pointers from the stack and the interrupt vector table.
LEA dest,src
Transfers offset address of "src" to the destination register.
LES dest,src
Loads 32-bit pointer from memory source to destination register and ES. The offset is placed in the destination
register and the segment is placed in ES. To use this instruction the word at the lower memory address must
contain the offset and the word at the higher address must contain the segment. This simplifies the loading of
far pointers from the stack and the interrupt vector table.
LSS dest,src (386+)
Loads 32-bit pointer from memory source to destination register and SS. The offset is placed in the destination
register and the segment is placed in SS. To use this instruction the word at the lower memory address must
contain the offset and the word at the higher address must contain the segment. This simplifies the loading of
far pointers from the stack and the interrupt vector table.
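The memory layout that LDS, LES, LFS and LSS expect (offset word at the lower address, segment word above it) can be shown with Python's struct module. This sketch assumes 16-bit operand size and real-mode address translation; the function names and the pointer value are illustrative only.

```python
import struct


def load_far_pointer(memory: bytes, addr: int) -> tuple[int, int]:
    """Model of what LDS/LES/LFS/LSS read (16-bit operand size).

    The word at the lower address is the offset (destination register);
    the word at addr+2 is the segment (DS/ES/FS/SS).
    """
    offset, segment = struct.unpack_from("<HH", memory, addr)
    return offset, segment


def physical_address(segment: int, offset: int) -> int:
    """Real-mode translation: segment * 16 + offset, wrapped to 20 bits
    as on the 8086."""
    return ((segment << 4) + offset) & 0xFFFFF


mem = bytearray(16)
struct.pack_into("<HH", mem, 4, 0x0010, 0xB800)  # far pointer B800:0010
si, ds = load_far_pointer(bytes(mem), 4)         # like LDS SI,[4]
print(hex(si), hex(ds), hex(physical_address(ds, si)))  # 0x10 0xb800 0xb8010
```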
I/O INSTRUCTIONS
These instructions move data between the processor’s I/O ports and a register or memory.
IN accum,port
A byte, word or dword is read from "port" and placed in AL, AX or EAX respectively. If the port number is in the range 0-255 it
can be specified as an immediate; otherwise the port number must be specified in DX. Valid port ranges on the PC are 0-1023,
though values through 65535 may be specified and recognized by third party vendors and PS/2s.
OUT port,accum
Transfers the byte in AL, word in AX or dword in EAX to the specified hardware port address. If the port number is in the range
0-255 it can be specified as an immediate; if it is greater than 255, the port number must be specified in DX. Since the PC
decodes only 10 bits of the port address, values over 1023 can only be decoded by third party vendor equipment; on other
hardware they alias to the port range 0-1023.
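Both port-number rules above fit in two small functions: which operand form can name a given port, and which port a device that decodes only the low 10 address bits actually sees. The function names are hypothetical; the 10-bit decode is the assumption the text makes for the original PC.

```python
def port_operand(port: int) -> str:
    """Operand form IN/OUT can use for a given port number."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("x86 I/O port numbers are 16-bit (0..65535)")
    return "imm8" if port <= 0xFF else "DX"


def isa_decoded_port(port: int) -> int:
    """Port as seen by a device that decodes only address lines A0..A9."""
    return port & 0x3FF


print(port_operand(0x60), port_operand(0x3F8))  # imm8 DX
print(hex(isa_decoded_port(0x7F8)))             # 0x3f8 (aliases port 3F8h)
```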
INS dest,port
INSB
INSW
INSD (386+)
Loads data from the port specified in DX to the destination ES:(E)DI (even if a destination operand is supplied). (E)DI is adjusted
by the size of the operand: it is increased if the Direction Flag is clear and decreased if the Direction Flag is set. For INSB,
INSW and INSD no operands are allowed and the size is determined by the mnemonic.
OUTS port,src
OUTSB
OUTSW
OUTSD (386+)
Transfers a byte, word or doubleword from "src" to the hardware port specified in DX. The source is located at DS:(E)SI (a segment
override is allowed when "src" is given explicitly), and (E)SI is incremented or decremented by the size of the operand or the size
dictated by the mnemonic. When the Direction Flag is set, SI is decremented; when clear, SI is incremented. Unlike OUT, the port
cannot be given as an immediate; it must always be in DX. Since the PC decodes only 10 bits of the port address, values over 1023
can only be decoded by third party vendor equipment; on other hardware they alias to the port range 0-1023.
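The index update performed by INS (on (E)DI) and OUTS (on (E)SI) depends only on the element size and the Direction Flag. A minimal sketch, assuming wraparound at the address size; the function name is hypothetical.

```python
def string_io_step(index: int, size: int, df: bool, addr_bits: int = 16) -> int:
    """Next (E)DI for INS or (E)SI for OUTS after one element.

    size is 1, 2 or 4 (byte/word/dword); DF=0 increments, DF=1 decrements.
    The index wraps at the address size (16 or 32 bits).
    """
    if size not in (1, 2, 4):
        raise ValueError("element size must be 1, 2 or 4 bytes")
    mask = (1 << addr_bits) - 1
    delta = -size if df else size
    return (index + delta) & mask


# Three consecutive INSW transfers with DF=0: DI advances 2 bytes each time.
di = 0x0100
for _ in range(3):
    di = string_io_step(di, 2, df=False)
print(hex(di))  # 0x106
```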
Flag Control (EFLAGS) Instructions
The flag control instructions operate on the flags in the EFLAGS register.
STC Modifies flags: CF
Sets the Carry Flag to 1.
STD Modifies flags: DF
Sets the Direction Flag to 1, causing string instructions to auto-decrement SI and DI instead of auto-incrementing them.
STI Modifies flags: IF
Sets the Interrupt Flag to 1, which enables recognition of maskable hardware interrupts. If an interrupt is generated by a hardware
device, an End of Interrupt (EOI) must also be issued to the interrupt controller to enable other hardware interrupts of the same or
lower priority.
SAHF Modifies flags: AF CF PF SF ZF
Transfers bits 0-7 of AH into the low byte of the Flags register, loading AF, CF, PF, SF and ZF.
LAHF
Copies bits 0-7 of the flags register into AH. This includes flags AF, CF, PF, SF and ZF; the other bits are undefined.
AH := SF ZF xx AF xx PF xx CF (bit 7 down to bit 0)
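The bit layout shown for AH can be checked by packing and unpacking flags in Python. This sketch assumes that reserved bit 1 reads as 1 and bits 3 and 5 as 0, which is what current processors hold in those FLAGS positions; the text only calls them undefined, and the function names are hypothetical.

```python
# Bit positions of the five status flags in the low byte of FLAGS.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7}


def lahf(flags: dict[str, bool]) -> int:
    """Pack CF, PF, AF, ZF and SF into an AH value, as LAHF does.

    Assumption: reserved bit 1 is written as 1 and bits 3 and 5 as 0.
    """
    ah = 0b0000_0010
    for name, bit in FLAG_BITS.items():
        if flags.get(name, False):
            ah |= 1 << bit
    return ah


def sahf(ah: int) -> dict[str, bool]:
    """Unpack an AH value into the five flags that SAHF loads."""
    return {name: bool((ah >> bit) & 1) for name, bit in FLAG_BITS.items()}


ah = lahf({"ZF": True, "CF": True})
print(f"{ah:08b}")  # 01000011
```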
CLC Modifies flags: CF
Clears the Carry Flag.
CLD Modifies flags: DF
Clears the Direction Flag causing string instructions to increment the SI and DI index registers.
CLI Modifies flags: IF
Disables maskable hardware interrupts by clearing the Interrupt Flag. NMIs and software interrupts are not inhibited.
CLTS
Clears the Task Switched (TS) flag in the Machine Status Word (the low word of CR0). This is a privileged instruction and is
generally used only by operating system code.
Miscellaneous Instructions
The miscellaneous instructions provide such functions as loading an effective address, executing a “no-
operation,” and retrieving processor identification information.
NOP
This is a do-nothing instruction. It takes up code space and execution time but changes no registers or flags, which makes it
most useful for padding and patching code segments. (Its opcode, 90h, is the original XCHG AX,AX instruction.)
XLAT translation-table
XLATB (masm 5.x)
Replaces the byte in AL with a byte from a user table addressed by BX. The original value of AL (treated as unsigned) is the index
into the translation table. Its effect is best described as MOV AL,[BX+AL], although that is not a legal addressing mode.
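The table lookup XLAT performs can be written as a one-line index operation. In this sketch a byte buffer stands in for the data segment, and the hex-digit table at offset 100h is a made-up example; the function name is hypothetical.

```python
def xlat(segment: bytes, bx: int, al: int) -> int:
    """XLAT: AL := byte at DS:[BX + AL], with AL treated as unsigned.

    `segment` stands in for the data segment; BX is the table's offset.
    """
    return segment[(bx + (al & 0xFF)) & 0xFFFF]


# Translate a 4-bit value into its ASCII hex digit.
data = bytearray(0x200)
data[0x100:0x110] = b"0123456789ABCDEF"  # table at offset 100h
print(chr(xlat(bytes(data), bx=0x100, al=0x0C)))  # C
```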
CPUID
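Processor Identification: returns processor identification and feature information in EAX, EBX, ECX and EDX; the value
placed in EAX beforehand selects which information is returned.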
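With EAX=0, CPUID returns a 12-character vendor string packed little-endian into EBX, EDX and ECX, in that order. Python cannot execute CPUID itself, so this sketch only decodes register values; the constants are the values Intel processors return for that leaf, and the function name is hypothetical.

```python
import struct


def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Decode the vendor ID returned by CPUID leaf 0 (order EBX, EDX, ECX)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))  # GenuineIntel
```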
OTHERS
Floating-point instructions
• instructions for a stack-based floating-point unit (FPU).
• The FPU instructions:
addition, subtraction, negation, multiplication, division, remainder, square roots,
integer truncation, fraction truncation, and scale by power of two.
The operations also include conversion instructions, which can load or store a value from memory in any of
the following formats: binary-coded decimal, 32-bit integer, 64-bit integer, 32-bit floating-point, 64-bit floating-
point or 80-bit floating-point (upon loading, the value is converted to the currently used floating-point mode).
• transcendental functions: sine, cosine, tangent, arctangent, exponentiation with the base 2 and logarithms to
bases 2, 10, or e.
• The stack register to stack register format of the instructions:
fop st, st(n) or fop st(n), st
where st is equivalent to st(0), and st(n) is one of the 8 stack registers (st(0),st(1),…, st(7)).
As with the integer instructions, the first operand is both the first source operand and the destination operand.
fsubr and fdivr are the exceptions: they swap the source operands before performing the
subtraction or division. The addition, subtraction, multiplication, division, store and comparison instructions
include instruction modes that pop the top of the stack after their operation is complete.
So, for example, faddp st(1), st performs the calculation st(1) = st(1) + st(0), then removes st(0) from
the top of stack, thus making what was the result in st(1) the top of the stack in st(0).
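The register-stack behaviour described above, including the faddp example and the reversed operand order of fsubr, can be traced with a small model. This is a sketch under stated assumptions: values are Python floats rather than 80-bit extended reals, and overflow raises an exception instead of setting the FPU's status flags; the class and method names are hypothetical.

```python
class X87Stack:
    """Toy model of the x87 register stack: st(0) is the top."""

    def __init__(self) -> None:
        self.regs: list[float] = []  # regs[-1] is st(0)

    def st(self, i: int) -> float:
        return self.regs[-1 - i]

    def _set(self, i: int, value: float) -> None:
        self.regs[-1 - i] = value

    def fld(self, value: float) -> None:
        if len(self.regs) == 8:
            raise OverflowError("x87 stack overflow (8 registers)")
        self.regs.append(value)

    def fadd(self, dst: int, src: int) -> None:
        # fadd st(dst), st(src): st(dst) = st(dst) + st(src)
        self._set(dst, self.st(dst) + self.st(src))

    def fsubr(self, dst: int, src: int) -> None:
        # reversed subtraction: st(dst) = st(src) - st(dst)
        self._set(dst, self.st(src) - self.st(dst))

    def faddp(self, dst: int = 1) -> None:
        # faddp st(dst), st: add st(0) into st(dst), then pop st(0)
        self.fadd(dst, 0)
        self.regs.pop()


fpu = X87Stack()
fpu.fld(2.0)       # st(0)=2
fpu.fld(5.0)       # st(0)=5, st(1)=2
fpu.faddp(1)       # st(1)=2+5, then pop: st(0)=7
print(fpu.st(0))   # 7.0
fpu.fld(10.0)      # st(0)=10, st(1)=7
fpu.fsubr(0, 1)    # st(0) = st(1) - st(0) = 7 - 10
print(fpu.st(0))   # -3.0
```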
SIMD instructions
• Modern x86 CPUs contain SIMD instructions,
• which largely perform the same operation in parallel on many values encoded in a wide SIMD register.
• Various instruction set extensions support different operations on different register sets, but taken as a
whole (from MMX to SSE4.2) they include general computations on integer or floating-point
arithmetic (addition, subtraction, multiplication, shift, minimum, maximum, comparison, division or
square root).
So, for example, paddw mm0, mm1 performs four parallel 16-bit (indicated by the w) integer additions (indicated by
padd), adding the values in mm1 to those in mm0 and storing the result in mm0.
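The lane-by-lane behaviour of paddw, where a carry out of one 16-bit lane is discarded rather than propagated into the next, can be modelled on a 64-bit Python integer. A minimal sketch; the function name and register values are illustrative.

```python
def paddw(a: int, b: int) -> int:
    """PADDW on 64-bit MMX values: four independent 16-bit wraparound adds.

    Carries never propagate from one 16-bit lane into the next.
    """
    result = 0
    for lane in range(4):
        shift = 16 * lane
        x = (a >> shift) & 0xFFFF
        y = (b >> shift) & 0xFFFF
        result |= ((x + y) & 0xFFFF) << shift
    return result


mm0 = 0x0001_FFFF_7FFF_0010
mm1 = 0x0001_0001_0001_0005
mm0 = paddw(mm0, mm1)
print(f"{mm0:016X}")  # 0002000080000015
```

The second lane from the top shows the wraparound: FFFFh + 1 becomes 0000h, and the carry does not reach the top lane.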
• Streaming SIMD Extensions (SSE) also include a scalar floating-point mode in which only the lowest value in
the register is actually modified (expanded in SSE2).
• Some other unusual instructions have been added including a sum of absolute differences (used for motion
estimation in video compression, such as is done in MPEG) and a 16-bit multiply accumulation instruction
(useful for software-based alpha-blending and digital filtering).
• SSE (since SSE3) and 3DNow! extensions include addition and subtraction instructions for treating paired
floating point values like complex numbers.
These instruction sets also include numerous fixed sub-word instructions for shuffling, inserting and
extracting values within the registers. In addition, there are instructions for moving data between
the integer registers and the XMM (used in SSE) and FPU (used in MMX) registers.
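The sum-of-absolute-differences instruction mentioned above is PSADBW on x86: it compares eight unsigned bytes of two 64-bit operands and leaves the 16-bit total in the low word of the destination. The sketch below models only that arithmetic; the pixel blocks are made-up data and the function name is hypothetical.

```python
def psadbw(a: int, b: int) -> int:
    """Sum of absolute differences over the eight unsigned bytes of two
    64-bit values; the result fits in the low 16 bits (max 8 * 255)."""
    total = 0
    for i in range(8):
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        total += abs(x - y)
    return total


block_a = bytes([10, 20, 30, 40, 50, 60, 70, 80])
block_b = bytes([12, 18, 30, 45, 50, 55, 70, 90])
sad = psadbw(int.from_bytes(block_a, "little"), int.from_bytes(block_b, "little"))
print(sad)  # 24
```

In motion estimation, a smaller SAD between a block of the current frame and a candidate block of the reference frame indicates a better match.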
Sources:
https://en.wikipedia.org/wiki/Stored-program_computer
https://en.wikipedia.org/wiki/Von_Neumann_architecture
https://en.wikipedia.org/wiki/Harvard_architecture
https://en.wikipedia.org/wiki/X86
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
http://www.masm32.com/

x86 architecture

  • 1. x86 Features and Instruction Set Architecture
  • 2. Stored-program computer Stores program instructions in electronic memory, where programs and data in memory can be treated interchangeably or uniformly. von Neumann architecture • also known as the von Neumann model and Princeton architecture, after 1945 work by John von Neumann and others in the First Draft of a Report on the EDVAC • stores program data and instruction data in the same memory • consists of: processing unit (arithmetic logic unit and processor registers) control unit (instruction register and program counter) memory (for data and instructions) external mass storage input and output mechanisms • instruction fetch and a data operation cannot occur concurrently because they share a common bus; referred to as the von Neumann bottleneck which often limits system performance.
  • 3. Harvard architecture • Based on the Harvard Mark I • Data and instruction are stored in entirely separate memory systems • CPU can fetch next instruction and load or store data simultaneously and independently
  • 4. Modified Harvard architecture
    • loose separation between code and data
    • the contents of the instruction memory can be accessed as if they were data
    • implemented on most modern CPU architectures
    Implementation Modifications
    • Split-cache (or Almost-von-Neumann) architecture
      - builds a memory hierarchy with separate CPU caches for instructions and data
      - unifies all except small portions of the data and instruction address spaces, providing the von Neumann model
      - cache coherency matters for programs that write instructions into memory (e.g. self-modifying code), because a stale copy in the instruction cache could otherwise be executed; keeping the caches coherent can greatly affect performance
    • Instruction-memory-as-data architecture
      - preserves Harvard memory separation, but provides special machine operations to access the contents of the instruction memory as data
    • Data-memory-as-instruction architecture
      - can execute instructions fetched from any memory segment
      - can read an instruction and a data value simultaneously if they are in separate memory segments with independent data buses (like Harvard)
      - when executing an instruction from one memory segment, the same segment cannot be simultaneously accessed as data
  • 5. Three characteristics distinguish modified Harvard machines from pure Harvard and von Neumann machines:
    • Instruction and data memories occupy different address spaces
      - Pure Harvard: separate address "zero" in instruction space and in data space
      - Von Neumann: stores both instructions and data in a single address space
      - Modified Harvard: separate address "zero" in instruction space and in data space (instruction-memory-as-data variants)
    • Instruction and data memories have separate hardware pathways to the central processing unit (CPU)
      - Pure Harvard: separate pathways for instruction and data memories to the CPU
      - Von Neumann: a unified address space
      - Modified Harvard: separate access paths for CPU caches or other tightly coupled memories, but a unified address space covers the rest of the memory hierarchy
    • Instruction and data memories may be accessed in different ways
      - Pure Harvard: the Harvard Mark I stored instructions on punched paper tape and data in electro-mechanical counters
      - Von Neumann and modified Harvard: uniform access to flash memory and SRAM
  • 6. Basic properties of the x86 architecture
    • General consensus suggests that x86 is a modified Harvard architecture (a unified address space with split instruction and data caches).
    • Variable instruction length: typically 2 or 3 bytes; some instructions are a single byte, others up to 15 bytes.
    • Primarily a "CISC" design with emphasis on backward compatibility.
    • The instruction set is not typical CISC, however, but an extended version of the simple eight-bit 8008 and 8080 architectures.
    • Memory is byte-addressed, and words are stored with little-endian byte order (least significant byte first).
    • Unaligned memory accesses are allowed for almost all instructions and word sizes (some SSE instructions require aligned operands).
    • Native integer sizes for arithmetic and memory addresses (or offsets) are 16, 32 or 64 bits, depending on the architecture generation.
    • Multiple scalar values can be handled simultaneously by SIMD units (MMX from the Pentium MMX; 128-bit SSE from the Pentium III).
    • Floating point: instructions and registers for floating-point operations, on a separate coprocessor before the 80486 and built in ever since.
    • SIMD (single instruction, multiple data) instructions work on one or two 128-bit words, each containing two or four floating-point numbers (64 or 32 bits wide, respectively) or, alternatively, 2, 4, 8 or 16 integers (64, 32, 16 or 8 bits wide, respectively).
    • Pipelining and superscalar execution (superscalar starting with the Pentium); starting with the Pentium Pro, extra decoding steps split most instructions into micro-operations, which a control unit buffers and schedules for execution, partly in parallel, on one of several execution units.
    • Out-of-order and speculative execution use branch prediction, register renaming and memory dependence prediction to execute multiple x86 instructions simultaneously and not necessarily in the order given in the instruction stream.
    • Simultaneous multithreading (e.g. Intel's Hyper-Threading)
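As an illustration of little-endian storage, the short Python sketch below prints the bytes that would sit in memory, lowest address first, when an unsigned integer is stored. Python serves only as executable notation here, and `store_le` is a hypothetical helper, not part of any x86 tool.

```python
import struct

def store_le(value: int, size: int) -> bytes:
    """Bytes an x86 CPU writes to memory for an unsigned integer of
    `size` bytes (2, 4 or 8), listed from the lowest address upward."""
    fmt = {2: "<H", 4: "<I", 8: "<Q"}[size]  # '<' = little-endian, no padding
    return struct.pack(fmt, value)

# The least significant byte (0x78) lands at the lowest address.
print(store_le(0x12345678, 4).hex(" "))  # 78 56 34 12

# Reading the bytes back in little-endian order recovers the value.
assert int.from_bytes(store_le(0x12345678, 4), "little") == 0x12345678
```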
  • 7. x86 REGISTERS
    16-bit
    • The original Intel 8086 and 8088 have fourteen 16-bit registers.
    • Four are general-purpose registers (GPRs): AX, BX, CX and DX; each can be accessed as two separate bytes (the high byte and the low byte, e.g. AH and AL).
    • Two pointer registers have special roles: SP (stack pointer) points to the "top" of the stack, and BP (base pointer) is used to point anywhere on the stack, typically into the current stack frame.
    • The address/index registers are SI, DI, BX and BP.
    • Four segment registers (CS, DS, SS and ES) are used to form a memory address in segmented memory mode.
    • The FLAGS register contains, among others, the carry flag (CF), overflow flag (OF) and zero flag (ZF).
    • The instruction pointer (IP) points to the next instruction that will be fetched from memory and then executed; software cannot read or write it directly (it changes through jumps, calls, returns and interrupts).
    • In the 80286, three special registers (GDTR, LDTR and IDTR) hold descriptor table addresses to support protected mode, and a fourth, the task register (TR), is used for task switching.
  • 8. 32-bit
    • 32-bit processors (starting with the 80386) expanded the 16-bit GPRs, base and index registers, instruction pointer and FLAGS register to 32 bits; the segment registers were not affected.
    • The expanded registers are named by prefixing an "E" (for "extended") to the register names in x86 assembly language, e.g. EAX, ESI, EIP.
    • The general-purpose, base and index registers can all be used as the base in addressing modes, and all of them except the stack pointer can be used as the index.
    • Two new segment registers, FS and GS, were added.
    • The machine code format was expanded to accommodate the larger registers.
    • Streaming SIMD Extensions (SSE), starting with the Pentium III, added a 32-bit control/status register (MXCSR).
    64-bit
    • The 32-bit registers are expanded to 64 bits (AMD64, first implemented in the AMD Opteron), and addressing is extended to 64 bits.
    • An R prefix identifies the 64-bit registers (RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP).
    • Eight additional 64-bit general registers (R8-R15) were also introduced; they are usable only in 64-bit mode, one of the two sub-modes of long mode (the other is compatibility mode).
    • An extra addressing mode allows memory references relative to RIP (the instruction pointer), which eases the implementation of position-independent code, as used in shared libraries on some operating systems.
    Miscellaneous/special purpose
    • 32-bit x86 processors (starting with the 80386) also include various special/miscellaneous registers:
      - control registers (CR0 through CR4; CR8 in 64-bit mode only)
      - debug registers (DR0 through DR3, plus DR6 and DR7)
      - test registers (TR6 and TR7 on the 80386; TR3 through TR7 on the 80486)
      - model-specific registers (MSRs, appearing with the Pentium)
  • 9. 80-bit
    • The x87 floating-point unit (FPU), originally a separate math coprocessor: 8087 (for the 8086, 8088, 80186 and 80188), 80287 (80286) and 80387 (80386); built into the CPU starting with the 80486.
    • Eight 80-bit-wide registers, ST(0) to ST(7), organized as a stack.
    • Numeric data can be loaded and stored in one of seven formats: 32-, 64- or 80-bit floating point; 16-, 32- or 64-bit (binary) integer; and 80-bit packed decimal integer (internally, the registers hold 80-bit extended precision).
    • The Pentium MMX added eight 64-bit MMX integer registers (MM0 to MM7), which share their lower bits with the 80-bit-wide FPU registers.
    128-bit
    • SIMD registers XMM0–XMM15 (XMM8–XMM15 only in 64-bit mode).
    256-bit
    • SIMD registers YMM0–YMM15.
    • Introduced with Intel's Sandy Bridge processors, which widened the SIMD registers to 256 bits and added the AVX (Advanced Vector Extensions) instructions.
    512-bit
    • SIMD registers ZMM0–ZMM31, first used by Knights Corner (Intel Xeon Phi coprocessors).
  • 10. General Purpose Registers (A, B, C and D)
    Layout (? = A, B, C or D): R?X = bits 63..0, E?X = bits 31..0, ?X = bits 15..0, ?H = bits 15..8, ?L = bits 7..0
    General Purpose
    • AL/AH/AX/EAX/RAX: Accumulator
    • BL/BH/BX/EBX/RBX: Base index (for use with arrays)
    • CL/CH/CX/ECX/RCX: Counter (for use with loops and strings)
    • DL/DH/DX/EDX/RDX: Extend the precision of the accumulator (e.g. combine 32-bit EAX and EDX for 64-bit integer operations in 32-bit code)
    R8-R15 (for 64-bit CPUs)
    • 64-bit-mode-only General Purpose Registers (R8, R9, R10, R11, R12, R13, R14, R15)
    Layout (? = R8 to R15): ? = bits 63..0, ?D = bits 31..0, ?W = bits 15..0, ?B = bits 7..0 (e.g. R8, R8D, R8W, R8B)
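The register layout above can be modelled as bit masks over one 64-bit value. This is a toy Python sketch; the `GPR` class and its method names are hypothetical. It also encodes one rule the slide does not state but that the Intel and AMD manuals specify: in 64-bit mode, writing an 8- or 16-bit register leaves the other bits alone, while writing a 32-bit register zeroes bits 63..32.

```python
class GPR:
    """Toy model of RAX/RBX/RCX/RDX and their narrower views.
    Hypothetical helper for illustration only."""

    MASK64 = (1 << 64) - 1

    def __init__(self, value: int = 0):
        self.r = value & self.MASK64       # R?X: bits 63..0

    @property
    def e32(self) -> int:                  # E?X: bits 31..0
        return self.r & 0xFFFFFFFF

    @property
    def x16(self) -> int:                  # ?X: bits 15..0
        return self.r & 0xFFFF

    @property
    def h8(self) -> int:                   # ?H: bits 15..8
        return (self.r >> 8) & 0xFF

    @property
    def l8(self) -> int:                   # ?L: bits 7..0
        return self.r & 0xFF

    def write_l8(self, v: int) -> None:
        # An 8-bit write leaves every other bit of the register unchanged
        self.r = (self.r & ~0xFF) | (v & 0xFF)

    def write_e32(self, v: int) -> None:
        # In 64-bit mode, a 32-bit write zero-extends into bits 63..32
        self.r = v & 0xFFFFFFFF


rax = GPR(0x1122334455667788)
print(hex(rax.e32), hex(rax.x16), hex(rax.h8), hex(rax.l8))
# 0x55667788 0x7788 0x77 0x88
```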
  • 11. Address/Index Registers
    • SI/ESI/RSI: Source index for string operations.
    • DI/EDI/RDI: Destination index for string operations.
    Index Registers (S and D) layout: R?I = bits 63..0, E?I = bits 31..0, ?I = bits 15..0, ?IL = bits 7..0
    Note: The ?IL registers (SIL, DIL) are only available in 64-bit mode.
    Stack Pointer Registers
    • SP/ESP/RSP: Stack pointer for top address of the stack.
    • BP/EBP/RBP: Stack base pointer for holding the address of the current stack frame.
    Pointer Registers (S and B) layout: R?P = bits 63..0, E?P = bits 31..0, ?P = bits 15..0, ?PL = bits 7..0
    Note: The ?PL registers (SPL, BPL) are only available in 64-bit mode.
  • 12. Instruction Pointer Register
    • IP/EIP/RIP: Instruction pointer. Holds the program counter, the address of the next instruction to be executed.
    Instruction Pointer Register (I) layout: RIP = bits 63..0, EIP = bits 31..0, IP = bits 15..0
    Segment registers
    • CS: Code
    • DS: Data
    • SS: Stack
    • ES: Extra data
    • FS: Extra data #2
    • GS: Extra data #3
    Segment Registers (C, D, S, E, F and G): each ?S register is 16 bits wide.
  • 14. x86 INSTRUCTION SET ARCHITECTURE
    • First introduced with the Intel 8086 and 8088 16-bit CPUs.
    • Used by Intel, AMD, Cyrix, NEC and Zilog.
    • Inherited many characteristics and instructions from the previous generation of 8-bit CPUs, such as the 8080.
    • The modern x86 instruction set is a superset of the 8086 instructions plus a series of extensions to it; the lineage goes back to the 8-bit Intel 8008 microprocessor.
    • Nearly full binary backward compatibility, from the Intel 8086 through the current generation of x86 processors (with certain exceptions).
    • In practice, software typically uses instructions that execute either on any processor from the Intel 80386 onward (or a fully compatible clone) or on any processor from the Intel Pentium onward (or a compatible clone); in recent years, various software has come to require support for specific later extensions to the instruction set, e.g. MMX or SSE.
  • 15. Basic Instruction Format
    • Most registers are encoded in opcodes with three or four bits, to conserve encoding space.
    • At most one operand of an instruction can be a memory location.
    • The memory operand may also be the destination (or a combined source and destination), while the other operand, the source, can be either a register or an immediate.
    • The relatively small number of general registers (also inherited from the 8-bit ancestors) has made register-relative addressing (using small immediate offsets) an important method of accessing operands, especially on the stack. In most circumstances where the accessed data is in the top-level cache, such accesses are as fast as register accesses, i.e. one instruction per cycle throughput.
  • 16. IA-32e Mode
    Sub-modes:
    • Compatibility mode: runs legacy 16- and 32-bit protected-mode code under a 64-bit operating system
    • 64-bit mode: full access to 64-bit addressing and the extended registers
    REX Prefixes
    REX prefixes are instruction-prefix bytes used in 64-bit mode. They:
    • specify GPRs and SSE registers;
    • specify 64-bit operand size;
    • specify extended control registers.
    Not all instructions require a REX prefix in 64-bit mode; a prefix is necessary only if an instruction references one of the extended registers or uses a 64-bit operand. A REX prefix used where it has no meaning is ignored. Only one REX prefix is allowed per instruction. If used, it must immediately precede the opcode byte or the two-byte opcode escape prefix (if present); other placements are ignored. The 15-byte instruction-size limit still applies to instructions with a REX prefix.
    Instruction format for protected mode, real-address mode and virtual-8086 mode
    The Intel 64 and IA-32 instruction encodings are subsets of one general format: optional instruction prefixes (in any order), primary opcode bytes (up to three), an addressing-form specifier if required (the ModR/M byte and sometimes the SIB, Scale-Index-Base, byte), a displacement if required, and an immediate data field if required.
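To show how the REX prefix and the ModR/M byte combine, here is a minimal Python decoder for the register-to-register case only (mod = 3); it ignores SIB bytes, displacements and memory forms, and the function names are hypothetical. It assumes the standard bit layouts from Intel's manual: REX is 0100WRXB (meaningful only in 64-bit mode; in 32-bit mode the bytes 0x40-0x4F encode INC/DEC), and ModR/M is mod (2 bits), reg (3 bits), rm (3 bits).

```python
from typing import Dict, Optional, Tuple

REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

def decode_rex(byte: int) -> Dict[str, int]:
    """Split a REX prefix (0100WRXB, i.e. 0x40-0x4F) into its four bits."""
    if (byte & 0xF0) != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends ModR/M.reg
        "X": (byte >> 1) & 1,  # extends SIB.index
        "B": byte & 1,         # extends ModR/M.rm (or SIB.base)
    }

def decode_modrm(byte: int,
                 rex: Optional[Dict[str, int]] = None) -> Tuple[int, int, int]:
    """Return (mod, reg, rm); REX.R and REX.B supply a fourth register bit."""
    rex = rex or {"R": 0, "B": 0}
    mod = byte >> 6
    reg = ((byte >> 3) & 0b111) | (rex["R"] << 3)
    rm = (byte & 0b111) | (rex["B"] << 3)
    return mod, reg, rm

# 4C 89 C0: REX (W=1, R=1), opcode 89 = MOV r/m64, r64, ModR/M C0 (mod=3)
rex = decode_rex(0x4C)
mod, reg, rm = decode_modrm(0xC0, rex)
print(f"mov {REGS64[rm]}, {REGS64[reg]}")  # mov rax, r8
```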
  • 17. Mnemonics and opcodes
    • Each x86 assembly instruction is represented by a mnemonic which, often combined with one or more operands, translates to one or more bytes called an opcode, e.g.:
      NOP : 0x90
      HLT : 0xF4
    • There are potential opcodes with no documented mnemonic, which different processors may interpret differently; a program using them may behave inconsistently or even raise an exception on some processors. Such opcodes often turn up in code-writing competitions as a way to make code smaller, faster or more elegant, or simply to show off the author's prowess.
    • A demonstration of how to find undocumented opcodes in x86 CPUs: https://www.youtube.com/watch?v=KrksBdWcZgQ
  • 18. Syntax
    x86 assembly language has two main syntax branches:
    • Intel syntax, originally used for documentation of the x86 platform, is dominant in the MS-DOS and Windows world. Many x86 assemblers use Intel syntax, including NASM, FASM, MASM, TASM and YASM.
    • AT&T syntax is dominant in the Unix world, since Unix was created at AT&T Bell Labs.
    Summary of the main differences between AT&T syntax and Intel syntax:
    • Parameter order
      - AT&T: source before the destination, e.g. mov $5, %eax
      - Intel: destination before source, e.g. mov eax, 5
    • Parameter size
      - AT&T: mnemonics are suffixed with a letter indicating the size of the operands: q for qword, l for long (dword), w for word and b for byte, e.g. addl $4, %esp
      - Intel: derived from the name of the register used (e.g. rax, eax, ax, al imply q, l, w, b, respectively), e.g. add esp, 4
    • Sigils
      - AT&T: immediate values are prefixed with "$" and registers with "%"
      - Intel: the assembler automatically detects the type of symbols, i.e. whether they are registers, constants or something else
    • Effective addresses
      - AT&T: general syntax DISP(BASE,INDEX,SCALE), e.g. movl mem_location(%ebx,%ecx,4), %eax
      - Intel: arithmetic expressions in square brackets; size keywords such as byte, word or dword must be added if the size cannot be determined from the operands, e.g. mov eax, [ebx + ecx*4 + mem_location]
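The effective-address difference is mechanical enough to sketch as a small translator. The Python below is a simplified, hypothetical helper: it only handles register-based memory operands of the form DISP(BASE,INDEX,SCALE) and ignores segment overrides, size keywords and more complex displacement expressions.

```python
import re

# DISP(BASE,INDEX,SCALE): every part is optional; SCALE is 1, 2, 4 or 8.
ATT_MEM = re.compile(
    r"^(?P<disp>[^(]*)\((?P<base>%\w+)?(?:,(?P<index>%\w+)(?:,(?P<scale>[1248]))?)?\)$"
)

def att_mem_to_intel(operand: str) -> str:
    """Rewrite an AT&T memory operand such as 'mem(%ebx,%ecx,4)'
    in Intel bracket form, e.g. '[ebx + ecx*4 + mem]'."""
    m = ATT_MEM.match(operand.replace(" ", ""))
    if m is None:
        raise ValueError(f"not an AT&T memory operand: {operand!r}")
    parts = []
    if m["base"]:
        parts.append(m["base"][1:])                 # drop the '%' sigil
    if m["index"]:
        index, scale = m["index"][1:], m["scale"] or "1"
        parts.append(index if scale == "1" else f"{index}*{scale}")
    expr = " + ".join(parts)
    disp = m["disp"]
    if disp:
        if not expr:
            expr = disp                             # no registers at all
        elif disp.startswith("-"):
            expr += " - " + disp[1:]                # negative displacement
        else:
            expr += " + " + disp
    return f"[{expr}]"

print(att_mem_to_intel("mem_location(%ebx,%ecx,4)"))  # [ebx + ecx*4 + mem_location]
print(att_mem_to_intel("-8(%ebp)"))                    # [ebp - 8]
```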
  • 19. Execution modes
    • Real mode (16-bit)
      - The original operating mode of early-generation x86 CPUs.
    • Protected mode (16-bit and 32-bit)
      - The 16-bit subset of instructions is available on the 16-bit x86 processors. These instructions are available in real mode on all x86 processors; in 16-bit protected mode (80286 onwards), additional instructions relating to protected mode are available. On the 80386 and later, 32-bit instructions (including later extensions) are also available in all modes, including real mode.
      - The protected mode of the 80286 was extended to allow the 80386 to address up to 4 GB of memory.
      - The 80386's 32-bit flat memory model helped drive large-scale adoption of Windows 3.1 (which relied on protected mode), since Windows could now run many applications at once, including DOS applications, by using virtual memory and simple multitasking.
    • Virtual 8086 mode (16-bit)
      - Virtual 8086 mode (VM86) made it possible to run one or more real-mode programs in a protected environment that emulated real mode (though some programs could not run with full compatibility).
    • System Management Mode (16-bit)
      - SMM, with some of its own special instructions, is available on some Intel i386SL, i486 and later CPUs.
    • Long mode (64-bit)
      - 64-bit instructions and more registers are also available.
    The instruction set is similar in each mode, but memory addressing and word size vary, requiring different programming strategies.
  • 20. Segmented addressing (real, vm86, 80286 protected modes)
    • Memory is addressed through a process known as segmentation.
    • Segmentation composes a memory address from two parts, a segment and an offset: the segment points to the beginning of a 64 KB group of addresses, and the offset determines how far from this beginning address the desired address is.
    • In segmented addressing, two registers are required for a complete memory address: one to hold the segment, the other to hold the offset. In real and virtual 8086 mode, the segment value is shifted four bits left (equivalent to multiplication by 2^4 = 16) and then added to the offset to form the full flat address. This allows breaking the 64 KB barrier through clever choice of addresses, though it makes programming considerably more complex.
      Example: DS = 0xDEAD, DX = 0xCAFE
      memory address = 0xDEAD * 0x10 + 0xCAFE = 0xDEAD0 + 0xCAFE = 0xEB5CE
    • Combining segment and offset values in this way yields a 20-bit address; therefore, the CPU can address up to 1,048,576 bytes (1 MB) in real mode.
    • When referring to an address with a segment and an offset, the notation segment:offset is used, so in the above example the flat address 0xEB5CE can be written as 0xDEAD:0xCAFE, or as a segment and offset register pair, DS:DX.
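The segment:offset arithmetic can be checked with a few lines of Python (a sketch; `real_mode_address` is a hypothetical name). It also illustrates two consequences of the scheme: many segment:offset pairs map to the same flat address, and the largest pair, 0xFFFF:0xFFFF, gives 0x10FFEF, slightly above 1 MB, which an 8086 wraps around to 0x0FFEF because it has only 20 address lines.

```python
def real_mode_address(segment: int, offset: int, wrap_20bit: bool = False) -> int:
    """Flat address for segment:offset in real or virtual-8086 mode.

    The segment is shifted left four bits (multiplied by 16) and added to
    the offset. With wrap_20bit=True the sum is truncated to 20 bits,
    modelling an 8086, which has only 20 address lines.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both be 16-bit values")
    address = (segment << 4) + offset
    return address & 0xFFFFF if wrap_20bit else address

print(hex(real_mode_address(0xDEAD, 0xCAFE)))  # 0xeb5ce
print(hex(real_mode_address(0xEB5C, 0x000E)))  # 0xeb5ce (same byte, other pair)
```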
  • 21.
    • Some special combinations of segment registers and general registers point to important addresses:
      - CS:IP (CS is Code Segment, IP is Instruction Pointer) points to the address where the processor will fetch the next byte of code.
      - SS:SP (SS is Stack Segment, SP is Stack Pointer) points to the address of the top of the stack, i.e. the most recently pushed byte.
      - DS:SI (DS is Data Segment, SI is Source Index) is often used to point to string data that is about to be copied to ES:DI.
      - ES:DI (ES is Extra Segment, DI is Destination Index) is typically used to point to the destination of a string copy, as mentioned above.
    • In 80286 protected mode (used by OS/2), offsets are still held in 16-bit registers, limiting each segment to 2^16 bytes (64 KB) of addressable space, but the CPU can use 24-bit physical addressing to access 2^24 bytes (16 MB) of memory.
    • In protected mode, the segment selector can be broken down into three parts: a 13-bit index, a Table Indicator bit that determines whether the entry is in the GDT or the LDT, and a 2-bit Requested Privilege Level (RPL).
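The three selector fields can be extracted with shifts and masks. A minimal Python sketch follows (the `Selector` and `decode_selector` names are hypothetical): bits 15..3 hold the index, bit 2 the Table Indicator and bits 1..0 the RPL. Because each descriptor is 8 bytes long, the index times 8 is the entry's byte offset in the GDT or LDT.

```python
from typing import NamedTuple

class Selector(NamedTuple):
    index: int   # 13-bit index into the descriptor table
    table: str   # "GDT" if TI = 0, "LDT" if TI = 1
    rpl: int     # 2-bit Requested Privilege Level (0 = most privileged)

def decode_selector(value: int) -> Selector:
    """Split a 16-bit protected-mode segment selector into its fields."""
    value &= 0xFFFF
    return Selector(index=value >> 3,
                    table="LDT" if value & 0b100 else "GDT",
                    rpl=value & 0b11)

print(decode_selector(0x23))  # Selector(index=4, table='GDT', rpl=3)
```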
  • 23. Stack instructions
    PUSH src/immed
      Decrements SP by the size of the operand (two or four; byte immediates are sign-extended) and transfers one word from the source to the stack top (SS:SP).
    POP dest
      Transfers the word at the current stack top (SS:SP) to the destination, then increments SP by two (four for a doubleword) to point to the new stack top. CS is not a valid destination.
    PUSHA, PUSHAD (386+)
      Pushes all general purpose registers onto the stack in the following order: (E)AX, (E)CX, (E)DX, (E)BX, (E)SP, (E)BP, (E)SI, (E)DI. The (E)SP value pushed is the value before the instruction began.
    POPA, POPAD (386+)
      Pops the top 8 words (doublewords for POPAD) off the stack into the 8 general purpose 16/32-bit registers, in the following order: (E)DI, (E)SI, (E)BP, (E)SP, (E)BX, (E)DX, (E)CX, (E)AX. The (E)SP value popped from the stack is discarded.
    POPF, POPFD (386+)
      Pops a word/doubleword from the stack into the Flags register, then increments SP by 2 (POPF) or 4 (POPFD).
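The ordering rules for PUSH/POP and PUSHA/POPA can be checked with a toy model of a 16-bit stack. This Python sketch is hypothetical and simplified: memory is a dict keyed by offset within SS, only word-sized values are pushed, and wraparound is modelled by masking SP to 16 bits.

```python
class Stack16:
    """Toy model of the 16-bit stack at SS:SP (word pushes only)."""

    def __init__(self, sp: int = 0x0100):
        self.sp = sp
        self.mem = {}                       # offset within SS -> word

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF    # decrement first ...
        self.mem[self.sp] = value & 0xFFFF  # ... then store at the new top

    def pop(self) -> int:
        value = self.mem[self.sp]           # read the current top ...
        self.sp = (self.sp + 2) & 0xFFFF    # ... then increment
        return value

    def pusha(self, regs: dict) -> None:
        original_sp = self.sp               # SP as it was before PUSHA
        for name in ("AX", "CX", "DX", "BX"):
            self.push(regs[name])
        self.push(original_sp)
        for name in ("BP", "SI", "DI"):
            self.push(regs[name])

    def popa(self) -> dict:
        regs = {}
        for name in ("DI", "SI", "BP"):
            regs[name] = self.pop()
        self.pop()                          # the saved SP is discarded
        for name in ("BX", "DX", "CX", "AX"):
            regs[name] = self.pop()
        return regs
```

Because POPA pops in exactly the reverse order of PUSHA, a PUSHA/POPA pair restores every register except SP, which is instead restored by the net +16/-16 adjustment.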
  • 24. Integer ALU instructions
    Standard mathematical operations:
    ADD dest,src    Modifies flags: AF CF OF PF SF ZF
      Adds "src" to "dest", replacing the original contents of "dest". Both operands are binary.
    SUB dest,src    Modifies flags: AF CF OF PF SF ZF
      Subtracts the source from the destination and stores the result in the destination.
    MUL src    Modifies flags: CF OF (AF, PF, SF, ZF undefined)
      Unsigned multiply of the accumulator by the source. If "src" is a byte value, AL is used as the other multiplicand and the result is placed in AX. If "src" is a word value, AX is multiplied by "src" and DX:AX receives the result. If "src" is a doubleword value, EAX is multiplied by "src" and EDX:EAX receives the result. The 386+ uses an early-out algorithm, which makes multiplying any size value in EAX as fast as in the 8- or 16-bit registers.
    DIV src    Modifies flags: (AF, CF, OF, PF, SF, ZF undefined)
      Unsigned binary division of the accumulator by the source. If the divisor is a byte value, AX is divided by "src", the quotient is placed in AL and the remainder in AH. If the divisor is a word value, DX:AX is divided by "src", the quotient is stored in AX and the remainder in DX. If it is a doubleword value, EDX:EAX is divided by "src", with the quotient in EAX and the remainder in EDX. A zero divisor, or a quotient too large for the destination, raises a divide error.
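To see how a single addition sets both the unsigned (CF) and the signed (OF) indicators, here is a Python sketch of an 8-bit ADD that computes the six status flags with the usual bit tricks. `add8` is a hypothetical helper, not an emulator API.

```python
from typing import Dict, Tuple

def add8(dest: int, src: int) -> Tuple[int, Dict[str, int]]:
    """8-bit ADD dest,src: return (new dest, status flags)."""
    dest &= 0xFF
    src &= 0xFF
    full = dest + src
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                                    # unsigned carry out of bit 7
        "OF": int(bool((dest ^ result) & (src ^ result) & 0x80)),  # signed overflow
        "SF": result >> 7,                                         # copy of the sign bit
        "ZF": int(result == 0),
        "AF": int(bool((dest ^ src ^ result) & 0x10)),             # carry out of bit 3
        "PF": int(bin(result).count("1") % 2 == 0),                # even number of 1 bits
    }
    return result, flags

# 127 + 1 overflows as a signed byte (OF=1) but not as unsigned (CF=0).
print(add8(0x7F, 0x01))
# 255 + 1 carries out as unsigned (CF=1, ZF=1) but is -1 + 1 = 0 signed (OF=0).
print(add8(0xFF, 0x01))
```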
  • 25. Logical operators
    AND dest,src    Modifies flags: CF OF PF SF ZF (AF undefined)
      Performs a logical AND of the two operands, replacing the destination with the result.
    OR dest,src    Modifies flags: CF OF PF SF ZF (AF undefined)
      Logical inclusive OR of the two operands, returning the result in the destination. Any bit set in either operand will be set in the destination.
    XOR dest,src    Modifies flags: CF OF PF SF ZF (AF undefined)
      Performs a bitwise exclusive OR of the operands and returns the result in the destination.
    For AND, OR and XOR, CF and OF are always cleared.
    NEG dest    Modifies flags: AF CF OF PF SF ZF
      Subtracts the destination from 0 and saves the two's complement of "dest" back into "dest".
  • 26. Bit shifts, arithmetic and logical
    SAL dest,count / SHL dest,count    Modifies flags: CF OF PF SF ZF (AF undefined)
      Diagram: CF <- [bit 7 <-- bit 0] <- 0
      Shifts the destination left by "count" bits, with zeroes shifted in on the right. The Carry Flag contains the last bit shifted out.
    SAR dest,count    Modifies flags: CF OF PF SF ZF (AF undefined)
      Diagram: [bit 7 --> bit 0] -> CF, with bit 7 kept and copied into the vacated bit
      Shifts the destination right by "count" bits, with the current sign bit replicated in the leftmost bit. The Carry Flag contains the last bit shifted out.
    SHR dest,count    Modifies flags: CF OF PF SF ZF (AF undefined)
      Diagram: 0 -> [bit 7 --> bit 0] -> CF
      Shifts the destination right by "count" bits, with zeroes shifted in on the left. The Carry Flag contains the last bit shifted out.
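The three shift diagrams can be expressed as byte-sized Python functions that return the result and the Carry Flag. This is a sketch under stated assumptions: the helper names are hypothetical, and only counts from 1 to 8 are modelled (real processors mask the count to 5 bits, or 6 bits for 64-bit operands, and leave the flags unchanged when the masked count is 0).

```python
def _check(count: int) -> None:
    if not 1 <= count <= 8:
        raise ValueError("this sketch only models counts 1..8")

def shl8(value: int, count: int):
    """SHL/SAL on a byte: returns (result, CF); zeroes enter on the right."""
    _check(count)
    cf = (value >> (8 - count)) & 1            # last bit pushed out of bit 7
    return (value << count) & 0xFF, cf

def shr8(value: int, count: int):
    """SHR on a byte: zeroes enter on the left."""
    _check(count)
    cf = (value >> (count - 1)) & 1            # last bit pushed out of bit 0
    return value >> count, cf

def sar8(value: int, count: int):
    """SAR on a byte: copies of the sign bit enter on the left."""
    _check(count)
    signed = value - 0x100 if value & 0x80 else value  # reinterpret as int8
    cf = (signed >> (count - 1)) & 1           # Python's >> is arithmetic on ints
    return (signed >> count) & 0xFF, cf

for op in (shl8, shr8, sar8):
    result, cf = op(0x81, 1)
    print(f"{op.__name__}(0x81, 1) -> {result:#04x}, CF={cf}")
# shl8(0x81, 1) -> 0x02, CF=1
# shr8(0x81, 1) -> 0x40, CF=1
# sar8(0x81, 1) -> 0xc0, CF=1
```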
  • 27. Rotate with and without carry
    RCL dest,count    Modifies flags: CF OF
      Diagram: CF <- [bit 7 <-- bit 0] <- CF (a 9-bit loop through the Carry Flag)
      Rotates the bits in the destination to the left "count" times through the Carry Flag: each bit pushed out on the left goes into the Carry Flag, and the previous Carry Flag value re-enters on the right. The Carry Flag holds the last bit rotated out.
    RCR dest,count    Modifies flags: CF OF
      Diagram: CF -> [bit 7 --> bit 0] -> CF (a 9-bit loop through the Carry Flag)
      Rotates the bits in the destination to the right "count" times through the Carry Flag: each bit pushed out on the right goes into the Carry Flag, and the previous Carry Flag value re-enters on the left. The Carry Flag holds the last bit rotated out.
    ROL dest,count    Modifies flags: CF OF
      Diagram: [bit 7 <-- bit 0], with bit 7 copied into both bit 0 and CF
      Rotates the bits in the destination to the left "count" times, with all data pushed out on the left re-entering on the right. The Carry Flag will contain the value of the last bit rotated out.
    ROR dest,count    Modifies flags: CF OF
      Diagram: [bit 7 --> bit 0], with bit 0 copied into both bit 7 and CF
      Rotates the bits in the destination to the right "count" times, with all data pushed out on the right re-entering on the left. The Carry Flag will contain the value of the last bit rotated out.
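Likewise, the four rotates on a byte, again as hypothetical Python helpers. RCL and RCR are modelled as a rotation of a 9-bit quantity whose top bit is the Carry Flag, which is what the loop-through-carry diagrams describe. The sketch assumes count >= 1, since a zero count leaves the flags unchanged on real hardware.

```python
def rol8(value: int, count: int):
    """ROL on a byte: bit 7 re-enters at bit 0 and is copied into CF."""
    if count < 1:
        raise ValueError("count must be >= 1")
    count %= 8
    result = ((value << count) | (value >> (8 - count))) & 0xFF
    return result, result & 1               # CF = last bit rotated out

def ror8(value: int, count: int):
    """ROR on a byte: bit 0 re-enters at bit 7 and is copied into CF."""
    if count < 1:
        raise ValueError("count must be >= 1")
    count %= 8
    result = ((value >> count) | (value << (8 - count))) & 0xFF
    return result, result >> 7

def rcl8(value: int, count: int, cf: int):
    """RCL on a byte: rotate the 9-bit quantity CF:value left."""
    wide = (cf << 8) | value                # bit 8 holds the Carry Flag
    for _ in range(count):
        wide = ((wide << 1) | (wide >> 8)) & 0x1FF
    return wide & 0xFF, wide >> 8

def rcr8(value: int, count: int, cf: int):
    """RCR on a byte: rotate the 9-bit quantity CF:value right."""
    wide = (cf << 8) | value
    for _ in range(count):
        wide = (wide >> 1) | ((wide & 1) << 8)
    return wide & 0xFF, wide >> 8
```

Since RCL/RCR cycle through nine bit positions, nine single-bit rotations bring both the byte and the Carry Flag back to their starting values.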
  • 28. BCD arithmetic instructions
    AAA    Modifies flags: AF CF (OF, PF, SF, ZF undefined)
      Changes the contents of AL to a valid unpacked decimal digit after an addition. The high-order nibble is zeroed.
    AAD    Modifies flags: SF ZF PF (AF, CF, OF undefined)
      Used before dividing unpacked decimal numbers. Multiplies AH by 10, adds the result to AL and sets AH to zero:
        AL := 10*AH + AL
        AH := 0
      This instruction is also known to have an undocumented behavior.
    AAM    Modifies flags: PF SF ZF (AF, CF, OF undefined)
        AH := AL / 10
        AL := AL mod 10
      Used after multiplying two unpacked decimal numbers, this instruction adjusts the product into an unpacked decimal number. The high-order nibble of each byte must be zeroed before using this instruction. This instruction is also known to have an undocumented behavior.
    AAS    Modifies flags: AF CF (OF, PF, SF, ZF undefined)
      Corrects the result of a previous unpacked decimal subtraction in AL. The high-order nibble is zeroed.
    DAA    Modifies flags: AF CF PF SF ZF (OF undefined)
      Corrects the result (in AL) of a previous BCD addition. The contents of AL are changed to a pair of packed decimal digits.
    DAS    Modifies flags: AF CF PF SF ZF (OF undefined)
      Corrects the result (in AL) of a previous BCD subtraction. The contents of AL are changed to a pair of packed decimal digits.
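The BCD adjustments are easier to follow as code. The Python sketch below is illustrative, with hypothetical function names; `daa` follows the DAA pseudocode in Intel's Software Developer's Manual. The `base` parameter models the undocumented behaviour as it is commonly described: AAM and AAD are encoded with a second byte of 0x0A, and other values of that byte act as a different number base. Treat that as an assumption to verify for a specific processor.

```python
def aam(al: int, base: int = 10):
    """AAM: split a binary AL (e.g. a product of two digits) into
    unpacked digits. Returns (AH, AL). `base` models the encoding's
    second byte, normally 10; base 0 would be a divide error."""
    return al // base, al % base

def aad(ah: int, al: int, base: int = 10):
    """AAD: fold unpacked digits AH, AL into one binary byte before a DIV.
    Returns (AH, AL) = (0, base*AH + AL truncated to 8 bits)."""
    return 0, (ah * base + al) & 0xFF

def daa(al: int, cf: int, af: int):
    """DAA after a packed-BCD addition; returns (AL, CF, AF)."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:               # low digit out of range
        al = (al + 6) & 0xFF
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:             # high digit out of range
        al = (al + 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf, af

print(aam(7 * 9))           # (6, 3): 63 as unpacked digits
print(hex(daa(0x7D, 0, 0)[0]))  # 0x83: packed 38 + 45 = 83
```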
  • 29. Data manipulation instructions
    Data transfer instructions
    MOV dest,src
      Copies a byte, word or doubleword from the source operand to the destination operand. If the destination is SS, interrupts are inhibited until after the next instruction (except on early, buggy 808x CPUs), so that SS and SP can be loaded back to back. Some CPUs do this if the destination is any of the segment registers.
    XCHG dest,src
      Exchanges the contents of source and destination.
    MOVSX dest,src (386+)
      Copies the value of the source operand to the destination register, sign-extended.
    MOVZX dest,src (386+)
      Copies the value of the source operand to the destination register, zero-extended.
    CMPXCHG dest,src (486+)    Modifies flags: AF CF OF PF SF ZF
      Compares the accumulator (8-32 bits) with "dest". If they are equal, "dest" is loaded with "src" and ZF is set; otherwise the accumulator is loaded with "dest" and ZF is cleared.
    CWD
      Extends the sign of the word in AX throughout DX, forming a doubleword quantity in DX:AX.
    CDQ
      Converts the signed doubleword in EAX to a signed quadword in EDX:EAX by extending the high-order bit of EAX throughout EDX.
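Sign extension, zero extension and CMPXCHG reduce to simple integer operations. A Python sketch with hypothetical helper names, treating registers as unsigned bit patterns; it models only the data movement, not the LOCK-prefixed atomicity that makes CMPXCHG useful in multiprocessor code.

```python
def movzx(value: int, src_bits: int) -> int:
    """MOVZX: keep the low src_bits bits; the upper bits become zero."""
    return value & ((1 << src_bits) - 1)

def movsx(value: int, src_bits: int, dest_bits: int) -> int:
    """MOVSX: copy the source's sign bit into every upper destination bit."""
    value = movzx(value, src_bits)
    if value & (1 << (src_bits - 1)):       # sign bit set -> negative
        value -= 1 << src_bits              # now a negative Python int
    return value & ((1 << dest_bits) - 1)   # back to a dest_bits pattern

def cwd(ax: int):
    """CWD: returns (DX, AX), DX filled with copies of AX's sign bit."""
    return (0xFFFF if ax & 0x8000 else 0x0000), ax

def cdq(eax: int):
    """CDQ: returns (EDX, EAX), EDX filled with copies of EAX's sign bit."""
    return (0xFFFFFFFF if eax & 0x80000000 else 0), eax

def cmpxchg(acc: int, dest: int, src: int):
    """CMPXCHG dest,src: returns (accumulator, dest, ZF) afterwards."""
    if acc == dest:
        return acc, src, 1                  # equal: dest := src, ZF set
    return dest, dest, 0                    # not equal: acc := dest, ZF clear

print(hex(movsx(0xFF, 8, 16)), hex(movzx(0xFF, 8)))  # 0xffff 0xff
```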
  • 30. String/array instructions
    MOVS dest,src / MOVSB / MOVSW / MOVSD (386+)
      Copies data from the location addressed by DS:SI (even if operands are given) to the destination ES:DI, and updates SI and DI according to the size of the operand or instruction used. SI and DI are incremented when the Direction Flag is clear and decremented when it is set. Use with REP prefixes.
    CMPS dest,src / CMPSB / CMPSW / CMPSD (386+)    Modifies flags: AF CF OF PF SF ZF
      Subtracts the destination value from the source without saving the result, and updates the flags based on the subtraction. The index registers (E)SI and (E)DI are incremented or decremented depending on the state of the Direction Flag: CMPSB adjusts them by 1, CMPSW by 2 and CMPSD by 4. The REP prefixes can be used to compare entire strings.
  • 31. SCAS string / SCASB / SCASW / SCASD (386+)    Modifies flags: AF CF OF PF SF ZF
      Compares the value at ES:DI (even if an operand is specified) with the accumulator, setting the flags as for subtracting that value from the accumulator. DI is incremented or decremented based on the instruction format (or operand size) and the state of the Direction Flag. Use with REP prefixes.
    LODS src / LODSB / LODSW / LODSD (386+)
      Transfers the string element addressed by DS:SI (even if an operand is supplied) to the accumulator. SI is adjusted by the size of the operand or of the instruction used: decremented if the Direction Flag is set, incremented if it is clear. Use with REP prefixes.
    STOS dest / STOSB / STOSW / STOSD (386+)
      Stores the value in the accumulator to the location at ES:(E)DI (even if an operand is given). (E)DI is incremented or decremented based on the size of the operand (or instruction format) and the state of the Direction Flag. Use with REP prefixes.
  • 32. REP
      Repeats execution of string instructions while CX != 0. After each string operation, CX is decremented; the Zero Flag is not tested. On CPUs before the 386, the combination of a repeat prefix and a segment override may give wrong results if an interrupt occurs before CX=0. The following code is susceptible to this and shows how to avoid it:
        again:  rep movs byte ptr ES:[DI],ES:[SI]   ; vulnerable instr.
                jcxz next                           ; continue if REP successful
                loop again                          ; interrupt goofed count
        next:
    REPE / REPZ
      Repeats execution of string instructions while CX != 0 and the Zero Flag is set. CX is decremented and the Zero Flag tested after each string operation. On CPUs before the 386, the combination of a repeat prefix and a segment override may give wrong results if an interrupt occurs before CX=0.
    REPNE / REPNZ
      Repeats execution of string instructions while CX != 0 and the Zero Flag is clear. CX is decremented and the Zero Flag tested after each string operation. On CPUs before the 386, the combination of a repeat prefix and a segment override may give wrong results if an interrupt occurs before CX=0.
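The repeat prefixes combine the index updates, the CX countdown and, for REPE/REPNE, the Zero Flag test. This Python sketch models REP MOVSB and REPE CMPSB on a flat byte buffer; segment registers are ignored, and the function names are hypothetical.

```python
def rep_movsb(mem: bytearray, si: int, di: int, cx: int, df: int = 0):
    """REP MOVSB on a flat bytearray (segments ignored).
    Returns the final (SI, DI, CX)."""
    step = -1 if df else 1                  # Direction Flag picks the direction
    while cx != 0:
        mem[di] = mem[si]
        si, di, cx = si + step, di + step, cx - 1
    return si, di, cx

def repe_cmpsb(mem: bytes, si: int, di: int, cx: int, zf: int = 1, df: int = 0):
    """REPE CMPSB: compare bytes while CX != 0 and they stay equal.
    Returns the final (SI, DI, CX, ZF); `zf` comes back unchanged if CX
    is already 0, because no comparison takes place."""
    step = -1 if df else 1
    while cx != 0:
        zf = int(mem[si] == mem[di])        # flags from mem[si] - mem[di]
        si, di, cx = si + step, di + step, cx - 1
        if not zf:                          # REPE stops at the first mismatch
            break
    return si, di, cx, zf
```

After REPE CMPSB, CX = 0 alone does not mean the strings matched (the mismatch may be in the last byte); ZF is what distinguishes the two cases.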
  • 33. Program flow: conditional jumps
    Mnemonic  Meaning                                  Jump condition
    JA        Jump if Above                            CF=0 and ZF=0
    JAE       Jump if Above or Equal                   CF=0
    JB        Jump if Below                            CF=1
    JBE       Jump if Below or Equal                   CF=1 or ZF=1
    JC        Jump if Carry                            CF=1
    JCXZ      Jump if CX Zero                          CX=0
    JE        Jump if Equal                            ZF=1
    JG        Jump if Greater (signed)                 ZF=0 and SF=OF
    JGE       Jump if Greater or Equal (signed)        SF=OF
    JL        Jump if Less (signed)                    SF != OF
    JLE       Jump if Less or Equal (signed)           ZF=1 or SF != OF
    JNA       Jump if Not Above                        CF=1 or ZF=1
    JNAE      Jump if Not Above or Equal               CF=1
    JNB       Jump if Not Below                        CF=0
    JNBE      Jump if Not Below or Equal               CF=0 and ZF=0
    JNC       Jump if Not Carry                        CF=0
    JNE       Jump if Not Equal                        ZF=0
    JNG       Jump if Not Greater (signed)             ZF=1 or SF != OF
    JNGE      Jump if Not Greater or Equal (signed)    SF != OF
    JNL       Jump if Not Less (signed)                SF=OF
    JNLE      Jump if Not Less or Equal (signed)       ZF=0 and SF=OF
    JNO       Jump if Not Overflow (signed)            OF=0
    JNP       Jump if No Parity                        PF=0
    JNS       Jump if Not Signed (signed)              SF=0
    JNZ       Jump if Not Zero                         ZF=0
    JO        Jump if Overflow (signed)                OF=1
    JP        Jump if Parity                           PF=1
    JPE       Jump if Parity Even                      PF=1
    JPO       Jump if Parity Odd                       PF=0
    JS        Jump if Signed (signed)                  SF=1
    JZ        Jump if Zero                             ZF=1
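Because CMP only sets flags, the jump that follows decides whether the operands are read as signed or unsigned. The Python sketch below (hypothetical helper names) computes the flags of an 8-bit CMP and evaluates a few of the jump conditions listed above:

```python
def cmp_flags(a: int, b: int, bits: int = 8) -> dict:
    """Flags produced by CMP a,b (the subtraction a - b, result discarded)."""
    mask, sign = (1 << bits) - 1, 1 << (bits - 1)
    a, b = a & mask, b & mask
    result = (a - b) & mask
    return {
        "CF": int(a < b),                                # unsigned borrow
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),  # signed overflow
    }

JUMPS = {
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,         # unsigned >
    "JB": lambda f: f["CF"] == 1,                          # unsigned <
    "JG": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],   # signed >
    "JL": lambda f: f["SF"] != f["OF"],                    # signed <
    "JE": lambda f: f["ZF"] == 1,
}

flags = cmp_flags(0xFF, 0x01)  # 255 vs 1 unsigned, but -1 vs 1 signed
print({name: test(flags) for name, test in JUMPS.items()})
# {'JA': True, 'JB': False, 'JG': False, 'JL': True, 'JE': False}
```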
  • 34. JCXZ label / JECXZ label (386+)
      Causes execution to branch to "label" if register CX (ECX for JECXZ) is zero; no flags are tested. The target must be within -128 to +127 bytes of the instruction following the jump.
    JMP target
      Unconditionally transfers control to "target". In 16-bit code, near jumps reach by default -32768 to 32767 bytes from the instruction following the jump. NEAR and SHORT jumps update IP, while FAR jumps update both CS and IP.
    LEAVE
      Releases the local variables created by the previous ENTER instruction by restoring SP and BP to their state before the procedure's stack frame was initialized.
    ENTER locals,level
      Modifies the stack for entry to a procedure in a high-level language. The operand "locals" specifies the amount of storage to be allocated on the stack; "level" specifies the nesting level of the routine. Paired with the LEAVE instruction, this is a compact method of entry to and exit from procedures.
  • 35. LOOP label
      Decrements CX by 1 and transfers control to "label" if CX is not zero. The "label" operand must be within -128 to +127 bytes of the instruction following the loop instruction.
    LOOPE label / LOOPZ label
      Decrements CX by 1 (without modifying the flags) and transfers control to "label" if CX != 0 and the Zero Flag is set. The same -128 to +127 byte range applies.
    LOOPNZ label / LOOPNE label
      Decrements CX by 1 (without modifying the flags) and transfers control to "label" if CX != 0 and the Zero Flag is clear. The same -128 to +127 byte range applies.
    INT num    Modifies flags: TF IF
      Initiates a software interrupt by pushing the flags, clearing the Trap and Interrupt Flags, pushing CS followed by IP, and loading CS:IP with the value found in the interrupt vector table. Execution then begins at the location addressed by the new CS:IP.
    CALL destination
      Pushes the Instruction Pointer (and the Code Segment for far calls) onto the stack and loads the Instruction Pointer with the address of "destination". Execution continues at CS:IP.
    RET/RETF/RETN nBytes
      Transfers control from a procedure back to the instruction address saved on the stack. "nBytes" is an optional number of bytes to release from the stack after the return address is popped. Far returns pop IP followed by CS, while near returns pop only IP.
  • 36. Segment Register Instructions
    The segment register instructions allow far pointers (segment addresses) to be loaded into the segment registers.
    LDS dest,src
      Loads a 32-bit far pointer from the memory source into the destination register and DS: the offset is placed in the destination register and the segment in DS. The word at the lower memory address must contain the offset and the word at the higher address the segment. This simplifies loading far pointers from the stack and from the interrupt vector table.
    LFS dest,src (386+)
      Same as LDS, but the segment is placed in FS.
    LEA dest,src
      Transfers the offset address of "src" to the destination register.
    LES dest,src
      Same as LDS, but the segment is placed in ES.
    LSS dest,src (386+)
      Same as LDS, but the segment is placed in SS.
  • 37. I/O INSTRUCTIONS
    These instructions move data between the processor's I/O ports and a register or memory.
    IN accum,port
      A byte, word or dword is read from "port" and placed in AL, AX or EAX respectively. If the port number is in the range 0-255 it can be specified as an immediate; otherwise the port number must be specified in DX. Valid port ranges on the PC are 0-1023, though values through 65535 may be specified and recognized by third-party vendors and PS/2s.
    OUT port,accum
      Transfers the byte in AL, word in AX or dword in EAX to the specified hardware port address. If the port number is in the range 0-255 it can be specified as an immediate; if it is greater than 255, the port number must be specified in DX. Since the PC decodes only 10 bits of the port address, values over 1023 can be decoded only by third-party vendor equipment, and they also alias to the port range 0-1023.
    INS dest,port / INSB / INSW / INSD (INSD is 386+)
      Reads data from the port specified in DX into the destination ES:(E)DI (even if a destination operand is supplied). (E)DI is then adjusted by the size of the operand: increased if the Direction Flag is clear and decreased if it is set. For INSB, INSW and INSD no operands are allowed and the size is determined by the mnemonic.
    OUTS port,src / OUTSB / OUTSW / OUTSD (OUTSD is 386+)
      Transfers a byte, word or doubleword from "src" to the hardware port specified in DX. The source is located at DS:(E)SI, and (E)SI is incremented or decremented by the size of the operand or the size dictated by the mnemonic: decremented when the Direction Flag is set, incremented when it is clear. Unlike OUT, the port cannot be given as an immediate; it is always taken from DX.
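The port-address rules and the string I/O pointer update can be summarized in a few lines of C. This is a simplified model: the function names are hypothetical, and `pc_decoded_port` assumes the original PC's 10-bit port decoding described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The original PC decodes only the low 10 address bits, so a port above
   1023 aliases to (port mod 1024) on standard hardware. */
static uint16_t pc_decoded_port(uint16_t port)
{
    return (uint16_t)(port & 0x3FF);
}

/* IN/OUT can encode the port as an 8-bit immediate only for 0-255;
   any higher port must be placed in DX. (INS/OUTS always use DX.) */
static bool in_out_can_use_immediate(uint32_t port)
{
    return port <= 0xFF;
}

/* After each INS/OUTS element, (E)DI or (E)SI moves by the element size:
   down when DF = 1, up when DF = 0. */
static uint32_t string_io_advance(uint32_t index, unsigned size, bool df)
{
    assert(size == 1 || size == 2 || size == 4);
    return df ? index - size : index + size;
}
```

For example, `pc_decoded_port(0x7F8)` gives 0x3F8, showing how a port above 1023 aliases into the 0-1023 range.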
  • 38. Flag Control (EFLAGS) Instructions
    The flag control instructions operate on the flags in the EFLAGS register.
    STC    Modifies flags: CF
      Sets the Carry Flag to 1.
    STD    Modifies flags: DF
      Sets the Direction Flag to 1, causing string instructions to auto-decrement SI and DI instead of auto-incrementing them.
    STI    Modifies flags: IF
      Sets the Interrupt Flag to 1, which enables recognition of maskable hardware interrupts. If an interrupt is generated by a hardware device, an End of Interrupt (EOI) must also be issued to enable other hardware interrupts of the same or lower priority.
    SAHF   Modifies flags: AF CF PF SF ZF
      Transfers bits 0-7 of AH into the Flags register. This includes AF, CF, PF, SF and ZF.
    LAHF
      Copies bits 0-7 of the Flags register into AH. This includes flags AF, CF, PF, SF and ZF; the positions marked xx are reserved bits.
      AH := SF ZF xx AF xx PF xx CF
    CLC    Modifies flags: CF
      Clears the Carry Flag.
    CLD    Modifies flags: DF
      Clears the Direction Flag, causing string instructions to increment the SI and DI index registers.
    CLI    Modifies flags: IF
      Disables maskable hardware interrupts by clearing the Interrupt Flag. NMIs and software interrupts are not inhibited.
    CLTS
      Clears the Task Switched flag in the Machine Status Word (the low word of CR0 on the 386 and later). This is a privileged operation and is generally used only by operating system code.
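The following C sketch models the flag layout that LAHF and SAHF rely on. It assumes the standard FLAGS bit positions; the `FLAG_*` constants and the helper functions are hypothetical names. Following the Intel manuals, bit 1 of FLAGS is treated as always 1, so it shows up in the LAHF result.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in FLAGS/EFLAGS for the flags discussed above. */
enum {
    FLAG_CF = 1u << 0,  FLAG_PF = 1u << 2,  FLAG_AF = 1u << 4,
    FLAG_ZF = 1u << 6,  FLAG_SF = 1u << 7,  FLAG_IF = 1u << 9,
    FLAG_DF = 1u << 10
};

/* SAHF writes only SF ZF AF PF CF, i.e. mask 0xD5 within the low byte. */
#define SAHF_MASK ((uint32_t)(FLAG_SF | FLAG_ZF | FLAG_AF | FLAG_PF | FLAG_CF))

/* LAHF: AH receives bits 0-7 of the flags register. */
static uint8_t lahf(uint32_t eflags)
{
    return (uint8_t)(eflags & 0xFFu);
}

/* SAHF: SF, ZF, AF, PF and CF come from AH; every other flag is preserved. */
static uint32_t sahf(uint32_t eflags, uint8_t ah)
{
    return (eflags & ~SAHF_MASK) | ((uint32_t)ah & SAHF_MASK);
}

/* STC/STD/STI set exactly one flag bit; CLC/CLD/CLI clear exactly one. */
static uint32_t set_flag(uint32_t eflags, uint32_t flag)
{
    assert(flag != 0 && (flag & (flag - 1)) == 0);   /* a single bit */
    return eflags | flag;
}

static uint32_t clear_flag(uint32_t eflags, uint32_t flag)
{
    assert(flag != 0 && (flag & (flag - 1)) == 0);   /* a single bit */
    return eflags & ~flag;
}
```

With CF and ZF set (plus the always-set bit 1), `lahf` returns 0x43; `sahf` leaves DF and IF untouched because they lie outside the 0xD5 mask.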
  • 39. Miscellaneous Instructions
    The miscellaneous instructions provide such functions as loading an effective address, executing a "no-operation," and retrieving processor identification information.
    NOP
      A do-nothing instruction. It occupies both space and time and is most useful for patching code segments. (Its one-byte encoding, 90h, is the original XCHG AX,AX instruction.)
    XLAT translation-table / XLATB (MASM 5.x)
      Replaces the byte in AL with a byte from a user table addressed by BX (in the DS segment unless overridden). The original value of AL is the index into the translation table. Conceptually this is MOV AL,[BX+AL], although that is not a legal MOV addressing mode.
    CPUID
      Processor identification: returns processor identification and feature information in EAX, EBX, ECX and EDX, selected by the value placed in EAX before the instruction executes.
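XLAT's table lookup can be modeled in C as an indexed read from a 64 KiB data segment. In this sketch `data_segment`, `xlat` and `place_table` are hypothetical names; it assumes 16-bit addressing, so BX + AL wraps modulo 65,536, and it ignores segment overrides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 64 KiB data segment addressed by a 16-bit offset. */
static uint8_t data_segment[1u << 16];

/* XLAT/XLATB: AL := DS:[BX + zero-extended AL], offset wrapping at 16 bits. */
static uint8_t xlat(uint16_t bx, uint8_t al)
{
    return data_segment[(uint16_t)(bx + al)];
}

/* Hypothetical helper: copy a translation table into the segment. */
static void place_table(uint16_t offset, const char *bytes, size_t n)
{
    assert((size_t)offset + n <= sizeof data_segment);
    memcpy(&data_segment[offset], bytes, n);
}
```

A classic use is nibble-to-hex conversion: with "0123456789ABCDEF" placed at offset 0x200 and BX = 0x200, an AL of 0x0B translates to 'B'.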
  • 40. OTHERS
    Floating-point instructions
    • Instructions for a stack-based floating-point unit (FPU).
    • The FPU instructions include addition, subtraction, negation, multiplication, division, remainder, square root, integer truncation, fraction truncation, and scaling by a power of two. The operations also include conversion instructions, which can load or store a value from memory in any of the following formats: binary-coded decimal, 32-bit integer, 64-bit integer, 32-bit floating-point, 64-bit floating-point or 80-bit floating-point (upon loading, the value is converted to the currently used floating-point mode).
    • Transcendental functions: sine, cosine, tangent, arctangent, exponentiation with base 2, and logarithms to bases 2, 10 or e.
    • The stack-register-to-stack-register format of the instructions is fop st, st(n) or fop st(n), st, where st is equivalent to st(0) and st(n) is one of the 8 stack registers (st(0), st(1), …, st(7)). As with the integer instructions, the first operand is both the first source operand and the destination operand. fsubr and fdivr should be singled out: they first swap the source operands before performing the subtraction or division. The addition, subtraction, multiplication, division, store and comparison instructions include modes that pop the top of the stack after the operation is complete. For example, faddp st(1), st performs the calculation st(1) = st(1) + st(0) and then removes st(0) from the top of the stack, so the result that was in st(1) becomes the new top of the stack, st(0).
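The register-stack behavior of fsub, fsubr and faddp can be traced with a toy C model. It is a simplification: `fpu`, `st`, `fld`, `fpop` and the instruction helpers are hypothetical names, values are C `double` rather than 80-bit extended precision, and tags and exceptions are ignored. Loading a value decrements TOP, so st(i) is physical register (TOP + i) mod 8.

```c
#include <assert.h>

/* Toy x87 register stack: eight registers plus the TOP pointer. */
typedef struct { double reg[8]; unsigned top; } fpu;

/* st(i) lives in physical register (TOP + i) mod 8. */
static double *st(fpu *f, unsigned i)
{
    assert(i < 8);
    return &f->reg[(f->top + i) & 7];
}

/* FLD: decrement TOP, then write the new st(0). */
static void fld(fpu *f, double v)
{
    f->top = (f->top - 1) & 7;
    *st(f, 0) = v;
}

/* Pop: read st(0), then increment TOP. */
static double fpop(fpu *f)
{
    double v = *st(f, 0);
    f->top = (f->top + 1) & 7;
    return v;
}

/* FSUB st, st(i):  st(0) = st(0) - st(i) */
static void fsub_st0_sti(fpu *f, unsigned i)  { *st(f, 0) = *st(f, 0) - *st(f, i); }

/* FSUBR st, st(i): operands swapped, st(0) = st(i) - st(0) */
static void fsubr_st0_sti(fpu *f, unsigned i) { *st(f, 0) = *st(f, i) - *st(f, 0); }

/* FADDP st(i), st: st(i) = st(i) + st(0), then pop */
static void faddp_sti_st0(fpu *f, unsigned i)
{
    *st(f, i) = *st(f, i) + *st(f, 0);
    (void)fpop(f);
}
```

Starting from st(0) = 5 and st(1) = 2, fsub gives st(0) = 3, fsubr then gives st(0) = 2 - 3 = -1, and faddp st(1), st leaves 1 as the new st(0).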
  • 41. SIMD instructions
    • Modern x86 CPUs contain SIMD instructions, which largely perform the same operation in parallel on many values encoded in a wide SIMD register.
    • Various instruction technologies support different operations on different register sets, but taken as a complete whole (from MMX to SSE4.2) they include general computations on integer or floating-point arithmetic (addition, subtraction, multiplication, shift, minimization, maximization, comparison, division or square root). For example, paddw mm0, mm1 performs 4 parallel 16-bit (indicated by the w) integer adds (indicated by the padd), adding the values in mm1 to those in mm0 and storing the results in mm0.
    • Streaming SIMD Extensions (SSE) also include a scalar floating-point mode in which only the first (lowest) value in the register is actually modified (expanded in SSE2 to double-precision values).
    • Some other unusual instructions have been added, including a sum of absolute differences (used for motion estimation in video compression, such as in MPEG) and a 16-bit multiply-accumulate instruction (useful for software-based alpha blending and digital filtering).
    • SSE (since SSE3) and 3DNow! extensions include addition and subtraction instructions for treating paired floating-point values like complex numbers.
    • These instruction sets also include numerous fixed sub-word instructions for shuffling, inserting and extracting values within the registers.
    • In addition, there are instructions for moving data between the integer registers and the XMM (used in SSE) or FPU (used in MMX) registers.
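The lane-wise arithmetic of paddw mm0, mm1 can be mimicked with an ordinary 64-bit integer in C. This is a scalar model, not MMX code: `lane16` and `paddw` are hypothetical names, and the sketch covers only the wrapping (non-saturating) add.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 16-bit lane 0-3 of a 64-bit MMX-style value (lane 0 = lowest bits). */
static uint16_t lane16(uint64_t v, unsigned lane)
{
    assert(lane < 4);
    return (uint16_t)(v >> (16 * lane));
}

/* Scalar model of PADDW mm0, mm1: four independent 16-bit adds, each
   wrapping modulo 2^16, with no carry from one lane into the next. */
static uint64_t paddw(uint64_t mm0, uint64_t mm1)
{
    uint64_t result = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        uint16_t sum = (uint16_t)(lane16(mm0, lane) + lane16(mm1, lane));
        result |= (uint64_t)sum << (16 * lane);
    }
    return result;
}
```

A lane holding 0xFFFF plus 1 wraps to 0 without carrying into its neighbor, which is what distinguishes four 16-bit adds from a single 64-bit add.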