The x86 Family

  1. Assembly Language: x86 Family Architecture, Motaz K. Saad, Spring 2007. Motaz K. Saad, Dept. of CS
  2. Overview <ul><li>General Concepts </li></ul><ul><li>IA-32 Processor Architecture </li></ul><ul><li>IA-32 Memory Management </li></ul><ul><li>Components of an IA-32 Microcomputer </li></ul><ul><li>Input-Output System </li></ul>
  3. General Concepts <ul><li>Basic microcomputer design </li></ul><ul><li>Instruction execution cycle </li></ul><ul><li>Reading from memory </li></ul><ul><li>How programs run </li></ul>
  4. Basic Microcomputer Design <ul><li>Clock synchronizes CPU operations </li></ul><ul><li>Control unit (CU) coordinates sequence of execution steps </li></ul><ul><li>ALU performs arithmetic and bitwise processing </li></ul>
  5. [Figure: block diagram of a computer system: a processor (control unit and arithmetic logic unit) connected to memory, input devices, output devices, and storage devices, exchanging data, information, and instructions]
  7. Clock <ul><li>Synchronizes all CPU and bus operations </li></ul><ul><li>Machine (clock) cycle measures the time of a single operation </li></ul><ul><li>Clock is used to trigger events </li></ul>
  8. What's Next <ul><li>General Concepts </li></ul><ul><li>IA-32 Processor Architecture </li></ul><ul><li>IA-32 Memory Management </li></ul><ul><li>Components of an IA-32 Microcomputer </li></ul><ul><li>Input-Output System </li></ul>
  9. Instruction Execution Cycle <ul><li>Fetch </li></ul><ul><li>Decode </li></ul><ul><li>Fetch operands </li></ul><ul><li>Execute </li></ul><ul><li>Store output </li></ul>
  10. Cache Memory <ul><li>High-speed, expensive static RAM both inside and outside the CPU </li></ul><ul><ul><li>Level-1 cache: inside the CPU </li></ul></ul><ul><ul><li>Level-2 cache: outside the CPU on early systems (moved into the CPU package, and later onto the die, in later processors) </li></ul></ul><ul><li>Cache hit: the data to be read is already in cache memory </li></ul><ul><li>Cache miss: the data to be read is not in cache memory </li></ul>
  11. How a Program Runs
  12. Multitasking <ul><li>The OS can run multiple programs at the same time. </li></ul><ul><li>Multiple threads of execution within the same program. </li></ul><ul><li>A scheduler utility assigns a given amount of CPU time to each running program. </li></ul><ul><li>Rapid switching of tasks </li></ul><ul><ul><li>gives the illusion that all programs are running at once </li></ul></ul><ul><ul><li>the processor must support task switching. </li></ul></ul>
  13. IA-32 Processor Architecture <ul><li>Modes of operation </li></ul><ul><li>Basic execution environment </li></ul><ul><li>Floating-point unit </li></ul><ul><li>Intel Microprocessor history </li></ul>
  14. Modes of Operation <ul><li>Protected mode </li></ul><ul><ul><li>native mode (Windows, Linux) </li></ul></ul><ul><li>Real-address mode </li></ul><ul><ul><li>native MS-DOS </li></ul></ul><ul><li>System management mode </li></ul><ul><ul><li>power management, system security, diagnostics </li></ul></ul><ul><li>Virtual-8086 mode </li></ul><ul><ul><li>a hybrid mode that runs within Protected mode </li></ul></ul><ul><ul><li>each program has its own 8086 computer </li></ul></ul>
  15. Basic Execution Environment <ul><li>Addressable memory </li></ul><ul><li>General-purpose registers </li></ul><ul><li>Index and base registers </li></ul><ul><li>Specialized register uses </li></ul><ul><li>Status flags </li></ul><ul><li>Floating-point, MMX, XMM registers </li></ul>
  16. Addressable Memory <ul><li>Protected mode </li></ul><ul><ul><li>4 GB </li></ul></ul><ul><ul><li>32-bit address </li></ul></ul><ul><li>Real-address and Virtual-8086 modes </li></ul><ul><ul><li>1 MB space </li></ul></ul><ul><ul><li>20-bit address </li></ul></ul>
  17. x86 General-Purpose Registers: named storage locations inside the CPU, optimized for speed.
  18. Accessing Parts of Registers <ul><li>Use the 8-bit name, 16-bit name, or 32-bit name </li></ul><ul><li>Applies to EAX, EBX, ECX, and EDX </li></ul>
  19. Index and Base Registers <ul><li>Some registers have only a 16-bit name for their lower half: </li></ul>
  20. Some Specialized Register Uses <ul><li>Segment </li></ul><ul><ul><li>CS – code segment </li></ul></ul><ul><ul><li>DS – data segment </li></ul></ul><ul><ul><li>SS – stack segment </li></ul></ul><ul><ul><li>ES, FS, GS - additional segments </li></ul></ul><ul><li>EIP – instruction pointer </li></ul><ul><li>EFLAGS </li></ul><ul><ul><li>status and control flags </li></ul></ul><ul><ul><li>each flag is a single binary bit </li></ul></ul><ul><li>General-Purpose </li></ul><ul><ul><li>EAX – accumulator </li></ul></ul><ul><ul><li>EBX – base register </li></ul></ul><ul><ul><li>ECX – loop counter </li></ul></ul><ul><ul><li>EDX – data register </li></ul></ul><ul><ul><li>ESP – stack pointer </li></ul></ul><ul><ul><li>ESI, EDI – index registers </li></ul></ul><ul><ul><li>EBP – extended frame pointer (stack) </li></ul></ul>
  21. Status Flags <ul><li>Carry </li></ul><ul><ul><li>unsigned arithmetic out of range </li></ul></ul><ul><li>Overflow </li></ul><ul><ul><li>signed arithmetic out of range </li></ul></ul><ul><li>Sign </li></ul><ul><ul><li>result is negative </li></ul></ul><ul><li>Zero </li></ul><ul><ul><li>result is zero </li></ul></ul><ul><li>Auxiliary Carry </li></ul><ul><ul><li>carry from bit 3 to bit 4 </li></ul></ul><ul><li>Parity </li></ul><ul><ul><li>the lowest byte of the result contains an even number of 1 bits </li></ul></ul>
  22. Intel Microprocessor History <ul><li>Intel 8086, 80286 </li></ul><ul><li>IA-32 processor family </li></ul><ul><li>P6 processor family </li></ul><ul><li>CISC and RISC </li></ul>
  23. Early Intel Microprocessors <ul><li>Intel 8080 </li></ul><ul><ul><li>64K addressable RAM </li></ul></ul><ul><ul><li>8-bit registers </li></ul></ul><ul><ul><li>CP/M operating system </li></ul></ul><ul><ul><li>S-100 BUS architecture </li></ul></ul><ul><ul><li>8-inch floppy disks! </li></ul></ul><ul><li>Intel 8086/8088 </li></ul><ul><ul><li>the IBM PC used the 8088 </li></ul></ul><ul><ul><li>1 MB addressable RAM </li></ul></ul><ul><ul><li>16-bit registers </li></ul></ul><ul><ul><li>16-bit data bus (8-bit for 8088) </li></ul></ul><ul><ul><li>separate floating-point unit (8087) </li></ul></ul>
  24. The IBM-AT <ul><li>Intel 80286 </li></ul><ul><ul><li>16 MB addressable RAM </li></ul></ul><ul><ul><li>Protected memory </li></ul></ul><ul><ul><li>several times faster than 8086 </li></ul></ul><ul><ul><li>introduced the 16-bit ISA (AT) bus architecture </li></ul></ul><ul><ul><li>80287 floating-point unit </li></ul></ul>
  25. Intel IA-32 Family <ul><li>Intel386 </li></ul><ul><ul><li>4 GB addressable RAM, 32-bit registers, paging (virtual memory) </li></ul></ul><ul><li>Intel486 </li></ul><ul><ul><li>instruction pipelining </li></ul></ul><ul><li>Pentium </li></ul><ul><ul><li>superscalar, 32-bit address bus, 64-bit internal data path </li></ul></ul>
  26. Intel P6 Family <ul><li>Pentium Pro </li></ul><ul><ul><li>advanced optimization techniques in microcode </li></ul></ul><ul><li>Pentium II </li></ul><ul><ul><li>MMX (multimedia) instruction set </li></ul></ul><ul><li>Pentium III </li></ul><ul><ul><li>SIMD (streaming extensions) instructions </li></ul></ul><ul><li>Pentium 4 and Xeon </li></ul><ul><ul><li>Intel NetBurst micro-architecture, tuned for multimedia </li></ul></ul>
  27. CISC and RISC <ul><li>CISC – complex instruction set </li></ul><ul><ul><li>large instruction set </li></ul></ul><ul><ul><li>high-level operations </li></ul></ul><ul><ul><li>requires microcode interpreter </li></ul></ul><ul><ul><li>examples: Intel 80x86 family </li></ul></ul><ul><li>RISC – reduced instruction set </li></ul><ul><ul><li>simple, atomic instructions </li></ul></ul><ul><ul><li>small instruction set </li></ul></ul><ul><ul><li>directly executed by hardware </li></ul></ul><ul><ul><li>examples: </li></ul></ul><ul><ul><ul><li>ARM (Advanced RISC Machines) </li></ul></ul></ul><ul><ul><ul><li>DEC Alpha (later owned by Compaq) </li></ul></ul></ul>
  28. What's Next <ul><li>General Concepts </li></ul><ul><li>IA-32 Processor Architecture </li></ul><ul><li>IA-32 Memory Management </li></ul><ul><li>Components of an IA-32 Microcomputer </li></ul><ul><li>Input-Output System </li></ul>
  29. IA-32 Memory Management <ul><li>Real-address mode </li></ul><ul><li>Calculating linear addresses </li></ul><ul><li>Protected mode </li></ul><ul><li>Multi-segment model </li></ul><ul><li>Paging </li></ul>
  30. Real-Address mode <ul><li>1 MB RAM maximum addressable </li></ul><ul><li>Application programs can access any area of memory </li></ul><ul><li>Single tasking </li></ul><ul><li>Supported by MS-DOS operating system </li></ul>
  31. Segmented Memory <ul><li>Segmented memory addressing: the absolute (linear) address is a combination of a 16-bit segment value, multiplied by 16, added to a 16-bit offset </li></ul>[Figure: linear addresses within one segment]
  32. Calculating Linear Addresses <ul><li>Given a segment address, multiply it by 16 (append a hexadecimal zero), and add it to the offset </li></ul><ul><li>Example: convert 08F1:0100 to a linear address </li></ul>Adjusted segment value: 08F10; add the offset: 0100; linear address: 09010
  33. Your turn . . . What linear address corresponds to the segment/offset address 028F:0030? 028F0 + 0030 = 02920. Always use hexadecimal notation for addresses.
  34. Your turn . . . What segment addresses correspond to the linear address 28F30h? Many different segment-offset addresses can produce the linear address 28F30h. For example: 28F0:0030, 28F3:0000, 28B0:0430, . . .
  35. Protected Mode (1 of 2) <ul><li>4 GB addressable RAM </li></ul><ul><ul><li>(00000000 to FFFFFFFFh) </li></ul></ul><ul><li>Each program assigned a memory partition which is protected from other programs </li></ul><ul><li>Designed for multitasking </li></ul><ul><li>Supported by Linux & MS-Windows </li></ul>
  36. Protected mode (2 of 2) <ul><li>Segment descriptor tables </li></ul><ul><li>Program structure </li></ul><ul><ul><li>code, data, and stack areas </li></ul></ul><ul><ul><li>CS, DS, SS segment descriptors </li></ul></ul><ul><ul><li>global descriptor table (GDT) </li></ul></ul><ul><li>MASM Programs use the Microsoft flat memory model </li></ul>
  37. What's Next <ul><li>General Concepts </li></ul><ul><li>IA-32 Processor Architecture </li></ul><ul><li>IA-32 Memory Management </li></ul><ul><li>Components of an IA-32 Microcomputer </li></ul><ul><li>Input-Output System </li></ul>
  38. Components of an IA-32 Microcomputer <ul><li>Motherboard </li></ul><ul><li>Video output </li></ul><ul><li>Memory </li></ul><ul><li>Input-output ports </li></ul>
  39. Motherboard <ul><li>CPU socket </li></ul><ul><li>External cache memory slots </li></ul><ul><li>Main memory slots </li></ul><ul><li>BIOS chips </li></ul><ul><li>Sound synthesizer chip (optional) </li></ul><ul><li>Video controller chip (optional) </li></ul><ul><li>IDE, parallel, serial, USB, video, keyboard, joystick, network, and mouse connectors </li></ul><ul><li>PCI bus connectors (expansion cards) </li></ul>
  40. Intel D850MD Motherboard [Figure: labeled board showing the Pentium 4 socket, dynamic RAM, memory controller hub, I/O controller, firmware hub, AGP slot, PCI slots, IDE drive connectors, diskette connector, audio chip, speaker, battery, video, power connector, and mouse, keyboard, parallel, serial, and USB connectors] Source: Intel® Desktop Board D850MD/D850MV Technical Product Specification
  41. Video Output <ul><li>Video controller </li></ul><ul><ul><li>on motherboard, or on expansion card </li></ul></ul><ul><ul><li>AGP (accelerated graphics port technology) </li></ul></ul><ul><li>Video memory (VRAM) </li></ul><ul><li>Video CRT Display </li></ul><ul><ul><li>uses raster scanning </li></ul></ul><ul><ul><li>horizontal retrace </li></ul></ul><ul><ul><li>vertical retrace </li></ul></ul><ul><li>Direct digital LCD monitors </li></ul><ul><ul><li>no raster scanning required </li></ul></ul>
  42. Sample Video Controller (ATI Corp.) <ul><ul><li>128-bit 3D graphics performance powered by RAGE™ 128 PRO </li></ul></ul><ul><ul><li>3D graphics performance </li></ul></ul><ul><ul><li>Intelligent TV-Tuner with Digital VCR </li></ul></ul><ul><ul><li>TV-ON-DEMAND ™ </li></ul></ul><ul><ul><li>Interactive Program Guide </li></ul></ul><ul><ul><li>Still image and MPEG-2 motion video capture </li></ul></ul><ul><ul><li>Video editing </li></ul></ul><ul><ul><li>Hardware DVD video playback </li></ul></ul><ul><ul><li>Video output to TV or VCR </li></ul></ul>
  43. Memory <ul><li>ROM </li></ul><ul><ul><li>read-only memory </li></ul></ul><ul><li>EPROM </li></ul><ul><ul><li>erasable programmable read-only memory </li></ul></ul><ul><li>Dynamic RAM (DRAM) </li></ul><ul><ul><li>inexpensive; must be refreshed constantly </li></ul></ul><ul><li>Static RAM (SRAM) </li></ul><ul><ul><li>expensive; used for cache memory; no refresh required </li></ul></ul><ul><li>Video RAM (VRAM) </li></ul><ul><ul><li>dual ported; optimized for constant video refresh </li></ul></ul><ul><li>CMOS RAM </li></ul><ul><ul><li>complementary metal-oxide semiconductor </li></ul></ul><ul><ul><li>system setup information </li></ul></ul><ul><li>See: Intel platform memory (Intel technology brief) </li></ul>
  44. Input-Output Ports <ul><li>USB (universal serial bus) </li></ul><ul><ul><li>intelligent high-speed connection to devices </li></ul></ul><ul><ul><li>up to 12 megabits/second (USB 1.1; USB 2.0 raises this to 480 megabits/second) </li></ul></ul><ul><ul><li>USB hub connects multiple devices </li></ul></ul><ul><ul><li>enumeration: computer queries devices </li></ul></ul><ul><ul><li>supports hot connections </li></ul></ul><ul><li>Parallel </li></ul><ul><ul><li>short cable, high speed </li></ul></ul><ul><ul><li>common for printers </li></ul></ul><ul><ul><li>bidirectional, parallel data transfer </li></ul></ul><ul><ul><li>Intel 8255 controller chip </li></ul></ul>
  45. Input-Output Ports (cont) <ul><li>Serial </li></ul><ul><ul><li>RS-232 serial port </li></ul></ul><ul><ul><li>one bit at a time </li></ul></ul><ul><ul><li>uses long cables and modems </li></ul></ul><ul><ul><li>16550 UART (universal asynchronous receiver transmitter) </li></ul></ul><ul><ul><li>programmable in assembly language </li></ul></ul>
  46. What's Next <ul><li>General Concepts </li></ul><ul><li>IA-32 Processor Architecture </li></ul><ul><li>IA-32 Memory Management </li></ul><ul><li>Components of an IA-32 Microcomputer </li></ul><ul><li>Input-Output System </li></ul>
  47. Levels of Input-Output <ul><li>Level 3: Call a library function (C++, Java) </li></ul><ul><ul><li>easy to do; abstracted from hardware; details hidden </li></ul></ul><ul><ul><li>slowest performance </li></ul></ul><ul><li>Level 2: Call an operating system function </li></ul><ul><ul><li>specific to one OS; device-independent </li></ul></ul><ul><ul><li>medium performance </li></ul></ul><ul><li>Level 1: Call a BIOS (basic input-output system) function </li></ul><ul><ul><li>may produce different results on different systems </li></ul></ul><ul><ul><li>knowledge of hardware required </li></ul></ul><ul><ul><li>usually good performance </li></ul></ul><ul><li>Level 0: Communicate directly with the hardware </li></ul><ul><ul><li>May not be allowed by some operating systems </li></ul></ul>
  48. Displaying a String of Characters <ul><li>When a HLL program displays a string of characters, the following steps take place: </li></ul>
  49. ASM Programming levels ASM programs can perform input-output at each of the following levels:
  50. Summary <ul><li>Central Processing Unit (CPU) </li></ul><ul><li>Arithmetic Logic Unit (ALU) </li></ul><ul><li>Instruction execution cycle </li></ul><ul><li>Multitasking </li></ul><ul><li>Floating Point Unit (FPU) </li></ul><ul><li>Complex Instruction Set </li></ul><ul><li>Real mode and Protected mode </li></ul><ul><li>Motherboard components </li></ul><ul><li>Memory types </li></ul><ul><li>Input/Output and access levels </li></ul>
  51. More Details about x86 Family Architecture: x86 Family Generations
  52. x86 Family <ul><li>8086 and 8088 Microprocessors </li></ul><ul><li>80x86 architecture </li></ul><ul><li>Address bus: 20 bits (vs. 16 bits for 8-bit chips) → max. memory capacity: 1 MB </li></ul><ul><li>Internal structure is divided into the bus interface unit (BIU) and the execution unit (EU) → instruction fetch and execution can occur simultaneously </li></ul><ul><li>Length of internal registers expanded from 8 bits to 16 bits </li></ul><ul><li>Hardware multiply and divide instructions built into the processor </li></ul><ul><li>Support for an external math coprocessor that performs floating-point operations in hardware, as much as 100 times faster than in software </li></ul>
  53. Intel 8085 architecture: 8-bit data, 16-bit address
  54. Internal architecture of 8086
  55. <ul><li>PC Standard </li></ul><ul><li>A 16-bit data bus requires two 8-bit memory banks → expensive at the time </li></ul><ul><li>In 1979, Intel announced the 8088 microprocessor, identical to the 8086 except for its 8-bit external data bus → two memory accesses are needed to input a word. </li></ul><ul><li>IBM announced the IBM PC, using the 8088 microprocessor and 16 KB of memory (expandable to 64 KB), with a 4.77 MHz clock → the PC standard is defined. </li></ul>
  56. <ul><li>80186 and 80188 Microprocessors </li></ul><ul><li>High-integration CPUs: include an 8086 (or 8088) core plus a clock generator, a programmable timer, an interrupt controller, a DMA controller, etc. </li></ul><ul><li>Instruction set is fully compatible with the 8086 and 8088, but includes 9 new instructions. </li></ul><ul><li>Used in some IBM PC compatibles and many embedded computers. </li></ul>
  57. <ul><li>80286 Microprocessor </li></ul><ul><li>Processor of the IBM PC-AT </li></ul><ul><li>Provides two programming modes </li></ul><ul><ul><li>Real mode - functions exactly like the 8086 - uses only the 20 least significant address lines (max. 1 MB) - faster than the 8086 due to redesign and a higher clock </li></ul></ul><ul><ul><li>Protected mode - 16 new instructions are added - supports a multi-program environment by giving each program a predetermined amount of memory (within a 16 MB physical address space) - programs no longer use physical addresses, but are addressed through a segment selector - several programs can be loaded into memory at the same time, but are protected from each other (*MS-DOS) </li></ul></ul>
  58. The 8086 and 80286 microprocessors.
  59. <ul><li>80386 Microprocessor </li></ul><ul><li>A new standard announced by Intel in 1985, with a commitment that successive processor generations would remain compatible with this chip: Intel Architecture-32 (IA-32), through 2000. </li></ul><ul><li>Data bus & internal registers: 32 bits </li></ul><ul><li>Address bus: 32 bits → max. 4 GB of physical memory </li></ul>
  60. Internal architecture of 80386
  61. Some internal registers of the 80386
  62. <ul><li>The 80386 supports two operating modes (like the 80286): 1) Real Address Mode - used by MS-DOS - in this mode, the 80386 becomes a fast 8086. 2) Protected Virtual Address Mode (Protected Mode) - the on-board MMU manages 4 GB of memory - each task is given a segment of memory governed by a descriptor, which defines the segment base address, the segment limit, and the attributes for the segment (code, data, read-only, etc.) - paging: 4 KB pages can be swapped in and out of memory (using a disk) to provide virtual memory - combined with segmentation, this gives each task a virtual (logical) address space as large as 64 TB. </li></ul>
  63. <ul><li>When operating with 64 KB of cache, the 386 achieves a hit rate of 93% → the processor operates at full speed 93% of the time </li></ul><ul><li>The instruction set of the 386 is 100% compatible with the older processors in the family. </li></ul><ul><li>14 new instructions are added and several others are modified. [Ex] data can be moved between the internal registers at a time. </li></ul><ul><li>80386SX: designed to ease the transition from 16- to 32-bit processors, with a 16-bit external data bus and a 24-bit address bus. </li></ul>
  64. <ul><li>80486 Microprocessor </li></ul><ul><li>Maintains compatibility with the older microprocessors </li></ul><ul><li>Only 6 new instructions are added, to be used by OS software, not by application programs. </li></ul><ul><li>Redesigned using RISC concepts → frequently used instructions execute in a single clock cycle. </li></ul><ul><li>New 5-stage instruction execution pipeline → up to 5 instructions can be in different stages of execution at once. </li></ul><ul><li>On-board 8 KB cache and 80387-compatible coprocessor → about twice as fast as the 386 (20 MHz 486 ≈ 40 MHz 386) </li></ul><ul><li>486SX: excludes the coprocessor; designed for low-end applications that do not require one. </li></ul>
  65. <ul><li>486DX2 and DX4 </li></ul><ul><li>DX2: the internal clock rate is twice the external clock. </li></ul><ul><li>DX4: the internal clock rate is three times the external clock → less expensive components can be used on the system board, while the processor operates at its maximum data rate internally. [Ex] 486DX2-66: 66 MHz (int. clock) & 33 MHz (ext. clock); 486DX4-100: 100 MHz (int. clock) & 33 MHz (ext. clock) </li></ul><ul><li>OverDrive processors: 486 system boards include an OverDrive socket that allows users to upgrade a low-speed 486DX or 486SX with DX2- and DX4-style processors. </li></ul>
  66. <ul><li>Pentium </li></ul><ul><li>Superscalar architecture: provides two instruction execution pipelines, each with its own ALU, address generation circuitry, and data cache interface → executes two different instructions simultaneously </li></ul><ul><li>Additional features: </li></ul><ul><ul><li>on-board cache (separate 8 KB instruction and data caches) and an on-chip coprocessor (floating-point unit) </li></ul></ul><ul><ul><li>8-stage floating-point pipeline (the integer pipelines have 5 stages) </li></ul></ul><ul><ul><li>achieves 5~8 times the floating-point performance of the 486 </li></ul></ul><ul><ul><li>external data bus: 64 bits </li></ul></ul><ul><ul><li>about twice as fast as the 486 </li></ul></ul>
  67. Key features of the Pentium microprocessor. The execution unit has two pipelines allowing two instructions to be executed simultaneously.
  68. <ul><li>MMX (Multimedia Extension): provides 3 architectural enhancements over the non-MMX Pentium </li></ul><ul><li>57 instructions are added for multimedia (audio, video, and graphic data) applications. </li></ul><ul><li>SIMD (single-instruction stream, multiple-data stream) allows the same operation to be performed on multiple data items. Because many multimedia applications manipulate large blocks of data, SIMD provides a significant performance enhancement. </li></ul><ul><li>Internal cache size is increased from 16 KB to 32 KB. </li></ul><ul><li>For general applications, performance improves by 10~20%. </li></ul><ul><li>For multimedia applications, performance improves by nearly 70%. </li></ul>
  69. <ul><li>Socket 7: a ZIF (zero insertion force) socket </li></ul><ul><li>Pentium chip: 296-pin PGA package. A heat sink and fan are mounted atop the chip, and the entire assembly plugs into a ZIF socket known as Socket 7. </li></ul><ul><li>Socket 7 defines a platform that specifies the front-side bus connection to the L2 cache, disk interface, video interface, and the ISA and PCI expansion buses. </li></ul>
  70. Pentium processor with heat sink and fan mated to a Socket 7 connector.
  71. <ul><li>Pentium Pro </li></ul><ul><li>6th-generation processors (Pentium Pro, Pentium II, Pentium III, and Celeron) </li></ul><ul><li>36 address lines → max. 64 GB of memory </li></ul><ul><li>New features: 1. inclusion of the L2 cache in the same package as the processor; 2. new system board platforms: Socket 8 (Pentium Pro), Slot 1 and Slot 2 (Pentium II, III, and Celeron), and Socket 370 (Pentium III and Celeron); 3. new instruction execution architecture based on Dynamic Execution </li></ul><ul><li>Two chips in one package: the Pentium Pro consists of two separate silicon dies, one for the processor and the other for a 256 KB L2 cache. </li></ul>
  72. The Pentium Pro is two chips in one. The larger die is the processor, the smaller a 256K L2 cache. (Courtesy of Intel Corporation.)
  73. <ul><li>Dynamic Execution: a new approach to processing software instructions that reduces idle processor time. </li></ul><ul><li>Multiple Branch Prediction: the Pentium Pro can look as far as 30 instructions ahead to anticipate conditional branches → fewer wasted pipeline clocks. </li></ul><ul><li>Data Flow Analysis: examines upcoming software instructions to find the optimal processing order. </li></ul><ul><li>Speculative Execution: allows instructions to execute in a different order from the one in which they entered the processor (“out-of-order execution”). The results of these instructions are stored as speculative results until their final states can be determined. </li></ul>
  74. Superscalar processor of degree three: the Pentium Pro has three instruction decoders and can execute 3 instructions simultaneously. Internal cache: L2 cache in the same package.
  75. <ul><li>Pentium II </li></ul><ul><li>The Pentium Pro had a short life due to </li></ul><ul><ul><li>the lack of MMX instructions </li></ul></ul><ul><ul><li>the use of an expensive dual- and tri-cavity package </li></ul></ul><ul><li>The Pentium II is a Pentium Pro with MMX technology, repackaged in a new single-edge contact (SEC) cartridge that is inserted in the “Slot 1” connector (242 pins) or the “Slot 2” connector (330 pins) </li></ul><ul><li>Processor and L2 are mounted on a ceramic substrate (the silicon dies are separate) </li></ul><ul><li>Processor clock: 300 ~ 450 MHz; bus clock: 100 MHz </li></ul><ul><li>L1 (32 KB) & L2 (512 KB) with a 64-bit dedicated bus </li></ul>
  76. Exploded view of single-edge contact (SEC) cartridge. (Courtesy of Intel Corporation.)
  77. Installing the SEC cartridge into the retention mechanism. (Courtesy of Intel Corporation.)
  78. <ul><li>Celeron </li></ul><ul><li>A Pentium II without the L2 cache (Pentium II SX?) </li></ul><ul><li>Uses the Slot 1 connector without the plastic cover, the so-called “naked CPU” </li></ul><ul><li>Celeron A: includes a 128 KB L2 cache on the same die as the processor. </li></ul><ul><ul><li>Drawback: 66 MHz bus clock </li></ul></ul><ul><ul><li>370-pin PGA package (called Socket 370) </li></ul></ul>
  79. The Celeron processor is a Pentium II without the L2 cache. Later versions, called the Celeron A, include this cache on the same silicon die as the processor. (Courtesy of Intel Corporation.)
  80. <ul><li>Pentium III </li></ul><ul><li>Higher clock speed: based on the Pentium II core, with a 600 MHz clock and an external bus frequency of 133 MHz </li></ul><ul><li>70 new Streaming SIMD Extensions (SSE) instructions: </li></ul><ul><ul><li>50 to improve floating-point performance </li></ul></ul><ul><ul><li>12 to improve multimedia processing </li></ul></ul><ul><ul><li>8 to improve the efficiency of the L1 cache </li></ul></ul>
  81. The Pentium III microprocessor with integrated L2 cache. This chip has more than 22 million transistors. (Courtesy of Intel Corporation.)
  82. <ul><li>Xeon Processors </li></ul><ul><li>Scalability: as processing demands increase, additional processors can be interconnected to keep pace. This was one of the advantages of the Pentium Pro, which can support up to 4 processors in SMP (symmetric multiprocessing). </li></ul><ul><li>The Pentium II Xeon processor can be scaled to 2, 4, 8, or more processors, and is used for high-end servers and workstations. </li></ul><ul><li>Pentium III Xeon processor: similar, but offers the streaming SIMD technology. </li></ul>
  83. <ul><li>P7 Itanium </li></ul><ul><li>IA-64: 7th-generation processor architecture, code name Merced </li></ul><ul><li>64-bit architecture: 128 64-bit registers & 128 82-bit floating-point registers (including hidden bits) [cf. IA-32: 10 32-bit registers, 8 floating-point registers] </li></ul><ul><li>Explicit parallelism: instructions are packed in 128-bit bundles ready for execution. Each bundle consists of three 41-bit instructions and a 5-bit template. All three instructions are dispatched in parallel. </li></ul>
  84. <ul><li>Speculation: preload data to minimize memory delays when the data is needed </li></ul><ul><li>Predication: when a conditional branch is encountered, Itanium follows both branch paths, then commits only the results of the correct path. </li></ul><ul><li>Data bus: 128 bits </li></ul><ul><li>Address bus: 64 bits → max. 2<sup>64</sup> bytes of memory </li></ul>
  85. <ul><li>80x86 Compatible Microprocessors </li></ul><ul><li>Second sources: companies manufacturing 80x86 chips under license from Intel. </li></ul><ul><li>Clones and look-alikes </li></ul><ul><ul><li>Pin-for-pin replacements with all of the same features as the Intel processor. [Ex] AMD 386DX, 486DX4-100, Cyrix 5x86, etc. </li></ul></ul>
  86. The AMD K7 or Athlon processor. It mates to a new proprietary connector called Slot A. (Courtesy of Advanced Micro Devices.)
  87. <ul><li>Measuring Processor Performance </li></ul><ul><li>Benchmark programs: used to measure the performance of a computer system (system benchmarks) or of a component in that system such as the processor, disk, video card, or main memory (component benchmarks). </li></ul><ul><li>Component-level Benchmarks </li></ul><ul><ul><li>Whetstone: measures the time to execute integer and floating-point arithmetic instructions and “if” statements. It includes a high percentage of floating-point operations → mostly used to represent numerical programs. </li></ul></ul>
  88. <ul><li>Dhrystone: a synthetic benchmark consisting of 12 procedures with 94 statements and no floating-point operations. </li></ul><ul><li>Microprocessor Benchmarks: developed for comparing the processing ability of the various microprocessor chips, e.g. Ziff-Davis’ CPUmark and Intel’s iCOMP index. </li></ul><ul><ul><li>CPUmark: measures the speed of a PC’s processor subsystem, including the CPU, its internal and external caches, and system RAM. [Ex] Fig. 1-20: CPUmark99 ratings for 80x86s </li></ul></ul><ul><ul><li>iCOMP: combines 4 industry-standard benchmarks: CPUmark32, Norton SI32, SPEC95, and the Intel Media Benchmark (audio, video, image, 3-D, etc.). </li></ul></ul>
  89. CPUmark is a benchmark that measures the speed of the processor and its internal cache.
  90. <ul><li>System-level Benchmarks </li></ul><ul><ul><li>Microcomputer benchmarks: measure the speed of the processor while taking into account a slow disk or video subsystem. </li></ul></ul><ul><ul><li>Winstone: a system-level, application-based benchmark that measures a PC’s overall performance when running 32-bit applications on Windows 95, 98, and NT. [Ex] Winstone 98 ratings for 80x86s </li></ul></ul><ul><ul><li>Performance Rating: Cyrix and AMD developed the P-rating (Processor Performance) system, which runs applications on a processor and compares the results to a Pentium microprocessor. </li></ul></ul><ul><ul><li>[Ex] Table 1-2: PR166 ~ 366 for AMD and Cyrix chips </li></ul></ul>
  91. Winstone 98 measures the performance of a PC system running typical Windows applications.
