Pentium processor


Notes on Pentium processors for students of Advanced Microprocessors.

Published in: Engineering

  1. 1. Pentium Processor
  2. 2. Features of Pentium • Introduced in 1993 with clock frequency ranging from 60 to 66 MHz • The primary changes in Pentium Processor were: – Superscalar Architecture – Dynamic Branch Prediction – Pipelined Floating-Point Unit – Separate 8K Code and Data Caches – Writeback MESI Protocol in the Data Cache – 64-Bit Data Bus – Bus Cycle Pipelining
  3. 3. Pentium Architecture
  4. 4. Pentium Architecture • It has a 64-bit data bus and a 32-bit address bus. • There are two separate 8 KB caches: one for code and one for data. • Each cache has its own translation lookaside buffer (TLB), which translates linear addresses to physical addresses. • Code Cache: – 2-way set associative cache – A 256-bit path between the code cache and the prefetch buffers permits prefetching 32 bytes (256/8) of instructions at a time
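The cache figures on this slide fix the cache geometry. A minimal sketch of the arithmetic, assuming a 32-byte cache line (the slide only states the 32-byte prefetch width) and a plain tag/index/offset split of a 32-bit address; the name `split_address` is illustrative only:

```python
# Cache geometry from the slide: 8 KB, 2-way set associative.
CACHE_BYTES = 8 * 1024
WAYS = 2
LINE_BYTES = 256 // 8  # assumption: line size equals the 32-byte prefetch width

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # number of sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # bits selecting a byte in a line
INDEX_BITS = SETS.bit_length() - 1          # bits selecting a set


def split_address(addr):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(SETS, OFFSET_BITS, INDEX_BITS)
print(split_address(0x12345678))
```

With these assumptions the cache has 128 sets, so 5 offset bits and 7 index bits cover address bits 11:0 and the remaining 20 bits form the tag.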
  5. 5. Pentium Architecture • Prefetch Buffers: ▫ Four prefetch buffers within the processor work as two independent pairs. ▪ When instructions are prefetched from the cache, they are placed into one set of prefetch buffers. ▪ The other set is used when a branch is predicted to be taken. ▫ The prefetch buffer sends a pair of instructions to the instruction decoder • Instruction Decode Unit: ▫ Decoding occurs in two stages: Decode1 (D1) and Decode2 (D2) ▫ D1 checks whether instructions can be paired ▫ D2 calculates the addresses of memory-resident operands
  6. 6. Pentium Architecture • Control Unit: ▫ This unit interprets the instruction word and microcode entry point fed to it by the Instruction Decode Unit ▫ It handles exceptions, breakpoints and interrupts. ▫ It controls the integer pipelines and floating-point sequences • Microcode ROM: ▫ Stores microcode sequences • Arithmetic/Logic Units (ALUs): ▫ There are two parallel integer instruction pipelines: the u-pipeline and the v-pipeline ▫ The u-pipeline has a barrel shifter ▫ The two ALUs perform the arithmetic and logical operations specified by the instructions in their respective pipelines
  7. 7. Pentium Registers • The four general-purpose data registers can be accessed as ∗ Four 32-bit registers (EAX, EBX, ECX, EDX) ∗ Four 16-bit registers (AX, BX, CX, DX) ∗ Eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL) • Some registers have special uses ∗ ECX holds the count in loop instructions
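The 16-bit and 8-bit names are views onto parts of the same 32-bit register, not separate storage. A small sketch of that overlap using bit masks; `sub_registers` and `write_al` are hypothetical helper names, not anything from the slides:

```python
def sub_registers(eax):
    """Return the (AX, AH, AL) views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF        # low 16 bits
    ah = (ax >> 8) & 0xFF    # bits 15:8
    al = ax & 0xFF           # bits 7:0
    return ax, ah, al


def write_al(eax, value):
    """Writing AL changes only bits 7:0 of EAX."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


print([hex(v) for v in sub_registers(0x12345678)])
print(hex(write_al(0x12345678, 0xFF)))
```

The same masks apply to EBX, ECX and EDX and their BX/BH/BL, CX/CH/CL and DX/DH/DL views.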
  8. 8. Pentium Registers (Eflags) • Flags never change for any data transfer or program control operation. • Some of the flags are also used to control features found in the microprocessor.
  9. 9. • Flag bits, with a brief description of function. • C (carry) holds the carry after addition or borrow after subtraction. ▫ also indicates error conditions • P (parity) is the count of ones in a number expressed as even or odd. Logic 0 for odd parity; logic 1 for even parity. ▫ if a number contains three binary one bits, it has odd parity ▫ if a number contains no one bits, it has even parity
  10. 10. • A (auxiliary carry) holds the carry (half-carry) after addition or the borrow after subtraction between bit positions 3 and 4 of the result.
  11. 11. • Z (zero) shows that the result of an arithmetic or logic operation is zero. • S (sign) flag holds the arithmetic sign of the result after an arithmetic or logic instruction executes. • T (trap) The trap flag enables trapping through an on-chip debugging feature. • I (interrupt) controls operation of the INTR (interrupt request) input pin. • D (direction) selects increment or decrement mode for the DI and/or SI registers. • O (overflow) occurs when signed numbers are added or subtracted. ▫ an overflow indicates the result has exceeded the capacity of the machine
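The arithmetic flag definitions (C, P, A, Z, S, O) can be checked by hand on a small case. The following is a sketch for an 8-bit addition only, assuming the usual two's-complement rule that overflow occurs when both operands have the same sign and the result has the opposite sign; `add8_flags` is an illustrative name:

```python
def add8_flags(a, b):
    """Add two 8-bit values; return (result, flags) per the slide definitions."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "C": int(total > 0xFF),                        # carry out of bit 7
        "P": int(bin(result).count("1") % 2 == 0),     # 1 = even number of ones
        "A": int((a & 0xF) + (b & 0xF) > 0xF),         # carry from bit 3 to bit 4
        "Z": int(result == 0),                         # result is zero
        "S": result >> 7,                              # copy of the sign bit
        "O": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }
    return result, flags


print(add8_flags(0x7F, 0x01))
print(add8_flags(0xFF, 0x01))
```

Adding 0x7F + 0x01 sets O and S without setting C (a signed overflow), while 0xFF + 0x01 sets C and Z without setting O (an unsigned carry only).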
  12. 12. • IOPL used in protected mode operation to select the privilege level for I/O devices. • NT (nested task) flag indicates the current task is nested within another task in protected mode operation. • RF (resume) used with debugging to control resumption of execution after the next instruction. • VM (virtual mode) flag bit selects virtual mode operation in a protected mode system
  13. 13. • AC (alignment check) flag bit activates if a word or doubleword is addressed on a non-word or non-doubleword boundary. • VIF (virtual interrupt flag) is a copy of the interrupt flag bit available to the Pentium. • VIP (virtual interrupt pending) provides information about a virtual mode interrupt for the Pentium. ▫ used in multitasking environments to provide virtual interrupt flags • ID (identification) flag indicates that the Pentium microprocessor supports the CPUID instruction. ▫ The CPUID instruction provides the system with information about the Pentium microprocessor
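The slides name the flags but not their positions in EFLAGS. The sketch below decodes a 32-bit EFLAGS value using the bit positions from the Intel architecture definition (they are not given in the slides); `decode_eflags` is an illustrative name:

```python
# Bit positions of the single-bit flags in EFLAGS (Intel architecture
# definition; not stated on the slides).
EFLAGS_BITS = {
    "C": 0, "P": 2, "A": 4, "Z": 6, "S": 7, "T": 8, "I": 9, "D": 10,
    "O": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18, "VIF": 19,
    "VIP": 20, "ID": 21,
}


def decode_eflags(value):
    """Return a dict of flag name -> value for a 32-bit EFLAGS image."""
    flags = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> 12) & 0b11   # two-bit I/O privilege level
    return flags


flags = decode_eflags(0x00000246)
print({name: v for name, v in flags.items() if v})
```

For the value 0x246 the set flags are P, Z and I; bit 1 is also set, but it is a reserved bit that always reads as 1.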
  14. 14. Control Registers
  15. 15. • CD (cache disable) controls the internal cache. If CD=1, the cache will not fill with new data. If CD=0, misses will cause the cache to fill with new data. • NW (not write-through) selects the mode of operation for the data cache. If NW=1, the data cache is inhibited from cache write-through. • AM (alignment mask) enables alignment checking when set. Alignment checking occurs only in protected mode. • WP (write protect) protects user-level pages against supervisor-level write operations. When WP=1, the supervisor cannot write to read-only user-level pages; when WP=0, it can. • NE (numeric error) enables standard numeric coprocessor error detection.
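These five bits live in control register CR0. A short sketch that extracts them, assuming the standard CR0 bit positions from the Intel architecture definition (NE = 5, WP = 16, AM = 18, NW = 29, CD = 30), which the slide does not list; the helper names are hypothetical:

```python
# CR0 bit positions (Intel architecture definition; not stated on the slide).
CR0_BITS = {"NE": 5, "WP": 16, "AM": 18, "NW": 29, "CD": 30}


def decode_cr0(value):
    """Return the CR0 control bits discussed on the slide."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}


def cache_fills_enabled(cr0):
    """Per the slide: misses fill the cache only when CD = 0."""
    return decode_cr0(cr0)["CD"] == 0


print(decode_cr0(0x60000010))
print(cache_fills_enabled(0x60000010))
```

The example value 0x60000010 has both CD and NW set, so cache line fills are disabled until software clears CD.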
  16. 16. Pin Diagram
  17. 17. • CLOCK ▫ CLK - Clock (Input)  Fundamental Timing for the Pentium  The CPU uses this signal as the internal processor clock. ▫ BF - Bus Frequency (Input)  Bus Frequency determines the bus-to-core frequency ratio  When BF is strapped to Vcc, the processor will operate at a 2 to 3 bus to core frequency ratio.  When BF is strapped to Vss, the processor will operate at a 1 to 2 bus to core frequency ratio.
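The bus-to-core ratio selected by BF determines the core clock for a given bus clock. A small worked example, assuming a 60 MHz bus clock purely for illustration (the slide does not give one); `core_mhz` is a hypothetical helper:

```python
from fractions import Fraction

# Bus-to-core frequency ratios selected by the BF strap (from the slide).
BUS_TO_CORE = {"Vcc": Fraction(2, 3), "Vss": Fraction(1, 2)}


def core_mhz(bus_mhz, bf_strap):
    """Core clock = bus clock divided by the bus-to-core ratio."""
    return bus_mhz / BUS_TO_CORE[bf_strap]


print(core_mhz(60, "Vcc"))   # 2/3 ratio
print(core_mhz(60, "Vss"))   # 1/2 ratio
```

Under that assumption a 60 MHz bus gives a 90 MHz core with BF at Vcc and a 120 MHz core with BF at Vss.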
  18. 18. • Initialization ▫ RESET - (Input)  Forces the CPU to begin execution at a known state. ▫ INIT - Initialization (Input)  The Pentium processor initialization input pin forces the Pentium processor to begin execution in a known state.  The processor state after INIT is the same as the state after RESET except that the internal caches, write buffers, and floating point registers retain the values they had prior to INIT.
  19. 19. • Address Bus ▫ A31:A3 - ADDRESS bus lines  Output except for cache snooping ▫ The number of address lines determines the amount of memory supported by the processor. ▫ Determines where in the 4GB memory space or 64K IO space the processor is accessing. ▫ These are input lines when AHOLD & EADS# are active for Inquire Cycles (snooping)
  20. 20. • Address Bus ▫ BE7#:BE0#: Byte Enable lines (Outputs) ▫ Byte enables select each of the 8 bytes in the 64-bit data path. ▪ Helps define the physical area of memory or I/O accessed. ▪ The Pentium uses Byte Enables to address locations within a QWORD. ▪ In effect, they are a decode of address lines A2-A0, which the Pentium does not generate. ▪ Which lines go active depends on the address and on whether a byte, word, doubleword or quadword is required.
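Because the byte enables replace address bits A2-A0, they can be derived from the byte address and the transfer size. A sketch under the assumption that the access fits inside one aligned quadword (an access that crosses a quadword boundary would need more than one bus cycle); `byte_enables` is an illustrative name:

```python
def byte_enables(addr, size):
    """Return (quadword address on A31:A3, BE7#:BE0# pin levels as a byte).

    Pin levels are active low: a 0 bit means that byte lane is enabled.
    """
    offset = addr & 0x7                      # the A2-A0 the Pentium does not drive
    if offset + size > 8:
        raise ValueError("access crosses a quadword boundary")
    active = ((1 << size) - 1) << offset     # 1 = byte lane selected
    pins = ~active & 0xFF                    # invert for active-low pins
    return addr & ~0x7, pins


qaddr, pins = byte_enables(0x1002, 2)        # word access at offset 2
print(hex(qaddr), format(pins, "08b"))
```

A word at address 0x1002 drives the quadword address 0x1000 with BE2# and BE3# low, which the sketch shows as the pin pattern 11110011.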
  21. 21. • Address Mask ▫ A20M#: Address 20 Mask (Input) ▪ Emulates the address wraparound at 1 MByte which occurs on the 8086. ▪ When A20M# is asserted, the Pentium processor masks physical address bit 20 (A20) before performing a lookup to the internal caches or driving a memory cycle on the bus. ▪ A20M# must be asserted only when the processor is in real mode. • Internal Parity ▫ IERR# - Internal Error (Output) ▪ Alerts the system to internal parity errors
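The wraparound that A20M# emulates is easy to see numerically. A minimal sketch, assuming the standard real-mode address calculation (segment times 16 plus offset); both function names are illustrative:

```python
def real_mode_address(segment, offset):
    """Real-mode linear address: segment * 16 + offset."""
    return (segment << 4) + offset


def apply_a20m(addr, a20m_asserted):
    """Force address bit 20 to zero while A20M# is asserted."""
    addr &= 0xFFFFFFFF
    if a20m_asserted:
        addr &= ~(1 << 20)
    return addr


addr = real_mode_address(0xFFFF, 0x0010)     # first byte above 1 MByte
print(hex(addr), hex(apply_a20m(addr, True)))
```

Segment FFFFh with offset 0010h produces 100000h; with A20M# asserted the access wraps to address 0, as it would on an 8086 with only 20 address lines.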
  22. 22. • Address Parity ▫ AP Address Parity (I/O)  Bi-directional address parity pin for the address lines.  Address Parity is driven by the Pentium processor with even parity information on all CPU generated cycles in the same clock that the address is driven  Even parity must be driven back to the CPU during inquire cycles on this pin in the same clock as EADS#.  Not supported on all systems ▫ APCHK#: Address Parity Check Signal (Output)  The status of the address parity check is driven on the APCHK# output.  Even Parity Checking
  23. 23. • Data Bus ▫ D63:D0 - Data Lines (I/O) ▪ The bi-directional 64-bit data path to or from the CPU. ▪ The signal W/R# distinguishes direction. ▪ During reads, the CPU samples the data bus when BRDY# is asserted. ▫ DP7:DP0 - Data Parity (I/O) ▪ Bi-directional data parity pins for the data bus. ▪ Even parity check, with one parity pin for each byte of the data bus. ▪ Output on writes, input on reads. ▪ Not supported on all systems.
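Even parity per byte can be sketched in a few lines. This assumes "even parity" means that each data byte together with its parity bit contains an even number of ones; the function names are illustrative, and list element i corresponds to DPi:

```python
def even_parity_bit(byte):
    """Parity bit that makes the total number of ones (byte + bit) even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(qword):
    """Return [DP0, ..., DP7] for a 64-bit value on D63:D0."""
    return [even_parity_bit((qword >> (8 * i)) & 0xFF) for i in range(8)]


print(data_parity(0x0000000000000107))
```

Byte 0 (07h, three ones) and byte 1 (01h, one one) each need a parity bit of 1, while the all-zero bytes need 0. The same per-bit rule would apply to the address parity pin, computed over the address lines instead.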
  24. 24. • Bus Control ▫ ADS# - Address Strobe (output)  Indicates that a new valid bus cycle is currently being driven by the Pentium processor.  The following are some of the signals which are valid when ADS#=0  Addresses (A31:3)  Byte Enables (BE7#:0#)  Bus Cycle definition (M/IO#; D/C#; W/R#, CACHE#)  From power-on the ADS# signal should be asserted periodically when bus cycles are running
  25. 25. • Bus Control (Cont.) ▫ BRDY# - Burst Ready (Input)  Transfer complete indication.  The burst ready input indicates that the external system has presented data on the data pins in response to a read or that the external system has accepted the Pentium processor data in response to a write request.  This signal ends the current bus cycle and is used to extend bus cycles to allow slow devices extra time.  If LOW (non-burst cycles), this signal ends the current bus cycle and the next bus cycle can begin.  If HIGH the Pentium is prevented from continuing processing and wait states are added.
  26. 26. • Bus Cycle Definition ▫ M/IO# - Memory or Input/Output (output)  M/IO# distinguishes between Memory and I/O cycles.  The memory/input-output is one of the primary bus cycle definition pins.  1 = Memory Cycle  0 = Input/Output Cycle  It is driven valid in the same clock as the ADS# signal is asserted.
  27. 27. • Bus Cycle Definition (Cont.) ▫ D/C# - Data or Code (output)  D/C# distinguishes between data and code or special cycles (control)  The data/code output is one of the primary bus cycle definition pins.  1 = Data  0 = Code / Control »Control for Interrupt Acknowledge or Special Cycles  It is driven valid in the same clock as the ADS# signal is asserted.
  28. 28. • Bus Cycle Definition (Cont.) ▫ W/R# - Write or Read (output) W/R# distinguishes between Write and Read cycles. Write/read is one of the primary bus cycle definition pins.  1 = Write  0 = Read It is driven valid in the same clock as the ADS# signal is asserted.
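Taken together, M/IO#, D/C# and W/R# encode eight bus cycle types. The lookup below is a sketch of that decode; the six ordinary encodings follow directly from the slide definitions, while the assignment of the two control encodings (interrupt acknowledge versus special cycle) and the reserved encoding reflects the Pentium data sheet table as best recalled and should be checked against it:

```python
# Key: (M/IO#, D/C#, W/R#) pin levels sampled while ADS# is asserted.
BUS_CYCLES = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "special cycle",
    (0, 1, 0): "I/O read",
    (0, 1, 1): "I/O write",
    (1, 0, 0): "code read",
    (1, 0, 1): "reserved",
    (1, 1, 0): "memory read",
    (1, 1, 1): "memory write",
}


def decode_bus_cycle(m_io, d_c, w_r):
    """Name the bus cycle selected by the three definition pins."""
    return BUS_CYCLES[(m_io, d_c, w_r)]


print(decode_bus_cycle(1, 0, 0))
print(decode_bus_cycle(0, 1, 1))
```

External logic would latch these three pins in the clock where ADS# is asserted, since that is when the slides say they are driven valid.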
  29. 29. • Bus Cycle Definition (Cont.) ▫ CACHE# - Cacheability (output) ▪ Processor indication of internal cacheability. ▪ The L1 cache must be enabled using the CD bit in CR0 for CACHE# to be asserted low. ▪ The CACHE# signal could also be described as the burst indication signal, because CACHE# (qualified with KEN#) results in a burst mode transfer of 32 bytes of code or data. ▪ CACHE# and KEN# are used together to determine whether a read will be turned into a line fill (burst cycle). ▪ During write-back cycles, the CPU asserts the CACHE# signal (KEN# does not have to be asserted)
  30. 30. • Bus Cycle Definition (Cont.) ▫ NA# - Next Address (Input)  Indicates external memory is prepared for a pipeline cycle.  An active next address input indicates that the external memory system is ready to accept a new bus cycle although all data transfers for the current cycle have not yet completed.  When NA# is asserted, the Pentium supplies the address for the start of the next transfer early, so that the memory system can latch the new address before the transfer is ready to start.  A detailed discussion of Address Pipelining is beyond the scope of this course.
  31. 31. • Bus Cycle Definition (Cont.) ▫ LOCK# - Bus Lock (Output) ▪ The bus lock pin indicates that the current bus cycle is locked, typically for a read-modify-write operation. ▪ The CPU will not allow a bus hold when LOCK# is asserted. ▪ Locked cycles are generated when the programmer prefixes certain instructions with the LOCK prefix. ▪ e.g. LOCK INC DWORD PTR [EDI] ;Increment a memory location ▪ Locked cycles are generated automatically for certain bus transfer operations: ▪ Interrupt Acknowledge cycles ▪ The XCHG instruction when one operand is memory-based. ▪ See the Pentium manual for more details.
  32. 32. • Cache Control ▫ KEN# - Cache Enable (Input) ▪ Indicates to the Pentium whether or not the system can support a cache line fill for the current cycle. ▪ CACHE# and KEN# are used together to determine whether a read will be turned into a line fill (burst cycle). ▫ WB/WT# - Write-back/Write-through (Input) ▪ This pin allows a cache line to be defined as write-back or write-through on a line-by-line basis.
  33. 33. • Bus Arbitration ▫ HOLD - Bus Hold (Input)  Allows another bus master complete control of the CPU bus.  In response to the bus hold request, the Pentium processor will float most of its output and input/output pins and assert HLDA after completing all outstanding bus cycles.  The Pentium processor will maintain its bus in this state until HOLD is de-asserted. ▫ HLDA - Bus Hold Acknowledge (Output)  External indication that the Pentium™ outputs are floated.
  34. 34. • Bus Arbitration (Cont.) ▫ BOFF# - Backoff (Input) ▪ Forces the Pentium to get off the bus in the next clock. ▪ After BOFF# is removed, the Pentium restarts the bus cycle. ▫ BREQ - Bus Request (output) ▪ Indicates externally when a bus cycle is pending internally. ▪ Used to inform the arbitration logic that the Pentium needs control of the bus to perform a bus cycle.
  35. 35. • Interrupts ▫ INTR - Maskable Interrupt (Input)  Indicates that an external interrupt has been generated.  If the IF(Interrupt Enable Flag) bit in the EFLAGS register is set, the Pentium processor will generate two locked interrupt acknowledge bus cycles (to get type number) and vectors to an interrupt handler after the current instruction execution is completed. ▫ NMI - Non-Maskable Interrupt (Input)  Indicates that an external non maskable interrupt has been generated.  The Pentium processor will vector to a Type 2 interrupt handler after the current instruction execution is completed
  36. 36. • Probe Mode ▫ R/S# - Resume/Stop [Run/Scan] (Input) ▪ The run/stop input is an asynchronous, edge-sensitive interrupt used to stop the normal execution of the processor and place it into an idle state. ▫ PRDY - Probe Ready (Output) ▪ The probe ready output pin indicates that the processor has stopped normal execution in response to the R/S# pin going active. The CPU enters Probe Mode.
  37. 37. What is Superscalar? • Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently • Equally applicable to RISC & CISC • In practice usually RISC
  38. 38. General Superscalar Organization
  39. 39. Superpipelined • Many pipeline stages need less than half a clock cycle • Doubling the internal clock speed gets two tasks done per external clock cycle • Superscalar allows parallel fetch and execute
  40. 40. Superscalar v Superpipeline
  41. 41. Limitations • Instruction level parallelism • Compiler based optimisation • Hardware techniques • Limited by ▫ True data dependency ▫ Procedural dependency ▫ Resource conflicts ▫ Output dependency ▫ Antidependency
  42. 42. True Data Dependency • ADD r1, r2 (r1 := r1+r2;) • MOVE r3,r1 (r3 := r1;) • Can fetch and decode second instruction in parallel with first • Can NOT execute second instruction until first is finished
  43. 43. Procedural Dependency • Can not execute instructions after a branch in parallel with instructions before a branch • Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed • This prevents simultaneous fetches
  44. 44. Resource Conflict • Two or more instructions requiring access to the same resource at the same time ▫ e.g. two arithmetic instructions • Can duplicate resources ▫ e.g. have two arithmetic units
  45. 45. Output Dependency • Write-write dependency ▫ R3:=R3 + R5; (I1) ▫ R4:=R3 + 1; (I2) ▫ R3:=R5 + 1; (I3) ▫ R7:=R3 + R4; (I4) • I2 cannot execute before I1 because it needs the R3 value that I1 produces (a true dependency); likewise, I4 must wait for I3. • I1 and I3 both write R3. If I3 completes before I1, R3 ends up holding I1's result and I4 fetches the wrong value. This write-write conflict is the output dependency.
  46. 46. Antidependency • Write-after-read (read-write) dependency ▫ R3:=R3 + R5; (I1) ▫ R4:=R3 + 1; (I2) ▫ R3:=R5 + 1; (I3) ▫ R7:=R3 + R4; (I4) ▫ I3 cannot complete before I2 starts, because I2 needs the old value in R3 and I3 changes R3
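The three register dependencies in the I1 to I4 sequence can be found mechanically. The sketch below is one simple way to do it, not a description of any particular processor's hardware: each instruction is modelled as a hypothetical `(name, destination, sources)` tuple, and the labels RAW, WAW and WAR stand for true, output and anti dependency respectively:

```python
# The four-instruction sequence from the slides: (name, dest, sources).
PROGRAM = [
    ("I1", "R3", ("R3", "R5")),
    ("I2", "R4", ("R3",)),
    ("I3", "R3", ("R5",)),
    ("I4", "R7", ("R3", "R4")),
]


def find_dependencies(program):
    """Classify register dependencies as (earlier, later, register) triples."""
    deps = {"RAW": set(), "WAR": set(), "WAW": set()}
    for j, (later, dest, sources) in enumerate(program):
        # True dependency: a source was produced by the most recent writer.
        for reg in sources:
            for earlier, e_dest, _ in reversed(program[:j]):
                if e_dest == reg:
                    deps["RAW"].add((earlier, later, reg))
                    break
        # Walk back to the previous writer of dest, noting readers on the way.
        for earlier, e_dest, e_sources in reversed(program[:j]):
            if dest in e_sources:
                deps["WAR"].add((earlier, later, dest))   # antidependency
            if e_dest == dest:
                deps["WAW"].add((earlier, later, dest))   # output dependency
                break
    return deps


for kind, found in find_dependencies(PROGRAM).items():
    print(kind, sorted(found))
```

For this sequence the sketch reports true dependencies I1 to I2, I3 to I4 and I2 to I4, an output dependency between I1 and I3, and antidependencies from I1 and I2 to I3, all but the last pair through R3.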
  47. 47. Design Issues • Instruction level parallelism ▫ Instructions in a sequence are independent ▫ Execution can be overlapped ▫ Governed by data and procedural dependency • Machine Parallelism ▫ Ability to take advantage of instruction level parallelism ▫ Governed by number of parallel pipelines
  48. 48. Instruction Issue Policy • Order in which instructions are fetched • Order in which instructions are executed • Order in which instructions change registers and memory
  49. 49. In-Order Issue In-Order Completion • Issue instructions in the order they occur • Not very efficient • May fetch >1 instruction • Instructions must stall if necessary
  50. 50. In-Order Issue Out-of-Order Completion • If an instruction is independent of the current instruction, it is allowed to execute before the current instruction completes
  51. 51. Out-of-Order Issue Out-of-Order Completion • Decouple decode pipeline from execution pipeline • Can continue to fetch and decode until this pipeline is full • When a functional unit becomes available an instruction can be executed • Since instructions have been decoded, processor can look ahead
  52. 52. Register Renaming • Output and antidependencies occur because register contents may not reflect the correct ordering from the program • May result in a pipeline stall • Registers allocated dynamically ▫ i.e. registers are not specifically named
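Renaming can be illustrated on the R3/R4/R5/R7 sequence used for the dependency slides. This is a minimal sketch, assuming an unlimited supply of physical registers named P0, P1, ... (a naming chosen here for illustration) and a fresh one allocated for every destination:

```python
# (name, dest, sources) for the sequence used on the dependency slides.
PROGRAM = [
    ("I1", "R3", ("R3", "R5")),
    ("I2", "R4", ("R3",)),
    ("I3", "R3", ("R5",)),
    ("I4", "R7", ("R3", "R4")),
]


def rename(program):
    """Give every destination a fresh physical register."""
    mapping = {}          # architectural register -> latest physical register
    renamed = []
    for count, (name, dest, sources) in enumerate(program):
        # Sources read the latest mapping; unmapped registers keep their name.
        new_sources = tuple(mapping.get(reg, reg) for reg in sources)
        new_dest = f"P{count}"
        mapping[dest] = new_dest
        renamed.append((name, new_dest, new_sources))
    return renamed


for instruction in rename(PROGRAM):
    print(instruction)
```

After renaming, I1 writes P0 and I3 writes P2, so they no longer compete for R3: the output dependency and antidependency disappear, and only the true dependencies (I2 on P0, I4 on P2 and P1) remain.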
  53. 53. Superscalar Execution
  54. 54. Superscalar Implementation • Simultaneously fetch multiple instructions • Logic to determine true dependencies involving register values • Mechanisms to communicate these values • Mechanisms to initiate multiple instructions in parallel • Resources for parallel execution of multiple instructions • Mechanisms for committing process state in correct order
  55. 55. Programmers model
  56. 56. Data Transfer Instructions • Move data between memory and the general-purpose and segment registers. • Also perform operations such as conditional moves, stack access, and data conversion