William Stallings  Computer Organization  and Architecture  7 th  Edition Chapter 2 Computer Evolution and Performance
ENIAC - background <ul><li>Electronic Numerical Integrator And Computer </li></ul><ul><li>Eckert and Mauchly </li></ul><ul...
ENIAC - details <ul><li>Decimal (not binary) </li></ul><ul><li>20 accumulators of 10 digits </li></ul><ul><li>Programmed m...
von Neumann/Turing <ul><li>Stored Program concept </li></ul><ul><li>Main memory storing programs and data </li></ul><ul><l...
Structure of von Neumann machine
IAS - details <ul><li>1000 x 40 bit words </li></ul><ul><ul><li>Binary number </li></ul></ul><ul><ul><li>2 x 20 bit instru...
Structure of IAS –  detail
Commercial Computers <ul><li>1947 - Eckert-Mauchly Computer Corporation </li></ul><ul><li>UNIVAC I (Universal Automatic Co...
IBM <ul><li>Punched-card processing equipment </li></ul><ul><li>1953 - the 701 </li></ul><ul><ul><li>IBM’s first stored pr...
Transistors <ul><li>Replaced vacuum tubes </li></ul><ul><li>Smaller </li></ul><ul><li>Cheaper </li></ul><ul><li>Less heat ...
Transistor Based Computers <ul><li>Second generation machines </li></ul><ul><li>NCR & RCA produced small transistor machin...
Microelectronics <ul><li>Literally - “small electronics” </li></ul><ul><li>A computer is made up of gates, memory cells an...
Generations of Computer <ul><li>Vacuum tube - 1946-1957 </li></ul><ul><li>Transistor - 1958-1964 </li></ul><ul><li>Small s...
Moore’s Law <ul><li>Increased density of components on chip </li></ul><ul><li>Gordon Moore – co-founder of Intel </li></ul...
Growth in CPU Transistor Count
IBM 360 series <ul><li>1964 </li></ul><ul><li>Replaced (& not compatible with) 7000 series </li></ul><ul><li>First planned...
DEC PDP-8 <ul><li>1964 </li></ul><ul><li>First minicomputer (after miniskirt!) </li></ul><ul><li>Did not need air conditio...
DEC - PDP-8 Bus Structure
Semiconductor Memory <ul><li>1970 </li></ul><ul><li>Fairchild </li></ul><ul><li>Size of a single core </li></ul><ul><ul><l...
Intel <ul><li>1971 - 4004  </li></ul><ul><ul><li>First microprocessor </li></ul></ul><ul><ul><li>All CPU components on a s...
Speeding it up <ul><li>Pipelining </li></ul><ul><li>On board cache </li></ul><ul><li>On board L1 & L2 cache </li></ul><ul>...
Performance Balance <ul><li>Processor speed increased </li></ul><ul><li>Memory capacity increased </li></ul><ul><li>Memory...
Login and Memory Performance Gap
Solutions <ul><li>Increase number of bits retrieved at one time </li></ul><ul><ul><li>Make DRAM “wider” rather than “deepe...
I/O Devices <ul><li>Peripherals with intensive I/O demands </li></ul><ul><li>Large data throughput demands </li></ul><ul><...
Typical I/O Device Data Rates
Key is Balance <ul><li>Processor components </li></ul><ul><li>Main memory </li></ul><ul><li>I/O devices </li></ul><ul><li>...
Improvements in Chip Organization and Architecture <ul><li>Increase hardware speed of processor </li></ul><ul><ul><li>Fund...
Problems with Clock Speed and Login Density <ul><li>Power </li></ul><ul><ul><li>Power density increases with density of lo...
Intel Microprocessor Performance
Increased Cache Capacity <ul><li>Typically two or three levels of cache between processor and main memory </li></ul><ul><l...
More Complex Execution Logic <ul><li>Enable parallel execution of instructions </li></ul><ul><li>Pipeline works like assem...
Diminishing Returns <ul><li>Internal organization of processors complex </li></ul><ul><ul><li>Can get a great deal of para...
New Approach – Multiple Cores <ul><li>Multiple processors on single chip </li></ul><ul><ul><li>Large shared cache </li></u...
POWER4 Chip Organization
Pentium Evolution (1) <ul><li>8080 </li></ul><ul><ul><li>first general purpose microprocessor </li></ul></ul><ul><ul><li>8...
Pentium Evolution (2) <ul><li>80486 </li></ul><ul><ul><li>sophisticated powerful cache and instruction pipelining </li></u...
Pentium Evolution (3) <ul><li>Pentium II </li></ul><ul><ul><li>MMX technology </li></ul></ul><ul><ul><li>graphics, video &...
PowerPC <ul><li>1975, 801 minicomputer project (IBM) RISC  </li></ul><ul><li>Berkeley RISC I processor </li></ul><ul><li>1...
PowerPC Family (1) <ul><li>601: </li></ul><ul><ul><li>Quickly to market. 32-bit machine </li></ul></ul><ul><li>603: </li><...
PowerPC Family (2) <ul><li>740/750: </li></ul><ul><ul><li>Also known as G3 </li></ul></ul><ul><ul><li>Two levels of cache ...
Internet Resources <ul><li>http://www.intel.com/  </li></ul><ul><ul><li>Search for the Intel Museum </li></ul></ul><ul><li...
Upcoming SlideShare
Loading in...5
×

02 Computer Evolution And Performance

4,651

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
4,651
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
119
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

02 Computer Evolution And Performance

  1. 1. William Stallings Computer Organization and Architecture 7 th Edition Chapter 2 Computer Evolution and Performance
  2. 2. ENIAC - background <ul><li>Electronic Numerical Integrator And Computer </li></ul><ul><li>Eckert and Mauchly </li></ul><ul><li>University of Pennsylvania </li></ul><ul><li>Trajectory tables for weapons </li></ul><ul><li>Started 1943 </li></ul><ul><li>Finished 1946 </li></ul><ul><ul><li>Too late for war effort </li></ul></ul><ul><li>Used until 1955 </li></ul>
  3. 3. ENIAC - details <ul><li>Decimal (not binary) </li></ul><ul><li>20 accumulators of 10 digits </li></ul><ul><li>Programmed manually by switches </li></ul><ul><li>18,000 vacuum tubes </li></ul><ul><li>30 tons </li></ul><ul><li>15,000 square feet </li></ul><ul><li>140 kW power consumption </li></ul><ul><li>5,000 additions per second </li></ul>
  4. 4. von Neumann/Turing <ul><li>Stored Program concept </li></ul><ul><li>Main memory storing programs and data </li></ul><ul><li>ALU operating on binary data </li></ul><ul><li>Control unit interpreting instructions from memory and executing </li></ul><ul><li>Input and output equipment operated by control unit </li></ul><ul><li>Princeton Institute for Advanced Studies </li></ul><ul><ul><li>IAS </li></ul></ul><ul><li>Completed 1952 </li></ul>
  5. 5. Structure of von Neumann machine
  6. 6. IAS - details <ul><li>1000 x 40 bit words </li></ul><ul><ul><li>Binary number </li></ul></ul><ul><ul><li>2 x 20 bit instructions </li></ul></ul><ul><li>Set of registers (storage in CPU) </li></ul><ul><ul><li>Memory Buffer Register </li></ul></ul><ul><ul><li>Memory Address Register </li></ul></ul><ul><ul><li>Instruction Register </li></ul></ul><ul><ul><li>Instruction Buffer Register </li></ul></ul><ul><ul><li>Program Counter </li></ul></ul><ul><ul><li>Accumulator </li></ul></ul><ul><ul><li>Multiplier Quotient </li></ul></ul>
  7. 7. Structure of IAS – detail
  8. 8. Commercial Computers <ul><li>1947 - Eckert-Mauchly Computer Corporation </li></ul><ul><li>UNIVAC I (Universal Automatic Computer) </li></ul><ul><li>US Bureau of Census 1950 calculations </li></ul><ul><li>Became part of Sperry-Rand Corporation </li></ul><ul><li>Late 1950s - UNIVAC II </li></ul><ul><ul><li>Faster </li></ul></ul><ul><ul><li>More memory </li></ul></ul>
  9. 9. IBM <ul><li>Punched-card processing equipment </li></ul><ul><li>1953 - the 701 </li></ul><ul><ul><li>IBM’s first stored program computer </li></ul></ul><ul><ul><li>Scientific calculations </li></ul></ul><ul><li>1955 - the 702 </li></ul><ul><ul><li>Business applications </li></ul></ul><ul><li>Lead to 700/7000 series </li></ul>
  10. 10. Transistors <ul><li>Replaced vacuum tubes </li></ul><ul><li>Smaller </li></ul><ul><li>Cheaper </li></ul><ul><li>Less heat dissipation </li></ul><ul><li>Solid State device </li></ul><ul><li>Made from Silicon (Sand) </li></ul><ul><li>Invented 1947 at Bell Labs </li></ul><ul><li>William Shockley et al. </li></ul>
  11. 11. Transistor Based Computers <ul><li>Second generation machines </li></ul><ul><li>NCR & RCA produced small transistor machines </li></ul><ul><li>IBM 7000 </li></ul><ul><li>DEC - 1957 </li></ul><ul><ul><li>Produced PDP-1 </li></ul></ul>
  12. 12. Microelectronics <ul><li>Literally - “small electronics” </li></ul><ul><li>A computer is made up of gates, memory cells and interconnections </li></ul><ul><li>These can be manufactured on a semiconductor </li></ul><ul><li>e.g. silicon wafer </li></ul>
  13. 13. Generations of Computer <ul><li>Vacuum tube - 1946-1957 </li></ul><ul><li>Transistor - 1958-1964 </li></ul><ul><li>Small scale integration - 1965 on </li></ul><ul><ul><li>Up to 100 devices on a chip </li></ul></ul><ul><li>Medium scale integration - to 1971 </li></ul><ul><ul><li>100-3,000 devices on a chip </li></ul></ul><ul><li>Large scale integration - 1971-1977 </li></ul><ul><ul><li>3,000 - 100,000 devices on a chip </li></ul></ul><ul><li>Very large scale integration - 1978 -1991 </li></ul><ul><ul><li>100,000 - 100,000,000 devices on a chip </li></ul></ul><ul><li>Ultra large scale integration – 1991 - </li></ul><ul><ul><li>Over 100,000,000 devices on a chip </li></ul></ul>
  14. 14. Moore’s Law <ul><li>Increased density of components on chip </li></ul><ul><li>Gordon Moore – co-founder of Intel </li></ul><ul><li>Number of transistors on a chip will double every year </li></ul><ul><li>Since 1970’s development has slowed a little </li></ul><ul><ul><li>Number of transistors doubles every 18 months </li></ul></ul><ul><li>Cost of a chip has remained almost unchanged </li></ul><ul><li>Higher packing density means shorter electrical paths, giving higher performance </li></ul><ul><li>Smaller size gives increased flexibility </li></ul><ul><li>Reduced power and cooling requirements </li></ul><ul><li>Fewer interconnections increases reliability </li></ul>
  15. 15. Growth in CPU Transistor Count
  16. 16. IBM 360 series <ul><li>1964 </li></ul><ul><li>Replaced (& not compatible with) 7000 series </li></ul><ul><li>First planned “family” of computers </li></ul><ul><ul><li>Similar or identical instruction sets </li></ul></ul><ul><ul><li>Similar or identical O/S </li></ul></ul><ul><ul><li>Increasing speed </li></ul></ul><ul><ul><li>Increasing number of I/O ports (i.e. more terminals) </li></ul></ul><ul><ul><li>Increased memory size </li></ul></ul><ul><ul><li>Increased cost </li></ul></ul><ul><li>Multiplexed switch structure </li></ul>
  17. 17. DEC PDP-8 <ul><li>1964 </li></ul><ul><li>First minicomputer (after miniskirt!) </li></ul><ul><li>Did not need air conditioned room </li></ul><ul><li>Small enough to sit on a lab bench </li></ul><ul><li>$16,000 </li></ul><ul><ul><li>$100k+ for IBM 360 </li></ul></ul><ul><li>Embedded applications & OEM </li></ul><ul><li>BUS STRUCTURE </li></ul>
  18. 18. DEC - PDP-8 Bus Structure
  19. 19. Semiconductor Memory <ul><li>1970 </li></ul><ul><li>Fairchild </li></ul><ul><li>Size of a single core </li></ul><ul><ul><li>i.e. 1 bit of magnetic core storage </li></ul></ul><ul><li>Holds 256 bits </li></ul><ul><li>Non-destructive read </li></ul><ul><li>Much faster than core </li></ul><ul><li>Capacity approximately doubles each year </li></ul>
  20. 20. Intel <ul><li>1971 - 4004 </li></ul><ul><ul><li>First microprocessor </li></ul></ul><ul><ul><li>All CPU components on a single chip </li></ul></ul><ul><ul><li>4 bit </li></ul></ul><ul><li>Followed in 1972 by 8008 </li></ul><ul><ul><li>8 bit </li></ul></ul><ul><ul><li>Both designed for specific applications </li></ul></ul><ul><li>1974 - 8080 </li></ul><ul><ul><li>Intel’s first general purpose microprocessor </li></ul></ul>
  21. 21. Speeding it up <ul><li>Pipelining </li></ul><ul><li>On board cache </li></ul><ul><li>On board L1 & L2 cache </li></ul><ul><li>Branch prediction </li></ul><ul><li>Data flow analysis </li></ul><ul><li>Speculative execution </li></ul>
  22. 22. Performance Balance <ul><li>Processor speed increased </li></ul><ul><li>Memory capacity increased </li></ul><ul><li>Memory speed lags behind processor speed </li></ul>
  23. 23. Login and Memory Performance Gap
  24. 24. Solutions <ul><li>Increase number of bits retrieved at one time </li></ul><ul><ul><li>Make DRAM “wider” rather than “deeper” </li></ul></ul><ul><li>Change DRAM interface </li></ul><ul><ul><li>Cache </li></ul></ul><ul><li>Reduce frequency of memory access </li></ul><ul><ul><li>More complex cache and cache on chip </li></ul></ul><ul><li>Increase interconnection bandwidth </li></ul><ul><ul><li>High speed buses </li></ul></ul><ul><ul><li>Hierarchy of buses </li></ul></ul>
  25. 25. I/O Devices <ul><li>Peripherals with intensive I/O demands </li></ul><ul><li>Large data throughput demands </li></ul><ul><li>Processors can handle this </li></ul><ul><li>Problem moving data </li></ul><ul><li>Solutions: </li></ul><ul><ul><li>Caching </li></ul></ul><ul><ul><li>Buffering </li></ul></ul><ul><ul><li>Higher-speed interconnection buses </li></ul></ul><ul><ul><li>More elaborate bus structures </li></ul></ul><ul><ul><li>Multiple-processor configurations </li></ul></ul>
  26. 26. Typical I/O Device Data Rates
  27. 27. Key is Balance <ul><li>Processor components </li></ul><ul><li>Main memory </li></ul><ul><li>I/O devices </li></ul><ul><li>Interconnection structures </li></ul>
  28. 28. Improvements in Chip Organization and Architecture <ul><li>Increase hardware speed of processor </li></ul><ul><ul><li>Fundamentally due to shrinking logic gate size </li></ul></ul><ul><ul><ul><li>More gates, packed more tightly, increasing clock rate </li></ul></ul></ul><ul><ul><ul><li>Propagation time for signals reduced </li></ul></ul></ul><ul><li>Increase size and speed of caches </li></ul><ul><ul><li>Dedicating part of processor chip </li></ul></ul><ul><ul><ul><li>Cache access times drop significantly </li></ul></ul></ul><ul><li>Change processor organization and architecture </li></ul><ul><ul><li>Increase effective speed of execution </li></ul></ul><ul><ul><li>Parallelism </li></ul></ul>
  29. 29. Problems with Clock Speed and Login Density <ul><li>Power </li></ul><ul><ul><li>Power density increases with density of logic and clock speed </li></ul></ul><ul><ul><li>Dissipating heat </li></ul></ul><ul><li>RC delay </li></ul><ul><ul><li>Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them </li></ul></ul><ul><ul><li>Delay increases as RC product increases </li></ul></ul><ul><ul><li>Wire interconnects thinner, increasing resistance </li></ul></ul><ul><ul><li>Wires closer together, increasing capacitance </li></ul></ul><ul><li>Memory latency </li></ul><ul><ul><li>Memory speeds lag processor speeds </li></ul></ul><ul><li>Solution: </li></ul><ul><ul><li>More emphasis on organizational and architectural approaches </li></ul></ul>
  30. 30. Intel Microprocessor Performance
  31. 31. Increased Cache Capacity <ul><li>Typically two or three levels of cache between processor and main memory </li></ul><ul><li>Chip density increased </li></ul><ul><ul><li>More cache memory on chip </li></ul></ul><ul><ul><ul><li>Faster cache access </li></ul></ul></ul><ul><li>Pentium chip devoted about 10% of chip area to cache </li></ul><ul><li>Pentium 4 devotes about 50% </li></ul>
  32. 32. More Complex Execution Logic <ul><li>Enable parallel execution of instructions </li></ul><ul><li>Pipeline works like assembly line </li></ul><ul><ul><li>Different stages of execution of different instructions at same time along pipeline </li></ul></ul><ul><li>Superscalar allows multiple pipelines within single processor </li></ul><ul><ul><li>Instructions that do not depend on one another can be executed in parallel </li></ul></ul>
  33. 33. Diminishing Returns <ul><li>Internal organization of processors complex </li></ul><ul><ul><li>Can get a great deal of parallelism </li></ul></ul><ul><ul><li>Further significant increases likely to be relatively modest </li></ul></ul><ul><li>Benefits from cache are reaching limit </li></ul><ul><li>Increasing clock rate runs into power dissipation problem </li></ul><ul><ul><li>Some fundamental physical limits are being reached </li></ul></ul>
  34. 34. New Approach – Multiple Cores <ul><li>Multiple processors on single chip </li></ul><ul><ul><li>Large shared cache </li></ul></ul><ul><li>Within a processor, increase in performance proportional to square root of increase in complexity </li></ul><ul><li>If software can use multiple processors, doubling number of processors almost doubles performance </li></ul><ul><li>So, use two simpler processors on the chip rather than one more complex processor </li></ul><ul><li>With two processors, larger caches are justified </li></ul><ul><ul><li>Power consumption of memory logic less than processing logic </li></ul></ul><ul><li>Example: IBM POWER4 </li></ul><ul><ul><li>Two cores based on PowerPC </li></ul></ul>
  35. 35. POWER4 Chip Organization
  36. 36. Pentium Evolution (1) <ul><li>8080 </li></ul><ul><ul><li>first general purpose microprocessor </li></ul></ul><ul><ul><li>8 bit data path </li></ul></ul><ul><ul><li>Used in first personal computer – Altair </li></ul></ul><ul><li>8086 </li></ul><ul><ul><li>much more powerful </li></ul></ul><ul><ul><li>16 bit </li></ul></ul><ul><ul><li>instruction cache, prefetch few instructions </li></ul></ul><ul><ul><li>8088 (8 bit external bus) used in first IBM PC </li></ul></ul><ul><li>80286 </li></ul><ul><ul><li>16 Mbyte memory addressable </li></ul></ul><ul><ul><li>up from 1Mb </li></ul></ul><ul><li>80386 </li></ul><ul><ul><li>32 bit </li></ul></ul><ul><ul><li>Support for multitasking </li></ul></ul>
  37. 37. Pentium Evolution (2) <ul><li>80486 </li></ul><ul><ul><li>sophisticated powerful cache and instruction pipelining </li></ul></ul><ul><ul><li>built in maths co-processor </li></ul></ul><ul><li>Pentium </li></ul><ul><ul><li>Superscalar </li></ul></ul><ul><ul><li>Multiple instructions executed in parallel </li></ul></ul><ul><li>Pentium Pro </li></ul><ul><ul><li>Increased superscalar organization </li></ul></ul><ul><ul><li>Aggressive register renaming </li></ul></ul><ul><ul><li>branch prediction </li></ul></ul><ul><ul><li>data flow analysis </li></ul></ul><ul><ul><li>speculative execution </li></ul></ul>
  38. 38. Pentium Evolution (3) <ul><li>Pentium II </li></ul><ul><ul><li>MMX technology </li></ul></ul><ul><ul><li>graphics, video & audio processing </li></ul></ul><ul><li>Pentium III </li></ul><ul><ul><li>Additional floating point instructions for 3D graphics </li></ul></ul><ul><li>Pentium 4 </li></ul><ul><ul><li>Note Arabic rather than Roman numerals </li></ul></ul><ul><ul><li>Further floating point and multimedia enhancements </li></ul></ul><ul><li>Itanium </li></ul><ul><ul><li>64 bit </li></ul></ul><ul><ul><li>see chapter 15 </li></ul></ul><ul><li>Itanium 2 </li></ul><ul><ul><li>Hardware enhancements to increase speed </li></ul></ul><ul><li>See Intel web pages for detailed information on processors </li></ul>
  39. 39. PowerPC <ul><li>1975, 801 minicomputer project (IBM) RISC </li></ul><ul><li>Berkeley RISC I processor </li></ul><ul><li>1986, IBM commercial RISC workstation product, RT PC. </li></ul><ul><ul><li>Not commercial success </li></ul></ul><ul><ul><li>Many rivals with comparable or better performance </li></ul></ul><ul><li>1990, IBM RISC System/6000 </li></ul><ul><ul><li>RISC-like superscalar machine </li></ul></ul><ul><ul><li>POWER architecture </li></ul></ul><ul><li>IBM alliance with Motorola (68000 microprocessors), and Apple, (used 68000 in Macintosh) </li></ul><ul><li>Result is PowerPC architecture </li></ul><ul><ul><li>Derived from the POWER architecture </li></ul></ul><ul><ul><li>Superscalar RISC </li></ul></ul><ul><ul><li>Apple Macintosh </li></ul></ul><ul><ul><li>Embedded chip applications </li></ul></ul>
  40. 40. PowerPC Family (1) <ul><li>601: </li></ul><ul><ul><li>Quickly to market. 32-bit machine </li></ul></ul><ul><li>603: </li></ul><ul><ul><li>Low-end desktop and portable </li></ul></ul><ul><ul><li>32-bit </li></ul></ul><ul><ul><li>Comparable performance with 601 </li></ul></ul><ul><ul><li>Lower cost and more efficient implementation </li></ul></ul><ul><li>604: </li></ul><ul><ul><li>Desktop and low-end servers </li></ul></ul><ul><ul><li>32-bit machine </li></ul></ul><ul><ul><li>Much more advanced superscalar design </li></ul></ul><ul><ul><li>Greater performance </li></ul></ul><ul><li>620: </li></ul><ul><ul><li>High-end servers </li></ul></ul><ul><ul><li>64-bit architecture </li></ul></ul>
  41. 41. PowerPC Family (2) <ul><li>740/750: </li></ul><ul><ul><li>Also known as G3 </li></ul></ul><ul><ul><li>Two levels of cache on chip </li></ul></ul><ul><li>G4: </li></ul><ul><ul><li>Increases parallelism and internal speed </li></ul></ul><ul><li>G5: </li></ul><ul><ul><li>Improvements in parallelism and internal speed </li></ul></ul><ul><ul><li>64-bit organization </li></ul></ul>
  42. 42. Internet Resources <ul><li>http://www.intel.com/ </li></ul><ul><ul><li>Search for the Intel Museum </li></ul></ul><ul><li>http://www.ibm.com </li></ul><ul><li>http://www.dec.com </li></ul><ul><li>Charles Babbage Institute </li></ul><ul><li>PowerPC </li></ul><ul><li>Intel Developer Home </li></ul>
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×