PowerPoint Presentation

1,610 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,610
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
46
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

PowerPoint Presentation

  1. 1. ECE337 Fall 2009 Chapter 1 Introduction to Computer Architecture and Organization
  2. 2. Architecture & Organization 1 <ul><li>Architecture is those attributes visible to the programmer (alternatively say, those attributes have a direct impact on the logic execution of a program) </li></ul><ul><ul><li>Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques. </li></ul></ul><ul><ul><li>e.g. Is there a multiply instruction? </li></ul></ul><ul><li>Organization is how features are implemented </li></ul><ul><ul><li>Control signals, interfaces, memory technology. </li></ul></ul><ul><ul><li>e.g. Is there a hardware multiply unit or is it done by repeated addition? </li></ul></ul>
  3. 3. Architecture & Organization 2 <ul><li>All Intel x86 family share the same basic architecture </li></ul><ul><li>This gives code compatibility </li></ul><ul><ul><li>At least backwards </li></ul></ul><ul><li>Organization differs between different versions, same architecture, but different organization </li></ul>
  4. 4. Architecture & Organization 3 <ul><li>Also the Intel Pentium and the AMD Athlon have nearly identical versions of the x86 instruction set architecture , but have totally different internal designs to implement the same architecture. </li></ul>
  5. 5. Why study computer architecture and organization? <ul><li>Acquire understanding and appreciation of a computer system’s functional components, their characteristics, their performance and their interactions </li></ul><ul><li>Structure a program that runs more efficiently on a real machine </li></ul><ul><li>Understand the trade-off among various components in order to select the most cost-effective computer system </li></ul><ul><li>- IEEE/ACM Computer Curricula 2001 </li></ul>
  6. 6. Languages, Levels, Virtual (hypothetical) Machines <ul><li>A multilevel model </li></ul>primitive,machine-oriented complex, people-oriented
  7. 7. Translator and interpreter <ul><li>Translator : translate instructions before execution </li></ul><ul><li>eg. Compiler, Assembler </li></ul><ul><li>Interpreter : interpret instructions during execution </li></ul><ul><li>eg. Matlab software, OS software </li></ul>
  8. 8. Contemporary Multilevel Machines <ul><li>A six-level computer. </li></ul><ul><li>The support method for each level is indicated below it </li></ul><ul><li>(along with the name of supporting program). </li></ul>gates, registers collection of registers,ALU(data path) machine language instruction sets (opcode) Hybrid level. new instruction sets (system calls), memory organization, multiple processes control etc. system programmer application programmer
  9. 9. Computer Generations <ul><li>First Generation Vacuum Tubes (1945 – 1955) </li></ul><ul><li>Second Generation Transistors (1955 – 1965) </li></ul><ul><li>Third Generation Integrated Circuits (1965 – 1980) </li></ul><ul><li>Fourth Generation Very Large Scale Integration (1980 – ?) </li></ul>Computer History - An Overview http://library.thinkquest.org/18268/History/hist_m.htm
  10. 10. Vacuum Tubes ENIAC - background <ul><li>Electronic Numerical Integrator And Computer </li></ul><ul><li>Eckert and Mauchly </li></ul><ul><li>University of Pennsylvania </li></ul><ul><li>Trajectory tables for weapons </li></ul><ul><li>Started 1943 </li></ul><ul><li>Finished 1946 </li></ul><ul><ul><li>Too late for war effort </li></ul></ul><ul><li>Used until 1955 </li></ul>
  11. 11. ENIAC - details <ul><li>Decimal (not binary) </li></ul><ul><li>20 accumulators of 10 digits </li></ul><ul><li>Programmed manually by switches </li></ul><ul><li>18,000 vacuum tubes </li></ul><ul><li>30 tons </li></ul><ul><li>15,000 square feet </li></ul><ul><li>140 kW power consumption </li></ul><ul><li>5,000 additions per second </li></ul>
  12. 12. von Neumann machine <ul><li>Stored Program concept </li></ul><ul><li>Main memory storing programs and data </li></ul><ul><li>ALU operating on binary data </li></ul><ul><li>Control unit interpreting instructions from memory and executing </li></ul><ul><li>Input and output equipment operated by control unit </li></ul><ul><li>Princeton Institute for Advanced Studies </li></ul><ul><ul><li>IAS machine </li></ul></ul><ul><li>Completed 1952 </li></ul>
  13. 13. Structure of von Neumann machine
  14. 14. EDSAC <ul><li>the first stored program computer </li></ul><ul><li>built in 1949 </li></ul><ul><li>could complete 714 operations per second. </li></ul>
  15. 15. IAS machine- details <ul><li>Stored Program </li></ul><ul><li>4096 x 40 bit word memory </li></ul><ul><ul><li>Binary number </li></ul></ul><ul><ul><li>2 x 20 bit instructions </li></ul></ul><ul><ul><li>A 40-bit signed integer </li></ul></ul>
  16. 16. Transistors <ul><li>Replaced vacuum tubes </li></ul><ul><li>Smaller </li></ul><ul><li>Cheaper </li></ul><ul><li>Less heat dissipation </li></ul><ul><li>Solid State device </li></ul><ul><li>Made from Silicon </li></ul><ul><li>Invented 1947 at Bell Labs </li></ul><ul><li>William Shockley et al. </li></ul>
  17. 17. TX-0 (1956) <ul><li>the first general-purpose, programmable computer built with transistors. </li></ul><ul><li>created by MIT researchers. </li></ul>
  18. 18. PDP-1(1960) <ul><li>The PDP-1 sold for $120,000. </li></ul><ul><li>MIT wrote the first video game, Space War! for it. </li></ul><ul><li>A total of 50 were built. </li></ul>
  19. 19. Integrated Circuit - Microelectronics <ul><li>Literally - “small electronics” </li></ul><ul><li>A computer is made up of gates, memory cells and interconnections among them </li></ul><ul><li>These can be manufactured on a semiconductor </li></ul><ul><li>e.g. silicon wafer </li></ul>
  20. 20. IBM 360 (1964) <ul><li>IBM 360 with different models </li></ul><ul><li>a great variety of combinations of speed, memory, and power. </li></ul><ul><li>all the models were compatible </li></ul><ul><li>IBM were getting 1,000 orders each month, month within two years. </li></ul>
  21. 21. Very Large Scale Integration <ul><li>1981 - IBM PC </li></ul><ul><li>IBM's first PC ran on Intel's 4.77 MHz 8088 microprocessor. It came with Microsoft's MS-DOS operating system. </li></ul>
  22. 22. IBM PS/2 (1987) <ul><li>IBM's PS/2 was based on Intel's 80386 chip. At the same time, IBM introduced OS/2 </li></ul><ul><li>More than 1 million machines were sold by the end of the year. </li></ul>
  23. 23. Computer Generation <ul><li>Vacuum tube - 1946-1957 </li></ul><ul><li>Transistor - 1958-1964 </li></ul><ul><li>Small scale integration - 1965 - 1971 </li></ul><ul><ul><li>Up to 100 devices on a chip </li></ul></ul><ul><li>Medium scale integration – 1965 -1971 </li></ul><ul><ul><li>100-3,000 devices on a chip </li></ul></ul><ul><li>Large scale integration - 1971-1979 </li></ul><ul><ul><li>3,000 - 100,000 devices on a chip </li></ul></ul><ul><li>Very large scale integration - 1980 -1991 </li></ul><ul><ul><li>100,000 - 100,000,000 devices on a chip </li></ul></ul><ul><li>Ultra large scale integration – 1991 - </li></ul><ul><ul><li>Over 100,000,000 devices on a chip </li></ul></ul>
  24. 24. Moore’s Law <ul><li>Gordon Moore – co-founder of Intel </li></ul><ul><li>Number of transistors on a chip was doubling every two years </li></ul><ul><li>Since 1970’s development has slowed a little </li></ul><ul><ul><li>Number of transistors doubles every 18 months </li></ul></ul><ul><li>Cost of a chip has remained almost unchanged </li></ul><ul><li>Higher packing density means shorter electrical paths, giving higher speed </li></ul><ul><li>Smaller size gives increased flexibility for usage environment </li></ul><ul><li>Reduce power and cooling requirements </li></ul><ul><li>Fewer interconnections increases reliability (than solder connections) </li></ul>
  25. 25. Growth in CPU Transistor Count
  26. 26. The Computer Spectrum <ul><li>A rough categorization of current computers </li></ul>
  27. 27. Better Performance Design: Speeding Microprocessor Up <ul><li>Pipelining </li></ul><ul><li>On board cache </li></ul><ul><li>Branch prediction </li></ul><ul><li>Data flow analysis </li></ul><ul><li>Speculative execution </li></ul>
  28. 28. Computer Architecture Challenge: Performance Balance <ul><li>Processor speed increased </li></ul><ul><li>Memory capacity increased </li></ul><ul><li>Memory speed lags behind processor speed </li></ul><ul><li>Mismatch interface between processor and main memory </li></ul>
  29. 29. Logic(CPU) and Memory Performance Gap
  30. 30. Solutions <ul><li>Increase number of bits retrieved at one time </li></ul><ul><ul><li>Make DRAM “wider” </li></ul></ul><ul><li>Reduce frequency of memory access </li></ul><ul><ul><li>More complex cache and cache on chip </li></ul></ul><ul><li>Increase interconnection bandwidth </li></ul><ul><ul><li>High speed buses </li></ul></ul><ul><ul><li>Hierarchy of buses </li></ul></ul><ul><ul><ul><li>The processor bus VLB </li></ul></ul></ul><ul><ul><ul><li>The cache bus </li></ul></ul></ul><ul><ul><ul><li>The memory bus </li></ul></ul></ul><ul><ul><ul><li>The local I/O bus (VLB, PCI) </li></ul></ul></ul><ul><ul><ul><li>The standard I/O bus (ISA) </li></ul></ul></ul><ul><ul><ul><li> PCI </li></ul></ul></ul>ISA
  31. 31. Typical I/O Device Data Rates
  32. 32. Improvements in Processor Chip <ul><li>Increase hardware speed of processor </li></ul><ul><ul><li>Fundamentally due to shrinking logic gate size </li></ul></ul><ul><ul><ul><li>More gates, packed more tightly, increasing clock rate </li></ul></ul></ul><ul><ul><ul><li>Propagation time for signals reduced </li></ul></ul></ul><ul><li>Increase size and speed of caches </li></ul><ul><ul><li>Dedicating part of processor chip </li></ul></ul><ul><ul><ul><li>Cache access times drop significantly </li></ul></ul></ul><ul><li>Change processor organization and architecture </li></ul><ul><ul><li>Increase effective speed of execution </li></ul></ul><ul><ul><li>Parallelism (instruction-level) </li></ul></ul>
  33. 33. Problems with Clock Speed and Logic Density <ul><li>Power </li></ul><ul><ul><li>Power density increases with density of logic and clock speed </li></ul></ul><ul><ul><li>Dissipating heat </li></ul></ul><ul><li>RC delay( the signal delay of a wire) </li></ul><ul><ul><li>Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them </li></ul></ul><ul><ul><li>Delay increases as RC product increases </li></ul></ul><ul><ul><li>Wire interconnects thinner, increasing resistance </li></ul></ul><ul><ul><li>Wires closer together, increasing capacitance </li></ul></ul><ul><li>Solution: </li></ul><ul><ul><li>More emphasis on organizational and architectural approaches </li></ul></ul>
  34. 34. Solution1: Increased Cache Capacity <ul><li>Typically two or three levels of cache between processor and main memory </li></ul><ul><li>Chip density increased </li></ul><ul><ul><li>More cache memory on chip </li></ul></ul><ul><ul><ul><li>Faster cache access </li></ul></ul></ul><ul><li>Pentium chip devoted about 10% of chip area to cache </li></ul><ul><li>Pentium 4 devotes about 50% </li></ul>
  35. 35. Solution 2: More Complex Execution Logic <ul><li>Enable parallel execution of instructions </li></ul><ul><li>Pipeline works like assembly line </li></ul><ul><ul><li>Different stages of execution of different instructions at same time along pipeline </li></ul></ul><ul><li>Superscalar allows multiple pipelines within single processor </li></ul><ul><ul><li>Instructions that do not depend on one another can be executed in parallel </li></ul></ul>
  36. 36. Soultion3: Processor-level Parallelism – Multiple Cores <ul><li>Multiple processors on single chip </li></ul><ul><ul><li>Large shared cache </li></ul></ul><ul><li>If software can use multiple processors, doubling number of processors almost doubles performance </li></ul><ul><li>Use two simpler processors on the chip rather than one more complex processor </li></ul><ul><li>With two processors, larger caches are justified </li></ul><ul><ul><li>Power consumption of memory logic less than processing logic </li></ul></ul><ul><li>Example: IBM POWER4 </li></ul><ul><ul><li>Two cores based on PowerPC </li></ul></ul>
  37. 37. Pentium Evolution (CISC)(1) <ul><li>8080 </li></ul><ul><ul><li>first general purpose microprocessor </li></ul></ul><ul><ul><li>8 bit data path </li></ul></ul><ul><ul><li>Used in first personal computer </li></ul></ul><ul><li>8086 </li></ul><ul><ul><li>much more powerful </li></ul></ul><ul><ul><li>16 bit </li></ul></ul><ul><ul><li>instruction queue, prefetch a few instructions </li></ul></ul><ul><ul><li>8088 (8 bit external bus) used in first IBM PC </li></ul></ul><ul><li>80286 </li></ul><ul><ul><li>16 Mbyte memory addressable instead of just 1 Mbyte </li></ul></ul><ul><li>80386 </li></ul><ul><ul><li>32 bit ( IA-32 architecture, or called x86, x86-32 ) </li></ul></ul><ul><ul><li>Support for multitasking </li></ul></ul>
  38. 38. Pentium Evolution (2) <ul><li>80486 </li></ul><ul><ul><li>sophisticated powerful cache and instruction pipelining </li></ul></ul><ul><ul><li>built in math co-processor, offloading complex math operations from the main CPU </li></ul></ul><ul><li>Pentium </li></ul><ul><ul><li>Superscalar </li></ul></ul><ul><ul><li>Multiple instructions executed in parallel </li></ul></ul><ul><li>Pentium Pro </li></ul><ul><ul><li>Increased superscalar organization </li></ul></ul><ul><ul><li>Aggressive register renaming </li></ul></ul><ul><ul><li>branch prediction </li></ul></ul><ul><ul><li>data flow analysis </li></ul></ul><ul><ul><li>speculative execution </li></ul></ul>
  39. 39. Pentium Evolution (3) <ul><li>Pentium II </li></ul><ul><ul><li>MMX technology (graphics, video & audio processing) </li></ul></ul><ul><li>Pentium III </li></ul><ul><ul><li>Additional floating point instructions for 3D graphics </li></ul></ul><ul><li>Pentium 4 </li></ul><ul><ul><li>Further floating point and multimedia enhancements </li></ul></ul><ul><li>Itanium </li></ul><ul><ul><li>64 bit </li></ul></ul><ul><ul><li>IA-64 Architecture </li></ul></ul><ul><li>Itanium 2 </li></ul><ul><ul><li>Hardware enhancements to increase speed </li></ul></ul><ul><li>See Intel web pages for detailed information on processors </li></ul>
  40. 40. Intel Computer Family <ul><li>Moore’s law for (Intel) CPU chips. </li></ul>
  41. 41. PowerPC (RISC) <ul><li>1975, 801 minicomputer project (IBM) RISC </li></ul><ul><li>1986, IBM first commercial RISC workstation product, RT PC. 2MIPS. </li></ul><ul><li>1990, IBM RISC System/6000 </li></ul><ul><ul><li>RISC-like superscalar machine </li></ul></ul><ul><ul><li>referred to as POWER architecture </li></ul></ul><ul><li>IBM alliance with Motorola (68000 microprocessors), and Apple, (used 68000 in Macintosh) </li></ul><ul><li>Result is PowerPC architecture </li></ul><ul><ul><li>Derived from the POWER architecture </li></ul></ul><ul><ul><li>Superscalar RISC </li></ul></ul><ul><ul><li>Used in Apple Macintosh </li></ul></ul><ul><ul><li>Embedded chip applications </li></ul></ul>
  42. 42. PowerPC Family (general-purpose)(1) <ul><li>601 (G1): </li></ul><ul><ul><li>Quickly bring PowerPC architecture to market. </li></ul></ul><ul><ul><li>32-bit machine </li></ul></ul><ul><li>603(G2): </li></ul><ul><ul><li>Low-end desktop and portable computers </li></ul></ul><ul><ul><li>32-bit </li></ul></ul><ul><ul><li>Comparable performance with 601 </li></ul></ul><ul><ul><li>Lower cost and more efficient implementation </li></ul></ul><ul><li>604(G2): </li></ul><ul><ul><li>Desktop and low-end servers </li></ul></ul><ul><ul><li>32-bit machine </li></ul></ul><ul><ul><li>Much more advanced superscalar design and greater performance </li></ul></ul><ul><li>620(G2): </li></ul><ul><ul><li>High-end servers </li></ul></ul><ul><ul><li>64-bit architecture </li></ul></ul>
  43. 43. PowerPC Family (2) <ul><li>740/750(G3): </li></ul><ul><ul><li>Also known as G3 processor </li></ul></ul><ul><ul><li>Two levels of cache on chip </li></ul></ul><ul><li>G4: </li></ul><ul><ul><li>Increases parallelism and internal speed </li></ul></ul><ul><li>G5: </li></ul><ul><ul><li>Improvements in parallelism and internal speed </li></ul></ul><ul><ul><li>64-bit organization </li></ul></ul>
  44. 44. Internet Resources <ul><li>http://www.intel.com/ </li></ul><ul><ul><li>Search for the Intel Museum, click on online exhibit </li></ul></ul><ul><li>http://www.ibm.com </li></ul><ul><li>PowerPC (IBM, Motorola) </li></ul><ul><li>Intel Developer Home, http://developer.intel.com/design/index.htm </li></ul>
  45. 45. Example Computer Families <ul><li>Pentium 4 by Intel (CISC) </li></ul><ul><li>von Neumann architecture </li></ul><ul><li>UltraSPARC III by Sun Microsystems (RISC) von Neumann architecture </li></ul><ul><li>The 8051 chip by Intel, used for embedded systems </li></ul><ul><li>Harvard architecture </li></ul>
  46. 46. Harvard architecture v.s. von Neumann architecture <ul><li>Harvard architecture </li></ul><ul><ul><li>Separate instruction and data memory </li></ul></ul><ul><ul><li>Can read an instruction and access data memory at the same time </li></ul></ul><ul><li>Von Neumann architecture </li></ul><ul><ul><li>Memory store instruction and data </li></ul></ul><ul><ul><li>Instruction fetch and data access cannot be at the same time because of the same bus system </li></ul></ul>
  47. 47. Read Chapter 1 and Review Appendix A & B Reading Assignment

×