Oak 0-2011



  1. 1. Computer Organization and Architecture (3 Credits/SKS), Prof. Dr. Bagio Budiardjo, Even Semester (Semester Genap) 2010/2011
  2. 2. About the Course. Course Objectives: After completing this course, students are expected to understand and be able to analyze computer architecture, in particular instruction-set design (e.g. addressing modes) and its influence on performance. Students are also expected to understand the meaning of computer organization, that is, the interconnection of the sub-systems of a computing system: CPU, memory, bus and I/O. Students are expected to understand a more advanced technique in processor design: pipelining. Key words: architecture, instruction-set design, computer organization, performance, processor design, and pipelining techniques
  3. 3. About the grading scheme: • This part is not too rigid, but the grade will be a combination of homework, quizzes, exercises, mid-test and final test, whenever possible. • One possible scheme is: Homework: 15% (4), Mid-test: 40%, Final test: 45%. • Grading the homework: maximum 5 points each, with three levels of grading: Good (5), OK (3), and Bad (2).
  4. 4. The books and supporting materials: • William Stallings’s book Computer Organization and Architecture, Seventh Edition, Prentice Hall, 2006, will be used as the main reference for this lecture. A new edition of this book was issued in 2010 but is still unavailable in Jakarta. • The classic book Logic and Computer Design Fundamentals, by M. Morris Mano and Charles R. Kime (Pearson Asia, 2004), is good but puts too much stress on digital logic. We use material from this book to explain the hardware design of computer components, whenever possible. • Chapters covered will be: 1, 2, 3, 4, 5, 10, 11 and 13 (Stallings). Additional material about pipelining is taken from another book.
  5. 5. Books and supporting materials - continued • There will be no handouts (unless very important). • Lecture notes are distributed on memory stick/CD; the SAP can be downloaded from SIAK-NG. • Students are encouraged to read books/papers in this field of study. Schedule of class: • At the scheduled time and place (K-102) for about 120 minutes • Lectures will be given mainly using an LCD projector
  6. 6. About the “course direction”: Why do we study Computer Architecture? History: A course under this name was taught in many universities long before microprocessors existed. Years ago, people studied mainframe architectures: IBM S/370, CDC Cyber, CRAY, Amdahl, etc. Since microprocessors emerged, the course has changed slightly to cover more advanced topics: computer design and performance issues
  7. 7. About the “course direction” [Diagram relating Computer Organization & Architecture (OAK) to neighboring courses. Recoverable box labels: Microprocessors / Micro & Embedded Processors (application of µproc); Processors Architecture & Design (analyzing processor design, emphasizing how to obtain better processing speed, cost effectiveness); Computer Architecture & Design (analyzing and implementing systems to achieve the best processing speed, cost effectiveness); Embedded Systems (embedding µproc-based intelligence into a new system/device); Parallel & Distributed Computing Systems (organizing processors/computing systems to obtain better speed-up with a different processing paradigm)]
  8. 8. About the “course direction” - continued. This course is aimed at: 1. Explaining the phenomena of computer architecture and computer design; knowing the basic instruction cycle and its implications for processing speed 2. Studying the “key” problems: a. the CPU-memory bottleneck b. CPU-I/O device problems 3. Studying how “performance” can be improved, for example CPU-memory: cache memory 4. How can we improve execution speed with other techniques? Example: pipelining
  9. 9. Reasons for studying Computer Architecture (Stallings’s arguments) • Able to select a “proper” computer system for a particular environment (cost and effectiveness) • Able to analyze a processor “embedded” in an environment, for example the use of a processor in an automobile, and to use the proper tools for the analysis • Able to choose proper software for a particular computer system
  10. 10. View of a Computer System
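  11. 11. Processor Organization: Another view [Figure: CPU (Central Processing Unit) containing the Control Unit, IR, PC, MAR, MBR, registers R1, R2, R3, ALU1, ALU2, ADDER and ALU3 connected by an internal BUS, together with the MMU (Memory Management Unit), cache memory and FPU (Floating Point Unit); MAR and MBR connect to/from memory. Issues: clock speed, gating signal]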
  12. 12. Implementation in CHIP
  13. 13. Frequently Asked Questions: What is the role of the CPU clock? What is the difference between a P IV/2.4 G and a P IV/3.0 G (CPU clock speeds of 2.4 and 3.0 GHz)? Consider a CPU instruction: AR R1, R2 (add register: add the content of R1 and the content of R2, place the result in R1)
  14. 14. Execution steps of AR R1,R2: The “possible” micro-execution steps are: a. ALU1 ← [R1] {content of R1 is moved to ALU1} b. ALU2 ← [R2] {content of R2 is moved to ALU2} c. ADD {ALU3 = content of ALU1 + content of ALU2} d. R1 ← [ALU3] {result of the addition is moved to R1}. If each micro-step is executed in “one” clock cycle, then this AR instruction needs 4 clock cycles. For the time being, we ignore the fetch cycle
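The four micro-steps can be traced with a small Python sketch. The dictionary-based register model and the function names are illustrative assumptions, not part of the lecture; the sketch only assumes, as the slide does, that each micro-step costs one clock and that the fetch cycle is ignored. It also hints at the clock question from the previous slide: the same 4 clocks take less time at 3.0 GHz than at 2.4 GHz.

```python
# Sketch of the single-bus execution of AR R1,R2 (register names follow the slide).
# Assumption: every micro-step costs exactly one clock; the fetch cycle is ignored.

def execute_ar_single_bus(regs):
    """Apply the four micro-steps in order and return the number of clocks used."""
    micro_steps = [
        lambda r: r.update(ALU1=r["R1"]),                 # a. ALU1 <- [R1]
        lambda r: r.update(ALU2=r["R2"]),                 # b. ALU2 <- [R2]
        lambda r: r.update(ALU3=r["ALU1"] + r["ALU2"]),   # c. ADD
        lambda r: r.update(R1=r["ALU3"]),                 # d. R1 <- [ALU3]
    ]
    clocks = 0
    for step in micro_steps:
        step(regs)
        clocks += 1  # one bus transfer (or one add) per clock
    return clocks


def execution_time_ns(clocks, clock_ghz):
    """Execution time in nanoseconds: one clock lasts 1/clock_ghz ns."""
    return clocks / clock_ghz


regs = {"R1": 7, "R2": 5, "ALU1": 0, "ALU2": 0, "ALU3": 0}
clocks = execute_ar_single_bus(regs)
print(regs["R1"], clocks)               # 12 4
print(execution_time_ns(clocks, 2.4))   # about 1.67 ns
print(execution_time_ns(clocks, 3.0))   # about 1.33 ns
```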
  15. 15. Question: How do we fetch the instruction (from memory)? • The procedure that brings an instruction from memory to the CPU (IR) is called the instruction fetch • PC always holds the address of the (next) instruction in memory • PC transfers the address to MAR, and memory is READ • PC is usually incremented by 1 (to point to the next instruction) • The instruction is placed by memory in MBR • The content of MBR is transferred to IR (the instruction is fetched, ready to be executed)
  16. 16. Question: How do we fetch the instruction (from memory)? - continued • In register transfer language, the fetch cycle can be expressed as: 1. MAR ← [PC] 2. READ (memory) and wait for completion 3. IR ← [MBR]. In terms of CPU clocks, these steps may take up to 50 CPU clocks, depending on the memory clock speed.
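The fetch cycle above can be sketched in Python as follows. The list-based memory, the dictionary of CPU registers and the instruction strings are assumptions made for illustration; memory latency (the "up to 50 clocks") is not modeled.

```python
# Sketch of the instruction fetch in register-transfer style.
# Assumptions: memory is a Python list indexed by address, and each
# instruction occupies one location, so PC is incremented by 1.

def fetch(cpu, memory):
    """Perform one fetch cycle and return the fetched instruction."""
    cpu["MAR"] = cpu["PC"]             # 1. MAR <- [PC]
    cpu["MBR"] = memory[cpu["MAR"]]    # 2. READ: memory places the word in MBR
    cpu["PC"] += 1                     #    PC now points to the next instruction
    cpu["IR"] = cpu["MBR"]             # 3. IR <- [MBR]
    return cpu["IR"]


memory = ["AR R1,R2", "MOV R3,R1"]     # hypothetical program
cpu = {"PC": 0, "MAR": 0, "MBR": None, "IR": None}
first = fetch(cpu, memory)
second = fetch(cpu, memory)
print(first, "|", second, "|", cpu["PC"])   # AR R1,R2 | MOV R3,R1 | 2
```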
  17. 17. Processor Organization - continued.1 [Figure: single-bus CPU (Control Unit, IR, PC, MAR, MBR to/from memory, R1, R2, R3, ALU1, ALU2, ADDER, ALU3, BUS) highlighting the step ALU1 ← [R1]; legend: path/unit not active]
  18. 18. Processor Organization - continued.2 [Figure: the same single-bus CPU highlighting the step ALU2 ← [R2]; legend: path/component not active]
  19. 19. Processor Organization - continued.3 [Figure: the same single-bus CPU highlighting the step ADD; legend: path/component not active]
  20. 20. Processor Organization - continued.4 [Figure: the same single-bus CPU highlighting the step R1 ← [ALU3]; legend: path/component not active]
  21. 21. Analysis of Instruction Cycle • With a single bus, execution is slow, since in each “clock” only one transfer can be executed • Is there any other way to “improve” the speed? • A dual-bus processor may be faster • Additional processor cost
  22. 22. Dual processor-bus: A way to improve speed [Figure: dual-bus CPU; buses 1 and 2 connect R1, R2, R3, ALU1, ALU2, ADDER, ALU3 and the other components (Control Unit, IR, PC, MAR, MBR)]. First schedule: 1. ALU1 ← [R1] (bus1) and ALU2 ← [R2] (bus2) 2. ADD 3. R1 ← [ALU3] (bus1). Only 3 clock cycles are needed instead of 4: 25% fewer clocks. How about this: 1. ALU1 ← [R1] (bus1), ALU2 ← [R2] (bus2) and ADD 2. R1 ← [ALU3] (bus1). Only 2 clock cycles are needed instead of 4: 50% fewer clocks
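The clock counts of the single-bus and dual-bus schedules can be compared with a short Python sketch. Representing a schedule as a list of clock cycles, each holding the micro-operations that run in parallel, is an assumption made here for illustration. Note that "25% fewer clocks" corresponds to a speedup of 4/3, and "50% fewer clocks" to a speedup of 2.

```python
# Each schedule is a list of clock cycles; each cycle lists the
# micro-operations assumed to run in parallel during that clock.
single_bus = [["ALU1 <- [R1]"], ["ALU2 <- [R2]"], ["ADD"], ["R1 <- [ALU3]"]]
dual_bus = [["ALU1 <- [R1]", "ALU2 <- [R2]"], ["ADD"], ["R1 <- [ALU3]"]]
dual_bus_overlapped = [["ALU1 <- [R1]", "ALU2 <- [R2]", "ADD"], ["R1 <- [ALU3]"]]


def clocks(schedule):
    """Number of clock cycles a schedule needs."""
    return len(schedule)


def clocks_saved_percent(base, improved):
    """Percentage of clock cycles saved relative to the base schedule."""
    return 100 * (clocks(base) - clocks(improved)) / clocks(base)


def speedup(base, improved):
    """How many times faster the improved schedule is."""
    return clocks(base) / clocks(improved)


print(clocks_saved_percent(single_bus, dual_bus))             # 25.0
print(clocks_saved_percent(single_bus, dual_bus_overlapped))  # 50.0
print(speedup(single_bus, dual_bus_overlapped))               # 2.0
```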
  23. 23. Triple processor-bus: Can the processing speed be improved? [Figure: triple-bus CPU; buses 1, 2 and 3 connect R1, R2, R3, ALU1, ALU2, ADDER, ALU3 and the other components (Control Unit, IR, PC, MAR, MBR). Please notice the direction of the arrows.] If all the CPU components (registers, ALUs and adder) could work in one third (1/3) of a clock cycle (transfer of bits, adding numbers), how many clock(s) are needed to complete an addition operation (ADD R1,R2)? Write down the “register transfer” language (micro-instruction steps)!
  24. 24. Program Execution • A scientific program written in assembly language is run on a microprocessor with a 1 GHz clock. To complete the program, it needs to execute: a. 150,000 arithmetic instructions (e.g. ADD R1,R2; MUL R1,R3; etc.) b. 250,000 register transfer instructions (e.g. MOV R1,R2; etc.) c. 100,000 memory access instructions (e.g. LOAD R1,X; STORE R2,Y; etc.). If the average arithmetic instruction needs 2 clocks (to complete), the average register transfer instruction needs 1 clock and the average memory access instruction needs 10 clocks, calculate the average CPI (clocks per instruction) of the above program. How much time (in seconds) is needed to complete the program?
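One way to work this exercise is sketched below in Python. The dictionary layout and function names are illustrative assumptions; the instruction counts, clocks per instruction and the 1 GHz clock come from the slide. The average CPI is the total number of clocks divided by the total number of instructions, and the run time is the total number of clocks divided by the clock frequency.

```python
# Instruction mix from the slide: class -> (instruction count, clocks per instruction)
instruction_mix = {
    "arithmetic": (150_000, 2),
    "register_transfer": (250_000, 1),
    "memory_access": (100_000, 10),
}
CLOCK_HZ = 1e9  # 1 GHz


def total_clocks(mix):
    """Sum of count * clocks over all instruction classes."""
    return sum(count * clocks for count, clocks in mix.values())


def average_cpi(mix):
    """Average clocks per instruction for the whole program."""
    total_instructions = sum(count for count, _ in mix.values())
    return total_clocks(mix) / total_instructions


def run_time_seconds(mix, clock_hz):
    """Execution time: total clocks divided by clocks per second."""
    return total_clocks(mix) / clock_hz


print(total_clocks(instruction_mix))                 # 1550000
print(average_cpi(instruction_mix))                  # 3.1
print(run_time_seconds(instruction_mix, CLOCK_HZ))   # 0.00155 (1.55 ms)
```

The memory access instructions are only 20% of the program but account for 1,000,000 of the 1,550,000 clocks, which illustrates the CPU-memory bottleneck discussed earlier.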
  25. 25. Can it be “one clock”? Yes, it can! Views of other books on “micro operations”: • The bus is called the “data path” • It does not consist only of a bus (a bunch of wires), but also of other digital devices • Enable signals are forced in order to speed up execution • Additional (processor) cost
  26. 26. Datapath Example (taken from Morris Mano’s book): • Four parallel-load registers • Two mux-based register selectors • Register destination decoder • Mux B for external constant input • Buses A and B with external address and data outputs • ALU and Shifter with Mux F for output select • Mux D for external data input • Logic for generating status bits V, C, N, Z [Figure: datapath block diagram showing the register file (R0 to R3 with Load inputs, destination decoder, A select and B select multiplexers), MUX B with Constant in, Address Out and Data Out, the function unit (arithmetic/logic unit with G select, shifter with H select, MUX F, zero detect) and MUX D with Data In feeding Bus D]
  27. 27. Datapath Example: Performing a Microoperation. Microoperation: R0 ← R1 + R2 • Apply 01 to A select to place the contents of R1 onto Bus A • Apply 10 to B select to place the contents of R2 onto B data, and apply 0 to MB select to place B data on Bus B • Apply 0010 to G select to perform the addition G = Bus A + Bus B • Apply 0 to MF select and 0 to MD select to place the value of G onto Bus D • Apply 00 to Destination select to enable the Load input to R0 • Apply 1 to Load Enable to force the Load input to R0 to 1, so that R0 is loaded on the clock pulse (not shown) • The overall microoperation requires 1 clock cycle (!) [Figure: the datapath block diagram of the previous slide, annotated with these control values]
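The control settings listed above can be collected into one "control word" and applied in a single step. The Python sketch below is a simplified model under stated assumptions: the function name and argument names are hypothetical, only the G select code 0010 (addition) from the slide is modeled, the shifter path is left out, and the n-bit width and status bits are ignored.

```python
# Simplified model of the datapath: all select lines are applied together,
# so the whole microoperation completes in one call (one clock cycle).
# Assumption: only G select = 0b0010 (add) is modeled; the shifter is omitted.

def datapath_step(regs, a_sel, b_sel, mb_sel, g_sel, mf_sel, md_sel,
                  dest_sel, load_enable, constant_in=0, data_in=0):
    bus_a = regs[a_sel]                                  # A select drives Bus A
    bus_b = regs[b_sel] if mb_sel == 0 else constant_in  # MUX B: register or constant
    if g_sel != 0b0010:
        raise NotImplementedError("only addition (G select = 0010) is modeled")
    g = bus_a + bus_b                                    # ALU: G = Bus A + Bus B
    if mf_sel != 0:
        raise NotImplementedError("shifter output (MF select = 1) is not modeled")
    bus_d = g if md_sel == 0 else data_in                # MUX D: function unit or Data In
    if load_enable:
        regs[dest_sel] = bus_d                           # decoder enables Load of the destination
    return regs


registers = [0, 3, 4, 0]   # R0, R1, R2, R3 (example values)
datapath_step(registers, a_sel=0b01, b_sel=0b10, mb_sel=0, g_sel=0b0010,
              mf_sel=0, md_sel=0, dest_sel=0b00, load_enable=1)
print(registers)           # [7, 3, 4, 0]
```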
  28. 28. Lessons Learned • We could improve the instruction execution speed by increasing the processor clock speed (can we?) • We could improve the instruction execution speed by implementing a dual bus (can we?) • We can (partly) overcome the CPU-memory bottleneck by inserting cache memory between the CPU and main memory (can we?) • Is there any other way to improve instruction execution speed (increasing performance)? Pipelining • Do these improvements need extra cost? (cost vs performance issue)
  29. 29. What do we get after studying Computer Architecture? • It is always a complicated question to answer. • Basically we learn about processor design issues, namely the hardware of a computer, but it is taught through “software” logic. • At least we know the basic building blocks of a computer • We know the design development trends
  30. 30. What is our topic? Instruction Set Architecture (ISA) [Figure: layered diagram listing Application Program, Compiler, OS, ISA, CPU Design, Circuit Design, Chip Layout]
  31. 31. Chapter 1 : Introduction
  32. 32. 1.1. Introduction : Organization & Architecture • Organization and architecture: two terms that are often confused • Computer organization refers to the operational units and their interconnections that realize the architectural specifications (!) • Computer architecture refers to those attributes of a system visible to a programmer or, put another way, those attributes that have a direct impact on the logical execution of a program (!) • The latter definition (architecture) is more concerned with performance than the first one (organization)
  33. 33. 1.1. Introduction - continued • Architecture is more concerned with the basic instruction design, which may lead to better performance of the system • Organization is the implementation of a computer system in terms of the interconnection of its functional units: CPU, memory, bus and I/O devices. • Example: the IBM S/370 family architecture. There are plenty of IBM products with the same architecture (S/370) but different organizations, depending on their price/performance measures. Cost and performance differentiate the organizations • So, the organization of a computer is the implementation of its architecture, tailored to fit the intended price and performance measures.
  34. 34. Chapter 2 : Computer Evolution and Performance
  35. 35. ENIAC - background• Electronic Numerical Integrator And Computer• Eckert and Mauchly• University of Pennsylvania• Trajectory tables for weapons• Started 1943• Finished 1946 – Too late for war effort• Used until 1955
  36. 36. ENIAC - details• Decimal (not binary)• 20 accumulators of 10 digits• Programmed manually by switches• 18,000 vacuum tubes• 30 tons• 15,000 square feet• 140 kW power consumption• 5,000 additions per second
  37. 37. ENIAC
  38. 38. ENIAC
  40. 40. Structure of von Neumann machine
  41. 41. IAS - details• 1000 x 40 bit words – Binary number – 2 x 20 bit instructions• Set of registers (storage in CPU) – Memory Buffer Register – Memory Address Register – Instruction Register – Instruction Buffer Register – Program Counter – Accumulator – Multiplier Quotient
  42. 42. 2.1. Evolution and Performance - history • 1946: von Neumann and his colleagues proposed the IAS computer (Institute for Advanced Study) • The design included: main memory, ALU, Control Unit, I/O • First stored-program design, able to perform: +, -, x, : • The “father” of all modern computers/processors
  43. 43. Structure of IAS
  44. 44. IAS
  45. 45. 2.1. Evolution and Performance - history. IAS components are: • MBR (memory buffer register), MAR (memory address register), IR (instruction register), IBR (instruction buffer register), PC (program counter), AC (accumulator) and MQ (multiplier quotient), memory (1000 locations) • 20-bit instruction: 8-bit opcode, 12-bit address (addressing one of 1000 memory locations, 0 to 999) • 39-bit data (plus a 1-bit sign) • Operations: data transfer between registers and ALU, unconditional branch, conditional branch, arithmetic, address modify
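The IAS instruction layout described above (8-bit opcode, 12-bit address, two 20-bit instructions per 40-bit word) can be illustrated with a few bit operations in Python. The function names are hypothetical, the opcode values used in the example are arbitrary illustrative numbers, and placing the first ("left") instruction in the upper 20 bits is an assumption of this sketch.

```python
OPCODE_BITS = 8
ADDRESS_BITS = 12
INSTRUCTION_BITS = OPCODE_BITS + ADDRESS_BITS   # 20 bits


def encode_instruction(opcode, address):
    """Pack an 8-bit opcode and a 12-bit address into one 20-bit instruction."""
    if not 0 <= opcode < 2 ** OPCODE_BITS:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address must fit in 12 bits")
    return (opcode << ADDRESS_BITS) | address


def decode_instruction(instruction):
    """Split a 20-bit instruction into (opcode, address)."""
    return instruction >> ADDRESS_BITS, instruction & (2 ** ADDRESS_BITS - 1)


def pack_word(left, right):
    """Combine two 20-bit instructions into one 40-bit memory word."""
    return (left << INSTRUCTION_BITS) | right


def unpack_word(word):
    """Split a 40-bit word into its left and right 20-bit instructions."""
    return word >> INSTRUCTION_BITS, word & (2 ** INSTRUCTION_BITS - 1)


left = encode_instruction(0x01, 500)    # illustrative opcode, address 500
right = encode_instruction(0x05, 501)   # illustrative opcode, address 501
word = pack_word(left, right)
print(decode_instruction(left))                            # (1, 500)
print([decode_instruction(i) for i in unpack_word(word)])  # [(1, 500), (5, 501)]
```

A 12-bit address field can name 4096 locations, which is more than enough for the 1000 locations (0 to 999) of the IAS memory.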
  46. 46. 2.1. Evolution - History of Commercial Computers • First Generation: in 1950 Mauchly & Eckert developed UNIVAC I, used by the Census Bureau • Then UNIVAC II appeared, which later grew into the UNIVAC 1100 series (1103, 1104, 1105, 1106, 1108), using vacuum tubes and later transistors • Second Generation: transistors, IBM 7094 (NCR, RCA and others tried to develop their own versions, but were not commercially successful) • Third Generation: Integrated Circuit (IC), SSI. The IBM S/360 was the successful example • Later generations (possibly fourth and fifth): LSI and VLSI technology
  47. 47. 2.1. Evolution - history of commercial computers (Table 2.1)
      Generation   Time      Technology    Approx. speed (opr/sec)
      ------------------------------------------------------------
      1            1946-57   Vacuum tube   40,000
      2            1958-64   Transistor    200,000
      3            1965-71   SSI & MSI     1,000,000
      4            1972-77   LSI           10,000,000
      5            1978-     VLSI          100,000,000
  48. 48. Vacuum Tubes
  49. 49. Transistor
  50. 50. 2.1. Evolution - System 360 Family
      Characteristic                 Model 30   Model 40   Model 50   Model 65   Model 75
      -----------------------------------------------------------------------------------
      Max memory size (bytes)        64K        256K       256K       512K       512K
      Memory data rate (MB/s)        0.5        0.8        2.0        8.0        16.0
      Processor cycle time (µs)      1.0        0.625      0.5        0.25       0.2
      Relative speed                 1          3.5        10         21         50
      Max number of data channels    3          3          4          6          6
      Max channel data rate (KB/s)   250        400        800        1250       1250
      • The family architecture gave rise to the terms upward compatible and downward compatible
  51. 51. Generations of Computer• Vacuum tube - 1946-1957• Transistor - 1958-1964• Small scale integration - 1965 on – Up to 100 devices on a chip• Medium scale integration - to 1971 – 100-3,000 devices on a chip• Large scale integration - 1971-1977 – 3,000 - 100,000 devices on a chip• Very large scale integration - 1978 to date – 100,000 - 100,000,000 devices on a chip• Ultra large scale integration – Over 100,000,000 devices on a chip
  52. 52. Moore’s Law• Increased density of components on chip• Gordon Moore - cofounder of Intel• Number of transistors on a chip will double every year• Since 1970’s development has slowed a little – Number of transistors doubles every 18 months• Cost of a chip has remained almost unchanged• Higher packing density means shorter electrical paths, giving higher performance• Smaller size gives increased flexibility• Reduced power and cooling requirements• Fewer interconnections increases reliability
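The difference between the two doubling rates mentioned above can be made concrete with a small Python sketch. The function name and the starting count of 1000 transistors are assumptions chosen for illustration. Doubling every 18 months means 4 times as many transistors after 3 years, while doubling every year would give 8 times as many.

```python
def projected_transistors(initial, years, doubling_period_years):
    """Transistor count after `years`, doubling once per doubling period."""
    return initial * 2 ** (years / doubling_period_years)


start = 1000  # hypothetical starting count
print(projected_transistors(start, 3, 1.0))   # 8000.0  (doubling every year)
print(projected_transistors(start, 3, 1.5))   # 4000.0  (doubling every 18 months)
```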
  53. 53. Moore’s Law
  54. 54. Growth in CPU Transistor Count
  55. 55. Growth in CPU Transistor Count
  56. 56. IBM 360 series• 1964• Replaced (& not compatible with) 7000 series• First planned “family” of computers – Similar or identical instruction sets – Similar or identical O/S – Increasing speed – Increasing number of I/O ports (i.e. more terminals) – Increased memory size – Increased cost• Multiplexed switch structure
  57. 57. 2.1. Evolution - Later generations • Semiconductor memories: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16 Mbits on a single chip. At present: 256 Mbit, 512 Mbit per chip • Microprocessors appeared: Intel 4004 (1971), Intel 8008 (72), Intel 8080 (8 bit, 74), 8086 (16 bit, 78), 80386 (32 bit, 85) onward. • At almost the same time, Motorola: 6800 (8 bit), 68000 (16 bit), 68010 (16 bit), 68020 (32 bit), 68030/40 (32 bit) • Then Motorola’s products disappeared commercially • Intel products have dominated the market since the appearance of the IBM PC
  58. 58. 2.1. Evolution of Microprocessors (Table 2.2)
      Feature                  8008    8080    8086    80386   80486
      ---------------------------------------------------------------
      Year introduced          1972    1974    1978    1985    1989
      # of instructions        66      111     133     154     235
      Address bus width        8       16      20      32      32
      Data bus width           8       8       16      32      32
      # of registers           8       8       16      8       8
      Memory addressability    16KB    64KB    1 MB    4 GB    4 GB
      Bus bandwidth (MB/s)     -       0.75    5       32      32
      Reg-Reg add time (µs)    -       1.3     0.3     0.125   0.06
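The "Memory addressability" row follows from the number of address bits: with n address bits and byte addressing, a processor can address 2^n bytes. The Python sketch below (function names are illustrative assumptions) reproduces the 64 KB, 1 MB and 4 GB entries from 16, 20 and 32 address bits. The 16 KB entry for the 8008 does not follow from the 8-bit bus width listed in the table; it corresponds to 14 address bits.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


def human_size(n_bytes):
    """Format a power-of-two byte count using B, KB, MB or GB."""
    units = ["B", "KB", "MB", "GB"]
    index = 0
    while n_bytes >= 1024 and index < len(units) - 1:
        n_bytes //= 1024
        index += 1
    return f"{n_bytes} {units[index]}"


for bits in (14, 16, 20, 32):
    print(bits, human_size(addressable_bytes(bits)))
# 14 16 KB
# 16 64 KB
# 20 1 MB
# 32 4 GB
```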
  59. 59. 2.2 Designing for Performance • The price of microprocessors continues to drop every year • $1000 for an advanced system is today’s price: in it you may find more than 100 million transistors! • Even 100 million pieces of toilet paper cost more!! • Computing power is virtually free!! • People solve problems that were never thought possible before: image processing, speech recognition, videoconferencing, multimedia authoring, etc. • We need more and more computing power • The organization and architecture of today’s processors remain (basically) the same as those of the IAS! • The algorithms used to improve speed and efficiency differ!
  60. 60. 2.2. Designing - µprocessor speed • Intel Pentium and PowerPC follow Moore’s Law: by shrinking the size of the lines in IC chips by 10%, industry may get a new IC with 4 times the transistor density every 3 years! • The above law also holds for DRAM (Dynamic Random Access Memory) • Even if capacity increases, speed does not increase automatically • More work on instruction design is needed • Also, techniques for faster instruction execution must be developed: branch prediction, data flow analysis and speculative execution
  61. 61. Pentium Evolution (1) • 8080 – first general purpose microprocessor – 8 bit data path – Used in first personal computer, the Altair • 8086 – much more powerful – 16 bit – instruction cache, prefetches a few instructions – 8088 (8 bit external bus) used in first IBM PC • 80286 – 16 Mbyte memory addressable – up from 1 Mbyte • 80386 – 32 bit – Support for multitasking
  62. 62. Pentium Evolution (2)• 80486 – sophisticated powerful cache and instruction pipelining – built in maths co-processor• Pentium – Superscalar – Multiple instructions executed in parallel• Pentium Pro – Increased superscalar organization – Aggressive register renaming – branch prediction – data flow analysis – speculative execution
  63. 63. Pentium Evolution (3)• Pentium II – MMX technology – graphics, video & audio processing• Pentium III – Additional floating point instructions for 3D graphics• Pentium 4 – Note Arabic rather than Roman numerals – Further floating point and multimedia enhancements• Itanium – 64 bit – see chapter 15• See Intel web pages for detailed information on processors
  64. 64. Intel Microprocessor Performance
  65. 65. Summary: Important Points • Organization and architecture • Family architectures • Functions of a computer (data processing, control, data movement) • Birth of computers (ENIAC: decimal, IAS: binary), Mauchly and Eckert • Microprocessors (Intel 4004, 8008, 8080, 8086/16 bit, 80386/32 bit) • IAS instructions • von Neumann bottleneck • Increasing clock speed, making the bus wider, cache memory • Losers: e.g. Motorola microprocessors, Radio Shack • Denser transistors on a single chip (4 times every 3 years, by shrinking lines by 10%)