Advanced Computer Architectures – Part 2.1
Part 2.1 of the slides I wrote for the course "Advanced Computer Architectures", which I taught in the framework of the Advanced Masters Programme in Artificial Intelligence of the Catholic University of Leuven, Leuven (B)

Advanced Computer Architectures – Part 2.1 Presentation Transcript

  • 1. Advanced Computer Architectures – HB49 – Part 2.1 Vincenzo De Florio K.U.Leuven / ESAT / ELECTA
  • 2. © V. De Florio KULeuven 2003 — Course contents • Basic Concepts • Computer Design • Computer Architectures for AI • Computer Architectures in Practice
  • 3. Computer Design • Quantitative assessments • Instruction sets • Pipelining
  • 4. Computer design • First part of the course: a survey of computer history • Key aspect of this history: in the last 60 years computers have experienced formidable growth in performance and a huge decrease in cost • A €1,000 PC today provides its user with more performance, memory, and disk space than a $1M mainframe of the Sixties
  • 5. Computer design • How was this possible? • Through advances in computer technology and advances in computer design
  • 6. Computer design • The tasks of a computer designer: determine the key attributes for a new machine, e.g. design a machine that maximizes performance while keeping costs under control • Aspects: instruction set design, functional organization, logic design, implementation (to be defined later)
  • 7. Significant improvements • First 25 years: from both technology and design • From the Seventies: mainly from IC technology; the main concern was compatibility with the past (to save investments); compatibility at the machine-language level left no room for design improvements; 20–30% per year for mainframes and minis • Late Seventies: advent of the µP, with a higher rate (35% per year)
  • 8. Significant improvements: the µP • The µP is mass-produced → lower costs → significant changes in the computer marketplace • Higher-level language compatibility (no need for object-code compatibility) and the availability of standard, vendor-independent OSs (fewer risks and costs in producing a new architecture) allowed a new concept to develop: RISC architectures
  • 9. Significant improvements: RISC • RISC architectures: designed in the Eighties, on the market ca. '85 • Since then, a 50% performance improvement per year [Figure: performance 1987–1995, growth of 1.35×/yr vs 1.54×/yr; machines shown: Sun-4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, DEC AXP 3000, HP 9000/750, IBM Power 2/590, DEC 21064a, Sun UltraSparc]
  • 10. Technology Trends [Figure: relative performance, log scale, 1965–2000, for supercomputers, mainframes, minicomputers, and microprocessors]
  • 11. Computer design • The µP allowed a 50% yearly performance increase. How was that possible? • Enhanced capability for users: the fastest supercomputer of 1988 (Cray Y-MP) has approximately the same performance as the fastest 1993 workstation (IBM Power 2), at 1/10 of the price • Computers became more and more µP-based; mainframes were disappearing or becoming based on off-the-shelf µPs
  • 12. Computer design • Big consequence: no more market pressure for object-code compatibility → freedom from compatibility with old designs → a renaissance in computer design • Again, significant improvements from both technology and design: 50% performance growth per year!
  • 13. Computer design • The highest-performance µP of '95 is mainly a result of design improvements (1-to-5) • In this section we focus on the design techniques that allowed this state of affairs
  • 14. Performance • What aspects must be taken into account in order to reach higher performance? • How to choose between different alternatives? • Amdahl's law • Quantitative assessment
  • 15. Amdahl's law • Speed-up: S = (execution time for the entire task without the enhancement) / (execution time for the entire task using the enhancement when possible) • Amdahl's law on speed-up: the speed-up depends on the fraction of time that can be affected by the enhancement
  • 16. Amdahl's law • Let F be the fraction of time affected by the enhancement. For instance, F = 0.40 means that the original program would benefit from the enhancement for 40% of its execution time • What do we gain by introducing the enhancement? Exec-time_NEW = Exec-time_OLD × ((1 − F) + F / S_ENH), where S_ENH is the speedup in the enhanced mode. Hence S = Exec-time_OLD / Exec-time_NEW = 1 / ((1 − F) + F / S_ENH)
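As a check on the formula, Amdahl's law is easy to express as a small function. A minimal sketch in Python; the name `overall_speedup` and the sample values are mine, not from the slides:

```python
def overall_speedup(f, s_enh):
    """Amdahl's law: overall speedup when a fraction f of the
    original execution time is sped up by a factor s_enh."""
    return 1.0 / ((1.0 - f) + f / s_enh)

# F = 0.40, enhancement 10x faster -> overall speedup ~1.56
print(round(overall_speedup(0.40, 10), 2))    # 1.56
# Even an "infinitely" fast enhancement cannot beat 1 / (1 - F) ~ 1.67
print(round(overall_speedup(0.40, 1e9), 2))   # 1.67
```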
  • 17. Amdahl's law [Figure: overall speedup vs. S_ENH for F = 40%: S_ENH grows, but S_OVERALL does not keep pace, flattening toward its asymptote]
  • 18. Amdahl's law • Law of diminishing returns: the incremental improvement in speedup gained by an additional improvement in the performance of just a portion of the computation diminishes as improvements are added • lim_{S_ENH→∞} S = lim_{S_ENH→∞} 1 / ((1 − F) + F / S_ENH) = 1 / (1 − F) = S_MAX
  • 19. Amdahl's law • To reach a maximum speedup of 3, F must be at least 66%
  • 20. Amdahl's law… • “…can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost/performance. The goal, clearly, is to spend resources proportional to where time is spent.”
  • 21. Amdahl's law • Example 1 (p. 30, P&H): a method allows an improvement by a factor of 10, which can be exploited for 40% of the time: speedup_overall = 1 / ((1 − 0.4) + 0.4/10) = 1.56
  • 22. Amdahl's law • Example 2 (p. 31, P&H): 50% of the instructions of a given benchmark are floating-point instructions; FPSQR applies to 20% of the same benchmark • Alternative 1: extra hardware makes FPSQR 10 times faster: speedup_FPSQR = 1 / ((1 − 0.2) + 0.2/10) = 1.22 • Alternative 2: all FP instructions go 2 times faster: speedup_FP = 1 / ((1 − 0.5) + 0.5/2.0) = 1.33
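The two alternatives in Example 2 can be compared numerically. A Python sketch (variable names are mine):

```python
def overall_speedup(f, s_enh):
    # Amdahl's law: f = fraction of time affected, s_enh = local speedup
    return 1.0 / ((1.0 - f) + f / s_enh)

# Alternative 1: FPSQR (20% of the time) made 10x faster
alt1 = overall_speedup(0.2, 10)
# Alternative 2: all FP instructions (50% of the time) made 2x faster
alt2 = overall_speedup(0.5, 2.0)
print(round(alt1, 2), round(alt2, 2))   # 1.22 1.33
```

Speeding up the more frequently exercised fraction wins, even though its local speedup is smaller.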
  • 23. Quantitative assessment • CPUTIME(p) = time spent by the CPU to run program p • Clock cycle time = t_cc; clock rate = 1/t_cc • CPUTIME(p) = #clock cycles × t_cc = #clock cycles / clock rate • E.g.: clock cycle time = 2 ns ↔ clock rate = 500 MHz • #CC(p) = number of clock cycles spent in the execution of p
  • 24. Quantitative assessment • Instruction count: IC(c,p) = number of instructions that CPU c executed during the activity of program p • Often abbreviated IC(p) when the CPU is clear from the context
  • 25. Quantitative assessment • Clock cycles per instruction: CPI(p) = #CC(p) / IC(p), the average number of clock cycles needed to execute one instruction of p
  • 26. Quantitative assessment • CPUTIME(p) = #clock cycles × clock cycle time = #CC(p) × t_cc = IC(p) × CPI(p) × t_cc = IC(p) × CPI(p) / clock rate • We can influence the performance of a given program p by optimizing the three key variables IC(p), CPI(p), and clock rate
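The identity above fits in a one-line helper. A Python sketch; the instruction count is an illustrative value of mine, while the 500 MHz clock is the slide's own example:

```python
def cpu_time(ic, cpi, clock_rate_hz):
    """CPUTIME(p) = IC(p) x CPI(p) / clock rate."""
    return ic * cpi / clock_rate_hz

# e.g. 1e9 instructions at CPI = 2 on a 500 MHz clock:
print(cpu_time(1e9, 2.0, 500e6))   # 4.0 seconds
```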
  • 27. Quantitative assessment • CPU performance is equally dependent upon three characteristics • Clock rate (the higher, the better) • Clock cycles per instruction (the lower, the better) • Instruction count (the lower, the better)
  • 28. Quantitative assessment • CPU performance is equally dependent upon three characteristics • Clock rate (HW technology & organization) • Clock cycles per instruction (organization & instruction set architecture) • Instruction count (instruction set architecture & compiler technology) • Note: these technologies are not independent of each other!
  • 29. Quantitative assessment • CPU time = Seconds/Program = (Instructions/Program) × (Cycles/Instruction) × (Seconds/Cycle)

                   Inst. Count   CPI   Clock Rate
    Compiler           X         (X)
    Inst. Set          X          X
    Organization                  X        X
    Technology                             X
  • 30. Quantitative assessment • Decades-long challenge: optimizing CPUTIME(p) = IC(p) × CPI(p) / clock rate • This is a function of p! • The choice of benchmarks is important
  • 31. Quantitative assessment • Which methods to use? CPUTIME(p) = IC(p) × CPI(p) / clock rate • Method 1: increase the clock rate (note: independent of p!) • Method 2: methods that try to decrease IC(p) • Method 3: methods that try to decrease CPI(p) • Each factor is equally important, but some methods are more effective than others
  • 32. Quantitative assessment: how to calculate CPI? • CPI = Σ_{i=1}^{n} (CPI_i × IC_i) / Instr. Count = Σ_{i=1}^{n} CPI_i × (IC_i / Instr. Count) • IC_i = number of times instruction i is executed by p • CPI_i = average number of clock cycles for instruction i • CPI_i needs to be measured, not just read from a table in the reference manual! That is, we need to take the memory access time into account (cache misses do count… a lot)
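The weighted sum can be sketched directly. A Python sketch; the function name and the instruction mix below are invented for illustration, not measured data:

```python
def weighted_cpi(mix):
    """mix: iterable of (fraction_of_instruction_count, cpi_i) pairs;
    the fractions must sum to 1."""
    return sum(frac * cpi for frac, cpi in mix)

# hypothetical mix: 20% branches at 2 cycles, 80% other at 1 cycle
print(round(weighted_cpi([(0.20, 2), (0.80, 1)]), 2))   # 1.2
```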
  • 33. Quantitative assessment • Example 3: two alternatives for a conditional branch instruction • A: a CMP that sets a condition code (Z bit), followed by a JZ • B: a single instruction that does both CMP and JZ

    Arch. A            Arch. B
       LD R1, 0           LD R1, 0
    L: INC R1          L: INC R1
       CMP R1, 5          JRZ R1, 5, L
       JZ L               RET
       RET

    We assume that JZ and JRZ take 2 cycles and all other instructions take 1 cycle
  • 34. Quantitative assessment • 20% of the executed instructions are conditional jumps (instructions such as JZ or JRZ), 80% are other instructions • On A, each conditional jump is preceded by a CMP → on A, 20% are conditional jumps, 20% are CMPs, and 60% are other instructions • Because of the extra complexity in B, the clock of A is faster: CT_B = 1.25 × CT_A
  • 35. Quantitative assessment • CPI_A = Σ_i (IC_i × cycles_i) / IC_A = frac_BR,A × cycles_BR + frac_other,A × cycles_other = 20% × 2 + 80% × 1 = 1.2 • CPU_A = IC_A × CPI_A × CT_A = IC_A × 1.2 × CT_A • CPI_B = Σ_i (IC_i × cycles_i) / IC_B = frac_BR,B × cycles_BR + frac_other,B × cycles_other
  • 36. Quantitative assessment • On B one spares 20% of the instructions (the extra CMPs), hence frac_BR,B = 20 / (100 − 20) = 0.25 (25%); furthermore, IC_B = 0.8 × IC_A • Hence CPI_B = 0.25 × 2 + 0.75 × 1 = 1.25 • CPU_B = IC_B × CPI_B × CT_B = 0.8 IC_A × 1.25 × 1.25 CT_A = 1.25 × IC_A × CT_A • CPU_A = 1.2 × IC_A × CT_A, so A is faster (for which p?)
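The whole comparison fits in a few lines. A Python sketch, with IC_A and CT_A normalized to 1 (the variable names are mine):

```python
def weighted_cpi(mix):
    # mix: (fraction of instruction count, cycles) pairs
    return sum(frac * cycles for frac, cycles in mix)

ic_a, ct_a = 1.0, 1.0                            # normalized units
cpi_a = weighted_cpi([(0.20, 2), (0.80, 1)])     # 1.2
cpu_a = ic_a * cpi_a * ct_a

ic_b = 0.8 * ic_a                                # the CMPs disappear
ct_b = 1.25 * ct_a                               # B's clock is slower
cpi_b = weighted_cpi([(0.25, 2), (0.75, 1)])     # 1.25
cpu_b = ic_b * cpi_b * ct_b

print(round(cpu_a, 2), round(cpu_b, 2))          # 1.2 1.25 -> A wins
```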
  • 37. Performance • A straightforward enhancement is increasing the clock rate • The entire program benefits • It is also independent of the particular program, and of the efficiency of the compiler etc.
  • 38. Clock Frequency Growth Rate [Figure: clock rate (MHz), log scale, 1970–2005, from the i4004, i8008, i8080, i8086, i80286, and i80386 up to the Pentium and R10000] • About 30% growth per year
  • 39. Transistor Count Growth Rate [Figure: transistors per chip, log scale, 1970–2005, from the i4004 up to the R2000/R3000, Pentium, and R10000] • 100 million transistors on a chip in early 2000 • Transistor count grows much faster than clock rate
  • 40. Performance • Another important factor for performance: memory accesses and I/O (disk accesses)
  • 41. Memory • Semiconductor DRAM technology • Density: increases 60% per year (quadruples in 3 years) • Cycle time: improves far more slowly!

           Capacity         Speed
    Logic  2× in 3 years    2× in 3 years
    DRAM   4× in 3 years    1.4× in 10 years
    Disk   2× in 3 years    1.4× in 10 years

    Speed increases of memory and I/O have not kept pace with processor speed increases
  • 42. Memory size [Figure: DRAM bits per chip, log scale, 1970–2000]

    Year   Size (Mb)   Cycle time
    1980   0.0625      250 ns
    1983   0.25        220 ns
    1986   1           190 ns
    1989   4           165 ns
    1992   16          145 ns
    1996   64          120 ns
    2000   256         100 ns
  • 43. Basic definitions 1. Bandwidth: the rate at which data can be transferred; typically measured in bytes per second 2. Block size: the amount of data transferred per request; typically measured in bytes 3. Latency: the time between making a request (e.g. to read or write a block of data) and completing it; typically measured in seconds 4. Throughput: the number of requests that can be completed per unit time; typically measured in requests per second
  • 44. Memory • DRAM: the main memory of all computers • Commodity chip industry: no company has a >20% share • Packaged in SIMMs or DIMMs (e.g., 16 DRAMs per SIMM) • Capacity: 4×/3 years (60%/year), per Moore's Law • MB/$: +25%/year • Latency: −7%/year; bandwidth: +20%/year (so far) • SIMM = single in-line memory module, a small circuit board that can hold a group of memory chips (measured in bytes vs. bits); 32-bit path to memory • DIMM = dual in-line memory module; 64-bit path to memory • Source: www.pricewatch.com, 5/21/98
  • 45. Processor Limit: the DRAM Gap [Figure: relative performance, log scale, 1980–2000: CPU (“Moore's Law”) improving at 60%/yr vs. DRAM at 7%/yr; the processor–memory performance gap grows 50% per year]
  • 46. Memory Summary • DRAM: rapid improvements in capacity, MB/$, and bandwidth; slow improvement in latency • The processor–memory interface is a bottleneck to delivered bandwidth
  • 47. Disk Components
  • 48. Disk Components: Platters • Platters: the recording surfaces i. 1 to 8 inches in diameter (2.5 to 20 cm) ii. Stacked on a spindle: typical disks have 1–12 platters iii. Data can be stored on one or both surfaces iv. Spindle and platters rotate at 3600–10000 rpm (60–165 Hz) v. Recording density depends on applying a magnetic film with few defects vi. Rotation rate limited by bearings and power consumption
  • 49. Disk Components: Heads i. Heads: write and read data to and from the platters; data is stored as the presence or absence of magnetization ii. The head “floats” on an air film that rotates with the disk; the Bernoulli effect pulls the head toward the disk but not into it. A dust particle can cause a “head crash”, where the disk surface is scratched and any data on it is lost iii. Disk heads are manufactured using thin-film technology; advancing technology allows smaller heads and therefore more closely spaced tracks and bits
  • 50. Disk Components: Actuators i. Actuators: move the heads radially over the platters ii. The actuator arm needs to be light to move quickly iii. The actuator arm needs to be stiff to prevent flexing; smaller platters allow shorter arms, therefore lighter and stiffer ones iv. Actuators are limited by the power of the actuator motor and the weight and strength of the actuator components
  • 51. Disks: Data Layout • Each surface consists of concentric rings called tracks • Each track is divided into sectors; data is written to and read from the disk a whole sector at a time • The set of tracks at the same relative position on each surface forms a cylinder [Figure: cylinder]
  • 52. Three Components of Disk Access Time 1. Seek time: the time to move the heads to the desired cylinder (advertised as 8 to 12 ms; may be lower in real life) 2. Rotational latency: the time for the desired sector to arrive under the head (4.2 ms at 7200 RPM, 8.3 ms at 3600 RPM) 3. Transfer time: the time to read the data from the disk and send it over the I/O bus to the processor (2 to 12 MB per second) • Response time = Queue + Controller + Device service time
  • 53. Hard Disks • Disk latency = queueing time + controller time + seek time + rotation time + transfer time • Order-of-magnitude times for 4 KB transfers: average seek 8 ms or less; rotate 4.2 ms @ 7200 rpm; transfer 1 ms @ 7200 rpm
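The rotational component is just half a revolution on average, so it can be derived from the spindle speed alone. A Python sketch (function name is mine):

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    revolutions_per_ms = rpm / 60.0 / 1000.0
    return 0.5 / revolutions_per_ms

print(round(avg_rotational_latency_ms(7200), 1))   # 4.2
print(round(avg_rotational_latency_ms(3600), 1))   # 8.3
```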
  • 54. Hard Disks • Capacity: +60%/year (2× / 1.5 yrs) • Transfer rate (bandwidth): +40%/year (2× / 2.0 yrs) • Rotation + seek time: −8%/year (halved in 10 yrs) • MB/$: >60%/year (2× / <1.5 yrs) • Latency per access = queueing time + controller time + seek time + rotation time + size / bandwidth per byte • Source: Ed Grochowski, 1996, “IBM leadership in disk drive technology”; www.storage.ibm.com/storage/technolo/grochows/grocho01.htm
  • 55. Hard disks • 1973: 1.7 Mbit/sq. in, 140 MBytes • 1979: 7.7 Mbit/sq. in, 2,300 MBytes
  • 56. Hard Disks Areal Density [Figure: areal density, log scale, 1970–2000] • 1989: 63 Mbit/sq. in, 60,000 MBytes • 1997: 1450 Mbit/sq. in, 1600 MBytes • 1997: 3090 Mbit/sq. in, 8100 MBytes
  • 57. Hard Disks • Continued advance in capacity (60%/yr) and bandwidth (40%/yr) • Slow improvement in seek and rotation (8%/yr) • Time to read a whole disk:

    Year   Sequentially   Randomly
    1990   4 minutes      6 hours
    2000   12 minutes     1 week
  • 58. Memory/Disk Summary • Memory: DRAM shows rapid improvements in capacity, MB/$, and bandwidth; slow improvement in latency • Disk: continued advance in capacity, cost/bit, and bandwidth; slow improvement in seek and rotation • Huge gap between the CPU and external memories • How to address this problem? The classical way: memory hierarchies
  • 59. Memory hierarchies • Axiom of the HW designer: smaller is faster • Larger memories → larger signal delay, and more levels are required to decode addresses • In a smaller memory the designer can use more power per cell → shorter access times • Crucial features for performance: huge bandwidth (in MB/sec) and short access times • Principle of locality: the data most recently used is very likely to be accessed again in the near future (temporal locality); memory cells close to the most recently used one are likely to be accessed in the near future (spatial locality) • Combining the above with Amdahl's law, the “best” enhancement is using hierarchies of memories
  • 60. Typical memory hierarchy ('95): CPU registers ↔ cache ↔ memory bus ↔ main memory ↔ I/O bus ↔ I/O devices

    Level        Size     Speed
    Registers    200 B    5 ns
    Cache        64 KB    10 ns
    Memory       32 MB    100 ns
    I/O (disk)   2 GB     5 ms
  • 61. Memory hierarchies [Figure: the memory hierarchy and its design issues: L1/L2 caches (VLSI); DRAM (interleaving, bus protocols); input/output and storage (disks, WORM, tape, RAID, emerging technologies); coherence, bandwidth, latency; the instruction set architecture (addressing, protection, exception handling); pipelining and instruction-level parallelism (hazard resolution, superscalar, reordering, prediction, speculation, vector, DSP)]
  • 62. Memory hierarchies • Registers: the smallest and fastest memory • Size: less than 1 KB • Access time: 2–5 ns • Bandwidth: 4000–32000 MB/sec • Managed by the compiler (or the assembly programmer), e.g. register int a; • Special purpose vs. general purpose • Monolithic or double-shaped, e.g. Rx = Rl + Rh • Backed in cache • Implemented via custom memory with multiple ports
  • 63. Memory hierarchies • Cache = small, fast memory located close to the CPU • The cache holds the most recently accessed code or data • Managed by HW: no way to say “put these data in cache” at the SW level (new research: cache-conscious data structures) • Size: less than 4 MB • Access time: 3–10 ns • Bandwidth: 800–5000 MB/sec • Backed in main memory • Implemented with (on- or off-chip) CMOS SRAM
  • 64. Memory hierarchies • Cache terminology: cache hit, cache miss, cache block • Cache hit: the CPU has been able to find the requested data in the cache • Cache miss: the requested data is not in the cache • Cache block: the fixed-size buffer used to load a portion of memory into the cache • A cache miss blocks the CPU until the corresponding memory block gets cached
  • 65. Memory hierarchies • Virtual memory: same principles behind the use of the cache, but implemented between main memory and disk storage • At any point in time, not all the data referenced by p need to be in main memory • The address space is partitioned into fixed-size blocks: pages • A page is either in memory or on disk • When the CPU references an item within a page:

    if ( Check-if-in-cache() == CACHE_MISS )
        if ( Check-if-in-memory() == MEM_MISS )
            PageFault(); /* loads the page in memory */

    On a page fault the CPU doesn't stall – it switches to other tasks
  • 66. Cache performance • Example: speedup using a cache • The cache is 10 times faster than main memory and is used in 90% of the cases: speedup = 1 / ((1 − 0.9) + 0.9/10) = 5.3
  • 67. Cache performance • CPU time = (CPU clock cycles + memory stall cycles) × clock cycle time • Memory stall cycles = #misses × miss penalty = IC × misses per instruction × miss penalty = IC × memory references per instruction × miss rate × miss penalty
  • 68. Cache performance • Example (P&H, p. 43): a computer has CPI = 2 when all data is in cache; memory access is only required by load and store instructions (40% of the total); the miss penalty is 25 clock cycles; the cache miss rate is 2% • How much faster would the machine be if no cache miss ever occurred? CPU_all-hits = (CPU clock cycles + memory stall cycles) × clock cycle time = (IC × CPI + 0) × clock cycle time = IC × 2 × clock cycle time
  • 69. Cache performance • How fast is the machine when cache misses do occur? 1. Compute the memory stall cycles (msc): msc = IC × memory references per instruction × miss rate × miss penalty = IC × (1 + 0.4) × 0.02 × 25 = IC × 0.7 (the 1 accounts for the instruction access, the 0.4 for the data accesses) 2. Compute the total performance: CPU_cache = (CPU clock cycles + msc) × clock cycle time = (IC × 2 + IC × 0.7) × clock cycle time = 2.7 × IC × clock cycle time
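The arithmetic of this example can be replayed directly. A Python sketch using the slide's numbers (variable names are mine):

```python
cpi_base = 2.0            # CPI when every access hits the cache
mem_refs_per_instr = 1.4  # 1 instruction fetch + 0.4 data accesses
miss_rate = 0.02
miss_penalty = 25         # clock cycles per miss

stall_cycles_per_instr = mem_refs_per_instr * miss_rate * miss_penalty
cpi_effective = cpi_base + stall_cycles_per_instr

print(round(stall_cycles_per_instr, 2))      # 0.7
print(round(cpi_effective, 2))               # 2.7
print(round(cpi_effective / cpi_base, 2))    # 1.35 -> the miss-free machine is 35% faster
```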
  • 70. Computer Design • Quantitative assessments • Instruction sets • Pipelining
  • 71. Computer design • Instruction-set architecture: the architecture at the machine level; the boundary between SW and HW • Organization: high-level aspects such as the memory system, bus structure, and internal CPU design • Hardware: the specifics of a machine, e.g. detailed logic design and packaging technology • Architecture = ISA + Organization + Hardware
  • 72. Instruction Sets • IS = instruction set = the architecture of the machine language • IS classification • Role of the compilers • DLX
  • 73. Computer Design → IS • IS classification • Role of the compilers • DLX
  • 74. Computer Design → IS → IS Classification • Key: the type of internal storage in the CPU • Three main classes: stack architectures, accumulator architectures, general-purpose register architectures
  • 75. Computer Design → IS → IS Classification → Stack A.
• Stack architecture: operands are implicitly referred to – the top two items on the system stack
• Example: C = A + B: 1. PUSH A  2. PUSH B  3. ADD
• ADD = PUSH (POP + POP)
(2.1/78)
  • 76. Computer Design → IS → IS Classification → Stack A.
• Example: C = A + B: 1. PUSH A  2. PUSH B  3. ADD
• ADD = PUSH (POP + POP) = PUSH (B + POP)
(2.1/79)
  • 77. Computer Design → IS → IS Classification → Stack A.
• Example: C = A + B: 1. PUSH A  2. PUSH B  3. ADD
• ADD = PUSH (POP + POP) = PUSH (B + POP) = PUSH (B + A); the stack now holds B+A
(2.1/80)
  • 78. Computer Design → IS → IS Classification → Stack A.
• Example: C = A + B: 1. PUSH A  2. PUSH B  3. ADD  4. POP C
• C = TOP STACK = A+B
• An example: the ARIEL virtual machine (Part 1, Slides 91 –)
(2.1/81)
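The four steps above can be simulated with a toy stack machine in C (a didactic sketch; `push`, `pop` and `add` are our names, not a real ISA):

```c
#define STACK_MAX 16

typedef struct { int data[STACK_MAX]; int top; } Stack;

void push(Stack *s, int v) { s->data[s->top++] = v; }
int  pop(Stack *s)         { return s->data[--s->top]; }

/* ADD = PUSH (POP + POP) */
void add(Stack *s) { int b = pop(s), a = pop(s); push(s, a + b); }

/* C = A + B on a stack architecture */
int stack_add(int A, int B)
{
    Stack s = { .top = 0 };
    push(&s, A);      /* 1. PUSH A */
    push(&s, B);      /* 2. PUSH B */
    add(&s);          /* 3. ADD    */
    return pop(&s);   /* 4. POP C  */
}
```

Note how the ADD instruction names no operands at all: both arguments are implicit, which keeps the instruction encoding compact.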
  • 79. Computer Design → IS → IS Classification → Accumulator A.
• Accumulator architectures: a special register (the accumulator) plays the role of an implicit argument
• Example: C = A + B
1. LOAD A   ; let Acml = A
2. ADD B    ; let Acml = Acml + B
3. STORE C  ; let C = Acml
(2.1/82)
  • 80. Computer Design → IS → IS Classification → Register A.
• General-purpose register architectures: explicit operands only, either registers or memory locations
• Two flavors:
– Register-memory architectures (RMA)
– Register-register architectures (RRA)
• Example: C = A + B
RMA:  Load R1, A
      Add R1, B      ; in C, R1 += B
      Store C, R1
RRA:  Load R1, A
      Load R2, B
      Add R3, R1, R2
      Store C, R3
(2.1/83)
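The two flavors can be mimicked in C, with an array standing in for memory and plain variables for registers (a didactic sketch; the point is the instruction counts, and that RMA destroys a register operand):

```c
/* mem[0]=A, mem[1]=B, mem[2]=C; R[] are the registers. */
int mem[4];
int R[4];

/* Register-memory machine: 3 instructions, R1 is destroyed. */
int rma_instructions(void)
{
    R[1] = mem[0];        /* Load  R1, A              */
    R[1] += mem[1];       /* Add   R1, B  (R1 += B)   */
    mem[2] = R[1];        /* Store C, R1              */
    return 3;             /* instruction count        */
}

/* Register-register machine: 4 instructions, all uniform. */
int rra_instructions(void)
{
    R[1] = mem[0];        /* Load  R1, A              */
    R[2] = mem[1];        /* Load  R2, B              */
    R[3] = R[1] + R[2];   /* Add   R3, R1, R2         */
    mem[2] = R[3];        /* Store C, R3              */
    return 4;             /* higher IC, similar-size instructions */
}
```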
  • 81. Computer Design → IS → IS Classification → RRA
• Some old machines used stack or accumulator architectures – for instance, the T800 and the 6502/6510
• Today the de facto standard is RRA:
– Regs are fast
– Regs are easier to use (compiler writers): they do not require dealing with associativity issues – stacks do!
– Regs can hold variables:
  register int I;
  for (I = 0; I < 1000000; I++) {
      do_stgh(I);
      …
  }
– Using regs you don’t need a memory address
(2.1/84)
  • 82. Computer Design → IS → IS Classification → Register A.
• RRA: no memory operands
– All instructions are similar in size → take a similar number of clocks to execute (very useful property… see later)
– No side effects
– Higher instruction count
• RMA: one memory operand
– One load can be spared
– A register operand is destroyed (R += B)
– Clocks per instruction vary by operand location
• Memory-memory:
– Compact
– Large variation of work per instruction
– Large variation in instruction size
(2.1/85)
  • 83. Computer Design → IS → Memory addressing
• How is memory organized? What does it mean, e.g., to read memory at address 512? What do we read?
– Bytes, half words, words, double words
• How are consecutive bytes stored in a word? (Assumption: a word is 4 bytes)
– Little endian: &word = &LSB
– Big endian: &word = &MSB
– XDR routines are needed to exchange data (&word = address of word)
(2.1/86)
  • 84. A memory model for didactics
• Memory can be thought of as a finite, long array of cells, each of size 1 byte: 0 1 2 3 4 5 6 7 …
• Each cell has a label, called address, and a content, i.e. the byte stored into it
• Think of a chest of drawers, with a label on each drawer and possibly something in it
(2.1/87)
  • 85. A memory model for didactics
[Figure: a chest of drawers – each drawer carries an address label (1–4) and holds a content]
(2.1/88)
  • 86. A memory model for didactics
• The character * has a special meaning: it refers to the contents of a cell
• For instance, *(1) – here it means we’re inspecting the contents of a cell (we open a drawer and see what’s in it)
(2.1/89)
  • 87. A memory model for didactics
• The character * has a special meaning: it refers to the contents of a cell
• For instance, *(1) – here it means we’re writing new contents into a cell (we open a drawer and change its contents)
(2.1/90)
  • 88. A memory model for didactics
• Memory is (often) byte addressable, though it is organized into small groups of bytes: the machine word
• A common size for the machine word is 4 bytes (32 bits)
• Two possible organizations for the bytes in a word:
– Little endian
– Big endian
(2.1/91)
  • 89. Little endian versus Big endian
• Big endian (Motorola): the MSB sits at the lowest address – for a word at addresses 0–3, byte 0 holds the MSB and byte 3 the LSB (likewise for the word at 4–7)
• Little endian (Intel): the LSB sits at the lowest address – byte 0 holds the LSB and byte 3 the MSB
(2.1/92)
  • 90. Little endian versus Big endian
• Problem: communication between the two
• The stored bytes are the same, but interpreted under the other byte order the values differ:
– bytes 00 00 00 01: value 1 on a big-endian machine, 16777216 on a little-endian one
– bytes 00 00 00 10 (hex): value 16 on a big-endian machine, 268435456 on a little-endian one
(2.1/93)
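The interpretation mismatch is easy to reproduce in C with explicit loaders (a sketch; `load_be32` and `load_le32` are our helper names):

```c
#include <stdint.h>

/* Big endian: b[0] is the MSB. */
uint32_t load_be32(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Little endian: b[0] is the LSB. */
uint32_t load_le32(const uint8_t b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}
```

The same four bytes 00 00 00 01 load as 1 big-endian but 16777216 little-endian, which is exactly why XDR-style conversion routines are needed on the wire.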
  • 91. Computer Design → IS → Memory addressing
• Alignment is mandatory on some machines
– Object O; int t = sizeof(O);
– ALIGNED(O) means &O modulo t is 0 – “access to O is aligned”
– For instance, if access to integers (4 bytes) is aligned, then an integer can only be stored at addresses divisible by 4
– Alignment is sometimes required because it prevents hardware complications
– Alignment implies faster access
(2.1/94)
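The slide's definition of "aligned" translates directly into a one-line check (a sketch; `is_aligned` is our helper name):

```c
#include <stdint.h>

/* "Access to O is aligned" when &O modulo its size is 0
 * (the slide's definition). */
int is_aligned(const void *p, uintptr_t size)
{
    return ((uintptr_t)p % size) == 0;
}
```

For example, address 16 is 4-byte aligned while address 18 is not; a compiler will lay out a `uint32_t` so that every access satisfies this check.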
  • 92. Computer Design → IS → Memory addressing
• Addressing modes: ways to specify the address of an object in memory
• An addressing mode can specify:
– A constant
– A register
– A memory location
• In what follows:
  A += B  means A = A + B
  *(x)    means: return the contents of memory at address x
  x++     means: “at the end, let x = x + 1”
  --x     means: “at the beginning, let x = x – 1”
  Rx      means: register x
(2.1/95)
  • 93. Computer Design → IS → Memory addressing
  Mode           Example                Meaning
  Register       Add R4, R3             R4 += R3
  Immediate      Add R4, #3             R4 += 3
  Displacement   Add R4, 100(R1)        R4 += *(100 + R1)
  Indirect       Add R4, (R1)           R4 += *(R1)
  Indexed        Add R4, (R1 + R2)      R4 += *(R1 + R2)
  Absolute       Add R4, (100)          R4 += *(100)
  Deferred       Add R4, @(R3)          R4 += *(*(R3))
  Autoincrement  Add R4, (R3)+          indirect, then R3++
  Autodecrement  Add R4, -(R2)          R2--, then indirect
  Scaled         Add R4, 100(R2)[R3]    R4 += *(100 + R2 + R3 × d)
  d = size of the addressed data (1, 2, 4, 8, or 16)
(2.1/96)
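Most of these modes map one-to-one onto C expressions. A didactic sketch, indexing a word array rather than byte-addressed memory (all names are ours):

```c
/* mem[] plays the memory (word-addressed for simplicity);
 * the R1..R3 parameters play the registers. */
int mem[256];

int displacement(int R1)          { return mem[100 + R1]; }          /* 100(R1)     */
int indirect(int R1)              { return mem[R1]; }                /* (R1)        */
int indexed(int R1, int R2)       { return mem[R1 + R2]; }           /* (R1 + R2)   */
int absolute(void)                { return mem[100]; }               /* (100)       */
int deferred(int R3)              { return mem[mem[R3]]; }           /* @(R3)       */
int scaled(int R2, int R3, int d) { return mem[100 + R2 + R3 * d]; } /* 100(R2)[R3] */
```

A real machine addresses bytes, so the scaled mode's factor d would be the operand size; here the array index absorbs it.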
  • 94. Computer Design → IS → Memory addressing
• Addressing modes can reduce IC
• Complex addressing modes increase the complexity of the hardware → can increase CPI
• Displacement, immediate and deferred represent between 75% and 99% of addressing modes (experiments done with TeX, spice, and gcc)
• IC(p) = number of instructions that the CPU executed during the activity of program p
• CPI(p) = clock cycles per instruction = #CC(p) / IC(p) = average number of clock cycles needed to execute one instruction of p
(2.1/97)
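The two definitions translate into trivial formulas (a sketch, with our function names):

```c
/* CPI(p) = #CC(p) / IC(p): average clock cycles per executed instruction. */
double cpi(double clock_cycles, double instruction_count)
{
    return clock_cycles / instruction_count;
}

/* Classic performance equation: CPU time = IC * CPI * clock cycle time. */
double exec_time(double ic, double cpi_value, double cycle_time)
{
    return ic * cpi_value * cycle_time;
}
```

So a richer addressing mode that folds a load into one instruction lowers IC but may raise CPI; the product IC × CPI × cycle time is what actually matters.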
  • 95. Computer Design → IS → Operations
• Arithmetical and logical (add, and, sub…)
• Data transfer (move, store)
• Control (br, jmp, call, ret, iret…)
• System (virtual memory management…)
• Floating point (add, mul, …)
• Decimal (decimal add, decimal mul…)
• String (str move, str cmp, str search)
• Graphics (pixel operations)
• Benchmarks show that often a small set of simple instructions accounts for something like 95% of instructions executed (see Fig. 2.11, P&H p. 81)
(2.1/98)
  • 96. Computer Design → IS → Operations
• Control flow instructions:
– Branch (conditional change)
– Jump (unconditional change)
– Procedure calls
– Procedure returns
• Most of the comparisons in conditional branches are simple “==”, “!=” with 0!
• In some cases, the address to go to is only known at run-time:
– “Return” uses a stack
– Switch statements
– Dynamic libraries
(2.1/99)
  • 97. Computer Design → IS → Operands
• When we say, e.g., “Add R1, #5”, do we work with bytes? Half-words? Words?
• How do we specify the type of the operand?
1. Classical method: the type of the operand is part of the opcode
• The Add family is coded as ffff…fffvv, where the f are fixed bits and the v are bits that specify the type
(2.1/100)
  • 98. Computer Design → IS → Operands and types
• Example: Add family = 10110101000100vv
  1011010100010000 = Add float words
  1011010100010001 = Add words
  1011010100010010 = Add half-words
  1011010100010011 = Add bytes
2. Old-fashioned method: operand = data + tag
– The tag describes a type
– The tag is interpreted by HW
– The operation is chosen accordingly
(2.1/101)
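Decoding such an encoding is a mask-and-compare. A sketch using the slide's bit pattern (the vv-to-type mapping is read off the list above; all C names are ours):

```c
/* Add family = 10110101000100vv: the low two bits select the operand type. */
enum operand_type { OP_FLOAT_WORDS = 0, OP_WORDS = 1, OP_HALF_WORDS = 2, OP_BYTES = 3 };

#define ADD_FAMILY  0xB510u   /* 1011010100010000 */
#define FAMILY_MASK 0xFFFCu   /* every bit except the vv field */

/* An opcode belongs to the Add family iff its fixed bits match. */
int is_add(unsigned opcode)
{
    return (opcode & FAMILY_MASK) == ADD_FAMILY;
}

/* The vv field picks the operand type. */
enum operand_type operand(unsigned opcode)
{
    return (enum operand_type)(opcode & 0x3u);
}
```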
  • 99. Computer Design → IS → Operands and types
• Which types to support?
• Old-fashioned solution: all (bytes, half-words, words, f.p., double words, double-precision f.p., …)
• Current trend: only operations on items greater than or equal to 32 bits
• On the DEC Alpha one needs multiple instructions to access objects smaller than 32 bits
(2.1/102)
  • 100. Computer Design → IS → Operands and types
• Floating point numbers: IEEE standard 754
• In the early ’80s, each manufacturer had its own f.p. representation
• Sometimes string operations are available (strcmp, strcpy…)
• Sometimes BCD is used to code numbers:
– Four bits are used to code a decimal digit
– A byte codes two decimal digits
– Functions for “packing” and “unpacking” are required
– It is unclear if this will stay in the future
(2.1/103)
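The packing/unpacking helpers the slide mentions might look like this in C (a sketch; the names are ours):

```c
/* Packed BCD: four bits per decimal digit, two digits per byte. */
unsigned char bcd_pack(int high_digit, int low_digit)
{
    return (unsigned char)((high_digit << 4) | low_digit);
}

void bcd_unpack(unsigned char byte, int *high_digit, int *low_digit)
{
    *high_digit = byte >> 4;
    *low_digit  = byte & 0x0F;
}
```

So decimal 42 packs into the byte 0x42 — the hex spelling of a packed-BCD byte reads as its decimal digits, which is the whole appeal of the format.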
  • 101. Computer Design → IS: • IS Classification → Role of the compilers • DLX (2.1/104)
  • 102. Computer Design → IS → Role of the compiler
• In the past, the role of assembly language was crucial: architectural decisions aimed at easing assembly-language programming
• Now the user interface is a high-level language (C, C++, Java…): the user interfaces the machine via the HLL, though the machine actually executes some lower-level code
• This lower-level code is produced by a compiler:
– The role of the compiler is fundamental
– The IS architecture needs to take the compiler into strong account
(2.1/105)
  • 103. Computer Design → IS → Role of the compiler
• Goals of the compiler writer:
– Correctness
– Performance
– …Fast compilation, debugging support, …
• Strategy for writing a compiler: use a number of “passes”, from high-level structures down to lower levels, until machine level
– This way complexity is decomposed into smaller blocks
– Optimizing becomes more difficult
(2.1/106)
  • 104. Computer Design → IS → Role of the compiler
  Function                                                Dependencies
  Front-end: language → common intermediate form          D(language)
  HL Opt: loop transformations, function inlining…        D(language)
  Global Opt: register allocation…                        D(language), D(machine)
  Code generator: instruction selection, D(machine) opt.  D(machine)
(2.1/107)
  • 105. Computer Design → IS → Role of the compiler
• HL optimizations: source-level optimizations (code → code’)
• Local optimizations: basic-block optimizations
• Global optimizations: loop optimization and basic-block optimizations
• Machine-dependent optimization: using low-level architectural knowledge
• Basic block = a straight-line code fragment
(2.1/108)
  • 106. Computer Design → IS → Role of the compiler
• Compilers have different optimization levels: -O1 .. -On
• Optimization can have a big impact on instruction count → on performance
(2.1/109)
  • 107. Computer Design → IS → Role of the compiler (2.1/110)
  • 108. Computer Design → IS → Role of the compiler
• In some cases, though, optimization may be counterproductive!
• This happens because there might be conflicts between local and global optimization tasks
• Example (the same expression appears twice):
  a = sqrt(x*x + y*y) + f()… ;
  b = sqrt(x*x + y*y) + g()…;
• Idea:
  tmp = sqrt(x*x + y*y);
  a = tmp + f() …;
  b = tmp + g() …;
(2.1/111)
  • 109. Computer Design → IS → Role of the compiler
• Effective, but only if tmp can be stored in a register
• No register → in memory → cache misses → … bad performance
• The problem is:
– When the compiler performs, e.g., code transformations like in the example, it does not know whether a register will actually be available
– This will only become clear later (at global optimization level)
• (Phase ordering problem)
(2.1/112)
  • 110. Computer Design → IS → Role of the compiler
• The key resource is the register file
• “Intelligent” register allocation techniques are a must
• Current solution: graph coloring (a graph whose nodes are the candidates for allocation to a register)
• Optimal coloring is NP-complete, though effective heuristic algorithms exist
(2.1/113)
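A sketch of the idea using the classic greedy-coloring heuristic (one of the effective heuristics alluded to above; real allocators such as Chaitin-style colorers are considerably more involved):

```c
#define N 4   /* allocation candidates (e.g. live ranges) */

/* interference[i][j] = 1 if candidates i and j are live at the same
 * time and therefore cannot share a register.
 * Greedily give each node the smallest "color" (register) not used
 * by an already-colored neighbor; returns the number of registers used. */
int greedy_color(const int interference[N][N], int color[N])
{
    int max_color = 0;
    for (int i = 0; i < N; i++) {
        int used[N + 1] = {0};
        for (int j = 0; j < i; j++)
            if (interference[i][j])
                used[color[j]] = 1;
        int c = 0;
        while (used[c]) c++;          /* smallest free register */
        color[i] = c;
        if (c + 1 > max_color) max_color = c + 1;
    }
    return max_color;
}
```

On a chain of four interfering-in-pairs candidates, two registers suffice; when the returned count exceeds the machine's register file, some candidates must be spilled to memory.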
  • 111. Computer Design → IS → Role of the compiler
• A special class of compilers – algorithm-driven software generation
– The FFTW approach: a software generation system based on symbolic computation
– Objective Caml: a sort of FFT compiler that generates optimal C code via symbolic computing
– Possible future steps (project works, theses…): extending the approach down to code generation for, e.g., the TI ’C67 DSP and other VLIW CPUs
(2.1/114)
  • 112. Exam of 16 Jan 2002
• A program is composed of three classes of instructions: i1 (integer instructions), i2 (load-store instructions), and i3 (floating-point instructions)
• The three classes are responsible for r1 = 60%, r2 = 30% and r3 = 10% of the overall execution time, respectively
• You can choose between three levels of optimisation on your computer: O1, O2, and O3. O1 optimises i1, O2 optimises i2, and O3 optimises i3
• The corresponding enhancements would be e1 = 2, e2 = 3, e3 = 10
• Suppose you can only choose one of the three levels of optimisation. Which one would you choose? Justify your choice
(2.1/115)
  • 113. Solution
• r1 = 60%  r2 = 30%  r3 = 10%
• e1 = 2  e2 = 3  e3 = 10
• S = Exec-timeOLD / Exec-timeNEW = 1 / ((1 − r) + r / e)
• s1 = 1.42857  s2 = 1.25  s3 = 1.0989
• s1 is the largest speedup → choose O1
(2.1/116)
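The solution formula (Amdahl's law) reproduces the three speedups directly (a sketch; `speedup` is our function name):

```c
/* Amdahl's law: S = 1 / ((1 - r) + r / e), where r is the fraction of
 * execution time affected and e the enhancement factor for that fraction. */
double speedup(double r, double e)
{
    return 1.0 / ((1.0 - r) + r / e);
}
```

speedup(0.6, 2) ≈ 1.42857, speedup(0.3, 3) = 1.25, speedup(0.1, 10) ≈ 1.0989: the modest enhancement applied to the dominant fraction wins, hence O1.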