Memory Hierarchy
Transcript

  • 1. Memory. Eri Prasetyo Wibowo, http://eri.staff.gunadarma.ac.id
  • 2. Semiconductor Memory Types
  • 3. Semiconductor Memory: RAM • Misnamed, as all semiconductor memory is random access • Read/write • Volatile • Temporary storage • Static or dynamic
  • 4. (figure slide, no text)
  • 5. Random-Access Memory (RAM). Key features • RAM is packaged as a chip • The basic storage unit is the cell (one bit per cell) • Multiple RAM chips form a memory. Static RAM (SRAM) • Each cell stores a bit with a six-transistor circuit • Retains its value indefinitely as long as power is supplied • Relatively immune to electrical disturbances such as noise • Faster and more expensive than DRAM. Dynamic RAM (DRAM) • Each cell stores a bit with a capacitor and a transistor • The value must be refreshed every 10-100 ms • Sensitive to disturbances • Slower and cheaper than SRAM.
  • 6. SRAM vs DRAM summary
          Tran. per bit   Access time   Persist?   Sensitive?   Cost   Applications
    SRAM  6               1X            Yes        No           100X   Cache memories
    DRAM  1               10X           No         Yes          1X     Main memories, frame buffers
  • 7. Dynamic RAM • Bits stored as charge in capacitors • Charges leak • Need refreshing even when powered • Simpler construction • Smaller per bit • Less expensive • Need refresh circuits • Slower • Main memory • Essentially analogue – level of charge determines value
  • 8. (figure slide, no text)
  • 9. DRAM Operation • Address line active when bit read or written – transistor switch closed (current flows). Write • Voltage to bit line – high for 1, low for 0 • Then signal address line – transfers charge to capacitor. Read • Address line selected – transistor turns on • Charge from capacitor fed via bit line to sense amplifier – compares with reference value to determine 0 or 1 • Capacitor charge must be restored
  • 10. Conventional DRAM organization. d x w DRAM: • dw total bits organized as d supercells of size w bits. (figure: 16 x 8 DRAM chip, a 4 x 4 array of supercells, memory controller sending a 2-bit addr and receiving 8-bit data, internal row buffer)
  • 11. Reading DRAM supercell (2,1). Step 1(a): Row access strobe (RAS) selects row 2. Step 1(b): Row 2 copied from DRAM array to row buffer. (figure: 16 x 8 DRAM chip, RAS = 2 on the addr lines, row 2 held in the internal row buffer)
  • 12. Reading DRAM supercell (2,1). Step 2(a): Column access strobe (CAS) selects column 1. Step 2(b): Supercell (2,1) copied from the row buffer to the data lines, and eventually back to the CPU. (figure: 16 x 8 DRAM chip, CAS = 1 on the addr lines, supercell (2,1) on the 8-bit data lines) (A short sketch of this two-step read follows below.)
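A minimal sketch of the two-step RAS/CAS read described on slides 10-12, using a toy 16 x 8 organization; the class and method names (ToyDRAM, read_supercell) are illustrative, not from the slides:

```python
import math

# Toy d x w DRAM: d supercells of w bits, laid out as a square array and
# addressed by (row, col). A read is RAS (copy row to buffer) then CAS.
class ToyDRAM:
    def __init__(self, d=16, w=8):
        self.rows = self.cols = math.isqrt(d)         # 16 supercells -> 4 x 4 array
        self.cells = [[0] * self.cols for _ in range(self.rows)]
        self.row_buffer = None

    def read_supercell(self, i, j):
        # Step 1: RAS = i copies the whole row into the internal row buffer.
        self.row_buffer = list(self.cells[i])
        # Step 2: CAS = j selects one supercell from the buffer and drives
        # it onto the data lines (here, simply returned to the caller).
        return self.row_buffer[j]

dram = ToyDRAM()
dram.cells[2][1] = 0b10110011          # pretend supercell (2,1) holds one byte
print(dram.read_supercell(2, 1))       # 179
```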
  • 13. Memory modules. addr (row = i, col = j): supercell (i,j). 64 MB memory module consisting of eight 8M x 8 DRAMs (DRAM 0 ... DRAM 7). Each DRAM supplies one byte (bits 0-7, 8-15, ..., 56-63) of the 64-bit doubleword at main memory address A, which the memory controller assembles and sends to the CPU chip.
  • 14. Enhanced DRAMs. All enhanced DRAMs are built around the conventional DRAM core. • Fast page mode DRAM (FPM DRAM) – access contents of a row with [RAS, CAS, CAS, CAS, CAS] instead of [(RAS,CAS), (RAS,CAS), (RAS,CAS), (RAS,CAS)]. • Extended data out DRAM (EDO DRAM) – enhanced FPM DRAM with more closely spaced CAS signals. • Synchronous DRAM (SDRAM) – driven with rising clock edge instead of asynchronous control signals. • Double data-rate synchronous DRAM (DDR SDRAM) – enhancement of SDRAM that uses both clock edges as control signals. • Video RAM (VRAM) – like FPM DRAM, but output is produced by shifting the row buffer; dual ported (allows concurrent reads and writes).
  • 15. Static RAM • Bits stored as on/off switches • No charges to leak • No refreshing needed when powered • More complex construction • Larger per bit • More expensive • Does not need refresh circuits • Faster • Cache • Digital – uses flip-flops
  • 16. (figure slide, no text)
  • 17. Static RAM Structure
  • 18. Static RAM Operation • Transistor arrangement gives a stable logic state. State 1 • C1 high, C2 low • T1 T4 off, T2 T3 on. State 0 • C2 high, C1 low • T2 T3 off, T1 T4 on. • Address line transistors T5, T6 act as switches • Write – apply value to B and its complement to B' • Read – value is on line B
  • 19. SRAM vs DRAM • Both volatile – power needed to preserve data • Dynamic cell – simpler to build, smaller – more dense – less expensive – needs refresh – larger memory units • Static – faster – cache
  • 20. Nonvolatile memories. DRAM and SRAM are volatile memories • Lose information if powered off. Nonvolatile memories retain their value even if powered off. • Generic name is read-only memory (ROM). • Misleading because some ROMs can be read and modified. Types of ROMs • Programmable ROM (PROM) • Erasable programmable ROM (EPROM) • Electrically erasable PROM (EEPROM) • Flash memory. Firmware • Program stored in a ROM – boot time code, BIOS (basic input/output system) – graphics cards, disk controllers.
  • 21. Read Only Memory (ROM) • Permanent storage – nonvolatile • Microprogramming (see later) • Library subroutines • Systems programs (BIOS) • Function tables
  • 22. Types of ROM • Written during manufacture – very expensive for small runs • Programmable (once) – PROM – needs special equipment to program • Read "mostly" – Erasable Programmable (EPROM), erased by UV – Electrically Erasable (EEPROM), takes much longer to write than read – Flash memory, erase whole memory electrically
  • 23. Organisation in detail • A 16 Mbit chip can be organised as 1M of 16-bit words • A bit-per-chip system has 16 lots of 1 Mbit chips, with bit 1 of each word in chip 1 and so on • A 16 Mbit chip can be organised as a 2048 x 2048 x 4-bit array – reduces the number of address pins – multiplex row address and column address – 11 pins address 2048 rows/columns (2^11 = 2048) – adding one more pin doubles the range of values, so x4 capacity. (The pin arithmetic is checked in the sketch below.)
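The pin-count arithmetic on this slide as a quick check (plain arithmetic, nothing assumed beyond the slide's numbers):

```python
# A 2048 x 2048 x 4-bit array: row and column addresses share the same pins,
# so 11 address pins cover 2**11 = 2048 rows and, reused, 2048 columns.
rows = cols = 2 ** 11
bits_per_cell = 4
print(rows * cols * bits_per_cell)              # 16777216 bits = 16 Mbit

# One more address pin doubles both the row and the column range: x4 capacity.
print((2 ** 12) * (2 ** 12) * bits_per_cell)    # 67108864 bits = 64 Mbit
```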
  • 24. Bus structure connecting CPU and memory. A bus is a collection of parallel wires that carry address, data, and control signals. Buses are typically shared by multiple devices. (figure: CPU chip with register file, ALU, and bus interface; system bus to I/O bridge; memory bus to main memory)
  • 25. Memory read transaction (1). CPU places address A on the memory bus. Load operation: movl A, %eax
  • 26. Memory read transaction (2). Main memory reads A from the memory bus, retrieves word x, and places it on the bus. Load operation: movl A, %eax
  • 27. Memory read transaction (3). CPU reads word x from the bus and copies it into register %eax. Load operation: movl A, %eax
  • 28. Memory write transaction (1). CPU places address A on the bus. Main memory reads it and waits for the corresponding data word to arrive. Store operation: movl %eax, A
  • 29. Memory write transaction (2). CPU places data word y on the bus. Store operation: movl %eax, A
  • 30. Memory write transaction (3). Main memory reads data word y from the bus and stores it at address A. Store operation: movl %eax, A
  • 31. Disk geometry. Disks consist of platters, each with two surfaces. Each surface consists of concentric rings called tracks. Each track consists of sectors separated by gaps. (figure: surface with tracks, track k, sectors, gaps, spindle)
  • 32. Disk geometry (multiple-platter view). Aligned tracks form a cylinder. (figure: cylinder k across surfaces 0-5 of platters 0-2, spindle)
  • 33. Disk capacity. Capacity: maximum number of bits that can be stored. • Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes. Capacity is determined by these technology factors: • Recording density (bits/in): number of bits that can be squeezed into a 1-inch segment of a track. • Track density (tracks/in): number of tracks that can be squeezed into a 1-inch radial segment. • Areal density (bits/in^2): product of recording and track density. Modern disks partition tracks into disjoint subsets called recording zones • Each track in a zone has the same number of sectors, determined by the circumference of the innermost track. • Each zone has a different number of sectors/track.
  • 34. Computing disk capacity. Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk). Example: • 512 bytes/sector • 300 sectors/track (on average) • 20,000 tracks/surface • 2 surfaces/platter • 5 platters/disk. Capacity = 512 x 300 x 20,000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB. (See the sketch below.)
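The same computation as a short sketch; the function and argument names are illustrative, not from the slides:

```python
def disk_capacity(bytes_per_sector, sectors_per_track, tracks_per_surface,
                  surfaces_per_platter, platters_per_disk):
    # Capacity = bytes/sector x avg sectors/track x tracks/surface
    #            x surfaces/platter x platters/disk
    return (bytes_per_sector * sectors_per_track * tracks_per_surface *
            surfaces_per_platter * platters_per_disk)

cap = disk_capacity(512, 300, 20_000, 2, 5)
print(cap)           # 30720000000 bytes
print(cap / 10**9)   # 30.72 (vendor GB = 10^9 bytes)
```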
  • 35. Disk operation (single-platter view). The disk surface spins at a fixed rotational rate. The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air. By moving radially, the arm can position the read/write head over any track.
  • 36. Disk operation (multi-platter view). Read/write heads move in unison from cylinder to cylinder. (figure: arm, spindle)
  • 37. Disk access time. Average time to access some target sector approximated by: • Taccess = Tavg seek + Tavg rotation + Tavg transfer. Seek time • Time to position heads over the cylinder containing the target sector • Typical Tavg seek = 9 ms. Rotational latency • Time waiting for the first bit of the target sector to pass under the r/w head • Tavg rotation = 1/2 x 1/RPM x 60 secs/1 min. Transfer time • Time to read the bits in the target sector • Tavg transfer = 1/RPM x 1/(avg # sectors/track) x 60 secs/1 min.
  • 38. Disk access time example. Given: • Rotational rate = 7,200 RPM • Average seek time = 9 ms • Avg # sectors/track = 400. Derived: • Tavg rotation = 1/2 x (60 secs/7,200 RPM) x 1000 ms/sec = 4 ms • Tavg transfer = 60/7,200 RPM x 1/400 secs/track x 1000 ms/sec = 0.02 ms • Taccess = 9 ms + 4 ms + 0.02 ms. Important points: • Access time dominated by seek time and rotational latency. • First bit in a sector is the most expensive, the rest are free. • SRAM access time is about 4 ns/doubleword, DRAM about 60 ns – disk is about 40,000 times slower than SRAM, 2,500 times slower than DRAM. (See the sketch below.)
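The slide's numbers reproduced in code; note the exact rotational term is 4.17 ms, which the slide rounds to 4 ms:

```python
def disk_access_time_ms(rpm, avg_seek_ms, avg_sectors_per_track):
    # Taccess = Tavg seek + Tavg rotation + Tavg transfer (all in ms)
    ms_per_rev = 60.0 / rpm * 1000
    t_rotation = 0.5 * ms_per_rev                       # wait half a revolution on average
    t_transfer = ms_per_rev / avg_sectors_per_track     # time for one sector to pass
    return avg_seek_ms + t_rotation + t_transfer

print(disk_access_time_ms(7200, 9, 400))   # about 13.19 ms (9 + 4.17 + 0.02)
```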
  • 39. Logical disk blocks. Modern disks present a simpler abstract view of the complex sector geometry: • The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...). Mapping between logical blocks and actual (physical) sectors • Maintained by a hardware/firmware device called the disk controller • Converts requests for logical blocks into (surface, track, sector) triples. Allows the controller to set aside spare cylinders for each zone • Accounts for the difference between "formatted capacity" and "maximum capacity". (An idealized mapping sketch follows below.)
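A hedged sketch of the controller's job, assuming an idealized uniform geometry with no recording zones or spare cylinders (real controllers differ, which is exactly why the abstraction exists); the function name is illustrative:

```python
def logical_to_chs(block, sectors_per_track, surfaces):
    # Map a logical block number to a (surface, track, sector) triple for a
    # simplified disk in which every track has the same number of sectors.
    sector = block % sectors_per_track
    track_index = block // sectors_per_track
    surface = track_index % surfaces
    track = track_index // surfaces
    return surface, track, sector

print(logical_to_chs(1000, sectors_per_track=300, surfaces=10))  # (3, 0, 100)
```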
  • 40. Bus structure connecting I/O and CPU. (figure: CPU chip with register file, ALU, and bus interface; system bus, I/O bridge, memory bus to main memory; I/O bus with USB controller (mouse, keyboard), graphics adapter (monitor), disk controller (disk), and expansion slots for other devices such as network adapters)
  • 41. Reading a disk sector (1). CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.
  • 42. Reading a disk sector (2). Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
  • 43. Reading a disk sector (3). When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special "interrupt" pin on the CPU).
  • 44. Storage trends (culled from back issues of Byte and PC Magazine)
    SRAM  metric             1980    1985   1990   1995   2000   2000:1980
          $/MB               19,200  2,900  320    256    100    190
          access (ns)        300     150    35     15     2      100
    DRAM  metric             1980    1985   1990   1995   2000   2000:1980
          $/MB               8,000   880    100    30     1      8,000
          access (ns)        375     200    100    70     60     6
          typical size (MB)  0.064   0.256  4      16     64     1,000
    Disk  metric             1980    1985   1990   1995   2000   2000:1980
          $/MB               500     100    8      0.30   0.05   10,000
          access (ms)        87      75     28     10     8      11
          typical size (MB)  1       10     160    1,000  9,000  9,000
  • 45. CPU clock rates
                      1980   1985  1990  1995  2000   2000:1980
    processor         8080   286   386   Pent  P-III
    clock rate (MHz)  1      6     20    150   750    750
    cycle time (ns)   1,000  166   50    6     1.6    750
  • 46. The CPU-Memory Gap. The increasing gap between DRAM, disk, and CPU speeds. (figure: log-scale plot, ns on the y-axis from 1 to 100,000,000, years 1980-2000 on the x-axis, showing disk seek time, DRAM access time, SRAM access time, and CPU cycle time)
  • 47. Memory hierarchies. Some fundamental and enduring properties of hardware and software: • Fast storage technologies cost more per byte and have less capacity. • The gap between CPU and main memory speed is widening. • Well-written programs tend to exhibit good locality. These fundamental properties complement each other beautifully. They suggest an approach for organizing memory and storage systems known as a "memory hierarchy".
  • 48. An example memory hierarchy. Smaller, faster, and costlier (per byte) storage devices toward the top; larger, slower, and cheaper (per byte) devices toward the bottom. • L0: CPU registers hold words retrieved from cache memory. • L1: on-chip L1 cache (SRAM) holds cache lines retrieved from the L2 cache. • L2: off-chip L2 cache (SRAM) holds cache lines retrieved from memory. • L3: main memory (DRAM) holds disk blocks retrieved from local disks. • L4: local secondary storage (local disks) holds files retrieved from disks on remote network servers. • L5: remote secondary storage (distributed file systems, Web servers).
  • 49. Caches. Cache: a smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device. Fundamental idea of a memory hierarchy: • For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1. Why do memory hierarchies work? • Programs tend to access the data at level k more often than they access the data at level k+1. • Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit. • Net effect: a large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
  • 50. Cache • Small amount of fast memory • Sits between normal main memory and CPU • May be located on CPU chip or module
  • 51. Cache/Main Memory Structure
  • 52. Cache operation – overview • CPU requests contents of memory location • Check cache for this data • If present, get from cache (fast) • If not present, read required block from main memory to cache • Then deliver from cache to CPU • Cache includes tags to identify which block of main memory is in each cache slot. (A minimal sketch of this flow follows below.)
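A minimal sketch of that read flow (tag check, fill on miss, deliver from cache); the block size and names are illustrative:

```python
BLOCK = 4                                  # illustrative 4-byte blocks

def cache_read(addr, cache, main_memory):
    tag, offset = addr // BLOCK, addr % BLOCK
    if tag not in cache:                   # miss: read required block into the cache
        base = tag * BLOCK
        cache[tag] = main_memory[base:base + BLOCK]
    return cache[tag][offset]              # then deliver from cache to CPU

main_memory = list(range(64))
cache = {}
print(cache_read(10, cache, main_memory))  # 10 (miss, block fetched, then served)
print(cache_read(11, cache, main_memory))  # 11 (hit: same block already cached)
```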
  • 53. Cache Read Operation – Flowchart
  • 54. Cache Design • Size • Mapping Function • Replacement Algorithm • Write Policy • Block Size • Number of Caches
  • 55. Size does matter • Cost – more cache is expensive • Speed – more cache is faster (up to a point) – checking cache for data takes time
  • 56. Typical Cache Organization
  • 57. Comparison of Cache Sizes
    Processor        Type                            Year of Introduction  L1 cache       L2 cache        L3 cache
    IBM 360/85       Mainframe                       1968                  16 to 32 KB    —               —
    PDP-11/70        Minicomputer                    1975                  1 KB           —               —
    VAX 11/780       Minicomputer                    1978                  16 KB          —               —
    IBM 3033         Mainframe                       1978                  64 KB          —               —
    IBM 3090         Mainframe                       1985                  128 to 256 KB  —               —
    Intel 80486      PC                              1989                  8 KB           —               —
    Pentium          PC                              1993                  8 KB/8 KB      256 to 512 KB   —
    PowerPC 601      PC                              1993                  32 KB          —               —
    PowerPC 620      PC                              1996                  32 KB/32 KB    —               —
    PowerPC G4       PC/server                       1999                  32 KB/32 KB    256 KB to 1 MB  2 MB
    IBM S/390 G4     Mainframe                       1997                  32 KB          256 KB          2 MB
    IBM S/390 G6     Mainframe                       1999                  256 KB         8 MB            —
    Pentium 4        PC/server                       2000                  8 KB/8 KB      256 KB          —
    IBM SP           High-end server/supercomputer   2000                  64 KB/32 KB    8 MB            —
    CRAY MTA         Supercomputer                   2000                  8 KB           2 MB            —
    Itanium          PC/server                       2001                  16 KB/16 KB    96 KB           4 MB
    SGI Origin 2001  High-end server                 2001                  32 KB/32 KB    4 MB            —
    Itanium 2        PC/server                       2002                  32 KB          256 KB          6 MB
    IBM POWER5       High-end server                 2003                  64 KB          1.9 MB          36 MB
    CRAY XD-1        Supercomputer                   2004                  64 KB/64 KB    1 MB            —
  • 58. Mapping Function • Cache of 64 kByte • Cache block of 4 bytes – i.e. cache is 16k (2^14) lines of 4 bytes • 16 MBytes main memory • 24-bit address (2^24 = 16M)
  • 59. Direct Mapping • Each block of main memory maps to only one cache line – i.e. if a block is in cache, it must be in one specific place • Address is in two parts • Least significant w bits identify a unique word • Most significant s bits specify one memory block • The MSBs are split into a cache line field r and a tag of s-r (most significant)
  • 60. Direct Mapping Address Structure • Tag s-r (8 bits) | Line or Slot r (14 bits) | Word w (2 bits) • 24-bit address • 2-bit word identifier (4-byte block) • 22-bit block identifier – 8-bit tag (= 22-14) – 14-bit slot or line • No two blocks in the same line have the same Tag field • Check contents of cache by finding line and checking Tag. (A field-extraction sketch follows below.)
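A sketch of splitting a 24-bit address into these fields with shifts and masks; the example address is made up for illustration:

```python
WORD_BITS, LINE_BITS = 2, 14               # 4-byte blocks, 16k lines -> 8-bit tag

def split_direct(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

tag, line, word = split_direct(0x16339C)
print(hex(tag), hex(line), word)           # 0x16 0xce7 0
```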
  • 61. Direct Mapping Cache Line Table • Cache line 0 holds main memory blocks 0, m, 2m, 3m, ..., 2^s-m • Cache line 1 holds blocks 1, m+1, 2m+1, ..., 2^s-m+1 • ... • Cache line m-1 holds blocks m-1, 2m-1, 3m-1, ..., 2^s-1
  • 62. Direct Mapping Cache Organization
  • 63. Direct Mapping Example
  • 64. Direct Mapping Summary • Address length = (s + w) bits • Number of addressable units = 2^(s+w) words or bytes • Block size = line size = 2^w words or bytes • Number of blocks in main memory = 2^(s+w)/2^w = 2^s • Number of lines in cache = m = 2^r • Size of tag = (s - r) bits
  • 65. Direct Mapping pros & cons • Simple • Inexpensive • Fixed location for a given block – if a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high
  • 66. Associative Mapping • A main memory block can load into any line of cache • Memory address is interpreted as tag and word • Tag uniquely identifies a block of memory • Every line's tag is examined for a match • Cache searching gets expensive
  • 67. Fully Associative Cache Organization
  • 68. Associative Mapping Example
  • 69. Associative Mapping Address Structure • Tag (22 bits) | Word (2 bits) • 22-bit tag stored with each 32-bit block of data • Compare tag field with tag entry in cache to check for hit • Least significant 2 bits of address identify which 16-bit word is required from the 32-bit data block • e.g. Address FFFFFC: Tag FFFFFC, Data 24682468, Cache line 3FFF
  • 70. Associative Mapping Summary • Address length = (s + w) bits • Number of addressable units = 2^(s+w) words or bytes • Block size = line size = 2^w words or bytes • Number of blocks in main memory = 2^(s+w)/2^w = 2^s • Number of lines in cache = undetermined • Size of tag = s bits
  • 71. Set Associative Mapping • Cache is divided into a number of sets • Each set contains a number of lines • A given block maps to any line in a given set – e.g. block B can be in any line of set i • e.g. 2 lines per set – 2-way associative mapping – a given block can be in one of 2 lines in only one set
  • 72. Set Associative Mapping Example • 13-bit set number • Block number in main memory is modulo 2^13 • 000000, 00A000, 00B000, 00C000, ... map to the same set
  • 73. Two Way Set Associative Cache Organization
  • 74. Set Associative Mapping Address Structure • Tag (9 bits) | Set (13 bits) | Word (2 bits) • Use set field to determine which cache set to look in • Compare tag field to see if we have a hit • e.g. Address 1FF 7FFC: Tag 1FF, Set 1FFF, Data 12345678 • Address 001 7FFC: Tag 001, Set 1FFF, Data 11223344. (A field-extraction sketch follows below.)
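The same shift-and-mask idea for the 9/13/2 split; the two example addresses below are the slide's 1FF 7FFC and 001 7FFC written as full 24-bit values:

```python
WORD_BITS, SET_BITS = 2, 13                # 2-bit word, 13-bit set, 9-bit tag

def split_set_assoc(addr):
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

for addr in (0xFFFFFC, 0x00FFFC):          # tags 1FF and 001, both in set 1FFF
    tag, set_no, word = split_set_assoc(addr)
    print(hex(tag), hex(set_no), word)     # 0x1ff 0x1fff 0, then 0x1 0x1fff 0
```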
  • 75. Two Way Set Associative Mapping Example
  • 76. Set Associative Mapping Summary • Address length = (s + w) bits • Number of addressable units = 2^(s+w) words or bytes • Block size = line size = 2^w words or bytes • Number of blocks in main memory = 2^d • Number of lines in set = k • Number of sets = v = 2^d • Number of lines in cache = kv = k x 2^d • Size of tag = (s - d) bits
  • 77. Replacement Algorithms (1) Direct Mapping • No choice • Each block only maps to one line • Replace that line
  • 78. Replacement Algorithms (2) Associative & Set Associative • Hardware implemented algorithm (speed) • Least Recently Used (LRU) – e.g. in 2-way set associative: which of the 2 blocks is LRU? • First In First Out (FIFO) – replace the block that has been in cache longest • Least Frequently Used (LFU) – replace the block which has had fewest hits • Random. (An LRU sketch follows below.)
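A toy model of LRU within one 2-way set, to make "which of the 2 blocks is LRU?" concrete; this is an illustrative software model, not the hardware mechanism:

```python
class TwoWaySet:
    def __init__(self):
        self.lines = {}            # tag -> timestamp of last use
        self.clock = 0

    def access(self, tag):
        self.clock += 1
        if tag in self.lines:                    # hit: refresh recency
            self.lines[tag] = self.clock
            return "hit"
        if len(self.lines) == 2:                 # set full: evict least recently used
            lru = min(self.lines, key=self.lines.get)
            del self.lines[lru]
        self.lines[tag] = self.clock             # fill the line
        return "miss"

s = TwoWaySet()
print([s.access(t) for t in ("A", "B", "A", "C", "B")])
# ['miss', 'miss', 'hit', 'miss', 'miss']  (C evicts B, the LRU line, so B misses again)
```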
  • 79. Write Policy • Must not overwrite a cache block unless main memory is up to date • Multiple CPUs may have individual caches • I/O may address main memory directly
  • 80. Write Through • All writes go to main memory as well as cache • Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date • Lots of traffic • Slows down writes • Remember bogus write-through caches!
  • 81. Write Back • Updates initially made in cache only • Update bit for cache slot is set when update occurs • If block is to be replaced, write to main memory only if the update bit is set • Other caches get out of sync • I/O must access main memory through cache • N.B. 15% of memory references are writes. (A write-back sketch follows below.)
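A minimal write-back sketch showing the update (dirty) bit in action; write-through would instead update main memory on every store. Class and field names are illustrative:

```python
class WriteBackLine:
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False

    def store(self, tag, data):
        # Update is made in the cache only; the update bit records it.
        self.tag, self.data, self.dirty = tag, data, True

    def evict(self, memory):
        # On replacement, write to main memory only if the update bit is set.
        if self.dirty:
            memory[self.tag] = self.data
        self.tag, self.data, self.dirty = None, None, False

memory = {}
line = WriteBackLine()
line.store(0x2A, "y")
print(memory)          # {}  (main memory not yet updated)
line.evict(memory)
print(memory)          # {42: 'y'}
```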
  • 82. Caching in a memory hierarchy • The smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1 • Data is copied between levels in block-sized transfer units • The larger, slower, cheaper storage device at level k+1 is partitioned into blocks. (figure: level k holds blocks 4, 9, 14, 3; level k+1 holds blocks 0-15)
  • 83. General caching concepts • Program needs object d, which is stored in some block b • Cache hit – program finds b in the cache at level k, e.g. block 14 • Cache miss – b is not at level k, so the level k cache must fetch it from level k+1, e.g. block 12 – if the level k cache is full, then some current block must be replaced (evicted) – which one? determined by the replacement policy, e.g. evict the least recently used block
  • 84. General caching concepts. Types of cache misses: • Cold (compulsory) miss – cold misses occur because the cache is empty • Conflict miss – most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k – e.g. block i at level k+1 must be placed in block (i mod 4) at level k – conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block – e.g. referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time • Capacity miss – occurs when the set of active cache blocks (working set) is larger than the cache. (A conflict-miss sketch follows below.)
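A tiny direct-mapped simulation of the conflict-miss example (block i lands in slot i mod 4):

```python
slots = [None] * 4                 # 4-slot direct-mapped cache of block numbers

def access(block):
    slot = block % 4
    hit = slots[slot] == block
    slots[slot] = block
    return "hit" if hit else "miss"

# Blocks 0 and 8 both map to slot 0, so 0, 8, 0, 8, ... misses every time.
print([access(b) for b in (0, 8, 0, 8, 0, 8)])   # all 'miss'
```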
  • 85. Examples of caching in the hierarchy
    Cache Type            What Cached           Where Cached         Latency (cycles)  Managed By
    Registers             4-byte word           CPU registers        0                 Compiler
    TLB                   Address translations  On-Chip TLB          0                 Hardware
    L1 cache              32-byte block         On-Chip L1           1                 Hardware
    L2 cache              32-byte block         Off-Chip L2          10                Hardware
    Virtual Memory        4-KB page             Main memory          100               Hardware + OS
    Buffer cache          Parts of files        Main memory          100               OS
    Network buffer cache  Parts of files        Local disk           10,000,000        AFS/NFS client
    Browser cache         Web pages             Local disk           10,000,000        Web browser
    Web cache             Web pages             Remote server disks  1,000,000,000     Web proxy server
  • 86. Pentium 4 Cache • 80386 – no on-chip cache • 80486 – 8 kB using 16-byte lines and four-way set associative organization • Pentium (all versions) – two on-chip L1 caches – data & instructions • Pentium III – L3 cache added off chip • Pentium 4 – L1 caches: 8 kB, 64-byte lines, four-way set associative – L2 cache: feeding both L1 caches, 256 kB, 128-byte lines, 8-way set associative – L3 cache on chip
  • 87. Intel Cache Evolution
    Problem: External memory slower than the system bus. Solution: Add external cache using faster memory technology. First appears: 386
    Problem: Increased processor speed results in the external bus becoming a bottleneck for cache access. Solution: Move external cache on-chip, operating at the same speed as the processor. First appears: 486
    Problem: Internal cache is rather small, due to limited space on chip. Solution: Add external L2 cache using faster technology than main memory. First appears: 486
    Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache; the Prefetcher is stalled while the Execution Unit's data access takes place. Solution: Create separate data and instruction caches. First appears: Pentium
    Problem: Increased processor speed results in the external bus becoming a bottleneck for L2 cache access. Solution: Create a separate back-side bus that runs at higher speed than the main (front-side) external bus; the BSB is dedicated to the L2 cache. First appears: Pentium Pro. Solution: Move the L2 cache onto the processor chip. First appears: Pentium II
    Problem: Some applications deal with massive databases and must have rapid access to large amounts of data; the on-chip caches are too small. Solution: Add external L3 cache. First appears: Pentium III. Solution: Move the L3 cache on-chip. First appears: Pentium 4
  • 88. Pentium 4 Block Diagram
  • 89. Pentium 4 Core Processor • Fetch/Decode Unit – fetches instructions from L2 cache – decodes into micro-ops – stores micro-ops in L1 cache • Out-of-order execution logic – schedules micro-ops – based on data dependence and resources – may speculatively execute • Execution units – execute micro-ops – data from L1 cache – results in registers • Memory subsystem – L2 cache and system bus
  • 90. Pentium 4 Design Reasoning • Decodes instructions into RISC-like micro-ops before the L1 cache • Micro-ops fixed length – superscalar pipelining and scheduling • Pentium instructions long & complex • Performance improved by separating decoding from scheduling & pipelining (more later – ch. 14) • Data cache is write back – can be configured to write through • L1 cache controlled by 2 bits in a register – CD = cache disable – NW = not write through – 2 instructions to invalidate (flush) the cache and to write back then invalidate • L2 and L3 8-way set-associative – line size 128 bytes
  • 91. PowerPC Cache Organization • 601 – single 32 kB, 8-way set associative • 603 – 16 kB (2 x 8 kB), two-way set associative • 604 – 32 kB • 620 – 64 kB • G3 & G4 – 64 kB L1 cache, 8-way set associative – 256 kB, 512 kB or 1 MB L2 cache, two-way set associative • G5 – 32 kB instruction cache – 64 kB data cache
  • 92. PowerPC G5 Block Diagram
