Memory
Eri Prasetyo Wibowo
http://eri.staff.gunadarma.ac.id
Write Policy: Must not overwrite a cache block unless main memory is up to date ...
Write through: All writes go to main memory as well as cache. Multiple CPUs can...
Write back: Updates initially made in cache only. Update bit for cache s...
Caching in a memory hierarchy: Level k: smaller, faster, more expensive ...
General caching concepts: Level k: Program needs object d, ...
General caching concepts: Types of cache misses:  • Cold (compulsory) miss     – Cold misses occur because the cache is empt...
Examples of caching in the hierarchy: Cache Type | What Cached | Where Cached | Latency | Managed ...
Pentium: 80386 – no on chip cache. 80486 – 8k using 16 byte lines and four way set associative organization ...
Intel Cache: Problem | Solution ...
Pentium 4 Block Diagram (figure only)
Pentium 4: Fetch/Decode Unit    • Fetches instructions from L2 cache    • Decode into micro-ops ...
Pentium: Decodes instructions into RISC like micro-ops before L1 cache. Micro-ops fixed len...
PowerPC: 601 – single 32kb 8 way set associative. 603 – 16kb ...
PowerPC G5 Block Diagram (figure only)
  1. Memory. Eri Prasetyo Wibowo, http://eri.staff.gunadarma.ac.id
  2. Semiconductor Memory Types (figure only). CompOrg - Memory Hierarchy
  3. Semiconductor Memory
     RAM
       • Misnamed, as all semiconductor memory is random access
       • Read/Write
       • Volatile
       • Temporary storage
       • Static or dynamic
  4. (figure-only slide; no text recoverable)
  5. Random-Access Memory (RAM)
     Key features
       • RAM is packaged as a chip.
       • Basic storage unit is a cell (one bit per cell).
       • Multiple RAM chips form a memory.
     Static RAM (SRAM)
       • Each cell stores a bit with a six-transistor circuit.
       • Retains its value indefinitely, as long as it is kept powered.
       • Relatively insensitive to disturbances such as electrical noise.
       • Faster and more expensive than DRAM.
     Dynamic RAM (DRAM)
       • Each cell stores a bit with a capacitor and a transistor.
       • Value must be refreshed every 10-100 ms.
       • Sensitive to disturbances.
       • Slower and cheaper than SRAM.
  6. SRAM vs DRAM summary
              Tran.     Access
              per bit   time    Persist?  Sensitive?  Cost   Applications
       SRAM   6         1X      Yes       No          100X   Cache memories
       DRAM   1         10X     No        Yes         1X     Main memories, frame buffers
  7. Dynamic RAM
     Bits stored as charge in capacitors
     Charges leak
     Need refreshing even when powered
     Simpler construction
     Smaller per bit
     Less expensive
     Need refresh circuits
     Slower
     Main memory
     Essentially analogue
       • Level of charge determines value
  8. (figure-only slide; no text recoverable)
  9. DRAM Operation
     Address line active when bit read or written
       • Transistor switch closed (current flows)
     Write
       • Voltage to bit line
         – High for 1, low for 0
       • Then signal address line
         – Transfers charge to capacitor
     Read
       • Address line selected
         – Transistor turns on
       • Charge from capacitor fed via bit line to sense amplifier
         – Compares with reference value to determine 0 or 1
       • Capacitor charge must be restored
  10. Conventional DRAM organization
      d x w DRAM:
        • d x w total bits organized as d supercells of size w bits
      [figure: 16 x 8 DRAM chip with supercells arranged in 4 rows x 4 cols; a 2-bit addr line and an 8-bit data line connect the chip to the memory controller (to CPU); supercell (2,1) is marked; the chip has an internal row buffer]
  11. Reading DRAM supercell (2,1)
      Step 1(a): Row access strobe (RAS) selects row 2.
      Step 1(b): Row 2 copied from DRAM array to row buffer.
      [figure: memory controller sends RAS = 2 on the addr lines; row 2 of the 16 x 8 DRAM chip is copied into the internal row buffer]
  12. Reading DRAM supercell (2,1)
      Step 2(a): Column access strobe (CAS) selects column 1.
      Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU.
      [figure: memory controller sends CAS = 1 on the addr lines; supercell (2,1) leaves the internal row buffer on the 8-bit data lines]
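The two-step read on slides 11 and 12 can be sketched in a few lines of Python. This is only an illustrative model of the 16 x 8 chip from the slides; the class `DramChip` and its method names are hypothetical, not part of any real DRAM interface.

```python
class DramChip:
    """Toy model of a d x w DRAM: rows x cols supercells, one value each."""

    def __init__(self, rows=4, cols=4):
        self.cells = [[0] * cols for _ in range(rows)]
        self.row_buffer = None  # internal row buffer, empty until RAS

    def ras(self, row):
        # Step 1: the row access strobe copies the whole row into the buffer.
        self.row_buffer = list(self.cells[row])

    def cas(self, col):
        # Step 2: the column access strobe selects one supercell from the buffer.
        return self.row_buffer[col]


chip = DramChip()
chip.cells[2][1] = 0xAB   # assumed contents of supercell (2,1)
chip.ras(2)               # RAS = 2
value = chip.cas(1)       # CAS = 1
print(hex(value))
```

Because the whole row sits in the buffer after RAS, further CAS requests to the same row need no new row access.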
  13. Memory modules
      64 MB memory module consisting of eight 8Mx8 DRAMs (DRAM 0 to DRAM 7)
      [figure: for addr (row = i, col = j), every DRAM returns its supercell (i,j); DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, and so on up to DRAM 7, which supplies bits 56-63; the memory controller combines them into the 64-bit doubleword at main memory address A and sends the 64-bit doubleword to the CPU chip]
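As a rough sketch of what the memory controller does on slide 13, the function below combines eight 8-bit supercells into one 64-bit doubleword. The function name `assemble_doubleword` and the sample byte values are assumptions made for illustration.

```python
def assemble_doubleword(supercells):
    """Combine one byte from each of eight DRAMs into a 64-bit value.

    supercells[k] is the byte returned by DRAM k, which holds bits 8k..8k+7.
    """
    value = 0
    for k, byte in enumerate(supercells):
        value |= (byte & 0xFF) << (8 * k)
    return value


# Assumed sample data: DRAM 0 returns 0x11, DRAM 7 returns 0x88.
doubleword = assemble_doubleword([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
print(hex(doubleword))
```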
  14. Enhanced DRAMs
      All enhanced DRAMs are built around the conventional DRAM core.
        • Fast page mode DRAM (FPM DRAM)
          – Access contents of row with [RAS, CAS, CAS, CAS, CAS] instead of [(RAS,CAS), (RAS,CAS), (RAS,CAS), (RAS,CAS)].
        • Extended data out DRAM (EDO DRAM)
          – Enhanced FPM DRAM with more closely spaced CAS signals.
        • Synchronous DRAM (SDRAM)
          – Driven with rising clock edge instead of asynchronous control signals.
        • Double data-rate synchronous DRAM (DDR SDRAM)
          – Enhancement of SDRAM that uses both clock edges as control signals.
        • Video RAM (VRAM)
          – Like FPM DRAM, but output is produced by shifting row buffer
          – Dual ported (allows concurrent reads and writes)
  15. Static RAM
      Bits stored as on/off switches
      No charges to leak
      No refreshing needed when powered
      More complex construction
      Larger per bit
      More expensive
      Does not need refresh circuits
      Faster
      Cache
      Digital
        • Uses flip-flops
  16. (figure-only slide; no text recoverable)
  17. Static RAM Structure (figure only)
  18. Static RAM Operation
      Transistor arrangement gives stable logic state
      State 1
        • C1 high, C2 low
        • T1 T4 off, T2 T3 on
      State 0
        • C2 high, C1 low
        • T2 T3 off, T1 T4 on
      Address line transistors T5 and T6 act as a switch
      Write – apply value to line B and its complement to the complement line (B-bar)
      Read – value is on line B
  19. SRAM vs DRAM
      Both volatile
        • Power needed to preserve data
      Dynamic cell
        • Simpler to build, smaller
        • More dense
        • Less expensive
        • Needs refresh
        • Larger memory units
      Static
        • Faster
        • Cache
  20. Nonvolatile memories
      DRAM and SRAM are volatile memories
        • Lose information if powered off.
      Nonvolatile memories retain value even if powered off.
        • Generic name is read-only memory (ROM).
        • Misleading because some ROMs can be read and modified.
      Types of ROMs
        • Programmable ROM (PROM)
        • Erasable programmable ROM (EPROM)
        • Electrically erasable PROM (EEPROM)
        • Flash memory
      Firmware
        • Program stored in a ROM
          – Boot time code, BIOS (basic input/output system)
          – Graphics cards, disk controllers.
  21. Read Only Memory (ROM)
      Permanent storage
        • Nonvolatile
      Microprogramming (see later)
      Library subroutines
      Systems programs (BIOS)
      Function tables
  22. Types of ROM
      Written during manufacture
        • Very expensive for small runs
      Programmable (once)
        • PROM
        • Needs special equipment to program
      Read "mostly"
        • Erasable Programmable (EPROM)
          – Erased by UV
        • Electrically Erasable (EEPROM)
          – Takes much longer to write than read
        • Flash memory
          – Erase whole memory electrically
  23. Organisation in detail
      A 16Mbit chip can be organised as 1M of 16 bit words
      A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on
      A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array
        • Reduces number of address pins
          – Multiplex row address and column address
          – 11 pins to address (2^11 = 2048)
          – Adding one more pin doubles the range of values for both row and column (2^12 instead of 2^11), so x4 capacity
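The pin arithmetic on slide 23 can be checked with a short calculation. This is a minimal sketch; the helper names `multiplexed_address_pins` and `array_cells` are made up for this example, and it assumes row and column addresses share the same pins.

```python
def multiplexed_address_pins(rows, cols):
    """Pins needed when row and column addresses are sent over the same pins."""
    row_bits = (rows - 1).bit_length()
    col_bits = (cols - 1).bit_length()
    return max(row_bits, col_bits)


def array_cells(pins):
    """Addressable cells of a square array with `pins` multiplexed address pins."""
    return (2 ** pins) * (2 ** pins)


pins = multiplexed_address_pins(2048, 2048)
growth = array_cells(pins + 1) // array_cells(pins)
total_bits = array_cells(pins) * 4  # 2048 x 2048 array of 4-bit cells

print(pins, growth, total_bits)
```

One extra pin doubles both the row range and the column range, which is why the capacity grows by a factor of four rather than two.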
  24. Bus structure connecting CPU and memory
      A bus is a collection of parallel wires that carry address, data, and control signals.
      Buses are typically shared by multiple devices.
      [figure: CPU chip (register file, ALU, bus interface) connected by the system bus to the I/O bridge, which is connected by the memory bus to main memory]
  25. Memory read transaction (1)
      CPU places address A on the memory bus.
      Load operation: movl A, %eax
      [figure: address A travels from the bus interface through the I/O bridge to main memory, which holds word x at address A]
  26. Memory read transaction (2)
      Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
      Load operation: movl A, %eax
      [figure: word x travels from main memory through the I/O bridge toward the bus interface]
  27. Memory read transaction (3)
      CPU reads word x from the bus and copies it into register %eax.
      Load operation: movl A, %eax
      [figure: register %eax now holds x]
  28. Memory write transaction (1)
      CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive.
      Store operation: movl %eax, A
      [figure: register %eax holds y; address A travels from the bus interface through the I/O bridge to main memory]
  29. Memory write transaction (2)
      CPU places data word y on the bus.
      Store operation: movl %eax, A
      [figure: word y travels from the bus interface through the I/O bridge toward main memory]
  30. Memory write transaction (3)
      Main memory reads data word y from the bus and stores it at address A.
      Store operation: movl %eax, A
      [figure: main memory now holds y at address A]
  31. Disk geometry
      Disks consist of platters, each with two surfaces.
      Each surface consists of concentric rings called tracks.
      Each track consists of sectors separated by gaps.
      [figure: one surface with its spindle, tracks (track k marked), sectors, and gaps]
  32. Disk geometry (multiple-platter view)
      Aligned tracks form a cylinder.
      [figure: three platters on one spindle; platter 0 has surfaces 0 and 1, platter 1 has surfaces 2 and 3, platter 2 has surfaces 4 and 5; cylinder k is marked]
  33. Disk capacity
      Capacity: maximum number of bits that can be stored.
        • Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes.
      Capacity is determined by these technology factors:
        • Recording density (bits/in): number of bits that can be squeezed into a 1 inch segment of a track.
        • Track density (tracks/in): number of tracks that can be squeezed into a 1 inch radial segment.
        • Areal density (bits/in^2): product of recording and track density.
      Modern disks partition tracks into disjoint subsets called recording zones
        • Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
        • Each zone has a different number of sectors/track
  34. Computing disk capacity
      Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
      Example:
        • 512 bytes/sector
        • 300 sectors/track (on average)
        • 20,000 tracks/surface
        • 2 surfaces/platter
        • 5 platters/disk
      Capacity = 512 x 300 x 20000 x 2 x 5 = 30,720,000,000 bytes = 30.72 GB
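The capacity formula on slide 34 translates directly into code. The sketch below uses a hypothetical function name, `disk_capacity_bytes`, and plugs in the example values from the slide.

```python
def disk_capacity_bytes(bytes_per_sector, avg_sectors_per_track,
                        tracks_per_surface, surfaces_per_platter,
                        platters_per_disk):
    """Product of the five factors from the capacity formula."""
    return (bytes_per_sector * avg_sectors_per_track * tracks_per_surface
            * surfaces_per_platter * platters_per_disk)


capacity = disk_capacity_bytes(512, 300, 20_000, 2, 5)
capacity_gb = capacity / 10**9  # vendor gigabytes: 1 GB = 10^9 bytes

print(capacity, capacity_gb)
```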
  35. Disk operation (single-platter view)
      The disk surface spins at a fixed rotational rate.
      The read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air.
      By moving radially, the arm can position the read/write head over any track.
      [figure: platter on its spindle with the arm and read/write head]
  36. Disk operation (multi-platter view)
      Read/write heads move in unison from cylinder to cylinder.
      [figure: stacked platters on one spindle with one arm carrying all heads]
  37. Disk access time
      Average time to access some target sector approximated by:
        • Taccess = Tavg seek + Tavg rotation + Tavg transfer
      Seek time
        • Time to position heads over cylinder containing target sector.
        • Typical Tavg seek = 9 ms
      Rotational latency
        • Time waiting for first bit of target sector to pass under r/w head.
        • Tavg rotation = 1/2 x (1/RPM) x (60 secs/1 min)
      Transfer time
        • Time to read the bits in the target sector.
        • Tavg transfer = (1/RPM) x 1/(avg # sectors/track) x (60 secs/1 min)
  38. Disk access time example
      Given:
        • Rotational rate = 7,200 RPM
        • Average seek time = 9 ms
        • Avg # sectors/track = 400
      Derived:
        • Tavg rotation = 1/2 x (60 secs/7200 RPM) x 1000 ms/sec ≈ 4 ms
        • Tavg transfer = (60 secs/7200 RPM) x (1/400 sectors/track) x 1000 ms/sec ≈ 0.02 ms
        • Taccess = 9 ms + 4 ms + 0.02 ms ≈ 13 ms
      Important points:
        • Access time dominated by seek time and rotational latency.
        • First bit in a sector is the most expensive, the rest are free.
        • SRAM access time is about 4 ns/doubleword, DRAM about 60 ns
          – Disk is about 40,000 times slower than SRAM,
          – 2,500 times slower than DRAM.
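The three terms of the access-time estimate can be computed as follows. This is a sketch with a hypothetical function name, `disk_access_time_ms`; it returns unrounded values, whereas the slide rounds the rotational latency to 4 ms and the transfer time to 0.02 ms.

```python
def disk_access_time_ms(rpm, avg_seek_ms, avg_sectors_per_track):
    """Return (seek, rotation, transfer, total) in milliseconds."""
    ms_per_rotation = 60_000 / rpm          # 60 s/min expressed in ms
    t_rotation = 0.5 * ms_per_rotation      # on average, half a rotation
    t_transfer = ms_per_rotation / avg_sectors_per_track  # one sector passes the head
    total = avg_seek_ms + t_rotation + t_transfer
    return avg_seek_ms, t_rotation, t_transfer, total


seek, rotation, transfer, total = disk_access_time_ms(7200, 9, 400)
print(f"rotation={rotation:.2f} ms transfer={transfer:.3f} ms total={total:.2f} ms")
```

The transfer term is tiny next to the other two, which is the point the slide makes about seek time and rotational latency dominating.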
  39. Logical disk blocks
      Modern disks present a simpler abstract view of the complex sector geometry:
        • The set of available sectors is modeled as a sequence of b-sized logical blocks (0, 1, 2, ...)
      Mapping between logical blocks and actual (physical) sectors
        • Maintained by hardware/firmware device called disk controller.
        • Converts requests for logical blocks into (surface, track, sector) triples.
      Allows controller to set aside spare cylinders for each zone.
        • Accounts for the difference in "formatted capacity" and "maximum capacity".
  40. Bus structure connecting I/O and CPU
      [figure: CPU chip (register file, ALU, bus interface) connected by the system bus to the I/O bridge; the I/O bridge connects by the memory bus to main memory and by the I/O bus to a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters]
  41. Reading a disk sector (1)
      CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with disk controller.
  42. Reading a disk sector (2)
      Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
  43. Reading a disk sector (3)
      When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special "interrupt" pin on the CPU).
  44. Storage trends
      (Culled from back issues of Byte and PC Magazine)
      SRAM
        metric               1980    1985   1990   1995   2000   2000:1980
        $/MB               19,200   2,900    320    256    100         190
        access (ns)           300     150     35     15      2         100
      DRAM
        metric               1980    1985   1990   1995   2000   2000:1980
        $/MB                8,000     880    100     30      1       8,000
        access (ns)           375     200    100     70     60           6
        typical size (MB)   0.064   0.256      4     16     64       1,000
      Disk
        metric               1980    1985   1990   1995   2000   2000:1980
        $/MB                  500     100      8   0.30   0.05      10,000
        access (ms)            87      75     28     10      8          11
        typical size (MB)       1      10    160  1,000  9,000       9,000
  45. CPU clock rates
                             1980    1985   1990   1995   2000   2000:1980
        processor            8080     286    386   Pent  P-III
        clock rate (MHz)        1       6     20    150    750         750
        cycle time (ns)     1,000     166     50      6    1.6         750
  46. The CPU-Memory Gap
      The increasing gap between DRAM, disk, and CPU speeds.
      [figure: log-scale plot of time in ns (1 to 100,000,000) against year (1980 to 2000) for disk seek time, DRAM access time, SRAM access time, and CPU cycle time]
  47. Memory hierarchies
      Some fundamental and enduring properties of hardware and software:
        • Fast storage technologies cost more per byte and have less capacity.
        • The gap between CPU and main memory speed is widening.
        • Well-written programs tend to exhibit good locality.
      These fundamental properties complement each other beautifully.
      They suggest an approach for organizing memory and storage systems known as a "memory hierarchy".
  48. An example memory hierarchy
      Toward the top: smaller, faster, and costlier (per byte) storage devices. Toward the bottom: larger, slower, and cheaper (per byte) storage devices.
        • L0: registers. CPU registers hold words retrieved from cache memory.
        • L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache.
        • L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from memory.
        • L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
        • L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
        • L5: remote secondary storage (distributed file systems, Web servers)
  49. Caches
      Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
      Fundamental idea of a memory hierarchy:
        • For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
      Why do memory hierarchies work?
        • Programs tend to access the data at level k more often than they access the data at level k+1.
        • Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
        • Net effect: A large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
  50. Cache
      Small amount of fast memory
      Sits between normal main memory and CPU
      May be located on CPU chip or module
  51. Cache/Main Memory Structure (figure only)
  52. Cache operation – overview
      CPU requests contents of memory location
      Check cache for this data
      If present, get from cache (fast)
      If not present, read required block from main memory to cache
      Then deliver from cache to CPU
      Cache includes tags to identify which block of main memory is in each cache slot
  53. Cache Read Operation - Flowchart (figure only)
  54. Cache Design
      Size
      Mapping Function
      Replacement Algorithm
      Write Policy
      Block Size
      Number of Caches
  55. Size does matter
      Cost
        • More cache is expensive
      Speed
        • More cache is faster (up to a point)
        • Checking cache for data takes time
  56. Typical Cache Organization (figure only)
  57. Comparison of Cache Sizes
        Processor        Type                            Year  L1 cache^a     L2 cache        L3 cache
        IBM 360/85       Mainframe                       1968  16 to 32 KB    -               -
        PDP-11/70        Minicomputer                    1975  1 KB           -               -
        VAX 11/780       Minicomputer                    1978  16 KB          -               -
        IBM 3033         Mainframe                       1978  64 KB          -               -
        IBM 3090         Mainframe                       1985  128 to 256 KB  -               -
        Intel 80486      PC                              1989  8 KB           -               -
        Pentium          PC                              1993  8 KB/8 KB      256 to 512 KB   -
        PowerPC 601      PC                              1993  32 KB          -               -
        PowerPC 620      PC                              1996  32 KB/32 KB    -               -
        PowerPC G4       PC/server                       1999  32 KB/32 KB    256 KB to 1 MB  2 MB
        IBM S/390 G4     Mainframe                       1997  32 KB          256 KB          2 MB
        IBM S/390 G6     Mainframe                       1999  256 KB         8 MB            -
        Pentium 4        PC/server                       2000  8 KB/8 KB      256 KB          -
        IBM SP           High-end server/supercomputer   2000  64 KB/32 KB    8 MB            -
        CRAY MTA^b       Supercomputer                   2000  8 KB           2 MB            -
        Itanium          PC/server                       2001  16 KB/16 KB    96 KB           4 MB
        SGI Origin 2001  High-end server                 2001  32 KB/32 KB    4 MB            -
        Itanium 2        PC/server                       2002  32 KB          256 KB          6 MB
        IBM POWER5       High-end server                 2003  64 KB          1.9 MB          36 MB
        CRAY XD-1        Supercomputer                   2004  64 KB/64 KB    1 MB            -
  58. Mapping Function
      Cache of 64kByte
      Cache block of 4 bytes
        • i.e. cache is 16k (2^14) lines of 4 bytes
      16MBytes main memory
      24 bit address
        • (2^24 = 16M)
  59. Direct Mapping
      Each block of main memory maps to only one cache line
        • i.e. if a block is in cache, it must be in one specific place
      Address is in two parts
      Least significant w bits identify unique word
      Most significant s bits specify one memory block
      The MSBs are split into a cache line field r and a tag of s-r (most significant)
  60. Direct Mapping Address Structure
        Tag s-r   Line or Slot r   Word w
        8         14               2
      24 bit address
      2 bit word identifier (4 byte block)
      22 bit block identifier
        • 8 bit tag (= 22-14)
        • 14 bit slot or line
      No two blocks in the same line have the same Tag field
      Check contents of cache by finding line and checking Tag
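The 8/14/2 split on slide 60 can be expressed with shifts and masks. This is a minimal sketch; the function name `split_direct` and the sample addresses are chosen here for illustration.

```python
WORD_BITS = 2    # w: byte within the 4-byte block
LINE_BITS = 14   # r: cache line (slot) number
TAG_BITS = 8     # s - r: tag stored with the line


def split_direct(addr):
    """Split a 24-bit address into (tag, line, word) for direct mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


for addr in (0x16339C, 0xFFFFFC):
    tag, line, word = split_direct(addr)
    print(f"{addr:06X}: tag={tag:02X} line={line:04X} word={word}")
```

A lookup then indexes the cache with `line` and compares the stored tag with `tag`; only one comparison is needed per access.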
  61. Direct Mapping Cache Line Table
        Cache line   Main memory blocks held
        0            0, m, 2m, 3m, ..., 2^s - m
        1            1, m+1, 2m+1, ..., 2^s - m + 1
        ...
        m-1          m-1, 2m-1, 3m-1, ..., 2^s - 1
  62. Direct Mapping Cache Organization (figure only)
  63. Direct Mapping Example (figure only)
  64. Direct Mapping Summary
      Address length = (s + w) bits
      Number of addressable units = 2^(s+w) words or bytes
      Block size = line size = 2^w words or bytes
      Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
      Number of lines in cache = m = 2^r
      Size of tag = (s – r) bits
  65. Direct Mapping pros & cons
      Simple
      Inexpensive
      Fixed location for given block
        • If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high
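The drawback on slide 65 shows up clearly in a small simulation. The sketch below assumes the 14-bit line and 2-bit word fields used in the earlier slides; the function `simulate_direct_mapped` and the two access patterns are invented for this example.

```python
def simulate_direct_mapped(addresses, line_bits=14, word_bits=2):
    """Count hits and misses for a direct-mapped cache that starts empty."""
    stored_tags = {}  # line number -> tag of the block currently held
    hits = misses = 0
    for addr in addresses:
        line = (addr >> word_bits) & ((1 << line_bits) - 1)
        tag = addr >> (word_bits + line_bits)
        if stored_tags.get(line) == tag:
            hits += 1
        else:
            misses += 1
            stored_tags[line] = tag  # the only candidate line is overwritten
    return hits, misses


# Two blocks 64 KB apart share line 0 but have different tags.
thrashing = simulate_direct_mapped([0x000000, 0x010000] * 4)
# Two neighbouring blocks fall into different lines.
friendly = simulate_direct_mapped([0x000000, 0x000004] * 4)
print(thrashing, friendly)
```

In the first pattern each access evicts the block the next access needs, so every reference misses even though the rest of the cache is empty.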
  66. Associative Mapping
      A main memory block can load into any line of cache
      Memory address is interpreted as tag and word
      Tag uniquely identifies block of memory
      Every line's tag is examined for a match
      Cache searching gets expensive
  67. Fully Associative Cache Organization (figure only)
  68. Associative Mapping Example (figure only)
  69. Associative Mapping Address Structure
        Tag      Word
        22 bit   2 bit
      22 bit tag stored with each 32 bit block of data
      Compare tag field with tag entry in cache to check for hit
      Least significant 2 bits of address identify which byte is required from the 32 bit data block
      e.g.
        Address   Tag      Data       Cache line
        FFFFFC    FFFFFC   24682468   3FFF
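A fully associative lookup can be sketched as a search over every stored tag. The names `split_associative` and `lookup` are hypothetical, and the cache contents are assumed sample data. Note that the code holds the tag as a right-aligned 22-bit number (3FFFFF for address FFFFFC), while the slide writes the same 22 bits left-aligned as FFFFFC.

```python
def split_associative(addr):
    """Split a 24-bit address into (22-bit tag, 2-bit word)."""
    return addr >> 2, addr & 0x3


def lookup(cache_lines, addr):
    """Return the addressed byte on a hit, or None on a miss.

    cache_lines is a list of (tag, 4-byte block) pairs; every tag is examined.
    """
    tag, word = split_associative(addr)
    for stored_tag, block in cache_lines:
        if stored_tag == tag:
            return block[word]
    return None


cache_lines = [(0x3FFFFF, bytes.fromhex("24682468"))]  # assumed contents
print(lookup(cache_lines, 0xFFFFFC), lookup(cache_lines, 0x000000))
```

Hardware performs these comparisons in parallel; the sequential loop here only illustrates why searching gets expensive as the number of lines grows.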
  70. Associative Mapping Summary: Address length = (s + w) bits. Number of addressable units = 2^(s+w) words or bytes. Block size = line size = 2^w words or bytes. Number of blocks in main memory = 2^(s+w) / 2^w = 2^s. Number of lines in cache = undetermined (not fixed by the address format). Size of tag = s bits.
  71. Set Associative Mapping: The cache is divided into a number of sets. Each set contains a number of lines. A given block maps to any line in a given set • e.g. block B can be in any line of set i. E.g. 2 lines per set • 2-way associative mapping • A given block can be in one of 2 lines in only one set.
  72. Set Associative Mapping Example: 13-bit set number. The set number is the block number in main memory modulo 2^13. With a 2-bit word field, addresses 000000, 008000, 010000, 018000, … map to the same set.
  73. Figure: Two Way Set Associative Cache Organization
  74. Set Associative Mapping Address Structure: Tag = 9 bits | Set = 13 bits | Word = 2 bits. Use the set field to determine which cache set to look in. Compare the tag field to see if we have a hit. E.g.
      Address (tag, set + word) | Tag | Data | Set number
      1FF 7FFC | 1FF | 12345678 | 1FFF
      001 7FFC | 001 | 11223344 | 1FFF
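The two example rows can be checked with a short sketch. It assumes the slide's address notation joins the 9-bit tag and the remaining 15 bits (set + word), so "1FF 7FFC" is the 24-bit address FFFFFC and "001 7FFC" is 00FFFC; the function name `split_set_assoc` is hypothetical.

```python
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2  # field widths from the slide

def split_set_assoc(addr):
    """Split a 24-bit address into (tag, set number, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_no = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_no, word

for tag_part, low_part in [(0x1FF, 0x7FFC), (0x001, 0x7FFC)]:
    addr = (tag_part << 15) | low_part      # rebuild the full 24-bit address
    tag, set_no, word = split_set_assoc(addr)
    print(f"addr={addr:06X} tag={tag:03X} set={set_no:04X} word={word}")
```

Both addresses land in set 1FFF with different tags, so a two-way set can hold both blocks at once.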
  75. Figure: Two Way Set Associative Mapping Example
  76. Set Associative Mapping Summary: Address length = (s + w) bits. Number of addressable units = 2^(s+w) words or bytes. Block size = line size = 2^w words or bytes. Number of blocks in main memory = 2^(s+w) / 2^w = 2^s. Number of lines in set = k. Number of sets = v = 2^d. Number of lines in cache = kv = k * 2^d. Size of tag = (s - d) bits.
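The summary formulas can be evaluated directly. The sketch below uses the symbols from the slide (w, d, k, with s derived from the address length); the function name `cache_geometry` and the dictionary keys are hypothetical. Setting k = 1 gives the direct-mapped case, where d plays the role of r.

```python
def cache_geometry(address_bits, w, d, k):
    """Evaluate the set associative summary formulas for given field widths."""
    s = address_bits - w                  # bits that identify a memory block
    return {
        "block_size": 1 << w,             # 2^w words or bytes
        "blocks_in_memory": 1 << s,       # 2^s
        "sets": 1 << d,                   # v = 2^d
        "lines_in_cache": k * (1 << d),   # kv
        "tag_bits": s - d,
    }

# Two-way example from the slides: 24-bit address, 2-bit word, 13-bit set.
print(cache_geometry(24, w=2, d=13, k=2))
# Direct mapping as the special case k = 1 with a 14-bit line field.
print(cache_geometry(24, w=2, d=14, k=1))
```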
  77. Replacement Algorithms (1) Direct Mapping: No choice. Each block maps to only one line. Replace that line.
  78. Replacement Algorithms (2) Associative & Set Associative: Hardware-implemented algorithm (for speed). Least recently used (LRU) • e.g. in 2-way set associative: which of the 2 blocks is LRU? First in first out (FIFO) • replace the block that has been in the cache longest. Least frequently used (LFU) • replace the block that has had the fewest hits. Random.
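The difference between LRU and FIFO shows up on even a short reference string. This is a software sketch of a single set, not how the hardware implements it; the function name `simulate` and the reference string are assumptions for illustration.

```python
from collections import OrderedDict

def simulate(refs, ways, policy):
    """Count hits for one cache set with `ways` lines under "LRU" or "FIFO"."""
    cache = OrderedDict()                  # oldest entry first
    hits = 0
    for block in refs:
        if block in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(block)   # a hit refreshes recency under LRU only
        else:
            if len(cache) == ways:
                cache.popitem(last=False)  # evict the oldest (LRU) or first-in (FIFO) block
            cache[block] = True
    return hits

refs = [1, 2, 1, 3, 1, 2]
print(simulate(refs, ways=2, policy="LRU"))   # 2
print(simulate(refs, ways=2, policy="FIFO"))  # 1
```

Under FIFO, block 1 is evicted when block 3 arrives even though it was just used, which costs the extra miss.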
  79. Write Policy: Must not overwrite a cache block unless main memory is up to date. Multiple CPUs may have individual caches. I/O may address main memory directly.
  80. Write Through: All writes go to main memory as well as the cache. Multiple CPUs can monitor main memory traffic to keep their local (to CPU) caches up to date. Lots of traffic. Slows down writes. Remember bogus write through caches!
  81. Write Back: Updates are initially made in the cache only. The update bit for a cache slot is set when an update occurs. If a block is to be replaced, write it to main memory only if the update bit is set. Other caches get out of sync. I/O must access main memory through the cache. N.B. 15% of memory references are writes.
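The traffic difference between the two write policies can be sketched by counting writes that reach main memory. The sketch assumes a direct-mapped cache that allocates a line on a write miss, and it does not count blocks that are still dirty at the end of the run; the function name `count_memory_writes` and the operation format are hypothetical.

```python
def count_memory_writes(ops, num_lines, policy):
    """ops: list of ("r" or "w", block). policy: "write-through" or "write-back"."""
    lines = [None] * num_lines     # block held by each line
    dirty = [False] * num_lines    # the per-line update bit
    mem_writes = 0
    for kind, block in ops:
        idx = block % num_lines
        if lines[idx] != block:                    # miss: the line is replaced
            if policy == "write-back" and dirty[idx]:
                mem_writes += 1                    # write the old block back first
            lines[idx] = block
            dirty[idx] = False
        if kind == "w":
            if policy == "write-through":
                mem_writes += 1                    # every write goes to memory
            else:
                dirty[idx] = True                  # only mark the line as updated
    return mem_writes

# Five writes to block 0, then a read of block 4, which replaces block 0.
ops = [("w", 0)] * 5 + [("r", 4)]
print(count_memory_writes(ops, 4, "write-through"))  # 5
print(count_memory_writes(ops, 4, "write-back"))     # 1
```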
  82. Caching in a Memory Hierarchy: Level k: the smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1 (in the figure, blocks 4, 9, 14 and 3). Data is copied between levels in block-sized transfer units. Level k+1: the larger, slower, cheaper storage device at level k+1 is partitioned into blocks (in the figure, blocks 0 to 15).
  83. General Caching Concepts: A program needs object d, which is stored in some block b. Cache hit • The program finds b in the cache at level k, e.g. block 14. Cache miss • b is not at level k, so the level k cache must fetch it from level k+1, e.g. block 12. • If the level k cache is full, then some current block must be replaced (evicted). • Which one? Determined by the replacement policy, e.g. evict the least recently used block. (In the figure, level k holds blocks 4, 9, 14 and 3; level k+1 holds blocks 0 to 15.)
  84. General Caching Concepts, Types of Cache Misses: • Cold (compulsory) miss: cold misses occur because the cache is empty. • Conflict miss: most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k. E.g. block i at level k+1 must be placed in block (i mod 4) at level k. Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block. E.g. referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time. • Capacity miss: occurs when the set of active cache blocks (the working set) is larger than the cache.
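One common way to tell the three miss types apart is to run the same references through a fully associative LRU cache of the same size: a first-time reference is a cold miss, a miss that the fully associative cache would have hit is a conflict miss, and the rest are capacity misses. That classification rule is an assumption of this sketch, not something the slide states; the function name `classify_misses` is hypothetical, and placement uses the slide's (i mod 4) rule.

```python
from collections import OrderedDict

def classify_misses(refs, num_blocks=4):
    """Classify each reference as hit, cold, conflict or capacity miss."""
    direct = [None] * num_blocks   # level k with placement: block i -> slot i mod num_blocks
    full_lru = OrderedDict()       # reference cache: fully associative, LRU, same size
    seen = set()
    counts = {"hit": 0, "cold": 0, "conflict": 0, "capacity": 0}
    for b in refs:
        fa_hit = b in full_lru
        if fa_hit:
            full_lru.move_to_end(b)
        else:
            if len(full_lru) == num_blocks:
                full_lru.popitem(last=False)
            full_lru[b] = True
        slot = b % num_blocks
        if direct[slot] == b:
            counts["hit"] += 1
        else:
            if b not in seen:
                counts["cold"] += 1        # first reference ever
            elif fa_hit:
                counts["conflict"] += 1    # room existed, placement rule forced the miss
            else:
                counts["capacity"] += 1    # working set exceeds the cache
            direct[slot] = b
        seen.add(b)
    return counts

print(classify_misses([0, 8, 0, 8, 0, 8]))
# {'hit': 0, 'cold': 2, 'conflict': 4, 'capacity': 0}
```

For the slide's reference string 0, 8, 0, 8, ... every access misses: two cold misses followed by conflict misses only, since both blocks would fit in a 4-block cache.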
  85. Examples of Caching in the Hierarchy
      Cache type | What cached | Where cached | Latency (cycles) | Managed by
      Registers | 4-byte word | CPU registers | 0 | Compiler
      TLB | Address translations | On-chip TLB | 0 | Hardware
      L1 cache | 32-byte block | On-chip L1 | 1 | Hardware
      L2 cache | 32-byte block | Off-chip L2 | 10 | Hardware
      Virtual memory | 4-KB page | Main memory | 100 | Hardware + OS
      Buffer cache | Parts of files | Main memory | 100 | OS
      Network buffer cache | Parts of files | Local disk | 10,000,000 | AFS/NFS client
      Browser cache | Web pages | Local disk | 10,000,000 | Web browser
      Web cache | Web pages | Remote server disks | 1,000,000,000 | Web proxy server
  86. Pentium 4 Cache: 80386: no on-chip cache. 80486: 8k, using 16-byte lines and a four-way set associative organization. Pentium (all versions): two on-chip L1 caches • data & instructions. Pentium III: L3 cache added off chip. Pentium 4 • L1 caches: 8k bytes, 64-byte lines, four-way set associative • L2 cache: feeds both L1 caches, 256k, 128-byte lines, 8-way set associative • L3 cache on chip.
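The number of sets in each cache follows from capacity, line size and associativity: sets = capacity / (line size × ways). A small sketch of that arithmetic, assuming "8k" and "256k" mean 8 × 1024 and 256 × 1024 bytes; the function name `num_sets` is hypothetical.

```python
def num_sets(cache_bytes, line_bytes, ways):
    """Number of sets = (capacity / line size) / associativity."""
    lines = cache_bytes // line_bytes
    return lines // ways

KB = 1024  # assumption: the slide's "k" denotes 1024 bytes

print(num_sets(8 * KB, 16, 4))      # 80486 cache: 128 sets
print(num_sets(8 * KB, 64, 4))      # Pentium 4 L1: 32 sets
print(num_sets(256 * KB, 128, 8))   # Pentium 4 L2: 256 sets
```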
  87. Intel Cache Evolution
      Problem | Solution | Processor on which feature first appears
      External memory slower than the system bus. | Add external cache using faster memory technology. | 386
      Increased processor speed results in external bus becoming a bottleneck for cache access. | Move external cache on-chip, operating at the same speed as the processor. | 486
      Internal cache is rather small, due to limited space on chip. | Add external L2 cache using faster technology than main memory. | 486
      Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place. | Create separate data and instruction caches. | Pentium
      Increased processor speed results in external bus becoming a bottleneck for L2 cache access. | Create separate back-side bus (BSB) that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. | Pentium Pro
      (same problem) | Move L2 cache on to the processor chip. | Pentium II
      Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. | Add external L3 cache. | Pentium III
      (same problem) | Move L3 cache on-chip. | Pentium 4
  88. Figure: Pentium 4 Block Diagram
  89. Pentium 4 Core Processor: Fetch/decode unit • Fetches instructions from the L2 cache • Decodes them into micro-ops • Stores micro-ops in the L1 cache. Out-of-order execution logic • Schedules micro-ops • Based on data dependence and resources • May speculatively execute. Execution units • Execute micro-ops • Data from the L1 cache • Results in registers. Memory subsystem • L2 cache and system bus.
  90. Pentium 4 Design Reasoning: Decodes instructions into RISC-like micro-ops before the L1 cache. Micro-ops are fixed length • Superscalar pipelining and scheduling. Pentium instructions are long & complex. Performance is improved by separating decoding from scheduling & pipelining • (more later, ch. 14). Data cache is write back • Can be configured to write through. L1 cache is controlled by 2 bits in a register • CD = cache disable • NW = not write through • 2 instructions: one to invalidate (flush) the cache, and one to write back then invalidate. L2 and L3 are 8-way set associative • Line size 128 bytes.
  91. PowerPC Cache Organization: 601: single 32kb cache, 8-way set associative. 603: 16kb (2 x 8kb), two-way set associative. 604: 32kb. 620: 64kb. G3 & G4 • 64kb L1 cache, 8-way set associative • 256k, 512k or 1M L2 cache, two-way set associative. G5 • 32kB instruction cache • 64kB data cache.
  92. Figure: PowerPC G5 Block Diagram
