Presented by:
Trupti Diwan
Shweta Ghate
Sapana Vasave
 Basics of memory
 Memory Technology
 Memory optimization
 Main memory is RAM (the terms memory, buffer,
and cache all refer to RAM), which is roughly
11,000 times faster than secondary memory
(hard disk) for random access.
 Main memory is as vital to a computer system as
the processor chip.
Fast systems have both a fast processor and a
large memory.
 Here is a list of some characteristics of
computer memory:
 Closely connected to the processor.
 Holds programs and data that the processor is
actively working with.
 Used for long-term storage.
 The processor interacts with it millions of times
per second.
 Its contents are easily changed.
 Usually its contents are organized into files.
 Main memory is the short-term memory of a
computer: it retains data only for the period
that a program is running.
 Memory is used for the purpose of running
programs.
 Main memory satisfies the demands of caches
and serves as the I/O interface.
 Performance measures of main memory
emphasize both latency and bandwidth
(Memory bandwidth is the number of bytes
read or written per unit time)
 Memory latency is the time delay required to
obtain a specific item of data
 Memory Bandwidth is the rate at which data can
be accessed (e.g. bits per second)
 Bandwidth is normally quoted per cycle time
(i.e., words delivered per memory cycle).
 This rate can be improved by concurrent access.
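As a rough sketch (all numbers hypothetical), bandwidth follows from word size and cycle time, and concurrent access to multiple banks raises the peak rate:

```python
# Hypothetical numbers: a memory bank with 8-byte words and a 20 ns cycle time.
word_bytes = 8
cycle_time_ns = 20

# Single-bank bandwidth: one word per cycle time.
single_bank_bw = word_bytes / (cycle_time_ns * 1e-9)  # bytes/second

# With 4 banks accessed concurrently (interleaving), peak bandwidth scales up.
banks = 4
interleaved_bw = single_bank_bw * banks

print(f"single bank      : {single_bank_bw / 1e6:.0f} MB/s")   # 400 MB/s
print(f"4-way interleaved: {interleaved_bw / 1e6:.0f} MB/s")   # 1600 MB/s
```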
 Main memory latency affects the cache miss
penalty,
 the primary concern of the cache.
 Main memory bandwidth is
 the primary concern of I/O and multiprocessors.
 Although caches benefit from low-latency
memory, it is generally easier to
improve memory bandwidth with new
organizations than it is to reduce latency.
 Cache designers therefore increase block size to
take advantage of the high memory bandwidth.
 Memory latency is traditionally quoted using
two measures:
 Access time
 Access time is the time between when a read is
requested and when the desired word arrives.
 Cycle time
 Cycle time is the minimum time between requests to
memory.
 Cycle time is greater than access time
because the memory needs the address lines
to be stable between accesses.
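A toy calculation (with hypothetical timing values) makes the distinction concrete: the first word arrives after the access time, but back-to-back requests are spaced by the cycle time, so sustained throughput is governed by the cycle time:

```python
# Hypothetical DRAM timing: access time 60 ns, cycle time 100 ns.
access_time_ns = 60
cycle_time_ns = 100  # cycle time > access time: address lines must stabilize

# Latency of one read is the access time...
first_word_ns = access_time_ns
# ...but requests can only be issued once per cycle time,
# so the sustained request rate is 1 / cycle time.
reads_per_us = 1000 / cycle_time_ns

print(first_word_ns, reads_per_us)  # 60 10.0
```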
Memory Hierarchy of a Modern Computer System
• By taking advantage of the principle of locality:
– Present the user with as much memory as is available in the
cheapest technology.
– Provide access at the speed offered by the fastest technology.
Fig: Memory hierarchy levels, from the processor (control, datapath, registers)
through the on-chip cache and second-level cache (SRAM), main memory (DRAM),
secondary storage (disk), and tertiary storage (tape). Speed ranges from
~1 ns at the registers to 10s of ms at disk and 10s of seconds at tape,
while capacity grows from 100s of bytes at the registers to terabytes at tape.
 Static Random Access Memory (SRAM)
- used for caches.
 Dynamic Random Access Memory (DRAM)
- used for main memory.
 ‘S’ stands for static.
 No need to refresh, so access time is close to
cycle time.
 Uses six transistors per bit.
 Bits stored as on/off switches.
 No charge to leak.
 More complex construction.
 Larger per bit.
 More expensive.
 Faster.
 Transistor arrangement gives a stable logic state.
 State 1
 C1 high, C2 low
 T1, T4 off; T2, T3 on
 State 0
 C2 high, C1 low
 T2, T3 off; T1, T4 on
 Address line transistors T5 and T6 act as switches.
 Write – apply the value to line B and its complement to line B̄.
 Read – the value is on line B.
 Bits stored as charge in capacitors.
 Charges leak.
 Needs refreshing even when powered.
 Simpler construction.
 Smaller per bit.
 Less expensive.
 Needs refresh circuits.
 Slower.
 Cycle time is longer than the access time.
 Address line active when bit read or written.
 Transistor switch closed (current flows).
 Write
 Voltage applied to bit line.
 High for 1, low for 0.
 Then signal address line.
 Transfers charge to capacitor.
 Read
 Address line selected.
 Transistor turns on.
 Charge from capacitor fed via bit line to sense amplifier.
 Compares with reference value to determine 0 or 1.
 Capacitor charge must be restored.
 Addresses divided into 2 halves (Memory as a 2D matrix):
 RAS or Row Access Strobe
 CAS or Column Access Strobe
Fig : Internal Organization of a DRAM
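The split can be sketched as follows; the row/column widths here are hypothetical, and real DRAMs multiplex the two halves over the same pins (row latched on RAS, then column on CAS):

```python
# Hypothetical 16-bit DRAM address split into an 8-bit row and an 8-bit column.
# The two halves share the same address pins: the row half is latched on RAS,
# then the column half on CAS.
ROW_BITS = 8
COL_BITS = 8

def split_address(addr: int) -> tuple[int, int]:
    """Return (row, column) as the DRAM would latch them."""
    col = addr & ((1 << COL_BITS) - 1)                 # low half, latched on CAS
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)   # high half, latched on RAS
    return row, col

print(split_address(0xABCD))  # (171, 205), i.e. row 0xAB, column 0xCD
```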
 DRAM originally had an asynchronous interface to
the memory controller, so every transfer involved
overhead to synchronize with the controller.
 Adding a clock signal to the DRAM interface
reduces this overhead; the optimization is
called
synchronous DRAM, i.e. SDRAM.
 Double data rate (DDR) was a later development
of SDRAM, used in PC memory beginning in 2000.
 DDR SDRAM internally performs double-width
accesses at the clock rate, and uses a double-data-
rate interface to transfer one half on each clock
edge.
 Further versions of DDR (2 data transfers/cycle):
-DDR2 (4 data transfers/cycle)
-DDR3 (8 data transfers/cycle)
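Taking the slide's transfers-per-cycle figures at face value, peak transfer rates can be sketched for a hypothetical 200 MHz bus with an 8-byte interface:

```python
# Peak transfer rate sketch (hypothetical 200 MHz memory bus, 8-byte interface).
bus_mhz = 200
bytes_per_transfer = 8

transfers_per_cycle = {"DDR": 2, "DDR2": 4, "DDR3": 8}  # per the slide

for gen, n in transfers_per_cycle.items():
    peak_mb_s = bus_mhz * 1e6 * n * bytes_per_transfer / 1e6
    print(f"{gen}: {peak_mb_s:.0f} MB/s")  # 3200, 6400, 12800 MB/s
```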
 Review: 6 Basic Cache Optimizations
 • Reducing hit time
 1. Address translation during cache indexing
 • Reducing miss penalty
 2. Multilevel caches
 3. Giving priority to read misses over write
misses
 • Reducing miss rate
 4. Larger block size (compulsory misses)
 5. Larger cache size (capacity misses)
 6. Higher associativity (conflict misses)
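All of these optimizations target the average memory access time, AMAT = hit time + miss rate × miss penalty. A small sketch with hypothetical numbers shows how reducing the miss rate or the miss penalty plays out:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: the quantity these optimizations target."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical baseline: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
base = amat(1, 0.05, 100)             # 6.0 cycles

# Halving the miss rate (e.g. a larger cache) vs halving the miss penalty
# (e.g. a second-level cache) gives the same improvement here:
better_rate    = amat(1, 0.025, 100)  # 3.5 cycles
better_penalty = amat(1, 0.05, 50)    # 3.5 cycles

print(base, better_rate, better_penalty)  # 6.0 3.5 3.5
```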
 Reducing hit time
 1. Small and simple caches
 2. Way prediction
 3. Trace caches
 • Increasing cache bandwidth
 4. Pipelined caches
 5. Multibanked caches
 6. Nonblocking caches
 • Reducing miss penalty
 7. Critical word first
 8. Merging write buffers
 • Reducing miss rate
 9. Compiler optimizations
 • Reducing miss penalty or miss rate via parallelism
 10. Hardware prefetching
 11. Compiler prefetching
 Advanced cache optimizations fall into the following categories:
 Reducing the hit time: small and simple caches, way prediction, and trace
caches
 Increasing cache bandwidth: pipelined caches, multibanked caches, and
nonblocking caches
 Reducing the miss penalty: critical word first and merging write buffers
 Reducing the miss rate: compiler optimizations
 Reducing the miss penalty or miss rate via parallelism: hardware prefetching
and compiler prefetching
 We will conclude with a summary of the implementation complexity and the
performance impact of these optimizations.
 First Optimization: Small and Simple Caches to Reduce
Hit Time
 A time-consuming portion of a cache hit is using the
index portion of the address to read the tag memory
and then compare it to the address. Smaller hardware
can be faster, so a small cache can help the hit time.
It is also critical to keep an L2 cache small enough
to fit on the same chip as the processor, to avoid
the time penalty of going off chip.
 The second suggestion is to keep the cache simple, such
as using direct mapping. One benefit of direct-mapped
caches is that the designer can overlap the tag check
with the transmission of the data. This effectively
reduces hit time.
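A minimal sketch of a direct-mapped lookup (hypothetical geometry) shows why the overlap works: the index alone selects the single candidate frame, so the data read need not wait for the tag comparison:

```python
# Direct-mapped cache lookup sketch (hypothetical geometry: 64-byte blocks,
# 256 sets). The index selects exactly one frame, so the data can be read out
# while the tag comparison proceeds in parallel.
BLOCK_BITS = 6   # 64-byte blocks
INDEX_BITS = 8   # 256 sets

tags = [None] * (1 << INDEX_BITS)   # tag store
data = [None] * (1 << INDEX_BITS)   # data store (one block per set)

def lookup(addr: int):
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag   = addr >> (BLOCK_BITS + INDEX_BITS)
    block = data[index]          # data read starts immediately...
    hit   = tags[index] == tag   # ...overlapped with the tag check
    return (block, True) if hit else (None, False)

# Fill one frame and probe it.
addr = 0x12345
idx  = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
tags[idx] = addr >> (BLOCK_BITS + INDEX_BITS)
data[idx] = b"block"
print(lookup(addr))  # (b'block', True)
```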
 Excerpt from the summary (technique: impact; complexity; comment):
 Compiler techniques to reduce cache misses: miss rate +;
complexity 0; software is a challenge, but some computers
have a compiler option.
 Hardware prefetching of instructions and data: miss penalty +,
miss rate +; complexity 2 (instr.), 3 (data); many prefetch
instructions; Opteron and Pentium 4 prefetch data.
 Compiler-controlled prefetching: miss penalty +, miss rate +;
complexity 3; needs nonblocking cache; possible instruction
overhead; in many CPUs.
 Figure 5.11: Summary of 11 advanced cache
optimizations showing impact on cache performance
and complexity.
 Although generally a technique helps only one factor,
prefetching can reduce misses if done sufficiently early;
if not, it can reduce the miss penalty. + means that the
technique improves the factor, – means it hurts that
factor, and blank means it has no impact. The complexity
measure is subjective, with 0 being the easiest and 3
being a challenge.
 The techniques to improve hit time, bandwidth, miss
penalty, and miss rate generally affect the other
components of the average memory access time equation
as well as the complexity of the memory hierarchy.
 Generally, no technique helps more than one category.
 Computer Architecture - A Quantitative
Approach
Unit I Memory technology and optimization