All the lecture slides were adapted from the slides of David Patterson (1998, 2001) and David E. Culler (2001), Copyright 1998-2002, University of California, Berkeley
Standing on shoulders of giants
“Ideally one would desire an indefinitely large memory capacity such that any particular… word would be immediately available… We are… forced to recognize the possibility of constructing a hierarchy of memories, each of which has a greater capacity than the preceding but which is less quickly accessible.”
A. W. Burks, H. H. Goldstine and J. von Neumann
Preliminary Discussion of the Logical Design of an Electronic Computing Instrument (1946)
Elements of Memory Organization
The technologies (SRAM, DRAM etc)
Main Memory Background
Random Access Memory (vs. Serial Access Memory)
Different flavors at different levels
Physical Makeup (CMOS, DRAM)
Low Level Architectures (FPM,EDO,BEDO,SDRAM)
Cache uses SRAM : Static Random Access Memory
No refresh (6 transistors/bit vs. 1 transistor/bit). Size: DRAM/SRAM 4-8. Cost/Cycle time: SRAM/DRAM 8-16
Main Memory is DRAM : Dynamic Random Access Memory
Dynamic since needs to be refreshed periodically (8 ms, 1% time)
Addresses divided into 2 halves (Memory as a 2D matrix):
RAS or Row Access Strobe
CAS or Column Access Strobe
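The two-halves addressing above can be sketched in a few lines of Python. This is only an illustration: the function name `split_address` is made up here, and treating the upper bits as the row half is an assumption (the slides do not say which half goes first).

```python
def split_address(addr: int, row_bits: int, col_bits: int) -> tuple[int, int]:
    """Split a flat cell address into (row, column) halves.

    Assumption: the upper bits form the row (sent with RAS) and the
    lower bits form the column (sent with CAS).
    """
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range")
    row = addr >> col_bits                 # upper half, latched on RAS
    col = addr & ((1 << col_bits) - 1)     # lower half, latched on CAS
    return row, col


# An 11-bit toy address: 3 row bits followed by 8 column bits.
print(split_address(0b101_0000_0011, row_bits=3, col_bits=8))  # (5, 3)
```

Because both halves travel over the same pins one after the other, the chip needs only half as many address pins as a flat address would require.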
Static RAM (SRAM)
Six transistors in cross connected fashion
Provides regular AND inverted outputs
Implemented in CMOS process
Single Port 6-T SRAM Cell
SRAM cells exhibit high speed/poor density
DRAM: simple transistor/capacitor pairs in high density form
Dynamic RAM
[Figure: DRAM cell with word line, bit line, storage capacitor C and sense amp]
Write: Charge bitline HIGH or LOW and set wordline HIGH
Read: Bit line is precharged to a voltage halfway between HIGH and LOW, and then the word line is set HIGH.
Depending on the charge in the cap, the precharged bitline is pulled slightly higher or lower.
Sense Amp Detects change
Explains why Cap can’t shrink
Need to sufficiently drive bitline
Increase density => increase parasitic capacitance
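Why the capacitor cannot shrink follows from charge sharing between the cell and the bitline. The sketch below uses the standard charge-conservation result; the supply voltage and capacitance values are illustrative assumptions, not figures from these slides.

```python
def bitline_swing(v_cell: float, v_dd: float,
                  c_cell: float, c_bitline: float) -> float:
    """Voltage change seen on a bitline precharged to v_dd / 2.

    Charge conservation when the word line opens the cell:
        dV = (v_cell - v_dd / 2) * c_cell / (c_cell + c_bitline)
    Capacitances may be in any unit as long as both use the same one.
    """
    v_precharge = v_dd / 2
    return (v_cell - v_precharge) * c_cell / (c_cell + c_bitline)


V_DD = 3.3        # assumed supply voltage, volts
C_CELL = 30.0     # assumed cell capacitance, fF
C_BITLINE = 300.0 # assumed parasitic bitline capacitance, fF

print(bitline_swing(V_DD, V_DD, C_CELL, C_BITLINE))      # stored HIGH: about +0.15 V
print(bitline_swing(0.0, V_DD, C_CELL, C_BITLINE))       # stored LOW: about -0.15 V
print(bitline_swing(V_DD, V_DD, C_CELL, 2 * C_BITLINE))  # more parasitic C: smaller swing
```

The sense amp only ever sees this small swing, so a smaller cell capacitor or a larger parasitic bitline capacitance both cut directly into the signal.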
DRAM logical organization (4 Mbit)
Square root of bits per RAS/CAS
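[Figure: 4 Mbit DRAM organization. Address lines A0…A10 (11 bits) feed the row decoder and the column decoder; memory array of 2,048 x 2,048 storage cells on word lines; sense amps & I/O with data in (D) and data out (Q)]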
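The arithmetic behind the 4 Mbit organization can be checked with a short sketch. The helper name `square_array_geometry` is hypothetical, and it assumes a single square array whose side is a power of two.

```python
import math


def square_array_geometry(capacity_bits: int) -> tuple[int, int]:
    """Return (side, address_bits_per_half) for a square cell array.

    Assumes one square array with a power-of-two side, so the same
    number of address bits selects a row (RAS) and a column (CAS).
    """
    side = math.isqrt(capacity_bits)
    if side * side != capacity_bits:
        raise ValueError("capacity is not a perfect square")
    if side & (side - 1):
        raise ValueError("array side is not a power of two")
    address_bits = side.bit_length() - 1   # log2(side)
    return side, address_bits


print(square_array_geometry(4 * 2**20))  # (2048, 11)
```

For 4 Mbit this gives a 2,048 x 2,048 array and 11 address bits per half, matching the array size shown for this part.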
So, Why do I freaking care?
By its nature, DRAM isn’t built for speed
Response times dependent on capacitive circuit properties which get worse as density increases
DRAM process isn’t easy to integrate into CMOS process
DRAM is off chip
Connectors, wires, etc. introduce slowness
IRAM efforts are looking to integrate the two
Memory Architectures are designed to minimize impact of DRAM latency
Low Level: Memory chips
High Level memory designs.
You will pay $$$$$$ and then some $$$ for a good memory system.
So, Why do I freaking care?
1960-1985: Speed = ƒ(no. operations)
Pipelined Execution & Fast Clock Rate
Superscalar Instruction Issue
1998: Speed = ƒ(non-cached memory accesses)
What does this mean for
Compilers? Operating Systems? Algorithms? Data Structures?
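A rough way to see why speed becomes a function of non-cached memory accesses is a two-term time model: every instruction costs one cycle, and every cache miss adds a fixed DRAM penalty. The model and all numbers below (cycle time, miss penalty, miss rates) are illustrative assumptions, not measurements from these slides.

```python
def run_time_s(instructions: int, cycle_s: float,
               misses_per_instruction: float, miss_penalty_s: float) -> float:
    """Very simple model: one cycle per instruction plus a fixed
    penalty for every access that misses the cache."""
    per_instruction = cycle_s + misses_per_instruction * miss_penalty_s
    return instructions * per_instruction


N = 1_000_000            # assumed instruction count
CYCLE = 2e-9             # assumed 500 MHz clock
PENALTY = 110e-9         # assumed cost of one non-cached access

for miss_rate in (0.0, 0.01, 0.05):
    t = run_time_s(N, CYCLE, miss_rate, PENALTY)
    print(f"miss rate {miss_rate:.0%}: {t * 1e3:.2f} ms")
```

Under these assumptions, a 5% miss rate already makes the memory term several times larger than the compute term, which is why compilers, operating systems, algorithms and data structures all have to care about locality.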
A 60 ns (tRAC) DRAM can
perform a row access only every 110 ns (tRC)
perform column access (tCAC) in 15 ns, but time between column accesses is at least 35 ns (tPC).
In practice, external address delays and turning around buses make it 40 to 50 ns
These times do not include the time to drive the addresses off the microprocessor nor the memory controller overhead!
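The timing figures above can be turned into a back-of-the-envelope comparison between accesses that each open a new row and accesses that stay within one open row. This is a simplified sketch: it ignores controller overhead, refresh and bus turnaround, and the function names are made up for illustration.

```python
T_RAC_NS = 60    # row access time
T_RC_NS = 110    # minimum time between row accesses
T_CAC_NS = 15    # column access time
T_PC_NS = 35     # minimum time between column accesses


def random_rows_busy_ns(n: int) -> int:
    """Time the DRAM is occupied by n accesses that each open a new row."""
    return n * T_RC_NS


def same_row_time_ns(n: int) -> int:
    """Time for n accesses within one open row: one row access,
    then one column cycle for each further access."""
    return T_RAC_NS + (n - 1) * T_PC_NS


print(random_rows_busy_ns(8))  # 880
print(same_row_time_ns(8))     # 305
```

Under this model, eight accesses to one row finish in roughly a third of the time of eight accesses to different rows, which is the gap that the faster memory organizations try to exploit.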
Can it be made faster?
Many techniques gain higher bandwidth at the cost of higher latency
The idea is that the latency will be taken care of by the cache.
Has a clock input.
Data output is in bursts w/ each element clocked
Flavors: SDRAM, DDR
PC100: Intel spec to meet 100 MHz memory bus designs. Introduced w/ i440BX chipset
[Figure: write and read timing]
“Intellectual property company”.
Located in Los Altos, CA
Designed a memory architecture
Licensed to manufacturers
They have no factories.
Picked up by Intel, who signed an exclusive deal with them for Pentium 4 motherboards.
Litigation regarding the intellectual property.
Protocol based RAM w/ narrow (16-bit) bus
High clock rate (400 MHz), but long latency
Multiple arrays w/ data transferred on both edges of clock
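The peak transfer rate implied by the bullets above is simple arithmetic: bus width times clock rate times transfers per clock. The sketch below applies it to the RDRAM figures given here; the PC100 comparison assumes a 64-bit module data bus and one transfer per clock, which these slides do not state.

```python
def peak_bandwidth_bytes_per_s(bus_bits: int, clock_hz: int,
                               transfers_per_clock: int) -> int:
    """Peak (not sustained) transfer rate of a memory bus."""
    return bus_bits * clock_hz * transfers_per_clock // 8


# RDRAM: 16-bit bus, 400 MHz, data on both clock edges.
rdram = peak_bandwidth_bytes_per_s(16, 400_000_000, 2)

# PC100 SDRAM: 100 MHz; 64-bit bus and single data rate are assumptions.
pc100 = peak_bandwidth_bytes_per_s(64, 100_000_000, 1)

print(rdram)  # 1600000000
print(pc100)  # 800000000
```

These are peak numbers only; the long latency noted above is not captured by this calculation.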
[Figures: RAMBUS bank; RDRAM memory system]
DRAMs: capacity +60%/yr, cost –30%/yr
2.5X cells/area, 1.5X die size in 3 years
’98 DRAM fab line costs $2B
DRAM only: density, leakage vs. speed
Rely on increasing no. of computers & memory per computer (60% market)
SIMM or DIMM is replaceable unit => computers use any generation DRAM
Commodity, second source industry => high volume, low profit, conservative
Little organization innovation in 20 years
Don’t want to be chip foundries (bad for RDRAM)
Order of importance: 1) Cost/bit 2) Capacity
First RAMBUS: 10X BW, +30% cost => little impact
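The growth rates quoted in the DRAM trends above can be cross-checked against each other by compounding them over three years. This is plain arithmetic on the numbers given; the helper name `compound` is just for illustration.

```python
def compound(rate_per_year: float, years: int) -> float:
    """Growth factor after compounding a yearly rate."""
    return (1 + rate_per_year) ** years


capacity_3yr = compound(0.60, 3)    # capacity +60%/yr
cost_3yr = compound(-0.30, 3)       # cost -30%/yr
density_times_die = 2.5 * 1.5       # 2.5X cells/area, 1.5X die size in 3 years

print(f"capacity after 3 years: {capacity_3yr:.3f}x")          # 4.096x
print(f"cost after 3 years: {cost_3yr:.3f}x")                  # 0.343x
print(f"cells/area times die size: {density_times_die:.2f}x")  # 3.75x
```

The two routes agree to within rounding: +60% per year compounds to about 4X capacity in three years, and 2.5X cells per area on a 1.5X larger die gives 3.75X.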
Read-only memory (ROM)
Programmed at time of manufacture
Cannot be written by the computer
It is not erased by loss of power
Some of them can be erased and rewritten by special hardware (EEPROM)
One transistor / bit.
BIOS of desktop computers
Embedded devices (also serves as a code protection device)
Floating gate transistor
Presence of charge => “0”
Erased electrically or by UV light (EPROM)
Reads like DRAM (~ns)
Writes like DISK (~ms). Write is a complex operation