  1. 1. EECS150 - Digital Design Lecture 20 - Memory April 4&9, 2002 John Wawrzynek
  2. 2. Memory Basics <ul><li>Uses: </li></ul><ul><ul><li>data & program storage </li></ul></ul><ul><ul><li>general purpose registers </li></ul></ul><ul><ul><li>buffering </li></ul></ul><ul><ul><li>table lookups </li></ul></ul><ul><ul><li>CL implementation </li></ul></ul><ul><ul><li>Whenever a large collection of state elements is required. </li></ul></ul><ul><li>Types: </li></ul><ul><ul><li>RAM - random access memory </li></ul></ul><ul><ul><li>ROM - read only memory </li></ul></ul><ul><ul><li>EPROM, FLASH - electrically programmable read only memory </li></ul></ul><ul><li>Example RAM: Register file </li></ul><ul><ul><li>regid = register identifier </li></ul></ul><ul><ul><li>sizeof(regid) = log2(# of reg) </li></ul></ul><ul><ul><li>WE = write enable </li></ul></ul>
  3. 3. Register File Internals <ul><li>Functionally the regfile is equivalent to a 2-D array of flip-flops: </li></ul><ul><li>Cell with write logic: </li></ul>How do we go from &quot;regid&quot; to &quot;SEL&quot;?
  4. 4. Regid (address) Decoding
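The step from "regid" to "SEL" is an address decoder: exactly one select line is asserted for each register identifier. The following is a minimal behavioral sketch in Python, not the circuit from the slide; the function names `decode` and `regid_width` are hypothetical and chosen only for illustration.

```python
def regid_width(num_regs):
    """Bits needed for regid, i.e. ceil(log2(num_regs)), at least 1."""
    return max(1, (num_regs - 1).bit_length())


def decode(regid, num_regs):
    """Return one-hot SEL lines: SEL[i] is 1 only when i == regid."""
    if not 0 <= regid < num_regs:
        raise ValueError("regid out of range")
    return [1 if i == regid else 0 for i in range(num_regs)]


# A 32-register file needs a 5-bit regid and 32 select lines.
width = regid_width(32)
sel = decode(2, 4)  # select register 2 out of 4
```

Each SEL line corresponds to one row of the flip-flop array, so a 2^n-row memory needs an n-to-2^n decoder.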
  5. 5. Standard Internal Memory Organization <ul><li>Special circuit tricks are used for the cell array to improve storage density. (We will look at these later) </li></ul><ul><li>RAM/ROM naming convention: </li></ul><ul><ul><li>examples: 32 X 8, &quot;32 by 8&quot; => 32 8-bit words </li></ul></ul><ul><ul><li>1M X 1, &quot;1 meg by 1&quot; => 1M 1-bit words </li></ul></ul>
  6. 6. Read Only Memory (ROM) <ul><li>Functional Equivalence: </li></ul><ul><li>Of course, full tri-state buffers are not needed at each cell point. </li></ul><ul><li>Single transistors are used to implement zero cells. Logic ones are derived through precharging or a bit-line pullup transistor. </li></ul>
  7. 7. Column MUX in ROMs and RAMs: <ul><li>Controls physical aspect ratio </li></ul><ul><li>In DRAM, allows reuse of chip address pins </li></ul>
  8. 8. Cascading Memory Modules (or chips) <ul><li>example: 256 X 8 ROM using 256 X 4 parts: </li></ul><ul><li>example: 1K X 8 ROM using 256 X 4 parts: </li></ul><ul><li>each module has tri-state outputs: </li></ul>
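The two cascading patterns can be sketched behaviorally in Python. This is an illustrative model under stated assumptions, not the wiring from the slide's figures: each 256 X 4 part is modeled as a list of 256 nibbles, the assignment of the high nibble to the first part is an assumption, and the helper names `read_256x8` and `read_1kx8` are hypothetical.

```python
def read_256x8(hi_part, lo_part, addr):
    """Width expansion: both 256x4 parts see the same 8-bit address and
    their 4-bit outputs are concatenated into one 8-bit word."""
    return (hi_part[addr] << 4) | lo_part[addr]


def read_1kx8(banks, addr):
    """Depth expansion: the upper 2 address bits pick one of four banks
    (modeling a decoder enabling that bank's tri-state outputs); the
    lower 8 bits address the word inside the bank."""
    bank_sel = addr >> 8
    offset = addr & 0xFF
    hi_part, lo_part = banks[bank_sel]
    return read_256x8(hi_part, lo_part, offset)


# Split 1K bytes of sample data across eight 256x4 parts (4 banks x 2 parts).
data = [(i * 7) & 0xFF for i in range(1024)]
banks = [
    ([data[b * 256 + a] >> 4 for a in range(256)],
     [data[b * 256 + a] & 0xF for a in range(256)])
    for b in range(4)
]
```

Because only the selected bank drives the shared data lines, the tri-state outputs of the other banks can be tied to the same bus without contention.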
  9. 9. Definitions <ul><li>Bandwidth: </li></ul><ul><ul><li>Total amount of data transferred out of a device or across an interface per unit time (usually bytes/sec). </li></ul></ul><ul><li>Latency: </li></ul><ul><ul><li>A measure of the time from a request for a data transfer until the data is received. </li></ul></ul><ul><li>Memory Interfaces for Accessing Data </li></ul><ul><li>Asynchronous (unclocked): </li></ul><ul><ul><li>A change in the address results in data appearing. </li></ul></ul><ul><li>Synchronous (clocked): </li></ul><ul><ul><li>A change in address, followed by an edge on CLK, results in data appearing. Sometimes, multiple requests may be outstanding. </li></ul></ul><ul><li>Volatile: </li></ul><ul><ul><li>Loses its state when the power goes off. </li></ul></ul>
  10. 10. Example Memory Components: <ul><li>Volatile: </li></ul><ul><ul><li>Random Access Memory (RAM): </li></ul></ul><ul><ul><ul><li>DRAM &quot;dynamic&quot; </li></ul></ul></ul><ul><ul><ul><li>SRAM &quot;static&quot; </li></ul></ul></ul><ul><li>Non-volatile: </li></ul><ul><ul><li>Read Only Memory (ROM): </li></ul></ul><ul><ul><ul><li>Mask ROM &quot;mask programmable&quot; </li></ul></ul></ul><ul><ul><ul><li>EPROM &quot;erasable programmable&quot; </li></ul></ul></ul><ul><ul><ul><li>EEPROM &quot;electrically erasable programmable&quot; </li></ul></ul></ul><ul><ul><ul><li>FLASH memory - similar to EEPROM with programmer integrated on chip </li></ul></ul></ul>
  11. 11. Volatile Memory Comparison <ul><li>SRAM Cell </li></ul><ul><li>Larger cell => lower density, higher cost/bit </li></ul><ul><li>No refresh required </li></ul><ul><li>Simple read => faster access </li></ul><ul><li>Standard IC process => natural for integration with logic </li></ul><ul><li>DRAM Cell </li></ul><ul><li>Smaller cell => higher density, lower cost/bit </li></ul><ul><li>Needs periodic refresh, and refresh after read </li></ul><ul><li>Complex read => longer access time </li></ul><ul><li>Special IC process => difficult to integrate with logic circuits </li></ul>(Figures: SRAM cell with a word line and two bit lines; DRAM cell with a word line and one bit line.)
  12. 12. In Desktop Computer Systems: <ul><li>SRAM (lower density, higher speed) used in CPU register file, on- and off-chip caches. </li></ul><ul><li>DRAM (higher density, lower speed) used in main memory </li></ul><ul><li>Closing the gap: innovation targeted towards higher bandwidth for memory systems: </li></ul><ul><ul><li>SDRAM - synchronous DRAM </li></ul></ul><ul><ul><li>RDRAM - Rambus DRAM </li></ul></ul><ul><ul><li>EDO RAM - extended data out DRAM </li></ul></ul><ul><ul><li>Three-dimensional RAM </li></ul></ul><ul><ul><li>hyper-page mode DRAM </li></ul></ul><ul><ul><li>video RAM </li></ul></ul><ul><ul><li>multibank DRAM </li></ul></ul>
  13. 13. Important DRAM Examples: <ul><li>EDO - extended data out (similar to fast-page mode) </li></ul><ul><ul><li>RAS cycle fetches rows of data from cell array blocks (long access time, around 100ns) </li></ul></ul><ul><ul><li>Subsequent CAS cycles quickly access data from row buffers if within an address page (page is around 256 Bytes) </li></ul></ul><ul><li>SDRAM - synchronous DRAM </li></ul><ul><ul><li>clocked interface </li></ul></ul><ul><ul><li>uses dual banks internally. Start access in one bank, then the next, then receive data from the first, then the second. </li></ul></ul><ul><li>DDR - Double data rate SDRAM </li></ul><ul><ul><li>Uses both rising (positive) and falling (negative) edges of the clock for data transfer (typically a 100 MHz clock with 200 MHz transfer rate). </li></ul></ul><ul><li>RDRAM - Rambus DRAM </li></ul><ul><ul><li>Entire data blocks are accessed and transferred out on a high-speed bus-like interface (500 MB/s, 1.6 GB/s) </li></ul></ul><ul><ul><li>Tricky system level design. More expensive memory chips. </li></ul></ul>
  14. 14. Non-volatile Memory <ul><li>Mask ROM </li></ul><ul><ul><li>Used with logic circuits for tables etc. </li></ul></ul><ul><ul><li>Contents fixed at IC fab time (truly write once!) </li></ul></ul><ul><li>EPROM (erasable programmable) & FLASH </li></ul><ul><ul><li>requires special IC process (floating gate technology) </li></ul></ul><ul><ul><li>writing is slower than RAM. EPROM uses a special programming system to provide special voltages and timing. </li></ul></ul><ul><ul><li>reading can be made fairly fast. </li></ul></ul><ul><ul><li>rewriting is very slow. </li></ul></ul><ul><ul><ul><li>erasure is first required; for EPROM, by UV light exposure </li></ul></ul></ul>Used to hold fixed code (e.g. BIOS), tables of data (e.g. FSM next state/output logic), slowly changing values (date/time on computer).
  15. 15. FLASH Memory <ul><li>Electrically erasable </li></ul><ul><li>In system programmability and erasability (no special system or voltages needed) </li></ul><ul><li>On-chip circuitry (FSM) to control erasure and programming (writing) </li></ul><ul><li>Erasure happens in variable sized &quot;sectors&quot; in a flash (16K - 64K Bytes) </li></ul>See: for product descriptions, etc.
  16. 16. Relationship between Memory and CL <ul><li>Memory blocks can be (and often are) used to implement combinational logic functions: </li></ul><ul><li>Examples: </li></ul><ul><ul><li>LUTs in FPGAs </li></ul></ul><ul><ul><li>A 1M X 8 EPROM can implement 8 independent functions, each of log2(1M) = 20 inputs. </li></ul></ul><ul><li>The decoder part of a memory block can be considered a “minterm generator”. </li></ul><ul><li>The cell array part of a memory block can be considered an OR function over a subset of rows. </li></ul><ul><li>The combination gives us a way to implement logic functions directly in sum of products form. </li></ul><ul><li>Several variations on this theme exist in a set of devices called programmable logic devices (PLDs). </li></ul>
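To make the memory-as-logic idea concrete, here is a small Python sketch that fills a ROM from truth tables and then evaluates the functions by lookup. The full-adder example and the names `build_rom` and `lookup` are assumptions made for illustration; they do not come from the slides.

```python
def build_rom(n_inputs, funcs):
    """Build a 2**n_inputs-word ROM. Bit k of each word stores the value
    of funcs[k] for the input combination encoded by the address."""
    rom = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> i) & 1 for i in range(n_inputs)]
        word = 0
        for k, f in enumerate(funcs):
            word |= (f(*bits) & 1) << k
        rom.append(word)
    return rom


def lookup(rom, *inputs):
    """Evaluate all stored functions at once: the inputs form the address."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return rom[addr]


# An 8 X 2 ROM implementing a full adder: bit 0 = sum, bit 1 = carry out.
full_adder_rom = build_rom(3, [
    lambda a, b, c: a ^ b ^ c,
    lambda a, b, c: (a & b) | (a & c) | (b & c),
])
```

Each address corresponds to one minterm of the inputs (the decoder), and each output bit ORs together the minterms whose stored cell is 1 (the cell array), which is the sum-of-products view described above.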
  17. 17. A ROM as AND/OR Logic Device
  18. 18. PLD Summary
  19. 19. PLA Example
  20. 20. PAL Example
  21. 21. Memory Blocks in FPGAs <ul><li>LUTs can double as small RAM blocks: </li></ul><ul><ul><li>4-LUT is a 16x1 memory </li></ul></ul><ul><ul><li>achieves 16x density advantage over using CLB flip-flops </li></ul></ul><ul><li>Newer FPGA families include additional on-chip RAM blocks (usually dual ported) </li></ul><ul><ul><li>Called “block-rams” in Xilinx Virtex series </li></ul></ul>
  22. 22. Memory Specification in Verilog <ul><li>Memory modeled by an array of registers: </li></ul>
      reg [15:0] memword[0:1023]; // 1,024 registers of 16 bits each

      // Example Memory Block Specification
      // -----------------------------
      // Read and write operations of memory.
      // Memory size is 64 words of 4 bits each.
      module memory (Enable, ReadWrite, Address, DataIn, DataOut);
        input        Enable, ReadWrite;
        input  [3:0] DataIn;
        input  [5:0] Address;
        output [3:0] DataOut;
        reg    [3:0] DataOut;
        reg    [3:0] Mem [0:63];          // 64 x 4 memory

        // Address and DataIn are in the sensitivity list so that the model
        // also responds when they change while Enable is asserted.
        always @ (Enable or ReadWrite or Address or DataIn)
          if (Enable)
            if (ReadWrite)
              DataOut = Mem[Address];     // Read
            else
              Mem[Address] = DataIn;      // Write
          else
            DataOut = 4'bz;               // High impedance state
      endmodule
  23. 23. Error Correction Codes (ECC) <ul><li>Memory systems generate errors (accidentally flipped bits) </li></ul><ul><ul><li>DRAMs store very little charge per bit </li></ul></ul><ul><ul><li>“Soft” errors occur occasionally when cells are struck by alpha particles or other environmental upsets. </li></ul></ul><ul><ul><li>Less frequently, “hard” errors can occur when chips permanently fail. </li></ul></ul><ul><li>Where “perfect” memory is required </li></ul><ul><ul><li>servers, spacecraft/military computers, … </li></ul></ul><ul><li>Memories are protected against failures with ECCs </li></ul><ul><li>Extra bits are added to each data-word </li></ul><ul><ul><li>extra bits are used to detect and/or correct faults in the memory system </li></ul></ul><ul><ul><li>in general, each possible data word value is mapped to a unique “code word”. A fault changes a valid code word to an invalid one, which can be detected. </li></ul></ul>
  24. 24. Simple Error Detection Coding <ul><li>Each data value, before it is written to memory, is “tagged” with an extra bit (the parity bit p) to force the stored word to have even parity: p = b7 ⊕ b6 ⊕ b5 ⊕ b4 ⊕ b3 ⊕ b2 ⊕ b1 ⊕ b0 </li></ul><ul><li>Each word, as it is read from memory, is “checked” by finding its parity (including the parity bit): c = b7 ⊕ b6 ⊕ b5 ⊕ b4 ⊕ b3 ⊕ b2 ⊕ b1 ⊕ b0 ⊕ p </li></ul><ul><li>A non-zero parity (c = 1) indicates an error occurred: </li></ul><ul><ul><li>two errors (on different bits) are not detected (nor any even number of errors) </li></ul></ul><ul><ul><li>odd numbers of errors are detected. </li></ul></ul>
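The write-time tagging and read-time check can be sketched in a few lines of Python. This is an illustrative model only: placing the parity bit in the least significant position of the stored 9-bit word is an assumption, and the names `parity`, `write_word`, and `check_word` are hypothetical.

```python
def parity(x):
    """XOR of all bits of x: 1 if the number of 1 bits is odd, else 0."""
    return bin(x).count("1") & 1


def write_word(data8):
    """Tag an 8-bit value with a parity bit so the stored word has even parity."""
    p = parity(data8)
    return (data8 << 1) | p  # assumed layout: b7..b0 followed by p


def check_word(stored9):
    """Return the check bit c: 0 means parity is even (no error detected)."""
    return parity(stored9)


stored = write_word(0b10110010)
single_flip = stored ^ 0b000000100   # one bit in error
double_flip = stored ^ 0b000000110   # two bits in error
```

Checking `single_flip` yields c = 1, while `double_flip` yields c = 0, which is why simple parity cannot detect an even number of errors.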
  25. 25. Hamming Error Correcting Code <ul><li>Use more parity bits to pinpoint bit(s) in error, so they can be corrected. </li></ul><ul><li>Example: single error correction (SEC) on 4-bit data </li></ul><ul><ul><li>use 3 parity bits; with 4 data bits this results in a 7-bit code word </li></ul></ul><ul><ul><li>3 parity bits are sufficient to identify any one of 7 code word bits </li></ul></ul><ul><ul><li>overlap the assignment of parity bits so that a single error in the 7-bit word can be corrected </li></ul></ul><ul><li>Group parity bits so they correspond to subsets of the 7 bits: </li></ul><ul><ul><li>p1 protects bits 1,3,5,7 (binary positions 001, 011, 101, 111) </li></ul></ul><ul><ul><li>p2 protects bits 2,3,6,7 (binary positions 010, 011, 110, 111) </li></ul></ul><ul><ul><li>p3 protects bits 4,5,6,7 (binary positions 100, 101, 110, 111) </li></ul></ul><ul><li>Bit position number: 1 2 3 4 5 6 7 </li></ul><ul><li>Code word bit: p1 p2 d1 p3 d2 d3 d4 </li></ul>
  26. 26. Hamming Code Example <ul><li>Example: c = c1 c2 c3 = 101 </li></ul><ul><ul><li>error in 4, 5, 6, or 7 (by c3 = 1) </li></ul></ul><ul><ul><li>error in 1, 3, 5, or 7 (by c1 = 1) </li></ul></ul><ul><ul><li>no error in 2, 3, 6, or 7 (by c2 = 0) </li></ul></ul><ul><li>Therefore the error must be in bit 5. </li></ul><ul><li>Note the check bits point to 5 </li></ul><ul><li>By our clever positioning and assignment of parity bits, the check bits always address the position of the error! </li></ul><ul><li>c = 000 indicates no error </li></ul><ul><li>Bit position number: 1 2 3 4 5 6 7 </li></ul><ul><li>Code word bit: p1 p2 d1 p3 d2 d3 d4 </li></ul><ul><ul><li>Note: parity bits occupy power-of-two bit positions in the code word. </li></ul></ul><ul><ul><li>On writing, parity bits are assigned to force even parity over their respective groups. </li></ul></ul><ul><ul><li>On reading, check bits (c1, c2, c3) are generated by finding the parity of each group along with its parity bit. If an error occurred in a group, the corresponding check bit will be 1; if no error, the check bit will be 0. </li></ul></ul>
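The encoding and checking procedure for the 7-bit code can be written out as a short Python sketch. It follows the bit layout and parity groups given on the slides (p1, p2, p3 at positions 1, 2, 4); the function names `encode`, `syndrome`, and `correct` are hypothetical, and the sample data value is chosen only for illustration.

```python
# Parity groups keyed by the position of the parity bit that covers them.
GROUPS = {1: (1, 3, 5, 7), 2: (2, 3, 6, 7), 4: (4, 5, 6, 7)}


def encode(d1, d2, d3, d4):
    """Return the code word [p1, p2, d1, p3, d2, d3, d4] with even parity
    over each group."""
    word = [0] * 8  # index 0 unused so indices match bit positions 1..7
    word[3], word[5], word[6], word[7] = d1, d2, d3, d4
    for p, group in GROUPS.items():
        word[p] = sum(word[i] for i in group if i != p) % 2
    return word[1:]


def syndrome(code):
    """Compute the check bits and return them as a position number:
    0 means no error, otherwise the 1-based position of the bit in error."""
    word = [0] + list(code)
    pos = 0
    for p, group in GROUPS.items():
        if sum(word[i] for i in group) % 2:
            pos += p
    return pos


def correct(code):
    """Flip the bit the check bits point to, if any."""
    pos = syndrome(code)
    fixed = list(code)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed, pos


code = encode(1, 0, 1, 1)
corrupted = list(code)
corrupted[5 - 1] ^= 1            # inject an error at bit position 5
repaired, where = correct(corrupted)
```

With the error injected at position 5, the check bits for the p1 and p3 groups are 1 and the p2 check bit is 0, matching the c = 101 example above.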
  27. 27. Hamming Error Correcting Code <ul><li>Overhead involved in single error correction code: </li></ul><ul><ul><li>let p be the total number of parity bits and d the number of data bits in a p + d bit word. </li></ul></ul><ul><ul><li>If p error correction bits are to point to the error bit ( p + d cases) plus indicate that no error exists (1 case), we need: </li></ul></ul><ul><ul><li>2^p >= p + d + 1, </li></ul></ul><ul><ul><li>thus p >= log2( p + d + 1 ) </li></ul></ul><ul><ul><li>for large d , p approaches log2( d ) </li></ul></ul><ul><li>Adding an extra parity bit covering the entire word can provide double error detection </li></ul><ul><li>Bit position number: 1 2 3 4 5 6 7 8 </li></ul><ul><li>Code word bit: p1 p2 d1 p3 d2 d3 d4 p4 </li></ul><ul><li>On reading, the C bits are computed (as usual) plus the parity over the entire word, P: </li></ul><ul><li>C=0 P=0, no error </li></ul><ul><li>C!=0 P=1, correctable single error </li></ul><ul><li>C!=0 P=0, a double error occurred </li></ul><ul><li>C=0 P=1, an error occurred in p4 bit </li></ul>Typical modern codes in DRAM memory systems: 64-bit data blocks (8 bytes) with 72-bit code words (9 bytes).
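The overhead bound and the four read-time cases can be checked numerically. The Python sketch below finds the smallest p satisfying 2^p >= p + d + 1 and encodes the decision table from the slide; the function names `min_parity_bits` and `classify` are hypothetical.

```python
def min_parity_bits(d):
    """Smallest p with 2**p >= p + d + 1 (single error correction)."""
    p = 0
    while 2 ** p < p + d + 1:
        p += 1
    return p


def classify(c, overall_parity):
    """Interpret the check bits C and the whole-word parity P on a read."""
    if c == 0 and overall_parity == 0:
        return "no error"
    if c != 0 and overall_parity == 1:
        return "correctable single error"
    if c != 0 and overall_parity == 0:
        return "double error detected"
    return "error in overall parity bit"


# 64 data bits need 7 SEC parity bits; one more whole-word parity bit for
# double error detection gives 64 + 7 + 1 = 72 bits.
secded_width = 64 + min_parity_bits(64) + 1
```

For d = 4 the bound gives p = 3, the 7-bit code used in the example, and for d = 64 it gives p = 7, which together with the extra whole-word parity bit accounts for the 72-bit code words mentioned above.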