Your SlideShare is downloading. ×
Chapter 12
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Chapter 12

1,000

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,000
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
42
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Digital Integrated Circuits A Design Perspective Semiconductor Memories Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic December 20, 2002
  • 2. Chapter Overview
    • Memory Classification
    • Memory Architectures
    • The Memory Core
    • Periphery
    • Reliability
    • Case Studies
  • 3. Semiconductor Memory Classification Read-Write Memory Non-Volatile Read-Write Memory Read-Only Memory EPROM E 2 PROM FLASH Random Access Non-Random Access SRAM DRAM Mask-Programmed Programmable (PROM) FIFO Shift Register CAM LIFO
  • 4. Memory Timing: Definitions
  • 5. Memory Architecture: Decoders Word 0 Word 1 Word 2 Word N 2 2 Word N 2 1 Storage cell M bits M bits N words S 0 S 1 S 2 S N 2 2 A 0 A 1 A K 2 1 K 5 log 2 N S N 2 1 Word 0 Word 1 Word 2 Word N 2 2 Word N 2 1 Storage cell S 0 Input-Output ( M bits) Intuitive architecture for N x M memory Too many select signals: N words == N select signals Input-Output ( M bits) Decoder K = log 2 N Decoder reduces the number of select signals
  • 6. Array-Structured Memory Architecture Problem: ASPECT RATIO or HEIGHT >> WIDTH Amplify swing to rail-to-rail amplitude Selects appropriate word
  • 7. Hierarchical Memory Architecture Advantages: 1. Shorter wires within blocks 2. Block address activates only 1 block => power savings
  • 8. Block Diagram of 4 Mbit SRAM Subglobal row decoder Global row decoder Subglobal row decoder Block 30 Block 31 128 K Array Block 0 Block 1 Local row decoder [Hirose90] Clock generator CS, WE buffer I/O buffer Y -address buffer X -address buffer x1/x4 controller Z -address buffer X -address buffer Predecoder and block selector Bit line load Transfer gate Column decoder Sense amplifier and write driver
  • 9. Contents-Addressable Memory Address Decoder I/O Buffers Commands 2 9 Validity Bits Priority Encoder Address Decoder I/O Buffers Commands 2 9 Validity Bits Priority Encoder
  • 10. Memory Timing: Approaches DRAM Timing Multiplexed Adressing SRAM Timing Self-timed
  • 11. Read-Only Memory Cells WL BL WL BL 1 WL BL WL BL WL BL 0 V DD WL BL GND Diode ROM MOS ROM 1 MOS ROM 2
  • 12. MOS OR ROM WL [0] V DD BL [0] WL [1] WL [2] WL [3] V bias BL [1] Pull-down loads BL [2] BL [3] V DD
  • 13. MOS NOR ROM WL [0] GND BL [0] WL [1] WL [2] WL [3] V DD BL [1] Pull-up devices BL [2] BL [3] GND
  • 14. MOS NOR ROM Layout Programmming using the Active Layer Only Polysilicon Metal1 Diffusion Metal1 on Diffusion Cell (9.5  x 7  )
  • 15. MOS NOR ROM Layout Polysilicon Metal1 Diffusion Metal1 on Diffusion Cell (11  x 7  ) Programmming using the Contact Layer Only
  • 16. MOS NAND ROM All word lines high by default with exception of selected row WL [0] WL [1] WL [2] WL [3] V DD Pull-up devices BL [3] BL [2] BL [1] BL [0]
  • 17. MOS NAND ROM Layout Polysilicon Diffusion Metal1 on Diffusion Cell (8  x 7  ) Programmming using the Metal-1 Layer Only No contact to VDD or GND necessary; Loss in performance compared to NOR ROM drastically reduced cell size
  • 18. NAND ROM Layout Cell (5  x 6  ) Polysilicon Threshold-altering implant Metal1 on Diffusion Programmming using Implants Only
  • 19. Equivalent Transient Model for MOS NOR ROM
    • Word line parasitics
      • Wire capacitance and gate capacitance
      • Wire resistance (polysilicon)
    • Bit line parasitics
      • Resistance not dominant (metal)
      • Drain and Gate-Drain capacitance
    Model for NOR ROM V DD C bit r word c word WL BL
  • 20. Equivalent Transient Model for MOS NAND ROM
    • Word line parasitics
      • Similar to NOR ROM
    • Bit line parasitics
      • Resistance of cascaded transistors dominates
      • Drain/Source and complete gate capacitance
    Model for NAND ROM V DD C L r word c word c bit r bit WL BL
  • 21. Decreasing Word Line Delay
  • 22. Precharged MOS NOR ROM PMOS precharge device can be made as large as necessary, but clock driver becomes harder to design. WL [0] GND BL [0] WL [1] WL [2] WL [3] V DD BL [1] Precharge devices BL [2] BL [3] GND pre f
  • 23. Non-Volatile Memories The Floating-gate transistor (FAMOS) Floating gate Source Substrate Gate Drain n + n +_ p t ox t ox Device cross-section Schematic symbol G S D
  • 24. Floating-Gate Transistor Programming 0 V 2 5 V 0 V D S Removing programming voltage leaves charge trapped 5 V 2 2.5 V 5 V D S Programming results in higher V T . 20 V 10 V 5 V 20 V D S Avalanche injection
  • 25. A “Programmable-Threshold” Transistor
  • 26. FLOTOX EEPROM Floating gate Source Substrate p Gate Drain n 1 n 1 FLOTOX transistor Fowler-Nordheim I - V characteristic 20 – 30 nm 10 nm -10 V 10 V I V GD
  • 27. EEPROM Cell WL BL Absolute threshold control is hard Unprogrammed transistor might be depletion  2 transistor cell V DD
  • 28. Flash EEPROM Control gate erasure p- substrate Floating gate Thin tunneling oxide n 1 source n 1 drain programming Many other options …
  • 29. Cross-sections of NVM cells EPROM Flash Courtesy Intel
  • 30. Basic Operations in a NOR Flash Memory― Erase
  • 31. Basic Operations in a NOR Flash Memory― Write
  • 32. Basic Operations in a NOR Flash Memory― Read
  • 33. NAND Flash Memory Unit Cell Word line(poly) Source line (Diff. Layer) Courtesy Toshiba
  • 34. NAND Flash Memory Courtesy Toshiba Word lines Select transistor Bit line contact Source line contact Active area STI
  • 35. Characteristics of State-of-the-art NVM
  • 36. Read-Write Memories (RAM)
    • STATIC (SRAM)
    • DYNAMIC (DRAM)
    Data stored as long as supply is applied Large (6 transistors/cell) Fast Differential Periodic refresh required Small (1-3 transistors/cell) Slower Single Ended
  • 37. 6-transistor CMOS SRAM Cell WL BL V DD M 5 M 6 M 4 M 1 M 2 M 3 BL Q Q
  • 38. CMOS SRAM Analysis (Read) WL BL V DD M 5 M 6 M 4 M 1 V DD V DD V DD BL Q = 1 Q = 0 C bit C bit
  • 39. CMOS SRAM Analysis (Read) 0 0 0.2 0.4 0.6 0.8 1 1.2 0.5 Voltage rise [V] 1 1.2 1.5 2 Cell Ratio (CR) 2.5 3 Voltage Rise (V)
  • 40. CMOS SRAM Analysis (Write) BL = 1 BL = 0 Q = 0 Q = 1 M 1 M 4 M 5 M 6 V DD V DD WL
  • 41. CMOS SRAM Analysis (Write)
  • 42. 6T-SRAM — Layout V DD GND Q Q WL BL BL M1 M3 M4 M2 M5 M6
  • 43. Resistance-load SRAM Cell M 3 R L R L V DD WL Q Q M 1 M 2 M 4 BL BL Static power dissipation -- Want R L large Bit lines precharged to V DD to address t p problem
  • 44. SRAM Characteristics
  • 45. 3-Transistor DRAM Cell No constraints on device ratios Reads are non-destructive Value stored at node X when writing a “1” = V WWL -V Tn WWL BL 1 M 1 X M 3 M 2 C S BL 2 RWL V DD V DD 2 V T D V V DD 2 V T BL 2 BL 1 X RWL WWL
  • 46. 3T-DRAM — Layout BL2 BL1 GND RWL WWL M3 M2 M1
  • 47. 1-Transistor DRAM Cell Write: C S is charged or discharged by asserting WL and BL. Read: Charge redistribution takes places between bit line and storage capacitance Voltage swing is small; typically around 250 mV.  V BL V PRE – V BIT V PRE – C S C S C BL + ------------ = = V
  • 48. DRAM Cell Observations
    • 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
    • DRAM memory cells are single ended in contrast to SRAM cells.
    • The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
    • Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design.
    • When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than V DD
  • 49. Sense Amp Operation D V (1) V (1) V (0) t V PRE V BL Sense amp activated Word line activated
  • 50. 1-T DRAM Cell Uses Polysilicon-Diffusion Capacitance Expensive in Area M 1 word line Capacitor Cross-section Layout Diffused bit line Polysilicon gate Polysilicon plate Metal word line Poly SiO 2 Field Oxide n + n + Inversion layer induced by plate bias Poly
  • 51. SEM of poly-diffusion capacitor 1T-DRAM
  • 52. Advanced 1T DRAM Cells Cell Plate Si Capacitor Insulator Storage Node Poly 2nd Field Oxide Refilling Poly Si Substrate Trench Cell Stacked-capacitor Cell Capacitor dielectric layer Cell plate Word line Insulating Layer Isolation Transfer gate Storage electrode
  • 53. Static CAM Memory Cell ••• ••• CAM Bit Word Bit ••• CAM Bit Bit CAM Word Wired-NOR Match Line Match M1 M2 M7 M6 M4 M5 M8 M9 M3 int S Word ••• CAM Bit Bit S
  • 54. CAM in Cache Memory Address Decoder Hit Logic CAM ARRAY Input Drivers Tag Hit Address SRAM ARRAY Sense Amps / Input Drivers Data R/W
  • 55. Periphery
    • Decoders
    • Sense Amplifiers
    • Input/Output Buffers
    • Control / Timing Circuitry
  • 56. Row Decoders Collection of 2 M complex logic gates Organized in regular and dense fashion (N)AND Decoder NOR Decoder
  • 57. Hierarchical Decoders • • • • • • A 2 A 2 A 2 A 3 WL 0 A 2 A 3 A 2 A 3 A 2 A 3 A 3 A 3 A 0 A 0 A 0 A 1 A 0 A 1 A 0 A 1 A 0 A 1 A 1 A 1 WL 1 Multi-stage implementation improves performance NAND decoder using 2-input pre-decoders
  • 58. Dynamic Decoders Precharge devices V DD  GND WL 3 WL 2 WL 1 WL 0 A 0 A 0 GND A 1 A 1  WL 3 A 0 A 0 A 1 A 1 WL 2 WL 1 WL 0 2-input NOR decoder 2-input NAND decoder V DD V DD V DD V DD
  • 59. 4-input pass-transistor based column decoder Advantages: speed ( t pd does not add to overall memory access time) Only one extra transistor in signal path Disadvantage: Large transistor count 2-input NOR decoder A 0 S 0 BL 0 BL 1 BL 2 BL 3 A 1 S 1 S 2 S 3 D
  • 60. 4-to-1 tree based column decoder Number of devices drastically reduced Delay increases quadratically with # of sections; prohibitive for large decoders buffers progressive sizing combination of tree and pass transistor approaches Solutions: BL 0 BL 1 BL 2 BL 3 D A 0 A 0 A 1 A 1
  • 61. Decoder for circular shift-register V DD V DD R WL 0 V DD f f f f V DD R WL 1 V DD f f f f V DD R WL 2 V DD f f f f • • •
  • 62. Sense Amplifiers Idea: Use Sense Amplifer output input s.a. small transition t p C  V  I av ---------------- = make  V as small as possible small large
  • 63. Differential Sense Amplifier Directly applicable to SRAMs M 4 M 1 M 5 M 3 M 2 V DD bit bit SE Out y
  • 64. Differential Sensing ― SRAM
  • 65. Latch-Based Sense Amplifier (DRAM) Initialized in its meta-stable point with EQ Once adequate voltage gap created, sense amp enabled with SE Positive feedback quickly forces output to a stable operating point. EQ V DD BL BL SE SE
  • 66. Charge-Redistribution Amplifier Concept M 2 M 3 M 1 V L V S V ref C small C large Transient Response
  • 67. Charge-Redistribution Amplifier― EPROM SE V DD WLC Load Cascode device Column decoder EPROM array BL WL V casc Out C out C col C BL M 1 M 2 M 3 M 4
  • 68. Single-to-Differential Conversion How to make a good V ref ?
  • 69. Open bitline architecture with dummy cells C S C S C S C S BLL L L 1 L 0 R 0 C S R 1 C S L … … BLR V DD SE SE EQ Dummy cell Dummy cell
  • 70. DRAM Read Process with Dummy Cell 3 2 1 0 0 1 2 3 V BL BL t (ns) reading 0 3 2 1 0 0 1 2 3 V SE EQ WL t (ns) control signals 3 2 1 0 0 1 2 3 V BL BL t (ns) reading 1
  • 71. Voltage Regulator - + V DD V REF V bias M drive M drive V DL V DL V REF Equivalent Model
  • 72. Charge Pump
  • 73. DRAM Timing
  • 74. RDRAM Architecture memory array mux/demux network Data bus Clocks Column Row demux packet dec. packet dec. Bus k k 3 l demux
  • 75. Address Transition Detection DELAY t d A 0 DELAY t d A 1 DELAY t d A N 2 1 V DD ATD ATD …
  • 76. Reliability and Yield
  • 77. Sensing Parameters in DRAM From [Itoh01] 4K 10 100 1000 64K 1M 16M 256M 4G 64G Memory Capacity (bits / chip) C D , Q S , C S , V DD , V smax C D (1F) C S (1F) Q S (1C) V smax (mv) V DD (V) Q S 5 C S V DD / 2 V smax 5 Q S / ( C S 1 C D )
  • 78. Noise Sources in 1T DRam C cross electrode a -particles leakage C S WL BL substrate Adjacent BL C WBL
  • 79. Open Bit-line Architecture —Cross Coupling Sense Amplifier C WL 1 BL C BL C WBL C WBL C C WL 0 C C BL C C WL D WL D WL 0 WL 1 BL EQ
  • 80. Folded-Bitline Architecture
  • 81. Transposed-Bitline Architecture
  • 82. Alpha-particles (or Neutrons) 1 Particle ~ 1 Million Carriers WL BL V DD n 1 a -particle SiO 2 1 1 1 1 1 1 2 2 2 2 2 2
  • 83. Yield Yield curves at different stages of process maturity (from [Veendrick92])
  • 84. Redundancy Memory Array Column Decoder Row Decoder Redundant rows Redundant columns Row Address Column Address Fuse Bank :
  • 85. Error-Correcting Codes Example: Hamming Codes with e.g. B3 Wrong 1 1 0 = 3
  • 86. Redundancy and Error Correction
  • 87. Sources of Power Dissipation in Memories PERIPHERY ROW DEC selected non-selected CHIP COLUMN DEC nC DE V INT f mC DE V INT f C PT V INT f I DCP ARRAY m n m(n 2 1)i hld mi act V DD V SS I DD 5 S C i D V i f 1S I DCP From [Itoh00]
  • 88. Data Retention in SRAM (A) SRAM leakage increases with technology scaling 1.30u 1.10u 900n 700n 500n 300n 100n 0.00 .600 1.20 1.80 Factor 7 0.13 m CMOS m 0.18 m CMOS m V DD I leakage
  • 89. Suppressing Leakage in SRAM SRAM cell SRAM cell SRAM cell V DD,int V DD V DD V DDL V SS,int sleep sleep SRAM cell SRAM cell SRAM cell V DD,int sleep low-threshold transistor Reducing the supply voltage Inserting Extra Resistance
  • 90. Data Retention in DRAM From [Itoh00]
  • 91. Case Studies
    • Programmable Logic Array
    • SRAM
    • Flash Memory
  • 92. PLA versus ROM
    • Programmable Logic Array
    structured approach to random logic “ two level logic implementation” NOR-NOR (product of sums) NAND-NAND (sum of products) IDENTICAL TO ROM!
    • Main difference
    ROM: fully populated PLA: one element per minterm Note: Importance of PLA’s has drastically reduced 1. slow 2. better software techniques (mutli-level logic synthesis) But …
  • 93. Programmable Logic Array GND GND GND GND GND GND GND V DD V DD X 0 X 0 X 1 f 0 f 1 X 1 X 2 X 2 AND-plane OR-plane Pseudo-NMOS PLA
  • 94. Dynamic PLA GND GND V DD V DD X 0 X 0 X 1 f 0 f 1 X 1 X 2 X 2 AND f AND f OR f OR f AND-plane OR-plane
  • 95. Clock Signal Generation for self-timed dynamic PLA f t pre t eval f AND f f AND f AND f OR f OR (a) Clock signals (b) Timing generation circuitry Dummy AND row Dummy AND row
  • 96. PLA Layout
  • 97. 4 Mbit SRAM Hierarchical Word-line Architecture
  • 98. Bit-line Circuitry Bit-line load Block select ATD BEQ Local WL Memory cell I/O line I / O B / T CD Sense amplifier CD CD I / O B / T
  • 99. Sense Amplifier (and Waveforms) BS I / O I / O DATA Block select ATD BS SA SA BS SEQ SEQ SEQ SEQ SEQ De i I/O Lines Address Data-cut ATD BEQ SEQ DATA Vdd GND SA, SA Vdd GND
  • 100. 1 Gbit Flash Memory From [Nakamura02]
  • 101. Writing Flash Memory Read level (4.5 V) Number of cells Evolution of thresholds Final Distribution From [Nakamura02] 10 0 0V 1V 2V Vt of memory cells 3V 4V 10 2 10 4 10 6 10 8
  • 102. 125 mm 2 1Gbit NAND Flash Memory 10.7mm 11.7mm 2kB Page buffer & cache Charge pump 16896 bit lines 32 word lines x 1024 blocks From [Nakamura02]
  • 103. 125 mm 2 1Gbit NAND Flash Memory
    • Technology 0.13  m p-sub CMOS triple-well
    • 1poly, 1polycide, 1W, 2Al
    • Cell size 0.077  m2
    • Chip size 125.2mm2
    • Organization 2112 x 8b x 64 page x 1k block
    • Power supply 2.7V-3.6V
    • Cycle time 50ns
    • Read time   25  s
    • Program time 200  s / page
    • Erase time 2ms / block
    From [Nakamura02]
  • 104. Semiconductor Memory Trends (up to the 90’s) Memory Size as a function of time: x 4 every three years
  • 105. Semiconductor Memory Trends (updated) From [Itoh01]
  • 106. Trends in Memory Cell Area From [Itoh01]
  • 107. Semiconductor Memory Trends Technology feature size for different SRAM generations

×