ESSCIRC 2002

A 0.13-µm CMOS NOR Flash Memory Experimental Chip for 4-b/cell Digital Storage

R. Micheloni (1), O. Khouri (2), S. Gregori (3), A. Cabrini (3), G. Campardo (1), L. Fratin (4), G. Torelli (3)

(1) STMicroelectronics, Flash Memory Division, Via Olivetti 2, 20041 Agrate Brianza MI, Italy
(2) STMicroelectronics, Memory Product Group R&D, Via Ferrata 1, 27100 Pavia, Italy
(3) Department of Electronics, University of Pavia, Via Ferrata 1, 27100 Pavia, Italy
(4) STMicroelectronics, Central R&D, Via Olivetti 2, 20041 Agrate Brianza MI, Italy

rino.micheloni@st.com, osama.khouri@st.com, a.cabrini@ele.unipv.it, lorenzo.fratin@st.com

Abstract

This paper presents architectural and circuit solutions developed to achieve 4-b/cell storage in NOR-type Flash memories. A multiple closed-loop voltage sensing topology, combined with hierarchical load-decoupling row selection and a two-step analog-to-digital conversion with early most significant bit (MSB) detection, achieves 120-ns access time for the stored MSBs. An 80-mV programming step is provided by a switched-capacitor staircase waveform generator. Experimental data from a 0.13-µm CMOS test chip are given.

1. Introduction

An emerging solution to decrease the cost-per-bit of Flash memories is the multilevel (ML) approach, where any cell can be programmed to one of m = 2^n predetermined threshold (VT) levels or, better, taking process and operating spreads into account, of 2^n distributions of values, and is therefore capable of storing n bits [1]. To increase the number of programmable levels beyond the present limit of 2 b/cell for digital applications, designers have to face a number of issues as far as sensing and programming operations are concerned. Indeed, all VT levels must be allocated within a predetermined voltage window, which cannot be too large for reliability reasons. Programming circuits must therefore ensure sufficiently narrow VT distributions, and sensing circuits must be able to quickly detect the small VT difference available between adjacent programmed states.

This paper presents architectural and circuit solutions developed to achieve 16-level (i.e., 4 b/cell) storage in common-ground NOR Flash memories programmed by channel hot-electron injection, with the goal of investigating the feasibility of integrating a 1-Gb device in less than 1 cm² with present fabrication technology. A closed-loop voltage-sensing approach is adopted, which calls for a new row-decoding architecture to allow for the read parallelism required in a digital memory. Programming is carried out by means of a staircase word-line voltage generated by using switched-capacitor techniques. The developed schemes were integrated in a 0.13-µm CMOS test chip together with a Flash memory array (cell size = 0.16 µm²) and the relevant peripheral circuits, and experimentally evaluated.

2. Multilevel sensing and row decoding

16-level cell sensing is a very critical task. Indeed, assuming a voltage window from 2 V to 6.5 V for allocating all the programmed VT levels, very reduced spacing is provided between adjacent states (~150 mV for a distribution width of each state equal to ~150 mV). With a cell large-signal transconductance Gm of ~15 µA/V, this means a current separation of ~2 µA between contiguous distributions. Hence, sensing is very hard with the conventional “current sensing” technique, where the content stored in any cell is determined by sensing the current flowing through the cell biased with suitable fixed gate and drain voltages. Moreover, when this approach is followed, parasitic effects such as source-line resistance affect the sensed current in a way that depends on the stored data, thereby limiting the achievable sensitivity. In addition, a large current consumption arises when reading cells programmed to the lowest VT levels, as the read gate voltage must be set to a value sufficiently high to allow adequate read current through the cells programmed to the (m – 1)-th state.

To overcome the above drawbacks, a “voltage sensing” approach has been adopted. A negative-feedback loop (Fig. 1) forces a predetermined current IRD (in our case, 15 µA) through the cell under sensing, by driving the cell gate voltage to the corresponding value VGS,E. The latter carries the information on the cell contents, as VGS,E = VT + VOV, where VOV = IRD/Gm. The stored bits are then determined through an analog-to-digital (A/D) conversion of the “extracted gate voltage” VGS,E. An additional key advantage provided by the adopted sensing technique is reduced read disturb, as any cell is read with a minimum overdrive voltage VOV (~1 V) regardless of its contents.

Fig. 1. Principle of the closed-loop voltage sensing approach. [Schematic labels: IREF, IRD, SOUT, column decoder, row decoder.]

A similar approach has been previously proposed for applications such as image, speech, and music ML storage [2].
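The two figures quoted in this section (a ~2 µA current separation and a ~1 V overdrive) can be cross-checked with a short calculation. The sketch below is only an illustration: it assumes a linear cell model in which the current equals Gm times the overdrive, uses the approximate values given in the text, and the function names are hypothetical.

```python
# Approximate values quoted in Section 2 of the paper.
GM = 15e-6             # cell large-signal transconductance, A/V
LEVEL_SPACING = 0.150  # spacing between adjacent VT distributions, V
I_RD = 15e-6           # read current forced by the feedback loop, A


def current_separation(gm, spacing):
    """Current difference between adjacent states when the gate voltage is fixed
    (conventional current sensing), assuming a linear cell: dI = Gm * dVT."""
    return gm * spacing


def extracted_gate_voltage(v_t, i_rd=I_RD, gm=GM):
    """VGS,E = VT + VOV with VOV = IRD / Gm (same linear-cell assumption)."""
    return v_t + i_rd / gm


if __name__ == "__main__":
    sep_ua = current_separation(GM, LEVEL_SPACING) * 1e6
    print(f"current separation: {sep_ua:.2f} uA")
    # A cell at the bottom of the 2 V .. 6.5 V window:
    print(f"VGS,E for VT = 2 V: {extracted_gate_voltage(2.0):.2f} V")
```

Under these assumptions the separation comes out at about 2.25 µA, consistent with the ~2 µA stated in the text, and the overdrive IRD/Gm is 1 V.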
In [2], however, the feedback loop includes the final stage of the row decoder with the associated heavy capacitive load, thus leading to an access time on the order of 0.8 µs. Moreover, no possibility for parallel reading within a memory sector was provided. As a consequence, a maximum read throughput in the range of a few Mbit/s with 4-b/cell storage is reported, which is not acceptable in most digital applications. A new concept of row selection has therefore been devised, as described in the next subsection.

2.1. Hierarchical row decoding for multiple sensing loops

In the conceived memory organization, a whole memory sector is made up of p identical subsectors placed in the same horizontal line, p being the number of cells to be read in parallel (p = 16 for 2 double-word (i.e., 64-b) reading). More specifically, a 1-Gb array will consist of 512 2-Mb (0.5-Mcell) sectors, each made up of 16 identical subsectors placed in the same horizontal line. The 512 sectors will be arranged in 8 stacks, each made up of 64 sectors. A two-level hierarchical approach [3] is used for both column and row decoding: each subsector [64 bit-lines by 128 main word-lines (MWLs), each controlling 4 local word-lines (LWLs)] is provided with its own local column and row decoders, thus obtaining zero stress outside the selected sector during any operation, which is vital for adequate reliability of ML storage.

From Fig. 1, the sensing loop includes both column and row selection corresponding to the addressed cell. Conventional hierarchical column decoding is suited to the adopted sensing technique. Indeed, no more than one cell in any given bit-line is read at any memory access and, therefore, the addressed bit-line can be selected by means of a usual switch tree. By contrast, to allow for simultaneous reading of p cells belonging to the same MWL, the concept of word-line selection must now be changed: indeed, the gate terminal of each cell under sensing must be included in a sense loop totally independent of the others (in other words, a separate loop must be provided for each subsector, so that p sensing loops will be present for each sector stack). This means that each addressed LWL, which is selected by its respective local row decoder, must not be connected to the corresponding MWL as in conventional hierarchical schemes [3]. Instead, in our solution, each MWL acts as a control line, driving a complementary switch which closes the feedback sensing loop through the addressed LWL, as shown in Fig. 2. As observed, any non-addressed LWL is connected to ground, while the addressed LWL is connected to the sense amplifier output SOUT, thereby closing the sensing loop (the line SOUT runs vertically across all the subsectors placed in the same stack). It should be pointed out that the addressing and analog-voltage transmission functions are separated. Considering the above 1-Gb 4-b/cell device organization for our technology, the total array area including local column and row decoders turns out to be 67 mm². This figure should be compared with an area of 101 mm² obtained for the case of a 1-Gb 2-b/cell array using a conventional hierarchical decoding approach with the same technology.

Fig. 2. Hierarchical row selection with separate address and analog-voltage transmission functions. MP1, MN1: complementary switch SWloop driven by the MWL; MP2, MN2: LWL selection; MN3, MN4 connect unselected word-lines to ground. LD = local decoder. [Schematic labels: MWL, LWL, SWloop, LD, CL, SOUT, VPCX, driver of the main row decoder, local selection.]

In addition to allowing for p parallel independent sensing loops, the above strategy minimizes the capacitive load CL associated with the sense amplifier output node SOUT, as the MWL capacitance and, more importantly, the heavy capacitance associated with the final stages of the main row decoder are kept outside the loop. With the proposed approach, the main contributions to CL come from the metal line SOUT and all off switches SWloop belonging to the considered subsector stack (the other contributions being much smaller). CL is on the order of 3.5 pF for the above 1-Gb device organization. Moreover, it should be pointed out that, as any MWL carries digital information (i.e., turning the associated SWloop switches on or off), no analog accuracy is required of the output voltage of the main row decoder, which greatly relaxes the requirements of the respective high-voltage supply.

2.2. Sensing loop implementation

Fig. 3. Sense amplifier for closed-loop voltage sensing. [Schematic labels: LV stage (VDD), HV stage (VPCX), 1:k current mirror, kICELL (k = 2), IREF (30 µA), IBL, IHV, INV, M1, bit-line limiter, CBL, CL, SOUT, column decoder, row decoder, ICELL.]

The sense amplifier (Fig. 3) is made up of two stages. The first stage compares the mirrored cell current kICELL (in our case, k = 2) and a reference current IREF (30 µA). The resulting current difference is amplified and buffered by the second stage. The latter drives the addressed LWL and, hence, the gate terminal of the selected cell through the corresponding switches, thereby closing the feedback loop so that, at equilibrium, ICELL ≡ IRD = IREF/k = 15 µA. A conventional bit-line limiter (common-gate device M1 biased by inverter INV) keeps the cell drain voltage at the value required for sensing operation.
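The equilibrium condition of the loop can be illustrated with a small behavioural model. This sketch is not the authors' circuit: it assumes an idealized cell whose current is Gm × (VG − VT) above threshold and zero below, and it replaces the two-stage amplifier with a discrete-time integrator whose step gain `loop_gain` is a hypothetical value chosen only so that the iteration converges. Only IREF, k, and Gm are taken from the text.

```python
def cell_current(v_g, v_t, g_m=15e-6):
    """Idealized cell: linear above threshold, off below (an assumption)."""
    return max(0.0, g_m * (v_g - v_t))


def settle_gate_voltage(v_t, i_ref=30e-6, k=2.0, g_m=15e-6,
                        v_start=0.0, loop_gain=2e4, steps=200):
    """Iterate a negative-feedback loop until k * ICELL matches IREF.

    loop_gain (volts per ampere of error, per iteration) is hypothetical;
    the iteration is stable because loop_gain * k * g_m = 0.6 < 2.
    """
    v_g = v_start
    for _ in range(steps):
        error = i_ref - k * cell_current(v_g, v_t, g_m)
        v_g += loop_gain * error  # raise the gate if the cell draws too little
    return v_g


if __name__ == "__main__":
    v_t = 3.0  # example threshold inside the 2 V .. 6.5 V window
    v_gs_e = settle_gate_voltage(v_t)
    print(f"VGS,E = {v_gs_e:.3f} V")
    print(f"ICELL = {cell_current(v_gs_e, v_t) * 1e6:.2f} uA")
```

In this model the gate first ramps at a constant rate while the cell is off and then settles geometrically, ending at VT + 1 V with ICELL = IREF/k = 15 µA, whatever the starting value.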
The cell drain voltage during sensing is ~1 V. The value of VGS,E generated by the amplifier can be higher than VDD and, therefore, the second stage is supplied by a high voltage VPCX. As on-chip charge pumps required to provide this voltage (through a regulator) have limited efficiency and current drive capability, this stage is operated in class AB, and the first stage of the amplifier is supplied by VDD.

To speed up sensing operations, SOUT is set to an intermediate level VPR during bit-line precharge. When the sense loop is enabled, in the case VGS,E > VPR, first a constant-charge increasing ramp (~25 V/µs) takes place as long as no current flows through the selected memory cell, and then a settling exponential behaviour follows while, in the case VGS,E < VPR, an exponential discharge occurs (Fig. 4). The worst-case ±5 mV settling time of the sense loop (level 16) is less than 160 ns, as is required to keep the asynchronous access time of a complete memory chip within 200 ns. Current consumption of the sensing loop is 40 µA from VPCX and 60 µA from VDD.

Fig. 4. (a) Transient of the sensing loop output voltage (reference levels required for the A/D conversion are also shown; solid lines: MSB references; dashed lines: LSB references); (b) detail of the transient around point A for levels 12 and 13.

3. Bit extraction

The 4 bits stored in the sensed cell are determined from the extracted gate voltage VGS,E by using a two-step flash A/D conversion approach. The first step finds the two most significant bits (MSBs), and the second step provides the two least significant bits (LSBs). This choice turns out to be the best trade-off between silicon area and power consumption on the one side and conversion time on the other. The binary conversion is achieved by means of a fully-differential offset-compensated topology [4] working in two phases, i.e., (i) input and offset sampling, and (ii) offset subtraction and signal regeneration.

The programmed VT levels can be grouped into four different sets, each corresponding to four adjacent states. The precharge value VPR for node SOUT is set in the separation range between the two lowest groups (i.e., between VT levels 4 and 5). From Fig. 4, the value of the voltage on node SOUT at t = t1 (~80 ns) makes it possible to detect the set including the VT level stored in the sensed cell, thus providing the two MSBs and allowing early reference selection for the subsequent LSB conversion.

To guarantee safe detection of the MSBs, an additional margin (200 mV) is added to the spacing between the two highest level groups (i.e., between levels 12 and 13). No extra margin is required between levels 8 and 9, as for the latter the exponential portion of the SOUT curve is substantially over at t = t1. The cost of the above choice is a reduction of ~15 mV in the spacing between the other adjacent levels, which still remains more than enough to detect the LSBs.

According to the proposed conversion strategy, the effective conversion time for the LSBs is equivalent to that of a full-flash A/D converter. In applications requiring page or burst synchronous reading of 4p bits (p cells), the following organization is chosen. When programming, the first 2p bits of the burst are stored as the 2 MSBs of the p cells, and the remaining 2p bits represent the 2 LSBs of the same cells. This allows the read latency to be substantially reduced, which is a key factor in a number of applications.

4. Multilevel programming circuit

Staircase gate voltage and program-and-verify techniques are used to achieve the required programming accuracy. Indeed, with this approach, the width of any VT distribution is ideally equal to the staircase voltage increment ∆VGP although, in practice, a number of nonidealities lead to a wider broadening [3]. The staircase voltage is generated by a parasitic-insensitive switched-capacitor (SC) integrator (Fig. 5) using the available poly1-to-poly2 capacitors. The staircase starts at 1.5 V and can reach a maximum value of 8.5 V with an increment ∆VGP of 80 mV at each step [this value has been chosen as a trade-off between programming resolution and number of pulses (and, hence, programming time)]. Effects due to nonidealities such as offset and charge injection were minimized by choosing VREF – VINI = 1.6 V (C2/C1 = 20). The SC integrator used saves silicon area compared to a staircase waveform generator based on a programmable resistive string [5].

Fig. 5. SC staircase programming waveform generator. [Schematic labels: C1, C2, Φ1, Φ2, Reset, VREF, VINI.]

The program gate voltage is fed to the addressed cells through the respective terminals SOUT (the sense loops are obviously open when applying any program pulse). For program verify, the extracted gate voltage is compared with a single reference voltage selected depending on the target VT level. The MWLs again act as switch controllers in both program and verify steps. The line SOUT must swing between voltages which can be very different when switching from program to verify phases and vice versa. However, it should be pointed out that, with the proposed organization, the heavy capacitance of the main row decoder is not switched at each step as occurs when using a conventional “current sensing” approach. This leads to fast transitions between the two phases, with ensuing benefits in terms of overall programming time.
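The staircase parameters quoted in Section 4 are mutually consistent, as the following sketch shows. It assumes the ideal switched-capacitor integrator relation, step = (VREF − VINI) × C1/C2, which matches the quoted numbers but is not stated explicitly in the text; the function names are hypothetical and nonidealities such as offset and charge injection are ignored.

```python
def staircase_step(v_ref_minus_v_ini, c2_over_c1):
    """Ideal SC-integrator increment per clock cycle (assumed relation)."""
    return v_ref_minus_v_ini / c2_over_c1


def staircase_levels(v_start, v_max, step):
    """All staircase voltages from v_start upward that do not exceed v_max."""
    # Small epsilon guards against floating-point error when the range
    # happens to be an exact multiple of the step.
    n_steps = int((v_max - v_start) / step + 1e-9)
    return [v_start + i * step for i in range(n_steps + 1)]


if __name__ == "__main__":
    step = staircase_step(1.6, 20)             # values from the paper
    levels = staircase_levels(1.5, 8.5, step)  # 1.5 V start, 8.5 V maximum
    print(f"step = {step * 1e3:.0f} mV")
    print(f"{len(levels) - 1} increments, last level = {levels[-1]:.2f} V")
```

With the values from the text this gives an 80-mV step and 87 increments between 1.5 V and the last level below the 8.5 V maximum (8.46 V), which bounds the number of program pulses a single cell can receive.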
5. Experimental Results

The circuits presented in the previous sections were integrated in a test chip fabricated in triple-metal triple-well shallow-trench isolation 0.13-µm CMOS Flash technology (Fig. 6). The chip also includes a Flash memory array and the peripheral circuits required for the experimental validation. The memory array is organized in eight blocks arranged in a vertical stack. Each block represents a subsector in a complete memory such as a 1-Gb chip according to the above organization. A dummy capacitor of ~3.5 pF was connected to the line SOUT so as to account for all vertically stacked subsectors required for a 1-Gb device.

Fig. 6. Chip microphotograph.

Fig. 7 illustrates the measured waveforms of two read accesses (non-latched data output), corresponding to different programmed levels (the waveforms of bits b2 and b3, which are different in the two cases, are plotted). The waveforms at the sense loop output are also shown. During no-operation, the comparator outputs are kept at VDD to save power. As a polarity inversion takes place during the first operating phase of the comparator [4], a transient low level is present at its output terminal when the detected bit is high (output data is valid after this transient).

Fig. 7. Measured read access waveforms for two different programmed levels: (a) level 12 (VGS,E = 5 V, bits 0011); (b) level 13 (VGS,E = 5.3 V, bits 0100). [Traces: Address, Sout, Bit 2, Bit 3.]

As observed, the MSBs are delivered to the output within 120 ns, and the LSB access time is 200 ns. With the proposed array organization, the ensuing read throughput is 40 MByte/s.

Finally, Fig. 8 illustrates the measured programming characteristics and the staircase waveform generated by the SC integrator. The corresponding program throughput for the above memory organization is 0.1 MByte/s. This figure can be increased by reducing the time duration of each program step and/or increasing the parallelism p.

Fig. 8. Measured programming characteristics (curve a) (programming step = 80 mV). The staircase programming waveform is also shown (curve b).

6. Conclusions

This paper has presented architecture and circuit solutions for 4-b/cell digital storage in NOR-type Flash memories. A closed-loop voltage sense amplifier allows reliable detection of the stored contents, while a row decoding architecture with separate addressing and analog-voltage transmission functions allows parallel reading, as is required in general-purpose digital memories. An 80-mV programming step is obtained by using an SC staircase waveform generator. The use of the proposed schemes allows 4 bits to be stored in a single 0.13-µm cell, thus overcoming today’s technology roadmap and allowing a 1-Gb NOR-type memory in less than 1 cm².

Acknowledgments

This work has been partially supported by Italian MURST (Cofunded Project). The authors would like to thank A. Pierin, S. Coronini, M. Sangalli, and D. Soltesz for their contribution, and M. Carrera and his team for careful layout.

7. References

[1] M. Bauer, et al.: “A multilevel-cell 32-Mb flash memory”, IEEE Int. Solid-State Circuits Conference Dig. Tech. Papers, Feb. 1995, pp. 132-133.
[2] M. Pasotti, et al.: “Analog sense amplifiers for high density NOR Flash memories”, Proc. IEEE Custom Integrated Circuits Conference, May 1999, pp. 334-335.
[3] G. Campardo, et al.: “A 40mm2 3V 50MHz 64Mb 4-level cell NOR-type Flash memory”, IEEE Int. Solid-State Circuits Conference Dig. Tech. Papers, Feb. 2000, pp. 274-275.
[4] A. Pierin, et al.: “High-speed low-power sense comparator for multilevel Flash memories”, Proc. IEEE Int. Conference on Electronics, Circuits, and Systems, Dec. 2000, pp. 759-763.
[5] O. Khouri, et al.: “Program word-line voltage generator for multilevel Flash memories”, Proc. IEEE Int. Conference on Electronics, Circuits and Systems, Dec. 2000, pp. 1030-1033.