IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 6, JUNE 2011 919

A Memory Built-In Self-Repair Scheme Based on Configurable Spares

Mincent Lee, Student Member, IEEE, Li-Ming Denq, and Cheng-Wen Wu, Fellow, IEEE

Abstract—There is growing need for embedded memory built-in self-repair (MBISR) due to the introduction of more and more system-on-chip (SoC) and other highly integrated products, for which the chip yield is being dominated by the yield of on-chip memories, and repairing embedded memories by conventional off-chip schemes is expensive. Therefore, we propose an MBISR generator called BRAINS+, which automatically generates register transfer level MBISR circuits for SoC designers. The MBISR circuit is based on a redundancy analysis (RA) algorithm that enhances the essential spare pivoting algorithm, with a more flexible spare architecture, which can configure the same spare to a row, a column, or a rectangle to fit failure patterns more efficiently. The proposed MBISR circuit is small, and it supports at-speed test without timing penalty during normal operation. For example, with a typical 0.13 μm complementary metal-oxide-semiconductor technology, it can run at 333 MHz for a 512 Kb memory with four spare elements (rows and/or columns), and the MBISR area overhead is only 0.36%. With its low area overhead and zero test-time penalty, the MBISR can easily be applied to multiple memories with a distributed RA scheme. Compared with recent studies, the proposed scheme is better in not only test time but also area overhead.

Index Terms—Built-in self-repair (BISR), DRAM, embedded memory, infrastructure IP, memory repair, memory testing, redundancy analysis, SoC, spare allocation, SRAM, yield improvement.

I. Introduction

Memory built-in self-repair (MBISR) is increasingly necessary for system-on-chip (SoC) and other highly integrated products, because embedded memories are occupying a significant amount of chip area (over 70% in many cases).
Besides, memories are normally very dense and more susceptible to process variation and defects than logic circuits. As a result, embedding large memories without repairability in an SoC will likely result in a very low chip yield. Moreover, for faster yield ramp-up and shorter time-to-volume, it is necessary to develop effective and efficient methodologies and tools such as memory test and repair, memory built-in self-test (MBIST) generators, redundancy scheme evaluators, and even MBISR schemes. The need for an MBISR generator follows naturally. Nevertheless, 2-D redundancy repair using general row and column spares (identifying a row/column cover in an optimal way) is an NP-complete problem.

Manuscript received August 21, 2010; revised December 7, 2010; accepted December 10, 2010. Date of current version May 18, 2011. This work was supported in part by the National Science Council (NSC), Taiwan, under Grant NSC 95-2221-E007-258-MY3, as well as the Chip Implementation Center of NSC. This paper was recommended by Associate Editor F. Lombardi.
The authors are with the Department of Electrical Engineering, National Tsinghua University, Hsinchu 30070, Taiwan (e-mail: firstname.lastname@example.org; email@example.com; firstname.lastname@example.org).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2011.2106812

There have been many MBISR architecture schemes reported recently, including even a commercial implementation. An optimal solution called comprehensive real-time exhaustive search test and analysis (CRESTA) is equipped with parallel exhaustive analyzers, which is an extreme case due to very high area overhead. To reduce hardware overhead, other schemes trade time for area. Heuristic redundancy analysis/allocation (RA) algorithms are widely used to solve the NP-complete problem with reasonable time complexity, area, and repair rate.
The tradeoff among repair rate, test time, and area is not straightforward, and examples can be found in the literature. The spares are normally rows, columns, or words. However, even though the size of embedded static random access memory has recently increased dramatically, RA algorithms have been limited to dealing with row/column spares. We need more sophisticated spares to improve the spare utilization and repair efficiency.

Defects can span multiple circuit elements and have been shown to occur in clusters on wafers and semiconductor chips (defect clustering), and failures also occur in clusters (failure clustering) with spatial locality that results in serious yield loss. Therefore, there have been numerous studies considering clustered failure repair. Moreover, cluster failures should be repaired together rather than individually, because individual spares are inefficient for clustered failures. However, it has been shown that block redundancy, e.g., global spare row blocks constructed from several local spare rows, can repair a 17.6 μm × 2.25 mm defect and 8% clustered failures effectively. Therefore, spare row blocks are often seen in large-scale memories for repairing cluster failures.

For embedded memory applications, spare memory cores generated from the same memory generator (compiler) as for the main memory are a more realistic and cheaper solution for many SoC integrators now. In this paper, we assume that the soft repair scheme is implemented. It is more cost-effective than hard repair when BIST is available for embedded memories. We propose an MBISR generator called BRAINS+, which automatically generates register transfer level (RTL) MBISR circuits for embedded memories. BRAINS+ is extended from BRAINS, which is our MBIST circuit generator, i.e., the new features in this paper include the RA algorithm and automatic repair circuit generation. The MBISR circuit generated by BRAINS+ is based on an RA algorithm that enhances the essential spare pivoting (ESP) algorithm, with a more flexible spare architecture, which can configure the same spare to a row, a column, or a rectangle to fit failure patterns more efficiently. The proposed MBISR circuit is small, and it supports at-speed test without timing penalty during normal operation. As an example using a typical 0.13 μm complementary metal-oxide-semiconductor (CMOS) technology, the MBISR circuit can run at 333 MHz for a 512 Kb memory with four spare rows and/or columns, and the MBISR area overhead is only 0.36%.

With its low area overhead and zero test time penalty, the MBISR can easily be applied to multiple memories with a distributed RA scheme. Compared with recent papers, the proposed scheme reduces test time as well as area overhead under a reasonable repair rate.

0278-0070/$26.00 © 2011 IEEE

II. Proposed MBISR

A. MBISR Features

The proposed MBISR generator (BRAINS+) targets large memories (embedded or stand-alone) that require redundancy repair. It takes the memory specification and then generates the RTL MBISR circuit, which is a logic circuit that can be synthesized (i.e., converted to a gate-level circuit) using a commercial tool. The MBISR circuit has a scalable architecture, and it implements an enhanced version of our ESP RA algorithm for efficient 2-D repair. The MBISR circuit can concurrently analyze a memory for repair (spare allocation) during at-speed test by the MBIST circuit that is generated by BRAINS. There is no test time overhead and only a little logic area overhead.

B. MBISR Architecture and Operation

Fig. 1 shows the architecture of the proposed memory built-in redundancy-analyzer (MBIRA) scheme, whose details follow.

Fig. 1. Proposed MBISR scheme.

1) MBIST Operation: The tester controls the MBIST for testing and diagnosing the main and spare memories. For the field reset-triggered repair, the MBIST executes the default March-CW test algorithm after each reset, by connecting some control pins (Ctrl) to VDD or ground.

The main memory under test/repair has its own spare memory. All the memory inputs have multiplexers in front of them to select (using the Mode signal) between the normal-access channels (in normal mode: CE, D, A, WE) and the MBIST channels (in test mode: CETest, DTest, ATest, and WETest).

2) MBISR RA in Test Mode: During RA, the MBIRA first records the unusable faulty spare elements and then allocates spare elements according to the faulty rows/columns of the main memory based on the test result. That is, the MBIST tests the spare memory first to determine the available spare elements, then tests the main memory and simultaneously executes the RA process. When the spare elements are not enough for repairing the faults, the Go signal is de-asserted to abort the MBIST. Otherwise, when the test is done, the Ready and Go signals are asserted.

Whenever the MBIST begins testing, the MBIRA starts the RA process by detecting the Mode signal. During RA, the MBIRA checks the 2-bit DetectM/S signals in the MBIST for the main and spare memory faults, respectively, and analyzes each faulty address from AFail within the same clock period.

The MBIRA can use multiple clock cycles to pre-analyze the faulty addresses without suspending the MBIST, because it gets the addresses from the MBIST before the addressed data are read and determined faulty, especially for a high-speed pipelined memory.

3) MBIRA Address Remapping in Normal Mode: In the normal mode, the MBIRA remaps the faulty main memory addresses to those of the spare memory that have been determined during the RA process.
Therefore, the spare memory has the same chip-enable (CE) and data-in (D) signals as the main memory, but it receives the remapped address (AS) and write-enable (WES) signals from the MBIRA. The MBIRA also can select the correct spare data-out (QS) to replace the faulty data-out (QM) for the output (Q).

If a processor is addressing fault-free data in the memory, the MBIRA de-activates WES and switches Q to QM, i.e., it disables the spare memory. Only when the MBIRA detects a faulty main memory address on A from the processor does it concurrently remap A to AS according to the previous RA result. In that situation, the MBIRA activates WES in the same clock period for a Write operation, or asserts Remap to select the spare QS for a Read operation. Although connecting CE directly to the spare memory wastes additional power due to the unnecessary read, it avoids a timing penalty for all the Read operations. It has also been shown that the memory output timing penalty from the output multiplexers (see Fig. 1) can be effectively reduced.

An allocated spare row may intersect one or more allocated spare columns in the logical address space. Remapping the crossed addresses to the spare row or spare column is done based on prioritized structure reliabilities or other critical concerns.

The MBIRA enters the normal mode by detecting the de-assertion of the Mode signal, which switches the memory input multiplexer from the test to the normal inputs. In fact, the MBIRA can switch between test and normal modes arbitrarily according to the Mode signal. Therefore, the MBIRA also supports soft cumulative repair, i.e., it can switch between the normal mode and the test mode to test and repair more faults later for reliability improvement.

C. Logical Pivoting and Configurable Spare Remapping

Traditional 2-D redundancy repair uses two spare memory cores for spare rows and columns. However, the pivoting spare remapping uses only one spare memory core that can remap either a faulty row or column to the same spare element. The spare utilization efficiency increases and area overhead is reduced, as the traditional method requires two different spare memory cores for columns and rows, respectively, which results in higher area overhead of MBIST and peripheral circuits. Furthermore, the logic address is configurable so that the spare element can be mapped from a rectangle or even a set of rectangles to block-repair cluster faults or other special cases.

In Figs. 2–6, we give simple conceptual cases showing the strategies of address remapping from various rectangular patterns in the main memory to the spare elements in the spare memory.

Fig. 2 shows two basic cases. When the row and column address lengths of the main memory are equal, the spare row or column elements (from the spare memory) are suitable for repairing the faulty rows or columns.
However, because the memory is usually square or close to square, an element might not exactly be a row or column, but just of the same size divided by the most significant bit (MSB) or any address bit. For example, to remap the faulty row (010), the row address ($a_5a_4a_3$: 010) of the main memory address (AM) is remapped to a spare element number ($a_3$: 0) of the spare memory address (AS) by the MBIRA according to the redundancy analysis results. The other address field ($a_2a_1a_0$) of AM is the column address, which is don't-care in redundancy analysis, while the same address value will be rearranged to the column address field of AS during remapping. Similarly, the faulty column (000) can also be remapped to the same kind of spare element (1) while the column address ($a_2a_1a_0$) of AM is remapped to a spare element number (1), and the row address ($a_5a_4a_3$) of AM is rearranged to the column address ($a_2a_1a_0$) of AS.

Fig. 2. Spare pivoting mapping for square address.

Definition. Intra Address: A set of rectangular regions in the array can be specified by a subset of the address bits (other bits are don't-cares), which will be called an intra address.

Fig. 3. Spare pivoting mapping for column segment.

Fig. 3 shows two general cases where the column address length is shorter than the row address length, as a longer column address makes wider multiplexers, which reduce the memory performance. Wide words also reduce the column address length to keep the scrambled cell array physically square. Therefore, a spare element can cover a faulty row, but only a faulty column segment. This is also suitable for ordinary cases, which have column segments to prevent them from high bit-line loading. For example, a column segment has column address 111 ($a_2a_1a_0$) and row address 11000-11111 ($a_7a_6a_5a_4a_3$). All the members inside have the same column segment address ($11\,a_5a_4a_3\,111$).
This address combines the column address ($a_2a_1a_0$: 111) with the two MSBs ($a_7a_6$: 11) of the row address, which divide the column into $2^2$ segments in the row direction and, in this example, select the 4th segment ($1\cdot2^1 + 1\cdot2^0 = 3$, counting from 0). With it, the MBIRA can allocate a column segment for remapping to a spare element of the same size. The other address field ($a_5a_4a_3$) is an intra-column-segment address.

In Fig. 4, we show spare elements that have twice the size of those in Fig. 3. These spare elements can repair large failure areas (cluster faults) more efficiently. The remapping method has no constraints on row or column boundaries. From the original row address bits, an additional intra-rectangle (don't-care in row: $a_3$) or intra-column-segment (don't-care in column-segment: $a_6$) address bit can double the repair range in the row direction. Because those address bits are don't-cares in redundancy analysis, they can reduce the area of the MBIRA. On the other hand, large elements reduce spare utilization. The tradeoff between the logic area and the utilization can be dealt with by the yield analysis of the RA simulation, the statistics of fault-pattern diagnosis, and the memory architecture and layout.

Fig. 4. Spare pivoting mapping for extended spare element.

Fig. 5. Spare pivoting mapping for block repair.

Fig. 5 shows two examples. The intra address is a combination of the row and column address bits as defined above. The first example is the shaded rectangle shown on the upper-right corner, which is represented by the intra address $000\,a_4a_3\,1\,a_1a_0$. In this intra address, $a_4a_3$ covers four rows, and $a_1a_0$ covers four columns. Therefore, the region covered by this intra address is a 4 × 4 rectangle. The second example is the shaded rectangle shown on the lower-right corner, which is represented by the intra address $11\,a_5a_4a_3\,11\,a_0$. In this intra address, $a_5a_4a_3$ covers eight rows, and $a_0$ covers two columns. Therefore, the region covered by this intra address is an 8 × 2 rectangle.

Fig. 6 shows more complicated examples, where an intra address actually specifies a unique set of rectangles to be remapped. Although the remapping essentially should involve the scrambling of the memory logical address to cover a continuous rectangle in the physical memory, the remapping is also useful under some special spare constraints and layout requirements.

Fig. 6. Scrambled spare mapping.

III. BIRA Algorithms

A. ESP Algorithm and Proposed Implementation

ESP is a greedy RA algorithm with a specified threshold. During testing, ESP marks essential-repair lines (rows and columns whose number of faults exceeds the threshold) and then ignores the incoming faults on those essential lines.
After the testing finishes, the spare allocation phase handles other faulty lines and orthogonal faults (a single fault without other faults on the same row and column) to complete the repair procedure.

Although there were many complex and even optimal RA studies, the ESP with threshold 2 has a sufficiently high repair rate in normal situations now, as it can easily detect faulty rows/columns, and single faults are rarely on the same faulty line in large memory arrays with few defects. There were 1552 memory blocks (1024 × 64 bits) simulated with ten spare rows, some spare columns, and randomly injected faulty cells and lines. The results show that the repair rates of ESP are very close to optimal solutions. The critical/corner cases are rare and only slightly reduce the repair rate, although ESP is a simple heuristic algorithm for the NP-complete RA problem. More accurate implementations may need to be used for advanced and immature technologies with higher defect densities, but using more flexible spares can be a more cost-effective way than complex RAs, especially when the scale grows.

A row/column coordinate table [which can be a content-addressable memory (CAM)], with enough items for every spare row/column, can be used to implement the simple ESP algorithm without any fault bitmaps, which simplifies the hardware. When faults are detected, their addresses are compared with the table items to determine new faulty addresses to record, essential-repair lines to mark (for the first time), or faults to ignore. Therefore, during testing, the MBIRA simultaneously compares the incoming testing addresses with the table and then finishes updating the table within a clock period after the address is determined faulty or not, without any additional delay or pause, so that it can work with the at-speed MBIST.
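The table update described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' RTL: it assumes that a line becomes essential when a second fault is detected on it (one reading of "threshold 2"), that a marked entry treats its other coordinate as don't-care, and that the comparisons the hardware performs in parallel can be modelled as a sequential scan. The names `esp_collect`, `num_entries`, and the `kind` field are hypothetical.

```python
def esp_collect(faults, num_entries):
    """Model the fault-collection phase of ESP with threshold 2 (assumed reading).

    faults      -- iterable of (row, col) addresses in detection order
    num_entries -- table size (hypothetical parameter, one entry per spare)
    Returns (table, ok); ok is False if a new fault finds the table full.
    """
    table = []  # entries: {"row": r, "col": c, "kind": "single" | "row" | "col"}
    for row, col in faults:
        # 1) Ignore faults that lie on an already marked essential-repair line.
        if any((e["kind"] == "row" and e["row"] == row) or
               (e["kind"] == "col" and e["col"] == col) for e in table):
            continue
        singles = [e for e in table if e["kind"] == "single"]
        # 2) Ignore a cell that is already recorded (detected again later in the test).
        if any(e["row"] == row and e["col"] == col for e in singles):
            continue
        # 3) Second fault on a recorded line: mark that line as essential.
        match = next((e for e in singles
                      if e["row"] == row or e["col"] == col), None)
        if match is not None:
            match["kind"] = "row" if match["row"] == row else "col"
            continue
        # 4) Otherwise record a new faulty address, if there is room.
        if len(table) == num_entries:
            return table, False
        table.append({"row": row, "col": col, "kind": "single"})
    return table, True


# Illustrative fault sequence (made up for this sketch, not taken from Fig. 7).
faults = [(1, 2), (1, 5), (1, 7), (3, 4), (6, 4), (2, 2)]
table, ok = esp_collect(faults, num_entries=4)
kinds = [e["kind"] for e in table]
```

In this run, row 1 and column 4 are marked essential and cell (2, 2) remains a single fault; with only two table entries the same sequence overflows the table, which corresponds to the early-abort condition.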
Fig. 7. Memory block with defect sequence.

Fig. 8. ESP with threshold 2, two spare rows, and two spare columns.

Fig. 9. ESP with threshold 2 and 4 pivoting spare elements.

Fig. 7 shows an example fault map to illustrate the algorithms. Fig. 8 is an ESP example with threshold 2, two spare rows, and two spare columns, where rows 1, 5, 6, and column 5 are essential-repair lines, and cell (5, 5) has an orthogonal fault. Due to the lack of a spare row for the essential-repair row 6, this case will fail during the spare allocation phase, which allocates spare rows, columns, or both to specific table items after testing.

Fig. 9 has the same faults as in Fig. 8, but now each spare element can be used to replace a faulty row or a faulty column. Therefore, the faults become repairable. Every spare is initially in the default configuration and is configured to repair a row/column when one of the address coordinates is marked as an essential-repair row/column. For example, Spares 1, 2, and 4 are configured as rows, and Spare 3 is configured as a column when their essential-repair flags (circles) are set. Determination of the default configuration depends on the memory structure and failure diagnosis.

If an address is marked simultaneously as an essential-repair row and an essential-repair column [e.g., address (5, 5) in Fig. 8], the address is split into two items (e.g., Spares 2 and 3 in Fig. 9, where "X" means don't-care) to avoid a configuration conflict of the spare. Because the number of items equals the number of spares, the table of the original ESP can have more faults than it can repair.
Note that the proposed RA algorithm uses the same table space as the original ESP, and both algorithms use the same memory space for the spares. Therefore, the early-abort feature of the proposed architecture has no extra overhead so far as memory space is concerned, while the original ESP needs additional logic for handling the above case, or it fails at the final spare allocation phase as the spares are not configurable. For example, if there is one spare column available, it cannot repair a faulty row, but the proposed method will reconfigure the spare into a row, so the faulty line can be repaired. Therefore, the repair rate improvement depends on the fault distribution.

Consequently, with the proposed algorithm, the spare allocation phase can be omitted to simplify the MBISR, i.e., to lower its area. Moreover, the two components of a general MBISR, i.e., the MBIRA for RA and the address remapping unit (ARU) for normal mode, can be easily merged to share the area-dominating CAM or address comparators and registers.

B. Configurable Spare Allocation Algorithm

The configurable spare allocation (CSA) algorithm to be discussed next is similar to the original ESP algorithm, but is more general in utilizing spare elements, e.g., the modified ESP discussed above for the pivoting spare is a special case of the CSA. It can handle not only row and column spares but also more than two different address remapping schemes simultaneously as shown in Figs. 5 and 6. As a comparison, the ESP fits each failure pattern with a spare row or column and marks the essential-repair row or column where necessary, while the CSA uses multiple spare patterns to repair more complicated failure patterns.

Fig. 10 shows a simple four-step example with four configurable spare patterns. In the figure, the pattern flags are shown on the left and the memory under repair is shown on the right. We use a very simple case as an example. The bit-oriented memory is an 8 × 12 array, and there are two spare elements, each with 8 bits.
In Step A, we show that each of the two spares can be configured as an 8-bit column, an 8-bit row, a pair of 4-bit rows, or a 2 × 4 block. These are called spare patterns. When the first fault is detected, each of the four spare patterns in Spare 1 can be used to cover it. In Step B, two more faults are detected, and there are only three out of the four spare patterns in Spare 1 that can be used to cover the two faults on the same row, while any of the four spare patterns in Spare 2 can be used to cover the third fault. In Step C, we can see that there is only one spare pattern in Spare 1, i.e., the 8-bit row, which can cover the failed row with four fail bits. Note that the 4-bit row pair cannot be used due to the location constraint discussed before. For Spare 2, which is to cover the faulty row with three fail bits, either the 8-bit row pattern or the 4-bit row pair pattern can be used. Finally, in Step D, it is clear that there is exactly one spare pattern in each of the two spare elements that can be used to cover the accumulated faults. The spare patterns are allocated for the memory under repair based on the method described in Section II-C.

The more clustered the failures are (e.g., the failure covered by Spare 2 in Fig. 10), the higher the repair rate of the CSA is compared with that in previous papers. We will show some experimental results in Section V. In our examples and experiments, we assume bit-oriented memories for ease of discussion. However, the approach can be extended to word-oriented memories in a straightforward way, thanks to the configurability of the spares.
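The allocation walk-through above can be mimicked with a small software model. The sketch below is illustrative only and makes several assumptions: each spare pattern is represented as a mask of the address bits that must match (the remaining bits are the don't-care bits of the intra address in Section II-C), the first allocated spare that still has a covering pattern takes the fault, and the 6-bit address layout and the three patterns in `PATTERNS` are hypothetical rather than the four patterns of Fig. 10.

```python
# Hypothetical 6-bit address: a5a4a3 = row, a2a1a0 = column (8 x 8 array).
# A pattern is the mask of address bits that must match; cleared bits are don't-cares.
PATTERNS = {
    "row":   0b111000,  # same row, any column           -> 1 x 8 cells
    "col":   0b000111,  # same column, any row           -> 8 x 1 cells
    "block": 0b110100,  # a3, a1, a0 are don't-cares     -> 2 x 4 cells
}


def csa(fault_addrs, num_spares, patterns=PATTERNS):
    """Greedy configurable spare allocation (simplified model).

    Returns (spares, ok). Each spare keeps the address of its first fault
    ("base") and the set of pattern flags that still cover all of its faults.
    ok is False when a fault cannot be covered and no free spare is left.
    """
    spares = []
    for f in fault_addrs:
        for s in spares:
            # Patterns of this spare that would also cover the new fault.
            covering = {p for p in s["flags"]
                        if ((f ^ s["base"]) & patterns[p]) == 0}
            if covering:
                s["flags"] = covering  # keep covering patterns, unset the others
                break
        else:
            # No allocated spare covers f: allocate a fresh spare with all flags set.
            if len(spares) == num_spares:
                return spares, False
            spares.append({"base": f, "flags": set(patterns)})
    return spares, True


# (row, col): (2,1) (3,2) (2,3) form a cluster; (5,6) (5,0) share a row.
faults = [0b010001, 0b011010, 0b010011, 0b101110, 0b101000]
spares, ok = csa(faults, num_spares=2)
```

With these inputs the first spare narrows to the block pattern and the second to the row pattern. The optional priority step that the paper's pseudo code mentions is not modelled here.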
Fig. 10. Example of configurable spare allocation algorithm. (a) Allocate Spare 1 to cover the first fault. (b) Configure Spare 1 to fit the first failure pattern and allocate Spare 2 to cover the third fault. (c) Configure the spares to fit the failure patterns. (d) The final spare configurations.

In the MBIRA circuit, address comparisons dominate the area and timing overhead. The traditional ESP with row/column spares only compares two patterns (row and column), so the area of the four-pattern example in Fig. 10 is about twice that of a two-pattern one. However, because the address comparisons are done in parallel, the timing of the two designs is about the same, and can be reduced by CAM. Further analysis will be shown in Section V.

Fig. 11 shows the pseudo code of the CSA. For every incoming fault f, try every allocated spare s to see whether there is a pattern in s that covers f, and store the result in c. We "set" the pattern that covers f, and unset the others. Otherwise, continue to find an available spare for f (with set patterns that can cover f). Note that the first "if FlagCovered" block can be omitted, as this is an option to set higher priority for the patterns against the test sequence.

Fig. 11. Pseudo code of the CSA algorithm.

IV. BRAINS+: The Memory BIST/R Generation System

A. BRAINS: BIST for RAM in Seconds

Fig. 12 shows the MBIST architecture generated from BRAINS. The architecture has a controller including command buffers and an interface between parallel sequencers and low-pin-count serial ports for the external tester. The sequencers concurrently send read/write test commands with data backgrounds to grouped test pattern generator (TPG) neighborhoods under their power-constrained scheduling. Each dedicated TPG translates the test commands to operating signals of its nearby random access memory (RAM) under test for at-speed test, and checks the response. The overall test result and optional diagnosis information (Signature) of each fault can be scanned back, through the sequencers and the controller, to the external tester.

Fig. 12. MBIST architecture of BRAINS with proposed MBIRA.

B. BRAINS+: The Memory BIST/R Generation System

Repair scheme evaluation helps determine proper repair strategies, including RAs, test algorithms, spare architectures, and spare amounts for each RAM size. Small RAMs under repair usually should use 1-D spares without RA, while large ones can use 2-D or even more sophisticated spares like the configurable spares proposed here. The proposed configurable spares and MBISR circuits are integrated with the RAMs under repair (see Figs. 1 and 12) and their TPG. Therefore, in the test mode, the MBIRA can get each address under test from the sequencer to perform analysis before accessing the RAM, and the test result from the TPG later. In the normal mode, the MBISR can switch each faulty address access from the RAM to its nearby spare. The switching time penalty can be effectively reduced if necessary.

Fig. 13 shows the automatic MBISR-compilation flow of BRAINS+, which is extended from BRAINS. The flow is similar to BRAINS, with the additional MBISR generation combined with the MBIST generation of BRAINS. The BRAINS+ compiler parses the input BIST/R description files of the memory specification and test and repair requirement, and then consults the user-defined or built-in memory library and BIST/R template to generate the BIST/R RTL design, activation sequences test bench, and synthesis scripts.

Fig. 13. BRAINS+, the automatic memory BIST/R compilation flow.

In a real case using a typical 0.13 μm CMOS technology and a 512K × 32 bits spare memory, the MBIST of a 16K × 32 bits memory has 3291 equivalent NAND gates and a timing critical path of 6.35 ns. After adding a repair circuit that divides the spare into four spare elements with two-pattern configurability, the gate count and critical path increase to 5367 gates and 7.18 ns, respectively. The overhead of the MBIST depends on the memory size, test algorithm, and diagnosis option, while that of the MBIRA depends on the address width, spare/pattern complexity, and pipeline depth.

V. Experimental Results

Fig. 14 compares the proposed spare pivoting scheme with the traditional ESP algorithm and row and column spare architecture, using the same 8 Kb spare memory for a 512 Kb memory under repair. With a typical 0.13 μm CMOS technology, the proposed scheme improves about 30% in silicon area, 10% in power, and 20% in critical path.

Fig. 14. Synthesis results of original ESP and proposed scheme.

Both MBIRA designs include the necessary comparison and multiplexing circuit to implement the repair scheme because of the limitation from the memory compiler. The critical paths are in the test mode and they are almost the same, because the address comparisons dominate the timing and both schemes compare row and column addresses for each spare. However, in the normal mode, the configurable spares reuse the same comparison circuit, but the traditional 2-D spares only need to compare either the row or the column address for each spare. Because address comparisons are done in parallel, the additional timing latency of the configurable spares is minor when the number of spares is low, e.g., two to eight spares in common cases. In the proposed scheme, only the size of the address-selection multiplexer doubles, and an additional OR gate in the OR gate tree is added. The latency can further be reduced, and CAM can be used to reduce the timing overhead.

The main reason for the improvement is that the MBIRA with simplified architecture merges the ARU as mentioned previously (see Sections II–III). Therefore, the MBIRA is almost equal to the whole MBISR, except for an output (Q) multiplexer and a 1-bit remap delay buffer for fitting the timing of reading the memory output Q as shown in Fig. 1.

Table I compares the synthesis results (using a typical 0.13 μm CMOS technology, with a 3 ns clock period) of eight different MBISR circuits with 0.5/2 MB target memories, four/eight spare elements, and none/three-stage pipelined buffers, respectively. A buffer is a queue for keeping the testing address until the MBIST reports it. The buffers cost 17–25% of the MBISR area and critical path delay. Although the performance can be improved by pipelining, the RA delay overlaps with the data comparators of the MBIST, i.e., the global critical path is in the MBIST. Therefore, the critical path is in the address remapping of the normal mode, which can be reduced.

TABLE I
MBISR Schemes with Target Memories, Spares, and Buffer Depth

Buffer   Mem.    Critical Path (ns)   Power (mW)         Area (KGates)      Area Overhead (‰)
Depth    (MB)    4 spares  8 spares   4 spares  8 spares  4 spares  8 spares  4 spares  8 spares
0        0.5     3.1       3.1        0.7       1.4       0.9       1.8       2.7       5.5
0        2       3.1       3.1        0.8       1.5       1.0       2.0       0.8       1.5
3        0.5     3.1       3.1        1.0       1.7       1.2       2.2       3.6       6.5
3        2       3.1       3.1        1.2       2.0       1.3       2.4       1.0       1.8
As shown in Table I, the power, area, and area overhead of the non-buffer MBISR circuits double as the number of spares doubles, because the analysis table (see Fig. 9) grows linearly with the number of spares (table items). However, when the memory size quadruples, the MBISR only logarithmically adds 2 bits to the addresses. Therefore, the MBISR area overhead decreases about $n/\log_2 n$ times when the memory size increases $n$ times.

In general, the address comparison circuits dominate the area of the MBISR. Therefore, the area is directly proportional to the number of spares, the number of configurable patterns, and the address length of the spare pattern, because all address bits of all configurable patterns of all spares need to be compared. The critical paths in Table I are almost the same, except for synthesis variations due to power and area constraints.
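As a rough numerical check of this scaling argument, the sketch below compares the area-overhead figures of Table I with a first-order model. The model is an assumption of this sketch, not the authors' formula: it takes the MBISR area to be proportional to the compared address width, so that growing the memory $n$ times adds $\log_2 n$ address bits. The function name `overhead_reduction` and the parameter `addr_bits` are hypothetical; the value 22 corresponds to addressing every bit of a 0.5 MB array and is used only for illustration.

```python
import math


def overhead_reduction(n, addr_bits):
    """Factor by which the area overhead shrinks when the memory grows n times,
    assuming MBISR area is proportional to the compared address width."""
    return n * addr_bits / (addr_bits + math.log2(n))


# Area overhead (per mille) from Table I: (at 0.5 MB, at 2 MB),
# keyed by (buffer depth, number of spares).
table1 = {
    (0, 4): (2.7, 0.8), (0, 8): (5.5, 1.5),
    (3, 4): (3.6, 1.0), (3, 8): (6.5, 1.8),
}

# Measured reduction factor for the 4x memory growth in each configuration.
measured = {cfg: small / large for cfg, (small, large) in table1.items()}

# First-order prediction for n = 4 with the assumed 22-bit address.
predicted = overhead_reduction(4, addr_bits=22)
```

Under these assumptions, both the measured factors and the prediction fall between 3 and 4 for a fourfold memory increase, which is consistent with the observation that the MBISR area barely grows while the memory area quadruples.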
TABLE II
Memory Repair Scheme Comparison
(Rows: spare architecture, local RA, shared RA, scalability, and area overhead in gates.)

CAM may be used in the MBISR for efficient address mapping. In a CAM-based MBISR called CRESTA, each of the parallel exhaustive analyzers is about half the size of that in the MBIRA of ESP for the same spare configuration. However, there are multiple copies of BIRA hardware in CRESTA for exhaustive search of all possible solutions simultaneously. Therefore, the area ratio (ESP:CRESTA) is about $1 : \binom{r+c}{c}/2$, where $r$ and $c$ are the numbers of spare rows and columns, respectively, and $\binom{r+c}{c}$ is the number of all possible combinations of the given spare rows and columns (i.e., the number of all possible solutions, or the number of analyzers). Based on this, we estimate the MBIRA area overhead of CRESTA as shown in Table II.

Table II compares the proposed scheme with CRESTA and two other recent papers. Both CRESTA and the proposed MBIRA can achieve at-speed repair without any test time overhead, whereas the other two schemes take longer because of complex RA, especially when the number of defects and the number of total faults increase. The area of the proposed MBIRA is also smaller than that of the others. Therefore, the proposed MBIRA has higher scalability to deal with more memories.

The area of the ESP in the proposed scheme is smaller than those in the other two schemes, because the ESP is also included in, and is a simplified special case of, the local-repair-most (LRM) algorithm and the must-repair algorithm.

Figs. 15–19 compare the repair rates of the CSA using configurable spares, CRESTA, and the ESP using traditional row/column spares. The Monte-Carlo simulation runs 100 000 memory cases.
Each memory is 2K × 512 with four 1 × 512 spare rows and four 1K × 1 spare columns (segments) for the traditional RAs. In contrast, the CSA partitions the same total number of spare cells into six spare elements. Each element can be configured as 2 × 512, 1K × 1, or 32 × 32. The number of defects per memory is generated by the Poisson distribution with an average of two to six defects. The failure patterns with “1×” cluster magnification include 10% faulty rows, 10% faulty columns, 10% 2 × 2 clustered failures, 5% 4 × 4 clustered failures, and 2.5% 8 × 8 clustered failures, and the rest are single failures. The “2×” magnification doubles the three cluster ratios, and so on.

Fig. 15. Repair rate of algorithms with different cluster ratios.
Fig. 16. Repair rate of algorithms with different average defects (worse).

Fig. 15 shows that when the number of clustered failures is reduced, the repair rate of the CSA increases more slowly than that of the others, because the CSA is tailored for clustered failures in this case, and its six spare elements are fewer than the four spare rows and four spare columns in the other two schemes. However, when the number of clustered failures increases, the repair rate of the CSA clearly stands out as the best.

Fig. 16 shows that the repair rate of the CSA decreases faster than that of the others when the number of defects increases, because it has fewer spares.

Figs. 17–19 show a similar but better case, where the algorithms have the same number of spare elements and spare cells, the columns (segments) are 512 × 1, and the configurable spares can be configured as 1 × 512, 512 × 1, or 32 × 16. The repair rates of the CSA are higher than those of the others.

Fig. 19 shows that when the number of defects increases and the repair rates decrease, the CSA can still handle the increasing clustered failures.

The results also depend on pattern fitting between spares and failures. The repair rate also depends on many other
factors, e.g., the spare pattern priority (see Fig. 11), test address sequence, and failure detection sequence (random, row-by-row, sorted from low address to high address, and so on).

Note that when there are constraints between the memory under repair and its spare memory, e.g., different row/column access times as in a typical DRAM, the use of reconfigurable spares is limited.

Fig. 17. Repair rate of algorithms with different cluster ratios (better).
Fig. 18. Repair rate of algorithms with different cluster ratios (better).
Fig. 19. Repair rate comparison with different failure distributions.

VI. Conclusion

We proposed a cost-effective MBISR generator called BRAINS+, which enhances the MBIST generator BRAINS with a repair option for general embedded memories. It generates synthesizable RTL MBISR circuits with little area overhead and a high repair rate. The proposed MBISR uses a spare memory generated from the same memory generator as the main memory, without any built-in redundancy; its specification is obtained from RAISIN. We enhanced the ESP RA algorithm for traditional 2-D or block-redundancy repair with the proposed pivotable or configurable spare architecture, which can allocate the spare elements in row, column, or any block/rectangular patterns that the address bits can define. We also removed the spare allocation phase of ESP for lower area overhead and shorter analysis time. In addition, the RA can analyze each fault within a clock period, with the at-speed MBIST generated by BRAINS. The BIRA scheme can easily be applied to multiple memories on a chip. Moreover, the proposed soft-repair scheme is cost-effective when MBIST is available, without the need for nonvolatile storage.

References

L.-T. Wang, C.-W. Wu, and X. Wen, Design for Testability: VLSI Test Principles and Architectures. San Francisco, CA: Morgan Kaufmann, 2006.
International Technology Roadmap for Semiconductors (ITRS), Semiconductor Industry Association, Sematech, Hsinchu, Taiwan, Dec. 2009.
S.-Y. Kuo and W. K. Fuchs, “Efficient spare allocation in reconfigurable arrays,” IEEE Des. Test Comput., vol. 4, no. 1, pp. 24–31, Feb. 1987.
C. Cheng, C.-T. Huang, J.-R. Huang, C.-W. Wu, C.-J. Wey, and M.-C. Tsai, “BRAINS: A BIST compiler for embedded memories,” in Proc. IEEE Int. Symp. DFT VLSI Syst., Oct. 2000, pp. 299–307.
R.-F. Huang, J.-F. Li, J.-C. Yeh, and C.-W. Wu, “Raisin: Redundancy analysis algorithm simulation,” IEEE Des. Test Comput., vol. 24, no. 4, pp. 386–396, Jul.–Aug. 2007.
M.-S. Lee, L.-M. Denq, and C.-W. Wu, “BRAINS+: A memory built-in self-repair generator,” in Proc. 1st VTTW, Jul. 2007, Paper 1.2.
T. Kawagoe, J. Ohtani, M. Niiro, T. Ooishi, M. Hamada, and H. Hidaka, “A built-in self-repair analyzer (CRESTA) for embedded DRAMs,” in Proc. ITC, 2000, pp. 567–574.
X. Du, S. M. Reddy, W.-T. Cheng, J. Rayhawk, and N. Mukherjee, “At-speed built-in self-repair analyzer for embedded word-oriented memories,” in Proc. 17th Int. Conf. VLSI Des., 2004, pp. 895–900.
P. Ohler, S. Hellebrand, and H. J. Wunderlich, “An integrated built-in test and repair approach for memories with 2-D redundancy,” in Proc. 12th IEEE ETS, May 2007, pp. 91–96.
J. Lee, K. Park, and S. Kang, “An area-efficient built-in redundancy analysis for embedded memories with optimal repair rate using 2-D redundancy,” in Proc. ISOCC, 2009, pp. 353–356.
S.-K. Lu and C.-H. Hsu, “Built-in self-repair for divided word line memory,” in Proc. IEEE ISCAS, May 2001, pp. 13–16.
V. Schober, S. Paul, and O. Picot, “Memory built-in self-repair using redundant words,” in Proc. ITC, 2001, pp. 995–1001.
A. Benso, S. Chiusano, G. Di Natale, and P. Prinetto, “An on-line BIST RAM architecture with self-repair capabilities,” IEEE Trans. Reliab., vol. 51, no. 1, pp. 123–128, Mar. 2002.
M. Nicolaidis, N. Achouri, and S.
Boutobza, “Optimal reconfiguration functions for column or data-bit built-in self-repair,” in Proc. Conf. DATE, 2003, pp. 590–595.
M. Nicolaidis, N. Achouri, and S. Boutobza, “Dynamic data-bit memory built-in self-repair,” in Proc. IEEE/ACM ICCAD, Nov. 2003, pp. 588–594.
J.-F. Li, J.-C. Yeh, R.-F. Huang, C.-W. Wu, P.-Y. Tsai, A. Hsu, and E. Chow, “A built-in self-repair scheme for semiconductor memories with 2-D redundancy,” in Proc. ITC, 2003, pp. 393–402.
C.-L. Su, R.-F. Huang, and C.-W. Wu, “A processor-based built-in self-repair design for embedded memories,” in Proc. 12th IEEE ATS, Nov. 2003, pp. 366–371.
C.-T. Huang, C.-F. Wu, J.-F. Li, and C.-W. Wu, “Built-in redundancy analysis for memory yield improvement,” IEEE Trans. Reliab., vol. 52, no. 4, pp. 386–399, Dec. 2003.
M. Nicolaidis, N. Achouri, and L. Anghel, “A diversified memory built-in self-repair approach for nanotechnologies,” in Proc. IEEE VTS, Apr. 2004, pp. 313–318.
J.-F. Li, J.-C. Yeh, R.-F. Huang, and C.-W. Wu, “A built-in self-repair design for RAMs with 2-D redundancy,” IEEE Trans. Very Large Scale Integr. Syst., vol. 13, no. 6, pp. 742–745, Jun. 2005.
S.-K. Lu, Y.-C. Tsai, C.-H. Hsu, K.-H. Wang, and C.-W. Wu, “Efficient built-in redundancy analysis for embedded memories with 2-D redundancy,” IEEE Trans. Very Large Scale Integr. Syst., vol. 14, no. 1, pp. 34–42, Jan. 2006.
Y.-J. Huang, D.-M. Chang, and J.-F. Li, “A built-in redundancy-analysis scheme for self-repairable RAMs with two-level redundancy,” in Proc. 21st IEEE Int. Symp. DFT VLSI Syst., Oct. 2006, pp. 362–370.
S. K. Thakur, R. A. Parekhji, and A. N. Chandorkar, “On-chip test and repair of memories for static and dynamic faults,” in Proc. ITC, 2006, pp. 1–10.
C.-D. Huang, J.-F. Li, and T.-W. Tseng, “ProTaR: An infrastructure IP for repairing RAMs in system-on-chips,” IEEE Trans. Very Large Scale Integr. Syst., vol. 15, no. 10, pp. 1135–1143, Oct. 2007.
T.-W. Tseng and J.-F. Li, “A shared parallel built-in self-repair scheme for random access memories in SoCs,” in Proc. IEEE ITC, Oct. 2008, pp. 1–9.
M. Lee and C.-W. Wu, “Method for repairing memory and system thereof,” U.S. Patent 20090119537, 2009.
T. W. Tseng, J. F. Li, and C. C. Hsu, “ReBISR: A reconfigurable built-in self-repair scheme for random access memories in SoCs,” IEEE Trans. Very Large Scale Integr. Syst., vol. 18, no. 6, pp.
921–932, Jun. 2010.
S.-K. Lu, C.-L. Yang, Y.-C. Hsiao, and C.-W. Wu, “Efficient BISR techniques for embedded memories considering cluster faults,” IEEE Trans. Very Large Scale Integr. Syst., vol. 18, no. 2, pp. 184–193, Feb. 2010.
Y. Zorian and S. Shoukourian, “Embedded-memory test and repair: Infrastructure IP for SoC yield,” IEEE Des. Test Comput., vol. 20, no. 3, pp. 58–66, May–Jun. 2003.
International Technology Roadmap for Semiconductors (ITRS), Semiconductor Industry Association, Sematech, Seoul, Korea, Dec. 2008.
D. M. Blough and A. Pelc, “A clustered failure model for the memory array reconfiguration problem,” IEEE Trans. Comput., vol. 42, no. 5, pp. 518–528, May 1993.
C. H. Stapper, “On yield, fault distributions, and clustering of particles,” IBM J. Res. Develop., vol. 30, no. 3, pp. 326–338, May 1986.
D. M. Blough, “Performance evaluation of a reconfiguration-algorithm for memory arrays containing clustered faults,” IEEE Trans. Reliab., vol. 45, no. 2, pp. 274–284, Jun. 1996.
A. Choi, N. Park, F. J. Meyer, F. Lombardi, and V. Piuri, “Reliability measurement of fault-tolerant onboard memory system under fault clustering,” in Proc. 19th IEEE IMTC, vol. 2. Aug. 2002, pp. 1161–1166.
B. Jang, M. Choi, N. Park, Y. B. Kim, V. Piuri, and F. Lombardi, “Spare line borrowing technique for distributed memory cores in SoC,” in Proc. IEEE IMTC, May 2005, pp. 43–48.
Y. N. Shen and F. Lombardi, “An approach for online repair and yield enhancement of VLSI/WSI redundant memories,” in Proc. 5th Annu. Eur. Comput. Conf. CompEuro Adv. Comput. Technol., Reliable Syst. Applicat., May 1991, pp. 685–689.
D. M. Blough, “On the reconfiguration of memory arrays containing clustered faults,” in Proc. 21st Int. Symp. FTCS, Dig. Papers, Jun. 1991, pp. 444–451.
T. Kirihata, Y. Watanabe, H. Wong, J. DeBrosse, M. Yoshida, D. Katoh, S. Fujii, M. Wordeman, P. Poechmueller, S. Parke, and Y. Asao, “Fault-tolerant designs for 256 Mb DRAM,” in Proc. Symp. VLSI Circuits, Dig. Tech. Papers, Jun. 1995, pp. 107–108.
T.
Kirihata, Y. Watanabe, W. Hing, J. K. DeBrosse, M. Yoshida, D. Kato, S. Fujii, M. R. Wordeman, P. Poechmueller, S. A. Parke, and Y. Asao, “Fault-tolerant designs for 256 Mb DRAM,” IEEE J. Solid-State Circuits, vol. 31, no. 4, pp. 558–566, Apr. 1996.
B. Vinnakota and J. Andrews, “Repair of RAMs with clustered faults,” in Proc. IEEE ICCD: VLSI Comput. Processors, Oct. 1992, pp. 582–585.
C.-L. Yang, Y.-C. Hsiao, and S.-K. Lu, “Efficient BISR techniques for embedded memories considering cluster faults,” in Proc. 13th PRDC, 2007, pp. 224–231.
L.-M. Denq, T.-C. Wang, and C.-W. Wu, “An enhanced SRAM BISR design with reduced timing penalty,” in Proc. 15th IEEE ATS, Nov. 2006, pp. 25–30.
C.-F. Wu, C.-T. Huang, K.-L. Cheng, and C.-W. Wu, “Fault simulation and test algorithm generation for random access memories,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 21, no. 4, pp. 480–490, Apr. 2002.
J. P. Bickford, R. Rosner, E. Hedberg, J. W. Yoder, and T. S. Barnett, “SRAM redundancy: Silicon area versus number of repairs trade-off,” in Proc. IEEE/SEMI ASMC, May 2008, pp. 387–392.
K.-L. Cheng, C.-M. Hsueh, J.-R. Huang, J.-C. Yeh, C.-T. Huang, and C.-W. Wu, “Automatic generation of memory built-in self-test cores for system-on-chip,” in Proc. 10th IEEE ATS, Nov. 2001, pp. 91–96.
R.-F. Huang, C.-H. Chen, and C.-W. Wu, “Economic aspects of memory built-in self-repair,” IEEE Des. Test Comput., vol. 24, no. 2, pp. 164–172, Mar.–Apr. 2007.

Mincent Lee (S’07) received the B.S. degree in electronics engineering from National Chiao Tung University, Hsinchu, Taiwan. He is currently pursuing the Ph.D. degree in electrical engineering from National Tsinghua University, Hsinchu.
His current research interests include very large-scale integrated design and testing, especially the testing and repair of semiconductor memory.

Li-Ming Denq received the B.S. degree in electronic engineering from Chung Yuan Christian University, Taoyuan, Taiwan, and the Ph.D.
degree in electrical engineering from National Tsinghua University, Hsinchu, Taiwan.
In 2009, he was with the Department of Research and Development, HOY Technologies Company, Ltd., Hsinchu, where he is currently an Assistant Manager working on the development of infrastructure IP for memory and logic testing. His current research interests include design for testability, wireless testing, and embedded memory testing, diagnosis, and repair.

Cheng-Wen Wu (F’04) received the B.S.E.E. degree from National Taiwan University, Taipei, Taiwan, in 1981, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of California, Santa Barbara (UCSB), in 1985 and 1987, respectively.
Since 1988, he has been with the Department of Electrical Engineering, National Tsinghua University (NTHU), Hsinchu, Taiwan. At NTHU, he has served as the Director of Computer and Communications Center from 1996 to 1998, and the Director of Technology Service Center from 1998 to 1999. From 1999 to 2000, he was a Visiting Scholar with the Department of Electrical and Computer Engineering at UCSB. He then served at NTHU as the Chair of the Department of Electrical Engineering from 2000 to 2003, the Director of the IC Design Technology Center from 2000 to 2005, and the Dean of the College of Electrical Engineering and Computer Science from 2004 to 2007. He is currently a Tsinghua Chair Professor with NTHU. He also serves as the Vice President of the Industrial Technology Research Institute (ITRI), Hsinchu, and the General Director of its Information and Communications Laboratories (ICL). He is appointed jointly by NTHU and ITRI. He was the General Director of the SoC Technology Center (STC), ITRI, from 2007 to 2009, before he architected the merger of STC and ICL. He is leading the integrated organization that covers information technologies, communications technologies, as well as IC design technologies.
He is also the Chair of the IC Design Committee, Taiwan Semiconductor Industry Association, Hsinchu, Taiwan. His current research interests include design and test of high performance very large-scale integrated (VLSI) circuits and systems, and test and repair of memory circuits.
Dr. Wu was a recipient of the Distinguished Teaching Award from NTHU, the Outstanding Electrical Engineering Professor Award from the Chinese Institute of Electrical Engineers (CIEE), the Distinguished Research Award from the National Science Council, the Industrial Collaboration Award from the Ministry of Education (MOE), Best Paper Award of the 2002 IEEE International Workshop on Design and Diagnostics of Electronic Circuits and Systems, Best Paper Award of the 2003 IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), Special Feature Award of the 2003 ASP-DAC University LSI Design Contest, Best Paper Award of the 2007 VLSI Design/CAD Symposium, Academic Award from MOE, Continuous Service Award and Outstanding Contribution Award from the IEEE Computer Society, the IEEE VLSI Test Symposium Best Innovative Practices Session Award from Test Technology Technical Council, IEEE Computer Society, the Distinguished Industrial Collaboration Award from NTHU, and the National Invention Award (Silver Medal) from the Ministry of Economic Affairs. He was the Technical Program Chair of the IEEE 5th Asian Test Symposium (ATS96), the General Chair of ATS00, the General Co-Chair of the IEEE Memory Technology, Design, and Testing Workshop in 2005, 2006, 2007, and 2009, the Organizing Committee Chair of the IEEE Asian Solid-State Circuit Conference in 2009, and the General Co-Chair of the IEEE International Symposium on VLSI Design, Automation, and Test in 2008 and 2009. He was the Editor-in-Chief for the International Journal of Electrical Engineering and, subsequently, the Chair of its Editorial Board.
He is an Editor for the Journal of Electronic Testing: Theory and Applications (JETTA) and the IEEE Design and Test of Computers, and an Associate Editor for IEEE Transactions on Computers and IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. He became a Golden Core Member of the IEEE Computer Society in 2006. He is a Life Member of both CIEE and the Taiwan IC Design Society.