Class 260: Flash Memory Technology and Techniques

Flash Memory Technology and Techniques
Classes 260 and 426
Embedded Systems Conference 1999
William Grundmann, Intel Corporation

Flash Memory Technology

Flash memory has emerged as a mainstream technology for storing firmware in embedded systems. While flash memory can be used like an EPROM (installing a programmed device into a PCB), such an implementation wastes a great deal of flash memory functionality. The same flash memory that stores the firmware can store data that is important to an end user. An important step in taking full advantage of flash memory's capabilities is an understanding of how to use flash and a few suggestions on how flash can improve a product.

This paper begins with a brief overview of flash memory technology (how to read, write, program and erase it) and continues by taking a look at how a flash memory cell stores data. This is useful information, because it explains some of the unique operating characteristics of flash. It also includes a discussion of several new developments that will increase the number of applications that can use flash memory.

Two techniques will be described: storing code and data in flash memory and programming the boot code in-system. They improve a product or manufacturing flow without adding substantial cost.

FLASH TECHNOLOGY OVERVIEW

This section describes how to use flash memory, how it works and some new developments.

OPERATING MODES

As will be seen later, modifying a flash memory cell is a little more involved than simply writing data to it. Program and erase operations are required to alter array contents, and these operations are done by putting a flash device into the appropriate operating mode.

Operating modes vary somewhat between different manufacturers, and a manufacturer can offer two devices with different modes and commands to reflect different architectures. This discussion will use the modes supported by the 28F800C3, an Intel® Boot Block Flash memory.

These modes make use of the Write State Machine (WSM). The WSM is a controller that is built into a flash device. It looks at all writes to the flash, decodes them and implements the commands.

Read Array

The Read Array is the default mode, and a flash memory enters this mode upon power-up. It stays in this mode until a valid command is received that puts it in a different mode. While in this mode, all read operations return memory contents.
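As a rough illustration of how a WSM decodes writes, the sketch below models a toy flash block in RAM and drives it with command sequences. The command values 0x40, 0x20, 0xD0 and 0x70 are the ones given in the Program, Erase and Read Status descriptions of this overview, and 0xFF for Read Array is taken from the Figure 4 caption. Everything else is an assumption made for illustration: the `sim_*` and `flash_*` names, the 16-word block, and the status bit positions `SR_READY` and `SR_ERROR`. It is not the 28F800C3 driver.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_PROGRAM_SETUP 0x40u /* from the text */
#define CMD_ERASE_SETUP   0x20u /* from the text */
#define CMD_ERASE_CONFIRM 0xD0u /* from the text */
#define CMD_READ_STATUS   0x70u /* from the text */
#define CMD_READ_ARRAY    0xFFu /* from the Figure 4 caption */
#define SR_READY          0x80u /* assumed position of the "ready" bit */
#define SR_ERROR          0x20u /* assumed position of an error flag */
#define BLOCK_WORDS       16u   /* toy block size for this model */

enum wsm_mode { MODE_READ_ARRAY, MODE_PROGRAM, MODE_ERASE, MODE_READ_STATUS };

struct sim_flash {
    uint16_t cells[BLOCK_WORDS];
    enum wsm_mode mode;
    uint16_t status;
};

/* Power-up: the model starts erased, ready, and in Read Array mode. */
void sim_power_up(struct sim_flash *f)
{
    for (size_t i = 0; i < BLOCK_WORDS; i++)
        f->cells[i] = 0xFFFFu;
    f->mode = MODE_READ_ARRAY;
    f->status = SR_READY;
}

/* Every bus write passes through the WSM model, which decodes it. */
void sim_write(struct sim_flash *f, size_t addr, uint16_t data)
{
    switch (f->mode) {
    case MODE_PROGRAM:            /* second cycle: address and data */
        f->cells[addr] &= data;   /* programming only turns 1s into 0s */
        f->status = SR_READY;
        f->mode = MODE_READ_STATUS;
        break;
    case MODE_ERASE:              /* second cycle must be the confirm code */
        f->status = SR_READY;
        if (data == CMD_ERASE_CONFIRM) {
            for (size_t i = 0; i < BLOCK_WORDS; i++)
                f->cells[i] = 0xFFFFu;
        } else {
            f->status |= SR_ERROR;
        }
        f->mode = MODE_READ_STATUS;
        break;
    default:                      /* first cycle: a command */
        if (data == CMD_PROGRAM_SETUP)    f->mode = MODE_PROGRAM;
        else if (data == CMD_ERASE_SETUP) f->mode = MODE_ERASE;
        else if (data == CMD_READ_STATUS) f->mode = MODE_READ_STATUS;
        else if (data == CMD_READ_ARRAY)  f->mode = MODE_READ_ARRAY;
        break;
    }
}

/* Reads return array contents only in Read Array mode, otherwise status. */
uint16_t sim_read(const struct sim_flash *f, size_t addr)
{
    return (f->mode == MODE_READ_ARRAY) ? f->cells[addr] : f->status;
}

/* Program one word: setup, address and data, poll status, back to array. */
int flash_program_word(struct sim_flash *f, size_t addr, uint16_t data)
{
    uint16_t sr;
    sim_write(f, addr, CMD_PROGRAM_SETUP);
    sim_write(f, addr, data);
    sim_write(f, addr, CMD_READ_STATUS);
    do { sr = sim_read(f, addr); } while ((sr & SR_READY) == 0);
    sim_write(f, addr, CMD_READ_ARRAY);
    return (sr & SR_ERROR) ? -1 : 0;
}

/* Erase the block: setup code, confirm code, poll status, back to array. */
int flash_erase_block(struct sim_flash *f, size_t addr)
{
    uint16_t sr;
    sim_write(f, addr, CMD_ERASE_SETUP);
    sim_write(f, addr, CMD_ERASE_CONFIRM);
    do { sr = sim_read(f, addr); } while ((sr & SR_READY) == 0);
    sim_write(f, addr, CMD_READ_ARRAY);
    return (sr & SR_ERROR) ? -1 : 0;
}
```

In this model a second program of the same word with 0xFFFF leaves the contents unchanged, because programming can only clear bits; only `flash_erase_block` restores the location to 0xFFFF. On real hardware the two `sim_*` accessors would be replaced by volatile bus accesses.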
Program

Programming is the process of writing a “0” into a flash location. A single byte or word can be programmed.

Writing the value 0x40 to the memory will put it in the Program mode. While in this mode, any bus reads may return invalid data. The address and data included in the next write cycle will be used to program the array.

[Figure 1: Programming flow chart. Program: write 0x40 to flash; write address and data of the location to be programmed; put flash into Read Status mode; read status; if programming is not finished, read status again; when finished, return.]

Figure 1 shows a flow chart of a typical programming operation. For a typical location there will be a minimum of 4 bus transactions:

  • Program Setup Command
  • Program Command
  • Read Status Register Command
  • Read Array Command

Upon receiving a valid program sequence the WSM turns on the charge pumps and programs the location. It does this by internally applying program pulses until the location verifies correctly. Once the location is verified, it reflects this successful operation in the status register. Failure of the programming operation sets one or more error flags. Once begun, all this happens without the intervention of the external controller, and for a typical location in a new device, takes about 10 microseconds.

Most flash memories can only do one thing at a time, so while a location is being programmed or erased, the rest of the device is unavailable. This means that a CPU cannot execute code from a flash that is not in the Read Array mode.

Erase

Erase is the process of writing a “1” into an entire device or block. Note that an individual location cannot be erased; the entire block or device must be erased. The typical usage mode for flash is to erase a block, which initializes all locations to 0xFF, and then program individual locations as needed. A bit that is a “1” (erased) can always be programmed to a “0”. A bit that is a “0” can only be returned to a “1” by erasing it along with the block it is in.

Writing 0x20 to the flash begins the Erase mode. If the next write has a data value of 0xD0, then an erase operation will commence. Any read operations done when in the Erase mode may return invalid data or status.

As with programming, the entire erase operation is automatic, and the result is posted to the status register. An erase command will erase the entire block; it takes about 1 second for the WSM to do this. Clearly, the WSM off-loads most of the work from the CPU.

Read Status

Both program and erase operations post their results in the status register. When the flash receives a 0x70 command, it enters the Read Status mode. In this mode, any read operation returns the contents of the status register.

PROTECTION

Flash devices incorporate some form of protection. The concern is that a processor operating in some erroneous way could inadvertently issue program or erase commands to the flash memory. To prevent this, a flash memory will usually support some combination of two basic types of protection schemes: hardware and software.

A hardware scheme requires that specific voltages be applied to pins on the device before the device can be altered. The voltages can be TTL logic levels or some special voltage like 12 V, and the pins can be dedicated or shared with other functions. An example of hardware protection is the VPP pin on a flash memory; if no voltage is applied, the device will not program or erase.

Software protection mandates that a code sequence be written in order to initiate an operation. Consider the erase command in the preceding discussion. Two codes, 0x20 and 0xD0, were required in order to begin the erase operation. The first can be considered a command, but the second is clearly a code. Some devices require code sequences that are four or six bytes long.

SEGMENTATION

Current flash devices are blocked or segmented. All the locations in a block share common erase circuitry, so while only one location can be programmed at a time, an entire block or segment is erased at once.

[Figure 2: Block map labeled 28F160C3. 31 main blocks of 64 KB each and 8 parameter blocks of 8 KB each.]

Normally, the boot code will be located in one or more blocks, the application code placed in several other blocks and some non-volatile data stored in another. Figure 2 shows an address map of an Intel® 28F800C3. Its blocks are sized to fulfill these three functions.

The boot code should be put in a separate block so that data or application code can be erased and reprogrammed without affecting the boot program. For small boot programs, one or more of the 8-Kbyte parameter blocks can be used. For large boot programs, use the main blocks. Two variations of a boot block device are available. One has the parameter blocks at the bottom of the memory map, the other puts them at the top.

SUSPEND FEATURES

As will be seen in the next section, program or erase operations can take tens of microseconds or even seconds. There are many instances where a system will need to read data in one block while a different block is being erased or programmed. Flash memories implement a suspend command that permits this.

When the WSM receives a suspend command during an erase or program operation, it saves the state of the operation and enters a mode that allows the array to be accessed. This is like the Read Array mode, except that the WSM is running, so the component may draw a little more current.

SUMMARY

Flash memories can be a complete non-volatile memory subsystem. Current devices implement all the pumps and control electronics necessary to program or erase them.

FLASH CELL TECHNOLOGY

Flash memories can do more than store the firmware; they can store or log data. Some understanding of how flash memories store data will be useful in matching flash’s programming characteristics to an application.

Flash, EEPROM and EPROM all store data the same way. Figure 3 shows a simplified diagram of one of these cells. It is an N-channel transistor with a second gate, the Floating Gate, sandwiched between the Control Gate and the channel.

[Figure 3: Simplified flash cell, showing the control gate, floating gate, oxide, source, drain and channel.]

The Control Gate (CG) is connected to other circuitry and works like a normal gate would. When sufficient positive voltage is applied to it, greater than the threshold voltage, the transistor turns on, and a conduction path will be established between the Source and the Drain.

The Floating Gate (FG) modulates the effect of the CG. If the Floating Gate has a sufficient negative charge, it will prevent the transistor from turning on when the normal threshold voltage is applied to the Control Gate.

READING A FLASH CELL

An erased cell has no additional charge on the FG, and the transistor turns on at its characteristic threshold. When the same cell is programmed, it has enough electrons in the FG to prevent it from turning on when the threshold voltage is applied. To read a cell, the external circuitry applies the threshold voltage to the CG and observes if the transistor turns on.

While all architectures use similar principles to store and read data, they do not act the same from a system point of view. NOR permits a single location (word or byte) to be read in approximately 100 ns.

NAND only reads blocks of 512 bytes. There is an initial latency of 10 µs while the block is transferred out of the memory array and into an on-chip SRAM buffer. It can be transferred out of the buffer at a rate of 20 Mbytes/s.
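The read figures quoted above for NOR and NAND can be compared with a few lines of arithmetic. The sketch below is only a model under stated assumptions: 20 Mbytes/s is taken as 20,000,000 bytes/s (50 ns per byte), every NAND read is assumed to transfer a whole 512-byte block out of the buffer, NOR is assumed to be read one byte-wide location at a time, and the function names are hypothetical.

```c
#define NOR_ACCESS_NS     100ul    /* approx. access time per location (text) */
#define NAND_BLOCK_BYTES  512ul    /* NAND read granularity (text) */
#define NAND_LATENCY_NS   10000ul  /* 10 us array-to-buffer latency (text) */
#define NAND_NS_PER_BYTE  50ul     /* 20 Mbytes/s, assumed to mean 20e6 bytes/s */

/* Time to read a number of individually addressed NOR locations. */
unsigned long nor_read_ns(unsigned long locations)
{
    return locations * NOR_ACCESS_NS;
}

/* Time to read at least `bytes` bytes from NAND, in whole blocks. */
unsigned long nand_read_ns(unsigned long bytes)
{
    unsigned long blocks = (bytes + NAND_BLOCK_BYTES - 1ul) / NAND_BLOCK_BYTES;
    return blocks * (NAND_LATENCY_NS + NAND_BLOCK_BYTES * NAND_NS_PER_BYTE);
}
```

Under these assumptions a single random byte costs 100 ns from NOR but 35,600 ns from NAND, while a full 512-byte sequential read costs 51,200 ns from byte-wide NOR and 35,600 ns from NAND. The random-access gap is what matters for fetching instructions.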
The type of reading supported by an architecture determines whether code can be executed directly from the memory. NOR, the former example, supports direct execution, or eXecution In Place (XIP); NAND, the latter, does not.

PROGRAMMING OR ERASING

The WSM modifies a flash memory cell by injecting electrons onto the Floating Gate (programming) or by removing them (erase). On today’s devices, a cell is programmed when approximately 50,000 excess electrons are in the FG. They stay in the FG for 10 years or more, because there is no conduction path to the FG; the FG is completely isolated from the rest of the transistor. The only way for electrons to get into it is for them to be driven through the oxide that surrounds the FG. The cell is erased by removing the electrons.

Programming and erasing are accomplished by applying elevated voltages, 10 V to 20 V, to the cell to cause electrons to propagate through the oxide. This has three consequences.

  • It takes time to drive the current through the oxide barrier, so program and erase operations take much longer than in an SRAM.
  • The oxide barrier can “wear out” after repeated program/erase cycles.
  • Elevated voltages are required by the cell.

Program and Erase Performance

If the application will store data in a flash memory, the amount of time it takes to program or erase the device can be an important specification. Also, flash memories typically operate on blocks. To accurately specify a device, it is necessary to understand how long an operation takes and how big a block the operation affects. An operation that takes 1 second may seem slow, but if it erases 128 Kbytes, the throughput is 128 Kbytes/sec.

Depending on the type of flash, it will take between 5 µs and 1 second to program or erase a cell. Cells are usually programmed or erased in parallel, so that a slow operation is done on a lot of cells at once. For example, a NOR architecture device may program a single byte in 5 µs and erase 64 Kbytes in 1 second.

There are three aspects to consider when choosing a flash memory that will be used to store data and will be programmed in-system.

  • What rate does the application require?
  • What rate does the device support? This can be the average time to program and erase a byte.
  • How big a buffer is necessary, if any, to store incoming data while the flash is busy programming or erasing the previous block?

Oxide Wear-Out

When electrons or holes propagate through the oxide, some do not make it all the way; they get trapped. When they do, they alter the behavior of, or wear out, the oxide.

A cell whose oxide has trapped charge does not program or erase as fast as an electrically neutral one. The trapped charges repel the carriers that are trying to propagate through it. For example, if a programming mechanism drives electrons through the oxide, then the oxide regions that sustain the heaviest program current will gain a negative charge. That negative charge repels electrons and reduces the amount of current that flows into the Floating Gate. The result is a slower program or erase operation.

The typical times published in datasheets are for a new cell. Since the amount of performance degradation will vary with the technology and vendor, it is necessary to contact the vendor to estimate what effect cycling will have on their devices. In general though, the degradation is gradual and occurs over tens of thousands of program/erase cycles.

Applying Elevated Voltages

The original flash memories required that elevated voltages be applied for precise times. New devices can generate and control these voltages internally. It is worth mentioning that the flash transistor requires these elevated voltages, and if they are not applied externally, they are generated on-chip by voltage pumps.

Voltage pumps take up silicon and add cost to the device. The more current the pump can supply, the more bits that can be programmed at once. However, that higher performance pump takes up more space. Typical flash memories strike a balance between the two.

Several vendors have a separate pin for a programming voltage, VPP. One benefit of this is that a device may program quicker when a higher voltage is applied to the pin. This is because the device will sense the higher voltage and modify the internal algorithm to program more bits at once.

SUMMARY

The flash memory cell intrinsically takes time to program or erase. How much time depends on the method used to force carriers through insulating barriers. For a given method, the rate can be increased by operating on many bits in parallel. To choose a flash memory, match the write/erase characteristics of the memory to the application.
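The three selection questions under Program and Erase Performance reduce to simple arithmetic. The sketch below is one hedged way to work them out, using the NOR example figures from the text (5 µs to program a byte, 1 second to erase 64 Kbytes). It assumes bytes are programmed one at a time and that each block is erased before it is reprogrammed; the function names and the 10,000 bytes/s application rate used in the usage note are hypothetical.

```c
/* Sustained write rate in bytes/s when every block is first erased and
 * then programmed one byte at a time (assumption of this sketch). */
double sustained_write_rate(double block_bytes, double erase_s,
                            double program_s_per_byte)
{
    return block_bytes / (erase_s + block_bytes * program_s_per_byte);
}

/* Bytes that arrive while the flash is busy for busy_s seconds. */
double buffer_bytes_needed(double incoming_bytes_per_s, double busy_s)
{
    return incoming_bytes_per_s * busy_s;
}

/* Returns 1 if the device rate meets the application's required rate. */
int device_keeps_up(double required_rate, double device_rate)
{
    return device_rate >= required_rate;
}
```

With the example figures, `sustained_write_rate(65536.0, 1.0, 5e-6)` comes to roughly 49,000 bytes/s, because a full block costs 1 s of erase plus about 0.33 s of programming. An application delivering 10,000 bytes/s would fit within that rate, but it would still need about 10,000 bytes of buffer to ride through each 1-second erase.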
COMMON FLASH INTERFACE

The Common Flash Interface (CFI) is a new way to find out what kind of flash memory is in a socket. It will work even with flash components that are designed and produced after the system software has been written.

JEDEC IDENTIFIERS

In the past, a flash memory or an EPROM had a mode where a JEDEC Identifier could be read from the device. This was originally intended for, and used by, EPROM programmers. When system designers decided to make their flash sockets accept different devices, they used these identifiers, because they were the only way to tell what program and erase algorithm to use.

This technique had two flaws: there was no standard way to read the identifier, and there was no way to anticipate what the identifiers would be for future devices.

To read the identifier from an Intel device, the system must first write 0x90 to it, then read the identifier from locations 0x00001 and 0x00002. To accomplish the same thing with a device from another vendor, the system must write 0x55 to location 0x0000, followed by 0xAA to location 0x0000. The identifier can then be read from locations 0x0000 and 0x0001.

To find out what kind of memory is in the socket, the program attempts to read an identifier using one algorithm, and if it fails to return the expected result, it tries a different one. This is workable but not optimal.

The second issue is that the identifier must, by definition, be unique. When a new device is introduced, it will have a new, unique identifier, so all software written prior to the release will reject the new device. This is unnecessary, because most manufacturers make new devices compatible with their previous ones.

COMMON FLASH INTERFACE

The Common Flash Interface solves this problem by providing a standard way for a system to interrogate a flash memory, and it defines the format for the information. The information is descriptive. Embedded in the component is a trimmed-down datasheet that contains enough information for the system to program or erase the flash memory, even if the component was designed after the system firmware was frozen in ROM.

CFI ACCESS

A flash memory will enter into the CFI data mode when a value of 0x98 is written to location 0x00055. From then on, all read cycles will return data from the CFI ROM instead of the flash array. Figure 4 shows how this works.

[Figure 4: CFI compliant flash. An interface block switches between the CFI ROM and the flash array.]

The CFI data is stored in a separate ROM located on-chip. The memory locations in the ROM do not detract from the size of the array. This illustration shows a 2 Mbit device that supports CFI. It has a 2 Mbit Flash array and a ROM with about 50 locations. When 0x98 is written to 0x0055, the interface is switched over to the ROM. To access the Flash again, write 0xFF to location 0x0055.
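The CFI access sequence can be sketched against a small software model of a device. The query command (0x98 written to location 0x55) and the return command (0xFF) come from the text; the minimum-VCC location 0x1B and its BCD encoding are the example given in the CFI DATA section. The `cfi_flash` structure, the `dev_*` accessors, the 64-location ROM and the 128-location array are hypothetical stand-ins for real bus accesses.

```c
#include <stddef.h>
#include <stdint.h>

#define CFI_QUERY_ADDR   0x55u  /* from the text */
#define CMD_CFI_QUERY    0x98u  /* from the text */
#define CMD_READ_ARRAY   0xFFu  /* from the Figure 4 description */
#define CFI_VCC_MIN_ADDR 0x1Bu  /* from the CFI DATA section */
#define CFI_ROM_SIZE     0x40u  /* toy size; the text says "about 50 locations" */
#define ARRAY_SIZE       0x80u  /* toy array size for the model */

/* Hypothetical model of a CFI device: an array, a ROM and a mode switch. */
struct cfi_flash {
    uint8_t array[ARRAY_SIZE];
    uint8_t cfi_rom[CFI_ROM_SIZE];
    int     cfi_mode;           /* nonzero while reads come from the ROM */
};

void dev_write(struct cfi_flash *d, size_t addr, uint8_t data)
{
    if (data == CMD_CFI_QUERY && addr == CFI_QUERY_ADDR)
        d->cfi_mode = 1;        /* interface switched over to the ROM */
    else if (data == CMD_READ_ARRAY)
        d->cfi_mode = 0;        /* back to the flash array */
}

uint8_t dev_read(const struct cfi_flash *d, size_t addr)
{
    return d->cfi_mode ? d->cfi_rom[addr % CFI_ROM_SIZE]
                       : d->array[addr % ARRAY_SIZE];
}

/* BCD with a decimal point between the nybbles: 0x27 means 2.7 V.
 * The result is returned in tenths of a volt to avoid floating point. */
unsigned bcd_to_tenths(uint8_t b)
{
    return (unsigned)(b >> 4) * 10u + (unsigned)(b & 0x0Fu);
}

/* Enter CFI mode, read the minimum VCC, and return to the array. */
unsigned cfi_read_vcc_min_tenths(struct cfi_flash *d)
{
    uint8_t raw;
    dev_write(d, CFI_QUERY_ADDR, CMD_CFI_QUERY);
    raw = dev_read(d, CFI_VCC_MIN_ADDR);
    dev_write(d, CFI_QUERY_ADDR, CMD_READ_ARRAY);
    return bcd_to_tenths(raw);
}
```

Because the driver reads a description rather than matching an identifier against a table, the same routine would work for a device released after the software was written, provided the device follows the same format.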
CFI DATA

The ROM is separated into two areas, data that applies to all devices and data that is specific to one vendor. The generic area has a predetermined format. Each location stores a specific parameter using a standard encoding.

For example, the minimum VCC is stored in location 0x1B of the CFI ROM. It is in BCD with a decimal point between the two nybbles. Other information like access time, block architecture, typical program and erase times and VPP are also stored in this area.

The format and encoding are listed in the “Common Flash Memory Interface Specification”. This specification was recently adopted by JEDEC and should appear on their website soon. Check www.eia.org/jedec/. Incidentally, the 28F800C3 described earlier supports CFI.

PROGRAMMING ALGORITHMS

One location in the generic area contains a number that identifies what programming algorithm the device supports. If two devices have the same algorithm, they will have the same number in the algorithm location.

Notice the difference between CFI data and the JEDEC ID. The JEDEC ID was unique for each device. Since CFI data is descriptive, two devices with identical programming algorithms will have the same number in the algorithm location.

The table that lists what numbers go with what algorithms is in a separate document, “CFI Publication 100”. Each flash vendor will have a few numbers that correspond to their unique programming algorithms. For example, the boot block device described in the introductory section uses the Intel basic algorithm, algorithm number 3.

There is also a mechanism to support minor variations of the same algorithm. These details are located in the vendor specific area.

SUMMARY

The Common Flash Interface allows a single software driver to work with a variety of flash components from different manufacturers.

MULTI-LEVEL CELL

Multi-Level Cell (MLC) technology stores two bits in a single flash memory transistor, twice the density of existing devices that store one bit. An MLC flash memory with 16 million transistors can store 32 million bits. This is not a mode that can be invoked in any device; the flash memory must have been designed to do this.

[Figure 5: Charge on the Floating Gate. Coding for existing devices: 0, 1. Coding for MLC devices: 00, 01, 10, 11.]

Current devices store data by varying the amount of charge on a Floating Gate. If the gate is neutral, the cell is erased and contains a 1. If charged, the cell is programmed and contains a 0. These components implement two charge levels per cell.

MLC devices also vary the gate charge, but more precisely. They use four levels; each level represents one of four states or two bits. Figure 5 illustrates the difference between MLC flash memories and standard ones.

USING MLC FLASH MEMORIES

MLC devices have the same interface as regular flash memories. The Write State Machine does all the work and improved sense circuitry distinguishes between the four levels to produce the read data. MLC devices behave the same as standard devices because the details are handled on-chip.

SUMMARY

For a given array size and lithography, MLC technology can double the amount of data the array can store. MLC components in chip-sized packages offer unprecedented capacity in a small area. One new MLC device can store 64 Mbits in a package that is less than half the size of a postage stamp.

HIGH SPEED INTERFACES

There is a fundamental difference between logic devices, like CPUs, and flash memories: As lithographies shrink and the operating voltages go down, logic devices speed up; flash slows down.

One solution to this dilemma is to spend die area to increase the raw speed of the flash array, and this is practical for speed increases on the order of 20%. Of course, the higher speed devices will be more expensive. This technique is not practical for significant performance increases; the die size and cost would be too high for high volume applications.

A better solution is to incorporate high speed interfaces and architectures already supported by many CPUs.

PAGE MODE

A page mode flash has the same interface signals as a standard asynchronous flash memory; it functions the same as the standard device except that some of the accesses are faster than others. If a CPU supports page mode, it can take advantage of the faster accesses.

The first access to a page mode flash causes a group of locations, or a page, to be read and latched inside the device. The location that was explicitly accessed is driven onto the interface; the other locations on the page are saved in case the next access is to another location on the same page.

A page is always aligned on boundaries that are integral powers of two. Typical page sizes are four or eight 16-bit words. For example, if a page mode flash memory has an eight word page, the first page will consist of words 0 - 7, the second page, 8 - 15 and so on.

Continuing with the example, if location 29 were accessed (address and control applied just like a standard device), locations 24 - 31 would be read from the array, internally latched, and the data from location 29 driven onto the bus. All this happens in a normal access time, say 100 ns. If the next access is to one of the locations on the same page, the data will be driven onto the bus much sooner, perhaps in 25 ns. The performance of a real-world device, the Intel® 28F160F3, is 90 ns for the first access and 25 ns for subsequent accesses. Its page size is four 16-bit words.

Since most accesses to code memory are sequential, a page mode memory can substantially increase the performance of a system.
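The page arithmetic in the example above, and the benefit for sequential code fetches, can be checked with a short sketch. The 90 ns and 25 ns figures and the four-word page are the 28F160F3 numbers quoted in the text; the function names and the simple model (any access that leaves the currently latched page pays the full first-access time) are assumptions for illustration.

```c
/* Start of the page containing addr; page_words must be a power of two. */
unsigned long page_base(unsigned long addr, unsigned long page_words)
{
    return addr & ~(page_words - 1ul);
}

/* Total time for `count` sequential word reads starting at `start`. */
unsigned long sequential_read_ns(unsigned long start, unsigned long count,
                                 unsigned long page_words,
                                 unsigned long first_ns, unsigned long next_ns)
{
    unsigned long total = 0, open_page = 0;
    int have_page = 0;

    for (unsigned long i = 0; i < count; i++) {
        unsigned long base = page_base(start + i, page_words);
        if (have_page && base == open_page) {
            total += next_ns;       /* hit in the latched page */
        } else {
            total += first_ns;      /* new page read from the array */
            open_page = base;
            have_page = 1;
        }
    }
    return total;
}
```

With an eight-word page, `page_base(29, 8)` gives 24, matching the locations 24 - 31 example. Reading eight sequential words from a four-word-page device at 90 ns and 25 ns takes 2 x 90 + 6 x 25 = 330 ns in this model, against 720 ns if every access took 90 ns.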
SYNCHRONOUS BURST INTERFACE

Unlike page mode, a synchronous burst flash memory has unique signals that are not present in standard memories, most notably, a clock. As the name implies, a synchronous interface uses a clock to time accesses to the flash.

A burst begins when a starting address is latched on a clock edge. After some number of clocks elapse, typically 1 to 4, the data for the requested location is driven onto the bus, again on a clock edge. After some number of clocks elapse, typically 1 or 2, the next location is driven onto the bus, then the next, and so on. The number of clocks for the first and subsequent accesses is programmable and depends on the clock frequency.

Internally, the burst device uses the same access technique as a page mode device: it reads multiple locations at once, latches them and drives them out one by one. It can be a little faster than the page mode device because, for burst accesses, the device does not need to get an address, decide what kind of access to do and drive the right word onto the bus. Since the locations are always read in a pre-determined order, the output circuitry can be streamlined. The order in which locations are accessed during the burst may be programmable.

The ability to program the timing and burst order means that the device must be initialized before burst access can be done. It typically powers up looking like a standard device. Once initialized, both the flash and the CPU must be programmed to begin burst operation.

The 28F160F3 mentioned previously also has a burst mode. Given a clock input of 54 MHz, it can perform the first access in 4 clocks and subsequent ones in one clock. That is almost 5 times faster than a standard asynchronous flash memory.

Flash Memory Techniques

Armed with an understanding of flash memory technology, a designer can improve the manufacturing flow and add value and versatility to a product.

Many systems are data-centric. They input, manipulate or store some form of data. A CNC machine uses programs (data) to create piece parts. A seismograph records the motion of the earth on some media. Even applications that do not operate on data will probably need to store and update some configuration information.

A system that uses linear flash to store the executable code can, with the right software, store this data in the same component that stores the code.

This section discusses how to write and, more importantly, modify data in flash, specifically, in the same flash component as the code.

WRITING DATA TO FLASH

Flash is routinely used to store code, but most applications also need to permanently store data. The data can be configuration information that has a fixed size, a log of some physical process that is updated frequently, or pictures of varying sizes in a camera that are managed with complete flexibility.

The only media that can store both executable code and data is linear flash memory, and new flash drivers store both code and data in a single component. The firmware can be executed from one part of the component, while the data is stored in another. With finer lithographies and MLC technology, that component will have enough storage capacity for all but the most complex systems.

The flash driver that implements this single-chip code + data solution must do two things: manage the media and manage system events.

MEDIA MANAGEMENT

The requirement for a media manager arises from flash memory’s program and erase capability: A single location can be programmed, but an entire block must be erased.

The functionality of the manager can vary with the type of data being stored. If the size of the data matches the block size in the flash, the media manager will be trivial. For example, if the application were a digital camera, and if the size of the pictures were always 64 KB, the same size as the flash block, then when a new picture was taken, a block would be erased and the new data written.

Another example of a simple media manager is for data-logging applications. The storage area can be completely erased and new data programmed as it is acquired.

Few data storage requirements fit flash’s architecture as well as these. A more realistic example would be a media manager that stores configuration information.

Simple Media Manager

Consider an application that uses a small amount of configuration information. It is a fixed size, for this example 1 KB, and is updated periodically. Finally, assume the flash device has 64-KB blocks.

There are no issues with writing the data-set the first time, but what happens when it needs to be modified? Obviously the contents cannot be overwritten without first erasing the block. Must the entire 64-Kbyte block be erased to modify the small data set? No. Rather than erasing the block and rewriting the small data set, a technique can be employed where the old version of the data is marked invalid, and the newest version written into the block. Figure 6 illustrates this. A one-location flag in the data-set will be used to keep track of which one is valid. When a new version is written to the block, its “valid” flag will be all ones; the “valid” flag of the previous one will be programmed to zero. Notice that for our example, a 64 KB block can handle 64 modifications of a 1 KB data set.

From a cycling perspective, if a block can be erased 100K times, it can support 6.4 million modifications of the data set. The data could be modified once each minute for over 10 years and not exceed the cycling capabilities of the flash. Incidentally, most flash memories can program 1K locations within 20 ms.

[Figure 6: A flash block holding several old copies of the data set, followed by the current copy, followed by unused space.]

This kind of simple media manager is used in countless embedded systems. However, it is inadequate for applications that have variable length data-sets, especially where the data size is not known at build time.
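A minimal sketch of the simple media manager of Figure 6 follows, run against a RAM model of one 64 KB block. The “valid” flag behaves as described in the text (all ones on the newest copy, programmed to zero on the previous one). Several details are assumptions added for illustration: the second `OFF_USED` marker byte that distinguishes a written slot from unused space, the placement of both flag bytes inside the 1 KB data set, and all function and variable names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   (64u * 1024u)
#define SLOT_SIZE    1024u                     /* one copy of the data set */
#define SLOT_COUNT   (BLOCK_SIZE / SLOT_SIZE)  /* 64 copies per erase */
#define OFF_VALID    0u  /* 0xFF = valid copy, 0x00 = superseded (text) */
#define OFF_USED     1u  /* hypothetical: 0xFF = unused slot, 0x00 = written */
#define OFF_PAYLOAD  2u
#define PAYLOAD_SIZE (SLOT_SIZE - OFF_PAYLOAD)

uint8_t  block[BLOCK_SIZE];  /* RAM stand-in for the flash block */
unsigned erase_count;        /* counts block erases (cycling) */

void flash_erase(void)
{
    memset(block, 0xFF, sizeof block);
    erase_count++;
}

/* Programming can only clear bits, so AND the new data into the cells. */
void flash_program(size_t off, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        block[off + i] &= src[i];
}

/* Index of the current copy, or -1 if the block holds none. */
int cfg_find_current(void)
{
    for (size_t s = 0; s < SLOT_COUNT; s++) {
        const uint8_t *slot = &block[s * SLOT_SIZE];
        if (slot[OFF_USED] == 0xFF)
            return -1;            /* reached unused space */
        if (slot[OFF_VALID] == 0xFF)
            return (int)s;        /* written and not yet superseded */
    }
    return -1;
}

const uint8_t *cfg_read(void)
{
    int cur = cfg_find_current();
    return (cur < 0) ? NULL : &block[(size_t)cur * SLOT_SIZE + OFF_PAYLOAD];
}

/* Write a new version; erase only when no unused slot remains. */
void cfg_write(const uint8_t *payload)
{
    static const uint8_t zero = 0x00;
    int cur = cfg_find_current();
    size_t next = (cur < 0) ? 0 : (size_t)cur + 1;

    if (next == SLOT_COUNT || block[next * SLOT_SIZE + OFF_USED] != 0xFF) {
        flash_erase();            /* block full or never formatted */
        next = 0;
        cur = -1;
    }
    flash_program(next * SLOT_SIZE + OFF_PAYLOAD, payload, PAYLOAD_SIZE);
    flash_program(next * SLOT_SIZE + OFF_USED, &zero, 1);
    if (cur >= 0)                 /* mark the previous copy invalid */
        flash_program((size_t)cur * SLOT_SIZE + OFF_VALID, &zero, 1);
}
```

In this model 64 consecutive calls to `cfg_write` cost a single erase, which is the 64-modifications-per-block figure from the text. The sketch erases the only block before writing the new copy, so a power failure at that moment would lose the data; a production manager would typically alternate between two blocks to avoid that window.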
  • Those applications require a more robust Copy the necessary service routines manager. to the alternate execution memory so they will be available when needed. General Purpose Media Manager Suspend the program operation. This A general purpose media manager is is a new capability for flash and is usually required if there are a number of becoming a common feature in new different data-sets of differing sizes that may devices. If the flash is busy, the LLD be modified. issues a program or erase suspend; and within about 20µs, the flash will The details of implementing a general be available for reading or executing. purpose media manager are beyond the scope of this paper, and it is not necessary to create The last technique is the most general one; many real time operating systems include purpose, because it does not require any support for a random access file system in changes in functionality or redesign of the linear flash. An alternative is to use one from system software. a flash vendor. For example, Intel offers several flash media managers with varying Configuration levels of capability. Data EVENT MANAGEMENT Files It has been mentioned several times that the typical flash component goes busy when it is programming or erasing. That is not an issue if the data storage is in a separate component Executable as the code, but what if the executable code Code and data are in the same chip? The standard solution is to copy a Boot Code small low level driver (LLD) to some alternate execution memory, and run from it. That LLD Single Flash has just enough code to program a location or erase a block, return the flash to Read-Array Component mode when the operation is finished and jump Figure 7 back to the firmware in flash. What happens if a real time event The event manager is part of the low occurs while the flash is busy and the system level driver that typically resides in RAM. 
cannot access the appropriate service Prior to beginning a program or erase routines? There are several techniques operation, it disables all interrupts in the system. Once the program or erase operation Postpone the programming until there is begun, the manager polls for either a ready is no chance a real time event will status from the flash or an interrupt. If the occur. flash is ready, it enables interrupts. If an interrupt occurs, the event manager suspends the operation, puts the flash in Read-Array
  • mode and enables interrupts. At this point, the interrupt gets serviced and control is eventually returned to the event manager. It disables the interrupts again, resumes the suspended operation and continues polling.

From a system perspective, the only effect all this has is a small increase, on the order of 20 µs, in the maximum interrupt latency. If the application is not sensitive to this, then the system can benefit from the increased integration. Those applications that cannot tolerate the additional latency can fall back to separate components for code and data.

A COMPLETE SOLUTION

Ideally, the memory map for a system would look like the one shown in Figure 7. A single flash component is divided up into three partitions: executable code, file storage and configuration storage. This implementation supports two kinds of data. Configuration data is accessed without the OS, so its contents can be used to configure the system before the OS loads. Data stored as files are accessed using the OS’s file interface: the application opens, creates, reads and writes files to the flash just like it does to other media.

This solution has several key benefits.

    • It is a single chip solution.
    • The system can execute directly from the code partition. Obviously this solution includes an event manager. This code partition will consist of the boot code and the OS image or images. Programming the boot code is discussed in a later section.
    • From an architectural viewpoint, there is no reason to limit the number of OSs that are stored in the code partition.
    • The configuration data partition will usually be a small one and managed by a simple media manager. Its function is to store basic information about the system like its serial number, what drivers to load or perhaps what OS or version of the OS to execute.
    • The media manager for the File partition must be robust. It must handle variable file sizes, with multiple open files that are shrinking or growing. Depending on the implementation, it may even need to manage the directory structures.
    • There are flash components on the market today that have enough density to easily store all this information in a single chip.

Solutions like this exist today. One example is Intel® Persistent Storage Manager. It supports storing executable code, files and configuration information in systems that use the Windows CE* OS.

COMPARISON OF FLASH AND OTHER MEDIA

Almost every kind of memory has at one time or another been adapted to permanently store data or code. If a technology like RAM is volatile, batteries are used to keep it alive. If an application calls for rugged storage, a special enclosure is added to protect a disk drive. Many of these applications would be better served by flash memory. Conversely, there are applications where flash is not appropriate.
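Looking back at the event manager described under EVENT MANAGEMENT, its control flow can be sketched in C. This is only an illustration under stated assumptions, not the paper's implementation: the `sim_t` structure and every function name are hypothetical, and the flash and the interrupt controller are replaced by a small simulation so that the sketch can run on a host machine. On real hardware, the commented steps would become writes of the device's suspend, resume and Read-Array commands and accesses to the system's interrupt controller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the flash and the interrupt controller. */
typedef struct {
    int  busy_polls_left;  /* polls remaining until the flash reports ready */
    int  irq_at_poll;      /* poll number at which an interrupt arrives, -1 = never */
    int  polls;            /* polls performed so far */
    int  irqs_serviced;    /* interrupts handled while the operation was suspended */
    bool irqs_enabled;
    bool read_array_mode;  /* true when code stored in flash can be read or executed */
} sim_t;

/* Stands in for a service routine that is stored in flash. */
void service_interrupt(sim_t *s)
{
    /* The routine is only reachable if the flash is readable. */
    assert(s->irqs_enabled && s->read_array_mode);
    s->irqs_serviced++;
}

/* Event manager: on a real system this would execute from RAM. */
void event_manager_run(sim_t *s)
{
    assert(s->busy_polls_left >= 0);

    s->irqs_enabled = false;     /* disable all interrupts */
    s->read_array_mode = false;  /* program or erase begins: flash is busy */

    for (;;) {
        if (s->busy_polls_left == 0) {     /* flash reports ready */
            s->read_array_mode = true;     /* return flash to Read-Array mode */
            s->irqs_enabled = true;        /* enable interrupts and finish */
            return;
        }
        if (s->polls == s->irq_at_poll) {  /* an interrupt is pending */
            s->read_array_mode = true;     /* suspend, then Read-Array mode */
            s->irqs_enabled = true;        /* enable interrupts */
            service_interrupt(s);          /* on hardware the CPU vectors here */
            s->irqs_enabled = false;       /* control returns: disable again */
            s->read_array_mode = false;    /* resume the suspended operation */
        }
        s->polls++;
        s->busy_polls_left--;              /* simulated progress of the operation */
    }
}
```

Calling `event_manager_run` on a `sim_t` whose interrupt arrives before the operation completes exercises the suspend and resume path; with `irq_at_poll` set to -1 the manager simply polls until the flash is ready.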
  • Battery Backed-up RAM

Battery backed-up DRAM or SRAM (BBRAM) has the advantage of fast, simple writes with no limitations on the number of times a location can be modified. This makes it ideal for its primary purpose: storing temporary, volatile data. This is one application that flash is not suited for.

Depending on the kind of RAM, there are several issues with using it for persistent storage. Both DRAM and SRAM suffer from intermittent data corruption. For no apparent reason, over a long period of time, something happens to the contents. Pocket organizers typically use BBRAM to store user data. Experienced users of these have learned to back up their data.

SRAM is expensive. Where flash is ½ or 1 transistor per bit, an SRAM bit uses 4 or 6 transistors. This may not be an issue for an application that only stores a few Kbytes, but it will be for one that stores hundreds of Kbytes.

DRAM is low cost, but the power required to keep it refreshed is significant, especially because it must be powered all the time. Consider a 3.3 V DRAM array that requires 500 µA of self refresh current. If two AAA NiCad cells (capacity of 250 mAh) are used to supply the power, that array has a shelf-life of 500 hours, or about 3 weeks.

The final issue with BBRAM is batteries: they add cost and size to a system.

Rotating Media

Rotating media like floppy disks or hard disks enjoy extremely low cost per bit, and they excel at storing large amounts of data. There is no question that a 10-gigabyte flash array would be prohibitively expensive for most applications, while the same-sized hard disk only costs a few hundred dollars. However, hard disks suffer from a high fixed cost, high power, and since they are mechanical systems, poor reliability and ruggedness.

EEPROM

The most common type of EEPROM is a small footprint serial device that stores a few Kbytes. Each location can be individually erased and programmed. They are a convenient way to add a few Kbytes to a system, but they may be unnecessary: a simple program can make a couple of flash blocks behave like an EEPROM.

SUMMARY

High density linear flash memory along with the right software can dramatically increase the functionality of a system with little or no effort on the part of the developer. In some cases, complete software drivers are available that allow applications to store files in flash using standard OS file operations.

*Other brands and marks are the property of their respective owners.

PROGRAMMING BOOT CODE IN-SYSTEM

The boot code resides at the CPU's reset address and performs critical functions to start the system. These functions can be as simple as initializing a few interrupt vectors prior to branching to resident application code, or they can involve something more complex like loading the application code from a disk prior to branching. In a system with flash memory, this boot code may also contain the necessary code for in-system application program updates.

A system cannot program its own boot code, because the boot code is necessary for the CPU to operate. Therefore, a typical production flow involves programming the boot code into the flash memory prior to soldering it
  • to the PCB. A better way would be to solder blank components to the PCB and program the boot code in-system. There are several advantages to doing this: reduced component handling, reduced inventory and improved responsiveness.

There are two conflicting trends in flash packaging: the size is decreasing and the lead count is increasing. The only way to satisfy both is to decrease the lead pitch. An example of this is the TSOP (Thin Small Outline Package); it has a 0.5 mm lead pitch. As components shrink, they become more difficult to handle reliably, and a programming step introduces opportunity for lead damage. New chip-sized packages make a programming operation even more difficult.

Programming and handling equipment is expensive, so many companies elect to have a third party do their flash programming. This adds lead-time to the production process, and if different products have different boot codes, the added lead-time will mean increased inventory. This inventory makes it difficult to respond to unexpected events such as emergency orders.

When a flash memory is programmed, it becomes more application specific and therefore less flexible. The best approach is to postpone the programming until it is necessary, and allow the capability to change the programming. In-system programming of boot code achieves both of these goals.

TECHNIQUES

The basic strategy for including this capability in a system is opportunistic: one looks at what one has to work with, then chooses the tactic that minimizes system cost and manufacturing infrastructure penalties. Most of these techniques will fall into two main categories: those that make use of the CPU or interface logic to operate on the flash, and those that simply isolate the flash from everything else, so it can be manipulated.

JTAG Boundary Scan

If the CPU chip includes a JTAG interface, that interface can be used to manipulate the CPU's pins and hence the bus.

JTAG is a standard interface and protocol consisting of 4 or 5 signals that can shift commands and data into a device. The basic protocol is to shift in a command, which will usually be 4 to 8 bits long, then shift in the data for the command. The JTAG interface is static, so there is no maximum time between bits or edges.

All JTAG implementations will have the capability to do boundary scan. The boundary scan register is a shift register that is connected to all the I/O pins in the device. The state of all input pins can be latched into this register then shifted out, or the levels to be output can be shifted in, then driven onto the pins. All this can be done with the CPU in a suspended state.

The result of this is that bus cycles, albeit slow ones, can be generated by shifting in the correct levels for address, control and data pins; driving them onto the bus; latching any inputs and shifting them out. The entire boundary scan register must be shifted in or out to write or read a pin, but more than one pin can be read or written in a single shift operation.

For a more complete discussion of this technique, refer to "Designing for On-Board Programming Using the IEEE 1149.1 (JTAG) Access Port", Intel Application Note AP-630. It is available from www.intel.com/design/flcomp/applnots
  • and includes sources for commercially available JTAG programmers.

CPU Debug Interface

Many CPUs have a debug interface. This usually takes the form of a proprietary serial interface and protocol that can access internal CPU registers and the bus. Although included on the same chip, this capability is totally separate from the CPU, so the CPU need not be running in order for this to work. The commands that are useful for programming flash are ones that simply read and write memory; they are used to write commands to and read status from the flash.

Contact the CPU vendor for information on the debug interface and any tools that support it.

Isolating The Flash

If there is no JTAG or debug interface on the CPU, and if there is little or no interface logic to modify, then a third approach is to get the CPU completely off the bus so it will not interfere with the programming operation.

Many CPUs support some mechanism that relinquishes the bus. This could be a HOLD/HOLD_ACK protocol, or a test mode such as ONCE (ON Circuit Emulation). Either mode could be invoked by signals that are applied by the programming connector. When the programming controller is not connected, these signals could be weakly deasserted.

Once the CPU is not driving the memory bus, the programming controller can take over, and do reads and writes to the flash. Since it is a parallel interface, it can be the highest performance method for programming the boot code. The bus operations should be slowed down to prevent noise and ringing, but even so a location could be programmed in 30 microseconds. All 8K locations could be programmed in 0.25 seconds. This is well within the requirements for high volume manufacturing.

In this technique, all the pins of the flash must be manipulated by an external programmer. The best way to do this is as the last step in testing a board on a bed-of-nails tester. See AP-629, “Simplify Manufacturing by Using Automatic-Test-Equipment for On-Board Programming” at www.intel.com/design/flcomp/applnots for more information on this topic.

Programming the Application Code

Once the boot code is installed, the application programs can be loaded in. Some manufacturing flows involve programming the flash with test and calibration routines first, then when they are no longer needed, the application code is loaded in.

There are two functions that must be included in a system in order for a CPU to program its own application code. There must be some means of inputting the new program: a serial port or a floppy disk are two good examples.

Also, there must be some alternative execution memory that the CPU can execute from while the flash is being programmed. Recall that while the WSM is programming the flash, any reads will return invalid information, and when it is in the Read Status Mode, any reads will return status. Since the flash memory array, and hence the program, is not available, the CPU must run out of some other memory. One of the most common solutions to this is to have the CPU copy a small routine into SRAM, then run from it while the flash is programming.
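The small routine that runs from SRAM can be sketched as follows. This is a minimal host-side illustration, not the paper's code: `sim_flash`, `flash_write`, `flash_read` and `program_word` are hypothetical names, the flash is simulated by a tiny array, and the command and status values (0x40 for program setup, 0xFF for Read-Array, status bit 7 for ready) are assumptions that should be confirmed against the device data sheet. On a target system, `program_word` would be copied to SRAM and would access the flash through memory-mapped reads and writes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed command and status values; confirm against the device data sheet. */
enum {
    CMD_PROGRAM_SETUP = 0x40,
    CMD_READ_ARRAY    = 0xFF,
    SR_READY          = 0x80   /* status bit 7: WSM has finished */
};

typedef enum { MODE_READ_ARRAY, MODE_PROGRAM_SETUP, MODE_READ_STATUS } flash_mode;

#define SIM_WORDS 16u

/* Hypothetical simulated flash: a few words plus a toy write state machine. */
typedef struct {
    uint16_t   cells[SIM_WORDS];
    flash_mode mode;
    int        busy_reads;     /* status reads remaining until the WSM is done */
} sim_flash;

void sim_init(sim_flash *f)
{
    for (unsigned i = 0; i < SIM_WORDS; i++)
        f->cells[i] = 0xFFFF;  /* erased state */
    f->mode = MODE_READ_ARRAY;
    f->busy_reads = 0;
}

void flash_write(sim_flash *f, unsigned addr, uint16_t data)
{
    assert(addr < SIM_WORDS);
    if (f->mode == MODE_PROGRAM_SETUP) {
        f->cells[addr] &= data;        /* programming can only clear bits */
        f->busy_reads = 3;             /* WSM is now busy for a while */
        f->mode = MODE_READ_STATUS;
    } else if (f->busy_reads == 0) {   /* this simulation ignores commands while busy */
        if (data == CMD_PROGRAM_SETUP)
            f->mode = MODE_PROGRAM_SETUP;
        else if (data == CMD_READ_ARRAY)
            f->mode = MODE_READ_ARRAY;
    }
}

uint16_t flash_read(sim_flash *f, unsigned addr)
{
    assert(addr < SIM_WORDS);
    if (f->mode == MODE_READ_ARRAY)
        return f->cells[addr];         /* array data */
    if (f->busy_reads > 0) {
        f->busy_reads--;
        return 0x00;                   /* status: not ready */
    }
    return SR_READY;                   /* status: ready */
}

/* The routine that would execute from SRAM. Returns 0 on success, -1 on mismatch. */
int program_word(sim_flash *f, unsigned addr, uint16_t data)
{
    flash_write(f, addr, CMD_PROGRAM_SETUP);
    flash_write(f, addr, data);
    while ((flash_read(f, addr) & SR_READY) == 0) {
        /* reads return status, not array data, until the WSM finishes */
    }
    flash_write(f, addr, CMD_READ_ARRAY);   /* make the array readable again */
    return flash_read(f, addr) == data ? 0 : -1;
}
```

In this sketch, programming an erased word succeeds, while trying to program 0xFFFF over a word that already holds data returns -1, because programming can only clear bits and setting them again requires an erase.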
  • CONCLUSION

Flash memory’s unique capabilities are tools a designer can use to improve a product and manufacturing flow.