Flash Memory Technology and Techniques
Classes 260 and 426
Embedded Systems Conference 1999
William Grundmann, Intel Corporation
Flash Memory Technology

OVERVIEW

Flash memory has emerged as a mainstream technology for storing firmware in embedded systems. While flash memory can be used like an EPROM (installing a programmed device into a PCB), such an implementation wastes a great deal of functionality. The same flash memory that stores the firmware can store data that is important to an end user. An important step in taking full advantage of flash memory's capabilities is an understanding of how to use flash and a few suggestions on how flash can improve a product.

This paper begins with a brief overview of flash memory technology (how to read, write, program and erase it) and continues by taking a look at how a flash memory cell stores data. This is useful information, because it explains some of the unique operating characteristics of flash. It also includes a discussion of several new developments that will increase the number of applications that can use flash memory.

Two techniques will be described: storing code and data in flash memory and programming the boot code in-system. They improve a product or manufacturing flow without adding substantial cost.

FLASH TECHNOLOGY

This section describes how to use flash memory, how it works and some new developments.

As will be seen later, modifying a flash memory cell is a little more involved than simply writing data to it. Program and erase operations are required to alter array contents, and these operations are done by putting a flash device into the appropriate operating mode.

Operating modes vary somewhat between different manufacturers, and a manufacturer can offer two devices with different modes and commands to reflect different architectures. This discussion will use the modes supported by the 28F800C3, an Intel® Boot Block Flash memory.

These modes make use of the Write State Machine (WSM). The WSM is a controller that is built into a flash device. It looks at all writes to the flash, decodes them and implements the commands.

Read Array

The Read Array is the default mode, and a flash memory enters this mode upon power-up. It stays in this mode until a valid command is received that puts it in a different mode. While in this mode, all read operations return memory contents.
Program

Programming is the process of writing a “0” into a flash location. A single byte or word can be programmed.

Writing the value 0x40 to the memory will put it in the Program mode. While in this mode, any bus reads may return invalid data. The address and data included in the next write cycle will be used to program the array.

[Figure 1: flow chart of a typical programming operation. Write address and data of location to program; put flash into Read Status mode; read status; if not finished, read status again; otherwise return.]

Figure 1 shows a flow chart of a typical programming operation. For a typical location there will be a minimum of 4 bus transactions:

• Program Setup Command
• Program Command
• Read Status Register Command
• Read Array Command

Upon receiving a valid program sequence, the WSM turns on the charge pumps and programs the location. It does this by internally applying program pulses until the location verifies correctly. Once the location is verified, it reflects this successful operation in the status register. Failure of the programming operation sets one or more error flags. Once begun, all this happens without the intervention of the external controller and, for a typical location in a new device, takes about 10 microseconds.

Most flash memories can only do one thing at a time, so while a location is being programmed or erased, the rest of the device is unavailable. This means that a CPU cannot execute code from a flash that is not in the Read Array mode.

Erase

Erase is the process of writing a “1” into an entire device or block. Note that an individual location cannot be erased; the entire block or device must be erased. The typical usage mode for flash is to erase a block, which initializes all locations to 0xFF, and then program individual locations as needed. A bit that is a “1” (erased) can always be programmed to a “0”. A bit that is a “0” must be erased, along with the block it is in, to become a “1”.

Writing 0x20 to the flash begins the Erase mode. If the next write has a data value of 0xD0, then an erase operation will commence. Any read operations done when in the Erase mode may return invalid data or status.

As with programming, the entire erase operation is automatic, and the result is posted to the status register. An erase command will erase the entire block; it takes about 1 second for the WSM to do this. Clearly, the WSM off-loads most of the work from the CPU.
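The program and erase sequences above can be illustrated with a small host-side model. This is only a sketch written for this discussion, not driver code for any real part: the class and function names are hypothetical, the blocks are shrunk to 8 locations, operations complete instantly, and two details are assumptions rather than statements from the text above: that status bit 7 means "WSM ready", and that writing 0xFF returns the device to Read Array mode.

```python
ERASED = 0xFF
BLOCK_SIZE = 8  # hypothetical tiny block; real blocks are kilobytes in size

class FlashModel:
    """Toy model of the command interface (not a real device driver)."""

    def __init__(self, blocks=2):
        self.array = [ERASED] * (blocks * BLOCK_SIZE)
        self.mode = "read_array"   # default mode after power-up
        self.status = 0x80         # assumption: bit 7 set means WSM ready

    def write(self, addr, data):
        """Every bus write is decoded by the modelled Write State Machine."""
        if self.mode == "program_setup":
            self.array[addr] &= data          # programming only clears bits (1 -> 0)
            self.mode = "operation_done"
        elif self.mode == "erase_setup":
            if data == 0xD0:                  # confirm code starts the erase
                start = (addr // BLOCK_SIZE) * BLOCK_SIZE
                self.array[start:start + BLOCK_SIZE] = [ERASED] * BLOCK_SIZE
            self.mode = "operation_done"
        elif data == 0x40:
            self.mode = "program_setup"
        elif data == 0x20:
            self.mode = "erase_setup"
        elif data == 0x70:
            self.mode = "read_status"
        elif data == 0xFF:                    # assumption: Read Array command
            self.mode = "read_array"

    def read(self, addr):
        if self.mode == "read_array":
            return self.array[addr]
        if self.mode == "read_status":
            return self.status
        return None                           # models "may return invalid data"

def wait_until_ready(flash, addr):
    flash.write(addr, 0x70)                   # Read Status Register command
    while not flash.read(addr) & 0x80:        # poll until the WSM reports ready
        pass
    flash.write(addr, 0xFF)                   # back to Read Array mode

def program_byte(flash, addr, value):
    flash.write(addr, 0x40)                   # Program Setup command
    flash.write(addr, value)                  # Program command: address and data
    wait_until_ready(flash, addr)

def erase_block(flash, addr):
    flash.write(addr, 0x20)                   # Erase Setup command
    flash.write(addr, 0xD0)                   # confirm code
    wait_until_ready(flash, addr)

flash = FlashModel()
program_byte(flash, 3, 0xA5)
program_byte(flash, 3, 0x0F)                  # a second program can only clear more bits
print(hex(flash.read(3)))                     # 0x5
erase_block(flash, 3)
print(hex(flash.read(3)))                     # 0xff
```

The second `program_byte` call shows why a location cannot simply be overwritten: the stored value becomes the bitwise AND of the old and new data, and only a block erase brings the zeros back to ones.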
Both program and erase operations post their results in the status register. When the flash receives a 0x70 command, it enters the Read Status mode. In this mode, any read operation returns the contents of the status register.

PROTECTION

Flash devices incorporate some form of protection. The concern is that a processor operating in some erroneous way could mistakenly issue program or erase commands to the flash memory. To prevent this, a flash memory will usually support some combination of two basic types of protection schemes: hardware and software.

A hardware scheme requires that specific voltages be applied to pins on the device before the device can be altered. The voltages can be TTL logic levels or some special voltage like 12 V, and the pins can be dedicated or shared with other functions. An example of hardware protection is the VPP pin on a flash memory; if no voltage is applied, the device will not program or erase.

Software protection mandates that a code sequence be written in order to initiate an operation. Consider the erase command in the preceding discussion. Two codes, 0x20 and 0xD0, were required in order to begin the erase operation. The first can be considered a command, but the second is clearly a code. Some devices require code sequences that are four or six bytes long.

[Figure 2: block map labeled 28F160C3, showing 31 main blocks of 64 KB and a total of 8 parameter blocks of 8 KB.]

Current flash devices are blocked or segmented. All the locations in a block share common erase circuitry, so while only one location can be programmed at a time, an entire block or segment is erased at once.

Normally, the boot code will be located in one or more blocks, the application code placed in several other blocks and some non-volatile data stored in another. Figure 2 shows an address map of an Intel® 28F800C3. Its blocks are sized to fulfill these three functions.

The boot code should be put in a separate block so that data or application code can be erased and reprogrammed without affecting the boot program. For small boot programs, one or more of the 8-Kbyte parameter blocks can be used. For large boot programs, use the main blocks. Two variations of a boot block device are available. One has the parameter blocks at the bottom of the memory map, the other puts them at the top.

SUSPEND FEATURES

As will be seen in the next section, program or erase operations can take tens of microseconds or even seconds. There are many instances where a system will need to read data in one block while a different block is being erased or programmed. Flash memories implement a suspend command that makes this possible.

When the WSM receives a suspend command during an erase or program operation, it saves the state of the operation and enters a mode that allows the array to be accessed. This is like the Read Array mode, except that the WSM is running, so the component may draw a little more current.

Flash memories can be a complete non-volatile memory subsystem. Current devices implement all the pumps and control electronics necessary to program or erase.

FLASH CELL TECHNOLOGY

Flash memories can do more than store the firmware; they can store or log data. Some understanding of how flash memories store data will be useful in matching flash’s programming characteristics to an application.

Flash, EEPROM and EPROM all store data the same way. Figure 3 shows a simplified diagram of one of these cells. It is an N-channel transistor with a second gate, the Floating Gate, sandwiched between the Control Gate and the channel.

The Control Gate (CG) is connected to other circuitry and works like a normal gate would. When sufficient positive voltage, greater than the threshold voltage, is applied to it, the transistor turns on, and a conduction path is established between the Source and the Drain.

The Floating Gate (FG) modulates the effect of the CG. If the Floating Gate has a sufficient negative charge, it will prevent the transistor from turning on when the normal threshold voltage is applied to the Control Gate.

[Figure 3: simplified flash cell showing the Control Gate, the Floating Gate and the channel.]

READING A FLASH CELL

An erased cell has no additional charge on the FG, and the transistor turns on at its characteristic threshold. When the same cell is programmed, it has enough electrons in the FG to prevent it from turning on when the threshold voltage is applied. To read a cell, the external circuitry applies the threshold voltage to the CG and observes if the transistor turns on.

While all architectures use similar principles to store and read data, they do not act the same from a system point of view. NOR permits a single location (word or byte) to be read in approximately 100 ns.

NAND only reads blocks of 512 bytes. There is an initial latency of 10 µs while the block is transferred out of the memory array and into an on-chip SRAM buffer. It can be transferred out of the buffer at a rate of 20 Mbytes/s.
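The difference between the two read styles is easiest to see with the figures just quoted. The sketch below is only arithmetic on those numbers; the function names are hypothetical, the NOR device is assumed to be 16 bits wide, 1 Mbyte is taken as 10^6 bytes, and a NAND read is assumed to always transfer a whole 512-byte block.

```python
def nor_read_time_us(n_bytes, access_ns=100, bus_bytes=2):
    """Random-access reads: each bus-wide location costs one access time."""
    locations = -(-n_bytes // bus_bytes)        # ceiling division
    return locations * access_ns / 1000

def nand_read_time_us(n_bytes, block_bytes=512, latency_us=10, rate_mbytes_s=20):
    """Block reads: a fixed latency per block, then the buffer is streamed out."""
    blocks = -(-n_bytes // block_bytes)
    transfer_us = block_bytes / rate_mbytes_s   # bytes / (10^6 bytes/s) = microseconds
    return blocks * (latency_us + transfer_us)

# Fetching one 16-bit word, as a CPU does when executing code:
print(f"{nor_read_time_us(2):.1f} us vs {nand_read_time_us(2):.1f} us")      # 0.1 us vs 35.6 us
# Fetching a full 512-byte block:
print(f"{nor_read_time_us(512):.1f} us vs {nand_read_time_us(512):.1f} us")  # 25.6 us vs 35.6 us
```

Under these assumptions the two are comparable for bulk transfers, but a single random fetch is several hundred times slower on the block-oriented device.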
The type of reading supported by an architecture determines if the memory can be executed from. NOR, the former example, supports direct execution, or eXecution In Place (XIP); NAND, the latter, does not.

PROGRAMMING OR ERASING

The WSM modifies a flash memory cell by injecting electrons onto the Floating Gate (programming) or by removing them (erase). On today’s devices, a cell is programmed when approximately 50,000 excess electrons are in the FG. They stay in the FG for 10 years or more, because there is no conduction path to the FG; the FG is completely isolated from the rest of the transistor. The only way for electrons to get into it is for them to be driven through the oxide that surrounds the FG. The cell is erased by removing the electrons.

Programming and erasing are accomplished by applying elevated voltages, 10 V to 20 V, to the cell to cause electrons to propagate through the oxide. This has three consequences.

• It takes time to drive the current through the oxide barrier, so program and erase operations take much longer than in an SRAM.

• The oxide barrier can “wear out” after repeated program/erase cycles.

• Elevated voltages are required by the cell.

Program and Erase Performance

If the application will store data in a flash memory, the amount of time it takes to program or erase the device can be an important specification. Also, flash memories typically operate on blocks. To accurately specify a device, it is necessary to understand how long an operation takes and how big a block the operation affects. An operation that takes 1 second may seem slow, but if it erases 128 Kbytes, the throughput is 128 Kbytes/sec.

Depending on the type of flash, it will take between 5 µs and 1 second to program or erase a cell. Cells are usually programmed or erased in parallel, so that a slow operation is done on a lot of cells at once. For example, a NOR architecture device may program a single byte in 5 µs and erase 64 Kbytes in 1 second.

There are three aspects to consider when choosing a flash memory that will be used to store data and will be programmed in-system.

• What rate does the application require?

• What rate does the device support? This can be the average time to program and erase a byte.

• How big a buffer is necessary, if any, to store incoming data while the flash is busy programming or erasing the previous block?

Oxide Wear-Out

When electrons or holes propagate through the oxide, some do not make it all the way; they get trapped. When they do, they alter the behavior of, or wear out, the oxide.

A cell whose oxide has trapped charge does not program or erase as fast as an electrically neutral one. The trapped charges repel the carriers that are trying to propagate through it. For example, if a programming mechanism drives electrons through the oxide, then the oxide regions that sustain the heaviest program current will gain a negative charge. That negative charge repels electrons and reduces the amount of current that flows into the Floating Gate. The result is a slower program or erase operation.

The typical times published in datasheets are for a new cell. Since the amount of performance degradation will vary with the technology and vendor, it is necessary to contact the vendor to estimate what effect cycling will have on their devices.

In general though, the degradation is gradual and occurs over tens of thousands of program/erase cycles.

Applying Elevated Voltages

The original flash memories required that elevated voltages be applied for precise times. New devices can generate and control these voltages internally. It is worth mentioning that the flash transistor requires these elevated voltages, and if they are not applied externally, they are generated on-chip by voltage pumps.

Voltage pumps take up silicon and add cost to the device. The more current the pump can supply, the more bits can be programmed at once. However, that higher performance pump takes up more space. Typical flash memories strike a balance between the two.

Several vendors have a separate pin for a programming voltage, VPP. One benefit of this is that a device may program quicker when a higher voltage is applied to the pin. This is because the device will sense the higher voltage and modify the internal algorithm to program more bits at once.

The flash memory cell intrinsically takes time to program or erase. How much time depends on the method used to force carriers through insulating barriers. For a given method, the rate can be increased by operating on many bits in parallel. To choose a flash memory, match the write/erase characteristics of the memory to the application.
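The three selection questions from the Program and Erase Performance discussion reduce to a little arithmetic. The sketch below works it through with the NOR figures quoted there (5 µs per byte, 1 second per 64-Kbyte erase); the function names are hypothetical, and the sustained-rate formula assumes that every block is erased once and then programmed one byte at a time, which is a simplification.

```python
KB = 1024

def erase_throughput(block_bytes, erase_s):
    """Bytes cleared per second by one block erase."""
    return block_bytes / erase_s

def sustained_write_rate(block_bytes, erase_s, program_byte_s):
    """Average bytes/s when each block is erased, then programmed byte by byte."""
    return block_bytes / (erase_s + block_bytes * program_byte_s)

def buffer_bytes_needed(incoming_bytes_per_s, busy_s):
    """Data that arrives while the flash is busy and has to wait in RAM."""
    return incoming_bytes_per_s * busy_s

print(erase_throughput(128 * KB, 1.0) / KB)        # 128.0 Kbytes/s, as in the text
rate = sustained_write_rate(64 * KB, 1.0, 5e-6)    # NOR example figures
print(round(rate / KB, 1))                         # 48.2 Kbytes/s
print(buffer_bytes_needed(10 * KB, 1.0) / KB)      # 10.0 Kbytes queued during a 1 s erase
```

If the application's required rate is above the sustained figure, no buffer size will help; if it is below, the buffer only has to cover the longest single busy period, here the 1 second erase.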
COMMON FLASH INTERFACE

The Common Flash Interface (CFI) is a new way to find out what kind of flash memory is in a socket. It will work even with flash components that are designed and produced after the system software has been written.

JEDEC IDENTIFIERS

In the past, a flash memory or an EPROM had a mode where a JEDEC Identifier could be read from the device. This was originally intended for, and used by, EPROM programmers. When system designers decided to make their flash sockets accept different devices, they used these identifiers, because they were the only way to tell what program and erase algorithm to use.

This technique had two flaws: there was no standard way to read the identifier, and there was no way to anticipate what the identifiers would be for future devices.

To read the identifier from an Intel device, the system must first write 0x90 to it, then read the identifier from locations 0x00001 and 0x00002. To accomplish the same thing with a device from another vendor, the system must write 0x55 to location 0x0000, followed by 0xAA to location 0x0000. The identifier can then be read from locations 0x0000 and 0x0001.

To find out what kind of memory is in the socket, the program attempts to read an identifier using one algorithm, and if it fails to return the expected result, it tries a different one. This is workable but not optimal.

The second issue is that the identifier must, by definition, be unique. When a new device is introduced, it will have a new, unique identifier, so all software written prior to the release will reject the new device. This is unnecessary, because most manufacturers make new devices compatible with their older ones.

COMMON FLASH INTERFACE

The Common Flash Interface solves this problem by providing a standard way for a system to interrogate a flash memory, and it defines the format for the information. The information is descriptive. Embedded in the component is a trimmed-down datasheet that contains enough information for the system to program or erase the flash memory, even if the component was designed after the system firmware was frozen in ROM.

CFI ACCESS

A flash memory will enter the CFI data mode when a value of 0x98 is written to location 0x00055. From then on, all read cycles will return data from the CFI ROM instead of the flash array. Figure 4 shows how this works.

[Figure 4: CFI compliant flash, showing the CFI ROM alongside the flash array.]

The CFI data is stored in a separate ROM located on-chip. The memory locations in the ROM do not detract from the size of the array. This illustration shows a 2 Mbit device that supports CFI. It has a 2 Mbit Flash array and a ROM with about 50 locations. When 0x98 is written to 0x0055, the interface is switched over to the ROM. To access the Flash again, write 0xFF to location 0x0055.

Figure 4
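The switch between the array and the CFI ROM can be modelled in a few lines. This is a hypothetical sketch, not a driver: the class name, the array contents and the sample ROM value 0x27 are made up for illustration. The decode helper follows the encoding that the CFI DATA discussion in this paper gives for location 0x1B (minimum VCC, two BCD nybbles with the decimal point between them).

```python
CFI_QUERY_ADDR = 0x55
CFI_QUERY_CMD = 0x98
READ_ARRAY_CMD = 0xFF

class CfiFlashModel:
    """Toy model: reads come from the array or from the CFI ROM, depending on mode."""

    def __init__(self, array, cfi_rom):
        self.array = array
        self.cfi_rom = cfi_rom          # dict mapping ROM location -> byte value
        self.cfi_mode = False

    def write(self, addr, data):
        if addr == CFI_QUERY_ADDR and data == CFI_QUERY_CMD:
            self.cfi_mode = True        # reads now return CFI data
        elif data == READ_ARRAY_CMD:
            self.cfi_mode = False       # reads return array contents again

    def read(self, addr):
        if self.cfi_mode:
            return self.cfi_rom.get(addr, 0x00)
        return self.array[addr]

def decode_bcd_volts(byte):
    """Two BCD nybbles with the decimal point between them, e.g. 0x27 -> 2.7."""
    return (byte >> 4) + (byte & 0x0F) / 10

# Hypothetical device: array filled with 0xAB, minimum VCC encoded as 0x27.
flash = CfiFlashModel(array=[0xAB] * 0x100, cfi_rom={0x1B: 0x27})
flash.write(CFI_QUERY_ADDR, CFI_QUERY_CMD)
print(f"min VCC = {decode_bcd_volts(flash.read(0x1B)):.1f} V")   # min VCC = 2.7 V
flash.write(CFI_QUERY_ADDR, READ_ARRAY_CMD)
print(hex(flash.read(0x1B)))                                     # 0xab
```

The point of the model is that the same address returns different data depending on the mode, which is why the ROM costs no array space.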
CFI DATA

The ROM is separated into two areas, data that applies to all devices and data that is specific to one vendor. The generic area has a predetermined format. Each location stores a specific parameter using a standard encoding.

For example, the minimum VCC is stored in location 0x1B of the CFI ROM. It is in BCD with a decimal point between the two nybbles. Other information like access time, block architecture, typical program and erase times and VPP are also stored in this area.

The format and encoding are listed in the “Common Flash Memory Interface Specification”. This specification was recently adopted by JEDEC and should appear on their website soon. Check with www.eia.org/jedec/. Incidentally, the 28F800C3 described earlier supports CFI.

One location in the generic area contains a number that identifies what programming algorithm the device supports. If two devices have the same algorithm, they will have the same number in the algorithm location.

Notice the difference between CFI data and the JEDEC ID. The JEDEC ID was unique for each device. Since CFI data is descriptive, two devices with identical programming algorithms will have the same number in the algorithm location.

The table that lists what numbers go with what algorithms is in a separate document, “CFI Publication 100”. Each flash vendor will have a few numbers that correspond to their unique programming algorithms. For example, the boot block device described in the introductory section uses the Intel basic algorithm.

There is also a mechanism to support minor variations of the same algorithm. These details are located in the vendor specific area.

The Common Flash Interface allows a single software driver to work with a variety of flash components from different manufacturers.

MULTI-LEVEL CELL

Multi-Level Cell (MLC) technology stores two bits in a single flash memory transistor, twice the density of existing devices that store one bit. An MLC flash memory with 16 million transistors can store 32 million bits. This is not a mode that can be invoked in any device; the flash memory must have been designed to do this.

[Figure 5: charge on the Floating Gate and the corresponding bit coding for standard and MLC cells.]

Current devices store data by varying the amount of charge on a Floating Gate. If the gate is neutral, the cell is erased and contains a 1. If charged, the cell is programmed and contains a 0. These components implement two charge levels per cell.

MLC devices also vary the gate charge, but more precisely. They use four levels; each level represents one of four states, or two bits. Figure 5 illustrates the difference between MLC flash memories and standard ones.

USING MLC FLASH MEMORIES

MLC devices have the same interface as regular flash memories. The Write State Machine does all the work, and improved sense circuitry distinguishes between the four levels to produce the read data. MLC devices behave the same as standard devices because the details are handled on-chip.

For a given array size and lithography, MLC technology can double the amount of data the array can store. MLC components in chip-sized packages offer unprecedented capacity in a small area. One new MLC device can store 64 Mbits in a package that is less than half the size of a postage stamp.

HIGH SPEED INTERFACES

There is a fundamental difference between logic devices, like CPUs, and flash memories: as lithographies shrink and the operating voltages go down, logic devices speed up; flash slows down.

One solution to this dilemma is to spend die area to increase the raw speed of the flash array, and this is practical for speed increases on the order of 20%. Of course, the higher speed devices will be more expensive. This technique is not practical for significant performance increases; the die size and cost would be too high for high volume applications.

A better solution is to incorporate high speed interfaces and architectures already supported by many CPUs.

PAGE MODE

A page mode flash has the same interface signals as a standard asynchronous flash memory; it functions the same as the standard device except that some of the accesses are faster than others. If a CPU supports page mode, it can take advantage of the faster accesses.

The first access to a page mode flash causes a group of locations, or a page, to be read and latched inside the device. The location that was explicitly accessed is driven onto the interface; the other locations on the page are saved in case the next access is to another location on the same page.

A page is always aligned on boundaries that are integral powers of two. Typical page sizes are four or eight 16-bit words. For example, if a page mode flash memory has an eight word page, the first page will consist of words 0 - 7, the second page, 8 - 15 and so on.

Continuing with the example, if location 29 were accessed (address and control applied just like a standard device), locations 24 - 31 would be read from the array, internally latched, and the data from location 29 driven onto the bus. All this happens in a normal access time, say 100 ns. If the next access is to one of the locations on the same page, the data will be driven onto the bus much sooner, perhaps in 25 ns. The performance of a real-world device, the Intel® 28F160F3, is 90 ns for the first access and 25 ns for subsequent accesses. Its page size is four 16-bit words.

Since most accesses to code memory are sequential, a page mode memory can substantially increase the performance of a system.
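The benefit of page mode depends on how often consecutive accesses fall on the same page, which a short calculation makes concrete. The function below is a hypothetical sketch: it assumes the device keeps only the most recently read page latched, and its default parameters are the 28F160F3 figures quoted above (four-word page, 90 ns first access, 25 ns within the page).

```python
def page_mode_time_ns(addresses, page_words=4, first_ns=90, in_page_ns=25):
    """Total read time for a sequence of word addresses on a page mode flash."""
    total = 0
    open_page = None                    # page currently latched inside the device
    for addr in addresses:
        page = addr // page_words       # pages are aligned on power-of-two boundaries
        total += in_page_ns if page == open_page else first_ns
        open_page = page
    return total

sequential = page_mode_time_ns(range(16))     # 16 consecutive words of code
standard = 16 * 90                            # same reads without page mode
print(sequential, standard)                   # 660 1440

# The eight-word-page example from the text: location 29, then its neighbour.
print(29 // 8 * 8, page_mode_time_ns([29, 30], page_words=8, first_ns=100))   # 24 125
```

For purely sequential code the model gives a little over a 2x improvement with a four-word page; accesses that jump between pages see no improvement at all.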
SYNCHRONOUS BURST INTERFACE

Unlike page mode, a synchronous burst flash memory has unique signals that are not present in standard memories, most notably, a clock. As the name implies, a synchronous interface uses a clock to time accesses to the flash.

A burst begins when a starting address is latched on a clock edge. After some number of clocks elapse, typically 1 to 4, the data for the requested location is driven onto the bus, again on a clock edge. After some number of clocks elapse, typically 1 or 2, the next location is driven onto the bus, then the next, and so on. The number of clocks for the first and subsequent accesses is programmable and depends on the clock frequency.

Internally, the burst device uses the same access technique as a page mode device: it reads multiple locations at once, latches them and drives them out one by one. It can be a little faster than the page mode device because, for burst accesses, the device does not need to get an address, decide what kind of access to do and drive the right word onto the bus. Since the locations are always read in a pre-determined order, the output circuitry can be streamlined. The order in which locations are accessed during the burst may be programmable.

The ability to program the timing and burst order means that the device must be initialized before burst access can be done. It typically powers up looking like a standard device. Once initialized, both the flash and the CPU must be programmed to begin burst operation.

The 28F160F3 mentioned previously also has a burst mode. Given a clock input of 54 MHz, it can perform the first access in 4 clocks and subsequent ones in one clock. That is almost 5 times faster than a standard asynchronous flash memory.

Flash Memory Techniques

Armed with an understanding of flash memory technology, a designer can improve the manufacturing flow and add value and versatility to a product.

Many systems are data-centric. They input, manipulate or store some form of data. A CNC machine uses programs (data) to create piece parts. A seismograph records the motion of the earth on some media. Even applications that do not operate on data will probably need to store and update some configuration information.

A system that uses linear flash to store the executable code can, with the right software, store this data in the same component that stores the code.

This section discusses how to write and, more importantly, modify data in flash, specifically, in the same flash component as the code.

WRITING DATA TO FLASH

Flash is routinely used to store code, but most applications also need to permanently store data. The data can be configuration information that has a fixed size, a log of some physical process that is updated frequently, or pictures of varying sizes in a camera that are managed with complete flexibility.

The only media that can store both executable code and data is linear flash memory, and new flash drivers store both code and data in a single component. The firmware can be executed from one part of the component, while the data is stored in another. With finer lithographies and MLC technology, that component will have enough storage capacity for all but the most complex systems.

The flash driver that implements this single-chip code + data solution must do two things: manage the media and manage system events.

MEDIA MANAGEMENT

The requirement for a media manager arises from flash memory’s program and erase capability: a single location can be programmed, but an entire block must be erased.

The functionality of the manager can vary with the type of data being stored. If the size of the data matches the block size in the flash, the media manager will be trivial. For example, if the application were a digital camera, and if the size of the pictures were always 64 KB, the same size as the flash block, then when a new picture was taken, a block would be erased and the new data written. Another example of a simple media manager is for data-logging applications. The storage area can be completely erased and new data programmed as it is acquired.

Few data storage requirements fit flash’s architecture as well as these. A more realistic example would be a media manager that stores configuration information.

Simple Media Manager

Consider an application that uses a small amount of configuration information. It is a fixed size, for this example 1 KB, and is updated periodically. Finally, assume the flash device has 64-KB blocks.

There are no issues with writing the data-set the first time, but what happens when it needs to be modified? Obviously the contents cannot be overwritten without first erasing the block. Must the entire 64-Kbyte block be erased to modify the small data set? No. Rather than erasing the block and rewriting the small data set, a technique can be employed where the old version of the data is marked invalid and the newest version is written into the block. Figure 6 illustrates this. A one-location flag in the data-set will be used to keep track of which one is valid. When a new version is written to the block, its “valid” flag will be all ones; the “valid” flag of the previous one will be programmed to zero. Notice that for our example, a 64 KB block can handle 64 modifications of a 1 KB data set.

From a cycling perspective, if a block can be erased 100K times, it can support 6.4 million modifications of the data set. The data could be modified once each minute for over 10 years and not exceed the cycling capabilities of the flash. Incidentally, most flash memories can program 1K locations within 20 ms.

[Figure 6: a flash block holding old copies, the current copy and unused space.]

Figure 6

This kind of simple media manager is used in countless embedded systems. However, it is inadequate for applications that have variable length data-sets, especially where the data size is not known at build time.
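The valid-flag scheme of the simple media manager can be sketched as a host-side model. Everything here is hypothetical and simplified: the class name is invented, the first byte of each 1 KB slot serves as the one-location flag, the index of the current copy is kept in RAM rather than found by scanning flags at boot, and a full block is simply erased in place, which a real manager would avoid because the data would be lost if power failed mid-update. The last function reproduces the cycling arithmetic from the text.

```python
SLOT = 1024                    # fixed data-set size, flag location included
BLOCK = 64 * 1024
VALID, INVALID = 0xFF, 0x00    # erased (all ones) vs programmed to zero

class SimpleMediaManager:
    def __init__(self):
        self.block = bytearray([0xFF] * BLOCK)   # freshly erased block
        self.current = None                      # slot index of the valid copy
        self.erase_count = 0

    def write(self, payload):
        assert len(payload) == SLOT - 1
        nxt = 0 if self.current is None else self.current + 1
        if nxt == BLOCK // SLOT:                 # block is full: erase and start over
            self.block[:] = bytes([0xFF]) * BLOCK
            self.erase_count += 1
            self.current, nxt = None, 0
        base = nxt * SLOT
        # The new copy's flag is left erased, so it reads as all ones (valid).
        self.block[base + 1: base + SLOT] = payload
        if self.current is not None:
            self.block[self.current * SLOT] = INVALID   # retire the old copy
        self.current = nxt

    def read(self):
        base = self.current * SLOT
        return bytes(self.block[base + 1: base + SLOT])

def lifetime_years(block_bytes, record_bytes, erase_cycles, updates_per_day):
    """Years of updates before the block reaches its erase-cycle rating."""
    updates = (block_bytes // record_bytes) * erase_cycles
    return updates / updates_per_day / 365

mgr = SimpleMediaManager()
for version in range(64):
    mgr.write(bytes([version]) * (SLOT - 1))
print(mgr.erase_count, mgr.current)              # 0 63: 64 updates, no erase yet
print(round(lifetime_years(BLOCK, SLOT, 100_000, 24 * 60), 1))   # 12.2 years at one update per minute
```

The fixed slot size is what keeps this manager so small, which is also why it cannot serve applications whose data sets vary in length.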
Those applications require a more robust manager.

General Purpose Media Manager

A general purpose media manager is usually required if there are a number of different data-sets of differing sizes that may be modified.

The details of implementing a general purpose media manager are beyond the scope of this paper, and it is not necessary to create one; many real time operating systems include support for a random access file system in linear flash. An alternative is to use one from a flash vendor. For example, Intel offers several flash media managers with varying levels of capability.

It has been mentioned several times that the typical flash component goes busy when it is programming or erasing. That is not an issue if the data storage is in a separate component from the code, but what if the executable code and data are in the same chip?

The standard solution is to copy a small low level driver (LLD) to some alternate execution memory, and run from it. That LLD has just enough code to program a location or erase a block, return the flash to Read-Array mode when the operation is finished and jump back to the firmware in flash.

What happens if a real time event occurs while the flash is busy and the system cannot access the appropriate service routines? There are several techniques:

• Postpone the programming until there is no chance a real time event will occur.

• Copy the necessary service routines to the alternate execution memory so they will be available when needed.

• Suspend the program operation. This is a new capability for flash and is becoming a common feature in new devices. If the flash is busy, the LLD issues a program or erase suspend, and within about 20 µs, the flash will be available for reading or executing.

The last technique is the most general purpose, because it does not require any changes in functionality or redesign of the system software.

[Figure 7: memory map of a single flash component, showing configuration data, executable code and boot code.]

Figure 7

The event manager is part of the low level driver that typically resides in RAM. Prior to beginning a program or erase operation, it disables all interrupts in the system. Once the program or erase operation is begun, the manager polls for either a ready status from the flash or an interrupt. If the flash is ready, it enables interrupts. If an interrupt occurs, the event manager suspends the operation, puts the flash in Read-Array mode and enables interrupts. At this point, the interrupt gets serviced and control is eventually returned to the event manager. It disables the interrupts again, resumes the suspended operation and continues polling.

From a system perspective, the only effect all this has is a small increase, on the order of 20 µs, in the maximum interrupt latency. If the application is not sensitive to this, then the system can benefit from the increased integration. Those applications that cannot tolerate the additional latency can fall back to separate components for code and data.

A COMPLETE SOLUTION

Ideally, the memory map for a system would look like the one shown in Figure 7. A single flash component is divided up into three partitions: executable code, file storage and configuration storage. This implementation supports two kinds of data. Configuration data is accessed without the OS, so its contents can be used to configure the system before the OS loads. Data stored as files are accessed using the OS’s file interface: the application opens, creates, reads and writes files to the flash just like it does to other media.

This solution has several key benefits.

• It is a single chip solution.

• The system can execute directly from the code partition. Obviously this solution includes an event manager. This code partition will consist of the boot code and the OS image or images. Programming the boot code will be discussed in the next section.

• From an architectural viewpoint, there is no reason to limit the number of OSs that are stored in the code partition.

• The configuration data partition will usually be a small one and managed by a simple media manager. Its function is to store basic information about the system like its serial number, what drivers to load or perhaps what OS or version of the OS to execute.

• The media manager for the File partition must be robust. It must handle variable file sizes, with multiple open files that are shrinking or growing. Depending on the implementation, it may even need to manage the directory structures.

• There are flash components on the market today that have enough density to easily store all this information in a single chip.

Solutions like this exist today. One example is Intel® Persistent Storage Manager. It supports storing executable code, files and configuration information in systems that use the Windows CE* OS.

COMPARISON OF FLASH AND OTHER

Almost every kind of memory has at one time or another been adapted to permanently store data or code. If a technology like RAM is volatile, batteries are used to keep it alive. If an application calls for rugged storage, a special enclosure is added to protect a disk drive. Many of these applications would be better served by flash memory. Conversely, there are applications where flash is not appropriate.
Battery Backed-up RAM

Battery backed-up DRAM or SRAM (BBRAM) has the advantage of fast, simple
writes with no limitations on the number of times a location can be
modified. This makes it ideal for its primary purpose: storing temporary,
volatile data. This is one application that flash is not suited for.

Depending on the kind of RAM, there are several issues with using it for
persistent storage. Both DRAM and SRAM suffer from intermittent data
corruption. For no apparent reason, over a long period of time, something
happens to the contents. Pocket organizers typically use BBRAM to store
user data. Experienced users of these have learned to back up their data.

SRAM is expensive. Where flash is ½ or 1 transistor per bit, an SRAM bit
uses 4 or 6 transistors. This may not be an issue for an application that
only stores a few Kbytes, but it will be for one that stores hundreds of
Kbytes.

DRAM is low cost, but the power required to keep it refreshed is
significant, especially because it must be powered all the time. Consider
a 3.3 V DRAM array that requires 500 µA of self refresh current. If two
AAA NiCad cells (capacity of 250 mAh) are used to supply the power, that
array has a shelf-life of 250 mAh / 0.5 mA = 500 hours, or about 3 weeks.

The final issue with BBRAM is batteries: they add cost and size to a
system.

Rotating Media

Rotating media like floppy disks or hard disks enjoy extremely low cost
per bit, and they excel at storing large amounts of data. There is no
question that a 10-gigabyte flash array would be prohibitively expensive
for most applications, while the same sized hard disk only costs a few
hundred dollars. However, hard disks suffer from a high fixed cost, high
power, and since they are mechanical systems, poor reliability and
ruggedness.

EEPROM

The most common type of EEPROM is a small footprint serial device that
stores a few Kbytes. Each location can be individually erased and
programmed. They are a convenient way to add a few Kbytes to a system,
but they may be unnecessary: a simple program can make a couple of flash
blocks behave like an EEPROM.

SUMMARY

High density linear flash memory along with the right software can
dramatically increase the functionality of a system with little or no
effort on the part of the developer. In some cases, complete software
drivers are available that allow applications to store files in flash
using standard OS file operations.

PROGRAMMING BOOT CODE IN-SYSTEM

The boot code resides at the CPU's reset address and performs critical
functions to start the system. These functions can be as simple as
initializing a few interrupt vectors prior to branching to resident
application code, or they can involve something more complex like loading
the application code from a disk prior to branching. In a system with
flash memory, this boot code may also contain the necessary code for
in-system application program updates.

A system cannot program its own boot code, because the boot code is
necessary for the CPU to operate. Therefore, a typical production flow
involves programming the boot code into the flash memory prior to
soldering it
*Other brands and marks are the property of their respective owners
to the PCB. A better way would be to solder blank components to the PCB
and program the boot code in-system. There are several advantages to
doing this: reduced component handling, reduced inventory and improved
flexibility.

There are two conflicting trends in flash packaging: the size is
decreasing and the lead count is increasing. The only way to satisfy both
is to decrease the lead pitch. An example of this is the TSOP (Thin Small
Outline Package); it has a 0.5 mm lead pitch. As components shrink, they
become more difficult to handle reliably, and a programming step
introduces opportunity for lead damage. New chip-sized packages make a
programming operation even more difficult.

Programming and handling equipment is expensive, so many companies elect
to have a third party do their flash programming. This adds lead-time to
the production process, and if different products have different boot
codes, the added lead-time will mean increased inventory. This inventory
makes it difficult to respond to unexpected events such as emergency
orders.

When a flash memory is programmed, it becomes more application specific,
and therefore, less flexible. The best approach is to postpone the
programming until it is necessary, and allow the capability to change the
programming. In-system programming of boot code achieves both of these
goals.

TECHNIQUES

The basic strategy for including this capability in a system is
opportunistic: one looks at what one has to work with, then chooses the
tactic that minimizes system cost and manufacturing infrastructure
penalties. Most of these techniques will fall into two main categories:
those that make use of the CPU or interface logic to operate on the
flash, and those that simply isolate the flash from everything else, so
it can be manipulated.

JTAG Boundary Scan

If the CPU chip includes a JTAG interface, that interface can be used to
manipulate the CPU's pins and hence the bus.

JTAG is a standard interface and protocol consisting of 4 or 5 signals
that can shift commands and data into a device. The basic protocol is to
shift in a command, which will usually be 4 to 8 bits long, then shift in
the data for the command. The JTAG interface is static, so there is no
maximum time between bits or edges.

All JTAG implementations will have the capability to do boundary scan.
The boundary scan register is a shift register that is connected to all
the I/O pins in the device. The state of all input pins can be latched
into this register then shifted out, or the levels to be output can be
shifted in, then driven onto the pins. All this can be done with the CPU
in a suspended state.

The result of this is that bus cycles, albeit slow ones, can be generated
by shifting in the correct levels for address, control and data pins;
driving them onto the bus; latching any inputs and shifting them out. The
entire boundary scan register must be shifted in or out to write or read
a pin, but more than one pin can be read or written in a single shift
operation.

For a more complete discussion of this technique, refer to "Designing for
On-Board Programming Using the IEEE 1149.1 (JTAG) Access Port", Intel
Application Note AP-630. It is available from
and includes sources for commercially available JTAG programmers.

CPU Debug Interface

Many CPUs have a debug interface. This usually takes the form of a
proprietary serial interface and protocol that can access internal CPU
registers and the bus. Although included on the same chip, this
capability is totally separate from the CPU, so the CPU need not be
running in order for this to work. The commands that are useful for
programming flash are ones that simply read and write memory; they are
used to write commands to and read status from the flash.

Contact the CPU vendor for information on the debug interface and any
tools that support it.

Isolating The Flash

If there is no JTAG or debug interface on the CPU, and if there is little
or no interface logic to modify, then a third approach is to get the CPU
completely off the bus so it will not interfere with the programming
operation.

Many CPUs support some mechanism that relinquishes the bus. This could be
a HOLD/HOLD_ACK protocol, or a test mode such as ONCE (ON Circuit
Emulation). Either mode could be invoked by signals that are applied by
the programming connector. When the programming controller is not
connected, these signals could be weakly deasserted.

Once the CPU is not driving the memory bus, the programming controller
can take over, and do reads and writes to the flash. Since it is a
parallel interface, it can be the highest performance method for
programming the boot code. The bus operations should be slowed down to
prevent noise and ringing, but even so a location could be programmed in
30 microseconds. All 8K locations could be programmed in 0.25 seconds
(8,192 × 30 µs ≈ 0.25 s). This is well within the requirements for high
volume manufacturing.

In this technique, all the pins of the flash must be manipulated by an
external programmer. The best way to do this is as the last step in
testing a board on a bed-of-nails tester. See AP-629, “Simplify
Manufacturing by Using Automatic-Test-Equipment for On-Board
Programming” at

www.intel.com/design/flcomp/applnots

for more information on this topic.

Programming the Application Code

Once the boot code is installed, the application programs can be loaded
in. Some manufacturing flows involve programming the flash with test and
calibration routines first, then when they are no longer needed, the
application code is loaded in.

There are two functions that must be included in a system in order for a
CPU to program its own application code. There must be some means of
inputting the new program: a serial port or a floppy disk are two good
examples.

Also, there must be some alternative execution memory that the CPU can
execute from while the flash is being programmed. Recall that while the
WSM is programming the flash, any reads will return invalid information,
and when it is in the Read Status Mode, any reads will return status.
Since the flash memory array, and hence the program, is not available,
the CPU must run out of some other memory. One of the most common
solutions to this is to have the CPU copy a small routine into SRAM, then
run from it while the flash is programming.
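
The reason the programming routine cannot run from flash can be
illustrated with a small software model. The sketch below is a Python
simulation, not firmware: FlashModel, program_word and update_application
are hypothetical names, and the command codes (0x40 program setup, 0x70
read status, 0xFF read array) and the ready bit (0x80) are assumptions
based on the usual Intel-style command set, so they should be checked
against the data sheet of the device in use. On a real system the
equivalent of program_word would be the small routine copied into SRAM.

```python
# Toy model of a flash device whose Write State Machine (WSM) hides the
# array while it is busy. All names and codes here are illustrative.
READ_ARRAY = 0xFF      # assumed command code: return to Read-Array mode
READ_STATUS = 0x70     # assumed command code: enter Read Status mode
PROGRAM_SETUP = 0x40   # assumed command code: next write carries the data
STATUS_READY = 0x80    # assumed status bit: set when the WSM is idle


class FlashModel:
    def __init__(self, size, busy_reads=3):
        self.array = [0xFFFF] * size  # erased flash reads as all ones
        self.mode = "array"
        self.busy = 0                 # status reads left before WSM is done
        self.busy_reads = busy_reads
        self.pending = False          # True after PROGRAM_SETUP

    def write(self, addr, value):
        if self.pending:
            # Second bus cycle: programming can only clear bits (1 -> 0).
            self.array[addr] &= value
            self.pending = False
            self.busy = self.busy_reads
            self.mode = "status"      # reads now return status, not code
        elif value == PROGRAM_SETUP:
            self.pending = True
        elif value == READ_STATUS:
            self.mode = "status"
        elif value == READ_ARRAY and not self.busy:
            self.mode = "array"

    def read(self, addr):
        if self.mode == "status":
            if self.busy:
                self.busy -= 1
                return 0x00           # WSM still busy
            return STATUS_READY
        return self.array[addr]


def program_word(flash, addr, value):
    """Stand-in for the routine that would execute from SRAM."""
    flash.write(addr, PROGRAM_SETUP)
    flash.write(addr, value)
    polls = 0
    while not (flash.read(addr) & STATUS_READY):
        polls += 1                    # array is unavailable while we wait
    flash.write(addr, READ_ARRAY)     # make the array readable again
    return polls


def update_application(flash, base, image):
    """Program an image word by word, then verify it by reading back."""
    for offset, word in enumerate(image):
        program_word(flash, base + offset, word)
    return [flash.read(base + i) for i in range(len(image))] == list(image)
```

In this model, a read issued between the data write and the final
READ_ARRAY command returns a status value rather than array contents,
which is what an instruction fetch from flash would see at that moment.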
Flash Memory’s unique capabilities are tools a
designer can use to improve a product and