Flash File Systems

Hackers Hut / Linux Kernel

Guido R. Kok

Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente

Abstract

Using flash memory in mass storage devices is an upcoming trend. With high performance, low energy consumption and shock-proof properties, flash memory is an appealing alternative to the large magnetic disk drives. However, flash memory has other properties that warrant special treatment in comparison to magnetic disk drives. Instead of the slow seek times of magnetic hard drives, flash memory has disadvantages of its own that need to be coped with, such as slow erasure times, big erasure blocks and blocks wearing out. Several flash file systems have been developed to deal with these shortcomings. An overview and comparison of flash file systems is given.

1 Introduction

Flash memory is growing to be one of the main means of mass data storage. Compared to the traditional and very common magnetic hard disks, flash memory offers faster access times and better kinetic shock resistance. These two characteristics explain the popularity of flash memory in portable devices, such as PDAs, laptops, mobile phones, digital cameras and digital audio players.

Two types of flash will be considered in this paper: NOR (Negative OR) and NAND (Negative AND) flash. These two types of memory differ in details, but they share the same principles as any flash memory does.

Flash differentiates between write and erase. An erased flash is completely filled with ones, while a flash write will flip bits from 1 to 0. The only way to flip bits back from 0 to 1 is to erase.

A big difference between flash memory and magnetic hard disks is that the erase operations on flash only operate on large blocks. Erases on flash happen in coarse granularities of powers of two, ranging from 32 kB to 256 kB blocks. Writes can occur in much smaller granularities, ranging from individual bits for NOR flash to 256, 512 or 2048 bytes in the case of NAND flash. More details on NOR and NAND flash will be given in Section 2.

Hardware manufacturers try to use flash memory as a magnetic hard disk replacement. Before flash memory can threaten the dominating market position of magnetic hard disks as means of mass storage media, there are several limitations of flash that need to be coped with by the filesystems running on them.

1.1 Flash limitations

1. The lifetime of flash memory is finite. It is measured in the number of write-erase cycles on an erase block before the block begins to fail. Most hardware manufacturers guarantee 100,000 cycles on each erase block in their chips. Magnetic hard disks do not have such a limitation.

2. Flash requires out-of-place updates of data, meaning that a new version of data has to be written to a new block instead of overwriting the old data. Before a specific location can be written to, the target erase block must be erased. If an unclean unmount occurs at this time, data loss will occur, as neither the old nor the new data can be retrieved.

3. Erase blocks are far larger than magnetic hard disk sectors or blocks. One erase block on flash is therefore shared by multiple filesystem blocks. Because of the out-of-place updates, erase blocks get partially obsoleted. Once free space runs low, a technique called Garbage Collection starts to collect valid filesystem blocks to free space. For more details, see Section 3.3.

It can be seen that traditional file systems that were designed for magnetic hard disk usage are not suitable to cope with the flash memory limitations mentioned above. This paper will give an overview of techniques that cope with the above problems and of flash file systems that implement these techniques in their own way.

1.2 Paper layout

In Section 2 a more detailed explanation of NOR and the newer NAND flash will be given, pointing out their differences and limitations. In Section 3 a presentation is given of the general block mapping techniques and garbage collection in flash filesystems. Building on this general treatment of garbage collection and mapping techniques, Section 4 advances on this topic by giving a survey of several flash file systems, including FFS, JFFS, YAFFS, LogFS and UBIFS. Section 5 compares the flash filesystems and concludes the paper.

2 Background

NOR and NAND flash are non-volatile memories that can be electrically erased and reprogrammed. They are successors of EEPROM (Electrically Erasable Programmable Read-Only Memory), which was considered too slow to write to. NOR flash was the first to appear, in 1984, while the first NAND principles were presented in 1987.

2.1 NOR Flash

Read operations on NOR flash are the same as reading from RAM, as NOR flash has external address buses that allow NOR memory to be read bit-by-bit. This direct-access read ability allows NOR flash to be used as eXecute-In-Place (XIP) memory, meaning that programs stored in NOR flash can be executed directly without the need to copy them into RAM first. XIP reduces the need for RAM, but as a disadvantage compression cannot be used.
Unlocking, erasing and writing NOR memory operate on a block-by-block basis, with blocks typically 64, 128 or 256 kB big.
Due to the integration of the external address buses that allow direct access, the density of NOR memory is low compared to NAND flash. In 2001, a 16 MB NOR array would be considered large, while a 128 MB NAND was already available as a single chip.

2.2 NAND Flash

All operations on NAND flash (read, write, unlock, erase) operate on a block-by-block basis. For NAND flash there are two kinds of blocks. Write blocks, also known as pages, are typically 512 to 4096 bytes in size. Associated with each page are some Out-Of-Band (OOB) bytes, which are used to store ECC codes and other header information. The other type of NAND block is the erase block.
Typical erase block sizes vary between 32 pages of 512 bytes each for an erase block size of 16 kB, up to 128 pages of 4096 bytes each for a total erase block size of 512 kB. Programming and reading can be performed on a page basis, while erasing can only be performed on erase blocks. Pages are buffered before a write operation is executed, because each erase block can be written only up to four times before information starts to leak and the block must be erased.

NAND memory has more capacity and lower costs for two reasons:

• The external buses of NOR flash have been removed, placing the memory cells in series rather than parallel to the bit lanes. This saves space, enhancing the density of memory cells, but at the cost of losing direct access.

• The memory cells inside NAND are not guaranteed error-free when shipped. Bad block management needs to be implemented in the file systems in order to handle bad blocks. Because the manufacturers dropped the requirement that each cell must be error-free (except the first physical block), far higher yields can be achieved, dropping the manufacturing costs.
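The write/erase asymmetry described above can be captured in a few lines. The sketch below is an illustrative toy model of our own, not code from any flash driver: erase sets every bit of a block to 1, while a write can only clear bits to 0.

```python
# Toy model of flash semantics: erased = all ones, writes only clear bits,
# and only a whole-block erase (which wears the block) restores ones.

ERASE_BLOCK_BITS = 16  # toy size; real erase blocks are 32-256 kB

class EraseBlock:
    def __init__(self):
        self.bits = [1] * ERASE_BLOCK_BITS   # erased state: all ones
        self.erase_count = 0                 # wear tracking

    def write(self, offset, value_bits):
        """Program bits: AND semantics, a stored 0 can never become 1."""
        for i, b in enumerate(value_bits):
            self.bits[offset + i] &= b       # 1->0 allowed, 0->1 impossible

    def erase(self):
        """The only way back to all ones; wears out the block."""
        self.bits = [1] * ERASE_BLOCK_BITS
        self.erase_count += 1

blk = EraseBlock()
blk.write(0, [0, 1, 0, 1])      # program 4 bits
blk.write(0, [1, 1, 1, 1])      # try to restore 1s without erasing: no effect
assert blk.bits[:4] == [0, 1, 0, 1]
blk.erase()
assert blk.bits[:4] == [1, 1, 1, 1]
```

The failed second write illustrates why every in-place update on flash must go through an expensive block-wide erase.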
3 Block mapping techniques

The earliest approach to using flash memory is to treat it the same way older filesystems do, like FAT. FAT treats flash memory as a block device that allows data blocks to be read and written. This is the approach typically used on magnetic hard disks. However, this linear mapping of blocks onto flash addresses gives rise to several problems.
First of all, by rewriting new data to the old location, frequently used data translates to a block that is used frequently. This is no problem on magnetic hard disks, but as mentioned in Section 1.1, flash memory blocks wear out, causing failure of the memory block if it has been written to too often.
Secondly, data blocks can be a lot smaller than the physical erase blocks. Writing a small data block to the old location means that a big erase block has to be read into RAM, the appropriate data block is overwritten, the erase block in the flash is erased and the whole erase block in RAM is written back into flash. Clearly this approach is very inefficient.
Finally, if an unclean unmount, such as a power outage, occurs during the above operation, data is lost not only in the form of the small data block, but the total contents of the erase block can be lost as well.
The above problems are solved with more sophisticated block-to-flash mappings and wear-leveling.

3.1 Block-mapping in flash

Instead of using a direct mapping of blocks onto physical addresses, the basic idea of dynamic mapping was developed. The idea is that blocks presented by the higher-layer functions, such as an operating system, are identified by a virtual block number. This virtual block number is mapped to a physical flash address called a sector (some authors use a different terminology [GT05]). This mapping is stored in RAM and can be updated, giving a virtual block a new physical location. This idea is commonly used in wear leveling techniques as follows. When a virtual block is written to flash, it is not written to the old location, but to a new location, and the virtual-block-to-sector mapping is updated with the new physical location of the block. This dynamic mapping approach serves multiple purposes:

1. Frequently updated blocks are written to different sectors each time they are modified, evening out the wear of different erase blocks

2. An updated block is written to a new location without the need to erase and rewrite an entire erase block

3. When power is lost during a write operation, the dynamic mapping makes sure that the write is an atomic operation, meaning that no data is lost

The atomicity mentioned in item 3 is achieved by information stored in the header associated with each data sector. When a block needs to be written to flash, the software searches for a free and erased sector. In this sector and its associated header, all bits are set to 1. To achieve atomicity, three bits in the header are used. Before the block is written, the used bit is cleared (made 0) to indicate that the sector is no longer free. Then the virtual block number is written to the header and the new data is written to the sector. The used bit can be combined with the virtual block number, under the requirement that a virtual block number consisting of all ones is not a valid block number.
Once the data has been written, the so-called valid bit is cleared to indicate that the data in the sector is ready to be read. The last bit, called the obsolete bit, is cleared in the header of the old sector once that sector no longer contains the latest version of the virtual block.
When power is lost during a write operation, the system can be in one of two inconsistent states. The first occurs when the power is lost before the valid bit is cleared. In this case, the sector contains data that is not valid, and when the flash is used again, this sector is marked obsolete to ready it for erasure.
The second inconsistent state occurs after the new sector is marked valid, but before the old sector is marked obsolete. In this case there are two valid data sectors for the same virtual block and the system can choose which one to use. In case it is important to pick the most recent version, a two-bit version number can be inserted in the sector header, giving a version number with 4 different values, where version number 0 is more recent than 3.
For more information on these header fields the reader is referred to [GT05] and [Man02].
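The ordering of the three header bits described above can be sketched as follows. The class and function names are our own and the structure is much simplified relative to a real driver; only the clearing order of used, valid and obsolete matters.

```python
# Sketch of an atomic out-of-place update using the used/valid/obsolete
# header bits. In flash, a bit is "set" at 1 and cleared (to 0) to record
# an event, so a freshly erased sector has all three bits at 1.

class Sector:
    def __init__(self):
        self.used = 1      # cleared first: sector claimed
        self.valid = 1     # cleared once data is complete and readable
        self.obsolete = 1  # cleared on the *old* sector once superseded
        self.vbn = None    # virtual block number
        self.data = None

def write_block(mapping, sectors, vbn, data):
    """Atomic out-of-place update of virtual block `vbn`."""
    new = next(s for s in sectors if s.used == 1)  # find a free, erased sector
    new.used = 0                   # step 1: claim the sector
    new.vbn, new.data = vbn, data  # step 2: write header + data
    new.valid = 0                  # step 3: data is now readable
    old = mapping.get(vbn)
    if old is not None:
        old.obsolete = 0           # step 4: retire the old copy
    mapping[vbn] = new             # update the RAM direct map

sectors = [Sector() for _ in range(4)]
mapping = {}
write_block(mapping, sectors, vbn=7, data=b"v1")
write_block(mapping, sectors, vbn=7, data=b"v2")
# exactly one non-obsolete valid copy of block 7 survives:
live = [s for s in sectors if s.vbn == 7 and s.valid == 0 and s.obsolete == 1]
assert len(live) == 1
```

A crash between steps 3 and 4 leaves two valid copies, which is exactly the second inconsistent state discussed above.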
3.2 Block mapping data structures

In flash memory, mappings cannot be stored in a fixed sector due to the wear-leveling. In order to find the sector that contains a given block, two approaches have been developed. Direct maps are maps containing the current location of a given block, while inverse maps store the identity of a block given a sector. In other words, direct maps allow efficient mapping of blocks to sectors, while inverse maps do the inverse by mapping sectors to blocks.
Inverse maps are stored in the flash itself. The virtual block number in the header of a sector points out to which block the sector belongs. The virtual block numbers can also be stored in a sector full of block numbers, as long as they are stored within the same erase unit, so that on erase all data and associated block numbers are erased together. The main purpose of these inverse maps is to reconstruct the direct map upon device initialization, such as a mount.
Direct maps are at least partially, if not totally, stored in RAM, which supports fast lookups. When a block is updated and written to a new location, its mapping is updated in RAM. Updating the mapping in flash would not be possible because flash does not support in-place modification.
To summarize, inverse maps ensure that sectors can be linked to the blocks they contain, while direct maps allow the system to make fast lookups of the physical whereabouts of blocks, as the map is stored in RAM. Direct and inverse maps, as well as the atomicity property, are illustrated in Figure 1.

3.3 Garbage Collection

Garbage collection for flash filesystems has its basis in the principle of "segment cleaning," designed by Rosenblum et al. [RO92]. Data that is no longer needed is not deleted but obsoleted. Obsolete data is still in the flash and occupies space. Obsolete data cannot be deleted at once, as there may be valid data remaining in the same erase block. The implementation, efficiency and activation of the Garbage Collection depend on which file system is used, but it can generally be described in four stages:

1. One or more erase blocks are selected for garbage collection.

2. The valid sectors in each erase block are copied to free sectors in newly allocated erased blocks.

3. The data structures used in the mapping process are updated to reflect the new location of the valid sectors.

4. The erase block is erased and its sectors are added to a free-sector pool. This step might include writing an erase-block header to specify details such as an erase counter.

The choice of which erase blocks to reclaim and where to move the valid sectors to affects the file system in three ways. The first is the efficiency of the garbage collection, measured in how many obsolete sectors are reclaimed in each erased block. The more obsolete sectors in comparison to valid sectors, the higher the efficiency.
The second effect is wear leveling, where the choice of the target free block to copy the valid sectors to is influenced by how many times that block has already been used.
The last effect is the way these choices affect the mapping data structures, as some garbage collections only require one simple update of a direct map, while other systems may require complex updates.
Wear leveling and garbage collection efficiency often make contradictory demands. The best example is an erase block filled with static data, data that is never or rarely updated. In terms of efficiency, it would be a terrible mistake to select this block for garbage collection, as there are no obsolete sectors that would be freed. However, garbage collecting this block has its use in terms of wear leveling, as such a static block would have very low wear. Moving the static data to a block that already has high wear reduces the future wear on that block, while making the low-wear erase block available for dynamic data.

4 File Systems

This section will give an overview of file systems used on flash memory. The first approach is to use a Flash Translation Layer and run a normal filesystem on top of that, such as FAT. The combination of FTL and FAT is used a lot in
Figure 1: Block mapping in a flash device. The gray array on the right is the direct map, which resides in RAM. Each sector contains a header and data. The header contains the virtual block number, an erase counter, a valid and an obsolete bit, as well as an ECC code for error checking and a version number. The virtual block numbers in used sectors constitute the inverse map, from which the direct map can be constructed. The erase counter is used in wear-leveling, while the valid and obsolete bits and the version number support the atomicity and consistency of write operations. The ECC code supports the detection of errors in failing blocks. Courtesy of Gal et al. [GT05]
removable flash devices, as portability is a main requirement. After FTL, a background will be given on the principles dedicated flash file systems work on: journaling and logging. Then several flash filesystems will be treated: old file systems such as Microsoft's FFS, currently used filesystems such as JFFS and YAFFS, and promising filesystems still in development, such as LogFS.

4.1 Flash Translation Layer

The first approach for file systems to be used on flash memory is to emulate a virtual block device, which can be used by a regular filesystem such as FAT. A Flash Translation Layer (FTL) provides this functionality and takes care of the drawbacks mentioned in Section 1.1.
A write of a block to the virtual block device handled by the FTL causes the FTL to do three things:

• The content of the data block is written to flash

• The location of the old data is marked obsolete

• Garbage collection may be activated to free up space for later use

• Optionally, depending on the concrete FTL used, more writes to flash may be necessary to update block-to-sector mappings, increase erase and free-block counters and so on

An FTL keeps track of the current location of each sector in the emulated block device, which makes it a sort of journaling file system. Journaling file systems will be treated in the next section. The idea of the FTL stems from a patent owned by A. Ban [Ban95], and this patent was adopted as a PCMCIA standard in 1998 [Cor98b]. FTL was created for NOR memory only, but in 1999 a NAND version of FTL was developed, called NFTL [Ban99]. NFTL is incorporated in the DiskOnChip devices.
Because the use of an extra layer between a regular file system and the flash device is inefficient, scientists started to work on filesystems specifically made for flash memory. The fact that the use of FTL and NFTL was heavily restricted by all the patents involved fueled the need for flash-specific filesystems.
4.2 Background: Journaling and Log-structured file systems

Most of the flash-specific file systems are based on the principle of log-structured file systems. This principle is the successor of journaling file systems, which we will explain first.

In journaling file systems, modifications of metadata are stored in a journal before the modifications are made to the data block itself. When a crash has occurred, the fixing-up process examines the tail of the journal and rolls back or completes each metadata operation, depending on the moment at which the crash occurred. Journaling is used a lot in current file systems for magnetic hard disks, such as ext3 [Rob] and ReiserFS [Mas] on Linux systems.

The principle of log-structured filesystems was designed to be used on magnetic disks, on which it is not currently used much. However, the idea is very useful in the context of flash memory. Log-structured filesystems are a rather extreme version of journaling filesystems, in the sense that the journal/log is the filesystem. The disk is organized as one long log, which consists of fixed-size segments of contiguous disk fragments, chained together as a linked list.
When data and metadata are written, they are appended to the tail of the log, never rewritten in-place. When (meta)data is appended to the end of the log, two problems arise: how to find new data and how garbage collection can work properly. In order to find new data, pointers to that data must be updated, and these new pointers are normally also appended to the end of the log. This recursive updating of data and pointers can lead to a snowball effect, so Rosenblum et al. [RO92] came up with the idea of implementing inodes in log-structured filesystems.
Inodes are data structures containing file attributes, such as type, owner and permissions, as well as the physical addresses of the first ten blocks of the file. If the file data consists of more than 10 blocks, the inode will point to indirect blocks, which in turn point to data blocks or lower-layered indirect blocks, see Figure 2. The inode-to-physical-location mapping of files is stored in a table, kept in RAM. This table is periodically flushed to the log. When a crash occurs the table needs to be reconstructed, so the filesystem searches for the latest entry of this table in the log and scans the remaining part of the log to find files whose position changed after the table was flushed.
The main advantage of log-structured file systems is the favorable write speed: as writes occur at the end of the log, there are rarely seek and rotational delays. The downside of log-structured filesystems is the read performance. Reading can be very slow, as the blocks of a file may be scattered around, especially if the blocks were modified at different times. In the case of magnetic hard disks this results in a lot of seek and rotational delays. This downside is the main reason log-structured filesystems are rarely used on magnetic hard disks.
However, log-structured filesystems are an excellent choice for flash devices, as old data cannot be overwritten, so a new version must be written to a new location anyway. Furthermore, read performance is not hurt on flash devices, as flash has uniformly low random-access times. Kawaguchi, Nishioka and Motoda were the first to point out that log-structured filesystems would be very suitable for flash memory [KNM95].

4.3 Microsoft's Flash File System

In the mid 1990s Microsoft developed a filesystem for removable flash memories, called FFS2. Documentation of the supposedly earlier version FFS1 cannot be found.
The first patent used in the development of FFS2 [BL95] describes a system for NOR flash that consists of one large erase unit. This results in a write-once device, with the exception that bits that are not cleared yet can be cleared later. FFS2 uses linked lists to keep track of the files and their attributes and data. When a file is extended, a new record is appended to the end of the linked list, followed by clearing the next field of the last record of the current list (which was all ones before).
As can be seen in Figure 3, each record consists of 4 fields: a raw data pointer which points to the start of a data block, a data size field stating the length of the data used, a replacement pointer to be used in updates of data, and a next pointer to be used in appending data to the file.
Updates within the data of a file are a more difficult problem. Because records point to raw data and data is written once, the replacement pointer is used to indicate that the
Figure 2: An inode in the top of the ﬁgure can point directly to data at level 0, or via indirect blocks which support
fragmented and/or big data ﬁles. Courtesy of Engel et al. [EBM07]
Figure 3: The data structure of the Microsoft Flash File System. The data structure shows a linked-list element
pointing to a block of 20 raw bytes in a ﬁle. Courtesy of Gal et al. [GT05]
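The four-field record of Figure 3 can be sketched as below. The field names and the traversal helper are our reconstruction for illustration, not Microsoft's actual on-flash layout.

```python
# Sketch of an FFS2-style linked-list record: raw data pointer, data size,
# replacement pointer (set when superseded) and next pointer.

class Record:
    def __init__(self, data_ptr, data_size):
        self.data_ptr = data_ptr        # start of the raw data in flash
        self.data_size = data_size      # length of the data used
        self.replacement = None         # set once this record is invalidated
        self.next = None                # next record of the same file

def read_file(first_record, flash):
    """Traverse the list, following replacement chains past invalid records."""
    out, rec = b"", first_record
    while rec is not None:
        while rec.replacement is not None:   # skip superseded records
            rec = rec.replacement
        out += flash[rec.data_ptr:rec.data_ptr + rec.data_size]
        rec = rec.next
    return out

# A 20-byte file whose first 5 bytes are updated once ("xyzvw" appended
# to the raw data area):
flash = b"ABCDEFGHIJKLMNOPQRST" + b"xyzvw"
orig = Record(data_ptr=0, data_size=20)
r_new = Record(data_ptr=20, data_size=5)    # the updated first 5 bytes
r_rest = Record(data_ptr=5, data_size=15)   # still-valid tail of the old data
r_new.next = r_rest
orig.replacement = r_new                    # original record is now invalid
assert read_file(orig, flash) == b"xyzvw" + b"FGHIJKLMNOPQRST"
```

Each further update lengthens the replacement chains, which is exactly the traversal cost criticized in the text below.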
data pointer and next pointer are no longer valid. The replacement pointer points to a new record that uses a part of the old data, while a second record points to the updated data. Figure 4 shows this lengthy and cumbersome approach.
As can be seen, a big drawback of FFS2 is that in the case of dynamic data which changes frequently, a long linked list has to be traversed in order to access all the data. Suppose we have a file with the first 5 bytes of data being updated 10 times. When we then try to access the file, a chain of 10 invalid records needs to be traversed before the record is reached that points to the most recent data. This drawback is caused by the design decision to try to keep the objects at static addresses, meaning that each file starts at the same physical address, no matter which version it is. This design makes it easy to find things in the filesystem, but requires long and inefficient traversal of invalid data chains to find current data.
The log-structured approach makes it more difficult to locate objects, as they are moved around on updates, but once found, sectors pointing to invalid data do not need to be traversed.

Douglis et al. report very poor write performance
Figure 4: Updating in FFS2. The data structure is modiﬁed to accommodate an update of 5 of the 20 bytes. The data
and next-in-list pointers of the original node are invalidated. The replacement pointer, which was originally free (all
1s; marked in gray in the ﬁgure), is set to point to a chain of 3 new nodes, two of which point to still-valid data within
the existing block, and one of which points to a new block of raw data. The last node in the new chain points back to
the tail of the original list. Courtesy of Gal et al. [GT05]
for FFS2 in 1994 [DCK+94], which is presumed to be the main reason why it failed and was reported as obsolete by Intel in 1998 [Cor98a].

4.4 JFFS

The Journaling Flash File System (JFFS1) was developed by Axis Communications AB [AC04], designed to be used in Linux embedded systems. JFFS1 was designed to be used with NOR flash memory only. JFFS1 is a purely log-structured filesystem. Nodes containing metadata and possibly data are stored on the NOR flash in a circular log fashion. In JFFS1 there is only one type of node, the struct jffs raw inode, which is associated with a single inode by an inode number in its header. Next to an inode number there is a version number of the node and filesystem metadata in the header. The node may also carry a variable amount of data.
When the flash device is mounted, a scan is made of the entire medium. With the information found in the nodes, a complete direct map is reconstructed and stored in RAM. When a node is superseded by a newer node, that node is marked obsolete. When storage space runs low, garbage collection kicks in. Garbage collection examines the head of the circular log, moves valid nodes to the tail of the log and marks the valid nodes at the head of the log obsolete. Once a complete erase block is rendered obsolete, it is erased and made available for reuse by the tail of the log.
JFFS1 has several drawbacks:

• At mount time, the entire device must be scanned to construct the direct map. This scanning process can be very slow, and the space occupied in RAM by the direct map can be quite large, proportional to the number of files in the file system.

• The circular log design results in all the data at the head of the log being reclaimed, even if the head of the log consists of only valid nodes. This is not only inefficient, it is also bad for wear-leveling.

• Compression is not supported.

• Hard links are not supported.

• JFFS1 does not support NAND flash.
David Woodhouse of Red Hat enhanced JFFS1 into
JFFS2 [Woo01]. Compression using zlib, rubin or rtime
is available. Hard linking and NAND ﬂash memory are
now supported. Instead of one type of node, JFFS2 uses
three types of nodes:
• inodes: just like the struct jffs raw inode
in JFFS1, but without a file name or parent inode
number. An inode is removed once the last directory
entry referring to it has been unlinked.
• dirent nodes: directory entries, holding a name and
an inode number. Hard links are maintained with dif-
ferent names but the same inode number. A link is
removed by writing a dirent node with a higher ver-
sion number, having the same name but with target
inode number 0.
• cleanmarker node: this node is written in an erased
block to indicate that the block has been properly
erased, for use by the scan at mount time.
Like in JFFS1, nodes with a lower version than the most
recent one are considered obsolete. Instead of the circular
log in JFFS1, the ﬁlesystem deals in blocks, which
correspond to physical erase blocks in the ﬂash device. A
block containing only valid nodes is called clean, blocks
having at least one obsolete node are called dirty and a
free block only contains the cleanmarker node.
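The clean/dirty/free terminology above can be captured in a small helper (our own illustration, not JFFS2 code):

```python
# Classify an erase block the way the JFFS2 terminology above does:
# clean = only valid nodes, dirty = at least one obsolete node,
# free = nothing but the cleanmarker node.

def classify(block_nodes):
    """block_nodes: list of node states, e.g. ['valid', 'obsolete', ...]."""
    if block_nodes == ["cleanmarker"]:
        return "free"
    if any(n == "obsolete" for n in block_nodes):
        return "dirty"
    return "clean"

assert classify(["cleanmarker"]) == "free"
assert classify(["valid", "valid"]) == "clean"
assert classify(["valid", "obsolete"]) == "dirty"
```

Garbage collection, described below, only ever gains space by processing dirty blocks.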
Figure 5: Two data structures in JFFS2. Each inode is represented by a struct jffs2 inode cache, which points to the start of the chain of nodes representing the file. To indicate the end of the chain, the last node points back to the inode. Courtesy of Gal et al. [GT05]

When a JFFS2 system is mounted, the system scans all nodes in the flash device and constructs two data structures, called struct jffs2 inode cache and struct jffs2 raw node ref. The first is a direct map from each inode number to the start of a linked list of the physical nodes which belong to that inode. The second structure represents each valid node on the flash, containing two linked lists, one pointing to the next node in the same physical block, and the other pointing to the next node belonging to the same inode. Figure 5 shows how these two data structures interconnect.
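A simplified sketch of the two mount-time structures follows; the class names mirror the struct names in the text, but the layout is heavily reduced for illustration.

```python
# Sketch of JFFS2's mount-time structures: a per-inode cache heading a
# chain of node references, where each reference carries two links
# (next node in the same erase block, next node of the same inode).

class RawNodeRef:
    def __init__(self):
        self.next_in_block = None   # next node in the same physical block
        self.next_in_inode = None   # next node belonging to the same inode

class InodeCache:
    def __init__(self, ino):
        self.ino = ino
        self.nodes = None           # head of the per-inode node chain

def nodes_of(icache):
    """Walk the per-inode chain, as done when reading a file's nodes."""
    out, n = [], icache.nodes
    while n is not None:
        out.append(n)
        n = n.next_in_inode
    return out

ic = InodeCache(ino=42)
a, b = RawNodeRef(), RawNodeRef()
ic.nodes = a
a.next_in_inode = b
assert nodes_of(ic) == [a, b]
```

The second link (next_in_block) is what lets the garbage collector enumerate every node inside one erase block without consulting the per-inode chains.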
Garbage collection frees up dirty blocks, turning
them into free blocks. To provide wear leveling on
semi-static data, JFFS2 picks a clean block once every
100 selections, instead of a dirty block.
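The 1-in-100 heuristic can be sketched as follows (illustrative; the real JFFS2 selection logic differs in detail):

```python
# Sketch of JFFS2-style block selection: usually garbage-collect a dirty
# block, but every 100th selection take a clean block so that semi-static
# data also gets rewritten and its low-wear block is recycled.
import random

def pick_block(clean_blocks, dirty_blocks, counter):
    if counter % 100 == 99 and clean_blocks:
        return random.choice(clean_blocks)   # occasional wear-leveling pick
    return random.choice(dirty_blocks)       # normal space-reclaiming pick

picks = [pick_block(["c1"], ["d1", "d2"], i) for i in range(200)]
assert picks.count("c1") == 2                # selections 99 and 199 were clean
```

This trades a little garbage-collection efficiency for wear leveling, the exact tension discussed in Section 3.3.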
The big drawback of JFFS2 remains the mounting time. As JFFS2 also supports NAND and thus bigger flash devices, the time to scan the whole device becomes a problem.

Figure 6: YAFFS Tnode tree of data chunks in a file. If the file grows in size, the levels increase. Each Tnode is 32 bytes big. Level 0 (i.e. the lowest level) has 16 2-byte pointers to data chunks. Higher-level Tnodes comprise 8 4-byte pointers to other Tnodes lower in the tree.

4.5 YAFFS
Yet Another Flash File System was developed by Charles Manning of Aleph One [Man02]. YAFFS is the first NAND-only flash file system. YAFFS was made for NAND flash with 512 byte chunks and 16 byte headers (see section 2.2), while YAFFS2 supports bigger NAND chips with 1KB or 2KB pages and respectively 30 and 42 byte headers.

Because the earliest NAND flash memory with 512 byte chunks allowed up to three writes to the same area before an erasure was needed, YAFFS1 marked chunks as obsolete by rewriting a field in the header of each chunk. YAFFS2 required a more complex arrangement to obsolete chunks, as newer flash only supported write-once before erasure was needed. In YAFFS2 every header contains not only a file ID and the position within the file, but also a sequence number. When multiple chunks with the same file ID and position within the file are encountered, the chunk with the highest sequence number counts and the others are considered obsolete.

When the system boots, a scan is performed to create a direct map that maps files to chunks using a tree-like structure, see Figure 6. To speed up the scan, YAFFS incorporates checkpointing, saving the RAM map in flash before a clean unmount. When a system boots, it reads the flash device from the end to the beginning, encountering the checkpoint fairly fast. Any write to the filesystem after the creation of a checkpoint renders the checkpoint invalid.

Garbage collection comes in a deterministic mode and an aggressive mode. The former is the normal mode, activated when a write occurs. When a write has been completed and there is a block that is completely filled with discarded chunks, it is garbage collected. The aggressive mode is activated once free space is running low, collecting blocks that contain valid chunks, copying the valid chunks to a free block and erasing the old erase block.

Wear leveling is not of high priority, as the authors argue that NAND devices are already shipped with bad blocks, so the filesystem needs to take care of bad blocks anyway. Uneven wear will only lead to loss of storage capacity, not to errors, as bad blocks are handled by the filesystem. YAFFS does not support compression.

4.6 LogFS

LogFS is a creation of Engel et al. [EM] [EBM07] as a response to user comments that JFFS2 and YAFFS have high RAM usage and long mount times.

The flash medium is split into segments; each segment consists of multiple erase blocks. LogFS structures the device in three storage areas:

• Superblock (1 segment)

• Journal (2-8 segments)

• Object store
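This three-area split can be sketched in a few lines. The segment count, the journal size of four segments and all names below are illustrative assumptions, not LogFS's actual on-flash format:

```python
# Sketch of a LogFS-style device layout: the device is a row of segments
# (each segment being a group of erase blocks), split into a superblock,
# a journal of 2-8 segments, and an object store holding everything else.
# All names and sizes here are illustrative assumptions.

def split_into_areas(num_segments: int, journal_segments: int = 4):
    """Assign each segment index to one of the three storage areas."""
    if not 2 <= journal_segments <= 8:
        raise ValueError("the journal uses 2-8 segments")
    if num_segments < 1 + journal_segments + 1:
        raise ValueError("device too small for all three areas")
    return {
        "superblock": [0],                                 # 1 segment
        "journal": list(range(1, 1 + journal_segments)),   # 2-8 segments
        "object_store": list(range(1 + journal_segments, num_segments)),
    }

areas = split_into_areas(num_segments=16)
print(len(areas["journal"]), len(areas["object_store"]))
```

Everything not claimed by the superblock or the journal is left to the object store, which is where data and tree nodes live.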
The superblock contains the global information such as the file system type. The journal will be discussed later. Each object store segment consists of erase blocks where all but the last erase block are normal data blocks. The last block of each segment contains a summary of the object store, listing for each data block its inode number, its logical position in a file, and the physical offset of the block. Next to these block-specific fields, the summary maintains segment-global information such as erase count, write time, etc.

When an update of data occurs, the data block is rewritten out-of-place, so the pointer referring to the data must be updated (also out-of-place), which needs an update of each parent node of that node. So basically each change at the bottom of the tree will propagate upward all the way to the root. This method of updating the tree bottom-up is known as the wandering tree algorithm. A crash before the root node has been rewritten only causes the loss of the last operation, as the root node still points to the previous data and structure.

Because inodes do not have reserved areas on flash devices, LogFS stores the inodes in an inode file (ifile). The root inode of this ifile is stored in the journal. This design of ifile and normal files not only simplifies the code (file writes and inode writes are identical now), it also makes it possible to use hardlinks. Figure 7 shows the setup of the ifile and normal files. All data, inodes and indirect blocks are stored in the flash, although the ifile inode is stored in the journal.

The journal is a circular log, but much smaller than the log used in JFFS2, in which the log is the filesystem. In LogFS the small journal is filled with ifile inodes, being tuples of a version number and an offset. This offset points to the tree root node of the ifile. Upon mounting, the system does not perform a full scan but only scans the superblock and the journal to find the most recent version of the root node. This approach improves the mount time by a big factor (Jörn Engel states an OLPC system mount goes from 3.3 seconds under JFFS2 to 60 ms under LogFS [Cor07]).

As each updated and new data block would indirectly lead to a new version of the root node, the erase blocks containing the journal would wear at a rapid pace. Two solutions counter this aggressive wearing: write buffering and journal replacing. Write buffering stores updates in a buffer before applying them to the flash device. This not only increases write speed but also decreases the number of inode updates, as some data updates may have the same direct or indirect parent inode. Journal replacing is activated when the journal's erase blocks are worn too often. A clean segment is designated as a new journal, and the first entry in the first journal points to this new journal.

As LogFS is still under development, wear leveling is not optimized yet. As of January 2007, the segment picked for writing is the first empty segment encountered when scanning some segments ahead. Future development would optimize the wear leveling on the basis of age and/or erase count.

A free space counter is maintained in the journal, and when space is running out, garbage collection comes into play. Due to the wandering tree algorithm and the fact that the tree is stored in flash itself, garbage collection in LogFS is complex. When garbage collection is needed on a segment containing valid nodes, one free segment is needed for each level of the tree, because blocks on different levels should be written to different segments and blocks on the same level should be written to the same segment. Because of this, LogFS becomes slow when the device is getting full.

The author states that LogFS is designed for big flash devices, ranging from gigabytes upwards. For smaller flash devices, the author recommends using JFFS2. LogFS was planned to be included in the 2.6.25 Linux kernel. LogFS has a codesize of around 8 KLOC.

Figure 7: LogFS. Combination of inode file and normal file tree structure. Directory entries are inodes with no pointer to data. Courtesy of Engel et al. [EBM07]

4.7 UBI and UBIFS

4.7.1 Unsorted Block Images - UBI

UBI is a flash management layer with almost the same functionality as the Logical Volume Manager (LVM) on hard drives, but with additional functions. It is designed by IBM [TG06]. UBI runs on top of a flash device, and UBIFS runs on top of UBI (see Figure 8). UBI has the following relevant functionalities:

• Bad block management

• Wear leveling across all physical flash

• Logical to physical block mapping
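The last two functions can be modeled with a toy remapping table. The structures, the least-worn selection policy and all names here are assumptions for illustration, not UBI's actual algorithm:

```python
# Toy model of UBI-style logical erase blocks (LEBs): rewriting the same
# LEB is transparently redirected to the least-worn free physical erase
# block (PEB). Names, structures and the remap policy are illustrative.

class ToyUbi:
    def __init__(self, num_pebs: int, num_lebs: int):
        self.erase_count = [0] * num_pebs
        self.free_pebs = set(range(num_lebs, num_pebs))
        self.map = {leb: leb for leb in range(num_lebs)}  # LEB -> PEB

    def rewrite(self, leb: int) -> int:
        """Rewriting a LEB wears a PEB; remap it to the least-worn free PEB."""
        old = self.map[leb]
        new = min(self.free_pebs, key=lambda p: self.erase_count[p])
        self.free_pebs.remove(new)
        self.map[leb] = new
        self.erase_count[old] += 1   # old PEB is erased and recycled
        self.free_pebs.add(old)
        return new

ubi = ToyUbi(num_pebs=8, num_lebs=4)
pebs = {ubi.rewrite(0) for _ in range(100)}
print(len(pebs) > 1)   # the erasures were spread over several PEBs
```

Repeatedly rewriting one logical block thus spreads erasures over many physical blocks, instead of wearing out a single one.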
As we see, UBI hides these functions from higher layered filesystems. UBI provides a UBI volume to higher layers, consisting of logical erase blocks. Higher layer filesystems may rewrite a logical erase block over and over without danger of wearing, because UBI transparently changes the mapping to another physical eraseblock when it is time.
UBI is not an FTL, as it was designed for bare flashes and not for flash devices such as MMC/SD cards, USB sticks, CompactFlash and so on. As such, neither ext2 nor other "traditional" file systems can be run on top of a UBI device. UBI weighs around 11 KLOC. For more information on UBI, the reader is referred to [TG06].
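Bad block management, the first function in the list above, can be sketched as a remap into a reserved pool. The pool size, the first-fit replacement policy and all names are illustrative assumptions:

```python
# Sketch of transparent bad block management: when a physical erase block
# goes bad, the logical block it backed is silently remapped to a block
# from a reserved pool. Names and the pool policy are illustrative only.

class BadBlockMapper:
    def __init__(self, num_blocks: int, num_reserved: int):
        usable = num_blocks - num_reserved
        self.map = {lb: lb for lb in range(usable)}   # logical -> physical
        self.reserve = list(range(usable, num_blocks))

    def mark_bad(self, logical: int) -> int:
        """Replace the physical block backing `logical` with a reserved one."""
        if not self.reserve:
            raise RuntimeError("out of reserved blocks: capacity is lost")
        self.map[logical] = self.reserve.pop(0)
        return self.map[logical]

m = BadBlockMapper(num_blocks=100, num_reserved=2)
print(m.mark_bad(7))   # logical block 7 now lives in physical block 98
```

Note the failure mode: exhausting the reserve costs storage capacity rather than correctness, which mirrors the YAFFS argument about uneven wear earlier in this section.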
4.7.2 UBI Filesystem - UBIFS

Figure 8: Layered structure of UBI, UBIFS and the flash device.

UBIFS is developed by Nokia engineers with the help of the University of Szeged [Hun08]. UBIFS is designed to work on top of UBI volumes; it cannot operate directly on top of MTD devices or FTLs [Hun08]. Basically, the whole setup of UBI, UBIFS and MTD is as follows (see also Figure 8):
• MTD subsystem, providing a uniform interface to access raw flash

• UBI subsystem, the volume manager providing wear leveling, bad block management and logical to physical erase block mapping

• UBIFS filesystem, providing all other functionality filesystems should provide

In contrast, FFS, JFFS2, YAFFS and LogFS work directly on top of raw MTD devices.
As UBIFS runs on top of a UBI volume, it is not provided with physical erase blocks but with logical erase blocks (LEBs). As such, UBIFS does not need to take care of wear leveling, as that is handled by the UBI layer. Just like LogFS, UBIFS uses a wandering tree, like the one pictured in Figure 7.

There are six areas in UBIFS whose position is fixed at filesystem creation. The first area is the superblock, using one LEB. The second area consists of two LEBs filled with master nodes, which store the position of all on-flash data structures that do not have fixed logical positions. To prevent data corruption, two LEBs are used instead of one.

The third fixed area is the log of UBIFS, designed to reduce the frequency of updates to the on-flash tree, as updated nodes can share the same parent. The log is part of the journal. Nodes that are updated are placed in the journal and the index tree in memory (called the TNC) is updated. Once the journal is full it is committed. The commit process consists of writing the new version of the on-flash tree and the corresponding master node. This process is based on two special types of nodes stored in the log: the commit start node, which records that the commit has begun, and the reference nodes, which record the LEB numbers of the LEBs in the rest of the journal. Those LEBs are called buds, so the journal consists of the log and the buds. The start of a commit is recorded by the commit start node, while the end of a commit is defined when the master node has been written. After that the reference nodes are obsolete and can be deleted.

The fourth area is the LEB properties tree (LPT), which is a tree in which each leaf node represents information on each known LEB in the main area. The main area is discussed later. Each leaf node contains three values about its LEB: free space, dirty space and whether the LEB is an index eraseblock or not. Index nodes (being part of the on-flash tree) and non-index nodes are kept separate in different blocks, meaning eraseblocks either contain only index nodes or only non-index nodes. The free space can be used in new writes and the dirty space counter is used in garbage collection. The LPT is updated only during a commit. The on-flash tree and the LPT represent the filesystem just after the last commit. The difference between these two and the actual state of the filesystem is represented by the nodes in the journal.

The fifth area is called the orphan area, consisting of inode numbers whose inodes have a link count of zero. After a commit these inodes appear in the tree as leaves with no parent. This is possible when an unclean unmount occurs after an open file has been unlinked and committed. To delete these orphan nodes after an unclean unmount, either the entire on-flash tree must be scanned for unlinked leaf nodes, or a list of orphans must be kept somewhere. UBIFS incorporates the latter with the orphan area. When the link count of an inode drops to zero, the inode number is added to the orphan area as a leaf of the orphan tree. These inode numbers are deleted when the corresponding inode is deleted.

The sixth and last area is the main area, containing the data nodes and the on-flash tree (also called the index). As described earlier, main area LEBs are either filled with index nodes or non-index nodes.
When a UBIFS is mounted, the LPT and the on-flash tree are scanned, after which the journal is replayed to recover the correct state of the filesystem.
The UBIFS code size is around 30 KLOC.

5 Comparison and Conclusion

Flash memory is growing rapidly in speed, capacity and popularity. Newer flash devices with bigger storage and higher speeds appear constantly, often at the expense of ease of use. This trend requires constant development of
software techniques for these newer flash devices.

Several approaches have been discussed, from the inefficient and potentially dangerous Flash Translation Layer and the first and abandoned Microsoft Flash FS, to more advanced dedicated flash filesystems like JFFS, YAFFS, LogFS and UBIFS. The first flash filesystems were designed for NOR flash, while nowadays NAND flash is commonly used in flash devices.

FTL is commonly used on removable devices like USB sticks, because so far the only filesystem that is supported by every system is FAT. This approach of FTL with a traditional filesystem works, but it is inefficient and potentially dangerous, as FTL does not treat flash memory properties properly, even at the cost of potential data loss in case of a crash.

Microsoft's FFS had very poor performance and was abandoned early, but it gave other developers ideas for further development. JFFS was the first dedicated flash file system that brought good performance. JFFS1 was focussed on NOR memory; JFFS2 was released later with several serious improvements, including NAND and hardlink support. JFFS scans the whole device at mount time, and with the introduction of NAND flash and its enormous size potential, the mount time became the major disadvantage of JFFS. The full structure of JFFS remains in memory, laying a heavy burden on RAM.

YAFFS was developed for NAND flash only, to cope with the long scan time and high RAM usage of JFFS2. YAFFS maintains a smaller tree structure in RAM and supports checkpointing, decreasing the mount scan time dramatically if and only if the device is unmounted properly. In case of a crash the whole system needs to be scanned again. The author of YAFFS states it is better to use JFFS2 on devices smaller than 64MB and YAFFS on bigger devices.

LogFS solves the mounting time problem and high RAM usage by maintaining the tree structure in the flash itself, rather than reconstructing it with a scan and keeping it in RAM only. LogFS is created for large NAND devices of 1 GB and bigger, and performance drops when the device is almost full with valid data. The LogFS author states it is better to use JFFS2 for smaller devices.

UBI and UBIFS introduce a new approach for flash filesystems, using a layered design to provide transparency and simplicity to the higher layered file system. This approach seems very promising, as other filesystems can be adapted to work on top of UBI, as a patched JFFS2 is already capable of. UBIFS also maintains on-flash trees to minimize mount times. UBIFS and LogFS are developed around the same time and changes are implemented at the moment of writing. The codebase for the UBI/UBIFS combination is quite large in comparison to LogFS, respectively 11/30 and 8 KLOC.

References

[AC04] Axis Communications, Lund, Sweden. JFFS homepage. http://developer.axis.com/software/jffs/, 2004.

[Ban95] A. Ban. Flash file system. US patent 5,404,485. Filed March 8, 1993; issued April 4, 1995; assigned to M-Systems. 1995.

[Ban99] A. Ban. Flash file system optimized for page-mode flash technologies. US patent 5,937,425. Filed October 16, 1997; issued August 10, 1999; assigned to M-Systems. 1999.

[BL95] S. D. Barrett, P. L. Quinn and R. A. Lipe. System for updating data stored on a flash-erasable, programmable, read-only memory (FEPROM) based upon predetermined bit value of indicating pointers. US patent 5,392,427. Filed May 18, 1993; issued February 21, 1995; assigned to Microsoft. 1995.

[Cor98a] Intel Corporation. Flash file system selection guide. Application note 686, 1998.

[Cor98b] Intel Corporation. Understanding the flash translation layer (FTL) specification. Application note 648, 1998.

[Cor07] Corbet. LogFS. http://lwn.net/Articles/234441/, 2007.

[DCK+94] F. Douglis, R. Caceres, M.F. Kaashoek, K. Li, B. Marsh, and J.A. Tauber. Storage alternatives for mobile computers. In Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 25-37, Monterey, California, 1994. ACM.

[EBM07] Jörn Engel, Dirk Bolte, and Robert Mertens. Garbage collection in LogFS.

[EM] Jörn Engel and Robert Mertens. LogFS - finally a scalable flash file system.

[GT05] Eran Gal and Sivan Toledo. Algorithms and data structures for flash memories. ACM Comput. Surv., 37(2):138-163, 2005.

[Hun08] A. Hunter. A brief introduction to the design of UBIFS. http://www.linux-

[KNM95] Atsuo Kawaguchi, Shingo Nishioka, and Hiroshi Motoda. A flash-memory based file system. In USENIX Winter, pages 155-164, 1995.

[Man02] Charles Manning. YAFFS: Yet another flash filing system. Available at http://www.yaffs.net.

[Mas] Chris Mason. Journaling with ReiserFS.

[RO92] Mendel Rosenblum and John K. Ousterhout. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, 10(1):26-52, 1992.

[Rob] Daniel Robbins. Introducing ext3. http://www-

[TG06] T. Gleixner, F. Haverkamp, A. Bityutskiy. UBI - unsorted block images. http://www.linux-

[Woo01] JFFS: The journaling flash file system. Presented at the Ottawa Linux Symposium, July 2001 (no proceedings); a 12-page article is available online at http://sources.redhat.com/jffs2/jffs2.pdf.