NANDFS: A RAM-Constrained Flash File system
Transcript

  • 1. NANDFS: A Flexible Flash File System for RAM-Constrained Systems. Aviad Zuck, Ohad Barzilay and Sivan Toledo
  • 2. Overview
    • Introduction + motivation
    • Flash properties
    • Big Ideas
    • Going into details
    • Software engineering, tests and experiments
    • General flash issues
  • 3. Flash is Everywhere
  • 4.
    • Resilient to vibrations and extreme conditions
    • Up to 100 times faster (random access) than rotating disks
  • 5. What’s missing?
  • 6.
    • Sequential access
    • And
      • “Today, consumer-grade SSD costs from $2 to $3.45 per gigabyte, hard drives about $0.38 per gigabyte…”
      • Computerworld.com, 27.8.2008*
    • *http://www.computerworld.com/s/article/print/9112065/Solid_state_disk_lackluster_for_laptops_PCs
  • 7.
    • NOR Flash: Looser; Mostly Reads; Few MB
    • NAND Flash: Constrained; Storage; Many MB/GB
  • 8. Two Ways of Flash Management
    • NTFS, FAT, ext3, …
    • JFFS, YAFFS, NANDFS, …
  • 9. So Why NANDFS?
  • 10.
  • 11. NANDFS Also Has:
    • File locking
    • Transactions
    • Competitive performance and graceful degradation
  • 12. How is it Done, in a Nutshell?
    • Explanation does not fit in a nutshell
    • Complex data structures
    • New garbage collection mechanism
    • And much more…
    • Let’s elaborate
  • 13. Flash Properties
  • 14.
    • Flash memory is divided into pages – 0.5KB, 2KB, 4KB
    • Page consists of Data and Metadata areas – 16B of metadata for every 512B of data
    • Pages arranged in units – 32/64/128 pages per unit
    • Metadata contains unit validity indicator, ECC code and file system metadata
  • 15.
  • 16. Erasures & Programming
    • Page bits initialized to 1’s
    • Writing clears bits (1 to 0)
    • Bits are set back to 1 only by erasing an entire unit (“erase unit”).
    • Erase unit has limited endurance
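
The program/erase rules above can be modelled in a few lines. This is only a minimal simulation sketch: the sizes and the names `erase_unit_t`, `eu_erase` and `eu_program` are assumptions made for illustration, not part of NANDFS.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE      512  /* assumed page size in bytes */
#define PAGES_PER_UNIT 32   /* assumed number of pages per erase unit */

typedef struct {
    uint8_t  pages[PAGES_PER_UNIT][PAGE_SIZE];
    unsigned erase_count;   /* wear indicator: endurance is limited */
} erase_unit_t;

/* Erasing is the only way to set bits: the whole unit goes back to 1's. */
void eu_erase(erase_unit_t *eu) {
    memset(eu->pages, 0xFF, sizeof eu->pages);
    eu->erase_count++;
}

/* Programming can only clear bits (1 to 0), so the stored result is the
 * bitwise AND of the old page contents and the new data. */
void eu_program(erase_unit_t *eu, unsigned page, const uint8_t *data) {
    for (unsigned i = 0; i < PAGE_SIZE; i++)
        eu->pages[page][i] &= data[i];
}
```

Programming the same page twice without an erase in between yields the AND of both writes, which is why overwrite-in-place is not usable for arbitrary data.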
  • 17. The Design of NANDFS - The “Big” Ideas
  • 18. Log-structured design
    • Overwrite-in-place is not permitted in flash
    • Caching avoids rippling effect
  • 19. Modular Flash File System
    • Modularity is good. But…
    • We need a block device API designated for flash
    • We call our “block device” the sequencing layer
    • Traditional block device: READ, WRITE, (TRIM)
    • NANDFS “block device”: READ, ALLOCATE-AND-WRITE, TRIM
  • 20. High-level Design
    • A 2-layer structure:
      • File System Layer - transactional file system with Unix-like file structure
      • Sequencing Layer – manages the allocation of immutable page-sized chunks of data. Assists in crash recovery and atomicity
  • 21. The Sequencing Layer
  • 22.
    • Divides flash into fixed-size physical units called slots
    • Slots assigned to segments - logical units of the same size
    • Each segment maps to one matching physical slot, except one “active segment” which is mapped to two slots.
  • 23. Block access
    • Segment ~> Slot mapping table in RAM
    • Block is referenced by a logical handle
      • <segment_id, offset_in_segment>
    • Address translation
      • Example: Logical address <0,2> ~> Physical address 8
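
A sketch of the translation step follows. The table contents and `PAGES_PER_SLOT` are assumptions chosen so that the slide's example (<0,2> maps to physical page 8) works out. The figure with the real values is not in the transcript.

```c
#include <stdint.h>

#define NUM_SEGMENTS   4  /* assumed */
#define PAGES_PER_SLOT 3  /* assumed, so that <0,2> lands on physical page 8 */

typedef struct {
    uint16_t segment_id;
    uint16_t offset_in_segment;
} logical_addr_t;

/* Segment-to-slot table kept in RAM: one small entry per segment,
 * rather than one entry per page. Contents here are hypothetical. */
uint16_t seg_to_slot[NUM_SEGMENTS] = { 2, 0, 3, 1 };

/* Physical page = first page of the mapped slot + offset within the segment. */
uint32_t logical_to_physical(logical_addr_t a) {
    return (uint32_t)seg_to_slot[a.segment_id] * PAGES_PER_SLOT
         + a.offset_in_segment;
}
```

With segment 0 mapped to slot 2, the handle <0,2> resolves to 2 * 3 + 2 = 8.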
  • 24. Where’s the innovation?
    • Logical address mapping is not a new idea:
      • Logical Disk (1993), YAFFS, JFFS, and more
    • Many FTLs use some logical address mapping
      • Full mapping ~> expensive
      • Coarse-grained mapping
        • Fragmentation, performance degradation
        • Costly merges
  • 25. * DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings (2009)
  • 26.
    • The difference in NANDFS
      • NANDFS uses coarse-grained mapping, not full mapping
      • Less RAM for page mapping (more RAM flexibility)
      • Collect garbage while preserving validity of pointers to non-obsolete blocks
    • Appropriate for flash, not for magnetic disks
  • 27. Block allocation
    • NANDFS is log-structured
    • New blocks allocated sequentially from the active segment.
    • In a log-structured system blocks are never re-written
    • File pointer structures need to be updated to reflect the new location of the data.
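
The three sequencing-layer operations (READ, ALLOCATE-AND-WRITE, TRIM) could look roughly like the sketch below. It is a toy in-RAM model with assumed sizes and hypothetical names (`seq_layer_t`, `seq_allocate_and_write`, ...). Segment headers and switching to a new active segment are left out.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE      16  /* assumed, tiny for illustration */
#define PAGES_PER_SLOT 4   /* assumed */
#define NUM_SLOTS      4   /* assumed */

typedef struct { uint16_t segment_id; uint16_t offset; } handle_t;

typedef struct {
    uint8_t  flash[NUM_SLOTS][PAGES_PER_SLOT][PAGE_SIZE];
    uint8_t  obsolete[NUM_SLOTS][PAGES_PER_SLOT]; /* per-page obsolete flag */
    uint16_t seg_to_slot[NUM_SLOTS];
    uint16_t obsolete_count[NUM_SLOTS];           /* per-segment counter */
    uint16_t active_segment;
    uint16_t next_offset;                         /* next free page in it */
} seq_layer_t;

void seq_init(seq_layer_t *s) {
    memset(s, 0, sizeof *s);
    memset(s->flash, 0xFF, sizeof s->flash);      /* erased flash is all 1's */
    for (uint16_t i = 0; i < NUM_SLOTS; i++)
        s->seg_to_slot[i] = i;
}

/* ALLOCATE-AND-WRITE: the layer, not the caller, picks the address.
 * Returns 0 and fills *out, or -1 if the active segment is full. */
int seq_allocate_and_write(seq_layer_t *s, const uint8_t *data, handle_t *out) {
    if (s->next_offset >= PAGES_PER_SLOT)
        return -1;
    uint16_t slot = s->seg_to_slot[s->active_segment];
    memcpy(s->flash[slot][s->next_offset], data, PAGE_SIZE);
    out->segment_id = s->active_segment;
    out->offset = s->next_offset++;
    return 0;
}

/* READ: translate the logical handle and return the page. */
const uint8_t *seq_read(const seq_layer_t *s, handle_t h) {
    return s->flash[s->seg_to_slot[h.segment_id]][h.offset];
}

/* TRIM: flag the page as obsolete and bump the segment's counter once. */
void seq_trim(seq_layer_t *s, handle_t h) {
    uint16_t slot = s->seg_to_slot[h.segment_id];
    if (!s->obsolete[slot][h.offset]) {
        s->obsolete[slot][h.offset] = 1;
        s->obsolete_count[h.segment_id]++;
    }
}
```

In this model a block "rewrite" is an ALLOCATE-AND-WRITE of the new contents followed by a TRIM of the old handle, after which the file's pointer structures must be updated to the new handle.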
  • 28. Garbage collection
    • TRIM - pages with obsolete data are marked with a special “obsolete flag”
    • The sequencing layer maintains a counter of obsolete pages for every segment.
    • Problem - erase units (EUs) contain a mixture of valid and obsolete pages, so we can’t simply collect entire EUs
    • Solution : Garbage collection is performed together with allocation
  • 29.
    • Reclamation unit = Segment
      • The sequencing layer chooses a segment to reclaim, and allocates it another (fresh) second slot.
    • Reclaim obsolete pages while copying non-obsolete pages
    • NOTICE – Logical addresses are preserved, although physical translation changed
  • 30.
    • Finally, when the new slot is full, the old slot is erased.
    • The erased slot can now be used to reclaim another segment
    • We choose the segment with the highest obsolete counter as the new “active segment”.
    • This would not work well on rotating disks: too many seek operations
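
One way to picture reclamation combined with allocation is the sketch below. It is an interpretation of the slides under simplifying assumptions, not the NANDFS code: valid pages are copied to the same offset in the fresh slot (so logical handles stay valid), and each obsolete offset is handed out for new data. All names and sizes are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE      8  /* assumed */
#define PAGES_PER_SLOT 4  /* assumed */

typedef struct {
    uint8_t data[PAGES_PER_SLOT][PAGE_SIZE];
    uint8_t obsolete[PAGES_PER_SLOT];
} slot_t;

typedef struct {
    slot_t  *old_slot;  /* slot currently holding the reclaimed segment */
    slot_t  *new_slot;  /* fresh second slot assigned to the same segment */
    unsigned cursor;    /* next offset to process */
} reclaim_t;

/* Victim selection: the segment with the highest obsolete counter. */
unsigned pick_victim(const uint16_t *obsolete_count, unsigned num_segments) {
    unsigned best = 0;
    for (unsigned i = 1; i < num_segments; i++)
        if (obsolete_count[i] > obsolete_count[best])
            best = i;
    return best;
}

/* Allocate the next page of the active segment for new data. Valid pages met
 * on the way are copied to the SAME offset in the new slot. Returns the
 * offset used, or -1 once the new slot is full (the old slot can then be
 * erased). */
int reclaim_allocate(reclaim_t *r, const uint8_t *data) {
    while (r->cursor < PAGES_PER_SLOT) {
        unsigned off = r->cursor++;
        if (!r->old_slot->obsolete[off]) {
            memcpy(r->new_slot->data[off], r->old_slot->data[off], PAGE_SIZE);
        } else {
            memcpy(r->new_slot->data[off], data, PAGE_SIZE);
            return (int)off;
        }
    }
    return -1;
}
```

Because offsets are preserved, only the single segment-to-slot entry changes when the old slot is finally erased.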
  • 31. Sequencing Layer Recovery
    • When a new slot is allocated to a segment, a segment header is written in the slot’s first page
    • Header contains:
      • Incremented segment sequencing number
      • Segment number
      • Segment type
      • Checkpoint (further details later)
  • 32.
    • On mounting, the header of every slot is read
    • The segment-to-slot map can be reconstructed using only the data from the headers
    • Other systems (with complete mapping) need to scan entire flash
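
A sketch of the mount-time reconstruction, assuming a header layout with the fields listed on the previous slide (checkpoint omitted). The tie-break for two slots claiming the same segment, taking the higher sequence number, is an assumption of this sketch. All names and sizes are hypothetical.

```c
#include <stdint.h>

#define NUM_SLOTS    6       /* assumed */
#define NUM_SEGMENTS 5       /* assumed */
#define NO_SLOT      0xFFFF  /* marks a segment with no slot found */
#define UNUSED_SEG   0xFFFF  /* header value of an erased, unassigned slot */

typedef struct {
    uint32_t sequence_num;   /* incremented on every slot allocation */
    uint16_t segment_id;
    uint8_t  segment_type;
} segment_header_t;

/* Rebuild the segment-to-slot map by reading one header per slot. */
void rebuild_map(const segment_header_t hdr[NUM_SLOTS],
                 uint16_t seg_to_slot[NUM_SEGMENTS]) {
    uint32_t best_seq[NUM_SEGMENTS] = { 0 };
    for (unsigned i = 0; i < NUM_SEGMENTS; i++)
        seg_to_slot[i] = NO_SLOT;
    for (unsigned slot = 0; slot < NUM_SLOTS; slot++) {
        uint16_t seg = hdr[slot].segment_id;
        if (seg >= NUM_SEGMENTS)
            continue;  /* erased or unassigned slot */
        if (seg_to_slot[seg] == NO_SLOT ||
            hdr[slot].sequence_num > best_seq[seg]) {
            seg_to_slot[seg] = (uint16_t)slot;
            best_seq[seg] = hdr[slot].sequence_num;
        }
    }
}
```

The work is proportional to the number of slots rather than the number of pages.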
  • 33. Bad EU Management
    • Each flash memory chip contains some bad EUs
    • Some slots contain more valid EUs than others
    • Solution – some slots are set aside as a bank of reserve EUs
  • 34. Brief Summary
  • 35. The Design of NANDFS - More Ideas
  • 36. Wear Leveling
    • Writes and erases should be spread evenly over all EUs
    • Problem : some slots may be reclaimed rarely
    • Solution: Perform periodic random wear leveling process
      • Choose random slot and copy it to a fresh slot
      • Incurs only a low overhead
      • Guarantees near-optimal expected endurance
      • (Ben-Aroya and Toledo, 2006)
    • Technique widely used (YAFFS, JFFS)
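
The random wear-leveling step can be sketched as a coin flip followed by a random slot choice. The trigger (one relocation per `period` calls on average), the xorshift generator and all names are assumptions for illustration. The slides do not give the actual parameters.

```c
#include <stdint.h>

/* Small xorshift32 generator, since external libraries are not an option on
 * the target. The state must be seeded with a nonzero value. */
uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Called, for example, whenever a slot is allocated. With probability of
 * roughly 1/period it returns the index of a random slot whose contents
 * should be copied to a fresh slot. Otherwise it returns -1. */
int wear_level_pick(uint32_t *rng_state, uint32_t period, uint32_t num_slots) {
    if (xorshift32(rng_state) % period != 0)
        return -1;
    return (int)(xorshift32(rng_state) % num_slots);
}
```

A larger `period` lowers the copying overhead at the cost of slower evening-out of wear.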
  • 37. Transactions
    • File system operations are atomic and transactional
    • Marking pages as obsolete is not straightforward
    • Simple transaction – block re-write
      • After rewriting, old data block should be marked obsolete
      • If we mark it, and the transaction aborts before completing, old data should remain valid
      • If already marked as obsolete – cannot undo
  • 38.
    • Solution : Perform valid-to-obsolete-transition (or VOT) AFTER the transaction commits.
    • Write VOT records to flash in dedicated pages
    • On commit use VOT records to mark pages as obsolete
    • Maintain linked list of all pages written in a specific transaction on flash
    • Keep in RAM a pointer to the last page written in a transaction
    • On abort mark all pages written by the transaction as obsolete
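
The commit and abort paths above can be sketched as follows. This is a simplified in-RAM model with hypothetical names: here the VOT records sit in an array, whereas the slides say they are written to dedicated flash pages.

```c
#include <stdint.h>

#define NUM_PAGES 16    /* assumed */
#define MAX_VOTS  8     /* assumed capacity of the VOT log */
#define NO_PAGE   (-1)

typedef struct {
    uint8_t obsolete[NUM_PAGES];
    int     prev_in_tx[NUM_PAGES]; /* back-pointer stored with each page:
                                      previous page of the same transaction */
} flash_t;

typedef struct {
    int last_written;    /* kept in RAM: head of the on-flash linked list */
    int vots[MAX_VOTS];  /* pages to mark obsolete once we commit */
    int num_vots;
} tx_t;

void tx_begin(tx_t *t) {
    t->last_written = NO_PAGE;
    t->num_vots = 0;
}

/* The transaction wrote new_page, superseding old_page (or NO_PAGE). */
int tx_write(flash_t *f, tx_t *t, int new_page, int old_page) {
    if (old_page != NO_PAGE) {
        if (t->num_vots == MAX_VOTS)
            return -1;
        t->vots[t->num_vots++] = old_page;  /* deferred, not marked yet */
    }
    f->prev_in_tx[new_page] = t->last_written;
    t->last_written = new_page;
    return 0;
}

/* Commit: only now perform the valid-to-obsolete transitions. */
void tx_commit(flash_t *f, tx_t *t) {
    for (int i = 0; i < t->num_vots; i++)
        f->obsolete[t->vots[i]] = 1;
    tx_begin(t);
}

/* Abort: walk the linked list and discard everything the transaction wrote.
 * The old pages were never marked, so they are still valid. */
void tx_abort(flash_t *f, tx_t *t) {
    for (int p = t->last_written; p != NO_PAGE; p = f->prev_in_tx[p])
        f->obsolete[p] = 1;
    tx_begin(t);
}
```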
  • 39.
  • 40. Checkpoints
    • Snapshot of system state
    • Ensures returning to stable state following a crash
    • Checkpoint is written:
      • As part of a segment header.
      • Whenever a transaction commits.
    • Structure:
      • Obsolete counters array
      • Pointer to last-written block address of committed transaction
      • Pointers to the last-written blocks of all on-going transactions
      • Pointer to root inode
  • 41. Simple Example
  • 42. Finding the Last Checkpoint
    • At any given time there is only one valid checkpoint in flash
    • On mounting
      • Locate the last allocated slot (using its sequence #)
      • Perform a binary search to see if another, later checkpoint exists in the slot
      • Abort all other transactions
      • Truncate all pages written after the checkpoint
      • Finish the transaction that was committed
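
The slides do not spell out what the binary search runs over. One plausible reading, sketched here as an assumption, is that pages in a slot are written sequentially, so the programmed pages form a prefix and the end of the log can be found in a logarithmic number of page reads. `count_written_pages` is a hypothetical name.

```c
#include <stdint.h>

/* is_programmed[i] is nonzero if page i of the slot has been written.
 * Assumes the written pages form a prefix of the slot. Returns the number of
 * written pages, i.e. the index of the first erased page. */
unsigned count_written_pages(const uint8_t *is_programmed,
                             unsigned pages_per_slot) {
    unsigned lo = 0, hi = pages_per_slot;
    /* Invariant: pages below lo are written, pages from hi on are erased. */
    while (lo < hi) {
        unsigned mid = lo + (hi - lo) / 2;
        if (is_programmed[mid])
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

From the last written page, the mount code would then look backwards for the most recent checkpoint.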
  • 43. File System Layer
  • 44.
    • Files represented by inode trees
      • File metadata
      • Direct pointers to data pages
      • Indirect pointers etc.
    • All pointers are logical pointers
    • Regular files not permitted to be sparse
  • 45.
    • Root file and directory inodes may be sparse.
    • Hole indicated by special flag
  • 46. The Root File
    • Array of inodes
  • 47.
    • When a file is deleted a page-size hole is created
    • When creating a file a hole can easily be located
    • If no hole exists, allocate a new inode by extending the root file
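
Inode allocation in the root file can be sketched as "reuse a hole, else extend". The in-RAM array below stands in for the sparse root file. `root_file_t`, `inode_alloc` and `inode_free` are hypothetical names, and the linear scan is a simplification.

```c
#include <stdint.h>

#define MAX_INODES 8  /* assumed upper bound for the sketch */

typedef struct {
    uint8_t  is_hole[MAX_INODES]; /* 1 = page-size hole left by a deleted file */
    unsigned length;              /* number of inode pages in the root file */
} root_file_t;

/* Returns the allocated inode number, or -1 if the root file cannot grow. */
int inode_alloc(root_file_t *r) {
    for (unsigned i = 0; i < r->length; i++) {
        if (r->is_hole[i]) {      /* reuse the hole of a deleted file */
            r->is_hole[i] = 0;
            return (int)i;
        }
    }
    if (r->length == MAX_INODES)
        return -1;
    r->is_hole[r->length] = 0;    /* no hole: extend the root file */
    return (int)r->length++;
}

/* Deleting a file leaves a hole at its inode's position. */
void inode_free(root_file_t *r, unsigned ino) {
    r->is_hole[ino] = 1;
}
```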
  • 48. Directory Structure
    • Directory = array of directory entries
      • inode number
      • Length
      • UTF-8 file name.
    • Direntry length <= 256 bytes.
    • Direntries packed into chunks without gaps
  • 49.
    • chunk size < (page - direntry size) ~> directory contains “hole”
    • Allocating new direntry requires finding a hole
    • Direntry Lookup is sequential
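
A sketch of packed directory entries with sequential lookup. The byte layout (4-byte inode number, 2-byte entry length, then the name) and the use of inode 0 as "not found" are assumptions. The slides only list the fields and the 256-byte limit.

```c
#include <stdint.h>
#include <string.h>

#define DIRENT_HEADER 6    /* assumed: 4-byte inode number + 2-byte length */
#define DIRENT_MAX    256  /* entry length limit from the slide */

/* Append an entry directly after the last one (no gaps). Returns the new
 * number of used bytes, or 0 if the entry does not fit. */
size_t dirent_append(uint8_t *chunk, size_t used, size_t cap,
                     uint32_t ino, const char *name) {
    size_t len = DIRENT_HEADER + strlen(name);
    if (len > DIRENT_MAX || used + len > cap)
        return 0;
    uint16_t len16 = (uint16_t)len;
    memcpy(chunk + used, &ino, 4);
    memcpy(chunk + used + 4, &len16, 2);
    memcpy(chunk + used + DIRENT_HEADER, name, len - DIRENT_HEADER);
    return used + len;
}

/* Sequential lookup: hop from entry to entry using the length field.
 * Returns the inode number, or 0 if the name is not present. */
uint32_t dirent_lookup(const uint8_t *chunk, size_t used, const char *name) {
    size_t nlen = strlen(name), pos = 0;
    while (pos + DIRENT_HEADER <= used) {
        uint32_t ino;
        uint16_t len;
        memcpy(&ino, chunk + pos, 4);
        memcpy(&len, chunk + pos + 4, 2);
        if (len < DIRENT_HEADER || pos + len > used)
            break;  /* malformed entry: stop rather than run off the chunk */
        if ((size_t)(len - DIRENT_HEADER) == nlen &&
            memcmp(chunk + pos + DIRENT_HEADER, name, nlen) == 0)
            return ino;
        pos += len;
    }
    return 0;
}
```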
  • 50. System Calls
    • Most system calls ( creat , unlink , mkdir …) are atomic transactions
    • A transaction that handles a write() commits only on close()
      • System calls that modify a single file can be bundled into a single transaction
      • 5 consecutive calls to write() + close() on a single file are treated as a single transaction
    • Overhead of transaction commit = actual physical page writes / minimum possible page writes ~ 1
  • 51. Running Out of Space
    • Log-structured file system writes even when user deletes files
    • When flash is full, the system may have too few free pages to delete a file
    • Solution – maintain a count of free+obsolete pages.
    • If the next write lowers this count below a threshold - abort transactions until we have enough free pages
    • Threshold is:
      • c = # of blocks written on direntry delete
      • = max file pages
      • = re-do records per page.
  • 52. Software Engineering
  • 53. Coding
    • Code written with intention to be “humanly readable”
        • (&(transactions[tid]))->f_type = 0x02
        • vs.
        • TRANSACTION_SET_FTYPE(tid, FTYPE_FILE)
    • Embedded development
      • External libraries not an option (math, string)
      • More macros, less functions (stack)
      • No debugging – need good simulator!
      • Must build under several gcc environments: cygwin, debian, arm-gcc
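
The readable-macro style from the slide could be defined along these lines. Only `transactions`, `f_type`, `TRANSACTION_SET_FTYPE`, `FTYPE_FILE` and the value 0x02 come from the slide. The struct layout, `MAX_TRANSACTIONS` and `TRANSACTION_GET_FTYPE` are assumptions.

```c
#include <stdint.h>

#define MAX_TRANSACTIONS 3     /* assumed size of the transaction table */
#define FTYPE_FILE       0x02  /* named constant instead of a magic number */

typedef struct {
    uint8_t f_type;            /* type of the file the transaction works on */
} transaction_t;

transaction_t transactions[MAX_TRANSACTIONS];

/* Accessor macros: readable at the call site, and no call frame on the
 * stack, which matters on RAM-constrained targets. */
#define TRANSACTION_SET_FTYPE(tid, ftype) \
    ((transactions[(tid)]).f_type = (ftype))
#define TRANSACTION_GET_FTYPE(tid) \
    ((transactions[(tid)]).f_type)
```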
  • 54. Incremental development
    • High level and Low level design preceded development
      • 3 weeks
    • Code written bottom up
      • Flash driver –> sequencing layer –> file system layer
      • Caching layer added later. Challenging…
      • 1 year (~commercial code)
    • Test driven development
      • “By hand” (no libraries)
  • 55. My own boss - lessons
    • Time frames
    • Outsider notes
      • Feedback
      • “pairing”
  • 56. Experiments & Tests
  • 57. Testing
    • Extensive test-suite:
      • Integration and performance tests
      • Extensive crash tests
      • Large set of unit tests for every function
    • Integrated into eCos
    • Tests and integration verified on actual 32 MB flash
  • 58. Experiments
    • Simulated 1GB flash
    • Configuration - 512 slots, 8 reserved for bad-block replacement
    • 6 open files and 8 file descriptors
    • 3 concurrent transactions
  • 59. Workload
  • 60. Slot Partitioning
  • 61. Mounting
    • YAFFS mounting time - 2.7s
      • 80% utilization
  • 62. Endurance
    • Repeatedly re-write a small file when the file system contains a static 205MB file.
  • 63. (Some) Challenges in flash
  • 64. Single vs. Multi level cell
    • Flash classified by number of bits stored in a single cell
    SLC (1 bit) vs. MLC (2-4 bits)
      • Smaller capacity
      • Cheaper
      • Errors from partial writes
      • Write-constrained
      • Faster
      • More error-prone
      • Less endurance
  • 65. Parallelism
    • *Picture from N Agrawal, V Prabhakaran, T Wobber (2008)
  • 66.
    • Simple example for utilizing parallelism
    • * J Seol, H Shim, J Kim, and S Maeng (2009)
  • 67. Enterprise storage
    • * SW Lee, B Moon, C Park, JM Kim, SW Kim (2008)
    • Disk bandwidth (sequential) is still 2-3 times higher than flash bandwidth
    • Flash read/write latency is smaller than disk latency by more than an order of magnitude
    • This improves throughput of transaction processing – useful for database servers
  • 68. The End
      • Thank you!