ZFS - The Last Word in Filesystems

This presentation explains why the Zettabyte File System (ZFS) is considered the last word in filesystems. It covers the new features present in ZFS and their internals.




ZFS - The Last Word in Filesystems Presentation Transcript

  • 1. ZFS – The Zettabyte File System – The Last Word in File Systems
    Kalpak Shah, Lustre Group, Sun Microsystems Inc.
  • 2. EXPLOSION OF DATA
    How does it affect you and me?
      - Unlimited downloads
      - Lots of movies and songs
      - High-resolution pictures, literally thousands of them
      - Videos
      - Documents, work-related data
    How does it affect the computer industry?
      - Burgeoning web
      - Larger, more complex databases
      - Data archival
      - High availability
      - Speed
      - Data center costs, power costs, and carbon footprint!
  • 3. What's wrong with existing filesystems?
    No defense against silent data corruption
      - Any defect in disk, controller, cable, driver or firmware can corrupt data silently
    Brutal to manage
      - Partitions, volumes, provisioning, grow/shrink, arghh!
    Lots of limits!
      - Filesystem/volume size, file size, number of files, files per directory. Can you name some more?
    Not portable between platforms (x86, SPARC, ARM)
    Dog slow
      - Linear-time creates, fat locks, fixed block size, naive prefetch, growing backup times
  • 4. ZFS Overview
    Pooled storage
      - Eliminates volume management headaches
      - Does for storage what VM did for memory
    Transactional object system
      - Always consistent on disk – no fsck, ever!
    Provable end-to-end data integrity
      - Detects and corrects silent data corruption
      - Historically considered too expensive – no longer true
    Simple administration
      - Concisely express your intent
  • 5. FS/Volume Model vs. Pooled Storage
    TRADITIONAL VOLUMES:
      - Abstraction: virtual disk
      - Partition/volume for each FS
      - Grow/shrink by hand
      - Each FS has limited bandwidth
      - Storage is fragmented, stranded
    ZFS POOLED STORAGE:
      - Abstraction: malloc/free
      - No partitions to manage
      - Grow/shrink automatically
      - All bandwidth always available
      - All storage in the pool is shared
  • 6. Hands-on ZFS – Pools
    # zpool create home /dev/sda /dev/sdb /dev/sdc /dev/sdd
    # zfs create home/kalpak
    # zfs create home/girish
    # zfs create home/docs
    # zfs create home/kalpak/mail
    # zfs set compression=on home/kalpak/mail
    # zfs list
    NAME              USED   AVAIL  REFER  MOUNTPOINT
    home               72K   18.0T    10K  /home
    home/kalpak      18.5K   18.0T  9.50K  /home/kalpak
    home/kalpak/mail    9K   18.0T     9K  /home/kalpak/mail
    home/docs           9K   18.0T     9K  /home/docs
  • 7. FS/Volume Interfaces vs. ZFS
  • 8. Copy-on-write transactions
    Bonus: Constant-time snapshots
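The copy-on-write transaction model can be sketched in a few lines of toy Python (illustrative only, not ZFS internals): a write never overwrites a live block, so a snapshot is just a saved reference to the old root.

```python
# Toy copy-on-write store: blocks are immutable, and the "filesystem"
# is a mapping from names to block ids. Illustrative only, not ZFS code.

class CowStore:
    def __init__(self):
        self.blocks = {}   # block id -> data, never overwritten in place
        self.next_id = 0
        self.root = {}     # live filesystem: name -> block id

    def write(self, name, data):
        # Copy-on-write: allocate a fresh block, then remap via a new root.
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        self.root = dict(self.root)   # old root mapping stays untouched
        self.root[name] = bid

    def snapshot(self):
        # Constant time: just keep a reference to the current root.
        return self.root

    def read(self, name, root=None):
        root = self.root if root is None else root
        return self.blocks[root[name]]

store = CowStore()
store.write("foo.c", "int main(){}")
snap = store.snapshot()                    # O(1), no data copied
store.write("foo.c", "int main(){return 0;}")
print(store.read("foo.c"))                 # live data: the new version
print(store.read("foo.c", snap))           # snapshot still sees the old version
```

Because the old root and its blocks are never modified, the snapshot costs nothing to create and consumes extra space only as live data diverges from it.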
  • 9. Trends in storage integrity
    Uncorrectable bit error rates have stayed roughly constant
      - 1 in 10^15 bits (~120 TB) for enterprise-class drives
      - Bad sector every 8-20 TB in practice
    Drive capacities doubling every 12-18 months
    Number of drives per deployment increasing
    Cheap flash storage will accelerate this trend
    Experiments at CERN:
      - Simple application to write/verify a 1 GB file
      - After 3 weeks, found 152 instances of silent data corruption
      - Previously, everything was assumed to be fine
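The end-to-end integrity that catches this kind of silent corruption can be illustrated with a toy sketch (hypothetical code; real ZFS stores a fletcher or SHA-256 checksum in each parent block pointer): the checksum lives with the pointer, not next to the data, so anything the drive silently flips is caught on read.

```python
import hashlib

# Toy end-to-end integrity check: a "block pointer" carries the checksum
# of the block it points to, so corruption anywhere below the filesystem
# is detected on read. Illustrative only, not ZFS's on-disk format.

def write_block(disk, addr, data):
    disk[addr] = data
    # The pointer (address + checksum) is stored in the parent block.
    return (addr, hashlib.sha256(data).hexdigest())

def read_block(disk, ptr):
    addr, checksum = ptr
    data = disk[addr]
    if hashlib.sha256(data).hexdigest() != checksum:
        raise IOError("silent corruption detected at block %d" % addr)
    return data

disk = {}
ptr = write_block(disk, 7, b"important data")
disk[7] = b"important dat\x00"   # bit rot: drive returns bad data, no error
try:
    read_block(disk, ptr)
except IOError as e:
    print(e)                     # the checksum in the pointer catches it
```

A checksum stored alongside the data (as in plain sector ECC) cannot catch misdirected or phantom writes; storing it in the parent pointer is what makes the check end-to-end.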
  • 10. Ditto Blocks
    Each logical block can have up to 3 physical blocks
      - Different devices whenever possible
      - Different places on the same device otherwise
    All ZFS metadata has 2+ copies
      - Metadata is precious and its loss can cause significant data loss
    Explicitly settable for important user data
    Detects and corrects silent data corruption
      - In a multi-disk pool, ZFS survives any non-consecutive disk failures
      - In a single-disk pool, ZFS survives loss of up to 1/8 of the platter
  • 11. Self-healing Data in ZFS
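The self-healing read path can be sketched the same way (toy code, not ZFS): verify each copy against the checksum stored in the parent pointer, return a good copy, and rewrite any bad ones from it.

```python
import hashlib

# Toy self-healing mirror read: each logical block has copies on several
# "disks", and the parent pointer carries the expected checksum (computed
# at write time). A bad copy is repaired from a good one. Illustrative only.

def checksum(data):
    return hashlib.sha256(data).hexdigest()

def self_healing_read(disks, addr, expected):
    good = None
    for disk in disks:                     # find any copy that verifies
        if checksum(disk[addr]) == expected:
            good = disk[addr]
            break
    if good is None:
        raise IOError("all copies corrupt")
    for disk in disks:                     # heal the bad copies in place
        if checksum(disk[addr]) != expected:
            disk[addr] = good
    return good

mirror = [{0: b"hello"}, {0: b"hello"}]
expected = checksum(b"hello")              # stored in the parent pointer
mirror[0][0] = b"h3llo"                    # one side silently corrupted
data = self_healing_read(mirror, 0, expected)
print(data)                                # b'hello'
print(mirror[0][0])                        # repaired: b'hello'
```

This is why a mirrored ZFS read can return correct data even when one side lies: the checksum, not the drive, decides which copy is good.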
  • 12. ZFS Scalability
    Immense capacity (128 bits)
      - Moore's law: need the 65th bit in 10-15 years
      - ZFS capacity: 256 quadrillion ZB (1 ZB = 1 billion TB)
    100% dynamic metadata
      - No limits on files, directory entries, number of filesystems, snapshots, etc.
    Concurrent everything
      - Byte-range locking: parallel read/write in conformance with POSIX
      - Parallel, constant-time directory operations
  • 13. ZFS Performance
    Copy-on-write design
      - Turns random writes into sequential writes
      - Intrinsically hot-spot free
    Dynamic striping across all devices
      - Distributes load across devices
      - Write: stripe data across all mirrors
      - Read: wherever data was written
    Variable block size
    Intelligent prefetch
      - Multiple independent prefetch streams
  • 14. ZFS Snapshots
    Read-only point-in-time copy of the filesystem
      - Instantaneous creation, unlimited numbers
      - No additional space used – remember COW
      - Accessible through .zfs/snapshot in the root of each filesystem
    Take a snapshot of Kalpak's data in home:
      # zfs snapshot home/kalpak@tuesday
    Roll back to a previous snapshot:
      # zfs rollback home/kalpak@monday
    Take a look at Wednesday's version of foo.c:
      # cat /home/kalpak/.zfs/snapshot/wednesday/foo.c
  • 15. ZFS Clones, send/receive
    Clones – writable copy of a snapshot
      # zfs clone home/docs@monday home/girish/docs
      - Ideal for saving private copies of mostly shared data
      - Software installations, source code repositories, diskless clients, virtual machines
    ZFS send/receive is powered by snapshots
      - Full backup: any snapshot
      - Incremental backup: any snapshot delta
      - Efficient enough for remote replication
    Generate a full backup:
      # zfs send home/kalpak/docs@monday > backup/monday
    Generate an incremental backup:
      # zfs send -i home/kalpak@A home/kalpak@B > backup/B-A
    Remote replication: send an incremental once per minute:
      # zfs send -i home/fs@11:31 home/fs@11:32 | ssh host zfs recv -d backup/fs
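The incremental send above ships only what changed between two snapshots. A toy Python sketch of that idea (hypothetical code; the real zfs stream format is different):

```python
# Toy incremental "send/receive": a snapshot is modeled as a dict of
# name -> block version, and the stream carries only the differences.
# Illustrative only, not the actual ZFS send stream.

def send_incremental(snap_a, snap_b):
    # Blocks added or changed in B, plus names deleted since A.
    delta = {k: v for k, v in snap_b.items() if snap_a.get(k) != v}
    deleted = [k for k in snap_a if k not in snap_b]
    return delta, deleted

def receive(replica, stream):
    delta, deleted = stream
    replica.update(delta)
    for k in deleted:
        replica.pop(k, None)

snap_a = {"foo.c": 1, "bar.c": 2}
snap_b = {"foo.c": 3, "baz.c": 4}   # foo.c changed, bar.c removed, baz.c added
replica = dict(snap_a)              # the remote side already holds snapshot A
receive(replica, send_incremental(snap_a, snap_b))
print(replica == snap_b)            # True
```

Because ZFS already knows which blocks changed between two snapshots (from the copy-on-write birth times), generating the delta does not require scanning the whole filesystem, which is what makes per-minute replication cheap.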
  • 16. Cool features
    Built-in compression
      - Block-level compression, transparent to all layers above
      - Each block compressed independently
      - All-zero blocks converted to file holes
      - Many compression algorithms available
      # zfs set compression=on home/kalpak
    Quotas
      - Limit Girish to a quota of 10G:
      # zfs set quota=10G home/girish
    Built-in encryption
      - Secure your data
  • 17. ZFS is FREE
    Licensed under the CDDL
    Active ZFS community
      - http://www.opensolaris.org/os/community/zfs
    Ported to other operating systems
      - Apple OS X
      - FreeBSD (in-kernel)
      - FUSE port to Linux
  • 18. QUESTIONS?