ZFS - The Last Word in Filesystems


This presentation explains why the Zettabyte File System (ZFS) is considered the last word in filesystems. It covers the new features in ZFS and their internals.


  1. ZFS – The Zettabyte File System: The Last Word In File Systems. Kalpak Shah, Lustre Group, Sun Microsystems Inc.
  2. EXPLOSION OF DATA
       • How does it affect you and me?
           • Unlimited downloads
           • Lots of movies and songs
           • High resolution pictures and literally thousands of them
           • Videos
           • Documents, work related data
       • How does it affect the computer industry?
           • Burgeoning web
           • Larger, complex databases
           • Data archival
           • High availability
           • Speed
           • Data center, power costs and carbon footprint!
  3. What's wrong with existing filesystems?
       • No defense against silent data corruption
           • Any defect in disk, controller, cable, driver or firmware can corrupt data silently
       • Brutal to manage
           • Partitions, volumes, provisioning, grow/shrink, arghh!
       • Lots of limits!
           • Filesystem/volume size, file size, number of files, files per directory. Can you name some more?
       • Not portable between platforms (x86, SPARC, ARM)
       • Dog slow
           • Linear time creates, fat locks, fixed block size, naive prefetch, growing backup times
  4. ZFS Overview
       • Pooled storage
           • Eliminate volume management headaches
           • Does for storage what VM did for memory
       • Transactional object system
           • Always consistent on disk – no fsck, ever!
       • Provable end-to-end data integrity
           • Detects and corrects silent data corruption
           • Historically considered too expensive – no longer true
       • Simple administration
           • Concisely express your intent
  5. FS/Volume Model vs. Pooled Storage
       TRADITIONAL VOLUMES               ZFS POOLED STORAGE
       Abstraction: virtual disk         Abstraction: malloc/free
       Partition/volume for each FS      No partitions to manage
       Grow/shrink by hand               Grow/shrink automatically
       Each FS has limited bandwidth     All bandwidth always available
       Storage is fragmented, stranded   All storage in the pool is shared
  6. Hands-on ZFS - Pools
       # zpool create home /dev/sda /dev/sdb /dev/sdc /dev/sdd
       # zfs create home/kalpak
       # zfs create home/girish
       # zfs create home/docs
       # zfs create home/kalpak/mail
       # zfs set compression=on home/kalpak/mail
       # zfs list
       NAME              USED   AVAIL  REFER  MOUNTPOINT
       home              72K    18.0T  10K    /home
       home/kalpak       18.5K  18.0T  9.50K  /home/kalpak
       home/kalpak/mail  9K     18.0T  9K     /home/kalpak/mail
       home/docs         9K     18.0T  9K     /home/docs
  7. FS/Volume Interfaces vs. ZFS
  8. Copy-on-write transactions
       • Bonus: Constant-time snapshots
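
The copy-on-write idea on this slide can be illustrated with a small sketch. It is not ZFS code: the `Node` class, `cow_write`, and `read` are hypothetical names, and the two-level tree is an assumed toy layout. The point it shows is that a write never modifies existing blocks; it copies only the path from the changed leaf up to the root, so keeping the old root pointer is enough to have a snapshot.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Immutable tree node: a leaf holds data, an interior node holds children."""
    children: Tuple["Node", ...] = ()
    data: bytes = b""


def cow_write(root: Node, path: Tuple[int, ...], data: bytes) -> Node:
    """Return a NEW root with the leaf at `path` replaced.

    Only nodes along `path` are copied; every other subtree is shared
    with the old tree, which remains fully readable.
    """
    if not path:
        return Node(data=data)
    idx = path[0]
    new_child = cow_write(root.children[idx], path[1:], data)
    children = root.children[:idx] + (new_child,) + root.children[idx + 1:]
    return Node(children=children)


def read(root: Node, path: Tuple[int, ...]) -> bytes:
    """Follow child indices from the root down to a leaf and return its data."""
    node = root
    for idx in path:
        node = node.children[idx]
    return node.data


# Toy layout (an assumption): two interior nodes, each with two leaf blocks.
leaves = tuple(Node(data=bytes([i])) for i in range(4))
live = Node(children=(Node(children=leaves[:2]), Node(children=leaves[2:])))

snapshot = live                          # constant time: remember the old root
live = cow_write(live, (0, 1), b"new")   # the write builds a new root

print(read(snapshot, (0, 1)))  # old contents, still intact
print(read(live, (0, 1)))      # new contents
```

Because the untouched half of the tree is shared between `snapshot` and `live`, the snapshot costs no extra space until blocks diverge.
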
  9. Trends in storage integrity
       • Uncorrectable bit error rates have stayed roughly constant
           • 1 in 10^15 bits (~120TB) for enterprise-class drives
           • Bad sector every 8-20 TB in practice
       • Drive capacities doubling every 12-18 months
       • Number of drives per deployment increasing
       • Cheap flash storage will accelerate this trend
       • Experiments at CERN:
           • Simple application to write/verify 1GB file
           • After 3 weeks, found 152 instances of silent data corruption
           • Previously everything was assumed to be fine
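
As a quick check of the "1 in 10^15 bits (~120TB)" figure on this slide, the conversion can be worked out directly. The variable names are illustrative only; the slide does not say whether it means decimal terabytes or binary tebibytes, so both are computed.

```python
BITS_PER_ERROR = 10**15                  # one uncorrectable error per 10^15 bits read

bytes_per_error = BITS_PER_ERROR / 8     # 1.25e14 bytes
tb_decimal = bytes_per_error / 10**12    # terabytes (10^12 bytes)
tib_binary = bytes_per_error / 2**40     # tebibytes (2^40 bytes)

print(f"{tb_decimal:.1f} TB, {tib_binary:.1f} TiB")
```

The slide's rounded "~120TB" sits between the two results.
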
  10. Ditto Blocks
       • Each logical block can have up to 3 physical blocks
           • Different devices whenever possible
           • Different places on the same device otherwise
       • All ZFS metadata 2+ copies
           • Metadata is precious and its loss can cause significant data loss
       • Explicitly settable for important user data
       • Detects and corrects silent data corruption
           • In a multi-disk pool, ZFS survives any non-consecutive disk failures
           • In a single-disk pool, ZFS survives loss of up to 1/8th of the platter
  11. Self-healing Data in ZFS
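
The slide itself is a diagram, so here is a minimal sketch of the self-healing read it depicts, under stated assumptions: each mirror side is modelled as a plain dict from block number to bytes, SHA-256 stands in for whichever checksum the pool uses, and the expected checksum is passed in by the caller (in ZFS it is kept apart from the data block it protects). `checksum` and `self_healing_read` are hypothetical names, not ZFS functions.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> bytes:
    """Illustrative checksum; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


def self_healing_read(mirrors: List[Dict[int, bytes]], block_no: int,
                      expected: bytes) -> bytes:
    """Return a verified copy of the block and repair any bad mirror copies."""
    good = None
    for disk in mirrors:
        if checksum(disk[block_no]) == expected:
            good = disk[block_no]
            break
    if good is None:
        raise IOError("every copy of the block failed its checksum")
    # Rewrite the copies that did not verify, using the good data.
    for disk in mirrors:
        if checksum(disk[block_no]) != expected:
            disk[block_no] = good
    return good


# One mirror side has silently corrupted data; the other is intact.
disk_a = {0: b"payl0ad"}
disk_b = {0: b"payload"}
expected = checksum(b"payload")

data = self_healing_read([disk_a, disk_b], 0, expected)
print(data, disk_a[0])
```

The application only ever sees verified data, and the damaged copy is fixed as a side effect of the read.
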
  12. ZFS Scalability
       • Immense capacity (128 bits)
           • Moore's law (Need 65th bit in 10-15 years)
           • ZFS capacity: 256 quadrillion ZB (1 ZB = 1 billion TB)
       • 100% dynamic metadata
           • No limits on files, directory entries, number of filesystems, snapshots, etc.
       • Concurrent everything
           • Byte-range locking: parallel read-write in conformance with POSIX
           • Parallel constant time directory operations
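
The "256 quadrillion ZB" figure can be reproduced with integer arithmetic, assuming (this is not stated on the slide) that the 128 bits address individual bytes and that the prefixes are binary, so that 1 ZB = 2^70 bytes and one quadrillion = 2^50.

```python
total_bytes = 2**128             # assumed: 128-bit byte addressing
ZB = 2**70                       # binary zettabyte
TB = 2**40                       # binary terabyte
QUADRILLION = 2**50              # binary "quadrillion" (peta)

capacity_zb = total_bytes // ZB              # 2**58 zettabytes
print(capacity_zb // QUADRILLION)            # quadrillions of ZB
print(ZB // TB)                              # TB per ZB, roughly one billion
```

Under these assumptions the slide's numbers follow exactly: 2^128 / 2^70 = 2^58 = 256 × 2^50, and 1 ZB = 2^30 TB, which is about 1.07 billion.
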
  13. ZFS Performance
       • Copy-on-write design
           • Turns random writes into sequential writes
           • Intrinsically hot-spot free
       • Dynamic striping across all devices
           • Distribute load across devices
           • Write: stripe data across all mirrors
           • Read: wherever data was written
       • Variable block size
       • Intelligent prefetch
           • Multiple independent prefetch streams
  14. ZFS Snapshots
       • Read-only point-in-time copy of the filesystem
           • Instantaneous creation, unlimited numbers
           • No additional space used – remember COW
           • Accessible through .zfs/snapshot in root of each filesystem
       • Take a snapshot of Kalpak's data in home
           # zfs snapshot home/kalpak@tuesday
       • Rollback to a previous snapshot
           # zfs rollback home/kalpak@monday
       • Take a look at Wednesday's version of foo.c
           # cat /home/kalpak/.zfs/snapshot/wednesday/foo.c
  15. ZFS Clones, send/receive
       • Clones – writable copy of a snapshot
           # zfs clone home/docs@monday home/girish/docs
           • Ideal for saving private copies of mostly shared data
           • Software installations, source code repositories, diskless clients, virtual machines
       • ZFS send/receive is powered by snapshots
           • Full backup: any snapshot
           • Incremental backup: any snapshot delta
           • Efficient enough for remote replication
       • Generate full backup
           # zfs send home/kalpak/docs@monday > backup/monday
       • Generate incremental backup
           # zfs send -i home/kalpak@A home/kalpak@B > backup/B-A
       • Remote replication: send incremental once per minute
           # zfs send -i home/fs@11:31 home/fs@11:32 | ssh host zfs recv -d backup/fs
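
The "incremental backup: any snapshot delta" idea can be sketched with two snapshots modelled as dicts from block number to contents. This is only an illustration of what a delta between snapshots A and B must carry; it does not reflect how `zfs send -i` locates changed blocks internally, and `incremental_stream` and `receive` are hypothetical names.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[int, bytes]
Stream = Tuple[Dict[int, bytes], List[int]]


def incremental_stream(snap_a: Snapshot, snap_b: Snapshot) -> Stream:
    """Describe what a receiver holding snap_a needs in order to reach snap_b."""
    changed = {k: v for k, v in snap_b.items() if snap_a.get(k) != v}
    deleted = [k for k in snap_a if k not in snap_b]
    return changed, deleted


def receive(base: Snapshot, stream: Stream) -> Snapshot:
    """Apply an incremental stream on top of the base snapshot."""
    changed, deleted = stream
    result = dict(base)
    result.update(changed)
    for k in deleted:
        del result[k]
    return result


snap_a = {0: b"x", 1: b"y", 2: b"z"}
snap_b = {0: b"x", 1: b"Y", 3: b"w"}   # block 1 rewritten, 2 freed, 3 added

stream = incremental_stream(snap_a, snap_b)
replica = receive(snap_a, stream)
print(stream, replica == snap_b)
```

Only the blocks that differ travel over the wire, which is what makes once-per-minute replication practical.
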
  16. Cool features
       • Built-in compression
           • Block-level compression, transparent to all above layers
           • Each block compressed independently
           • All-zero blocks converted to file holes
           • Many compression algorithms available
           # zfs set compression=on home/kalpak
       • Quotas
           • Limit Girish to a quota of 10G
           # zfs set quota=10G home/girish
       • Built-in encryption
           • Secure your data
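
A rough sketch of the block-level compression described on this slide, assuming a fixed 4096-byte block size and using Python's `zlib` as a stand-in for whichever algorithm the pool is configured with. `compress_blocks` and `decompress_blocks` are hypothetical helpers; a hole is represented simply by the length of the all-zero block it replaces.

```python
import zlib
from typing import List, Union

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[Union[int, bytes]]:
    """Compress each block independently; all-zero blocks become holes."""
    stored: List[Union[int, bytes]] = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        if not any(block):
            stored.append(len(block))          # hole: store only its length
        else:
            stored.append(zlib.compress(block))
    return stored


def decompress_blocks(stored: List[Union[int, bytes]]) -> bytes:
    """Rebuild the original data; holes read back as zeros."""
    return b"".join(
        bytes(entry) if isinstance(entry, int) else zlib.decompress(entry)
        for entry in stored
    )


data = b"A" * BLOCK_SIZE + bytes(BLOCK_SIZE) + b"hello world" * 10
stored = compress_blocks(data)
restored = decompress_blocks(stored)
print(len(stored), restored == data)
```

Because every block is compressed on its own, any single block can be read back without touching its neighbours, and zero-filled regions take no data space at all.
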
  17. ZFS is FREE
       • Licensed as CDDL
       • Active ZFS community
       • Ported to other operating systems
           • Apple OSX
           • FreeBSD (in-kernel)
           • FUSE port to Linux
  18. QUESTIONS?