ZFS

Bringing Terabytes Under Control

jay@meangrape.com
who’s the bald guy?


• Jay Edwards
• Long time UNIX sysadmin
• mathematician by training
• OpenSolaris community member
does size matter?
[chart: "Large" storage size and spindle counts, 1995 vs. 2000 vs. 2005]
15 years of pain


• storage management is painful
• current tools are still primitive
• undocumented symlink hell
ZFS factoids


• http://www.opensolaris.org/os/community/zfs/
• http://en.wikipedia.org/wiki/Zfs
I grew up with BSD

• man zfs (1M)
• man zpool (1M)

(image: xkcd.com)
basics 1


• 128 tasty bits
• copy-on-write transactional model
• file system / volume manager
basics 2


• variable block size
• data scrubbing
• checksums
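Scrubbing and checksum verification are driven from the command line; a sketch assuming an existing pool named bigdata (the example pool used later in the deck):

```shell
# Walk every allocated block in the pool and verify it against its checksum.
zpool scrub bigdata

# Watch scrub progress and see any checksum errors that turned up.
zpool status -v bigdata
```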
hi, I’m a zpool
[diagram: pool "bigdata" containing two root vdevs, each a two-way mirror of leaf vdevs]

zpool create bigdata mirror c0t0d0 c0t1d0
zpool add bigdata mirror c1t0d0 c1t1d0
zpools 1


• aggregate visible blocks of storage
• n-way mirrors
• RAIDZ
• hot spares
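The vdev types above map directly onto zpool syntax; a sketch with illustrative Solaris device names:

```shell
# Single-parity RAIDZ across four disks, plus a hot spare.
zpool create bigdata raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 spare c0t4d0

# Or a three-way mirror instead (n-way mirrors are supported):
# zpool create bigdata mirror c1t0d0 c1t1d0 c1t2d0
```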
zpools 2
• local storage
  • whole disks
  • partitions
  • file backed
• FC
• iSCSI (this is red for a reason)
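File-backed vdevs are the cheapest way to experiment; a sketch (paths illustrative, backing files must be at least 64 MB):

```shell
# Create two backing files and mirror them into a throwaway pool.
mkfile 128m /var/tmp/d0 /var/tmp/d1
zpool create sandbox mirror /var/tmp/d0 /var/tmp/d1
zpool status sandbox
```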
zpools 3


• one pool per system
• except it depends
• know your workload
cans and cannots
• can do
  • export
  • upgrade
• can’t do
  • remove root vdevs, raidz
  • rebalance onto new root vdev
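The "can do" items look like this in practice:

```shell
zpool export bigdata    # quiesce and release the pool on the old host
zpool import bigdata    # pick it up on the new host
zpool upgrade bigdata   # move the pool to the current on-disk version
```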
make the filesystems


• filesystems, volumes, snapshots,
  clones
• zfs create bigdata/zfs1
• zfs create bigdata/zfs2
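All four object types come from the same zfs create family; a sketch with illustrative names:

```shell
zfs create bigdata/home                        # filesystem
zfs create -V 10g bigdata/vol1                 # volume (block device)
zfs snapshot bigdata/home@monday               # read-only snapshot
zfs clone bigdata/home@monday bigdata/home-qa  # writable clone of it
```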
configurables
• compression
• NFS
• mountpoint
• sharing
• ditto blocks
• quotas, reservations
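Each configurable is a property set with zfs set; a sketch against the example filesystems, with illustrative values:

```shell
zfs set compression=on bigdata/zfs1
zfs set mountpoint=/export/data bigdata/zfs1
zfs set sharenfs=on bigdata/zfs1        # NFS sharing
zfs set copies=2 bigdata/zfs1           # ditto blocks for user data
zfs set quota=50g bigdata/zfs1
zfs set reservation=10g bigdata/zfs2
```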
backups, cheap today


• zfs send / zfs receive
• full streams, snapshots, incrementals
• u has scriptz?
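A minimal backup round-trip, assuming a second pool named backup (pipe through ssh to reach another host):

```shell
zfs snapshot bigdata/zfs1@sun
zfs send bigdata/zfs1@sun | zfs receive backup/zfs1         # full stream

zfs snapshot bigdata/zfs1@mon
zfs send -i sun bigdata/zfs1@mon | zfs receive backup/zfs1  # incremental
```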
hard drives are smart


  $500K arrays are really smart




          ZFS is smart
some assumptions
• a leaf vdev is 1 spindle
• your hardware is stupid and lies
• space is presented in 1 to 4 MB chunks
• RAM is just for filesystems
I said, “one spindle”


• Bug 6457709: vdev_knobs should be...
• vq_max_pending = 35
• mdb
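Until bug 6457709 lands, the queue depth is only reachable through the debugger; the exact tunable name varies by build, so treat this as a sketch:

```shell
# Drop the per-vdev I/O queue depth from the default 35 to 10 on a live
# kernel (tunable name from later OpenSolaris builds; check your release).
echo zfs_vdev_max_pending/W0t10 | mdb -kw
```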
[chart: Kps (individual, average, total) and IOPS (individual, total) over 24 minutes, log scale 0.1 to 100000]
[chart: the same metrics over 32 minutes]
filthy, lying hardware


• the ZIL issues SCSI write-cache flushes
• zfs_nocacheflush=1
• don’t disable your ZIL
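The cache-flush override is an /etc/system setting, and is only safe when every device sits behind a nonvolatile (battery-backed) write cache:

```
* /etc/system fragment: tell ZFS to skip SCSI cache-flush commands.
* Dangerous on plain disks; committed data can be lost on power failure.
set zfs:zfs_nocacheflush = 1
```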
locality -- I like it

• 1 to 4 MB physical chunks
• what does that mean in practice? don't know
ARC


• adaptive replacement cache
• zfs_arc_max
• mdb
• your database, it loves the RAM, too
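Capping the ARC is likewise an /etc/system setting (value in bytes; the 4 GB cap shown is illustrative):

```
* /etc/system fragment: leave the rest of RAM for the database.
set zfs:zfs_arc_max = 0x100000000
```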
faster, faster


• slogs (separate ZILs)...build 68
• zfs_nocacheflush
• vq_max_pending
• compression (maybe)
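Slogs (build 68 and later) are added like any other vdev; device names illustrative:

```shell
# Put the ZIL on a separate fast device...
zpool add bigdata log c2t0d0

# ...or mirror it so a single slog failure can't hurt:
# zpool add bigdata log mirror c2t0d0 c2t1d0
```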
best feature



ease of administration
worst feature




thinks hardware is stupid
upcoming features




     encryption
     better OLTP
Thanks!