Solaris 10 Administration Topics Workshop 3 - File Systems (April 2009)

Transcript

  • 1. Solaris 10 Administration Topics Workshop 3 - File Systems By Peter Baer Galvin For Usenix Last Revision April 2009 Copyright 2009 Peter Baer Galvin - All Rights ReservedSaturday, May 2, 2009
  • 2. About the Speaker Peter Baer Galvin - 781 273 4100 pbg@cptech.com www.cptech.com peter@galvin.info My Blog: www.galvin.info Bio: Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading systems integrator and VAR, and was the Systems Manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines. He was contributing editor of the Solaris Corner for SysAdmin Magazine, wrote Pete's Wicked World, the security column for SunWorld magazine, and Pete's Super Systems, the systems administration column there. He is now the Sun columnist for the Usenix ;login: magazine. Peter is co-author of the Operating System Concepts and Applied Operating System Concepts textbooks. As a consultant and trainer, Mr. Galvin has taught tutorials in security and system administration and given talks at many conferences and institutions. Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 3. Objectives Cover a wide variety of topics in Solaris 10 Useful for experienced system administrators Save time Avoid (my) mistakes Learn about new stuff Answer your questions about old stuff Won't read the man pages to you Workshop for hands-on experience and to reinforce concepts Note – Security covered in separate tutorial Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 4. More Objectives What makes a novice vs. an advanced administrator? Bytes as well as bits, tactics and strategy Knows how to avoid trouble How to get out of it once in it How to not make it worse Has a reasoned philosophy Has a methodology Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 5. Prerequisites Recommend at least a couple of years of Solaris experience Or at least a few years of other Unix experience Best is a few years of admin experience, mostly on Solaris Copyright 2009 Peter Baer Galvin - All Rights Reserved 5Saturday, May 2, 2009
  • 6. About the Tutorial Every SysAdmin has a different knowledge set A lot to cover, but notes should make good reference So some covered quickly, some in detail Setting base of knowledge Please ask questions But let’s take off-topic off-line Solaris BOF Copyright 2009 Peter Baer Galvin - All Rights Reserved 6Saturday, May 2, 2009
  • 7. Fair Warning Sites vary Circumstances vary Admin knowledge varies My goals Provide information useful for each of you at your sites Provide opportunity for you to learn from each other Copyright 2009 Peter Baer Galvin - All Rights Reserved 7Saturday, May 2, 2009
  • 8. Why Listen to Me 20 years of Sun experience Seen much as a consultant Hopefully, you've used: My Usenix ;login: column The Solaris Corner @ www.samag.com The Solaris Security FAQ SunWorld “Pete's Wicked World” SunWorld “Pete's Super Systems” Unix Secure Programming FAQ (out of date) Operating System Concepts (The Dino Book), now 8th ed Applied Operating System Concepts Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 9. Slide Ownership As indicated per slide, some slides copyright Sun Microsystems Feel free to share all the slides - as long as you don’t charge for them or teach from them for fee Copyright 2009 Peter Baer Galvin - All Rights Reserved 9Saturday, May 2, 2009
  • 10. Overview Lay of the Land Copyright 2009 Peter Baer Galvin - All Rights ReservedSaturday, May 2, 2009
  • 11. Schedule Times and Breaks Copyright 2009 Peter Baer Galvin - All Rights Reserved 11Saturday, May 2, 2009
  • 12. Coverage Solaris 10+, with some Solaris 9 where needed Selected topics that are new, different, confusing, underused, overused, etc Copyright 2009 Peter Baer Galvin - All Rights Reserved 12Saturday, May 2, 2009
  • 13. Outline Overview Objectives Choosing the most appropriate file system(s) UFS / SDS Veritas FS / VM (not in detail) ZFS Copyright 2009 Peter Baer Galvin - All Rights Reserved 13Saturday, May 2, 2009
  • 14. Polling Time Solaris releases in use? Plans to upgrade? Other OSes in use? Use of Solaris rising or falling? SPARC and x86 OpenSolaris? Copyright 2009 Peter Baer Galvin - All Rights Reserved 14Saturday, May 2, 2009
  • 15. Your Objectives? Copyright 2009 Peter Baer Galvin - All Rights Reserved 15Saturday, May 2, 2009
  • 16. Lab Preparation Have a device capable of telnet on the USENIX network Or have a buddy Learn your “magic number” Telnet to 131.106.62.100 + “magic number” User “root”, password “lisa” It's all very secure Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 17. Lab Preparation Or... Use VirtualBox Use your own system Use a remote machine you have legit access to Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 18. Choosing the Most Appropriate File Systems Copyright 2009 Peter Baer Galvin - All Rights ReservedSaturday, May 2, 2009
  • 19. Choosing the Most Appropriate File Systems Many file systems, many not optional (tmpfs et al) Where you have choice, how to choose? Consider Solaris version being used < S10 means no ZFS ISV support For each ISV make sure desired FS is supported Apps, backups, clustering Priorities Now weigh priorities of performance, reliability, experience, features, risk / reward Copyright 2009 Peter Baer Galvin - All Rights Reserved 19Saturday, May 2, 2009
  • 20. Consider... Pros and cons of mixing file systems Root file system Not much value in using vxfs / vxvm here unless used elsewhere Interoperability (need to detach from one type of system and attach to another?) Cost Supportability & support model Non-production vs. production use Copyright 2009 Peter Baer Galvin - All Rights Reserved 20Saturday, May 2, 2009
  • 21. Root Disk Mirroring The Crux of Performance Copyright 2009 Peter Baer Galvin - All Rights ReservedSaturday, May 2, 2009
  • 22. Topics •Root disk mirroring •ZFS Copyright 2009 Peter Baer Galvin - All Rights Reserved 22Saturday, May 2, 2009
  • 23. Root Disk Mirroring Complicated because Must be bootable Want it protected from disk failure And want the protection to work Can increase or decrease upgrade complexity Veritas Live upgrade Copyright 2009 Peter Baer Galvin - All Rights Reserved 23Saturday, May 2, 2009
  • 24. Manual Mirroring Vxvm encapsulation can cause lack of availability Vxvm needs a rootdg disk Any automatic mirroring can propagate errors Consider Use disksuite (Solaris Volume Manager) to mirror boot disk Use 3rd disk as rootdg, 3rd disksuite metadb, manual mirror copy Or use 10Mb rootdg on 2 boot disks in disksuite to do the mirroring Best of all worlds – details in column at www.samag.com/solaris Copyright 2009 Peter Baer Galvin - All Rights Reserved 24Saturday, May 2, 2009
  • 25. Manual Mirroring Sometimes want more than no mirroring, less than real mirroring Thus "manual mirroring" Nightly cron job to copy partitions elsewhere Can be used to duplicate root disk, if installboot used Combination of newfs, mount, ufsdump | ufsrestore Quite effective, useful, and cheap Easy recovery from corrupt root image, malicious error, sysadmin error Has saved at least one client But disk failure can require manual intervention Complete script can be found at www.samag.com/solaris Copyright 2009 Peter Baer Galvin - All Rights Reserved 25Saturday, May 2, 2009
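A minimal sketch of such a nightly manual-mirror job (this is not the published samag.com script; device names c0t0d0s0 for the live root and c0t2d0s0 for the copy are placeholders, and installboot is shown in its SPARC form):
# echo y | newfs /dev/rdsk/c0t2d0s0
# mount /dev/dsk/c0t2d0s0 /mnt
# ufsdump 0f - /dev/rdsk/c0t0d0s0 | (cd /mnt && ufsrestore rf -)
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t2d0s0
# umount /mnt
Run from cron, this keeps a bootable point-in-time copy of / that can be booted manually if the primary root image is damaged.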
  • 26. Best Practice – Root Disk Have 4 disks for root! 1st is primary boot device 2nd is disksuite mirror of the 1st 3rd is manual mirror of the 1st 4th is manual mirror, kept on a shelf! Put nothing but system files on these disks (/, /var, /opt, /usr, swap) Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 27. Aside: Disk Performance Which is faster: a 73GB drive or a 300GB drive, both 10000 RPM, both 3Gb/sec? Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 28. UFS / SDS Copyright 2009 Peter Baer Galvin - All Rights ReservedSaturday, May 2, 2009
  • 29. UFS Overview Standard pre-Solaris 10 file system Many years old, updated continuously But still showing its age No integrated volume manager; instead use SDS (disk suite) Very fast, but feature poor For example, snapshots exist but are only useful for backups Painful to manage, change, repair Copyright 2009 Peter Baer Galvin - All Rights Reserved
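To illustrate the backup-only nature of UFS snapshots, a short fssnap(1M) sketch (file system, backing-store, and tape paths are examples):
# fssnap -F ufs -o bs=/var/tmp /export/home
/dev/fssnap/0
# mount -F ufs -o ro /dev/fssnap/0 /mnt (read-only frozen view)
# ufsdump 0f /dev/rmt/0 /dev/rfssnap/0 (back up the frozen image)
# fssnap -d /export/home (delete the snapshot when done)
The snapshot is read-only and transient, so unlike ZFS snapshots it is not useful for rollback or user-driven file recovery.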
  • 30. Features 64-bit pointers 16TB file systems (on 64-bit Solaris) 1TB maximum file size metadata logging (by default) increases performance and keeps file systems (usually) consistent after a crash Lots of ISV and internal command (dump) support Only bootable Solaris file system (until S10 10/08) Dynamic multipathing, but via separate “traffic manager” facility Copyright 2009 Peter Baer Galvin - All Rights Reserved 30Saturday, May 2, 2009
  • 31. Issues Sometimes there is still corruption Need to run fsck Sometimes it fails Many limits Many features lacking (compared to ZFS) Lots of manual administration tasks format to slice up a disk newfs to format the file system, fsck to check it mount and /etc/vfstab to mount a file system share commands, plus svcadm commands, to NFS export Plus separate volume management Copyright 2009 Peter Baer Galvin - All Rights Reserved 31Saturday, May 2, 2009
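As a reminder of how many separate steps that is, a typical sequence for bringing a new UFS file system online and exporting it over NFS (device and path names are illustrative):
# format (slice the disk, e.g. create c1t0d0s6)
# newfs /dev/rdsk/c1t0d0s6
# fsck /dev/rdsk/c1t0d0s6
# mkdir /export/data
add to /etc/vfstab: /dev/dsk/c1t0d0s6 /dev/rdsk/c1t0d0s6 /export/data ufs 2 yes logging
# mount /export/data
# share -F nfs -o rw /export/data
# svcadm enable nfs/server
ZFS collapses most of this into a single zfs create plus a sharenfs property, as shown later.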
  • 32. Volume Management Separate set of commands (meta*) to manage volumes (RAID et al) For example, to mirror the root file system Have 2 disks with identical partitioning Have 2 small partitions per disk for meta-data (here slices 5 and 6) newfs the file systems Create meta-data state databases (at least 3, for quorum) # metadb -a /dev/dsk/c0t0d0s5 # metadb -a /dev/dsk/c0t0d0s6 # metadb -a /dev/dsk/c0t1d0s5 # metadb -a /dev/dsk/c0t1d0s6 Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 33. Volume Management (cont) Initialize submirrors (components of mirrors) and mirror the partitions - here we do /, swap, and /var # metainit -f d10 1 1 c0t0d0s0 # metainit -f d20 1 1 c0t1d0s0 # metainit d0 -m d10 Make the new / bootable # metaroot d0 # metainit -f d11 1 1 c0t0d0s1 # metainit -f d21 1 1 c0t1d0s1 # metainit d1 -m d11 # metainit -f d14 1 1 c0t0d0s4 # metainit -f d24 1 1 c0t1d0s4 # metainit d4 -m d14 # metainit -f d17 1 1 c0t0d0s7 # metainit -f d27 1 1 c0t1d0s7 # metainit d7 -m d17 Copyright 2009 Peter Baer Galvin - All Rights Reserved 33Saturday, May 2, 2009
  • 34. Volume Management (cont) Update /etc/vfstab to reflect new meta devices /dev/md/dsk/d1 - - swap - no - /dev/md/dsk/d4 /dev/md/rdsk/d4 /var ufs 1 yes - /dev/md/dsk/d7 /dev/md/rdsk/d7 /export ufs 1 yes - Finally attach the submirror to each device to be mirrored # metattach d0 d20 # metattach d1 d21 # metattach d4 d24 # metattach d7 d27 Now the root disk is mirrored, and commands such as Solaris upgrade, live upgrade, and boot understand that Copyright 2009 Peter Baer Galvin - All Rights Reserved 34Saturday, May 2, 2009
  • 35. Veritas VM / FS Copyright 2009 Peter Baer Galvin - All Rights ReservedSaturday, May 2, 2009
  • 36. Overview A popular, commercial addition to Solaris 64-bit Integrated volume management (vxfs + vxvm) Mirrored root disk via “encapsulation” Good ISV support Good extended features such as snapshots, replication Shrink and grow file systems Extent based (for better and worse), journaled, clusterable Cross-platform Copyright 2009 Peter Baer Galvin - All Rights Reserved 36Saturday, May 2, 2009
  • 37. Features Very large limits Dynamic multipathing included Hot spares to automatically replace failed disks Dirty region logging (DRL) volume transaction logs for fast recovery from crash But still can require consistency check Copyright 2009 Peter Baer Galvin - All Rights Reserved 37Saturday, May 2, 2009
  • 38. Issues $$$ Adds supportability complexities (who do you call) Complicates OS upgrades (unencapsulate first) Fairly complex to manage Comparison of performance vs. ZFS at http://www.sun.com/software/whitepapers/solaris10/zfs_veritas.pdf Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 39. ZFS Copyright 2009 Peter Baer Galvin - All Rights ReservedSaturday, May 2, 2009
  • 40. ZFS Looks to be the “next great thing” Shipped officially in S10U2 (the 06/06 release) From scratch file system Includes volume management, file system, reliability, scalability, performance, snapshots, clones, replication 128-bit file system, almost everything is “infinite” Checksumming throughout Simple, endian independent, export/importable… Still using traffic manager for multipathing (some following slides are from ZFS talk by Jeff Bonwick and Bill Moore – ZFS team leads at Sun) Copyright 2009 Peter Baer Galvin - All Rights Reserved 40Saturday, May 2, 2009
  • 41. Trouble with Existing Filesystems No defense against silent data corruption Any defect in disk, controller, cable, driver, or firmware can corrupt data silently; like running a server without ECC memory Brutal to manage Labels, partitions, volumes, provisioning, grow/shrink, /etc/vfstab... Lots of limits: filesystem/volume size, file size, number of files, files per directory, number of snapshots, ... Not portable between platforms (e.g. x86 to/from SPARC) Dog slow Linear-time create, fat locks, fixed block size, naïve prefetch, slow random writes, dirty region logging Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 42. Design Principles Pooled storage Completely eliminates the antique notion of volumes Does for storage what VM did for memory End-to-end data integrity Historically considered “too expensive” Turns out, no it isn't And the alternative is unacceptable Transactional operation Keeps things always consistent on disk Removes almost all constraints on I/O order Allows us to get huge performance wins Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 43. Why “volumes” Exist In the beginning, each filesystem managed a single disk Customers wanted more space, bandwidth, reliability Rewrite filesystems to handle many disks: hard Insert a little shim (“volume”) to cobble disks together: easy An industry grew up around the FS/volume model Filesystems, volume managers sold as separate products Inherent problems in the FS/volume interface can't be fixed Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 44. Traditional Volumes (diagram: each FS sits on its own volume; one volume is a stripe, the other a mirror) Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 45. ZFS Pools Abstraction: malloc/free No partitions to manage Grow/shrink automatically All bandwidth always available All storage in the pool is shared Copyright 2009 Peter Baer Galvin - All Rights Reserved 45Saturday, May 2, 2009
  • 46. ZFS Pooled Storage (diagram: many file systems share storage pools; one pool is RAID-Z, the other mirrored) Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 48. ZFS Data Integrity Model Everything is copy-on-write Never overwrite live data On-disk state always valid – no “windows of vulnerability” No need for fsck(1M) Everything is transactional Related changes succeed or fail as a whole No need for journaling Everything is checksummed No silent data corruption No panics due to silently corrupted metadata Copyright 2009 Peter Baer Galvin - All Rights Reserved 48Saturday, May 2, 2009
  • 49.-71. (image-only slides - no text captured in the transcript) Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 72. Terms Pool - set of disks in one or more RAID formats (e.g. mirrored stripe) No “/” File system - mountable container of files Data set - file system, block device, snapshot, volume or clone within a pool Named via pool/path[@snapshot] Copyright 2009 Peter Baer Galvin - All Rights Reserved
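A quick illustration of the naming convention (pool and file system names are made up):
tank - the pool, whose top-level dataset is also named tank
tank/home/anne - a file system within the pool
tank/home/anne@friday - a snapshot of that file system
# zfs list -r -t snapshot tank/home/anne (list just the snapshots under that dataset)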
  • 73. Terms (cont) ZIL - ZFS intent log On-disk duplicate of in-memory log of changes to make to data sets Write goes to memory, ZIL, is acknowledged, then goes to disk ARC - in-memory read cache L2ARC - level 2 ARC - on flash memory Copyright 2009 Peter Baer Galvin - All Rights Reserved 73Saturday, May 2, 2009
  • 74. What ZFS doesn't do Can't remove individual devices from pools Rather, replace the device, or 3-way mirror including the device and then remove the device Can't shrink a pool (yet) Can add individual devices, but not optimal (yet) If adding a disk to RAIDZ or RAIDZ2, then end up with RAIDZ(2) + 1 concatenated device Instead add full RAID elements to a pool Add a mirror pair or RAIDZ(2) set Copyright 2009 Peter Baer Galvin - All Rights Reserved
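For example, to grow a pool by whole redundancy groups rather than single disks (device names are placeholders):
# zpool add tank mirror c3t0d0 c3t1d0 (add another mirror pair to a mirrored pool)
# zpool add tank raidz c4t0d0 c4t1d0 c4t2d0 c4t3d0 (add another RAIDZ set to a RAIDZ pool)
# zpool status tank
zpool add warns if the new vdev's redundancy does not match the existing pool layout; -f overrides that warning, which is usually not what you want.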
  • 75. zpool # zpool missing command usage: zpool command args ... where command is one of the following: create [-fn] [-o property=value] ... [-O file-system-property=value] ... [-m mountpoint] [-R root] <pool> <vdev> ... destroy [-f] <pool> add [-fn] <pool> <vdev> ... remove <pool> <device> ... list [-H] [-o property[,...]] [pool] ... iostat [-v] [pool] ... [interval [count]] status [-vx] [pool] ... online <pool> <device> ... offline [-t] <pool> <device> ... clear <pool> [device] Copyright 2009 Peter Baer Galvin - All Rights Reserved 75Saturday, May 2, 2009
  • 76. zpool (cont) attach [-f] <pool> <device> <new-device> detach <pool> <device> replace [-f] <pool> <device> [new-device] scrub [-s] <pool> ... import [-d dir] [-D] import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile] [-D] [-f] [-R root] -a import [-o mntopts] [-o property=value] ... [-d dir | -c cachefile] [-D] [-f] [-R root] <pool | id> [newpool] export [-f] <pool> ... upgrade upgrade -v upgrade [-V version] <-a | pool ...> history [-il] [<pool>] ... get <"all" | property[,...]> <pool> ... set <property=value> <pool> Copyright 2009 Peter Baer Galvin - All Rights Reserved 76Saturday, May 2, 2009
  • 77. zpool (cont) # zpool create ezfs raidz c2t0d0 c3t0d0 c4t0d0 c5t0d0 # zpool status -v pool: ezfs state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM ezfs ONLINE 0 0 0 raidz ONLINE 0 0 0 c2t0d0 ONLINE 0 0 0 c3t0d0 ONLINE 0 0 0 c4t0d0 ONLINE 0 0 0 c5t0d0 ONLINE 0 0 0 errors: No known data errors Copyright 2009 Peter Baer Galvin - All Rights Reserved 77Saturday, May 2, 2009
  • 78. zpool (cont) pool: zfs state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM zfs ONLINE 0 0 0 raidz ONLINE 0 0 0 c0d0s7 ONLINE 0 0 0 c0d1s7 ONLINE 0 0 0 c1d1 ONLINE 0 0 0 c1d0 ONLINE 0 0 0 errors: No known data errors Copyright 2009 Peter Baer Galvin - All Rights Reserved 78Saturday, May 2, 2009
  • 79. zpool (cont) (/)# zpool iostat -v capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- bigp 630G 392G 2 4 41.3K 496K raidz 630G 392G 2 4 41.3K 496K c0d0s6 - - 0 2 8.14K 166K c0d1s6 - - 0 2 7.77K 166K c1d0s6 - - 0 2 24.1K 166K c1d1s6 - - 0 2 22.2K 166K ---------- ----- ----- ----- ----- ----- ----- Copyright 2009 Peter Baer Galvin - All Rights Reserved 79Saturday, May 2, 2009
  • 80. zpool (cont) # zpool status -v pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c0d0s0 ONLINE 0 0 0 c0d1s0 ONLINE 0 0 0 errors: No known data errors pool: zpbg state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM zpbg ONLINE 0 0 0 raidz1 ONLINE 0 0 0 c4t0d0 ONLINE 0 0 0 c4t1d0 ONLINE 0 0 0 c5t0d0 ONLINE 0 0 0 c5t1d0 ONLINE 0 0 0 c6t0d0 ONLINE 0 0 0 errors: No known data errors Copyright 2009 Peter Baer Galvin - All Rights Reserved 80Saturday, May 2, 2009
  • 81. zpool (cont) zpool iostat -v capacity operations bandwidth pool used avail read write read write ---------- ----- ----- ----- ----- ----- ----- rpool 6.72G 225G 0 1 9.09K 11.6K mirror 6.72G 225G 0 1 9.09K 11.6K c0d0s0 - - 0 0 5.01K 11.7K c0d1s0 - - 0 0 5.09K 11.7K ---------- ----- ----- ----- ----- ----- ----- zpbg 3.72T 833G 0 0 32.0K 1.24K raidz1 3.72T 833G 0 0 32.0K 1.24K c4t0d0 - - 0 0 9.58K 331 c4t1d0 - - 0 0 10.3K 331 c5t0d0 - - 0 0 10.4K 331 c5t1d0 - - 0 0 10.3K 331 c6t0d0 - - 0 0 9.54K 331 ---------- ----- ----- ----- ----- ----- ----- Copyright 2009 Peter Baer Galvin - All Rights Reserved 81Saturday, May 2, 2009
  • 82. zpool (cont) Note that for import and export, a pool is the delineator You can't import or export a file system because it's an integral part of a pool This might cause you to use smaller pools than you otherwise would Copyright 2009 Peter Baer Galvin - All Rights Reserved
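A minimal example of moving a whole pool between hosts (reusing the deck's zpbg pool name for illustration):
old-host# zpool export zpbg
new-host# zpool import (with no arguments, lists pools available for import)
new-host# zpool import zpbg (or import by numeric id if the name is ambiguous)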
  • 83. zfs # zfs missing command usage: zfs command args ... where command is one of the following: create [-p] [-o property=value] ... <filesystem> create [-ps] [-b blocksize] [-o property=value] ... -V <size> <volume> destroy [-rRf] <filesystem|volume|snapshot> snapshot [-r] [-o property=value] ... <filesystem@snapname| volume@snapname> rollback [-rRf] <snapshot> clone [-p] [-o property=value] ... <snapshot> <filesystem|volume> promote <clone-filesystem> rename <filesystem|volume|snapshot> <filesystem|volume|snapshot> rename -p <filesystem|volume> <filesystem|volume> rename -r <snapshot> <snapshot> Copyright 2009 Peter Baer Galvin - All Rights Reserved 83Saturday, May 2, 2009
  • 84. zfs (cont) list [-rH] [-o property[,...]] [-t type[,...]] [-s property] ... [-S property] ... [filesystem|volume|snapshot] ... set <property=value> <filesystem|volume|snapshot> ... get [-rHp] [-o field[,...]] [-s source[,...]] <"all" | property[,...]> [filesystem|volume| snapshot] ... inherit [-r] <property> <filesystem|volume|snapshot> ... upgrade [-v] upgrade [-r] [-V version] <-a | filesystem ...> mount mount [-vO] [-o opts] <-a | filesystem> unmount [-f] <-a | filesystem|mountpoint> share <-a | filesystem> unshare [-f] <-a | filesystem|mountpoint> Copyright 2009 Peter Baer Galvin - All Rights Reserved 84Saturday, May 2, 2009
  • 85. zfs (cont) send [-R] [-[iI] snapshot] <snapshot> receive [-vnF] <filesystem|volume|snapshot> receive [-vnF] -d <filesystem> allow [-ldug] <"everyone"|user|group>[,...] <perm|@setname>[,...] <filesystem|volume> allow [-ld] -e <perm|@setname>[,...] <filesystem|volume> allow -c <perm|@setname>[,...] <filesystem|volume> allow -s @setname <perm|@setname>[,...] <filesystem|volume> unallow [-rldug] <"everyone"|user|group>[,...] [<perm|@setname>[,...]] <filesystem|volume> unallow [-rld] -e [<perm|@setname>[,...]] <filesystem|volume> unallow [-r] -c [<perm|@setname>[,...]] <filesystem|volume> unallow [-r] -s @setname [<perm|@setname>[,...]] <filesystem| volume> Each dataset is of the form: pool/[dataset/]*dataset[@name] For the property list, run: zfs set|get For the delegated permission list, run: zfs allow|unallow Copyright 2009 Peter Baer Galvin - All Rights Reserved 85Saturday, May 2, 2009
  • 86. zfs (cont) # zfs get missing property argument usage: get [-rHp] [-o field[,...]] [-s source[,...]] <"all" | property[,...]> [filesystem|volume|snapshot] ... The following properties are supported: PROPERTY EDIT INHERIT VALUES available NO NO <size> compressratio NO NO <1.00x or higher if compressed> creation NO NO <date> mounted NO NO yes | no origin NO NO <snapshot> referenced NO NO <size> type NO NO filesystem | volume | snapshot used NO NO <size> aclinherit YES YES discard | noallow | restricted | passthrough aclmode YES YES discard | groupmask | passthrough atime YES YES on | off Copyright 2009 Peter Baer Galvin - All Rights Reserved 86Saturday, May 2, 2009
  • 87. zfs (cont) canmount YES NO on | off | noauto casesensitivity NO YES sensitive | insensitive | mixed checksum YES YES on | off | fletcher2 | fletcher4 | sha256 compression YES YES on | off | lzjb | gzip | gzip-[1-9] copies YES YES 1 | 2 | 3 devices YES YES on | off exec YES YES on | off mountpoint YES YES <path> | legacy | none nbmand YES YES on | off normalization NO YES none | formC | formD | formKC | formKD primarycache YES YES all | none | metadata quota YES NO <size> | none readonly YES YES on | off recordsize YES YES 512 to 128k, power of 2 refquota YES NO <size> | none refreservation YES NO <size> | none reservation YES NO <size> | none Copyright 2009 Peter Baer Galvin - All Rights Reserved 87Saturday, May 2, 2009
  • 88. zfs (cont) secondarycache YES YES all | none | metadata setuid YES YES on | off shareiscsi YES YES on | off | type=<type> sharenfs YES YES on | off | share(1M) options sharesmb YES YES on | off | sharemgr(1M) options snapdir YES YES hidden | visible utf8only NO YES on | off version YES NO 1 | 2 | 3 | current volblocksize NO YES 512 to 128k, power of 2 volsize YES NO <size> vscan YES YES on | off xattr YES YES on | off zoned YES YES on | off Sizes are specified in bytes with standard units such as K, M, G, etc. User-defined properties can be specified by using a name containing a colon (:). Copyright 2009 Peter Baer Galvin - All Rights Reserved 88Saturday, May 2, 2009
  • 89. zfs (cont) (/)# zfs list NAME USED AVAIL REFER MOUNTPOINT bigp 630G 384G - /zfs/bigp bigp/big 630G 384G 630G /zfs/bigp/big (root@sparky)-(7/pts)-(06:35:11/05/05)- (/)# zfs snapshot bigp/big@5-nov (root@sparky)-(8/pts)-(06:35:11/05/05)- (/)# zfs list NAME USED AVAIL REFER MOUNTPOINT bigp 630G 384G - /zfs/bigp bigp/big 630G 384G 630G /zfs/bigp/big bigp/big@5-nov 0 - 630G /zfs/bigp/big@5-nov # zfs send bigp/big@5-nov | ssh host zfs receive poolB/received/big@5-nov # zfs send -i 5-nov bigp/big@6-nov | ssh host zfs receive poolB/received/big Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 90. zfs (cont) # zpool history History for zpbg: 2006-04-03.11:47:44 zpool create -f zpbg raidz c5t0d0 c10t0d0 c11t0d0 c12t0d0 c13t0d0 2006-04-03.18:19:48 zfs receive zpbg/imp 2006-04-03.18:41:39 zfs receive zpbg/home 2006-04-03.19:04:22 zfs receive zpbg/photos 2006-04-03.19:37:56 zfs set mountpoint=/export/home zpbg/home 2006-04-03.19:44:22 zfs receive zpbg/mail 2006-04-03.20:12:34 zfs set mountpoint=/var/mail zpbg/mail 2006-04-03.20:14:32 zfs receive zpbg/mqueue 2006-04-03.20:15:01 zfs set mountpoint=/var/spool/mqueue zpbg/mqueue # zfs create -V 2g tank/volumes/v2 # zfs set shareiscsi=on tank/volumes/v2 # iscsitadm list target Target: tank/volumes/v2 iSCSI Name: iqn.1986-03.com.sun:02:984fe301-c412-ccc1-cc80-cf9a72aa062a Connections: 0 Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 91. zpool history -l Shows user name, host name, and zone of command # zpool history -l users History for ’users’: 2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0 [user root on corona:global] 2008-07-10.09:43:13 zfs create users/marks [user root on corona:global] 2008-07-10.09:43:44 zfs destroy users/marks [user root on corona:global] 2008-07-10.09:43:48 zfs create users/home [user root on corona:global] 2008-07-10.09:43:56 zfs create users/home/markm [user root on corona:global] 2008-07-10.09:44:02 zfs create users/home/marks [user root on corona:global] Copyright 2009 Peter Baer Galvin - All Rights Reserved 91Saturday, May 2, 2009
  • 92. zpool history -i Shows zfs internal activities - useful for debugging # zpool history -i users History for ’users’: 2008-07-10.09:43:05 zpool create users mirror c1t1d0 c1t2d0 2008-07-10.09:43:13 [internal create txg:6] dataset = 21 2008-07-10.09:43:13 zfs create users/marks 2008-07-10.09:43:48 [internal create txg:12] dataset = 27 2008-07-10.09:43:48 zfs create users/home 2008-07-10.09:43:55 [internal create txg:14] dataset = 33 Copyright 2009 Peter Baer Galvin - All Rights Reserved 92Saturday, May 2, 2009
  • 93. ZFS Delegate Admin Use zfs allow and zfs unallow to grant and remove permissions Use the “delegation” pool property to control whether delegation is enabled Then delegate # zfs allow cindys create,destroy,mount,snapshot tank/cindys # zfs allow tank/cindys ------------------------------------------------------------- Local+Descendent permissions on (tank/cindys) user cindys create,destroy,mount,snapshot ------------------------------------------------------------- # zfs unallow cindys tank/cindys # zfs allow tank/cindys Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 94. ZFS - Odds and Ends zfs get all will display all set attributes of all ZFS file systems Recursive snapshots (via -r) as of S10 8/07 zfs clone makes a RW copy of a snapshot zfs promote sets the root of the file system to be the specified clone You can undo a zpool destroy with zpool import -D As of S10 8/07 ZFS is integrated with FMA As of S10 11/06 ZFS supports double-parity RAID (raidz2) Copyright 2009 Peter Baer Galvin - All Rights Reserved
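A brief sketch of a few of these, with made-up dataset names:
# zfs snapshot tank/ws@base
# zfs clone tank/ws@base tank/ws-test (writable copy of the snapshot)
# zfs promote tank/ws-test (the clone becomes the parent; the original now depends on it)
# zpool destroy tank
# zpool import -D (lists destroyed pools that are still recoverable)
# zpool import -D tank (undoes the destroy, as long as the disks have not been reused)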
  • 95. ZFS “GUI” Did you know that Solaris has an admin GUI? Webconsole is enabled by default Turn it off via svcadm if not used By default (on Nevada build 64 at least) ZFS is the only on-by-default feature Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 97. ZFS Automatic Snapshots In Nevada 100 (LSARC 2008/571) - will be in OpenSolaris 2008.11 SMF service and GNOME app Can take automatic scheduled snapshots By default all zfs file systems, at boot, then every 15 minutes, every hour, every day, etc Auto delete of oldest snapshots if user-defined amount of space is not available Can perform incremental or full backups via those snapshots Nautilus integration allows user to browse and restore files graphically Copyright 2009 Peter Baer Galvin - All Rights Reserved 97Saturday, May 2, 2009
  • 98. ZFS Automatic Snapshots (cont) One SMF service per time frequency: frequent snapshots every 15 mins, keeping 4 snapshots hourly snapshots every hour, keeping 24 snapshots daily snapshots every day, keeping 31 snapshots weekly snapshots every week, keeping 7 snapshots monthly snapshots every month, keeping 12 snapshots Details here: http://src.opensolaris.org/source/xref/jds/zfs-snapshot/README.zfs-auto-snapshot.txt Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 99. ZFS Automatic Snapshots (cont) Service properties provide more details zfs/fs-name The name of the filesystem. If the special filesystem name "//" is used, then the system snapshots only filesystems with the zfs user property "com.sun:auto-snapshot:<label>" set to true, so to take frequent snapshots of tank/timf, run the following zfs command: # zfs set com.sun:auto-snapshot:frequent=true tank/timf The "snap-children" property is ignored when using this fs-name value. Instead, the system automatically determines when it's able to take recursive vs. non-recursive snapshots of the system, based on the values of the ZFS user properties. zfs/interval [ hours | days | months | none] When set to none, we don't take automatic snapshots, but leave an SMF instance available for users to manually fire the method script whenever they want - useful for snapshotting on system events. zfs/keep How many snapshots to retain - e.g. setting this to "4" would keep only the four most recent snapshots. When each new snapshot is taken, the oldest is destroyed. If a snapshot has been cloned, the service will drop to maintenance mode when attempting to destroy that snapshot. Setting to "all" keeps all snapshots. zfs/period How often you want to take snapshots, in intervals set according to "zfs/interval" (e.g. every 10 days) Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 100. ZFS Automatic Snapshots (cont) zfs/snapshot-children "true" if you would like to recursively take snapshots of all child filesystems of the specified fs-name. This value is ignored when setting zfs/fs-name=// zfs/backup [ full | incremental | none ] zfs/backup-save-cmd The command string used to save the backup stream. zfs/backup-lock You shouldn't need to change this - but it should be set to "unlocked" by default. We use it to indicate when a backup is running. zfs/label A label that can be used to differentiate this set of snapshots from others; not required. If multiple schedules are running on the same machine, using distinct labels for each schedule is needed - otherwise one schedule could remove snapshots taken by another schedule according to its snapshot-retention policy (see "zfs/keep") zfs/verbose Set to false by default; setting to true makes the service produce more output about what it's doing. zfs/avoidscrub Set to false by default, this determines whether we should avoid taking snapshots on any pools that have a scrub or resilver in progress. More info in the bugid: 6343667 need itinerary so interrupted scrub/resilver doesn't have to start over Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 101. ZFS Automatic Snapshot (cont) http://blogs.sun.com/erwann/resource/menu-location.png Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 102. ZFS Automatic Snapshot (cont) If the life-preserver icon is enabled in the file browser, then a backup of the directory is available Press it to bring up the nav bar Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 103. ZFS Automatic Snapshot (cont) Drag the slider into the past to show previous versions of files in the directory Then right-click on a file and select “Restore to Desktop” if you want it back More features coming Press to bring up nav bar Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 104. ZFS Status Netbackup, Legato support ZFS for backup / restore VCS supports ZFS as file system of clustered services Most vendors don’t care which file system app runs on Performance as good as other file systems Feature set better Copyright 2009 Peter Baer Galvin - All Rights Reserved 104Saturday, May 2, 2009
  • 105. ZFS Futures Support by ISVs Backup / restore Some don’t get metadata (yet) Use zfs send to emit file containing filesystem Clustering (see Lustre) Performance still a work in progress Being ported to BSD, Mac OS Leopard Check out the ZFS FAQ at http://www.opensolaris.org/os/community/zfs/faq/ Copyright 2009 Peter Baer Galvin - All Rights Reserved 105Saturday, May 2, 2009
  • 106. ZFS Performance From http://www.opensolaris.org/jive/thread.jspa?messageID=14997 billm: On Thu, Nov 17, 2005 at 05:21:36AM -0800, Jim Lin wrote: > Does ZFS reorganize (ie. defrag) the files over time? Not yet. > If it doesn't, it might not perform well in "write-little read-much" > scenarios (where read performance is much more important than write > performance). As always, the correct answer is "it depends". Let's take a look at several cases: - Random reads: No matter if the data was written randomly or sequentially, random reads are random for any filesystem, regardless of their layout policy. Not much you can do to optimize these, except have the best I/O scheduler possible. Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 107. ZFS Performance (cont) - Sequential writes, sequential reads: With ZFS, sequential writes lead to sequential layout on disk. So sequential reads will perform quite well in this case. - Random writes, sequential reads: This is the most interesting case. With random writes, ZFS turns them into sequential writes, which go *really* fast. With sequential reads, you know which order the reads are going to be coming in, so you can kick off a bunch of prefetch reads. Again, with a good I/O scheduler (which ZFS just happens to have), you can turn this into good read performance, if not entirely as good as totally sequential. Believe me, we've thought about this a lot. There is a lot we can do to improve performance, and we're just getting started. Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 108. ZFS Performance (cont) For DBs and other direct-disk-access-wanting applications There is no direct I/O in ZFS But can get very good performance by matching the I/O size of the app (e.g. Oracle uses 8K) with the recordsize of the zfs file system This is set at filesystem create time Copyright 2009 Peter Baer Galvin - All Rights Reserved
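For instance, for an Oracle data file system with an 8K database block size, something like this (pool and dataset names are examples):
# zfs create -o recordsize=8k tank/oradata
# zfs get recordsize tank/oradata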
  • 109. ZFS Performance (cont) The ZIL can be a bottleneck on NFS servers NFS does sync writes Put the ZIL on another disk, or on SSD ZFS aggressively uses memory for caching Low priority user, but can cause temporary conflicts with other users Use arcstat to monitor memory use http://www.solarisinternals.com/wiki/index.php/Arcstat Copyright 2009 Peter Baer Galvin - All Rights Reserved
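On a ZFS version that supports separate log and cache vdevs (log devices arrived before cache devices; check zpool upgrade -v on your release), a sketch with placeholder device names:
# zpool add tank log c6t0d0 (dedicated ZIL / slog device, ideally NVRAM or SSD)
# zpool add tank cache c6t1d0 (L2ARC read-cache device)
# zpool iostat -v tank 5 (the log and cache devices show up as their own rows)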
  • 110. ZFS Backup Tool Zetaback is a thin-agent based ZFS backup tool Runs from a central host Scans clients for new ZFS filesystems Manages varying desired backup intervals (per host) for full backups incremental backups Maintain varying retention policies (per host) Summarize existing backups Restore any host:fs backup at any point in time to any target host https://labs.omniti.com/trac/zetaba Copyright 2009 Peter Baer Galvin - All Rights Reserved 110Saturday, May 2, 2009
  • 111. zfs upgrade On-disk format of ZFS changes over time Forward-upgradeable, but not backward compatible Watch out when attaching and detaching zpools Also, "zfs send" streams are not readable by older zfs versions # zfs upgrade This system is currently running ZFS filesystem version 2. The following filesystems are out of date, and can be upgraded. After being upgraded, these filesystems (and any ’zfs send’ streams generated from subsequent snapshots) will no longer be accessible by older software versions. VER FILESYSTEM --- ------------ 1 datab 1 datab/users 1 datab/users/area51 Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 112. Automatic Snapshots and Backups Unsupported services, may become supported http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_10 http://blogs.sun.com/timf/entry/zfs_automatic_for_the_people Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 113. ZFS - Smashing! http://www.youtube.com/watch?v=CN6iDzesEs0&fmt=18 Copyright 2009 Peter Baer Galvin - All Rights Reserved 113Saturday, May 2, 2009
  • 114. Storage Odds and Ends iostat -y shows performance info on multipathed devices raidctl is RAID configuration tool for multiple RAID controllers fsstat file-system based stat command # fsstat -F new name name attr attr lookup rddir read read write write file remov chng get set ops ops ops bytes ops bytes 0 0 0 0 0 0 0 0 0 0 0 ufs 0 0 0 26.0K 0 52.0K 354 4.71K 1.56M 0 0 proc 0 0 0 0 0 0 0 0 0 0 0 nfs 53.2K 1.02K 24.0K 8.99M 48.6K 4.26M 161K 44.8M 11.8G 23.1M 6.58G zfs 0 0 0 2.94K 0 0 0 0 0 0 0 lofs 7.26K 2.84K 4.30K 31.5K 83 35.4K 6 40.5K 41.3M 45.6K 39.2M tmpfs 0 0 0 410 0 0 0 33 11.0K 0 0 mntfs 0 0 0 0 0 0 0 0 0 0 0 nfs3 0 0 0 0 0 0 0 0 0 0 0 nfs4 0 0 0 0 0 0 0 0 0 0 0 autofs Copyright 2009 Peter Baer Galvin - All Rights Reserved 114Saturday, May 2, 2009
  • 115. Build an OpenSolaris Storage Server in 10 Minutes http://developers.sun.com/openstorage/articles/opensolaris_storage_server.html Example 1: ZFS Filesystem Objectives: Understand the purpose of the ZFS filesystem. Configure a ZFS pool and filesystem. Requirements: A server (SPARC or x64 based) running the OpenSolaris OS. Configuration details from the running server. Step 1: Identify your Disks. Identify the storage available for adding to the ZFS pool using the format(1) command. Your output will vary from that shown here: # format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c0t2d0 /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@2,0 1. c0t3d0 /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@3,0 Specify disk (enter its number): ^D Copyright 2009 Peter Baer Galvin - All Rights Reserved 115Saturday, May 2, 2009
  • 116. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 2: Add your disks to your ZFS pool. # zpool create -f mypool c0t3d0s0 # zpool list NAME SIZE USED AVAIL CAP HEALTH ALTROOT mypool 10G 94K 10.0G 0% ONLINE - Step 3: Create a filesystem in your pool. # zfs create mypool/myfs # df -h /mypool/myfs Filesystem size used avail capacity Mounted on mypool/myfs 9.8G 18K 9.8G 1% /mypool/myfs Copyright 2009 Peter Baer Galvin - All Rights Reserved 116Saturday, May 2, 2009
  • 117. Build an OpenSolaris Storage Server in 10 Minutes - cont Example 2: Network File System (NFS) Objectives: Understand the purpose of the NFS filesystem. Create an NFS shared filesystem on a server and mount it on a client. Requirements: Two servers (SPARC or x64 based) - one from the previous example - running the OpenSolaris OS. Configuration details from the running systems. Step 1: Create the NFS shared filesystem on the server. Switch on the NFS service on the server: # svcs nfs/server STATE STIME FMRI disabled 6:49:39 svc:/network/nfs/server:default # svcadm enable nfs/server Share the ZFS filesystem over NFS: # zfs set sharenfs=on mypool/myfs # dfshares RESOURCE SERVER ACCESS TRANSPORT x4100:/mypool/myfs x4100 - - Copyright 2009 Peter Baer Galvin - All Rights Reserved 117Saturday, May 2, 2009
  • 118. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 2: Switch on the NFS service on the client. This is similar to the procedure for the server: # svcs nfs/client STATE STIME FMRI disabled 6:47:03 svc:/network/nfs/client:default # svcadm enable nfs/client Mount the shared filesystem on the client: # mkdir /mountpoint # mount -F nfs x4100:/mypool/myfs /mountpoint # df -h /mountpoint Filesystem size used avail capacity Mounted on x4100:/mypool/myfs 9.8G 18K 9.8G 1% /mountpoint Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 119. Build an OpenSolaris Storage Server in 10 Minutes - cont Example 3: Common Internet File System (CIFS) Objectives: Understand the purpose of the CIFS filesystem. Configure a CIFS share on one machine (from the previous example) and make it available on the other machine. Requirements: Two servers (SPARC or x64 based) running the OpenSolaris OS. Configuration details provided here. Step 1: Create a ZFS filesystem for CIFS. # zfs create -o casesensitivity=mixed mypool/myfs2 # df -h /mypool/myfs2 Filesystem size used avail capacity Mounted on mypool/myfs2 9.8G 18K 9.8G 1% /mypool/myfs2 Step 2: Switch on the SMB Server service on the server. # svcs smb/server STATE STIME FMRI disabled 6:49:39 svc:/network/smb/server:default # svcadm enable smb/server Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 120. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 3: Share the filesystem using CIFS. # zfs set sharesmb=on mypool/myfs2 Verify using the following command: # zfs get sharesmb mypool/myfs2 NAME PROPERTY VALUE SOURCE mypool/myfs2 sharesmb on local Step 4: Verify the CIFS naming. Because we have not explicitly named the share, we can examine the default name assigned to it using the following command: # sharemgr show -vp default nfs=() zfs zfs/mypool/myfs nfs=() /mypool/myfs zfs/mypool/myfs2 smb=() mypool_myfs2=/mypool/myfs2 Both the NFS share (/mypool/myfs) and the CIFS share (mypool_myfs2) are shown. Step 5: Edit the file /etc/pam.conf to support creation of an encrypted version of the user's password for CIFS. Add the following line to the end of the file: other password required pam_smb_passwd.so.1 nowarn Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 121. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 6: Change the password using the passwd command. # passwd username New Password: Re-enter new Password: passwd: password successfully changed for root Now repeat Steps 5 and 6 on the Solaris client. Step 7: Enable CIFS client services on the client node. # svcs smb/client STATE STIME FMRI disabled 6:47:03 svc:/network/smb/client:default # svcadm enable smb/client Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 122. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 8: Make a mount point on the client and mount the CIFS resource from the server. Mount the resource across the network and check it using the following command sequence: # mkdir /mountpoint2 # mount -F smbfs //root@x4100/mypool_myfs2 /mountpoint2 Password: ******* # df -h /mountpoint2 Filesystem size used avail capacity Mounted on //root@x4100/mypool_myfs2 9.8G 18K 9.8G 1% / mountpoint2 # df -n / : ufs /mountpoint : nfs /mountpoint2 : smbfs Copyright 2009 Peter Baer Galvin - All Rights Reserved 122Saturday, May 2, 2009
  • 123. Build an OpenSolaris Storage Server in 10 Minutes - cont Example 4: Comstar Fibre Channel Target Objectives: Understand the purpose of the Comstar Fibre Channel target. Configure an FC target and initiator on two servers. Requirements: Two servers (SPARC or x64 based) running the OpenSolaris OS. Configuration details provided here. Step 1: Start the SCSI Target Mode Framework and verify it. Use the following commands to start up and check the service on the host that provides the target: # svcs stmf STATE STIME FMRI disabled 19:15:25 svc:/system/device/stmf:default # svcadm enable stmf # stmfadm list-state Operational Status: online Config Status : initialized Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 124. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 2: Ensure that the framework can see the ports. Use the following command to ensure that the target mode framework can see the HBA ports: # stmfadm list-target -v Target: wwn.210000E08B909221 Operational Status: Online Provider Name : qlt Alias : qlt0,0 Sessions : 4 Initiator: wwn.210100E08B272AB5 Alias: ute198:qlc1 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210100E08B296A60 Alias: ute198:qlc3 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210000E08B072AB5 Alias: ute198:qlc0 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210000E08B096A60 Alias: ute198:qlc2 Logged in since: Thu Mar 27 16:38:30 2008 Copyright 2009 Peter Baer Galvin - All Rights Reserved 124Saturday, May 2, 2009
  • 125. Build an OpenSolaris Storage Server in 10 Minutes - cont Target: wwn.210100E08BB09221 Operational Status: Online Provider Name : qlt Alias : qlt1,0 Sessions : 4 Initiator: wwn.210100E08B272AB5 Alias: ute198:qlc1 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210100E08B296A60 Alias: ute198:qlc3 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210000E08B072AB5 Alias: ute198:qlc0 Logged in since: Thu Mar 27 16:38:30 2008 Initiator: wwn.210000E08B096A60 Alias: ute198:qlc2 Logged in since: Thu Mar 27 16:38:30 2008 Copyright 2009 Peter Baer Galvin - All Rights Reserved 125Saturday, May 2, 2009
  • 126. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 3: Create a device to use as storage for the target. Use ZFS to create a volume (zvol) for use as the storage behind the target: # zpool list NAME SIZE USED AVAIL CAP HEALTH ALTROOT mypool 68G 94K 68.0G 0% ONLINE - # zfs create -V 5gb mypool/myvol # zfs list NAME USED AVAIL REFER MOUNTPOINT mypool 5.00G 61.9G 18K /mypool mypool/myvol 5G 66.9G 16K - Copyright 2009 Peter Baer Galvin - All Rights Reserved 126Saturday, May 2, 2009
  • 127. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 4: Register the zvol with the framework. The zvol becomes the SCSI logical unit (disk) behind the target: # sbdadm create-lu /dev/zvol/rdsk/mypool/myvol Created the following LU: GUID DATA SIZE SOURCE 6000ae4093000000000047f3a1930007 5368643584 /dev/zvol/rdsk/mypool/myvol Confirm its existence as follows: # stmfadm list-lu -v LU Name: 6000AE4093000000000047F3A1930007 Operational Status: Online Provider Name : sbd Alias : /dev/zvol/rdsk/mypool/myvol View Entry Count : 0 Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 128. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 5: Find the initiator HBA ports to which to map the LUs. Discover HBA ports on the initiator host using the following command: # fcinfo hba-port HBA Port WWN: 25000003ba0ad303 Port Mode: Initiator Port ID: 1 OS Device Name: /dev/cfg/c5 Manufacturer: QLogic Corp. Model: 2200 Firmware Version: 2.1.145 FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver: Type: L-port State: online Supported Speeds: 1Gb Current Speed: 1Gb Node WWN: 24000003ba0ad303 Copyright 2009 Peter Baer Galvin - All Rights Reserved 128Saturday, May 2, 2009
  • 129. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 5: Find the initiator HBA ports to which to map the LUs. Discover HBA ports on the initiator host using the following command: # fcinfo hba-port HBA Port WWN: 25000003ba0ad303 Port Mode: Initiator Port ID: 1 OS Device Name: /dev/cfg/c5 Manufacturer: QLogic Corp. Model: 2200 Firmware Version: 2.1.145 FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver: Type: L-port State: online Supported Speeds: 1Gb Current Speed: 1Gb Node WWN: 24000003ba0ad303 . . . Copyright 2009 Peter Baer Galvin - All Rights Reserved 129Saturday, May 2, 2009
  • 130. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 6: Create a host group and add the world-wide numbers (WWNs) of the initiator host HBA ports to it. Name the group mygroup: # stmfadm create-hg mygroup # stmfadm list-hg Host Group: mygroup Add the WWNs of the ports to the group: # stmfadm add-hg-member -g mygroup wwn.210000E08B096A60 wwn.210100E08B296A60 wwn.210100E08B272AB5 wwn.210000E08B072AB5 Now check that everything is in order: # stmfadm list-hg-member -v -g mygroup With the host group created, you're now ready to export the logical unit. This is accomplished by adding a view entry to the logical unit using this host group, as shown in the following command: # stmfadm add-view -h mygroup 6000AE4093000000000047F3A1930007 Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 131. Build an OpenSolaris Storage Server in 10 Minutes - cont Step 7: Check the visibility of the targets on the initiator host. First, force the devices on the initiator host to be rescanned with a simple script: #!/bin/ksh fcinfo hba-port | grep "^HBA" | awk '{print $4}' | while read ln do fcinfo remote-port -p $ln -s >/dev/null 2>&1 done The disk exported over FC should then appear in the format list: # format Searching for disks...done c6t6000AE4093000000000047F3A1930007d0: configured with capacity of 5.00GB Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 132. Build an OpenSolaris Storage Server in 10 Minutes - cont ... partition> p Current partition table (default): Total disk cylinders available: 20477 + 2 (reserved cylinders) Part Tag Flag Cylinders Size Blocks 0 root wm 0 - 511 128.00MB (512/0/0) 262144 1 swap wu 512 - 1023 128.00MB (512/0/0) 262144 2 backup wu 0 - 20476 5.00GB (20477/0/0) 10484224 3 unassigned wm 0 0 (0/0/0) 0 4 unassigned wm 0 0 (0/0/0) 0 5 unassigned wm 0 0 (0/0/0) 0 6 usr wm 1024 - 20476 4.75GB (19453/0/0) 9959936 7 unassigned wm 0 0 (0/0/0) 0 partition> Copyright 2009 Peter Baer Galvin - All Rights Reserved 132Saturday, May 2, 2009
  • 133. ZFS Root Solaris 10 10/08 (aka S10U6) supports installation with ZFS as the root file system (as does OpenSolaris) Note that as of U6 you can't flash archive a ZFS root system(!) Can upgrade by using Live Upgrade (LU) to mirror to a second disk (ZFS pool) and upgrading there, then booting there lucreate to copy the primary BE to create an alternate BE # zpool create mpool mirror c1t0d0s0 c1t1d0s0 # lucreate -c c1t2d0s0 -n zfsBE -p mpool The default file systems are created in the specified pool and the non-shared file systems are then copied into the root pool Run luupgrade to upgrade the alternate BE (optional) Run luactivate on the newly upgraded alternative BE so that when the system is rebooted, it will be the new primary BE # luactivate zfsBE Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 134. Life is good Once on ZFS as root, life is good Mirror the root disk with 1 command (if not mirrored): # zpool attach rpool c1t0d0s0 c1t1d0s0 Note that you have to manually do an installboot on the mirrored disk Now consider all the ZFS features, used on the boot disk Snapshot before patch, upgrade, any change Undo change via 1 command Replicate to another system for backup, DR ... Copyright 2009 Peter Baer Galvin - All Rights Reserved 134Saturday, May 2, 2009
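The exact boot-block command depends on platform; roughly (the second-disk device name is a placeholder):
SPARC: # installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c1t1d0s0
x86: # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c1t1d0s0
After that, either half of the mirror can be booted if the other fails.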
  • 135. ZFS Labs What pools are available in your zone? What are their states? What is their performance like? What ZFS file systems? Create a new file system Create a file there Take a snapshot of that file system Delete the file Revert to the file system state as of the snapshot How do you see the contents of a snapshot? Copyright 2009 Peter Baer Galvin - All Rights Reserved 135Saturday, May 2, 2009
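One possible walk-through of the lab, assuming your zone sees a pool named rpool (adjust names to whatever zpool list actually shows):
# zpool list; zpool status; zpool iostat -v 5
# zfs list
# zfs create rpool/lab
# touch /rpool/lab/testfile
# zfs snapshot rpool/lab@before
# rm /rpool/lab/testfile
# zfs rollback rpool/lab@before
# ls /rpool/lab/.zfs/snapshot/before (snapshot contents are browsable under the hidden .zfs directory)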
  • 136. ZFS Final Thought Eric Schrock's Weblog - Thursday Nov 17, 2005 UFS/SVM vs. ZFS: Code Complexity A lot of comparisons have been done, and will continue to be done, between ZFS and other filesystems. People tend to focus on performance, features, and CLI tools as they are easier to compare. I thought I'd take a moment to look at differences in the code complexity between UFS and ZFS. It is well known within the kernel group that UFS is about as brittle as code can get. 20 years of ongoing development, with feature after feature being bolted on, tends to result in a rather complicated system. Even the smallest changes can have wide ranging effects, resulting in a huge amount of testing and inevitable panics and escalations. And while SVM is considerably newer, it is a huge beast with its own set of problems. Since ZFS is both a volume manager and a filesystem, we can use this script written by Jeff to count the lines of source code in each component. Not a true measure of complexity, but a reasonable approximation to be sure. Running it on the latest version of the gate yields: UFS: kernel= 46806 user= 40147 total= 86953 SVM: kernel= 75917 user=161984 total=237901 TOTAL: kernel=122723 user=202131 total=324854 ZFS: kernel= 50239 user= 21073 total= 71312 The numbers are rather astounding. Having written most of the ZFS CLI, I found the most horrifying number to be the 162,000 lines of userland code to support SVM. This is more than twice the size of all the ZFS code (kernel and user) put together! And in the end, ZFS is about 1/5th the size of UFS and SVM. I wonder what those ZFS numbers will look like in 20 years... Copyright 2009 Peter Baer Galvin - All Rights Reserved
  • 137. Copyright 2009 Peter Baer Galvin - All Rights Reserved 137Saturday, May 2, 2009
  • 138. Where to Learn More
Community: http://www.opensolaris.org/os/community/zfs
Wikipedia: http://en.wikipedia.org/wiki/ZFS
ZFS blogs: http://blogs.sun.com/main/tags/zfs
ZFS ports
Apple Mac: http://developer.apple.com/adcnews
FreeBSD: http://wiki.freebsd.org/ZFS
Linux/FUSE: http://zfs-on-fuse.blogspot.com
As an appliance: http://www.nexenta.com
Beginner’s Guide to ZFS: http://www.sun.com/bigadmin/features/articles/zfs_overview.jsp
Copyright 2009 Peter Baer Galvin - All Rights Reserved 138Saturday, May 2, 2009
  • 139. Sun Storage 7x10 Copyright 2009 Peter Baer Galvin - All Rights Reserved 139Saturday, May 2, 2009
  • 140. Speaking of Futures The future of Sun storage? Announced 11/10/2008 Copyright 2009 Peter Baer Galvin - All Rights Reserved 140Saturday, May 2, 2009
  • 141. Most Scalable Storage System Design
• Hybrid Flash Storage Pools
> Data is intelligently placed in DRAM, Flash or Disk
> Transparently managed as one storage pool
> Optimizes $/GB and $/IOP performance
• Enterprise Grade Flash
> 3-5 year lifetime
[Diagram: read-optimized L2ARC SSDs, write-optimized ZIL SSDs, and an HDD pool (SATA) combined into a single hybrid storage pool]
Copyright 2009 Peter Baer Galvin - All Rights Reserved 141Saturday, May 2, 2009
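On a plain Solaris/OpenSolaris box the same hybrid-pool idea is expressed with ZFS cache (L2ARC) and log (ZIL) devices, subject to release support. A minimal sketch with hypothetical device names: tank is an existing pool, c2t0d0 a read-optimized SSD, c3t0d0 and c3t1d0 write-optimized SSDs:
# zpool add tank cache c2t0d0                 (L2ARC: accelerates reads)
# zpool add tank log mirror c3t0d0 c3t1d0     (mirrored ZIL: accelerates synchronous writes)
# zpool status tank                           (the cache and log vdevs appear in the pool layout)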
  • 142. Latency Comparison
Bridging the DRAM to HDD Gap
[Chart: access latency on a log scale from 1 s down to 1 ns - tape and HDD at the slow end, flash/SSD in between, DRAM and CPU at the fast end]
Copyright 2009 Peter Baer Galvin - All Rights Reserved 142Saturday, May 2, 2009
  • 143. ZFS Hybrid Pool Example
Based on Actual Benchmark Results
[Chart comparing a Hybrid Storage Pool (DRAM + Read SSD + Write SSD + 5x 4200 RPM SATA) with a Traditional Storage Pool (DRAM + 7x 10K RPM 2.5”) across Read IOPs, Write IOPs, Cost, Storage Power (Watts), and Raw Capacity (TB); figures shown: 4.9x, 3.2x, 4%, 2x, 11%]
Copyright 2009 Peter Baer Galvin - All Rights Reserved 143Saturday, May 2, 2009
  • 144. Full Complement of Storage Software
Included with the system at no additional cost
Data Protocols:
• NFS v3 and v4 • CIFS • iSCSI • HTTP • WebDAV • FTP • NDMP v4 • FC Target (Roadmap) • InfiniBand (Roadmap)
Data Services:
• Write Flash Acceleration • Read Flash Acceleration • RAID-Z DP (6) • Mirroring • Striping • Active-active Clustering • Remote Replication • Antivirus Quarantine • Snapshots (r/o, r/w, unlimited) • Compression
Data Management:
• DTrace Analytics • Self-healing system and data • Simple out-of-the-box setup • Secure Browser UI and CLI • Advanced Networking • NIS, LDAP, and AD • Users, Roles • SNMP • Dashboard • Alerts • Phone Home • Scripting • Upgrade
Copyright 2009 Peter Baer Galvin - All Rights Reserved 144Saturday, May 2, 2009
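Several of the same protocols can also be lit up on a generic Solaris/OpenSolaris ZFS server through per-dataset share properties; a hedged sketch only (tank/export and tank/vols/lun0 are hypothetical dataset names, and shareiscsi reflects the pre-COMSTAR mechanism of that era):
# zfs set sharenfs=on tank/export             (export over NFS)
# zfs set sharesmb=on tank/export             (export over CIFS/SMB, where the SMB service is available)
# zfs create -V 10g tank/vols/lun0            (create a 10GB zvol)
# zfs set shareiscsi=on tank/vols/lun0        (expose the zvol as an iSCSI target)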
  • 145. Copyright 2009 Peter Baer Galvin - All Rights Reserved 145Saturday, May 2, 2009
  • 146. Providing Unprecedented Storage Analytics
• Automatic real-time visualization of application and storage related workloads
• Simple yet sophisticated instrumentation provides real-time comprehensive analysis
• Supports multiple simultaneous application and workload analyses in real time
• Analyses can be saved, exported and replayed for later study
• Built on DTrace instrumentation (a taste of the underlying D is sketched below)
> NFSv3, NFSv4, CIFS, iSCSI
> ZFS and the Solaris I/O path
> CPU and Memory Utilization
> Networking (TCP, UDP, IP)
Copyright 2009 Peter Baer Galvin - All Rights Reserved 146Saturday, May 2, 2009
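This is not the appliance's actual analytics code, just a hedged taste of the kind of DTrace instrumentation such analytics are built on; both one-liners use only the standard io provider:
# dtrace -n 'io:::start { @[execname] = count(); }'                     (which processes are issuing disk I/O)
# dtrace -n 'io:::start { @[execname] = sum(args[0]->b_bcount); }'      (bytes of I/O issued, by process)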
  • 147. ANSWERING KEY QUESTIONS
“What is CPU and Memory Utilization?”
“How much storage is being utilized?”
“How is disk performing? How many Ops/Sec?”
“What Services are active?”
“Which applications/users are causing performance issues?”
Copyright 2009 Peter Baer Galvin - All Rights Reserved 147Saturday, May 2, 2009
  • 148. Data Services ZFS - Continued
• ZFS Usable Space
> Market Leading Usable Space
Double Parity RAID: 72%
Double Parity RAID, Wide Stripes: 83%
Mirrored: 42%
Single Parity RAID: 60%
Striped: 90%
Copyright 2009 Peter Baer Galvin - All Rights Reserved 148Saturday, May 2, 2009
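For reference, the corresponding layouts on a vanilla ZFS system; a sketch with hypothetical device names, showing alternative pool layouts (pick one). Usable space then depends on stripe width and spares, which is what the percentages above reflect:
# zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0   (double parity RAID-Z2)
# zpool create tank raidz  c1t0d0 c1t1d0 c1t2d0 c1t3d0                 (single parity RAID-Z)
# zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0          (mirrored pairs)
# zpool create tank c1t0d0 c1t1d0 c1t2d0 c1t3d0                        (striped, no redundancy)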
  • 149. Sun Storage 7000 Unified Storage Systems
Price, Performance, Capacity and Availability
7410 Cluster: 288 x 3.5” SATA II disks, up to 287TB* total storage, Hybrid Storage Pool with read- and write-optimized SSD
7410: 288 x 3.5” SATA II disks, up to 287TB* total storage, Hybrid Storage Pool with read- and write-optimized SSD
7210: 48 x 3.5” SATA II disks, up to 46TB total storage, Hybrid Storage Pool with write-optimized SSD
7110: 16 x 2.5” SAS disks, 2.3TB, standard storage pool (SSD is not used)
*Up to 575TB soon after release
Copyright 2009 Peter Baer Galvin - All Rights Reserved 149Saturday, May 2, 2009
  • 150. References You Are Now Free to Move About Solaris Copyright 2009 Peter Baer Galvin - All Rights Reserved 150Saturday, May 2, 2009
  • 151. References  [Kozierok] TCP/IP Guide, No Starch Press, 2005  [Nemeth] Nemeth et al, Unix System Administration Handbook, 3rd edition, Prentice Hall, 2001  [SunFlash] The SunFlash announcement mailing list run by John J. Mclaughlin. News and a whole lot more. Mail sunflash-info@sun.com  Sun online documents at docs.sun.com  [Kasper] Kasper and McClellan, Automating Solaris Installations, SunSoft Press, 1995 Copyright 2009 Peter Baer Galvin - All Rights Reserved 151Saturday, May 2, 2009
  • 152. References (continued)  [O’Reilly] Networking CD Bookshelf, Version 2.0, O’Reilly 2002  [McDougall] Richard McDougall et al, Resource Management, Prentice Hall, 1999 (and other "Blueprint" books)  [Stern] Stern, Eisler, Labiaga, Managing NFS and NIS, 2nd Edition, O’Reilly and Associates, 2001 Copyright 2009 Peter Baer Galvin - All Rights Reserved 152Saturday, May 2, 2009
  • 153. References (continued)  [Garfinkel and Spafford] Simson Garfinkel and Gene Spafford, Practical Unix & Internet Security, 3rd Ed, O’Reilly & Associates, Inc, 2003 (Best overall Unix security book)  [McDougall, Mauro, Gregg] McDougall, Mauro, and Gregg, Solaris Internals and Solaris Performance and Tools, 2007 (great Solaris internals, DTrace, mdb books) Copyright 2009 Peter Baer Galvin - All Rights Reserved 153Saturday, May 2, 2009
  • 154. References (continued)  Subscribe to the Firewalls mailing list by sending "subscribe firewalls <mailing-address>" to Majordomo@GreatCircle.COM  USENIX membership and conferences. Contact USENIX office at (714)588-8649 or office@usenix.org  Sun Support: Sun’s technical bulletins, plus access to bug database: sunsolve.sun.com  Solaris 2 FAQ by Casper Dik: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/Solaris2/FAQ Copyright 2009 Peter Baer Galvin - All Rights Reserved 154Saturday, May 2, 2009
  • 155. References (continued)  Sun Managers Mailing List FAQ by John DiMarco: ftp://ra.mcs.anl.gov/sun-managers/faq Sun’s unsupported tool site (IPv6, printing) http://playground.sun.com/ Sunsolve STBs and Infodocs http://www.sunsolve.com Copyright 2009 Peter Baer Galvin - All Rights Reserved 155Saturday, May 2, 2009
  • 156. References (continued)  comp.sys.sun.* FAQ by Rob Montjoy: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/comp-sys-sun-faq “Cache File System” White Paper from Sun: http://www.sun.com/sunsoft/Products/Solaris-whitepapers/Solaris-whitepapers.html  “File System Organization, The Art of Automounting” by Sun: ftp://sunsite.unc.edu/pub/sun-info/white-papers/TheArtofAutomounting-1.4.ps Solaris 2 Security FAQ by Peter Baer Galvin http://www.sunworld.com/common/security-faq.html Secure Unix Programming FAQ by Peter Baer Galvin http://www.sunworld.com/swol-08-1998/swol-08-security.html Copyright 2009 Peter Baer Galvin - All Rights Reserved 156Saturday, May 2, 2009
  • 157. References (continued)  Firewalls mailing list FAQ: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/firewalls-faq  There are a few Solaris-helping files available via anon ftp at ftp://ftp.cs.toronto.edu/pub/darwin/solaris2 Peter’s Solaris Corner at SysAdmin Magazine http://www.samag.com/solaris  Marcus and Stern, Blueprints for High Availability, Wiley, 2000  Privilege Bracketing in Solaris 10 http://www.sun.com/blueprints/0406/819-6320.pdf Copyright 2009 Peter Baer Galvin - All Rights Reserved 157Saturday, May 2, 2009
  • 158. References (continued) Peter Baer Galvin's Sysadmin Column (and old Pete's Wicked World security columns, etc.) http://www.galvin.info My blog at http://pbgalvin.wordpress.com Operating Environments: Solaris 8 Operating Environment Installation and Boot Disk Layout by Richard Elling http://www.sun.com/blueprints (March 2000) Sun’s BigAdmin web site, including Solaris and Solaris x86 tools and information http://www.sun.com/bigadmin Copyright 2009 Peter Baer Galvin - All Rights Reserved 158Saturday, May 2, 2009
  • 159. References (continued) DTrace http://users.tpg.com.au/adsln4yb/dtrace.html http://www.solarisinternals.com/si/dtrace/index.php http://www.sun.com/bigadmin/content/dtrace/ Copyright 2009 Peter Baer Galvin - All Rights Reserved 159Saturday, May 2, 2009