2009 04.s10-admin-topics2
Transcript

  • 1. Solaris 10 Administration Topics Workshop 2 - Virtualization By Peter Baer Galvin For Usenix Last Revision Apr 2009 Copyright 2009 Peter Baer Galvin - All Rights ReservedSaturday, May 2, 2009
  • 2. About the Speaker Peter Baer Galvin - 781 273 4100 pbg@cptech.com www.cptech.com peter@galvin.info My Blog: www.galvin.info Bio Peter Baer Galvin is the Chief Technologist for Corporate Technologies, Inc., a leading systems integrator and VAR, and was the Systems Manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines. He was contributing editor of the Solaris Corner for SysAdmin Magazine, wrote Pete's Wicked World, the security column for SunWorld magazine, and Pete's Super Systems, the systems administration column there. He is now Sun columnist for the Usenix ;login: magazine. Peter is co-author of the Operating Systems Concepts and Applied Operating Systems Concepts textbooks. As a consultant and trainer, Mr. Galvin has taught tutorials in security and system administration and given talks at many conferences and institutions. Copyright 2008 Peter Baer Galvin - All Rights Reserved 2Saturday, May 2, 2009
  • 3. Objectives Cover a wide variety of topics in Solaris 10 Useful for experienced system administrators Save time Avoid (my) mistakes Learn about new stuff Answer your questions about old stuff Won't read the man pages to you Workshop for hands-on experience and to reinforce concepts Note – Security covered in separate tutorial Copyright 2009 Peter Baer Galvin - All Rights Reserved 3Saturday, May 2, 2009
  • 4. More Objectives What makes novice vs. advanced administrator? Bytes as well as bits, tactics and strategy Knows how to avoid trouble How to get out of it once in it How to not make it worse Has reasoned philosophy Has methodology Copyright 2009 Peter Baer Galvin - All Rights Reserved 4Saturday, May 2, 2009
  • 5. Prerequisites Recommend at least a couple of years of Solaris experience Or at least a few years of other Unix experience Best is a few years of admin experience, mostly on Solaris Copyright 2009 Peter Baer Galvin - All Rights Reserved 5Saturday, May 2, 2009
  • 6. About the Tutorial Every SysAdmin has a different knowledge set A lot to cover, but notes should make good reference So some covered quickly, some in detail Setting base of knowledge Please ask questions But let’s take off-topic off-line Solaris BOF Copyright 2009 Peter Baer Galvin - All Rights Reserved 6Saturday, May 2, 2009
  • 7. Fair Warning Sites vary Circumstances vary Admin knowledge varies My goals Provide information useful for each of you at your sites Provide opportunity for you to learn from each other Copyright 2009 Peter Baer Galvin - All Rights Reserved 7Saturday, May 2, 2009
  • 8. Why Listen to Me 20 Years of Sun experience Seen much as a consultant Hopefully, you've used: My Usenix ;login: column The Solaris Corner @ www.samag.com The Solaris Security FAQ SunWorld "Pete's Wicked World" SunWorld "Pete's Super Systems" Unix Secure Programming FAQ (out of date) Operating System Concepts (The Dino Book), now 8th ed Applied Operating System Concepts Copyright 2009 Peter Baer Galvin - All Rights Reserved 8Saturday, May 2, 2009
  • 9. Slide Ownership As indicated per slide, some slides copyright Sun Microsystems Thanks to Jeff Victor for input Feel free to share all the slides - as long as you don’t charge for them or teach from them for fee Copyright 2009 Peter Baer Galvin - All Rights Reserved 9Saturday, May 2, 2009
  • 10. Overview Lay of the Land Copyright 2009 Peter Baer Galvin - All Rights ReservedSaturday, May 2, 2009
  • 11. Schedule Times and Breaks Copyright 2009 Peter Baer Galvin - All Rights Reserved 11Saturday, May 2, 2009
  • 12. Coverage Solaris 10+, with some Solaris 9 where needed Selected topics that are new, different, confusing, underused, overused, etc Copyright 2009 Peter Baer Galvin - All Rights Reserved 12Saturday, May 2, 2009
  • 13. Outline Overview Objectives Virtualization choices in Solaris Zones / Containers LDOMs and Domains VirtualBox xVM (aka Xen) Copyright 2009 Peter Baer Galvin - All Rights Reserved 13Saturday, May 2, 2009
  • 14. Polling Time Solaris releases in use? Plans to upgrade? Other OSes in use? Use of Solaris rising or falling? SPARC and x86 OpenSolaris? Copyright 2009 Peter Baer Galvin - All Rights Reserved 14Saturday, May 2, 2009
  • 15. Your Objectives? Copyright 2009 Peter Baer Galvin - All Rights Reserved 15Saturday, May 2, 2009
  • 16. Your Lab Environment Apple Macbook Pro 3GB memory Mac OS X 10.4.10 VMware Fusion 1.0 Solaris Nevada 50 Containers Copyright 2009 Peter Baer Galvin - All Rights Reserved 16Saturday, May 2, 2009
  • 17. Lab Preparation Have device capable of telnet on the USENIX network Or have a buddy Learn your "magic number" Telnet to 131.106.62.100+"magic number" User "root", password "lisa" It's all very secure Copyright 2009 Peter Baer Galvin - All Rights Reserved 17Saturday, May 2, 2009
  • 18. Lab Preparation Or... Use virtualbox Use your own system Use a remote machine you have legit access to Copyright 2009 Peter Baer Galvin - All Rights Reserved 18Saturday, May 2, 2009
  • 20. Choosing Virtualization Technologies (See separate “virtualization comparison” document) Copyright 2009 Peter Baer Galvin - All Rights Reserved 20Saturday, May 2, 2009
  • 21. [Slide image; the slide's text is garbled in this transcript. It depicts the spectrum of Solaris virtualization choices along an axis from flexibility to isolation: resource management (Solaris Resource Manager), OS virtualization (Solaris Containers/Zones), virtual machines (Logical Domains, Sun xVM, VMware), and hard partitions (Dynamic System Domains).] Copyright 2009 Peter Baer Galvin - All Rights Reserved 21Saturday, May 2, 2009
  • 22. [Slide image; the slide's text is garbled in this transcript. It contrasts the approaches: OS virtualization (single kernel, heterogeneous application environments, fine-grained security and resource management), hard partitions (maximize hardware isolation), and virtual machines (multiple kernels, full OS environments). The technologies are complementary.] Copyright 2009 Peter Baer Galvin - All Rights Reserved 22Saturday, May 2, 2009
  • 23. [Slide image; the slide's text is garbled in this transcript. Diagram of Solaris Containers and virtual machines: multiple Solaris instances and zones stacked on a single platform.] Copyright 2009 Peter Baer Galvin - All Rights Reserved 23Saturday, May 2, 2009
  • 24. Zones, Containers, and LDOMS Copyright 2009 Peter Baer Galvin - All Rights Reserved 24Saturday, May 2, 2009
  • 25. Overview Cover details and use of Zones/Containers and LDOMs Note that Xen (x64 only) and VirtualBox (open source, x64 only) are coming No slides yet Copyright 2009 Peter Baer Galvin - All Rights Reserved 25Saturday, May 2, 2009
  • 26. Zones Overview Think of them as chroot on steroids Virtualized operating system services Isolated and "secure" environment for running apps Apps and users (and superusers) in a zone cannot see / affect other zones Delegated admin control Virtualized device paths, network interfaces, network ports, process space, resource use (via resource manager) Application fault isolation Detach and attach containers between systems Cloning of a zone to create identical new zone Copyright 2009 Peter Baer Galvin - All Rights Reserved 26Saturday, May 2, 2009
  • 27. Zones Overview - 2 Low physical resource use Up to 8192 zones per system! Differentiated file system Multiple versions of an app installed and running on a given system Inter-zone communication is only via network (but short-pathed through the kernel) No application changes needed – no API or ABI Can restrict disk use of a zone via the loopback file driver (lofi) using a file as a file system Can dedicate an Ethernet port to a zone Allowing snooping, firewalling, managing that port by the zone Copyright 2009 Peter Baer Galvin - All Rights Reserved 27Saturday, May 2, 2009
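The lofi-based disk cap mentioned above can be sketched roughly as follows; the backing-file path, size, and mount point are hypothetical examples (not from the slides), and the lofi device name is printed by lofiadm when the file is attached.

```shell
# Sketch (assumed paths): cap a zone's disk use by backing a file system
# with a fixed-size file via the loopback file driver (lofi).
# Run in the global zone as root.

# Create a 1 GB backing file
mkfile 1g /export/zone-disk/app1.img

# Attach it to a lofi device; lofiadm prints the device, e.g. /dev/lofi/1
lofiadm -a /export/zone-disk/app1.img

# Build a UFS file system on it and mount it where the zone will see it
newfs /dev/rlofi/1
mount /dev/lofi/1 /opt/zone/app1/root/data
```

The zone can then fill /data only up to the backing file's size, without consuming the rest of the global zone's pool.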
  • 28. Other Virtualization Options Many virtualization options to consider Containers are just one of them Xen (xVM) - being integrated into Solaris Nevada Run other OSes (Linux, Windows) with S10+ as the host Industry semi-standard Para-virtualization, x86 only LDOMs - hard partitions, shipped in May 2007 Run multiple copies of Solaris on the same coolthreads chip (Niagara, Rock in the future) Some resource management - move CPUs and mem VMware - Solaris as a guest, not a host so far, x86 only Traditional Sun Domains - SPARC only, Enterprise servers only Copyright 2009 Peter Baer Galvin - All Rights Reserved 28Saturday, May 2, 2009
  • 29. [Slide image; the slide's text is garbled in this transcript. Server-consolidation example: one platform hosting several zones (e.g. web, application, and mysql/database service projects), each with its own zone root under /zone, its own application and security services and users, with zone management, core services, remote admin/monitoring, and platform administration in the global zone and per-zone virtual network interfaces over shared physical devices.] Copyright 2009 Peter Baer Galvin - All Rights Reserved 29Saturday, May 2, 2009
  • 30. (From the Solaris 10 Sun Net Talk about Solaris 10 Security) Copyright 2009 Peter Baer Galvin - All Rights Reserved 30Saturday, May 2, 2009
  • 31. Zone Limits Only one OS installed on a system One set of OS patches Only one /etc/system Although Sun working to move as many settings as possible out of /etc/system System crash / OS crash -> all zones crash Each (sparse) zone uses ~ 100MB of disk some VM and physical memory (for processes and daemons running in the zone) - ~40MB of physical memory Copyright 2009 Peter Baer Galvin - All Rights Reserved 31Saturday, May 2, 2009
  • 32. Sparse vs. Whole Root Zone Sparse: loop-back mount of system directories; little disk space use; each zone shares global-zone system binaries -> shared memory; cannot change system files; apps may not be supported; inter-zone communication only via network. Whole-Root: full install of all system files (/usr, etc); lots of disk space; each binary independent -> memory use; can change system files; apps may not be supported (but more likely); inter-zone communication only via network.Saturday, May 2, 2009
  • 33. [Slide image; the slide's text is garbled in this transcript. Diagram of zone file-system layout: the global zone's root tree with per-zone roots (e.g. /zone-roots/zoneN) each containing its own copies of the system directories.] Copyright 2009 Peter Baer Galvin - All Rights Reserved 33Saturday, May 2, 2009
  • 34. [Slide image; the slide's text is garbled in this transcript. The same zone file-system diagram for a sparse-root zone: system directories are loop-back (lofs) mounted read-only from the global zone into each zone root.] Copyright 2009 Peter Baer Galvin - All Rights Reserved 34Saturday, May 2, 2009
  • 35. Global Zone Aka the usual system Global Is assigned ID 0 by the system Provides the single instance of the Solaris kernel that is bootable and running on the system Contains a complete installation of the Solaris system software packages Can contain additional software packages or additional software, directories, files, and other data not installed through packages Copyright 2009 Peter Baer Galvin - All Rights Reserved 35Saturday, May 2, 2009
  • 36. Global Zone - 2 Provides a complete and consistent product database that contains information about all software components installed in the global zone Holds configuration information specific to the global zone only, such as the global zone host name and file system table Is the only zone that is aware of all devices and all file systems Copyright 2009 Peter Baer Galvin - All Rights Reserved 36Saturday, May 2, 2009
  • 37. Global Zone - 3 Is the only zone with knowledge of non-global zone existence and configuration Is the only zone from which a non-global zone can be configured, installed, managed, or uninstalled Can see the file systems of the non-global zones (i.e. can copy files into the non-global zone roots for the non-global zones to see) Copyright 2009 Peter Baer Galvin - All Rights Reserved 37Saturday, May 2, 2009
  • 38. Non-global Zones Non-Global Is assigned a zone ID by the system when the zone is booted Shares operation under the Solaris kernel booted from the global zone Contains an installed subset of the complete Solaris Operating System software packages Contains Solaris software packages shared from the global zone (“sparse zone”) Can contain additional installed software packages not shared from the global zone Copyright 2009 Peter Baer Galvin - All Rights Reserved 38Saturday, May 2, 2009
  • 39. Non-global Zones -2 Can contain additional software, directories, files, and other data created on the non-global zone that are not installed through packages or shared from the global zone Has a complete and consistent product database that contains information about all software components installed on the zone, whether present on the non-global zone or shared read-only from the global zone Is not aware of the existence of any other zones Cannot install, manage, or uninstall other zones, including itself Has configuration information specific to that non-global zone only, such as the non-global zone host name and file system table Copyright 2009 Peter Baer Galvin - All Rights Reserved 39Saturday, May 2, 2009
  • 40. “Sparse” and “Whole Root” Zones By default /lib, /platform, /sbin, /usr are LOFS read-only mounted from global zone into child zone Ergo those can’t be modified by child zone Packages installed in child zone only install non (/lib, /platform, /sbin, /usr) components into the child zone’s file systems Saves disk space Saves memory Whole root zone removes those mounts Packages install entirely Ergo child zone can modify its /lib, /platform, /sbin, /usr Some apps not supported in zones, some only in whole root, some in sparse root Per app check with app vendor! Note that ZFS clone use for zone builds may mean that sparse root is no longer useful! Copyright 2009 Peter Baer Galvin - All Rights Reserved 40Saturday, May 2, 2009
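A whole-root zone is what you get when the configuration has no inherit-pkg-dir entries. A minimal sketch, using the deck's own create -b idiom (which starts from a blank configuration); the zone name and zonepath here are hypothetical:

```shell
# Hypothetical whole-root zone config. "create -b" starts from a blank
# configuration with no inherit-pkg-dir entries, so /lib, /platform,
# /sbin, and /usr are fully installed (and writable) inside the zone.
zonecfg -z whole1 <<'EOF'
create -b
set zonepath=/opt/zones/whole1
set autoboot=false
commit
EOF
```

Compare with the sparse-root "Zone Script" later in the deck, which explicitly adds inherit-pkg-dir entries for /lib, /platform, /sbin, and /usr.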
  • 41. Non-global Zone States Configured - The zone's configuration is complete and committed to stable storage, but the zone has not yet been booted Incomplete - Transitional state during an install or uninstall operation Installed - The zone's configuration is instantiated on the system, but there is no virtual platform; files have been copied into the zone root Ready - The virtual platform for the zone is established: the kernel creates the zsched process, network interfaces are plumbed, file systems are mounted, and devices are configured. A unique zone ID is assigned by the system, but no processes associated with the zone have been started. Running - User processes associated with the zone application environment are running. Shutting down and Down - Transitional states visible while the zone is being halted; a zone that is unable to shut down for any reason will stop in one of these states. Copyright 2009 Peter Baer Galvin - All Rights Reserved 41Saturday, May 2, 2009
  • 42. (From System Administration Guide: N1Grid Containers, Resource Management, and Solaris Zones) Copyright 2009 Peter Baer Galvin - All Rights Reserved 42Saturday, May 2, 2009
  • 43. Zone boot Note that zoneadm allows “boot” “reboot” “halt” and “shutdown”. Only “shutdown” and “boot” execute the smf commands Also note that there are many options to these commands (such as zoneadm boot -- -m verbose) Copyright 2009 Peter Baer Galvin - All Rights Reserved 43Saturday, May 2, 2009
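The state-change variants above can be sketched as follows; the zone name app1 is borrowed from the configuration walkthrough later in the deck:

```shell
# zoneadm state-change commands and their option syntax (run as root
# in the global zone). Arguments after "--" are passed to the zone's boot.
zoneadm -z app1 boot -- -m verbose   # boot, showing services as they start
zoneadm -z app1 shutdown             # clean shutdown (runs smf shutdown)
zoneadm -z app1 reboot               # abrupt halt + boot (no smf shutdown)
zoneadm -z app1 halt                 # abrupt halt, no smf shutdown scripts
```

Because halt and reboot bypass the smf shutdown sequence, prefer shutdown (or shutdown -r) for zones running services that care about orderly stops.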
  • 44. Zone Configuration Data from the following are not referenced or copied when a zone is installed: Non-installed packages Patches Data on CDs and DVDs Network installation images Any prototype or other instance of a zone In addition, the following types of information, if present in the global zone, are not copied into a zone that is being installed: New or changed users in the /etc/passwd file New or changed groups in the /etc/group file Configurations for networking services such as DHCP address assignment, UUCP, or sendmail Configurations for network services such as naming services New or changed crontab, printer, and mail files System log, message, and accounting files Copyright 2009 Peter Baer Galvin - All Rights Reserved 44Saturday, May 2, 2009
  • 45. Zone Configuration zlogin -C logs in to a just-booted, unconfigured zone Only root can zlogin - normal zone access is via network The usual sysidconfig questions are asked (hostname, name service, timezone, kerberos) The zone root directory must exist prior to zone installation Zone reboots to put configuration changes into effect (a few seconds) Messages look like a system reboot (within your window) Copyright 2009 Peter Baer Galvin - All Rights Reserved 45Saturday, May 2, 2009
  • 46. sysidcfg Create to shorten first boot questions File gets copied into <zonehome>/root/etc Sample contents: name_service=DNS {domain_name=petergalvin.info name_server=63.240.76.19 search=arp.com} network_interface=PRIMARY {hostname=zone00.petergalvin.info} timezone=US/Eastern terminal=vt100 system_locale=C timeserver=localhost root_password=aMG0YPkgZQPqo <obviously change this> security_policy=NONE nfsv4_domain=dynamic Copyright 2009 Peter Baer Galvin - All Rights Reserved 46Saturday, May 2, 2009
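Putting the sysidcfg file in place can be sketched like this; the zonepath /opt/zone/app1 is the one used in the configuration walkthrough that follows, and the source filename is a hypothetical example:

```shell
# Sketch: copy a prepared sysidcfg into the installed zone's /etc
# (i.e. <zonepath>/root/etc) before first boot, so zlogin -C skips
# most of the interactive sysid questions.
cp /var/tmp/sysidcfg /opt/zone/app1/root/etc/sysidcfg
zoneadm -z app1 boot
zlogin -C app1    # console shows the (much shortened) first-boot sequence
```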
  • 47. Zone Configuration - 2 # zonecfg -z app1 app1: No such zone configured Use create to begin configuring a new zone. zonecfg:app1> create zonecfg:app1> set zonepath=/opt/zone/app1 zonecfg:app1> set autoboot=false zonecfg:app1> add net zonecfg:app1:net> set physical=pnc0 zonecfg:app1:net> set address=192.168.118.140 zonecfg:app1:net> end zonecfg:app1> add fs zonecfg:app1:fs> set dir=/export/home zonecfg:app1:fs> set special=/export/home zonecfg:app1:fs> set type=lofs zonecfg:app1:fs> end zonecfg:app1> add inherit-pkg-dir zonecfg:app1:inherit-pkg-dir> set dir=/opt/sfw zonecfg:app1:inherit-pkg-dir> end zonecfg:app1> verify zonecfg:app1> commit zonecfg:app1> exit Copyright 2009 Peter Baer Galvin - All Rights Reserved 47Saturday, May 2, 2009
  • 48. Zone Configuration - 3 # df -k Filesystem kbytes used avail capacity Mounted on /dev/dsk/c0d0s0 5678823 2689099 2932936 48% / /devices 0 0 0 0% /devices /dev/dsk/c0d0p0:boot 10296 1401 8895 14% /boot proc 0 0 0 0% /proc mnttab 0 0 0 0% /etc/mnttab fd 0 0 0 0% /dev/fd swap 600780 28 600752 1% /var/run swap 600776 24 600752 1% /tmp /dev/dsk/c0d0s7 4030684 32853 3957525 1% /export/home # zoneadm -z app1 verify WARNING: /opt/zone/app1 does not exist, so it cannot be verified. When zoneadm install is run, install will try to create /opt/zone/app1, and verify will be tried again, but the verify may fail if: the parent directory of /opt/zone/app1 is group- or other-writable or /opt/zone/app1 overlaps with any other installed zones. could not verify net address=192.168.118.140 physical=pnc0: No such device or address zoneadm: zone app1 failed to verify Copyright 2009 Peter Baer Galvin - All Rights Reserved 48Saturday, May 2, 2009
  • 49. Zone Configuration - 4 # ls -l /opt/zone total 2 drwx------ 4 root other 512 Aug 21 12:44 test # mkdir /opt/zone/app1 # chmod 700 /opt/zone/app1 # ls -l /opt/zone total 4 drwx------ 2 root other 512 Sep 16 15:14 app1 drwx------ 4 root other 512 Aug 21 12:44 test # zoneadm -z app1 verify could not verify net address=192.168.118.140 physical=pnc0: No such device or address zoneadm: zone app1 failed to verify # zonecfg -z app1 zonecfg:app1> info zonepath: /opt/zone/app1 autoboot: false Copyright 2009 Peter Baer Galvin - All Rights Reserved 49Saturday, May 2, 2009
  • 50. Zone Configuration - 5 net: address: 192.168.118.140 physical: pnc0 zonecfg:app1> remove physical=pnc0 zonecfg:app1> add net zonecfg:app1:net> set physical=pcn0 zonecfg:app1:net> set address=192.168.118.140 zonecfg:app1:net> end zonecfg:app1> exit # zoneadm -z app1 verify # zoneadm -z app1 install Preparing to install zone <app1>. Creating list of files to copy from the global zone. Copying <2199> files to the zone. Initializing zone product registry. Determining zone package initialization order. Preparing to initialize <779> packages on the zone. Initializing package <0> of <779>: percent complete: 0% . . . Copyright 2009 Peter Baer Galvin - All Rights Reserved 50Saturday, May 2, 2009
  • 51. Zone Configuration -6 Zone <app1> is initialized. The file </opt/zone/app1/root/var/sadm/system/logs/install_log> contains a log of the zone installation. # zoneadm list -v ID NAME STATUS PATH 0 global running / 1 test running /opt/zone/test # df -k Filesystem kbytes used avail capacity Mounted on /dev/dsk/c0d0s0 5678823 2766177 2855858 50% / /devices 0 0 0 0% /devices /dev/dsk/c0d0p0:boot 10296 1401 8895 14% /boot proc 0 0 0 0% /proc mnttab 0 0 0 0% /etc/mnttab fd 0 0 0 0% /dev/fd swap 594332 32 594300 1% /var/run swap 594500 200 594300 1% /tmp /dev/dsk/c0d0s7 4030684 32853 3957525 1% /export/home Copyright 2009 Peter Baer Galvin - All Rights Reserved 51Saturday, May 2, 2009
  • 52. Zone Configuration -7 # zoneadm -z app1 boot zoneadm: zone app1: WARNING: pcn0:2: no matching subnet found in netmasks(4) for 192.168.118.131; using default of 192.168.118.131. # zoneadm list -v ID NAME STATUS PATH 0 global running / 1 test running /opt/zone/test 2 app1 running /opt/zone/app1 # telnet 192.168.118.140 Trying 192.168.118.140... telnet: Unable to connect to remote host: Connection refused # zlogin -C app1 [Connected to zone app1 console] Select a Locale 0. English (C - 7-bit ASCII) 1. U.S.A. (UTF-8) 2. Go Back to Previous Screen Please make a choice (0 - 2), or press h or ? for help: 0 . . . Copyright 2009 Peter Baer Galvin - All Rights Reserved 52Saturday, May 2, 2009
  • 53. Zone Configuration -8 rebooting system due to change(s) in /etc/default/init [NOTICE: Zone rebooting] SunOS Release 5.10 Version s10_63 32-bit Copyright 1983-2004 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. Hostname: zone-app1 The system is coming up. Please wait. starting rpc services: rpcbind done. syslog service starting. Sep 16 15:48:24 zone-app1 sendmail[7567]: My unqualified host name (zone-app1) unknown; sleeping for retry Sep 16 15:49:24 zone-app1 sendmail[7567]: unable to qualify my own domain name (zone-app1) -- using short name WARNING: local host name (zone-app1) is not qualified; see cf/ README: WHO AM I? /etc/mail/aliases: 12 aliases, longest 10 bytes, 138 bytes total Copyright 2009 Peter Baer Galvin - All Rights Reserved 53Saturday, May 2, 2009
  • 54. Zone Configuration -9 Creating new rsa public/private host key pair Creating new dsa public/private host key pair The system is ready. zone-app1 console login: root Password: Sep 16 15:51:08 zone-app1 login: ROOT LOGIN /dev/console Sun Microsystems Inc. SunOS 5.10 s10_63 May 2004 # cat /etc/passwd root:x:0:1:Super-User:/:/sbin/sh daemon:x:1:1::/: bin:x:2:2::/usr/bin: . . . noaccess:x:60002:60002:No Access User:/: nobody4:x:65534:65534:SunOS 4.x NFS Anonymous Access User:/: Copyright 2009 Peter Baer Galvin - All Rights Reserved 54Saturday, May 2, 2009
  • 55. Zone Configuration -10 # useradd -u 101 -g 14 -d /export/home/pbg -s /bin/bash pbg # passwd pbg New Password: Re-enter new Password: passwd: password successfully changed for pbg # zoneadm list -v ID NAME STATUS PATH 3 app1 running / # exit zone-app1 console login: ~. [Connection to zone app1 console closed] Copyright 2009 Peter Baer Galvin - All Rights Reserved 55Saturday, May 2, 2009
  • 56. Zone Configuration - 11 # zoneadm list -v ID NAME STATUS PATH 0 global running / 1 test running /opt/zone/test 3 app1 running /opt/zone/app1 # uptime 3:53pm up 5:14, 1 user, load average: 0.23, 0.34, 0.43 # telnet 192.168.118.140 Trying 192.168.118.140... Connected to 192.168.118.140. Escape character is '^]'. Login: pbg Password: Copyright 2009 Peter Baer Galvin - All Rights Reserved 56Saturday, May 2, 2009
  • 57. Zones and ZFS Installing a zone with its root on ZFS is not supported as the system then lacks the ability to be upgraded. Note that “add fs” can be used to add access to a ZFS file system to a zone Beyond that, “add dataset” delegates a ZFS file system to a zone, removes it from the global zone The zone can manage the file system, except where management would affect other file systems / parent file system Filesystem contents can still be seen from global zone via zonepath +mountpoint (i.e. /zones/zone00/zfs/zonefs/zone00) # zfs create zfs/zonefs/zone00 # zonecfg -z zone00 zonecfg:zone00> add dataset zonecfg:zone00:dataset> set name=zfs/zonefs/zone00 zonecfg:zone00:dataset> end Copyright 2009 Peter Baer Galvin - All Rights Reserved 57Saturday, May 2, 2009
  • 58. Zone Script create -b set zonepath=/opt/zones/zone0 set autoboot=false add inherit-pkg-dir set dir=/lib end add inherit-pkg-dir set dir=/platform end add inherit-pkg-dir set dir=/sbin end Copyright 2009 Peter Baer Galvin - All Rights Reserved 58Saturday, May 2, 2009
  • 59. Zone Script add inherit-pkg-dir set dir=/usr end add inherit-pkg-dir set dir=/opt/sfw end add net set address=192.168.128.200 set physical=pcn0 end add rctl set name=zone.cpu-shares add value (priv=privileged,limit=1,action=none) end Copyright 2009 Peter Baer Galvin - All Rights Reserved 59Saturday, May 2, 2009
  • 60. Life in a Zone # ifconfig -a lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1 inet 127.0.0.1 netmask ff000000 lo0:1: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1 zone test inet 127.0.0.1 netmask ff000000 lo0:2: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1 zone app1 inet 127.0.0.1 netmask ff000000 pcn0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2 inet 192.168.80.128 netmask ffffff00 broadcast 192.168.80.255 ether 0:c:29:44:a9:df pcn0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2 zone test inet 192.168.80.139 netmask ffffff00 broadcast 192.168.80.255 pcn0:2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2 zone app1 inet 192.168.80.140 netmask ffffff00 broadcast 192.168.80.255 Copyright 2009 Peter Baer Galvin - All Rights Reserved 60Saturday, May 2, 2009
  • 61. Life in a Zone - 2 $ telnet 192.168.80.140 . . . $ df -k Filesystem kbytes used avail capacity Mounted on / 9515147 1894908 7525088 21% / /dev 9515147 1894908 7525088 21% /dev /export/home 10076926 10369 9965788 1% /export/home /lib 9515147 1894908 7525088 21% /lib /platform 9515147 1894908 7525088 21% /platform /sbin 9515147 1894908 7525088 21% /sbin /usr 9515147 1894908 7525088 21% /usr proc 0 0 0 0% /proc mnttab 0 0 0 0% /etc/mnttab fd 0 0 0 0% /dev/fd swap 1043072 16 1043056 1% /var/run swap 1043056 0 1043056 0% /tmp $ touch /usr/foo touch: /usr/foo cannot create Note that virtual memory (and therefore swap) are global resources Copyright 2009 Peter Baer Galvin - All Rights Reserved 61Saturday, May 2, 2009
  • 62. Life in a Zone - 3 $ ps -ef UID PID PPID C STIME TTY TIME CMD root 11120 11120 0 11:00:35 ? 0:00 zsched pbg 11377 11347 0 11:01:28 pts/8 0:00 ps -ef root 11229 11120 0 11:00:40 ? 0:00 /usr/sbin/cron root 11341 11120 0 11:00:46 ? 0:00 /usr/sfw/sbin/snmpd root 11266 11120 0 11:00:41 ? 0:00 /usr/lib/im/htt -port 9010 -s yslog -message_locale C root 11339 11336 0 11:00:46 ? 0:00 /usr/lib/saf/ttymon root 11250 11120 0 11:00:41 ? 0:00 /usr/lib/utmpd root 11264 11261 0 11:00:41 ? 0:00 /usr/sadm/lib/smc/bin/smcboot root 11261 11120 0 11:00:41 ? 0:00 /usr/sadm/lib/smc/bin/smcboot root 11227 11120 0 11:00:40 ? 0:00 /usr/sbin/nscd root 11218 11120 0 11:00:40 ? 0:00 /usr/lib/autofs/automountd root 11325 11120 0 11:00:45 ? 0:00 /usr/lib/dmi/snmpXdmid -s zon e-app1 root 11239 11120 0 11:00:40 ? 0:00 /usr/lib/sendmail -bd -q15m root 11265 11261 0 11:00:41 ? 0:00 /usr/sadm/lib/smc/bin/smcboot root 11230 11120 0 11:00:40 ? 0:00 /usr/sbin/inetd -s root 11273 11266 0 11:00:42 ? 0:00 htt_server -port 9010 -syslog -message_locale C root 11129 11120 0 11:00:36 ? 0:00 init Copyright 2009 Peter Baer Galvin - All Rights Reserved 62Saturday, May 2, 2009
  • 63. Life in a Zone - 4 # mount -p / - / ufs - no rw,intr,largefiles,logging,xattr,onerror=panic /dev - /dev lofs - no zonedevfs /export/home - /export/home lofs - no /lib - /lib lofs - no ro,nodevices,nosub /platform - /platform lofs - no ro,nodevices,nosub /sbin - /sbin lofs - no ro,nodevices,nosub /usr - /usr lofs - no ro,nodevices,nosub proc - /proc proc - no nodevices,zone=app1 mnttab - /etc/mnttab mntfs - no nodevices,zone=app1 fd - /dev/fd fd - no rw,nodevices,zone=app1 swap - /var/run tmpfs - no nodevices,xattr,zone=app1 swap - /tmp tmpfs - no nodevices,xattr,zone=app1 # hostname zone-app1 # zonename app1 Copyright 2009 Peter Baer Galvin - All Rights Reserved 63Saturday, May 2, 2009
  • 64. Zone Clone As of S10 8/07, zones are “cloneable” Much faster than installing a zone As of 10/08 zones on ZFS -> ZFS clone - instantaneous Usable only if the zones have similar configs Configure a zone i.e. zone00 Install the zone Configure a new zone i.e. zone01 Then rather than zoneadm install, with zone00 halted, do # zoneadm -z zone01 clone -m copy zone00 Copyright 2009 Peter Baer Galvin - All Rights Reserved 64Saturday, May 2, 2009
  • 65. Zone Clone (cont) A cloned zone is unconfigured and must be configured When ZFS used as clone file system # zoneadm -z <newzone> clone <oldzone> Can clone a zone’s previously-taken snapshot via # zoneadm -z <newzone> clone -s <snapshot name> <oldzone> Copyright 2009 Peter Baer Galvin - All Rights Reserved 65Saturday, May 2, 2009
  • 66. Zone Clone (cont) So to clone zone1 to make zone2 # zonecfg -z zone1 export -f configfile Edit configfile to change zonepath and address (at least) Create zone2 via zonecfg -z zone2 -f configfile Halt zone1 via zoneadm -z zone1 halt Clone zone1 via zoneadm -z zone2 clone zone1 Use “-m copy” if zone1 on UFS Boot up both zones Check status via zoneadm list -iv Copyright 2009 Peter Baer Galvin - All Rights Reserved 66Saturday, May 2, 2009
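The clone steps on this slide, gathered into one sketch (zone names zone1/zone2 are the slide's own; the config-file path is a hypothetical example):

```shell
# Clone zone1 to make zone2, following the steps above.
zonecfg -z zone1 export -f /tmp/zone2.cfg

# Edit the exported config: at minimum change zonepath and the net address
vi /tmp/zone2.cfg

zonecfg -z zone2 -f /tmp/zone2.cfg   # create zone2 from the edited config
zoneadm -z zone1 halt                # source zone must be down to clone
zoneadm -z zone2 clone zone1         # add "-m copy" if zone1 is on UFS
zoneadm -z zone1 boot
zoneadm -z zone2 boot
zoneadm list -iv                     # check both zones' status
```

Remember from the previous slide that the cloned zone comes up unconfigured, so its first boot runs the sysid questions (or reads a sysidcfg you staged in its zone root).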
  • 67. Zone Migration Zones can be moved between like systems Available S10 8/07 Separate the zone from its current system # zoneadm -z <zone> detach Note zone must be halted first Attach a detached zone to a different system (assuming its file system is now visible there, send a tarball, etc) # zoneadm -z <zone> attach [-F] Note zone must be configured before this can work Note new system is validated to assure the zone can function there To create a config for a zone that is detached rather than having to zonecfg it from scratch # zonecfg -z <zone> create -a zonepath Copyright 2009 Peter Baer Galvin - All Rights Reserved 67Saturday, May 2, 2009
  • 68. Zone Migration (cont) Can dry-run an attach / detach via the “-n” option to see if the attach will work Can upgrade the attaching zone on the attaching system via “-u” but only if all packages on the attaching system are as new or newer than the detaching system Can force an attach if a detach could not be done (dead system for example) Best to save your zone cfg files for use on the attach system (or you have to recreate them) Copyright 2009 Peter Baer Galvin - All Rights Reserved 68Saturday, May 2, 2009
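A sketch of the detach/attach move between two hosts; the hostnames, zone name, and the tar-over-ssh transfer are illustrative (any way of making the zonepath visible on the target works, including shared storage):

```
hostA# zoneadm -z myzone halt
hostA# zoneadm -z myzone detach
hostA# cd /zones && tar cf - myzone | ssh hostB 'cd /zones && tar xf -'
hostB# zonecfg -z myzone create -a /zones/myzone  # config from the detached zone
hostB# zoneadm -z myzone attach -n                # dry run first
hostB# zoneadm -z myzone attach                   # or "attach -u" to update packages
hostB# zoneadm -z myzone boot
```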
  • 69. Other Cool Zone Stuff
ps -Z shows the zone in which each process is running
Can use resource manager with zones
Zones can use global naming services
Use features to enable or disable accounts per zone
Interzone networking executed via loopback for performance
Copyright 2009 Peter Baer Galvin - All Rights Reserved 69
  • 70. Labs Create a “simple” zone Install it Boot it Configure it Look around in it - file systems, processes, resource use, users, etc Halt it Copyright 2009 Peter Baer Galvin - All Rights Reserved 70Saturday, May 2, 2009
  • 71. Zones and DTrace
Zones can get some DTrace privileges (starting 11/06)
# zonecfg -z my-zone
zonecfg:my-zone> set limitpriv="default,dtrace_proc,dtrace_user"
zonecfg:my-zone> exit
DTrace can use zonenames as predicates to filter results
# dtrace -n 'syscall:::/zonename=="zone1"/ {@[probefunc]=count()}'
Copyright 2009 Peter Baer Galvin - All Rights Reserved 71
  • 72. Fair-share Scheduling
Solaris has many scheduler classes available
A thread has a priority of 0-169; user threads are 0-59
The higher the priority, the sooner it is scheduled on a CPU
The scheduler class decides how the priority is modified over time
Default user-land class is time-sharing (TS)
Time-sharing dynamically changes the priority of each thread based on its activity
If a thread used its time quantum, its priority decreases (the quantum is the scheduling interval)
Kernel uses the "sys" class
Have a look via ps -elfc
Copyright 2009 Peter Baer Galvin - All Rights Reserved 72
  • 73. Fair-share Scheduling
[Figure: four containers competing for CPU under FSS - Database (4 shares), AppServer (3 shares), Web (2 shares), Backup (1 share).
Database gets 4 / (4+3+2+1) = 40% of all CPU time available to containers.]
Sun Microsystems Proprietary - Copyright 2008
Copyright 2009 Peter Baer Galvin - All Rights Reserved 73
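The 40% figure is just share arithmetic - a zone's entitlement is its shares divided by the total shares of active zones. A quick shell check (the share counts are the example's, not read from a live system):

```shell
# FSS entitlement = my_shares / total_shares_of_active_zones
db=4; app=3; web=2; backup=1
total=$((db + app + web + backup))
echo "Database entitlement: $((db * 100 / total))%"   # prints "Database entitlement: 40%"
```

Remember FSS only throttles under contention; an idle system still lets any zone use all the CPU.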
  • 74. Zones and Fair Share Scheduling
FSS allows all CPU to be used if needed, but overuse to be limited based on "shares" given to CPU users
Shares are given to projects et al, and/or to containers
Load the fair share scheduler as the default scheduling class:
# dispadmin -d FSS
Move all processes into the FSS class:
# priocntl -s -c FSS -i class TS
Give the global zone some (2) shares - note this is not persistent across reboots!
# prctl -n zone.cpu-shares -v 2 -r -i zone global
Copyright 2009 Peter Baer Galvin - All Rights Reserved 74
  • 75. Zones and Fair-share scheduling (2)
Check the shares of the global zone:
# prctl -n zone.cpu-shares -i zone global
Add a zone-wide resource control (1 share) to a zone (within zonecfg) (before S10U5):
zonecfg:my-zone> add rctl
zonecfg:my-zone:rctl> set name=zone.cpu-shares
zonecfg:my-zone:rctl> add value (priv=privileged,limit=1,action=none)
zonecfg:my-zone:rctl> end
How many total shares are given out on a given machine?
Copyright 2009 Peter Baer Galvin - All Rights Reserved 75
  • 76. FX Scheduler
Time-sharing is a heavyweight scheduler
Has to recalculate, every quantum, for every thread that ran in the last quantum
Plus it decreases the priority of CPU hogs
Instead consider "FX" - the fixed-priority scheduler class
All priorities stay the same
A lightweight scheduler can gain back a few percent of CPU
Copyright 2009 Peter Baer Galvin - All Rights Reserved 76
  • 77. Dynamic Resource Pools
[Slide recovered from garbled PDF text:]
Designed to group chosen resources such as CPUs, memory, I/O connections
A pool can be associated with CPUs and a scheduler
CPUs can be assigned:
dynamically, by configuring a minimum and maximum number of CPUs that a zone or pool should use
by Solaris, when it decides to transfer CPUs among existing pools with "threshold" and "importance" parameters
statically, by "pinning" a CPU to a pool - useful to ensure that a process stays on a CPU and doesn't share the CPU's cache
A CPU is moved between pools when an "important" workload surpasses its utilization threshold for a sufficient period of time
Sun Microsystems Proprietary - Copyright 2008
Copyright 2009 Peter Baer Galvin - All Rights Reserved 77
  • 78. Dynamic Resource Pools
[Slide recovered from garbled PDF text:]
There is one pool configuration per Solaris instance
By default, one pool exists (pool_default)
These can be bound to a pool: process, task, project, Container
A Container can be statically assigned to an existing "shared" pool when the Container boots
Multiple Containers can share that pool
Such a Container only uses resources when it is running
A Container can be assigned to a temporary pool
Pool only exists while the Container runs
That pool cannot be shared with other Containers
Sun Microsystems Proprietary - Copyright 2008
Copyright 2009 Peter Baer Galvin - All Rights Reserved 78
  • 79. DRPs
You can make "DRP"s non-dynamic by not including a variation in the range (i.e. 2 to 2 rather than 1 to 2)
Probably preferred over a truly dynamic config
With pools, interrupts and I/O only occur in the default pool
This can help pin a process to a set of CPUs
Cache stays hot, less context switching
So consider a DRP config with the kernel in the default pool and all apps in another pool
Copyright 2009 Peter Baer Galvin - All Rights Reserved 79
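A static variant of the pool setup shown later in the deck, pinned at exactly two CPUs so poold never resizes it (the pool and pset names here are made up):

```
# poolcfg -c 'create pset app-pset (uint pset.min = 2; uint pset.max = 2)'
# poolcfg -c 'create pool app-pool'
# poolcfg -c 'associate pool app-pool (pset app-pset)'
# pooladm -c        # activate; with min == max there is nothing dynamic to do
```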
  • 80. Zones and Dynamic Resource Pools
Assign zones to dedicated CPU resources
Used to assign a zone to a processor set
Can be dynamically created, deleted, modified
Can be used with FSS
Can be used to reduce Oracle (and other?) licensing costs!
Consider two DRPs, one with an email container and one with two web server containers (and global) (from http://www.sun.com/software/solaris/howtoguides/containersLowRes.jsp):
Copyright 2009 Peter Baer Galvin - All Rights Reserved 80
  • 81. Zones and DRPs (cont) Copyright 2009 Peter Baer Galvin - All Rights Reserved 81Saturday, May 2, 2009
  • 82. Zones and DRPs (cont)
Create a pool (from the global zone) via
# # enable DRPs
# pooladm -e
# # save current config
# pooladm -s
# # show current state; at start only pool_default exists
global# pooladm
system my_system
    string system.comment
    int system.version 1
    boolean system.bind-default true
    int system.poold.pid 638
    pool pool_default
        int pool.sys_id 0
        boolean pool.active true
        boolean pool.default true
        int pool.importance 1
        string pool.comment
        pset pset_default
Copyright 2009 Peter Baer Galvin - All Rights Reserved 82
  • 83. Zones and DRPs (cont)
    pset pset_default
        int pset.sys_id -1
        boolean pset.default true
        uint pset.min 1
        uint pset.max 65536
        string pset.units population
        uint pset.load 7
        uint pset.size 8
        string pset.comment
        cpu
            int cpu.sys_id 1
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 0
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 3
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 2
            string cpu.comment
            string cpu.status on-line
Copyright 2009 Peter Baer Galvin - All Rights Reserved 83
  • 84. Zones and DRPs (cont)
Create a new one-CPU processor set called email-pset:
# poolcfg -c 'create pset email-pset (uint pset.min=1; uint pset.max=1)'
Create a resource pool for the processor set:
# poolcfg -c 'create pool email-pool'
Link the pool to the processor set:
# poolcfg -c 'associate pool email-pool (pset email-pset)'
Set an objective (if including a range of processors, i.e. min != max):
# poolcfg -c 'modify pset email-pset (string pset.poold.objectives="wt-load")'
Activate the configuration:
# pooladm -c
Copyright 2009 Peter Baer Galvin - All Rights Reserved 84
  • 85. Zones and DRPs (cont)
Check the config:
# pooladm
system my_system
    string system.comment
    int system.version 1
    boolean system.bind-default true
    int system.poold.pid 638
    pool email-pool
        int pool.sys_id 1
        boolean pool.active true
        boolean pool.default false
        int pool.importance 1
        string pool.comment
        pset email-pset
    pool pool_default
        int pool.sys_id 0
        boolean pool.active true
        boolean pool.default true
        int pool.importance 1
        string pool.comment
        pset pset_default
    pset email-pset
        int pset.sys_id 1
        boolean pset.default false
        uint pset.min 1
        uint pset.max 1
        string pset.units population
        uint pset.load 0
        uint pset.size 1
        string pset.comment
        cpu
            int cpu.sys_id 0
            string cpu.comment
            string cpu.status on-line
Copyright 2009 Peter Baer Galvin - All Rights Reserved 85
  • 86. Zones and DRPs (cont)
Check the config (cont):
    pset pset_default
        int pset.sys_id -1
        boolean pset.default true
        uint pset.min 1
        uint pset.max 65536
        string pset.units population
        uint pset.load 7
        uint pset.size 7
        string pset.comment
        cpu
            int cpu.sys_id 1
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 3
            string cpu.comment
            string cpu.status on-line
        cpu
            int cpu.sys_id 2
            string cpu.comment
            string cpu.status on-line
Copyright 2009 Peter Baer Galvin - All Rights Reserved 86
  • 87. DRPs
Note that you can give ranges of CPUs to be used in DRPs
If you do, be sure to set an "objective", else nothing will be dynamic
Note that some software licenses allow licensing of the app for only those CPUs in the DRP that the zone is attached to (i.e. only pay for your DRP CPUs, not all CPUs)(!)
Copyright 2009 Peter Baer Galvin - All Rights Reserved 87
  • 88. Zones and DRPs (cont)
Now enable FSS, make it the default for pool_default:
# poolcfg -c 'modify pool pool_default (string pool.scheduler="FSS")'
Create an instance of the configuration:
# pooladm -c
Move all the processes in the default pool and its associated zones under FSS:
# priocntl -s -c FSS -i class TS
# priocntl -s -c FSS -i pid 1
Now have the zones use the DRPs:
# zonecfg -z email-zone
zonecfg:email-zone> set pool=email-pool
# zonecfg -z Web1-zone
zonecfg:Web1-zone> set pool=pool_default
zonecfg:Web1-zone> add rctl
zonecfg:Web1-zone:rctl> set name=zone.cpu-shares
zonecfg:Web1-zone:rctl> add value (priv=privileged,limit=3,action=none)
zonecfg:Web1-zone:rctl> end
# zonecfg -z Web2-zone
zonecfg:Web2-zone> set pool=pool_default
zonecfg:Web2-zone> add rctl
zonecfg:Web2-zone:rctl> set name=zone.cpu-shares
zonecfg:Web2-zone:rctl> add value (priv=privileged,limit=2,action=none)
zonecfg:Web2-zone:rctl> end
Copyright 2009 Peter Baer Galvin - All Rights Reserved 88
  • 89. Zones, Resources, and S10 8/07
Much simpler now if you just want a zone to have dedicated CPUs and memory limits
(From http://blogs.sun.com/jerrysblog/feed/entries/atom?cat=%2FSolaris)
zonecfg:my-zone> set scheduling-class=FSS
zonecfg:my-zone> add dedicated-cpu
zonecfg:my-zone:dedicated-cpu> set ncpus=1-4
zonecfg:my-zone:dedicated-cpu> set importance=10
zonecfg:my-zone:dedicated-cpu> end
zonecfg:my-zone> add capped-memory
zonecfg:my-zone:capped-memory> set physical=50m
zonecfg:my-zone:capped-memory> set swap=128m
zonecfg:my-zone:capped-memory> set locked=10m
zonecfg:my-zone:capped-memory> end
You have to enable poold via svcadm if "importance" is used
Still use dispadmin to set system-wide scheduling
Copyright 2009 Peter Baer Galvin - All Rights Reserved 89
  • 90. Zones, Resources, and S10 8/07 (cont) Can use zonecfg for the global zone to persistently set resource management settings in global Now can set other zone-wide resource limits easily zone.cpu-shares zone.max-locked-memory (locked property of the capped-memory resource is preferred) zone.max-lwps zone.max-msg-ids zone.max-sem-ids zone.max-shm-ids zone.max-shm-memory zone.max-swap (The swap property of the capped-memory resource is the preferred way to set this control) Copyright 2009 Peter Baer Galvin - All Rights Reserved 90Saturday, May 2, 2009
  • 91. Zones and Networking S10 8/07
Can now create exclusive-IP zones (i.e. dedicate a NIC port to a zone), known as "IP Instances"
Need this if you want advanced networking features in a zone (firewalls, snooping, DHCP client, traffic shaping)
Each zone gets its own IP stack (and soon xVM will too)
zonecfg:my-zone> set ip-type=exclusive
zonecfg:my-zone> add net
zonecfg:my-zone:net> set physical=e1000g1
zonecfg:my-zone:net> end
Now the zone can set its own IP address et al, and can do IPMP within a zone
"zonecfg set physical=" to one of the interfaces in an IPMP group
Project Crossbow will allow virtual NICs to be the IP instance entity (no longer tying up an Ethernet port)
Limited to Ethernet devices that use GLDv3 drivers (dladm show-link not reporting "legacy")
Copyright 2009 Peter Baer Galvin - All Rights Reserved 91
  • 92. Zones, Resources and 5/08 CPU Caps
Can limit the aggregated amount of CPU that a container's CPUs can accumulate
Although it is possible to use the prctl(1M) command to manage CPU caps, there is a capctl Perl script that simplifies it:
# capctl <-P project> <-p pid> <-Z zone> <-n name> <-v value>
* -P proj: Specify project id
* -p pid: Specify pid
* -Z zone: Specify zone name
* -n name: Specify resource name
* -v value: Specify resource value
For example, to set a cap for project foo to 50% you can say:
# capctl -P foo -v 50
To change the cap to 80%:
# capctl -P foo -v 80
To see the cap value:
# capctl -P foo
To remove the cap:
# capctl -P foo -v 0
Copyright 2009 Peter Baer Galvin - All Rights Reserved 92
  • 93. prctl vs zonecfg prctl can read resource settings in the global or child zones Not persistent for setting variables Can’t set variables in the child zone zonecfg is persistent, but only runs in global zone Copyright 2009 Peter Baer Galvin - All Rights Reserved 93Saturday, May 2, 2009
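The contrast above in one pair of commands; the zone name and share count are illustrative, and the zonecfg cpu-shares shorthand assumes S10 8/07 or later:

```
# prctl -n zone.cpu-shares -r -v 8 -i zone myzone   # immediate, but lost at reboot
# zonecfg -z myzone                                 # persistent; global zone only
zonecfg:myzone> set cpu-shares=8
zonecfg:myzone> exit                                # takes effect on next zone boot
```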
  • 94. Zone Issues
A zone cannot reside on NFS
But a zone can be an NFS client
Each zone normally has a "sparse" installation of a package, if the package is from an "inherit-pkg-dir" directory tree
By default, a package installed in the global zone is installed in all existing non-global zones
Unless the pkgadd -G or -Z options are used
See also the SUNW_PKG_ALLZONES and SUNW_PKG_HOLLOW package parameters
Patches installed in the global zone are installed in all non-global zones
If any zone does not match patch dependencies, the patch is not installed
Copyright 2009 Peter Baer Galvin - All Rights Reserved 94
  • 95. Zone issues - cont
Upgrading the global zone to a new Solaris release upgrades the non-global zones, but behavior depends on which upgrade method is used (hint - use live upgrade)
Best practice is to keep packages and patches synced between global and all non-global zones
Watch out for giving users root in a zone - could violate policy or regulations
Flash Archive (flar) can be used to capture a system containing zones and clone it, but only if zones are halted. Details at http://www.opensolaris.org/os/community/zones/faq/flar_zones
Copyright 2009 Peter Baer Galvin - All Rights Reserved 95
  • 96. Zones and Packages
# pkgadd -d screen*
The following packages are available:
  1 SMCscreen screen (intel) 4.0.2
Select package(s) you wish to process (or all to process all packages). (default: all) [?,??,q]:
## Not processing zone <zone10>: the zone is not running and cannot be booted
## Booting non-running zone <zone0> into administrative state
## waiting for zone <zone0> to enter single user mode...
## Verifying package <SMCscreen> dependencies in zone <zone0>
## Restoring state of global zone <zone0>
## Booting non-running zone <zone1> into administrative state
## waiting for zone <zone1> to enter single user mode...
. . .
## Booting non-running zone <zone0> into administrative state
## waiting for zone <zone0> to enter single user mode...
## Installing package <SMCscreen> in zone <zone0>
Copyright 2009 Peter Baer Galvin - All Rights Reserved 96
  • 97. Sparse Zones vs. Whole Root Zones When should you use “sparse”, when should you use “whole root” Check per-application support and/or requirements sparse zones don’t allow writes into /, /usr, etc by default, some apps don’t like that Can intermix sparse and whole-root on the same system Make a sparse root into a whole root # zonecfg create -b In the future, likely that the world will use whole root zones and ZFS cloning But zone roots on ZFS not supported until U6 because not upgradeable Copyright 2009 Peter Baer Galvin - All Rights Reserved 97Saturday, May 2, 2009
  • 98. Upgrading a System Containing Containers
Supported methods vary, depending on the OS release being upgraded from
Generally liveupgrade is best, but many details to consider
Well documented at http://docs.sun.com/app/docs/doc/820-4041/gdzlc?a=view
Copyright 2009 Peter Baer Galvin - All Rights Reserved 98
  • 99. Zone Best Practices
Note that the global zone root user can copy files directly into zones via their zonepath directory
Consider building at least one container per system
Put all users and apps in there
Fast to copy for testing
Fast reboot
Put it on shared storage for future attach / detach
But watch out for limits: dtrace and app support in a zone
Surprisingly, a global-zone mount within the zone file system is immediately seen in the zone
Copyright 2009 Peter Baer Galvin - All Rights Reserved 99
  • 100. Zone Best Practices (2) Use zonecfg export to save each zone’s config settings - store on a different system For every zone created, in its “virgin state”, create a clone of it and store it on a different system Put zones on ZFS for best feature set Consider configuring child zones to send syslog output to central syslog server Copyright 2009 Peter Baer Galvin - All Rights Reserved 100Saturday, May 2, 2009
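The export-and-save practice can be scripted; a minimal sketch (the backup host and directory are placeholders):

```
#!/bin/sh
# Save every configured zone's config, then copy the files off-box
for z in `zoneadm list -c | grep -v '^global$'`; do
        zonecfg -z "$z" export -f "/var/tmp/$z.cfg"
done
scp /var/tmp/*.cfg backuphost:/zone-configs/   # backuphost is a placeholder
```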
  • 101. Zones and /etc/system
Variables no longer honored in /etc/system can instead be set as per-project resource controls
This example is from the Sun installation guide for Weblogic on Solaris 10...
Modify /etc/project in each zone the app will run in to contain the following additions to the resource controls for user.root (assuming the application will run as root):
bash-3.00# cat /etc/project
system:0::::
user.root:1::::process.max-file-descriptor=(privileged,1024,deny);process.max-sem-ops=(privileged,512,deny);process.max-sem-nsems=(privileged,512,deny);project.max-sem-ids=(privileged,1024,deny);project.max-shm-ids=(privileged,1024,deny);project.max-shm-memory=(privileged,4294967296,deny)
noproject:2::::
default:3::::
group.staff:10::::
Copyright 2009 Peter Baer Galvin - All Rights Reserved 101
  • 102. Zones and /etc/system (cont)
Note that /etc/project is read at login
Also, to enable warnings via syslog if the resource limits are approached, execute the following commands (they update the /etc/rctladm.conf file)
Do this in the global zone; it is not persistent, so script it:
# rctladm -e syslog process.max-file-descriptor
# rctladm -e syslog process.max-sem-ops
# rctladm -e syslog process.max-sem-nsems
# rctladm -e syslog project.max-sem-ids
# rctladm -e syslog project.max-shm-ids
# rctladm -e syslog project.max-shm-memory
Copyright 2009 Peter Baer Galvin - All Rights Reserved 102
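Since these rctladm settings don't survive a reboot, the "script it" advice could look like this sketch, run from the global zone at boot (the project.* names follow the controls set in /etc/project above; the script name and hook are up to you):

```
#!/bin/sh
# Re-enable syslog actions for the Weblogic-related resource controls
for rc in process.max-file-descriptor process.max-sem-ops \
          process.max-sem-nsems project.max-sem-ids \
          project.max-shm-ids project.max-shm-memory; do
        rctladm -e syslog $rc
done
```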
  • 103. Branded Zones
Shipped in S10 8/07
Allows native binary execution of binaries from other operating systems
CentOS first
Install a brandz zone, install the "guest" OS, then install binaries (RPMs et al) and run them
Currently limited to CentOS and other 2.4-kernel-based distros
Result - can use DTrace to analyze Linux perf problems
See man pages for brands(5), lx(5)
Copyright 2009 Peter Baer Galvin - All Rights Reserved 103
  • 104. brandz
Example install given at http://milek.blogspot.com/2006/10/brandz-integrated-into-snv49.html
# zonecfg -z linux
linux: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:linux> create -t SUNWlx
zonecfg:linux> set zonepath=/home/zones/linux
zonecfg:linux> add net
zonecfg:linux:net> set address=192.168.1.10/24
zonecfg:linux:net> set physical=bge0
zonecfg:linux:net> end
zonecfg:linux> add attr
zonecfg:linux:attr> set name="audio"
zonecfg:linux:attr> set type=boolean
zonecfg:linux:attr> set value=true
zonecfg:linux:attr> end
zonecfg:linux> exit
Copyright 2009 Peter Baer Galvin - All Rights Reserved 104
  • 105. brandz (cont)
# zoneadm -z linux install -d /mnt/iso/centos_fs_image.tar.bz2
A ZFS file system has been created for this zone.
Installing zone linux at root directory /home/zones/linux from archive /mnt/iso/centos_fs_image.tar.bz2
This process may take several minutes.
Setting up the initial lx brand environment.
System configuration modifications complete!
Installation of zone linux completed successfully.
Details saved to log file:
"/home/zones/linux/root/var/log/linux.install.10064.log"
Copyright 2009 Peter Baer Galvin - All Rights Reserved 105
  • 106. Solaris 8 and 9 Containers Now available as a commercial product ($) from Sun Uses brandz Capture a Solaris 8 or Solaris 9 system via Archiver (aka P2V) Updater Tool, processes Solaris 8 image and prepares it for new, virtualized environment Create it as a container under S10 Apps think they are on S8 or S9 Sun “guarantees” compatibility SPARC only Copyright 2009 Peter Baer Galvin - All Rights Reserved 106Saturday, May 2, 2009
  • 107. Solaris 8 and 9 Containers - cont
http://www.sun.com/software/solaris/pdf/solaris8and9containers_datasheet.pdf
# zonecfg -z zone8
zonecfg:zone8> create -t SUNWsolaris8
zonecfg:zone8> set zonepath=/export/home/zones/zone8
zonecfg:zone8> add net
zonecfg:zone8:net> set address=<IP Address>
zonecfg:zone8:net> set physical=e1000g1
zonecfg:zone8:net> end
zonecfg:zone8> verify
zonecfg:zone8> commit
zonecfg:zone8> exit
# zoneadm -z zone8 install -a <FLAR_image_location> {-u|-p}
Try for 90 days via http://www.sun.com/software/solaris/containers/getit.jsp
Copyright 2009 Peter Baer Galvin - All Rights Reserved 107
  • 108. zonestat
Tool to monitor entire system performance, including per-zone
More information than prstat -Z
Download from http://opensolaris.org/os/project/zonestat/
# ./zonestat
         |--Pool--|Pset|-------Memory-----|
Zonename |     IT |Size|Used| RAM| Shm| Lkd|  VM|
------------------------------------------
  global      0D     2  0.1 556M  0.0  0.0 331M
   zone1      0D     2  0.0  26M  0.0  0.0  24M
==TOTAL=     ===     2  0.1 608M  0.0  0.0 355M
# ./zonestat -l
         |----Pool-----|------CPU-------|----------------Memory----------------|
         |---|--Size---|-----Pset-------|---RAM---|---Shm---|---Lkd---|---VM---|
Zonename | IT| Max| Cur| Cap|Used|Shr|S%| Cap|Used| Cap|Used| Cap|Used| Cap|Used
-------------------------------------------------------------------------------
  global  0D    2       0.0  0.1   5 83      556M  18E  0.0  18E  0.0  18E 331M
   zone1  0D    2       0.0  0.0   1 16       26M  18E  0.0  18E  0.0  18E  24M
==TOTAL= --- ----    2 ----  0.1 --- -- 3.1G 608M 3.1G  0.0 3.0G  0.0 4.0G 355M
Copyright 2009 Peter Baer Galvin - All Rights Reserved 108
  • 109. Zone Futures Live migration Improved networking via project crossbow Not just ip-exclusive. Virtual network stack for each container S10 containers? Copyright 2009 Peter Baer Galvin - All Rights Reserved 109Saturday, May 2, 2009
  • 110. Labs Create a container with resource management (your choice) What is your view of file systems? What file systems are yours, what are shared? What do the file systems look like from the global zone? Test the resource management if possible What does your networking look like? What is your life like in a zone? How are zones different from domains? From vmware? What scheduler is in use in your zone? If fair share, how many shares does your zone have? Copyright 2009 Peter Baer Galvin - All Rights Reserved 110Saturday, May 2, 2009
  • 111. Labs (cont) If you are not fair share scheduled, turn it on and enable shares for your container Clone the zone Detach and attach the zone (to the same system if necessary) Copyright 2009 Peter Baer Galvin - All Rights Reserved 111Saturday, May 2, 2009
  • 112. LDOMS Copyright 2009 Peter Baer Galvin - All Rights Reserved 112Saturday, May 2, 2009
  • 113. LDOMs
Logical domains
Released April '07
Only on Niagara and future CMT chips (Niagara II, Rock)
Like enterprise-system domains, but within one chip
Slice the chip into multiple LDOMs, each with its own OS root; boot independently, etc
Now can run multiple OSes on 1 SPARC chip
Copyright 2009 Peter Baer Galvin - All Rights Reserved 113
  • 114. Copyright 2009 Peter Baer Galvin - All Rights Reserved 114Saturday, May 2, 2009
  • 115. LDOMs - Details
Can create up to 1 LDOM per thread(!)
Best practice seems to be max one LDOM per core, i.e. 8 LDOMs on Niagara I and II
Nice intro blog: http://blogs.sun.com/ash/entry/ultrasparc_t2_launched_today
And nice flash demo: http://www.sun.com/servers/coolthreads/ldoms/
Community cookbooks: http://wikis.sun.com/display/SolarisLogicalDomains/LDoms+Community+Cookbook
Copyright 2009 Peter Baer Galvin - All Rights Reserved 115
  • 116. LDOMS Introduction and Hands-On Training
Peter Baer Galvin, Chief Technologist, Corporate Technologies
With thanks to: Tom Gendron, SPARC Systems Technical Specialist, Sun Microsystems
116
  • 117. Agenda • Virtualization Comparisons • Concepts of LDOMs • Requirements of LDOMs • Examples • Best Practices 117Saturday, May 2, 2009
  • 118. The Data Center Today
Single application per server - server sprawl is hard to manage
Average server utilization between 5 and 15%
Energy costs continue to rise
[Figure: clients and developers reaching mail, database, and application servers over the network, each service on its own server with its own OS, plus storage, in the data center]
118
  • 119. A widely understood problem 119Saturday, May 2, 2009
  • 120. Virtualization: Who and Why InformationWeek: Feb 12, 2007 http://www.informationweek.com/news/showArticle.jhtml?articleID=197004875 120Saturday, May 2, 2009
  • 121. Server Virtualization
[Comparison table, reconstructed from garbled slide text:]
Hard Partitions (multiple OSs): Very High RAS capability; Very Scalable; Mature Technology; Cleanly divides system and application administration
Virtual Machines (multiple OSs): Live OS migration; Improved Utilization; Ability to run different OS versions and types; De-couples OS and HW versions
OS Virtualization (single OS): Very scalable and low overhead; Single OS to manage; Ability to run different OS versions; Complete Isolation
Resource Mgmt. (single OS): Very scalable and low overhead; Single OS to manage; Fine grained resource management
[Figure: example workloads - file, web, mail, calendar, app, database, SunRay servers - mapped onto each technique]
121
  • 122. Para vs. Full Virtualization
Para-virtualization:
> OS ported to special architecture
> Uses generic "virtual" device drivers
> More efficient since it is "hypervisor" aware
> "almost" native performance
Full virtualization:
> OS has no idea it is running virtualized
> Must emulate real i/o devices
> Can be slow / need help from hardware
> May use traps, emulation or rewriting
[Figure: file/web/mail/app servers stacked on a para-virtualized OS, and on an OS plus control domain for full virtualization]
122
  • 123. What is an LDOM? • It is a virtual server • Has its own console and OBP instance • A configurable allocation of CPU, FPU, Disk, Memory and I/O components • Runs a unique OS/patch image and configuration • Has the capability to stop, start and reboot independently • Utilizes a Hypervisor to facilitate LDOMs 123Saturday, May 2, 2009
  • 124. Requirements for LDOMs • Sun T-Series server > T1/2000 T5x20 rack servers > T6100, T6120 blade > Any future CMT based server • Up to date Firmware on service processor http://sunsolve.sun.com/handbook_pub/validateUser.do?target=index • minimum Solaris 10 11/06 on T1/2000, T6100 • minimum Solaris 10 08/07 T5x20, T6120 • Ldom Manager Software 1.0.1 + patches 124Saturday, May 2, 2009
  • 125. Hypervisor
A thin interface between the hardware and Solaris
The interface is called sun4v
Solaris calls the sun4v interface to use hardware-specific functions
It is very simple and is implemented in firmware
It allows for the creation of LDOMs
It creates communication channels between LDOMs
125
  • 126. Key LDOMs components
• The Hypervisor
• The Control Domain
• The Service Domain
• Multiple Guest Domains
• Virtualised devices
[Diagram: a primary control-and-service domain (Solaris 10 08/07 running ldmd, drd, vntsd, exporting primary-vds0 and primary-vsw0) and guest domains ldom1 and ldom2 (each with vdisk and vnet devices), all on the hypervisor, which shares the hardware's CPUs, memory, crypto units, disk, PCI-E and network I/O]
126
  • 127. LDOMs types
• Different LDOM types:
- Control Domain - hosts the Logical Domain Manager (LDM)
- Service Domains - provide virtual services to other domains
- I/O Domains - have direct access to physical devices
- Guest Domains - used to run user environments
• Control, Service and I/O domains can be combined or separate
> One of the I/O domains must be the control domain
127
  • 128. Key LDOMs components
[Repeat of the slide-126 architecture diagram - hypervisor, primary control/service domain, guest domains ldom1 and ldom2 - shown again to introduce the Control Domain]
128
  • 129. Control Domain • Creates and manages other LDOMs • Runs the LDOM Manager software • Allows monitoring and reconfiguration of domains • Recommendation: > Make this Domain as secure as possible 129Saturday, May 2, 2009
  • 130. Key LDOMs components
[Repeat of the slide-126 architecture diagram, shown again to introduce the Service Domain]
130
  • 131. Service Domain • Provides services to other domains – virtual network switch – virtual disk service – virtual console service • Multiple Service domains can exist with shared or sole access to system facilities • Allows for IO load separation and redundancy within domains deployed on a platform • Often Control and Service Domains are one and the same 131Saturday, May 2, 2009
  • 132. IO Domain • IO Domain has direct access to physical input and output devices. • The number of IO domains is hardware dependent > currently limited to 2 > limited by PCI-E switch configuration • One IO domain must also be the control domain 132Saturday, May 2, 2009
  • 133. Key LDOMs components • The Hypervisor • The Control Domain • The Service Domain • Multiple Guest Domains • Virtualised devices [Diagram repeated from slide 128: primary control/service domain with ldmd, drd and vntsd; guests ldom1/ldom2 with virtual disks and networks over the hypervisor and shared hardware] 133Saturday, May 2, 2009
  • 134. Guest Domains • Contain the targeted applications the LDOMs were created to service. • Multiple Guest domains can exist > Constrained only by hardware limitations • May use one or more Service domains to obtain IO > Various redundancy mechanisms can be used • Can be independently powered and rebooted without affecting other domains 134Saturday, May 2, 2009
  • 135. Key LDOMs components • The Hypervisor • The Control Domain • The Service Domain • Multiple Guest Domains • Virtualised devices [Diagram repeated from slide 128: primary control/service domain with ldmd, drd and vntsd; guests ldom1/ldom2 with virtual disks and networks over the hypervisor and shared hardware] 135Saturday, May 2, 2009
  • 136. Virtual devices • Virtual devices are hardware resources abstracted by the hypervisor and made available for use by the other domains • Virtual devices are : > CPUs - VCPU > Memory - > Crypto cores - MAU > Network switches - VSW > NICs - VNET > Disk servers - VDSDEV > Disks - VDISK > Consoles - VCONS 136Saturday, May 2, 2009
  • 137. Example 1 Install Ldom Manager & Setting up the Control Domain 137Saturday, May 2, 2009
  • 138. Example 1 steps • Update firmware to latest release • Install Supported version of Solaris • Install Logical Domain Manager (LDM) software • Configure the control domain • Save initial domain config • Reboot Solaris 138Saturday, May 2, 2009
  • 139. A note on system interfaces • Provide out-of-band management • Two types (iLOM and ALOM) • T1/2000 uses ALOM interface • T5x20 uses iLOM • iLOM "CLI" has an ALOM compatibility shell > ALOM shell used in the examples • A web based interface is available • (SC = system controller, SP = service processor) > essentially the same thing. 139Saturday, May 2, 2009
  • 140. Web based iLOM interface 140Saturday, May 2, 2009
  • 141. ALOM compatibility shell • login to SP as root/changeme • -> create /SP/users/admin • -> set /SP/users/admin role=Administrator • -> set /SP/users/admin cli_mode=alom – Creating user ... – Enter new password: ******** – Enter new password again: ******** – Created /SP/users/admin • exit • login as admin 141Saturday, May 2, 2009
  • 142. Step 1 Firmware verification and update 142Saturday, May 2, 2009
  • 143. System Identification and Update • Check the Service Processor of your system for firmware levels • using alom mode (showhost not available in the BUI) sc> showhost Sun System Firmware 7.0.1 2007/09/14 16:31 Host flash versions: Hypervisor 1.5.1 2007/09/14 16:11 OBP 4.27.1 2007/09/14 15:17 POST 4.27.1 2007/09/14 15:43 (Check SC Firmware version: here 7.0.1) • Upgrade your system firmware if needed... > flashupdate command > sysfwdownload (via Solaris on platform) > BUI 143Saturday, May 2, 2009
  • 144. Firmware update example sc> showkeyswitch Keyswitch is in the NORMAL position. sc> flashupdate -s 10.8.66.15 -f /incoming/Sun_System_Firmware-6_4_6-Sun_Fire_T2000.bin Username: tgendron Password: ******** SC Alert: System poweron is disabled. Update complete. Reset device to use new software. sc> resetsc (telnet and log back in once the SC is up) sc> showhost Sun-Fire-T2000 System Firmware 6.5.5 2007/10/28 23:09 144Saturday, May 2, 2009
  • 145. Firmware update example 2 Step 1: From Solaris running on the T5120 with the SP to update, download the patch from SunSolve: 127580-05.zip Step 2: unzip and cd into 127580-05 Step 3: run sysfwdownload [image].pkg Step 4: reboot Solaris, then: sc> resetsc 145Saturday, May 2, 2009
  • 146. Installing LDOM manager software • T5x20 requires Solaris 10 8/07 or greater • T1/2000 requires Solaris 10 11/06 or greater + * 124921-02 at a minimum * 125043-01 at a minimum * 118833-36 at a minimum • 11/06 is minimum for guests • ldm 1.0.2 is current > includes Solaris Security Toolkit (optional)‫‏‬ 146Saturday, May 2, 2009
  • 147. Install the LDM Software • Unzip and install w/installation script • Security of Control Domain is important > Recommend selecting the JASS secure configuration • Once complete, the entire system is one LDOM • LDOM software installed in /opt/SUNWldm # [cmt1/root] ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv SP 64 8064M 0.0% 3h 19m [cmt1/root] All the system resources are in domain "primary" * Follow the Administration Guide to install required OS and patches 147Saturday, May 2, 2009
  • 148. Flag Definitions # ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv SP 32 32640M 0.1% 6d 20h 24m # - placeholder c control domain d delayed reconfiguration n normal s starting or stopping t transition v virtual I/O domain 148Saturday, May 2, 2009
  • 149. Example 1 Part 2 Setting up the Control Domain 149Saturday, May 2, 2009
  • 150. On naming things... • Choose LDOM component names carefully > Names are used to manage the devices > Bad choices can be very confusing later on... > Keep names short and specific... • You need names for ... > Disk Servers, and disk device instances > Network Virtual Switches, and network device instances > Domains • Service and device names are only known to the Control and Service domains – Guest domains just see virtual devices. 150Saturday, May 2, 2009
  • 151. Control/Service Domain • On our Primary Domain do the following ... • In this example Control and Service are combined > Control domain runs the LDM > Service domain has these services set up: • Set up the basic services needed: > vds - virtual disk service > vcc - virtual console concentrator > vsw - virtual network switch • The service names in this example are: > primary-vds0 > primary-vcc0 > primary-vsw0 • Allocate resources > CPU, Memory, Crypto, IO devices [Diagram: the primary domain running ldmd, drd and vntsd, offering the vds0, vcc0 and vsw0 services, with unallocated CPU/memory/crypto resources shown alongside over the hypervisor and hardware] 151Saturday, May 2, 2009
  • 152. Control/Service Domain set-up (1) # Add services to the control domain # The mac address taken from a physical interface, e.g., e1000g0. ldm add-vds primary-vds0 primary ldm add-vcc port-range=5000-5100 primary-vcc0 primary ldm add-vsw mac-addr=0:14:4f:6a:9e:dc net-dev=e1000g0 primary-vsw0 primary # Activate the virtual network terminal server svcadm enable vntsd # Allocate resources to the control domain and save ldm set-mau 1 primary ldm set-vcpu 8 primary ldm set-memory 2G primary ldm add-spconfig my-initial # Reboot required to have the configuration take effect. init 6 152Saturday, May 2, 2009
  • 153. Crypto Note Note–If you have any cryptographic devices in the control domain, you cannot dynamically reconfigure CPUs. So if you are not using cryptographic devices, set-mau to 0. 153Saturday, May 2, 2009
  • 154. Control/Service Domain set-up (2)‫‏‬ # Verify the primary domain configuration ldm list-domain NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv SP 8 2G 6.3% 6m # Enable Networking ifconfig vsw0 plumb ifconfig e1000g0 down unplumb ifconfig vsw0 10.8.66.208 netmask 255.255.255.0 broadcast + up ifconfig -a lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1 inet 127.0.0.1 netmask ff000000 vsw0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3 inet 10.8.66.208 netmask ffffff00 broadcast 10.8.66.255 ether 0:14:4f:6a:9e:dc 154Saturday, May 2, 2009
  • 155. Ldom Service details 155Saturday, May 2, 2009
  • 156. Reconfiguration • Dynamic reconfiguration > Resource changes that take effect w/out reboot of domain • Delayed reconfiguration > Resource changes that take effect after a reboot • Resource examples: > VCPU, Memory, IO devices • Currently only VCPUs are dynamic 156Saturday, May 2, 2009
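The distinction above can be seen directly from the control domain. A minimal sketch, assuming the guest domain name ldm1 used in the later examples:

```shell
# VCPUs: dynamic - the change takes effect while the domain runs
ldm set-vcpu 12 ldm1
ldm list-domain ldm1    # VCPU column updates immediately

# Memory: delayed - the change is queued as a delayed reconfiguration
ldm set-memory 2G ldm1
ldm list-domain ldm1    # FLAGS column shows 'd' until the guest reboots
```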
  • 157. Virtual Disk Server device (vds) Delayed Reconfiguration • VDS runs in a service domain • Performs disk I/O on corresponding raw devices • Device types can be > An entire physical disk or LUN (can be san based) > Single slice of disk or LUN > Disk image in a filesystem (e.g. ufs, zfs) > Disk volumes (zfs, svm, VxVM) > lofi devices NOT supported • Virtual Disk Client (vdc drivers) > Requests standard block IO via the VDS > Classic client/server architecture 157Saturday, May 2, 2009
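As a sketch, one add-vdsdev per backend type listed above. All device paths and volume names here are illustrative, not from the original examples:

```shell
ldm add-vdsdev /dev/dsk/c2t0d0s2 lun-vol@primary-vds0             # entire disk/LUN
ldm add-vdsdev /dev/dsk/c1t1d0s3 slice-vol@primary-vds0           # single slice
ldm add-vdsdev /export/ldoms/ldm1/rootdisk img-vol@primary-vds0   # disk image file
ldm add-vdsdev /dev/zvol/dsk/mypool/vol1 zvol-vol@primary-vds0    # ZFS volume
# lofi devices are NOT supported as backends
```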
  • 158. Virtual Disk devices • Physical LUNS perform best • Disk image files make efficient use of space • ZFS snapshots and clones give rapid provisioning • Network install not supported with > zfs volumes > single slice • Network install requires > entire disk > disk image file 158Saturday, May 2, 2009
  • 159. Virtual Network Switch services (vswitch) Delayed Reconfiguration • Implements a layer-2 network switch • Connects virtual network devices > To the physical network > or to each other (internal private network) • vswitch not automatically used by service domain > must be plumbed 159Saturday, May 2, 2009
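The "internal private network" case can be sketched as follows: a vswitch created with no net-dev backing connects guests only to each other. The domain names ldm1/ldm2 are assumptions carried over from the examples:

```shell
# No net-dev argument: an in-memory switch with no physical uplink
ldm add-vsw private-vsw0 primary
ldm add-vnet vnet1 private-vsw0 ldm1
ldm add-vnet vnet1 private-vsw0 ldm2
# ldm1 and ldm2 can now exchange traffic without touching a physical NIC
```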
  • 160. Virtual Console Concentrator (vcc) Delayed Reconfiguration • Provides console access to LDoms • Service domain VCC driver communicates with all guest console drivers over the Hypervisor > No changes required in guest console drivers (qcn) • Makes each console available as a tty device on the Control/Service domain • usage: telnet localhost <port> 160Saturday, May 2, 2009
  • 161. Virtual Network Terminal Server Delayed daemon (vntsd) Reconfiguration • VCC implemented by vntsd • Runs in the Control/Service domain • Aggregates the VCC tty devices and makes them available over network sockets > Accessible once a domain is configured and bound > Attach prior to domain start to watch domain OBP boot sequence • Only one user at a time can view a serial console • Flexible support of port groups, IPs, port numbers etc > Not visible outside the Control/Service domain by default 161Saturday, May 2, 2009
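Putting vcc and vntsd together, console access from the control domain looks like this. Port 5000 is the first value from the vcc port-range configured in the earlier setup example:

```shell
ldm list-domain           # CONS column shows each bound domain's console port
telnet localhost 5000     # attach before 'ldm start-domain' to watch the OBP boot
# detach with the usual telnet escape: Ctrl-] then 'quit'
```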
  • 162. Example 2 Setting up the Guest Domain 162Saturday, May 2, 2009
  • 163. Guest Domain • In the control domain:
ldm add-domain ldm1
ldm add-mau 1 ldm1
ldm add-vcpu 4 ldm1
ldm add-memory 4G ldm1
ldm add-vnet vnet0 primary-vsw0 ldm1
ldm add-vdsdev /dev/dsk/c0t1d0s2 ldm1-vol1@primary-vds0
ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1
ldm set-var auto-boot?=false ldm1
ldm set-var boot-device=vdisk ldm1
ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm1
ldm bind-domain ldm1
ldm start-domain ldm1
  • Watch the console of ldom1 using ... > telnet localhost 5000 [Diagram: T2000 hardware with the primary control/service domain exporting ldm1-vol1 (backed by /dev/c0t1d0s0) and vsw0 (backed by e1000g0) to guest ldm1] 163Saturday, May 2, 2009
  • 164. Disk Service Setup • Establish a Virtual Disk Service – primary-vds • Associate it with some form of media – a real device or slice (/dev/dsk/c0t1d0s2) or a disk image e.g. /ldmzpool/ldg1 • Create a disk server device instance to be exported to guest domains – ldm1-vol1@primary-vds  ldm add-vdsdev /dev/dsk/c0t1d0s2 ldm1-vol1@primary-vds0  ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1  (The disk device name can vary - find it via "ok show-devs") [Diagram: primary domain exporting ldm1-vol1 as ldm1-vdisk1 to guest ldom1] 164Saturday, May 2, 2009
  • 165. Virtual Disk Client (vdc) Delayed Reconfiguration • vdcs are the objects passed to OBP and the Operating System in guest systems • Guest domain OBP and Solaris see normal SCSI devices • Domain administrators may set up devaliases or use raw vdisk devices • vdcs provide Guest domains with virtual disk devices (vdisks) via device instances from Virtual Disk Servers running in the Service Domain(s) • A future release will provide virtualised access to DVD/CD-ROM in service domains 165Saturday, May 2, 2009
  • 166. Network Setup • Establish a Virtual Network Switch Service – primary-vsw0 > Automatically associated with a vsw device instance – vsw0 • May or may not choose to associate it with media – e1000g0, a real NIC – or no NIC, in memory • Create a network device instance to provide to guest domains – vnet0@ldm1 [Diagram: primary-vsw0 in the primary domain, backed by e1000g0, serving vnet0 in guest ldom1] 166Saturday, May 2, 2009
  • 167. Virtual Network Device (vnet) Delayed Reconfiguration • Implements an ethernet device in a domain > Communicates with other vnets or the outside world over vswitch devices • If the vSwitch is suitably configured, packets can be routed out of the server. • vnet exports a GLDv3 interface > A simple virtual Ethernet NIC > Enumerates as a vnetx device > For domain-domain transfers, vnets connect directly. 167Saturday, May 2, 2009
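From inside a guest, the vnet is handled like any other NIC. A sketch of the guest-side view (DHCP here is purely for illustration; a static address works the same way):

```shell
dladm show-link            # vnet0 appears as a normal GLDv3 link
ifconfig vnet0 plumb
ifconfig vnet0 dhcp start  # or assign a static address/netmask/broadcast
```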
  • 168. Memory Delayed Reconfiguration • Memory is configured through the Control Domain • Minimum allocatable chunk is 8kB > Minimum size is 12MB (for OBP) > Though most OS deployments will need > 512M • If memory is added over time to a domain > Memory device bindings within a domain may appear to show that memory fragmentation is occurring > Not a problem, all handled in HW by the MMU > No performance penalty 168Saturday, May 2, 2009
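The growth-over-time point can be sketched as follows, with the guest name ldm1 assumed from the examples:

```shell
ldm add-memory 512M ldm1   # grow a domain in 8kB-aligned chunks
ldm list-bindings ldm1     # memory may show as several disjoint ranges over
                           # time - harmless; the MMU maps them in hardware
```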
  • 169. vCPUs Immediate Reconfiguration • Each UltraSPARC T1 has up to 8 physical cores with 4 threads each > Each thread is considered a vCPU, so up to 32 vCPUs or Domains • Each UltraSPARC T2 has up to 8 physical cores with 8 threads each > Each thread is considered a vCPU, so up to 64 vCPUs or Domains • Finest granularity is 1 vCPU per domain • vCPUs can only be allocated to one Domain at a time. • Can be dynamically allocated with the Domain running > Take care if removing a vcpu from a running domain - will there be enough compute power left in the domain? 169Saturday, May 2, 2009
  • 170. Example 3 Guest Domains and ZFS 170Saturday, May 2, 2009
  • 171. Using ZFS (1) – setup zfs 1. Remove the disk from the service domain ldm stop-domain ldm1 LDom ldm1 stopped ldm unbind-domain ldm1 ldm remove-vdsdev ldm1-vol1@primary-vds0 2. Create ZFS datasets and a disk image root@cmt1 > zfs create mypool/ldoms root@cmt1 > zfs create mypool/ldoms/ldm1 root@cmt1 > cd /export/ldoms/ldm1 root@cmt1 > ls root@cmt1 > mkfile 12G `pwd`/rootdisk 171Saturday, May 2, 2009
  • 172. Using ZFS (2) – setup guest domain 3. Configure the guest domain root@cmt1 > ldm add-domain ldm1 root@cmt1 > ldm add-vcpu 8 ldm1 root@cmt1 > ldm add-memory 1G ldm1 root@cmt1 > ldm add-vnet vnet0 primary-vsw0 ldm1 root@cmt1 > ldm add-vdsdev /export/ldoms/ldm1/rootdisk ldm1-vol1@primary-vds0 root@cmt1 > ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1 root@cmt1 > ldm set-var auto-boot?=false ldm1 root@cmt1 > ldm set-var boot-device=ldm1-vdisk1 ldm1 root@cmt1 > ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel- devices@200/network@0 ldm1 172Saturday, May 2, 2009
  • 173. Using ZFS (3) – setup guest domain 4. Start the guest domain root@cmt1 > ldm bind-domain ldm1 root@cmt1 > ldm start-domain ldm1 LDom ldm1 started 5. Inspect the domain root@cmt1 > ldm list-domain NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv SP 8 2G 0.7% 17h 12m ldm1 active -t--- 5000 8 1G 13% 7s telnet localhost 5000 {ok} boot vnet0 - install installation goes forward 173Saturday, May 2, 2009
  • 174. Provision the guest 6. Set up for jumpstart Determine the mac address root@cmt1 > ldm list-bindings ldm1 [snip] NETWORK NAME SERVICE DEVICE MAC vnet0 primary-vsw0@primary network@0 00:14:4f:f8:2a:c4 PEER MAC primary-vsw0@primary 00:14:4f:46:41:b4 telnet localhost 5000 {0} ok banner SPARC Enterprise T5120, No Keyboard [snip] Ethernet address 0:14:4f:fb:7:42, Host ID: 83fb0742. 174Saturday, May 2, 2009
  • 175. Provision the guest (2) {0} ok boot vnet0 - install Boot device: /virtual-devices@100/channel-devices@200/network@0 File and args: - install Requesting Internet Address for 0:14:4f:f8:2a:c4 SunOS Release 5.10 Version Generic_120011-14 64-bit ... How to break telnet> send brk Debugging requested; hardware watchdog suspended. c)ontinue, s)ync, r)eboot, h)alt? r Resetting... {0} ok 175Saturday, May 2, 2009
  • 176. Guest Domain (zfs) login {0} ok boot Boot device: ldm1-vdisk1 File and args: SunOS Release 5.10 Version Generic_120011-14 64-bit Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. Hostname: ldm1 ldm1 console login: 176Saturday, May 2, 2009
  • 177. Using ZFS (4) – cloning domains Snapshot and Clone the installed boot disk tgendron@cmt1 > zfs list NAME USED AVAIL REFER MOUNTPOINT mypool 12.0G 54.9G 27.5K /export mypool/ldoms 12.0G 54.9G 25.5K /export/ldoms mypool/ldoms/ldm1 12.0G 54.9G 12.0G /export/ldoms/ldm1 root@cmt1 > zfs snapshot mypool/ldoms/ldm1@initial Create the clones root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm2 root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm3 root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm4 root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm5 177Saturday, May 2, 2009
  • 178. Using ZFS (5) – Leverage the clones 4. Create the new guest domains (should be easy to script this) ldm add-domain ldm2 ldm add-vcpu 8 ldm2 ldm add-memory 1G ldm2 ldm add-vnet vnet0 primary-vsw0 ldm2 ldm add-vdsdev /export/ldoms/ldm2/rootdisk ldm2-vol1@primary-vds0 ldm add-vdisk ldm2-vdisk1 ldm2-vol1@primary-vds0 ldm2 ldm set-var auto-boot?=false ldm2 ldm set-var boot-device=vdisk ldm2 ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm2 ldm bind-domain ldm2 ldm start-domain ldm2 178Saturday, May 2, 2009
  • 179. Boot the cloned ldom {0} ok boot Boot device: vdisk File and args: SunOS Release 5.10 Version Generic_120011-14 64-bit Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. WARNING: vnet0 has duplicate address 010.030.019.178 (in use by 00:14:4f:f8:2a:c4); disabled Feb 13 19:55:29 svc.startd[7]: svc:/network/physical:default: Method "/lib/svc/method/net-physical" failed with exit status 96. Feb 13 19:55:29 svc.startd[7]: network/physical:default misconfigured: transitioned to maintenance (see svcs -xv for details)‫‏‬ Hostname: ldm1... 179Saturday, May 2, 2009
  • 180. Example 4 Split Service Domains 180Saturday, May 2, 2009
  • 181. Sun Fire T2000 Block Diagram 181Saturday, May 2, 2009
  • 182. Split IO Example • Setting up a second Service domain with split PCI busses... -bash-3.00# ldm list-bindings primary Name: primary ... IO: pci@780 (bus_a) pci@7c0 (bus_b) ... -bash-3.00# df / / (/dev/dsk/c1t0d0s0 ):28233648 blocks 3450076 files -bash-3.00# ls -l /dev/dsk/c1t0d0s0 lrwxrwxrwx 1 root root 65 Apr 11 13:25 /dev/dsk/c1t0d0s0 -> ../../devices/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0:a -bash-3.00# grep e1000g /etc/path_to_inst "/pci@780/pci@0/pci@1/network@0" 0 "e1000g" "/pci@780/pci@0/pci@1/network@0,1" 1 "e1000g" "/pci@7c0/pci@0/pci@2/network@0" 2 "e1000g" "/pci@7c0/pci@0/pci@2/network@0,1" 3 "e1000g" -bash-3.00# ldm remove-io pci@780 primary .. -bash-3.00# shutdown -i6 -y -g0 .. -bash-3.00# ldm add-io pci@780 second-srvc-dom -bash-3.00# ldm start second-srvc-dom -bash-3.00# ldm list-bindings .. Check which PCI bus ports we own and are currently using, and be sure to only give away unused ones - i.e. we need to retain the Control Domain boot disk controller and network device. Providing a PCI bus to a Guest makes the selected Domain a Service domain, by definition - access to physical IO = Service Domain. 182Saturday, May 2, 2009
  • 183. Sun Fire T5x20 Block Diagram [Diagram: UltraSPARC T2 with 16 FB-DIMMs; PCI-E switches (PLX 8533/8517) feeding the LSI 1068E SAS controller (disk chassis, 1RU and 2RU/8 variants), dual on-chip 10GbE via BCM8704 SerDes and XFP, Intel dual GbE, optional quad GbE plugin, USB 2.0 hub with DVD; ILOM service processor (MPC885 + FPGA) with serial and network management ports] 183Saturday, May 2, 2009
  • 184. MPxIO considerations • MPxIO can be used in the Service/Control domain • Very straightforward to configure with defaults... > Ensure you have two FC-AL HBAs in a single service domain attached to the same SAN array > Check that you have two paths to the same SAN devices (ls /dev/dsk/) > Enable MPxIO by running the command stmsboot -e and rebooting the control/service domain > Check that you now have only a single path to the SAN devices... 184Saturday, May 2, 2009
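The checklist above, as commands on the control/service domain. Device names will differ per system; this is a sketch of the default workflow only:

```shell
ls /dev/dsk        # before: each SAN LUN is visible via two controller paths
stmsboot -e        # enable MPxIO; the command offers the required reboot
# after the reboot:
ls /dev/dsk        # each LUN now appears once, as a scsi_vhci device
stmsboot -L        # maps the old (non-MPxIO) device names to the new ones
```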
  • 185. IPMP considerations • IPMP has several options for configuration > Refer to the Administration Guide for worked examples... > Options are Multipathing in the Service Domain or Multipathing in the Guest Domain 185Saturday, May 2, 2009
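One of the two options, multipathing in the guest, can be sketched as below. Everything here (a second vswitch primary-vsw1, the addresses, the group name) is illustrative; the Administration Guide holds the worked examples this slide refers to:

```shell
# control domain: one vnet per vswitch, each vswitch on its own physical NIC
ldm add-vnet vnet0 primary-vsw0 ldm1
ldm add-vnet vnet1 primary-vsw1 ldm1

# inside the guest: a standard Solaris 10 IPMP group across the two vnets
ifconfig vnet0 plumb 10.8.66.50 netmask + broadcast + group mpgrp up
ifconfig vnet1 plumb 10.8.66.51 netmask + broadcast + group mpgrp up
```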
  • 186. Ldom 1.0.1 Best Practice Guidance 186Saturday, May 2, 2009
  • 187. Ldom Best Practice (1)‫‏‬ • Control Domain > Runs LDM daemon processes > Must have adequate CPU and memory > Start w/ 1 core (4 or 8 threads) 1GB Memory > Make this domain as secure as possible 187Saturday, May 2, 2009
  • 188. Ldom Best Practice (2) • I/O and service domains > Runs IO for other domains > Resources will be sized based on IO load > Start w/ 1 core and 1GB memory > 4GB of memory if zfs is used for virtual disk images > Add complete cores as I/O loads grow 188Saturday, May 2, 2009
  • 189. Ldom Best Practice (3)‫‏‬ • Core/Thread Affinity > Core resources are shared by threads > E.g. L1 cache and MAU, FPU • Best to avoid allocating the threads of a core to separate domains • Create larger Ldoms first using complete cores • Smaller domains last 189Saturday, May 2, 2009
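A sketch of that allocation order on an UltraSPARC T2 (8 threads per core); the domain names are hypothetical:

```shell
ldm set-vcpu 16 bigdom     # largest domain first: two complete cores
ldm set-vcpu 8 middom      # one complete core
ldm set-vcpu 4 smalldom    # sub-core domains last, so they share a core's
                           # L1 cache/FPU/MAU with as few neighbours as possible
```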
  • 190. Ldom Best Practice (4) Delayed Reconfiguration • Crypto Units • Each T1/T2 physical CPU Core has a Crypto Unit > 8 in total on an 8-core system > referred to as an MAU • Crypto cores can only be allocated to domains that have at least one vcpu (thread) on the same physical Core as the crypto unit • Crypto cores cannot be shared; they are owned by exactly one (or no) Domain • Probably best to allocate all four/eight threads on a Core to a domain that wants to use the Crypto core 190Saturday, May 2, 2009
  • 191. More on Crypto Units • For example, we define three domains in order: LDOM1, then LDOM2, then LDOM3... • LDOM1 has 3 threads (vCPUs) on Core 0 > Only has access to MAU0 since it only has threads on Core 0 • LDOM2 has 6 vCPUs spread across Cores 0, 1 & 2 > Potentially has access to MAUs 0, 1 & 2 > BUT... LDOM1 already binds MAU0 > So it can only take MAU1 and MAU2 • LDOM3 has 3 vCPUs on Core 2 > But can't access any MAUs since LDOM2 has already taken MAU2 • Adding and removing vCPUs can cause access to previously accessible MAUs to be lost; currently you can't select specific vCPUs - the framework does that itself • When MAUs are allocated to Domains, vCPUs become delayed reconfiguration properties in those domains [Diagram: three T1 cores with their MAUs and the three domains' thread placements] 191Saturday, May 2, 2009
  • 192. Ldom Best Practice (5)‫‏‬ • Plan your LDOM configuration carefully, reconfiguration may become awkward • Use easy to understand names > Try not to overload vds, vsw, ldom, vdisk,vnic etc... • Use MPxIO or VxVM, VxFS, Sun Cluster on service domains (only VxFS in Guests) for resilient storage devices • Use IPMP on Guest or Service Domains for resilient network connections 192Saturday, May 2, 2009
  • 193. Ldom Best Practice (6)‫‏‬ • For hi-speed inter-domain comms use device-less/in-memory VSW configs • For high disk performance, allocate a whole real device via a dedicated, properly sized Virtual Disk Server and Service domain • Look at the server architecture when configuring devices to ensure you get the bandwidth you expect • For critical applications consider hot/warm standby domains across multiple physical servers, never rely on multiple instances within a single server. 193Saturday, May 2, 2009
  • 194. LDOMs v1.0.1 Notes • All domains can be Stopped and Started independently > Beware, Guest domains attempting to perform IO using a rebooting Service domain will stall until the Service domain returns. • LDOM SNMP MIB available now with traps and requests to the LDOM framework • MAC address on the banner is different from the one required for jumpstart • Only vcpus can be dynamically reallocated > BUT... if the domain has crypto cores this becomes a delayed reconfiguration > You cannot choose which vCPUs are allocated to a domain • By default the Control/Service domain cannot network with Guest domains > Plumb the vSwitch vsw device to enable communications > Give the vsw device the e1000g device's MAC address • Check you have the latest versions of the documents, Software & Firmware 194Saturday, May 2, 2009
  • 195. SVM, VxVM, ZFS Volume managers • SVM, VxVM and ZFS volumes can be exported from a Service Domain to Guest domains and appear as virtual disks to the Guest Domains > Always appear as a disk with only one s0 slice > Can't be used as Solaris Install targets... yet; just use for data storage • Can export a disk image file placed in one of these volumes as a full disk image to Guest domains > Allows use of the disk as a Solaris Install Target > Doing this with ZFS allows very efficient re-use of images using ZFS Snapshotting, Cloning and Compression > Invisibly bestows the benefits of the underlying Volume manager on the disks available to the Guest domains > Using SVM allows either Guest or Service domain to access the disk image, allowing for off-line maintenance of the guest domain filesystems (only one at a time can mount the filesystem) • VxVM can only be used in the Service domain, not Guest domains 195Saturday, May 2, 2009
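The two export styles can be sketched as follows; the pool, volume and domain names are illustrative:

```shell
# 1) Export a ZFS volume: the guest sees a single-s0-slice disk - data use only
zfs create -V 10G mypool/ldm1data
ldm add-vdsdev /dev/zvol/dsk/mypool/ldm1data ldm1-data@primary-vds0
ldm add-vdisk ldm1-vdata ldm1-data@primary-vds0 ldm1

# 2) Export a disk image *file* stored on ZFS: the guest sees a full disk,
#    usable as a Solaris install target and clonable as in Example 3
mkfile 12G /export/ldoms/ldm1/rootdisk
ldm add-vdsdev /export/ldoms/ldm1/rootdisk ldm1-root@primary-vds0
ldm add-vdisk ldm1-vroot ldm1-root@primary-vds0 ldm1
```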
  • 196. Solaris Cluster 3.2 Support • Sun Cluster 3.2 is now supported in IO Domains > i.e. domains with real physical devices, PCI busses or NIU devices • Please check the web site here for more info on deployment scenarios > http://blogs.sun.com/SC/entry/announcing_solaris_cluster_support_in 196Saturday, May 2, 2009
  • 197. Logical Domains (LDoms) Roadmap • LDoms 1.0 > Niagara support > Up to 32 LDOMs per system, guest domain may be rebooted independently > Virtualized console, ethernet, disk & cryptographic acceleration > Live re-configuration of virtual CPUs > FMA diagnosis for each domain > Control domain hardening • LDoms 1.0.1 - CURRENT > Niagara2 support > I/O domain reboot support > Control domain minimization > SNMP MIB > Web management tool (freeware/unsupported) * Requiring new Solaris 10 update 197Saturday, May 2, 2009
  • 198. References for further information • http://www.sun.com/ldoms • Sun Blueprints relating to LDOMs – http://www.sun.com/blueprints/0207/820-0832.html – http://www.sun.com/blueprints/0807/820-3023.html • SDLC Release of LDOMs – http://www.sun.com/download/products.xml?id=46e5ba66 • Official Documentation for the SDLC release – http://www.sun.com/servers/coolthreads/ldoms/get.jsp • LDOMs Blogs – http://blogs.sun.com/hlsu/entry/logincal_domains_1_0_1 • OpenSolaris LDOMs community – http://www.opensolaris.org/os/community/ldoms/ 198Saturday, May 2, 2009
  • 199. LDOMS Introduction and Hands-On-Training • Peter Baer Galvin, Chief Technologist, Corporate Technologies • With Thanks to: Tom Gendron, SPARC Systems Technical Specialist, Sun Microsystems 199Saturday, May 2, 2009
  • 200. Agenda • Virtualization Comparisons • Concepts of LDOMs • Requirements of LDOMs • Examples • Best Practices 200Saturday, May 2, 2009
  • 201. The Data Center Today • Single application per server • Server sprawl is hard to manage • Average server utilization between 5 to 15% • Energy costs continue to rise [Diagram: client, mail, application, database and developer servers sprawling across the data center network and storage] 201Saturday, May 2, 2009
  • 202. A widely understood problem 202Saturday, May 2, 2009
  • 203. Virtualization: Who and Why InformationWeek: Feb 12, 2007 http://www.informationweek.com/news/showArticle.jhtml?articleID=197004875 203Saturday, May 2, 2009
  • 204. Server Virtualization • Hard Partitions (multiple OSs): > Very High RAS > Very Scalable > Mature Technology > Complete Isolation • Virtual Machines (multiple OSs): > Live OS migration capability > Improved Utilization > Ability to run different OS versions and types > De-couples OS and HW • OS Virtualization (single OS): > Very scalable and low overhead > Single OS to manage > Cleanly divides system and application administration > Ability to run different OS versions • Resource Mgmt. (single OS): > Very scalable and low overhead > Single OS to manage > Fine grained resource management 204Saturday, May 2, 2009
  • 205. Para vs. Full Virtualization • Para-virtualization: > OS ported to special architecture > Uses generic "virtual" device drivers > More efficient since it is "hypervisor" aware > "almost" native performance • Full virtualization: > OS has no idea it is running virtualized > Must emulate real i/o devices > Can be slow/need help from hardware > May use traps, emulation or rewriting [Diagrams: guest OSs running file/web/mail services over a hypervisor; the full-virtualization case adds a control domain] 205Saturday, May 2, 2009
  • 206. What is an LDOM? • It is a virtual server • Has its own console and OBP instance • A configurable allocation of CPU, FPU, Disk, Memory and I/O components • Runs a unique OS/patch image and configuration • Has the capability to stop, start and reboot independently • Utilizes a Hypervisor to facilitate LDOMs 206Saturday, May 2, 2009
  • 207. Requirements for LDOMs • Sun T-Series server > T1/2000, T5x20 rack servers > T6100, T6120 blades > Any future CMT-based server • Up-to-date firmware on the service processor http://sunsolve.sun.com/handbook_pub/validateUser.do?target=index • Minimum Solaris 10 11/06 on T1/2000, T6100 • Minimum Solaris 10 08/07 on T5x20, T6120 • LDoms Manager software 1.0.1 + patches 207Saturday, May 2, 2009
  • 208. Hypervisor • A thin interface between the hardware and Solaris • The interface is called sun4v • Solaris calls the sun4v interface to use hardware-specific functions • It is very simple and is implemented in firmware • It allows for the creation of LDOMs • It creates communication channels between LDOMs 208Saturday, May 2, 2009
  • 209. Key LDOMs components • The Hypervisor • The Control Domain • The Service Domain • Control & Service Domains • Virtualised devices [diagram: primary/control domain (ldmd, vntsd, drd; primary-vds0, primary-vsw0) and guest domains ldom1/ldom2 running Solaris 10, all on the Hypervisor over shared CPU, memory, crypto and IO hardware] 209Saturday, May 2, 2009
  • 210. LDOMs types • Different LDom types - Control Domain - Hosts the Logical Domain Manager (LDM) - Service Domains - Provide virtual services to other domains - I/O Domains - Have direct access to physical devices - Guest Domains - Used to run user environments • Control, Service and I/O domains can be combined or separate > One of the I/O domains must be the control domain 210Saturday, May 2, 2009
  • 211. Key LDOMs components • The Hypervisor • The Control Domain • The Service Domain • Control & Service Domains • Virtualised devices [diagram repeated: primary/control and guest domains over the Hypervisor and shared hardware] 211Saturday, May 2, 2009
  • 212. Control Domain • Creates and manages other LDOMs • Runs the LDOM Manager software • Allows monitoring and reconfiguration of domains • Recommendation: > Make this Domain as secure as possible 212Saturday, May 2, 2009
  • 213. Key LDOMs components • The Hypervisor • The Control Domain • The Service Domain • Control & Service Domains • Virtualised devices [diagram repeated: primary/control and guest domains over the Hypervisor and shared hardware] 213Saturday, May 2, 2009
  • 214. Service Domain • Provides services to other domains – virtual network switch – virtual disk service – virtual console service • Multiple Service domains can exist with shared or sole access to system facilities • Allows for IO load separation and redundancy within domains deployed on a platform • Often Control and Service Domains are one and the same 214Saturday, May 2, 2009
  • 215. IO Domain • IO Domain has direct access to physical input and output devices. • The number of IO domains is hardware dependent > currently limited to 2 > limited by PCI-E switch configuration • One IO domain must also be the control domain 215Saturday, May 2, 2009
  • 216. Key LDOMs components • The Hypervisor • The Control Domain • The Service Domain • Control & Service Domains • Virtualised devices [diagram repeated: primary/control and guest domains over the Hypervisor and shared hardware] 216Saturday, May 2, 2009
  • 217. Guest Domains • Contain the targeted applications the LDOMs were created to service • Multiple Guest domains can exist > Constrained only by hardware limitations • May use one or more Service domains to obtain IO > Various redundancy mechanisms can be used • Can be independently powered on and rebooted without affecting other domains 217Saturday, May 2, 2009
  • 218. Key LDOMs components • The Hypervisor • The Control Domain • The Service Domain • Control & Service Domains • Virtualised devices [diagram repeated: primary/control and guest domains over the Hypervisor and shared hardware] 218Saturday, May 2, 2009
  • 219. Virtual devices • Virtual devices are hardware resources abstracted by the hypervisor and made available for use by the other domains • Virtual devices are: > CPUs - VCPU > Memory > Crypto cores - MAU > Network switches - VSW > NICs - VNET > Disk servers - VDSDEV > Disks - VDISK > Consoles - VCONS 219Saturday, May 2, 2009
  • 220. Example 1 Install Ldom Manager & Setting up the Control Domain 220Saturday, May 2, 2009
  • 221. Example 1 steps • Update firmware to latest release • Install Supported version of Solaris • Install Logical Domain Manager (LDM) software • Configure the control domain • Save initial domain config • Reboot Solaris 221Saturday, May 2, 2009
  • 222. A note on system interfaces • Provide out-of-band management • Two types (iLOM and ALOM) • T1/2000 uses the ALOM interface • T5x20 uses iLOM • The iLOM CLI has an ALOM compatibility shell > ALOM shell used in the examples • A web-based interface is available • (SC = system controller, SP = service processor) > essentially the same thing 222Saturday, May 2, 2009
  • 223. Web based iLOM interface 223Saturday, May 2, 2009
  • 224. ALOM compatibility shell • login to SP as root/changeme • -> create /SP/users/admin • -> set /SP/users/admin role=Administrator • -> set /SP/users/admin cli_mode=alom – Creating user ... – Enter new password: ******** – Enter new password again: ******** – Created /SP/users/admin • exit • login as admin 224Saturday, May 2, 2009
  • 225. Step 1 Firmware verification and update 225Saturday, May 2, 2009
  • 226. System Identification and Update • Check the Service Processor of your system for firmware levels • Using ALOM mode (showhost is not available in the BUI): sc> showhost Sun System Firmware 7.0.1 2007/09/14 16:31 Host flash versions: Hypervisor 1.5.1 2007/09/14 16:11 OBP 4.27.1 2007/09/14 15:17 POST 4.27.1 2007/09/14 15:43 (the SC firmware version here is 7.0.1) • Upgrade your system firmware if needed... > flashupdate command > sysfwdownload (via Solaris on the platform) > BUI 226Saturday, May 2, 2009
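The minimum-firmware check above can be scripted. A sketch (my own helper, not a Sun tool) that compares the dotted version string reported by showhost against a required minimum, field by field:

```shell
# fw_at_least CURRENT MINIMUM
# Returns 0 (success) when CURRENT >= MINIMUM, comparing the three
# dotted fields numerically (e.g. "7.0.1" vs "6.4.6").
fw_at_least() {
    cur=$1 min=$2
    i=1
    while [ $i -le 3 ]; do
        c=$(echo "$cur" | cut -d. -f$i)
        m=$(echo "$min" | cut -d. -f$i)
        [ "${c:-0}" -gt "${m:-0}" ] && return 0
        [ "${c:-0}" -lt "${m:-0}" ] && return 1
        i=$((i + 1))
    done
    return 0    # all fields equal
}

# The showhost output above reports firmware 7.0.1:
fw_at_least 7.0.1 6.4.6 && echo "firmware OK"
```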
  • 227. Firmware update example sc> showkeyswitch Keyswitch is in the NORMAL position. sc> flashupdate -s 10.8.66.15 -f /incoming/Sun_System_Firmware-6_4_6-Sun_Fire_T2000.bin Username: tgendron Password: ******** SC Alert: System poweron is disabled. Update complete. Reset device to use new software. sc> resetsc Telnet and log back in once the SC is up. sc> showhost Sun-Fire-T2000 System Firmware 6.5.5 2007/10/28 23:09 227Saturday, May 2, 2009
  • 228. Firmware update example 2 Step 1: From Solaris running on T5120 with the SP to update Download the patch from Sun Solve 127580-05.zip Step 2: unzip and cd into 127580-05 Step 3: run sysfwdownload [image].pkg Step 4: reboot solaris sc> resetsc 228Saturday, May 2, 2009
  • 229. Installing LDOM manager software • T5x20 requires Solaris 10 8/07 or greater • T1/2000 requires Solaris 10 11/06 or greater, plus: > 124921-02 at a minimum > 125043-01 at a minimum > 118833-36 at a minimum • 11/06 is the minimum for guests • ldm 1.0.2 is current > includes Solaris Security Toolkit (optional) 229Saturday, May 2, 2009
  • 230. Install the LDM Software • Unzip and install with the installation script • Security of the Control Domain is important > Recommend selecting the JASS secure configuration • Once complete, the entire system is one LDOM • LDOM software installed in /opt/SUNWldm # [cmt1/root] ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv SP 64 8064M 0.0% 3h 19m [cmt1/root] All the system resources are in domain "primary" * Follow the Administration Guide to install required OS and patches 230Saturday, May 2, 2009
  • 231. Flag Definitions # ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv SP 32 32640M 0.1% 6d 20h 24m # - placeholder c control domain d delayed reconfiguration n normal s starting or stopping t transition v virtual I/O domain 231Saturday, May 2, 2009
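The FLAGS column can be decoded mechanically. A hypothetical helper (not part of the LDoms software) that maps each flag letter to the meanings in the table above:

```shell
# decode_ldm_flags FLAGS
# Expand an `ldm list` FLAGS string such as "-n-cv" into words.
decode_ldm_flags() {
    flags=$1
    out=""
    while [ -n "$flags" ]; do
        c=${flags%"${flags#?}"}     # first character
        flags=${flags#?}            # rest of the string
        case $c in
            -) ;;                           # placeholder, ignore
            c) out="$out control" ;;
            d) out="$out delayed-reconf" ;;
            n) out="$out normal" ;;
            s) out="$out starting/stopping" ;;
            t) out="$out transition" ;;
            v) out="$out virtual-io" ;;
        esac
    done
    echo "${out# }"                 # trim leading space
}

decode_ldm_flags "-n-cv"    # -> normal control virtual-io
```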
  • 232. Example 1 Part 2 Setting up the Control Domain 232Saturday, May 2, 2009
  • 233. On naming things... • Choose LDOM component names carefully > Names are used to manage the devices > Bad choices can be very confusing later on... > Keep names short and specific... • You need names for ... > Disk Servers, and disk device instances > Network Virtual Switches, and network device instances > Domains • Service and device names are only known to the Control and Service domains – Guest domains just see virtual devices. 233Saturday, May 2, 2009
  • 234. Control/Service Domain • On our primary domain do the following... • In this example Control and Service are combined > Control domain runs the LDM > Service domain has these services set up • Set up the basic services needed: > vds - virtual disk service > vcc - virtual console concentrator > vsw - virtual network switch • The service names in this example are: > primary-vds0 > primary-vcc0 > primary-vsw0 • Allocate resources > CPU, Memory, Crypto, IO devices [diagram: primary domain running ldmd, drd and vntsd with vcc0/vds0/vsw0 services, on the Hypervisor over shared CPU, memory, crypto and IO hardware] 234Saturday, May 2, 2009
  • 235. Control/Service Domain set-up (1) # Add services to the control domain # The MAC address is taken from a physical interface, e.g., e1000g0. ldm add-vds primary-vds0 primary ldm add-vcc port-range=5000-5100 primary-vcc0 primary ldm add-vsw mac-addr=0:14:4f:6a:9e:dc net-dev=e1000g0 primary-vsw0 primary # Activate the virtual network terminal server svcadm enable vntsd # Allocate resources to the control domain and save ldm set-mau 1 primary ldm set-vcpu 8 primary ldm set-memory 2G primary ldm add-spconfig my-initial # Reboot required to have the configuration take effect. init 6 235Saturday, May 2, 2009
  • 236. Crypto Note Note - If you have any cryptographic devices (MAUs) bound to the control domain, you cannot dynamically reconfigure its CPUs. So if you are not using cryptographic devices, set the MAU count to 0 (ldm set-mau 0 primary). 236Saturday, May 2, 2009
  • 237. Control/Service Domain set-up (2)‫‏‬ # Verify the primary domain configuration ldm list-domain NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv SP 8 2G 6.3% 6m # Enable Networking ifconfig vsw0 plumb ifconfig e1000g0 down unplumb ifconfig vsw0 10.8.66.208 netmask 255.255.255.0 broadcast + up ifconfig -a lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1 inet 127.0.0.1 netmask ff000000 vsw0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3 inet 10.8.66.208 netmask ffffff00 broadcast 10.8.66.255 ether 0:14:4f:6a:9e:dc 237Saturday, May 2, 2009
  • 238. Ldom Service details 238Saturday, May 2, 2009
  • 239. Reconfiguration • Dynamic reconfiguration > Resource changes that take effect w/out reboot of domain • Delayed reconfiguration > Resource changes that take effect after a reboot • Resource examples: > VCPU, Memory, IO devices • Currently only VCPUs are dynamic 239Saturday, May 2, 2009
  • 240. Virtual Disk Server device (vds) Delayed Reconfiguration • VDS runs in a service domain • Performs disk I/O on corresponding raw devices • Device types can be: > An entire physical disk or LUN (can be SAN-based) > A single slice of a disk or LUN > A disk image in a filesystem (e.g. ufs, zfs) > Disk volumes (zfs, svm, VxVM) > lofi devices are NOT supported • Virtual Disk Client (vdc drivers) > Requests standard block IO via the VDS > Classic client/server architecture 240Saturday, May 2, 2009
  • 241. Virtual Disk devices • Physical LUNs perform best • Disk image files make efficient use of space • ZFS snapshots and clones give rapid provisioning • Network install is not supported with: > zfs volumes > a single slice • Network install requires: > an entire disk > or a disk image file 241Saturday, May 2, 2009
  • 242. Virtual Network Switch services (vswitch)‫‏‬ Delayed Reconfiguration • Implements a layer-2 network switch • Connects virtual network devices to > To the physical network > or to each other (internal private network)‫‏‬ • vswitch not automatically used by service domain > must be plumbed 242Saturday, May 2, 2009
  • 243. Virtual Console Concentrator (vcc) Delayed Reconfiguration • Provides console access to LDoms • The Service domain VCC driver communicates with all guest console drivers over the Hypervisor > No changes required in guest console drivers (qcn) • Makes each console available as a tty device on the Control/Service domain • Usage: telnet localhost <port> 243Saturday, May 2, 2009
  • 244. Virtual Network Terminal Server daemon (vntsd) Delayed Reconfiguration • VCC implemented by vntsd • Runs in the Control/Service domain • Aggregates the VCC tty devices and makes them available over network sockets > Accessible once a domain is configured and bound > Attach prior to domain start to watch domain OBP boot sequence • Only one user at a time can view a serial console • Flexible support of port groups, IPs, port numbers etc > Not visible outside the Control/Service domain by default 244Saturday, May 2, 2009
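Since each bound domain is assigned a vntsd port in the configured range (5000-5100 in these examples), the CONS column of `ldm list` tells you where to telnet. A sketch that pulls that column out of captured output (sample data is embedded here so it runs anywhere; on a real control domain you would feed in live `ldm list` output instead):

```shell
# Captured `ldm list` output (sample taken from the examples in this deck).
sample='NAME    STATE  FLAGS CONS VCPU MEMORY UTIL UPTIME
primary active -n-cv SP   8    2G     0.7% 17h 12m
ldm1    active -t--- 5000 8    1G     13%  7s'

# cons_port DOMAIN: print the CONS column (vntsd telnet port, or SP
# for the control domain) for the named domain.
cons_port() {
    echo "$sample" | awk -v d="$1" '$1 == d { print $4 }'
}

cons_port ldm1      # -> 5000
# Then attach to the guest console with:
#   telnet localhost "$(cons_port ldm1)"
```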
  • 245. Example 2 Setting up the Guest Domain 245Saturday, May 2, 2009
  • 246. Guest Domain In the control domain: ldm add-domain ldm1 ldm add-mau 1 ldm1 ldm add-vcpu 4 ldm1 ldm add-memory 4G ldm1 ldm add-vnet vnet0 primary-vsw0 ldm1 ldm add-vdsdev /dev/dsk/c0t1d0s2 ldm1-vol1@primary-vds0 ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1 ldm set-var auto-boot?=false ldm1 ldm set-var boot-device=vdisk ldm1 ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm1 ldm bind-domain ldm1 ldm start-domain ldm1 • Watch the console of ldom1 using... > telnet localhost 5000 [diagram: T2000 with primary (control & service) and guest ldm1 domains over the hypervisor] 246Saturday, May 2, 2009
  • 247. Disk Service Setup • Establish a Virtual Disk Service - primary-vds • Associate it with some form of media - a real device or slice (/dev/dsk/c0t1d0s0 or /dev/dsk/c0t1d0s2) - or a disk image, e.g. /ldmzpool/ldg1 • Create a disk server device instance to be exported to guest domains - ldm1-vol1@primary-vds: ldm add-vdsdev /dev/dsk/c0t1d0s2 ldm1-vol1@primary-vds0 ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1 (The disk device name can vary - find it via "ok show-devs") [diagram: primary service domain exporting ldm1-vdisk1 to guest ldom1] 247Saturday, May 2, 2009
  • 248. Virtual Disk Client (vdc) Delayed Reconfiguration • vdcs are the objects passed to OBP and the Operating System in guest systems • Guest domain OBP and Solaris see normal SCSI devices • Domain administrators may set up devaliases or use raw vdisk devices • vdcs provide Guest domains with virtual disk devices (vdisks) via device instances from Virtual Disk Servers running in the Service Domain(s) • A future release will provide virtualised access to DVD/CD-ROM in service domains 248Saturday, May 2, 2009
  • 249. Network Setup • Establish a Virtual Network Switch service - primary-vsw0 > Automatically associated with a vsw device instance - vsw0 • May or may not choose to associate it with media - e1000g0, a real NIC - or no NIC: in-memory only • Create a network device instance to provide to guest domains - vnet0@ldm1 [diagram: primary-vsw0 connecting vnet0 in guest ldom1 to e1000g0] 249Saturday, May 2, 2009
  • 250. Virtual Network Device (vnet) Delayed Reconfiguration • Implements an ethernet device in a domain > Communicates with other vnets or the outside world over vswitch devices • If the vSwitch is suitably configured, packets can be routed out of the server. • vnet exports a GLDv3 interface > A simple virtual Ethernet NIC > Enumerates as a vnetx device > For domain-domain transfers, vnets connect directly. 250Saturday, May 2, 2009
  • 251. Memory Delayed Reconfiguration • Memory is configured through the Control Domain • Minimum allocatable chunk is 8kB > Minimum size is 12MB (for OBP) > Though most OS deployments will need > 512M • If memory is added over time to a domain > Memory device bindings within a domain may appear to show that memory fragmentation is occurring > Not a problem - all handled in HW by the MMU > No performance penalty 251Saturday, May 2, 2009
  • 252. vCPUs Immediate Reconfiguration • Each UltraSPARC T1 has up to 8 physical cores with 4 threads each > Each thread is a vCPU, so up to 32 vCPUs or domains • Each UltraSPARC T2 has up to 8 physical cores with 8 threads each > Each thread is a vCPU, so up to 64 vCPUs or domains • Minimum granularity is 1 vCPU per domain • vCPUs can only be allocated to one domain at a time • Can be dynamically allocated with the domain running > Take care when removing a vCPU from a running domain: will there be enough compute power left? 252Saturday, May 2, 2009
  • 253. Example 3 Guest Domains and ZFS 253Saturday, May 2, 2009
  • 254. Using ZFS (1) – setup zfs 1. Remove the disk from the service domain ldm stop-domain ldm1 LDom ldm1 stopped ldm unbind-domain ldm1 ldm remove-vdsdev ldm1-vol1@primary-vds0 2. Create a zpool root@cmt1 > zfs create mypool/ldoms root@cmt1 > zfs create mypool/ldoms/ldm1 root@cmt1 > cd /export/ldoms/ldm1 root@cmt1 > ls root@cmt1 > mkfile 12G `pwd`/rootdisk 254Saturday, May 2, 2009
  • 255. Using ZFS (2) – setup guest domain 3. Configure the guest domain root@cmt1 > ldm add-domain ldm1 root@cmt1 > ldm add-vcpu 8 ldm1 root@cmt1 > ldm add-memory 1G ldm1 root@cmt1 > ldm add-vnet vnet0 primary-vsw0 ldm1 root@cmt1 > ldm add-vdsdev /export/ldoms/ldm1/rootdisk ldm1-vol1@primary-vds0 root@cmt1 > ldm add-vdisk ldm1-vdisk1 ldm1-vol1@primary-vds0 ldm1 root@cmt1 > ldm set-var auto-boot?=false ldm1 root@cmt1 > ldm set-var boot-device=ldm1-vdisk1 ldm1 root@cmt1 > ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel- devices@200/network@0 ldm1 255Saturday, May 2, 2009
  • 256. Using ZFS (3) – setup guest domain 4. Start the guest domain root@cmt1 > ldm bind-domain ldm1 root@cmt1 > ldm start-domain ldm1 LDom ldm1 started 5. Inspect the domain root@cmt1 > ldm list-domain NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-cv SP 8 2G 0.7% 17h 12m ldm1 active -t--- 5000 8 1G 13% 7s telnet localhost 5000 {ok} boot vnet0 - install installation goes forward 256Saturday, May 2, 2009
  • 257. Provision the guest 6. Set up for jumpstart Determine the mac address root@cmt1 > ldm list-bindings ldm1 [snip] NETWORK NAME SERVICE DEVICE MAC vnet0 primary-vsw0@primary network@0 00:14:4f:f8:2a:c4 PEER MAC primary-vsw0@primary 00:14:4f:46:41:b4 telnet localhost 5000 {0} ok banner SPARC Enterprise T5120, No Keyboard [snip] Ethernet address 0:14:4f:fb:7:42, Host ID: 83fb0742. 257Saturday, May 2, 2009
  • 258. Provision the guest (2) {0} ok boot vnet0 - install Boot device: /virtual-devices@100/channel-devices@200/network@0 File and args: - install Requesting Internet Address for 0:14:4f:f8:2a:c4 SunOS Release 5.10 Version Generic_120011-14 64-bit ... How to break: telnet> send brk Debugging requested; hardware watchdog suspended. c)ontinue, s)ync, r)eboot, h)alt? r Resetting... {0} ok 258Saturday, May 2, 2009
  • 259. Guest Domain (zfs) login {0} ok boot Boot device: ldm1-vdisk1 File and args: SunOS Release 5.10 Version Generic_120011-14 64-bit Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. Hostname: ldm1 ldm1 console login: 259Saturday, May 2, 2009
  • 260. Using ZFS (2) – cloning domains Snapshot and clone the installed boot disk tgendron@cmt1 > zfs list NAME USED AVAIL REFER MOUNTPOINT mypool 12.0G 54.9G 27.5K /export mypool/ldoms 12.0G 54.9G 25.5K /export/ldoms mypool/ldoms/ldm1 12.0G 54.9G 12.0G /export/ldoms/ldm1 root@cmt1 > zfs snapshot mypool/ldoms/ldm1@initial Create the clones root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm2 root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm3 root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm4 root@cmt1 > zfs clone mypool/ldoms/ldm1@initial mypool/ldoms/ldm5 260Saturday, May 2, 2009
  • 261. Using ZFS (2) – Leverage the clones 4. Create the new guest domains (should be easy to script this) ldm add-domain ldm2 ldm add-vcpu 8 ldm2 ldm add-memory 1G ldm2 ldm add-vnet vnet0 primary-vsw0 ldm2 ldm add-vdsdev /export/ldoms/ldm2/rootdisk ldm2-vol1@primary-vds0 ldm add-vdisk ldm2-vdisk1 ldm2-vol1@primary-vds0 ldm2 ldm set-var auto-boot?=false ldm2 ldm set-var boot-device=vdisk ldm2 ldm set-var nvramrc-devalias vnet0 /virtual-devices@100/channel-devices@200/network@0 ldm2 ldm bind-domain ldm2 ldm start-domain ldm2 261Saturday, May 2, 2009
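The per-clone command sequence above is repetitive, so generating it is the natural way to script it. A dry-run sketch (names and sizes mirror the example; on the control domain you would pipe the output to sh to execute):

```shell
# gen_guest DOMAIN: print the ldm command sequence for one cloned
# guest. Dry-run output only; pipe to sh on the control domain to run.
gen_guest() {
    d=$1
    cat <<EOF
ldm add-domain $d
ldm add-vcpu 8 $d
ldm add-memory 1G $d
ldm add-vnet vnet0 primary-vsw0 $d
ldm add-vdsdev /export/ldoms/$d/rootdisk $d-vol1@primary-vds0
ldm add-vdisk $d-vdisk1 $d-vol1@primary-vds0 $d
ldm set-var auto-boot?=false $d
ldm set-var boot-device=$d-vdisk1 $d
ldm bind-domain $d
ldm start-domain $d
EOF
}

# Emit the commands for all four clones created on the previous slide:
for d in ldm2 ldm3 ldm4 ldm5; do
    gen_guest "$d"
done
```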
  • 262. Boot the cloned ldom {0} ok boot Boot device: vdisk File and args: SunOS Release 5.10 Version Generic_120011-14 64-bit Copyright 1983-2007 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. WARNING: vnet0 has duplicate address 010.030.019.178 (in use by 00:14:4f:f8:2a:c4); disabled Feb 13 19:55:29 svc.startd[7]: svc:/network/physical:default: Method "/lib/svc/method/net-physical" failed with exit status 96. Feb 13 19:55:29 svc.startd[7]: network/physical:default misconfigured: transitioned to maintenance (see svcs -xv for details)‫‏‬ Hostname: ldm1... 262Saturday, May 2, 2009
  • 263. Example 4 Split Service Domains 263Saturday, May 2, 2009
  • 264. Sun Fire T2000 Block Diagram 264Saturday, May 2, 2009
  • 265. Split IO Example • Setting up a second Service domain with split PCI busses... -bash-3.00# ldm list-bindings primary Name: primary ... IO: pci@780 (bus_a) pci@7c0 (bus_b) ... -bash-3.00# df / / (/dev/dsk/c1t0d0s0 ):28233648 blocks 3450076 files -bash-3.00# ls -l /dev/dsk/c1t0d0s0 lrwxrwxrwx 1 root root 65 Apr 11 13:25 /dev/dsk/c1t0d0s0 -> ../../devices/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0:a -bash-3.00# grep e1000g /etc/path_to_inst "/pci@780/pci@0/pci@1/network@0" 0 "e1000g" "/pci@780/pci@0/pci@1/network@0,1" 1 "e1000g" "/pci@7c0/pci@0/pci@2/network@0" 2 "e1000g" "/pci@7c0/pci@0/pci@2/network@0,1" 3 "e1000g" -bash-3.00# ldm remove-io pci@780 primary -bash-3.00# shutdown -i6 -y -g0 -bash-3.00# ldm add-io pci@780 second-srvc-dom -bash-3.00# ldm start second-srvc-dom -bash-3.00# ldm list-bindings Check which PCI bus ports we own and are currently using, and be sure to only give away unused ones - i.e. you need to retain the Control Domain's boot disk controller and network device. Providing a PCI bus to a Guest makes the selected domain a Service domain by definition - access to physical IO = Service Domain. 265Saturday, May 2, 2009
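When deciding which PCI root to give away, it helps to map each critical device path to its bus first. A small sketch (my own helper, using the T2000 paths shown above) that extracts the PCI root from a device path, so you can confirm the bus you `remove-io` does not carry the control domain's boot disk or primary NIC:

```shell
# bus_of DEVICE_PATH: print the PCI root (pci@780 or pci@7c0 on a
# T2000) that a device path hangs off.
bus_of() {
    # The first path component after the leading "/" is the PCI root.
    echo "$1" | cut -d/ -f2
}

# Boot disk (from the ls -l output above) is on bus_b:
bus_of "/pci@7c0/pci@0/pci@1/pci@0,2/LSILogic,sas@2/sd@0,0:a"   # -> pci@7c0
# e1000g0 (from path_to_inst) is on bus_a:
bus_of "/pci@780/pci@0/pci@1/network@0"                         # -> pci@780
```

Here the boot disk lives on pci@7c0, so pci@780 is the bus that can safely go to the second service domain, provided the retained network interface is not one of the pci@780 e1000g instances.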
  • 266. Sun Fire T5x20 Block Diagram [diagram: 16 FB-DIMMs; 1RU/2RU disk chassis on an LSI 1068E SAS controller; PLX 8533/8517 PCI-E switches; MPC885 ILOM service processor with FPGA and SSI link; dual 10GbE (BCM8704 SerDes, XFP, 2RU only); quad GbE (Intel dual GbE, Cu PHY); USB 2.0 hub, USB-to-IDE DVD; front/rear panel connectors, serial and network management ports; PCI-E slots x16/x8] 266Saturday, May 2, 2009
  • 267. MPxIO considerations • MPxIO can be used in the Service/Control domain • Very straightforward to configure with defaults... > Ensure you have two FC-AL HBAs in a single service domain attached to the same SAN array > Check that you have two paths to the same SAN devices (ls /dev/dsk/) > Enable MPxIO by running the command stmsboot -e and rebooting the control/service domain > Check that you now have only a single (multipathed) path to the SAN devices... 267Saturday, May 2, 2009
  • 268. IPMP considerations • IPMP has several options for configuration > Refer to the Administration Guide for worked examples... > Options are multipathing in the Service Domain or multipathing in the Guest Domain 268Saturday, May 2, 2009
  • 269. LDom 1.0.1 Best Practice Guidance 269Saturday, May 2, 2009
  • 270. Ldom Best Practice (1) • Control Domain > Runs LDM daemon processes > Must have adequate CPU and memory > Start with 1 core (4 or 8 threads) and 1GB memory > Make this domain as secure as possible 270Saturday, May 2, 2009
  • 271. Ldom Best Practice (2) • I/O and service domains > Run IO for other domains > Resources should be sized based on IO load > Start with 1 core and 1GB memory > 4GB of memory if zfs is used for virtual disk images > Add complete cores for heavier I/O loads 271Saturday, May 2, 2009
  • 272. Ldom Best Practice (3)‫‏‬ • Core/Thread Affinity > Core resources are shared by threads > E.g. L1 cache and MAU, FPU • Best to avoid allocating the threads of a core to separate domains • Create larger Ldoms first using complete cores • Smaller domains last 272Saturday, May 2, 2009
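The whole-core advice above is easy to check before allocating. A sketch (a hypothetical helper, not an ldm feature) that flags vCPU counts which would split a 4-thread (T1) or 8-thread (T2) core between domains:

```shell
# whole_cores VCPUS THREADS_PER_CORE
# Report whether an allocation lands on whole-core boundaries
# (4 threads/core on T1, 8 on T2).
whole_cores() {
    vcpus=$1 tpc=$2
    if [ $((vcpus % tpc)) -eq 0 ]; then
        echo "ok: $vcpus vCPUs = $((vcpus / tpc)) whole core(s)"
    else
        echo "warning: $vcpus vCPUs is not a multiple of $tpc threads/core"
    fi
}

whole_cores 8 4     # -> ok: 8 vCPUs = 2 whole core(s)
whole_cores 6 8     # -> warning: 6 vCPUs is not a multiple of 8 threads/core
```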
  • 273. Ldom Best Practice (4) Delayed Reconfiguration • Crypto Units • Each T1/T2 physical CPU core has a crypto unit > 8 in total on an 8-core system > referred to as an MAU • Crypto cores can only be allocated to domains that have at least one vCPU (thread) on the same physical core as the crypto unit • Crypto cores cannot be shared; they are owned by exactly one (or no) domain • Probably best to allocate all four/eight threads on a core to a domain that wants to use the crypto core 273Saturday, May 2, 2009
  • 274. More on Crypto Units • For example, we define three domains in order: LDOM1, then LDOM2, then LDOM3 • LDOM1 has 3 threads (vCPUs) on Core 0 > Only has access to MAU0, since it only has threads on Core 0 • LDOM2 has 6 vCPUs spread across Cores 0, 1 & 2 > Potentially has access to MAUs 0, 1 & 2 > BUT LDOM1 already binds MAU0 > So it can only take MAU1 and MAU2 • LDOM3 has 3 vCPUs on Core 2 > But can't access any MAUs, since LDOM2 has already taken MAU2 • Adding and removing vCPUs can cause access to previously accessible MAUs to be lost; currently you can't elect specific vCPUs - the framework does that itself • When MAUs are allocated to domains, vCPUs become delayed-reconfiguration properties in those domains [diagram: LDOM1-3 mapped onto T1 Cores 0-2 and MAU0-2] 274Saturday, May 2, 2009
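The eligibility rule in this example can be modeled directly: a domain can bind only MAUs on cores where it has at least one vCPU, and only MAUs no other domain already holds. A sketch of that rule (a model of the behavior described above, not an ldm interface):

```shell
# maus_for "CORES" "TAKEN"
# CORES: space-separated core numbers the domain's vCPUs land on.
# TAKEN: space-separated core numbers whose MAUs are already bound
#        to other domains. Prints the MAUs this domain could bind.
maus_for() {
    cores=$1 taken=$2
    out=""
    for c in $cores; do
        case " $taken " in
            *" $c "*) ;;            # MAU on this core already bound
            *) out="$out $c" ;;     # MAU available to this domain
        esac
    done
    echo "${out# }"
}

# LDOM2 spans cores 0, 1 and 2, but LDOM1 already holds MAU0:
maus_for "0 1 2" "0"      # -> 1 2
# LDOM3 sits on core 2 after LDOM2 took MAUs 1 and 2: nothing left.
maus_for "2" "0 1 2"
```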
  • 275. Ldom Best Practice (5)‫‏‬ • Plan your LDOM configuration carefully, reconfiguration may become awkward • Use easy to understand names > Try not to overload vds, vsw, ldom, vdisk,vnic etc... • Use MPxIO or VxVM, VxFS, Sun Cluster on service domains (only VxFS in Guests) for resilient storage devices • Use IPMP on Guest or Service Domains for resilient network connections 275Saturday, May 2, 2009
  • 276. Ldom Best Practice (6)‫‏‬ • For hi-speed inter-domain comms use device-less/in-memory VSW configs • For high disk performance, allocate a whole real device via a dedicated, properly sized Virtual Disk Server and Service domain • Look at the server architecture when configuring devices to ensure you get the bandwidth you expect • For critical applications consider hot/warm standby domains across multiple physical servers, never rely on multiple instances within a single server. 276Saturday, May 2, 2009
  • 277. LDOMs v1.0.1 Notes • All domains can be stopped and started independently > Beware: Guest domains attempting to perform IO using a rebooting Service domain will stall until the Service domain returns • LDOM SNMP MIB available now, with traps and requests to the LDOM framework • MAC address on the banner is different from what is required for jumpstart • Only vCPUs can be dynamically reallocated > BUT... if the domain has crypto cores this becomes a delayed reconfiguration > You cannot choose which vCPUs are allocated to a domain • By default the Control/Service domain cannot network with Guest domains > Plumb the vSwitch vsw device to enable communications > Give the vsw device the e1000g device's MAC address • Check you have the latest versions of the documents, software & firmware 277Saturday, May 2, 2009
  • 278. SVM, VxVM, ZFS Volume managers • SVM, VxVM and ZFS volumes can be exported from a Service Domain to Guest domains, where they appear as virtual disks > Always appear as a disk with only one s0 slice > Can't be used as Solaris install targets... yet; just use them for data storage • Can export a disk image file placed in one of these volumes as a full disk image to Guest domains > Allows use of the disk as a Solaris install target > Doing this with ZFS allows very efficient re-use of images using ZFS snapshotting, cloning and compression > Invisibly bestows the benefits of the underlying volume manager on the disks available to the Guest domains > Using SVM allows either the Guest or the Service domain to access the disk image, allowing for off-line maintenance of the guest domain filesystems (only one at a time can mount the filesystem) • VxVM can only be used in the Service domain, not Guest domains 278Saturday, May 2, 2009
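Exporting a ZFS volume as a guest data disk follows a fixed pattern, so it too can be generated. A dry-run sketch (the pool, volume and vds names are illustrative, following the conventions used earlier in this deck; /dev/zvol/dsk is the standard Solaris 10 zvol path):

```shell
# export_zvol POOL VOL DOMAIN SIZE
# Print the commands that carve a ZFS volume in the service domain and
# hand it to a guest as a vdisk (data storage only, per the note above).
# Dry-run output; pipe to sh on the control domain to execute.
export_zvol() {
    pool=$1 vol=$2 dom=$3 size=$4
    cat <<EOF
zfs create -V $size $pool/$vol
ldm add-vdsdev /dev/zvol/dsk/$pool/$vol $vol@primary-vds0
ldm add-vdisk $vol-disk $vol@primary-vds0 $dom
EOF
}

export_zvol mypool data1 ldm1 10G
```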
  • 279. Solaris Cluster 3.2 Support • Sun Cluster 3.2 is now supported in I/O Domains > i.e. domains with real physical devices, PCI busses or NIU devices • Please check the web site here for more info on deployment scenarios > http://blogs.sun.com/SC/entry/announcing_solaris_cluster_support_in 279Saturday, May 2, 2009
  • 280. Logical Domains (LDoms) Roadmap • LDoms 1.0 > Niagara support > Up to 32 LDOMs per system, guest domain may be rebooted independently > Virtualized console, ethernet, disk & cryptographic acceleration > Live re-configuration of virtual CPUs > FMA diagnosis for each domain > Control domain hardening • LDoms 1.0.1 - CURRENT – Niagara2 support – I/O domain reboot support – Control domain minimization – SNMP MIB – Web management tool (freeware/unsupported) * Requiring new Solaris 10 update 280Saturday, May 2, 2009
  • 281. References for further information • http://www.sun.com/ldoms • Sun Blueprints relating to LDOMs – http://www.sun.com/blueprints/0207/820-0832.html – http://www.sun.com/blueprints/0807/820-3023.html • SDLC Release of LDOMs – http://www.sun.com/download/products.xml?id=46e5ba66 • Official Documentation for the SDLC release – http://www.sun.com/servers/coolthreads/ldoms/get.jsp • LDOMs Blogs – http://blogs.sun.com/hlsu/entry/logincal_domains_1_0_1 • OpenSolaris LDOMs community – http://www.opensolaris.org/os/community/ldoms/ 281Saturday, May 2, 2009
  • 282. Domains Copyright 2009 Peter Baer Galvin - All Rights Reserved 282Saturday, May 2, 2009
  • 283. Overview Long-standing Sun server feature E10Ks and all servers since then Hard partition of system resources (bus, CPU, memory, I/O) Options vary depending on hardware (how many domains, CPUs per domain) Sometimes used in conjunction with Dynamic Reconfiguration (DR) Controlled via firmware commands (XSCF on M-servers) Copyright 2009 Peter Baer Galvin - All Rights Reserved 283Saturday, May 2, 2009
  • 284. Prep Work Do this before installing Solaris / moving to production Determine number of domains, resources per domain (CPU, memory, I/O) Make sure I/O is redundant between allocation units (so, for example, a system board can be taken out of service without disabling I/O to a device) PCI cards must support DR (per device) Leave “kernel cage memory” enabled to minimize the number of system boards that kernel memory is allocated to Enabled by default in S10 (but costs a little performance) Disable via set kernel_cage_enable=0 in /etc/system Copyright 2009 Peter Baer Galvin - All Rights Reserved 284Saturday, May 2, 2009
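The tunable mentioned above goes in /etc/system; a minimal fragment (the cage is enabled by default on Solaris 10, so this line is only needed to turn it off):

```
* /etc/system fragment: disable kernel cage memory
* (trades DR-friendly confinement of kernel memory for a small
* performance gain; a reboot is required for this to take effect)
set kernel_cage_enable=0
```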
  • 285. Prep Work (cont) Copyright 2009 Peter Baer Galvin - All Rights Reserved 285Saturday, May 2, 2009
  • 286. M-Servers Copyright 2009 Peter Baer Galvin - All Rights Reserved 286Saturday, May 2, 2009
  • 287.
    Server model    Max system boards    Max domains
    M9000+EU        16                   24
    M9000           8                    24
    M8000           4                    16
    M5000           2                    4
    M4000           1                    2
    Copyright 2009 Peter Baer Galvin - All Rights Reserved 287Saturday, May 2, 2009
  • 288. Implementation For M-servers, see http://docs.sun.com/source/819-3601-13 setupfru, showfru, setdcl, addboard, showdcl, showboards commands configure resources into domains in XSCF Copyright 2009 Peter Baer Galvin - All Rights Reserved 288Saturday, May 2, 2009
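An illustrative XSCF session assembling a domain from the commands named above. Board and domain IDs are hypothetical, and exact option syntax varies by XCP firmware release, so check the manual linked above before running anything:

```shell
XSCF> setupfru -x 1 sb 0        # set system board 0 to Uni-XSB mode
XSCF> setdcl -d 0 -a 0=00-0     # map XSB 00-0 to LSB 0 of domain 0
XSCF> addboard -c assign -d 0 00-0
XSCF> showdcl -d 0              # verify the domain component list
XSCF> showboards -a             # confirm the board assignment
```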
  • 289. DR Can install, remove, add, delete, move, register, configure, unconfigure, etc. system boards A system board is in one domain at a time Move resources as needed between domains Movement can be automated or manual And I/O devices While Solaris remains running Good details in http://docs.sun.com/source/819-5992-12 Copyright 2009 Peter Baer Galvin - All Rights Reserved 289Saturday, May 2, 2009
  • 290. Implementation XSCF used to configure DR Shell and Web interfaces Add to Domain command set showdcl, setdcl, addboard, deleteboard, moveboard, showdomainstatus cfgadm and cfgadm_pci configure DR on I/O devices Be sure to configure and implement all of this before going into production - don’t plan on adding a domain to a production system without practice and experience Copyright 2009 Peter Baer Galvin - All Rights Reserved 290Saturday, May 2, 2009
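A hedged sketch of a DR operation, first from the running Solaris domain and then from the XSCF shell. Attachment-point names such as SB00 and the XSB/domain IDs are examples; run `cfgadm -al` first to see the real attachment points on your system:

```shell
# From the running Solaris domain: list attachment points,
# then unconfigure a system board prior to removal
cfgadm -al
cfgadm -c unconfigure SB00

# From the XSCF shell: disconnect the board and move it to
# another domain while both domains keep running
XSCF> deleteboard -c disconnect 00-0
XSCF> moveboard -d 1 00-0
```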
  • 291. xVM Virtualbox Copyright 2009 Peter Baer Galvin - All Rights Reserved 291Saturday, May 2, 2009
  • 292. Overview Sun has a suite of xVM products xVM ops center - patching x86 and SPARC (Linux too) plus provisioning xVM virtualbox - desktop virtualization x86 xVM server (aka Xen) - hypervisor-like virtualization for x86 Copyright 2009 Peter Baer Galvin - All Rights Reserved 292Saturday, May 2, 2009
  • 293. Virtualbox Open source (GPL) virtualization environment for x86 (and closed source commercial version) (Sun bought the independent developer) Completes Sun’s virtualization picture by adding desktop / workstation virtualization tool Competes with VMWare workstation, Parallels, Fusion Copyright 2009 Peter Baer Galvin - All Rights Reserved 293Saturday, May 2, 2009
  • 294. Platform Support Runs on Windows, Linux, MacOS X, and OpenSolaris Guest support is extensive, including Windows (NT 4.0, 98, 2000, XP, Server 2003, Vista), DOS/Windows 3.x, Linux (2.4 and 2.6), OpenBSD, Solaris, OpenSolaris Full list at http://www.virtualbox.org/ wiki/Guest_OSes Copyright 2009 Peter Baer Galvin - All Rights Reserved 294Saturday, May 2, 2009
  • 295. Features Modular design Active community VM descriptions in XML Guest tools to add functionality to some guests Shared folders Multiple snapshots of VM states Supports VT-x and AMD-V (enable per-VM) Seamless windows on Windows guests, Linux, Solaris Import of guest VMs in VMDK format Copyright 2009 Peter Baer Galvin - All Rights Reserved 295Saturday, May 2, 2009
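Several of the features above are scriptable through the VBoxManage command-line tool; a sketch follows. The VM name and memory size are made up, and flag spellings can differ between VirtualBox releases, so treat this as illustrative rather than copy-paste ready:

```shell
# Create and register a VM, enable hardware virtualization
# (VT-x/AMD-V) per-VM, and take a named snapshot of its state
VBoxManage createvm --name "demo-vm" --register
VBoxManage modifyvm "demo-vm" --memory 1024 --hwvirtex on
VBoxManage snapshot "demo-vm" take "baseline"
```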
  • 296. Closed-source Features Virtual USB controller Remote Desktop Protocol (RDP) server support Can connect to Virtualbox client from other systems, thin clients USB over RDP works - guest can access local resources while displaying remotely iSCSI initiator (can use iSCSI targets as virtual disks) SATA controller (faster and less overhead than IDE) Copyright 2009 Peter Baer Galvin - All Rights Reserved 296Saturday, May 2, 2009
  • 297. XVM Server Copyright 2009 Peter Baer Galvin - All Rights Reserved 297Saturday, May 2, 2009
  • 298. xVM Server Solaris-based bare-metal hypervisor based on Xen Complete VM management Goal is to be similar to VMWare ESX Brand-new Server itself is open source and free to try xVM Infrastructure Enterprise - multinode management of VMs xVM Infrastructure Datacenter - multinode management of physical servers and physical and virtual nodes Copyright 2009 Peter Baer Galvin - All Rights Reserved 298Saturday, May 2, 2009
  • 299. Features MS 2003, 2008, RedHat 4.6 / 5.2, Solaris and OpenSolaris guests Live migration Guest cloning / templating xVM Ops Center integration Java-based KVM access to guest OS consoles Management is browser-based VMDK-formatted guest OSes supported Paravirtualized device drivers NAS / CIFS storage support Least-privilege security model for services and management DTrace integration (just how much?) ZFS supported (guest OS file systems) Copyright 2009 Peter Baer Galvin - All Rights Reserved 299Saturday, May 2, 2009
  • 300. Implementation TBD Copyright 2009 Peter Baer Galvin - All Rights Reserved 300Saturday, May 2, 2009
  • 301. References You Are Now Free to Move About Solaris Copyright 2009 Peter Baer Galvin - All Rights Reserved 301Saturday, May 2, 2009
  • 302. References  [Kozierok] TCP/IP Guide, No Starch Press, 2005  [Nemeth] Nemeth et al, Unix System Administration Handbook, 3rd edition, Prentice Hall, 2001  [SunFlash] The SunFlash announcement mailing list run by John J. Mclaughlin. News and a whole lot more. Mail sunflash-info@sun.com  Sun online documents at docs.sun.com  [Kasper] Kasper and McClellan, Automating Solaris Installations, SunSoft Press, 1995 Copyright 2009 Peter Baer Galvin - All Rights Reserved 302Saturday, May 2, 2009
  • 303. References (continued)  [O’Reilly] Networking CD Bookshelf, Version 2.0, O’Reilly 2002  [McDougall] Richard McDougall et al, Resource Management, Prentice Hall, 1999 (and other "Blueprint" books)  [Stern] Stern, Eisler, Labiaga, Managing NFS and NIS, 2nd Edition, O’Reilly and Associates, 2001 Copyright 2009 Peter Baer Galvin - All Rights Reserved 303Saturday, May 2, 2009
  • 304. References (continued)  [Garfinkel and Spafford] Simson Garfinkel and Gene Spafford, Practical Unix & Internet Security, 3rd Ed, O’Reilly & Associates, Inc, 2003 (Best overall Unix security book)  [McDougall, Mauro, Gregg] McDougall, Mauro, and Gregg, Solaris Internals and Solaris Performance and Tools, 2007 (great Solaris internals, DTrace, mdb books) Copyright 2009 Peter Baer Galvin - All Rights Reserved 304Saturday, May 2, 2009
  • 305. References (continued)  Subscribe to the Firewalls mailing list by sending "subscribe firewalls <mailing-address>" to Majordomo@GreatCircle.COM  USENIX membership and conferences. Contact USENIX office at (714)588-8649 or office@usenix.org  Sun Support: Sun’s technical bulletins, plus access to bug database: sunsolve.sun.com  Solaris 2 FAQ by Casper Dik: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/Solaris2/FAQ Copyright 2009 Peter Baer Galvin - All Rights Reserved 305Saturday, May 2, 2009
  • 306. References (continued)  Sun Managers Mailing List FAQ by John DiMarco: ftp://ra.mcs.anl.gov/sun-managers/faq Sun's unsupported tool site (IPV6, printing) http://playground.sun.com/ Sunsolve STBs and Infodocs http://www.sunsolve.com Copyright 2009 Peter Baer Galvin - All Rights Reserved 306Saturday, May 2, 2009
  • 307. References (continued)  comp.sys.sun.* FAQ by Rob Montjoy: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/comp-sys-sun-faq “Cache File System” White Paper from Sun: http://www.sun.com/sunsoft/Products/Solaris-whitepapers/Solaris-whitepapers.html  “File System Organization, The Art of Automounting” by Sun: ftp://sunsite.unc.edu/pub/sun-info/white-papers/TheArtofAutomounting-1.4.ps Solaris 2 Security FAQ by Peter Baer Galvin http://www.sunworld.com/common/security-faq.html Secure Unix Programming FAQ by Peter Baer Galvin http://www.sunworld.com/swol-08-1998/swol-08-security.html Copyright 2009 Peter Baer Galvin - All Rights Reserved 307Saturday, May 2, 2009
  • 308. References (continued)  Firewalls mailing list FAQ: ftp://rtfm.mit.edu/pub/usenet-by-group/comp.answers/firewalls-faq  There are a few Solaris-helping files available via anon ftp at ftp://ftp.cs.toronto.edu/pub/darwin/solaris2 Peter’s Solaris Corner at SysAdmin Magazine http://www.samag.com/solaris  Marcus and Stern, Blueprints for High Availability, Wiley, 2000  Privilege Bracketing in Solaris 10 http://www.sun.com/blueprints/0406/819-6320.pdf Copyright 2009 Peter Baer Galvin - All Rights Reserved 308Saturday, May 2, 2009
  • 309. References (continued) Peter Baer Galvin's Sysadmin Column (and old Pete's Wicked World security columns, etc.) http://www.galvin.info My blog at http://pbgalvin.wordpress.com Operating Environments: Solaris 8 Operating Environment Installation and Boot Disk Layout by Richard Elling http://www.sun.com/blueprints (March 2000) Sun's BigAdmin web site, including Solaris and Solaris x86 tools and information http://www.sun.com/bigadmin Copyright 2009 Peter Baer Galvin - All Rights Reserved 309Saturday, May 2, 2009
  • 310. References (continued) DTrace http://users.tpg.com.au/adsln4yb/dtrace.html http://www.solarisinternals.com/si/dtrace/index.php http://www.sun.com/bigadmin/content/dtrace/ Copyright 2009 Peter Baer Galvin - All Rights Reserved 310Saturday, May 2, 2009