Linux on System z Optimizing Resource Utilization for Linux under z/VM – Part II

Learn about optimizing resource utilization for Linux on System z running under z/VM. For more information, visit http://ibm.co/PNo9Cb.

Linux on System z Optimizing Resource Utilization for Linux under z/VM – Part II

  1. 2012-03-15
     Linux on System z
     Optimizing Resource Utilization for Linux under z/VM – Part II
     Dr. Juergen Doelle, IBM Germany Research & Development
     Visit us at http://www.ibm.com/developerworks/linux/linux390/perf/index.html
     © 2012 IBM Corporation
  2. Trademarks
     IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.
     Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
     Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
  3. Agenda
     • Introduction
     • Methods to pause a z/VM guest
     • WebSphere Application Server JVM stacking
     • Guest scaling and DCSS usage
     • Summary
  4. Introduction
     • A virtualized environment
       – runs operating systems like Linux on virtual hardware
       – relies on a hypervisor, like z/VM, to back the virtual hardware with physical hardware as required
       – provides significant advantages with regard to management and resource utilization
       – running virtualized on shared resources introduces a new dynamic into the system
     • The important part is the "as required"
       – normally, not all guests need all assigned resources all the time
       – it must be possible for the hypervisor to identify resources which are not used
     • This presentation analyzes two issues
       – Idling applications which look for work so frequently that they appear active to the hypervisor
         • concerns many applications running with multiple processes/threads
         • sample: noisy WebSphere Application Server
         • solution: pausing guests which are known to be unused for a longer period
       – A configuration question: what is better, vertical or horizontal application stacking?
         • stacking many applications/middleware in one very large guest
         • having many guests with one guest per application/middleware
         • or something in between
         • sample: 200 WebSphere JVMs
         • solution: analyze the behavior of setup variations
  5. Agenda
     • Introduction
     • Methods to pause a z/VM guest
     • WebSphere Application Server JVM stacking
     • Guest scaling and DCSS usage
     • Summary
  6. Introduction
     • The requirement:
       – An idling guest should not consume CPU or memory.
       – Definition of idle: a guest not processing any client requests is idle.
         • This is the customer view, not the view of the operating system or the hypervisor.
     • The issue:
       – An idling WebSphere Application Server guest looks for work so frequently that it never becomes idle from the hypervisor's view.
       – For z/VM this means the guest state cannot be set to dormant, and the guest's resources, especially its memory, are considered actively used:
         • z/VM could easily reuse real memory pages of dormant guests for other active guests.
         • But the memory pages of idling WebSphere guests stay in real memory and are not paged out.
         • The memory pages of an idling WebSphere Application Server compete with really active guests for real memory pages.
       – This behavior is probably not specific to WebSphere Application Server.
     • A solution
       – Guests which are known to be inactive, for example when the developer has finished his work, are made inactive to z/VM by
         • using the Linux suspend mechanism (hibernates the Linux system), or
         • using the z/VM stop command (stops the virtual CPUs).
       – This should help to increase the level of memory overcommitment significantly, for example on systems hosting WebSphere Application Server environments for worldwide development groups.
  7. How to pause the guest (a combined command sketch follows below)
     • Linux suspend/resume
       – Setup: dedicated swap disk large enough to hold the full guest memory
         zipl.conf: resume=<swap device_node>
         Optional: add a boot target with the noresume parameter as a failsafe entry
       – Suspend: echo disk > /sys/power/state
       – Impact: controlled halt of Linux and its devices
       – Resume: just IPL the guest
       – More details in the Device Drivers book:
         http://www.ibm.com/developerworks/linux/linux390/documentation_dev.html
     • z/VM CP stop/begin
       – Setup: privilege class G
       – Stop: CP STOP CPU ALL (might be issued from vmcp)
       – Impact: stops the virtual CPUs; execution is just halted, device states might remain undefined
       – Resume: CP BEGIN CPU ALL (from an x3270 session, disconnect when done)
       – More details in the CP command reference:
         http://publib.boulder.ibm.com/infocenter/zvm/v6r1/index.jsp?topic=/com.ibm.zvm.v610.hcpb7/toc.htm
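
     The commands above can be combined into a small script. This is a minimal sketch only: the swap device node /dev/dasdb1 is a placeholder, zipl must already have been rerun after adding the resume= parameter, and the CP commands require privilege class G.

        #!/bin/bash
        # --- Method 1: Linux suspend to disk ---------------------------------
        # Prerequisite (done once): the zipl.conf kernel parameters contain
        #   resume=/dev/dasdb1        (dedicated swap disk, placeholder name)
        # plus an optional boot target with "noresume" as failsafe, then rerun zipl.
        echo disk > /sys/power/state     # controlled halt; resume by IPLing the guest

        # --- Method 2: z/VM CP stop (privilege class G) ----------------------
        # Issued from inside the guest through the vmcp interface:
        modprobe vmcp                    # load the CP command interface if not built in
        vmcp stop cpu all                # halts all virtual CPUs immediately
        # Resume with "CP BEGIN CPU ALL" from a 3270 console session for this
        # guest (a stopped guest cannot run vmcp itself), then disconnect.

     Note that after the stop the guest cannot issue further commands itself, which is why BEGIN has to come from a console session (or another authorized user).
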
  8. Objectives – Part 1
     – Determine the time needed to deactivate/activate the guest
     – Show that the deactivated z/VM guest state becomes dormant
     – Use a WebSphere Application Server guest and a standalone database guest
  9. Time to pause or restart the guests
     Time until the guest is halted [sec]:
                              standalone database   WebSphere guest
       Linux: suspend                  8                  27
       z/VM: stop command         immediately        immediately
     Time until the guest is started again [sec]:
       Linux: resume                  19
       z/VM: begin command        immediately
     • CP stop/begin is much faster
       – In case of an IPL or power loss the system state is lost
       – Disk and memory state of the system might be inconsistent
     • Linux suspend writes the memory image to the swap device and shuts down the devices
       – Needs more time, but the system is left in a very controlled state
       – IPL or power loss has no impact on the system state
       – All devices are cleanly shut down
  10. Guest states
      Charts: queue state of the suspended guests (1 = DISP, 0 = DORM) over time under a DayTrader workload (WAS/DB2), once for suspend/resume and once for stop/begin (guests LNX00105 and LNX00107).
      • Suspend/resume behaves ideally: the guest is dormant the whole time.
      • With stop/begin the guest always gets scheduled again
        – one reason is that z/VM is still processing interrupts for the virtual NICs (VSWITCH)
      (A sketch for checking the queue state of a guest follows below.)
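
     One way to verify the DISP/DORM observation is to check whether the guest still shows up in z/VM's dispatch or eligible lists. A minimal sketch, assuming it runs on a monitoring guest with sufficient privilege class for INDICATE QUEUES and that LNX00105 is the guest under test (placeholder):

        #!/bin/bash
        # A guest that appears in the dispatch or eligible list is in queue (DISP);
        # a guest that does not appear is dormant (DORM).
        GUEST=LNX00105                      # guest under test (placeholder)
        if vmcp indicate queues exp | grep -q "$GUEST"; then
            echo "$GUEST is queued (DISP)"
        else
            echo "$GUEST is dormant (DORM)"
        fi
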
  11. Objectives – Part 2
      • Objective
        – Show whether the pages of the paused guests are really moved to XSTOR
      • Environment
        – 5 guests with WebSphere Application Server and DB2, and 5 standalone database guests (from another vendor)
        – The application server systems are in a cluster environment, which requires an additional guest with the network deployment manager
        – 4 systems of interest (target systems):
          • 2 WebSphere + 2 standalone databases
        – 4 standby systems, activated to produce memory pressure when the systems of interest are paused:
          • 2 WebSphere + 2 standalone databases
        – 2 base load systems, never paused, to produce a constant base load:
          • 1 WebSphere + 1 standalone database
  12. Suspend/resume test – methodology
      Charts: real CPU usage (IFLs) over time (h:mm) for the three groups of guests — the systems of interest (suspended and resumed during the run), the standby systems (activated to create memory pressure), and the base load systems (always up). Markers indicate the workload stop, suspend, and resume points.
  13. Suspend/resume test – what happens with the pages?
      Charts: number of pages in XSTOR and in real storage per guest group (systems of interest WAS and standalone DB, standby WAS and standalone DB, base load WAS and standalone DB, deployment manager), taken in the middle of the warmup phase, in the middle of the suspend phase, and at the end phase after resume.
  14. Resume times when scaling the guest size
      Chart: time to resume a Linux guest (seconds) for guests of 2.0 GB / 1 JVM, 2.6 GB / 2 JVMs, 3.9 GB / 4 JVMs and 5.2 GB / 6 JVMs, compared with the pure startup time for 6 JVMs.
      • The tests so far were done with relatively small guests (< 1 GB memory). How does the resume time scale when the guest size increases?
        – Multiple JVMs were run inside the WAS instance, each JVM with a Java heap size of 1 GB to ensure that the memory is really used.
        – The number of JVMs was scaled with the guest size.
        – The size of the guest was limited by the size of the mod9 DASD used as swap device for the memory image.
      • Resume time includes the time from IPLing the suspended guest until an HTTP GET succeeds (see the measurement sketch below).
      • Startup time includes only the time for executing the startServer command for servers 1 – 6 serially with a script.
      ► Resume time was always much shorter than just starting the WebSphere application servers.
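
     A minimal sketch of how such a resume-time measurement could be scripted. The guest name and application URL are placeholders, and XAUTOLOG requires the appropriate privilege class on the issuing user:

        #!/bin/bash
        GUEST=LNX00105                                   # suspended guest (placeholder)
        URL=http://lnx00105.example.com:9080/snoop       # any URL served by WAS (placeholder)
        START=$(date +%s)
        vmcp xautolog "$GUEST"                           # IPL the suspended guest
        # Poll until the application answers with HTTP 200.
        until [ "$(curl -s -o /dev/null -w '%{http_code}' "$URL")" = "200" ]; do
            sleep 1
        done
        echo "resume took $(( $(date +%s) - START )) seconds"
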
  15. Agenda
      • Introduction
      • Methods to pause a z/VM guest
      • WebSphere Application Server JVM stacking
      • Guest scaling and DCSS usage
      • Summary
  16. Objectives
      Which setup can be recommended: one or many JVMs per guest?
      • Expectations
        – Many JVMs per guest are associated with contention inside Linux (memory, CPUs)
        – Many guests are associated with more z/VM effort and z/VM overhead (e.g. virtual NICs, VSWITCH)
        – Small guests with 1 CPU have no SMP overhead (e.g. no spinlocks)
      • Methodology (a start-up sketch follows below)
        – Use a total of 200 JVMs (WebSphere Application Server instances/profiles)
        – Use a pure application server workload without a database
        – Scale the number of JVMs per guest and the number of guests accordingly to always reach a total of 200
        – Use one node agent per guest
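
     A hypothetical start-up driver for such a configuration might look as follows. Host names, the profile path, and server names are assumptions; startNode.sh and startServer.sh are the standard WebSphere scripts. The sketch shows the 20-guests-with-10-JVMs case:

        #!/bin/bash
        JVMS_PER_GUEST=10
        GUESTS=$(seq -f "lnxguest%02g" 1 20)    # 20 guests x 10 JVMs = 200 JVMs (placeholder names)
        PROFILE=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01   # assumed profile path

        for host in $GUESTS; do
            ssh "$host" "$PROFILE/bin/startNode.sh"              # one node agent per guest
            for i in $(seq 1 "$JVMS_PER_GUEST"); do
                ssh "$host" "$PROFILE/bin/startServer.sh server$i"
            done
        done
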
  17. Test environment
      • Hardware:
        – 1 LPAR on z196, 24 CPUs, 200 GB central storage + 2 GB expanded storage
      • Software:
        – z/VM 6.1, SLES11 SP1, WAS 7.1
      • Workload: SOA-based workload without database back-end (IBM internal)
      • Guest setup (no memory overcommitment):

        #JVMs       #guests   #VCPUs      total    CPU          JVMs       guest memory   total virtual
        per guest             per guest   #VCPUs   real : virt  per vCPU   size [GB]      memory [GB]
        200         1         24          24       1 : 1.0      8.3        200            200
        100         2         12          24       1 : 1.0      8.3        100            200
        50          4         6           24       1 : 1.0      8.3        50             200
        20          10        3           30       1 : 1.3      6.7        20             200
        10          20        2           40       1 : 1.7      5.0        10             200
        4           50        1           50       1 : 2.1      4.0        4              200
        2           100       1           100      1 : 4.2      2.0        2              200
        1           200       1           200      1 : 8.3      1.0        1              200

        CPU overcommitment starts with the 20 JVMs per guest setup (real : virt > 1); the setups with 4, 2 and 1 JVMs per guest run as uniprocessor guests (1 virtual CPU).
  18. Horizontal versus vertical WAS guest stacking
      Chart: throughput (page elements/sec, left Y-axis) and LPAR CPU load (# IFL, right Y-axis) versus #JVMs per guest (total always 200 JVMs), i.e. for 1, 2, 4, 10, 20 and 200 guests; annotations mark where CPU overcommitment and the UP setup start and where the guest CPU load drops below 1 IFL.
      • Impact of JVM stacking on throughput is moderate
        – minimum = maximum – 3.5%
        – 1 JVM per guest (200 guests) has the highest throughput
        – 10 JVMs per guest (20 guests) has the lowest throughput
      • Impact of JVM stacking on CPU load is heavy
        – minimum = maximum – 31%; the difference is 4.2 IFLs (13.7 – 9.5 = 4.2 IFLs ≈ 31% of 13.7)
        – 10 JVMs per guest (20 guests) has the lowest CPU load (9.5 IFLs)
        – 200 JVMs per guest (1 guest) has the highest CPU load (13.7 IFLs)
      • Setup notes
        – VM page reorder is off for guests with 10 JVMs and more (> 8 GB memory); the impact of page reorder on/off on a 4 GB guest is within normal variation
        – CPU overcommitment starts with 20 JVMs per guest and increases with fewer JVMs per guest
        – Uniprocessor (UP) setup for 4 JVMs per guest and less
        – No memory overcommitment
  19. Horizontal versus vertical WAS guest stacking – z/VM CPU load
      Charts: z/VM CPU load (CP = User − Emulation, Total = User + System) in IFLs versus #JVMs per guest (total always 200 JVMs); one chart shows the full scale including the emulation part, a zoomed chart (below 1 IFL) shows the CP part attributed to the guests and the system part (CP time not attributed to guests).
      • z/VM CPU load consists of:
        – Emulation → this runs the guest
        – CP effort attributed to the guest → drives virtual interfaces, e.g. VNICs
        – System (CP effort attributed to no guest) → pure CP effort
      • Which component causes the variation in CPU load?
        – The CP effort is in total between 0.4 and 1 IFL
          • the system-related part is highest with 1 guest
          • the guest-related CP effort is highest with 200 guests and lowest between 4 and 20 guests
        – The major contribution comes from Linux itself (emulation)
      • Setup notes as on slide 18 (page reorder off for 10 JVMs per guest and more, CPU overcommitment from 20 JVMs per guest, UP setup for 4 JVMs per guest and less, no memory overcommitment)
  20. Horizontal versus vertical WAS guest stacking – Linux CPU load
      Chart: Linux CPU load (user, system, steal) in IFLs versus #JVMs per guest (total always 200 JVMs), i.e. for 1, 2, 4, 10, 20 and 200 guests.
      • Which component inside Linux causes the variation in CPU load?
        – The major contribution to the reduction in CPU utilization comes from user space, i.e. from inside the WebSphere Application Server JVMs
      • System CPU
        – decreases with the decreasing number of JVMs per system,
        – but increases again in the uniprocessor cases
      • The number of CPUs per guest seems to be important! (A sketch for sampling these values inside a guest follows below.)
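
     The user/system/steal split can be sampled directly inside a guest while the workload runs, for example with the standard sysstat and procps tools (a sketch; interval and sample count are arbitrary):

        #!/bin/bash
        # sar -u reports %user, %system and %steal per interval (sysstat package).
        sar -u 10 6        # 6 samples, 10 seconds apart
        # vmstat shows the same split in its "us", "sy" and "st" columns.
        vmstat 10 6
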
  21. Virtual CPU scaling with 20 guests (10 JVMs each) and with 200 guests
      Charts: throughput (page elements/sec) and LPAR CPU load (# IFL) versus the number of virtual CPUs per guest (physical : virtual ratio in parentheses) — 1 (1:1), 2 (1:1.7), 3 (1:2.5) for the 20-guest setup and 1 (1:8.3), 2 (1:16.7) for the 200-guest setup; the z/VM emulation load is shown as well.
      • Scaling the number of virtual CPUs of the guests at the point of the lowest LPAR CPU load:
        – the impact on throughput is moderate (min = max – 4.5%); throughput with 1 vCPU is the lowest
        – the impact on CPU load is high; 2 virtual CPUs provide the lowest CPU utilization
        – the variation is caused by the emulation part of the CPU load => Linux
      • It seems that the number of virtual CPUs has a severe impact on the total CPU load
        – the CPU overcommitment level is one factor, but not the only one!
        – in this case WebSphere Application Server on Linux runs better with 2 virtual CPUs, as long as the CPU overcommitment level does not become too excessive
  22. Agenda
      • Introduction
      • Methods to pause a z/VM guest
      • WebSphere Application Server JVM stacking
      • Guest scaling and DCSS usage
      • Summary
  23. Horizontal versus vertical WAS guest stacking – DCSS versus minidisk
      Charts: throughput and LPAR CPU load (# IFLs) versus #JVMs per guest (total always 200 JVMs), comparing the setup with a shared DCSS against the setup with a shared minidisk.
      DCSS or shared minidisks? (An attachment sketch follows below.)
      • The impact on throughput is barely noticeable.
      • The impact on CPU load is significant (and as expected):
        – for small numbers of guests (1 – 4) a minidisk is much cheaper than a DCSS (savings: 1.4 – 2.2 IFLs)
        – 10 guests was the break-even point
        – with 20 guests and more, the environment with the DCSS needs less CPU (savings: 1.5 – 2.2 IFLs)
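
     For reference, a minimal sketch of how the two variants could be attached inside a guest. The segment name, device address, and mount point are placeholders; it assumes the DCSS has already been built in z/VM and contains an ext2 file system (see the Device Drivers book for details):

        #!/bin/bash
        # Variant 1: DCSS via the dcssblk driver, mounted read-only.
        modprobe dcssblk
        echo WASDCSS > /sys/devices/dcssblk/add      # segment name (placeholder)
        mount -o ro /dev/dcssblk0 /opt/shared        # device name may differ; add ",xip"
                                                     # for execute-in-place where supported

        # Variant 2: the same content on a shared read-only minidisk.
        chccwdev -e 0.0.0300                         # bring the DASD online (placeholder address)
        mount -o ro /dev/disk/by-path/ccw-0.0.0300-part1 /opt/shared
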
  24. Agenda
      • Introduction
      • Methods to pause a z/VM guest
      • WebSphere Application Server JVM stacking
      • Guest scaling and DCSS usage
      • Summary
  25. Summary – pausing a z/VM guest
      • “No work there” is not sufficient for a WebSphere guest to become dormant
        – probably not specific to WebSphere
      • Deactivating the guest helps z/VM to identify guest pages which can be moved out to the paging devices, so that the real memory can be used for other guests
        – two methods:
          • the Linux suspend mechanism (hibernates the Linux system)
          • the z/VM stop command (stops the virtual CPUs)
        – this allows the possible level of memory and CPU overcommitment to be increased
      • Linux suspend mechanism
        – takes 10 – 30 sec to hibernate an 850 MB guest, and 20 – 30 sec to resume (1 – 5 GB guest)
        – controlled halt – the suspended guest is safe!
      • z/VM stop/begin command
        – the system reacts immediately
        – guest memory is lost in case of an IPL or power loss
        – still some activity for virtual devices
      • There are scenarios which are not eligible for guest deactivation (e.g. HA environments)
      • For additional information on methods to pause a z/VM guest:
        – http://www.ibm.com/developerworks/linux/linux390/perf/tuning_vm.html#ruis
  26. Summary – scaling JVMs per guest
      Question: one or many WAS JVMs per guest?
      Answer:
      • In regard to throughput: it doesn't matter
      • In regard to CPU load: it matters heavily
        – the difference between maximum and minimum is about 4 IFLs(!) at a maximum total of 13.7 IFLs
        – the variation in CPU load is caused by user-space CPU in the guest
        – the z/VM overhead is small and has its maximum at both ends of the scaling
        – 2 virtual CPUs per guest provide the lowest CPU load for this workload
      • Sizing recommendation
        – Do not use more virtual CPUs than required
        – As long as the CPU overcommitment level does not become too high, use at minimum two virtual CPUs per WebSphere system
      • DCSS versus shared disk
        – There is only a difference in CPU load, but it reaches the range of 1 – 2 IFLs
        – For fewer than 10 guests a shared disk is recommended
        – For more than 20 guests a DCSS is the better choice
      • For additional information
        – WebSphere JVM stacking: http://www.ibm.com/developerworks/linux/linux390/perf/tuning_vm.html#hv
        – page reorder: http://www.vm.ibm.com/perf/tips/reorder.html
  27. Backup
  28. Stop/begin test – what happens with the pages?
      Charts: number of pages in XSTOR and in real storage per guest group (systems of interest WAS and standalone DB, standby WAS and standalone DB, base load WAS and standalone DB, deployment manager), taken in the middle of the warmup phase, in the middle of the suspend phase, and at the end phase after resume.
