VDI Performance of PRIMERGY S7 Server Generation

Published in: Technology
White paper
VDI Performance of PRIMERGY S7 Server Generation

Content
Tasks
Use of terms and names
Introduction
Market observation and product selection
Description of load test environment
  Structure of load environment
  Description of Medium Workload
  Description of Heavy Workload
  Measurements Citrix XenDesktop 5.5 and Machine Creation Services (MCS)
  Measurements VMware View 5.0
  Hardware of the test environment
  Infrastructure VMs of the Test environment
  Description of Citrix XenDesktop 5.5 VMs
  Description of VMware View 5.0 VMs
Load testing results – VM density
  Maximum VM density per hypervisor host
Summary
  Hardware recommendations for VDI scenarios
  Results - higher VM density
  Dependencies of VM density and memory configuration

http://www.fujitsu.com/fts

Tasks
■ Develop and set up a PoC (Proof-of-Concept) to simulate a standard user workload on Windows 7 (x64) based VMs.
■ Focus on the technologies "VMware View with Linked Clones" and Citrix "XenDesktop Machine Creation Services".
■ Use state-of-the-art local storage based on enterprise SSD technology.
■ Determine the VM density on the PRIMERGY S7 generation based on the Intel Romley-EP server architecture.

Use of terms and names
Hyperlinks are included as footnotes in the text for more information about specific products or manufacturers. Recurring abbreviations or terms are explained in the Appendix, together with a hyperlink to the source. Abbreviations and terms are linked to the glossary where they first appear in the text. Product and trade names are usually abbreviated:
■ Microsoft® Corporation: Microsoft
■ Windows® 7: Windows 7
■ Citrix® Systems, Inc.: Citrix
■ Citrix® XenApp™: XenApp
■ Citrix® XenDesktop®: XenDesktop
■ Citrix® XenServer®: XenServer
■ VMware™: VMware
■ VMware™ View™: View
■ VMware™ vCenter™: vCenter
■ VMware™ vSphere™: vSphere
■ VMware™ ESXi™: ESXi
■ Intel®: Intel
■ Xeon®: Xeon
■ Sandy Bridge™: Sandy Bridge
■ Romley-EP™: Romley-EP

The Microsoft trademarks can be seen at: http://www.microsoft.com/library/toolbar/3.0/trademarks/de-de.mspx.
The Citrix trademarks as well as the regulations for the correct name and identification can be found here: http://www.citrix.com/English/aboutCitrix/legal/secondLevel.asp?level2ID=2210.
Cisco trademarks are listed here: http://www.cisco.com/web/siteassets/legal/trademark.html.
All other products named in this document are trademarks of the respective manufacturer.

Introduction
This document illustrates the performance of the current PRIMERGY S7 product line within a virtual desktop infrastructure (VDI). The focus lies on the CPU and RAM requirements. Storage, network and bandwidth requirements are not part of this White Paper. The results can be used as a basis for sizing a corresponding VDI environment based on Citrix XenDesktop 5.5 and VMware View 5.0.

The processor technology used within this PoC (Proof-of-Concept) is based on the latest Intel Xeon processor architecture, called Sandy Bridge-EP. Predecessor tests showed a negative impact on VM density when using AMD based processors of type Magny-Cours.

The PRIMERGY servers used to evaluate the results were based on Intel's current two-way chipset Romley-EP.

Figure 1: Benchmarks Intel Sandy Bridge

Each server was equipped with two Intel processors of type E5-2667 (2900 MHz, 2x 6 physical cores, Hyper-Threading enabled).
The PRIMERGY RX300 S7 was equipped with up to 384 GB RAM (24x 16 GB DDR3 RDIMMs, 1600 MHz modules running at 1333 MHz).
The PRIMERGY CX250 S1 was equipped with 256 GB RAM (16x 16 GB DDR3 RDIMMs, 1600 MHz modules running at 1600 MHz).

The PRIMERGY RX300 S7 and PRIMERGY CX250 S1 used for these load tests are well suited to hosting a large number of virtual desktops. All results also apply to Fujitsu servers with the same Intel architecture:
■ PRIMERGY TX200 S7
■ PRIMERGY TX300 S7
■ PRIMERGY RX200 S7
■ PRIMERGY RX300 S7
■ PRIMERGY RX350 S7
■ PRIMERGY CX210 S1
■ PRIMERGY CX250 S1
■ PRIMERGY CX270 S1
■ PRIMERGY BX924 S3

The following was taken into account when making the load measurements:
■ The load simulation should use common load simulation programs, and the results should remain comparable with the values determined by the manufacturers.
■ The basic environment should be kept the same for comparison reasons, and any special optimization should be avoided.

To offer the most efficient VDI solution possible, it is important to align the three resources CPU, RAM and disk IO with each other. If one of the components is not sized correctly, the resources can no longer be used optimally. Disk IO is usually the limiting factor in today's VDI environments. RAM is the second most likely bottleneck. The CPU is usually not a limiting factor in today's VDI architectures, but it is frequently incorrectly identified as such. This is due to the manner in which the CPU resource is displayed on the administration consoles. These consoles usually display CPU utilization without differentiating between computing cycles that are actually executed and wait cycles (so-called "wait states"). In today's VDI scenarios the share of actual CPU computing work is normally well under 10%. The remaining 90% of the available CPU performance is used up in waiting, e.g. for hard disk read processes.

The CPU utilization display in common administration tools (e.g. VMware vCenter Performance, Citrix XenCenter Performance) shows 100% CPU utilization in such situations, as the CPU can no longer accept any further jobs. Wait states occur on a massive scale in virtual infrastructures, as several virtual machines with multiple operating systems create considerably more wait states than a single physical system with only one operating system. However, both physical and virtual systems struggle with the large performance discrepancy between CPU (fastest resource), RAM (second fastest resource) and storage (slowest resource). Thus, it is not surprising that unfavorably sized RAM and/or insufficient storage subsystem performance (disk IO) are the most frequent reasons for an increased number of wait states on the CPU side. This prevents optimal utilization of existing virtual infrastructure resources.

■ The inquiries, seen here as raindrops, fill the wait queues of the respective resource, seen here as containers.
■ If a resource such as disk IO is "full", the other resources that are still "empty and free" fill up automatically; wait states arise.
■ The CPU is thus filled up until it is "completely full". As soon as the CPU can no longer take on any more jobs, there is a jam, usually caused by waiting for "slower" resources. Once this status is reached, you can usually only indirectly determine which resource was originally responsible for the situation.

Figure 2: Dependencies of CPU, RAM and storage (containers labeled DISK-IO, RAM and CPU)

All load tests were performed on local storage based on Enterprise SSDs (up to 6x 64 GB EP SSDs) configured as a RAID-0 array to circumvent bottlenecks related to disk IO. The number of SSDs was chosen for capacity reasons rather than for performance (3 SSDs would have been sufficient for performance).
For more details regarding hardware configuration, please refer to chapter Hardware of the test environment.
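
The interplay of CPU, RAM and disk IO described above can be illustrated with a minimal sketch that reports which resource is proportionally the most loaded. The function name `limiting_resource` and all demand and capacity figures are hypothetical and chosen only for illustration; they are not measurements from this paper.

```python
def limiting_resource(demand, capacity):
    """Return the most heavily loaded resource and its demand/capacity ratio."""
    ratios = {name: demand[name] / capacity[name] for name in capacity}
    bottleneck = max(ratios, key=ratios.get)
    return bottleneck, ratios[bottleneck]


# Hypothetical host capacity and aggregate VM demand (illustration only).
capacity = {"cpu_mhz": 34800, "ram_mb": 262144, "disk_iops": 6000}
demand = {"cpu_mhz": 20000, "ram_mb": 150000, "disk_iops": 9000}

name, ratio = limiting_resource(demand, capacity)
print(f"{name} is the bottleneck at {ratio:.0%} of its capacity")
```

In this made-up case disk IO is oversubscribed while CPU and RAM are only a little over half used, which is the situation in which an administration console may still show the CPU as fully busy because it is waiting.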

Market observation and product selection
As a basis for this document, the following hypervisor and management solutions were chosen:
■ Citrix XenServer 6.0 and XenCenter 6.0
■ VMware ESXi 5.0 and vCenter 5.0

Information made available by the market research institute Gartner [1] (May 2010; replaced with updated figures in June 2011) acted as the basis for product and manufacturer selection.

Figure 3: Market positioning Hypervisors (according to Gartner)

The manufacturers VMware, Microsoft and Citrix offer the three most sophisticated and best established bare-metal hypervisor products for x86 server virtualization. Furthermore, the operating systems to be virtualized within the measurements are based on Windows, with the focus on Windows 7 (x64) and Windows Server 2008 R2 (x64).

The three hypervisor product packages from Citrix, Microsoft and VMware have been extended to include special products for desktop virtualization. As Microsoft recommends the Citrix components for brokering and provisioning in larger environments (> 500 clients), it was decided not to run parallel tests with Hyper-V 2.0.

The following combinations were used according to manufacturer recommendations:
■ Citrix XenDesktop 5.5 (VDI component) and Citrix XenServer 6.0 (hypervisor)
■ VMware View 5.0 (VDI component) and VMware ESXi 5.0 (hypervisor)

Details about the test structure are also in section Structure of load environment.

[1] Source: http://www.citrix.com/site/resources/dynamic/additional/citirix_magic_quadrant_2011.pdf

Description of load test environment

Structure of load environment
Figure 4: Illustration of PoC load test infrastructure

Description of Medium Workload
The Medium Workload simulates a knowledge worker.
It causes about 500 MHz CPU load and consumes about 600 MB RAM for applications (on average per VM).
The following applications were used to generate the user load:
■ Outlook 2007
■ Internet Explorer 8 (including Flash Video)
■ Word 2007
■ Bullzip PDF Printer & Acrobat Reader
■ Excel 2007
■ PowerPoint 2007
■ 7-zip

Description of Heavy Workload
The Heavy Workload simulates a power user.
It causes about 700 MHz CPU load and consumes about 750 MB RAM for applications (on average per VM).
It is based on the Medium Workload but uses less idle time and more simultaneous applications.
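
The per-VM averages above can be scaled to a rough aggregate load for a given number of desktops. The following is a minimal sketch that assumes the averages simply add up linearly, which the paper does not state; the names `WORKLOADS` and `aggregate_load` are illustrative.

```python
# Average per-VM figures taken from the workload descriptions above.
WORKLOADS = {
    "medium": {"cpu_mhz": 500, "ram_mb": 600},
    "heavy": {"cpu_mhz": 700, "ram_mb": 750},
}


def aggregate_load(workload, vm_count):
    """Estimate total CPU (GHz) and application RAM (GB), assuming linear scaling."""
    per_vm = WORKLOADS[workload]
    return {
        "cpu_ghz": per_vm["cpu_mhz"] * vm_count / 1000,
        "ram_gb": per_vm["ram_mb"] * vm_count / 1024,
    }


for profile in ("medium", "heavy"):
    load = aggregate_load(profile, 100)
    print(f"{profile}: {load['cpu_ghz']:.1f} GHz CPU, {load['ram_gb']:.1f} GB RAM")
```

For 100 VMs this gives about 50 GHz and 58.6 GB for the Medium profile and 70 GHz and 73.2 GB for the Heavy profile. The RAM figure covers applications only and excludes the operating system footprint of each VM.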

Measurements Citrix XenDesktop 5.5 and Machine Creation Services (MCS)
■ Maximum number of VMs with load profile "Medium"
■ Maximum number of VMs with load profile "Heavy"

Measurements VMware View 5.0
■ Maximum number of VMs with load profile "Medium"
■ Maximum number of VMs with load profile "Heavy"

Hardware of the test environment
The following hardware was used for the load test:

PRIMERGY | CPU number / CPU type / CPU frequency | Cores per server physical / logical | RAM capacity / RAM speed max / RAM speed current | Disk type / Number of disks x capacity / RAID level
RX300 S6 | 2 sockets / Intel Xeon E5600 / 2667 MHz | 12 / 24 | 60 GB / 800 MHz | SAS HDDs / 4 x 146 GB / RAID-10
RX300 S7 | 2 sockets / Intel Xeon E5-2667 / 2900 MHz | 12 / 24 | 256 GB / 384 GB / 1600 / 1333 / 1066 MHz | EP SSDs / 3 or 6 x 64 GB / RAID-0
CX250 S1 | 2 sockets / Intel Xeon E5-2667 / 2900 MHz | 12 / 24 | 256 GB / 1600 MHz | EP SSDs / 3 or 6 x 64 GB / RAID-0

Table 1: Hardware

Infrastructure VMs of the Test environment

Name and function | Operating system
DC1: Domain Controller | Windows Server 2008 EE R2 x64
VC1: VMware vCenter 5.0 for load test infrastructure | Windows Server 2008 EE R2 x64
SQL: Database Server MS SQL 2008 EE with SP3 | Windows Server 2008 EE R2 x64
DDC1: Desktop Delivery Controller (DDC) | Windows Server 2003 EE 32bit
CTX-WI: Citrix Webinterface & Citrix License Server | Windows Server 2003 EE 32bit
VC2: VMware vCenter 5.0 + vComposer 2.7 for VMs | Windows Server 2008 EE R2 x64
CS1: VMware Connection Server CS 5.0 | Windows Server 2008 EE R2 x64

Table 2: Infrastructure Servers

Description of Citrix XenDesktop 5.5 VMs
All load tests were performed via a single Pooled Desktop Group with dedicated user allocation.
All virtual machines were created via the Machine Creation Services (MCS).
MCS requires an NFS file share, which was provided by a NetApp FAS2020 storage subsystem. The NFS file share does not need to provide high performance, since MCS uses it only as a central repository for the master image. As soon as the first VM boots from the NFS file share, the data is copied to the pre-configured local storage. All blocks subsequently requested by other VMs are then taken from the local storage. The local storage only acts as a cache, which means that even transferring VMs to different hosts is possible.
Within our load tests the local storage consisted of 3x 64 GB EP SSDs (RAID-0) to get the required capacity.
In a practical environment it might make sense to use bigger SSDs.
Please also refer to the previously published white paper Sizing of Virtual Desktop Infrastructures for additional information about different provisioning types (Provisioning Server) and storage sizing (calculator for needed IOPS).

Configuration of XenDesktop Master
■ OS: Windows 7 (x64)
■ CPU: 1 vCPU (installation was conducted using 2 vCPUs)
■ RAM: 1536 MB [2]
■ HDD: 30 GB vDisk (Thin [3])

For load tests as well as for productive use of Windows 7 based VMs, it is highly recommended to perform several optimizations of the operating system and of the protocol (ICA / PCoIP) used to access the VM. Furthermore, some hypervisor modifications can positively affect user density. All these optimizations were implemented according to best practices and in close contact with the corresponding vendors, VMware and Citrix.
For further information about these optimizations please feel free to contact your local sales person.

[2] Citrix sets the minimum RAM size recommended by Microsoft for all templates to 2 GB for Windows 7 (x64). As we wanted to use less RAM within our load tests, we lowered this limit via the Dynamic Memory Control feature. This is done with the following commands within the XenServer console:
    1) xe vm-list
       a) Note the uuid of the VM to be modified
    2) xe vm-memory-static-range-set uuid=db5d28e6-039a-d10e-e38e-9550c1c32678 min=1024MiB max=2048MiB
    3) xe vm-memory-dynamic-range-set uuid=db5d28e6-039a-d10e-e38e-9550c1c32678 min=1024MiB max=2048MiB

[3] There are always two modes when assigning storage capacity:
Thick = the entire allocated capacity is reserved on storage for the VM from the start, irrespective of how much actual data is in the VM partition; optimal performance
Fast or Thin = the assigned capacity is always available for the VM, but only the capacity that is filled with data is used on storage; optimal capacity usage but reduced performance

Description of VMware View 5.0 VMs
All load tests were performed in an "Automated Pool" with Floating (= non-persistent) Desktops. Dedicated users were allocated due to load test requirements.
All virtual machines were created via Linked Clone mechanisms out of a single master image. As the provisioning method, a single replica disk located on the same LUN as the Linked Clones was chosen. This was done to save capacity.
No SAN or NAS storage was used during the tests. Within our load tests the local storage consisted of 6x 64 GB EP SSDs (RAID-0) to get the required capacity for up to 170 VMs. In a practical environment it might make sense to use only two, but bigger, SSDs.
Please also refer to the previously published white paper Sizing of Virtual Desktop Infrastructures for additional information about different provisioning types (Provisioning Server) and storage sizing (calculator for needed IOPS).

Configuration of View Master
■ OS: Windows 7 (x64)
■ CPU: 1 vCPU
■ RAM: 1536 MB
■ HDD: 30 GB vDisk (Thick [4])

For load tests as well as for productive use of Windows 7 based VMs, it is highly recommended to perform several optimizations of the operating system and of the protocol (ICA / PCoIP) used to access the VM. Furthermore, some hypervisor modifications can positively affect user density. All these optimizations were implemented according to best practices and in close contact with the corresponding vendors, VMware and Citrix.
For further information about these optimizations please feel free to contact your local sales person.

[4] There are always two modes when assigning storage capacity:
Thick = the entire allocated capacity is reserved on storage for the VM from the start, irrespective of how much actual data is in the VM partition; optimal performance
Fast or Thin = the assigned capacity is always available for the VM, but only the capacity that is filled with data is used on storage; optimal capacity usage but reduced performance
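
The capacity figures of the View setup above can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope calculation: it ignores file system and hypervisor overhead, the function names are illustrative, and the 256 GB host RAM is one of the tested configurations rather than a stated limit for this test.

```python
def raid0_capacity_gb(disk_count, disk_size_gb):
    """RAID-0 stripes without redundancy, so raw capacities simply add up."""
    return disk_count * disk_size_gb


def storage_per_vm_gb(capacity_gb, vm_count):
    """Average local storage available to each VM."""
    return capacity_gb / vm_count


def ram_bound_vm_count(host_ram_gb, vm_ram_mb):
    """Upper bound on VMs by RAM alone, ignoring hypervisor overhead."""
    return (host_ram_gb * 1024) // vm_ram_mb


capacity = raid0_capacity_gb(6, 64)          # 6x 64 GB EP SSDs
print(capacity)                               # 384
print(round(storage_per_vm_gb(capacity, 170), 2))  # 2.26
print(170 * 30)                               # 5100 GB if every VM held a full 30 GB disk
print(ram_bound_vm_count(256, 1536))          # 170
```

With roughly 2.26 GB of SSD capacity per desktop against a 30 GB master disk, the setup depends on Linked Clones sharing one replica instead of each VM holding a full copy.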

Load testing results – VM density

Maximum VM density per hypervisor host

Solution (Hypervisor & Provisioning) | Xeon X5650 max VM density "Medium" workload | Xeon X5650 max VM density "Heavy" workload | Xeon E5-2667 max VM density "Medium" workload | Xeon E5-2667 max VM density "Heavy" workload
Citrix XenServer 6.0 and Citrix XenDesktop 5.5 with Machine Creation Services (MCS) | 103 | 91 | 111 | 97
VMware ESXi 5.0 and VMware View 5.0 | 103 | 95 | 141 | 96

Table 3: VM density per hypervisor

Figure 5: VM density per hypervisor

The Xeon X5650 columns (marked in light grey in the original table) show results from load tests with a predecessor server generation based on Intel Xeon X5650 (Nehalem) CPUs.
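
The relative change between the two processor generations can be derived directly from Table 3. The sketch below computes it as (new - old) / old; the names `DENSITY` and `gain_percent` are illustrative.

```python
# (Xeon X5650, Xeon E5-2667) maximum VM densities from Table 3.
DENSITY = {
    ("Citrix", "Medium"): (103, 111),
    ("Citrix", "Heavy"): (91, 97),
    ("VMware", "Medium"): (103, 141),
    ("VMware", "Heavy"): (95, 96),
}


def gain_percent(old, new):
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100


for (vendor, workload), (old, new) in DENSITY.items():
    print(f"{vendor} {workload}: {gain_percent(old, new):+.1f}%")
```

Computed this way, the table yields about +7.8% (Citrix Medium), +6.6% (Citrix Heavy), +36.9% (VMware Medium) and +1.1% (VMware Heavy). The Citrix Heavy value is lower than the +9% quoted in the Summary, so one of the two figures may contain a transcription error.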

Summary

Hardware recommendations for VDI scenarios
The new Intel chipset Romley-EP, which supports the new Intel Sandy Bridge processors, comes with new technologies that help customers to further improve the VM density in VDI scenarios.
Fujitsu's two-way PRIMERGY models based on this new architecture, such as the RX300 S7, provide up to 8 physical cores per CPU as well as increased memory bandwidth and speed through 4 memory channels per CPU at a maximum speed of 1600 MHz.

Recommendation CPU
We recommend equipping all two-way servers based on Romley-EP with 2 processors.
A good price-performance ratio for VDI scenarios can probably be achieved by using 8-core CPUs of type E5-2650 or higher.

Recommendation RAM
Furthermore, it is recommended to always use a multiple of 8 DIMM modules @ 1600 MHz to benefit from the full memory bandwidth and speed. Since memory is vital for VDI scenarios, we recommend starting with 16x 16 GB DDR3 DIMMs @ 1600 MHz (= 256 GB RAM).

Results - higher VM density
The VM density with Heavy workload increased only slightly on Intel Xeon E5-2667 processors, although the new processor architecture should provide a performance increase of ~50%. Future load tests and BIOS patches might also lead to increased overall performance.
In contrast to the Heavy workload, the Medium workload benefits markedly from the new Intel chipset and the Sandy Bridge CPU architecture, especially on VMware ESXi 5.0, with a 37% higher VM density compared to the predecessor Nehalem architecture:
■ Citrix Heavy: +9%
■ Citrix Medium: +8%
■ VMware Heavy: +1%
■ VMware Medium: +37%

Dependencies of VM density and memory configuration
During our load tests we also ran tests using different speeds of the RAM modules (either through different BIOS versions and/or the number of equipped modules). The practical influence of memory speed configurations on the VM density was:
■ RAM modules @ 1066 MHz: 0% (basis)
■ RAM modules @ 1333 MHz: +8% compared to 1066 MHz
■ RAM modules @ 1600 MHz: +8% compared to 1333 MHz
Each memory speed step provided about 8% more VMs on a single hypervisor host.
An optimal memory speed configuration thus provided about 16% more VMs than a memory configuration with only 1066 MHz.

Contact
FUJITSU Technology Solutions GmbH
Address: Mies-van-der-Rohe-Strasse 8, 80807 Munich, Germany
Website: www.fujitsu.com/fts
2012-03-26 EN

Copyright 2012 Fujitsu. Fujitsu and the Fujitsu logo are trademarks or registered trademarks of Fujitsu Limited in Japan and other countries. Other company, product and service names may be trademarks or registered trademarks of their respective owners. Technical data subject to modifications and delivery subject to availability. Any liability that the data and illustrations are complete, actual or correct is excluded. Designations may be trademarks and/or copyrights of the respective manufacturer, the use of which by third parties for their own purposes may infringe the rights of such owner.
