Intel® Xeon® E7 v3-based X6 platforms combined with Lenovo Flex System Interconnect Fabric solutions deliver a highly reliable, cost-efficient, and scalable system for your data center.
Introduction
As technocrats, business decision makers, or technology enthusiasts, we always like making the most
out of the resources we have. Squeezing out that last drop of orange juice before buying the next one,
or topping up the fuel tank before crossing the border where fuel prices are higher: we all want more
out of less. Virtualization is similar. It has a lot of benefits, but it also requires expensive hardware and
supporting software. This paper illustrates that we can reduce the number of virtual hosts by 50%
when using Intel® Xeon® E7 v3-based X6 blade servers.
Keeping the above in mind, it only makes sense to load each server with as many VMs as possible to
get the most out of it. One would think that many real-world setups would have high utilization
numbers, but that is not the case.
Real-World Data
A 2015 Lenovo study collected utilization data from 30 customers running virtualized hosts that were
set up between 2010 and 2013, with a total of 5,542 cores and 62TB of RAM. We found that 95% of the
time, the CPU utilization never exceeded 24%, and CPU peaks were at 35%. In contrast, RAM
utilization was at 90%, while the average RAM-per-core ratio was 11GB for three- to four-year-old
servers.
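The two per-host metrics the study reports are straightforward to compute from monitoring data. The sketch below shows how, using invented sample values chosen only for illustration (they are not the study's raw data):

```python
def utilization_summary(cpu_samples, total_ram_gb, total_cores):
    """Summarize a host the way the study reports it: the 95th-percentile
    CPU utilization, the observed peak, and the RAM-per-core ratio."""
    ordered = sorted(cpu_samples)
    # Nearest-rank 95th percentile: the value below which 95% of samples fall.
    idx = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "p95_cpu": ordered[idx],
        "peak_cpu": max(ordered),
        "ram_per_core_gb": total_ram_gb / total_cores,
    }

# Hypothetical monitoring samples (percent CPU busy) for one host.
samples = [5, 8, 12, 10, 9, 22, 24, 18, 7, 11, 35, 14, 13, 9, 8, 10, 12, 16, 6, 9]
summary = utilization_summary(samples, total_ram_gb=352, total_cores=32)
```

With these sample values, the host sits at 24% CPU for 95% of the time, peaks at 35%, and carries 11GB of RAM per core, mirroring the averages the study found.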
[Figure: Utilization Data — scatter plot. x-axis: 95th-percentile processor utilization (0% to 90%); y-axis: RAM per core (GB); the label above each point is the number of hosts at that customer.]
Most large setups had system utilization of less than 24%. The only outliers were small customers with
two or three hosts that had a very high RAM-to-core ratio. Below are key observations:
1. A bank in India was running three hosts with 2xE5530 (4-core) @ 24GB RAM per core with 85%
processor utilization (95% of the time).
2. A film producer in Canada was running six hosts with 2xX5670 and 2xE5-2670 @ 16GB RAM
per core with 40% processor utilization (95% of the time).
3. A large bank in the US was running 60 hosts with 2xX5650, 2xX5670 @ 12GB RAM per core
with 28% processor utilization (95% of the time).
4. A multi-university bookstore in the US was running 56 hosts with 4xE7-8837, 2xX5675 @
14GB RAM per core with 9.15% processor utilization (95% of the time).
5. A manufacturing customer in the US was running seven hosts with 2xX5570, 2xX5670 @ 7GB
RAM per core with 19% processor utilization (95% of the time).
6. Even with 12 or 16GB RAM per core on four-year-old servers, the average utilization of all the
above systems did not exceed 40% (95% of the time).
While this data may not hold true for all customers, it does highlight the point that you should look at
the utilization data for your own setup. If the CPU utilization is less than 70% for servers that are less
than two years old, then this paper is for you. Lenovo technical teams can also perform an
infrastructure analysis of your setup to gather this data.
The Right Platform
The simplest way to increase the processor utilization is to increase the RAM on every host, and then
add VMs. Based on the above data, most customers have underutilized cores. If we refresh these
servers with 2015 Intel® Xeon® E7-8800 v3 series processors, we can increase the RAM per core to 24,
32, or 48GB RAM, due to the increased performance of each socket. Doing this increases VM density.
At the same time, this makes many IT administrators hesitant. Here’s why:
1. Increasing VM density causes I/O bottlenecks, and the amount of I/O available may be insufficient.
2. During host failures, VM migrations can cause bottlenecks in the network, and the current system
may not be able to handle the large amount of Ethernet traffic.
3. When there are fewer hosts in total, the failure of one host results in a higher percentage loss of
resources. Traditionally, having more hosts gives IT administrators better peace of mind. We have
found, though, that ten to twelve hosts keep many management costs low yet provide enough
redundancy to keep the environment running at optimal efficiency.
4. Traditionally, E5-based systems have been used for virtualization, and these lack advanced
availability features. Hence, IT administrators are reluctant to increase VM density on hosts that
lack such capabilities.
If many servers in your current setup are reaching their end of warranty soon, it makes better sense
in terms of ROI to replace the existing servers with fewer, more powerful servers that carry more
RAM. Then the question of the right platform arises. The table below shows some of the differences
between the Lenovo Flex System x240 M5 Compute Node (E5-based, 2-socket blade) and the Lenovo
Flex System x880 X6 Compute Node (E7-based, scalable blade).
Intel® Xeon® E5 processor-based systems have a maximum of 12 DIMM slots per socket and have few
reliability capabilities; the E5s are not designed to be mission critical. Due to their lower hardware
cost, they are the market choice for enterprise virtualization.
Intel® Xeon® E7 processor-based systems are designed for mission-critical workloads; they come with
a slew of RAS capabilities and 24 DIMM slots per socket. These processors are the gold standard for
database servers due to their scalability, reliability, and memory capabilities.
We have established that RAM is the key factor in increasing VM density. E7 systems enable a
maximum RAM scalability of 1.5TB per 2-socket system, almost 48GB RAM per core (36C/1,536GB
RAM), which meets the VM density requirement. As the E7 processor-based systems are designed for
mission-critical workloads, they also address the availability concerns stated before. The picture
below [1] shows some of the reliability features of Intel® Xeon® E7.

                              x240 M5       x880
Processor                     E5-2600 v3    E7-4800 v3 / E7-8800 v3
No. of cores per 2S blade     36            36
DIMM slots per 2S blade       24            48
Max RAM with 16GB DIMMs       384 GB        768 GB
Max RAM with 32GB DIMMs       768 GB        1536 GB
PCI slots                     2             4
Max bandwidth per 2S blade    120 Gbps      240 Gbps
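The maximum-RAM rows in the comparison table follow directly from the DIMM slot counts: slots per blade times DIMM size. A quick sketch:

```python
def max_ram_gb(dimm_slots, dimm_size_gb):
    """Maximum installable RAM for a blade, given its DIMM slot count."""
    return dimm_slots * dimm_size_gb

# Slot counts from the comparison table: 24 per 2S E5 blade, 48 per 2S E7 blade.
x240_m5 = {size: max_ram_gb(24, size) for size in (16, 32)}
x880_x6 = {size: max_ram_gb(48, size) for size in (16, 32)}
```

Doubling the DIMM slots is what lets the E7 blade double its maximum RAM (768 GB vs. 384 GB with 16GB DIMMs, 1536 GB vs. 768 GB with 32GB DIMMs) at the same core count.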
Lenovo further improves the high-availability capabilities by integrating the E7-based Lenovo Flex
System x880 X6 Compute Node blade server with Lenovo XClarity and VMware vCenter/Microsoft
System Center. This results in the following capabilities:
1. Minimizing unplanned downtime: server hardware health is monitored, and VMs are
automatically evacuated in response to predictive failure alerts (PFAs) before an actual failure
occurs. For example, when a PFA is received (up to 48 hours before a failure), the hypervisor
manager is informed and initiates a VM migration to evacuate all workloads to a spare host.
2. Non-disruptive rolling firmware updates: workload migrations are done automatically while the
server firmware is being upgraded.
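The evacuate-on-PFA flow in capability 1 can be sketched as follows. This is an illustrative model only: the real behavior is implemented by the XClarity/hypervisor-manager integration, and the class and function names here are hypothetical stand-ins, not a product API.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    vms: list = field(default_factory=list)
    healthy: bool = True

def handle_pfa(alert_host: Host, spare: Host) -> None:
    """On a predictive failure alert (PFA), evacuate all VMs from the
    flagged host to a spare host before the hardware actually fails."""
    alert_host.healthy = False          # stop scheduling new VMs here
    while alert_host.vms:
        vm = alert_host.vms.pop()
        spare.vms.append(vm)            # stands in for a live-migration call

failing = Host("blade-03", vms=["vm-a", "vm-b", "vm-c"])
spare = Host("blade-spare")
handle_pfa(failing, spare)
```

The point of the real feature is the same as this sketch: because the alert arrives hours before the failure, the migrations are planned live migrations rather than crash recoveries.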
Increasing the number of VMs per blade also requires an increase in the allocated I/O bandwidth. As
the Flex System x880 X6 Compute Node has 2X the number of I/O slots of the E5-based blade system,
this bottleneck is also addressed. Hence, an E7-8800 v3-based Lenovo Flex System x880 X6 Compute
Node offers a minimum of 2X the VM density, with increased reliability features, compared to the
E5-2600 v3-based Lenovo Flex System x240 M5 Compute Node.
However, the solution isn’t complete without a supporting infrastructure that complements the goal
of a simple, lower-total-cost, scalable setup.
[1] Intel, Intel® Xeon® Processor E7 Family: Reliability, Availability, and Serviceability,
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-e7-family-ras-server-paper.pdf (retrieved August 15, 2015).
[Figure: reliability features of the Intel® Xeon® E7 family. Courtesy: Intel.com]
Flex System Interconnect Fabric
Our objective is to increase the VM density per host to reduce overall costs. However, such a setup
also leads to higher networking costs due to higher bandwidth needs. Call this problem #1.
To address problem #1, we use the Lenovo Flex System Interconnect Fabric [2] (FSIF), which consists
of a single pair of FCoE gateway switches (G8264CS) connected to a pair of FCoE transit switches
(SI4093 or EN4093R) in each chassis. Up to nine chassis can be connected in this setup. Two
G8264CS switches manage all of the 126 slots across the nine chassis, and the intermediate switches
become invisible.
This architecture has the following advantages:
1. 2X uplink bandwidth due to active-active uplinks
2. Faster network traffic, as MAC address lookups happen
just once (at the first module only, not every module).
3. Easier management for all switches inside the PoD
4. Zero downtime upgrades
5. High scalability up to 126 blades
This also results in the lowest price per chassis that Lenovo has to offer. However, the chassis uplink
bandwidth can become a bottleneck. For example, if there are eight blades in the chassis and only
four 10Gbps uplinks, then the total bandwidth for both SAN and LAN traffic is only 5Gbps per blade,
not 10Gbps. The SI4093/EN4093R switches can support up to 14x10Gbps and 2x40Gbps uplink ports,
so the uplink bandwidth can be increased. However, this uses up ports in the top-of-rack switch.
Other OEMs that offer similar architectures have network latency [3] problems. This becomes
problem #2.
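The per-blade figures above are simple division: total uplink capacity shared evenly across the blades in the chassis. A minimal sketch:

```python
def per_blade_bandwidth_gbps(uplinks, uplink_speed_gbps, blades):
    """Converged (LAN + SAN) bandwidth available per blade when the
    chassis uplinks are shared evenly across all blades."""
    return uplinks * uplink_speed_gbps / blades

# Four 10Gbps uplinks shared by eight blades: 5Gbps each, not 10Gbps.
base = per_blade_bandwidth_gbps(4, 10, 8)

# Fully populated SI4093/EN4093R uplinks (14x10Gbps + 2x40Gbps = 220Gbps)
# shared by the same eight blades:
maxed = (14 * 10 + 2 * 40) / 8
```

Adding uplinks raises the per-blade share well past 10Gbps, but, as noted above, every added uplink consumes a top-of-rack switch port.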
To address problem #2, we use a unique feature of the Flex System chassis within the FSIF design
that allows us to add more bandwidth to servers as needed. The rear of the chassis has four switch
bays; in the FSIF architecture, only switch bays 1 and 2 are used, while bays 3 and 4 are free. If more
bandwidth is needed (for example, if there are some database servers inside the chassis), we can add
two additional FC5022 16Gbps SAN switch modules. This does two things:
1. It provides a dedicated SAN path from the internal nodes directly to the external SAN switch or
storage.
2. It reduces the load on the existing FCoE uplink pipe and the G8264CS TOR switches, and provides
more Ethernet and SAN bandwidth to the other nodes.
If the Ethernet traffic load is high, then rather than adding 16Gbps SAN switches, add more L2/L3
10/40Gbps EN4093R switches. This flexibility allows for easier balancing of network traffic by adding
bandwidth where it is needed, a feature not available from Lenovo competitors.
[2] YouTube video: Lenovo Flex System Interconnect Fabric Provisioning – https://www.youtube.com/watch?v=UH5BBBFuDD8
[3] Refer to Tolly Report #214014.
Commercial Viability
Intel Xeon E7-8800 v3-based systems clearly have many advantages over Intel Xeon E5-2600 v3-based
systems, but they are also priced higher. Considering that the chassis cost is fixed, we have used our
internal pricing to give an idea of the cost differential between various configurations of E5 and E7
systems. The chart below uses the price of a single Lenovo Flex System x240 M5 Compute Node with
2xE5-2640 v3 and 256GB RAM as a baseline of $1.00, and compares it with other configurations and
costs. The $3.55 bar in blue shows the cost of the software licenses (VMware Enterprise per-socket
license, 3 years, and Microsoft Windows Server 2012 R2 ROK Datacenter Edition), which is 3.55 times
the baseline price of the Flex System x240 M5 Compute Node. Each 2S E5 blade has 2x10Gbps FCoE,
and each 4S E7 blade has 4x10Gbps FCoE. Hence, the only variable is the core count.
We see that when software and hardware prices are combined and the RAM-per-core ratio is
increased, the E7 v3 x880 X6 compute nodes are less expensive than the E5 v3 x240 M5 compute nodes,
and we get all of the mission-critical RAS capabilities for free. In fact, for configurations with more
than 28 cores per host, the cost of one X6 blade is equal to the hardware price of two Flex System
x240 M5 E5-based blades loaded with the same total RAM.
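The economics above can be sketched in the chart's own normalized units. The 3.55 software multiple comes from the text; the 2.0 hardware multiple for the denser E7 blade is an assumed figure chosen to match the ">28 cores" break-even observation, and both blades are assumed to be 2-socket (so the per-blade software cost is the same):

```python
def normalized_total_cost(hw_units, sw_units_per_blade, blades):
    """Total cost in x240 M5 baseline units: hardware is 1.00 per baseline
    blade, and software is priced per blade on top of that."""
    return blades * (hw_units + sw_units_per_blade)

SW = 3.55  # software stack multiple from the chart (VMware + Windows, per blade)

# Hypothetical illustration: two E5 blades vs. one denser E7 blade hosting
# the same VMs. The E7 hardware multiple of 2.0 is an assumption, not a quote.
two_e5 = normalized_total_cost(hw_units=1.0, sw_units_per_blade=SW, blades=2)
one_e7 = normalized_total_cost(hw_units=2.0, sw_units_per_blade=SW, blades=1)
```

Even when one E7 blade costs as much as two E5 blades in hardware, halving the blade count halves the software spend, and at a 3.55x software multiple the software savings dominate the comparison.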
Summary
This paper is a guide that should encourage discussion and collaboration among the IT professionals
in your organization. Its intent is to provide you with a new perspective and encourage you to
reassess your virtualized infrastructure, in view of the realization that the lowest-cost hardware may
not always be the most price-optimized solution.
As we’ve illustrated, at Lenovo, we strive to provide superior solutions that deliver the highest possible
ROI to organizations. This involves getting a better perspective of the full picture of price, availability
and performance, and proposing the most appropriate solution for a specific environment. Once we
collectively understand the various unique variables in your environment, our teams can right-size a
solution so that you get the best value for your investment.
For virtualization, the proposed Lenovo blade architecture on the Intel Xeon E7-8800 v3-based X6
platform + Lenovo Flex System Interconnect Fabric solution delivers a highly reliable, cost-efficient,
and scalable system for your data center.
Acknowledgements
The industry expertise, personal support and valuable contributions of the following Lenovo
colleagues significantly helped to shape this paper:
1. Joseph Jakubowski, Christopher Floyd and Marc Baker - for their guidance around the
fundamentals of virtualization.
2. Jonathan Wu and Shekhar Mishra - for encouraging me on this idea, every step of the way.
3. Thomas Vezina - for providing me the utilization data - without which the story would be
incomplete.
4. Mohammed Yasser - for giving me a deeper understanding of FSIF.
5. Velayuthum Karani and Vijaykumar Kulageri - for being my “sounding boards.”
6. Katherine Holoman - for helping me be a better writer, and converting my thoughts into words
that make a difference!
Further Reading
Content referenced:
1. Lenovo Press - Flex System Interconnect Fabric
2. Lenovo Press - Flex System Chassis
3. Lenovo Press - Flex System x240M5 Node
4. Lenovo Press - Flex System x880 X6 Node
5. Lenovo XClarity Software – website for demo, and Lenovo Press
6. Tolly Report – VM Migration and Aggregate Network Performance vs Cisco UCS
7. Optimal Virtualization With X6 – Lenovo Press DRAFT
8. All charts and tables are internally created. Data is internally collected.