1. VMware vSphere Fault Tolerance for Multiprocessor Virtual Machines - Technical Preview
Jim Chow, VMware
Wei Xu, VMware
BCO5065
2. Disclaimer
This session may contain product features that are
currently under development.
This session/overview of the new technology represents
no commitment from VMware to deliver these features in
any generally available product.
Features are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new technologies or features
discussed or presented have not been determined.
10. vCenter Server
Central management server
Continuous availability difficult
Multiprocessor FT makes it simple
• Natural fit
11. Backing up FT VMs
Support for vStorage APIs for Data Protection (VADP)
• API for non-disruptive snapshots
19. Performance Numbers
[Chart: % Throughput (FT/non FT), higher is better, on a 0 to 100 scale. Workloads: Microsoft SQL Server 2-vCPU, Microsoft SQL Server 4-vCPU, Oracle Swingbench 2-vCPU, Oracle Swingbench 4-vCPU]
Similar configuration to vSphere 4 FT Performance Whitepaper
• Models real-world workloads: 60% CPU utilization
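The chart's metric, % Throughput (FT/non FT), can be read as the throughput of an FT-protected run divided by the throughput of the same workload without FT. A minimal sketch of that arithmetic follows; the function name and the sample numbers are hypothetical and are not measurements from the session.

```python
def ft_throughput_percent(ft_throughput: float, non_ft_throughput: float) -> float:
    """Return FT throughput as a percentage of non-FT throughput."""
    if non_ft_throughput <= 0:
        raise ValueError("non-FT throughput must be positive")
    return 100.0 * ft_throughput / non_ft_throughput


# Hypothetical values for illustration only (e.g. transactions per second).
example = ft_throughput_percent(ft_throughput=450.0, non_ft_throughput=500.0)
print(f"{example:.1f}%")  # prints "90.0%"
```

A value near 100 means the FT-protected VM keeps up with the unprotected baseline.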
20. Disasters Happen. Do You Need Protection?
43% of companies experiencing disasters never re-open, and 29% close within two years. (McGladrey and Pullen)
93% of businesses that lost their data center for 10 days went bankrupt within one year. (National Archives & Records Administration)
Top executives say 10 hours to recovery; IT managers say up to 30 hours. (Harris Interactive)
21. Do You Need Protection?
Server failures happen
• Google released some data about their server failures
• 2% to 4% of servers fail; 1% to 5% of disk drives crash
• 20 rack failures: 40-80 machines instantly disappeared
• 1-6 hours to get back
Sources
http://content.dell.com/us/en/gen/d/large-business/google-data-center
22. vSphere Offers Protection at Every Level
Protection by level:
• Component: NIC Teaming, Storage Multipathing
• Server: High Availability, Fault Tolerance, vMotion, DRS
• Storage: Storage vMotion
• Data: Backup Solutions
• Site: Site Recovery Manager
Protection against hardware failures
Planned maintenance with zero downtime
Protection against unplanned downtime and disasters
24. Background
2009: vSphere Fault Tolerance in vSphere 4.0
2010: Updates to vSphere Fault Tolerance in vSphere 4.1
2011: Updates to vSphere Fault Tolerance in vSphere 5.0
Details: http://www.vmware.com/products/fault-tolerance/
Problem:
• FT only for uni-processor VMs
• Is FT for multi-processor VMs possible?
• An impressively hard problem
• Concerted effort to find an approach
Reached a key milestone
• We’d like to share it
25. A Starting Point: vSphere FT
[Diagram: vSphere ESX (Primary) and vSphere ESX (Secondary), linked by vLockstep]
28. Turning on Multiprocessor FT
Creating two VMs
A new VM, but identical configuration
• vRAM, # vCPUs, vNICs, etc.
Each VM owns a complete set of VM files
• Separate vmdks completely owned by each VM
[Diagram]
Primary VM: Config, Disk 1, Disk 2
Secondary VM: Config, Disk 1, Disk 2
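The two properties on this slide (identical configuration, separately owned files) can be pictured as a simple check over a pair of VM descriptions. The sketch below is only an illustration: the `VMConfig` and `VM` types, the `is_valid_ft_pair` function, and the file paths are all hypothetical and do not correspond to any vSphere API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMConfig:
    vram_mb: int
    num_vcpus: int
    num_vnics: int


@dataclass(frozen=True)
class VM:
    name: str
    config: VMConfig
    files: frozenset  # config file and vmdk paths owned by this VM


def is_valid_ft_pair(primary: VM, secondary: VM) -> bool:
    """True if the VMs share a configuration but own no files in common."""
    same_config = primary.config == secondary.config
    separate_files = primary.files.isdisjoint(secondary.files)
    return same_config and separate_files


# Hypothetical example: same vRAM, vCPU count, and vNICs; separate files.
cfg = VMConfig(vram_mb=8192, num_vcpus=4, num_vnics=1)
primary = VM("db-primary", cfg, frozenset({"ds1/db.vmx", "ds1/disk1.vmdk", "ds1/disk2.vmdk"}))
secondary = VM("db-secondary", cfg, frozenset({"ds2/db.vmx", "ds2/disk1.vmdk", "ds2/disk2.vmdk"}))
print(is_valid_ft_pair(primary, secondary))  # prints "True"
```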
31. Datastores
One datastore must be common
Ensures only one running copy of the VM at any time
[Diagram]
Primary VM: Config, Disk 1, Disk 2
Secondary VM: Config, Disk 1, Disk 2
Tie Break Datastore
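The session does not describe how the common datastore enforces a single running copy. One way to picture the idea is an atomic "first writer wins" marker on shared storage, sketched below. This is an assumption for illustration only, not VMware's mechanism; the function name, file name, and host names are hypothetical, and exclusive-create semantics may not hold on every shared filesystem.

```python
import os
import tempfile


def try_win_tie_break(datastore_dir: str, vm_name: str, owner: str) -> bool:
    """Try to claim the VM; only the first caller for a given VM succeeds."""
    path = os.path.join(datastore_dir, f"{vm_name}.tiebreak")
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so creation
        # acts as an atomic claim on a local filesystem.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as marker:
        marker.write(owner)
    return True


# A temporary directory stands in for the common datastore.
with tempfile.TemporaryDirectory() as shared:
    results = [
        try_win_tie_break(shared, "vm1", "primary-host"),
        try_win_tie_break(shared, "vm1", "secondary-host"),
    ]
print(results)  # prints "[True, False]"
```

Whichever side claims the marker first keeps running; the other side sees the claim and stays down, so at most one copy of the VM is live.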