Application Agnostic High Availability Solution On Hypervisor Level
Zhang Chen <chen.zhang@intel.com>
Legal Disclaimer
 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
 Intel may make changes to specifications and product descriptions at any time, without notice.
 All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
 Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
 Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
 *Other names and brands may be claimed as the property of others.
 Copyright © 2019 Intel Corporation.
Agenda
Background Introduction
Introduction to COarse-grain LOck-stepping
New Changes
Performance and Customers
Future Work on COLO
Non-Stop Service with VM Replication
Virtual Machine (VM) replication
 A software solution for business continuity and disaster recovery: it provides application-agnostic hardware fault tolerance by replicating the state of the primary VM (PVM) to a secondary VM (SVM) on a different physical node.
Agenda
Background Introduction
Introduction to COarse-grain LOck-stepping
New Changes
Performance and Customers
Future Work on COLO
What Is COLO?
 Client and server (VM) model
 The server (VM) and its clients form a networked request-response system
 Clients only care about the responses they receive from the VM
 COarse-grain LOck-stepping VMs for Non-stop Service (COLO)
 The PVM and SVM execute in parallel
 The output packets of the PVM and SVM are compared to detect divergence in VM state
 The SVM state is synchronized with the PVM when their responses (network packets) are not identical, i.e. when the SVM's state no longer matches the PVM's
Scenario
 Stateless service
 For example: UDP-based applications, parts of online games, etc.
 Stateful service
 For example: TCP-based applications, live streaming, website services, etc.
COLO is not designed for stateless services but for critical stateful services. Our solution can protect a user's service even from physical network or power interruption.
Why COLO Is Better
 Compared with continuous VM checkpointing
 No buffering-introduced latency
 Lower checkpoint frequency
 Checkpoints are taken on demand rather than periodically
 Compared with instruction-level lock-stepping
 Eliminates the excessive overhead of non-deterministic instruction execution caused by multi-processor guest memory accesses
COLO History
 2013.10: COLO paper published at SoCC'13
 2015.7: COLO framework merged into Xen
 2018.10: All COLO-related patches merged into Qemu
 Now: New features
Old COLO Architecture
COarse-grain LOck-stepping Virtual Machine for Non-stop Service
[Architecture diagram: Primary Node and Secondary Node, each running Xen with a Domain 0 that hosts Qemu, a COLO Frame, a Heartbeat module, and Block Replication. The COLO Proxy (packet comparison and mirroring on the primary; sequence-number and ACK adjustment on the secondary) runs in the Domain 0 kernel. The nodes exchange disk I/O and network I/O over an internal network, and each node serves client requests/responses on the external network with its own local storage.]
New COLO Architecture
COarse-grain LOck-stepping Virtual Machine for Non-stop Service
Primary Node
Primary VM
Heartbeat
Qemu
COLO Proxy
(compare packet
and mirror packet)
Block
Replication
InternalNetwork
Secondary Node
Secondary VMDomain 0
COLO Proxy
(Adjust Sequence
Number and ACK)
Block
Replication
Xen Xen
External NetworkExternal NetworkStorage Storage
Request
Response
Request
Response
Disk IO
Net IO
Domain 0
Qemu
Heartbeat
COLO Frame COLO Frame
10
COLO Architecture Difference
 Re-developed COLO Proxy
 The old COLO proxy was based on a kernel netfilter module and required manually patching and rebuilding the kernel in Domain 0.
 The new COLO proxy code lives in Qemu; no kernel modification is needed.
 The code is more generic: the COLO proxy is split into filter-redirector, filter-mirror, filter-rewriter, and colo-compare components (see the command-line sketch below).
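As a rough sketch of how these components are wired together on the primary node, adapted from Qemu/docs/colo-proxy.txt (the IP address 3.3.3.3, ports, chardev ids, and MAC address are illustrative):

  # Primary-node Qemu fragment: mirror the client's packets to the
  # secondary and feed the PVM/SVM output packets into colo-compare.
  -netdev tap,id=hn0,vhost=off
  -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
  -chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
  -chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
  -chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
  -chardev socket,id=compare0-0,host=3.3.3.3,port=9001
  -chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
  -chardev socket,id=compare_out0,host=3.3.3.3,port=9005
  -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0
  -object filter-redirector,id=redire0,netdev=hn0,queue=rx,indev=compare_out
  -object filter-redirector,id=redire1,netdev=hn0,queue=rx,outdev=compare0
  -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0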
Block Replication (Storage Process)
Write
 Pnode
 Send the write request to the Snode
 Write the request to storage
 Snode
 Receive the PVM write request
 Read the original data into the SVM cache, then write the PVM request to storage (copy-on-write)
 Write SVM write requests to the SVM cache
Read
 Pnode
 Read from storage
 Snode
 Read from the SVM cache, or from storage on an SVM cache miss
Checkpoint
 Drop the SVM cache
Failover
 Write the SVM cache to storage
Based on Qemu's quorum, NBD, backup driver, and backing-file mechanisms (see the sketch below).
Detail: Qemu/docs/block-replication.txt
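A minimal sketch of the secondary-node drive stack and NBD export, following Qemu/docs/block-replication.txt (file names, paths, and the port are illustrative; the primary attaches to this export as an NBD client underneath a quorum driver):

  # Secondary-node drives: the real disk (colo1) sits behind a hidden
  # disk (the copy-on-write buffer) and an active disk (the SVM cache).
  -drive if=none,driver=raw,file.filename=1.raw,id=colo1
  -drive if=xen,id=active-disk0,driver=replication,mode=secondary,
         top-id=active-disk0,
         file.driver=qcow2,
         file.file.filename=/mnt/ramfs/active_disk.img,
         file.backing.driver=qcow2,
         file.backing.file.filename=/mnt/ramfs/hidden_disk.img,
         file.backing.backing=colo1

  # Export the disk to the primary over NBD (issued on the QMP monitor):
  { 'execute': 'nbd-server-start',
    'arguments': {'addr': {'type': 'inet',
                           'data': {'host': '0.0.0.0', 'port': '9999'}}}}
  { 'execute': 'nbd-server-add', 'arguments': {'device': 'colo1', 'writable': true}}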
[Diagram: Primary Node and Secondary Node, each with Qemu in Domain 0 on Xen; Block Replication carries the PVM's disk I/O over the internal network from the primary's storage to the secondary's storage.]
COLO Frame (Memory Sync Process)
• Pnode
– Track the PVM's dirty pages and send them to the Snode (see the QMP sketch below).
• Snode
– Receive the PVM dirty pages and buffer them.
– On checkpoint, update the SVM's memory.
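In QEMU/KVM COLO this checkpointing runs over the live-migration path; a minimal sketch of starting it from the primary's QMP monitor, per Qemu/docs/COLO-FT.txt (the secondary is started with -incoming; the URI is illustrative):

  { 'execute': 'migrate-set-capabilities',
    'arguments': {'capabilities': [{'capability': 'x-colo', 'state': true}]}}
  { 'execute': 'migrate', 'arguments': {'uri': 'tcp:192.168.0.2:9998'}}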
[Diagram: the COLO Frame in the Primary Node's Domain 0 transfers dirty pages and VM state over the internal network to the COLO Frame on the Secondary Node.]
Agenda
Background Introduction
Introduction to COarse-grain LOck-stepping
New Changes
Performance and Customers
Future Work on COLO
COLO Proxy Design
Detail: Qemu/docs/colo-proxy.txt
 Kernel scheme (obsolete):
 Based on the kernel TCP/IP stack and netfilter components.
 Better performance, but less flexible (requires modifying netfilter/iptables and the kernel).
 Userspace scheme:
 Implemented entirely in QEMU.
 Based on QEMU's netfilter components and the SLIRP component.
 More flexible (see the secondary-node sketch below).
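To complement the primary-node fragment shown earlier, here is a sketch of the secondary-node filter wiring in the userspace scheme, adapted from Qemu/docs/colo-proxy.txt (addresses, ports, and ids are illustrative):

  # Secondary-node Qemu fragment: redirect the SVM's output packets to
  # the primary's colo-compare, accept the client packets mirrored by
  # the primary, and rewrite TCP seq/ack numbers so the SVM's
  # connections stay consistent with the client's view.
  -netdev tap,id=hn0,vhost=off
  -device e1000,netdev=hn0,mac=52:a4:00:12:78:66
  -chardev socket,id=red0,host=3.3.3.3,port=9003
  -chardev socket,id=red1,host=3.3.3.3,port=9004
  -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
  -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
  -object filter-rewriter,id=rew0,netdev=hn0,queue=all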
Proxy Design (Kernel Scheme)
Based on the kernel TCP/IP stack and netfilter.
If the PVM and SVM packets are the same: release the packet to the client.
If they differ: trigger a checkpoint, then release the packet to the client.
Guest-RX
 Pnode
 Receive a packet from the client
 Copy the packet and send it to the Snode
 Send the packet to the PVM
 Snode
 Receive the packet from the Pnode
 Adjust the packet's ack_seq number
 Send the packet to the SVM
Guest-TX
 Snode
 Receive the packet from the SVM
 Adjust the packet's seq number
 Send the SVM packet to the Pnode
 Pnode
 Receive the packet from the PVM
 Receive the packet from the Snode
 Compare the PVM and SVM packets
[Diagram: on each node the COLO Proxy sits in the Domain 0 kernel beneath Qemu; client requests and responses flow over the external network, and the two proxies exchange packets over the internal network.]
Proxy Design (Userspace Scheme)
Based on Qemu's netfilter and SLIRP (a userspace TCP/IP stack).
If the PVM and SVM packets are the same: release the packet to the client.
If they differ: trigger a checkpoint, then release the packet to the client.
Guest-RX
 Pnode
 Receive a packet from the client
 Copy the packet and send it to the Snode
 Send the packet to the PVM
 Snode
 Receive the packet from the Pnode
 Adjust the packet's ack_seq number
 Send the packet to the SVM
Guest-TX
 Snode
 Receive the packet from the SVM
 Adjust the packet's seq number
 Send the SVM packet to the Pnode
 Pnode
 Receive the packet from the PVM
 Receive the packet from the Snode
 Compare the PVM and SVM packets
[Diagram: the same packet flow as the kernel scheme, but the COLO Proxy components run inside Qemu rather than in the Domain 0 kernel.]
COLO Internal Connection on Xen
[Diagram: the new architecture view, highlighting the internal-network links between the two nodes' components: Heartbeat, COLO Frame, COLO Proxy, and Block Replication on the primary each connect to their counterparts on the secondary, while client request/response traffic flows over the external network.]
New Features
 Integrated COLO heartbeat module
 A new heartbeat module in Qemu makes it easy for users to enable COLO while preserving the external heartbeat interface.
 The internal heartbeat reduces fault detection and response time (see the failover sketch below).
 Continuous backup
 Continue backing up the VM after a failover.
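For context, when the heartbeat declares the peer dead, failover on the secondary is driven through QMP; a minimal sketch per Qemu/docs/COLO-FT.txt (commands issued on the secondary's QMP monitor):

  # Stop the NBD export used by block replication, then promote the SVM.
  { 'execute': 'nbd-server-stop' }
  { 'execute': 'x-colo-lost-heartbeat' }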
Agenda
Background Introduction
Introduction to COarse-grain LOck-stepping
New Changes
Performance and Customers
Future Work on COLO
COLO Performance
COLO VM performance relative to a native VM (chart data):
Webbench 50c 60s      39%
Webbench 50c 200s     43%
Webbench 50c 600s     46%
FTP 200M GET          43%
FTP 1G GET            26%
FTP 200M PUT          32%
FTP 1G PUT            34%
Netperf TCP_RR        52%
Netperf TCP_STREAM    32%
Netperf UDP_RR        65%
Netperf UDP_STREAM    47%
Kernel build          92%
Agenda
Background Introduction
Introduction to COarse-grain LOck-stepping
New Changes
Performance and Customers
Future Work on COLO
Future Work
 Bug fixes and stability hardening
 Shared storage support
 Network performance optimization
 Libvirt support