



We presented the idea of COarse-grain LOck-stepping (COLO) virtual machines for non-stop service at last year's Xen Summit. Over the past year we have made significant progress and submitted the patch series to the community. This is a good time to present the latest status to the community and call for participation.

Published in: Technology, Education


  1. Status of COLO Project. Eddie Dong*, Xiaowei Yang#. *Intel Open Source Technology Center, #Huawei Technology Co. Key Contributors: Jianshan Lai, Congyang Wen, Tao Hong
  2. Notices and Disclaimers. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel may make changes to specifications, product descriptions, and plans at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. All dates provided are subject to change without notice. Intel and Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2013, Intel Corporation. All rights reserved.
  3. Agenda: Background, Status, Performance, Call for action
  4. What is COLO?
     • COarse-grain LOck-stepping Virtual Machines for Non-stop Service
     • Solution for client/server applications without application awareness
     • Dual-VM-based high availability solution
     • Relaxed constraints for higher performance
  5. Replicated network
     • Copy each client request to both the primary VM (PVM) and the secondary VM (SVM)
     • Compare the response packets from the PVM and the SVM with a compare module
       • When both are the same, the response is sent to the client
       • When they are not the same, sync the PVM and SVM and then send the response
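
The compare step on the slide above can be sketched in a few lines of Python. This is only an illustration, not the COLO implementation: the class name `CompareModule` and the callbacks `send_to_client` and `sync_vms` are hypothetical, and the choice to release the PVM's packet after a sync is an assumption (the slide only says "send the response").

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CompareModule:
    """Toy model of the response-comparison step (all names are hypothetical)."""

    send_to_client: Callable[[bytes], None]
    sync_vms: Callable[[], None]  # stands in for a PVM -> SVM synchronization
    syncs: int = 0                # how many times a mismatch forced a sync

    def on_responses(self, pvm_packet: bytes, svm_packet: bytes) -> None:
        if pvm_packet != svm_packet:
            # Outputs diverged: synchronize the two VMs before releasing output.
            self.sync_vms()
            self.syncs += 1
        # Assumption: the client always receives the primary's packet.
        self.send_to_client(pvm_packet)
```

With identical responses nothing is synchronized, so the sync rate follows how often the two VMs actually disagree.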
  6. Non-Stop Service with VM Replication [Figure: primary and secondary nodes, each with hardware, VMM, OS and apps (PVM and SVM); VM replication over the network; storage; failover on hardware failure. Compare w/ Remus]
  7. Problems with existing approaches
     • Instruction-level lock-stepping
       • Excessive overhead from maintaining the exact machine state
       • Memory access in an MP guest is non-deterministic
     • Periodic checkpointing
       • Extra network latency
       • Excessive VM checkpoint overhead
  8. Relaxed constraints help
     • Relaxing constraints tends to lower the rate of synchronization
     • With periodic checkpointing, the checkpoint period defines the rate of synchronization
     • Tying the rate of synchronization to dissimilar responses ties it to the application characteristics
     • In most cases this lowers the rate compared to the periodic method
  9. Architecture of COLO [Figure: COarse-grain LOck-stepping Virtual Machine for Non-stop Service]
  10. Agenda: Background, Status, Performance, Call for action
  11. Current Status
      • Patches for Xen have been sent to the mailing list
      • Academic paper published at the ACM Symposium on Cloud Computing (SOCC’13)
        • Refer to “COLO: COarse-grained LOck-stepping Virtual Machines for Nonstop Service” for details
      • Industry announcement
        • Huawei FusionSphere uses COLO
        • list/HW_308817?KeyTemps=
  12. TCP/IP optimization
      • Per-connection comparison (no modification to TCP/IP)
      • Coarse-grain TCP timestamp
      • Coarse-grain TCP notification window size
      • Deterministic algorithm to segment application data
      • Deterministic algorithm to generate the initial sequence number
      • Deterministic algorithm to generate the ID (IP packet header)
      • Immediate acknowledgement
      • Use a separate packet to send FIN
      • …
  13. Example: Coarse-grain TCP notification window size
      Coarse-grain window size rules:
      • If the original window is < 256, round it down to the nearest power of 2
      • Otherwise, mask off the 8 least significant bits
      For example:
      1. Original window size = 172 (10101100b): set window size to 128 (10000000b)
      2. Original window size = 283 (100011011b): set window size to 256 (100000000b)
      3. Original window size = 789 (1100010101b): set window size to 768 (1100000000b)
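
The two window rules above fit in a short Python function. This is a sketch of the rule as stated on the slide, not code from the COLO patches; the function name `coarse_window` is made up here, and leaving a zero window unchanged is an assumption, since the slide does not cover that case.

```python
def coarse_window(window: int) -> int:
    """Coarsen a TCP window size so small PVM/SVM differences compare equal."""
    if window < 0:
        raise ValueError("window size must be non-negative")
    if window == 0:
        return 0  # assumption: a zero window is left unchanged
    if window < 256:
        # Round down to the nearest power of 2 (keep only the highest set bit).
        return 1 << (window.bit_length() - 1)
    # Otherwise clear the 8 least significant bits.
    return window & ~0xFF


# The three examples from the slide.
print(coarse_window(172), coarse_window(283), coarse_window(789))  # 128 256 768
```

Two VMs whose windows differ only in the low bits (say 283 and 300) both advertise 256, so the packets still compare equal.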
  14. Example: Deterministic segmentation
      Application data to send at T1 and T2: app data 1 (3000 B, time point 1) and app data 2 (2000 B, time point 2), each segment carrying a TCP/IP packet header
      • Method 1: Find the latest unsent skb and append app data 2 to the unused tail of its payload. Payloads: 1360 B, 1360 B, 280 B + 1080 B, 920 B
      • Method 2: Look for the latest unsent skb, find none (skb == NULL), and use a new skb to send app data 2
      • COLO deterministic method: do NOT check for the latest unsent skb; always use a new skb to send app data 2. Payloads: 1360 B, 1360 B, 280 B, 1360 B, 640 B
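
The difference between the methods above can be reproduced with a small Python model. It is a simplification under stated assumptions: `segment` and `append_to_tail` are hypothetical names, the 1360 B payload limit is taken from the slide's numbers, and `append_to_tail=True` models Method 1 for the case where the tail skb is still unsent when the second write arrives.

```python
from typing import List

MSS = 1360  # payload bytes per segment, taken from the slide's example


def segment(writes: List[int], mss: int = MSS, append_to_tail: bool = False) -> List[int]:
    """Return the payload size of each segment produced by a series of writes."""
    segments: List[int] = []
    for size in writes:
        remaining = size
        if append_to_tail and segments and segments[-1] < mss:
            # Method 1: top up the partially filled tail segment first.
            fill = min(mss - segments[-1], remaining)
            segments[-1] += fill
            remaining -= fill
        while remaining > 0:
            # Always-new-skb behaviour: every write starts a fresh segment.
            chunk = min(mss, remaining)
            segments.append(chunk)
            remaining -= chunk
    return segments


print(segment([3000, 2000], append_to_tail=True))   # [1360, 1360, 1360, 920]
print(segment([3000, 2000], append_to_tail=False))  # [1360, 1360, 280, 1360, 640]
```

Whether the tail skb is still unsent depends on timing, so the stock behaviour can differ between the PVM and SVM. Always starting a new skb removes that timing dependence.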
  15. Storage process
      • Write
        • Pnode: DM sends the write request (offset, len, data) to the PVM cache in the Snode; DM calls the block driver to write to storage
        • Snode: DM saves the write request in the SVM cache
      • Read
        • Snode: from the SVM cache, or from storage otherwise
        • Pnode: from storage
      • Checkpoint: DM calls the block driver to flush the PVM cache
      • Failover: DM calls the block driver to flush the SVM cache
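
A toy Python model of the Snode side of this process may make the cache roles easier to follow. All names here (`SecondaryDisk`, `pvm_write`, and so on) are hypothetical, blocks are keyed by offset only (the `len` field is ignored for brevity), and discarding the other cache at checkpoint and failover is an assumption that the slide does not state.

```python
from typing import Dict, Optional


class SecondaryDisk:
    """Toy model of the Snode disk state; not the actual device-model code."""

    def __init__(self) -> None:
        self.storage: Dict[int, bytes] = {}    # blocks committed on the Snode
        self.pvm_cache: Dict[int, bytes] = {}  # PVM writes forwarded by the Pnode
        self.svm_cache: Dict[int, bytes] = {}  # writes issued by the SVM itself

    def pvm_write(self, offset: int, data: bytes) -> None:
        self.pvm_cache[offset] = data

    def svm_write(self, offset: int, data: bytes) -> None:
        self.svm_cache[offset] = data

    def svm_read(self, offset: int) -> Optional[bytes]:
        # Read from the SVM cache first, from storage otherwise.
        if offset in self.svm_cache:
            return self.svm_cache[offset]
        return self.storage.get(offset)

    def checkpoint(self) -> None:
        # Flush the PVM cache to storage.
        self.storage.update(self.pvm_cache)
        self.pvm_cache.clear()
        self.svm_cache.clear()  # assumption: SVM writes are dropped at a checkpoint

    def failover(self) -> None:
        # Flush the SVM cache to storage; the SVM takes over from here.
        self.storage.update(self.svm_cache)
        self.svm_cache.clear()
        self.pvm_cache.clear()  # assumption: unflushed PVM writes are dropped
```

Until a checkpoint or failover happens, neither VM's writes reach the Snode's storage, so either outcome can still be committed cleanly.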
  16. Memory sync
      • One of the most time-consuming steps
      • Send dirty memory asynchronously while the PVM and SVM are running
        • Less dirty memory to transmit during the VM checkpoint
        • Less CPU pressure and latency
      • Critical for the case where VM checkpoints happen very rarely
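
The benefit of sending dirty memory in the background can be illustrated with a small Python model. It only counts pages and makes no claim about how the patches track or transfer memory; `DirtyMemorySync`, `background_send`, and the per-call page budget are hypothetical.

```python
from typing import Set


class DirtyMemorySync:
    """Counts how many dirty pages are left for the checkpoint to transmit."""

    def __init__(self) -> None:
        self.dirty: Set[int] = set()

    def mark_dirty(self, page: int) -> None:
        # A page written again after being sent becomes dirty again.
        self.dirty.add(page)

    def background_send(self, budget: int) -> int:
        """Send up to `budget` dirty pages while the VMs keep running."""
        sent = 0
        while self.dirty and sent < budget:
            page = min(self.dirty)  # lowest page first, to keep the model deterministic
            self.dirty.remove(page)
            sent += 1
        return sent

    def checkpoint(self) -> int:
        """Send whatever is still dirty; return the number of pages sent."""
        remaining = len(self.dirty)
        self.dirty.clear()
        return remaining
```

For example, if ten pages are dirtied and seven are sent in the background, only the remaining pages plus any that were re-dirtied have to be transferred while the VMs are paused for the checkpoint.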
  17. Faster VBD/VIF frontend/backend suspend/resume
      • Old method: communication between frontend and backend goes through xenstored, which is inefficient
      • New method: use an event channel to speed up frontend/backend communication
  18. Agenda: Background, Status, Performance, Call for action
      *Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
  19. Web Server Performance - Web Bench. Source: Intel. For more complete information about performance and benchmark results, visit
  20. Web Server Performance - Web Bench (MP). Source: Intel
  21. PostgreSQL Performance - Pgbench. Source: Intel
  22. PostgreSQL Performance - Pgbench (MP). Source: Intel
  23. Upstream
      • The initial patch series has been posted
        • More comments are welcome
      • Depends on the readiness of Remus on top of xl
        • COLO reuses Remus for VM checkpoint and heartbeat
  24. Agenda: Background, Status, Performance, Call for action
  25. Next steps and call for action
      • Works well with an HVM Linux guest + PV drivers
      • Windows guest support is under development
      • Need more participants and a faster turnaround on upstreaming