Realtime scheduling for virtual machines in SKT

The demand for immediate VM responsiveness in virtualized environments has been rising. Several services at SKT also require soft real-time support so that virtual machines can replace physical machines and achieve high utilization and adaptability. However, consolidating multiple OSes on one host, together with irregular external events, can cause the hypervisor to delay a VM's response. To address this, we are improving Xen's credit scheduler by introducing RT_PRIORITY, which guarantees that a VM can run at any given point in time as long as it has credits left to burn. This should increase quality of service and make a VM's behavior predictable in a consolidated environment. In addition, we extend our proposal to multi-core environments and, by using live migration, to large numbers of physical machines.

Published in: Technology

  1. Real-time scheduling for virtual machines in SK Telecom
     Eunkyu Byun, Cloud Computing Lab., SK Telecom
     Sponsored by: [sponsor logos]
  2. Cloud by Virtualization in SKT
     • Provide virtualized ICT infrastructure to customers, like Amazon EC2, from SKT's cloud resource pool by exploiting server virtualization
       • Resources: servers/PCs, network, storage, ...
       • Functionalities: load balancing, security solutions, backup, ...
     • Private cloud inside SKT: virtualized servers, virtualized I/O
       • Migrate IT services on legacy servers to virtualized servers
       • Provide employees with PaaS for software development
       • Virtual desktop infrastructure for employees
  3. Cloud as Telecommunication Operator
     • SK Telecom is a telecommunication operator as well as a cloud service provider
     • One common cloud computing infrastructure
       • General-purpose server farms for cloud hosting, based on virtualization techniques
       • Legacy network/telecom services on dedicated, reliable equipment
     • Virtual Telco
       • Advantages: scale dynamically with demand; high utilization; easy start-up of new services
       • Requirements: 1. Guaranteeing time readiness; 2. Scalability of services; 3. Cost-effective secure storage
  4. Case Study: Virtual Telco
     • IMS (IP Multimedia Subsystem) on cloud
       • Delivers IP multimedia services (VoIP, VOD, instant messaging, ...), which require session initiation between participants on the Internet, to users connected to wireless telecom networks
       • Makes it easy to launch Internet services on wireless network/telecom infrastructure
     [Figure: IMS components on the IP network: Application Service, Media Processing, Session Management, User Info. Database, connected via SIP]
     • Migrate the servers for these components into the cloud, which requires high reliability
  5. Challenges
     1. Guaranteeing time readiness
     2. Scalability of services
     3. Cost-effective secure storage
     • Which virtualization technique, i.e., hypervisor, is most suitable for supporting real-time VMs?
     • We chose:
       • Xen hypervisor: best in the responsiveness benchmark, open source
       • Credit scheduler: default in Xen 4.1, known to be stable
       • Second option: Credit 2 scheduler
  6. Limitation of credit scheduler
     [Figure: CPU usage (sec) of the media player VM versus its weight, under contention between 6 CPU-intensive VMs (weight = 256) and 1 media player VM]
     • The real-time VM cannot obtain its proper amount of CPU, even with a very high weight
     • The CPU-intensive VMs use up the residual credits of the non-CPU-intensive (i.e., media player) VM
     • Needs improvement
  7. Research Goal
     • Find improved soft real-time schedulers based on the stable credit scheduler
       • Fair CPU sharing: each VM occupies CPU (almost) exactly in proportion to its weight, and the scheduler stays work-conserving
       • Real-time support: fast responsiveness of real-time VMs
     • Modify the credit scheduler to distinguish realtime VMs from non-realtime VMs
       • Realtime VMs are marked externally and treated specially to provide fast responsiveness
     • Co-work with: [logo]
  8. Preempt based scheduling
     • Priority order: BOOST > RT > UNDER > OVER
     • Idea: a realtime VM's VCPU is inserted into the runQ of a physical CPU right after the BOOST-priority VCPUs
     • Non-realtime VMs can run when the RT VMs have consumed all their given credits or are blocked
     [Figure: run queue of CPU 0 holding VCPUs 0-8, color-coded by Boost, RT, Under, and Over priority, with a new job being scheduled]
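
The insertion rule on this slide can be sketched as a sorted-queue insert. This is a minimal illustration, not the Xen code: the names `Prio`, `Vcpu`, `effective_prio`, and `runq_insert` are hypothetical, and the choice to demote an RT VCPU to OVER once its credits are spent is an assumption drawn from the statement that non-realtime VMs run after RT VMs consume their credits.

```python
from dataclasses import dataclass
from enum import IntEnum


class Prio(IntEnum):
    # Larger value is scheduled earlier: BOOST > RT > UNDER > OVER.
    OVER = 0
    UNDER = 1
    RT = 2
    BOOST = 3


@dataclass
class Vcpu:
    name: str
    prio: Prio


def effective_prio(is_rt: bool, credits: int) -> Prio:
    """Assumed rule: RT only applies while the VCPU still has credits."""
    if credits <= 0:
        return Prio.OVER
    return Prio.RT if is_rt else Prio.UNDER


def runq_insert(runq: list, vcpu: Vcpu) -> None:
    """Place vcpu behind every queued VCPU of equal or higher priority."""
    pos = 0
    while pos < len(runq) and runq[pos].prio >= vcpu.prio:
        pos += 1
    runq.insert(pos, vcpu)


# Hypothetical queue: one boosted VCPU, one UNDER, one OVER.
runq = [Vcpu("b", Prio.BOOST), Vcpu("u1", Prio.UNDER), Vcpu("o1", Prio.OVER)]
runq_insert(runq, Vcpu("r", effective_prio(is_rt=True, credits=100)))
print([v.name for v in runq])
```

In this sketch the RT VCPU lands directly behind the boosted one and ahead of all UNDER and OVER VCPUs, which is the behavior the slide describes.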
  9. BOOST based scheduling (Min Lee, VEE'10)
     • In the credit scheduler, a VM gets the highest priority (BOOST) when it receives an event while it is blocked
     • However, VMs that are already in the runQ are not boosted
     • Modification: always BOOST realtime VMs when they receive an external event, even if they are already in the runQ
     [Figure: run queue of CPU 0 with VCPUs color-coded by Under, Over, Boost, and RT priority, and an external event arriving]
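
The difference between the stock rule and the realtime variant can be sketched as a single decision function. This is a simplified model of the behavior as the slide states it, not the Xen implementation; `prio_after_event` and its parameters are hypothetical names.

```python
from enum import IntEnum


class Prio(IntEnum):
    OVER = 0
    UNDER = 1
    RT = 2
    BOOST = 3


def prio_after_event(prio: Prio, is_rt: bool, was_blocked: bool,
                     rt_boost: bool = True) -> Prio:
    """Priority of a VCPU right after it receives an external event.

    was_blocked: the VCPU was blocked (not in the runQ) when the event came.
    rt_boost:    enable the modification that also boosts queued RT VCPUs.
    """
    if was_blocked:
        # Rule described for the stock credit scheduler: wake-up boost.
        return Prio.BOOST
    if is_rt and rt_boost:
        # Modification: an RT VCPU already in the runQ is boosted as well.
        return Prio.BOOST
    return prio  # Queued non-RT VCPUs keep their current priority.


print(prio_after_event(Prio.RT, is_rt=True, was_blocked=False).name)
print(prio_after_event(Prio.UNDER, is_rt=False, was_blocked=False).name)
```

With `rt_boost=False` the function falls back to the stock behavior, which makes the two policies easy to compare side by side.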
  10. Multi BOOST (by Korea Univ. at XenSummit, Aug. 2011)
      • Problem: when multiple VCPUs are BOOSTed at the same time, the driver domain and the realtime VM cannot always get the highest priority
      • Priority order: DRIVER_BOOST > RT_BOOST > BOOST > RT > UNDER > OVER
      [Figure: run queue of CPU 0 with VCPUs color-coded by Under, Over, Boost, and RT priority, and an external event arriving]
  11. Performance Evaluation
      [Figure: Xen hypervisor on six physical CPUs (PCPU 1-6) hosting VM1-VM6 (micro bench, VCPUs V1-V4 each) and one RT VM (media player)]
      Physical server spec.
      • CPU: AMD Phenom™ II X6 1055T (6 cores)
      • Memory: 16GB
      • NET: Gigabit Ethernet
      • Xen 4.1.1
      • VM: VCPU: 4, MEM: 1GB
      Micro bench
      • Not set as RT priority
      • Repeats CPU-intensive computing for a random time, then sleeps for a random time
      • Above 98% CPU usage
  12. CPU, Network Usages
      [Figure: two bar charts per scheduler configuration: CPU usage (sec), with bars ranging from 12.3 to 51.96, and NET usage (MB), with bars ranging from 42.22 to 465.17]
  13. Fairness: CPU sharing according to weight
      • The RT VM runs a CPU-intensive process, fully utilizing the CPU
      [Figure: stacked bars of CPU usage (sec) for RT and VM1-VM6 under five configurations: unmodified, and weight(256), weight(512), weight(1024), weight(2048), each with +preemption+boost+multiboost. With equal weights every VM uses about 51 s. The RT VM uses 88.91 s, 136.43 s, and 184.54 s at weights 512, 1024, and 2048, while each non-RT VM uses about 44.7 s, 36.6 s, and 28.5 s respectively]
      • CPU usage is proportional to the weight value
      • Sharing is fair between the RT VM and non-RT VMs of equal weight
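
The proportionality claim can be made concrete with a small worked calculation: when every VM is CPU-bound, the expected share of VM i is w_i divided by the sum of all weights. The helper name `proportional_shares` is hypothetical; the weights are the ones used in the slide's weight(512) configuration.

```python
def proportional_shares(weights: dict) -> dict:
    """Expected CPU share of each VM when all of them are CPU-bound."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# One RT VM at weight 512 against six non-RT VMs at weight 256.
weights = {"RT": 512, **{f"VM{i}": 256 for i in range(1, 7)}}
shares = proportional_shares(weights)
print(shares["RT"], shares["VM1"])  # 0.25 and 0.125
```

A 25% share for the RT VM against 12.5% for each non-RT VM is a 2:1 ratio, which is in line with the roughly 89 s versus 45 s that the chart appears to report for this configuration.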
  14. Work-conserving
      [Figure: stacked bars of CPU usage (sec) for Media, Dom0, and VM1-VM6 under six configurations: no_contention, unmodified, weight, weight+preemption, weight+preemption+boost, weight+preemption+boost+multiboost. Without contention the media player VM uses 51.958 s (78.79% of 65.951 s), marked as the upper bound for the media player. Under contention the total is about 358 s in every configuration; the media player VM's portion rises from 12.3 s (3.43%) with the unmodified scheduler to 51.524 s (14.39%) with all modifications, and the non-realtime VMs take the remainder, roughly 14% to 16% each]
      • Weight for VM1~VM6 (non-realtime VMs): 256
      • Weight for the media player VM (realtime VM): 2048
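
Work-conserving sharing means that CPU a VM is entitled to but does not need is handed to the VMs that can still use it. One common way to model this is weighted progressive filling, sketched below. This illustrates the property only and is not how the credit scheduler computes anything; the function name and the demand figures are hypothetical.

```python
def work_conserving_alloc(weights: dict, demands: dict,
                          capacity: float = 1.0) -> dict:
    """Weighted progressive filling.

    weights:  scheduling weight per VM
    demands:  fraction of capacity each VM could use at most
    Returns the fraction of capacity each VM receives.
    """
    alloc = {name: 0.0 for name in weights}
    active = set(weights)
    remaining = capacity
    while active and remaining > 1e-12:
        total_w = sum(weights[n] for n in active)
        fair = {n: remaining * weights[n] / total_w for n in active}
        # VMs whose demand fits inside their weighted fair share.
        satisfied = {n for n in active if demands[n] - alloc[n] <= fair[n]}
        if not satisfied:
            # Everyone wants more than their share: split by weight and stop.
            for n in active:
                alloc[n] += fair[n]
            break
        for n in satisfied:
            remaining -= demands[n] - alloc[n]
            alloc[n] = demands[n]
        active -= satisfied
    return alloc


# Hypothetical demands: the media VM needs 15% of the CPU, the rest are greedy.
weights = {"media": 2048, **{f"vm{i}": 256 for i in range(1, 7)}}
demands = {"media": 0.15, **{f"vm{i}": 1.0 for i in range(1, 7)}}
print(work_conserving_alloc(weights, demands))
```

Even though the media VM's weight entitles it to more than half of the CPU, it only receives what it asks for, and the six greedy VMs split the remaining 85% equally because their weights are equal.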
  15. Responsiveness: Test environment
      [Figure: Xen hypervisor on six physical CPUs (PCPU 1-6) hosting VM1-VM6 (micro bench, VCPUs V1-V4 each) and one RT VM (ping target); an external server sends ping requests every 0.01 sec, 10000 counts]
      Physical server spec.
      • CPU: AMD Phenom™ II X6 1055T (6 cores)
      • RAM: 16GB
      • NET: Gigabit Ethernet
      • Xen 4.1.1
      • VM: VCPU: 4, MEM: 1GB
  16. Responsiveness (Ping RTT, Credit)
      • The cumulative distribution of ping RTT as the number of simultaneous CPU-intensive VMs increases
      [Figure: cumulative distribution (0 to 1) versus ping RTT (0 to 10 ms) for no_contention and contention_VM1 through contention_VM6]
      • Without contention, all pings take only 0.5 ms
      • When 7 VMs run simultaneously, 20% of pings take longer than 10 ms
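
A cumulative distribution like the one plotted here can be computed directly from raw RTT samples. The sketch below uses made-up sample values chosen only to mirror the "20% longer than 10 ms" observation; `empirical_cdf` and `tail_fraction` are hypothetical helper names.

```python
def empirical_cdf(samples: list, thresholds: list) -> list:
    """Fraction of samples that are <= each threshold."""
    n = len(samples)
    return [sum(1 for s in samples if s <= t) / n for t in thresholds]


def tail_fraction(samples: list, threshold: float) -> float:
    """Fraction of samples strictly above the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)


# Illustrative RTTs in ms, not measured data: 8 fast pings, 2 slow ones.
rtts_ms = [0.5] * 8 + [12.0, 15.0]
print(empirical_cdf(rtts_ms, [1.0, 5.0, 10.0]))
print(tail_fraction(rtts_ms, 10.0))
```

A curve that flattens below 1.0 at the right edge of the plot corresponds to a non-zero tail fraction at 10 ms, which is what the slide reports for the contended case.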
  17. Responsiveness (Ping RTT, modified)
      [Figure: cumulative distribution (0 to 1) versus ping RTT (0 to 10 ms) for no_contention, contention_VM6, contention_VM6+preempt, contention_VM6+rtboost, contention_VM6+multiboost, and contention_VM6+preempt+multiboost]
      • After applying our modification, all pings take only 0.5 ms even with contention
      • Weight of realtime VM = 256; weight of non-realtime VM = 256
  18. What about Credit2 scheduler?
      [Figure: CPU usage (sec) of the media player VM versus its weight]
      • VMs burn credits at a rate based on their weight
        • A higher weight means credits burn more slowly
      • VCPUs are inserted into the runQ in credit order
        • The VM with more credits runs first
      • Credits are "reset" when the credit of the next VCPU in the run queue is less than or equal to zero
      • Achieves both fairness and work-conserving behavior
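
The three rules on this slide are enough to build a toy simulation showing why they yield weight-proportional sharing. This is a simplified sketch, not Xen's Credit2 code: the initial credit value, the fixed time slice, the burn formula, and the "reset everyone to the initial value" step are all assumptions made for illustration.

```python
from dataclasses import dataclass

CREDIT_INIT = 1000.0  # hypothetical starting credit


@dataclass
class Vcpu:
    name: str
    weight: int
    credit: float = CREDIT_INIT
    runtime: float = 0.0


def simulate(vcpus: list, slices: int, slice_len: float = 1.0) -> dict:
    """Run always-runnable VCPUs for a number of fixed-length time slices."""
    max_weight = max(v.weight for v in vcpus)
    for _ in range(slices):
        # The VCPU with the most credit is at the head of the run queue.
        nxt = max(vcpus, key=lambda v: v.credit)
        if nxt.credit <= 0:
            # Reset rule: the next VCPU to run has no credit left.
            for v in vcpus:
                v.credit = CREDIT_INIT
            nxt = max(vcpus, key=lambda v: v.credit)
        nxt.runtime += slice_len
        # Higher weight burns credit more slowly.
        nxt.credit -= slice_len * max_weight / nxt.weight
    return {v.name: v.runtime for v in vcpus}


print(simulate([Vcpu("A", 512), Vcpu("B", 256)], slices=3000))
```

In this model a VCPU with twice the weight burns credit half as fast, so it can run twice as long between resets. The same mechanism also means a low-weight VCPU can sit behind higher-credit VCPUs for a while, which relates to the responsiveness concern raised on the next slide.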
  19. Responsiveness of Credit 2 scheduler
      [Figure: cumulative distribution (0 to 1) versus ping RTT (0 to 10 ms) for weight(256), weight(512), weight(768), weight(1024), weight(1536), and weight(2048)]
      • For fast responsiveness, a VM needs a higher weight
      • But what if we want to divide CPU cycles equally between VMs?
      • A special policy for realtime VMs is necessary
  20. Ongoing Research
      • What if several realtime VMs compete for a limited physical server or core?
        • Prediction-based scheduling between real-time VMs
        • Load balancing between physical cores
        • Efficient placement policy for RT VMs across physical servers
        • Load balancing between physical servers using live migration of VMs
  21. Summary
      • SK Telecom is trying to operate telco services on cloud resources
      • Realtime support in the hypervisor is essential
      • Analyzed the performance of modifications to the credit scheduler of the Xen hypervisor
        • For one realtime VM per physical core: fair sharing and fast responsiveness
      • Plan to improve for more complex and practical cases
  22. Thank you
      Contact: eunkyu.byun@sk.com
  23. Comparison of hypervisors
      • Evaluation environment
        • Physical server: Dell R410 (Xeon 8 cores, 16GB memory)
        • Virtual machine: 1 core, 1GB memory, 20GB HDD
      • Increase the number of VMs running benchmarks
        • Benchmarks: PCMARK 2005, kernel compile, SPEC-CPU 2006
      • A real-time application measures the delay of timer interrupt handling in the OS of the VM
        • Measured every 5 sec. for ten minutes
  24. KVM 0.12.5
  25. VMware ESX 4.1
  26. Xen 4.0
      • Xen is the best one, but not sufficient
      • Contention from non-real-time VMs affects the responsiveness of the real-time VM
  27. Approach
      • The VM scheduler in the hypervisor is important
      • Credit scheduler
        • Stable (default scheduler in Xen hypervisor 4.0), SMP support
        • Needs improvement for latency-sensitive VMs
      • Credit 2 scheduler
        • Proportional sharing according to the weight of each VM
        • Provides responsiveness to VMs with larger weights
        • Not so stable yet, needs more analysis
