Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Minimizing I/O Latency in Xen-ARM

7,412 views

Published on

Published in: Technology
  • DOWNLOAD FULL BOOKS, INTO AVAILABLE FORMAT ......................................................................................................................... ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. PDF EBOOK here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. EPUB Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... 1.DOWNLOAD FULL. doc Ebook here { https://tinyurl.com/y3nhqquc } ......................................................................................................................... ......................................................................................................................... ......................................................................................................................... .............. Browse by Genre Available eBooks ......................................................................................................................... Art, Biography, Business, Chick Lit, Children's, Christian, Classics, Comics, Contemporary, Cookbooks, Crime, Ebooks, Fantasy, Fiction, Graphic Novels, Historical Fiction, History, Horror, Humor And Comedy, Manga, Memoir, Music, Mystery, Non Fiction, Paranormal, Philosophy, Poetry, Psychology, Religion, Romance, Science, Science Fiction, Self Help, Suspense, Spirituality, Sports, Thriller, Travel, Young Adult,
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Minimizing I/O Latency in Xen-ARM

  1. 1. Minimizing I/O Latency in Xen-ARM Seehwan Yoo, Chuck Yoo and OSVirtual Korea University
  2. 2. I/O Latency Issues in Xen-ARM•  Guaranteed I/O latency is essential to Xen-ARM –  For mobile communication, •  Broken/missing phone calls, long network detection •  AP, CP are used for guaranteed communication•  Virtualization premises transparent execution –  Not only instruction execution; access isolation –  But also performance; performance isolation•  In practice, I/O latency is still an obstacle –  Lack of scheduler support •  Hierarchical scheduling nature i.e. hypervisor cannot impose task scheduling inside a guest OS •  The credit scheduler doesn’t match to time-sensitive applications –  Latency due to the split driver model •  Enhances reliability, but degrades performance (w.r.t. I/O latency) •  Inter-domain communication between IDD and user domain•  We investigate these issues, and present possible remediesOperating Systems Lab. http://os.korea.ac.kr 2
  3. 3. Related Work•  Credit schedulers in Xen –  Task-aware scheduler (by Hwanjoo et. al. vee’08) •  Inspection of task execution at guest OS •  Adaptively use boost mechanism by VM workload characteristics –  Laxity-based soft RT scheduler (by Lee min et. al. vee’10) •  Laxity calculation by execution profile •  Assign priority based on the remaining time to deadline (laxity) –  Dynamic core-allocation (by Y. Hu et. al. hpdc’10) •  Core-allocation by workload characteristics (driver core, fast/slow tick core)•  Mobile/embedded hypervisors –  OKL4 microvisor: microkernel-based hypervisor •  Has verified kernel – from sel4 •  Presents good performance – commercially successful •  Real-time optimizations – slice donation, direct switching, reflective scheduling, threaded interrupt handling, etc. –  VirtualLogix – VLX: mobile hypervisor for real-time support •  Shared driver model – device sharing model among RT-guest OS and non-RT guest OSs •  Good PV performanceOperating Systems Lab. http://os.korea.ac.kr 3
  4. 4. Background – the Credit in Xen•  Weighted round-robin based fair scheduler•  Priority – BOOST, UNDER, OVER –  Basic principle: preserve fairness among all VCPUs •  Each vcpu gets credit periodically •  Credit is debited as vcpu consumes execution time –  Priority assignment •  Remaining credit <= 0 à OVER (lowest) •  Remaining credit > 0 à UNDER •  Event-pending VCPU : BOOST (highest) –  BOOST: for providing low response time •  Allows immediate preemption of the current vcpuOperating Systems Lab. http://os.korea.ac.kr 4
  5. 5. Fallacies for the BOOST in the Credit•  Fallacy 1) VCPU is always boosted by I/O event –  In fact, BOOST is sometimes ignored because VCPU is boosted only when it doesn’t break the fairness •  ‘Not-boosted vcpu’s are observed when the vcpu is in-execution –  Example 1) •  If a user domain has CPU job and waiting for execution, then •  It is not boosted since it will be executed soon, and tentative BOOST is easy to break the fairness•  Fallacy 2) BOOST always prioritizes the VCPU –  In fact, BOOST is easy to be negated because multiple vcpus can be boosted simultaneously •  ‘Multi-boost’ happens quite often in split driver model –  Example 1) •  Driver domain has to be boosted, and then •  User domain also needs to be boosted –  Example 2) •  Driver domain has multiple pkts that are destined to multiple user domains, then •  All the designated user domains are boostedOperating Systems Lab. http://os.korea.ac.kr 5
  6. 6. Xen-ARM’s Latency Characteristic•  I/O latency measured throughout interrupt path –  Preemption latency : until code is preemptible –  VCPU dispatch latency : until the designated VCPU is scheduled –  Intra-VM latency : until IO completion Dom0 DomU dispatch dispatch Phy intr. Dom0 DomU Xen-ARM Dom0 DomU Intr. handler I/O task I/O task Preemption Vcpu Vcpu dispatch Intra-VM dispatch Intra-VM latency latency latency latency latency Operating Systems Lab. http://os.korea.ac.kr 6
  7. 7. Observed Latency thru intr. path•  I/O latency measured throughout interrupt path –  Send ping request from external server to dom1 –  VM settings •  Dom0 : the driver domain •  Dom1 : 20% cpu load + ping recv. •  Dom2 : 100% cpu load (CPU-burning workload) –  Xen-netback latency •  Large worst-case latency •  Dom0 vcpu dispatch lat. + intra-dom0 lat. –  Netback-domU latency •  Large average latency •  Dom1 vcpu dispatch lat. Operating Systems Lab. http://os.korea.ac.kr 7
  8. 8. Observed VCPU dispatch latency : not-boosted vcpu•  Experimental setting (%) 100 –  Dom1: varying CPU workload –  Dom2: burning CPU workload –  Another external host sends ping Cumulative latency distribution 80 to Dom1•  Not-boosted vcpus affect I/O latency distribution 60 –  Dom1 CPU workload 20% : almost 90% ping requests are handled within 1ms 40 –  Dom1 CPU workload 40% : 75% ping requests are handled 20% CPU load @ dom1 40% CPU load @ dom1 within 1ms 20 60% CPU load @ dom1 –  Dom1 CPU workload 60% : 65% 80% CPU load @ dom1 ping requests are handled within 1ms 0 –  Dom1 CPU workload 80% : only 60% 0 2 4 6 8 10 12 14 ping requests are handled Latency distribution by CPU load Latency (ms) within 1ms•  When Dom1 has more CPU load è larger I/O latency (self-disturbing by not-boosted vcpu) Operating Systems Lab. http://os.korea.ac.kr 8
  9. 9. Observed Multi-boost•  At the hypervisor, we vcpu state priority sched. out count counted the number Blocked BOOST 275 of “schedule out” of the UNDER 13 driver domain Unblocked BOOST 664,190 UNDER 49•  Multi-boost is specifically when –  the current VCPU is BOOST state, and it is unblocked –  Imply that there should be another BOOST vcpu, and the current vcpu is scheduled out•  Large number of Multi-boostsOperating Systems Lab. http://os.korea.ac.kr 9
  10. 10. Intra-VM latency 1•  Latency Cumulative distribution 0.95 from usb_irq to netback 0.9 Xen-usb_irq @ dom0 Xen-netback @ dom0 Xen-icmp @ dom1 –  Schedule outs 0.85 Dom0 : IDD Dom1 : CPU 20%+ ping recv during dom0 execution Dom2 : CPU 100% 0.8 0 20 40 60 80 Latency (ms) 1•  Reasons Cumulative distribution 0.95 –  Dom0 : not always the higest prio. 0.9 Xen-usb_irq @ dom0 –  Asynch I/O handling : Xen-netback @ dom0 Xen-icmp @ dom1 0.85 Dom0 : IDD bottom half, softirq, Dom1 : CPU 80%+ ping recv Dom2 : CPU 100% tasklets, etc. 0.8 0 20 40 60 80 Latency (ms)Operating Systems Lab. http://os.korea.ac.kr 10
  11. 11. Virtual Interrupt Preemption @ Guest OS <At driver domain>•  Interrupt enable/disable is not physically void default_idle(void) { operated within a guest OS Hardware interrupt local_irq_disable(); –  local_irq_disable(): disables Set a pending bit only virtual intr. Virtual (because virtual if (!need_resched()) interrupt intr. is disabled) // blocks this domain disabled•  Physical intr. can be occurred arch_idle(); –  might trigger inter-VM local_irq_enable(); scheduling } Driver domain is scheduled out•  Virtual interrupt preemption having pending IRQ –  Similar to lock-holder preemption –  Perform driver function with interrupt disabled –  Physical timer intr. triggers inter-VM scheduling –  Virt. Intr. can be received only after the domain is scheduled again (as large as tens of ms) Note that the driver domain performs extensive I/O operation that disables interruptOperating Systems Lab. http://os.korea.ac.kr 11
  12. 12. Resolving I/O Latency Problems for Xen-ARM1.  Fixed priority assignment –  Let the driver domain always run with DRIVER_BOOST, which is the highest priority •  regardless of the CPU workload •  Resolves non-boosted VCPU and multi-boost –  RT_BOOST, BOOST for real-time I/O domain2.  Virtual FIQ support –  ARM-specific interrupt optimization –  Higher priority than normal IRQ interrupt –  vPSR (Program status register) usage3.  Do not schedule out the driver domain when it disables virtual interrupt –  It will be finished soon, and the hypervisor should give a chance to run the driver domain •  Resolves virtual interrupt preemptionOperating Systems Lab. http://os.korea.ac.kr 12
  13. 13. Enhanced Latency : no dom1 vcpu dispatch latency 1 1 0.9 0.9 Cumulative distribution Cumulative distribution 0.8 0.8 0.7 0.7 Xen-netback @dom0 Xen-netback @ dom0 Xen-icmp @ dom1 Xen-icmp @ dom1 0.6 0.6 Dom0 : IDD Dom0 : IDD 0.5 Dom1 : CPU 20%+ ping recv Dom1 : CPU 40%+ ping recv 0.5 Dom2 : CPU 100% Dom2 : CPU 100% 0.4 0.4 0 20 40 60 80 0 20 40 60 80 Latency (ms) Latency (ms) 1 1 0.9 0.9 Cumulative distribution Cumulative distribution 0.8 0.8 0.7 0.7 Xen-netback @ dom0 Xen-netback @ dom0 Xen-icmp @ dom1 Xen-icmp @ dom1 0.6 0.6 Dom0 : IDD Dom0 : IDD Dom1 : CPU 60%+ ping recv Dom1 : CPU 80%+ ping recv 0.5 0.5 Dom2 : CPU 100% Dom2 : CPU 100% 0.4 0.4 0 20 40 60 80 0 20 40 60 80 Latency (ms) Latency (ms)Operating Systems Lab. http://os.korea.ac.kr 13
  14. 14. Enhanced Interrupt Latency : @ Driver Domain•  Vcpu dispatch latency 1 –  From intr. handler @ Xen to hard irq handler @ IDD 0.99 Cumulative distribution –  Fixed priority dom0 vcpu 0.98 Dom0 : IDD Dom1 : CPU 80%+ ping recv –  Virtual FIQ Dom2 : CPU 100% –  No latency is observed! 0.97 Xen-usb_irq - orig. Xen-netback - orig. 0.96 Xen-usb_irq - new Xen-netback - new•  Intra-VM latency 0.95 –  From ISR to netback @ IDD 0 20 40 60 80 Latency (ms) –  No virt. intr. preemption (dom0 highest prio.)•  Among 13M intr., –  56K are caught where virt intr preemption happened –  8.5M preemption occurred with FIQ optimizationOperating Systems Lab. http://os.korea.ac.kr 14
  15. 15. Enhanced end-user Latency : overall result•  Over 1 million ping tests, –  Fixed priority make the driver domain to run without additional latency (from inter-VM scheduling) •  Largely reduces overall latency –  99% interrupts (%) 100 are handled Latency distribution (cumulative) within 1ms 95 reduced latency 90 Xen-domU (enhanced) Xen-domU (original) 85 80 0 10 20 30 40 50 60 Latencies in all intervals (ms)Operating Systems Lab. http://os.korea.ac.kr 15
  16. 16. Conclusion and Possible Future work•  We analyzed I/O latency in Xen-ARM virtual machine –  Throughout the interrupt handling path in split driver model•  Two main reasons for long latency –  Limitation of ‘BOOST’ in the Xen-ARM’s Credit scheduler •  Not-boosted vcpu •  Multi-boost –  Driver domain’s virtualized interrupt handling •  Virtual interrupt preemption (aka. lock-holder preemption)•  Achieve under millisecond latency for 99% network packet interrupts –  DRIVER_BOOST; the highest priority for driver domain –  Modify scheduler in order for the driver domain not to be scheduled out while virtual interrupts are disabled –  Further optimizations (incl. virtual FIQ mode, softirq-awareness)•  Possible future work –  Multi-core consideration/extension (core allocation, etc.) •  Other scheduler integration –  Tight latency guarantee for real-time guest OS •  Rest 1% holds the keyOperating Systems Lab. http://os.korea.ac.kr 16
  17. 17. Thanks for your attentionCredits to OSvirtual @ oslab, KU 17
  18. 18. Appendix. Native comparison •  Comparison with native system – no cpuworkload –  Slightly reduced handling latency, largely reduced max. Orig.   Orig.     New  sched.   New  sched.   Na#ve   dom0   domU   Dom0   DomU  Min         375   459   575   444   584  Avg         532.52   821.48   912.42   576.54   736.06  Max   107456   100782   100964   1883   2208  Stdev   1792.34   4656.26   4009.78   41.84   45.95   1 Cumulative distribution 0.8 0.6 0.4 native ping (eth0) orig. dom0 0.2 orig. domU new sched. dom0 new sched. domU 0 300 400 500 600 700 800 900 1000 Latency (us) Operating Systems Lab. http://os.korea.ac.kr 18
  19. 19. Appendix. Fairness? •  is still good, and achieves high utilization CPU burning jobs’ utilization dom2 dom1 NO I/O (ideal case) Orig. credit New scheduler * SettingDom1: 20, 40, 60, 80, 100% CPU load + ping receiverDom2: 100% CPU loadNote that the credit is work-conserving Normalized Orig. credit New scheduler throughput = ( Measured thruput/ ideal thruput ) Operating Systems Lab. http://os.korea.ac.kr 19
  20. 20. Appendix. Latency in multi-OS env. •  3 domain cases with differentiated service –  added RT_BOOST priority, between DRIVER_BOOST and BOOST –  Dom1 assuming RT –  Dom2 and Dom3 for GP Credit’s latency dist. 3 domains for Enhanced latency dist. 3 doms. for 10K ping tests (interval 10~40ms) 10K ping tests (interval 10~40ms)100.00% 100.00%90.00% 90.00%80.00% 80.00%70.00% 70.00%60.00% 60.00%50.00% Dom1 50.00% Dom140.00% Dom2 40.00% Dom230.00% Dom3 30.00% Dom320.00% 20.00%10.00% 10.00% 0.00% 0.00% 0 6000 12000 18000 24000 30000 36000 42000 48000 54000 60000 66000 72000 78000 84000 90000 96000 102000 108000 114000 120000 0 6000 12000 18000 24000 30000 36000 42000 48000 54000 60000 66000 72000 78000 84000 90000 96000 102000 108000 114000 120000 Operating Systems Lab. http://os.korea.ac.kr 20
  21. 21. Appendix. Latency in multi-OS env.•  3 domain cases with differentiated service –  added RT_BOOST priority, between DRIVER_BOOST and BOOST –  Dom1 assuming RT –  Dom2 and Dom3 for GP 100.00% 90.00% 80.00% 70.00% 60.00% Enh. Dom1 Enh. Dom2 50.00% Enh. Dom3 40.00% Credit Dom1 30.00% Credit Dom2 20.00% Credit Dom3 10.00% 0.00%Operating Systems Lab. http://os.korea.ac.kr 21

×