Your SlideShare is downloading. ×
XS Oracle 2009 CVF
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

XS Oracle 2009 CVF

560
views

Published on

Ze'ev Maor: Client Virtualization Framework

Ze'ev Maor: Client Virtualization Framework

Published in: Technology, Business

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
560
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
9
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. The On-going Evolutions of Power Management in Xen Kevin Tian Open Source Technology Center (OTC) Intel Cooperation
  • 2. Agenda • Brief history • Evolutions of Idle power management • Evolutions of run-time power management • Tools • Experimental data about Xen power efficiency • Power impact from VM Intel Confidential 2
  • 3. Brief History Enhanced green computing Improved stability Jan. 2008 Jun. 2008 Better usability Jul. 2007 Xen 3.2 Mature deep C- … released states support Host S3 Xen 3.1 Sep. 2007 May. 2007 Aug. 2008 Dom0 Preliminary Xen 3.3 controlled cpufreq and released freq/vol cpuidle support scaling in Xen Intel Confidential 3
  • 4. Evolutions of Idle power management (Xen cpuidle) Intel Confidential 4
  • 5. Xen Summit Boston 2008 Dom0 ACPI Parser External Control Schedulers Registration Enter Idle Hypercall Ladder Halt (C1) CpuIdle driver Dynamic Timekeeping Tick C1 C2 Mwait/IO Xen Intel Confidential 5
  • 6. Enhanced C-states support Dom0 noPM x ps p1 hv m noC3 x ps p1 hv m 120% C3 x ps p1 hv m 100% 100% 100% ACPI Parser 80% 102.60% Percentage External 91.40% 60% 107.70% Control 82.30% 40% 20% Schedulers 0% id le (W att) SPEC p o w e r Registration Enter Idle (Sco r e ) Hypercall Ladder Halt (C1) ‘noPM’ has both cpuidle and cpufreq CpuIdle driver disabled, and vice versa for other two Dynamic Timekeeping Tick cases. Compared to ‘C3’, noC3 has C1 C2 Mwait/IO Deep C-states maximum C-state limited to C2 Timer C1E Xentrace For idle watt, lower value means greener. For SPECpower score, higher value indicates more power efficient Xen Intel Confidential 6
  • 7. TSC freeze TSC freeze Ideal TSC Unsynchronized TSC Time went backwards warning Dom0 CPU0 Actual TSC Lots of lost ticks Fluctuating TSC scale factor ACPI Parser CPU1 Faster ToD Xen system time skew External Control 0 1 sec 1 sec Scale error … Platform counter … TSC Restore TSC upon elapsed platform counter since current entry Schedulers Percpu platform to TSC scale … TSC Registration Enter Idle Restore TSC upon elapsed platform Software compensation counter since last calibration according to elapsed Percpu platform to TSC scale … counter of platform timer TSCLadder Hypercall Halt (C1) … Restore TSC upon elapsed platform counter since power on CpuIdle driver Global platform to TSC scale Dynamic Timekeeping … Tick TSC C1 C2 Mwait/IO Deep C-states Timer C1E TSC Xentrace save/restore Hardware enhancement to have TSC Always never stopped (e.g. by Intel Core-i7) running TSC Platform counter TSC Xen Intel Confidential 7
  • 8. APIC timer freeze Dom0 Timer heap Nearest deadline Reprogram local APIC T count-down timer T T APIC timer ACPI Parser T T T T freeze interrupt when … count down to zero External Scan/execute Control expired timers (Delayed) Timer softirq handler Local APIC ISR Schedulers Registration Enter Idle Solution use platform timer Halt (C1) (PIT/HPET) to carry nearest Hypercall Ladder deadline, when APIC timer is halted in deep C- states: CpuIdle driver Dynamic Timekeeping Tick C1 C2 Mwait/IO Deep C-states is required since number of platform Broadcast Timer timer source is less than number of CPUs C1E PIT/HPET TSC Xentrace broadcast save/restore will come soon to have more MSI based HPET interrupt platform timer sources with reduced broadcast traffic Always running TSC will be supported in new CPU Always running APIC timer soon Xen Intel Confidential 8
  • 9. Menu governor Ladder Dom0 Expected minimal residency at Cn Expected minimal residency at Cn+1 C0 ACPI Parser Cn External Control Cn+1 Inefficient Promotion if continuous N Cn Schedulers residencies > expectation Demotion if current Cn+1 residency < expectation Registration Enter Idle less idle watt consumed (-5.2%) Higher SPECpower score (+1.6%) Hypercall Ladder Menu (HVM WinWPsp1) CpuIdle driver Menu Halt (C1) C1 C2 Mwait/IO Deep C-states Nearest timer deadline C1, 1ns, 20w Dynamic PICK Timekeeping Last unexpected break event Tick C1E C2, 10ns, 15w PIT/HPET TSC broadcast save/restore Timer C3, 100ns, 5w Latency/power requirement … Always Xentrace running TSC To be further tuned! Xen Intel Confidential 9
  • 10. Range timer Dom0 0(ms) 1 2 3 4 5 6 For each timer, it accepts a range for expiration now: C0 ACPI Parser [expiration, expiration + Cn Frequent C-state External timer_slop] Control Entry/exit may (default 50us for timer_slop) Instead consume 0(ms) 1 2 3 4 5 6 More power Overlapping ranges can be merged to reduce timer Schedulers interrupt count C0 Registration Enter Idle Cn Hypercall Ladder Menu Halt (C1) CpuIdle driver Dynamic Timekeeping Tick C1 C2 Mwait/IO Deep C-states Range Timer timer C1E PIT/HPET TSC Xentrace broadcast save/restore Always running TSC Xen Intel Confidential 10
  • 11. Range timer effect One UP HVM RHEL5u1 Multiple idle UP HVM RHEL5u1 10000 range=50us 9000 range=1ms timer interrupt/second 8000 7000 6000 -7.5% +1.2% 5000 4000 3000 2000 1000 Idle(watt) SPECpower(score) 0 100% 100% 50us 1HVM 2HVM 4HVM 92.50% 101.20% 1ms Collected on a two-cores mobile platform Intel Confidential 11
  • 12. Current picture Dom0 Xenpm ACPI Parser External Control Power Schedulers Aware Registration Enter Idle Hypercall Statistics Ladder Menu Halt (C1) CpuIdle driver Dynamic Timekeeping Tick C1 C2 Mwait/IO Deep C-states Range Timer timer C1E PIT/HPET TSC Xentrace broadcast save/restore Always running TSC Xen Intel Confidential 12
  • 13. Evolutions of run-time power management (Xen cpufreq) Intel Confidential 13
  • 14. Xen Summit Boston 2008 Dom0 Linux PM Tools On User Others Demand Space Cpufreq core Power ACPI Others ACPI Parser Now-K8 cpufreq Linux Cpufreq External Control Query Idle State Registration Schedulers On Demand Tiny cpufreq core Enable MSR ACPI Cpufreq Access (IA32) Perm Xen Intel Confidential 14
  • 15. Current picture Dom0 Linux PM Xenpm ? Tools On User Others Demand Space Cpufreq core Power ACPI Others ACPI Parser Now-K8 cpufreq Linux Cpufreq External Control Query Idle More State governors Registration Schedulers Turbo mode On User Perfor Power Enhanced Demand Space mance Save Control User Tiny cpufreq core Interface Statistics Enable Cpu offline MSR ACPI ACPI Power / online Cpufreq Cpufreq Now Access (IA32) (IA64) K8 Perm More Xen drivers Intel Confidential 15
  • 16. Tools Intel Confidential 16
  • 17. Dom0 Retrieve run-time Xenpm statistics about Xen cpuidle and cpufreq Apply user policy on exposed control knobs of Xen cpufreq (governor, set freq, etc) More capabilities to be added later, e.g. profile… Log every state change for Xen cpuidle and cpufreq: Xentrace CPU0 391365842416 (+ 21204) cpu_idle_entry [ idle to state 2 ] CPU0 391375951050 (+10108634) cpu_idle_exit [ return from state 2 ] Raw data could be further processed by other scripts Xen Intel Confidential 17
  • 18. Experimental data about Xen power efficiency Intel Confidential 18
  • 19. • All data shown in this section: – For reference only and not guaranteed – On a two-core mobile platform, with one HVM guest created • Server consolidation effect with multiple VMs/workloads are in progress • Improvement when Xen cpuidle and cpufreq are enabled – SPECpower score is normalized (100% noPM score is 1000 ssj_ops / watt) – Similarly, consumed watt is also normalized (idle noPM watt is 10w) 1500 25 noPM-score PM-score normalized score (ssj_ops/watt) noPM-watt PM-watt 20 Normalized power (watt) Reduced 1000 Power! 15 10 500 Improved Efficiency 5 0 0 0 10 20 30 40 50 60 70 80 90 100 SPECpower workload (%) Intel Confidential 19
  • 20. • Below is a more attracting comparison – Native WinXPsp1’s idle watt and SPECpower score are both normalized to 100% as the base native xpsp1 140% native rhel5u1 HZ=1000 in xen xpsp1 hvm RHEL5u1 incurs high timer interrupt kvm xpsp1 hvm 130% xen rhel5u1 hvm kvm rhel5u1 hvm 120% Normalized percentage (%) 110% 100% Xen is slightly more power 90% efficient than KVM Similar Idle power 80% consumption for Xen and KVM 70% 60% Native Native 50% idle (Watt) SPECpow er (ssj_ops/w att) Intel Confidential 20
  • 21. Power impact from VM Intel Confidential 21
  • 22. • VMM shouldn’t be blamed as only reason for high power consumption! • A ‘bad’ VM could eat power – Just like what ‘bad’ application could do in a native OS – Cause high break events (e.g. timer interrupts) with short C-state residency • Which parts in VM could draw high power? – ‘Bad’ applications hurt just as what they could on native – How guest OS is implemented also matters • Periodic tick frequency – HZ • Timer usage in drivers • Time sub-system implementation • … • Green guest OS wins! – Smaller HZ, idle tickles or fully dynamic tick, range timer, etc. Intel Confidential 22
  • 23. Idle power consumption 120.00% 4 115.00% 3.5 ‘Bad’ guest 110.00% 3 Average residency (ms) Normalized watt 105.00% 2.5 100.00% 2 95.00% 1.5 Green Green 90.00% 1 85.00% 0.5 80.00% 0 bare Dom0 PV HVM Winxp HZ=1000 HZ=250 HZ=100 tickless Idle power consumption Average C-state residency Intel Confidential 23
  • 24. Power efficiency 1020 1000 Normalized score (ssj_ops / watt) 980 960 940 920 900 880 860 840 820 800 bare Dom0 PV HVM Winxp HZ=1000 HZ=100 tickless SPECpower score Intel Confidential 24
  • 25. Legal Information • INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. • Intel may make changes to specifications, product descriptions, and plans at any time, without notice. • All dates provided are subject to change without notice. • Intel is a trademark of Intel Corporation in the U.S. and other countries. • *Other names and brands may be claimed as the property of others. • Copyright © 2007, Intel Corporation. All rights are protected. Intel Confidential 25
  • 26. Intel Confidential 26

×