XPDS13: Performance Optimization on Xen-based Android Device - Jack Ren, Intel and Xiantao Zhang, Intel

Mobile devices such as smartphones and tablets are becoming the de facto everyday computing and communication devices, and virtualization can bring them additional benefits in both security and manageability. An IT department may use a hypervisor as a highly secure way to manage authorized mobile devices, for example for network traffic monitoring, filtering, virus scanning, and OS updating or patching, even when the guest OS is completely dead. We insert Xen beneath the Android mobile OS to deprivilege Android as a guest for security and manageability purposes. However, the usage model of a mobile device is quite different from that of a server: for example, mobile devices run completely different benchmarks (mostly multimedia focused) from those run on servers (mostly responsiveness focused). We analyze the gaps of Xen as a mobile hypervisor and present how we improved its performance.

    Presentation Transcript

    • Performance Optimization on Xen-based Android Device
        • Jack Ren, Xiantao Zhang, Dongxiao Xu
        • Key contributor: Eddie Dong
        • Intel Corporation
    • Legal Disclaimer
        • INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
        • Intel may make changes to specifications and product descriptions at any time, without notice.
        • All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
        • Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
        • Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
        • *Other names and brands may be claimed as the property of others.
        • Copyright © 2012 Intel Corporation.
    • Agenda
        • Overview
        • Design Details
        • Gaps, Analysis & Optimizations
        • Summary
    • Overview
        • Back at Xen Summit 2011 in Seoul: “Mobile virtualization will be more important… Xen has unique advantages there” (“Mobile Virtualization using the Xen Technologies”, Jun Nakajima, Intel).
        • Jun also proposed a Xen-based Android system.
        • [Figure: the proposed Xen-based Android system; not captured in the transcript]
    • Overview (continued)
        • New use case: Android in Dom0, with the hypervisor as the TEE (Trusted Execution Engine)
        • [Figure: Dom0 software stack, top to bottom]
            − Android userland (ring 3): Gallery, VideoPlayer, Browser, …
            − Android framework: Surface Manager, OpenGLES, Dalvik, …
            − Android kernel (ring 1): GFX, Video, PM, …
            − Xen (ring 0): Virtual CPU, Virtual MMU, Virtual IRQ, …
        • But we don’t want to sacrifice too much performance and power.
    • Design Details
        • Android runs almost natively

        Component     Virtualization                                    Performance
        I/O           Pass-through to Android                           Close to native performance
        CPU           vCPUs pinned to physical CPUs                     Eliminates the vCPU scheduling penalty
        MMU           Para-virtualized                                  Good run-time performance
        IRQ           Xen owns, dispatches to Android via event channel Main overhead: ring switch (for example, Quadrant I/O: 21% downgrade)
        FPU           Para-virtualized                                  No vCPU scheduling, very good performance
        CpuIdle       Pass-through to Android                           Completely consistent with Android PM
        CpuFreq       Pass-through to Android                           Same as above
        Standby (S3)  Pass-through to Android                           Same as above

        • Standby (S3) is a little bit tricky…
    • Design Details (continued): re-designed S3
        • Dom0 owns the full suspend/resume logic.
        • Xen assists Dom0 in issuing the real monitor/mwait.
        • S3 resume is 2X faster than native.
        • [Figure: timeline for CPU0 to CPU3. Labels: HYPERVISOR_do_mwait_suspend(), do_mwait_suspend(), HYPERVISOR_vcpu_op(VCPUOP_down), mwait, sleep, wake up, HYPERVISOR_vcpu_op(VCPUOP_up)]
    • Preliminary Power (normalized)
        • More than 90% of benchmarks reach 95% of native power.
        • [Figure: bar chart “Power KPIs”; vertical axis from 80% to 105%]
        • Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Configurations: [describe config + what test used + who did testing]. For more information go to http://www.intel.com/performance
        • But we still identified several gaps…
    • Preliminary Performance (normalized)
        • More than 90% of benchmarks reach 97% of native performance.
        • [Figure: bar chart “Performance KPIs”; vertical axis from 0% to 120%. Benchmark labels as captured: USB MTP write…; USB MTP read large…; CF-Bench (malloc); WLAN download; 3G HSDPA download; H.264 video record; H.264/MPEG-4 AVC…; ColdBoot time to…; GLBenchmark 2.5.1… (two entries); Quadrant IO; Quadrant3D; Quadrant2D; SmartBench2012; BaseMarkES2v1…; BaseMarkES2v1 Taiji; FishIE Tank -200M; Octane; Browsermark; EEMBC BrowsingBench; Sunspider; AnTutu 2.9.4; CPU Int Micro Benchmark…; iSPEC00 - speed; CaffeineMark; Dhrystone - BENC; CoreMark EEMBC; Micro Benchmark…]
        • But we still identified several gaps, and we need some tools to help us…
    • Tools Enabling
        • We enabled a number of tools for performance tuning:
            − VTune: based on the PMU, mainly used to tune Dom0
            − Xentrace: based on the original Xentrace, but revised to count key events and hypercalls
            − Perf: based on the PMU, mainly used to tune Dom0
            − Xenoprof: based on the PMU, mainly used to tune Xen
        • These tools proved very helpful in the later performance and power tuning.
    • Case #1: Quadrant I/O (perf)
        • Gap: 21%
        • Analysis: Storage data is cached in the page cache, which is allocated from high memory (high_memory). Each page cache access therefore needs a kmap/kunmap pair, which leads to a lot of PVMMU hypercalls.
        • Optimizations:
            − Shrink the Xen memory footprint from 168M to 72M
            − Force the page cache to be allocated from low memory
        • Gap reduced to 8.5%
        • Can we continue to optimize and close the remaining 8.5% gap?
    • Case #1: Quadrant I/O (perf) (continued)
        • Profiled by VTune. Within the 8.5% gap: Xen overhead = 134/3138 ~= 4.27%
        • Xen traces:

        type   name               count   cost          cost%
        hcall  mwait_idle_op      3759    37142118744   (not in total)
        hcall  multicall          12147   145492506     32.12%
        hcall  mmu_update         27126   113270256     25.00%
        hcall  mmuext_op          7781    50615724      11.17%
        hcall  vcpu_op            6577    39658986      8.75%
        hcall  event_channel_op   3405    26617650      5.88%
        hcall  xen_version        4937    12374700      2.73%
        event  PAGE_FAULT         9764    11719224      2.59%
        event  IRQ                1119    10178934      2.25%
        hcall  event_channel_op   1259    9081834       2.00%
        hcall  physdev_op         1692    8251512       1.82%
        hcall  event_channel_op   840     7024398       1.55%
        hcall  event_channel_op   761     6150300       1.36%
        event  TIMER_IRQ          472     5745738       1.27%
        hcall  event_channel_op   545     4361118       0.96%
        event  TRAP               1038    1040916       0.23%
        event  PRIVOP             1032    872700        0.19%
        hcall  fpu_taskswitch     1038    439638        0.10%
        hcall  undfined           21      102672        0.02%
        hcall  apic_op            3       5484          0.00%
        total cost                        453004290

        • Within the 4.27%: PVMMU overhead ~= 70.88%
        • It is hard to close the 8.5% gap any further because of the PVMMU overhead.
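The percentages on the Case #1 trace slide can be re-derived from the raw cost column. The sketch below is only a check of that arithmetic, not part of the authors' tooling. It assumes that the PVMMU group for Case #1 is the same one the Case #2 slide names (PAGE_FAULT, mmu_update, multicall, mmuext_op), and that mwait_idle_op is excluded from the total, which is what makes the slide's total add up. The names `CASE1_ROWS`, `PVMMU_NAMES` and `cost_share` are hypothetical. The last figure, PVMMU as a share of the whole run, is a derived number that does not appear on the slides.

```python
# Rows transcribed from the Case #1 Xen trace table: (type, name, count, cost).
# mwait_idle_op is left out because the slide's total does not include it.
CASE1_ROWS = [
    ("hcall", "multicall", 12147, 145492506),
    ("hcall", "mmu_update", 27126, 113270256),
    ("hcall", "mmuext_op", 7781, 50615724),
    ("hcall", "vcpu_op", 6577, 39658986),
    ("hcall", "event_channel_op", 3405, 26617650),
    ("hcall", "xen_version", 4937, 12374700),
    ("event", "PAGE_FAULT", 9764, 11719224),
    ("event", "IRQ", 1119, 10178934),
    ("hcall", "event_channel_op", 1259, 9081834),
    ("hcall", "physdev_op", 1692, 8251512),
    ("hcall", "event_channel_op", 840, 7024398),
    ("hcall", "event_channel_op", 761, 6150300),
    ("event", "TIMER_IRQ", 472, 5745738),
    ("hcall", "event_channel_op", 545, 4361118),
    ("event", "TRAP", 1038, 1040916),
    ("event", "PRIVOP", 1032, 872700),
    ("hcall", "fpu_taskswitch", 1038, 439638),
    ("hcall", "undfined", 21, 102672),
    ("hcall", "apic_op", 3, 5484),
]

# Assumption: same PVMMU grouping as the one spelled out on the Case #2 slide.
PVMMU_NAMES = {"PAGE_FAULT", "mmu_update", "multicall", "mmuext_op"}


def cost_share(rows, names):
    """Return (selected cost, total cost, selected / total) for the given names."""
    total = sum(cost for _, _, _, cost in rows)
    selected = sum(cost for _, name, _, cost in rows if name in names)
    return selected, total, selected / total


pvmmu_cost, total_cost, pvmmu_share = cost_share(CASE1_ROWS, PVMMU_NAMES)
xen_overhead = 134 / 3138                      # Xen share of the run, from VTune
pvmmu_of_runtime = xen_overhead * pvmmu_share  # derived, not on the slides

print(f"total cost        : {total_cost}")
print(f"PVMMU share of Xen: {pvmmu_share:.2%}")
print(f"Xen share of run  : {xen_overhead:.2%}")
print(f"PVMMU share of run: {pvmmu_of_runtime:.2%}")
```

The same `cost_share` helper can be pointed at any other set of trace rows to see how much of the Xen time a chosen group of hypercalls and events accounts for.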
    • Case #2: Home Screen Scroll (power)
        • Gap: 1.2%
        • Profiled by VTune. Xen overhead = 30/3176 ~= 1%
        • Xen traces:

        type   name               count   cost          cost%
        event  IRQ                1843    18323532      7.04%
        event  TRAP               88      131352        0.05%
        event  PAGE_FAULT         943     3237852       1.24%
        event  PRIVOP             1385    533748        0.21%
        event  TIMER_IRQ          144     2062704       0.79%
        hcall  mmu_update         990     8866296       3.41%
        hcall  fpu_taskswitch     95      66816         0.03%
        hcall  multicall          8736    109199952     41.96%
        hcall  xen_version        3914    10860348      4.17%
        hcall  vcpu_op            9694    55009236      21.13%
        hcall  mmuext_op          3858    34409052      13.22%
        hcall  event_channel_op   1188    10105920      3.88%
        hcall  physdev_op         1078    7469256       2.87%
        hcall  mwait_idle_op      3938    23493503868   (not in total)
        total cost                        260276064
        cost of PAGE_FAULT, mmu_update, multicall, mmuext_op   155713152   59.83%

        • Within the 1%: PVMMU overhead = 59.83%
        • PVMMU overhead again…
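The Case #2 trace table gives both a count and a total cost per entry, so dividing one by the other shows which hypercalls and events are expensive per occurrence rather than in aggregate. This is a hedged sketch of that reading, not an analysis the slides perform. The slides do not state the unit of the cost column, so the averages are only meaningful relative to each other. `CASE2_ROWS` and `average_cost` are hypothetical names, and mwait_idle_op is omitted because the slide keeps it out of the total.

```python
# Rows transcribed from the Case #2 Xen trace table: (type, name, count, cost).
# The unit of "cost" is not given on the slide; treat the values as relative.
CASE2_ROWS = [
    ("event", "IRQ", 1843, 18323532),
    ("event", "TRAP", 88, 131352),
    ("event", "PAGE_FAULT", 943, 3237852),
    ("event", "PRIVOP", 1385, 533748),
    ("event", "TIMER_IRQ", 144, 2062704),
    ("hcall", "mmu_update", 990, 8866296),
    ("hcall", "fpu_taskswitch", 95, 66816),
    ("hcall", "multicall", 8736, 109199952),
    ("hcall", "xen_version", 3914, 10860348),
    ("hcall", "vcpu_op", 9694, 55009236),
    ("hcall", "mmuext_op", 3858, 34409052),
    ("hcall", "event_channel_op", 1188, 10105920),
    ("hcall", "physdev_op", 1078, 7469256),
]


def average_cost(rows):
    """Return (name, cost / count) pairs, most expensive per occurrence first."""
    averages = [(name, cost / count) for _, name, count, cost in rows]
    return sorted(averages, key=lambda pair: pair[1], reverse=True)


ranked = average_cost(CASE2_ROWS)
for name, avg in ranked:
    print(f"{name:<18}{avg:>10.1f}")
```

Read alongside the cost% column, a ranking like this separates entries that dominate because they are called often from entries that dominate because each call is slow, which is relevant to the later remark that some gaps can be narrowed by reducing the number of hypercalls.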
    • Other Gaps
        • Other cases show similar Xen overheads:
            − PVMMU
            − TLS/stack switching
        • Some cases can be optimized by changing the guest so that it issues fewer hypercalls.
            − For example, Quadrant I/O
        • Other cases are hard to optimize because of the PV overhead itself.
            − For example, CF-Bench malloc
        • These could be fixed by an HVM Dom0.
    • Summary
        • Dom0 Android achieved near-native power and performance.
        • We still found some power and performance gaps caused by PVOPS:
            − PVMMU
            − TLS/stack switching
        • Those gaps could be fixed by an HVM Dom0.
    • Q&A
        • Questions?
        • Or contact Jack Ren <jack.ren@intel.com> or Xiantao Zhang <xiantao.zhang@intel.com>