XPDS13: Performance Evaluation of Live Migration based on Xen ARM PVH - Jaeyong Yoo, Samsung
Electricity charges for operating data centers now account for approximately 27% of total operating cost. For this reason, ARM servers have been attracting attention for future energy-efficient data centers, and the performance of ARM processors keeps increasing (approaching 3GHz). To utilize ARM cores efficiently, ARM PVH was introduced in Xen 4.3; building on it, we implemented a live migration feature and evaluated it on a dual-core ARM board. Specifically, we chose a multimedia streaming workload, measured the maximum number of concurrent clients, and calculated clients per watt (CPW) as the performance metric. We found that even a dual-core ARM processor with virtualization gives a higher CPW (7 CPW) than the x86 case (6 CPW). In addition, server consolidation (4-to-1 for low-loaded servers) reduced energy consumption by around 70%.


Presentation Transcript

  • Performance Evaluation of Live Migration based on Xen ARM PVH for Energy-efficient ARM Server. 2013-10-24. Jaeyong Yoo, Sangdok Mo, Sung-Min Lee, ChanJu Park, Ivan Bludov, Nikolay Martyanov. Software R&D Center, Samsung Electronics
  • Contents
    – Motivation
    – Live Migration in Xen ARM PVH: Design and Implementation
    – Performance Evaluation
      1. Streaming service with ARM vs. x86
      2. Streaming server consolidation with live migration
      3. Streaming service with quad-core ARM board
    – Concluding Remarks
  • Motivation
  • Energy Problem in Datacenters
    – Datacenters consume an enormous amount of electricity.
    – Datacenter operation cost breakdown: Electricity 27%, Engineering & Installation 19%, Space 17%, Power Equipment 17%, Service 13%, Cooling Equipment 6%, Racks 3%.
    – Ref: Jaroslav Rajić, ``Evolving Toward the Green Data Center,'' http://stack.nil.si/ipcorner/GreenDC/#chapter2
  • ARM Servers for Future Green Data Center
    – Economical choice: significant advantage in compute/watt
    – Vendors of ARM server SoCs: AMD Seattle (64-bit ARM server processor, 2H 2014), Calxeda ECX-1000, Applied Micro X-Gene
    – OS for ARM servers: Linaro LEG; Red Hat deploys ARM-based servers for the Fedora project
  • ARM Servers for Future Green Data Center (cont.)
    – Further energy-efficiency maximization: server consolidation by virtualization
  • Design and Implementation of Live Migration in Xen ARM PVH
  • Overall Architecture: Components for Live Migration in Xen ARM PVH
    – Toolstack (Dom0): libvirt, xl, libxl, libxc, and a new ARM-migrate module that performs the migration
    – DomU workload: apache, mysql, streaming server
    – Hypervisor: dirty-page detection, memory data and memory map save/restore, VCPU save/restore, HVM context save/restore, suspend/resume
    – Hardware: Arndale board (Cortex-A15 dual-core 1.7GHz, 2GB memory, SATA3, USB 3.0)
    – The original diagram's legend distinguishes existing, modified, and newly implemented modules
  • Sequence of Live Migration (source to destination)
    1. Get/set the memory map of the migrating DomU
    2. Start dirty-page tracing on the source
    3. Loop until the stop-condition: get the dirty bitmap and save/restore the dirtied memory contents
    4. Suspend the DomU and transfer the last dirty pages
    5. Save/restore the HVM context
    6. Save/restore the VCPU context
    7. Resume the DomU on the destination
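The pre-copy sequence above can be sketched as a small Python model. This is illustrative only: `FakeDomain`, `send_pages`, and the method names are hypothetical placeholders standing in for the libxc/ARM-migrate operations named in the slides, not the actual Xen API.

```python
class FakeDomain:
    """Toy stand-in for a migrating DomU: each round it dirties half as
    many pages as the round before, so the pre-copy loop converges."""
    def __init__(self, n_pages):
        self.n_pages = n_pages
        self.suspended = False
        self.dirty = set()

    def all_pages(self):
        return set(range(self.n_pages))

    def start_dirty_tracing(self):          # clear write bits in the p2m
        self.dirty = set(range(self.n_pages // 2))

    def get_dirty_bitmap(self):             # peek-and-clear semantics
        d = self.dirty
        self.dirty = set() if self.suspended else set(range(len(d) // 2))
        return d

    def suspend(self):
        self.suspended = True

    def save_hvm_context(self):             # timer, interrupt controller
        pass

    def save_vcpu_context(self):            # VCPU registers
        pass


def live_migrate(domain, send_pages, max_iter=29, min_dirty_per_iter=50):
    """Sketch of the pre-copy loop: full copy, iterate over dirty pages
    until the stop-condition, then stop-and-copy the remainder."""
    send_pages(domain.all_pages())          # initial full memory copy
    domain.start_dirty_tracing()
    for _ in range(max_iter):               # loop until stop-condition
        dirty = domain.get_dirty_bitmap()
        send_pages(dirty)                   # re-send pages dirtied meanwhile
        if len(dirty) < min_dirty_per_iter:
            break
    domain.suspend()                        # stop-and-copy phase begins
    send_pages(domain.get_dirty_bitmap())   # last dirty pages
    domain.save_hvm_context()
    domain.save_vcpu_context()
    # the destination then restores both contexts and resumes the DomU
```

With the toy domain, each iteration sends half as many pages as the last, illustrating why pre-copy converges for workloads that dirty memory slower than the network can drain it.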
  • Major Hypercalls for Live Migration: hypercalls implemented to enable the live migration feature in Xen ARM PVH
    – Memory migration:
      XENMEM_get/set_memory_map: save/restore the physical memory map of the DomU
      XEN_DOMCTL_shadow_op: enable dirty-page detection; get the dirty-page bitmap
      XENMEM_add_to_physmap_range: access the DomU's memory from Dom0
    – VCPU migration:
      XEN_DOMCTL_get/setvcpucontext: save/restore the VCPU registers
    – HVM migration:
      XEN_DOMCTL_get/sethvmcontext: save/restore the HVM contexts (e.g., timer, interrupt controller)
  • Dirty-page Tracing: Get-dirty Bitmap
    – The toolstack (libxc / ARM-migrate) issues the XEN_DOMCTL_shadow_op (peek dirty pages) hypercall, passing a dirty-page bitmap as a parameter; the hypervisor fills it from its temporary dirty-page store.
    – Candidates for storing dirty pages in the hypervisor: 1. embedded in the page table (unused bits in the PTE), 2. linked list of PFNs, 3. bitmap of PFNs
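Of the three storage candidates, option 3 (a bitmap of PFNs) can be sketched as below. This is an illustrative model of the peek-and-clear semantics, not the actual Xen data structure:

```python
class DirtyBitmap:
    """One bit per guest page frame number (PFN). The shadow_op 'peek'
    operation copies the set bits out to the toolstack and clears them,
    so each peek returns only pages dirtied since the previous peek."""
    def __init__(self, n_pfns):
        self.bits = bytearray((n_pfns + 7) // 8)

    def mark_dirty(self, pfn):
        # called from the hypervisor's write-fault path
        self.bits[pfn >> 3] |= 1 << (pfn & 7)

    def peek_and_clear(self):
        # models XEN_DOMCTL_shadow_op (peek dirty pages)
        dirty = [i * 8 + b for i, byte in enumerate(self.bits)
                 for b in range(8) if byte & (1 << b)]
        self.bits = bytearray(len(self.bits))   # reset for next iteration
        return dirty
```

Compared with a linked list of PFNs, a bitmap has fixed size regardless of how many pages are dirtied, at the cost of scanning the whole bitmap on every peek.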
  • Dirty-page Tracing: Dirty-page Detection
    – Translation is two-stage: a guest VA is translated by the guest's 3-level page table into an IPA, and the IPA is translated by Xen's p2m (physical-to-machine) 3-level page table into an MA. Xen maintains separate page tables for itself and for each DomU.
    – To detect dirty pages, Xen clears the write bit (w=0) in the p2m PTEs. A guest write request then faults, the fault is trapped by Xen, the page is recorded as dirty, and write permission is restored.
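The trap-and-record mechanism can be modeled roughly as follows. The names and the dict-based p2m are hypothetical simplifications; the real handler lives in Xen's permission-fault path:

```python
PAGE_SHIFT = 12   # 4KB pages

def handle_permission_fault(p2m, dirty_log, ipa):
    """Sketch: a guest write to a p2m entry with the write bit cleared
    traps into Xen. Xen records the page as dirty, restores the write
    bit, and lets the guest retry the store, which now succeeds."""
    pfn = ipa >> PAGE_SHIFT
    entry = p2m[pfn]
    if not entry["writable"]:
        dirty_log.add(pfn)          # record the dirtied page
        entry["writable"] = True    # restore write permission
        return "retry"              # guest re-executes the faulting store
    return "pass"                   # unrelated fault; not a dirty-log trap
```

Note the cost model this implies: each page pays one trap per pre-copy round, after which writes to it are full speed until the write bit is cleared again.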
  • Implementation Choice
    – Manual walking of the p2m table
    – Virtual-linear page table
  • Manual Walking of the p2m Table
    – Walk from the IPA through the level-1 and level-2 tables (with superpage checking) down to the level-3 PTE, then modify its w bit.
    – Each level must first be mapped into Xen's address space, i.e., three mappings into physical (a.k.a. machine) memory per walk, which makes this approach expensive.
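The manual walk amounts to a 3-level lookup. A sketch, using dicts as page tables and illustrative 1GB/2MB/4KB granules in place of the real ARM LPAE descriptors:

```python
def p2m_walk(l1_table, ipa):
    """Walk a 3-level p2m from an IPA down to its level-3 PTE, stopping
    early at a 2MB superpage entry. In the real implementation each
    level's table must first be mapped into Xen's address space, which
    is the per-walk overhead the virtual-linear page table avoids."""
    l1 = l1_table[(ipa >> 30) & 0x1FF]        # level-1 index (1GB granule)
    l2 = l1[(ipa >> 21) & 0x1FF]              # level-2 index (2MB granule)
    if l2.get("superpage"):                   # superpage check: no level 3
        return l2
    return l2["table"][(ipa >> 12) & 0x1FF]   # level-3 PTE (4KB page)
```

Clearing or restoring the write bit is then a field update on the returned PTE.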
  • Virtual-linear Page Table
    – Map the guest's third-level page tables as a virtually contiguous memory block in Xen's virtual address space (an 8GB DomU requires 16MB of third-level page tables).
    – For a given IPA, the Xen VA of the corresponding PTE can then be computed with simple arithmetic and read directly, with no table walk.
    – ref: http://www.technovelty.org/linux/virtual-linear-page-table.html
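The "simple arithmetic" is a linear index computation: with the third-level tables contiguous in Xen's VA space, the PTE address is a multiply-add on the IPA. A sketch (`VLPT_BASE` is a hypothetical mapping address; 8 bytes is the 64-bit LPAE descriptor size):

```python
PAGE_SHIFT = 12          # 4KB pages
PTE_SIZE = 8             # LPAE descriptors are 64-bit
VLPT_BASE = 0x2000000    # hypothetical Xen VA where the VLPT is mapped

def vlpt_pte_va(ipa):
    """Xen VA of the level-3 PTE covering `ipa`: one shift and one
    multiply-add, replacing the 3-level walk and its three mappings."""
    return VLPT_BASE + (ipa >> PAGE_SHIFT) * PTE_SIZE
```

The slide's sizing also falls out of this arithmetic: 8GB of guest memory is 2M pages, and 2M descriptors of 8 bytes each is exactly 16MB of third-level tables.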
  • Evaluation
  • Experiment Environment (Hardware/Software)
    – ARM hardware: Arndale board, 2 cores, 1Gbps NIC (USB 3.0), mSATA SSD, 2GB memory
    – x86 hardware: 8 cores (i7-2600 3.4GHz), Intel 1Gbps NIC, 4GB memory
    – Xen source: Xen 4.4 staging; domain kernels: Dom0 Linaro kernel 3.11, DomU Linaro kernel 3.9
    – Streaming server: ffserver (RTSP streaming)
    – Setup: clients connect to the streaming servers through a 1G switch; power is measured at the source (220V) with a Yokogawa WT3000 power meter
    – Note: the major evaluations are performed on a mobile-featured ARM board; the performance evaluation of a server-featured ARM board is presented at the end of the slides.
  • Experiment Environment (Scenarios)
    – Test case 1: Streaming service with ARM vs. x86. Saturate the streaming server to get the maximum number of streaming clients. Measurements: maximum number of streaming clients for each test platform; energy-efficiency comparison for each test platform.
    – Test case 2: Streaming server consolidation with live migration. Load each server to 10% of its maximum number of streaming clients, then consolidate within Xen-virtualized servers. Measurements: energy-efficiency comparison; total live migration time and service downtime; dirty-page detection time, dirty-page get-bitmap time, and total dirty-page counts.
    – Test case 3: Streaming with a quad-core ARM board (in progress). Measurement: maximum clients with varying number of ARM cores.
  • Case 1: Streaming Service, ARM vs. x86 (Maximum Capacity of an ARM Virtualized Server)
    – Max streaming clients with varying number of VMs (dual-core ARM board, single VCPU per VM):
      1 VM,  512MB per VM: ~110 clients, 14.8 W
      2 VMs, 512MB per VM: ~80 clients,  12.6 W
      3 VMs, 256MB per VM: ~90 clients,  14.5 W
      4 VMs, 256MB per VM: ~80 clients,  11.8 W
    – Finding: the ARM cores are the major bottleneck
  • Case 1: Streaming Service, ARM vs. x86 (Energy-efficiency Comparison to x86 Hardware)
    – Compare with the best case of dual-core ARM virtualization:
      x86 with Linux:          4GB total memory,   ~750 clients, 121.5 W, 6.17 CPW,  ~2.4GB required memory
      ARM with native Linux:   2GB total memory,   ~200 clients, 11.7 W,  17.09 CPW, ~707MB required memory
      ARM with virtualization: 512MB total memory, ~110 clients, 14.8 W,  7.43 CPW,  ~340MB required memory
    – Finding: even a dual-core ARM with virtualization shows higher CPW than x86
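The clients-per-watt metric in the table is simply the maximum number of concurrent clients divided by the measured power draw:

```python
def clients_per_watt(max_clients, watts):
    """CPW metric from the slides: concurrent streaming clients
    sustained per watt of measured power draw."""
    return round(max_clients / watts, 2)
```

Applied to the table's rows this reproduces the reported 6.17, 7.43, and 17.09 CPW figures.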
  • Case 2: Streaming Server Consolidation of ARM Virtualized Servers
    – Scenario: 4 ARM boards, each running a 256MB VM with 10 clients; consolidate all VMs onto one ARM board and turn off the other 3 boards.
      2-to-1 consolidation: 2 x 8W = 16W before, 8.6W after, 46% saving [extrapolated]
      3-to-1 consolidation: 3 x 8W = 24W before, 8.9W after, 63% saving [extrapolated]
      4-to-1 consolidation: 4 x 8W = 32W before, 9.4W after, 71% saving
    – Finding: server consolidation can significantly reduce energy consumption
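The saving percentages follow directly from the measured watts:

```python
def consolidation_saving(n_boards, watts_per_board, watts_after):
    """Percentage of energy saved by consolidating n boards' VMs onto
    one board and powering the remaining boards off."""
    watts_before = n_boards * watts_per_board
    return round((1 - watts_after / watts_before) * 100)
```

The consolidated board draws slightly more than one idle board (8.6W to 9.4W vs. 8W) because it now runs all the VMs, which is why the saving is 71% rather than a full 75% in the 4-to-1 case.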
  • Case 2: Live Migration Performance
    – Migrate one VM at a time, with different DomU memory sizes (128MB, 256MB, 512MB)
    – Measurements:
      Live migration time: the whole time taken by live migration
      Total dirty pages: the number of pages dirtied during live migration
  • Case 2: Live Migration Performance
    – Number of dirty pages per iteration; configuration for the stop-condition: max_iter: 29, max_mem_factor: 3, min_dirty_per_iter: 50
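A sketch of how these three parameters plausibly combine into a stop-condition (the exact semantics are assumed from the parameter names; this mirrors the behaviour of Xen's x86 pre-copy loop rather than quoting the ARM implementation):

```python
def should_stop(iteration, dirty_this_iter, pages_sent_total, domain_pages,
                max_iter=29, max_mem_factor=3, min_dirty_per_iter=50):
    """Pre-copy ends (and stop-and-copy begins) when any limit is hit:
    too many iterations, the dirty rate has converged, or we have
    already resent more than max_mem_factor times the domain's memory."""
    return (iteration >= max_iter
            or dirty_this_iter < min_dirty_per_iter
            or pages_sent_total > max_mem_factor * domain_pages)
```

The memory-factor bound is the safety valve for workloads that dirty pages faster than the network drains them; without it the loop would never converge.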
  • Case 2: Service Downtime due to Live Migration
    – Service downtime: the time during which the VM does not respond to outside interaction
    – Measurement method: flood-ping the migrating domain and take the time difference between packets sent from it
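Given the arrival times of the flood-ping replies, the downtime is the largest gap between consecutive packets, since the domain answers no pings while suspended:

```python
def downtime_from_pings(timestamps):
    """Estimate service downtime as the maximum inter-arrival gap
    between consecutive flood-ping replies (same time unit as input)."""
    return max(b - a for a, b in zip(timestamps, timestamps[1:]))
```

For example, replies at 0, 10, 20, 520, and 530 ms imply the domain was unresponsive for about 500 ms during stop-and-copy.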
  • Case 2: Performance of Dirty-page Detection
    – Measure the elapsed time of the two major functions: dirty-page detection and dirty-page collection
  • Case 3: Quad-core ARM Board (In Progress)
    – ARM board: 4 ARM cores with 8GB memory
      1 VM,  1GB per VM: ~120 clients, 17.0 W, 7.06 CPW
      2 VMs, 1GB per VM: ~250 clients, 18.5 W, 13.51 CPW
      3 VMs, 1GB per VM: ~300 clients, 18.9 W, 15.87 CPW
    – x86 case (see slide 24): x86 with Linux, 4GB total memory, ~750 clients, 121.5 W, 6.17 CPW
  • Concluding Remarks
    – The ARM server is a good candidate for green data centers: even ARM mobile processors with virtualization achieve better CPW than x86, and virtualization on ARM servers can further leverage energy efficiency through server consolidation.
    – Device pass-through to DomU could significantly increase performance.