Software-Defined Migration: How to migrate a bunch of VMs and volumes within a limited time frame

Kentaro Matsumoto, KDDI Corporation, Hyde Sugiyama, Red Hat, Inc

As a telecom carrier, KDDI manages thousands of physical servers running various kinds of workloads. Operating such a large environment, we frequently have to shut down servers for maintenance, but negotiating downtime with our tenant users is not easy. To make this easier, we are developing a mechanism called "Zone Migration", built on the framework of the OpenStack project "Watcher". "Zone Migration" makes it possible to migrate tenants’ workloads from the compute nodes and storage devices we want to maintain (source zone) to new blank ones (destination zone) efficiently, automatically, and with minimum downtime.

Zone Migration is designed to meet the following requirements:

- A large number of VMs and volumes should be migrated within a limited time frame

- Operations should be automated, but manual control should remain possible

- The time and load of migration should be kept under control so that tenants’ systems are not affected

We are carrying out the project in cooperation with NEC and Red Hat, and are developing this mechanism on Red Hat OpenStack Platform.
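One way to keep migration load bounded, as the requirements above ask, is to cap how many migrations run in parallel. The PoC slide in the deck describes waves of at most 5 volumes, each followed by the VMs attached to those volumes. The following is a minimal planning sketch of that idea, not the Watcher strategy itself. The function name `plan_waves`, the step labels, and the assumption of exactly one zone-dedicated volume per VM are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical step labels; they are not Watcher action names.
Step = Tuple[str, List[str]]


def plan_waves(pairs: List[Tuple[str, str]], max_parallel: int = 5) -> List[Step]:
    """Build an ordered list of migration steps from (vm, volume) pairs.

    Assumption for this sketch: every VM has exactly one zone-dedicated
    volume. Each wave migrates up to `max_parallel` volumes first, then
    the VMs those volumes are attached to.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")

    steps: List[Step] = []
    for start in range(0, len(pairs), max_parallel):
        wave = pairs[start:start + max_parallel]
        steps.append(("migrate_volumes", [volume for _, volume in wave]))
        steps.append(("migrate_vms", [vm for vm, _ in wave]))
    return steps


if __name__ == "__main__":
    # 30 VMs with one additional volume each, as in the PoC test scenario.
    demo_pairs = [(f"vm{i}", f"vol{i}") for i in range(1, 31)]
    for action, names in plan_waves(demo_pairs):
        print(action, names)
```

For 30 pairs and a cap of 5, this yields 6 waves, that is 12 steps. Lowering `max_parallel` trades total migration time for less load on the zones at any moment.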

Published in: Software

  1. Software-Defined Migration: How to migrate a bunch of VMs and volumes within a limited time frame. Kentaro Matsumoto (KDDI), Takashi Torii (NEC), Hyde Sugiyama (Red Hat)
  2. Agenda: 1. Background 2. Concept & Use case 3. PoC environment and result 4. Contribution plan 5. Discussion – further NFV use case
  3. About KDDI
      ● Company Name: KDDI CORPORATION
      ● Date of Establishment: June 1, 1984
      ● Main Business: Telecommunications Business
      ● President: Takashi TANAKA
      ● Capital: US$ 1.42 B
      ● Revenue: US$ 44.66 B *
      ● Operating Income: US$ 8.3 B *
      ● Total Employees: 28,172 *
      * Consolidated base. As of March, 2016. 1 USD = 100 JPY.
      Fortune Global Company in the World
  4. About KDDI
      ● Experience of global business: 60+ years
      ● Global network coverage: 190+ countries
      ● Number of TELEHOUSE data centers: 45 sites
      ● Total number of employees in Japan: 21,000+
      ● Total number of group companies: 183 companies
      ● Headquarters: Tokyo
      ● Number of offices around the world: 107 offices
      Photos: TELEHOUSE Shanghai, China; submarine cable ship KDDI Pacific Link; TELEHOUSE New York, U.S.; Global Network Operation Center, London; KDDI satellite parabolic antenna
  5. Background: Huge operation and maintenance cost
      Diagram: compute cluster (compute1, compute2, compute3, compute4, compute5, compute6, shared spare) hosting vm1, vm2a, vm2b, vm3, vm4, vm5
      • Various issues at computes (BIOS update, security patch for KVM …)
      • Maintenance is done one by one, sequentially, at midnight
      • Hard to negotiate with tenants (owners of VMs)
      • Thousands of these clusters
      Patch procedure: 1. Migrate vm1 to spare compute 2. Apply patch 3. Migrate vm1 to original compute 4. Repeat process
  6. Concept: Migrate all resources of a zone at once
      Diagram: OpenStack IaaS environment with Service Zone, Maintenance Zone, and H/W pool
      1. Add computes as Maintenance Zone from H/W pool 2. Migrate all resources (VMs/Volumes) at once to Maintenance Zone 3. H/W maintenance (patch) of each compute
      • Migrate VMs and volumes on hundreds of physical servers and storages
      • Develop this structure based on OpenStack technologies
      • “nova livemigration” for running VMs
      • “nova migration” for stopped VMs
      • “nova volume update” for VM-attached volumes
      • “cinder migration” for VM-detached volumes
      • “watcher” for migration scheduling
  7. Use Case: Use case of KDDI PoC environment
      Diagram: compute1 to compute6 split between Zone#1 and Zone#2, with Storage for Zone#1, Storage for Zone#2, and Shared Storage. vm1 uses Vol#1 (vm1 /vda) and Vol#3 (vm1 /vdb); vm2 uses Vol#2 (vm2 /vda) and Vol#4 (vm2 /vdb); Vol#5 is detached.
      Before: Zone#1 is Service Zone#1, Zone#2 is the Maintenance Zone
      1. Integrate Zone#2 as Service Zone#1 (Zone#1 + Zone#2: Service Zone#1) 2. Migrate VMs 3. Migrate volumes 4. Divide zone again
      After: Zone#1 is the Maintenance Zone, Zone#2 is Service Zone#1
      Preconditions
      • Don’t use ephemeral disks
      • System volumes of VMs (/dev/vda) are stored in shared storage
      • Additional volumes of VMs (/dev/vdb) are stored in zone-dedicated storage
  8. PoC Environment and Result: Migration of large amounts of VMs and volumes
      Diagram: Zone#1 (Service Zone#1) and Zone#2 (Maintenance Zone) across compute1 to compute6, with Storage for Zone#1 (AFA), Storage for Zone#2 (AFA), and Shared Storage (ceph)
      Environment: Red Hat OpenStack Platform 9 (mitaka); 16 core / 64 GB mem per compute; 10 VMs per compute; 30 system volumes (50 GB each); 30 additional volumes (5 GB each)
      VM sizes (3 sizes):
        Size  vCPU  MEM  # of VMs
        S     1     2    12
        M     2     4    12
        L     4     8    6
      ■ Test scenario
      • Migrate 30 VMs and 30 volumes
      • 20 MB/sec load in each VM by “stress”
      • 500 IOPS load in each VM by “vdbench”
  9. PoC Environment and Result: Migration of large amounts of VMs and volumes
      Diagram: Zone#1 (Service Zone#1) and Zone#2 (Maintenance Zone) across compute1 to compute6, with Storage for Zone#1 (AFA), Storage for Zone#2 (AFA), and Shared Storage (ceph)
      1. Migrate max 5 volumes in parallel 2. Migrate max 5 VMs attaching migrated volumes 3. Migrate next 5 volumes 4. Migrate next 5 VMs 5. Repeat process
      ■ Result
      • 42 minutes for the whole process
      • No ping failure to VMs during migration
  10. PoC Environment and Result: Developing GUI for monitoring status
  11. Watcher Contribution plan: Basic features will be implemented in Pike. Additional strategy and efficacy will be in Queens.
      Columns: Item / Blueprint / Target / Detail
      • Framework / Automatic-Triggering-Audit / Done (Ocata) / Triggering action plan automatically
      • Framework / Cancel-Action-Plan / Pike-2 / Add support to cancel execution of action plan
      • Framework / Suspended-Audit / Done (Pike-1) / Add suspended audit state for continuous audit
      • Framework / Multi-Data-Source / Pike-3 / Handle multiple datasources independently from the strategy
      • Framework / Multi-Global-Efficacy-Indicator / Queens / Supports multiple global efficacy indicators
      • Data Model Plugin / Cinder-Model-Integration / Pike-3 / Integrate storage (cinder) information in the model
      • Action Plugin / Volume-Migration / Pike / Implements volume migrate action
      • Strategy Plugin / Volume-Migration-Strategy / Queens / Implementing migration strategy
  12. Watcher Block Diagram
      Diagram labels: API, Decision Engine, Strategy, Applier, Action, CinderCLI, DataModel, Data source, Workflow, Nova, Glance, Ceilometer
      Planned additions marked on the diagram: Volume Migration Action; Cinder Model Integration; Volume Migration Strategy; Auto-Trigger/Suspended/Cancel; Multi data source; Multi-Global-Efficacy-Indicator
  13. Discussion – NFV use case: Service Availability Classification Levels (ETSI GS NFV-REL 001 V1.1.1)
      Columns: SAL Type / Customer Type / Service/Function
      • Level 1 / Network Operation Control Traffic; Government/Regulatory Emergency Services / Intra-carrier engineering traffic; Emergency telecommunication service (emergency response, emergency dispatch); Critical Network Infrastructure Functions (e.g. VoLTE functions, DNS servers, etc.)
      • Level 2 / Enterprise and/or large scale customers (e.g. Corporations, University) / Real-time traffic (voice and video); Network Infrastructure Functions supporting Level 2 services (e.g. VPN servers, corporate Web/Mail servers)
      • Level 3 / General Consumer Public and ISP Traffic / Data traffic (including voice and video traffic provided by OTT); Network Infrastructure Functions supporting Level 3 services
      Annotation: Zone migration target use case for planned outage
  14. Q & A. Thank you 谢谢
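The Concept slide pairs each kind of resource with a migration operation depending on its state. A small lookup can sketch that mapping. The function name `choose_operation` and the `kind`/`state` strings are hypothetical, and the returned values are the labels exactly as they appear on the slide, not verified CLI syntax.

```python
# Labels copied verbatim from the Concept slide; treat them as names of
# operations, not as exact command lines.
OPERATIONS = {
    ("vm", "running"): "nova livemigration",
    ("vm", "stopped"): "nova migration",
    ("volume", "attached"): "nova volume update",
    ("volume", "detached"): "cinder migration",
}


def choose_operation(kind: str, state: str) -> str:
    """Return the slide's operation label for a resource kind and state.

    `kind` is "vm" or "volume"; `state` is "running"/"stopped" for VMs and
    "attached"/"detached" for volumes (hypothetical vocabulary for this sketch).
    """
    try:
        return OPERATIONS[(kind, state)]
    except KeyError:
        raise ValueError(f"no operation defined for {kind!r} in state {state!r}") from None


if __name__ == "__main__":
    # Resources taken from the Use Case slide: vm1 with an attached volume,
    # and Vol#5, which is detached.
    print(choose_operation("vm", "running"))
    print(choose_operation("volume", "attached"))
    print(choose_operation("volume", "detached"))
```

Keeping the mapping in a table rather than in branching code makes it easy to review against the slide and to extend if further resource states need handling.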