
Design and Operation of OpenStack Cloud on 100 Physical Servers - OpenStack Summit 2014 Paris Presentation


Title: Design and Operation of OpenStack Cloud on 100 Physical Servers - OpenStack Summit 2014 Paris Presentation
Date: Tuesday, November 4 • 14:00 - 14:40
Speakers: Hiromichi Itou, Ken Igarashi, Akihiro Motoki

Agenda:
http://openstacksummitnovember2014paris.sched.org/event/142815bb3b425e4d6374e34bc81d871e
Video:
https://www.openstack.org/summit/openstack-paris-summit-2014/session-videos/presentation/design-and-operation-of-openstack-cloud-on-100-physical-servers-ntt-docomo

You will face many problems when you start designing your OpenStack cloud because full design architecture information is hard to come by. For example, there are many Neutron plugins, but it is difficult to choose the best plugin and configuration to get high throughput for a Virtual Machine (VM) and to achieve High Availability (HA) of the L3 Agent. It is also hard to find information on how much computing resource (CPU, memory, and HDD) is required for management and operation servers (e.g. API, RabbitMQ, MySQL, monitoring, etc.).

We built an OpenStack Icehouse cloud on 100 physical servers (1,600 physical cores) without using commercial software, and ran several performance and long-run tests to address these problems.

In this talk, we will present a performance comparison of Neutron ML2 plugin implementations (Open vSwitch and Linux Bridge), tunneling protocols (GRE and VXLAN), and physical network configurations (Network Interface Bonding and Server-Side Equal Cost Multi Path) for achieving 10Gbps at a VM, together with the L3 Agent HA we implemented. We will also present how much computing resource we used and the load on each server while operating the cloud. Finally, we will share our Ansible-based OpenStack deployment and management tool.

Key topics include:
- Performance comparison of OSS Neutron ML2 plugins (Open vSwitch and Linux Bridge) and tunneling protocols (GRE and VXLAN)
- Performance comparison of redundant network configurations (Network Interface Bonding and Server-Side Equal Cost Multi Path)
- HA of L3 Agent (ACT/STBY) we implemented
- Ansible based deployment/operation tools
- Items to monitor during OpenStack operation
- Hardware specifications and resources we used to operate the Cloud

We will share a full design architecture and hardware sizing information for a large-scale cloud and prove that OSS-based Neutron can handle a hundred servers.



  1. 1. Copyright©2014  NTT  DOCOMO,  INC.  All  rights  reserved. Design and Operation of OpenStack Cloud on 100 Physical Servers NTT DOCOMO Inc. Ken Igarashi Virtualtech Japan Inc. Hiromichi Ito NEC Akihiro Motoki
  2. 2. DOCOMO, INC All Rights Reserved Ken Igarashi ○  Leading the OpenStack project at NTT DOCOMO ○  One of the first proposers of OpenStack Bare Metal Provisioning (now called "Ironic") - bit.ly/1stuN2E Hiromichi Ito ○  CTO of Virtualtech Japan Inc. Akihiro Motoki ○  Senior Research Engineer, NEC ○  Core developer of Neutron and Horizon. About Us 2
  3. 3. DOCOMO, INC All Rights Reserved ○  Information required Ø  Hardware resources/performance –  Management resources –  User resources ü Nova, Cinder – depends on individual needs Ø  Hardware/Software configuration –  High Availability –  Network configuration (e.g. Neutron) Ø  Deployment tool –  JuJu/MaaS, Fuel, Helion, RDO etc. ○  How we got it Ø  Ran simulations using 100 physical hosts –  Total 3200 vCPU, 12.8TB memory –  Collaboration with: National Institute of Information and Communications Technology, VirtualTech Japan Inc., NTT Advanced Technology Corporation, Japan Advanced Institute of Science and Technology, Tokyo University and Dell Japan Inc. Design OpenStack Cloud 3
  4. 4. DOCOMO, INC All Rights Reserved Test Environment 4 National Institute of Information and Communications Technology, Ishikawa prefecture About 1400 servers in a single site
  5. 5. DOCOMO, INC All Rights Reserved Research and Development New locator protocol development Home network protocol development Virtual node migration algorithms HEMS management protocol New tunnel protocols Inter-AS traceback TCP behavior comparison Proxy server performance evaluation Evaluation of X-ray sharing Video conference protocol switching FW benchmarking Protocol / Product Evaluation Education Security operation competition Cyber range training Remote hands-on for Asian students Competition of cloud computing ideas Testbed federation algorithms Supporting software for control testbeds Wireless link simulation on wired link IPv6 support on network testbeds Simulation Realistic and Flexible experiments based on bare-wire environment StarBED – http://bit.ly/10gYttm 5 ○  Open to any companies and organizations
  6. 6. DOCOMO, INC All Rights Reserved 6 100 Physical Servers on StarBED Compute Node x 36 Leaf Switch (S4810) Leaf Switch (S4810) Leaf Switch (S4810) Spine Switch (S6000) Spine Switch (S6000) Compute Node x 37 40Gb x 2 10Gb x 4 LB (BIG-IP 5200V) x 2 Leaf Switch (S4810) 10Gb x 4 Management Servers x 21 10Gb x 1 10Gb x 1 10Gb x 1 10Gb x 1 Compute Node x 6 40Gb x 2 40Gb x 2 40Gb x 2 10Gb x 1 10Gb x 1 10Gb x 1 10Gb x 1 10Gb x 1 10Gb x 1 ○  OpenStack Icehouse
  7. 7. Copyright©2014  NTT  DOCOMO,  INC.  All  rights  reserved. Network Configuration
  8. 8. DOCOMO, INC All Rights Reserved ○  Multi-Chassis Link Aggregation (MLAG) Network Redundancy 8 ○  End-host Equal Cost Multi Path (ECMP) Switch Switch eth1 bond0 z.z.z.z eth2 MLAG with VRRP ECMP Bonding Switch Switch eth1 x.x.x.x lo z.z.z.z eth2 y.y.y.y Routing Protocol ECMP Trade-offs: MLAG is mature but needs expensive switches; ECMP removes network complexity but is less mature.
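For illustration, a minimal sketch of the host side of the bonded (MLAG) setup, assuming an Ubuntu host with the ifenslave package; the interface names (eth1/eth2), bond name and address are placeholders, not values from the deck:

    # Sketch only: LACP (802.3ad) bond terminating on a switch pair running MLAG.
    cat >> /etc/network/interfaces <<'EOF'
    auto bond0
    iface bond0 inet static
        address 192.0.2.10        # placeholder host address (z.z.z.z in the figure)
        netmask 255.255.255.0
        bond-slaves eth1 eth2     # the two links to the leaf switches
        bond-mode 802.3ad         # LACP; pairs with MLAG/VRRP on the switches
        bond-miimon 100
        bond-lacp-rate 1
    EOF
    ifup bond0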
  9. 9. DOCOMO, INC All Rights Reserved ○  Virtual network creation is essential to increase network security Ø  ML2 with tunnel network configuration –  Type Driver ü VXLAN ü GRE –  We chose VXLAN ü VXLAN uses MAC Address-in-User Datagram Protocol (MAC-in-UDP) encapsulation ü The load balancing algorithm works effectively by using a UDP port number hash ü Many network hardware products support VXLAN Ø  Mechanism Drivers –  Open vSwitch (OVS) –  Linux Bridge Neutron Configuration 9
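A minimal sketch of the ML2 settings this implies, using Icehouse-era option names; the file path, VNI range and VTEP address are assumptions, and real deployments usually split these between the server and the agents:

    # Sketch: ML2 with the Open vSwitch mechanism driver and the VXLAN type driver.
    cat > /etc/neutron/plugins/ml2/ml2_conf.ini <<'EOF'
    [ml2]
    type_drivers = vxlan
    tenant_network_types = vxlan
    mechanism_drivers = openvswitch

    [ml2_type_vxlan]
    vni_ranges = 1:10000          # assumed VNI range

    [ovs]
    local_ip = 192.0.2.10         # placeholder VTEP address of this host

    [agent]
    tunnel_types = vxlan
    EOF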
  10. 10. DOCOMO, INC All Rights Reserved ○  Throughput between 1 VM and 1 VM on different physical hosts (1 TCP connection) Ø  Not much difference between OVS and Linux Bridge Ø  MLAG gets better performance than ECMP Throughput for Different Network Configurations 10 [Chart: throughput (Gbps, roughly 3.4-4.6) for ovs_mlag, ovs_ecmp and bridge_mlag]
  11. 11. DOCOMO, INC All Rights Reserved ○  MLAG with OVS seems the best configuration today Ø  Performance, potential, stability Throughput for Different Network Configurations 11 [Chart: throughput (Gbps) for ovs_mlag, ovs_ecmp and bridge_mlag] We increased the VM's MTU to 8950 to get this performance; note that the physical network bandwidth is 20Gbps
  12. 12. DOCOMO, INC All Rights Reserved Throughput for Different Number of VMs 12 ○  Each VM communicates with a random VM on a different physical host (1 connection per VM) ○  Only about 50% of the total bandwidth is consumed even though all physical resources are allocated to VMs [Chart: per-VM and per-host (PHY) throughput (Gbps) vs number of servers (100-477), for MTU 1500 and MTU 8950] * PHY: VMs' total throughput measured at a physical host
  13. 13. DOCOMO, INC All Rights Reserved ○  We could get 19Gbps (MTU 1500) between physical hosts ○  Enabling VXLAN Ø  We could get only 10Gbps (MTU 8950) Ø  VM's CPU load during the communication Ø  The throughput is heavily reduced by turning on VXLAN –  CPU is overloaded by VTEP software processing ü packet encapsulation and de-capsulation Slow Throughput 13 [top output — Server: vhost 89.3 %CPU, qemu-system-x86 49.3 %CPU; Receiver: vhost 98.4 %CPU, qemu-system 42.9 %CPU]
  14. 14. DOCOMO, INC All Rights Reserved ○  A NIC with VXLAN offload should be able to reduce the CPU load ○  Available devices Ø  Mellanox ConnectX-3 Pro –  World's first VXLAN offload NIC Ø  Intel X710, XL710 –  Released in Sep. 2014 Ø  Emulex XE102 Ø  Qlogic 8300 series –  Supported since the October 21, 2013 software release Ø  Qlogic NetXtreme II 57800 series –  Broadcom is selling its NetXtreme II line of 10GbE controllers and adapters to QLogic. NIC with VXLAN Offload Support 14
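One quick way to confirm that a NIC/driver actually exposes the VXLAN (UDP tunnel) segmentation offload is to look at the ethtool feature flags; the interface name below is a placeholder:

    # Check for UDP tunnel segmentation offload on the NIC.
    ethtool -k eth0 | grep udp_tnl
    # A capable NIC/driver reports: tx-udp_tnl-segmentation: on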
  15. 15. DOCOMO, INC All Rights Reserved Throughput using VXLAN Offload NIC 15 ○  Throughput between VMs on 4 different physical hosts (2 servers, 2 receivers) ○  It can consume 98% of the total physical bandwidth Ø  VXLAN offload with MTU 8950 [Chart: per-VM and per-host (PHY) throughput (Gbps) vs number of servers (10-38), VXLAN offload ON vs OFF, MTU 1500 and 8950 — gains of 3.5 ~ 5.6x and 1.3 ~ 1.4x] * PHY: VMs' total throughput measured at a physical host
  16. 16. DOCOMO, INC All Rights Reserved CPU Load 16 [Charts: CPU [%] per Gbps vs number of servers (1-16), offload ON vs OFF — Server (Tx): 27.1% lower with offload; Receiver (Rx): 28.5% lower]
  17. 17. DOCOMO, INC All Rights Reserved ○  We got 1.3~5.5 times the throughput compared to a NIC without offload capability ○  CPU load on a physical host was reduced by 27~28% ○  MTU 8950 showed 1.5~1.6 times better throughput than MTU 1500 Ø  We decided to set MTU 9000 on the physical hosts, but we deliver MTU 1500 via the DHCP server Ø  Users can extend the MTU themselves VXLAN Offload NIC 17
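As a sketch of that MTU policy (not necessarily the exact mechanism the presenters used): raise the MTU on the physical interface, and pin the MTU handed out by the Neutron DHCP agent via a dnsmasq override file. The paths and the crudini helper are assumptions:

    # Physical host side: raise the NIC/bond MTU to 9000 (value from the deck).
    ip link set dev bond0 mtu 9000

    # DHCP side: force DHCP option 26 (interface MTU) to 1500 for guests.
    cat > /etc/neutron/dnsmasq-neutron.conf <<'EOF'
    dhcp-option-force=26,1500
    EOF
    crudini --set /etc/neutron/dhcp_agent.ini DEFAULT dnsmasq_config_file /etc/neutron/dnsmasq-neutron.conf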
  18. 18. Copyright©2014  NTT  DOCOMO,  INC.  All  rights  reserved. High Availability
  19. 19. DOCOMO, INC All Rights Reserved ○  You need 10-12 people Ø  4 groups + α people are required ○  If fixing a problem can be deferred, we only need to work on weekdays Ø  High Availability is the key to achieving this ○  Our design Ø  Double redundancy for hardware Ø  Triple redundancy for software ⇒ tolerates double failures 24/7 Support 19
  20. 20. DOCOMO, INC All Rights Reserved High Availability 20 ○  Load Balancer based: OpenStack APIs, Nova, Zabbix behind redundant LBs (load balancing, SSL termination, health check) ○  Others: MySQL (Galera) with Arbitrator and DB1-DB4, RabbitMQ, Neutron Agents, PXE/DNS/DHCP, MAAS
  21. 21. DOCOMO, INC All Rights Reserved ○  4 Nodes + 1 Arbitrator MySQL HA 21 Arbitrator DB1 LBLB DB2 DB3 DB4 Read/Write to a single node Quorum-based Voting Health Check •  Check TCP Port 3306 •  Cluster Status ü show status like 'wsrep_ready' = 'ON' Priority 1 Priority 2 Priority 3 Priority 4
  22. 22. DOCOMO, INC All Rights Reserved Galera-cluster State Transition 22 Open Primary Joiner Joined [3] Synced [4] Donor [2] IST and SST wsrep_ready=‘ON’ ○  WSREP_STATUS = 2 and 4 can’t cover all the states
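A minimal sketch of the kind of per-node health check the load balancer could call, combining the TCP 3306 probe with the wsrep_ready status described above; the host and credentials are placeholders:

    #!/bin/bash
    # Galera node health check: port 3306 must answer and wsrep_ready must be ON.
    HOST=${1:-127.0.0.1}

    # 1) TCP check on the MySQL port.
    nc -z -w 2 "$HOST" 3306 || exit 1

    # 2) Cluster status check; wsrep_ready catches states that local_state 2/4 alone misses.
    READY=$(mysql -h "$HOST" -u monitor -pSECRET -N -s \
            -e "SHOW STATUS LIKE 'wsrep_ready'" | awk '{print $2}')
    [ "$READY" = "ON" ] && exit 0 || exit 1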
  23. 23. DOCOMO, INC All Rights Reserved ○  Node Recovery Ø  Health check detects DB1's failure MySQL HA 23 DB1 LBLB DB2 DB3 DB4 Priority 1 Priority 2 Priority 3 Priority 4 Health Check •  Check TCP Port 3306 •  Cluster Status ü show status like 'wsrep_ready' = 'ON' Arbitrator
  24. 24. DOCOMO, INC All Rights Reserved ○  Node Recovery Ø  Designated DB is changed from DB1 to DB2 MySQL HA 24 DB1 LBLB DB2 DB3 DB4 Priority 1 Priority 2 Priority 3 Priority 4 •  Cluster Status ü show status like 'wsrep_ready' = 'ON' -> 'OFF' Arbitrator
  25. 25. DOCOMO, INC All Rights Reserved Arbitrator ○  Node Recovery Ø  DB1 is restored from DB4 (lowest priority) using IST or SST MySQL HA 25 DB1 LBLB DB2 DB3 DB4 Priority 1 Priority 2 Priority 3 Priority 4 Synchronization •  IST: Incremental State Transfer •  SST: State Snapshot Transfer
  26. 26. DOCOMO, INC All Rights Reserved MySQL HA 26 DB1 LBLB DB2 DB3 DB4 Priority 1 Priority 2 Priority 3 Priority 4 ○  Node Recovery Ø  DB1’s priority is changed before joining the cluster Priority 1 Priority 2 Priority 3 Priority 4 Arbitrator
  27. 27. DOCOMO, INC All Rights Reserved ○  Node Recovery Ø  The cluster is back to the normal state MySQL HA 27 DB1 LBLB DB2 DB3 DB4 Priority 4 Priority 1 Priority 2 Priority 3 Arbitrator
  28. 28. DOCOMO, INC All Rights Reserved Recovery Time 28 ○  Time for IST [Chart: time for recovery (s) with background traffic of 120TPS and 240TPS, split into JOINER->JOINED and JOINED->SYNCED, for clusters with max performance of 340 TPS and 1356 TPS]
  29. 29. DOCOMO, INC All Rights Reserved Recovery Time 29 ○  Time for SST [Chart: time for recovery (s) with background traffic of 120TPS and 240TPS, split into JOINER->JOINED and JOINED->SYNCED, for clusters with max performance of 340 TPS and 1356 TPS]
  30. 30. DOCOMO, INC All Rights Reserved ○  Losing all databases Disaster Recovery Restore from backup Fix NW DB1 DB2 DB3 DB4 Stand-by 30 SST Run MySQL Run MySQL Stand-by Stand-by DONOR Run MySQL SST SST Healthy State DB 3GB 11 seconds 70 seconds 70 seconds 98.2 minutes 97.5 minutes for 12 hours of bin log recovery Run MySQL DONOR
  31. 31. DOCOMO, INC All Rights Reserved ○  MAAS includes Ø  DNS, DHCP, tftp ○  DNS Ø  Master – Slave ○  DHCP (ISC DHCP) Ø  Replication – (delivering fixed IP addresses through DHCP) ○  MAAS and tftp Ø  Backed up by a VM MAAS-HA 31 MAAS Storage VM Image Activate
  32. 32. DOCOMO, INC All Rights Reserved ○  We add multiple RabbitMQ addresses to the configuration files Ø  Easy configuration and application-level health monitoring; at least 3 RabbitMQ hosts (5 ideally) are required to guard against split-brain ○  Read/Write to a single node using a load balancer Ø  No need to care about split-brain – 3 RabbitMQ hosts Ø  Network-level health monitoring RabbitMQ-HA 32 VM VM VM VM VM VM VM VM Nova LB LB VM VM VM VM VM VM VM VM Nova cluster_partition_handling = 'autoheal'
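A sketch of the two options, with Icehouse-era option names; hostnames and file paths are assumptions:

    # Option 1 (config-file based): list every broker in the service configuration.
    crudini --set /etc/nova/nova.conf DEFAULT rabbit_hosts rabbit1:5672,rabbit2:5672,rabbit3:5672
    crudini --set /etc/nova/nova.conf DEFAULT rabbit_ha_queues True

    # Option 2 (LB based): services talk to a single VIP; the cluster heals partitions itself.
    cat > /etc/rabbitmq/rabbitmq.config <<'EOF'
    [{rabbit, [{cluster_partition_handling, autoheal}]}].
    EOF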
  33. 33. Copyright©2014  NTT  DOCOMO,  INC.  All  rights  reserved. Neutron HA
  34. 34. DOCOMO, INC All Rights Reserved Network Setup 34 o  DHCP agent Ø  Support Active-Active. Assign a virtual network into multiple agents ü dhcp_agents_per_network = 3 (should be <= 3) o  L3 agent Ø  Support only Active-Standby Ø  If it fails, we need to migrate a router to another agent o  Metadata agent Ø  Has no state ⇒ Just need to keep metadata-agent running in all nodes NW node Data Plane (VXLAN) External Net Neutron Server Message Queue NW node NW node L3-agt dhcp-agt Control Plane dhcp-agt dhcp-agt L3-agt L3-agt meta-agt meta-agt meta-agt Compute Node Compute Node Compute Node
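A sketch of the DHCP agent setting on the Neutron server, plus a quick way to verify the scheduling; the network ID is a placeholder:

    # Schedule every virtual network onto three DHCP agents (value from the deck).
    crudini --set /etc/neutron/neutron.conf DEFAULT dhcp_agents_per_network 3

    # Verify which DHCP agents host a given network.
    neutron dhcp-agent-list-hosting-net <network-id>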
  35. 35. DOCOMO, INC All Rights Reserved Monitoring Points 35 NW node Data Plane (VXLAN) External Net GW router Neutron Server Message Queue NW node NW node L3-agt dhcp-agt Compute Node Compute Node Compute Node Control Plane dhcp-agt dhcp-agt L3-agt L3-agt [2] PING from external net [1] PING from Internal net [4] PING from C-plane [3] Agent state check via REST API
  36. 36. DOCOMO, INC All Rights Reserved ○  Data plane connectivity –  If it fails, users cannot communicate through routers. Ø  [1] Internal network for VXLAN (ping) Ø  [2] External network (ping) ○  Network agent health check –  L3 agent, DHCP agent Ø  [3] Agent alive state from neutron server (REST API agent-list) –  Each neutron agent reports its state via message queue. Ø  [4] Control network connectivity (ping) –  If it fails, we are no longer able to control the node. Health Checks against Failures 36
  37. 37. DOCOMO, INC All Rights Reserved Recovery from Failures 37 NW node Data Plane (VXLAN) External Net GW router Neutron Server Message Queue NW node NW node Compute Node Compute Node Compute Node Control Plane L3-agt dhcp-agt dhcp-agt dhcp-agt L3-agt L3-agt (1) Disable agents on the host
  38. 38. DOCOMO, INC All Rights Reserved Recovery from Failures 38 NW node Data Plane (VXLAN) External Net GW router Neutron Server Message Queue NW node NW node Compute Node Compute Node Compute Node Control Plane L3-agt dhcp-agt dhcp-agt dhcp-agt L3-agt L3-agt (2) Migrate network/router
  39. 39. DOCOMO, INC All Rights Reserved Recovery from Failures 39 NW node Data Plane (VXLAN) External Net GW router Neutron Server Message Queue NW node NW node Compute Node Compute Node Compute Node Control Plane L3-agt dhcp-agt dhcp-agt dhcp-agt L3-agt L3-agt
  40. 40. DOCOMO, INC All Rights Reserved Recovery from Failures 40 NW node Data Plane (VXLAN) External Net GW router Neutron Server Message Queue NW node NW node Compute Node Compute Node Compute Node Control Plane L3-agt dhcp-agt dhcp-agt dhcp-agt L3-agt L3-agt (3) Shutdown NICs (or the node)
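The recovery steps (1)-(3) above map roughly onto the standard Neutron agent/router scheduling CLI; a hedged sketch with placeholder IDs (the deck's own tooling automates this, and exact client flags may vary by version):

    # (1) Take the failed agents out of scheduling.
    neutron agent-update <failed-l3-agent-id> --admin-state-up False

    # (2) List the routers on the failed agent and move each one to a healthy agent.
    neutron router-list-on-l3-agent <failed-l3-agent-id>
    neutron l3-agent-router-remove <failed-l3-agent-id>  <router-id>
    neutron l3-agent-router-add    <healthy-l3-agent-id> <router-id>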
  41. 41. DOCOMO, INC All Rights Reserved ○  Dedicated network namespace on the network node for external connectivity checking Ø  The network node has reachability to the external network Ø  Use an IP address in an isolated namespace to avoid exposing the node host to the public network Tips: Checking External Network Connectivity 41 Network Node Bridge (external) ethN Router netns Router netns Netns for checking IPAddr GW router PING check No access to the host
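A sketch of how such a check namespace can be wired to the external OVS bridge; the bridge name, namespace name and addresses are assumptions:

    # Create an internal port on the external bridge and move it into its own namespace.
    ip netns add ext-check
    ovs-vsctl add-port br-ex ext-check -- set Interface ext-check type=internal
    ip link set ext-check netns ext-check
    ip netns exec ext-check ip addr add 203.0.113.250/24 dev ext-check
    ip netns exec ext-check ip link set ext-check up

    # Ping the external gateway from the isolated namespace only; the host itself
    # never gets an address on the public network.
    ip netns exec ext-check ping -c 3 -W 2 203.0.113.1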
  42. 42. DOCOMO, INC All Rights Reserved ○  Throughput from an external node to a VM ○  Injected a control-plane failure and migrated a router to another L3-agent Traffic During Router Migration 42 [Chart: throughput (Mbps) vs elapsed time (s) — traffic drops for about 10 seconds during the migration]
  43. 43. DOCOMO, INC All Rights Reserved ○  Migrated 88 routers from one L3-agent to two other L3-agents Router Migration Progress 43 [Chart: number of routers processed vs elapsed time (about 2.5 minutes) — REST API requested, REST API processed, L3-agent processed, L3-agent processed (aggregated)]
  44. 44. DOCOMO, INC All Rights Reserved Possible Improvements 44 NW node Data Plane (VXLAN) External Net GW router Neutron Server Message Queue NW node NW node Compute Node Compute Node Compute Node Control Plane L3-agt L3-agt L3-agt o  Integration with the L3-Agent HA feature Ø  It greatly improves data-plane availability Ø  Monitoring of external network connectivity needs to be improved in L3-HA Ø  Router migration based on C-Plane monitoring is still required No monitoring for the external network now HA supported for internal network failures C-Plane monitoring is still required
  45. 45. DOCOMO, INC All Rights Reserved ○  Integration with Juno Neutron features Ø  Using the L3-Agent HA feature (prev. page) Ø  Leveraging L3-agent auto rescheduling –  Helps us reduce the number of REST API calls –  Juno Neutron supports L3-agent rescheduling for routers on inactive agents –  "admin_state" is not considered for rescheduling ← needs to be improved ○  Possible contributions to Neutron upstream Ø  DHCP agent auto rescheduling Ø  LBaaS agent scheduling –  There is no way to reassign the LBaaS agent for the HAProxy driver Possible Improvements 45
  46. 46. Copyright©2014  NTT  DOCOMO,  INC.  All  rights  reserved. Management Resources
  47. 47. DOCOMO, INC All Rights Reserved ○  Controller Ø  API ○  Message Queue Ø  RabbitMQ ○  Database Ø  MySQL – OpenStack ○  Neutron Servers ○  Monitoring Ø  Zabbix Servers(+MySQL) ○  Storage Ø  Log, Backup ○  Deployment + etc Ø  MaaS, MongoDB Management Resources 47
  48. 48. DOCOMO, INC All Rights Reserved Management Resources 48 3 ○  Controller Ø  API ○  Message Queue Ø  RabbitMQ ○  Database Ø  MySQL – OpenStack ○  Neutron Servers ○  Monitoring Ø  Zabbix Servers(+MySQL) ○  Storage Ø  Log, Backup ○  Deployment + etc Ø  MaaS, MongoDB
  49. 49. DOCOMO, INC All Rights Reserved Management Resources 49 3 3 + 2 ○  Controller Ø  API ○  Message Queue Ø  RabbitMQ ○  Database Ø  MySQL – OpenStack ○  Neutron Servers ○  Monitoring Ø  Zabbix Servers(+MySQL) ○  Storage Ø  Log, Backup ○  Deployment + etc Ø  MaaS, MongoDB
  50. 50. DOCOMO, INC All Rights Reserved Management Resources 50 3 3 + 2 4 + 0.5 ○  Controller Ø  API ○  Message Queue Ø  RabbitMQ ○  Database Ø  MySQL – OpenStack ○  Neutron Servers ○  Monitoring Ø  Zabbix Servers(+MySQL) ○  Storage Ø  Log, Backup ○  Deployment + etc Ø  MaaS, MongoDB
  51. 51. DOCOMO, INC All Rights Reserved Management Resources 51 3 3 + 2 4 + 0.5 3 ○  Controller Ø  API ○  Message Queue Ø  RabbitMQ ○  Database Ø  MySQL – OpenStack ○  Neutron Servers ○  Monitoring Ø  Zabbix Servers(+MySQL) ○  Storage Ø  Log, Backup ○  Deployment + etc Ø  MaaS, MongoDB
  52. 52. DOCOMO, INC All Rights Reserved Management Resources 52 3 3 + 2 4 + 0.5 3 3 ○  Controller Ø  API ○  Message Queue Ø  RabbitMQ ○  Database Ø  MySQL – OpenStack ○  Neutron Servers ○  Monitoring Ø  Zabbix Servers(+MySQL) ○  Storage Ø  Log, Backup ○  Deployment + etc Ø  MAAS, MongoDB
  53. 53. DOCOMO, INC All Rights Reserved Management Resources 53 3 3 + 2 4 + 0.5 3 3 xxTB ○  Controller Ø  API ○  Message Queue Ø  RabbitMQ ○  Database Ø  MySQL – OpenStack ○  Neutron Servers ○  Monitoring Ø  Zabbix Servers(+MySQL) ○  Storage Ø  Log, Backup ○  Deployment + etc Ø  MAAS, MongoDB
  54. 54. DOCOMO, INC All Rights Reserved Management Resources 54 3 3 + 2 4 + 0.5 3 3 xxTB 2 ○  Controller Ø  API ○  Message Queue Ø  RabbitMQ ○  Database Ø  MySQL – OpenStack ○  Neutron Servers ○  Monitoring Ø  Zabbix Servers(+MySQL) ○  Storage Ø  Log, Backup ○  Deployment + etc Ø  MAAS, MongoDB
  55. 55. DOCOMO, INC All Rights Reserved Management Resources 55 Controller RabbitMQ MySQL Neutron Zabbix Log, backup storage etc
  56. 56. DOCOMO, INC All Rights Reserved Management Resources 56 Controller RabbitMQ MySQL Neutron Zabbix Log, backup storage etc Nova Compute
  57. 57. DOCOMO, INC All Rights Reserved Scalability Test 57 ○  We measured VM boot time for 0-5000 instances [Chart: average elapsed time to boot an instance (s) and error rate (%) per 1000-VM batch, from 0 to 5000 VMs]
  58. 58. DOCOMO, INC All Rights Reserved Database Size - Zabbix 58
Record sizes and retention: History 50 bytes / 30 days; Trend 128 bytes / 90 days; Event 130 bytes / 90 days
Servers, Health Check (30 seconds): 69 items — 15GB history, 2GB trend, 1GB event*, 18GB total
Servers, Usage (180 seconds): 557 items — 40GB history, 15GB trend, 1GB event*, 57GB total
Switch (per port), Health Check: 1 item — 687MB history, 88MB trend, 1GB event*, 2GB total
Switch (per port), Usage: 24 items — 5GB history, 2GB trend, 1GB event*, 8GB total
Tempest, System Check: 500 items — 108MB history, 138MB trend, 1GB event*, 1GB total
* Assume 1 event/second. Grand total: 86 GB
  59. 59. DOCOMO, INC All Rights Reserved Database Size - OpenStack 59 (Sep 14 2014 → Sep 25 2014)
OpenStack related: Keystone* 1.4GB → 1.4GB; Nova (28k -> 55k) 451MB → 856MB; Neutron (7k -> 9k) 78MB → 235MB; Glance 64MB → 89MB; Heat 45MB → 55MB; Cinder 39MB → 43MB; Sub Total 2.1GB → 2.7GB
MySQL related: Transaction log 4.1GB → 4.1GB; Ibdata1 268MB → 268MB
Total Size: 6.4GB → 7.0GB
* Ran "keystone-manage token_flush" every hour
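The hourly keystone-manage token_flush mentioned in the footnote can be wired up as a simple cron entry; the user, paths and log file are assumptions:

    cat > /etc/cron.d/keystone-token-flush <<'EOF'
    0 * * * * keystone /usr/bin/keystone-manage token_flush >> /var/log/keystone/token-flush.log 2>&1
    EOF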
  60. 60. DOCOMO, INC All Rights Reserved ○  We can change the configuration easily (e.g. HA and Neutron) ○  We can use Ansible for deployment and operation Deployment Tools 60
Columns: DOCOMO (Ansible based) | Mirantis Fuel | HP Helion | Canonical Juju/MAAS
MySQL HA: LB + Percona | haproxy + corosync + pacemaker + Galera | haproxy + keepalived + Galera | haproxy + corosync + pacemaker + Percona
RabbitMQ HA: Configfile based (pause_minority) | RabbitMQ Cluster (autoheal) + LB | Configfile based (pause_minority) | Configfile based (ignore)
LB HA: Commercial products | haproxy (nameserver) + corosync + pacemaker | haproxy + keepalived | haproxy + corosync + pacemaker
Network: Neutron + own HA | Neutron | Neutron DVR | Neutron
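For illustration, a hypothetical invocation of an Ansible-based deployment of this shape; the inventory, playbook and tag names are placeholders, not the actual DOCOMO tool:

    # Deploy or reconfigure one role on a subset of hosts.
    ansible-playbook -i inventories/production site.yml --limit compute --tags neutron

    # Ad-hoc reachability check of every managed node.
    ansible -i inventories/production all -m ping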
  61. 61. DOCOMO, INC All Rights Reserved ○  Default security group Ø  An iptables entry is added to / deleted from all VMs whenever you create/delete a VM ⇒ the ovs-agent became busy when we created many VMs ○  Number of Neutron workers Ø  neutron.conf –  api_workers = 'number of cores' –  rpc_workers = 'number of cores' Ø  metadata_agent.ini –  metadata_workers = 'number of cores' ○  Number of file descriptors Ø  Default: 1024 Ø  RabbitMQ: more than 5,000 connections Ø  metadata-ns-proxy (L3-agent, dhcp-agent): requests x 2 ○  VM creation retries Ø  nova.conf –  scheduler_max_attempts = 1 ⇒ No difference between 1 and 3 Tips Learned from Scalability Tests 61
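A sketch of the tuning knobs listed above, applied with the crudini helper (assumed available); the worker counts follow the per-core rule from the slide, and the file-descriptor limit is an assumed example value:

    CORES=$(nproc)

    # Neutron server and metadata agent workers.
    crudini --set /etc/neutron/neutron.conf       DEFAULT api_workers      "$CORES"
    crudini --set /etc/neutron/neutron.conf       DEFAULT rpc_workers      "$CORES"
    crudini --set /etc/neutron/metadata_agent.ini DEFAULT metadata_workers "$CORES"

    # Nova scheduler retries (1 vs 3 made no measurable difference in the tests).
    crudini --set /etc/nova/nova.conf DEFAULT scheduler_max_attempts 1

    # Raise the file-descriptor limit above the 1024 default (e.g. for RabbitMQ).
    cat >> /etc/security/limits.conf <<'EOF'
    *    soft    nofile    65536
    *    hard    nofile    65536
    EOF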
  62. 62. Copyright©2014  NTT  DOCOMO,  INC.  All  rights  reserved. END
