
Introduction of Private Cloud in LINE - OpenStack Update Seminar (February 2019)


Title: Introduction of Private Cloud in LINE
Agenda:
- Introduction/Background of Private Cloud
- OpenStack in LINE
- Challenges of OpenStack



  1. Introduction: Private Cloud in LINE (2019/01) Yuki Nishiwaki
  2. Agenda 1. Introduction/Background of Private Cloud 2. OpenStack in LINE 3. Challenges of OpenStack
  3. Who are we? Responsibility: develop and maintain the common, fundamental functions of the private cloud (IaaS); think about optimization across the whole private cloud. Areas: Network, Service, Operation, Platform, Storage. Software: IaaS (OpenStack + α), Kubernetes. Knowledge: software, networking, virtualization, Linux
  4. Private Cloud: OpenStack (VM: Nova; Image Store: Glance; Network Controller: Neutron; Identity: Keystone; DNS Controller: Designate); Loadbalancer (L4LB, L7LB); Kubernetes (Rancher); Storage (Block Storage: Ceph; Object Storage: Ceph); Database (Search/Analytics Engine: Elasticsearch; RDBMS: MySQL; KVS: Redis); Messaging (Kafka); Function (Knative); Baremetal; Operation Tools. Areas: Platform, Service, Network, Storage, Operation
  5. Today's Topic: OpenStack (VM: Nova; Image Store: Glance; Network Controller: Neutron; Identity: Keystone; DNS Controller: Designate), highlighted within the private cloud component diagram from slide 4
  6. OpenStack in LINE. Introduced: 2016. Version: Mitaka + customization. Clusters: 4. Hypervisors: 1100+ (Dev cluster: 400; Prod region 1: 600; Prod region 2: 76; Prod region 3: 80). VMs: 26000+ (Dev cluster: 15503; Prod region 1: 8870; Prod region 2: 335; Prod region 3: 229)
  7. Difficulty of building an OpenStack cloud. [Diagram: a datacenter network of Core, Aggregation, and ToR switches connecting racks of hypervisors, OpenStack API servers, and OpenStack databases.] Building it requires: ● knowledge of networking (design/plan the whole DC network) ● knowledge of operating a large product (build operation tools that are not tied to one piece of software; plan user support) ● knowledge of server kitting (communicate with the procurement department) ● knowledge of the OpenStack software (design the deployment; deploy; customize; troubleshoot both OpenStack components and related software)
  8. Building OpenStack is not completed by one team; the Network, Operation, and Platform teams (3+, 4+, and 4+ members) share the work: ● maintain the golden VM image, Elasticsearch for logging, and Prometheus for alerting; develop operation tools; user support; buy new servers ● design/plan the DC and inter-DC networks; implement a network orchestrator (outside OpenStack) ● design the OpenStack deployment; deploy, customize, and troubleshoot OpenStack
  9. Challenges of OpenStack. Basically, we are trying to make OpenStack (IaaS) stable. What we have done: 1. Legacy system integration 2. Bring a new network architecture into OpenStack networking 3. Maintain customizations to OSS while keeping up with upstream. What we will do: 1. Scale emulation environment 2. Internal communication visualizing/tuning 3. Containerize OpenStack 4. Event Hub as a platform
  11. Challenge 1: Integration with legacy systems. Even before the cloud, we had many company-wide systems: ● CMDB (configuration management: register spec, OS, location, ...) ● IPDB (register IP address and hostname) ● Monitoring system (register the server as a monitoring target) ● Server login authority management (register the users allowed on the server). Flow: a developer asks Infra for a new server; Infra sets it up and registers it in each system
  12. Challenge 1: Integration with legacy systems. With the private cloud, "server creation" completes without the infrastructure department stepping in, so the private cloud itself has to register the new server in the CMDB, IPDB, monitoring system, and server login authority management. Flow: a developer creates a new server; the private cloud registers it everywhere
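One common way to automate this step is to consume the notifications Nova emits when an instance is created and fan the data out to each legacy system. The sketch below shows only the pure translation step: the input field names follow Nova's legacy notification payload, while the outgoing record shapes for CMDB/IPDB/monitoring/login-authority are invented for illustration.

```python
# Sketch: turn one Nova "compute.instance.create.end" notification into
# the registration records each legacy system expects. The outgoing
# record formats are hypothetical; only the idea of fanning one event
# out to all four systems is taken from the slide.

def registrations_from_notification(payload):
    records = []
    host = payload["hostname"]
    # IPDB: register IP address and hostname.
    for ip in payload.get("fixed_ips", []):
        records.append({"system": "ipdb",
                        "hostname": host,
                        "ip_address": ip["address"]})
    # CMDB: register spec, OS/image, location.
    records.append({"system": "cmdb",
                    "hostname": host,
                    "vcpus": payload["vcpus"],
                    "memory_mb": payload["memory_mb"],
                    "image": payload.get("image_name", "unknown")})
    # Monitoring system: register the server as a monitoring target.
    records.append({"system": "monitoring", "target": host})
    # Login authority management: grant the owning project access.
    records.append({"system": "login_authority",
                    "hostname": host,
                    "project_id": payload["tenant_id"]})
    return records

if __name__ == "__main__":
    sample = {"hostname": "vm-001", "vcpus": 2, "memory_mb": 4096,
              "tenant_id": "p-42",
              "fixed_ips": [{"address": "10.0.0.5"}],
              "image_name": "centos7-golden"}
    for record in registrations_from_notification(sample):
        print(record)
```

In a real deployment this function would sit behind an oslo.messaging notification listener subscribed to Nova's notification topic, with one HTTP client per legacy system consuming the records.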
  13. Challenge 2: New network architecture in our DC. For scalability and operability, we introduced a CLOS network architecture and terminate L3 on the hypervisor (previous vs. new design)
  14. Challenge 2: Support the new architecture in OpenStack. The OSS implementation of the network controller (Neutron) is neutron-server plus agents (neutron-dhcp-agent, neutron-metadata-agent, neutron-linuxbridge-agent) that expect VMs to share an L2 network. We want VMs not to share an L2 network, so we replaced those agents with a new neutron-custom-agent
  15. Challenge 3: Improve customization of OSS. We have customized many OpenStack components (e.g. for performance). Previously we simply forked a specific upstream version into a LINE version and customized it again and again (a commit for A, another for A, one for B, one for C, ...). It became difficult to take a specific patch out of our customized OpenStack
  16. Challenge 3: Improve customization of OSS. Don't fork (stop forking): instead of a LINE fork with customize commits for A, B, and C piled onto a specific upstream version, maintain only the patch files (patch for A, patch for B, patch for C) plus the base commit ID in git => much easier to take a patch out than before
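The patch-queue workflow described on this slide can be sketched with plain git commands. The repository URL, tag, and patch file names below are illustrative, not the team's actual setup.

```shell
# Start from a pristine upstream checkout at a pinned base commit,
# instead of a long-lived fork. (URL, tag, and paths are illustrative.)
git clone https://opendev.org/openstack/nova.git
cd nova
git checkout -b line-build mitaka-eol   # record this as the base commit ID

# Recreate the "LINE version" by applying the maintained patch files.
git am ../patches/0001-patch-for-A.patch
git am ../patches/0002-patch-for-B.patch
git am ../patches/0003-patch-for-C.patch

# Export a new customization as a patch file for the patch repository.
git format-patch -1 HEAD -o ../patches/

# Dropping "patch for B" later is just deleting its file and replaying
# the remaining `git am` sequence on a fresh checkout.
```

The point of the design is that each customization lives in exactly one file, so removing or upstreaming a single patch no longer requires untangling a fork's history.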
  17. Challenges differ from Day 1 to Day 2. Day 1 (so far): ● develop user-facing features (keep the same experience as the legacy systems; support the new architecture) ● daily operation (predictable, plus unpredictable work driven by trouble). Day 2 (from now): ● enhance operation ● optimize development ● reduce daily operation (both predictable and unpredictable)
  18. Challenges of OpenStack (recap). We are trying to make OpenStack (IaaS) stable. Done: legacy system integration; a new network architecture in OpenStack networking; maintaining OSS customizations while keeping up with upstream. What we will do: 1. Scale emulation environment 2. Internal communication visualizing/tuning 3. Containerize OpenStack 4. Event Hub as a platform
  19. Future Challenge 1: Scale emulation environment. Introduced: 2016. Version: Mitaka + customization. Clusters: 4+1 (WIP: semi-public cloud). Hypervisors: 1100+ (Dev cluster: 400; Prod region 1: 600; Prod region 2: 76; Prod region 3: 80). VMs: 26000+ (Dev cluster: 15503; Prod region 1: 8870; Prod region 2: 335; Prod region 3: 229). The number of hypervisors is continuously increasing, and we have faced timing/scale-related errors and operations that take a long time
  20. Future Challenge 1: Scale emulation environment. We need an environment that can simulate scale without preparing the same number of hypervisors, from the following points of view: ● database access ● RPC over RabbitMQ. These are control-plane-specific loads, so we can use this environment to tune the OpenStack control plane
  22. Future Challenge 1: Scale emulation environment. Approach: ● implement fake agents (nova-compute, neutron-agent) ● use containers instead of actual hypervisors. The real environment is a control plane orchestrating 600 hypervisors, each running nova-compute and neutron-agent; the scale environment reuses the same control plane against 600 fake HVs, each a Docker container running nova-compute and neutron-agent. It is easy to add new fake HVs => we can emulate any scale
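One way to realize such a fake hypervisor is Nova's in-tree fake virt driver, which registers a compute node and accepts instance lifecycle RPCs without launching any real guests, so each containerized nova-compute generates realistic database and RabbitMQ load only. A minimal sketch of the nova.conf fragment (option name as in Mitaka-era Nova; verify against your version; the slide does not say this is the exact mechanism LINE uses):

```ini
[DEFAULT]
# Use Nova's in-tree fake hypervisor driver: nova-compute registers
# itself and answers scheduling/lifecycle RPCs, but boots no guests.
compute_driver = fake.FakeDriver
```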
  23. Future Challenge 2: Communication visualizing. There are two types of communication among the OpenStack microservices: ● RESTful API between components (e.g. Authentication/Keystone, VM/Nova, and Network/Neutron calling each other) ● RPC over the messaging bus inside a component (e.g. neutron-agent to neutron-server)
  24. Future Challenge 2: Communication visualizing. Any of this communication can break at any time: it can fail because of scale or because of improper configuration, and errors sometimes propagate from one component to another
  25. Future Challenge 2: Communication visualizing (continued). 1. Such issues are very difficult to troubleshoot because errors propagate from one component to another, logs do not always contain enough information, and logs only appear after something has happened. 2. Sometimes the problem can be predicted from metrics such as how many RPCs were received and how many RPCs are waiting for a reply
  26. Future Challenge 2: Communication visualizing. Plan: have a monitoring tool collect communication-related metrics for both the RESTful APIs between components and the RPCs inside each component
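The two metrics named on slide 25 (RPCs received, RPCs still waiting for a reply) can be kept by a small in-process counter wrapped around each handler. This is a toy sketch of that idea, not LINE's implementation; a real deployment would export the snapshot to a monitoring tool such as Prometheus.

```python
import threading

class RpcMetrics:
    """In-process counters for RPC traffic: total received, and how
    many calls are currently in flight (waiting for a reply)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.received = 0
        self.awaiting_reply = 0

    def on_receive(self):
        with self._lock:
            self.received += 1
            self.awaiting_reply += 1

    def on_reply(self):
        with self._lock:
            self.awaiting_reply -= 1

    def snapshot(self):
        with self._lock:
            return {"received": self.received,
                    "awaiting_reply": self.awaiting_reply}

def handle_rpc(metrics, handler, request):
    """Wrap an RPC handler so every call is counted, even on failure."""
    metrics.on_receive()
    try:
        return handler(request)
    finally:
        metrics.on_reply()

if __name__ == "__main__":
    m = RpcMetrics()
    handle_rpc(m, lambda req: req.upper(), "port_update")
    print(m.snapshot())  # {'received': 1, 'awaiting_reply': 0}
```

A growing `awaiting_reply` value with a flat `received` rate is exactly the early-warning signal the slide describes: the agent is accepting RPCs faster than it can answer them.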
  27. Future Challenge 3: Containerize OpenStack. Motivation/current pain points: ● complexity of packaging tools like RPM (dependencies between packages; configuration for new files) => we need to rebuild the RPM every time we change the code ● impossible to run different versions of OpenStack on the same server (they depend on common OpenStack libraries) => we actually deployed far more control-plane servers than we need ● lack of observability for the software running on the control plane (in the deployment scripts (Ansible, Chef, ...) there is no way to tell which part installs dependent libraries and which part installs our software; the scripts do not track software after it is deployed; and we cannot notice if a developer runs some temporary script)
  28. Future Challenge 3: Containerize OpenStack. Before: Ansible playbooks install the libraries and software (nova-api, neutron-server, common-library RPMs) on each server and start it. After: servers run nova-api containers, each bundling nova-api with its own common library, pulled from a Docker registry and driven by K8s manifests
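The "after" side of this slide can be sketched as a standard Kubernetes Deployment. The registry, image name, and tag below are hypothetical; only the shape of the manifest is standard Kubernetes.

```yaml
# Sketch of a K8s manifest for one containerized control-plane service.
# Each image bundles the service with its own copy of the common
# library, so different OpenStack versions can share a server.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nova-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nova-api
  template:
    metadata:
      labels:
        app: nova-api
    spec:
      containers:
      - name: nova-api
        # Hypothetical registry/image/tag for illustration.
        image: registry.example.com/openstack/nova-api:mitaka-line
        ports:
        - containerPort: 8774  # nova-api default port
```

Because the image is the unit of deployment, "which libraries were installed and what is actually running" becomes observable from the cluster state instead of being buried in deployment scripts.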
  29. Future Challenge 4: EventHub for all components. [The private cloud component diagram from slide 4: OpenStack, Loadbalancer, Kubernetes, Storage, Database, Messaging, Function, Baremetal, Operation Tools.]
  30. Future Challenge 4: EventHub for all components. The components depend on each other: some component or operation script wants to do something when a user (actually a project) in Keystone is deleted, when a VM is created, or when a real server is added to a load balancer
  31. The Pub/Sub concept in a microservice architecture. Each component (Authentication, VM, Network) publishes the important events of its own component to the messaging bus (RabbitMQ) and subscribes only to the events it is interested in. The publishing component does not have to consider which components it needs to work with; a subscribing component can act whenever an interesting event happens
  32. The Pub/Sub concept in a microservice architecture (continued). This mechanism allows us to extend the private cloud (microservices) in the future without changing existing code
  33. Future Challenge 4: EventHub for all components. This notification logic is already implemented in OpenStack: Keystone and Nova publish events to the messaging bus (RabbitMQ), and operation scripts, the L7LB, and Kubernetes subscribe to them. However, every publisher and subscriber carries its own RabbitMQ access logic alongside its business logic
  34. Future Challenge 4: EventHub for all components (continued). The problems: ● the RabbitMQ access code sometimes grows bigger than the actual business logic ● every component and script has to implement that access logic first
  35. Future Challenge 4: EventHub for all components. We are currently developing a new component (Function as a Service style) that lets us register a program together with the events it is interested in: publishers keep publishing to RabbitMQ, while subscribers supply only business logic. It will make it much easier to cooperate with other components
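The core of the "register a program with the events it is interested in" idea can be shown with a toy hub. In the real system events would arrive over RabbitMQ and handlers would run as functions on the FaaS layer; here dispatch is a plain in-process call, and the event names are made up for illustration.

```python
from collections import defaultdict

class EventHub:
    """Toy event hub: subscribers register a handler per event type and
    never touch the messaging layer themselves. The hub owns all the
    transport logic that each script previously had to reimplement."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver only to handlers interested in this event type.
        for handler in self._handlers[event_type]:
            handler(payload)

if __name__ == "__main__":
    hub = EventHub()
    deleted = []
    # An operation script supplies business logic only.
    hub.subscribe("identity.project.deleted",
                  lambda e: deleted.append(e["project_id"]))
    hub.publish("identity.project.deleted", {"project_id": "p-123"})
    print(deleted)  # ['p-123']
```

This is why the slide calls it a platform: once transport lives in one place, adding a new reaction to "project deleted" or "VM created" means registering one more handler, not writing one more RabbitMQ client.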
  36. For the further future: IaaS to PaaS, CaaS, .... We are currently trying to introduce an additional abstraction layer on top of IaaS. ● https://engineering.linecorp.com/ja/blog/japan-container-days-v18-12-report/ ● https://www.slideshare.net/linecorp/lines-private-cloud-meet-cloud-native-world
  37. Take a glance at "K8s on OpenStack"
  38. Many container-related projects have started in LINE. Published: ● https://www.slideshare.net/linecorp/parallel-selenium-test-with-docker ● https://www.slideshare.net/linecorp/test-in-dockerized-system-architecture-of-line-now-line-now-docker ● https://www.slideshare.net/linecorp/local-development-environment-for-micro-services-with-docker ● https://www.slideshare.net/linecorp/clova-92916456 (Japanese only). More projects are ongoing
  39. Currently, application engineers maintain it themselves. Developers A in Japan and Developers B in Taiwan each run their own Kubernetes clusters (containers on VMs and baremetal, each with its own OS) on top of the private cloud. The responsibility border sits at the IaaS layer: the private cloud developers provide IaaS, and everything above it, including Kubernetes, is the application developers' responsibility
  40. Operating knowledge is distributed: each developer team accumulates its own Kubernetes knowledge above the responsibility border. Problems: ● there is no mechanism to share knowledge between teams ● quality will be uneven ● every new team starts from beginner level
  41. Time to extend our responsibility from IaaS to KaaS: the private cloud developers take over running Kubernetes for the application teams, moving the responsibility border up so that operating knowledge concentrates in the private cloud team
  42. Rancher 2.x-based KaaS. API for cluster operations: create a cluster, modify a cluster, add nodes. Using a cluster: deploy applications, update applications, scale out applications, .... Cluster management: create/modify clusters and nodes; monitor and heal clusters/nodes
