You Need Cloud to Manage Cloud: Kubernetes
As Best Way to Manage OpenStack Cloud
Vadim Ponomarev
2
What Is OpenStack?
o open-source cloud computing platform
o created by Rackspace and NASA in 2010
o written in Python
o modular and microservices architecture
o used for public/private clouds
3
Why do we need OpenStack?
o open-source self-hosted solution for private / public clouds
o VMware alternative with zero price tag
o strong network isolation
4
OpenStack architecture
5
OpenStack architecture
6
What's the problem?
o hundreds of microservices
o hundreds of bare-metal servers
o a huge Python codebase
o a full update at least twice a year (upstream release period)
7
8
Why Kubernetes?
✓ built to manage thousands of microservices
✓ can scale to hundreds of nodes
✓ containerization solves the problem with dependencies
✓ self-healing, high availability, healthchecks
✓ and many other benefits ...
9
But ...
o OpenStack is not just an application
o VMs will be running on k8s workers
o OpenStack has its own network stack
o a complicated order of starting services
o a storage based on Ceph
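To make the "complicated order of starting services" point concrete, here is a minimal sketch that derives one valid start order from a dependency map. The service names are real OpenStack components, but the dependency edges are an illustrative assumption, not something stated in the talk.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must be up before it.
# The edges are an illustrative assumption, not taken from the talk.
DEPENDS_ON = {
    "mariadb": set(),
    "rabbitmq": set(),
    "keystone": {"mariadb"},
    "glance": {"keystone", "mariadb"},
    "neutron": {"keystone", "mariadb", "rabbitmq"},
    "nova": {"keystone", "glance", "neutron", "mariadb", "rabbitmq"},
}


def start_order(depends_on):
    """Return one valid start order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


order = start_order(DEPENDS_ON)
print(order)
```

In a Kubernetes deployment this ordering is usually not driven by a central script; each pod can instead wait for its own dependencies (for example in an init container) before its main container starts.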
10
Moreover
11
And how does K8s help here?
12
General Tips
o do not reinvent the wheel
o use openstack-helm
o use official Docker images when possible
o run all the OpenStack services in one namespace
o RTFM (if you can find it)
13
Database
o Percona XtraDB Cluster with k8s operator
o separate database cluster for Neutron (network system)
o use fast SSDs if cluster > 50 compute nodes
o monitoring
14
Storage system
o Ceph is the most popular
o one Ceph cluster for k8s and for OpenStack (different pools)
o a separate physical network for storage
o dedicated storage hosts if you have the budget:
+ to reduce load
+ to reduce chances of losing data
+ to have faster reboot of compute nodes
15
How the OpenStack network works
o SDN OpenvSwitch / OVN
o L2: VXLAN / Geneve / VLAN
o L3: virtual routers / OVN
o dnsmasq DHCP / DNS
o service called “Neutron”
16
Network challenges
1. How does OpenvSwitch/OVN configure the host system?
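The speaker notes for this slide add that Kubernetes networks must not conflict with OpenStack networks on the same host. A minimal sketch of such a check follows; every CIDR and range name below is a made-up example value, not a recommendation from the talk.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan; every CIDR here is an example value.
RANGES = {
    "k8s-pods": "10.244.0.0/16",
    "k8s-services": "10.96.0.0/12",
    "openstack-tenant": "10.0.0.0/16",
    "openstack-external": "192.0.2.0/24",
}


def find_overlaps(ranges):
    """Return the pairs of range names whose CIDRs overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


overlaps = find_overlaps(RANGES)
print(overlaps or "no overlapping ranges")
```

Running a check like this against the real address plan before deployment is cheaper than debugging routing conflicts on a live host.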
17
Network challenges
2. External networks are VLAN-based only
[Diagram: Nodes 1, 2, and 3 connect to a TOR switch over four networks: Management Network, API Network, Data Net (VXLAN), and External Net (VLAN). Traffic to an unknown target causes an unknown unicast flood.]
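To show why unknown unicast traffic reaches every node in a VLAN, here is a minimal model of a Layer 2 learning switch. It is a simplified sketch: the class, port names, and MAC addresses are hypothetical, and real switches also age out entries and handle VLAN tags.

```python
class LearningSwitch:
    """Minimal model of a Layer 2 switch with a MAC learning table."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port where it was last seen

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port  # learn the sender's location
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]  # known unicast: one port
        # Unknown unicast: flood to every port except the ingress port.
        return [p for p in self.ports if p != in_port]


tor = LearningSwitch(ports=["node1", "node2", "node3", "uplink"])

# The destination has never been seen, so the frame goes to all nodes.
flooded = tor.receive("uplink", "aa:aa:aa:aa:aa:01", "bb:bb:bb:bb:bb:02")
print(flooded)

# Once the destination has sent a frame, traffic to it uses a single port.
known = tor.receive("node2", "bb:bb:bb:bb:bb:02", "aa:aa:aa:aa:aa:01")
print(known)
```

Traffic aimed at addresses that never answer stays "unknown" forever, so every frame of such a flood is copied to every port in the VLAN.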
18
Network Tips: OVS
1. OpenvSwitch daemon:
o host network
o capabilities
o run as root
o mount the /run directory from the host system
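As a rough illustration of these settings, the sketch below builds a Kubernetes pod spec fragment as a Python dictionary. The capability names come from the speaker notes for this slide (NET_ADMIN, SYS_MODULE, SYS_NICE); the function name, container name, and image reference are hypothetical, and this is not the manifest used by openstack-helm.

```python
import json


def ovs_pod_spec(image):
    """Build a pod spec fragment for an OpenvSwitch daemon container."""
    return {
        "hostNetwork": True,  # share the host's network namespace
        "containers": [
            {
                "name": "openvswitch",  # hypothetical container name
                "image": image,
                "securityContext": {
                    "runAsUser": 0,  # run as root
                    "capabilities": {
                        "add": ["NET_ADMIN", "SYS_MODULE", "SYS_NICE"]
                    },
                },
                "volumeMounts": [{"name": "run", "mountPath": "/run"}],
            }
        ],
        # Expose the host's /run directory to the container.
        "volumes": [{"name": "run", "hostPath": {"path": "/run"}}],
    }


# The image reference is a placeholder, not a real registry path.
spec = ovs_pod_spec("example.invalid/openvswitch:latest")
print(json.dumps(spec, indent=2))
```

Granting three named capabilities instead of running the container as fully privileged keeps the daemon's access to the host narrower.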
19
Network Tips: spine-and-leaf TOR VLANs
2. External networks:
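[Diagram: spine-and-leaf topology. Nodes 1 and 2 connect to one pair of leaf switches (VLAN 100 and VLAN 200); Nodes 3 and 4 connect to a second pair (VLAN 300 and VLAN 400). Both pairs connect to two spine switches. The node-to-leaf level is labeled Layer 2 and the level above it Layer 3.]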
20
Network Tips: spine-and-leaf without VLANs
2. External networks:
[Diagram: the same spine-and-leaf topology without VLANs: Nodes 1-4, two pairs of leaf switches, and two spine switches. Labels: Layer 3 (BGP), Layer 3 (BGP), Layer 2.]
21
Network Tips: solutions
o use the segments extension and per-rack VLANs
o use the BGP dynamic routing plugin
o use DVR routers when possible
o use an EVPN-VXLAN network in the data center
22
Compute
o Nova configures KVM on the host system
o VMs can have direct access to network/GPU cards
o Privileged libvirt container
o Mounts from the host system:
+ /lib/modules
+ /var/lib/nova
+ /var/lib/libvirt
+ /run
+ /sys/fs/cgroup
23
Compute
o State directories with RW access from all the hosts for migrations
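The libvirt container setup from the Compute slides can be sketched as a pod spec fragment generated from the list of host paths. This is a minimal sketch under stated assumptions: the function names, container name, and image reference are hypothetical, and the real openstack-helm chart sets more options than shown here.

```python
import json

# Host directories listed on the Compute slide. /sys/fs/cgroup is the
# usual cgroup mount point on Linux.
HOST_PATHS = [
    "/lib/modules",
    "/var/lib/nova",
    "/var/lib/libvirt",
    "/run",
    "/sys/fs/cgroup",
]


def volume_name(path):
    """Turn '/var/lib/nova' into 'var-lib-nova' for use as a volume name."""
    return path.strip("/").replace("/", "-")


def libvirt_pod_spec(image):
    """Build a pod spec fragment for a privileged libvirt container."""
    return {
        "containers": [
            {
                "name": "libvirt",  # hypothetical container name
                "image": image,
                "securityContext": {"privileged": True},
                "volumeMounts": [
                    {"name": volume_name(p), "mountPath": p}
                    for p in HOST_PATHS
                ],
            }
        ],
        "volumes": [
            {"name": volume_name(p), "hostPath": {"path": p}}
            for p in HOST_PATHS
        ],
    }


# The image reference is a placeholder, not a real registry path.
libvirt_spec = libvirt_pod_spec("example.invalid/libvirt:latest")
print(json.dumps(libvirt_spec, indent=2))
```

Because the state directories live on the host rather than in the container, restarting the libvirt pod does not discard the state of running VMs.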
24
Is OpenStack ready?
o Bad or non-existent healthchecks
o No graceful restart
o Multiline logs (no json support!)
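Multiline logs are a problem because most log collectors treat each physical line as one event, so a traceback gets split into many unrelated records. The sketch below shows the general idea of a single-line JSON formatter using only the Python standard library. It is a generic illustration, not how OpenStack services are configured; the class name, logger name, and messages are hypothetical.

```python
import io
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Format each record, including its traceback, as one JSON line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Newlines inside the traceback are escaped by json.dumps,
            # so the whole event stays on a single physical line.
            entry["traceback"] = self.formatException(record.exc_info)
        return json.dumps(entry)


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonLineFormatter())

log = logging.getLogger("demo.compute")  # hypothetical logger name
log.addHandler(handler)
log.propagate = False

try:
    raise RuntimeError("connection lost")
except RuntimeError:
    log.exception("instance boot failed")

output = stream.getvalue()
print(output, end="")
```

With this shape, a collector can parse every line independently and still keep the traceback attached to the event that produced it.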
25
Is OpenStack ready?
o Bad monitoring abilities
o Complex dependencies between components
o Difficult to customize images with components
26
If everything is so bad, why K8s?
o Anyway, K8s gives better control over hundreds of services
o It gives more stability with updates
o Self-healing, HA, isolation, etc.
o It’s easier to control at a large scale
o K8s is more popular than OpenStack
Leave your feedback!
You can rate the talk and give feedback on what you've liked or what could be improved
https://www.linkedin.com/in/v-pon/
@velizarx
https://github.com/velp

You need Cloud to manage Cloud: Kubernetes as best way to manage OpenStack cloud (ENG, HighLoad++ Armenia 2022)

Editor's Notes

  1. Hi. My name is Vadim. For the last 5 years, I have been working with OpenStack-based clouds as a developer, DevOps engineer, and architect. Today I want to talk about the problems you can face with them.
  2. I want to start with a quick overview of what OpenStack is. OpenStack is the most popular open-source solution for building your own cloud. The first version was released by Rackspace and NASA more than 10 years ago. OpenStack is written mainly in Python and has a modular architecture. Nowadays it is used for private and public clouds around the world and has a huge community.
  3. OpenStack is an open-source solution for private or public clouds, especially when you need a self-hosted cloud. The main alternative is VMware, but OpenStack is free. OpenStack also has strong network isolation, unlike other solutions, so you can use it as a platform for customers.
  4. OpenStack is a large system made up of tens of separate services, each of which consists of hundreds of microservices running across bare-metal servers. This is an architecture diagram of the basic services from the official OpenStack website, and it is greatly simplified.
  5. Under the hood, there are many microservices, databases, message queues, schedulers, workers, etc. All these services interact with each other over different protocols. The diagram on this slide is also highly simplified.
  6. Let's summarise what an OpenStack-based cloud is: hundreds of microservices deployed on hundreds of bare-metal servers, all written in Python, and everything has to be updated at least twice a year (the official OpenStack release cycle).
  7. And if you have an OpenStack-based cloud on bare metal, your DevOps team looks like this every release.
  8. So, why Kubernetes? Kubernetes was created to manage thousands of microservices deployed on hundreds or even thousands of bare-metal hosts. Packaging services in containers resolves the dependency problems that arise when Python microservices run on the same host. You also get the rest of the benefits of Kubernetes: self-healing, high availability, health checks, and so on.
  9. But you can't just deploy OpenStack into a Kubernetes cluster, because it is not a simple web application. OpenStack is a cloud platform that has to run customer VMs on the hosts and provide storage, networking, and other infrastructure for them. That means libvirt is used for virtualization, Ceph is used as the storage system, and the network system configures interfaces, bridges, and filters. Some of these systems need root access to the host system.
  10. Moreover, the OpenStack network stack is really complicated, and you usually need at least 5 isolated networks: one for management (number 1 on the slide), one for public traffic (number 2), an internal cluster network for cross-service communication (number 3), one for private traffic between VMs (number 4), and one for storage. In addition, different types of nodes run different sets of services and perform different tasks.
  11. And how does Kubernetes help here?
  12. I want to start with general tips. First of all, please do not reinvent the wheel. The OpenStack community has already built Helm charts and resolved a lot of problems there. You can find detailed documentation for them via the QR code. The community has also taken care of Docker images and created lightweight images for all services. If you are not going to change the service code, you can use them as they are. The QR code for the repository is also on the slide. It's also easier to use one Kubernetes namespace for all OpenStack components, except maybe Ceph. And of course, you have to read the documentation. OpenStack is a well-documented ecosystem, and you will find answers to most of your questions. The documentation has only one problem: it is generated from each project's repository, and the search engine is really bad. So this is kind of challenging.
  13. About the database for your cluster. Based on my experience, I recommend MySQL as the database for your OpenStack installation, because it is the most popular database in the community and most of the bugs and problems have already been fixed. I also recommend running a separate database cluster at least for Neutron. Neutron is the service that configures networks in OpenStack, and it can generate many heavy queries. In addition, the community sometimes makes mistakes: I have already seen a new Neutron release with bugs in its database queries that left the database stuck in deadlocks. It is also a good idea to use fast disks for the database in clusters with more than 50 compute nodes, because OpenStack components can generate many queries. Monitoring the databases is really important; it will help you find problems when the next release comes.
  14. About the storage system: Ceph is the most popular distributed storage solution in the OpenStack world, and the community has a lot of experience with it. You can use one Ceph cluster for Kubernetes and OpenStack at the same time, but you have to create separate pools. You need a separate physical network for storage, because VM creation and VM migration between hosts generate huge network traffic. It is better to use at least 10 Gb network interfaces, especially for storage. Dedicated storage nodes are also a great idea if you have the budget: they reduce the load, reduce the chances of losing data, and avoid problems when rebooting compute nodes.
  15. The network layer in OpenStack is really complicated, so I want to give a quick overview. The default OpenStack network stack is based on several technologies. Layer 2 is based on OpenvSwitch, with VXLAN or Geneve tunnels between nodes and VLANs for external networks. Layer 3 is virtual routers, which can be implemented as Linux network namespaces or as OVN routers. OVN is a complete SDN solution created by the OpenvSwitch team; it includes virtual routers and some additional features. You may know the Kubernetes CNI based on this project, and OpenStack can also use it for its network system. Other network services such as DHCP and DNS are provided by dnsmasq daemons. If you want to go deeper, scan the QR code for more information. To draw an analogy with AWS, an OpenStack private network is like an AWS VPC and an OpenStack virtual router is like an AWS gateway.
  16. The network system brings several challenges. On the slide, you can see a simple diagram of how OVS works. The first point: OpenvSwitch works with the Linux kernel on the host system to configure traffic flows. That means the OpenvSwitch daemons, which run inside containers, should be able to load kernel modules on the host system. They should also be able to configure bridges and interfaces on the host system. In addition, Kubernetes networks must not conflict with OpenStack networks on the host, so you have to allocate separate network ranges to the different layers.
  17. The second challenge: OpenStack supports only VLAN-based external networks. That means all networks configured for public traffic run on raw Layer 2. This is a big problem in a large cluster, because you have to configure the VLAN on every node, and any abnormal traffic spreads everywhere in that VLAN, breaking the stable operation of the entire cluster. On the right side of the slide, you can see a simple diagram of how unknown unicast spreads in a VLAN network: top-of-rack switches flood unknown unicast traffic. So any DDoS or flood traffic is more dangerous in such networks, and the problem gets worse as your cloud grows.
  18. So how do you run the OpenStack network subsystem correctly in Kubernetes? To run the OpenvSwitch daemon correctly, you have to grant three capabilities: NET_ADMIN (to allow network configuration on the host system), SYS_MODULE (because OpenvSwitch loads its own kernel modules), and SYS_NICE (for better performance). In addition, you have to connect the container to the host network and mount the /run directory from the host system. The QR code leads to a short video explaining how it works.
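The requirements above map onto standard Kubernetes Pod fields (`hostNetwork`, `securityContext.capabilities.add`, and a `hostPath` volume). The following is a rough sketch that builds such a manifest as a Python dictionary. The pod name, container name, and image are hypothetical placeholders, and a real deployment would more likely wrap this in a DaemonSet; that is an assumption, not something stated in the talk.

```python
import json


def ovs_pod_manifest(image):
    """Build a Pod manifest with the settings listed for OpenvSwitch."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "openvswitch"},  # hypothetical name
        "spec": {
            # Share the host network namespace with the container.
            "hostNetwork": True,
            "containers": [
                {
                    "name": "openvswitch",  # hypothetical name
                    "image": image,
                    "securityContext": {
                        "capabilities": {
                            # The three capabilities named in the talk.
                            "add": ["NET_ADMIN", "SYS_MODULE", "SYS_NICE"]
                        }
                    },
                    # Mount the host's /run directory into the container.
                    "volumeMounts": [{"name": "run", "mountPath": "/run"}],
                }
            ],
            "volumes": [{"name": "run", "hostPath": {"path": "/run"}}],
        },
    }


if __name__ == "__main__":
    # The image reference is a placeholder, not a real image.
    print(json.dumps(ovs_pod_manifest("example.invalid/openvswitch:latest"), indent=2))
```

The JSON output can be passed to `kubectl apply -f -`, since Kubernetes accepts JSON manifests as well as YAML.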
  19. Now, about the problem of VLAN-based external networks. A good approach is to keep the layer 2 segment from going further than the Top-of-Rack switch. In this diagram, we have VLANs only between a node and the leaf switches, and traffic does not go beyond the Top-of-Rack switch. But between the left and right pairs of leaf switches, the switch fabric must set up tunnels or routing to provide connectivity from one node to another. This, of course, requires a special configuration of the network devices.
  20. The ideal situation is when you have no layer 2 segments in the data center network at all. This means that each node works like a router: it handles all layer 2 traffic from the VMs and then routes it. This is not always possible, as it requires a more complicated data center network configuration. But there are several options for reducing the size of the L2 segment.
  21. Here are the options provided by the OpenStack community itself. The first is the segmentation extension, which allows you to set and control different VLANs for different parts of your network. This means you can use segmentation as in the first diagram, where we had VLANs only between nodes and leaf switches. Neutron also provides a BGP dynamic routing plugin, which adds one more agent to each node but allows you to announce subnets directly from the nodes. An analogy from the Kubernetes world is the BGP mode that is nowadays provided by popular CNIs like Cilium or Kube-router. This can be a silver bullet for reducing the layer 2 segments. The last two options most likely require changes to the data center network. The first is the distributed virtual router, or DVR. This is a type of routing configuration that allows you to create a hyper-converged cloud, but it requires configuring VLANs and BGP everywhere on the Top-of-Rack switches. The other option, and the best solution in most cases, is to use an EVPN VXLAN fabric in your data center network. But it will be more expensive and harder to maintain in the future.
  22. Let's move on to the compute system. Nova is the OpenStack service that configures virtual machines on hosts. Typically it works with Libvirt, which requires extended access to the host system. OpenStack also supports different types of VMs: a simple VM, a GPU VM that has direct, unrestricted access to one GPU card, or a VM with direct access to a network card to run network functions. Libvirt containers have to run as privileged containers and need access to many directories on the host system. On the slide, you can see a minimal list of the directories that have to be mounted into the container.
  23. In addition, for virtual machine migrations, Nova requires a shared directory accessible from all nodes. The good news is that it is used only for small temporary files, so even NFS can be used for this purpose. Usually, there are no problems with data loss or conflicts.
  24. Even after all this, you need to keep in mind that OpenStack is not ready for Kubernetes out of the box. OpenStack components were developed to run on bare-metal servers. As a result, we do not have real health checks that can confirm that a service is running correctly. Not all of the services support graceful restart, which means that a pod restart can sometimes leave the system in an inconsistent state, and after the restart the system will run a full synchronization just to make sure everything is OK. Also, most of the components can generate multiline logs, which makes debugging much more difficult. For example, the screenshot below shows a single error that takes up more than 10 lines in the log.
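One common way to tame multiline logs is to merge continuation lines back into the record they belong to before shipping or searching them. The sketch below assumes, purely for illustration, that every new record starts with a `YYYY-MM-DD HH:MM:SS` timestamp and that continuation lines (for example, traceback lines) do not. Real OpenStack log formats depend on each service's logging configuration, so the pattern would need to be adjusted; the sample lines are invented.

```python
import re

# Assumption: a new log record begins with a timestamp like "2024-01-01 10:00:00".
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def group_records(lines):
    """Merge continuation lines into the preceding log record."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if RECORD_START.match(line) or not records:
            # A timestamp (or the very first line) opens a new record.
            records.append(line)
        else:
            # Anything else is treated as a continuation of the last record.
            records[-1] += "\n" + line
    return records


# Invented sample input, not real OpenStack output.
SAMPLE = [
    "2024-01-01 10:00:00 INFO service started",
    "2024-01-01 10:00:01 ERROR request failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 1, in <module>',
    "ValueError: boom",
    "2024-01-01 10:00:02 INFO recovered",
]

if __name__ == "__main__":
    for record in group_records(SAMPLE):
        print(repr(record))
```

Log shippers typically offer an equivalent multiline setting, so in practice this logic usually lives in the collector configuration and not in custom code.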
  25. This can be difficult if you are dealing with OpenStack for the first time. After installation, you have to monitor this complicated system, but OpenStack does not provide its own exporters or other monitoring systems. The OpenStack community provides a basic exporter that collects general information about the cloud, such as the number of running VMs, IP addresses, and so on. You have to build your own monitoring around the components if you want to understand the real state of your cloud. Many components depend on other OpenStack components, which means that deploying and running OpenStack from scratch fully automatically is impossible: you will have to fix some deadlocks manually and re-run some services. You also have to control the order in which you deploy a new version of the OpenStack components. And if you want to add something like monitoring to a container, you have to create a large and complicated pipeline for component builds, because usually you can re-use only a basic Docker image.
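The deployment-order problem can be approached as a topological sort over a dependency graph, which also surfaces circular dependencies of the kind that cause deadlocks. Below is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The dependency map is a simplified illustration and not an authoritative list of OpenStack service dependencies; the names `database` and `message-queue` are generic placeholders.

```python
from graphlib import CycleError, TopologicalSorter

# Illustrative map: each component lists what must be running before it.
DEPENDS_ON = {
    "database": set(),
    "message-queue": set(),
    "keystone": {"database"},
    "glance": {"keystone", "database"},
    "neutron": {"keystone", "database", "message-queue"},
    "nova": {"keystone", "glance", "neutron", "database", "message-queue"},
}


def deploy_order(depends_on):
    """Return components in an order that satisfies every dependency."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


if __name__ == "__main__":
    print(" -> ".join(deploy_order(DEPENDS_ON)))
```

A cycle in the map raises an error instead of producing an order, which mirrors the situation where manual intervention is needed.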
  26. So why do we need Kubernetes in this case? Running OpenStack in Kubernetes is much better than running it on bare-metal servers, even if we have to fight with these problems. Kubernetes gives you more stability and control over a system that contains hundreds of components. Deployments become more stable and predictable, and the advantages grow as the infrastructure grows. And of course, it's easier to find a DevOps engineer who has worked with Kubernetes than an admin who has set up and supported OpenStack on bare metal. If you know both technologies, please let me know after the Q&A session: I can offer you an interesting job.
  27. I hope that my talk will be useful to you and save you time. My contacts are on the slide, and you can write to me with any questions; I will try to help. You can also see the QR code where you can rate my talk. And we have some time for questions.