The things I wish I would have known before doing OpenStack Cloud Transformation
\o/ \o/
Feel free to interrupt us at any time!
Dai, Dang Van (@daikk115), Cloud Engineer
Duc, Nguyen Cong (@ducnc), Cloud Engineer
Once Upon a Time....
★ We had a new OpenStack Rocky cloud cluster, deployed with Kolla-Ansible
★ We needed to start the cloud transformation immediately
★ Not only transformation but also integration
The system was increasingly complex
➔ CPU: Broadwell, Skylake, ...
➔ Servers and switches: Dell, HP, Cisco, ...
➔ HBA: QLogic, Emulex
➔ Storage: Ceph, SAN(s), NAS(s)
➔ Hundreds of services moving into the cloud
We spent a year dealing with most of these cases and reducing the complexity!
We don’t do that here!
We do that in our production!
(1) Unify CPU Model on Compute for Live Migration
Refers: https://www.bleepingcomputer.com/news/software/list-of-links-bios-updates-for-the-meltdown-and-spectre-patches/
❖ BIOS Version (U41)V521 vs. BIOS Version (U41)V519
❖ Live migration failure:
libvirtError: operation failed: guest CPU doesn't match specification: missing features: spec-ctrl,stibp
❖ CVE-2017-5703, CVE-2017-5715
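The libvirt error above names the CPU features that the destination host lacks. A minimal shell sketch of the same comparison follows, assuming the feature list of each host has already been collected as a space-separated string (for example from `virsh capabilities`). The function name `missing_flags` and the sample lists are illustrative only; they are not part of nova or libvirt.

```shell
# Print every flag that appears in the source list but not in the destination list.
# $1: space-separated CPU features of the source host
# $2: space-separated CPU features of the destination host
missing_flags() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;                 # destination also has this flag
            *) printf '%s\n' "$flag" ;;     # destination lacks this flag
        esac
    done
}

# Sample lists, made up for illustration
src_flags="sse4.2 avx spec-ctrl stibp"
dst_flags="sse4.2 avx"
missing_flags "$src_flags" "$dst_flags"
```

Any flag printed here is a candidate reason for a "guest CPU doesn't match specification" failure when migrating from the first host to the second.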
❖ CPU model: SandyBridge, SandyBridge-IBRS, Broadwell, Skylake, ...
❖ OpenStack configurations
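One way to present the same virtual CPU on every compute is to pin the model in the `[libvirt]` section of nova.conf with `cpu_mode = custom` and `cpu_model`. The sketch below only prints such a fragment. The helper name `render_cpu_conf` is hypothetical, and the model passed in is an assumption: it has to be one that the oldest CPU in the group can provide.

```shell
# Print a nova.conf fragment that pins the guest CPU model.
# $1: libvirt CPU model name shared by every compute in the group
render_cpu_conf() {
    printf '[libvirt]\ncpu_mode = custom\ncpu_model = %s\n' "$1"
}

render_cpu_conf SandyBridge
```

With Kolla-Ansible, a fragment like this would normally go into an override file such as /etc/kolla/config/nova.conf and be applied with a reconfigure run; treat that path as an assumption about your deployment layout.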
❖ Computes of the same type should run the same BIOS and firmware versions
❖ Check the flags and CPU model mapping in /usr/share/libvirt/cpu_map.xml
❖ Similar to
➢ VMware EVC (Enhanced vMotion Compatibility) (cluster level)
➢ Hyper-V CPU Compatibility Mode (VM level)
Refers:
https://kb.vmware.com/s/article/1005764#What%20is%20EVC
https://www.altaro.com/hyper-v/configure-cpu-compatibility-mode-hyper-v/
(2) Clustering compute nodes by Host Aggregate
[Diagram: host aggregates HA 01 and HA 02; computes attached to SAN storage and computes attached to Ceph, each backend with its own Cinder Volume; volumes move between the two backends by retype]
openstack flavor set FLAVOR_NAME \
  --property aggregate_instance_extra_specs:cpu_model=Custom_IvyBridge
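The flavor property above only has an effect if a host aggregate carries the matching `cpu_model` metadata and the scheduler has `AggregateInstanceExtraSpecsFilter` enabled. The sketch below prints, without running them, the `openstack aggregate` commands that would build such an aggregate. The helper name, the aggregate name `HA01` and the host names are hypothetical.

```shell
# Print (do not run) the commands that create an aggregate matching the flavor property.
# $1: aggregate name, $2: cpu_model value, remaining arguments: compute host names
aggregate_commands() {
    name=$1; model=$2; shift 2
    printf 'openstack aggregate create %s\n' "$name"
    printf 'openstack aggregate set --property cpu_model=%s %s\n' "$model" "$name"
    for host in "$@"; do
        printf 'openstack aggregate add host %s %s\n' "$name" "$host"
    done
}

aggregate_commands HA01 Custom_IvyBridge compute01 compute02
```

Printing the commands first makes it easy to review them before piping the output to a shell.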
(3) Slow down CPU during live migration
❖ Live migration process: options while a migration is running
➢ Abort: nova live-migration-abort
➢ Force-complete: nova live-migration-force-complete
➢ Auto-converge (throttles the guest's CPU speed): live_migration_permit_auto_converge=true in nova.conf
❖ OpenStack perspective: related nova.conf options
➢ live_migration_completion_timeout
➢ live_migration_downtime
➢ live_migration_downtime_steps
➢ live_migration_downtime_delay
Refers:
https://docs.openstack.org/nova/rocky/admin/live-migration-usage.html
https://rk4n.github.io/2016/08/10/qemu-post-copy-and-auto-converge-features/
https://docs.openstack.org/nova/rocky/configuration/config.html
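The options named in this section all live in the `[libvirt]` section of nova.conf. The sketch below prints a fragment that sets them together. The helper name is hypothetical and the numbers in the example call are placeholders, not recommendations; check units and defaults against the Rocky configuration reference before using any of them.

```shell
# Print a nova.conf fragment for live-migration tuning.
# $1: completion timeout, $2: maximum downtime, $3: downtime steps, $4: downtime delay
render_migration_conf() {
    cat <<EOF
[libvirt]
live_migration_permit_auto_converge = true
live_migration_completion_timeout = $1
live_migration_downtime = $2
live_migration_downtime_steps = $3
live_migration_downtime_delay = $4
EOF
}

# Placeholder values for illustration only
render_migration_conf 800 500 10 75
```

Auto-converge trades guest performance for a migration that finishes: the hypervisor throttles the guest's vCPUs so that memory is dirtied more slowly than it can be copied.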
(4) HAProxy limitation
❖ Limit of 2000 established connections per backend per thread
➢ Increase maxconn
➢ Run HAProxy with multiple threads
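Before raising the limit it helps to see where `maxconn` is already set. The sketch below reads an haproxy.cfg on standard input and prints each standalone `maxconn` directive together with the section it belongs to. The function name and the sample configuration are illustrative, and per-server `maxconn` values on `server` lines are deliberately not covered.

```shell
# Read haproxy.cfg from stdin; print "<section> <value>" for each standalone maxconn directive.
list_maxconn() {
    awk '
        $1 ~ /^(global|defaults|frontend|backend|listen)$/ {
            # Remember the current section, with its name when it has one
            section = (NF > 1) ? ($1 ":" $2) : $1
            next
        }
        $1 == "maxconn" { print section, $2 }
    '
}

# Sample configuration, made up for illustration
printf 'global\n    maxconn 40000\n    nbthread 4\nlisten nova_api\n    maxconn 10000\n' | list_maxconn
```

On a real node this would be run as `list_maxconn < /path/to/haproxy.cfg`. Sections that print nothing fall back to HAProxy's defaults; multi-threading is controlled by the `nbthread` directive in the `global` section on HAProxy versions that support it.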
(5) Speed up the network with multiple queues
❖ By default, a VM has only one combined queue
❖ Best practice: number of queues = number of vCPUs
Image source: https://blog.cloudflare.com/how-to-receive-a-million-packets/
Single queue:
# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 0
Combined: 1
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 1
Multiple queues:
# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 0
Combined: 8
Current hardware settings:
RX: 0
TX: 0
Other: 0
Combined: 8
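The two `Combined:` lines in the `ethtool -l` output above are the pre-set maximum and the current setting. A small sketch that extracts both from such output, so a guest can be checked from a script, is shown below. The function name and the sample input are illustrative.

```shell
# Read `ethtool -l` output on stdin; print "<pre-set maximum> <current>" combined channel counts.
combined_channels() {
    awk '$1 == "Combined:" { printf "%s%s", sep, $2; sep = " " } END { printf "\n" }'
}

# Sample input shaped like the ethtool output above (maximum 8, current 1)
printf 'Pre-set maximums:\nCombined: 8\nCurrent hardware settings:\nCombined: 1\n' | combined_channels
```

Inside a guest this would be used as `ethtool -l eth0 | combined_channels`. If the current value is lower than the maximum, `ethtool -L eth0 combined "$(nproc)"` raises it to the number of vCPUs, assuming the interface is named eth0 and the maximum allows it.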
| Number of iperf3 processes | Single queue speed (Gbits/sec) | Multiple queue speed (Gbits/sec) |
|---|---|---|
| 1 | 5.42 | 4.17 |
| 2 | 3.54 + 4.19 = 7.73 | 3.57 + 3.57 = 7.14 |
| 4 | 2.05 + 2.28 + 1.86 + 2.71 = 8.90 | 2.36 + 2.71 + 2.33 + 2.46 = 9.86 |
| 8 | 1.14 + 1.34 + 1.48 + 1.30 + 1.20 + 0.99 + 0.91 + 1.17 = 9.53 | 1.78 + 1.28 + 1.16 + 1.19 + 1.76 + 1.23 + 1.32 + 1.26 = 10.98 |

❖ iperf3 test server: 20 Gbit/s bond0, 10 GB RAM, 8 vCPUs, on a different compute
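The totals in the table are plain sums of the per-process iperf3 rates. The sketch below reproduces the 8-process row and expresses the difference as a percentage; the function names are illustrative.

```shell
# Add up the per-process rates given as arguments (Gbits/sec).
sum_gbits() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# Relative gain of multi-queue over single-queue, in percent.
# $1: single-queue total, $2: multi-queue total
gain_percent() {
    awk -v single="$1" -v multi="$2" 'BEGIN { printf "%.1f\n", (multi - single) / single * 100 }'
}

single=$(sum_gbits 1.14 1.34 1.48 1.30 1.20 0.99 0.91 1.17)
multi=$(sum_gbits 1.78 1.28 1.16 1.19 1.76 1.23 1.32 1.26)
echo "single=$single multi=$multi gain=$(gain_percent "$single" "$multi")%"
```

Applying the same calculation to the first row gives a negative number: with a single iperf3 process the multi-queue VM was slower (4.17 against 5.42 Gbits/sec), and the benefit only shows up from four parallel processes onward.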
openstack flavor set ${FLAVOR_NAME} \
  --property hw:vif_multiqueue_enabled='true'
openstack image set \
  --property hw_vif_multiqueue_enabled=true \
  ${IMAGE_NAME}
(6) Session Initiation Protocol
Source: https://www.researchgate.net/figure/Elements-of-session-initiation-protocol-network_fig25_301577956
[Diagram: VMs with eth0 and eth1 connected to the Data Center Network and the Mobile Packet Backbone Network; Real-time Transport Protocol packets sent with "Source IP: eth1" and "Source IP: eth0"; Port Security on the ports]
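By default, Neutron port security only passes packets whose source IP/MAC pair matches the port, so traffic that leaves one interface carrying the other interface's address is dropped. Two ways to relax this with standard `openstack port set` options are sketched below. The helpers only print the commands; the port name and the address are hypothetical.

```shell
# Print the command that lets a port send from one extra source address.
# $1: port name or ID, $2: extra IP address (or CIDR) the VM may send from
allow_address_command() {
    printf 'openstack port set --allowed-address ip-address=%s %s\n' "$2" "$1"
}

# Print the command that turns port security off entirely for a port.
# Security groups have to be removed in the same step.
# $1: port name or ID
disable_port_security_command() {
    printf 'openstack port set --no-security-group --disable-port-security %s\n' "$1"
}

allow_address_command sip-vm-eth0 192.0.2.10
disable_port_security_command sip-vm-eth0
```

Allowed address pairs keep the anti-spoofing protection for everything else, so they are usually the narrower choice; disabling port security removes filtering on that port altogether.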
(7) Directly reply from VM behind a load balancer
● Very fast load-balancing mode
● Load-balancer network bandwidth is no longer a bottleneck
● Total output bandwidth is the sum of the bandwidth of each backend
● The service VIP must be configured on a loopback interface on each backend and must not answer ARP requests
Refers: https://www.haproxy.com/blog/layer-4-load-balancing-direct-server-return-mode/
[Diagram: Client (192.168.122.1), Router, LB (10.0.0.254) and OPS VM (eth0 10.0.0.10, loopback 10.0.0.254); Port Security on the VM port]
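On each backend, direct server return needs the VIP on the loopback interface and ARP replies for it suppressed. The sketch below prints the usual Linux commands for that, using the VIP from the diagram. The helper name is hypothetical and the sysctl values are the commonly used ones for this mode, stated here as an assumption about your guests.

```shell
# Print (do not run) the backend-side commands for direct server return.
# $1: service VIP that the load balancer also owns
dsr_backend_commands() {
    vip=$1
    printf 'ip addr add %s/32 dev lo\n' "$vip"
    printf 'sysctl -w net.ipv4.conf.all.arp_ignore=1\n'
    printf 'sysctl -w net.ipv4.conf.all.arp_announce=2\n'
}

dsr_backend_commands 10.0.0.254
```

Because replies leave the VM with the VIP as their source address, the VM's Neutron port would also need that address permitted, for example with `openstack port set --allowed-address ip-address=10.0.0.254 <port>`, or port security disabled on that port.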
(8) Entropy affects Java Tomcat application startup time
❖ Java's secure libraries need to be fed entropy for session IDs and other purposes
❖ But newly booted VMs do not have enough of it
Machine that has been up for 1:39:
daikk115@daikk115 ~/Downloads $ uptime
00:15:31 up 1:39, 3 users, load average: 1,63, 1,47, 1,51
daikk115@daikk115 ~/Downloads $ cat /proc/sys/kernel/random/entropy_avail
3763
Machine that has been up for 1 min:
daikk115@daikk115 ~ $ uptime
00:26:50 up 1 min, 2 users, load average: 0,76, 0,29, 0,11
daikk115@daikk115 ~ $ cat /proc/sys/kernel/random/entropy_avail
838
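The two readings above (3763 after more than an hour of uptime, 838 one minute after boot) can be turned into a simple check that runs before a Tomcat service starts. The sketch below compares a reading against a threshold; the function name and the threshold of 1000 are assumptions chosen for illustration.

```shell
# Classify an entropy reading.
# $1: value read from entropy_avail, $2: minimum acceptable value
entropy_status() {
    if [ "$1" -lt "$2" ]; then
        echo low
    else
        echo ok
    fi
}

# The two readings from the slide, against an assumed threshold of 1000
entropy_status 838 1000
entropy_status 3763 1000
```

On a live system the first argument would come from `cat /proc/sys/kernel/random/entropy_avail`.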
❖ Fix in OS layer
➢ apt-get install haveged
➢ yum install haveged
Refers:
https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged
https://portal.cloudunboxed.net/knowledgebase/12/Speed-up-and-secure-cloud-servers-with-more-Entropy.html
https://lmgtfy.com/?q=cloud+haveged
❖ Fix in the virtualization layer
openstack flavor set FLAVOR-NAME \
  --property hw_rng:allowed=True \
  --property hw_rng:rate_bytes=2000 \
  --property hw_rng:rate_period=2000
openstack image set --property hw_rng_model=virtio IMAGE_NAME
Refers:
https://wiki.openstack.org/wiki/LibvirtVirtioRng
❖ Nova configuration
➢ Default: rng_dev_path = /dev/urandom
➢ Better: rng_dev_path = /dev/hwrng (depends on hardware)
❖ Red Hat's recommended strategy for HWRNG
➢ Use /dev/hwrng to feed /dev/random, then use /dev/random or /dev/urandom as the entropy source for VMs
Refers:
https://docs.openstack.org/nova/rocky/configuration/config.html#libvirt.rng_dev_path
Conclusion
❏ Unify the CPU model
❏ Cluster compute nodes with host aggregates
❏ Configure CPU slowdown during live migration
❏ Increase the default HAProxy maxconn on backends
❏ Enable multiple queues
❏ Port-level security: by default, only packets with an IP/MAC address pair known to OpenStack are allowed
❏ Entropy is very important, especially for scale-out systems running on the OPS Cloud platform

Meetup 23 - 01 - The things I wish I would have known before doing OpenStack Cloud Transformation

  • 1. 1 The things I wish I would have known before doing OpenStack Cloud Transformation
  • 2. 2 O/ O/ Feel free to interrupt us at anytim e! @daikk115 Dai, Dang Van Cloud Engineer @ducnc Duc, Nguyen Cong Cloud Engineer
  • 3. Once Upon a Time.... ★ We had new OpenStack Rocky Cloud cluster which was deployed by Kolla-ansible ★ Need to start Cloud transformation immediately ★ Not only transformation but also integration The system was increasingly complex ➔ CPU: Broadwell, Skylake,... ➔ SVR and SW: Dell, HP, Cisco,... ➔ HBA: QLogic, Emulex ➔ Storage: Ceph, SAN(s), NAS(s) ➔ Hundred of services jump into Cloud We spent a year dealing with most of the cases, decreasing the complex!
  • 4. We don’t do that here!
  • 5. We do that in our production!
  • 6. 6 (1) Unify CPU Model on Compute for Live Migration https://www.bleepingcomputer.com/news/software/list-of-links-bios-updates-for-the-meltdown-and-spectre-patches/ BIOS Version (U41)V521 BIOS Version (U41)V519 Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: spec-ctrl,stibp: libvirtError: operation failed: guest CPU doesn't match specification: missing features: spec-ctrl,stibp CVE-2017-5703 CVE-2017-5715
  • 7. (1) Unify CPU Model on Compute for Live Migration 7 ❖ CPU model: SandyBridge, SandyBridge-IBRS, Broadwell, Skylake,... ❖ OpenStack configurations
  • 8. 8 ❖ Same computes should have same BIOS and firmware version ❖ Check flags and cpu model mapping: /usr/share/libvirt/cpu_map.xml (1) Unify CPU Model on Compute for Live Migration
  • 9. 9 ❖ Similar with ➢ VMWare EVC (Enhanced vMotion Compatibility) (Cluster-Level) ➢ Hyper-V CPU Compatibility Mode (VM-Level) (1) Unify CPU Model on Compute for Live Migration Refers: https://kb.vmware.com/s/article/1005764#What%20is%20EVC https://www.altaro.com/hyper-v/configure-cpu-compatibility-mode-hyper-v/
  • 10. (2) Clustering compute node by Host Aggregate 10
  • 11. HA 01 HA 02 (2) Clustering compute node by Host Aggregate 11 Compute SAN Storage Compute Cinder Volume Compute Compute Ceph Cinder Volume retype
  • 12. (2) Clustering compute node by Host Aggregate 12 openstack flavor set FLAVOR_NAME --property aggregate_instance_extra_specs:cpu_model=Custom_IvyBridge
  • 13. (3) Slow down CPU during live migration 13 Live migrate process Abort Force-complete Auto-convergencecpu speed Refers: https://docs.openstack.org/nova/rocky/admin/live-migration-usage.html https://rk4n.github.io/2016/08/10/qemu-post-copy-and-auto-converge-features/
  • 14. (3) Slow down CPU during live migration 14 Live migrate process nova live-migration-abort nova live-migration-force-complete live_migration_permit_auto_converge=true in nova.confcpu speed Refers: https://docs.openstack.org/nova/rocky/admin/live-migration-usage.html https://rk4n.github.io/2016/08/10/qemu-post-copy-and-auto-converge-features/
  • 15. (3) Slow down CPU during live migration 15 Refers: https://docs.openstack.org/nova/rocky/configuration/config.html ❖ OpenStack perspective ➢ live_migration_completion_timeout ➢ live_migration_downtime ➢ live_migration_downtime_steps ➢ live_migration_downtime_delay
  • 16. 16 (4) HAProxy limitation ❖ 2000 established connection/backend/thread ➢ Increase maxconn ➢ Use multi-thread for HAProxy
  • 17. (5) Speed up network by multiple queue ❖ By default, a VM has only one combined queue ❖ Best practice: number of queues = number of vCPUs Image source: https://blog.cloudflare.com/how-to-receive-a-million-packets/
  • 18. (5) Speed up network by multiple queue
    Single queue:
      # ethtool -l eth0
      Channel parameters for eth0:
      Pre-set maximums:
      RX: 0
      TX: 0
      Other: 0
      Combined: 1
      Current hardware settings:
      RX: 0
      TX: 0
      Other: 0
      Combined: 1
    Multiple queue:
      # ethtool -l eth0
      Channel parameters for eth0:
      Pre-set maximums:
      RX: 0
      TX: 0
      Other: 0
      Combined: 8
      Current hardware settings:
      RX: 0
      TX: 0
      Other: 0
      Combined: 8
  • 19. (5) Speed up network by multiple queue
    | iperf3 processes | Single queue speed (Gbits/sec) | Multiple queue speed (Gbits/sec) |
    |---|---|---|
    | 1 | 5.42 | 4.17 |
    | 2 | 3.54 + 4.19 = 7.73 | 3.57 + 3.57 = 7.14 |
    | 4 | 2.05 + 2.28 + 1.86 + 2.71 = 8.90 | 2.36 + 2.71 + 2.33 + 2.46 = 9.86 |
    | 8 | 1.14 + 1.34 + 1.48 + 1.30 + 1.20 + 0.99 + 0.91 + 1.17 = 9.53 | 1.78 + 1.28 + 1.16 + 1.19 + 1.76 + 1.23 + 1.32 + 1.26 = 10.98 |
    ❖ iperf3 test server: 20GB bond0, 10GB RAM, 8 vCPUs, on a different compute node
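The totals in the table can be recomputed, together with the relative change between the two modes, with a few lines of Python. The per-process figures are copied from the slide; the `summarize` helper and its output layout are only an illustration.

```python
# Per-process iperf3 throughput in Gbits/sec, copied from the slide:
# {number of iperf3 processes: (single-queue results, multi-queue results)}
RESULTS = {
    1: ([5.42], [4.17]),
    2: ([3.54, 4.19], [3.57, 3.57]),
    4: ([2.05, 2.28, 1.86, 2.71], [2.36, 2.71, 2.33, 2.46]),
    8: (
        [1.14, 1.34, 1.48, 1.30, 1.20, 0.99, 0.91, 1.17],
        [1.78, 1.28, 1.16, 1.19, 1.76, 1.23, 1.32, 1.26],
    ),
}


def summarize(results):
    """Return (processes, single_total, multi_total, change_percent) rows."""
    rows = []
    for processes in sorted(results):
        single, multi = results[processes]
        single_total = round(sum(single), 2)
        multi_total = round(sum(multi), 2)
        change = round((multi_total - single_total) / single_total * 100, 1)
        rows.append((processes, single_total, multi_total, change))
    return rows


if __name__ == "__main__":
    for processes, single_total, multi_total, change in summarize(RESULTS):
        print(f"{processes} proc: single={single_total} multi={multi_total} change={change:+}%")
```

In these measurements multiqueue is slower with one or two parallel streams and only pulls ahead at four or more, which is consistent with matching the number of queues to the number of vCPUs.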
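  • 20. (5) Speed up network by multiple queue ❖ Flavor: openstack flavor set ${FLAVOR_NAME} --property hw:vif_multiqueue_enabled='true' ❖ Image: openstack image set --property hw_vif_multiqueue_enabled=true ${IMAGE_NAME} ❖ Note: Nova added the hw:vif_multiqueue_enabled flavor extra spec in a release later than Rocky, so on Rocky rely on the image property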
  • 21. (6) Session Initiation Protocol [Diagram: elements of a SIP network, spanning the Mobile Packet Backbone Network and the Data Center Network] Source: https://www.researchgate.net/figure/Elements-of-session-initiation-protocol-network_fig25_301577956
  • 22. (6) Session Initiation Protocol [Diagram: VMs with two interfaces (eth0, eth1) attached to the Data Center Network and the Mobile Packet Backbone Network. Labels: Source IP: eth1, Source IP: eth0, Real-time Transport Protocol, Port Security]
  • 23. (7) Direct reply from a VM behind a load balancer ● Very fast load-balancing mode ● Load balancer network bandwidth is no longer a bottleneck ● Total output bandwidth is the sum of the bandwidth of each backend ● The service VIP must be configured on a loopback interface on each backend and must not answer ARP requests Reference: https://www.haproxy.com/blog/layer-4-load-balancing-direct-server-return-mode/
  • 24. (7) Direct reply from a VM behind a load balancer [Diagram: Client (192.168.122.1), Router, LB (10.0.0.254), and an OPS VM with eth0 10.0.0.10 and loopback 10.0.0.254. Label: Port Security]
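Why port security gets in the way here can be shown with a toy model of the anti-spoofing rule: a packet leaving a port is accepted only if its source IP/MAC pair is known for that port. This is not Neutron's implementation; the `Port` class, the `egress_allowed` function, and the MAC address are hypothetical, and adding the VIP as an allowed address pair is one common remedy that the slides do not explicitly confirm.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    """Toy model of a VM port with anti-spoofing state."""
    mac: str
    fixed_ips: set
    allowed_address_pairs: set = field(default_factory=set)  # {(ip, mac), ...}
    port_security_enabled: bool = True


def egress_allowed(port, src_ip, src_mac):
    """Return True if a packet with this source IP/MAC may leave the port."""
    if not port.port_security_enabled:
        return True
    known_pairs = {(ip, port.mac) for ip in port.fixed_ips} | port.allowed_address_pairs
    return (src_ip, src_mac) in known_pairs


VM_MAC = "fa:16:3e:00:00:10"  # hypothetical MAC address
VIP = "10.0.0.254"            # service VIP on the VM's loopback, from the slide

if __name__ == "__main__":
    port = Port(mac=VM_MAC, fixed_ips={"10.0.0.10"})
    # The direct reply uses the VIP as source, which the port does not know.
    print(egress_allowed(port, VIP, VM_MAC))
    # Registering the VIP as an allowed address pair lets the reply through.
    port.allowed_address_pairs.add((VIP, VM_MAC))
    print(egress_allowed(port, VIP, VM_MAC))
```

The same reasoning applies to the SIP case, where a VM sends RTP traffic on one interface with the source IP of the other.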
  • 25. (8) Entropy affects Java Tomcat application startup time ❖ Java's security libraries need entropy to generate session IDs and other secrets ❖ But newly booted VMs don't have enough of it
    Host up for 1 hour 39 minutes:
      daikk115@daikk115 ~/Downloads $ uptime
      00:15:31 up 1:39, 3 users, load average: 1,63, 1,47, 1,51
      daikk115@daikk115 ~/Downloads $ cat /proc/sys/kernel/random/entropy_avail
      3763
    Host up for 1 minute:
      daikk115@daikk115 ~ $ uptime
      00:26:50 up 1 min, 2 users, load average: 0,76, 0,29, 0,11
      daikk115@daikk115 ~ $ cat /proc/sys/kernel/random/entropy_avail
      838
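The manual check above can be turned into a small script that flags a VM as entropy-starved before an application is started. This is a minimal sketch: the 1000-bit threshold is an arbitrary assumption for illustration, not a documented limit, and the helper names are hypothetical.

```python
from pathlib import Path

# Kernel counter read in the slide with `cat`.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy(text):
    """Convert the content of entropy_avail into an integer number of bits."""
    return int(text.strip())


def is_entropy_low(available_bits, threshold_bits=1000):
    """Return True when the pool is below the (assumed) threshold."""
    return available_bits < threshold_bits


def read_entropy(path=ENTROPY_PATH):
    """Read the current entropy estimate from the kernel (Linux only)."""
    return parse_entropy(path.read_text())


if __name__ == "__main__":
    if ENTROPY_PATH.exists():
        bits = read_entropy()
        state = "low" if is_entropy_low(bits) else "ok"
        print(f"entropy_avail={bits} ({state})")
    else:
        print("entropy_avail is not available on this system")
```

With the two readings from the slide, the freshly booted host (838) is reported as low and the long-running host (3763) as fine.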
  • 26. (8) Entropy affects Java Tomcat application startup time ❖ Fix in the OS layer ➢ apt-get install haveged ➢ yum install haveged References: https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged https://portal.cloudunboxed.net/knowledgebase/12/Speed-up-and-secure-cloud-servers-with-more-Entropy.html https://lmgtfy.com/?q=cloud+haveged
  • 27. (8) Entropy affects Java Tomcat application startup time ❖ Fix in the virtualization layer ➢ Flavor: openstack flavor set FLAVOR-NAME --property hw_rng:allowed=True --property hw_rng:rate_bytes=2000 --property hw_rng:rate_period=2000 ➢ Image: openstack image set --property hw_rng_model=virtio IMAGE_NAME Reference: https://wiki.openstack.org/wiki/LibvirtVirtioRng
  • 28. (8) Entropy affects Java Tomcat application startup time ❖ Nova configuration ➢ Default: rng_dev_path = /dev/urandom ➢ Better: rng_dev_path = /dev/hwrng (depends on hardware) ❖ Red Hat's recommended strategy for HWRNG ➢ Use /dev/hwrng to feed /dev/random, then use /dev/random or /dev/urandom as the entropy source for VMs Reference: https://docs.openstack.org/nova/rocky/configuration/config.html#libvirt.rng_dev_path
  • 29. Conclusion ❏ Unify the CPU model ❏ Cluster compute nodes by host aggregate ❏ Configure CPU slowdown during live migration ❏ Increase the default HAProxy maxconn for backends ❏ Enable multiple queues ❏ Port-level security: by default, only packets with an IP/MAC address pair known to OpenStack are allowed ❏ Entropy is very important, especially for scale-out systems running on the OPS cloud platform