Meetup 23 - 01 - The things I wish I would have known before doing OpenStack Cloud Transformation
1. The things I wish I would have known before doing OpenStack Cloud Transformation
2. Feel free to interrupt us at any time!
@daikk115
Dai, Dang Van
Cloud Engineer
@ducnc
Duc, Nguyen Cong
Cloud Engineer
3. Once Upon a Time....
★ We had a new OpenStack Rocky cloud cluster, deployed with Kolla-Ansible
★ We needed to start the cloud transformation immediately
★ Not only transformation but also integration
The system was increasingly complex
➔ CPU: Broadwell, Skylake,...
➔ SVR and SW: Dell, HP, Cisco,...
➔ HBA: QLogic, Emulex
➔ Storage: Ceph, SAN(s), NAS(s)
➔ Hundreds of services moving into the cloud
We spent a year dealing with most of these cases and reducing the complexity!
6. (1) Unify CPU Model on Compute for Live Migration
https://www.bleepingcomputer.com/news/software/list-of-links-bios-updates-for-the-meltdown-and-spectre-patches/
BIOS Version (U41)V521 BIOS Version (U41)V519
Live Migration failure: operation failed: guest CPU doesn't match specification: missing features: spec-ctrl,stibp:
libvirtError: operation failed: guest CPU doesn't match specification: missing features: spec-ctrl,stibp
CVE-2017-5703
CVE-2017-5715
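When this failure appears in a log, the missing flags can be pulled out of the message to see exactly what the destination host lacks. The following is a minimal sketch that assumes the message format shown above; `missing_cpu_features` is an illustrative helper name, not part of Nova or libvirt.

```shell
# Print one missing CPU feature per line from a libvirt
# "guest CPU doesn't match specification" message read on stdin.
# missing_cpu_features is a hypothetical helper name.
missing_cpu_features() {
  sed -n 's/.*missing features: *\([^:]*\).*/\1/p' | tr ',' '\n'
}
```

It could be fed from the nova-compute log, for example `grep 'missing features' nova-compute.log | missing_cpu_features | sort -u`, where the log file location depends on the deployment.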
7. (1) Unify CPU Model on Compute for Live Migration
❖ CPU model: SandyBridge, SandyBridge-IBRS, Broadwell, Skylake,...
❖ OpenStack configurations
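A minimal sketch of what such a configuration might look like, assuming the libvirt driver: `cpu_mode` and `cpu_model` in the `[libvirt]` section of nova.conf pin every guest on a compute node to one named model, usually the oldest generation in the group. The helper name `render_cpu_model_conf` and the model names are illustrative only.

```shell
# Print a [libvirt] fragment that pins guests to one named CPU model.
# render_cpu_model_conf is a hypothetical helper; the default model
# below is only an example.
render_cpu_model_conf() {
  model="${1:-SandyBridge}"
  cat <<EOF
[libvirt]
cpu_mode = custom
cpu_model = ${model}
EOF
}
```

For example, `render_cpu_model_conf SandyBridge-IBRS` prints a fragment to review and merge into nova.conf on each compute node; nova-compute has to be restarted for it to take effect.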
8. (1) Unify CPU Model on Compute for Live Migration
❖ Compute nodes of the same type should run the same BIOS and firmware versions
❖ Check the flags and CPU model mapping: /usr/share/libvirt/cpu_map.xml
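To spot a flag mismatch between two compute nodes before a migration fails, their CPU flag lists can be compared directly. This is a sketch under the assumption that each list is a space-separated string such as the `flags` line of /proc/cpuinfo; `missing_flags` is a hypothetical helper name.

```shell
# Print the flags present in the first list but absent from the second.
# Both arguments are space-separated flag lists, for example the
# "flags" line of /proc/cpuinfo taken from two compute nodes.
# missing_flags is a hypothetical helper name.
missing_flags() {
  for flag in $1; do
    case " $2 " in
      *" $flag "*) ;;                  # flag exists on both hosts
      *) printf '%s\n' "$flag" ;;      # flag only on the first host
    esac
  done
}
```

The flag list of a host can be collected with `grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2`. An empty result in both directions suggests the two hosts expose the same feature set.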
9. (1) Unify CPU Model on Compute for Live Migration
❖ Similar to
➢ VMware EVC (Enhanced vMotion Compatibility) (cluster level)
➢ Hyper-V CPU Compatibility Mode (VM level)
References:
https://kb.vmware.com/s/article/1005764#What%20is%20EVC
https://www.altaro.com/hyper-v/configure-cpu-compatibility-mode-hyper-v/
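11. (2) Clustering compute nodes by Host Aggregate
[Diagram: two host aggregates, HA 01 and HA 02, each containing compute nodes. One group of compute nodes uses Cinder volumes on SAN storage, the other uses Cinder volumes on Ceph; volumes move between the two backends with retype.]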
12. (2) Clustering compute nodes by Host Aggregate
openstack flavor set FLAVOR_NAME \
    --property aggregate_instance_extra_specs:cpu_model=Custom_IvyBridge
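The flavor property only has an effect if a host aggregate carries the same `cpu_model` property and the scheduler has `AggregateInstanceExtraSpecsFilter` enabled. A minimal sketch of the aggregate side follows; the aggregate and host names used with it (`HA01`, `compute01`) are hypothetical, and `OPENSTACK` is an override hook added here for dry runs.

```shell
# OPENSTACK can be overridden (for example with "echo") for a dry run.
OPENSTACK="${OPENSTACK:-openstack}"

# Create an aggregate, tag it with a cpu_model property, and add hosts.
# Usage: create_cpu_aggregate AGGREGATE CPU_MODEL HOST...
create_cpu_aggregate() {
  aggregate="$1"
  model="$2"
  shift 2
  $OPENSTACK aggregate create "$aggregate"
  $OPENSTACK aggregate set --property "cpu_model=${model}" "$aggregate"
  for host in "$@"; do
    $OPENSTACK aggregate add host "$aggregate" "$host"
  done
}
```

For example, `create_cpu_aggregate HA01 Custom_IvyBridge compute01 compute02` groups two hosts so that instances of the flavor above are scheduled only onto them.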
13. (3) Slow down CPU during live migration
[Diagram: the live migration process with three controls: Abort, Force-complete, and Auto-convergence (CPU speed)]
References:
https://docs.openstack.org/nova/rocky/admin/live-migration-usage.html
https://rk4n.github.io/2016/08/10/qemu-post-copy-and-auto-converge-features/
14. (3) Slow down CPU during live migration
Live migration process
➢ Abort: nova live-migration-abort
➢ Force-complete: nova live-migration-force-complete
➢ Auto-convergence (CPU speed): live_migration_permit_auto_converge=true in nova.conf
References:
https://docs.openstack.org/nova/rocky/admin/live-migration-usage.html
https://rk4n.github.io/2016/08/10/qemu-post-copy-and-auto-converge-features/
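Both client commands take the server and the ID of the in-progress migration, which `nova server-migration-list` reports. The wrappers below are a sketch for illustration only; their names are hypothetical and `NOVA` is an override hook added here for dry runs.

```shell
# NOVA can be overridden (for example with "echo") for a dry run.
NOVA="${NOVA:-nova}"

# List in-progress migrations of a server to find the migration ID.
list_migrations() {
  $NOVA server-migration-list "$1"
}

# Cancel a running live migration; the VM keeps running on the source.
abort_migration() {
  $NOVA live-migration-abort "$1" "$2"
}

# Force a running live migration to finish (Nova pauses the guest, or
# switches to post-copy when that is permitted).
force_complete_migration() {
  $NOVA live-migration-force-complete "$1" "$2"
}
```

A typical sequence would be `list_migrations SERVER` followed by `abort_migration SERVER MIGRATION_ID` or `force_complete_migration SERVER MIGRATION_ID`.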
15. (3) Slow down CPU during live migration
❖ OpenStack perspective
➢ live_migration_completion_timeout
➢ live_migration_downtime
➢ live_migration_downtime_steps
➢ live_migration_downtime_delay
References:
https://docs.openstack.org/nova/rocky/configuration/config.html
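As a rough sketch, the options above live in the `[libvirt]` section of nova.conf on the compute nodes. The values printed below are illustrative starting points and should be checked against the Rocky configuration reference; `render_live_migration_conf` is a hypothetical helper name.

```shell
# Print a [libvirt] fragment with the live-migration options.
# The numbers are illustrative starting points, not recommendations.
render_live_migration_conf() {
  cat <<'EOF'
[libvirt]
# Throttle guest vCPUs when memory is dirtied faster than it is copied.
live_migration_permit_auto_converge = true
# Seconds (scaled per GiB to transfer) before the migration is aborted.
live_migration_completion_timeout = 800
# Maximum permitted pause at switchover, in milliseconds.
live_migration_downtime = 500
# Number of incremental steps used to reach the maximum downtime.
live_migration_downtime_steps = 10
# Seconds to wait between downtime steps.
live_migration_downtime_delay = 75
EOF
}
```

The output is meant to be reviewed and merged into the existing nova.conf by hand or by the deployment tool, not appended blindly.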
16. (4) HAProxy limitation
❖ 2000 established connections per backend per thread
➢ Increase maxconn
➢ Use multiple threads for HAProxy
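One way to see how close HAProxy is to its limit is to compare `CurrConns` with `Maxconn` in the output of the `show info` command on the stats socket. The sketch below only parses that output; it reports the process-wide limit rather than a per-backend one, and `conn_usage_percent` is a hypothetical helper name.

```shell
# Read HAProxy "show info" output on stdin and print current
# connections as a percentage of the process-wide maxconn.
# conn_usage_percent is a hypothetical helper name.
conn_usage_percent() {
  awk -F': ' '
    $1 == "Maxconn"   { max = $2 }
    $1 == "CurrConns" { cur = $2 }
    END { if (max > 0) printf "%d\n", cur * 100 / max }
  '
}
```

Assuming a stats socket is enabled and socat is installed, it could be used as `echo "show info" | socat stdio /var/run/haproxy.sock | conn_usage_percent`, where the socket path is an assumption that depends on the deployment.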
17. (5) Speed up network by multiple queue
❖ By default, a VM has only one combined queue
❖ Best practice: number of queues = number of vCPUs
Image source: https://blog.cloudflare.com/how-to-receive-a-million-packets/
20. (5) Speed up network by multiple queue
openstack flavor set ${FLAVOR_NAME} \
    --property hw:vif_multiqueue_enabled='true'
openstack image set \
    --property hw_vif_multiqueue_enabled=true \
    ${IMAGE_NAME}
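Enabling multiqueue on the OpenStack side only makes the extra queues available; inside the guest they usually still have to be switched on with `ethtool -L`. A minimal sketch, where the interface name `eth0` is an assumption and `ETHTOOL` is an override hook added here for dry runs:

```shell
# ETHTOOL can be overridden for a dry run.
ETHTOOL="${ETHTOOL:-ethtool}"

# Inside the guest: set the number of combined queues on an interface.
# Usage: set_combined_queues INTERFACE QUEUES
set_combined_queues() {
  $ETHTOOL -L "$1" combined "$2"
}
```

To follow the "queues = vCPUs" practice, this could be called as `set_combined_queues eth0 "$(nproc)"`, and `ethtool -l eth0` shows the current and maximum queue counts.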
21. (6) Session Initiation Protocol
[Diagram: elements of a SIP network, spanning the Mobile Packet Backbone Network and the Data Center Network]
Source: https://www.researchgate.net/figure/Elements-of-session-initiation-protocol-network_fig25_301577956
22. (6) Session Initiation Protocol
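[Diagram: VMs with two interfaces (eth0, eth1) between the Data Center Network and the Mobile Packet Backbone Network. Real-time Transport Protocol packets are labelled "Source IP: eth1" and "Source IP: eth0"; Port Security is marked on the VM ports.]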
23. (7) Directly reply from VM behind a load balancer
● Very fast load-balancing mode
● Load balancer network bandwidth is no longer a bottleneck
● Total output bandwidth is the sum of the bandwidth of each backend
● The service VIP must be configured on a loopback interface on each backend and must not answer ARP requests
References: https://www.haproxy.com/blog/layer-4-load-balancing-direct-server-return-mode/
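The last requirement can be sketched on a Linux backend as follows: the VIP is added to the loopback interface and two ARP sysctls keep the backend from answering or announcing it. The helper name is hypothetical, `IP_CMD` and `SYSCTL` are override hooks added here for dry runs, and the settings do not persist across reboots.

```shell
# IP_CMD and SYSCTL can be overridden for a dry run.
IP_CMD="${IP_CMD:-ip}"
SYSCTL="${SYSCTL:-sysctl}"

# On each backend: bind the VIP to loopback and keep it out of ARP.
# Usage: configure_dsr_backend VIP
configure_dsr_backend() {
  vip="$1"
  $IP_CMD addr add "${vip}/32" dev lo
  # Reply to ARP only for addresses on the receiving interface.
  $SYSCTL -w net.ipv4.conf.all.arp_ignore=1
  # Always use the best local address as ARP source, never the VIP.
  $SYSCTL -w net.ipv4.conf.all.arp_announce=2
}
```

For example, `configure_dsr_backend 10.0.0.254` would be run as root on every backend VM behind the load balancer.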
24. (7) Directly reply from VM behind a load balancer
[Diagram: Client (192.168.122.1) -> Router -> LB (10.0.0.254) -> OPS VM (Eth0 10.0.0.10, Loopback 10.0.0.254). Port Security is marked on the VM port.]
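Because port security drops packets whose source address is not known for the port, a VM replying directly from the VIP needs that address allowed on its Neutron port. The sketch below shows two options using `openstack port set`; the function names are hypothetical and `OPENSTACK` is an override hook added here for dry runs.

```shell
# OPENSTACK can be overridden (for example with "echo") for a dry run.
OPENSTACK="${OPENSTACK:-openstack}"

# Let a port send and receive traffic for an extra IP (here, the VIP).
# Usage: allow_vip_on_port PORT VIP
allow_vip_on_port() {
  $OPENSTACK port set --allowed-address "ip-address=$2" "$1"
}

# Alternative: turn port security off for the port entirely.
# Security groups have to be removed from the port in the same call.
# Usage: disable_port_security PORT
disable_port_security() {
  $OPENSTACK port set --no-security-group --disable-port-security "$1"
}
```

With the addresses from the diagram, `allow_vip_on_port PORT_ID 10.0.0.254` is the narrower change; disabling port security removes all filtering on that port and is better kept for cases where address pairs are not enough.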
25. (8) Entropy affects Java Tomcat application startup time
❖ Java's secure libraries need to be fed entropy for session IDs and other purposes
❖ But new VMs don’t have enough of it
daikk115@daikk115 ~/Downloads $ uptime
00:15:31 up 1:39, 3 users, load average: 1,63, 1,47, 1,51
daikk115@daikk115 ~/Downloads $ cat /proc/sys/kernel/random/entropy_avail
3763
daikk115@daikk115 ~ $ uptime
00:26:50 up 1 min, 2 users, load average: 0,76, 0,29, 0,11
daikk115@daikk115 ~ $ cat /proc/sys/kernel/random/entropy_avail
838
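The same check can be turned into a small guard, for example in a health script that runs before Tomcat starts. This is a sketch only: the default threshold of 1000 is an assumption chosen for illustration, and `entropy_low` is a hypothetical helper name.

```shell
# Succeed when available entropy is below a threshold.
# Usage: entropy_low [THRESHOLD] [FILE]
# The default threshold of 1000 is an assumption, not a kernel constant.
entropy_low() {
  threshold="${1:-1000}"
  file="${2:-/proc/sys/kernel/random/entropy_avail}"
  [ "$(cat "$file")" -lt "$threshold" ]
}
```

It could be used as `entropy_low && echo "entropy pool is low"`. With the two readings above, 838 would count as low and 3763 would not.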
26. (8) Entropy affects Java Tomcat application startup time
❖ Fix in the OS layer
➢ apt-get install haveged
➢ yum install haveged
References:
https://www.digitalocean.com/community/tutorials/how-to-setup-additional-entropy-for-cloud-servers-using-haveged
https://portal.cloudunboxed.net/knowledgebase/12/Speed-up-and-secure-cloud-servers-with-more-Entropy.html
https://lmgtfy.com/?q=cloud+haveged
27. (8) Entropy affects Java Tomcat application startup time
❖ Fix in the virtualization layer
openstack flavor set FLAVOR-NAME \
    --property hw_rng:allowed=True \
    --property hw_rng:rate_bytes=2000 \
    --property hw_rng:rate_period=2000
openstack image set --property hw_rng_model=virtio IMAGE_NAME
References:
https://wiki.openstack.org/wiki/LibvirtVirtioRng
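After booting an instance from such a flavor and image, the guest can confirm that the virtio RNG is actually in use by reading the kernel's `hw_random` sysfs entry. A minimal sketch, assuming a Linux guest with the virtio-rng driver loaded; `has_virtio_rng` is a hypothetical helper name.

```shell
# Succeed when the guest's active hardware RNG is a virtio device.
# Usage: has_virtio_rng [FILE]
has_virtio_rng() {
  file="${1:-/sys/devices/virtual/misc/hw_random/rng_current}"
  grep -q 'virtio' "$file"
}
```

For example, `has_virtio_rng && echo "virtio-rng active"` could be run inside the guest.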
28. (8) Entropy affects Java Tomcat application startup time
❖ Nova configuration
➢ Default: rng_dev_path = /dev/urandom
➢ Better: rng_dev_path = /dev/hwrng (depends on hardware)
❖ Red Hat's recommended strategy for HWRNG
➢ Use /dev/hwrng to feed /dev/random, then use /dev/random or /dev/urandom as the entropy source for VMs
References:
https://docs.openstack.org/nova/rocky/configuration/config.html#libvirt.rng_dev_path
29. Conclusion
❏ Unify the CPU model
❏ Cluster compute nodes with host aggregates
❏ Configure CPU slowdown (auto-converge) during live migration
❏ Increase the default HAProxy maxconn for backends
❏ Enable multiple queues
❏ Port-level security: by default, only packets with an IP/MAC address pair known to OpenStack are allowed
❏ Entropy is very important, especially for scale-out systems running on the OPS cloud platform