This presentation describes the upgrade of an OpenStack infrastructure and how virtualizing network and compute elements helps to orchestrate the entire upgrade process while minimizing downtime for all running applications. Automating the upgrade with Ansible playbooks or similar tools helps to handle the odd cases that depend on the specific target infrastructure.
3. Fastweb and FASTcloud
FASTWEB C1 – PUBLIC
Fastweb S.p.A. is an Italian telecommunications company that provides landline, broadband Internet and digital television services
Fastweb is fully owned by the Swiss telecommunications company Swisscom
Not only a fiber company!
4. DC Tier IV
Following Milan, a new Tier IV Data Center
Location                          Milan     Rome
Surface                           600 m2    500 m2
Power Usage Effectiveness (PUE)   1.25      1.25
Certification                     Tier IV   Tier IV
2018
5. FASTcloud
Since 2015, our cloud solution has been based on OpenStack
Our FASTcloud services run under Italian jurisdiction
We offer flexible solutions such as Virtual Server, Virtual Private Data Center and Private IaaS with dedicated hardware
As a telecommunications company, we offer our cloud services to our customers over the Internet and MPLS VPN
6. Business requirements to upgrade to Mitaka
Double jump: from Kilo to Mitaka
Need to reduce downtime for customers, specifically for the L3 agents
7. The Upgrade Path
Possible ways to upgrade:
1. Big Bang (in-place) upgrade;
2. Side by Side clusters;
3. Control Plane side by side;
4. Rolling upgrades (upgrade levels)
Have you planned a rollback path?
Think about the impacts:
1. On the infrastructure
2. On the user side
3. On the application side
9. Clustering OpenStack Services
Provides HA for all our services
Keeps all services consistent by building constraints
Making some services cluster-free could be the answer
Follow the divide et impera paradigm
Cons:
Resource constraints make some services difficult to manage
11. Virtualized components: Galera cluster
Goal:
1. Be cluster free
2. Replication mode (Galera cluster) for fault tolerance
High-availability service that provides:
1. High system uptime
2. No data loss
3. Scalability for growth
12. Virtualized components: Nova service
Nova Control Plane:
a. 2 nodes in HA
b. VIP to access services
c. HAProxy + keepalived
nova.conf
[upgrade_levels]
compute = kilo
nova.conf
[upgrade_levels]
compute = liberty
Pin the compute RPC version in [upgrade_levels]:
the control plane may run release X + 1 while the pin stays at X, but the gap must never exceed one release
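The one-release rule above can be checked before touching nova.conf. The following is a minimal sketch, assuming only the three releases named in this deck; the function names `release_index` and `pin_is_valid` are hypothetical helpers, not part of Nova or any OpenStack tool.

```shell
#!/bin/sh
# Ordered list of the releases discussed in this deck (oldest first).
RELEASES="kilo liberty mitaka"

# Print the zero-based position of a release in RELEASES; fail if unknown.
release_index() {
    idx=0
    for name in $RELEASES; do
        if [ "$name" = "$1" ]; then
            echo "$idx"
            return 0
        fi
        idx=$((idx + 1))
    done
    return 1
}

# pin_is_valid PINNED_RELEASE TARGET_RELEASE
# Succeed only if the target control plane release is at most one release
# ahead of the compute RPC pin, and never behind it.
pin_is_valid() {
    pinned=$(release_index "$1") || return 2
    target=$(release_index "$2") || return 2
    gap=$((target - pinned))
    [ "$gap" -ge 0 ] && [ "$gap" -le 1 ]
}

# Illustration: the double jump has to be split into two single steps.
if pin_is_valid kilo liberty; then
    echo "kilo -> liberty: ok"
fi
if ! pin_is_valid kilo mitaka; then
    echo "kilo -> mitaka: too far, upgrade to liberty first"
fi
```

A check like this could run as a first task of an upgrade playbook, so that a wrong pin stops the run before any service is restarted.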
13. Managing OpenStack: The Ansible way
Full-Stack Automation with Ansible
Layers: IaaS Software, Host Operating System
Playbooks:
● OpenStack services roles
● Ceph RADOS gateway roles
● Reverse proxy management
● Upgrade to Liberty path
● Upgrade to Mitaka path
Common and common-openstack roles keep the infrastructure components aligned
Playbooks update the control plane services from Kilo to Liberty and from Liberty to Mitaka
14. It’s time to upgrade: Planning
Steps in the planning diagram:
● Run the integration tests
● Upgrade the control plane to Mitaka
● Disable virtualized services from PCS
● Routers
● Rollback
● Upgrade the control plane to Liberty
● Align the haproxy/keepalived config
● Add Neutron auxiliary blades and switch routers
● Prepare the whole virtual environment (provision the VMs using Ansible roles)
15. Neutron aux mode: adding two new agents
neutron.conf
● dhcp_agents_per_network = 2
● max_l3_agents_per_router = 2
dhcp_agent.ini
● enable_isolated_metadata (metadata on isolated networks)
16. Neutron aux mode: moving routers
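Diagram: Aux Neutron L3 agent, Neutron L3 agent, Compute node
Run the Neutron client with l3-agent-router-add or l3-agent-router-remove:
"${NEUTRON_CLIENT}" l3-agent-router-[add|remove] "${AGENT_ID}" "${ROUTER_ID}"
Force the routers to switch:
# $interface is the device to bring down inside each router namespace
# (its name is not given on the slide and must be set beforehand)
for router in $(ip netns | awk '/qrouter/ {print $1}'); do
    ip netns exec "$router" ip link set dev "$interface" down
done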
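The router move can be wrapped in a small helper. This is a minimal sketch, assuming the classic `neutron` CLI and a remove-then-add order (a non-HA router can be hosted by only one L3 agent at a time); the function name `move_router` and the agent and router IDs are hypothetical. `NEUTRON_CLIENT` defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Dry run by default: the commands are printed, not executed.
# Set NEUTRON_CLIENT=neutron to run them for real.
NEUTRON_CLIENT="${NEUTRON_CLIENT:-echo neutron}"

# move_router SRC_AGENT_ID DST_AGENT_ID ROUTER_ID
# Unschedule the router from the source L3 agent, then schedule it on the
# destination agent. The order is an assumption of this sketch.
move_router() {
    # $NEUTRON_CLIENT is left unquoted on purpose so "echo neutron" splits.
    $NEUTRON_CLIENT l3-agent-router-remove "$1" "$3" || return 1
    $NEUTRON_CLIENT l3-agent-router-add "$2" "$3"
}

# Illustration with placeholder IDs.
for router_id in router-1 router-2; do
    move_router old-agent-id aux-agent-id "$router_id"
done
```

Keeping the dry run as the default makes it easy to review the exact command sequence for every router before the maintenance window.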
17. Test critical sections: update db schema
Fix duplicate entries in the Neutron DB table ha_router_agent_port_bindings
● Dump the entire DB and replicate it on Instance B;
● Run the schema update for each service to test that it works correctly:
Ansible boolean condition:
when: update_schema
openstack-db --service "${service}" --update
Diagram: Production Database (Instance A), Mirrored Database (Instance B)
Services: Nova, Cinder, Neutron, Keystone, Heat, Glance
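The per-service schema test on the mirrored instance can be scripted as a loop. This is a minimal sketch built around the `openstack-db --service ... --update` command quoted above; the service order, the `DB_TOOL` variable and the `update_schemas` function are assumptions of this example. By default it only prints the commands.

```shell
#!/bin/sh
# Services shown on the slide; the order here is an assumption.
SERVICES="keystone glance cinder nova neutron heat"

# Dry run by default: print the command instead of running it.
# Set DB_TOOL=openstack-db on the mirrored instance to run it for real.
DB_TOOL="${DB_TOOL:-echo openstack-db}"

# Run the schema update for every service and report the ones that failed.
update_schemas() {
    failed=""
    for service in $SERVICES; do
        if ! $DB_TOOL --service "$service" --update; then
            failed="$failed $service"
        fi
    done
    if [ -n "$failed" ]; then
        echo "schema update failed for:$failed" >&2
        return 1
    fi
}

update_schemas
```

The loop deliberately continues after a failure, so one test run on Instance B lists every service whose migration needs attention.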
19. Lesson Learned: the MTU issue
The MTU on the qbrXYZ and qrouter-XYZ interfaces is 1500, while the rest of the infrastructure has jumbo frames enabled
neutron.conf
[DEFAULT]
global_physnet_mtu = 9000
ml2_conf.ini
[ml2]
path_mtu = 9000
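A mismatch like this can be spotted by scanning interface MTUs. This is a minimal sketch that parses the one-line-per-device output of `ip -o link show`; the function name `find_small_mtu` and the sample interface names are hypothetical.

```shell
#!/bin/sh
# find_small_mtu EXPECTED_MTU
# Reads `ip -o link show` output on stdin and prints "<interface> <mtu>" for
# every non-loopback interface whose MTU is below EXPECTED_MTU.
find_small_mtu() {
    awk -v want="$1" '
        {
            name = $2
            sub(/:$/, "", name)      # strip the trailing colon
            sub(/@.*/, "", name)     # strip the "@peer" suffix of veth devices
            for (i = 3; i < NF; i++) {
                if ($i == "mtu" && name != "lo" && $(i + 1) + 0 < want + 0) {
                    print name, $(i + 1)
                }
            }
        }'
}

# Made-up sample input in the format printed by `ip -o link show`.
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP
7: qbr1a2b3c4d-5e: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP'

printf '%s\n' "$sample" | find_small_mtu 9000
```

On a real node the input would come from `ip -o link show | find_small_mtu 9000`, or from the same command run through `ip netns exec` for each qrouter namespace.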
20. Best Practices
● Review the release notes for each release to learn about new, updated and deprecated parameters
● Maintain a mirrored OpenStack environment
● Identify critical update paths (e.g. the OpenStack DB schema update)
● Parallelize as much as possible (e.g. package updates)
● Make use of Ansible templates (ready to go to Newton)