OpenStack DevOps challenges: a journey from dumb baremetal to a functional OpenStack cloud system
1. OpenStack DevOps Challenges
A Journey from Dumb Baremetal to a Production-Grade OpenStack Cloud System
Harish Kumar (hkumar@d4devops.org)
Ritesh Raj Sarraf (rrs@researchut.com)
2. An Adventurous Journey Begins..
CloudRX - a fictitious company that wants to set up a production OpenStack cloud
The cloud is to be implemented using a DevOps culture
A production-grade cloud has many heterogeneous components
− OpenStack components
− Non-OpenStack components: storage systems like Ceph and GlusterFS; SDN like ONOS, OpenContrail, OpenDaylight
− Other support systems: DNS, DHCP, monitoring, log aggregation, etc.
− Baremetal systems: hardware configuration, OS provisioning, network device setup
3. Components in Cloud system
Multi-node OpenStack controllers
− All APIs, schedulers, message queues
Multi-node Ceph cluster
A number of compute nodes
Database servers
SDN Controllers
Load balancers
Other supporting systems like DNS, monitoring, etc
4. CICD Pipeline
Commit changes to a branch
Run unit tests, then gate tests
Packages are created and pushed to the unstable repo
Create a repo snapshot (v100) and select it for further testing
v100: acceptance, integration, and upgrade testing
Promote v100 based on test results and push it to the staging/prod repo
Deploy to staging, then to production
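The promotion flow above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the tooling used in the talk: the `Snapshot` class, the stage names, and the suite names are hypothetical.

```python
from dataclasses import dataclass, field

# Repo stages in promotion order (names are illustrative assumptions).
STAGES = ["unstable", "staging", "production"]

# Test suites a snapshot must pass before it may be promoted.
REQUIRED_SUITES = {"acceptance", "integration", "upgrade"}


@dataclass
class Snapshot:
    """A frozen copy of the package repo, e.g. "v100"."""
    version: str
    stage: str = "unstable"
    results: dict = field(default_factory=dict)  # suite name -> passed?

    def record(self, suite: str, passed: bool) -> None:
        """Store the outcome of one test suite."""
        self.results[suite] = passed

    def can_promote(self) -> bool:
        """True only if every required suite has been recorded as passed."""
        return all(self.results.get(s) is True for s in REQUIRED_SUITES)

    def promote(self) -> str:
        """Move the snapshot to the next stage, or refuse if tests are missing."""
        if self.stage == STAGES[-1]:
            raise ValueError(f"{self.version} is already in {self.stage}")
        if not self.can_promote():
            raise RuntimeError(f"{self.version} has not passed all required suites")
        self.stage = STAGES[STAGES.index(self.stage) + 1]
        return self.stage
```

Because the snapshot is immutable once created, the same set of packages that passed testing is what reaches staging and production.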
5. CICD – general guidelines
Gate every application before it becomes part of the pipeline
Use the same tools in all phases of the pipeline to avoid changes in behavior
Reduce assumptions and hard-coded configuration to keep the pipeline adaptable
Handle scalable, distributed systems
Handle heterogeneous applications that have different release cycles and dependencies
6. Initial Challenges
Implement a build and test pipeline, plus various other jobs to support it
− Jenkins was the answer without a second thought
Manage configuration management and automation
− Options: Puppet, Chef, Ansible
− We chose Puppet: it had the most complete plugins for our technology stack
7. Challenges on initial pipeline phases
Need parallel test environments so we can run gate and acceptance tests (AT) in parallel
Environments should be easy to provision and remove
Virtual environments are an answer to this
− Provision a miniature cloud on top of a cloud
− Built a tool to provision a test cloud on top of an OpenStack cloud, based on a provided spec
− Easy to provision, easy to delete; uses APIs to build a virtual OpenStack test cloud on top of OpenStack
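As a rough sketch of spec-driven provisioning, the function below expands a spec into the list of servers a test environment would need. The spec layout (`count`, `flavor` per role), the naming scheme, and the function name are hypothetical; the tool described in the talk is not shown in the slides, and a real implementation would pass each entry to the OpenStack compute API.

```python
def expand_spec(env_name, spec):
    """Turn a role-based spec into a flat list of server definitions.

    `spec` maps a role name to {"count": int, "flavor": str}.
    This layout is an assumption made for illustration.
    """
    servers = []
    for role, params in sorted(spec.items()):
        for index in range(1, params["count"] + 1):
            servers.append({
                # Prefix with the environment name so parallel test
                # clouds never collide and are easy to delete as a group.
                "name": f"{env_name}-{role}-{index}",
                "role": role,
                "flavor": params["flavor"],
            })
    return servers


# Example spec for a miniature cloud (values are illustrative).
SPEC = {
    "controller": {"count": 2, "flavor": "m1.large"},
    "compute": {"count": 3, "flavor": "m1.medium"},
    "ceph": {"count": 3, "flavor": "m1.medium"},
}
```

Calling `expand_spec("gate42", SPEC)` yields eight server definitions, one per node of the test cloud.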
8. Automated environment setup Challenges
Bootstrapping a distributed system like an OpenStack cloud is complicated
− Bootstrap the whole OpenStack cloud
− Bootstrap clusters such as RabbitMQ, MySQL, and Ceph
− Handle inter-service dependencies in a multi-node environment
How do we validate that the system is ready for testing?
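One way to reason about inter-service dependencies is as a dependency graph whose topological order gives a valid bootstrap sequence. The sketch below uses Python's standard `graphlib` module; the dependency map is an illustrative assumption, not the exact graph from the talk.

```python
from graphlib import TopologicalSorter

# service -> services it depends on (illustrative assumption).
DEPENDENCIES = {
    "mysql": set(),
    "rabbitmq": set(),
    "ceph": set(),
    "keystone": {"mysql"},
    "glance": {"keystone", "mysql", "ceph"},
    "nova": {"keystone", "mysql", "rabbitmq", "glance"},
}


def bootstrap_waves(deps):
    """Group services into waves; everything in a wave can start in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock dependants
    return waves
```

With the map above, the backing clusters (Ceph, MySQL, RabbitMQ) form the first wave, followed by Keystone, Glance, and Nova in turn.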
9. Automated environment setup (continued)
Introduction of a service discovery tool
− Options: etcd, Consul, ZooKeeper
− We chose Consul
− What Consul is and why we chose it
We built an orchestration system around Consul
− All nodes are provisioned with userdata that installs Puppet, Consul, etc.
− Nodes configure themselves with Puppet according to their role
− Each service registers itself with Consul as it comes up
− Dependants wait until their dependencies are available before configuring themselves
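The "wait until the dependency is available" step can be sketched as a polling loop against Consul's health endpoint (`/v1/health/service/<name>?passing`), which returns only instances whose checks pass. The function names, the timeout values, and the agent address are assumptions for illustration; the orchestration code from the talk is not shown in the slides.

```python
import json
import time
import urllib.request

CONSUL_URL = "http://127.0.0.1:8500"  # assumed local Consul agent address


def healthy_instances(service, base_url=CONSUL_URL):
    """Return the instances of `service` whose health checks are passing."""
    url = f"{base_url}/v1/health/service/{service}?passing"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def wait_for(service, fetch=healthy_instances, timeout=300, interval=5,
             sleep=time.sleep):
    """Block until `service` has a healthy instance or `timeout` expires.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without a running Consul agent.
    """
    deadline = time.monotonic() + timeout
    while True:
        if fetch(service):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A node that runs a dependant service would call, for example, `wait_for("mysql")` before applying its own configuration.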
10. Automated environment setup (continued)
All services have a health check registered in Consul, so only healthy services are exposed to the network
Each deployed facility installs a validation script
Each node continuously runs validations and writes its own state to the Consul KV store
An external system can query Consul centrally to get the system state
The Consul KV store also records various other things, such as orchestration and operational tooling data
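The central query can be sketched as follows. Consul's KV endpoint, when read with `?recurse`, returns a JSON list of entries whose `Value` field is base64-encoded; the helper below decodes such a list and reports nodes that are not yet healthy. The key layout (`validation/<node>`) and the `"ok"` marker are hypothetical conventions chosen for this example.

```python
import base64


def decode_kv(entries):
    """Map each Consul KV key to its decoded string value."""
    state = {}
    for entry in entries:
        raw = entry.get("Value")  # base64 string, or None for an empty key
        state[entry["Key"]] = base64.b64decode(raw).decode() if raw else ""
    return state


def failing_nodes(entries, prefix="validation/"):
    """Return the nodes under `prefix` whose recorded state is not "ok".

    The prefix and the "ok" value are assumptions for illustration.
    """
    return sorted(
        key[len(prefix):]
        for key, value in decode_kv(entries).items()
        if key.startswith(prefix) and value != "ok"
    )
```

An external gate job could treat the environment as ready for testing once `failing_nodes` returns an empty list for the entries read from the KV store.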
16. Staging and production
Baremetal management is very complicated
− Have to work with heterogeneous physical systems
− Hardware configuration differs between vendors and models
− Operating system provisioning across different hardware configurations can be complicated
− Different systems may need different capabilities
Are rolling upgrades possible?
Handling upgrade failures
Possible rollback in certain situations
17. Baremetal server management
Undercloud controller with OpenStack Ironic
− All-in-one OpenStack system: Nova with Ironic, Neutron with a flat provider network, Glance, Keystone
− Easy to provision, delete, and rebuild baremetal servers through the undercloud
− Enables use of the same tooling on dev/test virtual environments and staging/production physical environments
Tools for various baremetal management tasks
− Hardware configuration, such as RAID setup
− Automated server enrollment in Ironic
− Recording server locations to ironic which can be used