OpenStack Lessons Learned
Continuous Integration and Deployment Using OpenStack
Tuning OpenStack for High Availability and Performance in Large Production Deployments
2. Agenda
6:30 pm - Continuous Integration and Deployment Using OpenStack. Miguel Zuniga
7:15 pm - Tuning OpenStack for Availability and Performance in Large Production
Deployments. Raj Geda & Gabriel Capisizu
4. Agenda
• Continuous CI/CD Workflow – From beginning to end
• Replicating code changes to multiple repositories
• The gating system – Review, Approval, Build, Integration and Release
• Packages, Artifacts
• Distribution of Packages
• Infrastructure as Code
• Deployment to Production
5. Continuous CI / CD Workflow
Presentation Identifier Goes Here
6. Continuous CI / CD workflow
1. Developer commits change to review system
2. Gating system detects the new commit and instructs the
coordinator to execute the job specified in the config file.
3. Coordinator instructs worker to execute specific gate job.
4. Worker downloads the new commit, reads the config file and
executes all the instructions. Coordinator reports back to Gating
system which allows the commit to move forward or get rejected.
5. Once approved, the commit gets replicated to an external git
server (github, stash, git server).
7. Continuous CI / CD workflow
6. Worker creates the packages of the approved commits and
stores them in a package repository
7. Configuration management server downloads the latest config
mgmt code.
8. Instruct the clients to make modifications on their state
9. Clients pull new packages from the repositories servers.
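The gate, replicate and package steps above can be sketched in a few lines. This is a minimal illustration, not the speakers' tooling: `run_gate_job`, `process_commit`, the commit dictionary and the `pkg-` naming are all hypothetical, and plain lists stand in for the git replicas and the package repository.

```python
def run_gate_job(commit, job):
    """Worker step: run one job against a commit and report success.

    `job` is any callable that takes the commit. A real worker would check
    out the change and follow the instructions in the config file.
    """
    return bool(job(commit))


def process_commit(commit, jobs, replicas, package_repo):
    """Gate a commit; replicate and package it only if every job passes."""
    approved = all(run_gate_job(commit, job) for job in jobs)
    if not approved:
        return "rejected"
    for replica in replicas:
        # Step 5: only approved commits reach the external git servers.
        replica.append(commit["id"])
    # Step 6: build a package from the approved commit and store it.
    package_repo.append("pkg-" + commit["id"])
    return "approved"
```

A rejected commit never reaches the replicas or the package repository, which is the property the later steps depend on.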
8. Replicating Code
• Only approved commits get replicated to target destinations.
• All other systems pull code from these destinations.
• Production packages are created with the code stored here.
• Remember your CI system might be in the cloud, so this is your
backup solution.
9. Gating system
• Provide one or more gates based on the events emitted by the
review system.
• Recommended gating:
Review Gate -> when a user submits a new change with git review
Approval Gate -> when a user gives a change +2
Build Gate -> when a user submits an approved change for merge
Integration Gate -> when a user comments "integration" on a specific change
Release Gate -> when a user adds a new tag to a specific commit
Periodic Gate -> executed once a day (or more)
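One way to picture the gate list is as a lookup from review-system events to gates. The sketch below assumes hypothetical event names (`patchset-created`, `tag-created`, and so on); the real names depend on the review system in use.

```python
# Hypothetical event names mapped to the recommended gates.
GATES = {
    "patchset-created": "review",        # user ran git review
    "approved": "approval",              # user gave +2
    "submit": "build",                   # approved change submitted for merge
    "comment-integration": "integration",
    "tag-created": "release",
    "timer": "periodic",
}


def gate_for(event_type):
    """Return the gate name for an event, or None if no gate is configured."""
    return GATES.get(event_type)
```

Events with no entry are simply ignored, so new gates can be added without touching the dispatch logic.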
10. Gating system
• The gating system will also review each of the changes providing
a +1, -1, approved or rejected output based on the results from
the jobs executed at a specific gate.
• Provides an easy API interface to collect metrics, send
notifications, create reports and execute jobs based on other
triggers.
11. Packages and Artifacts
• Create packages only from approved changes which are replicated to
the final code destination (github, stash, git server).
• Use OS package systems (rpm, debs)
• Keep a sane versioning scheme: v0.XXXX for development, vY.XXX for
production.
• The gating system provides two gates where packages can be built: the build
gate, which can create a package for each commit, and the release gate,
which creates a package from a specific tag. Use the latter for
production packages.
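A minimal sketch of the versioning idea, assuming the development counter is a commit number and release tags look like `v2.131`. Both conventions and the function name `package_version` are assumptions for illustration.

```python
def package_version(commit_number, release_tag=None):
    """Pick a package version for the build gate or the release gate.

    Build gate (no tag): development version 0.<commit_number>.
    Release gate (tag given): the tag itself, without a leading "v".
    """
    if release_tag is not None:
        if release_tag.startswith("v"):
            return release_tag[1:]
        return release_tag
    return "0.%d" % commit_number
```

Keeping development builds under major version 0 means a production package always sorts above any per-commit package in rpm or deb version comparison.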
12. Distribution of Packages
• Have a central repository which will be RW to store all the
packages.
• Replicate packages on all Cloud availability zones, Data Centers,
environments.
• Use existing tools or at least rsync to move the data around.
• All your repository endpoints must be RO.
• Use CDN when possible.
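If rsync is the replication tool, the push from the central read-write repository to each read-only endpoint can be scripted along these lines. The directory and mirror names are placeholders, and the flag choice (`-a`, `--delete`) is one reasonable assumption rather than a recommendation from the talk.

```python
def rsync_command(source_dir, mirror):
    """Build the rsync invocation that mirrors the central repo to one endpoint.

    -a preserves permissions and timestamps; --delete removes packages on the
    mirror that no longer exist in the central repository.
    """
    # A trailing slash makes rsync copy the directory contents, not the directory.
    source = source_dir.rstrip("/") + "/"
    return ["rsync", "-a", "--delete", source, mirror]


def replication_plan(source_dir, mirrors):
    """Return one rsync command per availability zone, data center or environment."""
    return [rsync_command(source_dir, mirror) for mirror in mirrors]
```

Each returned list can be handed to `subprocess.check_call` as is; building the commands separately from running them keeps the plan easy to review and log.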
13. Infrastructure as Code
• Puppet/Chef/Ansible/Whatever
• Write your code in a recursive way (or with specific order) to allow
deployments with one iteration (only one run).
• Your code should also go through the review system.
• The config management server should pull the code from the
replicated git servers (github, stash, git server).
• Make sure your server is pulling code based on the tag releases.
• Use a change management window if necessary.
14. Deployment to Production
• Deployment will be controlled by the config management server
when new code gets pulled from the git repositories.
• The clients will download the specific packages based on the
config mgmt instructions.
• Use orchestration tools if your application needs it.
• The trick is in generating a stable and reliable package as well as
good configuration management code.
16. Tuning OpenStack for availability and
performance in large production deployments
Raj Geda, Gabriel Capisizu
Cloud Platform Engineering
17. Agenda
• Large scale, High Availability
• Infrastructure life cycle
• LDAP and Keystone integration
• Keystone SSL, PKI Tokens
• Nova
• KVM
• Database
• RabbitMQ
18. What is a large (scale) production environment?
Multiple DCs
Thousands of hypervisors
Tens of thousands of VMs
Millions of requests/min to API Endpoints
19. High Availability
Any control plane service distributed among failure zones
Hardware load balancer (HA pair) in front of any service
No L2 spanning across availability zones
Redundant power
Redundant network connectivity
20. High Availability – compute node
[Diagram: the compute node connects to two TOR switches, each uplinked to distribution, through redundant 10G NICs bonded as bond0 (LACP, active/active) carrying 802.1q trunks. A separate 1G link to the management switch provides ipmi/out-of-band access.]
10G redundant NICs
Active/active LACP
Trunks with 802.1q
Out-of-band management/IPMI
21. Infrastructure Lifecycle management
Bare metal provisioning – Foreman, Cobbler
Classification of systems
Configuration management – puppet, masterless
Orchestration - salt, fab
22. Enterprise directories and OpenStack
Where are your users?
Why do you need identity management with your enterprise
directory?
Active Directory or LDAP
Why LDAP?
22
23. Keystone and LDAP
Keystone role in OpenStack
LDAP for identity, SQL for assignment
Read-Only LDAP
Use SSL for connecting keystone with LDAP (use ldaps:// rather than ldap:// ) –
use a trusted CA, turn verification of certs ON
Create proxy user for keystone to bind – set permissions, ACI
LDAP capacity – keep an eye on load and performance of the directory server(s)
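The SSL advice above is easy to check mechanically. The sketch below reads a keystone.conf-style file and flags the obvious problems; it assumes the `[ldap]` options `url`, `tls_req_cert` and `tls_cacertfile`, and the function name and messages are illustrative only.

```python
import configparser


def ldap_tls_problems(conf_text):
    """Return a list of TLS-related problems found in the [ldap] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("ldap"):
        return ["no [ldap] section"]
    problems = []
    # ldaps:// encrypts the connection between keystone and the directory.
    if not parser.get("ldap", "url", fallback="").startswith("ldaps://"):
        problems.append("url should use ldaps://")
    # 'demand' makes the client reject a missing or untrusted server certificate.
    if parser.get("ldap", "tls_req_cert", fallback="") != "demand":
        problems.append("tls_req_cert should be 'demand'")
    if not parser.get("ldap", "tls_cacertfile", fallback=""):
        problems.append("tls_cacertfile is not set")
    return problems
```

A check like this fits naturally into the review gate for configuration management code, so a plain `ldap://` URL never reaches production.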
24. End to End encryption and Keystone
Understand the flow of data
25. Staying on top of your (keystone) security
[Diagram: clients reach keystone through load balancer VIPs at https://keystone-admin/ and https://keystone/. The client-facing side (client ssl) presents signed certificates; the load balancer re-encrypts (server ssl) to each keystone server, which runs apache/mod_wsgi/mod_ssl on ports 35357/ssl and 5000/ssl with self signed certificates.]
Certificates: trusted CA signed vs self signed
Front keystone servers with apache/mod_wsgi or nginx/uwsgi
Load balancers in front of your keystone servers, with custom checks to validate services
chmod 640 keystone.conf
26. Keystone and PKI tokens
Why not just use UUID tokens?
Why PKI tokens are better
improved performance – no calls to keystone for validation
first get the public key (certificate) cache it, subsequent calls use it to
validate tokens
keystone server encodes, signs token
client validates expiration, revocation list, signature, decodes token
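The point of the flow above is that validation happens locally, with no round trip to keystone. The sketch below shows that shape only: it checks a signature, the expiration and a revocation list. It is not the real token format. Keystone PKI tokens carry an asymmetric signature verified with the cached certificate, while this stand-in uses an HMAC from the standard library so the example stays self-contained. The field names `id` and `expires_at` are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def sign_token(payload, key):
    """Server side: encode the payload and append a signature (HMAC stand-in)."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(key, body, hashlib.sha256).hexdigest().encode()
    return body + b"." + sig


def validate_token(token, key, revoked_ids, now=None):
    """Client side: verify locally and return the payload, or None if invalid."""
    body, _, sig = token.rpartition(b".")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None                      # signature does not match
    payload = json.loads(base64.urlsafe_b64decode(body))
    now = time.time() if now is None else now
    if payload["expires_at"] <= now:
        return None                      # token has expired
    if payload["id"] in revoked_ids:
        return None                      # token is on the revocation list
    return payload
```

The revocation list still has to be refreshed from keystone periodically, which is the one piece of shared state that local validation cannot avoid.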
27. Issues with PKI tokens
Token size
Catalog size
No catalog filtering
Make sure the components that use keystone support large tokens
Raise the default to accommodate large tokens (MAX_HEADER_LINE = 32768)
apache – mod_wsgi
python eventlet – wsgi.py
Use – nocatalog
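To see why the limit matters, it helps to measure a token against it. The sketch below approximates the header line a client would send and compares it with the limit; the 8192 default is an assumption here (it is the usual eventlet default), and exactly how a given server counts the line may differ slightly.

```python
DEFAULT_MAX_HEADER_LINE = 8192    # assumed default limit
RAISED_MAX_HEADER_LINE = 32768    # value suggested above


def header_fits(token, limit):
    """Return True if an X-Auth-Token header carrying `token` fits in `limit` bytes."""
    line = "X-Auth-Token: " + token
    return len(line.encode("utf-8")) <= limit
```

A token that embeds a large service catalog can pass 8192 bytes quickly, which is why trimming the catalog and raising the limit are usually done together.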
29. 29
<Apache-nova.conf >
Listen 8774
<VirtualHost *:8774>
WSGIScriptAlias / /opt/wsgi/nova-api.wsgi
WSGIDaemonProcess nova-api user=nova group=nogroup processes=3
threads=10 python-path=/usr/share/pyshared/nova
WSGIProcessGroup nova-api
# SSL Config
SSLEngine on
SSLCertificateFile /etc/ssl/certs/server.crt
SSLCertificateKeyFile /etc/ssl/private/server.key
ErrorLog /var/log/nova/nova-api.log
LogLevel info
CustomLog /var/log/nova/nova-api.log combined
</VirtualHost>
< /opt/wsgi/nova-api.wsgi>
import os
import sys
from nova import log
from nova import utils
from paste.deploy.loadwsgi import loadapp
sys.path.insert(0,(os.path.dirname(os.path.realpath(__file__))))
sys.stdout = sys.stderr
# Read nova configuration options and pick the default configuration file
# Typically /etc/nova/nova.conf
flags = utils.default_flagfile()
# Import nova gflags information for ease of use
from nova import flags
flags.FLAGS(sys.argv)
log.setup()
# Location of the paste-deploy configuration file
config = '/etc/nova/api-paste.ini'
# Application that mod_wsgi will be deploying
application = loadapp('config:%s' % config, name='nova-api')
Tuning of Nova API
# Raise the default from 8192 to accommodate large tokens
import eventlet.wsgi
eventlet.wsgi.MAX_HEADER_LINE = 32768
30. Nova conductor
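[Diagram, with nova-conductor: User -> Nova API -> AMQP queue -> nova-scheduler and nova-compute; SQL access to the database goes through nova-conductor.]
[Diagram, without nova-conductor: User -> Nova API -> AMQP queue -> nova-scheduler and nova-compute; SQL access goes straight to the database.]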
to disable nova conductor use:
32. KSM (Kernel SamePage Merging)
KSM lets the hypervisor system share identical memory pages amongst
different processes or virtualized guests.
KSM is critical to performance if you want to overprovision your resources
successfully.
The ksmtuned/ksmd processes work as follows:
– scan through memory looking for duplicate pages
– merge duplicate pages into a single page
– map that page into every virtual machine that used a copy
– mark it copy-on-write
– split it into separate pages again when an individual guest writes to it
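The scan, merge and copy-on-write cycle can be modelled in a few lines. This is a toy model for intuition only: the class `ToyKsm` and its methods are invented for the example and say nothing about how the kernel implements KSM.

```python
class ToyKsm:
    """Toy model of same-page merging with copy-on-write."""

    def __init__(self):
        self.frames = {}     # frame id -> page content
        self.mapping = {}    # (guest, page number) -> frame id
        self._next_id = 0

    def _new_frame(self, content):
        frame_id = self._next_id
        self._next_id += 1
        self.frames[frame_id] = content
        return frame_id

    def _sharers(self, frame_id):
        return sum(1 for f in self.mapping.values() if f == frame_id)

    def write(self, guest, page_no, content):
        key = (guest, page_no)
        frame_id = self.mapping.get(key)
        if frame_id is None or self._sharers(frame_id) > 1:
            # New page, or copy-on-write: the writer gets a private frame.
            self.mapping[key] = self._new_frame(content)
        else:
            self.frames[frame_id] = content

    def scan_and_merge(self):
        """Point every page with identical content at a single shared frame."""
        by_content = {}
        for key, frame_id in list(self.mapping.items()):
            keeper = by_content.setdefault(self.frames[frame_id], frame_id)
            if keeper != frame_id:
                self.mapping[key] = keeper
                if self._sharers(frame_id) == 0:
                    del self.frames[frame_id]   # duplicate frame is freed
```

Two guests holding the same page use two frames before a scan and one after it; the first write by either guest brings the count back to two without disturbing the other guest's data.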
36. Highly Available MySQL
wsrep
Galera approach is Data Centric
Connect to any node to write
No headache for auto increment
Replicate the full dataset across all
nodes
38. MySQL Cluster Limitations
Supports only InnoDB
A primary key is a must
Commit latency (depends on how many nodes are in the cluster)
Doesn’t like huge transactions
Deadlock on commit
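Because any node accepts writes, a conflicting transaction can fail at commit time with a deadlock error, and the usual answer is to retry it. A minimal sketch follows, assuming a hypothetical `DeadlockError` that stands in for whatever exception the database driver raises.

```python
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock-on-commit error."""


def run_with_retry(transaction, attempts=3, delay=0.0):
    """Run `transaction` (a callable), retrying when it deadlocks on commit."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == attempts:
                raise                      # give up after the last attempt
            time.sleep(delay * attempt)    # simple linear backoff
```

The transaction has to be safe to run again from the start, which is one more reason to keep transactions small.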
39. Rabbit ‘MQ’
They used to make cartoons about me
and
now they are using me in data queues!
40. What can RabbitMQ do for you?
Clustering support
Highly Available Queues
Implements the latest AMQP spec (0.9/1.0)
Federation
Flexible Routing