7 OpenStack Best Practices
Private Clouds Made Easy
Sirish Raghuram, Co-founder, CEO, Platform9
Roopak Parikh, Co-founder, VP Engineering, Platform9
© 2015 Platform9 Systems, Inc. Webinar: Best Practices for OpenStack in Production
Speaker Bio
Sirish Raghuram
• Co-founder, CEO at Platform9
• Previously: Staff Engineer at VMware (12 years)
• Technical and management responsibility for multiple VMware products
Roopak Parikh
• Co-founder, VP Engineering at Platform9
• Previously: Staff Engineer at VMware (7 years)
• Architect for multiple VMware products
Preamble
• Best practices from managing 50+ active OpenStack deployments
• Recommended for a technical audience looking to use OpenStack in production
• Assumes fair knowledge of OpenStack
OpenStack Architecture
[Figure: OpenStack architecture diagram. Labels: Clarity UI, CLI / Tools, Scripts, Controller, Nova, Scheduler, Cinder, Neutron, Keystone (Identity), Glance (Images), Heat (Orchestration), Compute, Basic Storage, Block Storage, Basic Network, Network]
Platform9 Managed OpenStack
• Your servers host your data
• Platform9 hosts the OpenStack controller as a service, with an SLA
• No need to install, monitor, troubleshoot or upgrade OpenStack
#1: Instrument & Monitor
• Controller API logs
  • Nginx or Apache
• Controller services
  • /var/log/nova/*, /var/log/glance/*, /var/log/keystone…
• RabbitMQ
  • /var/log/rabbitmq
• Controller system health
  • CPU, memory, disk, network
  • File descriptors
  • Sockets
• Compute node logs (occasionally)
  • nova, glance, other services
  • Rarely, libvirt
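As a rough illustration of watching the controller service logs listed above, the sketch below counts ERROR and CRITICAL lines per service. It assumes the common oslo.log line layout (timestamp, process id, level, logger name); the function name `count_problems` and the sample lines are hypothetical, and a deployment with a custom log format would need a different pattern.

```python
import re
from collections import Counter

# Assumed line layout (a common oslo.log default); adjust the pattern if
# your deployment uses a custom logging format:
#   2015-05-01 12:00:00.123 4242 ERROR nova.compute.manager [req-...] message
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)? "  # timestamp
    r"\d+ "                                              # process id
    r"(?P<level>[A-Z]+) "                                # log level
    r"(?P<logger>[\w.]+)"                                # logger name
)

ALERT_LEVELS = {"ERROR", "CRITICAL"}


def count_problems(lines):
    """Count ERROR/CRITICAL lines per top-level service (nova, glance, ...)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") in ALERT_LEVELS:
            # "nova.compute.manager" is attributed to the "nova" service.
            service = match.group("logger").split(".")[0]
            counts[service] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "2015-05-01 12:00:00.123 4242 INFO nova.api [req-1] GET /servers",
        "2015-05-01 12:00:01.456 4242 ERROR nova.compute.manager [req-2] boom",
        "2015-05-01 12:00:02.789 5151 CRITICAL glance.store [req-3] disk full",
    ]
    print(dict(count_problems(sample)))
```

Lines that do not match the assumed layout (for example, traceback continuation lines) are simply skipped rather than counted.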
Platform9 Log Telemetry
[Figure: log telemetry pipeline. Labels: raw logs (multiple sources), pre-process (filter), log storage, archival and search, alert filters, alert mechanism, alerts]
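The stages named in the telemetry figure can be sketched as a small pipeline: pre-process raw lines, store what survives, then run alert filters over the stored lines. This is a minimal sketch, not Platform9's implementation; the function names, the `drop_substrings` default, and the `store`/`notify` callbacks are all assumptions made for illustration.

```python
def preprocess(raw_lines, drop_substrings=("DEBUG",)):
    """Filter stage: strip whitespace, drop empty and low-value lines."""
    for line in raw_lines:
        line = line.strip()
        if line and not any(s in line for s in drop_substrings):
            yield line


def match_alerts(lines, alert_filters):
    """Alert-filter stage: yield (filter_name, line) for each match."""
    for line in lines:
        for name, predicate in alert_filters.items():
            if predicate(line):
                yield name, line


def run_pipeline(raw_lines, alert_filters, store, notify):
    """Store every pre-processed line, then notify on alert matches."""
    kept = []
    for line in preprocess(raw_lines):
        store(line)  # stands in for log storage, archival and search
        kept.append(line)
    for name, line in match_alerts(kept, alert_filters):
        notify(name, line)  # stands in for the alert mechanism


if __name__ == "__main__":
    stored, alerts = [], []
    filters = {"amqp": lambda line: "AMQP" in line}
    run_pipeline(
        ["  DEBUG noise ", "ERROR AMQP server unreachable", "INFO ok"],
        filters,
        stored.append,
        lambda name, line: alerts.append((name, line)),
    )
    print(stored)
    print(alerts)
```

Passing `store` and `notify` as callables keeps the sketch independent of any particular log store or paging service.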
Takeaways
• 100% automation is key
• Alerts can be very noisy
• Future:
  • Sentry / Rollbar to easily discern problem areas by severity and priority
  • Migrate from Papertrail to ELK?
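One common way to tame noisy alerts is to suppress repeats of the same alert within a time window. The sketch below shows that idea only; the class name `AlertThrottle`, the 300-second default, and the notion of an alert "key" are hypothetical choices, not something described in the slides.

```python
import time


class AlertThrottle:
    """Suppress repeats of the same alert key within a time window."""

    def __init__(self, window_seconds=300, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested
        self._last_sent = {}  # alert key -> time it was last let through

    def should_send(self, key):
        """Return True if an alert with this key should go out now."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # the same alert fired recently: drop it
        self._last_sent[key] = now
        return True


if __name__ == "__main__":
    throttle = AlertThrottle(window_seconds=300)
    for key in ["rabbitmq-down", "rabbitmq-down", "disk-full"]:
        print(key, throttle.should_send(key))
```

A key might combine the service name and the error class, so that distinct problems still get through while a flapping one is reported once per window.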
#2: High Availability Configuration
• Common points of failure
  • OpenStack controller
    • Database
    • Python applications (Keystone, Nova, Glance, et al.)
    • RabbitMQ
  • Compute nodes
    • Agent software uptime
Platform9 HA Architecture
[Figure: HA architecture diagram. Labels: compute nodes, intranet, Internet, UI, virtual IP, load balancer, three OpenStack controllers, replicated DB]
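To illustrate the load-balancer role in the figure, here is a minimal sketch of health-checked round-robin selection across several controllers. It is not Platform9's load balancer; `ControllerPool`, the `probe` callable, and the controller names are hypothetical, and a real deployment would normally rely on a dedicated load balancer rather than application code.

```python
class ControllerPool:
    """Pick controllers in rotation, skipping ones that fail a health probe."""

    def __init__(self, backends, probe):
        self.backends = list(backends)
        self.probe = probe  # callable: backend -> truthy when healthy
        self._next = 0      # index to start the next search from

    def pick(self):
        """Return the next healthy backend, or raise if none respond."""
        for offset in range(len(self.backends)):
            index = (self._next + offset) % len(self.backends)
            backend = self.backends[index]
            if self._is_healthy(backend):
                self._next = (index + 1) % len(self.backends)
                return backend
        raise RuntimeError("no healthy controller available")

    def _is_healthy(self, backend):
        try:
            return bool(self.probe(backend))
        except OSError:
            return False  # an unreachable backend counts as unhealthy


if __name__ == "__main__":
    down = {"ctrl-2"}  # pretend the second controller is offline
    pool = ControllerPool(["ctrl-1", "ctrl-2", "ctrl-3"], lambda b: b not in down)
    print([pool.pick() for _ in range(4)])
```

The probe is injected so it can be anything from a TCP connect to an HTTP request against an API endpoint.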
#3: Backup / Restore
• The SLA requires quick recovery from the loss of a controller
• Back up the controller DB
• Back up the controller state
• Automate the recipe to restore from backup
• Test the restore recipe
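The "test the restore recipe" point can be sketched as: archive a state directory, restore it into a scratch location, and compare the result with the source. This is an illustrative sketch for file-based state only; the function names are hypothetical, and backing up the controller database would need the database's own dump tooling, which is not shown.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def tree_digest(root):
    """Hash the relative path and contents of every file under root."""
    digest = hashlib.sha256()
    root = Path(root)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def backup(state_dir, archive_path):
    """Write state_dir into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(state_dir, arcname=".")


def restore(archive_path, target_dir):
    """Unpack a trusted backup archive into target_dir."""
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target_dir)


def verify_restore(state_dir, archive_path):
    """Restore into a scratch directory and compare against the source."""
    with tempfile.TemporaryDirectory() as scratch:
        restore(archive_path, scratch)
        return tree_digest(scratch) == tree_digest(state_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        state = Path(work) / "state"
        state.mkdir()
        (state / "example.conf").write_text("[DEFAULT]\n")
        archive = Path(work) / "backup.tar.gz"
        backup(state, archive)
        print("restore verified:", verify_restore(state, archive))
```

Running the verification on a schedule, not just once, is what turns a backup into something the SLA can depend on.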
#4: Upgrade / Patch Rollout
• Automated mechanism to roll out
  • Controller upgrades
  • Compute node agent upgrades
• Plan for testing an upgrade before committing
• Roll back if required
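A rollout with verification and rollback, as outlined above, might look like the following sketch. The `upgrade`, `verify`, and `rollback` callables are placeholders supplied by the caller, and the node names are hypothetical; nothing here reflects Platform9's actual tooling.

```python
def rolling_upgrade(nodes, upgrade, verify, rollback):
    """Upgrade nodes one at a time; undo everything on the first failure.

    Returns (succeeded, upgraded_nodes). upgrade, verify and rollback are
    caller-supplied callables that each take a node name.
    """
    done = []
    for node in nodes:
        upgrade(node)
        done.append(node)
        if not verify(node):
            # Roll back in reverse order, including the node that failed.
            for upgraded in reversed(done):
                rollback(upgraded)
            return False, []
    return True, done


if __name__ == "__main__":
    log = []
    ok, upgraded = rolling_upgrade(
        ["canary", "node-1", "node-2"],
        upgrade=lambda n: log.append(("upgrade", n)),
        verify=lambda n: n != "node-1",  # simulate a failed check on node-1
        rollback=lambda n: log.append(("rollback", n)),
    )
    print(ok, upgraded)
    print(log)
```

Putting a canary node first in the list means a bad build is caught before it reaches the rest of the fleet.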
Platform9 Orchestration
[Figure: orchestration diagram. Fresh install: vanilla OS, template image V1, customer server V1. Upgrade: vanilla OS, template image V2, customer state, customer server V2]
Platform9: Havana to Juno Upgrade
#5: Workload Tiering
• Segregate underlying infrastructure for different classes of workloads (or users!)
  • By workload, hardware type, geography or organization
• Illustrations:
  • Test/Dev vs Production
  • Tier 1 vs Tier 2
  • SSD vs HDD
Intelligent Placement
[Figure: placement diagram. Labels: DevOps, private cloud, Tier-1, Tier-2, Tier-1 infra, Tier-2 infra]
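Tier-aware placement can be modelled as matching a workload's requirements against labels on each host. In Nova this kind of segregation is commonly expressed with host aggregates and flavor extra specs; the sketch below is a standalone model of the matching idea, not Nova's scheduler code, and the host names and label keys are hypothetical.

```python
def eligible_hosts(hosts, required):
    """Return names of hosts whose labels satisfy every required key/value."""
    return [
        name
        for name, labels in hosts.items()
        if all(labels.get(key) == value for key, value in required.items())
    ]


if __name__ == "__main__":
    # Hypothetical inventory: host name -> labels describing its tier.
    hosts = {
        "host-a": {"tier": "1", "disk": "ssd"},
        "host-b": {"tier": "2", "disk": "hdd"},
        "host-c": {"tier": "1", "disk": "hdd"},
    }
    print(eligible_hosts(hosts, {"tier": "1"}))
    print(eligible_hosts(hosts, {"tier": "1", "disk": "ssd"}))
```

An empty `required` mapping matches every host, which corresponds to a workload with no tiering constraint.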
#6: Hardened Messaging Libs
• OpenStack controller and compute node software communicate over message queues
• Reliable message delivery is critical to OpenStack
• Issue
  • Once in ~2-5000 API requests, a compute node or controller node can lose its connection to the queue
  • Result: messages get stuck in the queue and are never delivered
  • Result: operations can stall, seemingly at random
• Resolution
  • oslo.messaging heartbeat applied Jan 2015
    • Ref: https://github.com/openstack/oslo.messaging/commit/b9e134d7e955b9180482d2f7c8844501c750adf6
    • Disabled in April: https://github.com/openstack/oslo.messaging/commit/287a4f56f45ed9cd40116a9e7b6e529f3382a925
  • Platform9 has its own heartbeat mechanism, which leverages the Platform9 WebSocket architecture
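The general idea behind heartbeating is to notice a silently dropped connection by tracking when each peer was last heard from. The sketch below shows only that detection logic; it is neither the oslo.messaging implementation nor Platform9's WebSocket-based mechanism, and `HeartbeatMonitor`, the 60-second timeout, and the peer names are assumptions for illustration.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per peer and report peers that went quiet."""

    def __init__(self, timeout_seconds=60, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be tested
        self._last_seen = {}  # peer name -> time of last heartbeat

    def beat(self, peer):
        """Record a heartbeat received from peer."""
        self._last_seen[peer] = self.clock()

    def stale_peers(self):
        """Return peers with no heartbeat within the timeout, sorted by name."""
        now = self.clock()
        return sorted(
            peer
            for peer, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=60)
    monitor.beat("compute-1")
    monitor.beat("compute-2")
    # Immediately after a beat nothing is stale yet.
    print(monitor.stale_peers())
```

A caller would typically poll `stale_peers()` periodically and force a reconnect for each peer it returns, so that queued messages are not left waiting on a dead connection.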
#7: Troubleshooting / Debugging
• Issue #6 is an example of an issue you will run into
• Be prepared to
  • Debug / diagnose
    • It took us ~7 man-days to debug issue #6 (worst-case example)
  • Roll out a patch
• Techniques
  • Separate webinar topic!
Recap
• Reviewed 7 best practices for running OpenStack successfully
• Share your own tips via the GTM chat panel!
Summary
• Production-grade OpenStack without the hard work
  • Request your own Platform9 account
• Related resources
  • OpenStack benefits for KVM / VMware (recorded webinars)
  • Upcoming webinar: Jun 7, 2015
• Have questions?
  • Ask away!
  • Get in touch:
    • @Platform9Sys
    • support@platform9.com
