A play in three acts:
1. State of OpenStack's maturity for Enterprises
2. Some lessons learned over the 5 years we've been involved with OpenStack
3. Workloads that run on OpenStack and how to succeed with building cloud-native applications.
3. OpenStack success in the enterprise
State of the union, 2016
Upstream
Distro
Solution
Operated
Sweet spot
$ $ $ $ $
4. Deployment / lifecycle lessons
V1
Triple-O
V2+
Ansible
Standard configurations
Text-based, transparent
Easy to tweak
Easy to update
Predictable upgrades
Secure out of the box
5. Management lessons
V1
Ceilometer, Horizon
V2+
Monasca, Ops Console
Operations as a first-class persona
Separate OLTP / Management DB’s
Scalable event store
Alarm engine
Prescribed resolutions
Remediation and lifecycle actions
Time series visualization
6. Security lessons
V1
“Read the whitepaper”
V2+
Barbican, Bandit
TLS for endpoints
TLS for internal services
Barbican for key management
Data-at-rest encryption
Bandit: static analysis in CI/CD
Audit logging: PCI compliance
Standard configs secure by default
7. Platforms are about workloads
OpenStack
Cattle workload
Requirements
Features/complexity (IaaS+: Heat, LBaaS, …)
2013
11. Cloud-native workloads
“DIY” by stitching together services versus “delegate to a platform”
                          | “AWS approach”                      | PaaS approach
Unit of compute           | EC2/ECS (nova/docker)               | Docker/runC
App deployment/versioning | CloudFormation (heat)               | Built in
Load balancing            | ELB (neutron-lbaas)                 | Built in
Zero-downtime deployment  | CodeDeploy/Elastic Beanstalk (DIY)  | Built in
Services (DB/queuing)     | RDS (trove) / SQS (zaqar/cue)       | Service brokering
App health monitoring     | CloudWatch (monasca)                | Built in
Auto-scaling              | Auto Scaling (heat?)                | Built in
Log aggregation           | CloudTrail (ELK/DIY?)               | Built in
12. What about containers?
“Everything is a container” versus “delegate to a platform”
                          | CaaS approach                            | PaaS approach
Unit of compute           | Docker/runC                              | Docker/runC
App deployment/versioning | Pods, replication controllers / compose  | Built in
Load balancing            | k8s services (plumbed to LB)             | Built in
Zero-downtime deployment  | k8s rolling-update                       | Built in
Services (DB/queuing)     | Data service in a pod?                   | Service brokering
App health monitoring     | Datadog? Sysdig?                         | Built in
Auto-scaling              | “Horizontal pod auto-scaler”             | Built in
Log aggregation           | Fluentd, Elasticsearch, Kibana           | Built in
13. Use the platforms, Luke!
OpenStack Platform
Cloud Native Platform
Traditional / High-end Workloads
Apps Apps Apps Apps Apps Apps Apps
15. Other talks you should check out…
HPE Track                 | Speakers                                                                 | Title
Tuesday 11:15 AM–11:55 AM | Joy Dorairaj                                                             | Security & Compliance in OpenStack
Tuesday 12:05 PM–12:45 PM | Tom Howley                                                               | Lifecycle Management of OpenStack Using Ansible
Tuesday 2:00 PM–2:50 PM   | Joy Dorairaj                                                             | Achieving OpenStack Carrier-Grade Performance and Reliability
Tuesday 2:50 PM–3:30 PM   | Nayana Dhawalbhakta                                                      | Multi-Data Center OpenStack Carrier Grade for CSPs
Tuesday 3:40 PM–4:20 PM   | HPE & Telstra Executive                                                  | Full ISO 7-Layer Stack Fulfillment, Activation and Orchestration of VNFs in Carrier Networks
Tuesday 4:40 PM–5:20 PM   | Swami Vasudevan, Fabrizio Fresco, Matt Young, Joy Dorairaj, Paul Murray  | OpenStack in Production Panel
Tuesday 5:30 PM–6:10 PM   | Henrik Blixt, Dave Hawley, Matt Young, Nathanial Dillon                  | Ignite Session: What's Hot and What's New
I want to start by providing an honest assessment of OpenStack’s maturity level for Enterprise customers, circa 2016.
First, it’s important to note that, as with Linux, there are a number of consumption models, with trade-offs between how much you pay a vendor and how much effort you put in yourself.
With any open source project, at the far end of the spectrum is the “DIY” camp – where you directly interact with the upstream ecosystem. Not for the faint of heart, but some very sophisticated companies / early adopters have found early success there. But you need a deep commitment and a set of OpenStack upstream contributors to make that work.
Next up is consuming a distro. This is where many enterprises are with Linux today. Our observation is that OpenStack is not yet turnkey enough to be consumed as a distro; rather, most of the enterprise success we see comes when OpenStack is consumed as a turnkey solution (for example, delivered as an appliance, or stood up and operated with the help of professional services and OpenStack experts). And at the far end of the spectrum is the model where another vendor fully operates your OpenStack cluster for you. We find that the sweet spot is those last two models.
An important dynamic here is that many Enterprises that started at the left end of the spectrum and experienced challenges (which gave OpenStack a mixed reputation in the Enterprise) are moving to the right end of the spectrum and experiencing success.
Time will tell whether OpenStack can cross the chasm and be consumed as a distro… with every successive release it’s getting easier to stand up and operate… but our observation, based on our customers, is that we’re not there yet.
So what have we learned over the past few years building OpenStack solutions? First, let’s talk about deploying the platform and managing its lifecycle.
In v1, we used Triple-O – running OpenStack on top of OpenStack, “turtles all the way down.” It was attractive and elegant from a computer-science perspective, but the hard-won lesson is that we needed a different approach to deploy and manage a complex distributed system like OpenStack. The difficulties included the opaqueness of the installation, the difficulty of updates (full images, no topology changes), and the impossibility of upgrades.
We moved to an Ansible-based approach to address many of these issues. One of the most important lessons: provide a set of standard configurations that match the typical topologies – this helps customers fall into the “pit of success.” Because the entire deployment system is text based (written in YAML), it is easy to tweak to match the actual customer environment. More configurability means less custom work, which in turn makes the system straightforward to update with zero downtime. And for the first time, we feel we have a deployment and lifecycle system that can support zero-downtime upgrades between releases.
And it’s more secure! Our Ansible-based lifecycle manager tooling makes it relatively straightforward to enable TLS for external-facing API endpoints as well as internal service communication. Another strong point for HOS 2.x+ on the security front is network separation; HLM makes it possible to separate different classes of traffic across customer-defined network architectures.
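To make the “text-based, transparent” point concrete, here is a sketch of what a standard-configuration input model of this kind can look like. This is purely illustrative – the key names below are hypothetical and are not the actual HLM schema:

```yaml
# Hypothetical lifecycle-manager input model (illustrative key names only).
# Operators start from a shipped standard topology and tweak in plain text.
control-planes:
  - name: control-plane-1
    topology: standard-three-node     # one of the shipped standard configs
    tls:
      public-endpoints: true          # TLS on external-facing API endpoints
      internal-services: true         # TLS between internal services
    networks:
      - name: MANAGEMENT
        traffic-groups: [mgmt, storage-replication]
      - name: EXTERNAL-API
        traffic-groups: [public-api]
```

Because the whole model is version-controllable text, diffs between the shipped standard configuration and a customer’s tweaks stay visible and reviewable.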
The proof is in our customers’ success. A year ago, it was common for OpenStack deployments to take weeks. Just the other day, a large US aerospace company deployed a 10-node PoC cluster in half a day (plus another day and a half spent on environment issues).
At the heart of management is event collection and the infamous “single pane of glass” that admins can use to gain visibility and control over their infrastructure.
Community-driven development is an iterative process. In 2014 we sometimes confused Ceilometer (which was focused on collecting usage data for metering) with an event database that could be used for monitoring – but these are different use cases. These days we know there are significant scale and architecture problems with Ceilometer, and it has since been pulled from core OpenStack.
And Horizon (the OpenStack console project) was always designed as a tenant dashboard, with some admin features bolted on. It’s a great tenant console, and we continue to invest in it.
But we’ve learned that we need to treat operations as a first-class persona. We helped build two things – Monasca and Ops Console – that focus on management of OpenStack at scale.
In v1 we made a common deployment mistake: combining the OLTP store behind Nova and friends with the store for the operational event stream. These have very different usage patterns, and we quickly found that you need to separate the stores and messaging systems – otherwise you end up asking “why is this so slow?” We built Monasca on top of a scalable event store (InfluxDB), and in Helion we provide Vertica (a lightning-fast columnar database) underneath it – much better than using something like MySQL.
You don’t just want events – you want “standing queries” that tell you when things go wrong; Monasca calls this the Alarm Engine. Just as importantly, we ship a set of prescribed resolutions – steps to resolve common issues (HPE has a wealth of experience operating OpenStack!). The coolest part is that remediations drive lifecycle actions, so your cloud can reconfigure itself.
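The idea of a “standing query” can be sketched in a few lines of Python. This is an illustration of the concept – a sliding-window threshold alarm – not Monasca’s actual implementation or API:

```python
from collections import deque
from statistics import mean

class ThresholdAlarm:
    """A 'standing query': fire when avg(metric) over a sliding window
    exceeds a threshold (conceptual sketch, not the Monasca engine)."""

    def __init__(self, metric_name, threshold, window=3):
        self.metric_name = metric_name
        self.threshold = threshold
        self.samples = deque(maxlen=window)   # sliding window of recent values
        self.state = "UNDETERMINED"

    def observe(self, name, value):
        """Feed one measurement; re-evaluate the alarm state."""
        if name != self.metric_name:
            return self.state
        self.samples.append(value)
        if len(self.samples) == self.samples.maxlen:
            self.state = "ALARM" if mean(self.samples) > self.threshold else "OK"
        return self.state

alarm = ThresholdAlarm("cpu.user_perc", threshold=90.0, window=3)
for v in [50, 60, 70, 95, 99, 97]:
    state = alarm.observe("cpu.user_perc", v)
print(state)  # the last three samples average 97, above the 90 threshold
```

In a real alarm engine the state transition (OK → ALARM) is what triggers notifications and, as noted above, can drive remediation and lifecycle actions.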
Ops Console is essentially the dashboard on top of Monasca data – you can view and manage alarms, create personalized dashboards, and look at time-series charts of important metrics and KPIs.
Put these together and you get a scalable cloud management and operations platform that is completely open source and capable of managing cloud deployments large and small. Each component is scalable, fault tolerant, and pluggable into your existing tools and processes.
In v1, security was largely an exercise for the reader: how to configure OpenStack to be secure (e.g., turn on TLS).
In this area, we’ve made tremendous gains. We turn on TLS not just for API endpoints, but also for internal service communication.
We enable the use of an ESKM for key and secret management via Barbican.
We provide data-at-rest encryption for Cinder volumes.
We use the Bandit toolset (whose development we drove) to perform static analysis on OpenStack code for security vulnerabilities. Putting this in our CI/CD system finds security issues much earlier in the cycle!
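Wiring Bandit into a pipeline is a small amount of configuration. The fragment below is a generic, illustrative CI job definition (the job structure and names are hypothetical, not any specific CI product’s schema); `bandit -r` is the tool’s real recursive-scan invocation, and Bandit exits non-zero when it finds issues, which is what fails the build:

```yaml
# Illustrative CI job: run Bandit static analysis on every commit.
static-analysis:
  stage: test
  script:
    - pip install bandit
    - bandit -r src/        # recurse the source tree; non-zero exit fails the job
```

Running this on every commit is what moves security findings from a late-stage audit to an early, routine build failure.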
Audit logging is an important step toward PCI compliance. You can forward audit trails to a centralized logging and processing system (an ELK stack, ArcSight, Splunk).
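An audit-trail entry is just a structured, forwardable record; OpenStack audit middleware emits events in the DMTF CADF format. The sketch below hand-rolls a minimal CADF-style record in plain Python to show the shape – it is illustrative only and does not use the real pycadf API:

```python
import json
from datetime import datetime, timezone

def audit_event(action, initiator, target, outcome):
    """Build a minimal CADF-style audit record (hand-rolled sketch)."""
    return {
        "typeURI": "http://schemas.dmtf.org/cloud/audit/1.0/event",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "action": action,        # e.g. "authenticate", "create/volume"
        "initiator": initiator,  # who performed the action
        "target": target,        # what it was performed on
        "outcome": outcome,      # "success" or "failure"
    }

# One JSON line per event, ready to ship to ELK / ArcSight / Splunk
line = json.dumps(audit_event("authenticate", "user:alice",
                              "service:identity", "success"))
print(line)
```

Emitting one self-describing JSON record per action is what makes the downstream “forward to a centralized system” step trivial.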
Finally, this all ties back to lifecycle management: our standard configurations are secure by default (e.g., TLS enabled for endpoints and for internal traffic).
All of this is essential to being able to run business-critical workloads on the cloud.
With that, let’s transition to talking about workloads.
Platforms are about workloads. OpenStack started out by focusing on cloud-native applications (“cattle”) as its sweet-spot workload, and that was still the case in 2013, around the Havana timeframe, when I first got involved.
But there’s been an inexorable pull “to the right” – i.e. adding features and complexity – to support more workloads.
In 2014 it became vogue to run Cloud Native Platforms (like Cloud Foundry) on top of OpenStack and have those platforms manage applications. We launched a product (HDP) to do just that.
The requirements on the IaaS platform were actually lower, and we started seeing some competition between the approaches (building directly on top of the platform, a-la “the AWS way”, vs building on top of a cloud-native platform).
We’ll get back to that.
In 2015 we started seeing OpenStack get pulled towards supporting features that allow “Pets” to run on the platform. Things like Live Migration.
Two years ago we used to say, “if you’re trying to replace vSphere with OpenStack, you’re doing it wrong.” Much hand-wringing ensued about what the future of OpenStack should be. Now, not so much…
And in 2016, now in its sixth year, OpenStack can support performance-sensitive, low-latency workloads like NFV.
This is actually great! Platforms mature when they support a diverse set of workloads. We should embrace this about OpenStack rather than wring our hands over how it evolved from its initial target of being a “cattle platform.”
Going back to how to build cloud-native applications…
Back when we first started with OpenStack, the idea was to provide the same design pattern that startups were using on AWS: a set of loosely coupled, composable services that a developer could stitch together into a working system. But that approach requires a lot of sophistication, and many who try it end up replicating a ton of undifferentiated infrastructure.
These days, most of the success we see in Enterprises building cloud-native applications is taking the PaaS approach.
What about containers? Aren’t containers going to rule the world and kill OpenStack and kill all the PaaS platforms?
Well, the current state of the art in “CaaS platforms” looks a lot like the AWS approach to building cloud-native apps. With AWS, each of these concerns is a discrete service; with CaaS platforms, each is a container that the container-management platform orchestrates alongside your app.
The PaaS approach continues to be more attractive to Enterprises we talk to, as compared to the “stitch it all together yourself” approach. The trick is that the PaaS platforms need to evolve with the times – pick up new IaaS/CaaS capabilities.
So, to bring the last lesson home: “use the platforms, Luke!”
Use OpenStack to host a wide variety of heterogeneous workloads. It’s ready for it!
Use a Cloud Native App Platform hosted on top of OpenStack to build cloud-native applications.
This is how we see the open source cloud world evolving, and this is how Enterprises can best find success in the Cloud.