1. Just Let Me Code: How We Got
Our Groove Back with AppFog &
Cloud Foundry
Dave Persing, Nebari Software
CloudOpen 2015
Seattle, WA
2. • Consulting Services
• Started with the Disney
MagicBand
• Focused on awesome IOT
experiences
• We want to take our passion
for software and IOT to
everyone
3. • Director of Engineering for Nebari
• Chief Cat Herder
• Code, Architecture, Deployment
• Formerly of Disney Interactive
• Lost my groove while preparing a
new application environment.
Luckily, I got it back.
4. Monolithic Servers Suck
• Game servers crashed frequently
during new content releases or PVP
tournament ends
• Primarily due to monolithic server
architecture
• No ability to scale needed services
horizontally to meet demand
• Only option was to add new,
massive, expensive nodes to the
pool
• Why, oh, why didn’t we make these
microservices in the beginning?
5. 12 Factor Apps
• Services separated by divisions of labor
• Can scale without big architecture changes
• APIs provide a clear contract for
communication between services
• Configuration is in the environment (no SSH
keys in your repo!)
• Keep Dev/Stage/Prod as similar as possible
• Check out 12factor.net for more info
Image Credit: https://www.ctl.io/blog/post/appfog-and-twelve-factor-apps-explained/
6. Encouraj is the engine for life's buddy system
iOS frontend with Node.js backend with SQL data store
7. The Encouraj Pre-AppFog Era
• We are not sysadmins
• Too small to have a dedicated
Ops Engineer
• Started the project on Azure-
based VMs
• Scale on Azure can be automated
but still needs configuration
• Configured with Ansible playbooks
• Quickly ran into provisioning and
configuration issues
8. Ansible
• Agentless - Uses SSH instead of an agent daemon
• Readable YAML configuration files
• Ansible and Azure hosts behind a load balancer are a
pain
• Provisioning and deployment managed with Ansible
playbooks
• Playbooks still require maintenance!
• We’re still responsible for the infrastructure (nginx,
MySQL, app nodes, etc.)
9. • We aren’t sysadmins!
• Tons of dependencies to take
into account
• Syslog, log sinks, supporting
infrastructure configuration
• Patch all the things!
• Deployment required a bastion
host due to an Ansible
limitation
• Each step of the deployment
process required custom
scripting
10. Why not Docker?
• Docker is awesome for lots of things
• Still requires some configuration for linking multiple
images
• We still would need to host our images in-house
• Setting up the private Docker Registry is easy, but
requires more maintenance
• Deploy pipelines still need to be implemented outside
building the images
11. This all made me ask…
Is the juice worth the squeeze?
12. The Encouraj Post-AppFog Era
• Platform built keeping 12Factor
App core values
• Deployment is easy!
• Configuration is easy!
• No provisioning!
• But wait, there’s more!
13. Provisioning
• No provisioning!
• Apps are defined using a YAML manifest
• App instances are created using
buildpacks
• No patch schedules
• Not responsible for ongoing infrastructure
maintenance
14. Deployment
• Wildly easy deployment!
• GitHub calls a web hook on the build
server on new commits to the repos
• Build server runs CLI commands to
push new instances to
develop/stage/production
• No more bastion hosts
• Zero downtime deployments are
incredibly easy
• No VMs means no user management
• Easy scaling with the CLI or newly
deployed manifests. App nodes come
up incredibly fast.
15. Supporting Services
• User-Provided Service Instances
mean easy integration with
supporting services
• Encouraj uses Papertrail and
Pingdom for logging and health
monitoring
• Papertrail integration is exceedingly
fast
• Binding to other services is a CLI
command
• Adding new routes to your
application is a snap
16. Pre-AppFog ===
• Maintenance Heavy
• Requires a full-time Ops Engineer
• Waking up at 3 AM wondering if I
configured that server correctly
• Waking up again at 4 AM wondering if I
remembered to apply that most recent
security patch
• Patch schedules
• Slow scaling
• Works on my machine
17. Post-AppFog ===
• My team can focus on doing what we do
best: writing software
• Sleepless nights wondering how the
Seahawks are going to do this year
• Easy scaling
• Zero downtime deployments
• Limited configuration of our app
environment
• Easy continuous deployment and delivery
• Next steps will include canary deployments
• Obligatory Success Kid
I’m Dave Persing. We’re going to talk about AppFog and Cloud Foundry and our experiences with using AppFog in real development.
We are a software consulting services company. We're almost all engineers and come from an exceedingly broad range of backgrounds.
You name the tech, one of us has probably used it.
Nebari rose from the desire to bring IOT and amazing IOT experiences to our clients. Based on the work our founders did at Disney for the MagicBand (Who is familiar with the MagicBand?), we wanted to make IOT more accessible. The MagicBand was the largest scale IOT project we've heard of. Disney World went ticketless with wristbands that included RFID and BLE technologies. Guests are able to pay for dinner, book FastPasses and even order dinner ahead of time, arrive at the restaurant and have their food magically delivered to their table wherever they decided to sit. Our founders were responsible for developing a very large part of the backend systems driving the experiences.
IOT is a big buzzword these days, so more specifically, we want to empower our clients with IOT experiences rather than the typical "Look, my fridge is attached to the internet!" kind of IOT devices. Experiences similar to the MagicBand.
Aaaaand there went my dignity. I've been into tech and software ever since I failed miserably at Little League baseball as a short, rotund pre-teen. Turns out I excelled much more at eating a Baby Ruth than embodying Babe Ruth. In the midst of wallowing in my candy bar, I found the joys of writing batch files in MS-DOS to terrify my father when he logged into his computer. It's incredible the creative curse words one can elicit from one's father with a quick batch file simulating a reformat of his C: drive. No surprise, I was grounded from the computer for a really long time.
At Nebari, I'm the Director of Engineering and Cat Herder in Chief. I'm responsible for architecture, keeping our development team on track, exploring new tech to help us be more light on our feet, making sure our processes are useful and productive, and even coding! I lost my groove while preparing a new application environment. This is how I got it back.
Prior to Nebari, I was with Disney Interactive as the Engineering Lead for Marvel: Avengers Alliance. While responsible for this project with millions of users and huge numbers of daily active users, I learned just how much I dislike huge, monolithic server architectures.
The Avengers Alliance backend was one of those huge, monolithic servers. The game servers crashed regularly during new content releases and at the ends of PVP tournaments. Let me tell you, nothing makes one more motivated to fix problems than being a victim of nerd rage. We had no ability to scale out needed infrastructure horizontally to meet demands. The backend had no options other than adding new, permanent hardware (really, really expensive), or a couple hundred drone VMs that would still crash with too many open file descriptors. The virtualized servers weren't automagic, so we had to remember to decommission these servers at the end of the tournament or risk getting dinged for several tens of thousands of dollars in compute time. Each server was responsible for ALL traffic sent to it via the load balancer rather than having nice, easily divided areas of responsibility. This is when I started learning about the 12 Factor App and saw this would have helped. A lot.
My name is Dave and I'm a 12 factor app user. Oh wait... Wrong room.
12 Factor Apps are something of an engineering ethos evangelizing the virtues of a well-defined, self-contained application to allow for scaling an individual service rather than an entire application stack. APIs on each of the services provide a clear, known contract for communications with other services. App configuration is stored in the environment as variables fed to the application at runtime. Ports, connection strings, and other settings can be loaded into the environment and configured dynamically. Another one of the 12 is an attempt to keep Dev/Stage/Prod as similar to each other as possible. Combine this with GitFlow, a source control methodology that lets us differentiate feature releases, hotfixes, and other changes to our source, and you have a full record of everything that went to any environment. Tracking is easy because we merge with no-fast-forward, which always creates a merge commit, so each feature or hotfix merge stays visible in the history alongside its individual commits.
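To make the "configuration lives in the environment" idea concrete, here is a minimal sketch in Python (Encouraj itself is Node.js, so this is an illustration, not the app's code). The variable names DATABASE_URL and LOG_LEVEL and the defaults are assumptions made up for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    port: int
    database_url: str
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Build the app's settings from environment variables only."""
    # DATABASE_URL and LOG_LEVEL are hypothetical names for this sketch.
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        # Fail fast at startup instead of falling back to a checked-in secret.
        raise RuntimeError("DATABASE_URL must be set in the environment") from None
    return AppConfig(
        port=int(env.get("PORT", "8080")),
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

The same build can then run in develop, stage, and production; only the environment around it changes, and nothing sensitive needs to live in the repo.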
Encouraj is an app we developed for a client. It's composed of an iOS frontend with Node.js driving the web tier backed by a SQL data store. Encouraj calls itself "the engine for life's buddy system". Users can share things they want help with in their lives to a limited subset of friends, called a circle. The friends can respond with encouragements and positive vibes. We had a great time writing this app.
This is the first Node.js-based app we have released to production. I’m sure many of you have used node before and know the ins and outs. We have implemented nearly end-to-end coverage with Mocha-based unit tests. These tests hit all major functionality of the application. This is necessary due to Node.js being weakly typed. Nothing brings development to a screeching halt faster than accidentally naming two functions the same thing… Of course I’ve never done that before.
I'm still trying to get my boss to acknowledge my "evil boss" request for encouragement...
When we started the Encouraj project, we knew we wanted to go with the 12 Factor app approach. Easier said than done when maintaining your own infrastructure...
We're not sysadmins at Nebari. I'm not smart enough to be a sysadmin. I know just enough to be dangerous. Nebari itself is small and doesn't have a full-time, dedicated Ops engineer on staff. All of that said and because apparently I'm a glutton for punishment, we started the project on Azure-based VMs. Azure VMs can be configured to scale automagically based on CPU load or traffic spikes. However, we had some prerequisite configuration steps that needed to take place. I'm lazy by nature, so I don't like to repeat myself with unnecessary steps. We knew we needed some sort of repeatable, fire-and-forget way to manage our nascent infrastructure. Ansible is a great little tool if you're looking for a lightweight, headless, idempotent configuration and provisioning tool.
Ansible is agentless. There's no agent daemon that runs on the host machine to pull down or implement host changes. Configuration files are built in YAML, so they're very readable and easily understood. Playbooks provide all the supporting tech needed to fully install, configure, clone, start, and otherwise modify a system to our collective wills. It's pretty great.
This is where we ran into issues on Azure, though. Using Ansible and Azure hosts behind a load balancer is a pain. Ansible has (had?) a limitation where different ports on the same host aren't picked up as unique hosts. So, trying to configure our new hosts from outside the load balancer was a no-go. We navigated the hurdle by creating a new playbook for a bastion host to proxy the requests to configure the machines. Ansible managed the deployment of the nodes; however, when we needed to add new nodes to the pool, we had to add their new hostnames to the inventory file. Not unreasonable, but not my preferred solution.
Playbooks still require maintenance. Perhaps we have explicit versioning associated with installed packages rather than using the HEAD version. sudo apt-get install MyPackage is out. Adding new pieces to the stack? Memcached? Need a new playbook. Redis? New playbook. These playbooks define your environment. If they're outdated, you risk all kinds of configuration and provisioning nightmares.
At the end, we're still responsible for maintenance! Supporting infrastructure is huge. What if we need nginx to reverse proxy to our machines? What about SSL? What about hardening MySQL? Continuing maintenance can be a monster timesuck.
There are a ton of dependencies to take into account when creating an application environment. Think syslog, log sinks, nginx, external services. All of it needs to be configured via your configuration management. I want our time focused on coding problems, not on configuration management.
Patch schedules are tough. I've lost hours of my life waiting out patch cycles to make sure live operations continue smoothly, hoping and praying to the Linux gods that our delicately designed system doesn't reject the newly grafted patches. And God forbid you have to patch your caching tier. That just ends in tears. Make sure you clear your caches post-patching, people! I speak from experience.
Deployment is still hard. Each step of the deployment process required a large amount of custom scripting. Build server to bastion host. Bastion host to each host in the pool.
Some of you are probably asking, "Hey, man... Why not just use Docker?" I hear you. I love Docker. It's great for all kinds of situations. It's something we're using on another project right now. It's something I investigated and ultimately decided to pass on. Why? Docker still requires some configuration. I'm really paranoid about exposing our clients' code and possible private images, so I want to host our own Docker Registry. Setting up a Docker Registry is easy, but that's more infrastructure we have to maintain. Outside of those reasons, there was still a whole deployment pipeline that needed implementation. We still need to get the images onto the machines. In my mind, it's a "best tool for the job" kind of question. Our Encouraj client isn't technical, so we took the path of least resistance that fulfilled both of our requirements.
So all of this made me start to ask... Is the juice of provisioning, configuring, and maintaining our own environment worth the squeeze?
AppFog is a platform built with those 12 Factor App core values. Under the AppFog covers, Cloud Foundry drives the management. AppFog provides some fantastic insights into what our costs are, the number of nodes running, and an app-by-app breakdown of costs. Deployment is easy! Configuration is easy! I don't have to provision VMs! BUT WAIT! There's more!
I don't have to provision VMs! Apps are defined using YAML manifests. We have an app manifest for each of our environments for dev/stage and production. App instances are created using buildpacks. Buildpacks are self-contained environments-in-a-box for all sorts of languages. Node.js, Golang, Mono just to name a few.
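For a feel of how small a per-environment manifest can be, here is a rough sketch that renders one as text. The attribute names (name, memory, instances, env) are standard Cloud Foundry manifest keys; the app name, sizes, instance counts, and the NODE_ENV variable are assumptions for illustration, not Encouraj's real settings.

```python
from typing import Dict

# Hypothetical per-environment settings; the numbers are illustrative only.
ENVIRONMENTS: Dict[str, Dict[str, object]] = {
    "develop": {"instances": 1, "memory": "256M"},
    "stage": {"instances": 1, "memory": "256M"},
    "production": {"instances": 3, "memory": "512M"},
}


def render_manifest(app: str, environment: str) -> str:
    """Return the text of a minimal manifest for one environment."""
    settings = ENVIRONMENTS[environment]
    lines = [
        "---",
        "applications:",
        f"- name: {app}-{environment}",
        f"  memory: {settings['memory']}",
        f"  instances: {settings['instances']}",
        "  env:",
        f"    NODE_ENV: {environment}",
    ]
    return "\n".join(lines) + "\n"
```

The environments differ only in a handful of values, which is what keeps Dev/Stage/Prod close to each other.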
I'm not responsible for patch schedules! My team and I don't need to be on Slack waiting for the fit to hit the shan while I'm patching machines in the middle of the night.
Most importantly, we're not responsible for ongoing infrastructure maintenance! Who has managed an infrastructure with a large user base before? Takes lots of time, right? Tons of moving parts! For me and my team, I would much prefer us doing what we're GOOD at. Writing code over maintaining infrastructure.
Deployment is incredibly straightforward. Our GitHub instance calls a web hook on our build server (Encouraj uses Strider, if anyone is curious) whenever new commits are pushed to the repository. The build server executes two commands to push a new build up to AppFog. Two lines! It could even be one line if we were feeling crazy! No more bastion hosts to configure and maintain. I'm not maintaining VMs, so I don't need to remember to remove the SSH key of that one guy who used to work with us from authorized_keys. My n00b engineers don't accidentally delete the production database. (I'm not the only one that's learned this hard lesson, right? Guess I'm keeping the Hello Kitty party hats of shame on my desk for yet another day...) Scaling up to more nodes is insanely quick. Either edit the manifest or modify the instances via the CLI. Done.
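As a sketch of the decision a build server makes when the web hook fires, the snippet below maps the pushed Git ref to an environment and builds the push command. The branch-to-environment mapping (GitFlow-style) and the manifest file names are assumptions; `cf push -f` is the real flag for pointing at a manifest, but the two commands actually run for Encouraj are not spelled out in this talk.

```python
from typing import List, Optional

# Assumed GitFlow-style mapping; adjust to your own branching model.
BRANCH_ENVIRONMENTS = {
    "develop": "develop",
    "master": "production",
}


def environment_for_ref(ref: str) -> Optional[str]:
    """Map a pushed ref such as 'refs/heads/develop' to an environment."""
    prefix = "refs/heads/"
    if not ref.startswith(prefix):
        return None  # tags and other refs never trigger a deploy
    branch = ref[len(prefix):]
    if branch.startswith("release/"):
        return "stage"
    return BRANCH_ENVIRONMENTS.get(branch)


def push_command(ref: str) -> Optional[List[str]]:
    """Build the cf push command for a ref, or None if nothing should deploy."""
    environment = environment_for_ref(ref)
    if environment is None:
        return None
    # Hypothetical naming scheme: one manifest file per environment.
    return ["cf", "push", "-f", f"manifest-{environment}.yml"]
```

Feature branches fall through to None, so only develop, release, and master pushes ever reach an environment.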
One of the coolest features we're using is zero-downtime deploys. It's as simple as pushing your new app with a temporary route, adding your app to the main pool, taking the old app out of the rotation, and removing the temporary route. Boom. Done. Scripting these types of deployments is hard. Sure, you can achieve the same thing using all kinds of tools like HAProxy and Consul. But can you do it in 4 CLI commands?
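The four steps can be sketched as a list of cf CLI invocations. This follows the blue-green pattern from the Cloud Foundry docs using the v6-era flag spellings (`-n` for hostname); the app names, domain, and hostname are made up, and newer CLI versions may spell things differently, so check `cf help` before copying.

```python
from typing import List


def zero_downtime_commands(
    old_app: str, new_app: str, domain: str, host: str
) -> List[List[str]]:
    """Return the four cf commands for a blue-green style deploy."""
    temp_host = f"{host}-temp"  # hypothetical naming for the temporary route
    return [
        # 1. Push the new build under a temporary route.
        ["cf", "push", new_app, "-n", temp_host],
        # 2. Add the new build to the live route (both builds now serve traffic).
        ["cf", "map-route", new_app, domain, "-n", host],
        # 3. Take the old build out of the rotation.
        ["cf", "unmap-route", old_app, domain, "-n", host],
        # 4. Unmap the temporary route from the new build.
        ["cf", "unmap-route", new_app, domain, "-n", temp_host],
    ]
```

Returning argument lists rather than one shell string makes each step easy to hand to `subprocess.run` and to stop on the first failure.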
AppFog allows for adding external integrations as well with User-Provided Service Instances. We're using Papertrail and Pingdom right now. Papertrail is a web-based logging tool providing all kinds of search and tail functionality much akin to Splunk. Papertrail integration was great. We exposed a log sink to the Papertrail endpoint and were done. Debugging live issues is wildly easy. You're able to bind to other services from the marketplace or your own services with a CLI command. Adding new routes for new parts of your 12 Factor App? Easy.
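Wiring a log sink through a User-Provided Service Instance comes down to two cf commands, sketched here as argument lists. `create-user-provided-service` with `-l` (a syslog drain URL) and `bind-service` are real cf CLI commands; the service name and the drain host and port below are placeholders, not a real Papertrail endpoint.

```python
from typing import List

# Drain URL schemes accepted by Cloud Foundry for log drains.
DRAIN_SCHEMES = ("syslog://", "syslog-tls://", "https://")


def log_drain_commands(app: str, service: str, drain_url: str) -> List[List[str]]:
    """Return the cf commands that create a log drain and bind it to an app."""
    if not drain_url.startswith(DRAIN_SCHEMES):
        raise ValueError(f"unsupported drain URL: {drain_url}")
    return [
        # Create a user-provided service that only carries the drain URL.
        ["cf", "create-user-provided-service", service, "-l", drain_url],
        # Bind it so the platform forwards the app's logs to the drain.
        ["cf", "bind-service", app, service],
    ]
```

For example, `log_drain_commands("encouraj", "papertrail-drain", "syslog://logs.example.invalid:12345")` yields the two commands to run once per app.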
So, in summary, our pre-AppFog era was maintenance heavy. We'd need to keep a full-time ops engineer on staff. To add to my already child-induced shortened sleep cycle, I'm waking up at 3 AM wondering if I configured all the nodes correctly. I'm waking up AGAIN at 4 AM panicking about missed security patches. I need to plan for and execute patch cycles. Scaling is slow because we have to provision, configure, and remove members from the pool. Differing develop/stage/production code means I get more "worked on my machine" excuses. Please note: I've used this before. I'm not proud.
In the Post-AppFog and Cloud Foundry world, we get to do what we do best. Write great software. My sleepless nights are now consumed with wondering how the Seahawks are going to kick ass this year. We get easy, fast scaling out of any service that lives in our app.
Mindless zero-downtime deployments! Configuration is very limited.
We’re not debugging why the bastion host is unreachable. Our velocity speeds up since we're constantly shipping out new versions of the app thanks to AppFog and Cloud Foundry. We’re implementing canary deployments to identify and fix possible production issues before the issues hit the whole user population. Obligatory Success Kid.
I’d say the juice is absolutely worth the AppFog squeeze.
Here are some resources if you're curious to read up on anything I've talked about. I highly recommend giving the 12factor.net site a read. Even if you can't implement the factors right away, they're fantastic things to keep in the back of your mind. Read up on the Cloud Foundry docs and see how easily you can get things running.
Feel free to reach out to me personally if you have questions or comments. I love hearing feedback even if it's of the "YOU SUCK!" nature. Check out our website or follow us on Twitter or Facebook.