A Tale of Two Workflows - ChefConf 2014

Watch this talk here: https://www.youtube.com/watch?v=L__8o02od6Q

For an example of the code we used in our CI pipeline to make a Chef Environment from a Berksfile.lock - check out this project:
https://github.com/petecheslock/berks2env
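
As a rough illustration of that idea (not the berks2env code itself), the sketch below reads the locked cookbook versions out of a lockfile and emits a Chef environment that pins each one exactly. It assumes the Berkshelf 3 text lockfile layout with a `GRAPH` section; older Berkshelf releases used a different format. The sample lockfile contents and the function names are hypothetical.

```python
import json
import re

# Hypothetical lockfile content, assuming the Berkshelf 3 text format.
SAMPLE_LOCK = """\
DEPENDENCIES
  dyn_ci

GRAPH
  apt (2.3.8)
  dyn_ci (1.0.3)
    jenkins (~> 2.0)
  jenkins (2.0.2)
    apt (>= 0.0.0)
"""

# Locked cookbooks are indented exactly two spaces: "  name (x.y.z)".
# Their dependency constraints are indented four spaces and are skipped.
GRAPH_ENTRY = re.compile(r"^  (\S+) \((\d+\.\d+\.\d+)\)$")


def locked_versions(lock_text):
    """Return {cookbook: version} from the GRAPH section of a lockfile."""
    versions = {}
    in_graph = False
    for line in lock_text.splitlines():
        if line.strip() == "GRAPH":
            in_graph = True
            continue
        if not in_graph:
            continue
        match = GRAPH_ENTRY.match(line)
        if match:
            versions[match.group(1)] = match.group(2)
    return versions


def environment_from_lock(name, lock_text):
    """Build a Chef environment document that pins every locked cookbook."""
    pins = {cb: "= " + ver for cb, ver in locked_versions(lock_text).items()}
    return {
        "name": name,
        "description": "Generated from Berksfile.lock",
        "json_class": "Chef::Environment",
        "chef_type": "environment",
        "cookbook_versions": pins,
    }


if __name__ == "__main__":
    env = environment_from_lock("1_4_125", SAMPLE_LOCK)
    print(json.dumps(env, indent=2, sort_keys=True))
```

Generating the environment from the lockfile means the pins are always the versions that were actually resolved and tested together, rather than something edited by hand.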

One of the biggest advantages of Chef is its flexibility, which lets you customize it at will to fit your infrastructure needs. While this makes Chef incredibly powerful, it can also make it challenging to develop a workflow for managing the day-to-day usage of Chef.

Should I use a single repo for all my cookbooks?
One cookbook per repo?
Berkshelf?
Librarian?
Test-Kitchen?
Where does Jenkins (CI) fit in?
What about Testing?
How does this work with my small team? What about my large team? What about my distributed team?

Over the past few years I have been a part of two distinct Chef workflows that take opposite approaches to solving issues around collaboration, versioning, testing, etc. During the course of this talk I will share:

Details about the requirements that led us down these two paths.
What worked.
What didn't.
How we use many of the tools available to safely test code changes.
How we deploy cookbook changes safely and quickly (and keep uptime our highest priority).

Usage Rights

© All Rights Reserved

A Tale of Two Workflows - ChefConf 2014 Presentation Transcript

  • A Tale of Two Workflows Pete Cheslock @petecheslock
  • Age of Wisdom?
  • Age of Foolishness?
  • Who Am I? Pete Cheslock. Currently: Rabble Rouser at Dyn. Previously at Sonian, one of the very early Opscode Chef™ customers (probably?). Also Sensu.
  • Disclaimer WARNING: THIS TALK FEATURES TWO CRAZY ASS WAYS YOU CAN USE CHEF AND IS INTENDED FOR A MATURE AUDIENCE. PETE CHESLOCK DOES NOT CONDONE THE WORKFLOWS USED AND DISCOURAGES ANYONE FROM ATTEMPTING THEM. THIS TALK MAY ANGER YOU - I’M HERE IF YOU NEED A HUG AFTERWARDS
  • Double Disclaimer For the love of all that is DevOps.. Please don’t Cargo Cult this.
  • What do you do here? I’m a people person - I swear.
  • Biases rule everything around me
  • Chef The cause of... and solution to... all of life's problems.
  • Environments Databags Roles are good Roles are bad WTF is a Berkshelf? Librarian? Chef Server Chef Zero Vagrant-Berkswhat? Hosted Chef LWRPs Don’t Use Definitions! Definitions are Awesome!
  • Pick Your Poison
  • Sonian: Founded 2008. 2008 AWS Startup Challenge finalist. I joined in 2009. Very early Chef user, originally with Puppet (before Opscode existed). Pre-databags, roles, etc. Massive growth in a short time, reaching hundreds of TB of Elasticsearch and well over a PB of S3 storage.
  • https://github.com/opscode/chef-repo .chef/knife.rb cookbooks data_bags environments roles
  • Soon - business started to pick up - very quickly. Speed picked up, things moved fast and we broke stuff. To close some deals we had contracts signed that would limit when we could push changes to the systems.
  • Customer A: HEAD sonian/chef-repo:master Customer B: fd50a5c Customer C: sonian/chef-repo:tag-v0.1.1 a1add77
  • Customer A: HEAD sonian/chef-repo:master Customer B: fd50a5c Customer C: sonian/chef-repo:tag-v0.1.1 HEAD
  • Now imagine that scenario with 20 environments, each living on AWS, Rackspace Cloud, HP Cloud or IBM “SmartCloud”. Each environment has a different contracted deployment schedule. I know what you are thinking - system changes aren’t a “deploy” - well, next time I’ll bring you to meet with the lawyers on that.
  • How did this work in practice? In the past we’d push a small change to Prod and everything would break terribly. Lots of technical debt, with scenarios that no one could ever believe could happen. This is email archiving: in some cases customers would have mail forwarded to us via their mail server. We CAN NOT drop that mail. If they are audited and we are proven to be missing data, that is really, really bad. Srs super bad.
  • We liked our single chef-repo. Every story had a branch, and we got into the cycle of commit, merge, push and test. We represented our pre-prod environments as branches in git, using some internal tooling to manage them.
  • Branches: eng-9999, HEAD (master), QA (daily), Dev (daily). Cut a new branch from master. Developer adds commits and tests locally. Developer merges to the dev branch for dev testing. If things “work” and nothing breaks, merge to QA. If it passes regression testing, merge into master (with others).
  • roles/stack.rb: base.rb, nonprod.rb, cloud.rb (ec2, rackspace). roles/application.rb: application.rb, service.rb, etc.rb. “Hold on a minute. I’m just going to push this small change to this one role.” It’s roles all the way down.
  • We got burned all the time. “Move Fast and Break Everything”. We needed something that worked for today & the future. Let’s create a Git branching strategy!
  • Wut? I know. Seriously. I know. We were trying to answer this one question: “How do you version the cookbooks, roles, and databags as one singular asset?”
  • release/ 2011-08-01 release/ 2011-07-01 master (HEAD)
  • release/ 2011-08-01 base/ 2011-07-01 release/ 2011-07-01 master (HEAD) Cut a new branch for the release At the same time create a base/ release tag. QA New code constantly hitting master
  • release/ 2011-08-01 eng-9999 base/ 2011-07-01 release/ 2011-07-01 master (HEAD) Cut a new branch for the release At the same time create a base/ release tag. QA New code constantly hitting master Checkout a branch from the Base Tag Merge code into Release branch Merge into master if you want it to advance
  • base/ 2011-08-01 release/ 2011-08-01 eng-9999 base/ 2011-07-01 release/ 2011-07-01 master (HEAD) Make individual commits and Cherry-pick forward Cut a new branch for the release At the same time create a base/ release tag.
  • base/ 2011-08-01 release/ 2011-08-01 eng-9999 base/ 2011-07-01 release/ 2011-07-01 master (HEAD) Cut a new branch for the release At the same time create a base/ release tag. Rebase & Squash commits branches Backwards
  • That sounds overly complex. We had some git experts, and it leveled up all our game. Extensive tooling around our branching strategy. We were Release Engineering.
  • https://github.com/sniperd/mise-en-place
  • So What Happened? It actually worked. Not only that - it worked really well. 20+ stacks, upgrading 4 per night (6pm to 12am if you are lucky). Before “Deploy Week” we deployed all the time, and things broke all the time.
  • Over the course of about 12 months we went from: deploy whenever, things break randomly (little testing); to a multi-page deploy checklist of mostly manual items; to “Deploy Week”, 20 stacks over 5 days (6pm to 12am, hopefully); to “Deploy Day”, 20 stacks over one night, 6pm to 9pm; to “Deploy Day” on Saturday (contracts), where our best time was 20+ stacks in ~1 hour.
  • Deploys were drama free. They were drama free because we tested all the pieces that change together. And not just unit and integration testing, but full-on regression testing and user acceptance testing. Databags, roles, cookbooks, application code - it all moved together. Tooling was built to support the support team (who eventually did the deploys). High communication and tight teamwork allowed this to work.
  • “If I could do it all over again I would do it very differently”
  • Dyn: Incorporated in 2001, Dyn’s global presence services more than four million enterprise, small business and personal customers. We specialize in Traffic Management & Message Management. I joined early in 2013 to run the System Automation and Release Engineering Team (we call it DevTools).
  • There is always technical debt in the banana stand
  • Chef CFEngine Puppet NIH
  • Develop a pipeline that allows for simple usage by plugging it into a CI system for automated testing and deployment.
  • But the hardest challenge is that change is dangerous. It’s even more frightening when you have a MASSIVE chunk of the internet depending on you to stay running ALL THE TIME.
  • Do it w/o taking down the internet. If we don’t build in the necessary gates and levers to allow for lots of testing and controlled deploy options out to our edge systems, bad things can happen.
  • Scope of bad
  • Initial Challenges: We have lots of FreeBSD. Change is hard, especially to unknown systems. We really wanted to deploy a solution that was going to bring in zero dependencies.
  • I heard you like FreeBSD…
  • Now that the FreeBSD problem was solved, we were able to start deploying Chef out to all our nodes. We created a role[base], which includes a run list of the things we wanted in place. About a month or so later we wanted to push a change to that role, and at the same time it was linked to some specific cookbook versions.
  • So basically we wanted a versioned run list, but we also wanted to set and override some attributes. So we decided to move away from roles (since we were not using them much yet) and just focus on using wrapper recipes. The bonus here is that anyone can just clone a cookbook and run Test-Kitchen & Serverspec on that “role” to get a node just like it. No dealing with roles from other cookbooks.
  • Roles vs. No Roles
  • The wrapper recipe idea made sense to us because we wanted to make sure that when we used community cookbooks - we never edited them. So for example we have a dyn_ci recipe which wraps the functionality inside of the Jenkins recipe. When Jenkins updates from 1.0 to 2.0 - we simply update and refactor our wrapper cookbook and set the version constraint in the metadata as appropriate.
  • Circular Dependency
  • We use the default chef-full template and it has a section that looks like this:
  • Where are most community cookbooks stored? github.com & community.opscode.com. Who does their DNS? You see where we are going. So we created a new organization on our Enterprise Chef Server, called the cookbook repo, where we stored the community cookbooks we used.
  • Later we moved those to a local GitHub Enterprise for 2 reasons. 1. It allowed anyone to easily see which cookbooks we already had locally. 2. It allowed us to run short-lived forks of those cookbooks while we pushed the changes upstream to the owner (and for people to see those changes).
  • Remove the humans from the equation. Foodcritic, chefspec, rubocop, serverspec. thor-scmversion to automate versioning and git tagging.
  • The run executes; if the tests pass, thor versions the cookbook based on #patch, #minor, or #major.
  • So we try to speed up the iteration to master. Now the development cycle looks like this: a user cuts a branch, makes changes, runs tests locally (we hope), then submits a pull request. Jenkins tests the PR; if good, it reports back to GH:E with green. When merged, Jenkins runs the tests again; if they pass, Jenkins tags the release and uploads it to the cookbook repo.
  • Development Deployment
  • How has this worked? We are the product owner. On-demand support internally. Training. Mentoring.
  • All new apps come with cookbooks. They even come with tests. (Yay!) Test Kitchen and Berkshelf for our local development and deploy
  • github.com/dyninc/cookbookapi - So we built our own cookbook API to use (with Berks 2) that lets us use our own site with our own cookbooks (and the community cookbooks in our site repo).
  • So how do you get it to production? The requirements were such that we wanted a few things. Easily be able to deploy to a single node in a site. Easily be able to deploy to a single node in many sites. Easily be able to deploy to a single node in every site. Easily be able to deploy to a single node in a region. Easily be able to deploy to a single node in many sites… you get the point. EVERY POSSIBLE DEPLOY SCENARIO.
  • Represent the state of the Chef org in Git. Act as the single source of truth. Have Jenkins manage the upload of those cookbooks to prod. Ensure the environment locks those cookbooks explicitly.
  • So, I already told you we didn’t use roles because we really wanted to be able to version the run list (many people other than us could be touching that). We have thor-scmversion auto-bumping the versions of cookbooks (and freezing on upload to the package server). As one does. We knew that when we ran a node in production, we wanted it in an environment with very specific cookbook version locks. And we wanted those environments to be immutable: created and uploaded in an automated way.
  • We’ve been using thor-scmversion for versioning our cookbooks - why not our servers too?
  • 1_5_LATEST 1_5_0 1_4_123 1_4_LATEST 1_4_125 1_4_124-alpha_1 app-2 app-1 1_4_LATEST Virtual Real =
  • 1_5_LATEST 1_5_0 1_4_123 1_4_LATEST 1_4_125 1_4_124-alpha_1 app-2 app-1 1_4_LATEST 6ead49d Deploy dyn_myface v1.0.3 Virtual Real =
  • 1_5_LATEST 1_5_0 1_4_123 1_4_LATEST 1_4_125 1_4_124-alpha_1 app-2 app-1 1_4_LATEST 6ead49d Deploy dyn_myface v1.0.3 d6b0b7e Deploy dyn_myface v1.0.3 to all #patch Virtual Real =
  • 1_5_LATEST 1_5_0 1_4_123 1_4_LATEST 1_4_125 1_4_124-alpha_1 app-2 app-1 1_4_LATEST = 6ead49d Deploy dyn_myface v1.0.3 d6b0b7e Deploy dyn_myface v1.0.3 to all #patch Virtual Real 7db580b Deploy dyn_myface v2.0.0 #minor =
  • Limited allow list for deploy. Anyone can propose a change to production, but the ops team will approve those changes (for #patch or greater, that is). The same workflow applies to pre-release environments.
  • Databags? Since we version all of our cookbooks using thor-scmversion, and we do the same with Chef environments, and we need lots of flexibility in our code deployment process due to the nature of the system, we built a tool that allows us to version our databags for deploy: https://github.com/Vanders/knife-databag-version
  • Version your databags? Seriously - what is wrong with you? We use databags pretty sparingly - mostly just encrypted databags for shared secrets and other info. Our engineers ask us for the flexibility - we build the tools. The tools enable the workflow.
  • What’s this all look like? assume we have a simple data bag item:
  • with knife data bag version this becomes a template:
  • knife data bag version can then create a JSON file using this template:
  • knife data bag version will emit a JSON file:
  • All managed by Jenkins - hands off for the developer. Databags work the same as cookbooks, and allow for more flexible deploy options for us. We still use standard databags - this is just another lever to pull.
  • Room for improvement? #minor and #major Site to abstract changing cookbook versions. Upload cookbooks early - control with environment version locks
  • Thank You Pete Cheslock petecheslock@gmail.com @petecheslock
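
The wrapper-cookbook slide talks about setting the version constraint in the metadata when Jenkins moves from 1.0 to 2.0. As a small sketch of what a pessimistic (`~>`) constraint accepts, the function below checks a version against one. It is an illustration of the operator's semantics, not Chef's own solver, and the function names are made up for this example.

```python
def parse(version, width=0):
    """Split 'x.y.z' into a tuple of ints, zero-padded to `width` segments."""
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * max(0, width - len(parts))


def satisfies_pessimistic(version, constraint):
    """True if `version` satisfies a '~> X.Y' or '~> X.Y.Z' constraint."""
    operator, _, base = constraint.partition(" ")
    if operator != "~>":
        raise ValueError("only the ~> operator is handled in this sketch")
    lower = parse(base)
    if len(lower) < 2:
        raise ValueError("expected at least two segments, e.g. '~> 1.0'")
    # Upper bound: drop the last segment and increment the one before it,
    # so '~> 1.0' allows < 2 and '~> 1.0.0' allows < 1.1.
    upper = lower[:-2] + (lower[-2] + 1,)
    candidate = parse(version, width=3)
    return lower <= candidate < upper


if __name__ == "__main__":
    for candidate in ("1.4.2", "2.0.0"):
        print(candidate, satisfies_pessimistic(candidate, "~> 1.0"))
```

With a `~> 1.0` constraint in the wrapper's metadata, a 2.0.0 release of the wrapped cookbook is not picked up until the wrapper is refactored and the constraint is raised.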
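
The versioning slides describe thor bumping a cookbook's version based on `#patch`, `#minor`, and `#major` hints. The sketch below shows that idea in plain Python: scan commit messages for a hint and bump the matching segment. It is not thor-scmversion's implementation; the function names are hypothetical, and defaulting to a patch bump when no hint is present is an assumption.

```python
import re

# Ordered from the largest bump to the smallest.
HINTS = ("major", "minor", "patch")


def bump_level(commit_messages):
    """Return the largest bump requested by any #major/#minor/#patch hint."""
    found = set()
    for message in commit_messages:
        found.update(re.findall(r"#(major|minor|patch)\b", message))
    for level in HINTS:
        if level in found:
            return level
    return "patch"  # assumption: no hint means a patch bump


def next_version(current, commit_messages):
    """Compute the next 'x.y.z' version from the hints in the commits."""
    major, minor, patch = (int(part) for part in current.split("."))
    level = bump_level(commit_messages)
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(next_version("1.0.2", ["Fix template path #patch"]))
    print(next_version("1.0.3", ["Add new resource #minor", "Fix typo #patch"]))
```

Putting the hint in the commit message keeps the version decision with the person who made the change, while the CI job does the actual tagging.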
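
The environment diagrams pair "virtual" names such as 1_4_LATEST with "real" versioned environments such as 1_4_125. One way to read that pairing is as an alias that resolves to the newest real environment in the same major_minor line, skipping pre-releases like 1_4_124-alpha_1. The sketch below implements that reading; the resolution rule is an assumption drawn from the diagram, not a description of Dyn's tooling.

```python
import re

# Real environments look like "1_4_125". Pre-release names such as
# "1_4_124-alpha_1" deliberately do not match this pattern.
REAL_ENV = re.compile(r"^(\d+)_(\d+)_(\d+)$")
ALIAS = re.compile(r"^(\d+)_(\d+)_LATEST$")


def resolve_latest(alias, environments):
    """Map a virtual 'X_Y_LATEST' name to the newest real 'X_Y_Z' environment."""
    match = ALIAS.match(alias)
    if not match:
        raise ValueError(f"not a LATEST alias: {alias}")
    prefix = (int(match.group(1)), int(match.group(2)))
    candidates = []
    for name in environments:
        real = REAL_ENV.match(name)
        if real:
            version = tuple(int(group) for group in real.groups())
            if version[:2] == prefix:
                candidates.append((version, name))
    if not candidates:
        raise LookupError(f"no real environment found for {alias}")
    # Compare numerically so that 1_4_125 sorts above 1_4_99.
    return max(candidates)[1]


if __name__ == "__main__":
    envs = ["1_5_0", "1_4_123", "1_4_125", "1_4_124-alpha_1"]
    print(resolve_latest("1_4_LATEST", envs))
    print(resolve_latest("1_5_LATEST", envs))
```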