A Tale of Two Workflows - ChefConf 2014

A Tale of Two Workflows
Pete Cheslock
@petecheslock

Who Am I?
Pete Cheslock
Currently - Rabble Rouser at Dyn
Previously at Sonian - one of the very early Opscode Chef™
Customers (probably?). Also Sensu.

Disclaimer
WARNING: THIS TALK FEATURES TWO CRAZY ASS WAYS
YOU CAN USE CHEF AND IS INTENDED FOR A MATURE
AUDIENCE. PETE CHESLOCK DOES NOT CONDONE THE
WORKFLOWS USED AND DISCOURAGES ANYONE FROM
ATTEMPTING THEM.
THIS TALK MAY ANGER YOU - I’M HERE IF YOU NEED
A HUG AFTERWARDS

Double Disclaimer
For the love of all that is DevOps..
Please don’t Cargo Cult this.

What do you do here?
I’m a people person - I swear.

Biases rule everything around me

Chef
The cause of... and solution to... all of life's problems.

Environments
Databags
Roles are good
Roles are bad
WTF is a Berkshelf?
Librarian?
Chef Server
Chef Zero
Vagrant-Berkswhat?
Hosted Chef
LWRPs
Don’t Use Definitions!
Definitions are Awesome!

Sonian
Founded 2008
2008 AWS Startup Challenge Finalist
I joined in 2009
Very early Chef user - Originally with
Puppet (before Opscode existed)
Pre-Databags, Roles, etc, etc.
Massive growth in a short time - reaching
100s of TBs of ElasticSearch and well
over a PB of S3 storage.

https://github.com/opscode/chef-repo
.chef/knife.rb
cookbooks
data_bags
environments
roles

Soon - business started to pick
up - very quickly.
Speed picked up, things moved
fast and we broke stuff
To close some deals we had
contracts signed that would
limit when we could push
changes to the systems.

Customer A:
HEAD
sonian/chef-repo:master
Customer B:
fd50a5c
Customer C:
sonian/chef-repo:tag-v0.1.1
a1add77

Customer A:
HEAD
sonian/chef-repo:master
Customer B:
fd50a5c
Customer C:
sonian/chef-repo:tag-v0.1.1
HEAD

Now imagine that scenario with 20 environments - Each
environment living either on AWS, Rackspace Cloud, HP Cloud or
IBM “SmartCloud”
Each environment has a different contracted deployment schedule.
I know what you are thinking - system changes aren’t a “deploy” -
well next time I’ll bring you to meet with the lawyers on that.

How did this work in practice?
In the past we’d push a small change to Prod - everything would
break terribly. Lots of technical debt - scenarios that no one would
ever believe could happen.
This is email archiving - in some cases customers would have mail
forwarded to us via their mail server. We CANNOT drop that
mail. If they are audited and we are proven to be missing data -
that is really, really bad. Srs super bad.

We liked our single Chef-repo
Every story had a branch - and we got into the cycle of commit,
merge, push and test.
We represented our pre-prod environments as branches in git - using
some internal tooling to manage them.

Branches: eng-9999, Dev (Daily), QA (Daily), master (HEAD)
1. Cut a new branch from master.
2. Developer adds commits and tests locally.
3. Developer merges to the dev branch for dev testing.
4. If things “work” and nothing breaks - merge to QA.
5. If it passes regression testing - merge into master (with others).
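
A rough way to picture that promotion path is as a gate function. This is only a sketch: the branch names come from the slide, but `PROMOTION_ORDER` and `next_branch` are hypothetical names, not part of the internal tooling mentioned above.

```python
# Hypothetical model of the story -> dev -> QA -> master promotion path.
# Branch names are from the slide; everything else is illustrative.
PROMOTION_ORDER = ["eng-9999", "dev", "qa", "master"]


def next_branch(current, checks_passed):
    """Return the branch a change is merged into next, or None if it stays put."""
    if not checks_passed:
        return None  # a change that breaks things is never promoted
    position = PROMOTION_ORDER.index(current)
    if position == len(PROMOTION_ORDER) - 1:
        return None  # master is the end of the line
    return PROMOTION_ORDER[position + 1]
```

The point of the gate is that nothing skips a stage: a story branch can only reach master by passing dev and QA first.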

• roles/stack.rb
• base.rb
• nonprod.rb
• cloud.rb (ec2, rackspace)
• roles/application.rb
• application.rb
• service.rb
• etc.rb
“Hold on a minute. I’m
just going to push this small
change to this one role.”
It’s roles all the way down

We got burned all the time.
“Move Fast and Break Everything”
Needed something that worked for
today & the future
Let’s create a Git branching
strategy!

Wut?
I know.
Seriously. I know.
We were trying to answer this one question.
“How do you version the cookbooks, roles,
and databags as one singular asset?”

Release branching (figure): master (HEAD), release/2011-07-01, base/2011-07-01, release/2011-08-01, eng-9999, QA
Cut a new branch for the release. At the same time create a base/ release tag.
New code is constantly hitting master.
Check out a branch from the base tag.
Merge code into the release branch.
Merge into master if you want it to advance.
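
The release cut described above can be sketched as a pair of git commands. The `release/` and `base/` naming comes from the slides; the helper name `release_cut_commands` and the choice to cut from `master` by default are assumptions for illustration, not the actual tooling.

```python
from datetime import date


def release_cut_commands(release_day, source="master"):
    """Build the git commands for cutting a release (hypothetical helper).

    Creates the release branch and, at the same point in history,
    the matching base/ tag that story branches are later cut from.
    """
    stamp = release_day.isoformat()  # e.g. 2011-08-01
    return [
        f"git branch release/{stamp} {source}",
        f"git tag base/{stamp} {source}",
    ]
```

Calling `release_cut_commands(date(2011, 8, 1))` gives the two commands for the `2011-08-01` release. A fix branch such as `eng-9999` would then start from the `base/` tag, so it can be merged into the release branch without dragging along newer commits from master.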

Release branching, continued (figure): base/2011-08-01 and release/2011-08-01 now sit alongside base/2011-07-01, release/2011-07-01, eng-9999 and master (HEAD)
Cut a new branch for the release. At the same time create a base/ release tag.
Make individual commits and cherry-pick forward.
Rebase & squash commits, branches backwards.

That sounds overly complex
We had some git experts - and it
leveled up everyone's game.
Extensive tooling around our
branching strategy.
We were Release Engineering.

https://github.com/sniperd/mise-en-place

So What Happened?
It actually worked.
Not only that - it really worked well.
20+ Stacks, upgrading 4 per night (6pm to 12am if you are lucky)
Before “Deploy Week” - we deployed all the time - and things broke
all the time.

Over the course of about 12 months we went from:
Deploy whenever - things break randomly (little testing)
Create a multi-page deploy checklist of mostly manual items
“Deploy Week” - 20 Stacks over 5 days (6pm to 12am - hopefully)
“Deploy Day” - 20 Stacks over one night - 6pm to 9pm
“Deploy Day” - Saturday (contracts) - Best time was 20+ stacks ~1 hour

Deploys were drama free
They were drama free because we tested all the pieces that changed
together. And not just unit and integration testing, but full on
regression testing and user acceptance testing.
DataBags, Roles, Cookbooks, Application Code - It all moved together.
Tooling was built to support the support team (who eventually did the
deploys)
High communication and tight teamwork allowed this to work.

“If I could do it all over again I would do it very differently”

Dyn
Incorporated in 2001, Dyn’s
global presence services more than
four million enterprise, small
business and personal customers.
We specialize in Traffic Management
& Message Management
I joined early in 2013 to run the
System Automation and Release
Engineering Team
(We call it DevTools)

There is always technical debt in the banana stand

Develop a pipeline that allows
for simple usage by plugging it
into a CI system for
automated testing and
deployment.

But the hardest challenge is that change is dangerous. It’s even
more frightening when you have a MASSIVE chunk of the internet
depending on you to stay running ALL THE TIME.

Do it w/o taking down the internet
If we don’t build in the
necessary gates and levers to
allow for lots of testing and
controlled deploy options out
to our edge systems, bad
things can happen.

Initial Challenges
We have lots of FreeBSD
Change is hard - especially to unknown systems.
We really wanted to deploy a solution that was going to bring in
Zero Dependencies.

Once that FreeBSD problem was solved - we were able to start
deploying Chef out to all our nodes.
We created a role[base] - which includes a run list of the things
we wanted in place.
About a month or so later - we wanted to push a change to that role
- at the same time it was linked to some specific cookbook versions.

So basically we wanted a versioned
run list - but we also wanted to set and
override some attributes.
So we decided to move away from roles (since
we were not using them much yet) and
just focus on using wrapper recipes.
The bonus here is that any person can
just clone a cookbook - and run Test
Kitchen & Serverspec on that “role”
to get a node just like it. No dealing
with roles from other cookbooks.

The wrapper recipe idea made sense to us because we wanted to
make sure that when we used community cookbooks - we never
edited them. So for example we have a dyn_ci recipe which wraps
the functionality inside of the Jenkins recipe.
When Jenkins updates from 1.0 to 2.0 - we simply update and
refactor our wrapper cookbook and set the version constraint in the
metadata as appropriate.
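
To make the version constraint idea concrete, here is a small sketch of how a pessimistic (`~>`) constraint behaves. It is an illustration of the rule, not Chef's own code, and `satisfies_pessimistic` is a hypothetical name. The function takes the version part of the constraint without the `~>` prefix.

```python
def parse(version):
    """Turn '1.0.3' into (1, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(version, floor_version):
    """True if version satisfies '~> floor_version' (x.y or x.y.z form)."""
    floor = parse(floor_version)
    if len(floor) < 2:
        raise ValueError("pessimistic constraints need at least major.minor")
    # Upper bound: drop the last segment and bump the one before it,
    # so '~> 1.0' stops before 2.0 and '~> 1.0.3' stops before 1.1.0.
    ceiling = floor[:-2] + (floor[-2] + 1,)
    candidate = parse(version)
    return floor <= candidate and candidate[: len(ceiling)] < ceiling
```

With a wrapper cookbook pinned at `~> 1.0`, a 1.4.2 release of the wrapped cookbook is picked up but 2.0.0 is not. That is the moment to refactor the wrapper and move the constraint.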

We use the default chef-full template and it has a section that looks
like this:

Where are most community cookbooks stored? github.com &
community.opscode.com. Who does their DNS? You see where
we are going.
So - we created a new organization on our Enterprise Chef Server
- called the cookbook repo, where we stored community
cookbooks we used.

Later we moved those to Github Enterprise locally for 2 reasons.
1. It allowed anyone to easily see which cookbooks we already had
locally.
2. It allowed us to run short-lived forks of those cookbooks while we
pushed the changes upstream to the owner (and for people to see
those changes).

Remove the humans from the equation
Foodcritic, chefspec, rubocop,
serverspec
thor-scmversion to automate
versioning and git tagging.

Run will execute - if the tests pass - thor will version
based on #patch, #minor, #major
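
The tag-driven bump can be sketched in a few lines. This is not thor-scmversion's implementation, just an illustration of the idea. The `bump` helper is hypothetical, and falling back to a patch bump when no tag is present is an assumption.

```python
import re


def bump(version, commit_message):
    """Return the next x.y.z version based on a #major/#minor/#patch tag."""
    major, minor, patch = (int(part) for part in version.split("."))
    tags = set(re.findall(r"#(major|minor|patch)\b", commit_message))
    if "major" in tags:
        return f"{major + 1}.0.0"
    if "minor" in tags:
        return f"{major}.{minor + 1}.0"
    # Assumption: no tag (or an explicit #patch) means a patch bump.
    return f"{major}.{minor}.{patch + 1}"
```

The biggest tag wins, so a message carrying both `#minor` and `#major` produces a major bump.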

So we try to speed up the iteration to master
So - now the development cycle looks like
User cuts a branch - makes changes - runs tests locally (we hope) -
then submits a pull request.
Jenkins tests the PR - if good - report back to GH:E with Green.
When merged - Jenkins runs the tests again - if they pass then
Jenkins will tag the release and upload it to the cookbookrepo.
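
The two Jenkins paths in that cycle can be summed up as a small decision function. This is a sketch only: the event names, the action strings and the "report red" case are assumptions, not the real job configuration.

```python
def pipeline_actions(event, tests_pass):
    """Return the follow-up steps for a Jenkins run, as plain strings."""
    if event == "pull_request":
        # PR builds only report a status back; nothing is released.
        return ["report green"] if tests_pass else ["report red"]
    if event == "merge":
        # Merge builds re-run the tests before anything is tagged or uploaded.
        if tests_pass:
            return ["tag release", "upload to cookbook repo"]
        return []
    raise ValueError(f"unknown event: {event}")
```

The key property is that a release only happens on the merge path, and only after the tests have passed a second time.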

How has this worked?
We are the product owner
On-Demand support internally
Training
Mentoring

All new apps come with
cookbooks.
They even come with tests.
(Yay!)
Test Kitchen and Berkshelf
for our local development and
deploy

github.com/dyninc/cookbookapi
So we built our own
cookbook api to use
(with Berks 2) that let
us use our own site
with our own
cookbooks (and the
community cookbooks
in our site repo)

So how do you get it to production?
So - the requirements were such that we wanted a few things
Easily be able to deploy to a single node in a site
Easily be able to deploy to a single node in many sites
Easily be able to deploy to a single node in every site
Easily be able to deploy to a single node in a region
Easily be able to deploy to a single node in many sites
…… you get the point. EVERY POSSIBLE DEPLOY SCENARIO.

Represent state of chef org in Git
Act as single source of truth
Have Jenkins manage the upload of
those cookbooks to prod
Ensure the environment locks those
cookbooks explicitly

So, I already told you we didn’t use roles because we really wanted to be able
to version the run list (many people other than us could be touching that).
We have thor-scmversion auto-bumping the versions of cookbooks (and
freezing on upload to the package server). As one does.
We knew that when we ran a node in production - we wanted it in an
environment with very specific cookbook version locks.
And we wanted those environments to be immutable. Created and uploaded
in an automated way.
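
A locked environment of that kind could be generated along these lines. This is a sketch: the helper name `locked_environment` and the description text are made up, while the JSON keys follow the standard Chef environment file layout with exact (`=`) cookbook version constraints.

```python
import json


def locked_environment(name, cookbook_versions):
    """Build a Chef environment document that pins every cookbook exactly."""
    return {
        "name": name,
        "description": f"Pinned cookbook versions for {name}",
        "json_class": "Chef::Environment",
        "chef_type": "environment",
        # '= x.y.z' means this exact version and nothing else.
        "cookbook_versions": {
            cookbook: f"= {version}"
            for cookbook, version in sorted(cookbook_versions.items())
        },
        "default_attributes": {},
        "override_attributes": {},
    }
```

Something like `json.dumps(locked_environment("1_4_125", {"dyn_myface": "1.0.3"}), indent=2)` would produce a file a CI job could upload. Because every change creates a new environment name rather than editing an old one, the old environments stay as they were.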

We’ve been using thor-scmversion for versioning our cookbooks - why not
our servers too?

Environment versions (figure): Virtual = Real
1_5_LATEST = 1_5_0
1_4_LATEST = 1_4_125
Other real environments: 1_4_123, 1_4_124-alpha_1
app-1 and app-2 are in 1_4_LATEST
Commits shown as the slide builds:
6ead49d Deploy dyn_myface v1.0.3
d6b0b7e Deploy dyn_myface v1.0.3 to all #patch
7db580b Deploy dyn_myface v2.0.0 #minor

Limited allow list for deploy
Anyone can propose a change to production - but the ops team will
approve those changes. (for #patch or greater that is)
The same workflow applies to pre-release environments.

Databags?
Since we version all of our cookbooks using Thor-scmversion
And we do the same with chef environments.
And we need lots of flexibility with our code deployment process
due to the nature of the system
We built a tool that allows us to version our databags for deploy.
https://github.com/Vanders/knife-databag-version

Version your databags?
Seriously - what is wrong with
you?
We use databags pretty sparingly
- mostly just encrypted databags
for shared secrets and other info.
Our engineers ask us for the
flexibility - we build the tools.
The tools enable the workflow.

What’s this all look like?
Assume we have a simple data bag item:

With knife data bag version this becomes a template:

knife data bag version can then create a JSON file using this template:

knife data bag version will emit a JSON file:

All managed by Jenkins - hands off
for the developer
Databags the same as cookbooks -
and allow for more flexible deploy
options for us.
We still use standard databags - this is
just another lever to pull

Room for improvement?
#minor and #major
Site to abstract changing
cookbook versions.
Upload cookbooks early -
control with environment
version locks

Thank You
Pete Cheslock
petecheslock@gmail.com
@petecheslock
