"Puppet at GitHub / ChatOps" from PuppetConf 2012, by Jesse Newland
Video of "Puppet at GitHub": http://bit.ly/WVS3vQ
Learn more about Puppet: http://bit.ly/QQoAP1
Abstract: Ops at GitHub has a unique challenge - keeping up with the rabid pace of features and products that the GitHub team develops. In this talk, we'll focus on tools and techniques we use to rapidly and confidently ship infrastructure changes/features with Puppet using Puppet-Rspec, CI, Puppet-Lint, branch puppet deploys, and Hubot.
Speaker Bio: Jesse Newland does Ops at GitHub. His favorite hobby is SPOF whack-a-mole, followed closely by guitar and piano. Prior to GitHub, Jesse was the CTO at Rails Machine where he ran a large private cloud and managed several hundred production Ruby on Rails applications using Puppet. To the delight and/or chagrin of the Puppet community, Jesse is to blame for Moonshine, the Ruby DSL for Puppet before Puppet had a Ruby DSL.
2. Jesse Newland
jnewland
hey errbody
my name is jesse newland
I do ops at GitHub
3. Puppet
at
GitHub
And today I’m going to be talking about Puppet at GitHub.
Really, I’m telling a story in two parts.
4. All of the amazing Puppet
OSS projects @rodjek
has written but doesn’t
want to talk about
First... I’ll be talking about all of the amazing Puppet open source projects Tim Sharpe has
written but doesn’t want to talk about
and how we use them at GitHub
5. *
And then, I want to introduce you to the star of the GitHub Ops team, Hubot, and tell you a
little bit about something we’ve been calling ChatOps
6. the
Setup
But, before I get into all of that, I'm actually going to talk about an
upcoming talk, one by a coworker of mine at GitHub. Will Farrington
is going to be speaking tomorrow at 2:45pm about The Setup, our
Puppet-powered GitHubber laptop management solution. It's
amazing. It's one of the coolest uses of Puppet I've ever seen, and
it's going to completely change the way you think about your
development environment.
But I’m not going to be talking about any of that today.
So, yeah, go to Will's talk tomorrow. You won't be disappointed.
7. Puppet
at
GitHub
So I guess you could say that I’m talking about
8. THE REST of
Puppet
at
GitHub
the rest of puppet at github. For the scope of this talk, I’m going to be talking about the
Puppet infrastructure that runs github.com
9. 4 years, >100k LOC
We’ve been managing GitHub’s infrastructure with Puppet for 4 years, since the move to
Rackspace. There’s a ton of code, and we’re developing at a rapid pace.
10. Simple
But we are obsessed with keeping our Puppet deployment simple
11. Single
Master
We use a single puppetmaster running lots of unicorns. Nothing fancy. It works for now.
However, we will need to scale this tier up or out in about 6 months if the trends look right.
We’ll probably switch to two load balanced puppetmasters around that time.
12. cron FTW
# cat /etc/cron.d/puppet
13 * * * * root /usr/bin/
We don’t run the agent, but rather run puppet on cron every hour in combination with runs
triggered via Hubot (more on that later)
13. No
ENC
We don’t use an external node classifier
14. ([a-z0-9\-_]+)(\d+)([a-z]?)\.(.*)\.github\.com
$ cat manifests/nodes/janky.rscloud.pp
node /^janky\d+\.rscloud\.github\.com$/ {
  github::role::janky { 'janky':
    public_address => dns_lookup($fqdn),
    nginx_hostname => $fqdn,
  }
}
Instead, we give nodes DNS names that adhere to a naming convention that maps them to a
pre-defined role
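To make the convention concrete, here is a minimal Ruby sketch of how a hostname could be split into the parts the pattern on the slide captures. The helper name parse_node and the field names (role, index, suffix, site) are hypothetical, not GitHub's. The first group is also made non-greedy here, an adjustment for illustration so that multi-digit indexes stay together.

```ruby
# Pattern from the slide, anchored, with the first group made lazy
# (an assumption for this sketch) so "janky12" splits as "janky" + 12.
NODE_NAME = /\A([a-z0-9\-_]+?)(\d+)([a-z]?)\.(.*)\.github\.com\z/

# Hypothetical helper: returns the pieces of a conforming hostname,
# or nil when the name does not follow the convention.
def parse_node(fqdn)
  m = NODE_NAME.match(fqdn)
  return nil unless m

  { role: m[1], index: m[2].to_i, suffix: m[3], site: m[4] }
end
```

With hostnames that appear elsewhere in this talk, parse_node('fs1a.rs.github.com') would yield the role 'fs', index 1, suffix 'a' and site 'rs'.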
15. Where the magic happens
$ head modules/github/manifests/role/janky.pp
define github::role::janky($public_address,
                           $nginx_hostname='',
                           $god=true
) {
  github::core { 'janky': }
  include github::app::janky
  github::nginx { 'janky': }
}
Role definitions are where the magic happens. We try to DRY common functionality into our
core module and into other simple classes or defines so that role definitions read like a nice
summary of what makes this role different from others
16. Heavy use of augeas
augeas { 'my.cnf/avoid_cardinality_skew':
context => '/files/etc/mysql/my.cnf/mysqld/',
changes => [
'set innodb_stats_auto_update 0',
'set innodb_stats_on_metadata 0',
'set innodb_stats_on_metadata 64'
],
require => Percona::Server[$::fqdn],
}
We generally try to avoid templates for configuration files in favor of using Augeas.
It lets us manage the small pieces of configuration we care about and use the OS defaults for
the things we don't.
17. BORING
But I don’t want to just show all of you Puppet code for thirty minutes. That's boring
18. What’s interesting
about Puppet at
GitHub?
I’d rather talk about what's interesting about how we use Puppet at GitHub. And what I think
is the most interesting is that we focus heavily on ensuring the Puppet development workflow
is easily accessible to everyone at GitHub.
19. Making
Puppet Less
Scary
We’re doing our best to make puppet less scary for people that aren’t familiar with it, so they
can help the Ops team grow and evolve our infrastructure. We’re doing some things right
here, but there’s still a lot of work to do.
20. I’ve been thinking about this a lot recently, as we’ve just had two large infrastructure projects
shipped by people who were completely or relatively new to Puppet. First, Derek Greentree
shipped a Cassandra cluster...
21. And Adam Roben shipped puppet manifests for our windows build and CI servers.
22. this
is
good
This is an awesome trend, and I want it to continue. So I thought I’d talk a bit today about
what we’re doing to try to enable even more of this.
23. Flow just like
a (GitHub)
Ruby project
For us, an important part of making Puppet development accessible for other developers at
GitHub is making the development flow on our puppet codebase as similar as possible to that
of any other GitHub Ruby project. That means sticking with some common conventions
24. Setup
$ ./script/bootstrap
Like making it as easy to set up as any other project at GitHub
26. $ cat Puppetfile
forge "http://forge.puppetlabs.com"
mod 'puppetlabs/apt'
...
And puppet deps are managed by librarian-puppet, a bundler-like library that manages the
puppet modules your infrastructure depends on and installs them directly from GitHub
repositories.
I’m of the opinion that the unit of open source currency is no longer a tarball downloaded
from something named *forge. It’s a GitHub repo. All of the developers at GitHub feel the
same way, so Tim wrote librarian-puppet
27. rodjek / librarian-puppet
For those of you keeping score at home, that’s the first of Tim Sharpe’s open source projects
that I’ve mentioned. Hi Tim!
28. Making puppet flow like other projects at GitHub means ensuring we have good editor
support for the language
30. Tests
$ ./script/cibuild
It means running tests is a simple one-step process
31. TESTS!
Tests are super important. A solid and easy to use test harness helps build developer
confidence in a new language.
32. Safety
net
And tests are a crucial safety net for helping people cut their teeth on Puppet if they haven’t
ever touched it before.
33. rspec-puppet
should contain_github__firewall_rule('internal_network')
should contain_ssmtp__relay_to('smtp').with_relay_host('smtp')
should contain_file('/etc/logstash/logstash.conf')
should include_class('github::ksplice')
should contain_networking__bond('bond0').with(
:gateway => '172.22.0.2',
:arp_ip_target => '172.22.0.2',
:up_commands => nil
)
We use rspec-puppet heavily. If you haven’t used rspec-puppet yet, go check it out right
now.
It’s amazing.
There are no fewer than three talks about it at PuppetConf, so I’m not going to talk about HOW
to use it today, just touch a little bit on how WE use it.
34. rodjek / rspec-puppet
rspec-puppet, that’s three
35. role specs are king
describe 'github::role::fe' do
  let(:title) { 'fe' }
  let(:node) { 'fe1.rs.github.com' }
  let(:params) {
    {
      :public_address => '207.97.227.242/27',
      :private_address => '172.22.1.59/22',
      :git_weight => '16'
    }
  }
  let(:facts) {
    {
      :ipaddress => '172.22.1.59',
      :operatingsystem => 'Debian',
      :datacenter => 'rackspace-iad2',
    }
  }
  it do
    should contain_github__core('fe')
    ...
  end
end
We try our best to adequately test our individual puppet modules, but our central and most
frequently touched specs exercise our role system. There’s one spec for each role which
describes its intended functionality.
These specs focus on critical functionality of each role, and help a great deal to build
confidence that we’re not introducing regressions when adding or refactoring functionality or
working in other roles.
36. .git/hooks/pre-commit
$ git commit -am "lolbadchange"
modules/github/manifests/role/fe.pp:err: Could
not parse for environment production: Syntax
error at 'allow_outbound_syslog'; expected '}'
at /Users/jnewland/github/puppet/modules/github/
manifests/role/fe.pp:31
modules/github/manifests/role/fe.pp - WARNING:
=> is not properly aligned on line 626
For an even faster feedback loop than running specs, all Puppet dev environments
automatically get set up with a pre-commit hook that checks for syntax errors and ensures
your changes conform to the Puppet style guide.
This has proved amazingly useful for Puppet novices and experts alike. Novices find it
helps them understand language conventions quickly and guides them towards solutions,
and experts use it to catch typos and help them not look like novices.
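The hook itself isn't shown in the talk, so what follows is only a sketch of how such a check could be wired up in Ruby. It assumes the standard `puppet parser validate` and `puppet-lint` commands are on the PATH. The function names are hypothetical.

```ruby
require 'open3'

# Keep only Puppet manifests from a list of paths.
def puppet_manifests(paths)
  paths.select { |path| path.end_with?('.pp') }
end

# Run one command; return [succeeded, combined stdout/stderr].
def run_check(*cmd)
  output, status = Open3.capture2e(*cmd)
  [status.success?, output]
rescue Errno::ENOENT
  [false, "#{cmd.first}: command not found\n"]
end

# Syntax-check and lint each manifest, collecting the output of failures.
# Note: puppet-lint only exits non-zero on errors by default; adding
# --fail-on-warnings would make style warnings block the commit too.
def check_manifests(paths)
  failures = []
  puppet_manifests(paths).each do |path|
    [['puppet', 'parser', 'validate', path], ['puppet-lint', path]].each do |cmd|
      ok, output = run_check(*cmd)
      failures << output unless ok
    end
  end
  failures
end

# Entry point for the hook: inspect staged files, return an exit status.
def run_hook
  staged, = Open3.capture2('git', 'diff', '--cached', '--name-only', '--diff-filter=ACM')
  failures = check_manifests(staged.split("\n"))
  failures.each { |output| warn output }
  failures.empty? ? 0 : 1
end
```

Saved as .git/hooks/pre-commit with a final line of `exit run_hook`, a non-zero status would abort the commit.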
38. specs run on each push
auto deploy on CI pass
rspec-puppet and puppet-lint are automatically run by CI on every commit on every branch
pushed to our Puppet repo.
Once master passes CI, puppet is automatically deployed
39. As you can see, Hubot automates a lot of the process of rolling out Puppet
That example covered pushing changes to master, but what about a Pull-Request based
workflow?
40. Say we have a pull request for a branch we want to merge, and that we’ve reviewed the code
and it all looks good.
41. branches
==
environments
On each deploy, we turn all git branches into puppet environments.
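How branch names become environment names isn't spelled out in the talk. As a sketch only, and assuming environment names should stick to letters, digits and underscores, the mapping could look like this in Ruby (both function names are hypothetical):

```ruby
# Hypothetical: derive a Puppet environment name from a git branch name
# by replacing anything other than letters, digits and underscores.
def environment_name(branch)
  branch.gsub(/[^A-Za-z0-9_]/, '_')
end

# Build an environment-name => branch-name lookup for a list of branches.
def environments(branches)
  branches.each_with_object({}) do |branch, envs|
    envs[environment_name(branch)] = branch
  end
end
```

Under this assumption a branch named git-gh13 would be served as the environment git_gh13, and two branches that differ only in punctuation would collide, which a real implementation would need to handle.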
42. This combined with heaven, our capistrano-powered deployment API we interact with via
Hubot, enables us to experiment with unmerged Puppet branches in a powerful way
44. hubot ci status puppet/git-gh13
deploy:apply puppet/git-gh13 staging/fs1
deploy:noop puppet/git-gh13 prod/fs1
# merge pull request
hubot deploy:apply puppet to prod/fs
graph me -1h @collectd.load(fs*)
log me hooks github/github
You might ask Hubot to confirm its build status
45. Build #108816
(5fe75932f26ea62cb5fc5e3d0cb302cc2461d11e)
of puppet/git-gh13 was successful(421s) github/
puppet@567ea48...5fe7593
Yup, looks good.
46. hubot deploy:apply puppet/git-gh13 staging/fs1
Then roll the branch out to a staging box to make sure everything applies cleanly there.
48. hubot deploy:noop puppet/git-gh13 prod/fs1
Then, if you wanted an extra layer of confidence, you could noop the branch against a
production node
49. ** [out :: REDACTED ] Bootstrapping...
** [out :: REDACTED ] Gem environment up-to-date.
** [out :: REDACTED ] Running librarian-puppet...
** [out :: REDACTED ] Generating puppet environments...
** [out :: REDACTED ] Cleaning up deleted branches...
** [out :: REDACTED ] Done!
** [out :: REDACTED ] Sending 'restart' command
** [out :: REDACTED ] The following watches were affected:
** [out :: REDACTED ] puppetmaster_unicorn
** [out :: fs1a.rs.github.com] info: Applying
configuration version
'8fb1a2716d5f950b836e511471a2bdac3ed27090'
** [out :: fs1a.rs.github.com] notice: /Stage[main]/
Github::Common_packages/Package[git]/ensure: would have
changed from '1:1.7.10-1+github12' to
'1:1.7.10-1+github13'
...
Yup, looks good
50. # merge pull request
Next, you’d merge the pull request. If you stopped here, the code would gradually roll out to
all affected nodes over the next hour.
51. hubot deploy:apply puppet to prod/fs
If you wanted the rollout to happen faster than that, you could force a puppet run on the
affected class of nodes
53. hubot graph me -1h @collectd.load(fs*)
Then you’d probably want to check out load to make sure nothing went crazy
55. hubot log me hooks github/github
...and maybe check some logs or other related metrics to confirm your change didn’t break
something
57. ChatOps
How we interact with Puppet via Hubot is a great example of a core principle of how we do
ops at GitHub. We’ve been calling it ChatOps recently.
58. Essentially, ChatOps is the result of Hubot becoming sentient, and decreeing, among other
things, that we now address him as “Supreme Leader” and communicate with our
infrastructure through his secure channels alone.
We occasionally observe him speaking in tongues that sound eerily like YouTube comments.
59. Hubot
Actually, that’s not it at all. Hubot is the star of our Ops team.
60. Hubot
heaven
shell
janky
graphme
We use hubot day in, day out to interact with other simple tools we’ve written over JSON APIs.
61. ALL OF
THE APIS
hubot
shell
heaven
janky
graphme
Hubot interacts nicely with tons of external APIs too. If you have a JSON API, making your
service work with Hubot is a piece of cake.
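Hubot itself is scripted in CoffeeScript/JavaScript, and the heaven API isn't shown in this talk, so the following Ruby sketch only illustrates the general shape: a chat command parsed into a JSON body that a deploy service could accept. The field names, the function name and the default branch of master are all assumptions.

```ruby
require 'json'

# Matches the deploy commands that appear in this talk, e.g.
#   deploy:apply puppet/git-gh13 staging/fs1
#   deploy:apply puppet to prod/fs
DEPLOY_COMMAND = %r{\Adeploy:(apply|noop) ([^\s/]+)(?:/(\S+))?(?: to)? ([^\s/]+)/(\S+)\z}

# Hypothetical: turn a chat command into a JSON request body.
# Returns nil when the text is not a deploy command.
def deploy_request(text)
  m = DEPLOY_COMMAND.match(text)
  return nil unless m

  JSON.generate(
    'action'      => m[1],
    'app'         => m[2],
    'branch'      => m[3] || 'master', # assumed default
    'environment' => m[4],
    'hosts'       => m[5]
  )
end
```

A chat script would then only need to POST that body to the service and relay the response back into the room.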
62. Why is this stupid
chat bot so
important to Ops?
But why do we obsess about Hubot so much? It’s just a chat bot, right?
There are some distinct upsides to this approach that we’ve noticed as our use of Hubot in Ops
has grown
63. hubot ci status puppet/git-gh13
deploy:apply puppet/git-gh13 staging/fs1
deploy:noop puppet/git-gh13 prod/fs1
# merge pull request
hubot deploy:apply puppet to prod/fs
graph me -1h @collectd.load(fs*)
log me hooks github/github
Remember the flow I just showed you for rolling out puppet changes to our infrastructure?
64. Everyone sees all
of that happen
on their first day
Everyone sees all of this happen from the minute they join GitHub. It’s right there, in the Ops
room, right in the middle of the conversation in campfire.
65. You don’t just see how to roll out puppet, you see how to...
66. hubot ci status github/smoke-perf
check the status of a branch’s last build
84. hubot whois 4.9.23.22
Once the outage has been resolved, you might see how to grab whois information for an IP
that exhibited suspicious activity in the logs you saw
85. hubot khanify spammers
and how to hit meme generator to make a joke when you realize that IP is a spammer
86. hubot play in the air tonight
then someone would queue up the song that popped into their head when they thought
about drums and gorillas at the same time
87. hubot tweet@github PuppetConf Drinkup Friday
night at 8:30 at Zeke’s
(3rd & Brannan)
and then finish it all off with a tweet about the Drinkup we’re throwing Friday night
88. ChatOps
ChatOps means building tools that make it easier to operate your infrastructure via Hubot
than via Terminal or Chrome
89. By placing tools
directly in the
middle of the
conversation
Because...
90. Everyone
is pairing
all of the time
This is the core concept behind ChatOps.
91. Teaching by
doing
Teaching by doing is awesome
92. This was always my main
motivation with hubot - teaching
by doing by making things
visible. It's an extremely
powerful teaching
technique - @rtomayko
Ryan Tomayko had this in mind from the very first commits to hubot, which just presented a
simple wrapper around a repository of shell scripts we use for managing and monitoring
our infrastructure.
93. This is how I respond to “how do I do X” questions in Campfire now.
If there’s not yet Hubot functionality to do a thing, we try to write it.
94. Communicate
by
doing
Placing tools in the middle of the conversation also means you get communication of your
work for free.
If you’re doing something in a shell or on a website, you have to do it, then tell people about
it. If you do it with hubot, that comes free.
95. THINGS I
HAVEN’T ASKED
RECENTLY
For example, here are a few things I haven’t asked recently because Hubot has told me the
answer
102. THINGS I HAVEN’T ASKED RECENTLY
did anyone update the status page?
is that branch green?
are you deploying that or should i?
how’s that deploy going?
is anyone responding to that nagios alert?
how does load look?
did that deploy finish?
106. Hide the
ugly
Another awesome benefit of ChatOps-ing all of the things is that you can hide ugly interfaces
and design exactly the interaction you want with some simple porcelain commands
109. hubot nagios ack fs3b/syslog
# fix stuff
nagios check fs3b/syslog
nagios status fs3b/syslog
hubot nagios downtime fs3b/syslog 90
nagios mute fs3b/syslog
nagios unmute fs3b/syslog
Which we can interact with without any unnecessary eye bleeding. Making this easy means
developers and other ops engineers actually mute or schedule downtime when they’re testing
things.
110. Mobile
FTW
Yet another awesome benefit of ChatOps is that you get mobile support for free
111. Well, that is, if you have a team of awesome iOS developers that have built an actually
functioning Campfire client for the iPhone
This lets you do anything hubot can do from your phone.
Which means from your couch. Or your bed. Or a beach in Hawaii.
Which means you can fix a lot of things without pulling your laptop out of your bag.
113. And now for
something
completely
different
While I’m showing off mobile stuff, I thought I’d slip in a demo of something else we’ve done
to make Ops more mobile friendly.
114. We’ve hacked together support for PagerDuty alerts via Apple Push Notifications. When you
swipe on the alert, you go directly to the PagerDuty mobile UI for an incident
118. Boom
I can’t even begin to tell you how happy this makes me, and how much less shitty it makes being
on-call
119. So, who better to summarize all of this than Hubot himself? I asked him what he thought
about ChatOps. Here’s what he said:
120. ChatOps all the things.
Listen to what Hubot said. You’ll love it. Your ops team will love it.
And you’ll help other developers learn how to interact with ops tools without any additional
work.
That’s awesome.
121. Work at GitHub
jesse@github.com
If you can’t ChatOps all the things at your gig now, you could always just come work with me
at GitHub.
Shoot me an email if you’re interested.
123. Tomorrow @ 8:30 PM
Zeke’s
3rd & Brannan
While I still have everyone’s attention, I wanted to mention the GitHub Drinkup we’re
throwing for PuppetConf again. It’s tomorrow night at 8:30pm at Zeke’s, which is on the
corner of 3rd and Brannan.
Everyone’s invited. I’ll see you there.
Thanks again!