How TubeMogul reached
10,000 Puppet deployments in
one year
May 26th, 2015
Nicolas Brousse | Sr. Director Of Operations Engineering | nicolas@tubemogul.com
Julien Fabre | Site Reliability Engineer | julien.fabre@tubemogul.com
Who are we?
TubeMogul
● Enterprise software company for digital branding
● Over 27 billion ads served in 2014
● Over 30 billion ad auctions per day
● Bids processed in less than 50 ms
● Bids served in less than 80 ms (including network round trip)
● 5 PB of monthly video traffic served
● 1.3 EB of data stored
Who are we?
Operations Engineering
● Ensure the smooth day-to-day operation of the platform
infrastructure
● Provide a cost-effective and cutting-edge infrastructure
● Team composed of SREs, SEs and DBAs
● Managing over 2,500 servers (virtual and physical)
Our Infrastructure
Public Cloud On Premises
Multiple locations with a mix of Public Cloud and On Premises
● Java (a lot!)
● MySQL
● Couchbase
● Vertica
● Kafka
● Storm
● Zookeeper, Exhibitor
● Hadoop, HBase, Hive
● Terracotta
● ElasticSearch, Kibana
● LogStash
● PHP, Python, Ruby, Go...
● Apache httpd
● Nagios
● Ganglia
Technology Hoarders
● Graphite
● Memcached
● Puppet
● HAproxy
● OpenStack
● Git and Gerrit
● Gor
● ActiveMQ
● OpenLDAP
● Redis
● Blackbox
● Jenkins, Sonar
● Tomcat
● Jetty (embedded)
● AWS DynamoDB, EC2, S3...
● 2008 - 2010: Used SVN, Bash scripts, and custom templates.
● 2010: Managing about 250 instances. Start looking at Puppet.
● 2011: Puppet 0.25 then 2.7 by EOY on 400 servers with 2 contributors.
● 2012: 800 servers managed by Puppet. 4 contributors.
● 2013: 1,000 servers managed by Puppet. 6 contributors.
● 2014: 1,500 servers managed by Puppet. Introduced Continuous Delivery
Workflow. 9 contributors. Start 3.7 migration.
● 2015: 2,000 servers managed by Puppet. 13 contributors.
Five Years Of Puppet!
● 2,000 nodes
● 225 unique node definitions
● 1 puppetmaster
● 112 Puppet modules
Puppet Stats
● Virtual and physical server configuration: master mode
● Building AWS AMIs with Packer: master mode
● Local development environment with Vagrant: master mode
● OpenStack deployment: masterless mode
Where and how do we use Puppet?
Code Review?
● Gerrit, an industry standard: used by Eclipse, Google, Chromium, OpenStack,
WikiMedia, LibreOffice, Spotify, GlusterFS, etc.
● Fine-grained permission rules
● Plugged into LDAP
● Code review per commit
● Stream events
● Uses GitBlit
● Integrated with Jenkins and Jira
● Managing about 600 Git repositories
A Powerful Gerrit Integration
Gerrit in Action
Verify -1 when there is no ticket # or the change doesn't pass Jenkins code validation
● 1 job per module
● 1 job for the manifests and hiera data
● 1 job for the Puppet fileserver
● 1 job to deploy
Continuous Delivery with Jenkins
Global Jenkins stats for the past year
● ~10,000 Puppet deployments
● Over 8,500 production app deployments
Plugin: github.com/jenkinsci/job-dsl-plugin
● Automates job creation
● Ensures a standard across all jobs
● Versions the configuration
● Applies changes to all your jobs without pain
● Lets you test your configuration changes
Jenkins Job DSL: code your Jenkins jobs
Team Awareness: HipChat Integration with Hubot
Infrastructure As Code
● Follow standard development lifecycle
● Repeatable and consistent server
provisioning
Continuous Delivery
● Iterate quickly
● Automated code review to improve code
quality
Reliability
● Improve Production Stability
● Enforce Better Security Practices
Puppet Continuous Delivery Workflow: The Vision
The Workflow
The Workflow : Puppet code logic
Puppet environments
● Dedicated node manifests (*.pp)
● Modules deployed by branch with Git submodules
All the data in Hiera
● Try to avoid params.pp classes
● Store everything: module parameters, classes, keys, passwords, ...
Puppet Code Hierarchy
/etc/puppet
├── puppet.conf, hiera.yaml, *.conf
├── hiera
└── environments
    ├── dev
    │   ├── manifests
    │   │   ├── nodes/*.pp
    │   │   └── site.pp
    │   └── modules
    │       ├── activemq
    │       ...
    │       └── zookeeper
    └── production
        ├── manifests
        │   ├── nodes/*.pp
        │   └── site.pp
        └── modules
            ├── activemq
            …
            └── zookeeper
Git submodules, branch dev
Git submodules, branch production
Hiera Configuration
$ cat /etc/puppet/hiera.yaml
---
:backends:
  - eyaml
  - yaml
:yaml:
  :datadir: /etc/puppet/hiera
:eyaml:
  :pkcs7_private_key: /var/lib/puppet/hiera_keys/private_key.pkcs7.pem
  :pkcs7_public_key: /var/lib/puppet/hiera_keys/public_key.pkcs7.pem
:hierarchy:
  - fqdn/%{::fqdn}
  - "%{::zone}/%{::vpc}/%{::hostgroup}"
  - "%{::zone}/%{::vpc}/all"
  - "%{::zone}/%{::hostgroup}"
  - "%{::zone}/all"
  - hostname/%{::hostname}
  - hostgroup/%{::hostgroup}
  - environment/%{::environment}
  - common
:merge_behavior: deeper
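To make the lookup order concrete, here is an illustrative Ruby sketch (not Hiera's actual implementation) of how a hierarchy like the one above resolves a key: fact placeholders are interpolated into each level, the levels are searched top-down, and the first level that defines the key wins. The hierarchy levels, facts, and data below are hypothetical examples.

```ruby
# Toy model of a Hiera first-match lookup, assuming a shortened hierarchy.
HIERARCHY = [
  'fqdn/%{::fqdn}',
  '%{::zone}/%{::hostgroup}',
  '%{::zone}/all',
  'common',
]

# Replace %{::fact} placeholders with values from a facts hash.
def interpolate(level, facts)
  level.gsub(/%\{::(\w+)\}/) { facts[Regexp.last_match(1)] }
end

# Return the value from the first hierarchy level whose data defines the key.
def lookup(key, facts, data)
  HIERARCHY.each do |level|
    source = data[interpolate(level, facts)]
    return source[key] if source && source.key?(key)
  end
  nil
end

facts = { 'fqdn' => 'web01.example.com', 'zone' => 'us-east', 'hostgroup' => 'web' }
data = {
  'us-east/web' => { 'ntp::servers' => ['ntp1.us-east'] },
  'common'      => { 'ntp::servers' => ['pool.ntp.org'] },
}
lookup('ntp::servers', facts, data)  # => ["ntp1.us-east"] (us-east/web wins over common)
```

The `:merge_behavior: deeper` setting changes this picture for hash lookups: instead of stopping at the first match, Hiera recursively merges hashes found at every level.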
Hiera eyaml: github.com/TomPoulton/hiera-eyaml
● Hiera backend
● Easy to use
● Powerful CLI: eyaml edit /etc/puppet/hiera/secrets.yaml
Encrypt Your Secrets
$ cat secret.yaml
---
ec2::access_key_id: ENC[PKCS7,MIIBiQYJKoZIhvcNAQcDoIIBejCCAXYCAQAxggEhMII
IBHQIBADAFMAACAQEwDQYJKoZIhvcNAQEBBQAEggEAVIa28OwyaqI5N1TDCvVkBZz3YG+s+Hfzr0lqgcvRCIuJ
Gpq28sQmmuBaQjWY38i86ZSFu0gM6saOHfG64OzVlurO7k/l0CKeL0JfXNaVM4TUqMaN9dSkL5e2vsmpLKrMAS
awmarqbLYwllTrTe32H4NWxU1e+qWLeUMr9ciBnA3W1Azm4RIo+3bsvgvMfdks....=]
Encrypt Files
Blackbox: github.com/StackExchange/blackbox
● Uses GPG to encrypt secret files
● Easy to add/delete team members
● No need to change your Puppet code!
# modules/${module_name}/files/credentials.yaml.gpg
file { '/etc/app/credentials.yaml':
  ensure => 'file',
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => "puppet:///modules/${module_name}/credentials.yaml",
}
The Workflow
The Workflow: bottlenecks
● Only Ops team members can commit (SRE, SE)
● Review and validation is done only by an SRE
● Jenkins will verify the code but will not validate the commit
● Static Puppet environments
● Relies a lot on server hostnames
Flexibility: r10k (github.com/adrienthebo/r10k)!
● Dynamic environments
● No Git submodules anymore! :-)
● Easy to reproduce any environment
● Can use private and Forge Puppet modules
● Can use branches and tags
● Based on a Puppetfile
Puppet Workflow Reloaded!
R10K
$ cat Puppetfile
forge "https://forgeapi.puppetlabs.com"

# Forge modules
mod 'pdxcat/collectd'
mod 'puppetlabs/rabbitmq'
mod 'arioch/redis'
mod 'maestrodev/wget'
mod 'puppetlabs/apt'
mod 'puppetlabs/stdlib'

# TubeMogul modules
mod "hosts",
  :git    => 'ssh://<gerrit_host>/puppet/modules/hosts',
  :branch => 'dev'
mod "timezone",
  :git    => 'ssh://<gerrit_host>/puppet/modules/timezone',
  :branch => 'dev'
...
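A Puppetfile is plain Ruby evaluated against a small DSL, which is why `forge` and `mod` look like keywords but are really method calls. The toy reader below (not r10k's real parser, which handles far more) illustrates the mechanism; the class name and sample content are hypothetical.

```ruby
# Minimal sketch of reading a Puppetfile as a Ruby DSL.
class PuppetfileReader
  attr_reader :forge_url, :modules

  def initialize
    @modules = {}
  end

  # `forge "url"` records the default module source.
  def forge(url)
    @forge_url = url
  end

  # A bare name means a Forge module; a name plus an options hash
  # means a Git-sourced module (:git, :branch, :tag, ...).
  def mod(name, options = {})
    @modules[name] = options
  end

  # Evaluate the Puppetfile text in the context of a fresh reader.
  def self.parse(text)
    reader = new
    reader.instance_eval(text)
    reader
  end
end

puppetfile = <<~PUPPETFILE
  forge "https://forgeapi.puppetlabs.com"
  mod 'puppetlabs/stdlib'
  mod 'hosts', :git => 'ssh://<gerrit_host>/puppet/modules/hosts', :branch => 'dev'
PUPPETFILE

reader = PuppetfileReader.parse(puppetfile)
reader.modules['hosts'][:branch]  # => "dev"
```

Because the branch is just data in the options hash, pointing every internal module at `dev` or `production` is a one-line change per module, with no submodule bookkeeping.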
Puppet Workflow Reloaded!
Better code organization: roles and profiles
● Roles represent the business logic
o Highest abstraction layer
o Use profiles for implementation
● Profiles implement the applications
o Remove potential code duplication
o Use modules and other Puppet resources
Roles/Profiles Pattern
class role::logs {
  include profile::base
  include profile::logstash::server
  include profile::elasticsearch
}

class profile::logstash::server {
  $version    = hiera('profile::logstash::server::version', '1.4.2')
  $es_host    = hiera('profile::logstash::server::es_host', 'es01')
  $redis_host = hiera('profile::logstash::server::redis_host', 'redis01')

  class { 'logstash':
    package_url  => "https://download.elasticsearch.org/logstash/.../logstash_${version}.deb",
    java_install => true,
  }

  logstash::configfile { 'input_redis':
    content => template('logstash/configfile/logstash.input_redis.conf.erb'),
    order   => 10,
  }

  logstash::configfile { 'output_es':
    content => template('logstash/configfile/logstash.output_es.conf.erb'),
    order   => 30,
  }
}
Do not rely on hostnames: the nodeless approach
● Facts guide Puppet
● No node myawesomeserver { } anymore
● Enforces a cluster vision
● site.pp provides the configuration logic
Puppet Workflow Reloaded!
# /etc/puppet/manifests/site.pp
node default {
  if $::ec2_tag_tm_role {
    notify { "Using role: ${::ec2_tag_tm_role}": }
    include "role::${::ec2_tag_tm_role}"
  } else {
    fail('No role found. Nothing to configure.')
  }
}
● Specify tags during provisioning
● Retrieve tags with the AWS Ruby SDK and create facts
● New hierarchy
AWS EC2 tags
$ facter -p | grep ec2_tag
ec2_tag_cluster => rtb-bidder
ec2_tag_nagios_host => mgmt01
ec2_tag_name => bidder
ec2_tag_pupenv => production
ec2_tag_tm_role => rtb::bidder
:hierarchy:
  - "%{::zone}/%{::ec2_tag_vpc}/%{::ec2_tag_cluster}"
  - "%{::zone}/%{::ec2_tag_vpc}/all"
  - "%{::zone}/all"
  - "vpc/%{::ec2_tag_vpc}/%{::ec2_tag_cluster}"
  - "vpc/%{::ec2_tag_vpc}/all"
  - environment/%{::environment}
  - common
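The tag-to-fact step above can be sketched in Ruby. In production the tags come from the AWS Ruby SDK (that API call is stubbed out here); the interesting part is normalizing tag keys into `ec2_tag_*` fact names like those in the `facter -p` output. The function name and sample tags are hypothetical, not TubeMogul's actual code.

```ruby
# Turn a hash of EC2 tags into Facter-style fact name/value pairs.
def tags_to_facts(tags)
  tags.each_with_object({}) do |(key, value), facts|
    # Facter fact names: lowercase, underscores only, prefixed to avoid clashes.
    fact_name = 'ec2_tag_' + key.downcase.gsub(/[^a-z0-9]+/, '_')
    facts[fact_name] = value
  end
end

# With real Facter, each entry would then be registered roughly like:
#   Facter.add(fact_name) { setcode { value } }
tags = { 'Name' => 'bidder', 'tm_role' => 'rtb::bidder', 'pupenv' => 'production' }
tags_to_facts(tags)
# => { "ec2_tag_name" => "bidder", "ec2_tag_tm_role" => "rtb::bidder",
#      "ec2_tag_pupenv" => "production" }
```

Once the facts exist, both the `node default` dispatch in site.pp and the Hiera hierarchy above can key off them instead of hostnames.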
New merging and reviewing rules
● Everyone can commit Puppet code
● Everyone can review a Puppet change (+1)
● SEs and SREs can validate a Puppet change (+2)
● Automatic validation/merging in dev if at least 80% of tests pass (+2)
Next improvements
● Acceptance testing with Beaker and Docker
● Full provisioning tests with ServerSpec
● PuppetDB to improve reporting
● Dedicated Puppet masters
Nicolas Brousse
Julien Fabre
@orieg
@julien_fabre

Puppet Camp Silicon Valley 2015: How TubeMogul reached 10,000 Puppet Deployment in one year
