One-Man Ops

Jos Boumans, VP of Operations
One-Man Ops
with Puppet & Friends
     Jos Boumans
Operations @ Krux Digital
RIPE NCC
Can I have
another /8
 please?




    How you know us
Ubuntu Server
10.04 LTS
10.10
AWS Integration
Krux
Good guys of
Data Privacy
Not to be confused
      with...
Our Traffic


• Serving 4000-10000 user & contextual data
  requests/second
• Sub-100 ms response times
• Processing ~150 GB of raw data per day
• Twitter: Average ~3000 tweets/second
Our Infrastructure

• Started small on AWS. Now:
• 100 dedicated nodes
• +100-200 on demand Map/Reduce nodes
• Dozens of local development machines
• 20 different types of machines
One-Man Ops team
Sad Panda
Go from here...
... to here
Your Toolkit
Ubuntu 10.04
cloud-init
Uses AMI user-data to bootstrap puppet on the client

  https://help.ubuntu.com/community/CloudInit

 http://www.youtube.com/watch?v=-zL3BdbKyGY
#cloud-config

### Update puppet to 2.6.3
apt_sources:
- source: "ppa:mathiaz/puppet-backports"
apt_update: true
apt_upgrade: true

ssh_authorized_keys:
- ssh-rsa AAAAB3NzaC.....+ujFHz

puppet:
 conf:
  puppetd:
   server: "puppet.example.com"
   # certname %i: instanceid, %f: fqdn of the machine
   certname: "%i.%f"
  ca_cert: |
   -----BEGIN CERTIFICATE-----
   ....
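A minimal sketch of generating the user-data above for a given puppet master. The helper name `render_user_data` is hypothetical; the key names mirror the slide, and the certificate body is supplied by the caller rather than filled in here.

```python
def render_user_data(server, ca_cert_pem, ppa="ppa:mathiaz/puppet-backports"):
    """Return a #cloud-config document that bootstraps the puppet agent."""
    # Indent the PEM body so it sits inside the YAML block scalar under ca_cert.
    cert = "\n".join("   " + line for line in ca_cert_pem.strip().splitlines())
    return "\n".join([
        "#cloud-config",
        "apt_sources:",
        '- source: "%s"' % ppa,
        "apt_update: true",
        "apt_upgrade: true",
        "puppet:",
        " conf:",
        "  puppetd:",
        '   server: "%s"' % server,
        # %i and %f are expanded by cloud-init, not by Python.
        '   certname: "%i.%f"',
        "  ca_cert: |",
        cert,
        "",
    ])
```

The returned string is what you would pass as user-data when launching the instance.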
monthly updates
http://uec-images.ubuntu.com/query/lucid/server/released.current.txt
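A rough sketch of picking the current AMI out of that listing. It assumes the file is tab-separated with root store, architecture, region and AMI id in columns 5 to 8; that column order is an assumption, so check the live file before relying on it. The sample rows and AMI ids below are made up for illustration.

```python
def find_ami(listing, region, arch="amd64", root_store="ebs"):
    """Return the first AMI id matching region/arch/root store, or None."""
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        # Assumed column order: release, build, label, serial,
        # root store, architecture, region, AMI id.
        store, line_arch, line_region, ami = fields[4:8]
        if (store, line_arch, line_region) == (root_store, arch, region):
            return ami
    return None


# Illustrative data only; these are not real AMI ids.
SAMPLE = (
    "lucid\tserver\trelease\t20110201\tebs\tamd64\tus-east-1\tami-00000001\n"
    "lucid\tserver\trelease\t20110201\tinstance-store\tamd64\tus-east-1\tami-00000002\n"
)
```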
you can upgrade
       the kernel
      The only AMI I know of that can do this

http://cloud.ubuntu.com/2011/02/migrating-to-pv-grub-kernels-for-kernel-upgrades/
Updated software for
      10.04
           Backported builds for
     Apache, Memcached, MySQL, PHP, etc.

 https://launchpad.net/~ubuntu-server-edgers
I may be biased
AWS
<3 Elastic Load
      Balancer
They're free and will save you more than once

http://aws.amazon.com/elasticloadbalancing/
<3 S3
(Simple Storage Service)
      Great cheap data retention
        Good poor man's CDN

      http://aws.amazon.com/s3
Tip: Get ExpanDrive for
great SSHFS and S3FS
     Available for Windows and Mac:

      http://www.expandrive.com/
RDS > Own MySQL
   Hot Standby - Failover is ~7 minutes
Read Replicas - Improve read performance

   BUT, you can't replicate out of RDS :(

       http://aws.amazon.com/rds/
Use EBS Root
   (Elastic Block Store)
You can reboot and stop/start machines and keep state
  Consider attaching extra EBS for data persistence

Tip: Software RAID across multiple EBS volumes for better I/O
</3 Network
       Partitioning
        This will happen to you a lot

Relying on network connections will decrease
        availability of your machines
</3 Floating
   public IPs
    AWS DHCP server is flaky

  AWS DNS TTL is 60 seconds

Limited number of fixed public IPs
Sort your DNS
  AWS offers http://aws.amazon.com/route53/

When you go multi data center or have big traffic,
 seriously consider Dyn: http://dyn.com/dns/
Avoid Single
Points of Failure
       Because they WILL fail.

 Architect for eventually consistent,
 distributed systems where you can.
Remember him..?
Puppet
Optimize for making
Puppet development
      EASY
   Bridge the gap between dev & ops

     Tip: use a c1.medium at least
Put your Puppet
  code in VCS
I really don't need to explain why, right?
Run multiple Puppet
   environments
http://docs.puppetlabs.com/guides/environment.html

We put 1 host of each cluster in puppet environment
 development, 1 in staging, the rest in production

         Don't break everything at once :)
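The one-development, one-staging, rest-production split can be sketched as below. The function name and the choice to order hosts alphabetically are assumptions for illustration; the slides do not say how the hosts are picked.

```python
def assign_environments(hosts):
    """Map each host of one cluster to a Puppet environment."""
    assignment = {}
    for index, host in enumerate(sorted(hosts)):
        if index == 0:
            assignment[host] = "development"  # first host takes changes first
        elif index == 1:
            assignment[host] = "staging"
        else:
            assignment[host] = "production"
    return assignment
```

Note that with this rule a single-host cluster ends up entirely in development, so small clusters may need an explicit override.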
Split your Puppet
 code into modules
     We use: Forge, Components, Services

http://docs.puppetlabs.com/guides/modules.html
Use separate init.pp,
params.pp & config.pp
Params.pp so you can include variables from elsewhere

              Config.pp lets you specify:
           kfoo::config { $fqdn: } in a service
                     and require:
        Kfoo::Config[$fqdn] in the component

  http://docs.puppetlabs.com/guides/modules.html
Use a common
        base class
Set up all the plumbing from users, to apt,
 to filesystems, to mounts, ntp, sudo, git,
        monitoring, ssh, and so on.

      Run it early using run stages
Sample Service
class s_webui {
  include kbase
  include kapache
  include kwebui
  include kredis

    kwebui         { $fqdn: }
    kapache::vhost { $fqdn: ssl => 443 }
    kredis::config { $fqdn: memory => '100M' }
}
Write tools to make
you more productive
Enable developers to run their own Puppet master

         Create new components easily

           Push changes to production

       Our code: https://github.com/krux/ops-tools/
Your own Puppet server
          & manifests
puppet001:puppet-jib$ screen -S jib.puppetmaster 
  bin/run_puppet_master_locally 8180

Running: sudo puppet master --no-daemonize
 --verbose --debug --masterport 8180
 --pidfile /mnt/tmp/puppetmaster.8180.pid
 --confdir /data/git/puppet-jib/bin/..

.....
notice: Starting Puppet master version 2.6.3
.....
Our Layout
$git/
  bin/
    update_env.pl
    run_puppet_master_locally.pl
    new_component.pl
  env/
    development/
      forge/
      krux-modules/
      services/
    staging/
      ...
    production/
      ...
Use an External
           Node Classifier
           Manage your host specific configuration
              separately from your manifests

http://docs.puppetlabs.com/guides/external_nodes.html

Our code: https://github.com/krux/ops-tools/blob/puppet/bin/node_classifier.py
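For readers new to external node classifiers, here is a minimal sketch of the contract: Puppet calls the script with the node name and expects YAML with `classes` and `parameters` on stdout. This is not the Krux `node_classifier.py`; the in-memory `NODES` dict stands in for the S3 bucket, and all names and data in it are hypothetical.

```python
import sys

# Stand-in for the S3 bucket: certname -> node configuration (made-up data).
NODES = {
    "dev001.example.com": {
        "classes": ["s_development"],
        "parameters": {"zone": "us-east-1a", "puppet_environment": "development"},
    },
}


def to_enc_yaml(config):
    """Render a classes list and flat string parameters as simple YAML."""
    lines = ["---", "classes:"]
    for name in config.get("classes", []):
        lines.append("  - %s" % name)
    lines.append("parameters:")
    for key, value in sorted(config.get("parameters", {}).items()):
        lines.append('  %s: "%s"' % (key, value))
    return "\n".join(lines) + "\n"


def classify(certname, nodes=NODES):
    """Return the YAML for a known node, or None if it has no configuration."""
    config = nodes.get(certname)
    return None if config is None else to_enc_yaml(config)


def main(argv):
    output = classify(argv[1]) if len(argv) > 1 else None
    if output is None:
        return 1  # non-zero exit: the node could not be classified
    sys.stdout.write(output)
    return 0
```

A real script would end with `sys.exit(main(sys.argv))` and fetch the configuration from wherever you keep it.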
Keep node
configuration in an
 editable location
                 We chose S3

Git, LDAP, or anything else that works for you.
Sign nodes that have
  a configuration only
        Keyed off their certname, run periodically

                     Inspired by:
http://ubuntumathiaz.wordpress.com/2010/03/24/using-puppet-in-uecec2-puppet-support-in-ubuntu-images/

 Our code: https://github.com/krux/ops-tools/blob/puppet/bin/check_csr.py
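The signing rule itself is small: sign a pending request only if a configuration exists under the same certname. A sketch of that decision, not the Krux `check_csr.py`; how the pending requests and configured names are collected is left out, and the function names are hypothetical. The commands are only built, not executed.

```python
def certs_to_sign(pending_csrs, configured_certnames):
    """Return pending certnames that have a matching node configuration."""
    configured = set(configured_certnames)
    return sorted(name for name in pending_csrs if name in configured)


def sign_commands(certnames):
    """Build (but do not run) one 'puppet cert --sign' command per certname."""
    return [["puppet", "cert", "--sign", name] for name in certnames]
```

Run periodically, this leaves unknown nodes waiting unsigned instead of auto-signing everything that asks.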
Master Puppet.conf
[master]
.......
node_terminus = exec
external_nodes = /usr/bin/node_classifier.py --bucket instances
reports        = http, store, foreman

### different puppet environments: development, staging, production
[development]
templatedir = $confdir/env/development/templates
modulepath = $confdir/env/development/krux-modules:$confdir/env/development/forge:$confdir/env/development/services

[....]
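Since the three environment sections differ only in the environment name, they can be generated rather than hand-edited. A small sketch, assuming the directory names from the layout shown earlier; the helper name is hypothetical.

```python
def environment_section(env, module_dirs=("krux-modules", "forge", "services")):
    """Render one puppet.conf environment section as text."""
    base = "$confdir/env/%s" % env
    # modulepath is a single colon-separated value on one line.
    modulepath = ":".join("%s/%s" % (base, name) for name in module_dirs)
    return "\n".join([
        "[%s]" % env,
        "templatedir = %s/templates" % base,
        "modulepath = %s" % modulepath,
    ])


def all_sections(envs=("development", "staging", "production")):
    """Join the sections for every environment, separated by blank lines."""
    return "\n\n".join(environment_section(env) for env in envs)
```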
Sample Configuration
{ 'classes': ['s_sandbox::jib'],
  'parameters': {
  'zone':                 'us-east-1c',
  'instance_type':         'c1.medium',
  'instance_id':           'i-23a3d042',
  'security_group':         'krux-ops-dev',
  'puppet_environment': 'development',
  'puppet_master_port': 8180,
  'kredis_save_to_disk': 0,
  'certname':                'ops-dev003.example.com.47334fd8-1516-451d-bd5a-8760ab2a36c0',
}}
Attend a Puppet
    Master Training!
            No, I don't get a kickback :)

http://puppetlabs.com/services/training-workshops/
... avoid becoming him
Foreman
Email
 Reports & Alerts
   This feature alone makes it worth installing.

      Run it on the same host as your
     Puppet master for minimal friction

http://theforeman.org/projects/foreman/wiki/Summarized_E-Mail_Reports
Dashboard / Browser
Theoretically:
   Node Classifier
http://theforeman.org/projects/foreman/wiki/External_Nodes

    We are happy with our S3-based solution

       YMMV though: do look into it!
Theoretically:
Initiate Puppetrun
http://theforeman.org/projects/foreman/wiki/Puppetrun

      Couldn't get it to work though :(
Python Boto & s3cmd
$ s3cmd put file.txt
  s3://my-bucket
Great for cronjobs, maintenance tasks & file syncs

  Consider s3://my-dropbox for your company

            http://s3tools.org/s3cmd
boto: Full python API
   access to AWS
        Boto + AWS + Puppet
                   =
     Real 'Infrastructure as Code'

    http://code.google.com/p/boto/
start_instance.py:
     Launch AWS nodes
          Manage zone, security group, type, AMI,
              puppet class, EBS, hostname

              Bootstraps the node for puppet,
          integrates with external node classifier

Our code: https://github.com/krux/ops-tools/blob/aws/bin/start_instance.py
$ start_instance.py -t m1.large -z us-east-1a -a 10
  -H dev001.example.com -s mycorp-development
  ami-2ec83147 s_development

Starting instance of ami ami-2ec83147 - this may take a while
......... started i-12345678

Attaching 10gb volume to instance i-12345678 - this may take a while
..... attached vol-87654321

Created these DNS entries:
 dev001.example.com => ec2-172-131-213-58.compute-1.amazonaws.com

Wrote configuration to S3 key:
 s3://instances/dev001.example.com.47334fd8-1516-451d-bd5a-8760ab2a36c0
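The command line above can be mirrored with a small argument parser. This is a sketch, not the Krux `start_instance.py`: the meaning of each flag is inferred from the sample invocation and its output, and the destination names and `launch_plan` helper are hypothetical. No AWS calls are made.

```python
import argparse


def build_parser():
    """Parser mirroring the flags seen in the sample invocation."""
    parser = argparse.ArgumentParser(description="Launch an AWS node (sketch)")
    parser.add_argument("-t", dest="instance_type", help="instance type")
    parser.add_argument("-z", dest="zone", help="availability zone")
    parser.add_argument("-a", dest="attach_gb", type=int, default=0,
                        help="size in GB of an extra EBS volume to attach")
    parser.add_argument("-H", dest="hostname", help="hostname for the DNS entry")
    parser.add_argument("-s", dest="security_group", help="security group")
    parser.add_argument("ami", help="AMI id to boot")
    parser.add_argument("puppet_class", help="Puppet class for the node")
    return parser


def launch_plan(args):
    """List the steps a launch would take, in the order of the sample output."""
    steps = ["start instance of %s (%s) in %s"
             % (args.ami, args.instance_type, args.zone)]
    if args.attach_gb:
        steps.append("attach %d GB EBS volume" % args.attach_gb)
    steps.append("create DNS entry for %s" % args.hostname)
    steps.append("write node configuration with class %s to S3" % args.puppet_class)
    return steps
```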
security_groups.py
       Manage & Sync
     Programmatically manage your security groups
           keep groups in sync across regions

Our code: https://github.com/krux/ops-tools/blob/aws/bin/security_groups.py
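Keeping a group in sync across regions comes down to a set difference between the rules in the source region and those in the target. A minimal sketch of that comparison, assuming each rule is represented as a hypothetical `(protocol, from_port, to_port, cidr)` tuple; fetching and applying rules through the AWS API is not shown.

```python
def diff_rules(source_rules, target_rules):
    """Return the rules to authorize and revoke so target matches source."""
    source, target = set(source_rules), set(target_rules)
    return {
        "authorize": sorted(source - target),  # missing in the target region
        "revoke": sorted(target - source),     # extra in the target region
    }
```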
Monitoring & Graphing
Free developer
           account
            1 Free node with all features,
         unlimited nodes with basic features
         Free: HTTP(S), PING, SSH, DNS, TCP
Premium: HTTP JSON(!), Custom plugins, MySQL, Apache
                  mod_status, etc.

        Get a 2nd free node through referral:
     https://cloudkick.com/referral/633f0729
Performance Graphs
Puppet classes &
       config information




Monitoring & Alerts
Generate your
  cloudkick.conf from
        Puppet
  Use puppet classes, tags, colors as you define them
                  as cloudkick tags

Our code for doing so: https://gist.github.com/1230044
Cloudkick Gem for
       parallel-ssh
    Uses your cloudkick tags to do node selection,
which are based straight off your puppet classes & facts

     https://github.com/cloudkick/cloudkick-gem
Cloudkick pssh
$ cloudkick pssh --query 'node:redis-c*' 'hostname'

[1] 18:38:23 [SUCCESS] 64.206.11.221
redis-c-slave001.example.com
[2] 18:38:23 [SUCCESS] 52.13.118.158
redis-c-master001.example.com
[3] 18:38:24 [SUCCESS] 52.16.34.217
redis-c-slave004.example.com
[4] 18:38:24 [SUCCESS] 183.71.131.32
redis-c-slave002.example.com
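To illustrate the node selection in the query above, here is a sketch that handles only the `node:<glob>` form with shell-style matching on node names. It does not reproduce Cloudkick's full query language, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def select_nodes(query, node_names):
    """Return the node names matching a 'node:<glob>' query, sorted."""
    prefix, _, pattern = query.partition(":")
    if prefix != "node" or not pattern:
        raise ValueError("only 'node:<glob>' queries are supported in this sketch")
    return sorted(name for name in node_names if fnmatchcase(name, pattern))
```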
Krux Improvements:
 pscp, listing nodes
           Get it from our github:
  https://github.com/krux/cloudkick-gem

          Fork and contribute!
Cloudkick list
$ cloudkick list --full --query 'node:redis-c*'

# Name             IP             Type        Zone
redis-c-master001  52.13.118.158  m2.4xlarge  us-east-1a
redis-c-slave001   64.206.11.221  m2.4xlarge  us-east-1a
redis-c-slave002   183.71.131.32  m2.4xlarge  us-east-1b
redis-c-slave004   52.16.34.217   m2.4xlarge  us-east-1d
Take away:
Measure Everything!
                Further reading:

  PagerDuty for cell phone/pager/email alerts
  New Relic for more in-depth app monitoring
MCollective for more advanced task parallelization
Just one more thing....
Vagrant
VirtualBox + Ubuntu
   + Puppet = JFDI
     Use same puppet infrastructure to provision
               dev machines locally

Put it on a USB stick, be up and running in 30 minutes

Our code for doing so: https://gist.github.com/1230221
Thank You!
Slides at: slideshare.net/jiboumans

  Follow us: @KruxEngineering

  We're Hiring: kruxdigital.com
One-Man Ops

  • 1. One-Man Ops with Puppet & Friends Jos Boumans Operations @ Krux Digital
  • 3. Can I have another /8 please? How you know us
  • 10. Not to be confused with...
  • 11. Our Traffic • Serving 4000-10000 user & contextual data requests/second • Sub 100 ms response times • Processing ~150 gb of raw data per day • Twitter: Average ~3000 tweets/second
  • 12. Our Infrastructure • Started small on AWS. Now: • 100 dedicated nodes • +100-200 on demand Map/Reduce nodes • Dozens of local development machines • 20 different types of machines
  • 19. cloud-init Uses AMI user-data to bootstrap puppet on the client https://help.ubuntu.com/community/CloudInit http://www.youtube.com/watch?v=-zL3BdbKyGY
  • 20. #cloud-config ### Update puppet to 2.6.3 apt_sources: - source: "ppa:mathiaz/puppet-backports" apt_update: true apt_upgrade: true ssh-rsa: AAAAB3NzaC.....+ujFHz puppet: conf: puppetd: server: "puppet.example.com" # certname %i: instanceid, %f: fqdn of the machine certname: "%i.%f" ca_cert: | -----BEGIN CERTIFICATE----- ....
  • 22. you can upgrade the kernel Only AMI that I know that can do this http://cloud.ubuntu.com/2011/02/migrating-to-pv- grub-kernels-for-kernel-upgrades/
  • 23. Updated software for 10.04 Backported builds for Apache, Memcache, Mysql, PHP, etc https://launchpad.net/~ubuntu-server-edgers
  • 24. I may be biased
  • 25. AWS
  • 26. <3 Elastic Load Balancer They're free and will save you more than once http://aws.amazon.com/elasticloadbalancing/
  • 27. <3 S3 (Simple Storage Service) Great cheap data retention Good poor mans CDN http://aws.amazon.com/s3
  • 28. Tip: Get ExpanDrive for great SSHFS and S3FS Available for Windows and Mac: http://www.expandrive.com/
  • 29. RDS > Own MySQL Hot Standby - Failover is ~7 minutes Read Replicates - Improve read performance BUT, you can't replicate out of RDS :( http://aws.amazon.com/rds/
  • 30. Use EBS Root (Elastic Block Storage) You can reboot and stop/start machines and keep state Consider attaching extra EBS for data persistence Tip: Software raid for multiple EBS drives for better IO
  • 31. </3 Network Partitioning This will happen to you a lot Relying on network connections will decrease availability of your machines
  • 32. </3 Floating public IPS AWS DHCP server is flaky AWS DNS TTL is 60 seconds Limited amount of fixed public IPs
  • 33. Sort your DNS AWS offers http://aws.amazon.com/route53/ When you go multi data center or have big traffic, seriously consider Dyn: http://dyn.com/dns/
  • 34. Avoid Single Points of Failure Because they WILL fail. Architect for eventually consistent, distributed systems where you can.
  • 37. Optimize for making Puppet development EASY Bridge the gap between dev & ops Tip: use a c1.medium at least
  • 38. Put your Puppet code in VCS I really don't need to explain why, right?
  • 39. Run multiple Puppet environments http://docs.puppetlabs.com/guides/environment.html We put 1 host of each cluster in puppet environment development, 1 in staging, the rest in production Don't break everything at once :)
  • 40. Split your Puppet code into modules We use: Forge, Components, Services http://docs.puppetlabs.com/guides/modules.html
  • 41. Use seperate init.pp, params.pp & config.pp Params.pp so you can include variables from elsewhere Config.pp lets you specify: kfoo::config { $fqdn } in a service and require: Kfoo::Config[ $fqdn ] in the component http://docs.puppetlabs.com/guides/modules.html
  • 42. Use a common base class Set up all the plumbing from users, to apt, to filesystems, to mounts, ntp, sudo, git, monitoring, ssh, and so on. Run it early using run stages
  • 43. Sample Service class s_webui { include kbase include kapache include kwebui include kredis kwebui { $fqdn: } kapache::vhost { $fqdn: ssl => 443 } kredis::config { $fqdn: memory => '100M' } }
  • 44. Write tools to make you more productive Enable developers to run their own Puppet master Create new components easily Push changes to production Our code: https://github.com/krux/ops-tools /
  • 45. Your own Puppet server & manifests puppet001:puppet-jib$ screen -S jib.puppetmaster bin/run_puppet_master_locally 8180 Running: sudo puppet master --no-daemonize --verbose --debug --masterport 8180 --pidfile /mnt/tmp/puppetmaster.8180.pid --confdir /data/git/puppet-jib/bin/.. ..... notice: Starting Puppet master version 2.6.3 .....
  • 46. Our Layout $git/ bin/ update_env.pl run_puppet_master_locally.pl new_component.pl env/ development/ forge/ krux-modules/ services/ staging/ ... production/ ...
  • 47. Use an External Node Classifier. Manage your host-specific configuration separately from your manifests. http://docs.puppetlabs.com/guides/external_nodes.html Our code: https://github.com/krux/ops-tools/blob/puppet/bin/node_classifier.py
  • 48. Keep node configuration in an editable location. We chose S3. Git, LDAP, or anything else that works for you is fine too.
  • 49. Only sign nodes that have a configuration. Keyed off their certname, run periodically. Inspired by: http://ubuntumathiaz.wordpress.com/2010/03/24/using-puppet-in-uecec2-puppet-support-in-ubuntu-images/ Our code: https://github.com/krux/ops-tools/blob/puppet/bin/check_csr.py
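The core of "only sign what you have a configuration for" is a set intersection on certnames. The sketch below shows only that decision. It is not the check_csr.py implementation, and the function name and sample certnames are made up for illustration. How the pending requests are listed (from the Puppet CA) and how the configured certnames are fetched (for example, from the keys in your configuration store) is left to the caller.

```python
def certnames_to_sign(pending_csrs, configured_certnames):
    """Return the pending certnames that also have a stored node configuration.

    pending_csrs: certnames with an outstanding signing request.
    configured_certnames: certnames for which a configuration exists.
    Anything pending without a configuration is deliberately left unsigned.
    """
    configured = set(configured_certnames)
    return sorted(name for name in pending_csrs if name in configured)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    pending = [
        "dev001.example.com.47334fd8-1516-451d-bd5a-8760ab2a36c0",
        "unknown-host.example.com.00000000-0000-0000-0000-000000000000",
    ]
    configured = ["dev001.example.com.47334fd8-1516-451d-bd5a-8760ab2a36c0"]
    print(certnames_to_sign(pending, configured))
```

Running this periodically (cron, for example) matches the slide's "run periodically" and means a node that boots before its configuration is written simply gets signed on a later pass.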
  • 50. Master Puppet.conf
      [master]
      .......
      node_terminus = exec
      external_nodes = /usr/bin/node_classifier.py --bucket instances
      reports = http, store, foreman
      ### different puppet environments: development, staging, production
      [development]
      templatedir = $confdir/env/development/templates
      modulepath = $confdir/env/development/krux-modules:$confdir/env/development/forge:$confdir/env/development/services
      [....]
  • 51. Sample Configuration
      {
        'classes': ['s_sandbox::jib'],
        'parameters': {
          'zone': 'us-east-1c',
          'instance_type': 'c1.medium',
          'instance_id': 'i-23a3d042',
          'security_group': 'krux-ops-dev',
          'puppet_environment': 'development',
          'puppet_master_port': 8180,
          'kredis_save_to_disk': 0,
          'certname': 'ops-dev003.example.com.47334fd8-1516-451d-bd5a-8760ab2a36c0',
        }
      }
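With node_terminus = exec, Puppet calls the classifier with the node name as its only argument, expects YAML with classes and parameters on stdout, and treats a non-zero exit as "unknown node". The sketch below is a minimal stand-in, not the node_classifier.py from ops-tools: the in-memory NODE_CONFIGS dictionary replaces the S3 lookup, the hand-rolled YAML writer assumes plain identifier keys with string or integer values, and every function name is hypothetical.

```python
import json
import sys

SAMPLE_CERTNAME = "ops-dev003.example.com.47334fd8-1516-451d-bd5a-8760ab2a36c0"

# Hypothetical in-memory store standing in for the S3 bucket of node configs.
NODE_CONFIGS = {
    SAMPLE_CERTNAME: {
        "classes": ["s_sandbox::jib"],
        "parameters": {
            "zone": "us-east-1c",
            "puppet_environment": "development",
            "puppet_master_port": 8180,
        },
    },
}


def render_yaml(config):
    """Render classes and parameters as a small YAML document.

    json.dumps is used for scalars: a JSON string or integer is also a valid
    YAML scalar, which avoids depending on a YAML library in this sketch.
    """
    lines = ["---", "classes:"]
    for name in config.get("classes", []):
        lines.append("  - " + json.dumps(name))
    lines.append("parameters:")
    for key, value in sorted(config.get("parameters", {}).items()):
        lines.append("  %s: %s" % (key, json.dumps(value)))
    return "\n".join(lines) + "\n"


def classify(certname, store=NODE_CONFIGS):
    """Return the YAML for a known certname, or None for an unknown one."""
    config = store.get(certname)
    return None if config is None else render_yaml(config)


def main(argv):
    """Return the exit code an ENC script would use (0 known, 1 unknown)."""
    if len(argv) != 2:
        sys.stderr.write("usage: classifier CERTNAME\n")
        return 2
    output = classify(argv[1])
    if output is None:
        return 1
    sys.stdout.write(output)
    return 0


if __name__ == "__main__":
    # Demo run with the sample certname; a real ENC would pass sys.argv
    # and hand the return value to sys.exit.
    main(["classifier", SAMPLE_CERTNAME])
```

Returning a non-zero code for unknown certnames pairs well with the signing check: a node without a stored configuration is neither signed nor classified.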
  • 52. Attend a Puppet Master Training! No, I don't get a kick back :) http://puppetlabs.com/services/training-workshops/
  • 55. Email Reports & Alerts. This feature alone makes Foreman worth installing. Run it on the same host as your Puppet master for minimal friction. http://theforeman.org/projects/foreman/wiki/Summarized_E-Mail_Reports
  • 57. Theoretically: Node Classifier http://theforeman.org/projects/foreman/wiki/External_Nodes We are happy with our S3-based solution. YMMV though: do look into it!
  • 59. Python Boto & s3cmd
  • 60. $ s3cmd put file.txt s3://my-bucket Great for cronjobs, maintenance tasks & file syncs Consider s3://my-dropbox for your company http://s3tools.org/s3cmd
  • 61. boto: Full python API access to AWS Boto + AWS + Puppet = Real 'Infrastructure as Code' http://code.google.com/p/boto/
  • 62. start_instance.py: Launch AWS nodes. Manage zone, security group, type, AMI, puppet class, EBS, hostname. Bootstraps the node for Puppet, integrates with the external node classifier. Our code: https://github.com/krux/ops-tools/blob/aws/bin/start_instance.py
  • 63. $ start_instance.py -t m1.large -z us-east-1a -a 10 -H dev001.example.com -s mycorp-development ami-2ec83147 s_development
      Starting instance of ami ami-2ec83147 - this may take a while ......... started i-12345678
      Attaching 10gb volume to instance i-12345678 - this may take a while ..... attached vol-87654321
      Created these DNS entries:
        dev001.example.com => ec2-172-131-213-58.compute-1.amazonaws.com
      Wrote configuration to S3 key:
        s3://instances/dev001.example.com.47334fd8-1516-451d-bd5a-8760ab2a36c0
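To make the command line above easier to read, here is a sketch of how such an interface could be parsed and turned into a node configuration. It is not the real start_instance.py. The flag meanings are inferred from the example and its output (-t type, -z zone, -a attached volume size in GB, -H hostname, -s security group), the key suffix is assumed to be a random UUID, and nothing here talks to AWS.

```python
import argparse
import uuid


def build_parser():
    """Argument parser mirroring the flags seen in the example invocation.

    The help texts are inferred from the example output, not from the tool.
    """
    parser = argparse.ArgumentParser(prog="start_instance_sketch")
    parser.add_argument("-t", dest="instance_type", help="instance type")
    parser.add_argument("-z", dest="zone", help="availability zone")
    parser.add_argument("-a", dest="volume_gb", type=int, default=0,
                        help="size in GB of the volume to attach")
    parser.add_argument("-H", dest="hostname", required=True, help="hostname")
    parser.add_argument("-s", dest="security_group", help="security group")
    parser.add_argument("ami", help="AMI id to launch")
    parser.add_argument("puppet_class", help="Puppet class for the node")
    return parser


def node_config(args, node_id=None):
    """Build (certname, config) in the shape the sample configuration uses.

    node_id defaults to a fresh UUID; how the real tool derives the suffix
    is not stated in the slides, so this is an assumption.
    """
    node_id = node_id or str(uuid.uuid4())
    certname = "%s.%s" % (args.hostname, node_id)
    config = {
        "classes": [args.puppet_class],
        "parameters": {
            "zone": args.zone,
            "instance_type": args.instance_type,
            "security_group": args.security_group,
            "certname": certname,
        },
    }
    return certname, config


if __name__ == "__main__":
    argv = ("-t m1.large -z us-east-1a -a 10 -H dev001.example.com "
            "-s mycorp-development ami-2ec83147 s_development").split()
    certname, config = node_config(build_parser().parse_args(argv))
    print(certname)
    print(config)
```

Writing the resulting configuration under the certname is what ties the launcher to the classifier and to the signing check: all three key off the same string.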
  • 64. security_groups.py: Manage & Sync. Programmatically manage your security groups and keep groups in sync across regions. Our code: https://github.com/krux/ops-tools/blob/aws/bin/security_groups.py
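Keeping a group in sync across regions boils down to diffing two rule sets. The sketch below shows only that diff. It is not the security_groups.py implementation, it makes no AWS calls, and the rule tuple layout (protocol, from_port, to_port, cidr) is an assumption chosen for illustration.

```python
def plan_sync(source_rules, target_rules):
    """Compute what to change in the target region to match the source.

    Each rule is a (protocol, from_port, to_port, cidr) tuple.
    Returns the rules to authorize (missing in target) and the rules to
    revoke (present in target but not in source), both sorted.
    """
    source, target = set(source_rules), set(target_rules)
    return {
        "authorize": sorted(source - target),
        "revoke": sorted(target - source),
    }


if __name__ == "__main__":
    # Hypothetical rule sets for the same group in two regions.
    us_east = {("tcp", 22, 22, "10.0.0.0/8"), ("tcp", 443, 443, "0.0.0.0/0")}
    us_west = {("tcp", 22, 22, "10.0.0.0/8"), ("tcp", 8080, 8080, "0.0.0.0/0")}
    print(plan_sync(us_east, us_west))
```

Computing the plan separately from applying it makes a dry run trivial: print the plan, review it, then hand it to whatever client performs the actual authorize and revoke calls.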
  • 66. Free developer account: 1 free node with all features, unlimited nodes with basic features. Free: HTTP(S), PING, SSH, DNS, TCP. Premium: HTTP JSON(!), custom plugins, Mysql, Apache mod_status, etc. Get a 2nd free node through referral: https://cloudkick.com/referral/633f0729
  • 68. Puppet classes & config information Monitoring & Alerts
  • 69. Generate your cloudkick.conf from Puppet Use puppet classes, tags, colors as you define them as cloudkick tags Our code for doing so: https://gist.github.com/1230044
  • 70. Cloudkick Gem for parallel-ssh Uses your cloudkick tags to do node selection, which are based straight off your puppet classes & facts https://github.com/cloudkick/cloudkick-gem
  • 71. Cloudkick pssh
      $ cloudkick pssh --query 'node:redis-c*' 'hostname'
      [1] 18:38:23 [SUCCESS] 64.206.11.221 redis-c-slave001.example.com
      [2] 18:38:23 [SUCCESS] 52.13.118.158 redis-c-master001.example.com
      [3] 18:38:24 [SUCCESS] 52.16.34.217 redis-c-slave004.example.com
      [4] 18:38:24 [SUCCESS] 183.71.131.32 redis-c-slave002.example.com
  • 72. Krux Improvements: pscp, listing nodes Get it from our github: https://github.com/krux/cloudkick-gem Fork and contribute!
  • 73. Cloudkick list
      $ cloudkick list --full --query 'node:redis-c*'
      # Name              IP             Type        Zone
      redis-c-master001   52.13.118.158  m2.4xlarge  us-east-1a
      redis-c-slave001    64.206.11.221  m2.4xlarge  us-east-1a
      redis-c-slave002    183.71.131.32  m2.4xlarge  us-east-1b
      redis-c-slave004    52.16.34.217   m2.4xlarge  us-east-1d
  • 74. Take away: Measure Everything! Further reading: PagerDuty for cell phone/pager/email alerts, New Relic for more in-depth app monitoring, MCollective for more advanced task parallelization.
  • 75. Just one more thing....
  • 77. VirtualBox + Ubuntu + Puppet = JFDI Use same puppet infrastructure to provision dev machines locally Put it on a USB stick, be up and running in 30 minutes Our code for doing so: https://gist.github.com/1230221
  • 79. Slides at: slideshare.net/jiboumans Follow us: @KruxEngineering We're Hiring: kruxdigital.com