
One-Man Ops

One-Man Ops with Puppet & Friends.

If you're getting started on AWS, here are 7 tools that will help you be successful, a few tips to make your life easier, and some common pitfalls to avoid.


  1. One-Man Ops with Puppet & Friends. Jos Boumans, Operations @ Krux Digital
  2. RIPE NCC
  3. Can I have another /8 please? How you know us
  4. Ubuntu Server
  5. 10.04 LTS
  6. 10.10
  7. AWS Integration
  8. Krux
  9. Good guys of Data Privacy
  10. Not to be confused with...
  11. Our Traffic
      • Serving 4000-10000 user & contextual data requests/second
      • Sub-100 ms response times
      • Processing ~150 GB of raw data per day
      • Twitter: Average ~3000 tweets/second
  12. Our Infrastructure
      • Started small on AWS. Now:
      • 100 dedicated nodes
      • +100-200 on-demand Map/Reduce nodes
      • Dozens of local development machines
      • 20 different types of machines
  13. One-Man Ops team
  14. Sad Panda
  15. Go from here...
  16. ... to here
  17. Your Toolkit
  18. Ubuntu 10.04
  19. cloud-init: Uses AMI user-data to bootstrap Puppet on the client
  20. #cloud-config
      ### Update puppet to 2.6.3
      apt_sources:
       - source: "ppa:mathiaz/puppet-backports"
      apt_update: true
      apt_upgrade: true
      ssh-rsa: AAAAB3NzaC.....+ujFHz
      puppet:
       conf:
        puppetd:
         server: ""
         # certname %i: instanceid, %f: fqdn of the machine
         certname: "%i.%f"
        ca_cert: |
         -----BEGIN CERTIFICATE-----
         ....
  21. Monthly updates: released.current.txt
  22. You can upgrade the kernel. The only AMI I know of that can do this. grub-kernels-for-kernel-upgrades/
  23. Updated software for 10.04: backported builds for Apache, Memcache, MySQL, PHP, etc.
  24. I may be biased
  25. AWS
  26. <3 Elastic Load Balancer: They're free and will save you more than once
  27. <3 S3 (Simple Storage Service): Great cheap data retention. Good poor man's CDN
  28. Tip: Get ExpanDrive for great SSHFS and S3FS. Available for Windows and Mac:
  29. RDS > Own MySQL. Hot Standby: failover is ~7 minutes. Read Replicas: improve read performance. BUT, you can't replicate out of RDS :(
  30. Use EBS Root (Elastic Block Store): You can reboot and stop/start machines and keep state. Consider attaching extra EBS for data persistence. Tip: Software RAID across multiple EBS drives for better IO
  31. </3 Network Partitioning: This will happen to you a lot. Relying on network connections will decrease availability of your machines
  32. </3 Floating public IPs: AWS DHCP server is flaky. AWS DNS TTL is 60 seconds. Limited number of fixed public IPs
  33. Sort your DNS. AWS offers ... If you go multi data center or have big traffic, seriously consider Dyn:
  34. Avoid Single Points of Failure, because they WILL fail. Architect for eventually consistent, distributed systems where you can.
  35. Remember him..?
  36. Puppet
  37. Optimize for making Puppet development EASY. Bridge the gap between dev & ops. Tip: use a c1.medium at least
  38. Put your Puppet code in VCS. I really don't need to explain why, right?
  39. Run multiple Puppet environments: put 1 host of each cluster in Puppet environment development, 1 in staging, the rest in production. Don't break everything at once :)
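The rollout rule on slide 39 can be sketched in a few lines of Python. This is only an illustration, not the code Krux runs: the function name `assign_environments` and the choice to pick hosts in sorted order are assumptions made for the example.

```python
def assign_environments(hosts):
    """Map each host of one cluster to a Puppet environment.

    Assumption: hosts are taken in sorted order, so the first one goes to
    development, the second to staging, and all others to production.
    """
    assignment = {}
    for index, host in enumerate(sorted(hosts)):
        if index == 0:
            assignment[host] = "development"
        elif index == 1:
            assignment[host] = "staging"
        else:
            assignment[host] = "production"
    return assignment
```

Note that under this sketch a cluster with fewer than three hosts ends up with no production host, so small clusters would need a different rule.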
  40. Split your Puppet code into modules. We use: Forge, Components, Services
  41. Use separate init.pp, params.pp & config.pp. params.pp so you can include variables from elsewhere. config.pp lets you specify: kfoo::config { $fqdn: } in a service and require: Kfoo::Config[ $fqdn ] in the component
  42. Use a common base class. Set up all the plumbing, from users to apt, filesystems, mounts, ntp, sudo, git, monitoring, ssh, and so on. Run it early using run stages
  43. Sample Service
      class s_webui {
        include kbase
        include kapache
        include kwebui
        include kredis
        kwebui { $fqdn: }
        kapache::vhost { $fqdn: ssl => 443 }
        kredis::config { $fqdn: memory => 100M }
      }
  44. Write tools to make you more productive. Enable developers to run their own Puppet master. Create new components easily. Push changes to production. Our code: /
  45. Your own Puppet server & manifests
      puppet001:puppet-jib$ screen -S jib.puppetmaster bin/run_puppet_master_locally 8180
      Running: sudo puppet master --no-daemonize --verbose --debug --masterport 8180 --pidfile /mnt/tmp/ --confdir /data/git/puppet-jib/bin/..
      .....
      notice: Starting Puppet master version 2.6.3
      .....
  46. Our Layout
      $git/
        bin/
        env/
          development/
            forge/
            krux-modules/
            services/
          staging/
            ...
          production/
            ...
  47. Use an External Node Classifier. Manage your host-specific configuration separately from your manifests. Code: /blob/puppet/bin/
  48. Keep node configuration in an editable location. We chose S3. Git, LDAP, or anything else that works for you.
  49. Only sign nodes that have a configuration. Keyed off their certname, run periodically. Inspired by: puppet-in-uecec2-puppet-support-in-ubuntu-images/ Our code: /blob/puppet/bin/
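The signing rule on slide 49 boils down to a set intersection. The sketch below assumes the caller has already collected two lists of certnames: the waiting certificate requests on the Puppet master and the nodes that have a stored configuration (for example, keys in the S3 bucket). The function name `certs_to_sign` is hypothetical, and fetching either list is left out.

```python
def certs_to_sign(pending_certnames, configured_certnames):
    """Return the pending certnames that also have a node configuration.

    Requests without a matching configuration are left unsigned, so an
    unknown machine never gets a signed certificate automatically.
    """
    configured = set(configured_certnames)
    return sorted(name for name in set(pending_certnames) if name in configured)
```

Run periodically (from cron, say), the returned names are the only ones to hand to the signing step.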
  50. Master puppet.conf
      [master]
      .......
      node_terminus = exec
      external_nodes = /usr/bin/ --bucket instances
      reports = http, store, foreman

      ### different puppet environments: development, staging, production
      [development]
      templatedir = $confdir/env/development/templates
      modulepath = $confdir/env/development/krux-modules:$confdir/env/development/forge:$confdir/env/development/services
      [....]
  51. Sample Configuration
      {
        "classes": ["s_sandbox::jib"],
        "parameters": {
          "zone": "us-east-1c",
          "instance_type": "c1.medium",
          "instance_id": "i-23a3d042",
          "security_group": "krux-ops-dev",
          "puppet_environment": "development",
          "puppet_master_port": 8180,
          "kredis_save_to_disk": 0,
          "certname": "47334fd8-1516-451d-bd5a-8760ab2a36c0"
        }
      }
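An external node classifier prints a YAML document with `classes`, `parameters`, and optionally `environment` for the node it is asked about. A minimal sketch of that last step, assuming the node configuration has already been loaded into a dict shaped like the sample on slide 51: the function name `enc_yaml` is hypothetical, mapping the `puppet_environment` parameter to the ENC `environment` key is an assumption, and values are rendered with `json.dumps` on the assumption that the master's YAML parser accepts JSON-style scalars.

```python
import json


def enc_yaml(config):
    """Render a node config dict as the YAML an ENC writes to stdout."""
    lines = ["---"]
    classes = config.get("classes", [])
    if classes:
        lines.append("classes:")
        for name in classes:
            lines.append("  - " + json.dumps(name))
    else:
        lines.append("classes: []")
    parameters = config.get("parameters", {})
    if parameters:
        lines.append("parameters:")
        for key in sorted(parameters):
            lines.append("  %s: %s" % (key, json.dumps(parameters[key])))
    else:
        lines.append("parameters: {}")
    # Assumption: the node's environment is stored as a plain parameter.
    environment = parameters.get("puppet_environment")
    if environment:
        lines.append("environment: " + json.dumps(environment))
    return "\n".join(lines) + "\n"
```

A real classifier would look the configuration up by certname (in S3, in this deck's setup) and exit non-zero when no configuration exists.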
  52. Attend a Puppet Master Training! No, I don't get a kickback :)
  53. ... avoid becoming him
  54. Foreman
  55. Email Reports & Alerts: This feature alone is worth installing it for. Run it on the same host as your Puppet master for minimal friction. Summarized_E-Mail_Reports
  56. Dashboard / Browser
  57. Theoretically: Node Classifier. External_Nodes. We are happy with our S3-based solution. YMMV though: do look into it!
  58. Theoretically: Initiate Puppetrun. Puppetrun. Couldn't get it to work though :(
  59. Python Boto & s3cmd
  60. $ s3cmd put file.txt s3://my-bucket
      Great for cronjobs, maintenance tasks & file syncs. Consider s3://my-dropbox for your company
  61. boto: Full Python API access to AWS. Boto + AWS + Puppet = Real Infrastructure as Code
  62. Launch AWS nodes. Manage zone, security group, type, AMI, Puppet class, EBS, hostname. Bootstraps the node for Puppet, integrates with the external node classifier. Our code: /blob/aws/bin/
  63. $ -t m1.large -z us-east-1a -a 10 -H -s mycorp-development ami-2ec83147 s_development
      Starting instance of ami ami-2ec83147 - this may take a while......... started i-12345678
      Attaching 10gb volume to instance i-12345678 - this may take a while..... attached vol-87654321
      Created these DNS entries: => ec2-172-131-213-58.compute-1.amazonaws.com
      Wrote configuration to S3 key: s3://instances/
  64. Manage & Sync: Programmatically manage your security groups and keep groups in sync across regions. Our code: /blob/aws/bin/
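Keeping a security group in sync across regions comes down to comparing two rule sets and acting on the difference. The sketch below shows only that comparison. The `(protocol, from_port, to_port, cidr)` tuple shape and the name `plan_sync` are assumptions for illustration. Reading the rules from AWS and applying the plan (with boto, for instance) are not shown.

```python
def plan_sync(source_rules, target_rules):
    """Return (to_authorize, to_revoke) so the target group matches the source.

    Each rule is a (protocol, from_port, to_port, cidr) tuple.
    """
    source = set(source_rules)
    target = set(target_rules)
    to_authorize = sorted(source - target)  # in source, missing from target
    to_revoke = sorted(target - source)     # in target, absent from source
    return to_authorize, to_revoke
```

Computing the plan separately from applying it makes a dry run cheap: print both lists and review them before touching a production region.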
  65. Monitoring & Graphing
  66. Free developer account: 1 free node with all features, unlimited nodes with basic features. Free: HTTP(S), PING, SSH, DNS, TCP. Premium: HTTP JSON(!), custom plugins, MySQL, Apache mod_status, etc. Get a 2nd free node through referral:
  67. Performance Graphs
  68. Puppet classes & config information. Monitoring & Alerts
  69. Generate your cloudkick.conf from Puppet. Use Puppet classes, tags, colors as you define them as Cloudkick tags. Our code for doing so:
  70. Cloudkick Gem for parallel-ssh: Uses your Cloudkick tags to do node selection, which are based straight off your Puppet classes & facts
  71. Cloudkick pssh
      $ cloudkick pssh --query node:redis-c* hostname
      [1] 18:38:23 [SUCCESS]
      [2] 18:38:23 [SUCCESS]
      [3] 18:38:24 [SUCCESS]
      [4] 18:38:24 [SUCCESS]
  72. Krux improvements: pscp, listing nodes. Get it from our GitHub: Fork and contribute!
  73. Cloudkick list
      $ cloudkick list --full --query node:redis-c*
      # Name              IP  Type        Zone
      redis-c-master001       m2.4xlarge  us-east-1a
      redis-c-slave001        m2.4xlarge  us-east-1a
      redis-c-slave002        m2.4xlarge  us-east-1b
      redis-c-slave004        m2.4xlarge  us-east-1d
  74. Take away: Measure Everything! Further reading: PagerDuty for cell phone/pager/email alerts, New Relic for more in-depth app monitoring, MCollective for more advanced task parallelization
  75. Just one more thing....
  76. Vagrant
  77. VirtualBox + Ubuntu + Puppet = JFDI. Use the same Puppet infrastructure to provision dev machines locally. Put it on a USB stick, be up and running in 30 minutes. Our code for doing so:
  78. Thank You!
  79. Slides at: Follow us: @KruxEngineering We're Hiring: