One-Man Ops

One-Man Ops with Puppet & Friends.

If you're getting started with Amazon AWS, here are 7 tools that will help you be successful, a few tips to make your life easier, and some common pitfalls to avoid.

  1. One-Man Ops with Puppet & Friends. Jos Boumans, Operations @ Krux Digital
  2. RIPE NCC
  3. Can I have another /8 please? How you know us
  4. Ubuntu Server
  5. 10.04 LTS
  6. 10.10
  7. AWS Integration
  8. Krux
  9. Good guys of Data Privacy
  10. Not to be confused with...
  11. Our Traffic
      • Serving 4000-10000 user & contextual data requests/second
      • Sub-100 ms response times
      • Processing ~150 GB of raw data per day
      • Twitter: average ~3000 tweets/second
  12. Our Infrastructure
      • Started small on AWS. Now:
      • 100 dedicated nodes
      • +100-200 on-demand Map/Reduce nodes
      • Dozens of local development machines
      • 20 different types of machines
  13. One-Man Ops team
  14. Sad Panda
  15. Go from here...
  16. ... to here
  17. Your Toolkit
  18. Ubuntu 10.04
  19. cloud-init: Uses AMI user-data to bootstrap Puppet on the client.
      https://help.ubuntu.com/community/CloudInit
      http://www.youtube.com/watch?v=-zL3BdbKyGY
  20. #cloud-config
      ### Update puppet to 2.6.3
      apt_sources:
       - source: "ppa:mathiaz/puppet-backports"
      apt_update: true
      apt_upgrade: true
      ssh_authorized_keys:
       - ssh-rsa AAAAB3NzaC.....+ujFHz
      puppet:
       conf:
         puppetd:
           server: "puppet.example.com"
           # certname %i: instanceid, %f: fqdn of the machine
           certname: "%i.%f"
         ca_cert: |
           -----BEGIN CERTIFICATE-----
           ....
  21. Monthly updates: http://uec-images.ubuntu.com/query/lucid/server/released.current.txt
  22. You can upgrade the kernel. The only AMI I know of that can do this:
      http://cloud.ubuntu.com/2011/02/migrating-to-pv-grub-kernels-for-kernel-upgrades/
  23. Updated software for 10.04: backported builds for Apache, Memcache, MySQL, PHP, etc.
      https://launchpad.net/~ubuntu-server-edgers
  24. I may be biased
  25. AWS
  26. <3 Elastic Load Balancer: They're free and will save you more than once.
      http://aws.amazon.com/elasticloadbalancing/
  27. <3 S3 (Simple Storage Service): Great, cheap data retention. Good poor man's CDN.
      http://aws.amazon.com/s3
  28. Tip: Get ExpanDrive for great SSHFS and S3FS. Available for Windows and Mac:
      http://www.expandrive.com/
  29. RDS > Own MySQL
      Hot standby: failover takes ~7 minutes.
      Read replicas: improve read performance.
      BUT, you can't replicate out of RDS :(
      http://aws.amazon.com/rds/
  30. Use EBS Root (Elastic Block Store): You can reboot and stop/start machines and keep state.
      Consider attaching extra EBS volumes for data persistence.
      Tip: use software RAID across multiple EBS volumes for better IO.
  31. </3 Network Partitioning: This will happen to you a lot. Relying on network connections will decrease the availability of your machines.
  32. </3 Floating public IPs: The AWS DHCP server is flaky, the AWS DNS TTL is 60 seconds, and there is a limited number of fixed public IPs.
  33. Sort your DNS: AWS offers Route 53 (http://aws.amazon.com/route53/). When you go multi-data-center or have big traffic, seriously consider Dyn: http://dyn.com/dns/
  34. Avoid Single Points of Failure, because they WILL fail. Architect for eventually consistent, distributed systems where you can.
  35. Remember him..?
  36. Puppet
  37. Optimize for making Puppet development EASY. Bridge the gap between dev & ops.
      Tip: use a c1.medium at least.
  38. Put your Puppet code in VCS. I really don't need to explain why, right?
  39. Run multiple Puppet environments: http://docs.puppetlabs.com/guides/environment.html
      We put 1 host of each cluster in the development environment, 1 in staging, and the rest in production. Don't break everything at once :)
  40. Split your Puppet code into modules. We use: Forge, Components, Services.
      http://docs.puppetlabs.com/guides/modules.html
  41. Use separate init.pp, params.pp & config.pp
      params.pp, so you can include variables from elsewhere.
      config.pp lets you specify kfoo::config { $fqdn: } in a service and require => Kfoo::Config[$fqdn] in the component.
      http://docs.puppetlabs.com/guides/modules.html
  42. Use a common base class: Set up all the plumbing, from users, to apt, to filesystems, to mounts, ntp, sudo, git, monitoring, ssh, and so on. Run it early using run stages.
  43. Sample Service
      class s_webui {
        include kbase
        include kapache
        include kwebui
        include kredis
        kwebui { $fqdn: }
        kapache::vhost { $fqdn: ssl => 443 }
        kredis::config { $fqdn: memory => 100M }
      }
  44. Write tools to make you more productive: Enable developers to run their own Puppet master. Create new components easily. Push changes to production.
      Our code: https://github.com/krux/ops-tools/
  45. Your own Puppet server & manifests
      puppet001:puppet-jib$ screen -S jib.puppetmaster bin/run_puppet_master_locally 8180
      Running: sudo puppet master --no-daemonize --verbose --debug --masterport 8180 --pidfile /mnt/tmp/puppetmaster.8180.pid --confdir /data/git/puppet-jib/bin/.......
      notice: Starting Puppet master version 2.6.3
      .....
  46. Our Layout
      $git/
        bin/
          update_env.pl
          run_puppet_master_locally.pl
          new_component.pl
        env/
          development/
            forge/
            krux-modules/
            services/
          staging/
            ...
          production/
            ...
  47. Use an External Node Classifier: Manage your host-specific configuration separately from your manifests.
      http://docs.puppetlabs.com/guides/external_nodes.html
      Our code: https://github.com/krux/ops-tools/blob/puppet/bin/node_classifier.py
  48. Keep node configuration in an editable location. We chose S3; Git, LDAP, or anything else that works for you will do.
  49. Sign only nodes that have a configuration: keyed off their certname, run periodically.
      Inspired by: http://ubuntumathiaz.wordpress.com/2010/03/24/using-puppet-in-uecec2-puppet-support-in-ubuntu-images/
      Our code: https://github.com/krux/ops-tools/blob/puppet/bin/check_csr.py
  50. Master puppet.conf
      [master]
      .......
      node_terminus = exec
      external_nodes = /usr/bin/node_classifier.py --bucket instances
      reports = http, store, foreman
      ### different puppet environments: development, staging, production
      [development]
      templatedir = $confdir/env/development/templates
      modulepath = $confdir/env/development/krux-modules:$confdir/env/development/forge:$confdir/env/development/services
      [....]
  51. Sample Configuration
      { classes: [s_sandbox::jib],
        parameters: {
          zone: us-east-1c,
          instance_type: c1.medium,
          instance_id: i-23a3d042,
          security_group: krux-ops-dev,
          puppet_environment: development,
          puppet_master_port: 8180,
          kredis_save_to_disk: 0,
          certname: ops-dev003.example.com.47334fd8-1516-451d-bd5a-8760ab2a36c0
        }
      }
  52. Attend a Puppet Master Training! No, I don't get a kickback :)
      http://puppetlabs.com/services/training-workshops/
  53. ... avoid becoming him
  54. Foreman
  55. Email Reports & Alerts: This feature alone is worth installing it for. Run it on the same host as your Puppet master for minimal friction.
      http://theforeman.org/projects/foreman/wiki/Summarized_E-Mail_Reports
  56. Dashboard / Browser
  57. Theoretically: Node Classifier. http://theforeman.org/projects/foreman/wiki/External_Nodes
      We are happy with our S3-based solution. YMMV though: do look into it!
  58. Theoretically: Initiate Puppet run. http://theforeman.org/projects/foreman/wiki/Puppetrun
      Couldn't get it to work, though :(
  59. Python Boto & s3cmd
  60. $ s3cmd put file.txt s3://my-bucket
      Great for cronjobs, maintenance tasks & file syncs. Consider s3://my-dropbox for your company.
      http://s3tools.org/s3cmd
  61. boto: Full Python API access to AWS. Boto + AWS + Puppet = Real Infrastructure as Code.
      http://code.google.com/p/boto/
  62. start_instance.py: Launch AWS nodes. Manage zone, security group, type, AMI, Puppet class, EBS, hostname. Bootstraps the node for Puppet and integrates with the external node classifier.
      Our code: https://github.com/krux/ops-tools/blob/aws/bin/start_instance.py
  63. $ start_instance.py -t m1.large -z us-east-1a -a 10 -H dev001.example.com -s mycorp-development ami-2ec83147 s_development
      Starting instance of ami ami-2ec83147 - this may take a while......... started i-12345678
      Attaching 10gb volume to instance i-12345678 - this may take a while..... attached vol-87654321
      Created these DNS entries:
      dev001.example.com => ec2-172-131-213-58.compute-1.amazonaws.com
      Wrote configuration to S3 key: s3://instances/dev001.example.com.47334fd8-1516-451d-bd5a-8760ab2a36c0
  64. security_groups.py: Manage & Sync. Programmatically manage your security groups and keep groups in sync across regions.
      Our code: https://github.com/krux/ops-tools/blob/aws/bin/security_groups.py
  65. Monitoring & Graphing
  66. Free developer account: 1 free node with all features, unlimited nodes with basic features.
      Free: HTTP(S), PING, SSH, DNS, TCP
      Premium: HTTP JSON(!), custom plugins, MySQL, Apache mod_status, etc.
      Get a 2nd free node through referral: https://cloudkick.com/referral/633f0729
  67. Performance Graphs
  68. Puppet classes & config information; Monitoring & Alerts
  69. Generate your cloudkick.conf from Puppet: Use Puppet classes, tags, and colors, as you define them, as Cloudkick tags.
      Our code for doing so: https://gist.github.com/1230044
  70. Cloudkick Gem for parallel-ssh: Uses your Cloudkick tags, which are based straight off your Puppet classes & facts, to do node selection.
      https://github.com/cloudkick/cloudkick-gem
  71. Cloudkick pssh
      $ cloudkick pssh --query node:redis-c* hostname
      [1] 18:38:23 [SUCCESS] 64.206.11.221
      redis-c-slave001.example.com
      [2] 18:38:23 [SUCCESS] 52.13.118.158
      redis-c-master001.example.com
      [3] 18:38:24 [SUCCESS] 52.16.34.217
      redis-c-slave004.example.com
      [4] 18:38:24 [SUCCESS] 183.71.131.32
      redis-c-slave002.example.com
  72. Krux Improvements: pscp, listing nodes. Get it from our GitHub: https://github.com/krux/cloudkick-gem
      Fork and contribute!
  73. Cloudkick list
      $ cloudkick list --full --query node:redis-c*
      # Name              IP             Type        Zone
      redis-c-master001   52.13.118.158  m2.4xlarge  us-east-1a
      redis-c-slave001    64.206.11.221  m2.4xlarge  us-east-1a
      redis-c-slave002    183.71.131.32  m2.4xlarge  us-east-1b
      redis-c-slave004    52.16.34.217   m2.4xlarge  us-east-1d
  74. Takeaway: Measure Everything!
      Further reading:
      PagerDuty for cell phone/pager/email alerts
      New Relic for more in-depth app monitoring
      MCollective for more advanced task parallelization
  75. Just one more thing....
  76. Vagrant
  77. VirtualBox + Ubuntu + Puppet = JFDI: Use the same Puppet infrastructure to provision dev machines locally. Put it on a USB stick and be up and running in 30 minutes.
      Our code for doing so: https://gist.github.com/1230221
  78. Thank You!
  79. Slides at: slideshare.net/jiboumans
      Follow us: @KruxEngineering
      We're Hiring: kruxdigital.com
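Slide 20's cloud-config sets `certname: "%i.%f"`, and its comment says `%i` stands for the instance ID and `%f` for the machine's FQDN. To show roughly which name a freshly booted node will present to the Puppet master, here is a minimal Python sketch of that substitution. `expand_certname` is a hypothetical helper, not cloud-init's own code; the lowercasing reflects Puppet's requirement that certnames be lowercase.

```python
def expand_certname(template: str, instance_id: str, fqdn: str) -> str:
    """Expand a cloud-init style certname template such as "%i.%f".

    Hypothetical helper: mirrors the substitutions described in the
    cloud-config comment (%i = instance id, %f = FQDN); it is not
    cloud-init's implementation.
    """
    expanded = template.replace("%i", instance_id).replace("%f", fqdn)
    # Puppet requires lowercase certnames, so normalize the result.
    return expanded.lower()


if __name__ == "__main__":
    print(expand_certname("%i.%f", "i-23A3D042", "ip-10-1-2-3.ec2.internal"))
```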
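Slide 30 suggests software RAID across multiple EBS volumes for better IO. A minimal sketch, assuming the volumes are already attached and show up as `/dev/sdf` through `/dev/sdi` (device names depend on the kernel and on how the volumes were attached, so treat them as placeholders), that builds the `mdadm --create` command for a striped RAID 0 array. `mdadm_create_command` is a hypothetical helper and only prints the command rather than running it.

```python
import shlex
from typing import List, Sequence


def mdadm_create_command(devices: Sequence[str], md_device: str = "/dev/md0",
                         level: int = 0) -> List[str]:
    """Build an mdadm command that assembles EBS volumes into one array.

    Device names are placeholders: they depend on how volumes were attached.
    """
    if len(devices) < 2:
        raise ValueError("an array needs at least two volumes")
    return ["mdadm", "--create", md_device,
            "--level=%d" % level,
            "--raid-devices=%d" % len(devices),
            *devices]


if __name__ == "__main__":
    cmd = mdadm_create_command(["/dev/sdf", "/dev/sdg", "/dev/sdh", "/dev/sdi"])
    # Print a shell-safe command line instead of executing it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

RAID 0 trades redundancy for throughput: losing any one volume loses the whole array, so keep the data you care about backed up elsewhere (S3, for example).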
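Slide 39 describes the rollout policy: one host of each cluster in the development environment, one in staging, and the rest in production. One way to compute that assignment is sketched below, assuming (hypothetically) that a host's cluster is its short name with the trailing digits removed, so `redis-c-slave004.example.com` belongs to cluster `redis-c-slave`. The function names are illustrative only.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

ROLLOUT = ["development", "staging"]  # hosts beyond these go to production


def cluster_of(host: str) -> str:
    """Hypothetical naming convention: strip trailing digits from the short name."""
    return re.sub(r"\d+$", "", host.split(".")[0])


def assign_environments(hosts: Iterable[str]) -> Dict[str, str]:
    """Give each cluster one development host, one staging host, rest production."""
    clusters = defaultdict(list)
    for host in hosts:
        clusters[cluster_of(host)].append(host)

    assignment = {}
    for members in clusters.values():
        # Sorting makes the choice deterministic between runs.
        for index, host in enumerate(sorted(members)):
            assignment[host] = ROLLOUT[index] if index < len(ROLLOUT) else "production"
    return assignment


if __name__ == "__main__":
    hosts = ["web001", "web002", "web003", "web004", "redis-c-slave001", "redis-c-slave002"]
    for host, env in sorted(assign_environments(hosts).items()):
        print(host, env)
```

Because members are sorted, the same hosts stay in development and staging from run to run. A cluster with only one or two hosts never gets a production member under this rule, which you may want to special-case.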
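Slide 44 mentions tooling to create new components easily, and the layout on slide 46 lists a `new_component.pl` script whose contents are not shown. The following is a hypothetical Python stand-in, not the Krux tool: it scaffolds a module with the init.pp / params.pp / config.pp split from slide 41, using `kfoo` as the example name. The skeleton file contents are assumptions.

```python
import os
import tempfile

# Hypothetical skeletons following the init.pp / params.pp / config.pp split.
TEMPLATES = {
    "init.pp": (
        "class {name} {{\n"
        "  include {name}::params\n"
        "}}\n"
    ),
    "params.pp": (
        "class {name}::params {{\n"
        "  # Variables other classes can read as ${name}::params::<variable>\n"
        "}}\n"
    ),
    "config.pp": (
        "# Declare per host from a service: {name}::config {{ $fqdn: }}\n"
        "define {name}::config {{\n"
        "}}\n"
    ),
}


def new_component(modules_dir: str, name: str) -> list:
    """Create <modules_dir>/<name>/manifests/ with the three skeleton files."""
    manifests = os.path.join(modules_dir, name, "manifests")
    os.makedirs(manifests)  # raises FileExistsError if the component exists
    written = []
    for filename, template in sorted(TEMPLATES.items()):
        path = os.path.join(manifests, filename)
        with open(path, "w") as handle:
            handle.write(template.format(name=name))
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in new_component(tmp, "kfoo"):
            print(os.path.relpath(path, tmp))
```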
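Slides 47 to 51 describe an external node classifier: with `node_terminus = exec`, Puppet runs the `external_nodes` command with the node's certname appended, and the script prints YAML with `classes` and `parameters`, as in the sample configuration on slide 51. The Krux version reads from the `instances` S3 bucket; to stay within the Python standard library, this minimal sketch swaps in a local directory of `<certname>.json` files (a hypothetical layout). It writes the YAML by hand, relying on JSON-encoded scalars being valid YAML scalars.

```python
import json
import os
import sys
from typing import Any, Dict


def _scalar(value: Any) -> str:
    # json.dumps yields valid YAML scalars: double-quoted strings,
    # plain numbers, true/false and null.
    return json.dumps(value)


def to_yaml(node: Dict[str, Any]) -> str:
    """Render {"classes": [...], "parameters": {...}} as block-style YAML."""
    lines = ["---"]
    classes = node.get("classes", [])
    if classes:
        lines.append("classes:")
        lines.extend("  - " + _scalar(cls) for cls in classes)
    else:
        lines.append("classes: []")
    parameters = node.get("parameters", {})
    if parameters:
        lines.append("parameters:")
        for key in sorted(parameters):
            lines.append("  %s: %s" % (key, _scalar(parameters[key])))
    else:
        lines.append("parameters: {}")
    return "\n".join(lines) + "\n"


def classify(config_dir: str, certname: str) -> str:
    """Look up <config_dir>/<certname>.json and return it as ENC YAML."""
    if os.sep in certname:
        raise ValueError("refusing suspicious certname %r" % certname)
    with open(os.path.join(config_dir, certname + ".json")) as handle:
        return to_yaml(json.load(handle))


def main(argv) -> int:
    # Puppet appends the node's certname as the last argument.
    if len(argv) != 3:
        sys.stderr.write("usage: %s CONFIG_DIR CERTNAME\n" % argv[0])
        return 2
    try:
        sys.stdout.write(classify(argv[1], argv[2]))
    except (OSError, ValueError) as err:
        # No usable configuration for this node: report failure to Puppet.
        sys.stderr.write("%s\n" % err)
        return 1
    return 0


# Run only when called with arguments, as Puppet does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Configured as, for example, `external_nodes = /usr/local/bin/enc_sketch.py /etc/puppet/nodes` (both paths hypothetical), Puppet supplies the certname as the final argument. A missing or unreadable file makes the script exit non-zero, so the master treats the lookup as failed instead of classifying an unknown node.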
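Slide 49's rule is to sign only nodes that have a configuration, keyed off their certname and run periodically. A minimal sketch of that policy, under these assumptions: the pending certnames come from your CA (the sketch does not parse `puppet cert` output), the configured names come from wherever node configuration lives (S3 keys in the original), and signing uses the Puppet 2.6-era `puppet cert --sign` form. It defaults to a dry run, and all function names are hypothetical.

```python
import subprocess
from typing import Iterable, List, Set


def certs_to_sign(pending: Iterable[str], configured: Set[str]) -> List[str]:
    """Keep only pending certnames that have a stored node configuration."""
    return sorted(name for name in pending if name in configured)


def sign_commands(certnames: Iterable[str]) -> List[List[str]]:
    # Puppet 2.6-era syntax; later releases changed this, so check your version.
    return [["puppet", "cert", "--sign", name] for name in certnames]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for command in commands:
        if dry_run:
            print("would run:", " ".join(command))
        else:
            subprocess.check_call(command)


if __name__ == "__main__":
    pending = ["web001.example.com.1111", "rogue.example.com.2222"]
    configured = {"web001.example.com.1111", "db001.example.com.3333"}
    run(sign_commands(certs_to_sign(pending, configured)))
```

Anything without a configuration, such as the hypothetical `rogue.example.com.2222` above, simply stays pending.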
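Slides 62 and 63 show `start_instance.py` launching a node, attaching EBS, creating DNS entries, and writing the node's configuration to an S3 key named `<hostname>.<uuid>`. The sketch below covers only part of that flow and is not the Krux implementation: it builds a node configuration shaped like the sample on slide 51, renders cloud-config user-data modeled on slide 20, and launches the instance with boto 2's `run_instances`. EBS, DNS, and the S3 upload are left out, and every function name is hypothetical.

```python
import uuid
from typing import Any, Dict, Tuple

CLOUD_CONFIG = """#cloud-config
puppet:
 conf:
   puppetd:
     server: "{server}"
     certname: "{certname}"
"""


def build_node_config(hostname: str, puppet_class: str, zone: str,
                      instance_type: str, security_group: str,
                      environment: str = "production") -> Tuple[str, Dict[str, Any]]:
    """Return (certname, node configuration) for a node about to be launched.

    The certname follows the "<hostname>.<uuid>" S3 key format from slide 63.
    """
    certname = "%s.%s" % (hostname, uuid.uuid4())
    config = {
        "classes": [puppet_class],
        "parameters": {
            "zone": zone,
            "instance_type": instance_type,
            "security_group": security_group,
            "puppet_environment": environment,
            "certname": certname,
        },
    }
    return certname, config


def render_user_data(certname: str, puppet_server: str) -> str:
    """cloud-config user-data pointing the new node at the Puppet master."""
    return CLOUD_CONFIG.format(server=puppet_server, certname=certname)


def launch(ami: str, config: Dict[str, Any], user_data: str, region: str = "us-east-1"):
    """Start one instance with boto 2.x (needs AWS credentials; not run here)."""
    import boto.ec2  # imported lazily so the helpers above work without boto

    params = config["parameters"]
    conn = boto.ec2.connect_to_region(region)
    reservation = conn.run_instances(
        ami,
        instance_type=params["instance_type"],
        placement=params["zone"],
        security_groups=[params["security_group"]],
        user_data=user_data,
    )
    return reservation.instances[0]
```

The helpers run without boto installed; `launch` imports boto only when called and starts a real, billable instance, so call it deliberately.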
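Slide 64 mentions keeping security groups in sync across regions. At its core that is a set difference between the rules you want and the rules each region currently has. A minimal sketch, assuming a simplified, hypothetical rule model of (protocol, from_port, to_port, CIDR) tuples; rules that reference other security groups are not modeled, and the EC2 calls that actually authorize or revoke rules are left out.

```python
from typing import Dict, Set, Tuple

# (protocol, from_port, to_port, source CIDR): a simplified, hypothetical model
Rule = Tuple[str, int, int, str]


def plan_sync(desired: Set[Rule],
              existing_by_region: Dict[str, Set[Rule]]) -> Dict[str, Dict[str, Set[Rule]]]:
    """Compute what to authorize and revoke so every region matches `desired`."""
    plan = {}
    for region, existing in existing_by_region.items():
        plan[region] = {
            "authorize": desired - existing,  # wanted but missing in this region
            "revoke": existing - desired,     # present but no longer wanted
        }
    return plan


if __name__ == "__main__":
    desired = {("tcp", 22, 22, "10.0.0.0/8"), ("tcp", 80, 80, "0.0.0.0/0")}
    existing = {
        "us-east-1": {("tcp", 22, 22, "10.0.0.0/8")},
        "eu-west-1": {("tcp", 22, 22, "0.0.0.0/0")},
    }
    for region, actions in sorted(plan_sync(desired, existing).items()):
        print(region, "authorize:", sorted(actions["authorize"]),
              "revoke:", sorted(actions["revoke"]))
```

Computing the plan separately from applying it makes it easy to review the diff before touching any region.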
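Slides 70 to 73 select nodes with queries such as `node:redis-c*`. The gem's full query language is not covered in these slides, so this sketch only models the `node:<glob>` form using shell-style matching from Python's `fnmatch`. It illustrates the idea and is not the gem's implementation; `select_nodes` is a hypothetical name. The node names and IPs come from the `cloudkick list` output on slide 73, plus one hypothetical non-matching host.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def select_nodes(query: str, nodes: Dict[str, str]) -> List[str]:
    """Return node names matching a "node:<glob>" query, sorted by name.

    Only the node:<glob> form is modeled; the real gem may support more.
    """
    field, _, pattern = query.partition(":")
    if field != "node" or not pattern:
        raise ValueError("expected a query like node:redis-c*")
    return sorted(name for name in nodes if fnmatchcase(name, pattern))


if __name__ == "__main__":
    # From slide 73's listing, plus a hypothetical non-matching web001.
    nodes = {
        "redis-c-master001": "52.13.118.158",
        "redis-c-slave001": "64.206.11.221",
        "redis-c-slave002": "183.71.131.32",
        "redis-c-slave004": "52.16.34.217",
        "web001": "10.0.0.1",
    }
    for name in select_nodes("node:redis-c*", nodes):
        print(name, nodes[name])
```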
