Clearly, I Have Made Some Bad Decisions

While organizing this conference, we have heard "but I don't have scalability issues" far too often.

This talk discusses what scalability issues actually are, and details why we all inevitably have them. Ignoring them, or delaying solutions until they are unavoidable, leads to many bad "temporary" decisions that cannot be fixed further down the line.

I will discuss the methodologies and best practices required in order to be scalable, describe the common mistakes they prevent, and explain why they should be implemented immediately. Finally, I will briefly touch on how to rectify the bad decisions that we all inevitably make, no matter how forward-thinking we are.

  1. “I don’t have scaling problems”
  2. Scaling is about change, not about quantity
  3. Problems don’t occur when things are normal
  4. If things change, you will have scaling problems
  5-7. Work takes time to do. Email needs to be read. Code runs on a server.
  8-9. Mistake! “I don’t have scaling problems”. Not a mistake we’re making if we’re here?
  10-11. Mistakes will be made. Problems will happen. But there are things we can do to be prepared.
  12. #1 Measure Everything
  13-14. How do you know if something is wrong? Or not wrong?
  15-16. # uptime
      17:27:18 up 405 days, 2:36, 1 user, load average: 26.93, 10.46, 6.16
      !?!?
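The numbers in that `uptime` line matter mostly in relation to each other. A 1-minute load average more than four times the 15-minute one says that something changed recently. The following is a minimal sketch of turning that into an automated check. It assumes the usual Linux/macOS `uptime` output format and an arbitrary spike ratio of 2.0; neither comes from the talk.

```python
import re

# Matches the three numbers after "load average:" (Linux) or
# "load averages:" (macOS) in `uptime` output; the format is an assumption.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_average(uptime_line: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from an uptime line."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError(f"no load average found in: {uptime_line!r}")
    one, five, fifteen = (float(g) for g in match.groups())
    return one, five, fifteen


def load_is_spiking(loads: tuple[float, float, float], ratio: float = 2.0) -> bool:
    """Flag a sudden change: 1-minute load at least `ratio` x the 15-minute load."""
    one, _five, fifteen = loads
    return fifteen > 0 and one / fifteen >= ratio


if __name__ == "__main__":
    line = "17:27:18 up 405 days, 2:36, 1 user, load average: 26.93, 10.46, 6.16"
    loads = parse_load_average(line)
    print(loads, load_is_spiking(loads))
```

On the machine itself, `os.getloadavg()` in Python's standard library returns the same three numbers on Unix systems, without any text parsing.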
  17-18. Read your log files. (Exceptions aren’t always exceptional.)
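One way to act on "exceptions aren’t always exceptional" is to count them. An error that appears thousands of times a day has quietly become part of normal operation, and nobody is looking at it. The following is a minimal sketch that tallies exception class names in log lines. The log format it matches and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical log convention: the exception class name appears as
# "SomethingError:" or "SomethingException:" somewhere on the line.
EXC_RE = re.compile(r"\b([A-Z]\w*(?:Error|Exception))\b:")


def count_exceptions(lines: Iterable[str]) -> Counter:
    """Tally exception class names so routine 'exceptions' stand out."""
    counts: Counter = Counter()
    for line in lines:
        match = EXC_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [  # made-up lines, not real logs
        "ERROR payments KeyError: 'currency'",
        "ERROR payments KeyError: 'currency'",
        "WARN  specials TimeoutError: upstream took 30s",
        "INFO  index rendered in 12ms",
    ]
    for name, n in count_exceptions(sample).most_common():
        print(f"{n:5d} {name}")
```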
  19-20. Measure in production (hat tip: Coda, “Metrics, Metrics Everywhere”). That’s the only place where things are really happening. But don’t let your metrics cause performance problems.
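One common way to keep production metrics cheap is to send them fire-and-forget over UDP and to sample high-volume counters. The following is a minimal sketch using the StatsD line format (`name:value|c|@rate`). It assumes a StatsD-compatible collector listening on 127.0.0.1:8125. `SampledCounter` and the metric names are hypothetical.

```python
import random
import socket
from typing import Callable


def format_counter(name: str, value: int = 1, sample_rate: float = 1.0) -> str:
    """Build a StatsD counter line, e.g. 'web.requests:1|c|@0.1'."""
    line = f"{name}:{value}|c"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


class SampledCounter:
    """Hypothetical fire-and-forget UDP counter: sampling plus non-blocking
    sends keep the cost of measuring below the cost of the work measured."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125,
                 sample_rate: float = 0.1,
                 rng: Callable[[], float] = random.random) -> None:
        self.addr = (host, port)
        self.sample_rate = sample_rate
        self.rng = rng
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.setblocking(False)

    def incr(self, name: str, value: int = 1) -> bool:
        """Maybe send one counter increment; return True if it was sampled."""
        if self.rng() >= self.sample_rate:
            return False  # skipped: the collector scales counts by 1/sample_rate
        try:
            payload = format_counter(name, value, self.sample_rate).encode()
            self.sock.sendto(payload, self.addr)
        except OSError:
            pass  # a broken metrics pipe must never break the request
        return True

    def close(self) -> None:
        self.sock.close()
```

The design choice here is that a metrics failure is swallowed rather than raised. A missing data point is cheaper than a failed request.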
  21. PING web (192.168.19.1): 56 data bytes
      Request timeout for icmp_seq 0
      Request timeout for icmp_seq 1
      Request timeout for icmp_seq 2
      Request timeout for icmp_seq 3
      Sometimes you can just tell things are wrong.
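Ping only shows that a host answers ICMP echo requests. A scripted check can go one step further and confirm that the service port actually accepts connections. The following is a minimal sketch using only Python's standard library. In use it would look like `port_is_open("web", 80)`, where the host name `web` and port 80 are assumptions for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unknown hosts and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

A host can answer ping while its web server is dead, and this check would catch that case.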
  22. #2 Infrastructure as code (and config management)
  23. Don’t do this.
  24-28. Chef or Puppet (or cfengine or bcfg2): server config is code. Revision control. Feature branches. Commenting and authorship. Centralized (not in someone’s head).
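Chef and Puppet both work by declaring the state a server should be in and changing things only when reality differs. That is what makes a run safe to repeat. The following is a minimal Python sketch of that idea for a single file. `ensure_file` is a hypothetical helper, not the API of either tool.

```python
import os
import tempfile
from pathlib import Path


def ensure_file(path: str, content: str, mode: int = 0o644) -> bool:
    """Converge `path` to exactly `content` and permission bits `mode`.

    Returns True if anything had to change and False if the file was already
    correct, so running it repeatedly is safe (idempotent).
    """
    target = Path(path)
    desired = content.encode()
    changed = False
    if not target.exists() or target.read_bytes() != desired:
        # Write a temp file beside the target and atomically rename it,
        # so nothing ever reads a half-written config file.
        fd, tmp = tempfile.mkstemp(dir=str(target.parent))
        with os.fdopen(fd, "wb") as handle:
            handle.write(desired)
        os.replace(tmp, target)
        changed = True
    if (target.stat().st_mode & 0o777) != mode:
        target.chmod(mode)
        changed = True
    return changed
```

In the common case the function reads the file, finds nothing to do, and returns False. That makes it cheap to run on every converge.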
  29-30. Should I choose Chef or Puppet? Yes. (Seriously, this is non-negotiable.)
  31. How do I switch my servers to start using config management? My advice: build new ones, throw the old ones away.
  32-36. Clean, known state. Build test clusters. Destroy. Build live machines. Use. Destroy.
  37. One-button servers. What about your code?
  38. #3a Real deployment
  39. $ svn up
      U    www/index.php
      U    www/payments.php
      U    www/settings-live.php
      U    www/settings-dev.php
      A    www/specials.php
      U    .
      Updated to revision 9703.
      Don’t do this.
  40-46. Deployment is more than just putting code in place. It means reproducible, idempotent rollouts, tied to a known build number, with separately-versioned known configuration, triggered non-manually across any number of servers, with full dependency management and automated regression testing.
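A common way to get rollouts that are reproducible, idempotent and tied to a build number is the layout Capistrano popularised. Each build goes into its own `releases/<build>` directory, and a `current` symlink is switched to it atomically. The following is a minimal single-server sketch of that idea. `deploy`, `rollback` and the `files` dict (standing in for a real build artifact) are all hypothetical.

```python
import os
import shutil
from pathlib import Path


def _point_current_at(base: Path, build: str) -> None:
    """Atomically repoint base/current at releases/<build>."""
    tmp_link = base / ".current.tmp"
    tmp_link.unlink(missing_ok=True)
    # A relative target keeps the whole tree relocatable.
    tmp_link.symlink_to(Path("releases") / build)
    # rename(2) replaces the old link in one step on POSIX systems,
    # so requests never see a missing or half-updated `current`.
    os.replace(tmp_link, base / "current")


def deploy(root: str, build: str, files: dict[str, str]) -> Path:
    """Install `build` under root/releases/<build> and make it current.

    `files` maps relative paths to contents and stands in for a real,
    versioned build artifact. An already-installed build number is never
    rewritten: the same build number always means the same code.
    """
    base = Path(root)
    release = base / "releases" / build
    if not release.exists():
        staging = base / "releases" / f".{build}.tmp"
        shutil.rmtree(staging, ignore_errors=True)
        staging.mkdir(parents=True)
        for rel_path, text in files.items():
            dest = staging / rel_path
            dest.parent.mkdir(parents=True, exist_ok=True)
            dest.write_text(text)
        staging.rename(release)  # the release appears whole or not at all
    _point_current_at(base, build)
    return release


def rollback(root: str, build: str) -> Path:
    """Roll back by repointing `current` at an earlier, untouched release."""
    release = Path(root) / "releases" / build
    if not release.is_dir():
        raise FileNotFoundError(f"build {build} was never deployed")
    _point_current_at(Path(root), build)
    return release
```

Because old releases stay on disk untouched, a rollback is just another atomic symlink switch rather than a redeploy.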
  47. Etsy’s Deployinator, Fabric, OS packages, Vlad the Deployer, Capistrano, or roll your own
  48. #3b Continuous deployment
  49. Holy Grail: trunk = live. Tests block commits. Feature flags? Dark launches?
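When trunk is live, unfinished work has to ship dark behind a flag. The following is a minimal sketch of a percentage-rollout feature flag. The flag name and the hash-bucketing scheme are assumptions for illustration, not a description of any particular product.

```python
import hashlib


def flag_enabled(flag: str, user_id: str, rollout_percent: float) -> bool:
    """Deterministically bucket a user into 0..99 for this flag.

    Hashing flag and user together means a given user always gets the same
    answer for a given flag, while different flags select different users.
    A rollout_percent of 0 is a fully dark launch.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rollout_percent
```

Ramping from 0 to 100 exposes a feature gradually. Users who already have it keep it as the percentage grows, because their bucket never changes.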
  50. Cowboy vs. Perfectionist
  51. Fast iteration = fast test results. Ten new tiny features tested: two accepted. One huge feature tested ... and rejected.
  52. Failure is comfortable. Consequences are immediately visible. Blame out, responsibility in.
  53. Okay, fine: Continuous Integration
  54. After all that, things still go wrong
  55. #4 Plan for failure
  56. Take backups. Test backups.
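Testing a backup means actually restoring it somewhere and checking what comes back. The following is a minimal sketch that uses SQLite only because Python ships with it. The talk's example stack uses MySQL, where the same idea means loading a dump into a scratch server. `backup_and_verify` is a hypothetical helper.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection, backup_path: str) -> bool:
    """Copy `source` to `backup_path`, then prove the copy restores.

    Reopen the backup file, run SQLite's integrity check, and compare
    row counts table by table against the source.
    """
    dest = sqlite3.connect(backup_path)
    try:
        source.backup(dest)  # online copy via SQLite's backup API
    finally:
        dest.close()

    restored = sqlite3.connect(backup_path)
    try:
        if restored.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            return False
        tables = [row[0] for row in source.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            query = f'SELECT COUNT(*) FROM "{table}"'
            if source.execute(query).fetchone() != restored.execute(query).fetchone():
                return False
        return True
    finally:
        restored.close()
```

Row counts are a deliberately crude check. The point is that the restore runs every time, so a broken backup is discovered before it is needed.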
  57. Automate servers. Test server crashes.
  58. Netflix’s Chaos Monkey, and its cousins: the Simian Army
  59. Server failures predicted and foiled. What about code? New features?
  60. #5 Future Compatibility
  61. ALTER TABLE `user` ADD COLUMN `twootr` VARCHAR(16);
      CREATE INDEX `twootr_idx` ON `user` (`twootr`);
      Don’t do this. (on live)
  62. “Future compatible” schemas: normalized tables are performance-heavy. “Future compatible” code: don’t assume any columns?
  63. Shiny new vs. yucky old. Migrate? Write. Read.
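One reading of this slide is the dual-write migration. New code writes to both the shiny new store and the yucky old one, reads from the new one first, and copies records across as they are touched. That avoids a big-bang switchover that could go wrong. The following is a minimal sketch assuming both stores behave like Python mappings. `MigratingStore` is hypothetical, not an API from the talk.

```python
from typing import Any, MutableMapping, Optional


class MigratingStore:
    """Hypothetical wrapper for moving records from an old store to a new one.

    Writes go to both stores, so old code paths keep working; reads try the
    new store first and lazily copy anything still only in the old one.
    """

    def __init__(self, old: MutableMapping[str, Any],
                 new: MutableMapping[str, Any]) -> None:
        self.old = old
        self.new = new

    def write(self, key: str, value: Any) -> None:
        self.new[key] = value
        self.old[key] = value  # keep the old store valid for rollback

    def read(self, key: str) -> Optional[Any]:
        if key in self.new:
            return self.new[key]
        value = self.old.get(key)
        if value is not None:
            self.new[key] = value  # migrate on read
        return value
```

Once a backfill has copied the remaining records, reads and then writes could stop touching the old store.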
  64. What about other bad decisions?
  65. #6 Wing It
  66. spof.yola.com: Django, MySQL (scheduled for reboot)
  67. spof.yola.com: Django, MySQL, with slave replication to a second MySQL
  68. spof.yola.com: Django, MySQL, MySQL
  69. spof.yola.com: Django, Django; slave replication between MySQL and MySQL
  70. spof.yola.com: LB (load balancer) in front of Django, Django; slave replication between MySQL and MySQL
  71. spof.yola.com: LB; drop the DNS TTL; Django, Django; slave replication between MySQL and MySQL
  72. spof.yola.com: LB; Django, Django; slave replication between MySQL and MySQL
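The `spof.yola.com` progression removes single points of failure one at a time. The following is a minimal sketch of checking a topology for them. It assumes every host listed under a role can take over that role on its own. That holds for the Django servers behind the LB, but a replicated MySQL slave only qualifies if failover is actually set up. Tier and host names are hypothetical.

```python
def single_points_of_failure(tiers: dict[str, set[str]]) -> set[str]:
    """Return hosts whose loss alone would leave some tier with no hosts.

    `tiers` maps a role (load balancer, app server, database) to the hosts
    able to serve it; one machine may appear under several roles.
    """
    all_hosts: set[str] = set().union(*tiers.values())
    return {
        host
        for host in all_hosts
        if any(hosts == {host} for hosts in tiers.values())
    }
```

For example, `{"django": {"spof.yola.com"}, "mysql": {"spof.yola.com"}}` reports `spof.yola.com`. If the LB in the final layout is a single machine, this check would still report it as the one remaining single point of failure.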
  73. But it’s okay
  74. Jonathan Hitchcock, @vhata, github.com/vhata
