Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Familiar Smells I've Detected in Your Systems Engineering Organization...And How to Fix Them

218 views

Published on

Talk from LISA 2018, Nashville TN

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

Familiar Smells I've Detected in Your Systems Engineering Organization...And How to Fix Them

  1. 1. October 29–31, 2018 | Nashville, TN, USA www.usenix.org/lisa18 #lisa18 Familiar Smells I’ve Detected in Your Systems Engineering Organization… and How to Fix Them Dave Mangot @davemangot
  2. 2. October 29–31, 2018 | Nashville, TN, USA www.usenix.org/lisa18 #lisa18 Familiar Smells I’ve Detected in Your Systems Engineering Organization… and How to Fix Them Dave Mangot @davemangot
  3. 3. About Me 20+ years in systems engineering, have led DevOps and Systems Engineering transformations at multiple companies Big companies: Cable & Wireless, Salesforce, SolarWinds Small Companies: Terracotta, Tagged, Librato International DevOps contributor
  4. 4. I’ve got, so much trouble in my mind… Sir Joe Quarterman & Free Soul - I've Got So Much Trouble On My Mind - https://www.youtube.com/watch?v=0sZedHraIiY
  5. 5. Dat darn pager
  6. 6. Outsourced Ops The road to hell is paved with short term projects that never died
  7. 7. Don’t Fix Problems Solve Them
  8. 8. Crawl - Walk - Run
  9. 9. Top N
  10. 10. Conway’s Law - Eng/Ops VP of Architecture VP of Engineering VP of Operations
  11. 11. Follow the Sun MTTR SLO You must be this tall…
  12. 12. I’ve got, so much trouble in my mind… Sir Joe Quarterman & Free Soul - I've Got So Much Trouble On My Mind - https://www.youtube.com/watch?v=0sZedHraIiY
  13. 13. Untested Infrastructure
  14. 14. Run Books (procedural) DON’T DO IT!
  15. 15. Change Control
  16. 16. Crawl - Walk - Run
  17. 17. Infra Unit Tests • Rspec/Test Kitchen • ServerSpec • https://goss.rocks
  18. 18. Tests in Staging • Stage is like prod • Stage is like prod • Stage is like prod
  19. 19. Automated Deployment • Dark Launches / Feature Flags • Canaries • Blue Green Deploys • Cfg Management (stateful tiers)
  20. 20. I’ve got, so much trouble in my mind… Sir Joe Quarterman & Free Soul - I've Got So Much Trouble On My Mind - https://www.youtube.com/watch?v=0sZedHraIiY
  21. 21. Boring Technology
  22. 22. Avoiding Failure •bonded network interfaces •RAID cards •SAN storage
  23. 23. Taleb Black Swan “the problem with artificially suppressed volatility is not just that the system tends to become extremely fragile; it is that, at the same time, it exhibits no visible risks... These artificially constrained systems become prone to Black Swans. Such environments eventually experience massive blowups... catching everyone off guard and undoing years of stability or, in almost all cases, ending up far worse than they were in their initial volatile state” (p105)1. - Nassim Taleb https://continuousdelivery.com/2013/01/on-antifragility-in-systems-and-organizational-architecture/
  24. 24. MTTR > MTBF 'So instead of trying to prevent errors, optimize for a relatively frequent occurrence of low-impact errors. “Failure free operations require experience with failure.” For infrequently occurring errors, this means we need to induce them.' - Aaron Blohowiak https://www.linkedin.com/pulse/reliability-works-aaron-blohowiak/?published=t
  25. 25. Crawl - Walk - Run
  26. 26. Configure With Code
  27. 27. Production Readiness Gameday
  28. 28. Chaos Engineering
  29. 29. Crawl - Walk - Run Stage is like prod (x 3) Choose Your Incentives!
  30. 30. Questions? @davemangot

×