DOES15 - Randy Shoup - Ten (Hard-Won) Lessons of the DevOps Transition

Randy Shoup, Consulting CTO

DevOps is no longer just for Internet unicorns. Today many large enterprises are transitioning from the slow, siloed traditional IT approach to modern DevOps practices, and are seeing substantial improvements in agility, velocity, scalability, and efficiency. But this transition is not without its challenges and pitfalls, and those of us who have led this journey have the scar tissue to prove it.

A successful transition to DevOps practices ultimately involves changes to organization, to culture, and to architecture. Organizationally, we want to create multi-skilled teams with end-to-end ownership and shared on-call responsibilities. Culturally, we want to prioritize solving problems and improving the product over closing tickets. Architecturally, we want to move to an infrastructure with independently testable and deployable components.

The ten practical lessons outlined in this session synthesize the speaker’s experiences leading teams at eBay, Google, and KIXEYE, as well as lessons from his current consulting practice.

Published in: Technology

  1. Ten (Hard-Won) Lessons of the DevOps Transition
     Randy Shoup
     @randyshoup
     linkedin.com/in/randyshoup
  2. 1. Reorganize Teams Around Ownership
     • End-to-end Ownership
       o Small, cross-functional team owns application / service from design to deployment to retirement
       o Team has inside it all skill sets needed to do the job
       o Depends on other teams for supporting services
       o Able to move very rapidly and independently
     • “You build it, you run it”
       o The same team that builds the software operates the software
       o No separate maintenance or sustaining engineering team
  3. 1. Reorganize Teams Around Ownership
     • E.g., KIXEYE and MySQL
       o Development team wrote the SQL, issued all the queries
       o DBA / Ops team responsible for performance and uptime
       o Splitting ownership between teams was counterproductive and disruptive
     • Alternative strategies
       o Centrally-maintained persistence service OR
       o Customer manages its own persistence
  4. 2. Lose the Ticket Culture

     Ticket Culture               | Ownership Culture
     -----------------------------+--------------------------
     Do what is asked for         | Do what is needed
     One-way communication        | Two-way collaboration
     Goal is to close the ticket  | Goal is product success
     Reactive approach            | Proactive approach
     Reinforces silos             | Reinforces collaboration
     Prioritizes process          | Prioritizes results
  5. 3. Replace Approvals With Code
     • Reduce or eliminate approval bodies
       o E.g., eBay Architecture Review Board
       o (-) Too late
       o (-) Too slow
       o (-) Too disengaged from details
     • Package expertise in code
       o Smart, experienced people build their knowledge into code
       o Teams with specialized skills (databases, security, compliance, etc.) provide a service, library, or tool
  6. 3. Replace Approvals With Code
     • E.g., Security at Google
       o Provide secure foundations by maintaining lower-level libraries and services
       o Provide self-service penetration tests, vulnerability assessments, etc.
  7. The easiest way to “enforce” a standard practice is with working code.
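
One way to picture "package expertise in code" is a small checker that a specialist team ships and product teams run themselves, rather than waiting on a review board. The following is only a minimal sketch of that idea: `ServiceConfig`, `ALLOWED_PORTS`, `review`, and the three rules are hypothetical names and policies invented for illustration, not anything taken from eBay's or Google's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceConfig:
    """Hypothetical description of a service that a team wants to deploy."""
    name: str
    owner_team: str = ""
    tls_enabled: bool = False
    open_ports: list[int] = field(default_factory=list)


# Assumed policy, maintained by a (hypothetical) security team.
ALLOWED_PORTS = {443, 8443}


def review(config: ServiceConfig) -> list[str]:
    """Return the list of policy problems; an empty list means 'approved'."""
    problems = []
    if not config.owner_team:
        problems.append("service has no owning team")
    if not config.tls_enabled:
        problems.append("TLS is disabled")
    for port in config.open_ports:
        if port not in ALLOWED_PORTS:
            problems.append(f"port {port} is not on the allowed list")
    return problems
```

A team could call `review(...)` from its own build and treat a non-empty result as a failure, so the feedback arrives early and without a meeting.
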
  8. 4. Enforce a Service Mentality
     • Vendor-Customer Discipline
       o Service team is a vendor; the products are its customers
       o Service is useful only to the extent it provides value to its customers
     • Customer can choose to use service or not (!)
       o Customer team is responsible for deciding what is best for their use case
       o Use the right tool for the right job
     • Provides powerful incentives
       o Service must be *strictly better* than the alternatives of build, buy, borrow
  9. 5. Charge for Usage
     • Charge customers for *usage* of the service
       o Aligns economic incentives of customer and provider
       o Motivates both sides to optimize efficiency
     • Free usage leads to waste
       o No incentive to control usage or find more efficient alternatives
     • E.g., App Engine usage at Google
       o Charging particularly egregious internal customer led to 10x reduction in usage
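
The usage-based charging described in lesson 5 can be sketched as a simple internal chargeback calculation. This is an illustration under stated assumptions, not the mechanism used at Google: the team names, the CPU-hour unit, the rate, and the functions `monthly_charges` and `biggest_consumers` are all made up for the example.

```python
def monthly_charges(cpu_hours_by_team: dict[str, int],
                    cents_per_cpu_hour: int) -> dict[str, int]:
    """Return what each customer team owes for the month, in cents."""
    return {team: hours * cents_per_cpu_hour
            for team, hours in cpu_hours_by_team.items()}


def biggest_consumers(charges: dict[str, int], top_n: int = 3) -> list[tuple[str, int]]:
    """Return the top_n (team, charge) pairs, largest charge first."""
    return sorted(charges.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Made-up usage figures, for illustration only.
usage = {"search": 1200, "ads": 300, "batch-reports": 9000}
charges = monthly_charges(usage, cents_per_cpu_hour=5)
```

Working in integer cents avoids floating-point rounding in the bill. Publishing the `biggest_consumers` list is one way to make heavy usage visible to both the provider and the customer.
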
  10. 6. Prioritize Quality
      • Quality, Performance, and Reliability are “Priority-0 features”
        o “Stop the line” if there is a degradation
        o Equally important to users as product features or engaging user experience
      • Developers write tests and code together
        o Continuous testing of features, performance, load
        o Confidence to make risky changes
      • “Slow down to speed up”
        o Catch bugs earlier, fail faster
  11. 6. Prioritize Quality
      • E.g., Development Process at Google
        o Code reviews before submission
        o Automated tests for everything
        o Single searchable source code repository
      → Internal Open Source Model
        o Not “here is a bug report”
        o Instead “here is the bug; here is the code fix; here is the test that verifies the fix” ☺
  12. 7. Start Investing in Testing
      • Write functional tests around a component
        o If you can only write a few tests, they should be meaningful ones
        o End-to-end tests exercise more meaningful customer-visible capabilities than unit tests
      • Fail any build that breaks a test
      • Keep ratcheting up the tests
        o For every new feature, add tests for that feature
        o For every new bug, add a test that reproduces the bug and verifies the fix
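
The "ratchet" in lesson 7 (a test for every new bug, and a build that fails when any test breaks) might look like the sketch below, using Python's standard `unittest` module. The function `parse_quantity` and the whitespace bug it once had are hypothetical, invented purely to show the pattern.

```python
import unittest


def parse_quantity(text: str) -> int:
    """Hypothetical function under test.

    Made-up bug history: input with surrounding whitespace used to be
    rejected. The fix is the strip() call below.
    """
    cleaned = text.strip()
    if not cleaned.isdigit():
        raise ValueError(f"not a quantity: {text!r}")
    return int(cleaned)


class ParseQuantityTest(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_quantity("3"), 3)

    def test_regression_surrounding_whitespace(self):
        # Reproduces the (hypothetical) bug and verifies the fix.
        self.assertEqual(parse_quantity(" 3\n"), 3)

    def test_rejects_garbage(self):
        with self.assertRaises(ValueError):
            parse_quantity("three")


def build_exit_code() -> int:
    """Run the suite; return 0 if every test passed, 1 otherwise."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseQuantityTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return 0 if result.wasSuccessful() else 1
```

A build script could pass `build_exit_code()` to `sys.exit`, so that any broken test fails the build.
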
  13. 8. Actively Manage Technical Debt
      • Maintain sustainable and well-understood level of debt
        o Denominated in engineering effort to fix
        o Plan for how and when you will pay it off
        o Track feature work vs. accrued debt over time
      • “Don’t have time to do it right”?
        o WRONG ☹ - Don’t have time to do it twice (!)
        o The more constrained you are on time and resources, the more important it is to do it solidly the first time
  14. Vicious Cycle of Technical Debt
      [Cycle diagram with nodes: Technical Debt, “No time to do it right”, Quick-and-dirty]
  15. Virtuous Cycle of Investment
      [Cycle diagram with nodes: Solid Foundation, Confidence, Faster and Better, Invest in Quality]
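
Lesson 8 suggests denominating debt in engineering effort and tracking it against feature work over time. A minimal sketch of such a ledger follows; the `Sprint` record, the `debt_balance` function, and every number in the example are hypothetical and chosen only to show the bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    """Hypothetical per-sprint record; all effort is in engineer-days."""
    name: str
    feature_days: int     # effort spent on feature work
    debt_added_days: int  # estimated effort to fix shortcuts taken this sprint
    debt_paid_days: int   # effort spent paying down existing debt


def debt_balance(sprints: list[Sprint]) -> list[tuple[str, int]]:
    """Return (sprint name, outstanding debt after that sprint) pairs."""
    balance = 0
    history = []
    for sprint in sprints:
        balance += sprint.debt_added_days - sprint.debt_paid_days
        history.append((sprint.name, balance))
    return history


# Made-up figures, for illustration only.
sprints = [
    Sprint("S1", feature_days=20, debt_added_days=5, debt_paid_days=0),
    Sprint("S2", feature_days=18, debt_added_days=3, debt_paid_days=4),
    Sprint("S3", feature_days=15, debt_added_days=0, debt_paid_days=3),
]
history = debt_balance(sprints)
```

A balance that keeps rising sprint after sprint would be the signal to plan a pay-down before the vicious cycle sets in.
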
  16. 9. Share On-call Duties
      • All members of the team rotate on-call responsibilities
        o Strongest motivator to build in solid monitoring and diagnosis capabilities
        o Best way to learn the real-world behavior of the system
        o Best way to develop empathy for customers and other team members
      • Train via on-call “apprenticeship”
        o 1. Apprentice starts as secondary on-call, experienced engineer is primary
        o 2. Apprentice is primary, experienced engineer is secondary
        o 3. Apprentice graduates
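
The rotation and the apprenticeship stages in lesson 9 can be sketched as two small functions. This is an assumption-laden illustration: the weekly cadence, the round-robin order, the names, and the functions `weekly_rotation` and `apprenticeship_stages` are not from the talk.

```python
def weekly_rotation(team: list[str], weeks: int) -> list[tuple[str, str]]:
    """Return one (primary, secondary) pair per week, round-robin.

    Every team member takes a turn as primary; the next person in the
    list is secondary for that week.
    """
    if len(team) < 2:
        raise ValueError("need at least two people for primary and secondary")
    size = len(team)
    return [(team[week % size], team[(week + 1) % size]) for week in range(weeks)]


def apprenticeship_stages(apprentice: str, mentor: str) -> list[tuple[str, str]]:
    """Stages 1 and 2 of the apprenticeship as (primary, secondary) pairs.

    Stage 3 (graduation) is modeled here, as an assumption, by simply
    adding the apprentice to the team list passed to weekly_rotation.
    """
    return [(mentor, apprentice), (apprentice, mentor)]
```

For example, `weekly_rotation(["ana", "bo", "cy"], 4)` pairs ana with bo, bo with cy, cy with ana, and then starts over.
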
  17. 10. Make Post-Mortems Truly Blameless
      • Overcoming blame culture takes work
        o Institutional memory of blame is long
        o E.g., Initial post-mortems at KIXEYE elicited tons of fear
      • Constantly reinforce learning over blame
        o When you say “blameless”, you have to really mean it (!)
        o Don’t ask “what did you do?”, ask “what did you learn?”
  18. 10. Make Post-Mortems Truly Blameless
      • Open and Honest Discussion
        o Document exactly what happened
        o What went right
        o What went wrong
      • Focus on Learning and Improvement
        o How should we change process, technology, documentation, etc.
        o How could we have automated the problems away?
        o How could we have diagnosed more quickly?
      • Take fear and personalization out of it
        → Engineers will compete to take personal responsibility (!)
        → “Finally we can fix that broken system” ☺
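
The questions on the post-mortem slides map naturally onto a fixed template. The sketch below is one hypothetical way to encode it; the `PostMortem` class, the `SECTIONS` list, and the `render` function are invented for illustration. The template deliberately has no field for who was involved, which is an assumption about how to keep the write-up focused on learning rather than blame.

```python
from dataclasses import dataclass, field


@dataclass
class PostMortem:
    """Hypothetical blameless post-mortem record; note there is no 'who' field."""
    title: str
    what_happened: list[str] = field(default_factory=list)
    went_right: list[str] = field(default_factory=list)
    went_wrong: list[str] = field(default_factory=list)
    changes: list[str] = field(default_factory=list)           # process, technology, documentation
    automation_ideas: list[str] = field(default_factory=list)
    faster_diagnosis: list[str] = field(default_factory=list)


# (attribute name, heading) pairs, in the order the slides list the questions.
SECTIONS = [
    ("what_happened", "What happened"),
    ("went_right", "What went right"),
    ("went_wrong", "What went wrong"),
    ("changes", "How should we change process, technology, documentation?"),
    ("automation_ideas", "How could we have automated the problems away?"),
    ("faster_diagnosis", "How could we have diagnosed more quickly?"),
]


def render(post_mortem: PostMortem) -> str:
    """Render the record as plain text, marking sections that are still empty."""
    lines = [f"Post-mortem: {post_mortem.title}"]
    for attribute, heading in SECTIONS:
        lines.append("")
        lines.append(heading)
        items = getattr(post_mortem, attribute) or ["(nothing recorded yet)"]
        lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines)
```

Rendering empty sections explicitly makes it obvious when a discussion skipped, say, "what went right".
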
  19. Top Five Takeaways
      • 1. Reorganize Teams Around Ownership
      • 2. Replace Approvals With Code
      • 3. Prioritize Quality
      • 4. Actively Manage Technical Debt
      • 5. Make Post-Mortems Truly Blameless
  20. What I Could Use Help With
      • Encouraging leaders to lose the blame culture
      • Measuring productivity in a principled way
      • Overcoming resistance to taking the pager
  21. Thank You!
      • @randyshoup
      • linkedin.com/in/randyshoup