There is No Spoon - Ran Leibman, Facebook - DevOpsDays Tel Aviv 2016

While learning and implementing new technologies in your infra has a major impact, sometimes we miss the big picture and forget to involve our good friends, the software engineers. Come hear how you can solve this problem, as we did at Facebook: how production engineers and software engineers work together, from roadmap planning all the way to a shared oncall.

  1. Production Engineering (PE): There Is No Spoon. Ran Leibman, Production Engineer
  2. Agenda
      1. How Production Engineering was formed at Facebook
      2. What we do at Onavo
      3. How we moved to the PE model at Onavo
      4. Q&A
  3. Facebook - The Pre-PE Days
  4. SRO - Site Reliability Operations
      1. Keep the site up 24/7
      2. Follow the sun
      3. Capacity planning
  5. Why Was SRO Not Enough?
  6. What Are The Alternatives?
  7. NOC
  8. The Production Engineering Model
      1. PEs are embedded within the software engineering teams
      2. PEs take part in team meetings
      3. PEs are involved in roadmap planning
      4. PEs review diffs
      5. Shared oncall: software and production engineers
  9. Onavo - Adopting The PE Model
  10. Save, Measure & Protect Your Mobile Data
       » Protect user traffic using IPsec
       » Protect against malicious sites
       » Compress user traffic
       » Control data leakage
  11. Onavo - A Bit Of Context
       1. Founded in 2010
       2. Classic startup Dev & Ops teams
           1. Dev: writes code
           2. Ops: keeps the infra up & running
       3. Acquired by Facebook in 2013
  12. Making The Change - Step By Step
  13. Step 1 - Go Sit Next To The Developers
  14. Step 2 - Get The Colleagues Onboard
  15. Step 3 - Get Your Tooling Ready
  16. Have Good (Short) Documentation: you don’t want that “Confused Travolta” moment…
       » Document your alerts
       » Link to dashboards
       » Link to third-party software docs
       » Runbooks: how to debug in prod (log files, how to restart the service, getting stack traces & metrics…)
       » Link to config management
  17. Dev-Friendly Systems
  18. Simple And Indicative Dashboards: avoid the graph porn…
       1. Match the product KPIs
       2. Strong signal
       3. Intuitive titles
       4. Easy to spot anomalies
       5. Easy to find correlations
  19. Step 4 - Review Your Alerts
  20. Refactor Your Alerts As Needed: rm -rf /all/false/alarms*
       » The first challenge is to make sure alerts are handled
       » To make that possible, every alert should:
           » Indicate a real problem
           » Be clear to understand (informative)
           » Be impactful
           » Be actionable
  21. Step 5 - Train The Team And Get Them Ready
  22. Train The Team: learning is easy, remembering is hard
       » Wiki/doc based, which makes it easier to remember
       » Hands-on, hands-on, hands-on
       » Pre-create a task pool (even if low impact)
       » Give oncall use cases & examples
       » Keep it reusable
  23. Step 6 - Oncall + Hand-Holding
  24. Shared Oncall: make yourself available and adjust as you go
       » Short oncall cycles, 1-2 days
       » Increase the period each cycle
       » Oncall summaries
       » Do oncall as well: set an example
       » Preemptively check status with the current oncall
  25. The Steps
       √ Step 1 - Go Sit Next To The Developers
       √ Step 2 - Get The Colleagues Onboard
       √ Step 3 - Get Your Tooling Ready
       √ Step 4 - Review Your Alerts
       √ Step 5 - Train The Team
       √ Step 6 - Oncall + Hand-Holding
  26. Questions? Ran Leibman, Production Engineer
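
The “Have Good (Short) Documentation” slide asks that every alert come with links to dashboards, runbooks, third-party docs, and config management. One way to keep that promise is to treat alert documentation as data and check it automatically. This is a minimal sketch under stated assumptions: the `AlertDoc` record, the `REQUIRED_FIELDS` set, and the helper names are hypothetical (not from the talk), and the example URLs are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AlertDoc:
    """Hypothetical documentation record attached to a single alert."""
    name: str
    description: str = ""                # what the alert means, in plain words
    dashboard_url: Optional[str] = None  # link to the relevant dashboard
    runbook_url: Optional[str] = None    # how to debug in prod: logs, restart, stack traces, metrics
    config_url: Optional[str] = None     # link to the config-management source
    third_party_docs: List[str] = field(default_factory=list)  # optional vendor docs


# Assumption: these fields must be filled in before an alert may page anyone.
REQUIRED_FIELDS = ("description", "dashboard_url", "runbook_url", "config_url")


def missing_docs(alert: AlertDoc) -> List[str]:
    """Return the names of required documentation fields that are empty."""
    return [f for f in REQUIRED_FIELDS if not getattr(alert, f)]


def undocumented(alerts: List[AlertDoc]) -> Dict[str, List[str]]:
    """Map each under-documented alert name to its missing fields."""
    report = {}
    for alert in alerts:
        gaps = missing_docs(alert)
        if gaps:
            report[alert.name] = gaps
    return report


if __name__ == "__main__":
    alerts = [
        AlertDoc("vpn_tunnel_errors", "IPsec tunnel setup failures above threshold",
                 "https://dash.example/vpn", "https://wiki.example/runbooks/vpn",
                 "https://git.example/config/vpn"),
        AlertDoc("proxy_latency_high", "p99 proxy latency above target"),
    ]
    # prints {'proxy_latency_high': ['dashboard_url', 'runbook_url', 'config_url']}
    print(undocumented(alerts))
```

Running a check like this in CI, or before an alert is enabled, is one way to make sure the oncall never hits an alert with no runbook behind it.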

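The “Refactor Your Alerts As Needed” slide gives four tests every alert should pass. A minimal sketch of turning that checklist into a repeatable review, assuming a hypothetical `AlertReview` record filled in by hand for each alert. The split between “delete” (not a real problem, i.e. a false alarm) and “refactor” (real, but unclear, low-impact, or not actionable) is an assumption of this sketch, not something the talk specifies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlertReview:
    """Hypothetical review record for one alert, filled in during the alert review."""
    name: str
    real_problem: bool  # does a firing indicate a real problem, not noise?
    informative: bool   # is the alert clear to understand from its text alone?
    impactful: bool     # does the underlying problem actually matter?
    actionable: bool    # can the oncall do something about it?


def verdict(review: AlertReview) -> str:
    """Classify one alert; the delete/refactor split is an assumption of this sketch."""
    if not review.real_problem:
        return "delete"  # a false alarm: the rm -rf case
    if review.informative and review.impactful and review.actionable:
        return "keep"
    return "refactor"    # real signal, but needs rewording, rethresholding, or a runbook


def review_all(reviews: List[AlertReview]) -> Dict[str, List[str]]:
    """Group alert names by verdict."""
    result: Dict[str, List[str]] = {"keep": [], "refactor": [], "delete": []}
    for r in reviews:
        result[verdict(r)].append(r.name)
    return result
```

Checking “real problem” first reflects the slide’s ordering: removing false alarms is what makes it possible for the remaining alerts to be handled at all.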
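
The “Shared Oncall” slide recommends starting with short 1-2 day cycles and lengthening the period each cycle. A minimal sketch of such a ramp-up schedule, assuming a hypothetical `ramp_up_schedule` helper that gives everyone one shift per round and grows the shift length by `step_days` after each full round, capped at `max_shift_days`. The one-day start, one-day step, and seven-day cap are illustrative defaults, not values from the talk.

```python
from datetime import date, timedelta
from typing import List, Tuple


def ramp_up_schedule(engineers: List[str], start: date, first_shift_days: int = 1,
                     step_days: int = 1, max_shift_days: int = 7,
                     rounds: int = 3) -> List[Tuple[str, date, date]]:
    """Build a shared-oncall schedule whose shifts grow longer each round.

    Every engineer gets one shift per round, in list order. After each full
    round the shift length grows by step_days, up to max_shift_days.
    Returns (engineer, first_day, last_day) tuples; both dates are inclusive.
    """
    schedule = []
    day = start
    length = first_shift_days
    for _ in range(rounds):
        for name in engineers:
            last = day + timedelta(days=length - 1)
            schedule.append((name, day, last))
            day = last + timedelta(days=1)  # next shift starts the following day
        length = min(length + step_days, max_shift_days)
    return schedule
```

For example, `ramp_up_schedule(["pe", "swe1", "swe2"], date(2016, 1, 4), rounds=2)` gives each person a one-day shift in the first round and a two-day shift in the second. Listing the production engineer first follows the “do oncall as well: set an example” advice.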