Release Often Release Safely


Kung Fu of releasing often but safely for high-load systems

Published in: Technology

  1. Release Often Release Safely
     Sergejus Barinovas (@sergejusb)
  2. This is not a theoretical presentation
  3. This presentation is based on real-life experience
  4. Successful software workflow
  5. Dilemma: Innovative or Stable?
     Innovative
       - Frequent (bi-weekly) releases of new features
       - Higher risk of bugs and downtime
     Stable
       - Higher uptime and better customer perception
       - Seasonal releases of new features
  6. We wanted both …
     … to be innovative and agile while staying as stable as possible
  7. Stability in our terms
       - 99.999% uptime for serving ads
       - 2 datacenters + clouds
       - 500 M requests / day
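For a sense of scale, the slide's numbers translate as follows (simple arithmetic, assuming a 365-day year and load spread evenly over the day, which real ad traffic is not): 99.999% uptime leaves about 5.26 minutes of downtime per year, and 500 M requests per day is roughly 5,800 requests per second on average, with peaks higher than that.

```python
# Back-of-the-envelope numbers behind the stability targets above.
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_minutes(uptime: float) -> float:
    """Maximum downtime per year allowed by an uptime target such as 0.99999."""
    return MINUTES_PER_YEAR * (1.0 - uptime)


def average_rps(requests_per_day: float) -> float:
    """Average requests per second, assuming load is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{downtime_budget_minutes(0.99999):.2f} min/year")  # 5.26 min/year
    print(f"{average_rps(500e6):.0f} req/s on average")        # 5787 req/s on average
```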
  8. Let’s learn the Kung Fu of releasing often and safely
  9. Challenges we ha(d/ve)
       - Detect issues in production as soon as possible
       - Test new features in production while reducing the impact on customers
       - Roll out new features in a controlled manner
  10. Detect issues in production ASAP
      Monitoring
       - Choose your monitoring system carefully
           - It took us about 1 year (Zabbix)
           - First list all your possible monitoring use cases
       - Prepare your software for monitoring
           - Logging is a must-have!
           - Performance / SLA counters help you measure and understand your software better
           - Create a clear baseline to compare with after releases
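The "clear baseline" idea can be sketched as follows, assuming the application exports numeric counters (latency, error rate, CPU) and that their pre-release values were recorded. The counter names and tolerances are hypothetical; in a real setup this comparison would typically live in the monitoring system itself (Zabbix triggers, for example) rather than in application code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Counter:
    name: str         # a performance / SLA counter exported by the application
    baseline: float   # value measured before the release
    tolerance: float  # allowed relative increase, e.g. 0.10 means +10 %


def regressions(counters: List[Counter], current: Dict[str, float]) -> List[str]:
    """Names of counters that got worse than baseline * (1 + tolerance).

    Assumes higher values are worse (latency, error rate, CPU usage)."""
    bad = []
    for c in counters:
        value = current.get(c.name)
        if value is None or value > c.baseline * (1.0 + c.tolerance):
            # A counter that stopped reporting is treated as a regression too.
            bad.append(c.name)
    return bad


if __name__ == "__main__":
    baseline = [
        Counter("p95_latency_ms", 40.0, 0.10),
        Counter("errors_per_min", 2.0, 0.50),
        Counter("cpu_percent", 55.0, 0.15),
    ]
    after_release = {"p95_latency_ms": 47.0, "errors_per_min": 2.5, "cpu_percent": 58.0}
    print(regressions(baseline, after_release))  # ['p95_latency_ms']
```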
  11. Detect issues in production ASAP
      Automated functional tests
       - Designed to detect end-user issues, unlike unit and integration tests
       - Cover UI / business logic
       - Still not as many as we want (Selenium UI / C#); unifying our automated QA tests is an ongoing process
       - Run after each release and on a periodic basis
           - Very important if you have more than one server
           - A huge time saver when the tests are repetitive
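Running the same end-user checks against every server after a release is what makes functional tests so valuable with more than one server: a deployment that went wrong on a single machine shows up immediately. A minimal, language-neutral sketch in Python (the suite described on the slide is Selenium UI / C#); the check function and server names are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

# A functional check takes a server address and raises an exception if the
# end-user scenario it covers is broken on that server.
Check = Callable[[str], None]


def run_functional_checks(servers: List[str],
                          checks: Dict[str, Check]) -> List[Tuple[str, str, str]]:
    """Run every check against every server and collect (server, check, error) failures."""
    failures: List[Tuple[str, str, str]] = []
    for server in servers:
        for name, check in checks.items():
            try:
                check(server)
            except Exception as exc:  # keep going so one failure doesn't hide others
                failures.append((server, name, repr(exc)))
    return failures


def fake_serve_ad(server: str) -> None:
    # Hypothetical placeholder for a real end-user check, such as requesting an ad
    # over HTTP. Here "ads-2" simulates a server where the release went wrong.
    if server == "ads-2":
        raise AssertionError("ad response was empty")


if __name__ == "__main__":
    servers = ["ads-1", "ads-2", "ads-3"]  # hypothetical server names
    for failure in run_functional_checks(servers, {"serve_ad": fake_serve_ad}):
        print(failure)  # ('ads-2', 'serve_ad', "AssertionError('ad response was empty')")
```

The same runner can be triggered by the deployment script after each release and by a scheduler on a periodic basis.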
  12. Though unit tests help find bugs during coding, they are even more vital as software evolves!
  13. Test new features in production
       - Even an ideal staging environment is not equal to production
       - Before rolling out a new feature, it is important to check its:
           - Resource consumption: CPU / RAM / HDD / IO / Network
           - Performance impact on existing functionality: response times / SLA
           - Stability: errors / memory leaks
  14. Test new features in production
      Use Case #1: Safely roll out a new feature that integrates into the core data collection pipeline
  15. Test new features in production
      Dark releases
       - Work best with brand-new features
       - Release the new feature to one or several servers
       - The new feature gets real load, but is not available to customers
       - Have an automated rollback package in case something goes wrong
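One way to give a dark feature real load without exposing it is to shadow the existing code path: every request is still served by the current code, and a copy is also passed to the new feature, whose result is discarded and whose errors are only logged. This is a minimal sketch assuming the feature can be called in-process; `handle_with_dark_feature` is a hypothetical name and `dark_enabled` stands for a hypothetical per-server setting, so only the one or several servers chosen for the dark release run the new code.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("dark_release")


def handle_with_dark_feature(request: T,
                             current: Callable[[T], object],
                             dark: Callable[[T], object],
                             dark_enabled: bool) -> object:
    """Serve the request with the current code path. If enabled on this server,
    also feed it to the dark (unreleased) feature so it sees real load; its
    result is never returned and its failures never reach the customer."""
    response = current(request)
    if dark_enabled:
        try:
            dark(request)  # result intentionally discarded
        except Exception:
            log.exception("dark feature failed on request %r", request)
    return response


if __name__ == "__main__":
    def current_pipeline(event: dict) -> str:
        return "stored"

    def new_pipeline_stage(event: dict) -> str:
        raise RuntimeError("not ready yet")  # simulated bug in the dark feature

    # The customer still gets "stored"; the failure only shows up in the logs.
    print(handle_with_dark_feature({"ad_id": 1}, current_pipeline, new_pipeline_stage, True))
```

Running the dark path synchronously, as here, adds its latency to every request; for a latency-sensitive ad server you would more likely hand the copy to a background queue.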
  16. Test new features in production
      Dark release notes from our release plan
  17. Test new features in production
      Use Case #2: Safely migrate to the new SQL connection pooling mechanism
  18. Test new features in production
      Feature flags and switches
       - Work both for brand-new features and for updates
       - A feature can be switched on / off at any time
           if (FeatureEnabled) then …
           if (UseNewLogic) then … else …
       - Can affect existing customers
       - Possible to test each server one by one by switching the feature on / off
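A feature switch only needs a shared, thread-safe on/off state that the code consults on each call and that can be changed while the application runs. The sketch below assumes switch values are set from some admin tool (how they are persisted or distributed is left out); `FeatureSwitches` and `UseNewConnectionPool` are hypothetical names modeled on Use Case #2, and the returned strings stand in for real connections.

```python
import threading
from typing import Dict, Optional


class FeatureSwitches:
    """On/off switches that can be flipped at runtime without redeploying."""

    def __init__(self, initial: Optional[Dict[str, bool]] = None) -> None:
        self._lock = threading.Lock()
        self._state: Dict[str, bool] = dict(initial or {})

    def is_enabled(self, name: str) -> bool:
        with self._lock:
            # Unknown switches default to off, so a missing entry never enables new code.
            return self._state.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        with self._lock:
            self._state[name] = enabled


# Hypothetical switch for Use Case #2.
switches = FeatureSwitches({"UseNewConnectionPool": False})


def get_connection() -> str:
    if switches.is_enabled("UseNewConnectionPool"):
        return "connection from new pool"  # stands in for the new pooling code
    else:
        return "connection from old pool"  # stands in for the existing code path


if __name__ == "__main__":
    print(get_connection())                     # connection from old pool
    switches.set("UseNewConnectionPool", True)  # flip on this server only
    print(get_connection())                     # connection from new pool
```

Because the check happens per call, switching `UseNewConnectionPool` off again is an instant rollback on that server, which is what makes testing servers one by one possible.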
  19. Test new features in production
      Use Case #3: Safely migrate to the brand-new intelligent targeting subsystem
  20. Test new features in production
      Valves
       - Very similar to switches
       - A feature can get anywhere from 0% to 100% of real load
       - Very handy for gradually rolling out new features on each server, one by one
       - So far they have helped us a lot, though they require extra development effort
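A valve generalizes the switch from on/off to a percentage. The sketch below buckets each request by a stable hash of a key (a user ID is assumed here), which is one possible design; a plain random draw per request would also split the load, but would move the same user back and forth between old and new code. `Valve` and the 10% starting value are hypothetical.

```python
import hashlib
import threading


class Valve:
    """Routes a configurable percentage (0-100) of traffic to a new feature.

    Keys are bucketed by a stable hash, so a given user keeps the same code path
    while the percentage is unchanged, and raising it only adds users."""

    def __init__(self, percent: int = 0) -> None:
        self._lock = threading.Lock()
        self.set_percent(percent)

    def set_percent(self, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        with self._lock:
            self._percent = percent

    def admits(self, key: str) -> bool:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
        with self._lock:
            return bucket < self._percent


if __name__ == "__main__":
    valve = Valve(10)  # hypothetical: send 10% of users to the new targeting subsystem
    for user in ("user-1", "user-2", "user-3"):
        print(user, "new" if valve.admits(user) else "old")
```

With hashing, raising the percentage only adds users to the new path (everyone admitted at 20% is still admitted at 50%), which fits the caveat on the next slide about not disturbing connected users.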
  21. Test new features in production
      Caveats we have run into so far
       - Make sure you can turn features on / off without affecting connected users
       - Create a simple interface that displays the current status of all switches and valves on each affected server
       - Secure access to switches and valves
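The last two caveats, a status view and secured access, can be combined in one small component per server. A minimal sketch, assuming a shared admin token (a real deployment would more likely use proper authentication and expose `status_json` over an HTTP endpoint); `SwitchBoard` and its methods are hypothetical names.

```python
import hmac
import json
import socket
import threading
from typing import Dict


class SwitchBoard:
    """Switches and valves of one server: anyone may read the status,
    only holders of the admin token may change it."""

    def __init__(self, admin_token: str) -> None:
        self._admin_token = admin_token.encode("utf-8")
        self._lock = threading.Lock()
        self._switches: Dict[str, bool] = {}
        self._valves: Dict[str, int] = {}

    def _authorize(self, token: str) -> None:
        # Constant-time comparison avoids leaking the token through timing.
        if not hmac.compare_digest(token.encode("utf-8"), self._admin_token):
            raise PermissionError("invalid admin token")

    def set_switch(self, token: str, name: str, enabled: bool) -> None:
        self._authorize(token)
        with self._lock:
            self._switches[name] = enabled

    def set_valve(self, token: str, name: str, percent: int) -> None:
        self._authorize(token)
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        with self._lock:
            self._valves[name] = percent

    def status_json(self) -> str:
        """One JSON document per server, easy to collect into a single dashboard."""
        with self._lock:
            return json.dumps({"server": socket.gethostname(),
                               "switches": self._switches,
                               "valves": self._valves}, sort_keys=True)


if __name__ == "__main__":
    board = SwitchBoard(admin_token="change-me")  # hypothetical token source
    board.set_switch("change-me", "UseNewConnectionPool", True)
    board.set_valve("change-me", "NewTargeting", 25)
    print(board.status_json())
```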
  22. Controlling roll-out of new features
       - Switches and valves enable a very smooth and controlled roll-out
       - Partial roll-out to different datacenters / clouds
           - Different datacenters / clouds have different versions of the feature released
       - Redirect all traffic to the new or old version of the feature
  23. Controlling roll-out of new features
      Future research: application-level load balancing
       - The load balancer can act as a switch / valve without us having to program the load distribution logic
       - Ability to automatically redirect users to the new version of the application while preserving the old one
  24. Summary
       - A monitoring system is very important, but your software should be prepared for it
       - Automated functional tests are functional monitoring of your software
       - Switches and valves are a very powerful concept for testing in production and for roll-outs, but they require extra development and maintenance time
       - Dark releases and partial roll-outs are the most cost-effective safety mechanisms
  25. Thanks! Questions?
      Sergejus Barinovas (@sergejusb)