Release Often Release Safely
 


The kung fu of releasing often but safely for high-load systems



Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved


    Release Often Release Safely: Presentation Transcript

    • Release Often Release Safely
      Sergejus Barinovas (@sergejusb)
      http://sergejus.blogas.lt
    • This is not a theoretical presentation
    • This presentation is based on real-life experience
    • Successful software workflow
    • Dilemma: Innovative or Stable?
      Innovative
      Often (bi-weekly) releases of new features
      Higher risk of bugs and downtimes
      Stable
      Higher uptime and better customer perception
      Seasonal releases of new features
    • We wanted both …
      … be innovative and agile while staying as stable as possible
    • Stability in our terms
      99.999% uptime for serving ads
      2 datacenters + clouds
      500 M requests / day
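
To make these targets concrete, here is a rough back-of-the-envelope calculation. It assumes a 365-day year and evenly spread traffic, neither of which is stated in the slides, so real peaks will be higher than the average shown.

```python
# Rough arithmetic for the stability targets above.
# Assumptions (not from the slides): 365-day year, uniform traffic.
SECONDS_PER_DAY = 24 * 60 * 60

availability = 0.99999          # 99.999% uptime
requests_per_day = 500_000_000  # 500 M requests / day

# Allowed downtime per year: about 315 seconds, or about 5.26 minutes.
downtime_budget_s_per_year = (1 - availability) * 365 * SECONDS_PER_DAY

# Average load: about 5,787 requests per second.
average_rps = requests_per_day / SECONDS_PER_DAY

# Requests affected by a single minute of full outage at average load.
requests_lost_per_minute = average_rps * 60

print(f"downtime budget: {downtime_budget_s_per_year:.2f} s/year")
print(f"average load: {average_rps:.0f} req/s")
print(f"requests per outage minute: {requests_lost_per_minute:.0f}")
```

At this scale a single bad release can use up the whole yearly downtime budget within minutes, which is why the rest of the talk focuses on detection and controlled roll-out.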
    • Let’s learn the Kung Fu of releasing often and safely
    • Challenges we ha(d/ve)
      Detect issues in production as soon as possible
      Test new features in production while reducing impact for customers
      Roll-out new features in a controlled manner
    • Detect issues in production ASAP
      Monitoring
      Choose a monitoring system carefully
      It took us about 1 year (Zabbix)
      First, list all your possible monitoring use cases
      Prepare your software for monitoring
      Logging is a must-have!
      Performance / SLA counters help to measure and understand software better
      Create a clear baseline to compare with after releases
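
The baseline idea can be sketched as a simple statistical comparison. This is a minimal illustration, not the speaker's Zabbix setup: the function name, the 3-sigma threshold, and the sample response times are all assumptions.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, current, max_sigma=3.0):
    """Return True if the mean of `current` lies more than `max_sigma`
    baseline standard deviations away from the baseline mean.

    `baseline` needs at least two samples (required by stdev).
    """
    base_mean = mean(baseline)
    base_std = stdev(baseline)
    if base_std == 0:
        # A perfectly flat baseline: any change counts as a deviation.
        return mean(current) != base_mean
    return abs(mean(current) - base_mean) / base_std > max_sigma


# Hypothetical response times in milliseconds, before and after a release.
before_release = [100, 102, 98, 101, 99]
after_good_release = [101, 100, 99]
after_bad_release = [150, 160, 155]

print(deviates_from_baseline(before_release, after_good_release))
print(deviates_from_baseline(before_release, after_bad_release))
```

In practice the baseline would come from the performance / SLA counters collected before the release, and the threshold would be tuned per counter.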
    • Detect issues in production ASAP
      Automated functional tests
      Designed to detect end-user issues
      Unlike unit and integration tests
      UI / business logic
      Still not as many as we want (Selenium UI / C#)
      Ongoing process of unifying automated QA tests
      Run after each release and on a periodic basis
      Very important if you have > 1 server
      Huge time saver if tests are repetitive
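
Running the same checks against every server can be sketched as below. The talk mentions Selenium UI tests in C#; this Python sketch only illustrates the loop over servers, and the server names and check functions are hypothetical stand-ins for real functional tests.

```python
def run_functional_checks(servers, checks):
    """Run every check against every server and return the failures
    as a list of (server, check_name) pairs."""
    failures = []
    for server in servers:
        for name, check in checks.items():
            try:
                ok = check(server)
            except Exception:
                # A crashing check is treated as a failed check.
                ok = False
            if not ok:
                failures.append((server, name))
    return failures


# Hypothetical checks; real ones would drive the UI or call the service.
def serves_ads(server):
    return server != "web-2"  # pretend web-2 is broken after the release


def login_works(server):
    return True


failures = run_functional_checks(
    ["web-1", "web-2", "web-3"],
    {"serves_ads": serves_ads, "login_works": login_works},
)
print(failures)
```

Reporting failures per server matters with more than one server, because a release can succeed on most machines and still fail on one.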
    • Though unit tests help find bugs during coding, they become even more vital as the software evolves!
    • Test new features in production
      Even an ideal staging environment is not identical to the production environment
      Before rolling out a new feature, it is important to check its:
      Resource consumption
      CPU / RAM / HDD / IO / Network
      Performance impact on existing functionality
      Response times / SLA
      Stability
      Errors / memory leaks
    • Test new features in production
      Use Case #1:
      Safely roll out a new feature that integrates into the core data collection pipeline
    • Test new features in production
      Dark releases
      Works best with brand new features
      Release new feature to one or several servers
      New feature gets real load, but is not available for customers
      Have automated rollback package in case something goes wrong
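
One possible shape of a dark release is sketched below: the new code path receives the real request, but its result is discarded and its errors never reach the customer. The handler names are hypothetical and the slides do not describe the actual implementation.

```python
import logging

logger = logging.getLogger("dark_release")


def handle_request(request, current_handler, dark_handler=None):
    """Serve the request with the current code path. If a dark feature
    is deployed on this server, feed it the same request, but discard
    its result and swallow its errors."""
    response = current_handler(request)
    if dark_handler is not None:
        try:
            dark_handler(request)  # result intentionally discarded
        except Exception:
            logger.exception("dark feature failed; response unaffected")
    return response


# Hypothetical handlers for illustration.
def serve_ad(request):
    return "ad-for:" + request


def new_collection_pipeline(request):
    raise RuntimeError("bug in the new feature")


print(handle_request("user-42", serve_ad, new_collection_pipeline))
```

This sketch runs the dark feature synchronously, so its CPU and latency cost is still paid on the request path. That is useful for measuring resource consumption, but a slow dark feature would need a timeout or asynchronous execution.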
    • Test new features in production
      Dark release notes from our release plan
    • Test new features in production
      Use Case #2:
      Safely migrate to the new SQL connection pooling mechanism
    • Test new features in production
      Feature flags and switches
      Works both for brand new features and updates
      Feature can be switched on / off any time
      if (FeatureEnabled) then …
      if (UseNewLogic) then … else …
      Can affect existing customers
      Possible to test each server one by one by switching feature on / off
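
A minimal sketch of a runtime switch, using Use Case #2 as the example. The class, the flag name `use_new_pooling`, and the return values are illustrative assumptions, not the speaker's code.

```python
import threading


class FeatureSwitches:
    """In-memory on/off switches that can be flipped at runtime."""

    def __init__(self, defaults):
        self._lock = threading.Lock()
        self._flags = dict(defaults)

    def is_enabled(self, name):
        with self._lock:
            # Unknown switches are treated as off.
            return self._flags.get(name, False)

    def set(self, name, enabled):
        with self._lock:
            self._flags[name] = bool(enabled)

    def status(self):
        """Snapshot of all switches, e.g. for a status page."""
        with self._lock:
            return dict(self._flags)


def open_connection(switches):
    # The "if (UseNewLogic) then ... else ..." pattern from the slide.
    if switches.is_enabled("use_new_pooling"):
        return "connection from new pool"
    return "connection from old pool"


switches = FeatureSwitches({"use_new_pooling": False})
print(open_connection(switches))
switches.set("use_new_pooling", True)
print(open_connection(switches))
```

Because the switch is read on every call, flipping it takes effect immediately and can be reverted just as quickly, one server at a time.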
    • Test new features in production
      Use Case #3:
      Safely migrate to the brand-new intelligent targeting subsystem
    • Test new features in production
      Valves
      Very similar to switches
      Feature can get from 0% to 100% of real load
      Very handy to gradually roll-out new features on each server one by one
      So far they have helped us a lot, though they require extra development effort
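
A valve can be sketched as a switch with a percentage. In this illustration each request is routed independently at random; the class name and the random-sampling approach are assumptions, and a real system might instead hash a user ID so that each user consistently sees one version.

```python
import random


class Valve:
    """Routes a configurable percentage (0 to 100) of calls to the new
    code path."""

    def __init__(self, percent=0.0, rng=None):
        self._rng = rng or random.Random()
        self.set(percent)

    def set(self, percent):
        if not 0.0 <= percent <= 100.0:
            raise ValueError("percent must be between 0 and 100")
        self.percent = percent

    def is_open(self):
        # random() is in [0, 1), so 0% never opens and 100% always opens.
        return self._rng.random() * 100.0 < self.percent


valve = Valve(percent=25.0, rng=random.Random(42))
routed_to_new = sum(valve.is_open() for _ in range(10_000))
print(f"{routed_to_new} of 10000 requests went to the new subsystem")
```

Raising the percentage step by step while watching the monitoring baseline gives the gradual roll-out described above.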
    • Test new features in production
      Caveats we had so far
      Make sure you can turn features on / off without affecting connected users
      Create a simple interface to display the current status of all switches and valves on each affected server
      Secure access to switches and valves
    • Controlling roll-out of new feature
      Switches and valves enable very smooth and controlled roll-out
      Partial roll-out to different datacenters / clouds
      Different datacenters / clouds can have different versions of a feature released
      Redirect all traffic to the new or old version of feature
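
A partial roll-out can be expressed as a per-datacenter plan of valve settings. The datacenter names, percentages, and helper functions below are hypothetical; the slides do not show how the plan was actually stored.

```python
# Hypothetical plan: share of traffic (in percent) sent to the new
# version of a feature in each datacenter or cloud.
ROLLOUT_PLAN = {"dc-1": 100.0, "dc-2": 10.0, "cloud": 0.0}


def valve_percent_for(datacenter, plan, default=0.0):
    """Look up the valve setting for a datacenter. Locations missing
    from the plan stay on the old version by default."""
    return plan.get(datacenter, default)


def redirect_all(plan, to_new):
    """Return a new plan that sends all traffic to the new version
    (to_new=True) or back to the old one (to_new=False)."""
    target = 100.0 if to_new else 0.0
    return {datacenter: target for datacenter in plan}


print(valve_percent_for("dc-2", ROLLOUT_PLAN))
print(redirect_all(ROLLOUT_PLAN, to_new=False))
```

Keeping the plan as plain data makes the emergency case (everything back to the old version) a single, easily reviewed change.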
    • Controlling roll-out of new feature
      Future research: application level load balancing
      A load balancer can act as a switch / valve without the load distribution logic being programmed into the application
      Ability to automatically redirect users to the new version of the application while preserving the old one
    • Summary
      A monitoring system is very important, but your software must be prepared for it
      Automated functional tests act as functional monitoring of your software
      Switches and valves are a very powerful concept for testing in production and for roll-outs, but they require extra development and maintenance time
      Dark releases and partial roll-outs are the most cost-effective safety mechanisms
    • Thanks! Questions?
      Sergejus Barinovas (@sergejusb)
      http://sergejus.blogas.lt