LearnBop Blue/Green Deployments
October 2015
whoami
utcnow
CTO at www.learnbop.com
Algorithmic individual tutoring tuned by veteran teachers
Common Core and state standards supported
Currently enjoyed in schools. Sign up to be notified when the parent-led version is live:
http://go.learnbop.com/amazon-parents
Common sample architecture
General release good practices
Continuous integration - build, test, etc
Scripted environment creation/update (ideally in source control)
Scripted “one-click” deploy (sketch below)
New code, APIs AND database schema should be backwards compatible
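The deck doesn't show its own deploy tooling; as a minimal sketch of what a scripted "one-click" deploy can look like, assuming an Elastic Beanstalk CLI setup, a Makefile test target, and a /healthcheck endpoint (all hypothetical):

    #!/usr/bin/env bash
    # deploy.sh - hypothetical one-click deploy: test, ship, smoke-test.
    set -euo pipefail

    APP_ENV="${1:-staging}"   # assumed environment name

    # 1. Run the test suite (or verify the CI build already passed).
    make test

    # 2. Package and deploy the current code via the Elastic Beanstalk CLI.
    eb deploy "$APP_ENV" --timeout 20

    # 3. Smoke-test the deployed environment (URL is a placeholder).
    curl -fsS "https://${APP_ENV}.example.com/healthcheck" > /dev/null \
      && echo "Deploy to ${APP_ENV} looks healthy"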
Why not rolling releases?
Not immutable infrastructure
❖Opportunities for config creep
❖Rollback risks - code-only releases are likely easy, but what if you patched the OS, updated a few libraries, etc.?
Manual or automated complexity in tracking version state
Some big changes will require new servers/a new environment anyway
Why blue/green?
Immutable infrastructure
❖Ensures your environment build process is up to date each release
❖Old environment is guaranteed untouched if rollback or comparison is needed
Rollback is FAST
Same process for minor or major changes (OS updates? no problem)
One button to spin up & deploy plus one button to shift traffic, either old or new.
No complex in-between risk.
Swap CNAMEs to the rescue?
Web Request Path - Round 1
Maybe in 1993…
GET / HTTP/1.0
Web Request Path - Round 2
GET / HTTP/1.1
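HTTP/1.1 keep-alive is what makes the "stuck on the old servers" scenarios below possible. A quick, hedged way to see connection reuse with curl (the hostname is a placeholder):

    # Two requests to the same host in one curl invocation: with HTTP/1.1
    # the second request reuses the already-open TCP connection, so no
    # new DNS lookup happens for it. Verbose output goes to stderr.
    curl -v https://www.example.com/ https://www.example.com/ 2>&1 \
      | grep -E 'Connection #0|Re-using existing connection'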
Web Request Path - Round 3
Web Request Path - Round 4
Web Request Path - Round 5
Web Request Path - Round 6
Swap CNAMEs to the rescue?
Swap CNAME worst case
Bad Scenario 1 - Users stuck on the old pre-swap version longer than a few minutes
Users actively clicking on the site will keep the HTTP keep-alive sockets active and won't get a chance to check DNS again
Browser and OS DNS caches can keep the old value longer than a minimal DNS TTL (see the dig example below)
Some DNS servers or apps may be configured/misconfigured with an abnormally high TTL
Bad Scenario 2 - Users stuck on the old pre-swap version INDEFINITELY
Long polling, websockets, and notification refresh will keep re-using the same HTTP keep-alive socket
It never goes back to a DNS server to get a new address as long as the user doesn't lose internet access or close the browser
I've seen it happen for 12+ hours
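A hedged way to see what your resolver is actually caching versus a fresh answer; the domain is a placeholder and 8.8.8.8 is just one public resolver:

    # Ask your default resolver; the second column is the remaining TTL
    # in seconds, which cached entries count down from the original TTL.
    dig +noall +answer www.example.com A

    # Ask a different resolver to compare what it is handing out.
    dig +noall +answer www.example.com A @8.8.8.8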
Swap CNAME worst case
Bad Scenario 3 - Semi-permanent stale data
CDN caches the old version of a file during your swap
Browser gets the old file with Cache-Control: max-age=3600 and caches it for a YEAR
Emergency Workarounds
Tell your users to clear their cache (not a great move for public websites)
Change your cachebuster ?build=# and re-publish (curl check below)
Disable the CDN
Bad Scenario 4 - User requests going from old → new → old servers
Request hits one bank of DNS servers and gets the new IP
Next request hits a different bank of DNS servers and gets the old IP
Could send new form data to an old server backend...
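A hedged way to check what the CDN is actually handing out and whether the cachebuster made it through; the URL and build value are placeholders (the X-Cache header shown is CloudFront's):

    # Inspect the cache headers the CDN returns for a static asset.
    curl -sI "https://cdn.example.com/js/app.js?build=1234" \
      | grep -iE 'cache-control|age|x-cache'

    # X-Cache: Hit from cloudfront  -> served from the edge cache
    # Cache-Control: max-age=3600   -> how long browsers will keep it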
How do we know what version users are hitting?
How do we know what version CDN is hitting?
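The speaker notes describe grokking access logs for the build number in the query string and for CloudFront's user agent. A rough equivalent with plain grep, assuming an nginx-style access log path and a ?build= cachebuster (both assumptions about your setup):

    # Which build numbers are real users requesting right now?
    grep -o 'build=[0-9]*' /var/log/nginx/access.log | sort | uniq -c

    # Is the CDN still pulling from the old origin? CloudFront identifies
    # itself in the User-Agent of its origin requests.
    grep 'Amazon CloudFront' /var/log/nginx/access.log \
      | grep -o 'build=[0-9]*' | sort | uniq -c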
Discarded Alternatives
Try to reuse the ELB OR put servers in a 3rd ELB (not in the blue or green env)
Complex to manage which servers should be in and out
If using Elastic Beanstalk and auto-scaling, it's complex to manage new servers coming up or which ELB they get put in
Trick Beanstalk into switching the ELB it's using (swap ELBs pre- and post-deploy)
Error: Tag keys starting with ‘aws:’ are reserved for internal use
Swap CNAMEs first and then put the new nodes in both the new and old ELBs. Remove the old nodes from the old ELB after.
Not bad, but you still need to leave the old ELB up in case of stale DNS
Pre-rollback testing is hard as the old nodes are not reachable
Final Solution Attributes
Only possible relatively recently with the new AWS attach/detach ELB to Auto Scaling Group (ASG) feature, out June 11th - see blog post
Fully scripted and one click (bash script run through RunDeck)
Rollback is as simple as running it again to swap back
No CNAME/DNS changes!
Old environment not hit more than 3 minutes after the new servers come online
No one hitting the new servers has any risk of a future request hitting the old servers (unless you roll back)
Final Solution Environment Setup
Environment work (CLI sketch after this list)
Initial state: Beanstalk application with two environments running and green (staging and production)
Create two new ELBs outside of Elastic Beanstalk (PROD and STAGING)
Attach the STAGING ELB to the staging (pre-swap to prod) Auto Scaling Group
CNAME the dualstack DNS name of the STAGING ELB to your staging web site address
Attach the PROD ELB to the production Auto Scaling Group
CNAME the dualstack DNS name of the PROD ELB to your production site address
Ensure Connection Draining is enabled on all four ELBs with a timeout of 120 seconds
Ensure the application sets a session-type cookie on EVERY request
Create an ELB application-controlled session stickiness cookie policy
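A rough sketch of the setup above using the 2015-era classic ELB AWS CLI; the ELB/ASG names, cookie and policy names, and subnet/security-group IDs are placeholders, not the deck's actual values:

    # Placeholders for your own account's resources.
    SUBNET_ID=subnet-xxxxxxxx
    SG_ID=sg-xxxxxxxx

    # Create the long-lived PROD ELB outside of Elastic Beanstalk.
    aws elb create-load-balancer --load-balancer-name prod-elb \
      --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" \
      --subnets "$SUBNET_ID" --security-groups "$SG_ID"

    # Connection draining with a 120-second timeout.
    aws elb modify-load-balancer-attributes --load-balancer-name prod-elb \
      --load-balancer-attributes '{"ConnectionDraining":{"Enabled":true,"Timeout":120}}'

    # Application-controlled stickiness keyed on a cookie the app sets
    # on every request (cookie name is an assumption).
    aws elb create-app-cookie-stickiness-policy --load-balancer-name prod-elb \
      --policy-name stick-to-env --cookie-name bg_session

    # Attach the ELB to the production environment's Auto Scaling Group.
    aws autoscaling attach-load-balancers \
      --auto-scaling-group-name prod-env-asg --load-balancer-names prod-elb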
Final Solution Steps - Sanity Checks
First Do No Harm! Lots of sanity checks before proceeding (sketch after the list).
1. Confirm two environments exist in the application, one with the PROD ELB attached to its ASG and the other with the STAGING ELB attached to its ASG.
2. Confirm both environments are Health: Green
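A sketch of how these sanity checks might be scripted with the AWS CLI; the application and ASG names are placeholders:

    # 1. Both Beanstalk environments should exist and be Health: Green.
    aws elasticbeanstalk describe-environments --application-name my-app \
      --query 'Environments[].[EnvironmentName,Health]' --output table

    # 2. Which ELB is attached to which environment's ASG?
    aws autoscaling describe-load-balancers --auto-scaling-group-name prod-env-asg \
      --query 'LoadBalancers[].[LoadBalancerName,State]' --output table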
Final Solution Steps
1. Enable the ELB application sticky cookie policy on the PROD ELB (both HTTP and HTTPS if applicable! - avoids users hitting new servers then old)
2. Set the PROD ELB Connection Idle Timeout to 20 seconds (to close connections and thwart WebSockets, long polling, HTTP keep-alive)
3. Attach the PROD ELB to the new-code environment ASG (loop until complete)
4. Detach the PROD ELB from the old-code environment ASG (loop until complete)
5. Disable the ELB application sticky cookie policy on the PROD ELB
6. Set the PROD ELB Connection Idle Timeout back to 60 seconds
7. Attach the STAGING ELB to the old-code environment ASG (loop until complete)
8. Detach the STAGING ELB from the new-code environment ASG (loop until complete)
9. Flag the old-code environment for termination (separate script 2 hours later)
10. Flag the deployment successful in 3rd-party tools/monitoring
Rollback, if needed, is just running the same script again; a sketch of the core swap follows.
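A hedged bash/AWS CLI sketch of steps 1-6 above (the core PROD swap), not the deck's actual RunDeck script; names are placeholders and only port 80 is shown (repeat the listener policy call for 443 if you terminate HTTPS on the ELB):

    #!/usr/bin/env bash
    set -euo pipefail

    PROD_ELB=prod-elb
    NEW_ASG=new-env-asg   # environment with the new code
    OLD_ASG=old-env-asg   # environment currently serving traffic

    # 1. Turn on the app-cookie stickiness policy on the PROD listener.
    aws elb set-load-balancer-policies-of-listener \
      --load-balancer-name "$PROD_ELB" --load-balancer-port 80 \
      --policy-names stick-to-env

    # 2. Drop the idle timeout to 20s so keep-alive/long-poll sockets close.
    aws elb modify-load-balancer-attributes --load-balancer-name "$PROD_ELB" \
      --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":20}}'

    # 3. Attach the PROD ELB to the new ASG and wait until it reports
    #    Added/InService.
    aws autoscaling attach-load-balancers \
      --auto-scaling-group-name "$NEW_ASG" --load-balancer-names "$PROD_ELB"
    until aws autoscaling describe-load-balancers \
            --auto-scaling-group-name "$NEW_ASG" \
            --query "LoadBalancers[?LoadBalancerName=='$PROD_ELB'].State" \
            --output text | grep -qE 'Added|InService'; do
      sleep 10
    done

    # 4. Detach the PROD ELB from the old ASG; connection draining
    #    finishes the in-flight requests.
    aws autoscaling detach-load-balancers \
      --auto-scaling-group-name "$OLD_ASG" --load-balancer-names "$PROD_ELB"

    # 5. Remove the stickiness policy (an empty list clears listener policies).
    aws elb set-load-balancer-policies-of-listener \
      --load-balancer-name "$PROD_ELB" --load-balancer-port 80 --policy-names []

    # 6. Restore the idle timeout.
    aws elb modify-load-balancer-attributes --load-balancer-name "$PROD_ELB" \
      --load-balancer-attributes '{"ConnectionSettings":{"IdleTimeout":60}}'

Running the same sequence with NEW_ASG and OLD_ASG swapped is the rollback path, which is why rollback is just rerunning the script.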
Q&A / Thank you!
Always Be Shipping!
Email: alec@learnbop.com
Twitter: alec1a
Slide Deck (posted by Sunday, Oct 4th)
http://tinyurl.com/bluegreen2015
LearnBop for Parents
http://go.learnbop.com/amazon-parents


Editor's Notes

  • #3 https://www.jasondavies.com/wordcloud/#
  • #4 LearnBop has had great results in schools from NYC to California suburbs. If you are or know any parents that could use help please pass it along.
  • #5 Note on the diagrams: to keep them easier to read I didn't make them real sequence diagrams with arrows back and forth. The DNS server is of course not actually making the request to the load balancer for you, but you can think of the flow as getting data from DNS and then logically continuing to the load balancer. What type of web server doesn't really matter: Nginx, Node.js, Windows IIS, same idea. At a base level, if you're using a different CDN, DNS, or load balancer it doesn't matter much either, unless they're doing some extra magic.
  • #8 MTTI / MTTR faster. If someone cowboy’d up and did make a manual change it will be found on next release when it’s missing vs who knows how much later when you switch to new instances when no one will remember the signs of the original issue.
  • #10 How long does this take? 1 minute or less? 5 minutes or less? 3 hours or less? 48 hours? MAYBE FOREVER? Let's take a step back to look at how the requests flow and then get back to this...
  • #11 Before DNS, people kept a text file to convert addresses to IPs. DNS was figured out before the web, though.
  • #15 Long polling / Comet / Web Sockets / Notifications / Toast
  • #16 DNS caches!
  • #17 DNS isn’t one server though… It would never survive today’s traffic and availability demands. Neither the DNS server your domain is hosted on NOR upstream.
  • #18 We can’t forget about the CDN though. It will hit DNS. Maybe it gets the same answer as your user’s browser. Maybe not!
  • #19 Nope.
  • #20 MTTI / MTTR faster. If someone cowboy’d up and did make a manual change it will be found on next release when it’s missing vs who knows how much later when you switch to new instances when no one will remember the signs of the original issue.
  • #21 Users go back and forth in time between your code versions...
  • #23 nxlog -> syslog - fast, and no expensive processing/grokking of logs on the web servers themselves. Graylog2 is fine also. Splunk is awesome but can be pricey. The ELK stack groks log files with patterns to pull out pieces of interest, including sessionIds from cookies, the username making requests, etc. Be CAREFUL what you store and what your security access to this server is, to make sure a bad actor can't get in and impersonate user traffic with it… Note the useragent and the build in uri_query
  • #24 Note the useragent of Amazon+CloudFront and the build in uri_query
  • #27 The session cookie does not have to, and SHOULD not, be the same as your web session cookie. You don't want that sent with static resources, for instance. You do need something just for Amazon to do a sticky-cookie load balancing policy with.
  • #29 Why not sticky cookies all the time? It can lead to inefficient load balancing, especially with autoscaling: none of your existing sessions will move to new machines, which makes them much slower to help ease load.