Automated Push Monitoring and Rollback at IMVU

ArchCamp lightning talk on "Metrics, Collection, and Immune Systems"

Kishore Jalleda
Director of Operations
IMVU, Inc

Published in: Technology, Business

  1. Automated Push Monitoring and Rollback @ IMVU
     ArchCamp Lightning Talks
     Kishore Jalleda
     Director of Operations
     IMVU, Inc
  2. How did it all start?
     • From a P1 back in 2007
     • Site issues
     • Ops and eng identify the bad revision
     • Engineers commit a fix
     • Wait for BB to go green
     • Finally, all tests pass
     • Push the fix and the site recovers
     • Too bad we were down for 20 minutes
  3. How did it all start? (cont’d)
     • Postmortem time (5 Whys)
     • Run multiple website revisions for fast rollbacks
     • Typical web server directory structure:
         website -> /home/webadmin/website.107847 (symlink)
         website.107767
         website.107788
         website.107825
         website.107834
         website.107835
         website.107847
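
The symlink layout on this slide is what makes a rollback take seconds: switching revisions only means repointing `website`. Below is a minimal Python sketch of such a switch, assuming a POSIX filesystem. The function name `switch_revision` and the temporary-link naming are illustrative assumptions, not IMVU's actual tooling.

```python
import os
import tempfile


def switch_revision(link_path: str, target_dir: str) -> None:
    """Point link_path at target_dir by atomically replacing the symlink."""
    if not os.path.isdir(target_dir):
        raise FileNotFoundError(f"revision directory not found: {target_dir}")
    # Build the new link under a temporary name next to the live one...
    tmp_link = f"{link_path}.tmp.{os.getpid()}"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target_dir, tmp_link)
    # ...then rename it over the live link. On POSIX the rename is atomic,
    # so there is no moment at which the "website" path is missing.
    os.replace(tmp_link, link_path)


if __name__ == "__main__":
    # Demo in a scratch directory; revision numbers are taken from the slide.
    root = tempfile.mkdtemp()
    for rev in ("107835", "107847"):
        os.mkdir(os.path.join(root, f"website.{rev}"))
    link = os.path.join(root, "website")
    switch_revision(link, os.path.join(root, "website.107847"))  # push
    switch_revision(link, os.path.join(root, "website.107835"))  # rollback
    print(os.readlink(link))
```

Creating the new link under a temporary name and renaming it over the old one avoids the short gap that a delete-then-create sequence would leave.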
  4. More evolution
     • We had more P1s, more postmortems, and more follow-ups
     • Identified some common root causes
     • Finding changes in key metrics was manual and sometimes took days or even weeks
     • Rolling back was fully scripted but required a manual trigger
     • Push monitoring and auto rollback were born
  5. Push Monitoring & Auto Rollback
     Phase 1:
     • Push to a small % of servers
     • Monitor pre- and post-push key metrics
     • Key metrics OK? Go to Phase 2
     • Key metrics not OK? Roll back to the previous green revision (simple symlink switch, takes seconds)
  6. Push Monitoring & Auto Rollback (cont’d)
     Phase 2:
     • Push to the rest of the servers
     • Monitor pre- and post-push key metrics
     • Key metrics OK? Push successful
     • Key metrics not OK? Roll back to the previous green revision (simple symlink switch, takes seconds)
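
The two phases can be sketched as a single loop over server groups. This is a hedged illustration only: `phased_push`, `metrics_ok`, the `deploy` and `read_metrics` callbacks, the 10% first-phase fraction, and the 20% relative tolerance are all assumptions made for the example, since the talk does not say which metrics or thresholds IMVU uses.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def metrics_ok(before: Metrics, after: Metrics, tolerance: float = 0.2) -> bool:
    """True if no key metric moved by more than `tolerance` (relative change)."""
    for name, old in before.items():
        new = after.get(name)
        if new is None:
            return False  # metric disappeared after the push
        if abs(new - old) > tolerance * max(abs(old), 1e-9):
            return False
    return True


def phased_push(
    servers: List[str],
    revision: str,
    previous_green: str,
    deploy: Callable[[List[str], str], None],
    read_metrics: Callable[[List[str]], Metrics],
    first_phase_fraction: float = 0.1,
) -> bool:
    """Push in two phases; roll back every touched server on a bad reading."""
    n_first = max(1, int(len(servers) * first_phase_fraction))
    phases = [servers[:n_first], servers[n_first:]]
    touched: List[str] = []
    for group in phases:
        if not group:
            continue
        before = read_metrics(group)   # pre-push key metrics
        deploy(group, revision)
        touched.extend(group)
        after = read_metrics(group)    # post-push key metrics
        if not metrics_ok(before, after):
            deploy(touched, previous_green)  # roll back to last green revision
            return False
    return True
```

A real system would presumably wait for metrics to settle between the deploy and the post-push reading; the sketch omits that delay to stay short.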
  7. What if your push gets rolled back?
     • You get an email with the subject “rollback of r107767”
     • The body contains something like this:
         Revision 107767 triggered an alarm in the cluster and was automatically rolled back to revision 107764
         Details: https://foo.imvu.com/push_yyyy.php?push_phase_id=384000
         kjalleda initiated the push at Fri May 13 14:46:38 2011.
  8. Push Status Page
  9. More evolution
     The hacks below evolved from more postmortems / 5 Whys:
     • Regret your last push? “imvu_oops” to the rescue. Along with rolling back to a previous good revision, it also locks commits and pushes and sends an email to ops, eng, and on-call.
     • Ability to manually roll back quickly without having to go through commit/BB/push
     • Ability to manually push a particular revision
     • Ability to manually lock commits and/or pushes
     • Push system itself is broken, now what? (It’s a P1 at IMVU)
     • Automated rollbacks on any metric inaccessibility
     • Immune system for IMVU config variables (site switches)
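
"Automated rollbacks on any metric inaccessibility" amounts to failing closed: if the monitoring data cannot be read, the push is treated as bad. A minimal sketch of that rule follows, with hypothetical names (`should_roll_back`, `fetch`, `is_healthy`) that do not come from the talk.

```python
from typing import Callable, Dict, Optional

Metrics = Dict[str, float]


def read_metrics_or_none(fetch: Callable[[], Metrics]) -> Optional[Metrics]:
    """Return the metrics, or None if the metric source cannot be reached."""
    try:
        return fetch()
    except OSError:  # includes ConnectionError and TimeoutError
        return None


def should_roll_back(
    fetch: Callable[[], Metrics],
    is_healthy: Callable[[Metrics], bool],
) -> bool:
    """Fail closed: unreadable or empty metrics count as a failed push."""
    metrics = read_metrics_or_none(fetch)
    if not metrics:
        return True
    return not is_healthy(metrics)
```

The design choice here is that an unknown state is handled the same way as a bad one, which trades some extra rollbacks for never leaving an unmonitored revision live.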
  10. Expect some hurdles
     • Don’t expect your push monitoring to catch everything. Not all changes cause immediate impact; some take days or even weeks to surface.
     • There will inevitably be false positives and intermittent issues, for a variety of reasons.
     • Push settings/thresholds may need periodic tweaking to accommodate cluster changes.
     • Ongoing production issues can skew some metrics, which can impact pushes.
     • Rollbacks from unrelated errors are a pain to deal with.
  11. Thank You!
     Kishore Jalleda
     kjalleda@imvu.com
     IMVU recognized as:
         Inc. 500: http://bit.ly/dv52wK
         Red Herring 100: http://bit.ly/bbz5Ex
         Best Place to Work: http://bit.ly/aAVdp8
         (and we're hiring): http://www.imvu.com/jobs
