Go or No-Go: Operability and Contingency Planning at Etsy.com

These are the slides from my talk at the Surge Conference in 2010, in Baltimore: http://omniti.com/surge/2010/speakers/john-allspaw

© All Rights Reserved

Go or No-Go: Operability and Contingency Planning at Etsy.com Presentation Transcript

  • 1. Go or No-Go: Operability and Contingency Planning. John Allspaw, Etsy.com
  • 2. Etsy as of now: Total members: over 5.7 million. Total sellers: over 400,000. Items currently listed: 6.5 million. Page views per month: 775 million. Total $ sold (gross merchandise sales) in 2010: $179.4 million (through August).
  • 3. New Features
  • 4. Delivering Operable Software: Arch Review → Development/Ops → Go or No-Go → Launch* → Feedback Loop
  • 5. CONTINUOUS DEPLOYMENT != deploying new features without coordination and planning
  • 6. Operability Review and Contingency Checklist
  • 7. Not An Innovative Idea http://en.wikipedia.org/wiki/Launch_status_check
  • 8. 10-minute get-together: Product, Development, Operations, Design, Community, Support
  • 9. Consensus
  • 10. Informally codifies “OK”: Dev, Ops, Product, Community, and Support say, “We all understand/agree/accept that we are OK here!” (Diagram labels: Buggy, Stable, Perfect!, Sloppy, Finished Enough For Launch, Unfinished.)
  • 11. Yes or No
  • 12. Has the feature been tested enough to deploy to production? Is there any final functional QA still needed?
  • 13. Is communication (blog post/forums/etc) about the feature ready to go out with the feature?
  • 14. Does everyone know: when it will go live, and who will push the feature?
  • 15. Has the feature been in production for staff (or some other specific subset of the users) already? If not, could it have been?
  • 16. Is it possible to dark launch this feature? Will this feature be dark launched? (or, has it already?)
  • 17. Is it possible to turn up this feature on a percentage basis? If so: will we?
  • 18. Does it involve any new infrastructure? If so: are those pieces in monitoring and metrics collection? (this answer can’t be “no” before launch)
  • 19. Do we have on/off switches for this feature? If so: are those switches documented? (this answer can’t be “no” before launch)
  • 20. Are all the leads (Dev, Ops, Product, Community, Support, etc.) available for the launch and in communication? (this answer can’t be “no” before launch)
  • 21. Is there a single and easy place for users to report bugs or concerns about the feature? (forum topic, etc.)
  • 22. Have all leads agreed upon a post-launch “it’s all DONE” time to declare the launch was successful?
  • 23. Have we done a Contingency Checklist™ and everyone reviewed it? (this answer can’t be “no” before launch)
  • 24. Contingency Checklist
  • 25. “What could possibly go wrong?” “When it does go wrong, WTF will we do?!”
  • 26. NOTE: This is worked out BEFORE launch, normally by product and development, involving others where needed (when we have saner heads).
  • 27. Contingency Checklist columns: Issue | Likelihood | Comment(s) | Impact on Users | Engineering Response | Onsite Messaging (Forums, Blog) | PR
  • 28. Contingency Checklist columns (repeated): Issue | Likelihood | Comment(s) | Impact on Users | Engineering Response | Onsite Messaging (Forums, Blog) | PR
  • 29. Example: Coffee! AWESOME NEW FEATURE: add coffee (like a tag) to your profile; others can favorite coffees; a page showing all coffee favorites; bulk-add coffees to your profile; search people by their coffee.
  • 30. Issue What could possibly go wrong with the feature launched in production? Example: “The Coffees-You’ve-Favorited page is too expensive.”
  • 31. Likelihood: How likely is this issue to come up? Example: “Low to mid.”
  • 32. Comment(s): Any extra info about this issue goes here. Example: “Because of how we paginate the coffee favorites page, it is somewhat harder to serve than normal favorites. If we do have to turn this off, we’re saying that we need to redesign it, or that it needs to stay off until the initial burst of traffic from the launch has passed.”
  • 33. Impact How much is this going to impact the experience of the feature, if it does become a concern? Example: “High”
  • 34. Engineering Response What will we do to mitigate the issue (i.e. can we gracefully degrade?) Example: “Set disable_coffee_favorites_page = 1”
  • 35. Onsite Messaging: What is the messaging to the community in the forums/blog/etc., if this needs graceful degradation? Example: “The Coffee Favorites page is currently unavailable.” Or, in the forums: “We’re working through some issues with displaying Coffee Favorites; we’ll let you know the status as time goes on.”
  • 36. PR: Is the issue so severe that we need PR involved? Example: “The CEO sends a press release, apologizing to Folgers, Peet’s, and Starbucks with a witty yet calming voice of explanation and a humble request for patience.”
  • 37. *Afterwards...
  • 38. *successful launch... Metrics? Are we there yet? OMG! Who to call if it breaks later?
  • 39. * non-successful launch... Metrics? What’d we miss? Post Mortem? Ramp down?
  • 40. Photos http://www.flickr.com/photos/jliba/3783269078/ http://www.flickr.com/photos/mybloodyself/2072928376/ http://www.flickr.com/photos/jacy/360020853/ http://www.flickr.com/photos/f-l-e-x/2319852529/ http://www.flickr.com/photos/16230215@N08/3023061528/ http://www.flickr.com/photos/proimos/4199675334/ http://www.flickr.com/photos/askal_bosch/2579320395/
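Slide 16 asks whether a feature can be dark launched. One common reading of that term, assumed here rather than taken from the talk, is that the new backend code runs in production against real traffic while its output stays hidden from users. The minimal Python sketch below illustrates that idea using the coffee example from slide 29; `render_profile`, `load_coffees`, and the `dark_launch_coffee` flag are hypothetical names, not Etsy code.

```python
from __future__ import annotations

from typing import Callable


def render_profile(user_id: int, dark_launch_coffee: bool,
                   load_coffees: Callable[[int], list[str]],
                   errors: list[str]) -> str:
    """Render a profile page; optionally exercise the hidden coffee code path (hypothetical)."""
    page = f"profile for user {user_id}"  # what users actually see, unchanged by the flag
    if dark_launch_coffee:
        try:
            load_coffees(user_id)  # real production load; the result is discarded
        except Exception as exc:
            # Record the failure for metrics instead of breaking the visible page.
            errors.append(repr(exc))
    return page
```

Turning `dark_launch_coffee` on lets the team watch database load and error counts for the coffee queries before any user sees the feature, since the visible page is identical either way.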
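Slides 17 and 19 ask whether a feature can be turned up on a percentage basis and whether it has on/off switches, and slide 34 gives `disable_coffee_favorites_page = 1` as the engineering response. The sketch below shows one way those two controls could combine, assuming a plain dictionary config and CRC32 hashing to bucket users. Apart from `disable_coffee_favorites_page`, every name (including `coffee_favorites_rollout_percent`) is hypothetical, and this is not a description of Etsy's actual configuration system.

```python
import zlib

# Hypothetical config. Only "disable_coffee_favorites_page" comes from the talk (slide 34).
CONFIG = {
    "disable_coffee_favorites_page": 0,      # on/off switch: set to 1 to turn the page off
    "coffee_favorites_rollout_percent": 10,  # hypothetical ramp-up knob, 0 to 100
}


def bucket(feature: str, user_id: int) -> int:
    """Map a user to a stable bucket in 0..99 (CRC32 is deterministic across processes)."""
    return zlib.crc32(f"{feature}:{user_id}".encode("utf-8")) % 100


def coffee_favorites_enabled(user_id: int, is_staff: bool = False, config: dict = CONFIG) -> bool:
    """Decide whether one user sees the Coffee Favorites page."""
    if config["disable_coffee_favorites_page"] == 1:
        return False  # the off switch overrides everything, staff included
    if is_staff:
        return True  # staff get the feature first (slide 15)
    # A user is in the rollout when their bucket falls below the current percentage,
    # so raising the percentage only ever adds users.
    return bucket("coffee_favorites", user_id) < config["coffee_favorites_rollout_percent"]
```

Because each user's bucket is stable, ramping from 10 to 50 percent keeps everyone who already had the feature, and the off switch is checked first so a single config change disables the page for all users.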
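Four of the questions on slides 12 to 23 carry the note “this answer can’t be ‘no’ before launch.” The sketch below encodes only those hard blockers as an automated check, with the question wording shortened. The data layout and the names `Question`, `QUESTIONS`, and `go_or_no_go` are hypothetical; the rest of the review stays the consensus conversation described on slides 8 to 10.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    text: str
    blocking: bool  # True where the slide says the answer "can't be 'no' before launch"


# Shortened wording of some slide questions; the selection is illustrative.
QUESTIONS = [
    Question("Tested enough to deploy to production?", blocking=False),
    Question("If new infrastructure is involved, is it in monitoring and metrics collection?", blocking=True),
    Question("If there are on/off switches, are they documented?", blocking=True),
    Question("All leads available and in communication?", blocking=True),
    Question("Contingency Checklist done and reviewed by everyone?", blocking=True),
]


def go_or_no_go(answers: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return (go, blockers); an unanswered blocking question counts as 'no'."""
    blockers = [q.text for q in QUESTIONS if q.blocking and not answers.get(q.text, False)]
    return (not blockers, blockers)
```

A "no" on a non-blocking question, such as final QA, does not stop the launch in this sketch; that trade-off is left to the people in the room.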