Rails Operations - Lessons Learned

  1. 1. Rails Operations: Lessons learned from deploying and managing hundreds of Rails applications. Thanks for coming out this morning. I know it’s hungover o’clock, so it means a lot. You are dedicated, upstanding individuals.
  2. 2. Oh hi, I’m Josh
  3. 3. • @techpickles • http://github.com/technicalpickles • http://technicalpickles.com I am from the internet
  4. 4. Awesomeness Engineer of Supreme Versatility II. My official title is Awesomeness Engineer of Supreme Versatility II. (I was recently promoted.)
  5. 5. Managed hosting and operations. We’re mostly known for our hosting. What isn’t as well known is our managed services. For this, we engage more closely with our customers. When bringing on new managed customers, we work with them to spec out servers and review the application’s needs. We get them up and running on these servers with our configuration management tool, Moonshine. And once deployed, we provide 24x7 monitoring. If your server goes down, we let you know, and get it back online as soon as possible, regardless of when it happens. And that’s not all. Once live, we provide operational support. Anything from application performance analysis, recommending architecture improvements, installing and managing new software on servers, or just being there to give feedback on how the application is operating. You can basically think of us as a Rails Operations company.
  6. 6. I’m talking about Rails Operations. Conveniently enough, I’m talking about Rails Operations today.
  7. 7. WTF is Rails Operations? I found this hard to distill down to a simple statement. I think it’s safe to say that the majority of us are developers. We write code, build applications, launch products. In a lot of organizations, operations is something different. People associate operations with system administration. And to an extent, this can be fairly accurate. Different people, different teams, different. As developers, we write some code, toss it over the wall, and let _them_ handle it. I think this is a bit flawed. The code you write has an operational impact. The systems you run it on have an operational impact on your code. It’s a complex relationship, and when developer and operations teams are separate, it’s hard to bridge the gap between them, since it’s neither’s responsibility.
  8. 8. Development and maintenance of a production Rails application. The simplest definition I’ve found is this.
  9. 9. Very important assumption: You develop code that will eventually go into production and, due in part to some business model, generate revenue. That is to say, you are part of some organization.
  10. 10. Before we dig in too deep...
  11. 11. Let’s talk about the business. We need to start with where development and operations fit within the rest of The Business.
  12. 12. Question: Does development generate revenue?
  13. 13. • Takes place on laptops, desktop machines, staging servers • No real users • Unknown if it truly works • Tests are green, but...
  14. 14. NO
  15. 15. but it CREATES potential revenue
  16. 16. • Step 1: Development • Step 2: ....... • Step 3: PROFIT
  17. 17. Question: Does operations generate revenue?
  18. 18. • Lives on servers located in data centers and clouds • Real users • Either code works, or it doesn’t • Either the application is available or not
  19. 19. NO. Just because your application works in production doesn’t mean people are using it or buying your product.
  20. 20. but it PRESERVES potential revenue. If you have good operations, that means users will be able to see your application working and actually be able to use it.
  21. 21. • Step 1: Development • Step 2: Operations • Step 3: ...... • Step 4: PROFIT!
  22. 22. Question: Uh, what generates revenue?
  23. 23. Million Dollar Question
  24. 24. • Working features (or at least that work enough) • Infrastructure to keep the application up and running (or at least up enough) • A business model • Sheer determination • Good luck
  25. 25. Lessons learned. Alright. I’ve given you a definition of Rails Operations, and had a brief detour to talk about the business and where development and operations fit into it. Now for some lessons. Basically, I’ll be going over some patterns, some antipatterns, and other practices and topics.
  26. 26. Common threads. Putting this all together, I kept coming back to some common threads. That is, some ideas that apply to many aspects. I’m going to start you off with a few together, and then just jump into the lessons. We’ll probably pick up a few more along the way.
  27. 27. Give a damn. If you don’t care about what you’re doing, everything else I’m talking about today probably doesn’t matter. I don’t think you need to worry about this though, since you are here.
  28. 28. Earlier we talked about how operations preserves revenue. To that end, our goal is to mitigate risk as much as makes sense.
  29. 29. Tradeoffs and compromise. Each possible solution has them. The trick is understanding that there are tradeoffs. What tradeoffs you make depends on what your priorities are. For example: * Dollar signs * Time * Sanity * Technical debt * Higher risk
  30. 30. Configuration Management Pattern
  31. 31. It’s about managing configuration. duh.
  32. 32. You write code that manages your servers’ configuration. Take a moment to think about how you might describe a server to someone. There’s plenty of nouns: * packages * users * files * cronjobs * services. And some verbs: * running commands
  33. 33. • apache package is installed • apache service is running • deploy user exists • cron jobs • etc
  34. 34. • Moonshine • Puppet • Chef
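The idea of describing a server as nouns with a desired state can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `Resource`, `DESIRED`, and `plan` are hypothetical names invented here, and none of this is the API of Moonshine, Puppet, or Chef.

```ruby
# Hypothetical mini-model of configuration management; none of these names
# come from Moonshine, Puppet, or Chef.
Resource = Struct.new(:type, :name, :desired)

# Desired state, mirroring the nouns from the slides.
DESIRED = [
  Resource.new(:package, "apache", :installed),
  Resource.new(:service, "apache", :running),
  Resource.new(:user,    "deploy", :present)
].freeze

# Compare desired state against what the server currently reports and
# return only the resources that need changing.
def plan(desired, current)
  desired.reject { |r| current[[r.type, r.name]] == r.desired }
end

# A pretend server: apache is installed but stopped, and no deploy user yet.
current = { [:package, "apache"] => :installed, [:service, "apache"] => :stopped }

changes = plan(DESIRED, current)
changes.each { |r| puts "#{r.type} #{r.name}: make #{r.desired}" }

# Simulate applying the plan, then confirm nothing is left to do.
changes.each { |r| current[[r.type, r.name]] = r.desired }
puts "remaining changes: #{plan(DESIRED, current).size}"
```

The point of the sketch is that a second run produces an empty plan, which is roughly the property that makes it safe to run configuration management on every deploy.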
  35. 35. Automation. Bootstrapping. Anyone that has set up a new server from scratch can tell you... it’s time consuming, labor intensive, and error prone. Bootstrapping is just part of it, though, and it only ever happens once. What’s more interesting is that you can use this to manage your infrastructure as it evolves. Need to start using redis? Just add it to your configuration management, and you’ll have it next deploy.
  36. 36. The best way to illustrate why you should be using configuration management is to explore the consequences of not using it. Imagine it’s time to add a new application server. Your application is under heavy load, and needs this server to be up and serving requests. How long will it take you to get it up? And how will you know it’s set up correctly? If you’re doing this all manually, you can’t really know the answers to these questions. Here’s another example: adding a new dependency to your application. It can be a gem, a native package, a new daemon, whatever. How do you ensure this gets on the server when you need it? Deploy and pray? Log into the server and install it yourself? This sucks, and it’s kind of risky, especially if you’re talking about production.
  37. 37. As always, there are tradeoffs to be made. Setting up and learning how to do configuration management takes time, time that could be spent working on user-facing tasks. The alternative is taking on the risk of having to cold deploy, or having deploys fail because of missing dependencies. Usually, the balance is to take the risk and have it burn you enough times that it’s more painful to not stop and get your configuration management on than it is to do so. If you do know it, it’s a no-brainer. Just DO IT.
  38. 38. Staging Servers Pattern
  39. 39. Preproduction servers. Staging servers are all about being a testbed between
  40. 40. Helps ensure correctness of deploy
  41. 41. configuration management + staging servers = VERY YES. If you use configuration management, and have staging servers, then this is a huge win. We talked about adding new dependencies earlier. If you are doing configuration management, then staging is the first place you can see if you’re doing it right.
  42. 42. There’s basically no downside to using staging servers. The only tradeoff is that servers do cost dollar signs, and staging servers are no different. This leads us to a new thread...
  43. 43. Maths... look around you. In most cases, you can do some dollar sign math to justify costs of a thing. Let’s try this. A staging server may cost $60/mo. But how can you calculate the cost of not having a staging server? Let’s assume that if you don’t have a staging server, you’re bound to do a bad deploy that it could have prevented. Some code that doesn’t work outright, or is otherwise flawed. Let’s say it causes an hour of downtime while you determine the problem and try to fix it. Do you know how much it costs your business in lost revenue to be down an hour? This is actually a pretty mature question, and I’d be surprised if many people can answer it offhand. In any event, I think we can do some fuzzy math to say yeah, it probably is more than $60. If that’s the case, then one failed deploy a month is enough to validate a staging server.
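The dollar sign math above can be written down as a small break-even check. Only the $60/month figure and the one hour of downtime come from the talk; the revenue-per-hour number and the method names are placeholders assumed for illustration, so substitute your own.

```ruby
# Break-even sketch for the staging server math. Only the $60/month figure
# comes from the talk; the revenue number below is a made-up placeholder.
STAGING_COST_PER_MONTH = 60.0 # dollars

# Cost of the bad deploys a staging server would have caught in a month.
def downtime_cost(revenue_per_hour, hours_down_per_deploy, bad_deploys_per_month)
  revenue_per_hour * hours_down_per_deploy * bad_deploys_per_month
end

# True when the avoided downtime is worth more than the server.
def staging_pays_off?(revenue_per_hour, hours_down_per_deploy: 1.0, bad_deploys_per_month: 1)
  downtime_cost(revenue_per_hour, hours_down_per_deploy, bad_deploys_per_month) > STAGING_COST_PER_MONTH
end

hypothetical_revenue_per_hour = 200.0 # assumption: substitute your own figure
puts staging_pays_off?(hypothetical_revenue_per_hour)
```

With the defaults (one bad deploy a month, one hour down each), the server pays for itself as soon as an hour of downtime costs more than $60.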
  44. 44. Repeat after me • development • staging • production
  45. 45. capistrano-gitflow. Whenever possible, I like to enforce standards by means of automation. For the flow of code from development -> staging -> production, we have capistrano-gitflow. Originally done up by apinstein, I did some refactorings and cleaned it up enough to be usable as a gem. Effectively, this enforces development -> staging -> production. Whenever you deploy to staging, it tags the current branch, including information about the date, the user deploying, and a small blurb about the changes. Assuming this is cool, you can promote a tag to production and go on from there. If you haven’t deployed to staging yet, you’ll be prompted and it will default to using the last production tag.
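To make the tagging idea concrete, here is a rough sketch of building a staging tag from the date, the deploying user, and a short blurb, then promoting it. The tag format and both method names are assumptions made up for this example; the real capistrano-gitflow gem may name and promote tags differently.

```ruby
# Hypothetical tag builder; the real capistrano-gitflow tag format may differ.
def staging_tag(time, user, blurb)
  # Turn the blurb into something safe to use in a git tag name.
  slug = blurb.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "")
  "staging-#{time.strftime('%Y-%m-%d-%H-%M-%S')}-#{user}-#{slug}"
end

# Assumed promotion scheme: reuse the staging tag's suffix, so production
# ships exactly what was tagged on staging.
def production_tag(staging_tag_name)
  staging_tag_name.sub(/\Astaging-/, "production-")
end

tag = staging_tag(Time.utc(2011, 8, 27, 9, 30, 0), "josh", "Fix login redirect")
puts tag
puts production_tag(tag)
```

Because the production tag is derived from a staging tag, there is no way in this sketch to name a production release that never went through staging, which is the standard being enforced.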
  46. 46. Deploy early, deploy often Pattern
  47. 47. A play on release early, release often. Although technically, I guess it’s the same. It’s basically the same thing we hear in the open source community. The sooner you release code, the sooner you can validate it and the sooner you can get feedback. Does it work? Does it not break the entire site? Are users happy?
  48. 48. By deploying early and often, we’re also limiting risk. The fewer changes that go out in a single deploy, the fewer things there are that can possibly break. By waiting to deploy, you’re accumulating a larger set of changes to deploy, and therefore there’s more surface area to debug if it breaks.
  49. 49. In a way, you can consider undeployed code a liability. Imagine spending a day or two doing some code cleanups to get ready for a sprint. Should you deploy when you are done and happy with the refactorings, or should you go ahead and do your sprint? If it were me, I’d deploy the refactorings first. That way, the code is out there, and you’ll know if it performs equally to its nonrefactored version. It’s really easy to introduce performance-killing changes in even a few-line diff. If you instead wait and deploy with new features and anything goes awry, you have significantly more code to spelunk through to track down a potential problem.
  50. 50. Feeling Driven Development Antipattern
  51. 51. Oh feelings.
  52. 52. The front page feels slow
  53. 53. The primary key seems like it’s increasing rapidly
  54. 54. IO seems high
  55. 55. What does it even mean? This drives me nuts. By saying something ‘feels’ slow, there’s an implied assumption. The assumption is that it should be fast. Saying it like that is...weird, because it gives no indication of what is slow or not. The trick is in determining what the assumption is, and then finding a way to measure and identify the problem. How can we do this?
  56. 56. Science Driven Development Counterpattern
  57. 57. Metrics everywhere! With the right tools, you can easily be continuously collecting data so you have it in your pocket when you need it.
  58. 58. • New Relic - http://newrelic.com • Scout - http://scoutapp.com These are the two we use and highly recommend. New Relic is really great for giving a high level view of your application. We’re talking at the request response level, including all sorts of fun maths with most time consuming requests, highest standard deviation, etc. It also breaks down requests by where time is spent. Like if it’s all in the view, the controller, the database, partials, etc. Scout is useful for other reasons. While New Relic is good for high level understanding of your application, Scout is a bit more low level. You can use it to collect metrics about your servers, and how well they are running. Memory, CPU, disk space, IO, mysql connection stats, and so on. I really believe these are a great combination, because New Relic can point you in the direction of a problem area, and Scout can help you better understand what’s contributing to it at a system level.
  59. 59. The front page feels slow. The front page is taking 10 seconds to load, but we really need it to be loading in under 1 second
  60. 60. The primary key seems like it’s increasing rapidly. The primary key is at 90% of its maximum, up from 80% yesterday, and looks like it’ll run out overnight.
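The primary key example is a straight-line extrapolation, which can be sketched as follows. The 80% and 90% readings are the slide’s; the 32-bit signed maximum, the assumption that growth stays linear, and the method names are additions made for illustration only.

```ruby
# Linear extrapolation of primary key usage. The 80% and 90% readings are
# the slide's example; the 32-bit signed maximum is an assumption about
# the column type.
INT_MAX = 2**31 - 1

# Fraction of the key space used by the highest id seen so far.
def usage(max_id, limit = INT_MAX)
  max_id.to_f / limit
end

# Days until the key space runs out, assuming growth stays linear.
# Returns nil when usage is flat or shrinking.
def days_until_exhausted(usage_yesterday, usage_today)
  growth_per_day = usage_today - usage_yesterday
  return nil unless growth_per_day > 0
  (1.0 - usage_today) / growth_per_day
end

puts days_until_exhausted(0.80, 0.90).round(2)
```

Going from 80% to 90% in a day leaves about one more day of headroom, which is the kind of number that turns "seems like it’s increasing rapidly" into something you can act on.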
  61. 61. IO seems high. IO fluctuates up to 90% sometimes, but doesn’t appear to have a negative effect
  62. 62. Monitoring Topic
  63. 63. How do you know when everything is awful?
  64. 64. How would you prefer to know? • Angry tweets • Angry email from your boss • You personally checking everything all the time • An automated system to let you know
  65. 65. • Nagios • Scout
  66. 66. What to monitor. It’s not a problem til it’s a problem
  67. 67. Define priority. Does it wake someone up?
  68. 68. Must be actionable
  69. 69. Single point of contact. If everything is awful, there needs to be a single point of contact. They take point, acknowledge and begin looking into it. If need be, bring on others
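The three monitoring rules (define priority, must be actionable, single point of contact) can be sketched as a tiny routing table. Everything here is hypothetical: the check names, the suggested actions, and the `Check`, `actionable`, and `route` names are invented for the example and are not taken from Nagios or Scout.

```ruby
# Hypothetical alert routing; check names and actions are illustrative.
Check = Struct.new(:name, :priority, :action)

CHECKS = [
  Check.new("site returns 500s", :page,  "roll back the last deploy"),
  Check.new("disk 80% full",     :email, "rotate logs or add space"),
  Check.new("IO spikes to 90%",  :email, nil) # no known action yet
].freeze

# Only actionable checks are worth alerting on.
def actionable(checks)
  checks.select { |c| c.action }
end

# Decide how a failing check reaches the single on-call contact.
# :page wakes someone up; anything else waits for business hours.
def route(check, on_call)
  channel = check.priority == :page ? "PAGE" : "email"
  "#{channel} #{on_call}: #{check.name} -> #{check.action}"
end

actionable(CHECKS).each { |c| puts route(c, "on-call engineer") }
```

The IO check is deliberately dropped: as in the earlier IO example, a metric that fluctuates without a negative effect has no action attached, so alerting on it would only add noise.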
  70. 70. Vertical scaling Pattern
  71. 71. Your app is slow Now what?
  72. 72. Resources are (relatively) cheap
  73. 73. Developers are (relatively) expensive
  74. 74. Imagine having memory issues.
  75. 75. As always there’s a balance. Remember, it’s a tradeoff to optimize for developer time by vertically scaling. It buys you time to either deal
  76. 76. Hipster Stack Antipattern
  77. 77. “I read a blog post about how mongo is totally web scale”
  78. 78. Cargo cult operations
  79. 79. Remember what’s important for the business? Do you want to become the expert at <insert technology here>? Is it really the most valuable thing you can be doing?
  80. 80. If you’re still going to go hipster... • experiment in branches • understand operational impact • Staging!
  81. 81. Test in production Wait, what?
  82. 82. Further Reading • Web Operations - John Allspaw and Jesse Robbins: http://www.amazon.com/Web-Operations-Keeping-Data-Time/dp/1449377440/ref=sr_1_1?s=books&ie=UTF8&qid=1314447411&sr=1-1 • Continuous Delivery - Jez Humble and David Farley: http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912/ref=sr_1_4?s=books&ie=UTF8&qid=1314447411&sr=1-4 • “Web Operations for Developers 101”: http://www.paperplanes.de/2011/7/25/web_operations_101_for_developers.html
  83. 83. Fin.
  84. 84. Want to talk ops? find me here josh@railsmachine @techpickles
  85. 85. Do you like these things? • Rails • Operations • Ping Pong • Beer. We are hiring
