Devops down-under

Published in: Technology, Sports

  1. 1. DevOps Down Under 2011: Sprinkling DevOps Magic in Other People's Environments, by Robert Postill
  2. 2. How's This Gonna Go Down?
  3. 3. How's This Gonna Go Down? <ul><li>Everybody's got a story and this is ours
  4. 4. Our first architecture
  5. 5. Learning from failure
  6. 6. A brief aside
  7. 7. Getting better
  8. 8. Tough messages
  9. 9. Where to from here </li></ul>
  10. 10. C3's Story <ul><li>There was a dream
  11. 11. It makes the Excel go into the data warehouse
  12. 12. And it's done badly
  13. 13. So we built a prototype
  14. 14. Then we made a sale </li></ul>
  15. 15. A little bit of how it works
  16. 16. Priorities of our first architecture <ul><li>Works!
  17. 17. Restarts when the machine restarts
  18. 18. Remotely deploy updates
  19. 19. Not a lot of state on the VM </li></ul>
  20. 20. Our first architecture
  21. 21. Our first architecture
  22. 22. Lesson: Most customers will accept a small selection of services if you give them reports from those services
  23. 23. create_deployment.sh <ul><li>Poor man's Capistrano
  24. 24. A shell script that: </li><ul><li>Fetched the latest from GitHub
  25. 25. Exported it to a datestamped directory
  26. 26. Made a set of symlinks point to the right places
  27. 27. Restarted the app </li></ul></ul>
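The four steps above can be sketched roughly as follows. This is not the original create_deployment.sh, which was a shell script that the slides do not show. It is a minimal Ruby sketch, and the repository URL, directory layout, single `current` symlink and restart command are all assumptions made for illustration (the slides mention a set of symlinks, not just one).

```ruby
# Minimal sketch of a "poor man's Capistrano" deployment plan.
# Every path, URL and command below is hypothetical.

# Build the ordered list of commands for one deployment.
def deployment_plan(repo_url:, base_dir:, restart_cmd:, now: Time.now)
  stamp   = now.strftime("%Y%m%d%H%M%S")            # datestamped release name
  release = File.join(base_dir, "releases", stamp)
  current = File.join(base_dir, "current")
  [
    # 1. Fetch the latest code (a shallow clone stands in for fetch + export)
    ["git", "clone", "--depth", "1", repo_url, release],
    # 2. Drop the repository metadata so the release is a plain export
    ["rm", "-rf", File.join(release, ".git")],
    # 3. Repoint the 'current' symlink at the new release
    ["ln", "-sfn", release, current],
    # 4. Restart the app (the command is supplied by the caller)
    restart_cmd,
  ]
end

# Print each command; execute it only when dry_run is false.
def run_plan(plan, dry_run: true)
  plan.each do |cmd|
    puts cmd.join(" ")
    next if dry_run
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
end
```

Calling `run_plan(deployment_plan(repo_url: "git@github.com:example/app.git", base_dir: "/srv/app", restart_cmd: ["service", "example-app", "restart"]))` prints the plan without touching the machine. Keeping each release in its own datestamped directory means a rollback is just a matter of repointing the symlink.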
  28. 28. Flaws <ul><li>We knew practically nothing about what was happening on the box
  29. 29. The logs... THE LOGS FIX THOSE FREAKING LOGS!!! </li></ul>
  30. 30. And the worst flaw of all... <ul><li>We started to get calls that started with: </li><ul><li>“Integrity’s down, what's the score?” </li></ul><li>Then we'd have a look...
  31. 31. And it would be the database </li></ul>
  32. 32. Lesson: Things you don't own go badly wrong and the first people to know are the end users
  33. 33. A lot of sad face
  34. 34. So we revved the architecture
  35. 35. Then more stuff happened... <ul><li>We continued to get calls that started with: </li></ul>“Integrity’s down, what's the score?” <ul><li>Then we'd have a look...
  36. 36. And it would be the VM, with its disks mounted read-only </li></ul>
  37. 37. Lesson: Virtual Machines are prone to at least a couple of novel modes of failure
  38. 38. Which started to lead to the inevitable
  39. 39. So the next problem... Us <ul><li>New Relic gives you slow transaction reports
  40. 40. In Ruby, select, collect and friends are ways of making in-memory decisions over collections of things
  41. 41. Which works on test set sizes of ten or so
  42. 42. But doesn't on large volumes of things, like say a couple of million objects
  43. 43. We'd created a technical debt mountain </li></ul>
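To make the select/collect point concrete, here is a small sketch of the in-memory side of the problem. The `Reading` type and the generated data are hypothetical, and the slides do not say how the team fixed its own code. In a database-backed app the usual remedy is to push the filter into the query, which this sketch does not attempt.

```ruby
# Hypothetical record type standing in for an application object.
Reading = Struct.new(:id, :value)

# Eager chain: select builds one full array, collect builds a second.
# Fine for ten objects, costly for a couple of million.
def eager_ids(readings, threshold)
  readings.select { |r| r.value > threshold }.collect(&:id)
end

# Lazy chain: elements flow through one at a time and the work stops
# as soon as `limit` matches have been found.
def lazy_ids(readings, threshold, limit)
  readings.lazy.select { |r| r.value > threshold }.collect(&:id).first(limit)
end

# Generates readings on demand instead of holding them all in memory.
def reading_stream(count)
  Enumerator.new do |yielder|
    count.times { |i| yielder << Reading.new(i, i % 100) }
  end
end
```

With `lazy_ids(reading_stream(2_000_000), 98, 2)` only the first couple of hundred readings are ever created, whereas the eager version would need the whole collection in memory before it could answer.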
  44. 44. Hiring someone new
  45. 45. A brief trip to the metaworld <ul><li>We're devops by necessity
  46. 46. There is no ops department
  47. 47. Our devs cover a lot of ground </li><ul><li>Architecture
  48. 48. Operations
  49. 49. Database Administration
  50. 50. Networking
  51. 51. Support
  52. 52. Business Analysis </li></ul></ul>
  53. 53. Behold the AnDevOpSuptecht <ul><li>It used to be that a lot of places had Systems Programmers
  54. 54. Now it feels like architects are going the same way
  55. 55. Where's the limit going to be drawn on the responsibility of an individual...
  56. 56. Are we thinking about the roles we play in the wrong way? </li></ul>
  57. 57. Crap Maths Applied To Recruitment <ul><li>Australian population: 21,874,900
  58. 58. Melbourne population: 3,478,138
  59. 59. 22.6% 'professionals' in 2006 census: 786,059
  60. 60. Professionals in 'information, media and telecoms': 14,246
  61. 61. Spolsky says 1 in 200 dev applicants can dev, leaving: 712
  62. 62. TIOBE Index says Ruby is used by 1.484% of devs: 10 </li></ul>
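The funnel can be worked through step by step. The figures below are copied from the slide, and the helper name `narrow` is illustrative only. One thing the arithmetic shows: 1 in 200 of 14,246 comes to about 71, where the slide shows 712. The slide's final figure of 10 does follow from 712 × 1.484%. The slide does not make clear which number is off, so the sketch prints both readings rather than picking one.

```ruby
# Figures copied from the slide; constant and method names are illustrative.
MELBOURNE_POPULATION = 3_478_138
PROFESSIONAL_SHARE   = 0.226      # 'professionals' in the 2006 census
SECTOR_PROFESSIONALS = 14_246     # 'information, media and telecoms'
HIRE_RATE            = 1.0 / 200  # Spolsky's 1 in 200
RUBY_SHARE           = 0.01484    # TIOBE share quoted on the slide

# Apply one stage of the funnel.
def narrow(count, share)
  count * share
end

professionals   = narrow(MELBOURNE_POPULATION, PROFESSIONAL_SHARE).round
can_dev         = narrow(SECTOR_PROFESSIONALS, HIRE_RATE)
ruby_from_calc  = narrow(can_dev, RUBY_SHARE)
ruby_from_slide = narrow(712, RUBY_SHARE)

puts "Professionals in Melbourne:      #{professionals}"
puts "1 in 200 of the sector:          #{can_dev.round}"
puts "Ruby devs from that figure:      #{ruby_from_calc.round(1)}"
puts "Ruby devs from the slide's 712:  #{ruby_from_slide.floor}"
```

Either way the conclusion of the slide stands: the pool is tiny before team fit, seniority or skills are even considered.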
  69. 69. So... <ul><li>Before we look into </li><ul><li>Team fit
  70. 70. Seniority
  71. 71. Skills (Ubuntu, Databases, Business intelligence...) </li></ul><li>I need a lie down :(
  72. 72. Congratulations to those of you in Melbourne who do hire devops!
  73. 73. Do we need to think about apprenticeships? </li></ul>
  74. 74. Lesson: You need good people, really good people
  75. 75. Meanwhile, back at the point...
  76. 76. Looking To Get Smart <ul><li>We wanted to start deploying to larger numbers of machines (> 10)
  77. 77. We needed a way to start automating deployment
  78. 78. Have you seen this Chef thing?
  79. 79. So we started creating recipes </li></ul>
  80. 80. But we had issues <ul><li>I don't want to beat up on Chef
  81. 81. The development of our architecture was *much* slower through Chef
  82. 82. We lost our Chef database
  83. 83. We tried to run Chef server internally on two instances
  84. 84. We spent a lot of time learning things like never use the UI, only ever use data bags
  85. 85. Chef changed too fast and we also changed too fast </li></ul>
  86. 86. Lesson: The tools may not be mature enough and more importantly you may not be mature enough to use them
  87. 87. So now we... <ul><li>Take a stock Ubuntu VM
  88. 88. Customise via Capistrano scripts
  89. 89. Snapshot, distribute
  90. 90. Update via Capistrano and create_deployment.sh
  91. 91. Distribute SSH keys via Chef </li></ul>
  92. 92. And the customers kept on ringing <ul><li>In particular there was the terrible case of the wild performance swings
  93. 93. New Relic would show us 6x, 4x, even 12x performance swings depending on the week.
  94. 94. We'd see CPU spikes and terrible loads applied to the mongrels as users got frustrated </li></ul>“Integrity’s slow, what's the score?” <ul><li>And we'd see... not much </li></ul>
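A swing such as "6x" is simply a ratio against a baseline. The sketch below is one way to turn weekly response-time figures into factors that can be shown to a customer. The week labels and millisecond values used to exercise it are invented for illustration and are not data from the talk, and taking the best week as the baseline is an assumption.

```ruby
# Express each week's response time as a multiple of the best week.
# Input: a hash of week label => typical response time in milliseconds.
def swing_factors(weekly_ms)
  baseline = weekly_ms.values.min.to_f
  weekly_ms.transform_values { |ms| (ms / baseline).round(1) }
end

# Return only the weeks whose factor reaches the given threshold.
def bad_weeks(weekly_ms, threshold)
  swing_factors(weekly_ms).select { |_week, factor| factor >= threshold }.keys
end
```

For example, `swing_factors("week 1" => 200, "week 2" => 1200)` reports week 2 as a 6.0x swing. When the deployed version has not changed between those weeks, figures like these are the evidence that points the conversation at the environment.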
  95. 95. And that got difficult <ul><li>We had to start asking for VMware metrics
  96. 96. Our working assumption was that the same version of the app does not pitch and roll like this
  97. 97. Let's be honest: what we're saying is “we don't think you can manage your own infrastructure”
  98. 98. Explicitly :( </li></ul>
  99. 99. A lot of thinking...
  100. 100. Little by little we ground out answers <ul><li>We found out there wasn't a lot of separation between VMs
  101. 101. Then we found out the VMs were being moved between different physical hosts (vMotion)
  102. 102. And then we started to get a handle on overcommitment </li></ul>
  103. 103. Lesson: Smart tools can play havoc with performance
  104. 104. Lesson: VMware (or its competitors) is not a magic well
  105. 105. Where we are now
  106. 106. Where we are now
  107. 107. There's plenty for us still to do <ul><li>Retire create_deployment.sh
  108. 108. Automate deployment
  109. 109. Refactor the architecture to give us scalability over numerous machines
  110. 110. Deploy to only part of the architecture
  111. 111. Deploy based on need </li></ul>
  112. 112. Wrapping Up <ul><li>Pushing your stuff into other people's environments is hard
  113. 113. Back yourself with the stats and share them
  114. 114. Make sure your app has sufficient canaries
  115. 115. Find good people
  116. 116. Prepare for tough conversations </li></ul>
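As an illustration of the "canaries" point, the sketch below checks the two failures described earlier in the talk: a disk that has been remounted read-only and a database that has stopped answering. It is an assumption about what such a canary could look like, not the checks the team actually ran. The method names, the default directory and the idea of probing the database with a plain TCP connect are all hypothetical.

```ruby
require "socket"
require "tmpdir"

# True if a small file can be created and removed in `dir`.
# A read-only mount raises Errno::EROFS, which is a SystemCallError.
def disk_writable?(dir = Dir.tmpdir)
  path = File.join(dir, "canary-#{Process.pid}-#{Time.now.to_i}")
  File.write(path, "ok")
  File.delete(path)
  true
rescue SystemCallError
  false
end

# True if a TCP connection to host:port succeeds within `timeout` seconds.
# This only shows that something is listening, not that queries work.
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

# Collect the canary results into one hash that can be logged or alerted on.
def canary_report(db_host:, db_port:, data_dir: Dir.tmpdir)
  {
    disk_writable: disk_writable?(data_dir),
    database_reachable: port_open?(db_host, db_port),
  }
end
```

Run on a schedule, a report like this lets the team hear about a dead database or a read-only disk before the end users do.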
  117. 117. Questions? <ul>Photo credits (in order of appearance): <li>http://www.flickr.com/photos/ricoslounge/38351363/ - ricoslounge
  118. 118. http://www.flickr.com/photos/jima/3435396513/ - jima
  119. 119. http://www.flickr.com/photos/34495711@N06/3613301938/ - Aaron Frutman
  120. 120. http://www.flickr.com/photos/dancoulter/21042744/ - Dan Coulter
  121. 121. http://www.flickr.com/photos/abennett96/2639105060/ - BenSpark
  122. 122. http://www.flickr.com/photos/bcymet/1923368669/ - bcymet </li></ul>
