Agile Operations - Xpdays France 2009


Explaining why developers and operations need to work together.

Published in: Technology, Business

  1. 1. Agile and Operations for people with good taste. Gildas Le Nadan - Patrick Debois - Xpdays France 2009
  2. 2. Gildas Le Nadan: Gildas comes from France
  3. 3. He has been working on servers for several years
  4. 4. Patrick Debois: Patrick comes from Belgium
  5. 5. He works as a freelancer, always looking for good opportunities
  6. 6. We both come from sysadmin land
  7. 7. But we have been looking with great interest at the agile developer community
  8. 8. Finally we decided to give our own version of agile for operations
  9. 9. Today we have two stories for you
  10. 10. The first one is about the impact of agile development on operations
  11. 11. The second one is what it means to do agile in Operations
  12. 12. So letʼs start with the first great story. It starts off like a beautiful project
  13. 13. Once upon a time there was a team that had a lot of Agile Developers with good intentions
  14. 14. They had prepared all their frameworks and IDEs
  15. 15. They had all the tools they needed
  16. 16. They even had experts on Usability
  17. 17. They worked in close cooperation with the customer
  18. 18. They worked hard
  19. 19. Using their Post-its in abundance
  20. 20. They dutifully continued holding their standup meetings
  21. 21. They monitored their progress using a backlog
  22. 22. And this is the result after their first sprint. A fully working product!
  23. 23. And the second sprint again, but still room for a lot of improvement
  24. 24. They worked and they worked, but after a while they realized ...
  25. 25. That they were administering their systems in an ad hoc manner and that they were not super sysadmins
  26. 26. So they called in a sysadmin to fix it
  27. 27. But it turned out their development environment ...
  28. 28. was quite different from their test environment.
  29. 29. and nowhere near the production environment!
  30. 30. They were still using manual installations
  31. 31. The sysadmins also cleaned up the development and test environments, providing a good base for further development
  32. 32. While doing so, Operations followed their official ITIL guidelines.
  33. 33. While they cleaned things up, a few testers found some bugs
  34. 34. But overall most clients were happy with the quality
  35. 35. But it still felt that something was missing. It required more tasting, uh, testing by a senior person
  36. 36. There was a real nasty bug that came up once in a while
  37. 37. But the problem was hard to catch
  38. 38. Eventually they nailed it
  39. 39. Confident now, they decided to make their first public release
  40. 40. There were some minor usability problems.
  41. 41. But they were easily solved by the project team with a temporary fix
  42. 42. They created some workarounds
  43. 43. Delivering a great result
  44. 44. Inspired by the success, Marketing wanted to put out lots of new features
  45. 45. But then the Operations team shouted NOOOOOO!!!!!
  46. 46. because in reality, things started to get ugly in production
  47. 47. At first they had dealt with it by applying some emergency patches
  48. 48. But now things got REALLY ugly
  49. 49. Customers were experiencing bad response times
  50. 50. When they finally activated the logging, carefully checking the impact, the logs were full of useless debug messages
  51. 51. Eventually it turned out that the usage mix was different: real users used it synchronously, but the requests from the APIs were asynchronous
  52. 52. So operations decided to put this in the FAQ
  53. 53. And they put a ticket system in place
  54. 54. They installed larger servers
  55. 55. But the platform stayed fragile and required frequent restarts
  56. 56. One of the classic problems is that projects think they are the only one
  57. 57. But this project was not the only one operations had to support
  58. 58. They also had to document things, so as not to lose any information
  59. 59. because, when the project was finished, some developers were assigned maintenance fixes. At some point, nobody from the original team was left, and junior staff were trained to step in
  60. 60. The development team had the following view on the subject
  61. 61. Some people, probably the most senior, had a broader view of the platform that operations was running.
  62. 62. In reality, it looked this way
  63. 63. Ops giving specifications: To avoid this kind of surprise in the future, they invited operations people to the design phase. This way they could pass on their knowledge of the production environment, and it was written down in the request for proposals.
  64. 64. Ops wrote down every requirement they could think of
  65. 65. But this Big Design UpFront resulted in overcomplicated, overdesigned and over-engineered solutions.
  66. 66. The solution seems to be to integrate operations IN the project, both at the beginning and throughout. So in good times and bad...
  67. 67. Because these people will constantly think about ... logs
  68. 68. They will check that sizing is done correctly
  69. 69. They will think of emergency procedures
  70. 70. Make sure parallel processing works
  71. 71. That your applications are packaged nicely
  72. 72. That your data can be archived and that the backup AND restore works
  73. 73. Take the necessary security measures
  74. 74. Think of good deployment tools
  75. 75. They will think about reporting. Find relations with other systems. Think of the reports management will request for SLA reporting
  76. 76. In the end everybody will be proud of what they prepared
  77. 77. And that includes the serving staff as well!
  78. 78. If you are now thinking: yeah, but Iʼm in another business
  79. 79. You will always require some kind of log files
  80. 80. You will always need infrastructure
  81. 81. More good tools
  82. 82. Someone who needs to deal with angry customers
  83. 83. Good End User Manuals
  84. 84. The need for archiving
  85. 85. Cleanup Routines
  86. 86. Dealing with capacity peaks
  87. 87. Monitoring the health of your systems
  88. 88. Someone who takes care of supplies to keep your systems going
  89. 89. Hopefully you will see the light in the end
  90. 90. But of course, disasters can still happen!
  91. 91. Agile Manifesto: OK, the operations team needs to be agile, and it needs to be integrated into the project. How would the Agile Manifesto apply to YOUR work as an operations team member?
  92. 92. We value the items on the left more than the items on the right
  93. 93. Individuals and interactions over processes and tools
  94. 94. ITIL vs. Agile: ITIL has a lot of practices for keeping things running. It used to be a change moderator, but as development becomes more agile we need to adapt. ITIL v3 has introduced the notion of continuous improvement too.
  95. 95. Operations as a cost centre: An increase in maturity can bring value (Gartner scale): 0 Ad hoc, 1 Reactive, 2 Proactive, 3 Service, 4 Value
  96. 96. There’s no magic tool that can save you from bad organization. It still requires you to think!
  97. 97. Working software over comprehensive documentation: Working means working in operation (scope problem, Dev) / working service
  98. 98. Customer collaboration over contract negotiation: Who’s the customer? (Internal, external / different ASP, normal company, internal support)
  99. 99. Responding to change over following a plan: Operations has been doing this for years. Every incident / issue requires us to react / adapt things
  100. 100. Avoid the “Big Design UpFront”.
  101. 101. Our highest priority is to satisfy the customer through early and continuous delivery of valuable software. Our highest priority is to satisfy the customers: end users, but also developers. What is early for the customer? 4 days for a server, 2 minutes for a new account? What is value for the customer?
  102. 102. Risk Mgt: DEV / Project = creating value; loss of value (protect value) = OPS
  103. 103. Welcome changing requirements even late in development. Agile processes harness change for the customer’s competitive advantage
  104. 104. Ops are often very resistant to change. Business might require constant adaptation.
  105. 105. Deliver working software frequently, from a couple of weeks to a couple of months, with a preference for the shorter timescale. Do things often so you get better at them.
  106. 106. Avoid Big bang migrations. Go in small steps.
  107. 107. Business people and developers must work together daily throughout the project
  108. 108. Have operations in your project and afterwards. In good and bad times...
  109. 109. Build projects around motivated individuals. Give them the environment and the support they need and trust them to get the job done. Different environments: dev, test, preprod, training, trial, prod, qa. Do you trust your developers to do deployment? Do you have any secrets/superpowers they don’t have?
  110. 110. The most efficient method of conveying information to and within a development team is face-to-face conversation. Don’t lock yourself in a small room with only email communication!
  111. 111. Working software is the primary measure of progress. If things work, and people are satisfied, you’re doing a good job!
  112. 112. Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely. Shared projects. Specialists. On call + daily job. Extend deployment power beyond the ops team to spread the load.
  113. 113. Continuous attention to technical excellence and good design enhances agility. Keep your skills sharp! You never know who is looking at you.
  114. 114. Scalability: Think of scalability
  115. 115. Manageability (start, stop subparts / monitor progress)
  116. 116. Maintainability: Maintainability = changing the text depending on the environment that changes
  117. 117. Securability
  118. 118. Reliability
  119. 119. Flexibility
  120. 120. Why is it important? Ops has limited control over the elements they need to integrate or take care of
  121. 121. Loose coupling [diagram labels: Loose Coupling, Noodle Soup]
  122. 122. Butterfly effect
  123. 123. KPI and Monitoring
  124. 124. Simplicity -- the art of maximizing the amount of work not done. Don’t over-engineer; be pragmatic
  125. 125. Simplicity: Design issues: Keep It Simple, Stupid (KISS). Donʼt over-cluster, loop networks, ...
  126. 126. The best architectures, requirements and designs emerge from self-organizing teams. Use the tools you can adapt to your needs as you require them, not because they have good marketing.
  127. 127. Closed Software, Closed Hardware: Avoid closed source software or closed appliances
  128. 128. Multiple Projects: One Product Owner? = Program Manager. Be clear on who your customers are: your boss, project manager(s), tickets?
  129. 129. Incidents vs Projects: Avoid being a shared resource who picks up the phone, takes complaints, and takes on new projects, so you can commit to your work better.
  130. 130. Pair System Administration: Operations decided to go for pair system administration, for projects but also for incidents. Learning, spreading the knowledge (vs. specialist / hero culture)
  131. 131. Continuous Improvement: Burndown charts vs QoS. Target, no absolute / estimation. Always try to improve yourself
  132. 132. Virtualized Hardware: Go virtual on your hardware. Stop your emotions ;-)
  133. 133. Automated Deployment: Automate the things that you don’t want to do over and over again. DRY: don’t repeat yourself
  134. 134. Config Mgt: Version control your stuff; use tools like Puppet or Chef instead of custom scripts
  135. 135. Doing Incremental Steps: Work in small steps. Changes in configurations: better traceability
  136. 136. Refactoring: You need to correct mistakes.
  137. 137. Test Driven Administration: Be sure that you can test/monitor what you need to have things working. Otherwise you are blind when changes happen.
  138. 138. Trend analysis for better prediction of when things will fail
  139. 139. Even if the project finishes, the environment will change (patches, new hardware). So you need to be able to test
  140. 140. Sometimes cleaning is easy. But not if there are legacy systems with lots of dependencies or no clear owner
  141. 141. Work together on your team’s Continuous Integration system. You will learn a lot
  142. 142. So the next time youʼre celebrating a new project release
  143. 143. Maybe youʼll remember us
  144. 144. Make Operations Fun Again: So that your operations team will be happy
  145. 145. More? IEEE-paper:
  146. 146. Thank you for listening!
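
Slide 138 suggests trend analysis to predict when things will fail. The slides do not say how, so the following is only a minimal sketch under stated assumptions: it fits an ordinary least-squares line through daily disk-usage readings and extrapolates to the day the disk would be full. The function name `days_until_full`, the daily sampling interval, and the sample values are all hypothetical and chosen for illustration.

```python
from typing import Optional, Sequence


def days_until_full(usage_percent: Sequence[float],
                    limit: float = 100.0) -> Optional[float]:
    """Estimate how many days after the last sample usage reaches `limit`.

    Assumes one sample per day (hypothetical sampling interval).
    Returns None when the fitted trend is flat or decreasing.
    """
    n = len(usage_percent)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    xs = range(n)  # day index: 0, 1, 2, ...
    mean_x = sum(xs) / n
    mean_y = sum(usage_percent) / n

    # Ordinary least-squares slope and intercept of usage over time.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(xs, usage_percent))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if slope <= 0:
        return None  # no growth, so the line never reaches the limit

    day_full = (limit - intercept) / slope  # day index where the line hits limit
    return max(0.0, day_full - (n - 1))     # measured from the last sample


# Hypothetical readings: disk usage in percent, one per day.
samples = [50.0, 52.0, 54.0, 56.0, 58.0]
print(days_until_full(samples))  # prints 21.0
```

A straight line is the simplest possible model; real usage is rarely linear, so a result like this is better treated as an early warning to investigate than as a firm deadline.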