- Assume everybody here has heard of the project
- The broad aim is to give Australia a world-class broadband network for generations
- NBN is a wholesaler of an Ethernet layer 2 network, with many, many retailers
- Three different technologies in use: 93% fibre, 4% fixed wireless, 3% satellite
- Look at our webpage: it contains news updates well justified with factual data and lots of interesting technical detail
- Small part of the overall organization (approx 5% - 10%)
- Internal development organizations founded with Agile principles in mind
- Very much a DevOps style shop, responsible for the end-to-end delivery process (all the way to PROD)
- We started with a virtual blank slate
- We had an externally hosted basic website and a payroll system
- Our brief was to build an exciting web presence and all the externally facing applications needed to accept Orders and Tickets for a Telco
- Build a classic three-layer infrastructure with an external hosting provider
- Multiple copies of the environment: CI, Dev, Staging, Prod (80 machines)
- Management pressure to build it really fast!
- We felt the direction we were headed was towards ‘snowflake servers’ and we wanted to avoid it
- Puppet to the rescue!
- I’m going to tell you about our journey with Puppet
- The mistakes we made, the breakthroughs in understanding we had
- This stuff is not an NBN competitive advantage, we’re happy to share and learn from our mistakes
- Machine population is segregated into groups to provide isolation for your production environment
- Method of segregation was initially by splitting the manifest top-level directory per environment
- Puppet agent installed as a daemon and running with a 30 minute period
- This process was working to a certain definition of ‘working’, but definitely not optimal
- But not 100% confident that our lower layer environments closely reflect what’s in production
- A lot of time was spent copying information from one directory to another and manually repeating processes
- Analysis of what is different across environments using a diff tool
- Not building snowflake servers, our servers were closely related (fruit)
- Not DRY
- The promotion process is reliant on a lot of manual processes
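
The cross-environment comparison described above can be sketched in a few lines. This is only an illustration, assuming each environment has its own manifest directory; the function name `environment_drift` and the directory names are hypothetical, not the tooling we actually used.

```python
import filecmp
from pathlib import Path
from typing import List


def environment_drift(left: Path, right: Path) -> List[str]:
    """List the relative paths that differ between two manifest trees."""
    drift: List[str] = []

    def walk(comparison: filecmp.dircmp, prefix: Path) -> None:
        for name in comparison.left_only:
            drift.append(f"only in {left.name}: {prefix / name}")
        for name in comparison.right_only:
            drift.append(f"only in {right.name}: {prefix / name}")
        # dircmp compares files shallowly by default (type, size, mtime),
        # and only reads contents when those signatures differ.
        for name in comparison.diff_files:
            drift.append(f"differs: {prefix / name}")
        # Recurse into directories present on both sides.
        for name, sub in comparison.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(left, right), Path("."))
    return sorted(drift)
```

Calling `environment_drift(Path("dev"), Path("prod"))` returns one line per file that exists on only one side or differs between the two trees. Every line in that list is something a person had to reconcile by hand, which is the cost of not being DRY.
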
- We were better off than we would have been cowboying changes straight onto the box with SSH
- But we still aren’t a well-oiled machine
- At this point we decided that we needed to be better and kicked off a big refactor of our work
- Our operations team working very closely with our development team
- Lots of exchange of ideas
- Our Dev groups were very keen on implementing Continuous Delivery
- We decided that we’d do this with our virtual machine setup too
- Follow the same patterns as software development
- Multiple environments: Dev, UAT, Staging, Prod
- We were getting divergence between our different environments due to manual merge processes
- Change from distinct directories per environment to a single manifest directory shared by all environments
- This is a big change that we wanted to do safely
- We’ve still got infrastructure changes that need to be made each day
- Need to establish a red/green refactor cycle with good test coverage
- Make good behaviour for your developers easy
- Fast feedback loop
- Single top-level script that runs the test suite locally
- Spawns lots of parallel processes running the catalogue compile for every node
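
A minimal sketch of such a top-level script, assuming one compile command per node. The names `compile_all`, `failing_nodes` and `placeholder_command` are hypothetical; `placeholder_command` only stands in for the real catalogue compile so the runner can be tried without a Puppet installation.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


def placeholder_command(node: str) -> List[str]:
    """Hypothetical stand-in for the real catalogue compile command.

    Exits non-zero for any node whose name starts with "broken".
    """
    exit_code = 1 if node.startswith("broken") else 0
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


def compile_all(
    nodes: Sequence[str],
    build_command: Callable[[str], List[str]] = placeholder_command,
    max_workers: int = 8,
) -> Dict[str, bool]:
    """Run one compile process per node in parallel; map node -> success."""

    def compile_one(node: str) -> Tuple[str, bool]:
        result = subprocess.run(build_command(node), capture_output=True, text=True)
        return node, result.returncode == 0

    # Each thread just waits on its child process, so the compiles overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(compile_one, nodes))


def failing_nodes(results: Dict[str, bool]) -> List[str]:
    """Names of the nodes whose compile failed, sorted for stable output."""
    return sorted(node for node, ok in results.items() if not ok)
```

The run is green when `failing_nodes(compile_all(nodes))` is empty and red otherwise. Swapping `placeholder_command` for a function that builds the real compile command is the only change needed to point it at actual manifests.
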
- Now using the version control system the right way
- Easy traceability of who changed a recipe, when and why
- To change a recipe you don’t have to repeat yourself lots of times and do lots of manual steps
- Syntax checking is not enough!
- More specific functional testing
- Over time we added more tests to catch the repeated errors we were making
- Extends the safety net so that you fail early (it breaks in Dev, not in PROD)
- CI server runs our build system with every check-in to source control
- The team learns about a mistake as soon as it is made, not some indeterminate time later
- Refactor safely completed
- Celebration!
- Environment promotion pipeline
- NB: version number is fastest changing in lower environments
- Only ‘good’ versions are available for promotion to higher environments
- Invert the mode in which the Puppet agent is run
- Instead of polling, the Puppet agent is invoked only when instructed
- We use MCollective for this
- Different environments are pulling different versions of the same manifest directory structure
- Dev has the newest version, Prod has the oldest version
- There is a class of errors that can only be detected by running directly on the server and testing the result
- The ‘--noop’ flag is the best way of ‘guessing’ what will really happen when you apply a manifest
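
The promotion rule can be sketched as a small model: each environment is pinned to one version of the manifests, and a version may only move up once it has been marked good in the environment below. This is an illustration only; the class name, method names and the environment order in `PIPELINE` are assumptions, not our actual tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Assumed order, lowest environment first.
PIPELINE = ["ci", "dev", "staging", "prod"]


@dataclass
class PromotionPipeline:
    """Tracks which manifest version each environment is pinned to."""

    pinned: Dict[str, int] = field(default_factory=dict)
    passed: Dict[str, Set[int]] = field(
        default_factory=lambda: {env: set() for env in PIPELINE}
    )

    def mark_good(self, env: str, version: int) -> None:
        """Record that `version` passed its checks while deployed in `env`."""
        if self.pinned.get(env) != version:
            raise ValueError(f"version {version} is not deployed in {env}")
        self.passed[env].add(version)

    def promote(self, env: str, version: int) -> None:
        """Pin `env` to `version` if it was marked good one level below."""
        position = PIPELINE.index(env)
        if position > 0:
            lower = PIPELINE[position - 1]
            if version not in self.passed[lower]:
                raise ValueError(f"version {version} has not passed in {lower}")
        self.pinned[env] = version
```

With this rule the lowest environment sees every version, while each higher environment only ever sees versions that survived the one below it, so the version number naturally changes fastest at the bottom of the pipeline.
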
- An agent to ‘kick’ the puppet daemon and run in --noop mode
- Kick the puppet agent for real and run using the environment to which you belong
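
The two kinds of kick differ only in whether `--noop` is passed. A minimal sketch of the command line each one would run, assuming the agent is invoked through the standard `puppet agent` CLI; the helper name `agent_command` is hypothetical and the MCollective agent that triggers it is not shown.

```python
from typing import List


def agent_command(environment: str, dry_run: bool) -> List[str]:
    """Build the one-off `puppet agent` invocation for a kick.

    --test runs the agent once in the foreground with verbose output,
    --environment selects the manifest environment to use, and
    --noop reports what would change without applying anything.
    """
    command = ["puppet", "agent", "--test", "--environment", environment]
    if dry_run:
        command.append("--noop")
    return command
```

For example, `agent_command("staging", dry_run=True)` gives the dry-run kick, and the same call with `dry_run=False` gives the real one.
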
- Give strong visual and immediate feedback on the results of a change
- Green banner == change passed tests
- Provide a centralized report that shows ‘what would change if I were to deploy this version’
- This one is not so good (the change failed)
- Provide detailed Puppet output leading the developer directly to the failing node and failure reason
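
A minimal sketch of how per-node results could be rolled up into such a report. The `NodeResult` structure, the `render_report` function and the banner text are hypothetical; the real report was built from Puppet’s own output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeResult:
    """Outcome of a dry run on one node (hypothetical structure)."""

    node: str
    pending_changes: List[str] = field(default_factory=list)
    error: Optional[str] = None


def render_report(version: str, results: List[NodeResult]) -> str:
    """Summarise a dry run: banner first, failures next, then pending changes."""
    failed = [r for r in results if r.error is not None]
    banner = "RED" if failed else "GREEN"
    healthy = len(results) - len(failed)
    lines = [f"[{banner}] version {version}: {healthy}/{len(results)} nodes OK"]
    # Failures come first so the developer lands on the failing node and reason.
    for r in failed:
        lines.append(f"  FAILED {r.node}: {r.error}")
    for r in results:
        if r.error is None:
            for change in r.pending_changes:
                lines.append(f"  would change on {r.node}: {change}")
    return "\n".join(lines)
```

The first line is the banner; everything under it answers either ‘why did it fail’ or ‘what would change if I deployed this version’.
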
- Surprise, the new DC is ready – we need you to move in before Christmas!
- Our process is incrementally changing the machines and it is very good at doing that
- Incremental consistency is no guarantee that you can rebuild from scratch
- We needed to develop new tools to help with this problem
- Use a virtual machine to impersonate a particular node and build it from scratch
- Vagrant + VirtualBox + Puppet is a great combination for learning!
- Provides new team members a safe environment to learn about the PROD manifests in detail
- Is an essential part of building a great team with knowledge shared broadly across members
- It’s your safety net, make sure you use it all the time
- Sometimes ‘urgent’ changes that you think can ‘easily’ be cowboyed straight onto the box aren’t as easy as you thought
- A 5 minute delay (the time to run through a well-configured pipeline) isn’t an inefficiency worth working without a safety net