Slides on the path that the Productivity Engineering team at Box took to move from bare-metal servers to a modern cloud platform consisting of OpenStack and AWS. The move was built on two open-source tools from HashiCorp: Packer and Terraform.
The Productivity Engineering team exists to make other engineers at Box more productive.
Hi everyone, my name is Nadeem and I am a Software Engineer on the Productivity Engineering Infrastructure team.
In Productivity Engineering, we manage our own infrastructure for our services. This includes the Jenkins cluster, our ClusterRunner nodes, and Forge & mergeq for our Rosie service.
I am going to talk about how we went from single-purpose, long-running bare-metal servers towards an elastic infrastructure.
So we decided to go cloud, for a more modern approach.
We had two options: build a private cloud or go directly to a public cloud provider.
Since we had invested in bare-metal servers, we did not want to just throw them away. We decided to keep using them until they reached end of life and build a private cloud on top of them. To move faster, we also decided to put any new growth in a public cloud platform.
Therefore, we decided to go with a hybrid approach with both a private and a public cloud.
For the private cloud, we decided to go with OpenStack, and initially for the public cloud we chose AWS.
However, we wanted the ability to swap AWS for Google in the future, if required.
…or to build across multiple cloud platforms.
In order to make this process as smooth as possible, we needed to develop an abstract system that would work across multiple cloud platforms.
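To make the idea of an abstract system concrete, here is a minimal Python sketch of a provider-neutral interface. It is not the system we built: `CloudProvider`, `FakeOpenStack`, `FakeAWS`, `deploy`, and `deploy_everywhere` are hypothetical names invented for illustration, and the fake providers return made-up instance IDs instead of calling any real cloud API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CloudProvider(ABC):
    """Hypothetical provider-neutral interface (an assumption, not a real API)."""

    name: str

    @abstractmethod
    def launch(self, image_id: str, count: int) -> List[str]:
        """Start `count` instances from an image and return their IDs."""


class FakeOpenStack(CloudProvider):
    name = "openstack"

    def launch(self, image_id: str, count: int) -> List[str]:
        # A real implementation would talk to the private cloud here.
        return [f"os-{image_id}-{i}" for i in range(count)]


class FakeAWS(CloudProvider):
    name = "aws"

    def launch(self, image_id: str, count: int) -> List[str]:
        # A real implementation would talk to the public cloud here.
        return [f"aws-{image_id}-{i}" for i in range(count)]


def deploy(provider: CloudProvider, image_id: str, count: int) -> List[str]:
    # Callers depend only on the interface, so one provider can be
    # swapped for another without changing this code.
    return provider.launch(image_id, count)


def deploy_everywhere(
    providers: List[CloudProvider], image_id: str, count: int
) -> Dict[str, List[str]]:
    # Building across multiple platforms is a loop over the same interface.
    return {p.name: deploy(p, image_id, count) for p in providers}
```

Swapping one provider for another, or adding a third, then means writing one more subclass rather than touching the callers.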
This is what such a system would look like:
First, you would continuously build fully baked images for your various types of services.
Then you would run some sort of verification on those images, and if the verification passes, the images would be “blessed”.
Finally, you would take those blessed images and turn them into any number of real instances.
Logically, you can think of the part that builds & verifies images as phase 1.
And the part that deploys real instances from those images as phase 2.
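The two phases can be sketched in a few lines of Python. This is an illustrative model only, under the assumption that an image carries a `blessed` flag: `Image`, `build_image`, `verify_and_bless`, and `deploy` are hypothetical names, and the build and deploy steps are stand-ins that do no real work.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Image:
    service: str
    version: int
    blessed: bool = False


def build_image(service: str, version: int) -> Image:
    # Phase 1, step 1: stand-in for a real image build.
    return Image(service=service, version=version)


def verify_and_bless(image: Image, checks: List[Callable[[Image], bool]]) -> Image:
    # Phase 1, step 2: the image is blessed only if every check passes.
    image.blessed = all(check(image) for check in checks)
    return image


def deploy(image: Image, count: int) -> List[str]:
    # Phase 2: only blessed images may become real instances.
    if not image.blessed:
        raise ValueError(f"{image.service} v{image.version} is not blessed")
    return [f"{image.service}-v{image.version}-{i}" for i in range(count)]
```

The point of the split is that phase 2 never has to trust an unverified image: the blessing is the only hand-off between the two phases.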
We also set a few requirements to gauge the success of the system.
First off, we wanted the system to be easy to use and have a simple interface.
In keeping with our mission of enabling developer productivity, we wanted the system to eventually be usable by the average developer to create any sandbox environment they need.