Maintained the deployment system for a 4,000-server environment at a leading Internet shopping comparison company in West Los Angeles.
Instrumental in bringing up two datacenters of about 300 servers each from scratch for a leading Internet advertising company in Irvine, including building the entire deployment system from metal to application.
The second datacenter, once all the physical hardware was there, took live traffic in less than two weeks from bare metal. </li></ul>
And, of course, self-deprecating humor. </li></ul>
Why a deployment system? <ul><li>If you have one server, you don't need one.
Deploying servers manually usually means many cycles of going back and forth with the application owners to validate the build. For each server. This eats up time and means bringing up servers can take two weeks or more.
Why not? <ul><li>It is practically impossible to get application owners to tell you what they need – if they understand what they need in the first place.
It's not their fault. Their focus is on the application. They are not System Administrators. That is why you are paid.
They do not accurately represent server builds.
They often contain incorrect or useless information.
Frankly, they serve no function other than to waste time and sow confusion, while providing an easily shattered veneer of repeatability.
So, naturally, they're usually the first thing tried. </li></ul>
How does it benefit you? <ul><li>Very, very fast deployment times: bare metal to live server in less than an hour, for as many servers at once as you can open terminals to. I've done 20 in an hour, and the number is effectively unlimited if you can drive the serial consoles with expect.
The code is the documentation. Server specs can never go out of sync because the server specs are actively deployed.
Very tight control over anything that is deployed to the servers.
(This means that even if someone installs something you don't want them to, you can simply have it removed within 10 minutes with no manual intervention.)
And... puppet (cfengine or another configuration management tool will probably work, but I prefer puppet.) </li></ul></ul>
How does this work? <ul><li>The simple flow is: </li><ul><li>Enter the information for the server into your Asset Tracker. The most important part is the MAC Address, though you can add other things as required. In multiple datacenters you might want a “Location” field, for example.
Make sure the server's DNS records are entered into your DNS server, by whatever method you normally use.
Tell your DHCP server you want to allow the server to PXE boot. This can happen manually or automatically.
I prefer doing it manually, simply because if you set the server up to PXE boot automatically, an accidental reboot can make the server rebuild itself. This tends to make app owners unhappy.
And... let it install. An hour later you have a full build with no further manual intervention. </li></ul></ul>
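The "allow the server to PXE boot" step can be sketched as a small generator for a DHCP host entry. This is only an illustration under stated assumptions: it assumes ISC dhcpd as the DHCP server (the talk does not name one), and the function name, the `allow_pxe` flag, and the example addresses are hypothetical. The flag models the manual switch preferred above: a host only gets a boot file when someone has deliberately asked for a build.

```python
def dhcp_host_stanza(hostname, mac, ip, tftp_server, allow_pxe=True):
    """Return an ISC dhcpd 'host' block for one server (illustrative sketch)."""
    lines = [
        f"host {hostname} {{",
        f"  hardware ethernet {mac.lower()};",
        f"  fixed-address {ip};",
    ]
    if allow_pxe:
        # Only hosts explicitly flagged for a build are pointed at the
        # TFTP server and handed the PXELINUX boot loader.
        lines.append(f"  next-server {tftp_server};")
        lines.append('  filename "pxelinux.0";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example values.
    print(dhcp_host_stanza("web01", "AA:BB:CC:DD:EE:01", "10.0.0.11", "10.0.0.2"))
```

Calling it again with `allow_pxe=False` once the build is done yields a stanza with no boot file, so an accidental reboot cannot trigger a rebuild.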
How does this work behind the scenes? <ul><li>Magic?
Guess I'll have to tell you the super-secret explanation. </li></ul>
Behind the scenes... <ul><li>The RT server is the bedrock of the system. It contains all of the information needed to successfully build the system. A command line interface is absolutely required so that the scripts that actually do the build can get access to the info.
For example, at a minimum you'll want to put the MAC Address into the RT system. You may even want to populate DNS from an IP field. Every step of this process uses the info from RT.
There may be site-specific stuff you need to use. Don't be afraid to add or use it. This is only a framework. </li></ul>
Behind the scenes <ul><li>Next you will need a script to build the pxelinux.cfg file for PXE booting. This script pulls all of the required info out of AT (such as the MAC address, location, etc.) and generates a working pxelinux.cfg file. It is a custom script and by no means one size fits all, but it is fairly easy to write. </li></ul>
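A minimal sketch of such a generator follows, assuming the asset record has already been fetched from AT as a dictionary. The field names (`id`, `mac`, `os`), the kernel paths, and the kickstart URL are hypothetical. The `01-<mac>` file name is the per-host naming PXELINUX looks for, and `ks=` is the older Anaconda boot option for a kickstart location (newer installers use `inst.ks=`).

```python
def pxelinux_filename(mac):
    """Per-host config name PXELINUX searches for: 01-<mac>, lowercase, dashes."""
    return "01-" + mac.lower().replace(":", "-")


def pxelinux_config(asset, ks_base_url):
    """Build the text of a pxelinux.cfg entry that starts a kickstart install.

    `asset` is a hypothetical dict standing in for an asset tracker record.
    """
    # The kickstart URL carries the asset ID so the server side can look
    # up everything else about this machine.
    ks_url = f"{ks_base_url}?asset_id={asset['id']}"
    return "\n".join([
        "DEFAULT install",
        "PROMPT 0",
        "LABEL install",
        f"  KERNEL {asset['os']}/vmlinuz",
        f"  APPEND initrd={asset['os']}/initrd.img ks={ks_url}",
        "",
    ])


if __name__ == "__main__":
    asset = {"id": 1234, "mac": "AA:BB:CC:DD:EE:01", "os": "centos5"}
    print(pxelinux_filename(asset["mac"]))
    print(pxelinux_config(asset, "http://build.example.com/cgi-bin/ks.cgi"))
```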
What about kickstart? <ul><li>Oh, here comes the genius part. (And I can say that because I didn't invent it, but I have it down to an art!)
The kickstart file is not a file at all. It is a CGI script. It goes to RT and DNS and gathers all of the information required, makes decisions on how to build the servers, and then custom generates a kickstart file.
It should at minimum take one argument – the RT asset ID. This is a unique identifier and allows all the information to be pulled out of the asset tracker to be used in the script. </li></ul>
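To make the idea concrete, here is a rough sketch of a kickstart "file" served by a CGI-style script. Everything specific in it is an assumption for illustration: the `asset_id` parameter name, the in-memory `ASSETS` table standing in for RT, the field names, and the package choices. The output is a fragment, not a complete kickstart.

```python
import os
from urllib.parse import parse_qs

# Hypothetical stand-in for the asset tracker; a real script would query RT.
ASSETS = {
    "1234": {"hostname": "web01.example.com", "role": "web", "disk": "sda"},
}


def lookup_asset(asset_id):
    """Fetch one asset record by ID (here, from the in-memory table)."""
    return ASSETS[asset_id]


def build_kickstart(asset):
    """Decide how to build the server and emit kickstart text for it."""
    packages = ["@core", "puppet"]
    if asset["role"] == "db":
        # Example of a build decision driven by asset data.
        packages.append("mysql-server")
    lines = [
        "install",
        "text",
        f"network --bootproto=dhcp --hostname={asset['hostname']}",
        f"clearpart --all --initlabel --drives={asset['disk']}",
        "autopart",
        "reboot",
        "%packages",
        *packages,
        "%end",
    ]
    return "\n".join(lines) + "\n"


def handle_request(query_string):
    """Turn a CGI query string into a full HTTP response body with header."""
    asset_id = parse_qs(query_string)["asset_id"][0]
    body = build_kickstart(lookup_asset(asset_id))
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    # A web server running this as CGI sets QUERY_STRING; default for a dry run.
    print(handle_request(os.environ.get("QUERY_STRING", "asset_id=1234")), end="")
```

The installer never knows the difference: it requests a URL and receives plain kickstart text.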
Be careful! <ul><li>This script is one of the bedrocks of what you are trying to do. It is also dangerous. It is dangerous because you are essentially trying to generate a script in one language (kickstart/bash) using another language (perl, python, whatever), so it rapidly becomes unmaintainable.
I recommend something like Template::Toolkit to make it more manageable.
Build maintainability in from the beginning! You may not get another chance! </li></ul>
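One way to picture the templating advice: keep the kickstart text in a template and let the code only compute values. The talk recommends Template::Toolkit, which is Perl; the sketch below swaps in Python's standard `string.Template` purely for illustration, and the field names and `%post` line are hypothetical.

```python
from string import Template

# The kickstart text lives in one place and reads like a kickstart file.
# Note: string.Template treats "$" as special, so a literal shell "$" in a
# %post section would have to be written as "$$".
KICKSTART_TEMPLATE = Template("""\
install
text
network --bootproto=dhcp --hostname=${hostname}
clearpart --all --initlabel --drives=${disk}
autopart
reboot
%packages
${packages}
%end
%post
echo "built from asset ${asset_id}" > /etc/build-info
%end
""")


def render_kickstart(asset):
    """Compute the template values from an asset record and fill them in."""
    values = {
        "asset_id": asset["id"],
        "hostname": asset["hostname"],
        "disk": asset["disk"],
        "packages": "\n".join(asset["packages"]),
    }
    # substitute() raises KeyError if the template names a value we did
    # not supply, so a missing field fails loudly instead of silently.
    return KICKSTART_TEMPLATE.substitute(values)


if __name__ == "__main__":
    print(render_kickstart({
        "id": 1234,
        "hostname": "web01.example.com",
        "disk": "sda",
        "packages": ["@core", "puppet"],
    }))
```

The generating language and the generated language no longer interleave line by line, which is where most of the unmaintainability comes from.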
Yum repository <ul><li>You'll also need a yum repository at this point.
The reason for this is: control. If you use external repositories, you are putting control of releases and upgrades into their hands, not yours.
And while CentOS, Fedora, etc., are fairly good about it, they make mistakes, and you do not want your production site to go down because of someone else's mistake. If it does, it's still your fault for not taking my advice. :) </li></ul>
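As a small illustration of keeping control, the build can drop a repo definition on each server that points only at your own mirror. The sketch below generates such a yum `.repo` file; the repo ID, mirror URL, and key path are hypothetical.

```python
def repo_file(repo_id, name, baseurl, gpgkey):
    """Return the text of a yum .repo file for a single internal repository."""
    return "\n".join([
        f"[{repo_id}]",
        f"name={name}",
        f"baseurl={baseurl}",
        "enabled=1",
        # Keep signature checking on even for an internal mirror.
        "gpgcheck=1",
        f"gpgkey={gpgkey}",
        "",
    ])


if __name__ == "__main__":
    print(repo_file(
        "internal-base",
        "Internal mirror - base",
        "http://yum.example.com/centos/5/os/x86_64/",
        "file:///etc/pki/rpm-gpg/RPM-GPG-KEY-internal",
    ))
```

Packages then reach production only when you promote them into that mirror.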
Control <ul><li>As you might have gathered by now, as an aside...
For example, you could automate the server restart by using a serial port and expect. For many enterprise-class servers this isn't reliable, but it'll work for some. </li><ul><li>Many servers use SM CLP. This is great for consistent command lines, but it doesn't work so well for programmatic resetting. </li></ul><li>You could also set AT to automatically kick off a rebuild when a particular field is set.
You could have AT automatically populate DNS using the SOAP client for nictool... </li></ul>
It's not out of the realm of possibility that you could set up a server from bare metal all the way to application deployment simply by setting the fields correctly in AT and then setting a special field using this infrastructure.