Care and feeding notes



These are my notes from my talk "Care and Feeding of Large Web Applications"


3/3/12

Care and Feeding of Large Web Applications
by Perrin Harkins

So, you launched your website. Congratulations! And then there were a bunch of quick fixes. And you started getting traffic, so you had to add more machines. And some more developers. And more features to keep your new users happy. And suddenly you find yourself spending all your time doing damage control on a site that seems to have taken on a life of its own, and you can't make a new release because the regression testing alone would take three years.

Usually, this is the part where everyone starts clamoring for a rewrite, and the CEO contemplates firing your ass and bringing in an army of consultants to rewrite it all in the flavor of the month.

How can we avoid this mess? How can we create a web development process that is sustainable for years and doesn't hold back development?

Backstory

There's more than one way to do it, but I'll tell you how my team did it, at a small startup company called Plus Three. Let me give you a few stats about our project:

  • About 2.5 years of continuous development
  • 2-5 developers on the team during that time
  • 65,000+ lines of Perl code
  • 1600+ lines of SQL (computed with David Wheeler's SLOCCount program)
  • Plenty of HTML, CSS, and JavaScript too
  • 6000+ automated tests in 78 files
  • 169 CPAN modules

It's a big system, built to support running websites for political campaigns and non-profit membership organizations. Some of the major components are a content management system, an e-commerce system with comprehensive reporting, a data warehouse with an AJAX query builder GUI, a large-scale e-mail campaign system, a variety of user-facing web apps, and an asynchronous job queue.
This talk isn't meant to be about coding style, which I've discussed in some previous talks, but I'll give you the 10,000-foot overview: an object-oriented, MVC-ish structure with the typical breakdown into controller classes, database classes, and templates.
(Not very pure MVC, but that's a whole separate topic.) Our basic building blocks were CGI::Application, Class::DBI, and HTML::Template.

Ok, that's the software. How did we keep it under control?

Deployment

Let's dive right in by talking about the hardest thing first: deployment. So hard to get right, but so rarely discussed, and so hard to generalize. Everyone ends up with solutions that are tied very closely to their own organization's quirks.

The first issue here is how to package a release. We used plain old .tar.gz files, built by a simple script after pulling a tagged release from our source control system. We tried to always release complete builds, not individual files. This is important in order to be sure you have a consistent production system that you can rebuild from scratch if necessary. It's also important for setting up QA testing. If you just upload a file here and there (or worse, vi a file in production!), you get yourself into a bad state where your source control no longer reflects what's really live, and your testing misses things because of it. We managed to stick to the "full build release" rule, outside of dire emergencies.

Like most big Perl projects, we used a ton of CPAN modules. The first advice you'll get about how to install them is "just use the CPAN shell," possibly with a Bundle or Task file. This is terrible advice. The most obvious problem with it is that as the number of CPAN modules increases, the probability of one of them failing to install via the CPAN shell for some obscure and irrelevant reason approaches 1. The second most obvious problem is that you don't want to install whatever the latest version of some module happens to be -- you want to install the specific version that you've been developing with and that you tested in QA. There might be something subtly different about the new version that will break your site. Test it first.
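A minimal sketch of that kind of "complete build" release script, assuming a Subversion-style tag layout. The project name, version number, and repository URL are invented for illustration, and the export step is faked with a local directory so the sketch stands alone:

```shell
#!/bin/sh
# Sketch of a full-build release script: package everything from a tagged
# checkout into one versioned tarball, never individual files.
set -e

VERSION="2.0"                 # hypothetical release number
BUILD_DIR="arcos-$VERSION"

# In real life this would come from the tagged release, e.g.:
#   svn export http://svn.example.com/arcos/tags/release-2.0 "$BUILD_DIR"
# Fake up a checkout here so the script is self-contained.
mkdir -p "$BUILD_DIR/lib" "$BUILD_DIR/src" "$BUILD_DIR/upgrade"
echo "package Arcos;" > "$BUILD_DIR/lib/Arcos.pm"

# Record exactly what went into the build, so a production machine can be
# audited against the package later.
( cd "$BUILD_DIR" && find . -type f ! -name MANIFEST | sort > MANIFEST )

tar czf "$BUILD_DIR.tar.gz" "$BUILD_DIR"
echo "built $BUILD_DIR.tar.gz"
```

The point of the MANIFEST is the "rebuild from scratch" guarantee: if what's live ever drifts from the package, you can prove it.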
Let me lay out the requirements we had for a CPAN installer:

  • Install specific versions.
  • Install from local media. Sometimes a huge CPAN download is not convenient.
  • Handle versions with local patches. We always submitted our patches, but sometimes we couldn't afford to wait for a release that included them.
  • Fully automated. That means that modules which ask pesky questions during install must be handled in some way. I'm looking at you, WWW::Mechanize.
  • Install into a local directory. We don't want to put anything in the system directories, because we want to be able to run multiple versions of our application on one machine, even if they require different versions of the same module.
  • Skip the tests. I know this sounds like blasphemy, but bear with me. If you have a cluster of identical machines, running all the module tests on all of them is a waste of time. And the larger issue is that CPAN authors still don't all agree on what the purpose of tests is. Some modules come with tests that are effectively useless, or that simply fail unless you set up test databases or jump through similar hoops.
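To make the "specific versions, from local media" requirements concrete, here is a hedged sketch of a preflight check: a manifest of pinned module versions is verified against the tarballs shipped in src/ before any install starts. The module names, versions, and file layout are invented for illustration:

```shell
#!/bin/sh
# Verify that every pinned module in modules.list has a matching tarball
# in src/ before the build starts -- no CPAN shell, no "latest version".
set -e

# Fake up a release layout so the sketch is self-contained.
mkdir -p src
: > src/HTML-Template-2.8.tar.gz
: > src/Class-DBI-0.96.tar.gz
cat > modules.list <<'EOF'
HTML-Template 2.8
Class-DBI 0.96
Expect 1.15
EOF

missing=0
while read module version; do
    if [ -f "src/$module-$version.tar.gz" ]; then
        echo "ok      $module-$version"
    else
        echo "MISSING $module-$version"
        missing=1
    fi
done < modules.list > report.txt
cat report.txt

if [ "$missing" -eq 0 ]; then
    echo "all pinned versions present"
else
    echo "missing tarballs -- stop the build"
fi
```

Because the versions live in the release package itself, every machine installs exactly what QA tested, whether or not it can reach a CPAN mirror.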
Our solution to the installation problem was to write an automated build system that builds all the modules it finds in the src/ directory of our release package. (Note that this means we can doctor one of those modules if we have to.) We used the Expect module (which is included and bootstrapped at the beginning of the build) and gave it canned answers for the modules with chatty install scripts. We also made it build some non-CPAN things we needed: Apache and mod_perl, and the SWISH-E search engine. If we could have bundled Perl and MySQL too, that would have been ideal.

Why bundle the dependencies? Why not just use whatever apache binary we find lying around? In short, we didn't want to spend all of our time troubleshooting insane local configurations and builds where someone missed a step. A predictable runtime environment is important.

To stress that point a little more: if your software is an internal application that's going to be run on dedicated hardware, you can save yourself a lot of trouble by only supporting very specific configurations. Just as an example, only supporting one version of one operating system cuts down the time and resources you need for QA testing. To this end, we specified exact versions of Perl, MySQL, Red Hat Linux, and a set of required packages and install options, in addition to the things we bundled in our releases.

That was the theory, anyway. Reality intruded a bit here in the form of cheap legacy hardware that would work with some versions of Red Hat and not others. If we had a uniform cluster of hardware, we could have gone as far as creating automated installs, maybe even network booting, but the best we were able to do was keep our list of supported OS versions down to a handful. This is also a place where human nature can become a problem. If you have a separate sysadmin group, they can get territorial when developers try to dictate details of the OS to install. But that's another separate topic.
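"Only supporting very specific configurations" implies a preflight check that refuses to run on anything you didn't test. A hedged sketch, with invented version numbers and the real probes stubbed out so it runs anywhere (a live version would probe with things like `perl -e 'print $]'`):

```shell
#!/bin/sh
# Refuse to proceed unless the environment matches what QA tested.
# check_version NAME WANTED GOT -> prints ok/MISMATCH, returns 0/1
check_version() {
    name=$1 wanted=$2 got=$3
    if [ "$wanted" = "$got" ]; then
        echo "ok       $name $got"
    else
        echo "MISMATCH $name: wanted $wanted, found $got"
        return 1
    fi
}

# Hard-coded "found" values stand in for probing the machine, so the
# sketch is self-contained. The version numbers are illustrative only.
check_version perl  5.8.8  5.8.8  > preflight.txt
check_version mysql 4.1.22 5.0.18 >> preflight.txt ||
    echo "unsupported environment, stopping" >> preflight.txt
cat preflight.txt
```

Every mismatch a script like this catches is one less mystery bug traced back to someone's hand-built Apache.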
The automated build worked out very well. Eventually though, as we added more modules, the builds started taking longer than we would have liked. Remember, we built them on every machine. Not the most efficient thing to do. The obvious next step would be binary distributions, possibly using RPMs, or just tarballs. Not trivial, but not too bad if you can insist on one version of Perl and one hardware architecture. If we were only concerned about distributing the CPAN modules, it might be possible to use something existing like PAR.

If you're interested in seeing this build system, the Krang CMS (which we used) comes with a version of it, along with a pretty nice automated installer that checks dependencies and can be customized for different OSes. You could probably make your own for the CPAN stuff using CPANPLUS, but you'd still need to do the Expect part and the non-CPAN builds.

QA

Upgrades

We didn't automate upgrades enough. Changes on a production system are tense for everyone, and it's much better to have them automated, so that you can fully test them ahead of time and make the actual work to be done in the upgrade process as dumb as possible. We didn't fully automate this, but we did fully automate one of the crucial parts of it: data and database schema upgrades.

Our procedure was pretty simple, and coincidentally similar to the Ruby on Rails schema upgrade approach. We kept the current schema version number in the database and the code version number in the release package, and when we ran our upgrade utility, it would look for any upgrade scripts with versions between the one we were on and the one we wanted to go to. For example, when going from version 2.0 to 3.0, it would look in the upgrade/ directory (also in our install bundle), find scripts named V2.1 and V3.0, and run them in order. Usually they just ran SQL scripts, but sometimes we needed to do some things in Perl as well.

Our SQL upgrade scripts were written by hand. I tried a couple of schema-diffing utilities, but they were pretty weak. They didn't pick up things like changes in the default value for a column, or know what to do with changes in foreign keys. Maybe someday someone will make a good one. Even then, it will still require some manual intervention when columns and tables get renamed, or a table gets split into multiple tables.

One cool thing we discovered recently is a decent way to test these upgrades on real data. We always set up a QA server with a copy of the current version of the system, and then try our upgrade procedure and continue with testing. This works fine, except that when you fix a bug and need to do it again, it takes forever to set it up again. We tried VMware snapshots, but the disk performance for Linux on VMware was so poor that we had to abandon it. Backups over the network seemed like they would take a long time to restore. Then we tried LVM, the Linux Logical Volume Manager. It let us take a snapshot just before the upgrade test, and then roll back to it almost instantly.

  • Time-travel bug

Plugin System

  • Harder than it sounds
  • Simple factory works for most things

Configuration

The trouble with highly configurable software is that someone has to configure it. Our configuration options expanded greatly as time went on, and we had to devise ways to make configuring it easier. We started with a simple config file containing defaults and comments, like the one that comes with Apache. In fact, it was very much like that one, because we used Config::ApacheFormat. In the beginning, this worked fine.
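A hedged sketch of what such an Apache-style file might look like. The directive names, block names, and domains are invented for illustration, not our actual schema:

```apache
# Defaults shared by every server; inner blocks inherit these.
SMTPServer   localhost
FromDomain   example.org

<Server donations>
    # Inherits SMTPServer and FromDomain from above.
    HostName donate.example.org
</Server>

<Server outreach>
    HostName outreach.example.org
    # Override for the one server that needs something different.
    FromDomain outreach-mail.example.org
</Server>
```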
Config::ApacheFormat supplied a concept of blocks that inherit from surrounding blocks, so that if you have a block for each server and a parameter that applies to all of them, you can put it outside of those blocks and avoid repeating it. You can even override that parameter in the one server that needs something different.

As the number of parameters grew, we realized a few things:

  • People will ignore configuration options they don't understand.
  • Expectations are that if the server starts, it must be okay.
  • A few lines of comments in a config file is pretty weak documentation.
  • Long config files full of things that you hardly ever need to change are pointless and look daunting.

To deal with these problems, we started making extensive use of default values, so that things that didn't usually get changed could be left out of the file. We ended up creating a fairly complex config system in order to keep the file short. It does things like defaulting several values based on one setting, e.g. setting the domain name for a server allows it to default the cookie domain, the e-mail account to use as the From address on site-related mail, etc. Of course, this created the necessity to see what all of the values were defaulting to, so a config dumper utility was created. By the time we were done, we had moved to a level where using one of the complex config modules like Config::Scoped probably would have been a better choice than maintaining our own. Well, Config::Scoped still scares me, but something along those lines.

Testing

You all know the deal with testing. You have to have it. It's your only hope of being able to change the code later without breaking everything. This point became very clear to me when I did a couple of big refactorings and the test suite found all kinds of problems I missed on my own.

For any large application, you'll probably end up needing some local test libraries that save setup work in your test scripts. Ours had functions for doing common things like getting a WWW::Mechanize object all logged in and ready to go.

When you're testing a large database-driven application, you need some strategies for generating and cleaning up test data. We created a module for this called Arcos::TestData. (Arcos is the name of the project.) The usage is like this:

    my $creator = Arcos::TestData->new();
    END { $creator->cleanup() }

    # create an Arcos::DB::ContactInfo
    my $contact_info = $creator->create_contact_info();

    # create one with some values specified
    my $aubrey = $creator->create_contact_info(
        first_name => 'George',
        occupation => 'housecat',
    );

This one is simple, but some of them will create a whole tree of dependent objects with default values, to avoid needing to code all that in your test. When the END block runs, it deletes all the registered objects in reverse order, to avoid referential integrity problems. This seemed very clever at the time. However, after a while there were many situations that required special handling, like web-based tests that cause objects to be created by another process.
We had solutions for each one, but they took programmer time, and at this point I think it might have been smarter to simply wipe the whole schema at the end of a test script. We could have just truncated all the non-lookup tables pretty quickly.

We got a lot of mileage out of Test::WWW::Mechanize.

  • Test::Class helps similar classes
  • Testing web interfaces - Mech tricks - Selenium
  • Smolder
  • Testing difficult things
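The "truncate all the non-lookup tables" cleanup mentioned above is simple enough to sketch in shell. The table names are invented; a real version would read the list from `SHOW TABLES` and pipe the generated SQL into mysql:

```shell
#!/bin/sh
# Generate cleanup SQL that empties every non-lookup table after a test run.
# Stand-in for `mysql -N -e 'SHOW TABLES'`, so the sketch runs anywhere:
cat > tables.txt <<'EOF'
contact_info
contribution
email_blast
lookup_state
lookup_country
EOF

# Lookup tables hold reference data the tests depend on; skip them.
grep -v '^lookup_' tables.txt |
    while read table; do
        echo "TRUNCATE TABLE $table;"
    done > cleanup.sql

cat cleanup.sql
```

Blunt, but compared to carefully unwinding registered objects in reverse order, it needs no special cases.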
Code Formatting

This was the first project I worked on where we had an official Perl::Tidy spec and we all used it. Can I just say it was awesome? That's all I wanted to say about it. Developers who worked on Perl::Tidy, you have my thanks.

Version Control

A couple of years ago, only crackpots had opinions about version control. CVS was the only game in town. These days, there are several good open source choices, and everyone wants to tell you about their favorite and why yours is crap. I'm not going to go into the choice of tools too much here. You can fight that out amongst yourselves. We used Subversion, but I'll try to talk about the theory without getting bogged down in the mechanics.

Most projects need at least two branches: one for maintenance of the release currently in production, and one for new development. Most of you are familiar with this from open source projects. Here are the main ideas we used for source control:

  • The main branch is for new development, but must be stable. Code should not be checked in until all tests pass. (But more about that later.)
  • When you make a release of the main branch, tag it. That means tagging the whole branch at that point. Example: tag release 2.0. The main branch is now for development of 3.0.
  • For each main branch release, make a maintenance branch from the point where you tagged it. Example: make a "2.x" branch for fixing bugs that show up in production.
  • When you make a bug fix release from a maintenance branch, tag the branch and then merge all changes since the last release on that branch to the main branch. This is the only merging ever done, and it's always a merge of changes from one sequentially numbered tag to the next, into the main branch. Example: tag the 2.x branch bug fix release as 2.1. Merge all changes from 2.0 to 2.1 to the main development branch.

This is about as simple as you can make it, and it worked very well for us for a long time.
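That cycle can be sketched as the svn commands it implies. The repository URL and working-copy path are invented, and the `run` helper only records the commands instead of running them, so the sketch is safe to execute anywhere:

```shell
#!/bin/sh
# Dry-run sketch of the branching scheme: tag, branch, fix, tag, merge forward.
REPO="http://svn.example.com/arcos"     # hypothetical repository

: > commands.txt
run() { echo "+ $*" >> commands.txt; }  # record instead of execute

# Release 2.0 from the main branch and tag it.
run svn copy "$REPO/trunk" "$REPO/tags/release-2.0" -m "Tag release 2.0"

# Maintenance branch starts from that tag; trunk is now for 3.0 work.
run svn copy "$REPO/tags/release-2.0" "$REPO/branches/2.x" -m "2.x branch"

# ...bug fixes land on branches/2.x while 2.0 is in production...

# Bug-fix release: tag 2.1, then merge everything since 2.0 into a trunk
# working copy. This tag-to-tag merge is the only merging ever done.
run svn copy "$REPO/branches/2.x" "$REPO/tags/release-2.1" -m "Tag release 2.1"
run svn merge "$REPO/tags/release-2.0" "$REPO/tags/release-2.1" trunk-checkout

cat commands.txt
```

Because every merge is bounded by two sequential tags, there is never any question about what has already been merged.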
Eventually though, we discovered situations that didn't fit nicely. One of these was that sometimes there was a period of a few days during QA where part of the team would still be working on bug fixes on the development branch, while others were ready to move on to working on features for the next major release. You can't do both in the same place. One solution is to create the maintenance branch at that point, for doing the final pre-release bug fixes, and let the main branch open up for major new development. It's a bad sign if you need to do this often. Usually the team should be sharing things evenly enough to make it unnecessary.

Another problem, although less frequent than you might expect, is keeping the development branch stable at all times. Some changes are too big to be done safely as a single commit. At that point it becomes necessary to make a feature branch, work on it until the new feature is stable and all tests are passing again, and then merge it back to the main development branch.

Beware of complicated merging, whether your tools support it well or not. A web app is not the Linux kernel. If you find yourself needing to do bidirectional merges, or frequent repeated merges to the point where you have trouble keeping track of what's been merged, you may need to take a look at your process and see if there's some underlying reason. Maybe the source control system is being used as a substitute for basic personal communication on your team, or has become a battleground for warring factions. Some problems are easier to solve by talking to your co-workers than by devising a complex branching scheme.