7. What’s this got to do with
Symfony2?
“On the other hand, Symfony2 was conceived
from the start to be fast, with a strong emphasis
on performance.”
- The technological benefits of Symfony in 6 easy lessons
http://symfony.com/six-technical-reasons
9. What is Load Testing?
“Load testing is a simulation of multiple users
working with a web application at the same
time.”
“It can be performed for a number of purposes,
but the main goal is to check the performance
of the application.”
24. Pitfalls
· Don’t load test your load testing host
· Know the capacity of the network between
your load test host and the load test target
· Monitor the load test target and check for
contention when testing
If FOSS isn’t your thing, the big kid on the commercialware block is HP LoadRunner - it slices, it dices, it does hard core load testing. Probably the only thing it doesn’t do is make you the coffee.
But be prepared to spend some serious coin, both for the software, and for the guru to make it work.
Just a note - we don’t have LoadRunner - the screenshot here I just nicked from the HP website.
Prefer to do your load testing in the cloud? No problem! There’s plenty of options out there…
Of the cloud based tools, the only one I’ve had brief chance to play with so far is Load Impact.
On the plus side, it’s pretty easy to set up test scenarios and it gives you good feedback as the tests are running. On the minus side I haven’t found a way to export the test data for separate analysis, and the choices of test location are limited to the places where Amazon and Rackspace have a footprint, which is a bit of a bummer if you want to test what a user in Perth is going to experience when your servers are in Sydney...
Now, ironically, despite my bashing JMeter’s sucky UI, my current fave load testing tool is the CLI-based Siege, mainly because it’s very lightweight and pretty quick for setting up test scenarios.
The latest source-only version on the website is 3.0.6. There’s packaged versions for:
OS/X - MacPorts (v2.7.2)
Debian/Ubuntu - OOTB (v2.7.0)
Fedora 20 - OOTB (v3.0.1)
RHEL/CentOS - EPEL Repositories (v3.0.0)
Now, it’s nowhere near as powerful or feature packed as the three I mentioned before, but with a bit of care and a bit of shell scripting it’s possible to set up some fairly thorough test plans.
There is one big downside I need to point out. Out of the box, the timers in Siege only have a 10ms resolution, which I think is too coarse for testing Symfony2 web apps.
But it’s open source, so we can fix this. In fact, for those of you brave enough to compile from source, I’ve put together a patch that increases the resolution to 1ms using the high precision clock on Linux (should work on OS/X - untested). Come and see me later or email me if you want the patch. I’m working to push this patch upstream too.
If it’s not obvious, for the rest of this talk I’m going to look at using Siege. Before I do though, just a quick diversion into how I assume you will set up your Symfony2 web app hosting for performance and scalability.
Is it just me, or do exported Visio diagrams look a bit ugly??
Moving on, I’m going to assume that if you’re deploying a high performance Symfony2 web app that you’re going to be using some sort of CDN - push or pull; Akamai, Amazon Cloudfront, Rackspace Cloudfiles, or even just an S3 bucket for static assets.
The thing is, you don’t want to be bogging down your application servers with requests for static files that can be served directly by the CDN. If you do, you’re wasting your web application servers’ bandwidth and memory - resources you need for getting the best performance from your web application.
Now I’ve got that off my chest, I can start talking about using Siege.
Siege is configured through a text file - normally .siegerc in your home directory. You can also specify the location of a custom configuration file on the command line.
You can create an initial configuration using the command siege.config which will write a .siegerc file for you. But you will want to customise it.
The generated version of .siegerc is pretty well commented, so I’ll just call out the things that I think you need to change:
- csv = true: siege will log the results of each URL hit to standard output. Setting csv = true makes siege write that log in CSV format so that you can import it into Excel or a database for detailed analysis.
- show-logfile = false: at the end of every benchmark run, siege outputs a message at the console about where to find the log of the run - show-logfile = false disables this dumb message.
- benchmark = true: in benchmark mode, siege runs as fast as the web server and network will let it with no delay between requests - flat out, no waiting
Oh no!
Relax - it’s not a problem.
For HTTP basic authentication, look for the “login =” section of .siegerc.
For cookie-based auth, use “login-url” - this will only be hit once by each siege virtual client. Any cookies sent by the server in response to this URL will then be offered in all siege URL requests.
So how do you tell siege what URLs to hit?
There are two ways
Obviously, you can only test a single URL if you use the command line.
If you use a file, each siege virtual client will cycle through the list of URLs in order, emulating a workflow through your web app. If you want to hit all the URLs in the file randomly, look at the “internet =” option in .siegerc.
Now we’ve got the groundwork out of the way, the real fun can start, but
Don’t do this
In this example, we’ve got 200 virtual clients running for 10 minutes. This is not a good first test! You will:
a. overload your web application straight away,
b. drown in garbage data because of queuing delays,
c. mask the useful results,
d. delay the tests because siege is going to be trying to spit out results to your terminal window.
Start simple
A test like this (1 virtual user bashing away as hard as they can for 30 seconds) is going to reveal a lot more about your best case response time because there should be no contention at any layer: CPU, network, DB, etc.
There’s plenty of scope for getting a baseline and finding bottlenecks with a single virtual user (e.g. what happens when your web app takes a memcache miss?)
Once you’ve got a useful baseline with a single virtual user, then you can start ramping up the numbers. But maybe start at two, just to see if you’ve got any obvious deadlock issues.
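To make the "start simple, then ramp up" idea concrete, here is a rough Python sketch that prints one siege command line per step of a ramp. The step sizes, the 30 second duration and the urls.txt file name are assumptions for illustration only. The flags used (-b for benchmark mode, -c for concurrent users, -t for run time, -f for a URL file) are standard siege options, but check them against the version you have installed.

```python
import shlex


def siege_ramp(urls_file, users=(1, 2, 5, 10, 20), duration="30S"):
    """Build one siege command line per step of a gradual ramp-up.

    users and duration are illustrative defaults, not recommendations.
    """
    commands = []
    for count in users:
        argv = ["siege", "-b", "-c", str(count), "-t", duration, "-f", urls_file]
        # Quote each argument so the line can be pasted into a shell safely.
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


if __name__ == "__main__":
    # urls.txt is a hypothetical file with one URL per line.
    for line in siege_ramp("urls.txt"):
        print(line)
```

Running each step separately, and saving the timings from each one, keeps the baseline and the later results easy to compare.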
There is one thing that you should do for every test...
Save the test timings
The summary results that siege gives, while useful, are only scratching the surface. By saving the individual test times, you can then analyse the data in more detail, maybe revealing patterns that aren’t visible in a summary.
In fact, as a general rule, if the load testing app that you use doesn’t let you export the individual test times, you should probably find a different one.
Personally, for a single test, I don’t think there’s too much to say here, but for this example I will point out a couple of things:
1. On average, we’re effectively getting 25 page views per second (1500 per minute), assuming that all the static assets are offloaded to the CDN
2. We seem to be averaging in at 40ms per view and our worst case (maybe an outlier?) is 96ms, so there’s probably not too much to worry about so far
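As a sanity check on those two numbers: with a single virtual user in benchmark mode there is no think time, so throughput is roughly the inverse of the mean response time. A minimal sketch of that arithmetic (the function name is made up for illustration):

```python
def single_user_throughput(mean_response_ms):
    """Requests per second for one client issuing requests back to back."""
    return 1000.0 / mean_response_ms


views_per_second = single_user_throughput(40)   # 40ms mean response time
views_per_minute = views_per_second * 60

print(views_per_second, views_per_minute)
```

So a 40ms average and 25 page views per second are two views of the same measurement, not independent results.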
But what if we analyse the individual test times?
If I haven’t banged on at you enough about saving the individual test times, I’ll chuck in this last example.
This is a histogram of the individual test results from the summary you saw on the last slide, binned on 10ms intervals. Straight away you can see a couple of things:
1. Although the summary page said the average response time was 40ms, the modal response time is actually around the 25-35ms mark
2. Rather than being an outlier, the maximum response time around the 90-100ms mark accounts for about 20% of the responses - maybe this should be investigated?
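If you want to build this sort of histogram yourself, the binning is only a few lines of Python. This is a sketch, not the analysis used for the slide: the response times below are made-up sample values chosen to show the same effect, and how you load the real timings depends on the log format your siege version writes.

```python
from collections import Counter
from statistics import mean


def histogram(times_ms, bin_width_ms=10):
    """Count response times per bin, keyed by the lower edge of each bin."""
    bins = Counter(int(t // bin_width_ms) * bin_width_ms for t in times_ms)
    return dict(sorted(bins.items()))


def modal_bin(times_ms, bin_width_ms=10):
    """Return the lower edge of the most heavily populated bin."""
    bins = histogram(times_ms, bin_width_ms)
    return max(bins, key=bins.get)


# Synthetic timings in milliseconds, for illustration only.
sample = [22, 27, 28, 29, 26, 31, 33, 34, 93, 95]

print("mean:", mean(sample))
print("modal bin:", modal_bin(sample))
for lower_edge, count in histogram(sample).items():
    print(f"{lower_edge:3d}-{lower_edge + 9:3d}ms {'#' * count}")
```

Even with ten made-up samples, the mean lands in a bin that contains no responses at all, which is exactly the kind of thing a summary hides.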
Now, just some stuff about traps you can fall into.
This is going to seem kind-of obvious and with Siege it’s not likely to be a problem, but it may affect you if you’re using a different tool. Make sure the box you’re using to generate your load can keep up with the responses from your web app, otherwise you’re just benchmarking your load tester!
At 100Mbit/sec it takes about 1.6ms to transfer 20k of data. As you start ramping up virtual users, responses will be delayed just waiting to get across the network cable - e.g. 20 virtual users will experience, on average, 32ms of delay just waiting for the network.
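The arithmetic behind those figures, as a small Python sketch. It assumes 20k means 20,000 bytes, ignores protocol overhead, and treats the link as the only bottleneck with every virtual user always having a request in flight. Those are simplifications, so treat the output as a ballpark.

```python
def transfer_time_ms(payload_bytes, link_bits_per_sec):
    """Time to push one response across the link, in milliseconds."""
    return payload_bytes * 8 / link_bits_per_sec * 1000


def saturated_link_delay_ms(virtual_users, payload_bytes, link_bits_per_sec):
    """Average wait per response when every user shares one saturated link."""
    return virtual_users * transfer_time_ms(payload_bytes, link_bits_per_sec)


print(transfer_time_ms(20_000, 100_000_000))             # roughly 1.6
print(saturated_link_delay_ms(20, 20_000, 100_000_000))  # roughly 32
```

Plug in your own page size and link speed before deciding how many virtual users a single load test host can sensibly drive.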
One of the reasons I personally won’t totally shift all my load testing work onto cloud based services is that I don’t know what’s going on with the network.
Next one...
Don’t just rely on the output of your load testing tool to tell you when you’ve hit the wall, monitor the system(s) you’re testing too. Amazon Cloudwatch Graphs, New Relic, hell, even top and vmstat can give you insight into how your web app is performing as the load increases.
Not only can some of this info inform your decisions about what & where to optimise in your app, it also gives you a clue about what to look out for when it really goes live.
There’s a couple of low-level pitfalls that I want to call out separately
Now I’m an ops guy, so I think firewalling is a good thing. But out of the box, Linux iptables can actually hinder the availability of a high performance web app.
The first time I encountered this problem was on the box I was running Siege on, but it can happen on the web app server (and when you’re in production too)...
Linux iptables is a ‘stateful inspection’ firewall - to do that, it has to maintain a ‘connection state tracking table’ in the kernel.
The size of this table is fixed - it doesn’t change as the number of active connections varies.
Now, if the number of active connections on the server exceeds the size of the table, new connections are just dropped on the floor and you get this error logged in the system log (/var/log/syslog on Debian derived distros)
When viewed from siege it will look like your app’s performance has tanked, but when you look at your web app server it will appear to be relatively idle.
Thankfully, the fix is pretty simple - just increase the size of the connection tracking table, like so:
You can also add that setting to /etc/sysctl.conf so that it’s applied at every boot.
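One way to spot this before packets start getting dropped is to watch how full the table is while the test runs. The sketch below is an assumption-laden example, not the fix from the slide: it assumes a kernel that exposes nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter (older kernels use different names), and the 80% warning threshold is arbitrary.

```python
from pathlib import Path

# Assumed location on kernels using the nf_conntrack module.
CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(count, maximum):
    """Fraction of the connection tracking table currently in use."""
    return count / maximum


def read_conntrack(directory=CONNTRACK_DIR):
    """Return (count, maximum), or None if the files are not present."""
    count_file = directory / "nf_conntrack_count"
    max_file = directory / "nf_conntrack_max"
    if not (count_file.exists() and max_file.exists()):
        return None
    return int(count_file.read_text()), int(max_file.read_text())


if __name__ == "__main__":
    values = read_conntrack()
    if values is None:
        print("conntrack counters not found on this host")
    else:
        usage = conntrack_usage(*values)
        warning = " - getting close to full!" if usage > 0.8 else ""
        print(f"conntrack table {usage:.0%} full{warning}")
```

Run it on both the load testing host and the web app server, since either one can be the box that fills up first.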
The other pitfall you may encounter is kinda the same, but different…
It’s the TCP TIME_WAIT state...
What the hell is that I hear you say…
In TCP, when a client wants to connect to, say, a web server, it chooses a random port number from an assigned port range and uses that port to send a message to the web server listening on port 80. The combination of the Source IP address and port and the Destination IP and port uniquely identifies any particular conversation between the two boxes.
Now in the TCP protocol spec there are some rules that require one of the hosts that was participating in any conversation, once it’s over, to ‘hang around’ listening for any stray packets that may have got lost in the network. This is called the TIME_WAIT state and lasts, nominally, for two minutes (a pretty long time…)
During that two minutes, that port number cannot be used for a new connection AT ALL.
We can do a bit of quick math here to work out how many ports we can play with: 61000-32768+1=28233 usable ports in any two minute period, which works out to an average of 235 connections per second or about 4.25ms per connection.
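Here is that quick math as a Python sketch, so you can redo it for a different port range. The 120 second TIME_WAIT figure is the nominal two minutes used in the talk, not a value measured on any particular system.

```python
def ports_available(low, high):
    """Number of ports in an inclusive range."""
    return high - low + 1


def max_connection_rate(low, high, time_wait_seconds=120):
    """Average new connections per second before the port range runs dry."""
    return ports_available(low, high) / time_wait_seconds


rate = max_connection_rate(32768, 61000)
print(ports_available(32768, 61000))        # 28233 ports
print(f"{rate:.0f} connections per second")
print(f"{1000 / rate:.2f}ms per connection")
```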
Seems pretty fast, doesn’t it?
Anyone here used the Symfony2 AppCache?
In a test bed scenario, I’ve managed to get an app using Symfony2 AppCache to serve 99.2% of all requests in under 4ms.
Now it was a test scenario, but I’m sure some smart folks like you can make this happen in your production environments…
Back to the problem - there are two pieces of good news:
1. I think it’s peculiar to siege, so it only happens on the load testing host (probably need to check this on other load testing tools)
2. It’s pretty easy to fix - just increase the usable port range
In this example, I’ve increased the available port range to the maximum - 64,512 ports, or about 537 connections per second. If you blow out that limit, you need to get another load testing host and run them in parallel.
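To check what a load testing host is currently set to, you can read the port range straight from the kernel. This sketch assumes a Linux box that exposes /proc/sys/net/ipv4/ip_local_port_range, and it assumes the 64,512 figure corresponds to ports 1024 to 65535. The 120 second TIME_WAIT is again the nominal value from the talk.

```python
from pathlib import Path

# Assumed Linux location of the ephemeral port range setting.
PORT_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def parse_port_range(text):
    """Parse the 'low high' pair the kernel reports, e.g. '1024\\t65535'."""
    low, high = (int(field) for field in text.split())
    return low, high


def connections_per_second(low, high, time_wait_seconds=120):
    """Average connection rate the inclusive port range can sustain."""
    return (high - low + 1) / time_wait_seconds


if __name__ == "__main__":
    if PORT_RANGE_FILE.exists():
        low, high = parse_port_range(PORT_RANGE_FILE.read_text())
        rate = connections_per_second(low, high)
        print(f"ports {low}-{high}: about {rate:.0f} connections per second")
    else:
        print("port range setting not found on this host")
```

If the rate it prints is lower than the request rate you are aiming for, that is the point to widen the range or add a second load testing host.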
Well, I think that’s me done...
Now you’re armed with all this great knowledge about load testing, all I can say is
Get out there and beat those web apps into submission!