I would describe myself as a fairly typical backend developer, though I am a bit proud that the UI guys at Achievers describe my front-end work as “not so bad.” Since joining Achievers I’ve worked on some of the biggest infrastructure changes we’ve done, including the migration to PFA (remember from the first tech talk), implementing our cross currency solution, and lately helping Juan get our DB sharded.
No one remembers when your web app is running fast, but they will never forget when it performs like molasses. The speed of your web app directly relates to your business goals, as evidenced by the two examples here.
Performance is very important, but scalability and availability are even more important to your business, as every hour your site is down or unable to handle load is lost revenue and lost consumer confidence.
To ensure that your application is performing and scaling well, you must profile it to understand what it is doing.
Borrowed the definition of profiling from Wikipedia, sounds pretty fancy.
Basically, profiling is understanding exactly how your program is executing, and for web applications we are most concerned with time duration as opposed to other facets such as memory.
There are a few different ways a profiling tool can test your program. Some programming languages (such as Java) have hooks that these tools can leverage to gather statistics, and some tools will dynamically recompile your program to build in all the data gathering.
After a tool runs, you may have either a statistical summary of what it observed, known as a profile, or a stream of events, known as a trace.
I want you to walk away from this talk and be able to dig into how your code is running. Hopefully with the tools that will be shown off you’ll have a good starting point. If you’ve seen or used the tools in this presentation that is great, you’re ahead of the curve =)
1) Request level profiling gives you a good overall understanding of what is occurring in a request
2) As the DB is likely your biggest bottleneck, we’ll learn how to easily and automatically find queries with poor performance
3) Then we’ll look at tools that will show you exactly what your PHP code is doing
4) Moving on will be a short bit on browser profiling
5) Then we’ll move on to how to maintain performance, starting with load testing before moving on to production monitoring
I will stop for questions after each section so that I don’t end up info dumping too much =)
It’s one thing to want your site to perform faster, but you have to be realistic; set yourself some short and long-term goals.
Can anyone guess what a user’s expectations are these days for how long a page should take to load?
Turns out that today’s standards are that your site is fully loaded within 2 seconds, which means your server has to respond within 400ms to give the browser enough time to render and load. Hitting these targets is not easy: at Achievers our server responds in 250ms on average, but our full load times are between 2.5 and 3 seconds, meaning we have some work to do.
There is a balancing act between improving performance and releasing new features: you don’t want performance to slowly suffer as features are added, nor do you want a blazingly fast site that has fallen behind on features.
A few notes before we begin:
The content of this tech talk is aimed to be applicable to all languages and frameworks, but some of the tools that will be shown off are more geared towards MySQL and PHP. There are similar tools for other languages and the slide notes will have links where possible.
Profiling for web requests is pretty much answering the question “Why the hell is the page taking so long to load?”
From expectations, the application server needs to send the response in 400ms or less; if it isn’t at that level, we need to know if the issue is with the queries run, an API call, or possibly just your application code itself.
We’re going to look at the profiler that comes included in the CodeIgniter framework, which is quite helpful and appends the profiling statistics to the response.
Notes:
Other profilers include:
Laravel - Another PHP framework which has a profiler built-in (http://laravel.com/)
dynaTrace - Java profiler + lots of other features (http://www.compuware.com/application-performance-management/dynatrace-enterprise.html)
cProfile - Python profiler built into the language (http://docs.python.org/2/library/profile.html)
Visual Studio - Profiler for C# in the IDE (http://msdn.microsoft.com/en-CA/library/z9z62c29.aspx)
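To give a feel for what a function-level profile looks like outside of a framework, here is a minimal sketch using Python's built-in cProfile, one of the profilers listed in the notes. The `slow_sum` and `handle_request` functions are made-up stand-ins for real request handling code, not anything from our platform.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: a deliberately naive loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request():
    """Hypothetical stand-in for a controller action."""
    return [slow_sum(20000) for _ in range(50)]


def profile_request():
    """Run handle_request under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    handle_request()
    profiler.disable()

    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Sort by cumulative time and print only the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return report.getvalue()


if __name__ == "__main__":
    print(profile_request())
```

The report lists each function with its call count and timings, which is the same kind of information the framework profilers surface per request.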
With CI, enabling the profiler is easy: just inject that line anytime before your controller ends. The profile details will be output at the bottom of the page, injected into the HTML just before the closing </body> tag. Let’s see what it looks like with our demo application.
*******
Show off profiler on first 6 degrees, then 3rd. Wow, that’s quite the jump in response time and number of queries. This is a contrived example, as I implemented a terribly inefficient breadth-first search for this six degrees of separation calculator, but it does illustrate how useful the profiling details are. With it I know exactly where to start looking for the issue, given that the number of queries is the biggest difference between these two pages. I also want to point out that you need to profile over multiple sets of data or configurations, or you may end up only seeing a slice of the pie (so to speak).
*******
Here’s an example of the CodeIgniter profiler for our test application, from one of the six degrees of separation pages. Clearly a red flag here would be the 3700 queries being run!
The output of the profiler is really a high-level health check on the request, as you can easily see if there are slow queries, too many queries, or code sections that take a while to run.
The most important sections in the profile details are certainly the queries executed and the timing benchmarks; the former allows you to quickly find straggling queries, and the latter identifies the areas of code that you should dig into if they are taking too long.
1) But the output is more of a guideline. If the request doesn’t run very well on your dev machine, then likely it won’t be great in production. But if it looks good on your machine, that doesn’t imply it will work well in production, which is why we will talk about load testing later on
2) And now, a question for the audience: how many people actually have a profiling tool like CodeIgniter’s enabled as they develop?
I wrote a ton of documentation on how to install/use profiling tools (which became the basis for this tech talk) and shared it with the team, but surprise surprise, no one took it seriously.
So after a particularly bad day of fixing performance issues the day before a release (not the first time I had to do so), I took it upon myself to create a profiling solution that no dev could ignore.
Thus I created the Achievers Performance Header (trademarked) so that performance is always at the forefront for our developers. The header appears on any dev/test environment and ties the performance of the request to our goals, including total DB time, # of queries executed, and # of duplicate queries. For instant feedback, the text is colour coded from green to red as the performance gets farther from our targets. You can also expand the header to show the worst performance details so it is easy to see what needs to be improved.
You would think that having this header would be enough for everyone to start taking performance seriously, but it turns out it wasn’t.
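As a rough illustration of the idea behind the header (not the actual Achievers implementation, which is PHP and isn't shown in this talk), here is a small Python sketch that turns the queries collected during a request into the three metrics and a colour. The target and red-limit numbers in `TARGETS` are invented for the example.

```python
from collections import Counter

# Hypothetical (target, red_limit) pairs; the real thresholds are not given in the talk.
TARGETS = {
    "db_time_ms": (100, 250),
    "query_count": (30, 60),
    "duplicate_queries": (0, 5),
}


def summarize(queries):
    """queries: list of (sql, duration_ms) tuples collected during one request."""
    counts = Counter(sql for sql, _ in queries)
    return {
        "db_time_ms": sum(ms for _, ms in queries),
        "query_count": len(queries),
        # Every execution beyond the first of an identical statement is a duplicate.
        "duplicate_queries": sum(n - 1 for n in counts.values()),
    }


def colour(value, target, red_limit):
    """Green at or under target, red at or past the red limit, orange in between."""
    if value <= target:
        return "green"
    if value < red_limit:
        return "orange"
    return "red"


def header(queries):
    """Return {metric: (value, colour)} for display at the top of the page."""
    stats = summarize(queries)
    return {name: (value, colour(value, *TARGETS[name])) for name, value in stats.items()}
```

Calling `header()` with the query log of a request gives one coloured value per metric, which is all the collapsed header needs to render.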
Even with the header, developers were still ignoring the performance issues in the request. The leadership team made the decision that everyone needed to be smacked upside the head if performance was poor, so now the header begins expanded when past any target threshold, which looks a bit like *this*
Now it is pretty much impossible to ignore a poorly performing page. We also got QA involved and had them start logging performance issues at the same level as functional issues. This has helped integrate profiling and performance testing into the dev cycle, as no dev wants to see the header (it was designed to be this ugly, hours were put into its UI).
I believe that during development you should always be profiling your code. The sooner you find an issue or notice that a change hurts performance, the faster and easier it is to fix.
Of course, there is a time and a place for optimization, and it should be only after a feature is functional, as there is no point in optimizing code that doesn’t do its job.
But in the end, you need your developers to buy in and get in the habit of profiling as they code, or you’ll end up with surprises later on.
STOP FOR QUESTIONS
Now we’ve seen how to profile at a high level, and as I said, the queries executed are the most important part to look at. So now we’re going to dive into how to find the queries that are bottlenecking your DB.
I don’t think anyone will disagree with me if I say that database performance is critical; it’s like saying that chocolate is the most delicious thing in the world.
Because your database is a shared resource, slow queries in one part of your application can affect the performance of the rest.
This matters all the more because scaling your DB with master/slave configurations or a clustered DB isn’t a simple infrastructure change.
Going to be pretty glib here, but the only real option for finding slow queries with MySQL is the slow query log.
When you turn the slow query log on in MySQL, any query that takes longer to execute than your defined threshold (in seconds) will be logged along with information on how long it took and how many rows were examined. Turning the slow query log on is as simple as adding the “log-slow-queries” directive to your MySQL config (newer MySQL versions replace it with “slow_query_log”), and you can adjust the threshold with “long_query_time”, the default being 10 seconds. So in this example, any query that takes longer than a tenth of a second to execute will be logged.
This is a snippet of the slow query log that I took from our production servers; I’ve just obfuscated the tables/server names. Because we run Percona Server, the slow query log has many additional details over what MySQL gives, which is quite nice =)
Now all this information is great, but let me tell you from experience that trawling through the slow query log is painful, especially if you have the same queries showing up all the time. Thankfully we’re going to look at a tool to synthesize the log into a more helpful format.
Note:
Percona Server slow query log extensions - http://www.percona.com/doc/percona-server/5.1/diagnostics/slow_extended.html#changes-to-the-log-format
pt-query-digest reads the logs and groups queries by their structure, ignoring the exact parameters, then aggregates the data for each group into high-level statistics.
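To make "grouping by structure" concrete, here is a simplified Python sketch of the idea. It is not pt-query-digest's actual algorithm (the real tool's normalization rules are far more thorough); the regular expressions and the `fingerprint` and `digest` names are my own assumptions for illustration.

```python
import re
from collections import defaultdict


def fingerprint(sql):
    """Reduce a query to its structure by replacing literal values with '?'."""
    sql = sql.strip().lower()
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)               # quoted strings
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)                # numbers
    sql = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?+)", sql)    # value lists
    return re.sub(r"\s+", " ", sql)                             # whitespace


def digest(entries):
    """entries: iterable of (sql, seconds). Returns groups sorted by total time, worst first."""
    groups = defaultdict(lambda: {"count": 0, "total": 0.0, "max": 0.0})
    for sql, seconds in entries:
        group = groups[fingerprint(sql)]
        group["count"] += 1
        group["total"] += seconds
        group["max"] = max(group["max"], seconds)
    return sorted(groups.items(), key=lambda item: item[1]["total"], reverse=True)
```

Two lookups of the same table with different IDs collapse into one group, so a query that is individually quick but runs thousands of times still floats to the top of the ranking.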
Running the tool is straightforward on Linux systems: just run the script, pointing it to the slow query log, and redirect the output to a file.
The tool has many options, but a few interesting ones include specifying the date range to examine and writing the data to MySQL instead of a file.
So let’s open it up and see what the digest looks like.
Notes:
To run on Windows, I recommend installing Strawberry Perl (http://strawberryperl.com/) and then prepending “perl” to the command (i.e. c:\> perl pt-query-digest /var/log/mysql/slow-query.log)
Here is a snippet of the output that I generated on my local machine for our demo site. Let’s walk through the three sections.
The first section contains statistics pertaining to all queries in the slow query log. Interesting details include the # of unique queries, and then average stats. There are fewer values here than in the slow query log I showed earlier because of the difference in slow query output between MySQL and Percona Server.
The second section contains the most expensive queries, ranked in descending order. Each query is assigned a unique Query ID that you can search for in the file to find the full query details. Since this ranks the most expensive queries first, this is the best place to start for examining slow queries in more depth.
After the ranking, there is a section for each query ranked. Similar to the overall section, we see statistics pertaining to the executions of this query. Finally, it shows an instance of the query from the slow query log.
Now we have a ranking of the worst queries in our application, so you should pop the top ranked query off and start digging into why it doesn’t perform.
With MySQL that means running EXPLAIN on the query to understand the execution path and where it is going wrong. How to use and understand MySQL EXPLAIN is a tech talk in itself, and thankfully Dr. Aris did one, so I suggest watching the video on Achievers tech and reading his slides. When you are running EXPLAIN, run it on the biggest data sets applicable, as MySQL can change its execution path depending on the relative sizes of tables in the query. So it is possible for a query you optimized to hit a tipping point and all of a sudden start performing badly because it is executing in a way you didn’t intend.
And that is why you always keep up to date on the slow query log. Optimizing queries once doesn’t guarantee they will stay performant.
To get ahead, your developers need to understand how to find poorly performing queries, and equally important, they need to know how to use EXPLAIN and figure out how to fix those queries.
One thing we started recently at Achievers (thanks to Alex Lecca!) is to generate weekly reports with pt-query-digest so that we have visibility into how our DB is performing week to week.
STOP FOR QUESTIONS
Moving on, I have a theoretical question: can anyone tell me what could be happening if the application server is taking a really long time to send the response, but your queries are running fast, none of your benchmarks are slow, and let’s assume there are no issues with third-party systems?
The most likely scenario is that lots of small pieces of code are being executed a lot and the time just adds up, for example a recursive function that doesn’t hit its end condition properly.
Well, this is exactly where a tool called callgrind (and its derivatives) comes to the rescue. It is a command line tool that profiles all the function calls made in your application, and generates a call stack with all the timings.
In a couple of slides, we’ll look at a tool which reads in these call stacks and summarizes the data into an aggregated, easy to understand format.
But before that: running command line tools isn’t very webby, so is there an easier way to generate these grind files?
Notes:
For those interested in the technical details, the callgrind format is documented at http://valgrind.org/docs/manual/cl-format.html
Thankfully there is with PHP! XDebug is not only for interactive debugging, but also has a profiling function which outputs callgrind files.
There is a downside: generating a profile with XDebug will slow the execution of a request by 3-5 times, as it is doing a lot of introspection.
*********
Go to webgrind.
1) Open reference page
Here at Achievers we take web security very seriously, and we use the Open Web Application Security Project’s (OWASP) Enterprise Security API for input and output encoding in our platform, to protect from cross-site scripting attacks. If you want to learn more about web security, you can watch the tech talk on Web Security that Matt York gave last year. So using this library I created an output encoded version of the same page as this grind file; let’s take a look at it in WebGrind.
*gasps*, that’s more than a 10 times performance decrease!
*********
Notes:
See docs at http://xdebug.org/docs/profiler
I used http://www.jetbrains.com/phpstorm/marklets/ to generate bookmarklets so toggling profiling on/off is just a simple click
Many profiling tools for languages other than C/C++ can generate or transform data into the callgrind file format:
RubyProf - http://ruby-prof.rubyforge.org/ - The CallTreePrinter function generates callgrind format
Lua - http://jan.kneschke.de/2007/11/14/profiling-lua-with-kcachegrind/
Python - https://pypi.python.org/pypi/pyprof2calltree/
So here we have WebGrind’s UI; as you can see, it is pretty simple. By default it orders the functions based upon total self cost, which is the cost of the function call minus the cost of any functions it calls. Just like pt-query-digest, the tool does most of the work of finding where to start for you; just start from the top and work your way down. And obviously it is a much better use of your time to improve a method that is called 10 times but takes 500ms than a method that is called 200,000 times and totals 20ms.
So what exactly is WebGrind showing us?
1) At the top we have a bar graph showing the distribution of work (classifying as either internal PHP functions, procedural methods, class methods, or includes) in the entire request, the total # of unique function calls, and total time
2) Then it breaks down all the functions by name. The last three columns are the most important, listing how many times the function was executed, the total self time, and the total inclusive time
3) Beside each function there is an icon that either links to your source code, or PHP docs for internal functions
4) You can also expand a function to see where it was called from and the functions it calls
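The difference between self cost and inclusive cost is easy to mix up, so here is a small Python sketch of the arithmetic. The nested-dict call tree and the function names in it are invented for the example; WebGrind itself works from callgrind files, not from this structure.

```python
def self_costs(node, totals=None):
    """Walk a call tree and total the self cost and call count per function name.

    node: {"name": str, "inclusive": number, "calls": [child nodes]}
    """
    if totals is None:
        totals = {}
    children = node.get("calls", [])
    # Self cost = inclusive cost minus the inclusive cost of everything it called.
    own = node["inclusive"] - sum(child["inclusive"] for child in children)
    entry = totals.setdefault(node["name"], {"calls": 0, "self": 0})
    entry["calls"] += 1
    entry["self"] += own
    for child in children:
        self_costs(child, totals)
    return totals


def ranked(totals):
    """Order functions by total self cost, most expensive first."""
    return sorted(totals.items(), key=lambda item: item[1]["self"], reverse=True)


# Hypothetical request, timings in milliseconds.
EXAMPLE_TREE = {
    "name": "request", "inclusive": 100, "calls": [
        {"name": "render", "inclusive": 70, "calls": [
            {"name": "encode", "inclusive": 30, "calls": []},
            {"name": "encode", "inclusive": 25, "calls": []},
        ]},
        {"name": "query", "inclusive": 20, "calls": []},
    ],
}
```

In this made-up tree, `render` has the largest inclusive time after the request itself, but ranking by self cost points at `encode`, which is where the time is actually spent.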
We use WebGrind over the other tools because I believe it presents the data in the easiest to digest format. But I have to point out that currently WebGrind can only read callgrind files generated by XDebug.
Notes:
KCacheGrind - http://kcachegrind.sourceforge.net/html/Home.html
WinCacheGrind - http://sourceforge.net/projects/wincachegrind/
WebGrind - https://github.com/jokkedk/webgrind
To install WebGrind, just follow the instructions on its GitHub page; they are dead simple. Example virtual host for Apache 2.2 is:
<VirtualHost *:80>
    ServerName webgrind.localhost
    DocumentRoot "c:/webgrind"
    <Directory "c:/webgrind">
        allow from all
        AllowOverride all
    </Directory>
</VirtualHost>
Back to our example: are we stuck with a ten times decrease in performance or not?
When we first started securing our code we weren’t paying attention to its performance until right before release, so we were caught with our pants down, so to speak.
******
Let’s take a look at another grind file, for an optimized version of the OWASP library.
You can already see a huge difference in the number of times functions are called and the total time. The output encoding is still an appreciable chunk of time, but no longer crazy.
******
First, let’s not kid ourselves: output encoding is always going to be an expensive operation, as you’re encoding every non-alphanumeric character to a Unicode code point. The OWASP algorithm is generic and pretty intense, but we were able to find optimizations. So how did we do it?
1) First we had to understand what was going on, so what we did was download WebGrind in order to see the summary of what PHP was doing
2) As you might guess, we were astonished by how many times we were calling functions within the OWASP code, but it gave us the starting point for where to dig into their code to see where we could improve it
3) It took several very smart Achievers several days to figure out exactly where we could make changes to improve the performance; it was not a simple task
As you know, these tech talks are about giving back to the tech community, especially related to the lessons we’ve learned, so soon we are going to make our optimizations available on our Achievers Tech GitHub page.
In the end, the changes we made to the OWASP code were significant and required a thorough understanding of their code. Thankfully, with PHP there is a really simple addition you can make for an immediate performance boost without changing a line of code.
Notes:
Full list of changes we made:
Removing redundant encoding: we know all our code and database are in UTF-8, so we skip the encoding detection step by telling it the initial encoding is UTF-8 (it does the detection if we don’t pass the initial encoding). We also no longer detect the encoding again for each character after splitting the string by character
Unpacking the bytes of the entire string and looping over them to encode is a lot faster than splitting the string by character and, for each character, unpacking its bytes and encoding
We cache all the strings we encode. It uses a bit more memory, but vastly saves on processing time. Without the caching, our demo page would take about 50% longer to do the encoding
There were debug statements in the OWASP encoder that normalized the input encoding before passing it to the debug logging method, but since we have debugging off for OWASP that function did nothing, so this extra work normalizing was all wasted
We also do some APC caching of large objects in the library (specifically the HTMLPurifier necessary for XSS cleaning) to reduce the overhead of instantiating things
We also removed from the HTML encoding the conversion of some characters into their HTML entity codes (compared to their equivalent hex codes), as it has to decode the entity to the character, then normalize and store it in a map
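Of the changes listed in the notes, caching the encoded strings is the easiest to show in isolation. Below is a minimal Python sketch of the technique; it is not our PHP change to the OWASP library. Python's `html.escape` stands in for the real encoder, and the cache size of 4096 entries is an arbitrary assumption.

```python
import html
from functools import lru_cache


def encode_for_html(text):
    """Stand-in encoder (assumption): the talk used the OWASP ESAPI encoder in PHP."""
    return html.escape(text, quote=True)


@lru_cache(maxsize=4096)
def cached_encode_for_html(text):
    """Same output as encode_for_html, but repeated inputs are served from memory."""
    return encode_for_html(text)


if __name__ == "__main__":
    # A page that renders the same name in many places only pays for encoding once.
    for _ in range(1000):
        cached_encode_for_html("<b>Kevin & co</b>")
    print(cached_encode_for_html.cache_info())
```

The trade-off is the one described in the notes: a bounded amount of extra memory in exchange for skipping repeated work on strings that show up many times in a request.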
That addition being opcode caching! As you may be aware, PHP is an interpreted language, so on every request the source files need to be read, parsed, and compiled before being executed. An opcode cache just lets you short-circuit past the first three steps at the expense of some memory. The performance boost is phenomenal, ranging from two to five times faster!
At Achievers we use APC as it is simple to set up, and it turns out that it is on the roadmap to be included into core PHP in version 6.0.
Notes:
The options listed are those that are being actively developed:
APC : http://pecl.php.net/package/APC
XCache : http://xcache.lighttpd.net/wiki/Introduction
Zend Optimizer+ : https://github.com/zend-dev/ZendOptimizerPlus
There are also PHP compilers out there:
Phalanger (http://www.php-compiler.net/), which compiles PHP into Common Intermediate Language (CIL) bytecode for .NET to run using IIS/Windows or Mono/Apache on Linux
HipHop (https://github.com/facebook/hiphop-php) by Facebook, which originally converted PHP into C++ and has since moved to just-in-time compilation with its HHVM virtual machine
Setting up APC is pretty simple, just get the extension and enable it in your php.ini, so I won’t go into too much detail. With APC you may want to tweak the cache size and whether it should check if the file is modified on every request. But otherwise, turn it on and enjoy the boost.
Notes:
For Windows users, if the versions at downloads.php.net/ don’t work for you, try http://dev.freshsite.pl/php-accelerators/apc.html. You may have to search around for a version compiled for your version of PHP/thread-safety
Most important options to tweak in APC are:
apc.shm_size - size of each shared memory segment allocated for APC in MB (defaults to around 32MB depending on distro)
apc.stat - whether to check if the file is modified on every request
One more time: WebGrind isn’t really useful unless it becomes second nature to use as you are developing. It is pretty important, as with it you’ll have a better understanding of what is happening in your code and prevent surprises like we had.
STOP FOR QUESTIONS
At this point we’ve discussed profiling both your database and application code, so we’ve covered the main aspects of the server side of a request. But once the response is sent and it starts loading in the user’s browser, how can we find out what the browser is doing and if we can increase its performance?
Of the topics in this presentation, how to profile and improve browser performance has by far the most already written about it. So I doubt anyone will be surprised by the tools I will go over.
Before we get to the real tools, let’s talk a little about what data we are interested in when it comes to performance in the browser.
I’m sure that anyone who’s worked on the front-end of a website has used Firebug or a native developer tool at least once. These literally form the basis of profiling in the browser.
In general, the most critical aspect of performance in a browser is how fast it can request, download, and use resources. Slow resources directly affect how quickly the content can be rendered, styled, and made usable.
But how do we make use of this information?
Good thing is, someone else has already done the work for us. YSlow and PageSpeed are basically the industry standards for profiling browser performance.
They analyze the page, score it based upon a variety of rules, and then give you recommendations on what you can do to improve performance.
Not that it is always simple to implement the recommendations, but if you follow the advice you will get the biggest bang for your buck.
Notes:
PageSpeed has an online version at https://developers.google.com/speed/pagespeed/insights (doesn’t work for authenticated pages)
Using either of these tools is as simple as opening them and clicking the “Analyze now” or “Run test” buttons.
1) First we have a screenshot from YSlow showing the overall grade and every rule it applied. Each rule has its own grade and a short explanation of the rule and the recommendation
2) Now we see PageSpeed, which only gives you an overall score out of 100, but orders its recommendations from the largest potential wins to the smallest
Going to go through these next two slides quickly. I’ve just put a quick summary of the most important things you can do for various parts of a web request to improve performance, but again, implementing these isn’t always simple.
Notes:
Parallelizing downloads - the HTTP/1.1 spec recommends no more than 2 concurrent connections to a single server, and browsers cap it at a small number per domain (typically around 6), so the browser blocks on the next resource once that limit is reached. If you use different domains (even if they point to the exact same server) you can increase the parallelism of the browser and eliminate blocking
There are hundreds of resources online that go into depth on how to implement each of these recommendations, so instead of talking about each one I want to show off another pretty cool browser profiling tool from Google, called Speed Tracer.
Notes:
ETags: ETag stands for Entity Tag. The browser will cache the resource and the ETag from the header. Then the browser can send a request for the resource with an “If-None-Match: [ETag]” header, and if the resource hasn’t changed the server can just return a 304 Not Modified response so the browser knows to use the cached resource. If the resource changed, the server will just return the new resource and ETag
CDNs: Content Delivery Network - separate (generally cookie free) domains that only serve up static resources, and can be placed to be geographically close to users
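The ETag exchange described in the notes can be sketched on the server side in a few lines of Python. This is a simplification under stated assumptions: the ETag is derived from a hash of the body (one common approach, not the only one), `make_etag` and `respond` are made-up helper names, and a real server also has to handle `If-None-Match` values containing several tags or `*`.

```python
import hashlib


def make_etag(body):
    """Derive an ETag from the response body (assumed scheme: truncated SHA-256)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers, body):
    """Return (status, headers, payload) for a GET of a static resource."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # Resource unchanged: send no payload, the browser reuses its cached copy.
        return 304, {"ETag": etag}, b""
    # First request, or the resource changed: send the full body and its ETag.
    return 200, {"ETag": etag}, body
```

The first request gets a 200 with the full body; a repeat request that echoes the ETag back gets an empty 304, and once the file changes the ETag no longer matches so the new content is sent.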
Speed Tracer is a Chrome extension that records UI events as they occur in your page, and then shows you a timeline of how sluggish (resource constrained) the UI thread is. Though this tool is pretty cool, any issue that you might discover from it is of a much lower priority than implementing YSlow and PageSpeed’s recommendations.
****Switch over******
Notes:
Excessive layout refers to the browser having to re-compute where DOM elements should be because of a CSS change, such as changes to anything in the box model, floats, etc. Changing a lot of values in succession can cause a lot of tiny layouts; worse is appending a lot of nodes to the DOM in succession (such as adding rows to a table)
The sluggishness graph shows how much of the UI resources are being consumed, with the top of the graph being 100%. So the smaller the peaks the better, as being at 100% means all the UI resources are consumed and new events have to block until resources are free.
Below that is the network graph, which exists just to be able to cross-reference network activity with sluggish UI behaviour.
At the bottom is the event timeline, which is very similar to the network timelines from developer tools. It shows, in order, the UI events that occurred and how long each took. You can click into each event to get a more detailed view of what occurred.
While this tool is cool, it probably isn’t too useful unless you have a very, very script heavy page, at which point being able to see how long script evaluation, callbacks, and garbage collection take can be useful.
But to wrap up, profiling in browsers is really quite simple: use YSlow or PageSpeed, as they do all the heavy lifting of profiling. All you need to do is figure out how to implement their recommendations.
STOP FOR QUESTIONS
Now we’ve covered all the areas you can profile your web application in, but all the concepts and tools I’ve shown have been in the context of a single developer profiling on their dev environment. How do we know if our code will scale?
To answer that question, we stress the bejeezus out of our code with load testing
All the profiling we’ve talked about so far only gives you a general guideline on whether your application will perform well.
Unless you generate some load, you don’t know whether a piece of your infrastructure, let’s say your database, will have a conniption once dozens of requests are vying for its resources concurrently.
Average response times discovered through load testing will also be more accurate than your own development numbers.
In order for load testing to be relevant, you need to have an idea of the critical paths through your application. The pages you want to make sure are running as fast as possible and scale to many users are the prime candidates for load testing.
At Achievers, our load test involves a user logging in, viewing the newsfeed and catalog, searching for a user, doing a POST to make a recognition, then logging out.
It’s a good idea to include a request that writes to the database so that you aren’t just testing a ‘static’ site.
There are many options available for load testing, and I will admit the only one I’ve used in depth is JMeter (though loadimpact.com seemed interesting when I ran its interactive demo). I can tell you that JMeter is very powerful and configurable, but it is also a bit confusing if you aren’t familiar with it, mostly because it can generate load for more than just HTTP requests.
Notes:
JMeter - http://jmeter.apache.org/
Siege - http://www.joedog.org/
CURL-loader - http://curl-loader.sourceforge.net/doc/fast.html
WebLOAD - http://www.radview.com/
Load Impact - http://loadimpact.com
JMeter is an Apache project and is a graphical interface for building test plans to make requests to a variety of endpoints, including HTTP.
It has a lot of functionality, including setting up concurrent threads (i.e. what makes it useful for load testing), storing cookies, asserting on responses, and reporting.
The UI takes a little bit to get used to; it’s a bit strange that to add an HTTP request you have to go to the “Sampler” menu option.
******
Let’s take a look at JMeter, go through the test plan and show adding a new URL
******
So here is a screenshot of a test plan I came up with for Special K’s rentals. Let’s walk through what is going on in the plan:
1) First I have added a thread group, which is the basis of generating multi-user load. I have it set to 50 concurrent threads
2) Then for each thread there is a cookie manager (so that session cookies from logging in are stored) and some request defaults to make life easier
3) After that we define all the URLs we want to hit. Each thread will log in, go to a film, hit the six degrees calculator, do a film search, and then log out
4) On the right side I have open the aggregate report that populates as the test is in progress. Here we have all sorts of stats, including the number of samples and the min/max/average response time. There are a few other columns I cut off, including the error rate and throughput
If you can see the numbers in the report, you can see that the min and average values for each page are starkly different, so loading the server greatly increased response times. Since I was running this on my laptop with 50 threads, it makes perfect sense, as 100% of the CPU was being used during the whole test. Which brings up that load testing should be done on an environment as close to your production servers as possible to get the most relevant numbers.
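To show what the thread group and aggregate report boil down to, here is a bare-bones Python sketch of the same idea. It is in no way a replacement for JMeter: there are no cookies, assertions, or ramp-up, and `fake_request` is a placeholder that just sleeps, which you would swap for a real HTTP call against your own test environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed(call):
    """Run call() and return its duration in milliseconds."""
    start = time.perf_counter()
    call()
    return (time.perf_counter() - start) * 1000.0


def run_load(call, threads=50, requests_per_thread=5):
    """Fire call() from many threads at once and collect per-request timings."""
    def worker(thread_index):
        return [timed(call) for _request in range(requests_per_thread)]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        batches = list(pool.map(worker, range(threads)))
    return [ms for batch in batches for ms in batch]


def aggregate(samples):
    """Summarize timings the way an aggregate report does."""
    return {
        "samples": len(samples),
        "min_ms": min(samples),
        "avg_ms": mean(samples),
        "max_ms": max(samples),
    }


def fake_request():
    """Placeholder for a real HTTP request (assumption: replace with your own call)."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(aggregate(run_load(fake_request)))
```

Comparing `min_ms` against `avg_ms` gives the same signal as in the screenshot: the wider the gap, the more the server is struggling under concurrent load.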
Just like profiling, load testing should become part of your standard cycle. At Achievers we run a two-week sprint cycle, meaning that every two weeks there should be new usable code for features. Thus we really should be load testing our code at least once every two weeks to validate that we aren’t ruining performance. Regardless of the interval of load testing you choose, it is most important to be consistent so you can really see the performance trend line. I’ll admit we aren’t very good at this internally right now; we need to get our process in order.
Now that you can create test plans and run them, you should have a good idea of how your application handles load and thus be confident that your application won’t collapse when you run in production. Your load tests should be geared towards the average number of concurrent users you handle day-to-day, but you should be aware of how your system can handle spikes of users. At Achievers we average about 300 concurrent users at a time, but we have seen spikes up to 800 users, and I’m proud to say our application scales well, as there is barely a drop in response times.
STOP FOR QUESTIONS
Though our newfound confidence that our application scales is great, it doesn’t hold a candle to actually knowing how your application scales in production. To truly know this, we need to monitor what is going on in production.
There are a few different types of production monitoring tools, from IT-centric tools like Zabbix for monitoring server hardware and network traffic to marketing-centric tools like Google Analytics. But we’re going to look at tools that are centered around profiling your application as it runs and letting you know if they see performance and scalability issues.
The fancy name for this type of monitoring is Application Performance Management, or APM. As I mentioned, APM tools combine profiling the guts of your application with diagnosing and flagging issues that impact performance or availability. So as these tools profile your app, they aggregate the data into actionable statistics. The APM tool I’m going to talk more about is New Relic, as we run it here at Achievers. APM tools generally do not come cheap (usually paying per server per month), but both New Relic and AppDynamics do have free lite versions. However, if your company has the money to pay for a full version, it is more than worth it for the insight they give you into the performance and scalability of your application.
Notes:
New Relic - https://newrelic.com/
Scout - https://scoutapp.com/
AppDynamics - http://www.appdynamics.com/
dynaTrace - http://www.compuware.com/application-performance-management/dynatrace-enterprise.html
Here is a short list of some of the features that APM tools generally come with; unfortunately, not every feature is available in the free versions. The only downside of running an APM (other than cost) is the minuscule performance drop, usually in the 1-2% range. However, this is a small price to pay for all the concrete data they provide.
New Relic is a SaaS platform with a web UI that is quite intuitive and slick. The free version contains the basics: real-time + server monitoring, application response times, and end-user monitoring, but it only retains that data for 24 hours. The pro version has a lot more features, including database response times and slow request/query traces, and also retains the data forever. Obviously the cost of a paid APM means it isn’t accessible to startups, though even the scaled-back features in the free version are so useful I’d never start a new web application without them.
Unfortunately the installation of New Relic for PHP isn’t as simple as the other tools we’ve talked about, so I won’t bother with it. New Relic has two components: a PHP extension that does all the data gathering while your PHP code executes (not unlike XDebug) and a daemon process that receives the data and transmits it to the New Relic servers. The extension contains a method you can call that outputs a script block which facilitates the end-user monitoring.
Notes:
https://newrelic.com/docs/php/the-newrelic-install-script
Let’s assume you got New Relic installed and running, so what does it look like? This screenshot is the overview for New Relic and has auto-refreshing dashboards galore. We actually run this on a TV in the dev department so anyone can see at a glance how production is doing, alongside a TV for Google Analytics. The UI is pretty darn slick, as you can hover over the graph for more details, drag on the graphs to zoom in, and turn components on and off. I’ll show off a couple of other pages in New Relic to give you more of a feel for how much useful information it provides.
This is the Web Transactions page where we can see the app server + browser times, which you can sort in many ways. Here it is sorted by the percent of total page load time, so basically the most popular URLs are at the top. On the right side we get some nice graphs on response time and throughput, and cut off at the bottom are slow traces that New Relic has identified.
If you click into one of the web transactions from the left side, a new pane appears showing the performance and throughput of that web transaction. It also has slow traces for the transaction in case you want to dig in to see what their breakdown is.
If you click into a trace, another new pane appears, with the full details of the trace. We know the browser, where the request was made from, the breakdown of actions making up the total response time, and it even gives us a nice comparison to the baseline for this web transaction. There is a very similar UI for database transactions, and these two sections together have been invaluable in finding pages with poor performance that we didn’t find in our development profiling. As we are a SaaS platform with many configurations, it isn’t possible for us to have 100% testing coverage, so having these real trace details is invaluable to solving customer issues.
New Relic also provides several reports which you can use to see how your application is performing in various aspects. Shown here is the response time scalability report, which displays a trend line of how your application’s response times change as traffic increases. Generally, the more horizontal the trend line is the better, as that means the application response time doesn’t increase as more users are hitting the site. As you can see, Achievers is pretty good, as our line is pretty flat: as traffic increases our response time only increases by about 40ms.
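If you want to reason about a trend line like this outside of an APM tool, a plain least-squares fit is one simple way to do it. The Python sketch below is only an illustration under my own assumptions: it is not how New Relic computes its report, the function name is hypothetical, and the sample points are made up. The slope tells you how many milliseconds of response time each extra unit of throughput costs, so a slope near zero corresponds to the flat line you want.

```python
def trend_line(points):
    """Least-squares fit over (throughput, response_ms) pairs.

    Returns (slope, intercept), where slope is the change in response
    time per extra unit of throughput.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")

    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n

    # Spread of the throughput values; zero means a vertical line.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all throughput values are identical")

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Made-up samples: (requests per minute, average response time in ms).
    samples = [(100, 210), (200, 220), (300, 230)]
    slope, intercept = trend_line(samples)
    print("ms per extra request/minute:", slope)
    print("baseline response time in ms:", intercept)
```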
You don’t want to see a graph like this, where the response time is increasing pretty linearly with the number of requests. I found this screenshot on the web, I’m not s
I’ve only shown off a couple of the features that an APM tool like New Relic has to offer, but even the basic features like application response time and end-user monitoring are so insightful that I can’t imagine how I lived without them. Since starting to use New Relic I’ve come to the conclusion that, just like I wouldn’t write an application without logging to know and debug what’s going on in production, I wouldn’t launch an app without an APM tool in place. The real, user-driven data they provide is just that useful and takes a lot of the guesswork out of choosing where to start improving your application’s performance (paying for one out of my own pocket is a different matter).
STOP FOR ANY QUESTIONS
We’ve gone over a lot of content so far, so I’d just like to quickly recap what we’ve talked about. … Once you start using these tools you’ll get a feel for how they work and how you should be using them in your organization, so just dive in and start using them!
Coming to the end of this presentation, I just want to reiterate the themes I’ve been laying down.
Profiling has to be integrated into your development culture, which means teaching the tools, making sure developers use them, and holding people accountable for poorly performing code.
Without visibility into how your code is performing, you can’t fix it without floundering around.
Profile and load test your code well before release to have confidence in its scalability.
Run an APM in production because nothing beats real, user-driven metrics.
And just never let up; don’t relax and let the performance of your application slowly be dragged down until it is unusable.
Thank you
If you have questions about the tools or trouble getting them to work, go to the Achievers Tech Facebook page and we’ll get back to you asap
Profiling and Tuning a Web Application - The Dirty Details
Profiling & Tuning a Web Application Kaelen Proctor
So what if my site isn’t the fastest?• Response time directly relates to your business – In 2007 Amazon determined that a 100ms increase in load time would cause a 1% drop in sales – In 2009 Shopzilla decreased page load time from 6 to 1.2 seconds, which netted a 7-12% conversion rate increase!• The slower you serve up pages, the more frustrated your customers become
What exactly is profiling?• Profiling is a dynamic analysis of the time complexity, frequency/duration of function calls, or memory allocation of a program• A profiling tool runs this analysis by instrumenting either the source or executable, through a variety of techniques including event hooks or dynamic recompilation
Goals• To show off some tools of the profiling trade• To demonstrate how to use them effectively and identify the biggest “bang for your buck” bottlenecks• To impress upon you the need to integrate profiling early and continuously into your development cycle
Agenda1. Profiling your application I. What a single request looks like II. The database (MySQL) III. Application code (PHP) IV. In the Browser2. Maintaining performance at scale I. Load testing II. Production monitoring
Before you embark• What is your performance goal?• Where is that relative to today?• What processes are necessary to maintain the goal?
Profiling for a Web Application• Web applications are all about speed; how quickly a response can be sent and usable• On the app server, that means understanding the queries run, 3rd party libraries, web APIs, and application code• Simple can sometimes be best – We use CodeIgniter (CI) here and its built-in request profiler is easy to use and extremely helpful
Enabling the CI profiler• Drop this line anywhere before the controller ends: – $this->output->enable_profiler(true);• The output code injects the profiling content at the end of the <body> tag• Let’s see what it looks like. Site: Special K’s Video Rentals
How do the profile details help?• Great overview of what is occurring in the request• Queries executed is the most important aspect of a profile – Identification of long-running or duplicate queries• Adding timing benchmarks can give a lot of insight – Especially if you leverage a lot of 3rd party libraries or web services
It’s more of a guideline• Most likely, you’re profiling on your dev machine with test data• No idea how the request will scale – No competition for resources (i.e. database) – Have you profiled all possible configurations?• Is anyone profiling or even paying attention to the details?
… In my experience, they aren’t• No matter how much documentation you write on how to profile and which tools to use, it will get dropped in crunch time• Most developers didn’t even turn the profiler on
Achievers Performance Header ™• Leveraging CI’s profiler, we tie the profiling summary to our performance targets• Text is colour-coded on a linear scale from green to red, getting redder the further the request is from our targets• Expanding the header shows the summarized performance details
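The header’s actual formula isn’t shown in this talk, but a linear green-to-red scale can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function name, the idea of a separate `worst` value at which the text is fully red, and the example numbers in milliseconds.

```python
def target_colour(actual, target, worst):
    """Map a measurement to a hex colour on a linear green-to-red scale.

    At or below `target` the colour is pure green; at or above `worst`
    it is pure red; in between it is interpolated linearly.
    """
    if worst <= target:
        raise ValueError("worst must be greater than target")

    # Fraction of the way from the target to the worst case, clamped to 0..1.
    fraction = (actual - target) / (worst - target)
    fraction = max(0.0, min(1.0, fraction))

    red = round(255 * fraction)
    green = round(255 * (1 - fraction))
    return "#{:02x}{:02x}00".format(red, green)


if __name__ == "__main__":
    # Hypothetical target of 200ms, fully red at 1000ms.
    for response_ms in (150, 200, 600, 1000, 2500):
        print(response_ms, target_colour(response_ms, target=200, worst=1000))
```

Clamping the fraction means anything faster than the target stays green and anything slower than the worst case stays red, rather than producing out-of-range colour values.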
Shoving it in their faces• Now once any target’s threshold is passed, the header defaults to the expanded view
Knowing is half the battle• Finding issues early => more time to fix• Always profiling => instant detection of a performance-killing change• But there is balance – “Premature optimization is the root of all evil” – Wait until a feature is working before making it work fast
Database performance is critical• It is the biggest shared resource your application contains• Really slow queries will affect the speed of the entire database• Scaling out your DB is not a simple task, so ensuring it isn’t bogged down is critical
Finding the stragglers• First you need to identify the slow queries, so you can: 1. Manually review each query in your code 2. Profile every request and review each executed query 3. Let MySQL do the work with its slow query log• Let’s go with option #3
Slow query log• When on, MySQL logs any query that runs longer than a threshold # of seconds• The log contains the total query time, lock time, and rows examined/sent• To enable, add to the MySQL config file: log-slow-queries=/var/log/mysql/slow-query.log long_query_time=0.1 (MySQL 5.6 and later removed log-slow-queries; use slow_query_log=1 and slow_query_log_file=/var/log/mysql/slow-query.log instead)
pt-query-digest• http://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html• Reads the slow query log and groups queries by their structure• Outputs aggregated statistics on the whole log as well as for each query
Digesting• All the Percona tools are Perl scripts, so execution is fairly straightforward• Usage (on unix/linux): – pt-query-digest /var/log/mysql/slow-query.log > digest.out• Options for specifying a date range, filter queries, writing the results to a DB, etc.
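To give a feel for what “groups queries by their structure” means, here is a small Python sketch of the idea. It is a deliberate simplification and not pt-query-digest’s actual algorithm: the `fingerprint` and `digest` functions are hypothetical, the normalization rules are far cruder than the real tool’s, and the input is assumed to be already-parsed (seconds, SQL) pairs rather than a real slow query log.

```python
import re
from collections import defaultdict


def fingerprint(sql):
    """Reduce a query to a rough structural signature by hiding literals."""
    sql = sql.strip().lower()
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)              # quoted strings
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)              # numbers
    sql = re.sub(r"\(\s*\?(?:\s*,\s*\?)*\s*\)", "(?+)", sql)  # value lists
    sql = re.sub(r"\s+", " ", sql)                            # whitespace
    return sql


def digest(entries):
    """Group (seconds, sql) pairs by fingerprint, slowest total first."""
    groups = defaultdict(list)
    for seconds, sql in entries:
        groups[fingerprint(sql)].append(seconds)

    report = [
        {"query": query, "count": len(times), "total": sum(times), "max": max(times)}
        for query, times in groups.items()
    ]
    report.sort(key=lambda row: row["total"], reverse=True)
    return report


if __name__ == "__main__":
    # Made-up log entries: (query time in seconds, SQL text).
    log = [
        (0.5, "SELECT * FROM film WHERE id = 42"),
        (1.5, "SELECT * FROM film WHERE id = 7"),
        (0.2, "SELECT name FROM actor WHERE name = 'Bob'"),
    ]
    for row in digest(log):
        print(row)
```

Sorting by total time rather than by the single slowest run surfaces queries that are individually tolerable but run so often that they dominate the database’s workload.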
Well, now what?• Now you have a great starting point for finding bottlenecks in your DB• Slow queries - run MySQL EXPLAIN – Refer to the tech talk by Dr. Aris Zakinthinos – Most likely it is missing indexes – De-normalization may be necessary – Protip: Use your biggest data sets when running explain
Regurgitation• Running pt-query-digest once won’t solve all your database issues• Tuning your query performance is a never-ending process – Teach developers how to use EXPLAIN and optimize queries – Weekly reports using pt-query-digest to give visibility into DB performance
The Devil is in the details• Callgrind is a language agnostic command line tool that profiles the function calls in your application (through emulation)• It generates callgrind files which can contain the entire call stack, and can be read to summarize what your app code is doing
XDebug profiling to the rescue• Awesomely, XDebug writes callgrind files when profiling is enabled – This makes generating the grind files trivial in PHP• Just add to your php.ini: xdebug.profiler_enable=1 xdebug.profiler_output_dir = "/tmp/xdebug"
Other Grind Visualizers• KCacheGrind was the original visualizer, which was ported to Windows as WinCacheGrind• Regardless, all three aggregate the function calls into total # of calls and total cost(s)• WebGrind is limited to a summary table, whereas the other two can display the full call tree
Installing WebGrind• Installation: 1. Prerequisite: Install XDebug 2. Download zip from WebGrind’s Github 3. Extract zip to folder accessible by webserver 4. Setup virtual host for WebGrind 5. WebGrind will read from the XDebug profiler output directory automatically 6. Open in browser and voila!
Are we screwed?• No! We can fix it, otherwise this wouldn’t be a good demo =)• We ran into these performance issues with the OWASP library late in our security release
Output encoding• Output encoding is a very expensive task – Simply put, the OWASP library encodes any non-alphanumeric character – It makes no assumptions on the incoming data, so ends up doing a lot of encoding detection and normalization before any real output encoding
Digging into OWASP• How did we make it more efficient? – First, we installed WebGrind and started looking at exactly what OWASP was doing – We identified the functions that were taking too long or being called too often, and then dove into the code – A little elbow grease and trial/error later, we had it optimized and running smoothly
Opcode caching• PHP is an interpreted language, so with every request, the code is read from disk, parsed, and compiled into opcode before executing• An opcode cache stores the compiled opcode so the first three steps are skipped• Speeds up your application by 2-5 times!• Options: APC, XCache, Zend Optimizer+
Setting up APC• http://pecl.php.net/package/APC• Linux: – Install w/ PECL: pecl install apc-3.1.9 – Compile the extension yourself• Windows: Download pre-compiled binary from http://downloads.php.net/pierre/• Enable by adding the extension in php.ini• Sit back and enjoy the performance boost
Keep on Grindin’• Use WebGrind to summarize what your app code is doing; find the functions bottlenecking your application• Make it second nature to profile your application code with WebGrind• But for a quick boost, start using an opcode cache now and never look back!
Developer tools• Firebug + webkit developer tools• The most important aspect of these tools to performance is the network/timeline tab• Shows you all resource requests and their timings including blocking, waiting, receiving, and more• Displays when the DOMContent and Load events are fired
Yahoo’s YSlow and Google PageSpeed• Browser extensions for Chrome + Firefox (sorry, IE)• Analyze a page request/response and offer best practices about how to improve performance• Yahoo and Google know what they are talking about; follow the tools’ advice for the biggest wins in the shortest timeframe
A quick summary• HTTP – Reduce # of requests – Parallelize downloads – Smaller cookies• HTML – Reduce DOM nodes – Asynchronously load minor content• CSS – Minify + concatenate – Load in the <head>
Google Speed Tracer• Chrome extension that shows a timeline of the internals of the UI thread including HTML parsing, script callbacks, painting, garbage collection, and many more• Resolving issues found by Speed Tracer should be saved for after implementing all of YSlow and PageSpeed’s recommendations
The reasons you load test• A more accurate portrayal of your site’s average performance, compared to request profiling and grind files• Helps locate issues of scale that don’t appear when testing a single request
Before we dive into the tools• First, you need to define a basic flow that you want to measure as your benchmark• Ex. Login -> Newsfeed -> Catalog -> User Search -> Recognition -> Logout• Should contain the most commonly accessed URLs• Also nice to have a mix of GET and POST requests
Choosing a load tester• Options abound – JMeter: GUI – Siege: CMD – CURL-loader: CMD – WebLOAD: GUI – Loadimpact.com - SaaS• We will focus on JMeter since we use it =)
JMeter• A GUI written in Java for load testing and benchmarking servers (HTTP, SOAP, JMS, etc.)• Supports variables in requests, assertions on responses, cookies, and many aggregate reports• Not the most intuitive UI until you get used to it
When to load test?• Depends on your dev cycle, but once a week/sprint is a good starting point• It’s more important to be consistent!• If possible, should be part of an automated build/test suite
Getting ahead• Metrics on multi-user response times before going to production are important• Otherwise you have no idea how your app will scale to a real user load• It’s probably a good idea to load test with more users (threads) than your average to know how you can handle spikes
Application Performance Management• Tools that focus on monitoring + managing the performance, availability, and scalability of an application• Some options: – New Relic - PHP/.NET/Ruby/Java/Python – Scout - Ruby – AppDynamics - Java/.NET – dynaTrace - Java/.NET
The benefits• Too many to list – Real-time dashboards – Application response times – End-user monitoring (browser times) – Error reporting – Alerts for server/performance issues – Server monitoring – And more!
New Relic• SaaS APM platform with a slick web UI• Free lite version has real-time monitoring, server monitoring, and error detection, but only retains data for 24 hours• Pro version is $150/server/month, but has many additional features, including full response traces
Real metrics = real insight• APM tools are the culmination of all the profiling tools + techniques we’ve seen• There is no substitute for real, user-driven performance numbers• Review the bottlenecks the APM identifies, dig deeper using all the other tools we learned about, then watch as your app gets faster and more responsive
Recap1. Profiling your application I. Profiling as an overall health check II. Digesting the slow query logs to find bottlenecks III. Grinding your code to find the hidden details IV. YSlow/PageSpeed and doing what they say2. Maintaining performance I. Use JMeter to load test for scalability II. Monitoring prod to accumulate real metrics
Putting it all together• Teach developers the importance of profiling their code; integrate into culture• Performance must be top of mind/visible• Profile and load test critical sections before release; confidence in your code• Run an APM in production; real, actionable data on bottlenecks• Never let up: the war is never over