phptek13 - Caching and tuning fun tutorial

phptek13 Caching and tuning fun for high scalability tutorial

Presentation Transcript

  • Caching serves 3 purposes: - Firstly, to reduce the number of requests or the load at the source of information, which can be a database server, content repository, or anything else.
  • Secondly, you want to improve the response time of each request. If you request a page that takes 50ms to load without caching and you get 10 hits/second, you won't be able to serve those requests with 5 Apache processes. If you could cache some of the data used on the page, you might be able to return the page in 20ms. That doesn't just improve the user experience, it also reduces the load on the webserver, since the number of concurrent connections is a lot lower. → connections closed faster → handle more connections and, as a result, more hits on the same machine. → If you don't: more Apache processes needed → they will eat memory, eat system resources and, as a result, also cause context switching.
  • More tuning → Reduce the amount of data that needs to be sent over the network or the Internet - benefits you, as the application provider, because you have less traffic on your upstream connection. - also better for the end user, because there will be less data to download. → Of course this is part of the frontend side, which we'll discuss near the end of the tutorial.
  • The first way to cache in a web application, or actually more commonly a website, is to cache entire pages. This used to be a very popular caching technique, back in the days when pages were very static. It's still popular for things like company websites, blogs, and any site that has pages that don't change a lot. Basically, you just render the page once, put it in a cache, and from that moment onwards, you just retrieve it from the cache.
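    A minimal sketch of that full-page approach, using output buffering and a file on disk as the cache store (the path and lifetime here are just examples, not from the slides):

        <?php
        // Full-page cache sketch: serve the stored copy if it's fresh enough,
        // otherwise render the page once, store it, and send it.
        $cacheFile = '/tmp/pagecache/' . md5($_SERVER['REQUEST_URI']) . '.html';
        $lifetime  = 300; // seconds the rendered page stays valid

        if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $lifetime) {
            readfile($cacheFile); // cache hit: send the stored page and stop
            exit;
        }

        ob_start();                       // cache miss: buffer the output while rendering
        // ... render and echo the page here ...
        $html = ob_get_contents();
        if (!is_dir(dirname($cacheFile))) {
            mkdir(dirname($cacheFile), 0755, true);
        }
        file_put_contents($cacheFile, $html);
        ob_end_flush();                   // send the freshly rendered page to the visitor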
  • Store part of a page. Probably the most common + best way to cache data. - Basically what you do is take a piece of data: - data from the database - the result of a calculation - an aggregation of two feeds - parsed data from a CSV file on an NFS share located on the other side of the world - it could even be data that was stored on a USB stick your kid is now chewing on. What I mean is: it doesn't matter where the data came from. It's part of a page, usually a block on a page, and you want to save time by not having to get that data from its original source every single time. Instead of saving the entire page, which can have multiple dynamic parts, some of which might not be cacheable because they are really dynamic (like the current time), you store a small block, so that when we render the page, all we do is get that small block from the cache, place it in the dynamic page and output it.
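    As a sketch of that block-level caching, assuming the Memcached extension and a local Memcache instance (the key name, TTL and the render_latest_articles() helper are made up for the example):

        <?php
        // Cache one block of the page (e.g. a "latest articles" list) instead of the whole page.
        $memcached = new Memcached();
        $memcached->addServer('127.0.0.1', 11211);

        $key   = 'myapp_block_latest_articles';
        $block = $memcached->get($key);

        if ($block === false) {
            // Miss: build the block from its original source (database, feed, CSV, ...)
            $block = render_latest_articles();   // hypothetical helper that returns HTML
            $memcached->set($key, $block, 600);  // keep it for 10 minutes
        }

        // The rest of the page stays fully dynamic; only this block comes from the cache.
        echo $block;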
  • Store the output of SQL queries. → Now, how many of you know what SQL query caching is, the MySQL query cache for example? → Basically, the MySQL query cache is a cache which stores the output of recently run SQL queries. It's built into MySQL, it's... not enabled by default everywhere, it depends on your distribution. → And it speeds up queries by a huge margin. Disabling it is something I never do, because you gain a lot by having it enabled. → However, there are a few limitations: - First of all, the query cache is limited in size.
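    If you want to check from PHP whether the query cache is on and how big it is, something like this works (the DSN and credentials are placeholders):

        <?php
        // Inspect the MySQL query cache settings from PHP.
        $pdo = new PDO('mysql:host=127.0.0.1;dbname=test', 'user', 'secret');

        foreach ($pdo->query("SHOW VARIABLES LIKE 'query_cache%'") as $row) {
            // e.g. query_cache_type = ON, query_cache_size = 16777216 (bytes)
            echo $row['Variable_name'] . ' = ' . $row['Value'] . PHP_EOL;
        }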
  • But basically one of the big drawbacks of the MySQL query cache is that every time you do an insert, update or delete on a table, the entire query cache for queries referencing that table is erased. → Another drawback is that you still need to connect to the MySQL server and you still need to go through a lot of the core of MySQL to get your results. → So, storing the output of SQL queries in a separate cache, be it Memcache or one of the other tools we're going to see in a moment, is actually not a bad idea. Also because of the fact that, if you have a big website, you will still get quite a bit of load on your MySQL database. So anything that takes the load off the database and moves it to where you have more resources available is a good idea. → Better: store the returned object or group of objects.
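    A sketch of that idea: keep the hydrated result of a query in Memcache and only fall back to MySQL on a miss (the table, key and TTL are illustrative):

        <?php
        // Store the result of a query (better: the resulting objects) in Memcache.
        function get_active_users(PDO $pdo, Memcached $memcached)
        {
            $key   = 'myapp_users_active';
            $users = $memcached->get($key);

            if ($users === false) {
                $stmt  = $pdo->query('SELECT id, name FROM users WHERE active = 1');
                $users = $stmt->fetchAll(PDO::FETCH_OBJ); // cache objects, not raw SQL text
                $memcached->set($key, $users, 300);
                // Invalidate or refresh this one key when a user changes,
                // instead of letting MySQL wipe its whole query cache for the table.
            }
            return $users;
        }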
  • Another caching technique I want to mention is storing the result of complex PHP processes. - You might think about some kind of calculation, but when I mention calculation, people tend to think about getting data from here and there and then summing it up. - That's not what I mean. By complex PHP processes I mean things like parsing configuration files, parsing XML files, loading CSV data into an array, converting multiple XML files into an object structure, and so on. - The end result of those complex PHP processes can be cached, especially if the data we started from doesn't change a lot. That way you can save a lot of system resources, which can be used for other things.
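    For example, caching the parsed form of an XML configuration file in the APC user cache, keyed on the file's modification time so it is only re-parsed when the file actually changes (the function and key names are made up):

        <?php
        // Cache the end result of an expensive parse, not the raw file.
        function load_config($file)
        {
            $key    = 'myapp_config_' . md5($file) . '_' . filemtime($file);
            $config = apc_fetch($key, $hit);

            if (!$hit) {
                $xml    = simplexml_load_file($file);           // the expensive part
                $config = json_decode(json_encode($xml), true); // plain array, safe to cache
                apc_store($key, $config, 3600);
            }
            return $config;
        }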
  • There's plenty of other types of data to store in cache. The only limit there is your imagination. All you need to think of is: - I have this data - how long did it take me to create it - how often does it change - how often will it be retrieved? That last bit can be a difficult thing to balance out, but we'll get back to that later.
  • OK, let's talk about where cached data can be stored. I already mentioned the MySQL query cache. Turn it on, but don't rely on it too heavily, especially if you have data that changes often.
  • I said I was going to discuss some do's and don'ts... → This one falls under the category don't. → There's a second database mechanism for "caching", at least some people use it for that purpose. It's called database memory tables. → MySQL has such a storage type: it's called a memory or a heap table. Basically it allows you to store data in tables that are kept in memory. → Don't confuse it with a temporary table, which is only valid for your connection. → This is actually a persistent table, well, persistent meaning that it will survive after you disconnect, but it won't survive a server reboot, because it's in-memory only. → Advantages of this storage type are that it's faster than disk-based tables and that you can join it with disk-based tables. On the other hand, there's a default limit of 16 MB per table and it can be troublesome getting it to work in a master-slave setup. → So my advice is: don't use it.
  • Alright, next. Opcode caching... this is definitely a DO. → There are a few opcode caches out there. → Now what is opcode caching? Basically, when you run a PHP file, the PHP code is converted by the PHP compiler to what is called bytecode. This code is then executed by the PHP engine and that produces the output. → Now, if your PHP code doesn't change a lot, which normally it shouldn't while your application is live, there's no reason for the PHP compiler to convert your source code to bytecode over and over again, because basically it's just doing the same thing every time. → So an opcode cache caches the bytecode, the compiled version of your source code. That way, it doesn't have to compile the code unless your source code changes. This can have a huge performance impact.
  • APC is the most popular one and will probably be included in one of the next few releases. Might be 5.4, but there's still a lot of discussion about that. I'm guessing we probably won't see it before 5.5 or, who knows, 6.0, if that ever comes out. To enable APC, all you have to do is install the module, which can be done using PECL or through your distribution's package management system. Then make sure APC is enabled in php.ini and you're good to go. → The other opcode caches are eAccelerator, which is slightly outdated now, although in some cases it does produce better performance. But since APC will be included in the PHP core, I'm not sure it's going to survive for very long. → Then there's Zend Accelerator, which is built into Zend Server. Basically, it's similar to APC in terms of opcode caching functionality, but it's bundled with the Zend products.
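    A minimal php.ini sketch for that, assuming APC was installed through PECL (the shm_size value is only an example to tune to your code base):

        extension=apc.so
        apc.enabled=1
        ; apc.shm_size=64M   ; optional: size of the shared memory segment for the cache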
  • Slightly better than using local disk is using a local memory disk or ramdisk. → Advantage: slightly faster. On the other hand, if you're using Linux, the standard file caching system will cache recently accessed files anyway, so there might not be a big performance impact compared to standard disk caching.
  • See slide >> replication!<<
  • See slide
  • See slide
  • - Key names must be unique - Prefix/namespace your keys! → might seem overkill at first, but it's usually necessary after a while, at least for large systems. → Oh, and don't share the same Memcache with multiple projects. Start separate instances for each! - Be careful with characters. Use only letters, numbers and underscores! - Sometimes MD5() is your friend → but: harder to debug - Use clear names. Remember you can't make a list of the data in the cache, so you'll need to document it. I know you don't like to write documentation, but you'll simply have to in this case.
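    A small helper that applies those rules: a per-project prefix, only letters, numbers and underscores, and md5() once the key gets close to Memcache's 250-byte key limit (names are illustrative):

        <?php
        function cache_key($project, $name)
        {
            $key = $project . '_' . $name;
            $key = preg_replace('/[^A-Za-z0-9_]/', '_', $key); // letters, numbers, underscore only
            if (strlen($key) > 200) {
                $key = $project . '_' . md5($key); // always fits, but harder to debug
            }
            return $key;
        }

        // e.g. cache_key('shop', 'product_123_related') returns "shop_product_123_related"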
  • OK, that sort of covers the basics of how we can use Memcache to cache data for your site. So purely in terms of caching in the code, we've done a lot. → There are still things you can always add. If you're using Zend Framework or any other major framework, you can cache things like the initialization of the configuration file and the creation of the route object (which is a very heavy process if you have a lot of routes). → Things like translation and locale can be cached in Zend Framework using 1 command, so do that! → But as I said before, the only limit is your imagination... → and your common sense! → Don't overdo it... make sure that the cache has enough space left for the things you really need to cache.
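    For Zend Framework 1 that "1 command" looks roughly like this; the cache backend and its options below are assumptions, use whatever backend you already have:

        <?php
        // Give Zend_Translate / Zend_Locale a cache so translation and locale data
        // aren't rebuilt on every request.
        $cache = Zend_Cache::factory(
            'Core',
            'Memcached',
            array('lifetime' => 3600, 'automatic_serialization' => true),
            array('servers' => array(array('host' => '127.0.0.1', 'port' => 11211)))
        );

        Zend_Translate::setCache($cache);
        Zend_Locale::setCache($cache);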
  • So, why don't we switch everything from Apache to Nginx? → Well, it's not THAT easy. There are a lot of Apache modules that Nginx doesn't have, like WebDAV support and many of the authentication modules. → The basic modules are there and they're built into Nginx, which again makes them faster than in Apache, because they don't go through some module API which causes overhead. → But there are some specific solutions that you cannot build using Nginx, although there are some third-party modules out there now; keep in mind you have to add these by recompiling Nginx. → Now, since we're talking mostly about scaling public websites, chances are we're not going to need any of those modules, so we'll have no trouble at all putting the N in LANMMP.
  • → see slide → And that's all there is to it: it's running. Well, not exactly, we still need to configure it of course.
  • Now, as I mentioned, Nginx is very fast, and as a first step to using it to scale our website, we're going to place it in front of Apache. So we're going to run it on the same server, but we're going to move Apache to a different port, preferably one that's not accessible from the outside, and we're going to have Nginx forward requests to Apache. → Of course we're not going to send all requests to Apache, 'cause that would be quite stupid, adding overhead. → We're only going to send dynamic content requests to Apache and serve all static files directly from Nginx.
  • So, we're serving all those extensions directly from disk and forwarding all the rest to the Apache running on port 8080 (see the sketch below). We're also forwarding the Set-Cookie headers and adding a few headers so Apache can log the original IP if it wants to. → Something to keep in mind here: you will have two logfiles now, one from Nginx and one from Apache. → What you should notice once you start using this type of setup is that your performance from an end-user perspective will remain somewhat the same if your server was not overloaded yet. If it was having issues because of memory problems or too many Apache workers, ... → However, you will suddenly need a lot fewer Apache workers, which will save you quite a lot of memory. That memory can be used for... Memcache maybe?
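    A hedged Nginx configuration sketch of that setup; the paths, extension list and header names are illustrative, and Apache is assumed to listen on 127.0.0.1:8080:

        server {
            listen 80;
            root   /var/www/example;

            # static files straight from disk, with a far-future expiry
            location ~* \.(css|js|gif|jpe?g|png|ico)$ {
                expires 30d;
            }

            # everything else goes to Apache on port 8080
            location / {
                proxy_pass        http://127.0.0.1:8080;
                proxy_set_header  Host            $host;
                proxy_set_header  X-Real-IP       $remote_addr;               # so Apache can log the real client IP
                proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_pass_header Set-Cookie;
            }
        }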
  • OK, what we just did is very nice. → But if you're really not relying on any of the special Apache modules, why would you keep Apache anyway? → Why not just replace it altogether? Well, it depends on what your bottleneck is. → If you're looking for a way to lower your memory usage and you don't mind losing some processing power, this is certainly the way to go. → So let's go for a LNMMP stack. We're going to kick out Apache.
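    Without Apache, PHP has to be handled some other way; a common choice (an assumption here, the slide doesn't name it) is PHP-FPM over FastCGI, wired up inside the server block from the previous sketch:

        # inside the server { } block: hand .php requests to PHP-FPM
        location ~ \.php$ {
            include       fastcgi_params;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            fastcgi_pass  127.0.0.1:9000;   # PHP-FPM listening locally
        }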
  • If one of the backend webservers goes down, you want all traffic to go to the other one of course. That's where health checks come in.
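    One way to get that behaviour with Nginx itself is passive checks on an upstream block: after max_fails failures within fail_timeout, a backend is taken out of rotation for a while (addresses and values are examples):

        upstream backend {
            server 10.0.0.11:8080 max_fails=3 fail_timeout=30s;
            server 10.0.0.12:8080 max_fails=3 fail_timeout=30s;
        }

        server {
            listen 80;
            location / {
                proxy_pass http://backend;
            }
        }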
  • >> platforms ! <<
  • >> thing on list <<
  • Indicates how long the file should not be retrieved
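    From PHP you can send those headers yourself; the one-day lifetime below is just an example:

        <?php
        // Tell the browser it doesn't need to come back for this response for a day.
        $lifetime = 86400; // seconds
        header('Cache-Control: public, max-age=' . $lifetime);
        header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $lifetime) . ' GMT');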
  • Split requests across subdomains: - the HTTP/1.1 spec advises 2 connections per hostname - To get around that, use multiple subdomains. - Especially put your statics separately → helps when you grow and put them on a CDN - Be careful: don't use too many subdomains → DNS penalty
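    A toy helper for that split: map each asset path to one of a small, fixed set of static hostnames so the same file always ends up on the same host (the hostnames are made up):

        <?php
        function static_url($path)
        {
            // Keep this list short: every extra hostname costs a DNS lookup.
            $hosts = array('static1.example.com', 'static2.example.com');
            $host  = $hosts[abs(crc32($path)) % count($hosts)];
            return 'http://' . $host . $path;
        }

        // usage in a template: echo '<img src="' . static_url('/img/logo.png') . '">';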
  • >> in Varnish <<
