Web Speed And Scalability


Published in: Technology, Sports

  1. 1. Jason Ragsdale – 01/08/2008
  2. 2. <ul><li>How to build a bigger, faster, and more reliable website </li></ul><ul><li>You will learn the concepts of Speed and Scalability </li></ul><ul><li>Specific Examples of Caching, Load Balancing and testing tools. </li></ul>
  3. 3. <ul><li>What is Scalability? </li></ul><ul><li>Avoiding Failure </li></ul><ul><li>High Availability?!?!? </li></ul><ul><ul><li>Monitoring </li></ul></ul><ul><ul><li>Release Cycles </li></ul></ul><ul><ul><li>Fault Tolerance </li></ul></ul><ul><ul><li>Load Balancing </li></ul></ul><ul><li>Static Content </li></ul><ul><li>Caching </li></ul><ul><li>YSlow (Let it be your friend) </li></ul>
  4. 4. <ul><li>Horizontal Scalability </li></ul><ul><ul><li>Capacity can be increased just by adding more hardware/software </li></ul></ul><ul><ul><ul><li>Best solution </li></ul></ul></ul><ul><ul><ul><li>Does not guarantee that you are safe </li></ul></ul></ul><ul><li>Up (Vertical) Scalability </li></ul><ul><ul><li>Capacity can be increased by adding more Disk Storage, RAM, Processors </li></ul></ul><ul><ul><ul><li>Expensive </li></ul></ul></ul><ul><ul><ul><li>Should only be used if Horizontal will not work for you </li></ul></ul></ul><ul><ul><ul><li>Difficult to move to Horizontal if you run out of capacity in your hardware </li></ul></ul></ul>
  5. 5. <ul><li>Capital investment will be made </li></ul><ul><li>The system will be more complex </li></ul><ul><li>Maintenance costs will increase </li></ul><ul><li>Time will be required to act </li></ul>
  6. 6. <ul><li>Good Planning </li></ul><ul><ul><li>Have a plan for whatever you are about to do to your system, and most importantly, have a roll-back plan if and when things do not work the way you expected. </li></ul></ul><ul><li>Functional and Unit Testing </li></ul><ul><ul><li>Automated tests do not catch everything that can go wrong, but they are very good at catching bugs introduced by changes elsewhere in your code base </li></ul></ul><ul><ul><li>Unit Testing (PHPUnit, SimpleTest) </li></ul></ul><ul><ul><li>Functional Testing (Selenium) </li></ul></ul><ul><li>Change Control (Version Control) </li></ul><ul><ul><li>USE IT!!!! There is no better way, even as a single developer, to keep your codebase safe from bad changes </li></ul></ul>
  7. 7. Version Control in Action <ul><li>/trunk/ </li></ul><ul><ul><li>Used for all mainline development </li></ul></ul><ul><li>/production/ </li></ul><ul><ul><li>Only stable and production-ready code from trunk is contained in here. Only make severe bug fixes in this branch </li></ul></ul><ul><li>/tags/ </li></ul><ul><ul><li>Holds copies of production-ready code </li></ul></ul><ul><li>Do not use Version Control as a backup solution; back up your VCS separately </li></ul>
  8. 8. High Availability?!?!?! <ul><li>What is “five nines” 99.999%? </li></ul><ul><ul><li>Do the math, 60 seconds * 60 minutes * 24 hours * 365 days </li></ul></ul><ul><ul><ul><li>31,536,000 seconds in a year </li></ul></ul></ul><ul><ul><ul><ul><li>(1 - 0.99999) * 31,536,000 = 315.36 seconds of downtime a year </li></ul></ul></ul></ul><ul><li>Understand the goodness of “Planned maintenance periods” </li></ul><ul><ul><li>There are things you will need to do to your systems on a periodic basis, e.g. Database Cleanup, Disk Defrag, Software/Hardware Upgrades </li></ul></ul><ul><li>You can stagger your maintenance periods if you have enough servers so you have no customer downtime, just a reduction in capacity </li></ul>
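The downtime arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming a 365-day year as the slide does; the function name `allowed_downtime_seconds` is made up for illustration.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000, same figure as the slide


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the yearly downtime budget, in seconds, for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return SECONDS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Print the budget for a few common targets, ending with "five nines".
    for target in (99.0, 99.9, 99.99, 99.999):
        seconds = allowed_downtime_seconds(target)
        print(f"{target}% -> {seconds:,.2f} seconds per year")
```

The budget is the unavailable fraction (1 minus the target) times the seconds in a year, which is why five nines leaves only a little over five minutes.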
  9. 9. Monitoring <ul><li>No matter how stable your code is or how reliable your hardware, you will have failure </li></ul><ul><li>Monitoring Methods </li></ul><ul><ul><li>Top Down (Business Monitors) </li></ul></ul><ul><ul><ul><li>Monitor the application as the customer interacts with it </li></ul></ul></ul><ul><ul><li>Bottom Up (System Monitors) </li></ul></ul><ul><ul><ul><li>Most commonly used </li></ul></ul></ul><ul><ul><ul><li>Monitors the base components of your application like </li></ul></ul></ul><ul><ul><ul><ul><li>Disk Space </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Network speed </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Database Statistics </li></ul></ul></ul></ul><ul><ul><ul><li>By no means bad, but without Business Monitoring you will not be able to catch all failures </li></ul></ul></ul>
  10. 10. Criteria For A Monitoring System <ul><li>SNMP Support </li></ul><ul><ul><li>Can support most systems out there </li></ul></ul><ul><li>Extensibility </li></ul><ul><ul><li>Ability to plug in custom monitoring packages </li></ul></ul><ul><li>Flexible notifications </li></ul><ul><ul><li>Handle notifying operators and escalating issues if they are not looked into </li></ul></ul><ul><li>Custom reaction </li></ul><ul><ul><li>In the event of errors that cannot be diagnosed by computers, need to be able to notify a human to do further investigation </li></ul></ul><ul><li>Complex scheduling </li></ul><ul><ul><li>Ability to set the monitoring frequency and timing per monitoring item </li></ul></ul><ul><li>Maintenance scheduling </li></ul><ul><ul><li>Monitors should never be taken offline; they need to be smart enough to know when a maintenance period is in effect </li></ul></ul><ul><li>Event acknowledgement </li></ul><ul><ul><li>Ability to understand when an event needs to be paged to a human at 2am, and when it shouldn't </li></ul></ul><ul><li>Service dependencies </li></ul><ul><ul><li>You need to monitor all points between your monitoring system and the client. This includes Firewalls, Routers, Switches </li></ul></ul>
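Three of the criteria above (flexible notifications, maintenance scheduling, and event acknowledgement) can be sketched as a single decision function. This is only an illustration: the class names, the `notification_target` function, and the 15-minute escalation delay are all assumptions, not part of any real monitoring product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MaintenanceWindow:
    start: datetime
    end: datetime

    def covers(self, moment: datetime) -> bool:
        return self.start <= moment < self.end


@dataclass
class Alert:
    service: str
    raised_at: datetime
    acknowledged: bool = False


def notification_target(alert, now, windows, escalate_after=timedelta(minutes=15)):
    """Decide who, if anyone, should be notified about an alert right now.

    Returns None (nobody), "operator", or "escalation".
    """
    if any(window.covers(now) for window in windows):
        return None  # planned maintenance: the monitor stays up but does not page
    if alert.acknowledged:
        return None  # a human has already picked this up
    if now - alert.raised_at >= escalate_after:
        return "escalation"  # nobody looked into it in time
    return "operator"
```

The monitor itself keeps running during maintenance; only the paging decision changes, which matches the point that monitors should never be taken offline.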
  11. 11. Release Cycles <ul><li>Basic Release Cycle </li></ul><ul><ul><li>Development </li></ul></ul><ul><ul><ul><li>Things are expected to break </li></ul></ul></ul><ul><ul><li>Staging </li></ul></ul><ul><ul><ul><li>QA and bug fixing a build before release </li></ul></ul></ul><ul><ul><li>Production </li></ul></ul><ul><ul><ul><li>Only serious bug fixes are pushed </li></ul></ul></ul><ul><li>Keep in mind that reality has priority over “Best Practice” </li></ul><ul><ul><li>You can and will have to release from development… it happens </li></ul></ul>
  12. 12. Fault Tolerance [Diagram: Intertubes, routers, switches, and web servers www-1-1 and www-1-2]
  13. 13. Load Balancing <ul><li>Load Balancing is NOT HA </li></ul><ul><li>Balancing is meant to spread the workload of requests across the cluster </li></ul><ul><li>Balancing Approaches </li></ul><ul><ul><li>Round robin </li></ul></ul><ul><ul><ul><li>One request per server in a uniform rotation </li></ul></ul></ul><ul><ul><li>Least connections </li></ul></ul><ul><ul><ul><li>The faster the machine processes requests the more it will receive </li></ul></ul></ul><ul><ul><li>Predictive </li></ul></ul><ul><ul><ul><li>Usually based on Round robin or Least connections with some custom code </li></ul></ul></ul><ul><ul><li>Available resources </li></ul></ul><ul><ul><ul><li>Not a good choice, bad performance </li></ul></ul></ul><ul><ul><li>Random </li></ul></ul><ul><ul><ul><li>Pure random distribution of requests </li></ul></ul></ul><ul><ul><li>Weighted random </li></ul></ul><ul><ul><ul><li>Random with a preference to specific machines </li></ul></ul></ul>
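Three of the balancing approaches listed above are small enough to sketch directly. The function names and the dictionaries of connection counts and weights are hypothetical; a real load balancer tracks this state itself.

```python
import itertools
import random


def round_robin(servers):
    """Return an iterator that hands out servers in a fixed, repeating rotation."""
    return itertools.cycle(servers)


def least_connections(active):
    """Pick the server with the fewest active connections.

    `active` maps a server name to its current connection count.
    """
    return min(active, key=active.get)


def weighted_random(weights, rng=random):
    """Pick a server at random, preferring servers with larger weights.

    `weights` maps a server name to a non-negative weight.
    """
    servers = list(weights)
    return rng.choices(servers, weights=[weights[s] for s in servers], k=1)[0]
```

Least connections favours faster machines without any explicit tuning: a server that finishes requests quickly has fewer open connections, so it is chosen more often.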
  14. 14. Static Content <ul><li>Static content is </li></ul><ul><ul><li>Images </li></ul></ul><ul><ul><li>CSS </li></ul></ul><ul><ul><li>JS </li></ul></ul><ul><ul><li>Any non-dynamic element </li></ul></ul><ul><li>Serving these items from a dedicated server frees up your web processes for actual dynamic code, in turn increasing your capacity and response speed </li></ul><ul><li>On your static server you can use lighttpd, which is very quick at serving static content compared to Apache (although Apache 2.2.x is much better than 1.3.x) </li></ul>
  15. 15. Types of Caching <ul><li>Layered / Transport Cache </li></ul><ul><ul><li>“Transparent” </li></ul></ul><ul><ul><li>Placed in front of your hardware and caches requests before they hit your webserver </li></ul></ul><ul><li>Integrated (Look-Aside) Cache </li></ul><ul><ul><li>Computational Reuse technique </li></ul></ul><ul><ul><ul><li>Used where the cost of storing the results of a computation and later finding them again is less expensive than performing the computation again </li></ul></ul></ul><ul><li>Write-Through Caches </li></ul><ul><ul><li>Application is responsible for updating the Cache and Datastore when changes are made </li></ul></ul><ul><li>Write-Back Caches </li></ul><ul><ul><li>All data changes are made to the cache </li></ul></ul><ul><ul><li>Cache layer is responsible for modifying the backend datastore </li></ul></ul><ul><li>Distributed Cache </li></ul><ul><ul><li>Using several machines to cache data, distributing the data and load </li></ul></ul><ul><ul><li>Memcached can do this very simply </li></ul></ul>
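The look-aside and write-through patterns above can be sketched with plain dictionaries standing in for the cache and the datastore. The class names are invented for this example, and the sketch leaves out expiry and eviction, which any real cache needs.

```python
class LookAsideCache:
    """Integrated (look-aside) cache: reuse a stored result instead of recomputing."""

    def __init__(self, compute):
        self._compute = compute  # the expensive function being cached
        self._store = {}
        self.misses = 0

    def get(self, key):
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(key)
        return self._store[key]


class WriteThroughStore:
    """Write-through: the application updates the datastore and the cache together."""

    def __init__(self):
        self.datastore = {}  # stands in for a database
        self.cache = {}

    def write(self, key, value):
        self.datastore[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.datastore[key]  # cache miss: fall back to the datastore
        self.cache[key] = value
        return value
```

A write-back cache would differ only in `write`: it would update the cache alone and leave the cache layer to flush changes to the datastore later.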
  16. 16. Memcached <ul><li>It is a high-performance, distributed object caching system </li></ul><ul><li>It is simple to set up and use </li></ul><ul><ul><li># ./memcached -d -m 2048 -l -p 11211 </li></ul></ul><ul><li>It is not designed to be redundant </li></ul><ul><ul><li>If you lose data, the cache is repopulated as the data is accessed again </li></ul></ul><ul><li>It provides no security to your cache </li></ul><ul><ul><li>“Memcached is the soft, doughy underbelly of your application. Part of what makes the clients and server lightweight is the complete lack of authentication. New connections are fast, and server configuration is nonexistent. If you wish to restrict access, you may use a firewall, or have memcached listen via unix domain sockets.” </li></ul></ul><ul><li>Limitations </li></ul><ul><ul><li>Key size limited to 250 characters </li></ul></ul><ul><ul><li>Data size limited to 1 MB </li></ul></ul>
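The two limits on this slide are easy to trip over in application code, so one option is to guard against them before talking to the cache. The sketch below does not use a memcached client at all; the helper names, the `sha1:` prefix, and the hash-modulo server choice are assumptions for illustration (real client libraries ship their own key distribution).

```python
import hashlib
import zlib

MAX_KEY_BYTES = 250            # key limit quoted on the slide
MAX_VALUE_BYTES = 1024 * 1024  # 1 MB value limit quoted on the slide


def safe_key(key: str) -> str:
    """Return the key unchanged if it fits, otherwise a fixed-length hash of it."""
    raw = key.encode("utf-8")
    if len(raw) <= MAX_KEY_BYTES:
        return key
    return "sha1:" + hashlib.sha1(raw).hexdigest()


def fits_in_cache(value: bytes) -> bool:
    """Report whether a serialized value is within the size limit."""
    return len(value) <= MAX_VALUE_BYTES


def pick_server(key: str, servers: list) -> str:
    """Map a key to one server with a simple hash-modulo scheme."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]
```

Because every client computes the same server for the same key, the cache is distributed without the servers needing to know about each other.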
  17. 17. APC and why it’s your friend <ul><li>Alternative PHP Cache </li></ul><ul><ul><li>The Alternative PHP Cache (APC) is a free and open opcode cache for PHP. It was conceived of to provide a free, open, and robust framework for caching and optimizing PHP intermediate code. </li></ul></ul><ul><ul><li>Just enabling APC will transparently cache your code as you use it, no code changes required on your side </li></ul></ul><ul><ul><li>Provides a cheap caching layer that can be shared between all Apache processes on one machine </li></ul></ul>
  18. 18. YSlow? <ul><li>Based on 13 principles from http://developer.yahoo.com/performance/rules.html </li></ul><ul><ul><li>1.) Make fewer HTTP requests </li></ul></ul><ul><ul><ul><li>80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the key to faster pages. </li></ul></ul></ul><ul><ul><li>2.) Use a CDN </li></ul></ul><ul><ul><ul><li>The user's proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective. But where should you start? </li></ul></ul></ul><ul><ul><li>3.) Add an Expires header </li></ul></ul><ul><ul><ul><li>Web page designs are getting richer and richer, which means more scripts, stylesheets, images, and Flash in the page. A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable. This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on all components including scripts, stylesheets, and Flash components. </li></ul></ul></ul><ul><ul><li>4.) Gzip components </li></ul></ul><ul><ul><ul><li>The time it takes to transfer an HTTP request and response across the network can be significantly reduced by decisions made by front-end engineers. It's true that the end-user's bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response. </li></ul></ul></ul>
  19. 19. YSlow? <ul><ul><li>5.) Put CSS at the top </li></ul></ul><ul><ul><ul><li>While researching performance at Yahoo!, we discovered that moving stylesheets to the document HEAD makes pages load faster. This is because putting stylesheets in the HEAD allows the page to render progressively. </li></ul></ul></ul><ul><ul><li>6.) Put JS at the bottom </li></ul></ul><ul><ul><ul><li>Rule 5 described how stylesheets near the bottom of the page prohibit progressive rendering, and how moving them to the document HEAD eliminates the problem. Scripts (external JavaScript files) pose a similar problem, but the solution is just the opposite: it's better to move scripts from the top to as low in the page as possible. One reason is to enable progressive rendering, but another is to achieve greater download parallelization. </li></ul></ul></ul><ul><ul><li>7.) Avoid CSS expressions </li></ul></ul><ul><ul><ul><li>CSS expressions are a powerful (and dangerous) way to set CSS properties dynamically. They're supported in Internet Explorer, starting with version 5. As an example, the background color could be set to alternate every hour using CSS expressions. </li></ul></ul></ul><ul><ul><li>8.) Make JS and CSS External </li></ul></ul><ul><ul><ul><li>Many of these performance rules deal with how external components are managed. However, before these considerations arise you should ask a more basic question: Should JavaScript and CSS be contained in external files, or inlined in the page itself? </li></ul></ul></ul>
  20. 20. YSlow? <ul><ul><li>9.) Reduce DNS lookups </li></ul></ul><ul><ul><ul><li>The Domain Name System (DNS) maps hostnames to IP addresses, just as phonebooks map people's names to their phone numbers. When you type www.yahoo.com into your browser, a DNS resolver contacted by the browser returns that server's IP address. DNS has a cost. It typically takes 20-120 milliseconds for DNS to look up the IP address for a given hostname. The browser can't download anything from this hostname until the DNS lookup is completed. </li></ul></ul></ul><ul><ul><li>10.) Minify JS </li></ul></ul><ul><ul><ul><li>Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. When code is minified all comments are removed, as well as unneeded white space characters (space, newline, and tab). In the case of JavaScript, this improves response time performance because the size of the downloaded file is reduced. Two popular tools for minifying JavaScript code are JSMin and YUI Compressor. </li></ul></ul></ul><ul><ul><li>11.) Avoid redirects </li></ul></ul><ul><ul><ul><li>Redirects are accomplished using the 301 and 302 status codes. </li></ul></ul></ul><ul><ul><li>12.) Remove duplicate scripts </li></ul></ul><ul><ul><ul><li>It hurts performance to include the same JavaScript file twice in one page. This isn't as unusual as you might think. A review of the ten top U.S. web sites shows that two of them contain a duplicated script. Two main factors increase the odds of a script being duplicated in a single web page: team size and number of scripts. When it does happen, duplicate scripts hurt performance by creating unnecessary HTTP requests and wasted JavaScript execution. </li></ul></ul></ul><ul><ul><li>13.) Configure ETags </li></ul></ul><ul><ul><ul><li>Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser's cache matches the one on the origin server. (An "entity" is another word for what I've been calling a "component": images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraint is that the string be quoted. The origin server specifies the component's ETag using the ETag response header. </li></ul></ul></ul><ul><ul><li>14.) Make AJAX cacheable </li></ul></ul><ul><ul><ul><li>People ask whether these performance rules apply to Web 2.0 applications. They definitely do! This rule is the first rule that resulted from working with Web 2.0 applications at Yahoo!. </li></ul></ul></ul>
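To make rule 13 concrete, here is a minimal sketch of ETag validation on the server side. It assumes the ETag is a hash of the response body, which is only one possible scheme, and the function names `make_etag` and `respond` are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the body (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET request.

    `if_none_match` is the value of the browser's If-None-Match header, if any.
    """
    etag = make_etag(body)
    if if_none_match == etag:
        # The browser's cached copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A content hash gives the same ETag on every server in a cluster, whereas ETags derived from per-server file attributes can differ between machines and defeat the validation.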
  21. 21. Example Apache 2.x performance config
      # enable expirations
      ExpiresActive On
      # expire images, CSS, and JavaScript 30 days (2592000 seconds) after access
      ExpiresByType image/gif A2592000
      ExpiresByType image/jpeg A2592000
      ExpiresByType text/css A2592000
      ExpiresByType application/x-javascript A2592000
      # disable ETags
      FileETag None
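The `A2592000` values in this config mean "2592000 seconds after the client's access time", which is 30 days. As a rough illustration of the header that results, the sketch below computes an Expires value in Python; the function name `expires_header` is made up, and it assumes the access time is supplied as a UTC datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

MAX_AGE_SECONDS = 2592000  # the "A2592000" value from the config


def expires_header(accessed_at: datetime, max_age_seconds: int = MAX_AGE_SECONDS) -> str:
    """Return the Expires header value for a response served at `accessed_at`.

    `accessed_at` must be a timezone-aware datetime in UTC.
    """
    expiry = accessed_at + timedelta(seconds=max_age_seconds)
    return format_datetime(expiry, usegmt=True)


if __name__ == "__main__":
    print("Expires:", expires_header(datetime.now(timezone.utc)))
```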
  22. 22. Example Apache 2.x performance config
      # Gzip Compression
      # Insert filter
      SetOutputFilter DEFLATE
      # Netscape 4.x has some problems...
      BrowserMatch ^Mozilla/4 gzip-only-text/html
      # Netscape 4.06-4.08 have some more problems
      BrowserMatch ^Mozilla/4\.0[678] no-gzip
      # MSIE masquerades as Netscape, but it is fine
      BrowserMatch MSIE !no-gzip !gzip-only-text/html
      # NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48
      # the above regex won't work. You can use the following
      # workaround to get the desired effect:
      BrowserMatch MSI[E] !no-gzip !gzip-only-text/html
      # Don't compress images
      SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|mp3)$ no-gzip dont-vary
      # Make sure proxies don't deliver the wrong content
      Header append Vary User-Agent env=!dont-vary
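To see why the config compresses text but skips images and MP3s, it helps to measure the effect of gzip on a text payload. This sketch uses Python's standard `gzip` module rather than Apache's DEFLATE filter, so the exact sizes will differ; the helper names and the suffix list (mirroring the `SetEnvIfNoCase` pattern above) are assumptions.

```python
import gzip

# File types the config excludes from compression (already compressed formats).
SKIP_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".mp3")


def should_compress(path: str) -> bool:
    """Mirror the config's rule: do not gzip already-compressed media."""
    return not path.lower().endswith(SKIP_SUFFIXES)


def compressed_size(payload: bytes) -> int:
    """Return the size in bytes of the payload after gzip compression."""
    return len(gzip.compress(payload))


if __name__ == "__main__":
    css = b"body { margin: 0; padding: 0; }\n" * 200
    print(len(css), "bytes before,", compressed_size(css), "bytes after gzip")
```

Repetitive text such as CSS, JavaScript, and HTML shrinks a great deal, while formats that are already compressed gain little and only cost CPU time.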
  23. 24. <ul><li>YSlow: http://developer.yahoo.com/yslow/ </li></ul><ul><ul><li>Rules: http://developer.yahoo.com/performance/rules.html </li></ul></ul><ul><li>Scalable Internet Architectures </li></ul><ul><ul><li>By Theo Schlossnagle </li></ul></ul><ul><li>APC: http://us3.php.net/apc </li></ul><ul><li>Memcached: http://www.danga.com/memcached/ </li></ul><ul><li>Selenium: http://www.openqa.org/selenium/ </li></ul><ul><li>SimpleTest: http://simpletest.org/ </li></ul><ul><li>PHPUnit: http://www.phpunit.de/ </li></ul>