Web Speed And Scalability - Presentation Transcript
Jason Ragsdale – 01/08/2008
How to build a bigger, faster, and more reliable website
You will learn the concepts of Speed and Scalability
Specific Examples of Caching, Load Balancing and testing tools.
What is Scalability?
Avoiding Failure
High Availability?!?!?
Monitoring
Release Cycles
Fault Tolerence
Load Balancing
Static Content
Caching
Yslow (Let it be your friend)
Horizontal Scalability
Capacity can be increased just by adding more hardware/software
Best solution
Does not guarntee that you are safe
Up (Vertical) Scalability
Capacity can be increased by adding more Disk Storage, RAM , Processors
Expensive
Should only be used if Horizontal will not work for you
Difficult to move to Horizontal if you run out of capacity in your hardware
Capital investment will be made
The system will be more complex
Maintenance costs will increase
Time will be required to act
Good Planning
Have a plan for whatever you are about to do to your system, and most importantly, have a roll-back plan if and when things do not work the way you expected.
Functional and Unit Testing
Automated test do not catch everything that can go wrong, but they are very good at catching bugs introduced by changes elsewhere in your code base
Unit Testing (PHPUnit, Simpletest)
Function Testing (selenium)
Control Change (Version Control)
USE IT!!!! There is no better way even as a single developer to keep your codebase safe from bad changes
Version Control in Action
/trunk/
Used for all mainline development
/production/
Only stable and production ready code from trunk is contained in here. Only make fix severe bug fixes in this branch
/tags/
Holds copies of production ready code
Do not use Version Control as a backup solution, backup your VCS seperately
High Availablity?!?!?!
What is “five nines” 99.999%?
Do the math, 60 seconds * 60 minutes * 24 hours * 365 days
31,536,000 seconds of uptime a year
99.999 * 31536000 = 315.36 seconds of downtime a year
Understand the goodness of “Planned maintence periods”
There are things you will need to do to your systems on a peridoic basis I.E. Database Cleanup, Disk Defrag, Software/Hardware Upgrades
You can stagger your maintence periods if you have enough servers so you have no custmomer downtime, just a reduction in capacity
Monitoring
No matter how stable your code is or how reliable your hardware, you will have failure
Monitoring Methods
Top Down (Business Monitors)
Monitor the application as the customer interacts with it
Bottom Up (System Monitors)
Most commonly used
Monitors the base components of your application like
Disk Space
Network speed
Database Statistics
By no means bad, but without Business Monitoring you will not be able to catch all failures
Criteria For A Monitoring System
SNMP Support
Can support most systems out there
Extensibility
Ability to plugin custom monitoring packages
Flexible notifications
Handle notifing operators and escaliting issues if they are not looked into
Custom reaction
In the event of errors that can not be diagnosed by computers, need to be able to notify a human to do further investigation
Complex scheduling
Ability to set the monitoring frequency and timing per monitoring item
Maintenance scheduling
Monitors should never be taken offline, they need to be smart enough to know when a maintence period is in effect
Event acnowledgement
Ability to understand when a event needs to be paged to a human at 2am, and when it shouldent
Service dependencies
You need to monitor all points between your monitoring system and the client. This includes Firewalls, Routers, Switches
Release Cycles
Basic Release Cycle
Development
Things are expected to break
Staging
QA and bug fixing a build before release
Production
Only serious bug fixes are pushed
Keep in mind that reality has priority over “Best Practice”
You can and will have to release from development… it happens
Balancing is meant to spread the workload of requests across the cluster
Balancing Approaches
Round robin
One request per server in a uniform rotation
Least connections
The faster the machine processes requests the more it will receive
Perdictive
Useally based on Round robin or Least connections with some custom code
Available resources
Not a good choice, bad performance
Random
Pure random distribution of requests
Weighted random
Random with a preference to specific machines
Static Content
Static content is
Images
CSS
JS
Any non dynamic element
Serving these items from a dedicated server fees up your web process for actual dynamic code, intern increasing your capacity and response speed
On you static server you can use lightHTTP, which is very quick at serving static content compaired to apache (Although apache 2.2.x is much better than 1.3.x)
Types of Caching
Layered / Transport Cache
“ Transparent”
Placed infront of your hardware and caches requests before they hit your webserver
Intergrated (Look-Aside) Cache
Computational Reuse technique
Used where the cost of storing the results of a computation and later finding them again is less expensive than performing the computation again
Write-Thru Caches
Application is responsible for updating the Cache and Datastore when changes are made
Write-Back Caches
All data changes are made to the cache
Cache layer is responsible for modifing the backend datastore
Distrubuted Cache
Using several machines to cache data, distrubiting the data and load
Memcached can do this very simply
Memcahed
It is a high-performance, distributed object caching system
It is simple to setup and use
# ./memcached -d -m 2048 -l 10.0.0.40 -p 11211
It is not designed to be redudant
If you loose data you memcache will repopulate the data as it is accessed
It provides no security to your cache
“ Memcached is the soft, doughy underbelly of your application. Part of what makes the clients and server lightweight is the complete lack of authentication. New connections are fast, and server configuration is nonexistent. If you wish to restrict access, you may use a firewall, or have memcached listen via unix domain sockets.”
Limitations
Key size limited to 250 characters
Data size limited to 1MB
APC and why it’s your friend
Alternative PHP Cache
The Alternative PHP Cache (APC) is a free and open opcode cache for PHP. It was conceived of to provide a free, open, and robust framework for caching and optimizing PHP intermediate code.
Just enabling APC will transparently cache your code as you use it, no code changes required on your side
Provides a cheap caching layer that can be shared on a between all apache processes on one machine
YSlow?
Based on 13 princables from http://developer.yahoo.com/performance/rules.html
1.) Make fewer HTTP requests
80% of the end-user response time is spent on the front-end. Most of this time is tied up in downloading all the components in the page: images, stylesheets, scripts, Flash, etc. Reducing the number of components in turn reduces the number of HTTP requests required to render the page. This is the key to faster pages.
2.) Use a CDN
The user's proximity to your web server has an impact on response times. Deploying your content across multiple, geographically dispersed servers will make your pages load faster from the user's perspective. But where should you start?
3.) Add an Expires header
Web page designs are getting richer and richer, which means more scripts, stylesheets, images, and Flash in the page. A first-time visitor to your page may have to make several HTTP requests, but by using the Expires header you make those components cacheable. This avoids unnecessary HTTP requests on subsequent page views. Expires headers are most often used with images, but they should be used on all components including scripts, stylesheets, and Flash components.
4.) Gzip components
The time it takes to transfer an HTTP request and response across the network can be significantly reduced by decisions made by front-end engineers. It's true that the end-user's bandwidth speed, Internet service provider, proximity to peering exchange points, etc. are beyond the control of the development team. But there are other variables that affect response times. Compression reduces response times by reducing the size of the HTTP response.
YSlow?
5.) Put CSS at the top
While researching performance at Yahoo!, we discovered that moving stylesheets to the document HEAD makes pages load faster. This is because putting stylesheets in the HEAD allows the page to render progressively.
6.) Put JS at the bottom
Rule 5 described how stylesheets near the bottom of the page prohibit progressive rendering, and how moving them to the document HEAD eliminates the problem. Scripts (external JavaScript files) pose a similar problem, but the solution is just the opposite: it's better to move scripts from the top to as low in the page as possible. One reason is to enable progressive rendering, but another is to achieve greater download parallelization.
7.) Avoid CSS expressions
CSS expressions are a powerful (and dangerous) way to set CSS properties dynamically. They're supported in Internet Explorer, starting with version 5. As an example, the background color could be set to alternate every hour using CSS expressions.
8.) Make JS and CSS External
Many of these performance rules deal with how external components are managed. However, before these considerations arise you should ask a more basic question: Should JavaScript and CSS be contained in external files, or inlined in the page itself?
YSlow?
9.) Reduce DNS lookups
The Domain Name System (DNS) maps hostnames to IP addresses, just as phonebooks map people's names to their phone numbers. When you type www.yahoo.com into your browser, a DNS resolver contacted by the browser returns that server's IP address. DNS has a cost. It typically takes 20-120 milliseconds for DNS to lookup the IP address for a given hostname. The browser can't download anything from this hostname until the DNS lookup is completed.
10.) Minify JS
Minification is the practice of removing unnecessary characters from code to reduce its size thereby improving load times. When code is minified all comments are removed, as well as unneeded white space characters (space, newline, and tab). In the case of JavaScript, this improves response time performance because the size of the downloaded file is reduced. Two popular tools for minifying JavaScript code are JSMin and YUI Compressor.
11.) Avoid redirects
Redirects are accomplished using the 301 and 302 status codes.
12.) Remove duplicate scripts
It hurts performance to include the same JavaScript file twice in one page. This isn't as unusual as you might think. A review of the ten top U.S. web sites shows that two of them contain a duplicated script. Two main factors increase the odds of a script being duplicated in a single web page: team size and number of scripts. When it does happen, duplicate scripts hurt performance by creating unnecessary HTTP requests and wasted JavaScript execution.
13.) Configure Etags
Entity tags (ETags) are a mechanism that web servers and browsers use to determine whether the component in the browser's cache matches the one on the origin server. (An "entity" is another word for what I've been calling a "component": images, scripts, stylesheets, etc.) ETags were added to provide a mechanism for validating entities that is more flexible than the last-modified date. An ETag is a string that uniquely identifies a specific version of a component. The only format constraints are that the string be quoted. The origin server specifies the component's ETag using the ETag response header.
14.) Make AJAX cachable
People ask whether these performance rules apply to Web 2.0 applications. They definitely do! This rule is the first rule that resulted from working with Web 2.0 applications at Yahoo!.
Example apache 2.x performace config # enable expirations ExpiresActive On # expire GIF images after a month in the client's cache ExpiresByType image/gif A2592000 ExpiresByType image/jpeg A2592000 ExpiresByType text/css A2592000 ExpiresByType application/x-javascript A2592000 # disable ETags FileETag None
Example apache 2.x performace config # Gzip Compression # Insert filter SetOutputFilter DEFLATE # Netscape 4.x has some problems... BrowserMatch ^Mozilla/4 gzip-only-text/html # Netscape 4.06-4.08 have some more problems BrowserMatch ^Mozilla/4.0[678] no-gzip # MSIE masquerades as Netscape, but it is fine BrowserMatch MSIE !no-gzip !gzip-only-text/html # NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48 # the above regex won't work. You can use the following # workaround to get the desired effect: BrowserMatch MSI[E] !no-gzip !gzip-only-text/html # Don't compress images SetEnvIfNoCase Request_URI .(?:gif|jpe?g|png|mp3)$ no-gzip dont-vary # Make sure proxies don't deliver the wrong content Header append Vary User-Agent env=!dont-vary
0 comments
Post a comment