Caching and tuning fun for high scalability @ FrOSCon 2011
"Caching and tuning fun for high scalability" talk at FrOSCon 2011

Twitter : @wimgtr

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Caching and tuning fun for high scalability Wim Godden Cu.be Solutions
    • Notes about this presentation
        This presentation was part of the FrOSCon 2011 program. It was designed to be presented live, so many of the slides may seem odd without the spoken explanation. The live benchmarks run at the conference are, of course, not part of these slides either.
    • Who am I ?
      • Wim Godden (@wimgtr)
      • Owner of Cu.be Solutions (http://cu.be)
      • PHP developer since 1997
      • Developer of OpenX
      • Zend Certified Engineer
      • Zend Framework Certified Engineer
      • MySQL Certified Developer
    • Who are you ?
      • Developers ?
      • System/network engineers ?
      • Managers ?
      • Caching experience ?
    • Goals of this tutorial
      • Everything about caching and tuning
      • A few techniques
        • How-to
        • How-NOT-to
      • -> Increase reliability, performance and scalability
      • 5 visitors/day -> 5 million visitors/day
      • (Don't expect a miracle cure !)
    • LAMP
    • Architecture
    • Our test site
    • Our base benchmark
      • Apachebench = useful enough
      • Result ?
    • Caching
    • What is caching ?
    • What is caching ? select * from article join user on article.user_id = user.id order by created desc limit 10
    • Caching goals
      • Source of information (db, file, webservice, …) :
        • Reduce # of requests
        • Reduce the load
      • Latency :
        • Reduce for visitor
        • Reduce Webserver load
      • Network :
        • Send less data to visitor
        • Hey, that's frontend !
    • Theory of caching if ($data == false) DB
    • Caching techniques
        #1 : Store entire pages
      • Company Websites
      • Blogs
      • Full pages that don't change
      • Render -> Store in cache -> retrieve from cache
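The "Render -> Store in cache -> retrieve from cache" flow could be sketched roughly as follows. This is only an illustration: `servePage()`, the file-based storage and the md5 file names are assumptions made for this example, not something shown in the talk.

```php
<?php
// Hypothetical helper: renders a page once and serves the stored copy afterwards.
// $cacheDir and the md5-based file name are assumptions of this sketch.
function servePage(string $url, callable $render, string $cacheDir): string
{
    $file = $cacheDir . '/' . md5($url) . '.html';

    // Retrieve from cache when a stored copy exists
    if (is_file($file)) {
        return file_get_contents($file);
    }

    // Render : capture everything the page prints
    ob_start();
    $render();
    $html = ob_get_clean();

    // Store in cache : write to a temp file and rename it,
    // so other requests never read a half-written page
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $file);

    return $html;
}

// Usage : the second call never runs the render function
$cacheDir = sys_get_temp_dir() . '/pagecache_' . uniqid();
mkdir($cacheDir);

$renderCount = 0;
$render = function () use (&$renderCount) {
    $renderCount++;
    echo '<h1>About us</h1>';
};

$first  = servePage('/about', $render, $cacheDir);   // rendered and stored
$second = servePage('/about', $render, $cacheDir);   // read from the cache file
```

Nothing here ever refreshes the stored page, which fits the "full pages that don't change" case; a page that does change would need its file overwritten when the content is edited.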
    • Caching techniques
        #2 : Store parts of a page
      • Most common technique
      • Usually a small block in a page
      • Best effect : reused on lots of pages
    • Caching techniques
        #3 : Store SQL queries
      • ↔ SQL query cache
          • Limited in size
          • Resets on every insert/update/delete
          • Server and connection overhead
      • Goal :
        • not to get rid of DB
        • free up DB resources for more hits !
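A minimal sketch of storing query results in an application-level cache, assuming an in-memory stand-in for Memcache so that it runs on its own. `ArrayCache`, `cachedQuery()` and the `$runQuery` callable are hypothetical names; the point is only that a repeated query never reaches the DB.

```php
<?php
// In-memory stand-in for Memcache so the sketch is self-contained (assumption).
class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Hypothetical helper : $runQuery stands for whatever actually talks to the database.
function cachedQuery(ArrayCache $cache, string $sql, array $params, callable $runQuery): array
{
    // One key per distinct query + parameter combination
    $key = 'sql_' . md5($sql . '|' . serialize($params));

    $rows = $cache->get($key);
    if ($rows === false) {
        $rows = $runQuery($sql, $params);   // only a cache miss reaches the DB
        $cache->set($key, $rows);
    }
    return $rows;
}

// Usage with a canned result instead of a real DB
$dbHits = 0;
$runQuery = function (string $sql, array $params) use (&$dbHits): array {
    $dbHits++;
    return [['id' => 1, 'title' => 'Hello']];
};

$sql = 'SELECT id, title FROM article WHERE user_id = ? ORDER BY created DESC LIMIT 10';
$cache = new ArrayCache();
$a = cachedQuery($cache, $sql, [42], $runQuery);   // miss : runs the query
$b = cachedQuery($cache, $sql, [42], $runQuery);   // hit : served from the cache
```

Unlike the SQL query cache, nothing in this sketch resets on insert/update/delete; keeping the entries fresh is the application's job.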
    • Caching techniques
        #4 : Store complex processing results
      • Not just calculations
      • CPU intensive tasks :
        • Config file parsing
        • XML file parsing
        • Loading CSV in an array
      • Save resources -> more resources available
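As an illustration of caching a processing result, here is a rough sketch for the "Loading CSV in an array" case. `loadCsvCached()`, the cache file layout and the sample data are assumptions made for this example.

```php
<?php
// Hypothetical helper : parses a CSV file once and reuses the stored array
// until the file changes.
function loadCsvCached(string $csvPath, string $cacheDir, int &$parseCount): array
{
    // The modification time is part of the key, so an edited file gets re-parsed
    $cacheFile = $cacheDir . '/csv_' . md5($csvPath . '|' . filemtime($csvPath)) . '.cache';

    if (is_file($cacheFile)) {
        return unserialize(file_get_contents($cacheFile));
    }

    // The expensive part : read and split every line
    $parseCount++;
    $rows = [];
    foreach (file($csvPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $rows[] = str_getcsv($line, ',', '"', '\\');
    }

    file_put_contents($cacheFile, serialize($rows));
    return $rows;
}

// Usage : the file is parsed on the first call only
$dir = sys_get_temp_dir() . '/csvcache_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/prices.csv', "sku,price\nA1,9.99\nB2,19.50\n");

$parseCount = 0;
$first  = loadCsvCached($dir . '/prices.csv', $dir, $parseCount);
$second = loadCsvCached($dir . '/prices.csv', $dir, $parseCount);
```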
    • Caching techniques
        #xx : Your call
        Only limited by your imagination !
        When you have data, think :
      • Creation time ?
      • Modification frequency ?
      • Retrieval frequency ?
    • How to find cacheable data
      • New projects : start from 'cache everything'
      • Existing projects :
        • Look at MySQL slow query log
        • Make a complete query log (don't forget to turn it off !)
        • Check page loading times
    • Caching storage - MySQL query cache
      • Use it
      • Don't rely on it
      • Good if you have :
        • lots of reads
        • few different queries
      • Bad if you have :
        • lots of insert/update/delete
        • lots of different queries
    • Caching storage - Disk
      • Data with few updates : good
      • Caching SQL queries : preferably not
      • DON'T use NFS or other network file systems
        • especially for sessions
        • high latency
        • locking issues !
    • Caching storage - Disk / ramdisk
      • Overhead : filesystem access
      • Limited number of files per directory
        • -> Subdirectories
      • Local
        • 5 Webservers -> 5 local caches
        • -> Hard to scale
        • How will you keep them synchronized ?
          • -> Don't say NFS or rsync !
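The "limited number of files per directory -> subdirectories" point can be handled by deriving the directory from a hash of the key. The sketch below is one possible layout; the function names and the two-level split are assumptions, not the speaker's implementation.

```php
<?php
// Hypothetical layout : the first hex characters of the key's md5 pick two
// directory levels, which spreads the files over up to 256 * 256 directories.
function cachePath(string $baseDir, string $key): string
{
    $hash = md5($key);
    return sprintf('%s/%s/%s/%s.cache', $baseDir, substr($hash, 0, 2), substr($hash, 2, 2), $hash);
}

function diskCacheSet(string $baseDir, string $key, $value): void
{
    $path = cachePath($baseDir, $key);
    if (!is_dir(dirname($path))) {
        mkdir(dirname($path), 0775, true);   // create both levels at once
    }
    file_put_contents($path, serialize($value), LOCK_EX);
}

function diskCacheGet(string $baseDir, string $key)
{
    $path = cachePath($baseDir, $key);
    return is_file($path) ? unserialize(file_get_contents($path)) : false;
}

// Usage
$base = sys_get_temp_dir() . '/diskcache_' . uniqid();
diskCacheSet($base, 'Homepage_Popular_Product_List', ['tv', 'laptop']);
$hit  = diskCacheGet($base, 'Homepage_Popular_Product_List');
$miss = diskCacheGet($base, 'Unknown_Key');
```

This only addresses the directory size problem; each Webserver still ends up with its own local cache, which is the scaling issue the slide warns about.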
    • Caching storage - Memcache
      • Facebook, Twitter, Slashdot, … -> need we say more ?
      • Distributed memory caching system
      • Multiple machines ↔ 1 big memory-based hash-table
      • Key-value storage system
        • Keys - max. 250 bytes
        • Values - max. 1 MByte
      • Extremely fast... non-blocking, UDP (!)
    • Memcache - where to install
    • Memcache - installation & running it
      • Installation
        • Distribution package
        • PECL
        • Windows : binaries
      • Running
        • No config-files
        • memcached -d -m <mem> -l <ip> -p <port>
        • ex. : memcached -d -m 2048 -l 127.0.0.1 -p 11211
    • Caching storage - Memcache - some notes
      • Not fault-tolerant
        • It's a cache !
        • Lose session data
        • Lose shopping cart data
      • Different libraries
        • Original : libmemcache
        • New : libmemcached (consistent hashing, UDP, binary protocol, …)
      • Firewall your Memcache port !
    • Memcache in code
        <?php
        $memcache = new Memcache();
        $memcache->addServer('172.16.0.1', 11211);
        $memcache->addServer('172.16.0.2', 11211);

        $myData = $memcache->get('myKey');
        if ($myData === false) {
            $myData = GetMyDataFromDB();
            // Put it in Memcache as 'myKey', without compression, with no expiration
            $memcache->set('myKey', $myData, false, 0);
        }
        echo $myData;
    • Let's give that a go !
        /**
         * Retrieves the 10 highest rated articles
         * @return array List of highest rated articles
         */
        static public function getTopRatedArticleList()
        {
            // self::$cache : the cache object, assumed here to be kept in a static property
            // Parentheses matter : without them $articleList receives the result of "=== false"
            if (($articleList = self::$cache->load('topRatedArticleList')) === false) {
                $articleList = self::getTopRatedArticleListUncached();
                self::$cache->save($articleList, 'topRatedArticleList');
            }
            return $articleList;
        }
    • Where's the data ?
      • Memcache client decides (!)
      • 2 hashing algorithms :
        • Traditional
          • Server failure -> all data must be rehashed
        • Consistent
          • Server failure -> 1/x of data must be rehashed (x = # of servers)
      • No replication !
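To make the difference between the two hashing approaches more concrete, here is a small consistent-hashing sketch. It is not how any particular Memcache client implements it: the `HashRing` class, crc32 as hash function and 100 points per server are choices made for this example only.

```php
<?php
// Sketch of a consistent-hash ring (illustration only).
class HashRing
{
    private $ring = [];   // ring position => server

    public function __construct(array $servers, int $pointsPerServer = 100)
    {
        // Every server gets many points on the ring to spread its share evenly
        foreach ($servers as $server) {
            for ($i = 0; $i < $pointsPerServer; $i++) {
                $this->ring[crc32($server . '#' . $i)] = $server;
            }
        }
        ksort($this->ring);
    }

    public function serverFor(string $key): string
    {
        $hash = crc32($key);
        // Linear scan keeps the sketch short; a real client would use a binary search
        foreach ($this->ring as $position => $server) {
            if ($position >= $hash) {
                return $server;        // first point at or after the key's position
            }
        }
        return reset($this->ring);     // past the last point : wrap around to the first
    }
}

// Usage : compare key placement before and after one server fails
$servers = ['172.16.0.1', '172.16.0.2', '172.16.0.3', '172.16.0.4'];
$before = new HashRing($servers);
$after  = new HashRing(array_slice($servers, 0, 3));   // 172.16.0.4 has failed

$moved = 0;
$movedFromHealthy = 0;
for ($i = 0; $i < 1000; $i++) {
    $key = 'article_' . $i;
    if ($before->serverFor($key) !== $after->serverFor($key)) {
        $moved++;
        if ($before->serverFor($key) !== '172.16.0.4') {
            $movedFromHealthy++;   // would mean a key moved although its server is fine
        }
    }
}
echo "$moved of 1000 keys changed server\n";
```

Only keys that lived on the failed server get a new home; with a plain "hash modulo number of servers" scheme, changing the server count reshuffles most keys instead.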
    • Memcache slabs
        (or why Memcache says it's full when it's not)
      • Multiple slabs of different sizes :
        • Slab 1 : 400 bytes
        • Slab 2 : 480 bytes (400 * 1.2)
        • Slab 3 : 576 bytes (480 * 1.2) (and so on...)
      • Multiplier (1.2 here) can be configured
      • Each larger slab has room for fewer items (chunks)
      • -> Store a lot of very large objects
      • -> Large slabs might be full
      • -> Rest of slabs might be free
      • -> Try to store more -> eviction of data !
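The slab arithmetic above can be reproduced in a few lines. This is a simplified sketch using the slide's numbers; real memcached also rounds chunk sizes for alignment, which is left out here, and both function names are made up for the example.

```php
<?php
// Chunk size per slab : start size and multiplier are the slide's numbers.
function slabChunkSizes(int $firstSize, float $factor, int $count): array
{
    $sizes = [];
    $size = $firstSize;
    for ($i = 0; $i < $count; $i++) {
        $sizes[] = $size;
        $size = (int) round($size * $factor);
    }
    return $sizes;
}

// An item is stored in the first slab whose chunk is big enough
function slabFor(int $itemSize, array $sizes): ?int
{
    foreach ($sizes as $index => $chunkSize) {
        if ($itemSize <= $chunkSize) {
            return $index + 1;   // slabs are numbered from 1, as on the slide
        }
    }
    return null;                 // larger than the largest chunk in this list
}

$sizes = slabChunkSizes(400, 1.2, 5);
$slabForSmallItem = slabFor(450, $sizes);    // does not fit 400, fits 480
$slabForHugeItem  = slabFor(5000, $sizes);   // fits none of these five slabs
```

Because an item can only go into the slab that matches its size, free chunks in the other slabs do not help when that one slab is full, which is where the evictions come from.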
    • Memcache - Is it working ?
      • Connect to it using telnet
        • "stats" command ->
        • Use Cacti or other monitoring tools
      STAT pid 2941
      STAT uptime 10878
      STAT time 1296074240
      STAT version 1.4.5
      STAT pointer_size 64
      STAT rusage_user 20.089945
      STAT rusage_system 58.499106
      STAT curr_connections 16
      STAT total_connections 276950
      STAT connection_structures 96
      STAT cmd_get 276931
      STAT cmd_set 584148
      STAT cmd_flush 0
      STAT get_hits 211106
      STAT get_misses 65825
      STAT delete_misses 101
      STAT delete_hits 276829
      STAT incr_misses 0
      STAT incr_hits 0
      STAT decr_misses 0
      STAT decr_hits 0
      STAT cas_misses 0
      STAT cas_hits 0
      STAT cas_badval 0
      STAT auth_cmds 0
      STAT auth_errors 0
      STAT bytes_read 613193860
      STAT bytes_written 553991373
      STAT limit_maxbytes 268435456
      STAT accepting_conns 1
      STAT listen_disabled_num 0
      STAT threads 4
      STAT conn_yields 0
      STAT bytes 20418140
      STAT curr_items 65826
      STAT total_items 553856
      STAT evictions 0
      STAT reclaimed 0
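One number worth deriving from this output is the hit ratio. The sketch below parses "STAT name value" entries and computes it from get_hits and get_misses; `parseStats()` and `hitRatio()` are hypothetical helper names, and the input is a shortened copy of the values above.

```php
<?php
// Turns raw "stats" output into an array and derives the hit ratio from it.
function parseStats(string $raw): array
{
    $stats = [];
    // Each entry looks like "STAT <name> <value>"
    preg_match_all('/STAT (\S+) (\S+)/', $raw, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $stats[$m[1]] = $m[2];
    }
    return $stats;
}

function hitRatio(array $stats): float
{
    $hits = (int) $stats['get_hits'];
    $misses = (int) $stats['get_misses'];
    $total = $hits + $misses;
    return $total > 0 ? $hits / $total : 0.0;
}

// Values taken from the stats output above
$stats = parseStats("STAT get_hits 211106\nSTAT get_misses 65825\nSTAT evictions 0");
$ratio = hitRatio($stats);
printf("hit ratio: %.1f%%\n", $ratio * 100);   // prints "hit ratio: 76.2%"
```

A falling hit ratio or a rising evictions counter are the kind of values a monitoring tool could graph over time.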
    • Memcache - backing up
    • Memcache - deleting
        <?php
        $memcache = new Memcache();
        $memcache->addServer('172.16.0.1', 11211);   // same server as in the earlier example
        $memcache->delete('myKey');
        $myData = $memcache->get('myKey');   // $myData === false
    • Memcache - tip
        Page with multiple blocks ? -> use Memcached::getMulti()
        Warning : what if you get some hits and some misses ?
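One possible answer to the "some hits and some misses" question is to rebuild only the missing blocks. In the sketch below, `$cached` mimics the array that Memcached::getMulti() returns (only the keys that were found), so no Memcache server is needed; `resolveBlocks()`, the `$builders` array and the key names are assumptions for this example.

```php
<?php
// $cached  : key => value for the keys that were found in the cache
// $builders: key => callable that rebuilds that block (assumption of this sketch)
function resolveBlocks(array $keys, array $cached, array $builders): array
{
    $blocks = [];
    $rebuilt = [];
    foreach ($keys as $key) {
        if (array_key_exists($key, $cached)) {
            $blocks[$key] = $cached[$key];        // hit
        } else {
            $blocks[$key] = $builders[$key]();    // miss : rebuild just this block
            $rebuilt[$key] = $blocks[$key];
        }
    }
    // $rebuilt is what would be written back, e.g. with Memcached::setMulti()
    return [$blocks, $rebuilt];
}

// Usage : two of the three blocks were found in the cache
$keys = ['Block_Menu', 'Block_News', 'Block_Footer'];
$cached = ['Block_Menu' => '<ul>menu</ul>', 'Block_Footer' => '<p>footer</p>'];
$builders = [
    'Block_Menu'   => function () { return '<ul>menu</ul>'; },
    'Block_News'   => function () { return '<ol>news</ol>'; },
    'Block_Footer' => function () { return '<p>footer</p>'; },
];

list($blocks, $rebuilt) = resolveBlocks($keys, $cached, $builders);
```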
    • Naming your keys
      • Key names must be unique
      • Prefix / namespace your keys !
      • Only letters, numbers and underscore
      • md5() is useful
        • -> BUT : harder to debug
      • Use clear names
      • Document your key names !
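A small key-building helper keeps these rules in one place. The sketch below is an assumption about how that could look: `cacheKey()` is a made-up name and the double underscore separator simply mirrors the key names used elsewhere in these slides.

```php
<?php
// Hypothetical helper : namespace prefix + readable identifier, restricted to
// letters, numbers and underscores.
function cacheKey(string $namespace, string $id): string
{
    $clean = trim(preg_replace('/[^A-Za-z0-9]+/', '_', $id), '_');
    $key = $namespace . '__' . $clean;

    // Keys are limited to 250 bytes : fall back to an md5 for very long ones
    // (harder to debug, so only as a last resort)
    if (strlen($key) > 250) {
        $key = $namespace . '__' . md5($id);
    }
    return $key;
}

$readable = cacheKey('ArticleDetails', 'Toshiba 32C100U 32-Inch');
$hashed   = cacheKey('ArticleDetails', str_repeat('very long product name ', 20));
```

Here `$readable` becomes `ArticleDetails__Toshiba_32C100U_32_Inch`, which is still easy to recognise when looking at the cache by hand.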
    • Updating data
    • Adding/updating data
        $memcache->delete('ArticleDetails__Toshiba_32C100U_32_Inch');
        $memcache->delete('Homepage_Popular_Product_List');
    • Adding/updating data - Why it crashed
    • Cache stampeding elePHPants
    • Cache stampeding
    • Memcache code ? DB
    • Cache warmup scripts
      • Used to fill your cache when it's empty
      • Run it before starting Webserver !
      • 2 ways :
        • Visit all URLs
          • Error-prone
          • Hard to maintain
        • Call all cache-updating methods
      • Make sure you have a warmup script !
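The "call all cache-updating methods" variant could be sketched like this. `warmUp()`, the plain array standing in for the cache and the two example entries are assumptions for illustration.

```php
<?php
// Hypothetical warmup routine : every cache-updating method is registered once
// and called in turn, before the Webserver starts taking traffic.
function warmUp(array $updaters, array &$cache): array
{
    $failed = [];
    foreach ($updaters as $key => $updater) {
        try {
            $cache[$key] = $updater();
        } catch (Exception $e) {
            $failed[$key] = $e->getMessage();   // report it, but keep warming the rest
        }
    }
    return $failed;
}

// Usage : one updater works, one fails
$cache = [];
$failed = warmUp([
    'Homepage_Popular_Product_List' => function () { return ['tv', 'laptop']; },
    'Broken_Block' => function () { throw new RuntimeException('db down'); },
], $cache);
```

Because the script reuses the same methods the site uses to update its cache, there is no separate list of URLs to maintain.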
    • Cache stampeding - what about locking ?
        Seems like a nice idea, but...
      • Lock in place
      • -> lots of new connections
      • -> memory spike
      • What if the process that created the lock fails ?
    • Quick word about expiration
      • General rule : don't let things expire
      • Exception to the rule : things that have an end date (calendar items)
    • So...
        DON'T DELETE FROM CACHE (and don't expire unless useful)
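One way to read this advice: when data changes, overwrite the cached entries with fresh values instead of deleting them, so that no request ever finds the key missing and rushes to the DB. The sketch below uses plain arrays as stand-ins for the database and for Memcache; `saveArticle()` and the key names are made up for the example.

```php
<?php
// Sketch : on a write, refresh the cached copies instead of deleting them.
function saveArticle(array &$db, array &$cache, int $id, string $title): void
{
    $db[$id] = ['id' => $id, 'title' => $title];     // 1. write to the database

    $cache['ArticleDetails__' . $id] = $db[$id];     // 2. overwrite the detail entry

    // 3. rebuild dependent lists from the DB and overwrite them as well
    $cache['Homepage_Article_List'] = array_column($db, 'title', 'id');
}

// Usage : the cache always holds a value, before and after the edit
$db = [];
$cache = [];
saveArticle($db, $cache, 1, 'First');
saveArticle($db, $cache, 1, 'First, edited');
```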
    • LAMP...
        -> LAMMP -> LANMMP
    • Nginx
      • Web server
      • Reverse proxy
      • Lightweight, fast
      • 7.5% of all Websites
    • Nginx
      • No threads, event-driven
      • Uses epoll / kqueue
      • Low memory footprint
      • 10000 active connections = normal
    • Nginx - a true alternative to Apache ?
      • Not all Apache modules
        • mod_auth_*
        • mod_dav*
      • Basic modules are available
      • Some 3rd-party modules (need recompilation !)
    • Nginx - Installation
      • Packages
      • Win32 binaries
      • Build from source (./configure; make; make install)
    • Nginx - Configuration
        server {
            listen 80;
            server_name www.domain.ext *.domain.ext;
            index index.html;
            root /home/domain.ext/www;
        }
        server {
            listen 80;
            server_name photo.domain.ext;
            index index.html;
            root /home/domain.ext/photo;
        }
    • Nginx - phase 1
      • Move Apache to a different port (8080)
      • Put Nginx at port 80
      • Nginx serves all statics (images, css, js, …)
      • Forward dynamic requests to Apache
    • Nginx for static files only
        server {
            listen 80;
            server_name www.domain.ext;
            # Static files : served by Nginx itself (the dot before the extension is escaped)
            location ~* ^.+\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|pdf|ppt|txt|tar|rtf|js)$ {
                expires 30d;
                root /home/www.domain.ext;
            }
            # Everything else : forwarded to Apache on port 8080
            location / {
                proxy_pass http://www.domain.ext:8080;
                proxy_pass_header Set-Cookie;
                proxy_set_header X-Real-IP $remote_addr;
                proxy_set_header Host $host;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            }
        }
    • Nginx - let's give that a go !
    • Nginx for PHP ?
        LANMMP to... LNMPP (ok, this is getting ridiculous)
    • Nginx with PHP
      • In the past : spawn-fcgi (from Lighttpd)
      • Now : PHP-FPM (in PHP 5.3 !)
      • Runs on port 9000
      • Nginx connects using fastcgi method
      location / {
          fastcgi_pass 127.0.0.1:9000;
          fastcgi_index index.php;
          include fastcgi_params;
          fastcgi_param SCRIPT_NAME $fastcgi_script_name;
          fastcgi_param SCRIPT_FILENAME /home/www.phpbenelux.eu/$fastcgi_script_name;
          fastcgi_param SERVER_NAME $host;
          fastcgi_intercept_errors on;
      }
    • Nginx + PHP-FPM features
      • Graceful upgrade
      • Spawn new processes under high load
      • Chroot
      • Slow request log !
      • fastcgi_finish_request() -> offline processing
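A rough sketch of the "offline processing" idea: send the response, let PHP-FPM close the connection to the visitor, then do the slow work. `respondThenProcess()` is a hypothetical wrapper; `fastcgi_finish_request()` itself only exists when PHP runs under PHP-FPM, hence the check.

```php
<?php
// Sketch : send the response first, then do the slow work.
function respondThenProcess(string $response, callable $slowTask): void
{
    echo $response;

    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request();   // the visitor has the full page from here on
    }

    $slowTask();                    // e.g. send mail, resize images, write statistics
}

// Usage : the task runs after the response has been handed over
$taskDone = false;
respondThenProcess("Thanks, your order has been received\n", function () use (&$taskDone) {
    $taskDone = true;               // placeholder for the real background work
});
```

Outside PHP-FPM (for example on the command line) the sketch still works, the slow task just runs before the request ends as usual.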
    • Nginx + PHP-FPM - performance ?
    • Reverse proxy time...
    • Varnish
      • Not just a load balancer
      • Reverse proxy cache / http accelerator / …
      • Caches (parts of) pages in memory
      • Careful :
        • uses threads
        • Nginx is faster and scales better (but doesn't have VCL)
    • Varnish - Installation & configuration
      • Installation
        • Packages
        • Source : ./configure && make && make install
      • Configuration
        • /etc/default/varnish
        • /etc/varnish/*.vcl
    • Varnish - backends + load balancing
        backend server1 {
            .host = "192.168.0.10";
        }
        backend server2 {
            .host = "192.168.0.11";
        }
        director example_director round-robin {
            { .backend = server1; }
            { .backend = server2; }
        }
    • Varnish - backends + load balancing
        backend server1 {
            .host = "192.168.0.10";
            .probe = {
                .url = "/";
                .interval = 5s;
                .timeout = 1s;
                .window = 5;
                .threshold = 3;
            }
        }
    • Varnish - VCL
      • Varnish Configuration Language
      • DSL (Domain Specific Language)
        • -> compiled to C
      • Hooks into each request
      • Defines :
        • Backends (web servers)
        • ACLs
        • Load balancing strategy
      • Can be reloaded while running
    • Varnish - whatever you want
      • Real-time statistics (varnishtop, varnishhist, ...)
      • ESI
    • Varnish - ESI
        Perfect for caching pages
      In your article page output : <esi:include src="/latest-news"/>
    • Varnish with ESI - hold on tight !
    • Varnish - what can/can't be cached ?
      • Can :
        • Static pages
        • Images, js, css
        • Pages or parts of pages that don't change often (ESI)
      • Can't :
        • POST requests
        • Requests with Set-Cookie
        • Very large files (it's not a file server !)
        • User-specific content
    • ESI -> no caching on user-specific content ? Logged in as : Wim Godden 5 messages TTL = 5min TTL=1h TTL = 0s ?
    • Coming to an Nginx near you soon...
        Logged in as : Wim Godden
        5 messages
        <esim:include src="/news" ttl="5m" />
        <esim:include src="/menu" ttl="1h" />
        <esim:include src="/top" usercookie="PHPSESS_ID" ttl="1h" />
    • New message arrives...
      • Hash using page name and session
      • Self-chosen key (i.e. 'mails_for_' followed by session)
      DB
    • Do we need TTLs ?
        Logged in as : Wim Godden
        5 messages
        <esim:include src="/news" ttl="5m" />
        <esim:include src="/menu" ttl="1h" />
        <esim:include src="/top" usercookie="PHPSESS_ID" ttl="1h" />
    • How many Memcache requests ?
        Logged in as : Wim Godden
        5 messages
        <esim:include src="/news" ttl="5m" />
        <esim:include src="/menu" ttl="1h" />
        <esim:include src="/top" usercookie="PHPSESS_ID" ttl="1h" />
    • Advantages
      • No hits to backend anymore (except the first one) !
        • Not for user-specific content
        • Not even for non-specific content
          • No TTLs for non-specific content
          • TTL for user-specific content is required (defaults to 5min)
      • No need to specify ESI parameters in configuration file
        • Only needs enabling
      • Memcache getMulti -> 1 Memcache request per page
    • Under development
      • Feature set = unclear
      • Performance = even more unclear
        • Debugging code makes it slow
      • Extends ESI standard, but doesn't follow it entirely
        • (what standard ?)
      • Release date ?
        • End 2011 ?
    • Tuning
    • Apache - tuning tips
      • Disable unused modules -> fixes 10% of performance issues
      • Set AllowOverride to None
      • Disable SymLinksIfOwnerMatch
        • Why ? Site in /var/www/domain.com/subdomain/html
      • MinSpareServers, MaxSpareServers, StartServers, MaxClients, MPM selection -> a whole session of its own ;-)
      • Don't use mod_proxy -> use Nginx or Varnish !
      • High load on an SSL-site ? -> put SSL on a reverse proxy
    • Caching storage - Opcode caching
    • KCachegrind is your friend
    • PHP speed - some tips
      • Upgrade PHP - every minor release has 5-15% speed gain !
      • Use an opcode cache
      • Profile your code
        • XHProf
        • Xdebug
      • But : turn off profilers on acceptance/production platforms !
      • Let's see what difference opcode caching and profilers make...
    • DB speed - some tips
      • Avoid NOW() -> use PHP date("Y-m-d") as a parameter
        • Why ? Query cache !
      • Index, index, index ! (where needed only)
      • Use same types for joins
        • i.e. don't join decimal with int
      • RAND() is evil !
      • count(*) is evil in InnoDB without a where clause !
        • (and there are other examples of specific things to avoid)
      • Select the right storage engine
      • Persistent connections are not always good !
    • Caching & Tuning @ frontend http://www.websiteoptimization.com/speed/tweak/average-web-page/
    • Caching in the browser
      • HTTP 304 (Not modified)
      • Expires/Cache-Control header
    • HTTP 304 First request Next requests
    • HTTP 304 with ETag First request Next requests
    • Expires/Cache-control header
        Cache-Control
        • HTTP/1.1
        • Seconds to expiry
        • Used by browsers
      First request Next requests No requests until item expires
        Expires
        • HTTP/1.0
        • Date to expire on
        • Used by old proxies
        • Requires clock to be accurate !
    • Pragma: no-cache = evil
      • "Pragma: no-cache" doesn't make it uncacheable
      • Don't want caching on a page ?
        • HTTP/1.0 : "Expires: Fri, 30 Oct 1998 00:00:00 GMT" (in the past)
        • HTTP/1.1 : "Cache-Control: no-store"
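The browser-caching headers from the last few slides can be put together in one sketch. It is only an illustration: both functions are hypothetical, they return the status and headers instead of sending them, and using an md5 of the body as ETag is just one possible choice.

```php
<?php
// Sketch : decide status and headers for a cacheable response.
// $ifNoneMatch is the request's If-None-Match header (null when absent).
function cacheableResponse(string $body, ?string $ifNoneMatch, int $maxAge, int $now): array
{
    $etag = '"' . md5($body) . '"';
    $headers = [
        'ETag' => $etag,
        // HTTP/1.1 : seconds to expiry
        'Cache-Control' => 'max-age=' . $maxAge,
        // HTTP/1.0 : date to expire on
        'Expires' => gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
    ];

    if ($ifNoneMatch === $etag) {
        return [304, $headers, ''];   // browser copy is still good : no body needed
    }
    return [200, $headers, $body];
}

// The opposite case : headers for a page that must not be cached
function uncacheableHeaders(): array
{
    return [
        'Cache-Control' => 'no-store',
        'Expires' => 'Fri, 30 Oct 1998 00:00:00 GMT',   // a date in the past
    ];
}

// Usage : first request gets the body, the next one only a 304
list($status1, $headers1, $body1) = cacheableResponse('<p>hi</p>', null, 3600, 0);
list($status2, $headers2, $body2) = cacheableResponse('<p>hi</p>', $headers1['ETag'], 3600, 0);
```

In a real page the returned values would be sent with `header()` and `http_response_code()`; while the max-age has not passed, the browser does not ask at all.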
    • Frontend tuning
        1. You optimize backend
        2. Frontend engineer messes up -> havoc on backend
        3. Don't forget : frontend sends requests to backend !
        SO...
      • Care about frontend
      • Test frontend
      • Check what requests frontend sends to backend
    • Frontend tuning - inline CSS/XHTML images
        #navbar span { width: 31px; height: 31px; display: inline; float: left; margin-right: 4px; }
        .home { background-image: url(........MEl0nGVUC6tObNnPceSFBaQVMJAxC4lo3gNOrUaFnTHoAxNm3XVxPfRq139e8BEGAjWD5bgIALw287T8AcAXLly2kjOACdc17higXSIKDO/Lpv7Qq4bw7APgBq8eOzX69InrZ6xe3dbxZffyTGkb8tdx8F+b0Xn2sFsCSBAgTM5lp63RHYnoHUudZgRgkGOGCB+43nGk4OGcQTabKx5dyJKJ7ImoUNCaRRAZYN1ppsrT3Y2gIwyjSQBAtUpABml/0IJGYd6VjQUDH9uBFkGxGm5I8dPQaRUAQUMBdhhBV25ZYUJZBcSAtSJBddWZZ5UAGPOTXlgkNVOSZdBxEwIkYu7VhYnAol5GaadRqF0Uaz0TgXnX2umVFyGakJUUAAADs=); margin-left: 4px; }
        <img border=0 src="......Uaz0TgXnX2umVFyGakJUUAAADs=">
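Inline images like the ones above are base64 data URIs, which can be generated on the server. A minimal sketch, with `dataUri()` as a made-up helper name and a 6-byte GIF signature as sample input:

```php
<?php
// Builds a data URI that can be used in CSS url(...) or in an <img src="...">
function dataUri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Usage : only the GIF signature here; normally the bytes would come from
// file_get_contents() on a small image file
$uri = dataUri('GIF89a', 'image/gif');
$css = '.home { background-image: url(' . $uri . '); }';
```

Base64 makes the data roughly a third larger, so this mainly pays off for small images where the saved HTTP request matters more than the extra bytes.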
    • CSS Sprites
    • Tuning content - CSS sprites
        Separate images : 11 images, 11 HTTP requests, 24 KByte
        One sprite : 1 image, 1 HTTP request, 14 KByte
    • Tuning frontend
      • Minimize requests
        • Combine CSS/JavaScript files
        • Use inline images in CSS (not supported on all browsers yet)
        • Use CSS Sprites (horizontally if possible)
      • Put CSS at top
      • Put JavaScript at bottom
        • Max. no. of connections
        • Especially if JavaScript does Ajax (advertising-scripts, …) !
      • Avoid iFrames
        • Again : max no. of connections
      • Don't scale images in HTML
      • Have a favicon.ico (don't 404 it !)
    • Tuning frontend
      • Use GET for Ajax retrieval requests (and cache them !)
      • Split requests across subdomains
      • Put statics on a separate subdomain (without cookies !)
      www.whatever.com www.whatever.com images.whatever.com
    • Tuning miscellaneous
      • Avoid DNS lookups
        • Frontend : don't use too many subdomains (2 = ideal)
        • Backend :
          • Turn off DNS resolution in Apache : HostnameLookups Off
          • If your app uses external data
            • Run a local DNS cache (timeout danger !)
            • Make sure you can trust DNS servers (preferably run your own)
      • Compress non-binary content (GZIP)
        • mod_deflate in Apache
        • HttpGzipModule in Nginx (HttpGzipStaticModule for pre-zipped statics !)
        • No native support in Varnish
    • What else can kill your site ?
      • Redirect loops
        • Multiple requests
          • More load on Webserver
          • More PHP to process
        • Additional latency for visitor
        • Try to avoid redirects anyway
        • -> In ZF : use $this->_forward instead of $this->_redirect
      • Watch your logs, but equally important...
      • Watch the logging process ->
      • Logging = disk I/O -> can kill your site !
      • Slashdot effect
    • Above all else... be prepared !
      • Have a monitoring system
      • Use a cache abstraction layer (disk -> Memcache)
      • Don't install for the worst -> prepare for the worst
      • Have a test-setup
      • Have fallbacks
        • Turn off non-critical functionality
      • Questions ?
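As a closing illustration of the "cache abstraction layer (disk -> Memcache)" advice, here is a minimal sketch. The interface and class names are hypothetical, and `MemoryBackend` merely stands in for a Memcache-based backend so the example runs without a server.

```php
<?php
// Application code only ever talks to CacheBackend, so moving from disk to
// Memcache means swapping one object.
interface CacheBackend
{
    public function get(string $key);               // returns false on a miss
    public function set(string $key, $value): void;
}

class FileBackend implements CacheBackend
{
    private $dir;

    public function __construct(string $dir)
    {
        $this->dir = $dir;
    }

    public function get(string $key)
    {
        $file = $this->dir . '/' . md5($key);
        return is_file($file) ? unserialize(file_get_contents($file)) : false;
    }

    public function set(string $key, $value): void
    {
        file_put_contents($this->dir . '/' . md5($key), serialize($value), LOCK_EX);
    }
}

class MemoryBackend implements CacheBackend   // stand-in for a Memcache backend
{
    private $items = [];

    public function get(string $key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// The calling code does not know (or care) which backend it was given
function topArticles(CacheBackend $cache): array
{
    $list = $cache->get('Homepage_Top_Articles');
    if ($list === false) {
        $list = ['Article A', 'Article B'];   // placeholder for the real DB query
        $cache->set('Homepage_Top_Articles', $list);
    }
    return $list;
}

$dir = sys_get_temp_dir() . '/backend_' . uniqid();
mkdir($dir);
$fromDisk   = topArticles(new FileBackend($dir));
$fromMemory = topArticles(new MemoryBackend());
```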
    • Contact
      • Twitter @wimgtr
      • Web http://techblog.wimgodden.be
      • Slides http://www.slideshare.net/wimg
      • E-mail [email_address]
    • Please...
      • Rate my talk : http://tinyurl.com/cachetune
      • Vote to see me talk at Confoo : http://www.confoo.ca
        • Caching and tuning fun for high scalability
        • Keeping old code alive without non-stop hassle
        • Beyond PHP : it's not (just) about the code !
        • Who's in control of your PHP app ?
        • Creating Fast, Dynamic ACLs in Zend Framework
      • Thanks !