Caching and tuning fun for high scalability

Wim Godden
Cu.be Solutions

Who am I ?
Wim Godden (@wimgtr)
Founder of Cu.be Solutions (http://cu.be)
Open Source developer since 1997
Developer of OpenX, PHPCompatibility, Nginx SCL, ...
Speaker at PHP and Open Source conferences

Who are you ?
Developers ?
System/network engineers ?
Managers ?
Caching experience ?

Goals of this tutorial
Everything about caching and tuning
A few techniques
How-to
How-NOT-to
→ Increase reliability, performance and scalability
5 visitors/day → 5 million visitors/day
(Don't expect miracle cure !)

Test page
3 DB-queries
select firstname, lastname, email from user where user_id = 5;
select title, createddate, body from article order by createddate desc limit 5;
select title, createddate, body from article order by score desc limit 5;
Page just outputs result

Our base benchmark
Apachebench = useful enough
Result ?
                 Single webserver      Proxy
                 Static     PHP        Static     PHP
Apache + PHP     3900       17.5       6700       17.5
Limit (static) : CPU, network or disk
Limit (PHP) : database

What is caching ?
x = 5, y = 2
n = 50
Same result → CACHE
select * from article join user on article.user_id = user.id order by created desc limit 10
Doesn't change all the time → CACHE

Theory of caching
GET /page → Page
$data = get('key') → Cache → false
if ($data == false)
    select data from table → DB
    $data = returned result
    set('key', $data) → Cache

Theory of caching
DB
Cache
HIT

Caching goals - 1st goal
Reduce # of concurrent requests
Reduce the load

Some figures
Pageviews : 5000 (4000 on 10 pages)
Avg. loading time : 200ms
Cache 10 pages
Avg. loading time : 20ms
→ Total avg. loading time : 56ms
Worth it ?
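The 56ms figure is a weighted average of the two loading times. A minimal sketch of that arithmetic, assuming the remaining 1000 pageviews stay uncached at 200ms (the function name is only illustrative):

```php
<?php
// Weighted average loading time when only part of the traffic is served from cache.
function averageLoadTime(int $totalViews, int $cachedViews, float $cachedMs, float $uncachedMs): float
{
    $uncachedViews = $totalViews - $cachedViews;
    return ($cachedViews * $cachedMs + $uncachedViews * $uncachedMs) / $totalViews;
}

// 4000 of 5000 pageviews hit the 10 cached pages (20ms), the other 1000 stay at 200ms.
echo averageLoadTime(5000, 4000, 20.0, 200.0), "ms\n"; // 56ms
```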

Caching goals - 3rd goal
Send less data across the network / Internet
You benefit → lower bill from upstream provider
Users benefit → faster page load
Wait a second... that's a frontend problem !
True, but remember : the backend is transmitting it !

Caching techniques
#1 : Store entire pages
Company Websites
Blogs
Full pages that don't change
Render → Store in cache → retrieve from cache

Caching techniques
#2 : Store parts of a page
Most common technique
Usually a small block in a page
Best effect : reused on lots of pages
Can be inserted on dynamic pages

Caching techniques
#3 : Store SQL queries
↔ SQL query cache
Limited in size
Resets on every insert/update/delete
Server and connection overhead
Goal :
not to get rid of DB
free up DB resources for more hits !
Better :
store processed data instead of raw data
store group of objects

Caching techniques
#4 : Store complex PHP results
Not just calculations
CPU intensive tasks :
Config file parsing
XML file parsing
Loading CSV in an array
Save resources → more resources available

Caching techniques
#xx : Your call
Only limited by your imagination !
When you have data, think :
Creating time ?
Modification frequency ?
Retrieval frequency ?

How to find cacheable data
New projects : start from 'cache everything'
Existing projects :
Check page loading times
Look at MySQL slow query log
Make a complete query log (don't forget to turn it off !)
→ Use Percona Toolkit (pt-query-digest)

Databases - pt-query-digest
# Profile
# Rank Query ID Response time Calls R/Call Apdx V/M Item
# ==== ================== ================ ===== ======= ==== ===== ==========
# 1 0x543FB322AE4330FF 16526.2542 62.0% 1208 13.6806 1.00 0.00 SELECT output_option
# 2 0xE78FEA32E3AA3221 0.8312 10.3% 6412 0.0001 1.00 0.00 SELECT poller_output poller_item
# 3 0x211901BF2E1C351E 0.6811 8.4% 6416 0.0001 1.00 0.00 SELECT poller_time
# 4 0xA766EE8F7AB39063 0.2805 3.5% 149 0.0019 1.00 0.00 SELECT wp_terms wp_term_taxonomy wp_term_relationships
# 5 0xA3EEB63EFBA42E9B 0.1999 2.5% 51 0.0039 1.00 0.00 SELECT UNION wp_pp_daily_summary wp_pp_hourly_summary
# 6 0x94350EA2AB8AAC34 0.1956 2.4% 89 0.0022 1.00 0.01 UPDATE wp_options
# MISC 0xMISC 0.8137 10.0% 3853 0.0002 NS 0.0 <147 ITEMS>

Databases - pt-query-digest
# Query 2: 0.26 QPS, 0.00x concurrency, ID 0x92F3B1B361FB0E5B at byte 14081299
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: | _^ |
# Time range: 2011-12-28 18:42:47 to 19:03:10
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 1 312
# Exec time 50 4s 5ms 25ms 13ms 20ms 4ms 12ms
# Lock time 3 32ms 43us 163us 103us 131us 19us 98us
# Rows sent 59 62.41k 203 231 204.82 202.40 3.99 202.40
# Rows examine 13 73.63k 238 296 241.67 246.02 10.15 234.30
# Rows affecte 0 0 0 0 0 0 0 0
# Rows read 59 62.41k 203 231 204.82 202.40 3.99 202.40
# Bytes sent 53 24.85M 46.52k 84.36k 81.56k 83.83k 7.31k 79.83k
# Merge passes 0 0 0 0 0 0 0 0
# Tmp tables 0 0 0 0 0 0 0 0
# Tmp disk tbl 0 0 0 0 0 0 0 0
# Tmp tbl size 0 0 0 0 0 0 0 0
# Query size 0 21.63k 71 71 71 71 0 71
# InnoDB:
# IO r bytes 0 0 0 0 0 0 0 0
# IO r ops 0 0 0 0 0 0 0 0
# IO r wait 0 0 0 0 0 0 0 0
# pages distin 40 11.77k 34 44 38.62 38.53 1.87 38.53
# queue wait 0 0 0 0 0 0 0 0
# rec lock wai 0 0 0 0 0 0 0 0
# Boolean:
# Full scan 100% yes, 0% no
# String:
# Databases wp_blog_one (264/84%), wp_blog_tw… (36/11%)... 1 more
# Hosts
# InnoDB trxID 86B40B (1/0%), 86B430 (1/0%), 86B44A (1/0%)... 309 more
# Last errno 0
# Users wp_blog_one (264/84%), wp_blog_two (36/11%)... 1 more
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms ################################################################
# 100ms
# 1s
# 10s+
# Tables
# SHOW TABLE STATUS FROM `wp_blog_one` LIKE 'wp_options'\G
# SHOW CREATE TABLE `wp_blog_one`.`wp_options`\G
# EXPLAIN /*!50100 PARTITIONS*/
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'\G

Caching storage - MySQL query cache
Use it
Don't rely on it
Good if you have :
lots of reads
few different queries
Bad if you have :
lots of insert/update/delete
lots of different queries

The problem with SQL query caching
select id, name from someTable where x = 5; ← uncached
select id, name from someTable where x = 5; ← cached
update someTable set name="Jim" where x = 10;
select id, name from someTable where x = 5; ← uncached
Imagine :
500 selects/sec, 10 updates/min → 10 cache purges per min
50 selects/sec, 10 updates/sec → 10 cache purges per sec

Caching storage - Database memory tables
Tables stored in memory
In MySQL : memory/heap table
↔ temporary table :
memory tables are persistent
temporary tables are session-specific
Faster than disk-based tables
Can be joined with disk-based tables
But :
default 16MByte limit
master-slave = trouble
if you don't need join → overhead of DB software
So : don't use it unless you need to join

Caching storage - Opcode caching
DO !

APC
De-facto standard
Will be in PHP core in 5.4 ? 5.5 ? 6.0 ?
PECL or packages
eAccelerator
Zend Accelerator
X-Cache
WinCacheForPhp

APC
De-facto standard until 5.4
PECL or packages
Zend Optimizer+
Built-in with PHP 5.5
eAccelerator
PHP : 42.18 req/sec
PHP + APC : 206.20 req/sec

Caching storage - Disk
Data with few updates : good
Caching SQL queries : preferably not
DON'T use NFS or other network file systems
high latency
possible problem for sessions : locking issues !

Caching storage - Memory disk (ramdisk)
Usually faster than physical disk
But : OS file caching makes difference minimal
(if you have enough memory)

Caching storage - Disk / ramdisk
Overhead : filesystem
Limited number of files per directory
→ Subdirectories
Local
5 Webservers → 5 local caches
How will you keep them synchronized ?
→ Don't say NFS or rsync !
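The "→ Subdirectories" point can be sketched as follows: hash the key and use the first characters of the hash as directory levels, so files spread evenly. This is only an illustration; the function names, the `.cache` suffix and the two-level layout are assumptions, not something the slides prescribe.

```php
<?php
// Map a cache key to a file path, spreading files over subdirectories
// so that no single directory ends up holding too many files.
function cacheFilePath(string $baseDir, string $key): string
{
    $hash = md5($key);
    // The first two pairs of hex characters become two directory levels (256 * 256 buckets).
    return sprintf('%s/%s/%s/%s.cache', rtrim($baseDir, '/'), substr($hash, 0, 2), substr($hash, 2, 2), $hash);
}

function diskCacheSet(string $baseDir, string $key, $value): void
{
    $path = cacheFilePath($baseDir, $key);
    if (!is_dir(dirname($path))) {
        @mkdir(dirname($path), 0777, true);
    }
    // Write to a temporary file first, then rename, so readers never see a half-written file.
    $tmp = $path . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, serialize($value));
    rename($tmp, $path);
}

function diskCacheGet(string $baseDir, string $key)
{
    $path = cacheFilePath($baseDir, $key);
    return is_file($path) ? unserialize(file_get_contents($path)) : false;
}
```

This does nothing about the synchronization question: each webserver still has its own local copy.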

Caching storage - APC variable cache
More than an opcode cache (PHP 5.5 → use APCu)
Store user data in memory
apc_add / apc_store to add/update
apc_fetch to retrieve
apc_delete
Fast → huge performance impact
Session support !
Downside :
local storage → hard to scale
restart Apache → cache = empty

Caching storage - Memcache(d)
Facebook, Twitter, YouTube, … → need we say more ?
Distributed memory caching system
Multiple machines ↔ 1 big memory-based hash-table
Key-value storage system
Keys - max. 250bytes
Values - max. 1Mbyte
Extremely fast... non-blocking, UDP (!)

Memcache - installation & running it
Installation
Distribution package
PECL
Windows : binaries
Running
No config-files
memcached -d -m <mem> -l <ip> -p <port>
ex. : memcached -d -m 2048 -l 172.16.1.91 -p 11211

Caching storage - Memcache - some notes
Not fault-tolerant
It's a cache !
Lose session data
Lose shopping cart data
…
Different libraries
Original : libmemcache
New : libmemcached (consistent hashing, UDP, binary protocol, …)
Firewall your Memcache port !

Memcache in code
<?php
$memcache = new Memcache();
$memcache->addServer('172.16.0.1', 11211);
$myData = $memcache->get('myKey');
if ($myData === false) {
$myData = GetMyDataFromDB();
// Put it in Memcache as 'myKey', without compression, with no expiration
$memcache->set('myKey', $myData, false, 0);
}
echo $myData;

Memcache in code
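<?php
// Same idea with the Memcached extension (libmemcached)
$memcache = new Memcached();
$memcache->addServer('172.16.0.1', 11211);
$myData = $memcache->get('myKey');
// Check the result code, so a stored false is not mistaken for a miss
if ($memcache->getResultCode() == Memcached::RES_NOTFOUND) {
    $myData = GetMyDataFromDB();
    // Put it in Memcache as 'myKey', with no expiration
    $memcache->set('myKey', $myData, 0);
}
echo $myData;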

Benchmark with Memcache
                     Single webserver      Proxy
                     Static     PHP        Static     PHP
Apache + PHP         3900       17.5       6700       17.5
Apache + PHP + MC    3900       55         6700       108

Where's the data ?
Memcache client decides (!)
2 hashing algorithms :
Traditional
Server failure → all data must be rehashed
Consistent
Server failure → 1/x of data must be rehashed (x = # of servers)
No replication !
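To see the difference between the two strategies, the sketch below counts how many keys change server when one of four servers fails. It is a simplified illustration, not the algorithm of any particular Memcache client: the server names, the md5-based `hashPoint()` helper and the 100 points per server are all assumptions.

```php
<?php
// First 32 bits of md5 as an integer (assumes 64-bit PHP).
function hashPoint(string $value): int
{
    return hexdec(substr(md5($value), 0, 8));
}

// Traditional (modulo) hashing: the chosen server depends on the number of servers.
function moduloServer(string $key, array $servers): string
{
    return $servers[hashPoint($key) % count($servers)];
}

// Consistent hashing: every server gets several points on a ring.
function buildRing(array $servers, int $pointsPerServer = 100): array
{
    $ring = [];
    foreach ($servers as $server) {
        for ($i = 0; $i < $pointsPerServer; $i++) {
            $ring[hashPoint($server . '#' . $i)] = $server;
        }
    }
    ksort($ring);
    return $ring;
}

// A key belongs to the first ring point at or after its own hash (wrapping around).
function ringServer(string $key, array $ring): string
{
    $hash = hashPoint($key);
    foreach ($ring as $point => $server) {
        if ($point >= $hash) {
            return $server;
        }
    }
    return reset($ring);
}

$before = ['mc1', 'mc2', 'mc3', 'mc4'];
$after  = ['mc1', 'mc2', 'mc3'];          // mc4 has failed
$ringBefore = buildRing($before);
$ringAfter  = buildRing($after);

$movedModulo = 0;
$movedRing = 0;
for ($i = 0; $i < 1000; $i++) {
    $key = 'article_' . $i;
    if (moduloServer($key, $before) !== moduloServer($key, $after)) {
        $movedModulo++;
    }
    if (ringServer($key, $ringBefore) !== ringServer($key, $ringAfter)) {
        $movedRing++;
    }
}
printf("modulo: %d of 1000 keys moved, consistent: %d of 1000 keys moved\n", $movedModulo, $movedRing);
```

With the ring, only keys that lived on the failed server should move; with modulo hashing most keys end up on a different server.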

Memcache slabs
(or why Memcache says it's full when it's not)
Multiple slabs of different sizes :
Slab 1 : 40 bytes
Slab 2 : 50 bytes (40 * 1.25)
Slab 3 : 63 bytes (50 * 1.25, rounded up) (and so on...)
Multiplier (1.25 by default) can be configured
Store a lot of objects of different sizes
→ Certain slabs : full
→ Other slabs : Mostly empty
→ Eviction of data !
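The slab sizes above can be reproduced with a few lines. This is a simplified model of the slide's numbers only: the starting size of 40 bytes comes from the slide, the function names are illustrative, and a real memcached also aligns chunk sizes, so its actual values differ.

```php
<?php
// Each slab class is the previous size times the growth factor, rounded up.
function slabSizes(int $firstSize, float $factor, int $maxSize): array
{
    $sizes = [];
    for ($size = $firstSize; $size <= $maxSize; $size = (int) ceil($size * $factor)) {
        $sizes[] = $size;
    }
    return $sizes;
}

// An item is stored in the smallest slab class that fits it.
function slabFor(int $itemSize, array $sizes): ?int
{
    foreach ($sizes as $size) {
        if ($itemSize <= $size) {
            return $size;
        }
    }
    return null; // too large for any slab class
}

$sizes = slabSizes(40, 1.25, 1024 * 1024);
echo implode(', ', array_slice($sizes, 0, 4)), "\n"; // 40, 50, 63, 79
echo slabFor(51, $sizes), "\n";                      // 63 : a 51-byte item leaves 12 bytes unused
```

If most of your objects fall into a few size classes, those slabs fill up and start evicting while memory assigned to other classes sits idle.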

Memcache - Is it working ?
Connect to it using telnet
"stats" command →
Use Cacti or other monitoring tools
STAT pid 2941
STAT uptime 10878
STAT time 1296074240
STAT version 1.4.5
STAT pointer_size 64
STAT rusage_user 20.089945
STAT rusage_system 58.499106
STAT curr_connections 16
STAT total_connections 276950
STAT connection_structures 96
STAT cmd_get 276931
STAT cmd_set 584148
STAT cmd_flush 0
STAT get_hits 211106
STAT get_misses 65825
STAT delete_misses 101
STAT delete_hits 276829
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 613193860
STAT bytes_written 553991373
STAT limit_maxbytes 268435456
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT threads 4
STAT conn_yields 0
STAT bytes 20418140
STAT curr_items 65826
STAT total_items 553856
STAT evictions 0
STAT reclaimed 0
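Two numbers worth watching in this output are the hit ratio (get_hits against get_misses) and evictions. A small sketch that parses the raw "stats" text and computes the ratio; the function names are illustrative, and the sample values are the ones shown above:

```php
<?php
// Parse "STAT name value" lines (as returned by the "stats" command) into an array.
function parseStats(string $raw): array
{
    $stats = [];
    foreach (preg_split('/\R/', trim($raw)) as $line) {
        if (preg_match('/^STAT (\S+) (.+)$/', trim($line), $m)) {
            $stats[$m[1]] = $m[2];
        }
    }
    return $stats;
}

// Share of get requests that were answered from the cache.
function hitRatio(array $stats): float
{
    $total = (int) $stats['get_hits'] + (int) $stats['get_misses'];
    return $total > 0 ? (int) $stats['get_hits'] / $total : 0.0;
}

$raw = "STAT get_hits 211106\nSTAT get_misses 65825\nSTAT evictions 0";
$stats = parseStats($raw);
printf("hit ratio: %.1f%%, evictions: %d\n", hitRatio($stats) * 100, $stats['evictions']);
// hit ratio: 76.2%, evictions: 0
```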

Memcache - deleting
<?php
$memcache->delete('myKey');

Memcache - caching a page
<?php
$output = $memcache->get('page_' . $page_id);
if ($output === false) {
ob_start();
GetMyPageInRegularWay($page_id);
$output = ob_get_contents();
ob_end_clean();
$memcache->set('page_' . $page_id, $output, false, 600); // Cache 10 mins
}
echo $output;

Memcache - tip
Page with multiple blocks ?
→ use Memcached::getMulti()
But : what if you get some hits and some misses ?
getMulti($array) → Hashing algorithm
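One way to handle "some hits and some misses" is to build only the missing blocks and store them in a single call. The sketch below assumes a cache object with Memcached-style `getMulti()` and `setMulti()` methods; `ArrayCache` is a hypothetical in-memory stand-in so the example runs without a server, and `getBlocks()` is an illustrative name.

```php
<?php
// In-memory stand-in offering the two Memcached-style methods used below (hypothetical).
class ArrayCache
{
    private $data = [];

    public function getMulti(array $keys)
    {
        return array_intersect_key($this->data, array_flip($keys));
    }

    public function setMulti(array $items, int $expiration = 0): bool
    {
        $this->data = $items + $this->data;
        return true;
    }
}

// Fetch all blocks in one call, then build and store only the ones that were missing.
function getBlocks($cache, array $keys, callable $buildBlock): array
{
    $found = $cache->getMulti($keys) ?: [];
    $built = [];
    foreach (array_diff($keys, array_keys($found)) as $missingKey) {
        $built[$missingKey] = $buildBlock($missingKey);
    }
    if ($built) {
        $cache->setMulti($built);
    }
    return $found + $built;
}

$cache = new ArrayCache();
$cache->setMulti(['block_menu' => '<ul>...</ul>']);
$blocks = getBlocks($cache, ['block_menu', 'block_news'], function ($key) {
    return "rendered $key"; // placeholder for the real rendering code
});
```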

Naming your keys
Key names must be unique
Prefix / namespace your keys !
Only letters, numbers and underscore
Why ? → Change caching layer
md5() is useful
→ BUT : harder to debug
Use clear names
Document your key names !
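A small helper keeps key names consistent and readable. This is a sketch under assumptions: the `cacheKey()` name, the double underscore between namespace and parts, and the md5 fallback for overlong keys are illustrative choices, not rules from the slides.

```php
<?php
// Build a namespaced cache key from readable parts, using only letters, numbers and underscores.
function cacheKey(string $namespace, string ...$parts): string
{
    $clean = function (string $part): string {
        return trim(preg_replace('/[^A-Za-z0-9]+/', '_', $part), '_');
    };
    $key = $clean($namespace) . '__' . implode('_', array_map($clean, $parts));
    // Memcache keys are limited to 250 bytes: fall back to a hash for very long keys.
    return strlen($key) <= 250 ? $key : $clean($namespace) . '__' . md5($key);
}

echo cacheKey('ArticleDetails', 'Toshiba 32C100U 32-Inch'), "\n";
// ArticleDetails__Toshiba_32C100U_32_Inch
```

Only the overlong keys become unreadable hashes; the namespace prefix stays visible for debugging.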

Updating data
LCD_Popular_Product_List

Adding/updating data
$memcache->delete('ArticleDetails__Toshiba_32C100U_32_Inch');
$memcache->delete('LCD_Popular_Product_List');

Adding/updating data - Why it crashed

Memcache code ?
DB
Visitor interface Admin interface
Memcache code

Standard caching code
public function getArticle($id)
{
    $cache = Zend_Registry::get('Zend_Cache');
    if (!$articleList = $cache->load('article_' . $id)) {
        $select = $this->db->select()
            ->from('article', array('id', 'title', 'body', 'created'))
            ->join('user', 'user.id = article.user_id', array('username'))
            ->where('article.id = ?', $id);
        // The id is already bound by where(), so no extra bind parameter is passed
        $articleList = $this->db->fetchRow($select);
        $cache->save($articleList, 'article_' . $id);
    }
    return $articleList;
}

Standard caching code
// Always reads from the database
public function getArticleUncached($id)
{
    $select = $this->db->select()
        ->from('article', array('id', 'title', 'body', 'created'))
        ->join('user', 'user.id = article.user_id', array('username'))
        ->where('article.id = ?', $id);
    // The id is already bound by where(), so no extra bind parameter is passed
    return $this->db->fetchRow($select);
}
// Reads from cache, falls back to the database on a miss
public function getArticle($id)
{
    $cache = Zend_Registry::get('Zend_Cache');
    if (!$articleList = $cache->load('article_' . $id)) {
        $articleList = $this->getArticleUncached($id);
        $cache->save($articleList, 'article_' . $id);
    }
    return $articleList;
}
// Pushes fresh data into the cache (call this after an update, instead of deleting)
public function updateArticleCache($id)
{
    $cache = Zend_Registry::get('Zend_Cache');
    $cache->save(
        $this->getArticleUncached($id),
        'article_' . $id
    );
}

Cache warmup scripts
Used to fill your cache when it's empty
Run it before starting Webserver !
2 ways :
Visit all URLs
Error-prone
Hard to maintain
Call all cache-updating methods
Make sure you have a warmup script !
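A warmup script along the "call all cache-updating methods" route can be very small. The sketch below is an assumption about how one might structure it: `warmUp()` and the task names are hypothetical, and the closures stand in for real calls to your cache-updating methods.

```php
<?php
// Run every cache-updating task once; collect failures instead of stopping at the first one.
function warmUp(array $tasks): array
{
    $failed = [];
    foreach ($tasks as $name => $task) {
        try {
            $task();
        } catch (Exception $e) {
            $failed[$name] = $e->getMessage();
        }
    }
    return $failed;
}

$cache = []; // stand-in for the real cache
$tasks = [
    'article_1'    => function () use (&$cache) { $cache['article_1'] = 'body of article 1'; },
    'popular_list' => function () { throw new Exception('DB unavailable'); },
];
$failed = warmUp($tasks);
foreach ($failed as $name => $reason) {
    echo "Warmup failed for $name: $reason\n";
}
```

Reporting the failures matters: a webserver started on a half-filled cache sends the missing items straight to the database.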

Cache stampeding - what about locking ?
Seems like a nice idea, but...
While lock in place
What if the process that created the lock fails ?

Quick word about expiration
General rule : don't let things expire
Exception to the rule : things that have an end date (calendar items)

So...
DON'T DELETE FROM CACHE
&
DON'T EXPIRE FROM CACHE
(unless you know you'll never store it again)

Quick-tip
Start small → disk or APC
Move to Memcached/Redis/... later
But : is your code ready ?
→ Use a component like Zend_Cache to switch easily !

Time for...
a break (15 min)
After the break :
Byebye Apache
Reverse proxying
The importance of frontend
...

Nginx
Web server
Reverse proxy
Lightweight, fast
12.89% of all Websites

Nginx
No threads, event-driven
Uses epoll / kqueue
Low memory footprint
10000 active connections = normal

Nginx - a true alternative to Apache ?
Not all Apache modules
mod_auth_*
mod_dav*
…
Basic modules are available
Some 3rd party modules (need recompilation !)

Nginx - Installation
Packages
Win32 binaries
→ Not for production !
Build from source (./configure; make; make install)

Nginx - Configuration
server {
listen 80;
server_name www.domain.ext *.domain.ext;
index index.html;
root /home/domain.ext/www;
}
server {
listen 80;
server_name photo.domain.ext;
index index.html;
root /home/domain.ext/photo;
}

Nginx - phase 1
Move Apache to a different port (8080)
Put Nginx at port 80
Nginx serves all statics (images, css, js, …)
Forward dynamic requests to Apache

Nginx for static files only
server {
listen 80;
server_name www.domain.ext;
location ~* ^.*\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|pdf|ppt|txt|tar|rtf|js)$ {
expires 30d;
root /home/www.domain.ext;
}
location / {
proxy_pass http://www.domain.ext:8080;
proxy_pass_header Set-Cookie;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}

Nginx for PHP ?
Bottleneck = PHP ? Keep it in Apache
Bottleneck = memory ? Go for it !
LANMMP to... LNMPP
(ok, this is getting ridiculous)

Nginx with PHP-FPM
Since PHP 5.3.3
Runs on port 9000
Nginx connects using fastcgi method
location / {
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
include fastcgi_params;
fastcgi_param SCRIPT_NAME $fastcgi_script_name;
fastcgi_param SCRIPT_FILENAME /home/www.domain.ext/$fastcgi_script_name;
fastcgi_param SERVER_NAME $host;
fastcgi_intercept_errors on;
}

Nginx + PHP-FPM features
Graceful upgrade
Spawn new processes under high load
Chroot
Slow request log !
fastcgi_finish_request() → offline processing
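A sketch of the offline-processing idea: send the response, let PHP-FPM close the connection, then do the slow work. The helper name and the closures are illustrative; `fastcgi_finish_request()` only exists under PHP-FPM, so the sketch checks for it and otherwise just runs the work inline.

```php
<?php
// Send the response first, then continue with work the visitor does not need to wait for.
function respondThenProcess(callable $sendResponse, callable $slowWork): void
{
    $sendResponse();
    // Only available under PHP-FPM; elsewhere (e.g. CLI) the work simply runs inline.
    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request();
    }
    $slowWork();
}

$log = [];
respondThenProcess(
    function () use (&$log) { echo "Thanks, your message was sent.\n"; $log[] = 'response'; },
    function () use (&$log) { $log[] = 'offline work'; } // e.g. sending mails, updating caches
);
```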

Nginx + PHP-FPM - performance ?
                          Single webserver      Proxy
                          Static     PHP        Static     PHP
Apache + PHP              3900       17.5       6700       17.5
Apache + PHP + MC         3900       55         6700       108
Nginx + PHP-FPM + MC      11700      57         11200      112
Apache (tuned) + PHP/MC   10600      55         11400      108
Limit : single-threaded Apachebench

Varnish
Not just a load balancer
Reverse proxy cache / http accelerator / …
Caches (parts of) pages in memory
Careful :
uses threads (like Apache)
Nginx usually scales better (but doesn't have VCL)

Varnish - Installation & configuration
Installation
Packages
Source : ./configure && make && make install
Configuration
/etc/default/varnish
/etc/varnish/*.vcl

Varnish - backends + load balancing
backend server1 {
.host = "192.168.0.10";
}
backend server2 {
.host = "192.168.0.11";
}
director example_director round-robin {
{
.backend = server1;
}
{
.backend = server2;
}
}

Varnish - backends + load balancing
backend server1 {
.host = "192.168.0.10";
.probe = {
.url = "/";
.interval = 5s;
.timeout = 1s;
.window = 5;
.threshold = 3;
}
}

Varnish - VCL
Varnish Configuration Language
DSL (Domain Specific Language)
→ compiled to C
Hooks into each request
Defines :
Backends (web servers)
ACLs
Load balancing strategy
Can be reloaded while running

Varnish - whatever you want
Real-time statistics (varnishtop, varnishhist, ...)
ESI

Varnish - ESI
Article content page
Header (TTL : 60 min) : /top
Navigation (TTL : 60 min) : /nav
Latest news (TTL : 2 min) : /news
Article content (TTL : 15 min) : /article/732

Going to /page/id/732
<esi:include src="/top"/>
<esi:include src="/nav"/>
<esi:include src="/news"/>
<esi:include src="/article/732"/>

Varnish - ESI
Perfect for caching pages
Article content page
<esi:include src="/top"/>
<esi:include src="/nav"/>
<esi:include src="/news"/>
<esi:include src="/article/732"/>
In your Varnish config :
sub vcl_fetch {
if (req.url == "/news") {
esi; /* Do ESI processing */
set obj.ttl = 2m;
} elseif (req.url == "/nav") {
esi;
set obj.ttl = 1m;
} elseif ….
….
}

Varnish with ESI - hold on tight !
                          Single webserver      Proxy
                          Static     PHP        Static     PHP
Apache + PHP              3900       17.5       6700       17.5
Apache + PHP + MC         3900       55         6700       108
Nginx + PHP-FPM + MC      11700      57         11200      112
Varnish                   -          -          11200      4200

Varnish - what can/can't be cached ?
Can :
Static pages
Images, js, css
Pages or parts of pages that don't change often (ESI)
Can't :
POST requests
Very large files (it's not a file server !)
Requests with Set-Cookie
User-specific content

ESI → no caching on user-specific content ?
Logged in as : Wim Godden
5 messages
TTL = 5min    TTL = 1h
TTL = 0s ?

Coming soon...
Based on Nginx
Reduces load by 50 – 95%
Requires code changes !
Well-built project → few changes
Effect on webservers and database servers

ESI on Nginx
Logged in as : Wim Godden
5 messages
News    Menu

SCL on Nginx + Memcached
<scl:include key="news" src="/news" ttl="5m" />
<scl:include
key="menu"
src="/menu"
ttl="1h" />
<scl:include key="top" src="/top" session="true" ttl="1h" />

Requesting /page (1st time)
[Diagram : Nginx, Shared memory, /page ; steps 1-4]

Requesting /page ESI subrequests (1st time)
[Diagram : Nginx ; subrequests /menu, /news, /top (in SCL session) ; steps 1-3]

Requesting /page (next time)
[Diagram : Nginx, Shared memory ; /page, /menu, /news, /top (in SCL session) ; steps 1-2]

New message is sent...
POST /send
DB
insert into...
set(...)
top (in SCL session)

Advantages
No repeated GET hits to webserver anymore !
At login : POST → warm up the cache !
No repeated hits for user-specific content
Not even for non-specific content

First release : ESI
Part of the ESI 1.0 spec
Only relevant features implemented
Extension for dynamic session support
But : unavailable for copyright reasons

Rebuilt from scratch : SCL
Session-specific Caching Language
Ideas for a better name ?
Language details :
Control structures : if/else, switch/case, foreach
Variable handling
Strings : concatenation, substring, ...

SCL code samples
You are logged in as : <scl:session_var("person_name") />
You are logged in as : <@s("person_name") />

SCL code samples
<scl:switch var="session_var('isAdmin')">
<scl:case value="1">
<scl:include key="admin-buttons" src="/admin-buttons.php" />
</scl:case>
<scl:default>
<div id="just-a-user">
<scl:include key="user-buttons" src="/user-buttons.php" />
</div>
</scl:default>
</scl:switch>

Figures
2nd customer :
No. of web servers : 72 → 8
No. of db servers : 15 → 4
Total : 87 → 12 (86% reduction !)
Last customer :
No. of total servers : +/- 1350
Expected reduction : 1350 → 300
Expected savings : €1.6 Million per year

A real example : vBulletin
[Chart : DB server load, web server load and max requests/sec (1 = 282), compared for :]
Standard install
With Memcached
Nginx + SCL + memcached

Availability
Good news :
It will become Open Source
It's solid : ESI version stable at 4 customers
Bad news :
First customer holds copyrights
Total rebuild
→ Open Source release
No current projects, so spare time project
Beta : Dec 2013
Final : Q1-Q2 2014 (on Github !)

Apache - tuning tips
Disable unused modules → fixes 10% of performance issues
Set AllowOverride to None. Enable only where needed !
Disable SymLinksIfOwnerMatch. Enable only where needed !
MinSpareServers, MaxSpareServers, StartServers, MaxClients,
MPM selection → a whole session of its own ;-)
Don't mod_proxy → use Nginx or Varnish !
High load on an SSL-site ? → put SSL on a reverse proxy

PHP speed - some tips
Upgrade PHP - every minor release has 5-15% speed gain !
Use an opcode cache (Zend O+, APC, eAccelerator, XCache)
Profile your code
XHProf
Xdebug
But : turn off profilers on acceptance/production platforms !

PHP speed - some tips
Most performance issues are in DB queries → look there first !
Log PHP errors and review those logs !
Shorter code != faster code → keep your code readable !
Hardware cost < Manpower cost
→ 1 more server < 30 mandays of labor
Keep micro-optimizations in code = last thing on list

DB speed - some tips
Avoid dynamic functions
Ex. : select id from calendar where startDate > curdate()
Better : select id from calendar where startDate > "2013-05-14"
Use same types for joins
i.e. don't join decimal with int
RAND() is evil !
count(*) is evil in InnoDB without a where clause !
Persistent connect is sort-of evil
Index, index, index !
→ But only on fields that are used in where, order by, group by !

Caching & Tuning @ frontend
http://www.websiteoptimization.com/speed/tweak/average-web-page/

Caching in the browser
HTTP 304 (Not modified)
Expires/Cache-Control header
2 notes :
Don't use POST if you want to cache
Don't cache user-specific pages in browser (security !)

HTTP 304
Browser Server
No header
Last-Modified: Fri, 28 Jan 2011 08:31:01 GMT
If-Modified-Since: Fri, 28 Jan 2011 08:31:01 GMT
200 OK / 304 Not Modified
First request
Next requests

HTTP 304 with ETag
Browser Server
No header
Etag: 8a53321-4b-43f0b6dd972c0
If-None-Match: 8a53321-4b-43f0b6dd972c0
200 OK / 304 Not Modified
First request
Next requests
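On the backend, both 304 variants come down to comparing what the browser sent with what you have. A minimal sketch, assuming the request headers are available as an array; the function name is illustrative, and the ETag is written with the surrounding quotes that HTTP uses on the wire:

```php
<?php
// Decide between 200 and 304 for a resource, given the request headers the browser sent.
function conditionalStatus(string $etag, int $lastModified, array $requestHeaders): int
{
    if (isset($requestHeaders['If-None-Match'])) {
        // When the browser sends an ETag, compare that first.
        return trim($requestHeaders['If-None-Match']) === $etag ? 304 : 200;
    }
    if (isset($requestHeaders['If-Modified-Since'])) {
        $since = strtotime($requestHeaders['If-Modified-Since']);
        return ($since !== false && $since >= $lastModified) ? 304 : 200;
    }
    return 200; // first request: no conditional headers
}

$etag = '"8a53321-4b-43f0b6dd972c0"';
$lastModified = gmmktime(8, 31, 1, 1, 28, 2011); // Fri, 28 Jan 2011 08:31:01 GMT

echo conditionalStatus($etag, $lastModified, []), "\n";                         // 200
echo conditionalStatus($etag, $lastModified, ['If-None-Match' => $etag]), "\n"; // 304
```

In a real page you would pass the result to `http_response_code()` and skip rendering the body on a 304.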

Expires/Cache-control header
Cache-Control
HTTP/1.1
Seconds to expiry
Used by browsers
Expires
HTTP/1.0
Date to expire on
Used by old proxies
Requires clock to be accurate !
Browser ↔ Server
First request : no header → Expires: Tue, 29 Nov 2011 12:11:08 GMT / Cache-Control: max-age=86400
Next requests : no requests until item expires

Pragma: no-cache = evil
"Pragma: no-cache" doesn't make it uncacheable
Don't want caching on a page ?
HTTP/1.0 : "Expires: Fri, 30 Oct 1998 00:00:00 GMT" (in the past)
HTTP/1.1 : "Cache-Control: no-store"

Frontend tuning
1. You optimize backend
2. Frontend engineer messes up → havoc on backend
3. Don't forget : frontend sends requests to backend !
SO...
Care about frontend
Test frontend
Check what requests frontend sends to backend

Tuning frontend
Minimize requests
Combine CSS/JavaScript files
Use inline images in CSS/XHTML (not supported on all browsers yet)

Frontend tuning - inline CSS/XHTML images
#navbar span {
width: 31px;
height: 31px;
display: inline;
float: left;
margin-right: 4px;
}
.home { background-image:
url(data:image/gif;base64,R0lGODlhHwAfAPcAAAAAAIxKAKVjCLW1tb29tcbGvc7OxtZ7ANbWztbW1tbe1t7e1uelMefn1ufn3ufn5+fv3u+MAO/v5+/v7/fGCPf35/f37//nY/......Uaz0TgXnX2umVFyGakJUUAAADs=); margin-left: 4px; }
<img border=0 src="data:image/gif;base64,R0lGODlhHwAfAPcAAAAAAIxKAKVjCLW1tb29tcbGvc7OxtZ7ANbWztbW1tbe1t7e1uelMefn1ufn3ufn5+fv3u+MAO/v5+/v7/fGCPf35/f37//nY/......Uaz0TgXnX2umVFyGakJUUAAADs=">

Tuning frontend
Minimize requests
Use CSS Sprites

Tuning content - CSS sprites
11 images → 11 HTTP requests, 24KByte
1 image → 1 HTTP request, 14KByte

Tuning frontend
Minimize requests
Use CSS Sprites (horizontally if possible)
Put CSS at top
Put JavaScript at bottom
Max. no. of connections
Especially if JavaScript does Ajax (advertising-scripts, …) !
Avoid iFrames
Again : max no. of connections
Don't scale images in HTML
Have a favicon.ico (don't 404 it !)
→ see my blog

Tuning frontend
Don't use inline CSS/JavaScript
CSS/JavaScript need to be external files (minified, merged)
Why ? → Cacheable by browser / reverse proxy
Use GET for Ajax retrieval requests (and cache them !)
Optimize images (average 50-60% !)
Split requests across subdomains
Put statics on a separate subdomain (without cookies !)
www.phpbenelux.eu : max. 2 requests
www.phpbenelux.eu + images.phpbenelux.eu : max. 2 requests each

Tuning miscellaneous
Avoid DNS lookups
Frontend : don't use too many subdomains (2 = ideal)
Backend :
Turn off DNS resolution in Apache : HostnameLookups Off
If your app uses external data
Run a local DNS cache (timeout danger !)
Make sure you can trust DNS servers (preferably run your own)
Compress non-binary content (GZIP)
mod_deflate in Apache
HttpGzipModule in Nginx (HttpGzipStaticModule for pre-zipped statics !)
No native support in Varnish

What else can kill your site ?
Redirect loops
Multiple requests
More load on Webserver
More PHP to process
Additional latency for visitor
Try to avoid redirects anyway
→ In ZF : use $this->_forward instead of $this->_redirect
Watch your logs, but equally important...
Watch the logging process →
Logging = disk I/O → can kill your server !

Above all else... be prepared !
Have a monitoring system
Use a cache abstraction layer (disk → Memcache)
Don't install for the worst → prepare for the worst
Have a test-setup
Have fallbacks
→ Turn off non-critical functionality

So...
Cache
But : never delete, always push !
Have a warmup script
Monitor your cache
Have an abstraction layer
Apache = fine, Nginx = better
Static pages ? Use Varnish
Tune your frontend → impact on backend !

Contact
Twitter @wimgtr
Web http://techblog.wimgodden.be
Slides http://www.slideshare.net/wimg
E-mail wim.godden@cu.be
Please...
Rate my talk : http://joind.in/9044

Thanks !
