Caching and tuning fun for high scalability @ FOSDEM 2012

Caching has been a 'hot' topic for a few years. But caching takes more than merely taking data and putting it in a cache: the right caching techniques can improve performance and reduce load significantly. We'll also look at some major pitfalls, showing that caching the wrong way can bring down your site. If you're looking for a clear explanation of various caching techniques and tools like Memcached, Nginx and Varnish, as well as ways to deploy them efficiently, this talk is for you.

Published in: Technology

Caching and tuning fun for high scalability
Wim Godden, Cu.be Solutions

Who am I ?
- Wim Godden (@wimgtr)
- Owner of Cu.be Solutions (http://cu.be)
- Open Source developer since 1997
- Developer of OpenX
- Zend Certified Engineer
- Zend Framework Certified Engineer
- MySQL Certified Developer

Who are you ?
- Developers ?
- System/network engineers ?
- Managers ?
- Caching experience ?

Goals of this tutorial
- Everything about caching and tuning
- A few techniques
  - How-to
  - How-NOT-to
- -> Increase reliability, performance and scalability
- 5 visitors/day -> 5 million visitors/day
- (Don't expect a miracle cure !)

LAMP

Architecture

Our base benchmark
- Apachebench = useful enough
- Result ?

                 Single webserver      Proxy
                 Static    PHP         Static    PHP
Apache + PHP     3900      17.5        6700      17.5

Limit : CPU, network or disk
Limit : database

Caching

What is caching ?

    select * from article join user on article.user_id = user.id order by created desc limit 10

Theory of caching
- if ($data == false) -> DB

Caching techniques
- #1 : Store entire pages
- #2 : Store part of a page (block)
- #3 : Store data retrieval (SQL ?)
- #4 : Store complex processing result
- #? : Your call !

When you have data, think :
- Creating time ?
- Modification frequency ?
- Retrieval frequency ?

How to find cacheable data
- New projects : start from 'cache everything'
- Existing projects :
  - Look at MySQL slow query log
  - Make a complete query log (don't forget to turn it off !)
  - Check page loading times

Caching storage - Disk
- Data with few updates : good
- Caching SQL queries : preferably not
- DON'T use NFS or other network file systems
  - especially for sessions
  - locking issues !
  - high latency

Caching storage - Disk / ramdisk
- Local
  - 5 Webservers -> 5 local caches
  - -> Hard to scale
  - How will you keep them synchronized ?
    - -> Don't say NFS or rsync !

Caching storage - Memcache(d)
- Facebook, Twitter, YouTube, … -> need we say more ?
- Distributed memory caching system
- Multiple machines ↔ 1 big memory-based hash-table
- Key-value storage system
  - Keys - max. 250 bytes
  - Values - max. 1 Mbyte
- Extremely fast... non-blocking, UDP (!)

Memcache - where to install

Memcache - installation & running it
- Installation
  - Distribution package
  - PECL
  - Windows : binaries
- Running
  - No config-files
  - memcached -d -m <mem> -l <ip> -p <port>
  - ex. : memcached -d -m 2048 -l 172.16.1.91 -p 11211

Caching storage - Memcache - some notes
- Not fault-tolerant
  - It's a cache !
  - Lose session data
  - Lose shopping cart data
  - …
- Firewall your Memcache port !

Memcache in code

    <?php
    $memcache = new Memcache();
    $memcache->addServer('172.16.0.1', 11211);
    $memcache->addServer('172.16.0.2', 11211);

    $myData = $memcache->get('myKey');
    if ($myData === false) {
        $myData = GetMyDataFromDB();
        // Put it in Memcache as 'myKey', without compression, with no expiration
        $memcache->set('myKey', $myData, false, 0);
    }
    echo $myData;

Where's the data ?
- Memcache client decides (!)
- 2 hashing algorithms :
  - Traditional
    - Server failure -> all data must be rehashed
  - Consistent
    - Server failure -> 1/x of data must be rehashed (x = # of servers)

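The difference between the two hashing approaches can be sketched in plain PHP. This is only an illustration, not how any particular client implements it: the function names, the server addresses, the use of `crc32()` and the 100 points per server are all assumptions made for the example (it also assumes 64-bit PHP, where `crc32()` is never negative). It counts how many of 10000 keys end up on a different server when one of four servers disappears.

```php
<?php
// Hypothetical helpers for illustration; real clients do this internally.

// Traditional approach: hash modulo the number of servers.
function moduloServer(string $key, array $servers): string
{
    return $servers[crc32($key) % count($servers)];
}

// Consistent approach: every server owns many points on a ring.
function buildRing(array $servers, int $pointsPerServer = 100): array
{
    $ring = [];
    foreach ($servers as $server) {
        for ($i = 0; $i < $pointsPerServer; $i++) {
            $ring[crc32("$server#$i")] = $server;
        }
    }
    ksort($ring); // order the points around the ring
    return $ring;
}

// A key belongs to the first ring point at or after its own hash.
function ringServer(string $key, array $ring): string
{
    $hash = crc32($key);
    foreach ($ring as $point => $server) {
        if ($point >= $hash) {
            return $server;
        }
    }
    return reset($ring); // past the last point: wrap around to the first
}

$before = ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4'];
$after  = ['10.0.0.1', '10.0.0.2', '10.0.0.3']; // 10.0.0.4 has failed
$ringBefore = buildRing($before);
$ringAfter  = buildRing($after);

$movedModulo = 0;
$movedRing = 0;
for ($i = 0; $i < 10000; $i++) {
    $key = "key$i";
    if (moduloServer($key, $before) !== moduloServer($key, $after)) {
        $movedModulo++;
    }
    if (ringServer($key, $ringBefore) !== ringServer($key, $ringAfter)) {
        $movedRing++;
    }
}
echo "modulo: $movedModulo keys moved, ring: $movedRing keys moved\n";
```

With the ring, only keys that lived on the failed server can move; keys on the surviving servers stay where they were. With the PHP Memcached extension, consistent distribution is typically enabled through `Memcached::OPT_DISTRIBUTION` set to `Memcached::DISTRIBUTION_CONSISTENT` rather than written by hand.
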
Benchmark with Memcache

                       Single webserver      Proxy
                       Static    PHP         Static    PHP
Apache + PHP           3900      17.5        6700      17.5
Apache + PHP + MC      3900      55          6700      108

Memcache slabs
(or why Memcache says it's full when it's not)
- Multiple slabs of different sizes :
  - Slab 1 : 400 bytes
  - Slab 2 : 480 bytes (400 * 1.2)
  - Slab 3 : 576 bytes (480 * 1.2) (and so on...)
- Multiplier (1.2 here) can be configured
- Store a lot of very large objects
  - -> Large slabs : full
  - -> Rest : free
  - -> Eviction of data !

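The slab arithmetic above can be worked through in a few lines of PHP. This is a simplified sketch using the slide's numbers (first size 400 bytes, multiplier 1.2); the function names are made up for the example, and a real memcached server also aligns sizes, so its actual values will differ somewhat.

```php
<?php
// Item sizes per slab: each one is the previous size times the growth factor.
function slabSizes(int $firstSize, float $factor, int $count): array
{
    $sizes = [];
    $size = $firstSize;
    for ($i = 0; $i < $count; $i++) {
        $sizes[] = (int) round($size);
        $size *= $factor;
    }
    return $sizes;
}

// An item goes into the smallest slab whose size fits it.
function slabFor(int $itemBytes, array $sizes): ?int
{
    foreach ($sizes as $index => $size) {
        if ($itemBytes <= $size) {
            return $index + 1; // slabs are numbered from 1, as on the slide
        }
    }
    return null; // larger than the largest slab listed here
}

$sizes = slabSizes(400, 1.2, 5);
echo implode(', ', $sizes), "\n";   // 400, 480, 576, 691, 829
echo slabFor(450, $sizes), "\n";    // 2 (a 450-byte item occupies a 480-byte slot)
```

Because memory is assigned per slab, a workload made up mostly of large objects can fill the large slabs and trigger evictions there while the small slabs still have free space. On the server, the multiplier is set with memcached's `-f` option.
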
Memcache - Is it working ?
- Connect to it using telnet
  - "stats" command ->
  - Use Cacti or other monitoring tools

    STAT pid 2941
    STAT uptime 10878
    STAT time 1296074240
    STAT version 1.4.5
    STAT pointer_size 64
    STAT rusage_user 20.089945
    STAT rusage_system 58.499106
    STAT curr_connections 16
    STAT total_connections 276950
    STAT connection_structures 96
    STAT cmd_get 276931
    STAT cmd_set 584148
    STAT cmd_flush 0
    STAT get_hits 211106
    STAT get_misses 65825
    STAT delete_misses 101
    STAT delete_hits 276829
    STAT incr_misses 0
    STAT incr_hits 0
    STAT decr_misses 0
    STAT decr_hits 0
    STAT cas_misses 0
    STAT cas_hits 0
    STAT cas_badval 0
    STAT auth_cmds 0
    STAT auth_errors 0
    STAT bytes_read 613193860
    STAT bytes_written 553991373
    STAT limit_maxbytes 268435456
    STAT accepting_conns 1
    STAT listen_disabled_num 0
    STAT threads 4
    STAT conn_yields 0
    STAT bytes 20418140
    STAT curr_items 65826
    STAT total_items 553856
    STAT evictions 0
    STAT reclaimed 0

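Two numbers in that output are worth watching: the hit ratio (get_hits against get_misses) and evictions. A small sketch of how they could be derived from the raw "stats" text follows; `parseStats()` and `hitRatio()` are hypothetical helper names, and the sample input reuses three lines from the output above.

```php
<?php
// Turn "STAT name value" lines into an associative array.
function parseStats(string $raw): array
{
    $stats = [];
    foreach (preg_split('/\R/', trim($raw)) as $line) {
        if (preg_match('/^STAT (\S+) (.+)$/', trim($line), $m)) {
            $stats[$m[1]] = $m[2];
        }
    }
    return $stats;
}

// Share of get requests that were answered from the cache (0.0 to 1.0).
function hitRatio(array $stats): float
{
    $hits = (int) $stats['get_hits'];
    $total = $hits + (int) $stats['get_misses'];
    return $total === 0 ? 0.0 : $hits / $total;
}

$raw = "STAT get_hits 211106\nSTAT get_misses 65825\nSTAT evictions 0\n";
$stats = parseStats($raw);
printf("hit ratio: %.1f%%, evictions: %d\n", hitRatio($stats) * 100, $stats['evictions']);
// prints "hit ratio: 76.2%, evictions: 0"
```

From PHP, the same counters are available without telnet through the client's stats call (for example `Memcached::getStats()`), which is one way to feed them into a monitoring tool.
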
Memcache - backing up

Memcache - tip
- Page with multiple blocks ? -> use Memcached::getMulti()
- But : what if you get some hits and some misses ?

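One possible answer to the hits-and-misses question is to rebuild only the blocks that came back missing. The sketch below assumes each block has a builder callable; `getBlocks()`, the key names and the `FakeCache` class are hypothetical. `FakeCache` is an in-memory stand-in that mimics only `getMulti()` and `set()`, so the example runs without a memcached server.

```php
<?php
// In-memory stand-in for illustration; not the real Memcached client.
class FakeCache
{
    private array $data = [];

    // Like Memcached::getMulti(): only the keys that were found are returned.
    public function getMulti(array $keys): array
    {
        return array_intersect_key($this->data, array_flip($keys));
    }

    public function set(string $key, $value, int $ttl = 0): bool
    {
        $this->data[$key] = $value;
        return true;
    }
}

// Fetch all blocks in one round trip, then rebuild only the ones that missed.
// $builders maps cache key => callable that regenerates that block.
function getBlocks($cache, array $builders, int $ttl): array
{
    $found = $cache->getMulti(array_keys($builders)) ?: []; // false on failure
    foreach ($builders as $key => $build) {
        if (!array_key_exists($key, $found)) {
            $found[$key] = $build();
            $cache->set($key, $found[$key], $ttl);
        }
    }
    return $found;
}

$cache = new FakeCache();
$cache->set('block_nav', '<nav>...</nav>');
$blocks = getBlocks($cache, [
    'block_nav'  => fn() => '<nav>rebuilt</nav>',   // hit: builder is not called
    'block_news' => fn() => '<ul>latest news</ul>', // miss: builder is called
], 300);
```

With a real client, a `Memcached` instance could be passed in place of `FakeCache`, since the sketch only relies on `getMulti()` and `set()`.
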
Updating data
- LCD_Popular_Product_List

Adding/updating data

    $memcache->delete('LCD_Popular_Product_List');

Adding/updating data - Why it crashed

Cache stampeding

Memcache code ?
- DB

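The transcript does not include the diagrams for these slides, so the following is only a sketch of one widely used way to avoid the problem they describe: instead of deleting the key (after which every concurrent request misses and goes to the DB), the process that changes the data overwrites the entry. `FakeCache`, `queryPopularLcdProducts()`, `invalidateList()` and `refreshList()` are hypothetical names; `FakeCache` mimics three Memcached methods in memory so the example runs without a server.

```php
<?php
// In-memory stand-in for illustration; not the real Memcached client.
class FakeCache
{
    private array $data = [];

    public function get(string $key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set(string $key, $value, int $ttl = 0): bool
    {
        $this->data[$key] = $value;
        return true;
    }

    public function delete(string $key): bool
    {
        unset($this->data[$key]);
        return true;
    }
}

// Hypothetical expensive query behind the cached list.
function queryPopularLcdProducts(): array
{
    return ['LCD A', 'LCD B', 'LCD C'];
}

// Pattern from the slide: after this call every reader misses at once.
function invalidateList($cache): void
{
    $cache->delete('LCD_Popular_Product_List');
}

// Alternative: rebuild once and overwrite, so readers see the old value
// and then the new one, never a miss.
function refreshList($cache): void
{
    $cache->set('LCD_Popular_Product_List', queryPopularLcdProducts(), 0);
}

$cache = new FakeCache();
$cache->set('LCD_Popular_Product_List', ['old list']);
refreshList($cache);
```

The design choice is that the writer pays for the expensive query once, rather than every visitor paying for it simultaneously.
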
Cache warmup scripts
- Used to fill your cache when it's empty
- Run it before starting Webserver !
- 2 ways :
  - Visit all URLs
    - Error-prone
    - Hard to maintain
  - Call all cache-updating methods
- Make sure you have a warmup script !

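The second approach, calling all cache-updating methods, might look roughly like this. Everything named here is an assumption for the example: `cacheUpdaters()` stands for whatever registry of update methods a project has, and the cache is a plain array so the script runs on its own.

```php
<?php
// Hypothetical cache-updating methods, keyed by the cache key they fill.
function cacheUpdaters(): array
{
    return [
        'LCD_Popular_Product_List' => fn() => ['LCD A', 'LCD B'],
        'latest_articles'          => fn() => ['Article 1', 'Article 2'],
    ];
}

// Run every updater once and pass the result to $store($key, $value).
// Returns the keys that could not be stored, so the script can fail loudly.
function warmUp(array $updaters, callable $store): array
{
    $failed = [];
    foreach ($updaters as $key => $update) {
        if (!$store($key, $update())) {
            $failed[] = $key;
        }
    }
    return $failed;
}

// Here the "cache" is a plain array. With a real client, $store could be
// something like fn($k, $v) => $memcache->set($k, $v).
$cache = [];
$failed = warmUp(cacheUpdaters(), function (string $key, $value) use (&$cache): bool {
    $cache[$key] = $value;
    return true;
});
echo count($cache), " keys warmed, ", count($failed), " failed\n";
// prints "2 keys warmed, 0 failed"
```

Reusing the application's own update methods keeps the warmup script in step with the code, which is the maintenance advantage over visiting URLs.
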
Cache stampeding - what about locking ?
Seems like a nice idea, but...
- While lock in place
- What if the process that created the lock fails ?

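For readers who still want to experiment with locking, one common mitigation for the failed-process question is to give the lock its own expiry, so it disappears even if nobody releases it. The sketch below is an assumption-laden illustration, not a recommendation from the slides: `withLock()` and `FakeCache` are hypothetical, and `FakeCache` mimics only `add()` (which fails if the key already exists, like `Memcached::add()`) and `delete()`.

```php
<?php
// In-memory stand-in for illustration; not the real Memcached client.
class FakeCache
{
    private array $data = []; // key => [value, expiry timestamp or 0]

    // Succeeds only if the key is absent or has expired.
    public function add(string $key, $value, int $ttl = 0): bool
    {
        $now = time();
        if (isset($this->data[$key])) {
            $expires = $this->data[$key][1];
            if ($expires === 0 || $expires > $now) {
                return false;
            }
        }
        $this->data[$key] = [$value, $ttl > 0 ? $now + $ttl : 0];
        return true;
    }

    public function delete(string $key): bool
    {
        unset($this->data[$key]);
        return true;
    }
}

// Run $work only if this process obtains the lock. Returns null when another
// process holds it, so the caller must decide what to show in the meantime.
function withLock($cache, string $name, int $lockTtl, callable $work)
{
    if (!$cache->add("lock_$name", 1, $lockTtl)) {
        return null;
    }
    try {
        return $work();
    } finally {
        $cache->delete("lock_$name"); // released even if $work() throws
    }
}

$cache = new FakeCache();
$result = withLock($cache, 'rebuild_products', 30, fn() => 'rebuilt');
```

The expiry (30 seconds here, an arbitrary value) covers a crashed lock holder, but the first bullet remains open: requests that do not get the lock still need something to serve.
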
LAMP...
- -> LAMMP
- -> LNMMP

Nginx
- Web server
- Reverse proxy
- Lightweight, fast
- 9.6% of all Websites

Nginx
- No threads, event-driven
- Uses epoll / kqueue
- Low memory footprint
- 10000 active connections = normal

Nginx - a true alternative to Apache ?
- Not all Apache modules
  - mod_auth_*
  - mod_dav*
  - …
- Basic modules are available
- Some 3rd-party modules (need recompilation !)

Nginx - Configuration

    server {
        listen 80;
        server_name www.domain.ext *.domain.ext;
        index index.html;
        root /home/domain.ext/www;
    }

    server {
        listen 80;
        server_name photo.domain.ext;
        index index.html;
        root /home/domain.ext/photo;
    }

Nginx + PHP-FPM - performance ?

                         Single webserver      Proxy
                         Static    PHP         Static    PHP
Apache + PHP             3900      17.5        6700      17.5
Apache + PHP + MC        3900      55          6700      108
Nginx + PHP-FPM + MC     11700     57          11200     112

Reverse proxy time...

Varnish
- Not just a load balancer
- Reverse proxy cache / http accelerator / …
- Caches (parts of) pages in memory
- Careful :
  - uses threads (like Apache)
  - Nginx usually scales better (but doesn't have VCL)

Varnish - backends + load balancing

    backend server1 {
        .host = "192.168.0.10";
    }
    backend server2 {
        .host = "192.168.0.11";
    }
    director example_director round-robin {
        { .backend = server1; }
        { .backend = server2; }
    }

Varnish - VCL
- Varnish Configuration Language
- DSL (Domain Specific Language)
  - -> compiled to C
- Hooks into each request
- Defines :
  - Backends (web servers)
  - ACLs
  - Load balancing strategy
- Can be reloaded while running

Varnish - whatever you want
- Real-time statistics (varnishtop, varnishhist, ...)
- ESI

Varnish - ESI
Perfect for caching pages

In your article page output :

    <esi:include src="/news"/>

In your Varnish config :

    sub vcl_fetch {
        if (req.url == "/news") {
            esi; /* Do ESI processing */
            set obj.ttl = 2m;
        } elseif (req.url == "/nav") {
            esi;
            set obj.ttl = 1m;
        } elseif … {
            …
        }
    }

Varnish with ESI - hold on tight !

                         Single webserver      Proxy
                         Static    PHP         Static    PHP
Apache + PHP             3900      17.5        6700      17.5
Apache + PHP + MC        3900      55          6700      108
Nginx + PHP-FPM + MC     11700     57          11200     112
Varnish                  -         -           11200     4200

Varnish - what can/can't be cached ?
- Can :
  - Static pages
  - Images, js, css
  - Pages or parts of pages that don't change often (ESI)
- Can't :
  - POST requests
  - Very large files (it's not a file server !)
  - Requests with Set-Cookie
  - User-specific content

ESI -> no caching on user-specific content ?
- Logged in as : Wim Godden
- 5 messages
- TTL = 5min
- TTL = 1h
- TTL = 0s ?

Coming soon...
- Based on Nginx
- Reduces load by 50-95%
  - Requires code changes !
  - Well-built project -> few changes
  - Effect on webservers and database servers

What's the result ?

Figures
- First customer :
  - No. of web servers : 18 -> 4
  - No. of db servers : 6 -> 2
  - Total : 24 -> 6 (75% reduction !)
- Second customer (already using Nginx + Memcache) :
  - No. of web servers : 72 -> 8
  - No. of db servers : 15 -> 4
  - Total : 87 -> 12 (86% reduction !)

Availability
- Stable at 2 customers
- Still under heavy development
- Final release : July 2012
- Released as ????

Tuning

PHP speed - some tips
- Upgrade PHP - every minor release has 5-15% speed gain !
- Use an opcode cache

Caching storage - Opcode caching

DB speed - some tips
- Use same types for joins
  - i.e. don't join decimal with int
- RAND() is evil !
- count(*) is evil in InnoDB without a where clause !
- Persistent connections are sort-of evil

Caching & Tuning @ frontend
http://www.websiteoptimization.com/speed/tweak/average-web-page/

Frontend tuning
1. You optimize backend
2. Frontend engineer messes up -> havoc on backend
3. Don't forget : frontend sends requests to backend !
SO...
- Care about frontend
- Test frontend
- Check what requests frontend sends to backend

Tuning frontend
- Minimize requests
  - Combine CSS/JavaScript files
  - Use CSS Sprites

CSS Sprites

Tuning content - CSS sprites
- 11 images : 11 HTTP requests, 24KByte
- 1 image : 1 HTTP request, 14KByte

Tuning frontend
- Minimize requests
  - Combine CSS/JavaScript files
  - Use CSS Sprites (horizontally if possible)
- Put CSS at top
- Put JavaScript at bottom
  - Max. no. of connections
  - Especially if JavaScript does Ajax (advertising-scripts, …) !
- Avoid iFrames
  - Again : max. no. of connections
- Don't scale images in HTML
- Have a favicon.ico (don't 404 it !)
  - -> see my blog

What else can kill your site ?
- Redirect loops
  - Multiple requests
    - More load on Webserver
    - More code to process
  - Additional latency for visitor
  - Try to avoid redirects anyway
- Watch your logs, but equally important...
- Watch the logging process ->
- Logging = disk I/O -> can kill your server !
- Slashdot effect

Above all else... be prepared !
- Have a monitoring system
- Use a cache abstraction layer (disk -> Memcache)
- Don't install for the worst -> prepare for the worst
- Have a test-setup
- Have fallbacks
  - -> Turn off non-critical functionality

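A cache abstraction layer can be as small as one interface with a backend per storage type. The sketch below is one possible shape, with hypothetical names throughout (`CacheBackend`, `DiskCache`, `MemcacheCache`, `cachedGreeting()`); expiry and error handling are left out to keep it short, and only the disk backend is exercised so the example runs without a memcached server.

```php
<?php
// Application code depends only on this interface, so the storage behind it
// can change (disk today, Memcache tomorrow) without touching callers.
interface CacheBackend
{
    /** @return mixed the cached value, or false on a miss */
    public function get(string $key);

    public function set(string $key, $value): bool;
}

class DiskCache implements CacheBackend
{
    private string $dir;

    public function __construct(string $dir)
    {
        $this->dir = rtrim($dir, '/');
    }

    private function path(string $key): string
    {
        return $this->dir . '/' . md5($key) . '.cache';
    }

    public function get(string $key)
    {
        $file = $this->path($key);
        return is_file($file) ? unserialize(file_get_contents($file)) : false;
    }

    public function set(string $key, $value): bool
    {
        return file_put_contents($this->path($key), serialize($value), LOCK_EX) !== false;
    }
}

class MemcacheCache implements CacheBackend
{
    private $client; // anything with get()/set(), e.g. a Memcached instance

    public function __construct($client)
    {
        $this->client = $client;
    }

    public function get(string $key)
    {
        return $this->client->get($key);
    }

    public function set(string $key, $value): bool
    {
        return $this->client->set($key, $value);
    }
}

// Caller code: identical for both backends.
function cachedGreeting(CacheBackend $cache): string
{
    $value = $cache->get('greeting');
    if ($value === false) {
        $value = 'hello from the slow path';
        $cache->set('greeting', $value);
    }
    return $value;
}

$cache = new DiskCache(sys_get_temp_dir());
echo cachedGreeting($cache), "\n";
```

Moving from disk to Memcache then means changing the one line that constructs `$cache`, which is the point of preparing the abstraction before the traffic arrives.
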
Questions ?

Contact
- Twitter : @wimgtr
- Web : http://techblog.wimgodden.be
- Slides : http://www.slideshare.net/wimg
- E-mail : [email_address]

Thanks !