
Introduction to memcached


Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from the perspective of MySQL/PHP users. Given to second-year students of the professional bachelor in ICT at Kaho St. Lieven, Gent.

Published in: Technology

Introduction to memcached

  2. 2. Tags memcached, performance, scalability, php, mySQL, caching techniques, #ikdoeict
  3. 3. Lead web dev at Netlog for 4 years • php + mysql + frontend • working on Gatcha
  4. 4. For whom? A talk for students of the professional bachelor in ICT
  5. 5. Why this talk? One of the first things I’ve learnt at Netlog. Using it every single day.
  6. 6. Program - About caching - About memcached - Examples - Tips & tricks - Toolsets and other solutions
  7. 7. What is caching? A copy of real data with faster (and/or cheaper) access
  8. 8. What is caching? • From Wikipedia: "A cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache." • Term introduced by IBM in the 60’s
  9. 9. The anatomy • simple key/value storage • simple operations • save • get • delete
  10. 10. Terminology • storage cost • retrieval cost (network load / algorithm load) • invalidation (keeping data up to date / removing irrelevant data) • replacement policy (FIFO/LFU/LRU/MRU/RANDOM vs. Belady’s algorithm) • cold cache / warm cache
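To make the replacement policy term concrete, here is a minimal sketch of an LRU (least recently used) cache. It assumes PHP 8; the class name `LruCache` and its methods are hypothetical and only illustrate the policy, not how memcached implements it internally.

```php
<?php
// Minimal LRU cache sketch; class and method names are hypothetical.
final class LruCache
{
    /** @var array<string, mixed> ordered from least to most recently used */
    private array $items = [];

    public function __construct(private int $capacity) {}

    public function get(string $key): mixed
    {
        if (!array_key_exists($key, $this->items)) {
            return null; // cache miss
        }
        // Re-insert to mark the key as most recently used.
        $value = $this->items[$key];
        unset($this->items[$key]);
        $this->items[$key] = $value;
        return $value;
    }

    public function set(string $key, mixed $value): void
    {
        unset($this->items[$key]);
        $this->items[$key] = $value;
        if (count($this->items) > $this->capacity) {
            // Evict the least recently used entry (first in array order).
            unset($this->items[array_key_first($this->items)]);
        }
    }
}

$cache = new LruCache(2);
$cache->set('a', 1);
$cache->set('b', 2);
$cache->get('a');      // 'a' is now the most recently used
$cache->set('c', 3);   // capacity exceeded: 'b' is evicted
var_dump($cache->get('b')); // NULL
```

A FIFO policy would have evicted 'a' here, because it ignores reads and only looks at insertion order.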
  11. 11. Terminology • cache hit and cache miss • typical stats: • hit ratio (hits / (hits + misses)) • miss ratio (1 - hit ratio) • 45 cache hits and 10 cache misses • 45/(45+10) = 82% hit ratio • 18% miss ratio
  12. 12. When to cache? • caches are only efficient when the benefits of faster access outweigh the overhead of checking and keeping your cache up to date • more cache hits than cache misses
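The arithmetic on this slide can be checked with a few lines of PHP. The function name `hitRatio` is hypothetical, and returning 0.0 for an unused cache is just a convention chosen for this sketch.

```php
<?php
// Hypothetical helper: hit ratio as a fraction between 0 and 1.
function hitRatio(int $hits, int $misses): float
{
    $total = $hits + $misses;
    // An unused cache has no meaningful ratio; report 0.0 by convention here.
    return $total === 0 ? 0.0 : $hits / $total;
}

$ratio = hitRatio(45, 10);
// prints "hit ratio: 82%, miss ratio: 18%"
printf("hit ratio: %.0f%%, miss ratio: %.0f%%\n", $ratio * 100, (1 - $ratio) * 100);
```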
  13. 13. Where are caches used? • at hardware level (cpu, hdd) • operating systems (ram) • web stack • applications • your own short term vs long term memory
  14. 14. Caches in the web stack • Browser cache • DNS cache • Content Delivery Networks (CDN) • Proxy servers • Application level • full output caching (eg. Wordpress WP-Cache plugin) • ...
  15. 15. Caches in the web stack (cont’d) • Application level • opcode cache (APC) • query cache (MySQL) • storing denormalized results in the database • object cache • storing values in php objects/classes
  16. 16. Efficiency of caching? • the earlier in the process, the closer to the original request(er), the faster • browser cache will be faster than cache on a proxy • but probably also the harder to get it right • the closer to the requester the more parameters the cache depends on
  17. 17. What to cache on the server-side? • As a PHP backend developer, what to cache? • expensive operations: operations that work with slower resources • database access • reading files (in fact, any filesystem access) • API calls • Heavy computations • XML
  18. 18. Where to cache on the server-side? • As a PHP backend developer, where to store cache results? • in database (computed values, generated html) • you’ll still need to access your database • in static files (generated html or serialized php values) • you’ll still need to access your file system
  19. 19. in memory!
  20. 20. memcached
  21. 21. About memcached • Free & open source, high-performance, distributed memory object caching system • Generic in nature, intended for use in speeding up dynamic web applications by alleviating database load. • key/value dictionary
  22. 22. About memcached (cont’d) • Developed by Brad Fitzpatrick for LiveJournal in 2003 • Now used by Netlog, Facebook, Flickr, Wikipedia, Twitter, YouTube ...
  23. 23. Technically • It’s a server • Client access over TCP or UDP • Servers can run in pools • eg. 3 servers with 64GB mem each give you a single pool of 192GB storage for caching • Servers are independent, clients manage the pool
  24. 24. What to store in memcache? • high demand (used often) • expensive (hard to compute) • common (shared across users) • Best? All three
  25. 25. What to store in memcache? (cont’d) • Typical: • user sessions (often) • user data (often, shared) • homepage data (eg. often, shared, expensive)
  26. 26. What to store in memcache? (cont’d) • Workflow: • monitor application (query logs / profiling) • add a caching level • compare speed gain
  27. 27. Memcached principles • Fast network access (memcached servers close to other application servers) • No persistency (if your server goes down, data in memcached is gone) • No redundancy / fail-over • No replication (single item in cache lives on one server only) • No authentication (not in shared environments)
  28. 28. Memcached principles (cont’d) • 1 item (the value stored under a key) is maximum 1MB • keys are strings of at most 250 characters (in applications typically the MD5 of a user-readable string) • No enumeration of keys (thus no list of valid keys in cache at a certain moment, no list of keys beginning with “user_”, ...) • No active clean-up (only cleans up when more space is needed, LRU)
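A small sketch of the key convention mentioned on this slide: hashing a readable description keeps keys short and free of spaces. The helper name `cacheKey` is hypothetical.

```php
<?php
// Hypothetical helper: turn a readable name into a short, safe cache key.
function cacheKey(string $readable): string
{
    // md5() always yields 32 hex characters: no spaces or control characters,
    // and well below the 250-character key limit.
    return md5($readable);
}

$key = cacheKey('last blog posts of user 42, page 1');
echo strlen($key), "\n"; // prints 32
```

The price is that hashed keys are unreadable when debugging, so it can help to log the readable string next to its hash.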
  29. 29.
      $ telnet localhost 11211
      Trying
      Connected to localhost.
      Escape character is '^]'.
      get foo
      VALUE foo 0 2
      hi
      END
      stats
      STAT pid 8861
      (etc)
  30. 30. Client Access • both an ASCII and a binary protocol • in real life: • clients available for all major languages • C, C++, PHP, Python, Ruby, Java, Perl, Windows, ...
  31. 31. PHP Clients • Support the basics such as multiple servers, setting values, getting values, incrementing, decrementing and getting stats. • pecl/memcache • pecl/memcached • newer, in beta, a couple more features
  32. 32. PHP Client Comparison
                                        pecl/memcache           pecl/memcached
      First Release Date                2004-06-08              2009-01-29 (beta)
      Actively Developed?               Yes                     Yes
      External Dependency               None                    libmemcached
      Features
      Automatic Key Fixup               Yes                     No
      Append/Prepend                    No                      Yes
      Automatic Serialization           Yes                     Yes
      Binary Protocol                   No                      Optional
      CAS                               No                      Yes
      Compression                       Yes                     Yes
      Communication Timeout             Connect Only            Various Options
      Consistent Hashing                Yes                     Yes
      Delayed Get                       No                      Yes
      Multi-Get                         Yes                     Yes
      Session Support                   Yes                     Yes
      Set/Get to a specific server      No                      Yes
      Stores Numerics                   Converted to Strings    Yes
  33. 33. PHP Client functions • Memcached::add — Add an item under a new key • Memcached::addServer — Add a server to the server pool • Memcached::decrement — Decrement numeric item's value • Memcached::delete — Delete an item • Memcached::flush — Invalidate all items in the cache • Memcached::get — Retrieve an item • Memcached::getMulti — Retrieve multiple items • Memcached::getStats — Get server pool statistics • Memcached::increment — Increment numeric item's value • Memcached::set — Store an item • ...
  34. 34. Output caching • Pages with high load / expensive to generate • Very easy • Very fast • But: all the dependencies ... • language, css, template, logged in user’s details, ...
  35. 35.
      <?php
      $html = $cache->get('mypage');
      if (!$html) {
          ob_start();
          echo "<html>";
          // all the fancy stuff goes here
          echo "</html>";
          $html = ob_get_contents();
          ob_end_clean();
          $cache->set('mypage', $html);
      }
      echo $html;
      ?>
  36. 36. Data caching • on a lower level • easier to find all dependencies • ideal solution for offloading database queries • the database is almost always the biggest bottleneck in backend performance problems
  37. 37.
      <?php
      function getUserData($UID) {
          global $cache; // assumes the cache client lives in a global variable
          $key = 'user_' . $UID;
          $userData = $cache->get($key);
          if (!$userData) {
              $queryResult = Database::query("SELECT * FROM USERS WHERE uid = " . (int) $UID);
              $userData = $queryResult->getRow();
              $cache->set($key, $userData); // store under the same key used for the lookup
          }
          return $userData;
      }
      ?>
  38. 38. “There are only two hard things in Computer Science: cache invalidation and naming things.” Phil Karlton
  39. 39. Invalidation • Caching for a certain amount of time • eg. 10 minutes • don’t delete caches • thus: You can’t trust that data coming from cache is correct
  40. 40. Invalidation (cont’d) • Use: Great for summaries • Overview • Pages where it’s not that big a problem if data is a little bit out of date (eg. search results) • Good for quick and dirty optimizations
  41. 41. Invalidation (cont’d) • Store forever, and expire on certain events • the userdata example • store userdata forever • when user changes any of his preferences, throw cache away
  42. 42. Invalidation • Use: • data that is fetched more often than it’s updated • where it’s critical the data is correct • Improvement: instead of delete on event, update cache on event. (Mind: race conditions. Cache invalidation always as close to original change as possible!)
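Time-based invalidation can be sketched without a memcached server. The class `TtlCache` below is a hypothetical in-memory stand-in (PHP 8 assumed); the clock is injected so that the 10-minute expiry can be shown without waiting.

```php
<?php
// In-memory stand-in for a cache with expiry; names are hypothetical.
final class TtlCache
{
    /** @var array<string, array{value: mixed, expires: int}> */
    private array $items = [];

    // $clock returns the current Unix time; injected to keep the sketch testable.
    public function __construct(private \Closure $clock) {}

    public function set(string $key, mixed $value, int $ttlSeconds): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => ($this->clock)() + $ttlSeconds];
    }

    public function get(string $key): mixed
    {
        $item = $this->items[$key] ?? null;
        if ($item === null || $item['expires'] <= ($this->clock)()) {
            unset($this->items[$key]);
            return null; // miss or expired
        }
        return $item['value'];
    }
}

$now = 1000;
$cache = new TtlCache(function () use (&$now) { return $now; });
$cache->set('search_results', ['a', 'b'], 600); // cache for 10 minutes
$now += 599;
var_dump($cache->get('search_results') !== null); // bool(true): still served, possibly stale
$now += 1;
var_dump($cache->get('search_results'));          // NULL: expired, must be recomputed
```

Between writing and expiry the cached value is never checked against the source, which is exactly why this approach suits data that may be slightly out of date.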
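The "update cache on event" improvement can be sketched as follows. Plain arrays stand in for the database and for memcached, which is an assumption made only to keep the example runnable; the function signatures are hypothetical.

```php
<?php
// Plain arrays stand in for the database and for memcached (assumption for this sketch).
$database = [42 => ['uid' => 42, 'nickname' => 'alice']];
$cache = [];

function getUserData(int $uid, array $database, array &$cache): ?array
{
    $key = 'user_' . $uid;
    if (isset($cache[$key])) {
        return $cache[$key];                 // hit
    }
    $row = $database[$uid] ?? null;          // miss: go to the "database"
    if ($row !== null) {
        $cache[$key] = $row;                 // store without expiry
    }
    return $row;
}

function updateUserData(int $uid, array $changes, array &$database, array &$cache): void
{
    $database[$uid] = array_merge($database[$uid], $changes);
    // Refresh the cache right after the write, instead of deleting the key.
    $cache['user_' . $uid] = $database[$uid];
}

getUserData(42, $database, $cache);                        // warms the cache
updateUserData(42, ['nickname' => 'bob'], $database, $cache);
echo getUserData(42, $database, $cache)['nickname'], "\n"; // bob
```

With a real database and several web servers, two concurrent updates can still write the cache in the wrong order, which is the race condition the slide warns about.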
  43. 43. Uses at Netlog • sessions (cross server) • database results (via database class, or object caching) • flooding checks • output caching (eg. for RSS feeds) • locks
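Flooding checks and locks both rely on two cache operations: add (store only if the key is absent) and increment. The sketch below uses a hypothetical in-memory class `ArrayCache` with those operations (PHP 8 assumed); with memcached these operations are atomic on the server, and a real setup would also give the keys an expiry time. The key names and limits are assumptions.

```php
<?php
// In-memory stand-in offering add/increment/delete; names are hypothetical.
final class ArrayCache
{
    private array $items = [];

    // Store only if the key does not exist yet; returns false otherwise.
    public function add(string $key, mixed $value): bool
    {
        if (array_key_exists($key, $this->items)) {
            return false;
        }
        $this->items[$key] = $value;
        return true;
    }

    public function increment(string $key): int|false
    {
        if (!array_key_exists($key, $this->items)) {
            return false;
        }
        return ++$this->items[$key];
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}

// Flooding check: allow at most $max actions per user per minute.
function isFlooding(ArrayCache $cache, int $uid, int $now, int $max = 3): bool
{
    $key = 'flood_' . $uid . '_' . intdiv($now, 60); // one counter per minute
    if ($cache->add($key, 1)) {
        return false; // first action in this minute
    }
    return $cache->increment($key) > $max;
}

// Lock: whoever manages to add the key holds the lock.
function acquireLock(ArrayCache $cache, string $name): bool
{
    return $cache->add('lock_' . $name, 1);
}

function releaseLock(ArrayCache $cache, string $name): void
{
    $cache->delete('lock_' . $name);
}
```

Because memcached may evict any key when it needs space, a lock built this way is a best-effort guard and not a guarantee.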
  44. 44.
      <?php
      function getUserData($UID) {
          $db = DB::getInstance();
          $db->prepare("SELECT * FROM USERS WHERE uid = {UID}");
          $db->assignInt('UID', $UID);
          $db->execute();
          return $db->getRow();
      }
      ?>
  45. 45.
      <?php
      function getUserData($UID) {
          $db = DB::getInstance();
          $db->prepare("SELECT * FROM USERS WHERE uid = {UID}");
          $db->assignInt('UID', $UID);
          $db->setCacheTTL(0); // cache forever
          $db->execute();
          return $db->getRow();
      }
      ?>
  46. 46.
      <?php
      function getUserData($UID, $invalidateCache = false) {
          $db = DB::getInstance();
          $db->prepare("SELECT * FROM USERS WHERE uid = {UID}");
          $db->assignInt('UID', $UID);
          $db->setCacheTTL(0); // cache forever
          if ($invalidateCache) {
              return $db->invalidateCache();
          }
          $db->execute();
          return $db->getRow();
      }
      ?>
  47. 47.
      <?php
      function updateUserData($UID, $data) {
          $db = DB::getInstance();
          $db->prepare("UPDATE USERS SET ... WHERE uid = {UID}");
          ...
          getUserData($UID, true); // invalidate cache
          return $result;
      }
      ?>
  48. 48.
      <?php
      function getLastBlogPosts($UID, $start = 0, $limit = 10, $invalidateCache = false) {
          $db = DB::getInstance();
          $db->prepare("SELECT blogid FROM BLOGS WHERE uid = {UID} ORDER BY dateadd DESC LIMIT {start}, {limit}");
          $db->assignInt('start', $start);
          $db->assignInt('limit', $limit);
          $db->assignInt('UID', $UID);
          $db->setCacheTTL(0); // cache forever
          if ($invalidateCache) {
              return $db->invalidateCache();
          }
          $db->execute();
          return $db->getResults();
      }
      ?>
  49. 49.
      <?php
      function addNewBlogPost($UID, $data) {
          $db = DB::getInstance();
          $db->prepare("INSERT INTO BLOGS ...");
          ...
          // invalidate caches: one call per cached page
          getLastBlogPosts($UID, 0, 10, true);
          getLastBlogPosts($UID, 10, 10, true);
          ... // ???
          return $result;
      }
      ?>
  50. 50.
      <?php
      function getLastBlogPosts($UID, $start = 0, $limit = 10) {
          $cacheVersionNumber = CacheVersionNumbers::get('lastblogsposts_' . $UID);
          $db = DB::getInstance();
          $db->prepare("SELECT blogid FROM ...");
          ...
          $db->setCacheVersionNumber($cacheVersionNumber);
          $db->setCacheTTL(0); // cache forever
          $db->execute();
          return $db->getResults();
      }
      ?>
  51. 51.
      <?php
      class CacheVersionNumbers {
          public static function get($name) {
              global $cache; // assumes the cache client lives in a global variable
              $result = $cache->get('cvn_' . $name);
              if (!$result) {
                  $result = microtime() . rand(0, 1000);
                  $cache->set('cvn_' . $name, $result);
              }
              return $result;
          }
          public static function bump($name) {
              global $cache;
              return $cache->delete('cvn_' . $name);
          }
      }
      ?>
  52. 52.
      <?php
      function addNewBlogPost($UID, $data) {
          $db = DB::getInstance();
          $db->prepare("INSERT INTO BLOGS ...");
          ...
          CacheVersionNumbers::bump('lastblogsposts_' . $UID);
          return $result;
      }
      ?>
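The version-number idea from the last three slides can be tried in isolation. In this sketch a plain array stands in for memcached, and the helpers `versionOf`, `bump` and `pageKey` are hypothetical names; the point is that every page key embeds the current version, so dropping the version makes all pages miss at once.

```php
<?php
// Plain array standing in for memcached (assumption for this sketch).
$cache = [];

// Current version for a group of keys; created on first use.
function versionOf(string $name, array &$cache): string
{
    $key = 'cvn_' . $name;
    if (!isset($cache[$key])) {
        $cache[$key] = uniqid('', true);
    }
    return $cache[$key];
}

// Dropping the version makes every key built from it unreachable.
function bump(string $name, array &$cache): void
{
    unset($cache['cvn_' . $name]);
}

function pageKey(int $uid, int $start, int $limit, array &$cache): string
{
    $version = versionOf('lastblogsposts_' . $uid, $cache);
    return "lastblogsposts_{$uid}_{$start}_{$limit}_v{$version}";
}

$before = pageKey(7, 0, 10, $cache);
$cache[$before] = [101, 100, 99];      // cached first page of blog ids
bump('lastblogsposts_7', $cache);      // a new post was added
$after = pageKey(7, 0, 10, $cache);
var_dump(isset($cache[$after]));       // bool(false): every page misses and is rebuilt
```

The old entries are never deleted explicitly; they stay in memory until the cache evicts them, which fits memcached's "no active clean-up" behaviour.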
  54. 54. Query Caching (cont’d) • queries with JOIN and WHERE statements are harder to cache • often not easy to find the cache key on update/change events • solution: JOIN in PHP • In the following example: what if the nickname of a user changes?
  55. 55.
      <?php
      $db = DB::getInstance();
      $db->prepare("SELECT c.comment_message, c.comment_date, u.nickname
                    FROM COMMENTS c
                    JOIN USERS u ON u.uid = c.commenter_uid
                    WHERE c.postid = {postID}");
      ...
      ?>
  56. 56.
      <?php
      $db = DB::getInstance();
      $db->prepare("SELECT c.comment_message, c.comment_date, c.commenter_uid AS uid
                    FROM COMMENTS c
                    WHERE c.postid = {postID}");
      ...
      $comments = Users::addUserDetails($comments);
      ...
      ?>
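  57. 57.
      <?php
      ...
      public static function addUserDetails($array) {
          foreach ($array as &$item) {
              $item = array_merge($item, self::getUserData($item['uid'])); // assume high hit ratio
          }
          unset($item); // break the reference left by foreach
          return $array; // return the whole list, not just the last item
      }
      ...
      ?>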
  58. 58. So? • Pros: • speed, duh. • queries get simpler (better for your db) • easier porting to key/value storage solutions • Cons: • You’re relying on memcached to be up and have good hit ratios
  59. 59. Multi-Get Optimisations • We reduced database access • Memcached is faster, but access to memcache still has its price • Solution: multiget • fetch multiple keys from memcached in one single call • result is array of items
  60. 60. Multi-Get Optimisations (cont’d) • back to addUserDetails example • find UIDs from array • multiget to memcached for details of UIDs • for UIDs without result, do a query • SELECT ... FROM USERS WHERE uid IN (...) • for each fetched user, store in cache • worst case (no hits): 1 query • return merged cache/db results
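The steps on this slide can be sketched as follows. Arrays stand in for memcached and the USERS table, and `cacheGetMulti` and `queryUsersIn` are hypothetical stand-ins for a client multiget and the IN query (PHP 7.4 or later assumed for the arrow functions).

```php
<?php
// Arrays stand in for memcached and the USERS table (assumptions for this sketch).
$cache = ['user_1' => ['uid' => 1, 'nickname' => 'ann']];
$usersTable = [
    1 => ['uid' => 1, 'nickname' => 'ann'],
    2 => ['uid' => 2, 'nickname' => 'bob'],
    3 => ['uid' => 3, 'nickname' => 'cat'],
];
$queryCount = 0;

// Stand-in for a multiget: returns only the keys that were found.
function cacheGetMulti(array $keys, array $cache): array
{
    return array_intersect_key($cache, array_flip($keys));
}

// Stand-in for: SELECT ... FROM USERS WHERE uid IN (...)
function queryUsersIn(array $uids, array $usersTable, int &$queryCount): array
{
    $queryCount++;
    return array_intersect_key($usersTable, array_flip($uids));
}

function addUserDetails(array $items, array &$cache, array $usersTable, int &$queryCount): array
{
    $uids = array_unique(array_column($items, 'uid'));
    $keys = array_map(fn ($uid) => 'user_' . $uid, $uids);
    $found = cacheGetMulti($keys, $cache);              // one round trip to the cache

    $missing = array_filter($uids, fn ($uid) => !isset($found['user_' . $uid]));
    if ($missing) {
        foreach (queryUsersIn($missing, $usersTable, $queryCount) as $uid => $row) {
            $cache['user_' . $uid] = $row;              // store for the next request
            $found['user_' . $uid] = $row;
        }
    }

    foreach ($items as &$item) {
        $item = array_merge($item, $found['user_' . $item['uid']] ?? []);
    }
    unset($item); // break the reference left by foreach
    return $items;
}

$comments = [
    ['uid' => 1, 'comment_message' => 'first'],
    ['uid' => 2, 'comment_message' => 'nice'],
    ['uid' => 3, 'comment_message' => 'thanks'],
    ['uid' => 2, 'comment_message' => 'again'],
];
$comments = addUserDetails($comments, $cache, $usersTable, $queryCount);
echo $queryCount, "\n"; // 1: users 2 and 3 came from a single IN query
```

Running the same call again would use zero queries, since all three users are in the cache by then.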
  61. 61. Consistent Hashing • client is responsible for managing the pool • hashes a certain key to a certain server • clients can be naïve: distribute keys based on the size of the pool • if one server goes down, most keys will now be looked up on other servers > cold cache • use a client with a consistent hashing algorithm, so if a server goes down, only the data on that server gets lost
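The difference between the naive approach and consistent hashing can be sketched in a few functions. This is an illustration only, not the algorithm of any particular client: the server names, the use of crc32 and the 100 points per server are assumptions, and a 64-bit PHP build is assumed so that crc32 returns non-negative integers.

```php
<?php
// Naive distribution: the position depends on the pool size, so removing
// one server remaps most keys.
function naiveServer(string $key, array $servers): string
{
    return $servers[crc32($key) % count($servers)];
}

// Consistent hashing sketch: each server owns several points on a ring;
// a key belongs to the first point at or after its own hash.
function buildRing(array $servers, int $pointsPerServer = 100): array
{
    $ring = [];
    foreach ($servers as $server) {
        for ($i = 0; $i < $pointsPerServer; $i++) {
            $ring[crc32($server . '#' . $i)] = $server;
        }
    }
    ksort($ring);
    return $ring;
}

function ringServer(string $key, array $ring): string
{
    $hash = crc32($key);
    foreach ($ring as $point => $server) { // linear scan; fine for a sketch
        if ($point >= $hash) {
            return $server;
        }
    }
    return reset($ring); // wrap around to the first point
}

$servers = ['cache1:11211', 'cache2:11211', 'cache3:11211'];
$remaining = ['cache1:11211', 'cache2:11211'];   // cache3 went down
$ringBefore = buildRing($servers);
$ringAfter = buildRing($remaining);

$movedNaive = 0;
$movedRing = 0;
for ($i = 0; $i < 1000; $i++) {
    $key = 'user_' . $i;
    $movedNaive += (int) (naiveServer($key, $servers) !== naiveServer($key, $remaining));
    $movedRing += (int) (ringServer($key, $ringBefore) !== ringServer($key, $ringAfter));
}
echo "naive: $movedNaive moved, consistent: $movedRing moved\n";
```

With the ring, only keys that lived on the lost server change place; keys on the surviving servers stay where they are, so their cached data remains warm.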
  62. 62. Memcached Statistics • available stats from servers include: • uptime, #calls (get/set/...), #hits (since uptime), #misses (since uptime) • no enumeration, no distinguishing on types of caches • add own logging / statistics to monitor effectiveness of your caching strategy
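Since the server stats do not distinguish between types of caches, own statistics can be kept in a thin wrapper. The class `CountingCache` is a hypothetical in-memory sketch (PHP 8 assumed) that counts hits and misses per key prefix; treating the part before the first underscore as the cache type is an assumption based on keys like user_ in the earlier examples.

```php
<?php
// Wrapper that counts hits and misses per key prefix; names are hypothetical.
final class CountingCache
{
    private array $items = [];
    /** @var array<string, array{hits: int, misses: int}> */
    private array $stats = [];

    public function set(string $key, mixed $value): void
    {
        $this->items[$key] = $value;
    }

    public function get(string $key): mixed
    {
        // Treat everything before the first underscore as the cache "type".
        $type = explode('_', $key, 2)[0];
        $this->stats[$type] ??= ['hits' => 0, 'misses' => 0];
        if (array_key_exists($key, $this->items)) {
            $this->stats[$type]['hits']++;
            return $this->items[$key];
        }
        $this->stats[$type]['misses']++;
        return null;
    }

    public function hitRatio(string $type): float
    {
        $s = $this->stats[$type] ?? ['hits' => 0, 'misses' => 0];
        $total = $s['hits'] + $s['misses'];
        return $total === 0 ? 0.0 : $s['hits'] / $total;
    }
}

$cache = new CountingCache();
$cache->set('user_1', ['nickname' => 'ann']);
$cache->get('user_1');   // hit
$cache->get('user_2');   // miss
$cache->get('blog_9');   // miss
printf("user: %.2f, blog: %.2f\n", $cache->hitRatio('user'), $cache->hitRatio('blog'));
// prints "user: 0.50, blog: 0.00"
```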
  63. 63. More tips ... • Be careful when security matters. (Remember ‘no authentication’?) • Working on authentication for memcached via SASL Auth Protocol • Caching is not an excuse not to do database tuning. (Remember cold cache?) • Make sure to write unit tests for your caching classes and places where you use it. (Debugging problems related to out-of-date cache data is hard and boring. Very boring.)
  64. 64. Libraries for memcached • Zend framework has Zend_Cache with support for a memcached backend • Wordpress has 3 plugins for working with memcached • all of the other major frameworks have some sort of support (built in or via plugins): Symfony, Django, CakePHP, Drupal, ... • Gear6: memcached servers in the cloud
  65. 65. memcached isn’t the only caching solution • memcachedb (persistent memcached) • opcode caching • APC (php compiled code cache, usable for other purposes too) • xCache • eAccelerator • Zend optimizer
  66. 66. Last thought • main bottleneck in php backends is the database • adding php servers is easier than scaling databases • a complete caching layer before your database layer solves a lot of performance and scalability issues • but being able to scale takes more than memcached • performance tuning, beginning with identifying the slowest and most used parts, stays important, be it tuning of your php code, memcached calls or database queries
  67. 67. For developers
  68. 68. High-score Handling • Tournaments • Challenge builder • Achievements • Got an idea for a game? Great!
  69. 69. Gatcha For Game Developers Game tracking Start game and end game calls result in accurate gameplay tracking and allow us to show who is playing the game at any given moment, compute popularity, target games. High-scores You push your high-score to our API, we do the hard work of creating different types of leader boards and rankings. Achievements Pushing achievements reached in your game just takes one API call, no configuration needed.
  70. 70. Gatcha For Game Developers Multiplayer Games We run SmartFox servers that enable you to build real-time multiplayer games, with e.g. in-game chat. Coming: Challenges & Tournaments Allow your game players to challenge each other, or build challenges & contests yourself.
  71. 71. Gatcha For Game Developers How to integrate? Flash Games We offer a wrapper for AS3 and AS2 games with a full implementation of our API Unity3D Games OpenSocial Games Talk to the supported containers via the Gatcha OpenSocial Extension Other Games Simple iframe implementation. PHP Client API available for the Gatcha API Start developing in our sandbox.
  72. 72. Job openings Weʼre searching for great developers! PHP Talents Working on integrations and the gaming platform Flash Developers Working on Flash Games and the gaming platform Design Artists Designing games and integrations
  74. 74. Resources, a.o.: • memcached & apc: caching-with-memcached-and-apc • speed comparison: memcachedv2.html • php client comparison: wiki/PHPClientComparison • cakephp-memcached: 2009/06/17/send-your-database-on-vacation-by-using- cakephp-memcached/ • caching basics: basics • caching w php: effectice-caching-w-php-caching