memcached

 scaling your website
 with memcached


 by: steve yen
about me

• Steve Yen

 • NorthScale
 • Escalate Software
 • Kiva Software
what you’ll learn

•   what, where, why, when



• how
 • especially, best practices
“mem cache dee”

• latest version: 1.4.1

• http://code.google.com/p/memcached
open source
distributed cache
livejournal
helps your websites run
          fast
popular
simple
KISS
easy
small bite-sized steps


• not a huge, forklift replacement
  rearchitecture / reengineering project
fast
“i only block for
  memcached”
scalable
many client libraries
• might be TOO many
• the hit list...
 • Java ==> spymemcached
 • C ==> libmemcached
 • Python, Ruby, etc ==>
    • libmemcached wrappers
frameworks

• rails
• django
• spring / hibernate
• cakephp, symfony, etc
applications

• drupal
• wordpress
• mediawiki
• etc
it works

it promises to solve performance problems


                               it delivers!
problem?
your website is too
      slow
RDBMS melting down
urgent! emergency
one server


web app + RDBMS
1 + 1 servers

     web app


     RDBMS
N + 1 servers

web app, web app, web app, web app


             RDBMS
RDBMS
EXPLAIN PLAN?
buy a bigger box
buy better disks
master write DB +
multiple read DB?
vertical partitioning?
sharding?
uh oh, big reengineering

• risky!

• touch every line of code, every query!!
and, it’s 2AM
you need a band-aid
a simple band-aid now
use a cache
keep things in memory!
don’t hit disk
distributed cache


• to avoid wasting memory
don’t write one of
  these yourself
memcached
simple API


• hash-table-ish
your code before



v = db.query( SOME SLOW QUERY )
your code after

v = memcachedClient.get(key)
if (!v) {
    v = db.query( SOME SLOW QUERY )
    memcachedClient.set(key, v)
}
cache read-heavy stuff
invalidate when writing


• db.execute(“UPDATE foo WHERE ...”)
• memcachedClient.delete(...)
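
for example, a minimal sketch of the read-through + invalidate-on-write pattern with the python-memcached client (host, table, and query names here are just placeholders; sqlite3 stands in for your RDBMS):

import sqlite3                                    # sqlite3 stands in for your RDBMS
import memcache

mc = memcache.Client(['127.0.0.1:11211'])         # placeholder host
db = sqlite3.connect('app.db')
db.execute("CREATE TABLE IF NOT EXISTS products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

def get_product(product_id):
    key = 'product:%d' % product_id
    v = mc.get(key)
    if v is None:                                 # cache miss: run the slow query once
        v = db.execute("SELECT * FROM products WHERE id = ?", (product_id,)).fetchone()
        mc.set(key, v)
    return v

def update_product_price(product_id, price):
    db.execute("UPDATE products SET price = ? WHERE id = ?", (price, product_id))
    db.commit()
    mc.delete('product:%d' % product_id)          # drop the stale entry; next read re-caches it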
and, repeat

• each day...
 • look for the next slowest operations
 • add code to cache a few more things
your life gets better
thank you memcached!
no magic
you are in control
now for the decisions
memcached adoption

• first, start using memcached
 • poorly
   • but you can breathe again
memcached adoption


• next, start using memcached correctly
memcached adoption

• later
 • queueing
 • persistence
 • replication
 • ...
an early question
where to run servers?
answer 1

• right on your web servers

• a great place to start, if you have extra
  memory
servers

web app web app web app web app
memcached   memcached   memcached   memcached




                  RDBMS
add up your memory
        usage!


• having memcached server swap == bad!
answer 2

• run memcached right on your database
  server?


• WRONG!
answer 3
• run memcached on separate dedicated
  memcached servers


• congratulations!
 • you either have enough money
 • or enough traffic that it matters
running a server

• daemonize

• don’t be root!

• no security
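
for example, a typical startup line:
memcached -d -m 1024 -p 11211 -u memcacheduser
(-d daemonize, -m memory cap in MB, -p port, -u run as an unprivileged user)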
server lists

• mc-server1:11211
• mc-server2:11211
• mc-server3:11211
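
for example, with the python-memcached client (same hostnames, just illustrative):

import memcache

# every client builds from the same list, in the same order
mc = memcache.Client(['mc-server1:11211',
                      'mc-server2:11211',
                      'mc-server3:11211'])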
consistent hashing




 source: http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
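
a toy sketch of the idea in python (real clients like libmemcached do this for you):

import bisect
import hashlib

class ConsistentHashRing:
    # each server gets many points on a circle; a key maps to the next point
    # clockwise, so adding/removing a server only remaps a slice of the keys
    def __init__(self, nodes, points_per_node=100):
        self.points_per_node = points_per_node
        self.ring = {}              # hash point -> server
        self.sorted_points = []
        for node in nodes:
            self.add_node(node)

    def _hash(self, s):
        return int(hashlib.md5(s.encode('utf-8')).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.points_per_node):
            point = self._hash('%s#%d' % (node, i))
            self.ring[point] = node
            bisect.insort(self.sorted_points, point)

    def node_for(self, key):
        point = self._hash(key)
        i = bisect.bisect(self.sorted_points, point) % len(self.sorted_points)
        return self.ring[self.sorted_points[i]]

ring = ConsistentHashRing(['mc-server1:11211', 'mc-server2:11211', 'mc-server3:11211'])
print(ring.node_for('user:1234:profile'))     # -> one of the three servers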
client-side intelligence


• no “server master” bottleneck
libmemcached

• fast C memcached client
 • supports consistent hashing
 • many wrappers to your favorite languages
updating server lists

• push out new configs and restart?

• moxi
 • memcached + integrated proxy
keys

• no whitespace
• 250 char limit
• use short prefixes
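
for example (hypothetical helpers, not part of any client API):

# short prefix, no whitespace, well under the 250-char limit
def product_key(product_id):
    return 'p:%d' % product_id        # e.g. 'p:1234'

def session_key(session_id):
    return 's:%s' % session_id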
keys & MD5

• don’t

• stats become useless
values
• any binary object

• 1MB limit
 • change #define & recompile if you want more
 • and you’re probably doing something wrong if
    you want more
values
• query resultset
• serialized object
• page fragment
• whole pages
• etc
nginx + memcached
>1 language?

• JSON
• protocol buffers
• XML
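
for example, a sketch with python-memcached and plain JSON (names are placeholders):

import json
import memcache

mc = memcache.Client(['127.0.0.1:11211'])                  # placeholder host

profile = {'id': 1234, 'name': 'ada', 'plan': 'pro'}

# store plain JSON text instead of a pickled python object,
# so the PHP / Ruby / Java side can read the same value
mc.set('user:1234:profile', json.dumps(profile))

raw = mc.get('user:1234:profile')
profile = json.loads(raw) if raw else None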
memcached is lossy


• memcached WILL lose data
that’s a good thing




         remember, it’s a CACHE
why is memcached
      lossy?
memcached node dies
when node restarts...

• you just get a bunch of cache misses
                    (and a short RDBMS spike)
eviction


more disappearing data!
LRU


• can config memcached to not evict
 • but, you’re probably doing something
    wrong if you do this
remember, it forgets


• it’s just a CACHE
expiration

• aka, timeouts

• memcached.set(key, value, timeout)
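
with the python-memcached client that's the time argument (seconds; memcached treats anything over 30 days as an absolute unix timestamp):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])                  # placeholder host
mc.set('session:abc123', {'user_id': 42}, time=30 * 60)    # expires after 30 minutes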
use expirations or not?
1st school of thought

• expirations hide bugs
• you should be doing proper invalidations
 • (aka, deletes)
 • coherency!
school 2

• it’s 3AM and I can’t think anymore

• business guy:
 • “sessions should auto-logout after 30
    minutes due to bank security policy”
put sessions
      in memcached?

• just a config change
 • eg, Ruby on Rails
good


• can load-balance requests to any web host
• don’t touch the RDBMS on every web
  request
bad


• could lose a user’s session
solution

• save sessions to memcached
• the first time, also save to RDBMS
 • ideally, asynchronously
• on cache miss, restore from RDBMS
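
a minimal sketch of that pattern with python-memcached (sqlite3 stands in for the RDBMS, names are placeholders, and the RDBMS write would ideally be queued asynchronously):

import json
import time
import sqlite3
import memcache

mc = memcache.Client(['127.0.0.1:11211'])                  # placeholder host
db = sqlite3.connect('app.db')                             # stands in for your RDBMS
db.execute("CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT, updated_at REAL)")

SESSION_TTL = 30 * 60   # the 30-minute auto-logout

def save_session(session_id, data):
    mc.set('session:' + session_id, data, time=SESSION_TTL)
    # the safety-net write to the RDBMS (ideally done asynchronously)
    db.execute("REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)",
               (session_id, json.dumps(data), time.time()))
    db.commit()

def load_session(session_id):
    data = mc.get('session:' + session_id)
    if data is None:                                       # cache miss: restart, eviction, ...
        row = db.execute("SELECT data FROM sessions WHERE id = ?", (session_id,)).fetchone()
        if row:
            data = json.loads(row[0])
            mc.set('session:' + session_id, data, time=SESSION_TTL)   # repopulate the cache
    return data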
in the background...
• have a job querying the RDBMS
 • cron job?
• the job queries for “old” looking session
  records in the sessions table
  • refresh old session records from
    memcached
add vs replace vs set
append vs prepend
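
roughly, with python-memcached (placeholder key and values):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])   # placeholder host

mc.set('greeting', 'hello')        # set: store unconditionally
mc.add('greeting', 'hi')           # add: only if the key does NOT already exist (no-op here)
mc.replace('greeting', 'howdy')    # replace: only if the key already exists
mc.append('greeting', ' world')    # append: tack bytes onto the end -> 'howdy world'
mc.prepend('greeting', '>> ')      # prepend: tack bytes onto the front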
CAS


• compare-and-swap
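
a sketch of an optimistic update with python-memcached's gets()/cas(); cache_cas needs to be enabled, and exact return values vary by client version:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])   # placeholder host
mc.cache_cas = True                         # gets() must remember cas ids for cas() to work

mc.set('seats:flight-42', 10)
while True:
    seats = mc.gets('seats:flight-42')      # fetch the value and remember its cas id
    if seats is None or seats <= 0:
        break                               # sold out (or key missing)
    if mc.cas('seats:flight-42', seats - 1):
        break                               # stored: nobody changed it in between
    # another client updated the key first; loop and try again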
incr and decr


• no negative numbers
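
for example, with python-memcached (placeholder key):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])   # placeholder host

mc.set('hits', '0')     # counters are stored as ASCII integers
mc.incr('hits')         # -> 1
mc.incr('hits', 10)     # -> 11
mc.decr('hits', 100)    # -> 0: decr clamps at zero, it never goes negative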
queueing


• “hey, with those primitives, I could build a queue!”
don’t
• memcached is lossy
• protocol is incorrect for a queue
• instead
 • gearman
 • beanstalkd
 • etc
cache stampedes

• gearman job-unique-id
• encode a timestamp in your values
 • one app node randomly decides to
    refresh slightly early
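
a minimal sketch of that early-refresh trick, assuming python-memcached and a made-up cached_report() helper:

import random
import time
import memcache

mc = memcache.Client(['127.0.0.1:11211'])   # placeholder host

HARD_TTL = 600        # what memcached enforces
SOFT_TTL = 540        # when we'd like a refresh to start
EARLY_WINDOW = 60     # randomized head start

def cached_report(key, rebuild):
    cached = mc.get(key)
    if cached is not None:
        value, soft_expiry = cached
        # most nodes keep serving the cached copy; the one that draws a big
        # jitter rebuilds a little early, before everyone misses at once
        if time.time() < soft_expiry - random.uniform(0, EARLY_WINDOW):
            return value
    value = rebuild()                                   # the slow query
    mc.set(key, (value, time.time() + SOFT_TTL), time=HARD_TTL)
    return value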
coherency
denormalization


• or copies of data
example: changing a
   product price
memcached UDFs

• another great tool in your toolbox

• on a database trigger, delete stuff from
  memcached
memcached UDFs


• works even if you do UPDATES with fancy
  WHERE clauses
multigets

• they are your friend

• memcached is fast, but...
 • imagine 1ms for a get request
   • 200 serial gets ==> 200ms
a resultset loop

foreach product in resultset
  c = memcached.get(product.category_id)
  do something with c
2 loops
for product in resultset
  multiget_request.append(product.category_id)
multiget_response = memcachedClient.multiget(
  multiget_request)
for c in multiget_response
  do something with c
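
the same two-loop pattern with python-memcached's get_multi (resultset here is a stand-in for your query results):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])                       # placeholder host

# stand-in for your query results
resultset = [{'id': 1, 'category_id': 7}, {'id': 2, 'category_id': 9}]

keys = ['category:%d' % p['category_id'] for p in resultset]    # build all the keys first
categories = mc.get_multi(keys)                                 # one round trip for all of them

for product in resultset:
    c = categories.get('category:%d' % product['category_id'])  # misses just aren't in the dict
    # do something with c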
memcached slabber

• allocates memory into slabs

• it might “learn” the wrong slab sizes

• watch eviction stats
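
to watch those stats from python-memcached (hostnames are the same illustrative ones as before):

import memcache

mc = memcache.Client(['mc-server1:11211', 'mc-server2:11211', 'mc-server3:11211'])

for server, stats in mc.get_stats():            # one stats dict per node
    print(server,
          'evictions:', stats.get('evictions'),
          'get_hits:', stats.get('get_hits'),
          'get_misses:', stats.get('get_misses'))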
losing a node


• means your RDBMS gets hit
replication
• simple replication in libmemcached

• >= 2x memory cost
• only simple verbs
 • set, get, delete
• doesn’t handle flapping nodes
persistence
things that speak
        memcached

• tokyo tyrant
• memcachedb
• moxi
another day

• monitoring & statistics
• near caching
• moxi
thanks!!!

• love any feedback
 • your memcached war stories
    • your memcached wishlist

• steve.yen@northscale.com
thanks!

photo credits

 •     http://flickr.com/photos/davebluedevil/15877348/

 •     http://www.flickr.com/photos/theamarand/2874288064/

 •     http://www.flickr.com/photos/splityarn/3469596708/

 •     http://www.flickr.com/photos/heisnofool/3241930754/

 •     http://www.flickr.com/photos/onourminds/2885704630/

 •     http://www.flickr.com/photos/lunaspin/990825818/

Memcached Code Camp 2009