MEMCACHED
An Overview
Quick overview and best practices
- Asif Ali / Oct / 2010 / azifali@gmail.com
Image courtesy: memcached offical website
“The Disk is the new tape!”
Quoted from somewhere
What is memcache?
• Simple in memory caching system
• Can be used as a temporary in-memory data
store
• Stores data in-memory only.
• Excellent read performance.
• Great write performance.
• Data distribution between multiple servers.
• API available for most languages
Memcache
• NO – acid, data structures, etc etc.
• If you’re looking for a replacement for your DB
then you’re looking at the wrong solution.
• Simple “put” a data using a key
• Get the data using the key.
• Generally, your app remembers the key.
• Data typically expires or gets flushed out (so has a
short life).
• Does not auto commit to disk
When should I use memcached?
• When your database is optimized to the hilt and you
still need more out of it.
– Lots of SELECTs are using resources that could be better used
elsewhere in the DB.
– Locking issues keep coming up
• When table listings in the query cache are torn down
so often it becomes useless
• To get maximum “scale out” of minimum hardware
Use cases for memcached
• Anything what is more expensive to fetch from
elsewhere, and has sufficient hit rate, can be placed in
memcached
• Web Apps that require extremely fast reads
• Temporary data storage
• Global data store for your application data
• Session data (ROR)
• Extremely fast writes of any application data (that will
be committed to an actual data store later)
• Memcached is not a storage engine but a caching
system
NoSQL & SQL
• We’re not going to debate about the pros and
cons of the different NoSQL vs SQL solutions.
• My View: It really depends upon your requirements.
• NoSQL caters to large but specific problems and in
my opinion, there is no all in one solution.
• Your comfort of MySQL or SQL statements might not
solve certain scale problems
For Example
• If you have hundreds of Terra Bytes of data and would like to
process it, then clearly – Map Reduce + Hive would be a great
solution for it.
• If you have large data and would like to do real time analytics
/ queries where processing speed is important then you might
want to consider Cassandra / MongoDB
• If your app needs massive amounts of simple reads then
clearly memcached is probably the solution of choice
• If you want large storage of data with non aggregate
processing and very comfortable with MySQL then a MySQL
based sharded solution can do wonders.
MEMCACHED SERVER
Memcached Server Info
• Memcache server accepts read / write TCP
connections through a standalone app or daemon.
• Clients connect on specific ports to read / write
• You can allocate a fixed memory for memcached to
use.
• Memcached stores data in the RAM (of course!)
• Memcached uses libevent and scales pretty well
(theoritical limit 200,000)
Configuration options
• Max connections
• Threads
• Port number
• Type of process – foreground or daemon
• Tcp / udp
Read write (Pseudo code)
• Set “key”, “value”, ”expire time”
• A = get “key”
Sample Ruby code
• Connection:
– With one server:
M = MemCache.new ‘localhost:11211’, :namespace
=> ‘my_namespace’
– With multiple servers:
M = MemCache.new %w[one.example.com:11211
two.example.com:11211], :namespace =>
‘my_namespace’
Sample Ruby code
• Usage
– m = MemCache.new('localhost:11211')
– m.set 'abc', 'xyz‘
– m.get 'abc‘ => ‘xyz’
– m.replace ‘abc’, ‘123’
– m.get ‘abc’ => ‘123’
– m.delete ‘abc’ => ‘DELETED’
– m.get ‘abc’ => nil
Memcached data structure
• Generally simple strings
• Simple strings are compatible with other
languages.
• You can also store other objects example
hashes.
• Complex data stored using one language
generally may not be fetchable from different
systems
Sample Ruby code
• Memcache can store data of any data types.
– m = MemCache.new('localhost:11211')
– m.set ‘my_string’, ‘Hello World !’
– m.set ‘my_integer’, 100
– m.set ‘my_array’, [1,”hello”,”2.0”]
– m.set ‘my_hash’, {‘1’=>’one’,’2’=>’two’}
– m.set ‘my_complex_data’, <any complex data>
Limits of memcached
• Keys can be no more then 250 characters
• Stored data can not exceed 1M (largest typical slab
size) per key
• There are generally no limits to the number of nodes
running memcache
• There are generally no limits the the amount of RAM
used by memcache over all nodes
– 32 bit machines do have a limit of 4GB though
FEATURES
Memcached can replicate for high
availability
Memcached 1
Data structure A
Memcached 2
Copy of
Data structure A
Data can be distributed using
memcached
Memcached 1
Part 1 - Data
structure A
Memcached 2
Part - 2
The connecting client can connect to either memcached 1 or memcached 2 to
fetch any data that is distributed across the two servers
Memcached can store any object
• Simple String
• Hash
• Other
• Strings can be read / written among
heterogenous systems.
Memcached best practices
Best Practices
• Allocate enough space for memcached.
• Memcache does not auto save data into disk so don’t
forget to serialize your data if you need to.
• A memcached crash could lead to a large downtime
if you have to load a lot of data to load (into
memcache) so you should
– Replicate memcached data OR
– Find algorithms to store data as fast as you can OR
– Have redundant servers with similar data and handle “no
data” situation well.
Best Practices
• Shared nothing, non replicated set of servers works the best.
• Don’t try to replicate your database environment into
Memcached. It is not what it was meant for.
• Don’t store data for too long in memcached. Least recently
used items (LRU) are evicted automatically in certain
scenarios.
• Reload data that is required for your reads as often as
possible
• Don’t go just by existing benchmarks; Find out your
application benchmarks in your environment and use that
variable to scale.
What to look out for
• Memcached performance can hit roadblocks
under very high reads and writes on a single
instance so try to isolate read / write instances
or see what works best for your app.
• Data increments on memcache values –
remember this is a shared memory space and
data is not always “locked” before being
updated.
OUR MEMCACHED USAGE
Usage History
• Considered using memcached more than 2
years ago due to problems in high volumen
MySQL read and writes.
• Used it first as a way to store Mongrel’s
session variables instead of MySQL.
The Problem we tried to solve
• Deliver a matching ad to a complex set of variables
• 6000+ devices in device db.
• 8-10m records in ip data
• Country / carrier / manufacturer / channel / targeting
• IP filters, black lists
• Bot Filtering
• Ad moderation etc etc etc..
• Typical In an ad network scenario
The solution needed to
• Make data fetching as fast as possible
• Make data structures simple
• Make processing extremely simple (for
example a string comparision vs a SQL Query)
• Make processing of data as fast as possible.
• Complete all processing 50 ms or less.
..continued
• Using a database was possible in low traffic
scenario.
• Traffic grew and so did database nightmares.
Our usage of memcached
• Hundreds of millions of reads daily; small
chunks of data. Infrequent writes.
• More than 60G of memcached shared data
space.
• MySQL was fine until we moved our
performance metric from seconds to ms.
• Current total transaction time between 21-
50ms.
Our partial stack
Memcached clients currently used
• http://deveiate.org/code/Ruby-MemCache/
• http://github.com/higepon/memcached-client
• There are newer and better clients that can be
used.
Questions?

Memcached Presentation

  • 1.
    MEMCACHED An Overview Quick overviewand best practices - Asif Ali / Oct / 2010 / azifali@gmail.com Image courtesy: memcached offical website
  • 2.
    “The Disk isthe new tape!” Quoted from somewhere
  • 3.
    What is memcache? •Simple in memory caching system • Can be used as a temporary in-memory data store • Stores data in-memory only. • Excellent read performance. • Great write performance. • Data distribution between multiple servers. • API available for most languages
  • 4.
    Memcache • NO –acid, data structures, etc etc. • If you’re looking for a replacement for your DB then you’re looking at the wrong solution. • Simple “put” a data using a key • Get the data using the key. • Generally, your app remembers the key. • Data typically expires or gets flushed out (so has a short life). • Does not auto commit to disk
  • 5.
    When should Iuse memcached? • When your database is optimized to the hilt and you still need more out of it. – Lots of SELECTs are using resources that could be better used elsewhere in the DB. – Locking issues keep coming up • When table listings in the query cache are torn down so often it becomes useless • To get maximum “scale out” of minimum hardware
  • 6.
    Use cases formemcached • Anything what is more expensive to fetch from elsewhere, and has sufficient hit rate, can be placed in memcached • Web Apps that require extremely fast reads • Temporary data storage • Global data store for your application data • Session data (ROR) • Extremely fast writes of any application data (that will be committed to an actual data store later) • Memcached is not a storage engine but a caching system
  • 7.
    NoSQL & SQL •We’re not going to debate about the pros and cons of the different NoSQL vs SQL solutions. • My View: It really depends upon your requirements. • NoSQL caters to large but specific problems and in my opinion, there is no all in one solution. • Your comfort of MySQL or SQL statements might not solve certain scale problems
  • 8.
    For Example • Ifyou have hundreds of Terra Bytes of data and would like to process it, then clearly – Map Reduce + Hive would be a great solution for it. • If you have large data and would like to do real time analytics / queries where processing speed is important then you might want to consider Cassandra / MongoDB • If your app needs massive amounts of simple reads then clearly memcached is probably the solution of choice • If you want large storage of data with non aggregate processing and very comfortable with MySQL then a MySQL based sharded solution can do wonders.
  • 9.
  • 10.
    Memcached Server Info •Memcache server accepts read / write TCP connections through a standalone app or daemon. • Clients connect on specific ports to read / write • You can allocate a fixed memory for memcached to use. • Memcached stores data in the RAM (of course!) • Memcached uses libevent and scales pretty well (theoritical limit 200,000)
  • 11.
    Configuration options • Maxconnections • Threads • Port number • Type of process – foreground or daemon • Tcp / udp
  • 12.
    Read write (Pseudocode) • Set “key”, “value”, ”expire time” • A = get “key”
  • 13.
    Sample Ruby code •Connection: – With one server: M = MemCache.new ‘localhost:11211’, :namespace => ‘my_namespace’ – With multiple servers: M = MemCache.new %w[one.example.com:11211 two.example.com:11211], :namespace => ‘my_namespace’
  • 14.
    Sample Ruby code •Usage – m = MemCache.new('localhost:11211') – m.set 'abc', 'xyz‘ – m.get 'abc‘ => ‘xyz’ – m.replace ‘abc’, ‘123’ – m.get ‘abc’ => ‘123’ – m.delete ‘abc’ => ‘DELETED’ – m.get ‘abc’ => nil
  • 15.
    Memcached data structure •Generally simple strings • Simple strings are compatible with other languages. • You can also store other objects example hashes. • Complex data stored using one language generally may not be fetchable from different systems
  • 16.
    Sample Ruby code •Memcache can store data of any data types. – m = MemCache.new('localhost:11211') – m.set ‘my_string’, ‘Hello World !’ – m.set ‘my_integer’, 100 – m.set ‘my_array’, [1,”hello”,”2.0”] – m.set ‘my_hash’, {‘1’=>’one’,’2’=>’two’} – m.set ‘my_complex_data’, <any complex data>
  • 17.
    Limits of memcached •Keys can be no more then 250 characters • Stored data can not exceed 1M (largest typical slab size) per key • There are generally no limits to the number of nodes running memcache • There are generally no limits the the amount of RAM used by memcache over all nodes – 32 bit machines do have a limit of 4GB though
  • 18.
  • 19.
    Memcached can replicatefor high availability Memcached 1 Data structure A Memcached 2 Copy of Data structure A
  • 20.
    Data can bedistributed using memcached Memcached 1 Part 1 - Data structure A Memcached 2 Part - 2 The connecting client can connect to either memcached 1 or memcached 2 to fetch any data that is distributed across the two servers
  • 21.
    Memcached can storeany object • Simple String • Hash • Other • Strings can be read / written among heterogenous systems.
  • 22.
  • 23.
    Best Practices • Allocateenough space for memcached. • Memcache does not auto save data into disk so don’t forget to serialize your data if you need to. • A memcached crash could lead to a large downtime if you have to load a lot of data to load (into memcache) so you should – Replicate memcached data OR – Find algorithms to store data as fast as you can OR – Have redundant servers with similar data and handle “no data” situation well.
  • 24.
    Best Practices • Sharednothing, non replicated set of servers works the best. • Don’t try to replicate your database environment into Memcached. It is not what it was meant for. • Don’t store data for too long in memcached. Least recently used items (LRU) are evicted automatically in certain scenarios. • Reload data that is required for your reads as often as possible • Don’t go just by existing benchmarks; Find out your application benchmarks in your environment and use that variable to scale.
  • 25.
    What to lookout for • Memcached performance can hit roadblocks under very high reads and writes on a single instance so try to isolate read / write instances or see what works best for your app. • Data increments on memcache values – remember this is a shared memory space and data is not always “locked” before being updated.
  • 26.
  • 27.
    Usage History • Consideredusing memcached more than 2 years ago due to problems in high volumen MySQL read and writes. • Used it first as a way to store Mongrel’s session variables instead of MySQL.
  • 28.
    The Problem wetried to solve • Deliver a matching ad to a complex set of variables • 6000+ devices in device db. • 8-10m records in ip data • Country / carrier / manufacturer / channel / targeting • IP filters, black lists • Bot Filtering • Ad moderation etc etc etc.. • Typical In an ad network scenario
  • 29.
    The solution neededto • Make data fetching as fast as possible • Make data structures simple • Make processing extremely simple (for example a string comparision vs a SQL Query) • Make processing of data as fast as possible. • Complete all processing 50 ms or less.
  • 30.
    ..continued • Using adatabase was possible in low traffic scenario. • Traffic grew and so did database nightmares.
  • 31.
    Our usage ofmemcached • Hundreds of millions of reads daily; small chunks of data. Infrequent writes. • More than 60G of memcached shared data space. • MySQL was fine until we moved our performance metric from seconds to ms. • Current total transaction time between 21- 50ms.
  • 32.
  • 33.
    Memcached clients currentlyused • http://deveiate.org/code/Ruby-MemCache/ • http://github.com/higepon/memcached-client • There are newer and better clients that can be used.
  • 34.