MEMCACHED
Sam Warmuth & Dex Delfrate




                             11/12/09
THE PROBLEM



• MySQL is great for small/medium usage, but can’t scale to today’s demands

• Some things can be done for reads, but writes don’t scale
HOW DO WE FIX IT?
• Scale MySQL

 • Better indexes

 • Buy bigger HDs

 • Buy faster CPUs

 • One master for writes, several replicas for reads
    OR
  • Cache the queries
CACHING



• Querying the database on every request is overkill

• Data might be loaded thousands of times between changes
CACHING



• Every time you make a query that might be repeated, save it in RAM

• Next time someone asks for that info, use the cached version
HOW MUCH FASTER IS RAM?


   RAM: 83 ns
   HARD DISK: 13.7 ms

• RAM is about 165,000 times faster than disk access (13.7 ms / 83 ns)
WHAT IS MEMCACHED?


• Sits in RAM

• Dumb caching layer

• One gigantic hash table
BASIC LAYOUT
     User Requests


 Application Framework



      Memcached
        Server


    MySQL/Postgres
       Server
HASHING

• How Hashing Works

   • hash(X)=2
   • hash(Y)=1
   • hash(Z)=4

   Hash Index
       0
       1    Y
       2    X
       3
       4    Z
       5
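The table above can be sketched in a few lines of Python. This is only an illustration: the hash values are hard-coded from the slide rather than computed, `build_index` is a hypothetical helper, and collisions (two keys landing in the same slot) are ignored.

```python
# Toy hash index with six slots, mirroring the slide's table.
# The hash values are copied from the slide; a real hash function
# would compute them from the key itself.
SLIDE_HASHES = {"X": 2, "Y": 1, "Z": 4}


def build_index(hashes, slots=6):
    """Place each key in the slot named by its hash value."""
    index = [None] * slots
    for key, hash_value in hashes.items():
        index[hash_value % slots] = key
    return index


index = build_index(SLIDE_HASHES)
for slot, key in enumerate(index):
    print(slot, key or "")
```

Looking a key up is the same computation in reverse: hash the key, then read that one slot, with no scan of the other slots.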
MULTIPLE SERVERS

• Modulo Division

• Client calculates hash, runs modulo to figure out which server

   • 3 servers, hash value 4: 4 % 3 = 1 (2nd server)

   Hash Index
       0        Server 1
       1    Y   Server 2
       2    X   Server 3
       3        Server 1
       4    Z   Server 2
       5        Server 3
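A minimal sketch of the modulo step, assuming the client holds an ordered list of three servers. The names `SERVERS` and `pick_server` are made up for illustration and are not part of any memcached client library.

```python
# Client-side server selection by modulo division.
# The server list and function name are illustrative assumptions.
SERVERS = ["Server 1", "Server 2", "Server 3"]


def pick_server(hash_value, servers):
    """Return the server responsible for a given hash value."""
    return servers[hash_value % len(servers)]


# Reproduce the slide's table for hash values 0 through 5.
assignments = {h: pick_server(h, SERVERS) for h in range(6)}
for hash_value, server in assignments.items():
    print(hash_value, server)
```

Hash value 4 gives 4 % 3 = 1, which is list index 1, the second server, matching the slide.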
What happens when you want
to add a server or a server goes
down?
WHEN A SERVER GOES DOWN

   Index
       0        Server 1
       1    Y   Server 2
       2    X   Server 3
       3        Server 1
       4    Z   Server 2
       5        Server 3

• If Server 2 goes down, accesses of Server 2’s data (Y and Z) will miss every time.
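Changing the server list does not rescue the modulo scheme either. The sketch below counts how many hash values land on a different server index when the server count changes. It assumes hash values are spread evenly (consecutive integers stand in for them), and `moved_fraction` is a hypothetical helper.

```python
def moved_fraction(num_keys, old_count, new_count):
    """Fraction of hash values whose server index changes
    when the number of servers goes from old_count to new_count."""
    moved = sum(
        1 for h in range(num_keys)
        if h % old_count != h % new_count
    )
    return moved / num_keys


# Going from 3 servers to 4 (adding one) or to 2 (losing one).
grow = moved_fraction(12000, 3, 4)
shrink = moved_fraction(12000, 3, 2)
print(f"3 -> 4 servers: {grow:.0%} of keys move")
print(f"3 -> 2 servers: {shrink:.0%} of keys move")
```

Under these assumptions most keys change servers, so most of the cache is effectively lost on every resize. That is the problem consistent hashing addresses.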
CONSISTENT HASHING


• Hash the servers

• For each lookup, find the server with the nearest hash (typically the next server around the ring)
4 Servers: A, B, C, D
h(A) = 11
h(B) = 5
h(C) = 8
h(D) = 3

[Figure: hash ring with positions 0-11 numbered clockwise; D sits at 3, B at 5, C at 8 and A at 11]
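A minimal sketch of a ring lookup using the slide's four server hashes. It assumes "nearest" means the first server at or after the key's position going clockwise, wrapping past 11 back to 0; real clients may define it differently. `find_server` is a hypothetical name.

```python
import bisect

RING_SIZE = 12
# Server positions taken from the slide: h(A)=11, h(B)=5, h(C)=8, h(D)=3.
SERVER_POINTS = {"A": 11, "B": 5, "C": 8, "D": 3}


def find_server(key_hash, server_points, ring_size=RING_SIZE):
    """Return the first server at or after the key's ring position,
    wrapping around to the lowest position if none is found."""
    points = sorted((pos, name) for name, pos in server_points.items())
    positions = [pos for pos, _ in points]
    i = bisect.bisect_left(positions, key_hash % ring_size)
    return points[i % len(points)][1]


for key_hash in range(RING_SIZE):
    print(key_hash, find_server(key_hash, SERVER_POINTS))
```

With these positions, keys hashing to 4 or 5 go to B, 6 through 8 go to C, 9 through 11 go to A, and 0 through 3 go to D.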
IF A SERVER GOES DOWN

[Figure: the hash ring with servers A, D, B and C]
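To see what a failure costs on the ring, the sketch below removes one server and lists which key positions change owner. Taking B as the failed server is an arbitrary choice for illustration, and `owner` uses the same assumed clockwise rule (first server at or after the key's position).

```python
import bisect


def owner(key_hash, server_points, ring_size=12):
    """First server at or after the key's position, wrapping around."""
    points = sorted((pos, name) for name, pos in server_points.items())
    positions = [pos for pos, _ in points]
    i = bisect.bisect_left(positions, key_hash % ring_size)
    return points[i % len(points)][1]


before = {"A": 11, "B": 5, "C": 8, "D": 3}
# Hypothetical failure: server B drops off the ring.
after = {name: pos for name, pos in before.items() if name != "B"}

moved = {
    h: (owner(h, before), owner(h, after))
    for h in range(12)
    if owner(h, before) != owner(h, after)
}
print(moved)
```

Only the positions B used to own (4 and 5) change hands, and they move to C, the next server on the ring. Every other key stays where it was.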
CONSISTENT HASHING
• Pros

 • If one server goes down, all of its requests move on to the next server

 • Seamlessly add new servers

• Cons

 • Hashes distribute mostly evenly, but could cluster

   • Solution: split servers into multiple sub-server nodes
[Figure: the same 0-11 hash ring with each server split into two nodes; clockwise from the top: B2, D2, B1, A2, C2, D1, C1, A1]
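A rough sketch of the sub-server idea: each server is hashed onto the ring several times, and keys go to whichever node comes next. Using MD5 for ring positions, labels like `A1` and `A2`, and the `key-N` test keys are all assumptions made for this example, not details from the slides.

```python
import bisect
import hashlib
from collections import Counter


def ring_hash(text):
    """Map a string to a ring position (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def build_ring(servers, nodes_per_server):
    """Hash each server onto the ring nodes_per_server times (A1, A2, ...)."""
    points = sorted(
        (ring_hash(f"{server}{n}"), server)
        for server in servers
        for n in range(1, nodes_per_server + 1)
    )
    positions = [position for position, _ in points]
    owners = [server for _, server in points]
    return positions, owners


def find_server(ring, key):
    """First node at or after the key's position, wrapping around."""
    positions, owners = ring
    i = bisect.bisect_left(positions, ring_hash(key))
    return owners[i % len(owners)]


def key_counts(servers, nodes_per_server, num_keys=10000):
    """Count how many sample keys each server receives."""
    ring = build_ring(servers, nodes_per_server)
    return Counter(find_server(ring, f"key-{n}") for n in range(num_keys))


few = key_counts(["A", "B", "C", "D"], 2)
many = key_counts(["A", "B", "C", "D"], 100)
print("2 nodes per server:  ", dict(few))
print("100 nodes per server:", dict(many))
```

Comparing the two printed counts shows how the spread of keys changes as each server gets more nodes on the ring.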
SCALING



• Not only is memcached fast, it scales horizontally.
WHAT IS SCALING?

 There are two kinds of scaling:



 Vertical    &    Horizontal
VERTICAL SCALING


• “Traditional” scaling
• Transitions are very complicated
• The old machine is useless afterward.
• There’s a cap on how big you can go.


                      User Requests


                          Router



                          Server A
HORIZONTAL SCALING



• More servers create more capacity.

• Transparent to the application

• No single point of failure

                     User Requests


                         Router



          Server A                   Server B
USING MEMCACHED




v = myDB.query( DB query )
USING MEMCACHED


v = memcachedClient.get(key)
if (v == null)
  # Cache miss: query the database, then store the result for next time
  v = myDB.query( DB query )
  memcachedClient.set(key, v)
end
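The pseudocode above can be tried out without a running server. In the sketch below, `FakeCache` and `FakeDB` are hypothetical in-memory stand-ins for a memcached client and a database, written only so the read path can be exercised; the key and row are made up.

```python
class FakeCache:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value


class FakeDB:
    """Stand-in database that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def query(self, key):
        self.query_count += 1
        return self.rows[key]


def cached_query(cache, db, key):
    """Try the cache first; on a miss, query the database and cache the result."""
    value = cache.get(key)
    if value is None:
        value = db.query(key)
        cache.set(key, value)
    return value


cache = FakeCache()
db = FakeDB({"user:42": "example row"})
first = cached_query(cache, db, "user:42")   # miss: goes to the database
second = cached_query(cache, db, "user:42")  # hit: served from the cache
print(first, second, "database queries:", db.query_count)
```

The second call never reaches the database, which is the whole point of the pattern.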
WRITING DATA



• When updating data, either update or delete the cached copy

• If deleted, it’ll be cached again on the next request
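A small sketch of the delete-on-write option, with plain dictionaries standing in for both memcached and the database. The `read` and `write` helpers and the sample key are assumptions for illustration.

```python
cache = {}                               # stand-in for memcached
database = {"user:42": "old name"}       # stand-in for the database


def read(key):
    """Serve from the cache, filling it from the database on a miss."""
    if key not in cache:
        cache[key] = database[key]
    return cache[key]


def write(key, value):
    """Update the database, then delete the stale cached copy."""
    database[key] = value
    cache.pop(key, None)  # the next read will re-cache the fresh value


before = read("user:42")          # caches "old name"
write("user:42", "new name")      # cache entry is dropped
was_evicted = "user:42" not in cache
after = read("user:42")           # re-cached from the database
print(before, "->", after)
```

Deleting is usually the simpler choice: the writer does not need to know how the cached value was built, only which key to drop.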
WHO USES IT?



• 350 million active users

• Avg. user is on 55 minutes/day

• Very connected data

• Everyone loads everyone else’s data all the time

• 1750 requests per second
RESULTS




• 95% cache hit rate

• DB queries take ~5 ms, memcached hits ~0.5 ms
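These two numbers can be combined into a rough average request time. The sketch assumes a miss pays for the failed cache lookup plus the database query; that cost model is an assumption, not something stated on the slide.

```python
def average_latency_ms(hit_rate, hit_ms, db_ms):
    """Expected time per request for a cache-first read path."""
    # Assumption: a miss costs the cache lookup plus the database query.
    miss_ms = hit_ms + db_ms
    return hit_rate * hit_ms + (1 - hit_rate) * miss_ms


avg = average_latency_ms(hit_rate=0.95, hit_ms=0.5, db_ms=5.0)
print(f"average: {avg:.2f} ms, versus 5.00 ms with no cache")
```

Under that assumption the average comes out to about 0.75 ms, roughly six to seven times faster than going to the database every time.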
WHO USES IT?




• More than 100 million videos, 1 billion views/day

• Everyone loads everyone else’s data all the time

• Memcached handles all thumbnails, descriptions, etc.
WHO USES IT?




• Over 3 million articles, 7 billion page views/month

• Saves every version of articles in the database, so caching works great

• Memcached split over hundreds of servers.
REFERENCES

• http://memcached.org/

• http://download.tangent.org/talks/Memcached%20Study.pdf

• http://code.google.com/p/memcached/wiki/Start

• http://en.wikipedia.org/wiki/Memcached

• http://adam.blog.heroku.com/past/2009/7/6/sql_databases_dont_scale/
