Agile Memcached

                      Luciano Rocha
                   lfrocha@gmail.com



   Portuguese Perl Workshop, 2012/09/28




Luciano Rocha lfrocha@gmail.com   Agile Memcached
Agile




  By that I mean:
        continuous deployment, or frequent deployments
        systems’ code not in sync
        A/B experiments




Caching
Why




  Why do we cache?
      saves us from re-doing expensive computations on every
      request;
      ergo, increase in capacity and faster response times




Caching
Subtleties




    Pay attention to
         cache validity, expiration times
         resource consumption by the cache itself
         hard scalability dependency
         implicit data dependency not explicitly defined
         replacement strategies




Cache types



  Storage medium:
       memory: fast, expensive, and very limited; low latency,
       high throughput;
       disk: slow, cheap, and you get TB of it; high latency,
       medium throughput.
  Locality:
       per-process;
       per-host;
       network.




Memcached




 Memcached, because accessing other hosts’ RAM is faster than
 the local disk.
     non-persistent
     fast access/low latency
     faster than local disks




Memcached
Cache::Memcached::Fast




   Perl interface to Memcached:
        XS interface (thus the ::Fast)
        namespace support
        automatic serialization of Perl data structures
        automatic compression
        multi support

   Storable is evil!
   Use Sereal.




Memcached
Cache::Memcached::Fast example


   Simple operations: fetch and store.
         $cache->set($key, $data, 1 * 60 * 60);
         $in_cache = $cache->get($key);
         $cache->delete($key);
   Group operations, pipeline requests
         $cache->set_multi([$key, $data, 1 * 60 * 60], ...);
         $hashref = $cache->get_multi($key, ...);
         $cache->delete_multi($key, ...);

    multi is useful
   Reduces request round-trip latency, and allows for creating a group
   of smaller, related items, instead of a single big item.
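
   A minimal sketch of the "group of smaller, related items" idea. The
   Sketch::Cache class is a hypothetical in-memory stand-in that only
   mimics the shape of set_multi and get_multi, so the example runs
   without a Memcached server; the key layout and the hotel data are
   assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in; a real client would talk to Memcached.
package Sketch::Cache;

sub new {
    my $class = shift;
    return bless { store => {} }, $class;
}

sub set_multi {
    my ($self, @items) = @_;
    # Each item is [ $key, $value, $expiration ]; expiration is ignored here.
    $self->{store}{ $_->[0] } = $_->[1] for @items;
    return;
}

sub get_multi {
    my ($self, @keys) = @_;
    my %hit;
    # In this stand-in only keys that are present appear in the result,
    # so callers must cope with partial hits.
    for my $key (@keys) {
        $hit{$key} = $self->{store}{$key} if exists $self->{store}{$key};
    }
    return \%hit;
}

package main;

my $cache = Sketch::Cache->new;
my $id    = 42;    # hypothetical hotel id

# Store three small related items instead of one big structure.
$cache->set_multi(
    [ "hotel|$id|name",   'Example Hotel', 60 * 60 ],
    [ "hotel|$id|rooms",  120,             60 * 60 ],
    [ "hotel|$id|rating", 4,               60 * 60 ],
);

# Ask only for the parts this request needs, in one call.
my $found = $cache->get_multi(
    "hotel|$id|name", "hotel|$id|rating", "hotel|$id|price",
);
printf "%s => %s\n", $_, $found->{$_} for sort keys %$found;
```

   The third key was never stored, so the caller has to decide what a
   partial result means: rebuild the missing part, or rebuild everything.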


Memcached
Less usual operations




   Memcached provides other operations that can be used for
   app-level locking or lock-free operations.
                 $cache->add || $cache->replace;
                 $cache->gets; ...; $cache->cas;

                 $cache->incr && $cache->decr;
                 $cache->append; $cache->prepend;
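
   One way these operations might be used, as a sketch only. The
   Sketch::Cache class is a hypothetical in-memory stand-in whose add,
   delete, gets and cas follow the calling conventions of
   Cache::Memcached::Fast as I understand them (gets returns
   [ $cas, $value ]; cas takes the key, the token, and the new value).
   Key names are made up.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in with add/delete/gets/cas semantics.
package Sketch::Cache;

sub new {
    my $class = shift;
    return bless { store => {}, cas => {} }, $class;
}

sub add {
    my ($self, $key, $value) = @_;
    return 0 if exists $self->{store}{$key};    # add never overwrites
    $self->{store}{$key} = $value;
    $self->{cas}{$key}   = 1;
    return 1;
}

sub delete {
    my ($self, $key) = @_;
    my $existed = exists $self->{store}{$key};
    CORE::delete $self->{store}{$key};
    CORE::delete $self->{cas}{$key};
    return $existed ? 1 : 0;
}

sub gets {
    my ($self, $key) = @_;
    return unless exists $self->{store}{$key};
    return [ $self->{cas}{$key}, $self->{store}{$key} ];
}

sub cas {
    my ($self, $key, $token, $value) = @_;
    # Refuse the write if the item vanished or changed since gets.
    return 0 unless exists($self->{store}{$key}) && $self->{cas}{$key} == $token;
    $self->{store}{$key} = $value;
    $self->{cas}{$key}++;
    return 1;
}

package main;

my $cache = Sketch::Cache->new;

# App-level lock: add succeeds for exactly one caller.
my $got_lock   = $cache->add('lock|report', $$, 30);
my $second_try = $cache->add('lock|report', $$, 30);
print "first: $got_lock, second: $second_try\n";
$cache->delete('lock|report');    # release the lock

# Lock-free update: retry until cas succeeds with an unchanged token.
$cache->add('visits', 0);
my $updated = 0;
until ($updated) {
    my ($token, $value) = @{ $cache->gets('visits') };
    $updated = $cache->cas('visits', $token, $value + 1);
}
print "visits: ", $cache->gets('visits')->[1], "\n";
```

   With a real server the lock should carry an expiration time, since
   the holder may die before releasing it, and the lock item itself may
   be evicted.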




Memcached
few guarantees

   Don’t assume:
         what you get is what you stored;
         the item will always be there until it expires;
         space is freed right after item expires;
         free memory means your item can be stored.
   Not provided:
         namespace (key is just a string);
         list of ’current keys’;
         expire keys matching $foo;
         other operations that would require global locks.

   limited maximum item size
   This is a plus!

Memcached
also




       Internal fragmentation (check Twitter and other forks).
       Text protocol susceptible to command injection.
       Memcached might be gone completely; an in-process cache is still
       a good idea.




Agile API




  Provide a simplified API. If it’s easy to use, it will be used:
       make it easier to use than Cache::Memcached::Fast
       global object, set up at startup, easy to access
       provide versions, categories and namespaces (easy
       invalidation)
       collocate getters and setters




Agile API
Example




   Example of such an API:
   my $item = Cache::SingleItem->get(
       id       => $hotel->id,
       category => 'competitive_set',
       # namespace => 'MyCompany',
       version  => 2,
       validate => sub {
           $hotel->avg_price eq $_[0]->{avg_price}
       },
       build    => sub { ... },
   );




Agile API
Explanation



   Arguments:
        id: easy to get from the object.
        category: from module/topic/experiment.
        version: monotonically increasing.
        validate: callback that verifies the item fetched or computed is
        still valid, not whether it is in the expected format.
        build: callback responsible for building the item.

   Force explicit category
   Categories generated automatically from modules/functions cause
   bugs and prevent easy fixes of others.
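
   A minimal sketch of how such a get could behave, not the actual
   Cache::SingleItem. The package name Sketch::SingleItem, the key layout,
   the 'default' namespace and the in-memory %STORE are all assumptions.
   For brevity it validates only fetched items, not freshly built ones.

```perl
use strict;
use warnings;

package Sketch::SingleItem;

my %STORE;    # in-memory stand-in for the Memcached client

sub get {
    my ($class, %args) = @_;

    # Namespace, category and version are all part of the key, so
    # changing any of them makes old entries unreachable.
    my $key = join '|',
        $args{namespace} // 'default',
        $args{category},
        $args{version},
        $args{id};

    my $item = $STORE{$key};

    # Reuse the cached item only if the caller still considers it valid.
    return $item
        if defined $item && (!$args{validate} || $args{validate}->($item));

    $item = $args{build}->();
    $STORE{$key} = $item;
    return $item;
}

package main;

my $builds = 0;
my %hotel  = (id => 7, avg_price => 80);    # hypothetical data

sub competitive_set {
    return Sketch::SingleItem->get(
        id       => $hotel{id},
        category => 'competitive_set',
        version  => 2,
        validate => sub { $hotel{avg_price} eq $_[0]->{avg_price} },
        build    => sub { $builds++; return { avg_price => $hotel{avg_price} } },
    );
}

competitive_set();          # miss: built and stored
competitive_set();          # hit: validated and reused
$hotel{avg_price} = 95;
competitive_set();          # stale: validate fails, item rebuilt
print "builds: $builds\n";
```

   Keeping validate and build next to the fetch is what lets the caller
   express "still valid" in terms of its own data.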



Agile API
Agility


    How does it help? Fixes, experiments and logging.
       easy invalidation
            1   increment $version
            2   roll out
       format not set in stone
            experimenting with different compression and serialization
            methods
            store metadata with or alongside the object
       do cleanups after the request has been served (set delayed)?
       also, $build->() even if found, at cleanup, if close to
       expiration time?
       log the time each operation takes
       log created keys (store in Hadoop, get the list of keys to delete
       from there)
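
   The two logging points could be sketched like this. The timed helper,
   the @LOG array standing in for a real log sink, and the key string are
   hypothetical; only Time::HiRes is a real (core) module.

```perl
use strict;
use warnings;
use Time::HiRes ();

my @LOG;      # stand-in for a real log sink
my %store;    # stand-in for the cache itself

# Run a cache operation, recording its name, key and elapsed time.
sub timed {
    my ($op, $key, $code) = @_;
    my $start  = Time::HiRes::time();
    my $result = $code->();
    push @LOG, {
        op      => $op,
        key     => $key,
        seconds => Time::HiRes::time() - $start,
    };
    return $result;
}

my $key = 'MyCompany|competitive_set|2|7';    # hypothetical key layout

timed(set => $key, sub { $store{$key} = 'data' });
my $value = timed(get => $key, sub { $store{$key} });

# The same log doubles as the list of created keys.
my @created = map { $_->{key} } grep { $_->{op} eq 'set' } @LOG;

printf "%-3s %s %.6fs\n", @{$_}{qw(op key seconds)} for @LOG;
print "created: @created\n";
```

   Because Memcached cannot list its keys, a log like this is the only
   place a later bulk delete can get them from.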

Agile API
Correctness

   What’s wrong here?
    $item = Cache::SingleItem->get(
        id => $hotel->id,
        ...,
        build => sub {
            ...,
            $foo = get_foo($hotel);
            ...,
        },
    );

   sub get_foo {
       ...;
       $bar = experiment('X')
           ? get_bar_new()
           : get_bar();
       ...
Agile API
Correctness

   Easy fix:
          id => join('|', $hotel->id, experiment('X')),
   Force developer to take it into consideration:
   package Foo::Experiment;
   sub experiment {
       $Cache::SingleItem::ALLOWED
           and exists $Cache::SingleItem::ALLOWED{$_[0]}
             or Carp::confess ...

   package Foo::Hotel;
   $item = Cache::SingleItem->get(
       ...,
       experiments_whitelist => [qw(X)],
   );
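
   A runnable sketch of the whitelist idea, under stated assumptions. The
   slide's code is truncated, so this is one possible reading: during a
   build, experiment() dies unless the experiment was whitelisted by the
   caller. The Sketch:: package names, the use of local, and the fixed
   experiment assignment are hypothetical. Caching is left out entirely;
   in a full version the whitelisted experiments' values would presumably
   also be folded into the key, as in the easy fix.

```perl
use strict;
use warnings;
use Carp ();

package Sketch::SingleItem;

# Undefined outside a build; inside one, a hashref of permitted experiments.
our $ALLOWED;

sub get {
    my ($class, %args) = @_;
    # local restores the previous value when get returns or dies.
    local $ALLOWED = { map { ($_ => 1) } @{ $args{experiments_whitelist} || [] } };
    return $args{build}->();    # the cache lookup is omitted in this sketch
}

package Sketch::Experiment;

sub experiment {
    my ($name) = @_;
    Carp::confess("experiment '$name' used in a cache build without being whitelisted")
        if $Sketch::SingleItem::ALLOWED
        && !exists $Sketch::SingleItem::ALLOWED->{$name};
    return $name eq 'X' ? 1 : 0;    # hypothetical assignment: everyone is in X
}

package main;

# Whitelisted: the build may branch on the experiment.
my $ok = Sketch::SingleItem->get(
    experiments_whitelist => ['X'],
    build => sub { Sketch::Experiment::experiment('X') ? 'new' : 'old' },
);

# Not whitelisted: the hidden dependency is caught at build time.
my $died = !eval {
    Sketch::SingleItem->get(
        build => sub { Sketch::Experiment::experiment('X') },
    );
    1;
};

# Outside any build, experiments are unrestricted.
my $outside = Sketch::Experiment::experiment('X');

print "whitelisted: $ok, unlisted build died: ", ($died ? 'yes' : 'no'), "\n";
```

   Failing loudly turns a silent skew in experiment results into an error
   the developer sees the first time the code path runs.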

Agile API
Correctness




   Your experiment results are now valid! (Or just closer to it...)




