Key-Value-Stores:
The Key to Scaling?
     Tim Lossen
Who?
• @tlossen
• backend developer
   -   Ruby, Rails, Sinatra ...
• passionate about technology
Problem
Challenge
• backend for facebook game
• expected load:
   -   1 mio. daily active users
   -   20 mio. total users
   -   100 KB data per user
Challenge
• expected peak traffic:
   -   10.000 concurrent users
   -   200.000 requests / minute
• write-heavy workload
Wanted
• scalable database
• with high throughput
   -   especially for writes
Options
• relational database
   -   with sharding
• nosql database
   -   key-value-store
   -   document db
   -   graph db
Shortlist
• Cassandra
• Redis
• Membase
Cassandra
Facts
• written in Java
   -   55.000 lines of code
• Thrift API
   -   clients for Java, Ruby, Python ...
History
• originally developed by Facebook
   -   in production for “Inbox Search”
• later open-sourced
   -   top-level Apache project
Features
• high availability
   -   no single point of failure
• incremental scalability
• eventual consistency
Hash Ring
Architecture
• Dynamo-like hash ring
   -   partitioning + replication
   -   all nodes are equal
• Bigtable data model
   -   column families
   -   supercolumns
“Cassandra aims to run on an
infrastructure of hundreds of nodes.”
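The Dynamo-style ring can be sketched in a few lines. This is an illustrative Python toy, not Cassandra's actual implementation; the node names and the replica count of 3 are made-up assumptions:

```python
import bisect
import hashlib

def token(s):
    # place a key or node on the ring via a stable hash
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    """Toy Dynamo-style ring: a key belongs to the next node
    clockwise, replicas go to the following nodes."""
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((token(n), n) for n in nodes)

    def nodes_for(self, key):
        tokens = [t for t, _ in self.ring]
        start = bisect.bisect(tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replicas)]

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
ring.nodes_for("user:42")   # three distinct replica nodes for this key
```

Because every node owns a slice of the ring, adding a node only moves the keys between it and its neighbor, which is what makes the scaling incremental.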
Redis
Facts
• written in C
   -   13.000 lines of code
• socket API
   -   redis-cli
   -   client libs for all major languages
Features
• high read & write throughput
   -   50.000 to 100.000 ops / second
• interesting data structures
   -   lists, hashes, (sorted) sets
   -   atomic operations
• strong consistency
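The sorted sets (ZADD / ZREVRANGE) are what make things like leaderboards cheap. A plain-Python toy showing the same semantics — this stands in for Redis, it is not Redis itself:

```python
class ToySortedSet:
    """Plain-Python stand-in for a Redis sorted set (ZADD / ZREVRANGE)."""
    def __init__(self):
        self.scores = {}

    def zadd(self, member, score):
        self.scores[member] = score          # re-adding updates the score

    def zrevrange(self, start, stop):
        # highest score first; Redis ranges are inclusive on both ends
        ranked = sorted(self.scores, key=self.scores.get, reverse=True)
        return ranked[start:stop + 1]

board = ToySortedSet()
board.zadd("alice", 1200)
board.zadd("bob", 900)
board.zadd("carol", 1500)
board.zrevrange(0, 1)   # the two highest-scoring members
```

In real Redis each of these operations is atomic on the server, so concurrent game requests never see a half-updated leaderboard.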
Architecture
• in-memory database
   -   append-only log on disk
   -   virtual memory
• single instance
   -   master-slave replication
   -   clustering is on roadmap
“Memory is the new disk,
  disk is the new tape.”
              — Jim Gray
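The append-only-log idea can be sketched like this — a toy store, not Redis; the log format and flush policy are simplified assumptions:

```python
import json
import os
import tempfile

class TinyStore:
    """Toy in-memory store with an append-only log on disk:
    every write is appended, and the log is replayed on restart."""
    def __init__(self, path):
        self.path, self.data = path, {}
        if os.path.exists(path):
            with open(path) as f:
                for line in f:                  # replay log -> rebuild state
                    op = json.loads(line)
                    self.data[op["k"]] = op["v"]
        self.log = open(path, "a")

    def set(self, key, value):
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.log.flush()    # real Redis lets you tune how often to fsync
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

path = os.path.join(tempfile.mkdtemp(), "store.aof")
db = TinyStore(path)
db.set("user:1", {"coins": 10})
db.log.close()
TinyStore(path).get("user:1")   # state survives a "restart"
```

Reads never touch the disk; the log exists only so the in-memory state can be rebuilt, which is exactly the "memory is the new disk" trade-off.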
Membase
Facts
• written in C and Erlang
• API-compatible to Memcached
   -   same protocol
• client libs for all major languages
History
• developed by NorthScale & Zynga
   -   used in production (Farmville)
• released in June 2010
   -   Apache 2.0 License
Features
• “Memcached with persistence”
   -   extremely fast
   -   throughput scales linearly
• automatic data placement
   -   memory, ssd, disk
• configurable replica count
Architecture
• cluster
   -   all nodes are alike
   -   one elected as “coordinator”
• each node is master for part of the key space
   -   handles all reads & writes
Mapping Scheme
“simple, fast, elastic”
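The mapping scheme can be sketched as a vBucket table. Illustrative Python only — the bucket count, hash function and server names here are assumptions (real clusters use many more buckets, e.g. 1024):

```python
import zlib

NUM_VBUCKETS = 64   # kept small for the demo; real clusters use more

# vbucket -> master server table, maintained by the elected coordinator
# and pushed to all clients (server names are invented)
vbucket_map = {v: "server-%d" % (v % 3) for v in range(NUM_VBUCKETS)}

def vbucket_for(key):
    # clients hash the key to a vbucket and route straight to its
    # master -- no proxy hop in between
    return zlib.crc32(key.encode()) % NUM_VBUCKETS

def master_for(key):
    return vbucket_map[vbucket_for(key)]

master_for("user:42")   # the same key always routes to the same master
```

Rebalancing then means reassigning vBuckets in the table, not rehashing every key, which is what makes the cluster "elastic".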
Solution
Which one
would you pick?
Decision
• Cassandra ?
   -   too big, too complicated
• Membase ?
   -   not yet available (then)
• Redis !
Motivation
• keep operations simple
• use as few machines as possible
   -   ideally, only one
Design
• two machines (+ load balancer)
   -   Redis master handles all reads / writes
   -   Redis slave as hot standby
   -   both machines used as app servers
• dedicated hardware
Data model
• one Redis hash per user
   -   key: facebook id
• store data as serialized JSON
   -   booleans, strings, numbers,
       timestamps ...
Advantages
• turns Redis into “document db”
   -   efficient to swap user data in / out
   -   atomic ops on parts
• easy to dump / restore user data
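The one-hash-per-user model maps directly onto the Redis hash commands (HSET / HGET). A sketch with a plain dict standing in for the live hash — the field names are invented for illustration:

```python
import json
import time

# a plain dict stands in for the Redis hash at key "user:<facebook id>";
# with a live server these would be HSET / HGET calls
user_hash = {}

def save_field(field, value):
    # each part of the user document is one hash field, serialized as
    # JSON, so parts can be read or replaced without loading everything
    user_hash[field] = json.dumps(value)

def load_field(field):
    raw = user_hash.get(field)
    return None if raw is None else json.loads(raw)

save_field("profile", {"name": "Alice", "level": 7})   # invented fields
save_field("last_seen", int(time.time()))
```

Because each field is a separate hash entry, Redis can update one part atomically, and dumping a user is just reading one hash.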
Capacity
• 4 GB memory for 20 mio. integer keys
   -   keys always stay in memory!
• 2 GB memory for 10.000 user hashes
   -   others can be swapped out
• 3.6 mio. ops / minute
   -   sufficient for 200.000 requests / minute
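The capacity figures above hold up as back-of-envelope arithmetic; the per-key and per-hash overheads used here are rough assumptions, not measured values:

```python
# back-of-envelope check of the slide numbers
keys = 20_000_000                 # 20 mio. total users, one key each
bytes_per_key = 200               # assumed Redis per-key overhead
key_memory_gb = keys * bytes_per_key / 2**30              # ~3.7 GB

active_hashes = 10_000            # concurrent users kept in memory
bytes_per_hash = 200 * 1024       # ~100 KB data plus overhead (assumed)
hash_memory_gb = active_hashes * bytes_per_hash / 2**30   # ~1.9 GB

ops_per_min = 60_000 * 60         # ~60.000 ops/sec sustained -> 3.6 mio./min
requests_per_min = 200_000
ops_per_request = ops_per_min // requests_per_min         # 18 ops of headroom
```

Roughly 18 Redis operations of budget per incoming request is comfortable headroom for a write-heavy game backend.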
Status
• game was launched in August
   -   currently still in beta
• expect to reach 1 mio. daily active users in Q1/2011
• will try to stick to 2 or 3 machines
   -   possibly bigger / faster ones
Conclusions
• use the right tool for the job
• keep it simple
   -   avoid sharding, if possible
• don’t scale out too early
   -   but have a viable “plan b”
• use dedicated hardware
Q&A
Links
• cassandra.apache.org
• redis.io
• membase.org
• tim.lossen.de