KeyValue Stores
 Jedi Master Edition
Who?
Antonio Garrote
@antoniogarrote



Mauro Pompilio
@malditogeek



Pablo Delgado
@pablete
Agenda
•Why?
•Definitions
•CouchDB
•Redis
•Cassandra
•Ruby Libraries
•Demo application
•Data modeling
•Benchmark
Why?
•Scalability
•Availability
•Fault Tolerance
•Schema-free
•Ease of use
•Performance
•Elasticity
•blah blah blah
NO
silver bullet!
NoSQL != NoSQL
 No SQL → Not Only SQL
Taxonomy
•Key-value stores:
Redis, Voldemort, Cassandra
•Column-oriented datastores:
Cassandra, HBase
•Document collection databases:
CouchDB, MongoDB
•Graph databases:
Neo4J, AllegroGraph
•Data structure store:
Redis
CouchDB
   relax!
 •Damien Katz
 •Erlang - OTP compliant
 •schema-less documents
 •high availability
 •completely distributed
 •made for the web
CouchDB


B-Trees . MapReduce . MVCC
Ruby Libraries
•CouchDB

 •Pure: net/http + JSON implementation

 •Thin wrapper: Couchrest
 http://github.com/jchris/couchrest


 •ORM/ActiveRecord: ActiveCouch,
 CouchObject, RelaxDB ..etc
 http://github.com/arunthampi/activecouch
 http://github.com/paulcarey/relaxdb
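
The "pure" option above amounts to building HTTP requests by hand. A minimal sketch, assuming a database named `blog` and a document id `post_1` (both hypothetical, as is the helper name `build_put_request`). The request is only built, not sent, so it runs without a CouchDB server:

```ruby
require 'net/http'
require 'json'

# Build (but do not send) the PUT request that would store a document.
# CouchDB addresses a JSON document at /<database>/<document id>.
def build_put_request(database, doc_id, attributes)
  request = Net::HTTP::Put.new("/#{database}/#{doc_id}")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(attributes)
  request
end

request = build_put_request('blog', 'post_1', 'type' => 'post', 'title' => 'Hello')
puts "#{request.method} #{request.path}"
puts request.body
```

Sending it would be a matter of `Net::HTTP.start('localhost', 5984) { |http| http.request(request) }`, assuming CouchDB listens on its default port.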
CouchDB
•Rocks
  •Simplicity and elegance
  •Much more than a DB
  •New possibilities for web apps

•Sucks
  •Speed
  •Speed
  •Speed
Redis
       il meglio d'Italia (the best of Italy)

classy as a Giulietta, tasty as a pizza
Redis
•Salvatore 'antirez' Sanfilippo
•ANSI C - POSIX compliant

•Memcached-like (on steroids)
•Data structures store:
  •strings
  •counters
  •lists
  •sets + sorted sets (>= 1.1)
Ruby Libraries
•Redis

  •Client: redis-rb
  http://github.com/ezmobius/redis-rb


  •Hash/Object mapper: Ohm
  http://github.com/soveran/ohm


  •ORM: RedisRecord
  http://github.com/malditogeek/redisrecord
Redis
require 'redis'
redis = Redis.new

# Strings
redis['foo'] = 'bar' # => 'bar'
redis['foo']         # => 'bar'

# Expirations
redis.expire('foo', 5) # will expire existing key 'foo' in 5 sec
redis.set('foo', 'bar', 5) # set 'foo' with 5 sec expiration

# Counters
redis.incr('counter')     # => 1
redis.incr('counter', 10) # => 11
redis.decr('counter')     # => 10
Redis
# Lists
%w(1st 2nd 3rd).each { |item| redis.push_tail('logs', item) }
redis.list_range('logs', 0, -1) # => ["1st", "2nd", "3rd"]
redis.pop_head('logs')          # => "1st"
redis.pop_tail('logs')          # => "3rd"


# Sets
%w(one two).each { |item| redis.set_add('foo-tags', item) }
%w(two three).each { |item| redis.set_add('bar-tags', item) }
redis.set_intersect('foo-tags', 'bar-tags') # => ["two"]
redis.set_union('foo-tags', 'bar-tags')     # => ["three", "two", "one"]
Redis
•Rocks
  •Speed, in memory dataset
  •Asynchronous, non-blocking persistence
  •Non-blocking replication
  •Data structures with atomic operations
  •Ease of use and deployment
•Sucks
  •Sharding (client-side only at the moment)
  •Datasets > RAM
  •Very frequent code updates (?)
Redis
Upcoming coolness...


   •1.1
          •Sorted sets (ZSET), append-only journaling
   •1.2
          •HASH type, JSON dump tool
   •1.3
          •Virtual memory (datasets > RAM)
   •1.4
          •Redis-cluster proxy: consistent hashing and
          fault-tolerant nodes
   •1.5
          •Optimizations, UDP GET/SET
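
Consistent hashing, mentioned for 1.4, can also be done on the client side today. A minimal sketch of a hash ring; the class name `HashRing`, the node names and the number of points per node are all assumptions for illustration, not the Redis-cluster design:

```ruby
require 'digest/md5'

# Hypothetical hash ring: each node owns many points on a circle of hash values.
class HashRing
  def initialize(nodes, replicas = 100)
    @replicas = replicas
    @ring = {} # point on the ring => node name
    nodes.each { |node| add(node) }
  end

  def add(node)
    @replicas.times { |i| @ring[point_for("#{node}:#{i}")] = node }
    @points = @ring.keys.sort
  end

  def remove(node)
    @ring.delete_if { |_, owner| owner == node }
    @points = @ring.keys.sort
  end

  # A key belongs to the first ring point at or after the key's hash,
  # wrapping around to the first point of the ring.
  def node_for(key)
    hash = point_for(key)
    point = @points.bsearch { |p| p >= hash } || @points.first
    @ring[point]
  end

  private

  # Use the first 32 bits of the MD5 digest as the position on the ring.
  def point_for(string)
    Digest::MD5.hexdigest(string)[0, 8].to_i(16)
  end
end

ring = HashRing.new(%w[redis-a redis-b redis-c])
keys = (1..1000).map { |i| "Post:#{i}:title" }
before = keys.map { |key| ring.node_for(key) }
ring.remove('redis-c')
after = keys.map { |key| ring.node_for(key) }
moved = keys.each_index.count { |i| before[i] != after[i] }
puts "#{moved} of #{keys.size} keys moved after removing one node"
```

Only the keys that lived on the removed node change owner; with plain `hash % node_count` sharding almost every key would move.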
Cassandra

BigTable (by Google) + Dynamo (by Amazon)
Cassandra
Structured storage system over a P2P network


             •Developed at Facebook
             •Java

             •Dynamo: partition and
             replication
             •Bigtable: Log-structured
             ColumnFamily data model
Ruby Libraries
•Cassandra

  •Client: cassandra
  http://github.com/fauna/cassandra


  •ORM: cassandra_object
  http://github.com/NZKoz/cassandra_object


  •ORM: BigRecord
  http://github.com/openplaces/bigrecord
Cassandra
•Rocks
  •High Availability
  •Incremental Scalability
  •Minimal Administration
  •No Single Point of Failure
•Sucks
  •Thrift API (...not so bad)
  •Change Schema, restart server
  •The Logo
Demo Application




http://github.com/antoniogarrote/conf_rails_hispana_2009
Data Modeling
•Class mapping
•ID generation
•Relationships
   •one-to-one
   •one-to-many
   •many-to-many
•Index sorting
•Pagination
•Data filtering
Cassandra
•Class mapping
    • ColumnFamily :Blog, :Post

•ID generation
  •UUID.new(Time.now)

•Relationships
  •Use ColumnFamily :PostsforUser to
  hold all posts that belong to a user
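
The layout above can be pictured as nested hashes: ColumnFamily, then row key, then columns. A minimal in-memory sketch; it is not the cassandra gem's API, and `next_id` is a hypothetical stand-in for the time-based UUID (a zero-padded sequence, so ids still sort in creation order):

```ruby
# In-memory stand-in for a keyspace: ColumnFamily => row key => { column => value }.
keyspace = Hash.new { |cfs, cf| cfs[cf] = Hash.new { |rows, key| rows[key] = {} } }

# Hypothetical id generator replacing the time-based UUID from the slide.
sequence = 0
next_id = -> { format('post-%06d', sequence += 1) }

create_post = lambda do |user_id, title|
  id = next_id.call
  # Class mapping: one row per post in the :Post ColumnFamily.
  keyspace[:Post][id] = { 'title' => title, 'user_id' => user_id }
  # Relationship: the user's row in :PostsforUser gets one column per post id.
  keyspace[:PostsforUser][user_id][id] = title
  id
end

create_post.call('antonio', 'Hello Cassandra')
create_post.call('mauro', 'Hello Redis')
create_post.call('antonio', 'Second post')

# One-to-many lookup: read the column names of the user's row.
post_ids = keyspace[:PostsforUser]['antonio'].keys.sort
puts post_ids.inspect
```

The relationship is written at insert time rather than computed at read time, which is the usual trade in this kind of model.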
Cassandra
•Index sorting
  •Columns within a ColumnFamily are stored in
  sorted order. Keys are also sorted (if
  OrderPreservingPartitioner)
•Pagination
  •for keys get_range (start, finish, count)
  •for columns get_slice (start, finish, count)
•Data filtering
  •Use get_range/get_slice and play around with
  start/finish
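
Paging with (start, finish, count) means asking for one column more than the page size and using that extra column as the next start. A minimal sketch over an in-memory row; this `get_slice` is a hypothetical helper that only mimics the argument order on the slide, and the real client call may differ:

```ruby
# Hypothetical in-memory get_slice: returns up to `count` [name, value] pairs
# whose names fall between start and finish, in sorted column order.
# An empty string means "unbounded" on that side.
def get_slice(row, start, finish, count)
  row.sort
     .select { |name, _| (start.empty? || name >= start) && (finish.empty? || name <= finish) }
     .first(count)
end

row = (1..7).map { |i| ["post-#{i}", "Title #{i}"] }.to_h

page_size = 3
pages = []
start = ''
loop do
  # Fetch one extra column: it tells us where the next page starts.
  slice = get_slice(row, start, '', page_size + 1)
  pages << slice.first(page_size).map(&:first)
  break if slice.size <= page_size
  start = slice.last.first
end
puts pages.inspect
```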
Redis
•Class mapping
  • Namespaced keys: 'Post:5:title'

•ID generation
  •Redis counters: incr('Post:ids')

•Relationships
  •Redis lists: push_tail('Post:5:_rating_ids', 4)
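
Put together, the three conventions above look roughly like this. `FakeRedis` is a hypothetical in-memory stand-in that exposes only the calls used on these slides, so the sketch runs without a Redis server:

```ruby
# Hypothetical in-memory stand-in for the client, for illustration only.
class FakeRedis
  def initialize
    @data = {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value.to_s
  end

  def incr(key)
    @data[key] = (@data[key] || 0).to_i + 1
  end

  def push_tail(key, value)
    (@data[key] ||= []) << value.to_s
  end

  def list_range(key, first, last)
    (@data[key] || [])[first..last] || []
  end
end

redis = FakeRedis.new

# ID generation: one counter per class.
post_id = redis.incr('Post:ids')

# Class mapping: one namespaced key per attribute.
redis["Post:#{post_id}:title"] = 'Hello Redis'

# Relationship: a list of rating ids owned by the post.
[4, 7].each { |rating_id| redis.push_tail("Post:#{post_id}:_rating_ids", rating_id) }

puts redis["Post:#{post_id}:title"]
puts redis.list_range("Post:#{post_id}:_rating_ids", 0, -1).inspect
```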
Redis
•Index sorting
   •Redis sort:
      •sort 'Post:list', by 'Post:*:score', get
      'Post:*:id'


•Pagination
   •Redis lists: list_range('Post:list', 0, 9)

•Data filtering
  •Lookups: 'Post:permalink:fifth_post' => 5
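
What the three techniques above do can be sketched with a plain Hash standing in for the Redis keyspace. The scores and the helper name `sort_by_pattern` are made up for illustration, and the `get` part of the sort is left out:

```ruby
# Plain Hash standing in for the Redis keyspace (no server needed).
store = {
  'Post:list' => %w[1 2 3 4 5],
  'Post:1:score' => '10', 'Post:2:score' => '40', 'Post:3:score' => '20',
  'Post:4:score' => '30', 'Post:5:score' => '50',
  'Post:permalink:fifth_post' => '5'
}

# Sorting by a pattern: replace * with each list element, read that key,
# and order the elements by its numeric value.
def sort_by_pattern(store, list_key, pattern)
  store[list_key].sort_by { |id| store[pattern.sub('*', id)].to_f }
end

sorted = sort_by_pattern(store, 'Post:list', 'Post:*:score')

# Pagination: a page is an index range over the sorted list.
page_size = 2
page = 1 # zero-based page number
page_ids = sorted[page * page_size, page_size]

# Data filtering: a lookup key maps an attribute value straight to an id.
found_id = store['Post:permalink:fifth_post']

puts sorted.inspect
puts page_ids.inspect
puts found_id
```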
CouchDB
•Type attribute in each document
•CouchDB automatic ID generation
•Related document IDs in the
attributes
•Views with complex keys
•Special attributes for view functions
CouchDB
   View: relation_blog_posts

function(doc){
   if(doc.type=="post"){
       emit([doc.blog_id,
             doc.created_at],
             doc);
   }
}
CouchDB
    View: relation_blog_posts


GET /db/design_doc/relation_blog_posts?startkey=[blog_1]
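
Keys in the query string are JSON values. Since the view emits [blog_id, created_at] arrays and CouchDB sorts arrays element by element, with objects sorting after strings, a startkey of ["blog_1"] and an endkey of ["blog_1", {}] bracket all posts of one blog. A minimal sketch that only builds the URL; the path is copied from the slide as-is, and `view_query` is a hypothetical helper:

```ruby
require 'json'
require 'uri'

# Build the query URL for all rows whose key starts with the given blog id.
def view_query(path, blog_id)
  params = {
    'startkey' => JSON.generate([blog_id]),     # sorts before every [blog_id, date]
    'endkey'   => JSON.generate([blog_id, {}])  # {} sorts after any date string
  }
  "#{path}?#{URI.encode_www_form(params)}"
end

url = view_query('/db/design_doc/relation_blog_posts', 'blog_1')
puts url
```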
VPork
•Utility for load-testing a distributed hash table.
•Allows you to test raw throughput via
concurrent read/write operations.
•Hardware:
   •2 x commodity servers: Core Duo 2.5 GHz, 4 GB RAM,
   7200 RPM disks
   •CouchDB: 2 instances, round-robin balanced
   •Cassandra: 2 instances
   •Redis: 1 instance

http://github.com/antoniogarrote/vpork
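
The "read probability" used in the charts is the chance that any given operation is a read instead of a write. A minimal single-threaded sketch of such a mix against a plain Hash; it is not VPork's code, measures no throughput, and `run_workload` and the key range are assumptions:

```ruby
# Hypothetical workload loop: each operation is a read with the given
# probability, otherwise a write. Returns how many of each were issued.
def run_workload(store, operations, read_probability, rng)
  counts = { reads: 0, writes: 0 }
  operations.times do |i|
    key = "key-#{rng.rand(100)}"
    if rng.rand < read_probability
      store[key]
      counts[:reads] += 1
    else
      store[key] = "value-#{i}"
      counts[:writes] += 1
    end
  end
  counts
end

store = {}
[0.2, 0.5, 0.8].each do |probability|
  counts = run_workload(store, 10_000, probability, Random.new(42))
  puts "read probability #{probability}: #{counts[:reads]} reads, #{counts[:writes]} writes"
end
```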
VPork
Throughput with read probability 0.2
VPork
Throughput with read probability 0.5
VPork
Throughput with read probability 0.8
Conclusions
•Complementary to relational solutions
•Each K/V store addresses a different problem
•Best use case:
  •CouchDB: distributed/scalable
  JavaScript-only app (no backend)
  •Cassandra: large volume of writes, no
  SPOF
  •Redis: datasets < RAM, lookups,
  cache, buffers
Credits
•All sponsored products, company names, brand names,
trademarks and logos are the property of their respective
owners.
•Alfa Romeo Giulietta: http://www.flickr.com/photos/mauboi/3296469097/
•Pizza: http://reportingfrombelgium.wordpress.com/2009/05/20/belgian-summer-holidays/
•Sammy: http://www.yuddy.com/celebrity/Sammy-Davis-Jr/bio
•Everything else is from teh internets and is free.
