KeyValue Stores

Technical overview of three of the most representative KeyValue Stores: Cassandra, Redis and CouchDB. Focused on Ruby and Ruby on Rails development, this talk shows how to solve common problems, the most popular libraries, benchmarking and the best use case for each one of them.

This talk was part of the Conferencia Rails 2009, Madrid, Spain.


KeyValue Stores Presentation Transcript

  • KeyValue Stores Jedi Master Edition
  • Who? Antonio Garrote @antoniogarrote Mauro Pompilio @malditogeek Pablo Delgado @pablete
  • Agenda •Why? •Definitions •CouchDB •Redis •Cassandra •Ruby Libraries •Demo application •Data modeling •Benchmark
  • Why? •Scalability •Availability •Fault Tolerance •Schema-free •Ease of use •Performance •Elasticity •blah blah blah
  • NO silver bullet!
  • NoSQL != "No SQL"; it stands for "Not Only SQL"
  • Taxonomy •Key-value stores: Redis, Voldemort, Cassandra •Column-oriented datastores: Cassandra, HBase •Document collection databases: CouchDB, MongoDB •Graph database: Neo4J, AllegroGraph •Data structure store: Redis
  • CouchDB relax! •Damien Katz •Erlang - OTP compliant •schema-less documents •high availability •completely distributed •made for the web
  • CouchDB B-Trees . MapReduce . MVCC
  • Ruby Libraries •CouchDB •Pure: net/http + JSON implementation •Thin wrapper: Couchrest •ORM/ActiveRecord: ActiveCouch, CouchObject, RelaxDB ..etc
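The "pure" option above can be sketched with nothing but the standard library. This is a minimal illustration, assuming a CouchDB server on localhost at its default port 5984 and a database called "blog"; the database name, document id and attributes are made up for the example. The request is only built here, not sent, so the snippet runs without a server.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumed for illustration: a local CouchDB on its default port.
COUCH_URL = 'http://localhost:5984'

# Build (but do not send) the PUT request that stores a document under an id.
# build_put_document is a hypothetical helper, not part of any library.
def build_put_document(db, id, attributes)
  uri = URI.parse("#{COUCH_URL}/#{db}/#{id}")
  request = Net::HTTP::Put.new(uri.path)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(attributes)
  request
end

request = build_put_document('blog', 'post-1', 'type' => 'post', 'title' => 'Hello')
puts "#{request.method} #{request.path}"
puts request.body

# To send it against a running server:
#   Net::HTTP.start('localhost', 5984) { |http| http.request(request) }
```

A thin wrapper such as Couchrest packages this kind of request building and JSON handling behind a friendlier interface.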
  • CouchDB •Rocks •Simplicity and elegance •Much more than a DB •New possibilities for web apps •Sucks •Speed •Speed •Speed
  • Redis: il meglio d'Italia (the best of Italy). Classy as a Giulietta, tasty as a pizza
  • Redis •Salvatore 'antirez' Sanfilippo •ANSI C - POSIX compliant •MemCache-like (on steroids) •Data structures store: •strings •counters •lists •sets + sorted sets (>= 1.1)
  • Ruby Libraries •Redis •Client: redis-rb •Hash/Object mapper: Ohm •ORM: RedisRecord
  • Redis
      require 'redis'
      redis = Redis.new
      # Strings
      redis['foo'] = 'bar'          # => 'bar'
      redis['foo']                  # => 'bar'
      # Expirations
      redis.expire('foo', 5)        # will expire existing key 'foo' in 5 sec
      redis.set('foo', 'bar', 5)    # set 'foo' with 5 sec expiration
      # Counters
      redis.incr('counter')         # => 1
      redis.incr('counter', 10)     # => 11
      redis.decr('counter')         # => 10
  • Redis
      # Lists
      %w(1st 2nd 3rd).each { |item| redis.push_tail('logs', item) }
      redis.list_range('logs', 0, -1)   # => ["1st", "2nd", "3rd"]
      redis.pop_head('logs')            # => "1st"
      redis.pop_tail('logs')            # => "3rd"
      # Sets
      %w(one two).each { |item| redis.set_add('foo-tags', item) }
      %w(two three).each { |item| redis.set_add('bar-tags', item) }
      redis.set_intersect('foo-tags', 'bar-tags')   # => ["two"]
      redis.set_union('foo-tags', 'bar-tags')       # => ["three", "two", "one"]
  • Redis •Rocks •Speed, in memory dataset •Asynch non-blocking persistence •Non-blocking replication •Data structures with atomic operations •Ease of use and deployment •Sucks •Sharding (client-side only at the moment) •Datasets > RAM •Very frequent code updates (?)
  • Redis Upcoming coolness... •1.1 •Sorted sets (ZSET), append-only journaling •1.2 •HASH type, JSON dump tool •1.3 •Virtual memory (datasets > RAM) •1.4 •Redis-cluster proxy: consistent hashing and fault-tolerant nodes •1.5 •Optimizations, UDP GET/SET
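Consistent hashing, mentioned for the planned cluster proxy and equally usable for client-side sharding, can be sketched in a few lines. This is a generic illustration of the technique, not the Redis-cluster design; the class name, node names and replica count are assumptions made for the example.

```ruby
require 'digest/md5'

# A minimal consistent-hash ring (hypothetical class, not part of redis-rb).
class HashRing
  def initialize(nodes, replicas = 100)
    @ring = {}
    # Place several virtual points per node on the ring to even out the load.
    nodes.each do |node|
      replicas.times { |i| @ring[hash_of("#{node}:#{i}")] = node }
    end
    @sorted_points = @ring.keys.sort
  end

  # The owner is the first ring point at or after the key's hash,
  # wrapping around to the first point of the ring.
  def node_for(key)
    h = hash_of(key)
    point = @sorted_points.bsearch { |candidate| candidate >= h } || @sorted_points.first
    @ring[point]
  end

  private

  # Take the first 32 bits of the MD5 digest as the ring position.
  def hash_of(string)
    Digest::MD5.hexdigest(string)[0, 8].to_i(16)
  end
end

keys = (1..1000).map { |i| "Post:#{i}:title" }

ring = HashRing.new(%w[redis-a redis-b redis-c])
before = keys.map { |key| ring.node_for(key) }

# Drop one node: only the keys that lived on it are remapped.
smaller = HashRing.new(%w[redis-a redis-b])
after = keys.map { |key| smaller.node_for(key) }

moved = keys.each_index.count { |i| before[i] != after[i] }
puts "#{moved} of #{keys.size} keys moved after removing redis-c"
```

The point of the design is visible in the last lines: with plain modulo sharding, removing a node would remap most keys, whereas here only the removed node's share moves.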
  • Cassandra = BigTable (by Google) + Dynamo (by Amazon)
  • Cassandra Structured Storage System over a P2P network •Developed at Facebook •Java •Dynamo: partition and replication •Bigtable: Log-structured ColumnFamily data model
  • Ruby Libraries •Cassandra •Client: cassandra •ORM: cassandra_object •ORM: BigRecord
  • Cassandra •Rocks •High Availability •Incremental Scalability •Minimal Administration •No Single Point of Failure •Sucks •Thrift API (...not so bad) •Change Schema, restart server •The Logo
  • Demo Application
  • Data Modeling •Class mapping •ID generation •Relationships •one-to-one •one-to-many •many-to-many •Index sorting •Pagination •Data filtering
  • Cassandra •Class mapping • ColumnFamily :Blog, :Post •ID generation • •Relationships •Use ColumnFamily :PostsforUser to hold all posts that belong to a user
  • Cassandra •Index sorting •Columns within a ColumnFamily are stored in sorted order. Keys are also sorted (if OrderPreservingPartitioner) •Pagination •for keys get_range (start, finish, count) •for columns get_slice (start, finish, count) •Data filtering •Use get_range/get_slice and play around with start/finish
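The pagination idea above (columns kept in sorted order, then sliced by start, finish and count) can be sketched in plain Ruby. This is a stand-in for illustration only: the slice helper is hypothetical and is not the cassandra gem's API, and the row data is made up.

```ruby
# A row modelled as column name => value (made-up data).
row = {
  '2009-11-12' => 'post-d',
  '2009-11-01' => 'post-a',
  '2009-11-20' => 'post-e',
  '2009-11-03' => 'post-b',
  '2009-11-07' => 'post-c'
}

# Hypothetical helper mimicking a (start, finish, count) slice: walk the
# columns in sorted order, keep names within [start, finish] and return at
# most count of them. An empty string means "unbounded" on that side.
def slice(row, start, finish, count)
  in_range = row.sort.select do |name, _value|
    (start.empty? || name >= start) && (finish.empty? || name <= finish)
  end
  in_range.first(count)
end

page_size = 2
page1 = slice(row, '', '', page_size)

# The next page starts at the last column already seen, so fetch one extra
# column and drop the overlapping first one.
page2 = slice(row, page1.last[0], '', page_size + 1).drop(1)

p page1.map(&:first)
p page2.map(&:first)
```

Data filtering works the same way: narrowing start and finish (for example to one month of timestamps) restricts the result without any secondary index.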
  • Redis •Class mapping • Namespaced keys: 'Post:5:title' •ID generation •Redis counters: incr('Post:ids') •Relationships •Redis lists: push_tail('Post:5:_rating_ids', 4)
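The three conventions above fit together as in the following sketch. To keep it runnable without a server it uses FakeRedis, a hypothetical in-memory stand-in written for this example that implements only the calls named on the slide; it is not redis-rb, and values are stored as strings to mirror what Redis returns.

```ruby
# Hypothetical in-memory double for a few Redis commands (illustration only).
class FakeRedis
  def initialize
    @data = {}
  end

  # Atomic counter in real Redis; a plain increment here.
  def incr(key)
    @data[key] = @data.fetch(key, 0) + 1
  end

  def []=(key, value)
    @data[key] = value.to_s
  end

  def [](key)
    @data[key]
  end

  def push_tail(key, value)
    (@data[key] ||= []) << value.to_s
  end

  # Both indices are inclusive; negative indices count from the end.
  def list_range(key, first, last)
    @data.fetch(key, [])[first..last] || []
  end
end

redis = FakeRedis.new

# ID generation: one counter per class.
post_id = redis.incr('Post:ids')

# Class mapping: one namespaced key per attribute.
redis["Post:#{post_id}:title"] = 'Hello KeyValue'

# Relationships: a list of related ids hanging off the owning object.
[4, 7].each { |rating_id| redis.push_tail("Post:#{post_id}:_rating_ids", rating_id) }

puts redis["Post:#{post_id}:title"]
p redis.list_range("Post:#{post_id}:_rating_ids", 0, -1)
```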
  • Redis •Index sorting •Redis sort: •sort 'Post:list', by 'Post:*:score', get 'Post:*:id' •Pagination •Redis lists: list_range('Post:list', 0, -9) •Data filtering •Lookups: 'Post:permalink:fifth_post' => 5
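What these three operations compute can be shown with plain Ruby collections. The hash below stands in for Redis keys and its contents are made up; the page helper is hypothetical. The sketch only mirrors the semantics (substitute each list element for the * in the pattern, inclusive range bounds, a lookup key pointing at an id) and does not talk to a server.

```ruby
# Plain hash standing in for Redis keys (made-up data).
store = {
  'Post:list'    => %w[1 2 3 4 5],
  'Post:1:score' => '10',
  'Post:2:score' => '42',
  'Post:3:score' => '7',
  'Post:4:score' => '99',
  'Post:5:score' => '23',
  'Post:permalink:fifth_post' => '5'
}

# Index sorting: order the ids in 'Post:list' by the value stored at
# 'Post:<id>:score', which is what sorting BY 'Post:*:score' expresses.
sorted_ids = store['Post:list'].sort_by { |id| store["Post:#{id}:score"].to_i }

# Pagination: range bounds are inclusive, so page n of size s covers
# indices n*s through n*s + s - 1.
def page(list, number, size)
  first = number * size
  list[first..(first + size - 1)] || []
end

# Data filtering: a lookup key maps a natural identifier to the object id.
id = store['Post:permalink:fifth_post']

p sorted_ids
p page(sorted_ids, 0, 2)
p page(sorted_ids, 2, 2)
puts "fifth_post has score #{store["Post:#{id}:score"]}"
```

Because both bounds are inclusive, the first page of ten items is the range 0 to 9.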
  • CouchDB •Type attribute in each document •CouchDB automatic ID generation •Related document IDs in the attributes •Views with complex keys •Special attributes for view functions
  • CouchDB View: relation_blog_posts
      function(doc) {
        if (doc.type == "post") {
          emit([doc.blog_id, doc.created_at], doc);
        }
      }
  • CouchDB View: relation_blog_posts GET /db/design_doc/relation_blog_posts? startkey=[blog_1]
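Why a complex key answers "all posts of one blog, in date order" can be simulated in Ruby. The documents are made up, and the upper bound is an assumption: the slide shows only startkey, while CouchDB queries commonly pair it with an endkey such as ["blog_1", {}]. Ruby's element-by-element array comparison is used as an approximation of CouchDB's key collation.

```ruby
# Made-up documents; one of them is not a post and must be skipped.
docs = [
  { 'type' => 'post', 'blog_id' => 'blog_1', 'created_at' => '2009-11-02', 'title' => 'B' },
  { 'type' => 'post', 'blog_id' => 'blog_2', 'created_at' => '2009-11-01', 'title' => 'X' },
  { 'type' => 'post', 'blog_id' => 'blog_1', 'created_at' => '2009-11-01', 'title' => 'A' },
  { 'type' => 'blog', 'name' => 'not a post' }
]

# Map step: same shape as the relation_blog_posts view, emitting a
# [blog_id, created_at] key for every post.
rows = docs.select { |doc| doc['type'] == 'post' }
           .map { |doc| { 'key' => [doc['blog_id'], doc['created_at']], 'value' => doc } }

# The view index is kept sorted by key.
index = rows.sort_by { |row| row['key'] }

# Hypothetical range query: keep rows whose key lies within [startkey, endkey].
def query(index, startkey, endkey)
  index.select do |row|
    (row['key'] <=> startkey) >= 0 && (row['key'] <=> endkey) <= 0
  end
end

# ["blog_1"] sorts before every ["blog_1", date]; the high sentinel string
# sorts after every date and plays the role of {} in a real CouchDB endkey.
posts = query(index, ['blog_1'], ['blog_1', "\u{FFFF}"])
p posts.map { |row| row['value']['title'] }
```

Since the blog id is the first key element and the timestamp the second, one range scan returns a blog's posts already ordered by creation date.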
  • VPork •Utility for load-testing a distributed hash table. •Allows you to test raw throughput via concurrent read/write operations. •Hardware: •2 x commodity servers: Core Duo 2.5 GHz, 4 GB RAM, 7200 RPM disks •CouchDB: 2 instances, round-robin balanced •Cassandra: 2 instances •Redis: 1 instance
  • VPork Throughput with read probability 0.2
  • VPork Throughput with read probability 0.5
  • VPork Throughput with read probability 0.8
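The "read probability" parameter in these runs describes the mix of operations: each operation is a read with that probability and a write otherwise. The sketch below is a single-threaded miniature of such a workload against a plain Hash. It is an illustration under stated assumptions (run_workload is a hypothetical helper, the key space and operation count are arbitrary) and does not reproduce VPork's behaviour, concurrency or results.

```ruby
# Hypothetical miniature of a mixed read/write workload.
def run_workload(store, operations, read_probability, rng)
  counts = { reads: 0, writes: 0 }
  operations.times do |i|
    key = "key-#{rng.rand(100)}"
    if rng.rand < read_probability
      store[key]                 # read
      counts[:reads] += 1
    else
      store[key] = "value-#{i}"  # write
      counts[:writes] += 1
    end
  end
  counts
end

results = {}
[0.2, 0.5, 0.8].each do |read_probability|
  # A seeded generator keeps the run repeatable.
  results[read_probability] = run_workload({}, 10_000, read_probability, Random.new(42))
end

results.each do |read_probability, counts|
  puts "read probability #{read_probability}: #{counts[:reads]} reads, #{counts[:writes]} writes"
end
```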
  • Conclusions •Complementary to relational solutions •Each K/V store addresses a different problem •Best use case: •CouchDB: distributed/scalable Javascript-only app (no backend) •Cassandra: large volume of writes, no SPOF •Redis: datasets < RAM, lookups, cache, buffers
  • Credits •All sponsored products, company names, brand names, trademarks and logos are the property of their respective owners. •Alfa Romeo Giulietta: mauboi/3296469097/ •Pizza: 05/20/belgian-summer-holidays/ •Sammy: Jr/bio •Everything else is from teh internets and is free.