KeyValue Stores


Published on

Technical overview of three of the most representative KeyValue Stores: Cassandra, Redis and CouchDB. Focused on Ruby and Ruby on Rails developement, this talk shows how to solve common problems, the most popular libraries, benchmarking and the best use case for each one of them.

This talk was part of the Conferencia Rails 2009, Madrid, Spain.

Published in: Technology
  • Be the first to comment

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • KeyValue Stores

    1. 1. KeyValue Stores Jedi Master Edition
    2. 2. Who? Antonio Garrote @antoniogarrote Mauro Pompilio @malditogeek Pablo Delgado @pablete
    3. 3. Agenda •Why? •Definitions •CouchDB •Redis •Cassandra •Ruby Libraries •Demo application •Data modeling •Benchmark
    4. 4. Why? •Scalability •Availability •Fault Tolerance •Schema-free •Ease of use •Performance •Elasticity •blah blah blah
    5. 5. NO silver bullet!
    6. 6. NoSQL != NoSQL No SQL Not Only SQL
    7. 7. Taxonomy •Key-value stores: Redis, Voldemort, Cassandra •Column-oriented datastores: Cassandra, HBase •Document collection databases: CouchDB, MongoDB •Graph database: Neo4J, AllegroGraph •Data structure store: Redis
    8. 8. CouchDB relax! •Damien Katz •Erlang - OTP compliant •schema-less documents •high availability •completely distributed •made for the web
    9. 9. CouchDB B-Trees . MapReduce . MVCC
    10. 10. Ruby Libraries •CouchDB •Pure: net/http + JSON implementation •Thin wrapper: Couchrest •ORM/ActiveRecord: ActiveCouch, CouchObject, RelaxDB ..etc
    11. 11. CouchDB •Rocks •Simplicity and elegance •Much more than a DB •New possibilities for web apps •Sucks •Speed •Speed •Speed
    12. 12. Redis il meglio d'Italia classy as a tasty as Giulietta a pizza
    13. 13. Redis •Salvatore 'antirez' Sanfilippo •ANSI C - POSIX compliant •MemCache-like (on steroids) •Data structures store: •strings •counters •lists •sets + sorted sets (>= 1.1)
    14. 14. Ruby Libraries •Redis •Client: redis-rb •Hash/Object mapper: Ohm •ORM: RedisRecord
    15. 15. Redis require 'redis' redis = # Strings redis['foo'] = 'bar' # => 'bar' redis['foo'] # => 'bar' # Expirations redis.expire('foo', 5) # will expire existing key 'foo' in 5 sec redis.set('foo', 'bar', 5) # set 'foo' with 5 sec expiration # Counters redis.incr('counter') # => 1 redis.incr('counter', 10) # => 11 redis.decr('counter') # => 10
    16. 16. Redis # Lists %w(1st 2nd 3rd).each { |item| redis.push_tail('logs', item) } redis.list_range('logs', 0, -1) # => ["1st", "2nd", "3rd"] redis.pop_head('logs') # => "1st" redis.pop_tail('logs') # => "3rd" # Sets %w(one two).each { |item| redis.set_add('foo-tags', item) } %w(two three).each { |item| redis.set_add('bar-tags', item) } redis.set_intersect('foo-tags', 'bar-tags') # => ["two"] redis.set_union('foo-tags', 'bar-tags') # => ["three", "two", "one"]
    17. 17. Redis •Rocks •Speed, in memory dataset •Asynch non-blocking persistence •Non-blocking replication •Data structures with atomic operations •Ease of use and deployment •Sucks •Sharding (client-side only at the moment) •Datasets > RAM •Very frequent code updates (?)
    18. 18. Redis Upcoming coolness... •1.1 •Sorted sets (ZSET), append-only journaling •1.2 •HASH type, JSON dump tool •1.3 •Virtual memory (datasets > RAM) •1.4 •Redis-cluster proxy: consistent hashing and fault tollerant nodes •1.5 •Optimizations, UDP GET/SET
    19. 19. Cassandra BigTable Dynamo by + by
    20. 20. Cassandra Structure Storage System over P2P network •Developed at Facebook •Java •Dynamo: partition and replication •Bigtable: Log-structured ColumnFamily data model
    21. 21. Ruby Libraries •Cassandra •Client: cassandra •ORM: cassandra_object •ORM: BigRecord
    22. 22. Cassandra •Rocks •High Availability •Incremental Scalability •Minimal Administration •No Single Point of Failure •Sucks •Thrift API (...not so bad) •Change Schema, restart server •The Logo
    23. 23. Demo Application
    24. 24. Data Modeling •Class mapping •ID generation •Relationships •one-to-one •one-to-many •many-to-many •Index sorting •Pagination •Data filtering
    25. 25. Cassandra •Class mapping • ColumnFamily :Blog, :Post •ID generation • •Relationships •Use ColumnFamily :PostsforUser to hold all posts that belong to a user
    26. 26. Cassandra •Index sorting •Columns within a ColumnFamily are stored in sorted order. Keys are also sorted (if OrderPreservingPartitioner) •Pagination •for keys get_range (start, finish, count) •for columns get_slice (start, finish, count) •Data filtering •Use get_range/get_slice and play around with start/finish
    27. 27. Redis •Class mapping • Namespaced keys: 'Post:5:title' •ID generation •Redis counters: incr('Post:ids') •Relationships •Redis lists: push_tail('Post:5:_rating_ids', 4)
    28. 28. Redis •Index sorting •Redis sort: •sort 'Post:list', by 'Post:*:score', get 'Post:*:id' •Pagination •Redis lists: list_range('Post:list', 0, -9) •Data filtering •Lookups: 'Post:permalink:fifth_post' => 5
    29. 29. CouchDB •Type attribute in each document •CouchDB automatic ID generation •Related document IDs in the attributes •Views with complex keys •Special attributes for view functions
    30. 30. CouchDB View: relation_blog_posts function(doc){ if(doc.type=="post"){ emit([doc.blog_id, doc.created_at], doc); } }
    31. 31. CouchDB View: relation_blog_posts GET /db/design_doc/relation_blog_posts? startkey=[blog_1]
    32. 32. VPork •Utility for load-testing a distributed hash table. •Allows you to test raw throughput via concurrent read/writes operations. •Hardware: •2 x comodity servers: CoreDuo 2.5Ghz, 4Gb RAM, 7200RPM disks •CouchDB: 2 instances, round-robin balanced •Cassandra: 2 instances •Redis: 1 instance
    33. 33. VPork Throughput with read probability 0.2
    34. 34. VPork Throughput with read probability 0.5
    35. 35. VPork Throughput with read probability 0.8
    36. 36. Conclusions •Complementary to relational solutions •Each K/V address a different problem •Best use case: •CouchDB: distributed/scalable Javascript-only app (no backend) •Cassandra: big amount of writes, no SPOF •Redis: datasets < RAM, lookups, cache, buffers
    37. 37. Credits •All sponsored products, company names, brand names, trademarks and logos are the property of their respective owners. •Alfa Romeo Giulietta: mauboi/3296469097/ •Pizza: 05/20/belgian-summer-holidays/ •Sammy: Jr/bio •Everything else is from teh internets and is free.
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.