KeyValue Stores
Technical overview of three of the most representative KeyValue Stores: Cassandra, Redis and CouchDB. Focused on Ruby and Ruby on Rails development, this talk shows how to solve common problems and covers the most popular libraries, benchmarking and the best use case for each store.

This talk was part of the Conferencia Rails 2009, Madrid, Spain.

Published in: Technology

  • Transcript

    • 1. KeyValue Stores Jedi Master Edition
    • 2. Who? Antonio Garrote @antoniogarrote Mauro Pompilio @malditogeek Pablo Delgado @pablete
    • 3. Agenda •Why? •Definitions •CouchDB •Redis •Cassandra •Ruby Libraries •Demo application •Data modeling •Benchmark
    • 4. Why? •Scalability •Availability •Fault Tolerance •Schema-free •Ease of use •Performance •Elasticity •blah blah blah
    • 5. NO silver bullet!
    • 6. NoSQL != NoSQL No SQL Not Only SQL
    • 7. Taxonomy •Key-value stores: Redis, Voldemort, Cassandra •Column-oriented datastores: Cassandra, HBase •Document collection databases: CouchDB, MongoDB •Graph database: Neo4J, AllegroGraph •Data structure store: Redis
    • 8. CouchDB relax! •Damien Katz •Erlang - OTP compliant •schema-less documents •high availability •completely distributed •made for the web
    • 9. CouchDB B-Trees . MapReduce . MVCC
    • 10. Ruby Libraries •CouchDB •Pure: net/http + JSON implementation •Thin wrapper: Couchrest •ORM/ActiveRecord: ActiveCouch, CouchObject, RelaxDB ..etc
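The "pure" option on slide 10 amounts to talking to CouchDB's HTTP interface directly. A minimal sketch, assuming a hypothetical database named `blog` and a hypothetical document ID `post-1` (neither comes from the slides); the request is only built here, and the line that would send it is left commented out because it needs a running server:

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumed local CouchDB instance on its default port.
COUCH_URI = URI('http://127.0.0.1:5984/')

# Build (but do not send) the PUT request that stores a document
# under an explicit ID. The db and id arguments are hypothetical names.
def build_put_document(db, id, doc)
  request = Net::HTTP::Put.new("/#{db}/#{id}")
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(doc)
  request
end

request = build_put_document('blog', 'post-1', 'type' => 'post', 'title' => 'Hello')

# Sending requires a running CouchDB instance:
# response = Net::HTTP.start(COUCH_URI.host, COUCH_URI.port) { |http| http.request(request) }
# JSON.parse(response.body)
```

Wrappers such as Couchrest hide this plumbing, but the underlying exchange is plain HTTP with a JSON body.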
    • 11. CouchDB •Rocks •Simplicity and elegance •Much more than a DB •New possibilities for web apps •Sucks •Speed •Speed •Speed
    • 12. Redis il meglio d'Italia: classy as a Giulietta, tasty as a pizza
    • 13. Redis •Salvatore 'antirez' Sanfilippo •ANSI C - POSIX compliant •MemCache-like (on steroids) •Data structures store: •strings •counters •lists •sets + sorted sets (>= 1.1)
    • 14. Ruby Libraries •Redis •Client: redis-rb •Hash/Object mapper: Ohm •ORM: RedisRecord
    • 15. Redis
          require 'redis'
          redis = Redis.new
          # Strings
          redis['foo'] = 'bar'          # => 'bar'
          redis['foo']                  # => 'bar'
          # Expirations
          redis.expire('foo', 5)        # will expire existing key 'foo' in 5 sec
          redis.set('foo', 'bar', 5)    # set 'foo' with 5 sec expiration
          # Counters
          redis.incr('counter')         # => 1
          redis.incr('counter', 10)     # => 11
          redis.decr('counter')         # => 10
    • 16. Redis
          # Lists
          %w(1st 2nd 3rd).each { |item| redis.push_tail('logs', item) }
          redis.list_range('logs', 0, -1)   # => ["1st", "2nd", "3rd"]
          redis.pop_head('logs')            # => "1st"
          redis.pop_tail('logs')            # => "3rd"
          # Sets
          %w(one two).each { |item| redis.set_add('foo-tags', item) }
          %w(two three).each { |item| redis.set_add('bar-tags', item) }
          redis.set_intersect('foo-tags', 'bar-tags')   # => ["two"]
          redis.set_union('foo-tags', 'bar-tags')       # => ["three", "two", "one"]
    • 17. Redis •Rocks •Speed, in memory dataset •Asynch non-blocking persistence •Non-blocking replication •Data structures with atomic operations •Ease of use and deployment •Sucks •Sharding (client-side only at the moment) •Datasets > RAM •Very frequent code updates (?)
    • 18. Redis Upcoming coolness... •1.1 •Sorted sets (ZSET), append-only journaling •1.2 •HASH type, JSON dump tool •1.3 •Virtual memory (datasets > RAM) •1.4 •Redis-cluster proxy: consistent hashing and fault-tolerant nodes •1.5 •Optimizations, UDP GET/SET
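Slides 17 and 18 mention client-side sharding and consistent hashing. The idea can be sketched as below; this is an illustrative ring, not the redis-cluster proxy or any library's implementation. The `HashRing` class, the node names and the replica count are all hypothetical:

```ruby
require 'digest/md5'

# Hypothetical client-side hash ring. Each node is placed on the ring at
# several points so keys spread more evenly across nodes.
class HashRing
  def initialize(nodes, replicas = 100)
    @ring = {}
    nodes.each do |node|
      replicas.times { |i| @ring[hash_of("#{node}:#{i}")] = node }
    end
    @sorted_points = @ring.keys.sort
  end

  # Return the node owning the first ring point at or after the key's hash,
  # wrapping around to the first point when the key hashes past the last one.
  def node_for(key)
    h = hash_of(key)
    point = @sorted_points.bsearch { |p| p >= h } || @sorted_points.first
    @ring[point]
  end

  private

  # Take the first 32 bits of an MD5 digest as the position on the ring.
  def hash_of(value)
    Digest::MD5.hexdigest(value)[0, 8].to_i(16)
  end
end

nodes = %w[redis-a redis-b redis-c] # placeholder node names
ring = HashRing.new(nodes)
keys = (1..1000).map { |i| "Post:#{i}:title" }
before = keys.map { |k| ring.node_for(k) }

# Drop one node: only the keys that lived on it should change owner.
smaller = HashRing.new(%w[redis-a redis-b])
after = keys.map { |k| smaller.node_for(k) }
moved = keys.each_index.count { |i| before[i] != after[i] }
puts "#{moved} of #{keys.size} keys moved"
```

The point of the ring over a plain `hash % node_count` scheme is that removing a node relocates only that node's keys instead of reshuffling nearly everything.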
    • 19. Cassandra = BigTable (by Google) + Dynamo (by Amazon)
    • 20. Cassandra Structured Storage System over a P2P network •Developed at Facebook •Java •Dynamo: partition and replication •Bigtable: Log-structured ColumnFamily data model
    • 21. Ruby Libraries •Cassandra •Client: cassandra •ORM: cassandra_object •ORM: BigRecord
    • 22. Cassandra •Rocks •High Availability •Incremental Scalability •Minimal Administration •No Single Point of Failure •Sucks •Thrift API (...not so bad) •Change Schema, restart server •The Logo
    • 23. Demo Application
    • 24. Data Modeling •Class mapping •ID generation •Relationships •one-to-one •one-to-many •many-to-many •Index sorting •Pagination •Data filtering
    • 25. Cassandra •Class mapping • ColumnFamily :Blog, :Post •ID generation • •Relationships •Use ColumnFamily :PostsforUser to hold all posts that belong to a user
    • 26. Cassandra •Index sorting •Columns within a ColumnFamily are stored in sorted order. Keys are also sorted (if OrderPreservingPartitioner) •Pagination •for keys get_range (start, finish, count) •for columns get_slice (start, finish, count) •Data filtering •Use get_range/get_slice and play around with start/finish
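The slice-based pagination on slide 26 can be sketched without a cluster. The `get_slice` helper below is an in-memory stand-in that only mirrors the `(start, finish, count)` shape from the slide; it is not the cassandra gem's API, and the row contents are made-up sample data. In this sketch an empty string means "unbounded":

```ruby
# In-memory stand-in for one row of a ColumnFamily: column name => value.
# Columns come back in sorted order, inclusive of start and finish.
def get_slice(row, start, finish, count)
  row.keys.sort
     .select { |name| (start.empty? || name >= start) && (finish.empty? || name <= finish) }
     .first(count)
     .map { |name| [name, row[name]] }
end

# Fetch one extra column per page and use it as the start of the next page.
def each_page(row, page_size)
  start = ''
  pages = []
  loop do
    slice = get_slice(row, start, '', page_size + 1)
    pages << slice.first(page_size)
    break if slice.size <= page_size
    start = slice.last[0]
  end
  pages
end

# Hypothetical PostsforUser row: creation date => post key.
posts_for_user = {
  '2009-11-01' => 'post-1', '2009-11-03' => 'post-2', '2009-11-07' => 'post-3',
  '2009-11-12' => 'post-4', '2009-11-20' => 'post-5'
}
pages = each_page(posts_for_user, 2)
```

Filtering works the same way: narrowing `start` and `finish` restricts the slice to a range of column names, which is what "play around with start/finish" refers to.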
    • 27. Redis •Class mapping • Namespaced keys: 'Post:5:title' •ID generation •Redis counters: incr('Post:ids') •Relationships •Redis lists: push_tail('Post:5:_rating_ids', 4)
    • 28. Redis •Index sorting •Redis sort: •sort 'Post:list', by 'Post:*:score', get 'Post:*:id' •Pagination •Redis lists: list_range('Post:list', 0, -9) •Data filtering •Lookups: 'Post:permalink:fifth_post' => 5
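The Redis modeling conventions from slides 27 and 28 (namespaced keys, counter-based IDs, list-based pagination, lookup keys) can be tried out with a small in-memory stand-in. `FakeRedis` and `create_post` are hypothetical; the stand-in borrows the method names used on the slides so the sketch runs without a server, and it does not cover `sort`:

```ruby
# Minimal in-memory stand-in so the sketch runs without a Redis server.
# Method names follow the ones used on the slides.
class FakeRedis
  def initialize
    @data = {}
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value.to_s
  end

  def incr(key)
    @data[key] = (@data[key] || 0).to_i + 1
  end

  def push_tail(key, value)
    (@data[key] ||= []) << value.to_s
  end

  # Inclusive range; negative indexes count from the end.
  def list_range(key, first, last)
    (@data[key] || [])[first..last] || []
  end
end

redis = FakeRedis.new

# Hypothetical helper: allocate an ID from a counter, store the attribute
# under a namespaced key, and keep a permalink lookup key pointing at the ID.
def create_post(redis, title, permalink)
  id = redis.incr('Post:ids')
  redis["Post:#{id}:title"] = title
  redis["Post:permalink:#{permalink}"] = id
  redis.push_tail('Post:list', id)
  id
end

%w[first second third fourth fifth].each do |name|
  create_post(redis, "The #{name} post", "#{name}_post")
end

page_size = 2
page = 2 # 1-based page number
offset = (page - 1) * page_size
ids_on_page = redis.list_range('Post:list', offset, offset + page_size - 1)
titles = ids_on_page.map { |id| redis["Post:#{id}:title"] }
fifth_id = redis['Post:permalink:fifth_post']
```

Because every attribute lives under its own key, reading a whole object costs one lookup per attribute; mappers such as Ohm exist to hide that bookkeeping.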
    • 29. CouchDB •Type attribute in each document •CouchDB automatic ID generation •Related document IDs in the attributes •Views with complex keys •Special attributes for view functions
    • 30. CouchDB View: relation_blog_posts
          function(doc) {
            if (doc.type == "post") {
              emit([doc.blog_id, doc.created_at], doc);
            }
          }
    • 31. CouchDB View: relation_blog_posts GET /db/design_doc/relation_blog_posts?startkey=[blog_1]
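How a view with complex keys answers "all posts of one blog, in creation order" can be simulated in Ruby. This is a sketch of the idea only: `map_posts` and `query` are hypothetical stand-ins for the JavaScript map function and the view query, the documents are made-up sample data, and the high sentinel string used as the upper bound is an assumption of this sketch rather than something shown on the slides:

```ruby
# Ruby stand-in for the map function on slide 30: emit one row per post,
# keyed by [blog_id, created_at], and keep the rows sorted by key.
def map_posts(docs)
  rows = []
  docs.each do |doc|
    next unless doc['type'] == 'post'
    rows << { 'key' => [doc['blog_id'], doc['created_at']], 'value' => doc }
  end
  rows.sort_by { |row| row['key'] }
end

# Range query over the sorted rows, inclusive on both ends.
def query(rows, startkey, endkey)
  rows.select { |row| (row['key'] <=> startkey) >= 0 && (row['key'] <=> endkey) <= 0 }
end

docs = [
  { 'type' => 'blog', '_id' => 'blog_1' },
  { 'type' => 'post', 'blog_id' => 'blog_1', 'created_at' => '2009-11-20', 'title' => 'B' },
  { 'type' => 'post', 'blog_id' => 'blog_2', 'created_at' => '2009-11-05', 'title' => 'X' },
  { 'type' => 'post', 'blog_id' => 'blog_1', 'created_at' => '2009-11-01', 'title' => 'A' }
]

rows = map_posts(docs)
HIGH = "\uFFF0" # sentinel that sorts after any date string used here
posts_of_blog_1 = query(rows, ['blog_1'], ['blog_1', HIGH])
titles = posts_of_blog_1.map { |row| row['value']['title'] }
```

Putting the blog ID first in the key groups each blog's posts together, and the second element orders them within the group, so one range scan returns the relation already sorted.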
    • 32. VPork •Utility for load-testing a distributed hash table. •Allows you to test raw throughput via concurrent read/write operations. •Hardware: •2 x commodity servers: CoreDuo 2.5 GHz, 4 GB RAM, 7200 RPM disks •CouchDB: 2 instances, round-robin balanced •Cassandra: 2 instances •Redis: 1 instance
    • 33. VPork Throughput with read probability 0.2
    • 34. VPork Throughput with read probability 0.5
    • 35. VPork Throughput with read probability 0.8
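The three benchmark runs differ only in read probability, that is, the chance that any given operation is a read rather than a write. A minimal sketch of what that means for the operation mix; `operation_mix` is a hypothetical helper for illustration and is not part of VPork:

```ruby
# Hypothetical workload generator: each operation is a read with the given
# probability and a write otherwise. A fixed seed keeps runs repeatable.
def operation_mix(total_ops, read_probability, seed = 42)
  rng = Random.new(seed)
  ops = Array.new(total_ops) { rng.rand < read_probability ? :read : :write }
  { reads: ops.count(:read), writes: ops.count(:write) }
end

[0.2, 0.5, 0.8].each do |p|
  mix = operation_mix(10_000, p)
  puts format('p=%.1f reads=%d writes=%d', p, mix[:reads], mix[:writes])
end
```

A probability of 0.2 therefore describes a write-heavy workload and 0.8 a read-heavy one, which is worth keeping in mind when comparing the three charts.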
    • 36. Conclusions •Complementary to relational solutions •Each K/V store addresses a different problem •Best use case: •CouchDB: distributed/scalable JavaScript-only app (no backend) •Cassandra: large volume of writes, no SPOF •Redis: datasets < RAM, lookups, cache, buffers
    • 37. Credits •All sponsored products, company names, brand names, trademarks and logos are the property of their respective owners. •Alfa Romeo Giulietta: mauboi/3296469097/ •Pizza: 05/20/belgian-summer-holidays/ •Sammy: Jr/bio •Everything else is from teh internets and is free.