KeyValue Stores

Technical overview of three of the most representative KeyValue Stores: Cassandra, Redis and CouchDB. Focused on Ruby and Ruby on Rails development, this talk shows how to solve common problems, introduces the most popular libraries, presents benchmarks and identifies the best use case for each of them.

This talk was part of the Conferencia Rails 2009, Madrid, Spain.

http://app.conferenciarails.org/talks/43-key-value-stores-conviertete-en-un-jedi-master

KeyValue Stores
Jedi Master Edition

Who?
Antonio Garrote @antoniogarrote
Mauro Pompilio @malditogeek
Pablo Delgado @pablete

Agenda
•Why?
•Definitions
•CouchDB
•Redis
•Cassandra
•Ruby Libraries
•Demo application
•Data modeling
•Benchmark

Why?
•Scalability
•Availability
•Fault Tolerance
•Schema-free
•Ease of use
•Performance
•Elasticity
•blah blah blah

NO
silver bullet!

NoSQL != NoSQL
NoSQL does not mean "No SQL"; it means "Not Only SQL"

Taxonomy
•Key-value stores: Redis, Voldemort, Cassandra
•Column-oriented datastores: Cassandra, HBase
•Document collection databases: CouchDB, MongoDB
•Graph databases: Neo4j, AllegroGraph
•Data structure store: Redis

CouchDB
   relax!
•Damien Katz
•Erlang - OTP compliant
•schema-less documents
•high availability
•completely distributed
•made for the web

CouchDB
B-Trees · MapReduce · MVCC

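To clients, CouchDB's MVCC shows up as revision checking. Each document carries a `_rev`, and an update that does not quote the latest revision is rejected with 409 Conflict rather than silently overwriting. The Ruby sketch below imitates that behaviour in memory. The `MiniCouch` class and its simplified revision strings are assumptions for illustration, not CouchDB's implementation.

```ruby
require 'json'
require 'digest/md5'

# Hypothetical in-memory imitation of CouchDB's MVCC revision checks.
# Revisions look like "<generation>-<md5 of the JSON body>", a simplified
# take on CouchDB's own "N-hash" revision strings.
class MiniCouch
  class Conflict < StandardError; end

  def initialize
    @docs = {} # id => stored document, including '_id' and '_rev'
  end

  # Creates or updates a document. Updating requires the current '_rev';
  # a missing or stale one is refused, as CouchDB answers 409 Conflict.
  def put(id, doc)
    current = @docs[id]
    if current && doc['_rev'] != current['_rev']
      raise Conflict, "document update conflict: #{id}"
    end

    generation = current ? current['_rev'].split('-').first.to_i + 1 : 1
    body = doc.reject { |key, _| key.start_with?('_') }
    rev = "#{generation}-#{Digest::MD5.hexdigest(JSON.generate(body))}"
    @docs[id] = body.merge({ '_id' => id, '_rev' => rev })
    rev
  end

  def get(id)
    @docs.fetch(id).dup
  end
end

db = MiniCouch.new
rev1 = db.put('post_1', { 'type' => 'post', 'title' => 'Hello' })
post = db.get('post_1')          # carries '_rev' => rev1
post['title'] = 'Hello, CouchDB'
rev2 = db.put('post_1', post)    # accepted: rev1 is still the latest

conflict = false
begin
  db.put('post_1', { 'title' => 'Stale edit', '_rev' => rev1 }) # rev1 is now stale
rescue MiniCouch::Conflict
  conflict = true
end
```

Readers never block writers under this scheme. A client that loses the race re-reads the document and retries with the new `_rev`.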
Ruby Libraries
•CouchDB
  •Pure: net/http + JSON implementation
  •Thin wrapper: CouchRest
  http://github.com/jchris/couchrest
  •ORM/ActiveRecord: ActiveCouch, CouchObject, RelaxDB, etc.
  http://github.com/arunthampi/activecouch
  http://github.com/paulcarey/relaxdb

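The "pure" option needs nothing beyond Ruby's standard library. Here is a minimal sketch of storing a document with `Net::HTTP` and `JSON`. It assumes a CouchDB server at `http://localhost:5984` and a database named `blog`, both hypothetical. `couch_put_request` and `couch_put` are illustrative names and are not part of any library.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumed server location for this sketch.
COUCH_URL = 'http://localhost:5984'

# Builds (without sending) a PUT request that stores `doc` under `id`.
def couch_put_request(db, id, doc)
  uri = URI("#{COUCH_URL}/#{db}/#{URI.encode_www_form_component(id)}")
  request = Net::HTTP::Put.new(uri)
  request['Content-Type'] = 'application/json'
  request.body = JSON.generate(doc)
  [uri, request]
end

# Sends the request and returns CouchDB's JSON reply as a Hash
# (on success it contains "ok", "id" and "rev").
def couch_put(db, id, doc)
  uri, request = couch_put_request(db, id, doc)
  Net::HTTP.start(uri.host, uri.port) do |http|
    JSON.parse(http.request(request).body)
  end
end
```

Only `couch_put` touches the network. You can call `couch_put_request` to inspect the request without a running server.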
CouchDB
•Rocks
  •Simplicity and elegance
  •Much more than a DB
  •New possibilities for web apps
•Sucks
  •Speed
  •Speed
  •Speed

Redis
il meglio d'Italia (the best of Italy)
classy as a Giulietta, tasty as a pizza

Redis
•Salvatore 'antirez' Sanfilippo
•ANSI C - POSIX compliant
•MemCache-like (on steroids)
•Data structures store:
  •strings
  •counters
  •lists
  •sets + sorted sets (>= 1.1)

Ruby Libraries
•Redis
  •Client: redis-rb
  http://github.com/ezmobius/redis-rb
  •Hash/Object mapper: Ohm
  http://github.com/soveran/ohm
  •ORM: RedisRecord
  http://github.com/malditogeek/redisrecord

Redis
# redis-rb client API as of 2009 (ezmobius/redis-rb)
require 'redis'
redis = Redis.new

# Strings
redis['foo'] = 'bar'        # => 'bar'
redis['foo']                # => 'bar'

# Expirations
redis.expire('foo', 5)      # will expire existing key 'foo' in 5 sec
redis.set('foo', 'bar', 5)  # set 'foo' with 5 sec expiration

# Counters
redis.incr('counter')       # => 1
redis.incr('counter', 10)   # => 11
redis.decr('counter')       # => 10

Redis
# Lists
%w(1st 2nd 3rd).each { |item| redis.push_tail('logs', item) }
redis.list_range('logs', 0, -1) # => ["1st", "2nd", "3rd"]
redis.pop_head('logs')          # => "1st"
redis.pop_tail('logs')          # => "3rd"

# Sets
%w(one two).each { |item| redis.set_add('foo-tags', item) }
%w(two three).each { |item| redis.set_add('bar-tags', item) }
redis.set_intersect('foo-tags', 'bar-tags') # => ["two"]
redis.set_union('foo-tags', 'bar-tags')     # => ["three", "two", "one"] (set order is not guaranteed)

Redis
•Rocks
  •Speed, in-memory dataset
  •Async non-blocking persistence
  •Non-blocking replication
  •Data structures with atomic operations
  •Ease of use and deployment
•Sucks
  •Sharding (client-side only at the moment)
  •Datasets > RAM
  •Very frequent code updates (?)

Redis
Upcoming coolness...
  •1.1
    •Sorted sets (ZSET), append-only journaling
  •1.2
    •HASH type, JSON dump tool
  •1.3
    •Virtual memory (datasets > RAM)
  •1.4
    •Redis-cluster proxy: consistent hashing and fault-tolerant nodes
  •1.5
    •Optimizations, UDP GET/SET

Cassandra
BigTable (by Google) + Dynamo (by Amazon)

Cassandra
Structured storage system over a P2P network
•Developed at Facebook
•Java
•Dynamo: partitioning and replication
•Bigtable: log-structured ColumnFamily data model

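The Dynamo half of the design can be sketched as a hash ring. Node tokens and row keys are hashed into one space. A key belongs to the first node clockwise from its hash, and its replicas go to the following nodes. This is a simplified illustration, not Cassandra's code. It assumes one MD5 token per node, similar in spirit to Cassandra's RandomPartitioner, and uses hypothetical node names.

```ruby
require 'digest/md5'

# Simplified Dynamo-style ring: one MD5 token per node, keys placed on the
# first node clockwise from their hash, replicas on the following nodes.
class Ring
  def initialize(nodes)
    # Sorted [token, node] pairs forming the ring.
    @ring = nodes.map { |node| [token(node), node] }.sort
  end

  # Returns up to `n` distinct nodes responsible for `key`, primary first.
  def replicas_for(key, n)
    n = [n, @ring.size].min
    key_token = token(key)
    # First node whose token is >= the key's token; wrap to 0 past the end.
    start = @ring.bsearch_index { |tok, _node| tok >= key_token } || 0
    (0...n).map { |i| @ring[(start + i) % @ring.size][1] }
  end

  private

  def token(value)
    Digest::MD5.hexdigest(value.to_s).to_i(16)
  end
end

ring = Ring.new(%w[node_a node_b node_c node_d])
ring.replicas_for('Post:42', 2) # primary node plus its clockwise neighbour
```

Each key depends only on its neighbours on the ring. Adding or removing a node therefore moves only the keys in that node's range, which is what makes incremental scaling cheap.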
Ruby Libraries
•Cassandra
  •Client: cassandra
  http://github.com/fauna/cassandra
  •ORM: cassandra_object
  http://github.com/NZKoz/cassandra_object
  •ORM: BigRecord
  http://github.com/openplaces/bigrecord

Cassandra
•Rocks
  •High Availability
  •Incremental Scalability
  •Minimal Administration
  •No Single Point of Failure
•Sucks
  •Thrift API (...not so bad)
  •Change schema, restart server
  •The logo

Demo Application
http://github.com/antoniogarrote/conf_rails_hispana_2009

Data Modeling
•Class mapping
•ID generation
•Relationships
  •one-to-one
  •one-to-many
  •many-to-many
•Index sorting
•Pagination
•Data filtering

Cassandra
•Class mapping
  •ColumnFamily :Blog, :Post
•ID generation
  •UUID.new(Time.now)
•Relationships
  •Use ColumnFamily :PostsforUser to hold all posts that belong to a user

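Below is a sketch of this layout, with nested Ruby Hashes standing in for Cassandra column families. The `time_id` method is a hypothetical stand-in for `UUID.new(Time.now)`, and `PostStore` is an illustrative name. The point is the denormalised `:PostsforUser` row, which indexes a user's posts by creation time.

```ruby
# Nested Hashes stand in for Cassandra here:
# column family => row key => { column name => value }.
class PostStore
  attr_reader :families

  def initialize
    @families = Hash.new { |h, family| h[family] = Hash.new { |rows, key| rows[key] = {} } }
    @counter = 0
  end

  # Hypothetical stand-in for UUID.new(Time.now): zero-padded microseconds
  # plus a counter, so that string order follows creation time.
  def time_id(time)
    @counter += 1
    format('%020d-%06d', (time.to_r * 1_000_000).to_i, @counter)
  end

  # Stores the post row in :Post and, denormalised, an index column in the
  # author's :PostsforUser row, so "all posts of a user" is one row read.
  def save_post(user_id, attrs, time = Time.now)
    post_id = time_id(time)
    @families[:Post][post_id] = attrs
    @families[:PostsforUser][user_id][post_id] = post_id
    post_id
  end

  # Column names come back sorted, as Cassandra keeps them within a row.
  def post_ids_for(user_id)
    @families[:PostsforUser][user_id].keys.sort
  end
end

store = PostStore.new
t0 = Time.at(1_258_700_000)
older = store.save_post('user_1', { 'title' => 'First' }, t0)
newer = store.save_post('user_1', { 'title' => 'Second' }, t0 + 60)
store.post_ids_for('user_1') # => [older, newer]
```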
Cassandra
•Index sorting
  •Columns within a ColumnFamily are stored in sorted order. Keys are also sorted (if using the OrderPreservingPartitioner)
•Pagination
  •For keys: get_range(start, finish, count)
  •For columns: get_slice(start, finish, count)
•Data filtering
  •Use get_range/get_slice and play around with start/finish

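Emulating slice semantics on a single row shows how this pagination works. The sketch assumes inclusive `start`/`finish` bounds, with an empty string meaning unbounded. It pages by restarting at the last column seen and dropping that column. `get_slice` here is a plain Ruby function over a Hash, not the Thrift call.

```ruby
# Emulated slice over one row's columns: names sorted, inclusive bounds
# ('' = unbounded), at most `count` [name, value] pairs returned.
def get_slice(row, start, finish, count)
  row.keys.sort
     .select { |name| (start.empty? || name >= start) && (finish.empty? || name <= finish) }
     .first(count)
     .map { |name| [name, row[name]] }
end

# Pagination: after the first page, ask for page_size + 1 columns starting
# at the last name already seen, then drop that repeated first column.
def each_page(row, page_size)
  start = ''
  first_page = true
  loop do
    slice = get_slice(row, start, '', first_page ? page_size : page_size + 1)
    slice = slice.drop(1) unless first_page
    break if slice.empty?
    yield slice
    start = slice.last.first
    first_page = false
  end
end

row = { 'c' => 3, 'a' => 1, 'e' => 5, 'b' => 2, 'd' => 4 }
each_page(row, 2) { |page| p page.map(&:first) } # pages: a b / c d / e
```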
Redis
•Class mapping
  •Namespaced keys: 'Post:5:title'
•ID generation
  •Redis counters: incr('Post:ids')
•Relationships
  •Redis lists: push_tail('Post:5:_rating_ids', 4)

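Here is a minimal sketch of this key layout. A plain Ruby Hash replaces the Redis server so the code runs without one. The helper names `next_id`, `save_record` and `add_related` are hypothetical. Against a real server, the same steps map onto INCR, SET and RPUSH.

```ruby
# A plain Hash stands in for the Redis server in this sketch.

# ID generation: one counter per class ('Post:ids'). Redis INCR does this
# atomically; this Hash version does not.
def next_id(db, klass)
  db["#{klass}:ids"] = db.fetch("#{klass}:ids", 0) + 1
end

# Class mapping: one namespaced key per attribute, e.g. 'Post:1:title'.
def save_record(db, klass, attrs)
  id = next_id(db, klass)
  attrs.each { |name, value| db["#{klass}:#{id}:#{name}"] = value.to_s }
  id
end

# One-to-many relationship: a list of related ids under the owner's key.
def add_related(db, klass, id, relation, related_id)
  (db["#{klass}:#{id}:_#{relation}_ids"] ||= []) << related_id
end

db = {}
post_id = save_record(db, 'Post', { title: 'Fifth post', score: 10 })
rating_id = save_record(db, 'Rating', { value: 4 })
add_related(db, 'Post', post_id, 'rating', rating_id)
db['Post:1:title']       # => "Fifth post"
db['Post:1:_rating_ids'] # => [1]
```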
Redis
•Index sorting
  •Redis sort:
    •sort 'Post:list', by 'Post:*:score', get 'Post:*:id'
•Pagination
  •Redis lists: list_range('Post:list', 0, 9) (first page of 10)
•Data filtering
  •Lookups: 'Post:permalink:fifth_post' => 5

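The same stand-in approach works for sorting, pagination and lookups, again with a Hash in place of Redis. The sketch reproduces the substitution rule of `SORT ... BY ... GET ...`: each list element replaces `*` in the patterns. The sample keys and scores are made up. Missing weights count as 0 here, which is an assumption of the sketch. It fetches titles rather than ids to keep the result readable.

```ruby
# Emulates SORT <list> BY <pattern> GET <pattern>: each element replaces '*'
# to find its numeric weight (ascending, SORT's default order), and the GET
# pattern is resolved the same way.
def sort_by_pattern(db, list_key, by:, get:)
  db.fetch(list_key, [])
    .sort_by { |id| db.fetch(by.sub('*', id.to_s), 0).to_f }
    .map { |id| db[get.sub('*', id.to_s)] }
end

db = {
  'Post:list' => [1, 2, 3],
  'Post:1:score' => '7',  'Post:1:title' => 'First post',
  'Post:2:score' => '42', 'Post:2:title' => 'Second post',
  'Post:3:score' => '3',  'Post:3:title' => 'Third post',
  'Post:permalink:third_post' => 3 # lookup key: permalink => id
}

sort_by_pattern(db, 'Post:list', by: 'Post:*:score', get: 'Post:*:title')
# => ["Third post", "First post", "Second post"]

db['Post:list'][0..1] # pagination, like an inclusive LRANGE 0 1 => [1, 2]
db["Post:#{db['Post:permalink:third_post']}:title"] # filter via lookup => "Third post"
```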
CouchDB
•Type attribute in each document
•CouchDB automatic ID generation
•Related document IDs in the attributes
•Views with complex keys
•Special attributes for view functions

CouchDB
View: relation_blog_posts

function(doc) {
  if (doc.type == "post") {
    emit([doc.blog_id, doc.created_at], doc);
  }
}

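You can emulate what the view index holds in Ruby: run the map step over every document, keep the emitted `[key, value]` rows, and sort them by key. Ruby's `Array#<=>` compares `[blog_id, created_at]` element by element. For the simple ASCII sample keys used here, this gives the same order as CouchDB's collation. CouchDB collates strings with Unicode rules, so other data may order differently. The documents are hypothetical.

```ruby
# Emulated map step of relation_blog_posts: emit [[blog_id, created_at], doc]
# for every post, then order rows by key as the view index would.
def relation_blog_posts(docs)
  docs.select { |doc| doc['type'] == 'post' }
      .map { |doc| [[doc['blog_id'], doc['created_at']], doc] }
      .sort_by { |key, _doc| key }
end

docs = [
  { '_id' => 'p3', 'type' => 'post', 'blog_id' => 'blog_2', 'created_at' => '2009-11-19T09:00:00Z' },
  { '_id' => 'p2', 'type' => 'post', 'blog_id' => 'blog_1', 'created_at' => '2009-11-20T10:00:00Z' },
  { '_id' => 'p1', 'type' => 'post', 'blog_id' => 'blog_1', 'created_at' => '2009-11-18T08:00:00Z' },
  { '_id' => 'b1', 'type' => 'blog', 'name' => 'Blog one' }
]

rows = relation_blog_posts(docs)
# Posts of blog_1, oldest first: what a key range over ["blog_1", ...] returns.
rows.select { |key, _doc| key.first == 'blog_1' }.map { |_key, doc| doc['_id'] } # => ["p1", "p2"]
```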
CouchDB
View: relation_blog_posts

GET /db/_design/design_doc/_view/relation_blog_posts?startkey=["blog_1"]&endkey=["blog_1",{}]

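Over HTTP, complex keys are sent JSON-encoded and URL-escaped. An `endkey` of `["blog_1", {}]` bounds the range to one blog, because CouchDB collates objects after strings. Below is a small helper that assumes the `/db/_design/<design doc>/_view/<view>` path layout. `view_query_path` and the design document name `app` are hypothetical.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: builds a view query path, JSON-encoding each
# parameter value and URL-escaping the resulting query string.
def view_query_path(db, design_doc, view, params = {})
  query = URI.encode_www_form(params.map { |name, value| [name.to_s, JSON.generate(value)] })
  path = "/#{db}/_design/#{design_doc}/_view/#{view}"
  query.empty? ? path : "#{path}?#{query}"
end

# Range covering every [blog_1, created_at] key, in created_at order.
path = view_query_path('db', 'app', 'relation_blog_posts',
                       { startkey: ['blog_1'], endkey: ['blog_1', {}] })
```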
VPork
•Utility for load-testing a distributed hash table
•Allows you to test raw throughput via concurrent read/write operations
•Hardware:
  •2 x commodity servers: Core Duo 2.5 GHz, 4 GB RAM, 7200 RPM disks
•CouchDB: 2 instances, round-robin balanced
•Cassandra: 2 instances
•Redis: 1 instance
http://github.com/antoniogarrote/vpork

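The "read probability" in the benchmark runs is the share of reads in the operation mix. Here is a sketch of generating such a mix. It assumes each operation is independently a read with probability `read_probability`. This is an illustration, not VPork's actual code. The seeded `Random` only makes runs repeatable.

```ruby
# Each operation is a read with probability `read_probability`, else a write.
def workload(ops, read_probability, seed: 42)
  rng = Random.new(seed)
  Array.new(ops) { rng.rand < read_probability ? :read : :write }
end

ops = workload(10_000, 0.2)
reads = ops.count(:read) # close to 2,000 for a 0.2 read probability
```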
VPork
[Chart: throughput with read probability 0.2]

VPork
[Chart: throughput with read probability 0.5]

VPork
[Chart: throughput with read probability 0.8]

Conclusions
•Complementary to relational solutions
•Each K/V store addresses a different problem
•Best use case:
  •CouchDB: distributed/scalable JavaScript-only app (no backend)
  •Cassandra: large volumes of writes, no SPOF
  •Redis: datasets < RAM, lookups, cache, buffers

Credits
•All sponsored products, company names, brand names, trademarks and logos are the property of their respective owners.
•Alfa Romeo Giulietta: http://www.flickr.com/photos/mauboi/3296469097/
•Pizza: http://reportingfrombelgium.wordpress.com/2009/05/20/belgian-summer-holidays/
•Sammy: http://www.yuddy.com/celebrity/Sammy-Davis-Jr/bio
•Everything else is from teh internets and is free.