Cassandra and Rails at LA NoSQL Meetup

This presentation introduces Cassandra and column family datastores in general. I will discuss what Cassandra is, how and when it is useful, and how it integrates with Rails. I will also go into the lessons learned during our 3-month project and the useful patterns that emerged. The discussion will be very technical, but it is targeted at developers who are not familiar with Cassandra or have not yet done a project with it.

Published in: Technology
0 Comments
10 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
7,463
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
150
Comments
0
Likes
10
Embeds 0
No embeds

No notes for slide
















  • Cassandra and Rails at LA NoSQL Meetup

Cassandra and Rails
What we learned on our first project

Mike Wynholds
Carbon Five
@mwynholds

Agenda

Introduction
What is Cassandra?
Cassandra and Rails
Emergent data patterns
Deployment

What is Cassandra?

Column Family database
Distributed design - “eventually consistent”
Open sourced by Facebook, now Apache
Used by Facebook, Twitter, Digg, Rackspace...
Largest cluster: 100 TB, 150 nodes
Still very immature - at version 0.6.2
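
"Eventually consistent" means replicas can briefly hold different versions of the same column. Cassandra settles this by keeping the version with the highest timestamp, which is why the insert and remove calls take a timestamp argument. The sketch below models that last-write-wins rule in plain Ruby; the `ColumnVersion` struct and `reconcile` helper are hypothetical illustrations, not part of any Cassandra client, and tie-breaking is ignored.

```ruby
# Hypothetical model of one column version as held by a replica.
ColumnVersion = Struct.new(:name, :value, :timestamp)

# Last-write-wins: for each column name, keep the version with the highest
# timestamp found on any replica, then return name => value.
def reconcile(*replicas)
  merged = {}
  replicas.flatten.each do |version|
    current = merged[version.name]
    if current.nil? || version.timestamp > current.timestamp
      merged[version.name] = version
    end
  end
  merged.transform_values(&:value)
end

replica_a = [ColumnVersion.new("title", "Hello", 100),
             ColumnVersion.new("body", "first draft", 100)]
replica_b = [ColumnVersion.new("body", "edited", 200)]

result = reconcile(replica_a, replica_b)
# result["body"] is "edited" because timestamp 200 beats 100
```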

Relational vs Column Family

Relational                 Column Family
Schema-ful                 Schema-less (mostly)
Row-based                  Column-based
Robust SQL queries         No query language
Transactional              Eventually consistent *
ODBC/JDBC                  Thrift *
Fast - 300/350ms           Blazing - 0.12/15ms **

* Cassandra-specific
** MySQL vs Cassandra, > 50GB data, write/read

Relational                 Column Based
Database                   Keyspace

[Diagram: a relational Database holding Table 1, Table 2 and Table 3, beside a Keyspace holding a Column Family and a Super Column Family]

Column Family rows (each row stores only the columns it has):
key1: col1:val  col3:val
key2: col1:val1  col2:val  col3:val
key3: col3:val
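
One way to picture this data model is as nested hashes: a keyspace maps column family names to rows, each row key maps to its own set of columns kept sorted by name, and rows need not share columns. A super column family adds one more level of nesting. The sketch below illustrates that shape only; the super column contents and the `columns_for` helper are hypothetical.

```ruby
# Plain-Ruby picture of a keyspace (not a Cassandra client).
# Column family:       row key => { column name => value }
# Super column family: row key => { super column => { column name => value } }
keyspace = {
  "ColumnFamily" => {
    "key1" => { "col1" => "val", "col3" => "val" },
    "key2" => { "col1" => "val1", "col2" => "val", "col3" => "val" },
    "key3" => { "col3" => "val" }
  },
  # Made-up super column example
  "SuperColumnFamily" => {
    "key1" => {
      "col1" => { "col1" => "val" },
      "col2" => { "col1" => "val", "col2" => "val" }
    }
  }
}

# Hypothetical helper: the column names a row actually stores, in sorted
# order (Cassandra keeps columns sorted by name within a row).
def columns_for(keyspace, family, key)
  keyspace.fetch(family).fetch(key, {}).keys.sort
end

columns_for(keyspace, "ColumnFamily", "key2") # => ["col1", "col2", "col3"]
columns_for(keyspace, "ColumnFamily", "key3") # => ["col3"]
```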

What about ORM?

Cassandra is NOT a relational database
No ActiveRecord support (currently)
No Hibernate support (currently)
OCFM?
Lots of room for jars/gems

An example in Rails

bitchroom - a place to bitch and whine
Twitter-like features - user post timelines
Digg-like features - up/down, fav, reply

Custom Rails config

-- /config/cassandra.yml --
development:
  servers: "127.0.0.1:9160"
  keyspace: "bitchroom_development"
  timeout: 3
  retries: 2

-- /initializers/initialize_app.rb --
require 'cassandra'

env = ENV['RAILS_ENV'] || 'development'
# Double quotes are needed so that #{RAILS_ROOT} is interpolated
cfg = YAML.load_file("#{RAILS_ROOT}/config/cassandra.yml")[env]
thrift_options = { :timeout => cfg['timeout'], :retries => cfg['retries'] }
$cassandra = Cassandra.new(cfg['keyspace'], cfg['servers'], thrift_options)
# Talk only to the servers listed in cassandra.yml
$cassandra.disable_node_auto_discovery!
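
The initializer depends on `RAILS_ROOT` and a running cluster. To check the per-environment lookup in isolation, the sketch below parses an inline copy of the YAML with Ruby's standard `yaml` library and builds the same Thrift options hash. The `load_cassandra_config` name is hypothetical, and the sketch stops short of calling `Cassandra.new`, so it runs without the gem or a server.

```ruby
require 'yaml'

# Inline copy of config/cassandra.yml (development section from the slide).
CASSANDRA_YML = <<~YML
  development:
    servers: "127.0.0.1:9160"
    keyspace: "bitchroom_development"
    timeout: 3
    retries: 2
YML

# Hypothetical helper: pick the section for env and split out the Thrift options.
def load_cassandra_config(yaml_text, env = ENV['RAILS_ENV'] || 'development')
  cfg = YAML.safe_load(yaml_text).fetch(env)
  thrift_options = { :timeout => cfg['timeout'], :retries => cfg['retries'] }
  [cfg['keyspace'], cfg['servers'], thrift_options]
end

keyspace, servers, thrift_options = load_cassandra_config(CASSANDRA_YML, 'development')
# These are the three values the initializer passes to Cassandra.new.
```

Using `fetch(env)` instead of `[env]` makes a missing environment fail loudly at boot rather than later with a `nil` config.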

Cassandra API

get(keyspace, key, column_path, level)
get_slice(keyspace, key, column_parent, predicate, level)
multiget_slice(keyspace, keys, column_parent, predicate, level)
get_count(keyspace, key, column_parent, level)
get_range_slices(keyspace, column_parent, predicate, range, level)
insert(keyspace, key, column_path, value, timestamp, level)
batch_mutate(keyspace, mutation_map, level)
remove(keyspace, key, column_path, timestamp, level)
describe_*(...)

Version 0.6.x only
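
The `predicate` passed to `get_slice` is either a list of column names or a slice range with `start`, `finish`, `reversed` and `count` (default 100). The sketch below imitates the slice-range case over an in-memory row, assuming plain string ordering of column names; real Cassandra orders columns with the column family's comparator (for example TimeUUIDType), and `slice_row` is a hypothetical helper, not part of Thrift or any gem.

```ruby
# Hypothetical in-memory imitation of get_slice with a slice range.
# Empty start/finish mean "unbounded"; when reversed, start is the high end.
def slice_row(row, start: "", finish: "", reversed: false, count: 100)
  names = row.keys.sort
  names = names.reverse if reversed
  selected = names.select do |name|
    after_start = start.empty? || (reversed ? name <= start : name >= start)
    before_finish = finish.empty? || (reversed ? name >= finish : name <= finish)
    after_start && before_finish
  end
  selected.first(count).map { |name| [name, row[name]] }
end

row = { "a" => 1, "b" => 2, "c" => 3, "d" => 4 }
slice_row(row, start: "b", finish: "c")  # => [["b", 2], ["c", 3]]
slice_row(row, reversed: true, count: 2) # => [["d", 4], ["c", 3]]
```

The reversed, counted form is the one the timeline pattern relies on: "newest N columns" in a single call.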

Sample save

def save
  # Time-based UUID: used for the post id and as the timeline column name
  uuid = SimpleUUID::UUID.new(Time.now)
  @id = uuid.to_guid

  post_hash = self.to_cassandra_hash
  cassandra.insert(:Posts, @id, post_hash)

  # Point the main timeline and the author's timeline at this post
  pointer = {uuid => @id}
  cassandra.insert(:Timelines, "main", pointer)
  cassandra.insert(:Timelines, "user-#{@author_id}", pointer)
  return self
end
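
The save writes the post row once and then adds a small pointer column (time UUID name, post id value) to every timeline row that should show the post: "main" and the author's `"user-#{@author_id}"` row. The sketch below reproduces that fan-out-on-write step against in-memory hashes so it runs without Cassandra or `simple_uuid`. The `FakeStore` class, the `"post-N"` ids and the integer clock standing in for a TimeUUID are all assumptions made for illustration.

```ruby
# Hypothetical in-memory stand-in for the Posts and Timelines column families.
class FakeStore
  attr_reader :posts, :timelines

  def initialize
    @posts = {}                                # post id => attributes
    @timelines = Hash.new { |h, k| h[k] = {} } # timeline key => { time => post id }
    @clock = 0
  end

  # Integer tick standing in for a TimeUUID: unique and increasing.
  def next_time
    @clock += 1
  end

  # Mirrors the slide: one write for the post, one pointer per timeline.
  def save_post(author_id, attributes)
    time = next_time
    id = "post-#{time}"
    @posts[id] = attributes
    pointer = { time => id }
    ["main", "user-#{author_id}"].each { |key| @timelines[key].merge!(pointer) }
    id
  end
end

store = FakeStore.new
first = store.save_post(1, { "body" => "Mondays." })
second = store.save_post(2, { "body" => "Traffic." })
# "main" now points at both posts; "user-1" and "user-2" at one each.
```

Writes cost one extra column per timeline, but reading any timeline becomes a single sorted slice.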

Emergent data patterns

Simple object map
Object relationship map
Timeline <-- this one is the key

Simple Object Map

Row id = object id (primary key)
Attribute column names
String column values

ids = [ 1, 2, 3 ]
posts = cassandra.multi_get(:Posts, ids, { })
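
Because every column value is stored as a string, a model in this pattern has to convert its attributes to strings on the way in and back to typed values on the way out. The slides call `to_cassandra_hash` but do not show it; the sketch below is one hypothetical way to write that pair of conversions. The `Post` attributes and the `from_cassandra_hash` name are assumptions.

```ruby
# Hypothetical post model using the simple object map pattern:
# one row per post, one column per attribute, every value a String.
class Post
  attr_accessor :id, :author_id, :body, :votes

  def initialize(id: nil, author_id: nil, body: "", votes: 0)
    @id, @author_id, @body, @votes = id, author_id, body, votes
  end

  # Attribute name => String value, ready to insert as columns.
  def to_cassandra_hash
    { "author_id" => author_id.to_s, "body" => body.to_s, "votes" => votes.to_s }
  end

  # Rebuild a post from a row (row key = id, columns = attributes).
  def self.from_cassandra_hash(id, columns)
    new(id: id,
        author_id: columns["author_id"],
        body: columns["body"],
        votes: columns.fetch("votes", "0").to_i)
  end
end

post = Post.new(id: "42", author_id: "1", body: "Mondays.", votes: 3)
row = post.to_cassandra_hash
copy = Post.from_cassandra_hash("42", row)
# copy.votes is the Integer 3 again, although the stored value was "3"
```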

Object Relationship Map

Row id = object id (primary key)
Relationship attribute column names
External ID super-column values

ids = [ 1, 2, 3 ]
prs = cassandra.multi_get(:PostRelationships, ids, { })
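
In this pattern each super column groups one kind of relationship for a post, and its columns hold the related objects' ids. The slides do not list the relationship names, so the sketch below assumes hypothetical `replies` and `favs` super columns, and shows in plain Ruby how a result shaped like that could be turned into the external ids to load next through the simple object map.

```ruby
# Hypothetical result of multi_get(:PostRelationships, ids) for two posts:
# row key => { super column (relationship) => { column name => external id } }
post_relationships = {
  "1" => {
    "replies" => { "a" => "7", "b" => "9" },
    "favs"    => { "a" => "user-3" }
  },
  "2" => {
    "replies" => { "a" => "9" }
  }
}

# Collect the distinct external ids for one relationship across all rows,
# e.g. every reply id, so they can be loaded with one multi_get on :Posts.
def related_ids(rows, relationship)
  rows.values.flat_map { |supers| supers.fetch(relationship, {}).values }.uniq
end

reply_ids = related_ids(post_relationships, "replies") # => ["7", "9"]
```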

Timeline

TimeUUID column names
External ID column values

result = []
options = { :count => 20, :reverse => true }
timelines = [ 'user-1', 'user-2', 'user-3' ]

# Newest 20 entries from each timeline row
multi_get_result = cassandra.multi_get(:Timelines, timelines, options)
multi_get_result.values().each do |timeline|
  timeline.each { |uuid, id| result << [uuid, id] }
end

# Merge all timelines newest first and keep the top 20
result = result.sort { |a, b| b[0] <=> a[0] }.slice(0, options[:count])
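
The fetch needs a live cluster, but the merge step on its own is ordinary Ruby. The sketch below feeds the same sort-then-truncate logic a hash shaped like the `multi_get` result (timeline key mapped to time column and post id), with made-up data and integers standing in for TimeUUIDs since both order by time. `merge_timelines` is a hypothetical name.

```ruby
# Hypothetical multi_get result: timeline key => { time => post id }.
# Integers stand in for TimeUUID column names; larger means newer.
multi_get_result = {
  "user-1" => { 5 => "p5", 2 => "p2" },
  "user-2" => { 4 => "p4", 1 => "p1" },
  "user-3" => { 3 => "p3" }
}

# Merge every timeline, newest first, and keep the top `count` entries.
def merge_timelines(result_by_timeline, count)
  entries = []
  result_by_timeline.each_value do |timeline|
    timeline.each { |time, id| entries << [time, id] }
  end
  entries.sort { |a, b| b[0] <=> a[0] }.first(count)
end

merge_timelines(multi_get_result, 3) # => [[5, "p5"], [4, "p4"], [3, "p3"]]
```

Fetching `count` entries from each row is enough: the overall newest `count` posts can never include an entry that is not among its own timeline's newest `count`.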

Deployment

Load balancing + failover

[Diagram: haproxy on :80 in front of nginx (:81) and mongrel (:5000), with cassandra nodes on :9160]
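
In this deployment haproxy spreads requests across nodes and routes around a node that stops answering. The same idea can be sketched on the client side: try servers in round-robin order and fall through to the next one when a call raises. The sketch below is a hypothetical illustration with fake servers represented as lambdas; it is not how haproxy or the cassandra gem implements failover, and the server names are made up.

```ruby
# Hypothetical client-side round-robin with failover.
class RoundRobinClient
  class AllServersDown < StandardError; end

  def initialize(servers)
    @servers = servers # name => callable that performs a request
    @names = servers.keys
    @next = 0
  end

  # Start at the next server in rotation; on error, try the following ones.
  def call(request)
    @names.size.times do |attempt|
      name = @names[(@next + attempt) % @names.size]
      begin
        response = @servers[name].call(request)
        @next = (@next + attempt + 1) % @names.size
        return [name, response]
      rescue StandardError
        next
      end
    end
    raise AllServersDown, "no server answered #{request.inspect}"
  end
end

servers = {
  "cass1:9160" => ->(req) { "cass1 handled #{req}" },
  "cass2:9160" => ->(_req) { raise IOError, "connection refused" },
  "cass3:9160" => ->(req) { "cass3 handled #{req}" }
}
client = RoundRobinClient.new(servers)
```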

Resources

http://cassandra.apache.org/
http://wiki.apache.org/cassandra/API
http://github.com/fauna/cassandra
http://github.com/ryanking/simple_uuid