• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Cassandra and Rails at LA NoSQL Meetup
 

Cassandra and Rails at LA NoSQL Meetup

on

  • 9,001 views

This presentation introduces people to Cassandra and Column Family Datastores in general. I will discuss what Cassandra is, how and when it is useful, and how it integrates with Rails. I will also go ...

This presentation introduces people to Cassandra and Column Family Datastores in general. I will discuss what Cassandra is, how and when it is useful, and how it integrates with Rails. I will also go in to lessons learned during our 3-month project, and the useful patterns that emerged. The discussion will be very technical, but targeted at developers who are not familiar with, or have not done a project with Cassandra.

Statistics

Views

Total Views
9,001
Views on SlideShare
8,931
Embed Views
70

Actions

Likes
10
Downloads
147
Comments
0

8 Embeds 70

http://blog.carbonfive.com 36
https://twitter.com 12
http://static.slidesharecdn.com 9
http://videofork.com 7
http://paper.li 3
http://blog.naver.com 1
http://108.166.83.15 1
https://si0.twimg.com 1
More...

Accessibility

Categories

Upload Details

Uploaded via as Apple Keynote

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />
  • <br />

Cassandra and Rails at LA NoSQL Meetup Cassandra and Rails at LA NoSQL Meetup Presentation Transcript

  • Cassandra and Rails What we learned on our first project Mike Wynholds Carbon Five @mwynholds
  • Agenda Introduction What is Cassandra? Cassandra and Rails Emergent data patterns Deployment
  • What is Cassandra? Column Family database Distributed design - “eventually consistent” Open sourced by Facebook, now Apache Used by Facebook, Twitter, Digg, Rackspace... Largest cluster: 100 TB, 150 nodes Still very immature - at version 0.6.2
  • Relational vs Column Family Schema-ful Schema-less (mostly) Row-based Column-based Robust SQL queries No query language Transactional Eventually consistent * ODBC/JDBC Thrift * Fast - 300/350ms Blazing - .12/15ms ** * Cassandra-specific ** MySQL vs Cassandra, > 50GB data, write/read
  • Relational Column Based Database Keyspace Column Family Table 1 col1 col2 col3 col4 col5 key1: col1:val col3:val key2: col1:val1 col2:val col3:val key3: col3:val Table 2 Table 3 Super Column Family col1 col2 col1 col2 col3 col1: col1: key1: col1:val col1:val col2:val key2::
  • What about ORM? Cassandra is NOT a relational database No ActiveRecord support (currently) No Hibernate support (currently) OCFM? Lots of room for jars/gems
  • An example in Rails bitchroom - a place to bitch and whine Twitter-like features - user post timelines Digg-like features - up/down, fav, reply
  • Custom Rails config -- /config/cassandra.yml -- development: servers: "127.0.0.1:9160" keyspace: "bitchroom_development" timeout: 3 retries: 2 -- /initializers/initialize_app.rb -- require 'cassandra' env = ENV['RAILS_ENV'] || 'development' cfg = YAML.load_file('#{RAILS_ROOT}/config/cassandra.yml')[env] thrift_options = { :timeout => cfg['timeout'], :retries => cfg['retries'] } $cassandra = Cassandra.new(cfg['keyspace'], cfg['servers'], thrift_options) $cassandra.disable_node_auto_discovery!
  • Cassandra API get(keyspace, key, column_path, level) get_slice(keyspace, key, column_parent, predicate, level) multiget_slice(keyspace, keys, column_parent, predicate, level) get_count(keyspace, key, column_parent, level) get_range_slices(keyspace, column_parent, predicate, range, level) insert(keyspace, ey, column_path, value, timestamp, level) batch_mutate(keyspace, mutation_map, level) remove(keyspace, key, column_path, timestamp, level) describe_*(...) Version 0.6.x only
  • Sample save def save uuid = SimpleUUID::UUID.new(Time.now) @id = uuid.to_guid post_hash = self.to_cassandra_hash cassandra.insert(:Posts, @id, post_hash) pointer = {uuid => @id} cassandra.insert(:Timelines, "main", pointer) cassandra.insert(:Timelines, "user-#{@author_id}") return self end
  • Emergent data patterns Simple object map Object relationship map Timeline <-- this one is the key
  • Simple Object Map Row id = object id (primary key) Attribute column names String column values ids = [ 1, 2, 3 ] posts = cassandra.multi_get(:Posts, ids, { })
  • Object Relationship Map Row id = object id (primary key) Relationship attribute column names External ID super-column values ids = [ 1, 2, 3 ] prs = cassandra.multi_get(:PostRelationships, ids, { })
  • Timeline TimeUUID column names External ID column values result = [] options = { :count => 20, :reverse => true } timelines = [ 'user-1', 'user-2', 'user-3' ] multi_get_result = cassandra.multi_get(:Timelines, timelines, options) multi_get_result.values().each do |timeline| timeline.each { |uuid, id| result << [uuid, id] } end result = result.sort { |a, b| b[0] <=> a[0] }.slice(0, options[:count])
  • Deployment Load balancing + failover :80 haproxy :81 :9160 cassandra nginx :5000 :9160 :9160 mongrel cassandra
  • Resources http://cassandra.apache.org/ http://wiki.apache.org/cassandra/API http://github.com/fauna/cassandra http://github.com/ryanking/simple_uuid