Persistence Smoothie
Blending SQL and NoSQL

Michael Bleigh
Intridea, Inc.
Saturday, April 10, 2010

photo by Nikki L. via Flickr

present.ly

The Buzz

You’ve (probably) heard a lot about NoSQL

NoSQL is a new way to think about persistence


Atomicity
Consistency
Isolation
Durability

Denormalization
Eventual Consistency
Schema-Free
Horizontal Scale
Map Reduce

Map Reduce
• Massively parallel way to process large datasets
• First you scour data and “map” a new set of data
• Then you “reduce” the data down to a salient result

map = function() {
  // Emit one {count: 1} record per tag on the current document
  this.tags.forEach(function(tag) {
    emit(tag, {count: 1});
  });
}

reduce = function(key, values) {
  // Sum the partial counts emitted for this tag
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i].count;
  }
  return {count: total};
}
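To make the two phases concrete without a database, here is a minimal Ruby sketch of the same tag count run over an in-memory array. The sample `POSTS` data and the `map_tags`, `reduce_counts`, and `map_reduce` helpers are made up for illustration and are not any database's API. A real store would spread the map and reduce calls across many nodes and may call reduce repeatedly on partial results.

```ruby
# Hypothetical sample documents; only the :tags field matters here.
POSTS = [
  { title: "Intro to Redis",  tags: ["nosql", "redis"] },
  { title: "MongoDB schemas", tags: ["nosql", "mongodb"] },
  { title: "Rails and SQL",   tags: ["sql"] }
].freeze

# Map phase: emit one [tag, {count: 1}] pair per tag on a document.
def map_tags(doc)
  doc[:tags].map { |tag| [tag, { count: 1 }] }
end

# Reduce phase: collapse all values emitted for one key into a single value.
def reduce_counts(_key, values)
  { count: values.sum { |value| value[:count] } }
end

# Illustrative driver: map every document, group the emitted pairs by key,
# then reduce each group down to one result per key.
def map_reduce(docs)
  emitted = docs.flat_map { |doc| map_tags(doc) }
  grouped = emitted.group_by { |key, _value| key }
  grouped.map do |key, pairs|
    [key, reduce_counts(key, pairs.map { |_key, value| value })]
  end.to_h
end

TAG_COUNTS = map_reduce(POSTS)
```

The output value has the same shape as the emitted values (`{count: n}`), which is what lets a store feed reduce results back into reduce.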




NoSQL tries to scale (more) simply

NoSQL is going mainstream


New York Times
Business Insider
BBC
ShopWiki
GitHub
Meebo
Disqus
SourceForge
Sony
Digg

...but not THAT mainstream.


A word of caution...



NoSQL doesn’t sleep, it waits
NoSQL can divide by zero
NoSQL counted to infinity, twice

NoSQL is a (growing) collection of tools, not a new way of life


The Ecosystem


Key-Value Stores
• Redis
• Riak
• Voldemort
• Tokyo Cabinet
• MemcachedDB

Document Stores
• MongoDB
• CouchDB
• Riak
• FleetDB

Column(ish) Stores
• Cassandra
• HBase

Graph Databases
• Neo4j
• HyperGraphDB
• InfoGrid

When should I use this stuff?


Complex, slow joins for “activity stream”
Denormalize, use Key-Value Store

Variable schema, vertical interaction
Document Database or Column Store

Modeling deep relationships
Graph Database
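“Deep relationships” are questions like “who is within three hops of this user?”. In SQL each extra hop typically means another self-join, while a graph database walks the edges directly. A rough sketch of that traversal in plain Ruby, using a made-up adjacency hash (`FOLLOWS`) and a hypothetical `within_hops` helper in place of a real graph store:

```ruby
require "set"

# Hypothetical "follows" edges: user => users they follow.
FOLLOWS = {
  "alice" => ["bob"],
  "bob"   => ["carol", "dave"],
  "carol" => ["erin"],
  "dave"  => [],
  "erin"  => ["alice"]
}.freeze

# Breadth-first walk: returns everyone reachable from `start`
# within `max_depth` hops, mapped to their hop distance.
def within_hops(graph, start, max_depth)
  distances = {}
  visited = Set.new([start])
  frontier = [start]

  1.upto(max_depth) do |depth|
    next_frontier = []
    frontier.each do |node|
      graph.fetch(node, []).each do |neighbor|
        next if visited.include?(neighbor)

        visited << neighbor
        distances[neighbor] = depth
        next_frontier << neighbor
      end
    end
    frontier = next_frontier
  end

  distances
end

REACH = within_hops(FOLLOWS, "alice", 3)
```

Raising `max_depth` only changes the loop bound here, whereas the equivalent SQL query would usually grow by one join per hop.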

NoSQL solves real scalability and data design issues


Ben Scofield
bit.ly/state-of-nosql


Ready to go?

Just one problem...

Your data is already in a SQL database

So now we need to ask the question...


Yeah, it blends.



The “Hard” Way: Do it by hand.


class Post
  include MongoMapper::Document

  key :title,   String
  key :body,    String
  key :tags,    Array
  key :user_id, Integer

  # Look up the owning user in SQL through ActiveRecord
  def user
    User.find_by_id(self.user_id)
  end

  # Store only the SQL primary key on the document
  def user=(some_user)
    self.user_id = some_user.id
  end
end

class User < ActiveRecord::Base
  # Query MongoDB for this user's posts
  def posts(options = {})
    Post.all({:conditions => {:user_id => self.id}}.merge(options))
  end
end
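The pattern above boils down to storing a plain foreign key on one side and writing the finder methods by hand on both sides. The sketch below reproduces that shape with two in-memory stand-ins so it runs without MongoDB or a SQL database. `SqlUser`, `DocPost`, and their class-level stores are hypothetical and are not part of MongoMapper or ActiveRecord.

```ruby
# Stand-in for a SQL-backed model: rows keyed by integer id.
class SqlUser
  ROWS = {}
  attr_reader :id, :name

  def initialize(id, name)
    @id = id
    @name = name
    ROWS[id] = self
  end

  def self.find_by_id(id)
    ROWS[id]
  end

  # Hand-written "has many": ask the other store for matching documents.
  def posts
    DocPost.all({ user_id: id })
  end
end

# Stand-in for a document-backed model: a flat list of records.
class DocPost
  DOCS = []
  attr_accessor :title, :user_id

  def initialize(title)
    @title = title
    DOCS << self
  end

  # Return every document whose fields match all of the given conditions.
  def self.all(conditions = {})
    DOCS.select do |doc|
      conditions.all? { |field, value| doc.public_send(field) == value }
    end
  end

  # Hand-written "belongs to": only the foreign key crosses the store boundary.
  def user
    SqlUser.find_by_id(user_id)
  end

  def user=(some_user)
    self.user_id = some_user.id
  end
end

AUTHOR = SqlUser.new(1, "alice")
POST = DocPost.new("Persistence Smoothie")
POST.user = AUTHOR
```

Every additional cross-store association needs another pair of methods like these, which is where the repetition comes from.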




Pros & Cons
• Simple, maps to your domain
• Works for small, simple ORM intersections
• MUCH simpler in Rails 3
• Complex relationships are a mess
• Makes your models fat
• As DRY as the ocean



The “Easy” Way: DataMapper


DataMapper
• Generic, relational ORM
• Speaks pretty much everything you’ve ever heard of
• Implements Identity Map
• Module-based inclusion
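An identity map guarantees that loading the same record twice within a session yields the same in-memory object, rather than two copies that can drift apart. A minimal sketch of the idea follows, with a hypothetical `Repository` class and a plain hash standing in for the database. It illustrates the pattern only and is not DataMapper's actual implementation.

```ruby
Record = Struct.new(:id, :name)

class Repository
  attr_reader :loads

  def initialize(rows)
    @rows = rows          # stand-in for a database table: id => attributes
    @identity_map = {}    # id => object already built for that id
    @loads = 0            # counts simulated database reads
  end

  # Return the cached object for this id, building it at most once.
  def find(id)
    return @identity_map[id] if @identity_map.key?(id)

    row = @rows[id]
    return nil if row.nil?

    @loads += 1
    @identity_map[id] = Record.new(id, row[:name])
  end
end

REPO = Repository.new({ 1 => { name: "Widget" } })
FIRST = REPO.find(1)
SECOND = REPO.find(1)
```

Because `FIRST` and `SECOND` are the same object, a change made through one is visible through the other, and the second lookup never touches the backing store.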

DataMapper.setup(:default, "mysql://localhost")
DataMapper.setup(:mongodb, "mongo://localhost/posts")

class Post
  include DataMapper::Resource
  def self.default_repository_name; :mongodb; end

  property :title, String
  property :body,  String
  property :tags,  Array

  belongs_to :user
end

class User
  include DataMapper::Resource

  property :email, String
  property :name,  String

  has n, :posts
end




Pros & Cons
• The ultimate Polyglot ORM
• Simple relationships between persistence engines are easy
• Jack of all trades, master of none
• Perpetuates (sometimes) false assumptions
• Legacy stuff is in ActiveRecord anyway



Show and Tell: Social Storefront


The Application
• Dummy version of a store that lets others “follow” your purchases (like a less creepy version of Blippy)
• Four requirements:
    • users
    • purchasing
    • listings
    • social graph

Users
• I already have an authentication system
• I’m happy with it
• It’s Devise and ActiveRecord
• Stick with SQL

Purchasing
• Users need to be able to purchase items from my storefront
• I can’t lose their transactions
• I need full ACID
• SQL Again

Social Graph
• I want activity streams and one- and two-way relationships
• I need speed
• I don’t need consistency
• I’ll use Redis
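The requirements above map naturally onto sets (who follows whom) and lists (a per-user activity stream written at publish time). The sketch below uses Ruby's `Set` and `Array` as in-memory stand-ins so it runs without a Redis server. The `SocialGraph` class and its method names are hypothetical, and the Redis commands named in the comments (SADD, SINTER, LPUSH, LRANGE) are only rough equivalents.

```ruby
require "set"

# In-memory stand-in for the Redis structures: a set of followees and
# followers per user, and a list of recent activity per user.
class SocialGraph
  def initialize
    @following = Hash.new { |hash, user| hash[user] = Set.new }
    @followers = Hash.new { |hash, user| hash[user] = Set.new }
    @streams   = Hash.new { |hash, user| hash[user] = [] }
  end

  # One-way relationship (in Redis, roughly two SADD calls).
  def follow(follower, followee)
    @following[follower] << followee
    @followers[followee] << follower
  end

  # Two-way relationship: people I follow who also follow me
  # (in Redis, roughly a SINTER of the two sets).
  def friends(user)
    @following[user] & @followers[user]
  end

  # Denormalized write: copy the event into every follower's stream
  # (in Redis, roughly one LPUSH per follower).
  def publish(user, event)
    @followers[user].each do |follower|
      @streams[follower].unshift("#{user} #{event}")
    end
  end

  # Reading a stream is a single lookup with no joins (roughly LRANGE).
  def stream(user, limit = 10)
    @streams[user].first(limit)
  end
end

GRAPH = SocialGraph.new
GRAPH.follow("alice", "bob")
GRAPH.follow("bob", "alice")
GRAPH.follow("carol", "bob")
GRAPH.publish("bob", "bought a zombie movie")
```

The trade-off is the one the slide names: writes do more work and followers may briefly see different streams, in exchange for very cheap reads.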

Product Listings
• I am selling both books about Ruby and movies about zombies
• They have very different properties
• Products are relatively non-relational
• I’ll use MongoDB
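“Very different properties” is the case where a relational schema tends to end up with many nullable columns or one table per product type. In a document store each product carries only the fields that apply to it. A small sketch with plain Ruby hashes standing in for documents follows. The sample products, their field names, and the `where` helper are invented for illustration and are not MongoDB's query API.

```ruby
# Hypothetical product documents: one collection, different fields per type.
PRODUCTS = [
  { type: "book",  title: "A Ruby Book",    price: 30,
    author: "Some Author", pages: 400 },
  { type: "movie", title: "A Zombie Movie", price: 15,
    director: "Some Director", runtime_minutes: 96 }
].freeze

# Match on any fields; documents that lack a field simply do not match.
def where(docs, conditions)
  docs.select do |doc|
    conditions.all? { |field, value| doc[field] == value }
  end
end

BOOKS = where(PRODUCTS, { type: "book" })
BY_DIRECTOR = where(PRODUCTS, { director: "Some Director" })

# Fields that only some documents have need a default when read.
LONG_READS = PRODUCTS.select { |doc| doc.fetch(:pages, 0) > 300 }
```

Shared fields such as `title` and `price` can still be queried across all products, while type-specific fields stay out of the way of documents that do not use them.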

Demo and Walkthrough


Wrapping Up

These systems can (and should) live and work together

Most important step is to actually think about data design

When you have a whole bag of tools, things stop looking like nails

@mbleigh

Questions?

Persistence Smoothie: Blending SQL and NoSQL (RubyNation Edition)
