Persistence Smoothie
Blending SQL and NoSQL

Michael Bleigh
Intridea, Inc.

photo by Nikki L. via Flickr

Thursday, March 11, 2010
present.ly

tweetstream hashie
                        acts-as-taggable-on
                      subdomain-fu seed-fu
                           mustache_json

                           github.com/intridea

@mbleigh

You’ve (probably)
                       heard a lot about
                            NoSQL


NoSQL is a new way
                to think about
                  persistence


Atomicity
                           Consistency
                            Isolation
                            Durability


Denormalization
               Eventual Consistency
                  Schema-Free
                 Horizontal Scale


NoSQL tries to scale
                 (more) simply


NoSQL is going
                            mainstream


New York Times
                   Business Insider
                   BBC ShopWiki
                    GitHub Meebo
                 Disqus SourceForge
                      Sony Digg

...but not THAT
                              mainstream.


A word of caution...



NoSQL can
                           divide by zero


NoSQL doesn’t sleep, it waits
NoSQL can divide by zero
NoSQL counted to infinity, twice

NoSQL is a (growing)
             collection of tools, not
               a new way of life


Key-Value Stores
            Document Databases
               Column Stores
             Graph Databases

Key-Value Stores



Redis

                    • Key-value store + datatypes
                     • Lists, (Scored) Sets, Hashes
                    • Cache-like functions
                           (expiration)
                    • (Mostly) In-Memory
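
A rough sketch of the "key-value store + datatypes + expiration" idea, in plain Ruby. This is not the Redis client API: TinyStore, its methods, and the key names are all hypothetical, and the clock is injected only so expiry can be shown without sleeping.

```ruby
# A toy in-memory key-value store with per-key expiration.
# Hypothetical stand-in for illustration; not how Redis is implemented.
class TinyStore
  def initialize(clock = -> { Time.now.to_f })
    @clock = clock       # injectable time source
    @data = {}           # key => value
    @expires_at = {}     # key => absolute expiry time in seconds
  end

  # Store a value, optionally expiring after ttl seconds.
  def set(key, value, ttl: nil)
    @data[key] = value
    if ttl
      @expires_at[key] = @clock.call + ttl
    else
      @expires_at.delete(key)
    end
    value
  end

  # Fetch a value, treating expired keys as missing.
  def get(key)
    purge(key)
    @data[key]
  end

  # Append to a list value, creating the list on first use.
  def push(key, item)
    purge(key)
    (@data[key] ||= []) << item
    @data[key].length
  end

  private

  # Drop the key if its deadline has passed.
  def purge(key)
    deadline = @expires_at[key]
    return unless deadline && @clock.call >= deadline
    @data.delete(key)
    @expires_at.delete(key)
  end
end

now = 0.0
store = TinyStore.new(-> { now })
store.set("session:42", "mbleigh", ttl: 30)
store.push("recent:tags", "nosql")
store.push("recent:tags", "redis")
fresh = store.get("session:42")   # read before the deadline
now = 31.0
stale = store.get("session:42")   # read after the deadline
```

The point is only that values can be richer than strings and that keys can quietly disappear, which is what makes the store usable as a cache.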

Riak

                    • Combo key-value store and
                           document database
                    • HTTP REST interface
                    • “Link walking”
                    • Map-Reduce

Map/Reduce
                    • Massively parallel way to
                           process large datasets
                    • First you scour data and “map” a
                           new set of data
                    • Then you “reduce” the data
                           down to a salient result
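
The two steps above can be sketched in plain Ruby. This runs in a single process, so it shows the shape of the computation rather than the parallelism; the sample posts and the lambda names are made up for illustration.

```ruby
# Hypothetical sample documents; each carries a list of tags.
posts = [
  { title: "Redis basics",  tags: %w[nosql redis] },
  { title: "Mongo schemas", tags: %w[nosql mongodb] },
  { title: "SQL joins",     tags: %w[sql] }
]

# Map step: emit one [key, value] pair per tag on a document.
map = lambda do |post|
  post[:tags].map { |tag| [tag, { count: 1 }] }
end

# Reduce step: collapse every value emitted for one key into a single result.
reduce = lambda do |_key, values|
  { count: values.sum { |value| value[:count] } }
end

# Driver: map every document, group the emitted values by key, then reduce.
emitted = posts.flat_map { |post| map.call(post) }
grouped = emitted.group_by(&:first)
tag_counts = grouped.to_h do |tag, pairs|
  [tag, reduce.call(tag, pairs.map(&:last))]
end
```

In a real store the driver part is the database's job: it decides where map runs and how the emitted values are grouped before reduce sees them.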

// Emit one {count: 1} value per tag on each document.
map = function() {
  this.tags.forEach(function(tag) {
    emit(tag, {count: 1});
  });
}

// Sum the counts emitted for a single tag.
reduce = function(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i].count;
  }
  return {count: total};
}




Tokyo Cabinet
                             Dynomite
                           MemcachedDB
                             Voldemort

Document Databases



MongoDB

                    • Document store that speaks
                           BSON (Binary JSON)
                    • Indexing, simple query syntax
                    • GridFS
                    • Deliberate MapReduce

CouchDB

                    • JSON Document Store
                    • HTTP REST Interface
                    • Incremental MapReduce
                    • Intelligent Replication

Column-Oriented
                              Datastores


Cassandra

                    • Built by Facebook,
                           used by Twitter
                    • Pure horizontal scalability
                    • Schemaless

Graph Databases



Neo4j



When should I use
                        this stuff?


Complex, slow joins
               for “activity stream”

                 Denormalize,
              use Key-Value Store
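
One way to read "denormalize" here, as a sketch: copy each event into every follower's stream at write time, so reading a stream is a single key lookup instead of a join. A plain Hash stands in for the key-value store, and the key format, helper names, and sample users are all hypothetical.

```ruby
# Hash of lists standing in for a key-value store with list values.
store = Hash.new { |hash, key| hash[key] = [] }

# Who follows whom: actor => list of followers (made-up data).
followers = {
  "alice" => %w[bob carol],
  "bob"   => %w[alice]
}

# Write path: copy the event into every follower's stream (fan-out on write).
def publish(store, followers, actor, action)
  event = { actor: actor, action: action }
  followers.fetch(actor, []).each do |follower|
    store["stream:#{follower}"].unshift(event)   # newest first
  end
  event
end

# Read path: one key lookup, no joins.
def stream_for(store, user, limit = 10)
  store["stream:#{user}"].first(limit)
end

publish(store, followers, "alice", "bought a book")
publish(store, followers, "bob", "bought a movie")
publish(store, followers, "alice", "followed carol")

bob_stream = stream_for(store, "bob")
```

The trade is more writes and duplicated data in exchange for cheap reads, which suits a stream that is read far more often than it is written.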




Variable schema,
                   vertical interaction

                Document Database
                 or Column Store




Modeling multi-step
                relationships


                           Graph Database
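
To make "multi-step relationships" concrete, here is a small sketch of the kind of question a graph database answers natively: who is reachable within N hops? The adjacency list and the within_steps helper are hypothetical, and a real graph database would not be queried like this.

```ruby
# Adjacency list for a tiny "follows" graph; the names are made up.
graph = {
  "alice" => %w[bob],
  "bob"   => %w[carol dave],
  "carol" => %w[erin],
  "dave"  => [],
  "erin"  => []
}

# Breadth-first walk: every node reachable within max_steps hops,
# mapped to the number of hops it took to reach it.
def within_steps(graph, start, max_steps)
  distances = { start => 0 }
  frontier = [start]
  max_steps.times do
    next_frontier = []
    frontier.each do |node|
      graph.fetch(node, []).each do |neighbor|
        next if distances.key?(neighbor)   # already reached by a shorter path
        distances[neighbor] = distances[node] + 1
        next_frontier << neighbor
      end
    end
    frontier = next_frontier
  end
  distances.reject { |node, _| node == start }
end

two_hops = within_steps(graph, "alice", 2)
```

In SQL each extra hop is typically another self-join; in a graph store it is just one more step of the walk.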

NoSQL solves real
                scalability and data
                   design issues


Ben Scofield
               bit.ly/state-of-nosql


Ready to go?



Just one problem...



Your data is already
                 in a SQL database


We CAN all just
                             get along.


Three Ways



The Hard(ish) Way



The Easy Way



A Better Way?



The Hard Way:
                           Do it by hand.


class Post
  include MongoMapper::Document

  key :title, String
  key :body, String
  key :tags, Array
  key :user_id, Integer

  def user
    User.find_by_id(self.user_id)
  end

  def user=(some_user)
    self.user_id = some_user.id
  end
end

class User < ActiveRecord::Base
  def posts(options = {})
    Post.all({:conditions => {:user_id => self.id}}.merge(options))
  end
end
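
The same by-hand pattern, reduced to plain Ruby so it runs without either ORM. The in-memory registries below are hypothetical stand-ins for the two databases; only the shape of the association (store the id on one side, look it up on the other) is taken from the slide.

```ruby
# In-memory stand-in for the SQL side.
class User
  attr_reader :id, :name
  @records = {}

  class << self
    attr_reader :records

    def create(id, name)
      records[id] = new(id, name)
    end

    def find_by_id(id)
      records[id]
    end
  end

  def initialize(id, name)
    @id = id
    @name = name
  end

  # Reach across to the other store by filtering on the stored user_id.
  def posts
    Post.all.select { |post| post.user_id == id }
  end
end

# In-memory stand-in for the document side.
class Post
  attr_reader :title
  attr_accessor :user_id
  @records = []

  class << self
    def all
      @records
    end

    def create(title, user)
      post = new(title)
      post.user = user
      @records << post
      post
    end
  end

  def initialize(title)
    @title = title
  end

  # Only the id is kept on the post; the object is looked up on demand.
  def user
    User.find_by_id(user_id)
  end

  def user=(some_user)
    self.user_id = some_user.id
  end
end

michael = User.create(1, "Michael")
Post.create("Persistence Smoothie", michael)
Post.create("Blending stores", michael)
```

Every new relationship needs another pair of methods like these on each side, which is where the repetition comes from.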




Pros & Cons
                    •      Simple, maps to your domain

                    •      Works for small, simple ORM intersections

                    •      MUCH simpler in Rails 3

                    •      Complex relationships are a mess

                    •      Makes your models fat

                    •      As DRY as the ocean



The Easy Way:
                            DataMapper


DataMapper

                    • Generic, relational ORM
                    • Speaks pretty much everything
                           you’ve ever heard of
                    • Implements Identity Map
                    • Module-based inclusion

DataMapper.setup(:default, "mysql://localhost")
DataMapper.setup(:mongodb, "mongo://localhost/posts")

class Post
  include DataMapper::Resource
  def self.default_repository_name; :mongodb; end

  property :title, String
  property :body, String
  property :tags, Array

  belongs_to :user
end

class User
  include DataMapper::Resource

  property :email, String
  property :name, String

  has n, :posts
end




Pros & Cons
                    •      The ultimate Polyglot ORM

                    •      Simple relationships between persistence
                           engines are easy

                    •      Jack of all trades, master of none

                    •      Perpetuates (sometimes) false assumptions

                    •      Legacy stuff is in ActiveRecord anyway



Is there a better way?



Maybe.



Gloo: Cross-ORM
           Relationship Mapper

                           github.com/intridea/gloo


0.0.0.prealpha.1



Can’t we just sit
                           down and talk to
                             each other?


class Post
  include MongoMapper::Document

  key :title, String
  key :body, String
  key :tags, Array

  gloo :active_record do
    belongs_to :user
  end
end

class User < ActiveRecord::Base
  gloo :mongo_mapper do
    many :posts
  end
end




Goals/Status
                    • Be able to define relationships
                           on the terms of any ORM from
                           any class, ORM or not
                    • Right Now: Partially working
                           ActiveRecord relationships
                    • Doing it wrong? Maybe

Code Time:
                           Schema4Less


Social Storefront
                    • Dummy application of a store that
                           lets others “follow” your purchases (a
                           less creepy Blippy?)
                    • Four requirements:
                            •   users

                            •   purchasing

                            •   listings

                            •   social graph

Users

                    • I already have an authentication
                           system
                    • I’m happy with it
                    • It’s Devise and ActiveRecord
                    • Stick with SQL

Purchasing

                    • Users need to be able to purchase
                           items from my storefront
                    • I can’t lose their transactions
                    • I need full ACID
                    • I’ll use MySQL

Social Graph

                    • I want activity streams and one
                           and two way relationships
                    • I need speed
                    • I don’t need consistency
                    • I’ll use Redis
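
A sketch of how one-way and two-way relationships fall out of set operations. Ruby's Set stands in for per-user Redis sets here (Redis has matching commands such as SADD, SINTER and SDIFF); the index names and users are hypothetical, and this is not the demo application's code.

```ruby
require "set"

# Two Hash-of-Set indexes standing in for per-user sets in the store.
following = Hash.new { |hash, user| hash[user] = Set.new }
followers = Hash.new { |hash, user| hash[user] = Set.new }

# Record a one-way relationship so it can be looked up from either side.
follow = lambda do |from, to|
  following[from] << to
  followers[to] << from
end

follow.call("alice", "bob")
follow.call("bob", "alice")
follow.call("carol", "alice")

# Two-way relationships: people alice follows who also follow her back.
alice_friends = following["alice"] & followers["alice"]

# One-way relationships: people who follow alice without being followed back.
alice_fans = followers["alice"] - following["alice"]
```

Both lookups are set operations on data already keyed by user, so no join is involved.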

Product Listings
                    • I am selling both movies and
                           books
                    • They have very different
                           properties
                    • Products are relatively non-
                           relational
                    • I’ll use MongoDB
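
A sketch of why "very different properties" points toward documents: each listing carries only the fields that make sense for it. Ruby Hashes stand in for stored documents; the products, field names, and the find helper are invented for illustration and are not the MongoDB driver API.

```ruby
# Each product is a schema-free document (made-up sample data).
products = [
  { type: "movie", title: "Sample Movie", director: "A. Director", runtime_minutes: 95 },
  { type: "book",  title: "Sample Book",  author: "B. Author",     pages: 320 }
]

# Fields every listing shares, whatever its type.
common_fields = %i[type title]

# Fields that only some documents carry; adding more needs no migration.
extra_fields = products.to_h do |product|
  [product[:title], product.keys - common_fields]
end

# Query by example: keep documents whose fields equal the given criteria.
find = lambda do |criteria|
  products.select do |product|
    criteria.all? { |field, value| product[field] == value }
  end
end

books = find.call({ type: "book" })
```

In a relational schema the same catalog would need either a wide table full of NULLs or one table per product type.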

Demo and
                           Walkthrough


Wrapping Up



These systems can
           (and should) live and
              work together


Most important step
               is to actually think
                about data design


When you have a
                  whole bag of tools,
                  things stop looking
                       like nails

Questions?


