No SQL, No problem - using MongoDB in Ruby
  • Reader comment: Great presentation!

    The only part that seems a bit confusing is the page on eventual consistency. I'm guessing it made sense in the context of your live talk, but it seems you might be implying that MongoDB is an eventually consistent model (which it's not).
Speaker notes


  • E.F. Codd asserted that, mathematically, no commercial database conformed to his true relational model.
    Predicates, predicate variables, relations, tuples, superkeys, finite projections
    ACID: Atomicity, Consistency, Isolation, Durability

  • IBM’s first SQL implementation: System R (research prototype)
    paper: 1974






  • Bring attention to EAV box!
  • EAV originated with the concept of "association lists", AKA key/value pairs
    A “simple” way to attach arbitrary attributes and values to records in a normalized RDBMS
  • "Physical schema" (actual storage structure) is radically different from the "logical schema" – the way users and applications see it.
    PIVOTING: Converting logical schema to/from physical schema
    Note: full scan, no type control
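    The pivoting described in these notes can be made concrete with a minimal Ruby sketch. It assumes a hypothetical in-memory list of `[entity, attribute, value]` rows rather than a real EAV table. `pivot_eav` and `unpivot_eav` are made-up helper names. The sketch shows the two costs the notes mention: rebuilding one logical record means scanning every row, and every value comes back as text.

```ruby
# Hypothetical EAV rows: [entity_id, attribute, value].
# As is common in EAV tables, every value is stored as text, so types are lost.
EAV_ROWS = [
  [1, 'name',      'Fred Flintstone'],
  [1, 'wife',      'Wilma'],
  [2, 'name',      'Barney Rubble'],
  [2, 'wife',      'Betty'],
  [2, 'shoe_size', '11']
].freeze

# Pivot the physical schema (one row per attribute) into the logical
# schema (one hash per entity). Every row is visited: a full scan.
def pivot_eav(rows)
  records = {}
  rows.each do |entity, attribute, value|
    records[entity] ||= { 'id' => entity }
    records[entity][attribute] = value
  end
  records.values
end

# The reverse pivot: flatten logical records back into EAV triples.
def unpivot_eav(records)
  records.flat_map do |record|
    record.reject { |key, _| key == 'id' }
          .map { |attribute, value| [record['id'], attribute, value] }
  end
end

records = pivot_eav(EAV_ROWS)
records.each { |record| p record }
```

    Note that `shoe_size` comes back as the string `'11'`. Nothing in the EAV layout knows it is a number.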



  • Inefficiency - non-optimal JOINs
    Leaky abstraction









  • memcached - K/V store
    Lotus Notes - multivalue
    Zope Object Database - OODBMS
    marketability?







  • combination - Facebook’s “Hive”, built on Hadoop, with its SQL-like “QL”
  • mostly Hadoop, true

  • A column family is a container for columns, analogous to the table in a relational system.
    CouchDB - REST interface, JSON responses
    Redis - in-memory, with changes journaled to disk
    Tokyo Cabinet - modern DBM library, successor to QDBM
    HBase - the Hadoop database, modeled on Google's BigTable






  • JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999.
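    JSON's structures map directly onto Ruby hashes and arrays. As an illustration, which is not part of the talk, the blog-post document from the JSON/BSON example slide can be round-tripped with Ruby's standard `json` library. Plain JSON has no date type, so this sketch sends `created` as an ISO 8601 string and converts it back by hand. BSON avoids that step because it has a native UTC datetime type.

```ruby
require 'json'
require 'date'

# The blog-post document from the slides, as a plain Ruby hash.
post = {
  'author'   => 'Joe Example',
  'created'  => Date.new(2010, 3, 28),
  'title'    => 'My latest blog post',
  'tags'     => %w[example joe testing],
  'comments' => [
    { 'author' => 'jim',   'comment' => 'I disagree' },
    { 'author' => 'nancy', 'comment' => 'Good post' }
  ]
}

# JSON has no date type, so serialize the date explicitly as ISO 8601 text.
json_text = JSON.generate(post.merge('created' => post['created'].iso8601))
puts json_text

# Parsing returns nested hashes/arrays with string keys; the date comes
# back as a String and has to be converted by hand.
parsed = JSON.parse(json_text)
parsed['created'] = Date.iso8601(parsed['created'])
puts parsed == post
```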













  • assigning specific documents to “thing” and “fred”
  • Using DBRef to create a reference between collections
  • Showing that the $ref object in fred is tied to the actual record in the things collection
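    The same DBRef walk-through can be imitated without a server. This hedged sketch makes several simplifications:

    • The database is a hypothetical in-memory hash of collections keyed by `_id`.
    • String ids stand in for ObjectIds.
    • `db_ref` and `fetch_ref` are made-up helpers. They only mimic the shape of a DBRef (`$ref` names the collection, `$id` the target `_id`) and the shell's `fetch()`.

    It is not the Ruby driver's API.

```ruby
# Hypothetical in-memory "database": collection name => { _id => document }.
# String _ids stand in for ObjectIds to keep the sketch short.
DB = {
  'things' => { 'thing3' => { '_id' => 'thing3', 'x' => 'thing3', 'counter' => 3 } },
  'foo'    => {}
}

# Build a reference shaped like a DBRef: target collection plus target _id.
def db_ref(collection, id)
  { '$ref' => collection, '$id' => id }
end

# Resolve a reference the way the shell's fetch() does conceptually:
# look up the named collection, then the _id. Returns nil when missing.
def fetch_ref(ref, db = DB)
  db.fetch(ref['$ref'], {})[ref['$id']]
end

fred = { '_id' => 'fred', 'name' => 'Fred Flintstone', 'things' => [] }
fred['things'] << db_ref('things', 'thing3')
DB['foo'][fred['_id']] = fred

# Following the stored reference returns the actual record in "things".
p fetch_ref(DB['foo']['fred']['things'].first)
```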


  • find always returns an iterable Mongo::Cursor. The query doesn’t run until you actually try to retrieve data from the cursor. Cursors have a to_a method.
    find_one returns a single document
    $where accepts any valid JavaScript expression
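  • $in = value is one of the given array's elements, $nin = value is none of them, $all = array field contains all the given values, $size = array field has exactly N elements
    $exists = field is present (true) or absent (false), $type = field has the given BSON type (numeric code, e.g. 2 = string, 16 = 32-bit int)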
  • complete Ruby program
  • collection.update(selector, document, options = {})
    Upserts - :upsert => true
  • MapReduce: a patented software framework introduced by Google to support distributed computing on large data sets on clusters of computers.
  • CouchDB - lacks native conditionals, but uses JavaScript anyway
    is SQL that much better?
  • specify the map and reduce functions in JavaScript, as strings
    reduce receives, for each key emitted by map, the array of values emitted for that key
  • chunks = 256k, auto-sharded
    GridFileSystem - emulates a filesystem: write, open, close, delete, etc.
    GridFS stores arbitrary metadata - GridFS is a specification for mapping chunks to files
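    The map/reduce contract in these notes can be traced in plain Ruby: map emits key/value pairs, and reduce receives, for each key, every value emitted for it. This single-process sketch uses made-up comment data that mirrors the votes-per-author example on the Map/Reduce slide. It is not how MongoDB runs the JavaScript functions.

    It also shows why reduce should return the same shape that map emits. MongoDB may feed reduce's output back into reduce, so the result must not change when partial results are re-reduced.

```ruby
# Made-up comments, shaped like the documents in the slides' example.
COMMENTS = [
  { 'author' => 'sbeam',  'votes' => 10 },
  { 'author' => 'barney', 'votes' => 13 },
  { 'author' => 'sbeam',  'votes' => 11 }
].freeze

# map: called once per document; returns the [key, value] pairs it "emits".
def map_fn(doc)
  [[doc['author'], { 'votes' => doc['votes'] }]]
end

# reduce: combines all values emitted for one key. It returns the same
# shape as an emitted value, so its output can be fed back into it.
def reduce_fn(_key, values)
  { 'votes' => values.sum { |v| v['votes'] } }
end

def map_reduce(docs)
  # Group every emitted value by its key ...
  grouped = Hash.new { |hash, key| hash[key] = [] }
  docs.each do |doc|
    map_fn(doc).each { |key, value| grouped[key] << value }
  end
  # ... then reduce each group; rows use _id/value like MongoDB's output.
  grouped.map { |key, values| { '_id' => key, 'value' => reduce_fn(key, values) } }
end

results = map_reduce(COMMENTS)
results.each { |row| p row }

# Re-reducing a partial result must give the same answer as one pass.
partial = [reduce_fn('sbeam', [{ 'votes' => 10 }]), { 'votes' => 11 }]
p reduce_fn('sbeam', partial)
```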
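    Likewise, the chunks-to-file mapping that GridFS specifies can be sketched in memory. The sketch rests on these assumptions:

    • Nothing here talks to MongoDB, and `grid_put`/`grid_get` are hypothetical helpers.
    • The field names (`files_id`, `n`, `data`, `length`, `chunkSize`) follow the GridFS spec's `fs.files`/`fs.chunks` documents.
    • The 256 KB chunk size is the figure from these notes. Current MongoDB documentation gives a 255 KB default.

```ruby
CHUNK_SIZE = 256 * 1024 # bytes per chunk, the figure quoted in the notes

# Split a file's bytes into GridFS-shaped documents: one "files" document
# with metadata, plus numbered "chunks" pointing back to it via files_id.
def grid_put(bytes, filename, file_id)
  data = bytes.b # treat the content as raw binary
  chunks = (0...data.bytesize).step(CHUNK_SIZE).map.with_index do |offset, n|
    { 'files_id' => file_id, 'n' => n, 'data' => data.byteslice(offset, CHUNK_SIZE) }
  end
  file = { '_id' => file_id, 'filename' => filename,
           'length' => data.bytesize, 'chunkSize' => CHUNK_SIZE }
  [file, chunks]
end

# Reassemble a file: keep its chunks, order them by n, concatenate the data.
def grid_get(file, chunks)
  chunks.select { |chunk| chunk['files_id'] == file['_id'] }
        .sort_by { |chunk| chunk['n'] }
        .map { |chunk| chunk['data'] }
        .join
end

content = Random.new(42).bytes(600 * 1024) # stand-in for kitty.jpg
file, chunks = grid_put(content, 'kitty.jpg', 1)
puts "#{file['length']} bytes in #{chunks.size} chunks"
puts grid_get(file, chunks.shuffle) == content
```

    Because each chunk carries its sequence number `n`, the chunks can be stored or streamed in any order and still be reassembled. This is what makes incremental delivery possible.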

  • Capped collections: FIFO - logging, caches, auto-archiving
    Multikeys: auto-index on any array values
    Full-text search: split text into an array, use multikeys - no native stemming or bulk indexing (yet)
    Auto-sharding: beta. Uses a router (godfather), config servers (consigliere), mongod instances (map/reduce recommended)
    Replication: master/slave - to be replaced by Replica Sets in 1.6 - Eventual Consistency
  • Amazon popularized the concept of “Eventual Consistency”.  Their definition is: the storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.
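    Capped collections, as described in these notes, behave like a fixed-size FIFO. In this hypothetical in-memory sketch, inserts keep their order, and once the cap is reached the oldest documents are evicted. `CappedLog` is made up, and it caps by document count only. Real capped collections are bounded by size in bytes, and optionally by document count.

```ruby
# Hypothetical stand-in for a capped collection. Real capped collections
# are capped by size in bytes (optionally also by document count); this
# sketch caps by document count only.
class CappedLog
  def initialize(max_docs)
    @max_docs = max_docs
    @docs = []
  end

  # Append in insertion order; when over the cap, evict the oldest (FIFO).
  def insert(doc)
    @docs << doc
    @docs.shift while @docs.size > @max_docs
    self
  end

  # Natural order is insertion order, oldest first.
  def find
    @docs.dup
  end
end

log = CappedLog.new(3)
(1..5).each { |i| log.insert({ 'msg' => "event #{i}" }) }
p log.find.map { |doc| doc['msg'] }
```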
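    The full-text search note amounts to an inverted index: split text into an array, then let a multikey index cover each element. The following is a minimal sketch under that assumption:

    • The `_keywords` field and the `keywords`, `build_index` and `search` helpers are hypothetical.
    • Tokenizing is a naive lowercase split.
    • As the note says, there is no stemming.

    Requiring every query word mirrors an `$all` query on the keyword array.

```ruby
# Naive tokenizer: lowercase, split on non-alphanumerics, drop duplicates.
# No stemming, so "databases" and "database" are different keywords.
def keywords(text)
  text.downcase.scan(/[a-z0-9]+/).uniq
end

# Inverted index: keyword => _ids of documents whose keyword array has it.
# A multikey index gives the same effect, one entry per array element.
def build_index(docs)
  index = Hash.new { |hash, word| hash[word] = [] }
  docs.each do |doc|
    doc['_keywords'].each { |word| index[word] << doc['_id'] }
  end
  index
end

# Every query word must match (like $all on the keyword array).
def search(index, query)
  keywords(query).map { |word| index.fetch(word, []) }.reduce(:&) || []
end

docs = [
  { '_id' => 1, 'title' => 'MongoDB in Ruby' },
  { '_id' => 2, 'title' => 'Ruby on Rails' },
  { '_id' => 3, 'title' => 'Relational databases' }
]
docs.each { |doc| doc['_keywords'] = keywords(doc['title']) }

index = build_index(docs)
p search(index, 'ruby')
p search(index, 'Ruby MongoDB')
p search(index, 'database') # no stemming: does not find "databases"
```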
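    The eventual-consistency questions from slide 73 can be demonstrated with a toy master/slave pair: stale reads, and not reading your own writes. Everything here is hypothetical. `Master` and `Slave` are made-up in-memory classes, and replication is an explicit `sync` call standing in for asynchronous replication.

    A read from the slave before `sync` is stale. Once updates stop and the slave catches up, reads converge, which matches the Amazon definition quoted in the note above.

```ruby
# Toy master: applies each write locally and appends it to an ordered log.
class Master
  attr_reader :log

  def initialize
    @data = {}
    @log = []
  end

  def write(key, value)
    @data[key] = value
    @log << [key, value]
  end

  def read(key)
    @data[key]
  end
end

# Toy slave: replays the master's log only when sync is called, standing
# in for asynchronous replication lag.
class Slave
  def initialize(master)
    @master = master
    @data = {}
    @applied = 0 # number of log entries replayed so far
  end

  def sync
    @master.log[@applied..].each { |key, value| @data[key] = value }
    @applied = @master.log.size
  end

  def read(key)
    @data[key]
  end
end

master = Master.new
slave = Slave.new(master)

master.write('fred', 'Yabba Dabba doo')
stale = slave.read('fred') # nil: the slave has not replayed the write yet
slave.sync
fresh = slave.read('fred') # caught up: now matches master.read('fred')
p [master.read('fred'), stale, fresh]
```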

  • MongoRecord - 10gen’s original OM (object mapper), ActiveRecord-ish, works with Rails
    MongoMapper - DataMapper-ish, v0.7, in production
    Candy - Candy's goal is to provide the simplest possible object persistence for the MongoDB database. By "simple" we mean "nearly invisible." Candy doesn't try to mirror ActiveRecord or DataMapper. (alpha 0.2)

  • taken directly from tests


Transcript

  • 1. No SQL, No Problem: development with MongoDB and Sinatra. Sam Beam, Onset Corps
  • 2. A Brief History of Relational Databases* (*as if you need it) • “A Relational Model of Data for Large Shared Databanks” - Edgar F. Codd, 1970 -- IBM • “Standard” SQL - Structured Query Language • ACID (atomicity, consistency, isolation, durability) photo: osti.gov
  • 3. The Relational Data Model • An engine based on “rules” and “facts” • Consistency/Isolation self-enforced • ACID
  • 4. SQL • History • Purpose • “Standards” • Extensions
  • 5. SQL • History • 1974 - IBM Research “SEQUEL”; Ingres, UC Berkeley • 1979 - 1983 First IBM releases
  • 7. SQL • Purpose • Simple, set-based, declarative syntax: SELECT name FROM emp WHERE salary > 55000 AND dept = 'sales' (images: brochure, 1980-85, computerhistory.org; DEC microcomputer, 1983 (128kB RAM), computerweekly.com)
  • 8. SQL • “Standards” • ANSI SQL-92 • no existing RDBMS in full compliance (photo: IBM Starlink Workstations, 1983, computerweekly.com)
  • 9. SQL • Extensions • Procedural languages (PL/SQL, T-SQL, PL/pgSQL, etc.) • Storage type extensions (BLOB, XML, Java) (photo: IBM AS/400, computerweekly.com)
  • 10. SQL Extensions • SQL/XML • Store documents as CLOB or mapped to columns • Tools for ‣ annotating ‣ indexing ‣ searching (XPath) ‣ validating (DTD) XML documents directly
    SELECT XMLElement(name emp, XMLForest(last_name || ', ' || first_name AS fullname, salary)) FROM emp;
    <emp><fullname>Tiger, Scott</fullname><salary>10000</salary></emp>
    <emp><fullname>Smith, John</fullname><salary>12000</salary></emp>
  • 12. SQL Extensions • Example of specialized, sparse data: Geospatial/Vector • Several choices, nearly all XML: • Vector Markup Language (VML) • Scalable Vector Graphics (SVG) • Geography Markup Language (GML) • LandXML • Keyhole Markup Language (KML) • X3D • GPS eXchange Format (GPX) • VRML
  • 13. SQL Extensions • Example of specialized, sparse data: Geospatial/Vector (the sidebar repeats the format list from slide 12). The query, clipped at the slide edge in several places (marked …):
    SELECT TO_NUMBER(EXTRACTVALUE(VALUE(t1), 'trk/number')) as track_number,
           SUBSTR(EXTRACTVALUE(VALUE(t1), 'trk/name'),1,10) as track_name,
           mdsys.sdo_geometry(
             CASE WHEN (SELECT COUNT(*) FROM TABLE(XMLSequence(EXTRACT(VALUE(allSegments), 'trk/trkseg', 'xmlns="http://www.topografix.com/GPX/1/1"')))) > 1
                  THEN 3006 ELSE 3002
             END,
             8307, NULL,
             GetElemInfoFromXML(EXTRACT(VALUE(allSegments),'trk/trkseg','xmlns="http://www.topografix.com/GPX/1/1"')),
             CAST(MULTISET(SELECT case when mod(rownum,3) = 1 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lon','xmlns="http://…
                                       when mod(rownum,3) = 2 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/@lat','xmlns="http://…
                                       when mod(rownum,3) = 0 then TO_NUMBER(EXTRACT(VALUE(t), '/trkpt/ele/text()','xmlns="h…1/1"')) end ordinate
                           FROM (select level as rin from dual connect by level < 4) r,
                                TABLE(XMLSequence(EXTRACT(VALUE(allSegments),'trk/trkseg/trkpt','xmlns="http://www.topogr…
                          ) as mdsys.sdo_ordinate_array
             )) as geom
      FROM GPX2 g,
           TABLE(XMLSequence(EXTRACT(g.OBJECT_VALUE,'/gpx/trk','xmlns="http://www.topografix.com/GPX/1/1"'))) t1,
           TABLE(XMLSequence(EXTRACT(VALUE(t1),'trk[number=' || EXTRACTVALUE(VALUE(t1), 'trk/number') || ']','xmlns="http://…1/1"'))) allSegments;
  • 14. Typical RDBMS Schema Issues • Conflicts • Downtime • Deployment • Scaling image credit: www.magentocommerce.com
  • 15. Typical RDBMS Schema Issues • Conflicts • Downtime • Deployment • Scaling
    “For purposes of flexibility, the Magento database heavily utilizes an Entity-Attribute-Value (EAV) data model. As is often the case, the cost of flexibility is complexity - Magento is no exception. The process of manipulating data in Magento is often more “involved” than that typically experienced using traditional relational tables.” http://www.magentocommerce.com/wiki/development/ (image credit: www.magentocommerce.com)
  • 16. Entity-Attribute-Value • When you have Unknown Unknowns • A “thing” and “properties” engine (instead of “rules” and “facts”) • Often found in: • e-commerce • medical records • event logging • science (image: Yale Univ. School of Medicine, senselab.med.yale.edu)
  • 17. Entity-Attribute-Value Only alternative: create table customer( Cust_ID number, Cust_Name varchar2(xxx), Cust_Contact varchar2(xxxx), field1 varchar2(4000), field2 varchar2(4000), ... fieldN varchar2(4000) ) credit: asktom.oracle.com
  • 18. Logical Progression <sarcasm> new datatype “xml_universe” XML serialized object DTD containing all the known attributes of "everything" Schema: CREATE TABLE everything ( xml_universe NOT NULL ); </sarcasm>
  • 19. Enter the Dragon “NoSQL”
  • 20. Enter the Dragon “NoSQL” NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. -- Wikipedia
  • 22. “NoSQL” NoSQL is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. -- Wikipedia • Many techniques • Many weapons • Many use cases
  • 23. “NoSQL”
  • 24. “NoSQL” “Not Only SQL” “Non-Relational Database Management System”
  • 25. “NoSQL”: “Not Only SQL” or “Non-Relational Database Management System” (overlaid word: “negative”)
  • 26. Structured Storage Key-Value Store Document Store Multivalue Database OODBMS
  • 27. Paradigm Change
  • 28. Paradigm Change CAP Theorem “One can only have two of Consistency, Availability, and tolerance to network Partitions at the same time”
  • 29. Paradigm Change CAP Theorem “One can only have two of Consistency, Availability, and tolerance to network Partitions at the same time” If the network is broken, your database won’t work.
  • 30. Paradigm Change CAP Theorem “One can only have two of Consistency, Availability, and tolerance to network Partitions at the same time” If the network is broken, your database won’t work. The network is going to break.
  • 31. Paradigm Change CAP Theorem Consistency Availability
  • 32. Paradigm Change • Non-relational • Schema-free • “Easily” scalable • OO friendly (no ORM) • web friendly - REST/JSON API
  • 33. Paradigm Change When? • High-volume, low value (social media, web) • Unknown Unknowns • Storage of application objects • Caching, Logging
  • 34. Paradigm Change • When NOT? • ACID • Traditional Waterfall development • Problems requiring the Relational model* (*combination?)
  • 35. Paradigm Change Who’s driving it?
  • 36. The Candidates
  • 37. The Candidates
    Name            Storage Type      License      Implementation
    Cassandra       ColumnFamily *    Apache 2.0   Java
    CouchDB         Document          Apache 2.0   Erlang
    HBase           ColumnFamily *    Apache 2.0   Java
    Redis           Key/Value         BSD          C
    Tokyo Cabinet   Key/Value         LGPL         C
    Voldemort       Key/Value         Apache 2.0   Java
    Memcached       Key/Value         BSD          C
    MongoDB         Document (BSON)   AGPL 3.0     C++
  • 38. MongoDB • Document-oriented • Dynamic queries • Full dynamic index support • Efficient binary large-object storage • Built for speed • Replication and Auto-failover
  • 39. Installation ‣ Download source or binary for OS X, Linux, Windows http://www.mongodb.org/ ‣ Make data directory $ mkdir /some/path/mongodb ‣ Run! $ bin/mongod --dbpath=/some/path/mongodb
  • 40. Database Structure ‣Separate DBs ‣Organized into Collections top-level key Document
  • 41. Collections ‣Group things into logical classes ‣Indexable by one or more keys ‣Schema-free!
  • 42. Documents ‣Always contains key _id ‣Creating Relationships: subdocument, shared key, or DBRef ‣Native storage and transfer : BSON
  • 43. JSON A collection of name/value pairs. [object, record, struct, dictionary, hash table, keyed list, associative array] An ordered list of values. [array, vector, list, sequence] http://json.org/
  • 44. BSON BSON is a binary encoded serialization of JSON-like documents. http://bsonspec.org/ http://www.mongodb.org/display/DOCS/BSON
  • 45. JSON/BSON Example
    { author : "Joe Example",
      created : new Date('2010-03-28'),
      title : "My latest blog post",
      tags : [ "example", "joe", "testing" ],
      comments : [ { author : 'jim', comment : 'I disagree' },
                   { author : 'nancy', comment : 'Good post' } ] }
    http://bsonspec.org/ http://www.mongodb.org/display/DOCS/BSON
  • 46. mongo shell $ mongo MongoDB shell version: 1.5.0-pre- url: test connecting to: test type "help" for help > show dbs admin shorty test > use test switched to db test >
  • 47. mongo shell $ mongo MongoDB shell version: 1.5.0-pre- url: test connecting to: test type "help" for help > show dbs admin shorty test > use test switched to db test > show collections foo fs.chunks fs.files system.indexes >
  • 48. mongo shell > db.foo.find() > db.foo.save( { name : 'Fred Flintstone', catchphrase : 'Yabba Dabba doo' } ) > db.foo.find() { "_id" : ObjectId("4bcb1dc899d3ae6c0c68035b"), "name" : "Fred Flintstone", "catchphrase" : "Yabba Dabba doo" } >
  • 49. mongo shell > db.foo.save( { name : 'Fred Flintstone', catchphrase : 'Yabba Dabba doo' } ) > db.foo.save( { name : 'Barney Rubble' } ) > db.foo.find() { "_id" : ObjectId("4bcb213199d3ae6c0c680361"), "name" : "Fred Flintstone", "catchphrase" : "Yabba Dabba doo" } { "_id" : ObjectId("4bcb213799d3ae6c0c680362"), "name" : "Barney Rubble" } >
  • 50. mongo shell > for( var i = 1; i < 10; i++ ) db.things.save( { x:'thing'+i, counter:i } ); >
  • 51. mongo shell > for( var i = 1; i < 10; i++ ) db.things.save( { x:'thing'+i, counter:i } ); > db.things.find() { "_id" : ObjectId("4bcb22e199d3ae6c0c680363"), "x" : "thing1", "counter" : 1 } { "_id" : ObjectId("4bcb22e199d3ae6c0c680364"), "x" : "thing2", "counter" : 2 } { "_id" : ObjectId("4bcb22e199d3ae6c0c680365"), "x" : "thing3", "counter" : 3 } { "_id" : ObjectId("4bcb22e199d3ae6c0c680366"), "x" : "thing4", "counter" : 4 } { "_id" : ObjectId("4bcb22e199d3ae6c0c680367"), "x" : "thing5", "counter" : 5 } { "_id" : ObjectId("4bcb22e199d3ae6c0c680368"), "x" : "thing6", "counter" : 6 } { "_id" : ObjectId("4bcb22e199d3ae6c0c680369"), "x" : "thing7", "counter" : 7 } { "_id" : ObjectId("4bcb22e199d3ae6c0c68036a"), "x" : "thing8", "counter" : 8 } { "_id" : ObjectId("4bcb22e199d3ae6c0c68036b"), "x" : "thing9", "counter" : 9 } >
  • 52. mongo shell > db.things.find( { x : 'thing7' } ) { "_id" : ObjectId("4bcb22e199d3ae6c0c680369"), "x" : "thing7", "counter" : 7 } >
  • 53. mongo shell > db.things.find( { counter : { $gt : 5, $lte : 8 } } ) { "_id" : ObjectId("4bcb22e199d3ae6c0c680368"), "x" : "thing6", "counter" : 6 } { "_id" : ObjectId("4bcb22e199d3ae6c0c680369"), "x" : "thing7", "counter" : 7 } { "_id" : ObjectId("4bcb22e199d3ae6c0c68036a"), "x" : "thing8", "counter" : 8 } >
  • 54. mongo shell > db.foo.update( { name : 'Fred Flintstone' }, { $set : { wife : 'Wilma' }} ) > db.foo.update( { name : 'Barney Rubble' }, { $set : { wife : 'Betty' }} ) > db.foo.find() { "_id" : ObjectId("4bcb213199d3ae6c0c680361"), "catchphrase" : "Yabba Dabba doo", "name" : "Fred Flintstone", "wife" : "Wilma" } { "_id" : ObjectId("4bcb213799d3ae6c0c680362"), "name" : "Barney Rubble", "wife" : "Betty" }
  • 55. mongo shell > db.foo.update( { name : 'Fred Flintstone' }, { $set : { kids : ['Pebbles'] } } ) > db.foo.findOne( {name : 'Fred Flintstone'} ) { "_id" : ObjectId("4bcb213199d3ae6c0c680361"), "catchphrase" : "Yabba Dabba doo", "kids" : [ "Pebbles" ], "name" : "Fred Flintstone", "wife" : "Wilma" }
  • 56. mongo shell > db.foo.update( { name : 'Fred Flintstone' }, { $push : { kids : 'Bam-Bam' } } ) > db.foo.findOne( {"_id" : ObjectId("4bcb213199d3ae6c0c680361")}) { "_id" : ObjectId("4bcb213199d3ae6c0c680361"), "catchphrase" : "Yabba Dabba doo", "kids" : [ "Pebbles", "Bam-Bam" ], "name" : "Fred Flintstone", "wife" : "Wilma" } >
  • 57. mongo shell > thing = db.things.findOne( { counter : 3 } ) { "_id" : ObjectId("4bcb22e199d3ae6c0c680365"), "x" : "thing3", "counter" : 3 } > fred = db.foo.findOne( {name : 'Fred Flintstone'} ) { "_id" : ObjectId("4bcb213199d3ae6c0c680361"), "catchphrase" : "Yabba Dabba doo", "kids" : [ "Pebbles", "Bam-Bam" ], "name" : "Fred Flintstone", "things" : null, "wife" : "Wilma" } >
  • 58. mongo shell > fred.things = [] [ ] > fred.things.push ( new DBRef('things', thing._id) ) > db.foo.save(fred) > fred { "_id" : ObjectId("4bcb213199d3ae6c0c680361"), "catchphrase" : "Yabba Dabba doo", "kids" : [ "Pebbles", "Bam-Bam" ], "name" : "Fred Flintstone", "things" : [ { "$ref" : "things", "$id" : ObjectId("4bcb22e199d3ae6c0c680365") } ], "wife" : "Wilma" }
  • 59. mongo shell > fred.things[0] { "$ref" : "things", "$id" : ObjectId("4bcb22e199d3ae6c0c680365") } > fred.things[0].fetch() { "_id" : ObjectId("4bcb22e199d3ae6c0c680365"), "x" : "thing3", "counter" : 3 } >
  • 60. and Ruby
  • 61. and Ruby mongo-ruby-driver gem install mongo gem install mongo_ext
  • 62. and Ruby • Querying
    require 'mongo'
    include Mongo
    DB = Connection.new('localhost').db('test')
    coll = DB['mycollection']
    coll.find( { :first_name => 'Fred' } )           # find all Freds
    coll.find( { :email => /@gmail\.com$/i } )       # Regex
    coll.find_one( { :_id => 42 } )                  # single record
    coll.find( { :age => { '$gte' => 21 } } )        # native conditionals
    coll.find( { 'author.first_name' => 'John' } )   # embedded object
    coll.find( { '$where' => 'this.age % 7 == 0' } ) # custom conditional
  • 63. and Ruby • More Querying
    Native conditionals: $in, $nin, $all, $size, $exists, $type
    :fields (subset of document keys)
      coll.find({:zipcode => '03801'}, {:fields => [:first_name, :last_name]})
    :limit, :skip for pagination
      coll.find({:zipcode => '03801'}, {:limit => 100})
    sorting
      coll.find({:zipcode => '03801'}, {:sort => [:created_at, 'ascending']})
    count, distinct and group (these do not use Map/Reduce)
  • 64. and Ruby • Inserting and finding
    require 'rubygems'
    require 'mongo'
    include Mongo
    DB = Connection.new(ENV['DATABASE_URL'] || 'localhost').db('test')
    coll = DB['eyes']
    coll.remove
    100.times { |i| coll.insert('i' => i, 'token' => rand(1000)) }
    coll.find({'token' => { '$gt' => 900 }},
              {:sort => [:token, :descending]}).each { |row| puts row.inspect }
  • 65. and Ruby • Updating in place
    coll.update({:token => { '$gt' => 970 }}, {'$set' => {:winner => true}}, {:multi => true})
    coll.find(:winner => true).each { |row| puts row.inspect }
    Output:
    {"_id"=>ObjectID('4bcb51a65ea7db36fc000036'), "i"=>53, "token"=>990, "winner"=>true}
    {"_id"=>ObjectID('4bcb51a65ea7db36fc000038'), "i"=>75, "token"=>977, "winner"=>true}
  • 66. and Ruby Map/Reduce Parallel Computing Delayed Gratification
  • 67. and Ruby Map/Reduce Map: chops massive data set into smaller problem-specific set Reduce: iterates over Map results, combining as needed
  • 69. and Ruby • Map/Reduce
    map    = "function() { emit(this.author, {votes: this.votes}); }"
    reduce = "function(key, values) {
      var sum = 0;
      values.forEach(function(doc) {
        sum += doc.votes;
      });
      return {votes: sum};
    }"
    @results = @comments.map_reduce(map, reduce)
    puts @results.find().to_a.inspect
    [{"_id"=>"sbeam", "value"=>{"votes"=>21.0}}, {"_id"=>"barney", "value"=>{"votes"=>13.0}}]
    http://kylebanker.com/blog/2009/12/mongodb-map-reduce-basics/
  • 70. and Ruby GridFS Store large files Transparently chunks Incremental delivery (video streaming)
  • 71. and Ruby • GridFS
    @grid = Grid.new(@db)
    # Saving IO data (opened in binary mode) and including the optional filename
    file_id = File.open("kitty.jpg", "rb") { |image| @grid.put(image, :filename => "kitty.jpg") }
    ....
    @grid = Grid.new(@db)
    # writing file with given _id to HTTP
    if img = @grid.get(Mongo::ObjectID.from_string(params[:file_id]))
      headers 'Content-Type' => img.content_type
      img.read
    end
  • 72. and Ruby Other interesting things Capped collections (think memcached) Multikeys and Full-text search Auto-sharding Replica Sets
  • 73. and Ruby • Eventual Consistency • Can my use case tolerate: • stale reads? • reading values out of order? • not reading my own writes?
  • 74. and Ruby Ruby Adapters • MongoRecord http://github.com/mongodb/mongo-record • MongoMapper http://github.com/jnunemaker/mongomapper • Candy http://github.com/SFEley/candy
  • 75. and Ruby • MongoMapper example
    class Post
      include MongoMapper::Document
      key :title, String
      key :body, String
      many :comments, :as => :commentable, :class_name => 'PostComment'
      timestamps!
    end
    class PostComment
      include MongoMapper::Document
      key :username, String, :default => 'Anonymous'
      key :body, String
      key :commentable_id, ObjectId
      key :commentable_type, String
      belongs_to :commentable, :polymorphic => true
      timestamps!
    end
  • 76. Much more... • Theory http://blog.mongodb.org/ • GUIs http://blog.timgourley.com/tagged/nosql • Support http://www.10gen.com/
  • 77. • sbeam@onsetcorps.net • http://twitter.com/sbeam • http://github.com/sbeam/mf1
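    The `_id` values printed in the mongo shell slides are not random. In MongoDB's ObjectId format, the first 4 bytes (8 hex digits) are a big-endian Unix timestamp in seconds. The standard-library sketch below decodes one of them. `oid_timestamp` is a hypothetical helper, not a driver method, and it only validates the 24-hex-digit shape.

```ruby
# Decode the creation time embedded in a 24-hex-digit ObjectId string.
# The first 8 hex digits (4 bytes) are seconds since the Unix epoch.
def oid_timestamp(hex)
  raise ArgumentError, 'expected 24 hex digits' unless hex.match?(/\A\h{24}\z/)

  Time.at(hex[0, 8].to_i(16)).utc
end

# An _id taken from the "things" collection in the shell slides.
t = oid_timestamp('4bcb22e199d3ae6c0c680363')
puts t.to_i
puts t.strftime('%Y-%m-%d')
```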
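    The conditional operators listed on slides 53 and 63 ($gt, $lte, $in, $nin, $all, $size, $exists) are easiest to grasp through their matching rules. In this hedged, server-free sketch, `matches?`, `compare` and `operator_ok?` are hypothetical helpers. They cover only these operators, plus $gte, $lt and plain equality, on top-level fields of a Ruby hash.

    Unlike the server, the sketch does no dotted-path lookups, does not match a scalar against array elements, and omits $type. The usage reproduces the `{ counter : { $gt : 5, $lte : 8 } }` query from slide 53.

```ruby
# Hypothetical matcher for a small subset of MongoDB query operators,
# applied to top-level fields of a plain Ruby hash (no dotted paths).
def matches?(doc, query)
  query.all? do |field, condition|
    present = doc.key?(field)
    value = doc[field]
    if condition.is_a?(Hash)
      condition.all? { |op, arg| operator_ok?(op, arg, present, value) }
    else
      present && value == condition # plain equality
    end
  end
end

# <=> returns nil for incomparable values (e.g. a String vs 5, or nil for a
# missing field); treat that as "no match" instead of raising.
def compare(value, arg)
  result = value <=> arg
  !result.nil? && yield(result)
end

def operator_ok?(op, arg, present, value)
  case op
  when '$gt'     then compare(value, arg) { |c| c > 0 }
  when '$gte'    then compare(value, arg) { |c| c >= 0 }
  when '$lt'     then compare(value, arg) { |c| c < 0 }
  when '$lte'    then compare(value, arg) { |c| c <= 0 }
  when '$in'     then present && arg.include?(value)
  when '$nin'    then !(present && arg.include?(value))
  when '$all'    then value.is_a?(Array) && (arg - value).empty?
  when '$size'   then value.is_a?(Array) && value.size == arg
  when '$exists' then present == arg
  else raise ArgumentError, "unsupported operator #{op}"
  end
end

things = (1..9).map { |i| { 'x' => "thing#{i}", 'counter' => i } }
picked = things.select { |d| matches?(d, { 'counter' => { '$gt' => 5, '$lte' => 8 } }) }
p picked.map { |d| d['x'] }
```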
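    In the same spirit, the update modifiers from slides 54 to 56 and 65 (`$set`, `$push`) and the `:upsert` option from the notes can be sketched as plain hash operations. The sketch is hypothetical throughout:

    • `apply_update` and `update_docs` are made-up helpers over an in-memory array.
    • The selector supports only field equality.
    • An upsert starts from the selector's fields before applying the update.
    • As with the server's default, only the first matching document changes unless `multi` is true.
    • An update with no modifiers replaces the document while keeping its `_id`.

```ruby
# Apply an update to one document (a Ruby hash) in place.
# Only $set and $push are sketched; an update with no modifiers replaces
# the whole document but keeps its _id.
def apply_update(doc, changes)
  modifiers = changes.keys.select { |key| key.start_with?('$') }
  if modifiers.empty?
    id = doc['_id']
    doc.replace(changes.dup)
    doc['_id'] = id unless id.nil?
    return doc
  end
  raise ArgumentError, 'cannot mix modifiers and plain fields' if modifiers.size != changes.size

  changes.each do |op, fields|
    fields.each do |field, value|
      case op
      when '$set'  then doc[field] = value
      when '$push' then (doc[field] ||= []) << value
      else raise ArgumentError, "unsupported modifier #{op}"
      end
    end
  end
  doc
end

# Hypothetical collection update over an array of hashes. The selector is
# plain field equality only. Without multi, only the first match changes.
def update_docs(collection, selector, changes, upsert: false, multi: false)
  hits = collection.select { |doc| selector.all? { |k, v| doc[k] == v } }
  hits = hits.first(1) unless multi
  if hits.empty? && upsert
    # Upsert: start from the selector's fields, then apply the update.
    doc = apply_update(selector.dup, changes)
    collection << doc
    return [doc]
  end
  hits.each { |doc| apply_update(doc, changes) }
end

people = [
  { '_id' => 1, 'name' => 'Fred Flintstone', 'catchphrase' => 'Yabba Dabba doo' },
  { '_id' => 2, 'name' => 'Barney Rubble' }
]
update_docs(people, { 'name' => 'Fred Flintstone' }, { '$set' => { 'wife' => 'Wilma' } })
update_docs(people, { 'name' => 'Fred Flintstone' }, { '$set' => { 'kids' => ['Pebbles'] } })
update_docs(people, { 'name' => 'Fred Flintstone' }, { '$push' => { 'kids' => 'Bam-Bam' } })
update_docs(people, { 'name' => 'Dino' }, { '$set' => { 'species' => 'dinosaur' } }, upsert: true)
people.each { |doc| p doc }
```

    The made-up "Dino" document shows the upsert path. No document matched, so a new one was built from the selector plus the `$set` fields.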