MongoDB: an open-source, high-performance, document-oriented database
Antoine Girbal [email_address]

The data store landscape (diagram; axis labels: Data Store, Analytics):
- RDBMS (Oracle, MySQL)
- RDBMS helpers (Memcache, application layer)
- New-generation OLAP (Vertica, Aster, Greenplum)
- Non-relational operational stores ("NoSQL")

NoSQL Really Means:
- non-relational, next-generation operational datastores and databases
- a focus on scalability
- ease of modeling and changing data
- (also no SQL syntax, thanks!)

Horizontally Scalable Architectures: no joins + no complex transactions

JSON-style Documents
    { hello: "world" }
is represented as BSON (http://bsonspec.org):
    \x16\x00\x00\x00 \x02 hello\x00 \x06\x00\x00\x00world\x00 \x00
Think of it as a lighter, friendlier XML.

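To see where the slide's bytes come from, here is a minimal JavaScript (Node.js) sketch that hand-encodes a one-field document whose value is a string. It assumes only the BSON string element type (0x02) and the little-endian int32 length prefixes described at http://bsonspec.org; the function name encodeStringDoc is hypothetical, and real drivers handle every BSON type.

```javascript
// Hypothetical helper: encode {key: "string value"} as BSON.
// Only the UTF-8 string element type (0x02) is handled.
function encodeStringDoc(key, value) {
  const name = Buffer.from(key + "\0", "utf8");   // e_name: NUL-terminated key
  const str = Buffer.from(value + "\0", "utf8");  // string bytes plus trailing NUL
  const strLen = Buffer.alloc(4);
  strLen.writeInt32LE(str.length, 0);             // string length counts the trailing NUL
  const body = Buffer.concat([
    Buffer.from([0x02]),                          // element type: string
    name,
    strLen,
    str,
    Buffer.from([0x00]),                          // end-of-document marker
  ]);
  const total = Buffer.alloc(4);
  total.writeInt32LE(4 + body.length, 0);         // document length includes these 4 bytes
  return Buffer.concat([total, body]);
}

const bytes = encodeStringDoc("hello", "world");
console.log(bytes.length);           // 22, i.e. 0x16, the first byte on the slide
console.log(bytes.toString("hex"));
```

Every length in BSON is a prefix, so a reader can skip over a field or a whole document without parsing its contents.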
Flexible “Schemas”
Two documents in the collection db.posts:
    {author: "mike", links: 3, date: "Sun Jul 18 2010 14:40:20 GMT-0700 (PDT)", text: "blah blah"}
    {author: "eliot", date: "Sun Jul 18 2010 14:40:22 GMT-0700 (PDT)", text: "Here is MongoDB ...", views: 10}
Documents with different fields can all live in the same collection.

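A plain-JavaScript stand-in for the two documents above (an array rather than a real db.posts collection, with the dates written as Date objects) shows that nothing forces them to share fields. The filter at the end is only a rough, in-memory analogue of the server query {views: {$exists: true}}.

```javascript
// Array standing in for db.posts; no MongoDB server is involved.
const posts = [
  { author: "mike", links: 3, date: new Date("2010-07-18T21:40:20Z"), text: "blah blah" },
  { author: "eliot", date: new Date("2010-07-18T21:40:22Z"), text: "Here is MongoDB ...", views: 10 },
];

// Union of field names across the collection: there is no single fixed schema.
const fieldNames = [...new Set(posts.flatMap(doc => Object.keys(doc)))].sort();
console.log(fieldNames); // [ 'author', 'date', 'links', 'text', 'views' ]

// Rough analogue of db.posts.find({views: {$exists: true}})
const withViews = posts.filter(doc => "views" in doc);
console.log(withViews.map(doc => doc.author)); // [ 'eliot' ]
```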
Embedded Document
    {_id: ObjectId("4d1009c7262bb4b94af1cea4"),
     author_id: "1346",
     date: "Sun Jul 18 2010 14:40:20 GMT-0700 (PDT)",
     title: "my story",
     text: "once upon a time ...",
     tags: ["novel", "english"],
     comments: [{user_id: 234, text: "awesome dude"},
                {user_id: 1235, text: "that made me cry"}]}
Related data lives in one document, so there is little need for joins or for transactions across documents!

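The sketch below, with made-up table names, contrasts reading the comments straight from the embedded document with rebuilding them from two normalized "tables" by matching ids. Both give the same result, but the embedded version is a single lookup; and because MongoDB applies a write to one document atomically, keeping a post and its comments together also reduces the need for multi-document transactions.

```javascript
// Embedded model: the comments are stored inside the post itself.
const story = {
  author_id: "1346",
  title: "my story",
  tags: ["novel", "english"],
  comments: [
    { user_id: 234, text: "awesome dude" },
    { user_id: 1235, text: "that made me cry" },
  ],
};

// Normalized model (hypothetical tables): same data split in two, linked by post id.
const postsTable = [{ id: 1, author_id: "1346", title: "my story" }];
const commentsTable = [
  { post_id: 1, user_id: 234, text: "awesome dude" },
  { post_id: 1, user_id: 1235, text: "that made me cry" },
];

// The "join": match comments to the post by id.
const joined = commentsTable
  .filter(c => c.post_id === postsTable[0].id)
  .map(c => c.text);

// The embedded read: no matching step needed.
const embedded = story.comments.map(c => c.text);
console.log(embedded); // [ 'awesome dude', 'that made me cry' ]
```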
Data Model
                 | RDBMS                                 | MongoDB                                                    | Key/value stores
Data model       | Normalized: many tables, rows/columns | Natural: collections, documents                            | Just BLOBs
Performance      | Can't scale                           | As fast as BLOBs if well modeled; horizontally scalable    | Easy to scale (Dynamo); can easily use caching
Features         | (Too) many                            | Most regular SQL features (satisfies 90% of users)         | None

Features
- Complex querying
- Atomic updates with modifiers
- Indexing (unique, compound, geospatial)
- Aggregation and Map/Reduce
- Capped collections
- Powerful shell (JavaScript)
- GridFS: file storage

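As an illustration of the Map/Reduce pattern listed above, here is a toy, in-memory version that counts tags. It is not MongoDB's mapReduce: on the server, the map function reads the document through `this` and calls a global emit, while this sketch passes the document and emit explicitly, and the helper name mapReduce is hypothetical.

```javascript
// Toy map/reduce: map(doc, emit) emits key/value pairs,
// reduce(key, values) folds all values emitted for one key.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach(doc => map(doc, emit));       // map phase
  const out = {};
  for (const [key, values] of groups) {
    out[key] = reduce(key, values);          // reduce phase, once per key here
  }
  return out;
}

const posts = [
  { author: "mike", tags: ["mongodb", "intro"] },
  { author: "eliot", tags: ["mongodb"] },
];

const tagCounts = mapReduce(
  posts,
  (doc, emit) => doc.tags.forEach(tag => emit(tag, 1)),
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(tagCounts); // { mongodb: 2, intro: 1 }
```

MongoDB may call reduce several times on partial results, so a real reduce function must return a value of the same shape as the values it receives; summing counts, as here, satisfies that.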
Replication
(Diagram: clients talk to one master and several slaves.)
Using a replica set:
- a pool of servers with one master
- automatic master election and failover
- distributed reads (slaveOk)

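Automatic failover hinges on majority voting. The following simplified model (canElectMaster is a hypothetical name; member priorities and replication lag are ignored) captures the rule that a member can act as master only while it can reach a strict majority of the set's voting members, itself included.

```javascript
// Simplified replica-set rule: a master needs a strict majority of voting members.
function canElectMaster(reachableVoters, totalVoters) {
  return reachableVoters > totalVoters / 2;
}

console.log(canElectMaster(2, 3)); // true: one server down, the other two still elect a master
console.log(canElectMaster(1, 3)); // false: an isolated server cannot be master
console.log(canElectMaster(2, 4)); // false: an even split leaves no majority
```

This is why replica sets are usually deployed with an odd number of voting members.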
Sharding
(Diagram: clients connect through one or more mongos routers; the mongos processes route operations to the shards, each made of mongod processes, and read the cluster metadata from the config servers, which are also mongod processes.)
For large datasets or write-heavy systems.

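A mongos decides where to send an operation by finding the chunk, a range of shard-key values, that contains the document; the chunk-to-shard mapping is kept on the config servers. The toy router below assumes author as the shard key and invents three chunks and shard names purely for illustration.

```javascript
// Toy range-based routing: each chunk covers [min, max) of the shard key.
// null means "unbounded" on that side. Chunks and shard names are made up.
function routeToShard(chunks, keyValue) {
  const chunk = chunks.find(c =>
    (c.min === null || keyValue >= c.min) && (c.max === null || keyValue < c.max));
  if (!chunk) throw new Error("no chunk covers " + keyValue);
  return chunk.shard;
}

const chunks = [
  { min: null, max: "g", shard: "shard0" },
  { min: "g", max: "p", shard: "shard1" },
  { min: "p", max: null, shard: "shard2" },
];

console.log(routeToShard(chunks, "eliot")); // shard0
console.log(routeToShard(chunks, "mike"));  // shard1
console.log(routeToShard(chunks, "tony"));  // shard2
```

A query that includes the shard key can be routed to a single shard like this; one that does not must be sent to every shard, with mongos merging the results.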
Support
- OS: Mac OS X, Windows, Linux, Solaris; 32- and 64-bit
- Drivers: C, C#, C++, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala... plus community drivers
- Open-source project with an active community: wiki, Google Group, 10gen consulting and support

Production Examples
Shutterfly, Foursquare, Craigslist, bit.ly, IGN, SourceForge, Etsy, The New York Times, Business Insider, Gilt Groupe, Intuit, CollegeHumor, Evite, Disqus, Justin.tv, Heartbeat, Hot Potato, Eventbrite, SugarCRM, Electronic Arts...

New Post
    > post = {author: "mike",
    ... date: new Date(),
    ... text: "my blog post",
    ... tags: ["mongodb", "intro"]}
    > db.posts.save(post)
    > db.posts.findOne()
    { "_id" : ObjectId("4d2f944103e8fdbb36f6d205"), "author" : "mike", "date" : ISODate("2011-01-14T00:08:49.933Z"), "text" : "my blog post", "tags" : [ "mongodb", "intro" ] }

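Roughly, the shell's save() inserts a document that has no _id, and otherwise replaces the stored document with the same _id (inserting it if none exists). The toy model below mimics that over an array; nextId is a hypothetical stand-in for ObjectId generation. It also sets _id on the caller's object, which is what lets the update slide refer to post._id.

```javascript
// Toy model of save(): insert when _id is missing, otherwise replace by _id.
let nextId = 1; // hypothetical id source; the real shell generates an ObjectId

function save(collection, doc) {
  if (doc._id === undefined) {
    doc._id = nextId++;          // _id is set on the caller's object before inserting
    collection.push(doc);
    return doc;
  }
  const i = collection.findIndex(d => d._id === doc._id);
  if (i === -1) collection.push(doc);   // unknown _id: insert it
  else collection[i] = doc;             // known _id: replace the whole document
  return doc;
}

const posts = [];
const post = { author: "mike", text: "my blog post", tags: ["mongodb", "intro"] };
save(posts, post);
console.log(post._id, posts.length);         // 1 1
save(posts, { _id: post._id, author: "tony" });
console.log(posts.length, posts[0].author);  // 1 tony
```

Note the second call: save() replaces the whole document, so text and tags are gone. Update modifiers such as $set change individual fields instead.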
A Quick Aside: _id
_id is a special key that is present in all documents, unique across a collection, and can hold any type you want (the shell generates an ObjectId when you omit it).

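The default _id, an ObjectId, begins with a 4-byte big-endian timestamp in seconds since the Unix epoch, so a document's creation time can be read back from its _id. A minimal sketch (objectIdTimestamp is a hypothetical name; current mongo shells offer ObjectId(...).getTimestamp() for the same purpose):

```javascript
// Decode the creation time stored in the first 4 bytes (8 hex digits) of an ObjectId.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new Error("expected 24 hex digits");
  const seconds = parseInt(hex.slice(0, 8), 16); // big-endian seconds since the epoch
  return new Date(seconds * 1000);
}

console.log(objectIdTimestamp("4d2f944103e8fdbb36f6d205").toISOString());
// 2011-01-14T00:09:37.000Z
```

For the post created earlier, this is slightly after its stored date field, since the ObjectId is generated when save() runs rather than when new Date() was evaluated.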
Update
    > db.posts.update({_id: post._id},
    ... {$set: {author: "tony"}})
    > c = {author: "eliot", date: new Date(), text: "great post!"}
    > db.posts.update({_id: post._id},
    ... {$push: {comments: c}})
    > db.posts.update({_id: post._id},
    ... {$inc: {views: 1}})

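Each modifier changes part of a document in place instead of rewriting it. Below is a toy JavaScript model of the three modifiers used on this slide, limited to top-level fields; applyUpdate is a hypothetical name. The real server applies a modifier update to a single document atomically, which this sketch does not attempt to model.

```javascript
// Toy applier for $set, $push and $inc on top-level fields only.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, value] of Object.entries(fields)) {
      if (op === "$set") {
        doc[field] = value;                                // overwrite or create the field
      } else if (op === "$push") {
        if (doc[field] === undefined) doc[field] = [];     // create the array on first push
        if (!Array.isArray(doc[field])) throw new Error(field + " is not an array");
        doc[field].push(value);
      } else if (op === "$inc") {
        doc[field] = (doc[field] === undefined ? 0 : doc[field]) + value; // missing counts as 0
      } else {
        throw new Error("unsupported modifier " + op);
      }
    }
  }
  return doc;
}

const post = { _id: 1, author: "mike", text: "my blog post", tags: ["mongodb", "intro"] };
applyUpdate(post, { $set: { author: "tony" } });
applyUpdate(post, { $push: { comments: { author: "eliot", text: "great post!" } } });
applyUpdate(post, { $inc: { views: 1 } });
console.log(post.author, post.comments.length, post.views); // tony 1 1
```

As on the server, $push creates the comments array on first use and $inc treats the missing views field as 0, which matches the document shown on the next slide.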
Querying
    > db.posts.findOne()
    {
      "_id" : ObjectId("4d2f944103e8fdbb36f6d205"),
      "author" : "tony",
      "comments" : [ { "author" : "eliot", "date" : ISODate("2011-01-14T00:13:52.463Z"), "text" : "great post!" } ],
      "date" : ISODate("2011-01-14T00:08:49.933Z"),
      "tags" : [ "mongodb", "intro" ],
      "text" : "my blog post",
      "views" : 1
    }

More Querying
Find by author:
    > db.posts.find({author: "tony"})
10 most recent posts:
    > db.posts.find().sort({date: -1}).limit(10)
Posts since April 1st (JavaScript months are zero-based, so month 3 is April):
    > april_1 = new Date(2010, 3, 1)
    > db.posts.find({date: {$gt: april_1}})
Adding indexes to speed these up:
    > db.posts.ensureIndex({author: 1})
    > db.posts.ensureIndex({date: 1})

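To make the semantics of these queries concrete, here are in-memory equivalents over a plain array holding three invented posts. The array versions always scan every element; on the server, the indexes created above let MongoDB avoid such full scans for the author and date queries.

```javascript
// Three invented posts standing in for db.posts.
const posts = [
  { author: "tony", date: new Date(2010, 2, 15) },  // March 15
  { author: "mike", date: new Date(2010, 3, 2) },   // April 2
  { author: "eliot", date: new Date(2010, 5, 30) }, // June 30
];
const april_1 = new Date(2010, 3, 1); // month index 3 = April

// db.posts.find({author: "tony"})
const byTony = posts.filter(p => p.author === "tony");
// db.posts.find().sort({date: -1}).limit(10)
const recent = [...posts].sort((a, b) => b.date - a.date).slice(0, 10);
// db.posts.find({date: {$gt: april_1}})
const sinceApril = posts.filter(p => p.date > april_1);

console.log(april_1.getMonth());            // 3
console.log(recent.map(p => p.author));     // [ 'eliot', 'mike', 'tony' ]
console.log(sinceApril.map(p => p.author)); // [ 'mike', 'eliot' ]
```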
More Querying
Find with a regular expression:
    > db.posts.find({text: /post$/})
Find within an array:
    > db.posts.find({tags: "intro"})
    > db.posts.ensureIndex({tags: 1})
Find within an embedded object:
    > db.posts.find({"comments.author": "eliot"})
    > db.posts.ensureIndex({"comments.author": 1})

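Two rules on this slide are less obvious: a plain value matches an array field when any element equals it, and a dotted path such as "comments.author" reaches into each element of an embedded array. A toy matcher shows both; the helper names are hypothetical, and it handles only equality and regular expressions, a simplification of MongoDB's real matching rules.

```javascript
// Collect every value reachable by a dotted path, descending into arrays.
function valuesAtPath(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) return value.flatMap(v => valuesAtPath(v, parts)); // each element
  if (value === null || typeof value !== "object") return [];                   // path missing
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// Compare one value against an expected value or a regular expression.
function matchOne(value, expected) {
  if (expected instanceof RegExp) return typeof value === "string" && expected.test(value);
  return value === expected;
}

// A document matches if any reached value (or any element of a reached array) matches.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).some(v =>
    Array.isArray(v) ? v.some(el => matchOne(el, expected)) : matchOne(v, expected));
}

const post = {
  author: "tony",
  text: "my blog post",
  tags: ["mongodb", "intro"],
  comments: [{ author: "eliot", text: "great post!" }],
};
console.log(matches(post, "text", /post$/));            // true
console.log(matches(post, "tags", "intro"));            // true
console.log(matches(post, "comments.author", "eliot")); // true
console.log(matches(post, "comments.author", "mike"));  // false
```

This per-element behavior is also why an index on tags or on "comments.author" (a multikey index) can serve these queries.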
More Querying
Counting:
    > db.posts.find().count()
    > db.posts.find({author: "tony"}).count()
Paging (pages are numbered from 0, so page 2 is the third page):
    > page = 2
    > page_size = 15
    > db.posts.find().skip(page * page_size).limit(page_size)
Advanced operators: $gt, $lt, $gte, $lte, $ne, $all, $in, $nin, $where
    > db.posts.find({$where: "this.author == 'tony' || this.title == 'foo'"})

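The paging formula treats page as zero-based: for page 2 with 15 posts per page, skip(page * page_size) skips 30 documents, so the page covers positions 30 to 44. A hypothetical helper applying the same arithmetic to an array:

```javascript
// Zero-based paging: skip page * pageSize items, then take at most pageSize.
function pageOf(items, page, pageSize) {
  const skip = page * pageSize;
  return items.slice(skip, skip + pageSize);
}

const all = Array.from({ length: 100 }, (_, i) => i); // positions 0..99
const page2 = pageOf(all, 2, 15);
console.log(page2[0], page2[page2.length - 1], page2.length); // 30 44 15
```

The server still has to step over the skipped documents, so very deep pages get slower.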
Download MongoDB
http://www.mongodb.org
and let us know what you think: @mongodb
Current version: 1.6, with 1.8 coming soon

Introduction to MongoDB


Editor's Notes

  • #15 (_id aside): A collection is a logical grouping of documents; indexes are defined per collection.
  • #22 blog post twitter