Schema Design — MongoBerlin

                              Richard M Kreuter
                                   10gen Inc.
                              richard@10gen.com


                               March 25, 2011




Observations about Relational Database Schemas


         Relational schema design is often presented and thought of as
         an exercise in normalization. While academics debate how
         many normal forms can fit on the head of a pin, practitioners
         tend to employ just one or two.
         However, virtually all nontrivial real-world applications employ
         a variety of strategic denormalizations: materialized views in the
         RDBMS, caching layers outside the RDBMS. These
         denormalizations tend to be vital to real-world performance.
         Finally, application programmers seldom code in relations, but
         rather in object graphs; the RDBMS’s model, the set of
         tuples, isn’t a great fit for modern programming languages or
         developers’ minds.


MongoDB Documents, Queries, Features



        MongoDB documents are deeply nestable sequences of key-value
        pairs, thus permitting “rich” structure.
        The MongoDB query language is relatively SQL-like in its
        capacity to find documents satisfying complicated, dynamic
        criteria.
        MongoDB documents can be updated atomically, with
        special efficiency at updates that don’t alter a document’s size
        or shape.
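   To make the update point concrete, here is a small sketch in plain
   JavaScript (the shell's language). It builds an update document that
   uses MongoDB's `$inc` operator with dot notation to bump a counter
   nested inside a document; the server applies a single-document update
   like this atomically, and incrementing an existing number typically
   leaves the document's size and shape unchanged. The `stats.views`
   field is an assumption, and `applyInc` is a hypothetical, simplified
   in-memory imitation of `$inc` for illustration only, not how the
   server implements the operator.

```javascript
// Hypothetical document with a nested counter (field names are assumptions).
const post = { _id: 1, title: "A blazingly clever blog post.", stats: { views: 41 } };

// The update document you would send to MongoDB, e.g.
//   db.blog_posts.update({ _id: 1 }, viewIncrement())
function viewIncrement() {
  return { $inc: { "stats.views": 1 } };
}

// Simplified in-memory imitation of $inc with dotted paths, for illustration
// only: it walks nested objects (no array handling) and, like $inc, creates
// a missing field with the increment as its value.
function applyInc(doc, update) {
  for (const [path, amount] of Object.entries(update.$inc)) {
    const keys = path.split(".");
    let target = doc;
    for (const key of keys.slice(0, -1)) {
      if (typeof target[key] !== "object" || target[key] === null) target[key] = {};
      target = target[key];
    }
    const last = keys[keys.length - 1];
    target[last] = (target[last] ?? 0) + amount;
  }
  return doc;
}

applyInc(post, viewIncrement());
console.log(post.stats.views); // 42
```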




MongoDB Schema Design Generalities



  When designing for MongoDB, do...
        ... let the application direct the schema.
        ... denormalize judiciously.
        ... design your schema for indexing.
        ... resort to application-level JOINs when needed.
  And don’t ...
        ... treat collections as heaps.
        ... frequently resize documents.




Letting the application direct the schema




   Most applications view their data mostly in a small number of
   distinguished “shapes”, generally congruent to graphs of
   inter-object has-a relationships among the classes in the
   application’s model. MongoDB lets you store your data more or
   less directly in the shape of your model.




Letting the application direct the schema, continued




   db.blog_posts.findOne()
   { _id : ObjectId(...),
     text : "A blazingly clever blog post.",
     by : "A. U. Thor",
     date : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
     tags : [ "funny", "ironic" ]
   }
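   In a normalized relational design the same post would typically be
   split into a posts table and a post-tags join table, and the
   application would reassemble it on every read. The sketch below shows
   that reassembly; the row shapes (`posts`, `postTags`, `post_id`) are
   hypothetical. The embedded document above is essentially this
   function's output stored once, so a read is a single `findOne` with
   no JOIN.

```javascript
// Hypothetical normalized rows, as a relational schema might store them.
const posts = [
  { id: 1, text: "A blazingly clever blog post.", by: "A. U. Thor" },
];
const postTags = [
  { post_id: 1, tag: "funny" },
  { post_id: 1, tag: "ironic" },
];

// Reassemble one post into the nested shape the application actually uses.
// This is the JOIN the relational design performs on every read.
function toPostDocument(post, tagRows) {
  return {
    _id: post.id,
    text: post.text,
    by: post.by,
    tags: tagRows.filter((row) => row.post_id === post.id).map((row) => row.tag),
  };
}

const doc = toPostDocument(posts[0], postTags);
console.log(doc.tags); // [ 'funny', 'ironic' ]
```

   With the tags embedded, `db.blog_posts.find({ tags : "funny" })`
   matches posts whose `tags` array contains "funny", and an index on
   `tags` indexes each element of the array.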




Denormalizing Judiciously




   Most application entities turn out to have some fields that are very
   frequently altered and other fields that are exceedingly seldom
   altered. Copying (embedding) the seldom-altered attributes into the
   documents that refer to an entity is a reasonable way to improve
   read performance: the copies save a lookup on every read and rarely
   need to be updated.




Denormalizing Judiciously, continued

   db.product_reviews.findOne()
   { _id : ObjectId(...),
     comment : "The best thing ever!",
     date : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
     reviewer : { uid : ObjectId("987654abcxyz"),
                  name : "Khan Sumer",
                  thumbnail : "thumb-123456.jpg",
                  url : "http://blahblah.com/" } }
   db.users.find({ _id : ObjectId("987654abcxyz") })
   { _id : ObjectId("987654abcxyz"),
     name : "Khan Sumer",
     thumbnail : ..., url : ...,
     last_post : "Mon Mar 21 2011 03:54:51 GMT-0400 (EDT)",
     favorites : [ ... ], friends : [ ... ] }
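   Embedding a copy of the reviewer's profile means the copies must be
   refreshed in the rare case the profile changes. A minimal sketch,
   assuming the field layout shown above: `reviewerSnapshot` picks the
   seldom-changing fields out of a user document when a review is
   written, and `renameReviewerUpdate` builds the arguments for a
   multi-document update that fixes every embedded copy. Both helper
   names are hypothetical.

```javascript
// Fields copied from a user into each review (assumed to change rarely).
const EMBEDDED_FIELDS = ["name", "thumbnail", "url"];

// Build the embedded `reviewer` subdocument from a full user document,
// leaving out frequently changing fields such as last_post or friends.
function reviewerSnapshot(user) {
  const snapshot = { uid: user._id };
  for (const field of EMBEDDED_FIELDS) {
    if (field in user) snapshot[field] = user[field];
  }
  return snapshot;
}

// Arguments for refreshing every embedded copy after a user is renamed, e.g.
//   db.product_reviews.update(op.query, op.update, op.options)
function renameReviewerUpdate(uid, newName) {
  return {
    query: { "reviewer.uid": uid },
    update: { $set: { "reviewer.name": newName } },
    options: { multi: true },
  };
}
```

   The trade-off: reviews render without a second query, while a rename
   touches many documents. That is acceptable precisely because names
   change seldom.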
Design your schema for indexing



   There’s a subtle relationship between schemas and indexes.
   Consider this query:

   db.boxes.find({$where : "this.height > this.width"})

   This query can’t take advantage of MongoDB indexes, both
   because it evaluates JavaScript against every document and
   because comparing two fields of the same document isn’t
   something MongoDB knows how to index. If this sort of query is
   important, maintain a separate boolean attribute in the document
   instead; that value can be indexed.
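   A minimal sketch of that approach, assuming a hypothetical field named
   `taller_than_wide`: the application computes the flag whenever it
   writes a box, so the database answers a plain, indexable equality
   query instead of evaluating JavaScript. Any update that changes
   `height` or `width` must recompute the flag as well.

```javascript
// Hypothetical field name for the precomputed predicate.
const FLAG = "taller_than_wide";

// Compute the flag whenever a box is written.
function withTallFlag(box) {
  return { ...box, [FLAG]: box.height > box.width };
}

// Index spec and indexable query, e.g. in a current shell:
//   db.boxes.createIndex(INDEX_SPEC)
//   db.boxes.find(tallQuery())
const INDEX_SPEC = { [FLAG]: 1 };
function tallQuery() {
  return { [FLAG]: true };
}

const boxes = [
  { _id: 1, height: 3, width: 2 },
  { _id: 2, height: 1, width: 5 },
].map(withTallFlag);
console.log(boxes.map((b) => b[FLAG])); // [ true, false ]
```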




Application-level JOINs




   Because most MongoDB documents are “richer” than RDBMS
   rows, they tend to represent “pre-JOINed” data, so
   application-level JOIN operations should be few. However,
   sometimes you do need relational-style normalization and
   application-level JOINs. This comes up in some many-to-many
   relationships, and may not cost much in practice.
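   An application-level JOIN usually costs one extra query: fetch the
   primary documents, gather the referenced ids, fetch those with a
   single `$in` query, and stitch the results together in memory. A
   sketch of the in-memory half, assuming hypothetical `posts` documents
   that reference `users` through an `author_id` field:

```javascript
// Step 1: collect the distinct ids referenced by the first query's results.
function referencedIds(posts) {
  return [...new Set(posts.map((p) => p.author_id))];
}

// Step 2: one query for all referenced users, e.g.
//   db.users.find(usersQuery(ids)).toArray()
function usersQuery(ids) {
  return { _id: { $in: ids } };
}

// Step 3: stitch the two result sets together in application memory.
// Assumes ids are primitives (e.g. strings); ObjectId objects would need to
// be keyed by their string form, since Map compares objects by identity.
function attachAuthors(posts, users) {
  const byId = new Map(users.map((u) => [u._id, u]));
  return posts.map((p) => ({ ...p, author: byId.get(p.author_id) ?? null }));
}

// Example with string ids standing in for ObjectIds.
const posts = [{ _id: "p1", author_id: "u1" }, { _id: "p2", author_id: "u2" }];
const users = [{ _id: "u1", name: "A. U. Thor" }];
const joined = attachAuthors(posts, users);
console.log(joined[1].author); // null
```

   The point of the design is that the number of queries stays at two no
   matter how many posts are shown, rather than one per post.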




Don’t treat collections as heaps




   Although MongoDB permits quite a bit of freedom in document
   structure, documents in a collection ought to share a common
   subset of attributes, for programmatic processing, effective
   indexing, and developer comprehension. If you have documents
   with very different sets of attributes, consider storing them in
   separate collections.
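   One rough way to check whether a collection is drifting toward a heap
   is to sample it and measure which attributes its documents share. The
   helpers below are hypothetical audit tools, not part of MongoDB; you
   might feed them a sample such as
   `db.things.find().limit(1000).toArray()` from the shell, where
   `things` stands in for your collection.

```javascript
// Top-level keys that every document in the sample has.
function sharedKeys(docs) {
  if (docs.length === 0) return [];
  return Object.keys(docs[0]).filter((key) => docs.every((doc) => key in doc));
}

// Fraction of the sample in which each top-level key appears. Many keys
// with low coverage suggest documents of different kinds that might
// belong in separate collections.
function keyCoverage(docs) {
  const counts = new Map();
  for (const doc of docs) {
    for (const key of Object.keys(doc)) counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Object.fromEntries([...counts].map(([key, n]) => [key, n / docs.length]));
}
```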




Don’t frequently resize documents




   Resizing a document (e.g. by adding/removing attributes or
   adding/removing elements of lists) is generally costly, because a
   document that outgrows its allocated space must be moved. (Updates
   that fit in place are quite efficient, however.) In general, a
   schema whose documents’ sizes are highly volatile should be
   considered suspect; such data might be better stored as separate
   documents.




Don’t frequently resize documents, continued
   So, instead of this
   db.urlhits.findOne()
   { _id : ..., url : "http://10gen.com",
     // this is counting with granularity of 1 day
     counts : { "2011-03-01" :
                 { firefox : 12345, chrome : 23456 },
                "2011-03-02" :
                 { firefox : 15678, chrome : 24567 }
                ... } }
   consider this:
   db.urlhits2.findOne()
   { _id : ..., url : "http://10gen.com",
     date : "2011-03-01",
     counts : { firefox : 12345, chrome : 23456 } }
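   With one document per URL per day, each hit can be recorded by a
   single upsert: the first hit of the day creates the document, and
   later hits mostly increment existing counters in place (only the first
   hit from a new browser adds a field). A sketch of building that
   operation; `recordHit` and the index spec are hypothetical, and
   browser names become field names, so they must not contain "." or
   start with "$".

```javascript
// Compound index so each upsert finds its (url, date) document quickly.
const URLHITS_INDEX = { url: 1, date: 1 };

// Build the upsert that records one hit for (url, day, browser), e.g.
//   const op = recordHit("http://10gen.com", "2011-03-01", "firefox");
//   db.urlhits2.update(op.query, op.update, op.options)
function recordHit(url, day, browser) {
  if (browser.includes(".") || browser.startsWith("$")) {
    throw new Error("browser name cannot be used as a field name: " + browser);
  }
  return {
    query: { url: url, date: day },
    update: { $inc: { ["counts." + browser]: 1 } },
    options: { upsert: true },
  };
}
```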
Don’t frequently resize documents, continued

   So, instead of this

   db.user_events.findOne()
   { _id : ..., user : "kreuter",
     clicks : [ { url : <url1>, time : <time1> },
                { url : <url2>, time : <time2> },
                ... ] }

   consider this:

   db.user_events.findOne()
   { _id : ..., user : "kreuter", url: <url1>, time: <time1> }
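   Moving from the first layout to the second amounts to "exploding" each
   growing `clicks` array into small, fixed-shape documents. A sketch of
   that conversion plus a compound index that keeps per-user, newest-first
   reads cheap; the helper and index names are hypothetical.

```javascript
// Turn one old-style document with an ever-growing `clicks` array into one
// document per click; none of the new documents ever needs to grow.
function explodeClicks(oldDoc) {
  return oldDoc.clicks.map((click) => ({
    user: oldDoc.user,
    url: click.url,
    time: click.time,
  }));
}

// Supports queries such as
//   db.user_events.find({ user : "kreuter" }).sort({ time : -1 }).limit(20)
const USER_EVENTS_INDEX = { user: 1, time: -1 };
```

   Each new click then becomes a single insert of a small document,
   e.g. `db.user_events.insert({ user : "kreuter", url : ..., time : ... })`.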




Going forward



         www.mongodb.org — downloads, docs, community
         mongodb-user@googlegroups.com — mailing list
         #mongodb on irc.freenode.net
         try.mongodb.org — web-based shell
         10gen is hiring. Email jobs@10gen.com.
         10gen offers support, training, and advising services for
         MongoDB.



