Building Your First MongoDB Application
Rick Copeland
@rick446
http://arborian.com
Who am I?

• Now a consultant, but formerly...
 • Software engineer at SourceForge
 • Author of Essential SQLAlchemy
 • Primarily code Python
What will you learn here?
• Data Modeling
• Queries
• Geospatial indexing
• Updates
• Map/Reduce
• Deployment and Scaling Concerns
• Why MongoDB?
Our Application

• Users can create different locations
• Users can “check in” to these locations
• Users can see who else is checked in
• Let’s call it ThreeTriangles (3tri)
3tri Operations
[Diagram: two components, Places and Checkins. Labels around Places: "Find nearby places", "User-generated content". Labels around Checkins: "Record checkins", "Checkin Stats".]
MongoDB Terminology
  Relational    MongoDB
  Database      Database
  Table         Collection
  Index         Index
  Row           Document
  Column        Field
Documents?
doc1 = {
  _id: ObjectId('4b97e62bf1d8c7152c9ccb74'),
  key1: value1,
  key2: value2,
  key3: {..., ..., ...},
  key4: [..., ..., ]
}
Collections
• No schema enforcement
• Per-collection...
  • Querying
  • Indexing
  • Updating
  • Sharding
[Diagram: three collections. Places holds doc1, doc2, ...; Users holds doc5, doc6, ...; Checkins holds doc8, doc9, ...]
Places v1
place1 = {
  name: "Blake Hotel",
  address: "555 South McDowell Street",
  city: "Charlotte",
  zip: "28204"}

db.places.find({zip:"28204"}).limit(10)
Places v2
place2 = {
  name: "Blake Hotel",
  address: "555 South McDowell Street",
  city: "Charlotte",
  zip: "28204",
  tags: ["hotel", "recommended"]}

db.places.find({
  zip: "28204", tags: "hotel"}).limit(10)
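
The tags condition above matches a document whose tags array contains "hotel". A rough plain-JavaScript sketch of that matching rule follows; matchesField and matches are hypothetical helpers written for illustration, and they cover only simple equality, not the server's full query semantics.

```javascript
// Plain-JavaScript sketch of how an equality condition treats array fields.
// matchesField and matches are hypothetical helpers, not part of any MongoDB API.
function matchesField(doc, field, wanted) {
  const actual = doc[field];
  // An array field matches when any element equals the wanted value.
  if (Array.isArray(actual)) {
    return actual.includes(wanted);
  }
  return actual === wanted;
}

// Every condition in the query document must hold (implicit AND).
function matches(doc, query) {
  return Object.keys(query).every((field) => matchesField(doc, field, query[field]));
}

const place2 = {
  name: "Blake Hotel",
  zip: "28204",
  tags: ["hotel", "recommended"],
};

console.log(matches(place2, { zip: "28204", tags: "hotel" }));    // true
console.log(matches(place2, { zip: "28204", tags: "business" })); // false
```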
Places v3
place3 = {
  name: "Blake Hotel",
  address: "555 South McDowell Street",
  city: "Charlotte",
  zip: "28204",
  tags: ["hotel", "recommended"],
  latlong: [ 35.21, 80.83 ] }

db.places.ensureIndex({latlong:"2d"})
db.places.find({latlong:{$near:[35,80]}})
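
A $near query returns the closest places first. The sketch below imitates that ordering in plain JavaScript on flat 2d points. The nearest helper and the two extra sample places are made up for illustration, and a real 2d index does not scan every document as this does.

```javascript
// Plain-JavaScript sketch of "nearest first" ordering on flat 2d points.
// nearest() is a hypothetical helper; a real 2d index avoids a full scan.
function nearest(places, point, limit) {
  const dist = (p) => Math.hypot(p.latlong[0] - point[0], p.latlong[1] - point[1]);
  return places
    .slice()                              // copy so the input is not reordered
    .sort((a, b) => dist(a) - dist(b))    // closest first
    .slice(0, limit);
}

const samplePlaces = [
  { name: "Far Cafe", latlong: [36.5, 82.0] },    // made-up sample data
  { name: "Blake Hotel", latlong: [35.21, 80.83] },
  { name: "Mid Diner", latlong: [35.6, 81.2] },   // made-up sample data
];

const found = nearest(samplePlaces, [35, 80], 2).map((p) => p.name);
console.log(found); // [ 'Blake Hotel', 'Mid Diner' ]
```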
Places v4
place4 = {
   name: "Blake Hotel",
   address: "555 South McDowell Street",
   city: "Charlotte",
   zip: "28204",
   tags: ["hotel", "recommended"],
   latlong: [ 35.21, 80.83 ],
   tips: [
      { user: "rick",
        time: ISODate(...),
        tip: "Come learn about #self2012"},
      { ... },
      { ... } ] }
Some Queries
/* First, some indexes */
db.places.ensureIndex({tags:1})
db.places.ensureIndex({name:1})
db.places.ensureIndex({latlong:"2d"})

/* Find places */
db.places.find({latlong:{$near:[40,70]}})

/* Regex searching */
db.places.find({name: /^typeaheadstring/})

/* Searching arrays */
db.places.find({tags: "business"})
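
For type-ahead, the regex is usually built from whatever the user has typed so far. A minimal sketch, assuming the input should be treated as literal text and anchored at the start so it stays a prefix query; escapeRegex and prefixRegex are hypothetical helper names.

```javascript
// Build an anchored, case-sensitive prefix regex from raw user input.
// escapeRegex and prefixRegex are hypothetical helper names.
function escapeRegex(text) {
  // Backslash-escape characters that have special meaning in a regex.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function prefixRegex(userInput) {
  return new RegExp("^" + escapeRegex(userInput));
}

const re = prefixRegex("Blake (H");
console.log(re.test("Blake (Hotel)"));     // true
console.log(re.test("The Blake (Hotel)")); // false
// In the shell this could be passed as: db.places.find({ name: re })
```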
Inserting and updating
/* Initial data load */
db.places.insert([place1, place2, ...])

/* Update-in-place */
db.places.update(
   { name:"Blake Hotel" },
   { $push : {
      tips: {
         user: "rick",
         time: ISODate(...),
         tip: "Come learn about #self2012" }
       } } )
Users
user1 = {
   name: "rick",
   email: "rick@arborian.com",
   ...
   checkins: [
      ObjectId('4b97e62bf1d8c7152c9ccb74'),
      ... ] }

/* Each entry in checkins[] references the _id field
   of a document in the checkins collection */
Checkins
db.checkins.ensureIndex({place:1, ts:1})
db.checkins.ensureIndex({ts:1})

checkin1 = {
   place: {
      id: ObjectId(...),
      name: "Blake Hotel" },
   ts: ISODate(...),
   user: {
      id: ObjectId(...), name: "rick" } }
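
Checking in involves two writes: insert the checkin document, then $push its _id onto the user's checkins array. The sketch below only builds the two payloads in plain JavaScript. buildCheckin, the string ids, the sample date, and the db.users collection name are all assumptions for illustration; real ids would be ObjectIds.

```javascript
// Sketch of the two writes behind a check-in: one new checkin document,
// then a $push of its _id onto the user's checkins array.
// buildCheckin and the sample ids are hypothetical; real ids would be ObjectIds.
function buildCheckin(checkinId, user, place, when) {
  const checkinDoc = {
    _id: checkinId,
    place: { id: place._id, name: place.name },   // denormalized place name
    ts: when,
    user: { id: user._id, name: user.name },      // denormalized user name
  };
  const userQuery = { _id: user._id };
  const userUpdate = { $push: { checkins: checkinId } };
  return { checkinDoc, userQuery, userUpdate };
}

const ops = buildCheckin(
  "checkin-1",
  { _id: "user-1", name: "rick" },
  { _id: "place-1", name: "Blake Hotel" },
  new Date("2012-06-09T15:00:00Z")                // made-up sample timestamp
);
console.log(ops.checkinDoc.place.name);       // Blake Hotel
console.log(JSON.stringify(ops.userUpdate));  // {"$push":{"checkins":"checkin-1"}}
// In the shell (collection names assumed):
//   db.checkins.insert(ops.checkinDoc);
//   db.users.update(ops.userQuery, ops.userUpdate);
```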
Atomic Updates
•   $set           •   $pull
•   $unset         •   $pullAll
•   $rename        •   $addToSet
•   $push          •   $inc
•   $pushAll       •   $bit
•   $pop
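
To get a feel for what a few of these operators do, here is an in-memory sketch in plain JavaScript. applyUpdate is a hypothetical helper that handles only top-level fields and five operators, and the checkinCount field is invented for the example. The server applies the real operators atomically per document, which this sketch does not model.

```javascript
// In-memory sketch of what a few update operators do to one document.
// applyUpdate is a hypothetical helper covering top-level fields only.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a deep copy
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount;
  }
  for (const [field, value] of Object.entries(update.$push || {})) {
    result[field] = (result[field] || []).concat([value]);
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const current = result[field] || [];
    // Only append when the value is not already in the array.
    result[field] = current.includes(value) ? current : current.concat([value]);
  }
  for (const [field, value] of Object.entries(update.$pull || {})) {
    result[field] = (result[field] || []).filter((item) => item !== value);
  }
  return result;
}

// checkinCount is an invented field used only for this example.
const before = { name: "Blake Hotel", checkinCount: 2, tags: ["hotel"] };
const after = applyUpdate(before, {
  $inc: { checkinCount: 1 },
  $addToSet: { tags: "hotel" },   // already present, so tags is unchanged
  $set: { city: "Charlotte" },
});
console.log(after);
```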
Simple Statistics
/* All checkins */
db.checkins.find({"place.name": "Blake Hotel"})

/* Last 10 checkins */
db.checkins.find({"place.name": "Blake Hotel"})
   .sort({ts:-1}).limit(10)

/* Number of checkins today */
db.checkins.find(
   { "place.name": "Blake Hotel",
     ts: { $gt: ISODate(...)} })
   .count()
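
The "today" query needs a timestamp for the start of the day as its $gt bound. One way to compute it, sketched in plain JavaScript; startOfDayUTC is a hypothetical helper, and treating the day boundary as midnight UTC is an assumption that an application may need to adjust for local time.

```javascript
// Compute midnight (UTC) for a given instant, to use as the $gt bound.
// startOfDayUTC is a hypothetical helper; using UTC is an assumption.
function startOfDayUTC(now) {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
}

const now = new Date("2012-06-09T15:30:00Z"); // fixed sample instant
const todayQuery = {
  "place.name": "Blake Hotel",
  ts: { $gt: startOfDayUTC(now) },
};
console.log(todayQuery.ts.$gt.toISOString()); // 2012-06-09T00:00:00.000Z
```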
MapReduce
mapFunc = function() {
   emit(this.place.name, 1); }
reduceFunc = function(key, values) {
   return Array.sum(values); }

res = db.checkins.mapReduce(
   mapFunc, reduceFunc,
   { query: { ts: { $gt: nowminus3hrs } },
     out: {inline: 1} })

/* res.results then looks like: */
[ {_id:"Blake Hotel", value: 17}, ...]
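
The flow behind mapReduce can be imitated in a few lines of plain JavaScript: map emits (key, value) pairs, the pairs are grouped by key, and reduce folds each group. runMapReduce is a hypothetical helper for illustration. Unlike the shell, it passes the document and emit as arguments instead of using this and a global emit, and it calls reduce exactly once per key, whereas the server may call reduce several times on partial results.

```javascript
// In-memory sketch of the map/reduce flow: map emits (key, value) pairs,
// values are grouped by key, and reduce folds each group.
// runMapReduce is a hypothetical helper, not the server's implementation.
function runMapReduce(docs, mapFunc, reduceFunc) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // The document and emit are passed explicitly here.
  docs.forEach((doc) => mapFunc(doc, emit));
  return Array.from(groups, ([key, values]) => ({
    _id: key,
    value: reduceFunc(key, values),
  }));
}

const checkins = [
  { place: { name: "Blake Hotel" } },
  { place: { name: "Blake Hotel" } },
  { place: { name: "Other Place" } },   // made-up sample data
];

const results = runMapReduce(
  checkins,
  (doc, emit) => emit(doc.place.name, 1),
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(results);
// [ { _id: 'Blake Hotel', value: 2 }, { _id: 'Other Place', value: 1 } ]
```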
Deployment and Scaling Options
Single Master Deployment
  Read / Write   ->   Primary
  Read           ->   Secondary
  Read           ->   Secondary
Auto-Sharding
            Shard 1 (0..10)   Shard 2 (10..20)   Shard 3 (20..30)
            Primary           Primary            Primary
            Secondary         Secondary          Secondary
  Config    Secondary         Secondary          Secondary
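
The ranges in the diagram suggest how a shard key value maps to a shard. A plain-JavaScript sketch of that lookup follows. shardForKey and the chunkRanges table are hypothetical, and treating each range as lower-bound inclusive, upper-bound exclusive is an assumption made for the example.

```javascript
// Sketch of range-based routing using the ranges from the diagram.
// Lower bound inclusive, upper bound exclusive (an assumption for illustration).
// shardForKey and chunkRanges are hypothetical names.
const chunkRanges = [
  { shard: "Shard 1", min: 0, max: 10 },
  { shard: "Shard 2", min: 10, max: 20 },
  { shard: "Shard 3", min: 20, max: 30 },
];

function shardForKey(key) {
  const range = chunkRanges.find((r) => key >= r.min && key < r.max);
  return range ? range.shard : null;   // null when no range covers the key
}

console.log(shardForKey(4));  // Shard 1
console.log(shardForKey(10)); // Shard 2
console.log(shardForKey(42)); // null
```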
Use Cases
• Replace RDBMS for high-traffic web
  applications
• CMS-style applications
• Social and mobile applications
• Real-time analytics, high-speed logging
• Maybe not double-entry bookkeeping
What do you give up?
• No multi-document atomic operations (i.e.
  transactions)
• No server-side JOINs
• No referential integrity constraints
  between documents
• Data model is typically tied to query
  patterns (less flexible than relational DBs)
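
The missing server-side JOIN can often be emulated in the client: collect the referenced ids, fetch them with a single $in query, and stitch the results together in application code. The sketch below shows the idea on in-memory arrays. findByIds stands in for a query such as db.checkins.find({_id: {$in: ids}}), and all names and sample data are hypothetical.

```javascript
// Sketch of a client-side join: gather ids, fetch them in one $in query,
// then stitch the results together in application code.
// findByIds stands in for db.checkins.find({_id: {$in: ids}}) and is hypothetical.
function findByIds(collection, ids) {
  return collection.filter((doc) => ids.includes(doc._id));
}

function checkinsForUser(user, checkinCollection) {
  const inQuery = { _id: { $in: user.checkins } }; // the query a client would send
  const docs = findByIds(checkinCollection, inQuery._id.$in);
  return docs.map((c) => ({ user: user.name, place: c.place.name }));
}

const allCheckins = [
  { _id: "c1", place: { name: "Blake Hotel" } },
  { _id: "c2", place: { name: "Other Place" } },   // made-up sample data
  { _id: "c3", place: { name: "Blake Hotel" } },
];
const user1 = { name: "rick", checkins: ["c1", "c3"] };

const joined = checkinsForUser(user1, allCheckins);
console.log(joined);
// [ { user: 'rick', place: 'Blake Hotel' }, { user: 'rick', place: 'Blake Hotel' } ]
```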
Questions?
   Please rate this talk at http://svy.mk/L3jM7f

Interested in training? http://Arborian.com/training

MongoDB Info & Downloads: http://mongodb.org


                  Rick Copeland
                     @rick446
                http://arborian.com

Editor's Notes

  • #3 SF - early adopters of MongoDB (0.8), so we went through some of the growing pains
       SQLAlchemy - I don’t hate SQL!
       Know Python best, but also C, C++, etc.
  • #14 Tend to not have auto-increment keys
        Standard JSON values + a few new primitives
        Embedded docs & arrays OK (and encouraged!)
  • #16 Lots of places, hard to find the one you need
  • #17 Better, now at least we’re just looking for hotels, still too many
  • #18 Results sorted by distance
  • #19 Embedded tips for data locality
  • #20 Regex has same benefits and drawbacks as LIKE in SQL: prefix queries are your friend!
  • #21 Lots of in-place operators so you can update your document atomically (makes transactionlessness less important)
  • #22 So we’ve mostly covered the “Places” side, now let’s look at “Checkins”
  • #24 Checking in inserts checkin and then $pushes user object
  • #25 You can also “reach inside” your objects for updating
  • #28 Flexible output options for dumping results in a collection
  • #31 Data automatically balanced between replica sets
        Config server stores ‘chunk’ locations
        Transparent to applications
  • #33 No transactions, but doc model is rich enough to do without in many apps
        No server-side JOIN, but can be emulated in client (use $in)
        No referential integrity, but 1:N joins become embedded documents
        Data model *can* be relational, but you give up performance (and introduce integrity issues, etc) - many will just denormalize data to get both flexibility and performance at the expense of application complexity.