2 App Designs
Using MongoDB
Kyle Banker
kyle@10gen.com
@hwaet
What we'll cover:
Brief intro

Feed reader

Website analytics

Questions
Intro
MongoDB
Rich data model

Replication and automatic failover

Sharding
Use cases
Social networking and geolocation

E-commerce

CMS

Analytics

Production deployments: http://is.gd/U0VkG6
Demo
Feed reader
Four collections
Users

Feeds

Entries

Buckets
Users
{ _id: ObjectId("4e316f236ce9cca7ef17d59d"),
  username: 'kbanker',
  feeds: [
    { _id: ObjectId("4e316f236ce9cca7ef17d54c"),
      name: "GigaOM" },

    { _id: ObjectId("4e316f236ce9cca7ef17d54d"),
      name: "Salon.com" },

    { _id: ObjectId("4e316f236ce9cca7ef17d54e"),
      name: "Foo News" }
  ],

  latest: Date(2011, 7, 25)
}
Index
db.users.ensureIndex( { username: 1 }, { unique: true } )
Feeds
{ _id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
  url: 'http://news.foo.com/feed.xml/',
  name: 'Foo News',
  subscriber_count: 2,
  latest: Date(2011, 7, 25)
}


Index
db.feeds.ensureIndex( { url: 1 }, { unique: true } )
Adding a feed subscription
// Upsert
db.feeds.update(
  { url: 'http://news.foo.com/feed.xml/', name: 'Foo News'},
  { $inc: {subscriber_count: 1} },
   true )
Adding a feed subscription
// Add this feed to user feed list
db.users.update(
  { _id: ObjectId("4e316f236ce9cca7ef17d59d") },
  { $addToSet: { feeds:
      { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
        name: 'Foo News'
      }
  } }
)
Removing a feed subscription
db.users.update(
  { _id: ObjectId("4e316f236ce9cca7ef17d59d") },

  { $pull: { feeds:
      { _id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
        name: 'Foo News'
      }
  } }
)
Removing a feed subscription
db.feeds.update( {url: 'http://news.foo.com/feed.xml/'},
  { $inc: {subscriber_count: -1} } )
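Subscribing and unsubscribing each touch two collections, so it can help to build both updates in one place. A minimal sketch, assuming hypothetical helpers subscribeOps and unsubscribeOps (not from the slides) that return the arguments for each update() call rather than running them; ids are plain strings here where real code would use ObjectId values.

```javascript
// Hypothetical helpers: each returns the argument lists for the two
// update() calls shown on the slides instead of executing them.
function subscribeOps(userId, feed) {
  return {
    // Upsert the feed and bump its subscriber count.
    feeds: [{ url: feed.url, name: feed.name },
            { $inc: { subscriber_count: 1 } },
            true],
    // Add the feed to the user's list; $addToSet skips duplicates.
    users: [{ _id: userId },
            { $addToSet: { feeds: { _id: feed._id, name: feed.name } } }]
  };
}

function unsubscribeOps(userId, feed) {
  return {
    // Decrement the subscriber count on the feed.
    feeds: [{ url: feed.url }, { $inc: { subscriber_count: -1 } }],
    // Remove the matching embedded feed from the user's list.
    users: [{ _id: userId },
            { $pull: { feeds: { _id: feed._id, name: feed.name } } }]
  };
}

const feed = { _id: "4e316b8a6ce9cca7ef17d54b",
               url: "http://news.foo.com/feed.xml/",
               name: "Foo News" };
const ops = subscribeOps("4e316f236ce9cca7ef17d59d", feed);
console.log(JSON.stringify(ops.users[1]));
```

Each array lines up with the arguments of the update() calls on the slides: query, update document, and (for the feed upsert) the upsert flag.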
Entries (populate in background)
{ _id: ObjectId("4e316b8a6ce9cca7ef17d5ac"),
  feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
  title: 'Important person to resign',
  body: 'A person deemed very important has decided...',
  reads: 5,
  date: Date(2011, 7, 27)
}
What we need now
Populate personal feeds (buckets)

Avoid lots of expensive queries

Record what's been read
Without bucketing
// Naive query runs every time
db.entries.find(
  { feed_id: {$in: user_feed_ids} }
).sort({date: 1}).limit(25)
With bucketing
// A bit smarter: only runs once
entries = db.entries.find(
  { date: { $gt: user_latest },
    feed_id: { $in: user_feed_ids } }
).sort({date: 1})
Index
db.entries.ensureIndex( { date: 1, feed_id: 1} )
bucket = {
  _id: ObjectId("4e3185c26ce9cca7ef17d552"),
  user_id: ObjectId("4e316f236ce9cca7ef17d59d"),
  date: Date( 2011, 7, 27 ),
  n: 100,

  entries: [
    { _id: ObjectId("4e316b8a6ce9cca7ef17d5ac"),
      feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
      title: 'Important person to resign',
      body: 'A person deemed very important has decided...',
      date: Date(2011, 7, 27),
      read: false
    },
    { _id: ObjectId("4e316b8a6ce9cca7ef17d5c8"),
      feed_id: ObjectId("4e316b8a6ce9cca7ef17d54b"),
      title: 'Panda bear waves hello',
      body: 'A panda bear at the local zoo...',
      date: Date(2011, 7, 27),
      read: false
    }
  ]
}
db.buckets.insert( bucket )

db.users.update(
  { _id: ObjectId("4e316f236ce9cca7ef17d59d") },
  { $set: { latest: latest_entry_date } }
)
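The bucketing step can be sketched in plain JavaScript. This is a sketch under stated assumptions, not the author's implementation: buildBuckets is a hypothetical helper, n is taken to be the number of entries in a bucket, the bucket's date is taken to be the date of its newest entry, and the ids and the third sample entry are made up for illustration.

```javascript
// Split a user's new entries (sorted oldest first) into bucket documents.
// Assumptions: `n` counts the entries in a bucket and the bucket's `date`
// is the date of its newest entry. Ids are plain strings here.
function buildBuckets(userId, entries, bucketSize) {
  const buckets = [];
  for (let i = 0; i < entries.length; i += bucketSize) {
    const chunk = entries.slice(i, i + bucketSize);
    buckets.push({
      user_id: userId,
      date: chunk[chunk.length - 1].date,
      n: chunk.length,
      // Copy each entry and start it as unread for this user.
      entries: chunk.map(e => Object.assign({}, e, { read: false }))
    });
  }
  return buckets;
}

const newEntries = [
  { _id: "e1", title: "Important person to resign", date: new Date(2011, 6, 26) },
  { _id: "e2", title: "Panda bear waves hello", date: new Date(2011, 6, 27) },
  { _id: "e3", title: "Third story", date: new Date(2011, 6, 28) }
];
const buckets = buildBuckets("user1", newEntries, 2);
// The newest entry date becomes the user's new `latest` value.
const latestEntryDate = buckets[buckets.length - 1].date;
console.log(buckets.length, buckets[0].n, buckets[1].n); // 2 2 1
```

Because each entry is copied into the bucket, the per-user read flag lives in the bucket and the shared entries collection is left untouched.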
Viewing a personal feed
// Newest
db.buckets.find(
  { user_id: ObjectId("4e316f236ce9cca7ef17d59d") }
).sort({date: -1}).limit(1)
Viewing a personal feed
// Next newest (that's how we paginate)
db.buckets.find(
  { user_id: ObjectId("4e316f236ce9cca7ef17d59d"),
    date: { $lt: previous_reader_date }
  }
).sort({date: -1}).limit(1)
Index
db.buckets.ensureIndex( { user_id: 1, date: -1 } )
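To make the pagination concrete, here is an in-memory stand-in for the two bucket queries. It is only a sketch: nextBucket is a hypothetical function that mimics find, sort, and limit(1) over a plain array, and the sample buckets are invented.

```javascript
// In-memory stand-in for
//   db.buckets.find({user_id, date: {$lt: before}}).sort({date: -1}).limit(1)
// `before` is omitted when fetching the first (newest) page.
function nextBucket(buckets, userId, before) {
  const matches = buckets
    .filter(b => b.user_id === userId &&
                 (before === undefined || b.date < before))
    .sort((a, b) => b.date - a.date); // newest first
  return matches.length > 0 ? matches[0] : null;
}

const stored = [
  { user_id: "user1", date: new Date(2011, 6, 25), n: 100 },
  { user_id: "user1", date: new Date(2011, 6, 27), n: 40 },
  { user_id: "user2", date: new Date(2011, 6, 26), n: 10 }
];

// Walk from the newest bucket to the oldest, one page per bucket.
const pages = [];
let page = nextBucket(stored, "user1");
while (page !== null) {
  pages.push(page.n);
  page = nextBucket(stored, "user1", page.date);
}
console.log(pages); // [ 40, 100 ]
```

Each page costs one indexed lookup, because the date of the bucket just shown is all the state the reader has to carry forward.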
Marking a feed item
db.buckets.update(
  { user_id: ObjectId("4e316f236ce9cca7ef17d59d"),
    'entries._id': ObjectId("4e316b8a6ce9cca7ef17d5c8") },

  { $set: { 'entries.$.read': true } }
)
Marking a feed item
db.entries.update(
  { _id: ObjectId("4e316b8a6ce9cca7ef17d5c8") },

  { $inc: { reads: 1 } }
)
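What the positional update on the bucket does can be illustrated in plain JavaScript. This is an illustration only: markRead is a hypothetical function operating on an in-memory object, with shortened string ids standing in for ObjectId values.

```javascript
// Plain-JavaScript illustration of the positional update:
// find the first array element matching the query, then set a field on it.
function markRead(bucket, entryId) {
  const i = bucket.entries.findIndex(e => e._id === entryId);
  if (i === -1) return false;     // no match: nothing is updated
  bucket.entries[i].read = true;  // the element `$` would point at
  return true;
}

const bucket = {
  user_id: "user1",
  entries: [
    { _id: "5ac", title: "Important person to resign", read: false },
    { _id: "5c8", title: "Panda bear waves hello", read: false }
  ]
};
markRead(bucket, "5c8");
console.log(bucket.entries.map(e => e.read)); // [ false, true ]
```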
Sharding note:
Buckets collection is eminently shardable

Shard key: { user_id: 1, date: 1 }
Website analytics
Challenges:
Real-time reporting.

Efficient use of space.

Easily wipe unneeded data.
Techniques
Pre-aggregate the data.

Pre-construct document structure.

Store ephemeral data in a separate database.
Two collections:
Each collection gets its own database.

Collection names are time-scoped.

Clean, fast removal of old data.
// Collections holding totals for each day, stored
// in a database per month
days_2011_5
days_2011_6
days_2011_7
...

// Totals for each month...
months_2011_1_4
months_2011_5_8
months_2011_9_12
...
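The naming scheme above can be derived from a timestamp. A minimal sketch, assuming hypothetical helpers daysCollection and monthsCollection and the four-month grouping implied by the names listed (1-4, 5-8, 9-12):

```javascript
// Daily totals: one collection per calendar month, e.g. days_2011_5.
function daysCollection(date) {
  return "days_" + date.getFullYear() + "_" + (date.getMonth() + 1);
}

// Monthly totals: one collection per four-month block, e.g. months_2011_5_8.
function monthsCollection(date) {
  const month = date.getMonth() + 1;                 // 1-12
  const start = Math.floor((month - 1) / 4) * 4 + 1; // 1, 5, or 9
  return "months_" + date.getFullYear() + "_" + start + "_" + (start + 3);
}

const when = new Date(2011, 4, 1, 17, 37); // 2011-5-1, 5:37 p.m. local time
console.log(daysCollection(when));   // days_2011_5
console.log(monthsCollection(when)); // months_2011_5_8
```

With names like these, wiping old data means dropping a whole collection rather than deleting documents one by one.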
Hours and minutes
{ _id: { uri: BinData("0beec7b5ea3f0fdbc95d0dd47f35"),
         day: '2011-5-1'
       },
  total: 2820,
  hrs: { 0: 500,
         1: 700,
         2: 450,
         3: 343,
         // ... 4-23 go here
       },
  // Minutes are rolling. This gives real-time
  // numbers for the last hour. So when you increment
  // minute n, you need to $set minute n-1 to 0.
  mins: { 0: 12,
          1: 10,
          2: 5,
          3: 34
          // ... 4-59 go here
        }
}
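Pre-constructing the document structure might look like the following. It is a sketch, not the author's code: newDayDoc is a hypothetical helper, hours are assumed to be keyed 0-23 and minutes 0-59, and a plain string stands in for the BinData URI hash.

```javascript
// Build a zero-filled day document so that later increments only
// modify counters that already exist.
// Assumption: hours are keyed 0-23 and minutes 0-59.
function newDayDoc(uri, day) {
  const hrs = {};
  const mins = {};
  for (let h = 0; h < 24; h++) hrs[h] = 0;
  for (let m = 0; m < 60; m++) mins[m] = 0;
  return { _id: { uri: uri, day: day }, total: 0, hrs: hrs, mins: mins };
}

const doc = newDayDoc("0beec7b5ea3f0fdbc95d0dd47f35", "2011-5-1");
console.log(Object.keys(doc.hrs).length, Object.keys(doc.mins).length); // 24 60
```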
Updating day document...
// Update 'days' collection at 5:37 p.m. on 2011-5-1.
// Might want to queue increments so that $inc is greater
// than 1 for each write...
id = { uri: BinData("0beec7b5ea3f0fdbc95d0dd47f35"),
       day: '2011-5-1'
     };

update = { $inc: { total: 1, 'hrs.17': 1, 'mins.37': 1 },
           $set: { 'mins.36': 0 } };

// Update the collection holding daily totals for May 2011
db.days_2011_5.update( { _id: id }, update );
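The update document can be generated from a timestamp and a queued hit count. A minimal sketch, assuming a hypothetical dayUpdate helper, minutes keyed 0-59 (so minute 0 wraps around to 59), and the slide's rule of resetting minute n-1 whenever minute n is incremented:

```javascript
// Build the update for `count` hits recorded at local time `when`.
// Follows the slide's rule: increment minute n, reset minute n-1.
// Assumption: minutes are keyed 0-59, so minute 0 wraps around to 59.
function dayUpdate(when, count) {
  const hour = when.getHours();
  const minute = when.getMinutes();
  const previous = (minute + 59) % 60;
  const inc = { total: count };
  inc["hrs." + hour] = count;
  inc["mins." + minute] = count;
  const set = {};
  set["mins." + previous] = 0;
  return { $inc: inc, $set: set };
}

// Three queued hits written at 5:37 p.m. on 2011-5-1.
const update = dayUpdate(new Date(2011, 4, 1, 17, 37), 3);
console.log(JSON.stringify(update));
// {"$inc":{"total":3,"hrs.17":3,"mins.37":3},"$set":{"mins.36":0}}
```

Passing a count greater than 1 is the queued-increment idea from the slide's comment: several hits collapse into one write.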
Months and days
{ _id: { uri: BinData("0beec7b5ea3f0fdbc95d0dd47f35"),
         month: '2011-5'
       },
  total: 34173,
  months: { 1: 4000,
            2: 4300,
            3: 4200,
            4: 5000,
            5: 5100,
            6: 5700,
            7: 5873
            // ... 8-12 go here
         }
}
Updating month document...
// Update 'months' collection at 5:37 p.m. on 2011-5-1
query = { _id: { uri: BinData("0beec7b5ea3f0fdbc95d0dd47f35"),
                 month: '2011-5'
               }
        };

update = { $inc: { total: 1, 'months.5': 1 } };

// Update the collection holding monthly totals for months 5-8
db.months_2011_5_8.update( query, update );
Reporting
Must provide data at multiple resolutions (second, minute, etc.).

We have the raw materials for that.

Application assembles the data intelligently.
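As one example of application-side assembly, the hourly counters of a day document can be turned into a series or summed over a range. This is a sketch with hypothetical helpers hourlySeries and hitsBetween; the sample document reuses the hourly numbers shown earlier.

```javascript
// Turn a day document's hourly counters into a 24-element series.
// Missing hours count as zero.
function hourlySeries(dayDoc) {
  const series = [];
  for (let h = 0; h < 24; h++) series.push(dayDoc.hrs[h] || 0);
  return series;
}

// Total hits between two hours (inclusive) of the same day.
function hitsBetween(dayDoc, fromHour, toHour) {
  return hourlySeries(dayDoc)
    .slice(fromHour, toHour + 1)
    .reduce((sum, n) => sum + n, 0);
}

const day = { total: 2820, hrs: { 0: 500, 1: 700, 2: 450, 3: 343 } };
console.log(hitsBetween(day, 0, 3)); // 1993
```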
Summary
Feed Reader
Rich documents

Incremental modifiers

Bucketing strategy
Website Analytics
Pre-aggregate data

Time-scoped databases and collections
http://www.10gen.com/presentations
http://manning.com/banker
Thank you
