Open source, high performance database




                                         July 2012

                                                     1
• Quick introduction to mongoDB

• Data modeling in
  mongoDB, queries, geospatial, updates
  and map reduce.

• Using a location-based app as an example

• Example works in mongoDB JS shell
                                             2
3
MongoDB is a scalable, high-performance, open
 source, document-oriented database.

•   Fast Querying
•   In-place updates
•   Full Index Support
•   Replication /High Availability
•   Auto-Sharding
•   Aggregation; Map/Reduce
•   GridFS


                                                4
MongoDB is Implemented in C++
• Windows, Linux, Mac OS-X, Solaris




Drivers are available in many languages
10gen supported
• C, C#
  (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ru
  by, Scala, nodejs
• Multiple community supported drivers
                                                                          5
RDBMS       MongoDB
Table       Collection
Row(s)      JSON Document
Index       Index
Partition   Shard
Join        Embedding/Linking
Schema      (implied Schema)




                                6
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
   author : "Asya",
   date : ISODate("2012-02-02T11:52:27.442Z"),
   text : "About MongoDB...",
   tags : [ "tech", "databases" ],
   comments : [{
        author : "Fred",
        date : ISODate("2012-02-03T17:22:21.124Z"),   text : "Best
Post Ever!"
   }],
   comment_count : 1
 }



                                                                     7
• JSON has powerful, limited set of datatypes
– Mongo extends datatypes with Date, Int types, Id, …

• MongoDB stores data in BSON

• BSON is a binary representation of JSON
– Optimized for performance and navigational abilities
– Also compression

See: bsonspec.org


                                                         8
• Intrinsic support for fast, iterative development

• Super low latency access to your data

• Very little CPU overhead

• No additional caching layer required

• Built in replication and horizontal scaling support



                                                        9
• Want to build an app where users can check in to
  a location




• Leave notes or comments about that location



                                                     10
"As a user I want to be able to find other
  locations nearby"

• Need to store locations
  (Offices, Restaurants, etc)
 – name, address, tags
 – coordinates
 – User generated content e.g. tips / notes




                                              11
"As a user I want to be able to 'checkin' to a
  location"

Checkins
 – User should be able to 'check in' to a location
 – Want to be able to generate statistics:
  • Recent checkins
  • Popular locations



                                                     12
loc1, loc2, loc3   user1, user2   checkin1, checkin2


locations          users           checkins




                                                       13
> location_1 = {
    name: "Din Tai Fung",
    address: "123 Xingye Lu",
    city: "Shanghai",
    post_code: 200021
 }




                                14
> location_1 = {
    name: "Din Tai Fung",
    address: "123 Xingye Lu",
    city: "Shanghai",
    post_code: 200021
 }

> db.locations.find({name: "Din Tai Fung"})




                                              15
> location_1 = {
    name: "Din Tai Fung",
    address: "123 Xingye Lu",
    city: "Shanghai",
    post_code: 200021
 }

> db.locations.ensureIndex({name: 1})
> db.locations.find({name: "Din Tai Fung"})




                                              16
> location_2 = {
    name: "Din Tai Fung",
    address: "123 Xingye Lu",
    city: "Shanghai",
    post_code: 200021,

     tags: ["restaurant", "dumplings"]
 }




                                         17
> location_2 = {
    name: "Din Tai Fung",
    address: "123 Xingye Lu",
    city: "Shanghai",
    post_code: 200021,

     tags: ["restaurant", "dumplings"]
 }

> db.locations.ensureIndex({tags: 1})




                                         18
> location_2 = {
    name: "Din Tai Fung",
    address: "123 Xingye Lu",
    city: "Shanghai",
    post_code: 200021,

     tags: ["restaurant", "dumplings"]
 }

> db.locations.ensureIndex({tags: 1})
> db.locations.find({tags: "dumplings"})




                                           19
> location_3 = {
    name: "Din Tai Fung",
    address: "123 Xingye Lu",
    city: "Shanghai",
    post_code: 200021,
    tags: ["restaurant", "dumplings"],

     lat_long: [52.5184, 13.387]
 }




                                         20
> location_3 = {
    name: "Din Tai Fung",
    address: "123 Xingye Lu",
    city: "Shanghai",
    post_code: 200021,
    tags: ["restaurant", "dumplings"],

     lat_long: [52.5184, 13.387]
 }

> db.locations.ensureIndex({lat_long: "2d"})




                                               21
> location_3 = {
    name: "Din Tai Fung",
    address: "123 Xingye Lu",
    city: "Shanghai",
    post_code: 200021,
    tags: ["restaurant", "dumplings"],

     lat_long: [52.5184, 13.387]
 }

> db.locations.ensureIndex({lat_long: "2d"})
> db.locations.find({lat_long: {$near:[52.53, 13.4]}})




                                                         22
// creating your indexes:
> db.locations.ensureIndex({tags: 1})
> db.locations.ensureIndex({name: 1})
> db.locations.ensureIndex({lat_long: "2d"})

// finding places:
> db.locations.find({lat_long: {$near:[52.53, 13.4]}})

// with regular expressions:
> db.locations.find({name: /^Din/})

// by tag:
> db.locations.find({tag: "dumplings"})


                                                         23
Atomic operators:

$set, $unset, $inc, $push, $pushAll, $pull, $pullAll, $bit




                                                             24
// initial data load:
> db.locations.insert(location_3)

// adding a tip with update:
> db.locations.update(
  {name: "Din Tai Fung"},
  {$push: {
    tips: {
      user: "Asya",
      date: "28/03/2012",
      tip: "The hairy crab dumplings are awesome!"}
 }})




                                                      25
> db.locations.findOne()
{ name: "Din Tai Fung",
  address: "123 Xingye Lu",
  city: "Shanghai",
  post_code: 200021,
  tags: ["restaurant", "dumplings"],
  lat_long: [52.5184, 13.387],
  tips:[{
      user: "Asya",
      date: "28/03/2012",
      tip: "The hairy crab dumplings are awesome!"
   }]

}



                                                     26
"As a user I want to be able to 'checkin' to a
  location"
Checkins
 – User should be able to 'check in' to a location
 – Want to be able to generate statistics:
   • Recent checkins
   • Popular locations




                                                     27
> user_1 = {
    _id: "asya@10gen.com",
    name: "Asya",
    twitter: "asya999",
    checkins: [
      {location: "Din Tai Fung", ts: "28/03/2012"},
      {location: "Meridian Hotel", ts: "27/03/2012"}
    ]
 }

> db.users.ensureIndex({checkins.location: 1})
> db.users.find({checkins.location: "Din Tai Fung"})




                                                       28
// find all users who've checked in here:
> db.users.find({"checkins.location":"Din Tai Fung"})




                                                        29
// find all users who've checked in here:
> db.users.find({"checkins.location":"Din Tai Fung"})

// find the last 10 checkins here?
> db.users.find({"checkins.location":"Din Tai Fung"})
         .sort({"checkins.ts": -1}).limit(10)




                                                        30
// find all users who've checked in here:
> db.users.find({"checkins.location":"Din Tai Fung"})

// find the last 10 checkins here: - Warning!
> db.users.find({"checkins.location":"Din Tai Fung"})
         .sort({"checkins.ts": -1}).limit(10)




                Hard to query for last 10


                                                        31
> user_2 = {
    _id: "asya@10gen.com",
    name: "Asya",
    twitter: "asya999",
 }

> checkin_1 = {
    location: location_id,
    user: user_id,
    ts: "20/03/2010"
 }

> db.checkins.ensureIndex({user: 1})
> db.checkins.find({user: user_id})



                                       32
// find all users who've checked in here:
> location_id = db.checkins.find({"name":"Din Tai Fung"})
> u_ids = db.checkins.find({location: location_id},
                 {_id: -1, user: 1})
> users = db.users.find({_id: {$in: u_ids}})

// find the last 10 checkins here:
> db.checkins.find({location: location_id})
           .sort({ts: -1}).limit(10)

// count how many checked in today:
> db.checkins.find({location: location_id,
             ts: {$gt: midnight}}
         ).count()


                                                            33
// Find most popular locations
> agg = db.checkins.aggregate(
   {$match: {ts: {$gt: now_minus_3_hrs}}},
   {$group: {_id: "$location",      numEntries: {$sum: 1}}}
 )
> agg.result
 [{"_id": "Din Tai Fung", "numEntries" : 17}]




                                                              34
// Find most popular locations
> map_func = function() {
   emit(this.location, 1);
 }
> reduce_func = function(key, values) {
   return Array.sum(values);
 }
> db.checkins.mapReduce(map_func, reduce_func,
  {query: {ts: {$gt: now_minus_3_hrs}},
   out: "result"})
> db.result.findOne()
 {"_id": "Din Tai Fung", "value" : 17}



                                                 35
Deployment

             36
• Single server
  - need a strong backup plan
                                P




                                    37
• Single server
  - need a strong backup plan
                                P

• Replica sets              P   S   S
  - High availability
  - Automatic failover




                                        38
• Single server
  - need a strong backup plan
                                P

• Replica sets              P   S   S
  - High availability
  - Automatic failover

• Sharded                   P   S   S
  - Horizontally scale
  - Auto balancing
                            P   S   S
                                        39
Content Management       Operational Intelligence           E-Commerce




            User Data Management         High Volume Data Feeds




                                                                         40
41
download at mongodb.org


 conferences, appearances, and meetups
 http://www.10gen.com/events



       Facebook             |    Twitter   |         LinkedIn
    http://bit.ly/mongofb       @mongodb       http://linkd.in/joinmongo

support, training, and this talk brought to you by



                                                                           42

First app online conf

  • 1.
    Open source, highperformance database July 2012 1
  • 2.
    • Quick introductionto mongoDB • Data modeling in mongoDB, queries, geospatial, updates and map reduce. • Using a location-based app as an example • Example works in mongoDB JS shell 2
  • 3.
  • 4.
    MongoDB is ascalable, high-performance, open source, document-oriented database. • Fast Querying • In-place updates • Full Index Support • Replication /High Availability • Auto-Sharding • Aggregation; Map/Reduce • GridFS 4
  • 5.
    MongoDB is Implementedin C++ • Windows, Linux, Mac OS-X, Solaris Drivers are available in many languages 10gen supported • C, C# (.Net), C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ru by, Scala, nodejs • Multiple community supported drivers 5
  • 6.
    RDBMS MongoDB Table Collection Row(s) JSON Document Index Index Partition Shard Join Embedding/Linking Schema (implied Schema) 6
  • 7.
    { _id :ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Asya", date : ISODate("2012-02-02T11:52:27.442Z"), text : "About MongoDB...", tags : [ "tech", "databases" ], comments : [{ author : "Fred", date : ISODate("2012-02-03T17:22:21.124Z"), text : "Best Post Ever!" }], comment_count : 1 } 7
  • 8.
    • JSON haspowerful, limited set of datatypes – Mongo extends datatypes with Date, Int types, Id, … • MongoDB stores data in BSON • BSON is a binary representation of JSON – Optimized for performance and navigational abilities – Also compression See: bsonspec.org 8
  • 9.
    • Intrinsic supportfor fast, iterative development • Super low latency access to your data • Very little CPU overhead • No additional caching layer required • Built in replication and horizontal scaling support 9
  • 10.
    • Want tobuild an app where users can check in to a location • Leave notes or comments about that location 10
  • 11.
    "As a userI want to be able to find other locations nearby" • Need to store locations (Offices, Restaurants, etc) – name, address, tags – coordinates – User generated content e.g. tips / notes 11
  • 12.
    "As a userI want to be able to 'checkin' to a location" Checkins – User should be able to 'check in' to a location – Want to be able to generate statistics: • Recent checkins • Popular locations 12
  • 13.
    loc1, loc2, loc3 user1, user2 checkin1, checkin2 locations users checkins 13
  • 14.
    > location_1 ={ name: "Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021 } 14
  • 15.
    > location_1 ={ name: "Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021 } > db.locations.find({name: "Din Tai Fung"}) 15
  • 16.
    > location_1 ={ name: "Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021 } > db.locations.ensureIndex({name: 1}) > db.locations.find({name: "Din Tai Fung"}) 16
  • 17.
    > location_2 ={ name: "Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021, tags: ["restaurant", "dumplings"] } 17
  • 18.
    > location_2 ={ name: "Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021, tags: ["restaurant", "dumplings"] } > db.locations.ensureIndex({tags: 1}) 18
  • 19.
    > location_2 ={ name: "Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021, tags: ["restaurant", "dumplings"] } > db.locations.ensureIndex({tags: 1}) > db.locations.find({tags: "dumplings"}) 19
  • 20.
    > location_3 ={ name: "Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] } 20
  • 21.
    > location_3 ={ name: "Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"}) 21
  • 22.
    > location_3 ={ name: "Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387] } > db.locations.ensureIndex({lat_long: "2d"}) > db.locations.find({lat_long: {$near:[52.53, 13.4]}}) 22
  • 23.
    // creating yourindexes: > db.locations.ensureIndex({tags: 1}) > db.locations.ensureIndex({name: 1}) > db.locations.ensureIndex({lat_long: "2d"}) // finding places: > db.locations.find({lat_long: {$near:[52.53, 13.4]}}) // with regular expressions: > db.locations.find({name: /^Din/}) // by tag: > db.locations.find({tag: "dumplings"}) 23
  • 24.
    Atomic operators: $set, $unset,$inc, $push, $pushAll, $pull, $pullAll, $bit 24
  • 25.
    // initial dataload: > db.locations.insert(location_3) // adding a tip with update: > db.locations.update( {name: "Din Tai Fung"}, {$push: { tips: { user: "Asya", date: "28/03/2012", tip: "The hairy crab dumplings are awesome!"} }}) 25
  • 26.
    > db.locations.findOne() { name:"Din Tai Fung", address: "123 Xingye Lu", city: "Shanghai", post_code: 200021, tags: ["restaurant", "dumplings"], lat_long: [52.5184, 13.387], tips:[{ user: "Asya", date: "28/03/2012", tip: "The hairy crab dumplings are awesome!" }] } 26
  • 27.
    "As a userI want to be able to 'checkin' to a location" Checkins – User should be able to 'check in' to a location – Want to be able to generate statistics: • Recent checkins • Popular locations 27
  • 28.
    > user_1 ={ _id: "asya@10gen.com", name: "Asya", twitter: "asya999", checkins: [ {location: "Din Tai Fung", ts: "28/03/2012"}, {location: "Meridian Hotel", ts: "27/03/2012"} ] } > db.users.ensureIndex({checkins.location: 1}) > db.users.find({checkins.location: "Din Tai Fung"}) 28
  • 29.
    // find allusers who've checked in here: > db.users.find({"checkins.location":"Din Tai Fung"}) 29
  • 30.
    // find allusers who've checked in here: > db.users.find({"checkins.location":"Din Tai Fung"}) // find the last 10 checkins here? > db.users.find({"checkins.location":"Din Tai Fung"}) .sort({"checkins.ts": -1}).limit(10) 30
  • 31.
    // find allusers who've checked in here: > db.users.find({"checkins.location":"Din Tai Fung"}) // find the last 10 checkins here: - Warning! > db.users.find({"checkins.location":"Din Tai Fung"}) .sort({"checkins.ts": -1}).limit(10) Hard to query for last 10 31
  • 32.
    > user_2 ={ _id: "asya@10gen.com", name: "Asya", twitter: "asya999", } > checkin_1 = { location: location_id, user: user_id, ts: "20/03/2010" } > db.checkins.ensureIndex({user: 1}) > db.checkins.find({user: user_id}) 32
  • 33.
    // find allusers who've checked in here: > location_id = db.checkins.find({"name":"Din Tai Fung"}) > u_ids = db.checkins.find({location: location_id}, {_id: -1, user: 1}) > users = db.users.find({_id: {$in: u_ids}}) // find the last 10 checkins here: > db.checkins.find({location: location_id}) .sort({ts: -1}).limit(10) // count how many checked in today: > db.checkins.find({location: location_id, ts: {$gt: midnight}} ).count() 33
  • 34.
    // Find mostpopular locations > agg = db.checkins.aggregate( {$match: {ts: {$gt: now_minus_3_hrs}}}, {$group: {_id: "$location", numEntries: {$sum: 1}}} ) > agg.result [{"_id": "Din Tai Fung", "numEntries" : 17}] 34
  • 35.
    // Find mostpopular locations > map_func = function() { emit(this.location, 1); } > reduce_func = function(key, values) { return Array.sum(values); } > db.checkins.mapReduce(map_func, reduce_func, {query: {ts: {$gt: now_minus_3_hrs}}, out: "result"}) > db.result.findOne() {"_id": "Din Tai Fung", "value" : 17} 35
  • 36.
  • 37.
    • Single server - need a strong backup plan P 37
  • 38.
    • Single server - need a strong backup plan P • Replica sets P S S - High availability - Automatic failover 38
  • 39.
    • Single server - need a strong backup plan P • Replica sets P S S - High availability - Automatic failover • Sharded P S S - Horizontally scale - Auto balancing P S S 39
  • 40.
    Content Management Operational Intelligence E-Commerce User Data Management High Volume Data Feeds 40
  • 41.
  • 42.
    download at mongodb.org conferences, appearances, and meetups http://www.10gen.com/events Facebook | Twitter | LinkedIn http://bit.ly/mongofb @mongodb http://linkd.in/joinmongo support, training, and this talk brought to you by 42

Editor's Notes

  • #6 not recommended
  • #7 Later Antoine will talk about schema design
  • #10 Flexible schema
  • #38 journaling plus backups
  • #39 talk about relication
  • #41 small sample of use cases.