MongoDB
Introduction and Internals
by Kai Zhao 2011.12
kingaim@gmail.com
What is MongoDB?

•Open source, scalable, high-performance, document-oriented NoSQL database
 that stores each document as a set of key-value pairs.

•Features
    •JSON-style, document-oriented, schema-less storage
    •B-tree indexes supported on any attribute
    •Log-based replication for Master/Slave and Replica Set
    •Auto-sharding architecture (via horizontal partitioning) scales to thousands of nodes
    •NoSQL-style query
    •Surprising updating behaviors
    •Map/Reduce support
    •GridFS specification for storing large files
    •Developed by 10gen with commercial support
Well/Less Well Suited




                        Source: http://www.mongodb.org/display/DOCS/Use+Cases
Basic concepts in MongoDB

  Object hierarchy (* means 0 or more objects):

          MongoDB                          Relational DBMS
          Databases*                       Databases*
            Collections*                     Relations*
              Documents*                       Columns*
                Fields*                        Indexes*
              Indexes*

  Each document has its own fields, which makes MongoDB schema-less.

  Terminology mapping:

          NoSQL MongoDB      Relational DBMS
          Database           Database
          Collection         Relation
          Document           Tuple
          Field              Column
          Index              Index
          Cursor             Cursor
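
To make "schema-less" concrete, here is a minimal sketch in plain JavaScript. The array only stands in for a collection; it is not a MongoDB API, and the two documents are the ones used in the CRUD demo.

```javascript
// Plain JavaScript objects standing in for documents in one collection.
// The array is an illustration only; it is not a MongoDB API.
const collection = [
  { name: "bob", age: 30 },
  { name: "alice", gender: "female" },
];

// Each document carries its own set of fields.
const fieldSets = collection.map((doc) => Object.keys(doc).sort());

// The union of all fields is what a relational table would need as columns.
const columns = [...new Set(fieldSets.flat())].sort();

console.log(fieldSets);
console.log(columns);
```

A relational table would need all three columns and store NULL where a row has no value; the documents simply omit the field.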
CRUD Demo time

   show dbs                                        view existing databases
   use test                                        use database "test"
   db.t.insert({name:'bob',age:30})                insert 30-year-old bob
   db.t.insert({name:'alice',gender:'female'})     insert alice, a female
   db.t.find()                                     list all documents in collection t
   db.t.find({name:'bob'},{age:1})                 find bob, returning only the age field (and _id)
   db.t.find().limit(1).skip(1)                    find the second document
   db.t.find().sort({name:1})                      sort the results by name, ascending
   db.t.find({$or:[{name:'bob'},{name:'tom'}]})    find bob's or tom's documents
   db.t.update({name:'bob'},{$set:{age:31}},       set age to 31 in all of bob's documents
               false,true)                         (upsert=false, multi=true)
   db.stats()                                      database statistics
   db.getCollectionNames()                         collections in this database
   db.t.ensureIndex({name:1})                      create an index on name
   db.people.find({name:"bob"}).explain()          explain the query plan
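
The second argument to find() is a projection, not a second filter. The sketch below is a toy in-memory model of that behavior in plain JavaScript; the find function here is hypothetical and only covers equality matches and inclusion projections.

```javascript
// Toy in-memory model of the shell's find(query, projection); not MongoDB code.
function find(docs, query = {}, projection = null) {
  // Keep documents whose fields equal every key/value pair in the query.
  const matched = docs.filter((doc) =>
    Object.entries(query).every(([key, value]) => doc[key] === value)
  );
  if (!projection) return matched;
  // Inclusion projection: keep _id plus the fields flagged with 1.
  return matched.map((doc) => {
    const out = { _id: doc._id };
    for (const [field, flag] of Object.entries(projection)) {
      if (flag === 1 && field in doc) out[field] = doc[field];
    }
    return out;
  });
}

// Hypothetical contents of collection t after the two inserts.
const t = [
  { _id: 1, name: "bob", age: 30 },
  { _id: 2, name: "alice", gender: "female" },
];

const result = find(t, { name: "bob" }, { age: 1 });
console.log(result);
```

The query selects bob's document and the projection drops every field except _id and age.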
Query Optimization

db.people.find({x:10,y:"foo"})

  Candidate plans:

          Collection people      DiskLocation Scan
          Index on x             Index Scan
          Index on y             Index Scan
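
The three candidate plans above can be compared with a small simulation. This is a sketch under stated assumptions: the data set is made up, the indexes are plain Maps, and the selection rule (fewest documents examined wins) is an assumption for illustration, since the slide does not state how the optimizer chooses.

```javascript
// Hypothetical data set: 1000 documents with fields x and y.
const people = [];
for (let i = 0; i < 1000; i++) {
  people.push({ x: i % 100, y: i % 2 === 0 ? "foo" : "bar" });
}

// Build a simple index: field value -> positions of documents with that value.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, pos) => {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(pos);
  });
  return index;
}

const indexOnX = buildIndex(people, "x");
const indexOnY = buildIndex(people, "y");
const query = { x: 10, y: "foo" };
const matches = (doc) => doc.x === query.x && doc.y === query.y;

// Each plan reports how many documents it examined and which ones matched.
function runPlan(name, candidates) {
  return { name, examined: candidates.length, results: candidates.filter(matches) };
}

const plans = [
  runPlan("collection scan", people),
  runPlan("index on x", (indexOnX.get(query.x) || []).map((pos) => people[pos])),
  runPlan("index on y", (indexOnY.get(query.y) || []).map((pos) => people[pos])),
];

// Assumed selection rule: pick the plan that examined the fewest documents.
const best = plans.reduce((a, b) => (b.examined < a.examined ? b : a));
console.log(plans.map((p) => `${p.name}: examined ${p.examined}`));
console.log("chosen:", best.name);
```

All three plans return the same documents; they differ only in how much data they touch to get there.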
MongoDB Architecture




    Source: mongoDB Replication and Replica Set by Dwight Merriman 10gen
MongoDB Sharding
MongoDB uses two key operations to facilitate sharding - split and migrate.
Split splits a chunk into two ranges; it is done to ensure that no single chunk grows unusually large.
Migrate moves a chunk (the data associated with a key range) to another shard.
This is done as needed to rebalance.

Split is an inexpensive metadata operation, while migrate is expensive because large amounts
of data may move from server to server.
Both splits and migrates are performed automatically.

MongoDB has a subsystem called the Balancer, which monitors shard loads and moves
chunks around if it finds an imbalance.

If you add a new shard to the system, some chunks will eventually be moved to
that shard to spread out the load.

A recently split chunk may be moved immediately to a new shard if the system
predicts that future insertions will benefit from that move.
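
Split, migrate, and the Balancer can be sketched as a toy model in plain JavaScript. Everything here is an assumption for illustration: the split threshold counts documents rather than bytes, the split point is the median key, and the balancer evens out chunk counts. MongoDB's real thresholds and policies are not taken from the slide.

```javascript
// Toy model: a chunk is a key range [min, max) plus the keys stored in it.
const MAX_CHUNK_DOCS = 4; // hypothetical split threshold (document count)

// Split at the median key so both halves hold about the same number of keys.
// Assumes the keys in a chunk are unique.
function splitChunk(chunk) {
  const keys = [...chunk.keys].sort((a, b) => a - b);
  const mid = keys[Math.floor(keys.length / 2)];
  return [
    { min: chunk.min, max: mid, keys: keys.filter((k) => k < mid) },
    { min: mid, max: chunk.max, keys: keys.filter((k) => k >= mid) },
  ];
}

// Keep splitting until no chunk on the shard exceeds the threshold.
function splitLargeChunks(shard) {
  let changed = true;
  while (changed) {
    changed = false;
    shard.chunks = shard.chunks.flatMap((chunk) => {
      if (chunk.keys.length <= MAX_CHUNK_DOCS) return [chunk];
      changed = true;
      return splitChunk(chunk);
    });
  }
}

// Balancer: migrate one chunk at a time from the fullest shard to the
// emptiest one until their chunk counts differ by at most one.
function balance(shards) {
  let migrations = 0;
  for (;;) {
    const sorted = [...shards].sort((a, b) => a.chunks.length - b.chunks.length);
    const least = sorted[0];
    const most = sorted[sorted.length - 1];
    if (most.chunks.length - least.chunks.length <= 1) return migrations;
    least.chunks.push(most.chunks.pop()); // the migrate step
    migrations++;
  }
}

// One shard holding a single chunk with 16 keys, then a new empty shard is added.
const shardA = {
  name: "A",
  chunks: [{ min: 0, max: 100, keys: Array.from({ length: 16 }, (_, i) => i) }],
};
const shardB = { name: "B", chunks: [] };

splitLargeChunks(shardA); // cheap: only chunk boundaries change
const migrations = balance([shardA, shardB]); // expensive: data moves between shards
console.log(shardA.chunks.length, shardB.chunks.length, migrations);
```

The split phase never moves data off the shard, which is why it is cheap; only the balancer's migrations transfer chunks between servers.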
MongoDB Sharding: Pull mode
MongoDB Sharding: Briefly
                                      Sequence
          FROM: C                                                 TO: N

  #Make sure my view is complete and lock
  #Get the DiskLoc of the documents to migrate
  #Trigger N to start the migration in pull mode

                                             #Copy index definitions from C
                                             #Remove existing data in [min~max]
                                             #Clone the data in [min~max] from C
                                             #Ask C to replicate the changes

  #Ask N to commit
                                             #N commits
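
The sequence above can be walked through with a toy model in plain JavaScript. This is a sketch, not MongoDB's implementation: the Shard class, the change log, and the writesDuringClone parameter are hypothetical, the range is treated as [min, max), and the final cleanup on C is an assumption that the slide does not show.

```javascript
// Toy model of one chunk migration for key range [min, max); not MongoDB code.
function inRange(key, range) {
  return key >= range.min && key < range.max;
}

class Shard {
  constructor(name) {
    this.name = name;
    this.docs = new Map();    // key -> document
    this.indexes = new Set(); // index definitions, e.g. "name_1"
    this.changeLog = [];      // writes recorded while a migration is running
    this.migrating = null;    // range currently being moved, if any
  }
  // doc === null means delete.
  write(key, doc) {
    if (doc === null) this.docs.delete(key);
    else this.docs.set(key, doc);
    if (this.migrating && inRange(key, this.migrating)) {
      this.changeLog.push({ key, doc });
    }
  }
}

function migrate(from, to, range, writesDuringClone = []) {
  from.migrating = range; // C: start tracking writes to the range
  from.indexes.forEach((ix) => to.indexes.add(ix)); // N: copy index definitions from C
  for (const key of [...to.docs.keys()]) { // N: remove existing data in the range
    if (inRange(key, range)) to.docs.delete(key);
  }
  // N: clone the data in the range from C. The snapshot is taken first and
  // client writes land on C afterwards, so the cloned copy is already stale.
  const snapshot = [...from.docs].filter(([key]) => inRange(key, range));
  writesDuringClone.forEach(([key, doc]) => from.write(key, doc));
  snapshot.forEach(([key, doc]) => to.docs.set(key, doc));
  // N: ask C to replicate the changes, then replay them to catch up.
  for (const { key, doc } of from.changeLog) {
    if (doc === null) to.docs.delete(key);
    else to.docs.set(key, doc);
  }
  // C asks N to commit. Assumption (not on the slide): C then drops its copy.
  for (const key of [...from.docs.keys()]) {
    if (inRange(key, range)) from.docs.delete(key);
  }
  from.changeLog = [];
  from.migrating = null;
}

const c = new Shard("C");
const n = new Shard("N");
c.indexes.add("name_1");
c.write(1, { name: "bob" });
c.write(5, { name: "alice" });
c.write(20, { name: "tom" });
n.write(3, { name: "stale" });

// While the clone runs, a client deletes key 5 and inserts key 7 on C.
migrate(c, n, { min: 0, max: 10 }, [[5, null], [7, { name: "eve" }]]);
console.log([...n.docs.keys()], [...c.docs.keys()]);
```

The replay of the change log is what lets the source keep accepting writes during the copy: the delete of key 5 and the insert of key 7 both reach N even though the cloned snapshot predates them.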
MongoDB Sharding: In Details
           FROM                                                    TO




 Note: Documents on FROM can still be updated or deleted during the migration; TO catches up in step 4.
Replication and Sharding




Source: http://www.mongodb.org/display/DOCS/Simple+Initial+Sharding+Architecture
MongoDB Replication: Pull mode

The slave continuously pulls the oplog from the master.
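
Pull-mode replication can be sketched as a toy model in plain JavaScript. The Master and Slave classes, the integer timestamps, and the op format are all hypothetical; a real slave tails the master's oplog continuously, whereas here each pull() call stands for one iteration of that loop.

```javascript
// Toy model of pull-based replication; names are illustrative, not MongoDB internals.

// Apply one logged operation to a data set.
function applyOp(data, op) {
  if (op.type === "set") data.set(op.key, op.value);
  else if (op.type === "delete") data.delete(op.key);
}

class Master {
  constructor() {
    this.data = new Map();
    this.oplog = []; // ordered log of every write
  }
  // Apply a write locally and append it to the oplog with a timestamp.
  apply(op) {
    applyOp(this.data, op);
    this.oplog.push({ ts: this.oplog.length + 1, ...op });
  }
  // Return oplog entries newer than the given timestamp.
  entriesAfter(ts) {
    return this.oplog.filter((entry) => entry.ts > ts);
  }
}

class Slave {
  constructor(master) {
    this.master = master;
    this.data = new Map();
    this.lastTs = 0; // timestamp of the last entry applied
  }
  // One pull: fetch everything after lastTs and replay it in order.
  pull() {
    const entries = this.master.entriesAfter(this.lastTs);
    for (const entry of entries) {
      applyOp(this.data, entry);
      this.lastTs = entry.ts;
    }
    return entries.length;
  }
}

const master = new Master();
const slave = new Slave(master);

master.apply({ type: "set", key: "bob", value: 30 });
master.apply({ type: "set", key: "alice", value: "female" });
const firstPull = slave.pull();

master.apply({ type: "delete", key: "alice" });
master.apply({ type: "set", key: "bob", value: 31 });
const secondPull = slave.pull();
const thirdPull = slave.pull(); // nothing new to fetch

console.log(firstPull, secondPull, thirdPull, [...slave.data]);
```

The master never tracks its slaves in this model; each slave remembers its own position in the log and asks for what it is missing.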
Questions

References:
1: Source code digest: http://www.cnblogs.com/daizhj/category/260889.html
2: Books: http://www.mongodb.org/display/DOCS/Books
3: MongoDB official website: http://www.mongodb.com/
