MongoDB
          http://tinyurl.com/97o49y3

                            by toki
About me
● Delta Electronic CTBD Senior Engineer
● Main developer of http://loltw.net
  ○ Website built on MongoDB, serving 600k page views daily
  ○ Data grows every day via automated crawler bots
MongoDB - Simple Introduction
● Document-based NoSQL (Not Only SQL) database
● Started in 2007 by 10gen
● Written in C++
● Fast (but uses a lot of memory)
● Stores JSON documents in BSON format
● Supports indexes on any document attribute
● Horizontal scalability with auto sharding
● High availability & replica ready
What is a database?
● Raw data
  ○ John is a student, he's 12 years old.
● Data
  ○ Student
    ■ name = "John"
    ■ age = 12
● Records
  ○ Student(name="John", age=12)
  ○ Student(name="Alice", age=11)
● Database
  ○ Student Table
  ○ Grades Table
Example of (relational) database
[Diagram: four related tables]
● Student: Student ID, Name, Age, Class ID
● Class: Class ID, Name
● Grade: Grade ID, Name
● Student Grade: Grade ID, Student ID, Grade
SQL Language - How to find data?
● Find the student named John
  ○ select * from student where name="John"
● Find the class name of John
  ○ select s.name, c.name as class_name from student s, class c
    where s.name="John" and s.class_id=c.class_id
Why NOSQL?
● Big data
  ○ Modern data sets are too big for a single DB server
  ○ Google search engine
● Connectivity
  ○ Facebook like button
● Semi-structured data
  ○ Car equipment database
● High availability
  ○ The foundation of cloud services
Common NOSQL DB characteristics
● Schemaless
● No joins; stores pre-joined/embedded data
● Horizontal scalability
● Replica ready - High availability
Common types of NOSQL DB
● Key-Value
  ○ Based on Amazon's Dynamo paper
  ○ Stores K-V pairs
  ○ Example:
    ■ Dynomite
    ■ Voldemort
Common types of NOSQL DB
● Bigtable clones
  ○ Based on Google Bigtable paper
  ○ Column oriented, but handles semi-structured data
  ○ Data keyed by: row, column, time, index
  ○ Example:
    ■ Google Bigtable
    ■ HBase
    ■ Cassandra (Facebook)
Common types of NOSQL DB
● Document based
  ○ Stores multi-level K-V pairs
  ○ Usually uses JSON as the document format
  ○ Example:
    ■ MongoDB
    ■ CouchDB (Apache)
    ■ Redis (strictly speaking a key-value / data structure store)
Common types of NOSQL DB
● Graph
  ○ Focus on modeling the structure of data -
    interconnectivity
  ○ Example:
    ■ Neo4j
    ■ AllegroGraph
Start using MongoDB - Installation
● From apt-get (Debian / Ubuntu only)
  ○ sudo apt-get install mongodb
● Using the 10gen MongoDB repository
  ○ http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian-or-ubuntu-linux/
● From pre-built binary or source
  ○ http://www.mongodb.org/downloads
● Note:
  32-bit builds are limited to around 2GB of data
Start your MongoDB manually
mkdir -p /tmp/mongo
mongod --dbpath /tmp/mongo

or

mongod -f mongodb.conf
Verify your MongoDB installation
$ mongo

MongoDB shell version: 2.2.0
connecting to: test
>_

--------------------------------------------------------
mongo localhost/test2
mongo 127.0.0.1/test
How many databases do you have?
show dbs
Elements of MongoDB
● Database
  ○ Collection
    ■ Document
What is JSON
● JavaScript Object Notation
● Elements of JSON
  ○ Object: K/V pairs
  ○ Key: string
  ○ Value, could be
    ■ string
    ■ bool
    ■ number
    ■ array
    ■ object
    ■ null

{
    "key1": "value1",
    "key2": 2.0,
    "key3": [1, "str", 3.0],
    "key4": false,
    "key5": {
        "name": "another object"
    }
}
Another sample of JSON
{
    "name": "John",
    "age": 12,
    "grades": {
        "math": 4.0,
        "english": 5.0
    },
    "registered": true,
    "favorite subjects": ["math", "english"]
}
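A minimal sketch in plain JavaScript (runnable in Node.js, no MongoDB needed) of how the sample document above round-trips through JSON text. The variable names are illustrative only and are not part of the original slides.

```javascript
// The sample student document from the slide, written as JSON text.
const text = `{
    "name": "John",
    "age": 12,
    "grades": {"math": 4.0, "english": 5.0},
    "registered": true,
    "favorite subjects": ["math", "english"]
}`;

// JSON.parse turns the text into a plain JavaScript object.
const student = JSON.parse(text);

// Nested objects use dot access; keys containing spaces need brackets.
const mathGrade = student.grades.math;
const firstSubject = student["favorite subjects"][0];

// JSON.stringify serializes the object back to text.
const roundTrip = JSON.parse(JSON.stringify(student));

console.log(mathGrade, firstSubject, roundTrip.name);
```

The bracket form is the same one the later slides use for s['favorite subjects'].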
Insert document into MongoDB
s={
  "name": "John",
  "age": 12,
  "grades": {
      "math": 4.0,
      "english": 5.0
  },
  "registered": true,
  "favorite subjects": ["math", "english"]
}
db.students.insert(s);
Verify inserted document
db.students.find()

also try

db.student.insert(s)
show collections
Save document into MongoDB
s.name = "Alice"
s.age = 14
s.grades.math = 2.0

db.students.save(s)
What is _id / ObjectId ?
● _id is the primary key of every document and is
  always indexed; it can be almost any value
  except an array.
● By default, MongoDB auto-generates an
  ObjectId as _id
● An ObjectId is a 12-byte value used as a unique
  document _id
● Use ObjectId().getTimestamp() to read back
  the timestamp stored in an ObjectId
● ObjectId byte layout:
  ○ bytes 0-3: unix timestamp
  ○ bytes 4-6: machine
  ○ bytes 7-8: process id
  ○ bytes 9-11: increment
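The byte layout above can be sketched in plain JavaScript by slicing the 24-character hex form of an ObjectId. This is only an illustration under the layout shown on the slide (4 + 3 + 2 + 3 bytes); parseObjectId and the example id are hypothetical, and newer server versions may fill the middle bytes differently.

```javascript
// Split a 24-character hex ObjectId string into the four fields
// of the slide's layout: timestamp, machine, process id, increment.
function parseObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);   // bytes 0-3
  return {
    seconds: seconds,                              // unix timestamp
    timestamp: new Date(seconds * 1000),           // same value as a Date
    machine: hex.slice(8, 14),                     // bytes 4-6
    processId: parseInt(hex.slice(14, 18), 16),    // bytes 7-8
    increment: parseInt(hex.slice(18, 24), 16)     // bytes 9-11
  };
}

// Hypothetical example id, used only for illustration.
const parsed = parseObjectId("507f1f77bcf86cd799439011");
console.log(parsed.timestamp.toISOString(), parsed.machine);
```

In the mongo shell the same timestamp is available directly through getTimestamp(), so a helper like this is mainly useful outside the shell.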
Save document with id into MongoDB
s.name = "Bob"
s.age = 11
s['favorite subjects'] = ["music", "math", "art"]
s.grades.chinese = 3.0
s._id = 1

db.students.save(s)
Save document with existing _id
delete s.registered

db.students.save(s)
How to find documents?
● db.xxxx.find()
  ○ lists all documents in the collection
● db.xxxx.find(
    find spec, // what the documents should look like
    find fields, // which parts to return
    ...
  )
● db.xxxx.findOne()
  ○ returns only the first document that matches the find spec
find by id
db.students.find({_id: 1})
db.students.find({_id: ObjectId('xxx....')})
find and filter return fields
db.students.find({_id:   1},   {_id: 1})
db.students.find({_id:   1},   {name: 1})
db.students.find({_id:   1},   {_id: 1, name: 1})
db.students.find({_id:   1},   {_id: 0, name: 1})
find by name - equal or not equal
db.students.find({name: "John"})
db.students.find({name: "Alice"})

db.students.find({name: {$ne: "John"}})
● $ne : not equal
find by name - ignorecase ($regex)
db.students.find({name: "john"})    // no match: string equality is case-sensitive
db.students.find({name: /john/i})   // matches "John": case-insensitive regex

db.students.find({
     name: {
       $regex: "^b",
       $options: "i"
     }
  })
find by range of names - $in, $nin
db.students.find({name: {$in: ["John", "Bob"]}})
db.students.find({name: {$nin: ["John", "Bob"]}})


● $in : value equals any item in the array
● $nin : value equals none of the items in the array
find by age - $gt, $gte, $lt, $lte
db.students.find({age:   {$gt: 12}})
db.students.find({age:   {$gte: 12}})
db.students.find({age:   {$lt: 12}})
db.students.find({age:   {$lte: 12}})

● $gt  : greater than
● $gte : greater than or equal
● $lt  : less than
● $lte : less than or equal
find by field existence - $exists
db.students.find({registered: {$exists: true}})
db.students.find({registered: {$exists: false}})
find by field type - $type
db.students.find({_id: {$type: 7}})
db.students.find({_id: {$type: 1}})
  Number  Type                  Number  Type
  1       Double                11      Regular expression
  2       String                13      JavaScript code
  3       Object                14      Symbol
  4       Array                 15      JavaScript code with scope
  5       Binary Data           16      32 bit integer
  7       Object id             17      Timestamp
  8       Boolean               18      64 bit integer
  9       Date                  255     Min key
  10      Null                  127     Max key
find in multi-level fields
db.students.find({"grades.math": {$gt: 2.0}})
db.students.find({"grades.math": {$gte: 2.0}})
find by remainder - $mod
db.students.find({age: {$mod: [10, 2]}})
db.students.find({age: {$mod: [10, 3]}})
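To make the semantics of the operators seen so far concrete, here is a toy matcher in plain JavaScript. It is a sketch under strong assumptions: it handles scalar values only (no arrays, regexes or BSON type ordering), and the names operators, getPath and matches are hypothetical helpers, not MongoDB APIs.

```javascript
// Toy predicates for a few of the query operators shown so far.
const operators = {
  $ne:  (value, arg) => value !== arg,
  $gt:  (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt:  (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  $in:  (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
  $mod: (value, arg) => value % arg[0] === arg[1]  // [divisor, remainder]
};

// Resolve a dotted path such as "grades.math" inside a document.
function getPath(doc, path) {
  return path.split(".").reduce(
    (obj, key) => (obj == null ? undefined : obj[key]), doc);
}

// True when the document satisfies every condition in the find spec.
function matches(doc, spec) {
  return Object.keys(spec).every((field) => {
    const cond = spec[field];
    const value = getPath(doc, field);
    if (cond === null || typeof cond !== "object") {
      return value === cond;                       // plain equality
    }
    return Object.keys(cond).every((op) => {
      if (!(op in operators)) throw new Error("unsupported operator: " + op);
      return operators[op](value, cond[op]);
    });
  });
}

const john = {name: "John", age: 12, grades: {math: 4.0, english: 5.0}};
console.log(matches(john, {age: {$mod: [10, 2]}}));
console.log(matches(john, {"grades.math": {$gt: 2.0}}));
console.log(matches(john, {name: {$nin: ["John", "Bob"]}}));
```

Several operators on one field, such as {age: {$lt: 12, $gte: 11}}, are combined with a logical AND in this sketch.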
find in array - $size
db.students.find(
  {'favorite subjects': {$size: 2}}
)
db.students.find(
  {'favorite subjects': {$size: 3}}
)
find in array - $all
db.students.find({'favorite subjects': {
      $all: ["music", "math", "art"]
  }})
db.students.find({'favorite subjects': {
      $all: ["english", "math"]
  }})
find in array - find value in array
db.students.find(
  {"favorite subjects": "art"}
)

db.students.find(
  {"favorite subjects": "math"}
)
find with bool operators - $and, $or
db.students.find({$or: [
    {age: {$lt: 12}},
    {age: {$gt: 12}}
]})

db.students.find({$and: [
    {age: {$lt: 12}},
    {age: {$gte: 11}}
]})
find with bool operators - $and, $or
db.students.find({$and: [
    {age: {$lt: 12}},
    {age: {$gte: 11}}
]})

is equivalent to

db.students.find({age: {$lt: 12, $gte: 11}})
find with bool operators - $not
$not can only be applied to another operator expression, not to a plain value
(X = does not work, O = works)

X db.students.find({registered: {$not: false}})
O db.students.find({registered: {$ne: false}})

O db.students.find({age: {$not: {$gte: 12}}})
find with JavaScript- $where
db.students.find({$where: "this.age > 12"})

db.students.find({$where:
   "this.grades.chinese"
})
find cursor functions
● count
  db.students.find().count()
● limit
  db.students.find().limit(1)
● skip
  db.students.find().skip(1)
● sort
  db.students.find().sort({age: -1})
  db.students.find().sort({age: 1})
combine find cursor functions
db.students.find().skip(1).limit(1)
db.students.find().skip(1).sort({age: -1})
db.students.find().skip(1).limit(1).sort({age: -1})
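One detail worth knowing when combining these cursor functions: the order in which sort, skip and limit are chained does not change the result, because the sort is applied first, then the skip, then the limit. The plain JavaScript sketch below imitates that behaviour on an in-memory array; runCursor is a hypothetical helper and the three documents reuse the names and ages from the earlier slides.

```javascript
// In-memory stand-in for a cursor: steps always run as
// sort -> skip -> limit, whatever order they were requested in.
function runCursor(docs, options) {
  let result = docs.slice();                       // do not mutate the input
  if (options.sort) {
    const [field, direction] = Object.entries(options.sort)[0];
    result.sort((a, b) => (a[field] - b[field]) * direction);
  }
  result = result.slice(options.skip || 0);
  if (options.limit) result = result.slice(0, options.limit);
  return result;
}

const students = [
  {name: "John", age: 12},
  {name: "Alice", age: 14},
  {name: "Bob", age: 11}
];

// Stands in for db.students.find().skip(1).limit(1).sort({age: -1})
const page = runCursor(students, {skip: 1, limit: 1, sort: {age: -1}});
console.log(page);
```

The result is the second-oldest student, not "the second inserted document, then sorted".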
more cursor functions
● snapshot
  ensures the cursor
  ○ returns no duplicates
  ○ misses no objects
  ○ returns all matching objects that were present at both
    the beginning and the end of the query
  ○ usually used for export/dump
more cursor functions
● batchSize
  tell MongoDB how many documents should
  be sent to client at once

● explain
  for performance profiling

● hint
  tell MongoDB which index should be used
  for querying/sorting
list currently running operations
● list operations
  db.currentOp()

● cancel an operation (opid comes from the db.currentOp() output)
  db.killOp(opid)
MongoDB index - when to use an index?
● when running complicated find queries
● when sorting lots of data
MongoDB index - sort() example
for (i=0; i<1000000; i++){
    db.many.save({value: i});
}

db.many.find().sort({value: -1})

error: {
    "$err" : "too much data for sort() with no index. add an index or specify a smaller limit",
    "code" : 10128
}
MongoDB index - how to build index
db.many.ensureIndex({value: 1})

● Index options
  ○ background
  ○ unique
  ○ dropDups
  ○ sparse
MongoDB index - index commands
● list index
  db.many.getIndexes()

● drop index
  db.many.dropIndex({value: 1})
  db.many.dropIndexes() <-- DANGER!
MongoDB Index - find() example
db.many.dropIndex({value: 1})
db.many.find({value: 5555}).explain()

db.many.ensureIndex({value: 1})
db.many.find({value: 5555}).explain()
MongoDB Index - Compound Index
db.xxx.ensureIndex({a:1, b:-1, c:1})

query/sort with fields
   ● a
   ● a, b
   ● a, b, c
will be accelerated by this index
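The prefix rule above can be expressed as a small check in plain JavaScript. This is a simplified sketch: isIndexPrefix is a hypothetical helper, it ignores sort direction, and it treats a query as covered only when its fields are exactly a leading prefix of the index (a real query planner can be more flexible, for example by using the index for a subset of the conditions).

```javascript
// Returns true when the queried fields are exactly a leading prefix
// of the compound index, e.g. index (a, b, c) covers a / a,b / a,b,c.
function isIndexPrefix(indexFields, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexFields.length) {
    return false;
  }
  const prefix = indexFields.slice(0, queryFields.length);
  return prefix.every((field) => queryFields.includes(field));
}

const index = ["a", "b", "c"];                     // from {a:1, b:-1, c:1}
console.log(isIndexPrefix(index, ["a"]));          // leading field only
console.log(isIndexPrefix(index, ["b", "a"]));     // order in the query is irrelevant
console.log(isIndexPrefix(index, ["b", "c"]));     // skips the leading field
```

A query on b alone, or on b and c, does not start with the leading field a, which is why the slide lists only a, (a, b) and (a, b, c).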
Remove/Drop data from MongoDB
● Remove
  db.many.remove({value: 5555})
  db.many.find({value: 5555})
  db.many.remove()
● Drop
  db.many.drop()
● Drop database
  db.dropDatabase()  <-- EXTREMELY DANGEROUS!
How to update data in MongoDB
Easiest way:

s = db.students.findOne({_id: 1})
s.registered = true
db.students.save(s)
In place update - update()
update( {find spec},
        {update spec},
        upsert=false)

db.students.update(
  {_id: 1},
  {$set: {registered: false}}
)
Update a non-existent document (upsert)
db.students.update(
  {_id: 2},
  {name: 'Mary', age: 9},
  true
)
db.students.update(
  {_id: 2},
  {$set: {name: 'Mary', age: 9}},
  true
)
set / unset field value
db.students.update({_id: 1},
  {$set: {"age": 15}})

db.students.update({_id: 1},
  {$set: {registered:
      {2012: false, 2011:true}
  }})
db.students.update({_id: 1},
  {$unset: {registered: 1}})
increase/decrease value
db.students.update({_id: 1}, {
   $inc: {
      "grades.math": 1.1,
      "grades.english": -1.5,
      "grades.history": 3.0
   }
})
push value(s) into array
db.students.update({_id: 1},{
   $push: {tags: "lazy"}
})

db.students.update({_id: 1},{
   $pushAll: {tags: ["smart", "cute"]}
})
add a value to an array only if it does not exist yet
db.students.update({_id: 1},{
   $push: {tags: "lazy"}
})
db.students.update({_id: 1},{
   $addToSet:{tags: "lazy"}
})
db.students.update({_id: 1},{
   $addToSet:{tags: {$each: ["tall", "thin"]}}
})
remove value from array
db.students.update({_id: 1},{
   $pull: {tags: "lazy"}
})
db.students.update({_id: 1},{
   $pull: {tags: {$ne: "smart"}}
})
db.students.update({_id: 1},{
   $pullAll: {tags: ["lazy", "smart"]}
})
pop value from array
a = []; for(i=0;i<20;i++){a.push(i);}
db.test.save({_id:1, value: a})

db.test.update({_id: 1}, {
   $pop: {value: 1}
})
db.test.update({_id: 1}, {
   $pop: {value: -1}
})
rename field
db.test.update({_id: 1}, {
   $rename: {value: "values"}
})
Practice: add comments to student
Add a field to the student document ({_id: 1}):
● field name: comments
● field type: array of dictionaries (objects)
● field content:
  ○ {
      by: author name, string
      text: content of comment, string
    }
● add at least 3 comments to this field
Example answer to practice
db.students.update({_id: 1}, {
$addToSet: { comments: {$each: [
    {by: "teacher01", text: "text 01"},
    {by: "teacher02", text: "text 02"},
    {by: "teacher03", text: "text 03"},
]}}
})
The $ positional operator (for arrays)
db.students.update({
      _id: 1,
      "comments.by": "teacher02"
   }, {
      $inc: {"comments.$.vote": 1}
})
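What the update above does can be imitated on a plain JavaScript object: the "$" stands for the index of the first array element that matched the query, and $inc creates the field when it is missing. This is only an in-memory sketch; incFirstMatchingVote is a hypothetical helper and the comments reuse the practice answer.

```javascript
// Mimics {$inc: {"comments.$.vote": 1}} with the query
// {"comments.by": author}: only the FIRST matching element changes.
function incFirstMatchingVote(student, author) {
  const index = student.comments.findIndex((c) => c.by === author);
  if (index === -1) return false;                  // query matched nothing
  const comment = student.comments[index];         // what "$" points at
  comment.vote = (comment.vote || 0) + 1;          // $inc creates a missing field
  return true;
}

const student = {
  _id: 1,
  comments: [
    {by: "teacher01", text: "text 01"},
    {by: "teacher02", text: "text 02"},
    {by: "teacher03", text: "text 03"}
  ]
};

incFirstMatchingVote(student, "teacher02");
console.log(student.comments[1]);
```

The other two comments are left untouched, which is the point of using the positional operator instead of rewriting the whole array.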
Atomically update - findAndModify
● Atomically updates a SINGLE DOCUMENT and
  returns it
● By default, the returned document does not
  contain the modifications made by the
  findAndModify command.
findAndModify parameters
db.xxx.findAndModify({
  query: {...},  // filter to query
  sort: {...},   // how to sort; the 1st document in the query results is selected
  remove: false, // set true if you want to remove it
  update: {...}, // update content
  new: false,    // set true if you want to get the modified object
  fields: {...}, // which fields to fetch
  upsert: false  // set true to create the object if it does not exist
})
GridFS
● MongoDB has a 16MB document size limit
● GridFS is for storing larger binary objects in MongoDB
● GridFS is a specification, not an implementation
● The implementation is done by the MongoDB drivers
● Currently supported drivers:
  ○ PHP
  ○ Java
  ○ Python
  ○ Ruby
  ○ Perl
GridFS - command line tools
● List
  mongofiles list
● Put
  mongofiles put xxx.txt
● Get
  mongofiles get xxx.txt
MongoDB config - basic
● dbpath
  ○ Folder in which MongoDB stores its database files
  ○ MongoDB must have write permission to this folder
● logpath, logappend
  ○ logpath = log filename
  ○ MongoDB must have write permission to the log file
● bind_ip
  ○ IP(s) MongoDB will bind to; by default all
  ○ Use commas to separate more than 1 IP
● port
  ○ Port number MongoDB will use
  ○ Default port = 27017
Small tip - rotate MongoDB log
db.getMongo().getDB("admin").runCommand("logRotate")
MongoDB config - journal
● journal
  ○ Turns journaling on/off
  ○ Usually you should keep this on
MongoDB config - http interface
● nohttpinterface
  ○ Disables the HTTP interface
  ○ When enabled, it listens on http://localhost:28017 by default
  ○ Shows statistics through the HTTP interface
● rest
  ○ Only takes effect while the HTTP interface is enabled
  ○ Example:
    http://localhost:28017/test/students/
    http://localhost:28017/test/students/?filter_name=John
MongoDB config - authentication
● auth
  ○ By default, MongoDB runs with no authentication
  ○ If no admin account has been created, you can log in
    without authentication through a local mongo shell and
    start managing user accounts.
MongoDB account management
● Add admin user
  > mongo localhost/admin
  db.addUser("testadmin", "1234")
● Authenticate as admin user
  use admin
  db.auth("testadmin", "1234")
MongoDB account management
● Add user to test database
  use test
  db.addUser("testrw", "1234")
● Add read only user to test database
  db.addUser("testro", "1234", true)
● List users
  db.system.users.find()
● Remove user
  db.removeUser("testro")
MongoDB config - authentication
● keyFile
  ○ At least 6 characters and smaller than 1KB in size
  ○ Used only for replica/sharding servers
  ○ Every replica/sharding server should use the same
    key file for communication
  ○ On Unix-like systems, the key file must grant no
    permissions to group/everyone, or MongoDB will
    refuse to start
MongoDB configuration - Replica Set
● replSet
  ○ Sets the replica set name
  ○ All MongoDB instances in the same replica set should
    use the same name
  ○ Limitations
    ■ Maximum 12 nodes in a single replica set
    ■ Maximum 7 nodes can vote
  ○ A MongoDB replica set is eventually consistent
    (secondaries may lag behind the primary)
How does a MongoDB replica set work?
● Each replica set has a single primary
  (master) node and multiple secondary (slave) nodes
● Data is only written to the primary node
  and is then synced to the other nodes.
● Use getLastError() to confirm that the previous
  write operation has been committed to the whole
  replica set; otherwise the write operation
  may be rolled back if the primary node goes down
  before the sync.
How does a MongoDB replica set work?
● Once the primary node is down, the
  replica set cannot accept writes until
  the other nodes vote and elect a new
  primary node.
● During failover, any write operation not
  committed to the whole replica set will be
  rolled back
Simple replica set configuration
mkdir -p /tmp/db01
mkdir -p /tmp/db02
mkdir -p /tmp/db03

mongod --replSet test --port 29001 --dbpath /tmp/db01
mongod --replSet test --port 29002 --dbpath /tmp/db02
mongod --replSet test --port 29003 --dbpath /tmp/db03
Simple replica set configuration
mongo localhost:29001
Another way to configure a replica set
rs.initiate()
rs.add("localhost:29001")
rs.add("localhost:29002")
rs.add("localhost:29003")
Extra options for setting replica set
● arbiterOnly
  ○ Arbiter nodes don't receive data and can't become
    primary, but they can vote.
● priority
  ○ A node with priority 0 will never be elected as
    primary node.
  ○ Higher priority nodes will be preferred as primary
  ○ To force a node to become primary, do not change
    its votes; raise its priority value and reconfig
    the replica set.
● buildIndexes
  ○ Can only be set to false on nodes with priority 0
  ○ Use false for backup-only nodes
Extra options for setting replica set
● hidden
  ○ Nodes marked as hidden are not
    exposed to MongoDB clients.
  ○ Nodes marked as hidden do not receive
    queries.
  ○ Only use this option for nodes dedicated to
    reporting, integration, backup, etc.
● slaveDelay
  ○ Number of seconds this node stays behind
    the primary node
  ○ Can only be set on nodes with priority 0
  ○ Used to guard against human errors (the delayed
    copy still holds the data from before the mistake)
Extra options for setting replica set
● votes
  If set to 1, this node can vote; if set to 0, it cannot.
Change primary node at runtime
config = rs.conf()
config.members[1].priority = 2
rs.reconfig(config)
What is sharding?
[Diagram: a Name/Value table sorted by name (Alice, Amy, Bob, ... Yoko, Zeus) is split into three shards by name range: A to F, G to N, O to Z]
MongoDB sharding architecture
Elements of a MongoDB sharding cluster
● Config Server
  Stores the sharding cluster metadata
● mongos Router
  Routes database operations to the correct
  shard server
● Shard Server
  Holds the real user data
Sharding config - config server
● A config server is a mongod instance run
  with the --configsvr option
● Config servers are kept in sync automatically by
  the mongos processes, so DO NOT run them with the
  --replSet option
● The synchronous replication protocol is
  optimized for three machines.
Sharding config - mongos Router
● Use mongos (not mongod) to start a
  mongos router
● mongos routes database operations to the
  correct shard servers
● Example command for starting mongos
  mongos --configdb db01,db02,db03
● With the --chunkSize option, you can specify
  a smaller sharding chunk size if you're just
  testing.
Sharding config - shard server
● A shard server is a mongod instance run
  with the --shardsvr option
● Shard servers don't need to know where the
  config servers / mongos routers are
Example script for building a MongoDB shard cluster
mkdir -p /tmp/s00
mkdir -p /tmp/s01
mkdir -p /tmp/s02
mkdir -p /tmp/s03

mongod --configsvr --port 29000 --dbpath /tmp/s00
mongos --configdb localhost:29000 --chunkSize 1 --port 28000
mongod --shardsvr --port 29001 --dbpath /tmp/s01
mongod --shardsvr --port 29002 --dbpath /tmp/s02
mongod --shardsvr --port 29003 --dbpath /tmp/s03
Sharding config - add shard server
mongo localhost:28000/admin

db.runCommand({addshard: "localhost:29001"})
db.runCommand({addshard: "localhost:29002"})
db.runCommand({addshard: "localhost:29003"})


db.printShardingStatus()
db.runCommand( { enablesharding : "test" } )
db.runCommand( {shardcollection: "test.shardtest",
key: {_id: 1}, unique: true})
Let us insert some documents
use test

for (i=0; i<1000000; i++) {
   db.shardtest.insert({value: i});
}
Remove 1 shard & see what happens
use admin
db.runCommand({removeshard: "shard0002"})

Let's add it back
db.runCommand({addshard: "localhost:29003"})
Pick your sharding key wisely
● The sharding key cannot be changed after
  sharding is enabled
● To update any document in a sharded
  cluster, the sharding key MUST BE INCLUDED in
  the find spec
EX:
  sharding key = {name: 1, class: 1}
  db.xxx.update({name: "xxxx", class: "ooo"},{
  ..... update spec
  })
Pick your sharding key wisely
● Sharding key will strongly affect your data
  distribution model
EX:
  sharding by ObjectId
  shard001 => data saved 2 months ago
  shard002 => data saved 1 month ago
  shard003 => data saved recently
Other sharding key examples
EX:
  sharding by Username
  shard001 => Username starts with a to k
  shard002 => Username starts with l to r
  shard003 => Username starts with s to z
EX:
  sharding by md5
  completely random distribution
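The "sharding by Username" example can be sketched as a lookup over a range table in plain JavaScript. This is a deliberate simplification: real clusters split data into chunks over full key values and move them between shards, while this sketch only looks at the first letter. The names ranges and routeByUsername are hypothetical.

```javascript
// Range table copied from the "sharding by Username" example.
const ranges = [
  {shard: "shard001", from: "a", to: "k"},
  {shard: "shard002", from: "l", to: "r"},
  {shard: "shard003", from: "s", to: "z"}
];

// Pick the shard whose range contains the first letter of the username.
function routeByUsername(username) {
  const first = username.charAt(0).toLowerCase();
  const hit = ranges.find((r) => first >= r.from && first <= r.to);
  if (!hit) throw new Error("no shard covers: " + username);
  return hit.shard;
}

console.log(routeByUsername("Alice"), routeByUsername("Mary"), routeByUsername("Zeus"));
```

With a key like this, a burst of new users whose names start with the same letter all lands on one shard, which is the distribution problem the slides warn about.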
What is MapReduce?
● Map, then Reduce
● Map calls a function on every document; the
  function emits keys & values that are sent to
  the reduce function
● Reduce calls a function that combines all the
  values emitted for one key by the map function
  into a single reduced result.
● Example: map students' grades and reduce
  them into the total of the students' grades.
How to call mapreduce in MongoDB
db.xxx.mapReduce(
   map function,
   reduce function,{
   out: output option,
   query: query filter, optional,
   sort: sort filter, optional,
   finalize: finalize function,
   .... etc
})
Let's generate some data
for (i=0; i<10000; i++){
   db.grades.insert({
       grades: {
          math: Math.random() * 100 % 100,
          art: Math.random() * 100 % 100,
          music: Math.random() * 100 % 100
       }
   });
}
Prepare Map function
function map(){
   for (var k in this.grades){
       emit(k, {total: 1,
       // the condition goes before "?": 1 when it holds, else 0
       pass: this.grades[k] >= 60.0 ? 1 : 0,
       fail: this.grades[k] < 60.0 ? 1 : 0,
       sum: this.grades[k],
       avg: 0
       });
   }
}
Prepare reduce function
function reduce(key, values){
   var result = {total: 0, pass: 0, fail: 0, sum: 0, avg: 0};
   values.forEach(function(value){
       result.total += value.total;
       result.pass += value.pass;
       result.fail += value.fail;
       result.sum += value.sum;
   });
   return result;
}
Execute your 1st mapreduce call
 db.grades.mapReduce(
   map,
   reduce,
   {out:{inline: 1}}
)
Add finalize function
function finalize(key, value){
   value.avg = value.sum / value.total;
   return value;
}
Run mapreduce again with finalize
 db.grades.mapReduce(
   map,
   reduce,
   {out:{inline: 1}, finalize: finalize}
)
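To see how map, reduce and finalize fit together without a server, the flow can be imitated in memory with plain JavaScript. This is a sketch under assumptions: runMapReduce is a hypothetical helper, emit is passed to map as an argument here (in MongoDB it is a global inside the map function), and the three sample documents are made up for illustration.

```javascript
// In-memory imitation of the map -> reduce -> finalize flow.
function runMapReduce(docs, map, reduce, finalize) {
  const emitted = new Map();                       // key -> list of values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));      // "this" is the document
  const results = {};
  emitted.forEach((values, key) => {
    // reduce is skipped for keys that received a single value
    const reduced = values.length > 1 ? reduce(key, values) : values[0];
    results[key] = finalize ? finalize(key, reduced) : reduced;
  });
  return results;
}

function map(emit) {
  for (const k in this.grades) {
    const g = this.grades[k];
    emit(k, {total: 1, pass: g >= 60 ? 1 : 0, fail: g < 60 ? 1 : 0, sum: g, avg: 0});
  }
}

function reduce(key, values) {
  const result = {total: 0, pass: 0, fail: 0, sum: 0, avg: 0};
  values.forEach((v) => {
    result.total += v.total;
    result.pass += v.pass;
    result.fail += v.fail;
    result.sum += v.sum;
  });
  return result;
}

function finalize(key, value) {
  value.avg = value.sum / value.total;
  return value;
}

// Hypothetical sample data with round numbers.
const grades = [
  {grades: {math: 80, art: 50}},
  {grades: {math: 40, art: 70}},
  {grades: {math: 90, art: 30}}
];
const summary = runMapReduce(grades, map, reduce, finalize);
console.log(summary);
```

Note that reduce returns a value with the same shape as the values emitted by map; that is what allows reduce to be applied repeatedly to partial results, and why the average is computed in finalize rather than in reduce.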
Mapreduce output options
● {replace: <result collection name>}
  Replaces the result collection if it already exists.
● {merge: <result collection name>}
  Merges new results into the existing collection,
  overwriting documents that have the same key.
● {reduce: <result collection name>}
  Runs reduce if the same key exists in both the
  old and the current result collections. Will run
  the finalize function, if any.
● {inline: 1}
  Returns the result in memory (no output collection)
Other mapreduce output options
● db - put the result collection in a different
  database
● sharded - the output collection will be sharded
  using key = _id
● nonAtomic - partial reduce results will be
  visible while processing.
MongoDB backup & restore
● mongodump
  mongodump -h localhost:27017
● mongorestore
  mongorestore -h localhost:27017 --drop
● mongoexport
  mongoexport -d test -c students -h localhost:27017 > students.json
● mongoimport
  mongoimport -d test -c students -h localhost:27017 < students.json
Conclusion - Pros of MongoDB
●   Agile (Schemaless)
●   Easy to use
●   Built in replica & sharding
●   Mapreduce with sharding
Conclusion - Cons of MongoDB
● Schemaless = everyone needs to know what
  the data looks like
● Wastes space on keys (key names are stored in every document)
● Eats lots of memory
● Mapreduce is hard to handle
Cautions of MongoDB
● Global write lock
  ○ Add more RAM
  ○ Use a newer version (MongoDB 2.2 replaces the global
    write lock with a DB-level write lock)
  ○ Split your database properly
● Removing documents won't free disk space
  ○ You need to run the compact command periodically
● Don't let your MongoDB data disk fill up
  ○ Once the disk used by MongoDB is full, you
    won't be able to move/delete documents on it.

More Related Content

What's hot

The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation FrameworkMongoDB
Β 
Aggregation Framework
Aggregation FrameworkAggregation Framework
Aggregation FrameworkMongoDB
Β 
Dev Jumpstart: Schema Design Best Practices
Dev Jumpstart: Schema Design Best PracticesDev Jumpstart: Schema Design Best Practices
Dev Jumpstart: Schema Design Best PracticesMongoDB
Β 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDBrogerbodamer
Β 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation FrameworkCaserta
Β 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBantoinegirbal
Β 
Apache Solr lessons learned
Apache Solr lessons learnedApache Solr lessons learned
Apache Solr lessons learnedJeroen Rosenberg
Β 
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...MongoDB
Β 
Introduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUWIntroduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUWAnkur Raina
Β 
MySQL Without The SQL -- Oh My! PHP Detroit July 2018
MySQL Without The SQL -- Oh My! PHP Detroit July 2018MySQL Without The SQL -- Oh My! PHP Detroit July 2018
MySQL Without The SQL -- Oh My! PHP Detroit July 2018Dave Stokes
Β 
Mongo db queries
Mongo db queriesMongo db queries
Mongo db queriesssuser6d5faa
Β 
2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDBantoinegirbal
Β 
Schema Design (Mongo Austin)
Schema Design (Mongo Austin)Schema Design (Mongo Austin)
Schema Design (Mongo Austin)MongoDB
Β 
Data Modeling for the Real World
Data Modeling for the Real WorldData Modeling for the Real World
Data Modeling for the Real WorldMike Friedman
Β 
Working with the Web: 
Decoding JSON
Working with the Web: 
Decoding JSONWorking with the Web: 
Decoding JSON
Working with the Web: 
Decoding JSONSV.CO
Β 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsMongoDB
Β 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDBMongoDB
Β 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
Β 

What's hot (19)

The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
Β 
Aggregation Framework
Aggregation FrameworkAggregation Framework
Aggregation Framework
Β 
Dev Jumpstart: Schema Design Best Practices
Dev Jumpstart: Schema Design Best PracticesDev Jumpstart: Schema Design Best Practices
Dev Jumpstart: Schema Design Best Practices
Β 
Schema Design with MongoDB
Schema Design with MongoDBSchema Design with MongoDB
Schema Design with MongoDB
Β 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
Β 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Β 
Apache Solr lessons learned
Apache Solr lessons learnedApache Solr lessons learned
Apache Solr lessons learned
Β 
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen...
Β 
Introduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUWIntroduction to MongoDB at IGDTUW
Introduction to MongoDB at IGDTUW
Β 
MySQL Without The SQL -- Oh My! PHP Detroit July 2018
MySQL Without The SQL -- Oh My! PHP Detroit July 2018MySQL Without The SQL -- Oh My! PHP Detroit July 2018
MySQL Without The SQL -- Oh My! PHP Detroit July 2018
Β 
Mongo db queries
Mongo db queriesMongo db queries
Mongo db queries
Β 
2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB2011 Mongo FR - Indexing in MongoDB
2011 Mongo FR - Indexing in MongoDB
Β 
Schema Design (Mongo Austin)
Schema Design (Mongo Austin)Schema Design (Mongo Austin)
Schema Design (Mongo Austin)
Β 
Data Modeling for the Real World
Data Modeling for the Real WorldData Modeling for the Real World
Data Modeling for the Real World
Β 
Mongo db
Mongo dbMongo db
Mongo db
Β 
Working with the Web: 
Decoding JSON
Working with the Web: 
Decoding JSONWorking with the Web: 
Decoding JSON
Working with the Web: 
Decoding JSON
Β 
Webinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev TeamsWebinar: General Technical Overview of MongoDB for Dev Teams
Webinar: General Technical Overview of MongoDB for Dev Teams
Β 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDB
Β 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
Β 

Mongo db

  • 1. MongoDB http://tinyurl.com/97o49y3 by toki
  • 2. About me
    ● Delta Electronic CTBD Senior Engineer
    ● Main developer of http://loltw.net
      ○ Website built on MongoDB, serving 600k page views daily
      ○ Data grows every day, fed by automated crawler bots
  • 3. MongoDB - Simple Introduction
    ● Document-based NOSQL (Not Only SQL) database
    ● Started in 2007 by the company 10gen
    ● Written in C++
    ● Fast (but uses a lot of memory)
    ● Stores JSON documents in BSON format
    ● Full index support on any document attribute
    ● Horizontal scalability with auto-sharding
    ● High availability, replication ready
  • 4. What is a database?
    ● Raw data
      ○ John is a student, he's 12 years old.
    ● Data
      ○ Student
        ■ name = "John"
        ■ age = 12
    ● Records
      ○ Student(name="John", age=12)
      ○ Student(name="Alice", age=11)
    ● Database
      ○ Student Table
      ○ Grades Table
  • 5. Example of (relational) database
    (Diagram: four related tables)
    ● Student: Student ID, Name, Age, Class ID
    ● Student Grade: Grade ID, StudentID, Grade
    ● Grade: Grade ID, Name
    ● Class: Class ID, Name
  • 6. SQL Language - How to find data?
    ● Find the student whose name is John
      ○ select * from student where name="John"
    ● Find the class name of John
      ○ select s.name, c.name as class_name from student s, class c where s.name="John" and s.class_id=c.class_id
  • 7. Why NOSQL?
    ● Big data
      ○ Modern data sets are too big for a single DB server
      ○ Google search engine
    ● Connectivity
      ○ Facebook like button
    ● Semi-structured data
      ○ Car equipment database
    ● High availability
      ○ The basis of cloud services
  • 8. Common NOSQL DB characteristics
    ● Schemaless
    ● No joins; stores pre-joined/embedded data
    ● Horizontal scalability
    ● Replica ready - high availability
  • 9. Common types of NOSQL DB
    ● Key-Value
      ○ Based on Amazon's Dynamo paper
      ○ Stores K-V pairs
      ○ Example:
        ■ Dynomite
        ■ Voldemort
  • 10. Common types of NOSQL DB
    ● Bigtable clones
      ○ Based on Google's Bigtable paper
      ○ Column oriented, but handles semi-structured data
      ○ Data keyed by: row, column, time, index
      ○ Example:
        ■ Google Bigtable
        ■ HBase
        ■ Cassandra (FB)
  • 11. Common types of NOSQL DB
    ● Document based
      ○ Stores multi-level K-V pairs
      ○ Usually uses JSON as the document format
      ○ Example:
        ■ MongoDB
        ■ CouchDB (Apache)
        ■ Redis (usually classified as a key-value store)
  • 12. Common types of NOSQL DB
    ● Graph
      ○ Focuses on modeling the structure of data - interconnectivity
      ○ Example:
        ■ Neo4j
        ■ AllegroGraph
  • 13. Start using MongoDB - Installation
    ● From apt-get (Debian / Ubuntu only)
      ○ sudo apt-get install mongodb
    ● Using the 10gen MongoDB repository
      ○ http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian-or-ubuntu-linux/
    ● From a pre-built binary or source
      ○ http://www.mongodb.org/downloads
    ● Note: 32-bit builds are limited to around 2GB of data
  • 14. Manually start your MongoDB
    mkdir -p /tmp/mongo
    mongod --dbpath /tmp/mongo
    or
    mongod -f mongodb.conf
  • 15. Verify your MongoDB installation
    $ mongo
    MongoDB shell version: 2.2.0
    connecting to: test
    >_
    --------------------------------------------------------
    mongo localhost/test2
    mongo 127.0.0.1/test
  • 16. How many databases do you have?
    show dbs
  • 17. Elements of MongoDB
    ● Database
      ○ Collection
        ■ Document
  • 18. What is JSON
    ● JavaScript Object Notation
    ● Elements of JSON
      ○ Object: K/V pairs
      ○ Key: string
      ○ Value, could be
        ■ string
        ■ bool
        ■ number
        ■ array
        ■ object
        ■ null
    ● Example
      {
        "key1": "value1",
        "key2": 2.0,
        "key3": [1, "str", 3.0],
        "key4": false,
        "key5": {
          "name": "another object"
        }
      }
  • 19. Another sample of JSON
    {
      "name": "John",
      "age": 12,
      "grades": {
        "math": 4.0,
        "english": 5.0
      },
      "registered": true,
      "favorite subjects": ["math", "english"]
    }
  • 20. Insert document into MongoDB
    s = {
      "name": "John",
      "age": 12,
      "grades": {
        "math": 4.0,
        "english": 5.0
      },
      "registered": true,
      "favorite subjects": ["math", "english"]
    }
    db.students.insert(s);
  • 21. Verify inserted document
    db.students.find()
    also try
    db.student.insert(s)
    show collections
  • 22. Save document into MongoDB
    s.name = "Alice"
    s.age = 14
    s.grades.math = 2.0
    db.students.save(s)
  • 23. What is _id / ObjectId?
    ● _id is the default primary key used to index documents; it could be any JSON-acceptable value.
    ● By default, MongoDB auto-generates an ObjectId as _id
    ● An ObjectId is a 12-byte value that serves as a unique document _id
    ● Use ObjectId().getTimestamp() to extract the timestamp stored in an ObjectId
    ● Byte layout of an ObjectId:
      bytes 0-3:  unix timestamp
      bytes 4-6:  machine
      bytes 7-8:  process id
      bytes 9-11: increment
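The byte layout on slide 23 means the creation time can be read straight from the first 8 hex characters of an ObjectId string. The following is a minimal plain-JavaScript sketch of that idea; `objectIdToDate` is a hypothetical helper (not a mongo shell function), and the sample id is made up for illustration.

```javascript
// Hypothetical helper, not part of the mongo shell.
// Reads the leading 4 bytes (8 hex characters) of an ObjectId string
// as a big-endian count of seconds since the Unix epoch.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000); // Date expects milliseconds
}

// Made-up id whose timestamp bytes are 0x50000000.
const created = objectIdToDate("500000000000000000000000");
console.log(created.toISOString());
```

In the shell itself, `ObjectId().getTimestamp()` from the slide does the same job; the sketch only shows where that value comes from.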
  • 24. Save document with id into MongoDB
    s.name = "Bob"
    s.age = 11
    s['favorite subjects'] = ["music", "math", "art"]
    s.grades.chinese = 3.0
    s._id = 1
    db.students.save(s)
  • 25. Save document with existing _id
    delete s.registered
    db.students.save(s)
  • 26. How to find documents?
    ● db.xxxx.find()
      ○ lists all documents in the collection
    ● db.xxxx.find(
        find spec,   // what the document looks like
        find fields, // which parts I want to see
        ...
      )
    ● db.xxxx.findOne()
      ○ returns only the first document matching the find spec
  • 27. find by id
    db.students.find({_id: 1})
    db.students.find({_id: ObjectId('xxx....')})
  • 28. find and filter return fields
    db.students.find({_id: 1}, {_id: 1})
    db.students.find({_id: 1}, {name: 1})
    db.students.find({_id: 1}, {_id: 1, name: 1})
    db.students.find({_id: 1}, {_id: 0, name: 1})
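The four calls on slide 28 differ only in the field filter. As a rough plain-JavaScript emulation (an assumption-laden sketch, not MongoDB's code): listed fields set to 1 are returned, and `_id` comes along unless it is explicitly set to 0. `project` is a hypothetical name, and only inclusion-style filters are handled.

```javascript
// Hypothetical emulation of an inclusion-style field filter.
// Fields marked 1 are copied; _id is kept unless marked 0.
function project(doc, fields) {
  const out = {};
  if (fields._id !== 0 && "_id" in doc) {
    out._id = doc._id;
  }
  for (const key of Object.keys(fields)) {
    if (key !== "_id" && fields[key] === 1 && key in doc) {
      out[key] = doc[key];
    }
  }
  return out;
}

const bob = { _id: 1, name: "Bob", age: 11 };
console.log(project(bob, { name: 1 }));         // _id and name
console.log(project(bob, { _id: 0, name: 1 })); // name only
```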
  • 29. find by name - equal or not equal
    db.students.find({name: "John"})
    db.students.find({name: "Alice"})
    db.students.find({name: {$ne: "John"}})
    ● $ne : not equal
  • 30. find by name - ignore case ($regex)
    db.students.find({name: "john"})  => X (no match)
    db.students.find({name: /john/i}) => O (match)
    db.students.find({ name: { $regex: "^b", $options: "i" } })
  • 31. find by range of names - $in, $nin
    db.students.find({name: {$in: ["John", "Bob"]}})
    db.students.find({name: {$nin: ["John", "Bob"]}})
    ● $in : in range (array of items)
    ● $nin : not in range
  • 32. find by age - $gt, $gte, $lt, $lte
    db.students.find({age: {$gt: 12}})
    db.students.find({age: {$gte: 12}})
    db.students.find({age: {$lt: 12}})
    db.students.find({age: {$lte: 12}})
    ● $gt : greater than
    ● $gte : greater than or equal
    ● $lt : less than
    ● $lte : less than or equal
  • 33. find by field existence - $exists
    db.students.find({registered: {$exists: true}})
    db.students.find({registered: {$exists: false}})
  • 34. find by field type - $type
    db.students.find({_id: {$type: 7}})
    db.students.find({_id: {$type: 1}})
    Type codes:
      1   Double
      2   String
      3   Object
      4   Array
      5   Binary data
      7   Object id
      8   Boolean
      9   Date
      10  Null
      11  Regular expression
      13  JavaScript code
      14  Symbol
      15  JavaScript code with scope
      16  32-bit integer
      17  Timestamp
      18  64-bit integer
      255 Min key
      127 Max key
  • 35. find in multi-level fields
    db.students.find({"grades.math": {$gt: 2.0}})
    db.students.find({"grades.math": {$gte: 2.0}})
  • 36. find by remainder - $mod
    db.students.find({age: {$mod: [10, 2]}})
    db.students.find({age: {$mod: [10, 3]}})
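The `$mod` argument on slide 36 is a `[divisor, remainder]` pair: a document matches when the field value divided by the divisor leaves that remainder. A small plain-JavaScript sketch of the same test, using a hypothetical `matchesMod` helper and made-up ages:

```javascript
// Hypothetical helper mirroring {age: {$mod: [divisor, remainder]}}.
function matchesMod(value, [divisor, remainder]) {
  return value % divisor === remainder;
}

const ages = [9, 11, 12, 14, 22];
// Same condition as {age: {$mod: [10, 2]}}: keeps ages 12 and 22.
const matched = ages.filter((age) => matchesMod(age, [10, 2]));
console.log(matched);
```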
  • 37. find in array - $size
    db.students.find( {'favorite subjects': {$size: 2}} )
    db.students.find( {'favorite subjects': {$size: 3}} )
  • 38. find in array - $all
    db.students.find({'favorite subjects': { $all: ["music", "math", "art"] }})
    db.students.find({'favorite subjects': { $all: ["english", "math"] }})
  • 39. find in array - find value in array
    db.students.find( {"favorite subjects": "art"} )
    db.students.find( {"favorite subjects": "math"} )
  • 40. find with bool operators - $and, $or
    db.students.find({$or: [ {age: {$lt: 12}}, {age: {$gt: 12}} ]})
    db.students.find({$and: [ {age: {$lt: 12}}, {age: {$gte: 11}} ]})
  • 41. find with bool operators - $and, $or
    db.students.find({$and: [ {age: {$lt: 12}}, {age: {$gte: 11}} ]})
    is equivalent to
    db.students.find({age: {$lt: 12, $gte: 11}})
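To see why the two queries on slide 41 are equivalent, here is a minimal plain-JavaScript emulation of a find spec. It is only a sketch under simplifying assumptions: `matches`, `matchField` and `operators` are hypothetical names, only top-level fields and the handful of operators shown on the earlier slides are supported, and it is not how MongoDB evaluates queries internally.

```javascript
// Hypothetical mini-matcher for a subset of the find operators above.
const operators = {
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $ne: (v, arg) => v !== arg,
  $in: (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
};

// A condition is either a plain value (equality) or an object of operators,
// all of which must hold for the field (an implicit AND).
function matchField(value, condition) {
  const isOperatorObject =
    condition !== null && typeof condition === "object" && !Array.isArray(condition);
  if (!isOperatorObject) return value === condition;
  return Object.keys(condition).every((op) => {
    if (!(op in operators)) throw new Error("unsupported operator: " + op);
    return operators[op](value, condition[op]);
  });
}

function matches(doc, spec) {
  return Object.keys(spec).every((key) => {
    if (key === "$and") return spec.$and.every((sub) => matches(doc, sub));
    if (key === "$or") return spec.$or.some((sub) => matches(doc, sub));
    return matchField(doc[key], spec[key]);
  });
}

const students = [
  { name: "John", age: 12 },
  { name: "Alice", age: 14 },
  { name: "Bob", age: 11 },
];
const explicitAnd = students.filter((s) =>
  matches(s, { $and: [{ age: { $lt: 12 } }, { age: { $gte: 11 } }] }));
const implicitAnd = students.filter((s) =>
  matches(s, { age: { $lt: 12, $gte: 11 } }));
console.log(explicitAnd.map((s) => s.name), implicitAnd.map((s) => s.name));
```

Both filters select the same student, because several operators on one field are already combined with AND.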
  • 42. find with bool operators - $not
    $not can only be used together with another find filter
    X  db.students.find({registered: {$not: false}})
    O  db.students.find({registered: {$ne: false}})
    O  db.students.find({age: {$not: {$gte: 12}}})
  • 43. find with JavaScript - $where
    db.students.find({$where: "this.age > 12"})
    db.students.find({$where: "this.grades.chinese"})
  • 44. find cursor functions
    ● count
      db.students.find().count()
    ● limit
      db.students.find().limit(1)
    ● skip
      db.students.find().skip(1)
    ● sort
      db.students.find().sort({age: -1})
      db.students.find().sort({age: 1})
  • 45. combine find cursor functions
    db.students.find().skip(1).limit(1)
    db.students.find().skip(1).sort({age: -1})
    db.students.find().skip(1).limit(1).sort({age: -1})
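When the cursor functions on slide 45 are chained, MongoDB applies the sort first, then skip, then limit, whatever order they were written in. A plain-JavaScript sketch of that evaluation order, assuming a single numeric sort field; `runCursor` is a hypothetical helper and the sample data is made up:

```javascript
// Hypothetical emulation: sort first, then skip, then limit.
function runCursor(docs, { sort = null, skip = 0, limit = Infinity } = {}) {
  const result = docs.slice(); // do not mutate the caller's array
  if (sort) {
    // Only a single numeric field is supported in this sketch.
    const [field, direction] = Object.entries(sort)[0];
    result.sort((a, b) => (a[field] - b[field]) * direction);
  }
  return result.slice(skip, skip + limit);
}

const students = [
  { name: "John", age: 12 },
  { name: "Alice", age: 14 },
  { name: "Bob", age: 11 },
];
// Like find().skip(1).limit(1).sort({age: -1}): the second-oldest student.
const page = runCursor(students, { sort: { age: -1 }, skip: 1, limit: 1 });
console.log(page);
```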
  • 46. more cursor functions
    ● snapshot
      ensures the cursor
      ○ returns no duplicates
      ○ misses no objects
      ○ returns all matching objects that were present at both the beginning and the end of the query
      ○ usually used for export/dump
  • 47. more cursor functions
    ● batchSize
      tells MongoDB how many documents to send to the client at once
    ● explain
      for performance profiling
    ● hint
      tells MongoDB which index to use for querying/sorting
  • 48. list current running operations
    ● list operations
      db.currentOp()
    ● cancel an operation
      db.killOp(opid)
  • 49. MongoDB index - when to use an index?
    ● when running complicated finds
    ● when sorting lots of data
  • 50. MongoDB index - sort() example
    for (i=0; i<1000000; i++){
      db.many.save({value: i});
    }
    db.many.find().sort({value: -1})
    error: { "$err" : "too much data for sort() with no index. add an index or specify a smaller limit", "code" : 10128 }
  • 51. MongoDB index - how to build index
    db.many.ensureIndex({value: 1})
    ● Index options
      ○ background
      ○ unique
      ○ dropDups
      ○ sparse
  • 52. MongoDB index - index commands
    ● list index
      db.many.getIndexes()
    ● drop index
      db.many.dropIndex({value: 1})
      db.many.dropIndexes() <-- DANGER!
  • 53. MongoDB Index - find() example
    db.many.dropIndex({value: 1})
    db.many.find({value: 5555}).explain()
    db.many.ensureIndex({value: 1})
    db.many.find({value: 5555}).explain()
  • 54. MongoDB Index - Compound Index
    db.xxx.ensureIndex({a:1, b:-1, c:1})
    query/sort with fields
    ● a
    ● a, b
    ● a, b, c
    will be accelerated by this index
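The rule on slide 54 is a prefix rule: a query is accelerated when its fields are exactly the first one, two or three fields of the compound index. A simplified plain-JavaScript check of that rule follows; `canUseIndexPrefix` is a hypothetical helper, and the real query planner is more flexible than this (for example, it may still use the index for the `a` part of a query on `a` and `c`).

```javascript
// Hypothetical check of the prefix rule from the slide.
// indexFields: index keys in order, e.g. ["a", "b", "c"].
// queryFields: fields used by the query, in any order.
function canUseIndexPrefix(indexFields, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexFields.length) {
    return false;
  }
  const prefix = indexFields.slice(0, queryFields.length);
  return queryFields.every((field) => prefix.includes(field));
}

const index = ["a", "b", "c"];
console.log(canUseIndexPrefix(index, ["a"]));      // true
console.log(canUseIndexPrefix(index, ["b", "a"])); // true, order in the query is irrelevant
console.log(canUseIndexPrefix(index, ["b"]));      // false, "b" alone is not a prefix
```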
  • 55. Remove/Drop data from MongoDB
    ● Remove
      db.many.remove({value: 5555})
      db.many.find({value: 5555})
      db.many.remove()
    ● Drop
      db.many.drop()
    ● Drop database
      db.dropDatabase()  EXTREMELY DANGEROUS!!!
  • 56. How to update data in MongoDB
    Easiest way:
    s = db.students.findOne({_id: 1})
    s.registered = true
    db.students.save(s)
  • 57. In place update - update()
    update( {find spec}, {update spec}, upsert=false )
    db.students.update( {_id: 1}, {$set: {registered: false}} )
  • 58. Update a non-existent document
    db.students.update( {_id: 2}, {name: 'Mary', age: 9}, true )
    db.students.update( {_id: 2}, {$set: {name: 'Mary', age: 9}}, true )
  • 59. set / unset field value
    db.students.update({_id: 1}, {$set: {"age": 15}})
    db.students.update({_id: 1}, {$set: {registered: {2012: false, 2011: true} }})
    db.students.update({_id: 1}, {$unset: {registered: 1}})
  • 60. increase/decrease value
    db.students.update({_id: 1}, {
      $inc: {
        "grades.math": 1.1,
        "grades.english": -1.5,
        "grades.history": 3.0
      }
    })
  • 61. push value(s) into array
    db.students.update({_id: 1}, { $push: {tags: "lazy"} })
    db.students.update({_id: 1}, { $pushAll: {tags: ["smart", "cute"]} })
  • 62. add only values that do not exist yet to array
    db.students.update({_id: 1}, { $push: {tags: "lazy"} })
    db.students.update({_id: 1}, { $addToSet: {tags: "lazy"} })
    db.students.update({_id: 1}, { $addToSet: {tags: {$each: ["tall", "thin"]}} })
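The difference between `$push` and `$addToSet` on slides 61 and 62 is whether duplicates are allowed. A plain-JavaScript emulation on an in-memory array, as a sketch only: `push` and `addToSet` are hypothetical helpers, and the duplicate check here works for primitive values such as strings.

```javascript
// Hypothetical emulations of the two array update operators.
function push(array, value) {
  return [...array, value]; // always appends, duplicates allowed
}

function addToSet(array, values) {
  const result = array.slice();
  for (const value of values) {
    // Append only when the value is not already present.
    if (!result.includes(value)) result.push(value);
  }
  return result;
}

let tags = [];
tags = push(tags, "lazy");                // first "lazy"
tags = push(tags, "lazy");                // a second "lazy" is appended
tags = addToSet(tags, ["lazy"]);          // no change, already present
tags = addToSet(tags, ["tall", "thin"]);  // like $each: both are new
console.log(tags);
```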
  • 63. remove value from array
    db.students.update({_id: 1}, { $pull: {tags: "lazy"} })
    db.students.update({_id: 1}, { $pull: {tags: {$ne: "smart"}} })
    db.students.update({_id: 1}, { $pullAll: {tags: ["lazy", "smart"]} })
  • 64. pop value from array
    a = []; for(i=0;i<20;i++){a.push(i);}
    db.test.save({_id: 1, value: a})
    db.test.update({_id: 1}, { $pop: {value: 1} })
    db.test.update({_id: 1}, { $pop: {value: -1} })
  • 65. rename field
    db.test.update({_id: 1}, { $rename: {value: "values"} })
  • 66. Practice: add comments to student
    Add a field to the student ({_id: 1}):
    ● field name: comments
    ● field type: array of dictionaries
    ● field content:
      ○ {
          by: author name, string
          text: content of comment, string
        }
    ● add at least 3 comments to this field
  • 67. Example answer to practice
    db.students.update({_id: 1}, {
      $addToSet: { comments: {$each: [
        {by: "teacher01", text: "text 01"},
        {by: "teacher02", text: "text 02"},
        {by: "teacher03", text: "text 03"},
      ]}}
    })
  • 68. The $ position operator (for array)
    db.students.update({
      _id: 1,
      "comments.by": "teacher02"
    }, {
      $inc: {"comments.$.vote": 1}
    })
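In the update on slide 68, `$` stands for the index of the first array element matched by the find spec. The plain-JavaScript sketch below emulates that one update on an in-memory document; `incAtFirstMatch` is a hypothetical helper and the student data is made up to mirror the practice slides.

```javascript
// Hypothetical emulation of:
//   find spec:   {"comments.by": matchValue}
//   update spec: {$inc: {"comments.$.vote": amount}}
function incAtFirstMatch(doc, arrayField, matchKey, matchValue, incKey, amount) {
  // "$" resolves to the index of the first matching array element.
  const index = doc[arrayField].findIndex((el) => el[matchKey] === matchValue);
  if (index === -1) return false; // find spec matched nothing
  const element = doc[arrayField][index];
  // Like $inc, a missing field starts from 0.
  element[incKey] = (element[incKey] || 0) + amount;
  return true;
}

const student = {
  _id: 1,
  comments: [
    { by: "teacher01", text: "text 01" },
    { by: "teacher02", text: "text 02" },
    { by: "teacher03", text: "text 03" },
  ],
};
incAtFirstMatch(student, "comments", "by", "teacher02", "vote", 1);
console.log(student.comments[1]);
```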
  • 69. Atomic update - findAndModify
    ● Atomically updates a SINGLE DOCUMENT and returns it
    ● By default, the returned document does not contain the modification made by the findAndModify command.
  • 70. findAndModify parameters
    db.xxx.findAndModify({
      query:  filter to query
      sort:   how to sort and select the first document in the query results
      remove: set true if you want to remove it
      update: update content
      new:    set true if you want to get the modified object
      fields: which fields to fetch
      upsert: create the object if it does not exist
    })
  • 71. GridFS
    ● MongoDB has a 16MB document size limit
    ● GridFS is for storing large binary objects in MongoDB
    ● GridFS is a kind of spec, not an implementation
    ● The implementation is done by the MongoDB drivers
    ● Currently supported drivers:
      ○ PHP
      ○ Java
      ○ Python
      ○ Ruby
      ○ Perl
  • 72. GridFS - command line tools
    ● List
      mongofiles list
    ● Put
      mongofiles put xxx.txt
    ● Get
      mongofiles get xxx.txt
  • 73. MongoDB config - basic
    ● dbpath
      ○ The folder where MongoDB puts its database files
      ○ MongoDB must have write permission to this folder
    ● logpath, logappend
      ○ logpath = log filename
      ○ MongoDB must have write permission to the log file
    ● bind_ip
      ○ IP(s) MongoDB will bind to; by default, all
      ○ Use a comma to separate more than one IP
    ● port
      ○ Port number MongoDB will use
      ○ Default port = 27017
  • 74. Small tip - rotate MongoDB log
    db.getMongo().getDB("admin").runCommand("logRotate")
  • 75. MongoDB config - journal
    ● journal
      ○ Turns journaling on/off
      ○ Usually you should keep this on
  • 76. MongoDB config - http interface
    ● nohttpinterface
      ○ By default, listens on http://localhost:28017
      ○ Shows statistics through the HTTP interface
    ● rest
      ○ Only takes effect when the HTTP interface is enabled
      ○ Example:
        http://localhost:28017/test/students/
        http://localhost:28017/test/students/?filter_name=John
  • 77. MongoDB config - authentication
    ● auth
      ○ By default, MongoDB runs with no authentication
      ○ If no admin account has been created, you can log in without authentication through a local mongo shell and start managing user accounts.
  • 78. MongoDB account management
    ● Add admin user
      > mongo localhost/admin
      db.addUser("testadmin", "1234")
    ● Authenticate as admin user
      use admin
      db.auth("testadmin", "1234")
  • 79. MongoDB account management
    ● Add user to test database
      use test
      db.addUser("testrw", "1234")
    ● Add read-only user to test database
      db.addUser("testro", "1234", true)
    ● List users
      db.system.users.find()
    ● Remove user
      db.removeUser("testro")
  • 80. MongoDB config - authentication
    ● keyFile
      ○ At least 6 characters, and smaller than 1KB
      ○ Used only for replica/sharding servers
      ○ Every replica/sharding server should use the same key file for communication
      ○ On Unix-like systems, the key file must grant no permissions to group/everyone, or MongoDB will refuse to start
  • 81. MongoDB configuration - Replica Set
    ● replSet
      ○ Indicates the replica set name
      ○ All MongoDB instances in the same replica set should use the same name
      ○ Limitations
        ■ Maximum 12 nodes in a single replica set
        ■ Maximum 7 nodes can vote
      ○ A MongoDB replica set is eventually consistent
  • 82. How does a MongoDB replica set work?
    ● Each replica set has a single primary (master) node and multiple slave nodes
    ● Data is only written to the primary node, then synced to the slave nodes.
    ● Use getLastError() to confirm that the previous write operation has been committed to the whole replica set; otherwise the write may be rolled back if the primary node goes down before the sync.
  • 83. How does a MongoDB replica set work?
    ● Once the primary node is down, the whole replica set is marked as failed and accepts no operations until the other nodes vote and elect a new primary node.
    ● During failover, any write operation not committed to the whole replica set will be rolled back
  • 84. Simple replica set configuration
    mkdir -p /tmp/db01
    mkdir -p /tmp/db02
    mkdir -p /tmp/db03
    mongod --replSet test --port 29001 --dbpath /tmp/db01
    mongod --replSet test --port 29002 --dbpath /tmp/db02
    mongod --replSet test --port 29003 --dbpath /tmp/db03
  • 85. Simple replica set configuration
    mongo localhost:29001
  • 86. Another way to config replica set
    rs.initiate()
    rs.add("localhost:29001")
    rs.add("localhost:29002")
    rs.add("localhost:29003")
  • 87. Extra options for setting replica set
    ● arbiterOnly
      ○ Arbiter nodes don't receive data and can't become the primary node, but can vote.
    ● priority
      ○ A node with priority 0 will never be elected as primary node.
      ○ Higher-priority nodes are preferred as primary
      ○ To force a node to become primary, do not change its voting settings; update its priority value and reconfig the replica set.
    ● buildIndexes
      ○ Can only be set to false on nodes with priority 0
      ○ Use false for backup-only nodes
  • 88. Extra options for setting replica set
    ● hidden
      ○ Nodes marked as hidden are not exposed to MongoDB clients.
      ○ Nodes marked as hidden do not receive queries.
      ○ Only use this option for nodes dedicated to reporting, integration, backup, etc.
    ● slaveDelay
      ○ How many seconds this slave node should lag behind the primary node
      ○ Can only be set on nodes with priority 0
      ○ Used to guard against some human errors
  • 89. Extra options for setting replica set
    ● votes
      If set to 1, this node can vote; otherwise it cannot.
  • 90. Change primary node at runtime
    config = rs.conf()
    config.members[1].priority = 2
    rs.reconfig(config)
  • 91. What is sharding?
    (Diagram: a Name/Value table with rows Alice, Amy, Bob, ..., Yoko, Zeus, split by name into three ranges: A to F, G to N, O to Z.)
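The diagram on slide 91 splits records into three ranges by name. As a minimal plain-JavaScript sketch of that routing idea (not how mongos is implemented): the shard names and the `routeByName` helper are assumptions made up for illustration; only the three letter ranges come from the slide.

```javascript
// Letter ranges from the slide; shard names are hypothetical.
const shardRanges = [
  { shard: "shard0000", from: "A", to: "F" },
  { shard: "shard0001", from: "G", to: "N" },
  { shard: "shard0002", from: "O", to: "Z" },
];

// Pick the shard whose range contains the first letter of the name.
function routeByName(name) {
  const initial = name[0].toUpperCase();
  const range = shardRanges.find((r) => initial >= r.from && initial <= r.to);
  if (!range) throw new Error("no shard covers: " + name);
  return range.shard;
}

for (const name of ["Alice", "Bob", "Yoko", "Zeus"]) {
  console.log(name, "->", routeByName(name));
}
```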
  • 93. Elements of MongoDB sharding cluster
    ● Config Server
      Stores the sharding cluster metadata
    ● mongos Router
      Routes database operations to the correct shard server
    ● Shard Server
      Holds the real user data
  • 94. Sharding config - config server
    ● A config server is a MongoDB instance run with the --configsvr option
    ● Config servers are automatically synced by the mongos process, so DO NOT run them with the --replSet option
    ● The synchronous replication protocol is optimized for three machines.
  • 95. Sharding config - mongos Router
    ● Use mongos (not mongod) to start a mongos router
    ● mongos routes database operations to the correct shard servers
    ● Example command for starting mongos
      mongos --configdb db01,db02,db03
    ● With the --chunkSize option, you can specify a smaller sharding chunk if you're just testing.
  • 96. Sharding config - shard server
    ● A shard server is a MongoDB instance run with the --shardsvr option
    ● A shard server doesn't need to know where the config server / mongos router is
  • 97. Example script for building MongoDB shard cluster mkdir -p /tmp/s00 mkdir -p /tmp/s01 mkdir -p /tmp/s02 mkdir -p /tmp/s03 mongod --configsvr --port 29000 --dbpath /tmp/s00 mongos --configdb localhost:29000 --chunkSize 1 --port 28000 mongod --shardsvr --port 29001 --dbpath /tmp/s01 mongod --shardsvr --port 29002 --dbpath /tmp/s02 mongod --shardsvr --port 29003 --dbpath /tmp/s03
  • 98. Sharding config - add shard server mongo localhost:28000/admin db.runCommand({addshard: "localhost:29001"}) db.runCommand({addshard: "localhost:29002"}) db.runCommand({addshard: "localhost:29003"}) db.printShardingStatus() db.runCommand( { enablesharding : "test" } ) db.runCommand( {shardcollection: "test.shardtest", key: {_id: 1}, unique: true})
  • 99. Let us insert some documents use test for (i=0; i<1000000; i++) { db.shardtest.insert({value: i}); }
  • 100. Remove 1 shard & see what happens use admin db.runCommand({removeshard: "shard0002"}) Let's add it back db.runCommand({addshard: "localhost:29003"})
  • 101. Pick your sharding key wisely ● The sharding key cannot be changed after sharding is enabled ● To update any document in a sharding cluster, the sharding key MUST BE INCLUDED in the query spec EX: sharding key = {name: 1, class: 1} db.xxx.update({name: "xxxx", class: "ooo"},{ ..... update spec })
  • 102. Pick your sharding key wisely ● The sharding key strongly affects your data distribution model EX: sharding by ObjectId shard001 => data saved 2 months ago shard002 => data saved 1 month ago shard003 => data saved recently
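The "sharding key must be included" rule can be checked before an update is sent. This is a minimal sketch in plain JavaScript; `includesShardKey` is a hypothetical helper, not part of the MongoDB shell, and it only checks that every shard key field appears in the query spec.

```javascript
// Return true when every field of the shard key appears in the query spec.
function includesShardKey(shardKey, query) {
  return Object.keys(shardKey).every(function (field) {
    return Object.prototype.hasOwnProperty.call(query, field);
  });
}

var shardKey = { name: 1, class: 1 };

console.log(includesShardKey(shardKey, { name: "xxxx", class: "ooo" })); // true
console.log(includesShardKey(shardKey, { name: "xxxx" }));               // false
```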
  • 103. Other sharding key examples EX: sharding by Username shard001 => Username starts with a to k shard002 => Username starts with l to r shard003 => Username starts with s to z EX: sharding by md5 completely random distribution
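The difference between the two key choices shows up with skewed data. The sketch below, in plain JavaScript, counts how many usernames land on each shard when all of them start with the same letter. It is illustrative only: FNV-1a is used as a simple stand-in for md5, the modulo assignment is a simplification of how hashed keys are split into chunks, and all function names are hypothetical.

```javascript
// FNV-1a 32-bit hash, used here only as a simple stand-in for md5.
function fnv1a(str) {
  var h = 0x811c9dc5;
  for (var i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value as an unsigned 32-bit int
  }
  return h;
}

// Range sharding by first letter: a to k, l to r, s to z.
function shardByPrefix(name) {
  var c = name.charAt(0).toLowerCase();
  if (c <= "k") return "shard001";
  if (c <= "r") return "shard002";
  return "shard003";
}

// Hash sharding: the shard is picked from the hash of the whole username.
function shardByHash(name) {
  return "shard00" + (fnv1a(name) % 3 + 1);
}

function countByShard(names, pick) {
  var counts = { shard001: 0, shard002: 0, shard003: 0 };
  names.forEach(function (n) { counts[pick(n)] += 1; });
  return counts;
}

// Skewed sample: every username starts with "a".
var names = [];
for (var i = 0; i < 3000; i++) names.push("alice" + i);

console.log(countByShard(names, shardByPrefix)); // everything lands on shard001
console.log(countByShard(names, shardByHash));   // spread over all three shards
```

The trade-off is that a hashed key spreads writes evenly but scatters neighbouring usernames, so range queries on the username touch every shard.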
  • 104. What is Mapreduce? ● Map then Reduce ● Map calls a function on each document to emit keys & values, which are sent to the reduce function ● Reduce calls a function that combines the values emitted for each key into a single reduced result. ● Example: map students' grades and reduce them into total grades.
  • 105. How to call mapreduce in MongoDB db.xxx.mapReduce( map function, reduce function,{ out: output option, query: query filter, optional, sort: sort filter, optional, finalize: finalize function, .... etc })
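The map-then-reduce flow can be imitated in a few lines of plain JavaScript. This is a minimal in-memory sketch, not MongoDB's implementation: `runMapReduce`, the sample `students` array and the function names are all hypothetical, and `emit` is passed to the map function as an argument here, whereas in the MongoDB shell it is available as a global.

```javascript
// Tiny in-memory imitation of map/reduce over an array of documents.
function runMapReduce(docs, mapFn, reduceFn) {
  var groups = {};
  // Collect every emitted value under its key.
  function emit(key, value) {
    (groups[key] = groups[key] || []).push(value);
  }
  // Map phase: call mapFn with `this` bound to each document.
  docs.forEach(function (doc) { mapFn.call(doc, emit); });
  // Reduce phase: combine the values of each key into one result.
  var out = {};
  Object.keys(groups).forEach(function (key) {
    var values = groups[key];
    // Like MongoDB, skip reduce when a key has only one value.
    out[key] = values.length === 1 ? values[0] : reduceFn(key, values);
  });
  return out;
}

var students = [
  { name: "John", grades: { math: 70, art: 55 } },
  { name: "Alice", grades: { math: 90, art: 80 } }
];

// Emit one (subject, grade) pair per grade of the current student.
function mapGrades(emit) {
  for (var subject in this.grades) emit(subject, this.grades[subject]);
}

// Sum all grades emitted for one subject.
function sumGrades(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var totals = runMapReduce(students, mapGrades, sumGrades);
console.log(totals); // { math: 160, art: 135 }
```

Because reduce may be skipped for single-value keys, the value emitted by map should have the same shape as the value returned by reduce.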
  • 106. Let's generate some data for (i=0; i<10000; i++){ db.grades.insert({ grades: { math: Math.random() * 100 % 100, art: Math.random() * 100 % 100, music: Math.random() * 100 % 100 } }); }
  • 107. Prepare Map function function map(){ for (var k in this.grades){ emit(k, {total: 1, pass: this.grades[k] >= 60.0 ? 1 : 0, fail: this.grades[k] < 60.0 ? 1 : 0, sum: this.grades[k], avg: 0 }); } }
  • 108. Prepare reduce function function reduce(key, values){ var result = {total: 0, pass: 0, fail: 0, sum: 0, avg: 0}; values.forEach(function(value){ result.total += value.total; result.pass += value.pass; result.fail += value.fail; result.sum += value.sum; }); return result; }
  • 109. Execute your 1st mapreduce call db.grades.mapReduce( map, reduce, {out:{inline: 1}} )
  • 110. Add finalize function function finalize(key, value){ value.avg = value.sum / value.total; return value; }
  • 111. Run mapreduce again with finalize db.grades.mapReduce( map, reduce, {out:{inline: 1}, finalize: finalize} )
  • 112. Mapreduce output options ● {replace: <result collection name>} Replace the whole result collection if it already exists. ● {merge: <result collection name>} Merge new results into the existing collection; keys that already exist are overwritten with the new results. ● {reduce: <result collection name>} Run reduce if the same key exists in both the old and current result collections. Will run the finalize function, if any. ● {inline: 1} Return the result in memory instead of writing it to a collection
  • 113. Other mapreduce output options ● db - put the result collection in a different database ● sharded - the output collection will be sharded using key = _id ● nonAtomic - partial reduce results will be visible while processing.
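How the collection-based output options differ is easiest to see on a small example. The sketch below, in plain JavaScript, treats a result collection as an object keyed by _id; `applyOutput` is a hypothetical helper that mimics the replace, merge and reduce modes as described on the slide, not MongoDB's own code, and it leaves out the finalize step.

```javascript
// Combine an existing result "collection" with fresh results, by output mode.
function applyOutput(mode, existing, fresh, reduceFn) {
  if (mode !== "replace" && mode !== "merge" && mode !== "reduce") {
    throw new Error("unknown mode: " + mode);
  }
  var result = {};
  var key;
  if (mode === "replace") {
    // Old contents are dropped entirely.
    for (key in fresh) result[key] = fresh[key];
    return result;
  }
  // merge and reduce both start from the old contents.
  for (key in existing) result[key] = existing[key];
  for (key in fresh) {
    if (mode === "reduce" && key in existing) {
      // Same key on both sides: reduce old and new values together.
      result[key] = reduceFn(key, [existing[key], fresh[key]]);
    } else {
      // merge, or a key that is new: the fresh value wins.
      result[key] = fresh[key];
    }
  }
  return result;
}

function sum(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var oldResults = { math: 100, art: 50 };
var newResults = { math: 10, music: 5 };

console.log(applyOutput("replace", oldResults, newResults, sum)); // { math: 10, music: 5 }
console.log(applyOutput("merge", oldResults, newResults, sum));   // { math: 10, art: 50, music: 5 }
console.log(applyOutput("reduce", oldResults, newResults, sum));  // { math: 110, art: 50, music: 5 }
```

The reduce mode is the one that suits incremental jobs, where each run only processes new documents and folds them into earlier totals.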
  • 114. MongoDB backup & restore ● mongodump mongodump -h localhost:27017 ● mongorestore mongorestore -h localhost:27017 --drop ● mongoexport mongoexport -d test -c students -h localhost:27017 > students.json ● mongoimport mongoimport -d test -c students -h localhost:27017 < students.json
  • 115. Conclusion - Pros of MongoDB ● Agile (Schemaless) ● Easy to use ● Built in replica & sharding ● Mapreduce with sharding
  • 116. Conclusion - Cons of MongoDB ● Schemaless = everyone needs to know what the data looks like ● Wastes space on keys ● Eats lots of memory ● Mapreduce is hard to handle
  • 117. Cautions of MongoDB ● Global write lock ○ Add more RAM ○ Use a newer version (MongoDB 2.2 now locks writes at the DB level instead of globally) ○ Split your database properly ● Removing documents won't free disk space ○ You need to run the compact command periodically ● Don't let your MongoDB data disk fill up ○ Once the free space of the disk used by MongoDB is exhausted, you won't be able to move/delete documents in it.