Mongo db

    • MongoDB by toki
    • About me
      ● Delta Electronic CTBD Senior Engineer
      ● Main developer of
        ○ A website built on MongoDB with 600k page views per day
        ○ Data that grows every day through automated crawler bots
    • MongoDB - Simple Introduction
      ● Document-based NoSQL (Not Only SQL) database
      ● Started in 2007 by the company 10gen
      ● Written in C++
      ● Fast (but uses a lot of memory)
      ● Stores JSON documents in BSON (binary JSON) format
      ● Can index any document attribute
      ● Horizontal scalability with auto-sharding
      ● High availability with built-in replication
    • What is a database?
      ● Raw data
        ○ John is a student; he's 12 years old.
      ● Data
        ○ Student
          ■ name = "John"
          ■ age = 12
      ● Records
        ○ Student(name="John", age=12)
        ○ Student(name="Alice", age=11)
      ● Database
        ○ Student table
        ○ Grades table
    • Example of a (relational) database
      [Diagram: Student, Grade, and Class tables linked through Student ID, Grade ID, and Class ID columns]
    • SQL Language - How to find data?
      ● Find the student whose name is John
        ○ select * from student where name="John"
      ● Find the class name of John
        ○ select s.name, c.name as class_name from student s, class c where s.name="John" and s.class_id=c.class_id
    • Why NoSQL?
      ● Big data
        ○ Modern data sizes are too big for a single DB server
        ○ e.g. Google search engine
      ● Connectivity
        ○ e.g. Facebook "Like" button
      ● Semi-structured data
        ○ e.g. car equipment database
      ● High availability
        ○ The basis of cloud services
    • Common NoSQL DB characteristics
      ● Schemaless
      ● No joins; stores pre-joined/embedded data
      ● Horizontal scalability
      ● Replica ready - high availability
    • Common types of NoSQL DB
      ● Key-Value
        ○ Based on Amazon's Dynamo paper
        ○ Stores K-V pairs
        ○ Examples:
          ■ Dynomite
          ■ Voldemort
    • Common types of NoSQL DB
      ● Bigtable clones
        ○ Based on Google's Bigtable paper
        ○ Column oriented, but handles semi-structured data
        ○ Data keyed by: row, column, time, index
        ○ Examples:
          ■ Google Bigtable
          ■ HBase
          ■ Cassandra (FB)
    • Common types of NoSQL DB
      ● Document-based
        ○ Stores multi-level K-V pairs
        ○ Usually uses JSON as the document format
        ○ Examples:
          ■ MongoDB
          ■ CouchDB (Apache)
          ■ Redis (more commonly classed as a key-value store)
    • Common types of NoSQL DB
      ● Graph
        ○ Focuses on modeling the structure of data - interconnectivity
        ○ Examples:
          ■ Neo4j
          ■ AllegroGraph
    • Start using MongoDB - Installation
      ● From apt-get (Debian / Ubuntu only)
        ○ sudo apt-get install mongodb
      ● Using the 10gen MongoDB repository
        ○ mongodb-on-debian-or-ubuntu-linux/
      ● From a pre-built binary or source
      ● Note: 32-bit builds are limited to around 2GB of data
    • Start your MongoDB manually
        mkdir -p /tmp/mongo
        mongod --dbpath /tmp/mongo
      or
        mongod -f mongodb.conf
    • Verify your MongoDB installation
        $ mongo
        MongoDB shell version: 2.2.0
        connecting to: test
        > _
        --------------------------------------------------------
        mongo localhost/test2
        mongo
    • How many databases do you have?
        show dbs
    • Elements of MongoDB
      ● Database
        ○ Collection
          ■ Document
    • What is JSON?
      ● JavaScript Object Notation
      ● Elements of JSON
        ○ Object: K/V pairs
        ○ Key: string
        ○ Value: could be
          ■ string
          ■ bool
          ■ number
          ■ array
          ■ object
          ■ null
      ● Example
        {
          "key1": "value1",
          "key2": 2.0,
          "key3": [1, "str", 3.0],
          "key4": false,
          "key5": {
            "name": "another object"
          }
        }
    • Another sample of JSON
        {
          "name": "John",
          "age": 12,
          "grades": {
            "math": 4.0,
            "english": 5.0
          },
          "registered": true,
          "favorite subjects": ["math", "english"]
        }
    • Insert document into MongoDB
        s = {
          "name": "John",
          "age": 12,
          "grades": {
            "math": 4.0,
            "english": 5.0
          },
          "registered": true,
          "favorite subjects": ["math", "english"]
        }
        db.students.insert(s);
    • Verify inserted document
        db.students.find()
      also try
        db.student.insert(s)
        show collections
    • Save document into a collection
        s.name = "Alice"
        s.age = 14
        s.grades.math =
    • What is _id / ObjectId?
      ● _id is the default primary key used to index documents; it can be any JSON-acceptable value.
      ● By default, MongoDB auto-generates an ObjectId as _id
      ● An ObjectId is a 12-byte value that uniquely identifies a document
      ● Use ObjectId().getTimestamp() to recover the timestamp stored in an ObjectId
        bytes 0-3: unix timestamp | bytes 4-6: machine | bytes 7-8: process id | bytes 9-11: increment
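The byte layout above can be decoded without a server. This is a minimal sketch in plain JavaScript (Node.js), assuming a 24-character hex ObjectId string and the classic 4/3/2/3-byte layout listed on the slide; decodeObjectId is a hypothetical helper, not part of the mongo shell (which offers getTimestamp() for the timestamp part).

```javascript
// Hypothetical helper: split a 24-hex-character ObjectId into the four fields
// of the classic layout (4-byte timestamp, 3-byte machine, 2-byte pid, 3-byte counter).
function decodeObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("ObjectId must be 24 hex characters");
  }
  // Read bytes [start, end) as one big-endian unsigned integer.
  const field = (start, end) => parseInt(hex.slice(start * 2, end * 2), 16);
  return {
    timestamp: new Date(field(0, 4) * 1000), // bytes 0-3: seconds since the Unix epoch
    machine: field(4, 7),                    // bytes 4-6
    processId: field(7, 9),                  // bytes 7-8
    increment: field(9, 12),                 // bytes 9-11
  };
}

const oid = decodeObjectId("507f1f77bcf86cd799439011");
console.log(oid.timestamp.toISOString()); // 2012-10-17T21:13:27.000Z
```

Because the timestamp occupies the leading bytes, ObjectIds created later compare greater than earlier ones (at one-second granularity).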
    • Save document with id into a collection
        s.name = "Bob"
        s.age = 11
        s["favorite subjects"] = ["music", "math", "art"]
        s.grades.chinese = 3.0
        s._id =
    • Save document with existing _id
        delete
    • How to find documents?
      ● db.xxxx.find()
        ○ lists all documents in the collection
      ● db.xxxx.find(
            find spec,    // what the documents look like
            find fields,  // which parts I want to see
            ...
        )
      ● db.xxxx.findOne()
        ○ returns only the first document matching the find spec
    • find by id
        db.students.find({_id: 1})
        db.students.find({_id: ObjectId(xxx....)})
    • find and filter return fields
        db.students.find({_id: 1}, {_id: 1})
        db.students.find({_id: 1}, {name: 1})
        db.students.find({_id: 1}, {_id: 1, name: 1})
        db.students.find({_id: 1}, {_id: 0, name: 1})
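The field filter (second argument of find) follows two rules shown above: fields are either all included or all excluded, and _id comes along unless it is explicitly set to 0. Below is a minimal in-memory sketch in plain JavaScript (Node.js); project is a hypothetical helper that only handles top-level fields, not the server's implementation.

```javascript
// Hypothetical in-memory sketch of find()'s second argument (the field filter).
// Top-level fields only; nested paths like "grades.math" are not handled here.
function project(doc, fields = {}) {
  const keys = Object.keys(fields);
  if (keys.length === 0) return { ...doc }; // no filter: return the whole document

  const nonId = keys.filter((k) => k !== "_id");
  // Mode is decided by the non-_id fields; if only _id is listed, by _id itself.
  const including = nonId.length > 0 ? Boolean(fields[nonId[0]]) : Boolean(fields._id);
  if (nonId.some((k) => Boolean(fields[k]) !== including)) {
    throw new Error("cannot mix 1 and 0 in a field filter (except for _id)");
  }

  if (!including) {
    // Exclusion mode: copy everything, then drop the fields marked 0.
    const out = { ...doc };
    keys.filter((k) => !fields[k]).forEach((k) => delete out[k]);
    return out;
  }

  // Inclusion mode: _id is kept unless it is explicitly set to 0.
  const keepId = !("_id" in fields) || Boolean(fields._id);
  const out = {};
  if (keepId && "_id" in doc) out._id = doc._id;
  nonId.forEach((k) => {
    if (k in doc) out[k] = doc[k];
  });
  return out;
}

const john = { _id: 1, name: "John", age: 12 };
console.log(project(john, { _id: 1 }));          // { _id: 1 }
console.log(project(john, { name: 1 }));         // { _id: 1, name: 'John' }
console.log(project(john, { _id: 0, name: 1 })); // { name: 'John' }
```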
    • find by name - equal or not equal
        db.students.find({name: "John"})
        db.students.find({name: "Alice"})
        db.students.find({name: {$ne: "John"}})
      ● $ne : not equal
    • find by name - ignore case ($regex)
        db.students.find({name: "john"})      => X (no match)
        db.students.find({name: /john/i})     => O (matches "John")
        db.students.find({ name: { $regex: "^b", $options: "i" } })
    • find by a set of names - $in, $nin
        db.students.find({name: {$in: ["John", "Bob"]}})
        db.students.find({name: {$nin: ["John", "Bob"]}})
      ● $in : value is one of the items in the array
      ● $nin : value is none of the items in the array
    • find by age - $gt, $gte, $lt, $lte
        db.students.find({age: {$gt: 12}})
        db.students.find({age: {$gte: 12}})
        db.students.find({age: {$lt: 12}})
        db.students.find({age: {$lte: 12}})
      ● $gt : greater than
      ● $gte : greater than or equal
      ● $lt : less than
      ● $lte : less than or equal
    • find by field existence - $exists
        db.students.find({registered: {$exists: true}})
        db.students.find({registered: {$exists: false}})
    • find by field type - $type
        db.students.find({_id: {$type: 7}})
        db.students.find({_id: {$type: 1}})

        Type number | Type
        1           | Double
        2           | String
        3           | Object
        4           | Array
        5           | Binary data
        7           | Object id
        8           | Boolean
        9           | Date
        10          | Null
        11          | Regular expression
        13          | JavaScript code
        14          | Symbol
        15          | JavaScript code with scope
        16          | 32-bit integer
        17          | Timestamp
        18          | 64-bit integer
        255         | Min key
        127         | Max key
    • find in multi-level fields
        db.students.find({"grades.math": {$gt: 2.0}})
        db.students.find({"grades.math": {$gte: 2.0}})
    • find by remainder - $mod
        db.students.find({age: {$mod: [10, 2]}})
        db.students.find({age: {$mod: [10, 3]}})
      ● {$mod: [divisor, remainder]} matches when age % divisor == remainder
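The operators from the last few slides ($ne, $in, $nin, $gt/$gte/$lt/$lte, $exists, $mod, and dotted paths) can be pictured as small predicates applied to each document. This is a minimal in-memory sketch in plain JavaScript (Node.js); getPath, OPERATORS, and matches are hypothetical names, values are assumed to be of the same type when compared, and it is not a reimplementation of MongoDB's query engine.

```javascript
// Minimal in-memory sketch of how a find spec selects documents.
// Assumes compared values share a type; NOT MongoDB's query engine.
function getPath(doc, path) {
  // "grades.math" -> doc.grades.math, or undefined if any step is missing
  return path.split(".").reduce(
    (value, key) => (value !== null && typeof value === "object" ? value[key] : undefined),
    doc
  );
}

const OPERATORS = {
  $ne: (v, arg) => v !== arg, // also true when the field is missing
  $in: (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
  $gt: (v, arg) => v !== undefined && v > arg,
  $gte: (v, arg) => v !== undefined && v >= arg,
  $lt: (v, arg) => v !== undefined && v < arg,
  $lte: (v, arg) => v !== undefined && v <= arg,
  $exists: (v, arg) => (v !== undefined) === arg,
  $mod: (v, [divisor, remainder]) => typeof v === "number" && v % divisor === remainder,
};

function matches(doc, spec) {
  return Object.entries(spec).every(([path, cond]) => {
    const value = getPath(doc, path);
    const isOperatorExpr =
      cond !== null && typeof cond === "object" && !Array.isArray(cond) &&
      Object.keys(cond).length > 0 && Object.keys(cond).every((k) => k.startsWith("$"));
    if (!isOperatorExpr) return value === cond; // plain equality (scalar values only)
    return Object.entries(cond).every(([op, arg]) => {
      if (!(op in OPERATORS)) throw new Error(`unsupported operator ${op}`);
      return OPERATORS[op](value, arg);
    });
  });
}

const students = [
  { _id: 1, name: "John", age: 12, grades: { math: 4.0 }, registered: true },
  { _id: 2, name: "Alice", age: 14, grades: { math: 1.5 } },
  { _id: 3, name: "Bob", age: 11, grades: { math: 3.0 }, registered: false },
];
const names = (spec) => students.filter((s) => matches(s, spec)).map((s) => s.name);

console.log(names({ age: { $gte: 12 } }));           // [ 'John', 'Alice' ]
console.log(names({ "grades.math": { $gt: 2.0 } })); // [ 'John', 'Bob' ]
console.log(names({ age: { $mod: [10, 2] } }));      // [ 'John' ]
```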
    • find in array - $size
        db.students.find({"favorite subjects": {$size: 2}})
        db.students.find({"favorite subjects": {$size: 3}})
    • find in array - $all
        db.students.find({"favorite subjects": {$all: ["music", "math", "art"]}})
        db.students.find({"favorite subjects": {$all: ["english", "math"]}})
    • find in array - find value in array
        db.students.find({"favorite subjects": "art"})
        db.students.find({"favorite subjects": "math"})
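The three array queries above differ in what they test: $size checks the exact length, $all checks that every listed value is present in any order, and a plain value matches if any element equals it. A minimal sketch in plain JavaScript (Node.js); ARRAY_OPERATORS and arrayFieldMatches are hypothetical, and scalars are compared with === rather than MongoDB's full value comparison.

```javascript
// Sketch of the three array queries above, evaluated on plain JS arrays.
// Hypothetical helpers; scalars are compared with ===.
const ARRAY_OPERATORS = {
  $size: (arr, n) => Array.isArray(arr) && arr.length === n, // exactly n elements
  $all: (arr, wanted) => Array.isArray(arr) && wanted.every((w) => arr.includes(w)), // all present, any order
};

function arrayFieldMatches(fieldValue, cond) {
  if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
    return Object.entries(cond).every(([op, arg]) => {
      if (!(op in ARRAY_OPERATORS)) throw new Error(`unsupported operator ${op}`);
      return ARRAY_OPERATORS[op](fieldValue, arg);
    });
  }
  // {field: value}: an array field matches if ANY element equals the value
  return Array.isArray(fieldValue) ? fieldValue.includes(cond) : fieldValue === cond;
}

const students = [
  { name: "John", "favorite subjects": ["math", "english"] },
  { name: "Bob", "favorite subjects": ["music", "math", "art"] },
];
const names = (cond) =>
  students.filter((s) => arrayFieldMatches(s["favorite subjects"], cond)).map((s) => s.name);

console.log(names({ $size: 2 }));                  // [ 'John' ]
console.log(names({ $all: ["english", "math"] })); // [ 'John' ]
console.log(names("math"));                        // [ 'John', 'Bob' ]
```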
    • find with bool operators - $and, $or
        db.students.find({$or: [ {age: {$lt: 12}}, {age: {$gt: 12}} ]})
        db.students.find({$and: [ {age: {$lt: 12}}, {age: {$gte: 11}} ]})
    • find with bool operators - $and, $or
        db.students.find({$and: [ {age: {$lt: 12}}, {age: {$gte: 11}} ]})
      equals
        db.students.find({age: {$lt: 12, $gte: 11}})
    • find with bool operators - $not
      ● $not can only wrap another operator expression (or a regex), not a plain value
        X db.students.find({registered: {$not: false}})
        O db.students.find({registered: {$ne: false}})
        O db.students.find({age: {$not: {$gte: 12}}})
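$and and $or combine whole sub-queries, while $not negates an operator expression on one field, which is why {$not: false} is rejected. A minimal sketch in plain JavaScript (Node.js) over a tiny operator set; LEAF_OPERATORS, fieldMatches, and matches are hypothetical helpers for top-level fields only, not MongoDB's engine.

```javascript
// Sketch of $and / $or / $not over a tiny set of comparison operators.
// Hypothetical helpers for top-level fields only; not MongoDB's query engine.
const LEAF_OPERATORS = {
  $lt: (v, a) => v !== undefined && v < a,
  $lte: (v, a) => v !== undefined && v <= a,
  $gt: (v, a) => v !== undefined && v > a,
  $gte: (v, a) => v !== undefined && v >= a,
  $ne: (v, a) => v !== a,
};

function fieldMatches(value, cond) {
  if (cond === null || typeof cond !== "object") return value === cond; // plain equality
  return Object.entries(cond).every(([op, arg]) => {
    if (op === "$not") {
      // $not negates an operator expression such as {$gte: 12}; a bare value is rejected
      if (arg === null || typeof arg !== "object") {
        throw new Error("$not needs an operator expression");
      }
      return !fieldMatches(value, arg);
    }
    if (!(op in LEAF_OPERATORS)) throw new Error(`unsupported operator ${op}`);
    return LEAF_OPERATORS[op](value, arg);
  });
}

function matches(doc, spec) {
  return Object.entries(spec).every(([key, cond]) => {
    if (key === "$and") return cond.every((sub) => matches(doc, sub));
    if (key === "$or") return cond.some((sub) => matches(doc, sub));
    return fieldMatches(doc[key], cond);
  });
}

const students = [
  { name: "John", age: 12 },
  { name: "Alice", age: 14 },
  { name: "Bob", age: 11 },
  { name: "Mary", age: 9 },
];
const names = (spec) => students.filter((s) => matches(s, spec)).map((s) => s.name);

console.log(names({ $or: [{ age: { $lt: 12 } }, { age: { $gt: 12 } }] }));   // [ 'Alice', 'Bob', 'Mary' ]
console.log(names({ $and: [{ age: { $lt: 12 } }, { age: { $gte: 11 } }] })); // [ 'Bob' ]
console.log(names({ age: { $lt: 12, $gte: 11 } }));                          // [ 'Bob' ]
console.log(names({ age: { $not: { $gte: 12 } } }));                        // [ 'Bob', 'Mary' ]
```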
    • find with JavaScript - $where
        db.students.find({$where: "this.age > 12"})
        db.students.find({$where: "this.grades.chinese"})
    • find cursor functions
      ● count
          db.students.find().count()
      ● limit
          db.students.find().limit(1)
      ● skip
          db.students.find().skip(1)
      ● sort
          db.students.find().sort({age: -1})
          db.students.find().sort({age: 1})
    • combine find cursor functions
        db.students.find().skip(1).limit(1)
        db.students.find().skip(1).sort({age: -1})
        db.students.find().skip(1).limit(1).sort({age: -1})
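When cursor functions are chained, MongoDB applies them in a fixed order (sort, then skip, then limit) regardless of the order they were called in, so the last two lines above sort before skipping. A minimal sketch of that behavior in plain JavaScript (Node.js); the Cursor class is a hypothetical stand-in that records options and applies them only when read, not the shell's cursor object.

```javascript
// Sketch of a lazy cursor: sort/skip/limit only record options, which are applied
// in a fixed order (sort, then skip, then limit) when results are read.
// Hypothetical stand-in, not the mongo shell's cursor.
class Cursor {
  constructor(docs) {
    this.docs = docs;
    this.opts = { sort: null, skip: 0, limit: 0 };
  }
  sort(spec) { this.opts.sort = spec; return this; }
  skip(n) { this.opts.skip = n; return this; }
  limit(n) { this.opts.limit = n; return this; }
  toArray() {
    let out = [...this.docs];
    const { sort, skip, limit } = this.opts;
    if (sort) {
      const [[field, dir]] = Object.entries(sort); // single-field sort only
      out.sort((a, b) => (a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0));
    }
    out = out.slice(skip);
    return limit > 0 ? out.slice(0, limit) : out; // limit 0 means "no limit"
  }
}

const students = [
  { name: "John", age: 12 },
  { name: "Alice", age: 14 },
  { name: "Bob", age: 11 },
];
const a = new Cursor(students).skip(1).limit(1).sort({ age: -1 }).toArray();
const b = new Cursor(students).sort({ age: -1 }).skip(1).limit(1).toArray();
console.log(a[0].name, b[0].name); // John John
```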
    • more cursor functions
      ● snapshot: ensures the cursor
        ○ returns no duplicates
        ○ misses no objects
        ○ returns all matching objects that were present at both the beginning and the end of the query
        ○ is usually used for export/dump
    • more cursor functions
      ● batchSize: tells MongoDB how many documents to send to the client at once
      ● explain: for performance profiling
      ● hint: tells MongoDB which index to use for querying/sorting
    • list currently running operations
      ● list operations
          db.currentOp()
      ● cancel an operation
          db.killOp(<opid>)
    • MongoDB index - when to use an index?
      ● when running complex find queries
      ● when sorting lots of data
    • MongoDB index - sort() example
        for (i=0; i<1000000; i++) { db.many.insert({value: i}); }
        db.many.find().sort({value: -1})
        error: {
          "$err" : "too much data for sort() with no index. add an index or specify a smaller limit",
          "code" : 10128
        }
    • MongoDB index - how to build an index
        db.many.ensureIndex({value: 1})
      ● Index options
        ○ background
        ○ unique
        ○ dropDups
        ○ sparse
    • MongoDB index - index commands
      ● list indexes
          db.many.getIndexes()
      ● drop indexes
          db.many.dropIndex({value: 1})
          db.many.dropIndexes()   <-- DANGER!
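    • MongoDB Index - find() example
        db.many.dropIndex({value: 1})
        db.many.find({value: 5555}).explain()
        db.many.ensureIndex({value: 1})
        db.many.find({value: 5555}).explain()
      ● Compare the two explain() outputs: without the index the whole collection is scanned (BasicCursor); with it, only the matching index entries are (BtreeCursor value_1)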
    • MongoDB Index - Compound
        db.xxxx.ensureIndex({a:1, b:-1, c:1})
      queries/sorts on the fields
        ● a
        ● a, b
        ● a, b, c
      will be accelerated by this index
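The list above is the index-prefix rule: a compound index fully serves a query only on a leading prefix of its keys. A minimal sketch in plain JavaScript (Node.js); coveredByIndexPrefix is a hypothetical helper that treats query fields as a set (their order in a find spec does not matter), ignores sort direction, and does not model partial use (a query on a and c can still use the index for a).

```javascript
// Sketch of the compound index prefix rule for queries (not sorts).
// Hypothetical helper; ignores sort direction and partial index use.
function coveredByIndexPrefix(indexKeys, queryFields) {
  const keys = Object.keys(indexKeys); // insertion order: a, b, c
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > keys.length) return false;
  // The first n index keys must be exactly the n queried fields.
  return keys.slice(0, wanted.size).every((k) => wanted.has(k));
}

const index = { a: 1, b: -1, c: 1 };
for (const fields of [["a"], ["a", "b"], ["a", "b", "c"], ["b"], ["a", "c"]]) {
  console.log(fields.join(", "), "=>", coveredByIndexPrefix(index, fields));
}
```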
    • Remove/Drop data from MongoDB
      ● Remove
          db.many.remove({value: 5555})
          db.many.find({value: 5555})
          db.many.remove()        <-- removes ALL documents in the collection
      ● Drop a collection
          db.many.drop()
      ● Drop a database
          db.dropDatabase()       <-- EXTREMELY DANGEROUS!!!
    • How to update data in MongoDB
      Easiest way:
        s = db.students.findOne({_id: 1})
        s.registered =
    • In-place update - update()
        update({find spec}, {update spec}, upsert=false)
        db.students.update({_id: 1}, {$set: {registered: false}})
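    • Update a non-existent document (upsert)
        db.students.update({_id: 2}, {name: "Mary", age: 9}, true)
        db.students.update({_id: 2}, {$set: {name: "Mary", age: 9}}, true)
      ● Without an operator, the update spec replaces the whole document; with $set, only the listed fields change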
    • set / unset field value
        db.students.update({_id: 1}, {$set: {"age": 15}})
        db.students.update({_id: 1}, {$set: {registered: {2012: false, 2011: true}}})
        db.students.update({_id: 1}, {$unset: {registered: 1}})
    • increase/decrease value
        db.students.update({_id: 1}, {$inc: {
          "grades.math": 1.1,
          "grades.english": -1.5,
          "grades.history": 3.0
        }})
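$set, $unset, and $inc all work on dotted paths, and $set/$inc create missing fields ("grades.history" above starts from 0). A minimal sketch in plain JavaScript (Node.js); applyUpdate and parentOf are hypothetical helpers that mutate a local object, not the server's update code.

```javascript
// Sketch of $set / $unset / $inc applied to a plain object, including dotted paths
// such as "grades.math". Hypothetical helpers; applyUpdate mutates the document.
function parentOf(doc, path, create) {
  const keys = path.split(".");
  const last = keys.pop();
  let node = doc;
  for (const k of keys) {
    if (node[k] === undefined || node[k] === null) {
      if (!create) return [undefined, last];
      node[k] = {}; // $set / $inc create missing embedded documents
    }
    node = node[k];
  }
  return [node, last];
}

function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [path, arg] of Object.entries(fields)) {
      if (op === "$set") {
        const [parent, key] = parentOf(doc, path, true);
        parent[key] = arg;
      } else if (op === "$inc") {
        const [parent, key] = parentOf(doc, path, true);
        parent[key] = (parent[key] === undefined ? 0 : parent[key]) + arg; // missing field starts at 0
      } else if (op === "$unset") {
        const [parent, key] = parentOf(doc, path, false);
        if (parent !== undefined) delete parent[key];
      } else {
        throw new Error(`unsupported operator ${op}`);
      }
    }
  }
  return doc;
}

const s = { _id: 1, age: 12, grades: { math: 4.0, english: 5.0 }, registered: true };
applyUpdate(s, { $set: { age: 15 } });
applyUpdate(s, { $inc: { "grades.math": 1.1, "grades.english": -1.5, "grades.history": 3.0 } });
applyUpdate(s, { $unset: { registered: 1 } });
console.log(s);
```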
    • push value(s) into an array
        db.students.update({_id: 1}, {$push: {tags: "lazy"}})
        db.students.update({_id: 1}, {$pushAll: {tags: ["smart", "cute"]}})
    • add a value to an array only if it is not already there
        db.students.update({_id: 1}, {$push: {tags: "lazy"}})
        db.students.update({_id: 1}, {$addToSet: {tags: "lazy"}})
        db.students.update({_id: 1}, {$addToSet: {tags: {$each: ["tall", "thin"]}}})
    • remove value from an array
        db.students.update({_id: 1}, {$pull: {tags: "lazy"}})
        db.students.update({_id: 1}, {$pull: {tags: {$ne: "smart"}}})
        db.students.update({_id: 1}, {$pullAll: {tags: ["lazy", "smart"]}})
    • pop value from an array
        a = []; for (i=0; i<20; i++) { a.push(i); }
        db.test.insert({_id: 1, value: a})
        db.test.update({_id: 1}, {$pop: {value: 1}})    // removes the last element
        db.test.update({_id: 1}, {$pop: {value: -1}})   // removes the first element
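The array update operators on the last four slides differ mainly in how they treat duplicates and which end they touch. A minimal sketch in plain JavaScript (Node.js); arrayUpdate is a hypothetical object whose functions return new arrays and compare scalars with === (MongoDB compares whole values, including embedded documents).

```javascript
// Sketch of the array update operators, applied to plain JS arrays.
// Hypothetical helpers; they return new arrays and compare scalars with ===.
const arrayUpdate = {
  push: (arr, v) => [...arr, v],                                    // $push
  pushAll: (arr, vs) => [...arr, ...vs],                            // $pushAll
  addToSet: (arr, v) => (arr.includes(v) ? arr : [...arr, v]),      // $addToSet
  addEachToSet: (arr, vs) => vs.reduce((acc, v) => arrayUpdate.addToSet(acc, v), arr), // $addToSet + $each
  pull: (arr, v) => arr.filter((x) => x !== v),                     // $pull (value form)
  pullAll: (arr, vs) => arr.filter((x) => !vs.includes(x)),         // $pullAll
  pop: (arr, dir) => (dir === 1 ? arr.slice(0, -1) : arr.slice(1)), // $pop: 1 = last, -1 = first
};

let tags = [];
tags = arrayUpdate.push(tags, "lazy");                   // [ 'lazy' ]
tags = arrayUpdate.push(tags, "lazy");                   // [ 'lazy', 'lazy' ] ($push allows duplicates)
tags = arrayUpdate.addToSet(tags, "lazy");               // unchanged
tags = arrayUpdate.addEachToSet(tags, ["tall", "thin"]); // [ 'lazy', 'lazy', 'tall', 'thin' ]
tags = arrayUpdate.pull(tags, "lazy");                   // [ 'tall', 'thin' ]
console.log(tags);

const values = Array.from({ length: 20 }, (_, i) => i); // 0..19, like the slide
const popped = arrayUpdate.pop(values, 1);   // removes 19
const shifted = arrayUpdate.pop(values, -1); // removes 0
console.log(popped[popped.length - 1], shifted[0]); // 18 1
```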
    • rename field
        db.test.update({_id: 1}, {$rename: {value: "values"}})
    • Practice: add comments to a student
      Add a field to the student {_id: 1}:
      ● field name: comments
      ● field type: array of objects
      ● field content:
        ○ { by: author name (string), text: content of the comment (string) }
      ● add at least 3 comments to this field
    • Example answer to the practice
        db.students.update({_id: 1}, {$addToSet: {
          comments: {$each: [
            {by: "teacher01", text: "text 01"},
            {by: "teacher02", text: "text 02"},
            {by: "teacher03", text: "text 03"}
          ]}
        }})
    • The $ positional operator (for arrays)
        db.students.update(
          { _id: 1, "comments.by": "teacher02" },
          { $inc: {"comments.$.vote": 1} }
        )
      ● $ refers to the first array element matched by the find spec
    • Atomic update - findAndModify
      ● Atomically updates a SINGLE DOCUMENT and returns it
      ● By default, the returned document won't contain the modifications made by the findAndModify command
    • findAndModify
        db.xxxx.findAndModify({
          query: filter to query,
          sort: how to sort and select the 1st document in the query results,
          remove: set true if you want to remove it,
          update: update content,
          new: set true if you want to get the modified object,
          fields: which fields to fetch,
          upsert: create the object if it does not exist
        })
    • GridFS
      ● MongoDB has a 16MB document size limit
      ● GridFS is for storing large binary objects in MongoDB
      ● GridFS is a specification, not an implementation
      ● The implementation is done by the MongoDB drivers
      ● Currently supported drivers:
        ○ PHP
        ○ Java
        ○ Python
        ○ Ruby
        ○ Perl
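GridFS gets around the document size limit by splitting a file into numbered chunks, each stored as its own document, and reassembling them in order. A minimal sketch in plain JavaScript (Node.js); the field names files_id, n, and data follow the GridFS fs.chunks layout, while the 256 KB chunk size, the "file-1" id, and the toChunks/fromChunks helpers are assumptions for illustration, not a driver API.

```javascript
// Sketch of the GridFS idea: split a large binary into fixed-size chunks, each small
// enough to be its own document, then reassemble them in order.
// CHUNK_SIZE (256 KB) is an assumed default; helpers are hypothetical.
const CHUNK_SIZE = 256 * 1024;

function toChunks(filesId, buffer, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let n = 0; n * chunkSize < buffer.length; n++) {
    chunks.push({ files_id: filesId, n, data: buffer.subarray(n * chunkSize, (n + 1) * chunkSize) });
  }
  return chunks;
}

function fromChunks(chunks) {
  // Chunks may come back in any order; n gives each chunk's position.
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const file = Buffer.alloc(600 * 1024, 0x61); // 600 KB of the byte 'a'
const chunks = toChunks("file-1", file);
console.log(chunks.map((c) => c.data.length)); // [ 262144, 262144, 90112 ]
console.log(fromChunks([...chunks].reverse()).equals(file)); // true
```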
    • GridFS - command line tools
      ● List
          mongofiles list
      ● Put
          mongofiles put xxx.txt
      ● Get
          mongofiles get xxx.txt
    • MongoDB config - basic
      ● dbpath
        ○ Which folder to put the MongoDB database files in
        ○ MongoDB must have write permission to this folder
      ● logpath, logappend
        ○ logpath = log filename
        ○ logappend = append to the existing log file instead of overwriting it
        ○ MongoDB must have write permission to the log file
      ● bind_ip
        ○ IP(s) MongoDB will bind to; by default, all interfaces
        ○ Use commas to separate multiple IPs
      ● port
        ○ Port number MongoDB will use
        ○ Default port = 27017
    • Small tip - rotate the MongoDB log
        db.getMongo().getDB("admin").runCommand("logRotate")
    • MongoDB config - journal
      ● journal
        ○ Turns journaling on/off
        ○ Usually you should keep this on
    • MongoDB config - http interface
      ● nohttpinterface
        ○ Disables the HTTP interface, which by default listens on http://localhost:28017
        ○ The HTTP interface shows statistics
      ● rest
        ○ Only works when the HTTP interface is enabled
        ○ Examples:
            http://localhost:28017/test/students/
            http://localhost:28017/test/students/?filter_name=John
    • MongoDB config - authentication
      ● auth
        ○ By default, MongoDB runs with no authentication
        ○ If no admin account has been created yet, you can log in without authentication through a local mongo shell and start managing user accounts
    • MongoDB account management
      ● Add an admin user
          > mongo localhost/admin
          db.addUser("testadmin", "1234")
      ● Authenticate as the admin user
          use admin
          db.auth("testadmin", "1234")
    • MongoDB account management
      ● Add a user to the test database
          use test
          db.addUser("testrw", "1234")
      ● Add a read-only user to the test database
          db.addUser("testro", "1234", true)
      ● List users
          db.system.users.find()
      ● Remove a user
          db.removeUser("testro")
    • MongoDB config - authentication
      ● keyFile
        ○ Content must be at least 6 characters and smaller than 1KB
        ○ Used only for replica set / sharding servers
        ○ Every replica set / sharding server must use the same key file to communicate
        ○ On Unix-like systems, the key file must grant no permissions to group/others, or MongoDB will refuse to start
    • MongoDB configuration - Replica Set
      ● replSet
        ○ Sets the replica set name
        ○ All MongoDB instances in the same replica set must use the same name
        ○ Limitations
          ■ Maximum 12 nodes in a single replica set
          ■ Maximum 7 nodes can vote
        ○ A MongoDB replica set is eventually consistent
    • How does a MongoDB replica set work?
      ● Each replica set has a single primary (master) node and multiple secondary (slave) nodes
      ● Data is written only to the primary node, then synced to the secondary nodes
      ● Use getLastError() (e.g. with the w option) to confirm that a previous write has been replicated to the other members; otherwise the write may be rolled back if the primary goes down before syncing
    • How does a MongoDB replica set work?
      ● Once the primary node goes down, the replica set cannot accept writes until the remaining nodes vote and elect a new primary
      ● During failover, any write that was not replicated to the other members may be rolled back
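The election described above only succeeds while a strict majority of the set's votes can still be reached. A minimal sketch in plain JavaScript (Node.js); canElectPrimary and the member objects are hypothetical, and the rule is simplified: it ignores priority, how up to date each member is, and the fact that an arbiter can vote but cannot become primary.

```javascript
// Sketch of the majority rule behind failover elections.
// Hypothetical, simplified helper (no priority, no data freshness, no arbiter rules).
function canElectPrimary(members) {
  const votesOf = (m) => (m.votes === undefined ? 1 : m.votes);
  const total = members.reduce((sum, m) => sum + votesOf(m), 0);
  const reachable = members.filter((m) => m.up).reduce((sum, m) => sum + votesOf(m), 0);
  return reachable * 2 > total; // strict majority of all votes
}

const set = (...up) => up.map((isUp) => ({ up: isUp }));
console.log(canElectPrimary(set(true, true, false)));        // true: 2 of 3 votes
console.log(canElectPrimary(set(true, false, false)));       // false: 1 of 3 votes
console.log(canElectPrimary(set(true, true, false, false))); // false: 2 of 4 is not a majority
```

This is one reason replica sets usually have an odd number of voting members, adding an arbiter when needed.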
    • Simple replica set configuration
        mkdir -p /tmp/db01
        mkdir -p /tmp/db02
        mkdir -p /tmp/db03
        mongod --replSet test --port 29001 --dbpath /tmp/db01
        mongod --replSet test --port 29002 --dbpath /tmp/db02
        mongod --replSet test --port 29003 --dbpath /tmp/db03
    • Simple replica set configuration
        mongo localhost:29001
    • Another way to configure a replica set
        rs.initiate()
        rs.add("localhost:29001")
        rs.add("localhost:29002")
        rs.add("localhost:29003")
    • Extra options for setting up a replica set
      ● arbiterOnly
        ○ Arbiter nodes don't receive data and can't become primary, but they can vote
      ● priority
        ○ A node with priority 0 will never be elected primary
        ○ Higher-priority nodes are preferred as primary
        ○ To force a node to become primary, don't try to change the vote result; raise that node's priority and reconfigure the replica set
      ● buildIndexes
        ○ Can only be set to false on nodes with priority 0
        ○ Use false for backup-only nodes
    • Extra options for setting up a replica set
      ● hidden
        ○ Hidden nodes are not exposed to MongoDB clients
        ○ Hidden nodes do not receive queries
        ○ Only use this option for nodes dedicated to reporting, integration, backup, etc.
      ● slaveDelay
        ○ How many seconds this secondary deliberately lags behind the primary
        ○ Can only be set on nodes with priority 0
        ○ Used to guard against human errors (the delayed copy still holds data deleted by mistake)
    • Extra options for setting up a replica set
      ● votes
        ○ If set to 1, this node can vote; if set to 0, it cannot
    • Change the primary node at runtime
        config = rs.conf()
        config.members[1].priority = 2
        rs.reconfig(config)
    • What is sharding?
      [Diagram: a Name/Value collection split by name into ranges A to F (Alice, Amy, Bob, ...), G to N, and O to Z (..., Yoko, Zeus)]
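In range-based sharding, each chunk owns a contiguous key range and a router sends each key to the chunk whose range contains it. A minimal sketch in plain JavaScript (Node.js) of the A-F / G-N / O-Z split above; the chunk boundaries, shard names, and the route helper are hypothetical, and string comparison here is case-sensitive.

```javascript
// Sketch of range-based routing: each chunk owns [min, max), and the router picks
// the chunk whose range contains the key. Boundaries and shard names are hypothetical.
const chunks = [
  { min: "A", max: "G", shard: "shard0000" },
  { min: "G", max: "O", shard: "shard0001" },
  { min: "O", max: "\uffff", shard: "shard0002" }, // "\uffff": upper bound above "Z..."
];

function route(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk owns key ${key}`);
  return chunk.shard;
}

for (const name of ["Alice", "Amy", "Bob", "Nancy", "Yoko", "Zeus"]) {
  console.log(name, "->", route(name));
}
```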
    • MongoDB sharding architecture
    • Elements of a MongoDB sharding cluster
      ● Config server
        ○ Stores the sharding cluster metadata
      ● mongos router
        ○ Routes database operations to the correct shard server
      ● Shard server
        ○ Holds the actual user data
    • Sharding config - config server
      ● A config server is a MongoDB instance running with the --configsvr option
      ● Config servers are kept in sync by the mongos processes, so DO NOT run them with the --replSet option
      ● The synchronous replication protocol is optimized for three machines
    • Sharding config - mongos Router
      ● Use mongos (not mongod) to start a mongos router
      ● mongos routes database operations to the correct shard servers
      ● Example command for starting mongos
          mongos --configdb db01,db02,db03
      ● With the --chunkSize option, you can specify a smaller sharding chunk size if you're just testing
    • Sharding config - shard server
      ● A shard server is a MongoDB instance running with the --shardsvr option
      ● A shard server doesn't need to know where the config servers / mongos routers are
    • Example script for building a MongoDB shard cluster
        mkdir -p /tmp/s00
        mkdir -p /tmp/s01
        mkdir -p /tmp/s02
        mkdir -p /tmp/s03
        mongod --configsvr --port 29000 --dbpath /tmp/s00
        mongos --configdb localhost:29000 --chunkSize 1 --port 28000
        mongod --shardsvr --port 29001 --dbpath /tmp/s01
        mongod --shardsvr --port 29002 --dbpath /tmp/s02
        mongod --shardsvr --port 29003 --dbpath /tmp/s03
    • Sharding config - add shard servers
        mongo localhost:28000/admin
        db.runCommand({addshard: "localhost:29001"})
        db.runCommand({addshard: "localhost:29002"})
        db.runCommand({addshard: "localhost:29003"})
        db.printShardingStatus()
        db.runCommand({enablesharding: "test"})
        db.runCommand({shardcollection: "test.shardtest", key: {_id: 1}, unique: true})
    • Let us insert some documents
        use test
        for (i=0; i<1000000; i++) { db.shardtest.insert({value: i}); }
    • Remove 1 shard & see what happens
        use admin
        db.runCommand({removeshard: "shard0002"})
      Let's add it back
        db.runCommand({addshard: "localhost:29003"})
    • Pick your sharding key wisely
      ● The sharding key cannot be changed after sharding is enabled
      ● To update any document in a sharding cluster, the sharding key MUST BE INCLUDED in the find spec
        EX: sharding key = {name: 1, class: 1}
          update({name: "xxxx", class: "ooo"}, { ..... update spec })
    • Pick your sharding key wisely
      ● The sharding key strongly affects how your data is distributed
        EX: sharding by ObjectId
          shard001 => data saved 2 months ago
          shard002 => data saved 1 month ago
          shard003 => data saved recently (so all new writes go to this one shard)
    • Other sharding key examples
        EX: sharding by username
          shard001 => usernames starting with a to k
          shard002 => usernames starting with l to r
          shard003 => usernames starting with s to z
        EX: sharding by md5
          completely random distribution
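Hashing the key before distributing it spreads even sequential keys evenly, which is the point of the md5 example above. A minimal sketch in plain JavaScript (Node.js) using the built-in crypto module; taking the first 4 bytes of the md5 digest modulo the shard count, the shardForKey helper, and the user0..user2999 keys are illustrative assumptions, not how MongoDB assigns chunks.

```javascript
// Sketch of hash-based distribution: sequential keys spread evenly once hashed.
// First 4 digest bytes modulo shard count is an illustrative assumption.
const crypto = require("crypto");

function shardForKey(key, shardCount) {
  const digest = crypto.createHash("md5").update(String(key)).digest(); // 16-byte Buffer
  return digest.readUInt32BE(0) % shardCount;
}

const counts = [0, 0, 0];
for (let i = 0; i < 3000; i++) {
  counts[shardForKey(`user${i}`, 3)] += 1;
}
console.log(counts); // roughly 1000 per shard
```

The trade-off is that range queries on the original key (for example, all usernames starting with "a") now have to ask every shard.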
    • What is MapReduce?
      ● Map, then Reduce
      ● Map is the step that calls a function to emit keys & values, which are sent to the reduce function
      ● Reduce is the step that calls a function to combine the values emitted for each key by the map function into a single reduced result
      ● Example: map students' grades and reduce them into total grades
    • How to call mapreduce
        db.xxxx.mapReduce(
          map function,
          reduce function,
          {
            out: output option,
            query: query filter, optional,
            sort: sort filter, optional,
            finalize: finalize function,
            .... etc
          })
    • Let's generate some data
        for (i=0; i<10000; i++) {
          db.grades.insert({
            grades: {
              math: Math.random() * 100 % 100,
              art: Math.random() * 100 % 100,
              music: Math.random() * 100 % 100
            }
          });
        }
    • Prepare the map function
        function map() {
          for (var k in this.grades) {
            emit(k, {
              total: 1,
              pass: this.grades[k] >= 60.0 ? 1 : 0,  // 1 if this grade passes
              fail: this.grades[k] < 60.0 ? 1 : 0,   // 1 if this grade fails
              sum: this.grades[k],
              avg: 0
            });
          }
        }
    • Prepare the reduce function
        function reduce(key, values) {
          var result = {total: 0, pass: 0, fail: 0, sum: 0, avg: 0};
          values.forEach(function(value) {
            result.total += value.total;
            result.pass += value.pass;
            result.fail += value.fail;
            result.sum += value.sum;
          });
          return result;
        }
    • Execute your 1st mapreduce call
        db.grades.mapReduce(map, reduce, {out: {inline: 1}})
    • Add a finalize function
        function finalize(key, value) {
          value.avg = value.sum / value.total;
          return value;
        }
    • Run mapreduce again with finalize
        db.grades.mapReduce(map, reduce, {out: {inline: 1}, finalize: finalize})
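The map, reduce, and finalize steps can be traced without a server by emulating the grouping that mapReduce performs. A minimal sketch in plain JavaScript (Node.js); runMapReduce, the module-level emit, and the two sample documents are hypothetical stand-ins for db.grades.mapReduce(). Like MongoDB, the runner skips reduce for a key emitted only once, which is why reduce must return the same shape as the emitted values.

```javascript
// Sketch of the map -> reduce -> finalize flow with a hypothetical runner
// standing in for db.grades.mapReduce().
let emitted = null; // Map: key -> array of emitted values for the current run

function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

function map() {
  for (const k in this.grades) {
    const g = this.grades[k];
    emit(k, { total: 1, pass: g >= 60.0 ? 1 : 0, fail: g < 60.0 ? 1 : 0, sum: g, avg: 0 });
  }
}

function reduce(key, values) {
  const result = { total: 0, pass: 0, fail: 0, sum: 0, avg: 0 };
  values.forEach((v) => {
    result.total += v.total;
    result.pass += v.pass;
    result.fail += v.fail;
    result.sum += v.sum;
  });
  return result;
}

function finalize(key, value) {
  value.avg = value.sum / value.total;
  return value;
}

function runMapReduce(docs, mapFn, reduceFn, finalizeFn) {
  emitted = new Map();
  docs.forEach((doc) => mapFn.call(doc)); // map sees each document as `this`
  const results = [];
  for (const [key, values] of emitted) {
    // reduce is skipped when a key was emitted only once
    let value = values.length === 1 ? values[0] : reduceFn(key, values);
    if (finalizeFn) value = finalizeFn(key, value);
    results.push({ _id: key, value });
  }
  return results;
}

const grades = [
  { grades: { math: 80, art: 40 } },
  { grades: { math: 50, art: 70 } },
];
console.log(JSON.stringify(runMapReduce(grades, map, reduce, finalize)));
```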
    • Mapreduce output options
      ● {replace: <result collection name>}
          Replace the result collection if it already exists.
      ● {merge: <result collection name>}
          Merge new results into the existing collection; a new result overwrites an old one with the same key.
      ● {reduce: <result collection name>}
          If a key exists in both the old and the current results, run reduce on them (and finalize, if any).
      ● {inline: 1}
          Return the results in memory instead of writing them to a collection.
    • Other mapreduce output options
      ● db - put the result collection in a different database
      ● sharded - the output collection will be sharded using key = _id
      ● nonAtomic - partial reduce results will be visible while processing
    • MongoDB backup & restore
      ● mongodump
          mongodump -h localhost:27017
      ● mongorestore
          mongorestore -h localhost:27017 --drop
      ● mongoexport
          mongoexport -d test -c students -h localhost:27017 > students.json
      ● mongoimport
          mongoimport -d test -c students -h localhost:27017 < students.json
    • Conclusion - Pros of MongoDB
      ● Agile (schemaless)
      ● Easy to use
      ● Built-in replication & sharding
      ● MapReduce with sharding
    • Conclusion - Cons of MongoDB
      ● Schemaless = everyone needs to know what the data looks like
      ● Wasted space on keys (field names are stored in every document)
      ● Eats lots of memory
      ● MapReduce is hard to handle
    • Cautions of MongoDB
      ● Global write lock
        ○ Add more RAM
        ○ Use a newer version (MongoDB 2.2 now has a database-level write lock)
        ○ Split your databases properly
      ● Removing documents won't free disk space
        ○ You need to run the compact command periodically
      ● Don't let your MongoDB data disk fill up
        ○ Once the disk used by MongoDB is full, you won't be able to move/delete documents in it