MongoDB

Published in: Technology, Education

  1. 1. MongoDB
      http://tinyurl.com/97o49y3
      by toki
  2. 2. About me
      ● Senior Engineer, Delta Electronic CTBD
      ● Main developer of http://loltw.net
        ○ Website built on MongoDB, serving 600k page views daily
        ○ Data grows every day through automated crawler bots
  3. 3. MongoDB - Simple Introduction
      ● Document-based NoSQL (Not Only SQL) database
      ● Started in 2007 by the company 10gen
      ● Written in C++
      ● Fast (but uses a lot of memory)
      ● Stores JSON documents in BSON format
      ● Full index support on any document attribute
      ● Horizontal scalability with auto-sharding
      ● High availability, replication ready
  4. 4. What is a database?
      ● Raw data
        ○ John is a student, he's 12 years old.
      ● Data
        ○ Student
          ■ name = "John"
          ■ age = 12
      ● Records
        ○ Student(name="John", age=12)
        ○ Student(name="Alice", age=11)
      ● Database
        ○ Student Table
        ○ Grades Table
  5. 5. Example of (relational) database
      [Diagram of four related tables]
      ● Student: Student ID, Name, Age, Class ID
      ● Class: Class ID, Name
      ● Grade: Grade ID, Name
      ● Student Grade: Student ID, Grade ID, Grade
  6. 6. SQL Language - How to find data?
      ● Find the student whose name is John
        ○ select * from student where name="John"
      ● Find the class name of John
        ○ select s.name, c.name as class_name from student s, class c where s.name="John" and s.class_id=c.class_id
  7. 7. Why NoSQL?
      ● Big data
        ○ Modern data sets are too big for a single DB server
        ○ Example: Google search engine
      ● Connectivity
        ○ Example: Facebook Like button
      ● Semi-structured data
        ○ Example: car equipment database
      ● High availability
        ○ The foundation of cloud services
  8. 8. Common NoSQL DB characteristics
      ● Schemaless
      ● No joins; stores pre-joined/embedded data
      ● Horizontal scalability
      ● Replication ready - high availability
  9. 9. Common types of NoSQL DB
      ● Key-Value
        ○ Based on Amazon's Dynamo paper
        ○ Stores K-V pairs
        ○ Examples:
          ■ Dynomite
          ■ Voldemort
  10. 10. Common types of NoSQL DB
      ● Bigtable clones
        ○ Based on Google's Bigtable paper
        ○ Column oriented, but handles semi-structured data
        ○ Data keyed by: row, column, time, index
        ○ Examples:
          ■ Google Bigtable
          ■ HBase
          ■ Cassandra (FB)
  11. 11. Common types of NoSQL DB
      ● Document based
        ○ Stores multi-level K-V pairs
        ○ Usually uses JSON as the document format
        ○ Examples:
          ■ MongoDB
          ■ CouchDB (Apache)
          ■ Redis (more commonly classed as a key-value store)
  12. 12. Common types of NoSQL DB
      ● Graph
        ○ Focuses on modeling the structure of data - interconnectivity
        ○ Examples:
          ■ Neo4j
          ■ AllegroGraph
  13. 13. Start using MongoDB - Installation
      ● From apt-get (Debian / Ubuntu only)
        ○ sudo apt-get install mongodb
      ● Using the 10gen MongoDB repository
        ○ http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian-or-ubuntu-linux/
      ● From pre-built binary or source
        ○ http://www.mongodb.org/downloads
      ● Note: 32-bit builds are limited to around 2GB of data
  14. 14. Manually start your MongoDB
      mkdir -p /tmp/mongo
      mongod --dbpath /tmp/mongo
      or
      mongod -f mongodb.conf
  15. 15. Verify your MongoDB installation
      $ mongo
      MongoDB shell version: 2.2.0
      connecting to: test
      > _
      --------------------------------------------------------
      mongo localhost/test2
      mongo 127.0.0.1/test
  16. 16. How many databases do you have?
      show dbs
  17. 17. Elements of MongoDB
      ● Database
        ○ Collection
          ■ Document
  18. 18. What is JSON
      ● JavaScript Object Notation
      ● Elements of JSON
        ○ Object: K/V pairs
        ○ Key: string
        ○ Value: could be
          ■ string
          ■ bool
          ■ number
          ■ array
          ■ object
          ■ null
      ● Example:
        {
          "key1": "value1",
          "key2": 2.0,
          "key3": [1, "str", 3.0],
          "key4": false,
          "key5": { "name": "another object" }
        }
  19. 19. Another sample of JSON
      {
        "name": "John",
        "age": 12,
        "grades": { "math": 4.0, "english": 5.0 },
        "registered": true,
        "favorite subjects": ["math", "english"]
      }
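To make the JSON sample concrete, here is a small plain-JavaScript sketch (it runs without a MongoDB server) that parses the student document and reads a few fields. The variable names are chosen only for this illustration.

```javascript
// The sample student document from the slides, as JSON text.
const text = `{
  "name": "John",
  "age": 12,
  "grades": { "math": 4.0, "english": 5.0 },
  "registered": true,
  "favorite subjects": ["math", "english"]
}`;

// JSON.parse turns the text into a plain JavaScript object.
const student = JSON.parse(text);

// Keys containing spaces need bracket notation instead of dot notation.
const subjects = student["favorite subjects"];

// Nested objects are reached by chaining keys.
const mathGrade = student.grades.math;

console.log(student.name, mathGrade, subjects.length);
```

The bracket notation used for "favorite subjects" is the same form the mongo shell needs when a key contains a space.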
  20. 20. Insert document into MongoDB
      s = {
        "name": "John",
        "age": 12,
        "grades": { "math": 4.0, "english": 5.0 },
        "registered": true,
        "favorite subjects": ["math", "english"]
      }
      db.students.insert(s);
  21. 21. Verify inserted document
      db.students.find()
      also try
      db.student.insert(s)
      show collections
  22. 22. Save document into MongoDB
      s.name = "Alice"
      s.age = 14
      s.grades.math = 2.0
      db.students.save(s)
  23. 23. What is _id / ObjectId?
      ● _id is the default primary key used to index documents; it can be any JSON-acceptable value (except an array).
      ● By default, MongoDB auto-generates an ObjectId as _id
      ● An ObjectId is a 12-byte value that uniquely identifies a document
      ● Use ObjectId().getTimestamp() to recover the timestamp stored in an ObjectId
      ● Byte layout:
        ○ bytes 0-3: unix timestamp
        ○ bytes 4-6: machine
        ○ bytes 7-8: process id
        ○ bytes 9-11: increment
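The timestamp part of the byte layout can be checked by hand. The sketch below is plain JavaScript rather than mongo shell code; `objectIdTimestamp` is a hypothetical helper and the id string is a made-up example value.

```javascript
// Illustrative helper (not part of the mongo shell): decode the creation
// time from the first 4 bytes (8 hex characters) of an ObjectId string.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex string");
  }
  // Bytes 0-3 hold seconds since the Unix epoch, big-endian.
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Made-up ObjectId value used only for demonstration.
const created = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(created.toISOString());
```

In the mongo shell itself, getTimestamp() on an ObjectId gives the same information without manual decoding.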
  24. 24. Save document with id into MongoDB
      s.name = "Bob"
      s.age = 11
      s["favorite subjects"] = ["music", "math", "art"]
      s.grades.chinese = 3.0
      s._id = 1
      db.students.save(s)
  25. 25. Save document with existing _id
      delete s.registered
      db.students.save(s)
  26. 26. How to find documents?
      ● db.xxxx.find()
        ○ lists all documents in the collection
      ● db.xxxx.find(
            find spec,    // what the documents look like
            find fields,  // which parts I want to see
            ...
        )
      ● db.xxxx.findOne()
        ○ returns only the first document that matches the find spec
  27. 27. find by id
      db.students.find({_id: 1})
      db.students.find({_id: ObjectId(xxx....)})
  28. 28. find and filter return fields
      db.students.find({_id: 1}, {_id: 1})
      db.students.find({_id: 1}, {name: 1})
      db.students.find({_id: 1}, {_id: 1, name: 1})
      db.students.find({_id: 1}, {_id: 0, name: 1})
  29. 29. find by name - equal or not equal
      db.students.find({name: "John"})
      db.students.find({name: "Alice"})
      db.students.find({name: {$ne: "John"}})
      ● $ne : not equal
  30. 30. find by name - ignore case ($regex)
      db.students.find({name: "john"}) => X (no match)
      db.students.find({name: /john/i}) => O (match)
      db.students.find({ name: { $regex: "^b", $options: "i" } })
  31. 31. find by a set of names - $in, $nin
      db.students.find({name: {$in: ["John", "Bob"]}})
      db.students.find({name: {$nin: ["John", "Bob"]}})
      ● $in : value is in the given array of items
      ● $nin : value is not in the given array
  32. 32. find by age - $gt, $gte, $lt, $lte
      db.students.find({age: {$gt: 12}})
      db.students.find({age: {$gte: 12}})
      db.students.find({age: {$lt: 12}})
      db.students.find({age: {$lte: 12}})
      ● $gt : greater than
      ● $gte : greater than or equal
      ● $lt : less than
      ● $lte : less than or equal
  33. 33. find by field existence - $exists
      db.students.find({registered: {$exists: true}})
      db.students.find({registered: {$exists: false}})
  34. 34. find by field type - $type
      db.students.find({_id: {$type: 7}})
      db.students.find({_id: {$type: 1}})
      Type codes:
        1    Double
        2    String
        3    Object
        4    Array
        5    Binary Data
        7    Object id
        8    Boolean
        9    Date
        10   Null
        11   Regular expression
        13   JavaScript code
        14   Symbol
        15   JavaScript code with scope
        16   32 bit integer
        17   Timestamp
        18   64 bit integer
        255  Min key
        127  Max key
  35. 35. find in multi-level fields
      db.students.find({"grades.math": {$gt: 2.0}})
      db.students.find({"grades.math": {$gte: 2.0}})
  36. 36. find by remainder - $mod
      db.students.find({age: {$mod: [10, 2]}})
      db.students.find({age: {$mod: [10, 3]}})
  37. 37. find in array - $size
      db.students.find({"favorite subjects": {$size: 2}})
      db.students.find({"favorite subjects": {$size: 3}})
  38. 38. find in array - $all
      db.students.find({"favorite subjects": {$all: ["music", "math", "art"]}})
      db.students.find({"favorite subjects": {$all: ["english", "math"]}})
  39. 39. find in array - find value in array
      db.students.find({"favorite subjects": "art"})
      db.students.find({"favorite subjects": "math"})
  40. 40. find with bool operators - $and, $or
      db.students.find({$or: [ {age: {$lt: 12}}, {age: {$gt: 12}} ]})
      db.students.find({$and: [ {age: {$lt: 12}}, {age: {$gte: 11}} ]})
  41. 41. find with bool operators - $and, $or
      db.students.find({$and: [ {age: {$lt: 12}}, {age: {$gte: 11}} ]})
      is equivalent to
      db.students.find({age: {$lt: 12, $gte: 11}})
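The equivalence between the explicit $and form and the shorthand can be tried without a server. The sketch below is a toy in-memory matcher in plain JavaScript covering only a handful of operators; `matches` and `ops` are names invented here, and this is not how MongoDB evaluates queries internally.

```javascript
// Toy matcher for a few comparison operators. This is a sketch of the
// semantics only, not how MongoDB evaluates queries internally.
const ops = {
  $gt: (v, x) => v > x,
  $gte: (v, x) => v >= x,
  $lt: (v, x) => v < x,
  $lte: (v, x) => v <= x,
  $ne: (v, x) => v !== x,
  $in: (v, xs) => xs.includes(v),
  $nin: (v, xs) => !xs.includes(v),
};

function matches(doc, spec) {
  return Object.entries(spec).every(([key, cond]) => {
    if (key === "$and") return cond.every((s) => matches(doc, s));
    if (key === "$or") return cond.some((s) => matches(doc, s));
    const value = doc[key];
    // A plain value means equality; an object holds operator clauses.
    if (cond === null || typeof cond !== "object") return value === cond;
    return Object.entries(cond).every(([op, arg]) => ops[op](value, arg));
  });
}

const students = [
  { name: "John", age: 12 },
  { name: "Alice", age: 14 },
  { name: "Bob", age: 11 },
];

// Explicit $and form.
const explicit = students.filter((s) =>
  matches(s, { $and: [{ age: { $lt: 12 } }, { age: { $gte: 11 } }] })
);
// Equivalent shorthand: several operators on the same field.
const shorthand = students.filter((s) =>
  matches(s, { age: { $lt: 12, $gte: 11 } })
);

console.log(explicit.map((s) => s.name), shorthand.map((s) => s.name));
```

Both filters select the same documents, which is why the shorter form is usually preferred when all conditions apply to one field.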
  42. 42. find with bool operators - $not
      $not can only be used together with another find operator
      X  db.students.find({registered: {$not: false}})
      O  db.students.find({registered: {$ne: false}})
      O  db.students.find({age: {$not: {$gte: 12}}})
  43. 43. find with JavaScript - $where
      db.students.find({$where: "this.age > 12"})
      db.students.find({$where: "this.grades.chinese"})
  44. 44. find cursor functions
      ● count
          db.students.find().count()
      ● limit
          db.students.find().limit(1)
      ● skip
          db.students.find().skip(1)
      ● sort
          db.students.find().sort({age: -1})
          db.students.find().sort({age: 1})
  45. 45. combine find cursor functions
      db.students.find().skip(1).limit(1)
      db.students.find().skip(1).sort({age: -1})
      db.students.find().skip(1).limit(1).sort({age: -1})
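When cursor functions are combined, the order in which they are chained does not matter: the sort is applied first, then skip, then limit. A plain-JavaScript sketch of that behavior follows; `runCursor` is a hypothetical helper that handles a single numeric sort key only.

```javascript
// Sketch of the order in which cursor modifiers take effect:
// sort first, then skip, then limit. Not MongoDB's implementation.
function runCursor(docs, { sort, skip = 0, limit = Infinity } = {}) {
  let result = docs.slice(); // copy so the input is left untouched
  if (sort) {
    const [[field, direction]] = Object.entries(sort); // single sort key only
    result.sort((a, b) => (a[field] - b[field]) * direction);
  }
  return result.slice(skip, skip + limit);
}

const students = [
  { name: "John", age: 12 },
  { name: "Alice", age: 14 },
  { name: "Bob", age: 11 },
];

// Like db.students.find().skip(1).limit(1).sort({age: -1}):
// the second-oldest student.
const page = runCursor(students, { sort: { age: -1 }, skip: 1, limit: 1 });
console.log(page);
```

This is the usual pattern for paging: pick a stable sort, then move through the results with skip and limit.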
  46. 46. more cursor functions
      ● snapshot
        ensures the cursor
        ○ returns no duplicates
        ○ misses no object
        ○ returns all matching objects that were present at both the beginning and the end of the query
        ○ usually used for export/dump
  47. 47. more cursor functions
      ● batchSize
          tells MongoDB how many documents to send to the client at once
      ● explain
          for performance profiling
      ● hint
          tells MongoDB which index to use for querying/sorting
  48. 48. list current running operations
      ● list operations
          db.currentOp()
      ● cancel an operation (pass the opid reported by currentOp)
          db.killOp(opid)
  49. 49. MongoDB index - when to use index?
      ● when running complex queries
      ● when sorting lots of data
  50. 50. MongoDB index - sort() example
      for (i=0; i<1000000; i++){
        db.many.save({value: i});
      }
      db.many.find().sort({value: -1})
      error: {
        "$err" : "too much data for sort() with no index. add an index or specify a smaller limit",
        "code" : 10128
      }
  51. 51. MongoDB index - how to build index
      db.many.ensureIndex({value: 1})
      ● Index options
        ○ background
        ○ unique
        ○ dropDups
        ○ sparse
  52. 52. MongoDB index - index commands
      ● list indexes
          db.many.getIndexes()
      ● drop index
          db.many.dropIndex({value: 1})
          db.many.dropIndexes()    <-- DANGER!
  53. 53. MongoDB Index - find() example
      db.many.dropIndex({value: 1})
      db.many.find({value: 5555}).explain()
      db.many.ensureIndex({value: 1})
      db.many.find({value: 5555}).explain()
  54. 54. MongoDB Index - Compound Index
      db.xxx.ensureIndex({a:1, b:-1, c:1})
      Queries/sorts on the following fields will be accelerated by this index:
      ● a
      ● a, b
      ● a, b, c
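The compound-index rule can be read as a prefix check. The sketch below is a deliberately simplified plain-JavaScript illustration of that rule of thumb; `usesIndexPrefix` is a hypothetical helper, and the real query planner considers more than this.

```javascript
// Simplified rule of thumb: a compound index helps when the fields used
// form a leading prefix of the index keys. Real query planning is more
// involved; this only illustrates the prefix idea.
function usesIndexPrefix(indexSpec, fields) {
  const indexKeys = Object.keys(indexSpec);
  const wanted = new Set(fields);
  if (wanted.size === 0 || wanted.size > indexKeys.length) return false;
  // The first N index keys must be exactly the N fields being used.
  return indexKeys.slice(0, wanted.size).every((k) => wanted.has(k));
}

const index = { a: 1, b: -1, c: 1 };
console.log(usesIndexPrefix(index, ["a"]));
console.log(usesIndexPrefix(index, ["a", "b"]));
console.log(usesIndexPrefix(index, ["b", "c"]));
```

Under this simplification, a query on b and c alone does not line up with the index, which is why the order of keys in ensureIndex deserves some thought.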
  55. 55. Remove/Drop data from MongoDB
      ● Remove
          db.many.remove({value: 5555})
          db.many.find({value: 5555})
          db.many.remove()
      ● Drop
          db.many.drop()
      ● Drop database
          db.dropDatabase()    EXTREMELY DANGEROUS!
  56. 56. How to update data in MongoDB
      Easiest way:
      s = db.students.findOne({_id: 1})
      s.registered = true
      db.students.save(s)
  57. 57. In place update - update()
      update( {find spec}, {update spec}, upsert=false )
      db.students.update( {_id: 1}, {$set: {registered: false}} )
  58. 58. Update a non-existent document
      db.students.update( {_id: 2}, {name: "Mary", age: 9}, true )
      db.students.update( {_id: 2}, {$set: {name: "Mary", age: 9}}, true )
  59. 59. set / unset field value
      db.students.update({_id: 1}, {$set: {"age": 15}})
      db.students.update({_id: 1}, {$set: {registered: {2012: false, 2011: true}}})
      db.students.update({_id: 1}, {$unset: {registered: 1}})
  60. 60. increase/decrease value
      db.students.update({_id: 1}, {
        $inc: {
          "grades.math": 1.1,
          "grades.english": -1.5,
          "grades.history": 3.0
        }
      })
  61. 61. push value(s) into array
      db.students.update({_id: 1}, { $push: {tags: "lazy"} })
      db.students.update({_id: 1}, { $pushAll: {tags: ["smart", "cute"]} })
  62. 62. add a value to an array only if it is not already there
      db.students.update({_id: 1}, { $push: {tags: "lazy"} })
      db.students.update({_id: 1}, { $addToSet: {tags: "lazy"} })
      db.students.update({_id: 1}, { $addToSet: {tags: {$each: ["tall", "thin"]}} })
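The difference between $push and $addToSet is easiest to see side by side. The sketch below imitates both on a plain JavaScript object; `push` and `addToSet` are helpers written for this illustration, not MongoDB functions.

```javascript
// In-memory sketch of array update semantics; not MongoDB code.
function push(doc, field, value) {
  // $push always appends, even if the value is already present.
  (doc[field] = doc[field] || []).push(value);
}

function addToSet(doc, field, values) {
  // $addToSet (with $each) appends only the values that are missing.
  const arr = (doc[field] = doc[field] || []);
  for (const v of values) {
    if (!arr.includes(v)) arr.push(v);
  }
}

const student = { _id: 1 };
push(student, "tags", "lazy");
push(student, "tags", "lazy"); // duplicate is kept
addToSet(student, "tags", ["lazy"]); // no change, already present
addToSet(student, "tags", ["tall", "thin"]);
console.log(student.tags);
```

Note that $addToSet does not clean up duplicates that are already stored; it only avoids adding new ones.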
  63. 63. remove value from array
      db.students.update({_id: 1}, { $pull: {tags: "lazy"} })
      db.students.update({_id: 1}, { $pull: {tags: {$ne: "smart"}} })
      db.students.update({_id: 1}, { $pullAll: {tags: ["lazy", "smart"]} })
  64. 64. pop value from array
      a = []; for(i=0;i<20;i++){a.push(i);}
      db.test.save({_id:1, value: a})
      db.test.update({_id: 1}, { $pop: {value: 1} })
      db.test.update({_id: 1}, { $pop: {value: -1} })
  65. 65. rename field
      db.test.update({_id: 1}, { $rename: {value: "values"} })
  66. 66. Practice: add comments to student
      Add a field to the student document ({_id: 1}):
      ● field name: comments
      ● field type: array of dictionaries
      ● field content:
        ○ { by: author name (string), text: content of comment (string) }
      ● add at least 3 comments to this field
  67. 67. Example answer to practice
      db.students.update({_id: 1}, {
        $addToSet: { comments: {$each: [
          {by: "teacher01", text: "text 01"},
          {by: "teacher02", text: "text 02"},
          {by: "teacher03", text: "text 03"}
        ]}}
      })
  68. 68. The $ positional operator (for arrays)
      db.students.update(
        { _id: 1, "comments.by": "teacher02" },
        { $inc: {"comments.$.vote": 1} }
      )
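What the $ stands for can be shown with an ordinary array lookup. The following plain-JavaScript sketch mimics the update on an in-memory document; `incVoteFor` is a hypothetical helper, and the comments reuse the practice data from the previous slide.

```javascript
// Sketch of what the positional "$" stands for: the index of the first
// array element matched by the query. Plain JavaScript, not MongoDB code.
function incVoteFor(doc, author) {
  const i = doc.comments.findIndex((c) => c.by === author);
  if (i === -1) return false; // nothing matched, nothing updated
  // "comments.$.vote" resolves to comments[i].vote; $inc creates the
  // field with the increment when it is missing.
  doc.comments[i].vote = (doc.comments[i].vote || 0) + 1;
  return true;
}

const student = {
  _id: 1,
  comments: [
    { by: "teacher01", text: "text 01" },
    { by: "teacher02", text: "text 02" },
    { by: "teacher03", text: "text 03" },
  ],
};

incVoteFor(student, "teacher02");
incVoteFor(student, "teacher02");
console.log(student.comments[1]);
```

Only the first matching element is touched, so the other comments keep no vote field at all.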
  69. 69. Atomically update - findAndModify
      ● Atomically updates a SINGLE DOCUMENT and returns it
      ● By default, the returned document won't contain the modifications made by the findAndModify command.
  70. 70. findAndModify parameters
      db.xxx.findAndModify({
        query: filter to query
        sort: how to sort and select the 1st document in the query results
        remove: set true if you want to remove it
        update: update content
        new: set true if you want to get the modified object
        fields: which fields to fetch
        upsert: create the object if it does not exist
      })
  71. 71. GridFS
      ● MongoDB has a 16MB document size limit
      ● GridFS is for storing larger binary objects in MongoDB
      ● GridFS is a kind of spec, not an implementation
      ● The implementation is done by MongoDB drivers
      ● Currently supported drivers:
        ○ PHP
        ○ Java
        ○ Python
        ○ Ruby
        ○ Perl
  72. 72. GridFS - command line tools
      ● List
          mongofiles list
      ● Put
          mongofiles put xxx.txt
      ● Get
          mongofiles get xxx.txt
  73. 73. MongoDB config - basic
      ● dbpath
        ○ The folder where MongoDB puts its database files
        ○ MongoDB must have write permission to this folder
      ● logpath, logappend
        ○ logpath = log filename
        ○ MongoDB must have write permission to the log file
      ● bind_ip
        ○ IP(s) MongoDB will bind to; by default, all
        ○ Use commas to separate more than one IP
      ● port
        ○ Port number MongoDB will use
        ○ Default port = 27017
  74. 74. Small tip - rotate MongoDB log
      db.getMongo().getDB("admin").runCommand("logRotate")
  75. 75. MongoDB config - journal
      ● journal
        ○ Turns journaling on/off
        ○ Usually you should keep this on
  76. 76. MongoDB config - http interface
      ● nohttpinterface
        ○ The HTTP interface listens on http://localhost:28017 by default; set this option to disable it
        ○ The interface shows statistics info over HTTP
      ● rest
        ○ Only takes effect when the HTTP interface is enabled
        ○ Example:
            http://localhost:28017/test/students/
            http://localhost:28017/test/students/?filter_name=John
  77. 77. MongoDB config - authentication
      ● auth
        ○ By default, MongoDB runs with no authentication
        ○ If no admin account has been created, you can log in without authentication through the local mongo shell and start managing user accounts.
  78. 78. MongoDB account management
      ● Add admin user
          > mongo localhost/admin
          db.addUser("testadmin", "1234")
      ● Authenticate as admin user
          use admin
          db.auth("testadmin", "1234")
  79. 79. MongoDB account management
      ● Add user to test database
          use test
          db.addUser("testrw", "1234")
      ● Add read-only user to test database
          db.addUser("testro", "1234", true)
      ● List users
          db.system.users.find()
      ● Remove user
          db.removeUser("testro")
  80. 80. MongoDB config - authentication
      ● keyFile
        ○ At least 6 characters, and smaller than 1KB in size
        ○ Used only for replica/sharding servers
        ○ Every replica/sharding server should use the same key file for communication
        ○ On Unix-like systems, the key file must grant no permissions to group/everyone, or MongoDB will refuse to start
  81. 81. MongoDB configuration - Replica Set
      ● replSet
        ○ Indicates the replica set name
        ○ All MongoDB instances in the same replica set should use the same name
        ○ Limitations
          ■ Maximum 12 nodes in a single replica set
          ■ Maximum 7 nodes can vote
        ○ A MongoDB replica set is eventually consistent
  82. 82. How does a MongoDB replica set work?
      ● Each replica set has a single primary (master) node and multiple slave nodes
      ● Data is only written to the primary node, then synced to the slave nodes.
      ● Use getLastError() to confirm that the previous write operation has been committed to the whole replica set; otherwise the write may be rolled back if the primary node goes down before the sync.
  83. 83. How does a MongoDB replica set work?
      ● Once the primary node is down, the replica set cannot accept writes until the remaining nodes vote and elect a new primary node.
      ● During failover, any write operation not committed to the whole replica set will be rolled back
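The voting step can be made concrete with a little arithmetic. The sketch below assumes that a candidate needs a strict majority of the voting members to become primary; `majority` and `canElectPrimary` are hypothetical helper names, and the code is plain JavaScript rather than a MongoDB API.

```javascript
// Assumption labeled here and in the lead-in: a candidate needs a strict
// majority of the voting members to become primary.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the members that are still reachable elect a primary?
function canElectPrimary(votingMembers, reachableVoters) {
  return reachableVoters >= majority(votingMembers);
}

console.log(majority(3)); // votes needed in a 3-node set
console.log(canElectPrimary(3, 2)); // one node down
console.log(canElectPrimary(3, 1)); // two nodes down
```

Under that assumption, an even number of voters buys no extra fault tolerance over the next smaller odd number, which is one reason arbiters are used.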
  84. 84. Simple replica set configuration
      mkdir -p /tmp/db01
      mkdir -p /tmp/db02
      mkdir -p /tmp/db03
      mongod --replSet test --port 29001 --dbpath /tmp/db01
      mongod --replSet test --port 29002 --dbpath /tmp/db02
      mongod --replSet test --port 29003 --dbpath /tmp/db03
  85. 85. Simple replica set configuration
      mongo localhost:29001
  86. 86. Another way to configure a replica set
      rs.initiate()
      rs.add("localhost:29001")
      rs.add("localhost:29002")
      rs.add("localhost:29003")
  87. 87. Extra options for setting replica set
      ● arbiterOnly
        ○ Arbiter nodes don't receive data and can't become primary, but they can vote.
      ● priority
        ○ A node with priority 0 will never be elected as primary.
        ○ Higher-priority nodes are preferred as primary
        ○ To force a node to become primary, don't change its votes; update its priority value and reconfigure the replica set.
      ● buildIndexes
        ○ Can only be set to false on nodes with priority 0
        ○ Use false for backup-only nodes
  88. 88. Extra options for setting replica set
      ● hidden
        ○ Nodes marked as hidden are not exposed to MongoDB clients.
        ○ Nodes marked as hidden do not receive queries.
        ○ Only use this option for nodes used for reporting, integration, backup, etc.
      ● slaveDelay
        ○ How many seconds the slave node should lag behind the primary node
        ○ Can only be set on nodes with priority 0
        ○ Used to guard against some human errors
  89. 89. Extra options for setting replica set
      ● votes
          If set to 1, this node can vote; if set to 0, it cannot.
  90. 90. Change primary node at runtime
      config = rs.conf()
      config.members[1].priority = 2
      rs.reconfig(config)
  91. 91. What is sharding?
      [Figure: a Name/Value table (Alice, Amy, Bob, ..., Yoko, Zeus) split by name into three ranges: A to F, G to N, O to Z]
  92. 92. MongoDB sharding architecture
  93. 93. Elements of MongoDB sharding cluster
      ● Config Server
          Stores the sharding cluster metadata
      ● mongos Router
          Routes database operations to the correct shard server
      ● Shard Server
          Holds the real user data
  94. 94. Sharding config - config server
      ● A config server is a MongoDB instance run with the --configsvr option
      ● Config servers are synced automatically by the mongos process, so DO NOT run them with the --replSet option
      ● The synchronous replication protocol is optimized for three machines.
  95. 95. Sharding config - mongos Router
      ● Use mongos (not mongod) to start a mongos router
      ● mongos routes database operations to the correct shard servers
      ● Example command for starting mongos
          mongos --configdb db01,db02,db03
      ● With the --chunkSize option, you can specify a smaller sharding chunk if you're just testing.
  96. 96. Sharding config - shard server
      ● A shard server is a MongoDB instance run with the --shardsvr option
      ● A shard server doesn't need to know where the config server / mongos router is
  97. 97. Example script for building a MongoDB shard cluster
      mkdir -p /tmp/s00
      mkdir -p /tmp/s01
      mkdir -p /tmp/s02
      mkdir -p /tmp/s03
      mongod --configsvr --port 29000 --dbpath /tmp/s00
      mongos --configdb localhost:29000 --chunkSize 1 --port 28000
      mongod --shardsvr --port 29001 --dbpath /tmp/s01
      mongod --shardsvr --port 29002 --dbpath /tmp/s02
      mongod --shardsvr --port 29003 --dbpath /tmp/s03
  98. 98. Sharding config - add shard server
      mongo localhost:28000/admin
      db.runCommand({addshard: "localhost:29001"})
      db.runCommand({addshard: "localhost:29002"})
      db.runCommand({addshard: "localhost:29003"})
      db.printShardingStatus()
      db.runCommand({enablesharding: "test"})
      db.runCommand({shardcollection: "test.shardtest", key: {_id: 1}, unique: true})
  99. 99. Let us insert some documents
      use test
      for (i=0; i<1000000; i++) {
        db.shardtest.insert({value: i});
      }
  100. 100. Remove 1 shard & see what happens
      use admin
      db.runCommand({removeshard: "shard0002"})
      Let's add it back
      db.runCommand({addshard: "localhost:29003"})
  101. 101. Pick your sharding key wisely
      ● The sharding key cannot be changed after sharding is enabled
      ● To update any document in a sharded cluster, the sharding key MUST BE INCLUDED in the find spec
      EX: sharding key = {name: 1, class: 1}
          db.xxx.update({name: "xxxx", class: "ooo"}, { ..... update spec })
  102. 102. Pick your sharding key wisely
      ● The sharding key strongly affects your data distribution model
      EX: sharding by ObjectId
          shard001 => data saved 2 months ago
          shard002 => data saved 1 month ago
          shard003 => data saved recently
  103. 103. Other sharding key examples
      EX: sharding by Username
          shard001 => Username starts with a to k
          shard002 => Username starts with l to r
          shard003 => Username starts with s to z
      EX: sharding by md5
          completely random distribution
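To see how a range-based key shapes data distribution, the username example can be simulated in plain JavaScript. The shard names and letter ranges below are taken from the slide; `shardFor` is a hypothetical routing helper, and a real cluster splits data into chunks that are balanced automatically rather than using fixed letter ranges.

```javascript
// Hypothetical shard names and ranges taken from the username example;
// real clusters split data into chunks that are balanced automatically.
const ranges = [
  { shard: "shard001", from: "a", to: "k" },
  { shard: "shard002", from: "l", to: "r" },
  { shard: "shard003", from: "s", to: "z" },
];

// Pick the shard whose range contains the first letter of the username.
function shardFor(username) {
  const first = username[0].toLowerCase();
  const hit = ranges.find((r) => first >= r.from && first <= r.to);
  return hit ? hit.shard : null;
}

const counts = {};
for (const name of ["Alice", "Amy", "Bob", "Yoko", "Zeus"]) {
  const shard = shardFor(name);
  counts[shard] = (counts[shard] || 0) + 1;
}
console.log(counts);
```

With these five names nothing lands on shard002, a small reminder that range keys only distribute well when the key values themselves are spread out.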
  104. 104. What is Mapreduce?
      ● Map, then Reduce
      ● Map is the step that calls a function to emit keys & values, which are sent to the reduce function
      ● Reduce is the step that calls a function to reduce the keys & values emitted by the map function into a single result.
      ● Example: map students' grades and reduce them into total grades across students.
  105. 105. How to call mapreduce in MongoDB
      db.xxx.mapReduce(
        map function,
        reduce function,
        {
          out: output option,
          query: query filter, optional,
          sort: sort filter, optional,
          finalize: finalize function,
          .... etc
        }
      )
  106. 106. Let's generate some data
      for (i=0; i<10000; i++){
        db.grades.insert({
          grades: {
            math: Math.random() * 100 % 100,
            art: Math.random() * 100 % 100,
            music: Math.random() * 100 % 100
          }
        });
      }
  107. 107. Prepare Map function
      function map(){
        for (k in this.grades){
          emit(k, {
            total: 1,
            pass: this.grades[k] >= 60.0 ? 1 : 0,
            fail: this.grades[k] < 60.0 ? 1 : 0,
            sum: this.grades[k],
            avg: 0
          });
        }
      }
  108. 108. Prepare reduce function
      function reduce(key, values){
        var result = {total: 0, pass: 0, fail: 0, sum: 0, avg: 0};
        values.forEach(function(value){
          result.total += value.total;
          result.pass += value.pass;
          result.fail += value.fail;
          result.sum += value.sum;
        });
        return result;
      }
  109. 109. Execute your 1st mapreduce call
      db.grades.mapReduce(map, reduce, {out: {inline: 1}})
  110. 110. Add finalize function
      function finalize(key, value){
        value.avg = value.sum / value.total;
        return value;
      }
  111. 111. Run mapreduce again with finalize
      db.grades.mapReduce(map, reduce, {out: {inline: 1}, finalize: finalize})
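The whole map, reduce and finalize flow can also be followed without a server. The sketch below is a plain-JavaScript imitation: the `mapReduce` helper is written only for this illustration (MongoDB's own implementation differs, for example it may call reduce more than once per key), `emit` is passed in as an argument instead of being a global, and the sample grades are fixed numbers rather than random ones.

```javascript
// In-memory imitation of the map -> reduce -> finalize flow. The
// mapReduce() helper below is a stand-in written for this sketch; it is
// not MongoDB's implementation (which may call reduce repeatedly).
function mapReduce(docs, map, reduce, finalize) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map.call(doc, emit));
  const out = {};
  for (const [key, values] of groups) {
    out[key] = finalize(key, reduce(key, values));
  }
  return out;
}

function map(emit) {
  for (const k in this.grades) {
    const g = this.grades[k];
    emit(k, { total: 1, pass: g >= 60 ? 1 : 0, fail: g < 60 ? 1 : 0, sum: g, avg: 0 });
  }
}

function reduce(key, values) {
  const result = { total: 0, pass: 0, fail: 0, sum: 0, avg: 0 };
  values.forEach((v) => {
    result.total += v.total;
    result.pass += v.pass;
    result.fail += v.fail;
    result.sum += v.sum;
  });
  return result;
}

function finalize(key, value) {
  value.avg = value.sum / value.total;
  return value;
}

// Fixed sample grades (the slides use random ones).
const docs = [
  { grades: { math: 80, art: 50 } },
  { grades: { math: 40, art: 70 } },
];
const stats = mapReduce(docs, map, reduce, finalize);
console.log(stats);
```

Computing the average in finalize rather than in reduce matters because reduce output can be fed back into reduce, and an average of averages would be wrong.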
  112. 112. Mapreduce output options
      ● {replace: <result collection name>}
          Replace the result collection if it already exists.
      ● {merge: <result collection name>}
          Merge into the existing result collection; keys that already exist are overwritten with the new results.
      ● {reduce: <result collection name>}
          Run reduce if the same key exists in both the old and current result collections. Will run the finalize function, if any.
      ● {inline: 1}
          Return the result in memory
  113. 113. Other mapreduce output options
      ● db - put the result collection in a different database
      ● sharded - the output collection will be sharded using key = _id
      ● nonAtomic - partial reduce results will be visible while processing.
  114. 114. MongoDB backup & restore
      ● mongodump
          mongodump -h localhost:27017
      ● mongorestore
          mongorestore -h localhost:27017 --drop
      ● mongoexport
          mongoexport -d test -c students -h localhost:27017 > students.json
      ● mongoimport
          mongoimport -d test -c students -h localhost:27017 < students.json
  115. 115. Conclusion - Pros of MongoDB
      ● Agile (schemaless)
      ● Easy to use
      ● Built-in replication & sharding
      ● Mapreduce with sharding
  116. 116. Conclusion - Cons of MongoDB
      ● Schemaless = everyone needs to know what the data looks like
      ● Wastes space on keys
      ● Eats lots of memory
      ● Mapreduce is hard to handle
  117. 117. Cautions of MongoDB
      ● Global write lock
        ○ Add more RAM
        ○ Use a newer version (MongoDB 2.2 now has a database-level write lock)
        ○ Split your database properly
      ● Removing documents won't free disk space
        ○ You need to run the compact command periodically
      ● Don't let your MongoDB data disk fill up
        ○ Once the disk used by MongoDB is full, you won't be able to move or delete documents in it.
