
MongoDB Basic Concepts


  1. 1. MongoDB Basic Concepts. Norberto Leite, Senior Solutions Architect, 10gen
  2. 2. Agenda • Overview • Replication • Scalability • Consistency & Durability • Flexibility / Developer Experience
  3. 3. But first ...
  4. 4. Happy Hanukkah!!!
  5. 5. Who’s this guy?
  6. 6. Norberto Leite, Senior Solutions Architect • @nleite / norberto@10gen.com
  7. 7. Norberto Leite, Senior Solutions Architect • @nleite / norberto@10gen.com • Barcelona
  8. 8. Norberto Leite, Senior Solutions Architect • @nleite / norberto@10gen.com • Barcelona • Love MongoDB
  9. 9. Norberto Leite, Senior Solutions Architect • @nleite / norberto@10gen.com • Barcelona • Love MongoDB • and others ...
  10. 10. Your Data
  11. 11. Fundamentals (diagram): Application • Document Oriented: { name: 'Norberto Leite', position: 'SA', nick: 'WingMan', based: ['Barcelona', 'London'] } • High Performance • Horizontal Scalability (mongoDB, mongoDB, mongoDB, mongoDB) • Fully Consistent
  12. 12. Replication
  13. 13. Why do we need Replication? • Failover • Backups • Secondary Batch Jobs • High Availability
  14. 14. Outages • Planned – Hardware upgrade – OS or file-system tuning – Software upgrade – Relocation of data to new file-system / storage • Unplanned – Human Error – Hardware Failure – Data Center / Region Outage – Application Corruption
  15. 15. Replica Sets • Data Protection – Multiple copies of data – Data spread across data centers, AZs, etc. • High Availability – Automated Failover – Automated Recovery
  16. 16. Asynchronous Replication (diagram): App • Primary: Write, Read (default) • Secondary: Read (optional) • Secondary: Read (optional)
  17. 17. Failover (diagram): App • Primary: Write, Read (default) • Secondary: Read (optional) • Secondary: Read (optional)
  18. 18. Automatic Failover (diagram): Primary Election • App • Primary • Primary: Write, Read (default) • Secondary: Read (optional)
  19. 19. Automatic Recovery (diagram): App • Recovery, Secondary: Read (optional) • Primary: Write, Read (default) • Secondary: Read (optional)
  20. 20. Sharding
  21. 21. Sharding • Data Location Transparent to Code • Data Distribution is Automatic – as well as re-distribution • Aggregates System Resources Horizontally • No CODE Changes!!!
  22. 22. Range Distribution: sh.shardCollection("test.tweets", {_id: 1}, false) (diagram: shard01: a-i • shard02: j-m • shard03: n-z)
  23. 23. Chunk Split (diagram): shard01, shard02, shard03 • chunk labels: a-i, ja-jz, j-m, n-z, ka-kj, k-m, ki-m
  24. 24. Auto Balancing (diagram): shard01, shard02, shard03 • chunk labels: a-i, ja-jz, j-m, n-z, ka-kj, ki-m
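
The Auto Balancing slide can be pictured with a toy balancer. This is only a sketch under stated assumptions: the function name `balance`, the threshold of 2, and the placement of the three split chunks on shard02 are hypothetical, and the real balancer's thresholds and migration steps are more involved.

```javascript
// Toy balancer: repeatedly move one chunk from the fullest shard to the
// emptiest shard until their chunk counts differ by less than `threshold`.
// NOTE: this mutates the `shards` object that is passed in.
function balance(shards, threshold = 2) {
  const moves = [];
  while (true) {
    // Shard names ordered from most chunks to fewest.
    const names = Object.keys(shards).sort(
      (a, b) => shards[b].length - shards[a].length
    );
    const from = names[0];
    const to = names[names.length - 1];
    if (shards[from].length - shards[to].length < threshold) break;
    const chunk = shards[from].pop();
    shards[to].push(chunk);
    moves.push({ chunk, from, to });
  }
  return moves;
}

// Assumed starting layout: the split chunks all sit on shard02.
const cluster = {
  shard01: ["a-i"],
  shard02: ["ja-jz", "ka-kj", "ki-m"],
  shard03: ["n-z"],
};

const moves = balance(cluster);
console.log(moves);
console.log(cluster);
```

The application never takes part in these moves, which is the point of the "No CODE Changes" bullet.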
  25. 25. Routed Queries: db.tweets.find({_id: 'norberto'}) (diagram: shard01, shard02, shard03 • chunk labels: a-i, ja-jz, j-m, n-z, ka-kj, ki-m)
  26. 26. Scatter Gather: db.tweets.find({email: 'norberto@10gen'}) (diagram: shard01, shard02, shard03 • chunk labels: a-i, ja-jz, j-m, n-z, ka-kj, ki-m)
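
The difference between a routed query and a scatter-gather query can be sketched in plain JavaScript. This is an illustration, not how the query router is implemented: the `chunks` table and the `targetShards` function are hypothetical, and matching on the first letter of `_id` is a simplification of comparing full shard-key values against chunk bounds.

```javascript
// Hypothetical chunk map mirroring the Range Distribution slide.
const chunks = [
  { min: "a", max: "i", shard: "shard01" },
  { min: "j", max: "m", shard: "shard02" },
  { min: "n", max: "z", shard: "shard03" },
];

const SHARD_KEY = "_id"; // the key passed to sh.shardCollection on the slide

// Return the list of shards a query has to visit.
function targetShards(query) {
  if (!(SHARD_KEY in query)) {
    // No shard key in the query: every shard must be asked (scatter gather).
    return [...new Set(chunks.map((c) => c.shard))];
  }
  // Shard key present: find the one chunk whose range covers the value.
  const first = String(query[SHARD_KEY]).charAt(0).toLowerCase();
  const owner = chunks.find((c) => first >= c.min && first <= c.max);
  return owner ? [owner.shard] : [];
}

console.log(targetShards({ _id: "norberto" }));          // routed to one shard
console.log(targetShards({ email: "norberto@10gen" }));  // sent to all shards
```

Queries that include the shard key touch a single shard; everything else fans out, which is why the choice of shard key matters for read scalability.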
  27. 27. Caching (diagram): shard01 with 96 GB Mem holding ranges a-i, j-r, n-z • 300 GB Data • 3:1 Data/Mem
  28. 28. Horizontal Distribution (diagram): shard01, shard02, shard03, each with 96 GB Mem and 100 GB of the 300 GB Data • 1:1 Data/Mem per shard • ranges a-i, j-r, n-z
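
The two memory slides come down to one division. A minimal sketch, using the figures from the slides (300 GB of data, 96 GB of memory per shard); the function name `dataToMemRatio` is made up for illustration and assumes data is spread evenly across shards.

```javascript
// Data-to-memory ratio per shard, assuming an even spread of data.
function dataToMemRatio(totalDataGB, memPerShardGB, shardCount) {
  const dataPerShardGB = totalDataGB / shardCount;
  return dataPerShardGB / memPerShardGB;
}

const singleShard = dataToMemRatio(300, 96, 1); // roughly 3:1, data exceeds memory
const threeShards = dataToMemRatio(300, 96, 3); // roughly 1:1, data about fits in memory

console.log(singleShard, threeShards);
```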
  29. 29. Consistency and Durability
  30. 30. Consistency • Eventual Consistency – Allow updates when a system has been partitioned – Resolve conflicts later – Example: Cassandra, CouchDB • Immediate Consistency – Single Master – Avoids conflicts – Example: MongoDB
  31. 31. Durability • For how long is my data available? • When do I know my data is safe?! • Where is it safe? • MongoDB style: – Fire and Forget – Get Last Error – Journal Sync – Replica Safe
  32. 32. Durability (diagram): Memory • Journal • Secondary Nodes • Multiple Data Centers • RDBMS • Async • w=1 (default) • j=true • w=majority • w="tag"
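
The durability options on the slide can be modelled as a check on how far a write has travelled before the application is told it succeeded. This is a simplified model, not driver code: `isAcknowledged` and the shape of the `state` object are hypothetical, tagged write concerns (w="tag") are left out, and real servers differ in detail between versions.

```javascript
// Decide whether a write may be acknowledged under a given write concern.
// concern: { w: 0 | 1 | 2 | ... | "majority", j: boolean }
// state:   hypothetical snapshot of the write's progress in a replica set.
function isAcknowledged(concern, state) {
  const { w = 1, j = false } = concern;
  if (w === 0) return true; // fire and forget: never wait
  if (w !== "majority" && !Number.isInteger(w)) {
    throw new Error("tagged write concerns are not modelled in this sketch");
  }
  // j=true: additionally wait for the primary's journal.
  if (j && !state.journaledOnPrimary) return false;
  // w=majority: more than half of the members must have applied the write.
  const needed =
    w === "majority" ? Math.floor(state.memberCount / 2) + 1 : w;
  return state.appliedOn >= needed;
}

// A write that has only reached the primary's memory.
const justWritten = { memberCount: 3, appliedOn: 1, journaledOnPrimary: false };
// The same write after journaling and replication to one secondary.
const replicated = { memberCount: 3, appliedOn: 2, journaledOnPrimary: true };

console.log(isAcknowledged({ w: 1 }, justWritten));          // true
console.log(isAcknowledged({ w: 1, j: true }, justWritten)); // false
console.log(isAcknowledged({ w: "majority" }, justWritten)); // false
console.log(isAcknowledged({ w: "majority" }, replicated));  // true
```

Each step to the right on the slide trades write latency for a stronger guarantee about where the data is safe.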
  33. 33. Flexibility
  34. 34. Data Model • Why JSON? – Well understood data format – Maps simply to objects – Linking & Embedding to describe relationships
  35. 35. JSON: place1 = { name : "10gen HQ", address : "578 Broadway 7th Floor", city : "New York", zip : "10011", tags : [ "business", "tech" ] }
  36. 36. Relational Way
  37. 37. MongoDB Way: embedding, linking
  38. 38. JSON & Scale Out • Embedding removes the need for: – Distributed Joins – Two Phase Commit • Enables data to be distributed across many nodes without penalty
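
Embedding and linking can be contrasted with two small documents. A sketch only: `place1` comes from the JSON slide, while the `checkins` field, the `people` lookup table, and `resolveOwner` are invented for illustration.

```javascript
// Embedding: related data lives inside the document itself, so one read
// returns everything and no join across nodes is needed.
const place1 = {
  name: "10gen HQ",
  address: "578 Broadway 7th Floor",
  city: "New York",
  zip: "10011",
  tags: ["business", "tech"],
  // Hypothetical embedded sub-documents.
  checkins: [{ user: "nleite", when: "2012-12-10" }],
};

// Linking: the document stores only the _id of another document, and the
// application resolves it with a second lookup.
const people = new Map([
  ["nleite", { _id: "nleite", name: "Norberto Leite" }],
]);
const place2 = { name: "Barcelona office", owner_id: "nleite" };

function resolveOwner(place) {
  return people.get(place.owner_id);
}

console.log(place1.checkins[0].user);   // available without a second lookup
console.log(resolveOwner(place2).name); // requires following the link
```

Embedding suits data that is read together with its parent; linking suits data that is shared between many documents or grows without bound.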
