Mongo db

3,424 views

Published on

MongoDB is a popular NoSQL database. This presentation was delivered during a workshop.

First it talks about NoSQL databases, shift in their design paradigm, focuses a little more on document based NoSQL databases and tries drawing some parallel from SQL databases.

Second part, is for hands-on session of MongoDB using mongo shell. But the slides help very less.

At last it touches advance topics like data replication for disaster recovery and handling big data using map-reduce as well as Sharding.

Published in: Technology
0 Comments
9 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,424
On SlideShare
0
From Embeds
0
Number of Embeds
676
Actions
Shares
0
Downloads
125
Comments
0
Likes
9
Embeds 0
No embeds

No notes for slide

Mongo db

  1. 1. NoSQL Database Akshay Mathur Sarang Shravagi @akshaymathu, @_sarangs {name: ‘mongo’, type: ‘db’}
  2. 2. Who uses MongoDB @akshaymathu, @_sarangs 2
  3. 3. Let’s Know Each Other • Do you code? • OS? • Programing Language? • Why are you attending? @akshaymathu, @_sarangs 3
  4. 4. Akshay Mathur • Managed development, testing and release teams in last 14+ years – Currently Principal Architect at ShopSocially • Founding Team Member of – ShopSocially (Enabling “social” for retailers) – AirTight Neworks (Global leader of WIPS) @akshaymathu, @_sarangs 4
  5. 5. Sarang Shravagi • 10gen Certified Developer and DBA • CS graduate from PICT Pune • 3+ years in Software Product industry • Currently Senior Full-stack Developer at ShopSocially @akshaymathu, @_sarangs 5
  6. 6. How we use MongoDB @akshaymathu, @_sarangs 6 Python MongoDB MongoEngine
  7. 7. Where MongoDB Fits @akshaymathu, @_sarangs 7
  8. 8. Program Outline: Understanding NoSQL • Data Landscape • Different Storage Needs • Design Paradigm Shift from SQL to NoSQL • Different Datastores • Closer look to Document Storage • Drawing parallel from RDBMS @akshaymathu, @_sarangs 8
  9. 9. Program Outline: Hands on Lab • Installation and basic configuration • Mongo Shell • Creating and Changing Schema • Create, Read, Update and Delete of Data • Analyzing Performance • Improving performance by creating Indices • Assignment • Problem solving for the assignment @akshaymathu, @_sarangs 9
  10. 10. Program Outline: Advance Topics • Handling Big Data – Introduction to Map/Reduce – Introduction to Data Partitioning (Sharding) • Disaster Recovery – Introduction to Replica set and High Availability @akshaymathu, @_sarangs 10
  11. 11. Ground Rules • Disturb Everyone – Not by phone rings – Not by local talks – By more information and questions @akshaymathu, @_sarangs 11
  12. 12. Data Patterns & Storage Needs @akshaymathu, @_sarangs 12
  13. 13. Data at an Online Store • Product Information • User Information • Purchase Information • Product Reviews • Site Interactions • Social Graph • Search Index @akshaymathu, @_sarangs 13
  14. 14. SQL to NoSQL Design Paradigm Shift @akshaymathu, @_sarangs 14
  15. 15. SQL Storage • Was designed when – Storage and data transfer was costly – Processing was slow – Applications were oriented more towards data collection • Initial adopters were financial institutions @akshaymathu, @_sarangs 15
  16. 16. SQL Storage • Structured – schema • Relational – foreign keys, constraints • Transactional – Atomicity, Consistency, Isolation, Durability • High Availability through robustness – Minimize failures • Optimized for Writes • Typically Scale Up @akshaymathu, @_sarangs 16
  17. 17. NoSQL Storage • Is designed when – Storage is cheap – Data transfer is fast – Much more processing power is available • Clustering of machines is also possible – Applications are oriented towards consumption of User Generated Content – Better on-screen user experience is in demand @akshaymathu, @_sarangs 17
  18. 18. NoSQL Storage • Semi-structured – Schemaless • Consistency, Availability, Partition Tolerance • High Availability through clustering – expect failures • Optimized for Reads • Typically Scale Out @akshaymathu, @_sarangs 18
  19. 19. Different Datastores Half Level Deep @akshaymathu, @_sarangs 19
  20. 20. SQL: RDBMS • MySql, Postgresql, Oracle etc. • Stores data in tables having columns – Basic (number, text) data types • Strong query language • Transparent values – Query language can read and filter on them – Relationship between tables based on values • Suited for user info and transactions @akshaymathu, @_sarangs 20
  21. 21. NoSQL: Key/Value • Redis, DynamoDB etc. • Stores a values against a key – Strings • Values are opaque – Can not be part of query • Suited for site interactions @akshaymathu, @_sarangs 21
  22. 22. NoSQL: Key/Value
  23. 23. NoSQL: Document • MongoDB, CouchDB etc. • Object Oriented data models – Stores data in document objects having fields – Basic and compound (list, dict) data types • SQL like queries • Transparent values – Can be part of query • Suited for product info and its reviews @akshaymathu, @_sarangs 23
  24. 24. NoSQL: Document
  25. 25. NoSQL: Column Family • Cassandra, Big Table etc. • Stores data in columns • Transparent values – Can be part of query • SQL like queries • Suited for search @akshaymathu, @_sarangs 25
  26. 26. NoSQL: Column Family
  27. 27. NoSQL: Graph • Neo4j • Stores data in form of nodes and relationships • Query is in form of traversal • In-memory • Suited for social graph @akshaymathu, @_sarangs 27
  28. 28. NoSQL: Graph
  29. 29. Document Storage: Closer Look @akshaymathu, @_sarangs 30
  30. 30. MongoDB • Document database • Powerful query language • Docs, sub-docs, indexes • Map/reduce • Replicas, shards, replicated shards • SDKs/drivers for so many languages – C, C++, C#, Python, Erlang, PHP, Java, Javascript, NodeJS, Perl, Ruby, Scala @akshaymathu, @_sarangs 31
  31. 31. RDBMS: DB Design @akshaymathu, @_sarangs 32
  32. 32. RDBMS: Query @akshaymathu, @_sarangs 33
  33. 33. RDBMS  MongoDB RDBMS MongoDB Database Database Table Collection Row Document Column Field Select c1, c2 from Table where c1 = ‘v1’ order by c2 limit n Collection.objects(F1 = ‘v1’).order_by(‘c2’).limit(n) @akshaymathu, @_sarangs 34
  34. 34. MongoDB: Design @akshaymathu, @_sarangs 35
  35. 35. MongoDB: Query • Movies.objects() @akshaymathu, @_sarangs 36
  36. 36. @akshaymathu, @_sarangs 37
  37. 37. Have you Installed? http://www.mongodb.org/downloads @akshaymathu, @_sarangs
  38. 38. Hands-on Dive-in with Sarang @akshaymathu, @_sarangs 39
  39. 39. MongoDB: Core Binaries • mongod – Database server • mongo – Database client shell • mongos – Router for Sharding @akshaymathu, @_sarangs 40
  40. 40. Getting Help • For mongo shell – mongo –help • Shows options available for running the shell • Inside mongo shell – Object.help() • Shows commands available on the object @akshaymathu, @_sarangs 41
  41. 41. Import Export Tools • For objects – mongodump – mongorestore – bsondump – mongooplog • For data items – mongoimport – mongoexport @akshaymathu, @_sarangs 42
  42. 42. Database Operations • Database creation • Creating/changing collection • Data insertion • Data read • Data update • Creating indices • Data deletion • Dropping collection @akshaymathu, @_sarangs 43
  43. 43. Diagnostic Tools • mongostat • mongoperf • mongosnif • mongotop @akshaymathu, @_sarangs 44
  44. 44. @akshaymathu, @_sarangs 45
  45. 45. Assignment • Go to http://www.velocitainc.com/mongo/ – Tasks • assignments.txt – Data • students.json @akshaymathu, @_sarangs 46
  46. 46. Disaster Recovery Introduction to Replica Sets and High Availability @akshaymathu, @_sarangs 47
  47. 47. Disasters • Physical Failure – Hardware – Network • Solution – Replica Sets • Provide redundant storage for High Availability – Real time data synchronization • Automatic failover for zero down time @akshaymathu, @_sarangs 48
  48. 48. Replication @akshaymathu, @_sarangs 49
  49. 49. Multi Replication • Data can be replicated to multiple places simultaneously • Odd number of machines are always needed in a replica set @akshaymathu, @_sarangs 50
  50. 50. Single Replication • If you want to have only one or odd number of secondary, you need to setup an arbiter @akshaymathu, @_sarangs 51
  51. 51. Failover • When primary fails, remaining machines vote for electing new primary @akshaymathu, @_sarangs 52
  52. 52. Handling Big Data Introduction to Map/Reduce and Sharding @akshaymathu, @_sarangs 53
  53. 53. Large Data Sets • Problem 1 – Performance • Queries go slow • Solution – Map/Reduce @akshaymathu, @_sarangs 54
  54. 54. Map Reduce • A way to divide large query computation into smaller chunks • May run in multiple processes across multiple machines • Think of it as GROUP BY of SQL @akshaymathu, @_sarangs 55
  55. 55. Map/Reduce Example • Map function digs the data and returns required values @akshaymathu, @_sarangs 56
  56. 56. Map/Reduce Example • Reduce function uses the output of Map function and generates aggregated value @akshaymathu, @_sarangs 57
  57. 57. Large Data Sets • Problem 2 – Vertical Scaling of Hardware • Can’t increase machine size beyond a limit • Solution – Sharding @akshaymathu, @_sarangs 58
  58. 58. Sharding • A method for storing data across multiple machines • Data is partitioned using Shard Keys @akshaymathu, @_sarangs 59
  59. 59. Data Partitioning: Range Based • A range of Shard Keys stay in a chunk @akshaymathu, @_sarangs 60
  60. 60. Data Partitioning: Hash Bsed • A hash function on Shard Keys decides the chunk @akshaymathu, @_sarangs 61
  61. 61. Sharded Cluster @akshaymathu, @_sarangs 62
  62. 62. Optimizing Shards: Splitting • In a shard, when size of a chunk increases, the chunk is divided into two @akshaymathu, @_sarangs 63
  63. 63. Optimizing Shards: Balancing • When number of chunks in a shard increase, a few chunks are migrated to other shard @akshaymathu, @_sarangs 64
  64. 64. Summary • MongoDB is good – Stores objects as we use in programming language – Flexible semi-structured design – Scales out to store big data – Embedded documents eliminates need for join • MongoDB is bad – No multi-document query – De-normalized storage – No support for transactions @akshaymathu, @_sarangs 65
  65. 65. Thanks @akshaymathu, @_sarangs 66 @akshaymathu @_sarangs

×