MongoDB is a popular NoSQL database. This presentation was delivered during a workshop.

The first part covers NoSQL databases and the shift in design paradigm they represent, looks more closely at document-based NoSQL databases, and draws parallels with SQL databases.

The second part is a hands-on session with MongoDB using the mongo shell; the slides are of limited help for this part.

The last part touches on advanced topics: data replication for disaster recovery, and handling big data using map-reduce and sharding.

Transcript

  • 1. NoSQL Database Akshay Mathur Sarang Shravagi @akshaymathu, @_sarangs {name: ‘mongo’, type: ‘db’}
  • 2. Who uses MongoDB
  • 3. Let’s Know Each Other • Do you code? • OS? • Programming Language? • Why are you attending?
  • 4. Akshay Mathur • Managed development, testing and release teams over the last 14+ years – Currently Principal Architect at ShopSocially • Founding Team Member of – ShopSocially (Enabling “social” for retailers) – AirTight Networks (Global leader of WIPS)
  • 5. Sarang Shravagi • 10gen Certified Developer and DBA • CS graduate from PICT Pune • 3+ years in Software Product industry • Currently Senior Full-stack Developer at ShopSocially
  • 6. How we use MongoDB • Python • MongoDB • MongoEngine
  • 7. Where MongoDB Fits
  • 8. Program Outline: Understanding NoSQL • Data Landscape • Different Storage Needs • Design Paradigm Shift from SQL to NoSQL • Different Datastores • Closer look at Document Storage • Drawing parallels with RDBMS
  • 9. Program Outline: Hands on Lab • Installation and basic configuration • Mongo Shell • Creating and Changing Schema • Create, Read, Update and Delete of Data • Analyzing Performance • Improving performance by creating Indices • Assignment • Problem solving for the assignment
  • 10. Program Outline: Advanced Topics • Handling Big Data – Introduction to Map/Reduce – Introduction to Data Partitioning (Sharding) • Disaster Recovery – Introduction to Replica set and High Availability
  • 11. Ground Rules • Disturb Everyone – Not by phone rings – Not by local talks – By more information and questions
  • 12. Data Patterns & Storage Needs
  • 13. Data at an Online Store • Product Information • User Information • Purchase Information • Product Reviews • Site Interactions • Social Graph • Search Index
  • 14. SQL to NoSQL Design Paradigm Shift
  • 15. SQL Storage • Was designed when – Storage and data transfer were costly – Processing was slow – Applications were oriented more towards data collection • Initial adopters were financial institutions
  • 16. SQL Storage • Structured – schema • Relational – foreign keys, constraints • Transactional – Atomicity, Consistency, Isolation, Durability • High Availability through robustness – Minimize failures • Optimized for Writes • Typically Scale Up
  • 17. NoSQL Storage • Is designed when – Storage is cheap – Data transfer is fast – Much more processing power is available • Clustering of machines is also possible – Applications are oriented towards consumption of User Generated Content – Better on-screen user experience is in demand
  • 18. NoSQL Storage • Semi-structured – Schemaless • Consistency, Availability, Partition Tolerance • High Availability through clustering – expect failures • Optimized for Reads • Typically Scale Out
  • 19. Different Datastores Half Level Deep
  • 20. SQL: RDBMS • MySQL, PostgreSQL, Oracle etc. • Stores data in tables having columns – Basic (number, text) data types • Strong query language • Transparent values – Query language can read and filter on them – Relationship between tables based on values • Suited for user info and transactions
  • 21. NoSQL: Key/Value • Redis, DynamoDB etc. • Stores a value against a key – Strings • Values are opaque – Cannot be part of query • Suited for site interactions
  • 22. NoSQL: Key/Value
  • 23. NoSQL: Document • MongoDB, CouchDB etc. • Object Oriented data models – Stores data in document objects having fields – Basic and compound (list, dict) data types • SQL-like queries • Transparent values – Can be part of query • Suited for product info and its reviews
  • 24. NoSQL: Document
  • 25. NoSQL: Column Family • Cassandra, Bigtable etc. • Stores data in columns • Transparent values – Can be part of query • SQL-like queries • Suited for search
  • 26. NoSQL: Column Family
  • 27. NoSQL: Graph • Neo4j • Stores data in the form of nodes and relationships • Query is in the form of traversal • In-memory • Suited for social graph
  • 28. NoSQL: Graph
  • 29. Document Storage: Closer Look
  • 30. MongoDB • Document database • Powerful query language • Docs, sub-docs, indexes • Map/reduce • Replicas, shards, replicated shards • SDKs/drivers for many languages – C, C++, C#, Python, Erlang, PHP, Java, JavaScript, Node.js, Perl, Ruby, Scala
  • 31. RDBMS: DB Design
  • 32. RDBMS: Query
  • 33. RDBMS → MongoDB • Database → Database • Table → Collection • Row → Document • Column → Field • SQL: Select c1, c2 from Table where c1 = ‘v1’ order by c2 limit n • MongoEngine: Collection.objects(c1=‘v1’).order_by(‘c2’).limit(n)
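To make the table-to-collection mapping concrete, the sketch below runs the slide's SQL query with Python's standard `sqlite3` module, then applies the same filter, sort, and limit to a list of dicts standing in for documents. The table name `t`, the sample rows, and the `find` helper are made up for illustration; the document half is plain Python, not MongoEngine or a real MongoDB server.

```python
import sqlite3

# Hypothetical sample data: three rows with columns c1 and c2.
rows = [("v1", 30), ("v2", 10), ("v1", 20)]

# RDBMS side: rows live in a table and are queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
sql_result = conn.execute(
    "SELECT c1, c2 FROM t WHERE c1 = ? ORDER BY c2 LIMIT ?", ("v1", 1)
).fetchall()
conn.close()

# Document side: each row becomes a document (dict), each column a field.
collection = [{"c1": c1, "c2": c2} for c1, c2 in rows]


def find(docs, field, value, order_by, limit):
    """Filter on one field, sort by another, keep the first `limit` documents."""
    matching = [d for d in docs if d.get(field) == value]
    return sorted(matching, key=lambda d: d[order_by])[:limit]


doc_result = find(collection, "c1", "v1", order_by="c2", limit=1)

print(sql_result)
print(doc_result)
```

Both halves select the same record; only the container (tuple in a table versus dict in a collection) differs.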
  • 34. MongoDB: Design
  • 35. MongoDB: Query • Movies.objects()
  • 36. @akshaymathu, @_sarangs 37
  • 37. Have you Installed? http://www.mongodb.org/downloads
  • 38. Hands-on Dive-in with Sarang
  • 39. MongoDB: Core Binaries • mongod – Database server • mongo – Database client shell • mongos – Router for Sharding
  • 40. Getting Help • For mongo shell – mongo --help • Shows options available for running the shell • Inside mongo shell – Object.help() • Shows commands available on the object
  • 41. Import Export Tools • For objects – mongodump – mongorestore – bsondump – mongooplog • For data items – mongoimport – mongoexport
  • 42. Database Operations • Database creation • Creating/changing collection • Data insertion • Data read • Data update • Creating indices • Data deletion • Dropping collection
  • 43. Diagnostic Tools • mongostat • mongoperf • mongosniff • mongotop
  • 44. @akshaymathu, @_sarangs 45
  • 45. Assignment • Go to http://www.velocitainc.com/mongo/ – Tasks • assignments.txt – Data • students.json
  • 46. Disaster Recovery Introduction to Replica Sets and High Availability
  • 47. Disasters • Physical Failure – Hardware – Network • Solution – Replica Sets • Provide redundant storage for High Availability – Real time data synchronization • Automatic failover for zero down time
  • 48. Replication
  • 49. Multi Replication • Data can be replicated to multiple places simultaneously • An odd number of machines is always needed in a replica set
  • 50. Single Replication • If you want only one secondary, or any odd number of secondaries, you need to set up an arbiter
  • 51. Failover • When the primary fails, the remaining machines vote to elect a new primary
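The odd-member rule and the arbiter both follow from majority voting: a new primary can be elected only if more than half of the voting members are still reachable. The sketch below is a simplified illustration of that arithmetic, not MongoDB's actual election protocol; both function names are hypothetical.

```python
def can_elect_primary(total_members: int, reachable_members: int) -> bool:
    """A new primary needs votes from a strict majority of all voting members."""
    return reachable_members > total_members // 2


def tolerated_failures(total_members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    majority = total_members // 2 + 1
    return total_members - majority


for size in (2, 3, 4, 5):
    print(size, "members tolerate", tolerated_failures(size), "failure(s)")
```

A primary with a single secondary (two voting members) tolerates no failure, which is why an arbiter, a member that votes but stores no data, is added to make three. Going from three members to four adds no extra tolerance, so even-sized sets gain nothing.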
  • 52. Handling Big Data Introduction to Map/Reduce and Sharding
  • 53. Large Data Sets • Problem 1 – Performance • Queries become slow • Solution – Map/Reduce
  • 54. Map Reduce • A way to divide large query computation into smaller chunks • May run in multiple processes across multiple machines • Think of it as GROUP BY of SQL
  • 55. Map/Reduce Example • Map function digs through the data and returns the required values
  • 56. Map/Reduce Example • Reduce function uses the output of the Map function and generates an aggregated value
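The slide images with the original example are not part of this transcript, so here is a minimal stand-in in plain Python that shows the same two steps: a map function that emits a key and a value per document, and a reduce function that aggregates the values per key, much like `GROUP BY` with `SUM`. The purchase records and all function names are invented for illustration, and everything runs in a single process rather than across machines.

```python
from collections import defaultdict

# Hypothetical purchase documents.
purchases = [
    {"product": "book", "amount": 12},
    {"product": "pen", "amount": 3},
    {"product": "book", "amount": 8},
]


def map_purchase(doc):
    """Map step: emit one (key, value) pair for a document."""
    return doc["product"], doc["amount"]


def reduce_amounts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Group mapped values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        key, value = map_fn(doc)
        grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


totals = map_reduce(purchases, map_purchase, reduce_amounts)
print(totals)
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread over several workers.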
  • 57. Large Data Sets • Problem 2 – Vertical Scaling of Hardware • Can’t increase machine size beyond a limit • Solution – Sharding
  • 58. Sharding • A method for storing data across multiple machines • Data is partitioned using Shard Keys
  • 59. Data Partitioning: Range Based • A range of Shard Keys stays in a chunk
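Range-based partitioning can be sketched as a lookup in a sorted list of chunk boundaries. The boundaries and the function name below are assumptions chosen for the example; a real cluster keeps this mapping in its own metadata.

```python
from bisect import bisect_right

# Hypothetical chunk boundaries: chunk 0 holds keys below 100, chunk 1 holds
# 100..199, chunk 2 holds 200..299, and chunk 3 holds everything from 300 up.
boundaries = [100, 200, 300]


def range_chunk_for_key(shard_key: int) -> int:
    """Return the index of the chunk whose key range contains shard_key."""
    return bisect_right(boundaries, shard_key)


for key in (42, 100, 250, 1000):
    print(key, "-> chunk", range_chunk_for_key(key))
```

Neighbouring keys land in the same chunk, which keeps range queries local but can overload one chunk when keys only ever increase.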
  • 60. Data Partitioning: Hash Based • A hash function on Shard Keys decides the chunk
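For comparison, a hash-based placement can be sketched as follows. The choice of SHA-256, the chunk count, and taking the hash modulo the number of chunks are simplifications made for this example and are not a description of MongoDB's internal hashing.

```python
import hashlib

NUM_CHUNKS = 4  # hypothetical number of chunks


def hashed_chunk_for_key(shard_key: str, num_chunks: int = NUM_CHUNKS) -> int:
    """Hash the shard key and map the hash onto one of num_chunks chunks."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_chunks


for key in ("user-1", "user-2", "user-3"):
    print(key, "-> chunk", hashed_chunk_for_key(key))
```

The same key always maps to the same chunk, while neighbouring keys tend to scatter, which spreads writes evenly at the cost of efficient range queries.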
  • 61. Sharded Cluster
  • 62. Optimizing Shards: Splitting • In a shard, when a chunk grows too large, the chunk is divided into two
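A minimal sketch of splitting, assuming a chunk is just a list of shard keys and "too large" means more than a fixed number of keys. The limit and the function name are hypothetical, and a real system measures chunk size in stored data rather than in key count.

```python
MAX_CHUNK_KEYS = 4  # hypothetical size limit for one chunk


def split_if_needed(chunk: list) -> list:
    """Return [chunk] if it is small enough, else two halves split at the median key."""
    if len(chunk) <= MAX_CHUNK_KEYS:
        return [chunk]
    ordered = sorted(chunk)
    middle = len(ordered) // 2
    return [ordered[:middle], ordered[middle:]]


print(split_if_needed([1, 2]))
print(split_if_needed([5, 1, 4, 2, 3, 6]))
```

Splitting at the median keeps the two new chunks about the same size, and each still covers one contiguous key range.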
  • 63. Optimizing Shards: Balancing • When the number of chunks in a shard increases, a few chunks are migrated to another shard
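Balancing can be pictured as repeatedly moving one chunk from the fullest shard to the emptiest one until the counts are close. The sketch below assumes a non-empty dict of shard names to chunk lists and a made-up threshold; it ignores migration cost and is not MongoDB's actual balancer logic.

```python
def balance(shards: dict, threshold: int = 2) -> dict:
    """Move chunks from the fullest shard to the emptiest until the
    difference in chunk counts is below threshold. Returns a new dict."""
    shards = {name: list(chunks) for name, chunks in shards.items()}
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[fullest]) - len(shards[emptiest]) < threshold:
            return shards
        # Migrate a single chunk, then re-evaluate the whole cluster.
        shards[emptiest].append(shards[fullest].pop())


cluster = {"A": ["c1", "c2", "c3", "c4", "c5"], "B": ["c6"], "C": []}
balanced = balance(cluster)
print({name: len(chunks) for name, chunks in balanced.items()})
```

The input dict is copied first, so the original `cluster` is left unchanged and can be compared with the result.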
  • 64. Summary • MongoDB is good – Stores objects as we use them in a programming language – Flexible semi-structured design – Scales out to store big data – Embedded documents eliminate the need for joins • MongoDB is bad – No multi-document query – De-normalized storage – No support for transactions
  • 65. Thanks @akshaymathu @_sarangs