Introduction to MongoDB


Published on

Published in: Education
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • Introduction to MongoDB

    1. 1. Slide 1 Introduction to MongoDB® View Mongo DB ® Course at
    2. 2. Slide 2Slide 2 @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions Why is a NoSQL database needed? Benefits of NoSQL over RDBMS dbs Comparing NoSQL vs SQL dbs How MongoDB® solves the problem Use cases of MongoDB® Job opportunity and trends on MongoDB® For Queries during the session and class recording: Post on Twitter @edurekaIN: #askEdureka Post on Facebook /edurekaIN Objectives of this Session
    3. 3. Slide 3 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 3 RDBMS/OLTP/Real Time NoSQL/New SQL/BigData DSS/OLAP/DW Oracle MySQL MS SQL DB2 Netezza SAP Hana Oracle Express Mongo DB® HBase Cassandra Couch DB Database Categories
    4. 4. Slide 4Slide 4 @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions Why NoSQL Volume - 15 - 20 petabyte data in Govt. of India “AADHAR” project, NYSE, Facebook Velocity - Fastest growing app, Facebook, Jet, Video stream Variety - Data types e.g. Images, Videos, Text etc
    5. 5. Slide 5Slide 5 @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions Horizontal Scaling (scale out) versus Vertical Scaling (scale up)  Scale-out - Cache dependent ‘Read’ and ‘Write’ operations  Scale-up - Limited by Memory and Processing (CPU) capabilities  Complex RDBMS model – Parsing, Locking, Logging, Buffer pool, Threads etc.  Monitoring and managing the distributed architecture  Agility with fastest changing world Why NoSQL
    6. 6. Slide 6 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 6 Structured Data Text, Log Files, Click Streams, Blogs, Tweets, Audio, Video, etc Unstructured and Semi–structured Data Data Growth Pattern
    7. 7. Slide 7 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 7 NoSQL to the Rescue Distributed Architecture: easy to scale-out, easy to manage large number of nodes First Reads: Satisfying ‘write once, read many times’ behaviour Easy Replication: The failover solution Higher Performance: An architecture providing much higher per-node performance than the traditional SQL-based databases Schema Free: Data model with dynamic and flexible schema Fast Development: No need of the additional ORM layer To achieve the same, we compromise on:  No joins  No ACID transaction
    8. 8. Slide 8 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 8 NoSQL to the Rescue
    9. 9. Slide 9 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 9 CAP We must understand the CAP theorem when we talk about NoSQL databases or in fact when designing any distributed system. CAP theorem states that there are 3 basic requirements which exist in a special relation when designing applications for a distributed architecture. Consistency Availability Partition Tolerance CAP Theorem This means that the system is always on (guaranteed service availability), no downtime. This means that the system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another. This means that the data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
    10. 10. Slide 10 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 10  CAP provides the basic requirements for a distributed system to follow 2 of the 3 requirements.  Theoretically it is impossible to fulfill all 3 requirements.  Therefore, all the current NoSQL database follows the different combinations of the C, A, P from the CAP theorem. CAP Theorem and NoSQL Databases  CA - Single site cluster, therefore all nodes are always in contact. When a partition occurs, the system blocks.  CP - Some data may not be accessible, but the rest is still consistent/accurate.  AP - System is still available under partitioning, but some of the data returned may be inaccurate.
    11. 11. Slide 11 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 11  Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem. Basically Available  Soft State indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model. Soft State  Eventual Consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time. Eventual Consistency A BASE system gives up on consistency. NoSQL Database - A BASE not ACID System
    12. 12. Slide 12 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 12 ~ 150 No SQL Database are there in Market ~150 NoSQL Database – Not a Panacea
    13. 13. Slide 13 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 13 Graph Store Key – value Stores Wide Column Stores% Document Base  Document databases pair each key with a complex data structure known as a document.  Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.  Graph stores are used to store information about networks, such as social connections.  Graph stores include Neo4J and HyperGraphDB.  Key-value stores are the simplest NoSQL databases.  Every single item in the database is stored as an attribute name (or "key"), together with its value.  Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows. Categories of NoSQL Database
    14. 14. Slide 14 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 14 Key Value Store Memcached Coherence Redis Tabular Big Table Hbase Accumulo Document Oriented MongoDB® Couch DB Cloudant Graph Stores Neo4J Oracle NoSQL HyperGraphDB Types of NoSQL Databases
    15. 15. Slide 15 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 15 Compromising Features of RDBMS Step 2 Step 3 Selecting a NoSQL Database Step 1 Right Data Model Pros and Cons of Consistency
    16. 16. Slide 16 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 16 1000 TPS Caching Layer 300 ~ 500 SQL Transaction 100 ~ 200 SQL Transaction 1000 TPS WEB APPLICATION RDBMS1 Applications Changing Data RDBMS1 A Traditional Database Solution
    17. 17. Slide 17 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 17 WEB APPLICATION 5000 TPS A NoSQL Database Solution Applications Changing Data Application grows with user base and data volume 5000 TPS
    18. 18. Slide 18 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 18 Suppose a client needs database design for his blog or website with below features, then what could be the difference between RDBMS and MongoDB® schema design approach? Website has the following requirements. Every post has the unique title, description and url. Every post can have one or more tags. Every post has the name of its publisher and total number of likes. Every post has comments given by users along with their name, message, data-time and likes. On each post there can be zero or more comments. Data Modeling Example in RDBMS and MongoDB ®
    19. 19. Slide 19 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 19 In RDBMS schema design for above requirements will have minimum 3 tables. posts id title description url likes post_by tags id post_id tag comments Comment_id post_id by_user message date_time likes ∞ 1 1 ∞ Data Modeling Example in RDBMS and MongoDB®
    20. 20. Slide 20 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 20 While in MongoDB® schema design will have one collection post and has the following structure. So while showing the data, in RDBMS we need to join three tables and in Mongodb® data will be shown from one collection only. { _id: POST_ID title: TITLE_OF_POST, description: POST_DESCRIPTION, by: POST_BY, url: URL_OF_POST, tags: [ TAG1, TAG2, TAG3], likes: TOTAL_LIKES, comments: [ { user:'COMMENT_BY', message: TEXT, dateCreated: DATE_TIME, like: LIKES }, { user:'COMMENT_BY', message: TEXT, dateCreated: DATE_TIME, like: LIKES } ] } Data Modeling Example in RDBMS and MongoDB
    21. 21. Slide 21 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 21 Where to Use MongoDB®? RDBMS replacement for Web Applications Semi-structured Content Management Real-time Analytics & High-Speed Logging Caching and High Scalability Web 2.0, Media, SAAS, Gaming
    22. 22. Slide 22 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 22  Metlife uses MongoDB® for “The Wall”, an innovative customer service application which provides a 360-degree, consolidated view of MetLife customers, including policy details and transactions across lines of business.  ebay has a number of projects running on MongoDB® for search suggestions, metadata storage, cloud management and merchandizing categorization.  MongoDB® is the repository that powers MTV Networks’ next-generation CMS, which is used to manage and distribute content for all of MTV Networks’ major websites.  MongoDB® is used for back-end storage on the SourceForge front pages, project pages, and download pages for all projects.  Craigslist uses MongoDB® to archive billions of records.  ADP uses MongoDB® for its high performance, scalability, reliability and its ability to preserve the data manipulation capabilities of traditional relational databases. Real World Use Cases of MongoDB
    23. 23. Slide 23 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 23  CNN Turk uses MongoDB ® for its infrastructure and content management system, including the  Foursquare uses MongoDB ® to store venues and user ‘check-ins’ into venues, sharding the data over more than 25 machines on Amazon EC2.  is the easy, fun, and fast way to share live video online. MongoDB ® powers’s internal analytics tools for virality, user retention, and general usage stats that out-of-the-box solutions can’t provide.  ibibo (‘I build, I bond’) is a social network using MongoDB ® for its dashboard feeds. Each feed is represented as a single document containing an average of 1000 entries; the site currently stores over two million of these documents in MongoDB ® . Real World Use Cases of MongoDB ®
    24. 24. Slide 24 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 24 Financial Services  Risk Analytics and Reporting  Reference Data Management  Market Data Management  Portfolio Management  Order Capture  Time Series Data Government  Surveillance Data Aggregation  Crime Data Management and Analytics  Citizen Engagement Platform  Program Data Management  Healthcare Record Management Industry/Domains where MongoDB® is Used
    25. 25. Slide 25 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 25 Health Care  360-degree Patient View  Population Management for At-risk Demographics  Lab Data Management and Analytics  Mobile Apps for Doctors and Nurses  Electronic Healthcare Records (EHR) Media and Entertainment  Content Management and Delivery  User Data Management  Digital Asset Management  Mobile and Social Apps  Content Archiving Industry/Domains where MongoDB® is Used
    26. 26. Slide 26 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 26 Retail  Rich Product Catalogs  Customer Data Management  New Services  Digital Coupons  Real-time Price Optimization Telecommunication  Consumer Cloud  Product Catalog  Customer Service Improvement  Machine-to-Machine (M2M) Platform  Real-time Network Analysis and Optimization Industry/Domains where MongoDB® is Used
    27. 27. Slide 27 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 27 How Popular is MongoDB® in the Industry?  Google search provides a wide indicator of MongoDB® adoption in the industry.
    28. 28. Slide 28 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 28 Job Opportunity and Trends in MongoDB®
    29. 29. Slide 29 Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for QuestionsSlide 29 Where Not to Use MongoDB®? Highly transactional applications Applications with traditional database system where requirements such as foreign-key constraints etc are needed.
    30. 30. Slide 30 Questions? Buy MongoDB Courses at : Twitter @edurekaIN, Facebook /edurekaIN, use #askEdureka for Questions