When to Use MongoDB

Engineers often ask, "How do I know if I should build my application on MongoDB?" IT executives ask a similar question: "Which applications in my application portfolio should I migrate to MongoDB?" This presentation offers a framework for answering these questions.

We will cover two sets of criteria: (1) when to migrate a legacy application to MongoDB and (2) when to use MongoDB for new applications. The presentation also includes a brief introduction to MongoDB, providing enough technical background to analyze when to use it.

Learning Objectives:
The basics of the MongoDB document model, query capabilities, and architecture required for analyzing when to use MongoDB
Criteria for determining when to use MongoDB to re-platform legacy applications
Criteria for determining when to use MongoDB for new applications

  1. 1. WHEN TO USE MONGODB • Jay Runkel • Principal Solutions Architect • jay.runkel@mongodb.com • @jayrunkel
  2. 2. AGENDA • When to use MongoDB? Are we asking the right question? • Why MongoDB? • Evaluating Use Case Suitability for MongoDB • When you shouldn’t use MongoDB?
  3. 3. WHEN TO USE MONGODB?
  4. 4. TRANSPORTATION ?
  5. 5. CLEARING A FOREST ?
  6. 6. MODERN APPLICATION RDBMS MongoDB ?
  7. 7. SHOULD WE USE MONGODB?
  8. 8. TRADITIONAL RDBMS SYSTEMS WEREN’T DESIGNED FOR TODAY’S WORLD • Legacy (Relational Model, Scale-up) ‒ Rigid Schemas: resistant to change; data changes constantly, which fits poorly with a relational model ‒ Throughput & Cost make Scale-Up Impractical: scale-up clusters were never meant to handle today’s volumes • Today (Flexible Model, JSON, Scale-out) ‒ Flexible Multi-Structured Schema that is designed to adapt to changes ‒ Scale-out to the end of the world and distribute data where it needs to be
  9. 9. BEING SUCCESSFUL WITH MONGODB While the detailed definition of success metrics looks different for each customer, 2 key factors are consistent across all of our engagements: • 5x Productivity*: We help our customers to increase overall output, e.g. in terms of development or ops productivity. • 80% Cost reduction*: We help our customers to dramatically lower their total cost of ownership for data storage and analytics by up to 80%. * Dependent on type of implementation
  10. 10. SHOULD WE USE MONGODB?
  11. 11. CAN WE USE MONGODB? • If we get ‒ 5x developer productivity ‒ 80% cost reduction • Shouldn’t we consider this alternative first? (Flow chart: Assess MongoDB fit -> MongoDB? -> yes: Build in MongoDB; no: Look at alternatives)
  12. 12. SHOULD CAN WE USE MONGODB?
  13. 13. WHY MONGODB?
  14. 14. RELATIONAL • Expressive Query Language & Secondary Indexes • Strong Consistency • Enterprise Management & Integrations
  15. 15. NOSQL: Scalability & Performance • Always On, Global Deployments • Flexibility. RELATIONAL: Expressive Query Language & Secondary Indexes • Strong Consistency • Enterprise Management & Integrations
  16. 16. NEXUS ARCHITECTURE • Scalability & Performance • Always On, Global Deployments • Flexibility • Expressive Query Language & Secondary Indexes • Strong Consistency • Enterprise Management & Integrations
  17. 17. THAT’S NICE JAY, BUT… • Where does the developer productivity come from? • What about the TCO savings?
  18. 18. DEVELOPER PRODUCTIVITY
  19. 19. DOCUMENT DATA MODEL Relational MongoDB { first_name: ‘Paul’, surname: ‘Miller’, city: ‘London’, location: [45.123,47.232], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] }
  20. 20. DOCUMENTS ARE RICH DATA STRUCTURES { first_name: ‘Paul’, surname: ‘Miller’, cell: 447557505611, city: ‘London’, location: [45.123,47.232], Profession: [‘banking’, ‘finance’, ‘trader’], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } Callouts: Fields • Typed field values • Fields can contain arrays • Fields can contain an array of sub-documents
  21. 21. DOCUMENTS ARE FLEXIBLE Documents in the same product catalog collection in MongoDB { product_name: ‘Acme Paint’, color: [‘Red’, ‘Green’], size_oz: [8, 32], finish: [‘satin’, ‘eggshell’] } { product_name: ‘T-shirt’, size: [‘S’, ‘M’, ‘L’, ‘XL’], color: [‘Heather Gray’ … ], material: ‘100% cotton’, wash: ‘cold’, dry: ‘tumble dry low’ } { product_name: ‘Mountain Bike’, brake_style: ‘mechanical disc’, color: ‘grey’, frame_material: ‘aluminum’, no_speeds: 21, package_height: ‘7.5x32.9x55’, weight_lbs: 44.05, suspension_type: ‘dual’, wheel_size_in: 26 }
  22. 22. DO MORE WITH YOUR DATA { first_name: ‘Paul’, surname: ‘Miller’, city: ‘London’, location: [45.123,47.232], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } • Rich Queries: Find everybody in London with a car built between 1970 and 1980 • Geospatial: Find all of the car owners within 5km of Trafalgar Sq. • Search: Find all the cars described as having leather seats. Count them by model. (text, facets, collation) • Aggregation: Calculate the average value of Paul’s car collection • Graph: Find all the cars owned by Paul’s family (descendants) • Map Reduce: What is the ownership pattern of colors by geography over time? (is purple trending up in China?)
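
To make two of the query types on slide 22 concrete, here is a minimal sketch of the "Rich Queries" and "Aggregation" examples evaluated over an in-memory copy of the slide's document. It is plain Java over `Map` and `List`, not the MongoDB query language or driver; the class name `CarOwnerQueries` and its method names are hypothetical, and the fields elided with "…" on the slide are simply left out.

```java
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for querying documents; no MongoDB driver is involved.
final class CarOwnerQueries {

    // Sample document shaped like the one on the slide (elided fields left out).
    static final Map<String, Object> PAUL = Map.of(
            "first_name", "Paul",
            "surname", "Miller",
            "city", "London",
            "cars", List.of(
                    Map.<String, Object>of("model", "Bentley", "year", 1973, "value", 100000),
                    Map.<String, Object>of("model", "Rolls Royce", "year", 1965, "value", 330000)));

    // Returns the embedded "cars" array, or an empty list if the field is absent.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> cars(Map<String, Object> person) {
        return (List<Map<String, Object>>) person.getOrDefault("cars", List.of());
    }

    // Rich query: does this person live in `city` and own a car built in [from, to]?
    static boolean inCityWithCarBuiltBetween(Map<String, Object> person, String city, int from, int to) {
        if (!city.equals(person.get("city"))) {
            return false;
        }
        return cars(person).stream().anyMatch(car -> {
            int year = (Integer) car.get("year");
            return year >= from && year <= to;
        });
    }

    // Aggregation: average of the "value" field across the embedded cars.
    static double averageCarValue(Map<String, Object> person) {
        return cars(person).stream()
                .mapToInt(car -> (Integer) car.get("value"))
                .average()
                .orElse(0.0);
    }

    public static void main(String[] args) {
        System.out.println(inCityWithCarBuiltBetween(PAUL, "London", 1970, 1980)); // true (the 1973 Bentley)
        System.out.println(averageCarValue(PAUL)); // 215000.0
    }
}
```

Both questions are answered from a single document because the cars are embedded in their owner; the relational equivalent would need a join between an owner table and a car table.
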
  23. 23. DRIVERS & ECOSYSTEM Support for the most popular languages and frameworks: Java • Python • Perl • Ruby • Morphia • MEAN Stack
  24. 24. DEVELOPMENT – THE PAST
  25. 25. DEVELOPMENT – WITH MONGODB
  26. 26. NEW DATA FIELDS AND TYPES • New sensor version -> new field: ALTER TABLE device_data ADD lbs_fuel int; • 5000 aircraft x 1 year of data x 1 reading per minute > 2B rows (Table diagram: columns TailNumber, lbs fuel, ts, speed; the new column spans 2B rows) How long will this take?
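
The "> 2B rows" figure on slide 26 follows from simple arithmetic, sketched below. The class and method names are hypothetical, and the calculation assumes a 365-day year.

```java
// Back-of-the-envelope row count for the sensor table on slide 26.
final class SensorRowEstimate {

    // Rows produced in one year by `aircraft` planes at `readingsPerMinute` each.
    static long rowsPerYear(long aircraft, long readingsPerMinute) {
        long minutesPerYear = 365L * 24 * 60; // 525,600 minutes, ignoring leap years
        return aircraft * readingsPerMinute * minutesPerYear;
    }

    public static void main(String[] args) {
        System.out.println(rowsPerYear(5000, 1)); // 2628000000
    }
}
```

The result, about 2.6 billion rows, exceeds the range of a 32-bit `int`, which is why the sketch uses `long`. It is also the number of rows the `ALTER TABLE` on the slide has to account for.
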
  27. 27. MONGODB LIFECYCLE
  28. 28. DAY 1: INITIAL EFFORTS FOR BOTH TECHNOLOGIES SQL: DDL: create table contact ( … ) init() { contactInsertStmt = connection.prepareStatement("insert into contact ( id, name ) values ( ?,? )"); fetchStmt = connection.prepareStatement("select id, name from contact where id = ?"); } save(Map m) { contactInsertStmt.setString(1, m.get("id")); contactInsertStmt.setString(2, m.get("name")); contactInsertStmt.execute(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.executeQuery(); if(rs.next()) { m = new HashMap(); m.put("id", rs.getString(1)); m.put("name", rs.getString(2)); } return m; } MongoDB: DDL: none save(Map m) { collection.insert(m); } Map fetch(String id) { Map m = null; DBObject dbo = new BasicDBObject(); dbo.put("id", id); c = collection.find(dbo); if(c.hasNext()) { m = (Map) c.next(); } return m; } Let’s assume for argument’s sake that both approaches take the same amount of time
  29. 29. DAY 2: ADD SIMPLE FIELDS m.put("name", "buzz"); m.put("id", "K1"); m.put("title", "Mr."); m.put("hireDate", new Date(2011, 11, 1)); • Capturing title and hireDate is part of adding a new business feature • It was pretty easy to add two fields to the structure • …but now we have to change our persistence code. Brace yourself (again)…
  30. 30. SQL DAY 2 (CHANGES IN BOLD) DDL: alter table contact add title varchar(8); alter table contact add hireDate date; init() { contactInsertStmt = connection.prepareStatement("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )"); fetchStmt = connection.prepareStatement("select id, name, title, hiredate from contact where id = ?"); } save(Map m) { contactInsertStmt.setString(1, m.get("id")); contactInsertStmt.setString(2, m.get("name")); contactInsertStmt.setString(3, m.get("title")); contactInsertStmt.setDate(4, m.get("hireDate")); contactInsertStmt.execute(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.executeQuery(); if(rs.next()) { m = new HashMap(); m.put("id", rs.getString(1)); m.put("name", rs.getString(2)); m.put("title", rs.getString(3)); m.put("hireDate", rs.getDate(4)); } return m; } Consequences: 1. Code release schedule linked to database upgrade (new code cannot run on old schema) 2. Issues with case sensitivity starting to creep in (many RDBMS are case insensitive for column names, but code is case sensitive) 3. Changes require careful mods in 4 places 4. Beginning of technical debt
  31. 31. MONGODB DAY 2 ✔ NO CHANGE save(Map m) { collection.insert(m); } Map fetch(String id) { Map m = null; DBObject dbo = new BasicDBObject(); dbo.put("id", id); c = collection.find(dbo); if(c.hasNext()) { m = (Map) c.next(); } return m; } Advantages: 1. Zero time and money spent on overhead code 2. Code and database not physically linked 3. New material with more fields can be added into existing collections; backfill is optional 4. Names of fields in database precisely match key names in code layer and directly match on name, not indirectly via positional offset 5. No technical debt is created
  32. 32. DAY 3: ADD LIST OF PHONE NUMBERS m.put("name", "buzz"); m.put("id", "K1"); m.put("title", "Mr."); m.put("hireDate", new Date(2011, 11, 1)); n1.put("type", "work"); n1.put("number", "1-800-555-1212"); list.add(n1); n2.put("type", "home"); n2.put("number", "1-866-444-3131"); list.add(n2); m.put("phones", list); • It was still pretty easy to add this data to the structure • …but meanwhile, in the persistence code… REALLY brace yourself…
  33. 33. SQL DAY 3 CHANGES: OPTION 2: PROPER APPROACH WITH MULTIPLE PHONE NUMBERS DDL: create table phones ( … ) init() { contactInsertStmt = connection.prepareStatement("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )"); c2stmt = connection.prepareStatement("insert into phones (id, type, number) values (?, ?, ?)"); fetchStmt = connection.prepareStatement("select id, name, title, hiredate, type, number from contact, phones where phones.id = contact.id and contact.id = ?"); } save(Map m) { startTrans(); contactInsertStmt.setString(1, m.get("id")); contactInsertStmt.setString(2, m.get("name")); contactInsertStmt.setString(3, m.get("title")); contactInsertStmt.setDate(4, m.get("hireDate")); for(Map onePhone : m.get("phones")) { c2stmt.setString(1, m.get("id")); c2stmt.setString(2, onePhone.get("type")); c2stmt.setString(3, onePhone.get("number")); c2stmt.execute(); } contactInsertStmt.execute(); endTrans(); } Map fetch(String id) { Map m = null; fetchStmt.setString(1, id); rs = fetchStmt.executeQuery(); int i = 0; List list = new ArrayList(); while (rs.next()) { if(i == 0) { m = new HashMap(); m.put("id", rs.getString(1)); m.put("name", rs.getString(2)); m.put("title", rs.getString(3)); m.put("hireDate", rs.getDate(4)); m.put("phones", list); } Map onePhone = new HashMap(); onePhone.put("type", rs.getString(5)); onePhone.put("number", rs.getString(6)); list.add(onePhone); i++; } return m; } This took time and money
  34. 34. SQL DAY 5: ZOMBIES! (ZERO OR MORE BETWEEN ENTITIES) init() { contactInsertStmt = connection.prepareStatement("insert into contact ( id, name, title, hiredate ) values ( ?,?,?,? )"); c2stmt = connection.prepareStatement("insert into phones (id, type, number) values (?, ?, ?)"); fetchStmt = connection.prepareStatement("select A.id, A.name, A.title, A.hiredate, B.type, B.number from contact A left outer join phones B on (A.id = B.id) where A.id = ?"); } Whoops! And it’s also wrong! We did not design the query accounting for contacts that have no phone number. Thus, we have to change the join to an outer join. But this ALSO means we have to change the unwind logic. This took more time and money! while (rs.next()) { if(i == 0) { // … } String s = rs.getString(5); if(s != null) { Map onePhone = new HashMap(); onePhone.put("type", s); onePhone.put("number", rs.getString(6)); list.add(onePhone); } } …but at least we have a DAL… right?
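
The unwind logic that slide 34 describes can be tried without a database. The sketch below takes rows shaped like the outer-join result (id, name, title, hiredate, type, number) and rebuilds one nested contact, skipping the all-null phone columns that appear when a contact has no phones. The class name `JoinUnwind`, the use of `String[]` rows in place of a `ResultSet`, and the date string are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Rebuilds a nested contact from flat rows of a contact/phones left outer join.
final class JoinUnwind {

    // Row layout mirrors the slide's select list: id, name, title, hiredate, type, number.
    // The last two columns are null when the contact has no phone rows.
    static Map<String, Object> unwind(List<String[]> rows) {
        Map<String, Object> contact = null;
        List<Map<String, String>> phones = new ArrayList<>();
        for (String[] row : rows) {
            if (contact == null) {
                // The contact columns repeat on every row; read them once.
                contact = new LinkedHashMap<>();
                contact.put("id", row[0]);
                contact.put("name", row[1]);
                contact.put("title", row[2]);
                contact.put("hireDate", row[3]);
                contact.put("phones", phones);
            }
            if (row[4] != null) {
                // Only real phone rows become sub-documents.
                Map<String, String> phone = new LinkedHashMap<>();
                phone.put("type", row[4]);
                phone.put("number", row[5]);
                phones.add(phone);
            }
        }
        return contact; // null when the query returned no rows at all
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"K1", "buzz", "Mr.", "2011-11-01", "work", "1-800-555-1212"});
        rows.add(new String[] {"K1", "buzz", "Mr.", "2011-11-01", "home", "1-866-444-3131"});
        System.out.println(unwind(rows));
    }
}
```

The null check on the phone type is the change the slide calls out: with an inner join it is unnecessary, and with an outer join omitting it yields a phantom phone entry with null fields.
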
  35. 35. COST REDUCTION
  36. 36. DEVELOPER COSTS ON THE RISE (Charts: Storage Cost per GB and Developer Salary, 1985 vs. 2013)
  37. 37. OPTIMIZING FOR ENGINEERING PRODUCTIVITY (Chart: Engineer Costs vs. Infrastructure Costs, 1985 and 2017)
  38. 38. COST REDUCTION 1. Scale out on commodity hardware vs. scale up 2. Cloud 3. Built-in HA ‒ No additional components ‒ Configuration
  39. 39. SCALING RELATIONAL Scale Up Scale Out
  40. 40. SCALING MONGODB: AUTOMATIC SHARDING Three types: hash-based, range-based, location-aware Increase or decrease capacity as you go Automatic balancing
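
As a rough illustration of two of the three sharding types named on slide 40, the sketch below routes a shard key to a shard by hash and by range. It is a simplified model, not MongoDB's actual hashing function, chunk management, or balancer, and location-aware sharding is left out. The class name `ShardRouting`, the shard names, and the range boundaries are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified routing of a shard key to a shard; not MongoDB's real algorithm.
final class ShardRouting {

    // Hash-based: spreads keys evenly, but neighbouring keys land on unrelated shards.
    static int hashShard(String shardKey, int shardCount) {
        return Math.floorMod(shardKey.hashCode(), shardCount);
    }

    // Range-based: each shard owns the keys from its lower bound up to the next bound,
    // so neighbouring keys stay together and range scans touch few shards.
    static String rangeShard(TreeMap<String, String> lowerBounds, String shardKey) {
        Map.Entry<String, String> owner = lowerBounds.floorEntry(shardKey);
        if (owner == null) {
            throw new IllegalArgumentException("no range covers key: " + shardKey);
        }
        return owner.getValue();
    }

    // Hypothetical boundaries: "" up to "H", "H" up to "P", and "P" onwards.
    static TreeMap<String, String> sampleRanges() {
        TreeMap<String, String> ranges = new TreeMap<>();
        ranges.put("", "shard0");
        ranges.put("H", "shard1");
        ranges.put("P", "shard2");
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(rangeShard(sampleRanges(), "London")); // shard1
        System.out.println(hashShard("London", 3));               // some value in 0..2
    }
}
```

The trade-off between the two functions is one reason the next slide says each sharding option suits different applications: hashing favours even write distribution, ranges favour queries over contiguous key spans.
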
  41. 41. QUERY ROUTING Multiple query optimization models Each sharding option appropriate for different apps
  42. 42. CLOUD - ATLAS
  43. 43. Atlas: Database as a Service for MongoDB • Automated • Available On-Demand • Secure • Highly Available • Automated Backups • Elastically Scalable
  44. 44. RELATIONAL HIGH(?) AVAILABILITY (Diagram: applications in DC1 and DC2, with replication and availability as bolted-on components) • Recovery: min – hours • Manual intervention • Expensive $$$
  45. 45. MONGODB REPLICA SETS • Replica set: 2 to 50 copies • Self-healing shard • Data Center Aware • Addresses availability considerations: ‒ High Availability ‒ Disaster Recovery ‒ Maintenance • Workload Isolation: operational & analytics
  46. 46. WAIT? WHAT ABOUT OTHER NOSQL?
  47. 47. REMEMBER THIS? { first_name: ‘Paul’, surname: ‘Miller’, city: ‘London’, location: [45.123,47.232], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] } • Rich Queries: Find everybody in London with a car built between 1970 and 1980 • Geospatial: Find all of the car owners within 5km of Trafalgar Sq. • Search: Find all the cars described as having leather seats. Count them by model. (text, facets, collation) • Aggregation: Calculate the average value of Paul’s car collection • Graph: Find all the cars owned by Paul’s family (descendants) • Map Reduce: What is the ownership pattern of colors by geography over time? (is purple trending up in China?)
  48. 48. AGGREGATION: POWERFUL ANALYTICS
  49. 49. MONGODB CONNECTOR FOR BI Visualize and explore multi-dimensional documents using SQL-based BI tools. The connector does the following: • Provides the BI tool with the schema of the MongoDB collection to be visualized • Translates SQL statements issued by the BI tool into equivalent MongoDB queries that are sent to MongoDB for processing • Converts the results into the tabular format expected by the BI tool, which can then visualize the data based on user requirements
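
The last bullet on slide 49, converting results into the tabular format a BI tool expects, can be pictured with a small sketch: one nested document with an embedded array becomes one flat row per array element. This is only an illustration of the idea, not the BI Connector's actual mapping rules; the class name `TabularFlatten`, the chosen columns, and the sample document are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Flattens a document with an embedded array into rows, one per array element.
final class TabularFlatten {

    // Sample document modelled on the car-owner example used earlier in the deck.
    static final Map<String, Object> SAMPLE = Map.of(
            "surname", "Miller",
            "city", "London",
            "cars", List.of(
                    Map.<String, Object>of("model", "Bentley", "year", 1973),
                    Map.<String, Object>of("model", "Rolls Royce", "year", 1965)));

    // Output columns: surname, city, model, year. Owner fields repeat on every row.
    static List<List<Object>> carRows(Map<String, Object> person) {
        List<List<Object>> rows = new ArrayList<>();
        Object cars = person.get("cars");
        if (cars instanceof List<?>) {
            for (Object item : (List<?>) cars) {
                Map<?, ?> car = (Map<?, ?>) item;
                rows.add(Arrays.<Object>asList(
                        person.get("surname"), person.get("city"),
                        car.get("model"), car.get("year")));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (List<Object> row : carRows(SAMPLE)) {
            System.out.println(row);
        }
    }
}
```
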
  50. 50. ADVANCED ANALYTICS: MongoDB Connector for Apache Spark • Native Scala connector, certified by Databricks • Exposes all Spark APIs & libraries • Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations • Locality awareness to reduce data movement • Updated with Spark 2.0 support “We reduced 100+ lines of integration code to just a single line after moving to the MongoDB Spark connector.” - Early Access Tester, Multi-National Banking Group Group Analytics (Diagram: Application; Scala, Java, Python, R APIs; SQL, Machine Learning Libraries, Streaming, Graph; four Spark Workers; MongoDB Connector for Spark)
  51. 51. WHAT DOES THIS MEAN? • Developer productivity • Wider range of use cases • Changing requirements • Complex queries and analytics MongoDB
  52. 52. WIDE VARIETY OF USE CASES • Single View • Internet of Things • Mobile • Real-Time Analytics • App Modernization • Content Management • Blockchain
  53. 53. DECISION CRITERIA
  54. 54. CRITERIA: WHEN TO USE MONGODB? • Triage ‒ Building new app ‒ Re-platforming existing app ‒ Evaluating application portfolio
  55. 55. DECISION FLOW CHART Existing application? -> yes: Existing App Criteria; no: New App Criteria
  56. 56. EXISTING APPLICATIONS • Is there a critical requirement that isn’t being met? ‒ Performance/Scalability ‒ Agility ‒ Variable data sources/formats ‒ Availability/Resiliency ‒ Cost ‒ Cloud • Revision or re-platform?
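
The triage in slides 55 and 56 can be written down as a small sketch: pick the criteria set by application type, then, for an existing application, list which of the slide's critical requirements are currently unmet. Treating any unmet requirement as a reason to weigh revision against re-platforming is an assumption about how the slide's question is meant to be used, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Triage sketch for slides 55 and 56; names and structure are illustrative only.
final class ExistingAppTriage {

    // The critical requirements listed on slide 56, in slide order.
    static final List<String> CRITICAL_REQUIREMENTS = List.of(
            "Performance/Scalability",
            "Agility",
            "Variable data sources/formats",
            "Availability/Resiliency",
            "Cost",
            "Cloud");

    // Slide 55: existing applications and new applications use different criteria.
    static String criteriaFor(boolean existingApplication) {
        return existingApplication ? "Existing App Criteria" : "New App Criteria";
    }

    // Slide 56: which critical requirements is the current platform failing to meet?
    // Entries that are not on the slide's list are ignored.
    static List<String> unmetRequirements(Set<String> reportedProblems) {
        return CRITICAL_REQUIREMENTS.stream()
                .filter(reportedProblems::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(criteriaFor(true));
        System.out.println(unmetRequirements(Set.of("Cost", "Cloud")));
    }
}
```

An empty result suggests no critical requirement is driving a change; a non-empty one is the input to the "Revision or re-platform?" question.
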
  57. 57. EXISTING APPLICATION CHALLENGES (table: Requirements, Challenges, MongoDB Features) • Performance/Scalability ‒ Challenges: can’t meet query volume; query latency issues; data volume exceeding server(s) capacity ‒ MongoDB features: Document Model, WiredTiger, Sharding, Commodity Hardware, Cloud/Atlas • Availability/Resiliency ‒ Challenges: need automatic failover (zero downtime on loss of node, network, or data center; no engineering effort required to restore service) ‒ MongoDB features: Replica sets (automated failover, zero-downtime maintenance) • Cloud Migration ‒ Challenges: cloud migration; no cloud provider lock-in ‒ MongoDB features: Atlas
  58. 58. EXISTING APPLICATION CHALLENGES (table: Requirements, Challenges, MongoDB Features) • Agility (shorten time to value) ‒ Challenges: feature backlog; developers focused on maintenance instead of innovation ‒ MongoDB features: Flexible document model, Powerful query language, Driver architecture • Variable data sources/format ‒ Challenges: new data sources; data format changes continuously ‒ MongoDB features: Flexible document model • Cost ‒ Challenges: mainframe MIPS; RAC clusters; additional expensive components for replication, failover ‒ MongoDB features: Commodity Hardware, Open Source, Atlas, Replica sets
  59. 59. CRITERIA FOR ASSESSING MONGODB FIT • Performance/Scalability • Availability/Resiliency • Cloud • Agility • Variable Data • Cost • Data naturally modeled as documents? • Complex queries • Analytics • Strong consistency
  60. 60. ADDITIONAL CRITERIA (table: Requirements, Challenges, MongoDB Features) • Data naturally modeled as documents ‒ Challenges: complex code for shredding and reconstituting objects ‒ MongoDB features: Flexible document model • Complex Queries / Analytics ‒ Challenges: complex application code; complex architectures including search engine, Hadoop, ETL, CDC; limited application functionality; long time to market ‒ MongoDB features: Secondary indexes, Powerful query language, Aggregation Framework, BI Connector, Spark integration • Strong Consistency ‒ Challenges: users require most up-to-date view of data; complex application code required to handle edge cases ‒ MongoDB features: Strong consistency, Read and write concerns
  61. 61. WHEN NOT TO USE MONGODB?
  62. 62. WHEN NOT TO USE MONGODB?
  63. 63. EXISTING APPLICATION • Performance/Scalability ✓ • Availability/Resiliency ✓ • Cloud ✓ • Agility ✓ • Variable Data ✓ • Cost ✓
  64. 64. NEW APPLICATIONS • MongoDB makes sense for vast majority of use cases • Why not MongoDB? ‒ Fear/comfort level with new technology ‒ Don’t know how to support MongoDB? ‒ Don’t want to learn new technology ‒ Expensive enterprise license that is “free” to project team • Our other solution is good enough
  65. 65. LET’S REVIEW
  66. 66. NEXUS ARCHITECTURE • Scalability & Performance • Always On, Global Deployments • Flexibility • Expressive Query Language & Secondary Indexes • Strong Consistency • Enterprise Management & Integrations. Feature labels on the diagram: Replica sets, Sharding, WiredTiger, Document Model, Replica Sets, Sharding, Ops & Cloud Mgr, Atlas, MongoDB query language, Secondary indexes, Aggregation Framework, BI Connector, Strong Consistency, Read and Write Concern, BI Connector, Ops & Cloud Mgr., Spark Connector, Atlas
  67. 67. I DON’T ALWAYS BUILD APPLICATIONS IN MONGODB, BUT WHEN I DO I GET… • 5x Developer Productivity • 80% Cost Reduction
  68. 68. USE CASE INDICATORS FOR MONGODB • Performance/Scalability • Availability/Resiliency • Cloud • Agility • Variable Data • Cost • Data naturally modeled as documents? • Complex queries • Analytics • Strong consistency
  69. 69. QUESTIONS? • Jay Runkel • Principal Solutions Architect jay.runkel@mongodb.com @jayrunkel
  70. 70. MANY REASONS FOR MONGODB? • Strong Consistency ‒ Documents ‒ Indexes ‒ Consistency across multi-data center deployments • Expressive Query Language and Secondary Indexes ‒ More powerful than SQL ‒ Analytics ‒ Dynamic index creation • Scalability/Performance ‒ PBs of Data ‒ Millions of ops/sec • High Availability ‒ Automated failover < 2 seconds ‒ Supports Active-Active and Active-Passive multi-data center deployments • Deploy Anywhere ‒ On-Prem, AWS, Azure, Google • Ease of Management ‒ Best in class operations tooling ‒ Configure once: one cluster spans multi-data centers
