Partner Webinar: The Scaling Checklist for MongoDB - 100GB and beyond
 

MongoHQ knows there is something special about 100 GB of data. Our customers that hit 100 GB are running core pieces of their business on a scalable MongoDB platform. In this presentation, we will walk through a cloud-focused scaling checklist that will help you quickly and securely blow past the 100 GB milestone. Using customer examples and best-practice MongoDB use cases, we'll help prepare you to get to the data size your business needs.

  • 4 years of experience. We run 50,000 total MongoDB databases. We run multi-terabyte sharded environments. We have a philosophy of optimize, then shard. Our real enjoyment is seeing a company grow, use our platform, and find value in it.
  • If your company is creating data, and you and your customers have created 100GB of data, that is a fast-growing business. In some cases, 10GB is a good, growing business. Unless you are digesting Twitter’s firehose or scraping the web and consuming someone else’s data, there is something special about 100GB of data. 10GB and growing is a good as-a-service business on a good clip. As you approach 50GB of data, your next hurdle is 100GB. We are thinking about building applications that plan for 100GB of data.
  • Quick survey to see what type of audience we have. If you would, just respond in the chat with your data size or with your letter. We will tally those values in a moment.
  • Let’s get started looking at the checklist. It is a reasonable number of items. The first 3 build the case for how you should think about MongoDB; please bear with me through these sections, as I am framing the discussion. Items 1 – 3 will be longer, but will lend a theme to later examples. Items 4 – 6 are a set of techniques for improving performance. Items 7 – 10 are some commandments. Item 11 leaves you with some words of wisdom.
  • Let’s start off by identifying your data’s behavior.
  • I’ve put together two different sets of axes for us to look at: data and queries. Data type: small v. large. Data type: fast v. slow. Query type: complex v. simple. Query type: known v. unknown. Before engineers check out from this talk: this exercise is quick, easy, and important for mapping your data usage. It will help you understand the different types of data that compose your application. It will also help when searching for the best tools for the job. I am proposing two sets of axes here, but I am sure there are more. After discussing with customers, these two charts are a good starting point and offer a good way to think about data growth.
  • Here is our first set of axes. Fast and slow on the y axis. Small and large on the x axis. My question for you is: what type of data do you have? Do you have fast and large data? Do you have fast and small data? Slow and large data? What type of data do you have? Increasingly we find
  • Modern applications have all data types
  • Data’s characteristics are not static. Over time, your data type changes. In the chart, we are showing aging data move from fast and small to larger and slower. When discussing the same set of data, we have to discuss the assumption of time. Two engineers can talk about the same collection of data. One engineer is thinking of last week’s data. Another engineer is thinking of this week’s data, and they are arguing different use cases.
  • Green is good. MongoDB excels at use cases with many types of data except queues, messaging, and OLTP. If you have small / fast data, MongoDB is not the tool for that. Notice, I don’t go to the end of the fast, large, or small axes. I recognize there are edge cases past the capabilities of MongoDB performance.
  • The next set of axes we are introducing is query type. Previously, we discussed data type and answered “How does your data behave?” Now, we are answering “How do you retrieve your data?” These axes are not as intuitive as our prior axes. Simple query: single value. Complex query: multiple values, multiple conditions, multiple ranges. Known query: you know precisely the arguments you are querying. Unknown query: you do not yet know the arguments you are querying.
  • As with earlier, modern applications have all types of queries. The key positions on this spectrum are upper right, “Simple and known”, and lower left, “Complex and unknown”. Simple and known represents many modern NoSQL databases: key-value stores. Unknown and complex is what I term “data discovery”: the data has not turned into information, and I want to go through the process of turning raw data into actionable information. It typically involves very complex queries and an unknown end state of the queries. On the rest of the spectrum, we have internal metrics, which are often very structured datasets, and single range queries. Across from each other, we have application search and user search. Applications typically have a search mechanism.
  • Which queries are off limits in MongoDB? Data discovery.
  • As with data types, the queries required of data also change over time. Today’s data is suitable for quick, fast application usage. Last week’s data requires analysis: data discovery. These are important notions when working with increasingly larger data. We recognize that all data created similarly will have different requirements during its life. Recognize these nuances, and adapt.
  • MongoDB dominates the “simple / known” quadrant. We
  • If you’ve used MongoDB, you know its JSON-like data. Expressive documents on complex relationships. With creativity, you can create simple, known queries and return these complex relationships. On-the-fly addition of attributes / columns.
  • MongoDB uses B-tree indexes. The indexing constraints are absolute and simple. List the indexing constraints. No intersections of indexes. Operators are $or, $and, $sort, $nin, $in, $ne, $gte, $lte. Any violation of these constraints will lead to table scans.
  • Hopefully, with 1 – 3, I’ve built the case for simplicity. And now we will answer the question: “What does it mean to optimize for 100 GB of data?”
  • Scaling your query and database interactions will move you toward the simple and known quadrant, approaching an interaction that is similar to a key-value store.
  • Imagine a messaging system that captures messages between two parties. For any person, you want to find that person’s most recent messages. Naïve example: use the $or operator. Of course, we learned that $sort does not work with $or; this will cause table scans. Second attempt: query on participants. Best approach: use NoSQL for what it does best.
  • Here is a view of aging data. How does data become “cold”? How does your data become “hot”? What data needs to be fast? What data can be slow?
  • Keep fast data fast, and keep cold data separate. Over time, as we’ve stated, our data becomes larger and slower, requires complex queries, and filters on unknown conditions. Keep that data separate from today’s data.
  • Backups are important, but don’t use `mongodump` to take them (unless you have a 3rd replica set member you can run them against).

Presentation Transcript

  • www.mongohq.com Scaling Checklist for MongoDB Scaling Checklist for MongoDB 100GB & Beyond
  • MongoHQ (www.mongohq.com | @mongohq): MongoHQ is a fully-managed platform used by developers to deploy, host and scale open-source databases. Chris Winslett (chris@mongohq.com): I’ve spoken at a number of MongoDB conferences on optimizing queries. I’ve been with MongoHQ for two years; prior to that I built applications for the education and technical sectors.
  • TL;DR
    • 100GB of data is relatively big data
    • MongoDB has comparative advantages
    • MongoDB has absolute constraints
    • Know the MongoDB gauges
    • Surpassing 100GB requires:
      – Understanding absolute constraints
      – Knowledge of application’s data consumption
      – Optimization of data consumption to comparative advantages
  • Audience Survey: What is your data size? Choose the biggest bucket.
    A. < 10GB
    B. < 50GB
    C. < 75GB
    D. < 100GB
    E. > 100GB
  • 100 GB Checklist
    1. Identify your data behavior
    2. Use MongoDB for comparative advantages
    3. Know the MongoDB indexing constraints
    4. Refactor schema to simplify queries
    5. Remove data that does not fit MongoDB
    6. Separate hot and cold data
    7. Stop using `mongodump`
    8. Check your gauges
    9. Avoid queries causing page faults
    10. Track and monitor slow queries
    11. Buying time with hardware
  • Identify your data behavior
    1. Small v. Large – type of data
    2. Fast v. Slow – behavior of data
    3. Complex v. Simple – type of queries
    4. Known v. Unknown – behavior of queries
    5. Queuing v. Application data
    This can happen at the planning, staging, or production phase.
  • Patterns of your Data (chart: Small to Large on the x axis, Fast to Slow on the y axis)
  • Modern applications have all patterns (same axes; plotted: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background)
  • Where doesn’t MongoDB excel? (same axes and plotted items as “Modern applications have all patterns”)
  • 4th dimension is time (same axes; main application collections, from today’s data to last week’s data)
  • Data-types to avoid with MongoDB (same axes and plotted items as “Modern applications have all patterns”)
  • What type of queries do you have? (chart axes: Unknown to Known, Simple to Complex)
  • Modern applications have all types of queries (same axes; plotted: data discovery; application search; key value; single range query; user generated search; internal metrics; multi-range query)
  • Queries to Avoid with MongoDB (same axes and plotted items as “Modern applications have all types of queries”)
  • 4th Dimension is Time (same axes; plotted: real-time core of application; today’s data; last week’s data)
  • MongoDB Queries and MongoDB (chart axes: Unknown to Known, Simple to Complex; region labels: Elastic Search, SQL, Elastic Search)
  • MongoDB’s Technical Comparative Advantage
    • Expressive data structure allows simplification of complex data relationships
    • Create simple, known queries and return expressive relationships
    • On-the-fly addition of attributes / columns
    • Total Cost of Ownership*
  • MongoDB Indexing Constraints
    • Only one index can be used per query
    • Only one range operator can be used per index
    • Range operator must be the last field on index
    • Know how to use the right side of indexes
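The indexing constraints above can be turned into a quick sanity check for a query shape. The following is a minimal sketch in plain JavaScript, not a MongoDB API: `fitsIndex` and `isRange` are hypothetical helpers, and treating only `$gt`, `$gte`, `$lt`, and `$lte` as range operators is an assumption made for illustration.

```javascript
// Hypothetical helpers (not part of MongoDB): check a query shape against the
// constraints listed on the slide for one compound index.
const RANGE_OPS = ["$gt", "$gte", "$lt", "$lte"]; // assumed set of range operators

function isRange(condition) {
  // A condition counts as a range when it is an object using a range operator.
  return condition !== null && typeof condition === "object" &&
    Object.keys(condition).some((op) => RANGE_OPS.includes(op));
}

function fitsIndex(indexFields, query) {
  const queried = Object.keys(query);
  // Every queried field must be covered by this one index.
  if (!queried.every((f) => indexFields.includes(f))) return false;
  // The queried fields must form a prefix of the index, with no gaps.
  const used = indexFields.slice(0, queried.length);
  if (!used.every((f) => queried.includes(f))) return false;
  // At most one range condition, and it must sit on the last used field.
  const ranges = used.filter((f) => isRange(query[f]));
  if (ranges.length > 1) return false;
  return ranges.length === 0 || ranges[0] === used[used.length - 1];
}

const query = { user_id: 7, created_at: { $gte: 1380585600 } };
console.log(fitsIndex(["user_id", "created_at"], query)); // equality first, range last
console.log(fitsIndex(["created_at", "user_id"], query)); // range field is not last
```

Under these assumptions, the first index order satisfies the constraints and the second does not, which is one reading of "know how to use the right side of indexes".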
  • What does it mean to optimize? Scaling to 100GB involves moving queries from complex to simple and unknown to known. (Chart axes: Unknown to Known, Simple to Complex; marked Start, Finish, Start.)
  • Example of simplifying a query: find the most recent messages for a person’s message stream.
    Naïve query: db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})
    Second attempt: db.messages.find({participant_ids: <id>}).sort({_id: -1})
    Best approach: db.users.find({_id: <id>})
  • Naïve Query
    Document: { _id: <id>, message: "Wow, this pizza is good!", sender_id: <user_id>, recipient_id: <user_id> }
    Query: db.messages.find({$or: [{recipient_id: <id>}, {sender_id: <id>}]}).sort({_id: -1})
  • Second Attempt
    Document: { _id: <id>, message: "Wow, this pizza is good!", sender_id: <sender_id>, recipient_id: <recipient_id>, participant_ids: [<sender_id>,<recipient_id>] }
    Query: db.messages.find({participant_ids: <id>}).sort({_id: -1})
  • Best approach
    Document: { _id: <id>, name: "Clarke Kent", recent_messages: [ <…50 denormalized messages…> ] }
    Query: db.users.find({_id: <id>})
    Hint: use $push, $sort, and $slice to keep the last 50
  • How did we optimize? We took a known, complex query and made it simple. (Chart axes: Unknown to Known, Simple to Complex; marked Start and Finish.)
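The "best approach" hint mentions `$push`, `$sort`, and `$slice` without showing the write side. Below is a minimal sketch of one way it could look. `recentMessagesUpdate` builds an update document using the `$each`, `$sort`, and `$slice` modifiers of `$push`, and `applyRecentMessage` is an in-memory model of its effect so the behaviour can be checked without a database. Both function names and the `sent_at` field are assumptions for illustration; the slides do not name them.

```javascript
// Build the update document suggested by the slide's hint.
// `sent_at` and the default limit of 50 are illustrative assumptions;
// `limit` is expected to be a positive integer.
function recentMessagesUpdate(message, limit = 50) {
  return {
    $push: {
      recent_messages: { $each: [message], $sort: { sent_at: 1 }, $slice: -limit },
    },
  };
}

// In-memory model of what that update does to a user document:
// append, sort ascending by sent_at, then keep only the last `limit` items.
function applyRecentMessage(user, message, limit = 50) {
  const merged = [...(user.recent_messages || []), message];
  merged.sort((a, b) => a.sent_at - b.sent_at); // mirrors $sort: { sent_at: 1 }
  return { ...user, recent_messages: merged.slice(-limit) }; // mirrors $slice: -limit
}

let user = { _id: 1, name: "Clarke Kent" };
for (let i = 1; i <= 60; i++) {
  user = applyRecentMessage(user, { sent_at: i, message: "message " + i });
}
console.log(user.recent_messages.length); // capped at the limit
```

In the shell, the update document would be passed along the lines of `db.users.update({_id: <id>}, recentMessagesUpdate(message))`, once for the sender and once for the recipient. The read then stays the single-key lookup shown on the slide.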
  • Methods for Simplifying Queries
    • Bucket values
    • Create summary attributes
    • Pre-compute values
    • Use expressive document structures
    • Sort and filter at the application level
    • Create summary documents
    • Divide and measure (more on this later)
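Two of the methods above, bucketing values and creating summary documents, can be sketched together. In this minimal illustration a `Map` stands in for a collection, and the page-view scenario, the field names, and the helpers `hourBucket` and `recordView` are all assumptions rather than anything taken from the talk.

```javascript
// Bucket a timestamp to the hour (UTC), for example "2013-10-01T14".
function hourBucket(date) {
  return date.toISOString().slice(0, 13);
}

// Keep one summary document per (page, hour) instead of one document per view.
// `summaries` is a Map standing in for a collection.
function recordView(summaries, page, date) {
  const hour = hourBucket(date);
  const id = page + ":" + hour;
  const doc = summaries.get(id) || { _id: id, page: page, hour: hour, views: 0 };
  doc.views += 1; // in the database this would be an upsert using $inc
  summaries.set(id, doc);
  return doc;
}

const summaries = new Map();
recordView(summaries, "/home", new Date("2013-10-01T14:05:00Z"));
recordView(summaries, "/home", new Date("2013-10-01T14:55:00Z"));
recordView(summaries, "/home", new Date("2013-10-01T15:01:00Z"));
console.log(summaries.get("/home:2013-10-01T14").views);
```

Reading an hourly count then becomes a lookup by `_id`, which keeps the query in the simple and known quadrant.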
  • Remove “unrefactorable” data (chart: Small to Large, Fast to Slow; plotted: main application collections; application metadata; secondary application collections; internal metrics; event logs and event data; queues, OLTP, messages; rendered in background; Redis)
  • Move up and right, or find another tool (chart axes: Unknown to Known, Simple to Complex; plotted: MongoDB; data discovery; application search; user generated search; multi-range query)
  • 4th Dimension is Time (chart axes: Unknown to Known, Simple to Complex; plotted: real-time core of application; today’s data (fast); last week’s data (slower))
  • Separate Data with Cross Purposes
    • If today’s data must be fast, and last week’s data can be slow:
      – Roll out today’s data using TTL collections
      – Use another database for last week’s data
      – Use high-RAM ratio and SSD backed machines for today’s data
      – Use cheaper hardware for last week’s data
  • MongoDB Doesn’t Have Joins: data doesn’t have to be adjacent. Divide, measure, conquer.
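The hot and cold split described above can be sketched as follows. `ttlIndex` holds the key pattern and options that a TTL index takes (`expireAfterSeconds` is the real option name), while `splitHotCold`, the `created_at` field, and the one-week window are assumptions chosen for illustration. A TTL index only deletes expired documents, so this sketch assumes the cold copy is written to the other database before expiry.

```javascript
const WEEK_SECONDS = 7 * 24 * 60 * 60;

// Key pattern and options for a TTL index; `created_at` is an assumed field name.
const ttlIndex = {
  keys: { created_at: 1 },
  options: { expireAfterSeconds: WEEK_SECONDS },
};

// Split documents the way the rollout would: anything older than the TTL is
// "cold" and belongs in the separate, cheaper database.
function splitHotCold(docs, now, ttlSeconds = WEEK_SECONDS) {
  const cutoff = now.getTime() - ttlSeconds * 1000;
  const hot = docs.filter((d) => d.created_at.getTime() >= cutoff);
  const cold = docs.filter((d) => d.created_at.getTime() < cutoff);
  return { hot, cold };
}

const now = new Date("2013-10-15T00:00:00Z");
const docs = [
  { _id: 1, created_at: new Date("2013-10-14T00:00:00Z") },
  { _id: 2, created_at: new Date("2013-10-01T00:00:00Z") },
];
const { hot, cold } = splitHotCold(docs, now);
console.log(hot.map((d) => d._id), cold.map((d) => d._id));
```

In the shell, the index definition would be applied along the lines of `db.events.createIndex(ttlIndex.keys, ttlIndex.options)`, where `events` is a placeholder collection name.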
  • Stop Using `mongodump`: `mongodump` is a long-running table scan that exports all documents. This disrupts RAM and causes performance issues. Self-hosting: use MongoDB MMS and Backup. As-a-service: ask your vendor about backup alternatives.
  • Configure MMS Now!
  • Avoid Page Faults like the Plague (chart: MongoDB operations / second, on a 0 to 8000 scale, for 50% table scans, 1% table scans, and 0% table scans)
  • What type of queries cause page faults? (chart axes: Unknown to Known, Simple to Complex; plotted: MongoDB; data discovery; application search; user generated search; multi-range query)
  • Track & Remove Slow Queries
    • system.profile collection
    • MongoDB professor
    • Dex
    • MongoHQ Slow Query Tracker and Profiler
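As a rough illustration of working with `system.profile` output, the sketch below filters profiler-style documents for slow operations and counts them per namespace. The `millis`, `op`, and `ns` fields mirror those in profiler entries; the helper names, the 100 ms threshold, and the sample documents are assumptions.

```javascript
// Return operations slower than the threshold, slowest first.
function slowOperations(profileDocs, thresholdMs = 100) {
  return profileDocs
    .filter((doc) => doc.millis > thresholdMs)
    .sort((a, b) => b.millis - a.millis);
}

// Count slow operations per namespace to see which collection hurts most.
function slowCountByNamespace(profileDocs, thresholdMs = 100) {
  const counts = {};
  for (const doc of slowOperations(profileDocs, thresholdMs)) {
    counts[doc.ns] = (counts[doc.ns] || 0) + 1;
  }
  return counts;
}

// Made-up sample entries in the shape of system.profile documents.
const sample = [
  { op: "query", ns: "app.messages", millis: 250 },
  { op: "query", ns: "app.users", millis: 5 },
  { op: "update", ns: "app.messages", millis: 900 },
];
console.log(slowOperations(sample).map((d) => d.millis));
console.log(slowCountByNamespace(sample));
```

Against a live database with profiling enabled, a comparable shell query would be `db.system.profile.find({millis: {$gt: 100}}).sort({millis: -1})`.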
  • Buying time with hardware has a limited life
    • Don’t get addicted to buying more hardware.
    • Before any purchasing decision, always:
      – consider optimization
      – investigate separating, paring data
  • Thank you! For any questions: chris@mongohq.com | www.mongohq.com | @mongohq