
IN106 Performance with MongoDB

One of the reasons companies are turning to NoSQL databases is performance. This presentation highlights the performance advantages of MongoDB over other data stores and covers key hardware requirements for performance, sharding, choosing a database type, key factors in an optimal schema design, and the types and importance of indexes.

  1. MWLUG 2017: Moving Collaboration Forward. MongoDB Performance. Kim Greene, Kim Greene Consulting, Inc. (kim@kimgreene.com)
  2. About Me { "name": "Kim Greene", "email": "kim@kimgreene.com", "company": "Kim Greene Consulting, Inc.", "website": "www.kimgreene.com", "twitter": "@iSeriesDomino" }
  3. Agenda • Why companies are turning to MongoDB • Hardware • Sharding • Database choice • Schema design • Indexes
  4. Why Customers Are Turning to MongoDB
  5. Inserting Data: MongoDB vs. MySQL • Inserted 1,615 chemical compound records into two parent-child tables • In MySQL, turned off foreign key checks during the insert and used a string builder to create a bulk INSERT statement
  6. MongoDB vs. S3 Performance • Downloading a 220 KB object from MongoDB was 7x faster than from S3 when cold and 3x faster when warm
  7. Hardware
  8. Hardware Requirements • Can use anything from commodity hardware all the way up to IBM Power and zSeries – Use multi-core systems when possible • Ensure that indexes and the most frequently accessed data (the working set) fit in RAM • RAM is the most important hardware factor • db.serverStatus() – Use to obtain information on the current working set
  9. Hardware Requirements • Data placement is key! – Use SSDs for: • Write-heavy data • Journal files • Compression – Can reduce the storage footprint by up to 80% – Means fewer bits read from disk
  10. Compression • WiredTiger has native compression • Compression options for documents and indexes – Snappy • The default for documents and journals; balances high compression ratios with low CPU overhead – zlib • Higher compression • Additional CPU overhead – Prefix • Used by default for indexes; reduces index size by ~50%
  11. Compression • Snappy and SSDs – Use for frequently accessed data • zlib and rotational disks – Use for older, less frequently accessed data
  12. Sharding
  13. Sharding • Places portions of the data on different servers • Use for – Very large data sets – High throughput demands – Geographic placement of data
  14. Sharding
  15. Sharding • Distribute data across the cluster based on query patterns or data locality • Types of sharding: – Range – Hash – Zone
  16. Sharding • Range sharding – Divides data into ranges based on shard key values – Efficient for queries that read documents in a contiguous range – A poorly chosen shard key can lead to poor read and write performance • Hash sharding – More even data distribution – Can hurt the performance of range-based queries, which must be sent to multiple shards • Zone sharding – Used to improve data locality • By geographic region • By hardware configuration for tiered storage architectures • By application feature
  17. Database Choice
  18. 4 Types of Databases (Storage Engines) • WiredTiger – The default and most commonly used storage engine • Encrypted – For highly sensitive data • In-memory – For performance-critical data • MMAPv1 – Improved version of the original storage engine used in earlier versions of MongoDB
  19. In-Memory Database • Does not maintain any on-disk data, including configuration data, indexes, user credentials, etc. • The entire database must fit in memory – Knowing the true "working set" is key
  20. Schema Design
  21. Schema Design • Schema design is critical – Most performance problems are caused by poor schema design • RDBMS schema design – What answers do I have? • MongoDB schema design – What questions will I have?
  22. Schema Design • Key items of focus – How will the data be accessed? – What is the projected read-to-write ratio? – How large will documents become? • Structure data to match how it is queried and updated
  23. Schema Design • Basic schema designs – Embedding – Referencing – Denormalization
  24. Embedding • To embed or not to embed – Favor embedding unless there is a compelling reason not to – If an object needs to be accessed frequently on its own, it is best not to embed it
  25. Embedding
  26. Embedding • Use when all of the data is manipulated together • Use when the relationship between the entities is one-to-one • When it can be used, typically reduces the latency of get (read) requests by about 50%
  27. Referencing • Link to other documents when: – There are one-to-many relationships – Parts of the data need to be accessed on their own
  28. Denormalizing • The read/write ratio is key when deciding whether to denormalize – Fields that are primarily read and rarely updated are good candidates – If a field is updated frequently, don’t denormalize it
  29. Denormalization • Reduces the need to perform application-level joins for denormalized fields
  30. Denormalization • Consider the write/read ratio when denormalizing – A field that will mostly be read and only seldom updated is a good candidate for denormalization – As updates become more frequent relative to queries, the savings from denormalization decrease
  31. Back to Embedding • Embed computed information when you write it – Avoids retrieving and recomputing it over and over – Works well if writes are infrequent – Pushes work to the application at write time; the result is dramatically improved read times
  32. Back to Embedding • What to look for when choosing between referencing and embedding data in a document – Things that don’t change often and aren’t read often are best stored in a separate document – The parent document contains a reference to the less frequently accessed/updated document
  33. Schema Design • The best MongoDB schema design depends entirely on your application’s data access patterns • Structure your data to match the ways that your application queries and updates it
  34. Indexes
  35. Indexes • ½ of all performance issues are due to missing or incorrect secondary indexes • Index early • Index often
  36. Types of Secondary Indexes • Unique • Compound • Array • Time to Live (TTL) • Geospatial • Partial • Sparse • Text search
  37. Types of Secondary Indexes • Unique – Rejects the insertion of a new document, or the update of an existing one, that would duplicate a value already present in the indexed field • Compound – Useful for queries that specify multiple predicates • Example: Find customers by last name, first name, and city of residence – Can reduce the need for single-field indexes, because a query on any prefix of the compound index's fields (e.g., last name alone, or last name and first name) can use it
  38. Types of Secondary Indexes • Array (multikey) – For fields that contain an array, each array value is stored as a separate index entry • Time to Live (TTL) – Specify a period of time after which documents are automatically deleted from the collection
  39. Types of Secondary Indexes • Geospatial – Allow MongoDB to optimize queries for documents containing points or polygons that are closest to a given point or line; that are within a circle, rectangle, or polygon; or that intersect a circle, rectangle, or polygon • Partial – Index only the documents that meet specified filter conditions
  40. Types of Secondary Indexes • Sparse – Contain entries only for documents that have the indexed field – Allow for smaller, more efficient indexes when fields are not present in all documents • Text search – Specialized index for text search that applies language-specific linguistic rules for stemming, tokenization, case sensitivity, and stop words
  41. Indexing Tidbits • Query optimizer – Selects the best index by trial-running candidate query plans and caching the winning plan • Index intersection – Allows MongoDB to use more than one index to optimize ad hoc queries at run time • Covered queries – Return results containing only indexed fields – Very efficient, because results are returned from the index without reading the source documents
  42. Aggregation Pipeline • Can replace find() in certain scenarios • Can improve performance significantly – Moves processing from the client side to the server – Saves CPU and bandwidth – Reduces the amount of data transmitted to the application layer
  43. Where to Find More Information
  44. Where to Find More Information • MongoDB University – university.mongodb.com • YouTube tutorials – youtube.com/mongodb • MongoDB Performance Best Practices white paper – mongodb.com/collateral/mongodb-performance-best-practices
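The Hardware Requirements slides say the working set should fit in RAM and suggest db.serverStatus() for checking it. Below is a minimal sketch, assuming the WiredTiger storage engine. It reads the `wiredTiger.cache` counters that serverStatus reports and compares current cache usage with the configured maximum. The `summarizeCache` name, the sample values, and the 80% "nearly full" threshold are illustrative assumptions, not MongoDB defaults.

```javascript
// Bytes per gigabyte (binary), used to make the numbers readable.
const GB = 1024 * 1024 * 1024;

// Summarize WiredTiger cache usage from a db.serverStatus() result.
// The nearlyFullRatio threshold is an illustrative assumption, not a MongoDB default.
function summarizeCache(serverStatus, nearlyFullRatio = 0.8) {
  const cache = serverStatus.wiredTiger.cache;
  const used = cache["bytes currently in the cache"];
  const max = cache["maximum bytes configured"];
  const ratio = used / max;
  return {
    usedGB: used / GB,
    maxGB: max / GB,
    ratio,
    nearlyFull: ratio >= nearlyFullRatio,
    // Cumulative counter of pages read from disk; watch how fast it grows.
    pagesReadIntoCache: cache["pages read into cache"],
  };
}

// Trimmed-down stand-in for a real serverStatus document.
const sample = {
  wiredTiger: {
    cache: {
      "bytes currently in the cache": 7 * GB,
      "maximum bytes configured": 8 * GB,
      "pages read into cache": 123456,
    },
  },
};

console.log(summarizeCache(sample)); // ratio 0.875, nearlyFull true
```

In the mongo shell you can pass the real document directly, as in `summarizeCache(db.serverStatus())`. One hint that the working set no longer fits in RAM is a cache that stays near its maximum while "pages read into cache" keeps climbing.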
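The Compression slides pair Snappy with SSDs for hot data and zlib with rotational disks for older data. One way to choose the compressor per collection is the `storageEngine.wiredTiger.configString` option of `db.createCollection()`. The sketch below builds those options for a "hot" or "cold" tier and does the footprint arithmetic for a percentage reduction. It is a hypothetical helper, not anything built into MongoDB. The tier names, the collection name, and the 500 GB figure are made up for illustration.

```javascript
// Choose a WiredTiger block compressor per hypothetical data tier, following
// the slides: snappy (the default) for hot data on SSDs, zlib for colder data
// on rotational disks.
function collectionOptionsFor(tier) {
  const compressor = tier === "cold" ? "zlib" : "snappy";
  return {
    storageEngine: {
      wiredTiger: { configString: `block_compressor=${compressor}` },
    },
  };
}

// Estimated on-disk size after a given percentage reduction. The slides cite
// "up to 80%"; real savings depend heavily on the data.
function compressedSizeGB(rawGB, reductionPercent) {
  return (rawGB * (100 - reductionPercent)) / 100;
}

// In the mongo shell (collection name is hypothetical):
//   db.createCollection("orders_archive", collectionOptionsFor("cold"))
console.log(JSON.stringify(collectionOptionsFor("cold")));
console.log(compressedSizeGB(500, 80)); // 100
```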
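The Sharding slides contrast range and hash sharding. The toy model below assumes three shards and a monotonically increasing shard key such as an order number. Under range sharding, every new insert lands on the shard that owns the top range, while hashing spreads the inserts out. The reverse holds for reads: a contiguous range query hits one shard under range sharding but several under hashing. The shard names, chunk boundaries, and FNV-1a hash are stand-ins. MongoDB uses its own hash function for hashed shard keys and manages chunks through the balancer.

```javascript
// Illustrative model only: shard names, chunk boundaries, and the FNV-1a hash
// are stand-ins for MongoDB's own hashing and balancer-managed chunks.
const shards = ["shardA", "shardB", "shardC"];

// Range sharding: each shard owns a contiguous key range [min, max).
const ranges = [
  { shard: "shardA", min: -Infinity, max: 500 },
  { shard: "shardB", min: 500, max: 1000 },
  { shard: "shardC", min: 1000, max: Infinity },
];

function rangeShardFor(key) {
  return ranges.find((r) => key >= r.min && key < r.max).shard;
}

// 32-bit FNV-1a hash of the key's string form.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hashed sharding: the hash of the key, not the key itself, picks the shard.
function hashShardFor(key) {
  return shards[fnv1a(String(key)) % shards.length];
}

function countByShard(keys, shardFor) {
  const counts = Object.fromEntries(shards.map((s) => [s, 0]));
  for (const k of keys) counts[shardFor(k)] += 1;
  return counts;
}

// 100 inserts with monotonically increasing keys, like order numbers.
const newKeys = Array.from({ length: 100 }, (_, i) => 1000 + i);
console.log("range:", countByShard(newKeys, rangeShardFor)); // all 100 on shardC
console.log("hash: ", countByShard(newKeys, hashShardFor));

// Number of shards touched by a query for the contiguous range 1000..1009.
const rangeQuery = newKeys.slice(0, 10);
console.log(new Set(rangeQuery.map(rangeShardFor)).size); // 1
console.log(new Set(rangeQuery.map(hashShardFor)).size);
```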
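The In-Memory Database slide notes that the entire database must fit in memory. A rough pre-check might look like this. It assumes the inputs come from the `dataSize` and `indexSize` fields of `db.stats()` and that the budget is the `storage.inMemory.engineConfig.inMemorySizeGB` setting. The 20% headroom and the sample sizes are assumptions for illustration.

```javascript
const GB = 1024 * 1024 * 1024;

// Rough fit check for the in-memory storage engine. `stats` mirrors the
// dataSize and indexSize fields (in bytes) returned by db.stats(); the budget
// corresponds to storage.inMemory.engineConfig.inMemorySizeGB. The 20%
// headroom for growth and overhead is an assumption, not a MongoDB rule.
function fitsInMemory(stats, inMemorySizeGB, headroom = 0.2) {
  const neededBytes = (stats.dataSize + stats.indexSize) * (1 + headroom);
  return {
    neededGB: neededBytes / GB,
    budgetGB: inMemorySizeGB,
    fits: neededBytes <= inMemorySizeGB * GB,
  };
}

// Hypothetical data set: 10 GB of documents plus 2 GB of indexes.
const stats = { dataSize: 10 * GB, indexSize: 2 * GB };
console.log(fitsInMemory(stats, 16).fits); // true
console.log(fitsInMemory(stats, 12).fits); // false
```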
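The Embedding and Referencing slides can be made concrete with two document shapes. In this sketch, plain arrays stand in for collections, and the customer, address, and order fields are hypothetical. The one-to-one address is embedded and comes back in a single lookup. The one-to-many orders are referenced by `customerId`, so reading them takes a second query plus an application-level join.

```javascript
// Embedding: the one-to-one address lives inside the customer document.
const customersEmbedded = [
  { _id: 1, name: "Ada", address: { street: "1 Main St", city: "Springfield" } },
];

// Referencing: one-to-many orders live in their own collection and point back
// to the customer, so they can also be queried on their own.
const customers = [{ _id: 1, name: "Ada" }];
const orders = [
  { _id: 101, customerId: 1, total: 25 },
  { _id: 102, customerId: 1, total: 40 },
  { _id: 103, customerId: 2, total: 15 },
];

// One read, like db.customers.findOne({ _id: id }) against embedded data.
function getEmbedded(id) {
  return customersEmbedded.find((c) => c._id === id);
}

// Two reads, like db.customers.findOne({ _id: id }) plus
// db.orders.find({ customerId: id }), then an application-level join.
function getWithOrders(id) {
  const customer = customers.find((c) => c._id === id);
  const customerOrders = orders.filter((o) => o.customerId === id);
  return { ...customer, orders: customerOrders };
}

console.log(getEmbedded(1).address.city);    // Springfield
console.log(getWithOrders(1).orders.length); // 2
```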
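The Denormalization slides weigh read savings against update cost. In this hypothetical example, the product name is copied into each order line, so displaying an order needs no join. The cost is that renaming the product has to rewrite every order that contains it. The `worthDenormalizing` helper and its 100:1 read-to-update threshold are illustrative assumptions, not a MongoDB rule.

```javascript
const products = [{ _id: "p1", name: "Widget" }];
const orders = [
  { _id: 1, items: [{ productId: "p1", productName: "Widget", qty: 2 }] },
  { _id: 2, items: [{ productId: "p1", productName: "Widget", qty: 1 }] },
  { _id: 3, items: [{ productId: "p2", productName: "Gadget", qty: 5 }] },
];

// Renames a product and every denormalized copy of its name. Returns the
// number of documents written, which is the price paid on each update.
// In the mongo shell the order updates could be a single call (MongoDB 3.6+):
//   db.orders.updateMany(
//     { "items.productId": "p1" },
//     { $set: { "items.$[i].productName": "Widget Pro" } },
//     { arrayFilters: [{ "i.productId": "p1" }] }
//   )
function renameProduct(productId, newName) {
  products.find((p) => p._id === productId).name = newName;
  let writes = 1; // the product document itself
  for (const order of orders) {
    const matching = order.items.filter((item) => item.productId === productId);
    matching.forEach((item) => { item.productName = newName; });
    if (matching.length > 0) writes += 1;
  }
  return writes;
}

// Hypothetical rule of thumb: denormalize a field only when it is read far
// more often than it is updated.
function worthDenormalizing(readsPerDay, updatesPerDay, minRatio = 100) {
  return updatesPerDay === 0 || readsPerDay / updatesPerDay >= minRatio;
}

console.log(renameProduct("p1", "Widget Pro")); // 3
console.log(worthDenormalizing(1000000, 10));   // true
console.log(worthDenormalizing(1000, 500));     // false
```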
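The first Back to Embedding slide recommends embedding computed information at write time. Below is a minimal sketch, assuming a hypothetical order document that stores its running `total` and `itemCount`. Each write updates the items and the totals together, so reads never need to re-sum the line items. Prices are whole numbers here to keep the arithmetic exact.

```javascript
// Hypothetical order document that stores precomputed totals.
// Equivalent single update in the mongo shell:
//   db.orders.updateOne(
//     { _id: orderId },
//     { $push: { items: item },
//       $inc: { total: item.price * item.qty, itemCount: item.qty } }
//   )
function addItem(order, item) {
  order.items.push(item);
  order.total += item.price * item.qty;
  order.itemCount += item.qty;
  return order;
}

const order = { _id: 1, items: [], total: 0, itemCount: 0 };
addItem(order, { sku: "A1", price: 5, qty: 2 });
addItem(order, { sku: "B7", price: 12, qty: 1 });

// Reading the total is now a field lookup instead of a sum over items.
console.log(order.total, order.itemCount); // 22 3
```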
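The compound index bullet on the Types of Secondary Indexes slides relies on the index prefix rule. The helper below is a simplified model of that rule for the customer example on the slide (last name, first name, city). The collection and field names are hypothetical. The real query planner also weighs sort order, range predicates, and competing indexes.

```javascript
// Simplified model of the prefix rule for a compound index such as
//   db.customers.createIndex({ lastName: 1, firstName: 1, city: 1 })
// (collection and field names are hypothetical). Returns the leading index
// fields that an equality query filters on, stopping at the first gap. An
// empty result means the index is not a candidate for filtering that query.
function usablePrefix(indexKeys, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of Object.keys(indexKeys)) {
    if (!wanted.has(field)) break;
    prefix.push(field);
  }
  return prefix;
}

const idx = { lastName: 1, firstName: 1, city: 1 };
console.log(usablePrefix(idx, ["lastName"]).join(", "));              // lastName
console.log(usablePrefix(idx, ["firstName", "lastName"]).join(", ")); // lastName, firstName
console.log(usablePrefix(idx, ["lastName", "city"]).join(", "));      // lastName
console.log(usablePrefix(idx, ["firstName", "city"]).join(", "));     // (empty line)
```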
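The Types of Secondary Indexes slides list eight index types. The sketch below first writes one hypothetical definition of each as the `(keys, options)` pair that `db.collection.createIndex(keys, options)` accepts. It then models in plain JavaScript which documents a multikey, sparse, partial, or TTL index would cover. The field names, sample documents, and single-`$gt` partial filter are assumptions. The models are simplifications, not MongoDB's implementation.

```javascript
// (keys, options) pairs in the shape accepted by the mongo shell's
// db.collection.createIndex(keys, options). All field names are hypothetical.
const indexSpecs = {
  unique: [{ email: 1 }, { unique: true }],
  compound: [{ lastName: 1, firstName: 1, city: 1 }, {}],
  // No special option: MongoDB marks an index multikey on its own when an
  // indexed field holds an array.
  multikey: [{ tags: 1 }, {}],
  ttl: [{ createdAt: 1 }, { expireAfterSeconds: 3600 }],
  geospatial: [{ location: "2dsphere" }, {}],
  partial: [{ rating: 1 }, { partialFilterExpression: { rating: { $gt: 5 } } }],
  sparse: [{ nickname: 1 }, { sparse: true }],
  text: [{ description: "text" }, {}],
};
// In mongosh, for example: db.users.createIndex(...indexSpecs.ttl)

// Hypothetical documents for the simplified models below.
const docs = [
  { _id: 1, tags: ["a", "b"], rating: 7, nickname: "kg", createdAt: new Date("2017-07-01T00:00:00Z") },
  { _id: 2, tags: ["b"], rating: 6, createdAt: new Date("2017-07-01T02:00:00Z") },
  { _id: 3, tags: ["c"], rating: 4, nickname: null, createdAt: new Date("2017-07-01T01:45:00Z") },
];

// Multikey: one index entry per array element.
function multikeyEntries(docs, field) {
  return docs.flatMap((d) => d[field].map((value) => [value, d._id]));
}

// Sparse: only documents where the field exists (a null value still counts).
function sparseIds(docs, field) {
  return docs.filter((d) => field in d).map((d) => d._id);
}

// Partial: only documents matching the filter, modeled here for a single $gt.
function partialIds(docs, field, greaterThan) {
  return docs.filter((d) => d[field] > greaterThan).map((d) => d._id);
}

// TTL: a document becomes eligible for deletion once its date plus
// expireAfterSeconds is in the past. MongoDB's background TTL task runs
// every 60 seconds, so actual removal can lag behind that moment.
function ttlExpiredIds(docs, field, expireAfterSeconds, now) {
  return docs
    .filter((d) => d[field].getTime() + expireAfterSeconds * 1000 < now.getTime())
    .map((d) => d._id);
}

const now = new Date("2017-07-01T02:30:00Z");
console.log(multikeyEntries(docs, "tags").length); // 4
console.log(sparseIds(docs, "nickname"));          // ids 1 and 3
console.log(partialIds(docs, "rating", 5));        // ids 1 and 2
console.log(ttlExpiredIds(docs, "createdAt", indexSpecs.ttl[1].expireAfterSeconds, now)); // id 1
```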
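The Indexing Tidbits slide describes covered queries. The simplified check below assumes a hypothetical `{ lastName: 1, firstName: 1 }` index on a customers collection. A query is covered only when every filtered and projected field is in the index and `_id` is excluded from the projection (unless the index contains it). Real coverage has further conditions; for example, multikey indexes cannot cover queries over array fields. In practice, `explain("executionStats")` reporting `totalDocsExamined: 0` is the usual confirmation.

```javascript
// Simplified coverage check (not MongoDB's planner logic): the filter and
// every projected field must be in the index, and _id must be excluded
// unless the index itself contains _id.
function isCovered(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));
  const projected = Object.keys(projection).filter((f) => f !== "_id" && projection[f]);
  const projectionOk = projected.length > 0 && projected.every((f) => indexed.has(f));
  // _id is returned by default, so it must be excluded explicitly.
  const idOk = ("_id" in projection && !projection._id) || indexed.has("_id");
  return filterOk && projectionOk && idOk;
}

// Hypothetical index: db.customers.createIndex({ lastName: 1, firstName: 1 })
const idx = { lastName: 1, firstName: 1 };

// db.customers.find({ lastName: "Greene" }, { _id: 0, lastName: 1, firstName: 1 })
console.log(isCovered(idx, { lastName: "Greene" }, { _id: 0, lastName: 1, firstName: 1 })); // true
// Same query without excluding _id has to fetch the documents.
console.log(isCovered(idx, { lastName: "Greene" }, { lastName: 1, firstName: 1 })); // false
// Projecting an unindexed field (city) also needs the documents.
console.log(isCovered(idx, { lastName: "Greene" }, { _id: 0, lastName: 1, city: 1 })); // false
```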
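The Aggregation Pipeline slide says moving work to the server reduces the data sent to the application. The sketch below models that difference on hypothetical orders: 10 customers with 100 orders each, half of them shipped. A `find()` returns every matching order for the client to sum, while a `$match` plus `$group` pipeline returns one document per customer. The local functions only imitate the server's work; in the mongo shell, the pipeline itself would be passed to `db.orders.aggregate()`.

```javascript
// Hypothetical orders: 10 customers x 100 orders, every other order shipped,
// each for an amount of 10.
const orders = Array.from({ length: 1000 }, (_, i) => ({
  _id: i,
  customerId: Math.floor(i / 100),
  status: i % 2 === 0 ? "shipped" : "pending",
  amount: 10,
}));

// The server-side pipeline this sketch models: db.orders.aggregate(pipeline)
const pipeline = [
  { $match: { status: "shipped" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
];

// find()-based approach: db.orders.find({ status: "shipped" }) sends every
// matching order to the application, which then groups and sums them.
function totalsViaFind(coll) {
  const sent = coll.filter((o) => o.status === "shipped");
  const totals = {};
  for (const o of sent) totals[o.customerId] = (totals[o.customerId] || 0) + o.amount;
  return { documentsTransferred: sent.length, totals };
}

// aggregate()-based approach, modeled locally: $match and $group run on the
// server, so only one result document per customer crosses the network.
function totalsViaAggregate(coll) {
  const groups = new Map();
  for (const o of coll) {
    if (o.status !== "shipped") continue; // $match
    groups.set(o.customerId, (groups.get(o.customerId) || 0) + o.amount); // $group + $sum
  }
  const results = [...groups].map(([customerId, total]) => ({ _id: customerId, total }));
  return { documentsTransferred: results.length, results };
}

console.log(totalsViaFind(orders).documentsTransferred);      // 500
console.log(totalsViaAggregate(orders).documentsTransferred); // 10
```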
