Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Mongo db improve the performance of your application codemotion2016

588 views

Published on

Tips and tricks useful for improve the performance of your MongoDB-based application

Published in: Data & Analytics
  • Be the first to comment

Mongo db improve the performance of your application codemotion2016

  1. 1. MADRID · NOV 18-19 · 2016 19th November 2016 18:30-19:15 MongoDB: Improve the performance of your application by @juanroycouto @codemotion_es 1
  2. 2. MADRID · NOV 18-19 · 2016 #whoami ❖ @juanroycouto ❖ www.juanroy.es ❖ www.mongodbspain.com/en/author/juanroy ❖ juanroycouto@gmail.com ❖ MongoDB DBA at Grupo Undanet 2
  3. 3. MADRID · NOV 18-19 · 2016 #Agenda 3 ● Data Model ● Application Patterns ● .explain() ● Indexes ● Storage Engines ● Query Optimizer ● Aggregation Framework ● Profiling ● Tuning ● Sharding ● Storage & Compression
  4. 4. MADRID · NOV 18-19 · 2016 #Data Model 4
  5. 5. MADRID · NOV 18-19 · 2016 #Data Model 5 ● All the information in one document: ○ No joins ○ Single read ○ Higher performance ○ ACID compliance at the document level ● MMAPv1: avoid unbounded document growth ● WiredTiger rewrites the document for each update
  6. 6. MADRID · NOV 18-19 · 2016 #Application Patterns 6 ● Modify only those fields that have changed ● Project only the fields you need ● Avoid negation in queries ● Use Covered Queries when possible ● Avoid scatter-gather queries ● Write Concern: ○ w : 0 → faster than w : all
  7. 7. MADRID · NOV 18-19 · 2016 #.explain() 7 ● Use .explain() to get the time a query needs ● Verbose options: ○ queryPlanner (default) ○ executionStats (stats for the winner plan) ○ allPlansExecution (stats for all candidate plans)
  8. 8. MADRID · NOV 18-19 · 2016 #.explain() 8 ● In-memory sorting data is limited to 32MB (you need an index) ● My index selectivity is the relationship between nReturned and totalKeysExamined
  9. 9. MADRID · NOV 18-19 · 2016 #Indexes 9
  10. 10. MADRID · NOV 18-19 · 2016 #Indexes 10 ● Improve the performance of read-intensive apps ● Overhead: ○ Write operations ○ Disk usage ○ Memory consumption
  11. 11. MADRID · NOV 18-19 · 2016 ● Working Set is made of: ○ Indexes ○ Your most frequently accessed data #Indexes-Working Set 11
  12. 12. MADRID · NOV 18-19 · 2016 ● If your Working Set does not fit into RAM: ○ Increment your RAM ○ Shard the database ○ Avoid scan all documents in the collection ■ Page faults ■ You will fill in your Working Set with infrequent accessed data #Indexes-Working Set 12
  13. 13. MADRID · NOV 18-19 · 2016 #Indexes-Before creating indexes 13 ● Understand your data ● How your queries will be ● Create as few indexes as possible ● Put your highly selective fields first in the index ● Check your current indexes before creating new ones
  14. 14. MADRID · NOV 18-19 · 2016 #Indexes-Advices 14 ● Do not index a little cardinality field ● Check your index performance ● Monitor your system to check if: ○ Your strategy is still up to date ○ You need new indexes ○ You can remove unnecessary indexes
  15. 15. MADRID · NOV 18-19 · 2016 #Indexes-Compound 15 ● A compound index has better performance than: ○ Single indexes ○ Intersection ● Remove indexes that are prefixes of other ● The value of the combined fields must have high cardinality
  16. 16. MADRID · NOV 18-19 · 2016 #Indexes-Compound 16 >db.collection.createIndex( { equality, sort, range } ); >db.collection.find( { book_author : ‘John’, #equality book_id : { $gt : 54 } #range } ) .sort( { book_price : 1 } ); #sort
  17. 17. MADRID · NOV 18-19 · 2016 > use test > db.example.createIndex( { a : 1, b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 2, "totalKeysExamined" : 100, "totalDocsExamined" : 100, > db.example.dropIndex( { a : 1, b : 1 } ); > db.example.createIndex( { a : 1 } ); > db.example.createIndex( { b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 36, "totalKeysExamined" : 10000, "totalDocsExamined" : 10000 #Indexes-Compound example 17
  18. 18. MADRID · NOV 18-19 · 2016 #Indexes-Covered Queries 18 ● They return results from the indexes ● A covered query does not read from the document level: ○ ndocsExamined : 0 ● In the index must be all the fields: ○ Included in the query ○ Returned by the query
  19. 19. MADRID · NOV 18-19 · 2016 #Indexes-Covered Queries example 19 > use test > db.example.createIndex( { a : 1, b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 3, "totalKeysExamined" : 100, "totalDocsExamined" : 100 > db.example.explain("executionStats").find( { a : 56, b : 87 }, { _id : 0, a : 1, b : 1 } ); "nReturned" : 100, "executionTimeMillis" : 3, "totalKeysExamined" : 100, "totalDocsExamined" : 0
  20. 20. MADRID · NOV 18-19 · 2016 #Indexes-TTL 20 ● TTL (TimeToLive) Indexes ○ Allow the user to specify a period of time after which the data will be automatically deleted
  21. 21. MADRID · NOV 18-19 · 2016 #Indexes-TTL example 21 > use test > db.timetolive.createIndex( { date : 1 } , { expireAfterSeconds: 20 } ); > db.timetolive.insert( { date : new Date() } ); > db.timetolive.find().pretty() { "_id" : ObjectId("581fb43f5626d57bcf66fe73"), "date" : ISODate("2016-11-06T22:52:47.373Z") } > db.timetolive.count() 0 > The background task that removes expired documents runs every 60 seconds.
  22. 22. MADRID · NOV 18-19 · 2016 #Indexes-Partial 22 ● Partial Indexes ○ Allow to index a subset of your data by specifying a filtering expression during the index creation ○ Better query performance while reducing system overhead
  23. 23. MADRID · NOV 18-19 · 2016 #Indexes-Partial example > use test > db.sysprofile.createIndex( { ns : 1, op : 1 }, { partialFilterExpression : { millis : { $gt : 10000 } } } ); > db.sysprofile.explain().find( { ns : 'school2.students', op : 'query' }, { millis : 1 } ); "winningPlan" : { "inputStage" : { "stage" : "COLLSCAN" > db.sysprofile.explain().find( { ns : 'school2.students', op : 'query', millis : { $gte : 20000 } }, { millis : 1 } ); "winningPlan" : { "inputStage" : { "stage" : "IXSCAN" 23
  24. 24. MADRID · NOV 18-19 · 2016 #Indexes-Sparse 24 ● Sparse Indexes ○ Only contain entries for documents that contain a specific field ○ Smaller and more efficient indexes
  25. 25. MADRID · NOV 18-19 · 2016 > use test > db.test.createIndex( { b : 1 } , { sparse : true } ); > db.test.stats() 16384 > > db.test.createIndex( { b : 1 } ); > db.test.stats() 2580480 > #Indexes-Sparse example 25
  26. 26. MADRID · NOV 18-19 · 2016 #Storage Engines-WiredTiger 26 ● Reduced storage space and higher I/O scalability ● It compresses indexes into RAM, freeing up Working Set for documents ● Performance (granular concurrency control) ● Native compression: ○ per collection ○ per index
  27. 27. MADRID · NOV 18-19 · 2016 #Storage Engines-WiredTiger 27 ● Types of compression: ○ Snappy (default): Data and Journal ○ zlib: maximum compression ○ Prefix: for indexes
  28. 28. MADRID · NOV 18-19 · 2016 #Storage Engines: In-Memory 28 ● Extreme performance coupled with real time analytics for the most demanding, latency-sensitive applications ● Delivers the extreme throughput and predictable latency ● Eliminates the need for separate caching layers
  29. 29. MADRID · NOV 18-19 · 2016 #Query Optimizer 29 ● It selects the best index to use ● It periodically runs alternate query plans and selects the index with the best response time for each query type ● The results of these tests are stored as a cached query plan and are updated periodically
  30. 30. MADRID · NOV 18-19 · 2016 #Aggregation Framework 30
  31. 31. MADRID · NOV 18-19 · 2016 #Aggregation Framework 31 ● Use $indexStats to determine how frequently your index is used ● Each shard has its own statistics ● Stats are reset when you restart your mongod service >db.collection.aggregate([ { $indexStats : {} } ])
  32. 32. MADRID · NOV 18-19 · 2016 #Aggregation Framework example 32 > db.test.aggregate([ { $indexStats : {} } ]).pretty() { "name" : "b_1", "key" : { "b" : 1 }, "host" : "hostname:27017", "accesses" : { "ops" : NumberLong(4), "since" : ISODate("2016-11-07T00:05:39.771Z") } } ...
  33. 33. MADRID · NOV 18-19 · 2016 #Tools-Compass 33
  34. 34. MADRID · NOV 18-19 · 2016 #Tools-Compass 34 ● MongoDB Compass ○ Analyze the size and usage of the indexes ○ It offers visual plan explanations ○ Index creation ○ Real time stats ○ Identify and fix performance and data problems
  35. 35. MADRID · NOV 18-19 · 2016 #Tools-Ops Manager 35 ● The Visual Query Profiler allows to analyze a query performance and recommends on the addition of indexes ● It can be used to visualize output from the profiler when identifying slow queries
  36. 36. MADRID · NOV 18-19 · 2016 #Tools-Cloud Manager & Atlas 36 ● Use the Query Targeting chart
  37. 37. MADRID · NOV 18-19 · 2016 #Tools-Profiling 37 ● Guess the slowest queries for improving their indexes ● Log information for all events or only those whose duration exceeds a configurable threshold (100ms default)
  38. 38. MADRID · NOV 18-19 · 2016 #Tools-Profiling example 38 > db.setProfilingLevel(2) { "was" : 0, "slowms" : 100, "ok" : 1 } > db.example.find( { c : 98 } ); > db.system.profile.find().pretty() { "op" : "query", "ns" : "test.example", "query" : { "find" : "example", "filter" : { "c" : 98 } }, … "millis" : 6
  39. 39. MADRID · NOV 18-19 · 2016 #Tuning-SSD 39 ● Most disk access patterns are not sequential ● Data files benefit from SSDs ● MongoDB journal files have high sequential write patterns (good for spinning disks) ● RAM/SSDs better than spinning drives ● Use high performance storage ● Avoid networked storage
  40. 40. MADRID · NOV 18-19 · 2016 #Tuning-CPU 40 ● MongoDB has better performance on faster CPUs ● The MongoDB WiredTiger Storage Engine is better able to saturate multi-core processor resources than the MMAPv1 Storage Engine
  41. 41. MADRID · NOV 18-19 · 2016 #Tuning-Thread 41 ● The WiredTiger Storage Engine is multi-threaded and can take advantage of many CPU Cores
  42. 42. MADRID · NOV 18-19 · 2016 #Sharding 42 ● The balancer has been moved from the mongos to the primary member of the config server ● Sharding improves the performance of reads ● Benefits (even with a ‘poor’ shard key and scatter-gather): ○ A larger amount of memory to store indexes ○ Parallelize queries across shards reducing latency
  43. 43. MADRID · NOV 18-19 · 2016 #Storage & Compression types 43 ● Balance query latency with storage density and cost by assigning data sets based on a value such as a timestamp to specific storage devices
  44. 44. MADRID · NOV 18-19 · 2016 #Storage & Compression types 44 ● Recent, frequently accessed data can be assigned to high performance SSDs with Snappy compression ● Older, less frequently accessed data is tagged to lower-throughput hard disk drives compressed with zlib to attain maximum storage density with a lower cost-per-bit ● As data ages, MongoDB will automatically migrates it between shards
  45. 45. MADRID · NOV 18-19 · 2016 #Questions? 45 Questions?
  46. 46. MongoDB: Improve the performance of your application by @juanroycouto MADRID · NOV 18-19 · 2016 46 Thank you for your attention!

×