2011 Mongo FR - Indexing in MongoDB

We all know that MongoDB is one of the most flexible and feature-rich databases available. In this session we'll discuss how you can leverage this feature set and maintain high performance with your project's massive data sets and high loads. We'll cover how indexes can be designed to optimize the performance of MongoDB. We'll also discuss tips for diagnosing and fixing performance issues should they arise.

Transcript

  • 1. MongoDB Indexing and Query Optimizer Details Antoine Girbal Mongo FR March 23, 2011
  • 2. What will we cover?
    • Many details of how indexing and the query optimizer work
    • 3. A full understanding of these details is not required to use mongo, but this knowledge can be helpful when making optimizations.
    • 4. We’ll discuss functionality of Mongo 1.8 (for our purposes pretty similar to 1.6 and almost identical to 1.7 edge).
    • 5. Much of the material will be presented through examples.
    • 6. Diagrams are to aid understanding – some details will be left out.
  • 7. Btree (conceptual diagram) 1 2 3 4 5 6 7 8 9 {_id:4,x:6}
  • 8. Find One Document
    • db.c.find( {x:6} ).limit( 1 )
    • 9. Index {x:1}
  • 10. Find One Document 1 2 3 4 5 6 7 8 9 6 ? {_id:4,x:6}
  • 11. Find One Document > db.c.find( {x:6} ).limit( 1 ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } } Uses a btree cursor to find the object. Index ranges are around a single value.
  • 14. Find One Document 1 2 3 4 5 6 6 6 9 6 ? {_id:4,x:6} Now we have duplicate x values
  • 16. Equality Match
    • db.c.find( {x:6} )
    • 17. Index {x:1}
    • 18. Several documents to be returned
  • 19. Equality Match 9 1 2 3 4 5 6 6 6 6 ? {_id:4,x:6} {_id:5,x:6} {_id:1,x:6}
  • 20. Equality Match > db.c.find( {x:6} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } }
  • 21. Equality Match 1 2 3 4 5 6 6 6 9 6 ?
  • 22. Full Document Matcher
    • db.c.find( {x:6,y:1} )
    • 23. Index {x:1}
    • 24. Object content needs to be checked
  • 25. Full Document Matcher 9 1 2 3 4 5 6 6 6 6 ? {y:4,x:6} {y:5,x:6} {y:1,x:6}
  • 26. Full Document Matcher > db.c.find( {x:6,y:1} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 3, "nscannedObjects" : 3, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 6, 6 ] ] } } Documents for all matching index keys are scanned, but only one document matched on non index keys.
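The difference between nscanned and n on the slide above can be reproduced with a small in-memory sketch. It is only an illustration: the sorted array `indexX` and the helpers `lowerBound` and `findByXAndY` are hypothetical stand-ins, not MongoDB's btree code.

```javascript
// Hypothetical stand-in for an index on {x:1}: entries sorted by key,
// each pointing at its document. This is not MongoDB's real btree.
const docs = [
  { x: 6, y: 4 },
  { x: 6, y: 5 },
  { x: 6, y: 1 },
  { x: 9, y: 2 },
];
const indexX = docs
  .map((doc) => ({ key: doc.x, doc }))
  .sort((a, b) => a.key - b.key);

// Binary search for the first entry whose key is >= target.
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Walk the index range [x, x], then check the non-indexed field on each document.
function findByXAndY(entries, x, y) {
  const stats = { nscanned: 0, n: 0, results: [] };
  for (let i = lowerBound(entries, x); i < entries.length && entries[i].key === x; i++) {
    stats.nscanned++; // one index key visited
    if (entries[i].doc.y === y) {
      stats.n++; // document also satisfies y
      stats.results.push(entries[i].doc);
    }
  }
  return stats;
}

const stats = findByXAndY(indexX, 6, 1);
console.log(stats.nscanned, stats.n); // 3 1
```

All three x:6 keys are visited, but only one document survives the check on y, which is the pattern the explain output shows.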
  • 27. Range Match
    • db.c.find( {x:{$gte:4,$lte:7}} )
    • 28. Index {x:1}
  • 29. Range Match 8 1 2 3 4 5 6 7 9 4 <= ? <= 7
  • 30. Range Match > db.c.find( {x:{$gte:4,$lte:7}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 4, 7 ] ] } }
  • 31. Range Match 1 2 3 4 5 6 7 8 9
  • 32. Exclusive Range Match
    • db.c.find( {x:{$gt:4,$lt:7}} )
    • 33. Index {x:1}
    • 34. Range of index is same as inclusive range match
    • 35. but boundaries are not scanned nor returned
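A minimal sketch of the inclusive and exclusive range scans, assuming the index is just a sorted array of keys. The helpers `seek` and `rangeScan` are hypothetical; the exclusive case follows the slide's statement that boundary keys are neither scanned nor returned.

```javascript
// Sorted index keys on x (hypothetical data matching the 1..9 diagram).
const keys = [1, 2, 3, 4, 5, 6, 7, 8, 9];

// Position of the first key >= target (strict = false) or > target (strict = true).
function seek(sortedKeys, target, strict) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const before = strict ? sortedKeys[mid] <= target : sortedKeys[mid] < target;
    if (before) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Scan keys between low and high; boundaries are included only when inclusive is true.
function rangeScan(sortedKeys, low, high, inclusive) {
  const matched = [];
  let nscanned = 0;
  for (let i = seek(sortedKeys, low, !inclusive); i < sortedKeys.length; i++) {
    const k = sortedKeys[i];
    if (inclusive ? k > high : k >= high) break; // past the end of the range
    nscanned++;
    matched.push(k);
  }
  return { nscanned, matched };
}

const inclusive = rangeScan(keys, 4, 7, true);  // like {x:{$gte:4,$lte:7}}
const exclusive = rangeScan(keys, 4, 7, false); // like {x:{$gt:4,$lt:7}}
console.log(inclusive.nscanned, exclusive.nscanned); // 4 2
```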
  • 36. Multikeys
    • db.c.find( {x:{$gt:7}} )
    • 37. Index {x:1}
    • 38. documents contain lists with several values like [8,9].
  • 39. Multikeys 1 2 3 4 5 6 7 9 ? > 7 {_id:4,x:[8,9]} 8
  • 40. Multikeys > db.c.find( {x:{$gt:7}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 7, 1.7976931348623157e+308 ] ] } } All keys in the valid range are scanned, but the matcher rejects duplicate documents, making n == 1.
  • 41. Multikeys 1 2 3 4 5 6 7 8 9
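The multikey case can be sketched as one index entry per array element, with de-duplication by document id during the scan. `buildIndex` and `findGreaterThan` are hypothetical helpers, and the linear pass stands in for a real btree seek.

```javascript
// Hypothetical collection: one document stores a list in x.
const docs = [
  { _id: 1, x: 3 },
  { _id: 2, x: 7 },
  { _id: 4, x: [8, 9] },
];

// One index entry per array element, so {_id:4} appears under both 8 and 9.
function buildIndex(collection) {
  const entries = [];
  for (const doc of collection) {
    const values = Array.isArray(doc.x) ? doc.x : [doc.x];
    for (const key of values) entries.push({ key, id: doc._id });
  }
  return entries.sort((a, b) => a.key - b.key);
}

// Scan keys > bound and report each document once.
function findGreaterThan(entries, bound) {
  const seen = new Set();
  let nscanned = 0;
  for (const entry of entries) {
    if (entry.key <= bound) continue; // outside the range; a real btree would seek past these
    nscanned++;
    seen.add(entry.id); // the Set drops the duplicate document
  }
  return { nscanned, n: seen.size, ids: [...seen] };
}

const result = findGreaterThan(buildIndex(docs), 7);
console.log(result.nscanned, result.n); // 2 1
```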
  • 42. Range Types
    • Explicit inequality
      • db.c.find( {x:{$gt:4,$lt:7}} )
      • 43. db.c.find( {x:{$gt:4}} )
      • 44. db.c.find( {x:{$ne:4}} )
    • Regular expression prefix
      • db.c.find( {x:/^a/} )
    • Data type
      • db.c.find( {x:/a/} )
  • 45. Range Types db.c.find( {x:/^a/} ) "indexBounds" : { "x" : [ [ "a", "b" ], [ /^a/, /^a/ ] ] } Two ranges of two different types are scanned: string and regex.
  • 46. Range Types db.c.find( {x:/a/} ) "indexBounds" : { "x" : [ [ "", { } ], [ /a/, /a/ ] ] } Here the index only restricts the data type, which is not efficient in practice.
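One way to see where the string range [ "a", "b" ] comes from is to derive it from the regex itself. The sketch below is an assumption about the idea, not MongoDB's implementation: the hypothetical `prefixBounds` handles only a plain alphanumeric prefix anchored with ^ and returns null for everything else, such as /a/.

```javascript
// Derive a string range from an anchored literal prefix, e.g. /^a/ -> ["a", "b"].
// Returns null when no useful range exists (unanchored, flagged, or non-literal regex).
function prefixBounds(regex) {
  if (regex.flags !== "") return null; // flags such as "i" would invalidate the range
  const m = /^\^([A-Za-z0-9]+)$/.exec(regex.source);
  if (!m) return null;
  const prefix = m[1];
  // The upper bound is the prefix with its last character incremented by one.
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, upper];
}

console.log(prefixBounds(/^a/)); // [ 'a', 'b' ]
console.log(prefixBounds(/a/));  // null
```

Every string that starts with "a" sorts at or after "a" and before "b", so the scan can be limited to that range, while an unanchored regex gives no such bound.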
  • 47. Set Match
    • db.c.find( {x:{$in:[3,6]}} )
    • 48. Index {x:1}
  • 49. Set Match 8 1 2 3 4 5 6 7 9 3 , 6
  • 50. Set Match > db.c.find( {x:{$in:[3,6]}} ).explain() { "cursor" : "BtreeCursor x_1 multi", "nscanned" : 3, "nscannedObjects" : 2, "n" : 2, "millis" : 8, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 3 ], [ 6, 6 ] ] }} Why is nscanned 3? This is an algorithmic detail: when there are disjoint ranges for a key, nscanned may be higher than the number of matching keys.
  • 51. Set Match 1 2 3 4 5 6 7 8 9
  • 52. All Match
    • db.c.find( {x:{$all:[3,6]}} )
    • 53. Index {x:1}
  • 54. All Match 8 1 2 3 4 5 6 7 9 3 ? {_id:4,x:[3,6]}
  • 55. All Match > db.c.find( {x:{$all:[3,6]}} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ 3, 3 ] ] } } The first entry in the $all match array is always used for index bounds. Note this may not be the least numerous indexed value in the $all array.
  • 56. All Match 1 2 3 4 5 6 7 8 9
  • 57. Limit
    • db.c.find( {x:{$lt:6},y:3} ).limit( 3 )
    • 58. Index {x:1}
  • 59. Limit 8 1 2 3 4 5 6 7 9 6 ? < y:3 y:1 y:3 y:3 y:3
  • 60. Limit > db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } } Scan until three matches are found, then stop.
  • 61. Skip
    • db.c.find( {x:{$lt:6},y:3} ).skip( 3 )
    • 62. Index {x:1}
  • 63. Skip 8 1 2 3 4 5 6 7 9 6 ? < y:3 y:1 y:3 y:3 y:3
  • 64. Skip > db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } } All skipped documents are scanned.
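The limit and skip numbers above can be reproduced with a short sketch. It assumes the diagram's y values (3, 1, 3, 3, 3) belong to x = 1 through 5; the array `entries` and the helper `scan` are hypothetical.

```javascript
// Index-ordered documents; the y values for x = 1..5 are assumed from the diagram.
const entries = [
  { x: 1, y: 3 }, { x: 2, y: 1 }, { x: 3, y: 3 }, { x: 4, y: 3 }, { x: 5, y: 3 },
  { x: 6, y: 3 }, { x: 7, y: 3 },
];

// Emulates find( {x:{$lt:6}, y:3} ) with optional limit and skip over an index on x.
function scan(indexEntries, { limit = Infinity, skip = 0 } = {}) {
  let nscanned = 0;
  let matched = 0;
  const results = [];
  for (const doc of indexEntries) {
    if (doc.x >= 6) break;              // end of the index range
    if (results.length >= limit) break; // limit satisfied, stop early
    nscanned++;
    if (doc.y !== 3) continue;          // matcher rejects the document
    matched++;
    if (matched > skip) results.push(doc); // skipped matches were still scanned
  }
  return { nscanned, n: results.length };
}

const limited = scan(entries, { limit: 3 });
const skipped = scan(entries, { skip: 3 });
console.log(limited); // { nscanned: 4, n: 3 }
console.log(skipped); // { nscanned: 5, n: 1 }
```

A limit lets the scan stop early, while a skip saves nothing: the skipped documents still have to be found and matched.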
  • 65. Sort
    • db.c.find( {x:{$lt:6}} ).sort( {x:1} )
    • 66. Index {x:1}
    • 67. Sorting along index key uses index btree ordering
  • 68. Sort 8 1 2 3 4 5 6 7 9 6 ? < y:3 y:1 y:3 y:3 y:3
  • 69. Sort > db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } } Find uses the btree cursor to easily sort data
  • 70. Sort
    • db.c.find( {x:{$lt:6}} ).sort( {y:1} )
    • 71. Index {x:1}
    • 72. Using non-indexed key to sort data will need to scan & order
  • 73. Sort 8 1 2 3 4 5 6 7 9 6 ? < y:3 y:1 y:3 y:3 y:3
  • 74. Sort Results are sorted on the fly to match requested order. The scanAndOrder field is only printed when its value is true. > db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 5, "nscannedObjects" : 5, "n" : 4, "scanAndOrder" : true, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : true, "indexOnly" : false, "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] } }
  • 75. Sort and scanAndOrder
    • With “scanAndOrder” sort, all documents must be touched even if there is a limit spec.
    • 76. With scanAndOrder, sorting is performed in memory and the memory footprint is constrained by the limit spec if present.
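The two scanAndOrder points can be illustrated with a bounded in-memory sort: every document is touched, but the buffer never holds more than `limit` entries. This is a sketch of the idea only; the helper `scanAndOrder` is hypothetical and not MongoDB's actual sort code.

```javascript
// Touch every document, keeping only the `limit` smallest by sortKey in memory.
function scanAndOrder(docs, sortKey, limit = Infinity) {
  const buffer = []; // kept sorted ascending by sortKey
  let touched = 0;
  let peak = 0;      // largest buffer size after trimming
  for (const doc of docs) {
    touched++;
    // Find the insertion point from the end so equal keys keep scan order.
    let i = buffer.length;
    while (i > 0 && buffer[i - 1][sortKey] > doc[sortKey]) i--;
    buffer.splice(i, 0, doc);
    if (buffer.length > limit) buffer.pop(); // drop the current largest
    peak = Math.max(peak, buffer.length);
  }
  return { touched, peak, results: buffer };
}

const sample = [{ y: 3 }, { y: 1 }, { y: 3 }, { y: 2 }, { y: 5 }];
const top2 = scanAndOrder(sample, "y", 2);
console.log(top2.touched, top2.results); // 5 [ { y: 1 }, { y: 2 } ]
```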
  • 77. Count
    • Count uses the same index but scans only the index, not the document data in storage
    • 78. With some operators the full document must be checked. Some of these cases:
      • With current semantics, all multikey elements must match negation constraints
    • Multikey de-duplication works without loading the full document
  • 82. Covered Indexes
    • db.c.find( {x:6}, {x:1,_id:0} )
    • 83. Index {x:1}
    _id would be returned by default but is not in the index, so we must exclude it to return only indexed fields.
  • 84. Covered Indexes > db.c.find( {x:6}, {x:1,_id:0} ).explain() { "cursor" : "BtreeCursor x_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : true, "indexBounds" : { "x" : [ [ 6, 6 ] ] } } IndexOnly is true, and isMultiKey must be false. Currently we set isMultiKey to true the first time we save a doc where the field is a multikey array.
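The conditions for a covered query stated on these slides can be written as a small check. `canBeCovered` is a hypothetical helper that simplifies the rules to what the slides mention: every returned field must be in the index, _id must be excluded unless indexed, and the index must not be multikey.

```javascript
// Returns true when a projection could be answered from index keys alone.
function canBeCovered(indexKeys, projection, isMultiKey) {
  if (isMultiKey) return false; // slide: isMultiKey must be false
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id comes back by default unless it is explicitly excluded with _id:0.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  const indexed = Object.keys(indexKeys);
  return returned.length > 0 && returned.every((f) => indexed.includes(f));
}

console.log(canBeCovered({ x: 1 }, { x: 1, _id: 0 }, false)); // true
console.log(canBeCovered({ x: 1 }, { x: 1 }, false));         // false
```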
  • 85. Two Equality Bounds
    • db.c.find( {x:5,y:'c'} )
    • 86. Index {x:1,y:1}
  • 87. Two Equality Bounds ? 5 c 1 b 3 d 4 g 5 d 5 f 6 c 7 a 9 b 5 c
  • 88. Two Equality Bounds > db.c.find( {x:5,y:'c'} ).explain() { "cursor" : "BtreeCursor x_1_y_1", "nscanned" : 1, "nscannedObjects" : 1, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ] ]}} Two ranges are applied to narrow down the data to scan.
  • 89. Two Equality Bounds ? 1 b 3 d 4 g 5 c 5 d 5 f 5 c 6 c 7 a 9 b
  • 90. Two Set Bounds
    • db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} )
    • 91. Index {x:1,y:1}
  • 92. Two Set Bounds , , , 5 c 1 b 3 d 4 g 5 d 5 f 6 c 7 a 9 f 5 c 5 f 9 c 9 f
  • 93. Two Set Bounds > db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain() { "cursor" : "BtreeCursor x_1_y_1 multi", "nscanned" : 5, "nscannedObjects" : 3, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, ... "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] } }
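With two $in sets on a compound index, the candidate keys are the combinations of the two sets, as the diagram's four targets suggest. The sketch below uses the index keys from the diagram; `pointTargets` is a hypothetical helper, and the linear lookup does not try to reproduce the nscanned value of 5 from the explain output.

```javascript
// Compound index keys {x:1,y:1} as shown on the diagram.
const indexKeys = [
  [1, "b"], [3, "d"], [4, "g"], [5, "c"], [5, "d"], [5, "f"], [6, "c"], [7, "a"], [9, "f"],
];

// Every (x, y) combination of the two $in sets is a point the scan must look for.
function pointTargets(xs, ys) {
  const targets = [];
  for (const x of xs) for (const y of ys) targets.push([x, y]);
  return targets;
}

const targets = pointTargets([5, 9], ["c", "f"]);
const matches = targets.filter(([x, y]) =>
  indexKeys.some(([kx, ky]) => kx === x && ky === y)
);
console.log(targets.length, matches); // 4 [ [ 5, 'c' ], [ 5, 'f' ], [ 9, 'f' ] ]
```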
  • 94. Disjoint $or Criteria
    • db.c.find( {$or:[{x:5},{y:'d'}]} )
    • 95. Indexes {x:1}, {y:1}
    • 96. Runs one find per clause, sequentially
    • 97. Must not return the same document twice, so each document is checked against the previous clauses
  • 98. Disjoint $or Criteria ? ? 1 b 3 d 4 g 5 d 6 a 7 e 9 f 5 c d 7 g 5 1 b 3 d 4 g 5 d 6 a 7 e 9 f 5 c 7 g
  • 99. Disjoint $or Criteria > db.c.find( {$or:[{x:5},{y:'d'}]} ).explain() { "clauses" : [ { "cursor" : "BtreeCursor x_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 2, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "x" : [ [ 5, 5 ] ] } }, { "cursor" : "BtreeCursor y_1", "nscanned" : 2, "nscannedObjects" : 2, "n" : 1, "millis" : 1, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { "y" : [ [ "d", "d" ] ] } }], "nscanned" : 4, "nscannedObjects" : 4, "n" : 3, "millis" : 1}
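The per-clause counts in this explain output follow from running the clauses one after another and rejecting documents that an earlier clause already returned. A minimal sketch using the nine documents from the diagram; `findOr` is hypothetical, and `filter` stands in for each clause's index scan.

```javascript
// The nine (x, y) documents from the diagram.
const docs = [
  [1, "b"], [3, "d"], [4, "g"], [5, "d"], [6, "a"], [7, "e"], [9, "f"], [5, "c"], [7, "g"],
].map(([x, y], i) => ({ _id: i, x, y }));

// Each clause is a single {field, value} equality, resolved by its own index scan.
function findOr(collection, clauses) {
  const results = [];
  const perClause = [];
  clauses.forEach((clause, idx) => {
    const scanned = collection.filter((d) => d[clause.field] === clause.value);
    let n = 0;
    for (const doc of scanned) {
      // Skip documents that already satisfied an earlier clause.
      const seenBefore = clauses
        .slice(0, idx)
        .some((prev) => doc[prev.field] === prev.value);
      if (seenBefore) continue;
      n++;
      results.push(doc);
    }
    perClause.push({ nscanned: scanned.length, n });
  });
  return { perClause, n: results.length };
}

const orResult = findOr(docs, [
  { field: "x", value: 5 },
  { field: "y", value: "d" },
]);
console.log(orResult.perClause, orResult.n);
// [ { nscanned: 2, n: 2 }, { nscanned: 2, n: 1 } ] 3
```

The document {x:5, y:'d'} is scanned by both clauses but returned only by the first, which is why the second clause reports n of 1.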
  • 100. Unindexed $or Clause
    • db.c.find( {$or:[{x:5},{y:'d'}]} )
    • 101. Index {x:1} (no index on y)
  • 102. Unindexed $or Clause > db.c.find( {$or:[{x:5},{y:'d'}]} ).explain() { "cursor" : "BasicCursor", "nscanned" : 9, "nscannedObjects" : 9, "n" : 3, "millis" : 0, "nYields" : 0, "nChunkSkips" : 0, "isMultiKey" : false, "indexOnly" : false, "indexBounds" : { } } Since y is not indexed, we must do a full collection scan to match y:'d'. Since a full scan is required, we don’t use the index on x to match x:5.
  • 103. Automatic Index Selection (Query Optimizer)
  • 104. Optimal Index
    • find( {x:5} )
      • Index {x:1}
      • 105. Index {x:1,y:1}
    • find( {x:5} ).sort( {y:1 } )
      • Index {x:1,y:1}
    • find( {} ).sort( {x:1} )
      • Index {x:1}
    • find( {x:{$gt:1,$lt:7}} ).sort( {x:1} )
      • Index {x:1}
  • 106. Optimal Index
    • Rule of Thumb
      • No scanAndOrder
      • 107. All fields with index useful constraints are indexed
      • 108. If there is a range or sort it is the last field of the index used to resolve the query
    • If multiple optimal indexes exist, one is chosen arbitrarily.
  • 109. Multiple Candidate Indexes
    • find( {x:4,y:'a'} )
      • Index {x:1} or {y:1}?
    • find( {x:4} ).sort( {y:1} )
      • Index {x:1} or {y:1}?
      • 110. Note: {x:1,y:1} is optimal
    • find( {x:{$gt:2,$lt:7},y:{$gt:'a',$lt:'f'}} )
      • Index {x:1,y:1} or {y:1,x:1}?
  • 111. Multiple Candidate Indexes
    • The only index selection criterion is nscanned
    • 112. find( {x:4,y:'a'} )
      • Index {x:1} or {y:1} ?
      • 113. If fewer documents match {y:'a'} than {x:4}, then nscanned for {y:1} will be lower, so we pick {y:1}
    • find( {x:{$gt:2,$lt:7},y:{$gt:'b',$lt:'f'}} )
      • Index {x:1,y:1} or {y:1,x:1} ?
      • 114. If there are fewer distinct values of 2 < x < 7 than of 'b' < y < 'f', then {x:1,y:1} is chosen (rule of thumb)
  • 115. Multiple Candidate Indexes
    • The only index selection criterion is nscanned
    • 116. Pretty good, but doesn’t cover every case, e.g.
      • Overhead of using an index versus doing a collection scan
      • 117. Cost of scanAndOrder vs ordered index
      • 118. Cost of loading full document vs just index key
      • 119. Cost of scanning adjacent btree keys vs non adjacent keys/documents
  • 120. Competing Indexes
    • At most one query plan per index
    • 121. Run in interleaved fashion
    • 122. Plans kept in a priority queue ordered by nscanned. We always continue progress on plan with lowest nscanned.
  • 123. Competing Indexes
    • Run until one plan returns all results or enough results to satisfy the initial query request (based on soft limit spec / data size requirement for initial query).
    • 124. We only allow plans to compete in initial query. In getMore, we continue reading from the index cursor established by the initial query.
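The interleaving described on the last two slides can be sketched as a race in which the plan with the lowest nscanned always takes the next step. Everything here is a simplified assumption: `makePlan`, `step` and `race` are hypothetical, each plan is a precomputed list of scan outcomes, and a linear minimum search stands in for the priority queue.

```javascript
// A hypothetical plan: scanSteps[i] is true when the i-th scanned key yields a match.
function makePlan(name, scanSteps) {
  return { name, scanSteps, nscanned: 0, results: 0, done: false };
}

// Advance a plan by one scanned key.
function step(plan) {
  if (plan.scanSteps[plan.nscanned]) plan.results++;
  plan.nscanned++;
  if (plan.nscanned >= plan.scanSteps.length) plan.done = true; // index range exhausted
}

// Always continue the plan with the lowest nscanned; stop when one plan
// has returned all of its results or enough results for the first batch.
function race(plans, wanted) {
  for (;;) {
    const next = plans.reduce((best, p) => (p.nscanned < best.nscanned ? p : best));
    step(next);
    if (next.done || next.results >= wanted) return next;
  }
}

const planX = makePlan("x_1", [true, false, true]);
const planY = makePlan("y_1", [false, false, false, false, true, false, true, false]);
const winner = race([planX, planY], 100);
console.log(winner.name, winner.nscanned, planY.nscanned); // x_1 3 2
```

The plan on x finishes after scanning three keys, so the plan on y is abandoned after two steps instead of running to completion.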
  • 125. “Learning” a Query Plan
    • When an index is chosen for a query, the query’s “pattern” and nscanned are recorded
      • find( {x:3,y:'c'} )
      • {Pattern: {x:'equality', y:'equality'}, Index: {x:1}, nscanned: 50}
      • 126. find( {x:{$gt:5},y:{$lt:'z'}} )
      • 127. {Pattern: {x:'gt bound', y:'lt bound'}, Index: {y:1}, nscanned: 500}
  • 128. “Learning” a Query Plan
    • When a new query matches the same pattern, the same query plan is used
      • find( {x:5,y:'z'} )
      • Use index {x:1}
      • 129. find( {x:{$gt:20},y:{$lt:'b'}} )
      • 130. Use index {y:1}
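A query pattern keeps the shape of the constraints and discards the values, which is why the queries above reuse the recorded plans. The sketch below is a guess at that idea, not the server's data structure: `queryPattern`, `patternKey`, `learn` and `recall` are hypothetical, and the labels "gt and lt bounds" and "other" are invented for cases the slides do not show.

```javascript
// Reduce a query to its pattern: constraint type per field, values discarded.
function queryPattern(query) {
  const pattern = {};
  for (const [field, cond] of Object.entries(query)) {
    const isOperatorObject =
      cond !== null && typeof cond === "object" && !Array.isArray(cond);
    if (!isOperatorObject) {
      pattern[field] = "equality";
      continue;
    }
    const ops = Object.keys(cond);
    const lower = ops.includes("$gt") || ops.includes("$gte");
    const upper = ops.includes("$lt") || ops.includes("$lte");
    if (lower && upper) pattern[field] = "gt and lt bounds"; // hypothetical label
    else if (lower) pattern[field] = "gt bound";
    else if (upper) pattern[field] = "lt bound";
    else pattern[field] = "other"; // hypothetical label
  }
  return pattern;
}

// Stable cache key that does not depend on the order of fields in the query.
function patternKey(pattern) {
  return JSON.stringify(Object.keys(pattern).sort().map((f) => [f, pattern[f]]));
}

const learned = new Map();
function learn(query, index, nscanned) {
  learned.set(patternKey(queryPattern(query)), { index, nscanned });
}
function recall(query) {
  return learned.get(patternKey(queryPattern(query)));
}

learn({ x: 3, y: "c" }, { x: 1 }, 50);
learn({ x: { $gt: 5 }, y: { $lt: "z" } }, { y: 1 }, 500);
console.log(recall({ x: 5, y: "z" }).index);                       // { x: 1 }
console.log(recall({ x: { $gt: 20 }, y: { $lt: "b" } }).index);    // { y: 1 }
```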
  • 131. “Un-Learning” a Query Plan
    • 100 writes to the collection
    • 132. Indexes added / removed
  • 133. Bad Plan Insurance
    • If nscanned for a new query using a recorded plan is much worse than the recorded nscanned for an earlier query with the same pattern, we start interleaving other plans with the current plan.
    • 134. Currently “much worse” means 10x
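The un-learning rules and the 10x check can be combined in a small cache sketch. The class `PlanCache` and its method names are hypothetical, and whether the real threshold comparison is strict is an assumption.

```javascript
// Hypothetical cache of recorded plans, keyed by a query pattern string.
class PlanCache {
  constructor() {
    this.plans = new Map();
    this.writes = 0;
  }
  record(patternKey, index, nscanned) {
    this.plans.set(patternKey, { index, nscanned });
  }
  lookup(patternKey) {
    return this.plans.get(patternKey);
  }
  clear() {
    this.plans.clear();
    this.writes = 0;
  }
  // Un-learn after 100 writes to the collection.
  noteWrite() {
    this.writes++;
    if (this.writes >= 100) this.clear();
  }
  // Un-learn whenever an index is added or removed.
  noteIndexChange() {
    this.clear();
  }
  // Bad plan insurance: true when other plans should be interleaved again.
  // Assumes "much worse" means strictly more than 10x the recorded nscanned.
  shouldReevaluate(patternKey, currentNscanned) {
    const plan = this.plans.get(patternKey);
    return plan !== undefined && currentNscanned > 10 * plan.nscanned;
  }
}

const cache = new PlanCache();
cache.record("x:equality,y:equality", { x: 1 }, 50);
console.log(cache.shouldReevaluate("x:equality,y:equality", 400)); // false
console.log(cache.shouldReevaluate("x:equality,y:equality", 501)); // true
for (let i = 0; i < 100; i++) cache.noteWrite();
console.log(cache.lookup("x:equality,y:equality")); // undefined
```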
  • 135. Thanks! Feature Requests jira.mongodb.org Support groups.google.com/group/mongodb-user