2011 Mongo FR - Indexing in MongoDB

1. MongoDB Indexing and Query Optimizer Details
Antoine Girbal, Mongo FR, March 23, 2011

3. A full understanding of these details is not required to use MongoDB, but this knowledge can be helpful when optimizing queries.

4. We’ll discuss the functionality of MongoDB 1.8 (for our purposes very similar to 1.6 and almost identical to 1.7 edge).

5. Much of the material will be presented through examples.

6. Diagrams are meant to aid understanding; some details are left out.

7. Btree (conceptual diagram: sorted keys 1 2 3 4 5 6 7 8 9; key 6 points to the document {_id:4,x:6})

9. Index {x:1}

10. Find One Document (diagram: the search for 6 descends the btree over keys 1 to 9 and reaches {_id:4,x:6})

11. Find One Document
> db.c.find( {x:6} ).limit( 1 ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ 6, 6 ] ] }
}
Uses a btree cursor to find the document. The index bounds cover a single value.
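To make the btree lookup concrete, here is a minimal JavaScript sketch that models the {x:1} index as a sorted array of [key, _id] pairs and uses binary search in place of the tree descent. The array model, the `lowerBound` and `findWithLimit` helpers, and every _id except 4 are hypothetical; a real btree is a tree of pages, not a flat array.

```javascript
// Hypothetical model of the {x:1} index: [key, _id] pairs sorted by key.
// Only {_id:4, x:6} comes from the slide; the other _id values are made up.
const index = [
  [1, 10], [2, 11], [3, 12], [4, 13], [5, 14],
  [6, 4], [7, 15], [8, 16], [9, 17],
];

// Binary search for the first entry whose key is >= value: this plays the
// role of descending the btree to the start of the index bounds.
function lowerBound(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Scan the bounds [value, value] and stop once `limit` entries are found
// (limit 0 means no limit).
function findWithLimit(index, value, limit = 0) {
  const ids = [];
  let nscanned = 0;
  for (let pos = lowerBound(index, value); pos < index.length; pos++) {
    if (index[pos][0] > value) break; // left the index bounds
    nscanned++;
    ids.push(index[pos][1]);
    if (limit > 0 && ids.length === limit) break;
  }
  return { nscanned, n: ids.length, ids };
}

const result = findWithLimit(index, 6, 1); // nscanned 1, n 1, ids [4]
```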

14. Find One Document (diagram: keys 1 2 3 4 5 6 6 6 9; the search for 6 reaches {_id:4,x:6}). Now we have duplicate x values.

17. Index {x:1}

18. Several documents to be returned

19. Equality Match (diagram: keys 1 2 3 4 5 6 6 6 9; all three keys equal to 6 are matched, returning {_id:4,x:6}, {_id:5,x:6} and {_id:1,x:6})

20. Equality Match
> db.c.find( {x:6} ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 3,
  "nscannedObjects" : 3,
  "n" : 3,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ 6, 6 ] ] }
}

21. Equality Match (diagram: keys 1 2 3 4 5 6 6 6 9, searching for 6)

23. Index {x:1}

24. Object content needs to be checked

25. Full Document Matcher (diagram: keys 1 2 3 4 5 6 6 6 9; the three keys equal to 6 point to {y:4,x:6}, {y:5,x:6} and {y:1,x:6})

26. Full Document Matcher
> db.c.find( {x:6,y:1} ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 3,
  "nscannedObjects" : 3,
  "n" : 1,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ 6, 6 ] ] }
}
The documents for all matching index keys are loaded and scanned, but only one of them matches on the non-indexed key y.
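The split between index bounds and the full document matcher can be sketched in JavaScript: the bounds on x pick the candidate keys, and each candidate document is then loaded and checked against the whole query. The documents other than the three with x:6 and the `matches`/`findWithMatcher` helpers are hypothetical, and only equality conditions are supported.

```javascript
// Hypothetical collection; the three documents with x:6 mirror the slide.
const docs = [
  { _id: 1, x: 1, y: 7 },
  { _id: 2, x: 2, y: 8 },
  { _id: 3, x: 3, y: 9 },
  { _id: 4, x: 4, y: 2 },
  { _id: 5, x: 5, y: 3 },
  { _id: 6, x: 6, y: 4 },
  { _id: 7, x: 6, y: 5 },
  { _id: 8, x: 6, y: 1 },
  { _id: 9, x: 9, y: 6 },
];

// {x:1} index: [key, position in docs], sorted by key.
const index = docs.map((d, i) => [d.x, i]).sort((a, b) => a[0] - b[0]);

// Full document matcher: every equality condition in the query must hold.
function matches(doc, query) {
  return Object.entries(query).every(([field, value]) => doc[field] === value);
}

// Use the index only for `field`; check the rest of the query on each document.
function findWithMatcher(query, field) {
  const value = query[field];
  let nscanned = 0;
  let nscannedObjects = 0;
  const ids = [];
  for (const [key, i] of index) {
    if (key < value) continue; // a real btree seeks past these keys
    if (key > value) break;    // past the index bounds [value, value]
    nscanned++;
    nscannedObjects++;         // the document is loaded to check y
    if (matches(docs[i], query)) ids.push(docs[i]._id);
  }
  return { nscanned, nscannedObjects, n: ids.length, ids };
}

const r = findWithMatcher({ x: 6, y: 1 }, "x"); // nscanned 3, nscannedObjects 3, n 1
```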

28. Index {x:1}

29. Range Match (diagram: keys 1 to 9; bounds 4 <= x <= 7)

30. Range Match
> db.c.find( {x:{$gte:4,$lte:7}} ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 4,
  "nscannedObjects" : 4,
  "n" : 4,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ 4, 7 ] ] }
}
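A minimal JavaScript sketch of a range scan over the keys in the diagram: seek to the lower bound, then walk keys in order until the upper bound is passed. The `seek` and `rangeScan` names are hypothetical. The optional exclusive flags model $gt/$lt, for which the boundary keys themselves are never visited.

```javascript
// Hypothetical {x:1} index: the sorted keys from the slide's diagram.
const keys = [1, 2, 3, 4, 5, 6, 7, 8, 9];

// First position whose key is >= value, or > value when `strict` is true
// (the btree seek to the start of the range).
function seek(keys, value, strict) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    const before = strict ? keys[mid] <= value : keys[mid] < value;
    if (before) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Walk keys in order from the lower bound until the upper bound is passed.
// Inclusive bounds model $gte/$lte; exclusive bounds ($gt/$lt) never visit
// the boundary keys themselves.
function rangeScan(keys, lo, hi, { loInclusive = true, hiInclusive = true } = {}) {
  const visited = [];
  for (let pos = seek(keys, lo, !loInclusive); pos < keys.length; pos++) {
    const k = keys[pos];
    if (hiInclusive ? k > hi : k >= hi) break; // past the upper bound
    visited.push(k);
  }
  return { nscanned: visited.length, keys: visited };
}

const inclusive = rangeScan(keys, 4, 7); // nscanned 4: keys 4, 5, 6, 7
```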

31. Range Match (diagram: keys 1 2 3 4 5 6 7 8 9)

33. Index {x:1}

34. Exclusive range match ($gt/$lt): the index range is the same as for an inclusive range match,

35. but the boundary keys are neither scanned nor returned.

37. Index {x:1}

38. Documents contain arrays with several values, like [8,9].

39. Multikeys (diagram: keys 1 to 7 plus 8 and 9; the query x > 7 matches keys 8 and 9, which both point to {_id:4,x:[8,9]})

40. Multikeys
> db.c.find( {x:{$gt:7}} ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 2,
  "nscannedObjects" : 2,
  "n" : 1,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : true,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ 7, 1.7976931348623157e+308 ] ] }
}
All keys in the valid range are scanned, but the matcher rejects the duplicate document, making n == 1.
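A multikey index stores one key per array element, so one document can be reached through several keys. The JavaScript sketch below models this with a hypothetical collection (only {_id:4, x:[8,9]} comes from the slide) and hypothetical `buildIndex`/`findGreaterThan` helpers; a `Set` of already-returned ids stands in for the duplicate check.

```javascript
// Hypothetical collection: scalar x values 1..7 plus one array document.
const docs = [
  { _id: 11, x: 1 }, { _id: 12, x: 2 }, { _id: 13, x: 3 }, { _id: 14, x: 4 },
  { _id: 15, x: 5 }, { _id: 16, x: 6 }, { _id: 17, x: 7 },
  { _id: 4, x: [8, 9] },
];

// Build a {field:1} index; an array value contributes one key per element.
function buildIndex(docs, field) {
  const entries = [];
  for (const d of docs) {
    const value = d[field];
    for (const key of Array.isArray(value) ? value : [value]) entries.push([key, d._id]);
  }
  entries.sort((a, b) => a[0] - b[0]);
  const isMultiKey = docs.some((d) => Array.isArray(d[field]));
  return { entries, isMultiKey };
}

// find({field: {$gt: bound}}): scan every key in range, but return each
// document only once even if several of its keys fall in the range.
function findGreaterThan(index, bound) {
  const seen = new Set();
  const ids = [];
  let nscanned = 0;
  for (const [key, id] of index.entries) {
    if (key <= bound) continue; // a real btree would seek past these keys
    nscanned++;
    if (seen.has(id)) continue; // duplicate: already returned via another key
    seen.add(id);
    ids.push(id);
  }
  return { nscanned, n: ids.length, ids, isMultiKey: index.isMultiKey };
}

const r = findGreaterThan(buildIndex(docs, "x"), 7); // nscanned 2, n 1, ids [4]
```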

41. Multikeys (diagram: keys 1 2 3 4 5 6 7 8 9)

43. db.c.find( {x:{$gt:4}} )

45. Range Types
> db.c.find( {x:/^a/} )
"indexBounds" : { "x" : [ [ "a", "b" ], [ /^a/, /^a/ ] ] }
Two ranges are scanned, of two different types: string and regex.
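The string range for /^a/ follows from the prefix: every string that starts with "a" sorts at or after "a" and strictly before "b". Here is a hypothetical JavaScript helper that derives such a half-open range; it handles only simple alphanumeric prefixes and returns null otherwise (for example for an unanchored regex such as /a/). It is a sketch of the idea, not MongoDB's implementation.

```javascript
// Derive the string range scanned for an anchored prefix regex such as /^a/:
// all strings starting with the prefix sort in [prefix, next), where `next` is
// the prefix with its last character incremented. Only simple alphanumeric
// prefixes are handled; anything else (e.g. unanchored /a/) returns null.
function prefixBounds(regex) {
  const m = /^\^([A-Za-z0-9]+)$/.exec(regex.source);
  if (m === null) return null;
  const prefix = m[1];
  const last = prefix.charCodeAt(prefix.length - 1);
  return [prefix, prefix.slice(0, -1) + String.fromCharCode(last + 1)];
}

// True when a string falls inside the half-open range [lo, hi).
function inRange(s, [lo, hi]) {
  return s >= lo && s < hi;
}

const bounds = prefixBounds(/^a/); // ["a", "b"]
```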

46. Range Types
> db.c.find( {x:/a/} )
"indexBounds" : { "x" : [ [ "", { } ], [ /a/, /a/ ] ] }
Here the index only helps to restrict the type (every string key is scanned), so it is not efficient in practice.

48. Index {x:1}

49. Set Match (diagram: keys 1 to 9; $in values 3 and 6)

50. Set Match
> db.c.find( {x:{$in:[3,6]}} ).explain()
{
  "cursor" : "BtreeCursor x_1 multi",
  "nscanned" : 3,
  "nscannedObjects" : 2,
  "n" : 2,
  "millis" : 8,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ 3, 3 ], [ 6, 6 ] ] }
}
Why is nscanned 3? This is an algorithmic detail: when there are disjoint ranges for a key, nscanned may be higher than the number of matching keys.
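One way to see how nscanned can exceed the number of matches is a single cursor walking several disjoint ranges. The JavaScript sketch below uses an assumed accounting rule (every key the cursor lands on counts, including a key in the gap between two ranges, but not the key that ends the final range). That rule is a guess which happens to reproduce the numbers on this slide and on the equality and range slides; it is not MongoDB's code, and `multiRangeScan` is a hypothetical name.

```javascript
// Hypothetical {x:1} index from the diagram: one document per key 1..9.
const index = [1, 2, 3, 4, 5, 6, 7, 8, 9].map((k) => [k, `doc${k}`]);

// First position whose key is >= value (the btree seek).
function lowerBound(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Walk sorted, disjoint, inclusive ranges [[lo, hi], ...] with one cursor.
// Assumed accounting: each key the cursor lands on counts toward nscanned,
// including a key in a gap between ranges, but not the key past the final range.
function multiRangeScan(index, ranges) {
  let nscanned = 0;
  const docs = [];
  let r = 0;
  let pos = lowerBound(index, ranges[0][0]);
  while (pos < index.length) {
    const key = index[pos][0];
    while (r < ranges.length && key > ranges[r][1]) r++; // skip finished ranges
    if (r === ranges.length) break; // beyond the final range: done
    nscanned++;
    if (key >= ranges[r][0]) {
      docs.push(index[pos][1]); // key inside the current range
      pos++;
    } else {
      pos = lowerBound(index, ranges[r][0]); // in a gap: seek to the next range
    }
  }
  return { nscanned, n: docs.length, docs };
}

const inResult = multiRangeScan(index, [[3, 3], [6, 6]]); // $in:[3,6]
```

With $in:[3,6] the cursor counts key 3, then key 4 (which triggers the jump to the next range), then key 6, giving nscanned 3 for n 2.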

51. Set Match (diagram: keys 1 2 3 4 5 6 7 8 9)

53. Index {x:1}

54. All Match (diagram: keys 1 to 9; the search for 3 reaches {_id:4,x:[3,6]})

55. All Match
> db.c.find( {x:{$all:[3,6]}} ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : true,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ 3, 3 ] ] }
}
The first entry in the $all match array is always used for the index bounds. Note this may not be the least numerous indexed value in the $all array.
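The $all behaviour can be sketched in JavaScript: the index is scanned only for the first value of the $all array, and the matcher then confirms that each candidate's array holds every requested value. The collection other than {_id:4, x:[3,6]} and the `findAll` helper are hypothetical.

```javascript
// Hypothetical collection: {_id:4, x:[3,6]} from the slide plus scalar documents.
const docs = [
  { _id: 4, x: [3, 6] },
  { _id: 11, x: 1 }, { _id: 12, x: 2 }, { _id: 14, x: 4 },
  { _id: 15, x: 5 }, { _id: 17, x: 7 }, { _id: 18, x: 8 }, { _id: 19, x: 9 },
];
const byId = new Map(docs.map((d) => [d._id, d]));

// Multikey {x:1} index: one [key, _id] entry per array element.
const index = docs
  .flatMap((d) => [].concat(d.x).map((key) => [key, d._id]))
  .sort((a, b) => a[0] - b[0]);

// find({x: {$all: values}}): bound the scan by values[0] only, then let the
// matcher confirm that the document's array holds every requested value.
function findAll(values) {
  const bound = values[0];
  let nscanned = 0;
  const ids = [];
  for (const [key, id] of index) {
    if (key !== bound) continue; // stand-in for a btree seek to [bound, bound]
    nscanned++;
    const stored = [].concat(byId.get(id).x);
    if (values.every((v) => stored.includes(v))) ids.push(id);
  }
  return { nscanned, n: ids.length, ids };
}

const allResult = findAll([3, 6]); // nscanned 1, n 1, ids [4]
```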

56. All Match (diagram: keys 1 2 3 4 5 6 7 8 9)

58. Index {x:1}

59. Limit (diagram: keys 1 to 9; bound x < 6; the documents for keys 1 to 5 have y:3, y:1, y:3, y:3, y:3)

60. Limit
> db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 4,
  "nscannedObjects" : 4,
  "n" : 3,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : true,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] }
}
Scan until three matches are found, then stop.

62. Index {x:1}

63. Skip (diagram: keys 1 to 9; bound x < 6; the documents for keys 1 to 5 have y:3, y:1, y:3, y:3, y:3)

64. Skip
> db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 5,
  "nscannedObjects" : 5,
  "n" : 1,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : true,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] }
}
All skipped documents are scanned.
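The Limit and Skip slides can be reproduced with a short JavaScript sketch that walks documents in {x:1} order: limit stops the scan early, while skip still scans and matches every skipped document. The y values for x = 1 to 5 follow the diagram; the documents for x = 6 to 9 and the `findWithSkipLimit` helper are hypothetical.

```javascript
// Documents in {x:1} index order. y for x = 1..5 follows the slide's diagram;
// the rest are made up and lie outside the bound x < 6.
const docsByX = [
  { x: 1, y: 3 }, { x: 2, y: 1 }, { x: 3, y: 3 }, { x: 4, y: 3 }, { x: 5, y: 3 },
  { x: 6, y: 3 }, { x: 7, y: 3 }, { x: 8, y: 3 }, { x: 9, y: 3 },
];

// find({x: {$lt: 6}, y: 3}) with optional skip and limit (0 = no limit).
function findWithSkipLimit(docsByX, { skip = 0, limit = 0 } = {}) {
  let nscanned = 0;
  let skipped = 0;
  const results = [];
  for (const doc of docsByX) {
    if (doc.x >= 6) break;     // past the upper index bound
    nscanned++;                // key visited and document loaded
    if (doc.y !== 3) continue; // rejected by the matcher
    if (skipped < skip) {      // skipped documents are still scanned and matched
      skipped++;
      continue;
    }
    results.push(doc);
    if (limit > 0 && results.length === limit) break; // stop at the limit
  }
  return { nscanned, n: results.length, results };
}

const limited = findWithSkipLimit(docsByX, { limit: 3 }); // nscanned 4, n 3
const skipped = findWithSkipLimit(docsByX, { skip: 3 });  // nscanned 5, n 1
```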

66. Index {x:1}

67. Sorting on the indexed key uses the btree ordering.

68. Sort (diagram: keys 1 to 9; bound x < 6; the documents for keys 1 to 5 have y:3, y:1, y:3, y:3, y:3)

69. Sort
> db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 5,
  "nscannedObjects" : 5,
  "n" : 4,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : true,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] }
}
The btree cursor already returns documents in x order, so no separate sort step is needed.

71. Index {x:1}

72. Sorting on a non-indexed key requires a scan and order (scanAndOrder).

73. Sort (diagram: keys 1 to 9; bound x < 6; the documents for keys 1 to 5 have y:3, y:1, y:3, y:3, y:3)

74. Sort
> db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 5,
  "nscannedObjects" : 5,
  "n" : 4,
  "scanAndOrder" : true,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : true,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ -1.7976931348623157e+308, 6 ] ] }
}
Results are sorted on the fly to match the requested order. The scanAndOrder field is only printed when its value is true.

76. With scanAndOrder, sorting is performed in memory, and the memory footprint is bounded by the limit, if one is specified.
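Why a limit bounds the memory of scanAndOrder can be shown with a small JavaScript sketch: matching documents are inserted into a sorted buffer, and when a limit is given the largest entry is dropped after each insertion, so at most `limit` documents are retained between steps. The documents reuse the y values from the diagram, sorted on y without a y filter for illustration; `scanAndOrder` here is a hypothetical helper, not MongoDB's code.

```javascript
// Documents matching x < 6 from the diagram, in index order.
const docs = [
  { x: 1, y: 3 }, { x: 2, y: 1 }, { x: 3, y: 3 }, { x: 4, y: 3 }, { x: 5, y: 3 },
];

// In-memory sort on `field` while scanning. With a limit, the largest document
// is dropped after each insertion, so at most `limit` are retained between steps.
function scanAndOrder(docs, field, limit = 0) {
  const buffer = [];
  let peak = 0; // largest number of documents retained after a step
  for (const doc of docs) {
    // Insert after any equal keys so ties keep scan order.
    let i = buffer.findIndex((d) => d[field] > doc[field]);
    if (i === -1) i = buffer.length;
    buffer.splice(i, 0, doc);
    if (limit > 0 && buffer.length > limit) buffer.pop();
    peak = Math.max(peak, buffer.length);
  }
  return { results: buffer, peak };
}

const limited = scanAndOrder(docs, "y", 2); // results x = 2, then x = 1
```

Without a limit the buffer grows with the number of matching documents, which is why large unindexed sorts are costly.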

79. $size

80. array match

83. Index {x:1}. _id would be returned by default, but it isn’t in the index, so we need to exclude it to return only indexed fields.

84. Covered Indexes
> db.c.find( {x:6}, {x:1,_id:0} ).explain()
{
  "cursor" : "BtreeCursor x_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : true,
  "indexBounds" : { "x" : [ [ 6, 6 ] ] }
}
indexOnly is true; for a covered query isMultiKey must be false. Currently isMultiKey is set to true the first time a document is saved in which the indexed field is an array.
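The conditions on these two slides can be summarized as a predicate. The JavaScript sketch below is a simplified, hypothetical `isCovered` check written from the slides' rules (query and returned fields all indexed, _id excluded unless indexed, index not multikey); it is not MongoDB's internal logic.

```javascript
// Simplified check of whether a query can be answered from the index alone.
// keyPattern: e.g. {x: 1}; queryFields: fields used in the query;
// projection: e.g. {x: 1, _id: 0}; isMultiKey: as reported by explain().
function isCovered(keyPattern, queryFields, projection, isMultiKey) {
  if (isMultiKey) return false; // an index key holds one element, not the whole array
  const indexed = new Set(Object.keys(keyPattern));
  if (!queryFields.every((f) => indexed.has(f))) return false;
  const returned = Object.keys(projection).filter((f) => projection[f]);
  if (returned.length === 0) return false; // no inclusion list: whole documents returned
  // _id is returned unless excluded, so it must be excluded or indexed.
  const idExcluded = projection._id === 0 || projection._id === false;
  if (!idExcluded && !indexed.has("_id")) return false;
  return returned.every((f) => indexed.has(f));
}

const covered = isCovered({ x: 1 }, ["x"], { x: 1, _id: 0 }, false); // true
```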

86. Index {x:1,y:1}

87. Two Equality Bounds (diagram: compound keys (1,b) (3,d) (4,g) (5,c) (5,d) (5,f) (6,c) (7,a) (9,b); searching for (5,c))

88. Two Equality Bounds
> db.c.find( {x:5,y:'c'} ).explain()
{
  "cursor" : "BtreeCursor x_1_y_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 1,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : { "x" : [ [ 5, 5 ] ], "y" : [ [ "c", "c" ] ] }
}
Two ranges, one per indexed field, are applied to narrow down the data to scan.

89. Two Equality Bounds (diagram: compound keys (1,b) (3,d) (4,g) (5,c) (5,d) (5,f) (6,c) (7,a) (9,b); search key (5,c))

91. Index {x:1,y:1}

92. Two Set Bounds (diagram: compound keys (1,b) (3,d) (4,g) (5,c) (5,d) (5,f) (6,c) (7,a) (9,f); candidate pairs (5,c), (5,f), (9,c), (9,f))

93. Two Set Bounds
> db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain()
{
  "cursor" : "BtreeCursor x_1_y_1 multi",
  "nscanned" : 5,
  "nscannedObjects" : 3,
  "n" : 3,
  "millis" : 0,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  ...
  "indexBounds" : { "x" : [ [ 5, 5 ], [ 9, 9 ] ], "y" : [ [ "c", "c" ], [ "f", "f" ] ] }
}
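With $in on both fields of a compound index, the candidate keys are the cartesian product of the two value lists, visited in index order. The JavaScript sketch below expands the bounds with a hypothetical `expandBounds` helper and counts which keys from the diagram fall on them; it illustrates the product only and does not try to reproduce the nscanned of 5.

```javascript
// Expand per-field $in lists (each already sorted) into compound point bounds,
// in index order: the cartesian product of the lists.
function expandBounds(fieldValues) {
  return fieldValues.reduce(
    (acc, values) => acc.flatMap((prefix) => values.map((v) => [...prefix, v])),
    [[]]
  );
}

// Compound {x:1,y:1} keys as read from the slide's diagram.
const keys = [
  [1, "b"], [3, "d"], [4, "g"], [5, "c"], [5, "d"],
  [5, "f"], [6, "c"], [7, "a"], [9, "f"],
];

const bounds = expandBounds([[5, 9], ["c", "f"]]);
// Keys that fall on one of the point bounds: (5,c), (5,f) and (9,f).
const hits = keys.filter(([x, y]) => bounds.some(([bx, by]) => bx === x && by === y));
```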

95. Indexes {x:1}, {y:1}

96. Performs a separate, sequential find for each clause.

97. Must not return the same document twice, so each document is checked against the previous clauses.

98. Disjoint $or Criteria (diagram: documents (1,b) (3,d) (4,g) (5,c) (5,d) (6,a) (7,e) (7,g) (9,f), reached through the x index for x:5 and through the y index for y:'d')

99. Disjoint $or Criteria
> db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()
{
  "clauses" : [
    {
      "cursor" : "BtreeCursor x_1",
      "nscanned" : 2,
      "nscannedObjects" : 2,
      "n" : 2,
      "millis" : 0,
      "nYields" : 0,
      "nChunkSkips" : 0,
      "isMultiKey" : false,
      "indexOnly" : false,
      "indexBounds" : { "x" : [ [ 5, 5 ] ] }
    },
    {
      "cursor" : "BtreeCursor y_1",
      "nscanned" : 2,
      "nscannedObjects" : 2,
      "n" : 1,
      "millis" : 1,
      "nYields" : 0,
      "nChunkSkips" : 0,
      "isMultiKey" : false,
      "indexOnly" : false,
      "indexBounds" : { "y" : [ [ "d", "d" ] ] }
    }
  ],
  "nscanned" : 4,
  "nscannedObjects" : 4,
  "n" : 3,
  "millis" : 1
}
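The per-clause numbers above can be reproduced with a JavaScript sketch that runs each $or clause as its own scan and drops any document already matched by an earlier clause. The documents are as read from the diagram, which may be imperfect; `orQuery` is a hypothetical helper, and `docs.filter` stands in for the per-clause index scan.

```javascript
// Documents as read from the slide's diagram (x, y pairs).
const docs = [
  { x: 1, y: "b" }, { x: 3, y: "d" }, { x: 4, y: "g" }, { x: 5, y: "c" }, { x: 5, y: "d" },
  { x: 6, y: "a" }, { x: 7, y: "e" }, { x: 7, y: "g" }, { x: 9, y: "f" },
];

// Run each $or clause as its own scan; drop any document that an earlier
// clause already matched, so nothing is returned twice.
function orQuery(docs, clauses) {
  const perClause = [];
  const results = [];
  clauses.forEach((clause, i) => {
    const candidates = docs.filter(clause); // stands in for an index scan on this clause
    let n = 0;
    for (const doc of candidates) {
      if (clauses.slice(0, i).some((earlier) => earlier(doc))) continue;
      results.push(doc);
      n++;
    }
    perClause.push({ nscanned: candidates.length, n });
  });
  return { clauses: perClause, n: results.length, results };
}

const r = orQuery(docs, [(d) => d.x === 5, (d) => d.y === "d"]);
```

The second clause scans (3,d) and (5,d) but returns only (3,d), because (5,d) already satisfied x:5.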

101. Index {x:1} (no index on y)

102. Unindexed $or Clause
> db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()
{
  "cursor" : "BasicCursor",
  "nscanned" : 9,
  "nscannedObjects" : 9,
  "n" : 3,
  "millis" : 0,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "isMultiKey" : false,
  "indexOnly" : false,
  "indexBounds" : { }
}
Since y is not indexed, we must do a full collection scan to match y:'d'. Since a full scan is required, we don’t use the index on x to match x:5.

103. Automatic Index Selection (Query Optimizer)

107. All fields with index-useful constraints are indexed

114. If there are fewer distinct values of x in 2 < x < 7 than distinct values of y in ‘b’ < y < ‘f’, then {x:1,y:1} is chosen (rule of thumb)

117. Cost of scanAndOrder vs ordered index

118. Cost of loading full document vs just index key

119. Cost of scanning adjacent btree keys vs non-adjacent keys/documents

121. Candidate plans are run in an interleaved fashion

122. Plans are kept in a priority queue ordered by nscanned. We always continue progress on the plan with the lowest nscanned.
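The interleaved race can be sketched in JavaScript by describing each plan only by how many keys it must scan, always advancing the plan with the lowest nscanned, and stopping when one finishes. Plan names and key counts are hypothetical; for two or three plans a linear minimum search does the same job as a heap-based priority queue.

```javascript
// Each hypothetical plan is described only by how many index keys it must
// scan to finish its range. The plan with the lowest nscanned is always
// advanced next (ties go to the earlier plan); the first plan to finish wins.
function racePlans(plans) {
  const state = plans.map((p) => ({ name: p.name, total: p.keysInRange, nscanned: 0 }));
  let steps = 0; // total keys scanned across all plans
  for (;;) {
    let next = state[0];
    for (const s of state) if (s.nscanned < next.nscanned) next = s;
    if (next.nscanned === next.total) {
      return { winner: next.name, nscanned: next.nscanned, steps };
    }
    next.nscanned++;
    steps++;
  }
}

const race = racePlans([
  { name: "x_1", keysInRange: 500 },
  { name: "y_1", keysInRange: 20 },
]);
```

Because progress is balanced, the total work is roughly the winner's nscanned times the number of plans (41 keys here), however large the losing plans are.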

124. Plans compete only in the initial query. In getMore, we continue reading from the index cursor established by the initial query.

126. find( {x:{$gt:5},y:{$lt:'z'}} )

127. {Pattern: {x:'gt bound', y:'lt bound'}, Index: {y:1}, nscanned: 500}

129. find( {x:{$gt:20},y:{$lt:'b'}} )

130. Use index {y:1}
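Reusing the winner for a query of the same shape can be sketched in JavaScript: reduce the query to a pattern (field names plus constraint kinds, values ignored) and key a `Map` on it. The string format of the pattern and the `queryPattern` helper are hypothetical, and only equality and $gt/$gte/$lt/$lte are handled.

```javascript
// Map range operators to the bound kind shown on the slide.
const KIND = { $gt: "gt bound", $gte: "gt bound", $lt: "lt bound", $lte: "lt bound" };

// Reduce a query to its shape: field names plus the kind of constraint,
// ignoring the actual values.
function queryPattern(query) {
  return Object.keys(query)
    .sort()
    .map((field) => {
      const cond = query[field];
      if (cond === null || typeof cond !== "object") return `${field}:equality`;
      const kinds = Object.keys(cond).map((op) => KIND[op] || op).sort();
      return `${field}:${kinds.join("+")}`;
    })
    .join(",");
}

// Hypothetical plan cache: pattern -> winning index and its nscanned.
const planCache = new Map();
planCache.set(queryPattern({ x: { $gt: 5 }, y: { $lt: "z" } }), { index: { y: 1 }, nscanned: 500 });

// Same shape, different values: the cached plan is reused without a new race.
const cached = planCache.get(queryPattern({ x: { $gt: 20 }, y: { $lt: "b" } }));
```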

132. Indexes added / removed

134. Currently “much worse” means 10x
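The two invalidation triggers can be sketched as a small JavaScript class. The slides do not say what quantity "10x" is measured against; this sketch assumes it compares the running plan's nscanned with the nscanned recorded in the cache. The `PlanCache` class and its method names are hypothetical.

```javascript
// Hypothetical plan cache with two invalidation triggers: an index change
// clears everything, and a cached plan doing "much worse" (assumed here:
// nscanned above factor x the recorded value) should be re-raced.
class PlanCache {
  constructor(factor = 10) {
    this.factor = factor;
    this.entries = new Map();
  }
  record(pattern, index, nscanned) {
    this.entries.set(pattern, { index, nscanned });
  }
  lookup(pattern) {
    return this.entries.get(pattern);
  }
  onIndexesChanged() {
    this.entries.clear(); // indexes added or removed: every cached plan is stale
  }
  isMuchWorse(pattern, nscannedSoFar) {
    const entry = this.entries.get(pattern);
    return entry !== undefined && nscannedSoFar > this.factor * entry.nscanned;
  }
}

const cache = new PlanCache();
cache.record("x:gt bound,y:lt bound", { y: 1 }, 500);
```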

135. Thanks!
Feature requests: jira.mongodb.org
Support: groups.google.com/group/mongodb-user
