16. How do I create indexes?
// Create an index if one does not exist
db.recipes.createIndex({ main_ingredient: 1 })
// The client remembers the index and raises no errors
db.recipes.ensureIndex({ main_ingredient: 1 })
* 1 means ascending, -1 descending
17. What can be indexed?
// Multiple fields (compound key indexes)
db.recipes.ensureIndex({
main_ingredient: 1,
calories: -1
})
// Arrays of values (multikey indexes)
{
name: 'Chicken Noodle Soup',
ingredients : ['chicken', 'noodles']
}
db.recipes.ensureIndex({ ingredients: 1 })
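To make the multikey idea concrete, here is a minimal sketch in plain JavaScript (not shell code) of how an index on an array field might expand into one entry per array element. The helper `buildIndexEntries` and the second recipe are hypothetical, introduced only for illustration; the real index is a B-tree maintained by the server.

```javascript
// Sketch only: models a multikey index as a sorted list of { key, id } entries.
// buildIndexEntries is a hypothetical helper, not a MongoDB API.
function buildIndexEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    // An array contributes one entry per element; a scalar contributes one entry.
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      entries.push({ key, id: doc._id });
    }
  }
  // Keep the entries ordered by key, as an ascending index would.
  entries.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
  return entries;
}

const recipes = [
  { _id: 1, name: 'Chicken Noodle Soup', ingredients: ['chicken', 'noodles'] },
  { _id: 2, name: 'Chicken Salad', ingredients: ['chicken', 'lettuce'] }, // hypothetical document
];

const entries = buildIndexEntries(recipes, 'ingredients');
// Looking up one ingredient finds every recipe whose array contains it.
const chickenIds = entries.filter((e) => e.key === 'chicken').map((e) => e.id);
console.log(entries.length, chickenIds);
```

Both recipes appear under the key 'chicken', which is why a query on a single ingredient can be answered from the index.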
18. What can be indexed?
// Subdocuments
{
name : 'Apple Pie',
contributor: {
name: 'Joe American',
id: 'joea123'
}
}
db.recipes.ensureIndex({ 'contributor.id': 1 })
db.recipes.ensureIndex({ 'contributor': 1 })
19. How do I manage indexes?
// List a collection's indexes
db.recipes.getIndexes()
db.recipes.getIndexKeys()
// Drop a specific index
db.recipes.dropIndex({ ingredients: 1 })
// Drop all indexes and recreate them
db.recipes.reIndex()
// Default (unique) index on _id
20. Background Index Builds
// Index creation is a blocking operation that can take a long time
// Background creation yields to other operations
db.recipes.ensureIndex(
{ ingredients: 1 },
{ background: true }
)
22. Uniqueness Constraints
// Only one recipe can have a given value for name
db.recipes.ensureIndex( { name: 1 }, { unique: true } )
// Force index on collection with duplicate recipe names: drop the
// duplicates
db.recipes.ensureIndex(
{ name: 1 },
{ unique: true, dropDups: true }
)
* dropDups is probably never what you want
23. Sparse Indexes
// Only documents with field calories will be indexed
db.recipes.ensureIndex(
{ calories: -1 },
{ sparse: true }
)
// Allow multiple documents to not have calories field
db.recipes.ensureIndex(
{ name: 1 , calories: -1 },
{ unique: true, sparse: true }
)
* Missing fields are stored as null(s) in the index
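The interaction between unique and sparse can be sketched in plain JavaScript. This is a simplified single-field model, assuming a hypothetical helper `checkUnique` and made-up documents; it only illustrates why null entries for missing fields collide under a unique constraint, and does not reproduce the server's behavior for compound indexes.

```javascript
// Sketch only: checkUnique is a hypothetical helper, not a MongoDB API.
function checkUnique(docs, field, { sparse }) {
  const seen = new Set();
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    // A sparse index skips documents that lack the field entirely.
    if (!hasField && sparse) continue;
    // A non-sparse index stores a null key for a missing field.
    const key = hasField ? doc[field] : null;
    if (seen.has(key)) return { ok: false, duplicateKey: key };
    seen.add(key);
  }
  return { ok: true };
}

const docs = [
  { name: 'Apple Pie', calories: 400 },
  { name: 'Mystery Stew' }, // no calories field
  { name: 'Leftovers' },    // no calories field
];

console.log(checkUnique(docs, 'calories', { sparse: false })); // second null key collides
console.log(checkUnique(docs, 'calories', { sparse: true }));  // documents without the field are skipped
```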
24. Geospatial Indexes
// Add latitude, longitude coordinates
{
name: '10gen Palo Alto',
loc: [ 37.449157, -122.158574 ]
}
// Index the coordinates
db.locations.ensureIndex( { loc : '2d' } )
// Query for locations 'near' a particular coordinate
db.locations.find({
loc: { $near: [ 37.4, -122.3 ] }
})
25. TTL Collections
// Documents must have a BSON UTC Date field
{ 'submitted_date' : ISODate('2012-10-12T05:24:07.211Z'), ... }
// Documents are removed after 'expireAfterSeconds' seconds
db.recipes.ensureIndex(
{ submitted_date: 1 },
{ expireAfterSeconds: 3600 }
)
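The expiry rule can be sketched as a small predicate in plain JavaScript. `isExpired`, the sample documents, and the fixed `now` value are assumptions made for illustration; on the server the removal is done by a background task, so documents may outlive their expiry time slightly.

```javascript
// Sketch only: isExpired is a hypothetical helper, not a MongoDB API.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  // Only Date values are considered; documents without a date never expire.
  if (!(value instanceof Date)) return false;
  return now.getTime() - value.getTime() >= expireAfterSeconds * 1000;
}

const now = new Date('2012-10-12T06:30:00.000Z'); // fixed clock for the example
const docs = [
  { name: 'old', submitted_date: new Date('2012-10-12T05:24:07.211Z') },
  { name: 'fresh', submitted_date: new Date('2012-10-12T06:00:00.000Z') },
  { name: 'no date', submitted_date: '2012-10-12' }, // a string, not a Date
];

const survivors = docs.filter((d) => !isExpired(d, 'submitted_date', 3600, now));
console.log(survivors.map((d) => d.name)); // [ 'fresh', 'no date' ]
```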
26. Limitations
• Collections can not have > 64 indexes.
• Index keys can not be > 1024 bytes (1K).
• The name of an index, including the namespace, must be < 128 characters.
• Queries can only use 1 index*
• Indexes have storage requirements, and impact the performance of writes.
• In memory sort (no-index) limited to 32 MB of return data.
28. Profiling Slow Ops
db.setProfilingLevel( n, slowms )   // slowms defaults to 100 ms
n=0 profiler off
n=1 record operations longer than slowms
n=2 record all operations
db.system.profile.find()
* The profile collection is a capped collection, and fixed in size
29. The Explain Plan (Pre Index)
db.recipes.find( { calories:
{ $lt : 40 } }
).explain( )
{
"cursor" : "BasicCursor",
"n" : 42,
"nscannedObjects" : 12345,
"nscanned" : 12345,
...
"millis" : 356,
...
}
* Doesn't use cached plans, re-evals and resets cache
30. The Explain Plan (Post Index)
db.recipes.find( { calories:
{ $lt : 40 } }
).explain( )
{
"cursor" : "BtreeCursor calories_-1",
"n" : 42,
"nscannedObjects" : 42,
"nscanned" : 42,
...
"millis" : 0,
...
}
* Doesn't use cached plans, re-evals and resets cache
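One way to compare the two explain outputs is the ratio of documents returned (n) to items examined (nscanned). The sketch below is plain JavaScript with a hypothetical helper `scanRatio`; the input objects simply copy the figures from the pre-index and post-index slides.

```javascript
// Sketch only: scanRatio is a hypothetical helper, not part of explain().
function scanRatio(explain) {
  // Avoid dividing by zero when nothing was scanned.
  if (explain.nscanned === 0) return 1;
  return explain.n / explain.nscanned;
}

const preIndex = { cursor: 'BasicCursor', n: 42, nscannedObjects: 12345, nscanned: 12345, millis: 356 };
const postIndex = { cursor: 'BtreeCursor calories_-1', n: 42, nscannedObjects: 42, nscanned: 42, millis: 0 };

console.log(scanRatio(preIndex).toFixed(4)); // 0.0034
console.log(scanRatio(postIndex));           // 1
```

A ratio near 1 suggests the query examined little beyond what it returned; a ratio near 0 suggests a scan that an index might avoid.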
31. The Query Optimizer
• For each "type" of query, MongoDB periodically tries all useful indexes
• Aborts the rest as soon as one plan wins
• The winning plan is temporarily cached for each "type" of query
32. Manually Select Index to Use
// Tell the database what index to use
db.recipes.find({
calories: { $lt: 1000 } }
).hint({ _id: 1 })
// Tell the database to NOT use an index
db.recipes.find(
{ calories: { $lt: 1000 } }
).hint({ $natural: 1 })
33. Use Indexes to Sort Query Results
// Given the following index
db.collection.ensureIndex({ a:1, b:1 , c:1, d:1 })
// The following query and sort operations can use the index
db.collection.find( ).sort({ a:1 })
db.collection.find( ).sort({ a:1, b:1 })
db.collection.find({ a:4 }).sort({ a:1, b:1 })
db.collection.find({ b:5 }).sort({ a:1, b:1 })
34. Indexes that won't work for sorting query results
// Given the following index
db.collection.ensureIndex({ a:1, b:1, c:1, d:1 })
// These can not sort using the index
db.collection.find( ).sort({ b: 1 })
db.collection.find({ b: 5 }).sort({ b: 1 })
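The sorting rule behind these examples can be approximated in plain JavaScript: the sort keys must follow the index order starting at some position, and every index field before that position must be an equality match in the query. `canSortWithIndex` is a hypothetical helper written for this sketch; it is a simplification and not how the query planner is implemented.

```javascript
// Sketch only: canSortWithIndex is a hypothetical helper, not a MongoDB API.
// indexSpec and sortSpec map field names to 1 or -1; equalityFields lists
// the fields the query matches by equality.
function canSortWithIndex(indexSpec, equalityFields, sortSpec) {
  const indexKeys = Object.keys(indexSpec);
  const sortKeys = Object.keys(sortSpec);
  if (sortKeys.length === 0) return true;

  // The sort must start at some position in the index...
  const start = indexKeys.indexOf(sortKeys[0]);
  if (start === -1) return false;

  // ...and every index field before that position must be an equality match.
  for (let i = 0; i < start; i++) {
    if (!equalityFields.includes(indexKeys[i])) return false;
  }

  // The sort keys must follow the index order, either all in the index's
  // direction or all reversed.
  let sameDirection = true;
  let reversedDirection = true;
  for (let j = 0; j < sortKeys.length; j++) {
    const indexKey = indexKeys[start + j];
    if (indexKey !== sortKeys[j]) return false;
    if (sortSpec[sortKeys[j]] !== indexSpec[indexKey]) sameDirection = false;
    if (sortSpec[sortKeys[j]] !== -indexSpec[indexKey]) reversedDirection = false;
  }
  return sameDirection || reversedDirection;
}

const index = { a: 1, b: 1, c: 1, d: 1 };
console.log(canSortWithIndex(index, [], { a: 1 }));            // true
console.log(canSortWithIndex(index, ['b'], { a: 1, b: 1 }));   // true
console.log(canSortWithIndex(index, [], { b: 1 }));            // false
console.log(canSortWithIndex(index, ['b'], { b: 1 }));         // false
```

Under this model, find({ a: 4 }).sort({ b: 1 }) would also qualify, because the only field before b is an equality match.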
35. Index Covered Queries
// MongoDB can return data from just the index
db.recipes.ensureIndex({ main_ingredient: 1, name: 1 })
// Return only the name field
db.recipes.find(
{ main_ingredient: 'chicken' },
{ _id: 0, name: 1 }
)
// indexOnly will be true in the explain plan
db.recipes.find(
{ main_ingredient: 'chicken' },
{ _id: 0, name: 1 }
).explain()
{
"indexOnly": true,
}
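A rough check for whether a query could be covered is sketched below in plain JavaScript: every queried field and every returned field must be in the index, and _id must be excluded because it is returned by default. `isCovered` is a hypothetical helper for this sketch and ignores other conditions the server applies.

```javascript
// Sketch only: isCovered is a hypothetical helper, not a MongoDB API.
function isCovered(indexSpec, query, projection) {
  const indexed = new Set(Object.keys(indexSpec));
  // Every field in the query must be in the index.
  const queryOk = Object.keys(query).every((f) => indexed.has(f));
  // Every explicitly returned field must be in the index.
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  const projectionOk = returned.length > 0 && returned.every((f) => indexed.has(f));
  // _id is returned by default, so it must be excluded unless it is in the index.
  const idOk = projection._id === 0 || indexed.has('_id');
  return queryOk && projectionOk && idOk;
}

const index = { main_ingredient: 1, name: 1 };
const query = { main_ingredient: 'chicken' };

console.log(isCovered(index, query, { _id: 0, name: 1 })); // true
console.log(isCovered(index, query, { name: 1 }));         // false: _id would be fetched from the document
```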
38. Trying to Use Multiple Indexes
// MongoDB can only use one index for a query
db.collection.ensureIndex({ a: 1 })
db.collection.ensureIndex({ b: 1 })
// Only one of the above indexes is used
db.collection.find({ a: 3, b: 4 })
39. Compound Key Mistakes
// Compound key indexes are very effective
db.collection.ensureIndex({ a: 1, b: 1, c: 1 })
// But only if the query is a prefix of the index
// This query can't effectively use the index
db.collection.find({ c: 2 })
// ...but this query can
db.collection.find({ a: 3, b: 5 })
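The prefix rule can be sketched in plain JavaScript as a count of how many leading index fields the query constrains. `usablePrefixLength` is a hypothetical helper for this sketch, not a MongoDB API; a result of 0 means the query does not start at the front of the index.

```javascript
// Sketch only: usablePrefixLength is a hypothetical helper, not a MongoDB API.
function usablePrefixLength(indexSpec, query) {
  const queryFields = new Set(Object.keys(query));
  let length = 0;
  // Walk the index fields in order and stop at the first one the query omits.
  for (const field of Object.keys(indexSpec)) {
    if (!queryFields.has(field)) break;
    length++;
  }
  return length;
}

const index = { a: 1, b: 1, c: 1 };
console.log(usablePrefixLength(index, { c: 2 }));       // 0
console.log(usablePrefixLength(index, { a: 3, b: 5 })); // 2
console.log(usablePrefixLength(index, { a: 3, c: 2 })); // 1: only a narrows the scan
```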
41. Regular Expressions
db.users.ensureIndex({ username: 1 })
// Left anchored regex queries can use the index
db.users.find({ username: /^joe smith/ })
// But not generic regexes
db.users.find({username: /smith/ })
// Or case insensitive queries
db.users.find({ username: /Joe/i })
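The distinction between these three queries can be approximated with a small check in plain JavaScript: a pattern is a candidate for an efficient index range scan only if it is anchored at the start and is case sensitive. `regexCanUseIndexRange` is a hypothetical helper for this sketch; it is a deliberate simplification and does not inspect the rest of the pattern.

```javascript
// Sketch only: regexCanUseIndexRange is a hypothetical helper, not a MongoDB API.
function regexCanUseIndexRange(re) {
  // Case-insensitive matches do not map to one contiguous range of keys.
  if (re.ignoreCase) return false;
  // The pattern must be anchored to the start of the string.
  return re.source.startsWith('^');
}

console.log(regexCanUseIndexRange(/^joe smith/)); // true
console.log(regexCanUseIndexRange(/smith/));      // false
console.log(regexCanUseIndexRange(/Joe/i));       // false
```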
When speaking: What are indexes and why do we need them? The first part of this talk is conceptual. The second part is extremely detailed.
Look at 7 documents
Queries, inserts and deletes: O(log(n)) time
MongoDB's indexes are B-Trees. Lookups (queries), inserts and deletes happen in O(log(n)) time. TODO: Add a page describing what a B-Tree is???
So this is helpful, and can speed up queries by a tremendous amount
So it's imperative we understand them
Tell a story about a customer problem caused by a missing index.
Repeated calls to ensureIndex only result in one create message going to the server. The index is cached client side for some period of time (varies by driver).
Indexes can be costly if you have too many, so...
getIndexes returns an index document for each index in the collection. dropIndex requires the spec used to create the index initially. reIndex drops *all* indexes (including the _id index) and rebuilds them.
Caveats:
• Still a resource-intensive operation
• Index build is slower
• The mongo shell session or app will block while the index builds
• Indexes are still built in the foreground on secondaries
Kristine to provide replica set image.
unique applies a uniqueness constraint to the indexed values. dropDups will force the server to create a unique index by only keeping the first document found in natural order with a value and dropping all other documents with that value. dropDups will likely result in data loss!!! TODO: Maybe add a red exclamation point for dropDups.
MongoDB doesn't enforce a schema: documents are not required to have the same fields. Sparse indexes only contain entries for documents that have the indexed field. Without sparse, documents without field 'a' have a null entry in the index for that field. With sparse, a unique constraint can be applied to a field not shared by all documents. Otherwise multiple 'null' values violate the unique constraint. XXX: Is there a visual that makes sense here?
A '2d' index is a geohash on top of the B-tree. It allows you to search for documents 'near' a latitude/longitude position. Bounds queries are also possible using $within. TODO: Google maps image, or something similar. Kristine to provide.
The index must be on a BSON date field. Documents are removed after expireAfterSeconds seconds. The reaper thread runs every 60 seconds. TODO: Hourglass image, or something similar. Kristine to provide.
Indexes are a really powerful feature of MongoDB; however, there are some limitations. Understanding these limitations is an important part of using MongoDB correctly. * Queries can only use one index, with the exception of $or queries. If an index key exceeds 1K, the document is silently dropped from (not included in) the index.
Changing slowms also affects what queries are logged to the mongodb log file.
cursor: the type of cursor used. BasicCursor means no index was used.
n: the number of documents that match the query.
nscannedObjects: the number of documents that had to be scanned.
nscanned: the number of items (index entries or documents) examined.
millis: how long the query took.
The ratio of n to nscanned should be as close to 1 as possible.
TODO: Use a real example here instead of made up numbers...
cursor: the type of cursor used. BasicCursor means no index was used.
n: the number of documents that match the query.
nscannedObjects: the number of documents that had to be scanned.
nscanned: the number of items (index entries or documents) examined.
millis: how long the query took.
The ratio of n to nscanned should be as close to 1 as possible.
The winning plan is reevaluated after 1000 write operations (insert, update, remove, etc.). TODO: Replace much of this with an animation? Kristine to provide.
Tells MongoDB exactly what index to use.
MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches. TODO: Better explanation
TODO: Cookbook image here? Rework to go along with the cookbook example?
Tell a story about a customer problem caused by a suboptimal index.TODO: Change background color?
Better to use a compound index on the low selectivity field and some other more selective field.