Webinar: Indexing and Query Optimization

Having the right indexes in place is crucial to performance in MongoDB. In this talk, we'll explain how indexes work and the various indexing options. Then, we'll cover the tools available to optimize your queries and avoid common pitfalls. This session will use real-world examples to demonstrate the importance of proper indexing.

  • When speaking: What are indexes and why do we need them? The first part of this talk is conceptual. The second part is extremely detailed.
  • Look at 7 documents
  • Queries, inserts and deletes: O(log(n)) time
  • MongoDB's indexes are B-trees. Lookups (queries), inserts and deletes happen in O(log(n)) time. TODO: Add a page describing what a B-tree is?
  • So an index is helpful, and can speed up queries by a tremendous amount
  • So it’s imperative we understand them
  • Tell a story about a customer problem caused by a missing index.
  • Repeated calls to ensureIndex only result in one create message going to the server. The index is cached client side for some period of time (varies by driver).
  • Indexes can be costly if you have too many, so...
  • getIndexes returns an index document for each index in the collection. dropIndex requires the spec used to create the index initially. reIndex drops *all* indexes (including the _id index) and rebuilds them.
  • Caveats: Still a resource-intensive operation. Index build is slower. The mongo shell session or app will block while the index builds. Indexes are still built in the foreground on secondaries. Kristine to provide replica set image.
  • unique applies a uniqueness constraint on duplicate values. dropDups will force the server to create a unique index by only keeping the first document found in natural order with a value and dropping all other documents with that value. dropDups will likely result in data loss! TODO: Maybe add a red exclamation point for dropDups.
  • MongoDB doesn't enforce a schema: documents are not required to have the same fields. Sparse indexes only contain entries for documents that have the indexed field. Without sparse, documents without field 'a' have a null entry in the index for that field. With sparse, a unique constraint can be applied to a field not shared by all documents. Otherwise multiple 'null' values violate the unique constraint. XXX: Is there a visual that makes sense here?
  • A '2d' index is a geohash on top of the B-tree. It allows you to search for documents 'near' a latitude/longitude position. Bounds queries are also possible using $within. TODO: Google maps image, or something similar. Kristine to provide.
  • Index must be on a BSON date field. Documents are removed after expireAfterSeconds seconds. Reaper thread runs every 60 seconds. TODO: Hourglass image, or something similar. Kristine to provide.
  • Indexes are a really powerful feature of MongoDB; however, there are some limitations. Understanding these limitations is an important part of using MongoDB correctly. A query can use only one index, with the exception of $or queries. If a document's index key exceeds 1K, the document is silently left out of the index.
  • Changing slowms also affects which queries are logged to the MongoDB log file.
  • cursor: the type of cursor used. BasicCursor means no index was used.
    n: the number of documents that match the query
    nscannedObjects: the number of documents that had to be scanned
    nscanned: the number of items (index entries or documents) examined
    millis: how long the query took
    The ratio of n to nscanned should be as close to 1 as possible.
    TODO: Use a real example here instead of made-up numbers.
  • cursor: the type of cursor used. BasicCursor means no index was used.
    n: the number of documents that match the query
    nscannedObjects: the number of documents that had to be scanned
    nscanned: the number of items (index entries or documents) examined
    millis: how long the query took
    The ratio of n to nscanned should be as close to 1 as possible.
  • The winning plan is reevaluated after 1000 write operations (insert, update, remove, etc.). TODO: Replace much of this with an animation? Kristine to provide.
  • Tells MongoDB exactly what index to use.
  • MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches. TODO: Better explanation
  • TODO: Cookbook image here? Rework to go along with the cookbook example?
  • Tell a story about a customer problem caused by a suboptimal index. TODO: Change background color?
  • Better to use a compound index on the low selectivity field and some other more selective field.

Transcript

  • 1. Indexing and Query Optimization. Thomas Boyd, Solutions Architect, 10gen
  • 2. Agenda
        • What are indexes?
        • Why do I need them?
        • Working with indexes in MongoDB
        • Optimize your queries
        • Avoiding common mistakes
  • 3. What are indexes?
  • 4. What are indexes? Imagine you're looking for a recipe in a cookbook ordered by recipe name. Looking up a recipe by name is quick and easy.
  • 5. What are indexes?
        • How would you find a recipe using chicken?
        • How about a 250-350 calorie recipe using chicken?
  • 6. Consult the index! [Placeholder: image of a cookbook, Kristine to insert]
  • 7. Linked List: 1 2 3 4 5 6 7
  • 8. Finding 7 in Linked List: 1 2 3 4 5 6 7
  • 9. Finding 7 in Tree [diagram: the values 1 to 7 arranged as a tree]
  • 10. Indexes in MongoDB are B-trees
  • 11. Queries, inserts and deletes: O(log(n)) time
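To make the linked-list versus tree comparison on slides 7 to 11 concrete, here is a minimal sketch in plain JavaScript. It is not MongoDB code: a binary search over a sorted array stands in for the tree (a real B-tree holds many keys per node), and the helper names `linearSteps` and `binarySteps` are invented for this illustration.

```javascript
// Count how many values are inspected before the target is found
// when walking the list from the front (the "linked list" case).
function linearSteps(sorted, target) {
  let steps = 0;
  for (const value of sorted) {
    steps += 1;
    if (value === target) return steps;
  }
  return steps;
}

// Count how many values are inspected when each comparison halves
// the remaining range (a stand-in for descending a balanced tree).
function binarySteps(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  let steps = 0;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    steps += 1;
    if (sorted[mid] === target) return steps;
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return steps;
}

const values = [1, 2, 3, 4, 5, 6, 7];
const linear = linearSteps(values, 7); // inspects every element
const binary = binarySteps(values, 7); // inspects roughly log2(n) elements
console.log({ linear, binary });
```

The gap between the two counts widens as the collection grows, which is the point of the O(log(n)) slide.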
  • 12. Indexes are the single biggest tunable performance factor in MongoDB
  • 13. Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
  • 14. Why do I need indexes? A brief story
  • 15. Working with Indexes in MongoDB
  • 16. How do I create indexes?
        // Create an index if one does not exist
        db.recipes.createIndex({ main_ingredient: 1 })
        // The client remembers the index and raises no errors
        db.recipes.ensureIndex({ main_ingredient: 1 })
        * 1 means ascending, -1 descending
  • 17. What can be indexed?
        // Multiple fields (compound key indexes)
        db.recipes.ensureIndex({ main_ingredient: 1, calories: -1 })
        // Arrays of values (multikey indexes)
        { name: 'Chicken Noodle Soup', ingredients: ['chicken', 'noodles'] }
        db.recipes.ensureIndex({ ingredients: 1 })
  • 18. What can be indexed?
        // Subdocuments
        { name: 'Apple Pie', contributor: { name: 'Joe American', id: 'joea123' } }
        db.recipes.ensureIndex({ 'contributor.id': 1 })
        db.recipes.ensureIndex({ contributor: 1 })
  • 19. How do I manage indexes?
        // List a collection's indexes
        db.recipes.getIndexes()
        db.recipes.getIndexKeys()
        // Drop a specific index
        db.recipes.dropIndex({ ingredients: 1 })
        // Drop all indexes and recreate them
        db.recipes.reIndex()
        // Default (unique) index on _id
  • 20. Background Index Builds
        // Index creation is a blocking operation that can take a long time
        // Background creation yields to other operations
        db.recipes.ensureIndex({ ingredients: 1 }, { background: true })
  • 21. Options
        • Uniqueness constraints (unique, dropDups)
        • Sparse Indexes
        • Geospatial (2d) Indexes
        • TTL Collections (expireAfterSeconds)
  • 22. Uniqueness Constraints
        // Only one recipe can have a given value for name
        db.recipes.ensureIndex({ name: 1 }, { unique: true })
        // Force index on collection with duplicate recipe names by dropping the duplicates
        db.recipes.ensureIndex({ name: 1 }, { unique: true, dropDups: true })
        * dropDups is probably never what you want
  • 23. Sparse Indexes
        // Only documents with field calories will be indexed
        db.recipes.ensureIndex({ calories: -1 }, { sparse: true })
        // Allow multiple documents to not have calories field
        db.recipes.ensureIndex({ name: 1, calories: -1 }, { unique: true, sparse: true })
        * Without sparse, missing fields are stored as null(s) in the index
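The interaction between sparse and unique can be modeled in a few lines of plain JavaScript. This is a simplified sketch of the behavior described in the speaker notes, not how the server builds an index; the helpers `buildIndexKeys` and `violatesUnique` and the sample documents are assumptions made for illustration.

```javascript
// Collect the keys an index on `field` would hold.
// A regular index stores null for a document that lacks the field;
// a sparse index skips that document entirely.
function buildIndexKeys(docs, field, sparse) {
  const keys = [];
  for (const doc of docs) {
    if (field in doc) keys.push(doc[field]);
    else if (!sparse) keys.push(null);
  }
  return keys;
}

// A unique constraint is violated when any key appears more than once.
function violatesUnique(keys) {
  return new Set(keys).size !== keys.length;
}

// Hypothetical sample data: only one recipe has a calories field.
const recipes = [
  { name: "Apple Pie", calories: 300 },
  { name: "Chicken Noodle Soup" },
  { name: "Green Salad" },
];

const regularKeys = buildIndexKeys(recipes, "calories", false);
const sparseKeys = buildIndexKeys(recipes, "calories", true);
console.log(violatesUnique(regularKeys), violatesUnique(sparseKeys));
```

In this model the regular index holds two null keys, so a unique constraint fails, while the sparse index holds a single key and the constraint passes.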
  • 24. Geospatial Indexes
        // Add GeoJSON with longitude & latitude coordinates
        { name: '10gen Palo Alto', loc: { type: "Point", coordinates: [-122.158574, 37.449157] } }
        // Index the coordinates
        db.locations.ensureIndex({ loc: "2dsphere" })
        // Query for locations near a particular coordinate
        db.locations.find({ loc: { $near: { $geometry: { type: "Point", coordinates: [-122.3, 37.4] } } } })
  • 25. TTL Collections
        // Documents must have a BSON UTC Date field
        { submitted_date: ISODate("2012-10-12T05:24:07.211Z"), … }
        // Documents are removed after expireAfterSeconds seconds
        db.recipes.ensureIndex({ submitted_date: 1 }, { expireAfterSeconds: 3600 })
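As a rough sketch of the expiry rule on this slide, the check below decides whether a document is old enough to be removed. It is plain JavaScript, not server code: the function name `isExpired` is invented, and the real reaper thread only runs periodically (every 60 seconds per the notes), so actual removal can lag behind this check.

```javascript
// Returns true once the date stored in `field` is more than
// expireAfterSeconds in the past, relative to `now`.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  // The index must be on a date field; other values are never expired here.
  if (!(value instanceof Date)) return false;
  return now.getTime() - value.getTime() > expireAfterSeconds * 1000;
}

const recipe = { submitted_date: new Date("2012-10-12T05:24:07.211Z") };

// 36 minutes after submission: still inside the one-hour window.
const early = isExpired(recipe, "submitted_date", 3600, new Date("2012-10-12T06:00:00.000Z"));
// Just over 65 minutes after submission: eligible for removal.
const late = isExpired(recipe, "submitted_date", 3600, new Date("2012-10-12T06:30:00.000Z"));
console.log(early, late);
```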
  • 26. Limitations
        • Collections can not have > 64 indexes.
        • Index keys can not be > 1024 bytes (1K).
        • The name of an index, including the namespace, must be < 128 characters.
        • Queries can only use 1 index* (* with the exception of $or queries)
        • Indexes have storage requirements, and impact the performance of writes.
        • In-memory sort (no index) is limited to 32 MB of return data.
  • 27. Optimize Your Queries
  • 28. Profiling Slow Ops
        db.setProfilingLevel( n , slowms=100ms )
          n=0 profiler off
          n=1 record operations longer than slowms
          n=2 record all queries
        db.system.profile.find()
        * The profile collection is a capped collection, and fixed in size
  • 29. The Explain Plan (Pre Index)
        db.recipes.find({ calories: { $lt: 40 } }).explain()
        {
          "cursor" : "BasicCursor",
          "n" : 42,
          "nscannedObjects" : 12345,
          "nscanned" : 12345,
          ...
          "millis" : 356,
          ...
        }
        * Doesn't use cached plans, re-evals and resets cache
  • 30. The Explain Plan (Post Index)
        db.recipes.find({ calories: { $lt: 40 } }).explain()
        {
          "cursor" : "BtreeCursor calories_-1",
          "n" : 42,
          "nscannedObjects" : 42,
          "nscanned" : 42,
          ...
          "millis" : 0,
          ...
        }
        * Doesn't use cached plans, re-evals and resets cache
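The speaker notes say the ratio of n to nscanned should be as close to 1 as possible. A small helper can compute that ratio from the two explain outputs shown on slides 29 and 30. This is a sketch in plain JavaScript: `scanRatio` is an invented name, and the objects below copy only the fields it needs from the slides.

```javascript
// Ratio of matching documents (n) to items examined (nscanned).
// 1 means every examined item was a match; values near 0 mean wasted work.
function scanRatio(explainOutput) {
  if (explainOutput.nscanned === 0) return 1; // nothing scanned, nothing wasted
  return explainOutput.n / explainOutput.nscanned;
}

// Values copied from the pre-index and post-index slides.
const preIndex = { cursor: "BasicCursor", n: 42, nscanned: 12345 };
const postIndex = { cursor: "BtreeCursor calories_-1", n: 42, nscanned: 42 };

const preRatio = scanRatio(preIndex);
const postRatio = scanRatio(postIndex);
console.log(preRatio, postRatio);
```

A ratio well below 1, as in the pre-index case, is a hint that the query is examining far more items than it returns.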
  • 31. The Query Optimizer
        • For each "type" of query, MongoDB periodically tries all useful indexes
        • Aborts the rest as soon as one plan wins
        • The winning plan is temporarily cached for each "type" of query
  • 32. Manually Select Index to Use
        // Tell the database what index to use
        db.recipes.find({ calories: { $lt: 1000 } }).hint({ _id: 1 })
        // Tell the database to NOT use an index
        db.recipes.find({ calories: { $lt: 1000 } }).hint({ $natural: 1 })
  • 33. Use Indexes to Sort Query Results
        // Given the following index
        db.collection.ensureIndex({ a: 1, b: 1, c: 1, d: 1 })
        // The following query and sort operations can use the index
        db.collection.find().sort({ a: 1 })
        db.collection.find().sort({ a: 1, b: 1 })
        db.collection.find({ a: 4 }).sort({ a: 1, b: 1 })
        db.collection.find({ b: 5 }).sort({ a: 1, b: 1 })
  • 34. Indexes that won't work for sorting query results
        // Given the following index
        db.collection.ensureIndex({ a: 1, b: 1, c: 1, d: 1 })
        // These can not sort using the index
        db.collection.find().sort({ b: 1 })
        db.collection.find({ b: 5 }).sort({ b: 1 })
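The rule from the speaker notes (all index fields before the first sorted field must be equality matches) can be written as a small check. This is a deliberately simplified model in plain JavaScript, not the server's planner: `canSortWithIndex` is an invented name, sort direction is ignored, and the model is conservative about cases the slides do not cover.

```javascript
// indexFields:    field names of the compound index, in order
// equalityFields: fields the query matches with an exact value
// sortFields:     field names of the requested sort, in order
function canSortWithIndex(indexFields, equalityFields, sortFields) {
  if (sortFields.length === 0) return true;
  const start = indexFields.indexOf(sortFields[0]);
  if (start === -1) return false;
  // Every index field before the first sorted field must be an equality match.
  for (let i = 0; i < start; i += 1) {
    if (!equalityFields.includes(indexFields[i])) return false;
  }
  // The sort fields must then follow the index order without gaps.
  for (let j = 0; j < sortFields.length; j += 1) {
    if (indexFields[start + j] !== sortFields[j]) return false;
  }
  return true;
}

const index = ["a", "b", "c", "d"];
// Cases from slide 33 (can use the index) and slide 34 (cannot).
console.log(canSortWithIndex(index, [], ["a", "b"]));    // find().sort({a:1, b:1})
console.log(canSortWithIndex(index, ["b"], ["a", "b"])); // find({b:5}).sort({a:1, b:1})
console.log(canSortWithIndex(index, [], ["b"]));         // find().sort({b:1})
console.log(canSortWithIndex(index, ["b"], ["b"]));      // find({b:5}).sort({b:1})
```

Under the same rule, a query such as find({ a: 4 }).sort({ b: 1 }) passes the check, because the only field ahead of b in the index is an equality match.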
  • 35. Index Covered Queries
        // MongoDB can return data from just the index
        db.recipes.ensureIndex({ main_ingredient: 1, name: 1 })
        // Return only the name field
        db.recipes.find({ main_ingredient: 'chicken' }, { _id: 0, name: 1 })
        // indexOnly will be true in the explain plan
        db.recipes.find({ main_ingredient: 'chicken' }, { _id: 0, name: 1 }).explain()
        {
          "indexOnly": true,
        }
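Why the projection on this slide excludes _id can be seen with a short check. The sketch below is plain JavaScript and only models the basic condition (every queried and returned field must be in the index); `isCovered` is an invented helper, and real covered-query rules have further restrictions that are not modeled here.

```javascript
// indexFields: field names in the index
// queryFields: field names used in the query filter
// projection:  projection document, e.g. { _id: 0, name: 1 }
function isCovered(indexFields, queryFields, projection) {
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id is returned by default unless the projection sets it to 0.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return [...queryFields, ...returned].every((f) => indexFields.includes(f));
}

const recipeIndex = ["main_ingredient", "name"];
const withoutId = isCovered(recipeIndex, ["main_ingredient"], { _id: 0, name: 1 });
const withId = isCovered(recipeIndex, ["main_ingredient"], { name: 1 });
console.log(withoutId, withId);
```

In this model, leaving out `_id: 0` makes the query uncovered, because _id is not part of the index and would have to be fetched from the document.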
  • 36. Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
  • 37. Avoiding Common Mistakes
  • 38. Trying to Use Multiple Indexes
        // MongoDB can only use one index for a query
        db.collection.ensureIndex({ a: 1 })
        db.collection.ensureIndex({ b: 1 })
        // Only one of the above indexes is used
        db.collection.find({ a: 3, b: 4 })
  • 39. Compound Key Mistakes
        // Compound key indexes are very effective
        db.collection.ensureIndex({ a: 1, b: 1, c: 1 })
        // But only if the query is a prefix of the index
        // This query can't effectively use the index
        db.collection.find({ c: 2 })
        // …but this query can
        db.collection.find({ a: 3, b: 5 })
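The prefix rule on this slide can be expressed as a count of how many leading index fields a query constrains. The following is a minimal sketch in plain JavaScript; `usablePrefixLength` is an invented helper that models only the prefix idea, not the actual query planner.

```javascript
// Count how many leading fields of the index appear in the query.
// 0 means the query cannot narrow the index scan at all.
function usablePrefixLength(indexFields, queryFields) {
  let length = 0;
  for (const field of indexFields) {
    if (!queryFields.includes(field)) break;
    length += 1;
  }
  return length;
}

const compound = ["a", "b", "c"];
console.log(usablePrefixLength(compound, ["c"]));      // find({ c: 2 })
console.log(usablePrefixLength(compound, ["a", "b"])); // find({ a: 3, b: 5 })
console.log(usablePrefixLength(compound, ["a", "c"])); // only a narrows the scan
```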
  • 40. Low Selectivity Indexes
        db.collection.distinct('status')
        [ 'new', 'processed' ]
        db.collection.ensureIndex({ status: 1 })
        // Low selectivity indexes provide little benefit
        db.collection.find({ status: 'new' })
        // Better
        db.collection.ensureIndex({ status: 1, created_at: -1 })
        db.collection.find({ status: 'new' }).sort({ created_at: -1 })
  • 41. Regular Expressions
        db.users.ensureIndex({ username: 1 })
        // Left anchored regex queries can use the index
        db.users.find({ username: /^joe smith/ })
        // But not generic regexes
        db.users.find({ username: /smith/ })
        // Or case insensitive queries
        db.users.find({ username: /Joe/i })
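One way to see why a left-anchored regex can use the index is that a fixed prefix maps to a contiguous range of sorted keys, while an unanchored pattern could match anywhere. The sketch below illustrates this in plain JavaScript over a sorted array; it is not MongoDB's implementation, and `lowerBound`, `prefixScan`, `fullScan` and the sample usernames are assumptions for illustration.

```javascript
// Binary search for the first position whose key is >= prefix.
function lowerBound(sortedKeys, prefix) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sortedKeys[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Left-anchored match: jump to the prefix, then stop at the first non-match.
function prefixScan(sortedKeys, prefix) {
  const matches = [];
  let examined = 0;
  for (let i = lowerBound(sortedKeys, prefix); i < sortedKeys.length; i += 1) {
    examined += 1;
    if (!sortedKeys[i].startsWith(prefix)) break;
    matches.push(sortedKeys[i]);
  }
  return { matches, examined };
}

// Unanchored match: every key has to be tested.
function fullScan(sortedKeys, regex) {
  return { matches: sortedKeys.filter((k) => regex.test(k)), examined: sortedKeys.length };
}

// Hypothetical usernames, kept in sorted order like index keys.
const usernames = ["alice jones", "bob smith", "joe smith", "joe smithson", "zed smith"].sort();
const anchored = prefixScan(usernames, "joe smith"); // models /^joe smith/
const generic = fullScan(usernames, /smith/);        // models /smith/
console.log(anchored.examined, generic.examined);
```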
  • 42. Negation
        // Indexes aren't helpful with negations
        db.things.ensureIndex({ x: 1 })
        // e.g. "not equal" queries
        db.things.find({ x: { $ne: 3 } })
        // …or "not in" queries
        db.things.find({ x: { $nin: [2, 3, 4] } })
        // …or the $not operator
        db.people.find({ name: { $not: /John Doe/ } })
  • 43. Choosing the right indexes is one of the most important things you can do as a MongoDB developer, so take the time to get your indexes right!
  • 44. Thank you. Thomas Boyd, Solutions Architect, 10gen