#MongoDBDays




Indexing and Query Optimization
Thomas Rückstieß
Technical Support Engineer
thomas@10gen.com
Agenda
• What are indexes?
• Why do I need them?
• Working with indexes in MongoDB
• Optimize your queries
• Avoiding common mistakes
What are indexes?




Find a recipe by name:
“Currywurst”
What are indexes?
 • How would you find a recipe using chicken?
 • How about a 300-500 calorie recipe using
   chicken?
Index ordered by ingredient, then calories:
Chicken
     88 kcal: Chicken soup
     356 kcal: Chicken Caesar Salad
     480 kcal: Teriyaki Chicken
     680 kcal: Buffalo Chicken Wings
Fish
     412 kcal: Tuna Sandwich
Pork
     480 kcal: Curry Wurst

Index ordered by calories, then ingredient:
88 kcal
     Chicken: Chicken soup
356 kcal
     Chicken: Chicken Caesar Salad
412 kcal
     Fish: Tuna Sandwich
480 kcal
     Pork: Curry Wurst
     Chicken: Teriyaki Chicken
680 kcal
     Chicken: Buffalo Chicken Wings
Linked List
1 → 2 → 3 → 4 → 5 → 6 → 7

Finding 7 in Linked List
1 → 2 → 3 → 4 → 5 → 6 → 7   (all 7 nodes are visited)

Finding 7 in Tree
            4
      2           6
   1     3     5     7
(3 nodes are visited: 4 → 6 → 7)
Indexes in MongoDB are B-trees
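The difference between the linked-list walk and the tree lookup can be sketched in plain JavaScript. This is only an illustration: a binary search over a sorted array stands in for the tree descent, whereas a real B-tree node holds many keys. The function names are made up for this sketch.

```javascript
// Linear scan: visit items one by one, like walking a linked list.
function linearSearch(items, target) {
  let visited = 0;
  for (const item of items) {
    visited += 1;
    if (item === target) return { found: true, visited };
  }
  return { found: false, visited };
}

// Binary search over a sorted array: halve the candidate range on each
// step, like descending a balanced search tree.
function binarySearch(sortedItems, target) {
  let low = 0;
  let high = sortedItems.length - 1;
  let visited = 0;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    visited += 1;
    if (sortedItems[mid] === target) return { found: true, visited };
    if (sortedItems[mid] < target) low = mid + 1;
    else high = mid - 1;
  }
  return { found: false, visited };
}

const keys = [1, 2, 3, 4, 5, 6, 7];
const scan = linearSearch(keys, 7); // visits 1, 2, ..., 7
const tree = binarySearch(keys, 7); // visits 4, 6, 7
console.log(scan.visited, tree.visited);
```

With 7 keys the gap is small (7 visits versus 3), but the scan grows linearly with the collection while the tree lookup grows logarithmically.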
Indexes are the single biggest tunable performance factor in MongoDB
Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
Why do I need indexes?
A brief story

    5 min




   30 sec
Why do I need indexes?



                         3h
Working with Indexes in MongoDB
How do I create indexes?

// Repeated calls are safe: the client remembers the index and raises no errors
db.recipes.ensureIndex({ main_ingredient: 1 })




* 1 means ascending, -1 descending
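To make the 1 / -1 convention concrete, here is a small plain-JavaScript sketch that orders documents the way an index key specification describes. It is an illustration only, not how the server builds an index; `compareByKeySpec` and the sample documents are assumptions made for this example. The same convention applies field by field when a key lists several fields.

```javascript
// Build a comparator from an index-style key spec such as
// { main_ingredient: 1, calories: -1 } (1 = ascending, -1 = descending).
function compareByKeySpec(keySpec) {
  const fields = Object.keys(keySpec);
  return (left, right) => {
    for (const field of fields) {
      if (left[field] < right[field]) return -keySpec[field];
      if (left[field] > right[field]) return keySpec[field];
    }
    return 0; // equal on every indexed field
  };
}

const recipes = [
  { name: 'Tuna Sandwich', main_ingredient: 'fish', calories: 412 },
  { name: 'Chicken soup', main_ingredient: 'chicken', calories: 88 },
  { name: 'Teriyaki Chicken', main_ingredient: 'chicken', calories: 480 },
];

// Ingredient ascending, then calories descending within each ingredient.
const ordered = [...recipes].sort(
  compareByKeySpec({ main_ingredient: 1, calories: -1 })
);
const orderedNames = ordered.map((recipe) => recipe.name);
console.log(orderedNames);
```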
What can be indexed?
// Multiple fields (compound indexes)
db.recipes.ensureIndex({
   main_ingredient: 1,
   calories: -1
})

// Arrays of values (multikey indexes)
{
   name: 'Curry Wurst mit Pommes',
   ingredients: ['pork', 'curry']
}

db.recipes.ensureIndex({ ingredients: 1 })
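A multikey index holds one entry per array element, so a single document can appear several times in the index. The plain-JavaScript sketch below imitates that idea. `multikeyEntries` is a hypothetical helper, and the second sample document is invented for the example.

```javascript
// Produce one (key, position) entry per array element, sorted by key.
function multikeyEntries(docs, field) {
  const entries = [];
  docs.forEach((doc, position) => {
    const value = doc[field];
    const values = Array.isArray(value) ? value : [value];
    for (const key of values) entries.push({ key, position });
  });
  entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return entries;
}

const docs = [
  { name: 'Curry Wurst mit Pommes', ingredients: ['pork', 'curry'] },
  { name: 'Chicken soup', ingredients: ['chicken'] },
];

// Document 0 contributes two entries, document 1 contributes one.
const entries = multikeyEntries(docs, 'ingredients');
console.log(entries);
```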




                                             Image: http://www.marions-kochbuch.com/
What can be indexed?
// Subdocuments
{
   name: 'Curry Wurst mit Pommes',
   contributor: {
     name: 'Hans Wurst',
     id: 'hawu36'
   }
}

db.recipes.ensureIndex({ 'contributor.id': 1 })

db.recipes.ensureIndex({ 'contributor': 1 })
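The two indexes above key on different values: 'contributor.id' keys on one nested field, while 'contributor' keys on the whole subdocument, so a query using the latter has to match the entire embedded document. The sketch below shows which value each key path selects. `getPath` is a hypothetical helper written for this example, not a shell function.

```javascript
// Resolve a dotted path such as 'contributor.id' against a document.
function getPath(doc, path) {
  return path
    .split('.')
    .reduce((value, part) => (value == null ? undefined : value[part]), doc);
}

const recipe = {
  name: 'Curry Wurst mit Pommes',
  contributor: { name: 'Hans Wurst', id: 'hawu36' },
};

const contributorId = getPath(recipe, 'contributor.id'); // a single string
const wholeSubdocument = getPath(recipe, 'contributor'); // the embedded object
console.log(contributorId, wholeSubdocument);
```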




                                                  Image: http://www.marions-kochbuch.com/
How do I manage indexes?
// List a collection's indexes
db.recipes.getIndexes()
db.recipes.getIndexKeys()


// Drop a specific index
db.recipes.dropIndex({ ingredients: 1 })


// Drop all indexes and recreate them
db.recipes.reIndex()


// Every collection has a default (unique) index on _id
Background Index Builds
// Index creation is a blocking operation that can take a long time
// Background creation yields to other operations
db.recipes.ensureIndex(
    { ingredients: 1 },
    { background: true }
)
Options
• Uniqueness constraints (unique, dropDups)
• Sparse Indexes
• Geospatial (2d) Indexes
• TTL Collections (expireAfterSeconds)
Uniqueness Constraints
// Only one recipe can have a given value for name
db.recipes.ensureIndex( { name: 1 }, { unique: true } )


// Force a unique index on a collection with duplicate recipe names
// by dropping the duplicates
db.recipes.ensureIndex(
    { name: 1 },
    { unique: true, dropDups: true }
)


* dropDups is probably never what you want
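Since dropDups discards documents, a safer first step is to find out which values are duplicated and resolve them by hand. The plain-JavaScript sketch below shows the check on an in-memory array. `findDuplicateValues` and the sample data are assumptions made for illustration.

```javascript
// Return every value of `field` that occurs in more than one document.
function findDuplicateValues(docs, field) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[field], (counts.get(doc[field]) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([value]) => value);
}

const recipes = [
  { name: 'Currywurst' },
  { name: 'Chicken soup' },
  { name: 'Currywurst' },
];

// These names would violate a unique index on { name: 1 }.
const duplicateNames = findDuplicateValues(recipes, 'name');
console.log(duplicateNames);
```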



                                                          image: www.idownloadblog.com
Sparse Indexes
// Only documents with field calories will be indexed
db.recipes.ensureIndex(
    { calories: -1 },
    { sparse: true }
)
// Allow multiple documents to not have calories field
db.recipes.ensureIndex(
    { name: 1 , calories: -1 },
    { unique: true, sparse: true }
)
* In a non-sparse index, missing fields are stored as null
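The difference between a sparse and a regular single-field index can be sketched as a filter over the documents. This is a simplified model for illustration; `indexedEntries` and the sample documents are hypothetical.

```javascript
// List the entries a single-field index would hold.
// Regular index: documents without the field are indexed under null.
// Sparse index: documents without the field are skipped entirely.
function indexedEntries(docs, field, { sparse = false } = {}) {
  return docs
    .filter((doc) => !sparse || field in doc)
    .map((doc) => ({ key: field in doc ? doc[field] : null, name: doc.name }));
}

const recipes = [
  { name: 'Chicken soup', calories: 88 },
  { name: 'Currywurst' }, // no calories field
];

const regular = indexedEntries(recipes, 'calories');
const sparse = indexedEntries(recipes, 'calories', { sparse: true });
console.log(regular.length, sparse.length);
```

In this model a second recipe without calories would add another null key to the regular index, which is what collides with a unique constraint.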
Geospatial Indexes
// Add coordinates in [ longitude, latitude ] order
{
     name: 'Curry 36 am Mehringdamm',
     loc: [ 13.387764, 52.493442 ]
}
// Index the coordinates
db.locations.ensureIndex( { loc : '2d' } )


// Query for locations 'near' a particular coordinate
db.locations.find({
     loc: { $near: [ -122.3, 37.4 ] }   // [ longitude, latitude ]
})
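Conceptually, a $near query on a '2d' index returns documents ordered by flat (planar) distance from the query point. The sketch below reproduces that ordering by brute force in plain JavaScript, without any index. `nearest` is a hypothetical helper and the Hamburg entry is invented sample data.

```javascript
// Sort locations by planar distance from a [longitude, latitude] point.
function nearest(locations, [x, y], limit) {
  return locations
    .map((location) => ({
      name: location.name,
      distance: Math.hypot(location.loc[0] - x, location.loc[1] - y),
    }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, limit);
}

const locations = [
  { name: 'Hypothetical stand in Hamburg', loc: [9.99, 53.55] },
  { name: 'Curry 36 am Mehringdamm', loc: [13.387764, 52.493442] },
];

// Query point in central Berlin, again as [longitude, latitude].
const closest = nearest(locations, [13.4, 52.5], 1);
console.log(closest[0].name);
```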


                                                        image: NASA
TTL Collections
// Documents must have a BSON UTC Date field
{ 'submitted_date' :
     ISODate('2012-10-12T05:24:07.211Z'), … }


// Documents are removed after
// 'expireAfterSeconds' seconds
db.recipes.ensureIndex(
    { submitted_date: 1 },
    { expireAfterSeconds: 3600 }
)
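The expiry rule can be written down as a simple date comparison. The sketch below is a plain-JavaScript model of that rule, not the server's implementation. `isExpired` is a hypothetical name, and skipping non-Date values mirrors the requirement that the field hold a BSON date.

```javascript
// A document is due for removal once its date field is at least
// expireAfterSeconds in the past.
function isExpired(doc, dateField, expireAfterSeconds, now) {
  const value = doc[dateField];
  if (!(value instanceof Date)) return false; // only Date values can expire
  return now.getTime() - value.getTime() >= expireAfterSeconds * 1000;
}

const recipe = { submitted_date: new Date('2012-10-12T05:24:07.211Z') };

const afterSixMinutes = isExpired(
  recipe, 'submitted_date', 3600, new Date('2012-10-12T05:30:00.000Z')
);
const afterOneHour = isExpired(
  recipe, 'submitted_date', 3600, new Date('2012-10-12T06:24:08.000Z')
);
console.log(afterSixMinutes, afterOneHour);
```

Removal is not instantaneous: the background task that deletes expired documents runs periodically, so a document can outlive its expiry time by a short while.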




                                     image: taylordsdn112.wordpress.com
Limitations
• Collections cannot have more than 64 indexes.
• Index keys cannot be larger than 1024 bytes (1K).
• The name of an index, including the namespace, must be shorter than
  128 characters.
• Queries can only use 1 index*
• Indexes have storage requirements and slow down writes.
• In-memory sorts (without an index) are limited to 32 MB of return data.

* With the exception of $or queries
Optimize Your Queries
Profiling Slow Ops
db.setProfilingLevel( n, slowms )   // slowms defaults to 100 ms


n=0 profiler off
n=1 record operations longer than slowms
n=2 record all operations


db.system.profile.find()




* The profile collection is a capped collection, and fixed in size
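The three profiling levels boil down to a small decision rule, sketched here in plain JavaScript. `shouldRecord` is a hypothetical function written for illustration and the 100 ms default follows the slide.

```javascript
// Decide whether an operation would be written to the profile collection.
function shouldRecord(level, millis, slowms = 100) {
  if (level === 2) return true; // record everything
  if (level === 1) return millis > slowms; // record slow operations only
  return false; // level 0: profiler off
}

const slowQueryAtLevel1 = shouldRecord(1, 356);
const fastQueryAtLevel1 = shouldRecord(1, 0);
const fastQueryAtLevel2 = shouldRecord(2, 0);
console.log(slowQueryAtLevel1, fastQueryAtLevel1, fastQueryAtLevel2);
```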




                                                                image: http://www.speareducation.com/
The Explain Plan (without Index)
db.recipes.find(
    { calories: { $lt: 40 } }
).explain()
{
    "cursor" : "BasicCursor",
    "n" : 42,
    "nscannedObjects" : 12345,
    "nscanned" : 12345,
    ...
    "millis" : 356,
    ...
}
* explain() doesn't use cached plans; it re-evaluates the candidate plans and resets the cache
The Explain Plan (with Index)
db.recipes.find(
    { calories: { $lt: 40 } }
).explain()
{
    "cursor" : "BtreeCursor calories_-1",
    "n" : 42,
    "nscannedObjects" : 42,
    "nscanned" : 42,
    ...
    "millis" : 0,
    ...
}
* explain() doesn't use cached plans; it re-evaluates the candidate plans and resets the cache
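A quick way to read the two explain outputs is the ratio of n (documents returned) to nscanned (items examined), which should be as close to 1 as possible. The sketch below computes it for the numbers shown on the two slides. `scanRatio` is a hypothetical helper, and the objects only copy the fields used here.

```javascript
// Ratio of returned documents to examined items; 1 is ideal.
function scanRatio(explainOutput) {
  if (explainOutput.nscanned === 0) return 1; // nothing scanned, nothing wasted
  return explainOutput.n / explainOutput.nscanned;
}

const withoutIndex = { cursor: 'BasicCursor', n: 42, nscanned: 12345 };
const withIndex = { cursor: 'BtreeCursor calories_-1', n: 42, nscanned: 42 };

const ratioWithoutIndex = scanRatio(withoutIndex); // 42 / 12345
const ratioWithIndex = scanRatio(withIndex); // 42 / 42
console.log(ratioWithoutIndex, ratioWithIndex);
```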
The Query Optimizer
• For each "type" of query, MongoDB
  periodically tries all useful indexes
• Aborts the rest as soon as one plan wins
• The winning plan is temporarily cached for
  each “type” of query
Manually Select Index to Use
// Tell the database what index to use
db.recipes.find(
  { calories: { $lt: 1000 } }
).hint({ _id: 1 })


// Tell the database to NOT use an index
db.recipes.find(
  { calories: { $lt: 1000 } }
).hint({ $natural: 1 })
Use Indexes to Sort Query Results
// Given the following index
db.collection.ensureIndex({ a:1, b:1 , c:1, d:1 })

// The following query and sort operations can use the index
db.collection.find( ).sort({ a:1 })
db.collection.find( ).sort({ a:1, b:1 })

db.collection.find({ a:4 }).sort({ a:1, b:1 })
db.collection.find({ b:5 }).sort({ a:1, b:1 })
Indexes that won't work for sorting query results
// Given the following index
db.collection.ensureIndex({ a:1, b:1, c:1, d:1 })


// These can not sort using the index
db.collection.find( ).sort({ b: 1 })
db.collection.find({ b: 5 }).sort({ b: 1 })
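The rule behind the last two slides can be sketched as a check: the sort fields must follow the index order, and every index field before the first sort field must be matched by equality. The plain-JavaScript version below is a deliberately simplified model (it ignores range predicates and other planner details). `canSortWithIndex` is a hypothetical name.

```javascript
// Simplified check: can `sortSpec` be served by `indexSpec`, given the
// fields the query matches by equality?
function canSortWithIndex(indexSpec, equalityFields, sortSpec) {
  const indexKeys = Object.keys(indexSpec);
  const sortKeys = Object.keys(sortSpec);
  if (sortKeys.length === 0) return true;

  const start = indexKeys.indexOf(sortKeys[0]);
  if (start === -1) return false;

  // Every index field before the first sort field needs an equality match.
  for (let i = 0; i < start; i += 1) {
    if (!equalityFields.includes(indexKeys[i])) return false;
  }

  // Sort fields must follow the index order, all in the index direction
  // or all reversed.
  let direction = null;
  for (let j = 0; j < sortKeys.length; j += 1) {
    const key = indexKeys[start + j];
    if (key !== sortKeys[j]) return false;
    const relative = indexSpec[key] === sortSpec[key] ? 1 : -1;
    if (direction === null) direction = relative;
    else if (direction !== relative) return false;
  }
  return true;
}

const index = { a: 1, b: 1, c: 1, d: 1 };
console.log(canSortWithIndex(index, ['a'], { a: 1, b: 1 })); // usable
console.log(canSortWithIndex(index, ['b'], { b: 1 })); // not usable
```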
Index Covered Queries
// MongoDB can return data from just the index
db.recipes.ensureIndex({ main_ingredient: 1, name: 1 })

// Return only the name field
db.recipes.find(
   { main_ingredient: 'chicken' },
   { _id: 0, name: 1 }
)

// indexOnly will be true in the explain plan
db.recipes.find(
    { main_ingredient: 'chicken' },
    { _id: 0, name: 1 }
).explain()
{
    ...
    "indexOnly": true,
    ...
}
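Whether a query can be covered comes down to one condition: every field the query filters on and every field it returns must be in the index, and _id must be excluded unless it is indexed too. The sketch below checks that condition in plain JavaScript. It is a simplified model (it ignores arrays and other special cases), and `isCoveredQuery` is a hypothetical name.

```javascript
// Simplified check for a covered query.
function isCoveredQuery(indexSpec, filter, projection) {
  const indexFields = Object.keys(indexSpec);
  const inIndex = (field) => indexFields.includes(field);

  // Fields explicitly included by the projection.
  const returned = Object.keys(projection).filter(
    (field) => projection[field] === 1
  );
  if (returned.length === 0) return false; // whole documents are returned

  // _id comes back by default unless the projection excludes it.
  const returnsId = projection._id !== 0;
  if (returnsId && !inIndex('_id')) return false;

  return Object.keys(filter).every(inIndex) && returned.every(inIndex);
}

const index = { main_ingredient: 1, name: 1 };
const covered = isCoveredQuery(
  index, { main_ingredient: 'chicken' }, { _id: 0, name: 1 }
);
const notCovered = isCoveredQuery(
  index, { main_ingredient: 'chicken' }, { name: 1 } // _id still returned
);
console.log(covered, notCovered);
```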
Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
Avoiding Common Mistakes
Trying to Use Multiple Indexes
// MongoDB can only use one index for a query
db.collection.ensureIndex({ a: 1 })
db.collection.ensureIndex({ b: 1 })


// Only one of the above indexes is used
db.collection.find({ a: 3, b: 4 })
Compound Key Mistakes
// Compound key indexes are very effective
db.collection.ensureIndex({ a: 1, b: 1, c: 1 })


// But only if the queried fields form a prefix of the indexed fields


// This query can't effectively use the index
db.collection.find({ c: 2 })


// …but this query can
db.collection.find({ a: 3, b: 5 })
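The prefix rule can be checked mechanically: walk the index fields in order and stop at the first one the query does not constrain. The plain-JavaScript sketch below does exactly that. `indexPrefixLength` is a hypothetical helper, and a result of 0 means the query cannot use the index effectively.

```javascript
// Count how many leading index fields the query constrains.
function indexPrefixLength(indexSpec, queryFields) {
  const wanted = new Set(queryFields);
  let length = 0;
  for (const field of Object.keys(indexSpec)) {
    if (!wanted.has(field)) break;
    length += 1;
  }
  return length;
}

const index = { a: 1, b: 1, c: 1 };
const onlyC = indexPrefixLength(index, ['c']); // no usable prefix
const aAndB = indexPrefixLength(index, ['a', 'b']); // prefix (a, b)
console.log(onlyC, aAndB);
```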
Low Selectivity Indexes
db.collection.distinct('status')
[ 'new', 'processed' ]


db.collection.ensureIndex({ status: 1 })


// Low selectivity indexes provide little benefit
db.collection.find({ status: 'new' })


// Better
db.collection.ensureIndex({ status: 1, created_at: -1 })
db.collection.find(
  { status: 'new' }
).sort({ created_at: -1 })
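One rough way to quantify selectivity is the number of distinct values divided by the number of documents: values near 1 mean each key narrows the search to very few documents, values near 0 mean each key still matches a large share of the collection. The sketch below computes this on an in-memory array. `selectivity` and the sample documents are assumptions made for this example.

```javascript
// Distinct values per document: close to 1 is highly selective.
function selectivity(docs, field) {
  if (docs.length === 0) return 0;
  const distinctValues = new Set(docs.map((doc) => doc[field]));
  return distinctValues.size / docs.length;
}

const jobs = [
  { _id: 1, status: 'new' },
  { _id: 2, status: 'processed' },
  { _id: 3, status: 'new' },
  { _id: 4, status: 'processed' },
];

const statusSelectivity = selectivity(jobs, 'status'); // 2 distinct / 4 docs
const idSelectivity = selectivity(jobs, '_id'); // 4 distinct / 4 docs
console.log(statusSelectivity, idSelectivity);
```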
Regular Expressions
db.users.ensureIndex({ username: 1 })


// Left anchored regex queries can use the index
db.users.find({ username: /^hans wurst/ })


// But not generic regexes
db.users.find({username: /wurst/ })


// Or case insensitive queries
db.users.find({ username: /Hans/i })
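A left-anchored, case-sensitive regex works with an index because it is equivalent to a range scan over all keys that start with the literal prefix. The sketch below derives that range in plain JavaScript. It is deliberately conservative: it only accepts patterns made purely of letters, digits and spaces after the `^`. `prefixRange` is a hypothetical helper, not part of MongoDB.

```javascript
// Turn /^literal/ into the key range [lower, upper) it corresponds to.
// Returns null when the regex cannot be mapped to a simple prefix range.
function prefixRange(regex) {
  if (regex.ignoreCase) return null; // case-insensitive: no single range
  const match = /^\^([A-Za-z0-9 ]+)$/.exec(regex.source);
  if (!match) return null; // not anchored, or not a plain literal
  const lower = match[1];
  const lastCode = lower.charCodeAt(lower.length - 1);
  const upper = lower.slice(0, -1) + String.fromCharCode(lastCode + 1);
  return { lower, upper };
}

const anchored = prefixRange(/^hans wurst/);
const unanchored = prefixRange(/wurst/);
const caseInsensitive = prefixRange(/Hans/i);
console.log(anchored, unanchored, caseInsensitive);
```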
Negation
// Indexes aren't helpful with negations
db.things.ensureIndex({ x: 1 })

// e.g. "not equal" queries
db.things.find({ x: { $ne: 3 } })

// …or "not in" queries
db.things.find({ x: { $nin: [2, 3, 4 ] } })

// …or the $not operator
// ($not takes a regex or an operator expression, not a plain value)
db.people.find({ name: { $not: /^Hans Wurst$/ } })
Choosing the right indexes is one of the most important things you can do as a MongoDB developer, so take the time to get your indexes right!
#MongoDBDays




Thank you
Thomas Rückstieß
Technical Support Engineer
thomas@10gen.com

Editor's Notes

  • #4 First part of this talk is more conceptual. Second part is more detailed and MongoDB specific.
  • #5 Not just a database concept; indexes have been around for a long time. They serve the same purpose: access information quickly and efficiently.
  • #7 Find recipe by name: Currywurst. Here: compound index on (Recipe Type, Name, Page Number).
  • #8 Really depends on the order
  • #9 Humans have the ability to quickly skim over the recipe page and find the name they are looking for. Move into the world of computers: linked lists.
  • #10 Look at 7 documents
  • #11 Binary trees. Queries, inserts and deletes: O(log(n)) time.
  • #12 MongoDB's indexes are B-trees. German invention: Prof. Rudolf Bayer. Lookups (queries), inserts and deletes happen in O(log(n)) time.
  • #13 So this is helpful, and can speed up queries by a tremendous amount
  • #14 So it’s imperative we understand them
  • #16, #17 As Support Engineer, I get questions about performance all the time. Usually: "Everything got really slow", "We didn't change anything". RIGHT. Last week: they didn't change anything, but a DBA ran a few queries without an index. Not only were the queries slow; they slowed down and blocked reads, and pulled 20M documents into RAM, ejecting the working set.
  • #19 Repeated calls to ensureIndex only result in one create message going to the server. The index is cached client side for some period of time (varies by driver). Check the manual.
  • #22 reIndex drops *all* indexes (including the _id index) and rebuilds them. Expensive operation. Don't need to do this often. Use cases: corruption, fragmentation.
  • #23 Caveats: still a resource-intensive operation; the index build is slower; indexes are still built in the foreground on secondaries.
  • #24 Repeat what we just did: basic operations. Now looking at some of the options and different index types.
  • #25 unique applies a uniqueness constraint to the indexed values. dropDups will force the server to create a unique index by only keeping the first document found in natural order with a value and dropping all other documents with that value. dropDups will likely result in data loss!
  • #26 MongoDB doesn't enforce a schema. Sparse indexes only contain entries for documents that have the indexed field. With sparse, a unique constraint can be applied to a field not shared by all documents. Otherwise multiple 'null' values violate the unique constraint. Useful? Imagine a high_priority tag on only 1% of your data: a sparse index can retrieve those documents quickly.
  • #27 Only touch on geo indexes. A '2d' index is a geohash on top of the B-tree. Allows you to search for documents 'near' a long/lat position. Bounds queries are also possible using $within. Good for mobile apps: find the 50 nearest "Currywurstbuden" close to me. 2.4 brings even more geo features.
  • #28 Index must be on a BSON date field. Documents are removed after expireAfterSeconds seconds. Reaper thread runs every 60 seconds.
  • #29 Indexes are a really powerful feature of MongoDB, so you need to understand the limitations. *With the exception of $or queries. If an index key exceeds 1K, the document is silently left out of the index.
  • #30 Another question we often get in Support: How do we know that our indexes are efficient? How do we know that our queries use the right indexes?
  • #31 n=2 causes a lot of writes to the profile collection; usually use n=1.
  • #32, #33 cursor: the type of cursor used (BasicCursor means no index was used). n: the number of documents that match the query. nscannedObjects: the number of documents that had to be scanned. nscanned: the number of items (index entries or documents) examined. millis: how long the query took. Ratio of n to nscanned should be as close to 1 as possible.
  • #34 Winning plan is reevaluated after 1000 write operations (insert, update, remove, etc.).
  • #35 Tells MongoDB exactly what index to use.
  • #36, #37 MongoDB sorts results based on the field order in the index. For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches.
  • #40 Support: these things come up often. Understand them now; don't make the same mistakes.
  • #43 Better to use a compound index on the low selectivity field and some other more selective field.