Indexing and Query Optimization
Kevin Matulef
September 6, 2012




Thursday, September 6, 12
What’s in store

      • What are indexes?

      • Picking the right indexes.

      • Creating indexes in MongoDB

      • Troubleshooting




Indexes are the single biggest
         tunable performance factor
                in MongoDB.




Absent or suboptimal indexes are
      the most common avoidable
    MongoDB performance problem.




So what problem do indexes solve?




How do you find a chicken recipe?


      • An unindexed cookbook might be quite a
          page turner.

      • Probably not what you want, though.




I know, I’ll use an index!




Let’s imagine a simple index
      ingredient    page

      aardvark      790
      ...           ...
      beef          190, 191, 205, ...
      ...           ...
      chicken       182, 199, 200, ...
      chorizo       497, ...
      ...           ...
      zucchini      673, 986, ...




How do you find a quick chicken recipe?




Let’s imagine a compound index
      ingredient    cooking time    page

      ...           ...             ...
      chicken       15 min          182, 200
      chicken       25 min          199
      chicken       30 min          289, 316, 320
      chicken       45 min          290, 291, 354
      ...           ...             ...




Consider the ordering of index keys




      Keys sort by ingredient first, then by cooking time:

      Aardvark, 20 min
      Chicken, 15 min
      Chicken, 25 min
      Chicken, 30 min
      Chicken, 45 min
      Zucchini, 45 min
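
This ordering can be sketched in plain JavaScript, with no MongoDB involved. The `compareKeys` function and the `entries` array are illustrative assumptions, not MongoDB internals: keys compare on the first field, and the second field only breaks ties.

```javascript
// Hypothetical sketch: compare compound keys field by field, first field first.
function compareKeys(a, b) {
  if (a.ingredient !== b.ingredient) {
    return a.ingredient < b.ingredient ? -1 : 1;
  }
  return a.minutes - b.minutes; // same ingredient: order by cooking time
}

// Sample entries in arbitrary order (values taken from the slide).
const entries = [
  { ingredient: "chicken", minutes: 45 },
  { ingredient: "zucchini", minutes: 45 },
  { ingredient: "chicken", minutes: 15 },
  { ingredient: "aardvark", minutes: 20 },
  { ingredient: "chicken", minutes: 30 },
  { ingredient: "chicken", minutes: 25 },
];

const sortedKeys = entries
  .slice()
  .sort(compareKeys)
  .map((e) => `${e.ingredient}, ${e.minutes} min`);

console.log(sortedKeys.join("\n"));
```

All chicken entries end up next to each other and sorted by time, which is why a query for quick chicken recipes only has to read one contiguous run of the index.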




How about a low-calorie chicken recipe?




Let’s imagine a 2nd compound index

      ingredient    calories    page

      ...           ...         ...
      chicken       250         199, 316
      chicken       300         289, 291
      chicken       425         320
      ...           ...         ...




How about a quick, low-calorie recipe?




Let’s imagine a last compound index
      calories    cooking time    page

      ...         ...             ...
      250         25 min          199
      250         30 min          316
      300         25 min          289
      300         45 min          291
      425         30 min          320
      ...         ...             ...

      How do you find dishes from 250 to 300 calories that cook in 30 to 40 minutes?



Consider the ordering of index keys


      250 cal, 25 min
      250 cal, 30 min
      300 cal, 25 min
      300 cal, 45 min
      425 cal, 30 min

      How do you find dishes from 250 to 300 calories that cook in 30 to 40 minutes?

      4 index entries will be scanned, but only 1 will match!
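
The count above can be reproduced with a small simulation. This is a minimal sketch, assuming the index is just a sorted array; `rangeScan` is a hypothetical helper, and a real B-tree would seek straight to the first in-range entry instead of skipping from the start.

```javascript
// Index entries from the slide, already sorted by (calories, minutes).
const index = [
  { calories: 250, minutes: 25, page: 199 },
  { calories: 250, minutes: 30, page: 316 },
  { calories: 300, minutes: 25, page: 289 },
  { calories: 300, minutes: 45, page: 291 },
  { calories: 425, minutes: 30, page: 320 },
];

// Hypothetical sketch of a range scan: walk the contiguous run of entries
// whose first field is in range, then filter each one on the second field.
function rangeScan(idx, calRange, minRange) {
  let nscanned = 0;
  const pages = [];
  for (const e of idx) {
    if (e.calories < calRange[0]) continue; // before the start of the range
    if (e.calories > calRange[1]) break;    // past the end: stop scanning
    nscanned++;
    if (e.minutes >= minRange[0] && e.minutes <= minRange[1]) {
      pages.push(e.page);
    }
  }
  return { n: pages.length, nscanned, pages };
}

const scanResult = rangeScan(index, [250, 300], [30, 40]);
console.log(scanResult); // { n: 1, nscanned: 4, pages: [ 316 ] }
```

Because cooking time is the second field, the in-range times are scattered across the calorie range, so every entry between 250 and 300 calories has to be examined.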




Range queries using an index on A, B
      • A is a range :)

      • A is constant, B is a range :)

      • A is constant, order by B :)

      • A is a range, B is constant/range :|

      • B is constant/range, A unspecified :(


It’s really that straightforward.




B-Trees (Bayer & McCreight ’72)

      Queries, Inserts, Deletes: O(log n)
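
To get a feel for what O(log n) means in practice, here is a rough sketch that uses binary search over a sorted array as a stand-in for a B-tree lookup. It is not a B-tree implementation, and the `search` function and the million-key array are assumptions made for illustration.

```javascript
// Stand-in for a B-tree lookup: binary search over a sorted array,
// counting how many keys are inspected along the way.
function search(sorted, target) {
  let lo = 0;
  let hi = sorted.length - 1;
  let steps = 0;
  while (lo <= hi) {
    steps++;
    const mid = (lo + hi) >>> 1;
    if (sorted[mid] === target) return { found: true, steps };
    if (sorted[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: false, steps };
}

// One million sorted even numbers as hypothetical index keys.
const keys = Array.from({ length: 1000000 }, (_, i) => i * 2);
const hit = search(keys, 1337 * 2); // present
const miss = search(keys, 7);       // absent (odd)
console.log(hit.steps, miss.steps); // each is at most 20 for a million keys
```

A real B-tree stores many keys per node, so it typically needs even fewer node visits than this, but the logarithmic growth is the same idea.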


All this is relevant to MongoDB.
      • MongoDB’s indexes are B-Trees, which are
          designed for range queries.

      • Generally, the best index for your queries is
          going to be a compound index.

      • Every additional index slows down inserts &
          removes, and may slow updates.




On to MongoDB!




Declaring Indexes
      • db.foo.ensureIndex( { username : 1 } )


      • db.foo.ensureIndex( { username : 1, created_at : -1 } )




And managing them....
          > db.system.indexes.find() //db.foo.getIndexes()

           { "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" }
           { "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" }



          > db.foo.dropIndex( { username : 1} )

            { "nIndexesWas" : 2 , "ok" : 1 }




Key info about MongoDB’s indexes
      • A collection may have at most 64 indexes.

      • “_id” index is automatic
                  (except capped collections before 2.2)


      • All queries can use just 1 index
                  (except $or queries).


      • The maximum index key size is 1024 bytes.



Indexes get used where you’d expect
           • db.foo.find({x : 42})
           • db.foo.find({x : {$in : [42,52]}})
           • db.foo.find({x : {$lt : 42}})
           • update, findAndModify that select on x
           • count, distinct
           • $match in aggregation
           • left-anchored regexp, e.g. /^Kev/
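
Why a left-anchored regexp can use an index: a literal prefix is equivalent to a range over the sorted keys. The sketch below illustrates that equivalence in plain JavaScript. `prefixBounds` is a hypothetical helper that assumes a non-empty prefix whose last character is not the highest code unit; it is not how the server is implemented.

```javascript
// Hypothetical helper: turn a literal prefix into half-open range bounds
// [lower, upper) by bumping the last character's code unit by one.
function prefixBounds(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

const bounds = prefixBounds("Kev"); // lower "Kev", upper "Kew"
const names = ["Karl", "Kev", "Kevin", "Kevlar", "Kew", "Steve Kev"];

// A range check on the bounds selects the same names as /^Kev/ does.
const byRange = names.filter((s) => s >= bounds.lower && s < bounds.upper);
const byRegex = names.filter((s) => /^Kev/.test(s));
console.log(bounds, byRange, byRegex);
```

An unanchored pattern such as /a/ has no such prefix, so there is no key range to narrow the scan.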




But indexes aren’t always helpful
      • Most negations: $not, $nin, $ne


      • Some corner cases: $mod, $where


      • Matching most regular expressions, e.g. /a/
          or /foo/i




Advanced Options




Arrays: the powerful “multiKey” index
           { title : "Chicken Noodle Soup",
             ingredients : ["chicken", "noodles"] }

           > db.foo.ensureIndex( { ingredients : 1 } )

           ingredients    page
           chicken        42
           ...            ...
           noodles        42
           ...            ...
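
The table above can be reproduced with a short sketch: one index entry per array element, each pointing back at the same document. `multikeyEntries` is a hypothetical helper, `_id: 42` stands in for the page number on the slide, and the second recipe is invented for illustration.

```javascript
// Hypothetical sketch: emit one index entry per array element.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) entries.push({ key: v, id: doc._id });
  }
  // Keep entries sorted by key, as an index would.
  return entries.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

const recipes = [
  { _id: 42, title: "Chicken Noodle Soup", ingredients: ["chicken", "noodles"] },
  { _id: 43, title: "Chicken Salad", ingredients: ["chicken", "lettuce"] }, // invented
];

const multikey = multikeyEntries(recipes, "ingredients");
console.log(multikey);
```

A lookup for "chicken" finds both documents through adjacent entries, without the query needing to know the field is an array.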




Unique Indexes
     • db.foo.ensureIndex( { email : 1 } , {unique : true} )



          > db.foo.insert({email : "matulef@10gen.com"})
          > db.foo.insert({email : "matulef@10gen.com"})
             E11000 duplicate key error ...




Sparse Indexes
     • db.foo.ensureIndex( { email : 1 } , {sparse : true} )


                  No index entries for docs without “email” field
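
As a rough illustration of the difference, the sketch below builds entries for a sparse and a non-sparse index over the same documents. `indexEntries` and the sample users are assumptions made for this example. The non-sparse case records a null key for a document without the field, which is how such documents are indexed.

```javascript
// Hypothetical sketch: a sparse index skips documents that lack the field,
// while a non-sparse index stores a null key for them.
function indexEntries(docs, field, sparse) {
  return docs
    .filter((d) => !sparse || field in d)
    .map((d) => ({ key: field in d ? d[field] : null, id: d._id }));
}

// Invented sample documents; the second one has no email field.
const users = [
  { _id: 1, email: "a@example.com" },
  { _id: 2 },
  { _id: 3, email: "b@example.com" },
];

const sparseEntries = indexEntries(users, "email", true);
const fullEntries = indexEntries(users, "email", false);
console.log(sparseEntries.length, fullEntries.length); // 2 3
```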




Geospatial Indexes
          { name: "10gen Office",
            lat_long: [ 52.5184, 13.387 ] }

          > db.locations.ensureIndex( { lat_long : "2d" } )

          > db.locations.find( { lat_long: {$near: [52.53, 13.4] } } )




Troubleshooting




The Query Optimizer
      • For each “type” of query, MongoDB
          periodically tries all useful indexes.

      • Aborts as soon as one plan wins.

      • Winning plan is temporarily cached.




Which plan wins? Explain!
      > db.foo.find( { t: { $lt : 40 } } ).explain( )
      {
        "cursor" : "BtreeCursor t_1" ,
        "n" : 42,
        "nscannedObjects" : 42,
        "nscanned" : 42,
         ...
        "millis" : 0,
         ...
      }

      Pay attention to the ratio n/nscanned!
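
One way to keep an eye on that ratio is a small helper over the explain() fields shown above. This is a sketch only: `scanEfficiency` is a hypothetical function, and the second explain document is invented to show what a full collection scan might look like.

```javascript
// Hypothetical helper: fraction of scanned index entries that were returned.
// A value near 1 means the index fits the query; near 0 means wasted work.
function scanEfficiency(explain) {
  if (explain.nscanned === 0) return 1; // nothing scanned, nothing wasted
  return explain.n / explain.nscanned;
}

// Values from the slide.
const good = { cursor: "BtreeCursor t_1", n: 42, nscannedObjects: 42, nscanned: 42 };
// Invented example of an unindexed query over 100,000 documents.
const poor = { cursor: "BasicCursor", n: 42, nscannedObjects: 100000, nscanned: 100000 };

console.log(scanEfficiency(good)); // 1
console.log(scanEfficiency(poor)); // 0.00042
```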




Think you know better? Give us a hint
      > db.foo.find( { t: { $lt : 40 } } ).hint( { _id : 1} )




Recording slow queries
      > db.setProfilingLevel( n , slowms )   // slowms defaults to 100 ms

      n=0 profiler off
      n=1 record queries longer than slowms
      n=2 record all queries

      > db.system.profile.find()




Operational Tips




Background index builds
          db.foo.ensureIndex( { user : 1 } , { background : true } )

          Caveats:
           • still resource-intensive
           • will build in foreground on secondaries




Minimizing impact on Replica Sets
          for (s in secondaries)
              s.restartAsStandalone()
              s.buildIndex()
              s.restartAsReplSetMember()
              s.waitForCatchup()

          p.stepDown()
          p.restartAsStandalone()
          p.buildIndex()
          p.restartAsReplSetMember()




Absent or suboptimal indexes are
     the most common avoidable
   MongoDB performance problem...

    ...so take some time and get your
               indexes right!


Thanks!

                (and thanks to Richard Kreuter for the slides)



