SlideShare a Scribd company logo
1 of 51
Download to read offline
Indexing	
  and	
  Query	
  Optimization
                                        Kevin	
  Matulef
                                   September	
  6,	
  2012




Thursday, September 6, 12
What’s in store

      • What are indexes?

      • Picking the right indexes.

      • Creating indexes in MongoDB

      • Troubleshooting




Thursday, September 6, 12
Indexes are the single biggest
         tunable performance factor
                in MongoDB.




Thursday, September 6, 12
Absent or suboptimal indexes are
      the most common avoidable
    MongoDB performance problem.




Thursday, September 6, 12
So what problem do indexes solve?




Thursday, September 6, 12
Thursday, September 6, 12
How do you find a chicken recipe?


      • An unindexed cookbook might be quite a
          page turner.

      • Probably not what you want, though.




Thursday, September 6, 12
I know, I’ll use an index!




Thursday, September 6, 12
Thursday, September 6, 12
Let’s imagine a simple index
                            ingredient             page

                            aardvark                790

                                ...                   ...

                              beef       190,	
  191,	
  205,	
  ...

                                ...                   ...

                             chicken     182,	
  199,	
  200,	
  ...	
  

                             chorizo             497,	
  ...

                                ...                   ...

                             zucchini        673,	
  986,	
  ...




Thursday, September 6, 12
How do you find a quick chicken recipe?




Thursday, September 6, 12
Let’s imagine a compound index
                            ingredient   cooking	
  time        page

                                ...            ...                 ...

                             chicken        15	
  min         182,	
  200

                             chicken        25	
  min            199

                             chicken        30	
  min      289,316,320

                             chicken        45	
  min      290,	
  291,	
  354

                                ...            ...                 ...




Thursday, September 6, 12
Consider the ordering of index keys




  Aardvark,	
  20	
  min    Chicken,	
  15	
  min                      Zuchinni,	
  45	
  min
                                   Chicken,	
  25	
  min
                                        Chicken,	
  30	
  min
                                               Chicken,	
  45	
  min




Thursday, September 6, 12
How about a low-calorie chicken recipe?




Thursday, September 6, 12
Let’s imagine a 2nd compound index

                            ingredient   calories     page


                                ...         ...         ...


                             chicken       250      199,	
  316


                             chicken       300      289,291


                             chicken       425         320


                                ...         ...         ...




Thursday, September 6, 12
How about a quick, low-calorie recipe?




Thursday, September 6, 12
Let’s imagine a last compound index
                            calories   cooking	
  time   page

                               ...           ...          ...

                              250         25	
  min      199

                              250         30	
  min      316

                              300         25	
  min      289

                              300         45	
  min      291
                              425         30	
  min      320

                               ...           ...          ...

          How do you find dishes from 250 to 300 calories that cook from
                               30 to 40 minutes?



Thursday, September 6, 12
Consider the ordering of index keys


              250	
  cal,   250	
  cal,   300	
  cal,   300	
  cal,   425	
  cal,
              25	
  min     30	
  min     25	
  min     45	
  min     30	
  min




          How do you find dishes from 250 to 300 calories that cook from
                               30 to 40 minutes?

                  4 index entries will be scanned, but only 1 will match!




Thursday, September 6, 12
Range queries using an index on A, B
      • A is a range J

      • A is constant, B is a range J

      • A is constant, order by B J

      • A is range, B is constant/range   K

      • B is constant/range, A unspecified L


Thursday, September 6, 12
It’s really that straightforward.




Thursday, September 6, 12
B-Trees                (Bayer & McCreight ’72)




Thursday, September 6, 12
B-Trees                (Bayer & McCreight ’72)




                                              13




Thursday, September 6, 12
B-Trees                (Bayer & McCreight ’72)




                                                 13

            Queries,	
  Inserts,	
  Deletes:	
  O(log	
  n)


Thursday, September 6, 12
All this is relevant to MongoDB.
      • MongoDB’s indexes are B-Trees, which are
          designed for range queries.

      • Generally, the best index for your queries is
          going to be a compound index.

      • Every additional index slows down inserts &
          removes, and may slow updates.




Thursday, September 6, 12
On to MongoDB!




Thursday, September 6, 12
Declaring Indexes
      • db.foo.ensureIndex( { username : 1 } )




Thursday, September 6, 12
Declaring Indexes
      • db.foo.ensureIndex( { username : 1 } )


      • db.foo.ensureIndex( { username : 1, created_at : -1 } )




Thursday, September 6, 12
And managing them....
          > db.system.indexes.find() //db.foo.getIndexes()

           { "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" }
           { "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" }




Thursday, September 6, 12
And managing them....
          > db.system.indexes.find() //db.foo.getIndexes()

           { "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" }
           { "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" }



          > db.foo.dropIndex( { username : 1} )

            { "nIndexesWas" : 2 , "ok" : 1 }




Thursday, September 6, 12
Key info about MongoDB’s indexes
      • A collection may have at most 64 indexes.




Thursday, September 6, 12
Key info about MongoDB’s indexes
      • A collection may have at most 64 indexes.

      • “_id” index is automatic
                  (except capped collections before 2.2)




Thursday, September 6, 12
Key info about MongoDB’s indexes
      • A collection may have at most 64 indexes.

      • “_id” index is automatic
                  (except capped collections before 2.2)


      • All queries can use just 1 index
                  (except $or queries).




Thursday, September 6, 12
Key info about MongoDB’s indexes
      • A collection may have at most 64 indexes.

      • “_id” index is automatic
                  (except capped collections before 2.2)


      • All queries can use just 1 index
                  (except $or queries).


      • The maximum index key size is 1024 bytes.



Thursday, September 6, 12
Indexes get used where you’d expect
           • db.foo.find({x : 42})
           • db.foo.find({x : {$in : [42,52]}})
           • db.foo.find({x : {$lt : 42})
           • update, findAndModify that select on x,
           • count, distinct,
           • $match in aggregation
           • left-anchored regexp, e.g. /^Kev/




Thursday, September 6, 12
But indexes aren’t always helpful
      • Most negations: $not, $nin, $ne


      • Some corner cases: $mod, $where


      • Matching most regular expressions, e.g. /a/
          or /foo/i




Thursday, September 6, 12
Advanced Options




Thursday, September 6, 12
Arrays: the powerful “multiKey” index
           { title : “Chicken Noodle Soup”,
             ingredients : [“chicken”, “noodles”] }

           >	
  db.foo.ensureIndex(	
  {	
  ingredients	
  :	
  1	
  }	
  )

                             ingredients                        page
                                chicken                          42
                                   ...                            ...
                               noodles                           42
                                   ...                            ...




Thursday, September 6, 12
Unique Indexes
     • db.foo.ensureIndex( { email : 1 } , {unique : true} )



          > db.foo.insert({email : “matulef@10gen.com”})
          > db.foo.insert({email : “matulef@10gen.com”})
             E11000 duplicate key error ...




Thursday, September 6, 12
Sparse Indexes
     • db.foo.ensureIndex( { email : 1 } , {sparse : true} )


                  No index entries for docs without “email” field




Thursday, September 6, 12
Geospatial Indexes
          { name: "10gen Office",
            lat_long: [ 52.5184, 13.387 ] }

          > db.foo.ensureIndex( { lat_long : “2d” } )

          > db.locations.find( { lat_long: {$near: [52.53, 13.4] } } )




Thursday, September 6, 12
Troubleshooting




Thursday, September 6, 12
The Query Optimizer
      • For each “type” of query, mongoDB
          periodically tries all useful indexes.

      • Aborts as soon as one plan wins.

      • Winning plan is temporarily cached.




Thursday, September 6, 12
Which plan wins? Explain!
      > db.foo.find( { t: { $lt : 40 } } ).explain( )
      {
        "cursor" : "BtreeCursor t_1" ,
        "n" : 42,
        “nscannedObjects: 42
        "nscanned" : 42,
         ...
        "millis" : 0,
         ...
      }




Thursday, September 6, 12
Which plan wins? Explain!
      > db.foo.find( { t: { $lt : 40 } } ).explain( )
      {
        "cursor" : "BtreeCursor t_1" ,
        "n" : 42,                          Pay attention to the
        “nscannedObjects: 42
        "nscanned" : 42,                    ratio	
  n/nscanned!
         ...
        "millis" : 0,
         ...
      }




Thursday, September 6, 12
Think you know better? Give us a hint
      > db.foo.find( { t: { $lt : 40 } } ).hint( { _id : 1} )




Thursday, September 6, 12
Recording slow queries
      > db.setProfilingLevel( n , slowms=100ms )

      n=0 profiler off
      n=1 record queries longer than slowms
      n=2 record all queries

      > db.system.profile.find()




Thursday, September 6, 12
Operational Tips




Thursday, September 6, 12
Background index builds
          db.foo.ensureIndex( { user : 1 } , { background : true } )

          Caveats:
           • still resource-intensive
           • will build in foreground on secondaries




Thursday, September 6, 12
Minimizing impact on Replica Sets
          for (s in secondaries)
              s.restartAsStandalone()
              s.buildIndex()
              s.restartAsReplSetMember()
              s.waitForCatchup()

          p.stepDown()
          p.restartAsStandalone()
          p.buildIndex()
          p.restartAsReplSetMember()




Thursday, September 6, 12
Absent or suboptimal indexes are
     the most common avoidable
   MongoDB performance problem...

    ...so take some time and get your
               indexes right!


Thursday, September 6, 12
Thanks!

                (and thanks to Richard Kreuter for the slides)




Thursday, September 6, 12

More Related Content

More from MongoDB

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump StartMongoDB
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Indexing and Query Optimization Webinar

  • 1. Indexing  and  Query  Optimization Kevin  Matulef September  6,  2012 Thursday, September 6, 12
  • 2. What’s in store • What are indexes? • Picking the right indexes. • Creating indexes in MongoDB • Troubleshooting Thursday, September 6, 12
  • 3. Indexes are the single biggest tunable performance factor in MongoDB. Thursday, September 6, 12
  • 4. Absent or suboptimal indexes are the most common avoidable MongoDB performance problem. Thursday, September 6, 12
  • 5. So what problem do indexes solve? Thursday, September 6, 12
  • 7. How do you find a chicken recipe? • An unindexed cookbook might be quite a page turner. • Probably not what you want, though. Thursday, September 6, 12
  • 8. I know, I’ll use an index! Thursday, September 6, 12
  • 10. Let’s imagine a simple index ingredient page aardvark 790 ... ... beef 190,  191,  205,  ... ... ... chicken 182,  199,  200,  ...   chorizo 497,  ... ... ... zucchini 673,  986,  ... Thursday, September 6, 12
  • 11. How do you find a quick chicken recipe? Thursday, September 6, 12
  • 12. Let’s imagine a compound index ingredient cooking  time page ... ... ... chicken 15  min 182,  200 chicken 25  min 199 chicken 30  min 289,316,320 chicken 45  min 290,  291,  354 ... ... ... Thursday, September 6, 12
  • 13. Consider the ordering of index keys Aardvark,  20  min Chicken,  15  min Zuchinni,  45  min Chicken,  25  min Chicken,  30  min Chicken,  45  min Thursday, September 6, 12
  • 14. How about a low-calorie chicken recipe? Thursday, September 6, 12
  • 15. Let’s imagine a 2nd compound index ingredient calories page ... ... ... chicken 250 199,  316 chicken 300 289,291 chicken 425 320 ... ... ... Thursday, September 6, 12
  • 16. How about a quick, low-calorie recipe? Thursday, September 6, 12
  • 17. Let’s imagine a last compound index calories cooking  time page ... ... ... 250 25  min 199 250 30  min 316 300 25  min 289 300 45  min 291 425 30  min 320 ... ... ... How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes? Thursday, September 6, 12
  • 18. Consider the ordering of index keys 250  cal, 250  cal, 300  cal, 300  cal, 425  cal, 25  min 30  min 25  min 45  min 30  min How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes? 4 index entries will be scanned, but only 1 will match! Thursday, September 6, 12
  • 19. Range queries using an index on A, B • A is a range J • A is constant, B is a range J • A is constant, order by B J • A is range, B is constant/range K • B is constant/range, A unspecified L Thursday, September 6, 12
  • 20. It’s really that straightforward. Thursday, September 6, 12
  • 21. B-Trees (Bayer & McCreight ’72) Thursday, September 6, 12
  • 22. B-Trees (Bayer & McCreight ’72) 13 Thursday, September 6, 12
  • 23. B-Trees (Bayer & McCreight ’72) 13 Queries,  Inserts,  Deletes:  O(log  n) Thursday, September 6, 12
  • 24. All this is relevant to MongoDB. • MongoDB’s indexes are B-Trees, which are designed for range queries. • Generally, the best index for your queries is going to be a compound index. • Every additional index slows down inserts & removes, and may slow updates. Thursday, September 6, 12
  • 25. On to MongoDB! Thursday, September 6, 12
  • 26. Declaring Indexes • db.foo.ensureIndex( { username : 1 } ) Thursday, September 6, 12
  • 27. Declaring Indexes • db.foo.ensureIndex( { username : 1 } ) • db.foo.ensureIndex( { username : 1, created_at : -1 } ) Thursday, September 6, 12
  • 28. And managing them.... > db.system.indexes.find() //db.foo.getIndexes() { "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" } { "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" } Thursday, September 6, 12
  • 29. And managing them.... > db.system.indexes.find() //db.foo.getIndexes() { "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" } { "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" } > db.foo.dropIndex( { username : 1} ) { "nIndexesWas" : 2 , "ok" : 1 } Thursday, September 6, 12
  • 30. Key info about MongoDB’s indexes • A collection may have at most 64 indexes. Thursday, September 6, 12
  • 31. Key info about MongoDB’s indexes • A collection may have at most 64 indexes. • “_id” index is automatic (except capped collections before 2.2) Thursday, September 6, 12
  • 32. Key info about MongoDB’s indexes • A collection may have at most 64 indexes. • “_id” index is automatic (except capped collections before 2.2) • All queries can use just 1 index (except $or queries). Thursday, September 6, 12
  • 33. Key info about MongoDB’s indexes • A collection may have at most 64 indexes. • “_id” index is automatic (except capped collections before 2.2) • All queries can use just 1 index (except $or queries). • The maximum index key size is 1024 bytes. Thursday, September 6, 12
  • 34. Indexes get used where you’d expect • db.foo.find({x : 42}) • db.foo.find({x : {$in : [42,52]}}) • db.foo.find({x : {$lt : 42}) • update, findAndModify that select on x, • count, distinct, • $match in aggregation • left-anchored regexp, e.g. /^Kev/ Thursday, September 6, 12
  • 35. But indexes aren’t always helpful • Most negations: $not, $nin, $ne • Some corner cases: $mod, $where • Matching most regular expressions, e.g. /a/ or /foo/i Thursday, September 6, 12
  • 37. Arrays: the powerful “multiKey” index { title : “Chicken Noodle Soup”, ingredients : [“chicken”, “noodles”] } >  db.foo.ensureIndex(  {  ingredients  :  1  }  ) ingredients page chicken 42 ... ... noodles 42 ... ... Thursday, September 6, 12
  • 38. Unique Indexes • db.foo.ensureIndex( { email : 1 } , {unique : true} ) > db.foo.insert({email : “matulef@10gen.com”}) > db.foo.insert({email : “matulef@10gen.com”}) E11000 duplicate key error ... Thursday, September 6, 12
  • 39. Sparse Indexes • db.foo.ensureIndex( { email : 1 } , {sparse : true} ) No index entries for docs without “email” field Thursday, September 6, 12
  • 40. Geospatial Indexes { name: "10gen Office", lat_long: [ 52.5184, 13.387 ] } > db.foo.ensureIndex( { lat_long : “2d” } ) > db.locations.find( { lat_long: {$near: [52.53, 13.4] } } ) Thursday, September 6, 12
  • 42. The Query Optimizer • For each “type” of query, mongoDB periodically tries all useful indexes. • Aborts as soon as one plan wins. • Winning plan is temporarily cached. Thursday, September 6, 12
  • 43. Which plan wins? Explain! > db.foo.find( { t: { $lt : 40 } } ).explain( ) { "cursor" : "BtreeCursor t_1" , "n" : 42, “nscannedObjects: 42 "nscanned" : 42, ... "millis" : 0, ... } Thursday, September 6, 12
  • 44. Which plan wins? Explain! > db.foo.find( { t: { $lt : 40 } } ).explain( ) { "cursor" : "BtreeCursor t_1" , "n" : 42, Pay attention to the “nscannedObjects: 42 "nscanned" : 42, ratio  n/nscanned! ... "millis" : 0, ... } Thursday, September 6, 12
  • 45. Think you know better? Give us a hint > db.foo.find( { t: { $lt : 40 } } ).hint( { _id : 1} ) Thursday, September 6, 12
  • 46. Recording slow queries > db.setProfilingLevel( n , slowms=100ms ) n=0 profiler off n=1 record queries longer than slowms n=2 record all queries > db.system.profile.find() Thursday, September 6, 12
  • 48. Background index builds db.foo.ensureIndex( { user : 1 } , { background : true } ) Caveats: • still resource-intensive • will build in foreground on secondaries Thursday, September 6, 12
  • 49. Minimizing impact on Replica Sets for (s in secondaries) s.restartAsStandalone() s.buildIndex() s.restartAsReplSetMember() s.waitForCatchup() p.stepDown() p.restartAsStandalone() p.buildIndex() p.restartAsReplSetMember() Thursday, September 6, 12
  • 50. Absent or suboptimal indexes are the most common avoidable MongoDB performance problem... ...so take some time and get your indexes right! Thursday, September 6, 12
  • 51. Thanks! (and thanks to Richard Kreuter for the slides) Thursday, September 6, 12