Indexing and Query Optimization
Kevin Matulef
September 6, 2012




Thursday, September 6, 12
What’s in store

      • What are indexes?

      • Picking the right indexes.

      • Creating indexes in MongoDB

      • Troubleshooting




Indexes are the single biggest
         tunable performance factor
                in MongoDB.




Absent or suboptimal indexes are
      the most common avoidable
    MongoDB performance problem.




So what problem do indexes solve?




How do you find a chicken recipe?


      • An unindexed cookbook might be quite a
          page turner.

      • Probably not what you want, though.




I know, I’ll use an index!




Let’s imagine a simple index
      ingredient    page

      aardvark      790
      ...           ...
      beef          190, 191, 205, ...
      ...           ...
      chicken       182, 199, 200, ...
      chorizo       497, ...
      ...           ...
      zucchini      673, 986, ...




How do you find a quick chicken recipe?




Let’s imagine a compound index
      ingredient    cooking time    page

      ...           ...             ...
      chicken       15 min          182, 200
      chicken       25 min          199
      chicken       30 min          289, 316, 320
      chicken       45 min          290, 291, 354
      ...           ...             ...




Consider the ordering of index keys




      Aardvark, 20 min
      ...
      Chicken, 15 min
      Chicken, 25 min
      Chicken, 30 min
      Chicken, 45 min
      ...
      Zucchini, 45 min
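A minimal sketch of that key ordering in plain JavaScript, not MongoDB code. The entries are the (ingredient, cooking time) pairs from the slide; the `compareKeys` helper and the 25-minute cutoff for "quick" are assumptions made for illustration.

```javascript
// Compound keys from the slide as [ingredient, cookingMinutes], deliberately unsorted.
const entries = [
  ["chicken", 45],
  ["zucchini", 45],
  ["chicken", 15],
  ["aardvark", 20],
  ["chicken", 30],
  ["chicken", 25],
];

// Compare compound keys field by field: ingredient first, cooking time second.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

const sorted = [...entries].sort(compareKeys);

// All "chicken" keys are now adjacent and ordered by cooking time, so the
// quick chicken recipes form one contiguous run. The 25-minute cutoff is
// an arbitrary choice for this example.
const quickChicken = sorted.filter(([ing, min]) => ing === "chicken" && min <= 25);

console.log(sorted.map(([ing, min]) => `${ing}, ${min} min`));
console.log(quickChicken);
```

The point of the ordering is that a query fixing the first field and bounding the second touches only neighbouring entries.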




How about a low-calorie chicken recipe?




Let’s imagine a 2nd compound index

      ingredient    calories    page

      ...           ...         ...
      chicken       250         199, 316
      chicken       300         289, 291
      chicken       425         320
      ...           ...         ...




How about a quick, low-calorie recipe?




Let’s imagine a last compound index
      calories    cooking time    page

      ...         ...             ...
      250         25 min          199
      250         30 min          316
      300         25 min          289
      300         45 min          291
      425         30 min          320
      ...         ...             ...

          How do you find dishes from 250 to 300 calories that cook from
                               30 to 40 minutes?



Consider the ordering of index keys


      250 cal,    250 cal,    300 cal,    300 cal,    425 cal,
      25 min      30 min      25 min      45 min      30 min




          How do you find dishes from 250 to 300 calories that cook from
                               30 to 40 minutes?

                  4 index entries will be scanned, but only 1 will match!
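The count above can be checked with a small simulation. This is a sketch in plain JavaScript, not how MongoDB implements a scan: the `rangeScan` function is a hypothetical helper, and the five entries are the ones shown on the slide.

```javascript
// Index entries from the slide, already in (calories, cooking time) order.
const index = [
  { calories: 250, minutes: 25, page: 199 },
  { calories: 250, minutes: 30, page: 316 },
  { calories: 300, minutes: 25, page: 289 },
  { calories: 300, minutes: 45, page: 291 },
  { calories: 425, minutes: 30, page: 320 },
];

// Walk the entries whose leading field (calories) is in range and test the
// trailing field (minutes) on each one. A real index would seek straight to
// the first in-range entry instead of stepping over earlier ones.
function rangeScan(index, calMin, calMax, minMin, minMax) {
  let scanned = 0;
  const pages = [];
  for (const e of index) {
    if (e.calories < calMin) continue; // before the calorie range
    if (e.calories > calMax) break;    // past the calorie range: stop
    scanned++;
    if (e.minutes >= minMin && e.minutes <= minMax) pages.push(e.page);
  }
  return { scanned, pages };
}

const result = rangeScan(index, 250, 300, 30, 40);
console.log(result); // { scanned: 4, pages: [ 316 ] }
```

Because cooking times are only sorted within each calorie value, the range on the second field cannot narrow the scan.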




Range queries using an index on A, B
      • A is a range :)

      • A is constant, B is a range :)

      • A is constant, order by B :)

      • A is range, B is constant/range :|

      • B is constant/range, A unspecified :(
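One consequence of these rules is that field order matters: the same query can be the "range on A, constant B" case on one index and the "constant A, range on B" case on the reversed index. The sketch below counts touched entries under a deliberately simplified model. The `scan` helper, the field names `a` and `b`, and the 10 x 10 data set are all assumptions for illustration.

```javascript
// Hypothetical data: every combination of a in 0..9 and b in 0..9 (100 docs).
const docs = [];
for (let a = 0; a < 10; a++) for (let b = 0; b < 10; b++) docs.push({ a, b });

const byKey = (x, y) => x[0] - y[0] || x[1] - y[1];
const indexAB = docs.map((d) => [d.a, d.b]).sort(byKey); // index on (a, b)
const indexBA = docs.map((d) => [d.b, d.a]).sort(byKey); // index on (b, a)

// Simplified cost model: the leading field's bounds always limit the scan.
// The trailing field's bounds limit it only when the leading field is a
// single value, because only then are the entries sorted by the trailing field.
function scan(index, first, second) {
  const point = first[0] === first[1];
  let scanned = 0;
  let matched = 0;
  for (const [k1, k2] of index) {
    if (k1 < first[0] || k1 > first[1]) continue;
    const inSecond = k2 >= second[0] && k2 <= second[1];
    if (point && !inSecond) continue; // skipped by seeking, never touched
    scanned++;
    if (inSecond) matched++;
  }
  return { scanned, matched };
}

// Query: a between 2 and 7, b equal to 5.
const rangeFirst = scan(indexAB, [2, 7], [5, 5]); // range on leading field
const constFirst = scan(indexBA, [5, 5], [2, 7]); // constant on leading field
console.log(rangeFirst, constFirst);
```

Under this model both indexes return the same 6 matches, but the (a, b) index touches 60 entries while the (b, a) index touches only the 6 that match.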


It’s really that straightforward.




B-Trees                (Bayer & McCreight ’72)

            Queries, Inserts, Deletes: O(log n)
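To get a feel for what O(log n) means, here is a binary search over a sorted array, used as a stand-in for a B-tree lookup. This is not a B-tree implementation (a B-tree stores many keys per node and also keeps inserts and deletes logarithmic); the `search` function and the million-key array are assumptions for illustration.

```javascript
// Binary search that also counts how many keys it had to inspect.
function search(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  let steps = 0;
  while (lo <= hi) {
    steps++;
    const mid = (lo + hi) >> 1;
    if (sortedKeys[mid] === target) return { found: true, steps };
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: false, steps };
}

// One million sorted keys: 0, 1, 2, ...
const keys = Array.from({ length: 1000000 }, (_, i) => i);
const hit = search(keys, 13);
console.log(`${hit.steps} steps for ${keys.length} keys`);
```

A lookup among a million keys needs at most about 20 inspections, compared with up to a million for an unindexed scan.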


All this is relevant to MongoDB.
      • MongoDB’s indexes are B-Trees, which are
          designed for range queries.

      • Generally, the best index for your queries is
          going to be a compound index.

      • Every additional index slows down inserts &
          removes, and may slow updates.




On to MongoDB!




Declaring Indexes
      • db.foo.ensureIndex( { username : 1 } )


      • db.foo.ensureIndex( { username : 1, created_at : -1 } )




And managing them....
          > db.system.indexes.find() //db.foo.getIndexes()

           { "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" }
           { "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" }



          > db.foo.dropIndex( { username : 1} )

            { "nIndexesWas" : 2 , "ok" : 1 }




Key info about MongoDB’s indexes
      • A collection may have at most 64 indexes.

      • “_id” index is automatic
                  (except capped collections before 2.2)


      • Each query can use at most 1 index
                  (except $or queries).


      • The maximum index key size is 1024 bytes.



Indexes get used where you’d expect
           • db.foo.find({x : 42})
           • db.foo.find({x : {$in : [42,52]}})
           • db.foo.find({x : {$lt : 42}})
           • update, findAndModify that select on x,
           • count, distinct,
           • $match in aggregation
           • left-anchored regexp, e.g. /^Kev/
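Why a left-anchored pattern can use an index: every string starting with a given prefix falls in one contiguous range of a sorted key list. The sketch below shows that equivalence in plain JavaScript. The usernames and the `prefixBounds` helper are assumptions, and the helper only handles simple ASCII prefixes.

```javascript
// Hypothetical sorted index keys.
const names = ["Alice", "Kevin", "Kevlar", "Kevyn", "Kim", "Zed"].sort();

// For the prefix "Kev", matching keys k satisfy "Kev" <= k < "Kew":
// the upper bound is the prefix with its last character bumped by one.
function prefixBounds(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, upper];
}

const [lo, hi] = prefixBounds("Kev");
const viaRange = names.filter((k) => k >= lo && k < hi);
const viaRegex = names.filter((k) => /^Kev/.test(k));

console.log(viaRange); // [ 'Kevin', 'Kevlar', 'Kevyn' ]
console.log(viaRegex); // same keys
```

An unanchored pattern such as /a/ has no such bounds, since matching keys can sit anywhere in the sorted order.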




But indexes aren’t always helpful
      • Most negations: $not, $nin, $ne


      • Some corner cases: $mod, $where


      • Matching most regular expressions, e.g. /a/
          or /foo/i




Advanced Options




Arrays: the powerful “multiKey” index
           { title : "Chicken Noodle Soup",
             ingredients : ["chicken", "noodles"] }

           > db.foo.ensureIndex( { ingredients : 1 } )

                             ingredients                        page
                                chicken                          42
                                   ...                            ...
                               noodles                           42
                                   ...                            ...
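A rough model of what the multikey index above contains, in plain JavaScript rather than MongoDB internals: one entry per array element, each pointing back at the same document. The `buildMultikeyIndex` helper and the second recipe are assumptions; "page" stands in for the document's location, as in the cookbook analogy.

```javascript
const recipes = [
  { page: 42, title: "Chicken Noodle Soup", ingredients: ["chicken", "noodles"] },
  { page: 43, title: "Chicken Salad", ingredients: ["chicken", "lettuce"] }, // hypothetical
];

// Emit one index entry per array element (or one entry for a scalar value),
// then keep the entries sorted by key.
function buildMultikeyIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) entries.push({ key, page: doc.page });
  }
  return entries.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.page - b.page
  );
}

const ingredientIndex = buildMultikeyIndex(recipes, "ingredients");
console.log(ingredientIndex);
```

A lookup for "chicken" then finds both pages, even though neither document has "chicken" as its whole `ingredients` value.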




Unique Indexes
     • db.foo.ensureIndex( { email : 1 } , {unique : true} )



          > db.foo.insert({email : "matulef@10gen.com"})
          > db.foo.insert({email : "matulef@10gen.com"})
             E11000 duplicate key error ...
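The behaviour above can be pictured as a set of keys that rejects repeats. This is a toy model in plain JavaScript, not MongoDB's implementation; the `UniqueIndex` class is hypothetical and the error text after "E11000 duplicate key error" is made up.

```javascript
// Toy unique index: remembers every key seen and refuses a second copy.
class UniqueIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Set();
  }

  insert(doc) {
    const key = doc[this.field];
    if (this.keys.has(key)) {
      throw new Error(`E11000 duplicate key error: ${this.field} = ${key}`);
    }
    this.keys.add(key);
  }
}

const emailIndex = new UniqueIndex("email");
emailIndex.insert({ email: "matulef@10gen.com" }); // accepted

let rejected = false;
try {
  emailIndex.insert({ email: "matulef@10gen.com" }); // same key again
} catch (err) {
  rejected = true;
  console.log(err.message);
}
```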




Sparse Indexes
     • db.foo.ensureIndex( { email : 1 } , {sparse : true} )


                  No index entries for docs without “email” field
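A small sketch of the difference, in plain JavaScript: a regular index records a null key for documents that lack the field, while a sparse index leaves them out. The `buildIndex` helper and the sample documents are assumptions for illustration.

```javascript
// Hypothetical documents; the second one has no "email" field.
const users = [
  { _id: 1, email: "a@example.com" },
  { _id: 2 },
  { _id: 3, email: "c@example.com" },
];

function buildIndex(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (sparse && !hasField) continue; // sparse: no entry for this document
    entries.push({ key: hasField ? doc[field] : null, id: doc._id });
  }
  return entries;
}

const regular = buildIndex(users, "email");
const sparse = buildIndex(users, "email", { sparse: true });
console.log(regular.length, sparse.length); // 3 2
```

The sparse index is smaller, but a scan over it never sees document 2.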




Geospatial Indexes
          { name: "10gen Office",
            lat_long: [ 52.5184, 13.387 ] }

          > db.foo.ensureIndex( { lat_long : "2d" } )

          > db.foo.find( { lat_long: {$near: [52.53, 13.4] } } )




Troubleshooting




The Query Optimizer
      • For each “type” of query, MongoDB
          periodically tries all useful indexes.

      • Aborts as soon as one plan wins.

      • Winning plan is temporarily cached.
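The three bullets can be pictured as a race with a cache in front of it. The sketch below is a loose model in plain JavaScript, not the optimizer's actual algorithm: `choosePlan`, the `workUnits` cost, and the query-type string are all assumptions, and the cache here never expires, whereas the real one is temporary.

```javascript
const planCache = new Map(); // query "type" -> name of the winning index

// candidates: [{ index: "t_1", workUnits: 42 }, ...]. workUnits is a made-up
// stand-in for how much scanning a plan needs before it finishes.
function choosePlan(queryType, candidates) {
  if (planCache.has(queryType)) return planCache.get(queryType);
  if (candidates.length === 0) throw new Error("no candidate plans");
  // Advance every candidate one unit at a time. The first plan to finish
  // wins, and the others are abandoned at that point.
  for (let step = 1; ; step++) {
    const done = candidates.find((c) => c.workUnits <= step);
    if (done) {
      planCache.set(queryType, done.index);
      return done.index;
    }
  }
}

const first = choosePlan("t: $lt", [
  { index: "_id_", workUnits: 1000 },
  { index: "t_1", workUnits: 42 },
]);
// The same query type again: answered from the cache, no race is run.
const second = choosePlan("t: $lt", [{ index: "_id_", workUnits: 1 }]);
console.log(first, second); // t_1 t_1
```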




Which plan wins? Explain!
      > db.foo.find( { t: { $lt : 40 } } ).explain( )
      {
        "cursor" : "BtreeCursor t_1" ,
        "n" : 42,
        "nscannedObjects" : 42,
        "nscanned" : 42,
         ...
        "millis" : 0,
         ...
      }

      Pay attention to the ratio n/nscanned!
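A tiny helper for reading that ratio, written as plain JavaScript over an object shaped like the explain output above. The `scanEfficiency` name is hypothetical, and returning 1 when nothing was scanned is an arbitrary choice for this sketch.

```javascript
// n: documents returned. nscanned: index entries examined.
// A ratio near 1 means almost every entry scanned was a match.
function scanEfficiency(explain) {
  if (explain.nscanned === 0) return 1; // nothing scanned, so nothing wasted
  return explain.n / explain.nscanned;
}

const fromSlide = scanEfficiency({ n: 42, nscanned: 42 });
const wasteful = scanEfficiency({ n: 1, nscanned: 4 });
console.log(fromSlide, wasteful); // 1 0.25
```

The second call uses the numbers from the calorie and cooking-time example: 4 entries scanned for 1 match.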




Think you know better? Give us a hint
      > db.foo.find( { t: { $lt : 40 } } ).hint( { _id : 1} )




Recording slow queries
      > db.setProfilingLevel( n , slowms )   // slowms defaults to 100 ms

      n=0 profiler off
      n=1 record queries longer than slowms
      n=2 record all queries

      > db.system.profile.find()




Operational Tips




Background index builds
          db.foo.ensureIndex( { user : 1 } , { background : true } )

          Caveats:
           • still resource-intensive
           • will build in foreground on secondaries




Minimizing impact on Replica Sets
          for (s in secondaries)
              s.restartAsStandalone()
              s.buildIndex()
              s.restartAsReplSetMember()
              s.waitForCatchup()

          p.stepDown()
          p.restartAsStandalone()
          p.buildIndex()
          p.restartAsReplSetMember()




Absent or suboptimal indexes are
     the most common avoidable
   MongoDB performance problem...

    ...so take some time and get your
               indexes right!


Thanks!

                (and thanks to Richard Kreuter for the slides)




Indexing and Query Optimization Webinar

  • 1. Indexing  and  Query  Optimization Kevin  Matulef September  6,  2012 Thursday, September 6, 12
  • 2. What’s in store
      • What are indexes?
      • Picking the right indexes
      • Creating indexes in MongoDB
      • Troubleshooting
  • 3. Indexes are the single biggest tunable performance factor in MongoDB.
  • 4. Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
  • 5. So what problem do indexes solve?
  • 7. How do you find a chicken recipe?
      • An unindexed cookbook might be quite a page turner.
      • Probably not what you want, though.
  • 8. I know, I’ll use an index!
  • 10. Let’s imagine a simple index
      ingredient   page
      aardvark     790
      ...          ...
      beef         190, 191, 205, ...
      ...          ...
      chicken      182, 199, 200, ...
      chorizo      497, ...
      ...          ...
      zucchini     673, 986, ...
  • 11. How do you find a quick chicken recipe?
  • 12. Let’s imagine a compound index
      ingredient   cooking time   page
      ...          ...            ...
      chicken      15 min         182, 200
      chicken      25 min         199
      chicken      30 min         289, 316, 320
      chicken      45 min         290, 291, 354
      ...          ...            ...
  • 13. Consider the ordering of index keys: entries sort by ingredient first, then by cooking time within each ingredient.
      Aardvark, 20 min
      Chicken, 15 min
      Chicken, 25 min
      Chicken, 30 min
      Chicken, 45 min
      Zucchini, 45 min
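The key ordering on slide 13 can be sketched in plain JavaScript. This is only a toy model of a compound index, assuming each entry is an `[ingredient, cookingMinutes]` pair; the `entries` array and `compareKeys` function are illustrative names, not part of MongoDB.

```javascript
// Toy model: each index entry is [ingredient, cookingMinutes], in arbitrary order.
const entries = [
  ["Chicken", 30], ["Zucchini", 45], ["Chicken", 15],
  ["Aardvark", 20], ["Chicken", 45], ["Chicken", 25],
];

// Compare on the first key; fall back to the second key only on ties.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  return a[1] - b[1];
}

// Sort a copy so the original insertion order is left untouched.
const ordered = [...entries].sort(compareKeys);
console.log(ordered.map(([ing, min]) => `${ing}, ${min} min`).join("\n"));
```

Because all "Chicken" entries end up adjacent and sorted by time, a query for quick chicken recipes touches one contiguous run of keys.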
  • 14. How about a low-calorie chicken recipe?
  • 15. Let’s imagine a 2nd compound index
      ingredient   calories   page
      ...          ...        ...
      chicken      250        199, 316
      chicken      300        289, 291
      chicken      425        320
      ...          ...        ...
  • 16. How about a quick, low-calorie recipe?
  • 17. Let’s imagine a last compound index
      calories   cooking time   page
      ...        ...            ...
      250        25 min         199
      250        30 min         316
      300        25 min         289
      300        45 min         291
      425        30 min         320
      ...        ...            ...
      How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes?
  • 18. Consider the ordering of index keys
      250 cal, 25 min
      250 cal, 30 min
      300 cal, 25 min
      300 cal, 45 min
      425 cal, 30 min
      How do you find dishes from 250 to 300 calories that cook from 30 to 40 minutes? 4 index entries will be scanned, but only 1 will match!
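The scan count on slide 18 can be reproduced with a small simulation. This is a sketch under the assumption that the scan walks every entry inside the range of the leading key (calories) and filters on the second key as it goes; `rangeScan` and the `index` array are hypothetical names, and a real B-tree would seek to the start of the range rather than loop from the beginning.

```javascript
// Toy index on (calories, minutes), already in index order, using the slide's data.
const index = [
  { calories: 250, minutes: 25, page: 199 },
  { calories: 250, minutes: 30, page: 316 },
  { calories: 300, minutes: 25, page: 289 },
  { calories: 300, minutes: 45, page: 291 },
  { calories: 425, minutes: 30, page: 320 },
];

// Walk the entries whose leading key is in calRange; test the second key on each.
function rangeScan(index, calRange, minRange) {
  let scanned = 0;
  const matches = [];
  for (const e of index) {
    if (e.calories < calRange[0]) continue; // before the range (a B-tree would seek past these)
    if (e.calories > calRange[1]) break;    // past the range: stop scanning
    scanned++;
    if (e.minutes >= minRange[0] && e.minutes <= minRange[1]) matches.push(e.page);
  }
  return { scanned, matches };
}

const result = rangeScan(index, [250, 300], [30, 40]);
console.log(result); // { scanned: 4, matches: [ 316 ] }
```

The gap between `scanned` and the number of matches is the cost of putting a range on the leading key: the entries that satisfy the cooking-time condition are not contiguous in the index.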
  • 19. Range queries using an index on A, B
      • A is a range ☺
      • A is constant, B is a range ☺
      • A is constant, order by B ☺
      • A is range, B is constant/range 😐
      • B is constant/range, A unspecified ☹
  • 20. It’s really that straightforward.
  • 21-23. B-Trees (Bayer & McCreight ’72)
      Queries, Inserts, Deletes: O(log n)
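To give a feel for the O(log n) bound, here is a rough sketch that counts the comparisons a binary search needs on a sorted array. A sorted array is only a stand-in for a B-tree (it shares the logarithmic lookup cost, not the cheap inserts and deletes), and `searchSteps` is an illustrative name.

```javascript
// Count the comparisons needed to find a key in a sorted array of keys.
function searchSteps(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  let steps = 0;
  while (lo <= hi) {
    steps++;
    const mid = (lo + hi) >>> 1; // midpoint of the remaining range
    if (sortedKeys[mid] === target) return { found: true, steps };
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: false, steps };
}

const n = 1000000;
const keys = Array.from({ length: n }, (_, i) => i); // 0, 1, ..., n-1
const lookup = searchSteps(keys, 765432);
console.log(lookup.found, lookup.steps, "bound:", Math.ceil(Math.log2(n + 1)));
```

With a million keys the search never needs more than about 20 comparisons, whereas an unindexed scan may have to look at all of them.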
  • 24. All this is relevant to MongoDB.
      • MongoDB’s indexes are B-Trees, which are designed for range queries.
      • Generally, the best index for your queries is going to be a compound index.
      • Every additional index slows down inserts & removes, and may slow updates.
  • 25. On to MongoDB!
  • 26-27. Declaring Indexes
      • db.foo.ensureIndex( { username : 1 } )
      • db.foo.ensureIndex( { username : 1, created_at : -1 } )
  • 28-29. And managing them....
      > db.system.indexes.find() //db.foo.getIndexes()
      { "v" : 1, "key" : { "_id" : 1 }, "ns" : "test.foo", "name" : "_id_" }
      { "v" : 1, "key" : { "username" : 1 }, "ns" : "test.foo", "name" : "username_1" }
      > db.foo.dropIndex( { username : 1} )
      { "nIndexesWas" : 2 , "ok" : 1 }
  • 30-33. Key info about MongoDB’s indexes
      • A collection may have at most 64 indexes.
      • “_id” index is automatic (except capped collections before 2.2)
      • A query can use only 1 index (except $or queries).
      • The maximum index key size is 1024 bytes.
  • 34. Indexes get used where you’d expect
      • db.foo.find({x : 42})
      • db.foo.find({x : {$in : [42,52]}})
      • db.foo.find({x : {$lt : 42}})
      • update, findAndModify that select on x
      • count, distinct
      • $match in aggregation
      • left-anchored regexp, e.g. /^Kev/
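One way to see why a left-anchored regexp such as /^Kev/ can use an index is that a prefix match is equivalent to a range over sorted keys. The sketch below illustrates that idea only; it is not MongoDB's implementation. `prefixToRange` is a hypothetical helper, and it assumes a non-empty prefix whose last character is not the maximum code unit.

```javascript
// Hypothetical helper: turn a literal prefix into a half-open key range [lower, upper).
function prefixToRange(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  // Bump the last character by one to get the first string past the prefix.
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { lower: prefix, upper };
}

const names = ["Karl", "Kevin", "Kevlar", "Kewl", "Steve"]; // sorted, like index keys
const { lower, upper } = prefixToRange("Kev");

const viaRange = names.filter((s) => s >= lower && s < upper);
const viaRegex = names.filter((s) => /^Kev/.test(s));
console.log(lower, upper, viaRange, viaRegex);
```

An unanchored pattern such as /a/ has no equivalent range, which is why it cannot narrow the scan in the same way.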
  • 35. But indexes aren’t always helpful
      • Most negations: $not, $nin, $ne
      • Some corner cases: $mod, $where
      • Matching most regular expressions, e.g. /a/ or /foo/i
  • 37. Arrays: the powerful “multiKey” index
      { title : "Chicken Noodle Soup", ingredients : ["chicken", "noodles"] }
      > db.foo.ensureIndex( { ingredients : 1 } )
      ingredients   page
      chicken       42
      ...           ...
      noodles       42
      ...           ...
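The multikey idea on slide 37 (one index entry per array element) can be sketched as follows. This is a toy model: `buildMultikeyIndex` is a hypothetical function, "page" stands in for the document's location as on the slide, and the second document is invented purely for illustration.

```javascript
// Toy documents; the "Chicken Salad" entry is an assumption added for the example.
const docs = [
  { page: 42, title: "Chicken Noodle Soup", ingredients: ["chicken", "noodles"] },
  { page: 57, title: "Chicken Salad", ingredients: ["chicken", "lettuce"] },
];

// Emit one index entry per array element, then order the keys like an index would.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const value of doc[field]) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc.page);
    }
  }
  const sorted = [...index.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return new Map(sorted);
}

const multikey = buildMultikeyIndex(docs, "ingredients");
console.log([...multikey.keys()], multikey.get("chicken"));
```

A lookup for a single ingredient then finds every document whose array contains it, without scanning the arrays themselves.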
  • 38. Unique Indexes
      • db.foo.ensureIndex( { email : 1 } , {unique : true} )
      > db.foo.insert({email : "matulef@10gen.com"})
      > db.foo.insert({email : "matulef@10gen.com"})
      E11000 duplicate key error ...
  • 39. Sparse Indexes
      • db.foo.ensureIndex( { email : 1 } , {sparse : true} )
      No index entries for docs without “email” field
  • 40. Geospatial Indexes
      { name: "10gen Office", lat_long: [ 52.5184, 13.387 ] }
      > db.locations.ensureIndex( { lat_long : "2d" } )
      > db.locations.find( { lat_long: {$near: [52.53, 13.4] } } )
  • 42. The Query Optimizer
      • For each “type” of query, MongoDB periodically tries all useful indexes.
      • Aborts the remaining plans as soon as one plan wins.
      • Winning plan is temporarily cached.
  • 43-44. Which plan wins? Explain!
      > db.foo.find( { t: { $lt : 40 } } ).explain( )
      {
        "cursor" : "BtreeCursor t_1" ,
        "n" : 42,
        "nscannedObjects" : 42,
        "nscanned" : 42,
        ...
        "millis" : 0,
        ...
      }
      Pay attention to the ratio n/nscanned!
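The n/nscanned ratio from slide 44 is easy to compute once you have an explain document. The helper below is a hypothetical convenience, assuming the legacy explain format shown on the slide (fields `n` and `nscanned`); it is not part of the shell.

```javascript
// Hypothetical helper: fraction of scanned index entries that were actually returned.
// A value near 1 suggests the index fits the query; near 0 suggests wasted scanning.
function scanEfficiency(explain) {
  const { n, nscanned } = explain;
  if (nscanned === 0) return 1; // nothing scanned, so nothing wasted
  return n / nscanned;
}

// The explain output from the slide: every scanned entry matched.
const slideExplain = { cursor: "BtreeCursor t_1", n: 42, nscannedObjects: 42, nscanned: 42, millis: 0 };
// The cookbook example from slide 18: 4 entries scanned, 1 matched.
const cookbookExplain = { n: 1, nscanned: 4 };

console.log(scanEfficiency(slideExplain), scanEfficiency(cookbookExplain));
```

In a real session the argument would be whatever `explain()` returns; the two literal objects above only reuse the numbers already given in the slides.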
  • 45. Think you know better? Give us a hint
      > db.foo.find( { t: { $lt : 40 } } ).hint( { _id : 1} )
  • 46. Recording slow queries
      > db.setProfilingLevel( n , slowms=100ms )
          n=0 profiler off
          n=1 record queries longer than slowms
          n=2 record all queries
      > db.system.profile.find()
  • 48. Background index builds
      db.foo.ensureIndex( { user : 1 } , { background : true } )
      Caveats:
      • still resource-intensive
      • will build in foreground on secondaries
  • 49. Minimizing impact on Replica Sets
      for (s in secondaries)
          s.restartAsStandalone()
          s.buildIndex()
          s.restartAsReplSetMember()
          s.waitForCatchup()
      p.stepDown()
      p.restartAsStandalone()
      p.buildIndex()
      p.restartAsReplSetMember()
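The pseudocode on slide 49 describes an ordering of manual steps rather than real shell commands. As a sketch, the same ordering can be written out as a checklist generator; `rollingIndexBuildPlan` and the member names are hypothetical, and nothing here talks to a server.

```javascript
// Hypothetical planner: returns the slide's steps, in order, as human-readable strings.
function rollingIndexBuildPlan(secondaries, primary) {
  const steps = [];
  // for...of iterates the member names (for...in would iterate array indices).
  for (const s of secondaries) {
    steps.push(
      `${s}: restart as standalone`,
      `${s}: build index`,
      `${s}: restart as replica set member`,
      `${s}: wait for catch-up`
    );
  }
  // The primary goes last, after stepping down so a secondary can take over.
  steps.push(
    `${primary}: step down`,
    `${primary}: restart as standalone`,
    `${primary}: build index`,
    `${primary}: restart as replica set member`
  );
  return steps;
}

const plan = rollingIndexBuildPlan(["s1", "s2"], "p");
console.log(plan.join("\n"));
```

Handling one member at a time keeps the rest of the set available while each index build runs.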
  • 50. Absent or suboptimal indexes are the most common avoidable MongoDB performance problem... ...so take some time and get your indexes right!
  • 51. Thanks! (and thanks to Richard Kreuter for the slides)