Indexing with MongoDB
 

Video available here: http://vivu.tv/portal/archive.jsp?flow=783-586-4282&id=1270584002677
We all know that MongoDB is one of the most flexible and feature-rich databases available. In this webinar we'll discuss how you can leverage this feature set and maintain high performance with your project's massive data sets and high loads. We'll cover how indexes can be designed to optimize the performance of MongoDB. We'll also discuss tips for diagnosing and fixing performance issues should they arise.

    Indexing with MongoDB: Presentation Transcript

    • Indexing with MongoDB
      Aaron Staple
      aaron@10gen.com
    • What are indexes?
      References to your documents, efficiently ordered by key
      Maintained in a tree structure, allowing fast lookup
      {x:1}
      {y:1}
      {x:0.5,y:0.5}
      {x:2,y:0.5}
      {x:5,y:2}
      {x:-4,y:10}
      {x:3,y:'f'}
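To make "efficiently ordered by key" concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not MongoDB's implementation: a sorted array stands in for the tree, the keys are numbers only, and the names `buildIndex`, `lowerBound` and `findEqual` are made up for illustration.

```javascript
// Toy model only: a sorted array stands in for the tree MongoDB maintains.
const docs = [
  { _id: 1, x: 5 },
  { _id: 2, x: 2 },
  { _id: 3, x: -4 },
  { _id: 4, x: 3 },
  { _id: 5, x: 2 },
];

// Build index entries {key, id}, ordered by key (ties broken by _id).
function buildIndex(collection, field) {
  return collection
    .map((doc) => ({ key: doc[field], id: doc._id }))
    .sort((a, b) => a.key - b.key || a.id - b.id);
}

// Binary search for the position of the first entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality lookup: jump to the first match, then walk while the key still matches.
function findEqual(index, target) {
  const ids = [];
  for (let i = lowerBound(index, target); i < index.length && index[i].key === target; i++) {
    ids.push(index[i].id);
  }
  return ids;
}

const xIndex = buildIndex(docs, "x");
const matches = findEqual(xIndex, 2);
console.log(matches); // [ 2, 5 ]
```

The lookup touches only a handful of entries instead of every document, which is the property the rest of the deck relies on.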
    • Fast document lookup
      db.c.findOne( {_id:2} ), using index {_id:1}
      db.c.find( {x:2} ), using index {x:1}
      db.c.find( {x:{$in:[2,3]}} ), using index {x:1}
      db.c.find( {'x.a':1} ), using index {'x.a':1}
      Matches {_id:1,x:{a:1}}
      db.c.find( {x:{a:1}} ), using index {x:1}
      Matches {_id:1,x:{a:1}}, but not {_id:2,x:{a:1,b:2}}
      QUESTION: What about db.c.find( {$where:"this.x == this.y"} ), using index {x:1}?
      Indexes cannot be used for $where type queries, but if there are non-where elements in the query then indexes can be used for the non-where elements.
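The difference between the dotted-path query and the whole-subdocument query above can be sketched without a server. The helpers `getPath` and `sameSubdocument` are hypothetical names that only mimic the two matching rules. Note that the dotted path also matches the second document, while whole-subdocument equality does not.

```javascript
// Hypothetical helpers; they mimic the two matching rules on the slide.
const docs = [
  { _id: 1, x: { a: 1 } },
  { _id: 2, x: { a: 1, b: 2 } },
];

// Resolve a dotted path such as "x.a" against a document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((value, part) => (value == null ? undefined : value[part]), doc);
}

// Whole-subdocument equality: same fields, in the same order, with the same values.
function sameSubdocument(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Like {'x.a':1}: only the value at the path matters.
const byDottedPath = docs.filter((d) => getPath(d, "x.a") === 1).map((d) => d._id);

// Like {x:{a:1}}: the entire subdocument must be equal.
const bySubdocument = docs.filter((d) => sameSubdocument(d.x, { a: 1 })).map((d) => d._id);

console.log(byDottedPath); // [ 1, 2 ]
console.log(bySubdocument); // [ 1 ]
```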
    • Fast document range scan
      db.c.find( {x:{$gt:2}} ), using index {x:1}
      db.c.find( {x:{$gt:2,$lt:5}} ), using index {x:1}
      db.c.find( {x:/^a/} ), using index {x:1}
      QUESTION: What about db.c.find( {x:/a/} ), using index {x:1}?
      The letter 'a' can appear anywhere in a matching string, so lexicographic ordering on strings won't help. However, we can use the index to restrict the scan to documents where x is a string (e.g. not a number) or x is the regular expression /a/.
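Why /^a/ can use the index while /a/ cannot do much with it: an anchored prefix maps to one contiguous key range. The sketch below is a toy model over a sorted array of strings. The helper names are illustrative, and incrementing the last character code is a simplification that ignores edge cases such as a prefix ending in the highest code unit.

```javascript
// Toy model: a sorted array of string keys stands in for index {x:1}.
const keys = ["apple", "avocado", "banana", "bat", "lemon", "salad"].sort();

// For an anchored prefix, matching keys lie in [prefix, upper), where
// upper is the prefix with its last character code incremented.
function prefixBounds(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  return [prefix, prefix.slice(0, -1) + String.fromCharCode(last + 1)];
}

// Anchored /^a/: visit only the keys inside the bounds.
function scanPrefix(sortedKeys, prefix) {
  const [lower, upper] = prefixBounds(prefix);
  const found = [];
  // A real index would seek to the lower bound; findIndex keeps the sketch short.
  let i = sortedKeys.findIndex((key) => key >= lower);
  if (i === -1) return { found, scanned: 0 };
  for (; i < sortedKeys.length && sortedKeys[i] < upper; i++) found.push(sortedKeys[i]);
  return { found, scanned: found.length };
}

// Unanchored /a/: every string key has to be tested.
function scanContains(sortedKeys, pattern) {
  const found = sortedKeys.filter((key) => pattern.test(key));
  return { found, scanned: sortedKeys.length };
}

const anchored = scanPrefix(keys, "a");
const unanchored = scanContains(keys, /a/);

console.log(anchored); // { found: [ 'apple', 'avocado' ], scanned: 2 }
console.log(unanchored.scanned, unanchored.found.length); // 6 5
```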
    • Other operations
      db.c.count( {x:2} ) using index {x:1}
      db.c.distinct( 'y', {x:2} ) using index {x:1}
      db.c.update( {x:2}, {x:3} ) using index {x:1}
      db.c.remove( {x:2} ) using index {x:1}
      QUESTION: What about db.c.update( {x:2}, {$inc:{x:3}} ), using index {x:1}?
      Older versions of MongoDB didn't support modifiers on indexed fields, but this is now supported.
    • Fast document ordering
      db.c.find( {} ).sort( {x:1} ), using index {x:1}
      db.c.find( {} ).sort( {x:-1} ), using index {x:1}
      db.c.find( {x:{$gt:4}} ).sort( {x:-1} ), using index {x:1}
      db.c.find( {} ).sort( {'x.a':1} ), using index {'x.a':1}
      QUESTION: What about db.c.find( {y:1} ).sort( {x:1} ), using index {x:1}?
      The index will be used to ensure ordering, provided there is no better index.
    • Missing fields
      db.c.find( {x:null} ), using index {x:1}
      Matches {_id:5}
      db.c.find( {x:{$exists:false}} ), using index {x:1}
      Matches {_id:5}, but not {_id:6,x:null}
      QUESTION: What about db.c.find( {x:{$exists:true}} ), using index {x:1}?
      The index is not currently used, though it may be used in a future version of MongoDB.
    • Array matching
      All the following match {_id:6,x:[2,10]} and use index {x:1}
      db.c.find( {x:2} )
      db.c.find( {x:10} )
      db.c.find( {x:{$gt:5}} )
      db.c.find( {x:[2,10]} )
      db.c.find( {x:{$in:[2,5]}} )
      QUESTION: What about db.c.find( {x:{$all:[2,10]}} )?
      The index will be used to look up all documents matching {x:2}.
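One way to picture array matching is an index that holds one entry per array element. The sketch below is a toy model under that assumption. The document with _id 7 and the helper names are additions for illustration, and the exact-array query {x:[2,10]} is left out.

```javascript
// Toy model: one index entry per array element, one entry for a scalar.
const docs = [
  { _id: 6, x: [2, 10] },
  { _id: 7, x: 4 }, // extra document, not from the slide
];

function buildMultikeyIndex(collection, field) {
  const entries = [];
  for (const doc of collection) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) entries.push({ key: value, id: doc._id });
  }
  return entries.sort((a, b) => a.key - b.key || a.id - b.id);
}

// Distinct _ids whose index entries satisfy the predicate.
function idsWhere(index, predicate) {
  return [...new Set(index.filter((e) => predicate(e.key)).map((e) => e.id))];
}

const index = buildMultikeyIndex(docs, "x");
const eq2 = idsWhere(index, (k) => k === 2); // like {x:2}
const eq10 = idsWhere(index, (k) => k === 10); // like {x:10}
const gt5 = idsWhere(index, (k) => k > 5); // like {x:{$gt:5}}
const in2or5 = idsWhere(index, (k) => [2, 5].includes(k)); // like {x:{$in:[2,5]}}

// Like {x:{$all:[2,10]}}: use the index for x == 2, then check the rest on the document.
const all = eq2.filter((id) => {
  const doc = docs.find((d) => d._id === id);
  return [2, 10].every((v) => [].concat(doc.x).includes(v));
});

console.log(eq2, eq10, gt5, in2or5, all); // [ 6 ] [ 6 ] [ 6 ] [ 6 ] [ 6 ]
```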
    • Compound Indexes
      db.c.find( {x:10,y:20} ), using index {x:1,y:1}
      db.c.find( {x:10,y:20} ), using index {x:1,y:-1}
      db.c.find( {x:{$in:[10,20]},y:20} ), using index {x:1,y:1}
      db.c.find().sort( {x:1,y:1} ), using index {x:1,y:1}
      db.c.find().sort( {x:-1,y:1} ), using index {x:1,y:-1}
      db.c.find( {x:10} ).sort( {y:1} ), using index {x:1,y:1}
      QUESTION: What about db.c.find( {y:10} ).sort( {x:1} ), using index {x:1,y:1}?
      The index will be used to ensure ordering, provided no better index is available.
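The ordering behaviour of compound indexes can be sketched with a comparator built from the key pattern. This is a toy model with made-up documents and helper names. It shows why an equality match on x followed by a sort on y needs no extra sorting, and why {x:1,y:-1} can serve sort({x:-1,y:1}) when read backwards.

```javascript
// Sample documents, invented for this sketch.
const docs = [
  { _id: 1, x: 10, y: 30 },
  { _id: 2, x: 20, y: 5 },
  { _id: 3, x: 10, y: 20 },
  { _id: 4, x: 5, y: 40 },
  { _id: 5, x: 10, y: 25 },
];

// Comparator for a key pattern such as {x:1, y:-1}; earlier fields take priority.
function compareBy(pattern) {
  const fields = Object.entries(pattern);
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] !== b[field]) return (a[field] < b[field] ? -1 : 1) * direction;
    }
    return 0;
  };
}

// The "index" is just the documents in key-pattern order.
function buildIndex(collection, pattern) {
  return [...collection].sort(compareBy(pattern));
}

// Like find({x:10}).sort({y:1}) on {x:1,y:1}:
// the x == 10 entries are adjacent and already in y order.
const xyIndex = buildIndex(docs, { x: 1, y: 1 });
const ascending = xyIndex.filter((d) => d.x === 10).map((d) => d._id);

// Like sort({x:-1,y:1}) on {x:1,y:-1}: the wanted order is the exact reverse.
const xyDescIndex = buildIndex(docs, { x: 1, y: -1 });
const reversed = [...xyDescIndex].reverse().map((d) => d._id);

console.log(ascending); // [ 3, 5, 1 ]
console.log(reversed); // [ 2, 3, 5, 1, 4 ]
```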
    • When indexes are less helpful
      db.c.find( {x:{$ne:1}} )
      db.c.find( {x:{$mod:[10,1]}} )
      Uses index {x:1} to scan numbers only
      db.c.find( {x:{$not:/a/}} )
      db.c.find( {x:{$gte:0,$lte:10},y:5} ) using index {x:1,y:1}
      Currently must scan all elements from {x:0,y:5} to {x:10,y:5}, but some improvements may be possible
      db.c.find( {$where:'this.x == 5'} )
      QUESTION: What about db.c.find( {x:{$not:/^a/}} ), using index {x:1}?
      The index is not used currently, but will be used in MongoDB 1.6.
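The cost of the range-plus-equality query above can be counted in a toy model. The data set (every pair with x in 0..20 and y in 0..9) and the helper names are assumptions for illustration. The counters borrow the names nscanned and n from explain() output.

```javascript
// Entries of a toy index {x:1,y:1}: every (x, y) pair with x in 0..20, y in 0..9.
// The nested loops already emit entries in (x, y) order, so no sort is needed.
const index = [];
for (let x = 0; x <= 20; x++) {
  for (let y = 0; y <= 9; y++) index.push({ x, y });
}

// Compound-key comparison: x first, then y.
function compareKeys(a, b) {
  return a.x - b.x || a.y - b.y;
}

// Scan every entry between the two bounds, counting those with the wanted y.
function rangeScan(entries, lower, upper, wantedY) {
  let nscanned = 0;
  let n = 0;
  for (const entry of entries) {
    if (compareKeys(entry, lower) < 0) continue; // a real index would seek past these
    if (compareKeys(entry, upper) > 0) break;
    nscanned++;
    if (entry.y === wantedY) n++;
  }
  return { nscanned, n };
}

// Like find({x:{$gte:0,$lte:10},y:5}) scanning from {x:0,y:5} to {x:10,y:5}.
const result = rangeScan(index, { x: 0, y: 5 }, { x: 10, y: 5 }, 5);
console.log(result); // { nscanned: 101, n: 11 }
```

In this model 101 entries are visited to return 11 matches, because entries with other y values sit between the matching ones.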
    • Geospatial indexes
      db.c.find( {a:[50,50]} ) using index {a:'2d'}
      db.c.find( {a:{$near:[50,50]}} ) using index {a:'2d'}
      Results are sorted from closest to farthest
      db.c.find( {a:{$within:{$box:[[40,40],[60,60]]}}} ) using index {a:'2d'}
      db.c.find( {a:{$within:{$center:[[50,50],10]}}} ) using index {a:'2d'}
      db.c.find( {a:{$near:[50,50]},b:2} ) using index {a:'2d',b:1}
      QUESTION: Most queries can be performed with or without an index. Is this true of geospatial queries?
      No. A geospatial query requires an index.
    • Creating indexes
      {_id:1} index created automatically
      For non-capped collections
      db.c.ensureIndex( {x:1} )
      Can create an index at any time, even when you already have plenty of data in your collection
      Creating an index will block MongoDB unless you specify background index creation
      db.c.ensureIndex( {x:1}, {background:true} )
      Background index creation still impacts performance; run it at non-peak times if you're concerned
      QUESTION: Can an index be removed during background creation?
      Not at this time.
    • Unique key constraints
      db.c.ensureIndex( {x:1}, {unique:true} )
      Don’t allow {_id:10,x:2} and {_id:11,x:2}
      Don't allow {_id:12} and {_id:13} (both match {x:null})
      What if duplicates exist before index is created?
      Normally index creation fails and the index is removed
      db.c.ensureIndex( {x:1}, {unique:true,dropDups:true} )
      QUESTION: In dropDups mode, which duplicates will be removed?
      The first document according to the collection’s “natural order” will be preserved.
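The unique-key rules on this slide can be sketched in plain JavaScript. This is a toy model: array order stands in for the collection's natural order, a missing field is treated as the key null, the document with _id 14 is an addition, and the helper names are hypothetical.

```javascript
// Toy model: documents are listed in the collection's natural order.
const docs = [
  { _id: 10, x: 2 },
  { _id: 11, x: 2 },
  { _id: 12 },
  { _id: 13 },
  { _id: 14, x: 7 }, // extra document, not from the slide
];

// A missing field is indexed under the key null.
function indexKey(doc, field) {
  return doc[field] === undefined ? null : doc[field];
}

// Keys that appear more than once; any of these would make a unique index build fail.
function findDuplicateKeys(collection, field) {
  const seen = new Set();
  const duplicates = new Set();
  for (const doc of collection) {
    const key = indexKey(doc, field);
    if (seen.has(key)) duplicates.add(key);
    seen.add(key);
  }
  return [...duplicates];
}

// dropDups-style cleanup: keep only the first document seen for each key.
function dropDups(collection, field) {
  const seen = new Set();
  return collection.filter((doc) => {
    const key = indexKey(doc, field);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const duplicateKeys = findDuplicateKeys(docs, "x");
const keptIds = dropDups(docs, "x").map((d) => d._id);

console.log(duplicateKeys); // [ 2, null ]
console.log(keptIds); // [ 10, 12, 14 ]
```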
    • Cleaning up indexes
      db.system.indexes.find( {ns:'db.c'} )
      db.c.dropIndex( {x:1} )
      db.c.dropIndexes()
      db.c.reIndex()
      Rebuilds all indexes, removing index cruft that has built up over large numbers of updates and deletes. Index cruft will not exist in MongoDB 1.6, so this command will be deprecated.
      QUESTION: Why would you want to drop an index?
      See next slide…
    • Limits and Tradeoffs
      Max 40 indexes per collection
      Logically equivalent indexes are not prevented (e.g. {x:1} and {x:-1})
      Indexes can improve speed of queries, but make inserts slower
      More specific indexes {a:1,b:1,c:1} can be more helpful than less specific indexes {a:1}, but sorting compound keys may not be as fast as sorting simple keys
      QUESTION: Do indexes make updates slower? How about deletes?
      It depends – finding your document might be faster, but if any indexed fields are changed the indexes must be updated.
    • Query Optimizer
      In charge of picking which index to use for a query/count/update/delete/etc
      Implementation is part of the magic of mongo (you can read about it online – not covering today)
      Usually it does a good job, but if you know what you’re doing you can override it
      db.c.find( {x:2,y:3} ).hint( {y:1} )
      Use index {y:1} and avoid trying out {x:1}
      As your data changes, different indexes may be chosen. Ordering requirements should be made explicit using sort().
      QUESTION: How can you force a full collection scan instead of using indexes?
      db.c.find( {x:2,y:3} ).hint( {$natural:1} )
    • Mongod log output
      query test.c ntoreturn:1 reslen:69 nscanned:100000 { i: 99999.0 } nreturned:1 157ms
      query test.$cmd ntoreturn:1 command: { count: "c", query: { type: 0.0, i: { $gt: 99000.0 } }, fields: {} } reslen:64 256ms
      query:{ query: {}, orderby: { i: 1.0 } } ... query test.c ntoreturn:0 exception 1378ms ... User Exception 10128:too much key data for sort() with no index. add an index or specify a smaller limit
      query test.c ntoreturn:0 reslen:4783 nscanned:100501 { query: { type: 500.0 }, orderby: { i: 1.0 } } nreturned:101 390ms
      You may occasionally see a slow operation as a result of disk activity or mongo cleaning things up; some messages about slow ops are spurious
      Keep this in mind when you run the same op a massive number of times and it appears slow only very rarely
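When scanning many log lines like the ones above, it can help to pull out the numeric fields. The following is a small sketch that assumes the line format shown on this slide. `parseLogLine` is a hypothetical helper, not part of MongoDB or its shell.

```javascript
// Pull the numeric fields out of a mongod slow-operation log line.
// Fields that do not appear in the line come back as null.
function parseLogLine(line) {
  const field = (name) => {
    const match = line.match(new RegExp(name + ":(\\d+)"));
    return match ? Number(match[1]) : null;
  };
  const millis = line.match(/(\d+)ms\s*$/);
  return {
    reslen: field("reslen"),
    nscanned: field("nscanned"),
    nreturned: field("nreturned"),
    millis: millis ? Number(millis[1]) : null,
  };
}

const line =
  "query test.c ntoreturn:1 reslen:69 nscanned:100000 { i: 99999.0 } nreturned:1 157ms";
const parsed = parseLogLine(line);

console.log(parsed);
// { reslen: 69, nscanned: 100000, nreturned: 1, millis: 157 }

// A high scanned-to-returned ratio suggests the query had no useful index.
console.log(parsed.nscanned / parsed.nreturned); // 100000
```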
    • Profiling
      Record same info as with log messages, but in a database collection
      > db.system.profile.find()
      {"ts" : "Thu Jan 29 2009 15:19:32 GMT-0500 (EST)" , "info" : "query test.$cmd ntoreturn:1 reslen:66 nscanned:0 <br>query: { profile: 2 } nreturned:1 bytes:50" , "millis" : 0}...
      > db.system.profile.find( { info: /test.foo/ } )
      > db.system.profile.find( { millis : { $gt : 5 } } )
      > db.system.profile.find().sort({$natural:-1})
      Enable explicitly using levels (0:off, 1:slow ops (>100ms), 2:all ops)
      > db.setProfilingLevel(2);
      {"was" : 0 , "ok" : 1}
      > db.getProfilingLevel()
      2
      > db.setProfilingLevel( 1 , 10 ); // slow means > 10ms
      Profiling impacts performance, but not severely
    • Query explain
      > db.c.find( {x:1000,y:0} ).explain()
      {
      "cursor" : "BtreeCursor x_1",
      "indexBounds" : [
      [
      {
      "x" : 1000
      },
      {
      "x" : 1000
      }
      ]
      ],
      "nscanned" : 10,
      "nscannedObjects" : 10,
      "n" : 10,
      "millis" : 0,
      "oldPlan" : {
      "cursor" : "BtreeCursor x_1",
      "indexBounds" : [
      [
      {
      "x" : 1000
      },
      {
      "x" : 1000
      }
      ]
      ]
      },
      "allPlans" : [
      {
      "cursor" : "BtreeCursor x_1",
      "indexBounds" : [
      [
      {
      "x" : 1000
      },
      {
      "x" : 1000
      }
      ]
      ]
      },
      {
      "cursor" : "BtreeCursor y_1",
      "indexBounds" : [
      [
      {
      "y" : 0
      },
      {
      "y" : 0
      }
      ]
      ]
      },
      {
      "cursor" : "BasicCursor",
      "indexBounds" : [ ]
      }
      ]
      }
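A quick way to read explain() output of this shape is to check the cursor type and compare nscanned with n. The sketch below assumes the field names shown above. `summarizeExplain` is a hypothetical helper, and the second sample object uses the figures from Example 1.

```javascript
// Summarize explain() output of the shape shown above.
function summarizeExplain(plan) {
  return {
    // "BtreeCursor <name>" means an index was used; "BasicCursor" is a collection scan.
    usesIndex: plan.cursor.startsWith("BtreeCursor"),
    // Index entries scanned per document returned; close to 1 is ideal.
    scannedPerResult: plan.n > 0 ? plan.nscanned / plan.n : null,
  };
}

const indexed = summarizeExplain({ cursor: "BtreeCursor x_1", nscanned: 10, n: 10 });
const unindexed = summarizeExplain({ cursor: "BasicCursor", nscanned: 100000, n: 1 });

console.log(indexed); // { usesIndex: true, scannedPerResult: 1 }
console.log(unindexed); // { usesIndex: false, scannedPerResult: 100000 }
```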
    • Example 1
      > db.c.findOne( {i:99999} )
      { "_id" : ObjectId("4bb962dddfdcf5761c1ec6a3"), "i" : 99999 }
      query test.c ntoreturn:1 reslen:69 nscanned:100000 { i: 99999.0 } nreturned:1 157ms
      > db.c.find( {i:99999} ).limit(1).explain()
      {
      "cursor" : "BasicCursor",
      "indexBounds" : [ ],
      "nscanned" : 100000,
      "nscannedObjects" : 100000,
      "n" : 1,
      "millis" : 161,
      "allPlans" : [
      {
      "cursor" : "BasicCursor",
      "indexBounds" : [ ]
      }
      ]
      }
      Setup:
      > for( i = 0; i < 100000; ++i ) { db.c.save( {i:i} ); }
      Fix:
      > db.c.ensureIndex( {i:1} );
    • Example 2
      > db.c.count( {type:0,i:{$gt:99000}} )
      499
      query test.$cmd ntoreturn:1 command: { count: "c", query: { type: 0.0, i: { $gt: 99000.0 } }, fields: {} } reslen:64 256ms
      > db.c.find( {type:0,i:{$gt:99000}} ).limit(1).explain()
      {
      "cursor" : "BtreeCursor type_1",
      "indexBounds" : [
      [
      {
      "type" : 0
      },
      {
      "type" : 0
      }
      ]
      ],
      "nscanned" : 49502,
      "nscannedObjects" : 49502,
      "n" : 1,
      "millis" : 349,
      ...
      Setup:
      > for( i = 0; i < 100000; ++i ) { db.c.save( {type:i%2,i:i} ); }
      Fix:
      > db.c.ensureIndex( {type:1,i:1} );
    • Example 3
      > db.c.find().sort( {i:1} )
      error: {
      "$err" : "too much key data for sort() with no index. add an index or specify a smaller limit"
      }
      > db.c.find().sort( {i:1} ).explain()
      JS Error: uncaught exception: error: {
      "$err" : "too much key data for sort() with no index. add an index or specify a smaller limit"
      }
      Setup:
      > for( i = 0; i < 1000000; ++i ) { db.c.save( {i:i} ); }
      Fix:
      > db.c.ensureIndex( {i:1} );
    • Example 4
      > db.c.find( {type:500} ).sort( {i:1} )
      { "_id" : ObjectId("4bba4904dfdcf5761c2f917e"), "i" : 500, "type" : 500 }
      { "_id" : ObjectId("4bba4904dfdcf5761c2f9566"), "i" : 1500, "type" : 500 }
      ...
      query test.c ntoreturn:0 reslen:4783 nscanned:100501 { query: { type: 500.0 }, orderby: { i: 1.0 } } nreturned:101 390ms
      > db.c.find( {type:500} ).sort( {i:1} ).explain()
      {
      "cursor" : "BtreeCursor i_1",
      "indexBounds" : [
      [
      {
      "i" : {
      "$minElement" : 1
      }
      },
      {
      "i" : {
      "$maxElement" : 1
      }
      }
      ]
      ],
      "nscanned" : 1000000,
      "nscannedObjects" : 1000000,
      "n" : 1000,
      "millis" : 5388,
      ...
      Setup:
      > for( i = 0; i < 1000000; ++i ) { db.c.save( {i:i,type:i%1000} ); }
      Fix:
      > db.c.ensureIndex( {type:1,i:1} );
    • Questions?
      Follow @mongodb
      Get involved www.mongodb.org
      Upcoming events www.mongodb.org/display/DOCS/Events
      MongoSF April 30
      SF office hours every Mon 4-6pm Epicenter Cafe
      Commercial support www.10gen.com
      jobs@10gen.com