Query Optimization in MongoDB

TehranDB

MongoDB
MongoDB is a cross-platform, document-oriented database that provides high performance,
high availability, and easy scalability. MongoDB works on the concepts of collections and documents.
Database
Database is a physical container for collections. Each database gets its own set of files on the
file system. A single MongoDB server typically has multiple databases.
Collection
Collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A
collection exists within a single database. Collections do not enforce a schema. Documents
within a collection can have different fields. Typically, all documents in a collection are of similar
or related purpose.
Document
A document is a set of key-value pairs. Documents have dynamic schema. Dynamic schema
means that documents in the same collection do not need to have the same set of fields or
structure, and common fields in a collection's documents may hold different types of data.
The following table shows the relationship of RDBMS terminology with MongoDB.

Advantages of MongoDB over RDBMS
• Schema-less − MongoDB is a document database in which one collection
can hold different documents. The number of fields, the content, and the size can differ
from one document to another.
• Structure of a single object is clear.
• No complex joins.
• Deep query-ability. MongoDB supports dynamic queries on documents using a
document-based query language that's nearly as powerful as SQL.
• Tuning.
• Ease of scale-out − MongoDB is easy to scale.
• Conversion/mapping of application objects to database objects not needed.
• Uses internal memory for storing the (windowed) working set, enabling faster
access of data.

Where to Use MongoDB?
• Big Data
• Content Management and Delivery
• Mobile and Social Infrastructure
• User Data Management

Analyze Your Queries
Like many databases, MongoDB provides an explain facility that reports statistics
about the performance of a query. You can append explain('executionStats') to a query:
db.user.find(
{ country: 'AU', city: 'Melbourne' }
).explain('executionStats');

or call explain('executionStats') on the collection before the query:
db.user.explain('executionStats').find(
{ country: 'AU', city: 'Melbourne' }
);

Explain
This returns a large JSON result, but there are four primary values to examine:
queryPlanner.winningPlan.stage :
1. COLLSCAN : Indicates a collection scan.
2. IXSCAN : Indicates index use (it may appear in a nested inputStage).
executionStats.nReturned :
The number of documents returned.
executionStats.totalDocsExamined :
The number of documents scanned to find the result.
executionStats.totalKeysExamined :
The number of index entries scanned. 0 indicates that the query is not using an index.
If the number of documents examined greatly exceeds the number returned, the query
may not be efficient. In the worst cases, MongoDB might have to scan every document
in the collection.

Explain Result Example 1
{
"queryPlanner" : {
"plannerVersion" : 1,
...
"winningPlan" : {
"stage" : "COLLSCAN",
...
}
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 3,
"executionTimeMillis" : 0,
"totalKeysExamined" : 0,
"totalDocsExamined" : 10,
"executionStages" : {
"stage" : "COLLSCAN",
...
},
...
},
...
}

Explain Result Example 2
{
"queryPlanner" : {
"plannerVersion" : 1,
...
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"quantity" : 1
},
...
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 3,
"executionTimeMillis" : 0,
"totalKeysExamined" : 3,
"totalDocsExamined" : 3,
"executionStages" : {
...
},
...
},
...
}
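
The two results above can be compared with a small helper. This is only a sketch in plain JavaScript: summarizeExplain is a hypothetical function, not part of MongoDB, and the input objects copy only the fields shown in the two examples (the elided parts are left out).

```javascript
// Hypothetical helper (not a MongoDB API): condenses an explain('executionStats') result.
function summarizeExplain(explainResult) {
  const stats = explainResult.executionStats;
  // Follow inputStage links down to the leaf stage (e.g. COLLSCAN or IXSCAN).
  let stage = explainResult.queryPlanner.winningPlan;
  while (stage.inputStage) {
    stage = stage.inputStage;
  }
  return {
    leafStage: stage.stage,
    usedIndex: stage.stage === "IXSCAN",
    // Documents examined per document returned; a value close to 1 is efficient.
    docsExaminedPerReturned:
      stats.nReturned === 0 ? null : stats.totalDocsExamined / stats.nReturned,
  };
}

// Values copied from Explain Result Example 1 and Example 2.
const explainExample1 = {
  queryPlanner: { winningPlan: { stage: "COLLSCAN" } },
  executionStats: { nReturned: 3, totalKeysExamined: 0, totalDocsExamined: 10 },
};
const explainExample2 = {
  queryPlanner: {
    winningPlan: { stage: "FETCH", inputStage: { stage: "IXSCAN" } },
  },
  executionStats: { nReturned: 3, totalKeysExamined: 3, totalDocsExamined: 3 },
};

console.log(summarizeExplain(explainExample1));
console.log(summarizeExplain(explainExample2));
```

Example 1 examines 10 documents to return 3, while Example 2 examines exactly the 3 documents it returns.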

Add Appropriate Indexes
Index Types
1. Single Field
2. Compound Index
3. Multikey Index
4. Geospatial Index
5. Text Index
6. Hashed Indexes

Simple Index
MongoDB supports the creation of user-defined ascending/descending indexes on a
single field of a document.
Default _id Index
MongoDB creates a unique index on the _id field during the creation of a collection. The
_id index prevents clients from inserting two documents with the same value for the _id
field. You cannot drop this index on the _id field.
Create an Index
The following example creates a single key descending index on the name field:
db.collection.createIndex( { name: -1 } )

Compound Index
MongoDB also supports user-defined indexes on multiple fields, i.e. compound indexes.
The order of fields listed in a compound index has significance.
For instance, if a compound index consists of { userid: 1, score: -1 }, the index sorts first
by userid and then, within each userid value, sorts by score.
db.collection.createIndex( { userid: 1, score: -1 } )
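
To make the ordering concrete, the sketch below sorts a few documents the way the entries of such an index are ordered. It is plain JavaScript, not MongoDB code, and the sample documents are hypothetical.

```javascript
// Order documents like a { userid: 1, score: -1 } index orders its entries:
// ascending by userid, then descending by score within the same userid.
function compareByUseridThenScoreDesc(a, b) {
  if (a.userid !== b.userid) {
    return a.userid < b.userid ? -1 : 1;
  }
  return b.score - a.score;
}

// Hypothetical sample documents.
const scoreDocs = [
  { userid: "bob", score: 40 },
  { userid: "alice", score: 75 },
  { userid: "bob", score: 90 },
  { userid: "alice", score: 20 },
];

const compoundIndexOrder = [...scoreDocs].sort(compareByUseridThenScoreDesc);
console.log(compoundIndexOrder.map((d) => `${d.userid}:${d.score}`).join(" "));
// prints "alice:75 alice:20 bob:90 bob:40"
```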

Multikey Index
MongoDB uses multikey indexes to index the content stored in arrays. If you index a
field that holds an array value, MongoDB creates separate index entries for every
element of the array. These multikey indexes allow queries to select documents that
contain arrays by matching on element or elements of the arrays. MongoDB
automatically determines whether to create a multikey index if the indexed field contains
an array value; you do not need to explicitly specify the multikey type.
db.collection.createIndex( { "addr.zip": 1 } )
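
The "one entry per array element" idea can be sketched in plain JavaScript. This is a conceptual model only, not how MongoDB stores indexes; multikeyEntries, the tags field, and the sample documents are all assumptions made for illustration.

```javascript
// Conceptual model: one (key, _id) index entry for every element of the array field.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    // A non-array value produces a single entry, like an ordinary index.
    const keys = Array.isArray(value) ? value : [value];
    for (const key of keys) {
      entries.push({ key, id: doc._id });
    }
  }
  return entries;
}

// Hypothetical sample documents with an array field.
const taggedDocs = [
  { _id: 1, tags: ["red", "blue"] },
  { _id: 2, tags: ["blue"] },
];

const tagEntries = multikeyEntries(taggedDocs, "tags");
// Matching on a single element finds every document whose array contains it.
const idsWithBlue = tagEntries.filter((e) => e.key === "blue").map((e) => e.id);
console.log(tagEntries.length, idsWithBlue); // 3 entries; documents 1 and 2 contain "blue"
```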

Geospatial Index
To support efficient queries of geospatial coordinate data, MongoDB provides two
special indexes: 2d indexes, which use planar geometry when returning results, and
2dsphere indexes, which use spherical geometry to return results.

Text Indexes
MongoDB provides a text index type that supports searching for string content in a
collection. These text indexes do not store language-specific stop words (e.g. “the”, “a”,
“or”) and stem the words in a collection to only store root words.

Hashed Indexes
To support hash based sharding, MongoDB provides a hashed index type, which
indexes the hash of the value of a field. These indexes have a more random distribution
of values along their range, but only support equality matches and cannot support
range-based queries.

Index Properties
Unique Indexes
The unique property for an index causes MongoDB to reject duplicate values for the indexed field.
Other than the unique constraint, unique indexes are functionally interchangeable with other
MongoDB indexes.
db.members.createIndex( { "user_id": 1 }, { unique: true } )
Partial Indexes
New in version 3.2.
Partial indexes only index the documents in a collection that meet a specified filter expression. By
indexing a subset of the documents in a collection, partial indexes have lower storage requirements
and reduced performance costs for index creation and maintenance.
Partial indexes offer a superset of the functionality of sparse indexes and should be preferred over
sparse indexes.
db.restaurants.createIndex( { cuisine: 1, name: 1 }, { partialFilterExpression: { rating: { $gt: 5 } } } )
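
As a rough illustration of which documents such a partial index would cover, the plain JavaScript sketch below applies the same rating > 5 condition to a few hypothetical restaurant documents. The predicate is a simplified stand-in for the filter expression, not MongoDB's own evaluation.

```javascript
// Simplified stand-in for partialFilterExpression: { rating: { $gt: 5 } }.
function matchesPartialFilter(doc) {
  return typeof doc.rating === "number" && doc.rating > 5;
}

// Hypothetical sample documents.
const restaurantDocs = [
  { name: "Alpha", cuisine: "Thai", rating: 8 },
  { name: "Bravo", cuisine: "Thai", rating: 4 },
  { name: "Charlie", cuisine: "Greek" }, // no rating field, so not indexed
  { name: "Delta", cuisine: "Greek", rating: 6 },
];

// Only these documents would get an entry in the partial index.
const partiallyIndexedNames = restaurantDocs
  .filter(matchesPartialFilter)
  .map((doc) => doc.name);
console.log(partiallyIndexedNames); // [ 'Alpha', 'Delta' ]
```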

Sparse Indexes
The sparse property of an index ensures that the index only contains entries for documents that
have the indexed field. The index skips documents that do not have the indexed field.
You can combine the sparse index option with the unique index option to reject documents that
have duplicate values for a field but ignore documents that do not have the indexed key.
If a sparse index would result in an incomplete result set for queries and sort operations,
MongoDB will not use that index unless a hint() explicitly specifies the index.
db.addresses.createIndex( { "xmpp_id": 1 }, { sparse: true } )
TTL Indexes
TTL indexes are special indexes that MongoDB can use to automatically remove documents
from a collection after a certain amount of time. This is ideal for certain types of information like
machine generated event data, logs, and session information that only need to persist in a
database for a finite amount of time.
db.eventlog.createIndex( { "lastModifiedDate": 1 }, { expireAfterSeconds: 3600 } )
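
The expiry rule can be sketched as a simple date comparison in plain JavaScript. isExpiredByTtl is a hypothetical helper and the dates are made up. In a real deployment the removal is done by a background task, so documents may remain for a short time after they expire.

```javascript
// A document is eligible for removal once
// lastModifiedDate + expireAfterSeconds lies in the past.
function isExpiredByTtl(doc, expireAfterSeconds, now) {
  const expiresAtMillis = doc.lastModifiedDate.getTime() + expireAfterSeconds * 1000;
  return expiresAtMillis <= now.getTime();
}

// Hypothetical clock and documents, using the 3600-second TTL from the index above.
const ttlNow = new Date("2024-01-01T12:00:00Z");
const oldEvent = { lastModifiedDate: new Date("2024-01-01T10:30:00Z") }; // 90 minutes old
const recentEvent = { lastModifiedDate: new Date("2024-01-01T11:30:00Z") }; // 30 minutes old

console.log(isExpiredByTtl(oldEvent, 3600, ttlNow)); // true
console.log(isExpiredByTtl(recentEvent, 3600, ttlNow)); // false
```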

Index Intersection
MongoDB can use the intersection of multiple indexes to fulfill queries. In general, each
index intersection involves two indexes; however, MongoDB can employ multiple/nested
index intersections to resolve a query.
To illustrate index intersection, consider a collection orders that has the following
indexes:
{ qty: 1 }
{ item: 1 }
MongoDB can use the intersection of the two indexes to support the following query:
db.orders.find( { item: "abc123", qty: { $gt: 15 } } )
To determine if MongoDB used index intersection, run explain(); the results of explain()
will include either an AND_SORTED stage or an AND_HASH stage.
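
Conceptually, intersection means each index yields a set of candidate documents and only the documents present in both sets are fetched. The plain JavaScript sketch below models that idea with hypothetical data. It does not reproduce MongoDB's AND_SORTED or AND_HASH algorithms.

```javascript
// Hypothetical contents of the orders collection.
const sampleOrders = [
  { _id: 1, item: "abc123", qty: 20 },
  { _id: 2, item: "abc123", qty: 10 },
  { _id: 3, item: "xyz789", qty: 30 },
  { _id: 4, item: "abc123", qty: 16 },
];

// Candidate _ids that a scan of the { item: 1 } index would produce.
const idsFromItemIndex = new Set(
  sampleOrders.filter((o) => o.item === "abc123").map((o) => o._id)
);
// Candidate _ids that a scan of the { qty: 1 } index would produce.
const idsFromQtyIndex = new Set(
  sampleOrders.filter((o) => o.qty > 15).map((o) => o._id)
);

// Only documents found through both indexes satisfy the whole query.
const intersectedIds = [...idsFromItemIndex].filter((id) => idsFromQtyIndex.has(id));
console.log(intersectedIds); // [ 1, 4 ]
```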

Index Prefix Intersection
With index intersection, MongoDB can use an intersection of either the entire index or
the index prefix. An index prefix is a subset of a compound index, consisting of one or
more keys starting from the beginning of the index.
Consider a collection orders with the following indexes:
{ qty: 1 }
{ status: 1, ord_date: -1 }
To fulfill the following query which specifies a condition on both the qty field and the
status field, MongoDB can use the intersection of the two indexes:
db.orders.find( { qty: { $gt: 10 } , status: "A" } )

Index Intersection and Compound Indexes
Index intersection does not eliminate the need for creating compound indexes. However, because both the list
order (i.e. the order in which the keys are listed in the index) and the sort order (i.e. ascending or descending),
matter in compound indexes, a compound index may not support a query condition that does not include the
index prefix keys or that specifies a different sort order.
For example, if a collection orders has the following compound index, with the status field listed before the
ord_date field:
{ status: 1, ord_date: -1 }
The compound index can support the following queries:
db.orders.find( { status: { $in: ["A", "P" ] } } )
db.orders.find({ord_date: { $gt: new Date("2014-02-01") }, status: {$in:[ "P", "A" ] }})
But not the following two queries:
db.orders.find( { ord_date: { $gt: new Date("2014-02-01") } } )
db.orders.find( { } ).sort( { ord_date: 1 } )
However, if the collection has two separate indexes:
{ status: 1 }
{ ord_date: -1 }
The two indexes can, either individually or through index intersection, support all four aforementioned queries.
The choice between creating compound indexes that support your queries or relying on index intersection
depends on the specifics of your system.

Index Intersection and Sort
Index intersection does not apply when the sort() operation requires an index completely separate from the
query predicate.
For example, the orders collection has the following indexes:
{ qty: 1 }
{ status: 1, ord_date: -1 }
{ status: 1 }
{ ord_date: -1 }
MongoDB cannot use index intersection for the following query with sort:
db.orders.find( { qty: { $gt: 10 } } ).sort( { status: 1 } )
That is, MongoDB does not use the { qty: 1 } index for the query, and the separate { status: 1 } or the { status:
1, ord_date: -1 } index for the sort.
However, MongoDB can use index intersection for the following query with sort since the index { status: 1,
ord_date: -1 } can fulfill part of the query predicate.
db.orders.find( { qty: { $gt: 10 } , status: "A" } ).sort( { ord_date: -1 } )

Compound Indexes Prefix
Index prefixes are the beginning subsets of indexed fields. For example, consider the
following compound index:
{ "item": 1, "location": 1, "stock": 1 }
The index has the following index prefixes:
• { item: 1 }
• { item: 1, location: 1 }
For a compound index, MongoDB can use the index to support queries on the index
prefixes. As such, MongoDB can use the index for queries on the following fields:
• the item field,
• the item field and the location field,
• the item field and the location field and the stock field.

MongoDB can also use the index to support a query on the item and stock fields, since the item
field corresponds to a prefix. However, the index would not be as efficient in supporting
the query as an index on only item and stock would be.
However, MongoDB cannot use the index to support queries that include the following
fields since without the item field, none of the listed fields correspond to a prefix index:
• the location field,
• the stock field
• the location and stock fields.
If you have a collection that has both a compound index and an index on its prefix (e.g. {
a: 1, b: 1 } and { a: 1 }), if neither index has a sparse or unique constraint, then you can
remove the index on the prefix (e.g. { a: 1 }). MongoDB will use the compound index in
all of the situations that it would have used the prefix index.
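
The prefix rule can be expressed as a small check: count how many leading index fields the query constrains. The sketch below is plain JavaScript; usablePrefixLength is a hypothetical helper that models the rule described above, not the query planner.

```javascript
// Returns how many leading fields of the index the query constrains.
// 0 means the query does not include the index prefix, so the index cannot support it.
function usablePrefixLength(indexFields, queryFields) {
  const queried = new Set(queryFields);
  let length = 0;
  for (const field of indexFields) {
    if (!queried.has(field)) break;
    length += 1;
  }
  return length;
}

// Field order of the { "item": 1, "location": 1, "stock": 1 } index.
const stockIndexFields = ["item", "location", "stock"];

console.log(usablePrefixLength(stockIndexFields, ["item"])); // 1
console.log(usablePrefixLength(stockIndexFields, ["item", "location"])); // 2
console.log(usablePrefixLength(stockIndexFields, ["item", "stock"])); // 1 (only item is used as a prefix)
console.log(usablePrefixLength(stockIndexFields, ["location", "stock"])); // 0
```

The item and stock case returns 1, which matches the note that the index helps with item but is less efficient than an index on only item and stock.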

Optimizing MongoDB Compound Indexes
To create the best index for complex MongoDB queries that combine equality
tests, sorts, and range filters, and to choose the best order for fields in a compound
index, you must consider index cardinality and selectivity.

Index Cardinality
Index cardinality refers to how many distinct values a field can have. The field
sex has only two possible values, so it has very low cardinality. Other fields such as
names, usernames, phone numbers, and emails have a nearly unique value for
every document in the collection, which is considered high cardinality.
• Greater Cardinality
The greater the cardinality of a field, the more helpful an index will be, because indexes
narrow the search space to a much smaller set.
Suppose you have an index on sex and you are looking for men named John. You would
narrow down the result space by only approximately 50% if you indexed by sex first.
Conversely, if you indexed by name, you would immediately narrow down the result set
to the minute fraction of users named John, and then you would refer to those documents to
check the gender.

Selectivity
You also want to use indexes selectively and write queries that limit the number of
possible documents with the indexed field. To keep it simple, consider the collection
below. If your index is { name: 1 } and you run the query { name: "John", sex: "male" },
MongoDB has to scan only 1 document, because the index allowed it to be selective.
If your index is { sex: 1 } and you run the same query, MongoDB has to scan
4 documents.
{_id:ObjectId(),name:"John",sex:"male"}
{_id:ObjectId(),name:"Rich",sex:"male"}
{_id:ObjectId(),name:"Mose",sex:"male"}
{_id:ObjectId(),name:"Sami",sex:"male"}
{_id:ObjectId(),name:"Cari",sex:"female"}
{_id:ObjectId(),name:"Mary",sex:"female"}
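
The two counts can be reproduced with a short plain JavaScript sketch over the same six documents (the _id values are omitted). docsExaminedWithIndex is a hypothetical helper that models an index as narrowing the search to documents matching the indexed field.

```javascript
// The six documents of the collection above, without _id.
const peopleDocs = [
  { name: "John", sex: "male" },
  { name: "Rich", sex: "male" },
  { name: "Mose", sex: "male" },
  { name: "Sami", sex: "male" },
  { name: "Cari", sex: "female" },
  { name: "Mary", sex: "female" },
];

// With a single-field index, only documents whose indexed field matches the
// query value have to be examined; the other condition is checked on those.
function docsExaminedWithIndex(docs, indexedField, query) {
  return docs.filter((doc) => doc[indexedField] === query[indexedField]).length;
}

const johnQuery = { name: "John", sex: "male" };
console.log(docsExaminedWithIndex(peopleDocs, "name", johnQuery)); // 1
console.log(docsExaminedWithIndex(peopleDocs, "sex", johnQuery)); // 4
```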

Equality, Range Query, And Sort
Method
So here’s my method for creating a compound index for a query combining equality tests, sort fields, and range
filters:
Equality Tests
• Add all equality-tested fields to the compound index, in any order
Sort Fields (ascending / descending only matters if there are multiple sort fields)
• Add sort fields to the index in the same order and direction as your query’s sort
Range Filters
• First, add the range filter for the field with the lowest cardinality (fewest distinct values in the collection)
• Then the next lowest-cardinality range filter, and so on to the highest-cardinality
You can omit some equality-test fields or range-filter fields if they are not selective, to decrease the index
size. A rule of thumb: if the field doesn’t filter out at least 90% of the possible documents in your collection,
it’s probably better to omit it from the index. Remember that if you have several indexes on a collection, you
may need to hint MongoDB to use the right index.
That’s it! For complex queries on several fields, there’s a heap of possible indexes to consider. If you use this
method you’ll narrow your choices radically and go straight to a good index.

❖ Example:
Sample documents in the comments collection:
{ timestamp: 1, anonymous: false, rating: 3 }
{ timestamp: 3, anonymous: true, rating: 1 }
Query, hinted to use an index whose fields follow the equality, sort, range order:
db.comments.find( { timestamp: { $gte: 2, $lte: 4 }, anonymous: false } ).sort( { rating: -1 } ).hint( { anonymous: 1, rating: 1, timestamp: 1 } ).explain()
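
The method can also be written down as a small function that assembles an index specification. This is a sketch in plain JavaScript: buildCompoundIndex is a hypothetical helper, and the cardinality figure passed for timestamp is an invented number used only to order range fields.

```javascript
// Assemble a compound index spec in equality, sort, range order.
function buildCompoundIndex({ equality, sort, range }) {
  const spec = {};
  // 1. Equality-tested fields first, in any order.
  for (const field of equality) {
    spec[field] = 1;
  }
  // 2. Sort fields next, in the same order and direction as the query's sort.
  for (const [field, direction] of sort) {
    spec[field] = direction;
  }
  // 3. Range-filter fields last, lowest cardinality first.
  const rangeByCardinality = [...range].sort((a, b) => a.cardinality - b.cardinality);
  for (const { field } of rangeByCardinality) {
    spec[field] = 1;
  }
  return spec;
}

// Fields taken from the comments query above; the cardinality value is an assumption.
const commentsIndexSpec = buildCompoundIndex({
  equality: ["anonymous"],
  sort: [["rating", -1]],
  range: [{ field: "timestamp", cardinality: 1000 }],
});
console.log(JSON.stringify(commentsIndexSpec));
// prints {"anonymous":1,"rating":-1,"timestamp":1}
```

The result has the same field order as the hinted index. It uses rating: -1 where the hint uses rating: 1, which makes no practical difference here because the direction only matters when there are multiple sort fields.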
