This document provides an overview of indexing in MongoDB. It discusses what indexes are, why they are needed to optimize queries, and how to work with indexes in MongoDB. Some key points covered include how to create, manage, and optimize indexes. Common indexing mistakes are also discussed, such as trying to use multiple indexes per query, having indexes with low selectivity, and queries that cannot use indexes like regular expressions and negation queries.
Video available here: http://vivu.tv/portal/archive.jsp?flow=783-586-4282&id=1270584002677
We all know that MongoDB is one of the most flexible and feature-rich databases available. In this webinar we'll discuss how you can leverage this feature set and maintain high performance with your project's massive data sets and high loads. We'll cover how indexes can be designed to optimize the performance of MongoDB. We'll also discuss tips for diagnosing and fixing performance issues should they arise.
MongoDB World 2015 - A Technical Introduction to WiredTigerWiredTiger
MongoDB 3.0 introduces a new pluggable storage engine API and a new storage engine called WiredTiger. The engineering team behind WiredTiger team has a long and distinguished career, having architected and built Berkeley DB, now the world's most widely used embedded database. In this talk we will describe our original design goals for WiredTiger, including considerations we made for heavily threaded hardware, large on-chip caches, and SSD storage. We'll also look at some of the latch-free and non-blocking algorithms we've implemented, as well as other techniques that improve scaling, overall throughput and latency. Finally, we'll take a look at some of the features we hope to incorporate into WiredTiger and MongoDB in the future.
SQL vs NoSQL, an experiment with MongoDBMarco Segato
A simple experiment with MongoDB compared to Oracle classic RDBMS database: what are NoSQL databases, when to use them, why to choose MongoDB and how we can play with it.
Slidedeck presented at http://devternity.com/ around MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistency models as well has the definition of documents and general data structures.
MongoDB WiredTiger Internals: Journey To TransactionsMydbops
MongoDB has adapted transaction feature (ACID Properties) in MongoDB 4.0. This talk focuses on the internals of how MongoDB adapted the ACID properties with Weird Tiger Engine. Weird tiger offers more future possibilities for MongoDB. This tech talk was presented at Mydbops Database Meetup on 27-04-2019 by Manosh Malai Senior Devops/NoSQL Consultant with Mydbops and Ranjith Database Administrator with Mydbops.
Video available here: http://vivu.tv/portal/archive.jsp?flow=783-586-4282&id=1270584002677
We all know that MongoDB is one of the most flexible and feature-rich databases available. In this webinar we'll discuss how you can leverage this feature set and maintain high performance with your project's massive data sets and high loads. We'll cover how indexes can be designed to optimize the performance of MongoDB. We'll also discuss tips for diagnosing and fixing performance issues should they arise.
MongoDB World 2015 - A Technical Introduction to WiredTigerWiredTiger
MongoDB 3.0 introduces a new pluggable storage engine API and a new storage engine called WiredTiger. The engineering team behind WiredTiger team has a long and distinguished career, having architected and built Berkeley DB, now the world's most widely used embedded database. In this talk we will describe our original design goals for WiredTiger, including considerations we made for heavily threaded hardware, large on-chip caches, and SSD storage. We'll also look at some of the latch-free and non-blocking algorithms we've implemented, as well as other techniques that improve scaling, overall throughput and latency. Finally, we'll take a look at some of the features we hope to incorporate into WiredTiger and MongoDB in the future.
SQL vs NoSQL, an experiment with MongoDBMarco Segato
A simple experiment with MongoDB compared to Oracle classic RDBMS database: what are NoSQL databases, when to use them, why to choose MongoDB and how we can play with it.
Slidedeck presented at http://devternity.com/ around MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistency models as well has the definition of documents and general data structures.
MongoDB WiredTiger Internals: Journey To TransactionsMydbops
MongoDB has adapted transaction feature (ACID Properties) in MongoDB 4.0. This talk focuses on the internals of how MongoDB adapted the ACID properties with Weird Tiger Engine. Weird tiger offers more future possibilities for MongoDB. This tech talk was presented at Mydbops Database Meetup on 27-04-2019 by Manosh Malai Senior Devops/NoSQL Consultant with Mydbops and Ranjith Database Administrator with Mydbops.
MongoDB 3.0 introduces a pluggable storage architecture and a new storage engine called WiredTiger. The engineering team behind WiredTiger team has a long and distinguished career, having architected and built Berkeley DB, now the world's most widely used embedded database.
In this webinar Michael Cahill, co-founder of WiredTiger, will describe our original design goals for WiredTiger, including considerations we made for heavily threaded hardware, large on-chip caches, and SSD storage. We'll also look at some of the latch-free and non-blocking algorithms we've implemented, as well as other techniques that improve scaling, overall throughput and latency. Finally, we'll take a look at some of the features we hope to incorporate into WiredTiger and MongoDB in the future.
MongoDB World 2019: The Sights (and Smells) of a Bad QueryMongoDB
“Why is MongoDB so slow?” you may ask yourself on occasion. You’ve created indexes, you’ve learned how to use the aggregation pipeline. What the heck? Could it be your queries? This talk will outline what tools are at your disposal (both in MongoDB Atlas and in MongoDB server) to identify inefficient queries.
In this presentation, Raghavendra BM of Valuebound has discussed the basics of MongoDB - an open-source document database and leading NoSQL database.
----------------------------------------------------------
Get Socialistic
Our website: http://valuebound.com/
LinkedIn: http://bit.ly/2eKgdux
Facebook: https://www.facebook.com/valuebound/
Twitter: http://bit.ly/2gFPTi8
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDBMongoDB
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a senior member of the support team I will share more common mistakes observed and some tips and tricks to avoiding them.
This presentation explains the major differences between SQL and NoSQL databases in terms of Scalability, Flexibility and Performance. It also talks about MongoDB which is a document-based NoSQL database and explains the database strutre for my mouse-human research classifier project.
Apache Spark is a In Memory Data Processing Solution that can work with existing data source like HDFS and can make use of your existing computation infrastructure like YARN/Mesos etc. This talk will cover a basic introduction of Apache Spark with its various components like MLib, Shark, GrpahX and with few examples.
MongoDB is an open-source document database, and the leading NoSQL database. Written in C++.
MongoDB has official drivers for a variety of popular programming languages and development environments. There are also a large number of unofficial or community-supported drivers for other programming languages and frameworks.
MongoDB .local Toronto 2019: Tips and Tricks for Effective IndexingMongoDB
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. I will share more common mistakes observed and some tips and tricks to avoiding them.
This presentation contains the introduction to NOSQL databases, it's types with examples, differentiation with 40 year old relational database management system, it's usage, why and we should use it.
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Speaker: Alex Komyagin
MongoDB replica sets allow you to make the database highly available so that you can keep your applications running even when some of the database nodes are down. In a distributed system, local durability of writes with journaling is no longer enough to guarantee system-wide durability, as the node might go down just before any other node replicates new write operations from it. As such, we need a new concept of cluster-wide durability.
How do you make sure that your write operations are durable within a replica set? How do you make sure that your read operations do not see those writes that are not yet durable? This talk will cover the mechanics of ensuring durability of writes via write concern and how to prevent reading of stale data in MongoDB using read concern. We will discuss the decision flow for selecting an appropriate level of write concern, as well as associated tradeoffs and several practical use cases and examples."
Presentation covers core lucene/solr stuff which is used in numeric range queries. There are several examples, algorithm discovered by Uwe is briefly explained.
Index is a database object, which can be created on one or more columns (16 Max column combinations). When creating the index will read the column(s) and forms a relevant data structure to minimize the number of data comparisons. The index will improve the performance of data retrieval and adds some overhead on data modification such as create, delete and modify. So it depends on how much data retrieval can be performed on table versus how much of DML (Insert, Delete and Update) operations
Having the right indexes in place are crucial to performance in MongoDB. In this talk, we’ll explain how indexes work and the various indexing options. We’ll talk about the tools available to optimize your queries and avoid common pitfalls. Throughout the session, we’ll reference real-world examples to demonstrate the importance of proper indexing.
MongoDB 3.0 introduces a pluggable storage architecture and a new storage engine called WiredTiger. The engineering team behind WiredTiger team has a long and distinguished career, having architected and built Berkeley DB, now the world's most widely used embedded database.
In this webinar Michael Cahill, co-founder of WiredTiger, will describe our original design goals for WiredTiger, including considerations we made for heavily threaded hardware, large on-chip caches, and SSD storage. We'll also look at some of the latch-free and non-blocking algorithms we've implemented, as well as other techniques that improve scaling, overall throughput and latency. Finally, we'll take a look at some of the features we hope to incorporate into WiredTiger and MongoDB in the future.
MongoDB World 2019: The Sights (and Smells) of a Bad QueryMongoDB
“Why is MongoDB so slow?” you may ask yourself on occasion. You’ve created indexes, you’ve learned how to use the aggregation pipeline. What the heck? Could it be your queries? This talk will outline what tools are at your disposal (both in MongoDB Atlas and in MongoDB server) to identify inefficient queries.
In this presentation, Raghavendra BM of Valuebound has discussed the basics of MongoDB - an open-source document database and leading NoSQL database.
----------------------------------------------------------
Get Socialistic
Our website: http://valuebound.com/
LinkedIn: http://bit.ly/2eKgdux
Facebook: https://www.facebook.com/valuebound/
Twitter: http://bit.ly/2gFPTi8
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDBMongoDB
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a senior member of the support team I will share more common mistakes observed and some tips and tricks to avoiding them.
This presentation explains the major differences between SQL and NoSQL databases in terms of Scalability, Flexibility and Performance. It also talks about MongoDB which is a document-based NoSQL database and explains the database strutre for my mouse-human research classifier project.
Apache Spark is a In Memory Data Processing Solution that can work with existing data source like HDFS and can make use of your existing computation infrastructure like YARN/Mesos etc. This talk will cover a basic introduction of Apache Spark with its various components like MLib, Shark, GrpahX and with few examples.
MongoDB is an open-source document database, and the leading NoSQL database. Written in C++.
MongoDB has official drivers for a variety of popular programming languages and development environments. There are also a large number of unofficial or community-supported drivers for other programming languages and frameworks.
MongoDB .local Toronto 2019: Tips and Tricks for Effective IndexingMongoDB
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. I will share more common mistakes observed and some tips and tricks to avoiding them.
This presentation contains the introduction to NOSQL databases, it's types with examples, differentiation with 40 year old relational database management system, it's usage, why and we should use it.
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Speaker: Alex Komyagin
MongoDB replica sets allow you to make the database highly available so that you can keep your applications running even when some of the database nodes are down. In a distributed system, local durability of writes with journaling is no longer enough to guarantee system-wide durability, as the node might go down just before any other node replicates new write operations from it. As such, we need a new concept of cluster-wide durability.
How do you make sure that your write operations are durable within a replica set? How do you make sure that your read operations do not see those writes that are not yet durable? This talk will cover the mechanics of ensuring durability of writes via write concern and how to prevent reading of stale data in MongoDB using read concern. We will discuss the decision flow for selecting an appropriate level of write concern, as well as associated tradeoffs and several practical use cases and examples."
Presentation covers core lucene/solr stuff which is used in numeric range queries. There are several examples, algorithm discovered by Uwe is briefly explained.
Index is a database object, which can be created on one or more columns (16 Max column combinations). When creating the index will read the column(s) and forms a relevant data structure to minimize the number of data comparisons. The index will improve the performance of data retrieval and adds some overhead on data modification such as create, delete and modify. So it depends on how much data retrieval can be performed on table versus how much of DML (Insert, Delete and Update) operations
Having the right indexes in place are crucial to performance in MongoDB. In this talk, we’ll explain how indexes work and the various indexing options. We’ll talk about the tools available to optimize your queries and avoid common pitfalls. Throughout the session, we’ll reference real-world examples to demonstrate the importance of proper indexing.
MongoDB supports a wide range of indexing options to enable fast querying of your data. In this talk we’ll cover how indexing works, the various indexing options, and cover use cases where each might be useful.
Having the right indexes in place are crucial to performance in MongoDB. In this talk, we’ll explain how indexes work and the various indexing options. Then, we'll cover the tools available to optimize your queries and avoid common pitfalls. This session will use real-world examples to demonstrate the importance of proper indexing.
As your data grows, the need to establish proper indexes becomes critical to performance. MongoDB supports a wide range of indexing options to enable fast querying of your data, but what are the right strategies for your application?
In this talk we’ll cover how indexing works, the various indexing options, and use cases where each can be useful. We'll dive into common pitfalls using real-world examples to ensure that you're ready for scale.
MongoDB.local DC 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB
Query performance can either be a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a member of the solutions architecture team, I will share common mistakes observed as well as tips and tricks to avoiding them.
New Indexing and Aggregation Pipeline Capabilities in MongoDB 4.2Antonios Giannopoulos
MongoDB 4.2 comes GA soon delivering some amazing new features on multiple areas. In this talk, we will focus on the new capabilities of the aggregation framework. We are going to cover the new operators and expressions. At the same time, we will explore how updates commands can now use the aggregation framework operators. We are also going to present aggregation framework improvements focusing on the on-demand materialized views. Finally, we are going to explore the wildcard indexes introduced in MongoDB 4.2 and how they change the way we design documents and build queries/aggregations. We will also make a reference to the new index build system.
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combined traditional batch approaches with streaming technologies to provide continues alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
MongoDB Kubernetes operator is ready for prime-time. Learn about how MongoDB can be used with most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
Query performance should be the unsung hero of an application, but without proper configuration, can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices to building multikey indexes.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
aux Core Data, appréciée par des centaines de milliers de développeurs. Apprenez ce qui rend Realm spécial et comment il peut être utilisé pour créer de meilleures applications plus rapidement.
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
Il n’a jamais été aussi facile de commander en ligne et de se faire livrer en moins de 48h très souvent gratuitement. Cette simplicité d’usage cache un marché complexe de plus de 8000 milliards de $.
La data est bien connu du monde de la Supply Chain (itinéraires, informations sur les marchandises, douanes,…), mais la valeur de ces données opérationnelles reste peu exploitée. En alliant expertise métier et Data Science, Upply redéfinit les fondamentaux de la Supply Chain en proposant à chacun des acteurs de surmonter la volatilité et l’inefficacité du marché.
16. How do I create indexes?
// Create an index if one does not exist
db.recipes.createIndex({ main_ingredient: 1 })
// The client remembers the index and raises no errors
db.recipes.ensureIndex({ main_ingredient: 1 })
* 1 means ascending, -1 descending
17. What can be indexed?
// Multiple fields (compound key indexes)
db.recipes.ensureIndex({
main_ingredient: 1,
calories: -1
})
// Arrays of values (multikey indexes)
{
name: 'Chicken Noodle Soup’,
ingredients : ['chicken', 'noodles']
}
db.recipes.ensureIndex({ ingredients: 1 })
18. What can be indexed?
// Subdocuments
{
name : 'Apple Pie',
contributor: {
name: 'Joe American',
id: 'joea123'
}
}
db.recipes.ensureIndex({ 'contributor.id': 1 })
db.recipes.ensureIndex({ 'contributor': 1 })
19. How do I manage indexes?
// List a collection's indexes
db.recipes.getIndexes()
db.recipes.getIndexKeys()
// Drop a specific index
db.recipes.dropIndex({ ingredients: 1 })
// Drop all indexes and recreate them
db.recipes.reIndex()
// Default (unique) index on _id
20. Background Index Builds
// Index creation is a blocking operation that can take a long time
// Background creation yields to other operations
db.recipes.ensureIndex(
{ ingredients: 1 },
{ background: true }
)
22. Uniqueness Constraints
// Only one recipe can have a given value for name
db.recipes.ensureIndex( { name: 1 }, { unique: true } )
// Force index on collection with duplicate recipe names – drop the
duplicates
db.recipes.ensureIndex(
{ name: 1 },
{ unique: true, dropDups: true }
)
* dropDups is probably never what you want
23. Sparse Indexes
// Only documents with field calories will be indexed
db.recipes.ensureIndex(
{ calories: -1 },
{ sparse: true }
)
// Allow multiple documents to not have calories field
db.recipes.ensureIndex(
{ name: 1 , calories: -1 },
{ unique: true, sparse: true }
)
* Missing fields are stored as null(s) in the index
24. Geospatial Indexes
// Add latitude, longitude coordinates
{
name: '10gen Palo Alto’,
loc: [ 37.449157, -122.158574 ]
}
// Index the coordinates
db.locations.ensureIndex( { loc : '2d' } )
// Query for locations 'near' a particular coordinate
db.locations.find({
loc: { $near: [ 37.4, -122.3 ] }
})
25. TTL Collections
// Documents must have a BSON UTC Date field
{ 'status' : ISODate('2012-10-12T05:24:07.211Z'), … }
// Documents are removed after 'expireAfterSeconds' seconds
db.recipes.ensureIndex(
{ submitted_date: 1 },
{ expireAfterSeconds: 3600 }
)
26. Limitations
• Collections can not have > 64 indexes.
• Index keys can not be > 1024 bytes (1K).
• The name of an index, including the namespace, must be <
128 characters.
• Queries can only use 1 index*
• Indexes have storage requirements, and impact the
performance of writes.
• In memory sort (no-index) limited to 32mb of return data.
28. Profiling Slow Ops
db.setProfilingLevel( n , slowms=100ms )
n=0 profiler off
n=1 record operations longer than slowms
n=2 record all queries
db.system.profile.find()
* The profile collection is a capped collection, and fixed in size
29. The Explain Plan (Pre Index)
db.recipes.find( { calories:
{ $lt : 40 } }
).explain( )
{
"cursor" : "BasicCursor" ,
"n" : 42,
"nscannedObjects” : 12345
"nscanned" : 12345,
...
"millis" : 356,
...
}
* Doesn’t use cached plans, re-evals and resets cache
30. The Explain Plan (Post Index)
db.recipes.find( { calories:
{ $lt : 40 } }
).explain( )
{
"cursor" : "BtreeCursor calories_-1" ,
"n" : 42,
"nscannedObjects": 42
"nscanned" : 42,
...
"millis" : 0,
...
}
* Doesn’t use cached plans, re-evals and resets cache
31. The Query Optimizer
• For each "type" of query, MongoDB
periodically tries all useful indexes
• Aborts the rest as soon as one plan wins
• The winning plan is temporarily cached for
each “type” of query
32. Manually Select Index to Use
// Tell the database what index to use
db.recipes.find({
calories: { $lt: 1000 } }
).hint({ _id: 1 })
// Tell the database to NOT use an index
db.recipes.find(
{ calories: { $lt: 1000 } }
).hint({ $natural: 1 })
33. Use Indexes to Sort Query
Results
// Given the following index
db.collection.ensureIndex({ a:1, b:1 , c:1, d:1 })
// The following query and sort operations can use the index
db.collection.find( ).sort({ a:1 })
db.collection.find( ).sort({ a:1, b:1 })
db.collection.find({ a:4 }).sort({ a:1, b:1 })
db.collection.find({ b:5 }).sort({ a:1, b:1 })
34. Indexes that won’t work for
sorting query results
// Given the following index
db.collection.ensureIndex({ a:1, b:1, c:1, d:1 })
// These can not sort using the index
db.collection.find( ).sort({ b: 1 })
db.collection.find({ b: 5 }).sort({ b: 1 })
35. Index Covered Queries
// MongoDB can return data from just the index
db.recipes.ensureIndex({ main_ingredient: 1, name: 1 })
// Return only the ingredients field
db.recipes.find(
{ main_ingredient: 'chicken’ },
{ _id: 0, name: 1 }
)
// indexOnly will be true in the explain plan
db.recipes.find(
{ main_ingredient: 'chicken' },
{ _id: 0, name: 1 }
).explain()
{
"indexOnly": true,
}
38. Trying to Use Multiple
Indexes
// MongoDB can only use one index for a query
db.collection.ensureIndex({ a: 1 })
db.collection.ensureIndex({ b: 1 })
// Only one of the above indexes is used
db.collection.find({ a: 3, b: 4 })
39. Compound Key Mistakes
// Compound key indexes are very effective
db.collection.ensureIndex({ a: 1, b: 1, c: 1 })
// But only if the query is a prefix of the index
// This query can't effectively use the index
db.collection.find({ c: 2 })
// …but this query can
db.collection.find({ a: 3, b: 5 })
41. Regular Expressions
db.users.ensureIndex({ username: 1 })
// Left anchored regex queries can use the index
db.users.find({ username: /^joe smith/ })
// But not generic regexes
db.users.find({username: /smith/ })
// Or case insensitive queries
db.users.find({ username: /Joe/i })
When speaking: What are indexes and why do we need them?First part of this talk is conceptualSecond part is extremely detailed
Look at 7 documents
Queries, inserts and deletes: O(log(n)) time
MongoDB's indexes are B-Trees.Lookups (queries), inserts and deletes happen in O(log(n)) time.TODO: Add a page describing what a B-Tree is???
So this is helpful, and can speed up queries by a tremendous amount
So it’s imperative we understand them
Tell a story about a customer problem caused by a missing index.
Repeated calls to ensureIndex only result in one create message going to the server. The index is cached client side for some period of time (varies by driver).
Indexes can be costly if you have too manysoooo....
getIndexes returns an index document for each index in the collection.dropIndex requires the spec used to create the index initiallyreIndex drops *all* indexes (including the _id index) and rebuilds them
Caveats:Still a resource-intensive operationIndex build is slowerThe mongo shell session or app will block while the index buildsIndexes are still built in the foreground on secondariesKristine to provide replica set image.
unique applies a uniqueness constant on duplicate values.dropDups will force the server to create a unique index by only keeping the first document found in natural order with a value and dropping all other documents with that value.dropDups will likely result in data loss!!!TODO: Maybe add a red exclamation point for dropDups.
MongoDB doesn't enforce a schema – documents are not required to have the same fields.Sparse indexes only contain entries for documents that have the indexed field.Without sparse, documents without field 'a' have a null entry in the index for that field.With sparse a unique constraint can be applied to a field not shared by all documents. Otherwise multiple 'null' values violate the unique constraint.XXX: Is there a visual that makes sense here?
'2d' index is a geohash on top of the b-tree.Allows you to search for documents 'near' a latitude/longitude position. Bounds queries are also possible using $within.TODO: Google maps image, or something similar. Kristine to provide.
Index must be on a BSON date field.Documents are removed after expireAfterSeconds seconds.Reaper thread runs every 60 seconds.TODO: Hourglass image, or something similar. Kristine to provide.
Indexes are a really powerful feature of MongoDB, however there are some limitations.Understanding these limitations is an important part of using MongoDB correctly.With the exception of $or queries.If index key exceeds 1k, documents silently dropped/not included
Changingslowms also affects what queries are logged to the mongodb log file.
cursor – the type of cursor used. BasicCursor means no index was used. TODO: Use a real example here instead of made up numbers…n – the number of documents that match the querynscannedObjects – the number of documents that had to be scannednscanned – the number of items (index entries or documents) examinedmillis – how long the query tookRatio of n to nscanned should be as close to 1 as possible.
cursor – the type of cursor used. BasicCursor means no index was used.n – the number of documents that match the querynscannedObjects – the number of documents that had to be scannednscanned – the number of items (index entries or documents) examinedmillis – how long the query tookRatio of n to nscanned should be as close to 1 as possible.
Winning plan is reevaluated after 1000 write operations (insert, update, remove, etc.).TODO: Replace much of this with an animation? Kristine to provide.
Tells MongoDB exactly what index to use.
MongoDB sorts results based on the field order in the index.For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches.TODO: Better explanation
MongoDB sorts results based on the field order in the index.For queries that include a sort that uses a compound key index, ensure that all fields before the first sorted field are equality matches.TODO: Better explanation
TODO: Cookbook image here? Rework to go along with the cookbook example?
Tell a story about a customer problem caused by a suboptimal index.TODO: Change background color?
Better to use a compound index on the low selectivity field and some other more selective field.