Speaker: Daniel Coupal
At this point, you may be familiar with the design of MongoDB databases and collections, but what are the frequent patterns you may have to model?
This presentation will build on the knowledge of how to represent common relationships (1-1, 1-N, N-N) in MongoDB. Going further than relationships, it aims to identify a set of common patterns, much as the Gang of Four did for object-oriented design. Finally, it will guide you through the steps of modeling those patterns in MongoDB collections.
Speaker: Jay Runkel
When architecting a MongoDB application, one of the most difficult questions to answer is how much hardware (number of shards, number of replicas, and server specifications) am I going to need for an application. Similarly, when deploying in the cloud, how do you estimate your monthly AWS, Azure, or GCP costs given a description of a new application? While there isn't a precise formula for mapping application features (e.g., document structure, schema, query volumes) into servers, there are various strategies you can use to estimate the MongoDB cluster sizing. This presentation will cover the questions you need to ask and describe how to use this information to estimate the required cluster size or cloud deployment cost.
Speaker: Daniel Coupal
At this point, you may be familiar with the design of MongoDB databases and collections – but what are the frequent patterns you may have to model?
This presentation will add knowledge of how to represent common relationships (1-1, 1-N, N-N) in MongoDB. Going further than relationships, this presentation identifies a set of common patterns, in a similar way to what the Gang of Four did for Object Oriented Design. Finally, this presentation will guide you through the steps of modeling those patterns in MongoDB collections.
In this session, you will learn about:
How to create the appropriate MongoDB collections for some of the patterns discussed.
Differences in relationships vs. the relational database world, and how those differences translate to MongoDB collections.
Common patterns in developing applications with MongoDB, plus a specific vocabulary with which to refer to them.
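As a rough illustration of the 1-N relationship mentioned above, the sketch below shows two document shapes often discussed for MongoDB: embedding the "many" side in the parent, or referencing the parent from separate child documents. The customer and order fields, the collection layout, and the helper `orders_for` are all hypothetical and are not taken from the presentation; plain Python dictionaries stand in for documents.

```python
# Hypothetical data: one customer with many orders (a 1-N relationship).

# Option A: embed the "N" side inside the parent document.
customer_embedded = {
    "_id": 1,
    "name": "Ada",
    "orders": [
        {"order_id": 101, "total": 25.0},
        {"order_id": 102, "total": 40.0},
    ],
}

# Option B: keep children in a separate (hypothetical) "orders" collection
# and store a reference to the parent in each child document.
customer_referenced = {"_id": 1, "name": "Ada"}
orders = [
    {"_id": 101, "customer_id": 1, "total": 25.0},
    {"_id": 102, "customer_id": 1, "total": 40.0},
]


def orders_for(customer_id, order_docs):
    """Follow the reference on the application side."""
    return [o for o in order_docs if o["customer_id"] == customer_id]


# Both shapes hold the same information; they differ in how it is read.
embedded_total = sum(o["total"] for o in customer_embedded["orders"])
referenced_total = sum(o["total"] for o in orders_for(1, orders))
```

Embedding returns everything in one read, while referencing keeps the parent document small when the "many" side grows without bound; which one fits depends on the access pattern.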
Speaker: Alex Komyagin
MongoDB replica sets allow you to make the database highly available so that you can keep your applications running even when some of the database nodes are down. In a distributed system, local durability of writes with journaling is no longer enough to guarantee system-wide durability, as the node might go down just before any other node replicates new write operations from it. As such, we need a new concept of cluster-wide durability.
How do you make sure that your write operations are durable within a replica set? How do you make sure that your read operations do not see those writes that are not yet durable? This talk will cover the mechanics of ensuring durability of writes via write concern and how to prevent reading of stale data in MongoDB using read concern. We will discuss the decision flow for selecting an appropriate level of write concern, as well as associated tradeoffs and several practical use cases and examples.
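The idea of cluster-wide durability can be sketched with a little arithmetic: a write is treated as durable once a majority of the voting members have acknowledged it. The snippet below is a simplified model under that assumption, not MongoDB's implementation; the function names are hypothetical.

```python
def majority(voting_members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return voting_members // 2 + 1


def is_majority_acknowledged(acks: int, voting_members: int) -> bool:
    """True if enough members acknowledged the write to survive a failover
    in this simplified model."""
    return acks >= majority(voting_members)


# A three-member replica set needs two acknowledgements.
three_node_majority = majority(3)
# An acknowledgement from the primary alone is not enough.
primary_only_is_durable = is_majority_acknowledged(1, 3)
```

In this model, a write acknowledged only by the primary of a three-member set can still be lost if that node goes down before another member replicates it, which is the scenario the abstract describes.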
Open Source North - MongoDB Advanced Schema Design Patterns
Matthew Kalan
The hardest part of moving from a tabular database world to a modern world of objects and JSON is how to model your data. This year at OSN, Matt from MongoDB will take data modeling one step further than prior years and focus specifically on advanced schema design patterns to optimize the ease-of-use and performance of your data access layer and application.
Has your app taken off? Are you thinking about scaling? MongoDB makes it easy to horizontally scale out with built-in automatic sharding, but did you know that sharding isn't the only way to achieve scale with MongoDB?
In this webinar, we'll review three different ways to achieve scale with MongoDB. We'll cover how you can optimize your application design and configure your storage to achieve scale, as well as the basics of horizontal scaling. You'll walk away with a thorough understanding of options to scale your MongoDB application.
Topics covered include:
- Scaling Vertically
- Hardware Considerations
- Index Optimization
- Schema Design
- Sharding
When it comes time to select database software for your project, there are a bewildering number of choices. How do you know if your project is a good fit for a relational database, or whether one of the many NoSQL options is a better choice?
In this webinar you will learn when to use MongoDB and how to evaluate if MongoDB is a fit for your project. You will see how MongoDB's flexible document model is solving business problems in ways that were not previously possible, and how MongoDB's built-in features allow running at scale.
Topics covered include:
Performance and Scalability
MongoDB's Data Model
Popular MongoDB Use Cases
Customer Stories
Determining the root cause of performance issues is a critical task for Operations. In this webinar, we'll show you the tools and techniques for diagnosing and tuning the performance of your MongoDB deployment. Whether you're running into problems or just want to optimize your performance, these skills will be useful.
Speaker: Isabel Peters, Software Engineer, MongoDB
Track: WTC Lounge
Data backup is a critical process to keep your data safe and recoverable in case of an unexpected local storage failure. At MongoDB, we develop tools to easily back up your data, keep it safe, and restore it, so that you don’t have to worry or spend time thinking about the process and can focus on your other responsibilities. Come discover what the architecture of a backup system looks like.
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I...
How do you determine whether your MongoDB Atlas cluster is overprovisioned, whether the new feature in your next application release will crush your cluster, or when to increase cluster size based upon planned usage growth? MongoDB Atlas provides over a hundred metrics enabling visibility into the inner workings of MongoDB performance, but how do you apply all this information to make capacity planning decisions? This presentation will enable you to effectively analyze your MongoDB performance to optimize your MongoDB Atlas spend and ensure smooth application operation into the future.
Common Cluster Configuration Pitfalls and How to Avoid Them
Speaker: Andrew Young, Technical Services Engineer, MongoDB
Level: 200 (Intermediate)
Track: Operations
Learn best practices in sharding and replication from a MongoDB Technical Services Engineer with experience in a wide variety of customer environments. The talk will discuss standard system configurations, common pitfalls and mistakes when configuring MongoDB clusters, and ways to recover from replication and sharding problems that arise. We will also consider specific use cases that require unusual configurations such as multi-tenant systems, geographically distributed systems, and systems that require dedicated business intelligence servers.
Are you in the process of evaluating or migrating to MongoDB? We will cover key aspects of migrating to MongoDB from an RDBMS, including schema design, indexing strategies, data migration approaches as your implementation reaches various SDLC stages, and achieving operational agility through MongoDB Management Services (MMS).
Webinar: Choosing the Right Shard Key for High Performance and Scale
Read these webinar slides to learn how selecting the right shard key can future-proof your application.
The shard key that you select can impact the performance, capability, and functionality of your database.
Speaker: Jay Runkel, Principal Solution Architect, MongoDB
Session Type: 40 minute main track session
Track: Operations
When architecting a MongoDB application, one of the most difficult questions to answer is how much hardware (number of shards, number of replicas, and server specifications) am I going to need for an application. Similarly, when deploying in the cloud, how do you estimate your monthly AWS, Azure, or GCP costs given a description of a new application? While there isn’t a precise formula for mapping application features (e.g., document structure, schema, query volumes) into servers, there are various strategies you can use to estimate the MongoDB cluster sizing. This presentation will cover the questions you need to ask and describe how to use this information to estimate the required cluster size or cloud deployment cost.
What You Will Learn:
- How to architect a sharded cluster that provides the required computing resources while minimizing hardware or cloud computing costs
- How to use this information to estimate the overall cluster requirements for IOPS, RAM, cores, disk space, etc.
- What you need to know about the application to estimate a cluster size
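One way to picture the estimation described above is to size the cluster by whichever resource runs out first. The sketch below is a deliberately simple back-of-the-envelope model, not the method from the presentation: the function `estimate_shards`, its parameters, and all the numbers are hypothetical, and real sizing would also account for replication, growth, and headroom.

```python
import math


def estimate_shards(data_gb, working_set_gb, peak_iops,
                    disk_gb_per_shard, ram_gb_per_shard, iops_per_shard):
    """Return the shard count demanded by the most constrained resource."""
    by_disk = math.ceil(data_gb / disk_gb_per_shard)      # total data must fit on disk
    by_ram = math.ceil(working_set_gb / ram_gb_per_shard)  # working set should fit in RAM
    by_iops = math.ceil(peak_iops / iops_per_shard)        # disks must keep up at peak
    return max(by_disk, by_ram, by_iops, 1)


# Hypothetical application: 2 TB of data, 300 GB working set, 12,000 IOPS at peak.
shards = estimate_shards(
    data_gb=2000, working_set_gb=300, peak_iops=12000,
    disk_gb_per_shard=1000, ram_gb_per_shard=64, iops_per_shard=5000,
)
```

With these made-up inputs, RAM is the binding constraint (five shards), even though disk alone would need only two and IOPS only three.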
Webinar: Schema Patterns and Your Storage Engine
How do MongoDB’s different storage options change the way you model your data?
Each storage engine (WiredTiger, the In-Memory storage engine, MMAPv1, and other community-supported engines) persists data differently, writes data to disk in a different format, and handles memory resources in a different way.
This webinar will go through how to design applications around different storage engines based on your use case and data access patterns. We will look at concrete examples of schema design practices that were previously applied on MMAPv1 and whether those practices still apply to other storage engines like WiredTiger.
Topics for review: Schema design patterns and strategies, real-world examples, sizing and resource allocation of infrastructure.
Jumpstart: Using Aggregation for Analytics
Speaker: Ruben Terceño, Senior Solutions Architect, MongoDB
Level: 200 (Intermediate)
Track: Jumpstart
The MongoDB aggregation framework allows you to perform real-time analytics on your live operational data set. It's an important tool to understand when considering analytics options for your application. In this session we will give you an overview of basic aggregation functionality. You should walk away with an understanding of when to use the aggregation framework for your needs and how to leverage different functions for different purposes.
This is a Jumpstart session, held before the keynotes, designed to give you an overview of MongoDB aggregation basics so you can dive into more advanced sessions later in the day.
What You Will Learn:
- Discover the Aggregation Framework
- Understand the sweet spot for MongoDB Analytics
- Have fun crushing numbers!
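To give a flavor of basic aggregation functionality, here is a minimal sketch of a three-stage pipeline using the `$match`, `$group`, and `$sort` stages, written as the list of dictionaries a Python driver would accept. The `orders` documents, their field names, and the `run_locally` helper are hypothetical; the helper only mimics what this particular pipeline computes so the example can run without a server.

```python
from collections import defaultdict

# Pipeline: keep completed orders, sum the amount per region, largest first.
pipeline = [
    {"$match": {"status": "complete"}},
    {"$group": {"_id": "$region", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"revenue": -1}},
]

# Hypothetical input documents.
orders = [
    {"region": "EU", "status": "complete", "amount": 10.0},
    {"region": "EU", "status": "complete", "amount": 5.0},
    {"region": "US", "status": "complete", "amount": 20.0},
    {"region": "US", "status": "pending", "amount": 7.0},
]


def run_locally(docs):
    """Pure-Python stand-in for the pipeline above (illustration only)."""
    totals = defaultdict(float)
    for doc in docs:
        if doc["status"] == "complete":               # $match
            totals[doc["region"]] += doc["amount"]    # $group with $sum
    rows = [{"_id": region, "revenue": total} for region, total in totals.items()]
    return sorted(rows, key=lambda row: row["revenue"], reverse=True)  # $sort


result = run_locally(orders)
```

Against a real deployment, the same `pipeline` list would be handed to the collection's aggregate call instead of the local helper.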
Basics. Webinar 6: Production Deployment
This is the final webinar in the Basics series, which introduces the MongoDB database. In this webinar we will guide you through deploying to production.
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
MongoDB Europe 2016 - Using MongoDB to Build a Fast and Scalable Content Repo...
MongoDB can be used in the Nuxeo Platform as a replacement for more traditional SQL databases. Nuxeo's content repository, which is the cornerstone of this open source enterprise content management platform, integrates completely with MongoDB for data storage. This presentation will explain the motivation for using MongoDB and will emphasize the different implementation choices driven by the very nature of a NoSQL datastore like MongoDB. Learn how Nuxeo integrated MongoDB into the platform which resulted in increased performance (including actual benchmarks) and better response to some use cases.
One of MongoDB’s primary attractions for developers is that it gives them the ability to start application development without needing to define a formal, up-front schema. Operations teams appreciate the fact that they don't need to perform a time-consuming schema upgrade operation every time the developers need to store a different attribute.
Some projects reach a point where it's necessary to define rules on what's being stored in the database. This webinar explains how MongoDB 3.2 allows that document validation work to be performed by the database rather than in the application code.
This webinar focuses on the benefits of using document validation: how to set up the rules using the familiar MongoDB Query Language and how to safely roll it out into an existing, mature production environment.
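As a sketch of what such rules can look like, the snippet below builds a validator from ordinary query operators and wraps it in a `collMod` command document. The collection name `contacts` and the fields are hypothetical. The choice of `validationLevel: "moderate"` with `validationAction: "warn"` is one cautious rollout setting, assumed here for illustration rather than taken from the webinar.

```python
# Rules expressed with regular query operators (hypothetical fields).
validator = {
    "name": {"$type": "string"},
    "status": {"$in": ["active", "inactive"]},
}

# Command document to attach the rules to an existing collection.
# The command name must be the first key; Python dicts keep insertion order.
coll_mod = {
    "collMod": "contacts",           # hypothetical collection name
    "validator": validator,
    "validationLevel": "moderate",   # do not check updates to already-invalid documents
    "validationAction": "warn",      # log violations instead of rejecting writes
}
```

With a driver such as PyMongo, a command document like this would typically be sent through the database object's `command` method; once the logged warnings show that existing writers comply, the action could be tightened to `"error"`.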
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
Moving to a new home is daunting. Packing up all your things, getting a vehicle to move it all, unpacking it, updating your mailing address, and making sure you did not leave anything behind. Well, the move to MongoDB Atlas is similar, but all the logistics are already figured out for you by MongoDB.
Hidden inside MongoDB is the WiredTiger data engine, an Open Source, pluggable storage engine that became the database's default in 3.2. Written in C, WiredTiger uses a variety of techniques to provide unmatched performance, low latency and scalability. This talk will explore data structures and techniques C/C++ programmers can use to support heavily threaded applications on modern hardware, using examples from the WiredTiger code base. Data structures and techniques to be covered include hazard pointers, skiplists, ticket locks, atomic instructions and memory barriers.
MongoDB Days Silicon Valley: Best Practices for Upgrading to MongoDB
Presented by Achille Brighton, Principal Consulting Engineer, MongoDB
Experience level: Deep dive
MongoDB 3.2 brings major enhancements. New pluggable storage engines optimized for in-memory computing and the most security-sensitive applications. Simplified data governance with document validation, coupled with GUI-based schema discovery and visualization. Improved operational efficiency with enhanced management platforms, continuous uptime across distributed, multi-region deployments, and zero-downtime upgrades. To take advantage of these features, your team needs an upgrade plan. In this session, we’ll walk you through how to build an upgrade plan. We’ll show you how to validate your existing deployment, build a test environment with a representative workload, and detail how to carry out the upgrade. You’ll walk away confident that you're prepared to upgrade.
Speaker: Akira Kurogane, Senior Technical Services Engineer, MongoDB
Level: 300 (Advanced)
Track: Performance
One week your active dataset consumes 90% of available RAM. The next week it's 110%. Is that a 10% or 99% performance degradation? Let's discover what it looks like when different hardware capacity limitations are hit. For example, memory vs. disk bottlenecks, the rare CPU bottleneck and network bottlenecks, seeing what happens when you drop a crucial index during peak load, or what happens when you run multiple WiredTiger nodes on the same server without limiting their cache size.
What You Will Learn:
- Performance analysis
- Post-mortem log analysis
- Capacity planning
Speaker: Michael Cahill, Director of Engineering (Storage), MongoDB
Level: 300 (Advanced)
Track: How We Build MongoDB
When the WiredTiger storage engine was created, the use case we had in mind was applications with modest numbers of collections. That led to various choices during the design, such as storing each collection in a separate file. However, MongoDB customers have an enormous variety of use cases, including multi-tenant applications where each user has a separate database and each database contains hundreds of collections.
To support these applications efficiently, we have evolved the storage layer with better testing, better analysis tools, and more scalable data structures and algorithms. This session will explain how we can now run workloads with over a million collections.
What You Will Learn:
- How MongoDB represents collections and indexes in the storage layer.
- The system resources involved in accessing multiple collections and indexes from different client connections.
- The system limits MongoDB may hit as you add more collections, and how to increase or work around those limits.
Speaker: Daniel Coupal, Senior Curriculum Engineer, MongoDB
Level: 200 (Intermediate)
Track: Developer
At this point, you may be familiar with the design of MongoDB databases and collections, but what are the frequent patterns you may have to model?
This presentation builds on the knowledge of how to represent common relationships (1-1, 1-N, N-N) in MongoDB. Going further than relationships, it identifies a set of common patterns, in a similar way to what the Gang of Four did for object-oriented design. Finally, it guides you through the steps of modeling those patterns in MongoDB collections.
What You Will Learn:
- How to create the appropriate MongoDB collections for some of the patterns discussed.
- How relationships differ from the relational database world, and how those differences translate to MongoDB collections.
- The patterns that are frequently seen in developing applications with MongoDB, and a specific vocabulary with which to refer to them. For example, “Subset”, “Attributes” and “Rolled Up” are among the patterns explored.
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
SQL Server Managing Test Data & Stress Testing January 2011 (Mark Ginnebaugh)
A quick look at some of the available functionality for SQL Server developers who have access to Visual Studio 2010 and SQL-Hero.
With Visual Studio 2010 Premium (and Professional to a degree) delivering similar capabilities to what was available in VS 2008 Database Pro Edition, the ability to generate a mass amount of sample data for your database has only gotten more accessible with time.
Realizing that other tools exist in this space and not all SQL developers use Visual Studio, we’ll also take a look at the third party data generation facility available in SQL-Hero, seeing how we can create thousands (or millions!) of records very quickly using a powerful rules engine, plus automate this process to support continuous integration strategies.
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
SmugMug: From MySQL to Amazon DynamoDB (DAT204) | AWS re:Invent 2013 (Amazon Web Services)
SmugMug.com is a popular hosting and commerce platform for photo enthusiasts with hundreds of thousands of subscribers and millions of viewers. Learn how SmugMug uses Amazon DynamoDB to provide customers detailed information about millions of daily image and video views. SmugMug shares code and information about their stats stack, which includes an HTTP interface to Amazon DynamoDB and also interfaces with their internal PHP stack and other tools such as Memcached. Get a detailed picture of lessons learned and the methods SmugMug uses to create a system that is easy to use, reliable, and high performing.
Framing the Argument: How to Scale Faster with NoSQL (Inside Analysis)
The Briefing Room with Dr. Robin Bloor and IBM Cloudant
Live Webcast March 24, 2015
Watch the Archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=e8bf62408d47e76c43aa73be08377e41c
Context matters. Perspective matters. Thinking outside the box? That's often the key! While the Structured Query Language remains the lingua franca of data, there are some views of the world that are best rendered with the benefit of NoSQL engines. As usual, that's easier said than done. How can your organization migrate from a structured query language to an unstructured or semi-structured one?
Register for this episode of The Briefing Room to find out! Veteran Analyst Dr. Robin Bloor will provide a detailed assessment of serious considerations when using NoSQL engines in conjunction with SQL. He'll be briefed by Ryan Millay of IBM Cloudant, who will showcase his company's solution, and how it's addressing the more vexing challenges facing today's information managers.
Visit InsideAnalysis.com for more information.
[MongoDB.local Bengaluru 2018] Jumpstart: Introduction to Schema Design (MongoDB)
Presented by: Saurabh Kashikar
Abstract: MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. If you are new to MongoDB, learn the schema design basics in this introductory session. This session will help you model basic relationships in MongoDB.
What You Will Learn:
- The fundamentals of the MongoDB document model.
- How to model 1-1 and one-to-many (1-N) relationships in MongoDB.
- How to model many-to-many (N-N) relationships in MongoDB.
Learn how to achieve scale with MongoDB. In this presentation, we cover three different ways to scale MongoDB, including optimization, vertical scaling, and horizontal scaling.
Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe (Flip Kromer)
This talk centers on two things: a set of patterns for the architecture of high-scale data systems; and a framework for understanding the tradeoffs we make in designing them.
Hear Ryan Millay, IBM Cloudant software development manager, discuss what you need to consider when moving from the world of relational databases to a NoSQL document store.
You'll learn about the key differences between relational databases and JSON document stores like Cloudant, as well as how to dodge the pitfalls of migrating from a relational database to NoSQL.
An Introduction To Software Development - Software Development Midterm Review (Blue Elephant Consulting)
This presentation is part of the COP2271C college-level course taught at Florida Polytechnic University in Lakeland, Florida. The purpose of this course is to introduce freshman students to both the process of software development and the Python language.
The course is one semester in length and meets for 2 hours twice a week. The Instructor is Dr. Jim Anderson.
A video of Dr. Anderson using these slides is available on YouTube at:
http://youtu.be/IgrPAlFVWbw
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... (MongoDB)
The MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart (MongoDB)
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... (MongoDB)
Query performance should be the unsung hero of an application, but without proper configuration, it can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices for building multikey indexes.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ (MongoDB)
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang (MongoDB)
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app... (MongoDB)
to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning... (MongoDB)
It has never been easier to order online and get delivery in under 48 hours, very often for free. This ease of use hides a complex market worth more than $8 trillion.
Data is well known in the supply chain world (routes, information about goods, customs, …), but the value of this operational data remains largely untapped. By combining business expertise and data science, Upply is redefining the fundamentals of the supply chain by enabling each of its players to overcome the volatility and inefficiency of the market.
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB (MongoDB)
Every company is becoming a software company, providing customer solutions for accessing a variety of services and information. Companies are now starting to derive value from their data and gain better insights for the business. A crucial challenge is ensuring that this data is always available and secured, in line with the company's business objectives and each country's regulatory constraints. MongoDB provides the security layer you need; come and find out how to secure your data with MongoDB.
Advanced Schema Design Patterns
1. OCTOBER 12, 2017 | BESPOKE | SAN FRANCISCO
#MDBlocal
Advanced Schema
Design Patterns
2. #MDBlocal
{ "name": "Daniel Coupal",
"jobs_at_MongoDB": [
{ "job": "Senior Curriculum Engineer",
"from": new Date("2016-11") },
{ "job": "Senior Technical Service Engineer",
"from": new Date("2013-11") }
],
"previous_jobs": [
"Consultant",
"Developer",
"Manager Quality & Tools Team",
"Manager Software Team",
"Tools Developer"
],
"likes": [ "food", "beers", "movies", "MongoDB" ]
}
Who Am I?
3. #MDBlocal
The "Gang of Four":
A design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems.
MongoDB systems can also be built using their own patterns.
Pattern
4. #MDBlocal
• Enable teams to use a common methodology and vocabulary when designing schemas for MongoDB
• Give you the ability to model schemas using building blocks
• Less art and more methodology
Why this Talk?
5. #MDBlocal
Ensure:
• Good performance
• Scalability
despite constraints ➡
• Hardware
  • RAM faster than Disk
  • Disk cheaper than RAM
  • Network latency
  • Reduce costs $$$
• Database Server
  • Maximum size for a document
  • Atomicity of a write
• Data set
  • Size of data
Why do we Create Models?
6. #MDBlocal
• Don’t over-design!
• Design for:
  • Performance
  • Scalability
  • Simplicity
However …
7. #MDBlocal
WMDB - World Movie Database
Any events, characters and entities depicted in this presentation are fictional.
Any resemblance or similarity to reality is entirely coincidental.
8. #MDBlocal
WMDB - World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings
9. #MDBlocal
Our mission, should we decide to accept it, is to fix this solution, so it can perform well and scale.
As always, should I or anyone in the audience do it without training, WMDB will disavow any knowledge of our actions.
This tape will self-destruct in five seconds. Good luck!
Mission Possible
10. #MDBlocal
Categories of Patterns
• Frequency of Access
  • Subset ✓
  • Approximation ✓
• Grouping
  • Computed ✓
  • Overflow
  • Bucket
• Representation
  • Attribute ✓
  • Schema Versioning ✓
  • Document Versioning
  • Tree
  • Pre-Allocation
11. #MDBlocal
{
  title: "Moonlight",
  ...
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10"
}
Would need the following indexes:
{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_Mill_Valley: 1 }
...
Issue #1: Big Documents, Many Fields and Many Indexes
12. #MDBlocal
Pattern #1: Attribute
{
  title: "Moonlight",
  ...
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10"
}
13. #MDBlocal
Problem:
• Lots of similar fields
• Common characteristic to search across those fields together
• Fields present in only a small subset of documents
Use cases:
• Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
• Release dates of a movie in different countries, festivals
Attribute Pattern
14. #MDBlocal
Solution:
• Field pairs in an array
Benefits:
• Allows for a non-deterministic list of attributes
• Easy to index
{ "releases.location": 1, "releases.date": 1 }
• Easy to extend with a qualifier, for example:
{ descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
Attribute Pattern - Solution
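As a rough illustration of the reshaping described above, the sketch below moves the `release_*` fields of the "Moonlight" document into a `releases` array of location/date pairs. It runs in plain JavaScript with no database involved. The helper name `toAttributePattern` and its `prefix` parameter are hypothetical, chosen only for this example.

```javascript
// Hypothetical helper (not from the talk): collects every field whose name
// starts with `prefix` into an array of { location, date } pairs.
function toAttributePattern(doc, prefix = "release_") {
  const rest = {};
  const releases = [];
  for (const [key, value] of Object.entries(doc)) {
    if (key.startsWith(prefix)) {
      releases.push({ location: key.slice(prefix.length), date: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, releases };
}

const moonlight = {
  title: "Moonlight",
  release_USA: "2016/09/02",
  release_Mexico: "2017/01/27",
  release_France: "2017/02/01",
  release_Festival_Mill_Valley: "2017/10/10",
};

const reshaped = toAttributePattern(moonlight);
// One compound index can now serve every location, as on the slide:
//   { "releases.location": 1, "releases.date": 1 }
console.log(JSON.stringify(reshaped, null, 2));
```

The point of the reshaping is that a new country or festival becomes one more array element rather than a new field that needs its own index.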
15. #MDBlocal
Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards
Issue #2: Working Set doesn’t fit in RAM
16. #MDBlocal
WMDB - World Movie Database
First iteration
3 collections:
A. movies
B. moviegoers
C. screenings
17. #MDBlocal
In this example, we can:
• Limit the list of actors and crew to 20
• Limit the embedded reviews to the top 20
• …
Pattern #2: Subset
18. #MDBlocal
Problem:
• There is a 1-N or N-N relationship, and only a few of the related documents always need to be shown
• Only infrequently do you need to pull all of the dependent documents
Use cases:
• Main actors of a movie
• List of reviews or comments
Subset Pattern
19. #MDBlocal
Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
Subset Pattern - Solution
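A minimal sketch of the idea, assuming a movie document that starts out with all of its reviews embedded: only the top few stay in the movie document, and the full list goes to a separate reviews collection. The helper `applySubsetPattern` and the fields `rating`, `user` and `movie_id` are assumptions made for this illustration, not names from the talk.

```javascript
// Hypothetical helper: keeps only the `limit` highest-rated reviews embedded
// in the movie document and emits every review as a standalone document.
function applySubsetPattern(movie, limit = 20) {
  const sorted = [...movie.reviews].sort((a, b) => b.rating - a.rating);
  const movieDoc = { ...movie, reviews: sorted.slice(0, limit) };
  const reviewDocs = movie.reviews.map((r) => ({ movie_id: movie._id, ...r }));
  return { movieDoc, reviewDocs };
}

const movie = {
  _id: 1,
  title: "Your Name",
  reviews: [
    { user: "a", rating: 3 },
    { user: "b", rating: 5 },
    { user: "c", rating: 4 },
  ],
};

// A limit of 2 is used here only to keep the example small.
const { movieDoc, reviewDocs } = applySubsetPattern(movie, 2);
console.log(movieDoc.reviews.map((r) => r.user)); // [ 'b', 'c' ]
console.log(reviewDocs.length); // 3
```

The embedded copies are duplicates of data in the reviews collection, so some process has to keep them in sync; how often is a design choice.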
20.
21. #MDBlocal
• How duplication is handled
A. Update both source and target in real time
B. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?
Aspect of Patterns: Consistency
23. #MDBlocal
{
  title: "Your Name",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}
Issue #3: ..caused by repeated calculations
24. #MDBlocal
For example:
• Apply a sum, count, ...
• Roll up data by minute, hour, day
• As long as you don’t mess with your source, you can recreate the rollups
Pattern #3: Computed
25. #MDBlocal
Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
  • Example: 1K writes per hour vs 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing
Computed Pattern
26. #MDBlocal
Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
• Replaces a view
Computed Pattern - Solution
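To make the idea concrete, here is a small sketch that computes per-movie totals once from screening documents, so the result can be stored and read many times. The screening fields (`movie_id`, `viewers`, `revenue`) and the function `computeTotals` are assumptions for this example; the output field names follow the "Your Name" document shown earlier.

```javascript
// Hypothetical screening documents; the field names are assumptions.
const screenings = [
  { movie_id: 1, viewers: 120, revenue: 1500 },
  { movie_id: 1, viewers: 80, revenue: 1000 },
  { movie_id: 2, viewers: 50, revenue: 600 },
];

// Computes the totals per movie in one pass, so that reads can use the
// stored result instead of repeating the sums.
function computeTotals(docs) {
  const totals = new Map();
  for (const s of docs) {
    const t = totals.get(s.movie_id) ?? { viewings: 0, viewers: 0, revenues: 0 };
    t.viewings += 1;
    t.viewers += s.viewers;
    t.revenues += s.revenue;
    totals.set(s.movie_id, t);
  }
  return totals;
}

const totals = computeTotals(screenings);
console.log(totals.get(1)); // { viewings: 2, viewers: 200, revenues: 2500 }
```

Because the source screenings are left untouched, the stored totals can always be rebuilt from them.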
27. #MDBlocal
Issue #4: Lots of Writes
Web page counters
Updates on movie data
Screenings
Other
28. #MDBlocal
Issue #4: … for non critical data
29. #MDBlocal
• Only increment once in X iterations
• Increment by X
Pattern #4: Approximation
30. #MDBlocal
Problem:
• Data is difficult to calculate correctly
• May be too expensive to update the document every time to keep an exact count
• No one gives a damn if the number is exact
Use cases:
• Population of a country
• Web site visits
Approximation Pattern
31. #MDBlocal
Solution:
• Fewer, stronger writes
Benefits:
• Fewer writes, reducing contention on some documents
Approximation Pattern - Solution
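One possible reading of "only increment once in X iterations, increment by X" is a probabilistic counter: each event triggers a write with probability 1/X, and that write adds X. The sketch below assumes this interpretation. The `write` callback stands in for the real database update (for instance a `$inc` on a page-view field), and `makeApproximateCounter` is a hypothetical name.

```javascript
// Hypothetical counter wrapper. `write(n)` stands in for the database update;
// `random` is injectable so the behavior can be tested deterministically.
function makeApproximateCounter(write, x = 10, random = Math.random) {
  return function recordEvent() {
    // Roughly one event in x triggers a write, and that write adds x,
    // so the stored value stays close to the true count on average.
    if (random() < 1 / x) {
      write(x);
    }
  };
}

let stored = 0;
let writes = 0;
const recordView = makeApproximateCounter((n) => {
  stored += n;
  writes += 1;
}, 10);

for (let i = 0; i < 100000; i++) recordView();
// `stored` should land near 100000 with only about 10000 writes.
console.log({ stored, writes });
```

A deterministic variant (count locally, flush every X events) would also fit the slide; the random version avoids keeping per-process state.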
32. #MDBlocal
• Keeping track of the schema version of a document
Issue #5: Need to change the list of fields in the documents
33. #MDBlocal
Add a field to track the schema version number, per document
Does not have to exist for version 1
Pattern #5: Schema Versioning
34. #MDBlocal
Problem:
• Updating the schema of a database is:
• Not atomic
• Long operation
• May not want to update all documents, only do it on updates
Use cases:
• Practically any database that will go to production
Schema Versioning Pattern
35. #MDBlocal
Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
Schema Versioning Pattern - Solution
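A minimal sketch of lazy migration with a version field, assuming the field is called `schema_version` and that documents without it are version 1 (as the earlier slide allows). The single upgrade step shown, which moves `release_USA` into a `releases` array, is a made-up example; `upgrades`, `LATEST` and `upgradeToLatest` are hypothetical names.

```javascript
// Hypothetical upgrade steps, keyed by the version they upgrade FROM.
const upgrades = {
  1: (doc) => {
    // v1 -> v2 (illustrative change): move release_USA into a releases array.
    const { release_USA, ...rest } = doc;
    const releases = release_USA ? [{ location: "USA", date: release_USA }] : [];
    return { ...rest, releases, schema_version: 2 };
  },
};
const LATEST = 2;

// Brings one document up to the latest version; intended to run when the
// application reads or next modifies that document.
function upgradeToLatest(doc) {
  let current = doc;
  let version = current.schema_version ?? 1; // a missing field means version 1
  while (version < LATEST) {
    current = upgrades[version](current);
    version = current.schema_version;
  }
  return current;
}

const oldDoc = { title: "Moonlight", release_USA: "2016/09/02" };
const newDoc = upgradeToLatest(oldDoc);
console.log(newDoc);
```

Documents of both versions can coexist in the collection, so nothing forces a single long-running migration.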
36. #MDBlocal
• Bucket
  • Grouping documents together, to have fewer documents
• Document Versioning
  • Tracking of content changes in a document
• Outlier
  • Avoid letting a few documents drive the design and impact performance for all
• Tree(s)
• Pre-allocation
Other Patterns
38. #MDBlocal
• Simple grouping from tables to collections is not optimal
• Learn a common vocabulary for designing schemas with MongoDB
• Use patterns as "plug-and-play" for your future designs
  • Attribute
  • Subset
  • Computed
  • Approximation
  • Schema Versioning
Take Aways
39. #MDBlocal
A full design example for a given problem:
• E-commerce site
• Content Management System
• Social Networking
• Single view
• …
References for complete Solutions
40. #MDBlocal
• More patterns in a follow-up to this presentation
• MongoDB in-person training courses on Schema Design
• Upcoming Online course at MongoDB University:
  • https://university.mongodb.com
  • M220 Data Modeling
How Can I Learn More About Schema Design?
41. #MDBlocal
daniel.coupal@mongodb.com
Thank You for using MongoDB!
Editor's Notes
Welcome
[Remember]
Beware of transitions, keep them smooth
[TODOs]
Add the page numbers
Drawing of a working set
Consider removing ":" in the slide titles
Consider changing "revenues" => revenue, in few slides
More on the value and use cases for each pattern
Previous Jobs, Order of likes, =>Gang of Four
I like Food, Beers and Movies … and MongoDB.
My inspiration for this talk comes from the "Gang of Four".
How many of you are familiar with the "Gang of Four"?
Building blocks, Some patterns, => Same for MongoDB
Basically the ones who wrote this book on "Design Patterns"
GOF are Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides
https://en.wikipedia.org/wiki/Design_Patterns
Key words are "Elements of Reusable Software"
Assemble their experience on designing and implementing software over the years
They found that a lot of the solutions were sharing some "patterns"
Examples of patterns from "Design Patterns"
Types: Creational (5), Structural (7), Behavioral (11)
Singleton (restrict the creation to a single object for a given class)
Observer (number of objects to see an event)
Command (user operation)
Decorator (embellishing a UI element)
Memento (ability to restore an object to a previous state)
…
So, they went and made a catalog of those "patterns".
The idea is to enable people who write software to share a common language and have building blocks for solutions.
10 Years, Vocabulary, Building Blocks, "Art", => Example
We use this content in our internal trainings; however, this is the first time we are presenting it at a conference, well… including the "data modeling" workshop we ran yesterday.
The goal is not to teach you about doing schema design.
I am expecting you to either have done some with MongoDB or with a Relational Database
My goal is to help you formalize the process of creating schemas for MongoDB, and to help you work as a team by sharing visuals and vocabulary.
Performance & scalability, "air"
Before we get going, let's just answer why we create models.
In a perfect world, you don't really have to model.
I mean if everything is super fast and resources are abundant, you really don't care where and how data is stored
Every day I get up I don't make plans on how I will breathe air.
However if you go to space or under water, you will need a "design" that will let you get the amount of air you need.
Design is optional, cost of developer, 5 or 10 shards?
If performance is not an issue, meaning you have resources to spare, then you are likely to model for simplicity. The reason is that software engineers are very expensive. You may not think so, but your manager does.
If you need to shard the database, it is likely that performance is very important
Why use 10 shards if you can cut the number of operations (reads and writes) in half and do the same work with 5 shards?
Entities
In order to illustrate this talk, let's assume there is a site called the "World Movie Database".
This site is so popular that everyone goes there on Thursdays before the release of new movies and it crashes the site.
Then some people tried to migrate the site to a NoSQL database, MongoDB obviously.
Collections, grouping not optimal, =>accept challenge
This is the first attempt at moving the schema from Relational to MongoDB.
There are 3 collections: movies, moviegoers and screenings.
Simply grouping entities into collections is not optimal.
The solution using this design did not perform much better than the previous one.
This is still normalized. When you remove this restriction, duplication is fine, 1-1 relationships are fine.
You open the door to some important transformations.
Those will be our patterns.
[NOTE] Use "Sync Visibility" once you activate the color layer to also see it in the PNG file.
Perform & Scale, without training, disavow
Our goal, needless to say, is to fix this website before it meets the same fate as this tape recorder.
GoF, top 5 patterns in order,
We will use patterns, like the Gang of Four.
Most patterns can be grouped in 3 categories.
We will cover those patterns identified with check marks in this presentation.
Also, I will cover the patterns in order of importance, or so.
For the other ones, I will refer you to the slides of this presentation and subsequent content we will have on the subject.
How do I search on movies being released on a given date in the USA?
The same would apply to products you could see on an e-commerce site.
For example, clothes may have a size that is expressed as S, M, L, while for some other products like a laptop, size would be something like 13", 15".
If you noticed from my personal info, I did use that pattern.
That allowed me to list my jobs at MongoDB and associate them with a given date.
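To make the release-date question concrete, here is a minimal sketch, assuming the dates are stored as an array of key/value pairs rather than one field per country. The field names (`releases`, `k`, `v`) and the sample movies are made up for illustration, and plain Python dicts stand in for MongoDB documents.

```python
from datetime import date

# Hypothetical movie documents: one {k, v} pair per country instead of
# separate fields such as release_USA, release_France, and so on.
movies = [
    {"title": "Movie A", "releases": [
        {"k": "USA", "v": date(2017, 6, 23)},
        {"k": "France", "v": date(2017, 7, 5)},
    ]},
    {"title": "Movie B", "releases": [
        {"k": "USA", "v": date(2017, 7, 5)},
    ]},
]


def released_on(docs, country, day):
    """Return titles having one pair that matches both country and day."""
    return [
        doc["title"]
        for doc in docs
        if any(r["k"] == country and r["v"] == day for r in doc["releases"])
    ]


print(released_on(movies, "USA", date(2017, 6, 23)))
```

In MongoDB itself, the same match would typically be written with `$elemMatch` on the array, and a single compound index on the key and value fields could then serve every country.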
Inventory of things to insure
Polymorphic entities
Vehicles: submarine, car
"Adding a qualifier on the attribute" may be "currency"
Working set, imagine no more RAM
With everyone pounding on the WMDB site, it was observed that the working set does not fit in memory.
What can you do?
Looking at the design, we see that we are putting all the actors and all the reviews for a given movie in the main document.
[TODO] Add a drawing showing what the working set is
The collection "castandcrew" contains all the actors, but also the producers, costume makers, stunts, etc.
For this pattern to be worth it, it has to have a fair amount of information left aside.
Top level information for a first page
If this is slow, you may not keep your users on the site
You want them to validate that this is what they want, then dig for more if needed
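As a rough illustration of that split, the sketch below keeps only the first few cast members in the movie document and puts everyone in a separate `castandcrew` list. The field names, the `top_n` cut-off, and the sample data are assumptions, and plain dicts stand in for the two collections.

```python
# Hypothetical full document, before applying the split.
full_movie = {
    "_id": 1,
    "title": "Movie A",
    "cast": [
        {"name": "Actor 1", "role": "lead"},
        {"name": "Actor 2", "role": "support"},
        {"name": "Producer 1", "role": "producer"},
    ],
}


def split_movie(movie, top_n=2):
    """Return (summary document, castandcrew documents) for one movie."""
    summary = {
        "_id": movie["_id"],
        "title": movie["title"],
        # Only the most important people are kept (duplicated) here.
        "cast": movie["cast"][:top_n],
    }
    # Everyone goes to the side collection, with a reference to the movie.
    castandcrew = [{"movie_id": movie["_id"], **p} for p in movie["cast"]]
    return summary, castandcrew


summary, castandcrew = split_movie(full_movie)
print(summary)
print(len(castandcrew))
```

The first page only needs `summary`; the larger list is read when a user digs for more.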
Let's take a pause there.
Don't go get popcorn, not yet, this is just an intermission from our pattern list.
[TODO] make this "intermission" more appealing
Let’s pause from our pattern list, and let’s examine a characteristic or aspect of some patterns.
As you may guess, people pay attention to the popularity of the movies.
So, metrics like "revenues" and "viewers" are really important.
In the current design, those numbers are calculated every time the page of a movie is displayed.
Let’s calculate those numbers once in a while and stick the results on the page instead.
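A minimal sketch of that idea, assuming a hypothetical `screenings` list that carries per-screening `viewers` and `revenue`, and a periodic job that stores the totals on each movie. Plain dicts stand in for the collections and all names are illustrative.

```python
# Hypothetical per-screening records.
screenings = [
    {"movie_id": 1, "viewers": 120, "revenue": 1200.0},
    {"movie_id": 1, "viewers": 80, "revenue": 800.0},
    {"movie_id": 2, "viewers": 50, "revenue": 450.0},
]
movies = {1: {"title": "Movie A"}, 2: {"title": "Movie B"}}


def refresh_totals(movies, screenings):
    """Recompute the totals once and store them on the movie documents."""
    totals = {}
    for s in screenings:
        t = totals.setdefault(s["movie_id"], {"viewers": 0, "revenue": 0.0})
        t["viewers"] += s["viewers"]
        t["revenue"] += s["revenue"]
    for movie_id, t in totals.items():
        movies[movie_id].update(t)


# Run from a scheduled job, not on every page view.
refresh_totals(movies, screenings)
print(movies[1])
```

Displaying a movie page then reads the stored numbers instead of scanning the screenings again.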
Also refer to "Rolled up" as CQRS - Command Query Responsibility Segregation
According to Bryan, that sounds good at a Party.
Another thing that was observed with the current design is that trying to keep track of all page views of the site resulted in very poor performance. That was seen for both MMAPv1 and WT.
In MMAPv1, you get a lot of threads looking for the write lock.
While with WT, you get a lot of write conflicts that need to be retried.
One solution is to record "good enough" numbers. No one cares whether the count is 100 million or 100 million and a few. What is the tolerance level here? Let’s assume 1000.
In this case, we let the application increment the page views by 1000, but only 1/1000th of the time. Statistically, we should get a result very close to the exact count while doing only 1/1000th of the writes.
To draw a parallel with a movie: we never see a movie as a continuous image. The movie is made by displaying 24 static images per second, yet this is enough for our eyes not to see the discontinuities.
How do you do that? Let’s have the application run a (X mod 1000) operation, where X is a random number. If the result is 0, let’s update the counter by 1000.
You can have a counter. Once you reach the count, you do the write.
Or you can use a random generator and when you get a specific value, you do the write.
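Both variants can be sketched in a few lines. This is an illustration only: the `page` dict stands in for the document holding the counter, `writes` is a bookkeeping field added here just to show how few writes happen, and the tolerance of 1000 is the value assumed above.

```python
import random

TOLERANCE = 1000


def record_view_random(page, rng):
    """Random variant: write about once per TOLERANCE views."""
    if rng.randrange(TOLERANCE) == 0:
        page["views"] += TOLERANCE
        page["writes"] += 1


def record_view_counter(page, state):
    """Counter variant: count in the application, write when the count is reached."""
    state["pending"] += 1
    if state["pending"] == TOLERANCE:
        page["views"] += TOLERANCE
        page["writes"] += 1
        state["pending"] = 0


rng = random.Random(42)
random_page = {"views": 0, "writes": 0}
counter_page = {"views": 0, "writes": 0}
state = {"pending": 0}

for _ in range(1_000_000):
    record_view_random(random_page, rng)
    record_view_counter(counter_page, state)

print(random_page)
print(counter_page)
```

Against MongoDB, the write itself would be an `$inc` of the counter by 1000. The random variant needs no state shared between application servers, which is why it tends to be the simpler choice.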
As you may guess, this simple pattern is also applicable to Relational databases.
… it is just that NoSQL people have more tricks to handle performance bottlenecks.
Let's face it, configuration management and databases usually don't work well together.
Databases tend to keep the "latest" state of your data, while "CM" systems remember everything. Those of you who checked in stupid mistakes in Git, ClearCase, etc. know what I am talking about.
For this pattern, we are keeping track of the shape of the document. We are not addressing keeping track of the different contents of the document itself. That other case is solved by the Document Versioning pattern.
Instead of using a "version" field, we could discover the version number based on fields
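Here is a small sketch of both options, assuming two made-up document shapes: an old one with top-level `phone` and `email` fields, and a newer one with a `contacts` array. The field name `schema_version` and the shapes themselves are assumptions for illustration.

```python
OLD_FIELDS = ("phone", "email")


def detect_version(doc):
    """Use the explicit field if present, otherwise infer from the shape."""
    if "schema_version" in doc:
        return doc["schema_version"]
    return 2 if "contacts" in doc else 1


def upgrade(doc):
    """Return the document in the version 2 shape, converting if needed."""
    if detect_version(doc) >= 2:
        return doc
    contacts = [{"k": k, "v": doc[k]} for k in OLD_FIELDS if k in doc]
    kept = {k: v for k, v in doc.items() if k not in OLD_FIELDS}
    return {**kept, "contacts": contacts, "schema_version": 2}


old_doc = {"name": "Ann", "phone": "555", "email": "a@x"}
print(upgrade(old_doc))
```

The application can then read documents of either shape and upgrade them lazily, for example when they are next written.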
- A few million references would not even fit into an embedded array. And if they did, you would not want to construct a query by passing a million values to the $in operator.
We touched a little bit on the bucket pattern when we looked at the outlier one. The bucket pattern lets you group X sub-documents into one document. When the bucket is full, you create another one.
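A minimal sketch of that "fill, then open a new one" logic, using reviews of a movie as the sub-documents. The bucket size of 3 and the field names (`key`, `count`, `items`) are hypothetical, and a list of dicts stands in for the collection.

```python
BUCKET_SIZE = 3  # hypothetical capacity, kept tiny for the example


def add_to_bucket(buckets, key, item):
    """Append item to an open bucket for key, creating one when all are full."""
    open_bucket = next(
        (b for b in buckets if b["key"] == key and b["count"] < BUCKET_SIZE),
        None,
    )
    if open_bucket is None:
        open_bucket = {"key": key, "count": 0, "items": []}
        buckets.append(open_bucket)
    open_bucket["items"].append(item)
    open_bucket["count"] += 1


buckets = []
for n in range(7):
    add_to_bucket(buckets, "movie-1", {"review": n})

print([b["count"] for b in buckets])
```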
Pre-allocation is the case where you pre-create an array of cells so that reads and writes can easily access the elements. This is a very important pattern if you are using MMAPv1, as continuously growing an array can have a negative effect. With WiredTiger it is not as crucial, but it may make the application code simpler.
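As an illustration, assume a hypothetical document that tracks page views per hour of a day. The 24 cells are created up front, so every later write only touches an existing cell. The field names are made up for the example.

```python
def new_hourly_doc(movie_id, day):
    """Create the document with all 24 hourly cells already in place."""
    return {"movie_id": movie_id, "day": day, "views_by_hour": [0] * 24}


def record_views(doc, hour, count=1):
    """Update a cell in place; the array never grows."""
    doc["views_by_hour"][hour] += count


doc = new_hourly_doc(1, "2017-06-20")
record_views(doc, 20, 5)
record_views(doc, 20)
print(doc["views_by_hour"][20], len(doc["views_by_hour"]))
```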
As for trees, they are commonly represented by having one node per document, where you can list the parent, the children, the ancestors, or a combination of those.
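A small sketch of the one-node-per-document idea, assuming each node stores both its parent and its list of ancestors. The genre names are made up, and a list of dicts stands in for the collection.

```python
# Hypothetical genre tree, one document per node.
nodes = [
    {"_id": "Movies", "parent": None, "ancestors": []},
    {"_id": "Comedy", "parent": "Movies", "ancestors": ["Movies"]},
    {"_id": "Romantic Comedy", "parent": "Comedy",
     "ancestors": ["Movies", "Comedy"]},
    {"_id": "Drama", "parent": "Movies", "ancestors": ["Movies"]},
]


def children(nodes, node_id):
    """Direct children: nodes whose parent is node_id."""
    return [n["_id"] for n in nodes if n["parent"] == node_id]


def descendants(nodes, node_id):
    """Whole subtree in one pass, thanks to the stored ancestors."""
    return [n["_id"] for n in nodes if node_id in n["ancestors"]]


print(children(nodes, "Movies"))
print(descendants(nodes, "Movies"))
```

Storing the ancestors costs some duplication, but it turns "give me the whole subtree" into a single query instead of a recursive walk.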
[TODO] I need another title!
Elliot and Dev went to the future to see if there are still people using relational databases there, so we can work on the missing features in our next release.
I think they are looking at their watch to see if it is time to come back… or wait, maybe they want me to hurry up, so I will wrap up the presentation…
We did use a fictional site, however all the patterns we used would also apply to "Internet of Things", "Single View", "E-commerce" solutions.
10 years, future data big or not square, becoming an expert
MongoDB celebrates 10 years … very soon.
We are able to identify patterns because we have seen a lot of models with MongoDB over those first 10 years. Those are "plug-and-play" elements that let you go faster in your designs.
We do believe MongoDB has a bright future.
Most data that could be put in a Relational Database is already there. We are left with:
Data that is "not square", meaning it does not fit well in square tables.
Large datasets
We believe the document model and the scalability of MongoDB are well suited to storing that data.
Ensure you are ready for the future by becoming an expert on MongoDB and how to model for it
My goal was to introduce you to patterns; however, if you want more complete solutions to common problems, there are a few good books out there. Let me point you to these 2:
The Little Mongo DB Schema Design Book, by Christian Kvalheim
MongoDB Applied Design Patterns, by Rick Copeland
I am leaving you with where you can find more information about schema design
M220 is likely to be available in Q4 2017
Thank you for attending my presentation, and this conference, but above all:
Thank you for using MongoDB!