This document provides an overview of MongoDB basics, including:
- A history of MongoDB and how it enables working with unstructured data and real-time analytics.
- MongoDB's ranking as the highest-placed non-relational database in the DB-Engines rankings and as the only "Challenger" to relational databases in Gartner's Operational Database Magic Quadrant.
- How MongoDB works using a clustered architecture with shards, replica sets, config servers, and mongos processes to provide scalability, high availability, and load balancing.
- Key MongoDB concepts like documents, collections, embedded documents, and schema flexibility compared to a traditional SQL schema.
- MongoDB utilities for backup, restore, and monitoring like mongoexport, mongorestore, mongostat, and mongotop.
3. Who am I?
Juan Antonio Roy Couto
❏ Financial Software Developer
❏ Email: juanroycouto@gmail.com
❏ Twitter: @juanroycouto
❏ Linkedin: https://www.linkedin.com/in/juanroycouto
❏ Slideshare: slideshare.net/juanroycouto
❏ Personal blog: http://www.juanroy.es
❏ Contributor at: http://www.mongodbspain.com
❏ Charrosfera member: http://www.charrosfera.com
MongoDB Basics
4. Agenda
❏ History
❏ Ranking, Who, Community & Metrics, Drivers
❏ Products
❏ Cluster Overview
❏ Characteristics
❏ Schema Design
❏ How does MongoDB work?
❏ Utilities
❏ Data analytics
❏ Ops Manager
❏ Cloud Manager
5. History
MongoDB
Internet of things
Cloud computing
Wearables
Apps
Smart cities
❏ Unstructured data
❏ Enabling Big Data analytics
❏ Faster development
❏ Real-time analytics
❏ Better strategic decisions
❏ Reduced costs and time to market
7. Who?
Who is using MongoDB?
https://www.mongodb.com/who-uses-mongodb
Who provides MongoDB?
https://www.mongodb.com/partners/list
8. Community & Metrics
https://www.mongodb.org/community
❏ 10 million downloads
❏ 2,000+ customers (including over one third of the Fortune 100)
❏ 100+ MongoDB User Groups and 40,000 members worldwide
❏ 300,000+ education registrations
❏ The only “Challenger” to relational databases in Gartner’s Operational Database Magic Quadrant
❏ Highest-placed non-relational database in the DB-Engines rankings
17. SQL Schema Design
Tables
Customers
❏ Customer Key
❏ First Name
❏ Last Name
Addresses
❏ Address Key
❏ Customer Key
❏ Street
❏ Number
❏ Location
Pets
❏ Pet Key
❏ Customer Key
❏ Type
❏ Breed
❏ Name
18. MongoDB Schema Design
Customers Collection
Customers Info
❏ First Name
❏ Last Name
Addresses
❏ Street
❏ Number
❏ Location
Pets (one entry per pet)
❏ Type
❏ Breed
❏ Name
> db.customers.findOne()
{
    "_id" : ObjectId("54131863041cd2e6181156ba"),
    "first_name" : "Peter",
    "last_name" : "Keil",
    "address" : {
        "street" : "C/Alcalá",
        "number" : 123,
        "location" : "Madrid"
    },
    "pets" : [
        {
            "type" : "Dog",
            "breed" : "Airedale Terrier",
            "name" : "Linda"
        },
        {
            "type" : "Dog",
            "breed" : "Akita",
            "name" : "Bruto"
        }
    ]
}
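The step from the three SQL tables to the single customer document can be sketched in plain JavaScript. This is only an illustration: the row arrays, the camelCase field names and the `toDocument` helper are assumptions made for the example, not part of the original deck or of any MongoDB API.

```javascript
// Hypothetical rows, shaped like the three SQL tables (Customers, Addresses, Pets).
const customers = [{ customerKey: 1, firstName: "Peter", lastName: "Keil" }];
const addresses = [
  { addressKey: 10, customerKey: 1, street: "C/Alcalá", number: 123, location: "Madrid" },
];
const pets = [
  { petKey: 100, customerKey: 1, type: "Dog", breed: "Airedale Terrier", name: "Linda" },
  { petKey: 101, customerKey: 1, type: "Dog", breed: "Akita", name: "Bruto" },
];

// Build one document per customer, embedding the address and the pets
// instead of referencing them through foreign keys.
function toDocument(customer) {
  const address = addresses.find((a) => a.customerKey === customer.customerKey);
  return {
    first_name: customer.firstName,
    last_name: customer.lastName,
    address: address
      ? { street: address.street, number: address.number, location: address.location }
      : null,
    pets: pets
      .filter((p) => p.customerKey === customer.customerKey)
      .map(({ type, breed, name }) => ({ type, breed, name })),
  };
}

const customerDocs = customers.map(toDocument);
console.log(JSON.stringify(customerDocs[0], null, 2));
```

The surrogate keys disappear in the embedded form because the address and the pets are stored inside their owning customer, so one read returns what would otherwise need two joins.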
>
20. Cluster overview
Replica Set
❏ High Availability
❏ Data Safety
❏ Asynchronous replication
❏ Automatic Node Recovery
❏ Read Preference
❏ Write Concern
Diagram: a replica set with one primary and two secondaries.
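High availability and write concern in a replica set both come down to majority arithmetic: a primary is elected by a majority of voting members, and a `w: "majority"` write waits for acknowledgement from a majority. The sketch below shows only that arithmetic. The function names are made up for the example, and it ignores arbiters, priorities and non-voting members.

```javascript
// Number of members that form a majority among `members` voting nodes.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// How many members can be unavailable while a majority can still be reached.
function faultTolerance(members) {
  return members - majority(members);
}

// Whether a write with write concern `w` (a number or "majority") is satisfied
// once `acks` members have acknowledged it.
function writeSatisfied(acks, w, members) {
  const needed = w === "majority" ? majority(members) : w;
  return acks >= needed;
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority=${majority(n)}, tolerated failures=${faultTolerance(n)}`);
}
console.log(writeSatisfied(1, "majority", 3)); // primary alone is not a majority of 3
```

Under these assumptions a fourth member raises the majority to three without raising the number of tolerated failures, which is one reason odd member counts are the usual choice.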
21. Cluster overview
Shards
❏ Scale out
❏ Even data distribution across all of the shards based on a shard key
❏ A shard key range belongs to only one shard
❏ More efficient queries (performance)
Diagram: a cluster of three shards, Shard 0 (A-I), Shard 1 (J-Q) and Shard 2 (R-Z).
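The rule that a shard key range belongs to only one shard can be illustrated with a small lookup. The sketch assumes the slide's ranges map in order to Shard 0, Shard 1 and Shard 2, and that the shard key is a name compared by its first letter. The `ranges` table and `shardFor` function are hypothetical, and real chunk ranges are bounded by full shard key values rather than letters.

```javascript
// Hypothetical shard ranges taken from the slide: first letter of the shard key.
const ranges = [
  { shard: "Shard 0", min: "A", max: "I" },
  { shard: "Shard 1", min: "J", max: "Q" },
  { shard: "Shard 2", min: "R", max: "Z" },
];

// Return the single shard that owns the given key, or null if no range matches.
function shardFor(key) {
  const first = key.charAt(0).toUpperCase();
  const owner = ranges.find((r) => first >= r.min && first <= r.max);
  return owner ? owner.shard : null;
}

console.log(shardFor("Keil"));   // Shard 1
console.log(shardFor("Garcia")); // Shard 0
```

Because the ranges do not overlap, every key resolves to exactly one shard, which is what lets a query on the shard key go to a single server.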
22. Cluster overview
Config servers
❏ Hold the config database
❏ Identical information on every config server (consistency check)
❏ Metadata:
  ❏ Cluster shards list
  ❏ Data per shard (chunk ranges)
  ❏ ...
❏ Deployed as a replica set (as of version 3.2)
23. Cluster overview
mongos
❏ Receives client requests and returns results.
❏ Reads the metadata and sends the query to the necessary shard or shards.
❏ Does not store data.
❏ Keeps a cached copy of the metadata.
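How a router uses cached metadata to pick the necessary shards can be sketched as follows. The example is a simplified model, not mongos itself: the `chunkTable` contents, the numeric shard key and the `{ from, to }` filter shape are assumptions made for illustration.

```javascript
// Hypothetical cached metadata: chunk ranges over a numeric shard key,
// lower bound inclusive, upper bound exclusive.
const chunkTable = [
  { min: 0, max: 100, shard: "shard0" },
  { min: 100, max: 200, shard: "shard1" },
  { min: 200, max: 300, shard: "shard2" },
  { min: 300, max: 400, shard: "shard0" },
];

// Decide which shards must receive a query.
// `filter` is either {} (no shard key) or { from, to } meaning from <= key < to.
function targetShards(filter) {
  const allShards = [...new Set(chunkTable.map((c) => c.shard))];
  if (filter.from === undefined || filter.to === undefined) {
    return allShards; // no shard key in the query: broadcast to every shard
  }
  const hit = chunkTable
    .filter((c) => c.min < filter.to && filter.from < c.max)
    .map((c) => c.shard);
  return [...new Set(hit)];
}

console.log(targetShards({ from: 120, to: 130 })); // targeted: one shard
console.log(targetShards({ from: 90, to: 210 }));  // range spans three chunks
console.log(targetShards({}));                      // scatter-gather: all shards
```

The router in this model holds only the chunk table, never the documents, which mirrors the point that mongos stores no data of its own.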
24. Definitions
❏ Range: Data division based on the values of the shard key.
❏ Chunk: Not physical data; a chunk is just a logical grouping of data into ranges (64 MB by default).
❏ Split: Division of a chunk once its size exceeds 64 MB. No data is moved; it runs in the background.
❏ Migration: Chunk movements between shards in order to get an even distribution. Only one chunk is moved at a time.
❏ Balanced system: The same number of chunks per shard.
❏ Balancer: Checks whether a migration is needed and starts it (in the background).
❏ Pre-split: The data is split first and stored afterwards.
❏ Tag-based sharding: Used when you want to pin ranges to a specific shard.
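The interplay of migration, balancer and balanced system can be modelled in a few lines. This is a toy model under stated assumptions: the chunk counts, the `threshold` parameter and both function names are invented for the example, and MongoDB's real balancer applies its own thresholds and scheduling.

```javascript
// Hypothetical chunk counts per shard.
const chunkCounts = { shard0: 6, shard1: 1, shard2: 2 };

// Pick the next migration, or null when the cluster is balanced enough.
// `threshold` is an illustrative parameter, not MongoDB's actual setting.
function nextMigration(counts, threshold) {
  const shards = Object.keys(counts);
  let most = shards[0];
  let least = shards[0];
  for (const s of shards) {
    if (counts[s] > counts[most]) most = s;
    if (counts[s] < counts[least]) least = s;
  }
  return counts[most] - counts[least] >= threshold ? { from: most, to: least } : null;
}

// Move one chunk at a time until no further migration is needed.
function balance(counts, threshold = 2) {
  if (threshold < 2) throw new RangeError("threshold must be at least 2 to terminate");
  const result = { ...counts };
  const moves = [];
  let move;
  while ((move = nextMigration(result, threshold)) !== null) {
    result[move.from] -= 1;
    result[move.to] += 1;
    moves.push(move);
  }
  return { result, moves };
}

const { result, moves } = balance(chunkCounts);
console.log(result, moves.length);
```

Each loop iteration moves exactly one chunk from the fullest shard to the emptiest one, matching the definition that only one chunk is moved at a time.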
25. How does MongoDB work?
Diagram: a client connects to mongos, which routes to Shard 0, Shard 1, Shard 2 and Shard 3; migrations take place between the shards.
26. Utilities
Backup tools
Name          Description
mongoexport   Generates a JSON or CSV file from a MongoDB instance
mongoimport   Imports content from a JSON, CSV or TSV export
mongodump     Utility for creating a binary export
mongorestore  Writes data to a MongoDB instance from a binary file
27. Utilities
Track tools
Name       Description
mongostat  Provides a quick overview of the status of a running mongod or mongos instance.
mongotop   Provides a method to track the amount of time a MongoDB instance spends reading or writing data. Statistics are reported at the collection level. By default, it returns values every second.
29. Ops Manager
The best way to run MongoDB within your own data center or public cloud
❏ Monitors 100+ key database and system health metrics (operations, memory, CPU, ...)
❏ Customizable web dashboard
❏ Deploy new clusters (adding shards, replica set members, ...)
❏ Alerts
❏ Backup (point-in-time recovery)
❏ Automation (upgrades, scaling, ...)
30. Cloud Manager
❏ Simplifies complex operational tasks (reduces tedious manual steps to the click of a button)
❏ Automated database management (deploy and upgrade with zero downtime)
❏ Continuous real-time backup (Cloud Manager provides disaster recovery)
❏ Full performance visibility
❏ Alerts
❏ Get the insights you need to make critical decisions fast.
❏ Cloud Manager saves you time and money, and helps you protect the customer experience by eliminating the guesswork from running MongoDB.
31. Summary
❏ High Performance
❏ Flexible
❏ Automatic Scalability
❏ Automatic Failover
❏ High Availability
❏ Reduced Administrative Tasks (replica set, sharding, disaster recovery)
❏ Real-Time Analytics Tools (aggregation framework, mapReduce, Hadoop, Spark, and BI connectors, ...)
❏ Easy To Learn