The document discusses key concepts related to MongoDB including its characteristics, replication, sharding, and utilities. MongoDB is a fast, flexible, scalable database designed to reduce administrative tasks through features like replica sets, sharding, and disaster recovery. It also includes powerful analysis tools and indexes that allow for queries on non-relational data structures.
1. Concepts of MongoDB
Juan Antonio Roy Couto
Twitter: @juanroycouto
Website: www.juanroy.es
September 2014
2. Contents
Why?
Characteristics
Who?
DB Ranking
Shell
Drivers
Utilities
Community
Terms
Failover
Replication
Schema design
Replica Set
Indexes
Sharding
Pre-splitting
Questions?
3. Why?
Apps
● Horizontal scalability
● Real-time analytics
● Better strategic decisions
Internet of Things
Wearables
Smart cities
Cloud computing
● Unstructured data
● Reduce costs and time to market
MongoDB
● Faster development
4. Who?
Who provides MongoDB in the cloud?
http://www.mongodb.com/partners/list
Who is using MongoDB?
http://www.mongodb.com/who-uses-mongodb
5. DB Ranking
http://db-engines.com/en/ranking
6. Community
8 million+ downloads
200k+ education registrations
30k+ MongoDB User Group members
7. Drivers
http://docs.mongodb.org/ecosystem/drivers/
App → Driver → MongoDB
● C
● C++
● C#
● Java
● Node.js
● Perl
● PHP
● Python
● Ruby
● Scala
8. Characteristics
http://www.mongodbspain.com/en/2014/08/17/mongodb-characteristics-future/
General purpose NoSQL database
Native replication
Document oriented (stores data as documents in BSON, Binary JSON)
Auto sharding & load balancing
Schemaless (dynamic schema)
Security
Open source
Automatic failover
High availability (replica sets)
JSON objects
Horizontal scalability (commodity servers)
MMS (continuous monitoring in the cloud)
Aggregation framework
Geospatial queries
Map Reduce
In-memory performance
Hadoop connector (for processing large volumes of data in batch)
ACID compliant at the document level
9. Advanced characteristics
GridFS (diagram: a file stored as Chunk 1, Chunk 2, Chunk 3)
TTL (special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time)
Capped collections
Index intersection
...
10. Shell
● Administrative tasks
● Full featured
● JavaScript interpreter
● Standalone MongoDB client
● Allows interaction with a MongoDB instance from the command line
11. Utilities
MongoDB tools for backup:
mongoexport: Utility that generates a JSON or CSV file of data from a MongoDB instance
mongoimport: Imports content from a JSON, CSV or TSV export
mongodump: Utility for creating a binary export
mongorestore: Writes data to a MongoDB instance from a binary file
MongoDB tools for tracking instances:
mongostat: Provides a quick overview of the status of a running mongod or mongos instance
mongotop: Provides a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on a per-collection level. By default, mongotop returns values every second
12. Basic terms to know
MongoDB SQL
database database
collection table
document row
field column
embedding join
13. Geospatial indexes
MongoDB has two types of indexes for supporting geographical queries.
● 2d indexes: for calculations on a flat surface
● 2dsphere indexes: for calculations on an Earth-like sphere
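The difference between the two index types can be illustrated with plain distance arithmetic. The sketch below is not MongoDB's implementation: the helper names `flatDistance` and `sphereDistanceKm`, the Earth radius constant and the city coordinates are assumptions chosen for illustration. It contrasts a planar (Euclidean) distance, the kind of calculation a 2d index supports, with a great-circle (haversine) distance, the kind a 2dsphere index supports.

```javascript
// Illustrative only: hypothetical helpers, not MongoDB APIs.
const EARTH_RADIUS_KM = 6371; // mean Earth radius (assumed constant)

// Planar distance between two [x, y] points, as on a flat surface.
function flatDistance([x1, y1], [x2, y2]) {
  return Math.hypot(x2 - x1, y2 - y1);
}

// Great-circle distance between two [longitude, latitude] points in degrees,
// computed with the haversine formula on a sphere.
function sphereDistanceKm([lon1, lat1], [lon2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Approximate [longitude, latitude] coordinates (assumed values).
const madrid = [-3.7038, 40.4168];
const leon = [-5.5671, 42.5987];

console.log("flat:", flatDistance([0, 0], [3, 4]), "units");
console.log("sphere:", sphereDistanceKm(madrid, leon).toFixed(0), "km");
```

Treating longitude and latitude as flat x and y coordinates would give a result in degrees, which is why a spherical model is the natural choice for real-world distances.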
14. SQL Schema Design
Tables
Customers: Customer key, First name, Last name, Phone number
Addresses: Address key, Customer key, Street, Number, Location, Postal Code
Pets: Pet key, Customer key, Type, Breed, Name, Age
15. MongoDB Schema Design
Customers collection: customer info, addresses and pets in a single document
> db.customers.findOne()
{
"_id" : ObjectId("54131863041cd2e6181156ba"),
"first_name" : "Peter",
"last_name" : "Keil",
"phone_number" : 619123456,
"address" : {
"street" : "C/Alcalá",
"number" : 123,
"location" : "Madrid",
"postal_code" : 12345
},
"pets" : [
{
"type" : "Dog",
"breed" : "Airedale Terrier",
"name" : "Linda",
"age" : 2
},
{
"type" : "Dog",
"breed" : "Akita",
"name" : "Bruto",
"age" : 10
}
]
}
>
Customer info: First name, Last name, Phone number
Addresses: Street, Number, Location, Postal Code
Pets (one entry per pet): Type, Breed, Name, Age
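The step from the three relational tables to a single customer document can be sketched in plain JavaScript. This is an illustration rather than a migration tool: the row objects, the key names (`customer_key`, `address_key`, `pet_key`) and the helpers `stripKeys` and `embedCustomer` are assumptions made for the example.

```javascript
// Hypothetical relational rows mirroring the Customers, Addresses and Pets tables.
const customers = [
  { customer_key: 1, first_name: "Peter", last_name: "Keil", phone_number: 619123456 },
];
const addresses = [
  { address_key: 10, customer_key: 1, street: "C/Alcalá", number: 123,
    location: "Madrid", postal_code: 12345 },
];
const pets = [
  { pet_key: 100, customer_key: 1, type: "Dog", breed: "Airedale Terrier", name: "Linda", age: 2 },
  { pet_key: 101, customer_key: 1, type: "Dog", breed: "Akita", name: "Bruto", age: 10 },
];

// Drop the relational keys, which are not needed once the data is embedded.
function stripKeys(row, keys) {
  return Object.fromEntries(Object.entries(row).filter(([k]) => !keys.includes(k)));
}

// Build one document per customer: the address becomes a sub-document and
// the pets become an array of sub-documents (embedding instead of joining).
function embedCustomer(customer) {
  const key = customer.customer_key;
  const address = addresses.find((a) => a.customer_key === key);
  return {
    ...stripKeys(customer, ["customer_key"]),
    address: address ? stripKeys(address, ["address_key", "customer_key"]) : null,
    pets: pets
      .filter((p) => p.customer_key === key)
      .map((p) => stripKeys(p, ["pet_key", "customer_key"])),
  };
}

const documents = customers.map(embedCustomer);
console.log(JSON.stringify(documents[0], null, 2));
```

Reading a customer together with the address and pets then needs a single lookup, which is the trade the "embedding" row of the terms table describes.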
16. Replication
Replica Set: Primary, Secondary 1, Secondary 2
● High availability
● Data safety
● Read preference
● Asynchronous
● Single primary
● Statement based
● Master-slave
● Automatic failover
● Automatic node recovery
17. Failover scenario
Replica Set before failover: Primary, Secondary 1, Secondary 2
Replica Set after failover: Secondary 2, Primary, Secondary 1
1) Primary goes down
2) New election (majority of the set)
3) Primary comes back (now as secondary)
4) The new primary assumes replication tasks
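The four failover steps can be walked through with a small model. This is a deliberately simplified sketch: real elections also weigh member priority and how up to date each member is, and the names `majority`, `electPrimary` and `node1` to `node3` are hypothetical.

```javascript
// Smallest number of members that forms a majority of the set.
function majority(setSize) {
  return Math.floor(setSize / 2) + 1;
}

// members: [{ name, up }]. Returns the name of the new primary,
// or null when fewer than a majority of members are reachable.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  if (reachable.length < majority(members.length)) {
    return null; // no majority: the set stays without a primary
  }
  return reachable[0].name; // assumption: the first reachable member wins
}

const replicaSet = [
  { name: "node1", up: true }, // initial primary
  { name: "node2", up: true },
  { name: "node3", up: true },
];

// 1) The primary goes down.
replicaSet[0].up = false;
// 2) The two remaining members form a majority (2 of 3) and elect a primary.
const newPrimary = electPrimary(replicaSet);
// 3) The old primary comes back, now as a secondary.
replicaSet[0].up = true;
// 4) The new primary stays in charge and replicates to the other members.
console.log("new primary:", newPrimary);
```

The majority rule is also why a three-member set survives the loss of one member but not two.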
18. Failover scenario with rollback
Replica Set before failover: Primary, Secondary 1, Secondary 2
Replica Set after failover: Secondary 2, Primary, Secondary 1
Rollback: rolled-back writes are saved to the hard disk and can be reapplied with mongorestore
19. Replica Set principles
● A write is truly committed once it has been applied on a majority of the set's members
20. Replica Set: read preference
Reasons
Geographically dispersed nodes
Separate a workload
Availability
Types
Primary
Primary preferred
Secondary
Secondary preferred
Nearest
Tags
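How the read preference types differ can be sketched as a member-selection function. This is a simplified model, not driver code: real drivers also apply tag sets and a latency window, and the function `selectMember` and the member fields (`role`, `up`, `pingMs`) are assumptions for illustration. Only the mode strings follow MongoDB's naming.

```javascript
// members: [{ name, role: "primary" | "secondary", up, pingMs }]
// Returns the member that would serve a read, or null if none qualifies.
function selectMember(members, mode) {
  const alive = members.filter((m) => m.up);
  const primary = alive.find((m) => m.role === "primary") || null;
  const secondaries = alive.filter((m) => m.role === "secondary");
  // Pick the member with the lowest measured latency, or null for an empty list.
  const lowestPing = (list) =>
    list.reduce((best, m) => (best === null || m.pingMs < best.pingMs ? m : best), null);

  switch (mode) {
    case "primary":
      return primary;
    case "primaryPreferred":
      return primary || lowestPing(secondaries);
    case "secondary":
      return lowestPing(secondaries);
    case "secondaryPreferred":
      return lowestPing(secondaries) || primary;
    case "nearest":
      return lowestPing(alive);
    default:
      throw new Error("unknown read preference mode: " + mode);
  }
}

// Hypothetical three-member set with measured latencies.
const members = [
  { name: "madrid", role: "primary", up: true, pingMs: 40 },
  { name: "leon", role: "secondary", up: true, pingMs: 15 },
  { name: "dublin", role: "secondary", up: true, pingMs: 80 },
];

for (const mode of ["primary", "secondaryPreferred", "nearest"]) {
  console.log(mode, "->", selectMember(members, mode).name);
}
```

The "preferred" modes are the ones that address the availability reason above, because they fall back to another member type instead of failing the read.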
21. Sharding
CLUSTER
Clients: Client, Client, Client
Query routers: Query router, Query router
Config servers: Config server, Config server, Config server
Shards: Shard 0, Shard 1, Shard 2, ..., Shard N-1 (each one a replica set: Primary, Secondary, Secondary)
22. Sharding: concepts
Data are uniformly distributed across the shards using the shard key
Each shard holds the documents that belong to its own range
Sharding improves efficiency and, therefore, performance, because queries are routed only to the shards where our data reside
23. Sharding: metadata
The config servers hold the config database, which contains the cluster metadata
Metadata describes what is in the cluster and what is contained in the shards
It is a map of the data itself
Range-based partitioning
Shard key: lastname
Range    Low     High       Shard
Range 0  Martín  Pérez      0
Range 1  Pérez   Rodriguez  1
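Range-based routing over this metadata can be sketched as a lookup in a small chunk map. The `ranges` array mirrors the two ranges in the table above; the function name `shardFor` is hypothetical, and plain JavaScript string comparison is assumed to stand in for the server's key ordering. Each range includes its low bound and excludes its high bound.

```javascript
// Chunk map modeled on the table above: each range covers keys
// from `low` (inclusive) to `high` (exclusive).
const ranges = [
  { low: "Martín", high: "Pérez", shard: 0 },
  { low: "Pérez", high: "Rodriguez", shard: 1 },
];

// Return the shard that owns a given lastname, or null if no range covers it.
function shardFor(lastname) {
  const range = ranges.find((r) => lastname >= r.low && lastname < r.high);
  return range ? range.shard : null;
}

for (const name of ["Navarro", "Pérez", "Quintana", "Alonso"]) {
  console.log(name, "->", shardFor(name));
}
```

A query router holding such a map can send a query on `lastname` to a single shard instead of broadcasting it to all of them.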
24. Sharding: chunks, split and migrate
Chunk: a subset of the data covering one range of the shard key; approximately 1 chunk per 60 MB
Split: runs in the background; when a chunk grows beyond 60 MB it is split into two equal chunks
Migrate: runs in the background; moves chunks across the shards in order to achieve balance
MongoDB's goal is to achieve a uniform data distribution across all the shards
MongoDB balances the number of chunks per shard (not documents or bytes)
By default all collections reside on shard 0
An empty collection has only one chunk (on shard 0)
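The split rule can be sketched as follows. This is a toy model rather than MongoDB's algorithm: `maybe_split` is a hypothetical helper, documents are represented as (shard key, size in MB) pairs, and the 60 MB threshold is simply the figure quoted on the slide (in a real cluster the chunk size is a configurable setting).

```python
CHUNK_LIMIT_MB = 60  # figure quoted on the slide; configurable in practice


def maybe_split(docs, limit_mb=CHUNK_LIMIT_MB):
    """docs: list of (shard_key, size_mb) sorted by key.

    Return [docs] if the chunk is within the limit, otherwise two chunks
    of roughly equal size.
    """
    total = sum(size for _key, size in docs)
    if total <= limit_mb or len(docs) < 2:
        return [docs]
    # Walk the documents until half of the chunk's size has been reached
    running, middle = 0, len(docs) - 1
    for i, (_key, size) in enumerate(docs):
        running += size
        if running >= total / 2:
            middle = i + 1
            break
    middle = min(middle, len(docs) - 1)  # keep both halves non-empty
    return [docs[:middle], docs[middle:]]


chunk = [("a", 20), ("b", 20), ("c", 20), ("d", 20)]  # 80 MB in total
for part in maybe_split(chunk):
    print([key for key, _size in part], sum(size for _key, size in part), "MB")
```

The split only changes where the range boundary lies; both resulting chunks stay on the same shard until the balancer decides to migrate one of them.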
25. Sharding: chunks, split and migrate (2)
[Diagram: mongos in front of Shard 0 and Shard 1, showing chunk 0 and chunk 1]
26. Pre-splitting
Used for batch/bulk loads
Splits and migrations are not triggered during the load
Metadata are not altered
Data are stored directly in their own shard
[Diagram: mongos sends data to Shard 0, Shard 1 and Shard 2]
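Pre-splitting amounts to choosing the range boundaries before the load starts. The sketch below shows one way to derive them from a sorted sample of the keys to be loaded. The `presplit` function, the shard names and the round-robin placement are assumptions made for illustration, and the sample is assumed to contain at least as many keys as chunks.

```python
def presplit(sorted_keys, shard_names, chunks_per_shard=1):
    """Return [(low, high, shard)] ranges chosen before any data is loaded.

    None stands for the open lower or upper end of the key space.
    """
    n_chunks = len(shard_names) * chunks_per_shard
    step = len(sorted_keys) / n_chunks
    # Split points are sample keys taken at evenly spaced positions
    points = [sorted_keys[int(i * step)] for i in range(1, n_chunks)]
    bounds = [None] + points + [None]
    # Hypothetical placement: hand the ranges out to the shards round-robin
    return [
        (bounds[i], bounds[i + 1], shard_names[i % len(shard_names)])
        for i in range(n_chunks)
    ]


sample = list("abcdefghi")  # hypothetical sorted sample of shard key values
for low, high, shard in presplit(sample, ["shard0", "shard1", "shard2"]):
    print(low, high, shard)
```

On a real cluster the resulting boundaries would be applied with the shell helpers `sh.splitAt()` and `sh.moveChunk()` before the bulk load begins.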
27. Summary
Designed to be:
● Fast (no joins, in-memory performance),
● Flexible (schemaless),
● Scalable (horizontal vs vertical),
● Easy to learn
Designed to:
● Reduce administrative tasks (replica set, sharding, disaster recovery)
With powerful:
● Analysis tools (aggregation framework, map reduce, Hadoop connector),
● Characteristics such as geospatial indexes, GridFS, etc.
29. Concepts
Thank you for your attention!
Juan Antonio Roy Couto
Email: juanroycouto@gmail.com September 2014
Editor's Notes
NoSQL emerged because of globalization: applications need very high read and write rates, support for large volumes of data, maximum availability, high request volumes, ...
Performance
Reliability
Scalability
Replica Set
Sharding Clusters
Automatic load balancing
Reduction of the typical database administration tasks (list which ones and why)
Faster time to production, because product development time is reduced
NoSQL means Not only SQL
It arises at the point where the relational model can no longer meet today's needs for storing and processing the huge amount of data now being generated (IoT, social networks, ...)
The data generated today are multidisciplinary and do not follow a fixed schema
MongoDB does not expect anyone to change their database if it gives them performance and reliability they are satisfied with. Instead, it focuses its efforts on small companies or startups taking on new projects, and also on companies of any size that want or need to improve the performance of an application already in production.
BBVA, Telefónica, Santander, ...
Because it is the market-leading non-relational database
MongoDB is an open-source database used by companies of all sizes, across all industries and for a wide variety of applications. It is an agile database that allows schemas to change quickly as applications evolve, while still providing the functionality developers expect from traditional databases, such as secondary indexes, a full query language and strict consistency.
MongoDB is built for scalability, performance and high availability, scaling from single server deployments to large, complex multi-site architectures. By leveraging in-memory computing, MongoDB provides high performance for both reads and writes. MongoDB’s native replication and automated failover enable enterprise-grade reliability and operational flexibility.
Horizontal Scalability. As the data volume and throughput grow, developers can take advantage of commodity hardware and cloud infrastructure to increase the capacity of the MongoDB system.
High Availability. Multiple copies of data are maintained with native replication. Automatic failover to secondary nodes, racks and data centers makes it possible to achieve enterprise-grade uptime without custom code and complicated tuning.
In-Memory Performance. Data is read and written to RAM while also persisted to disk for durability, providing fast performance and eliminating the need for a separate caching layer.
Aggregation - Batch processing of data and aggregate calculations
JavaScript execution - Ability to store JavaScript functions on the server
It is a general-purpose database: it does not focus on doing one thing well, as key-value stores do, which offer the fastest response times on the market. Its goal is to cover as much ground as possible, and it therefore offers all, or nearly all, of the features of relational databases together with the advantages of non-relational ones, such as a schemaless design, performance, ...
All mapReduce functions in MongoDB are written in JavaScript and run on the database nodes.
In addition to these tools, there are other backup techniques, such as simply copying the data files
MongoDB has been designed to be fast (no joins but embedded documents)
Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon.
Indexes that support geospatial queries: 2d and 2dsphere
Failover:
- The process from the moment the primary goes down until another node takes over its role
Node recovery:
- Rollback of all the writes on the primary that were never replicated (if there were any).
- Receipt of all the operations performed while it was down.
- It starts working as a secondary
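The three node recovery steps can be traced with a toy model in which each node's operation log is a plain list. This is only a sketch of the idea: the `recover` function and the operation names are hypothetical, and a real replica set compares timestamped oplog entries rather than strings.

```python
def recover(old_primary_oplog, new_primary_oplog):
    """Return (rolled_back, caught_up_oplog) for a returning former primary."""
    # Find how many leading operations both nodes have in common
    common = 0
    for mine, theirs in zip(old_primary_oplog, new_primary_oplog):
        if mine != theirs:
            break
        common += 1
    # Step 1: writes that never reached the rest of the set are rolled back
    rolled_back = old_primary_oplog[common:]
    # Step 2: replay everything the set applied while this node was down
    caught_up = old_primary_oplog[:common] + new_primary_oplog[common:]
    # Step 3: the node resumes as a secondary with an identical oplog
    return rolled_back, caught_up


old_primary = ["w1", "w2", "w3", "w4"]  # w3 and w4 were never replicated
new_primary = ["w1", "w2", "w5", "w6"]  # w5 and w6 arrived after failover
rolled_back, oplog = recover(old_primary, new_primary)
print("rolled back:", rolled_back)
print("oplog after recovery:", oplog)
```

The rolled-back writes are not part of the recovered node's data any more, which is why a write only counts as committed once a majority has applied it.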
Slave Delay:
The delay with which a secondary applies updates.
It is used in situations where a mistake has been made (fat fingers) and you need to go back quickly without waiting for a restore from a backup.
Tags:
Used to choose which servers we want to talk to
The routers (mongos) route client requests to the shard or shards involved
The client does not know whether the collection is sharded or not, nor on which shard the data it needs reside. Therefore, there is no need to change our application code
MongoDB leverages horizontal scalability effortlessly by using commodity computers
Replica:
High availability
Data safety
Disaster recovery
Sharding:
Scale out
Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.
1 chunk is about 60MB of data
Chunks > 60 MB → split
Uniform data distribution across shards (chunks / shard)
Balancer decides when to migrate chunks and to which shard
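The balancing idea in these notes can be sketched as a loop that evens out chunk counts. This is a simplified model, not the real balancer: the `balance` function is hypothetical, and the threshold of 2 chunks is an assumed value chosen for the example.

```python
def balance(chunks_per_shard, threshold=2):
    """Move one chunk at a time from the fullest to the emptiest shard.

    chunks_per_shard maps shard name -> list of chunk ids (mutated in place).
    Returns the migrations performed as (chunk, source, target) tuples.
    """
    if threshold < 2:
        raise ValueError("threshold must be at least 2 to guarantee termination")
    migrations = []
    while True:
        fullest = max(chunks_per_shard, key=lambda s: len(chunks_per_shard[s]))
        emptiest = min(chunks_per_shard, key=lambda s: len(chunks_per_shard[s]))
        gap = len(chunks_per_shard[fullest]) - len(chunks_per_shard[emptiest])
        if gap < threshold:
            return migrations
        chunk = chunks_per_shard[fullest].pop()
        chunks_per_shard[emptiest].append(chunk)
        migrations.append((chunk, fullest, emptiest))


# Hypothetical cluster in which every chunk starts on shard0
cluster = {"shard0": [0, 1, 2, 3, 4, 5], "shard1": [], "shard2": []}
moves = balance(cluster)
print(moves)
print({shard: len(chunks) for shard, chunks in cluster.items()})
```

Only the number of chunks is compared, never their size or document count, which matches the "chunks / shard" criterion stated above.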
Performance
Horizontal scalability with commodity hardware
Replica Set
Sharding Clusters
Auto load balancing
High availability
In-memory performance
Schemaless
Failover
Data safety
Disaster recovery
MongoDB has been designed to be fast (no joins but embedded documents), flexible (schemaless), scalable (horizontally, not vertically), to keep administration tasks to a minimum (replica set, failover, sharding), to be fun and quick for programmers to learn, and to come with powerful data analysis tools (aggregation framework), geospatial indexes, GridFS, and so on.
At the time of this talk (September 2014), MongoDB does not support multi-document transactions; they were introduced later, in version 4.0.
However, MongoDB does provide atomic operations on a single document. Often these document-level atomic operations are sufficient to solve problems that would require ACID transactions in a relational database. Relational databases might represent the same kind of data with multiple tables and rows, which would require transaction support to update the data atomically.