In this talk we will review the factors that drive the capacity requirements: volume of queries, access patterns, indexing, working set size, among others. View the slides with video recording: www.mongodb.com/presentations/hardware-provisioning-mongodb
Some of the most common questions we hear from users relate to capacity planning and hardware choices. How many replicas do I need? Should I consider sharding right away? How much RAM will I need for my working set? SSD or HDD? No one likes spending a lot of cash on hardware and cloud bills can just be as painful. MongoDB is different from traditional RDBMSs in its resource management, so you need to be mindful when deciding on the cluster layout and hardware. In this talk we will review the factors that drive the capacity requirements: volume of queries, access patterns, indexing, working set size, among others. Attendees will gain additional insight as we go through a few real-world scenarios, as experienced with MongoDB Inc customers, and come up with their ideal cluster layout and hardware.
Capacity Planning For Your Growing MongoDB ClusterMongoDB
Your MongoDB deployment is growing, but are you prepared for that growth? Capacity planning is an essential practice when deploying any database system. You need to understand your usage patterns and determine the appropriate hardware based on your application's needs. Scaling reads and scaling writes will require different types of resources. With the proper tools in place, you can understand your working set, gain visibility into when it's time to add resources or start sharding and avoid performance issues. In this session, you'll learn how to use MongoDB Management Service and other tools to identify patterns and predict growth, ensuring your success with MongoDB.
When dealing with infrastructure we often go through the process of determining the different resources needed to attend our application requirements. This talks looks into the way that resources are used by MongoDB and which aspects should be considered to determined the sizing, capacity and deployment of a MongoDB cluster given the different scenarios, different sets of operations and storage engines available.
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, replica set, or tens of sharded clusters then you probably share the same challenges in trying to size that deployment.
This webinar will cover what resources MongoDB uses, and how to plan for their use in your deployment. Topics covered will include understanding how to model and plan capacity needs for new and growing deployments. The goal of this webinar will be to provide you with the tools needed to be successful in managing your MongoDB capacity planning tasks.
Webinar: Keep Calm and Scale Out - A proactive guide to Monitoring MongoDBMongoDB
Setting up proactive monitoring systems can help you and your team prepare for operations problems before they happen and react appropriately when disaster strikes.
In this presentation, we reviewed diagnostic tools and strategies for monitoring MongoDB.
We reviewed how to do capacity planning and establish KPIs, and present the monitoring utilities available in MongoDB.
The KPIs to monitor in your database, including throughput metrics, database performance, resource utilization, resource saturation, assertions/errors
The commands, utilities and monitoring tools to leverage in order to set up your proactive monitoring installation
Key alerts to set for monitoring your KPIs
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, replica set, or tens of sharded clusters then you probably share the same challenges in trying to size that deployment.
Some of the most common questions we hear from users relate to capacity planning and hardware choices. How many replicas do I need? Should I consider sharding right away? How much RAM will I need for my working set? SSD or HDD? No one likes spending a lot of cash on hardware and cloud bills can just be as painful. MongoDB is different from traditional RDBMSs in its resource management, so you need to be mindful when deciding on the cluster layout and hardware. In this talk we will review the factors that drive the capacity requirements: volume of queries, access patterns, indexing, working set size, among others. Attendees will gain additional insight as we go through a few real-world scenarios, as experienced with MongoDB Inc customers, and come up with their ideal cluster layout and hardware.
Capacity Planning For Your Growing MongoDB ClusterMongoDB
Your MongoDB deployment is growing, but are you prepared for that growth? Capacity planning is an essential practice when deploying any database system. You need to understand your usage patterns and determine the appropriate hardware based on your application's needs. Scaling reads and scaling writes will require different types of resources. With the proper tools in place, you can understand your working set, gain visibility into when it's time to add resources or start sharding and avoid performance issues. In this session, you'll learn how to use MongoDB Management Service and other tools to identify patterns and predict growth, ensuring your success with MongoDB.
When dealing with infrastructure we often go through the process of determining the different resources needed to attend our application requirements. This talks looks into the way that resources are used by MongoDB and which aspects should be considered to determined the sizing, capacity and deployment of a MongoDB cluster given the different scenarios, different sets of operations and storage engines available.
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, replica set, or tens of sharded clusters then you probably share the same challenges in trying to size that deployment.
This webinar will cover what resources MongoDB uses, and how to plan for their use in your deployment. Topics covered will include understanding how to model and plan capacity needs for new and growing deployments. The goal of this webinar will be to provide you with the tools needed to be successful in managing your MongoDB capacity planning tasks.
Webinar: Keep Calm and Scale Out - A proactive guide to Monitoring MongoDBMongoDB
Setting up proactive monitoring systems can help you and your team prepare for operations problems before they happen and react appropriately when disaster strikes.
In this presentation, we reviewed diagnostic tools and strategies for monitoring MongoDB.
We reviewed how to do capacity planning and establish KPIs, and present the monitoring utilities available in MongoDB.
The KPIs to monitor in your database, including throughput metrics, database performance, resource utilization, resource saturation, assertions/errors
The commands, utilities and monitoring tools to leverage in order to set up your proactive monitoring installation
Key alerts to set for monitoring your KPIs
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, replica set, or tens of sharded clusters then you probably share the same challenges in trying to size that deployment.
MongoDB Days Silicon Valley: Best Practices for Upgrading to MongoDBMongoDB
Presented by Achille Brighton, Principal Consulting Engineer, MongoDB
Experience level: Deep dive
MongoDB 3.2 brings major enhancements. New pluggable storage engines optimized for in-memory computing and the most security-sensitive applications. Simplified data governance with document validation, coupled with GUI-based schema discovery and visualization. Improved operational efficiency with enhanced management platforms, continuous uptime across distributed, multi-region deployments, and zero-downtime upgrades. To take advantage of these features, your team needs an upgrade plan. In this session, we’ll walk you through how to build an upgrade plan. We’ll show you how to validate your existing deployment, build a test environment with a representative workload, and detail how to carry out the upgrade. You’ll walk away confident that you're prepared to upgrade.
MongoDB 3.2 introduces a host of new features and benefits, including encryption at rest, document validation, MongoDB Compass, numerous improvements to queries and the aggregation framework, and more. To take advantage of these features, your team needs an upgrade plan.
In this session, we’ll walk you through how to build an upgrade plan. We’ll show you how to validate your existing deployment, build a test environment with a representative workload, and detail how to carry out the upgrade. By the end, you should be prepared to start developing an upgrade plan for your deployment.
As we increasingly build applications to reach global audiences, the scalability and availability of your database across geographic regions becomes a critical consideration in systems selection and design.
Security is more critical than ever with new computing environments in the cloud and expanding access to the internet. There are a number of security protection mechanisms available for MongoDB to ensure you have a stable and secure architecture for your deployment. We'll walk through general security threats to databases and specifically how they can be mitigated for MongoDB deployments. Topics will include general security tools and how to configure those for MongoDB, an overview of security features available in MongoDB, including LDAP, SSL, x.509 and Authentication.
MongoDB's architecture features built-in support for horizontal scalability, and high availability through replica sets. Auto-sharding allows users to easily distribute data across many nodes. Replica sets enable automatic failover and recovery of database nodes within or across data centers. This session will provide an introduction to scaling with MongoDB by one of MongoDB's early adopters.
New to MongoDB? We'll provide an overview of installation, high availability through replication, scale out through sharding, and options for monitoring and backup. No prior knowledge of MongoDB is assumed. This session will jumpstart your knowledge of MongoDB operations, providing you with context for the rest of the day's content.
Webinar: Technical Introduction to Native Encryption on MongoDBMongoDB
The new encrypted storage engine in MongoDB 3.2 allows you to more easily build secure applications that handle sensitive data. Attend this webinar to learn how the internals work and discover all of the options available to you for securing your data.
With the latest release of MongoDB 3.0, we have announced several new exciting features, including our new pluggable storage API and the WiredTiger storage engine, which provides compression, improved concurrency control, and more. Learn how you will be able to take advantage of these new features and how they will improve your database performance with this upcoming webinar.
NoSQL datastores fall under the following categories: Key-value stores, document databases, column-family stores and graph databases. The traditional TPC-* tests are not sufficient for these heterogeneous database systems. MongoDB, CouchDB, Cassandra, HBase, Memcaches etc belong to one of 4 families and a common workload can be generated by ycsb to simulate your usecase and benchmark them.
MongoDB 3.0 introduces a pluggable storage architecture and a new storage engine called WiredTiger. The engineering team behind WiredTiger team has a long and distinguished career, having architected and built Berkeley DB, now the world's most widely used embedded database.
In this webinar Michael Cahill, co-founder of WiredTiger, will describe our original design goals for WiredTiger, including considerations we made for heavily threaded hardware, large on-chip caches, and SSD storage. We'll also look at some of the latch-free and non-blocking algorithms we've implemented, as well as other techniques that improve scaling, overall throughput and latency. Finally, we'll take a look at some of the features we hope to incorporate into WiredTiger and MongoDB in the future.
Speaker: Michael Cahill, Director of Engineering (Storage), MongoDB
Level: 300 (Advanced)
Track: How We Build MongoDB
When the WiredTiger storage engine was created, the use case we had in mind was applications with modest numbers of collections. That led to various choices during the design, such as storing each collection in a separate file. However, MongoDB customers have an enormous variety of use cases, including multi-tenant applications where each user has a separate database and each database contains hundreds of collections.
To support these applications efficiently, we have evolved the storage layer with better testing, better analysis tools, and more scalable data structures and algorithms. This session will explain how we can now run workloads with over a million collections.
What You Will Learn:
- How MongoDB represents collections and indexes in the storage layer.
- The system resources involved in accessing multiple collections and indexes from different client connections.
- The system limits MongoDB may hit as you add more collections, and how to increase or work around those limits.
MongoDB Days Silicon Valley: A Technical Introduction to WiredTiger MongoDB
Presented by Osmar Olivo, Product Manager, MongoDB
Experience level: Introductory
WiredTiger is MongoDB's first officially supported pluggable storage engine as well as the new default engine in 3.2. It exposes several new features and configuration options. This talk will highlight the major differences between the MMAPV1 and WiredTiger storage engines including currency, compression, and caching.
Practical Design Patterns for Building Applications Resilient to Infrastructu...MongoDB
Speaker: Feng Qu, Sr MTS, eBay
Level: 200 (Intermediate)
Track: Developer
Building applications resilient to infrastructure failure is essential to systems that run in distributed environments, including those with a MongoDB database. For example, failure can come from computer resources, such as nodes, network switches, or the entire data center. On occasion, MongoDB nodes may be marked down by Operations to perform administrative tasks, such as a software upgrade, adding extra capacity, etc.
In this session, we will discuss how to build resilient applications using appropriate design patterns suitable to enterprise class MongoDB applications.
What You Will Learn:
- How to manage updates within a resilient architecture.
- Design patterns for resilient applications.
- Practical advice for deploying resilient enterprise applications.
In the world of big data we need to build services that will be able to collect massive data, save it and pass it to processing and analysis. However, building manageable, reliable services that are scalable and cost effective is not an easy task. The choice of eco-system, frameworks and programming language, as well as using solid engineering principles is also crucial for achieving this goal.
I will share our journey and insights from rebuilding a cloud service in Linux eco-system using Scala, Akka Actors and Aerospike DB, at the end of which we gained 10 folds improvement of server usage with a much lighter, stable and reliable system that handles tens of millions of requests per hour.
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, replica set, or tens of sharded clusters then you probably share the same challenges in trying to size that deployment.
This webinar will cover what resources MongoDB uses, and how to plan for their use in your deployment. Topics covered will include understanding how to model and plan capacity needs for new and growing deployments. The goal of this webinar will be to provide you with the tools needed to be successful in managing your MongoDB capacity planning tasks.
MongoDB Days Silicon Valley: Best Practices for Upgrading to MongoDBMongoDB
Presented by Achille Brighton, Principal Consulting Engineer, MongoDB
Experience level: Deep dive
MongoDB 3.2 brings major enhancements. New pluggable storage engines optimized for in-memory computing and the most security-sensitive applications. Simplified data governance with document validation, coupled with GUI-based schema discovery and visualization. Improved operational efficiency with enhanced management platforms, continuous uptime across distributed, multi-region deployments, and zero-downtime upgrades. To take advantage of these features, your team needs an upgrade plan. In this session, we’ll walk you through how to build an upgrade plan. We’ll show you how to validate your existing deployment, build a test environment with a representative workload, and detail how to carry out the upgrade. You’ll walk away confident that you're prepared to upgrade.
MongoDB 3.2 introduces a host of new features and benefits, including encryption at rest, document validation, MongoDB Compass, numerous improvements to queries and the aggregation framework, and more. To take advantage of these features, your team needs an upgrade plan.
In this session, we’ll walk you through how to build an upgrade plan. We’ll show you how to validate your existing deployment, build a test environment with a representative workload, and detail how to carry out the upgrade. By the end, you should be prepared to start developing an upgrade plan for your deployment.
As we increasingly build applications to reach global audiences, the scalability and availability of your database across geographic regions becomes a critical consideration in systems selection and design.
Security is more critical than ever with new computing environments in the cloud and expanding access to the internet. There are a number of security protection mechanisms available for MongoDB to ensure you have a stable and secure architecture for your deployment. We'll walk through general security threats to databases and specifically how they can be mitigated for MongoDB deployments. Topics will include general security tools and how to configure those for MongoDB, an overview of security features available in MongoDB, including LDAP, SSL, x.509 and Authentication.
MongoDB's architecture features built-in support for horizontal scalability, and high availability through replica sets. Auto-sharding allows users to easily distribute data across many nodes. Replica sets enable automatic failover and recovery of database nodes within or across data centers. This session will provide an introduction to scaling with MongoDB by one of MongoDB's early adopters.
New to MongoDB? We'll provide an overview of installation, high availability through replication, scale out through sharding, and options for monitoring and backup. No prior knowledge of MongoDB is assumed. This session will jumpstart your knowledge of MongoDB operations, providing you with context for the rest of the day's content.
Webinar: Technical Introduction to Native Encryption on MongoDBMongoDB
The new encrypted storage engine in MongoDB 3.2 allows you to more easily build secure applications that handle sensitive data. Attend this webinar to learn how the internals work and discover all of the options available to you for securing your data.
With the latest release of MongoDB 3.0, we have announced several new exciting features, including our new pluggable storage API and the WiredTiger storage engine, which provides compression, improved concurrency control, and more. Learn how you will be able to take advantage of these new features and how they will improve your database performance with this upcoming webinar.
NoSQL datastores fall under the following categories: Key-value stores, document databases, column-family stores and graph databases. The traditional TPC-* tests are not sufficient for these heterogeneous database systems. MongoDB, CouchDB, Cassandra, HBase, Memcaches etc belong to one of 4 families and a common workload can be generated by ycsb to simulate your usecase and benchmark them.
MongoDB 3.0 introduces a pluggable storage architecture and a new storage engine called WiredTiger. The engineering team behind WiredTiger team has a long and distinguished career, having architected and built Berkeley DB, now the world's most widely used embedded database.
In this webinar Michael Cahill, co-founder of WiredTiger, will describe our original design goals for WiredTiger, including considerations we made for heavily threaded hardware, large on-chip caches, and SSD storage. We'll also look at some of the latch-free and non-blocking algorithms we've implemented, as well as other techniques that improve scaling, overall throughput and latency. Finally, we'll take a look at some of the features we hope to incorporate into WiredTiger and MongoDB in the future.
Speaker: Michael Cahill, Director of Engineering (Storage), MongoDB
Level: 300 (Advanced)
Track: How We Build MongoDB
When the WiredTiger storage engine was created, the use case we had in mind was applications with modest numbers of collections. That led to various choices during the design, such as storing each collection in a separate file. However, MongoDB customers have an enormous variety of use cases, including multi-tenant applications where each user has a separate database and each database contains hundreds of collections.
To support these applications efficiently, we have evolved the storage layer with better testing, better analysis tools, and more scalable data structures and algorithms. This session will explain how we can now run workloads with over a million collections.
What You Will Learn:
- How MongoDB represents collections and indexes in the storage layer.
- The system resources involved in accessing multiple collections and indexes from different client connections.
- The system limits MongoDB may hit as you add more collections, and how to increase or work around those limits.
MongoDB Days Silicon Valley: A Technical Introduction to WiredTiger MongoDB
Presented by Osmar Olivo, Product Manager, MongoDB
Experience level: Introductory
WiredTiger is MongoDB's first officially supported pluggable storage engine as well as the new default engine in 3.2. It exposes several new features and configuration options. This talk will highlight the major differences between the MMAPV1 and WiredTiger storage engines including currency, compression, and caching.
Practical Design Patterns for Building Applications Resilient to Infrastructu...MongoDB
Speaker: Feng Qu, Sr MTS, eBay
Level: 200 (Intermediate)
Track: Developer
Building applications resilient to infrastructure failure is essential to systems that run in distributed environments, including those with a MongoDB database. For example, failure can come from computer resources, such as nodes, network switches, or the entire data center. On occasion, MongoDB nodes may be marked down by Operations to perform administrative tasks, such as a software upgrade, adding extra capacity, etc.
In this session, we will discuss how to build resilient applications using appropriate design patterns suitable to enterprise class MongoDB applications.
What You Will Learn:
- How to manage updates within a resilient architecture.
- Design patterns for resilient applications.
- Practical advice for deploying resilient enterprise applications.
In the world of big data we need to build services that will be able to collect massive data, save it and pass it to processing and analysis. However, building manageable, reliable services that are scalable and cost effective is not an easy task. The choice of eco-system, frameworks and programming language, as well as using solid engineering principles is also crucial for achieving this goal.
I will share our journey and insights from rebuilding a cloud service in Linux eco-system using Scala, Akka Actors and Aerospike DB, at the end of which we gained 10 folds improvement of server usage with a much lighter, stable and reliable system that handles tens of millions of requests per hour.
Deploying any software can be a challenge if you don't understand how resources are used or how to plan for the capacity of your systems. Whether you need to deploy or grow a single MongoDB instance, replica set, or tens of sharded clusters then you probably share the same challenges in trying to size that deployment.
This webinar will cover what resources MongoDB uses, and how to plan for their use in your deployment. Topics covered will include understanding how to model and plan capacity needs for new and growing deployments. The goal of this webinar will be to provide you with the tools needed to be successful in managing your MongoDB capacity planning tasks.
Silicon Valley Code Camp 2014 - Advanced MongoDBDaniel Coupal
MongoDB presentation from Silicon Valley Code Camp 2014.
Walkthrough developing, deploying and operating a MongoDB application, avoiding the most common pitfalls.
This presentation introduces the Big Data topic to Software Quality Assurance Engineers. It can also be useful for Software Developers and other software professionals.
Current HDFS Namenode stores all of its metadata in RAM. This has allowed Hadoop clusters to scale to 100K concurrent tasks. However, the memory limits the total number of files that a single NameNode can store. While Federation allows one to create multiple volumes with additional Namenodes, there is a need to scale a single namespace and also to store multiple namespaces in a single Namenode.
This talk describes a project that removes the space limits while maintaining similar performance by caching only the working set or hot metadata in Namenode memory. We believe this approach will be very effective because the subset of files that is frequently accessed is much smaller than the full set of files stored in HDFS.
In this talk we will describe our overall approach and give details of our implementation along with some early performance numbers.
Speaker: Lin Xiao, PhD student at Carnegie Mellon University, intern at Hortonworks
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelDaniel Coupal
MongoDB presentation from Silicon Valley Code Camp 2015.
Walkthrough developing, deploying and operating a MongoDB application, avoiding the most common pitfalls.
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataHakka Labs
By Doug Daniels (Director of Engineering, Data Dog)
At Datadog, we collect hundreds of billions of metric data points per day from hosts, services, and customers all over the world. In addition charting and monitoring this data in real time, we also run many large-scale offline jobs to apply algorithms and compute aggregations on the data. In the past months, we’ve migrated our largest data sets over to Apache Parquet—an efficient, portable columnar storage format
Cloud infrastructure. Google File System and MapReduce - Andrii VozniukAndrii Vozniuk
My presentation for the Cloud Data Management course at EPFL by Anastasia Ailamaki and Christoph Koch.
It is mainly based on the following two papers:
1) S. Ghemawat, H. Gobioff, S. Leung. The Google File System. SOSP, 2003
2) J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004
In this webinar, we will be covering general best practices for running MongoDB on AWS.
Topics will range from instance selection to storage selection and service distribution to ensure service availability. We will also look at any specific best practices related to using WiredTiger. We will then shift gears and explore recommended strategies for managing your MongoDB instance on AWS.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combined traditional batch approaches with streaming technologies to provide continues alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
MongoDB Kubernetes operator is ready for prime-time. Learn about how MongoDB can be used with most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
Query performance should be the unsung hero of an application, but without proper configuration, can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices to building multikey indexes.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
aux Core Data, appréciée par des centaines de milliers de développeurs. Apprenez ce qui rend Realm spécial et comment il peut être utilisé pour créer de meilleures applications plus rapidement.
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
Il n’a jamais été aussi facile de commander en ligne et de se faire livrer en moins de 48h très souvent gratuitement. Cette simplicité d’usage cache un marché complexe de plus de 8000 milliards de $.
La data est bien connu du monde de la Supply Chain (itinéraires, informations sur les marchandises, douanes,…), mais la valeur de ces données opérationnelles reste peu exploitée. En alliant expertise métier et Data Science, Upply redéfinit les fondamentaux de la Supply Chain en proposant à chacun des acteurs de surmonter la volatilité et l’inefficacité du marché.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
11. Requirements – Step One
• It is impossible to properly size a MongoDB
cluster without first documenting your business
requirements
• Availability: what is your uptime requirement?
• Throughput
• Responsiveness
– what is acceptable latency?
– is higher latency during peak times acceptable?
12. Requirements – Step Two
• Understand your own resources available to you
– Storage
– Memory
– Network
– CPU
• Many customers limited to the options available in
AWS or presented by their own Enterprise
Virtualization team
13. Continuing Requirements – Step
Three
• Once you deploy initially, it is common for requirements to
change
– More users added to the application
• Causes more queries and a larger working set
– New functionality changes queries patterns
• New indexes added causes a larger working set
– What started as a read-intensive application can add more and more
write-heavy workloads
• More write-locking increases reader queue depth
• You must monitor and collect metrics and update your
hardware selection as necessary (scale up /Add RAM? Add
more shards?)
14. Run a Proof of Concept
• Forces you to:
– Do schema / index design
– Understand query patterns
– Get a handle on Working Set size
• Start small on a single node
– See how much performance you can get from one box
• Add replication, then add sharding
– Understand how these affect performance in your use case
• POC can be done on a smaller scale to infer what will be
needed for production
15. POC – Requirements to Gather
Data Sizes
– Total Number of Documents
– Average Document Size
– Size of Data on Disk
– Size of Indexes on Disk
– Expected growth
– What is your document model?
• Ingestion
– Insertions / Updates / Deletes per second, peak &
average
– Bulk inserts / updates? If so, how large and how often?
16. POC – Requirements to Gather
• Query Patterns and Performance Expectations
– Read Response SLA
– Write Response SLA
– Range queries or single document queries?
– Sort conditions
– Is more recent data queried more frequently?
• Data Policies
– How long will you keep the data for?
– Replication Requirements
– Backup Requirements / Time to Recovery
17. POC – Requirements to Gather
• Multi-datacenter Requirements
– Number and location of datacenters
– Cross DC latency
– Active /Active orActive / Passive?
– Geographical / Data locality requirements?
• Security Requirements
– Encryption over the wire (SSL) ?
– Encryption of data at rest?
18. Resource Usage
• Storage
– IOPS
– Size
– Data & Loading Patterns
• Memory
– Working Set
• CPU
– Speed
– Cores
• Network
– Latency
– Throughput
26. Case Study #1: A Spanish Bank
• Problem statement: want to store 6 months worth of
logs
• 18TB of total data (3 TB/month)
• Primarily analyzing the last month’s worth of logs, so
Working Set Size is 1 month’s worth of data (3TB)
plus indexes (1TB) = 4 TB Working Set
27. Case Study #1: Hardware Selection
• QAEnvironment
– Did not want to mirror a full production cluster. Just
wanted to hold 2TB of data
– 3 nodes / shard * 4 shards = 12 physical machines
– 2 mongos
– 3 config servers (virtual machines)
• Production Environment
– 3 nodes / shard * 36 shards = 108 physical machines
– 128GB/RAM * 36 = 4.6 TB RAM
– 2 mongos
– 3 config servers (virtual machines)
28. Case Study #2: A Large Online
Retailer
• Problem statement: Moving their product catalog
from SQL Server to MongoDB as part of a larger
architectural overhaul to Open Source Software
• 2 main datacenters running active/active
• On Cyber Monday they peaked at 214 requests/sec,
so let’s budget for 400 requests/sec to give some
headroom
29. Case Study #2: The POC
• APOC yielded the following numbers:
– 4 million product SKUs, average JSON document size
30KB
• Need to service requests for:
– a specific product (by _id)
– Products in a specific category (i.e. “Desks” or “Hard
Drives”)
• Returns 72 documents, or 200 if it’s a google bot
crawling)
30. Case Study #2: The Math
• Want to partition (Shard) by category, and have
products that exist in multiple categories duplicated
– The average product appears in 2 categories, so we
actually need to store 8M SKU documents, not 4M
• 8M docs * 30KB/doc = 240GB of data
• 270 GB with indexes
• Working Set is 100% of all data + indexes as this is
a core functionality that must be fast at all times
31. Case Study #2: Our
Recommendation
• MongoDB initial recommendation was to deploy a single
Replica Set with enough RAM in each server to hold all the
data (at least 384GB RAM/server)
• 4 node Replica Set (2 nodes in each DC, 1 arbiter in a 3rd DC)
– Allows for a node in each DC to go down for maintenance or system
crash while still servicing the application centers in that datacenter
• Deploy using secondary reads (NEAREST read preference)
• This avoids the complexity of sharding, setting up mongos,
config servers, worrying about orphaned documents, etc.
33. Case Study #2: Actual Provisioning
• Customer decided to deploy on their corporate
VMWare Cloud
• IT would not give them nodes any bigger than 64
GB RAM
• Decided to deploy 3 shards (4 nodes each + arbiter)
= 192 GB/RAM cluster wide into a staging
environment and add a fourth shard if staging
proves it would be worthwhile
34. Key Takeaways
• Document your performance requirements up front
• Conduct a Proof of Concept
• Always test with a real workload
• Constantly monitor and adjust based on changing
requirements