High Performance Applications with MongoDB

To understand how to make your application fast, it's important to understand what makes the database fast. We will take a detailed look at how to think about performance, and at how different schema design choices affect your cluster's performance depending on the storage engine used and the physical resources available.

High Performance with MongoDB
or "how to design fast applications"
Asya Kamsky
Lead Product Manager, MongoDB Inc
#MongoDB @asya999 #askAsya

What Is Fast?
• Must understand
– what "fast" means
– how to measure it
– what are requirements
– what's the context

Is It Fast?
• In the context of crossing the bridge, fast means:
– how long it will take one car
– how many cars can do it "at the same time"

Is It Fast?
Facts & Info
Opened to traffic
Upper level: October 25, 1931
Lower level: August 29, 1962
Bus Station opened: January 17, 1963
Length of bridge between anchorages: 4,760 feet
Width of bridge: 119 feet
Width of roadway: 90 feet
Height of tower above water: 604 feet
Water clearance at midspan: 212 feet
Number of toll lanes:
Upper level: 12
Lower level: 10
Palisades Interstate Parkway: 7*
* E-ZPass only overnight
2013 Traffic Volumes
Total New York-bound (eastbound) traffic: 49,402,245 vehicles

What Is Fast?
• Latency: how long "it" takes
• Throughput: how many "per unit of time"
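
To make the two definitions concrete, here is a minimal sketch that computes both from the same set of request timings. The `summarize` function and the `startMs`/`endMs` field names are hypothetical, chosen only for this illustration. It shows that four requests with identical latency can yield very different throughput depending on how many run "at the same time".

```javascript
// Summarize a batch of completed requests.
// Each sample is { startMs, endMs }: hypothetical field names for this sketch.
function summarize(samples) {
  if (samples.length === 0) {
    throw new Error("need at least one sample");
  }
  // Latency: how long each individual request took.
  const latencies = samples.map((s) => s.endMs - s.startMs).sort((a, b) => a - b);

  // Nearest-rank percentile over the sorted latencies.
  const percentile = (p) => {
    const rank = Math.ceil((p / 100) * latencies.length) - 1;
    return latencies[Math.min(latencies.length - 1, Math.max(0, rank))];
  };

  // Throughput: how many requests completed per unit of wall-clock time.
  const firstStart = Math.min(...samples.map((s) => s.startMs));
  const lastEnd = Math.max(...samples.map((s) => s.endMs));
  const elapsedSec = (lastEnd - firstStart) / 1000;

  return {
    count: samples.length,
    p50Ms: percentile(50),
    p99Ms: percentile(99),
    maxMs: latencies[latencies.length - 1],
    throughputPerSec: elapsedSec > 0 ? samples.length / elapsedSec : Infinity,
  };
}

// Four requests of 100 ms each: one at a time versus all at once.
const serial = [0, 100, 200, 300].map((t) => ({ startMs: t, endMs: t + 100 }));
const parallel = [0, 0, 0, 0].map((t) => ({ startMs: t, endMs: t + 100 }));
console.log(summarize(serial));
console.log(summarize(parallel));
```

Both runs report the same median latency, but the parallel run finishes in a quarter of the wall-clock time, so its throughput is four times higher.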

What Is Fast?
Latency and throughput
Orthogonal, but highly interdependent

Application
Schema
Indexes
File System
Storage Engine
OS
Driver
DB Requests
Physical / Conceptual

Schema Anti-Patterns
• Over-normalization
• Over-embedding
[Figure: parent object]

Schema Anti-Patterns: over-embedding
• Unbounded growth
• Deeply nested arrays
• Really large documents

Schema Anti-Patterns: over-normalizing
• You are over-normalizing if you are doing JOINs in your application instead of "finds"

Schema Anti-Patterns: signs of trouble
• reads vs writes
• polymorphic collections
• polymorphic fields

Schema Anti-Patterns: can't use indexes
• bad regex queries
• lots of indexes
• no indexes

• MMAPv1: granularity low, latency low
• WiredTiger: granularity high, latency higher

[Figure: Throughput, 50/50 workload in RAM; y-axis from 0 to 60,000; x-axis: Uniform, Latest, Zipfian]

Document before the update:

{
  timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"),
  type: "memory_used",
  values: {
    0: { 0: 999999, 1: 999999, …, 59: 1000000 },
    1: { 0: 2000000, 1: 2000000, …, 59: 1000000 },
    …,
    58: { 0: 1600000, 1: 1200000, …, 59: 1100000 },
    59: { 0: 1300000, 1: 1400000, …, 59: 1500000 }
  }
}

Update:

db.metrics.update(
  { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used" },
  { $set: { "values.59.59": 2000000 } }
)

Document after the update:

{
  timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"),
  type: "memory_used",
  values: {
    0: { 0: 999999, 1: 999999, …, 59: 1000000 },
    1: { 0: 2000000, 1: 2000000, …, 59: 1000000 },
    …,
    58: { 0: 1600000, 1: 1200000, …, 59: 1100000 },
    59: { 0: 1300000, 1: 1400000, …, 59: 2000000 }
  }
}
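
The update above addresses a single nested field, `values.59.59`, inside the hour-long bucket document. As a rough sketch of how an application might derive that filter and dotted path from a sample's timestamp, consider the helper below. The `buildMetricUpdate` name is hypothetical, and the sketch assumes the outer key is the minute and the inner key is the second, which the slide's 0 to 59 nesting suggests but does not state.

```javascript
// Build the filter and $set document for one per-second sample,
// following the hour-per-document layout shown above.
// Assumption: values.<minute>.<second>, both taken in UTC.
function buildMetricUpdate(timestamp, type, value) {
  // Truncate to the start of the hour (UTC) to find the bucket document.
  const hour = new Date(timestamp.getTime());
  hour.setUTCMinutes(0, 0, 0);

  // The dotted path addresses exactly one nested field.
  const path = `values.${timestamp.getUTCMinutes()}.${timestamp.getUTCSeconds()}`;

  return {
    filter: { timestamp_hour: hour, type: type },
    update: { $set: { [path]: value } },
  };
}

const { filter, update } = buildMetricUpdate(
  new Date("2015-11-10T23:59:59.000Z"),
  "memory_used",
  2000000
);
console.log(JSON.stringify(filter), JSON.stringify(update));
// In the mongo shell, these would be passed as db.metrics.update(filter, update).
```

Because the helper only builds plain objects, it can be checked without a running database. The resulting `$set` touches one field rather than rewriting the whole `values` sub-document.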

Benchmark your own application
Use realistic workload
Use real data
Measure throughput and latency
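
One possible shape for such a benchmark is sketched below: a small harness that runs an operation a fixed number of times at a chosen concurrency and reports both latency and throughput. The `runBenchmark` name and its options are hypothetical, and the workload here is a timer standing in for a real database call. A real benchmark would substitute the application's own requests and data. `Date.now()` has only millisecond resolution, which is an accepted simplification in this sketch.

```javascript
// Run `operation` totalOps times with at most `concurrency` in flight,
// recording per-operation latency and overall throughput.
async function runBenchmark(operation, { totalOps, concurrency }) {
  const latenciesMs = [];
  let next = 0;

  async function worker() {
    // Each worker claims the next operation index until none remain.
    while (next < totalOps) {
      const i = next++;
      const start = Date.now();
      await operation(i);
      latenciesMs.push(Date.now() - start);
    }
  }

  const startedAt = Date.now();
  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  const elapsedMs = Date.now() - startedAt;

  latenciesMs.sort((a, b) => a - b);
  return {
    completed: latenciesMs.length,
    elapsedMs,
    medianMs: latenciesMs[Math.floor(latenciesMs.length / 2)],
    maxMs: latenciesMs[latenciesMs.length - 1],
    opsPerSec: elapsedMs > 0 ? (latenciesMs.length / elapsedMs) * 1000 : Infinity,
  };
}

// Stand-in workload: a 5 ms timer instead of a real database call.
const fakeRequest = () => new Promise((resolve) => setTimeout(resolve, 5));
runBenchmark(fakeRequest, { totalOps: 40, concurrency: 4 }).then((r) => console.log(r));
```

Varying `concurrency` while keeping the workload fixed is one way to see how latency and throughput move relative to each other for a given application.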


MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf

FIDO Alliance

GridMate - End to end testing is a critical piece to ensure quality and avoid...

ThomasParaiso2

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf

FIDO Alliance

Pushing the limits of ePRTC: 100ns holdover for 100 days

Adtran

Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx

nkrafacyberclub

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...

Neo4j

Leonard Jayamohan, Partner & Generative AI Lead, Deloitte This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.

The Art of the Pitch: WordPress Relationships and Sales

Laura Byrne

Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes? All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.

Essentials of Automations: The Art of Triggers and Actions in FME

Safe Software

In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation. We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios. Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!

Removing Uninteresting Bytes in Software Fuzzing

Aftab Hussain

Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process. In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds. - These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.

Elizabeth Buie - Older adults: Are we really designing for our future selves?

Nexer Digital

FIDO Alliance Osaka Seminar: Overview.pdf

FIDO Alliance

PHP Frameworks: I want to break free (IPC Berlin 2024)

Ralf Eggert

In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development. This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf

Paige Cruz

Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack. While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack. I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:

Uni Systems Copilot event_05062024_C.Vlachos.pdf

Uni Systems S.M.S.A.

20240607 QFM018 Elixir Reading List May 2024

Matthew Sinclair

State of ICS and IoT Cyber Threat Landscape Report 2024 preview

Prayukth K V

The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development. The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers: State of global ICS asset and network exposure Sectoral targets and attacks as well as the cost of ransom Global APT activity, AI usage, actor and tactic profiles, and implications Rise in volumes of AI-powered cyberattacks Major cyber events in 2024 Malware and malicious payload trends Cyberattack types and targets Vulnerability exploit attempts on CVEs Attacks on counties – USA Expansion of bot farms – how, where, and why In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East Why are attacks on smart factories rising? Cyber risk predictions Axis of attacks – Europe Systemic attacks in the Middle East Download the full report from here: https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/

National Security Agency - NSA mobile device best practices

Quotidiano Piemontese

GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024

Neo4j

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Aggregage

SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf

Peter Spielvogel

Building better applications for business users with SAP Fiori. • What is SAP Fiori and why it matters to you • How a better user experience drives measurable business benefits • How to get started with SAP Fiori today • How SAP Fiori elements accelerates application development • How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities • How SAP Fiori paves the way for using AI in SAP apps

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf

GridMate - End to end testing is a critical piece to ensure quality and avoid...

FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf

Pushing the limits of ePRTC: 100ns holdover for 100 days

Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx

GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...

The Art of the Pitch: WordPress Relationships and Sales

Essentials of Automations: The Art of Triggers and Actions in FME

Removing Uninteresting Bytes in Software Fuzzing

Elizabeth Buie - Older adults: Are we really designing for our future selves?

FIDO Alliance Osaka Seminar: Overview.pdf

PHP Frameworks: I want to break free (IPC Berlin 2024)

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf

Uni Systems Copilot event_05062024_C.Vlachos.pdf

20240607 QFM018 Elixir Reading List May 2024

State of ICS and IoT Cyber Threat Landscape Report 2024 preview

National Security Agency - NSA mobile device best practices

GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024

Generative AI Deep Dive: Advancing from Proof of Concept to Production

SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf

High Performance Applications with MongoDB

1. High Performance with MongoDB or "how to design fast applications" Asya Kamsky Lead Product Manager, MongoDB Inc #MongoDB @asya999 #askAsya

2. What Is Fast? • Must understand – what "fast" means – how to measure it – what are requirements – what's the context

3. What Is Fast?

4. What Is Fast?

5. What Is Fast?

6. George Washington Bridge

10. Is It Fast? • In context of crossing the bridge, fast means: – how long will it take one car – how many cars can do it "at the same time"

11. Is It Fast? Facts & Info Opened to traffic Upper level: October 25, 1931 Lower level: August 29, 1962 Bus Station opened: January 17, 1963 Length of bridge between anchorages: 4,760 feet Width of bridge: 119 feet Width of roadway: 90 feet Height of tower above water: 604 feet Water clearance at midspan: 212 feet Number of toll lanes: Upper level: 12 Lower level: 10 Palisades Interstate Parkway: 7* * E-ZPass only overnight 2013 Traffic Volumes Total New York-bound (eastbound) traffic: 49,402,245 vehicles

12. What Is Fast?

13. What Is Fast? Latency: how long "it" takes. Throughput: how many "per unit of time".

14. What Is Fast? Latency Throughput Throughput Latency

15. What Is Fast? Latency Throughput Throughput Latency. Orthogonal, but highly interdependent

16.

17. What Is Fast? Latency Throughput Throughput Latency

18. What Is Fast? Latency Throughput Throughput Latency

19.

20. What Is Fast? New Jersey New York

21. What Is Fast? New Jersey New York

22. What Is Fast? New Jersey New York

23. Must address the "limiting factor"

24. Application Driver DB Requests

25. Application Schema Indexes Storage Engine Driver DB Requests

26. Application Schema Indexes File System Storage Engine OS Driver DB Requests

27. Application Schema Indexes File System Storage Engine OS Driver DB Requests

28. Application Schema Indexes File System Storage Engine OS Driver DB Requests Physical Conceptual

29. Application Schema Indexes File System Storage Engine OS Driver DB Requests Physical Conceptual

30. Schema Indexes Storage Engine

31. Schema

32. Schema Patterns

33.

34.

35. Schema Anti-Patterns

36. Parent Object OVER-NORMALIZATION OVER-EMBEDDING Schema Anti-Patterns

37. Deeply nested arrays Really large documents Schema Anti-Patterns: over-embedding

38. Deeply nested arrays Really large documents Schema Anti-Patterns: over-embedding

39. Deeply nested arrays Really large documents Schema Anti-Patterns: over-embedding

40. Unbounded growth Deeply nested arrays Really large documents Schema Anti-Patterns: over-normalizing you are over-normalizing if you are doing JOINS in your application instead of "finds"

41. reads vs writes polymorphic collections polymorphic fields Schema Anti-Patterns: signs of trouble

42. reads vs writes polymorphic collections polymorphic fields Schema Anti-Patterns: signs of trouble

43. reads vs writes polymorphic collections polymorphic fields Schema Anti-Patterns: signs of trouble

44. bad regex queries lots of indexes no indexes Schema Anti-Patterns: can't use indexes

45. bad regex queries lots of indexes no indexes Schema Anti-Patterns: can't use indexes

46. bad regex queries lots of indexes no indexes Schema Anti-Patterns: can't use indexes

47. bad regex queries lots of indexes no indexes Schema Anti-Patterns: can't use indexes

48. Indexes

49.

50.

51.

52.

53.

54.

55. Storage Engine

56. Storage Engine: compression

57.

58.

59.

60.

61.

62.

63. Storage Engine: concurrency

64. MMAPv1: granularity low, latency low. WiredTiger: granularity high, latency higher.

65. New Jersey New York

66. New Jersey New York

67. New Jersey New York

68. New Jersey New York

69. New Jersey New York

70.

71.

72.

73.

74.

75.

76. [Chart] Throughput: 50/50 Workload in RAM. Distributions: Uniform, Latest, Zipfian. Axis: 0 to 60,000.

77.

78.

79. htop

80. Storage Engine: write-pattern

81. { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used", values: { 0: { 0: 999999, 1: 999999, …, 59: 1000000 }, 1: { 0: 2000000, 1: 2000000, …, 59: 1000000 }, …, 58: { 0: 1600000, 1: 1200000, …, 59: 1100000 }, 59: { 0: 1300000, 1: 1400000, …, 59: 1500000 } } }

82. { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used", values: { 0: { 0: 999999, 1: 999999, …, 59: 1000000 }, 1: { 0: 2000000, 1: 2000000, …, 59: 1000000 }, …, 58: { 0: 1600000, 1: 1200000, …, 59: 1100000 }, 59: { 0: 1300000, 1: 1400000, …, 59: 1500000 } } } db.metrics.update( { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used" }, {$set: {"values.59.59": 2000000 } } )

83. { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used", values: { 0: { 0: 999999, 1: 999999, …, 59: 1000000 }, 1: { 0: 2000000, 1: 2000000, …, 59: 1000000 }, …, 58: { 0: 1600000, 1: 1200000, …, 59: 1100000 }, 59: { 0: 1300000, 1: 1400000, …, 59: 2000000 } } } db.metrics.update( { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used" }, {$set: {"values.59.59": 2000000 } } )

84. { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used", values: { 0: { 0: 999999, 1: 999999, …, 59: 1000000 }, 1: { 0: 2000000, 1: 2000000, …, 59: 1000000 }, …, 58: { 0: 1600000, 1: 1200000, …, 59: 1100000 }, 59: { 0: 1300000, 1: 1400000, …, 59: 1500000 } } }

85. { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used", values: { 0: { 0: 999999, 1: 999999, …, 59: 1000000 }, 1: { 0: 2000000, 1: 2000000, …, 59: 1000000 }, …, 58: { 0: 1600000, 1: 1200000, …, 59: 1100000 }, 59: { 0: 1300000, 1: 1400000, …, 59: 1500000 } } } db.metrics.update( { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used" }, {$set: {"values.59.59": 2000000 } } )

86. { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used", values: { 0: { 0: 999999, 1: 999999, …, 59: 1000000 }, 1: { 0: 2000000, 1: 2000000, …, 59: 1000000 }, …, 58: { 0: 1600000, 1: 1200000, …, 59: 1100000 }, 59: { 0: 1300000, 1: 1400000, …, 59: 1500000 } } } db.metrics.update( { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used" }, {$set: {"values.59.59": 2000000 } } ) { timestamp_hour: ISODate("2015-11-10T23:00:00.000Z"), type: "memory_used", values: { 0: { 0: 999999, 1: 999999, …, 59: 1000000 }, 1: { 0: 2000000, 1: 2000000, …, 59: 1000000 }, …, 58: { 0: 1600000, 1: 1200000, …, 59: 1100000 }, 59: { 0: 1300000, 1: 1400000, …, 59: 2000000 } } }

87. MongoDB Cloud Monitoring

88. Benchmark your own application • Use realistic workload • Use real data • Measure throughput and latency
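To make "measure throughput and latency" concrete, here is a minimal sketch of a measuring loop in plain JavaScript. It is not from the talk: the `benchmark` and `percentile` helpers are hypothetical names, the loop is single-client and synchronous (so it says nothing about concurrency), and the operation being timed is a stand-in for whatever your real application does with real data.

```javascript
// Nearest-rank percentile over an ascending-sorted array of numbers.
function percentile(sortedValues, p) {
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(rank - 1, 0)];
}

// Millisecond clock; falls back to Date.now() where `performance` is missing.
const defaultNow =
  typeof performance !== "undefined" ? () => performance.now() : () => Date.now();

// Runs `operation` `iterations` times, one after another, and reports both
// throughput (operations per second) and per-operation latency (milliseconds).
function benchmark(operation, iterations, now = defaultNow) {
  const latencies = [];
  const start = now();
  for (let i = 0; i < iterations; i++) {
    const t0 = now();
    operation(i);
    latencies.push(now() - t0);
  }
  const elapsedMs = now() - start;
  latencies.sort((a, b) => a - b);
  return {
    throughputPerSecond: iterations / (elapsedMs / 1000),
    p50Ms: percentile(latencies, 50),
    p95Ms: percentile(latencies, 95),
    maxMs: latencies[latencies.length - 1],
  };
}

// Stand-in workload (an assumption): replace with a realistic request.
const result = benchmark(() => JSON.parse(JSON.stringify({ values: [1, 2, 3] })), 10000);
console.log(result);
```

Reporting percentiles next to throughput keeps the two measures separate, which is the point of the whole talk: a good average can hide a slow tail.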

Editor's Notes

What is fast? Before we can agree on what our topic is, we have to define what "fast" means for you: for your application, for your users, for your stakeholders. What's fast in one context may not be fast in another context. Let me give you an example.
For those unfamiliar with this area, here were my options. Holland, Lincoln and
By far the most scenic is the George Washington Bridge, the world's busiest motor vehicle bridge. It was twice as long as any previous suspension bridge when its design was finalized in 1923. Construction started in 1927 and the bridge first opened to traffic in 1931. In 1932, more than 5.5 million vehicles used the original six-lane roadway. Two center lanes were added in 1946, increasing capacity by one third. The six lanes of the lower roadway were completed in 1962, bringing the bridge to the 14 lanes it has today. So let me ask you this:
Is the George Washington Bridge fast? Well, that's a bit of a non sequitur as a question in a vacuum, isn't it? The bridge cannot be fast; it's not even going anywhere! But we all have context here. So what matters when I ask this question is whether it's a fast way to get from NJ to NY.
*For me*, to get to NY "fast" meant to get across the Hudson River as quickly (and painlessly) as possible.
At the speed limit, which is 45 MPH, let's just say that driving across the GW Bridge would take about one minute. But we wouldn't measure the GW's capacity by how long it took me, but by how many cars can make use of it: about 50M a year just from NJ to NY.
Back to your application. In a vacuum, it's not slow or fast. When your stakeholders say "fast application", what we mean is that it performs whatever it is that it does for the end user quickly. Just as I'm not the only car on the GWB, there is never just one end user: we want the application to perform quickly and consistently for all end users. For the user, what matters is fast response time; for you, what matters is also how many can use it simultaneously.
How many users or operations we can process at any given time, or in a given period of time, we call throughput. So latency == how long something takes; throughput == how many you can process "in parallel". You'd be surprised how often they get confused for one another...
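As a rough worked example of the two measures, the bridge numbers can be plugged into a few lines of JavaScript. The length and traffic volume come from the "Facts & Info" slide and the 45 MPH limit from the notes; averaging the yearly volume evenly over every second of the year is a simplifying assumption, and the function names are made up for illustration.

```javascript
// Figures from the "Facts & Info" slide; the speed limit is from the notes.
const BRIDGE_LENGTH_FEET = 4760;
const SPEED_LIMIT_MPH = 45;
const EASTBOUND_VEHICLES_2013 = 49402245;

const FEET_PER_MILE = 5280;
const SECONDS_PER_HOUR = 3600;
const SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR;

// Latency: how long one car takes to cross at the speed limit.
function crossingLatencySeconds(lengthFeet, speedMph) {
  const feetPerSecond = (speedMph * FEET_PER_MILE) / SECONDS_PER_HOUR;
  return lengthFeet / feetPerSecond;
}

// Throughput: how many cars cross per second, averaged over a whole year.
function averageThroughputPerSecond(vehiclesPerYear) {
  return vehiclesPerYear / SECONDS_PER_YEAR;
}

const latency = crossingLatencySeconds(BRIDGE_LENGTH_FEET, SPEED_LIMIT_MPH);
const throughput = averageThroughputPerSecond(EASTBOUND_VEHICLES_2013);

console.log(`latency: ${latency.toFixed(1)} seconds per car`);
console.log(`throughput: ${throughput.toFixed(2)} cars per second`);
```

One car needs a bit over a minute, yet on average more than one car finishes every second. The two numbers answer different questions, which is why they should not be confused.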
One of the reasons that latency and throughput get conflated when talking about performance is that they are closely related. You can easily see in the single-threaded case that your latency directly impacts your throughput: the higher (WORSE) the latency, the lower the throughput. And sometimes, the lower the throughput, the higher the latency...
That happens if your latency across the bridge is caused by delays at the toll booths. THIS IS BECAUSE EACH PHYSICAL RESOURCE CAN ONLY ACCOMMODATE A FIXED NUMBER OF CLIENTS, so everyone has to wait. So they get worse together: increasing latency can reduce your throughput, and decreasing throughput increases latency. That is undeniable; I'm sure we've all experienced it. A slightly less intuitive concept is that increasing throughput capacity may or may not reduce your latency. It depends on how much of the latency is inherent in doing the operation itself and how much is caused by waiting due to... well, lack of throughput...
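The split between "inherent" latency and "waiting" latency can be sketched with a toy model. Everything here is an assumption made for illustration: 120 cars arrive at once, each toll booth takes 10 seconds per car, and the crossing itself takes 72 seconds (4,760 feet at 45 MPH).

```javascript
// Inherent part of the trip: the crossing itself, in seconds.
const CROSSING_SECONDS = 72;

// Average trip time for `cars` vehicles that all arrive at once and share
// `booths` toll booths, each taking `tollSeconds` to serve one car.
function averageTripSeconds(cars, booths, tollSeconds) {
  let totalWait = 0;
  for (let i = 0; i < cars; i++) {
    // Car i is served in round floor(i / booths) and waits for every
    // earlier round to finish.
    totalWait += Math.floor(i / booths) * tollSeconds;
  }
  const averageWait = totalWait / cars;
  return averageWait + tollSeconds + CROSSING_SECONDS;
}

for (const booths of [10, 20, 120, 240]) {
  const seconds = averageTripSeconds(120, booths, 10);
  console.log(`${booths} booths: ${seconds} seconds on average`);
}
```

Going from 10 to 20 booths cuts the average trip noticeably, because most of the time was spent waiting. Going from 120 to 240 booths changes nothing, because by then only the inherent 82 seconds (toll plus crossing) are left.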
Adding more lanes without adding more toll booths will *not* help with either throughput or latency. [click] Adding more toll booths will likely reduce the time across the bridge, but only to a point: no matter how many toll booths or lanes you add, the laws of physics (and speed limit laws) will make it hard to reduce the duration of our trip across the Hudson to less than about a minute.
Speed of light: it's not just a good idea, it's the law!
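The "limiting factor" idea fits in a few lines: the throughput of the whole trip is capped by its slowest stage. The capacities below are invented for illustration, not measured values for the real bridge, and `carsPerMinute` is a hypothetical helper.

```javascript
// Overall throughput is the minimum of the stage capacities.
function carsPerMinute({ lanes, carsPerLane, tollBooths, carsPerBooth }) {
  const roadwayCapacity = lanes * carsPerLane;
  const tollCapacity = tollBooths * carsPerBooth;
  return Math.min(roadwayCapacity, tollCapacity);
}

// Hypothetical starting point: the toll plaza is the bottleneck.
const today = { lanes: 14, carsPerLane: 30, tollBooths: 10, carsPerBooth: 6 };
const moreLanes = { ...today, lanes: 20 };
const moreBooths = { ...today, tollBooths: 20 };

console.log("today:", carsPerMinute(today));
console.log("more lanes:", carsPerMinute(moreLanes));
console.log("more booths:", carsPerMinute(moreBooths));
```

Adding lanes leaves the result unchanged, while adding booths doubles it, at least until the roadway becomes the new limit.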
Why does all of this matter, and how does it tie into your application design decisions? Well, just like getting across the Hudson requires a working vehicle, an open road and a variety of other favorable conditions, your application comprises many components, and all of them must work together optimally to deliver the best possible performance to your end user. Focusing on speeding up the wrong component (one that is not the bottleneck) will be useless... SYSTEM COMPONENTS
Any one of them can slow down each user as they go through it, and that will increase latency, reducing your throughput significantly. You know, the opposite of a "fast application". Components can be split into two groups:
Physical components (resources) and conceptual components: your algorithms, data structures, schema, indexes, choice of storage engine, choice of OS and file system. All those choices affect how your physical resources will be used up. So if you don't design your application well, you will be unnecessarily exhausting some of these limited physical resources, causing your application to perform worse than it might with optimal design. [PAUSE] Of course, physical components must be properly sized and tuned. File system tuning, OS tuning: we don't have time to get into the specifics here, but there's lots of information available, so just keep in mind that if you don't follow best practices in configuring your file system, it's a little bit like trying to drive across the GW Bridge with four flat tires. Let's just agree that's a bad idea? [PAUSE] The two big components we will focus on in detail are the parts of the DB:
[ schema / indexes ] [ STORAGE ENGINE ]
Schema design is the building block of your application, and getting it right is essential to making your application's DB requests efficient. We do that by structuring your data in a way that your application can easily read and write. This will minimize the resources used while minimizing the latency of each request.
Tailoring your schema design to fit your read and write patterns is like using the right tool for the job. Good schema design will always take into account data locality: co-locating data that you tend to get at the same time in the same documents. Now, that's a rule of thumb, and there are definitely ways to take it too far. An important counterpoint is "don't store data in the document that you tend not to need immediately".
Imagine you have to get 50 people across the George Washington Bridge. Would you use a car and make over a dozen trips? Or would you use a much slower-moving bus and get the job done in a single trip? [PAUSE] On the other hand, if you have one passenger, you might get better gas mileage if you take a car rather than the bus.
Making lots of trips to fetch all the data you need for a single operation is an anti-pattern in schema design that we recognize as over-normalization. On the other hand, getting way more data each time than you need is usually a sign of the opposite problem; let's call it over-embedding. "ANTIPATTERNS"
Signs you might be over-embedding: your documents tend to grow unbounded (you keep pushing more values into arrays, though you don't usually need them all); you have deeply nested arrays within arrays, but you usually need to work with only a small number of their elements (NOT ALWAYS); your documents are really large. [PAUSE] Some of the signs you might be over-normalizing:
One sign you might be over-normalizing: [CLICK] you keep implementing joins in your application for every "query".
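To see why application-side joins hurt, it helps to count round trips. The sketch below uses a made-up `FakeCollection` class that only counts lookups; it is not a MongoDB driver API, and the post-with-comments data is hypothetical.

```javascript
// Stand-in for a collection: every findById call counts as one round trip.
class FakeCollection {
  constructor(docs) {
    this.docs = new Map(docs.map((doc) => [doc._id, doc]));
    this.roundTrips = 0;
  }
  findById(id) {
    this.roundTrips += 1;
    return this.docs.get(id);
  }
}

// Over-normalized: the post only references its comments by id.
const posts = new FakeCollection([
  { _id: "p1", title: "Fast apps", commentIds: ["c1", "c2", "c3"] },
]);
const comments = new FakeCollection([
  { _id: "c1", text: "first" },
  { _id: "c2", text: "second" },
  { _id: "c3", text: "third" },
]);

// The "join" happens in the application: one request per comment.
function loadPostNormalized(postId) {
  const post = posts.findById(postId);
  const texts = post.commentIds.map((id) => comments.findById(id).text);
  return { title: post.title, comments: texts };
}

// Embedded: data that is read together lives in the same document.
const postsEmbedded = new FakeCollection([
  { _id: "p1", title: "Fast apps", comments: ["first", "second", "third"] },
]);

function loadPostEmbedded(postId) {
  const post = postsEmbedded.findById(postId);
  return { title: post.title, comments: post.comments };
}

const viaJoin = loadPostNormalized("p1");
const viaEmbed = loadPostEmbedded("p1");
console.log("normalized round trips:", posts.roundTrips + comments.roundTrips);
console.log("embedded round trips:", postsEmbedded.roundTrips);
```

Both functions return the same result, but the normalized version pays one extra request per comment. That is the car making a dozen trips instead of the bus making one.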
Other signs you may run into trouble with your schema in the future: you haven't considered the relative SLAs of reads vs. writes. Usually, if we architect our system to make one of those faster, it's at the cost of the other (more on that when we come to indexes). So knowing up front which one you can afford to be a bit slower (higher latency) will help you make these trade-offs correctly. Another one: you have lots of different types of documents in the same collection; usually that's a sign of trouble. [PAUSE] You have lots of different types of values in the same field across a collection (sometimes string, sometimes date, sometimes number). [PAUSE] That will bring you to the BIGGEST warning sign: your queries can't use indexes efficiently:
can't use indexes efficiently: - unanchored or case insensitive regex's - you need dozens of indexes for a single collection -worst you have no idea what indexes you might possibly need on a collection. Which brings us to the other biggest determining factor of whether your application will be fast:
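To see why an unanchored regex hurts, here is a minimal sketch in plain Python. It is not MongoDB code: a sorted list stands in for an index, and the names prefix_search and substring_search are made up for this illustration. An anchored prefix can binary-search to the start of its range and stop as soon as keys stop matching; a substring match has to look at every key.

```python
import bisect

def prefix_search(sorted_keys, prefix):
    """Anchored match (like /^prefix/): jump to the range start, walk only the range."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches, examined = [], 0
    for position in range(start, len(sorted_keys)):
        examined += 1
        if not sorted_keys[position].startswith(prefix):
            break  # keys are sorted, so nothing later can match
        matches.append(sorted_keys[position])
    return matches, examined

def substring_search(sorted_keys, fragment):
    """Unanchored match (like /fragment/): sort order is no help, examine every key."""
    matches, examined = [], 0
    for key in sorted_keys:
        examined += 1
        if fragment in key:
            matches.append(key)
    return matches, examined

# Hypothetical data: 10,000 sorted keys standing in for index entries.
keys = sorted(f"user{n:05d}" for n in range(10000))
prefix_hits, prefix_examined = prefix_search(keys, "user0042")
substring_hits, substring_examined = substring_search(keys, "0042")
print(prefix_examined, substring_examined)
```

The two searches return a similar handful of keys, but the anchored one examines only the matching range plus one key, while the unanchored one examines all of them.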
I wouldn't be exaggerating if I told you that when our support team is dealing with a customer whose application is "slow", over 90% of the time the indexes are suboptimal or outright missing for a high percentage of the slow operations! And this is in spite of the fact that we constantly harp on how important indexing is to good performance. And of course *all* databases require indexing to work well, right? Let me show you how BAD life is with no indexes:
Here is my bridge analogy extended to such systems: imagine that every morning a bus, let's say NJ Transit, picks up passengers at bus stops and then heads across the GWB. How would it impact the "latency" of the trip if we threw away the schedule and the signed bus stops, and the bus just drove down every street to see if any of the people who wanted to go to NY were there? I don't imagine that would work very well. YOUR APP = query: { } [PAUSE] And yet users deploy applications into production without proper indexes in place, frequently because they didn't test properly: they didn't benchmark their application's performance (more about that at the end).
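The bus driving down every street is a collection scan. As a rough sketch of the difference, here is a toy model in Python. It assumes nothing about MongoDB internals: a dict stands in for an index (real indexes are B-trees), and the collection, field names and helper functions are hypothetical.

```python
def collection_scan(docs, stop):
    """No index: look at every document to find riders at one stop."""
    found, examined = [], 0
    for doc in docs:
        examined += 1
        if doc["stop"] == stop:
            found.append(doc)
    return found, examined

def build_index(docs, field):
    """Toy index: map each field value to the positions of matching documents."""
    index = {}
    for position, doc in enumerate(docs):
        index.setdefault(doc[field], []).append(position)
    return index

def index_lookup(docs, index, value):
    """With an index: go straight to the matching documents."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)

# Hypothetical collection: 5,000 riders spread over 500 bus stops.
riders = [{"_id": i, "stop": f"stop-{i % 500}", "rider": f"rider-{i}"} for i in range(5000)]
stop_index = build_index(riders, "stop")

scan_hits, scan_examined = collection_scan(riders, "stop-42")
index_hits, index_examined = index_lookup(riders, stop_index, "stop-42")
print(scan_examined, index_examined)
```

Both approaches find the same riders; the scan just visits every document to do it. The index is not free either: build_index has to be kept up to date on every write, which is the read vs. write trade-off mentioned earlier.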
I'm sure you are all excited to hear about how awesome WiredTiger is, and it is! But of course, the right tool for the job and all that. There are a couple of important differences between MMAP and WT that I want you to understand so you can take advantage of the strengths of each.
The most easily seen difference: WT has on-disk compression. MMAP does not. MMAP does X. WT does Y. Will it help with RAM? Yes: index prefix compression.
Index prefix compression 7X (1/7th) 20% or less! 40% 3%
We have our own application, Evergreen, our continuous integration system, which runs thousands of tests and has TBs of log files. It was doing fine with MMAP, but with 10x compression in WT we are now able to keep 10x as many runs of history! There is a talk about it tomorrow afternoon.
If disk is a big limiting factor for your application, AND your data is highly compressible, AND you have CPU cycles available, then WT FTW!
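Whether your data is "highly compressible" is something you can check before you commit. Here is a rough sketch, assuming made-up log lines and using zlib from the Python standard library as a stand-in compressor. The ratio you would actually see in a storage engine depends on the compressor and block size it is configured with, so treat this only as a way to get a feel for your own data.

```python
import zlib

# Hypothetical, highly repetitive log data (stand-in for build/test logs).
log_lines = [
    f"2015-06-01T12:00:{i % 60:02d} task={i % 7} status=passed duration_ms={100 + i % 13}\n"
    for i in range(5000)
]
raw = "".join(log_lines).encode("utf-8")

# Compression costs CPU cycles; that is the price paid for the disk savings.
compressed = zlib.compress(raw)
ratio = len(raw) / len(compressed)

print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.1f}x")
```

Run the same measurement on a sample of your real documents: repetitive logs compress very well, while data that is already compressed (images, encrypted blobs) will barely shrink.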
Next, something interesting and complex: CONCURRENCY, which impacts both latency and throughput. A lot has been said over the years about MMAP's coarse-grained concurrency. It's like having relatively few toll booths in front of the GWB. It can be a limiting factor. But for the actual execution of the operation, MMAP is "faster", i.e. lower latency. WiredTiger has very fine-grained concurrency. In fact, it is not "document-level *locking*": it uses clever lock-free algorithms to achieve a high degree of concurrency. But related to that, the latency of a single operation is higher than with MMAP. WT: higher throughput, higher latency.
Why would the granularity of locking impact latency this way? Imagine the GWB lanes again. MMAP is like having one toll booth (or one per lane): once you pay the toll, you *know* you are the only one in that lane, so you can go as fast as possible.
WiredTiger, well, I'm stretching the metaphor a little here, but imagine that there are no toll booths. Everyone has E-ZPass or FasTrak or whatever. You drive to your lane, BUT if you find yourself in contention with another car for that lane, then one of you has to stop and try again. So first, you can't drive quite so fast, because you have to be able to notice another car in your lane in time to stop, and second, if you do meet contention, then you have to stop and try again. WRITE CONFLICTS. NO BLIND WRITES.
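The "stop and try again" idea can be sketched as optimistic concurrency with a version check. This is a toy model only, not WiredTiger's actual algorithm: the VersionedStore class, the WriteConflict exception and the interfere hook are all invented for the illustration.

```python
class WriteConflict(Exception):
    """Raised when another writer changed the document since we read it."""

class VersionedStore:
    def __init__(self):
        self._docs = {}  # key -> (version, value)

    def read(self, key):
        return self._docs.get(key, (0, 0))

    def write(self, key, expected_version, value):
        # No blind writes: the write only succeeds against the version we read.
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise WriteConflict(key)
        self._docs[key] = (current_version + 1, value)

def increment_with_retry(store, key, interfere=None, max_retries=5):
    """Read, modify, write; on conflict, stop and try again. Returns attempts used."""
    for attempt in range(1, max_retries + 1):
        version, value = store.read(key)
        if interfere is not None:
            interfere()       # another "car" enters the lane after our read
            interfere = None  # only interfere with the first attempt
        try:
            store.write(key, version, value + 1)
            return attempt
        except WriteConflict:
            continue
    raise WriteConflict(key)

store = VersionedStore()
attempts_uncontended = increment_with_retry(store, "lane-1")

def other_car():
    version, value = store.read("lane-1")
    store.write("lane-1", version, value + 1)

attempts_contended = increment_with_retry(store, "lane-1", interfere=other_car)
print(attempts_uncontended, attempts_contended, store.read("lane-1"))
```

Without contention the write goes through on the first attempt. With contention nobody waits at a toll booth, but the losing writer pays for a wasted attempt, which is where the extra latency comes from.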
So when is this a big win for WiredTiger? Well, (a) you have to have multiple threads: with too few threads you aren't winning big from the clever algorithms; (b) the multiple threads have to be contending on the same collection (otherwise MMAP's collection-level lock is enough); (c) the multiple threads must NOT all be contending on a single document (if they are then, well, you see); (d) you need CPU available; but (e) you must not have significantly more threads than you have "lanes", in this case CPU cores. Here are some "benchmarks":
Threads NOT contending on the same document, and threads contending. Uniform, latest, Zipfian.
You must not have significantly more threads than you have "lanes", in this case CPU cores. If you have a huge number of threads all trying to do active work on a small number of cores, you will waste a huge amount of resources on context switching rather than actually doing work, plus you get more threads contending on the same documents.
Even for read-heavy loads, a huge number of threads all trying to do active work on a small number of cores will waste a huge amount of resources on context switching instead of actually doing work. That's concurrency and multithreading. So please don't do any single-threaded benchmarking of WT and then ask how come it's not as fast as you heard. But don't benchmark 500 threads on a 4-core laptop either!
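As a rough sketch of what a saner benchmark loop might look like, here is a small Python harness that sweeps thread counts scaled to the number of cores and reports both throughput and latency. Everything here is an assumption for illustration: thread_counts_to_try, run_benchmark and sample_operation are hypothetical names, and sample_operation just sleeps where your real database call would go.

```python
import os
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def thread_counts_to_try(cores):
    """Sweep from single-threaded up to a modest multiple of the core count."""
    return sorted({1, max(1, cores // 2), cores, cores * 2})

def run_benchmark(operation, threads, ops_per_thread):
    """Run the operation from several threads; report throughput and median latency."""
    def worker():
        latencies = []
        for _ in range(ops_per_thread):
            started = time.perf_counter()
            operation()
            latencies.append(time.perf_counter() - started)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker) for _ in range(threads)]
        latencies = [lat for future in futures for lat in future.result()]
    elapsed = time.perf_counter() - wall_start

    return {
        "threads": threads,
        "ops": len(latencies),
        "throughput_per_s": len(latencies) / elapsed,
        "median_latency_s": statistics.median(latencies),
    }

def sample_operation():
    time.sleep(0.001)  # placeholder for one database request

cores = os.cpu_count() or 1
results = [run_benchmark(sample_operation, n, 20) for n in thread_counts_to_try(cores)]
for result in results:
    print(result)
```

The point of the sweep is to look at latency and throughput together as the thread count changes, rather than trusting a single number from a single configuration.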
The other significant differentiator is the "write pattern". I'm not talking about compressing data on disk and using disk IOPS a lot more judiciously than MMAP. I'm talking about write amplification. There is a big difference in how writes are done during updates: MMAP does "in place" updates; WT does "copy on write" on all updates. Here is an illustration using a document rather than a bridge:
Here's a time series document for a particular hour, with minutes and seconds. If you make an update to this document, MMAP will overwrite the existing document with the new value.
Back to the original document: WiredTiger will rewrite the current document on update (or, more technically, the internal page that contains that document) as a new version of that page. This of course lets whoever was reading that page keep reading the previous version, which gets recycled when everyone who was using it is done with it. USE CASE: think about the case where you have a very high number of documents, which are nonetheless a small portion of your total data, being updated extremely frequently, over and over again.
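The two update styles can be contrasted with a toy model in Python. It assumes a made-up time series document with one counter per minute, and a plain dict standing in for a page; real storage engines work on bytes and pages, not Python dicts, so this only shows the visible behavior.

```python
import copy

def update_in_place(page, minute, value):
    """In-place update: overwrite the value inside the existing page."""
    page["minutes"][minute] = value
    return page

def update_copy_on_write(page, minute, value):
    """Copy-on-write update: write a whole new version, leave the old one alone."""
    new_page = copy.deepcopy(page)
    new_page["minutes"][minute] = value
    return new_page

def new_hour_document():
    # Hypothetical time series document: one counter per minute of the hour.
    return {"hour": "2015-06-01T12", "minutes": {m: 0 for m in range(60)}}

# In place: a reader holding the page sees the change immediately.
in_place_page = new_hour_document()
in_place_reader = in_place_page
update_in_place(in_place_page, 5, 42)

# Copy on write: a reader of the old version keeps seeing the old value.
original_page = new_hour_document()
snapshot_reader = original_page
new_version = update_copy_on_write(original_page, 5, 42)

print(in_place_reader["minutes"][5], snapshot_reader["minutes"][5], new_version["minutes"][5])
```

Changing a single counter in place touches one value, while the copy-on-write version rewrites the whole page to change that one value. That is the write amplification, and it is also what lets readers keep a stable previous version.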
I'm talking, of course, about a system like the MMS monitoring component, which receives a large number of performance metrics and updates counters inside documents that don't change, except for these numbers being incremented, for the duration of whatever the document represents. Here, with a schema heavily optimized to make sure updates are in place, performance is better with MMAP even though it uses more disk space (and RAM).
And this brings me to the most important point I'm going to make: all the generalizations are just that. No matter what I told you here today, no matter what you read on the internet, the only way to know for sure how fast your application is with your carefully selected schema and your carefully selected indexes is to stress test and measure it. The examples I used are both applications we run in-house that we benchmarked with both storage engines, with different configurations and physical resources, to make the most appropriate choices. You guys should do the same. Oh, and if you happen to be going back to Jersey tonight and you want predictable latency, do yourself a favor and take the train. Thank you!