New applications, users and inputs demand new types of data, like unstructured, semi-structured and polymorphic data. Adopting MongoDB means adapting to a new, document-based data model.
While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't apply to MongoDB. Documents can represent rich data structures, providing lots of viable alternatives to the standard, normalized, relational model. In addition, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.
In this session, Buzz Moschetti explores how you can take advantage of MongoDB's document model to build modern applications.
MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoff of various data modeling strategies in MongoDB.
NoSQL databases only unfold their full strength when you also embrace their concepts of usage and schema design. These slides give an overview of the features and concepts of MongoDB.
Back to Basics Webinar 3: Schema Design Thinking in Documents - MongoDB
This is the third webinar of a Back to Basics series that will introduce you to the MongoDB database. This webinar will explain the architecture of document databases.
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes - MongoDB
This is the fourth webinar of a Back to Basics series that will introduce you to the MongoDB database. This webinar will cover advanced indexing, including text and geospatial indexes.
Dev Jumpstart: Schema Design Best Practices - MongoDB
New to MongoDB? MongoDB’s basic unit of storage is a document. We’ll discuss the trade-offs of various data modeling strategies in MongoDB. This talk will jumpstart your knowledge of how to work with documents, how to evolve your schema, and common schema design patterns. No prior knowledge of MongoDB is assumed.
Building a Scalable Inbox System with MongoDB and Java - antoinegirbal
Many user-facing applications present some kind of news feed/inbox system. You can think of Facebook, Twitter, or Gmail as different types of inboxes where the user can see data of interest, sorted by time, popularity, or other parameters. A scalable inbox is a difficult problem to solve: for millions of users, varied data from many sources must be sorted and presented within milliseconds. Different strategies can be used: scatter-gather, fan-out writes, and so on. This session presents an actual application developed by 10gen in Java, using MongoDB. This application is open source and is intended to show a reference implementation of several strategies for tackling this common challenge. The presentation also introduces many MongoDB concepts.
This talk will introduce the philosophy and features of the open source, NoSQL MongoDB. We’ll discuss the benefits of the document-based data model that MongoDB offers by walking through how one can build a simple app to store books. We’ll cover inserting, updating, and querying the database of books.
Conceptos básicos. Seminario web 3: Diseño de esquema pensado para documentos - MongoDB
This is the third webinar in the Back to Basics series, which introduces the MongoDB database. This webinar explains the architecture of document databases.
Back to Basics: My First MongoDB Application - MongoDB
This Back to Basics webinar series will introduce you to NoSQL and the MongoDB database. You will find out what MongoDB is, why you would use it, and what you would use it for.
Conceptos básicos. Seminario web 4: Indexación avanzada, índices de texto y g... - MongoDB
This is the fourth webinar in the Back to Basics series, which introduces the MongoDB database. This webinar looks at support for free-text and geospatial indexes.
The Fine Art of Schema Design in MongoDB: Dos and Don'ts - Matias Cascallares
Schema design in MongoDB can be an art. Different trade-offs must be considered when deciding how to store your data. In this presentation we cover common scenarios, recommended practices, and don'ts to avoid, based on previous experience.
Conceptos básicos. Seminario web 2: Su primera aplicación MongoDB - MongoDB
This is the second webinar in the Back to Basics series, which introduces the MongoDB database. In this webinar we will show how to build a blogging application in MongoDB.
Socialite, the Open Source Status Feed Part 2: Managing the Social Graph - MongoDB
There are many possible approaches to storing and querying relationships between users in social networks. This section will dive into the details of storing a social user graph in MongoDB. It will cover the various schema designs for storing the follower networks of users and propose an optimal design for insert and query performance, as well as looking at performance differences between them.
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat... - MongoDB
This is the second webinar in the Back to Basics series, which introduces the MongoDB database. In this webinar, we will explain how to build a blogging application in MongoDB.
Beyond the Basics 2: Aggregation Framework - MongoDB
The aggregation framework is one of the most powerful analytical tools available with MongoDB.
Learn how to create a pipeline of operations that can reshape and transform your data and apply a range of analytics functions and calculations to produce summary results across a data set.
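A pipeline like the one described is just an ordered list of stage documents. As a minimal sketch (in Python these are the plain dicts you would pass to PyMongo's Collection.aggregate; the collection fields and names here are hypothetical, not from the webinar):

```python
# A MongoDB aggregation pipeline is an ordered list of stage documents.
# Field and collection names below are illustrative examples only.
pipeline = [
    {"$match": {"status": "complete"}},   # filter documents first
    {"$group": {                          # reshape: one document per customer
        "_id": "$customer_id",
        "total": {"$sum": "$amount"},
        "orders": {"$sum": 1},
    }},
    {"$sort": {"total": -1}},             # largest totals first
]
# With a live server this would run as: db.orders.aggregate(pipeline)
```

Ordering matters: putting $match first lets the server discard documents before the more expensive $group stage.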
Back to Basics Webinar 1: Introduction to NoSQL - MongoDB
This is the first webinar of a Back to Basics series that will introduce you to the MongoDB database, what it is, why you would use it, and what you would use it for.
This talk examines four real-world use cases for MongoDB document-based data modeling, discussing the implications of several possible solutions for each problem.
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presen... - MongoDB
In this session, we'll examine schema design insights and trade-offs using real world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large scale web application, and using MongoDB to store user profiles. From these examples you should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well versed in basic schema design and familiar with concepts in the morning's basic schema design talk. No beginner topics will be covered in this session.
Webinar: Building Your First App with MongoDB and Java - MongoDB
This webinar will walk you through building a simple Java-based application in MongoDB. We’ll cover the basics of MongoDB’s document model, query language, aggregation framework, and deployment architecture.
In this webinar, you will discover:
- How easy it is to start building Java applications with MongoDB
- Key features for manipulating and accessing data
- High availability and scale-out architecture
- WriteConcerns and ReadPreference
Back to Basics, webinar 4: Indicizzazione avanzata, indici testuali e geospaz... - MongoDB
This is the fourth webinar in the Back to Basics series introducing the MongoDB database. This webinar looks at full-text index support and geospatial support.
Webinar: Getting Started with MongoDB - Back to Basics - MongoDB
Part one: an introduction to MongoDB. Learn how easy it is to start building applications with MongoDB. This session covers key features and functionality of MongoDB and sets out the course of building an application.
Back to Basics Webinar 6: Production Deployment - MongoDB
This is the final webinar of a Back to Basics series that will introduce you to the MongoDB database. This webinar will guide you through production deployment.
This tutorial will introduce the features of MongoDB by building a simple location-based application using MongoDB. The tutorial will cover the basics of MongoDB’s document model, query language, map-reduce framework and deployment architecture.
The tutorial will be divided into 5 sections:
Data modeling with MongoDB: documents, collections and databases
Querying your data: simple queries, geospatial queries, and text-searching
Writes and updates: using MongoDB’s atomic update modifiers
Trending and analytics: Using mapreduce and MongoDB’s aggregation framework
Deploying the sample application
Besides the knowledge to start building their own applications with MongoDB, attendees will finish the session with a working application they use to check into locations around Portland from any HTML5 enabled phone!
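The atomic update modifiers mentioned in the outline ($inc, $set, $push) are themselves expressed as documents. A minimal sketch in Python with hypothetical field names (the tutorial app itself is in Ruby, but the update documents are language-independent):

```python
# MongoDB's atomic update modifiers are documents; in Python they are
# plain dicts. All field names here are hypothetical check-in fields.
filter_doc = {"_id": "powells-books"}
update_doc = {
    "$inc":  {"checkins": 1},                        # atomic counter bump
    "$set":  {"last_checkin": "2012-05-20T18:00:00Z"},
    "$push": {"visitors": "user123"},                # append to an array field
}
# Against a live server (PyMongo) this would be:
#   db.locations.update_one(filter_doc, update_doc)
```

Because all three modifiers are applied in one atomic operation on a single document, no read-modify-write cycle is needed on the client.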
TUTORIAL PREREQUISITES
Each attendee should have a running version of MongoDB. Preferably the latest unstable release 2.1.x, but any install after 2.0 should be fine. You can download MongoDB at http://www.mongodb.org/downloads.
Instructions for installing MongoDB are at http://docs.mongodb.org/manual/installation/.
Additionally, we will be building an app in Ruby. Ruby 1.9.3+ is required; the latest version of Ruby is currently 1.9.3-p194.
For Windows, download the installer from http://rubyinstaller.org/
For OS X, download http://unfiniti.com/software/mac/jewelrybox/
For Linux, most users will know how to install Ruby for their own distribution.
We will be using the following gems, and they MUST BE installed ahead of time so you can be ahead of the game and safe in the event that the Internet isn’t accommodating.
bson (1.6.4)
bson_ext (1.6.4)
haml (3.1.4)
mongo (1.6.4)
rack (1.4.1)
rack-protection (1.2.0)
shotgun (0.9)
sinatra (1.3.2)
tilt (1.3.3)
Prior Ruby experience isn’t required for this. We will NOT be using Rails for this app.
Back to Basics Webinar 5: Introduction to the Aggregation Framework - MongoDB
This is the fifth webinar of a Back to Basics series that will introduce you to the MongoDB database. This webinar will introduce you to the aggregation framework.
The storage engine is responsible for managing how data is stored, both in memory and on disk. MongoDB supports multiple storage engines, as different engines perform better for specific workloads.
View this presentation to understand:
What a storage engine is
How to pick a storage engine
How to configure a storage engine and a replica set
Speaker: Daniel Coupal
At this point, you may be familiar with the design of MongoDB databases and collections – but what are the frequent patterns you may have to model?
This presentation will add knowledge of how to represent common relationships (1-1, 1-N, N-N) in MongoDB. Going further than relationships, this presentation identifies a set of common patterns, in a similar way to what the Gang of Four did for Object Oriented Design. Finally, this presentation will guide you through the steps of modeling those patterns in MongoDB collections.
In this session, you will learn about:
How to create the appropriate MongoDB collections for some of the patterns discussed.
How relationships differ from the relational database world, and how those differences translate to MongoDB collections.
Common patterns in developing applications with MongoDB, plus a specific vocabulary with which to refer to them.
One of the challenges that comes with moving to MongoDB is figuring out how to best model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. Understandably, this begets good questions: Are foreign keys permissible, or is it better to represent one-to-many relations within a single document? Are join tables necessary, or is there another technique for building out many-to-many relationships? What level of denormalization is appropriate? How do my data modeling decisions affect the efficiency of updates and queries? In this session, we'll answer these questions and more, provide a number of data modeling rules of thumb, and discuss the tradeoffs of various data modeling strategies.
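To make the embedding-versus-referencing question concrete, here is a minimal sketch of both shapes for a one-to-many relation (field names are hypothetical, not from the session):

```python
# Embedded: the "many" side lives inside the parent document, so it is
# read in one query and updated atomically with the parent.
post_embedded = {
    "_id": 1,
    "title": "Schema design",
    "comments": [
        {"author": "ann", "text": "Nice talk"},
        {"author": "bob", "text": "+1"},
    ],
}

# Referenced: comments live in their own collection and point back to
# the parent, like a foreign key. Better when the "many" side is
# unbounded or queried independently of the parent.
comment_referenced = {"post_id": 1, "author": "ann", "text": "Nice talk"}
```

Embedding favors read locality and atomic updates; referencing avoids unbounded document growth at the cost of an extra query or $lookup.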
MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoff of various data modeling strategies in MongoDB using a library as a sample application. You will learn how to work with documents, evolve your schema, and common schema design patterns.
Presented by Marc Schwering, Senior Solutions Architect, MongoDB
MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoff of various data modeling strategies in MongoDB. You will learn:
- How to work with documents
- How to evolve your schema
- Common schema design patterns
Webinar: General Technical Overview of MongoDB for Dev Teams - MongoDB
In this talk we will focus on several of the reasons why developers have come to love the richness, flexibility, and ease of use that MongoDB provides. First we will give a brief introduction of MongoDB, comparing and contrasting it to the traditional relational database. Next, we’ll give an overview of the APIs and tools that are part of the MongoDB ecosystem. Then we’ll look at how MongoDB CRUD (Create, Read, Update, Delete) operations work, and also explore query, update, and projection operators. Finally, we will discuss MongoDB indexes and look at some examples of how indexes are used.
One of the challenges that comes with moving to MongoDB is figuring out how to best model your data. While most developers have internalized the rules of thumb for designing schemas for relational databases, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense.
MongoDB San Francisco 2013: Schema design presented by Jason Zucchetto, Consu... - MongoDB
MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoff of various data modeling strategies in MongoDB using a library as a sample application. You will learn how to work with documents, evolve your schema, and common schema design patterns.
Media owners are turning to MongoDB to drive social interaction with their published content. The way customers consume information has changed and passive communication is no longer enough. They want to comment, share and engage with publishers and their community through a range of media types and via multiple channels whenever and wherever they are. There are serious challenges with taking this semi-structured and unstructured data and making it work in a traditional relational database. This webinar looks at how MongoDB’s schemaless design and document orientation gives organisations like the Guardian the flexibility to aggregate social content and scale out.
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas - MongoDB
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! - MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... - MongoDB
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB.
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB - MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... - MongoDB
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data - MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
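One schema design commonly recommended for IoT workloads in MongoDB is the bucket pattern; here is a minimal sketch with hypothetical field names (this may or may not be the exact design covered in the talk):

```python
# Bucket pattern sketch: instead of one document per sensor reading,
# store one document per sensor per time window, appending readings to
# an array. This reduces document count and index size - two of the
# memory/disk factors mentioned above. Field names are hypothetical.
bucket = {
    "sensor_id": "thermo-42",
    "bucket_start": "2020-01-01T00:00:00Z",
    "count": 3,
    "readings": [
        {"t": "2020-01-01T00:00:10Z", "temp_c": 20.1},
        {"t": "2020-01-01T00:20:00Z", "temp_c": 20.4},
        {"t": "2020-01-01T00:40:30Z", "temp_c": 20.2},
    ],
}

# Per-bucket summaries are cheap to compute (or to maintain with $inc
# on insert), e.g. the average temperature for the hour:
avg_temp = sum(r["temp_c"] for r in bucket["readings"]) / bucket["count"]
```

A new reading is added with a single atomic update ($push onto readings, $inc on count), rather than an insert of a new document.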
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] - MongoDB
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 - MongoDB
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... - MongoDB
MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! - MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset - MongoDB
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart - MongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... - MongoDB
Query performance should be the unsung hero of an application, but without proper configuration it can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices for building multikey indexes.
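The equality, sort, and range concepts mentioned above correspond to MongoDB's well-known guideline for ordering compound index keys (often abbreviated ESR). A minimal sketch with hypothetical field names:

```python
# "Equality, Sort, Range" key ordering for a compound index: fields
# matched by equality first, then the sort field, then range filters.
# Field names are hypothetical; with PyMongo this spec would be passed
# to Collection.create_index(index_spec).
index_spec = [
    ("status", 1),       # equality:  {"status": "active"}
    ("created_at", -1),  # sort:      newest first
    ("amount", 1),       # range:     {"amount": {"$gt": 100}}
]
```

With this ordering, a query filtering on status, sorting by created_at, and range-filtering on amount can be served by the index without an in-memory sort.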
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ - MongoDB
The aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power, and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups, and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... - MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive - MongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang - MongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
aux Core Data, appréciée par des centaines de milliers de développeurs. Apprenez ce qui rend Realm spécial et comment il peut être utilisé pour créer de meilleures applications plus rapidement.
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
Il n’a jamais été aussi facile de commander en ligne et de se faire livrer en moins de 48h très souvent gratuitement. Cette simplicité d’usage cache un marché complexe de plus de 8000 milliards de $.
La data est bien connu du monde de la Supply Chain (itinéraires, informations sur les marchandises, douanes,…), mais la valeur de ces données opérationnelles reste peu exploitée. En alliant expertise métier et Data Science, Upply redéfinit les fondamentaux de la Supply Chain en proposant à chacun des acteurs de surmonter la volatilité et l’inefficacité du marché.
2. Before We Begin
• This webinar is being recorded
• Use the chat window for:
  – Technical assistance
  – Q&A
• The MongoDB team will answer quick questions in real time
• “Common” questions will be reviewed at the end of the webinar
3. Theme #1: Great Data Design involves much more than the database
• Easily understood structures
• Harmonized with software
• Acknowledging legacy issues
4. Theme #2: Today’s solutions need to accommodate tomorrow’s needs
• End of “Requirements Complete”
• Ability to economically scale
• Shorter solution lifecycles
17. 17
Substructure Works Well With Code
> addr = db.patrons.find({_id: "joe"}, {"_id": 0, "address": 1})
{
  address: {
    street: "123 Fake St.",
    city: "Faketon",
    state: "MA",
    zip: "12345"
  }
}
// Pass the whole Map to this function:
doSomethingWithOneAddress(addr);
// Somewhere else in the code is the actual function:
doSomethingWithOneAddress(Map addr)
{ // Look for state }
19. 19
Substructure Amplifies Agility
> addr = db.patrons.find({_id: "bob"}, {"_id": 0, "address": 1})
{
  address: {
    street: "139 W45 St.",
    city: "NY",
    state: "NY",
    country: "USA"
  }
}
// Pass the whole Map to this function:
doSomethingWithOneAddress(addr);
doSomethingWithOneAddress(Map addr)
{ // Look for state and optional country }
NO CHANGE to queries
Only the single implementation that looks for country needs to change
NO COMPILE-TIME DEPENDENCIES when passing Maps
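The “no change” claim above can be sketched in plain JavaScript, with no database involved (the function and sample shapes here are illustrative, not from the original deck): a consumer that inspects only the fields it cares about lets documents of either shape flow through unchanged.

```javascript
// Sketch: the consumer looks for "state" and an optional "country";
// extra or missing fields in the generic object flow through untouched.
function doSomethingWithOneAddress(addr) {
  return addr.country ? addr.state + ", " + addr.country : addr.state;
}

// Shape stored for "joe" (no country field):
const joeAddr = { street: "123 Fake St.", city: "Faketon", state: "MA", zip: "12345" };

// Shape stored for "bob" (new optional country field, no zip):
const bobAddr = { street: "139 W45 St.", city: "NY", state: "NY", country: "USA" };

console.log(doSomethingWithOneAddress(joeAddr)); // "MA"
console.log(doSomethingWithOneAddress(bobAddr)); // "NY, USA"
```

Neither the query nor any intermediate caller needs to know which shape it is holding; only this one function changes when "country" is consumed.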
20. 20
The Advantage over Rectangles
resultSet = select street, state, city, country, …
Map addr = processIntoMap(resultSet);
// Pass the whole Map to this function:
doSomethingWithOneAddress(addr);
doSomethingWithOneAddress(Map addr)
{ // Look for state and optional country }
Queries must change to pick up new columns
Compile-time dependency to process new columns into the Map
22. 22
One-to-One Relationships
• “Belongs to” relationships are often embedded
• Holistic representation of entities with their embedded attributes and relationships
• Great read performance
Most important:
• Keeps simple things simple
• Frees up time to tackle harder schema design issues
24. A Patron and their Addresses
> db.patrons.find({ _id: "bob" })
{
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [
    {street: "1 Vernon St.", city: "Newton", …},
    {street: "52 Main St.", city: "Boston", …}
  ]
}
25. A Patron and their Addresses
> db.patrons.find({ _id: "bob" })
{
  _id: "bob",
  name: "Bob Knowitall",
  addresses: [
    {street: "1 Vernon St.", city: "Newton", …},
    {street: "52 Main St.", city: "Boston", …}
  ]
}
> db.patrons.find({ _id: "joe" })
{
  _id: "joe",
  name: "Joe Bookreader",
  address: { street: "123 Fake St.", city: "Faketon", …}
}
26. 26
Migration Options
• Migrate all documents when the schema changes
• Migrate on demand
  – As we pull up a patron’s document, we make the change
  – Any patrons that never come into the library never get updated
• Leave it alone
  – The code layer knows about both address and addresses
27. 27
Letting The Code Deal With Documents
Map d = collection.findOne(new BasicDBObject("_id", "bob"));
// Contract: Return either a List of addresses or null
// if no addresses exist.
// Try to get the new "version 2" shape:
List addrl = (List) d.get("addresses");
// If not there, try to get the old one:
if (addrl == null) {
    Map oneAddr = (Map) d.get("address");
    if (oneAddr != null) {
        addrl = new ArrayList();
        addrl.add(oneAddr);
    }
}
// addrl either exists with 1 or more items or is null
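The same data-access-layer contract can be sketched as a small, testable JavaScript function (names and sample documents here are illustrative): normalize a patron document that may carry either the old `address` field or the new `addresses` array.

```javascript
// Sketch: Day 2 schema change handled in code.
// Contract: return a list of addresses, or null if none exist.
function getAddresses(doc) {
  // Try the new "version 2" shape first:
  let addrl = doc.addresses || null;
  // If not there, fall back to the old single-address shape:
  if (addrl === null && doc.address) {
    addrl = [doc.address];
  }
  return addrl; // array with 1+ items, or null
}

const oldDoc = { _id: "joe", address: { city: "Faketon" } };            // v1 shape
const newDoc = { _id: "bob", addresses: [{ city: "Newton" },
                                         { city: "Boston" }] };        // v2 shape
console.log(getAddresses(oldDoc).length); // 1
console.log(getAddresses(newDoc).length); // 2
console.log(getAddresses({ _id: "x" }));  // null
```

Every caller above this function sees one shape, regardless of when the underlying document was written.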
29. 29
Book
• MongoDB: The Definitive Guide,
• By Kristina Chodorow and Mike Dirolf
• Published: 9/24/2010
• Pages: 216
• Language: English
• Publisher: O’Reilly Media, CA
32. 32
One-To-Many Using Embedding
• Optimized for read performance of Books
• We accept data duplication
• An index on “publisher.name” provides:
  – Efficient lookup of all books for a given publisher name
  – An efficient way to find all publisher names (distinct)
• Does not automatically mean there is no “master” Publisher collection (from which data is copied when creating a new Book)
34. Single Book with Linked Publisher
> book = db.books.findOne({ _id: "123" })
{
  _id: "123",
  publisher_id: "oreilly",
  title: "MongoDB: The Definitive Guide",
  …
}
> db.publishers.find({ _id: book.publisher_id })
{
  _id: "oreilly",
  name: "O’Reilly Media",
  founded: 1980,
  locations: ["CA", "NY"]
}
35. Multiple Books with Linked Publisher
var tmpm = {};
db.books.find({ pages: {$gt: 100} }).forEach(function(book) {
  // Do whatever you need with the book document, but
  // in addition, capture the publisher ID uniquely by
  // using it as a key in an object (Map)
  tmpm[book.publisher_id] = true;
});
uniqueIDs = Object.keys(tmpm); // extract ONLY the keys
db.publishers.find({"_id": {"$in": uniqueIDs}});
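The assemble-then-lookup pattern above can be exercised without a database; the `books` and `publishers` arrays below are stand-ins for the two collections.

```javascript
// Sketch of the "multi-value join": collect unique publisher ids from a
// batch of book documents, then look the publishers up by primary key.
const books = [
  { _id: "1", title: "Perl Tricks", pages: 150, publisher_id: "oreilly" },
  { _id: "2", title: "More Jokes",  pages: 200, publisher_id: "randomhouse" },
  { _id: "3", title: "More Perl",   pages: 180, publisher_id: "oreilly" },
];
const publishers = [
  { _id: "oreilly",     name: "O'Reilly Media" },
  { _id: "randomhouse", name: "Random House" },
];

// Capture each publisher id uniquely by using it as a key in an object:
const tmpm = {};
books.forEach(book => { tmpm[book.publisher_id] = true; });
const uniqueIDs = Object.keys(tmpm);

// Client-side equivalent of db.publishers.find({_id: {$in: uniqueIDs}}):
const matched = publishers.filter(p => uniqueIDs.includes(p._id));
console.log(uniqueIDs.length); // 2, even though 3 books were fetched
console.log(matched.length);   // 2
```

The second lookup is by `_id`, so it stays efficient whether the unique list holds 1 entry or 10,000.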
36. Cartesian Product != Desired Structure
resultSet = "select B.name, B.publish_date, P.name, P.founded
             from Books B, Publisher P
             where P.name = B.publisher_name
             and B.pages > 100"

B.Name         B.publish_date   P.name         P.founded
More Jokes     2003             Random House   1843
Perl Tricks    1998             O’Reilly       1980
More Perl      2000             O’Reilly       1980
Starting Perl  1996             O’Reilly       1980
Flying Kites   1980             Random House   1843
Using Perl     2002             O’Reilly       1980
Bad Food       2011             Random House   1843
37. …And Tough To Use Without ORDER BY
resultSet = "select B.name, B.publish_date, P.name, P.founded
             from Books B, Publisher P
             where P.name = B.publisher_name
             and B.pages > 100
             order by P.name";

B.Name         B.publish_date   P.name         P.founded
Perl Tricks    1998             O’Reilly       1980
More Perl      2000             O’Reilly       1980
Using Perl     2002             O’Reilly       1980
Starting Perl  1996             O’Reilly       1980
Flying Kites   1980             Random House   1843
Bad Food       2011             Random House   1843
More Jokes     2003             Random House   1843
38. SQL Is About Disassembly
resultSet = "select B.name, B.publish_date, P.name, P.founded
             from Books B, Publisher P
             where P.name = B.publisher_name
             and B.pages > 100
             order by P.name";

prevName = null;
while (resultSet.next()) {
  if (!resultSet.getString("P.name").equals(prevName)) {
    // "Next" publisher name found. Process the material
    // accumulated and reset for the next items.
    makeNewObjects(); // etc.
    prevName = resultSet.getString("P.name");
  }
}
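The disassembly loop above can be made concrete with a runnable sketch (the sorted `resultSet` rows and field names below are illustrative): grouping an ORDER BY-sorted rectangle back into one object per publisher.

```javascript
// Sketch: disassemble a sorted "giant rectangle" (rows repeating the
// publisher columns) back into one object per publisher with its books.
const resultSet = [
  { bName: "Perl Tricks", pName: "O'Reilly",     founded: 1980 },
  { bName: "More Perl",   pName: "O'Reilly",     founded: 1980 },
  { bName: "More Jokes",  pName: "Random House", founded: 1843 },
];

const grouped = [];
let prevName = null;
for (const row of resultSet) {
  if (row.pName !== prevName) {
    // "Next" publisher name found: start a new accumulated object.
    grouped.push({ name: row.pName, founded: row.founded, books: [] });
    prevName = row.pName;
  }
  grouped[grouped.length - 1].books.push(row.bName);
}
console.log(grouped.length);          // 2 publishers
console.log(grouped[0].books.length); // 2 O'Reilly books
```

Note the loop only works because of the ORDER BY; unsorted rows would fragment each publisher into multiple groups.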
39. 39
One-To-Many Using Linking
• Optimized for efficient management of mutable data
• Familiar way to organize basic entities
• Code is used to assemble fetched material into other objects, not disassemble a single ResultSet
  – More complicated queries may be easier to code and maintain with assembly vs. disassembly
47. 47
Embedding vs. Linking
• Embedding
– Terrific for read performance
• Webapp “front pages” and pre-aggregated material
• Complex structures
– Great for insert-only / immutable designs
– Inserts might be slower than linking
– Data integrity for not-belongs-to needs to be managed
• Linking
– Flexible
– Data integrity is built-in
– Work is done during reads
• But not necessarily more work than RDBMS
56. 56
Summary
• Physical design is different in MongoDB
– But basic data design principles stay the same
• Focus on how an application accesses/manipulates data
• Seek out and capture belongs-to 1:1 relationships
• Use substructure to better align to code objects
• Be polymorphic!
• Evolve the schema to meet requirements as they change
HELLO!
This is Buzz Moschetti at MongoDB, and welcome to today’s webinar entitled “Thinking in Documents”, part of our Back To Basics series.
If your travel plans today do not include exploring the document model in MongoDB then please exit the aircraft immediately and see an agent at the gate
Otherwise – WELCOME ABOARD for about the next hour.
There are three themes that are important to grasp when Thinking In Documents in MongoDB and these will be reinforced throughout the presentation.
First, great schema design… I’ll repeat it: great schema design involves much more than the database.
This is not new or something revolutionary to MongoDB.
It is something we have been doing all along in the construction of solutions. Sometimes well, sometimes not.
It’s just that the data structures and APIs used by MongoDB make it much easier to satisfy the first two bullet points.
Particularly for an up-stack software engineer kind of person like myself, the ease and power of well-harmonizing your persistence with your code – java, javascript, perl, python – is a vital part of
ensuring your overall information architecture is robust.
Part of the exercise also involves candidly addressing legacy RDBMS issues that we have seen over and over again for 40 years, like schema explosion, field overloading, and flattening.
Boils down to success = “schema” + code
Potentially very provocative.
In the old days, we had long requirements cycles and one or both of the following things happened:
Requirements changed midstream, invalidating a lot of schema design already performed especially for certain 1:1 entity relationships
High quality problem: The solution was asked to scale well beyond the original goals
Increasingly, it is no longer acceptable to have a 6 or 12 or 18 month requirements phase to build a system that lasts 10 years or more.
You want to get the initial information architecture done in a couple of weeks, expand upon it incrementally, and build a system that doesn’t have to live 10 years to “justify its development time.”
Because MongoDB is impedance matched to the software around it, you have a great deal of design flexibility and even innovation in the way you construct the overall solution.
We’ll explore some of those best practices today that will really put you on the road to Thinking In Documents
Let’s talk about some of the terms.
JOINS: An RDBMS uses joins to stitch together fundamentally simple things into larger, more complex data.
MongoDB uses embedding of data within data, plus linking, to produce the same result.
A document is not a PDF or an MS Word artifact.
A document is a term for a rich shape: structures of structures of lists of structures that ultimately, at the leaves, contain familiar scalars like int, double, datetime, and string.
In this example we see also that we’re carrying a thumbnail photo in a binary byte array type; that’s natively supported as well.
This is different than the traditional row-column approach used in RDBMS.
Another important difference is that in MongoDB, it is not required for every document in a collection to be the same shape; shapes can VARY.
With the upcoming release of 3.2, we will be supporting document validation, so in those designs where certain fields and their types are absolutely mandatory, we’ll be able to enforce that at the DB engine level, similar to – but not exactly like – traditional schemas.
The truth is, in most non-trivial systems, even with an RDBMS and stored procs, plenty of validation and logic is being handled outside the database.
Now here is something very important:
For the purposes of the webinar, we will be seeing this “to-string” representation of a document as it is emitted from the MongoDB CLI.
This is easy to read and gets the structural design points across nicely.
But make no mistake: you want most of your actual software interaction with MongoDB (and frankly any DB) to be via high fidelity types, not a big string with whitespaces and CR and quotes and whatnot.
Drivers in each language represent documents in a language-native form most appropriate for that language.
Java has maps, python has dictionaries. You deal with actual objects like Dates, not strings that must be constructed or parsed.
Another important note: We’ll be using query functionality to kill 2 birds with one stone:
To show the shape of the document
To show just a bit of the MongoDB query language itself including dotpath notation to “dig into” substructures
Note also that in MongoDB, documents go into collections in the same shape they come out so we won’t focus on insert.
This is a very different design paradigm from RDBMS, where, for example, the read-side of an operation implemented as an 8 way join is very different than the set of insert statements (some of them in a loop) required for the write side.
Let’s get back to data design.
…
Traditional data design is characterized by some of the points above, and this is largely because the design goals and constraints of legacy RDBMS engines heavily influence the data being put into them.
These platforms were designed when CPUs were slow and memory was VERY expensive.
Perhaps more interesting is that the languages of the time – COBOL, FORTRAN, APL, PASCAL, C – were very compile time oriented and very rectangular in their expression of data structures.
One could say rigid schema combined with these languages was in fact well-harmonized.
Overall, the platform is VERY focused on the physical representation of data.
For example, although most have been conflated, the legacy types of char, varchar, text, CLOB etc. to represent a string suggest a strong coupling to byte-wise storage concerns.
Documents, on the other hand, are more like business entities.
You’ll want to think of your data moving in and out as objects.
And the types and features of Document APIs are designed to be well-harmonized with today’s programming languages – Java, C#, python, node.js, Scala , C++ -- languages that are not nearly as compile-time oriented and offer great capabilities to dynamically manipulate data and perform reflection/introspection upon it.
So what better way to show this than diving right into a problem?
Today we’ll explore data structures and schema for a library management application
Good example because in general most of you have some familiarity with the entities involved and we can explore some 1:1, 1:n and other design elements
To make things a little more interesting, we’ll capture our emerging requirements in the form of questions the application needs to ask the database.
This is useful because, again, it helps us keep our eye on the software layer above the database
Imagine a patron walks up to the counter and presents his/her library card to check out some books. The first thing a librarian might want to do is confirm the patron’s address so as to have a place to send the library police when the book isn’t returned in a timely manner.
This might be the very first cut at such a thing, with 2 collections, patrons and addresses. Both share a common key space on the _id unique ID field.
To get name and address will require 2 queries.
Very important: You notice here that a list of favorite genres has been set up. It is a LIST of strings. Not a semicolon or comma delimited string.
Each element of that list is completely addressable, queryable, and updatable via dotpath notation. As is the entire list.
And should we require it, it is indexable. But let’s not get too far ahead.
Can we do better?
I think we can: embed the address substructure into the patron document.
Only 1 query is necessary.
You notice here that we treated address like an Object. We did not “flatten” everything out into addr_street, addr_city, addr_state, etc.
We’ll come back to that again in just a few slides and see the power it provides.
Already, with this simple example, two important aspects of Thinking In Documents are hilighted: arrays and substructures.
A common concern that pops up at this point is that every query will act like “SELECT *” and drag back potentially a LOT of data.
In MongoDB you can control that with projection as we see here, where fields can be explicitly included or excluded.
This operation is of course processed server side and is efficient.
Note that we can both pick up an entire substructure as a unit OR JUST specific subfields using “dot notation”.
Very powerful.
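The effect of dot-notation projection can be sketched in plain JavaScript (the `pick` helper below is a hypothetical illustration of the semantics, not a driver API): given a document, a dotpath digs into substructure, and a bare field name returns the whole substructure as a unit.

```javascript
// Sketch: what a dot-notation projection like {"address.state": 1}
// conceptually selects from one document.
function pick(doc, dotpath) {
  return dotpath.split(".").reduce(
    (d, key) => (d == null ? undefined : d[key]), doc);
}

const patron = { _id: "joe", address: { city: "Faketon", state: "MA" } };

console.log(pick(patron, "address.state")); // "MA" — just the subfield
console.log(pick(patron, "address"));       // the whole substructure
```

In MongoDB itself this selection happens server side, so only the projected fields cross the network.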
Before I mentioned we’d see more about the power of substructure.
Substructure is the concept that allows your schema to really well-harmonize with software.
By being able to pull out the entire address structure, you can pass it to a function that does something interesting with no matter what fields are or are not set.
And what’s really great is that these substructures are generic Maps or dictionaries – NO COMPILE TIME dependencies!
If you’re Thinking in Documents, then remember that document shapes can vary.
Apologies for the small font.
But here, we are inserting a new document where address contains an additional field “country” (and, if you’re watching closely, does NOT have the zipcode field!)
When we do a find and ask for the address field, two slightly different shapes emerge.
But more importantly, only one place “needs” to change: the place where the new “country” field is actually consumed.
Neither the query nor the intermediate parts of the call stack change at all – the new country field simply flows through!
Assuming you’re not in a “select *” environment, which has its own huge set of issues, this is the traditional approach to the same problem.
For brevity I am leaving out the entire issue of ALTER TABLE and schema migration scripts which may be avoided in MongoDB.
The focus here is the agility and accuracy of the Day 2 modifications needed and how many places you have to make them.
Here, every SQL statement that pulled out city, state, and zip has to be assessed in the context of doSomethingWithOneAddress(). In other words, if the call to doSomethingWithOneAddress() exists in code – the call, NOT the one implementation – then BOTH the SQL statement and the thing that converts it into a useful object (located, hopefully, nearby) have to be modified. This is potentially a large undertaking!
The problem is magnified by the number of addresses that you may be pulling out.
In SQL and rectangles, you have to track down everything that looks like it is part of an address and ALTER TABLE and add columns to the select statements.
In MongoDB and Documents, the additional material is automatically collected and flowed downstream with no DDL or compile time entanglements.
We’ll see in a moment how Lists in Documents make this easier still!
Let’s summarize one-to-one relationships.
Read performance – especially for complex document shapes that would otherwise entail multiway JOINs – is optimized because we only need a single query and a single disk/memory hit.
MongoDB is especially useful in dealing with 1:1 belongs-to relationships
Not every non-scalar datum is going to force you to create a new table, a foreign key, and key management.
It keeps simple things simple.
Business Requirements Change!
People may have more than one address.
And: we acknowledge that the legacy practice of assuming some small number of instances and flattening out addresses into address1 and address2 is bound to bump up against a situation where someone has 3 addresses.
We’ve seen where that leads traditional schema design.
We don’t want to repeat it.
Now, just model addresses as an array.
That was easy.
No need to create a new table, set up foreign keys, do key management, etc.
We combine the power of arrays with substructures.
At times it takes longer to describe the change than it does to implement it and have it available as part of the information architecture including querying and indexing.
And again, these rich structures are presented in the client side driver as Lists, Maps, dictionaries, arrays and other high-fidelity types.
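To make the "completely addressable and queryable" point concrete, here is a client-side sketch in plain JavaScript (the sample documents are illustrative) of what a dotpath query into an array of subdocuments, like db.patrons.find({"addresses.city": "Boston"}), matches.

```javascript
// Sketch: matching documents where ANY element of the addresses
// array has city == "Boston" — the semantics of a dotpath array query.
const patrons = [
  { _id: "bob", addresses: [{ city: "Newton" }, { city: "Boston" }] },
  { _id: "joe", addresses: [{ city: "Faketon" }] },
];

const inBoston = patrons.filter(p =>
  (p.addresses || []).some(a => a.city === "Boston"));

console.log(inBoston.map(p => p._id)); // [ "bob" ]
```

In MongoDB the same match runs server side, and an index on the array field makes it efficient.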
Now suppose this design change occurred on Day 2, after the system was placed into production.
What about the old documents that had address, not addresses?
There are 2 aspects to this issue.
The MongoDB paradigm is that the code layer just above the DB is responsible for managing these types of things. The same code that is responsible for the higher-level business validation and verification is also responsible for selectively using (both on a read and write basis) the address or addresses field.
There are several migration strategies that one could employ in conjunction with functionality at the code layer.
Remember, Thinking In Documents means thinking about documents with varying shapes.
Migrate on release: If address exists, write into new array of size 1 of new fields addresses and remove address field.
When we say “let the code deal with it” we mean something like this.
The Data Access Layer code in cooperation with the Document model significantly eases Day 2 information architecture change.
And again, this is an important aspect of Thinking In Documents.
Moving on, let’s look at another relationship in our emerging information architecture: publisher and books.
Shameless plug!
Note that a book may have multiple authors….
Here’s a first cut at the design.
Here, the 1:1 belongs-to relationship between book and publisher is not nearly as strong as that of patron to address.
But for the moment, let’s choose to duplicate publisher in every book that the publisher has published.
Data duplication is OK because the publisher is largely immutable.
Here’s an even better one: making the author name more granular by making it a substructure with separate first and last name elements.
A good rule of thumb is to try to eliminate embedded whitespace or punctuation in a field by breaking it down into substructure.
This design will permit very high speed display of relevant book information.
And we can still quickly look up all books for a given publisher.
Suppose publisher info is not so static.
Maybe we want to take a more traditional route and separate publisher information out into its own collection.
We need to assign _ids and we’ll use simplified examples here.
Note that in MongoDB each document needs a unique _id but it’s not important for today’s talk to get into the advanced topic of _id construction
If you don’t supply one, one will be created automatically.
And here’s a teaser: _id does not have to be a string or number or scalar: it can be a multi-field substructure!
Note that the belongs-to 1:1 relationship of locations carries over nicely without fuss.
It now takes 2 queries to pull the information.
Now this isn’t the end of the world.
It’s effectively no more code work than a JOIN. Instead of decomposing a resultset or passing the result into an ORM, your code layer deals directly with the database.
If we want to get more than one book, no problem.
Get the books and as you’re processing the book data, capture the publisher ID.
We store it in a map to remove duplicate IDs
Then we turn around and do an efficient lookup by primary key into the publishers collection. This works well regardless of whether the list is 1 long or 10,000.
This is effectively the multi value JOIN. And of course the approach applies to linking any number of collections, not just the 2 we see here.
In MongoDB, the application assembles material from multiple queries.
Before the chat window lights up with all sorts of questions and rage, let’s do a quick compare and contrast to SQL.
This example should be straightforward to most of you.
As you know, the basic behavior of JOIN has some subtle issues:
The cartesian product in the ResultSet is a giant rectangle containing repeating column values; you have to do something to change that into a form your application can easily digest.
Depending on the number of JOINed columns and the cardinality, you could be redundantly dragging LOTS of data across the network.
And on top of it all, it’s hard to deal with Giant Rectangle efficiently because the JOINed data is jumping around.
This leads us to run things through ORDER BY so that we can now more easily deal with the Giant Rectangle.
What does this do to performance?
What if it’s more than a 2 way JOIN? 4 or 8 way?
In the end, for anything except the most trivial of JOINs, the amount of work required to unwind or disassemble the Giant Rectangle is largely the same as the amount of code it takes for MongoDB to assemble material.
For sophisticated queries to complex shapes, it could be argued that assembly is both easier to write AND easier to maintain on Day 2.
This design will permit very high speed display of relevant book information.
And we can still quickly look up all books for a given publisher.
We have already seen a way to do this : by performing a find() on the Books collection with a particular publisher name OR publisher ID.
But is there another way?
As much as arrays are very useful in MongoDB, beware of the temptation to treat them as “rows.”
Arrays should be “slow-growing” and should have a reasonable upper bound.
For example, you wouldn’t carry system-of-record customer transactions as an array inside the customer document.
You MIGHT, however, for the purposes of speed at scale, duplicate the last 3 transactions as an array in the customer record – but that array is not growing without bound.
From an underlying technology basis, this is less of an issue now with the WiredTiger storage engine available in 3.0 but you should still be mindful of it.
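A sketch of the bounded-array idea in plain Python. In MongoDB itself the server-side equivalent is a `$push` update with the `$each` and `$slice` modifiers; here we just mimic that effect in memory (the helper and field names are made up for illustration):

```python
def push_with_slice(doc, field, items, keep_last):
    """Mimic MongoDB's $push with $each/$slice: append, then keep only the
    last `keep_last` elements so the array cannot grow without bound."""
    doc[field] = (doc.get(field, []) + items)[-keep_last:]
    return doc

# A customer document carrying only its 3 most recent transactions.
customer = {"_id": 7, "recentTxns": [10, 11, 12]}
push_with_slice(customer, "recentTxns", [13], keep_last=3)
# customer["recentTxns"] is now [11, 12, 13]
```

The full transaction history would still live in its own collection; the bounded copy exists purely for read speed at scale.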
As our schema is evolving, we’re beginning to see that an increasing number of questions can be efficiently answered by the existing set of entities and relationships.
This holds for authors; recall we set up the substructure of first and last names. But suppose we want more detail on an author?
Let’s go ahead and create the Authors collection and link to it with an author ID.
Yes, we are using some fancier code here to extract the author IDs, but in the end it is still only 2 queries.
The fancy code here is using a map() operation to extract just the author IDs which will be passed along to the $in operator
The authors documents contain all sorts of information about each author.
We have left the first and last names in the books document for the purposes of speed, but we could get rid of them if we wanted.
It is a choice of integrity vs. speed of retrieval.
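The map()-then-$in step looks like this in a plain Python sketch; the only difference from the publisher case is that each book carries an *array* of author IDs, so the extraction flattens as it collects (field names are assumptions for illustration):

```python
# In-memory stand-ins for the books and authors collections.
books = [
    {"title": "Book A", "author_ids": ["a1", "a2"]},
    {"title": "Book B", "author_ids": ["a1"]},
]
authors = [
    {"_id": "a1", "first": "Kristina", "hometown": "Chicago"},
    {"_id": "a2", "first": "Mike", "hometown": "Palo Alto"},
]

# The map()/extract step: flatten and dedupe author IDs across the result set.
ids = {aid for b in books for aid in b["author_ids"]}

# Query 2: the $in-equivalent lookup against the authors collection.
detail = {a["_id"]: a for a in authors if a["_id"] in ids}
```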
Inevitably, we turn the question around and start to expose ourselves to the MANY-TO-MANY problem.
Here’s one way to do it: acknowledging that most authors are not Stephen King or Danielle Steel and have some upper bound on the number of books they’ve written, we can doubly link the books and authors collections.
In general the data is static, so referential integrity is not as much of an issue as performance.
Again, we’ve kept title in the structure within the books field for performance purposes (to keep it to a single query) when looking up basic info about an author.
But if so desired, we could just have books be an array of book IDs and use a 2-query approach as we’ve seen in prior slides.
Note this is different than publishers having an array of books because the upper bound for author-to-books is much less than publisher-to-books.
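A sketch of the doubly linked shape; the documents and field names are illustrative assumptions:

```python
# Author side: a bounded array of books, with title duplicated so an
# "author page" needs only ONE query.
author = {
    "_id": "a-king",
    "name": {"first": "Stephen", "last": "King"},
    "books": [
        {"book_id": "b-1", "title": "It"},
        {"book_id": "b-2", "title": "Misery"},
    ],
}

# Book side: a back-link array of author IDs completes the many-to-many.
book = {
    "_id": "b-1",
    "title": "It",
    "author_ids": ["a-king"],
}
```

The duplicated title is the integrity-vs-speed trade again: a change to a book title means touching both collections, which is acceptable because the data is largely static.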
Remember: all fields including those inside structures inside arrays are indexable.
We saw an example of this earlier with indexing on publisher.name
Complex structures like arrays of coordinate pairs and deeply nested data are indexable as well.
With immutable designs, sometimes the safest / easiest route is to duplicate the data. Note that with the emergence of deduplicating storage arrays, this is increasingly becoming a practical and exciting solution!
Suppose the library wants to add value by having book clubs enhance the authors information with bespoke information.
There are at least 3 challenges here:
1. The book club is operating semi-independently from the main system designers.
2. Clearly, we don’t know early on what the book club is going to add.
3. Each author may have a different set of information, created by a different member of the book club.
This is a very, very exciting part of MongoDB.
No need to come up with userDefined column 1, column 2, etc.
We see here that Kristina and Mike have very different substructures inside the personalData field.
We call this polymorphism: the variation of shape from document-to-document within the same collection.
The library application logic only is looking for a field called “personalData”; actions will be taken dynamically based on the shape and types in the substructure!
For example, it is a very straightforward exercise to recursively “walk” the structure and construct a panel in a GUI, especially if you are using AngularJS and the MEAN stack (MongoDB / Express / Angular / Node.js).
No need to use XML or blobs or serialized objects. It’s all native MongoDB -- and documents are represented in the form most easily natively manipulated by the language
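The recursive “walk” is just a depth-first traversal over dicts and lists. A minimal Python sketch (the document and paths are illustrative, and a real GUI builder would emit widgets rather than path/value pairs):

```python
def walk(value, path=""):
    """Recursively flatten an arbitrary document into (path, leaf-value)
    pairs -- the same traversal a dynamic panel builder would perform."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from walk(v, f"{path}.{k}" if path else k)
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from walk(v, f"{path}[{i}]")
    else:
        yield path, value

# A polymorphic document: we only assume a field called personalData exists.
doc = {"personalData": {"likes": ["scuba"], "born": 1985}}
pairs = dict(walk(doc))
# {'personalData.likes[0]': 'scuba', 'personalData.born': 1985}
```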
Every field is queryable and if so desired, indexable! Documents that do not contain fields in a query predicate are simply treated as unset.
Polymorphism is such an important aspect of Thinking in Documents that it deserves another slide.
Here’s an events collection. We’ve taken out _id for some extra space.
type, ts, and data are the well-known fields, and there is probably an index on type and ts.
But the data field contents are specialized for each type.
No need for complex tables or blobs or XML.
And very easily extensible to new types we may need to add on day 2.
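A sketch of what such an events collection might hold; the event types and data shapes are made-up examples:

```python
# Well-known fields (type, ts) plus a type-specific data payload per event.
events = [
    {"type": "login", "ts": 1700000000, "data": {"user": "kc", "ip": "10.0.0.1"}},
    {"type": "trade", "ts": 1700000050, "data": {"sym": "XYZ", "qty": 100}},
    {"type": "alert", "ts": 1700000099, "data": {"level": "warn", "msg": "disk 90%"}},
]

# The shared fields support shared indexes and shared query logic...
logins = [e for e in events if e["type"] == "login"]
# ...while each handler interprets e["data"] according to the type.
```

Adding a new event type on Day 2 means inserting documents with a new data shape; no ALTER TABLE, no new table.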
This model extends to products, transactions, and actors in a data design.
If you think about your data in terms of entities, you’ll discover that you probably only need a handful. That plus polymorphism within the entity gets you a long way.
Because of this design approach, rarely will a MongoDB database have hundreds of collections.
This is an example of faceted search or tagging in general.
We can begin simply by implementing Categories as an array
We will probably only have a few categories; they don’t change very often and have a natural upper bound.
An index on categories will allow rapid retrieval.
Again, a very appropriate use of belongs-to 1:1
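The key property is that matching a scalar against an array field matches any element, so one indexed field serves the whole facet. A plain Python sketch of the behavior (illustrative data):

```python
books = [
    {"title": "Book A", "categories": ["science", "biography"]},
    {"title": "Book B", "categories": ["fiction"]},
]

# The equivalent of find({"categories": "science"}): a document matches if
# the array CONTAINS the value, which a multikey index serves directly.
hits = [b for b in books if "science" in b["categories"]]
```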
If we want a little more hierarchy, we can use regex operations.
The category field is indexed as before, but now as a string
The query uses an index because the regular expression is anchored.
There is a downside: when a category hierarchy gets changed, all documents will need to be re-categorized.
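A sketch of the anchored-prefix idea; the category strings are made up, and in MongoDB the match would be expressed as `{"category": {"$regex": "^science/physics"}}` so the string index can be range-scanned rather than fully scanned:

```python
import re

books = [
    {"title": "Book A", "category": "science/physics/quantum"},
    {"title": "Book B", "category": "science/biology"},
    {"title": "Book C", "category": "fiction/thriller"},
]

# Anchored at the start of the string: everything under science/physics.
pat = re.compile(r"^science/physics")
hits = [b for b in books if pat.match(b["category"])]
```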
A little techno-trailer for you: Document Validation.
Coming in v3.2, Document Validation “inverts” the use of existing MongoDB query language.
Documents being inserted or updated will be assessed against a validator if one exists. There are a number of special features, including “strictness on update,” that are beyond the scope of today’s presentation, but the good news is that it is coming!
We’ll close with something really cool – document validation that adapts to change over time.
Assuming you “soft version” your documents by including a logical version ID (in this case, a simple integer in field v), you can maintain multiple different shapes of documents in one collection, each of them validated against the version rules that were appropriate at the time. And again, because it is at the DB engine level, enforcement is guaranteed across all drivers.
SUPER POWERFUL!
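A sketch of the soft-versioning idea only: real MongoDB validators are query expressions registered on the collection, but here we mimic version-keyed rules with plain Python predicates (the version numbers and required fields are invented for illustration):

```python
# One rule set per document version; v selects which rules apply.
rules = {
    1: lambda d: {"v", "name"} <= set(d),
    2: lambda d: {"v", "name", "email"} <= set(d),  # v2 added a required email
}

def valid(doc):
    """A document is valid if its v field selects a rule set and passes it."""
    check = rules.get(doc.get("v"))
    return bool(check and check(doc))
```

Old v1 documents stay valid under v1 rules while new inserts are held to the v2 rules, all in the same collection.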
On behalf of all of us at MongoDB, thank you for attending this webinar!
I hope what you saw and heard today gave you some insight and clues into what you might face in your own data design efforts.
Remember you can always reach out to us at MongoDB for guidance.
With that, code well and be well.