This document provides information about a MongoDB class taught by Alexandre Bergere. The class covers topics including Big Data, NoSQL, MongoDB architecture and modeling, CRUD operations, replication, security, and aggregation. It includes Alexandre's background and credentials, as well as sources and use cases for MongoDB.
2. {
Part One:[
Big Data
NoSQL
],
Part Two:[
MongoDB
Architecture & Modeling
CRUD
Replication
Security
Aggregation
MongoDB Atlas
]
}
3. MongoDB class by Alexandre Bergere 3
alexandre.bergere@gmail.com
https://fr.linkedin.com/in/alexandrebergere
@AlexPhile
Avanade
2016 - 2019
Sr Anls, Data Engineering
Having worked for 3 years as a senior analyst at
Avanade France, I developed my skills
in data analysis (MSBI, Power BI, R, Python,
Spark, Cosmos DB) by working on innovative
projects and proofs of concept in the energy
industry.
ESAIP
Teacher
2016 - present
Data Freelance
2019 - present
4. 06/06/2019 MongoDB class by Alexandre Bergere 4
Sources
Many of the sources for this course come from docs.mongodb.com or
https://www.university.mongodb.com.
6. Big Data Architecture
[Diagram: Data Source (RDBMS, Social Media, Device, IoT / Sensors, Files (log, Unst)) → Data Ingestion & Processing (ETL, Messaging Queue, Batch / Streaming) → Data Storage (Data Lake / Data Warehouse, No SQL, HDFS, SQL, Object store) → Data Analytics (Machine Learning, MOLAP, Data Quality, Visuals Query) → Data Visualization (Web Apps, Visualizations tools); Data Management spans the whole pipeline]
06/06/2019 MongoDB class by Alexandre Bergere 6
7. Data Storage
o Relational data store
o HDFS
o Key Value data store
o Columnar data store
o Object store
o Search data store
o Graph data store
o Document data store
06/06/2019 MongoDB class by Alexandre Bergere 7
10. Data management: RDBMS
[Timeline: Hierarchic model → Cobol → System R & SQL → OLAP → NoSQL]
Codd's 12 Rules:
o Rule 1: Information Rule
o Rule 2: Guaranteed Access Rule
o Rule 3: Systematic Treatment of NULL Values
o Rule 4: Active Online Catalog
o Rule 5: Comprehensive Data Sub-Language Rule
o Rule 6: View Updating Rule
o Rule 7: High-Level Insert, Update, and Delete Rule
o Rule 8: Physical Data Independence
o Rule 9: Logical Data Independence
o Rule 10: Integrity Independence
o Rule 11: Distribution Independence
o Rule 12: Non-Subversion Rule
06/06/2019 MongoDB class by Alexandre Bergere 10
11. Data management: OLAP
[Timeline: Hierarchic model → Cobol → System R & SQL → OLAP → NoSQL]
OLAP : Online Analytical Processing
06/06/2019 MongoDB class by Alexandre Bergere 11
12. Data management: NoSQL
[Timeline: Hierarchic model → Cobol → System R & SQL → OLAP → NoSQL]
Benefits:
o performance
o volume
o variety
06/06/2019 MongoDB class by Alexandre Bergere 12
Different types of data storage:
o Key-value
o Document data store
o Columnar data store
o Graph data store
o Search data store
14. Created in 2007, with a first release in 2010.
Easy and simple … as a leaf.
Document data store &
Schemaless.
06/06/2019 MongoDB class by Alexandre Bergere 14
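To make "document data store & schemaless" concrete, here is a small illustrative sketch (collection and field names are invented for the example): two documents in the same collection need not share the same fields.
# Two documents with different shapes can live in the same collection
> db.people.insertOne({ name: "Paul Miller", city: "London" })
> db.people.insertOne({ name: "Alvaro Ortega", hobbies: ["golf", "yoga"] })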
17. Mongo DB is easy
For many developers, data model goes hand in hand with object mapping, and for that purpose
you may have used an object-relational mapping library, such as Java’s Hibernate framework or
Ruby’s ActiveRecord.
Such libraries can be useful for efficiently building applications with a RDBMS, but they’re less
necessary with MongoDB. This is due in part to the fact that a document is already an object-
like representation. It’s also partly due to the MongoDB drivers, which already provide a fairly
high-level interface to MongoDB. Without question, you can build applications on MongoDB
using the driver interface alone.
06/06/2019 MongoDB class by Alexandre Bergere 17
18. Use cases
o Web application (MongoDB is well suited as a primary datastore for web applications)
o Agile development
o Analytics and logging
o Caching
o Variable Schemas
06/06/2019 MongoDB class by Alexandre Bergere 18
19. The case for adding NoSQL
o Large volumes of rapidly changing structured, semi-structured, and unstructured data
o Agile sprints, quick schema iteration, and frequent code pushes
o API-driven, object-oriented programming that is easy to use and flexible
o Geographically distributed scale-out architecture instead of expensive, monolithic
architecture
Consider, for example, enterprise resource planning (ERP), a standard for relational databases.
What if you want to offer ERP forms users can actually modify if they need to? A document-
based NoSQL database such as MongoDB can provide that functionality without requiring you
to rebuild your whole data schema every time a user wants to change the data format.
06/06/2019 MongoDB class by Alexandre Bergere 19
21. Mongo DB 4.0 : ACID transactions
More info.
06/06/2019 MongoDB class by Alexandre Bergere 21
22. Leader in The Forrester Wave™: Big Data NoSQL, Q1 2019
o “MongoDB remains the most popular
NoSQL database”
o Used by more than 8,000 companies,
including many Fortune 100
companies.
o Highest possible scores in 21 of the
26 criteria.
06/06/2019 MongoDB class by Alexandre Bergere 22
29. Document Model
RDBMS → Mongo DB
PERSON
Pers_ID Surname First_Name City
0 Miller Paul London
1 Ortega Alvaro Valencia
2 Huber Urs Zurich
3 Blanc Gaston Paris
4 Bertolini Fabrizio Rome
CAR
Car_ID Model Year Value Pers_ID
101 Bently 1973 100000 0
102 Rolls Royce 1965 330000 0
103 Peugot 1993 500 3
104 Ferrari 2005 150000 4
105 Renault 1998 2000 3
106 Renault 2001 7000 3
107 Smart 1999 2000 2
06/06/2019 MongoDB class by Alexandre Bergere 29
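As an illustrative sketch (not part of the original slide), the two relational tables above could collapse into a single document per person, embedding the cars:
# One document per person, with that person's cars embedded as an array
> db.person.insertOne({
  _id: 3,
  surname: "Blanc",
  first_name: "Gaston",
  city: "Paris",
  cars: [
    { model: "Peugot", year: 1993, value: 500 },
    { model: "Renault", year: 1998, value: 2000 },
    { model: "Renault", year: 2001, value: 7000 }
  ]
})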
30. TP - Modelization
1. Transform this address « 125 avenue de la république, 75011, PARIS » into a BSON object.
2. Transform these 2 addresses « 125 avenue de la république, 75011, PARIS » and « 34 rue Ferdinand,
75012, PARIS » into an array.
3. Transform the schema below into a BSON document.
ID LastName FirstName Age
1 BERGERE Alexandre 26
Address ID_People
125 avenue de la république, 75011, PARIS 1
34 rue Ferdinand, 75012, PARIS 1
(relationship: People 1..n Address)
06/06/2019 MongoDB class by Alexandre Bergere 30
32. SQL vs MongoDB Terms
SQL Terms/Concepts MongoDB Terms/Concepts
Database Database
Table Collection
Row Document
Column Field
Index Index
Join Embedded or linked document
Primary key Primary key (the « _id » field)
06/06/2019 MongoDB class by Alexandre Bergere 32
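A hedged side-by-side sketch of the terminology table in action (table, collection and field names are invented for the example):
# SQL: SELECT first_name, city FROM person WHERE city = 'London';
> db.person.find({ city: "London" }, { first_name: 1, city: 1 })
# SQL: CREATE INDEX idx_city ON person(city);
> db.person.createIndex({ city: 1 })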
35. Launch instance
Launch as a service:
o mongod --dbpath C:\Users\alexa\Documents\MongoDB\data --logpath
C:\Users\alexa\Documents\MongoDB\logs.log
Launch the connection:
o mongo
Launch a shard:
o mongos
Original Shortcut
--db -d
--collection -c
--username -u
--password -p
--host -h
Options:
06/06/2019 MongoDB class by Alexandre Bergere 35
36. The Javascript console
var authColl = db.getCollection("auth")
authColl.insertOne(
{
usrName : "John Doe",
usrDept : "Sales",
usrTitle : "Executive Account Manager",
authLevel : 4,
authDept : [ "Sales", "Customers"]
}
)
06/06/2019 MongoDB class by Alexandre Bergere 36
38. DML
# Returns all databases:
> show dbs
# The current database name:
> db.getName()
# Returns all collections in the current database:
> db.getCollectionNames()
# Returns a collection or a view object:
> db.getCollection(name)
# The current database connection:
> db.getMongo()
# Clears the console:
> cls
# Returns collection information:
> db.getCollectionInfos({name: "name"})
06/06/2019 MongoDB class by Alexandre Bergere 38
39. DML
# Removes the current database:
> db.dropDatabase()
# Copies a database to another database on the current host:
>db.copyDatabase(fromdb, todb, fromhost, usern
ame, password, mechanism)
# Copies a database from a remote host to the current host:
> db.cloneDatabase("hostname")
# Rename a collection:
> db.adminCommand({ renameCollection: "<db>.fromCollection", to: "<db>.toCollection" })
Or
> use test
> db.orders.renameCollection( "toCollection" )
# Copies data directly between MongoDB instances:
> db.cloneCollection(from, collection, query)
06/06/2019 MongoDB class by Alexandre Bergere 39
40. Stats
# Returns statistics that reflect the use state of a single database or
collection.
> db.stats()
> db.collection.stats()
{
"ns" : "guidebook.restaurants",
"count" : 25359,
"size" : 10630398,
"avgObjSize" : 419,
"storageSize" : 4104192
"capped" : false,
"wiredTiger" : {
"metadata" : {
"formatVersion" : 1
}, […]
"nindexes" : 4,
"totalIndexSize" : 626688,
"indexSizes" : {
"_id_" : 217088,
"borough_1_cuisine_1" : 139264,
"cuisine_1" : 131072,
"borough_1_address.zipcode_1" :
139264
}
06/06/2019 MongoDB class by Alexandre Bergere 40
41. Command-line tools
Run these from the system shell, not inside a mongod instance.
06/06/2019 MongoDB class by Alexandre Bergere 41
42. Import or export document
mongoexport and mongoimport: Export and import JSON, CSV, and TSV data.
# Import multiple documents:
mongoimport -d crunchbase -c companies
C:\Users\alexa\Documents\MongoDB\src\companies.json
# Import multiple documents in an array:
mongoimport -d crunchbase -c artists --file
C:\Users\alexa\Documents\MongoDB\src\artists.json --jsonArray
# Export collection:
mongoexport --db crunchbase --collection artists --out artists.json
06/06/2019 MongoDB class by Alexandre Bergere 42
43. Backup
mongodump
mongodump
--host
--port
--db
--username
--password (when specifying the password as part of the URI connection string)
--authenticationDatabase
--authenticationMechanism
# mongodump a Collection:
mongodump --db test --collection collection
# mongodump a Database:
mongodump --archive=test.20100224.archive --db Crunchbase
06/06/2019 MongoDB class by Alexandre Bergere 43
44. Restore
mongorestore
mongorestore
--host
--port
--collection
--db
--username
--password
--authenticationDatabase
<path to the backup>
# Output an Archive to Standard Output:
mongodump --archive --db test --port 27017 | mongorestore --archive --port 27018
06/06/2019 MongoDB class by Alexandre Bergere 44
45. Others
o mongosniff: A wire-sniffing tool for viewing operations sent to the database. It essentially
translates the BSON going over the wire to human-readable shell statements.
o mongostat: Similar to iostat, this utility constantly polls MongoDB and the system to provide
helpful stats, including the number of operations per second (inserts, queries, updates, deletes,
and so on), the amount of virtual memory allocated, and the number of connections to the
server.
o mongotop: Similar to top, this utility polls MongoDB and shows the amount of time it spends
reading and writing data in each collection.
o mongoperf: Helps you understand the disk operations happening in a running MongoDB
instance.
o mongooplog: Shows what’s happening in the MongoDB oplog.
o bsondump: Converts BSON files into human-readable formats, including JSON.
06/06/2019 MongoDB class by Alexandre Bergere 45
48. Capped collection
Distinguished from standard collections by their fixed size. This means that once a capped
collection reaches its maximum size, subsequent inserts will overwrite the least-recently-
inserted documents in the collection.
This design prevents users from having to prune the collection manually when only recent data
may be of value.
> {
create: <collection or view name>,
capped: <true|false>
[…]
}
Designed for high-performance logging scenarios.
06/06/2019 MongoDB class by Alexandre Bergere 48
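A minimal creation sketch, with example values for the size in bytes and the optional maximum number of documents:
# A 1 MB capped collection holding at most 5000 log entries
> db.createCollection("log", { capped: true, size: 1048576, max: 5000 })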
49. > find
# FIND()
> db.<collection>.find({<conditions>}, {<fields>})
> db.products.find( { qty: { $gt: 25 } }, { item: 1, qty: 1 } )
Chain sort first, skip second, and limit last, because that is the
only order that makes sense.
# Options:
>
.pretty()
.sort() : 1 : ASC, -1 : DESC :
sort({'name':-1})
.skip() : number
.limit() : number
.count()
06/06/2019 MongoDB class by Alexandre Bergere 49
50. Partial Match Queries in Users
# Use regular expression:
> db.users.find({'last_name': /^Ber/})
06/06/2019 MongoDB class by Alexandre Bergere 50
51. > insert
# INSERT()
> db.<collection>.insert ({<value>})
> db.<collection>.insertMany([{<values>}])
> db.inventory.insertMany([
{ item: "journal", qty: 25, tags: ["blank", "red"], size: { h: 14, w: 21, uom:
"cm" } },
{ item: "mat", qty: 85, tags: ["gray"], size: { h: 27.9, w: 35.5, uom: "cm" } },
{ item: "mousepad", qty: 25, tags: ["gel", "blue"], size: { h: 19, w: 22.85,
uom: "cm" } }
])
db.collection.insertOne() Inserts a single document into a collection.
db.collection.insertMany() db.collection.insertMany() inserts multiple documents into a collection.
db.collection.insert()
db.collection.insert() inserts a single document or multiple documents into
a collection.
06/06/2019 MongoDB class by Alexandre Bergere 51
52. > update
# UPDATE()
> db.<collection>.update({<conditions>}, {<fields>}, {upsert: true/false, multi: true/false})
> { "_id": "artist:281", "last_name": "Cotillard", "first_name": "Marion", "birth_date": "1975" }
# Operator Update:
> db.artists.update({"_id": "artist:281"},{ $set : {"last_name" : "Page"}})
> { "_id": "artist:281", "last_name": "Page", "first_name": "Marion", "birth_date": "1975" }
# Replacement Update:
> db.artists.update({"_id": "artist:281"},{"last_name" : "Page"})
> { "_id": "artist:281", "last_name": "Page" }
All updates require at least two arguments. The first specifies which documents to update, and
the second defines how the selected documents should be modified.
06/06/2019 MongoDB class by Alexandre Bergere 52
53. > update
upsert: boolean. Optional. If set to true, creates a new document when no document
matches the query criteria. The default value is false, which does not insert a new document
when no match is found.
multi: boolean. Optional. If set to true, updates multiple documents that meet the query
criteria. If set to false, updates one document. The default value is false.
# UPDATE()
> db.<collection>.update({<conditions>}, {<fields>},
{upsert: true/false, multi: true/false}
)
> db.pageview.update({'_id':'/potager/users'},{$inc:{'views':1}},{upsert:true})
06/06/2019 MongoDB class by Alexandre Bergere 53
54. Query Operator
Name Description
$eq Matches values that are equal to a specified value.
$gt Matches values that are greater than a specified value.
$gte Matches values that are greater than or equal to a specified value.
$lt Matches values that are less than a specified value.
$lte Matches values that are less than or equal to a specified value.
$ne Matches all values that are not equal to a specified value.
$in Matches any of the values specified in an array.
06/06/2019 MongoDB class by Alexandre Bergere 54
55. Update Operator
Name Description
$set Sets the value of a field in a document.
$unset Removes the specified field from a document.
$inc Increments a field by a specified value.
$rename Renames a field.
$mul Multiplies the value of a field by a number.
> db.products.update( { _id: "56c0befa5e435acc1d4a5fbd"}, { $inc: { quantity: -2}
})
06/06/2019 MongoDB class by Alexandre Bergere 55
58. Update Operator : Arrays
Name Description
$push Adds an item to an array.
$addToSet Adds an item to an array only if it is not already present.
$pop Removes the first or last item of an array.
$pull Removes all array elements that match a specified query.
$pullAll Removes all matching values from an array.
> db.artists.update( { "_id": "artist:280" }, { $push: { "hobbies": "golf" }
})
06/06/2019 MongoDB class by Alexandre Bergere 58
59. > delete
# DELETE()
> db.<collection>.remove({<conditions>})
> db.artists.remove({"_id": "artist:39"})
# Remove all documents:
> db.artists.remove({})
06/06/2019 MongoDB class by Alexandre Bergere 59
60. TP - Modelization
1. Import the json document “veg_garden” into
mongoDB.
2. Return all vegetable gardens with an existing
property of “number”.
3. Return all vegetable gardens with an existing
property of “harvest”.
4. Return all vegetable gardens with a service's title
“Classes”.
5. Return all vegetable gardens with a sale's address
number 52.
6. Return all vegetable gardens that have the product
97.
7. Import json documents “companies” and “artists”
into mongoDB.
8. Return the number of companies with a number
of employees less or equal to 45.
9. Return artists from the 6th to the 9th ordered desc
by their name.
10. Insert the following artist:
"_id": "artist:9", "last_name": "Bergere",
"first_name": "Alexandre", "birth_date": "1992“.
11. Add « golf » on artist’s hobbies with the id 280.
12. Add « yoga » on artist’s hobbies with the id 282.
13. Delete hobbies « pony » and « painting » from the
artist 280.
06/06/2019 MongoDB class by Alexandre Bergere 60
61. TP - Modelization
# 1. Import the json document “veg_garden” into mongoDB.
mongoimport -d crunchbase -c vegGarden --file C:\Users\alex\Cours\MongoDB\2018-2019\src\veg_garden.json --jsonArray
# 2. Return all vegetable gardens with an existing property of “number”.
> db.vegGarden.find({"number":{$exists:true}}).pretty()
# 3. Return all vegetable gardens with an existing property of “harvest”.
> db.vegGarden.find({"harvest":{$exists:true}}).pretty()
# 4. Return all vegetable gardens with a service's title “Classes”.
> db.vegGarden.find({"service.title":"Classes"}).pretty()
# 5. Return all vegetable gardens with a sale's address number 52.
> db.vegGarden.find({"adresse.sale.num":52}).pretty()
# 6. Return all vegetable gardens that have the product 97.
> db.vegGarden.find({"products":{$in:[97]}})
06/06/2019 MongoDB class by Alexandre Bergere 61
62. TP - Modelization
# 7. Import json documents “companies” and “artists” into mongoDB.
mongoimport -d crunchbase -c artists --file C:\Users\alex\Cours\MongoDB\2018-2019\src\artists.json --jsonArray
mongoimport -d crunchbase -c companies C:\Users\alex\Cours\MongoDB\2018-2019\src\companies.json
# 8. Return the number of companies with a number of employees less or equal to 45.
> db.companies.count({number_of_employees:{$lte:45}})
# 9. Return artists from the 6th to the 9th ordered desc by their name
> db.artists.find().pretty().sort({"last_name":-1}).skip(5).limit(4)
# 10. Insert the following artist:
"_id": "artist:9", "last_name": "Bergere", "first_name": "Alexandre", "birth_date": "1992".
(Replace the id number with 282.)
> db.artists.insert({ "_id": "artist:9", "last_name": "Bergere", "first_name": "Alexandre", "birth_date":
"1992" })
# 11. Add « golf » on artist’s hobbies with the id 280.
> db.artists.update({"_id": "artist:280"},{$push:{"hobbies":"golf"}})
# 12. Add « yoga » on artist’s hobbies with the id 282.
> db.artists.update({"_id": "artist:282"},{$push:{"hobbies":"yoga"}})
# 13. Delete hobbies « pony » and « painting » from the artist 280.
> db.artists.update({"_id": "artist:280"},{$pull:{"hobbies": {$in:["pony","painting"]}}})
06/06/2019 MongoDB class by Alexandre Bergere 62
64. Schema validation
• Implement data governance without sacrificing
the agility that comes from a dynamic schema.
• With schema validation, developers and
operations spend less time defining data
quality controls in their applications, and
instead delegate these tasks to the database.
To specify validation rules when creating a new collection, use db.createCollection() with the
validator option.
To add document validation to an existing collection, use the collMod command with the validator
option.
To add document validation to an existing collection, use collMod command with the validator
option.
06/06/2019 MongoDB class by Alexandre Bergere 64
65. Example of schema validation
# Create a collection with schema validation:
> db.createCollection("students", {
validator: {
$jsonSchema: {
bsonType: "object",
required: [ "name", "year", "major", "gpa" ],
additionalProperties: true,
properties: {
name: {
bsonType: "string",
description: "must be a string and is required"
},
gender: {
bsonType: "string",
description: "must be a string and is not required"
},
year: {
bsonType: "int",
minimum: 2017,
maximum: 3017,
exclusiveMaximum: false,
description: "must be an integer in [ 2017, 3017 ]
and is required"
}
> […] major: {
enum: [ "Math", "English",
"Computer Science", "History", null ],
description: "can only be
one of the enum values and is required"
},
gpa: {
bsonType: [ "double" ],
minimum: 0,
description: "must be a
double and is required"
}
}
}
}
})
06/06/2019 MongoDB class by Alexandre Bergere 65
66. Query expression
In addition to JSON Schema validation, MongoDB supports validation with query filter
expressions using the query operators, with the exception of $near, $nearSphere, $text, and
$where.
> db.createCollection( "contacts",
{ validator: { $or:
[
{ phone: { $type: "string" } },
{ email: { $regex: /@mongodb.com$/ } },
{ status: { $in: [ "Unknown", "Incomplete" ] } }
]
}
} )
06/06/2019 MongoDB class by Alexandre Bergere 66
67. Add a validator to an existing collection
To add document validation to an existing collection, use the collMod command with the
validator option. Existing documents are only checked against the new rules when they are
next updated (depending on the validationLevel).
> db.runCommand( {
collMod: "contacts",
validator: { $jsonSchema: {
bsonType: "object",
required: [ "phone", "name" ],
properties: {
phone: {
bsonType: "string",
description: "must be a string and is required"
},
name: {
bsonType: "string",
description: "must be a string and is required"
}
}
} },
validationLevel: "moderate"
} )
06/06/2019 MongoDB class by Alexandre Bergere 67
68. Validation level & action
validationLevel Description
"off" Disables validation entirely.
"strict" The default. MongoDB applies validation rules to all inserts and updates.
"moderate" MongoDB applies validation rules to inserts and to updates of existing
documents that already fulfill the validation criteria. With the moderate
level, updates to existing documents that do not fulfill the validation
criteria are not checked for validity.
validationAction Description
"error"
Default Documents must pass validation before the write
occurs. Otherwise, the write operation fails.
"warn"
Documents do not have to pass validation. If the document
fails validation, the write operation logs the validation
failure.
The validationLevel option determines how strictly MongoDB applies validation rules to
existing documents during an update. The validationAction option determines whether
MongoDB should error and reject documents that violate the validation rules, or warn
about the violations in the log but allow invalid documents.
06/06/2019 MongoDB class by Alexandre Bergere 68
69. Bypass Document Validation
Users can bypass document validation on commands and methods that support the
bypassDocumentValidation option. The following commands and their equivalent methods
support bypassing document validation:
o aggregate
o applyOps
o cloneCollection on the destination collection
o clone on the destination
o copydb on the destination
o findAndModify
o insert
o mapReduce
o update
For deployments that have enabled access control, to bypass document validation, the
authenticated user must have bypassDocumentValidation action. The built-in roles dbAdmin
and restore provide this action.
06/06/2019 MongoDB class by Alexandre Bergere 69
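A hedged sketch of bypassing validation through the insert command (the collection name and document are invented; the authenticated user needs the bypassDocumentValidation action):
# Insert a document that would fail the validator, skipping validation
> db.runCommand({
  insert: "contacts",
  documents: [ { nickname: "no phone, no name" } ],
  bypassDocumentValidation: true
})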
70. TP – Schema Validation
1. Add the following schema validation to the artists’ collection:
• "last_name","first_name","status" required.
• “status” can take only these two values: "alive" or "dead".
2. Try to insert the following artist: { "last_name": "Katerine", "first_name": "Philippe"}
3. Update the artist with the id “artists:281”, modifying his last name to “Kheirona”.
4. Change the validation level to « moderate ».
5. Try the update in question 3 again.
6. Change the validation action to « warn », then insert the artist in question 2 again.
06/06/2019 MongoDB class by Alexandre Bergere 70
71. TP – Schema Validation
# 1. Add the following schema validation to the artists’ collection:
> db.runCommand({
collMod: "artists",
validator:{ $jsonSchema:{
bsonType: "object",
required:["last_name","first_name","status"],
properties:{
last_name:{
bsonType: "string",
description:"must be a string and is required"
}
,first_name:{
bsonType: "string",
description:"must be a string and is required"
}
,status:{
enum: ["alive", "dead"],
description:"must be a alive or dead and is required"
}
}
},
}
,validationLevel: "strict"
})
06/06/2019 MongoDB class by Alexandre Bergere 71
73. TP – Schema Validation
# 2. Try to insert the following artist: { "last_name": "Katerine", "first_name":
"Philippe"}
> db.artists.insert({ "last_name": "Katerine", "first_name": "Philippe"}) –
{failed}
# 3. Update the artist with the id “artists:281”, by modify his name by
“Kheirona”.
> db.artists.update({"_id": "artist:281"},{ $set:{ "last_name": " Kheirona" }})
# 4. Change validation level in « moderate ».
> db.runCommand({
collMod: "artists"
,validationLevel : "moderate"
})
# 6. Change validation action in « warn », then insert again the artist in question
2.
db.runCommand({
collMod: "artists"
,validationAction: "warn"
})
2018-12-01T12:31:23.738-0500 W STORAGE [conn1] Document would fail validation collection: example.contacts2 doc: { _id:
ObjectId('5a2191ebacbbfc2bdc4dcffc’), last_name: " Kheirona "}
06/06/2019 MongoDB class by Alexandre Bergere 73
81. TP – One to Many
# Subject:
> db.subject.insertMany([
{
"_id":"MongoDB"
,"nom":"MongoDB"
,"salle":"A09"
,"prof":"Alexandre Bergere"
},
{
"_id":"NodeJS"
,"nom":"NodeJS"
,"salle":"A12"
,"prof":"Thierry Dupont"
}
])
06/06/2019 MongoDB class by Alexandre Bergere 81
82. TP – One to Many
# Request:
> var subject = []
> db.subject.find().forEach(function(u) { subject.push(u._id) })
> db.student.find({"subject.id_ subject": {$in: subject}})
06/06/2019 MongoDB class by Alexandre Bergere 82
83. TP – Many to Many
Student Subject
06/06/2019 MongoDB class by Alexandre Bergere 83
85. TP – Many to Many
# Subject:
> db.subject.insertMany([
{
"_id":"MongoDB"
,"nom":"MongoDB"
,"salle":"A09"
,"prof":"Alexandre Bergere"
,"Students":[
{
"Prom":"ir2016",
"Student_id":[ObjectID("97099230912812"),
ObjectID("23109834091209")]
}
]
},
{
"_id":"NodeJS"
,"nom":"NodeJS"
,"salle":"A12"
,"prof":"Thierry Dupont"
,"Students":[
{
"Prom":"ir2016",
"Student_id":[ObjectID("97099230912812"),
ObjectID("23109834091209")]
}
]
}
])
06/06/2019 MongoDB class by Alexandre Bergere 85
86. $lookup
> {
$lookup:
{
from: <collection to join>,
localField: <field from the input documents>,
foreignField: <field from the documents of the "from" collection>,
as: <output array field>
}
}
06/06/2019 MongoDB class by Alexandre Bergere 86
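A usage sketch joining the student and subject collections from the earlier TP (field names assumed from those exercises):
# For each student, pull in the matching subject documents
> db.student.aggregate([
  { $lookup: {
      from: "subject",
      localField: "subject.id_subject",
      foreignField: "_id",
      as: "subject_docs"
  } }
])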
91. Index
Indexes are special data structures that store a small portion of the collection’s data set in an easy to
traverse form. The index stores the value of a specific field or set of fields, ordered by the value of the
field. The ordering of the index entries supports efficient equality matches and range-based query
operations. In addition, MongoDB can return sorted results by using the ordering in the index.
06/06/2019 MongoDB class by Alexandre Bergere 91
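A minimal sketch of creating indexes on the companies collection used in the TPs (the field choices are illustrative):
# Single-field ascending index, then a compound index
> db.companies.createIndex({ name: 1 })
> db.companies.createIndex({ category_code: 1, founded_year: -1 })
# List the indexes of the collection
> db.companies.getIndexes()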
94. $text
$text performs a text search on the content of the fields indexed with a text index. A $text expression has the
following syntax:
{
$text:
{
$search: <string>,
$language: <string>,
$caseSensitive: <boolean>,
$diacriticSensitive: <boolean>
}
}
> db.articles.find( { $text: { $search:
"coffee" } } )
06/06/2019 MongoDB class by Alexandre Bergere 94
95. $text - indexation
Indexes
A collection can have at most
one text index.
> db.collection.createIndex( { comments: "text" } )
# You can index multiple fields for the text index:
> db.collection.createIndex(
{
subject: "text",
comments: "text"
}
)
First, create your index!
Wildcard Text Indexes
When creating a text index on multiple fields, you can also use the wildcard
specifier ($**). With a wildcard text index, MongoDB indexes every field
that contains string data for each document in the collection. The following
example creates a text index using the wildcard specifier:
db.collection.createIndex( { "$**": "text" } )
06/06/2019 MongoDB class by Alexandre Bergere 95
96. $text
Case Insensitivity
The version 3 text index supports the common C, simple S, and for Turkish
languages, the special T case foldings as specified in Unicode 8.0 Character
Database Case Folding.
The case foldings expand the case insensitivity of the text index to include
characters with diacritics, such as é and É, and characters from non-Latin
alphabets, such as “И” and “и” in the Cyrillic alphabet.
Version 3 of the text index is also diacritic insensitive. As such, the index
also does not distinguish between é, É, e, and E.
Previous versions of the text index are case-insensitive for [A-z] only; i.e.
case-insensitive for non-diacritic Latin characters only. For all other
characters, earlier versions of the text index treat them as distinct.
06/06/2019 MongoDB class by Alexandre Bergere 96
97. $text - indexation
Case Insensitivity
Match Any of the Search Terms
If the search string is a space-delimited string, $text operator performs a logical OR
search on each term and returns documents that contains any of the terms.
Search for a Phrase
To match the exact phrase as a single term, escape the quotes.
Exclude Documents That Contain a Term
A negated term is a term that is prefixed by a minus sign -. If you negate a term, the
$text operator will exclude the documents that contain those terms from the results.
Search a Different Language
Use the optional $language field in the $text expression to specify a language that
determines the list of stop words and the rules for the stemmer and tokenizer for the
search string.
If you specify a language value of "none", then the text search uses simple
tokenization with no list of stop words and no stemming.
> db.articles.find( { $text: { $search: "bake coffee cake" } } )
> db.articles.find( { $text: { $search: ""coffee shop"" } } )
> db.articles.find( { $text: { $search: "coffee -shop" } } )
> db.articles.find({ $text: { $search: "leche", $language: "es"
} })
06/06/2019 MongoDB class by Alexandre Bergere 97
98. TP – $text
1. Find in collection « companies » the following words: “Server” & “Software” in the fields
“description” and “name”.
06/06/2019 MongoDB class by Alexandre Bergere 98
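One possible solution sketch (remember a collection can have at most one text index, so both fields go into the same index):
> db.companies.createIndex({ name: "text", description: "text" })
# $text ORs the space-delimited terms, so this matches either word
> db.companies.find({ $text: { $search: "Server Software" } })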
104. MongoDB Compass
Visualize, understand, and work with your geospatial data
Point and click to construct sophisticated queries, execute
them with the push of a button and Compass will display your
results both graphically and as sets of JSON documents.
A better approach to CRUD makes it easier to interact with your
data
Modify existing documents with greater confidence using the
intuitive visual editor, or insert new documents and clone or
delete existing ones in just a few clicks.
06/06/2019 MongoDB class by Alexandre Bergere 104
105. MongoDB Compass
Feature Compass Community Edition
View, add, and delete databases and collections X X
View and interact with documents with full CRUD functionality X X
Build and run ad hoc queries X X
View and optimize query performance with visual explain plans X X
Manage indexes: view stats, create, and delete X X
Create and execute aggregation pipelines X X
Kerberos, LDAP and x509 Authentication X
Schema Analysis X
Real Time Server Stats X
Document Validation X
06/06/2019 MongoDB class by Alexandre Bergere 105
106. MongoDB Compass
Compass Readonly Edition
New in version 1.12.0
A read-only version of MongoDB Compass is available which provides the ability to limit
certain CRUD operations within your organization. In this version, users are limited strictly
to read operations within MongoDB.
Compass Isolated Edition
New in version 1.14.0
Compass Isolated Edition restricts network requests to TLS-encrypted TCP connections to the
server chosen on the Connect screen. All other outbound connections are not permitted in this
edition.
06/06/2019 MongoDB class by Alexandre Bergere 106
107. TP - Compass
1. Insert the following artist: {"last_name": "Van gogh", "first_name": "Vincent"}
2. Add hobbies “pony” and “painting” to the artist 280.
3. Add "birth_date" to the schema validation.
4. Return artists from the 6th to the 9th ordered desc by their name.
5. Free test
06/06/2019 MongoDB class by Alexandre Bergere 107
Artist collection:
111. Replica Set
A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide
redundancy and high availability, and are the basis for all production deployments. This section introduces
replication in MongoDB as well as the components and architecture of replica sets. The section also provides
tutorials for common tasks related to replica sets.
Replication provides redundancy and increases data availability. With multiple copies of data on different
database servers, replication provides a level of fault tolerance against the loss of a single database server.
In some cases, replication can provide increased read capacity as clients can send read operations to different
servers. Maintaining copies of data in different data centers can increase data locality and availability for
distributed applications. You can also maintain additional copies for dedicated purposes, such as disaster
recovery, reporting, or backup.
06/06/2019 MongoDB class by Alexandre Bergere 111
112. Replica set
27017 27018
27019
Primary Arbiter
Secondary
REPLICATION
Server types:
• primary
• secondary
• arbiter
• hidden
06/06/2019 MongoDB class by Alexandre Bergere 112
114. Replica set
options
mongod --port 27001
--replSet name
--dbpath path of the data directory
--logpath path of the log file
--logappend (append to the log if the server restarts)
--oplogSize 50
mongod --port 27017 --dbpath
"C:\Users\alexa\Documents\MongoDB\data_primary" --replSet rs0 --
smallfiles --oplogSize 128
mongod --port 27018 --dbpath
"C:\Users\alexa\Documents\MongoDB\data_secondary" --replSet rs0 --
smallfiles --oplogSize 128
mongod --port 27019 --dbpath
"C:\Users\alexa\Documents\MongoDB\data_arbitrer" --replSet rs0 --
smallfiles --oplogSize 128
06/06/2019 MongoDB class by Alexandre Bergere 114
115. Replica set
Replica
Options:
• arbiterOnly : true (holds no data; provides a vote when there is an even
number of servers)
• priority : 0 (never primary); sets the preference order for the future
primary in case of failure
• hidden : true hides the server from clients; it can never become primary
• slaveDelay : applies updates with a delay (e.g. 8*3600 keeps the data
always 8h behind) (also set hidden:true)
• votes : 2 (e.g.) adds extra votes (not recommended; prefer arbiterOnly)
[e.g. Srv1 (2 votes), Srv2 (1 vote): if Srv1 goes down, Srv2 has 1/3 of the
votes and does not become primary]
06/06/2019 MongoDB class by Alexandre Bergere 115
116. Replica set
Initialisation
# Use rs.initiate() on one and only one
member of the replica set:
> rs.initiate({
_id: "rs0",
version: 1,
members: [
{ _id: 0, host : "localhost:27017"
}
, { _id: 1, host :
"localhost:27018" }
]
}
)
# Add other replica:
> rs.add("localhost:27018")
> rs.addArb(" localhost :27019")
# Delete a server from the replicaSet:
> rs.remove("localhost:27018")
# Check the
configuration:
> rs.conf()
06/06/2019 MongoDB class by Alexandre Bergere 116
117. Replica set
Command
# In each Replica:
> rs.slaveOk()
# Check status:
> rs.status()
The secondary only accepts writes that it gets through replication.
To allow queries on a secondary, we must tell Mongo that we are okay with
reading from the secondary.
06/06/2019 MongoDB class by Alexandre Bergere 117
118. Replica set
Reconfiguration
# For the hidden:
> cfg = rs.conf()
> cfg.members[2].priority = 0
> cfg.members[2].slaveDelay = 86400
> cfg.members[2].hidden = true
> rs.reconfig(cfg)
# For the arbiter:
> rs.remove("localhost:27018")
> rs.addArb("localhost:27018")
To do only on the PRIMARY !
06/06/2019 MongoDB class by Alexandre Bergere 118
119. Fire & Forget strategy
You can configure MongoDB to fire-and-forget, sending off a write to the server without
waiting for an acknowledgment.
For high-volume, low-value data (like clickstreams and logs), fire-and-forget-style writes can be
ideal.
You can also configure MongoDB to guarantee that a write has gone to multiple replicas before
considering it committed. For important data, a safe mode setting is necessary.
06/06/2019 MongoDB class by Alexandre Bergere 119
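A hedged sketch of both ends of that spectrum using the writeConcern option (collection names are invented for the example):
# Fire-and-forget: do not wait for any acknowledgment
> db.clicks.insert({ page: "/home" }, { writeConcern: { w: 0 } })
# Safe mode: wait until a majority of replica set members acknowledge
> db.orders.insert({ total: 42 }, { writeConcern: { w: "majority" } })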
120. TP – Replica set
1. Init 3 replicas.
2. Import the dump on the primary.
3. Put one of the replicas in backup, adjust its reception delay to 24h.
4. Try to insert data into a secondary.
5. Insert data into the primary and check its persistence on the network.
6. Add a fourth replica and configure it as an arbiter (check data behaviour on this one).
7. Import the file “place.json”. Is it persisted across the whole network?
8. Set the priority to 2 for the primary, shut it down and start it back.
06/06/2019 MongoDB class by Alexandre Bergere 120
122. TP – Replica set
06/06/2019 MongoDB class by Alexandre Bergere 122
# 2. Import the dump on the primary:
mongodump --archive --db crunchbase --port 27058 | mongorestore --archive --port 27017
# 3. Put one of the replicas in backup, adjust its reception delay to 24h:
> cfg = rs.conf()
> cfg.members[2].priority = 0
> cfg.members[2].slaveDelay = 86400
> cfg.members[2].hidden = true
> rs.reconfig(cfg)
# 6. Add a fourth replica and configure it as an arbiter (check data behaviour on this one):
mongod --port 27020 --dbpath C:\Users\alexa\Documents\Cours\MongoDB\2018-2019\data4 --replSet rs0 --
smallfiles --oplogSize 128
mongo --port 27017
> rs.addArb("localhost:27020")
mongo --port 27020
> rs.slaveOk()
123. TP – Replica set
06/06/2019 MongoDB class by Alexandre Bergere 123
# 8. Set the priority to 2 for the primary, shut it down and start it back:
> cfg = rs.conf()
> cfg.members[3].priority = 2
> rs.reconfig(cfg)
125. Aggregation
Swiss Army knife
Executes in native code
o Written in C++
o JSON parameter
Flexible, functional, simple
o Operation pipeline
o Computational expressions
06/06/2019 MongoDB class by Alexandre Bergere 125
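A minimal operation-pipeline sketch on the companies collection from the TPs (stages and fields chosen for illustration):
# match → group → sort, each stage feeding the next
> db.companies.aggregate([
  { $match: { founded_year: { $gte: 2000 } } },
  { $group: { _id: "$category_code", total: { $sum: 1 } } },
  { $sort: { total: -1 } }
])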
130. $unwind
# Deconstruct an array field into one document per element:
> { $unwind: "$subjects" }
{
title:"The Great Gatsby",
ISBN:"9762832930920323" ,
subjects:"Long Island"
},
{
title:"The Great Gatsby",
ISBN:"9762832930920323" ,
subjects:"New York"
},
{
title:"The Great Gatsby",
ISBN:"9762832930920323" ,
subjects:"1920s"
}
{
title:"The Great Gatsby",
ISBN:"9762832930920323" ,
subjects:[
"Long Island",
"New York",
"1920s"
]
}
06/06/2019 MongoDB class by Alexandre Bergere 130
131. TP - Aggregation
1. How many companies have more than 999 employees and were founded in or after 2000?
2. Number of companies and number of employees grouped by founded year, ordered by founded_year
desc?
3. How many companies grouped by category_code, with a list of all included company names, with
category_code filtered on medical or government?
4. How many companies have more than 1000 employees and were founded after 2000 (with agg.
function)?
06/06/2019 MongoDB class by Alexandre Bergere 131
132. TP - Aggregation
06/06/2019 MongoDB class by Alexandre Bergere 132
# 1. How many companies have more than 999 employees and were founded in or after 2000?
> db.companies.count({$and:[{"number_of_employees":{$gte:1000}},{"founded_year":{$gte:2000}}]},{})
# 2. Number of companies and number of employees grouped by founded year, ordered by founded_year desc?
> db.companies.aggregate ([
{ "$group" : {
"_id" : "$founded_year",
"NumberOfCompanies" : {$sum : 1},
"NumberOfEmployees":{$sum : "$number_of_employees"}
}
},
{ $sort : { _id: -1 } }
]).pretty()
133. TP - Aggregation
06/06/2019 MongoDB class by Alexandre Bergere 133
# 3. Number of companies grouped by category_code, with a list of all included company names, with
category_code filtered on medical or government?
> db.companies.aggregate ([
{ "$match":{
category_code: {$in:["medical","government"]}
}
},
{ "$group" : {
"_id" : "$category_code",
"NumberOfCompanies" : {"$sum" : 1},
"Companies":{$addToSet:"$name"}
}
}
]).pretty()
134. TP - Aggregation
06/06/2019 MongoDB class by Alexandre Bergere 134
# 4. How many companies have more than 1000 employees and were founded after 2000 (with agg. function)?
> db.companies.aggregate ([
{ "$match":{
$and:[{"number_of_employees":{$gte:1000}},{"founded_year":{$gte:2000}}]
}
},
{ "$group" : {
"_id" : null,
"NumberOfCompanies" : {"$sum" : 1}
}
}
]).pretty()
> db.companies.aggregate ([
{ "$match":{
$and:[{"number_of_employees":{$gte:1000}},{"founded_year":{$gte:2000}}]
}
},
{ "$count" : "NumberOfCompanies" }
]).pretty()
138. Authentication
BUSINESS NEEDS MONGODB SECURITY FEATURES
Authentication SCRAM, LDAP, Kerberos, x.509 Certificates
Authorization Built-in Roles, User-Defined Roles, Field-Level Redaction
Auditing Admin, DML, DDL, Role-Based
Encryption Network: SSL (with FIPS 140-2)
Disk : Encrypted Storage Engine or Partner Solutions
06/06/2019 MongoDB class by Alexandre Bergere 138
139. Localhost Exception
The localhost exception allows you to enable access control and then create the first user in the
system. With the localhost exception, after you enable access control, connect to the localhost
interface and create the first user in the admin database. The first user must have privileges to
create other users, such as a user with the userAdmin or userAdminAnyDatabase role.
Changed in version 3.0: The localhost exception changed so that these connections only have
access to create the first user on the admin database. In previous versions, connections that
gained access using the localhost exception had unrestricted access to the MongoDB instance.
The localhost exception applies only when there are no users created in the MongoDB instance
and only when you're connected to the database via the localhost interface, i.e. from the same
server.
06/06/2019 MongoDB class by Alexandre Bergere 139
140. Client/User
Authentication
Mechanism
Mechanism Description
SCRAM-SHA-1
• Default mechanism
• Challenge / Response
• Username / Password
• IETF Standard
MONGODB-CR
• Challenge / Response
• Replaced by SCRAM-SHA-1
• Username / Password
• Deprecated as of MongoDB 3.0
X.509
• Certificate based
• Introduced in MongoDB 2.6
• TLS
LDAP
• Lightweight Directory Access Protocol
• Used for directory information
• External authentication mechanism
Kerberos
• Developed at MIT
• Designed for secure authentication
• External authentication mechanism
06/06/2019 MongoDB class by Alexandre Bergere 140
141. Client/User
Authentication
Initialisation
mongod --auth --dbpath C:\Users\alexa\Documents\MongoDB\data
# Request:
> use admin
db.createUser(
{
user: "UserAdmin",
pwd: "abc123",
roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
}
)
The first thing you are allowed to do when connected to an
authenticated Mongo server is to create the first user
in the database.
With that first user, you can then create other users.
06/06/2019 MongoDB class by Alexandre Bergere 141
142. Client/User
Authentication
Authentication methods
After you've created the first user in the database, the localhost exception no longer
applies.
Always specify the database in which the user is created.
mongo admin --port 27017 -u "UserAdmin" -p "abc123"
OR:
mongo --port 27017 -u "UserAdmin" -p "abc123" --authenticationDatabase=admin
OR:
> use admin
> db.auth('UserAdmin', 'abc123')
When adding a user, you create the user in a specific database. This database is
the authentication database for the user.
A user can have privileges across different databases; i.e. a user’s privileges are
not limited to the authentication database. By assigning to the user roles in other
databases, a user created in one database can have permissions to act on other
databases.
06/06/2019 MongoDB class by Alexandre Bergere 142
143. Client/User
Authentication
Informations
# Returns users information:
> db.getUsers()
> db.system.users.find()
# Returns information for a specified user:
> db.getUser(username)
06/06/2019 MongoDB class by Alexandre Bergere 143
144. Client/User
Authentication
Role
Roles are groups of privileges (actions over resources) that are granted to
users over a given namespace (database).
{
role: "<name>",
privileges: [
{ resource: { <resource> }, actions: [ "<action>", ... ] },
...
],
roles: [
{ role: "<role>", db: "<database>" } | "<role>",
...
]
}
06/06/2019 MongoDB class by Alexandre Bergere 144
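A hedged sketch instantiating that structure with db.createRole (role, database and action names are example values):
# A custom role allowing only find on the crunchbase database
> use admin
> db.createRole({
  role: "readCrunchbase",
  privileges: [
    { resource: { db: "crunchbase", collection: "" }, actions: [ "find" ] }
  ],
  roles: []
})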
151. Client/User
Authentication
Role information
# Returns information for a specified role:
> db.getRole("read", {showPrivileges: true})
# Helpful:
> var readRole = db.getRole("read", {showPrivileges: true})
> readRole.privileges[0]
06/06/2019 MongoDB class by Alexandre Bergere 151
152. Client/User
Authentication
Role modification
# Add a role:
> db.grantRolesToUser(
"reportsUser",
[
{ role: "read", db: "accounts" }
]
)
# Revoke a role:
> db.revokeRolesFromUser(
"myTester",
[
{ role: "readWrite", db: "crunchbase" }
]
)
06/06/2019 MongoDB class by Alexandre Bergere 152
153. Client/User
Authentication
User
# Create a new user and attribute role:
> db.createUser(
{
user: "myTester",
pwd: "xyz123",
roles: [ { role: "readWrite", db: "crunchbase" },
{ role: "read", db: "test" } ]
}
)
06/06/2019 MongoDB class by Alexandre Bergere 153
154. TP – Client/User Authentication
1. Create an admin user with the role “userAdminAnyDatabase” on the database “admin”.
2. Create a user “myTester” with a reader/writer role on the database “crunchbase”.
3. Create a user “Reader” with a reader role on the database “crunchbase”.
4. Export the collection “artists”, delete it and import it back.
06/06/2019 MongoDB class by Alexandre Bergere 154
155. TP – Client/User Authentication
06/06/2019 MongoDB class by Alexandre Bergere 155
# 1. Create an admin user with the role “userAdminAnyDatabase” on the database “admin”.
mongod --auth --port 27017 --dbpath C:\Users\alexa\Documents\Cours\MongoDB\2017-2018\data\data
mongo
> use admin
> db.createUser(
{
user: "UserAdmin",
pwd: "abc123",
roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
}
)
156. TP – Client/User Authentication
06/06/2019 MongoDB class by Alexandre Bergere 156
# 2. Create a user “myTester” with a reader/writer role on the database “crunchbase”.
mongo --port 27017 -u "UserAdmin" -p "abc123" --authenticationDatabase "admin"
OR
mongo
> use admin
> db.auth("UserAdmin", "abc123" )
> db.createUser(
{
user: "myTester",
pwd: "xyz123",
roles: [ { role: "readWrite", db: "crunchbase" } ]
}
)
157. TP – Client/User Authentication
06/06/2019 MongoDB class by Alexandre Bergere 157
# 3. Create a user “Reader” with a reader role on the database “crunchbase”.
> db.createUser(
{
user: "myReader",
pwd: "xyz123",
roles: [ { role: "read", db: "crunchbase" } ]
}
)
# 4. Export the collection, delete it and import it back.
mongoexport -d crunchbase -c companies --out C:\Users\alexa\Documents\MongoDB\src\companies_export.json
mongoimport -d crunchbase -c companies -u "myTester" -p "xyz123" --authenticationDatabase "crunchbase"
C:\Users\alexa\Documents\MongoDB\src\companies.json
158. Internal
Authentication
Mechanism
Mechanism Description
Keyfile
(SCRAM-SHA-1)
• shared password
• copy exists on each member
• 6-1024 Base64 characters
• whitespace ignored
x.509
• certificate based
• recommended to issue different certs per member
Members of a replica set or sharded cluster must prove who they are.
06/06/2019 MongoDB class by Alexandre Bergere 158
159. Internal
Authentication
Shared / ReplicaSet authentication
With Keyfile access
1. Create a keyfile
Create a keyfile.
With keyfile authentication, each mongod instance in the replica set uses the
contents of the keyfile as the shared password for authenticating other members
in the deployment. Only mongod instances with the correct keyfile can join the
replica set.
2. Copy the keyfile to each replica set member
Copy the keyfile to each server hosting the replica set members. Ensure that the
user running the mongod instances is the owner of the file and can access the
keyfile.
3. Enable authentication for each member of the replica set.
openssl rand -base64 756 > <path-to-keyfile>
mongod --dbpath <path> --port <port> --replSet <replicaSetName> --fork
--keyFile <path-to-keyfile>
Update Existing Deployment
06/06/2019 MongoDB class by Alexandre Bergere 159
160. Internal
Authentication
Shared / ReplicaSet authentication
With Keyfile access
4. Add first user
Add a user with the userAdminAnyDatabase role.
5. Authenticate as the user administrator
6. Create additional users as needed for your deployment
> use admin
> db.createUser(
{
user: "myUserAdmin",
pwd: "abc123",
roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
}
)
> mongo --port 27017 -u "myUserAdmin" -p "abc123" --
authenticationDatabase "admin"
Update Existing Deployment
> db.createUser({
"user":"AdminCluster"
,"pwd":"password"
,roles:[{"role":"clusterAdmin","db":"admin"}]
})
06/06/2019 MongoDB class by Alexandre Bergere 160
161. Internal
Authentication
Shared / ReplicaSet authentication
With Keyfile access
Follow the replica set & Update Existing Deployment steps above.
OR
Follow the link below: https://docs.mongodb.com/v3.0/tutorial/enable-internal-
authentication/ (Deploy New Replica Set with Access Control)
Deploy New Replica Set
with Access Control
06/06/2019 MongoDB class by Alexandre Bergere 161
163. Mongo DB Atlas
DAAS : Database As A Service • Schema design
• Query and index optimization
• Server size selection - you must select the appropriate size of server,
coupled with IO and storage capacity
• Capacity planning - you must determine when you need additional
capacity, typically using the monitoring telemetry provided by
MongoDB Atlas, but you can make these changes with no downtime
• Initiating database restores
• How much you use
06/06/2019 MongoDB class by Alexandre Bergere 163
164. Mongo DB Cloud Manager
06/06/2019 MongoDB class by Alexandre Bergere 164
165. MongoDB Atlas vs MongoDB Cloud Manager
Feature Atlas Cloud Manager
Monitoring
Alert
API
Backup
Settings
Maintenance
Infrastructure
06/06/2019 MongoDB class by Alexandre Bergere 165
166. TP - MongoDB Atlas
06/06/2019 MongoDB class by Alexandre Bergere 166
174. Mtools
The following tools are in the mtools collection:
mlogfilter : Slices log files by time, merges log files, filters slow queries, finds table scans, shortens log lines, filters by
other attributes, convert to JSON
mloginfo : returns info about log file, like start and end time, version, binary, special sections like restarts,
connections, distinct view
mplotqueries : visualize log files with different types of plots (requires matplotlib)
mlogvis : creates a self-contained HTML file that shows an interactive visualization in a web browser (as an
alternative to mplotqueries)
mlaunch : a script to quickly spin up local test environments, including replica sets and sharded systems (requires
pymongo)
06/06/2019 MongoDB class by Alexandre Bergere 174
175. MongoDB Charts
MongoDB Charts is the fastest and
easiest way to build visualizations of
MongoDB data.
(beta)
06/06/2019 MongoDB class by Alexandre Bergere 175
176. Change Streams
More info.
Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog.
Applications can use change streams to subscribe to all data changes on a collection and immediately react to them.
06/06/2019 MongoDB class by Alexandre Bergere 176
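A minimal subscription sketch (the collection name is assumed from earlier exercises; change streams require a replica set):
# Open a change stream and print every event as it arrives
> var cursor = db.artists.watch()
> while (cursor.hasNext()) { printjson(cursor.next()) }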
177. Stitch
Full access to MongoDB, declarative read/write
controls, and integration with your choice of services
MongoDB Stitch lets developers focus on building applications rather than on managing data manipulation code, service integration, or
backend infrastructure. Whether you’re just starting up and want a fully managed backend as a service, or you’re part of an enterprise and
want to expose existing MongoDB data to new applications, Stitch lets you focus on building the app users want, not on writing boilerplate
backend logic.
06/06/2019 MongoDB class by Alexandre Bergere 177