This document discusses using MongoDB as part of an enterprise data management architecture. It begins by describing the rise of data lakes to manage growing and diverse data volumes. Traditional EDWs struggle with this new data variety and volume. The document then provides an overview of MongoDB's features like flexible schemas, secondary indexes, and aggregation capabilities that make it suitable for building different layers of an EDM pipeline for tasks like raw data storage, transformation, analysis, and serving data to downstream systems. Example use cases are presented for building a single customer view and for replacing Oracle with MongoDB.
MongoDB Europe 2016 - Using MongoDB to Build a Fast and Scalable Content Repo...MongoDB
MongoDB can be used in the Nuxeo Platform as a replacement for more traditional SQL databases. Nuxeo's content repository, which is the cornerstone of this open source enterprise content management platform, integrates completely with MongoDB for data storage. This presentation will explain the motivation for using MongoDB and will emphasize the different implementation choices driven by the very nature of a NoSQL datastore like MongoDB. Learn how Nuxeo integrated MongoDB into the platform which resulted in increased performance (including actual benchmarks) and better response to some use cases.
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect, Matt Kalan, sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant S...MongoDB
Travellers are demanding more exhaustive, accurate, and relevant results when they search for flights, and they want these results instantly – even when there can be 100 Billion travel options for a single trip. Amadeus’s “Instant Search” feature was built to meet those requirements. These searches are not trivial. Several terabytes of constantly evolving data is needed to reply instantly to questions like, “I live in Frankfurt, where can I go this weekend for €200?” “What’s the cheapest and most convenient flight for a MongoDB Europe attendee?” This technical session will show you how Amadeus integrated MongoDB into its system, and how it allowed us to handle huge numbers of updates and searches in a high-volume system, to deliver the next generation of flight search products. It will cover topics such as how we discovered which extra indexes were needed, how we were able to get the balancer to meet our needs, and how we modelled our data for optimal performance.
Modern architectures are moving away from a "one size fits all" approach. We are well aware that we need to use the best tools for the job. Given the large selection of options available today, chances are that you will end up managing data in MongoDB for your operational workload and with Spark for your high speed data processing needs.
Description: When we model documents or data structures, there are key aspects to examine not only for functional and architectural purposes, but also to account for how data is distributed across nodes, streaming capabilities, aggregation and queryability options, and how we integrate data processing software such as Spark, which can benefit from subtle but substantial model changes. A clear example is the choice between embedding and referencing documents, and its implications for high-speed processing.
Over the course of this talk we will detail the benefits of a good document model for the operational workload, as well as the kinds of transformations we should incorporate into our document model to suit Spark's high-speed processing capabilities.
We will look into the different options that we have to connect these two different systems, how to model according to different workloads, what kind of operators we need to be aware of for top performance and what kind of design and architectures we should put in place to make sure that all of these systems work well together.
Over the course of the talk we will showcase different libraries that enable the integration between Spark and MongoDB, such as the MongoDB Hadoop Connector, the Stratio Connector, and the MongoDB Spark Native Connector.
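As a rough illustration of the kind of integration the talk covers (not code from the talk itself), the sketch below loads a MongoDB collection into a Spark DataFrame with the MongoDB Spark Connector. The URI, database, and collection names are placeholders, and the configuration keys and data source name differ between connector versions.

```python
# Minimal sketch: read a MongoDB collection into Spark for high-speed processing.
# Assumes the MongoDB Spark Connector (2.x/3.x style shown here) is on the Spark
# classpath; URI, database, and collection names are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("mongodb-spark-sketch")
    .config("spark.mongodb.input.uri", "mongodb://localhost:27017/shop.orders")
    .getOrCreate()
)

# The connector infers a schema by sampling documents in the collection.
orders = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

# Offload the heavy aggregation to Spark, e.g. total order value per customer.
orders.groupBy("customer_id").sum("amount").show()
```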
By the end of the talk I expect the attendees to have an understanding of:
How to connect their MongoDB clusters with Spark
Which use cases show a net benefit from connecting these two systems
What kind of architecture design should be considered to make the most of Spark + MongoDB
How documents can be modeled for better performance and smoother operations while processing data sets stored in MongoDB
The talk is suitable for:
Developers who want to understand how to leverage Spark
Architects who want to integrate their existing MongoDB cluster and have real-time, high-speed processing needs
Data scientists who know Spark, are experimenting with it, and want to use MongoDB as their persistence layer
Webinar: An Enterprise Architect’s View of MongoDBMongoDB
In the world of big data, legacy modernization, siloed organizations, empowered customers, and mobile devices, making informed choices about your enterprise infrastructure has become more important than ever. The alternatives are abundant, and the successful Enterprise Architect must constantly discern which new technology is just a shiny object and which will add true business value.
MongoDB is more than just a great application database for developers; it gives Enterprise Architects new capabilities to solve previously difficult architectural requirements much more easily. Take, for example, the challenge of many siloed systems at MetLife: with MongoDB, the MetLife team was able to provide a single view into those 70 systems in only 3 months.
In this webinar, we will:
Explore real life challenges enterprises face with case studies of their solutions
Consider how best to introduce MongoDB in the enterprise
Give an overview of how to optimize the use of MongoDB
Agile software development is becoming the de facto way of building software these days. More and more enterprises, from large Fortune 500 companies to small start-ups, are adopting agile development methodologies. But agile software development is more than just a methodology or a practice. It is also a combined set of tools and platforms, now at our disposal, that allow us to iterate faster, get to market sooner, and fail faster. This set of tools augments our development cycles by a few orders of magnitude and allows developers to be much more productive.
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...MongoDB
This session will be a case study of eBay’s experience running MongoDB for project Zoom, in which eBay stores all media metadata for the site. This includes references to pictures of every item for sale on eBay. This cluster is eBay's first MongoDB installation on the platform and is a mission critical application. Yuri Finkelstein, an Enterprise Architect on the team, will provide a technical overview of the project and its underlying architecture.
Rapid Development and Performance By Transitioning from RDBMSs to MongoDB
Modern application requirements demand rich, dynamic data structures, fast response times, easy scaling, and low TCO to match rapidly changing customer and business requirements and the powerful programming languages used in today's software landscape.
Traditional approaches to solutions development with RDBMSs increasingly expose the gap between the modern development languages and the relational data model, and between scaling up vs. scaling horizontally on commodity hardware. Development time is wasted as the bulk of the work has shifted from adding business features to struggling with the RDBMSs.
MongoDB, the premier NoSQL database, offers a flexible and scalable solution that lets teams focus on quickly adding business value again.
In this session, we will provide:
- Overview of MongoDB's capabilities
- Code-level exploration of the MongoDB programming model and APIs and how they transform the way developers interact with a database
- Update of the exciting features in MongoDB 3.0
During this presentation, Infusion and MongoDB shared their mainframe optimization experiences and best practices. These have been gained from working with a variety of organizations, including a case study from one of the world’s largest banks. MongoDB and Infusion bring a tested approach that provides a new way of modernizing mainframe applications, while keeping pace with the demand for new digital services.
Move Fast with MongoDB Cloud Database - Atlas.
The workshop covered:
Deploying a MongoDB cluster in minutes
Querying and managing data in MongoDB (a minimal connection sketch follows this list)
Executing continuous backups and point-in-time restores, ensuring that you can meet any restore point objectives
Viewing historical metrics in optimized dashboards, watching what's happening in your database live, configuring alerts, and receiving automated index suggestions to improve the performance of your cluster
Using MongoDB Charts to create visual representations of your data
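For readers who want to try the first two steps themselves, a minimal connection sketch is shown below; the SRV connection string, database, and collection names are placeholders for your own Atlas cluster.

```python
# Minimal sketch: connect to a MongoDB Atlas cluster and run a first query.
# The connection string below is a placeholder; Atlas shows the real one under
# "Connect" for your cluster.
from pymongo import MongoClient

client = MongoClient(
    "mongodb+srv://user:password@cluster0.example.mongodb.net/"  # placeholder
)
db = client["retail"]  # placeholder database name

db.products.insert_one({"sku": "A-100", "name": "Widget", "price": 9.99})
for doc in db.products.find({"price": {"$lt": 20}}):
    print(doc)
```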
MongoDB and RDBMS: Using Polyglot Persistence at Equifax MongoDB
MongoDB and RDBMS: Using Polyglot Persistence at Equifax. Presented by Michael Lawrence, Pariveda Solutions on behalf of Equifax at MongoDB Evenings Atlanta on September 24, 2015.
MongoDB Evenings Dallas: What's the Scoop on MongoDB & HadoopMongoDB
What's the Scoop on MongoDB & Hadoop
Jake Angerman, Sr. Solutions Architect, MongoDB
MongoDB Evenings Dallas
March 30, 2016 at the Addison Treehouse, Dallas, TX
MongoDB Days Silicon Valley: Jumpstart: The Right and Wrong Use Cases for Mon...MongoDB
Presented by Sigfrido Narvaez, Senior Solutions Architect, MongoDB
Experience level: Introductory
When it comes time to select database software for your project, there are a bewildering number of choices. How do you know if your project is a good fit for a relational database, or whether one of the many NoSQL options is a better choice? In this session you will learn when to use MongoDB and how to evaluate if MongoDB is a fit for your project. You will see how MongoDB's flexible document model is solving business problems in ways that were not previously possible, and how MongoDB's built-in features allow running at scale.
Webinar: Faster Big Data Analytics with MongoDBMongoDB
Learn how to leverage MongoDB and Big Data technologies to derive rich business insight and build high performance business intelligence platforms. This presentation includes:
- Uncovering Opportunities with Big Data analytics
- Challenges of real-time data processing
- Best practices for performance optimization
- Real world case study
This presentation was given in partnership with CIGNEX Datamatics.
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauMongoDB
Pairing your real-time operational data stored in a modern database like MongoDB with first-class business intelligence platforms like Tableau enables new insights to be discovered faster than ever before.
Many leading organizations already use MongoDB in conjunction with Tableau including a top American investment bank and the world’s largest airline. With the Connector for BI 2.0, it’s never been easier to streamline the connection process between these two systems.
In this webinar, we will create a live connection from Tableau Desktop to a MongoDB cluster using the Connector for BI. Once we have Tableau Desktop and MongoDB connected, we will demonstrate the visual power of Tableau to explore the agile data storage of MongoDB.
You’ll walk away knowing:
- How to configure MongoDB with Tableau using the updated connector
- Best practices for working with documents in a BI environment
- How leading companies are using big data visualization strategies to transform their businesses
Webinar: Enterprise Trends for Database-as-a-ServiceMongoDB
Two complementary trends are particularly strong in enterprise IT today: MongoDB itself, and the movement of infrastructure, platform, and software to as-a-service models. Being designed from the start to work in cloud deployments, MongoDB is a natural fit.
Learn how your enterprise can create its own MongoDB service offering, combining the advantages of MongoDB and cloud for agile, nearly-instantaneous deployments. Ease your operations workload by centralizing your points of enforcement, standardizing best practices, and enabling elastic scalability.
We will provide you with an enterprise planning outline which incorporates needs and value for stakeholders across operations, development, and business. We will cover accounting, chargeback integration, and quantification of benefits to the enterprise (such as standardizing best practices, creating elastic architecture, and reducing database maintenance costs).
MongoDB was conceived for the cloud age. Making sure that MongoDB is compatible and performant across cloud providers is essential to achieve complete integration with platforms and systems. Azure is one of the biggest IaaS platforms available and very popular among developers who work on the Microsoft stack.
This presentation contains a preview of the upcoming MongoDB 3.2 release, where we explore the new storage engines, aggregation framework enhancements, and utility features like document validation and partial indexes.
Data driven innovation presentation (Rome 2016)Marco Liberati
Use data visualization to discover and communicate insights and stories in your data for your business.
In this presentation I'm going to show how visualization can help people understand their data by discovering hidden connections and patterns over time, and then communicate them efficiently to others.
Past, Present and Future of Data Processing in Apache HadoopCodemotion
MongoDB scales easily to store mass volumes of data. However, when it comes to making sense of it all what options do you have? In this talk, we’ll take a look at 3 different ways of aggregating your data with MongoDB, and determine the reasons why you might choose one way over another. No matter what your big data needs are, you will find out how MongoDB the big data store is evolving to help make sense of your data.
This presentation covers practical implementations of Lambda with different patterns. It also explains how to achieve continuous deployment using Lambda.
My other computer is a datacentre - 2012 editionSteve Loughran
An updated version of the "my other computer is a datacentre" talk, presented at the Bristol University HPC talk.
Because it is targeted at universities, it emphasises some of the interesting problems -the classic CS ones of scheduling, new ones of availability and failure handling within what is now a single computer, and emergent problems of power and heterogeneity. It also includes references, all of which are worth reading, and, being mostly Google and Microsoft papers, are free to download without needing ACM or IEEE library access.
Comments welcome.
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
Lambda Architecture is a useful framework for thinking about the design of big data applications. The framework was initially built at Twitter. In this presentation you will learn, based on concrete examples, how to build and deploy scalable, fault-tolerant applications, with a focus on Big Data and Hadoop.
This presentation was delivered at the OOP conference, Munich, Feb 2016
MongoDB Europe 2016 - Star in a Reasonably Priced Car - Which Driver is Best?MongoDB
MongoDB's unique Idiomatic Drivers let you work natively with database objects in your favourite language, removing the need to explicitly convert your data and queries to text formats such as SQL, Javascript or XML. Drivers do all the hard work of translating to serialised BSON objects on the wire, removing the need for server-side parsing and ensuring security against injection attacks. Server load and hardware requirements are reduced at the expense of additional client side CPU cycles. In this presentation we compare the performance of drivers in a number of languages to see what impact your language choice can have on your hosting costs and throughput.
Hidden inside MongoDB is the WiredTiger data engine, an Open Source, pluggable storage engine that became the database's default in 3.2. Written in C, WiredTiger uses a variety of techniques to provide unmatched performance, low latency and scalability. This talk will explore data structures and techniques C/C++ programmers can use to support heavily threaded applications on modern hardware, using examples from the WiredTiger code base. Data structures and techniques to be covered include hazard pointers, skiplists, ticket locks, atomic instructions and memory barriers.
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...MongoDB
Proximus is one of the biggest telecom companies in the Belgian market. This year the company began developing a new IoT network using LoRaWAN technology. The talk will detail our development team's search for a database suited to the needs of our IoT project, the selection and implementation of MongoDB as that database, as well as how we built a system for storing a variety of sensor data with high throughput by leveraging sleepy.mongoose. The talk will also discuss how different decisions around data storage impact applications in terms of both performance and total cost.
MongoDB Europe 2016 - Advanced MongoDB Aggregation PipelinesMongoDB
We will do a deep dive into the powerful query capabilities of MongoDB's Aggregation Framework, and show you how you can use MongoDB's built-in features to inspect the execution and tune the performance of your queries. And, last but not least, we will also give you a brief outlook into MongoDB 3.4's awesome new Aggregation Framework additions.
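Not from the talk itself, but as a rough idea of the kind of pipeline and execution inspection it covers, here is a hedged sketch; the collection and field names are invented.

```python
# Minimal sketch: a small aggregation pipeline plus server-side execution
# inspection. The "orders" collection and its fields are invented examples.
from pymongo import MongoClient

db = MongoClient()["shop"]

pipeline = [
    {"$match": {"status": "shipped"}},            # filter early so an index can be used
    {"$group": {"_id": "$customer_id",
                "total": {"$sum": "$amount"},
                "orders": {"$sum": 1}}},
    {"$sort": {"total": -1}},
    {"$limit": 10},
]

top_customers = list(db.orders.aggregate(pipeline))

# Ask the server to explain how it would execute the pipeline (index usage,
# stage ordering, and so on) instead of running it.
plan = db.command("aggregate", "orders", pipeline=pipeline, explain=True)
print(plan)
```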
MongoDB Europe 2016 - Debugging MongoDB PerformanceMongoDB
Asya is back, and so is Sherlock Holmes and his techniques to gather and analyze data from your poorly performing MongoDB clusters. In this advanced talk we take a deep look at all the diagnostic data that lives inside MongoDB - how to interrogate and interpret it to help you solve those frustrating performance bottlenecks that we all face occasionally.
Webinar: Data Streaming with Apache Kafka & MongoDBMongoDB
A new generation of technologies is needed to consume and exploit today's real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
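A rough sketch of the pattern the webinar describes, consuming a Kafka topic and persisting events into MongoDB, might look like the following; it assumes the kafka-python and PyMongo client libraries, with placeholder broker, topic, and collection names.

```python
# Minimal sketch: consume events from a Kafka topic and persist them in MongoDB.
# Broker address, topic, database, and collection names are placeholders.
import json

from kafka import KafkaConsumer        # kafka-python
from pymongo import MongoClient

consumer = KafkaConsumer(
    "sensor-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
events = MongoClient()["streams"]["events"]

for message in consumer:
    # Each Kafka record becomes one MongoDB document, queryable immediately.
    events.insert_one(message.value)
```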
MongoDB Europe 2016 - Powering Microservices with Docker, Kubernetes, and KafkaMongoDB
Organisations are building their applications around microservice architectures because of the flexibility, speed of delivery, and maintainability they deliver. This session introduces you to technologies such as Docker, Kubernetes & Kafka which are driving the microservices revolution. Learn about containers and orchestration – and most importantly how to exploit them for stateful services such as MongoDB.
L'architettura di classe enterprise di nuova generazione - Massimo BrignoliData Driven Innovation
The rise of data lakes - Companies are now drowning in data, and the classic data warehouse struggles to digest it because of its volume and variety. Many have started looking at architectures called Data Lakes, with Hadoop as the reference technology. But is this solution right for everything? Come learn how to operationalize data lakes to create modern data management architectures.
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Denodo
Watch full webinar here: https://bit.ly/34iCruM
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge and traditional technologies and data architectures are simply not up to this task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people that need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Webinar: How to Drive Business Value in Financial Services with MongoDBMongoDB
Huge upheaval in the finance industry has led to a major strain on existing IT infrastructure and systems. New finance industry regulation has meant increased volume, velocity and variability of data. This coupled with cost pressures from the business has led these institutions to seek alternatives. Top tier institutions like MetLife have turned to MongoDB because of the enormous business value it enables.
In this session, hear how MongoDB enabled these successful real world examples:
Single View of a Customer - 3 months and $2M for a single view of a customer across 50 source systems
Reference Data Management - $40M in cost savings from migrating to MongoDB for reference data management
Private cloud - MongoDB as a PaaS across a tier 1 bank for enabling agility for operations, not just the developer
The use cases are specific to financial services but the patterns of usage - agility, scale, global distribution - will be applicable across many industries.
Unlocking Operational Intelligence from the Data LakeMongoDB
Hadoop-based data lakes are enabling enterprises and governments to efficiently capture and analyze unprecedented volumes of data. Join this webinar to learn how digital transformation is driving the rise of the data lake, the role Hadoop plays in generating new classes of analytics and insight, the critical capabilities you need to evaluate in an operational database for your data lake, and more.
Creating a Modern Data Architecture for Digital TransformationMongoDB
By managing Data in Motion, Data at Rest, and Data in Use differently, modern Information Management Solutions are enabling a whole range of architecture and design patterns that allow enterprises to fully harness the value in data flowing through their systems. In this session we explored some of the patterns (e.g. operational data lakes, CQRS, microservices and containerisation) that enable CIOs, CDOs and senior architects to tame the data challenge, and start to use data as a cross-enterprise asset.
Webinar: How Banks Use MongoDB as a Tick DatabaseMongoDB
Learn why MongoDB is spreading like wildfire across capital markets (and really every industry) and then focus in particular on how financial firms are enjoying the developer productivity, low TCO, and unlimited scale of MongoDB as a tick database for capturing, analyzing, and taking advantage of opportunities in tick data.
New generations of database technologies are allowing organizations to build applications never before possible, at a speed and scale that were previously unimaginable. MongoDB is the fastest growing database on the planet, and the new 3.2 release will bring the benefits of modern database architectures to an ever broader range of applications and users.
Webinar: How Banks Manage Reference Data with MongoDBMongoDB
Managing and distributing reference data globally has always been a challenge for financial institutions. Managing and maintaining database schemas while integrating and replicating that data across geographies is costly and time consuming. MongoDB's native replication capabilities and partitioned architecture make it simple to distribute and synchronize data efficiently across the globe. MongoDB’s dynamic schema dramatically reduces database maintenance for schema migrations – data structure changes can be applied with no down time, and with no impact to existing applications.
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
Erik Baardse and Ajit Gadge from EDB Postgres presented on how to transform your DBMS to drive digital business: how Postgres enables you to support a wider range of workloads with your relational database, which opens the Big Data doors. They also covered EnterpriseDB's strategy around Big Data, which focuses on three areas, and, last but not least, how to find money in IT with Big Data and digital transformation.
Open Source North - MongoDB Advanced Schema Design PatternsMatthew Kalan
The hardest part of moving from a tabular database world to a modern world of objects and JSON is how to model your data. This year at OSN, Matt from MongoDB will take data modeling one step further than prior years and focus specifically on advanced schema design patterns to optimize the ease-of-use and performance of your data access layer and application.
Similar to L’architettura di Classe Enterprise di Nuova Generazione (20)
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
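One schema design often discussed for IoT workloads is the bucket pattern, where readings are grouped into one document per device per time window rather than one document per reading. A hedged sketch with invented field names follows; it is not taken from the session itself.

```python
# Minimal sketch of the "bucket" pattern for time-series data: one document per
# device per hour, with readings appended to an array. Field names are invented.
from datetime import datetime, timezone
from pymongo import MongoClient

readings = MongoClient()["iot"]["readings"]

def record(device_id, value, ts):
    bucket_start = ts.replace(minute=0, second=0, microsecond=0)
    readings.update_one(
        {"device_id": device_id, "bucket_start": bucket_start},
        {
            "$push": {"samples": {"ts": ts, "value": value}},
            "$inc": {"count": 1},
            "$min": {"min_value": value},
            "$max": {"max_value": value},
        },
        upsert=True,  # create the hourly bucket on the first reading
    )

record("sensor-42", 21.7, datetime.now(timezone.utc))
```

Fewer, larger documents mean fewer index entries, which is one way schema design affects the memory and disk utilization mentioned above.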
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
Query performance should be the unsung hero of an application, but without proper configuration, can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices to building multikey indexes.
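To make the equality, sort, and range idea concrete, here is a small hedged sketch (invented collection and field names, not from the session) of a compound index ordered equality first, then sort, then range, together with a query it supports.

```python
# Minimal sketch of the equality-sort-range guideline for compound indexes.
# Collection and field names are invented for illustration.
from pymongo import MongoClient, ASCENDING, DESCENDING

orders = MongoClient()["shop"]["orders"]

# Equality field first, then the sort key, then the range field.
orders.create_index([
    ("status", ASCENDING),       # equality predicate
    ("order_date", DESCENDING),  # sort key
    ("amount", ASCENDING),       # range predicate
])

cursor = (orders
          .find({"status": "shipped", "amount": {"$gte": 100}})
          .sort("order_date", DESCENDING))
print(cursor.explain()["queryPlanner"]["winningPlan"])
```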
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
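As a hedged sketch of the "output to existing collections" capability mentioned here, the $merge stage (added in MongoDB 4.2) can maintain a simple materialized view; collection and field names are invented.

```python
# Minimal sketch: use $merge (MongoDB 4.2+) to maintain a materialized view of
# daily revenue in a separate collection. Names are invented for illustration.
from pymongo import MongoClient

db = MongoClient()["shop"]

db.orders.aggregate([
    {"$group": {
        "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$order_date"}},
        "revenue": {"$sum": "$amount"},
    }},
    # Upsert the grouped results into the existing daily_revenue collection.
    {"$merge": {"into": "daily_revenue",
                "whenMatched": "replace",
                "whenNotMatched": "insert"}},
])
```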
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
…to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to create better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
It has never been easier to order online and get delivery in under 48 hours, very often for free. This ease of use hides a complex market worth more than $8 trillion.
Data is well known in the supply chain world (routes, information about goods, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise and data science, Upply is redefining the fundamentals of the supply chain, enabling every player to overcome the volatility and inefficiency of the market.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank … Compressed Sparse Row (CSR) is an adjacency-list-based graph representation that is …
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
As Europe's leading economic powerhouse and the fourth-largest #economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like #Russia and #China, #Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in #cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to #AdvancedPersistentThreats (#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
2. Agenda
• The Rise of Data Lakes
• MongoDB Overview
• A Proposed EDM Architecture
• Case Study & Scenarios
• Data Lake Lessons Learned
3. How Much Data?
• One thing companies do not lack is data:
– Sensor streams
– Social media sentiment
– Server logs
– Mobile apps
• Analysts estimate data volumes are growing 40% per year, and 90% of that data is unstructured.
• Traditional technologies (some designed 40 years ago) are no longer sufficient.
4. The Promise of "Big Data"
• Discovering insights by collecting and analyzing data promises:
– A competitive advantage
– Cost savings
• A widespread example of Big Data technology in use is the "Single View": aggregating everything known about a customer to improve engagement and revenue.
• The traditional EDW creaks under the load, overwhelmed by the volume and variety of data (and by its high cost).
5. The Rise of Data Lakes
• Many companies have started looking at an architecture known as the Data Lake:
– A platform for managing data flexibly
– Aggregates cross-silo data in a single place
– Allows exploration of all the data
• The most popular platform at the moment is Hadoop:
– Scales horizontally on commodity hardware
– Supports varied, read-optimized data schemas
– Includes data processing layers in SQL and common languages
– Strong references (Yahoo and Google first among them)
6. Why Hadoop?
• The Hadoop Distributed File System is designed to scale for large batch operations
• It provides a write-once, read-many, append-only model
• Optimized for long scans over TBs or PBs of data
• This ability to handle multi-structured data is used for:
– Customer segmentation for marketing campaigns and recommendations
– Predictive analytics
– Risk models
7. But Is It Right for Everything?
• Data Lakes are expected to deliver Hadoop's output to online applications. These applications have requirements that include:
– Response latency in milliseconds
– Random access to an indexed subset of the data
– Support for expressive queries and aggregations on the data
– Real-time updates to frequently changing values
8. Is Hadoop the Answer to Everything?
• In our now data-driven world, milliseconds matter.
– IBM researchers report that 60% of data loses its value within milliseconds of being generated
– For example, identifying a fraudulent stock transaction is useless minutes after the fact
• Gartner predicts that 70% of Hadoop deployments will fail to meet their cost-savings and revenue-growth objectives.
9. Enterprise Data Management Pipeline
[Diagram: siloed source databases, external feeds (batch), and streams are ingested via pub-sub, ETL, file imports, and stream processing into a pipeline of stages – Store raw data → Transform → Aggregate → Analyze – that serves users and other systems.]
10. In Detail
• Unnecessary joins cause very poor performance
• Scaling vertically is expensive
• The rigid schema makes it hard to consolidate variable or unstructured data
• Discrepancies between records have to be resolved during the aggregation phase
• Batch jobs often run for hours overnight
• The data is too stale for intraday decision-making
12. Documents Enable Dynamic Schema & Optimal Performance
MongoDB (a single document):
{
  customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    }
  ]
}
Relational (two tables):

Customer table
Customer ID | First Name | Last Name | City
0           | John       | Doe       | New York
1           | Mark       | Smith     | San Francisco
2           | Jay        | Black     | Newark
3           | Meagan     | White     | London
4           | Edward     | Daniels   | Boston

Phone table
Phone Number   | Type | DNC    | Customer ID
1-212-555-1212 | home | T      | 0
1-212-555-1213 | home | T      | 0
1-212-555-1214 | cell | F      | 0
1-212-777-1212 | home | T      | 1
1-212-777-1213 | cell | (null) | 1
1-212-888-1212 | home | F      | 2
13. Document Model Benefits
• Agility and flexibility
– Data model supports business change
– Rapidly iterate to meet new requirements
• Intuitive, natural data representation
– Eliminates the ORM layer
– Developers are more productive
• Reduces the need for joins and disk seeks
– Programming is simpler
– Performance delivered at scale
Example document:
{
  customer_id : 1,
  first_name : "Mark",
  last_name : "Smith",
  city : "San Francisco",
  phones : [
    {
      number : "1-212-777-1212",
      dnc : true,
      type : "home"
    },
    {
      number : "1-212-777-1213",
      type : "cell"
    }
  ]
}
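To make the "reduces the need for joins" point concrete, here is a minimal mongo-shell sketch: a single query on the embedded phones array replaces the relational join between the Customer and Phone tables above. The "customers" collection name and the index are assumptions for illustration, not part of the original deck.

// Hypothetical collection holding documents shaped like the example above
db.customers.createIndex({ "phones.number": 1 })            // secondary index on the embedded array
db.customers.find({ "phones.number": "1-212-777-1212" })    // one indexed query, no join needed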
16. 3.2 Features Relevant for EDM
• WiredTiger as default storage engine
• In-memory storage engine
• Encryption at rest
• Document Validation Rules
• Compass (data viewer & query builder)
• Connector for BI (Visualization)
• Connector for Hadoop
• Connector for Spark
• $lookup (left outer join)
17. Data Governance with Document Validation
Implement data governance without sacrificing the agility that comes from a dynamic schema:
• Enforce data quality across multiple teams and applications
• Use familiar MongoDB expressions to control document structure
• Validation is optional and can range from a single field to every field, covering existence, data types, and regular expressions
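As a rough illustration of what such a rule can look like, here is a minimal mongo-shell sketch using the query-operator style of validator available since MongoDB 3.2. The collection and field names are assumptions chosen to match the earlier customer example, not part of the original deck.

// Create a collection whose documents must carry a customer_id and a string first_name
db.createCollection("customers", {
  validator: {
    $and: [
      { customer_id: { $exists: true } },
      { first_name: { $type: "string" } }
    ]
  },
  validationLevel: "moderate",   // skip validation on updates to existing non-conforming documents
  validationAction: "error"      // reject inserts/updates that fail the rules
})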
18. MongoDB Compass
For fast schema discovery and visual construction of ad-hoc queries:
• Visualize the schema
– Frequency of fields
– Frequency of types
– Determine validator rules
• View documents
• Graphically build queries
• Authenticated access
19. MongoDB Connector for BI
Visualize and explore multi-dimensional documents using SQL-based BI tools. The connector:
• Provides the BI tool with the schema of the MongoDB collection to be visualized
• Translates SQL statements issued by the BI tool into equivalent MongoDB queries, which are sent to MongoDB for processing
• Converts the results into the tabular format expected by the BI tool, which can then visualize the data based on user requirements
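The deck does not show the connector's actual translation output; the following mongo-shell sketch only illustrates the kind of SQL-to-MongoDB equivalence involved, using a hypothetical customers collection.

// A BI tool might issue SQL such as:
//   SELECT city, COUNT(*) AS customers FROM customers GROUP BY city
// which corresponds, roughly, to this MongoDB aggregation:
db.customers.aggregate([
  { $group: { _id: "$city", customers: { $sum: 1 } } }   // one result row per city
])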
20. Dynamic Lookup
Combine data from multiple collections with left outer joins for richer analytics and more flexibility in data modeling:
• Blend data from multiple sources for analysis
• Higher-performance analytics with less application-side code and less effort from your developers
• Executed via the new $lookup operator, a stage in the MongoDB Aggregation Framework pipeline
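A minimal $lookup sketch in the mongo shell (MongoDB 3.2+). The accounts collection and its customer_id field are hypothetical, chosen only to show the shape of the stage.

db.customers.aggregate([
  { $lookup: {
      from: "accounts",             // other collection in the same database
      localField: "customer_id",    // field in the customers documents
      foreignField: "customer_id",  // field in the accounts documents
      as: "accounts"                // output: array of matching account documents
  }}
])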
21. Aggregation Framework – Pipelined Analysis
Start with the original collection; each record (document) contains a number of shapes (keys), each with a particular color (value):
• $match filters out documents that don’t contain a red diamond
• $project adds a new “square” attribute with a value computed from the value (color) of the snowflake and triangle attributes
• $lookup performs a left outer join with another collection, with the star as the comparison key
• Finally, the $group stage groups the data by the color of the square and produces statistics for each group
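For a concrete version of the same pipeline shape, here is a hedged mongo-shell sketch with hypothetical orders and customers collections and invented field names (status, amount, region); it mirrors the $match → $project → $lookup → $group sequence described above rather than the slide's literal shapes-and-colors data.

db.orders.aggregate([
  { $match:   { status: "active" } },                              // filter, like the red-diamond match
  { $project: { customer_id: 1, region: 1, amount: 1,
                discounted: { $multiply: ["$amount", 0.9] } } },   // add a computed attribute
  { $lookup:  { from: "customers", localField: "customer_id",
                foreignField: "customer_id", as: "customer" } },   // left outer join
  { $group:   { _id: "$region",
                orders: { $sum: 1 },
                totalAmount: { $sum: "$amount" } } }               // group and produce statistics
])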
23. MongoDB Architecture Patterns
1. Operational Data Store (ODS)
2. Enterprise Data Service
3. Datamart/Cache
4. Master Data Distribution
5. Single Operational View
6. Operationalizing Hadoop
(spanning the spectrum from System of Record to System of Engagement)
24. Enterprise Data Management Pipeline
[Diagram repeated: siloed source databases, external feeds (batch), and streams flow via pub-sub, ETL, file imports, and stream processing through Store raw data → Transform → Aggregate → Analyze, serving users and other systems.]
25. How to Choose the Data Management Layer for Each Stage?
[Diagram: each stage of the processing pipeline needs a data-store choice.]
Document database (MongoDB) – when you want:
1. Secondary indexes
2. Sub-second latency
3. Aggregations in the DB
4. Updates of data
Distributed file system (e.g. HDFS) – for:
1. Scanning files
2. When indexes are not needed
Wide column store (e.g. HBase) – for:
1. Primary key queries
2. When multiple indexes & slices are not needed
3. Workloads optimized for writing, not reading
26. MongoDB Hadoop/Spark Connector
MongoDB:
• Sub-second latency
• Expressive querying
• Flexible indexing
• Aggregations in the database
• Great for any subset of data
Distributed processing/analytics (Hadoop/Spark):
• Longer jobs
• Batch analytics
• Append-only files
• Great for scanning all data or large subsets in files
Connectors: MongoDB Hadoop Connector, Spark-mongodb
Both provide:
• Schema-on-read
• Low TCO
• Horizontal scale
27. Data Store for Raw Dataset
[Pipeline diagram repeated; this slide highlights the "Store raw data" stage and its read path into "Transform".]
Store raw data:
- Typically just writing record-by-record from the source data
- Usually just need high write volumes
- All 3 options handle that
Transform read requirements:
- Benefits to reading multiple datasets sorted [by index], e.g. to do a merge
- Might want to look up across tables with indexes (and join functionality in MDB 3.2)
- Want high read performance while writes are happening
- Interactive querying on the raw data could use indexes with MongoDB
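A minimal sketch of this stage in the mongo shell, with a hypothetical raw_events collection and field names: record-by-record writes for landing source data, plus a secondary index so the Transform stage can read each dataset sorted (e.g. merged by customer) while writes continue.

// High-volume, record-by-record landing of source data
db.raw_events.insertMany([
  { source: "crm",    customer_id: 1, event: "address_change", ts: new Date() },
  { source: "mobile", customer_id: 1, event: "login",          ts: new Date() }
])
// Secondary index for sorted reads and interactive queries on the raw data
db.raw_events.createIndex({ customer_id: 1, ts: 1 })
db.raw_events.find({ customer_id: 1 }).sort({ ts: 1 })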
28. Data Store for Transformed Dataset
[Pipeline diagram repeated; this slide highlights the "Transform" and "Aggregate" stages.]
Transform:
- Often benefits to updating data while merging multiple datasets
- Dashboards & reports can have sub-second latency with indexes
Aggregate read requirements:
- Benefits to using indexes for grouping
- Aggregations natively in the DB would help
- With indexes, can do aggregations on slices of data
- Might want to look up across tables with indexes to aggregate
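One way to realize "updating data while merging multiple datasets" is with upserts into a transformed collection. This is a hedged sketch with a hypothetical customers_transformed collection and field names, not the deck's actual implementation.

// Merge a record from one source dataset into the per-customer transformed document
db.customers_transformed.updateOne(
  { customer_id: 1 },                                              // match on the merge key
  { $set:         { city: "San Francisco", last_event_ts: new Date() },
    $setOnInsert: { created_at: new Date() } },                    // only applied on first insert
  { upsert: true }                                                 // insert if the customer is new
)
// An index on the grouping key keeps later aggregations on slices fast
db.customers_transformed.createIndex({ city: 1 })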
29. Data Store for Aggregated Dataset
[Pipeline diagram repeated; this slide highlights the "Aggregate" and "Analyze" stages.]
- Dashboards & reports can have sub-second latency with indexes
Analytics read requirements:
- For scanning all of the data, it could sit in any data store
- Often want to analyze a slice of data (using indexes)
- Querying on slices is best in MongoDB
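Analyzing a slice with an index, rather than scanning everything, might look like this hedged mongo-shell sketch; the customers_aggregated collection and its segment, city, and total_spend fields are hypothetical.

db.customers_aggregated.createIndex({ segment: 1 })                 // index on the slicing key
db.customers_aggregated.aggregate([
  { $match: { segment: "high_value" } },                            // index-assisted slice
  { $group: { _id: "$city", avgSpend: { $avg: "$total_spend" } } }  // statistics per city
])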
30. Data Store for Last Dataset
[Pipeline diagram repeated; this slide highlights the "Analyze" stage and its consumers: users and other systems.]
- At the last step, there are many consuming systems and users
- Need expressive querying with secondary indexes
- MongoDB is the best option for the publication or distribution of analytical results and the operationalization of data
- Dashboards & reports can have sub-second latency with indexes
Users: often digital applications – high scale, expressive querying, JSON preferred
Other systems: often consume via RESTful services and APIs
31. Complete EDM Architecture
[Diagram: siloed source databases, external feeds (batch), and streams enter a data processing pipeline (pub-sub, ETL, file imports, stream processing). Governance determines where to load and process the data. An operational cluster is the optimal location for providing operational response times and slices, serving a Single CSR Application, Unified Digital Apps, and Operational Reporting through drivers & stacks. A Data Lake with distributed processing (customer clustering, churn analysis, predictive analytics, analytic reporting) can run processing on all data or on slices, feeding downstream systems.]
32. Example scenarios
1. Single Customer View
   a. Operational
   b. Analytics on customer segments
   c. Analytics on all customers
2. Customer profiles & clustering
3. Presenting churn analytics on high-value customers
33. Single View of Customer
Large Spanish bank replaces Teradata and Microstrategy to increase business and avoid significant cost
Problem:
- Took days to implement new functionality and business policies, inhibiting revenue growth
- Branches needed an app providing a single view of the customer and real-time recommendations for new products and services
- Multi-minute latency for accessing customer data stored in Teradata and Microstrategy
Solution:
- Built a single view of the customer on MongoDB – a flexible and scalable app, easy to adapt to new business needs
- Super-fast, ad hoc query capabilities (milliseconds) and real-time analytics thanks to MongoDB’s Aggregation Framework
- Can now leverage distributed infrastructure and commodity hardware for lower total cost of ownership and greater availability
Results:
- Cost avoidance of $10M+
- Application developed and deployed in less than 6 months; new business policies easily deployed and executed, bringing new revenue to the company
- Current capacity allows branches to load all customer info instantly, in milliseconds, providing a great customer experience
34. Case Study
Insurance leader generates coveted single view of customers in 90 days – “The Wall”
Problem:
- No single view of the customer, leading to poor customer experience and churn
- 145 years of policy data across 70+ systems and 15+ apps that are not integrated
- Spent 2 years and $25M trying to build a single view with Oracle – and failed
Solution:
- Built “The Wall”, pulling in disparate data and serving a single view to customer service reps in real time
- Flexible data model to aggregate disparate data into a single data store
- Churn analysis done in Hadoop, with the relevant results output to MongoDB
Results:
- Prototyped in 2 weeks
- Deployed to production in 90 days
- Decreased churn and improved ability to upsell/cross-sell
35. Kicking Out Oracle
Top 15 global bank with 48M customers in 50 countries terminates its Oracle ULA and makes MongoDB its database of choice
Problem:
- Slow development cycles due to the RDBMS’ rigid data model, hindering the ability to meet business demands
- High TCO for hardware, licenses, development, and support (>$50M Oracle ULA)
- Poor overall performance of customer-facing and internal applications
Solution:
- Building dozens of apps on MongoDB, both net new and migrations from Oracle – e.g., a significant portion of retail banking, including customer-facing and back-office apps, fraud detection, card activation, and equity research content management
- Flexible data model to develop apps quickly and accommodate diverse data
- Ability to scale infrastructure and costs elastically
Results:
- Able to cancel the Oracle ULA; evaluating which apps can be migrated to MongoDB; for new apps, MongoDB is the default choice
- Apps built in weeks instead of months or years, e.g., an ebanking app prototyped in 2 weeks and in production in 4 weeks
- 70% TCO reduction
Editor's Notes
Stream processing is often a separate processing layer from batch processing, but its output can be stored in the data stores at various stages
Could make more visual
Kernel 3.2 Scope tracking: https://docs.google.com/spreadsheets/d/1L1EbbWoshUIHXBzCh5e3sALtAFxm_dJ52SRPR6GzeAY/edit#gid=0
Release notes for 3.1.6: http://docs.mongodb.org/manual/release-notes/3.1-dev-series/
Determine validator rules: you can use the tool to figure out what you want to set as validation rules
$lookup – this creates new documents that contain everything from the previous stage, augmented with data from any document in the second collection containing a matching colored star (i.e., the blue and yellow stars had matching lookup values, whereas the red star had none)
In terms of reporting, a number of Business Intelligence (BI) vendors have developed connectors to integrate MongoDB as a data source with their suites, alongside traditional relational databases. This integration provides reporting, visualization, and dashboarding of MongoDB data
Stream processing is often a separate processing layer from batch processing, but its output can be stored in the data stores at various stages
Just a logical diagram: processing could run on the same physical servers as the storage nodes to minimize data movement