The document discusses unlocking operational intelligence from data lakes using MongoDB. It begins by describing how digital transformation is driving changes in data volume, velocity, and variety. It then discusses how MongoDB can help operationalize data lakes by providing real-time access and analytics on data stored in data lakes, while also integrating batch processing capabilities. The document provides an example reference architecture of how MongoDB can be used with a data lake (Hadoop) and stream processing framework (Kafka) to power operational applications and machine learning models with both real-time and batch data and analytics.
Webinar: 10-Step Guide to Creating a Single View of your Business (MongoDB)
Organizations have long seen the value in aggregating data from multiple systems into a single, holistic, real-time representation of a business entity. That entity is often a customer. But the benefits of a single view in enhancing business visibility and operational intelligence can apply equally to other business contexts. Think products, supply chains, industrial machinery, cities, financial asset classes, and many more.
However, for many organizations, delivering a single view to the business has been elusive, impeded by a combination of technology and governance limitations.
MongoDB has been used in many single view projects across enterprises of all sizes and industries. In this session, we will share the best practices we have observed and institutionalized over the years. By attending the webinar, you will learn:
- A repeatable, 10-step methodology for successfully delivering a single view
- The required technology capabilities and tools to accelerate project delivery
- Case studies from customers who have built transformational single view applications on MongoDB.
Webinar: Data Streaming with Apache Kafka & MongoDB (MongoDB)
A new generation of technologies is needed to consume and exploit today's real-time, fast-moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
Webinar: Simplifying the Database Experience with MongoDB Atlas (MongoDB)
MongoDB Atlas is our database as a service for MongoDB. In this webinar you’ll learn how it provides all of the features of MongoDB, without all of the operational heavy lifting, and all through a pay-as-you-go model billed on an hourly basis.
MongoDB and RDBMS: Using Polyglot Persistence at Equifax (MongoDB)
MongoDB and RDBMS: Using Polyglot Persistence at Equifax. Presented by Michael Lawrence, Pariveda Solutions on behalf of Equifax at MongoDB Evenings Atlanta on September 24, 2015.
Presented by Rob Walters, Solutions Architect, MongoDB, at MongoDB Evenings New England 2017.
MongoDB 3.6 is the latest version of the world's most popular document database. In this session we will cover the key themes of the release including speed to develop, speed to production and speed to insight. Learn about the key features that support these themes and how you can start leveraging them today!
MongoDB 3.6 helps you *move at the speed of your data* - turning developers, operations teams, and analysts into a growth engine for the business. It enables new apps to be delivered to market faster, running reliably and securely at scale, and unlocking insights and intelligence in real time. Learn more: https://www.mongodb.com/mongodb-3.6
Webinar: An Enterprise Architect’s View of MongoDB (MongoDB)
In the world of big data, legacy modernization, siloed organizations, empowered customers, and mobile devices, making informed choices about your enterprise infrastructure has become more important than ever. The alternatives are abundant, and the successful Enterprise Architect must constantly discern which new technology is just a shiny object and which will add true business value.
MongoDB is more than just a great application database for developers; it gives Enterprise Architects new capabilities to solve previously difficult architectural requirements much more easily. Take, for example, the challenge of 70 siloed systems at MetLife: with MongoDB, the MetLife team was able to provide a single view into those 70 systems in only 3 months.
In this webinar, we will:
- Explore real-life challenges enterprises face, with case studies of their solutions
- Consider how best to introduce MongoDB in the enterprise
- Give an overview of how to optimize the use of MongoDB
During this presentation, Infusion and MongoDB shared their mainframe optimization experiences and best practices. These have been gained from working with a variety of organizations, including a case study from one of the world’s largest banks. MongoDB and Infusion bring a tested approach that provides a new way of modernizing mainframe applications, while keeping pace with the demand for new digital services.
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB (MongoDB)
This session will be a case study of eBay’s experience running MongoDB for project Zoom, in which eBay stores all media metadata for the site. This includes references to pictures of every item for sale on eBay. This cluster is eBay's first MongoDB installation on the platform and is a mission critical application. Yuri Finkelstein, an Enterprise Architect on the team, will provide a technical overview of the project and its underlying architecture.
Presented by Claudius Li, Solutions Architect at MongoDB, at MongoDB Evenings New England 2017.
MongoDB Atlas is the premier database-as-a-service offering. Find out how MongoDB Atlas can help your team deploy more easily, develop faster, and manage deployments, maintenance, upgrades, and expansions with less effort. We will also demonstrate some of the key features and tools that come with MongoDB Atlas.
The importance of efficient data management for Digital Transformation (MongoDB)
Digital Transformation has developed from hype into a “standard” tool for businesses that need to modernise and compete. Under pressure from new market entrants, incumbents are challenged on a daily basis to redefine their ways of doing business. This doesn’t only include people and processes, but of course also the underlying technology. With data being the force behind the most successful transformation stories of the past years, we explore some of the challenges of legacy Information Management Systems and look at new ways of managing Data in Motion, Data at Rest, and Data in Use to drive a successful Digital Transformation programme and gain a competitive advantage.
MongoDB Europe 2016 - Choosing Between 100 Billion Travel Options – Instant Search (MongoDB)
Travellers are demanding more exhaustive, accurate, and relevant results when they search for flights, and they want these results instantly – even when there can be 100 billion travel options for a single trip. Amadeus’s “Instant Search” feature was built to meet those requirements. These searches are not trivial. Several terabytes of constantly evolving data are needed to reply instantly to questions like, “I live in Frankfurt, where can I go this weekend for €200?” or “What’s the cheapest and most convenient flight for a MongoDB Europe attendee?” This technical session will show you how Amadeus integrated MongoDB into its system, and how it allowed us to handle huge numbers of updates and searches in a high-volume system, to deliver the next generation of flight search products. It will cover topics such as how we discovered which extra indexes were needed, how we were able to get the balancer to meet our needs, and how we modelled our data for optimal performance.
Agile software development is becoming the de facto way of building software these days. More and more enterprises, from large Fortune 500 companies to small start-ups, are adopting agile development methodologies. But agile software development is more than just a methodology or a practice. It's also a combined set of tools and platforms at our disposal today that allow us to iterate faster, get to market sooner, and also fail faster. This set of tools augments our development cycles by a few orders of magnitude and allows developers to be much more productive.
AWS is an incredibly popular environment for running MongoDB deployments. Today you have many choices about instance type, storage, network config, security, how you configure MongoDB processes, and more. In addition, you now have options when it comes to tooling to help you manage and operate your deployment. In this session, we’ll take a look at several recommendations that can help you get the best performance out of AWS.
Presented by Radu Craioveanu, Director of Software Development, Clinical Systems, Fresenius Medical Care at MongoDB Evenings New England 2017.
Fresenius is a large healthcare enterprise, specializing in dialysis care. Fresenius' 40,000 clinicians and physicians deliver over 100,000 dialysis treatments per day, across 8 time zones. There is significant pressure to improve treatment outcomes, lower costs, expand patient coverage, and overall become a Value Based Services provider, sharing the risk with the payers, the insurance companies. This pressure requires Fresenius to adapt, change, and leverage technologies and processes that can enable a rapid transformation to Value Based Care.
Using technologies and partnerships with players such as MongoDB, Red Hat, and others with a similarly open-source, innovative approach to progress, Fresenius has been able to implement a healthcare platform that is the foundation onto which the business can transform itself.
MongoDB has enabled Fresenius to achieve high availability of systems across multiple data centers, a data lake concept used for predictive analytics and reporting, enhanced messaging capabilities, fast, effective, and distributed archiving, rapid application development via MEAN stacks, and ready-to-use persistence for Red Hat OpenShift Docker containers.
MongoDB and Our Journey from Old, Slow and Monolithic to Fast and Agile Microservices (MongoDB)
Jeremiah Ivan, VP of Engineering, Merrill Corporation
In the span of 12 months Merrill was able to move from a monolithic and hard-to-change architecture to a fast-moving, agile development platform, enabled by the MongoDB database. We’ll talk about the technology, people, and process changes involved in the transformation. We hope that participants in this session will come away with the bits and pieces of a recipe for success that they can apply to their environment.
Webinar: Faster Big Data Analytics with MongoDB (MongoDB)
Learn how to leverage MongoDB and Big Data technologies to derive rich business insight and build high performance business intelligence platforms. This presentation includes:
- Uncovering Opportunities with Big Data analytics
- Challenges of real-time data processing
- Best practices for performance optimization
- Real world case study
This presentation was given in partnership with CIGNEX Datamatics.
Webinar: Schema Patterns and Your Storage Engine (MongoDB)
How do MongoDB’s different storage options change the way you model your data?
Each storage engine (WiredTiger, the In-Memory Storage Engine, MMAPv1, and other community-supported engines) persists data differently, writes data to disk in different formats, and handles memory resources in different ways.
This webinar will go through how to design applications around different storage engines based on your use case and data access patterns. We will look into concrete examples of schema design practices that were previously applied on MMAPv1 and whether those practices still apply to other storage engines like WiredTiger.
Topics for review: Schema design patterns and strategies, real-world examples, sizing and resource allocation of infrastructure.
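One place where the storage engine choice surfaces directly in collection design is per-collection storage options. Here is a minimal sketch, not taken from the webinar, of setting WiredTiger options at creation time; the collection name and compressor choice are hypothetical.

```python
# A minimal sketch, assuming pymongo against a WiredTiger-backed deployment.
# Extra keyword arguments to create_collection are forwarded to the server's
# "create" command, which accepts per-collection storage engine options.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["app"]

# Hypothetical collection tuned for large, compressible event documents.
db.create_collection(
    "events",
    storageEngine={"wiredTiger": {"configString": "block_compressor=zlib"}},
)
```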
Development time is wasted as the bulk of the work shifts from adding business features to struggling with the RDBMS. MongoDB, the leading NoSQL database, offers a flexible and scalable solution.
In this presentation we will discuss what the state of the construction economy and employment will look like in 2017. Learn what your business can do about it and how to come up with a game plan.
This talk will describe the changes which went into MongoDB 3.0 in order to allow storage engines to achieve their maximum concurrency potential. In MongoDB 3.0, concurrency control has been separated into two levels: top-level, which protects the database catalog, and storage engine-level, which allows each individual storage engine implementation to manage its own concurrency. We will start from the top and introduce the concept of multi-granularity locking and how it protects the database catalog. We will then explain how the MongoDB lock manager works and how it allows storage engines to manage their own concurrency control without imposing any additional overhead.
Learn how to build new classes of sophisticated, real-time analytics by combining Apache Spark, the industry's leading data processing engine, with MongoDB, the industry’s fastest growing database.
We live in a world of “big data.” But it isn’t just the data itself that is valuable – it’s the insight it can generate. How quickly an organization can unlock and act on that insight has become a major source of competitive advantage. Collecting data in operational systems and then relying on nightly batch extract, transform, load (ETL) processes to update the enterprise data warehouse (EDW) is no longer sufficient.
In this live session, we show you how MongoDB and Spark work together and provide examples using the new Spark Connector for MongoDB.
This session was sponsored by Stratio & Paradigma.
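For a flavor of what the connector usage looks like, here is a minimal sketch, not taken from the session, of reading a collection into a Spark DataFrame and writing an aggregate back to MongoDB. The URIs and field names are hypothetical, and the option names follow the 10.x connector; earlier connector versions use different source names and configuration keys.

```python
# A minimal sketch, assuming the MongoDB Spark Connector 10.x is on the
# classpath (e.g. --packages org.mongodb.spark:mongo-spark-connector_2.12:10.1.1).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("mongodb-spark-sketch")
    .config("spark.mongodb.read.connection.uri", "mongodb://localhost/shop.orders")
    .config("spark.mongodb.write.connection.uri", "mongodb://localhost/shop.order_totals")
    .getOrCreate()
)

# Read a collection as a DataFrame; the schema is inferred by sampling.
orders = spark.read.format("mongodb").load()

# Aggregate in Spark, then persist the result back to MongoDB, where
# operational applications can query it with millisecond latency.
totals = orders.groupBy("customerId").sum("amount")
totals.write.format("mongodb").mode("append").save()
```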
Cost-based Query Optimization in Apache Phoenix using Apache Calcite (Julian Hyde)
This talk, given by Maryann Xue and Julian Hyde at Hadoop Summit, San Jose on June 30th, 2016, describes how we re-engineered Apache Phoenix with a cost-based optimizer based on Apache Calcite.
Apache Phoenix has rapidly become a workhorse in many organizations, providing a convenient standard SQL interface to HBase suitable for a wide variety of workloads from transactions to ETL and analytics. But Phoenix's initial query optimizer was based on static optimization procedures and thus could not choose between several potential plans or indices based on cost metrics.
We describe how we rebuilt Phoenix's parser and query optimizer using the Calcite framework, improving Phoenix's performance and SQL compliance. The new architecture uses relational algebra as an intermediate language, and this enables you to switch in other engines, especially those also based on Calcite. As an example of this, we demonstrate querying a Phoenix database via Apache Drill.
Big Data Analytics for Real-time Operational Intelligence with Your z/OS Data (Precisely)
Big Iron z/OS systems produce an enormous amount of operational data, but the challenge for the past few decades has been how to go beyond basic performance and availability management and extract the information that can provide IT operational intelligence. You need analytical insight into z/OS operations, security data, and service delivery in real time for the success of your business.
Watch this webcast to learn:
- Challenges that have inhibited z/OS analytics, and how to overcome them by forwarding critical IBM z/OS mainframe data to Splunk Enterprise for analysis
- How to gain better insights into security threats on z/OS and across your enterprise
- How to leverage Splunk IT Service Intelligence to monitor critical business services reliant on z/OS critical components like CICS and DB2
Webinar: Data Streaming with Apache Kafka & MongoDB (MongoDB)
A new generation of technologies is needed to consume and exploit today's real-time, fast-moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.
This webinar explores the use-cases and architecture for Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Watch the webinar to learn:
- What MongoDB is and where it's used
- What data streaming is and where it fits into modern data architectures
- How Kafka works, what it delivers, and where it's used
- How to operationalize the Data Lake with MongoDB & Kafka
- How MongoDB integrates with Kafka – both as a producer and a consumer of event data
The webinar is co-presented with Confluent, the company founded by the creators of Apache Kafka.
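To make the consumer side of that integration concrete, here is a minimal sketch, not taken from the webinar, of draining a Kafka topic into a MongoDB collection. The topic, URI, and field names are hypothetical; it assumes the kafka-python and pymongo packages, and a production deployment would more likely use a Kafka Connect sink.

```python
# A minimal sketch: MongoDB acting as a consumer of Kafka event data.
import json

from kafka import KafkaConsumer
from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["datalake"]["processed_events"]

consumer = KafkaConsumer(
    "clickstream",                       # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    # Key on the event id so re-reading the topic stays idempotent.
    events.replace_one({"_id": message.value["eventId"]}, message.value, upsert=True)
```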
The term "Data Lake" has become almost as overused and undescriptive as "Big Data". Many believe that centralizing datasets in HDFS makes a data lake, but then they struggle to realize any tangible value. This talk will redefine the "Data Lake" by describing four specific, key characteristics that we at Koverse have learned are crucial to successful enterprise data lake deployments. These characteristics are 1) indexing and search across all data sets, 2) interactive access for all users in the enterprise, 3) multi-level access control, and 4) integration with data science tools. These characteristics define a system that lets people realize value from their data versus getting lost in the hype. The talk will go on to provide a technical description of how we have integrated several projects, namely Apache Accumulo, Hadoop, and Spark, to implement an enterprise data lake with these key features.
Enterprises are rapidly adopting the data lake as a key component of their data architecture. Big Data ecosystem technologies gave rise to this concept, which enables company-wide storage of data in a single platform, whatever its format, nature, or origin. However, building such a platform is not without pitfalls. Poorly organized and poorly governed, this centralized repository will turn into a useless “data swamp” that cannot create value from your company's data. How do you automate the preparation, consolidation, and delivery of your data to data analysts, data scientists, and third-party systems? How do you ensure data governance within a data lake? How do you ensure your data lake complies with your data security and confidentiality requirements? We will address these questions and propose solutions you can put in place to make your data lake a success.
Webinar: MongoDB Schema Design and Performance Implications (MongoDB)
In this session, you will learn how to translate one-to-one, one-to-many and many-to-many relationships, and learn how MongoDB's JSON structures, atomic updates and rich indexes can influence your design. We will also explore implications of storage engines, indexing and query patterns, available tools and related new features in MongoDB 3.2.
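As an illustration of the trade-off, a one-to-many relationship can be embedded or referenced, and the right choice depends on cardinality and access patterns. The sketch below, with hypothetical collection and field names and using pymongo, shows both options; it is not code from the webinar.

```python
# A minimal sketch contrasting two ways to model one-to-many in MongoDB.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["blog"]

# Option 1: embed the "many" side. One atomic document per post;
# comments are read, and can be updated atomically, with their post.
db.posts.insert_one({
    "_id": "post-1",
    "title": "Schema design",
    "comments": [
        {"author": "ada", "text": "Nice overview"},
        {"author": "alan", "text": "Agreed"},
    ],
})

# Option 2: reference. Better when the "many" side is unbounded or
# queried independently; an index keeps the lookup fast.
db.comments.create_index("post_id")
db.comments.insert_one({"post_id": "post-1", "author": "ada", "text": "Nice overview"})
```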
Big Data is the reality of modern business: from big companies to small ones, everybody is trying to find their own benefit. Big Data technologies are not meant to replace traditional ones, but to complement them. In this presentation you will hear what Big Data and a data lake are, and which technologies are most popular in the Big Data world. We will also speak about Hadoop and Spark, how they integrate with traditional systems, and their benefits.
Creating a Modern Data Architecture for Digital Transformation (MongoDB)
By managing Data in Motion, Data at Rest, and Data in Use differently, modern Information Management Solutions are enabling a whole range of architecture and design patterns that allow enterprises to fully harness the value in data flowing through their systems. In this session we explored some of the patterns (e.g. operational data lakes, CQRS, microservices and containerisation) that enable CIOs, CDOs and senior architects to tame the data challenge, and start to use data as a cross-enterprise asset.
Big Data Paris - A Modern Enterprise Architecture (MongoDB)
Since the 1980s, the volume of data produced, and the risk associated with that data, have literally exploded. 90% of the data in existence today was created in the last 2 years, and 80% of it is unstructured. With more users and the need for always-on availability, the risks are much higher.
Which database characteristics should a decision-maker take into account when deploying innovative applications?
In this slidedeck, Infochimps Director of Product, Tim Gasper, discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days, sometimes in just hours. Tim explains how Infochimps is now taking that same aggressive approach to deliver faster time to value by helping customers develop analytic applications with impeccable speed.
Using real-time big data analytics for competitive advantage (Amazon Web Services)
Many organisations find it challenging to successfully perform real-time data analytics using their own on premise IT infrastructure. Building a system that can adapt and scale rapidly to handle dramatic increases in transaction loads can potentially be quite a costly and time consuming exercise.
Most of the time, infrastructure is under-utilised and it’s near impossible for organisations to forecast the amount of computing power they will need in the future to serve their customers and suppliers.
To overcome these challenges, organisations can instead utilise the cloud to support their real-time data analytics activities. Scalable, agile and secure, cloud-based infrastructure enables organisations to quickly spin up infrastructure to support their data analytics projects exactly when it is needed. Importantly, they can ‘switch off’ infrastructure when it is not.
BluePi Consulting and Amazon Web Services (AWS) are giving you the opportunity to discover how organisations are using real time data analytics to gain new insights from their information to improve the customer experience and drive competitive advantage.
The next-generation enterprise-class architecture - Massimo Brignoli (Data Driven Innovation)
The rise of data lakes - Companies today are drowning in data, and the classic data warehouse struggles to churn through it, given its volume and variety. Many have started looking at architectures called data lakes, with Hadoop as the reference technology. But is this solution right for everything? Come learn how to operationalize data lakes to build modern data management architectures.
Transform your DBMS to drive engagement innovation with Big Data (Ashnikbiz)
Erik Baardse and Ajit Gadge from EDB Postgres presented on how to transform your DBMS in order to drive digital business: how Postgres enables you to support a wider range of workloads with your relational database, which opens the Big Data doors. They also cover EnterpriseDB’s strategy around Big Data, which focuses on 3 areas, and finally how to find money in IT with Big Data and digital transformation.
Data Streaming with Apache Kafka & MongoDB (confluent)
Explore the use-cases and architecture for Apache Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.
Choosing technologies for a big data solution in the cloud (James Serra)
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
Businesses are generating more data than ever before.
Doing real time data analytics requires IT infrastructure that often needs to be scaled up quickly and running an on-premise environment in this setting has its limitations.
Organisations often require a massive amount of IT resources to analyse their data and the upfront capital cost can deter them from embarking on these projects.
What’s needed is scalable, agile and secure cloud-based infrastructure at the lowest possible cost so they can spin up servers that support their data analysis projects exactly when they are required. This infrastructure must enable them to create proof-of-concepts quickly and cheaply – to fail fast and move on.
The Common BI/Big Data Challenges and Solutions presented by seasoned experts, Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture).
This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session.
Similar to Unlocking Operational Intelligence from the Data Lake
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
- Common components of an IoT solution
- The challenges involved with managing time-series data in IoT applications
- Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance
- How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
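One widely used schema-design best practice for time-series workloads is bucketing: grouping many readings into one document to control document count and index size. Below is a minimal sketch of the pattern under assumed names (sensor ids, one-hour buckets), using pymongo; it illustrates the general technique, not code from this talk.

```python
# A minimal sketch of the "bucketing" pattern for time-series data.
import datetime

from pymongo import MongoClient

readings = MongoClient("mongodb://localhost:27017")["iot"]["readings"]

def record(sensor_id: str, value: float, ts: datetime.datetime) -> None:
    # One document per sensor per hour; each new sample is pushed into
    # the bucket, and summary fields are maintained incrementally.
    hour = ts.replace(minute=0, second=0, microsecond=0)
    readings.update_one(
        {"sensor_id": sensor_id, "hour": hour},
        {
            "$push": {"samples": {"ts": ts, "value": value}},
            "$inc": {"count": 1},
            "$min": {"min": value},
            "$max": {"max": value},
        },
        upsert=True,
    )

record("sensor-42", 21.7, datetime.datetime.utcnow())
```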
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
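For orientation, here is a minimal sketch of explicit client-side encryption with pymongo's ClientEncryption helper, using a throwaway local master key. The key management, algorithm choice, and collection names are simplified assumptions for illustration, not the session's own code.

```python
# A minimal sketch of explicit Client Side Encryption with pymongo
# (requires the encryption extra: pip install "pymongo[encryption]").
import os

from bson.binary import STANDARD
from bson.codec_options import CodecOptions
from pymongo import MongoClient
from pymongo.encryption import Algorithm, ClientEncryption

client = MongoClient("mongodb://localhost:27017")

# A throwaway 96-byte local master key; production would use a KMS.
kms_providers = {"local": {"key": os.urandom(96)}}

client_encryption = ClientEncryption(
    kms_providers,
    "encryption.__keyVault",     # namespace of the key vault collection
    client,
    CodecOptions(uuid_representation=STANDARD),
)
key_id = client_encryption.create_data_key("local")

# Deterministic encryption keeps equality queries possible on the field.
encrypted_ssn = client_encryption.encrypt(
    "123-45-6789",
    Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
    key_id=key_id,
)
client["hr"]["people"].insert_one({"name": "ada", "ssn": encrypted_ssn})
```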
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any... (MongoDB)
The MongoDB Kubernetes operator is ready for prime time. Learn about how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart (MongoDB)
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexing (MongoDB)
Query performance should be the unsung hero of an application, but without proper configuration it can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices for building multikey indexes.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ (MongoDB)
The aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power, and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups, and materialized views.
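As one concrete example of the 4.2 additions mentioned here, the sketch below (hypothetical collections and fields, using pymongo) rolls orders up by customer and materializes the result into an existing collection with $merge.

```python
# A minimal sketch of a roll-up materialized with $merge (MongoDB 4.2+).
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

db.orders.aggregate([
    {"$match": {"status": "complete"}},
    {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
    # Write the roll-up into an existing collection, replacing matched
    # documents -- effectively an incrementally refreshed materialized view.
    {"$merge": {"into": "customer_totals", "whenMatched": "replace"}},
])
```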
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang (MongoDB)
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app... (MongoDB)
...to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB: When Machine Learning... (MongoDB)
It has never been easier to order online and be delivered in under 48 hours, very often for free. This ease of use hides a complex market worth more than $8 trillion.
Data is well known in the supply chain world (routes, information about goods, customs, ...), but the value of this operational data remains largely untapped. By combining industry expertise with data science, Upply is redefining the fundamentals of the supply chain, enabling each player to overcome the volatility and inefficiency of the market.
Slide 2: The World is Changing
Digital Natives & Digital Transformation
• Volume, Velocity, Variety
• Iterative, Agile, Short Cycles
• Always On, Secure, Global
• Open-Source, Cloud, Commodity
• Data, Time, Risk, Cost
Slide 6: “Big Data” is More than Just Hadoop
• 24% CAGR: Hadoop, Spark & Streaming
• 18% CAGR: Databases
• Databases are key components within the big data landscape
Slide 9: How to Avoid Being in the 70%?
1. Unify data lake analytics with the operational applications
2. Create smart, contextually aware, data-driven apps & insights
3. Integrate a database layer with the data lake
Slide 10: MongoDB & Hadoop: What’s Common
Distributed Processing & Analytics. Common attributes:
• Schema-on-read
• Multiple replicas
• Horizontal scale
• High throughput
• Low TCO
11. 11
MongoDB & Hadoop: What's Different
Distributed Processing & Analytics
HDFS:
• Data stored as large files (64MB–128MB blocks); no indexes
• Write-once-read-many, append-only storage model
• Designed for high-throughput scans across TB/PB of data
• Multi-minute latency
Common Attributes:
• Schema-on-read
• Multiple replicas
• Horizontal scale
• High throughput
• Low TCO
12. 12
MongoDB & Hadoop: What's Different
Distributed Processing & Analytics
MongoDB:
• Random access to subsets of data
• Millisecond latency
• Expressive querying, rich aggregations & flexible indexing
• Updates fast-changing data in place, avoiding re-writing / re-computing the entire data set
HDFS:
• Data stored as large files (64MB–128MB blocks); no indexes
• Write-once-read-many, append-only storage model
• Designed for high-throughput scans across TB/PB of data
• Multi-minute latency
Common Attributes:
• Schema-on-read
• Multiple replicas
• Horizontal scale
• High throughput
• Low TCO
13. 13
Bringing it Together
Online services powered by MongoDB:
• User account & personalization
• Product catalog
• Session management & shopping cart
• Recommendations
Back-end machine learning powered by Hadoop:
• Customer classification & clustering
• Basket analysis
• Brand sentiment
• Price optimization
Data flows between the two sides via the MongoDB Connector for Hadoop.
14. Design Pattern: Operationalized Data Lake
[Architecture diagram] Sources – Sensors, User Data, Clickstreams, Logs – feed a message queue, which routes raw data into HDFS and processed events into MongoDB. Distributed processing frameworks generate batch views against the lake (Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics), and MongoDB serves them to the operational apps: Customer Data Mgmt, Mobile App, IoT App, Live Dashboards.
Real-Time Access (MongoDB): millisecond latency; expressive querying & flexible indexing against subsets of data; updates in place; in-database aggregations & transformations.
Batch Processing, Batch Views (HDFS): multi-minute latency with scans across TB/PB of data; no indexes; data stored in 128MB blocks; write-once-read-many & append-only storage model.
15. Design Pattern: Operationalized Data Lake (same diagram)
Callout: configure where to land incoming data – the message queue routes raw data into HDFS, while events needed immediately by the operational apps are routed to MongoDB. A minimal routing sketch follows.
16. Design Pattern: Operationalized Data Lake (same diagram)
Callout: raw data is processed to generate analytics models – Hadoop jobs (MapReduce or Spark) run against the raw data in the lake to produce the batch views above.
17. Design Pattern: Operationalized Data Lake (same diagram)
Callout: MongoDB exposes the analytics models to operational apps, serving indexed queries and updates against them with real-time latency. A serving sketch follows.
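As a hedged illustration of that serving layer (collection and field names are assumptions, not from the deck), a profile computed in the lake can be upserted into MongoDB and then served back with indexed, millisecond-class lookups:

# Hypothetical sketch: publish a batch-computed model and serve it in real time.
# Collection and field names are illustrative assumptions.
from pymongo import MongoClient, ASCENDING

profiles = MongoClient("mongodb://mongo:27017").lake.customer_profiles
profiles.create_index([("customer_id", ASCENDING)], unique=True)

# Batch output flows in as an idempotent, in-place upsert per customer,
# with no need to rewrite the rest of the data set.
profiles.update_one(
    {"customer_id": 12345},
    {"$set": {"churn_risk": 0.82, "segment": "frequent-flyer"}},
    upsert=True,
)

# The operational app reads the model with an indexed point lookup.
profile = profiles.find_one({"customer_id": 12345})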
18. Design Pattern: Operationalized Data Lake (same diagram)
Callout: the processing frameworks compute new models against data in both MongoDB & HDFS, continuously flowing updates from the operational database back into the analytics models. A Spark sketch follows.
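A minimal PySpark sketch of that recompute loop, assuming the MongoDB Connector for Spark is on the classpath; the URIs, collection names and toy churn score are illustrative, not from the deck:

# Hypothetical sketch: join operational data in MongoDB with raw data in HDFS,
# recompute a toy churn score, and write it back. Assumes the MongoDB Connector
# for Spark is available; URIs and names are illustrative.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder.appName("recompute-models")
         .config("spark.mongodb.input.uri", "mongodb://mongo/lake.customer_profiles")
         .config("spark.mongodb.output.uri", "mongodb://mongo/lake.churn_scores")
         .getOrCreate())

# Live operational data, read through the connector.
profiles = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()

# Historical clickstream data from the lake.
clicks = spark.read.json("hdfs://namenode/lake/raw/events.jsonl")

# Toy model: churn risk grows with days since the customer was last seen.
last_seen = clicks.groupBy("customer_id").agg(F.max("ts").alias("last_seen"))
scores = last_seen.withColumn(
    "churn_risk",
    F.least(F.datediff(F.current_date(), F.col("last_seen")) / F.lit(365.0), F.lit(1.0)),
)

# Flow the recomputed model back to the operational database.
(profiles.join(scores, "customer_id")
 .select("customer_id", "churn_risk")
 .write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save())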
19. 19
Operational Database Requirements
1 “Smart” integration with the data lake
2 Powerful real-time analytics
3 Flexible, governed data model
4 Scale with the data lake
5 Sophisticated management & security
21. 21
Query and Data Model
MongoDB vs. Relational vs. Column Family (e.g. HBase):
• Rich query language & secondary indexes – MongoDB: Yes; Relational: Yes; Column family: requires integration with a separate Spark/Hadoop cluster
• In-database aggregations & search – MongoDB: Yes; Relational: Yes; Column family: requires integration with a separate Spark/Hadoop cluster
• Dynamic schema – MongoDB: Yes; Relational: No; Column family: Partial
• Data validation – MongoDB: Yes; Relational: Yes; Column family: app-side code
• Why it matters
– Query & aggregations: rich, real-time analytics against operational data
– Dynamic schema: manage multi-structured data
– Data validation: enforce data governance between the data lake & operational apps (see the sketch below)
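As a hedged sketch of what that validation looks like in practice (collection name and rules are illustrative assumptions), MongoDB document validation can enforce that every profile landed from the lake carries a typed customer ID and a plausible email address:

# Hypothetical sketch: enforce governance on documents arriving from the lake.
# Collection name and rules are illustrative assumptions.
from pymongo import MongoClient

db = MongoClient("mongodb://mongo:27017").lake
db.create_collection(
    "validated_profiles",
    validator={
        "customer_id": {"$type": "int"},  # must be present and an integer
        "email": {"$regex": "@"},         # must contain an '@'
    },
)

# Inserts violating the rules are rejected by the database, not by app-side code.
db.validated_profiles.insert_one({"customer_id": 12345, "email": "paul@example.com"})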
22. 22
Data Lake Integration
MongoDB vs. Relational vs. Column Family (e.g. HBase):
• Hadoop + secondary indexes – MongoDB: Yes; Relational: yes, but expensive; Column family: no secondary indexes
• Spark + secondary indexes – MongoDB: Yes; Relational: yes, but expensive; Column family: no secondary indexes
• Native BI connectivity – MongoDB: Yes; Relational: Yes; Column family: 3rd-party connectors
• Workload isolation – MongoDB: Yes; Relational: yes, but expensive; Column family: load data into a separate Spark/Hadoop cluster
• Why it matters
– Hadoop + Spark: efficient data movement between the data lake, processing layer & database (see the sketch below)
– Native BI connectivity: visualizing operational data
– Workload isolation: separation between operational and analytical workloads
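To illustrate why secondary-index integration matters, a hedged sketch (connector class, URI and filter are assumptions): the Spark connector can be handed an aggregation pipeline, so MongoDB filters with its indexes before any data crosses the network:

# Hypothetical sketch: extract only London customers into Spark, letting MongoDB's
# secondary index on `city` do the filtering. All names are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("selective-extract")
         .config("spark.mongodb.input.uri", "mongodb://mongo/lake.customer_profiles")
         .getOrCreate())

london = (spark.read.format("com.mongodb.spark.sql.DefaultSource")
          # The $match runs inside MongoDB, instead of shipping the whole
          # collection to Spark and filtering there.
          .option("pipeline", '[{"$match": {"city": "London"}}]')
          .load())

london.show()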
23. 23
Operationalizing for Scale & Security
MongoDB vs. Relational vs. Column Family (e.g. HBase):
• Robust security controls – MongoDB: Yes; Relational: Yes; Column family: Yes
• Scale-out on commodity hardware – MongoDB: Yes; Relational: No; Column family: Yes
• Sophisticated management platform – MongoDB: Yes; Relational: Yes; Column family: monitoring only
• Why it matters
– Security: data protection for regulatory compliance
– Scale-out: grow with the data lake
– Management: reduce TCO with platform automation, monitoring, disaster recovery
27. 27
UK's Leading Price Comparison Site
Out-pacing internet search giants with a continuous delivery pipeline powered by microservices & Docker running MongoDB, Kafka and Hadoop in the cloud
Problem:
• Existing EDW with nightly batch loads
• No real-time analytics to personalize the user experience
• Application changes broke the ETL pipeline
• Unable to scale as services expanded
Solution:
• Microservices architecture running on AWS
• All application events written to a Kafka queue, routed to MongoDB and Hadoop
• Events that personalize the real-time experience (e.g. triggering an email send, additional questions, offers) written to MongoDB
• All event data aggregated with other data sources and analyzed in Hadoop; updated customer profiles written back to MongoDB
Results:
• 2x faster delivery of new services after migrating to the new architecture
• Enabled continuous delivery: pushing new features every day
• Personalized user experience, plus higher uptime and scalability
28. 28
Leading Global Airline
Customer data management: single view and real-time analytics with MongoDB, Spark & Hadoop
Problem:
• Customer data scattered across 100+ different systems
• Poor customer experience: no personalization, no consistent experience across brands or devices
• No way to analyze customer behavior to deliver targeted offers
Solution:
• Selected MongoDB over HBase for schema flexibility and rich query support
• MongoDB stores all customer profiles, served to web, mobile & call-center apps
• Distributed across multiple regions for DR and data locality
• All customer interactions stored in MongoDB, loaded into Hadoop for customer segmentation
• Unified processing pipeline with Spark running across MongoDB and Hadoop
Results:
• Single profile created for each customer, personalizing the experience in real time
• Revenue optimization by calculating the best ticket prices
• Reduced competitive pressure by identifying gaps in product offerings
29. 29
World's Most Sophisticated Traveler Safety Platform
Analyzing PBs of data with MongoDB, Hadoop, Apache NiFi & SAP HANA
Problem:
• Commercialize a national security platform
• Massive volumes of multi-structured data: news, RSS & social feeds, geospatial, geological, health & crime stats
• Requires complex analysis, delivered in real time, always on
Solution:
• Apache NiFi for data ingestion, routing & metadata management
• Hadoop for text analytics
• SAP HANA for geospatial analytics
• MongoDB correlates analytics with user profiles & location data to deliver real-time alerts to corporate security teams & individual travelers
Results:
• Enables Prescient to uniquely blend big data technology with its security IP developed in government
• Dynamic data model supports indexing 38k data sources, growing at 200 per day
• 24x7 continuous availability
• Scalability to PBs of data
30. 30
Powering Global Threat Intelligence
Cloud-based real-time analytics with MongoDB & Hadoop
Problem:
• Requirement to analyze data over many different dimensions to detect real-time threat profiles
• HBase unable to query data beyond primary-key lookups
• Lucene search unable to scale with growth in data
Solution:
• MongoDB + Hadoop to collect and analyze data from internet sensors in real time
• MongoDB's dynamic schema enables sensor data to be enriched with geospatial tags
• Auto-sharding to scale as data volumes grow
• Runs complex, real-time analytics on live data
Results:
• Improved query performance by over 3x
• Scales to support a doubling of data volume every 24 months
• Deployed across global data centers for a low-latency user experience
• Engineering teams have more time to develop new features
32. Conclusion
1. Data lakes enable enterprises to affordably capture & analyze more data
2. Operational and analytical workloads are converging
3. MongoDB is the key technology for operationalizing the data lake
33. 33
MongoDB Enterprise Advanced
• MongoDB Enterprise Server – authentication, authorization, auditing, encryption (in flight & at rest)
• MongoDB Ops Manager – monitoring & alerting, query optimization, backup & recovery, automation & configuration, REST API
• MongoDB Compass – schema visualization, data exploration, ad-hoc queries
• MongoDB Connector for BI – visualization, analysis, reporting
• 24x7 support (1-hour SLA) with emergency patches
• Commercial license (no AGPL copyleft restrictions), warranty, limitation of liability, indemnification
• Platform certifications, Customer Success Program, on-demand online training
35. 35
Resources to Learn More
• Guide: Operational Data Lake
• Whitepaper: Real-Time Analytics with Apache Spark & MongoDB
37. 37
For More Information
Case Studies: mongodb.com/customers
Presentations: mongodb.com/presentations
Free Online Training: education.mongodb.com
Webinars and Events: mongodb.com/events
Documentation: docs.mongodb.org
MongoDB Downloads: mongodb.com/download
Additional Info: info@mongodb.com
38. 38
One of the World's Largest Banks
Creating new customer insights with MongoDB & Spark
Problem:
• System failures in online banking systems created customer satisfaction issues
• No personalized experience across channels
• No enrichment of user data with social media chatter
Solution:
• Apache Flume to ingest log data & social media streams; Apache Spark to process log events
• MongoDB to persist log data and KPIs, and to immediately rebuild user sessions when a service fails
• Integration with the MongoDB query language and secondary indexes to selectively filter and query data in real time
Results:
• Improved user experience, with more customers using online, self-service channels
• Improved services following a deeper understanding of how users interact with systems
• Greater user insight by adding social media insights
39. 39
The New Enterprise Stack (Legacy → Future State)
• APPS: on-premise monoliths → SaaS, microservices
• DATABASE: relational (Oracle) → non-relational (MongoDB)
• EDW: Teradata, Oracle, etc. → Hadoop
• COMPUTE: scale-up servers → containers / commodity servers / cloud
• STORAGE: SAN → local storage & data lakes
• NETWORK: routers and switches → software-defined networks
40. Workload Isolation for Real-Time Analytics
[Diagram] The operational application queries operational data against the MongoDB primary, while the analytics application runs real-time analytics against MongoDB secondaries to inform the operational application – isolating the two workloads within a single replica set. A read-preference sketch follows.
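A hedged PyMongo sketch of that isolation (the workload:analytics tag is an illustrative assumption about how the analytics secondaries are labeled):

# Hypothetical sketch: pin analytics reads to tagged secondaries so heavy queries
# never contend with operational traffic on the primary. Tags are illustrative.
from pymongo import MongoClient
from pymongo.read_preferences import Secondary

client = MongoClient("mongodb://n1,n2,n3/?replicaSet=rs0")

# Operational app: the default read preference targets the primary.
ops = client.lake.customer_profiles

# Analytics app: reads target secondaries tagged for analytics.
analytics = client.get_database(
    "lake",
    read_preference=Secondary(tag_sets=[{"workload": "analytics"}]),
)
top_risks = analytics.customer_profiles.find({"churn_risk": {"$gt": 0.8}}).limit(10)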
41. 41
Handling Multi-Structured Data from the Data Lake
Flexible, Governed Data Model
{
  first_name: 'Paul',       // typed field values
  surname: 'Miller',
  cell: 447557505611,       // number
  city: 'London',
  location: [45.123, 47.232],
  profession: ['banking', 'finance', 'trader'],  // fields can contain arrays
  cars: [                   // ...or arrays of sub-documents
    { model: 'Bentley',
      year: 1973,
      value: 100000, … },
    { model: 'Rolls Royce',
      year: 1965,
      value: 330000, … }
  ]
}
42. 42
Expressive Query Language, Rich Secondary Indexes
• Rich queries: find Paul's cars; find everybody in London with a car built between 1970 and 1980
• Geospatial: find all of the car owners within 5km of Trafalgar Sq.
• Text search: find all the cars described as having leather seats
• Aggregation: calculate the average value of Paul's car collection
• Map-reduce: what is the ownership pattern of colors by geography over time (is purple trending in China?)
43. 43
Visualizing Operational Data
MongoDB Connector for BI: visualize and explore multi-structured data using SQL-based BI platforms.
[Diagram] The BI Connector sits between MongoDB and your BI platform: it provides the schema, translates SQL queries into MongoDB queries, and translates the responses back into tabular form.
44. 44
Enterprise-Grade Security
• Authentication: SCRAM, LDAP*, Kerberos*, x.509 certificates
• Authorization: built-in roles, user-defined roles, field-level redaction
• Auditing*: admin, DML, DDL, role-based
• Encryption: network – SSL (with FIPS 140-2); disk – Encrypted Storage Engine* or partner solutions
*Included with MongoDB Enterprise Advanced
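A hedged sketch of what this looks like from an application client in that era's PyMongo (hostnames, certificate path and credentials are illustrative assumptions):

# Hypothetical sketch: authenticated, TLS-encrypted connection from an app.
# Hostnames, paths and credentials are illustrative assumptions.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://app_user:s3cret@mongo1,mongo2/?replicaSet=rs0&authSource=admin",
    ssl=True,                              # encrypt data in flight
    ssl_ca_certs="/etc/ssl/mongo-ca.pem",  # verify the server's certificate
)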
46. 46
Management Tooling: MongoDB Ops Manager
• Monitoring & alerting
• Integration with APM platforms
• Prescriptive management with query profiling
• Automated cluster provisioning, scaling and upgrades
• Continuous, point-in-time backup
Editor's Notes
Seen rapid growth in adoption of the data lake – a centralized repository for many new data sources orgs now collecting
But not without challenges – primary challenge is how to make analytics generated by the data lake available to our real time, operational apps
So we are going to cover
Rise of data lake
Challenges presented in getting most biz value out of data lake
Role that databases play, and requirements
Case studies from users who are unlocking insight from the data lake
As enterprises bring more products and services online as part of digital transformation initiatives, one thing they don't lack today is data – from streams of sensor readings, to social sentiment, to machine logs, mobile apps, and more.
Analysts estimate volumes growing at 40% per annum, with 80% of all data unstructured.
At the same time, we see more pressure on time to market, on exposing apps to global audiences, and on reducing the cost of delivering new services
These trends fundamentally change how enterprises build and run modern apps
With all of this new data available, we are creating an insight economy
Uncovering new insights by collecting and analyzing this data carries the promise of competitive advantage and efficiency savings. Better understand customers by predicting what they might buy based on behavior and demographics; optimize the supply chain with better or faster routes; reduce the risk of fraud by identifying suspicious behavior – it's all about that data
Those that don’t harness data are at major disadvantage
understand the past, monitor the present, and predict the future
Traditional source of data from operational apps has been DW, take all this data in, then create analytics from it
However, the traditional Enterprise Data Warehouse (EDW) is straining under the load, overwhelmed by the sheer volume and variety of data pouring into the business. Costs run from hundreds to thousands of dollars per TB, versus tens to hundreds in commodity systems
Because of these challenges, many organizations have turned to Hadoop as a centralized repository for this new data, creating what many call a data lake. Not a replacement but an adjunct – it stores all the new data and applies new analytics, which combine with the traditional reporting coming from the DW
Gartner estimates around 50% of enterprises have rolled out, or are in the process of rolling out, data lakes
When we think about data lakes, think about big data, and big data often associated with Hadoop – reality is more than just Hadoop
Market growth forecast by Wikibon: "big data revenues" growing from $19bn in 2016 to $92bn in 2026, with software outpacing hardware and professional services. IDC forecasts just under $50bn by 2019, a 23% CAGR, with software growing fastest
Leading the charge: Hadoop and Spark, closely followed by databases – a key part of the big data landscape, because they operationalize the data lake – the link between the back-end data lake and the front-end apps that consume analytics to make those apps smarter
Hadoop – well established, celebrating its 10th anniversary this year
Grown from HDFS and MapReduce into dozens of projects – Gartner identifies 19 common projects supported by the 4 leading distros. The average distro has many more – processing frameworks, search, provisioning and management, security, file formats, integration
Each project is developed independently – its own roadmap, its own dependencies – incredible complexity
HDFS is the common storage layer – against which processing frameworks run to produce outputs you see on the slide
While something like 50% of enterprises either have adopted or are evaluating Hadoop to create new classes of app, it is not without its challenges
These appear in a number of Gartner analyses, and in the press
One of the fundamental challenges in integration is how to integrate data lake with your operational systems
Operational apps run the business – how do you expose analytics created in the data lake to better serve customers with more relevant products and offers, to better drive efficiency savings from IoT-enabled smart factory
Unify data lake analytics with the operational applications
Enables you to create smart, contextually aware, data-driven apps
Integrated database layer operationalizes the data lake
The obvious question is why we need a database when we have Hadoop. It comes down to how each platform persists and accesses data. HDFS is a file system that accesses data in batches of 128MB blocks; MongoDB is a database that provides fine-grained access to data at the level of individual records. This gives each system very different properties – we'll talk through them.
Despite those differences, there are lots of similarities in how we process data – MapReduce, Spark. These are unopinionated about the underlying persistence layer – it could be HDFS, it could be MongoDB – which means you can unify analytics across the data lake and your database
Both MongoDB and HDFS provide common attributes: schema-on-read, multiple replicas for fault tolerance, horizontal scale, low TCO.
But have different characteristics in how they store and access data – means suited to different parts of the data lake deployment
The differences come in how data is stored, accessed and updated. Hadoop is a file system – it stores data in files in blocks, has no knowledge of the underlying data, and has no indexes. If you want to access a specific record, you scan all the data stored in the file where the record is located – which could be tens of MBs
HDFS characteristics
WORM: i.e. to update customer data, you rewrite all of that customer data, not just the individual customer's record
Hadoop excels at generating analytics models by scanning and processing large datasets; it is not designed to provide real-time, random access to operational applications.
The time to read the whole dataset is more important than the latency in reading the first record.
http://stackoverflow.com/questions/15675312/why-hdfs-is-write-once-and-read-multiple-times/37300268#37300268
But MongoDB is more than just a filesystem. It is a full database, so it gives you a whole bunch of things HDFS doesn't:
Millisecond latency query responsiveness.
Random access to indexed subsets of data.
Expressive querying & flexible indexing: Supporting complex queries and aggregations against the data in real time, making online applications smarter and contextual.
Updating fast-changing data in real time as users interact with online applications, without having to rewrite the entire data set.
fine-grained access with complex filtering logic,
Use distributed processing libraries against it – a Mongo collection or document looks like an input or output in HDFS. Rather than load a file, load a DataFrame. Hive sees MongoDB as a table
Longer jobs
Batch analytics
Append only files
Great for scanning all data or large subsets in files
When you bring the database and the data lake together, you can build powerful, data driven apps
Take a real-life example – the data lake of a large retailer
The online storefront and e-commerce engine is powered by MongoDB – handling customer profiles, sessions, baskets, and product catalogs, and presenting recommendations and offers
As customers browse the site, all of their activity is written back to Hadoop, blending it with other data sources – social feeds, demographics, market data, credit scores, currency feeds – to segment and cluster customers
These can then be exposed to MongoDB, so when customers come back they are presented with a personalized experience – based on what they have browsed before and what they are likely to want to purchase next.
You could not serve that operational app, dealing with individual customers, from HDFS – it is not real time, there are no indexes to access just the customer details you need, and there is no way of updating a customer record – everything is rewritten and recomputed
Regression and classification for customer clustering
Lets go deeper and wider
This is a design pattern for the data lake – multiple components that collectively handle ingest, storage, processing and analysis of data, then serving it to consuming operational apps
Step thru
Data ingestion: Data streams are ingested to a pub/sub message queue, which routes all raw data into HDFS.
Often there is also event processing running against the queue to find interesting events that need to be consumed by the operational apps immediately – an offer displayed to a user browsing a product page, or alarms generated against vehicle telemetry from an IoT app – which are routed to MongoDB for immediate consumption by operational applications.
Raw data is loaded into the data lake, where Hadoop jobs – MapReduce or Spark – generate analytics models from the raw data; see the examples in the layer above HDFS
MongoDB exposes these models to the operational processes, serving indexed queries and updates against them with real-time latency
The distributed processing frameworks can re-compute analytics models, against data stored in either HDFS or MongoDB, continuously flowing updates from the operational database to analytics models
Look at some examples of users who have deployed this type of design pattern little later
Beyond low-latency performance, there are specific requirements. You need much more than just a datastore: a fully-featured database serving as a system of record for online applications
Tight integration between MongoDB and the data lake – minimize data movement between them and fully exploit the native capabilities of each part of the system
Need to serve operational workloads and run analytics against live operational data – e.g. the top trending articles right now so I know where to place my ads, or how many widgets coming off my production line are failing QA, and whether that is up or down versus previous trends. Gartner calls it HTAP (Hybrid Transactional and Analytical Processing); Forrester calls it translytical. To do that you need a powerful query language, secondary indexes, and aggregations & transformations all within the database – not ETL into a warehouse
Workload isolation: operational & analytics – so they don't contend for the same resources
Flexible schema to handle multi-structured data, with the ability to enforce governance over that data
Secure access to the data: the operational DB is typically accessed by a much broader audience than Hadoop, so security controls are critical – robust access controls – LDAP, Kerberos, RBAC
Auditing of all events for regulatory compliance. Encryption of data in motion and at rest, all built into the database
Need to scale as the data lake scales – which means scaling out on commodity hardware, often across geographic regions
To simplify the environment, need sophisticated management tools: to automate database deployment, scaling, monitoring and alerting, and disaster recovery.
Tight integration: not enough just to move data between analytics and operational layers – need to move it efficiently. Connectors should allow selective filtering by using secondary indexes to extract and process only the range of data it needs – for example, retrieving all customers located in a specific geography. This is very different from other databases that do not support secondary indexes. In these cases, Spark and Hadoop jobs are limited to extracting all data based on a simple primary key, even if only a subset of that data is required for the query. This means more processing overhead, more hardware, and longer time-to-insight for the user.
Workload isolation: provision database clusters with dedicated analytic nodes, allowing users to simultaneously run real-time analytics and reporting queries against live data, without impacting nodes servicing the operational application.
Flexible data model to store data of any structure, and easily evolve the model to capture new attributes – e.g. enriching user profiles with geospatial data. Also need to ensure data quality by enforcing validation rules against the data – to ensure it is appropriately typed and contains all the attributes the app needs
Expressive queries let developers build applications that can query and analyze the data in multiple ways – by single keys, ranges, text search, and geospatial queries through to complex aggregations and MapReduce jobs, returning responses in milliseconds. Complex queries are executed natively in the database without having to use additional analytics frameworks or tools, avoiding the latency that comes from moving data between operational and analytical engines. Secondary indexes give you the opportunity to filter data any way you need – key for low-latency operational queries
Robust security controls: govern access, provide audit trails, and encrypt data in flight and at rest
Scale-out: match the scale-out of the data lake – as it grows, add new nodes to service higher data volumes or user load
Advanced management platform. To reduce data lake TCO and risk of application downtime, powerful tooling to automate database deployment, scaling, monitoring and alerting, and disaster recovery.
While it's important to provide low-latency access to data, it's not enough to just support simple key-value lookups – the demand is to get insights from data faster – this is the role of real-time analytics: track in real time where the vehicles in your fleet are, or the social sentiment toward an announcement you've just made, or correlate patterns of real-time fraud attempts against specific domains – this is where an expressive query language, secondary indexes, and in-database aggregations are valuable.
MongoDB and RDBMS both have strong features – RDBMS a little further ahead – while column family is little more than key-value: you need to move data out to other query frameworks or analytics nodes to get any intelligence, which adds latency and complexity – more moving parts
RDBMS is good in many areas, but the lack of the data-model flexibility needed to handle rapidly changing, multi-structured data is where it falls down.
CF – more schema flexibility than relational, but you still need to pre-define columns, restricting the speed at which apps can evolve
Data validation – apply rules to the data structures the operational database stores. Say an app creates a single view of your customer – data may be spread across many repositories, loaded into the data lake which creates the single view, then loaded into MongoDB to serve operational apps – you need to ensure documents contain mandatory fields: unique customer identifiers, typed and formed in a specific way, e.g. the ID is always an integer, the email address always contains an @. Document validation in MongoDB enables you to do this. RDBMS has full schema validation, so it is a little ahead – with a CF database you have to enforce governance in code
Looking at the aggregated scores, relational and MongoDB are evenly matched; CF, a much simpler datastore, is a long way behind
Hadoop and Spark integration: need to do more than just move vast amounts of data between each layer of the stack – need intelligent connectors that can push down predicates and filter data with secondary indexes – e.g. access all customers in a specific geography. Without access to the DB's secondary indexes and the ability to pre-aggregate data, you move a ton of data backward and forward – more processing cycles, longer latency.
The MongoDB Connector for Hadoop, and the one for Spark, both support these capabilities. CF doesn't offer secondary indexes or aggregations, so there is nothing to filter the data
RDBMS offers these capabilities in its connectors, but generally only as expensive add-ons, hence downgraded
Workload isolation – the ability to perform real-time analytics on live operational data without interfering with operational apps – you don't want an aggregation looking at how many deliveries your fleet of trucks has made to contend with how quickly you can detect from sensor data that a vehicle has developed a fault. The key is to distribute queries to dedicated nodes in the database cluster – some provisioned for operational data, replicating to nodes dedicated to analytics. MongoDB – up to 50 members in a single replica set – configure analytics nodes as hidden so they are never hit by operational queries. CF is restricted to just 3 data replicas – there for HA, not for separating different workloads. RDBMS – an expensive add-on
Native BI connectivity – may not be relevant in all cases, but many orgs want to create live dashboards reporting the current state of operational systems. MongoDB has a native BI connector that exposes the database as an ODBC data source – visualize in anything from Tableau to BusinessObjects to Excel. Rich tooling in the relational world. For CF, connectors exist, but they are 3rd party and don't push queries down to the database – instead they extract all the data – so it is more computationally and network intensive to power dashboards
Security: data from operational databases is exposed to apps and potentially millions of users – need robust access controls, which may include integration with LDAP, Kerberos, and PKI environments, plus RBAC to tightly segregate who can do what in the DB. Encrypt data in flight and at rest, and maintain a log of activity in the DB for forensic analysis
All solutions do well here – big investment in the Hadoop ecosystem, rapidly gaining ground on RDBMS, and doing it at much lower cost
Scale-out – need to scale as the data lake scales and as more digital services are opened up to users – a core strength of non-relational databases. The fundamental challenge is that RDBMS requires scale-up: limited headroom, and very expensive proprietary hardware
Management – Hadoop is complex, and its management tools are still primitive. For the operational database, need a platform with powerful tooling to automate database deployment, scaling, fine-grained monitoring and alerting, and disaster recovery with point-in-time backups and automated restores. Rich tooling in the relational world – big investment from MongoDB to close that gap
Left-hand side – maintained the attributes of relational – blended with innovation from NoSQL
This uniquely differentiates MongoDB from its peers in the non-relational DB market
Invest in tech that has production proven deployments, broad skills availability
With availability of Hadoop skills cited by Gartner analysts as a top challenge, it is essential you choose an operational database with a large available talent pool. This enables you to find staff who can rapidly build differentiated big data applications. Across multiple measures, including DB Engines Rankings, 451 Group NoSQL Skills Index and the Gartner Magic Quadrant for Operational Databases, MongoDB is the leading non-relational database.
Look at examples in action
CTM – the UK's leading price comparison site – moved from an on-prem, RDBMS-based monolithic app to a microservices architecture powered by MongoDB, with Hadoop at the back end providing analytics – enabling them to better personalize the customer experience and deepen relationships
Read through bullets
2nd example: a leading global airline. Through M&A it has multiple brands serving different countries and market sectors, but customer data was spread across 100 different systems.
By using Hadoop and Spark, it brought that data together to create a single view, which is loaded into MongoDB to power the online apps – web and mobile, as well as the call center – so users get a consistent experience however they interact. All user and ticket data is stored in MongoDB, then written back into Hadoop to run advanced analytics that allow ticket-price optimization and identify offers and gaps in the product portfolio
Read bullets
Provides a traveler-safety platform for corporate customers – if a natural disaster or security incident occurs while a traveler is away on business, it can send real-time alerts and advise on how to get to safety
Platform built for national governments, now launched for commercial usage – analyzing PBs of data with MongoDB, Hadoop, Apache NiFi & SAP HANA
Read bullets
McAfee – built its cloud-based threat intelligence platform on MongoDB. The platform monitors threat activity for clients in real time – identifying attacks as they take place, and identifying when users may be interacting with insecure or suspicious sites
All real-time activity is captured in MongoDB – providing alerting to security teams – and sent to Hadoop for further back-end analytics, with updated threat profiles written back to Mongo
MongoDB is open source – we also provide MongoDB Enterprise Advanced
Collection of software and support to run in production at scale
The Stratio Apache Spark-certified Big Data (BD) platform is used by an impressive client list including BBVA, Just Eat, Santander, SAP, Sony, and Telefonica. The company has implemented a unified real-time monitoring platform for a multinational banking group operating in 31 countries with 51 million clients all over the world. The bank wanted to ensure a high quality of service and personalized experience across its online channels, and needed to continuously monitor client activity to check service response times and identify potential issues. The application was built on a modern technology foundation including:
Apache Flume to aggregate log data
Apache Spark to process log events in real time
MongoDB to persist log data, processed events and Key Performance Indicators (KPIs).
The aggregated KPIs, stored by MongoDB enable the bank to analyze client and systems behavior in real time in order to improve the customer experience. Collecting raw log data allows the bank to immediately rebuild user sessions if a service fails, with analysis generated by MongoDB and Spark providing complete traceability to quickly identify the root cause of any issue.
The project required a database that provided always-on availability, high performance, and linear scalability. In addition, a fully dynamic schema was needed to support high volumes of rapidly changing semi-structured and unstructured JSON data being ingested from a variety of logs, clickstreams, and social networks. After evaluating the project’s requirements, Stratio concluded MongoDB was the best fit. With MongoDB’s query projections and secondary indexes, analytic processes run by the Stratio BD platform avoid the need to scan the entire data set, which is not the case with other databases.
Digital transformation is not just impacting the DW and analytics
Not just in the field of data warehousing and analytics – across the stack, we're seeing transformations
Workload isolation. MongoDB replica sets can be provisioned with dedicated analytic nodes, allowing users to simultaneously run real-time analytics and reporting queries against live data, without impacting nodes servicing the operational application. Using MongoDB inbuilt replication, don’t have complex and brittle ETL pipelines that are moving data between operational and analytical systems
MongoDB's document data model makes it easy for users to store and combine data of any structure, without giving up sophisticated validation rules. If new attributes need to be added – for example enriching user profiles with geo-location data – the schema can be modified without application downtime, and without having to update all existing records.
Can also enforce structure – take a user profile – need to ensure all profiles have a unique ID stored as an int and a valid email address – use document validation to enforce that
enables developers to build applications that can query and analyze the data in multiple ways – by single keys, ranges, text search, and geospatial queries through to complex aggregations and MapReduce jobs, returning responses in milliseconds. Complex queries are executed natively in the database without having to use additional analytics frameworks or tools
Secondary indexes: MongoDB supports compound, unique, array, partial, TTL, geospatial, sparse, hash and text indexes to optimize for multiple query patterns, data types and application requirements. Indexes are essential when operating across slices of the data, for example updating the churn analysis of a subset of customers, without having to scan all customer data.
We need to visualize data for reporting and analytics – drive live dashboards
MongoDB BI Connector…
Provides the BI tool with the schema of the MongoDB collection to be visualized
Translates SQL statements issued by the BI tool into equivalent MongoDB queries that are sent to MongoDB for processing
Converts the results into the tabular format expected by the BI tool, which can then visualize the data based on user requirements
Protect our data: there has been a lot of investment in Hadoop security, but the lake is typically locked away to only a subset of analysts – the operational DB is typically deployed to a much broader audience, so security controls are critical – robust access controls – LDAP, Kerberos, RBAC
Auditing of all events for reg compliance. Encr of data in motion and at rest, all built into the database
Need to be able to scale cost-effectively – as the data lake grows, we need to scale the operational database layer in a way that is economical and doesn't break apps
With auto-sharding, MongoDB can be distributed across multiple nodes – both within and across datacenters
Elastic – increase or decrease capacity as you go, with automatic load balancing
Need sophisticated operational tooling to manage operational database layer