MongoDB lets you create and consume data rapidly and securely, no matter how it is structured across channels and products, and makes it easy to aggregate data from multiple systems while lowering TCO and delivering applications faster.
Learn how financial services organizations are using MongoDB with this presentation.
This document discusses how retail banks can use MongoDB to address key challenges:
1) Changing regulations require agile systems that can adapt quickly.
2) Synchronizing global customer data in real-time is difficult due to latency issues.
3) Providing a 360-degree view of customers across channels is a challenge with siloed systems.
The case study describes how Infusion and MongoDB helped MetLife build a single customer view application in 90 days that aggregated data from 70 systems, providing a more holistic customer experience. This project demonstrated how acting like a startup can help large companies move faster.
Data platform architecture principles - IEEE Infrastructure 2020 (Julien Le Dem)
This document discusses principles for building a healthy data platform, including:
1. Establishing explicit contracts between teams to define dependencies and service level agreements.
2. Abstracting the data platform into services for ingesting, storing, and processing data in motion and at rest.
3. Enabling observability of data pipelines through metadata collection and integration with tools like Marquez to provide lineage, availability, and change management visibility.
Thrift vs Protocol Buffers vs Avro - Biased Comparison (Igor Anishchenko)
Igor Anishchenko
Odessa Java TechTalks
Lohika - May, 2012
Let's take a step back and compare data serialization formats, of which there are plenty. What are the key differences between Apache Thrift, Google Protocol Buffers and Apache Avro? Which is "the best"? The truth of the matter is that they are all very good and each has its own strong points. The answer is as much a personal choice as it is an understanding of each format's historical context and a correct identification of your own, individual requirements.
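To make the comparison concrete, here is a minimal sketch of defining a record schema and serializing/deserializing it with Avro in Python using the fastavro library; the User record and its fields are illustrative assumptions, not taken from the talk. Thrift and Protocol Buffers would express the same record in their own IDL files and rely on generated code instead.

```python
from io import BytesIO
from fastavro import parse_schema, schemaless_writer, schemaless_reader

# Illustrative Avro schema for a simple record.
schema = parse_schema({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": "string"},
    ],
})

buf = BytesIO()
schemaless_writer(buf, schema, {"id": 1, "email": "igor@example.com"})  # encode
buf.seek(0)
print(schemaless_reader(buf, schema))  # decode back to a dict
```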
Lessons Learned from Building a Multi-Tenant SaaS Content Management System o... (MongoDB)
The document introduces Kevin Wright and Jonathan Roeder, who both have experience with MongoDB. It then provides information about their company, which has been in business since 1999 and processes $2.8 billion annually on its ecommerce platform. It goes on to describe their multi-tenant CMS application built with .NET and MongoDB, including approaches taken for schema design, sharding implementation, and best practices.
Cassandra's data model is more flexible than typically assumed.
Cassandra allows tuning of consistency levels to balance availability and consistency. It can be made strongly consistent when certain replication conditions are met (see the sketch below).
Cassandra uses a row-oriented model where rows are uniquely identified by keys and group columns and super columns. Super column families allow grouping columns under a common name and are often used for denormalizing data.
Cassandra's data model is query-based rather than domain-based. It focuses on answering questions through flexible querying rather than storing predefined objects. Design patterns like materialized views and composite keys can help support different types of queries.
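As a rough illustration of the replication condition mentioned above: with replication factor N and read/write consistency levels R and W, a read is guaranteed to see the latest write whenever the read and write replica sets must overlap. This is just the arithmetic, not Cassandra code:

```python
def overlapping_quorums(n: int, r: int, w: int) -> bool:
    """True when read and write replica sets must intersect (R + W > N),
    the usual condition for strongly consistent reads in Cassandra."""
    return r + w > n

print(overlapping_quorums(n=3, r=2, w=2))  # True: QUORUM reads + QUORUM writes
print(overlapping_quorums(n=3, r=1, w=1))  # False: ONE + ONE may read stale data
```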
MongoDB for Coder Training (Coding Serbia 2013) (Uwe Printz)
Slides of my MongoDB Training given at Coding Serbia Conference on 18.10.2013
Agenda:
1. Introduction to NoSQL & MongoDB
2. Data manipulation: Learn how to CRUD with MongoDB
3. Indexing: Speed up your queries with MongoDB
4. MapReduce: Data aggregation with MongoDB
5. Aggregation Framework: Data aggregation done the MongoDB way
6. Replication: High Availability with MongoDB
7. Sharding: Scaling with MongoDB
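As a taste of the CRUD and indexing items in the agenda above, here is a minimal PyMongo sketch; the database, collection, and field names are made up for illustration and assume a local mongod on the default port.

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
coll = client["training"]["talks"]

# Create
coll.insert_one({"title": "MongoDB for Coders", "city": "Novi Sad", "year": 2013})

# Read
doc = coll.find_one({"year": 2013})

# Update
coll.update_one({"title": "MongoDB for Coders"}, {"$set": {"attendees": 120}})

# Index to speed up queries on year
coll.create_index([("year", ASCENDING)])

# Delete
coll.delete_one({"title": "MongoDB for Coders"})
```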
This document discusses how insurance companies use MongoDB. It provides examples of how MongoDB allows insurance companies to create a single customer view, consolidate data from multiple disparate systems, and distribute claims information globally in real-time. MongoDB provides a flexible schema, automatic replication of data, and the ability to query data locally for improved customer experience, risk analysis, fraud detection, and claims processing. The document highlights several insurance companies that have adopted MongoDB to unify customer data, modernize legacy systems, and power new data-driven applications and services.
This document describes building a powerful double-entry accounting system. It discusses key concepts of double-entry accounting including debits, credits, general ledger, balance sheet, income statement and cash flow statement. It also outlines how to model accounting entries and movements declaratively using schemas. Events like purchases and payments trigger movements which are made up of one or more accounting entries. Rules ensure movements balance and caches track account balances. Generative testing is used to check business invariants. The system provides an audit trail, handles many customers, and can detect operational issues for management and financial accounting.
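The core invariant described above, that every movement's entries must balance, can be sketched in a few lines. The field names and the use of Decimal are assumptions for illustration, not the schema from the talk.

```python
from decimal import Decimal

def movement_is_balanced(entries) -> bool:
    """A movement is a list of entries; debits are positive, credits negative.
    Double-entry bookkeeping requires the amounts to sum to zero."""
    return sum(Decimal(str(e["amount"])) for e in entries) == Decimal("0")

purchase = [
    {"account": "inventory", "amount": "100.00"},   # debit
    {"account": "cash",      "amount": "-100.00"},  # credit
]
assert movement_is_balanced(purchase)
```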
Data Con LA 2020
Description
Apache Druid is a cloud-native open-source database that enables developers to build highly-scalable, low-latency, real-time interactive dashboards and apps to explore huge quantities of data. This column-oriented database provides the microsecond query response times required for ad-hoc queries and programmatic analytics. Druid natively streams data from Apache Kafka (and more) and batch loads just about anything. At ingestion, Druid partitions data based on time so time-based queries run significantly faster than traditional databases, plus Druid offers SQL compatibility. Druid is used in production by AirBnB, Nielsen, Netflix and more for real-time and historical data analytics. This talk provides an introduction to Apache Druid including: Druid's core architecture and its advantages, Working with streaming and batch data in Druid, Querying data and building apps on Druid and Real-world examples of Apache Druid in action
Speaker
Matt Sarrel, Imply Data, Developer Evangelist
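Since Druid offers SQL compatibility, one common way to query it programmatically is to POST a SQL statement to the router's /druid/v2/sql endpoint. A minimal sketch, assuming a local Druid router on port 8888 and a hypothetical "events" datasource:

```python
import requests

DRUID_SQL = "http://localhost:8888/druid/v2/sql"  # router endpoint (assumed local setup)

query = """
SELECT TIME_FLOOR(__time, 'PT1H') AS hour, COUNT(*) AS events
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 1
"""

resp = requests.post(DRUID_SQL, json={"query": query})
resp.raise_for_status()
for row in resp.json():
    print(row["hour"], row["events"])
```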
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster (Grokking VN)
In recent years, with the boom of startups and of technologies such as machine learning, the amount of data that systems need to collect and process has kept growing.
As a result, for large systems, storing and processing data on a single database node is no longer enough; many nodes must be connected together to form a database cluster.
For database clusters in particular, and distributed systems in general, there are plenty of interesting topics to dig into. In this discussion we limit ourselves to examining how three systems - Redis, Elasticsearch and Cassandra - organize their clusters, and the trade-off between consistency and availability in each of them.
- Speaker: Lộc Võ - Lead Software Engineer @ Grab
This document discusses Cassandra drivers and how to optimize queries. It begins with an introduction to Cassandra drivers and examples of basic usage in Java, Python and Ruby. It then covers the differences between synchronous and asynchronous queries. Prepared statements and consistency levels are also discussed. The document explores how consistency levels, driver policies and node outages impact performance and latency. Hinted handoff is described as a performance optimization that stores hints for missed writes on down nodes. Lastly, it provides best practices around driver usage.
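A minimal sketch of the driver concepts mentioned above (prepared statements, per-statement consistency levels, and synchronous versus asynchronous execution), using the Python cassandra-driver; the keyspace, table, and key values are placeholders.

```python
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")  # placeholder keyspace

# Prepared once, reused many times; avoids re-parsing the CQL on every request.
select_user = session.prepare("SELECT name, email FROM users WHERE user_id = ?")
select_user.consistency_level = ConsistencyLevel.LOCAL_QUORUM

# Synchronous: blocks until the rows arrive.
rows = session.execute(select_user, [42])

# Asynchronous: returns a future immediately; result() blocks only when needed.
future = session.execute_async(select_user, [43])
more_rows = future.result()

cluster.shutdown()
```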
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka (Confluent)
LinkedIn uses Apache Kafka extensively to power various data pipelines and platforms. Some key uses of Kafka include:
1) Moving data between systems for monitoring, metrics, search indexing, and more.
2) Powering the Pinot real-time analytics query engine which handles billions of documents and queries per day.
3) Enabling replication and partitioning for the Espresso NoSQL data store using a Kafka-based approach.
4) Streaming data processing using Samza to handle workflows like user profile evaluation. Samza is used for both stateless and stateful stream processing at LinkedIn.
Kafka for Real-Time Replication between Edge and Hybrid Cloud (Kai Wähner)
Not all workloads allow cloud computing. Low latency, cybersecurity, and cost-efficiency require a suitable combination of edge computing and cloud integration.
This session explores architectures and design patterns for software and hardware considerations to deploy hybrid data streaming with Apache Kafka anywhere. A live demo shows data synchronization from the edge to the public cloud across continents with Kafka on Hivecell and Confluent Cloud.
The document discusses consistency models in distributed systems, including weak, eventual, and strong consistency. It then describes the Paxos consensus algorithm, its roles of proposers, acceptors, and learners, and its two-phase exchange of prepare/promise and accept/accepted messages. It provides examples of Paxos in action and discusses its usage, limitations, references, and related algorithms like Multi-Paxos and Raft.
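To make the prepare/promise and accept phases concrete, here is a toy single-decree acceptor in Python. It is a sketch of the protocol logic only (no networking, no durability) and is not an implementation from the document.

```python
class Acceptor:
    """Toy single-decree Paxos acceptor."""

    def __init__(self):
        self.promised_n = -1     # highest proposal number promised
        self.accepted_n = -1     # proposal number of the accepted value, if any
        self.accepted_v = None   # the accepted value itself

    def on_prepare(self, n):
        # Phase 1b: promise to ignore proposals numbered below n,
        # and report any value already accepted.
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", self.accepted_n, self.accepted_v)
        return ("nack", self.promised_n, None)

    def on_accept(self, n, value):
        # Phase 2b: accept unless a higher-numbered prepare was promised.
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n, self.accepted_v = n, value
            return ("accepted", n)
        return ("nack", self.promised_n)

# A proposer that gathers promises from a majority of acceptors then sends
# accept(n, v), where v is the value from the highest-numbered promise
# returned, or its own value if none of the acceptors reported one.
```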
A system with 75,000 orders a day, managing millions of SKUs, with transactions flowing through the system into the trillions, and more than 8,000 customers in total.
That system is Sapo - an open-platform, multi-channel sales management product that replaced its old monolithic architecture with a microservices architecture.
In this two-hour session, speaker Khôi Nguyễn introduces the microservices architecture model and some of Sapo's specific problems that were solved with it.
Speaker:
Nguyễn Minh Khôi (https://www.facebook.com/khoi.nguyen.84) -
CTO, DKT Technology (http://www.dkt.com.vn/)
Building a Scalable Record Linkage System with Apache Spark, Python 3, and Ma... (Databricks)
This document describes building a scalable record linkage system called Splinkr 3 using Apache Spark, Python 3, and machine learning. Splinkr 3 links over 330 million records across multiple systems to provide a comprehensive customer view. It standardizes data, generates record pairs efficiently using blocking keys, identifies matches using a logistic regression model, and resolves transitive links using GraphFrames. While experiments with neural networks showed marginal gains, logistic regression proved effective. Key challenges included training data quality and bugs in new Spark APIs.
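The blocking-key idea, comparing only records that share a cheap key instead of all N^2 pairs, can be sketched in plain Python. The key definition below (surname prefix plus postcode prefix) is an illustrative assumption, not Splinkr's actual blocking keys.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(rec):
    # Cheap, deliberately lossy key: records that do not share it
    # are never compared against each other.
    return (rec["surname"][:3].lower(), rec["postcode"][:3])

def candidate_pairs(records):
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)

people = [
    {"id": 1, "surname": "Smith", "postcode": "90210"},
    {"id": 2, "surname": "Smyth", "postcode": "90210"},  # lands in a different block
    {"id": 3, "surname": "Smith", "postcode": "90213"},
]
print(list(candidate_pairs(people)))  # [(1, 3)] - only same-block pairs are compared
```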
This session will go into best practices and detail on how to architect a near real-time application on Hadoop, using an end-to-end fraud detection case study as an example. It will discuss the various options available for ingest, schema design, processing frameworks, storage handlers, and more when architecting this fraud detection application, and walk through each of the architectural decisions among those choices.
In this training webinar, we will walk you through the basics of InfluxDB – the purpose-built time series database. InfluxDB has everything you need from a time series platform in a single binary – a multi-tenanted time series database, UI and dashboarding tools, background processing and monitoring agent. This one-hour session will include the training and time for live Q&A.
What you will learn
Core concepts of time series databases
An overview of the InfluxDB platform
How to ingest and query data in InfluxDB
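A minimal sketch of ingesting and querying with the InfluxDB 2.x Python client (influxdb-client); the URL, token, org, bucket, and measurement names are placeholders, not values from the webinar.

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

# Ingest one point into a bucket.
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(bucket="metrics",
                record=Point("cpu").tag("host", "server01").field("usage", 0.64))

# Query the last hour back with Flux.
flux = '''
from(bucket: "metrics")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "cpu")
'''
for table in client.query_api().query(flux):
    for row in table.records:
        print(row.get_time(), row.get_value())
```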
Real-time Analytics with Trino and Apache Pinot (Xiang Fu)
Trino summit 2021:
Overview of Trino Pinot Connector, which bridges the flexibility of Trino's full SQL support to the power of Apache Pinot's realtime analytics, giving you the best of both worlds.
Canis Major is an adaptor that supports persistence and verification of NGSI-LD entity transactions in blockchains like Ethereum. It handles entity creation, update, retrieval and batch creation by generating a hash of the transaction and storing it on the blockchain as well as relating the transaction receipt to the original entity in a context broker. Alternative implementations include a proxy in front of the broker to handle connections to Canis Major transparently.
Big data real time architectures -
How do you do big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What advantages and pitfalls does each contain?
This document discusses high concurrency architectures at TIKI. It describes Pegasus, the highest throughput API, which uses caching, compression, and a non-blocking architecture to handle over 200k requests per minute with sub-2ms latency. It also describes Arcturus, the high concurrency inventory API, which uses an in-memory ring buffer, Kafka for ordering, and asynchronous database flushing to handle millions of inventory transactions per second with eventual consistency. Key techniques discussed include non-blocking designs, caching, compression, ordering queues, and asynchronous data replication.
How We Reduced Performance Tuning Time by Orders of Magnitude with Database O... (ScyllaDB)
This document discusses how a new performance dashboard for TiDB databases helps reduce performance tuning time significantly. It introduces TiDB and its architecture, then explains why performance tuning was previously difficult due to the large number of metrics. The dashboard uses "tuning by database time" and "tuning by color" to identify issues. Case studies show how the dashboard helped optimize small table caching, data migration, and social media platform performance. The dashboard articulates issues numerically and reduces human resource needs for tuning by orders of magnitude.
MongoDB WiredTiger Internals: Journey To Transactions (M Malai)
MongoDB added a transactions feature in MongoDB 4.0. This talk focuses on the internals of how MongoDB delivers the ACID properties with the WiredTiger storage engine. WiredTiger offers more future possibilities for MongoDB.
Introduction To Streaming Data and Stream Processing with Apache Kafka (Confluent)
Slack processes over 1.2 trillion messages written and 3.4 trillion messages read daily across its real-time messaging platform, generating around 1 petabyte of streaming data. With thousands of engineers and tens of thousands of producer processes, Slack relies on Apache Kafka as the commit log for its distributed database to handle its massive scale of real-time messaging.
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas (MongoDB)
This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live migration, mongomirror, and dump/restore options are presented for migrating between replica sets or sharded clusters. Post-migration steps like monitoring and backups are also discussed. Finally, migrating from other data stores like AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases is briefly covered.
Prometheus is an open-source monitoring system that collects metrics from instrumented systems and applications and allows for querying and alerting on metrics over time. It is designed to be simple to operate, scalable, and provides a powerful query language and multidimensional data model. Key features include no external dependencies, metrics collection by scraping endpoints, time-series storage, and alerting handled by the AlertManager with support for various integrations.
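The scraping model described above means applications simply expose an HTTP /metrics endpoint that Prometheus pulls on an interval. A minimal sketch with the official Python client (prometheus_client); the metric names and port are arbitrary choices for illustration.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Requests handled", ["endpoint"])
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

@LATENCY.time()
def handle(endpoint):
    REQUESTS.labels(endpoint=endpoint).inc()
    time.sleep(random.random() / 10)  # simulate work

if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    while True:
        handle("/checkout")
```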
Webinar: How Financial Services Organizations Use MongoDB (MongoDB)
This document discusses how MongoDB can be used across various use cases in financial services. It provides examples of how MongoDB has been used by companies for tasks like data consolidation, reference data distribution, and tick data capture and management. Its flexible data model and horizontal scalability allow consolidating disparate data sources in real-time and distributing reference data globally. It also enables cost-effective solutions for high-volume workloads like tick data processing.
The document discusses single customer view as a goal for large firms and the challenges involved. It provides an example of how MetLife was able to achieve a single customer view using MongoDB, developing a prototype customer profile application called "The Wall" in just 2 weeks that drew from 70 different systems and improved the customer experience. Lessons from successful single customer view projects emphasize behaving like a startup by having a strong champion, using modern technology, and selling the benefits of the idea.
Webinar: Making A Single View of the Customer Real with MongoDB (MongoDB)
Tier 1 banks, top insurance providers and other global financial services institutions have discovered that with the use of MongoDB, they are able to achieve a single view of the customer. This allows them not only to comply with KYC and other regulations, but also to engage customers efficiently, which helps reduce churn and increase wallet share while reducing costs. We will focus on how MongoDB's dynamic schema, real-time replication and auto-scaling make it possible to create a global, unified data hub aggregating disparate data sources, which can be made available to customers, customer service representatives (CSRs), and relationship managers (RMs).
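One common way such a unified data hub is built is by upserting each source system's fragment into a single customer document keyed on a shared identifier. A minimal PyMongo sketch of that pattern, with collection and field names invented for illustration:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

customers = MongoClient("mongodb://localhost:27017")["hub"]["customers"]

def merge_fragment(customer_id, source, fragment):
    """Fold one source system's view of a customer into the single document."""
    customers.update_one(
        {"_id": customer_id},
        {"$set": {f"sources.{source}": fragment,
                  "updated_at": datetime.now(timezone.utc)}},
        upsert=True,
    )

merge_fragment("C-1001", "crm",    {"name": "A. Jones", "segment": "retail"})
merge_fragment("C-1001", "claims", {"open_claims": 2})
print(customers.find_one({"_id": "C-1001"}))
```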
Webinar: How to Drive Business Value in Financial Services with MongoDB (MongoDB)
Huge upheaval in the finance industry has led to a major strain on existing IT infrastructure and systems. New finance industry regulation has meant increased volume, velocity and variability of data. This coupled with cost pressures from the business has led these institutions to seek alternatives. Top tier institutions like MetLife have turned to MongoDB because of the enormous business value it enables.
In this session, hear how MongoDB enabled these successful real world examples:
Single View of a Customer - 3 months and $2M for a single view of a customer across 50 source systems
Reference Data Management - $40M in cost savings from migrating to MongoDB for reference data management
Private cloud - MongoDB as a PaaS across a tier 1 bank for enabling agility for operations, not just the developer
The use cases are specific to financial services but the patterns of usage - agility, scale, global distribution - will be applicable across many industries.
In the world of big data, legacy modernization, siloed organizations, empowered customers, and mobile devices, making informed choices about your enterprise infrastructure has become more important than ever. The alternatives are abundant, and the successful Enterprise Architect must constantly discern which new technology is just a shiny object and which will add true business value.
Webinar: How Financial Firms Create a Single Customer View with MongoDB (MongoDB)
Learn why a tier 1 bank, top 5 insurance provider and other global financial services companies are flocking to MongoDB. This webinar focuses on how firms use MongoDB to generate a single customer view not only to comply with KYC and other regulations, but also to engage customers efficiently, which helps reduce churn and increase wallet share while still reducing costs. We will focus on how MongoDB's dynamic schema, real-time replication and auto-scaling make it possible to create a global, unified data hub aggregating disparate data sources, which can be made available to customers, customer service representatives (CSRs), and relationship managers (RMs).
Webinar: How to Drive Business Value in Financial Services with MongoDB (MongoDB)
Huge upheaval in the finance industry has led to a major strain on existing IT infrastructure and systems. New finance industry regulation has meant increased volume, velocity and variability of data, so-called Big Data. This coupled with cost pressures from the business has led these institutions to seek alternatives. Top tier institutions like MetLife have turned to MongoDB because of the enormous business value it enables.
In this session, learn where and how you should use MongoDB to get the maximum value including specific case studies such as saving $40M in one project.
The use cases are specific to financial services but the patterns of usage - agility, scale, global distribution - will be applicable across many industries.
This document discusses how MongoDB can help enterprises meet modern data and application requirements. It outlines the many new technologies and demands placing pressure on enterprises, including big data, mobile, cloud computing, and more. Traditional databases struggle to meet these new demands due to limitations like rigid schemas and difficulty scaling. MongoDB provides capabilities like dynamic schemas, high performance at scale through horizontal scaling, and low total cost of ownership. The document examines how MongoDB has been successfully used by enterprises for use cases like operational data stores and as an enterprise data service to break down silos.
L’architettura di classe enterprise di nuova generazione (MongoDB)
The document discusses using MongoDB to build an enterprise data management (EDM) architecture and data lake. It proposes using MongoDB for different stages of an EDM pipeline including storing raw data, transforming data, aggregating data, and analyzing and distributing data to downstream systems. MongoDB is suggested for stages that require secondary indexes, sub-second latency, in-database aggregations, and updating of data. The document also provides examples of using MongoDB for a single customer view and customer profiling and clustering analytics.
L’architettura di Classe Enterprise di Nuova Generazione (MongoDB)
This document discusses using MongoDB as part of an enterprise data management architecture. It begins by describing the rise of data lakes to manage growing and diverse data volumes. Traditional EDWs struggle with this new data variety and volume. The document then provides an overview of MongoDB's features like flexible schemas, secondary indexes, and aggregation capabilities that make it suitable for building different layers of an EDM pipeline for tasks like raw data storage, transformation, analysis, and serving data to downstream systems. Example use cases are presented for building a single customer view and for replacing Oracle with MongoDB.
With the combination of Pentaho and MongoDB, it’s drastically simpler and faster to build single analytical views of clients by aggregating and blending data from a variety of internal sources (customer, transaction, position data) and external sources (social networking, central bank, news, pricing) with fast response times.
Webinar covers:
An insider’s view of new ways financial services companies are using MongoDB to rapidly store and consume unlimited shapes and sizes of data
How Pentaho makes it easy to enrich data in MongoDB with predictive scoring, visual data integration tools, reports, interactive dashboards, and data visualizations
A live demo of blending Twitter, equity pricing, and news data into a single analytical view that unlocks market intelligence to create investment opportunities
Webinar: Achieving Customer Centricity and High Margins in Financial Services... (MongoDB)
It is imperative that Financial Services firms align the organization around providing maximum value to customers across all channels and products with the agility to capitalize on new opportunities. They must do this at the same time as cutting costs, improving operational efficiency, and complying with current and future regulations. This effort is commonly referred to as Industrialization, or streamlining people, process, and technology for maximum customer value, service, and efficiency.
MongoDB can help you in this initiative by allowing you to centralize data management no matter how it is structured across channels and products and make it easy to aggregate data from multiple systems, while lowering TCO and delivering applications faster. MetLife publicly announced that they used MongoDB to enable a single view of the customer in 3 months across 70+ existing systems. We will explore case studies demonstrating these capabilities to help you industrialize your firm.
Key takeaways:
Unique capabilities, brought to you by MongoDB
Concrete use cases that help industrialization
Implementation case studies, to pave the way
Stream me to the Cloud (and back) with Confluent & MongoDB (Confluent)
In this online talk, we’ll explore how and why companies are leveraging Confluent and MongoDB to modernize their architecture and leverage the scalability of the cloud and the velocity of streaming. Based upon a sample retail business scenario, we will explain how changes in an on-premise database are streamed via the Confluent Cloud to MongoDB Atlas and back.
MongoDB Europe 2016 - The Rise of the Data Lake (MongoDB)
The document discusses the rise of data lakes and how MongoDB can be used to build modern data management architectures. It provides examples of how companies like a Spanish bank and an insurance leader used MongoDB to create a single customer view across siloed data sources and improve customer experiences. The document also outlines common data processing patterns and how to choose the best data store for different parts of the data pipeline.
Mobility: It's Time to Be Available for HER (MongoDB)
In order to meet the needs of the digitally-oriented consumer, retailers need to offer personalized service in real-time. By embracing mobile to deliver an integrated experience to customers, retailers can open new business opportunities.
Yet, for many traditional retailers, providing a seamless experience across mobile and other channels presents challenges due to the limitations of legacy technology infrastructure and the difficulty of acting in 'real time'. However, a new class of database technology is emerging that enables retailers to support new business requirements, improve customer experience, and reduce cost. In this session of the webinar series Omni-Channel Retailing: One Step at a Time, you will learn why more and more retailers and ecommerce players are turning to MongoDB as the choice for their mobile platforms. Based on existing customers, you will learn:
How to meet the consumer where she is, whenever she wants - know where she is using geo-spatial services
Engage with her and provide a ‘real-time’ experience, tailored to her expectations - check-her in or ‘check-her out’ at the POS and provide the latest update
Deliver the most up-to-date information to your associates so they are empowered to serve the consumer when she engages with your brand - deliver the latest inventory information via mobile app to your employee
In this discussion, you will learn the latest business techniques and how you can take advantage of MongoDB to deliver another piece of the omni-channel imperative - meeting your customer at her convenience.
MongoDB enables businesses to scale databases horizontally on commodity hardware or cloud infrastructure to handle terabytes or petabytes of data without downtime. It also allows easy adaptation by making flexible data modeling and adding new data types and sources simple. Additionally, MongoDB supports rich querying across diverse and changing data sets in real time to unlock insights from data. Case studies show how MongoDB has helped companies improve performance, innovate faster, and gain competitive advantages over relational databases.
This document provides an overview of Host Access Transformation Services (HATS) from Rational, an IBM product. HATS allows organizations to modernize legacy green screen applications and interfaces by transforming them into modern web and mobile interfaces without modifying the underlying host systems. It discusses business challenges organizations face with outdated green screen systems like long training times, user frustration, and inability to access new markets. The document then summarizes key capabilities of HATS like support for web technologies, mobile platforms, and integration with other systems without needing access to source code. It provides examples and a case study of its use at Winnebago Industries and Total System Services.
MongoDB London 2013: Real World MongoDB: Use Cases from Financial Services pr... (MongoDB)
Huge upheaval in the finance industry has led to a major strain on existing IT infrastructure and systems. New finance industry regulation has meant increased volume, velocity and variability of data. This coupled with cost pressures from the business has led these institutions to seek alternatives. In this session learn how FS companies are using MongoDB to solve their problems. The use cases are specific to FS but the patterns of usage - agility, scale, global distribution - will be applicable across many industries.
Overcoming Today's Data Challenges with MongoDB (MongoDB)
The document outlines an agenda for an event on overcoming data challenges with MongoDB. The event will feature speakers from MongoDB and Bosch discussing how the world has changed since relational databases were invented, how to radically transform IT environments with MongoDB, MongoDB and blockchain, and MongoDB for multiple use cases. The agenda includes presentations on these topics as well as a Q&A session and conclusion.
Similar to How Financial Services Organizations Use MongoDB
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... (MongoDB)
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... (MongoDB)
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data (MongoDB)
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
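One of the schema designs usually discussed for time series in MongoDB is the bucket pattern: grouping many readings per device per hour into one document to reduce document count and index size. A minimal PyMongo sketch under that assumption (collection and field names are illustrative, not from the talk):

```python
from datetime import datetime, timezone
from pymongo import MongoClient

readings = MongoClient("mongodb://localhost:27017")["iot"]["readings_hourly"]

def record_reading(sensor_id, ts, value):
    """Append a reading to the hour bucket for this sensor."""
    bucket_start = ts.replace(minute=0, second=0, microsecond=0)
    readings.update_one(
        {"sensor_id": sensor_id, "bucket_start": bucket_start},
        {"$push": {"samples": {"ts": ts, "value": value}},
         "$inc": {"count": 1}},
        upsert=True,
    )

record_reading("s-42", datetime(2020, 3, 1, 10, 15, tzinfo=timezone.utc), 21.7)
record_reading("s-42", datetime(2020, 3, 1, 10, 20, tzinfo=timezone.utc), 21.9)
```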
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] (MongoDB)
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 (MongoDB)
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... (MongoDB)
MongoDB Kubernetes operator is ready for prime time. Learn about how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset (MongoDB)
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart (MongoDB)
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... (MongoDB)
The document discusses guidelines for ordering fields in compound indexes to optimize query performance. It recommends the E-S-R approach: placing equality fields first, followed by sort fields, and range fields last. This allows indexes to leverage equality matches, provide non-blocking sorts, and minimize scanning. Examples show how indexes ordered by these guidelines can support queries more efficiently by narrowing the search bounds.
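A small PyMongo sketch of the E-S-R guideline: the index places the equality field first, the sort field second, and the range field last, so the example query below can use the index for both its filter and a non-blocking sort. The collection and field names are invented for illustration.

```python
from pymongo import MongoClient, ASCENDING, DESCENDING

orders = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

# Equality (status), then Sort (order_date), then Range (amount).
orders.create_index([("status", ASCENDING),
                     ("order_date", DESCENDING),
                     ("amount", ASCENDING)])

cursor = (orders.find({"status": "shipped", "amount": {"$gte": 100}})
                .sort("order_date", DESCENDING))
for doc in cursor:
    print(doc["_id"])
```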
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ (MongoDB)
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
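A short sketch of the 4.2-era capability mentioned above: running an aggregation and writing its output into an existing collection with $merge, a materialized-view style roll-up. The collection and field names are assumptions for illustration.

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["shop"]

db.orders.aggregate([
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer_id",
                "total_spent": {"$sum": "$amount"},
                "orders": {"$sum": 1}}},
    # Upsert the roll-up into an existing collection instead of returning it.
    {"$merge": {"into": "customer_totals",
                "on": "_id",
                "whenMatched": "replace",
                "whenNotMatched": "insert"}},
])
```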
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... (MongoDB)
The document describes a methodology for data modeling with MongoDB. It begins by recognizing the differences between document and tabular databases, then outlines a three step methodology: 1) describe the workload by listing queries, 2) identify and model relationships between entities, and 3) apply relevant patterns when modeling for MongoDB. The document uses examples around modeling a coffee shop franchise to illustrate modeling approaches and techniques.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang (MongoDB)
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app... (MongoDB)
...to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning... (MongoDB)
It has never been easier to order online and get delivery in under 48 hours, very often free of charge. This ease of use hides a complex market worth more than $8,000 billion.
Data is well known in the supply chain world (routes, goods information, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise and data science, Upply is redefining the fundamentals of the supply chain, enabling every player to overcome market volatility and inefficiency.
MongoDB .local Paris 2020: Les bonnes pratiques pour sécuriser MongoDB (MongoDB)
Every company is becoming a software company, providing customer-facing solutions for accessing a variety of services and information. Companies are now starting to get value from their data and gain better insights for the business. A crucial challenge is to ensure that this data is always available and secure, in line with the company's business objectives and with each country's regulatory constraints. MongoDB provides the security layer you need; come and discover how to secure your data with MongoDB.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best-practices guide outlines steps users can take to better protect personal devices and information.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
2. Who Is Talking To You?
2
• Yes, I use “Buzz” on my business cards
• Former Investment Bank Chief Architect at JPMorgan Chase, and Bear Stearns before that
• Over 27 years of designing and building systems
• Big and small
• Super-specialized to broadly useful in any vertical
• “Traditional” to completely disruptive
• Advocate of language leverage and strong factoring
• Inventor of perl DBI/DBD
• Still programming – using emacs, of course
3. MongoDB
The leading NoSQL database
3
Document Data Model | Open Source | Full-Featured
{
  name: "John Smith",
  pfxs: ["Dr.", "Mr."],
  address: "10 3rd St.",
  phone: {
    home: 1234567890,
    mobile: 1234568138
  }
}
4. MongoDB Company Overview
4
400+ employees 1100+ customers
Over $231 million in funding
Offices in NY & Palo Alto and across EMEA and APAC
6. Indeed.com Trends
Top Job Trends
1. HTML 5
2. MongoDB
3. iOS
4. Android
5. Mobile Apps
6. Puppet
7. Hadoop
8. jQuery
9. PaaS
10. Social Media
Charts: Google Search, LinkedIn job skills, the TIBCO/Jaspersoft Big Data Index, and direct real-time downloads each show MongoDB as the leading NoSQL database.
10. Relational: ALL Data is Column/Row
10
Customer table:
Customer ID | First Name | Last Name | City
0 | John | Doe | New York
1 | Mark | Smith | San Francisco
2 | Jay | Black | Newark
3 | Meagan | White | London
4 | Edward | Daniels | Boston

Phone table:
Phone Number | Type | DoNotCall | Customer ID
1-212-555-1212 | home | T | 0
1-212-555-1213 | home | T | 0
1-212-555-1214 | cell | F | 0
1-212-777-1212 | home | T | 1
1-212-777-1213 | cell | (null) | 1
1-212-888-1212 | home | F | 2
11. mongoDB: Model Your Data The Way it is Naturally Used
Relational vs. MongoDB
11
MongoDB:
{
  customer_id: 1,
  first_name: "Mark",
  last_name: "Smith",
  city: "San Francisco",
  phones: [
    {
      number: "1-212-777-1212",
      dnc: true,
      type: "home"
    },
    {
      number: "1-212-777-1213",
      type: "cell"
    }
  ]
}
Relational:
Customer table:
Customer ID | First Name | Last Name | City
0 | John | Doe | New York
1 | Mark | Smith | San Francisco
2 | Jay | Black | Newark
3 | Meagan | White | London
4 | Edward | Daniels | Boston

Phone table:
Phone Number | Type | DNC | Customer ID
1-212-555-1212 | home | T | 0
1-212-555-1213 | home | T | 0
1-212-555-1214 | cell | F | 0
1-212-777-1212 | home | T | 1
1-212-777-1213 | cell | (null) | 1
1-212-888-1212 | home | F | 2
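To make the side-by-side concrete, here is a minimal pymongo sketch (the connection string, database, and collection names are illustrative, not from the deck) that stores the customer and all of its phone numbers as one document and reads the whole entity back in a single round trip:

from pymongo import MongoClient

# Hypothetical connection string and namespace, for illustration only
client = MongoClient("mongodb://localhost:27017")
customers = client["bank"]["customers"]

# One document holds the customer and all phone numbers -- no join tables
customers.insert_one({
    "customer_id": 1,
    "first_name": "Mark",
    "last_name": "Smith",
    "city": "San Francisco",
    "phones": [
        {"number": "1-212-777-1212", "dnc": True, "type": "home"},
        {"number": "1-212-777-1213", "type": "cell"},
    ],
})

# A single query returns the whole entity, phones included
doc = customers.find_one({"customer_id": 1})
print(doc["phones"])

Adding a third phone, or a phone with an extra field, requires no schema change on the collection.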
12. No SQL But Still Flexible Querying
12
Rich Queries: Find everybody who opened a special account last month in NY between $100 and $1000, OR last year for more than $500.
Aggregation: What is the average P&L of the trading desks, grouped by a set of date ranges?
Text Search: Find all tweets that mention the bank within the last 2 days.
Geospatial: Find all customers that live within 10 miles of NYC.
Map Reduce: Calculate the total settled position amount by symbol by settlement venue.
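Each of these query styles is available directly from the driver. The following pymongo sketch is illustrative only: the collection names, field names, dates, and index definitions are assumptions chosen to mirror the examples above.

from pymongo import MongoClient, TEXT, GEOSPHERE

db = MongoClient("mongodb://localhost:27017")["bank"]  # illustrative namespace

# Rich query: special accounts opened last month in NY between $100 and $1000,
# OR opened last year for more than $500 (dates simplified to ISO strings)
special = db["accounts"].find({"$or": [
    {"state": "NY",
     "opened": {"$gte": "2014-05-01", "$lt": "2014-06-01"},
     "balance": {"$gte": 100, "$lte": 1000}},
    {"opened": {"$gte": "2013-01-01", "$lt": "2014-01-01"},
     "balance": {"$gt": 500}},
]})

# Aggregation: average P&L per trading desk
avg_pnl = db["positions"].aggregate([
    {"$group": {"_id": "$desk", "avgPnl": {"$avg": "$pnl"}}},
])

# Text search: tweets mentioning the bank (requires a text index)
db["tweets"].create_index([("body", TEXT)])
mentions = db["tweets"].find({"$text": {"$search": "bank"}})

# Geospatial: customers within roughly 10 miles of NYC (requires a 2dsphere index)
db["customers"].create_index([("location", GEOSPHERE)])
nearby = db["customers"].find({"location": {
    "$nearSphere": {
        "$geometry": {"type": "Point", "coordinates": [-74.0060, 40.7128]},
        "$maxDistance": 16093,  # meters, about 10 miles
    }
}})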
13. Capital Markets – Common Uses
Functional areas and use cases to consider:
Risk Analysis & Reporting: Firm-wide Aggregate Risk Platform; Intraday Market & Counterparty Risk Analysis; Risk Exception Workflow Optimization; Limit Management Service
Regulatory Compliance: Cross-silo Reporting (Volcker, Dodd-Frank, EMIR, MiFID II, etc.); Online Long-term Audit Trail; Aggregate Know Your Customer (KYC) Repository
Buy-Side Portal: Responsive Portfolio Reporting
Trade Management: Cross-product (Firm-wide) Trademart; Flexible OTC Derivatives Trade Capture
Front Office Structuring & Trading: Complex Product Development; Strategy Backtesting; Strategy Performance Analysis
Reference Data Management: Reference Data Distribution Hub
Market Data Management: Tick Data Capture
Investment Advisory: Cross-channel Informed Cross-sell; Enriched Investment Research
14. Retail Banking - Common Uses
Functional areas and use cases to consider:
Customer Engagement: Single View of a Customer; Customer Experience Management; Responsive Digital Banking; Gamification of Consumer Applications; Agile Next-generation Digital Platform
Marketing: Multi-channel Customer Activity Capture; Real-time Cross-channel Next Best Offer; Location-based Offers
Risk Analysis & Reporting: Firm-wide Liquidity Risk Analysis; Transaction Reporting and Analysis
Regulatory Compliance: Flexible Cross-silo Reporting (Basel III, Dodd-Frank, etc.); Online Long-term Audit Trail; Aggregate Know Your Customer (KYC) Repository
Reference Data Management: [Global] Reference Data Distribution Hub
Payments: Corporate Transaction Reporting
Fraud Detection: Aggregate Activity Repository; Cybersecurity Threat Analysis
15. Insurance – Common Uses
Functional areas and use cases to consider:
Customer Engagement: Single View of a Customer; Customer Experience Management; Gamification of Applications; Agile Next-generation Digital Platform
Marketing: Multi-channel Customer Activity Capture; Real-time Cross-channel Next Best Offer
Agent Desktop: Responsive Customer Reporting
Risk Analysis & Reporting: Catastrophe Risk Modeling; Liquidity Risk Analysis
Regulatory Compliance: Online Long-term Audit Trail
Reference Data Management: [Global] Reference Data Distribution Hub; Policy Catalog
Fraud Detection: Aggregate Activity Repository
16. Data Consolidation
Challenge: Aggregation of disparate data is difficult
16
Card, loan, and deposit data (Source 1 through Source n) is batch-loaded into a data warehouse and then into multiple datamarts, with reporting fed by yet more batch jobs.
Issues: yesterday's data; details lost; inflexible schema; slow performance.
Impact: no answer to "what happened today?"; worse customer satisfaction; missed opportunities; lost revenue.
17. Data Consolidation
Solution: Using rich, dynamic schema and easy scaling
17
Card, loan, and deposit sources (Source 1 through Source n) feed an Operational Data Hub in real time or batch; the hub serves trading applications, risk applications, strategic reporting, operational reporting, and the data warehouse.
Benefits: real-time; complete details; agile; higher customer retention; increased wallet share; proactive exception handling.
18. Data Consolidation
Watch Out For The Arrow!
18
Traditional Approach: Data Source 1 → Flat Data Extractor Program → potentially many CSV files → Flat Data Loader Program → Data Mart or Warehouse → App
• Entities in the source RDBMS are not extracted as entities
• CSV is brittle, with no self-description
• Both the loader and the target RDBMS must update their schemas when the source changes
• The application must reassemble entities

The mongoDB Approach: Data Source 1 → JSON Extractor Program → fewer JSON files → mongoDB data hub → App
• Entities in the RDBMS are extracted as entities
• JSON is flexible to change and self-descriptive
• The mongoDB data hub does not change when the source changes
• The application can consume entities directly
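As a rough sketch of what "extract entities as entities" can look like, the loader below assembles each customer together with its phones from a relational source and writes one self-describing document per entity into the hub. The source tables, column names, values, and connection details are hypothetical:

import sqlite3  # stand-in for whatever driver the source RDBMS uses
from pymongo import MongoClient

src = sqlite3.connect("source1.db")  # hypothetical feeder database
hub = MongoClient("mongodb://hub:27017")["ods"]["customers"]  # hypothetical hub namespace

for cust_id, first, last, city in src.execute(
        "SELECT customer_id, first_name, last_name, city FROM customer"):
    phones = [
        {"number": num, "type": ptype, "dnc": dnc == "T"}
        for num, ptype, dnc in src.execute(
            "SELECT number, type, do_not_call FROM phone WHERE customer_id = ?",
            (cust_id,))
    ]
    # The whole entity travels as one self-describing document; if the feeder
    # adds a column tomorrow, neither this loader nor the hub schema breaks.
    hub.replace_one(
        {"customer_id": cust_id},
        {"customer_id": cust_id, "first_name": first, "last_name": last,
         "city": city, "phones": phones},
        upsert=True,
    )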
19. Data Consolidation
Case Study: Insurance
Insurance leader generates coveted 360-degree view of customers in 90 days – “The Wall”
19
Problem: no single view of the customer; 145 years of policy data, 70+ systems, 15+ apps; 2 years and $25M spent failing to aggregate in an RDBMS; poor customer experience.
Why MongoDB: agility (prototype in 9 days); dynamic schema and rich querying to combine disparate data into one data store; hot technology to attract top talent.
Results: production in 90 days with 70 feeders; unified customer view available to all channels; increased call center productivity; better customer experience, reduced churn, more upsell opportunities; dozens more projects on the same data platform.
20. Data Consolidation
Case Study: Global Broker Dealer
Trade Mart for all OTC Trades
20
Problem: each application had its own persistence and audit trail; wanted one unified framework and persistence for all trades and products; needed to handle many variable structures across all securities.
Why MongoDB: dynamic schema: can save trades for all products in one data service; easy scaling: can easily keep trades as long as required with high performance.
Results: fast time-to-market using the persistence framework; store any structure of products/trades without changing a schema; one consolidated trade store for auditing and reporting.
* Same Concepts Apply to Risk Calculation Consolidation
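A minimal sketch of the "one trade store for all products" idea: two trades with very different shapes land in the same collection and remain queryable together. The product types, field names, and namespace below are invented for illustration, not taken from the actual system.

from pymongo import MongoClient

trades = MongoClient("mongodb://localhost:27017")["tradestore"]["trades"]  # illustrative

# A vanilla equity trade and a multi-leg OTC swap share one collection,
# each keeping its natural shape -- no schema migration between them.
trades.insert_one({
    "trade_id": "T-1001", "product": "EQUITY", "symbol": "IBM",
    "qty": 500, "price": 182.50, "desk": "cash-equities",
})
trades.insert_one({
    "trade_id": "T-1002", "product": "IR_SWAP", "notional": 10_000_000,
    "legs": [
        {"pay": "FIXED", "rate": 0.0275, "freq": "6M"},
        {"receive": "FLOAT", "index": "LIBOR-3M", "spread": 0.001},
    ],
    "desk": "rates",
})

# One audit/reporting query spans every product type
for t in trades.find({"desk": {"$in": ["cash-equities", "rates"]}}):
    print(t["trade_id"], t["product"])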
21. Data Consolidation
Case Study: Heavily Mergered Bank
Entitlements Reconciliation and Management
21
Problem: entitlement structures from hundreds of systems cannot be remodeled in a central store; difficult to design a difference engine for bespoke content; feeder systems need to change on demand and cannot be held up by the central store.
Why MongoDB: dynamic schema: common bookkeeping plus bespoke content captured in the same, queryable collection; rich structure API allows generic, granular, and clear comparison of documents; central processing places few demands on feeders.
Results: new systems can be added at any time with no development effort; development effort shifted to value-add capabilities on top of the store.
22. Point-of-Origin
Case Study: Global Broker Dealer
Structured Products Development & Pricing
22
Problem: need agility in the design and persistence of complex instruments; variety of consumers: C# front ends, Java and C++ back-end calculators, Python RAD; arbitrary grouping of instruments in an RDBMS is limited.
Why MongoDB: rich structure in documents supports legs of exotic shapes; 13 languages supported, plus more in the community.
Results: faster development of high-margin products; simpler management of portfolios and groupings.
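In the point-of-origin pattern the document is the master representation of the instrument itself. A hedged sketch of how exotic legs and arbitrary groupings can still be indexed and queried; all field names, tags, and values here are assumptions for illustration:

from pymongo import MongoClient

products = MongoClient("mongodb://localhost:27017")["structuring"]["products"]  # illustrative

products.insert_one({
    "product_id": "SP-2014-07",
    "wrapper": "note",
    "tags": ["autocallable", "EMEA", "retail"],  # arbitrary grouping, no extra tables
    "legs": [
        {"kind": "barrier_option", "underlier": "EURO STOXX 50", "barrier": 0.6},
        {"kind": "coupon", "schedule": ["2015-07-01", "2016-07-01"], "rate": 0.045},
    ],
})

# Index into the rich structure and query by nested leg attributes
products.create_index([("legs.kind", 1), ("tags", 1)])
autocallables = products.find({"tags": "autocallable", "legs.kind": "barrier_option"})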
23. Reference Data Distribution
Challenge: Ref data difficult to change and distribute
23
A golden copy is distributed to downstream systems through many separate batch feeds.
Common issues: hard to change the schema of the master data; data is copied everywhere and gets out of sync.
Impact: processes break because of out-of-sync data; the business doesn't have the data it needs; many copies create more management overhead.
24. Reference Data Distribution
Solution: Persistent dynamic cache replicated globally
24
Reference data now flows to every consumer in real time.
Solution: load into the primary with any schema; replicate to, and read from, secondaries.
Benefits: easy and fast change at the speed of the business; easy scale-out for a one-stop shop for data; low TCO.
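A sketch of the replicated-cache idea, assuming a replica set whose members sit in each consuming region; the hosts, replica set name, and namespace are placeholders:

from pymongo import MongoClient

# Hypothetical replica set spanning regions; reads are served locally
client = MongoClient(
    "mongodb://ny1.example.com,ldn1.example.com,hk1.example.com/?replicaSet=refdata",
    readPreference="secondaryPreferred",
)
securities = client["refdata"]["securities"]

# Writes go to the primary, in whatever shape the master data takes,
# and replicate automatically...
securities.replace_one(
    {"isin": "US0378331005"},
    {"isin": "US0378331005", "name": "Apple Inc.", "currency": "USD"},
    upsert=True,
)

# ...while each region queries its nearby secondary for low-latency reads
local_copy = securities.find_one({"isin": "US0378331005"})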
25. Reference Data Distribution
Case Study: Global Bank
Distribute reference data globally in real-time for
fast local accessing and querying
25
Problem: delays of up to 36 hours in distributing data by batch; charged multiple times globally for the same data; incurring regulatory penalties from missing SLAs; had to manage 20 distributed systems holding the same data.
Why MongoDB: dynamic schema: easy to load initially and over time; auto-replication: data distributed in real time and read locally; both cache and database: the cache is always up to date; simple data modeling and analysis: easy changes and understanding.
Results: will avoid about $40,000,000 in costs and penalties over 5 years; only charged once for data; data in sync globally and read locally; capacity to move to one global shared data service.
26. Market Data Capture & Management
Challenge: Huge volume, fast moving, niche technology
EOD price data (10,000 rows) flows through Technology A to EOD applications, while real-time tick data (150,000 ticks/sec) flows through Technology B and hybridized technology to symbol-by-date, aggregation, and tick applications.
Issues: bespoke technology (including APIs, operations, and scalability) for each use case; high-performance tick solutions are expensive; shallow pool of skills.
Impact: total expense plus integration saps margin in the product space.
27. Market Data Capture & Management
Solution: Sharding and tick bucketing & compression
27
EOD, symbol-by-date, aggregation, and tick applications all use a common Python DAL that buckets and compresses on write, and unbuckets and decompresses on read, talking through the pymongo driver to a sharded mongoDB cluster that ingests the real-time tick data.
Benefits: common technology platform; common DAL for many use cases and workloads; affordable but still high-performance horizontal scalability.
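The bucket/compression layer of the DAL can be sketched roughly as below: ticks are grouped into one document per symbol per minute and the payload is compressed before storage. The namespace, bucket granularity, and use of lz4 here are assumptions based on the talk, not the actual implementation.

import json
import lz4.frame  # assumes the lz4 package is installed
from pymongo import MongoClient
from bson.binary import Binary

ticks = MongoClient("mongodb://localhost:27017")["marketdata"]["tick_buckets"]  # illustrative

def write_bucket(symbol, minute, tick_list):
    """Store one compressed bucket of ticks per symbol per minute."""
    payload = lz4.frame.compress(json.dumps(tick_list).encode())
    ticks.replace_one(
        {"symbol": symbol, "minute": minute},
        {"symbol": symbol, "minute": minute,
         "count": len(tick_list), "ticks": Binary(payload)},
        upsert=True,
    )

def read_bucket(symbol, minute):
    """Fetch and decompress a bucket back into a list of ticks."""
    doc = ticks.find_one({"symbol": symbol, "minute": minute})
    return json.loads(lz4.frame.decompress(doc["ticks"])) if doc else []

write_bucket("VOD.L", "2014-07-01T09:30",
             [{"t": "09:30:00.001", "px": 195.2, "sz": 100},
              {"t": "09:30:00.004", "px": 195.3, "sz": 250}])
print(read_bucket("VOD.L", "2014-07-01T09:30"))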
28. Market Data Capture & Management
Case Study: AHL Group, Systematic Trading
Common infrastructure for multiple access
scenarios of tick data
28
Problem: quants demand agility in Python; quant use cases have a very different workload than traders; reticence to invest in highly specialized languages and ops.
Why MongoDB: excellent impedance match to Python; high, predictable read/write performance; ability to easily store long vectors of data; rich querying and indexing can be exploited by a custom DAL.
Results: platform can ingest 130mm ticks/second; 10 years of 1-minute data in under 1 second; 200 instruments × all history × EOD price in under 1 second; much lower TCO; easier hiring of talent.
Hello all! This is Buzz Moschetti. Welcome to today’s webinar entitled “How Financial Services Uses MongoDB”.
If your travel … otherwise, welcome aboard.
Today I’m going to give you some background on what mongoDB is all about, followed by some popular use cases involving mongoDB that we’ve seen emerge in Financial Services – that being wholesale & retail banking and insurance -- and the reasons that motivated the use of it.
First, some quick logistics:
The presentation audio & slides will be recorded and made available to you in about 24 hours.
We have an hour set up but I’ll use about 40 minutes of that for the presentation with some time for Q & A.
You can of course use the webex Q&A box to ask questions at any time
The mongoDB team is monitoring the Q&A box and will answer certain questions in real time.
Questions / themes that are popular will be captured and I will repeat them at the end for the benefit of everyone
If you have technical issues, please send a message to the mongoDB team and they will try to assist you.
Acknowledging this may be new for some percentage of the audience, I’ll spend a few minutes doing an overview of mongoDB.
What is it?
It is a general purpose document store database.
General purpose means CRUD (create, read, update, delete) works similarly to traditional databases, especially RDBMS. Content that is saved is immediately readable, indexed, and available to query through a rich query language. This is a major differentiator in the NoSQL space.
By document we mean a “rich shape” model: not a Word doc or a PDF. Instead of forcing data into a normalized set of rectangles (a.k.a. tables), mongoDB can store shapes that contain lists and subdocuments: we see some hint of that here with the pfxs and phone fields, and we’ll explore it in slightly more detail later on.
We are also OpenSource: there is a vibrant community that contributes to and amplifies the product and solutions around it. As a company, we provide value beyond the basic features including enterprise-ready features such as commercial grade builds, monitoring & management services, authentication security, support, training, and launch services.
Here’s a little bit about us.
HQ in NY, we are 375 employees in eng, presales, consulting, documentation, and community support – and yes, sales too.
Actively supporting the mongoDB ecosystem are the people involved in the 7.2 million downloads of the product to date.
And here’s the logo page you’ve been waiting to see.
The 1000+ paying customers include most of the Fortune 500 and the top retail and wholesale banks in the country, and as you know banks are shy about their logos.
These customers span the spectrum of complexity and performance from small targeted solutions platforms to petabyte installations like CERN and the Large Hadron Collider and many billion document collections with high read/write workloads like craigslist and foursquare.
And why do they use us? Well, for a number of reasons. Our document model and the technology around it is very good – but it’s more than the technology.
Not important to point out the names of our direct competitors here but in comparison we’re clearly the most popular and commercially vibrant NoSQL database, and the talent pool is growing.
The overall community is large enough that, for example, stackoverflow.com has a very active and useful forum for mongoDB and many questions on edge use cases and integration and best practices can be found there.
And this is reflected in….. (turn page).
#5 most popular DB, measured by combination of use, awareness, and activity on the internet
Passed DB2 in Feb.
On track to pass postgres in a month or so.
From there quite a jump to the next tier but still a very good showing – and the only document / rich shape product on the radar.
Here’s another reason for the popularity and strength of the platform: we have 500 partners and are growing by about 10 monthly, far more than others in the NoSQL space.
We have strategic partnerships with progressive companies like Pentaho in BI and AppDynamics for system health and performance monitoring.
And we have certification programs for systems integrators too so you can outsource with confidence.
IBM: Standardizing on BSON, MongoDB query language, and MongoDB wire protocol for DB2 integration, and that sends a very strong signal about our position in this space. Just google for IBM DB2 JSON and you’ll see.
Historically, mongoDB is very cloud friendly and although financial services tend not to use public clouds as much due to personal info and data secrecy issues, the tools and techniques developed in the public clouds for provisioning, monitoring, multitenancy, etc. can be reproduced in private clouds inside your firewall so financial services can get a leg up on that so to speak.
Let’s examine where the technology is positioned.
Here are a few of the most popular types of persistence models in use today.
RDBMS, being the most mature, are deep in functionality – but they are rooted in design principles almost 40 years old. And that comes at the expense of rich interaction with today’s programming languages, design requirements, and infrastructure implementation choices.
Key-value stores, at the other end of the spectrum, act essentially like HashMaps (for those Java programmers in the audience) but are not really general purpose databases.
MongoDB trades some features of a relational database (joins, complex transactions) to enable greater scalability, flexibility, and performance for purpose. By that we mean performance for the operations as executed at the data access layer, not necessarily TPS at the database level.
To compare RDBMS and document modeling, let’s take a simple example of phone numbers for a particular customer.
Even for simple structures – a list of phone number within a customer – the data is split across 2 tables.
What are the consequences?
Managing relationship between customer and phones is non-trivial
This case is the friendly one because the same ID for the customer table is used for phones; that is not always the case, and separate foreign keys must be created and assigned to both tables.
Of course, be mindful of customers WITHOUT phones because this changes common JOIN relationships!
This approach clearly gets more complicated the more “subentities” exist for a particular design – especially those involving lists of plain scalar values
phone_0, phone_1
value_0, value_1, etc.
In mongoDB, you model your data the way it is naturally associated
Lists of things remain lists of things
No extra steps with foreign keys
Just because mongoDB is NoSQL does not mean it is without application-friendly features that are required for a general purpose database
Rich Queries and Aggregation are “expected” functions of a database and mongoDB has powerful offerings for both, complete with primary and secondary index support.
Text, Geo, and MapReduce are extended features of the platform.
NOW – let’s move on to use cases within financial services
Again, we consider Financial Services to be capital markets, retail, and insurance.
Starting with cap mkts, here is a summary of use cases we have developed with customers.
I won’t read through these because you can peruse them at your leisure after the webinar.
Broad swath of areas covered from front to back office.
Of note: Strong cross-asset theme
As we move forward, we’ll see some common patterns emerging from these specific uses, across all financial services.
Retail, with a far larger direct customer base, brings the 360-degree view of the customer with respect to internal (possibly legacy) systems together with modern and exciting concepts such as mobile deployment, alternative rewards programs, and rapid feature-trend development. This is a very top-side kind of activity.
Interestingly, it also focuses on the back end – trade surveillance, risk, threat detection, and other fairly serious sounding and important activities!
You can see that many of the use cases are similar to capital markets.
Insurance is similar to Retail Banking – large direct customer base, 360-degree view of the customer, and marketing / distribution channel optimization capabilities.
Many of the same themes: data consolidation, historical preservation of activity, and cross-asset flexible risk modeling.
In particular, the client-view integration of P&C, life, annuities, and other offerings across what was traditionally very separate aspects of the business (and therefore very separate systems) has had profound effects on the technology, customer relationship management, and targeted business growth.
Let’s get to the heart of it and examine four use case patterns in detail. Pretty much all of the use cases described in the past few pages can be described in a few patterns, which is good architecture.
The patterns are Data Consolidation, Point-Of-Origin, Reference Data Distribution, and Tick Data Management
Starting with Data Consolidation:
Most solutions look like this. Data on the left goes through a series of “processing steps” – and we’ll look at THAT in a moment – and ends up in a giant warehouse.
Why has this been a problem historically?
Largely because of 2 points: Details lost or obscured and inflexible schema to adapt to change. It’s hard enough for the feeder systems to manage their schemas; what happens when everything is brought together into a warehouse? More often than not, you end up with the giant 1000 table data warehouse.
In addition to the Impact points above, this overall design is more expensive than it needs to be especially when you factor in testing regions. Q/A must be ferocious here to ensure that the data is moving left to right smoothly.
At least from a powerpoint view, the mongoDB solution looks similar. Perhaps comfortably so!
So what is different here? What makes a mongoDB hub different than an RDBMS hub? Did we simply drop a green leaf into the picture and raise the victory flag? Couldn’t we realtime enable the RDBMS hub and skip the datamarts and get to a picture that looks like this? Well clearly you could do those things but that’s NOT the critical issue here.
The real issue lies in dynamic schema and low-cost horizontal scaling
Dynamic schema allows the feeder systems to drive the data types and the overall shape of the data instead of having to “reinterpret” this information on the hub
Horizontal scaling means your hub can grow from 10GB to 10TB or more with consistent performance and operational integrity and management including resiliency (HA) and DR (esp multi data center recovery).
In other words, even if you eliminate the marts and make the hub real-time, you will likely end up with a 1000-table, brittle, hard-to-change data hub.
It’s all about The Arrow. The arrow is the single most misleading thing in architecture diagrams today.
The “arrow” represents MUCH more than just “data in A going to B.”
In the traditional approach, almost from the get-go, data is extracted from the RDBMS into CSV or via ETL and immediately begins to lose fidelity. If you think back to the Customer and Phones example before, instead of extracting a complete customer entity, we likely will get two sets of files or worse – a lossy blend that perhaps only provides the first phone number!
After the extract, the loader and the target RDBMS have to have the right schema in place and good luck to an application trying to re-engineer the relationship between some of these things especially as the data shapes change. We all know what happens to CSV based environments when data changes – and that is to make a NEW feed.
In the mongoDB approach, the feeder system can extract entities in as much fidelity and richness of shape as appropriate. Because JSON is self-descriptive, new fields and indeed, complete new substructures can be added without changing the feed environment OR THE TARGET mongoDB HUB!
One of our prouder moments
First feeder systems were plumbed in ONE MONTH
Risk!
Twist on the model: Instead of multiple shapes flowing into a mongoDB store, the mongoDB store is the point-of-origin for rich shapes.
Compared to distributed cache - $ and fixed schema
Many stores: Relational, tick, flat files, caches…
RT Tick data is 150,000 X 3600 X 12 X 10 bytes = ~64GB per day (many tens of GB per day)
10 years of 1 minute data < 1 s
200 inst X all history X EOD price < 1s
Sharding on market and symbol
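A minimal sketch of that sharding setup, issued against a mongos router; the router address and namespace are placeholders:

from pymongo import MongoClient

admin = MongoClient("mongodb://mongos.example.com:27017").admin  # hypothetical router

# Enable sharding on the database, then shard the tick collection on a
# compound key of market and symbol so related ticks stay together.
admin.command("enableSharding", "marketdata")
admin.command("shardCollection", "marketdata.ticks",
              key={"market": 1, "symbol": 1})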
Results:
Once a day data: 4ms for 10,000 rows
READ: 230m ticks/sec via 256 parallel readers
10-15x reduction in network load and negligible decompression (lz4: 1.8Gb/s)
Other things can be stored in mongoDB!