1. MongoDB is well-suited for reference data solutions due to its dynamic and flexible schema, built-in replication and high availability features, and tag-aware sharding, which allows for geographic distribution of data.
2. A case study of a global broker dealer showed how MongoDB could replace expensive and complex ETL processes for distributing reference data, saving over $40 million over 5 years.
3. Key benefits included real-time data distribution, faster querying of local data, and avoiding regulatory penalties from delays in data distribution.
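As a rough illustration of the tag-aware sharding idea in point 1, the sketch below simulates in plain Python how tagging shards with a region can pin documents to shards in that region, so that regional queries stay local. It does not call MongoDB or any driver; the region codes, shard names, and document fields are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field

# Hypothetical tags: each shard is labelled with the region it lives in.
SHARD_TAGS = {
    "shard-ny": "AMER",
    "shard-ldn": "EMEA",
    "shard-hk": "APAC",
}


@dataclass
class Cluster:
    """Toy stand-in for a sharded cluster; not a MongoDB client."""
    shards: dict = field(default_factory=lambda: {name: [] for name in SHARD_TAGS})

    def route(self, doc: dict) -> str:
        """Pick the shard whose tag matches the document's region."""
        for shard, tag in SHARD_TAGS.items():
            if tag == doc["region"]:
                return shard
        raise ValueError(f"no shard tagged for region {doc['region']!r}")

    def insert(self, doc: dict) -> str:
        """Store the document on its regional shard and report where it went."""
        shard = self.route(doc)
        self.shards[shard].append(doc)
        return shard

    def find_local(self, region: str, symbol: str) -> list:
        """A regional query touches only the shard tagged for that region."""
        shard = next(s for s, t in SHARD_TAGS.items() if t == region)
        return [d for d in self.shards[shard] if d["symbol"] == symbol]


cluster = Cluster()
placed = cluster.insert({"region": "EMEA", "symbol": "SYM-1", "currency": "GBP"})
cluster.insert({"region": "AMER", "symbol": "SYM-2", "currency": "USD"})
print(placed)
```

In a real deployment the routing is done by the database from the shard key and the configured tag ranges; the point of the sketch is only that a region prefix in the key lets reads and writes stay on nearby shards.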
To take a “ready, aim, fire” tactic to implement Data Governance, many organizations assess themselves against industry best practices. The process is not difficult or time-consuming and can directly assure that your activities target your specific needs. Best practices are always a strong place to start.
Join Bob Seiner for this popular RWDG topic, where he will provide the information you need to set your program in the best possible direction. Bob will walk you through the steps of conducting an assessment and share with you a set of typical results from taking this action. You may be surprised at how easy it is to organize the assessment and may hear results that stimulate the actions that you need to take.
In this webinar, Bob will share:
- The value of performing a Data Governance best practice assessment
- A practical list of industry Data Governance best practices
- Criteria to determine if a practice is best practice
- Steps to follow to complete an assessment
- Typical recommendations and actions that result from an assessment
Data Architecture Best Practices for Advanced Analytics | DATAVERSITY
Many organizations are immature when it comes to data and analytics use. The answer lies in delivering a greater level of insight from data, straight to the point of need.
There are many Data Architecture best practices today, accumulated from years of practice. In this webinar, William will look at some Data Architecture best practices that he believes have emerged in the past two years and are not yet worked into many enterprise data programs. These are keepers that organizations will need to move toward, by one means or another, so it’s best to mindfully work them into the environment.
Data Lakes are meant to support many of the same analytics capabilities of Data Warehouses while overcoming some of the core problems. Yet Data Lakes have a distinctly different technology base. This webinar will provide an overview of the standard architecture components of Data Lakes.
This will include:
The Lab and the factory
The base environment for batch analytics
Critical governance components
Additional components necessary for real-time analytics and ingesting streaming data
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you or should you consider data mesh as the approach for your analytics platform? And most important - how can Snowflake help?
Given in Montreal on 14-Dec-2021
Good data is like good water: best served fresh, and ideally well-filtered. Data Management strategies can produce tremendous procedural improvements and increased profit margins across the board, but only if the data being managed is of a high quality. Determining how Data Quality should be engineered provides a useful framework for utilizing Data Quality management effectively in support of business strategy, which in turn allows for speedy identification of business problems, delineation between structural and practice-oriented defects in Data Management, and proactive prevention of future issues.
Over the course of this webinar, we will:
Help you understand foundational Data Quality concepts based on “The DAMA Guide to the Data Management Body of Knowledge” (DAMA DMBOK), as well as guiding principles, best practices, and steps for improving Data Quality at your organization
Demonstrate how chronic business challenges for organizations are often rooted in poor Data Quality
Share case studies illustrating the hallmarks and benefits of Data Quality success
There’s growing recognition in the analyst community that reference data is a form of master data that requires its own governance. Locations, currency codes, financial accounts, and organizational hierarchies are so widely used in an organization that mismatches can result in: reconciliation issues, poor quality analytics or even transactional failures.
While it’s easy to see how poor reference data management (RDM) can cause problems, many companies struggle with determining how to get started. Multiple questions arise: What’s the scope? How should one choose between RDM solutions? How do I compute ROI? To answer these questions and more, Orchestra Networks teamed up with Aaron Zornes, Chief Research Officer of the MDM Institute and Godfather of MDM, for: Everything you ever wanted to know about Reference Data (but were afraid to ask).
In this hour-long webcast featuring Aaron Zornes (MDM Institute) and Conrad Chuang (Orchestra Networks) you will learn the:
Characteristics of reference data,
Key features of a reference data management (RDM) solution,
Lessons learned from RDM implementations,
and more
The Data Governance Annual Conference and International Data Quality Conference in San Diego was very good. I recommend this conference for business and IT persons responsible for data quality and data governance. There will be a similar event in Orlando, December 2010. This is the presentation I delivered to a grateful audience.
Analyst field reports on top 20 multi domain MDM solutions - Aaron Zornes (NY... | Aaron Zornes
“Top 10” MDM Evaluation Criteria
Data model
Business services
Identity resolution
Data governance
Architecture
Data management
Infrastructure
Analytics
Developer productivity
Vendor integrity
Data Con LA 2020
Description
In this session, I introduce the Amazon Redshift lake house architecture which enables you to query data across your data warehouse, data lake, and operational databases to gain faster and deeper insights. With a lake house architecture, you can store data in open file formats in your Amazon S3 data lake.
Speaker
Antje Barth, Amazon Web Services, Sr. Developer Advocate, AI and Machine Learning
Strategic Business Requirements for Master Data Management Systems | Boris Otto
This presentation describes strategic business requirements of master data management (MDM) systems. The requirements were developed in a consortium research approach by the Institute of Information Management at the University of St. Gallen, Switzerland, and 20 multinational enterprises.
The presentation was given at the 17th Americas Conference on Information Systems (AMCIS 2011) in Detroit, MI.
The research paper on which this presentation is based can be found here: http://www.alexandria.unisg.ch/Publikationen/Zitation/Boris_Otto/177697
More and more organizations are moving their ETL workloads to a Hadoop-based ELT grid architecture. Hadoop's inherent capabilities, especially its ability to do late binding, address some of the key challenges with traditional ETL platforms. In this presentation, attendees will learn the key factors, considerations and lessons around ETL for Hadoop. Areas covered include the pros and cons of different extract and load strategies, the best ways to batch data, buffering and compression considerations, leveraging HCatalog, data transformation, integration with existing data transformations, the advantages of different ways of exchanging data, and leveraging Hadoop as a data integration layer. This is an extremely popular presentation around ETL and Hadoop.
Building Lakehouses on Delta Lake with SQL Analytics Primer | Databricks
You’ve heard the marketing buzz, maybe you have been to a workshop and worked with some Spark, Delta, SQL, Python, or R, but you still need some help putting all the pieces together? Join us as we review some common techniques to build a lakehouse using Delta Lake, use SQL Analytics to perform exploratory analysis, and build connectivity for BI applications.
Gartner: Master Data Management Functionality | Gartner
Gartner will further examine key trends shaping the future MDM market during the Gartner MDM Summit 2011, 2-3 February in London. More information at www.europe.gartner.com/mdm
Enabling a Data Mesh Architecture with Data Virtualization | Denodo
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations leverage data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack of domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slow nature of centralized data infrastructures in provisioning data and responding to changes.
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
Building End-to-End Delta Pipelines on GCP | Databricks
Delta has been powering many production pipelines at scale in the Data and AI space since it was introduced a few years ago.
Built on open standards, Delta provides data reliability and enhances storage and query performance to support big data use cases (both batch and streaming), fast interactive queries for BI, and machine learning. Delta has matured over the past couple of years on both AWS and Azure and has become the de facto standard for organizations building their Data and AI pipelines.
In today’s talk, we will explore building end-to-end pipelines on the Google Cloud Platform (GCP). Through presentation, code examples and notebooks, we will build the Delta pipeline from ingest to consumption using our Delta Bronze-Silver-Gold architecture pattern and show examples of consuming the Delta files using the BigQuery connector.
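The Bronze-Silver-Gold pattern named in this abstract can be sketched in plain Python, without Spark or Delta: raw records land unchanged (bronze), are validated and typed (silver), then aggregated for consumption (gold). The record fields and the cleaning rule below are assumptions made for illustration and are not taken from the talk.

```python
from collections import defaultdict

# Bronze: raw events kept exactly as ingested, including bad rows.
bronze = [
    {"store": "A", "amount": "10.50"},
    {"store": "B", "amount": "7.25"},
    {"store": "A", "amount": "not-a-number"},  # malformed on purpose
    {"store": "A", "amount": "4.50"},
]


def to_silver(rows):
    """Silver: drop rows that fail validation and cast fields to proper types."""
    clean = []
    for row in rows:
        try:
            clean.append({"store": row["store"], "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # a real pipeline would typically quarantine these rows
    return clean


def to_gold(rows):
    """Gold: business-level aggregate, here total amount per store."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping the bronze layer untouched means the silver and gold layers can be rebuilt whenever the cleaning or aggregation rules change.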
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga... | DataScienceConferenc1
Dragan Berić will take a deep dive into Lakehouse architecture, a game-changing concept bridging the best elements of data lake and data warehouse. The presentation will focus on the Delta Lake format as the foundation of the Lakehouse philosophy, and Databricks as the primary platform for its implementation.
How to Implement Data Governance Best Practice | DATAVERSITY
Data Governance Best Practice is defined as the basis and guidelines for suggested governing activities. Organizations define best practices to be used as a point of comparison when determining their readiness, willingness and actions necessary to put a Data Governance program in place. But what are the best practices and how can they be implemented? This webinar will address these questions and more.
In this RWDG webinar, Bob Seiner will talk about how to create, validate, assess and implement Data Governance Best Practice with immediate impact on present and future Data Governance activities. The result of a Best Practice assessment is a thorough actionable plan focused on demonstrating value from your Data Governance program. This webinar will cover:
• Two Criteria for Data Governance Best Practice Development
• How to Assess against Best Practice to Build Program Success
• Examples of Industry Selected DG Best Practice
• How to Communicate DG Best Practice in a Non-Threatening Way
• How to Build DG Best Practice into Daily Operations
Requirements for a Master Data Management (MDM) Solution - Presentation | Vicki McCracken
Working on Requirements for a Master Data Management solution and looking for thoughts on how to approach the requirements? This is an overview presentation that complements my guide on how to approach requirements for a Master Data Management solution (Requirements for an MDM Solution). You may be able to leverage all or some of the approach described in this guide to formulate your approach.
Data modelling has been around since the mid-1970s, but in many organisations there is considerable scepticism and downright distrust regarding the place data modelling should occupy. So why does data modelling still have to be "sold" in many companies, while in others people simply don't believe it's necessary ("the software package has all I need")? This paper looks at the failure of organisations to capitalise on the benefits data modelling can yield and examines where in the changing information systems landscape modelling is relevant.
Building the Data Lake with Azure Data Factory and Data Lake Analytics | Khalid Salama
In essence, a data lake is a commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides means to ingest data, perform scalable big data processing, and serve information, in addition to managing, monitoring and securing the environment. In these slides, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture of the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build an Azure Data Factory pipeline to ingest data into the data lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
Big data architectures and the data lake | James Serra
With so many new technologies, choosing the best approach to building a big data architecture can get confusing. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs bottom-up approach to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
This is a slide deck that was assembled as a result of months of project work at a global multinational. Collaboration with some incredibly smart people resulted in content that I wish I had come across before having to assemble this.
At wetter.com we build analytical B2B data products and heavily use Spark and AWS technologies for data processing and analytics. I explain why we moved from AWS EMR to Databricks and Delta and share our experiences from different angles like architecture, application logic and user experience. We will look at how security, cluster configuration, resource consumption and workflow changed by using Databricks clusters, as well as how using Delta tables simplified our application logic and data operations.
Webinar: How MongoDB is Used to Manage Reference Data - May 2014 | MongoDB
Managing and distributing reference data globally has always been a challenge for financial institutions. Managing and maintaining database schemas while integrating and replicating that data across geographies is costly and time consuming. MongoDB's native replication capabilities and partitioned architecture make it simple to distribute and synchronize data efficiently across the globe. MongoDB’s dynamic schema dramatically reduces database maintenance for schema migrations: data structure changes can be applied with no downtime, and with no impact to existing applications. For example, by migrating its reference data management application to MongoDB, a Tier 1 bank dramatically reduced the license and hardware costs associated with the proprietary relational database it previously ran.
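To illustrate the dynamic-schema point, here is a minimal sketch that uses plain Python dictionaries as stand-ins for documents in a single collection. It shows a new field appearing on newer documents while an existing reader keeps working unchanged. The identifiers and field names are hypothetical, and no MongoDB driver is involved.

```python
# Documents written before and after a structure change share one collection.
securities = [
    {"id": "SEC-1", "name": "Example Bond", "currency": "USD"},
    # The newer document adds a field; the older document is not rewritten.
    {"id": "SEC-2", "name": "Example Equity", "currency": "GBP", "settlement_days": 2},
]


def describe(doc):
    """Existing reader: relies only on fields that were always present."""
    return f"{doc['id']} {doc['name']} ({doc['currency']})"


def settlement_days(doc, default=None):
    """New reader: tolerates documents written before the field existed."""
    return doc.get("settlement_days", default)


labels = [describe(d) for d in securities]
days = [settlement_days(d) for d in securities]
print(labels, days)
```

The trade-off is that readers of newer fields must handle their absence, which is the application-side counterpart of skipping a schema migration.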
MDM Institute: Why is Reference data mission critical now? | Orchestra Networks
Learn why market-leading enterprises are focusing on RDM in this exclusive webinar from MDM research analyst Aaron Zornes.
More than 55% of large enterprises surveyed by the MDM Institute are planning on implementing reference data management (RDM) in the next 18 months.
Why is RDM mission critical today?
How does RDM differ from (how is it similar to) MDM?
What are the top business drivers for RDM?
What are the “top 10” technical evaluation criteria?
Where are most organizations focusing their RDM efforts?
Aaron Zornes, Chief Research Officer of the MDM Institute, answers these questions and more when he reveals findings from the first ever RDM market study based on a 1Q2014 survey of 75+ global 5000 size enterprises.
An Elastic Metadata Store for eBay’s Media PlatformMongoDB
To build robust, multi-tenant, highly available storage services that meet the business' SLAs, your databases have to be sharded. But if your service has to scale continuously through incremental additions of storage, without service interruption or human intervention, basic static sharding is not enough. At eBay, we are building MStore to solve this problem, with MongoDB as the storage engine. In this presentation, we will dive into the key design concepts of this solution.
Apache Camel is a very popular integration library that works very well with microservice architecture.
This talk introduces you to Apache Camel and how you can easily get started with Camel on your computer.
Then we cover how to create new Camel projects from scratch as micro services which you can boot using Camel or Spring Boot, or other micro containers such as Jetty or fat JARs. We then take a look at what options you have for monitoring and managing your Camel microservices using tooling such as Jolokia and the hawtio web console.
The second part of this talk is about running Camel in the cloud. We start by showing you how you can use the Maven Docker Plugin to create a docker image of your Camel application and run it using docker on a single host. Then kubernetes enters the stage and we take a look at how you can deploy your docker images on a kubernetes cloud platform, and how the fabric8 tooling can make this much easier for Java developers.
At the end of this talk you will have learned about and seen in practice how to take a Java Camel project from scratch, turn that into a docker image, and how you can deploy those docker images in a scalable cloud platform based on Google's kubernetes.
How to identify the correct Master Data subject areas & tooling for your MDM...Christopher Bradley
1. What are the different Master Data Management (MDM) architectures?
2. How can you identify the correct Master Data subject areas & tooling for your MDM initiative?
3. A reference architecture for MDM.
4. Selection criteria for MDM tooling.
chris.bradley@dmadvisors.co.uk
MongoDB London 2013: Real World MongoDB: Use Cases from Financial Services pr...MongoDB
Huge upheaval in the finance industry has led to a major strain on existing IT infrastructure and systems. New finance industry regulation has meant increased volume, velocity and variability of data. This coupled with cost pressures from the business has led these institutions to seek alternatives. In this session learn how FS companies are using MongoDB to solve their problems. The use cases are specific to FS but the patterns of usage - agility, scale, global distribution - will be applicable across many industries.
Adoption of MongoDB has accelerated tremendously among developers in the past 18 months, and many large enterprises have now deployed MongoDB in reliable and large scale production environments. However, for many developers, it remains a challenge to convince production teams and business stakeholders to adopt an open source technology that has not been certified yet by their IT teams. This session will provide you with the compelling arguments to reassure business and production teams such as:
Public customer references and real-world case studies (migration, and adoption stories)
Deployment support and practices for robustness
How MongoDB contributes to your company’s business value
Enterprise Reporting with MongoDB and JasperSoftMongoDB
Presented by Daniel Roberts, Senior Solutions Architect at MongoDB, at the recent JasperWorld event during the London leg of their current European city tour.
About the Speaker, Daniel Roberts:
Prior to MongoDB Daniel worked at Oracle for 11 years in a number of different positions, including Oracle's middleware technologies and strategy. Prior roles include consulting, product management, business development and more recently as a solution architect for financial services. Daniel has also worked for Novell, ICL and as a freelance contractor. He has a degree in Computer Science from Nottingham Trent University in the UK.
Overview of RDBMS, MongoDB, Cassandra, Hadoop, MarkLogic and a two-fold approach to building a regulatory platform.
Talk at MarkLogic User Group Benelux meetup December 2016 (Utrecht)
Webinar: How to Drive Business Value in Financial Services with MongoDBMongoDB
Huge upheaval in the finance industry has led to a major strain on existing IT infrastructure and systems. New finance industry regulation has meant increased volume, velocity and variability of data. This coupled with cost pressures from the business has led these institutions to seek alternatives. Top tier institutions like MetLife have turned to MongoDB because of the enormous business value it enables.
In this session, hear how MongoDB enabled these successful real world examples:
Single View of a Customer - 3 months and $2M for a single view of a customer across 50 source systems
Reference Data Management - $40M in cost savings from migrating to MongoDB for reference data management
Private cloud - MongoDB as a PaaS across a tier 1 bank for enabling agility for operations, not just the developer
The use cases are specific to financial services but the patterns of usage - agility, scale, global distribution - will be applicable across many industries.
How to Place Data at the Center of Digital Transformation in BFSIDenodo
Watch full webinar here: https://bit.ly/3j7E9Jo
Consumers are increasingly using digital banking tools and insurance models, and these numbers will only continue to grow. Financial and insurance organizations have to adapt to the new and always changing situation while complying with new regulations, such as IFRS17, and embracing ESG criteria.
At the heart of any digital transformation is data. Therefore, it is not a stretch to say that data management and analytics strategies differentiate many of the leaders from the laggards in the banking, financial services and insurance (BFSI) industry. BFSI organizations still relying on slow, traditional systems and data management processes will find themselves falling behind their competition. In addition, as many adopt cloud strategies, these traditional approaches fill the cloud modernization process with downtime and end user frustration. In fact, according to a McKinsey article, cloud combined with distributed data infrastructure will define how consumers and providers adopt digital insurance models for the next decade.
Hear how the BFSI industry is leveraging data virtualization to deploy data fabric or data mesh architectures for enterprise-wide digital transformation.
Join this webinar to learn:
- The latest trends in BFSI for 2023 and how data and analytics is reshaping the industry
- How a logical data architecture can help you capitalize on your data
- How Denodo customers digitally transformed themselves using the Denodo Platform
Enabling Telco to Build and Run Modern Applications Tugdual Grall
See how new databases like MongoDB enable Telco Enterprises to Build and Run Modern Applications.
This presentation was delivered in Tel Aviv in Jan-2015 during a Telco round table organized by Matrix.
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB
The MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB.
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB
The MongoDB Kubernetes operator is ready for prime-time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB
Query performance should be the unsung hero of an application, but without proper configuration it can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices for building multikey indexes.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app...MongoDB
... to Core Data, valued by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB: Upply: When Machine Learning...MongoDB
It has never been easier to order online and have goods delivered in under 48 hours, very often for free. This ease of use hides a complex market worth more than $8 trillion.
Data is familiar territory in the Supply Chain world (routes, information about goods, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise with Data Science, Upply is redefining the fundamentals of the Supply Chain, helping every player overcome the volatility and inefficiency of the market.
6. Relational Database Challenges
Data Types
• Unstructured data
• Semi-structured data
• Polymorphic data
Agile Development
• Iterative
• Short development cycles
• New workloads
Volume of Data
• Petabytes of data
• Trillions of records
• Tens of millions of queries per second
New Architectures
• Horizontal scaling
• Commodity servers
• Cloud computing
7. Financial Services Use Cases
1. Risk Analysis & Reporting
2. Tick Data Capture & Analysis
3. Portfolio and P&L Reporting
4. Product Catalog and Trade Lifecycle Management
5. Trade Repository
6. Quantitative Analysis & Automated Trading
7. Order Capture
8. Reference Data
8. Reference Data
• How do you globally distribute reference data?
– Polymorphic data
• Price / Products / Securities Master
• Counterparty information - KYC
• Corporate Actions
• Golden / single source of truth
– Often changing in structure,
• e.g. new products
– Often High volume
• How is this typically solved today?
9. Current Implementations
• What do reference data solutions look like today?
• Storage
– Relational Database or Caching Technologies
• Replication
– ETL or Messaging
• Complex, Costly and Brittle
– Maintenance
• schema changes
• infrastructure
– Multiple technologies
10. Why MongoDB?
• What features in MongoDB are ideally suited for
Globally replicated reference data systems?
1. Dynamic and flexible schema
11. Relational: All Data is Column/Row
IssID    IssuerName                   PVCurrency
117883   DWS Vietnam Fund             USD
69461    Independence III Cdo Ltd     USD
102862   Zamano Plc                   EUR
73277    Green Way                    BMD
65134    First European Growth Inc.   CHF

SecID    EventID   Company_Meeting   IssID
762288   407341    AGM               117883
81198    243459    SDCHG             69461
422999   410626    AGM               102862
422999   243440    SDCHG             102862
75128    20056     ISCHG             65134
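To make the contrast with a document model concrete, the sketch below folds the two relational tables from slide 11 into one document per issuer, with each issuer's events embedded. It is a minimal illustration in plain Python, not code from the presentation. The field names (`issuerName`, `pvCurrency`, `events`, `secId`, `eventId`, `type`) and the helper `to_documents` are assumptions chosen for this example.

```python
from collections import defaultdict

# Rows copied from the two relational tables on slide 11.
issuers = [
    (117883, "DWS Vietnam Fund", "USD"),
    (69461, "Independence III Cdo Ltd", "USD"),
    (102862, "Zamano Plc", "EUR"),
    (73277, "Green Way", "BMD"),
    (65134, "First European Growth Inc.", "CHF"),
]
events = [
    (762288, 407341, "AGM", 117883),
    (81198, 243459, "SDCHG", 69461),
    (422999, 410626, "AGM", 102862),
    (422999, 243440, "SDCHG", 102862),
    (75128, 20056, "ISCHG", 65134),
]


def to_documents(issuer_rows, event_rows):
    """Fold event rows into their parent issuer, one document per issuer."""
    by_issuer = defaultdict(list)
    for sec_id, event_id, event_type, iss_id in event_rows:
        by_issuer[iss_id].append(
            {"secId": sec_id, "eventId": event_id, "type": event_type}
        )
    return [
        {
            "_id": iss_id,
            "issuerName": name,
            "pvCurrency": currency,
            # Issuers without events get an empty list, not a missing row.
            "events": by_issuer.get(iss_id, []),
        }
        for iss_id, name, currency in issuer_rows
    ]


documents = to_documents(issuers, events)
zamano = next(d for d in documents if d["_id"] == 102862)
print(zamano["issuerName"], [e["type"] for e in zamano["events"]])
```

Reading an issuer together with its events then takes a single lookup by `_id` rather than a join on `IssID`.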
13. Benefits of MongoDB’s Document Model
• Expressiveness of Data Modeling
– A single document can express and encompass a wide variety of notions
• Flexible Modeling
– No need to migrate for simple extensions
• Simplification of Data Modeling
– Fewer collections as most data can be encapsulated in a single document
• Easier Development
– Developers understand documents as it maps well to their data structures
• Faster Time to Market
– Agile development means faster results
And enables better data locality => faster performance and scaling
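The "no need to migrate for simple extensions" point can be sketched without a database at all. In the example below a plain Python list stands in for a collection. The documents, their field names and the `find` helper are all hypothetical and only illustrate the idea; this is not MongoDB's query API.

```python
# A plain list stands in for a collection in this sketch.
securities = [
    {"_id": 1, "type": "equity", "ticker": "ABC", "currency": "USD"},
    {"_id": 2, "type": "bond", "coupon": 4.25, "maturity": "2030-06-30",
     "currency": "EUR"},
]

# A new product type arrives with fields no earlier document has.
# No schema change is needed; the document is simply added.
securities.append(
    {"_id": 3, "type": "swap", "currency": "USD",
     "legs": [{"side": "pay", "rate": "fixed"},
              {"side": "receive", "rate": "floating"}]}
)


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion.

    Documents that lack a field simply do not match, so readers written
    before the new shape existed keep working unchanged.
    """
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


usd = find(securities, currency="USD")
bonds = find(securities, type="bond")
print([d["_id"] for d in usd], [d["_id"] for d in bonds])
```

The equity, bond and swap documents share only the fields they have in common, which is the polymorphism the earlier reference data slide describes.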
14. Why MongoDB?
• What features in MongoDB are ideally suited for
Globally replicated reference data systems?
1. Dynamic and flexible schema
2. Built in replication and high availability
15. High Availability
• Automated replication and failover
• Multi-data center support
• Improved operational simplicity (e.g., HW swaps)
• Data durability and consistency
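As a rough illustration of multi-data center support, the sketch below lays out a five-member replica set across three sites and checks whether a primary could still be elected if any one site were lost. The dictionary follows the general shape of a MongoDB replica set configuration document (`_id`, `members` with `host`, `priority` and `tags`). The set name, host names and tag values are hypothetical, and the check assumes every member has a vote.

```python
# Shape follows MongoDB's replica set configuration document
# (_id, members[]._id, members[].host, members[].priority, members[].tags).
# The set name, host names and tag values below are hypothetical.
replica_set_config = {
    "_id": "refdata",
    "members": [
        {"_id": 0, "host": "nyc-a.example.net:27017", "priority": 2,
         "tags": {"dc": "nyc"}},
        {"_id": 1, "host": "nyc-b.example.net:27017", "priority": 1,
         "tags": {"dc": "nyc"}},
        {"_id": 2, "host": "ldn-a.example.net:27017", "priority": 1,
         "tags": {"dc": "ldn"}},
        {"_id": 3, "host": "ldn-b.example.net:27017", "priority": 1,
         "tags": {"dc": "ldn"}},
        {"_id": 4, "host": "hkg-a.example.net:27017", "priority": 1,
         "tags": {"dc": "hkg"}},
    ],
}


def majority(member_count):
    """Votes needed to elect a primary: more than half of the voting members."""
    return member_count // 2 + 1


def survives_loss_of(config, dc):
    """True if a primary can still be elected after one data center is lost."""
    members = config["members"]
    remaining = [m for m in members if m["tags"]["dc"] != dc]
    return len(remaining) >= majority(len(members))


for dc in ("nyc", "ldn", "hkg"):
    print(dc, survives_loss_of(replica_set_config, dc))
```

A 2-2-1 layout is used here because no single site holds a majority on its own, so losing any one of them leaves at least three of five members available.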
17. Why MongoDB?
• What features in MongoDB are ideally suited for
Globally replicated reference data systems?
1. Dynamic and flexible schema
2. Built in replication and high availability
3. Tag Aware Sharding (Geo)
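Tag aware sharding pins ranges of the shard key to shards that carry a matching tag, which is what allows data to be kept in a particular geography. In the mongo shell of that era this was set up with helpers such as `sh.addShardTag` and `sh.addTagRange`. The sketch below only models the resulting placement rule in plain Python. The shard names, tags, the `(region, id)` shard key and both helper functions are assumptions made for illustration.

```python
import math

# Hypothetical shards and the tags assigned to them.
shard_tags = {
    "shard-nyc": {"AMER"},
    "shard-ldn": {"EMEA"},
    "shard-hkg": {"APAC"},
}

# Tag ranges over an assumed compound shard key (region, id).
# As with MongoDB tag ranges, the lower bound is inclusive and the
# upper bound is exclusive.
tag_ranges = [
    (("AMER", -math.inf), ("AMER", math.inf), "AMER"),
    (("APAC", -math.inf), ("APAC", math.inf), "APAC"),
    (("EMEA", -math.inf), ("EMEA", math.inf), "EMEA"),
]


def tag_for(shard_key):
    """Return the tag whose range contains the shard key, if any."""
    for lower, upper, tag in tag_ranges:
        if lower <= shard_key < upper:
            return tag
    return None


def eligible_shards(shard_key):
    """Shards allowed to hold the given shard key value."""
    tag = tag_for(shard_key)
    if tag is None:
        # Keys outside every tagged range may be placed on any shard.
        return sorted(shard_tags)
    return sorted(s for s, tags in shard_tags.items() if tag in tags)


print(eligible_shards(("EMEA", 102862)))
print(eligible_shards(("LATAM", 1)))
```

Putting the region first in the shard key is what lets a single range cover all of a geography's documents.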
20. Case Study: Global Broker Dealer - Reference Data Management
[Diagram: feeds and batch data load a source master data store (RDBMS), which is copied by many separate ETL jobs to destination data stores (RDBMS)]
Feeds & Batch data
• Pricing
• Accounts
• Securities Master
• Corporate actions
Each ETL represents
• People $
• Hardware $
• License $
• Reg penalty $
• & other downstream problems
21. Solution with MongoDB
[Diagram: feeds and batch data load a MongoDB primary, which replicates in real time to MongoDB secondaries]
Feeds & Batch data
• Pricing
• Accounts
• Securities Master
• Corporate actions
Each real-time link represents
• No people $
• Less hardware $
• Less license $
• No penalty $
• & many fewer problems
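The difference between batch ETL and replication can be pictured with a toy model: the primary records every write in an ordered log, and each secondary applies whatever it has not yet seen. This is a deliberately simplified sketch of log-based replication in plain Python, not MongoDB's implementation. The class names, site names and sample keys are hypothetical.

```python
class Primary:
    """Accepts writes and records each one in an ordered operation log."""

    def __init__(self):
        self.data = {}
        self.oplog = []

    def upsert(self, key, doc):
        self.data[key] = dict(doc)
        self.oplog.append((key, dict(doc)))


class Secondary:
    """Keeps a local copy by applying log entries it has not yet seen."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # position reached in the primary's log

    def sync_from(self, primary):
        for key, doc in primary.oplog[self.applied:]:
            self.data[key] = dict(doc)
        self.applied = len(primary.oplog)


primary = Primary()
secondaries = {"london": Secondary(), "hong_kong": Secondary()}

primary.upsert("SEC-1", {"price": 101.5})
primary.upsert("SEC-2", {"price": 99.0})
primary.upsert("SEC-1", {"price": 101.7})  # later write wins

for secondary in secondaries.values():
    secondary.sync_from(primary)

print(secondaries["london"].data == primary.data)
```

Because each secondary tracks its own position in the log, a site that falls behind catches up by applying only the missing entries instead of waiting for the next full batch load.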
22. Case Study: Global investment bank
Distribute reference data globally in real-time for fast local accessing and querying
Problem
• Delays up to 20 hours in distributing data via ETL
• Had to manage 20 distributed systems with same data
• Incurring regulatory penalties from missing SLAs
• Stale data caused operational issues
Why MongoDB
• Dynamic schema management: update immediately & in one place
• Auto-replication: data distributed in real-time
• Both cache and database: cache always up-to-date
• Simple data modeling & analysis: easy changes and understanding
Results
• Will save about $40,000,000 in costs and penalties over 5 years
• Greater throughput means charging more to internal groups
• Network and disk speed is the bottleneck, not software and applications
23. Summary
• Why MongoDB for Reference Data solutions?
1. Dynamic and flexible schema
2. Built in replication and high availability
3. Tag Aware Sharding (Geo)
24. Q&A
Up And Coming
FS webinar in April - Tick database
• http://www.10gen.com/webinar/using-mongodb-as-tick-database
FS webinar in April - Risk
• http://www.10gen.com/webinar/mitigate-risk-with-mongodb
MongoDB Days - London, San Francisco, and NYC
• http://www.10gen.com/events
MongoDB 2.4 Release
• http://www.mongodb.org/downloads
25. Key Features
• JSON Data Model with Dynamic Schema
• Auto-Sharding for Horizontal Scalability
• Rich, Document-Based Queries
• Flexible, Full Index Support
• Built-In Replication and High Availability
• Fast, In-Place Updates
• Aggregation Framework and Map/Reduce
• GridFS for Large File Storage
26. For More Information
Resource                Location
MongoDB Downloads       www.mongodb.org/download
Free Online Training    education.10gen.com
Webinars and Events     www.10gen.com/events
White Papers            www.10gen.com/white-papers
Customer Case Studies   www.10gen.com/customers
Presentations           www.10gen.com/presentations
Documentation           docs.mongodb.org
Additional Info         info@10gen.com