The document summarizes a MongoDB event focused on modernizing mainframe applications. The event agenda includes presentations on moving from mainframes to operational data stores, a demo of a mainframe offloading solution from Quantyca, and stories of mainframe modernization. Benefits of using MongoDB for mainframe modernization include a 5-10x increase in developer productivity and an 80% reduction in mainframe costs.
Change Data Streaming Patterns for Microservices With Debezium - Confluent
(Gunnar Morling, RedHat) Kafka Summit SF 2018
Debezium (noun | de·be·zi·um | /dɪ:ˈbɪ:ziːəm/): secret sauce for change data capture (CDC); streaming changes from your datastore enables you to solve multiple challenges: synchronizing data between microservices, gradually extracting microservices from existing monoliths, maintaining different read models in CQRS-style architectures, updating caches and full-text indexes, and feeding operational data to your analytics tools.
Join this session to learn what CDC is about, how it can be implemented using Debezium, an open-source CDC solution based on Apache Kafka, and how it can be utilized for your microservices. Find out how Debezium captures all the changes from datastores such as MySQL, PostgreSQL and MongoDB, how to react to the change events in near real time, and how Debezium is designed not to compromise on data correctness and completeness even if things go wrong. In a live demo we’ll show how to set up a change data stream out of your application’s database without any code changes needed. You’ll see how to sink the change events into other databases and how to push data changes to your clients using WebSockets.
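For a sense of what "without any code changes" means in practice, here is a hypothetical sketch of registering a Debezium MySQL connector through the Kafka Connect REST API. Hostnames, credentials, and table names are placeholders, and property names vary across Debezium versions.

```python
import json
import requests

# Connector definition posted to Kafka Connect; all values below are
# illustrative and not taken from the talk.
connector = {
    "name": "inventory-connector",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",            # assumed database host
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",      # becomes the topic prefix
        "table.whitelist": "inventory.customers", # 2018-era property name
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
    },
}

# Kafka Connect exposes connector management on port 8083 by default.
resp = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
print(resp.json())
```

Once registered, change events for the whitelisted tables start flowing into Kafka topics with no change to the application writing to MySQL.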
Apache Spark Streaming in K8s with ArgoCD & Spark Operator - Databricks
Over the last year, we have been moving from a batch-processing job setup with Airflow on EC2 instances to a powerful and scalable setup using Airflow and Spark in K8s.
The increasing need to keep up with technology changes, new community advances, and multidisciplinary teams forced us to design a solution that could run multiple Spark versions at the same time, avoiding duplicated infrastructure and simplifying deployment, maintenance, and development.
AWS delivers an integrated suite of services that provide everything needed to quickly and easily build and manage a data lake for analytics. AWS-powered data lakes can handle the scale, agility, and flexibility required to combine different types of data and analytics approaches to gain deeper insights, in ways that traditional data silos and data warehouses cannot. In this session, we will show you how you can quickly build a data lake on AWS that ingests, catalogs and processes incoming data and makes it ready for analysis. Using a live demo, we demonstrate the capabilities of AWS provided analytical services such as AWS Glue, Amazon Athena and Amazon EMR and how to build a Data Lake on AWS step-by-step.
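As a hedged illustration of the kind of step such a demo walks through, here is a minimal boto3 sketch that submits an Athena query against a Glue-cataloged table. Region, database, table, and bucket names are invented.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Run an ad hoc SQL query over data sitting in S3; results land in the
# output bucket as CSV.
run = athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS hits FROM access_logs GROUP BY page",
    QueryExecutionContext={"Database": "datalake_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(run["QueryExecutionId"])
```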
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud - Noritaka Sekiyama
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud (Hadoop / Spark Conference Japan 2019)
# English version #
http://hadoop.apache.jp/hcj2019-program/
Build a simple data lake on AWS using a combination of services, including AWS Glue Data Catalog, AWS Glue Crawlers, AWS Glue Jobs, AWS Glue Studio, Amazon Athena, Amazon Relational Database Service (Amazon RDS), and Amazon S3.
Link to the blog post and video: https://garystafford.medium.com/building-a-simple-data-lake-on-aws-df21ca092e32
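For a flavor of wiring one of these pieces together, here is a hypothetical boto3 sketch that creates and starts a Glue crawler over a raw S3 prefix. The role ARN, database, and bucket path are placeholders, not values from the post.

```python
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Create a crawler that catalogs raw files under an S3 prefix so they
# become queryable tables in the Glue Data Catalog.
glue.create_crawler(
    Name="raw-zone-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # placeholder
    DatabaseName="datalake_db",
    Targets={"S3Targets": [{"Path": "s3://my-data-lake/raw/"}]},
)
glue.start_crawler(Name="raw-zone-crawler")
```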
Achieving Lakehouse Models with Spark 3.0 - Databricks
It’s very easy to be distracted by the latest and greatest approaches with technology, but sometimes there’s a reason old approaches stand the test of time. Star schemas and Kimball are one of those things that isn’t going anywhere, but as we move towards the “Data Lakehouse” paradigm, how appropriate is this modelling technique, and how can we harness the Delta Engine and Spark 3.0 to maximise its performance?
Amazon RDS allows you to launch an optimally configured, secure and highly available database with just a few clicks. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. We’ll discuss Amazon RDS fundamentals, learn about the six available database engines (with the seventh on the way), and examine customer success stories.
Amazon RDS enables you to launch an optimally configured, secure, and highly available relational database with just a few clicks. It provides cost-efficient and resizable capacity while managing time-consuming administration tasks, freeing you to focus on your applications and business. In this session, we take a closer look at how Amazon RDS works, and we review best practices to achieve performance, flexibility, and cost savings for your MySQL, PostgreSQL, MariaDB, Oracle, and SQL Server databases on Amazon RDS. We also discuss AWS Database Migration Service, a quick and secure means for migrating your existing relational database management system investments to Amazon RDS.
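For illustration, launching such an instance programmatically might look like the minimal boto3 sketch below. Identifiers and credentials are placeholders, and a production setup would add networking, backup, and parameter-group options.

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Launch a small Multi-AZ PostgreSQL instance; all names are invented.
rds.create_db_instance(
    DBInstanceIdentifier="app-postgres",
    DBInstanceClass="db.t3.medium",
    Engine="postgres",
    MasterUsername="admin_user",
    MasterUserPassword="change-me-please",  # placeholder credential
    AllocatedStorage=20,                    # GiB
    MultiAZ=True,                           # standby replica for high availability
)
```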
Architect’s Open-Source Guide for a Data Mesh Architecture - Databricks
Data Mesh is an innovative concept addressing many data challenges from an architectural, cultural, and organizational perspective. But is the world ready to implement Data Mesh?
In this session, we will review the importance of core Data Mesh principles, what they can offer, and when it is a good idea to try a Data Mesh architecture. We will discuss common challenges with implementation of Data Mesh systems and focus on the role of open-source projects for it. Projects like Apache Spark can play a key part in standardized infrastructure platform implementation of Data Mesh. We will examine the landscape of useful data engineering open-source projects to utilize in several areas of a Data Mesh system in practice, along with an architectural example. We will touch on what work (culture, tools, mindset) needs to be done to ensure Data Mesh is more accessible for engineers in the industry.
The audience will leave with a good understanding of the benefits of Data Mesh architecture, common challenges, and the role of Apache Spark and other open-source projects for its implementation in real systems.
This session is targeted at architects, decision-makers, data engineers, and system designers.
Database Migration Using AWS DMS and AWS SCT (GPSCT307) - AWS re:Invent 2018 - Amazon Web Services
Database migrations are an important step in any journey to AWS. In this session, we show you how to get started with AWS Database Migration Service (AWS DMS) and AWS Schema Conversion Tool (AWS SCT) to quickly and securely migrate your databases to AWS. Learn how to simplify your database migrations by using this service to migrate your data to and from commercial and open-source databases. We also explain how you can perform homogenous migrations such as MySQL to MySQL, as well as heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora.
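For illustration only (not part of the session), here is a minimal boto3 sketch that starts a full-load-plus-CDC replication task between two pre-created DMS endpoints. All ARNs are placeholders.

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Table mapping: replicate every table in the SALES schema.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-sales-schema",
        "object-locator": {"schema-name": "SALES", "table-name": "%"},
        "rule-action": "include",
    }]
}

# Full load of existing data, then ongoing change data capture.
dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-aurora",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:source",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:target",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:instance",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```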
A Thorough Comparison of Delta Lake, Iceberg and Hudi - Databricks
Recently, a set of modern table formats such as Delta Lake, Hudi, and Iceberg has sprung up. Along with the Hive Metastore, these table formats are trying to solve long-standing problems of traditional data lakes with declared features like ACID transactions, schema evolution, upserts, time travel, incremental consumption, etc.
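To ground two of those declared features, here is a short PySpark sketch against Delta Lake showing time travel and an upsert via merge. Paths and key names are invented; Iceberg and Hudi expose analogous operations.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes Delta Lake is on the classpath

# Time travel: read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/data/events")

# Upsert: merge a batch of updates into the table by key.
target = DeltaTable.forPath(spark, "/data/events")
updates = spark.read.parquet("/data/events_updates")  # placeholder source
(target.alias("t")
    .merge(updates.alias("s"), "t.event_id = s.event_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```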
Data Quality Patterns in the Cloud with Azure Data Factory - Mark Kromer
This is my slide presentation from Pragmatic Works' Azure Data Week 2019: Data Quality Patterns in the Cloud with Azure Data Factory using Mapping Data Flows
Apache Iceberg - A Table Format for Huge Analytic Datasets - Alluxio, Inc.
Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Apache Iceberg - A Table Format for Huge Analytic Datasets
Speaker:
Ryan Blue, Netflix
For more Alluxio events: https://www.alluxio.io/events/
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ... - Amazon Web Services
Get a look under the covers: Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and improve overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management, tune your queries, and use Amazon Redshift's interleaved sorting features. You’ll then hear from a customer who has leveraged Redshift in their industry and how they have adopted many of the best practices. Learn More: https://aws.amazon.com/government-education/
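As a hedged sketch of two of those tuning levers, the snippet below issues a table DDL with a distribution key and an interleaved sort key from Python. The cluster endpoint and credentials are placeholders; Redshift speaks the PostgreSQL wire protocol, so psycopg2 works as a client.

```python
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    port=5439, dbname="analytics", user="awsuser", password="change-me",
)

with conn, conn.cursor() as cur:
    # Distribute on the join key so co-located joins avoid shuffling;
    # an interleaved sort key helps when queries filter on several columns.
    cur.execute("""
        CREATE TABLE sales (
            sale_id     BIGINT,
            customer_id BIGINT,
            sale_date   DATE,
            amount      DECIMAL(12,2)
        )
        DISTSTYLE KEY DISTKEY (customer_id)
        INTERLEAVED SORTKEY (sale_date, customer_id);
    """)
```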
The Zen of DataOps – AWS Lake Formation and the Data Supply Chain Pipeline - Amazon Web Services
Many organizations have adopted or are in the process of adopting DevOps methodologies in their quest to accelerate the delivery of software capabilities, features, and functionalities to support their organizational objectives. By applying the same practices, DataOps aims to provide the same level of agility in delivering data and information to the organization. AWS Lake Formation, in coordination with other AWS Services, enables DevOps methodologies to be realized through the Data Supply Chain Pipeline.
by Darin Briskman, Technical Evangelist, AWS
Database Freedom means being able to use the database engine that’s right for you as your needs evolve. Being locked into a specific technology can prevent you from achieving your mission. Fortunately, AWS Database Migration Service makes it easy to switch between different database engines. We’ll look at how to use the AWS Schema Conversion Tool with DMS to switch from a commercial database to open source. You’ll need a laptop with a Firefox or Chrome browser.
At wetter.com we build analytical B2B data products and heavily use Spark and AWS technologies for data processing and analytics. I explain why we moved from AWS EMR to Databricks and Delta and share our experiences from different angles, like architecture, application logic, and user experience. We will look at how security, cluster configuration, resource consumption, and workflows changed by using Databricks clusters, as well as how using Delta tables simplified our application logic and data operations.
Building the Data Lake with Azure Data Factory and Data Lake Analytics - Khalid Salama
In essence, a data lake is a commodity distributed file system that acts as a repository to hold raw data file extracts of all the enterprise source systems, so that it can serve the data management and analytics needs of the business. A data lake system provides the means to ingest data, perform scalable big data processing, and serve information, in addition to managing, monitoring, and securing the environment. In these slides, we discuss building data lakes using Azure Data Factory and Data Lake Analytics. We delve into the architecture of the data lake and explore its various components. We also describe the various data ingestion scenarios and considerations. We introduce the Azure Data Lake Store, then we discuss how to build an Azure Data Factory pipeline to ingest the data lake. After that, we move into big data processing using Data Lake Analytics, and we delve into U-SQL.
Big Data Analytics Architectural Patterns and Best Practices (ANT201-R1) - AW... - Amazon Web Services
In this session, we discuss architectural principles that help simplify big data analytics.
We'll apply principles to various stages of big data processing: collect, store, process, analyze, and visualize. We'll discuss how to choose the right technology in each stage based on criteria such as data structure, query latency, cost, request rate, item size, data volume, durability, and so on.
Finally, we provide reference architectures, design patterns, and best practices for assembling these technologies to solve your big data problems at the right cost.
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
Level: Intermediate
Speakers:
Ryan Malecky - Solutions Architect, EdTech, AWS
Rajakumar Sampathkumar - Sr. Technical Account Manager, AWS
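To make the shape of such a generated job concrete, here is a minimal sketch in the style of a Glue PySpark ETL script. Database, table, and bucket names are invented, and real generated scripts vary with the source and target you point Glue at.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap: resolve arguments and initialize the job.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read from the Data Catalog, drop a junk field, and write Parquet to S3.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="datalake_db", table_name="raw_orders")
cleaned = dyf.drop_fields(["_corrupt_record"])
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={"path": "s3://my-data-lake/curated/orders/"},
    format="parquet",
)
job.commit()
```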
Hyperspace is a recently open-sourced (https://github.com/microsoft/hyperspace) indexing sub-system from Microsoft. The key idea behind Hyperspace is simple: Users specify the indexes they want to build. Hyperspace builds these indexes using Apache Spark, and maintains metadata in its write-ahead log that is stored in the data lake. At runtime, Hyperspace automatically selects the best index to use for a given query without requiring users to rewrite their queries. Since Hyperspace was introduced, one of the most popular asks from the Spark community was indexing support for Delta Lake. In this talk, we present our experiences in designing and implementing Hyperspace support for Delta Lake and how it can be used for accelerating queries over Delta tables. We will cover the necessary foundations behind Delta Lake’s transaction log design and how Hyperspace enables indexing support that seamlessly works with the former’s time travel queries.
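A rough sketch of the user-facing API the abstract describes, based on Hyperspace's Python bindings. Paths and column names are invented, and exact signatures may differ between Hyperspace releases.

```python
from hyperspace import Hyperspace, IndexConfig
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
hs = Hyperspace(spark)

df = spark.read.parquet("/data/events")  # a Delta source works analogously

# Index the lookup column; "included" columns are carried in the index so
# covered queries never touch the base data.
hs.createIndex(df, IndexConfig("events_idx", ["event_id"], ["payload"]))

# List the indexes Hyperspace manages. With the Hyperspace optimizer rules
# enabled, matching queries can use the index without being rewritten.
hs.indexes().show()
```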
Building large scale transactional data lake using apache hudi - Bill Liu
Data is a critical infrastructure for building machine learning systems. From ensuring accurate ETAs to predicting optimal traffic routes, providing safe, seamless transportation and delivery experiences on the Uber platform requires reliable, performant large-scale data storage and analysis. In 2016, Uber developed Apache Hudi, an incremental processing framework, to power business-critical data pipelines at low latency and high efficiency; it helps distributed organizations build and manage petabyte-scale data lakes.
In this talk, I will describe what Apache Hudi is and its architectural design, then deep-dive into improving data operations with features such as data versioning and time travel.
We will also go over how Hudi brings kappa architecture to big data systems and enables efficient incremental processing for near real time use cases.
Speaker: Satish Kotha (Uber)
Apache Hudi committer and Engineer at Uber. Previously, he worked on building real time distributed storage systems like Twitter MetricsDB and BlobStore.
website: https://www.aicamp.ai/event/eventdetails/W2021043010
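A hedged sketch of the write path and incremental pull described above, using Hudi's Spark datasource. Table name, keys, and paths are invented, and option names vary somewhat across Hudi releases.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes the Hudi bundle is on the classpath

# Upsert a batch into a Hudi table keyed by trip_id, deduplicated by updated_at.
hudi_options = {
    "hoodie.table.name": "trips",
    "hoodie.datasource.write.recordkey.field": "trip_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}
batch = spark.read.parquet("/incoming/trips")  # placeholder source
batch.write.format("hudi").options(**hudi_options).mode("append").save("/lake/trips")

# Incremental pull: read only the commits after a given instant time,
# which is what enables near-real-time downstream processing.
incremental = (spark.read.format("hudi")
               .option("hoodie.datasource.query.type", "incremental")
               .option("hoodie.datasource.read.begin.instanttime", "20210401000000")
               .load("/lake/trips"))
incremental.show()
```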
During this presentation, Infusion and MongoDB shared their mainframe optimization experiences and best practices. These have been gained from working with a variety of organizations, including a case study from one of the world’s largest banks. MongoDB and Infusion bring a tested approach that provides a new way of modernizing mainframe applications, while keeping pace with the demand for new digital services.
Creating a Modern Data Architecture for Digital Transformation - MongoDB
By managing Data in Motion, Data at Rest, and Data in Use differently, modern Information Management Solutions are enabling a whole range of architecture and design patterns that allow enterprises to fully harness the value in data flowing through their systems. In this session we explored some of the patterns (e.g. operational data lakes, CQRS, microservices and containerisation) that enable CIOs, CDOs and senior architects to tame the data challenge, and start to use data as a cross-enterprise asset.
Transform your DBMS to drive engagement innovation with Big Data - Ashnikbiz
Erik Baardse and Ajit Gadge from EDB Postgres presented on how to transform your DBMS to drive digital business, and on how Postgres enables you to support a wider range of workloads with your relational database, opening the Big Data doors. They also covered EnterpriseDB’s Big Data strategy, which focuses on three areas, and finally, last but not least, how to find money in IT with Big Data and digital transformation.
Choosing technologies for a big data solution in the cloud - James Serra
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
Unlocking Operational Intelligence from the Data Lake - MongoDB
Hadoop-based data lakes are enabling enterprises and governments to efficiently capture and analyze unprecedented volumes of data. Join this webinar to learn how digital transformation is driving the rise of the data lake, the role Hadoop plays in generating new classes of analytics and insight, the critical capabilities you need to evaluate in an operational database for your data lake, and more.
This is a quick overview of the challenges that Big Data and flexible-schema databases like MongoDB pose for data treatment, and of strategies to overcome them.
Enabling Telco to Build and Run Modern Applications - Tugdual Grall
See how new databases like MongoDB enable telco enterprises to build and run modern applications.
This presentation was delivered in Tel Aviv in Jan-2015 during a telco round table organized by Matrix.
The Common BI/Big Data Challenges and Solutions presented by seasoned experts, Andriy Zabavskyy (BI Architect) and Serhiy Haziyev (Director of Software Architecture).
This was a complimentary workshop where attendees had the opportunity to learn, network and share knowledge during the lunch and education session.
Similar to MongoDB Breakfast Milan - Mainframe Offloading Strategies
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas - MongoDB
During this talk we'll navigate through a customer's journey as they migrate an existing MongoDB deployment to MongoDB Atlas. While the migration itself can be as simple as a few clicks, the prep/post effort requires due diligence to ensure a smooth transfer. We'll cover these steps in detail and provide best practices. In addition, we’ll provide an overview of what to consider when migrating other cloud data stores, traditional databases and MongoDB imitations to MongoDB Atlas.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! - MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... - MongoDB
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB.
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB - MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... - MongoDB
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combine traditional batch approaches with streaming technologies to provide continuous alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data - MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
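One widely used time-series schema practice (not necessarily this talk's own recommendation) is the bucketing pattern: grouping many readings into one document per sensor per time window to cut document counts and index size. A PyMongo sketch with invented names:

```python
from datetime import datetime, timezone
from pymongo import MongoClient

coll = MongoClient()["iot"]["sensor_buckets"]  # placeholder database/collection

def record_reading(sensor_id: str, value: float) -> None:
    """Append a reading to the sensor's current hourly bucket document."""
    now = datetime.now(timezone.utc)
    bucket_start = now.replace(minute=0, second=0, microsecond=0)
    coll.update_one(
        {"sensor_id": sensor_id, "bucket_start": bucket_start},
        {
            "$push": {"readings": {"ts": now, "value": value}},  # append reading
            "$inc": {"count": 1},                                 # track bucket size
        },
        upsert=True,  # first reading of the hour creates the bucket
    )

record_reading("thermostat-42", 21.7)
```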
MongoDB SoCal 2020: MongoDB Atlas Jumpstart - MongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] - MongoDB
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 - MongoDB
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
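As a hedged sketch of that setup in Python: the snippet below builds a client with automatic encryption options, using a throwaway local master key. Production deployments would use a KMS provider, and the automatic-encryption path additionally needs the crypt_shared library or mongocryptd available.

```python
import os
from pymongo import MongoClient
from pymongo.encryption_options import AutoEncryptionOpts

# A 96-byte local master key is fine for experimentation only;
# production would configure a cloud KMS provider instead.
kms_providers = {"local": {"key": os.urandom(96)}}

opts = AutoEncryptionOpts(
    kms_providers=kms_providers,
    key_vault_namespace="encryption.__keyVault",  # collection holding data keys
    # A schema_map would mark which fields to encrypt; omitted in this sketch.
)

# Reads and writes through this client encrypt/decrypt marked fields
# before data ever leaves the application.
client = MongoClient(auto_encryption_opts=opts)
```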
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... - MongoDB
MongoDB Kubernetes operator is ready for prime time. Learn how MongoDB can be used with the most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! - MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset - MongoDB
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
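To make the mindset shift concrete, here is a small hypothetical example (all names invented) of modeling an order as a single document rather than as rows spread across parent and child tables:

```python
from pymongo import MongoClient

orders = MongoClient()["shop"]["orders"]  # placeholder database/collection

# Where SQL would normalize line items into a child table joined by a
# foreign key, the document model embeds them, so one read returns the
# whole order.
orders.insert_one({
    "_id": 1001,
    "customer": {"name": "A. Rossi", "email": "a.rossi@example.com"},
    "items": [
        {"sku": "SKU-1", "qty": 2, "price": 9.99},
        {"sku": "SKU-7", "qty": 1, "price": 24.50},
    ],
    "status": "shipped",
})

order = orders.find_one({"_id": 1001})  # no joins needed
```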
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart - MongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... - MongoDB
Query performance should be the unsung hero of an application, but without proper configuration, can become a constant headache. When used properly, MongoDB provides extremely powerful querying capabilities. In this session, we'll discuss concepts like equality, sort, range, managing query predicates versus sequential predicates, and best practices to building multikey indexes.
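To make the equality-sort-range ordering concrete, here is a small PyMongo sketch (collection and field names are invented): equality fields first, then the sort field, then the range field in a compound index.

```python
from pymongo import ASCENDING, DESCENDING, MongoClient

orders = MongoClient()["shop"]["orders"]  # placeholder collection

# Compound index ordered equality -> sort -> range.
orders.create_index([
    ("status", ASCENDING),       # equality predicate
    ("created_at", DESCENDING),  # sort key
    ("amount", ASCENDING),       # range predicate
])

# A query shaped to match the index: equality on status, range on amount,
# sorted by created_at.
cursor = (orders.find({"status": "shipped", "amount": {"$gt": 100}})
          .sort("created_at", DESCENDING))
print(cursor.explain()["queryPlanner"]["winningPlan"])  # inspect the plan
```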
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ - MongoDB
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
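A small sketch of the kind of pipeline described: a $group roll-up written into an existing collection with the $merge stage added in MongoDB 4.2. Collection and field names are invented.

```python
from pymongo import MongoClient

db = MongoClient()["shop"]  # placeholder database

# Roll up daily order totals and upsert them into an existing collection,
# effectively maintaining a materialized view.
db.orders.aggregate([
    {"$group": {
        "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$created_at"}},
        "total": {"$sum": "$amount"},
    }},
    {"$merge": {
        "into": "daily_totals",       # existing target collection
        "whenMatched": "replace",     # refresh existing day buckets
        "whenNotMatched": "insert",   # add new days
    }},
])
```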
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... - MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive - MongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang - MongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app... - MongoDB
to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to create better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB: Upply: When Machine Learning... - MongoDB
It has never been easier to order online and be delivered in less than 48 hours, very often for free. This ease of use hides a complex market worth more than $8 trillion.
Data is well known in the Supply Chain world (routes, information on goods, customs, ...), but the value of this operational data remains largely untapped. By combining business expertise and Data Science, Upply is redefining the fundamentals of the Supply Chain, enabling every player to overcome the volatility and inefficiency of the market.
Paketo Buildpacks: the best way to build OCI images? DevopsDa... - Anthony Dahanne
Buildpacks have been around for more than 10 years! At first, they were used to detect and build an application before deploying it to certain PaaS platforms. Then, with their latest generation, the Cloud Native Buildpacks (a CNCF incubating project), we became able to create Docker (OCI) images. Are they a good alternative to the Dockerfile? What are the Paketo buildpacks? Which communities support them, and how?
Come find out in this ignite session.
Developing Distributed High-performance Computing Capabilities of an Open Sci... - Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G... - Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
May Marketo Masterclass, London MUG May 22 2024.pdf - Adele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam - takuyayamamoto1800
In these slides, we show a simulation example and how to compile the solver.
With this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Accelerate Enterprise Software Engineering with Platformless - WSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Field Employee Tracking System | MiTrack App | Best Employee Tracking Solution | ... - informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us: https://informapuae.com/field-staff-tracking/
Prosigns: Transforming Business with Tailored Technology Solutions - Prosigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf - Jay Das
With the advent of artificial intelligence (AI) tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT and Bard, organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
Navigating the Metaverse: A Journey into Virtual Evolution - Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
Globus Connect Server Deep Dive - GlobusWorld 2024 - Globus
We explore the Globus Connect Server (GCS) architecture and experiment with advanced configuration options and use cases. This content is targeted at system administrators who are familiar with GCS and currently operate—or are planning to operate—broader deployments at their institution.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security, Spring Transaction, Spring MVC, Log4j, REST/SOAP web services.
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx - rickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Quarkus Hidden and Forbidden Extensions - Max Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite - Google
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
SOCRadar Research Team: Latest Activities of IntelBroker - SOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntelBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Providing Globus Services to Users of JASMIN for Environmental Data Analysis - Globus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
2. Agenda
• From the Mainframe to the Operational Data Store
• Live Demo by Quantyca
• Mainframe modernization stories
• The future: PSD2 and Blockchain
• Cerved: a journey into the API Economy
• Closing and Q&A
5. Being Successful with MongoDB for Mainframes
Let our team help you on your journey to efficiently leverage the capabilities of MongoDB, the data platform that allows innovators to unleash the power of software and data for giant ideas.
The largest Financial Services, Communications, and Government organizations are working with MongoDB to modernize their mainframes to reduce cost and increase resilience.
• 5-10x Developer Productivity: we help our customers increase overall output, e.g. in terms of engineering productivity.
• 80% Mainframe Cost Reduction: we help our customers dramatically lower their total cost of ownership for data storage and analytics by up to 80%.
6. Challenges of Mainframes in a Modern World
There are three areas of data management: Cost, Risk, and Adaptability. In the legacy world these have been disconnected, with many technologies attempting to integrate the landscape.
Cost | Risk | Adaptability
Unpredictable Loads
Planned/Unplanned Downtime
Expensive Ecosystem
Change Management
Access to Skills
Capacity Management
Business Process Risk
Operational Complexity
Customer Experience
9. Limits
• High Implementation Effort
• Transformation and standardization into a harmonized data model is the heart of the DWH.
• Rigidity
• The data model is pre-defined and rigid, and it is difficult to expand it to integrate additional external sources.
• No Raw Data
• Data Volume
• A DWH can manage high volumes of data, but only with dedicated databases and optimized hardware.
10. Rise of Data Lakes
• Many companies started to look at a Data Lake architecture
• A platform to manage data in a flexible way
• To aggregate and correlate cross-silo data in a single repository
• To enable exploration of all the data
• The most common platform is Hadoop
• Allows horizontal scalability on commodity hardware
• Allows storing heterogeneous data with a read-optimized model
• Includes working layers in SQL and common languages
• Great references (Yahoo and Google)
12. How Does It Work?
• Source data are loaded as-is into a raw data layer, without any transformation
• The technology is not based on an RDBMS but on a file system (Hadoop’s HDFS, for example)
• Queries are executed directly on the raw data and are much more complex to write, since they must also contain the logic for harmonizing and consolidating the data (see the sketch below)
• Adding a new source of information is quite easy
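To make this concrete, here is a minimal sketch of such a raw-layer query, using PySpark purely as an example engine; the HDFS paths and column names are hypothetical:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("raw-query").getOrCreate()

# Raw layer: files loaded as-is, one directory per source (hypothetical paths).
orders_a = spark.read.json("hdfs:///raw/source_a/orders/")
orders_b = spark.read.csv("hdfs:///raw/source_b/orders/", header=True)

# The harmonization/consolidation logic lives inside the query itself,
# not in an upfront ETL step: align names and types, then consolidate.
harmonized = orders_a.select(
    F.col("order_id"),
    F.col("amount").cast("double"),
).unionByName(
    orders_b.select(
        F.col("ORDER_NO").alias("order_id"),
        F.col("TOTAL").cast("double").alias("amount"),
    )
)
harmonized.groupBy("order_id").sum("amount").show()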
13. Typical High Level Architecture
[Diagram: Producers (Files, Orders, MDM, Logistics, Mainframe, …) feed a Data Lake via (E)TL tools, e.g. Ab Initio, Talend, Informatica; Consumers / Channels (Web, Mobile, B2C, CMS, …) access the Data Lake through APIs]
14. And for non-analytical queries?
• Data Lakes also have to provide the Hadoop output to online applications. These applications can have requirements like:
• Response time in ms
• Random access on small indexed subsets of data
• Expressive queries
• Frequent real-time updates
The sketch below illustrates this access pattern.
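As an illustration, here is a minimal sketch using MongoDB’s Python driver, anticipating the operational store discussed later in the deck; collection and field names are hypothetical:

from pymongo import MongoClient, ASCENDING, DESCENDING

db = MongoClient("mongodb://localhost:27017").ods
db.transactions.create_index([("customerId", ASCENDING), ("ts", DESCENDING)])

# Random access on a small indexed subset, with an expressive filter,
# answered in milliseconds thanks to the compound index.
recent = db.transactions.find(
    {"customerId": "c42", "amount": {"$gt": 100}}
).sort("ts", DESCENDING).limit(10)

# Frequent real-time update.
db.accounts.update_one({"_id": "acc-1"}, {"$inc": {"balance": -25.0}})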
15. Reference Architecture / 3
[Diagram: the same Data Lake architecture, with Legacy DBs among the Producers and a Column DB plus a Search Engine added between the Data Lake and the consumer-facing APIs]
16. Issues
• Data Model
• Designed for analytics, not for the applications
• Not Real-Time
• ETL process; a long time to traverse all the components
• Search DB limited in query capabilities
• Column DB limited in data model flexibility and HA
17. Legacy Optimization with MongoDB
Business Drivers
• Extending legacy applications to new channels & users
• Mobile & web, IoT sensor data integration, single view of data
• Different data access, performance and scalability requirements
• Conserve valuable MIPS, avoid resource contention
• Move from a product-centric to a customer-centric schema
Solution
• Operational Data Store built on MongoDB
18. Banks Situation: The Past
• Banks used mainframe systems to store critical data that was at the core of their business: account data, transaction data, other core financial data, etc.
• Internal front-office systems were used by bank employees (“Bankers”) to access this critical data and deliver it to clients.
• Critical bank practices (data governance, backup, HA/DR, auditing, etc.) evolved with mainframe systems as a core element of those practices and solutions. Mainframes were the golden source of data before we invented the term “golden source”.
19. The Past
[Diagram: Clients interact with Bankers through front-office applications (teller systems, advisor tools) that send data requests (get account info, get investment info, risk, manage client profile) to the mainframe infrastructure; ATMs get/update account balances via direct access]
20. Present
[Diagram: Mobile, web, and teller/POS apps each load their home screens through their own logic layer (service calls, orchestration, aggregation, data transformation), repeatedly calling account services, client data services, and access logs (filtered by channel) against the mainframe; the teller/POS app additionally reads bank-confidential data through a specialized integration point; other data sources and reference data are consulted as well. These operations are typically read on demand, with caching et al.]
21. Use Case: Monthly UK Payday
Almost everyone gets paid on the same day here in London (not like in NA), once a month at the end of the calendar month. Almost everyone in London uses their bank’s mobile app to check their balance on that same day, once a month.
People used to have to go to the bank to get their balance, but now everyone has an app on their phone they can use to check it. The fact that it is so easy to check a bank balance, combined with the significant time between paydays, means that most people check their balance on payday at least once, if not multiple times.
The mainframe infrastructure needed to support the traffic spike from this use case costs banks millions of dollars a year.
22. An abstract, logical view
If we strip away some details to investigate a more fundamental, abstract, logical view of the current state, it looks like this.
[Diagram: many clients (Function, Thing, App, Other), each carrying its own app + data logic and each needing data, all converge on the mainframe services + infrastructure and mainframe data, alongside other data sources and reference data]
23. Some simple changes
By moving the app/data logic out of the clients and replacing it with data logic between the legacy DBs and the MongoDB ODS, we can deliver significant value, reducing complexity and redundancy.
[Diagram: clients (Function, Analysis, App, Thing, Other) now simply use data; a data-logic layer (rules, transform, schema, orchestration, audit, plumbing, services) sits between the legacy DBs/data, other data sources, reference data, and the MongoDB ODS, with reads and writes flowing in batch, near-real-time, or real-time]
24. Why MongoDB
• Optimized Data Model
• The data are transformed during the replication phase into a customer-centric data model instead of a product-centric one; this is called the “Single View” (see the sketch below).
• Flexibility
• The flexible document-based model removes the need to update the schema to store new information.
• Architecture for Modern Applications
• MongoDB has a scalable, geographically distributed architecture for always-on services.
• Integration with Analytical Systems
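A minimal sketch of what this replication-time transformation into a Single View document might look like; the collection and field names are hypothetical:

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").ods

def upsert_single_view(customer, accounts, cards, loans):
    # One customer-centric document instead of rows spread across
    # product-centric tables; adding a new field later needs no schema change.
    db.customers.replace_one(
        {"_id": customer["id"]},
        {
            "_id": customer["id"],
            "name": customer["name"],
            "accounts": accounts,  # embedded product records
            "cards": cards,
            "loans": loans,
        },
        upsert=True,
    )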
25. Benefits
• Reduce Development Time
• The customer-centric data model, together with MongoDB’s flexibility and document data representation, reduces development time by more than 50%.
• Improve the SLA
• MongoDB’s scalable, always-on architecture improves uptime and performance, providing a better customer experience.
• Reduce Costs
• Move MIPS from legacy to low-cost commodity hardware.
26. Reference Architecture
[Diagram: Producers (Files, Mainframe, Orders, MDM, Logistics, additional data sources & systems) feed an optional Operational Data Store through (E)TL tools (e.g. Ab Initio, Talend, Informatica, custom), a CDC process (Change Data Capture, e.g. Attunity), and message queues (e.g. Kafka, ActiveMQ, RabbitMQ); Consumers / Channels (Web, Mobile, B2C, CMS, …) access it through an API layer (authorisation, authentication, logging, etc.)]
29. Attunity Replicate
Rapidly Move Data Across Complex Hybrid Environments
[Diagram: sources (RDBMS, Hadoop, Data Warehouse), on premises or on a cloud platform, replicate to the corresponding targets via WAN-optimized data transfer: compression, multi-pathing, encryption]
30. Attunity Replicate
Go Agile with Automated Processes
• Target schema creation
• Heterogeneous data type mapping
• Batch to CDC transition
• DDL change propagation
• Filtering
• Transformations
[Diagram: sources (Files, RDBMS, Mainframe, Hadoop, EDW) replicate to targets (Files, RDBMS, Hadoop, Kafka, EDW)]
31. Attunity Replicate
Zero Footprint Architecture
• CDC identifies source updates by scanning change logs
• No software agents required on sources or targets
• Minimizes administrative overhead
Low Infrastructure Impact
• Log-based CDC
• Source-specific optimization
• Uses native DB clients for security
[Diagram: sources (Files, RDBMS, Mainframe, Hadoop, EDW) replicate to targets (Files, RDBMS, Hadoop, Kafka, EDW)]
32. Attunity Replicate
Streaming CDC to Apache Kafka
[Diagram: transaction logs (CDC) and bulk loads flow through in-memory-optimized metadata management and data transport into a real-time data flow on Kafka: change data in JSON/Avro format streams onto a message topic, while batch-data schemas in JSON/Avro format stream onto a schema topic]
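For illustration only, here is a sketch (as Python literals) of what the two kinds of payload might look like; the field names are hypothetical, not Attunity’s actual wire format:

# Change event published to the message topic.
change_event = {
    "table": "BANK.ACCOUNT",
    "op": "UPDATE",
    "ts": "2017-07-06T10:15:00Z",
    "key": {"ACCOUNT_ID": 42},
    "before": {"BALANCE": 100.0},
    "after": {"BALANCE": 75.0},
}

# Schema description published to the schema topic.
schema_message = {
    "table": "BANK.ACCOUNT",
    "columns": [
        {"name": "ACCOUNT_ID", "type": "int", "primaryKey": True},
        {"name": "BALANCE", "type": "decimal(12,2)"},
    ],
}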
33. Attunity Replicate
User Interface – Guided User Experience
• Intuitive web-based GUI
• Drag and drop, wizard-assisted configuration steps
• Consistent process for all sources and targets
34. Attunity Enterprise Manager (AEM)
Manage Data Ingest and Replication At Scale
• Centralize design and control
§ Loading, mapping, DDL changes
§ Stop, start, resume, reload
§ Automated status discovery
• Manage security controls
§ Granular access controls
§ Full audit trail
• Customize your views
§ Group, search, filter, sort and drill down on tasks
§ Respond to real-time alerts
• Leverage graphical dashboards and APIs (REST and .NET)
36. Demo! The scenario
• 3 tables:
• Customer
• Account
• Operations (transactions)
• Operations change the balance of an account within the same transaction
• The same transactional integrity has to be replicated to the MongoDB data store (see the sketch below)
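A minimal sketch of how the MongoDB side could preserve that integrity with a multi-document transaction (pymongo; collection and field names are hypothetical, and transactions require a replica set):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
db = client.bank

def apply_operation(op):
    # op is a replicated operation event, e.g.
    # {"_id": ..., "accountId": "acc-1", "amount": -25.0}
    with client.start_session() as session:
        with session.start_transaction():
            # Record the operation and update the balance atomically,
            # mirroring the source system's transactional behaviour.
            db.operations.insert_one(op, session=session)
            db.accounts.update_one(
                {"_id": op["accountId"]},
                {"$inc": {"balance": op["amount"]}},
                session=session,
            )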
37. Reference Architecture with a twist
[Diagram: the same reference architecture as slide 26 — Producers (Files, Mainframe, Orders, MDM, Logistics, additional data sources & systems), (E)TL tools (e.g. Ab Initio, Talend, Informatica, custom), a CDC process (Change Data Capture, e.g. Attunity), message queues (e.g. Kafka, ActiveMQ, RabbitMQ), an optional Operational Data Store, and an API layer (authorisation, authentication, logging, etc.) serving Web, Mobile, B2C, and CMS channels — this time with the CDC stream routed through the message queues]
38. What is Kafka?
• Kafka is a distributed publish-subscribe messaging system organized in topics.
• It’s designed to be:
• Fast
• Scalable
• Durable
• When used in the right way and for the right use case, Kafka has unique attributes that make it a highly attractive option for data integration.
39. First approach
CDC Process (Change Data Capture) → single topic: C A O A O A O … (C = Customer, A = Account, O = Operation change events)
• All DBMS transactions go to the same topic, and their order is guaranteed
• The events are persisted sequentially from Kafka into MongoDB
• No other tools are needed; all processing and persistence is managed by Kafka and MongoDB
• Higher parallelism can be achieved by partitioning in Kafka by Customer ID (see the producer sketch below)
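A minimal producer sketch of that partitioning scheme (kafka-python; topic and field names are hypothetical):

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_change(event):
    # Keying by customer ID sends every event for one customer to the
    # same partition, preserving per-customer order while allowing
    # parallel consumption across customers.
    producer.send("cdc.transactions", key=event["customerId"], value=event)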
40. Second approach – Distributed Transactions
[Diagram: the CDC process writes Account (A) changes to the Public.Account topic and Operation (O) changes to the Public.Transaction topic; after changing the message key, the re-keyed streams (Private.Account, Private.Transaction) are joined into combined AO events]
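A minimal sketch of the key-based join; only the topic names come from the slide, while the pairing logic and field names are hypothetical:

import json
from collections import defaultdict
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "Private.Account", "Private.Transaction",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

pending = defaultdict(dict)  # transaction key -> {"A": event, "O": event}

for msg in consumer:
    side = "A" if msg.topic == "Private.Account" else "O"
    key = msg.value["txnId"]  # hypothetical re-keyed change-message key
    pending[key][side] = msg.value
    if len(pending[key]) == 2:
        # Both halves of the distributed transaction arrived: emit AO.
        producer.send("Joined.AO", value=pending.pop(key))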
43. 5 Phases of Legacy Offloading
MongoDB can help you offload MIPS from the legacy systems, save double digits in cost, and increase agility and capabilities for new use cases at the same time. In order of growing scope and business benefits:
1. Operational Data Layer (ODL) — Offloading Reads: records are copied via a CDC/delta-load mechanism from the legacy DB into MongoDB, which serves as an Operational Data Layer (ODL), e.g. for frequent reads.
2. Enriched ODL — Offloading Reads: the ODL data is enriched with additional sources to serve as an operational intelligence platform for insights and analytics.
3. “Y-Loading” — Offloading Reads & Writes: writes are performed concurrently to the legacy system as well as MongoDB, e.g. via a service-driven architecture.
4. “MongoDB first” — Transforming the role of the mainframe: transactions are written first to MongoDB, which passes the data on to the legacy system of record.
5. System of Record — Transforming the role of the mainframe: MongoDB serves as system of record for a multitude of applications, with deferred writes to the legacy if necessary.
44. Offloading Reads
Initial use cases primarily focus on offloading costly reads, e.g. querying large numbers of transactions for analytics or historical views across customer data.
Operational Data Layer (ODL): using a change data capture (CDC) or delta-load mechanism, you create an operational data layer alongside the mainframe that serves read-heavy operations (a consumer sketch follows below).
Enriched Operational Data Layer (ODL): additional data sources are loaded into the ODS to create an even richer picture of your existing data and enable additional use cases like advanced analytics.
[Diagram: with the ODL, the mainframe still takes 100% of writes while reads split between mainframe and ODL (10-50% vs 50-90%); with the enriched ODL the read split shifts further (25-75% vs 25-75%)]
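A minimal sketch of the consumer that builds such a read-side ODL; topic, database, and field names are hypothetical:

import json
from kafka import KafkaConsumer
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017").odl
consumer = KafkaConsumer(
    "cdc.accounts",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for msg in consumer:
    change = msg.value  # e.g. {"key": "acc-1", "op": "u", "after": {...}}
    if change["op"] == "d":
        db.accounts.delete_one({"_id": change["key"]})
    else:
        # Upsert keeps the ODL converging on the mainframe's state.
        db.accounts.replace_one(
            {"_id": change["key"]}, change["after"], upsert=True
        )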
45. Offloading Reads & Writes
By introducing a smarter architecture to orchestrate writes concurrently, e.g. via a microservices architecture, you can shift away from delayed CDC or delta-load mechanisms.
Y-Loading: writing (some) data concurrently into the mainframe as well as MongoDB enables you to further limit interactions with the mainframe technology (a sketch follows below). It also sets you up for a more transformational shift of the role of the mainframe with regard to your enterprise architecture.
[Diagram: applications go through a microservices / API layer; indicative splits from the slide are 10-25% vs 75-90% of writes and 40-80% vs 20-60% of reads between the two sides]
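A minimal Y-Loading sketch, assuming a stand-in legacy client and omitting the error handling and compensation a real service would need:

from concurrent.futures import ThreadPoolExecutor
from pymongo import MongoClient

mongo = MongoClient("mongodb://localhost:27017").bank

def write_legacy(record):
    # Stand-in for the call into the mainframe / legacy DB.
    pass

def y_load(record):
    # Fan the same write out to both stores concurrently (the "Y" shape).
    with ThreadPoolExecutor(max_workers=2) as pool:
        legacy = pool.submit(write_legacy, record)
        ods = pool.submit(
            mongo.accounts.replace_one,
            {"_id": record["_id"]}, record, True,  # True = upsert
        )
        legacy.result()
        ods.result()  # surface failures from either branch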
46. Transforming the role of the legacy
With a shift towards writing to MongoDB first before writing to the mainframe (if at all), you further change the meaning of “system of record” and “mainframe” within the organisation.
“MongoDB first”: transactions are first written to MongoDB, which can serve as a buffer before it passes transactions on to the mainframe as System of Record.
System of Record: MongoDB serves as the main System of Record, with writes optionally being passed on to the mainframe for legacy applications only, or the mainframe gets decommissioned entirely.
[Diagram: applications access data through a microservices / API layer in both stages; the share of reads and writes handled by MongoDB grows at each stage (indicative ranges from the slide: writes/processing 20-50% vs 50-80% then 50-90% vs 10-50%; reads 60-90% vs 10-40% then 90-100% vs 0-10%)]
48. Mainframe Offloading enables insight through Single View of Customer
Spanish bank replaces Teradata and Microstrategy to increase business insight and avoid significant cost
Problem:
• Branches required an application that offered all information about a given customer and all of his contracts (accounts, loans, cards, etc.).
• Multi-minute latency for accessing customer data stored in Teradata and Microstrategy.
• In addition, accessing data frequently from the legacy systems would cause spikes in MIPS and related cost.
Solution:
• Offloaded to MongoDB, where data is highly available and can be accessed by new applications and channels.
• Built a single view of the customer on top of MongoDB: a flexible and scalable app, easy to adapt to new business needs.
Results:
• Super-fast, ad hoc query capabilities (milliseconds) and real-time analytics thanks to MongoDB’s Aggregation Framework.
• Can now leverage distributed infrastructure and commodity hardware for lower total cost of ownership and greater availability.
• Cost avoidance of $10M+.
• Application developed and deployed in less than 6 months. New business policies easily deployed and executed, bringing new revenue to the company.
• Current capacity allows branches to instantly load all customer info in milliseconds, providing a great customer experience.
• New applications and services can be built on the same data platform without adding MIPS/cost or increasing risk by putting more stress on legacy systems.
49. Database-as-a-Service
Migration from Oracle & Microsoft to create a consolidated “data fabric” reduces $m in cost, speeds application development & simplifies operations
Problem:
• High licensing costs from proprietary database and data grid technologies
• Data duplication across systems, with complex reconciliation controls
• High operational complexity impacting service availability and speed of application delivery
Solution:
• Implemented a multi-tenant PaaS with a shared data service based on MongoDB, accessed via a common API with message routing via Kafka
• Standardized data structures for storage and communication based on the JSON format
• Multi-sharded, cross-data-center deployment for scalability and availability
Results:
• $ millions in savings after migration from Coherence, Oracle Database and Microsoft SQL Server
• Develop new apps in days vs. months
• 100% uptime with simplified platform architecture, higher utilization and reduced data center footprint
50. During their recent FY 2016 Investor Report, RBS CEO Ross McEwan highlighted their MongoDB Data Fabric platform as a key enabler in helping the bank reduce cost significantly and dramatically increase the speed at which RBS can deploy new capabilities.
“Data Fabric will help reduce cost significantly and dramatically increase the speed at which we can deploy new capabilities for our customers”
– Ross McEwan, CEO, RBS (RBS’s Investor Report FY’16)
52. Future Challenges
• Open Banking and Open Data
• Provide APIs for 3rd-party applications
• Unpredictable inbound traffic
• Requires an elastic infrastructure
• Provisioning time is currently very high
• Blockchain
• Will have a huge impact on how transactions are managed
53. Future Architecture
[Diagram: Producers (Files, Mainframe, Orders, MDM, Logistics, additional data sources & systems) feed Attunity CDC into Kafka; consumers (read, transform, apply) populate both an on-premise Operational Data Store behind an internal API serving Web, Mobile, B2C and CMS channels, and a cloud Operational Data Store behind an Open API serving third-party payment web and mobile apps, with an optional write-back mechanism]
55. Stefano Gatti – Head of Innovation and Data Sources
6 July 2017
Cerved API
The journey from project to platform
56. Cerved and its “data”
“Liquid” data and the “solid” algorithm
Welcome to the API Economy, but ...
Cerved API Ecosystem
The next challenges
Table of Contents
58. Business Areas & Numbers
CREDIT INFORMATION – Protecting against credit risk
MARKETING SOLUTIONS – Growing with new business opportunities
CREDIT MANAGEMENT – Managing and recovering non-performing loans
✓ Documents: 1,000 reports/min
✓ Lines of SW code: 40 million
✓ Clients: 34,000
✓ Payment data: 60 million
✓ People: 2,000
✓ Revenue: 377 million Euro (2016)
59. Our most important V: “Variety”
[Chart: data sources positioned by accuracy and complexity: Web Data, Open Data, proprietary data, official non-chamber-of-commerce data, official chamber-of-commerce data]
60. The delivery infrastructure
Sourcing – Business Rules – Products – Delivery – Operations – Platform
• 1 PB of data
• 3,000 business rules
• 600 million data monitoring events per year
• 350 operators on internal software
• 50 delivery websites
• > 200 B2B projects
• > 500 products
• 80% time-critical fulfilments
• 1,350 servers
• 18 of the top 30 most widespread databases in production
61. The API landscape and its numbers
ONE DAY AT CERVED:
• SOAP services/reports: over 1,000
• Microservices: 3,000
• Registry searches: 110,000
• Rating calculations: 300,000
• Service calls: 10,000,000 (intra-farm: 5,000,000; legacy: 2,500,000; SOAP: 1,500,000; REST: 1,000,000)
• Data events: 4,500,000
• Document storage operations: 6,500,000
• Information blocks delivered: 5,000,000
65. Big data & algorithms
To make better decisions …
“Co-founder of the MIT Media Lab, pioneer of human-machine interaction and one of the most important data scientists in the world”
Sandy Pentland
Source: http://www.betterdecisions.it
66. API Economy trend
A tool to make the pervasiveness of data and algorithms more agile
[Chart: from a technology trend (2012) to a business trend (2016)]
68. An API Economy approach, Data & Algorithmic-driven
Overview
OFFER: Cerved API
Different types of “Consumer”:
• BUSINESS (e.g. Sales & Marketing, Finance & Operations, Strategy and Development, …)
• IT (e.g. banks, financial institutions, large enterprises, web agencies, business news bloggers, system integrators, management software developers, SMEs, …)
APIs as “building blocks” following a “store” model, offered to both business lines and IT.
69. The API Economy and the Cerved vision
[Diagram: the Cerved platform exposes data and algorithms (including custom algorithms and external data) through a data integration layer (API Portal + API Gateway, raw data) to clients, partners and third parties via connectors and applications; start-ups and developers contribute innovative components through open innovation and hackathons]
70. APIs not just as technology but as a product
Ongoing lessons learned …
Focus on:
• Customer needs
• Developer usability
• Design
• Maintenance and support
• An agile business built on a platform of APIs
72. Cerved API Architecture (Apigee + Cerved)
[Diagram: an API request enters through Apigee Enterprise; the API Dev Portal handles CAS login and IMS user checks; the Cerved API endpoint executes static and dynamic business rules against a relational DB (Oracle)]
Problems:
• SLA
• Load on the relational DB
• Business-rule performance
73. Cerved API Architecture (hybrid cloud)
[Diagram: the same flow, with the Cerved API endpoint now backed by the relational DB (Oracle), a graph DB (Neo4j) and a document DB (MongoDB), plus a cloud DB and cloud API endpoints (API Cloud)]
74. Cerved & MongoDB: present and future
[Diagram: loaders copy data from the Cerved OLTP systems (Oracle, Teradata) and the Data Lake (Cloudera) into MongoDB; MongoDB caching layers and an event generator support the business logic behind the Cerved API, the marketplace, system integrators and external partners; data science tools (ML, BI, DL, …) read from the Data Lake, and MongoDB caching layers also run in the cloud (AWS, Google, …)]
76. Cerved developer portal
Search APIs across the entire Italian economic fabric
To integrate and search in any solution or use case:
• All Italian companies
• The most significant unregistered economic activities
• The people of the Italian economic fabric
With the main registry details
77. Cerved developer portal
An API providing the score of every Italian company
Event-driven credit scoring computed for all Italian companies, with grading across several dimensions of the company
78. Cerved developer portal
The Italian Business Graph APIs
• Anti-fraud
• Procurement
• Business investigation
• Business scouting
• Data journalism
81. The next challenges
✓ Better integrate the business into the development and maintenance cycle
✓ Strengthen the relationship with our clients by improving their experience
➢ Increase the speed at which innovative APIs are published
➢ Integrate APIs from partners and group companies
➢ Strengthen the ecosystem of the