©2018 Impetus Technologies, Inc. All rights reserved.
You are prohibited from making a copy or modification of, or from redistributing,
rebroadcasting, or re-encoding of this content without the prior written consent of
Impetus Technologies.
This presentation may include images from other products and services. These
images are used for illustrative purposes only. Unless explicitly stated there is no
implied endorsement or sponsorship of these products by Impetus Technologies. All
copyrights and trademarks are property of their respective owners.
Planning your Next-Gen Change Data Capture (CDC) Architecture
December 19, 2018
Agenda
What is CDC?
Various methods for CDC in the enterprise data warehouse
Key considerations for implementing a next-gen CDC architecture
Demo
Q&A
About Impetus
We exist to create powerful and intelligent enterprises through deep
data awareness, data integration, and data analytics.
About Impetus
Many of North America’s most respected and well-known brands trust
us as their strategic big data and analytics partner.
Transformation
Legacy EDW to
big data/cloud
Unification
Data processing,
preparation, and
access
Analytics
Real-time, machine
learning, and AI
Self-service
BI on big data/
cloud
End-to-End Big Data Solutions
What are the different change data capture use cases currently deployed
in your organization (choose all that apply)?
Continuous Ingestion in the Data Lake
Capturing streaming data changes
Database migration to cloud
Data preparation for analytics and ML jobs
We still have a legacy system
Our Speakers Today
SAURABH DUTTA
Technical Product Manager
SAMEER BHIDE
Senior Solutions Architect
What is Change Data Capture (CDC)?
CDC is the process of capturing changes made at the data source and
applying them throughout the enterprise.
Let's Take a Closer Look
A WebApp creates a customer record in the source database:
Customer { Telephone: "111" }
The create is captured as a change data capture event and applied to the target database, which now holds the same record.
When the WebApp then updates the record:
Customer { Telephone: "222" }
the update is captured as another CDC event and propagated to the target database.
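The create and update flow above can be sketched in a few lines. This is a minimal illustration only; the event field names here are made up for the example and do not follow any specific vendor's format:

```python
# Minimal illustration of CDC events for the customer record above.
# Field names ("op", "key", "after") are illustrative, not a vendor format.

def apply_event(target: dict, event: dict) -> None:
    """Apply a single CDC event to an in-memory 'target database'."""
    key = event["key"]
    if event["op"] in ("create", "update"):
        target[key] = event["after"]
    elif event["op"] == "delete":
        target.pop(key, None)

target_db = {}
events = [
    {"op": "create", "key": "cust-1", "after": {"Telephone": "111"}},
    {"op": "update", "key": "cust-1", "after": {"Telephone": "222"}},
]
for e in events:
    apply_event(target_db, e)

print(target_db["cust-1"]["Telephone"])  # -> 222
```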
What Does CDC Mean for the Enterprise?
[Diagram: sources (RDBMS, data warehouse, files, legacy) feed both batch replication and real-time incremental streams through in-memory filter and transform steps into targets (RDBMS, data warehouse, Hadoop).]
Modern CDC Applications
Data lake: Continuous ingestion and pipeline
automation
Streaming: Data changes to Kafka, Kinesis, or
other queues
Cloud: Data workload migration
Business applications: Data preparation for
analytics and ML jobs
Legacy system: Data delivery and query offload
Data lakes
Streaming
Cloud
Data warehouse
Files
Legacy
RDBMS
Methods of Change Data Capture
Database triggers Date modification stamping Log-based CDC
Database Triggers
Uses shadow tables
Challenges
• Introduces processing overhead on every transaction
• Increases load on the database when changes are retrieved
• Can lose intermediate changes
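Trigger-based CDC can be sketched with SQLite, which supports triggers in-process. This is a simplified illustration, not production syntax for any particular enterprise database (trigger syntax varies by vendor):

```python
import sqlite3

# Sketch of trigger-based CDC: triggers copy every change into a shadow
# table, which the CDC process later reads. SQLite used for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, telephone TEXT);
CREATE TABLE customer_shadow (id INTEGER, telephone TEXT, op TEXT);

CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_shadow VALUES (NEW.id, NEW.telephone, 'I');
END;

CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_shadow VALUES (NEW.id, NEW.telephone, 'U');
END;
""")
conn.execute("INSERT INTO customer VALUES (1, '111')")
conn.execute("UPDATE customer SET telephone = '222' WHERE id = 1")

# The shadow table now holds the change history the CDC process reads.
rows = conn.execute(
    "SELECT op, telephone FROM customer_shadow ORDER BY rowid"
).fetchall()
print(rows)  # -> [('I', '111'), ('U', '222')]
```

Note how every write now costs two inserts, which is the overhead the slide warns about.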
Date Modification Stamping
Transactional applications keep track of metadata in every row
• Tracks when the row was created and last modified
• Enables filter on the DATE_MODIFIED column
Challenges
• A deleted row has no DATE_MODIFIED value, so deletes require a separate trigger-based mechanism
• Extracting changes by scanning the DATE_MODIFIED column is resource-intensive
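The DATE_MODIFIED filter can be sketched as follows. The table and checkpoint values are illustrative; a real extract would scan far larger tables, which is why this approach is resource-intensive:

```python
import sqlite3

# Sketch of date-modification stamping: every row carries a DATE_MODIFIED
# column, and the CDC extract filters on it using a saved checkpoint.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
    id INTEGER PRIMARY KEY, telephone TEXT, date_modified TEXT)""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "111", "2018-12-01T09:00:00"),
     (2, "222", "2018-12-15T14:30:00")],
)

# Extract only rows changed since the last successful run.
last_checkpoint = "2018-12-10T00:00:00"
changed = conn.execute(
    "SELECT id FROM customer WHERE date_modified > ?",
    (last_checkpoint,),
).fetchall()
print(changed)  # -> [(2,)]
# A row deleted since the checkpoint simply disappears -- there is
# no DATE_MODIFIED value left to filter on, which is the key limitation.
```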
Log-Based CDC
Uses the database's transaction logs
Challenges
• Interpreting proprietary transaction log formats
• Vendors provide no direct, documented interface to the transaction log
• Agents and interfaces change with new database versions
• Supplemental logging increases the volume of logged data
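Log-based CDC tools decode transaction log entries into change events. The event shape below is a simplified sketch loosely modelled on common formats (before/after row images plus an operation code); real formats vary by tool and database:

```python
import json

# Simplified sketch of a decoded transaction-log change event.
# Structure is illustrative, not any specific tool's wire format.
raw = json.dumps({
    "op": "u",  # c = create, u = update, d = delete
    "before": {"id": 1, "telephone": "111"},
    "after": {"id": 1, "telephone": "222"},
    "ts_ms": 1545216000000,
})

event = json.loads(raw)
changed = {}
if event["op"] == "u":
    # Diff the before/after images to find which columns changed.
    changed = {k: v for k, v in event["after"].items()
               if event["before"].get(k) != v}
print(changed)  # -> {'telephone': '222'}
```

Because the event carries both images and an operation code, deletes and intermediate changes are captured, which the other two methods struggle with.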
Change Data Capture Implementation Steps
Enable CDC for the database
Prepare tables for CDC
Define the target
Create a table to handle CDC states
Run the initial load
Apply incremental updates
Next Gen Architecture Considerations
Ease of Use
Pre-packaged operators
Extensibility
Modern user experience
Real-time
Change Data Capture
Stream live updates
Optimized for high performance
Hybrid
Multiple vendors
On-premises and cloud
Databases, data warehouse,
and data lake
Value Proposition of CDC
Efficient incremental updates
Minimal impact on source/production systems
Faster time to value
Lower TCO
Scale and flexibility
What are the different change data capture use cases currently deployed
in your organization (choose all that apply)?
Continuous ingestion in the data lake: 46%
Capturing streaming data changes: 58%
Database migration to cloud: 38%
Data preparation for analytics and ML jobs: 35%
We still have a legacy system: 46%
ETL, Real-time Stream Processing and Machine Learning Platform
+ A Visual IDE for Apache Spark
CDC with StreamAnalytix
Turnkey adapters for CDC vendors
ETL and data wrangling visual operators
Elastic compute
Transform, Enrich, and Reconcile stages
Inputs: structured data stores, CDC streams, unstructured data streams, file stores
Outputs: structured data stores, message queues, Hadoop/Hive, cloud storage and DW
CDC Capabilities in StreamAnalytix
Integration with CDC providers
CDC Capabilities in StreamAnalytix
LogMiner integration
CDC Capabilities in StreamAnalytix
Turnkey reconciliation feature for Hadoop offload
CDC Capabilities in StreamAnalytix
Large set of visual operators for ETL,
analytics, and stream processing
Zero code approach to ETL design
Built-in support for non-functional requirements (NFRs)
StreamAnalytix CDC Solution Design
StreamAnalytix Workflow
A complete CDC solution has three parts; each is modelled as a StreamAnalytix pipeline:
Data ingestion and staging: Stream data from Attunity, replicate from Kafka or LogMiner for multiple tables, and store raw data in HDFS
Data de-normalization: Join transactional data with data at rest, and store de-normalized data on HDFS
Incremental updates in Hive: Merge previously processed transactional data with new incremental updates
Pipeline #1: Data Ingestion and Staging (Streaming)
Data Ingestion via Attunity 'Channel'
Reads data that Attunity Replicate publishes to Kafka; data feeds and metadata are read from separate topics
Data Enrichment
Enriches incoming data with metadata and an event timestamp
HDFS
Stores CDC data in an HDFS landing area using the OOB HDFS emitter; HDFS files are rotated based on time and size
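The enrichment step can be sketched in isolation. The `enrich` function and its metadata fields below are hypothetical names for illustration, not StreamAnalytix operator APIs:

```python
import time

# Sketch of the enrichment step: a hypothetical enrich() adds table
# metadata and an event timestamp to each incoming CDC record before
# it is written to the HDFS landing area.
def enrich(record: dict, table_metadata: dict) -> dict:
    enriched = dict(record)
    enriched["_schema"] = table_metadata["schema"]
    enriched["_table"] = table_metadata["table"]
    enriched["_event_ts"] = int(time.time() * 1000)  # epoch millis
    return enriched

meta = {"schema": "sales", "table": "customer"}
out = enrich({"id": 1, "telephone": "222", "op": "U"}, meta)
print(out["_table"])  # -> customer
```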
Pipeline #2: Data De-normalization (Batch)
HDFS Data Channel
Ingests incremental data from previous runs of the staging location
Reads reference data (data at rest) from a fixed HDFS location
Performs an outer join to merge incremental and static data
Stores de-normalized data to an HDFS directory
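The de-normalization join can be sketched as a full outer join keyed on a shared id, with incremental values taking precedence over the static reference. This is a toy in-memory version of what the batch pipeline does at scale:

```python
# Sketch of the de-normalization step: a full outer join of incremental
# CDC rows with reference data at rest, keyed on a shared id. Incremental
# columns overwrite reference columns on conflict.
def outer_join(incremental: dict, reference: dict) -> dict:
    joined = {}
    for key in set(incremental) | set(reference):
        joined[key] = {**reference.get(key, {}), **incremental.get(key, {})}
    return joined

incremental = {1: {"telephone": "222"}}
reference = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
result = outer_join(incremental, reference)
print(result[1])  # -> {'name': 'Acme', 'telephone': '222'}
```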
Pipeline #3: Incremental Updates in Hive (Batch)
Load step
Run a Hive SQL query to load a managed table from the HDFS incremental data generated by Pipeline #2
Reconciliation step
Hive MERGE INTO SQL performs insert, update, and delete operations based on the operation type recorded in the incremental data
Clean-up step
Runs a DROP TABLE command on the managed table to remove processed data and avoid reprocessing
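The reconciliation that Hive's MERGE INTO performs can be sketched in plain Python. The operation codes (I/U/D) and field names are illustrative assumptions, not the exact schema of the demo:

```python
# Sketch of the reconciliation step: apply each incremental row to the
# managed table according to its operation code, mirroring what a Hive
# MERGE INTO does (I = insert, U = update, D = delete).
def reconcile(managed: dict, incremental: list) -> None:
    for row in incremental:
        if row["op"] in ("I", "U"):
            managed[row["id"]] = {k: v for k, v in row.items() if k != "op"}
        elif row["op"] == "D":
            managed.pop(row["id"], None)

table = {1: {"id": 1, "telephone": "111"}}
updates = [
    {"op": "U", "id": 1, "telephone": "222"},
    {"op": "I", "id": 2, "telephone": "333"},
    {"op": "D", "id": 1},
]
reconcile(table, updates)
print(sorted(table))  # -> [2]
```

Applying operations in log order matters: the update to id 1 happens before its delete, so the final state holds only id 2.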
Workflow: Oozie Coordinator Job
An Oozie orchestration flow, created using the StreamAnalytix web studio, combines Pipeline #2 and Pipeline #3 into a single Oozie coordinator job that can be scheduled as shown here
Demo
Data channels
• Attunity Replicate and LogMiner
Data processing pipeline walkthrough
• Data filters and enrichment
• Analytics and data processing operators
• Data stores
Summary
Do more with your data acquisition flows
• Acquire and process data in real-time
• Enrich data from data marts
• Publish processed data as it arrives
• Multiple parallel processing paths (read once, process multiple times)
Move away from fragmented processes
• Unify data analytics and data processing/ETL flows
Conclusion
The right next-gen CDC solution can make data analytics-ready in near real-time, as it arrives
CDC-based data integration is far more complex than the full export
and import of your database
A unified platform simplifies and reduces the complexities of
operationalizing CDC flows
LIVE Q&A
For a free trial download or cloud access visit www.StreamAnalytix.com
For any questions, contact us at inquiry@streamanalytix.com
Solving the EDW transformation conundrum - Impetus webinarSolving the EDW transformation conundrum - Impetus webinar
Solving the EDW transformation conundrum - Impetus webinar
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scale
 
Keys to Formulating an Effective Data Management Strategy in the Age of Data
Keys to Formulating an Effective Data Management Strategy in the Age of DataKeys to Formulating an Effective Data Management Strategy in the Age of Data
Keys to Formulating an Effective Data Management Strategy in the Age of Data
 
Build Spark-based ETL Workflows on Cloud in Minutes
Build Spark-based ETL Workflows on Cloud in MinutesBuild Spark-based ETL Workflows on Cloud in Minutes
Build Spark-based ETL Workflows on Cloud in Minutes
 
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
 
Streaming Analytics for IoT with Apache Spark
Streaming Analytics for IoT with Apache SparkStreaming Analytics for IoT with Apache Spark
Streaming Analytics for IoT with Apache Spark
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
 
The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...
 
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
 
Anomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptxAnomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptx
 
Importance of Big Data Analytics
Importance of Big Data AnalyticsImportance of Big Data Analytics
Importance of Big Data Analytics
 

Recently uploaded

standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhArpitMalhotra16
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单yhkoc
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...elinavihriala
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundOppotus
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单ewymefz
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBAlireza Kamrani
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单ewymefz
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sMAQIB18
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsCEPTES Software Inc
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单vcaxypu
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?DOT TECH
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单vcaxypu
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIAlejandraGmez176757
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJames Polillo
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单ocavb
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .NABLAS株式会社
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportSatyamNeelmani2
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单ewymefz
 

Recently uploaded (20)

standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Computer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage sComputer Presentation.pptx ecommerce advantage s
Computer Presentation.pptx ecommerce advantage s
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization Sample
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 

Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - StreamAnalytix Webinar

  • 1. ©2018 Impetus Technologies, Inc. All rights reserved. You are prohibited from making a copy or modification of, or from redistributing, rebroadcasting, or re-encoding of this content without the prior written consent of Impetus Technologies. This presentation may include images from other products and services. These images are used for illustrative purposes only. Unless explicitly stated there is no implied endorsement or sponsorship of these products by Impetus Technologies. All copyrights and trademarks are property of their respective owners.
  • 2. Planning your Next-Gen Change Data Capture (CDC) Architecture December 19, 2018
  • 3. Agenda What is CDC? Various methods for CDC in the enterprise data warehouse Key considerations for implementing a next-gen CDC architecture Demo Q&A
  • 4. About Impetus We exist to create powerful and intelligent enterprises through deep data awareness, data integration, and data analytics.
  • 5. About Impetus Many of North America’s most respected and well-known brands trust us as their strategic big data and analytics partner.
  • 6. Transformation Legacy EDW to big data/cloud Unification Data processing, preparation, and access Analytics Real-time, machine learning, and AI Self-service BI on big data/ cloud End-to-End Big Data Solutions
  • 7. What are the different change data capture use cases currently deployed in your organization (choose all that apply)? Continuous Ingestion in the Data Lake Capturing streaming data changes Database migration to cloud Data preparation for analytics and ML jobs We still have a legacy system
  • 8. Our Speakers Today SAURABH DUTTA Technical Product Manager SAMEER BHIDE Senior Solutions Architect
  • 9. What is Change Data Capture (CDC)? CDC is the process of capturing changes made at the data source and applying them throughout the enterprise.
  • 10. Let’s Take a Closer Look Source Database Target Database
  • 11. Let’s Take a Closer Look Source Database Target Database WebApp
  • 12. Let’s Take a Closer Look Source Database Target Database Customer { Telephone: “111” } Customer { Telephone: “111” } Create WebApp
  • 13. Let’s Take a Closer Look Change Data Capture Event Source Database Target Database Customer { Telephone: “111” } Customer { Telephone: “111” } Customer { Telephone: “111” } Create WebApp
  • 14. Let’s Take a Closer Look Change Data Capture Event Source Database Target Database Customer { Telephone: “222” } Customer { Telephone: “222” } Customer { Telephone: “222” } Update WebApp
  • 15. What Does CDC Mean for the Enterprise? Batch Replicate Filter Transform In-memory Batch RDBMS Data warehouse Files RDBMS Data warehouse Hadoop Streaming Legacy Real-time Incremental
  • 16. Modern CDC Applications Data lake: Continuous ingestion and pipeline automation Streaming: Data changes to Kafka, Kinesis, or other queues Cloud: Data workload migration Business applications: Data preparation for analytics and ML jobs Legacy system: Data delivery and query offload Data lakes CloudStreaming Data warehouse Files Legacy RDBMS
  • 17. Methods of Change Data Capture Database triggers Date modification stamping Log-based CDC
  • 18. Database Triggers Uses shadow tables Challenges • Introduces overhead • Increases load to retrieve • Loses intermediate changes
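The trigger-and-shadow-table approach above can be sketched in a few lines. This is a minimal illustration using SQLite (real deployments would use the source RDBMS's own trigger syntax); the table and column names are invented for the example:

```python
import sqlite3

# Trigger-based CDC sketch: every write to the source table also writes a
# change row into a shadow table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, telephone TEXT);
-- Shadow table records every change, row by row
CREATE TABLE customer_shadow (
    id INTEGER, telephone TEXT, op TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
    INSERT INTO customer_shadow (id, telephone, op)
    VALUES (NEW.id, NEW.telephone, 'I');
END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
    INSERT INTO customer_shadow (id, telephone, op)
    VALUES (NEW.id, NEW.telephone, 'U');
END;
CREATE TRIGGER customer_del AFTER DELETE ON customer BEGIN
    INSERT INTO customer_shadow (id, telephone, op)
    VALUES (OLD.id, OLD.telephone, 'D');
END;
""")

conn.execute("INSERT INTO customer VALUES (1, '111')")
conn.execute("UPDATE customer SET telephone = '222' WHERE id = 1")
conn.execute("DELETE FROM customer WHERE id = 1")

# Every change is captured -- and every write now costs two writes,
# which is the overhead the slide warns about.
changes = conn.execute("SELECT id, telephone, op FROM customer_shadow").fetchall()
print(changes)  # [(1, '111', 'I'), (1, '222', 'U'), (1, '222', 'D')]
```

Note the trade-off: capture is complete (unlike timestamp scans, intermediate states can be recorded), but the triggers add synchronous overhead to every transaction on the source.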
  • 19. Date Modification Stamping Transactional applications keep track of metadata in every row • Tracks when the row was created and last modified • Enables filtering on the DATE_MODIFIED column Challenges • There is no DATE_MODIFIED for a deleted row • Trigger-based DATE_MODIFIED maintenance • Extraction is resource-intensive
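The DATE_MODIFIED filter and its blind spot for deletes can be shown concretely. A minimal sketch, with illustrative row data (the table, column, and timestamp values are assumptions for the example):

```python
from datetime import datetime

# Date-modification stamping sketch: each row carries a DATE_MODIFIED
# value, and each incremental extract pulls rows changed since the last run.
rows = [
    {"id": 1, "telephone": "111", "date_modified": datetime(2018, 12, 18, 9, 0)},
    {"id": 2, "telephone": "222", "date_modified": datetime(2018, 12, 19, 9, 0)},
]
last_extracted = datetime(2018, 12, 19, 0, 0)

# Equivalent of: SELECT * FROM customer WHERE date_modified > :last_extracted
delta = [r for r in rows if r["date_modified"] > last_extracted]
print([r["id"] for r in delta])  # [2]

# The slide's caveat: a hard-deleted row leaves no DATE_MODIFIED behind.
# Here row 1 is physically removed...
rows = [r for r in rows if r["id"] != 1]
delta = [r for r in rows if r["date_modified"] > last_extracted]
# ...and the delete is invisible to the incremental extract:
print(1 in [r["id"] for r in delta])  # False
```

This is why timestamp-based extraction usually needs soft deletes (a deleted flag) or a supplementary mechanism to propagate deletions.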
  • 20. Log-Based CDC Uses transaction logs Challenges • Interpreting transaction logs • Vendors provide no direct interface to the transaction log • Agents and interfaces change with new database versions • Supplemental logging increases the volume of data
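Conceptually, log-based CDC tails the database's transaction log and replays ordered change events against a target. The sketch below uses an invented, simplified event format purely for illustration; real transaction logs (e.g. Oracle redo accessed via LogMiner) are vendor-specific structures, which is exactly why "interpreting transaction logs" appears above as a challenge:

```python
# Log-based CDC sketch: ordered change events (by log sequence number, LSN)
# are replayed against a target keyed by primary key.
log_events = [
    {"lsn": 1, "op": "INSERT", "row": {"id": 1, "telephone": "111"}},
    {"lsn": 2, "op": "UPDATE", "row": {"id": 1, "telephone": "222"}},
    {"lsn": 3, "op": "DELETE", "row": {"id": 1}},
]

def apply_events(target: dict, events: list) -> dict:
    """Replay log events in LSN order against a target keyed by primary key."""
    for ev in sorted(events, key=lambda e: e["lsn"]):
        key = ev["row"]["id"]
        if ev["op"] == "DELETE":
            target.pop(key, None)
        else:  # INSERT and UPDATE both upsert the full row image
            target[key] = ev["row"]
    return target

# After the insert and update, the target holds the latest row image...
print(apply_events({}, log_events[:2]))  # {1: {'id': 1, 'telephone': '222'}}
# ...and unlike DATE_MODIFIED scans, the delete is captured too.
print(apply_events({}, log_events))      # {}
```

Because it reads the log rather than querying tables, this method captures deletes and intermediate changes with minimal load on the source, at the cost of the vendor-specific parsing challenges listed above.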
  • 21. Change Data Capture Implementation Steps Enable CDC for database Prepare table for CDC Define target table to handle CDC states Run initial load Incremental updates
  • 22. Next-Gen Architecture Considerations Ease of Use Pre-packaged operators Extensibility Modern user experience Real-time Change Data Capture Stream live updates Optimized for high performance Hybrid Multiple vendors On-premises and cloud Databases, data warehouse, and data lake
  • 23. Value Proposition of CDC Incremental update efficiency Source/production impact Time to value TCO Scale and flexibility
  • 24. Continuous Ingestion in the Data Lake 46% Capturing streaming data changes 58% Database migration to cloud 38% Data preparation for analytics and ML jobs 35% We still have a legacy system 46% What are the different change data capture use cases currently deployed in your organization (choose all that apply)?
  • 25. ETL, Real-time Stream Processing and Machine Learning Platform + A Visual IDE for Apache Spark
  • 26. CDC with StreamAnalytix Turnkey adapters for CDC vendor ETL and data wrangling visual operators Elastic compute ReconcileTransform Enrich Structured data stores CDC streams Unstructured data streams File stores Structured data stores Message queues Hadoop/Hive Cloud storage and DW
  • 27. CDC Capabilities in StreamAnalytix Integration with CDC providers
  • 28. CDC Capabilities in StreamAnalytix LogMiner integration
  • 29. CDC Capabilities in StreamAnalytix Turnkey reconciliation feature for Hadoop offload
  • 30. CDC Capabilities in StreamAnalytix Large set of visual operators for ETL, analytics, and stream processing Zero code approach to ETL design Built in NFR support
  • 31. StreamAnalytix CDC Solution Design StreamAnalytix Workflow A complete CDC solution has three parts: Each aspect of the solution is modelled as a StreamAnalytix pipeline Data de-normalization Join transactional data with data at rest, and store de-normalized data on HDFS Merge previously processed transactional data with new incremental updates Incremental updates in Hive Data ingestion and staging Stream data from Attunity, replicate from Kafka or LogMiner for multiple tables, and store raw data into HDFS
  • 32. Pipeline #1: Data Ingestion and Staging (Streaming) Data Ingestion via Attunity ‘Channel’ Reads data from Attunity, with Kafka as the target; configured to read data feeds and metadata from a separate topic Data Enrichment Enriches incoming data with metadata information and an event timestamp HDFS Stores CDC data in an HDFS landing area using the OOB HDFS emitter; HDFS files are rotated based on time and size
  • 33. Pipeline #1: Data Ingestion and Staging (Streaming) Data Ingestion via Attunity ‘Channel’ Reads data from Attunity, with Kafka as the target; configured to read data feeds and metadata from a separate topic Data Enrichment Enriches incoming data with metadata information and an event timestamp HDFS Stores CDC data in an HDFS landing area using the OOB HDFS emitter; HDFS files are rotated based on time and size
  • 34. Pipeline #1: Data Ingestion and Staging (Streaming) Data Ingestion via Attunity ‘Channel’ Reads data from Attunity, with Kafka as the target; configured to read data feeds and metadata from a separate topic Data Enrichment Enriches incoming data with metadata information and an event timestamp HDFS Stores CDC data in an HDFS landing area using the OOB HDFS emitter; HDFS files are rotated based on time and size
  • 35. Pipeline #2: Data De-normalization (Batch) HDFS Data Channel Ingests incremental data from previous runs of the staging location Reads reference data (data at rest) from a fixed HDFS location Performs an outer join to merge incremental and static data Stores de-normalized data to an HDFS directory
  • 36. Pipeline #2: Data De-normalization (Batch) HDFS Data Channel Ingests incremental data from previous runs of the staging location Reads reference data (data at rest) from a fixed HDFS location Performs an outer join to merge incremental and static data Stores de-normalized data to an HDFS directory
  • 37. Pipeline #2: Data De-normalization (Batch) HDFS Data Channel Ingests incremental data from previous runs of the staging location Reads reference data (data at rest) from a fixed HDFS location Performs an outer join to merge incremental and static data Stores de-normalized data to an HDFS directory
  • 38. Pipeline #2: Data De-normalization (Batch) HDFS Data Channel Ingests incremental data from previous runs of the staging location Reads reference data (data at rest) from a fixed HDFS location Performs an outer join to merge incremental and static data Stores de-normalized data to an HDFS directory
  • 39. Pipeline #3: Incremental Updates in Hive (Batch) Runs a Hive SQL query to load a managed table from the HDFS incremental data generated by Pipeline #2 Reconciliation Step Hive “merge into” SQL performs insert, update, and delete operations based on the operation type in the incremental data Clean-up Step Runs a drop table command on the managed table to clean up processed data and avoid repeated processing
  • 40. Pipeline #3: Incremental Updates in Hive (Batch) Runs a Hive SQL query to load a managed table from the HDFS incremental data generated by Pipeline #2 Reconciliation Step Hive “merge into” SQL performs insert, update, and delete operations based on the operation type in the incremental data Clean-up Step Runs a drop table command on the managed table to clean up processed data and avoid repeated processing
  • 41. Pipeline #3: Incremental Updates in Hive (Batch) Runs a Hive SQL query to load a managed table from the HDFS incremental data generated by Pipeline #2 Reconciliation Step Hive “merge into” SQL performs insert, update, and delete operations based on the operation type in the incremental data Clean-up Step Runs a drop table command on the managed table to clean up processed data and avoid repeated processing
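The reconciliation semantics of the Hive "merge into" step can be emulated in plain Python. This is an illustrative sketch, not the deck's actual Hive SQL; the column names and op codes ('I'/'U'/'D') are assumptions for the example:

```python
# MERGE INTO master USING incremental ON master.id = incremental.id
#   WHEN MATCHED AND op = 'D' THEN DELETE
#   WHEN MATCHED THEN UPDATE SET ...
#   WHEN NOT MATCHED THEN INSERT ...
# emulated over dicts keyed by primary key.
master = {1: {"id": 1, "telephone": "111"}, 2: {"id": 2, "telephone": "222"}}
incremental = [
    {"id": 2, "telephone": "999", "op": "U"},  # update existing row
    {"id": 3, "telephone": "333", "op": "I"},  # insert new row
    {"id": 1, "op": "D"},                      # delete existing row
]

for row in incremental:
    if row["op"] == "D":
        master.pop(row["id"], None)            # matched + delete op
    else:
        master[row["id"]] = {"id": row["id"], "telephone": row["telephone"]}  # upsert

print(sorted(master))            # [2, 3]
print(master[2]["telephone"])    # 999
```

One pass over the incremental data leaves the master table reconciled, which is why the clean-up step can then safely drop the staged data.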
  • 42. Workflow: Oozie Coordinator Job Oozie orchestration flow created using the StreamAnalytix web studio – it orchestrates Pipeline #2 and Pipeline #3 into a single Oozie flow that can be scheduled as shown here
  • 43. Demo Data channels • Attunity Replicate and LogMiner Data processing pipeline walkthrough • Data filters and enrichment • Analytics and data processing operators • Data stores
  • 44. Summary Do more with your data acquisition flows • Acquire and process data in real-time • Enrich data from data marts • Publish processed data as it arrives • Multiple parallel processing paths (read once, process multiple times) Move away from fragmented processes • Unify data analytics and data processing/ETL flows
  • 45. Conclusion The right next-gen CDC solution can make data ready for analytics as it arrives, in near real-time CDC-based data integration is far more complex than a full export and import of your database A unified platform simplifies and reduces the complexity of operationalizing CDC flows
  • 46. LIVE Q&A For a free trial download or cloud access visit www.StreamAnalytix.com For any questions, contact us at inquiry@streamanalytix.com

Editor's Notes

  1. Intro - 5 min; Poll - 2 min. Saurabh: Background 2 min, Goals 3 min, Steps 3 min, Methods 3 min, Architectural considerations 2 min. Sameer: CDC with SAX 3 min, Deployment Key Benefits 3 min, Demo 10 min, Beyond CDC 5 min. Q&A - 10 min
  2. CDC minimizes the resources required for ETL (extract, transform, load) processes because it deals only with data changes. The goal of CDC is to ensure data synchronicity.
  3. https://www.youtube.com/watch?v=v_hQyUZzLsA
  4. https://www.youtube.com/watch?v=v_hQyUZzLsA
  5. https://www.youtube.com/watch?v=v_hQyUZzLsA
  6. https://www.youtube.com/watch?v=v_hQyUZzLsA
  7. https://www.youtube.com/watch?v=v_hQyUZzLsA
  8. https://www.youtube.com/watch?v=1WrfgBx3hiQ
  9. Reference: https://www.slideshare.net/jimdeppen/change-data-capture-13718162 https://www.hvr-software.com/blog/change-data-capture/
  10. https://blog.exsilio.com/all/how-to-use-change-data-capture/
  11. https://www.youtube.com/watch?v=1WrfgBx3hiQ
  12. https://www.youtube.com/watch?v=1WrfgBx3hiQ
  13. Join Processor: It. HDFS:.
  14. DataGenerator: It generates a dummy record on start of pipeline. LoadStagingInHive: It runs a MergeStagingAndMasterData: It runs a. DropStagingHiveData: It runs a drop table command on hive to drop the managed table loaded in second step.
  15. Questions