Bridging the Gap
John Kuchmek – American Water
Adam Michalsky – American Water
Nagaraj Jayakumar - Hortonworks
WHO WE ARE
We serve a broad national footprint and a strong
local presence.
We provide services to approximately 15 million
people in 46 states and Ontario, Canada.
We employ 6,900 dedicated and active employees
and support ongoing community involvement and
corporate responsibility.
We treat and deliver more than one billion gallons
of water daily.
We are the largest and most geographically
diverse publicly traded water and wastewater
service provider in the United States.
Problem Statement
Achieve fast change data capture from SAP while providing
denormalized data sets to end consumers without impacting the
source transactional systems.
HANA table replication maintains source system normalization,
which can be a problem for business logic design in application
use.
No HANA change data capture existed using denormalized table
structures.
Environment
4 Management Nodes:
(32 Cores x 78 GB)
8 Compute Nodes
(32 Cores x 128 GB)
2 Management Nodes:
(6 Cores x 16 GB)
5 NiFi Nodes
(16 Cores x 64 GB)
Data Ingestion - Architecture
[Architecture diagram. Stages: Source, Ingest, Storage, Analytics, UI/UX. Labelled components: Runtime, SLT.]
Adding Timestamp to SLT
Dataset Denormalization
CDC Process – High Level
Staging Delta Base
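The slide names three stages (Staging, Delta, Base) without spelling out the logic between them. The sketch below is one possible reading, not the presenters' implementation: staged rows are reduced to the newest version per key (the delta), and the delta is then upserted into the base. The function names, the `id` key and the integer timestamps are assumptions made for illustration; `UPDATE_TS` is the timestamp field named in the speaker notes.

```python
from typing import Dict, List

Row = Dict[str, object]


def build_delta(staging: List[Row], key: str, ts: str) -> Dict[object, Row]:
    """Reduce staged rows to the newest version of each key (hypothetical helper)."""
    delta: Dict[object, Row] = {}
    for row in staging:
        k = row[key]
        current = delta.get(k)
        # A null timestamp is treated as 0, i.e. older than any real change.
        if current is None or (row[ts] or 0) >= (current[ts] or 0):
            delta[k] = row
    return delta


def merge_into_base(base: Dict[object, Row], delta: Dict[object, Row]) -> Dict[object, Row]:
    """Upsert the delta into a copy of the base: update matching keys, insert new ones."""
    merged = dict(base)
    merged.update(delta)
    return merged


if __name__ == "__main__":
    staging = [
        {"id": 1, "val": "a", "UPDATE_TS": 10},
        {"id": 1, "val": "b", "UPDATE_TS": 20},
        {"id": 2, "val": "c", "UPDATE_TS": None},
    ]
    base = {1: {"id": 1, "val": "old", "UPDATE_TS": 5},
            3: {"id": 3, "val": "kept", "UPDATE_TS": 7}}
    print(merge_into_base(base, build_delta(staging, "id", "UPDATE_TS")))
```

Deletes are deliberately left out of this sketch because the deck does not say how they are handled.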
Data Ingestion
Metrics (Average Merge Time)
Table                          Records   Base table   Seconds
maintenancenotificationacts       2764     36220184       141
meterataglance                   15248     13589752       121
crmlongtext                      19958     18753324       178
interactionrecords               18970     74356523        99
ecclongtext                      20235    143224450       139
maintenanceorderstatus            8183    166172977       152
contractaccountdocheaderfb      175433    398561392       449
[Chart: Average Concurrent Merges per Table on LLAP. Axis: # of records (log base 10).]
Metrics (Max, Min & Mean)
             min(simultaneous)   average(simultaneous)   max(simultaneous)
records                  18970             37255.85714              175433
base table            74356523               121554086           398561392
seconds                     99             182.7142857                 449
[Chart: Min, Avg, Max Time for 7 Concurrent Merges on LLAP. Axis: # of records (log base 10).]
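The summary figures above can be recomputed from the per-table numbers on the previous slide. The sketch below does that in Python; the table names are rejoined from the wrapped chart labels, and the helper name `summarize` is made up for this example. It suggests that the min and max columns describe the tables with the shortest and longest merge time, not the smallest and largest record counts.

```python
from statistics import mean

# (records merged, base table size, merge time in seconds) per table,
# copied from the "Metrics (Average Merge Time)" slide.
merges = {
    "maintenancenotificationacts": (2764, 36220184, 141),
    "meterataglance": (15248, 13589752, 121),
    "crmlongtext": (19958, 18753324, 178),
    "interactionrecords": (18970, 74356523, 99),
    "ecclongtext": (20235, 143224450, 139),
    "maintenanceorderstatus": (8183, 166172977, 152),
    "contractaccountdocheaderfb": (175433, 398561392, 449),
}


def summarize(data):
    """Return the fastest table, the slowest table, and the column-wise averages."""
    fastest = min(data, key=lambda name: data[name][2])
    slowest = max(data, key=lambda name: data[name][2])
    averages = tuple(mean(column) for column in zip(*data.values()))
    return fastest, slowest, averages


if __name__ == "__main__":
    fastest, slowest, averages = summarize(merges)
    print("min:", fastest, merges[fastest])
    print("max:", slowest, merges[slowest])
    print("avg:", averages)
```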
Source System Load
[Chart: Source System Load (snapshot). Axes: time (5/18/18, 10:03 to 13:41), # of records (0 to 300,000), duration (seconds).]
Average CPU Utilization
[Chart: Average CPU Usage Across 8 Node Cluster. Axes: time, % utilization (0 to 60). Series: Average of Minimum CPU Load, Average of Average CPU Load, Average of Peak CPU Load.]
Average Memory Used (hourly)
[Chart: Average Memory Used Across 8 Node Cluster. Axes: time, memory in GB (0 to 100). Series: Average of Minimum Memory Used, Average of Average Memory Used, Average of Peak Memory Used.]
NiFi Flow as a Service (NFaaS)
THANK YOU

SAP CDC and NiFi Flow as a Service


Editor's Notes

  1. In HANA studio we can de-normalize the datasets.
  2. The end result in HANA will look like this. UPDATE_TS is our timestamp field. Special notes: The timestamp is only updated when a change occurs, so after the initial replication timestamps will be null or 0. To add a timestamp to a table that already exists in SLT, the table needs to be re-replicated.
  3. In HANA studio we can de-normalize the datasets.
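The timestamp behaviour described in note 2 matters when selecting changed rows. The following is a minimal sketch, assuming a consumer that tracks a high-water mark between runs; the function names and the integer timestamp encoding are hypothetical, and only the `UPDATE_TS` field name comes from the notes. Rows whose timestamp is null or 0 are treated as unchanged since the initial replication.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[int]]


def changed_since(rows: List[Row], watermark: int, ts_field: str = "UPDATE_TS") -> List[Row]:
    """Return rows changed after the watermark.

    A null or 0 timestamp means the row has not changed since the initial
    replication, so such rows are never picked up as changes.
    """
    return [r for r in rows if r.get(ts_field) and r[ts_field] > watermark]


def next_watermark(rows: List[Row], watermark: int, ts_field: str = "UPDATE_TS") -> int:
    """Advance the watermark to the newest timestamp seen, ignoring null/0 values."""
    stamps = [r[ts_field] for r in rows if r.get(ts_field)]
    return max(stamps + [watermark])


if __name__ == "__main__":
    rows = [
        {"id": 1, "UPDATE_TS": None},            # untouched since initial load
        {"id": 2, "UPDATE_TS": 0},               # untouched since initial load
        {"id": 3, "UPDATE_TS": 20180518100300},  # changed before the watermark
        {"id": 4, "UPDATE_TS": 20180518134100},  # changed after the watermark
    ]
    mark = 20180518120000
    print(changed_since(rows, mark))
    print(next_watermark(rows, mark))
```

Because unchanged rows never carry a usable timestamp, a filter like this only works after the initial load has been copied in full by some other path.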