© 2020 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms affiliated with KPMG International
Cooperative (“KPMG International”), a Swiss entity. All rights reserved.
August 13, 2022 at DataConLA
Real Time Data Streaming with Kafka
Speaker:
Jie Chen
Manager Advisory
Engineering Architect
LinkedIn
Agenda
Kafka at a Glance (5 min)
Kafka Use Cases (20 min)
  Intelligent Forecast System
  Kafka in Banking
  Distributed Data with CQRS
Key Takeaways (5 min)
Q&A (10 min)
Kafka at a Glance
Kafka in the Market
CORE CAPABILITIES
Scalable
Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions. Elastically expand and contract storage and processing.
High Throughput
Deliver messages at network-limited throughput using a cluster of machines, with latencies as low as 2 ms.
Permanent Storage
Store streams of data safely in a distributed, durable, fault-tolerant cluster.
High Availability
Stretch clusters efficiently over availability zones, or connect separate clusters across geographic regions.
Source: kafka.apache.org
Kafka Platform Overview
Event Streaming Platform
A distributed streaming platform that enables real-time, event-driven applications using a topic-based pub/sub model
Performance at Scale
Kafka runs as a highly available, fault-tolerant cluster that can span servers and even data centers, with a partitioning system that supports data volumes of practically any size
https://docs.confluent.io/
What Is Event-Driven Streaming with Kafka?
[Diagram: producers (ETL, raw message queue, change data capture, mainframe, custom sources) publish events to a topic whose partitions are spread across the brokers (servers) of a Kafka cluster; consumers (web, mobile, data warehouse, monitoring tools, partners) subscribe to the topic and drain the data.]
An event is a type of data that describes an entity's observable state changes over time (definition by IBM).
Examples include a first-time user registration, a payment, or a social media post.
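To make the topic, partition, and pub/sub vocabulary above concrete, here is a minimal in-memory sketch. It is not the Kafka client API: `Topic` and `Consumer` are hypothetical classes, and `zlib.crc32` stands in for the key hash (the Java client's default partitioner uses murmur2). It shows two properties the diagram relies on: events with the same key land in the same partition, and each consumer tracks its own offset, so one subscriber reading data does not remove it for the others.

```python
import zlib


class Topic:
    """In-memory stand-in for a Kafka topic: a fixed set of append-only partitions."""

    def __init__(self, name: str, num_partitions: int = 3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key: str, value: dict) -> tuple:
        # Same key -> same partition, which preserves per-key ordering.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class Consumer:
    """Tracks its own offset per partition; polling never deletes records."""

    def __init__(self, topic: Topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self) -> list:
        batch = []
        for p, log in enumerate(self.topic.partitions):
            batch.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)  # advance only this consumer's position
        return batch


# Producers publish; two independent consumers subscribe to the same topic.
payments = Topic("payments")
payments.publish("user-42", {"type": "registration"})
payments.publish("user-42", {"type": "payment", "amount": 25.0})
web, warehouse = Consumer(payments), Consumer(payments)
print(len(web.poll()), len(warehouse.poll()))  # 2 2
print(web.poll())  # []
```

Both the web and warehouse consumers receive the same two events; a second `poll()` returns an empty list because that consumer's offsets have advanced, while the records themselves remain in the topic.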
Distributed Data with CQRS and Kafka
Distributed Data with CQRS and Kafka - CQRS at a Glance
Overview
Command Query Responsibility Segregation
Read and write workloads are separated, decoupled, and scaled independently.
Event Sourcing
CQRS is often linked with event sourcing, which treats data state as a series of discrete events.
Event sourcing is an approach to handling operations on data that is driven by a sequence of events, each of which is recorded in an append-only store (definition by Microsoft). For example, a user places an online order and later returns it under the same account.
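A minimal event-sourcing sketch of the order example, assuming hypothetical event names (`OrderPlaced`, `OrderReturned`) and an in-memory list in place of a durable append-only store such as a Kafka topic. The current state is never stored directly; it is derived by replaying the events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    order_id: str
    kind: str       # hypothetical event names: "OrderPlaced", "OrderReturned"
    amount: float


class EventStore:
    """Append-only store: events are added, never updated or deleted."""

    def __init__(self):
        self._events = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # position of the event in the log

    def replay(self):
        return iter(self._events)


def current_state(store: EventStore, order_id: str) -> dict:
    """Rebuild one order's state by folding over every event recorded for it."""
    state = {"status": None, "total": 0.0, "refunded": 0.0}
    for e in store.replay():
        if e.order_id != order_id:
            continue
        if e.kind == "OrderPlaced":
            state["status"] = "placed"
            state["total"] = e.amount
        elif e.kind == "OrderReturned":
            state["status"] = "returned"
            state["refunded"] += e.amount
    return state


store = EventStore()
store.append(Event("A1", "OrderPlaced", 59.99))
store.append(Event("A1", "OrderReturned", 59.99))
print(current_state(store, "A1"))
# {'status': 'returned', 'total': 59.99, 'refunded': 59.99}
```

Because the placement event is never overwritten, the history (placed, then returned) stays available for auditing, and any new read model can be built later by replaying the same log.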
Distributed Data with CQRS and Kafka - Traditional Design
Difficult to Scale
The SOR must support the load of all clients and systems. Read replicas can improve scalability.
Single Point of Failure
If the SOR or API layer is unavailable, all consumers may be affected.
Rigid
All access to SOR data flows through centralized APIs. Consumers receive data in the schemas defined by the access layer.
Difficult to Manipulate Data
Direct access to the SOR is restricted. Transforms, joins, and analytical operations may be difficult and rely on lagging ETL operations.
Client: external-facing UI, third-party APIs
System: internal-facing ETL, mainframe
SOR: System of Record (the authoritative data source)
Distributed Data with CQRS and Kafka - CQRS Design
Data Changes as Events
The current state of the SOR is captured in an event format.
Consumers Subscribe to Changes
Consumers listen for data change events and consume the information according to their own use cases.
Other Systems Act on Data
Systems act on data updates as defined by their use case. They may replicate the data, enrich it, or simply process events in real time.
Read/Write Separation
Data reads are segregated from data writes. Read-only consumers add no load to the SOR.
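The read/write split can be sketched in a few lines. This is an illustration under simplifying assumptions, not a reference design: a plain list stands in for the Kafka topic, `WriteModel` and `ReadModel` are hypothetical names, and the projection is advanced manually with `catch_up()` rather than by a running consumer. It also shows the eventual consistency the pattern trades for isolation: the read model is stale until it processes the new event.

```python
class WriteModel:
    """Command side: the only code path that mutates the system of record (SOR)."""

    def __init__(self, event_log: list):
        self.sor = {}               # authoritative balances
        self.event_log = event_log  # stand-in for a Kafka topic

    def deposit(self, account: str, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")  # write-side validation
        self.sor[account] = self.sor.get(account, 0.0) + amount
        # Publish the change so readers never have to query the SOR.
        self.event_log.append({"account": account, "delta": amount})


class ReadModel:
    """Query side: a view built only from events, never from the SOR."""

    def __init__(self, event_log: list):
        self.event_log = event_log
        self.position = 0  # index of the next event to apply
        self.view = {}

    def catch_up(self) -> int:
        """Apply all events not yet seen; return how many were applied."""
        applied = 0
        while self.position < len(self.event_log):
            e = self.event_log[self.position]
            self.view[e["account"]] = self.view.get(e["account"], 0.0) + e["delta"]
            self.position += 1
            applied += 1
        return applied

    def balance(self, account: str) -> float:
        return self.view.get(account, 0.0)


log = []
writes, reads = WriteModel(log), ReadModel(log)
writes.deposit("acct-1", 100.0)
print(reads.balance("acct-1"))  # 0.0 (stale: the projection has not run yet)
reads.catch_up()
print(reads.balance("acct-1"))  # 100.0
```

Several read models, each with its own position, can consume the same log at different speeds without adding any load to the write side.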
Distributed Data with CQRS and Kafka - Advantages and Challenges
Advantages
Independent Scaling
Read and write workloads may be scaled independently based on load and access patterns.
Separation of Concerns
Segregated models allow tightly controlled write logic while permitting flexibility in read models and stream processing.
System Isolation
Access to the SOR database is restricted to a controlled write API. Consumers may safely read from a replica.
Flexible Consumption
Kafka's scalable architecture allows consumers to process events differently across systems and at different velocities.
Challenges
Eventual Consistency
Reads are eventually consistent and may lag until writes have propagated through the system.
Complexity
Implementing the pattern increases the complexity of the overall solution.
Different Data Velocity
Consumers may process events at different velocities, resulting in inconsistencies across systems.
Distributed Data with CQRS and Kafka - Common Scenarios
Scenarios to Consider
Complex Data Operations Across Systems
Different systems need to process and transform data in complex and evolving use cases.
Real-time Data Processing Across Systems
Traditional ETL and batch operations are too slow and rigid to meet evolving business requirements. The organization seeks to process data in real time as it becomes available across different systems.
Resource Bottlenecks with Growing Demand
Traditional data system resources are strained and unable to support the growing demands of the business.
Data Security Concerns Across Systems
Data must be shared securely across systems without introducing new security risks.
Increased Demand for Data Sharing Across the Enterprise
The enterprise seeks to break down data silos and share data effectively across the organization to increase synergy between systems.
Intelligent Forecast with Kafka
Intelligent Forecast with Native Kafka Solution
[Diagram: producers (ETL, raw message queue, change data capture, mainframe, custom sources) publish to the Kafka cluster; through the Kafka Connector API, subscribed topics drain into the ELK Stack, where Elasticsearch handles storage and indexing.]
Kafka's role in this solution is to publish data from the different channels as categorized topics. Through the Kafka connector APIs (connector replicators), the ELK Stack subscribes to the specified topics. This pub/sub flow is also called event streaming. The customized data can then be rendered in a Kibana dashboard.
Challenges
Kafka Connector
Similar open-source solutions such as MirrorMaker, uReplicator (by Uber), and Mirus (by Salesforce) can serve as alternatives for tackling the scalability bottleneck while reducing licensing costs.
PII Encryption
Whether you choose the Kafka security library or an in-house solution, it is important to establish PII governance early among producers, the Kafka cluster, and consumers. In other words, decide who is responsible for masking sensitive data throughout the real-time data streaming pipeline.
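One way to settle the "who masks" question is to mask at the producer, before a record ever reaches a topic. The sketch below assumes a hypothetical list of PII field names and a hard-coded demo key (a real deployment would load the key from a secrets manager). It pseudonymizes identifiers with a keyed HMAC-SHA256 hash, so equal inputs still map to equal tokens for joins, and keeps only the last four digits of a card number. This is masking, not reversible encryption: if downstream consumers need the original values, field-level encryption would be required instead.

```python
import hashlib
import hmac
import json

# Hypothetical field names; in practice the PII list would come from a governed schema.
PII_FIELDS = {"ssn", "email"}
CARD_FIELD = "card_number"


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash: equal inputs give equal tokens, the raw value is not
    recoverable, and without the key guesses cannot even be confirmed."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def mask_record(record: dict, key: bytes) -> dict:
    """Return a masked copy; the caller's record is left untouched."""
    masked = dict(record)
    for name in masked.keys() & PII_FIELDS:
        masked[name] = pseudonymize(str(masked[name]), key)
    if CARD_FIELD in masked:
        digits = str(masked[CARD_FIELD])
        masked[CARD_FIELD] = "*" * (len(digits) - 4) + digits[-4:]  # keep last four
    return masked


def to_kafka_value(record: dict, key: bytes) -> bytes:
    """Serialize the masked record; these bytes are what a producer would send."""
    return json.dumps(mask_record(record, key), sort_keys=True).encode()


DEMO_KEY = b"demo-only-key"  # assumption: load from a secrets manager in practice
event = {"user": "u-1", "email": "jane@example.com",
         "card_number": "4111111111111111", "amount": 42.5}
print(to_kafka_value(event, DEMO_KEY))
```

Masking before publish means the cluster and every consumer only ever see tokens, which keeps the responsibility in one place (the producer) rather than spread across every subscriber.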
Intelligent Forecast with Native Kafka Solution
Key Design
A pub/sub, decoupled, and asynchronous messaging service built for scalability
Equivalent Solutions
Azure Event Hubs
Google Pub/Sub
AWS Kinesis
Proactive Analytics
Use cases include detecting and forecasting abnormal trends that fall outside a threshold, such as transaction fraud at ATMs and anomalies in restaurant mobile orders.
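Detecting values outside a threshold can start as simply as a rolling z-score over the stream. The sketch below is an assumption about how such a check might look, not the system described in the talk: `ThresholdDetector`, the window size, and `k` are hypothetical, and the sample readings (ATM withdrawals per minute) are made up for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class ThresholdDetector:
    """Flags a value as abnormal when it lies more than `k` standard deviations
    from the mean of the last `window` values (a rolling z-score)."""

    def __init__(self, window: int = 20, k: float = 3.0, min_history: int = 5):
        self.history = deque(maxlen=window)
        self.k = k
        self.min_history = min_history  # no verdicts until enough data is seen

    def observe(self, value: float) -> bool:
        abnormal = False
        if len(self.history) >= self.min_history:
            data = list(self.history)
            mu, sigma = mean(data), pstdev(data)
            if sigma > 0:
                abnormal = abs(value - mu) > self.k * sigma
            else:
                abnormal = value != mu  # flat history: any change stands out
        self.history.append(value)
        return abnormal


# Hypothetical ATM withdrawals per minute; the last reading is a sudden burst.
detector = ThresholdDetector(window=10, k=3.0)
readings = [4, 5, 6, 5, 4, 5, 6, 5, 40]
flags = [detector.observe(r) for r in readings]
print(flags[-1])  # True
```

With these readings only the final spike is flagged. Appending every value, including flagged ones, to the history is a design choice: it lets the baseline adapt to a lasting level shift, at the cost of letting a burst of anomalies inflate the standard deviation.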
Kafka in Banking
What a Banking Institution Needs to Modernize Its Legacy System
A banking institution has been looking to migrate its legacy system to modern technologies that can keep pace with fast-growing big data demand, by building a modern data streaming platform as part of its Business Operation Brain (BOB).
Reuse the existing data centers, storage, infrastructure, and security procedures
Scalable and reliable (millions of transactions/events per second) with the existing infrastructure
Data must be logged for transaction tracing and auditing (for example, with Change Data Capture)
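Change Data Capture turns row-level changes in the system of record into events that can be logged and replayed for auditing. Production CDC tools typically read the database's transaction log; the sketch below substitutes a simpler snapshot diff so it can run on its own. The function name and event fields (`op`, `before`, `after`, `ts_ms`) are illustrative assumptions, not a specific tool's format.

```python
import time


def diff_to_change_events(before: dict, after: dict, table: str) -> list:
    """Compare two snapshots keyed by primary key and emit one change event per
    inserted, updated, or deleted row, in primary-key order."""
    events = []
    for pk in sorted(before.keys() | after.keys()):
        old, new = before.get(pk), after.get(pk)
        if old == new:
            continue  # unchanged rows produce no event
        if old is None:
            op = "c"  # create (insert)
        elif new is None:
            op = "d"  # delete
        else:
            op = "u"  # update
        events.append({
            "table": table,
            "key": pk,
            "op": op,
            "before": old,  # prior row image, kept for audit trails
            "after": new,
            "ts_ms": int(time.time() * 1000),
        })
    return events


before = {"acct-1": {"balance": 100}, "acct-2": {"balance": 50}}
after = {"acct-1": {"balance": 80}, "acct-3": {"balance": 10}}
for e in diff_to_change_events(before, after, "accounts"):
    print(e["key"], e["op"])
# acct-1 u
# acct-2 d
# acct-3 c
```

Keeping both the before and after images in each event is what makes the stream useful for transaction tracing: an auditor can reconstruct exactly what changed without querying the source database.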
What Options We Have: Kafka and Its Comparables in the Marketplace
(Not an exhaustive list)
Identify KPIs When Evaluating the Options
Compared: Apache Kafka, RabbitMQ, AWS Kinesis
Operation
  Apache Kafka: open source; on premises or managed cloud
  RabbitMQ: open source; on premises or managed cloud
  AWS Kinesis: proprietary; cloud computing
Cost
  AWS Kinesis: pay as you go
Messaging
  Apache Kafka: immutable, ordered, replayable; user-defined retention policy
  RabbitMQ: queue/message index with TTL; messages are removed once consumed
  AWS Kinesis: retention of up to 365 days
Storage
  Apache Kafka: persistent storage in an append-only log offers durability and reliability
  RabbitMQ: messages are removed once consumed; in-memory storage is preferred
  AWS Kinesis: elastic and durable
Scalability
  Apache Kafka: horizontal scaling (scale out) by adding more machines to increase disk I/O
  RabbitMQ: vertical scaling (scale up) by adding more CPU and RAM to the existing machine/hardware
  AWS Kinesis: autoscaling
Security
  Apache Kafka: customized, manual configuration
  RabbitMQ: customized, manual configuration
  AWS Kinesis: native cloud solution
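The Messaging and Storage rows capture the core semantic difference between Kafka and RabbitMQ. The sketch below contrasts the two models in memory; the class names are hypothetical, and a record-count cap stands in for Kafka's time- or size-based retention settings (`retention.ms`, `retention.bytes`).

```python
from collections import deque


class RetainedLog:
    """Kafka-style log: reading only moves the reader's offset; records stay
    until retention expires them (here, a simple cap on record count)."""

    def __init__(self, max_records: int):
        self.records = deque(maxlen=max_records)  # oldest evicted first
        self.base_offset = 0                      # offset of records[0]

    def append(self, value) -> int:
        if len(self.records) == self.records.maxlen:
            self.base_offset += 1  # the oldest record is about to be evicted
        self.records.append(value)
        return self.base_offset + len(self.records) - 1  # offset of new record

    def read_from(self, offset: int) -> list:
        """Return every retained record at or after `offset` (replayable)."""
        start = max(offset, self.base_offset) - self.base_offset
        return list(self.records)[start:]


class WorkQueue:
    """RabbitMQ-style queue: a message is gone once a consumer takes it."""

    def __init__(self):
        self.messages = deque()

    def publish(self, value) -> None:
        self.messages.append(value)

    def consume(self):
        return self.messages.popleft() if self.messages else None


log = RetainedLog(max_records=3)
queue = WorkQueue()
for v in ["a", "b", "c", "d"]:
    log.append(v)
    queue.publish(v)
print(log.read_from(0), log.read_from(0))  # ['b', 'c', 'd'] ['b', 'c', 'd']
print(queue.consume(), queue.consume())    # a b
```

Reading the retained log twice from the same offset returns the same records (replay), while the work queue hands each message out exactly once. Retention is what lets new consumers, audits, and rebuilt read models start from history.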
Key Takeaways
CQRS Pattern with Kafka
Use the scale, speed, and reliability of Kafka as the backbone for an eventually consistent distributed data solution that allows flexible consumption models and independent scaling.
Kafka in Banking
Objectively select the metrics for the business use case, and design a data streaming solution that is ready to scale.
Intelligent Forecast with Kafka
To reap the scalability benefits, design the Kafka connector solution for future business growth. PII must be encrypted throughout the Kafka pipeline, and that encryption should be automated.
Q&A

NABLAS株式会社
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 

Recently uploaded (20)

Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 

Data Con LA 2022 - Data Streaming with Kafka

  • 1. Real Time Data Streaming with Kafka
    August 13, 2022 at DataConLA
    Speaker: Jie Chen, Manager, Advisory, Engineering Architect
  • 2. Agenda
    Kafka at a Glance (5 min)
    Kafka Use Cases (20 min): Intelligent Forecast System; Kafka in Banking; Distributed Data with CQRS
    Key Takeaways (5 min)
    Q&A (10 min)
  • 3. Kafka at a Glance
  • 4. Kafka in the Market: Core Capabilities
    Scalable: Scale production clusters up to a thousand brokers, trillions of messages per day, petabytes of data, and hundreds of thousands of partitions. Elastically expand and contract storage and processing.
    High Throughput: Deliver messages at network-limited throughput using a cluster of machines, with latencies as low as 2 ms.
    Permanent Storage: Store streams of data safely in a distributed, durable, fault-tolerant cluster.
    High Availability: Stretch clusters efficiently over availability zones or connect separate clusters across geographic regions.
    Source: kafka.apache.org
  • 5. Kafka Platform Overview
    Event Streaming Platform: A distributed streaming platform that enables real-time, event-driven applications using a topic-based pub/sub model.
    Performance at Scale: Kafka operates as a highly available, fault-tolerant cluster that spans servers and even data centers, with a partitioning system that supports data volumes of practically any size.
    Source: https://docs.confluent.io/
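To illustrate how a partitioned topic spreads load while keeping related events together, here is a minimal Python sketch of key-based partition assignment. It assumes each record carries a byte-string key and uses `zlib.crc32` purely as a stand-in hash; Kafka's Java client hashes keyed records with murmur2, so real partition numbers will differ. The helper names `choose_partition` and `partition_records` are hypothetical.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a record key to a partition index.

    Stand-in hash (CRC32); Kafka's default partitioner uses murmur2 for keyed
    records, so this only illustrates the hash-then-modulo idea.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


def partition_records(records, num_partitions):
    """Group (key, value) records by partition, preserving arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions


if __name__ == "__main__":
    events = [(b"user-1", "login"), (b"user-2", "login"),
              (b"user-1", "payment"), (b"user-1", "logout")]
    for partition, recs in partition_records(events, 3).items():
        print(partition, recs)
```

Because every record with the same key lands in the same partition, per-key order (one user's login, payment, logout) is preserved even though the topic as a whole is spread across brokers.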
  • 6. What Is Event-Driven Streaming with Kafka
    [Diagram: producers (ETL, raw message queue, change data capture, mainframe, custom) publish to the Kafka cluster, whose brokers (servers) host a topic split into partitions; consumers (web, mobile, data warehouse, monitoring tool, partners) subscribe to the topic and drain the data.]
    An event is a type of data that describes an entity's observable state updates over time (definition by IBM). Examples include a first-time user registration, a payment, or a social media post.
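The publish/subscribe flow in the diagram can be sketched in plain Python, without a broker, to show the property that lets many consumers share one topic: each consumer group keeps its own offset into the same append-only log, and reading does not delete anything. `Topic` and `ConsumerGroup` are hypothetical in-memory stand-ins (not Kafka client classes), and a single partition is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Append-only log for one partition; records are never removed on read."""
    records: list = field(default_factory=list)

    def publish(self, event):
        self.records.append(event)
        return len(self.records) - 1  # offset assigned to the new record


@dataclass
class ConsumerGroup:
    """Each group tracks its own position in the shared log."""
    name: str
    offset: int = 0

    def poll(self, topic, max_records=10):
        batch = topic.records[self.offset:self.offset + max_records]
        self.offset += len(batch)  # "commit" the new position after reading
        return batch


if __name__ == "__main__":
    topic = Topic()
    for event in ["user_registered", "payment", "post_created"]:
        topic.publish(event)
    web, warehouse = ConsumerGroup("web"), ConsumerGroup("data-warehouse")
    print(web.poll(topic))                       # all three events
    print(warehouse.poll(topic, max_records=1))  # warehouse reads at its own pace
```

Because offsets belong to the consumer group rather than the topic, a new consumer such as a monitoring tool can be added later and start from offset 0 without affecting existing readers.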
  • 7. Distributed Data with CQRS and Kafka
  • 8. Distributed Data with CQRS and Kafka: CQRS at a Glance
    Overview: Command Query Responsibility Segregation (CQRS) separates read and write workloads so that they are decoupled and scaled independently.
    Event Sourcing: CQRS is often linked with event sourcing, which effectively views data state as a series of discrete events. "Event Sourcing is an approach to handling operations on data that's driven by a sequence of events, each of which is recorded in an append-only store" (definition by Microsoft). For example, placing an online order and then returning that order under the same user account.
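A minimal event-sourcing sketch of the online-order example, assuming hypothetical `OrderPlaced` and `OrderReturned` event types: the event list plays the role of the append-only store, and the current state is never saved directly but derived by replaying (folding) the events in order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str          # hypothetical event types: "OrderPlaced", "OrderReturned"
    order_id: str
    amount: float = 0.0


def apply(state, event):
    """Fold one event into a copy of the current state."""
    orders = dict(state)
    if event.kind == "OrderPlaced":
        orders[event.order_id] = {"status": "placed", "amount": event.amount}
    elif event.kind == "OrderReturned":
        if event.order_id not in orders:
            raise ValueError(f"cannot return unknown order {event.order_id}")
        orders[event.order_id] = {**orders[event.order_id], "status": "returned"}
    else:
        raise ValueError(f"unknown event kind {event.kind}")
    return orders


def replay(events):
    """Rebuild state from scratch by applying every stored event in order."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state


if __name__ == "__main__":
    store = [Event("OrderPlaced", "o-1", 25.0), Event("OrderReturned", "o-1")]
    print(replay(store))
```

Replaying only a prefix of the store reconstructs the state at that earlier point, which is what makes auditing and point-in-time queries natural in this style.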
  • 9. Distributed Data with CQRS and Kafka: Traditional Design
    Difficult to Scale: The SOR must be able to support the load of all clients and systems. Read replicas can improve scalability.
    Single Point of Failure: If the SOR or API layer is unavailable, all consumers may be affected.
    Rigid: All access to SOR data flows through centralized APIs. Consumers receive data in the schemas set up by the access layer.
    Difficult to Manipulate Data: Direct data access to the SOR is restricted. Transforms, joins, and analytical operations may be difficult and rely on lagging ETL operations.
    Legend: Client: external-facing UI, third-party APIs. System: internal-facing ETL, mainframe. SOR: System of Record (the authoritative data source).
  • 10. Distributed Data with CQRS and Kafka: CQRS Design
    Data Changes as Events: The current state of the SOR is captured in an event format.
    Consumers Subscribe to Changes: Consumers listen for data change events and consume the information according to their own use cases.
    Other Systems Act on Data: Systems act on data updates as defined by their use cases. They may replicate the data, enrich it, or simply process events in real time.
    Read/Write Separation: Data reads are segregated from data writes. Read-only consumers introduce no additional load on the SOR.
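The CQRS design on this slide can be sketched with two hypothetical in-memory classes: a command side that validates a deposit, updates a stand-in SOR, and appends a change event to a list standing in for a Kafka topic, and a query side that builds its own balance view from those events without ever reading the SOR. The account names and the `Deposited` event type are illustrative assumptions.

```python
class WriteModel:
    """Command side: validates commands, updates the SOR, and emits change events."""

    def __init__(self):
        self.sor = {}        # stand-in for the System of Record
        self.event_log = []  # stand-in for a Kafka topic

    def deposit(self, account, amount):
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.sor[account] = self.sor.get(account, 0) + amount
        # Publish the change so readers never need to query the SOR.
        self.event_log.append({"type": "Deposited", "account": account, "amount": amount})


class BalanceProjection:
    """Query side: a read model built only from events."""

    def __init__(self):
        self.balances = {}
        self.offset = 0  # index of the next unprocessed event

    def catch_up(self, event_log):
        for event in event_log[self.offset:]:
            if event["type"] == "Deposited":
                acct = event["account"]
                self.balances[acct] = self.balances.get(acct, 0) + event["amount"]
        self.offset = len(event_log)

    def balance(self, account):
        return self.balances.get(account, 0)


if __name__ == "__main__":
    writer, reader = WriteModel(), BalanceProjection()
    writer.deposit("acct-1", 100)
    writer.deposit("acct-1", 50)
    reader.catch_up(writer.event_log)
    print(reader.balance("acct-1"))
```

Tracking an offset makes `catch_up` safe to call repeatedly: each event is applied once, and additional read models can be added without touching the write side.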
  • 11. Distributed Data with CQRS and Kafka: Advantages and Challenges
    Advantages
    Independent Scaling: Read and write workloads may be scaled independently based on load and access patterns.
    Separation of Concerns: Segregated models allow tightly controlled write logic while permitting flexibility in read models and stream processing.
    System Isolation: Access to the SOR database is restricted to a controlled write API. Consumers may safely read from a replica.
    Flexible Consumption: Kafka's scalable architecture allows consumers to process events differently across systems and at different velocities.
    Challenges
    Eventual Consistency: Reads are eventually consistent and may be delayed until writes have propagated through the system.
    Complexity: Implementing the pattern increases the complexity of the overall solution.
    Different Data Velocity: Consumers may process events at different velocities, resulting in inconsistencies across systems.
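To make the eventual-consistency and data-velocity challenges concrete, here is a small simulation, assuming two hypothetical read replicas that consume the same committed event log at different rates per tick. Each replica's running total stands in for a read model.

```python
class LaggingReplica:
    """A read model that applies at most `rate` events per tick (hypothetical)."""

    def __init__(self, rate):
        self.rate = rate
        self.offset = 0  # next event to apply
        self.total = 0   # the replica's view of the data

    def tick(self, log):
        batch = log[self.offset:self.offset + self.rate]
        self.total += sum(batch)
        self.offset += len(batch)

    def lag(self, log):
        """Number of committed events this replica has not applied yet."""
        return len(log) - self.offset


if __name__ == "__main__":
    log = [10, 20, 30, 40]  # events already committed on the write side
    fast, slow = LaggingReplica(rate=4), LaggingReplica(rate=1)
    for step in range(1, 5):
        fast.tick(log)
        slow.tick(log)
        print(step, fast.total, slow.total, slow.lag(log))
```

In this run the fast replica reports 100 after the first tick while the slow one reports 10, 30, 60, and finally 100 as its lag drops from 3 to 0: the two systems disagree temporarily, then converge once all writes have propagated.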
  • 12. Distributed Data with CQRS and Kafka: Common Scenarios to Consider
    Complex Data Operations Across Systems: Different systems need to process and transform data in complex and evolving use cases.
    Real-time Data Processing Across Systems: Traditional ETL and batch operations are too slow and rigid to meet evolving business requirements. The organization seeks to process data in real time as it becomes available across different systems.
    Resource Bottlenecks with Growing Demand: Traditional data system resources are strained and unable to support the growing demands of the business.
    Data Security Concerns Across Systems: Data must be shared securely across systems without introducing new security risks.
    Increased Demand for Data Sharing Across the Enterprise: The enterprise seeks to break down data silos and share data effectively across the organization to increase synergy between systems.
  • 13. Intelligent Forecast with Kafka
  • 14. Intelligent Forecast with Native Kafka Solution
    [Diagram: producers (ETL, raw message queue, change data capture, mainframe, custom) publish to the Kafka cluster; the Kafka connector API subscribes and drains the data into the ELK Stack, where Elasticsearch handles storage and indexing.]
    Kafka's role in this solution is to publish the data from the different channels as categorized topics. Through Kafka connector APIs (connector replicators), the ELK Stack subscribes to the specified topics. This pub/sub flow is also called event streaming. The customized data can then be rendered through a Kibana dashboard.
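Connectors like the one in this diagram are usually registered through the Kafka Connect REST API (`POST /connectors` with a JSON body containing `name` and `config`). The sketch below only builds that body in Python. It assumes Confluent's Elasticsearch Sink connector, whose class name and `topics`, `connection.url`, `key.ignore`, and `schema.ignore` settings are used here; the connector name, topic names, and Elasticsearch URL are hypothetical placeholders.

```python
import json


def elasticsearch_sink_payload(name, topics, es_url, tasks=1):
    """Build the JSON body for Kafka Connect's `POST /connectors` endpoint.

    Config keys follow Confluent's Elasticsearch Sink connector; the values
    passed in (connector name, topics, URL) are placeholders.
    """
    if not topics:
        raise ValueError("at least one topic is required")
    config = {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": str(tasks),      # config values are passed as strings
        "topics": ",".join(topics),   # comma-separated list of source topics
        "connection.url": es_url,
        "key.ignore": "true",         # document IDs come from topic+partition+offset
        "schema.ignore": "true",      # let Elasticsearch infer the mapping
    }
    return json.dumps({"name": name, "config": config}, indent=2)


if __name__ == "__main__":
    # Hypothetical names for the ATM and mobile-order channels.
    print(elasticsearch_sink_payload(
        "forecast-events-to-es",
        ["atm.transactions", "mobile.orders"],
        "http://elasticsearch.example.internal:9200",
    ))
```

The resulting JSON could then be sent to a Connect worker (its REST interface listens on port 8083 by default) with any HTTP client; check the connector documentation for the settings your Elasticsearch version requires.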
  • 15. Intelligent Forecast with Native Kafka Solution: Key Design and Challenges
    Key Design: Pub/sub, a decoupled and asynchronous messaging service, for scalability.
    Equivalent Solutions: Azure Event Hubs, Google Pub/Sub, AWS Kinesis.
    Proactive Analytics: Use cases such as detecting and forecasting abnormal trends outside a threshold, for example transaction fraud at ATMs and in restaurant mobile orders.
    Challenges
    Kafka Connector: Similar open-source solutions such as MirrorMaker, uReplicator by Uber, and Mirus by Salesforce can be alternatives that tackle the scalability bottleneck while reducing licensing costs.
    PII Encryption: Whether weighing a Kafka security library or an in-house solution, it is important to establish PII governance early among producers, the Kafka cluster, and consumers; in other words, decide who is responsible for masking sensitive data throughout the real-time data streaming pipeline.
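The proactive-analytics idea of flagging values outside a threshold can be sketched with a rolling z-score rule. This is a swapped-in, deliberately simple technique (not a forecasting model and not the solution described in the talk): each new value, such as an ATM transaction amount read from a topic, is compared with the mean and standard deviation of a recent window. The window size, `k`, and `min_history` values are hypothetical tuning parameters.

```python
from collections import deque
from statistics import mean, pstdev


class ThresholdDetector:
    """Flag values more than `k` standard deviations from a rolling mean."""

    def __init__(self, window=20, k=3.0, min_history=5):
        self.history = deque(maxlen=window)  # recent "normal" values
        self.k = k
        self.min_history = min_history       # no verdicts until the baseline exists

    def observe(self, value):
        is_anomaly = False
        if len(self.history) >= self.min_history:
            values = list(self.history)
            mu, sigma = mean(values), pstdev(values)
            if sigma == 0:
                # A flat baseline: any deviation at all is treated as abnormal.
                is_anomaly = value != mu
            else:
                is_anomaly = abs(value - mu) > self.k * sigma
        if not is_anomaly:
            self.history.append(value)  # keep outliers out of the baseline
        return is_anomaly


if __name__ == "__main__":
    detector = ThresholdDetector()
    for amount in [40, 42, 38, 41, 39, 40, 500, 41]:
        print(amount, detector.observe(amount))
```

Excluding flagged values from the window keeps a single fraudulent spike from inflating the baseline and masking the next one; a production system would more likely use a seasonal or learned forecast instead of a fixed window.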
  • 16. Kafka in Banking
  • 17. What a Banking Institution Needs to Modernize Its Legacy System
    A banking institution has been looking to migrate its legacy system to modern technologies that keep pace with fast-growing big data demand, by building a modern data streaming platform as part of its Business Operation Brain (BOB). Requirements:
    Reuse the existing data centers, storage, infrastructure, and security procedures.
    Be scalable and reliable (millions of transactions/events per second) on the existing infrastructure.
    Log data for transaction tracing and auditing (for example, through Change Data Capture).
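Change Data Capture for auditing can be illustrated with change records that carry an operation type plus before and after images of a row. The sketch below derives such records by diffing two keyed snapshots (query-based CDC); production log-based tools such as Debezium read the database transaction log instead. The record shape loosely follows Debezium's `op`/`before`/`after`/`ts_ms` envelope, and the table and account names are hypothetical.

```python
import time


def diff_to_cdc_events(before, after, table):
    """Compare two keyed snapshots and emit CDC-style change records.

    op codes: "c" = create, "u" = update, "d" = delete.
    """
    ts_ms = int(time.time() * 1000)  # capture time in epoch milliseconds
    events = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if old == new:
            continue  # unchanged rows produce no audit record
        op = "c" if old is None else "d" if new is None else "u"
        events.append({"table": table, "key": key, "op": op,
                       "before": old, "after": new, "ts_ms": ts_ms})
    return events


if __name__ == "__main__":
    yesterday = {"acct-1": {"balance": 100}, "acct-2": {"balance": 5}}
    today = {"acct-1": {"balance": 150}, "acct-3": {"balance": 0}}
    for event in diff_to_cdc_events(yesterday, today, "accounts"):
        print(event)
```

Keeping both images in every record is what makes the stream useful for tracing: an auditor can see exactly what a row looked like before and after each change without querying the source system.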
  • 18. What Options Do We Have: Kafka and Its Comparables in the Marketplace (not an exhaustive list)
  • 19. Identify KPIs When Evaluating the Options
    Criteria: operation cost, messaging, storage, scalability, autoscaling, and security.
    Apache Kafka (open source; on prem or managed cloud computing): messaging is immutable, ordered, and replayable, with a user-defined retention policy; persistent append-log storage offers durability and reliability; scales horizontally (scale out) by adding more machines to increase disk I/O; autoscaling and security require customized, manual configuration.
    RabbitMQ (open source; on prem): queues/message indexes with a TTL attached; messages are removed once consumed, and in-memory storage is preferred; scales vertically (scale up) by adding more CPU and RAM to the existing machine/hardware; autoscaling and security require customized, manual configuration.
    AWS Kinesis (proprietary; managed cloud computing): pay-as-you-go operation cost; elastic and durable; retention of up to 365 days; autoscaling and security come as a native cloud solution.
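The messaging and storage criteria contrast a retained, replayable log with a queue that deletes on consumption. A minimal in-memory sketch of the two behaviors follows, assuming a simplified count-based retention (real Kafka retention is configured by time or size, for example `retention.ms` and `retention.bytes`) and a single RabbitMQ-like queue without acknowledgements. Both class names are hypothetical.

```python
from collections import deque


class RetainedLog:
    """Kafka-style: records stay until retention removes them; reads are replayable."""

    def __init__(self, retention):
        self.retention = retention  # simplified: keep only the newest N records
        self.base_offset = 0        # offset of the oldest retained record
        self.records = deque()

    def append(self, record):
        self.records.append(record)
        while len(self.records) > self.retention:
            self.records.popleft()
            self.base_offset += 1

    def read_from(self, offset):
        """Return retained records from `offset` onward; reading removes nothing."""
        start = max(offset, self.base_offset) - self.base_offset
        return list(self.records)[start:]


class DestructiveQueue:
    """Queue-style (simplified): a message is gone once a consumer takes it."""

    def __init__(self):
        self.messages = deque()

    def publish(self, message):
        self.messages.append(message)

    def consume(self):
        return self.messages.popleft() if self.messages else None
```

Replay is the practical difference for auditing and reprocessing: any consumer can re-read the log from an older offset as long as the data is still within retention, whereas a consumed queue message cannot be read again.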
  • 20. Key Takeaways
  • 21. Key Takeaways
    CQRS Pattern with Kafka: Use the scale, speed, and reliability of Kafka as the backbone for an eventually consistent distributed data solution that allows flexible consumption models and independent scaling.
    Kafka in Banking: Objectively select the metrics for the business use case, and design a data streaming solution that is ready to scale.
    Intelligent Forecast with Kafka: To reap the scalability benefits, design the Kafka connector solution for future business growth. PII must be encrypted throughout the Kafka pipeline, and that protection should be automated.
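For the PII takeaway, one way to keep sensitive fields out of topics is to transform records on the producer side before publishing. The sketch below uses keyed tokenization with HMAC-SHA256 (deterministic, so joins on the token still work) plus masking of card-number-like digit runs in free text. This is tokenization, not encryption: tokens cannot be decrypted, so field-level encryption with managed keys would be needed if consumers require the original values. The field names, regex, and in-code key are hypothetical; a real key would come from a secrets manager.

```python
import hashlib
import hmac
import re

# Hypothetical pattern: 13-19 consecutive digits, the usual card-number length range.
CARD_RE = re.compile(r"\b\d{13,19}\b")


def tokenize(value, key):
    """Deterministic keyed pseudonym: the same input and key give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def mask_record(record, pii_fields, key):
    """Return a copy of `record` that is safe to publish; the input is not modified."""
    masked = {}
    for name, value in record.items():
        if name in pii_fields and value is not None:
            masked[name] = tokenize(str(value), key)
        elif isinstance(value, str):
            # Keep only the last four digits of card-like numbers in free text.
            masked[name] = CARD_RE.sub(lambda m: "****" + m.group()[-4:], value)
        else:
            masked[name] = value
    return masked


if __name__ == "__main__":
    key = b"demo-key-from-secrets-manager"  # placeholder, never hard-code real keys
    order = {"customer_email": "a@example.com",
             "note": "paid with 4111111111111111", "amount": 12.5}
    print(mask_record(order, {"customer_email"}, key))
```

Running the transformation in the producer, before data reaches the cluster, is one answer to the governance question of who masks the data; the same function could equally run in a stream-processing step if producers cannot be changed.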
  • 22. Q&A