8. Paradigm for Data Movement: Message Queues
(Diagram: a producer writes a message to the queue; the broker stores it until delivered; a consumer receives the oldest message.)
9. Paradigm for Data Movement: Publish-Subscribe
(Diagram: a producer writes a message, which is delivered to 0..n subscribers. The focus is on efficient message delivery.)
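The two paradigms on these slides can be contrasted in a toy sketch. This is plain Python, not any real broker API; the class and method names are illustrative:

```python
from collections import deque

class MessageQueue:
    """Point-to-point: each message goes to exactly one consumer,
    oldest first; the broker stores it until delivered."""
    def __init__(self):
        self._messages = deque()

    def produce(self, message):
        self._messages.append(message)

    def consume(self):
        # Delivers (and removes) the oldest stored message.
        return self._messages.popleft() if self._messages else None

class PubSubTopic:
    """Publish-subscribe: each message is delivered to all
    current subscribers (0..n of them)."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def produce(self, message):
        for deliver in self._subscribers:
            deliver(message)
```

The essential difference: the queue removes a message on delivery, while the topic fans each message out to every subscriber.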
10. Enterprise Data Architecture is a Giant Mess
(Diagram: Line of Business 01, Line of Business 02, Public Cloud.)
Data architecture is rigid, complicated, and expensive, making it too hard and cost-prohibitive to digitally transform.
12. Traditional Hub & Spoke Message Broker
Copyright 2020, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Message brokers were originally architected as centralised systems: multiple producers and multiple consumers all connect through a single message broker. The broker is a single point of failure, resulting in low fault-tolerance and high downtime.
13. High Availability Pairs of Brokers
For improved fault-tolerance, brokers are often deployed in high-availability pairs: clients connect to a primary broker, with a standby broker ready to take over. Clients still only connect to the active broker, though, so this solution lacks scalability...
14. Interconnected Group of Message Brokers
Over time, some brokers evolved into multi-node networks (e.g. Brokers 1, 2, and 3, each managed independently with its own set of clients). Because clients still only connect to one broker, this solution still does not provide horizontal scalability.
16. Let’s use an immutable log to share data!
(Diagram: a log with offsets 1–10; producers write at the end.)
Kafka producers write to an append-only, immutable sequence of messages that is always ordered by time.
● Sequential writes only
● No random disk access
● All operations are O(1)
● Highly efficient
17. A log is like a queue, but re-readable :-D
(Diagram: the same log with offsets 1–10; “consumers” A and B each scan the log at their own position.)
“Better than a queue”-like behavior: Kafka consumer groups allow for parallel, in-order consumption of data, which shared queues in traditional message brokers do not support.
● Sequential reads only
● Start at any offset
● All operations are O(1)
● Highly efficient
Slow consumers don’t back up the broker: THE STREAM GOES ON.
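The log behavior described on the last two slides can be sketched as a toy in-memory model (illustrative only, not the Kafka client API):

```python
class Log:
    """Append-only log: records are written sequentially at the end
    and, unlike queue messages, are never removed by a read."""
    def __init__(self):
        self._records = []

    def append(self, record):
        # Sequential write at the end; returns the record's offset.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        # O(1) read at any offset; reading does not consume the record.
        return self._records[offset]

class Consumer:
    """Each consumer tracks its own offset, so a slow consumer
    never blocks the log or the other consumers."""
    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        record = self.log.read(self.offset)
        self.offset += 1
        return record
```

Because position is per-consumer state rather than broker state, any number of consumers can scan the same log independently, starting at any offset.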
18. Kafka Cluster with Topic Partitions & Multiple Client Connections
Kafka takes a different approach by partitioning topics across the brokers in a cluster. Clients connect to multiple brokers for both reads from and writes to a topic.
(Diagram: a producer and a consumer each connected to Brokers 1, 2, and 3, which host Topic Partitions 0, 1, and 2 respectively.)
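A common way producers pick a partition is by hashing the record key. As a rough sketch (Kafka's Java client actually uses murmur2 hashing; crc32 stands in here for simplicity):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition. Records with the same key
    always land in the same partition, which is what preserves
    per-key ordering within a topic."""
    return zlib.crc32(key) % num_partitions
```

Keyless records can instead be spread round-robin across partitions for balance, at the cost of any per-key ordering guarantee.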
19. Kafka topics are designed as a commit log that captures events in a durable, scalable way
(Diagram: producers write to the end of Partitions 0, 1, and 2, with offsets growing from old to new, e.g. 1–12 on Partitions 0 and 2 and 1–9 on Partition 1. Consumer A reads at offset=4 while Consumer B reads at offset=7.)
20. Partitioning topics enables greater horizontal scalability and enterprise-scale throughput
15x improvement in throughput performance. One platform to deploy, secure, and manage to support all of your streaming workloads.
Example placement across a three-broker cluster:
Broker 1: Topic 1/Partition 0, Topic 2/Partition 2, Topic 3/Partition 1, Topic 4/Partition 0
Broker 2: Topic 1/Partition 1, Topic 2/Partition 0, Topic 3/Partition 2, Topic 4/Partition 1
Broker 3: Topic 1/Partition 2, Topic 2/Partition 1, Topic 3/Partition 0, Topic 4/Partition 2
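A toy sketch (not Kafka's actual assignment algorithm) of how partitions can be spread evenly, producing a placement like the one on this slide: each topic starts on a different broker and its partitions then proceed round-robin.

```python
def assign_partitions(num_topics, partitions_per_topic, num_brokers):
    """Return {broker_id: [(topic, partition), ...]} with partitions
    spread round-robin, each topic starting at a different broker
    so no single broker hosts partition 0 of every topic."""
    placement = {b: [] for b in range(1, num_brokers + 1)}
    for topic in range(1, num_topics + 1):
        for partition in range(partitions_per_topic):
            broker = 1 + (topic - 1 + partition) % num_brokers
            placement[broker].append((topic, partition))
    return placement
```

With 4 topics of 3 partitions over 3 brokers, every broker ends up hosting 4 partitions, so read and write load can scale horizontally with the broker count.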
21. How else is Kafka different from traditional messaging queues?
● Topic partitions are replicated to maximize fault-tolerance: in addition to partitioning topics, each partition can be replicated across multiple brokers to ensure high uptime even if a broker is lost.
● Producers and consumers scale independently from brokers: production and consumption rates (e.g. a spike, or a slow-consumer issue) have no effect on the broker. THE STREAM GOES ON.
● Event streams can be enriched in real time with stream processing: ksqlDB and Kafka Streams enable event streams to be processed “in-flight” rather than with a separate batch solution.
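The replication point above can be illustrated with a toy sketch (not Kafka's replica-placement algorithm): each partition is copied onto `replication_factor` distinct brokers, so it survives the loss of any single broker.

```python
def replica_set(partition, brokers, replication_factor):
    """Choose the brokers holding copies of one partition.
    By convention here, the first broker in the returned list
    acts as the leader; the rest are followers."""
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("cannot place more replicas than brokers")
    # Consecutive brokers, wrapping around, so replicas are distinct.
    return [brokers[(partition + i) % n] for i in range(replication_factor)]
```

If the leader's broker fails, one of the followers already holds the data and can take over the partition, which is how replication keeps the topic available.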
23. “By 2020, event-sourced, real-time situational
awareness will be a required characteristic
for 80% of digital business solutions. And
80% of new business ecosystems will require
support for event processing.”
Gartner, Top 10 Strategic Technology Trends, “Event-Driven Model”
24. Data Sharing Challenges for Bimodal IT
(Diagram: Systems of Record, Systems of Differentiation, and Systems of Innovation, spanning Mode 1 (Reliability) to Mode 2 (Agility).)
25. Data Sharing Challenges for Bimodal IT
(Diagram: the same Mode 1/Mode 2 layering, with the added requirements of Findability, Accessibility, Interoperability, and Reusability.)
26. Traditional Integration Approach with MoM & ESB
(Diagram: SAP, SCM, SaaS CRM, Supplier Management, API Gateway, eCommerce, and a Partner Portal, spread across an on-premises DC and a public cloud platform, connected via message-oriented middleware (event-driven data movement with ephemeral message persistence) and an ESB (process orchestration and mapping).)
27. Traditional Integration Approach with MoM, ESB & ETL
(Diagram: the same MoM/ESB landscape, now with a DWH fed by multiple point-to-point ETL jobs.)
28. Traditional Integration Approach with MoM, ESB & ETL
(Diagram: the same MoM/ESB/ETL landscape, now adding an ODS, with the systems grouped into Systems of Record and Systems of Differentiation.)
29. Traditional Integration Challenges
(Diagram: the same MoM/ESB/ETL landscape as the previous slides.)
● All systems need to operate in the same mode and at the same speed, both technically and organizationally (i.e. lifecycle management).
● Data moves between silos, but the silos are not broken down.
● A specialised integration toolset creates organisational dependency on a COE.
● Short-lived data integration flows become monolithic applications with tight system coupling.
● Data is not FAIR (Findability, Accessibility, Interoperability, Reusability).
● Knowledge of how data state changed over time is lost, as the application database maintains only the current state.
30. Event Streaming Integration Approach
(Diagram: “Turning the Database Inside Out” and Data Mesh combine into an event streaming data mesh: data replication and materialized views on one side; data as a product and data ownership & responsibility on the other.)
31. Paradigm for Data-in-Motion: Event Streams
(Diagram: real-time events such as a trade, a customer experience, a sale, and a shipment flow through real-time event streams into rich customer experiences and data-driven operations.)
32. Modernize your infrastructure
Confluent provides the tools required to effectively augment or migrate from your messaging queue: a robust set of connectors to pull data from your MQ into Kafka, and connectors to push data from Kafka into your modern, cloud-native sinks.
33. Proposed Event Streaming Integration Architecture
(Diagram: Systems of Record (SAP, SCM, DWH, SaaS CRM, Supplier Management, API Gateway) and Systems of Differentiation (eCommerce, Partner Portal), spread across the on-premises DC and a public cloud platform, connected by an event streaming data mesh with a Schema Registry, stream processors (microservices, ksqlDB), and data pipelines & data materialization.)