1. Unlock the value of your data in real time with Confluent and AWS
Ahmed Zamzam
Senior Partner Solutions Architect
2. Copyright 2023, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Agenda
Real-time Analytics with Confluent and AWS
Building event-streaming applications and real-time analytics pipelines using Confluent and AWS
Presentation layer
Q/A
Confluent makes real-time data streams a top priority
Rise of data in motion
Event streaming with Confluent
Rearchitected Kafka, together with the features you
need to rapidly deploy production use cases
Enable BI and AI/ML use-cases
3. Trending Now
Popular on Netflix
Top Picks for Joshua
Curbside pickup
Loyalty rewards
Personalized recommendations
Real-time trades
Ride ETA
Data is the fuel for a modern business
4. Why Real-time data?
Data loses value quickly over time
[Chart: value of data to decision-making (vertical axis) plotted against time (horizontal axis: real-time, seconds, minutes, hours, days, months). Labels along the curve: Preventive/Predictive, Actionable, Reactive, Historical. Annotations: time-critical decisions; traditional “batch” business intelligence; information half-life in decision-making.]
Source: Perishable insights, Mike Gualtieri, Forrester
5. Typical real-time data pipeline
Source
Data is generated continuously and at high velocity by different sources such as IoT devices, application logs, and online transactions.
Event Streaming
Data is captured and stored in the order it was received for a set duration of time, and can be replayed any number of times while it is retained.
Stream Processing
Process, analyse, and act on the data as soon as it is generated and in the order it was received.
Presentation
Sink data to different destinations: data lakes (most common) and/or databases.
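The event streaming stage above can be pictured as an append-only log: each event gets an offset in arrival order, and any reader can replay from an offset of its choosing. The sketch below is a minimal in-memory illustration in plain Java. The class and method names (`EventLog`, `append`, `replayFrom`) are hypothetical and are not Kafka's API, and retention (dropping events after a set duration) is deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for one ordered event stream (illustrative only, not Kafka's API). */
class EventLog {
    private final List<String> events = new ArrayList<>();

    /** Appends an event and returns its offset, i.e. its position in arrival order. */
    long append(String event) {
        events.add(event);
        return events.size() - 1;
    }

    /** Returns a copy of all stored events from the given offset onward, in arrival order. */
    List<String> replayFrom(long offset) {
        int start = (int) Math.max(0, Math.min(offset, events.size()));
        return new ArrayList<>(events.subList(start, events.size()));
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("sensor-1:21.5");
        log.append("order-42:created");
        log.append("sensor-1:21.7");
        // Reading does not consume events, so the same data can be replayed repeatedly.
        System.out.println(log.replayFrom(0));
        System.out.println(log.replayFrom(1));
    }
}
```

Because reading never removes events, a stream processor and a sink can each work through the same data at their own pace, which is the property the pipeline stages above rely on.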
7. The rise of data in motion
70% of Fortune 500 companies use Apache Kafka
(the majority are Confluent customers)
8. Apache Kafka is an Event Streaming Platform
[Diagram: other systems, Kafka Connect, Kafka cluster, Kafka Connect, other systems, ...many more]
11. The Confluent Product Advantage
Everywhere: Be everywhere our customers want to be
Cloud-Native: Re-imagined Kafka experience for the cloud
Complete: Enable developers to reliably & securely build next-gen apps faster
12. 10x Kafka
Confluent Cloud offers a truly fully managed, cloud-native data streaming platform for Apache Kafka, with 10x faster scaling, infinitely more storage, and built-in resilience
Resiliency: Leave Kafka reliability worries behind with a 99.99% uptime SLA and 10x built-in durability
Storage: Never worry about Kafka storage limits again with Infinite Storage that’s 10x more scalable and performant
Elasticity: Scale and shrink to handle 0 to GBps+ workloads and peak customer demands 10x faster and easier
13. Confluent: Everywhere
Confluent Platform
The Enterprise Distribution of Apache Kafka
Self-managed software
VM
Deploy on any platform, on premises, or cloud
Confluent Cloud
Apache Kafka reengineered for the cloud
Fully managed service
Available on
14. Everywhere: Cluster Linking
Global Central Nervous System
Federated streaming, hybrid and multi-cloud.
Data syndication and replication across and between clouds and on-premises, with self-service APIs, data governance, and visual tooling.
Reliable & real-time data streams between all customer sites, so you can run always-on streaming analytics on the data of the entire enterprise, despite regional or cloud provider outages.
15. “We are in the business of selling and renting clothes. We are not in the
business of managing an event streaming platform… If we had to
manage everything ourselves, I would’ve had to hire at least 10
more people to keep the systems up and running.”
● Architecture planning
● Cluster sizing
● Cluster provisioning
● Broker settings
● Zookeeper management
● Partition placement and data
durability
● Source/sink connectors
development and maintenance
● Monitoring and reporting tools
setup
● Software patches and upgrades
● Security controls and integrations
● Failover design and planning
● Mirroring and geo-replication
● Streaming data governance
● Load rebalancing and monitoring
● Expansion planning & execution
● Utilization optimization and
visibility
● Cluster migrations
● Infrastructure & performance
upgrades / enhancements
[Chart: value (vertical axis) against investment & time (horizontal axis), with five numbered stages of adoption. Stage labels: experimentation / early interest; identify a project; mission-critical, disparate LOBs; mission-critical, connected LOBs; central nervous system.]
Key challenges
Operational burden and resources
Manage and scale platform to support
ever-growing demand
Security and governance
Ensure streaming data is as safe and secure
as data-at-rest as Kafka usage scales
Real-time connectivity and processing
Leverage valuable legacy data to power modern,
cloud-based applications and experiences
Global availability
Maintain high availability across
environments with minimal downtime
Kafka is hard in experimentation. It only gets harder (and riskier)
as you add mission-critical data and use cases.
Operationalizing Kafka on your own is difficult
16. Discover, understand,
and trust your data
streams
Where did data come from?
Where is it going?
Where, when, and how was it transformed?
What’s the common taxonomy?
What is the current state of the stream?
Stream Catalog
Increase collaboration and productivity
with self-service data discovery
Stream Lineage
Understand complex data relationships
and uncover more insights
Stream Quality
Deliver trusted, high-quality event
streams to the business
“Confluent’s Stream Governance suite will play a major role in our expanded use of data in
motion and creation of a central nervous system for the enterprise. With the self-service
capabilities in stream catalog and stream lineage, we’ll be able to greatly simplify and
accelerate the onboarding of new teams working with our most valuable data.”
17. Instantly connect popular data sources &
sinks
120+ prebuilt connectors
100+ Confluent supported
20+ partner supported, Confluent verified
21. Together, Confluent and AWS empower endless use cases across many industries
Retail: Inventory Management; Personalized Promotions; Product Development & Introduction; Sentiment Analysis; Streaming Enterprise Messaging; Systems of Scale for High Traffic Periods
Healthcare: Connected Health Records; Data Confidentiality & Accessibility; Dynamic Staff Allocation Optimization; Integrated Treatment; Proactive Patient Care; Real-Time Monitoring
Finance & Banking: Early-On Fraud Detection; Capital Management; Market Risk Recognition & Investigation; Preventive Regulatory Scanning; Real-Time What-If Analysis; Trade Flow Monitoring
Transportation: Advanced Navigation; Environmental Factor Processing; Fleet Management; Predictive Maintenance; Threat Detection & Real-Time Response; Traffic Distribution Optimization
Common in all Industries: Data Pipelines; Hybrid Cloud Integration; Microservices; Security and Fraud; Customer 360; Streaming ETL
24. ksqlDB at a glance
What is it?
ksqlDB is an event-streaming
database for working with
streams and tables of data
All the key features of a modern streaming solution:
Aggregations, Joins, Windowing, Event-time, Dual query support, Exactly-once semantics, Out-of-order handling, User-defined functions
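-- Assumes qualifyPromotion is a scalar user-defined function. Because of the
-- GROUP BY, its result is wrapped in an aggregate that keeps the latest value per ride.
CREATE TABLE activePromotions AS
  SELECT rideId,
         LATEST_BY_OFFSET(qualifyPromotion(distanceToDst)) AS promotion
  FROM locations
  GROUP BY rideId
  EMIT CHANGES;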
How does it work?
It separates compute from storage, and scales
elastically in a fault-tolerant manner
It remains highly available during disruption, even if a quorum of its servers fails
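25. 3 modalities of stream processing with Confluent
From flexibility (Kafka clients) to simplicity (ksqlDB)
Kafka clients
    // Fragment: consumer, stateStore, RetryUtils, MAX_RETRIES, and StateStoreException
    // are placeholders from the slide and are assumed to be defined elsewhere.
    import java.time.Duration;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;

    // Poll one batch and sum the values per key.
    ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));
    Map<String, Integer> counts = new HashMap<>();
    for (ConsumerRecord<String, Integer> record : records) {
        counts.merge(record.key(), record.value(), Integer::sum);
    }
    // Add each per-key sum to the external state store, retrying on failure.
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
        int attempts = 0;
        while (attempts++ < MAX_RETRIES) {
            try {
                int stateCount = stateStore.getValue(entry.getKey());
                stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
                break;
            } catch (StateStoreException e) {
                RetryUtils.backoff(attempts);
            }
        }
    }
Kafka Streams
    // builder is a StreamsBuilder; this counts how often each value occurs.
    builder
        .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
        .groupBy((key, value) -> value, Grouped.with(Serdes.String(), Serdes.String()))
        .count()
        .toStream()
        .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
ksqlDB
    SELECT x, count(*) FROM stream GROUP BY x EMIT CHANGES;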
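To make the ksqlDB query on this slide more concrete, the following plain-Java sketch shows what a continuously updated `count(*) ... GROUP BY x` means conceptually: every incoming event updates the count for its key, and the changed row can be emitted downstream. No Kafka or ksqlDB library is involved. `RunningCount` and `onEvent` are hypothetical names chosen for illustration, and a real engine may batch or cache updates rather than emit one row per event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of a per-key running count (no Kafka or ksqlDB involved). */
class RunningCount {
    private final Map<String, Long> counts = new HashMap<>();

    /** Processes one event and returns the changed row formatted as "key=count". */
    String onEvent(String x) {
        long updated = counts.merge(x, 1L, Long::sum);
        return x + "=" + updated;
    }

    public static void main(String[] args) {
        RunningCount query = new RunningCount();
        List<String> changes = new ArrayList<>();
        for (String x : new String[] {"a", "b", "a"}) {
            changes.add(query.onEvent(x));
        }
        System.out.println(changes); // prints [a=1, b=1, a=2]
    }
}
```

The state kept in `counts` is what makes this processing stateful. The three modalities on the slide differ mainly in who manages that state and its fault tolerance: your own code, the Kafka Streams library, or ksqlDB.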
26. 2. Stateless stream processing with AWS Lambda
Confluent Kafka sink connector
• Sink connector polls Kafka partitions and invokes your function
• Lambda can be invoked synchronously or asynchronously
• At-least-once semantics
• Provides a dead letter queue (DLQ) for any failed invocations
• Sink connector scales up to a soft maximum of 10 connectors
Lambda service (event source mapping)
• Lambda service polls the Kafka partitions and invokes your Lambda function synchronously
• Starts with one concurrent poller and one customer function
• Scaling
○ Lambda service checks every 3 minutes if scaling is needed
○ Starts with 1 poller and scales up to at most the number of partitions
• Batches records based on a BatchSize or a batch window
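The batching rule in the last bullet (a batch is handed to the function when it reaches a size limit or when a time window elapses, whichever comes first) can be sketched in plain Java. This is an illustration of the idea only, not the Lambda service's implementation. `RecordBatcher`, `add`, and `tick` are hypothetical names, and the two constructor arguments merely stand in for the slide's BatchSize and batch window.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Closes a batch when it is full or when its window has elapsed (illustrative only). */
class RecordBatcher {
    private final int batchSize;          // hypothetical stand-in for the slide's BatchSize
    private final long batchWindowMillis; // hypothetical stand-in for the slide's batch window
    private final List<String> pending = new ArrayList<>();
    private long windowStartMillis;

    RecordBatcher(int batchSize, long batchWindowMillis) {
        this.batchSize = batchSize;
        this.batchWindowMillis = batchWindowMillis;
    }

    /** Adds a record; returns the completed batch if this record filled it, else an empty list. */
    List<String> add(String record, long nowMillis) {
        if (pending.isEmpty()) {
            windowStartMillis = nowMillis; // the window opens with the first pending record
        }
        pending.add(record);
        return pending.size() >= batchSize ? drain() : Collections.<String>emptyList();
    }

    /** Called periodically; returns the pending records once the window has elapsed. */
    List<String> tick(long nowMillis) {
        boolean expired = !pending.isEmpty()
                && nowMillis - windowStartMillis >= batchWindowMillis;
        return expired ? drain() : Collections.<String>emptyList();
    }

    private List<String> drain() {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        RecordBatcher batcher = new RecordBatcher(3, 1000);
        batcher.add("a", 0);
        batcher.add("b", 10);
        System.out.println(batcher.add("c", 20)); // prints [a, b, c]: size limit reached
        batcher.add("d", 30);
        System.out.println(batcher.tick(500));    // prints []: window still open
        System.out.println(batcher.tick(1030));   // prints [d]: window elapsed
    }
}
```

Timestamps are passed in explicitly so the behaviour is deterministic and easy to test. A larger size limit or longer window trades latency for fewer, larger invocations.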
32. When to use which?
                | ksqlDB                 | Kafka Streams          | Kinesis Data Analytics | Lambda
Fully managed   | Yes                    | No                     | Yes                    | Yes
Type            | Stateful and stateless | Stateful and stateless | Stateful and stateless | Stateless
Fault tolerance | Exactly once           | Exactly once           | Exactly once           | At least once
UDF support     | Yes (self-managed)     | Yes (self-managed)     | Yes                    | Yes
Latency         | Fast                   | Very fast              | Very fast              | Fast
34. Accelerate modernization from on premises to AWS
Connect
Leverage 120+ Confluent prebuilt connectors to continuously bring valuable data from existing services on-premises, including enterprise data warehouses, databases, and mainframes
Modernize
Increase agility in getting applications to market and reduce TCO by freeing up resources to focus on value-generating activities rather than on managing servers
Bridge
Hybrid cloud streaming with consistent, event-driven architecture for modern apps
[Diagram labels. On premises: legacy EDW, mainframe, legacy DB, JDBC/CDC connectors. Links: AWS Direct Connect, ClusterLink. AWS Cloud: data streams, applications, ksqlDB, Amazon S3 sink, Amazon Redshift sink, AWS Lambda sink, Amazon S3, Amazon Redshift, AWS Lambda, Amazon Athena, AWS Glue, Amazon SageMaker, AWS Lake Formation, Amazon DynamoDB, Amazon Aurora.]
35. Thank You & Next Steps
How did we do? Enter your feedback!
1) Learn more
Check out the Confluent - AWS Workshops: https://confluent.awsworkshop.io/
2) Try it out yourself!
Subscribe to Confluent Cloud on the AWS Marketplace and start with a free $400 (to be used within 30 days).
3) Schedule a workshop/hackathon
Pick a problem and schedule a workshop/hackathon with Confluent and AWS for your team.
37. Live Demo!
● Source static data from RDS – MySQL
● Process applications using static data and real-time events in Confluent Cloud
● Visualize data written to Redshift using QuickSight