Unlock value of your data in real-time with Confluent and AWS
Ahmed Zamzam
Senior Partner Solutions Architect
Copyright 2023, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Agenda
Real-time Analytics with Confluent and AWS
Building event-streaming applications and real-time analytics pipelines using Confluent and AWS
Presentation layer
Q/A
Confluent makes real-time data streams a top priority
Rise of data in motion
Event streaming with Confluent
Rearchitected Kafka, together with the features you need to rapidly deploy production use cases
Enable BI and AI/ML use-cases
Trending Now
Popular on Netflix
Top Picks for Joshua
Curbside pickup
Loyalty rewards
Personalized recommendations
Real-time trades
Ride ETA
Data is the fuel for a modern business
Why Real-time data?
Source: Perishable insights, Mike Gualtieri, Forrester
Data loses value quickly over time
[Figure: value of data to decision-making (vertical axis) against time (horizontal axis: real-time, seconds, minutes, hours, days, months). Labels on the chart: time-critical decisions; preventive/predictive; actionable; reactive; historical; traditional "batch" business intelligence; information half-life in decision-making.]
Typical real-time data pipeline
Source: Data is continuously generated at high velocity from different sources such as IoT devices, application logs, and online transactions.
Event Streaming: Data is captured and stored in the order it was received for a set duration of time, and can be replayed indefinitely.
Stream Processing: Process, analyse, and act on the data as soon as it is generated and in the order it was received.
Presentation: Sink data to different destinations: data lakes (most common) and/or different databases.
Data Streaming Platform
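The Event Streaming stage above describes a log that keeps events in arrival order and lets readers replay them. A minimal in-memory sketch of that idea follows; `EventLog`, `append`, and `readFrom` are hypothetical names chosen for illustration and are not Kafka APIs, and retention limits are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only log: events keep their arrival order and reads never remove them. */
final class EventLog {
    private final List<String> events = new ArrayList<>();

    /** Appends an event at the end of the log and returns its offset (position). */
    long append(String event) {
        events.add(event);
        return events.size() - 1;
    }

    /** Returns a copy of all events from the given offset onward, in arrival order. */
    List<String> readFrom(long offset) {
        return List.copyOf(events.subList((int) offset, events.size()));
    }

    public static void main(String[] args) {
        EventLog log = new EventLog();
        log.append("order-created");
        log.append("order-paid");
        // Reading twice from offset 0 yields the same events: a replay.
        System.out.println(log.readFrom(0));
        System.out.println(log.readFrom(0));
    }
}
```

Because reads do not consume events, any number of downstream processors can start from whichever offset they need.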
The rise of data in motion
70% of Fortune 500 companies use Apache Kafka (the majority are Confluent customers)
[Diagram labels: other systems (...many more), Kafka Connect, Kafka cluster, Kafka Connect, other systems.]
Apache Kafka is an Event Streaming Platform
© 2022, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
Core Kafka Features
01 Publish & Subscribe to Streams of Events
02 Store your Event Streams
03 Process & Analyze your Event Streams
Data Streaming with Confluent
Everywhere: Be everywhere our customers want to be
Cloud-Native: Re-imagined Kafka experience for the Cloud
Complete: Enable developers to reliably & securely build next-gen apps faster
The Confluent Product Advantage
10x Kafka
Confluent Cloud offers a truly fully managed, cloud-native data streaming platform for Apache Kafka, with 10x faster scaling, infinitely more storage, and built-in resilience
Resiliency: Leave Kafka reliability worries behind with 99.99% uptime SLA and 10x built-in durability
Storage: Never worry about Kafka storage limits again with Infinite Storage that’s 10x more scalable and performant
Elasticity: Scale and shrink to handle 0 to GBps+ workloads and peak customer demands, 10x faster and easier
Confluent Platform: The Enterprise Distribution of Apache Kafka. Self-managed software. Deploy on any platform, on premises, or cloud (VM).
Confluent Cloud: Apache Kafka reengineered for the cloud. Fully managed service.
Available on
Confluent: Everywhere
Federated streaming, hybrid and multi-cloud.
Data syndication and replication across and between clouds and on-premises, with self-service APIs, data governance, and visual tooling.
Reliable & real-time data streams between all customer sites, so you can run always-on streaming analytics on the data of the entire enterprise, despite regional or cloud provider outages.
Everywhere:
Cluster Linking Global Central Nervous System
“We are in the business of selling and renting clothes. We are not in the business of managing an event streaming platform… If we had to manage everything ourselves, I would’ve had to hire at least 10 more people to keep the systems up and running.”
● Architecture planning
● Cluster sizing
● Cluster provisioning
● Broker settings
● Zookeeper management
● Partition placement and data durability
● Source/sink connectors development and maintenance
● Monitoring and reporting tools setup
● Software patches and upgrades
● Security controls and integrations
● Failover design and planning
● Mirroring and geo-replication
● Streaming data governance
● Load rebalancing and monitoring
● Expansion planning & execution
● Utilization optimization and visibility
● Cluster migrations
● Infrastructure & performance upgrades / enhancements
[Figure: Kafka adoption curve, value (vertical axis) against investment & time (horizontal axis), with five numbered stages. Stage labels: experimentation / early interest; identify a project; mission critical, disparate LOBs; mission-critical, connected LOBs; central nervous system.]
Operationalizing Kafka on your own is difficult
Kafka is hard in experimentation. It only gets harder (and riskier) as you add mission-critical data and use cases.
Key challenges
Operational burden and resources: Manage and scale platform to support ever-growing demand
Security and governance: Ensure streaming data is as safe and secure as data-at-rest as Kafka usage scales
Real-time connectivity and processing: Leverage valuable legacy data to power modern, cloud-based applications and experiences
Global availability: Maintain high availability across environments with minimal downtime
Discover, understand, and trust your data streams
Where did data come from?
Where is it going?
Where, when, and how was it transformed?
What’s the common taxonomy?
What is the current state of the stream?
Stream Catalog: Increase collaboration and productivity with self-service data discovery
Stream Lineage: Understand complex data relationships and uncover more insights
Stream Quality: Deliver trusted, high-quality event streams to the business
“Confluent’s Stream Governance suite will play a major role in our expanded use of data in motion and creation of a central nervous system for the enterprise. With the self-service capabilities in stream catalog and stream lineage, we’ll be able to greatly simplify and accelerate the onboarding of new teams working with our most valuable data.”
Instantly connect popular data sources & sinks
120+ prebuilt connectors: 100+ Confluent supported; 20+ partner supported, Confluent verified
Better together: Confluent and AWS
Confluent has deep AWS service integrations
Confluent helps set your data in motion on AWS
Together Confluent and AWS empower Endless Use Cases across many Industries
Retail: Inventory Management; Personalized Promotions; Product Development & Introduction; Sentiment Analysis; Streaming Enterprise Messaging; Systems of Scale for High Traffic Periods
Healthcare: Connected Health Records; Data Confidentiality & Accessibility; Dynamic Staff Allocation Optimization; Integrated Treatment; Proactive Patient Care; Real-Time Monitoring
Finance & Banking: Early-On Fraud Detection; Capital Management; Market Risk Recognition & Investigation; Preventive Regulatory Scanning; Real-Time What-If Analysis; Trade Flow Monitoring
Transportation: Advanced Navigation; Environmental Factor Processing; Fleet Management; Predictive Maintenance; Threat Detection & Real-Time Response; Traffic Distribution Optimization
Common in all Industries: Data Pipelines; Hybrid Cloud Integration; Microservices; Security and Fraud; Customer 360; Streaming ETL
Stream Processing on AWS and Confluent
Two types of Stream Processing: Stateless and Stateful
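The difference between the two types can be sketched in a few lines. This is an illustration only, with hypothetical names (`ProcessingTypes`, `isLargeTransaction`, `RunningCount`) and an arbitrary threshold: a stateless step looks at one event in isolation, while a stateful step also depends on what it has seen before.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; none of these names come from Kafka, ksqlDB, or AWS. */
final class ProcessingTypes {

    /** Stateless: the result depends only on the event being processed. */
    static boolean isLargeTransaction(double amount) {
        return amount > 100.0; // arbitrary example threshold
    }

    /** Stateful: the result depends on state accumulated from earlier events. */
    static final class RunningCount {
        private final Map<String, Integer> countsByKey = new HashMap<>();

        /** Records one event for the key and returns how many have been seen so far. */
        int record(String key) {
            return countsByKey.merge(key, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        System.out.println(isLargeTransaction(250.0));
        RunningCount counter = new RunningCount();
        for (String customer : List.of("jay", "ana", "jay")) {
            System.out.println(customer + " seen " + counter.record(customer) + " time(s)");
        }
    }
}
```

In a real deployment the stateful step's map would have to survive restarts and rebalancing, which is the part that stream processing frameworks take on.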
ksqlDB at a glance
What is it?
ksqlDB is an event-streaming database for working with streams and tables of data
All the key features of a modern streaming solution: Aggregations, Joins, Windowing, Event-time, Dual query support, Exactly-once semantics, Out-of-order handling, User-defined functions
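-- qualifyPromotion is a user-defined function (UDF).
-- A non-aggregated column must appear in GROUP BY, so the latest value per ride is kept.
CREATE TABLE activePromotions AS
  SELECT rideId,
         LATEST_BY_OFFSET(qualifyPromotion(distanceToDst)) AS promotion
  FROM locations
  GROUP BY rideId
  EMIT CHANGES;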
How does it work?
It separates compute from storage, and scales elastically in a fault-tolerant manner
It remains highly available during disruption, even in the face of failure to a quorum of its servers
Kafka clients:
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;

// consumer, stateStore, StateStoreException, RetryUtils and MAX_RETRIES are
// application-specific and defined elsewhere.
ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));
Map<String, Integer> counts = new HashMap<>();
for (ConsumerRecord<String, Integer> record : records) {
    // Sum the values seen for each key in this batch.
    counts.merge(record.key(), record.value(), Integer::sum);
}
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
    int attempts = 0;
    while (attempts++ < MAX_RETRIES) {
        try {
            // Add the batch total to the count already held in the state store.
            int stateCount = stateStore.getValue(entry.getKey());
            stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
            break;
        } catch (StateStoreException e) {
            RetryUtils.backoff(attempts);
        }
    }
}
Kafka Streams:
builder
    .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
    .groupBy((key, value) -> value)
    .count()
    .toStream()
    .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
ksqlDB:
SELECT x, count(*) FROM stream GROUP BY x EMIT CHANGES;
Spectrum: flexibility (Kafka clients) to simplicity (ksqlDB)
3 modalities of stream processing with Confluent
2. Stateless Stream processing with AWS Lambda
Confluent Kafka sink connector
• Sink connector polls Kafka partitions and invokes your function
• Lambda can be invoked synchronously or asynchronously
• At-least-once semantics
• Provides a dead letter queue (DLQ) for any failed invocations
• Sink connector scales up to a soft maximum of 10 connectors
Event source mapping (Lambda service)
• Lambda service polls the Kafka partitions and invokes your Lambda function synchronously
• Starts with one concurrent poller and customer function
• Scaling
○ Lambda service checks every 3 minutes if scaling is needed
○ Starts with 1 poller and scales up to ≤ #partitions
• Batches records based on a BatchSize or BatchWindow
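Batching on a size limit or a time window, as in the last bullet, can be sketched as follows. This is not how the Lambda service is implemented; `RecordBatcher`, `add`, and `tick` are hypothetical names, and the caller passes in the current time so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batcher: emits a batch when it is full or when its time window has elapsed. */
final class RecordBatcher {
    private final int maxBatchSize;
    private final long maxWindowMillis;
    private final List<String> buffer = new ArrayList<>();
    private long windowStartMillis;

    RecordBatcher(int maxBatchSize, long maxWindowMillis) {
        this.maxBatchSize = maxBatchSize;
        this.maxWindowMillis = maxWindowMillis;
    }

    /** Buffers a record; returns the batch if the size limit is reached, otherwise an empty list. */
    List<String> add(String record, long nowMillis) {
        if (buffer.isEmpty()) {
            windowStartMillis = nowMillis; // the window opens with the first buffered record
        }
        buffer.add(record);
        return buffer.size() >= maxBatchSize ? drain() : List.of();
    }

    /** Called periodically; returns the buffered records once the window has elapsed. */
    List<String> tick(long nowMillis) {
        boolean expired = !buffer.isEmpty() && nowMillis - windowStartMillis >= maxWindowMillis;
        return expired ? drain() : List.of();
    }

    private List<String> drain() {
        List<String> batch = List.copyOf(buffer);
        buffer.clear();
        return batch;
    }

    public static void main(String[] args) {
        RecordBatcher batcher = new RecordBatcher(2, 1000);
        System.out.println(batcher.add("a", 0));   // buffered, nothing emitted yet
        System.out.println(batcher.add("b", 10));  // size limit reached
        System.out.println(batcher.add("c", 20));  // buffered
        System.out.println(batcher.tick(1020));    // window elapsed
    }
}
```

A larger size limit or window trades latency for fewer function invocations.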
Enrich Transaction events for Fraud scoring
Incoming event:
Customer | Transaction
Jay | $10
Event enriched with ksqlDB:
Customer | Transaction | Avg 7 days | Num trans 10m
Jay | $10 | $8.5 | 1
Services shown: ksqlDB, AWS Lambda, Amazon SageMaker
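The enrichment on this slide adds two windowed aggregates to each transaction: the customer's average amount over 7 days and the number of transactions in the last 10 minutes. A plain-Java sketch of that calculation is below. It is not ksqlDB code; `TransactionEnricher` and its members are hypothetical, and it assumes that the current transaction counts toward both aggregates and that events for a customer arrive in timestamp order.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical enricher: keeps per-customer history and derives two windowed features. */
final class TransactionEnricher {
    /** Enriched event; field names mirror the columns on the slide. */
    static final class Enriched {
        final String customer;
        final double amount;
        final double avg7Days;
        final int numTrans10Min;

        Enriched(String customer, double amount, double avg7Days, int numTrans10Min) {
            this.customer = customer;
            this.amount = amount;
            this.avg7Days = avg7Days;
            this.numTrans10Min = numTrans10Min;
        }
    }

    private static final class Event {
        final double amount;
        final long timestampMillis;

        Event(double amount, long timestampMillis) {
            this.amount = amount;
            this.timestampMillis = timestampMillis;
        }
    }

    private static final long SEVEN_DAYS = Duration.ofDays(7).toMillis();
    private static final long TEN_MINUTES = Duration.ofMinutes(10).toMillis();

    private final Map<String, List<Event>> historyByCustomer = new HashMap<>();

    Enriched enrich(String customer, double amount, long timestampMillis) {
        List<Event> history = historyByCustomer.computeIfAbsent(customer, k -> new ArrayList<>());
        // Forget events that have fallen out of the longest (7-day) window.
        history.removeIf(e -> timestampMillis - e.timestampMillis > SEVEN_DAYS);
        history.add(new Event(amount, timestampMillis));

        double avg = history.stream().mapToDouble(e -> e.amount).average().orElse(0.0);
        int recent = (int) history.stream()
                .filter(e -> timestampMillis - e.timestampMillis <= TEN_MINUTES)
                .count();
        return new Enriched(customer, amount, avg, recent);
    }

    public static void main(String[] args) {
        TransactionEnricher enricher = new TransactionEnricher();
        enricher.enrich("Jay", 7.0, 0L);
        Enriched e = enricher.enrich("Jay", 10.0, Duration.ofMinutes(30).toMillis());
        System.out.println(e.customer + " " + e.amount + " " + e.avg7Days + " " + e.numTrans10Min);
    }
}
```

The enriched record is the kind of feature vector a scoring function could then pass to a fraud model.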
3. Kinesis Data Analytics
Integrating Kinesis Data Analytics with Confluent lets you unlock many use cases with AWS AI/ML services
When to use which?
                | ksqlDB                 | Kafka Streams          | Kinesis Data Analytics | Lambda
Fully managed   | ✅                     | No                     | ✅                     | ✅
Type            | Stateful and Stateless | Stateful and Stateless | Stateful and Stateless | Stateless
Fault tolerance | Exactly once           | Exactly once           | Exactly once           | At-least once
UDF support     | ✅ (self-managed)      | ✅ (self-managed)      | ✅                     | ✅
Latency         | Fast                   | Very fast              | Very fast              | Fast
Presentation
Accelerate modernization from on premises to AWS
Connect: Leverage 120+ Confluent prebuilt connectors to continuously bring valuable data from existing services on-premises, including enterprise data warehouse, databases, and mainframes
Bridge: Hybrid cloud streaming with consistent, event-driven architecture for modern apps
Modernize: Increase agility in getting applications to market and reduce TCO when freeing up resources to focus on value-generating activities and not in managing servers
[Diagram labels. On premises: legacy EDW, mainframe, legacy DB, JDBC/CDC connectors. Between sites: ClusterLink, AWS Direct Connect. AWS Cloud: data streams, applications, ksqlDB; Amazon S3 sink, Amazon Redshift sink, AWS Lambda sink; Amazon S3, Amazon Redshift, AWS Lambda, Amazon Athena, AWS Glue, Amazon SageMaker, AWS Lake Formation, Amazon DynamoDB, Amazon Aurora.]
Thank You & Next Steps
How did we do? Enter your feedback!
1. Learn more: Check out Confluent - AWS Workshops - https://confluent.awsworkshop.io/
2. Try it out yourself! Subscribe to Confluent Cloud on the AWS Marketplace and start with a free $400 (to be used within 30 days).
3. Schedule a workshop/hackathon: Pick a problem and schedule a workshop/hackathon with Confluent and AWS for your team.
Demo: Processing Credit Applications in real-time
● Source static data from RDS – MySQL
● Process applications using static data and real-time events in Confluent Cloud
● Visualize data sunk into Redshift using QuickSight
Live Demo!

Real-time Analytics with Confluent and AWS

  • 1. Unlock value of your data in real-time with Confluent and AWS. Ahmed Zamzam, Senior Partner Solutions Architect
  • 2. Copyright 2023, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc. Agenda Real-time Analytics with Confluent and AWS Building event-streaming applications real time analytics pipelines using Confluent and AWS Presentation layer Q/A Confluent makes real-time data streams top priority Rise of data in motion Event streaming with Confluent Rearchitected Kafka, together with the features you need to rapidly deploy production use cases Enable BI and AI/ML use-cases
  • 3. Trending Now Popular on Netflix Top Picks for Joshua Curbside pickup Loyalty rewards Personalized recommendations Real-time trades Ride ETA Data is the fuel for a modern business
  • 4. Why real-time data? Data loses value quickly over time. [Chart: value of data to decision-making plotted against time (real-time, seconds, minutes, hours, days, months). Zones: preventive/predictive, actionable, reactive, historical. Labels: time-critical decisions; traditional “batch” business intelligence; information half-life in decision-making.] Source: Perishable insights, Mike Gualtieri, Forrester
  • 5. Typical real-time data pipeline
      Source: data is continuously generated at high velocity from different sources such as IoT devices, application logs, and online transactions.
      Event streaming: data is captured and stored in the order it was received for a set duration of time, and can be replayed as often as needed during that time.
      Stream processing: process, analyse, and act on the data as soon as it is generated and in the order it was received.
      Presentation: sink data to different destinations, most commonly data lakes and/or different databases.
  • 7. The rise of data in motion: 70% of Fortune 500 companies use Apache Kafka (the majority are Confluent customers)
  • 8. Apache Kafka is an Event Streaming Platform. [Diagram: Other Systems, Kafka Connect, Kafka Cluster, Kafka Connect, Other Systems, and many more]
  • 9. © 2022, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark. Core Kafka Features: 01 Publish & Subscribe to Streams of Events; 02 Store your Event Streams; 03 Process & Analyze your Event Streams
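The three core features listed above (publish and subscribe, store, process) can be illustrated with a toy in-memory log. This is only a sketch of the idea, not the Kafka API: the `EventLog` class, the `payments` topic, and the sample amounts are all hypothetical, and a real cluster adds partitions, replication, and retention.

```python
from collections import defaultdict


class EventLog:
    """Toy append-only log: one ordered list of events per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, event):
        """Append an event and return its offset (its position in the log)."""
        log = self._topics[topic]
        log.append(event)
        return len(log) - 1

    def read(self, topic, offset=0):
        """Return every stored event from `offset` onward, in arrival order."""
        return list(self._topics[topic][offset:])


log = EventLog()
for amount in (10, 25, 7):
    log.publish("payments", {"customer": "Jay", "amount": amount})

# Process: a consumer reads from the start of the log and aggregates.
total = sum(event["amount"] for event in log.read("payments"))

# Replay: reading does not delete anything, so another consumer can
# start again from any offset it chooses.
latest_two = log.read("payments", offset=1)
```

Because events are stored rather than handed off, each consumer tracks its own offset and can re-read independently of the others.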
  • 10. Data Streaming with Confluent
  • 11. The Confluent Product Advantage. Everywhere: be everywhere our customers want to be. Cloud-Native: re-imagined Kafka experience for the cloud. Complete: enable developers to reliably & securely build next-gen apps faster.
  • 12. 10x Kafka. Confluent Cloud offers a truly fully managed, cloud-native data streaming platform for Apache Kafka, with 10x faster scaling, infinitely more storage, and built-in resilience.
      Resiliency: leave Kafka reliability worries behind with 99.99% uptime SLA and 10x built-in durability.
      Storage: never worry about Kafka storage limits again with Infinite Storage that’s 10x more scalable and performant.
      Elasticity: scale and shrink to handle 0 to GBps+ workloads and peak customer demands 10x faster and easier.
  • 13. Confluent: Everywhere. Confluent Platform, the enterprise distribution of Apache Kafka: self-managed software; deploy on any platform, on premises, or cloud. Confluent Cloud, Apache Kafka reengineered for the cloud: fully managed service.
  • 14. Everywhere: Cluster Linking. Global Central Nervous System. Federated streaming, hybrid and multi-cloud. Data syndication and replication across and between clouds and on-premises, with self-service APIs, data governance, and visual tooling. Reliable & real-time data streams between all customer sites, so you can run always-on streaming analytics on the data of the entire enterprise, despite regional or cloud provider outages.
  • 15. Operationalizing Kafka on your own is difficult. Kafka is hard in experimentation. It only gets harder (and riskier) as you add mission-critical data and use cases.
      “We are in the business of selling and renting clothes. We are not in the business of managing an event streaming platform… If we had to manage everything ourselves, I would’ve had to hire at least 10 more people to keep the systems up and running.”
      What you manage yourself:
      ● Architecture planning
      ● Cluster sizing
      ● Cluster provisioning
      ● Broker settings
      ● Zookeeper management
      ● Partition placement and data durability
      ● Source/sink connectors development and maintenance
      ● Monitoring and reporting tools setup
      ● Software patches and upgrades
      ● Security controls and integrations
      ● Failover design and planning
      ● Mirroring and geo-replication
      ● Streaming data governance
      ● Load rebalancing and monitoring
      ● Expansion planning & execution
      ● Utilization optimization and visibility
      ● Cluster migrations
      ● Infrastructure & performance upgrades / enhancements
      Adoption stages (chart of value against investment & time): 1 Experimentation / early interest; 2 Identify a project; 3 Mission critical, disparate LOBs; 4 Mission-critical, connected LOBs; 5 Central nervous system
      Key challenges:
      ● Operational burden and resources: manage and scale platform to support ever-growing demand
      ● Security and governance: ensure streaming data is as safe and secure as data-at-rest as Kafka usage scales
      ● Real-time connectivity and processing: leverage valuable legacy data to power modern, cloud-based applications and experiences
      ● Global availability: maintain high availability across environments with minimal downtime
  • 16. Discover, understand, and trust your data streams. Where did data come from? Where is it going? Where, when, and how was it transformed? What’s the common taxonomy? What is the current state of the stream?
      Stream Catalog: increase collaboration and productivity with self-service data discovery.
      Stream Lineage: understand complex data relationships and uncover more insights.
      Stream Quality: deliver trusted, high-quality event streams to the business.
      “Confluent’s Stream Governance suite will play a major role in our expanded use of data in motion and creation of a central nervous system for the enterprise. With the self-service capabilities in stream catalog and stream lineage, we’ll be able to greatly simplify and accelerate the onboarding of new teams working with our most valuable data.”
  • 17. Instantly connect popular data sources & sinks: 120+ prebuilt connectors (100+ Confluent supported; 20+ partner supported, Confluent verified)
  • 19. Confluent has deep AWS service integrations
  • 20. Confluent helps set your data in motion on AWS
  • 21. Together Confluent and AWS empower Endless Use Cases across many Industries
      Retail: Inventory Management; Personalized Promotions; Product Development & Introduction; Sentiment Analysis; Streaming Enterprise Messaging; Systems of Scale for High Traffic Periods
      Healthcare: Connected Health Records; Data Confidentiality & Accessibility; Dynamic Staff Allocation Optimization; Integrated Treatment; Proactive Patient Care; Real-Time Monitoring
      Finance & Banking: Early-On Fraud Detection; Capital Management; Market Risk Recognition & Investigation; Preventive Regulatory Scanning; Real-Time What-If Analysis; Trade Flow Monitoring
      Transportation: Advanced Navigation; Environmental Factor Processing; Fleet Management; Predictive Maintenance; Threat Detection & Real-Time Response; Traffic Distribution Optimization
      Common in all Industries: Data Pipelines; Hybrid Cloud Integration; Microservices; Security and Fraud; Customer 360; Streaming ETL
  • 22. Stream Processing on AWS and Confluent
  • 23. Two types of Stream Processing: Stateless and Stateful
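A minimal sketch of the stateless versus stateful distinction named on this slide, written in plain Python rather than any streaming framework. The event fields, the threshold, and the function names are assumptions made for illustration.

```python
def stateless_filter(events, threshold):
    """Stateless: each event is judged on its own, with no memory of earlier events."""
    for event in events:
        if event["amount"] >= threshold:
            yield event


def stateful_count(events):
    """Stateful: a running count per customer is carried from event to event."""
    counts = {}
    for event in events:
        key = event["customer"]
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


events = [
    {"customer": "Jay", "amount": 10},
    {"customer": "Ana", "amount": 3},
    {"customer": "Jay", "amount": 50},
]
large = list(stateless_filter(events, threshold=10))
running = list(stateful_count(events))
```

The stateless step can run on any worker and be retried freely, while the stateful step needs its `counts` to survive restarts, which is the part a stream processing engine manages for you.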
  • 24. ksqlDB at a glance
      What is it? ksqlDB is an event-streaming database for working with streams and tables of data.
      All the key features of a modern streaming solution: aggregations, joins, windowing, event-time, dual query support, exactly-once semantics, out-of-order handling, user-defined functions.
      Example: CREATE TABLE activePromotions AS SELECT rideId, qualifyPromotion(distanceToDst) AS promotion FROM locations GROUP BY rideId EMIT CHANGES
      How does it work? It separates compute from storage, and scales elastically in a fault-tolerant manner. It remains highly available during disruption, even in the face of failure to a quorum of its servers.
  • 25. 3 modalities of stream processing with Confluent, ordered from most flexibility to most simplicity: Kafka clients, Kafka Streams, ksqlDB
      Kafka clients:
          // stateStore, StateStoreException, RetryUtils and MAX_RETRIES are the slide's own helpers
          ConsumerRecords<String, Integer> records = consumer.poll(Duration.ofMillis(100));
          Map<String, Integer> counts = new HashMap<>();
          for (ConsumerRecord<String, Integer> record : records) {
              // sum the values seen for each key in this batch
              counts.merge(record.key(), record.value(), Integer::sum);
          }
          for (Map.Entry<String, Integer> entry : counts.entrySet()) {
              int attempts = 0;
              while (attempts++ < MAX_RETRIES) {
                  try {
                      // fold the batch total into the stored total, retrying on failure
                      int stateCount = stateStore.getValue(entry.getKey());
                      stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
                      break;
                  } catch (StateStoreException e) {
                      RetryUtils.backoff(attempts);
                  }
              }
          }
      Kafka Streams:
          builder
              .stream("input-stream", Consumed.with(Serdes.String(), Serdes.String()))
              .groupBy((key, value) -> value)
              .count()
              .toStream()
              .to("counts", Produced.with(Serdes.String(), Serdes.Long()));
      ksqlDB:
          SELECT x, count(*) FROM stream GROUP BY x EMIT CHANGES;
  • 26. 2. Stateless Stream processing with AWS Lambda
      Confluent Kafka sink connector:
      • Sink connector polls Kafka partitions and invokes your function
      • Lambda can be invoked synchronously or asynchronously
      • At least once semantics
      • Provides a dead letter queue (DLQ) for any failed invocations
      • Sink connector scales up to a soft maximum of 10 connectors
      Event source mapping (Lambda service):
      • Lambda service polls the Kafka partitions and invokes your Lambda function synchronously
      • Starts with one concurrent poller and customer function
      • Scaling
        ○ Lambda service checks every 3 minutes if scaling is needed
        ○ Starts with 1 poller and scales up to ≤ #partitions
      • Batch records based on a BatchSize or BatchWindow
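For the event source mapping path described above, a handler receives a batch of Kafka records per invocation. The sketch below assumes the event shape that, as far as I know, Lambda documents for Kafka sources: a `records` map keyed by topic-partition, with each record's `value` base64-encoded. The JSON payload, its `amount` field, and the threshold of 100 are hypothetical, and the sink connector path delivers a different payload.

```python
import base64
import json


def handler(event, context):
    """Decode each Kafka record in the batch and flag large transactions."""
    processed = 0
    large = 0
    # Records arrive grouped under "topic-partition" keys.
    for records in event.get("records", {}).values():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            # Hypothetical stateless step: nothing is remembered between records.
            if payload["amount"] >= 100:
                large += 1
            processed += 1
    return {"processed": processed, "large": large}
```

Since delivery is at least once, the same record can arrive again after a retry, so any side effects added to a handler like this are best made idempotent.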
  • 27. Enrich Transaction events for Fraud scoring (ksqlDB). Customer: Jay; Transaction: $10
  • 28. Enrich Transaction events for Fraud scoring. Customer: Jay; Transaction: $10; Avg 7 days: $8.5; Num trans 10m: 1
  • 29. Enrich Transaction events for Fraud scoring (ksqlDB, AWS Lambda, Amazon SageMaker). Customer: Jay; Transaction: $10; Avg 7 days: $8.5; Num trans 10m: 1
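The enrichment on these slides adds two windowed features to each transaction before it is scored. Below is a rough plain-Python sketch of that calculation, not the ksqlDB implementation used in the deck. The `enrich` function, the field names, and the earlier $7 transaction are assumptions, chosen so the result matches the figures shown for Jay.

```python
from datetime import datetime, timedelta


def enrich(history, txn):
    """Add a 7-day average amount and a 10-minute transaction count to `txn`.

    `history` holds the same customer's earlier transactions. Both windows
    include the current transaction and end at its timestamp.
    """
    now = txn["ts"]
    events = history + [txn]
    last_7d = [e["amount"] for e in events if now - timedelta(days=7) < e["ts"] <= now]
    last_10m = [e for e in events if now - timedelta(minutes=10) < e["ts"] <= now]
    return {
        **txn,
        "avg_7_days": sum(last_7d) / len(last_7d),
        "num_trans_10m": len(last_10m),
    }


now = datetime(2023, 5, 1, 12, 0)
# Hypothetical earlier purchase of $7, two days before the $10 transaction.
history = [{"customer": "Jay", "amount": 7.0, "ts": now - timedelta(days=2)}]
enriched = enrich(history, {"customer": "Jay", "amount": 10.0, "ts": now})
```

Here `enriched` carries an `avg_7_days` of 8.5 and a `num_trans_10m` of 1, the kind of feature row that would then be passed to a scoring model.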
  • 30. 3. Kinesis Data Analytics. Integrating Kinesis Data Analytics with Confluent unlocks many use cases with AWS AI/ML services
  • 31. When to use which?
  • 32. When to use which?
      |                 | ksqlDB                 | Kafka Streams          | Kinesis Data Analytics | Lambda        |
      | Fully managed   | ✅                     | No                     | ✅                     | ✅            |
      | Type            | Stateful and stateless | Stateful and stateless | Stateful and stateless | Stateless     |
      | Fault tolerance | Exactly once           | Exactly once           | Exactly once           | At-least once |
      | UDF support     | ✅ (self-managed)      | ✅ (self-managed)      | ✅                     | ✅            |
      | Latency         | Fast                   | Very fast              | Very fast              | Fast          |
  • 34. Accelerate modernization from on premises to AWS
      Connect: leverage 120+ Confluent prebuilt connectors to continuously bring valuable data from existing services on-premises, including enterprise data warehouse, databases, and mainframes.
      Bridge: hybrid cloud streaming with consistent, event-driven architecture for modern apps.
      Modernize: increase agility in getting applications to market and reduce TCO when freeing up resources to focus on value-generating activities and not on managing servers.
      [Diagram components. On premises: legacy EDW, mainframe, legacy DB, JDBC/CDC connectors. Link: AWS Direct Connect, ClusterLink. AWS Cloud: data streams, applications, ksqlDB; Amazon S3 sink, Amazon Redshift sink, AWS Lambda sink; Amazon S3, Amazon Redshift, AWS Lambda, Amazon Athena, AWS Glue, Amazon SageMaker, AWS Lake Formation, Amazon DynamoDB, Amazon Aurora]
  • 35. Thank You & Next Steps. How did we do? Enter your feedback!
      1 Learn more: check out Confluent - AWS Workshops - https://confluent.awsworkshop.io/
      2 Try it out yourself! Subscribe to Confluent Cloud on the AWS Marketplace and start with a free $400 (to be used within 30 days).
      3 Schedule a workshop/hackathon: pick a problem and schedule a workshop/hackathon with Confluent and AWS for your team.
  • 37. Live Demo!
      ● Source static data from RDS (MySQL)
      ● Process applications using static data and real-time events in Confluent Cloud
      ● Visualize data written to Redshift using QuickSight