© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SPONSORED BY CONFLUENT
Building real-time serverless data applications with Confluent and AWS
PAT036 - AWS Summit London
Ahmed Zamzam (he/his)
Senior AWS Partner Solutions Architect, Confluent
Agenda
• Why real-time and Serverless
• Event streaming with Confluent
• Serverless Stream processing with Confluent and AWS
• Best practices
Why Serverless?
Serverless accelerates innovation
• Faster time to market
• Automatic scaling
• Lower total cost of ownership
• Eliminate operational overhead
• Built-in high availability & security
Why real-time?
Data loses value quickly over time
[Chart: "Information half-life in decision-making". Vertical axis: value of data to decision-making. Horizontal axis: time since the event (real-time, seconds, minutes, hours, days, months). The value curve falls through four zones: Preventive/Predictive, Actionable, Reactive, Historical. Time-critical decisions sit at the real-time end; traditional "batch" business intelligence sits at the far end.]
Source: Perishable insights, Mike Gualtieri, Forrester
Typical real-time data pipeline
• Source: Data is continuously generated at high velocity from different sources such as IoT devices, application logs, and online transactions.
• Event Streaming: Data is captured and stored in the order it was received for a set duration of time, and can be replayed repeatedly during that time.
• Stream Processing: Process, analyse, and act on the data as soon as it is generated and in the order it was received.
• Presentation: Sink data to different destinations, most commonly data lakes and/or databases.
• Governance
The Confluent Product Advantage
• Everywhere: Be everywhere our customers want to be
• Cloud-Native: Re-imagined Kafka experience for the Cloud
• Complete: Enable developers to reliably & securely build next-gen apps faster
10x Kafka
Confluent Cloud offers a truly fully managed, cloud-native data streaming platform for Apache Kafka, with 10x faster scaling, infinitely more storage, and built-in resilience.
• Resiliency: Leave Kafka reliability worries behind with a 99.99% uptime SLA and 10x built-in durability.
• Storage: Never worry about Kafka storage limits again with Infinite Storage that's 10x more scalable and performant.
• Elasticity: Scale and shrink to handle 0 to GBps+ workloads and peak customer demands 10x faster and easier.
Together, Confluent and AWS empower endless use cases across many industries
• Retail: Inventory Management; Personalized Promotions; Product Development & Introduction; Sentiment Analysis; Streaming Enterprise Messaging; Systems of Scale for High Traffic Periods
• Healthcare: Connected Health Records; Data Confidentiality & Accessibility; Dynamic Staff Allocation Optimization; Integrated Treatment; Proactive Patient Care; Real-Time Monitoring
• Finance & Banking: Early-On Fraud Detection; Capital Management; Market Risk Recognition & Investigation; Preventive Regulatory Scanning; Real-Time What-If Analysis; Trade Flow Monitoring
• Transportation: Advanced Navigation; Environmental Factor Processing; Fleet Management; Predictive Maintenance; Threat Detection & Real-Time Response; Traffic Distribution Optimization
• Common in all industries: Data Pipelines; Hybrid Cloud Integration; Microservices; Security and Fraud; Customer 360; Streaming ETL
ksqlDB at a glance
What is it?
ksqlDB is an event-streaming database for working with streams and tables of data. It offers all the key features of a modern streaming solution:
• Aggregations
• Joins
• Windowing
• Event-time
• Dual query support
• Exactly-once semantics
• Out-of-order handling
• User-defined functions

    CREATE TABLE activePromotions AS
      SELECT rideId,
             qualifyPromotion(distanceToDst) AS promotion
      FROM locations
      GROUP BY rideId
      EMIT CHANGES;

How does it work?
• It separates compute from storage and scales elastically in a fault-tolerant manner.
• It remains highly available during disruption, even if a quorum of its servers fails.
2. Stateless Stream processing with AWS Lambda
Confluent Kafka sink connector
• The sink connector polls Kafka partitions and invokes your function
• Lambda can be invoked synchronously or asynchronously
• At-least-once semantics
• Provides a dead letter queue (DLQ) for any failed invocations
• The sink connector scales up to a soft maximum of 10 connectors
Event source mapping (Lambda service)
• The Lambda service polls the Kafka partitions and invokes your Lambda function synchronously
• Starts with one poller and one concurrent invocation of your function
• Scaling
  ○ The Lambda service checks every 3 minutes whether scaling is needed
  ○ Scales from 1 poller up to at most one poller per partition
• Batches records based on a batch size (BatchSize) or a batch window
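As a minimal sketch of what a function invoked through the event source mapping might look like, the handler below unpacks one batch. It assumes the event shape Lambda uses for Kafka event sources (a `records` map keyed by topic-partition, with base64-encoded values) and assumes producers write UTF-8 JSON; the helper name `decode_kafka_batch` is illustrative, not part of any AWS or Confluent API.

```python
import base64
import json


def decode_kafka_batch(event):
    """Flatten one Lambda Kafka event into a list of decoded records."""
    decoded = []
    # "records" maps "<topic>-<partition>" to the records polled from that partition.
    for records in event.get("records", {}).values():
        for record in records:
            raw_value = base64.b64decode(record["value"])
            decoded.append({
                "topic": record["topic"],
                "partition": record["partition"],
                "offset": record["offset"],
                # Assumption: message values are UTF-8 encoded JSON documents.
                "value": json.loads(raw_value),
            })
    return decoded


def handler(event, context):
    """Entry point invoked synchronously by the Lambda service with one batch."""
    records = decode_kafka_batch(event)
    for record in records:
        print(f"{record['topic']}[{record['partition']}]@{record['offset']}: {record['value']}")
    return {"processed": len(records)}
```

Because the pollers scale only up to the number of partitions, the partition count of the topic is what bounds how many of these handlers can run in parallel.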
Optimize batch size and batch window to lower cost
[Diagram: a poller delivering batches to a Lambda function instance]
• Lambda's maximum execution time is 15 minutes
• Adjust the batch size (max 10,000) so that each batch finishes processing well within that limit
• For sparse topics, consider a batch window to aggregate records over a time period
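The arithmetic behind picking a batch size can be sketched as follows. This is a rough illustration, not an AWS formula: the function name `suggest_batch_size`, the `headroom` factor, and the idea of measuring an average per-record processing time are all assumptions made for the example. Only the 15-minute limit and the 10,000 maximum come from the slide.

```python
LAMBDA_MAX_TIMEOUT_MS = 15 * 60 * 1000  # Lambda's maximum execution time (15 minutes)
MAX_BATCH_SIZE = 10_000                 # maximum batch size quoted on the slide


def suggest_batch_size(per_record_ms, timeout_ms=LAMBDA_MAX_TIMEOUT_MS, headroom=0.5):
    """Largest batch that fits in a fraction (`headroom`) of the function timeout.

    per_record_ms: measured average processing time per record, in whole milliseconds.
    headroom: share of the timeout you are willing to spend (hypothetical knob).
    """
    if per_record_ms <= 0:
        raise ValueError("per_record_ms must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    budget_ms = int(timeout_ms * headroom)
    # Clamp to the service limit, and never suggest fewer than one record.
    return max(1, min(MAX_BATCH_SIZE, budget_ms // per_record_ms))
```

For example, at 200 ms per record and half of the 15-minute limit as budget, the sketch suggests 2,250 records per batch; at 10 ms per record the 10,000 cap applies instead.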
Capture and log exceptions
[Diagram: a data producer writes 300 records; with batch size = 200, the Lambda service invokes function A (instance 1) once per batch. The function catches exceptions, logs them to CloudWatch Logs, and returns successfully from each invocation.]
• Ensure processing moves forward by catching exceptions and returning successfully
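A minimal sketch of this pattern is shown below, assuming the Kafka event shape used by Lambda event source mappings (a `records` map with base64-encoded values). The `process_record` function is a hypothetical stand-in for real business logic. Anything written through the standard `logging` module inside a Lambda function ends up in CloudWatch Logs.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def process_record(payload):
    """Hypothetical business logic; raises when a record is unusable."""
    if "amount" not in payload:
        raise ValueError("missing 'amount'")
    return payload["amount"]


def handler(event, context):
    """Process every record, log failures, and always return successfully."""
    succeeded, failed = 0, 0
    for records in event.get("records", {}).values():
        for record in records:
            try:
                payload = json.loads(base64.b64decode(record["value"]))
                process_record(payload)
                succeeded += 1
            except Exception:
                # Log with enough coordinates to find the record again later,
                # then keep going so one bad record cannot block the partition.
                failed += 1
                logger.exception(
                    "Failed record %s[%s]@%s",
                    record.get("topic"), record.get("partition"), record.get("offset"),
                )
    return {"succeeded": succeeded, "failed": failed}
```

The trade-off is that a failed record is skipped rather than retried, so the logged topic, partition, and offset are what allow it to be inspected or reprocessed afterwards.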
Enrich Transaction events for Fraud scoring
Incoming event:
  Customer   Transaction
  Jay        $10
Enriched event:
  Customer   Transaction   Avg 7 days   Num trans 10m
  Jay        $10           $8.5         1
Components shown: ksqlDB, AWS Lambda, Amazon SageMaker
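The two enrichment features on this slide (average transaction value over 7 days, number of transactions in the last 10 minutes) can be illustrated in plain Python. This is only a sketch of the windowing semantics, not the ksqlDB queries used in the talk, which the slides do not show. The function name, the record layout, and the choices that the average covers earlier transactions only while the count includes the current one are all assumptions.

```python
from datetime import datetime, timedelta


def enrich_transaction(current, history):
    """Return `current` plus rolling features computed from earlier transactions.

    Each transaction is a dict with "customer", "amount", and "ts" (a datetime).
    """
    ts = current["ts"]
    earlier = [
        t for t in history
        if t["customer"] == current["customer"] and t["ts"] < ts
    ]
    amounts_7_days = [t["amount"] for t in earlier if ts - t["ts"] <= timedelta(days=7)]
    recent_10_min = [t for t in earlier if ts - t["ts"] <= timedelta(minutes=10)]
    return {
        **current,
        # Assumption: the average covers earlier transactions only.
        "avg_7_days": sum(amounts_7_days) / len(amounts_7_days) if amounts_7_days else None,
        # Assumption: the count includes the current transaction.
        "num_trans_10m": len(recent_10_min) + 1,
    }


now = datetime(2023, 6, 7, 12, 0)
history = [
    {"customer": "Jay", "amount": 7.0, "ts": now - timedelta(days=3)},
    {"customer": "Jay", "amount": 10.0, "ts": now - timedelta(days=1)},
    {"customer": "Jay", "amount": 100.0, "ts": now - timedelta(days=10)},  # outside the 7-day window
]
enriched = enrich_transaction({"customer": "Jay", "amount": 10.0, "ts": now}, history)
print(enriched["avg_7_days"], enriched["num_trans_10m"])  # 8.5 1
```

An enriched record like this is what a downstream scoring step would consume as model input.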
Which one to use?
Choose the right stream processing option depending on your needs

                   ksqlDB               Kafka Streams        Kinesis Data Analytics   Lambda
Fully managed      ✔                    -                    ✔                        ✔
Type               Stateful and         Stateful and         Stateful and             Stateless
                   stateless            stateless            stateless
Fault tolerance    Exactly once         Exactly once         Exactly once             At least once
UDF support        ✔ (self-managed)     ✔ (self-managed)     ✔                        ✔
Latency            Fast                 Very fast            Very fast                Fast
Thank you!
Please complete the session
survey in the mobile app
Ahmed Zamzam
linkedin.com/in/ahmed-saef-zamzam/
