Kappa vs. Lambda Architecture
Use Cases, Trade-offs, Technologies, Comparison
Kai Waehner
Field CTO
kai.waehner@confluent.io
linkedin.com/in/kaiwaehner
@KaiWaehner
confluent.io
kai-waehner.de
An Event Streaming Platform
The Underpinning of Data in Motion
Producers: Microservices, DBs, SaaS apps, Mobile
Consumers: Customer 360, Real-time fraud detection, Data warehouse
Streams of real time events: Database change, Microservices events, SaaS data, Customer experiences
Stream processing apps
Connectors
kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
Example Architecture for Data in Motion
Real-time decision making for claim processing and fraud detection
Stream processing: ksqlDB, KStreams
Connectors: Oracle CDC Connector, Salesforce CDC Connector, Source / Sink Connector
Systems: Oracle DB, Salesforce, Dashboard, Fraud Detection App
Domain-Driven Design for your Integration Layer
Event Streaming Platform: Kafka Cluster, Kafka Connect, Schema Registry
Domains: CRM Domain, Legacy Domain, Payment Domain
Applications: CRM Integration, Legacy Integration, Custom Application, ESB Connector, Java / Python / ksqlDB / etc.
→ Independent and loosely coupled, but scalable, highly available and reliable!
Lambda Architecture
Option 1: Unified serving layer
Data Source
Real-Time Layer (Data Processing in Motion)
Batch Layer (Data Processing at Rest)
Serving Layer
Real-Time App (Data Processing in Motion)
Batch App (Data Processing at Rest)
Latency: ms (real-time), min/hr (batch)
Lambda Architecture
Option 2: Separate serving layers
Data Source
Real-Time Layer (Data Processing in Motion)
Batch Layer (Data Processing at Rest)
Speed View
Batch View
Real-time Query
Mixed Query
Batch Query
Latency: ms (real-time), min/hr (batch)
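A mixed query in the separate-serving-layer variant has to combine both views at read time. The following is a minimal Python sketch of that idea, not taken from the slides. It assumes, hypothetically, that both views are per-key counters and that the speed view only covers events newer than the last completed batch run; all names (`build_batch_view`, `batch_cutoff`, the sample events) are illustrative.

```python
from collections import Counter

def build_batch_view(events, batch_cutoff):
    """Batch layer: recompute counts from all events up to the last batch run."""
    return Counter(e["key"] for e in events if e["ts"] <= batch_cutoff)

def build_speed_view(events, batch_cutoff):
    """Real-time layer: count only events the batch run has not seen yet."""
    return Counter(e["key"] for e in events if e["ts"] > batch_cutoff)

def mixed_query(key, batch_view, speed_view):
    """Serving side: merge the complete-but-stale batch view with the fresh speed view."""
    return batch_view[key] + speed_view[key]

# Hypothetical sample events; "ts" is an event timestamp.
events = [
    {"key": "claims", "ts": 1},
    {"key": "claims", "ts": 2},
    {"key": "fraud", "ts": 3},
    {"key": "claims", "ts": 5},
]
batch_cutoff = 3  # assumed timestamp of the last completed batch run
batch_view = build_batch_view(events, batch_cutoff)
speed_view = build_speed_view(events, batch_cutoff)
```

Note that the counting logic exists twice in this sketch, once per layer. Keeping two such implementations in agreement is the kind of duplication the Kappa slides argue against.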
Concerns with the Lambda Architecture
Kappa Architecture
One pipeline for real-time and batch consumers
Data Source
Real-Time Layer (Data Processing in Motion)
Real-Time App (Data Processing in Motion)
Batch App (Data Processing at Rest)
Storage (appears three times in the diagram)
Latency: ms (real-time), min/hr (batch)
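The "one pipeline" idea can be illustrated with a small in-memory sketch in Python. It is not Kafka code: `Log` and `Consumer` are hypothetical stand-ins for a topic partition and a consumer that tracks its own offset. The point is that a real-time consumer and a batch consumer read the same log, each at its own pace.

```python
class Log:
    """In-memory stand-in for a Kafka topic partition (append-only, offset-addressed)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=None):
        end = len(self._records) if max_records is None else offset + max_records
        return self._records[offset:end]


class Consumer:
    """Tracks its own position, so each consumer reads at its own pace."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=None):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = Log()
realtime = Consumer(log)
batch = Consumer(log)

seen_realtime = []
for value in [10, 20, 30, 40]:
    log.append(value)
    seen_realtime.extend(realtime.poll())  # real-time consumer polls after every event

seen_batch = batch.poll()  # batch consumer does one large read later, from the same log
```

Both consumers end up with the same events in the same order. Only the timing of the reads differs, so no second processing pipeline is needed in this simplified model.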
Kappa is NOT a free lunch
Kappa Concerns Solved
• Data availability / retention
→ Compacted Topics, Tiered Storage
• Data consistency and fault-tolerance
→ Exactly-once semantics, Multi-Region Clusters, Cluster Linking
• Handling late-arriving data
→ State management in the streaming application, proper data sinks, replay with guaranteed ordering and timestamps
• Data reprocessing and backfill
→ Dynamic clusters, stateful applications (Kafka Streams, ksqlDB, external stream processing framework like Apache Flink)
• Data integration
→ Kafka Connect for sources and sinks, clients for any language, REST Proxy (real-time but also batch and RPC)
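To make the "Compacted Topics" entry concrete, here is a simplified Python sketch of what log compaction retains. It is an illustration under stated assumptions, not Kafka's implementation: the log is modeled as a list of hypothetical `(offset, key, value)` tuples, and a `None` value acts as a tombstone. Real Kafka compacts in the background and keeps tombstones for a configurable period before dropping them; this sketch drops them immediately.

```python
def compact(log):
    """Keep only the latest record per key, preserving offsets and log order.

    `log` is a list of (offset, key, value) tuples. A value of None is a
    tombstone and removes the key entirely.
    """
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # later records overwrite earlier ones
    survivors = [rec for rec in latest.values() if rec[2] is not None]
    return sorted(survivors)  # offsets are unique, so this restores log order


# Hypothetical sensor status updates keyed by car ID.
log = [
    (0, "car-1", "ok"),
    (1, "car-2", "ok"),
    (2, "car-1", "warn"),
    (3, "car-2", None),  # tombstone: car-2 is deleted
    (4, "car-3", "ok"),
]
compacted = compact(log)
```

A consumer that replays the compacted log still learns the current state of every key, which is why compaction helps with long retention without unbounded growth.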
Kappa @ Uber
Kappa @ Shopify
Kappa Building Blocks
The Log (Kafka)
• Durability with Topic Compaction and Tiered Storage
• Consistency via Exactly-Once Semantics (EOS)
• Data Integration via Kafka Connect
• Elasticity via dynamic Kafka clusters
Streaming Framework (Kafka Streams / Flink)
• Reliability and scalability
• Fault tolerance
• State management
Sinks
• Update/Upsert for simplified design: RDBMS, NoSQL, Compacted Kafka Topics
• Append-only: Regular Kafka Topics, Time Series
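One way to read the sink distinction above is that an upsert sink stays correct when events are replayed, while an append-only sink needs deduplication downstream. The Python sketch below illustrates this with hypothetical in-memory sinks (a dict and a list) and made-up order events; it is not tied to any specific database or connector.

```python
def write_append(sink, events):
    """Append-only sink: every delivered event becomes a new row."""
    sink.extend(events)

def write_upsert(sink, events):
    """Upsert sink: rows are keyed, so redelivering an event overwrites itself."""
    for event in events:
        sink[event["id"]] = event["amount"]

# Hypothetical events with a stable ID used as the upsert key.
events = [{"id": "order-1", "amount": 50}, {"id": "order-2", "amount": 75}]

append_sink, upsert_sink = [], {}
for _ in range(2):  # the second pass simulates a replay of the same events
    write_append(append_sink, events)
    write_upsert(upsert_sink, events)

append_total = sum(e["amount"] for e in append_sink)  # double-counted after replay
upsert_total = sum(upsert_sink.values())              # unchanged after replay
```

After the simulated replay the append-only total is doubled, while the upsert total is unchanged. That is one reason keyed sinks can simplify reprocessing and backfill.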
Kappa @ Disney
Kappa @ Twitter
https://blog.twitter.com/engineering/en_us/topics/infrastructure/2021/processing-billions-of-events-in-real-time-at-twitter-
After migrating from Hadoop and Kafka to a hybrid architecture that spans Twitter's data
centers and Google Cloud Platform, built on Kafka and GCP, Twitter is able to process
billions of events in real time and achieve low latency, high accuracy, stability,
architectural simplicity, and reduced operational cost.
Benefits of the Kappa Architecture
The Kappa architecture leverages a single source of truth with a focus on simplicity in
the enterprise architecture
• Improve streaming to handle all the cases
• One codebase that is always in sync
• One set of infrastructure and technology
• The heart of the infrastructure is real-time, scalable, and reliable
• Improved data quality with guaranteed ordering and no mismatches
• No need to re-architect for new use cases, just connect new consumers (real-time, near
real-time, batch, RPC)
Store Data Long-Term in Kafka?
Kafka
Processing
App
Storage
Transactions, auth, quota enforcement, compaction, ...
Use Cases for Reprocessing Historical Events
Give me all events from time A to time B
• New consumer application
• Error-handling
• Compliance / regulatory processing
• Query and analyze existing events
• Schema changes in analytics platform
• Model training
Diagram: Real-time Producer, Real-time Consumer, Consumer of Historical Data (along a Time axis)
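The request "give me all events from time A to time B" amounts to translating two timestamps into a range of log positions. The Python sketch below shows that lookup on an in-memory list. It assumes the log is a list of hypothetical `(timestamp, payload)` tuples with non-decreasing timestamps; Kafka clients offer a comparable timestamp-to-offset lookup, which this sketch does not call.

```python
from bisect import bisect_left, bisect_right

def events_between(log, start_ts, end_ts):
    """Return all events with start_ts <= ts <= end_ts.

    Assumes `log` is a list of (timestamp, payload) tuples ordered by timestamp.
    """
    timestamps = [ts for ts, _ in log]
    first = bisect_left(timestamps, start_ts)  # first position at or after time A
    last = bisect_right(timestamps, end_ts)    # first position after time B
    return log[first:last]


# Hypothetical log with integer timestamps.
log = [(100, "a"), (105, "b"), (105, "c"), (110, "d"), (120, "e")]
replayed = events_between(log, 105, 110)
```

Because the result is a contiguous slice of the log, the historical consumer sees events in the same order as the real-time consumer did.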
Tiered Storage @ Uber
Confluent Tiered Storage for Kafka
honeycomb - Observability
• Kafka is the “beating heart” of Honeycomb, powering the 99.99% ingest availability SLO
• Ingest telemetry data
• Buffer big data before processing in “retriever” columnar storage database
• True decoupling to innovate more quickly by shipping to each service
• Guard against the risk of a bug in retriever corrupting customer data
• Confluent Tiered Storage frees the engineering team from being storage-bound
• Has grown 10x in two years while TCO for Kafka has only gone up 20%
• Replayability from Tiered Storage after outage for error handling
https://www.honeycomb.io/blog/scaling-kafka-observability-pipelines/
Kappa Architecture
for Streaming Analytics with Kafka and TensorFlow
Components: Car Sensors, MQTT Proxy, Kafka Cluster, Kafka Connect, Tiered Storage, ksqlDB, Kafka Streams Application, TensorFlow, MongoDB, Storage, Dashboards, Search, Analytics, Mobile App, BI Tool
Data flow labels: Ingest Data, Preprocess Data, Consume Data, All Data, Critical Data, Potential Detect, Train Analytic Model, Deploy Analytic Model, Analytic Model
Legend: Kafka Ecosystem, TensorFlow, Other Components
Streaming Ingestion and Model Training with TensorFlow IO
Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!)
Diagram: Producer, Distributed Commit Log, Time, Model A, Model B, Model X (at a later time)
https://github.com/tensorflow/io
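The read pattern behind this slide, several models trained at different times from the same commit log, can be sketched without any ML library. The Python example below is an assumption-laden toy: it does not use the TensorFlow I/O API, `train_mean_model` is a hypothetical stand-in for real training, and the "model" is just the mean of the values in an offset range.

```python
def train_mean_model(log, start_offset=0, end_offset=None):
    """Toy 'model': the mean of the values in the given offset range.

    Stands in for real training; only the read pattern matters here.
    """
    window = log[start_offset:end_offset]
    if not window:
        raise ValueError("no records in the requested offset range")
    return sum(window) / len(window)


commit_log = [1.0, 2.0, 3.0]               # events produced so far
model_a = train_mean_model(commit_log)     # trained on everything available now

commit_log.extend([10.0, 14.0])            # the producer keeps appending
model_b = train_mean_model(commit_log, 3)  # trained only on the newer events
model_x = train_mean_model(commit_log)     # trained later on the full history
```

Each training run reads directly from the log at the offsets it needs, so in this simplified picture no separate copy of the data in S3 or HDFS is required.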
Model Deployment with Apache Kafka, ksqlDB and TensorFlow
CREATE STREAM AnomalyDetection AS
SELECT sensor_id, detectAnomaly(sensor_values)
FROM car_engine;
User Defined Function (UDF): detectAnomaly
Car Engine Car Self-driving Car
Alternatives for Data in Motion
The Event Streaming Landscape – Cloud-native? Complete? Everywhere?
Apache Kafka Products and Cloud Services, “Compatible” Offerings, and other Streaming Technologies
Categories: Native Kafka, Kafka Protocol (not fully compliant), Non Kafka
Deployment models: Self Managed, Partially Managed, Fully Managed
Annotations in the diagram: (Everywhere), (Cloud only), (Kafka mapper not part of cloud offering)
Platforms, Tools
Kai Waehner
Field CTO
kai.waehner@confluent.io
@KaiWaehner
confluent.io
kai-waehner.de
linkedin.com/in/kaiwaehner
Questions? Feedback?
Let’s connect!

Kappa vs Lambda Architectures and Technology Comparison

  • 1. Kappa vs. Lambda Architecture Use Cases, Trade-offs, Technologies, Comparison Kai Waehner Field CTO kai.waehner@confluent.io linkedin.com/in/kaiwaehner @KaiWaehner confluent.io kai-waehner.de
  • 2. An Event Streaming Platform The Underpinning of Data in Motion 2 Microservices DBs SaaS apps Mobile Customer 360 Real-time fraud detection Data warehouse Producers Consumers Database change Microservices events SaaS data Customer experiences Streams of real time events Stream processing apps Connectors Connectors Stream processing apps kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 3. STREAM PROCESSING CONNECTORS Example Architecture for Data in Motion ksqlDB KStreams Real-time decision making for claim processing and fraud detection Dashboard Oracle DB Oracle CDC CONNECTOR Salesforce CDC CONNECTOR Salesforce Source / Sink CONNECTOR Fraud Detection App kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 4. Kafka Connect Kafka Cluster CRM Integration Domain-Driven Design for your Integration Layer Legacy Integration Custom Application ESB Connector Java / Python / ksqlDB / etc. Schema Registry Event Streaming Platform CRM Domain Legacy Domain Payment Domain è Independent and loosely coupled, but scalable, highly available and reliable! kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 5. Lambda Architecture Option 1: Unified serving layer 7 Data Source Real-Time Layer (Data Processing in Motion) Batch Layer (Data Processing at Rest) Serving Layer Real-Time App (Data Processing in Motion) Batch App (Data Processing at Rest) ms min/hr kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 6. 8 Data Source Real-Time Layer (Data Processing in Motion) Batch Layer (Data Processing at Rest) Real-time Query Mixed Query ms min/hr Speed View Batch View Batch Query Lambda Architecture Option 2: Separate serving layers kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 7. Concerns with the Lambda Architecture 9 kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 8. 10 Data Source Real-Time Layer (Data Processing in Motion) Real-Time App (Data Processing in Motion) Storage Batch App (Data Processing at Rest) Storage ms min/hr Storage Kappa Architecture One pipeline for real-time and batch consumers kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 9. Kappa is NOT a free lunch 11 kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 10. Kappa Concerns Solved • Data availability / retention à Compacted Topics, Tiered Storage • Data consistency and fault-tolerance à Exactly-once semantics, Multi-Region Clusters, Cluster Linking • Handling late-arriving data à State management in the streaming application, proper data sinks, replay with guaranteed ordering and timestamps • Data reprocessing and backfill à Dynamic clusters, stateful applications (Kafka Streams, ksqlDB, external stream processing framework like Apache Flink) • Data integration à Kafka Connect for sources and sinks, clients for any language, REST Proxy (real-time but also batch and RPC 12 kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 11. Kappa @ Uber 13 kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 12. Kappa @ Shopify 14 Kappa Building Blocks The Log (Kafka) Durability with Topic Compaction and Tiered Storage Consistency via Exactly-Once Semantics (EOS) Data Integration via Kafka Connect Elasticity via dynamic Kafka clusters Streaming Framework (Kafka Streams / Flink) Reliability and scalability Fault tolerance State management Sinks Update/Upsert for simplified design: RDBMS, NoSQL, Compacted Kafka Topics Append-only: Regular Kafka Topics, Time Series kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 13. Kappa @ Disney 15 kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  • 14. Kappa @ Twitter: Migration from Hadoop and Kafka to a hybrid architecture on both the Twitter data center and Google Cloud Platform. With Kafka and GCP, Twitter is able to process billions of events in real time and achieve low latency, high accuracy, stability, architecture simplicity, and reduced operational cost. Source: https://blog.twitter.com/engineering/en_us/topics/infrastructure/2021/processing-billions-of-events-in-real-time-at-twitter-
  • 15. Benefits of the Kappa Architecture: The Kappa architecture leverages a single source of truth with a focus on simplicity in the enterprise architecture
      • Improve streaming to handle all the cases
      • One codebase that is always in sync
      • One set of infrastructure and technology
      • The heart of the infrastructure is real-time, scalable, and reliable
      • Improved data quality with guaranteed ordering and no mismatches
      • No need to re-architect for new use cases, just connect new consumers (real-time, near real-time, batch, RPC)
  • 16. Store Data Long-Term in Kafka? Kafka Processing App Storage Transactions, auth, quota enforcement, compaction, ...
  • 17. Use Cases for Reprocessing Historical Events: "Give me all events from time A to time B"
      • New consumer application
      • Error-handling
      • Compliance / regulatory processing
      • Query and analyze existing events
      • Schema changes in analytics platform
      • Model training
      • Diagram labels: Real-time Producer; Time; Real-time Consumer; Consumer of Historical Data
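
The request "give me all events from time A to time B" can be sketched in Python as a timestamp-to-offset lookup followed by an ordinary sequential read. `TimeIndexedLog` is a hypothetical in-memory model, and it assumes timestamps are appended in non-decreasing order; Kafka's consumer API offers a comparable timestamp-to-offset lookup, but this sketch does not call it.

```python
from bisect import bisect_left


class TimeIndexedLog:
    """Append-only log with one timestamp per event (hypothetical model)."""

    def __init__(self):
        self._timestamps = []
        self._events = []

    def append(self, timestamp, event):
        # Assumption: timestamps never go backwards within this log.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self._timestamps.append(timestamp)
        self._events.append(event)

    def offset_for_time(self, timestamp):
        """First offset whose timestamp is >= `timestamp`."""
        return bisect_left(self._timestamps, timestamp)

    def replay(self, time_a, time_b):
        """Events with time_a <= timestamp < time_b, in log order."""
        start = self.offset_for_time(time_a)
        end = self.offset_for_time(time_b)
        return self._events[start:end]


log = TimeIndexedLog()
for ts, event in [(100, "a"), (105, "b"), (110, "c"), (120, "d")]:
    log.append(ts, event)

historical = log.replay(105, 120)
```

A consumer of historical data and a real-time consumer differ only in their starting offset, so both can share the same processing code.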
  • 18. Tiered Storage @ Uber
  • 19. Confluent Tiered Storage for Kafka
  • 20. Honeycomb - Observability
      • Kafka is the “beating heart” of Honeycomb, powering the 99.99% ingest availability SLO
      • Ingest telemetry data
      • Buffer big data before processing in the “retriever” columnar storage database
      • True decoupling to innovate more quickly by shipping to each service
      • Guard against the risk of a bug in retriever corrupting customer data
      • Confluent Tiered Storage frees engineering from being storage-bound
      • Has grown 10x in two years while TCO for Kafka has only gone up 20%
      • Replayability from Tiered Storage after an outage for error handling
      • Source: https://www.honeycomb.io/blog/scaling-kafka-observability-pipelines/
  • 21. Kappa Architecture for Streaming Analytics with Kafka and TensorFlow. Diagram components: Car Sensors; MQTT Proxy; Kafka Cluster; Kafka Connect; Kafka Streams Application; ksqlDB; TensorFlow; Tiered Storage; MongoDB; Storage; Dashboards; Search; Analytics; Mobile App; BI Tool. Flow labels: All Data; Critical Data; Ingest Data; Preprocess Data; Consume Data; Train Analytic Model; Deploy Analytic Model. Legend: Kafka Ecosystem; TensorFlow; Other Components.
  • 22. Streaming Ingestion and Model Training with TensorFlow IO: Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!). Diagram labels: Producer; Distributed Commit Log; Time; Model A; Model B; Model X (at a later time). Source: https://github.com/tensorflow/io
  • 23. Model Deployment with Apache Kafka, ksqlDB and TensorFlow, using a User Defined Function (UDF): “CREATE STREAM AnomalyDetection AS SELECT sensor_id, detectAnomaly(sensor_values) FROM car_engine;”
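
To show what the statement on this slide does conceptually, the Python sketch below applies a function to every event of an input stream and emits `(sensor_id, result)` pairs. It is not ksqlDB or TensorFlow code: `detect_anomaly` is a hypothetical threshold rule standing in for the deployed model, and the threshold and sample events are invented for illustration.

```python
from statistics import mean


def detect_anomaly(sensor_values, threshold=100.0):
    """Hypothetical stand-in for the detectAnomaly UDF: flags a high mean reading.

    In the slide, this role is played by a trained analytic model.
    """
    return mean(sensor_values) > threshold


def anomaly_detection(car_engine):
    """Mirror of the SELECT: one output record per input event."""
    for event in car_engine:
        yield event["sensor_id"], detect_anomaly(event["sensor_values"])


# Invented sample events for the car_engine stream.
car_engine = [
    {"sensor_id": "s1", "sensor_values": [90.0, 95.0, 98.0]},
    {"sensor_id": "s2", "sensor_values": [120.0, 130.0, 110.0]},
]

results = list(anomaly_detection(car_engine))
```

Because the function is applied per event, the same scoring logic works on live data and on replayed historical data.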
  • 24. Car Engine Car Self-driving Car Alternatives for Data in Motion
  • 25. The Event Streaming Landscape – Cloud-native? Complete? Everywhere? Apache Kafka Products and Cloud Services, “Compatible” Offerings, and other Streaming Technologies. Diagram categories: Native Kafka; Kafka Protocol (not fully compliant); Non Kafka. Deployment options: Self Managed (Everywhere); Partially Managed; Fully Managed. Annotations: (Cloud only); (Cloud only); (Everywhere); (Kafka mapper not part of cloud offering). Groups: Platforms; Tools.