Kappa vs Lambda Architectures and Technology Comparison

Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture that includes separate batch and real-time layers.

This video explores why a single real-time pipeline, called the Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter illustrate the benefits of Kappa and show how batch processing still fits in without the need for a Lambda architecture.

The main focus of the discussion is on Apache Kafka (and its ecosystem) as the de facto standard for event streaming to process data in motion (the key concept of Kappa), but the video also compares various technologies and vendors such as Confluent, Cloudera, IBM Red Hat, Apache Flink, Apache Pulsar, AWS Kinesis, Amazon MSK, Azure Event Hubs, Google Pub/Sub, and more.
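
To make the single-pipeline idea concrete, here is a minimal sketch in plain Python. It is not Kafka code: `EventLog` and `Consumer` are hypothetical in-memory stand-ins for one append-only topic partition and its consumers, and the event payloads are made up for illustration. The point it tries to show is that a real-time reader and a batch reader can share one log because each tracks its own offset.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class EventLog:
    """Hypothetical in-memory stand-in for one append-only topic partition."""
    events: List[Any] = field(default_factory=list)

    def append(self, event: Any) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the event just written

    def read(self, offset: int, max_events: int) -> List[Any]:
        return self.events[offset:offset + max_events]


@dataclass
class Consumer:
    """Keeps its own read position, so consumers do not affect each other."""
    log: EventLog
    offset: int = 0

    def poll(self, max_events: int) -> List[Any]:
        batch = self.log.read(self.offset, max_events)
        self.offset += len(batch)
        return batch


log = EventLog()
realtime = Consumer(log)  # polls after every single event
batch = Consumer(log)     # polls once, later, in bulk

seen_realtime = []
for amount in [10, 250, 40, 900]:  # made-up payment amounts
    log.append({"amount": amount})
    seen_realtime.extend(realtime.poll(max_events=1))

# The batch consumer reads the very same log in one pass.
seen_batch = batch.poll(max_events=1000)
```

Both consumers end up with the same events in the same order; only the timing of the reads differs. In a Lambda setup the batch path would instead read from a separate store populated by a second pipeline.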

Video recording of this presentation:
https://youtu.be/j7D29eyysDw

Further reading:

https://www.kai-waehner.de/blog/2021/09/23/real-time-kappa-architecture-mainstream-replacing-batch-lambda/

https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/

https://www.kai-waehner.de/blog/2021/05/09/kafka-api-de-facto-standard-event-streaming-like-amazon-s3-object-storage/


Kappa vs Lambda Architectures and Technology Comparison

  1. Kappa vs. Lambda Architecture: Use Cases, Trade-offs, Technologies, Comparison. Kai Waehner, Field CTO, kai.waehner@confluent.io, linkedin.com/in/kaiwaehner, @KaiWaehner, confluent.io, kai-waehner.de
  2. An Event Streaming Platform: The Underpinning of Data in Motion. [Diagram labels: Producers (Microservices, DBs, SaaS apps, Mobile); Consumers (Customer 360, Real-time fraud detection, Data warehouse); Database change, Microservices events, SaaS data, Customer experiences; Streams of real-time events; Connectors; Stream processing apps.] kai-waehner.de | @KaiWaehner | Kappa vs. Lambda Architecture
  3. Example Architecture for Data in Motion: Real-time decision making for claim processing and fraud detection. [Diagram labels: Stream processing (ksqlDB, KStreams); Connectors; Oracle DB; Oracle CDC connector; Salesforce CDC connector; Salesforce source / sink connector; Fraud Detection App; Dashboard.]
  4. Domain-Driven Design for your Integration Layer. [Diagram labels: Event Streaming Platform; Kafka Cluster; Kafka Connect; Schema Registry; CRM Integration; Legacy Integration; Custom Application; ESB Connector; Java / Python / ksqlDB / etc.; CRM Domain; Legacy Domain; Payment Domain.] → Independent and loosely coupled, but scalable, highly available, and reliable!
  5. Lambda Architecture, Option 1: Unified serving layer. [Diagram labels: Data Source; Real-Time Layer (Data Processing in Motion); Batch Layer (Data Processing at Rest); Serving Layer; Real-Time App (Data Processing in Motion); Batch App (Data Processing at Rest); ms; min/hr.]
  6. Lambda Architecture, Option 2: Separate serving layers. [Diagram labels: Data Source; Real-Time Layer (Data Processing in Motion); Batch Layer (Data Processing at Rest); Speed View; Batch View; Real-time Query; Batch Query; Mixed Query; ms; min/hr.]
  7. Concerns with the Lambda Architecture
  8. Kappa Architecture: One pipeline for real-time and batch consumers. [Diagram labels: Data Source; Real-Time Layer (Data Processing in Motion); Real-Time App (Data Processing in Motion); Batch App (Data Processing at Rest); Storage (shown three times); ms; min/hr.]
  9. Kappa is NOT a free lunch
  10. Kappa Concerns Solved • Data availability / retention → Compacted Topics, Tiered Storage • Data consistency and fault-tolerance → Exactly-once semantics, Multi-Region Clusters, Cluster Linking • Handling late-arriving data → State management in the streaming application, proper data sinks, replay with guaranteed ordering and timestamps • Data reprocessing and backfill → Dynamic clusters, stateful applications (Kafka Streams, ksqlDB, external stream processing frameworks like Apache Flink) • Data integration → Kafka Connect for sources and sinks, clients for any language, REST Proxy (real-time but also batch and RPC)
  11. Kappa @ Uber
  12. Kappa @ Shopify: Kappa Building Blocks • The Log (Kafka): Durability with Topic Compaction and Tiered Storage; Consistency via Exactly-Once Semantics (EOS); Data Integration via Kafka Connect; Elasticity via dynamic Kafka clusters • Streaming Framework (Kafka Streams / Flink): Reliability and scalability; Fault tolerance; State management • Sinks: Update/Upsert for simplified design (RDBMS, NoSQL, Compacted Kafka Topics); Append-only (Regular Kafka Topics, Time Series)
  13. Kappa @ Disney
  14. Kappa @ Twitter: Migration from Hadoop and Kafka to a hybrid architecture on both the Twitter data center and Google Cloud Platform. With Kafka and GCP, Twitter is able to process billions of events in real time and achieve low latency, high accuracy, stability, architecture simplicity, and reduced operation cost. Source: https://blog.twitter.com/engineering/en_us/topics/infrastructure/2021/processing-billions-of-events-in-real-time-at-twitter-
  15. Benefits of the Kappa Architecture: The Kappa architecture leverages a single source of truth with a focus on simplicity in the enterprise architecture • Improve streaming to handle all the cases • One codebase that is always in sync • One set of infrastructure and technology • The heart of the infrastructure is real-time, scalable, and reliable • Improved data quality with guaranteed ordering and no mismatches • No need to re-architect for new use cases, just connect new consumers (real-time, near real-time, batch, RPC)
  16. Store Data Long-Term in Kafka? [Diagram labels: Kafka; Processing; App; Storage; Transactions, auth, quota enforcement, compaction, ...]
  17. Use Cases for Reprocessing Historical Events: "Give me all events from time A to time B" • New consumer application • Error handling • Compliance / regulatory processing • Query and analyze existing events • Schema changes in analytics platform • Model training [Diagram labels: Real-time Producer; Time; Real-time Consumer; Consumer of Historical Data.]
  18. Tiered Storage @ Uber
  19. Confluent Tiered Storage for Kafka
  20. Honeycomb: Observability • Kafka is the “beating heart” of Honeycomb, powering the 99.99% ingest availability SLO • Ingest telemetry data • Buffer big data before processing in the “retriever” columnar storage database • True decoupling to innovate more quickly by shipping to each service • Guard against the risk of a bug in retriever corrupting customer data • Confluent Tiered Storage frees engineering from being storage-bound • Has grown 10x in two years while TCO for Kafka has only gone up 20% • Replayability from Tiered Storage after an outage for error handling. Source: https://www.honeycomb.io/blog/scaling-kafka-observability-pipelines/
  21. Kappa Architecture for Streaming Analytics with Kafka and TensorFlow. [Diagram labels, in extraction order: MQTT Proxy MongoDB Storage MongoDB Dashboards Search Analytics Kafka Cluster Kafka Connect Car Sensors Kafka Ecosystem TensorFlow Other Components Kafka Streams Application All Data Critical Data Ingest Data Potential Detect TensorFlow Train Analytic Model ksqlDB Analytic Model Preprocess Data Consume Data Deploy Analytic Model Tiered Storage Mobile App BI Tool]
  22. Streaming Ingestion and Model Training with TensorFlow IO: Direct streaming ingestion for model training with TensorFlow I/O + Kafka Plugin (no additional data storage like S3 or HDFS required!) https://github.com/tensorflow/io [Diagram labels: Producer; Distributed Commit Log; Time; Model A; Model B; Model X (at a later time).]
  23. Model Deployment with Apache Kafka, ksqlDB and TensorFlow, using a User Defined Function (UDF): “CREATE STREAM AnomalyDetection AS SELECT sensor_id, detectAnomaly(sensor_values) FROM car_engine;”
  24. Alternatives for Data in Motion. [Image labels: Car Engine; Car; Self-driving Car.]
  25. The Event Streaming Landscape – Cloud-native? Complete? Everywhere? Apache Kafka Products and Cloud Services, “Compatible” Offerings, and other Streaming Technologies. [Diagram labels, in extraction order: Native Kafka; Kafka Protocol (not fully compliant); Non Kafka; Self Managed (Everywhere); Partially Managed; Fully Managed; (Cloud only); (Cloud only); (Everywhere); (Kafka mapper not part of cloud offering); Platforms; Tools.]
  26. Questions? Feedback? Let’s connect! Kai Waehner, Field CTO, kai.waehner@confluent.io, @KaiWaehner, confluent.io, kai-waehner.de, linkedin.com/in/kaiwaehner
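
The "Give me all events from time A to time B" request on the reprocessing slide can be sketched in a few lines of plain Python. This is an illustration, not Kafka code: the log is a hypothetical in-memory list of `(timestamp_ms, payload)` tuples, the `replay` helper and the event names are invented for the example, and the sketch assumes timestamps are non-decreasing in log order, which a real topic does not strictly guarantee. Kafka clients offer a timestamp-to-offset lookup for the same purpose (for example `offsetsForTimes` in the Java consumer).

```python
from bisect import bisect_left
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp in ms, payload); hypothetical shape


def replay(log: List[Event], start_ms: int, end_ms: int) -> List[Event]:
    """Return events with start_ms <= timestamp < end_ms, in log order.

    Assumes the log is sorted by timestamp (an assumption of this sketch).
    """
    timestamps = [ts for ts, _ in log]
    first = bisect_left(timestamps, start_ms)  # first offset at or after A
    last = bisect_left(timestamps, end_ms)     # first offset at or after B
    return log[first:last]


# Made-up events for illustration only.
log = [
    (1000, "order-created"),
    (2000, "payment-received"),
    (3000, "order-shipped"),
    (4000, "order-delivered"),
]

historical = replay(log, start_ms=2000, end_ms=4000)
```

A consumer of historical data would process `historical` in order, while a real-time consumer keeps reading from the tail of the same log. The half-open interval is a design choice of the sketch so that adjacent time windows do not overlap.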
