Kafka Vs Kinesis
Agenda
1. Kafka architecture high level overview
2. Comparison with Kinesis in terms of throughput and cost
3. Headaches with Kinesis and Kafka
4. Use case for the data team
5. Reasons for switching
6. Success stories
7. References
Kafka Architecture
Very similar to Kinesis! That shouldn’t come as a surprise, as Kinesis was inspired by Kafka.
Architecture (Contd.)
Kinesis                                     Kafka
Stream                                      Topic
Shard                                       Partition
DynamoDB tables (KCL leases/checkpoints)    ZooKeeper
Working
▶ Kafka brokers store all messages in the partitions configured for each topic. Producers spread messages across those partitions (by hashing the record key, or distributing keyless records across partitions).
▶ Once a consumer subscribes to a topic, Kafka provides it with the current offset for each partition. Older Kafka versions kept this offset in the ZooKeeper ensemble; newer versions store it in the internal __consumer_offsets topic.
▶ The consumer polls the Kafka brokers for new messages at a regular (configurable) interval.
▶ Once the messages are processed, the consumer sends an acknowledgement (an offset commit) to the Kafka broker.
▶ Once Kafka receives the acknowledgement, it updates the stored offset to the new value, so the consumer resumes from the correct message after a restart.
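The poll, process, acknowledge cycle above can be sketched as a toy model in Python. This is not the Kafka client API: `Partition`, `ToyConsumer` and the `offset_store` dict are hypothetical stand-ins for a partition log and for wherever committed offsets are kept (ZooKeeper or Kafka itself).

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy, in-memory stand-in for one Kafka partition: an append-only log."""
    log: list = field(default_factory=list)

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the appended message

    def read(self, offset, max_records):
        return self.log[offset:offset + max_records]


@dataclass
class ToyConsumer:
    """Hypothetical consumer: poll, process, then commit the new offset.

    `offset_store` plays the role of ZooKeeper / __consumer_offsets, so a
    restarted consumer in the same group resumes where the last one stopped.
    """
    partition: Partition
    offset_store: dict
    group: str
    max_records: int = 2

    def poll_and_commit(self, process):
        start = self.offset_store.get(self.group, 0)   # last committed offset
        batch = self.partition.read(start, self.max_records)
        for message in batch:
            process(message)                            # process first...
        self.offset_store[self.group] = start + len(batch)  # ...then commit
        return batch


p = Partition()
for m in ["a", "b", "c"]:
    p.append(m)

store, seen = {}, []
ToyConsumer(p, store, "group-a").poll_and_commit(seen.append)  # reads a, b
# "Restart": a new consumer object in the same group picks up at offset 2.
ToyConsumer(p, store, "group-a").poll_and_commit(seen.append)  # reads c
```

Committing only after processing means a crash between the two steps replays the batch, i.e. at-least-once delivery.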
How do you scale?
▶ Consumer side scaling -
▶ Each application instance is part of a consumer group and reads from one or more partitions of the topic it subscribes to. (Consumer group A)
▶ When additional application instances join the consumer group, Kafka reassigns (rebalances) the partitions so that each new instance reads from at least one partition, as long as the group has no more instances than the topic has partitions; any extra instances sit idle. (Consumer group B)
▶ Producer side scaling -
▶ To absorb producer spikes, producers can write to multiple partitions spread across multiple brokers. Throughput is then bounded by each broker's network card I/O capacity and the disks attached to it.
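Both sides of the scaling story can be illustrated with two small functions. This is a simplified sketch, not Kafka's own code: `assign_round_robin` is a hypothetical round-robin assignment of partitions within one consumer group, and `partition_for_key` uses `zlib.crc32` as a stand-in for the murmur2 hash used by Kafka's default partitioner.

```python
import zlib


def assign_round_robin(partitions, consumers):
    """Spread a topic's partitions over the members of one consumer group.

    Simplified illustration of a rebalance: each partition goes to exactly
    one consumer, so consumers beyond the partition count get nothing.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition with a stable hash.

    crc32 is a stand-in; Kafka's default partitioner uses murmur2.
    Records with the same key always land in the same partition.
    """
    return zlib.crc32(key) % num_partitions


partitions = [0, 1, 2, 3]
group_a = assign_round_robin(partitions, ["app-1", "app-2"])
group_b = assign_round_robin(partitions, ["app-1", "app-2", "app-3", "app-4", "app-5"])
```

With four partitions, the fifth instance in `group_b` receives no partitions, which is why partition count caps consumer parallelism.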
Throughput
▶ Kinesis (per-shard limits)
▶ Write - up to 1,000 records per second, up to a maximum total data write rate of 1 MB per second (including partition keys)
▶ Read - up to 5 transactions per second, up to a maximum total data read rate of 2 MB per second
▶ Retention - 1 day by default (extendable at extra cost)
▶ Kafka
▶ Write - Dependent on the network card
▶ Read - Dependent on the network card
▶ Retention - 7 days by default (configurable via log.retention.hours)
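The per-shard limits translate directly into a minimum shard count. A minimal sketch, assuming load is spread evenly across shards and every consumer application reads the whole stream with ordinary shared reads; `shards_needed` is a hypothetical helper and the limits are the ones listed above.

```python
import math

# Per-shard Kinesis limits as listed on the slide.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_s, write_records_s, read_mb_s_per_app, apps):
    """Smallest shard count that keeps every per-shard limit satisfied."""
    by_write_bytes = math.ceil(write_mb_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD)
    # All consumer apps share each shard's 2 MB/s read bandwidth.
    by_read_bytes = math.ceil(read_mb_s_per_app * apps / READ_MB_PER_SHARD)
    return max(by_write_bytes, by_write_records, by_read_bytes)
```

For example, 75 MB/sec arriving as 786,980 small records/sec gives `shards_needed(75, 786_980, 75, 1) == 787`: the record limit, not the byte limit, dominates. Batching the same bytes into about 3,000 larger records/sec brings it back to 75 shards for one reader.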
Throughput and cost comparison (Kafka)
▶ Test setup -
▶ Cluster - three machines, each with an Intel Xeon 2.5 GHz processor with six cores, six 7200 RPM SATA drives, 32 GB of RAM and 1 Gbps Ethernet
▶ Equivalent EC2 instance - t2.2xlarge, priced at $0.376 per hour
▶ Test - single producer thread, 3x asynchronous replication
▶ Record size - 100 bytes
▶ Throughput recorded on the 3-machine cluster - 786,980 records/sec (75.1 MB/sec) written to and persisted in the Kafka cluster
▶ Total cost of the cluster per hour - 3 * $0.376 = $1.128 (excluding the ZooKeeper nodes)
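The Kafka numbers can be checked with plain arithmetic. Note that the 75.1 MB/sec figure only works out if MB means 2^20 bytes (in decimal megabytes it is about 78.7 MB/sec). The constants are the slide's figures; `kafka_summary` is a hypothetical helper.

```python
RECORDS_PER_SEC = 786_980       # benchmark result quoted on the slide
RECORD_BYTES = 100              # record size used in the benchmark
BROKERS = 3
PRICE_PER_BROKER_HOUR = 0.376   # t2.2xlarge price used on the slide (USD)


def kafka_summary():
    """Return (throughput in MiB/s, cluster cost in USD per hour)."""
    mib_per_sec = RECORDS_PER_SEC * RECORD_BYTES / 2**20
    cost_per_hour = BROKERS * PRICE_PER_BROKER_HOUR  # ZooKeeper nodes excluded
    return mib_per_sec, cost_per_hour


throughput, cost = kafka_summary()
print(f"{throughput:.1f} MiB/s, ${cost:.3f}/hour")  # prints "75.1 MiB/s, $1.128/hour"
```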
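Throughput and cost comparison (Kinesis)
▶ Kinesis shard write capacity - 1 MB/sec.
▶ Total number of shards required for a comparable test - 75.
▶ Cost per shard - $0.015 / hour.
▶ Cost of PUT Payload Units - $0.014 per 1,000,000 units.
▶ Total number of PUT Payload Units per hour - (75 MB/sec * 3600 sec) / 25 KB ≈ 11 million - (1)
▶ Total number of million-unit blocks per hour - (1) / 1M ≈ 11
▶ Total cost - 75 * $0.015 + 11 * $0.014 ≈ $1.28
▶ This assumes the 100-byte records are aggregated into ~25 KB Kinesis records (e.g., with KPL aggregation). Sent one per PUT, 786,980 records/sec would exceed the 1,000 records/sec write limit of 75 shards (about 787 shards would be needed), and every record would be billed as a full PUT Payload Unit.
So, the total cost is around the same - $1.13/hour for Kafka (without ZooKeeper) vs $1.28/hour for Kinesis with aggregated records.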
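The Kinesis calculation can be written as a small function, which also shows how strongly the result depends on record size: PUT payload units are billed per record in 25 KB chunks, rounded up. A sketch using the slide's prices; treating 25 KB as 25 × 1024 bytes, and the two scenarios compared (records aggregated to 25 KB vs. one 100-byte record per PUT on 787 shards), are assumptions for illustration.

```python
import math

SHARD_HOUR = 0.015            # USD per shard-hour (slide figure)
PUT_UNIT_PRICE = 0.014        # USD per 1,000,000 PUT payload units (slide figure)
PUT_UNIT_BYTES = 25 * 1024    # assumed size of one PUT payload unit


def kinesis_cost_per_hour(shards, records_per_sec, record_bytes):
    """Hourly cost = shard-hours + PUT payload units.

    Each record is billed in 25 KB chunks, rounded up, so even a
    100-byte record costs one full unit.
    """
    units_per_record = math.ceil(record_bytes / PUT_UNIT_BYTES)
    units_per_hour = records_per_sec * units_per_record * 3600
    return shards * SHARD_HOUR + units_per_hour / 1_000_000 * PUT_UNIT_PRICE


# 75 MiB/s packed into 25 KB records: 3,072 records/sec on 75 shards.
aggregated = kinesis_cost_per_hour(75, 75 * 2**20 // PUT_UNIT_BYTES, PUT_UNIT_BYTES)
# 786,980 individual 100-byte records/sec: 787 shards for the record limit.
unaggregated = kinesis_cost_per_hour(787, 786_980, 100)
print(f"aggregated: ${aggregated:.2f}/hour, unaggregated: ${unaggregated:.2f}/hour")
```

The aggregated case reproduces the slide's roughly $1.28/hour; the unaggregated case comes to about $51/hour.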
More detailed comparison (two slides whose content is not reproduced in this text)
Headaches with Kinesis
Limits on Kinesis suck -
1. Kinesis has a limit of 5 reads per second per shard. So if we built 5 components that each needed to read and process the same data from a shard, we would already have maxed out Kinesis. This seemed like an unnecessary limitation on scaling out consumers. There are workarounds, such as increasing the number of shards, but then you end up paying more too. The Kinesis front end has a load balancer but the back end does not, hence the hard limit.
2. The DescribeStream API is limited to 10 calls per account per second. The KCL makes a lot of these calls, so shard monitoring and scaling up and down are prone to failure.
3. Other bugs, such as “vanishing history” after shard splitting, and more worker leases than the total number of available workers.
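The 5 reads/sec limit is shared by every application reading a shard, as is the 2 MB/sec read bandwidth. A minimal sketch of how each application's budget shrinks as consumers are added, assuming the limits are split evenly and every application polls with ordinary GetRecords calls; `per_app_read_budget` is a hypothetical helper.

```python
READ_TPS_PER_SHARD = 5      # GetRecords calls per second per shard (slide figure)
READ_MB_PER_SHARD = 2.0     # MB/s per shard, shared by all readers


def per_app_read_budget(apps: int):
    """Split one shard's read limits evenly across `apps` consumer apps.

    Returns (polls per second per app, MB/s per app,
    minimum seconds between polls for each app).
    """
    polls = READ_TPS_PER_SHARD / apps
    mb = READ_MB_PER_SHARD / apps
    return polls, mb, 1 / polls


for apps in (1, 5, 10):
    polls, mb, gap = per_app_read_budget(apps)
    print(f"{apps} apps: {polls} polls/s, {mb} MB/s, >= {gap} s between polls each")
```

At 5 applications each one is already down to a single poll per second; every application beyond that adds latency for all of them unless the stream is split into more shards.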
Headaches with Kafka
▶ Main concern → everything needs to be self-managed.
▶ These concerns should be alleviated once Kafka is launched as a managed service.
Use case for the data team
Kafka
Why switch from Kinesis to Kafka
▶ Capable of handling massive volumes of messages.
▶ Easier to scale out; can scale vertically as well.
▶ A new AWS instance can be launched and a Kafka broker started on it within 1-2 minutes in us-west-1, using EBS to minimize data transfer (as per Confluent).
▶ Lower end-to-end latency than Kinesis, because Kinesis writes its data synchronously to 3 locations before it confirms a put request, whereas Kafka supports asynchronous replication.
▶ More mature than Kinesis, with fewer bugs.
▶ More flexible than Kinesis, with no fixed per-shard rate limits.
▶ Large open-source community and support.
▶ Plenty of success stories where Kafka is used as the log and materialized views are built on top of it using Spark, Samza, Storm, Flink, etc.
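The asynchronous replication mentioned above is chosen per producer through the `acks` setting. A minimal sketch of two producer configurations using confluent-kafka (librdkafka) property names; the broker address `broker1:9092` is a placeholder and `producer_config` is a hypothetical helper.

```python
def producer_config(bootstrap_servers: str, wait_for_all_replicas: bool) -> dict:
    """Return a confluent-kafka / librdkafka producer configuration.

    acks="1": only the partition leader must write the record before the
    producer is acknowledged; followers replicate asynchronously, which
    keeps produce latency low.
    acks="all": the leader waits for all in-sync replicas, trading latency
    for durability (closer to Kinesis' synchronous write to 3 locations).
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        "acks": "all" if wait_for_all_replicas else "1",
    }


low_latency = producer_config("broker1:9092", wait_for_all_replicas=False)
durable = producer_config("broker1:9092", wait_for_all_replicas=True)
```

With the `confluent_kafka` package installed, either dict can be passed to `confluent_kafka.Producer(...)`, so latency and durability can be tuned per workload rather than fixed by the service.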
Companies using Kafka
How Netflix uses Kafka on AWS
Questions/Comments/Suggestions?
References
▶ Architecture - https://kafka.apache.org/documentation/
▶ Throughput study - https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
▶ Kinesis limits - http://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html
▶ Detailed comparison - http://go.datapipe.com/whitepaper-kafka-vs-kinesis-download
▶ How Netflix uses Kafka - https://medium.com/netflix-techblog/kafka-inside-keystone-pipeline-dd5aeabaf6bb


Editor's Notes

  1. Source - https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
  2. Source - http://go.datapipe.com/whitepaper-kafka-vs-kinesis-download
  3. Source - http://go.datapipe.com/whitepaper-kafka-vs-kinesis-download
  4. Source - http://go.datapipe.com/whitepaper-kafka-vs-kinesis-download