Traditional Messaging
● Java Message Service (JMS)
○ Apache ActiveMQ, IBM WebSphere MQ, HornetQ, FioranoMQ*
● Advanced Message Queuing Protocol (AMQP)
○ RabbitMQ
● Message Queuing Telemetry Transport (MQTT)
○ HiveMQ
A very famous question
https://www.quora.com/What-are-the-differences-between-Apache-Kafka-and-RabbitMQ
“Performance-wise, both are excellent performers, but have major architectural differences.”
-- from the Quora discussion of the question
What’s the Diff?
Traditional messaging (push):
● Server pushes/delivers messages to the subscriber
● Server does a lot of work in memory
○ Stores each message and its state (delivered, etc.)
○ Maintains message order
● Hence mostly an ‘online’ processing model
● Server can apply complex routing logic
Apache Kafka (pull):
● Subscriber pulls/picks up messages from the server
● Not much in-memory work for the server
○ Just stores messages; doesn't care whether they were picked up or not
○ Ordering logic is dictated by the client and the storage format
● Hence mostly an ‘offline’ processing model
● Client maintains routing logic; the server is blind to it
● Also, the subscriber stores state, i.e. which messages it has picked up
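The push/pull contrast above can be sketched with two toy classes. This is only an illustration under simplifying assumptions: `PushBroker` and `PullLog` are hypothetical names invented for this sketch, everything lives in memory in a single process, and neither class reflects how any real broker is implemented.

```python
from collections import deque


class PushBroker:
    """Toy push-model broker: it tracks delivery state for every message."""

    def __init__(self):
        self.pending = deque()   # messages not yet delivered, in publish order
        self.unacked = {}        # message id -> message awaiting acknowledgement
        self.next_id = 0

    def publish(self, msg):
        self.pending.append((self.next_id, msg))
        self.next_id += 1

    def deliver(self, subscriber):
        # The broker decides when to push, and remembers what is in flight.
        while self.pending:
            msg_id, msg = self.pending.popleft()
            self.unacked[msg_id] = msg
            subscriber(msg_id, msg)

    def ack(self, msg_id):
        # Once acknowledged, the broker forgets the message.
        del self.unacked[msg_id]


class PullLog:
    """Toy pull-model log: the server only appends and serves reads."""

    def __init__(self):
        self.log = []

    def append(self, msg):
        self.log.append(msg)
        return len(self.log) - 1   # offset of the newly appended message

    def read(self, offset, max_messages=10):
        # No per-subscriber state: the caller says where to read from.
        return self.log[offset:offset + max_messages]
```

In the push sketch the server carries the bookkeeping (`pending`, `unacked`); in the pull sketch the only state is the log itself, so reading the same offset twice simply returns the same messages.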
So Apache Kafka...
Apache Kafka
Notions:
● Publisher
● Message
● Topic
○ Topic Partition
● Broker
● Subscriber/Consumer
● Message Offset
Summary
● The publisher chooses a topic to publish to.
○ It also decides the routing logic, i.e. it chooses which partition to publish to (using a partitioning key).
● The broker receives the message and appends it to the end of the topic partition.
● The subscriber asks the broker for the message at a specific offset in a topic partition.
● It is up to the subscriber to remember which message offsets it has processed.
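The flow in this summary can be sketched end to end in a few lines. This is a minimal in-memory model, not Kafka's client API: `Broker`, `Subscriber` and `choose_partition` are hypothetical names, and CRC32 is used only as a convenient stable hash standing in for whatever partitioner a real client uses.

```python
import zlib


class Broker:
    """Holds one append-only list per topic partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1          # offset assigned to the message

    def fetch(self, partition, offset):
        log = self.partitions[partition]
        return log[offset] if offset < len(log) else None


def choose_partition(key, num_partitions):
    # Publisher-side routing: stable hash of the key, modulo partition count.
    # CRC32 is an assumption of this sketch, not the hash Kafka clients use.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class Subscriber:
    """Remembers its own position; the broker keeps no per-subscriber state."""

    def __init__(self, broker, partition):
        self.broker = broker
        self.partition = partition
        self.offset = 0

    def poll(self):
        msg = self.broker.fetch(self.partition, self.offset)
        if msg is not None:
            self.offset += 1         # advance only after a successful read
        return msg
```

Because the same key always hashes to the same partition, all messages for that key land in one log and are read back in the order they were appended.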
A lovely use case - REPLAY
● Since the subscriber requests a message at a given offset in a topic partition, it is free to REPLAY processing from an earlier offset at any point in time (as long as the messages are still within the retention period).
● Handy when outages occur.
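Replay falls out of the offset model almost for free, as the following sketch suggests. It assumes a plain Python list standing in for one topic partition; `ReplayableSubscriber` and its `seek` method are hypothetical names chosen for this illustration.

```python
class ReplayableSubscriber:
    """Reads from a list that stands in for one topic partition."""

    def __init__(self, log):
        self.log = log
        self.offset = 0              # next offset to read

    def poll(self):
        # Return everything from the current offset to the end of the log.
        batch = self.log[self.offset:]
        self.offset = len(self.log)
        return batch

    def seek(self, offset):
        # Rewinding is just a matter of moving the subscriber's own pointer.
        if not 0 <= offset <= len(self.log):
            raise ValueError("offset out of range")
        self.offset = offset


log = ["m0", "m1", "m2", "m3"]
sub = ReplayableSubscriber(log)
first_pass = sub.poll()              # processes m0..m3
sub.seek(2)                          # e.g. results after m1 were lost in an outage
replayed = sub.poll()                # processes m2 and m3 again
```

Nothing on the server side changes during the rewind; only the subscriber's stored offset moves.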
Hence
Things to Ponder about
● How do I achieve high read/write throughput?
○ Use more partitions per topic; the partition count determines how much reads and writes can be parallelised.
● Can multiple publishers publish concurrently to the same topic partition?
○ Yes
● Should multiple consumers read from the same topic partition?
○ Ideally one consumer per partition within a consumer group; different consumer groups can each read the same partition independently.
● What about replication of data?
○ When creating a topic, you can set a replication factor, which applies to each topic partition.
● What about the data retention time policy?
○ Set it when creating the topic. You can change it later.
● Think about the producer partitioning key ...
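The link between partition count and read parallelism can be illustrated with a toy assignment function. This is a simplified round-robin sketch under the assumption that each partition has exactly one owner within a group; `assign_partitions` is a hypothetical helper, not the assignment strategy a real Kafka client runs.

```python
def assign_partitions(num_partitions, consumers):
    """Round-robin assignment of partitions to the consumers of one group."""
    if not consumers:
        raise ValueError("need at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by two consumers: each consumer reads two partitions.
balanced = assign_partitions(4, ["c1", "c2"])

# Two partitions but three consumers: the third consumer has nothing to read.
oversubscribed = assign_partitions(2, ["c1", "c2", "c3"])
```

In the second case adding consumers beyond the partition count buys no extra throughput, which is why the number of partitions is worth deciding up front.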
In dataspark
Others ...
● Amazon Kinesis is similar to Kafka.
● There is also Redis Pub/Sub (different guarantees, not similar to Kafka).
What I did not cover :)
● Kafka replication mechanism
○ ISR = in-sync replica set
● Tools like Kafka MirrorMaker
● ZooKeeper interaction (yes, Kafka depends on ZooKeeper)
What’s new in Kafka?
● Kafka Streams API
● Kafka SQL
● See the release notes ... :)
producer.send(“ Any Questions ? Thanks ”)

Introduction to Apache Kafka
