Traditional Messaging
● Java Message Service (JMS)
○ Apache ActiveMQ, IBM WebSphere MQ, HornetQ, Fiorano* MQ
● Advanced Message Queuing Protocol (AMQP)
○ RabbitMQ
● Message Queuing Telemetry Transport (MQTT)
○ HiveMQ
A very famous question
https://www.quora.com/What-are-the-differences-between-Apache-Kafka-and-RabbitMQ
“Performance-wise, both are excellent performers, but have major architectural differences.”
--from the Quora question discussion
What’s the Diff?
Push model (traditional brokers such as RabbitMQ):
● Server pushes/delivers messages to the subscriber
● Server does a lot of work in memory
○ Stores each message and its state (delivered, etc.)
○ Maintains the order of messages
● Hence mostly an ‘online’ processing model
● Server can do complex routing logic
Pull model (Kafka):
● Subscriber pulls/picks up messages from the server
● Not much in-memory work for the server; it just stores messages
○ Just stores messages; does not care whether they are picked up or not
○ Ordering logic is dictated by the client and the storage format
● Hence mostly an ‘offline’ processing model
● Client maintains the routing logic; the server is blind to it
● Also, the subscriber stores state, i.e. which messages it has picked up
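
The contrast between the two models can be sketched with a toy in-memory example. This is only an illustration: `PushBroker` and `PullLog` are hypothetical classes invented here, not the API of RabbitMQ, Kafka, or any client library. The point is where the state lives: the push-style server tracks delivery per message, while the pull-style server only appends and the reader keeps its own position.

```python
class PushBroker:
    """Push model: the server tracks delivery state for every message."""

    def __init__(self):
        self.payloads = {}   # message id -> payload
        self.state = {}      # message id -> "pending" or "delivered"
        self.next_id = 0

    def publish(self, payload):
        msg_id = self.next_id
        self.next_id += 1
        self.payloads[msg_id] = payload
        self.state[msg_id] = "pending"
        return msg_id

    def deliver(self, subscriber):
        # The server decides what to send, in order, and remembers what was sent.
        for msg_id in sorted(self.state):
            if self.state[msg_id] == "pending":
                subscriber(self.payloads[msg_id])
                self.state[msg_id] = "delivered"


class PullLog:
    """Pull model: the server only appends; readers track their own position."""

    def __init__(self):
        self.log = []

    def append(self, payload):
        self.log.append(payload)
        return len(self.log) - 1   # offset of the new message

    def read(self, offset, max_messages=10):
        return self.log[offset:offset + max_messages]


# Push: the broker drives delivery and keeps per-message state.
received = []
push = PushBroker()
push.publish("a")
push.publish("b")
push.deliver(received.append)

# Pull: the subscriber asks for messages and keeps its own offset.
pull = PullLog()
pull.append("a")
pull.append("b")
my_offset = 0
batch = pull.read(my_offset)
my_offset += len(batch)
```

Note that `PullLog` never changes when a message is read, which is why the same log can be read again later or by another reader.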
So Apache Kafka...
Apache Kafka
Notions:
● Publisher
● Message
● Topic
○ Topic Partition
● Broker
● Subscriber/Consumer
● Message Offset
Summary
● Publisher chooses a topic to publish to.
○ It also decides the routing logic, i.e. it chooses which partition to publish to (using a partitioning key).
● Broker receives the message and appends it to the end of the topic partition.
● Subscriber asks the broker for messages starting at a specific offset in a topic partition.
● It is up to the subscriber to remember which message offsets it has processed.
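
The flow in this summary can be sketched as a small in-memory model. Everything here is an assumption made for illustration: the `Topic` class, the `choose_partition` helper, and the topic name `orders` are hypothetical, and `zlib.crc32` stands in for whatever hash a real Kafka client uses. It is not the Kafka client API.

```python
import zlib


class Topic:
    """A topic is a set of partitions; each partition is an append-only list."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        # Broker side: append to the end of the partition, return the new offset.
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def fetch(self, partition, offset, max_messages=100):
        # Broker side: return messages starting at the requested offset.
        return self.partitions[partition][offset:offset + max_messages]


def choose_partition(key, num_partitions):
    # Publisher side: the same key always maps to the same partition.
    # crc32 is used because it is stable across runs (built-in hash() is not).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


topic = Topic("orders", num_partitions=3)
events = [("user-1", "created"), ("user-2", "created"), ("user-1", "paid")]
for key, value in events:
    partition = choose_partition(key, len(topic.partitions))
    topic.append(partition, (key, value))

# Subscriber side: read one partition from offset 0.
p1 = choose_partition("user-1", len(topic.partitions))
user1_events = [value for key, value in topic.fetch(p1, 0) if key == "user-1"]
```

Because both `user-1` messages share a key, they land in the same partition and are read back in the order they were published.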
A lovely use case - REPLAY
● Since the subscriber requests messages by offset in a topic partition, it is free to REPLAY the processing at any point in time (for as long as the messages are retained).
● Handy when outages occur.
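
Replay can be illustrated with a minimal sketch, assuming a single partition modelled as a Python list whose index is the offset. The `consume` function and the message names are hypothetical; a real consumer would seek to an offset through its client library instead.

```python
# One topic partition; the list index plays the role of the offset.
log = ["m0", "m1", "m2", "m3", "m4"]


def consume(log, start_offset, handler):
    """Process messages from start_offset onward; return the next offset to read."""
    offset = start_offset
    for message in log[start_offset:]:
        handler(message)
        offset += 1
    return offset


# Normal run: process everything from the beginning.
processed = []
next_offset = consume(log, 0, processed.append)

# Suppose the results from offset 2 onward were lost in an outage.
# The subscriber simply asks for offset 2 again and reprocesses.
replayed = []
next_offset = consume(log, 2, replayed.append)
```

Nothing on the log side changes during the replay; the subscriber only chooses a different starting offset.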
Hence
Things to Ponder about
● How do I achieve high read/write throughput?
○ Have more partitions per topic; the partition count determines how far reads and writes can be parallelised.
● Can multiple publishers publish concurrently to the same topic partition?
○ Yes
● Should multiple consumers read from the same topic partition?
○ Ideally one consumer per partition. Within a consumer group, each partition is read by only one consumer; separate consumer groups each read the partition independently.
● What about replication of data?
○ When creating a topic, you can set a replication factor, which applies to each topic partition.
● What about the data retention policy?
○ Set it when creating the topic. You can edit it later.
● Think about the producer partitioning key …
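
Why the partition count caps read parallelism can be shown with a toy assignment function. This is a simplified sketch: `assign_round_robin` and the consumer names are hypothetical, and the assignment strategies in real Kafka clients are more involved. It only illustrates the rule that, within one consumer group, each partition has a single owner.

```python
def assign_round_robin(num_partitions, consumers):
    """Give each partition to exactly one consumer of the group, in turn."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 4 partitions, 2 consumers: each consumer reads 2 partitions.
two = assign_round_robin(4, ["c1", "c2"])

# 4 partitions, 6 consumers: two consumers get nothing to read.
six = assign_round_robin(4, ["c1", "c2", "c3", "c4", "c5", "c6"])
idle = [consumer for consumer, parts in six.items() if not parts]
```

Adding consumers beyond the number of partitions leaves the extra ones idle, so more partitions are needed before more consumers help.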
In dataspark
Others ...
● Amazon Kinesis is similar to Kafka.
● There is also Redis Pub/Sub (different guarantees, not similar to Kafka).
What I did not cover :)
● Kafka replication mechanism
○ ISR = in-sync replica set
● Tools like Kafka MirrorMaker
● ZooKeeper interaction (yes, Kafka depends on ZooKeeper)
What’s new in Kafka?
● Kafka Streams API
● Kafka SQL
● See the release notes … :)
producer.send("Any questions? Thanks")

Introduction to Apache Kafka
