Kafka is a distributed streaming platform whose main strength is its ability to serve as a single message hub for applications at massive scale. It relies on a topic-based publish-subscribe model, where each topic may be split into multiple partitions that are published to and consumed from. Kafka diverges from regular message queues in many of its features. One significant difference is that each consumer tracks an offset, the identifier of the last message it consumed from a subscribed topic. This allows messages to be replayed after failures, deployment issues, and similar events. How can fully functional applications interact with Kafka and its features while preserving the characteristics of the functional domain? The Scala ecosystem offers many stream-based frameworks that can be used with Kafka; two of the most popular are Akka Streams and FS2. These frameworks imply different approaches to processing data streams and dealing with Kafka's features. The goal of this talk is to provide insight into these differences in terms of functionality, performance, and their impact on maintaining code that stays functional, robust, and readable.
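The offset-and-replay semantics mentioned above can be sketched with a toy model in plain Scala (an illustration only, not the Kafka API): a topic partition behaves like an append-only log, and each consumer group simply tracks an index into it, so replaying messages is just resetting that index.

```scala
// Toy model of Kafka's consumer-offset semantics (hypothetical names, not the Kafka client API).
// A partition is an append-only log; a consumer's offset is its position in that log.
case class ToyPartition(log: Vector[String]) {
  def append(msg: String): ToyPartition = ToyPartition(log :+ msg)
  // Read everything from `offset` onward; return the batch and the new offset to commit.
  def poll(offset: Int): (Vector[String], Int) = (log.drop(offset), log.length)
}

val partition = ToyPartition(Vector("m0", "m1", "m2"))
val (batch, committed) = partition.poll(0) // first read consumes all three messages
val (replayed, _)      = partition.poll(1) // resetting the offset to 1 replays m1 and m2
```

Because the broker keeps the log and the consumer only moves a pointer, a redeployed consumer can pick up where it left off, or deliberately rewind.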
13. Read Write
Akka Streams
val bootstrapServers = "localhost:9092"
val consumerGroup = "retweets"
val stringSerializer = new StringSerializer
val stringDeserializer = new StringDeserializer
val producerSettings = ProducerSettings(system, stringSerializer, stringSerializer)
  .withBootstrapServers(bootstrapServers)
val consumerSettings = ConsumerSettings(system, stringDeserializer, stringDeserializer)
  .withBootstrapServers(bootstrapServers)
  .withGroupId(consumerGroup)
val regularSubscription = Subscriptions.topics("topic1")
Consumer.plainSource(consumerSettings, regularSubscription)
  .map { message =>
    parseTweet[ConsumerRecord[String, String]](message, _.value())
  }
  .map { tweet =>
    val count = tweet.retweet_count.toString
    new ProducerRecord[String, String]("topic2", count)
  }
  .runWith(Producer.plainSink(producerSettings))
14. Read Write
FS2
val bootstrapServers = "localhost:9092"
val consumerGroup = "retweets"
val consumerSettings = ConsumerSettings[String, String](50 millis)
  .withBootstrapServers(bootstrapServers)
  .withGroupId(consumerGroup)
val producerSettings = ProducerSettings[String, String]()
  .withBootstrapServers(bootstrapServers)
val subscription = Subscriptions.topics("topic1")
// stream definition
val stream = Consumer[Task, String, String](consumerSettings)
  .simpleStream
  .plainMessages(subscription)
  .map(msg => parseTweet[ConsumerRecord[String, String]](msg, _.value))
  .map(_.retweet_count)
  .map(count => new ProducerRecord[String, String]("topic2", "key", count.toString))
  .to(Producer[Task, String, String](producerSettings).sendAsync)
// run at end of universe
stream.run.unsafeRun()