Functional streams with Kafka
A comparison between Akka-streams and FS2
12th May 2017
Luis Reis & Rui Batista
Apache Kafka
Which features?
Read and Write
Offset management
Parallelism
Java API
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "consumer-tutorial")
props.put("key.deserializer", classOf[StringDeserializer].getName)
props.put("value.deserializer", classOf[StringDeserializer].getName)
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(List("foo", "bar").asJava)

// One thread polls in a loop until another thread calls wakeup()
try {
  while (true) {
    val records = consumer.poll(Long.MaxValue)
    records.iterator.asScala foreach { record => println(record.value) }
  }
} catch {
  case e: WakeupException => // expected on shutdown, ignore
} finally {
  consumer.close()
}

// From a separate thread, to interrupt the blocking poll:
consumer.wakeup()
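Note everything the raw client makes you own: the polling loop, the consumer's single-thread constraint, wakeup-based shutdown, offset commits and any parallelism. The streaming libraries in the rest of the talk absorb exactly these concerns.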
Functional Streams
Akka Streams
FS2
Monix
Kafka Streams
Functional Streams
Akka Streams + FS2
Akka Streams: Reactive-Kafka
FS2: FS2-Kafka (by Rui)
Functional Streams
Akka Streams
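For readers new to Akka Streams, a minimal pipeline using the standard akka-stream API, mirroring the FS2 primer on the next slide (the system and materializer setup is the usual boilerplate; names are illustrative):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

implicit val system = ActorSystem("demo")
implicit val materializer = ActorMaterializer()

// A blueprint: Source ~> Flow ~> Sink, run by runWith
Source(1 to 3)
  .via(Flow[Int].map(_.toString))
  .runWith(Sink.foreach(println))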
Functional Streams
FS2
Stream[F, O]
F[_] can be fs2.Task, scalaz.concurrent.Task or cats.effect.IO
// A Pipe is a function from Stream[F, I] to Stream[F, O]
Pipe[F, I, O] =:= (Stream[F, I] => Stream[F, O])
Sink[F, O] =:= Pipe[F, O, Unit]
Stream[Task, Int](1).map(_.toString).runLast.unsafeRun()
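To make these types concrete, a minimal sketch in the same fs2 0.9-style API as the snippet above (double and printSink are illustrative names):

import fs2.{Pipe, Sink, Stream, Task}

// A Pipe that transforms every element
val double: Pipe[Task, Int, Int] = _.map(_ * 2)

// A Sink that performs an effect per element
val printSink: Sink[Task, Int] = _.evalMap(i => Task.delay(println(i)))

Stream[Task, Int](1, 2, 3)
  .through(double) // apply the Pipe
  .to(printSink)   // terminate in the Sink
  .run
  .unsafeRun()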
Read and Write
Akka Streams
val bootstrapServers = "localhost:9092"
val consumerGroup = "retweets"
val stringSerializer = new StringSerializer
val stringDeserializer = new StringDeserializer
val producerSettings = ProducerSettings(system, stringSerializer, stringSerializer)
.withBootstrapServers(bootstrapServers)
val consumerSettings = ConsumerSettings(system, stringDeserializer, stringDeserializer)
.withBootstrapServers(bootstrapServers)
.withGroupId(consumerGroup)
val regularSubscription = Subscriptions.topics("topic1")
Consumer.plainSource(consumerSettings, regularSubscription)
  .map { message =>
    parseTweet[Message](message, _.value())
  }.map { tweet =>
    val count = tweet.retweet_count.toString
    new ProducerRecord[String, String]("topic2", count)
  }.runWith(Producer.plainSink(producerSettings))
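plainSource does no offset bookkeeping: if the application restarts, where to resume from is someone else's problem. That is exactly what the Offset Management slides address next.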
Read and Write
FS2
val bootstrapServers = "localhost:9092"
val consumerGroup = "retweets"
val consumerSettings = ConsumerSettings[String, String](50 millis)
.withBootstrapServers(bootstrapServers)
.withGroupId(consumerGroup)
val producerSettings = ProducerSettings[String, String]()
.withBootstrapServers(bootstrapServers)
val subscription = Subscriptions.topics("topic1")
// stream definition
val stream = Consumer[Task, String, String](consumerSettings)
  .simpleStream
  .plainMessages(subscription)
  .map(msg => parseTweet[ConsumerRecord[String, String]](msg, _.value))
  .map(_.retweet_count)
  .map(count => new ProducerRecord[String, String]("topic2", "key", count.toString))
  .to(Producer[Task, String, String](producerSettings).sendAsync)

// run at the end of the universe
stream.run.unsafeRun()
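Note that defining stream has no effects: the whole pipeline is an ordinary immutable value, and no connection is opened nor record consumed until the single stream.run.unsafeRun() call, hence the comment about the end of the universe.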
Offset Management
Akka Streams
// Automatic: commit offsets back to Kafka
Consumer.committableSource(consumerSettings, regularSubscription)
  .map { message =>
    (message, parseTweet[Message](message, _.record.value()))
  }.map { case (message, tweet) =>
    Util.messageWithOffset(message.committableOffset, tweet, "topic2")
  }.runWith(Producer.commitableSink(producerSettings))
// External offset storage
Source.fromFuture(offsetDB.loadOffset())
  .flatMapConcat { offset =>
    val subscription = Subscriptions.assignmentWithOffsets(
      Map(new TopicPartition("topic1", 0) -> offset)
    )
    Consumer.committableSource(consumerSettings, subscription)
      .mapAsync(producerSettings.parallelism) { record =>
        // do stuff with the record, then persist the next offset to read
        val offset = record.committableOffset.partitionOffset.offset
        offsetDB.save(offset + 1)
      }
  }.runWith(Sink.ignore)
Offset Management
FS2
// Automatic: commit offsets back to Kafka
Consumer[Task, String, String](consumerSettings)
  .simpleStream
  .commitableMessages(subscription)
  .map(_.map(r => parseTweet[Message](r, _.value)))
  // do stuff with the tweet, then pair it back with its offset
  .map { message =>
    // message.record now holds the parsed tweet
    Util.messageWithOffset(message.commitableOffset, message.record, "topic2")
  }
  .to(Producer[Task, String, String](producerSettings).sendCommitable)
// External offset storage
Stream.eval(offsetDB.loadOffset)
  .flatMap { offset =>
    val assignment = Subscriptions.assignmentWithOffsets(
      Map(new TopicPartition("topic1", 0) -> offset)
    )
    Consumer[Task, String, String](consumerSettings)
      .simpleStream
      .commitableMessages(assignment)
      // do stuff with the message
      .evalMap(msg => offsetDB.save(msg.commitableOffset.partitionOffset.offset + 1))
  }
Parallelism
Akka Streams
// One source per partition
Consumer.committablePartitionedSource(consumerSettings, regularSubscription)
  .map {
    case (topicPartition, source) =>
      source
        .via(logicFlow)
        .map { flowResponse => createKafkaMessage(producerTopic, flowResponse) }
        .runWith(Producer.committablePartitionedSink(producerSettings))
  }
  .mapAsyncUnordered(producerSettings.parallelism)(identity)
  .runWith(Sink.ignore)
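Each per-partition source is materialized into its own running stream inside the map; mapAsyncUnordered(identity) then simply awaits those inner materializations, so partitions are processed in parallel while the outer stream follows assignments and rebalances.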
Parallelism
FS2
def parallelCount(keyFunc: KeyFunc[Task, Message],
                  signal: Signal[Task, Map[String, Int]]) = {
  val partitioned = Consumer[Task, String, String](consumerSettings)
    .partitionedStreams
    .commitableMessages(Subscriptions.topics("topic1"))
    .map {
      case (_, innerStream) =>
        innerStream.evalMap(keyFunc)
    }
  // join streams and aggregate key counts
  fs2.concurrent.join(100)(partitioned)
    .scan(Map.empty[String, Int]) { case (current, key) =>
      current |+| Map(key -> 1)
    }
    .evalMap(signal.set _)
}
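fs2.concurrent.join(100)(partitioned) runs up to 100 of the inner per-partition streams at once and merges their elements non-deterministically into a single stream; |+| is Semigroup append (from cats or scalaz), so the scan keeps a running count per key.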
Functional Streams
API Design for FS2-Kafka
Typesafe API
Resource acquisition
State management
Back pressure
FS2-Kafka
Typesafe API
trait Consumer[F[_], K, V] {
  val simpleStream = new StreamType {
    type OutStreamType[A] = Stream[F, A]
    // ...
  }
  val partitionedStreams = new StreamType {
    type OutStreamType[A] = Stream[F, (TopicPartition, Stream[F, A])]
    // ...
  }
}
trait Consumer[F[_], K, V] {
  private[kafka] def createConsumer: F[ConsumerControl[F, K, V]]

  trait StreamType {
    type OutStreamType[_] <: Stream[F, _]
    type CMessage = CommitableMessage[F, ConsumerRecord[K, V]]

    private[kafka] def makeStream(
      subscription: Subscription,
      builder: MessageBuilder[F, K, V]
    )(implicit F: Async[F]): OutStreamType[builder.Message]

    def commitableMessages(subscription: Subscription)
        (implicit F: Async[F]): OutStreamType[CMessage] =
      makeStream(subscription, new CommitableMessageBuilder[F, K, V])
  }
}
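What the abstract OutStreamType buys: the same commitableMessages call produces differently shaped streams depending on the chosen StreamType. A sketch of the two call sites, reusing the types defined above:

// assuming: val consumer = Consumer[Task, String, String](consumerSettings)
type Msg = CommitableMessage[Task, ConsumerRecord[String, String]]

// simpleStream: one flat stream across all assigned partitions
val simple: Stream[Task, Msg] =
  consumer.simpleStream.commitableMessages(subscription)

// partitionedStreams: an inner stream per assigned partition
val perPartition: Stream[Task, (TopicPartition, Stream[Task, Msg])] =
  consumer.partitionedStreams.commitableMessages(subscription)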
FS2-Kafka
Typesafe API
trait Producer[F[_], K, V] {
  // ...
  type PMessage[P] = ProducerMessage[K, V, P]

  def send[P](implicit F: Async[F]): Pipe[F, PMessage[P], ProducerMetadata[P]]
  def sendAsync: Sink[F, ProducerRecord[K, V]]
  def sendCommitable[P <: Commitable[F]](implicit F: Async[F]): Sink[F, PMessage[P]]
}
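The P type parameter is a passthrough: send pairs each acknowledgement with whatever rode along as P, which is presumably how sendCommitable can commit an offset only after the corresponding produce has succeeded.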
Resource Acquisition
def bracket[F[_], R, A](r: F[R])
                       (use: R => Stream[F, A],
                        release: R => F[Unit]): Stream[F, A]

Stream.bracket(createConsumer)({ consumer => ??? }, _.close)
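To see bracket's guarantee in isolation, a self-contained sketch where a file reader stands in for the Kafka consumer (openReader and the path are illustrative): release runs whether the inner stream completes, fails or is interrupted, which is exactly what close() on a consumer needs.

import java.io.{BufferedReader, FileReader}
import fs2.{Stream, Task}

def openReader(path: String): Task[BufferedReader] =
  Task.delay(new BufferedReader(new FileReader(path)))

// The reader is acquired once, used to build the stream, and always released
def lines(path: String): Stream[Task, String] =
  Stream.bracket(openReader(path))(
    reader => Stream.repeatEval(Task.delay(reader.readLine())).takeWhile(_ != null),
    reader => Task.delay(reader.close())
  )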
FS2-Kafka
State Management
type Record = ConsumerRecord[K, V]

// Mutable queues and references, one queue per assigned partition
openPartitions: Async.Ref[F, Map[TopicPartition, Queue[F, Option[Chunk[Record]]]]]

// Notifies on newly assigned partitions
openPartitionsQueue: Queue[F, (TopicPartition, Stream[F, Record])]
FS2-Kafka
Back pressure
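FS2 streams are pull-based: a downstream stage pulls the next chunk only when it is ready for it. Together with the per-partition queues above, a slow stage naturally limits how fast records are pulled from the underlying KafkaConsumer.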
References
Apache Kafka
http://kafka.apache.org/
http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
http://nverma-tech-blog.blogspot.pt/2015/10/apache-kafka-quick-start-on-windows.html
FS2
https://github.com/ragb/fs2-kafka
https://github.com/functional-streams-for-scala/fs2
Akka-Streams
http://akka.io/docs/
https://pt.slideshare.net/Lightbend/understanding-akka-streams-back-pressure-and-asynchronous-architectures
Thank you !!!!!