natans@wix.com twitter @NSilnitsky linkedin/natansilnitsky github.com/natansil
Greyhound -
A Powerful Pure Functional Kafka library
Natan Silnitsky
Backend Infra Developer, Wix.com
A Scala/Java high-level SDK for Apache Kafka.
Powered by ZIO
Greyhound
* features...
But first…
a few Kafka terms
@NSilnitsky
A few Kafka terms
(diagram: a Kafka Producer sends records to a Kafka Broker)
@NSilnitsky
A few Kafka terms
(diagram: the Kafka Broker hosts Topics, each split into several Partitions; the Producer writes records to them)
@NSilnitsky
A few Kafka terms
(diagram: each Partition is an append-only log; records get sequential offsets 0, 1, 2, 3, 4, 5, ...)
@NSilnitsky
A few Kafka terms
(diagram: Kafka Consumers read each Partition in order, each tracking its current offset, e.g. offsets 0 through 20)
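To make these terms concrete, here is a minimal plain-Kafka producer in Scala (standard kafka-clients API; the broker address and topic name are illustrative). Every send appends the record to one partition of the topic and returns the offset it was assigned:

  import java.util.Properties
  import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  // records with the same key land in the same partition, in order
  val metadata = producer.send(
    new ProducerRecord("site-published", "site-42", "published")).get()
  println(s"partition=${metadata.partition} offset=${metadata.offset}")
  producer.close()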
Greyhound wraps Kafka
(diagram: Service A produces through the Kafka Producer to the Kafka Broker; Service B consumes through the Kafka Consumer)
@NSilnitsky
Greyhound wraps Kafka
Abstract, so that it is easy to change for everyone.
Simplify APIs, with additional features.
(diagram: Greyhound wraps the Kafka Consumer and Producer between Service A/B and the Kafka Broker)
@NSilnitsky
Greyhound wraps Kafka
Multiple APIs, for Java, Scala and Wix Devs: Scala Future, ZIO, Java.
(diagram: the API layers sit on top of ZIO Core, between Service A/B and the Kafka Broker)
* all logic
@NSilnitsky
Greyhound wraps Kafka
(diagram: the same stack, split by ownership: the Scala Future, ZIO and Java APIs plus ZIO Core are OSS; the Wix Interop layer is Private)
Greyhound wraps Kafka
What do we want it to do?
- Boilerplate
(diagram: Service A and Service B talk to the Kafka Broker through the wrapped Kafka Consumer and Producer)
val consumer: KafkaConsumer[String, SomeMessage] =
  createConsumer()

def pollProcessAndCommit(): Unit = {
  val consumerRecords = consumer.poll(1000).asScala
  consumerRecords.foreach(record => {
    println(s"Record value: ${record.value.messageValue}")
  })
  consumer.commitAsync()
  pollProcessAndCommit()
}

pollProcessAndCommit()

Kafka Consumer API
* Broker location, serde
val handler: RecordHandler[Console, Nothing, String, SomeMessage] =
  RecordHandler { record =>
    zio.console.putStrLn(record.value.messageValue)
  }

GreyhoundConsumersBuilder
  .withConsumer(RecordConsumer(
    topic = "some-topic",
    group = "group-2",
    handle = handler))

Greyhound Consumer API
* No commit, wix broker location
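SomeMessage arrives off the wire as bytes, so the typed handler is composed with serdes before registration. A hedged sketch reusing withDeserializers from the next slides (SomeMessageSerde is a hypothetical serde name):

  // sketch: turn the typed handler into one that consumes raw bytes;
  // the error channel widens to cover deserialization failures
  val byteHandler
    : RecordHandler[Console, Either[SerializationError, Nothing],
                    Chunk[Byte], Chunk[Byte]] =
    handler.withDeserializers(StringSerde, SomeMessageSerde)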
Functional Composition

Greyhound wraps Kafka
What do we want it to do?
✔ Simple Consumer API
+ Composable Record Handler
@NSilnitsky
COMPOSABLE RECORD HANDLER
(diagram: the Greyhound Consumer wraps the Kafka Consumer; records polled from the Site Published topic's partitions flow through the handler, and offsets are committed afterwards)
trait RecordHandler[-R, +E, K, V] {
  def handle(record: ConsumerRecord[K, V]): ZIO[R, E, Any]

  def contramap: RecordHandler
  def contramapM: RecordHandler
  def mapError: RecordHandler
  def withErrorHandler: RecordHandler
  def ignore: RecordHandler
  def provide: RecordHandler
  def andThen: RecordHandler
  def withDeserializers: RecordHandler
}

Composable Handler
@NSilnitsky
* change type
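As an illustration of the "* change type" note, here is how one of the simpler combinators, contramap, can be written in terms of handle. This is a hedged sketch, not Greyhound's exact source, and it assumes the trait is declared with a `self =>` alias:

  // sketch: adapt a handler of ConsumerRecord[K, V] into one of
  // ConsumerRecord[K2, V2] by mapping the incoming record first
  def contramap[K2, V2](f: ConsumerRecord[K2, V2] => ConsumerRecord[K, V])
    : RecordHandler[R, E, K2, V2] =
    new RecordHandler[R, E, K2, V2] {
      override def handle(record: ConsumerRecord[K2, V2]): ZIO[R, E, Any] =
        self.handle(f(record))
    }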
def contramapM[K2, V2](f: ConsumerRecord[K2, V2] =>
    ZIO[R, E, ConsumerRecord[K, V]])
  : RecordHandler[R, E, K2, V2] =
  new RecordHandler[R, E, K2, V2] {
    override def handle(record: ConsumerRecord[K2, V2]): ZIO[R, E, Any] =
      f(record).flatMap(self.handle)
  }

def withDeserializers(keyDeserializer: Deserializer[K],
                      valueDeserializer: Deserializer[V])
  : RecordHandler[R, Either[SerializationError, E], Chunk[Byte], Chunk[Byte]] =
  mapError(Right(_)).contramapM { record =>
    record.bimapM(
      key => keyDeserializer.deserialize(record.topic, record.headers, key),
      value => valueDeserializer.deserialize(record.topic, record.headers, value)
    ).mapError(e => Left(SerializationError(e)))
  }

Composable Handler
@NSilnitsky
RecordHandler(
  (r: ConsumerRecord[String, Duration]) =>
    putStrLn(s"duration: ${r.value.toMillis}"))
  .withDeserializers(StringSerde, DurationSerde)
=>
RecordHandler[Console, scala.Either[SerializationError, scala.RuntimeException],
  Chunk[Byte], Chunk[Byte]]

Composable Handler
@NSilnitsky
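andThen chains handlers the same way; a hedged sketch in the same spirit (assumed, but matching how handler.andThen(offsets.update) is used later in the EventLoop slide):

  // sketch: run this handler, then `other`, for every record
  def andThen[R1 <: R, E1 >: E](other: RecordHandler[R1, E1, K, V])
    : RecordHandler[R1, E1, K, V] =
    new RecordHandler[R1, E1, K, V] {
      override def handle(record: ConsumerRecord[K, V]): ZIO[R1, E1, Any] =
        self.handle(record) *> other.handle(record)
    }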
@NSilnitsky
DRILL DOWN
(diagram: the Greyhound Consumer, wrapping the Kafka Consumer over the Site Published topic)
@NSilnitsky
(diagram: inside the Greyhound Consumer, an Event Loop drives the Kafka Consumer's polling)
@NSilnitsky
(diagram: the Event Loop passes polled records to a Message Dispatcher, which fans them out to Workers)
@NSilnitsky
DRILL DOWN
(diagram: the full pipeline again: Kafka Consumer, Event Loop, Message Dispatcher, Workers)
object EventLoop {
  def make[R](consumer: Consumer /*...*/): RManaged[Env, EventLoop[...]] = {
    val start = for {
      running <- Ref.make(true)
      fiber   <- pollLoop(running, consumer /*...*/).forkDaemon
    } yield (fiber, running /*...*/)

    start.toManaged {
      case (fiber, running /*...*/) => for {
        _ <- running.set(false)
        // ...
      } yield ()
    }
  }
}

EventLoop Polling
@NSilnitsky * dispatcher.shutdown
* mem leak
def pollLoop[R1](running: Ref[Boolean],
                 consumer: Consumer
                 // ...
                ): URIO[R1 with GreyhoundMetrics, Unit] =
  running.get.flatMap {
    case true => for {
      // ...
      _      <- pollAndHandle(consumer /*...*/)
      // ...
      result <- pollLoop(running, consumer /*...*/)
    } yield result
    case false => ZIO.unit
  }

TailRec in ZIO
@NSilnitsky
def pollLoop[R1](running: Ref[Boolean],
                 consumer: Consumer
                 // ...
                ): URIO[R1 with GreyhoundMetrics, Unit] =
  running.get.flatMap {
    case true => // ...
      pollAndHandle(consumer /*...*/)
        // ...
        .flatMap(_ =>
          pollLoop(running, consumer /*...*/)
            .map(result => result)
        )
    case false => ZIO.unit
  }

TailRec in ZIO
https://github.com/oleg-py/better-monadic-for
@NSilnitsky
object EventLoop {
  def make[R](consumer: Consumer /*...*/): RManaged[Env, EventLoop[...]] = {
    val start = for {
      running <- Ref.make(true)
      fiber   <- pollOnce(running, consumer /*, ...*/)
                   .doWhile(_ == true).forkDaemon
    } yield (fiber, running /*...*/)

    start.toManaged {
      case (fiber, running /*...*/) => for {
        _ <- running.set(false)
        // ...
      } yield ()
    }
  }
}

TailRec in ZIO
@NSilnitsky
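doWhile on the slide is a small helper; a minimal sketch of the idea, assuming only ZIO core. It is stack-safe because flatMap trampolines on the heap rather than the JVM stack:

  // sketch: repeat an effect while its result satisfies p
  def doWhile[R, E, A](zio: ZIO[R, E, A])(p: A => Boolean): ZIO[R, E, A] =
    zio.flatMap(a => if (p(a)) doWhile(zio)(p) else ZIO.succeed(a))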
object EventLoop {
  type Handler[-R] = RecordHandler[R, Nothing, Chunk[Byte], Chunk[Byte]]

  def make[R](handler: Handler[R] /*...*/): RManaged[Env, EventLoop[...]] = {
    val start = for {
      // ...
      handle     = handler.andThen(offsets.update).handle(_)
      dispatcher <- Dispatcher.make(handle /*...*/)
      // ...
    } yield (fiber, running /*...*/)
  }

  def pollOnce(/*...*/) = {
    // poll and handle...
    _ <- offsets.commit

Commit Offsets
@NSilnitsky * old -> pass down
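The slides elide what offsets is. A hedged sketch of such an accumulator (the Offsets name and shape are assumed for illustration, not Greyhound's source; on the slide, update is additionally wrapped as a RecordHandler so it composes via andThen):

  // sketch: track the highest handled offset per partition in a Ref;
  // the event loop commits and resets the map on each pollOnce
  class Offsets(ref: Ref[Map[TopicPartition, Long]]) {
    def update(record: ConsumerRecord[_, _]): UIO[Unit] =
      ref.update { acc =>
        val tp = TopicPartition(record)
        acc + (tp -> (acc.getOrElse(tp, -1L) max record.offset))
      }

    // what pollOnce would hand to consumer.commit
    def committable: UIO[Map[TopicPartition, Long]] =
      ref.getAndSet(Map.empty)
  }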
Greyhound wraps Kafka
What do we want it to do?
✔ Simple Consumer API
✔ Composable Record Handler
+ Parallel Consumption!
val consumer: KafkaConsumer[String, SomeMessage] =
  createConsumer()

def pollProcessAndCommit(): Unit = {
  val consumerRecords = consumer.poll(1000).asScala
  consumerRecords.foreach(record => {
    println(s"Record value: ${record.value.messageValue}")
  })
  consumer.commitAsync()
  pollProcessAndCommit()
}

pollProcessAndCommit()

Kafka Consumer API
@NSilnitsky
ZIO FIBERS + QUEUES
(diagram: the Greyhound Consumer consumes the Site Published topic's partitions in parallel)
@NSilnitsky
(THREAD-SAFE) PARALLEL CONSUMPTION
(diagram: the Event Loop polls through the Kafka Consumer, and the Message Dispatcher feeds per-partition Workers concurrently)
object Dispatcher {
  def make[R](handle: Record => URIO[R, Any]): UIO[Dispatcher[R]] = for {
    // ...
    workers <- Ref.make(Map.empty[TopicPartition, Worker])
  } yield new Dispatcher[R] {
    override def submit(record: Record): URIO[..., SubmitResult] =
      for {
        // ...
        worker    <- workerFor(TopicPartition(record))
        submitted <- worker.submit(record)
      } yield // …
  }
}

Parallel Consumption
@NSilnitsky
* lazily
@NSilnitsky
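The "* lazily" note refers to workerFor creating workers on demand. A hedged sketch of how it might look, with the details assumed rather than taken from Greyhound's source:

  // sketch: look up the partition's worker, creating and caching it on
  // first use; the event loop submits from a single fiber, so the
  // get-then-update sequence here is race-free in practice
  def workerFor(tp: TopicPartition): URIO[R, Worker] =
    workers.get.flatMap {
      _.get(tp) match {
        case Some(worker) => ZIO.succeed(worker)
        case None =>
          for {
            worker <- Worker.make(handle, capacity)
            _      <- workers.update(_ + (tp -> worker))
          } yield worker
      }
    }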
object Worker {
  def make[R](handle: Record => URIO[R, Any],
              capacity: Int /*...*/): URIO[..., Worker] = for {
    queue <- Queue.dropping[Record](capacity)
    _     <- // simplified
             queue.take.flatMap { record =>
               handle(record).as(true)
             }.doWhile(_ == true).forkDaemon
  } yield new Worker {
    override def submit(record: Record): UIO[Boolean] =
      queue.offer(record)
    // ...
  }
}

Parallel Consumption
@NSilnitsky
* semaphore
@NSilnitsky
class OldWorker(capacity: Int /*...*/) {
  private val tasksQueue = new LinkedBlockingDeque(capacity)
  start()

  private def start() = {
    // simplified
    val thread = new Thread(new TaskLoop)
    thread.start()
  }

  private class TaskLoop extends Runnable {
    override def run() = {
      // simplified
      while (true) {
        val task = tasksQueue.take()
        task.run()
      }
    }
  }
  ...
}

Old Worker
* resource, context, maxPar
Greyhound wraps Kafka
What do we want it to do?
✔ Simple Consumer API
✔ Composable Record Handler
✔ Parallel Consumption
+ Retries!
...what about Error handling?
val retryConfig = RetryConfig.nonBlocking(
  1.second, 10.minutes)

GreyhoundConsumersBuilder
  .withConsumer(GreyhoundConsumer(
    topic = "some-topic",
    group = "group-2",
    handle = handler,
    retryConfig = retryConfig))

Non-blocking Retries
@NSilnitsky
FAILED PROCESSING
(diagram: the Greyhound Consumer fails while handling a record polled from renew-sub-topic)
Kafka Broker
renew-sub-topic
0 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 5
renew-sub-topic-retry-0
0 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 5
renew-sub-topic-retry-1
0 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 5
Inspired by Uber
RETRY!
Greyhound Consumer
Kafka Consumer
RETRY
PRODUCER
@NSilnitsky
Kafka Broker
renew-sub-topic
0 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 5
renew-sub-topic-retry-0
0 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 50 1 2 3 4 5
0 1 2 3 4 5
renew-sub-topic-retry-1
Inspired by Uber
RETRY!
0 1 2 3 4 5
Greyhound Consumer
Kafka Consumer
RETRY
PRODUCER
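A hedged sketch of the mechanism (handleOrForward, maxRetries, baseTopic and producer are illustrative names, not Greyhound's internals): on failure the record is re-published to the next retry topic, whose consumer delays handling by that topic's backoff:

  // sketch: on failure, hand the record to the next retry topic instead
  // of blocking the partition (Record assumed to be a case class with
  // a `topic` field)
  def retryTopic(base: String, attempt: Int): String =
    s"$base-retry-$attempt"

  def handleOrForward(record: Record, attempt: Int): URIO[Env, Any] =
    handler.handle(record).catchAll { _ =>
      if (attempt < maxRetries)
        producer.produce(record.copy(topic = retryTopic(baseTopic, attempt))).ignore
      else
        ZIO.unit // retries exhausted; could go to a dead-letter topic
    }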
BLOCKING POLICY HANDLER
Retries the same message on failure.
(diagram: the build-log-service's Greyhound Consumer retries a record from the source-control-update-topic in place, blocking the partition)
val retryConfig = RetryConfig.finiteBlocking(
  1.second, 1.minutes)

GreyhoundConsumersBuilder
  .withConsumer(GreyhoundConsumer(
    topic = "some-topic",
    group = "group-2",
    handle = handler,
    retryConfig = retryConfig))

Blocking Retries
* exponential
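The "* exponential" note presumably refers to exponential backoff between blocking attempts. A generic way to build such a sequence (illustrative; uses scala.concurrent.duration, while the deck's 1.second may come from zio.duration):

  import scala.concurrent.duration._

  // sketch: 1s, 2s, 4s, 8s, 16s, 32s between blocking retry attempts
  val blockingBackoffs: List[FiniteDuration] =
    List.iterate(1.second, 6)(_ * 2)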
* lag (blocking retries hold up the partition while they run)

BLOCKING POLICY HANDLER
First Approach

handle()
  .retry(
    Schedule.doWhile(_ => shouldBlock(blockingStateRef)) &&
    Schedule.fromDurations(blockingBackoffs)
  )

Doesn't allow delay interruptions.
Current Solution

foreachWhile(blockingBackoffs) { interval =>
  handleAndBlockWithPolling(interval, blockingStateRef)
}

Checks blockingState between short sleeps; allows a user request to unblock.

def foreachWhile[R, E, A](as: Iterable[A])(f: A => ZIO[R, E, Boolean]): ZIO[R, E, Unit] =
  ZIO.effectTotal(as.iterator).flatMap { i =>
    def loop: ZIO[R, E, Unit] =
      if (i.hasNext) f(i.next).flatMap(result => if (result) loop else ZIO.unit)
      else ZIO.unit
    loop
  }

Stream.fromIterable(blockingBackoffs).foreachWhile(...)
Greyhound wraps Kafka
What do we want it to do?
✔ Simple Consumer API
✔ Composable Record Handler
✔ Parallel Consumption
✔ Retries
+ Resilient Producer
...and when Kafka brokers are unavailable...
+ Retry on Error
Use Case: Guarantee completion
(diagram: a Job Scheduler triggers subscription renewal; the Producer publishes to the Kafka Broker, and the Wix Payments Service's Consumer completes the job)
RESILIENT PRODUCER
(diagram: when the Producer FAILS TO PRODUCE to the Kafka Broker, it saves the message to disk and retries on failure until the broker accepts it)
Greyhound wraps Kafka
What do we want it to do?
✔ Simple Consumer API
✔ Composable Record Handler
✔ Parallel Consumption
✔ Retries
✔ Resilient Producer
+ Context Propagation
Super cool for us
@NSilnitsky
CONTEXT PROPAGATION
(diagram: a Browser sign-up request reaches the Site-Members Service carrying USER REQUEST METADATA: language, geo, user type, ...)
CONTEXT PROPAGATION
(diagram: the Site-Members Service's Producer publishes a record; the metadata rides in the record's Headers, next to Topic/Partition/Offset, Key, Value and timestamp)
CONTEXT PROPAGATION
(diagram: the Contacts Service's Consumer reads the record and restores the context from its Headers)
context = contextFrom(record.headers, token)
handler.handle(record)
  .provideSomeLayer[UserEnv](Context.layerFrom(context))

RecordHandler((r: ConsumerRecord[String, Contact]) => for {
  context <- Context.retrieve
  _       <- ContactsDB.write(r.value, context)
} yield ())
=>
RecordHandler[ContactsDB with Context, Throwable, Chunk[Byte], Chunk[Byte]]

Context Propagation
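For the producing side, a hedged sketch of how the metadata might be attached (the "wix-context" header name, Context.serialize and this ProducerRecord shape are assumptions for illustration):

  // sketch: read the current request context and attach it to the
  // outgoing record, so downstream consumers can restore it
  def produceWithContext(topic: Topic, value: Chunk[Byte]) =
    for {
      context <- Context.retrieve
      record   = ProducerRecord(topic, value,
                   headers = Headers("wix-context" -> Context.serialize(context)))
      _       <- producer.produce(record)
    } yield ()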
Greyhound wraps Kafka
more features
✔ Simple Consumer API
✔ Composable Record Handler
✔ Parallel Consumption
✔ Retries
✔ Resilient Producer
✔ Context Propagation
✔ Pause/resume consumption
✔ Metrics reporting
Greyhound wraps Kafka
✔ Simple Consumer API
✔ Composable Record Handler
✔ Parallel Consumption
✔ Retries
✔ Resilient Producer
✔ Context Propagation
✔ Pause/resume consumption
✔ Metrics reporting
future plans
+ Batch Consumer
+ Exactly Once Processing
+ In-Memory KV Stores
Will be much simpler with ZIO
(diagram: the Greyhound Producer and Greyhound Consumer wrap the Kafka Producer and Kafka Consumer, which talk to the Kafka Broker)
GREYHOUND USE CASES AT WIX
- Pub/Sub
- CDC
- Offline Scheduled Jobs
- DB (Elastic Search) replication
- Action retries
- Materialized Views
REWRITING GREYHOUND IN ZIO RESULTED IN...
- Much less boilerplate
- Code that's easier to understand
- Fun
SOMETIMES YOU CAN'T DO EXACTLY WHAT YOU WANT WITH HIGH-LEVEL ZIO OPERATORS
...but ZIO offers lower-level abstractions too, like Promise and clock.sleep.
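For example, the interruptible retry delay from the blocking-retries section can be built from exactly those two primitives. A minimal sketch, assuming ZIO 1's Promise and zio.clock.sleep (the shape is assumed, not Greyhound's exact code):

  // sketch: sleep for `interval`, unless a user request completes
  // `unblock` first; this is the interruptible delay blocking retries need
  def sleepUnlessUnblocked(interval: Duration,
                           unblock: Promise[Nothing, Unit]): URIO[Clock, Unit] =
    unblock.await.race(clock.sleep(interval))

  // elsewhere, a "please unblock" user request simply runs:
  // unblock.succeed(())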
A Scala/Java high-level SDK for Apache Kafka.
0.1 is out!
github.com/wix/greyhound
Thank You
natans@wix.com twitter @NSilnitsky linkedin/natansilnitsky github.com/natansil
Slides & More
slideshare.net/NatanSilnitsky
medium.com/@natansil
twitter.com/NSilnitsky
natansil.com
