Building services with kafka streams

BUILDING SERVICES WITH KAFKA STREAMS

KAFKA 101
[Diagram: PRODUCER → TOPIC (in KAFKA) → CONSUMER]

KAFKA 101
[Diagram: the producer writes a, b, c to KAFKA; the consumer reads a, b, c]

KAFKA 101
[Diagram: PRODUCER writes a, b, c to TOPIC (3) in KAFKA; CONSUMER reads a, b, c]

KAFKA 101
[Diagram: PRODUCER writes a, b, c to TOPIC (3) in KAFKA; three CONSUMERs read c, a and b respectively]

KAFKA 101
[Diagram: PRODUCER writes a, b, c to TOPIC (3) with partitions 1, 2, 3; three CONSUMERs in one CONSUMER GROUP read c, a and b respectively]

KAFKA 101
[Diagram: PRODUCER writes a, b, c to TOPIC (3) with partitions 1, 2, 3; three CONSUMERs read c, a and b respectively; a fourth CONSUMER appears alongside the CONSUMER GROUP]

KAFKA 101
[Diagram: PRODUCER writes a, b, c to TOPIC (3) with partitions 1, 2, 3; three CONSUMERs in the CONSUMER GROUP]

KAFKA 101
[Diagram: PRODUCER writes a, b, c to TOPIC (3); two CONSUMERs in the CONSUMER GROUP share partitions 1, 2, 3, one reading c, a and the other reading b]

KAFKA 101
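[Diagram: PRODUCER writes a, b, c to TOPIC (3); two CONSUMERs in the CONSUMER GROUP share partitions 1, 2, 3, one reading c, a and the other reading b; a single CONSUMER in a second group, CONSUMER GROUP TOO, owns partitions 1, 2, 3 and reads a, b, c]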
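
The consumer-group behaviour in the diagrams can be sketched in plain Java. This is a simplified simulation, not the Kafka client's real assignor: the class name, the consumer names c1..c4 and the round-robin rule are assumptions made for illustration, and the exact split in a real cluster depends on the configured assignment strategy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ConsumerGroupSketch {

    /** Spreads partitions 1..partitionCount over the consumers of one group, round-robin. */
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumers) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            // Each partition is owned by exactly one consumer within the group.
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p + 1);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three partitions, three consumers: one partition each.
        System.out.println(assign(3, List.of("c1", "c2", "c3")));
        // One consumer leaves: the remaining two share all three partitions.
        System.out.println(assign(3, List.of("c1", "c2")));
        // More consumers than partitions: the extra consumer gets nothing.
        System.out.println(assign(3, List.of("c1", "c2", "c3", "c4")));
        // A second group is assigned independently and sees every partition.
        System.out.println(assign(3, List.of("other-group-c1")));
    }
}
```

With three partitions and two consumers this prints {c1=[1, 3], c2=[2]}; a fourth consumer in a three-partition group ends up with an empty list.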

KAFKA @ NEW RELIC
‣ default message broker for backend services
‣ 958 topics in production cluster
‣ 100TB of data in the cluster
‣ “kafka topics as a service” for the product teams

WHAT ARE WE BUILDING?
METRICS_5M
{
"id": 1,
"ts": 5,
"latency": 10
},
{
"id": 1,
"ts": 25,
"latency": 20
}

METRICS_5M
{
"id": 1,
"ts": 5,
"latency": 10
},
{
"id": 1,
"ts": 25,
"latency": 20
}
METADATA
{
"id": 1,
"name": "foo",
"region": "us"
}

METRICS_5M
{
"id": 1,
"ts": 5,
"latency": 10
},
{
"id": 1,
"ts": 25,
"latency": 20
}
METADATA
{
"id": 1,
"name": "foo",
"region": "us"
}
METRICS_1H
{
"id": 1,
"ts": 60,
"avg.latency": 15,
"name": "foo",
"region": "us"
}
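
The transformation from METRICS_5M and METADATA to METRICS_1H can be sketched in plain Java, without Kafka. Several things here are assumptions: ts is treated as minutes, the hourly record is stamped with the end of its 60-minute window, metrics without matching metadata are dropped, and all class, record and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HourlyRollupSketch {

    record Metric(int id, int ts, double latency) {}
    record Metadata(int id, String name, String region) {}
    record HourlyMetric(int id, int ts, double avgLatency, String name, String region) {}
    record WindowKey(int id, int windowEnd) {}

    /** End of the 60-minute window that contains ts (assumes ts is in minutes). */
    static int windowEnd(int ts) {
        return (ts / 60 + 1) * 60;
    }

    /** Averages latency per id and hour, then enriches each result with its metadata. */
    static List<HourlyMetric> rollUp(List<Metric> metrics, Map<Integer, Metadata> metadataById) {
        // Aggregate: WindowKey -> {sum of latencies, number of samples}.
        Map<WindowKey, double[]> sumAndCount = new LinkedHashMap<>();
        for (Metric m : metrics) {
            double[] acc = sumAndCount.computeIfAbsent(
                    new WindowKey(m.id(), windowEnd(m.ts())), k -> new double[2]);
            acc[0] += m.latency();
            acc[1] += 1;
        }
        // Join: look up metadata by id; skip records with no match (assumed inner join).
        List<HourlyMetric> out = new ArrayList<>();
        for (Map.Entry<WindowKey, double[]> e : sumAndCount.entrySet()) {
            Metadata meta = metadataById.get(e.getKey().id());
            if (meta == null) {
                continue;
            }
            double avg = e.getValue()[0] / e.getValue()[1];
            out.add(new HourlyMetric(
                    e.getKey().id(), e.getKey().windowEnd(), avg, meta.name(), meta.region()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Metric> metrics5m = List.of(new Metric(1, 5, 10), new Metric(1, 25, 20));
        Map<Integer, Metadata> metadata = Map.of(1, new Metadata(1, "foo", "us"));
        System.out.println(rollUp(metrics5m, metadata));
    }
}
```

For the two sample records this yields a single hourly record with ts 60 and an average latency of 15.0, enriched with name foo and region us.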

[Diagram: JAVA APP reads METRICS_5M and METADATA and writes METRICS_1H]
METRICS_5M
{
"id": 1,
"ts": 5,
"latency": 10
},
{
"id": 1,
"ts": 25,
"latency": 20
}
METADATA
{
"id": 1,
"name": "foo",
"region": "us"
}
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: JAVA APP reads METRICS_5M, looks up METADATA with {"id": 1}, and writes METRICS_1H]
METRICS_5M
{
"id": 1,
"ts": 5,
"latency": 10
},
{
"id": 1,
"ts": 25,
"latency": 20
}
METADATA
{
"id": 1,
"name": "foo",
"region": "us"
}
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: JAVA APP reads METRICS_5M and METADATA and writes METRICS_1H]
METRICS_5M
{
"id": 1,
"ts": 5,
"latency": 10
},
{
"id": 1,
"ts": 25,
"latency": 20
}
METADATA
{
"id": 1,
"name": "foo",
"region": "us"
}
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: three JAVA APP instances read METRICS_5M (3) and METADATA and write METRICS_1H]
METRICS_5M (3)
{
"id": 1,
"ts": 5,
"latency": 10
},
{
"id": 1,
"ts": 25,
"latency": 20
}
METADATA
{
"id": 1,
"name": "foo",
"region": "us"
}
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: three JAVA APP instances read METRICS_5M (3) and METADATA and write METRICS_1H]
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: three JAVA APP instances with METRICS_5M (3), METADATA (3) and METADATA; output to METRICS_1H]
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: KSTREAMS APP with METRICS_5M (3) and METADATA; output to METRICS_1H]
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: KSTREAMS APP with METRICS_5M (3), METADATA (3) and METADATA; output to METRICS_1H]
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: KSTREAMS APP with METRICS_5M (3), METADATA (3), METADATA and AGG_STORE (3); output to METRICS_1H]
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: three KSTREAMS APP instances with METRICS_5M (3), METADATA (3), METADATA and AGG_STORE (3); output to METRICS_1H]
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

[Diagram: two KSTREAMS APP instances with METRICS_5M (3), METADATA (3), METADATA and AGG_STORE (3); output to METRICS_1H]
METRICS_1H
{
"id": 1,
"ts": 60,
"name": "foo",
"region": "us"
}

WHAT’S KAFKA STREAMS?
‣ a library

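[Diagram: KAFKA]

[Diagram: JAVA APP → TOPIC (in KAFKA) → JAVA APP]

[Diagram: JAVA APP]

[Diagram: JAVA APP with the KAFKA CLIENT LIBRARY]

[Diagram: JAVA APP with the KAFKA CLIENT LIBRARY and the KSTREAMS LIBRARY]

[Diagram: JAVA APP with the KAFKA CLIENT LIBRARY and the KSTREAMS LIBRARY + DSL]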

‣ a library
‣ an abstraction on top of Kafka topics
‣ state stores (also backed by Kafka topics)
‣ DSL for data processing
‣ kafka is your source and your sink
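
The point that state stores are also Kafka topics can be illustrated with a plain-Java sketch: every write to the local store is appended to a changelog, and a fresh instance rebuilds its state by replaying that changelog. The in-memory list stands in for the changelog topic, and the class and method names are hypothetical; this is not the Kafka Streams state store API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ChangelogStoreSketch {

    /** One changelog entry; a null value marks a delete (tombstone). */
    record Change(String key, Double value) {}

    private final Map<String, Double> local = new HashMap<>();
    private final List<Change> changelog; // stand-in for the changelog topic

    /** Restores local state by replaying every entry already in the changelog. */
    ChangelogStoreSketch(List<Change> changelog) {
        this.changelog = changelog;
        for (Change change : changelog) {
            apply(change);
        }
    }

    /** Writes go to the changelog first, then to the local map. */
    void put(String key, Double value) {
        Change change = new Change(key, value);
        changelog.add(change);
        apply(change);
    }

    Double get(String key) {
        return local.get(key);
    }

    private void apply(Change change) {
        if (change.value() == null) {
            local.remove(change.key());
        } else {
            local.put(change.key(), change.value());
        }
    }

    public static void main(String[] args) {
        List<Change> topic = new ArrayList<>();
        ChangelogStoreSketch first = new ChangelogStoreSketch(topic);
        first.put("id-1", 10.0);
        first.put("id-1", 15.0);
        first.put("id-2", 7.0);
        first.put("id-2", null); // delete id-2

        // A replacement instance recovers the same state from the changelog alone.
        ChangelogStoreSketch restored = new ChangelogStoreSketch(topic);
        System.out.println(restored.get("id-1"));
        System.out.println(restored.get("id-2"));
    }
}
```

The design choice this illustrates is that the local store is disposable: losing an instance loses nothing that cannot be replayed from Kafka.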

WHY IS IT A GOOD FIT FOR US?
‣ small product team
‣ no existing cluster
‣ existing tooling for containerised apps
‣ Flink and Kafka Streams

WHY YOU NO FLINK?
‣ great dev story
‣ very versatile
‣ more implicit
‣ less exciting ops story
‣ local env is very different from prod
‣ manages its own resources
‣ tricky schema migrations (fixed in 1.8)
‣ longer learning curve

WHAT DID WE LIKE
‣ easy to start, if you already use Kafka (cluster)
‣ it's simple and explicit
‣ no shuffles happen behind the scenes
‣ DSL is sufficient for simple use cases

WHAT IS TRICKY
‣ stream architecture takes time to get used to
‣ be prepared to deploy many things
‣ partitions are the way to scale horizontally
‣ scaling up is non-trivial due to co-partitioning
‣ everything will be sucked into Kafka topics
‣ you have to think about schemas
‣ it's rather new (not much Stack Overflow wisdom yet)
‣ DSL is sufficient for simple use cases
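
Why co-partitioning makes scaling up non-trivial can be shown with a small plain-Java sketch. If two topics are joined by key, the same key must land in the same partition number in both, which only holds when both use the same partitioning rule and the same partition count. The hash rule below is a stand-in chosen for illustration (Kafka's default partitioner uses a different hash), and all names are hypothetical.

```java
import java.util.List;

final class CoPartitioningSketch {

    /** Stand-in partitioner: hash of the key modulo the partition count. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** True if every key maps to the same partition number in both topics. */
    static boolean coPartitioned(List<String> keys, int leftPartitions, int rightPartitions) {
        for (String key : keys) {
            if (partitionFor(key, leftPartitions) != partitionFor(key, rightPartitions)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("1", "2", "3", "4");
        // Both topics have 3 partitions: matching keys meet in the same partition.
        System.out.println(coPartitioned(ids, 3, 3));
        // Only one topic is grown to 6 partitions: some keys no longer line up.
        System.out.println(coPartitioned(ids, 3, 6));
    }
}
```

In this sketch, growing only one of the two topics from 3 to 6 partitions breaks the alignment, so every topic taking part in a join has to be repartitioned together.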

GOING FORWARD
‣ use it again? yes
‣ pick it up again, looking back? yes
‣ doing differently, looking back? not really
