When Bad Things Happen to
Good Kafka Clusters
True stories that actually happened to production Kafka clusters
As told by
Gwen Shapira, System Architect
@gwenshap 1
Disclaimer
I am talking about other people’s systems
Not yours.
I am sure you had perfectly good reasons to configure your system the
way you did.
This is not personal criticism
Just some stories and a few lessons we learned the hard way
2
POCs are super easy
It's time to go to production
3
We keep our data in
/tmp/logs
What can possibly go wrong?
4
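The trouble with /tmp, roughly: it is wiped on reboot (and often by tmp-cleanup jobs), so a broker restart can take the data with it. A minimal server.properties sketch, with made-up mount points:

  # Quick-start configs ship with something like log.dirs=/tmp/kafka-logs.
  # Fine for a demo, fatal in production. Point brokers at dedicated, persistent disks:
  log.dirs=/data/kafka-1,/data/kafka-2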
Replication-factor of 3 is way too much
5
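The joke, of course, is that it isn't: disks fail in correlated batches, and a broker is often down for maintenance at exactly the wrong moment. A sketch of creating a topic the safer way (topic name and partition count are made up; the flags are the 0.8/0.9-era ones):

  kafka-topics.sh --create --zookeeper localhost:2181 \
    --topic clicks --partitions 8 --replication-factor 3 \
    --config min.insync.replicas=2

Pairing min.insync.replicas=2 with acks=all on the producer is the usual way to make that third replica pay for itself.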
__consumer_offsets topic?
Never heard of it, so it's probably
ok to delete.
6
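__consumer_offsets is the internal topic where consumers commit their offsets (since 0.8.2). Delete it and every consumer group forgets where it was and falls back to auto.offset.reset. A sketch of inspecting it instead, plus the broker settings that govern it (host and values are illustrative):

  kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets

  # server.properties
  offsets.topic.replication.factor=3   # don't leave the offsets topic under-replicated
  offsets.retention.minutes=10080      # how long committed offsets survive for idle groups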
7
What’s wrong with running Kafka 0.7?
8
Remember that time when…
We accidentally lost all our data?
9
We added new partitions…
And immediately ran out of memory
10
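The mechanism, as far as we can tell in cases like this: consumers and replica fetchers budget up to a full fetch buffer per partition, so partition count multiplies straight into memory. Back-of-envelope, with illustrative numbers:

  # fetch buffer per partition (max.partition.fetch.bytes / fetch.message.max.bytes /
  # replica.fetch.max.bytes): roughly 1 MB by default
  #   500 partitions per consumer * 1 MB  =  ~0.5 GB of fetch buffers
  # 2,000 partitions per consumer * 1 MB  =  ~2 GB, easily past a modest JVM heap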
We wanted to look up records by time
The smaller the segments, the more accurate
the lookups
So we created 10k segments.
11
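Why this hurts, roughly: the old offset-by-time lookup only resolves to segment boundaries, which is what made tiny segments tempting, but every segment carries an index file, a memory-mapped region, and open file handles, multiplied across every partition. Illustrative topic-level settings (values are examples, not recommendations):

  segment.bytes=1073741824    # default is about 1 GB; shrinking this to a few MB is how you end up with 10k segments
  segment.ms=604800000        # roll on time if you must, but keep it coarse
  # And watch the OS limits that tiny segments blow through: ulimit -n (open files), vm.max_map_count (mmap'd indexes)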
We need REALLY LARGE messages
12
We just serialize JSON
and throw it into a topic.
It’s easy.
The consumers will figure something out.
13
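Spoiler: the consumers rarely figure it out. One common alternative, sketched here with made-up topic name, schema, and URLs, is an explicit schema with Avro and the Confluent Schema Registry, so the contract lives somewhere other than tribal knowledge:

  import java.util.Properties;
  import org.apache.avro.Schema;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericRecord;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class SchemaExample {
    public static void main(String[] args) {
      // An explicit, versionable contract instead of "whatever JSON the producer felt like"
      Schema schema = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"Click\",\"fields\":["
        + "{\"name\":\"user\",\"type\":\"string\"},{\"name\":\"ts\",\"type\":\"long\"}]}");

      Properties props = new Properties();
      props.put("bootstrap.servers", "broker1:9092");
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
      props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
      props.put("schema.registry.url", "http://schema-registry:8081");

      GenericRecord click = new GenericData.Record(schema);
      click.put("user", "gwen");
      click.put("ts", System.currentTimeMillis());

      try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
        // The serializer registers and validates the schema, so consumers know exactly what to expect
        producer.send(new ProducerRecord<>("clicks", click));
      }
    }
  }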
Log4J is a great way to
reliably send data to Kafka
14
Keep your Kafka safe!
“When it absolutely, positively has to be there:
Reliability guarantees in Apache Kafka”
Wednesday, 11:20am, Room 3D
15
Thank you
16
Visit Confluent in booth #929
Books, Kafka t-shirts & stickers, and more…
Gwen Shapira | gwen@confluent.io | @gwenshap

Editor's Notes

  • #6 Hard drive failures are somewhat correlated. There are bad batches. And Kafka brokers will crash on a single bad disk. Also, sometimes you lose a node *and* need to restart a controller.