What can be easier than building a data pipeline nowadays? You add a few Apache Kafka clusters, some way to ingest data (probably over HTTP), design a way to route your data streams, add a few stream processors and consumers, integrate with a data warehouse... wait, it does start to look like A LOT of things, doesn't it? And you probably want to make it highly scalable and available in the end, correct?
We've been developing a data pipeline in Demonware/Activision for a while. We learned how to scale it not only in terms of messages/sec it can handle, but also in terms of supporting more games and more use-cases.
In this presentation you'll hear about the lessons we learned, including (but not limited to):
- Message schemas
- Apache Kafka organization and tuning
- Topics naming conventions, structure and routing
- Reliable and scalable producers and ingestion layer
- Stream processing