
A Functional Approach to Architecture - Kafka & Kafka Streams - Kevin Mas Ruiz & Alexey Gravanov (joint in Munich & Barcelona only)



How to implement a solution using Kafka as a distributed database, KafkaStreams as a glue for different services and how to apply some Domain Driven Design concepts to ensure data integrity and design the boundaries of each service.

Published in: Software


  2. 2. WHO WE ARE? ● Kevin Mas Ruiz, ThoughtWorker ● Alexey Gravanov, AutoScoutie
  3. 3. WHAT TO EXPECT? ● To meet ScoutWorks :) ● Tales about business requirements ● A brief introduction to some Kafka & Kafka Streams conventions ● See how we designed our architecture ● Talk about resilience in a functional architecture
  4. 4. AUTOSCOUT24 ● Platform for selling cars & motorbikes ● 8 countries + 10 language versions ● 55,000+ dealers ● 2.4+ million listings ● 3+ billion page impressions per month ● 10+ million active users per month
  5. 5. OUR DOMAIN ● Listings are the core of the domain ● Images are one of the main sources of information in a listing ● Dealers want to export those listings to other marketplaces
  6. 6. OUR PRODUCT A system that exports dealers’ high-quality listings to other marketplaces to improve their visibility on the market.
  7. 7. BUSINESS REQUIREMENTS ● A dealer can enable and disable the export process ● All active listings of a dealer will be exported ● Exported listings that become inactive or are deleted should be hidden on external marketplaces
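The rules on this slide can be read as a small decision function. The following is only a sketch: the function name, the status strings and the `already_exported` flag are hypothetical, and the behaviour when a dealer disables the export is an assumption, since the slides do not spell it out.

```python
def export_action(export_enabled: bool, status: str, already_exported: bool) -> str:
    """Return "export", "hide" or "ignore" for a single listing."""
    if export_enabled and status == "active":
        # All active listings of a dealer with export enabled are exported.
        return "export"
    if already_exported:
        # Inactive or deleted listings that were exported must be hidden.
        # Assumption: the same applies when the dealer disables the export.
        return "hide"
    # Nothing was ever exported, so there is nothing to hide.
    return "ignore"


actions = [
    export_action(True, "active", False),
    export_action(True, "inactive", True),
    export_action(True, "deleted", False),
]
```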
  8. 8. MORE BUSINESS REQUIREMENTS ● It’s acceptable not to have the latest listing information exported in real time, but it should eventually be updated ● It’s important to have all listings on external marketplaces ASAP to ensure visibility ● The listing data format is dynamic, so it should be possible to reprocess a listing and export it again
  9. 9. TECH REQUIREMENTS ● Load fluctuates during the day, scaling up / down is mandatory ● Easy to add additional marketplaces ● Easy to monitor / trace any listing
  10. 10. DATA FLOW
  11. 11. KAFKA
  12. 12. WHAT IS KAFKA? ● Distributed streaming platform ● Records are published to topics, which are made up of partitions ● Each partition is an append-only (*) structured commit log ● Each record consists of a partition key, a value, a timestamp and an assigned offset, which is the position of the record in the log
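To make the vocabulary concrete, here is a minimal in-memory sketch of a topic made of append-only partitions. It is not how Kafka is implemented: the `Topic` and `Record` classes are hypothetical, and the partitioner uses CRC32 for simplicity, whereas Kafka's default partitioner hashes keys with murmur2.

```python
import time
import zlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    timestamp: float
    offset: int  # position of the record within its partition


@dataclass
class Topic:
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        # One independent append-only log per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Hypothetical partitioner: stable hash of the key modulo partition count.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def publish(self, key: str, value: str) -> Record:
        log = self.partitions[self.partition_for(key)]
        record = Record(key, value, time.time(), offset=len(log))
        log.append(record)  # existing records are never modified
        return record


topic = Topic(num_partitions=3)
first = topic.publish("listing-1", "created")
second = topic.publish("listing-1", "price-changed")
```

Because both records share the key `listing-1`, they land in the same partition and receive consecutive offsets.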
  13. 13. KAFKA GUARANTEES ● Sharding of records based on partition key ● Replication of records depending on configuration ● Ordering of records within partition ● At-least-once delivery guarantee of records
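At-least-once delivery means a consumer may see the same record more than once, for example after a restart. One common way to cope, sketched below under the assumption that deliveries arrive as hypothetical `(partition, offset, value)` tuples, is to rely on the per-partition ordering and skip any offset that has already been handled.

```python
def consume(deliveries, handler):
    """Call handler once per (partition, offset), ignoring redelivered records."""
    last_seen = {}  # partition -> highest offset already processed
    for partition, offset, value in deliveries:
        if offset <= last_seen.get(partition, -1):
            continue  # duplicate caused by a redelivery, skip it
        handler(value)
        last_seen[partition] = offset


processed = []
# The record at offset 1 is delivered twice.
deliveries = [
    (0, 0, "created"),
    (0, 1, "updated"),
    (0, 1, "updated"),
    (0, 2, "deleted"),
]
consume(deliveries, processed.append)
```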
  14. 14. WHY KAFKA? Kafka is often used for building real-time streaming applications that transform or react to the streams of data.
  15. 15. WHY KAFKA? ● Listing change propagation fits the Kafka streaming mindset very well ● Possibility to go back in time and reprocess records if needed ● Enables developers to design by thinking in terms of composing small functions
  16. 16. KAFKA STREAMS ● Opinionated library for processing streams of records ● Makes it possible to build elastic, scalable and fault-tolerant solutions ● Uses Kafka to store current offsets / intermediate state of processed data ● Supports stateless processing, stateful processing and windowing operations, e.g. aggregates of records ● For stateless operations, lets you treat microservices as state-ignorant pure functions, leaving Kafka Streams to take care of side-effects
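The difference between stateless and stateful processing can be sketched without the library itself. The example below is plain Python, not the Kafka Streams API; the record fields and function names are hypothetical. The stateless step depends only on its input record, while the stateful step receives the current state as a parameter and returns the next one.

```python
def to_export_format(listing: dict) -> dict:
    """Stateless step: the output depends only on the input record."""
    return {"id": listing["id"], "title": listing["title"].strip()}


def count_by_dealer(counts: dict, listing: dict) -> dict:
    """Stateful step: takes the current state and returns a new one."""
    updated = dict(counts)
    dealer = listing["dealer"]
    updated[dealer] = updated.get(dealer, 0) + 1
    return updated


records = [
    {"id": 1, "dealer": "d1", "title": "  Audi A4 "},
    {"id": 2, "dealer": "d1", "title": "BMW 320d"},
    {"id": 3, "dealer": "d2", "title": "Ducati Monster"},
]

exported = [to_export_format(r) for r in records]

state = {}
for r in records:
    state = count_by_dealer(state, r)
```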
  18. 18. STREAMING VS MESSAGING ● Very similar approaches, but... ● Who has the fish? ● Go back in time and re-process records? ● Ordered records for a single aggregate root
  20. 20. FUNCTIONS ARE ● Atomic: functions run once and completely, and cannot be interrupted ● Composable: functions can be chained, generating more abstract and business-related algebras ● State-ignorant: state is shared as a parameter, avoiding mutable state between functions
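As an illustration of composability and state-ignorance, the sketch below chains two small pure functions into a more business-oriented one. The helper `compose` and the listing fields are hypothetical and only serve the example.

```python
from functools import reduce


def compose(*functions):
    """Chain functions left to right: compose(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, fn: fn(acc), functions, value)


def normalize(listing: dict) -> dict:
    # Returns a new dict, the input is left untouched.
    return {**listing, "make": listing["make"].upper()}


def add_price_label(listing: dict) -> dict:
    return {**listing, "price_label": f"{listing['price']} EUR"}


prepare_for_export = compose(normalize, add_price_label)

original = {"make": "audi", "price": 15000}
result = prepare_for_export(original)
```

Each function receives everything it needs as a parameter and returns a new value, so the chain can be reordered, extended or tested piece by piece.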
  21. 21. CONSISTENCY BOUNDARIES ● Can only be ensured on a single partition ● Is degraded when repartitioning
  22. 22. AGGREGATE ROOT ● Is the boundary of consistency ● Is a set of records in a single topic with the same partition key ● Represents a single business object (for example, a Listing)
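One way to picture an aggregate root is as the result of folding, in order, all records that share a partition key. The sketch below assumes a hypothetical event shape of `(kind, payload)` tuples; the talk does not describe the actual record format.

```python
from functools import reduce


def apply_event(state, event):
    """Fold one event into the current state of a listing."""
    kind, payload = event
    if kind == "deleted":
        return None  # the business object no longer exists
    return {**(state or {}), **payload}


def current_state(events):
    # Ordering within a partition makes this fold deterministic.
    return reduce(apply_event, events, None)


events_by_key = {
    "listing-1": [
        ("created", {"price": 15000, "active": True}),
        ("updated", {"price": 14500}),
    ],
    "listing-2": [
        ("created", {"price": 9000, "active": True}),
        ("deleted", {}),
    ],
}

states = {key: current_state(events) for key, events in events_by_key.items()}
```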
  23. 23. TOPOLOGY
  24. 24. TOPOLOGY
  25. 25. TOPOLOGY
  26. 26. Functions are based on an iterative business language, not on size
  28. 28. "Everything fails all the time." Werner Vogels VP & CTO at
  29. 29. KAFKA For every topic with a replication factor of N, Kafka tolerates the failure of up to N-1 nodes.
  30. 30. KAFKA STREAMS ● One-node setup: after coming back, the node picks up where processing stopped ● Multi-node setup: other nodes take over, but… ○ Stateless processor: continues working as soon as the nodes are rebalanced ○ Stateful processor, simple setup: it can take a while until the state is rebuilt ○ Stateful processor, hot standby setup: local state is kept up to date, but records are not actually processed until failover happens
  31. 31. LEARNINGS ● Function signatures should be unique (only one function should be responsible for a given transformation) ● Functions, by design, should not pertain to a single domain, but map between two domains ● The consistency boundary is a partition (or a single aggregate root)
  32. 32. LEARNINGS ● A system can be seen as a composition of functions, but data needs to be managed by an external system. ● As a function, we should test transformations, not side-effects. ● Adding a correlation id on data sources is really useful for tracing, but boundaries should be chosen carefully.
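A minimal sketch of what testing the transformation rather than the side-effect can look like, with a correlation id attached at the data source. All names here are hypothetical, including the `as24-` prefix on the external id.

```python
import uuid


def with_correlation_id(record: dict, correlation_id=None) -> dict:
    """Attach a correlation id at the data source so later steps can be traced."""
    return {**record, "correlation_id": correlation_id or str(uuid.uuid4())}


def to_marketplace_listing(record: dict) -> dict:
    """Pure transformation: carries the correlation id through unchanged."""
    return {
        "external_id": f"as24-{record['id']}",  # hypothetical id scheme
        "correlation_id": record["correlation_id"],
    }


def publish(record: dict, sink: list) -> None:
    """Side-effect, kept separate from the transformation and from its test."""
    sink.append(record)


source_record = with_correlation_id({"id": 7}, correlation_id="abc-123")
exported = to_marketplace_listing(source_record)
```

Only `to_marketplace_listing` needs a unit test; `publish` is the part that would be handed over to the streaming infrastructure.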
  33. 33. LEARNINGS ● Kafka Streams should not be used for external I/O. For example, if you need a service that makes HTTP requests, use another streaming engine for that (we used Akka Streams). ● Kafka Streams’ learning curve is really steep. ● With their default configuration, Kafka and Kafka Streams are not well suited to medium-sized messages (around 50 KB). You will need to tweak and optimize the configuration.
  34. 34. LEARNINGS ● Backpressure is a natural fit as functions are pull-based. ● Single-direction data-flow is a mindset that needs to be learned and improved.
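Pull-based backpressure can be illustrated with Python generators, which behave similarly in that the source only advances when the downstream step asks for the next record. This is an analogy rather than Kafka Streams code, and the names are hypothetical.

```python
from itertools import islice

pulled = []  # records the source has actually handed out


def source(records):
    for record in records:
        pulled.append(record)
        yield record


def transform(stream):
    for record in stream:
        yield record.upper()


pipeline = transform(source(["a", "b", "c", "d"]))

# The consumer asks for two records only, so the source is never pushed further.
first_two = list(islice(pipeline, 2))
```

After the consumer stops, `pulled` contains only the two records that were requested; the remaining ones stay in the source until someone pulls again.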
  35. 35. THANK YOU For questions or suggestions: Kevin Mas Ruiz (@skmruiz) Alexey Gravanov (@gravanov)