A Functional Approach to Architecture - Kafka & Kafka Streams - Kevin Mas Ruiz & Alexey Gravanov (joint in Munich & Barcelona only)

How to implement a solution using Kafka as a distributed database and Kafka Streams as the glue between services, and how to apply some Domain-Driven Design concepts to ensure data integrity and define the boundaries of each service.

Published in: Software

  1. KAFKA & KAFKA STREAMS: A FUNCTIONAL ARCHITECTURE
     KEVIN MAS RUIZ & ALEXEY GRAVANOV
  2. WHO ARE WE?
     ● Kevin Mas Ruiz, Thoughtworker
     ● Alexey Gravanov, AutoScoutie
  3. WHAT TO EXPECT?
     ● To meet ScoutWorks :)
     ● Tales about business requirements
     ● A brief introduction to some Kafka & Kafka Streams conventions
     ● See how we designed our architecture
     ● Talk about resilience in a functional architecture
  4. AUTOSCOUT24
     ● Platform for selling cars & motorbikes
     ● 8 countries + 10 language versions
     ● 55,000+ dealers
     ● 2.4+ million listings
     ● 3+ billion page impressions per month
     ● 10+ million active users per month
  5. OUR DOMAIN
     ● Listings are the core of the domain
     ● Images are one of the main sources of information in a listing
     ● Dealers want to export those listings to other marketplaces
  6. OUR PRODUCT
     A system able to export dealers' high-quality listings to other marketplaces to improve their visibility on the market.
  7. BUSINESS REQUIREMENTS
     ● A dealer can enable and disable the export process
     ● All active listings of a dealer will be exported
     ● Exported listings that become inactive or are deleted should be hidden on external marketplaces
  8. MORE BUSINESS REQUIREMENTS
     ● It's acceptable not to have the latest listing information exported in real time, but it should eventually be updated
     ● It's important to have all listings on external marketplaces ASAP to ensure visibility
     ● The listing data format is dynamic, so it should be possible to reprocess a listing and export it again
  9. TECH REQUIREMENTS
     ● Load fluctuates during the day, so scaling up / down is mandatory
     ● Easy to add additional marketplaces
     ● Easy to monitor / trace any listing
  10. DATA FLOW
  11. KAFKA
  12. WHAT IS KAFKA?
      ● Distributed streaming platform
      ● Records are published to topics, which are made up of partitions
      ● Each partition is an append-only (*) structured commit log
      ● A record consists of a partition key, a value, a timestamp, and an assigned offset, which is the position of the record in the log
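The record structure and append-only log described on slide 12 can be sketched as a small in-memory model. This is a toy stand-in in plain Python, not Kafka's API; the names `Record` and `PartitionLog` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    key: str
    value: str
    timestamp: int
    offset: int  # position of the record in the partition log


class PartitionLog:
    """Toy in-memory stand-in for a single partition (not a Kafka API)."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def append(self, key: str, value: str, timestamp: int) -> Record:
        # The offset is assigned by the log, not by the producer.
        record = Record(key, value, timestamp, offset=len(self._records))
        self._records.append(record)
        return record

    def read_from(self, offset: int) -> List[Record]:
        # Reading never removes records, so any earlier offset can be replayed.
        return self._records[offset:]


log = PartitionLog()
log.append("listing-1", "created", timestamp=1000)
log.append("listing-1", "price-changed", timestamp=1010)
replayed = [r.value for r in log.read_from(0)]
```

Because reads do not consume records, a consumer that starts again from offset 0 sees the full history, which is the property the later "go back in time and reprocess" bullet relies on.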
  13. KAFKA GUARANTEES
      ● Sharding of records based on partition key
      ● Replication of records depending on configuration
      ● Ordering of records within a partition
      ● At-least-once delivery guarantee of records
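The sharding and ordering guarantees on slide 13 can be illustrated with a minimal sketch: records with the same key always land in the same partition, so their relative order is preserved there. The hash below (`zlib.crc32`) is a stand-in chosen for determinism and is not the hash Kafka's own partitioner uses; `NUM_PARTITIONS` and `partition_for` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: deterministic, but not the hash Kafka's partitioner uses.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)
events = [
    ("listing-1", "created"),
    ("listing-2", "created"),
    ("listing-1", "price-changed"),
    ("listing-1", "deleted"),
]
for key, value in events:
    partitions[partition_for(key)].append((key, value))

# All events for one key sit in one partition, in the order they were sent.
listing_1_events = [
    value
    for key, value in partitions[partition_for("listing-1")]
    if key == "listing-1"
]
```

Note that ordering is only defined per partition: nothing here says how events for `listing-1` and `listing-2` relate to each other if they end up in different partitions.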
  14. WHY KAFKA?
      Kafka is often used for building real-time streaming applications that transform or react to streams of data.
  15. WHY KAFKA?
      ● Listing change propagation fits the Kafka streaming mindset very well
      ● Possibility to go back in time and reprocess records if needed
      ● Enables developers to design by thinking in compositions of small functions
  16. KAFKA STREAMS
      ● Opinionated library to process streams of records
      ● Makes it possible to build elastic, scalable and fault-tolerant solutions
      ● Uses Kafka to store current offsets / intermediate state of processed data
      ● Supports stateless processing, stateful processing or windowing operations, e.g. aggregates of records
      ● For stateless operations, allows microservices to be seen as state-ignorant pure functions, letting Kafka Streams take care of side effects
  17. KAFKA STREAMS GUARANTEES
  18. STREAMING VS MESSAGING
      ● Very similar approaches, but...
      ● Who has the fish?
      ● Go back in time and re-process records?
      ● Ordered records for a single aggregate root
  19. MODELING WITH FUNCTIONS
  20. FUNCTIONS ARE
      ● Atomic: functions run once and completely, and cannot be interrupted
      ● Composable: functions can be chained, generating more abstract and business-related algebras
      ● State-ignorant: state is shared as a parameter, avoiding mutable state between functions
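The "composable" and "state-ignorant" properties from slide 20 can be sketched in a few lines of plain Python. The listing fields and the function names (`normalize_price`, `to_marketplace_format`, `compose`) are hypothetical and are not taken from the talk; the point is only that each step takes its input as a parameter and returns a new value.

```python
from functools import reduce
from typing import Callable, Dict

Listing = Dict[str, object]


def compose(*steps: Callable) -> Callable:
    """Chain functions left to right into a single function."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def normalize_price(listing: Listing) -> Listing:
    # Returns a new dict; the input is not mutated.
    return {**listing, "price": round(float(listing["price"]), 2)}


def to_marketplace_format(listing: Listing) -> Listing:
    # Maps the internal shape to a hypothetical external marketplace shape.
    return {"external_id": listing["id"], "amount": listing["price"]}


# Two small functions chained into one more business-level function.
export_listing = compose(normalize_price, to_marketplace_format)

original = {"id": "listing-1", "price": "19999.999"}
exported = export_listing(original)
```

Since no step mutates its argument, `original` is unchanged after the call, and the composed function can be re-run on the same input with the same result.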
  21. CONSISTENCY BOUNDARIES
      ● Consistency can only be ensured within a single partition
      ● Consistency is degraded when repartitioning
  22. AGGREGATE ROOT
      ● Is the boundary of consistency
      ● Is a set of records in a single topic with the same partition key
      ● Represents a single business object (for example, a Listing)
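One way to read slide 22 is that the current state of a business object is the result of folding, in log order, all records that share its partition key. The sketch below assumes a made-up event vocabulary (`created`, `updated`, `deleted`) and hypothetical helper names; it is an illustration of the idea, not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, str, dict]  # (partition key, event type, payload)


def apply_event(state: Optional[dict], event: Event) -> Optional[dict]:
    """Fold one event into the current state of a listing."""
    _, event_type, payload = event
    if event_type == "created":
        return dict(payload)
    if event_type == "updated" and state is not None:
        return {**state, **payload}
    if event_type == "deleted":
        return None
    return state


def aggregate(events: List[Event]) -> Dict[str, Optional[dict]]:
    states: Dict[str, Optional[dict]] = {}
    for event in events:  # events are assumed to arrive in log order
        key = event[0]
        states[key] = apply_event(states.get(key), event)
    return states


events = [
    ("listing-1", "created", {"price": 10000}),
    ("listing-2", "created", {"price": 5000}),
    ("listing-1", "updated", {"price": 9500}),
    ("listing-2", "deleted", {}),
]
states = aggregate(events)
```

Each key is folded independently of the others, which mirrors the claim that the aggregate root, and not the whole topic, is the boundary of consistency.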
  23. TOPOLOGY
  24. TOPOLOGY
  25. TOPOLOGY
  26. Functions are based on an iterative business language, not on size
  27. WHAT ABOUT FAULT-TOLERANCE?
  28. "Everything fails all the time."
      Werner Vogels, VP & CTO at Amazon.com
  29. KAFKA
      For every topic with a replication factor of N, Kafka tolerates the failure of up to N-1 nodes.
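The N-1 rule on slide 29 can be checked with a small worked example. The sketch below is a simplification that only asks whether at least one replica is still on a live node; it ignores any settings that require more than one live replica before writes are accepted. The broker names are hypothetical.

```python
from typing import Set


def tolerated_failures(replication_factor: int) -> int:
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


def still_readable(replicas: Set[str], failed: Set[str]) -> bool:
    # Data survives as long as at least one replica is on a live node.
    return len(replicas - failed) > 0


replicas = {"broker-1", "broker-2", "broker-3"}  # hypothetical broker names
two_down = still_readable(replicas, {"broker-1", "broker-2"})
all_down = still_readable(replicas, replicas)
```

With a replication factor of 3, losing two brokers leaves one copy (`two_down` is true), while losing all three does not (`all_down` is false).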
  30. KAFKA STREAMS
      ● Single-node setup: after coming back, the node picks up where processing stopped
      ● Multi-node setup: other nodes take over, but…
        ○ Stateless processor: continues working as soon as nodes are rebalanced
        ○ Stateful processor, simple setup: it can take a while until state is built up
        ○ Stateful processor, hot stand-by setup: local state is being built up, but records are not actually processed until failover happens
  31. LEARNINGS
      ● Function signatures should be unique (only one function should be responsible for a single transformation)
      ● Functions, by design, should not pertain to a single domain, but map between two domains
      ● The consistency boundary is a partition (or a single aggregate root)
  32. LEARNINGS
      ● A system can be seen as a composition of functions, but data needs to be managed by an external system.
      ● A function should be tested for its transformations, not its side effects.
      ● Adding a correlation ID to data sources is really useful for tracing, but boundaries should be chosen carefully.
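The advice on slide 32 to test transformations rather than side effects can be sketched by keeping the decision logic in a pure function and pushing the side effect into a thin wrapper. The function names and the `publish` / `hide` actions are hypothetical, loosely modeled on the business requirement that inactive listings be hidden.

```python
from typing import Callable


def to_export_command(listing: dict) -> dict:
    """Pure transformation: decide what should happen on the marketplace."""
    if listing.get("status") == "active":
        return {"action": "publish", "id": listing["id"]}
    return {"action": "hide", "id": listing["id"]}


def handle(listing: dict, send: Callable[[dict], None]) -> None:
    # Thin shell: the only place where a side effect happens.
    send(to_export_command(listing))


# In a test, the transformation is checked directly; the sender is a plain list.
sent = []
handle({"id": "listing-1", "status": "active"}, sent.append)
```

The transformation can be asserted on without any broker or network, and the wrapper is small enough that a single smoke test with a fake `send` covers it.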
  33. LEARNINGS
      ● Kafka Streams should not be used for external I/O. For example, if you need a service that makes HTTP requests, use another streaming engine for that (we used Akka Streams).
      ● Kafka Streams' learning curve is really steep.
      ● By default, Kafka Streams and Kafka are not there yet for medium-sized messages (around 50 KB). You will need to tweak and optimize the configuration.
  34. LEARNINGS
      ● Backpressure is a natural fit as functions are pull-based.
      ● Single-direction data flow is a mindset that needs to be learned and improved.
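The remark on slide 34 that backpressure follows naturally from pull-based functions can be illustrated with Python generators, where an upstream stage only does work when the downstream stage asks for the next value. This is an analogy in plain Python, not a model of Kafka Streams internals; all names are hypothetical.

```python
from typing import Iterable, Iterator, List

produced: List[int] = []  # records what the source was actually asked to emit


def source() -> Iterator[int]:
    # An unbounded source: a value is only produced when someone asks for it.
    n = 0
    while True:
        produced.append(n)
        yield n
        n += 1


def double(records: Iterable[int]) -> Iterator[int]:
    for record in records:
        yield record * 2


def take(records: Iterator[int], count: int) -> List[int]:
    # The consumer decides how much work happens upstream.
    return [next(records) for _ in range(count)]


result = take(double(source()), 3)
```

Although the source is unbounded, it emits exactly as many values as the consumer pulled, so a slow consumer never causes an upstream backlog.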
  35. THANK YOU
      For questions or suggestions:
      ● Kevin Mas Ruiz (@skmruiz), kmas@ThoughtWorks.com
      ● Alexey Gravanov (@gravanov), alexey.gravanov@scout24.com