This document discusses patterns and anti-patterns of streaming data. It begins by defining streaming as the simultaneous processing of record-by-record data from multiple sources. Some key points covered include the differences between message queues and event logs, how to structure streaming records, delivery guarantees for streaming data, and how to handle failures in distributed streaming systems. The document provides recommendations for designing streaming platforms and lists several recommended books and resources for further reading on streaming concepts.
2. Babylon believes it is possible to put an
accessible and affordable health service in the
hands of every person on earth.
3. What do we mean by streaming?
Image source: https://media.giphy.com/media/l3q2XB76CaWPggiNW/giphy.gif
4. Streaming is the record-by-record
processing of data that is generated
simultaneously by multiple sources
6. Stream processing is a set of techniques
that aim to process streaming data
without having access to the full
dataset
7. A streaming platform is an ecosystem
of tools and documentation aimed at
enabling streaming data and
stream processing
8. Agenda
● Message queue or event log
● Messages, events and commands
● How to structure records
● The right guarantees
● When things go wrong
Image source: https://www.deactivate-account.com/wp-content/uploads/2017/03/Meetup_logo.png
12. Message queue
● Transient: once a message has been read, it’s gone
● Establishes a one-to-one connection
● Requires routing for a one-to-many connection
● Supports a definition of priority
● Has no notion of ordering between messages
Image sources: https://www.rabbitmq.com/img/RabbitMQ-logo.svg
https://zeromq.org/images/logo.gif
13. Event log
● Immutable
● Establishes a one-to-many connection
● Has no definition of priority
● Provides ordering guarantees
● Has no out-of-the-box routing
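The contrast between the two models can be sketched with two toy in-memory classes. This is only an illustration under simplifying assumptions: `MessageQueue`, `EventLog` and the consumer names are hypothetical, and the toy queue happens to be first-in-first-out purely to keep the code short.

```python
from collections import deque


class MessageQueue:
    """Toy queue: a message is removed as soon as one consumer reads it."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def read(self):
        # Reading is destructive: the message is gone afterwards.
        return self._messages.popleft() if self._messages else None


class EventLog:
    """Toy log: records are appended, never removed, and read by offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer name -> next offset to read

    def append(self, record):
        self._records.append(record)

    def read(self, consumer):
        offset = self._offsets.get(consumer, 0)
        if offset >= len(self._records):
            return None
        self._offsets[consumer] = offset + 1
        return self._records[offset]


queue = MessageQueue()
log = EventLog()
for item in ["a", "b"]:
    queue.publish(item)
    log.append(item)

# The queue is drained by whoever reads first.
queue_reads = [queue.read(), queue.read(), queue.read()]
# Each log consumer keeps its own offset and sees every record, in order.
log_reads_x = [log.read("x"), log.read("x")]
log_reads_y = [log.read("y"), log.read("y")]
```

Because the log keeps its records, a second consumer can be added later and still read from the beginning, whereas the queue would need explicit routing to serve more than one reader.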
17. Message / Event / Command
An event is a message containing a statement of
fact.
An event is a message informing multiple listeners
that something has occurred.
19. What about commands?
A command is a message with an explicit or
implicit statement of intent, sent by a producer
to a specific consumer.
The producer is often concerned with the outcome
of the command.
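One way to picture the difference between an event and a command is the sketch below. It is not taken from the talk: the `Bus` class, the `target` field and the appointment names are all hypothetical, chosen only to show fan-out for events against a single addressed handler for commands.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """A statement of fact, named in the past tense."""
    name: str
    payload: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Command:
    """A statement of intent addressed to one specific consumer."""
    name: str
    target: str
    payload: dict = field(default_factory=dict)


class Bus:
    def __init__(self):
        self._listeners = []  # every listener sees every event
        self._handlers = {}   # target name -> single command handler

    def subscribe(self, listener):
        self._listeners.append(listener)

    def register(self, target, handler):
        self._handlers[target] = handler

    def emit(self, event):
        # Events fan out; the producer does not wait for any result.
        for listener in self._listeners:
            listener(event)

    def send(self, command):
        # Commands go to exactly one handler and return its outcome.
        return self._handlers[command.target](command)


bus = Bus()
seen = []
bus.subscribe(lambda e: seen.append(("audit", e.name)))
bus.subscribe(lambda e: seen.append(("search", e.name)))
bus.register("booking", lambda c: f"accepted {c.name}")

bus.emit(Event("AppointmentBooked", {"id": 1}))
outcome = bus.send(Command("BookAppointment", "booking", {"id": 2}))
```

The event is delivered to both listeners and nothing comes back to the producer, while the command reaches only the `booking` handler and its outcome is returned to the sender.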
24. Schemaless
Simplest thing that works
● Onus is on the consumer to discover the structure
● Complexity grows as the number of consumers
grows
Works well if you control both producer and consumer
Image source: https://gph.is/19YqVP5
25. Schema-centric
Treat data as an asset
Provide contracts between producers and
consumers
Schemas evolve
Works well if you have more than one audience
Image source: https://gph.is/g/ZY8BB3a
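A schema used as a contract, and one backward-compatible way for it to evolve, might look like the sketch below. The schema representation, the field names and the `conform` helper are assumptions made for illustration; a real platform would more likely use a serialisation format with its own schema language.

```python
# Hypothetical schema versions: field name -> (type, default or REQUIRED).
REQUIRED = object()

SCHEMA_V1 = {"patient_id": (str, REQUIRED), "status": (str, REQUIRED)}
# v2 adds an optional field with a default, so v1 records stay readable.
SCHEMA_V2 = {**SCHEMA_V1, "channel": (str, "unknown")}


def conform(record, schema):
    """Return a copy of record that satisfies schema, or raise ValueError."""
    result = {}
    for name, (expected_type, default) in schema.items():
        if name in record:
            value = record[name]
        elif default is not REQUIRED:
            value = default
        else:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(value, expected_type):
            raise ValueError(f"field {name} must be {expected_type.__name__}")
        result[name] = value
    return result


# A record written against v1 can still be read by a v2 consumer.
old_record = {"patient_id": "p-1", "status": "booked"}
upgraded = conform(old_record, SCHEMA_V2)
```

Adding a field with a default keeps old records valid under the new schema, which is what lets producers and consumers upgrade at different times.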
28. Validation at broker side
Image source: https://www.flaticon.com/authors/freepik
[Diagram: two publishers, one broker, three consumers]
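The idea of validating at the broker can be sketched with a toy broker that checks each record on publish. `ValidatingBroker`, `RejectedRecord` and the required field names are hypothetical, and the check is deliberately minimal.

```python
class RejectedRecord(Exception):
    """Raised to the publisher when a record fails broker-side validation."""


class ValidatingBroker:
    def __init__(self, required_fields):
        self._required = set(required_fields)
        self._log = []

    def publish(self, record):
        missing = self._required - set(record)
        if missing:
            # The bad record is rejected before it reaches any consumer.
            raise RejectedRecord(f"missing fields: {sorted(missing)}")
        self._log.append(record)

    def consume(self):
        # Every consumer reads the same, already validated, records.
        return list(self._log)


broker = ValidatingBroker(required_fields=["patient_id", "status"])
broker.publish({"patient_id": "p-1", "status": "booked"})

try:
    broker.publish({"patient_id": "p-2"})
    rejected = False
except RejectedRecord:
    rejected = True

visible = broker.consume()
```

With the check in one place, the publisher learns about the problem immediately and none of the consumers has to defend against malformed records.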
29. What about the Kafka ecosystem?
Image source: https://www.striim.com/wp-content/themes/striim-ap/integrations/AVRO.png
Schema Registry + KIP-467
30. Notification Event
An event that tells consumers that “something has
changed”, but not necessarily what that change is.
Often just an event type and a URI.
● Tiny and simple
● Generally requires a callback to extract information
● Potential loss of state
● GOOD FOR: keeping operational persistence layers up to date
(such as a search engine)
31. Event-Carried State Transfer
An event is generated such that a consumer
ideally would not need to contact the source
system in order to do further work.
● Larger event payloads and data duplication
● No callback needed; the consumer can materialise or use the full state
● Full state history available
● More complex consumer
● GOOD FOR: improved resilience, lower latency, better decoupling
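The trade-off between a notification event and event-carried state transfer can be illustrated as follows. The `SOURCE` store, the `fetch` callback and the event shapes are assumptions for the sake of the sketch, not a prescribed record format.

```python
# Hypothetical source system, keyed by resource URI.
SOURCE = {"/patients/1": {"name": "Ada", "status": "booked"}}
callbacks = []  # records every call back to the source system


def fetch(uri):
    callbacks.append(uri)
    return SOURCE[uri]


def notification_event(uri):
    # Only says that something changed, and where to look.
    return {"type": "PatientChanged", "uri": uri}


def state_transfer_event(uri):
    # Carries a copy of the state, so no callback is needed.
    return {"type": "PatientChanged", "uri": uri, "state": dict(SOURCE[uri])}


def handle(event, view):
    """Update a consumer-side view (for example a search index)."""
    if "state" in event:
        view[event["uri"]] = event["state"]
    else:
        view[event["uri"]] = fetch(event["uri"])


view_a, view_b = {}, {}
handle(notification_event("/patients/1"), view_a)
handle(state_transfer_event("/patients/1"), view_b)
```

Both views end up with the same state, but only the notification event costs a call back to the source system, which is where the extra latency and coupling come from.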
32. Domain bound
Based on DDD, this uses the domain's bounded context
to logically separate “external” from “internal” events,
hence the name domain-bound event.
● Must be generated when a change occurs in a domain that
we wish to inform others about
● Should, like a good API, hide the internals of the domain
● Should be well documented, understandable and jargon free
● Should provide for ordering
● GOOD FOR: interdomain or external communication
34. Fire and forget vs ACK
[Diagram: two timelines (t) of records A to E sent to pods. Fire and forget: each record is sent once. Ack: B and D appear more than once.]
Image source: https://www.flaticon.com/authors/smashicons
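A minimal sketch of the two sending styles, assuming a toy channel that loses specific send attempts. `FlakyChannel`, `lost_attempts` and `max_attempts` are hypothetical names, and the channel only models lost records, not lost acknowledgements.

```python
class FlakyChannel:
    """Toy channel that loses the sends whose attempt number is listed."""

    def __init__(self, lost_attempts):
        self._lost = set(lost_attempts)
        self._attempt = 0
        self.delivered = []

    def send(self, record):
        self._attempt += 1
        if self._attempt in self._lost:
            return False  # no acknowledgement comes back
        self.delivered.append(record)
        return True


def fire_and_forget(channel, records):
    for record in records:
        channel.send(record)  # result ignored: lost records stay lost


def send_with_ack(channel, records, max_attempts=5):
    for record in records:
        for _ in range(max_attempts):
            if channel.send(record):
                break  # acknowledged, move on to the next record
        else:
            raise RuntimeError(f"gave up on {record}")


records = ["A", "B", "C", "D", "E"]

lossy = FlakyChannel(lost_attempts=[2, 4])
fire_and_forget(lossy, records)

retried = FlakyChannel(lost_attempts=[2, 4])
send_with_ack(retried, records)
```

Waiting for the acknowledgement costs extra sends and time, but every record arrives. In a real system an acknowledgement can also be lost after the record was delivered, in which case the retry produces a duplicate.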
35. Delivery guarantees
At most once
  Producer: producer.send(<RECORD>)
  Consumer: consumer.poll()
            consumer.commit()
            <HANDLE RECORD>
At least once
  Producer: producer.send(<RECORD>)
  Consumer: consumer.poll()
            <HANDLE RECORD>
            consumer.commit()
Exactly once
  Producer: producer.beginTransaction()
            producer.send(<RECORD>)
            producer.endTransaction()
  Consumer: consumer.poll()
            <HANDLE RECORD>
            consumer.commit()
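The effect of the commit position on the consumer side can be simulated with a toy partition and a consumer that crashes once between the two steps. `Partition`, `run`, `crash_on` and the record names are hypothetical; the comments map each line to the step it stands for.

```python
class Crash(Exception):
    """Simulates the consumer process dying mid-record."""


class Partition:
    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # offset a restarted consumer resumes from


def run(partition, handled, commit_first, crash_on=None):
    """Consume until the log is exhausted or a simulated crash occurs."""
    position = partition.committed
    while position < len(partition.records):
        record = partition.records[position]    # consumer.poll()
        if commit_first:
            partition.committed = position + 1  # consumer.commit()
            if record == crash_on:
                raise Crash(record)
            handled.append(record)              # <HANDLE RECORD>
        else:
            handled.append(record)              # <HANDLE RECORD>
            if record == crash_on:
                raise Crash(record)
            partition.committed = position + 1  # consumer.commit()
        position += 1


def consume_with_one_crash(commit_first):
    partition = Partition(["r1", "r2", "r3"])
    handled = []
    try:
        run(partition, handled, commit_first, crash_on="r2")
    except Crash:
        pass
    run(partition, handled, commit_first)  # restart after the crash
    return handled


at_most_once = consume_with_one_crash(commit_first=True)
at_least_once = consume_with_one_crash(commit_first=False)
```

Committing before handling loses `r2` when the crash hits, while committing after handling processes `r2` twice, so an at-least-once consumer needs handling that tolerates duplicates.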
44. Last but not least
This is a shortlist of the questions we
have asked ourselves while building a
streaming platform
45. Recommended reading
● Designing Data-Intensive Applications: The
Big Ideas Behind Reliable, Scalable, and
Maintainable Systems
by Martin Kleppmann
● Kafka - The Definitive Guide
by Neha Narkhede
● Kafka Streams in Action: Real-time apps and
microservices with the Kafka Streams API
by William P. Bejeck Jr
● What do you mean by “Event-Driven”?
by Martin Fowler
Image source: https://media.giphy.com/media/uMPjuulT3rpRe/giphy.gif