@samuelroze
Event Streaming: some things you want to know about.
Introduction
• My name is Samuel Rozé, I am VPoE at Birdie Care.
Core Team member of Symfony, for my work on Messenger.
• This is an architecture talk.
• We will briefly discuss the value of using event
streaming.
• We will see the consequences of living the dream of managing distributed
systems. TL;DR: plenty of things will go wrong.
1. Why …is event streaming even interesting?
Your product works… you split your services.
Your services are talking to each other.
Now you need to introduce targeted discounts…
How will this service get its data?
1. Pull via an API
• A lot of data will be moved each
time the “discount” service computes
discounts for a customer.
• “Discounts” is able to work only
when the 3 other services are
available (cascading failures).
• “Discounts” needs to know where
the other services are and
how to talk to them.
2. Using “batch”
• Potentially contains loads of duplicated
information (full load each time or the
period is “over X days”)
• Not real-time. “Wait a few days for your
marketing preferences to be propagated”
• A lot can go wrong when every service
has to produce its export properly every night.
Event streaming
• Events are flowing in real-time, from and
to multiple services.
• To receive a specific event, services don’t
have to know who is sending events, just
that they can expect these messages.
• Much higher availability, because data
reaches each service that requires it whenever
that service is online.
• (When the bus persists messages) New
consumers create their context by going
through all the events that have happened in
the system.
• Writing code that works well with the
nature of the distributed system is hard.
• You need real governance over how the
message bus is used: messages are your
new API contracts.
Event streaming, as a diagram
Everything we are going to talk about is true for…
2. What will go wrong? It’s not “if”.
Let’s start with a simple use-case.
Here we write to the `Basket` entity, for example.
Your message is sent to a queue
Problem A: Are you sure that the message was sent to the queue?
A. Are you sure that your message has been sent?
A. Distributed transactions are not really a thing.
A. What might happen if we don’t care about that?
• Your local “basket” table might have the new product but no worker receives
the “ProductAddedToBasket” event.
• Your local “basket” table might NOT have the new product but workers have
received the “ProductAddedToBasket” event (most likely if you use
database transactions for the entire request)
• Imagine the event being about “payment successful” or even potentially
life-changing, like “fall in the home has been detected”… 😬
A. The outbox pattern
A. Publishing messages to the bus consistently
• In a nutshell, write your message & side effects to your database as part of
one transaction and then get something else to pull the message from the
database and send it to your queue.
• With Symfony, the simplest is actually to use the Doctrine transport for
Symfony Messenger, with the Doctrine transaction middleware.
• Alternatively, you can use a dedicated library for this.
• EventSaucePHP/DoctrineOutboxMessageDispatcher
• italolelis/outboxer
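The outbox idea described above can be sketched outside of any framework. This is a minimal illustration in Python with SQLite, not the Symfony Messenger or Doctrine implementation. The table names (`basket_items`, `outbox`), the `sent` flag and the `publish` callback are all assumptions made for the example.

```python
import json
import sqlite3


def create_schema(db):
    # Hypothetical tables: one for business data, one for the outbox.
    db.execute("CREATE TABLE basket_items (basket_id TEXT, product_id TEXT)")
    db.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "type TEXT, payload TEXT, sent INTEGER NOT NULL DEFAULT 0)"
    )


def add_product_to_basket(db, basket_id, product_id):
    """Write the side effect and the message in ONE local transaction."""
    with db:  # commits on success, rolls back if anything raises
        db.execute(
            "INSERT INTO basket_items (basket_id, product_id) VALUES (?, ?)",
            (basket_id, product_id),
        )
        db.execute(
            "INSERT INTO outbox (type, payload) VALUES (?, ?)",
            (
                "ProductAddedToBasket",
                json.dumps({"basket_id": basket_id, "product_id": product_id}),
            ),
        )


def relay_outbox(db, publish):
    """The 'something else': push pending rows to the bus, then mark them sent."""
    rows = db.execute(
        "SELECT id, type, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        # If publish raises, the row stays pending and is retried on the next run.
        publish(event_type, json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after `publish` but before marking the row as sent, the message is published again on the next run. The outbox gives you "at least once", which is exactly why Problem B matters.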
A. Using the Doctrine transport
Problem B: You will receive duplicated messages.
B. “At least once delivery”
B. What might happen if we don’t care about that?
• You consume “ProductAddedToBasket” twice: the product is added twice
instead of just once (as per the user request).
• Depending on your business logic, this might matter a lot. For example,
what if it is about “Money added to bank account” or “Medication dose
taken”?
B. You need some idempotence.
• You will receive the same message multiple times; it’s just a matter of time.
• There isn’t much a framework can do: you own the business logic, so you
need to handle it yourself.
• Use an idempotency key: a key that represents a single message and
allows you to know whether or not it’s been processed already.
• (By the way, this also applies to HTTP requests. Stripe’s API is a good
example.)
B. Using the idempotency key in the handler
• One option is to make the idempotency key part of your message. Your
team needs to know why it is useful and how to use it.
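One possible way to use such a key in a handler, sketched in Python with SQLite. The `processed_messages` table, the `idempotency_key` field name and the message shape are assumptions made for the example. The point is that the key and the side effect are committed together, so a redelivered message hits the unique constraint and changes nothing.

```python
import sqlite3


def create_schema(db):
    # Hypothetical tables: processed keys and the state the handler updates.
    db.execute("CREATE TABLE processed_messages (idempotency_key TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE basket_items (basket_id TEXT, product_id TEXT)")


def handle_product_added(db, message):
    """Apply the message at most once; return False when it is a duplicate."""
    try:
        with db:  # key and side effect commit together, or not at all
            db.execute(
                "INSERT INTO processed_messages (idempotency_key) VALUES (?)",
                (message["idempotency_key"],),
            )
            db.execute(
                "INSERT INTO basket_items (basket_id, product_id) VALUES (?, ?)",
                (message["basket_id"], message["product_id"]),
            )
    except sqlite3.IntegrityError:
        # The key already exists: this message was processed before.
        return False
    return True
```

Storing the key in the same database as the state is what makes this safe. A key kept in a separate store could be recorded without the side effect, or the other way around.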
B. How to use your “idempotency key”
Problem C: Processing messages in parallel.
C. Concurrently processing messages
C. What might happen if we don’t care about that?
• You will lose some state in whatever you updated based on the events, at
some point.
• The easiest solution: don’t process things concurrently. But that is not really
practical when things start to scale.
C. Locking! Optimistic vs Pessimistic
The optimist…
• Assumes that everything will go
right most of the time.
• It validates that everything has
happened as expected when
writing its state to a consistent
storage.
• a.k.a. HTTP’s If-Match, …
The pessimist…
• Believes that in most cases, this
won’t work.
• Before doing any work, it ensures
nobody else is doing it.
• a.k.a. “mutex”, “advisory locks”,
etc…
C. Pessimistic locking with Symfony Lock
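A minimal sketch of the pessimistic approach, written in Python rather than with Symfony Lock. The `exclusive` helper and the in-memory `baskets` store are assumptions made for the example. The lock only works inside a single process; workers on several machines would need a shared lock store instead.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

_locks = defaultdict(threading.Lock)  # one lock per resource name
_registry_guard = threading.Lock()    # protects the lock registry itself


@contextmanager
def exclusive(resource):
    """Block until nobody else is working on this resource."""
    with _registry_guard:
        lock = _locks[resource]
    with lock:
        yield


baskets = defaultdict(list)  # stand-in for persistent state


def handle_product_added(basket_id, product_id):
    # Take the lock BEFORE doing any work: read, modify, write.
    with exclusive(f"basket:{basket_id}"):
        items = list(baskets[basket_id])
        items.append(product_id)
        baskets[basket_id] = items
```

Locking per basket rather than globally keeps unrelated baskets processing in parallel.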
C. Optimistic locking with Doctrine’s “versions”
What’s happening behind the scenes
with optimistic locking:
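Roughly, a version column turns every write into a conditional update: the row is only changed if its version is still the one that was read. The sketch below shows that mechanism in Python with SQLite. It is an illustration, not Doctrine's code, and the `baskets` table, `save_total` and `StaleVersionError` are names made up for the example.

```python
import sqlite3


class StaleVersionError(Exception):
    """Someone else updated the row since we read it."""


def create_schema(db):
    db.execute(
        "CREATE TABLE baskets ("
        "id TEXT PRIMARY KEY, total INTEGER NOT NULL, version INTEGER NOT NULL)"
    )


def read_basket(db, basket_id):
    return db.execute(
        "SELECT total, version FROM baskets WHERE id = ?", (basket_id,)
    ).fetchone()


def save_total(db, basket_id, new_total, expected_version):
    """Write only if the version is unchanged, and bump it in the same statement."""
    with db:
        cursor = db.execute(
            "UPDATE baskets SET total = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_total, basket_id, expected_version),
        )
    if cursor.rowcount == 0:
        # No row matched: the version moved on, so our read was stale.
        raise StaleVersionError(basket_id)
```

A handler that gets `StaleVersionError` would typically re-read the row and retry, or let the message be redelivered.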
Problem D: Message ordering
D. Know when there is no ordering guarantee
D. What might happen if we don’t care about that?
• Hopefully your business logic doesn’t rely too much on the events being
ordered… make sure this is true.
• For example, we rely on “access_granted” and “access_revoked” events to
configure some permission rules. If they are consumed in the wrong
order… the result means something entirely different 💥
D. There are buses that guarantee order
D. They scale using partitions
D. Order guaranteed means blocking messages.
D. For the infrastructure to guarantee ordering…
• You need a message bus that supports it (Kafka, SQS FIFO, Kinesis, etc.).
• You need to carefully design your partitions (or “shards”) so that you know all
messages of a specific aggregate will always go to the same partition (a.k.a.
routing keys).
• You need to carefully manage all the errors. You can’t afford a bad
message blocking an entire topic. But you can’t really postpone just a single message…
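The routing-key idea can be sketched as a stable hash of the aggregate identifier. This is an illustration in Python and not the partitioner of any particular bus. The `aggregate_id` field and the partition count of 8 are assumptions made for the example.

```python
import hashlib


def partition_for(routing_key: str, partition_count: int) -> int:
    """Map a routing key to a partition, identically in every process."""
    # hashlib is used because Python's built-in hash() of a str is salted
    # per process, which would send the same key to different partitions.
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(event: dict, partition_count: int = 8) -> int:
    # All events of one aggregate share a key, hence a partition, hence an order.
    return partition_for(event["aggregate_id"], partition_count)


granted = {"type": "access_granted", "aggregate_id": "user-42"}
revoked = {"type": "access_revoked", "aggregate_id": "user-42"}
same_partition = route(granted) == route(revoked)  # True: same aggregate
```

Ordering is then only guaranteed within a partition. Events of two different aggregates can still be consumed in any relative order.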
To wrap up… A few learnings (hopefully).
We’ve seen a few ways it can go wrong.
• When publishing a message to a bus.
Outbox pattern FTW.
• When receiving the same message multiple times.
Idempotence FTW.
• When concurrently consuming messages.
You need to use optimistic or pessimistic locking.
• You can request ordering from your infrastructure.
But it needs careful partition design & error management.
Thank you!
Want to read more?
• Martin Kleppmann’s book, Designing Data-Intensive Applications.
https://dataintensive.net
• https://multithreaded.stitchfix.com/blog/2017/06/26/patterns-of-soa-idempotency-key/
• https://microservices.io/patterns/data/transactional-outbox.html

Event streaming: what will go wrong? (Symfony World 2020)
