12. @jeppec
What’s wrong with distributed transactions?
• Transactions lock resources while active
• Services are autonomous
• Can’t be expected to finish within a certain time interval
• Locking keeps other transactions from completing their job
• Locking doesn’t scale
• X Phase Commit is fragile by design
13. @jeppec
Learnings
The bad
• Task/Activity services need to perform UPDATES across multiple data/entity services. This requires distributed transactions
• The more synchronous request-response remote calls you have to make, the more it hurts performance and latency.
• Robustness is lower. If one data/entity service is down it can take down many other services.
• Coupling is higher. Multiple task/activity services manage the same CRUD data/entity services
• Cohesion is likely lower as multiple task/activity services need to replicate entity related logic
20. @jeppec
Accidental complexity from distributed service integration
[Diagram: the UI calls Process-Paid-Order on the Sales component, which calls Reserve-Items on the Warehouse component and Send-Invoice on the Billing System. Local transaction between 2 Components.]
23. @jeppec
Synchronous calls lower our tolerance for faults
• When you get an IO error
• When servers crash or restart
• When databases are down
• When deadlocks occur in our databases
• Do you retry?
With synchronous style Service interaction we can lose business data if there’s no automatic retry
Or we risk creating data more than once if the operation isn’t idempotent*
[Diagram: Client and Server exchanging a Request, Processing and a Response, followed by a Duplicated Request, a second Processing and a Duplicated Response. The same message can be processed more than once.]
*Idempotence describes the quality of an operation in which result and state do not change if the operation is performed more than once
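One way to make a retried call safe is to let the server remember the response it produced for each request id, so that a duplicated request returns the stored response instead of being processed again. The following is a minimal sketch under that assumption; the class name IdempotentInvoiceService, the sendInvoice method and the requestId parameter are hypothetical and not taken from the slides, and the in-memory map stands in for durable storage.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: all names are illustrative, not from the original slides.
final class IdempotentInvoiceService {
    // Response produced for each request id (in memory only; a real service
    // would need durable storage to survive a restart).
    private final Map<String, String> responsesByRequestId = new ConcurrentHashMap<>();
    private final AtomicInteger invoicesSent = new AtomicInteger();

    // Processes a request at most once per requestId. A duplicated request
    // gets the response stored by the first call.
    String sendInvoice(String requestId, String orderId) {
        return responsesByRequestId.computeIfAbsent(requestId, id -> {
            int number = invoicesSent.incrementAndGet(); // the side effect we must not repeat
            return "invoice-" + number + "-for-" + orderId;
        });
    }

    int invoicesSent() {
        return invoicesSent.get();
    }

    public static void main(String[] args) {
        IdempotentInvoiceService billing = new IdempotentInvoiceService();
        String first = billing.sendInvoice("req-1", "order-14237");
        String retry = billing.sendInvoice("req-1", "order-14237"); // duplicated request
        System.out.println(first.equals(retry));    // true
        System.out.println(billing.invoicesSent()); // 1
    }
}
```

With this in place the client can retry after an IO error without knowing whether the first attempt reached the server.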
24. @jeppec
Sales:Process-Paid-Order()
    call Warehouse:Reserve-Items()
    call Billing:Send-Invoice()
    if (Billing:Call-Failed:Too-Busy?)
        Wait-A-While()
        call Billing:Send-Invoice()
        if (Billing:Call-Failed:Too-Busy?)
            Wait-A-Little-While-Longer()
            call Billing:Send-Invoice()
            if (Billing:Call-Failed:IO-Error?)
                Save-We-Need-Check-If-Call-Billing-Succeeded-After-All
                AND We-Need-To-Retry call Sales:Process-Paid-Order and call Warehouse:Reserve-Items
                AND Tell-Customer-That-This-Operation-Perhaps-Went-Well
    if (Billing:Call-Went-Well?)
        commit()
Accidental complexity from distributed service integration
29. @jeppec
Using Business Events to drive Business Processes
Sales Service
Shipping
Billing
Sales
Customers
MessageChannel
Online Ordering System
Web Shop
(Composite UI)
Billing Service
Warehouse Service
<<External>>
Order Paid
MarkOrderAsPaid
The sales fulfillment processing can now begin…
Cmd Handler
Order Paid
Apply
30. @jeppec
Choreographed Event Driven Processes
Sales Service
Order Paid
Billing Service
Shipping Service
Warehouse Service
Online Ordering System
MessageChannel (e.g. a Topic)
Order Paid
Customer
Invoiced
Order Paid
Items
Reserved
Order Paid
Shipping process works as a Finite State Machine (WorkFlow) handling the life cycle of Shipping and thereby forms a very central new Aggregate in the System
Items
Reserved
31. @jeppec
Things are not quite the same
In a distributed system the order in which messages arrive is not guaranteed
In a distributed system message delivery can and will fail!
Depending on the delivery guarantee, messages can be delivered:
• At Most Once – if you don’t care about losing messages
  • Page visits
  • Ad views
• Exactly Once
  • Not really possible
• At Least Once
  • For everything else – which is why:
Everything that can handle messages must be built with idempotency in mind!
32. @jeppec
Message Handling
public class OrderShippingProcess extends BusMessageHandler {
    @BusMessageHandler
    private void on(OrderPaid orderPaid, ItemsReserved itemsReserved) {
        ShippingDetails shippingDetails = getShippingDetailsForOrder(orderPaid.orderId);
        ….
        printShippingLabel(orderPaid.orderId, shippingDetails.address);
    }
    …
}
Must also be idempotent
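The handler above needs both OrderPaid and ItemsReserved before it can print a label. As a rough illustration of how such a process could tolerate the two events arriving in either order, and being redelivered, without printing a label twice, consider the sketch below. It does not use the bus framework from the slides; the in-memory sets, the event records and the printedLabels list are assumptions made for the example, and the class is not thread-safe.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the bus framework used in the slides.
final class OrderShippingSketch {
    record OrderPaid(String orderId) {}
    record ItemsReserved(String orderId) {}

    // Process state (in memory only; a real process would persist it).
    private final Set<String> paid = new HashSet<>();
    private final Set<String> reserved = new HashSet<>();
    private final Set<String> shipped = new HashSet<>();
    final List<String> printedLabels = new ArrayList<>();

    void on(OrderPaid event) {
        paid.add(event.orderId());
        tryShip(event.orderId());
    }

    void on(ItemsReserved event) {
        reserved.add(event.orderId());
        tryShip(event.orderId());
    }

    // Prints the label only when both events have been seen, and only once:
    // Set.add returns false when the order id is already marked as shipped.
    private void tryShip(String orderId) {
        boolean ready = paid.contains(orderId) && reserved.contains(orderId);
        if (ready && shipped.add(orderId)) {
            printedLabels.add("label-for-" + orderId);
        }
    }

    public static void main(String[] args) {
        OrderShippingSketch process = new OrderShippingSketch();
        process.on(new ItemsReserved("14237")); // arrives before OrderPaid
        process.on(new OrderPaid("14237"));
        process.on(new OrderPaid("14237"));     // redelivered duplicate
        System.out.println(process.printedLabels); // [label-for-14237]
    }
}
```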
38. @jeppec
If we primarily model around nouns/entities we can easily violate the SRP (Single Responsibility Principle)
Where a change to requirements is likely to require changes to multiple entity classes
52. @jeppec
Order Id
• Customer/customer-id
Linked to Order Id:
• Order Delivery address
• Order Shipping method
Linked to Order Id:
• Order item quantity
• Order item subtotal
• Order shipping price
• Order total
56. @jeppec
[Diagram: a Service with several instances of AC - 1, AC - 2, AC - 3 and AC - 4, connected through a Broker (Kafka, RabbitMQ, JMS)]
57. @jeppec
[Diagram: a Service with four instances each of AC - 1, AC - 2, AC - 3 and AC - 4, connected through QBus]
59. @jeppec
Topics
Bus features
• Decouples publisher from subscribers
• Provides temporal decoupling
  • If a subscriber is unavailable it will receive its messages when it comes online
61. @jeppec
Topics
Bus features
But how do we handle:
• Coding errors in Subscribers?
• New Subscribers that didn’t exist when the events were originally published?
62. @jeppec
Client handled subscriptions
• Highly resilient pattern for an Event Driven Architecture that’s backed by Event-Sourced AC’s
• In this model the publisher of the Events is responsible for the durability of all its Events, typically to an EventStore/EventLog.
• Each client (subscriber) maintains durable information about the last event it has received from each publisher.
• Whenever the client starts up it makes a subscription to the publisher, stating from which point in time it wants events published/streamed to it.
• This effectively means that the publisher can remain simple, the client (subscriber) can remain simple, and we don’t need additional sophisticated broker infrastructure such as Kafka+ZooKeeper.
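The division of responsibility described above can be sketched roughly as follows. This is only an illustration: the Publisher and Subscriber classes are hypothetical, the resubscription point is modelled as a position in the event log rather than a point in time, the lists stand in for the durable EventStore and the subscriber's local storage, and only catch-up on start is shown (no live streaming).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of client handled subscriptions; names are illustrative.
final class ClientHandledSubscriptionSketch {

    // The publisher owns the durability of its events (a list stands in for the EventStore).
    static final class Publisher {
        private final List<String> eventLog = new ArrayList<>();

        void publish(String event) {
            eventLog.add(event);
        }

        // Streams every stored event from the requested position onwards.
        void subscribe(int fromPosition, Subscriber subscriber) {
            for (int position = fromPosition; position < eventLog.size(); position++) {
                subscriber.handle(position, eventLog.get(position));
            }
        }
    }

    // The subscriber tracks the last position it has handled (would be stored durably).
    static final class Subscriber {
        private int nextPosition = 0;
        final List<String> handled = new ArrayList<>();

        // Called on every start-up: resubscribe from the last tracked point.
        void start(Publisher publisher) {
            publisher.subscribe(nextPosition, this);
        }

        void handle(int position, String event) {
            if (position < nextPosition) {
                return; // already handled, ignore the duplicate
            }
            handled.add(event);
            nextPosition = position + 1;
        }
    }

    public static void main(String[] args) {
        Publisher sales = new Publisher();
        Subscriber customerService = new Subscriber();
        sales.publish("OrderCreated");
        sales.publish("ProductAdded");
        customerService.start(sales);   // catches up on both events
        sales.publish("OrderAccepted"); // published while the subscriber is offline
        customerService.start(sales);   // restart: only the new event is streamed
        System.out.println(customerService.handled); // [OrderCreated, ProductAdded, OrderAccepted]
    }
}
```

Because the subscriber decides where to resume, a new subscriber can start at position 0 and receive events that were published before it existed.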
63. @jeppec
Client handled subscriptions
Publisher
Subscriber A
Local storage
EventStore
Subscriber B
Local storage
Topic
Subscription
Topic
Subscription
TopicSubscriptionHandler
TopicSubscriptionHandler
EventEvent
Event Event
EventBus
Event
Event
Distributed Event Bus, which ensures that live events published on an AC node in the cluster can be seen by all AC’s of the same type
Single Instance Subscriber, which ensures that only one instance of Subscriber B has an active subscription(s). Other instances of the same subscriber are hot-standby
<<Topic Subscriber>>
Customer_Service:Some_Ac:OrderEvents
<<Topic Publisher>>
Sales_Service:OrderEvents
65. @jeppec
[Diagram: timeline with events at 07:39, 07:40, 07:41, 07:45, 07:46 and 07:50]
DomainEvents Table
Type   Aggregate Identifier  Sequence Number  Timestamp        Event Identifier  EventType       SerializedEvent
Order  14237                 0                2014-01-06 7:39  {Guid-1}          OrderCreated    <serialized event>…
Order  14237                 1                2014-01-06 7:40  {Guid-2}          ProductAdded    <serialized event>…
Order  14237                 2                2014-01-06 7:41  {Guid-3}          ProductAdded    <serialized event>…
Order  14237                 3                2014-01-06 7:45  {Guid-4}          ProductRemoved  <serialized event>…
Order  14237                 4                2014-01-06 7:46  {Guid-5}          ProductAdded    <serialized event>…
Order  14237                 5                2014-01-06 7:50  {Guid-6}          OrderAccepted   <serialized event>…
Order  14238                 0                2014-01-07 9:10  {Guid-X}          OrderCreated    <serialized event>…
66. @jeppec
Event Replaying
Type   Aggregate Identifier  Sequence Number  Timestamp        Event Identifier  EventType       SerializedEvent
Order  14237                 0                2014-01-06 7:39  {Guid-1}          OrderCreated    <serialized event>…
Order  14237                 1                2014-01-06 7:40  {Guid-2}          ProductAdded    <serialized event>…
Order  14237                 2                2014-01-06 7:41  {Guid-3}          ProductAdded    <serialized event>…
Order  14237                 3                2014-01-06 7:45  {Guid-4}          ProductRemoved  <serialized event>…
Order  14237                 4                2014-01-06 7:46  {Guid-5}          ProductAdded    <serialized event>…
Order  14237                 5                2014-01-06 7:50  {Guid-6}          OrderAccepted   <serialized event>…
[Diagram: replaying these events yields an Order (Accepted: true) with two Orderlines]
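Replaying the six events of Order 14237 can be sketched as folding them, in sequence-number order, into an in-memory Order. This is an illustrative sketch only: the event records, the Order class and the product ids "A", "B" and "C" are made up, since the slides do not show the serialized event contents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; event payloads (product ids) are invented for illustration.
final class OrderReplaySketch {
    interface OrderEvent {}
    record OrderCreated() implements OrderEvent {}
    record ProductAdded(String productId) implements OrderEvent {}
    record ProductRemoved(String productId) implements OrderEvent {}
    record OrderAccepted() implements OrderEvent {}

    static final class Order {
        boolean accepted = false;
        final List<String> orderLines = new ArrayList<>();

        // Applies one event to the in-memory state; no other side effects.
        void apply(OrderEvent event) {
            if (event instanceof ProductAdded added) {
                orderLines.add(added.productId());
            } else if (event instanceof ProductRemoved removed) {
                orderLines.remove(removed.productId());
            } else if (event instanceof OrderAccepted) {
                accepted = true;
            }
            // OrderCreated carries no state in this sketch.
        }

        // Rebuilds an Order by applying its events in sequence-number order.
        static Order replay(List<OrderEvent> eventsInSequenceOrder) {
            Order order = new Order();
            eventsInSequenceOrder.forEach(order::apply);
            return order;
        }
    }

    public static void main(String[] args) {
        Order order = Order.replay(List.of(
            new OrderCreated(),       // sequence 0
            new ProductAdded("A"),    // sequence 1
            new ProductAdded("B"),    // sequence 2
            new ProductRemoved("A"),  // sequence 3
            new ProductAdded("C"),    // sequence 4
            new OrderAccepted()));    // sequence 5
        System.out.println(order.accepted + " " + order.orderLines); // true [B, C]
    }
}
```

Three additions and one removal leave two order lines, which matches the accepted Order with two Orderlines shown on the slide.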
67. @jeppec
Topics
Bus features
bus.registerReplayableTopicPublisher(InternalPricingEvents.TOPIC_NAME,
    replayFromAggregate(Pricing.class)
        .dispatchAggregateEventsOfType(
            InternalPricingEvents.class
        )
);
bus.subscribeTopic(SERVICE_AC_ID.topicSubscriber("Pricing"),
    InternalPricingEvents.TOPIC_NAME,
    new PricingTopicSubscription(bus));
69. @jeppec
Topics
Bus features
Features:
• The Bus provides automatic and durable handling of Redelivery in case of message handling failure through a Redelivery Policy
  • Exponential Back-off
  • Max retries
  • Dead letter/Error Queue
• Support for resubscription at any point in the timeline of an Event Stream
• Automatic tracking of resubscription points - aka. resubscribe at last tracked point
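A Redelivery Policy with exponential back-off, a maximum number of retries and a dead letter queue could look roughly like the sketch below. It is not the Bus implementation from the slides: the RedeliverySketch class and its parameters are hypothetical, nothing is durable, and the computed delays are recorded in a list instead of actually waiting so that the example runs instantly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a redelivery policy; not the Bus from the slides.
final class RedeliverySketch {
    final List<String> deadLetterQueue = new ArrayList<>();
    final List<Long> delaysMillis = new ArrayList<>(); // delays a real bus would wait

    private final int maxRetries;
    private final long initialDelayMillis;

    RedeliverySketch(int maxRetries, long initialDelayMillis) {
        this.maxRetries = maxRetries;
        this.initialDelayMillis = initialDelayMillis;
    }

    // Tries the handler once plus up to maxRetries redeliveries, doubling the
    // delay between attempts. A message that still fails goes to the dead letter queue.
    boolean deliver(String message, Consumer<String> handler) {
        long delay = initialDelayMillis;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                handler.accept(message);
                return true;
            } catch (RuntimeException handlingFailure) {
                if (attempt < maxRetries) {
                    delaysMillis.add(delay); // recorded instead of sleeping
                    delay *= 2;              // exponential back-off
                }
            }
        }
        deadLetterQueue.add(message);
        return false;
    }

    public static void main(String[] args) {
        RedeliverySketch bus = new RedeliverySketch(3, 100);
        bus.deliver("OrderPaid", message -> {
            throw new IllegalStateException("handler is broken");
        });
        System.out.println(bus.delaysMillis);    // [100, 200, 400]
        System.out.println(bus.deadLetterQueue); // [OrderPaid]
    }
}
```

Since a message may be handed to the handler several times, the handler itself still has to be idempotent.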
72. @jeppec
Bus features
Features:
• Support for sending a message to a single consumer
• Default pattern is Competing Consumers
• The Bus provides durability for all messages sent on a Queue
• The Bus provides automatic and durable handling of Redelivery in case of message handling failure through a Redelivery Policy
  • Exponential Back-off
  • Max retries
  • Dead letter/Error Queue
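The Competing Consumers pattern mentioned above can be illustrated with plain JDK classes: several consumers poll the same queue, and each message is handed to exactly one of them. This sketch is an assumption-laden stand-in, not the Bus API; the queue is in memory rather than durable, the class and method names are hypothetical, and messages are assumed to be distinct strings.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of Competing Consumers on an in-memory queue.
final class CompetingConsumersSketch {

    // Returns, for each message, the name(s) of the consumer(s) that handled it.
    static Map<String, String> consumeAll(List<String> messages, int consumerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(messages);
        Map<String, String> handledBy = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumerCount);
        for (int i = 0; i < consumerCount; i++) {
            String consumer = "consumer-" + i;
            pool.execute(() -> {
                String message;
                // poll() removes the head atomically, so each message goes to one consumer.
                while ((message = queue.poll()) != null) {
                    handledBy.merge(message, consumer, (a, b) -> a + "," + b);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handledBy;
    }

    public static void main(String[] args) {
        Map<String, String> result = consumeAll(List.of("m1", "m2", "m3", "m4"), 2);
        // Which consumer gets which message varies between runs.
        result.forEach((message, consumer) -> System.out.println(message + " -> " + consumer));
    }
}
```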
Durable Queues
74. @jeppec
Bus features
Notifications:
• Durable notifications with failover support
• bus.notify(notificationQueue, notificationMessage)
Notifications
Distributed
Broadcast
Broadcast:
• Broadcast a Message to all AC’s in the cluster
• Broadcast a Message to a UI client (all, per user, per privilege)
75. @jeppec
Bus features
Single Instance Task:
• Ensures that only one instance of a Task is active in the cluster at any one time
• Other tasks of the same type are in hot standby
• Used e.g. to group multiple subscribers, ensuring that they are either all active or all on standby.
• Used by our ViewRepositories
Distributed
SingleInstanceTask
bus.createClusterSingleInstanceTask("MyTask",
    new MyTask()); // Where MyTask implements Lifecycle
76. @jeppec
Bus features
Process Manager
• Defines durable business processes as a flow of Events
Sagas
Process Manager
[Diagram: over a MessageChannel (e.g. a Topic), the Sales Service publishes Order Accepted, the Billing Service publishes Customer Billed, and the Order Fulfilment (Saga/Process-Manager) publishes Order Approved, which reaches the Shipping Service]
77. @jeppec
Application features
Workflow Manager:
• Defines the Tasks that require human intervention, such as:
• Approvals (e.g. Contract approval)
• Assistance/Help with a business problem
• Incident handling (e.g. a technical problem identified by a developer)
• Authorization (e.g. request manager approval)
• Reminders
• Common tasks supported: Claiming Tasks, Escalating Tasks, Completing Tasks
Workflow Manager
78. @jeppec
Correlation logging
Our infrastructure automatically captures information about an API call or a Message delivery in the form of a CallContext:
• Message Id - the Id of the message/call being handled.
• Correlation Id – an Id that binds API calls and message handlings across
multiple services/AC’s together
• When – time of the “call”
• Who – which user performed the “call”
• Meta Data: Which Event or Command caused this “call”
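The CallContext fields listed above could be modelled along the following lines. The field names follow the slide, but the record itself, the start and next factory methods, the use of random UUIDs and the logPrefix format are assumptions made for illustration, not the actual infrastructure.

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical sketch of a CallContext; the real infrastructure is not shown in the slides.
final class CallContextSketch {

    record CallContext(String messageId,      // id of the message/call being handled
                       String correlationId,  // binds calls and message handlings together
                       Instant when,          // time of the "call"
                       String who,            // user who performed the "call"
                       String causedBy) {     // meta data: the Event or Command behind the "call"

        // Starts a new chain of calls with a fresh correlation id.
        static CallContext start(String who, String causedBy) {
            return new CallContext(UUID.randomUUID().toString(),
                                   UUID.randomUUID().toString(),
                                   Instant.now(), who, causedBy);
        }

        // Context for a follow-up call or message: new message id, same correlation id.
        CallContext next(String causedBy) {
            return new CallContext(UUID.randomUUID().toString(),
                                   correlationId, Instant.now(), who, causedBy);
        }

        // Prefix that could be added to every log line written while handling the call.
        String logPrefix() {
            return "[corr=" + correlationId + " msg=" + messageId + " who=" + who + "]";
        }
    }

    public static void main(String[] args) {
        CallContext apiCall = CallContext.start("alice", "MarkOrderAsPaid");
        CallContext handling = apiCall.next("OrderPaid");
        System.out.println(apiCall.logPrefix() + " command received");
        System.out.println(handling.logPrefix() + " event handled");
    }
}
```

Filtering the logs of all services on one correlation id then shows the whole chain of calls and message handlings.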
The layers are technically decoupled, but not in terms of data or time.
Each layer has a technical integration layer that makes it harder to understand the correlations
Debugging is made more difficult
Single point of failure
Latency is high and scalability is low
Locking and scaling: if an operation that holds a lock takes 200 ms, the system can handle at most 5 such operations per second (1000 ms / 200 ms = 5)
Fragility: 2, 3, 4, ... Phase Commit. 2PC theory concludes by saying "and this does not work in reality". In case of error (e.g. due to a timeout while waiting for a resource in the commit phase) you always end up having to decide what to do: "halt and manual recovery", or guess whether it went well or badly!
Besides, timeouts are a big problem: at worst, they last very long!
If Systems A, B and C all share the same technical infrastructure (e.g. database) and reside in the same memory space, they can share a local transaction. In this case we’re not bound by the laws of distributed computing and everything is easy because the transaction manager will solve all our problems.
Synchronous RPC is the crack cocaine of distributed programming
Whenever you connect 2 Nouns using a Verb you create COUPLING!
The service name Customer has a clear bias (a customer has a name, an age and an address)
However, in many domains, e.g. Bank/Credit card/Insurance, Age is more closely related to risk