Orchestration vs Choreography
A Guide To Composing Your Monolith
Ian Thomas
www.ian-thomas.net | @anatomic | linkedin.com/in/anatomic 2023
www.footyheadlines.com
Adidas
https://www.flickr.com/photos/bounder/2878997040
Hi 👋, I’m Ian
I’m a software engineer from the UK, currently
building work products at Meta as part of
Reality Labs.
Previously, I was VP of Web Architecture for
Genesis and Chief Digital Technology Architect
for PokerStars.
On The Menu Today…
● The need for change and how our decisions impact it
● How complexity and coupling are conspiring to slow your progress
● Orchestration and choreography patterns
● State, data, failure and humans
● Tooling and processes to help smooth the road
I encourage you to keep in mind five “C”s:
Communication, Consistency, Coordination, Coupling and Complexity
Lehman’s Laws of Software Evolution
Programs, Life Cycles and Laws of Software Evolution
● Continuing Change: a system requires continual adaptation or it becomes progressively less satisfactory
● Self Regulation
● Conservation of Familiarity
● Declining Quality: quality will appear to be declining unless a system is rigorously maintained and adapted to operational environment changes
● Increasing Complexity: complexity increases unless work is done to maintain or reduce it
● Conservation of Organisational Stability
● Continuing Growth: functional content must be continually increased to maintain user satisfaction
● Feedback System
Change in the world outside systems drives the need for growth and change within.
Change in the system itself is self-limiting unless deliberate effort is exerted.
“[..] shows the continuing growth of the system
(first law) albeit at a declining rate
(demonstrably due to increasing difficulty of
change, growing complexity (second law)”
M. Lehman - Programs, Life Cycles, and Laws of Software Evolution
Which chart most closely matches your situation?
Workflows
[Diagram: order workflow. Steps: Start, Place order, Check stock, Wait for stock, Cancel order, Process payment, Update payment details, Ship product, Order complete, Abyss of shipping support. States: Accepted, Unavailable, Available, Allocated, Billed, Delivered. Transition labels: No show, Timeout, Retry, Error.]
[Diagram: the same workflow steps grouped under Order, Inventory, Payment and Fulfilment.]
[Diagram: Bounded Contexts and Ubiquitous Languages, showing services (Svc) grouped within a Subdomain, with Interchange Contexts between them.]
[Diagram: the order workflow with its interactions labelled EXTERNAL or INTERNAL.]
Interaction Patterns
Orchestration
Human-in-the-loop and long-running workflows
The orchestrator can manage compensatory actions
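A minimal sketch of what managing compensatory actions can look like, assuming each workflow step is paired with an undo function. The step names and the `run_workflow` helper are hypothetical and chosen to echo the order workflow; real orchestration tools also persist this state, which the sketch does not.

```python
from typing import Callable, List, Tuple

# Hypothetical shape: (step name, action, compensation that undoes the action).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_workflow(steps: List[Step]) -> List[str]:
    """Run steps in order; if one fails, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(f"done:{name}")
            completed.append((name, action, compensate))
        except Exception:
            log.append(f"failed:{name}")
            # The orchestrator owns the workflow state, so it knows what to undo.
            for done_name, _, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            break
    return log


def noop() -> None:
    """Stand-in for a call to a real service."""


def fail_payment() -> None:
    raise RuntimeError("card declined")


example_log = run_workflow([
    ("place_order", noop, noop),
    ("allocate_stock", noop, noop),
    ("process_payment", fail_payment, noop),
    ("ship_product", noop, noop),
])
```

Here the payment step fails, so the stock allocation and the order are compensated in reverse order and shipping never runs.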
Handling Failure
● Circuit Breakers
● Timeouts*
● Service Discovery
● Retries
● Healthchecks
● Auto-scaling
● Bulkheads
● Mutual TLS
Many of these requirements can be pushed to the platform
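To make one of these failure-handling requirements concrete, here is a rough in-process circuit breaker sketch. The class name, thresholds and injectable `clock` are assumptions for illustration; a service mesh or platform library would normally provide this behaviour instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open, call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Wrapping each downstream call in `breaker.call(...)` lets a caller fail fast while a provider is unhealthy, rather than stacking up timeouts.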
Orchestration
Pro
● Single controller managing workflow state
● Complex error handling is easier to manage
● Platform tooling increasingly removing complexity from applications (especially for synchronous calls)
● Recoverability
● Lower cognitive load
● Version controllable workflow definitions
● Lots of tooling to support o11y of API driven services
Con
● Single point of failure
● Additional latency
● Scalability
● Responsiveness
● Coupling between orchestrator and services
Interaction Patterns
Choreography
Events or Commands?
https://nedroidcomics.tumblr.com/post/41879001445/the-internet
Types of Event
● Event Notification: announcing facts, with no expectation of action or response
● Event-Carried State Transfer: reduce chattiness between services by including data in the event
● Event Sourcing: events are recorded in a persistent log, allowing for replay and state reconstruction
● CQRS: separate reading and writing, handles broad variation in access patterns
Martin Fowler - What do you mean by “Event-Driven”?
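As an illustration of the event sourcing idea, the sketch below rebuilds order state by replaying a log. The `Event` shape and the event type names are hypothetical, and a plain list stands in for the persistent log.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    type: str        # hypothetical event names, e.g. "OrderPlaced"
    order_id: str
    quantity: int = 0


def replay(events: Iterable[Event]) -> Dict[str, dict]:
    """Reconstruct current order state by folding over the event log in order."""
    state: Dict[str, dict] = {}
    for event in events:
        if event.type == "OrderPlaced":
            state[event.order_id] = {"status": "placed", "quantity": event.quantity}
        elif event.type == "OrderCancelled":
            state[event.order_id]["status"] = "cancelled"
        elif event.type == "OrderShipped":
            state[event.order_id]["status"] = "shipped"
    return state


log = [
    Event("OrderPlaced", "o-1", 2),
    Event("OrderPlaced", "o-2", 1),
    Event("OrderCancelled", "o-2"),
    Event("OrderShipped", "o-1"),
]
current = replay(log)
```

Because state is derived from the log, replaying a prefix of it (for example `replay(log[:2])`) gives the state as it was at that point.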
Choreography Considerations
● Message size
● Extra infrastructure
● Infinite loops
● Error handling
● Delivery semantics
● Workflow level timeouts
● Idempotency
● Ordering
● Versioning
● State management
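Delivery semantics and idempotency tend to go together: with at-least-once delivery, a consumer may see the same event twice. A minimal sketch of an idempotent consumer follows, assuming every event carries a unique `event_id`. The class and field names are hypothetical, and the in-memory set stands in for a durable store that would be updated atomically with the state change.

```python
from typing import Dict, Set


class StockReservations:
    """Hypothetical inventory consumer that tolerates redelivered events."""

    def __init__(self) -> None:
        self.reserved: Dict[str, int] = {}
        self._seen: Set[str] = set()  # IDs of events already applied

    def handle(self, event_id: str, sku: str, quantity: int) -> bool:
        """Apply the event once; return False if it was a duplicate delivery."""
        if event_id in self._seen:
            return False
        self.reserved[sku] = self.reserved.get(sku, 0) + quantity
        self._seen.add(event_id)
        return True


consumer = StockReservations()
first = consumer.handle("evt-1", "sku-42", 2)
redelivered = consumer.handle("evt-1", "sku-42", 2)  # same event delivered again
```

The second delivery is ignored, so the reservation count stays at 2 rather than doubling.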
Choreography
Pro
● Weak coupling between services
● Scalability
● Responsiveness
● Fault tolerance, no single point of failure
● High throughput
Con
● Complexity grows rapidly with event cardinality
● Typically requires intermediate infrastructure
● Versioning of events
● No single view of workflow state
● Hard to version control workflow
● Error handling (especially at the workflow level)
Choosing Your Patterns
…use orchestration within the bounded
context of a microservice, but use
choreography between bounded-contexts.
Yan Cui – Choreography vs Orchestration in the land of serverless
How formally do you need to specify your workflow?
From formally specified (orchestration) to informally specified (choreography):
● Orchestration service, workflow defined using DSL/programming language (declarative)
● Custom orchestrator, workflow defined in general purpose programming language (imperative)
● Front controller knows about workflow
● Stateless, hopefully documented, potentially just in a few people’s heads
An architect can never reduce semantic
coupling via implementation, but they
can make it worse.
Neal Ford, Mark Richards, Pramod Sadalage & Zhamak Dehghani – Software Architecture: The Hard Parts
Coupling
● Essential: systems necessarily coupled to deliver value
● Accidental: poor or non-existent system design or development processes
Types of Coupling
● Operational: a consumer can’t run without a provider
● Developmental: changes in producers and consumers must be coordinated
● Semantic: change together because of shared concepts
● Functional: change together because of shared responsibility
● Incidental: change together for no good reason
Michael Nygard - Uncoupling
Temporal effects?
Will you be seated?
What about 3 people?
12.5% chance of being seated
Adding further people quickly reduces your chances
of being seated on time to (effectively) 0
6% 1.6% 0.1%
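The percentages above are consistent with a simple model, sketched here under two assumptions that the slides do not state outright: each person independently has a 50% chance of arriving on time, and you are only seated once everyone has arrived. The group sizes 4, 6 and 10 are inferred from the quoted figures.

```python
def chance_all_on_time(people: int, p_on_time: float = 0.5) -> float:
    """Probability that every one of `people` independent arrivals is on time."""
    return p_on_time ** people


# Group sizes are assumptions chosen to line up with the quoted percentages.
for group_size in (3, 4, 6, 10):
    print(group_size, f"{chance_all_on_time(group_size):.2%}")
```

The same arithmetic applies to a chain of synchronous service calls: the availability of the whole is the product of the availability of each part.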
Types of Coupling (continued)
● Organisational: progress can only be achieved through others
Decomposition != Decoupling
The value of orchestration increases with workflow
complexity, notably with complex error scenarios.
Comparatively, responsiveness and scalability
requirements favour choreography, especially when
error handling is minimal.
@anatomic’s rough guide to Orchestration vs Choreography
● Operational Coupling. Orchestration: strong. Choreography: very weak.
● Developmental Coupling. Orchestration: strong. Choreography: weaker, caution advised.
● Semantic Coupling. Orchestration: less strong. Choreography: weak.
● Functional Coupling. Orchestration: less strong. Choreography: weak.
● Incidental Coupling. Orchestration: less likely, potentially easier to find. Choreography: harder to find, more sinister when present.
● Scalability. Orchestration: scale cascades, less suitable for parallelism. Choreography: backpressure to decouple, easier to parallelise.
● Reliability. Orchestration: only as good as your weakest link. Choreography: careful design decouples uptime.
● Responsiveness. Orchestration: central bottleneck, processing chains add latency. Choreography: highly responsive due to reduced operational coupling.
● Fault Tolerance. Orchestration: low, due to single point of failure effect. Choreography: excellent.
● Error Handling. Orchestration: eased through central state management. Choreography: harder, risk of event explosion and passive/aggressive.
● Cognitive Load. Orchestration: lower. Choreography: complexity grows with number of events.
● Observability. Orchestration: traditional tooling and central state make o11y easier. Choreography: potentially more difficult, requires strong platform support.
Making “IT” Work
Avoiding gotchas, no matter the pattern
What’s inside the boxes?
Four pillars of Event Streaming Capabilities
● Business Function: actually doing the work we need, the business function (or “core” plane) is where the value lies for our customers and the business.
● Instrumentation: the metrics and telemetry necessary for us to determine if the system is working as expected.
● Control Plane: systems will keep on chugging, even when we might need them to stop. Control planes help manage change, including pausing, scaling and rate-limiting.
● Operational Plane: tooling and processes to help run our systems, including addressing failure modes (wiping data and corrective actions), upgrade processes and evolutionary support.
https://www.confluent.io/en-gb/blog/journey-to-event-driven-part-4-four-pillars-of-event-streaming-microservices/
More difficult to implement in
event-driven systems
What’s between the boxes?
[Diagram: two boxes, A and B, joined by a line]
What’s in a line?
● Inter-process communication
● Traffic flow
● DNS
● Service discovery
● Routing
● Schemas
● Certificates
● Firewall
● Physical connection
● Load balancing
● AuthN/AuthZ
● Secrets
● Data format
● Protocol
● Failure modes
● Different priorities
● Required changes
● Change management
● ITIL
● Change Advisory Boards
● Time zones
● Backlogs
● Scrum of Scrums
● Language
🤯
How are you going to handle…
● Testing
● Long-running workflows
● Schema evolution
● Ownership of workflow state
● Fallacies of distributed computing
● Team geo-distribution
● Serialisation formats
● Distributed tracing
● Accidental coupling
● Self-service infrastructure
And all the other stuff that won’t fit on a slide
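Schema Evolution
Each compatibility type is listed with the changes allowed, the schemas validated against, and which side to upgrade first.
● Backward. Changes allowed: delete fields, add optional fields. Schemas validated: last version. Upgrade first: consumers.
● Backward transitive. Changes allowed: delete fields, add optional fields. Schemas validated: all previous versions. Upgrade first: consumers.
● Forward. Changes allowed: add fields, delete optional fields. Schemas validated: last version. Upgrade first: producers.
● Forward transitive. Changes allowed: add fields, delete optional fields. Schemas validated: all previous versions. Upgrade first: producers.
● Full. Changes allowed: add optional fields, delete optional fields. Schemas validated: last version. Upgrade first: any order.
● Full transitive. Changes allowed: add optional fields, delete optional fields. Schemas validated: all previous versions. Upgrade first: any order.
● None. Changes allowed: all changes accepted. Schemas validated: none. Upgrade first: depends.
https://docs.confluent.io/platform/current/schema-registry/avro.html#compatibility-types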
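The add/delete rules for backward and forward compatibility can be sketched as a small check. This is a deliberate simplification: a schema is modelled as a mapping from field name to whether the field is optional, type changes are ignored, and the function names are hypothetical rather than part of any schema registry API.

```python
from typing import Dict

# Simplified model: field name -> True if the field is optional (has a default).
Schema = Dict[str, bool]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """New-schema consumers can read old data: any added field must be optional."""
    added = set(new) - set(old)
    return all(new[field] for field in added)


def is_forward_compatible(old: Schema, new: Schema) -> bool:
    """Old-schema consumers can read new data: any deleted field must be optional."""
    removed = set(old) - set(new)
    return all(old[field] for field in removed)


def is_fully_compatible(old: Schema, new: Schema) -> bool:
    return is_backward_compatible(old, new) and is_forward_compatible(old, new)


v1: Schema = {"order_id": False, "note": True}
add_optional: Schema = {"order_id": False, "note": True, "currency": True}
add_required: Schema = {"order_id": False, "note": True, "currency": False}
drop_required: Schema = {"note": True}
```

In this model, adding an optional field is fully compatible, adding a required field is forward compatible only, and deleting a required field is backward compatible only. The transitive variants would run the same check against every previous version rather than just the last one.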
Favour orchestration for complex workflows,
choreography for scalability + weaker coupling
Enable long-term changeability through
deliberate design + trade-off analysis
Complexity breeds in the bits between our systems,
handle with care (+ don’t forget about the humans!)
Thanks 🖖
If you’re interested in chatting more about any
of the topics covered in this talk, come and grab
me in the hallway track or virtually through
Twitter or LinkedIn.
Thank you for coming to hear me speak!
