4. Review of Microservice Architectures
Microservice Characteristics
● Like SOA but “smaller”
● Many small services
● Versioned independently
● Deployed independently
● Scale independently
Possible Microservice Benefits
● Resilience (through clustered services)
● Zero downtime
● Horizontal scaling
● Independent scaling of components
● Ease of making changes
● Ease of uptaking new technology
● Canary testing
● Independent release cycles
“Building Microservices” - Sam Newman
5. The taste of the kool-aid (part 1)
● It’s all about the contract! That’s not exactly revolutionary
● Microservices mess with everything. Including testing strategies.
● Hard things become easier (releasing changes, scaling)
● Easy things become hard (e.g. multi-lingual support, referential integrity checks)
● Service boundaries don’t necessarily follow the data model.
● Some ‘scary’ stuff like a repository/db per service really isn’t a big deal. Seriously.
6. The taste of the kool-aid (part 2)
● Communication and state exchange between microservices gets complicated (more to follow)
● Zero-downtime upgrades: Not actually that bad once you get the hang of it
● Horizontal scalability: Not a simple yes or no question but a spectrum. Also, not free.
● Canaries are essential for both quality and velocity (more to follow)
● Polyglot stack: More compelling at the database level than at the language/runtime level
● As a developer, you’re empowered. With power comes responsibility
14. Guiding Principles
1. You can’t “contrive” production in a laboratory
○ It’s probably impossible, or at a minimum not worth the cost
2. Reality is our friend
○ Time doesn’t make everything “stronger”
○ Are devs thinking about production?
3. Build “crisis” reflexes
○ Do every day what you will do in a crisis
18. Canary Testing -- It’s not that hard
● Code level
○ Easy to do
○ Basically some “if” checks and a global registry, keyed per tenant or by percentage
○ Larger blast radius
● Binary level
○ Harder to do
○ Uses service-routing techniques (per tenant or percentage based)
○ Smaller blast radius
● Example (chat change)
○ The reverse works on this as well
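The code-level approach above really is just “if checks and a registry.” Here is a minimal sketch of what such a gate could look like; the tenant IDs, registry variables, and the chat handler are all hypothetical, and a stable hash is used so a tenant doesn’t flip between code paths on every request:

```python
import hashlib

# Hypothetical in-process canary registry: explicit tenant opt-ins
# plus a percentage-based rollout bucket for everyone else.
CANARY_TENANTS = {"tenant-42"}   # illustrative tenant ID
CANARY_PERCENT = 10              # roll out to ~10% of remaining tenants

def in_canary(tenant_id: str) -> bool:
    """Return True if this tenant should get the new code path."""
    if tenant_id in CANARY_TENANTS:
        return True
    # Stable hash: a given tenant always lands in the same bucket.
    bucket = int(hashlib.sha256(tenant_id.encode()).hexdigest(), 16) % 100
    return bucket < CANARY_PERCENT

def handle_chat(tenant_id: str) -> str:
    # The code-level "if check": trivial to add, but every instance
    # carries both code paths (the larger blast radius).
    if in_canary(tenant_id):
        return "new chat implementation"
    return "old chat implementation"
```

The binary-level variant moves this same tenant/percentage decision out of the code and into the routing layer, so only canary instances ever contain the new code.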
19. Guiding Principles Played Out @ Quiq
1. You can’t “contrive” production in a laboratory
○ It’s probably impossible, or at a minimum not worth the cost
2. Reality is our friend
○ Time doesn’t make everything “stronger”
○ Are devs thinking about production?
3. Build “crisis” reflexes
○ Do every day what you will do in a crisis
22. More Protocols, More Problems
● Achieving the benefits of microservices is non-trivial
● Using both HTTP & AMQP made each more daunting
○ HTTP necessitates service discovery
○ Canary testing - must cut over & direct two traffic types
○ Resilience - Need clustered/redundant AMQP
○ Horizontal scalability - AMQP server a possible bottleneck
● AMQP worked directly against some of the hallmarks
○ Becomes a point of centralization, like a shared database
○ Polyglot stack - Not as ubiquitous as HTTP
But here’s the worst part…
We were actually afraid to horizontally scale some of our services!
23. Knock, knock: race condition (who’s there?)
1. What if Downstream2 finishes first?
a. Even DB locking does not help, because the race condition is in the queue itself
2. The more instances of Downstream1, the more likely the corruption
3. We need an AMQP extension, TCP-style message ordering, merge strategies, or...
[Diagram: SessionService1 publishes “User A Online” / “User A Offline” events, consumed by Downstream1 and Downstream2]
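The race is easy to reproduce in miniature. In this toy sketch (all names illustrative), two consumer instances pull ordered messages from a single shared queue, but because their processing times vary, the order in which they *finish* is not guaranteed, so “offline” can be applied before “online”:

```python
import queue
import random
import threading
import time

# One shared queue, as with a single AMQP queue feeding two instances
# of the same downstream service.
q = queue.Queue()
applied = []            # order in which effects actually land
applied_lock = threading.Lock()

def consumer() -> None:
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return
        time.sleep(random.uniform(0, 0.01))  # variable processing time
        with applied_lock:
            applied.append(msg)

# Events are enqueued in the correct order...
q.put("User A online")
q.put("User A offline")

# ...but two consumer instances may complete them in either order.
threads = [threading.Thread(target=consumer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# `applied` can come out as ["User A offline", "User A online"]:
# the user is left marked online even though they logged off.
```

No amount of locking inside the consumers fixes this, because the interleaving happens between dequeue and commit, exactly as the slide says.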
24. Our Solution
● All inter-service communication over HTTP
● Store the transactionally ordered side effects in the same database whose records were mutated during the transaction (e.g. in Redis)
● An ordered executor in the host service (e.g. SessionService) publishes each side effect via HTTP
○ The ordered executor leverages the locking mechanisms of the host service’s primary DB
● Downstream1 receives the side effect, pushes it into its own queue, kicks off a future, and immediately returns (still async!)
● Now we have ordered, durable downstream state updates and/or side effects!
26. Some Observations
1. Achieving the benefits of microservices is challenging. Do yourself a favor and limit the number of communication protocols
2. Don’t use producer/consumer queues for general communication
3. In the event of problems like process death, always err on the side of sending the same message twice instead of dropping it. Idempotency is king.
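Observation 3 only works if consumers are idempotent. A minimal sketch of an idempotent handler, assuming hypothetical message IDs and an in-memory dedup set (a real service would persist the processed IDs, e.g. as a unique key in its own DB):

```python
processed: set[str] = set()   # real service: unique index in consumer's DB
inbox_count: dict[str, int] = {}

def handle(message_id: str, user_id: str) -> None:
    """Idempotent handler: safe to deliver the same message twice."""
    if message_id in processed:
        return  # duplicate delivery after a retry; ignore it
    inbox_count[user_id] = inbox_count.get(user_id, 0) + 1
    processed.add(message_id)

# At-least-once delivery: on any doubt (timeout, process death) the
# producer resends, so the same message can arrive more than once.
for msg_id in ["m1", "m2", "m2"]:   # "m2" redelivered after a timeout
    handle(msg_id, "userA")
# inbox_count["userA"] is 2, not 3: the duplicate was absorbed
```

With this in place, “send twice rather than drop” becomes a safe default.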
Characteristics enable but don’t guarantee benefits
Takes hard work… you can evolve into these (even in greenfield)
Different gates along the way
Whatever gates were there before are ignored or a different “policy”
Different gates along the way
Developers check in a line of code without thinking about production
With every line of code, the developer is thinking about production (quality, scalability, performance?)
Pair program
Key: Small enough chunks of work
Includes: Size of service and how often release
What about bake time?
2 Extremes: Data migration code vs Low level library
Customer/Tenant
1. A service must be able to read from another service while processing requests
2. Must support a service registering a side effect in another service in a durable, transactional way
E.g. sending an email to the user iff their account is created
3. Must support a service emitting partial state updates to one or more downstream services in a durable, transactional way
E.g. user online/offline events
It seemed odd/heavy to use anything other than HTTP for 1, so we did that. 2 & 3 are inherently asynchronous, so we employed AMQP. Makes sense, right?
Service A manages sessions and produces online/offline events
Service B consumes those events to maintain a denormalized list of online users (so that A doesn’t have to scale w/ B)
This is microservices: there are multiple instances of service B. What if B2 finishes first?
Even DB locking does not help because the race condition is in the AMQP queue itself
The more instances of B, the more likely the corruption
We need an AMQP extension, TCP-style message ordering, or merge strategies, or...
The “Process” block uses the database’s locking mechanisms to ensure a single worker on pipelineA
We solved our problems by moving the queue(s) into the services, where we’re in control
There’s an implicit retry daemon
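One way to get “a single worker per pipeline” out of the database itself is a lock row claimed by at most one worker. The sketch below is an assumption-laden illustration only: sqlite3 stands in for the primary DB (a server DB would more likely use `SELECT ... FOR UPDATE` or advisory locks), and all names are hypothetical:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pipeline_locks (name TEXT PRIMARY KEY, holder TEXT)")
db.commit()

def try_acquire(pipeline: str, worker: str) -> bool:
    """Claim the pipeline; only the first claimant succeeds."""
    try:
        db.execute("INSERT INTO pipeline_locks VALUES (?, ?)",
                   (pipeline, worker))
        db.commit()
        return True
    except sqlite3.IntegrityError:
        # Another worker already holds the row: the PRIMARY KEY
        # constraint is doing the mutual exclusion for us.
        db.rollback()
        return False

def release(pipeline: str) -> None:
    db.execute("DELETE FROM pipeline_locks WHERE name = ?", (pipeline,))
    db.commit()
```

A retry daemon then just reattempts `try_acquire` periodically: if the current worker dies and its lock is cleaned up (or expires), another worker picks the pipeline up.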