Microservices appear simple to build on the surface, but there's more to creating them than launching some code in a container. This talk outlines 10 important questions that should be answered about any new microservice before development begins on it, and certainly before it gets deployed into production.
Creating a Microservice? Answer These 10 Questions First.
Brian Kelly, VP Engineering, Datawire
DevOpsDays Austin, May 2nd 2016
@brikelly
bkelly@datawire.io
2. datawire.io
Hi!
Me
* Working in distributed systems most of my career
* Built a number of middleware and messaging products
* Strangled a SaaS monolith with microservices
Datawire
* Based in Boston and San Francisco
* We provide technology for companies adopting microservices
* We’ve spent a lot of time with the master microservices practitioners
from high-growth technology companies
3. Microservices and DevOps: A Perfect Match
Microservices increase development velocity
DevOps increases release velocity
For organizations scaling rapidly, doing one without the other is… “suboptimal”
5. datawire.io 5
“There are only two hard problems in distributed systems:
1. Exactly-once delivery
2. Guaranteed order of messages
1. Exactly-once delivery”
@mathiasverraes
6. Why Ask These 10 Questions?
Force awareness in your teams of latent concerns
* For example, potential future issues with scalability and reliability
It’s OK to not have sophisticated answers for each question
* But asking them is important!
9. Developer Infrastructure Teams
The dev infrastructure team focuses on developer education, core infrastructure, and driving standards through a great developer experience (DX).
10. Investing in the core infrastructure necessary for independent iteration is key
* Continuous delivery workflow
* Loosely coupled services
* Application resilience
13. Continuous delivery workflow
Quickly move from commit to customer
1. The workflow needs to be defined but does not need to be fully automated. Increase automation as the number of microservices grows.
2. The service needs to be running in production in order to be fully tested.
14. Upgrading your Service
Each upgrade is an opportunity to break the contract between your new service and any other dependent services
Plenty of techniques exist for reducing the chance of failure:
* Well-specified structural and behavioral service contracts
* Dark launching for examining the effect of prod traffic without risk
* Response diffing for ensuring contract compliance
* Canary testing for progressive rollout
* Blue/Green deployment for fast rollback
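As a rough illustration of response diffing, the sketch below compares the response from the current version of a service with the response from a candidate version for the same request. The function name `diff_responses` and the sample payloads are hypothetical; a real setup would replay or mirror production traffic and would likely ignore fields that are expected to vary, such as timestamps.

```python
from typing import Any, List


def diff_responses(old: Any, new: Any, path: str = "$") -> List[str]:
    """Return readable differences between two JSON-like values.

    Hypothetical helper for illustration; not part of any Datawire tool.
    """
    if type(old) is not type(new):
        return [f"{path}: type changed from {type(old).__name__} to {type(new).__name__}"]
    if isinstance(old, dict):
        diffs: List[str] = []
        # JSON object keys are strings, so sorting them is safe.
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}"
            if key not in new:
                diffs.append(f"{child}: missing in new response")
            elif key not in old:
                diffs.append(f"{child}: added in new response")
            else:
                diffs.extend(diff_responses(old[key], new[key], child))
        return diffs
    if isinstance(old, list):
        if len(old) != len(new):
            return [f"{path}: length changed from {len(old)} to {len(new)}"]
        diffs = []
        for index, (a, b) in enumerate(zip(old, new)):
            diffs.extend(diff_responses(a, b, f"{path}[{index}]"))
        return diffs
    return [] if old == new else [f"{path}: {old!r} != {new!r}"]


# Example payloads (made up): the candidate turned a number into a string.
current = {"id": 1, "price": 9.99, "tags": ["a"]}
candidate = {"id": 1, "price": "9.99", "tags": ["a"], "extra": True}
for line in diff_responses(current, candidate):
    print(line)
```

An empty result suggests the candidate still honors the structural contract for that request; it says nothing about requests that were not sampled.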
16. Monitoring and Measuring your Service
Ways of monitoring your service’s health:
OK:
* Health check from monitor to service (GET /health from an ELB)
Better:
* “Call Home” health check from service to monitor (APM approach)
Best:
* The client’s experience calling real APIs on the service
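One way to picture the "best" option is a small client-side recorder that judges health from the outcome of real API calls rather than from a synthetic check. This is a minimal sketch under assumptions: the class name `ClientExperienceMonitor`, the window size, and the thresholds are all hypothetical and chosen only for illustration.

```python
from collections import deque


class ClientExperienceMonitor:
    """Hypothetical helper: tracks how real calls to a service went."""

    def __init__(self, window=100, max_latency_s=0.5, min_good_ratio=0.95):
        # Only the most recent `window` calls count toward health.
        self._outcomes = deque(maxlen=window)
        self.max_latency_s = max_latency_s
        self.min_good_ratio = min_good_ratio

    def record(self, succeeded, latency_s):
        # A call is "good" only if it succeeded and was fast enough.
        self._outcomes.append(bool(succeeded) and latency_s <= self.max_latency_s)

    def good_ratio(self):
        if not self._outcomes:
            return 1.0  # no evidence of trouble yet
        return sum(self._outcomes) / len(self._outcomes)

    def healthy(self):
        return self.good_ratio() >= self.min_good_ratio
```

A slow but successful response counts against the service here, which a plain GET /health check would typically miss.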
17. Diagnosis
Monitor the traffic, not just the services
Which service is introducing the maximum latency into a request?
Which service is the root cause of a cascade failure?
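To make the latency question concrete, here is a small sketch that attributes time to each service from trace-like records of one request. The `Span` structure and its field names are assumptions made for this example (they do not follow any specific tracing product), and the calculation assumes child calls run one after another rather than in parallel.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    """Hypothetical record of one service handling part of a request."""
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_ms: float


def self_time_by_service(spans: List[Span]) -> Dict[str, float]:
    """Time each service spent itself, excluding time waiting on its callees."""
    child_total: Dict[str, float] = defaultdict(float)
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] += span.duration_ms
    totals: Dict[str, float] = defaultdict(float)
    for span in spans:
        own = max(span.duration_ms - child_total[span.span_id], 0.0)
        totals[span.service] += own
    return dict(totals)


# Made-up trace: gateway -> orders -> (inventory, pricing)
trace = [
    Span("a", None, "gateway", 120.0),
    Span("b", "a", "orders", 100.0),
    Span("c", "b", "inventory", 70.0),
    Span("d", "b", "pricing", 10.0),
]
own_time = self_time_by_service(trace)
slowest = max(own_time, key=own_time.get)
```

Looking only at total duration would blame the gateway; subtracting time spent in callees points at the service that is actually slow.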
19. Testing
Unit testing a single service is the easy part
What’s harder: testing the entire system
How will a developer verify that their changes to a single microservice will not break other parts of the system?
Staging environments bring a little comfort, but add significant cost, complexity, and distractions
20. Microservice Testing Is Required on Both Sides of Deployment
Test before launch (reduce probability of failure):
* Mock services
* Sophisticated deployment workflows
* Automated regression tests
Test after launch (reduce impact of failure):
* Dark launch
* Canary testing
* Blue/green deployment
22. Prioritize Potential Attack Vectors
Most likely attack vectors:
* Exploitation of OWASP Top 10 vulnerabilities in your web application
* Internal staff with existing access
* Social engineering
Less likely attack vector:
* Attacker gains access behind your perimeter, logs on to your containers, reverse-engineers your internal service APIs, and sends fake requests to and from each microservice
24. Configuration
“Configuration” can be categorized:
• Static configuration (log file locations, ports to listen on, …)
• Runtime configuration (thread pool sizes, JVM heap size, …)
• Behavioral configuration (feature flags, request routing rules, …)
25. Configuration
Prevent arbitrary static configuration changes to production systems
* Instead, deploy those changes into new immutable, copy-on-write containers
Strive for adaptive, elastic services that require zero dynamic configuration changes at runtime to stay healthy
Reserve behavioral configuration for progressive rollouts, dark launching, routing
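The split between static and behavioral configuration can be sketched as follows. Static settings are modeled as an immutable object that is fixed when the container image is built, while a behavioral setting (a feature flag with a percentage rollout) is evaluated per request. The names `StaticConfig` and `in_rollout`, and the idea of bucketing by a hash of the user ID, are assumptions for illustration only.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticConfig:
    """Hypothetical static settings, fixed when the container is built."""
    listen_port: int
    log_path: str


def in_rollout(flag: str, user_id: str, percent: int) -> bool:
    """Behavioral config: is this user inside a progressive rollout?

    A stable hash puts each user in a fixed bucket from 0 to 99, so raising
    `percent` only ever adds users to the rollout.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


config = StaticConfig(listen_port=8080, log_path="/var/log/service.log")
```

Attempting `config.listen_port = 9090` raises an error, which mirrors the idea that a static change should arrive as a newly deployed container rather than as an edit to a running one.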
27. Your microservice needs a contract
Your new microservice will provide new value to the rest of the system
But will it offer an SLA for its latency, uptime, and reliability?
Those who consume it will appreciate one:
• They can specify timeouts and trip circuit breakers when response latency is high
• They will know which operations are idempotent
• They could cache some responses for large queries
• They can spot uptime SLA discrepancies
Datawire’s Quark is an IDL that captures both structure and behavior
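To show how a behavioral contract helps consumers, the sketch below models a contract as plain data and derives a client policy from it. This is not Quark syntax; `OperationContract`, `client_policy`, the retry count, and the timeout margin are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OperationContract:
    """Hypothetical behavioral contract for one operation of a service."""
    name: str
    idempotent: bool        # safe to retry without side effects?
    p99_latency_ms: int     # latency the service promises at the 99th percentile
    cache_ttl_s: int = 0    # how long a response may be cached; 0 means never


def client_policy(op: OperationContract, timeout_margin: float = 2.0) -> Dict[str, int]:
    """Derive client-side settings from what the service promises."""
    return {
        # Time out once the call is well beyond the promised latency.
        "timeout_ms": int(op.p99_latency_ms * timeout_margin),
        # Only retry operations the contract marks as idempotent.
        "max_retries": 2 if op.idempotent else 0,
        "cache_ttl_s": op.cache_ttl_s,
    }


get_order = OperationContract("get_order", idempotent=True, p99_latency_ms=150, cache_ttl_s=30)
charge_card = OperationContract("charge_card", idempotent=False, p99_latency_ms=400)
```

Without the contract, a consumer has to guess these values, and a wrong guess about idempotency on something like `charge_card` is expensive.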
31. Service Discovery
The simpler your discovery system, the less flexibility it offers.
DNS schemes: very simple, but they don’t take availability into account and make the developer experience difficult
Strongly consistent datastores (e.g. ZooKeeper): more flexible, but don’t handle network partitions at all
Eventually consistent datastores with pub/sub (e.g. Datawire Connect): very flexible, handle partitions well, and leave clients and services unaffected even when they can’t reach the discovery system
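The property of staying unaffected when the discovery system is unreachable can be approximated on the client side by caching the last known endpoints. The sketch below is an assumption about how such a client might behave, not a description of how Datawire Connect is implemented; `CachedResolver` and the `lookup` callable are hypothetical.

```python
from typing import Callable, Dict, List


class CachedResolver:
    """Hypothetical client-side resolver that tolerates discovery outages."""

    def __init__(self, lookup: Callable[[str], List[str]]):
        # `lookup` asks the discovery system for endpoints and may raise
        # ConnectionError when that system cannot be reached.
        self._lookup = lookup
        self._last_known: Dict[str, List[str]] = {}

    def resolve(self, service: str) -> List[str]:
        try:
            self._last_known[service] = list(self._lookup(service))
        except ConnectionError:
            # Discovery is unreachable: keep serving the stale answer.
            pass
        return list(self._last_known.get(service, []))
```

The trade-off is that cached endpoints may be out of date during a partition, so callers still need timeouts and fallbacks for endpoints that have gone away.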
35. Knowing your Chokepoint Sequence
What will be the sequence of failures in the event of a large increase in traffic?
* Example sequence: First the database maxes out, then RAM, then CPU, then file descriptors, then ELBs, then NICs
Awareness of the likely failure sequence will help you be aware of your headroom and help build a plan for capacity growth
[Diagram: Node, HAProxy, and Cassandra instances]
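A back-of-the-envelope way to estimate a chokepoint sequence is to take each resource's current utilization and ask at what traffic multiple it would saturate. The sketch below assumes utilization grows linearly with traffic, which is a simplification; the function name and the utilization figures are invented for the example.

```python
from typing import Dict, List, Tuple


def chokepoint_sequence(utilization: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order resources by the traffic multiple at which each would saturate.

    `utilization` maps a resource name to the fraction (0 to 1) in use at
    today's traffic. Assumes linear scaling, which real systems rarely obey.
    """
    headroom = {
        name: 1.0 / used
        for name, used in utilization.items()
        if used > 0  # an unused resource never saturates under this model
    }
    return sorted(headroom.items(), key=lambda item: item[1])


# Made-up measurements at current load.
today = {"database": 0.80, "ram": 0.50, "cpu": 0.40, "file_descriptors": 0.25}
sequence = chokepoint_sequence(today)
for name, multiple in sequence:
    print(f"{name}: saturates at about {multiple:.2f}x current traffic")
```

The first entry is the resource to plan capacity for first, and its multiple is a rough measure of overall headroom.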
43. Dependency Failures
Microservice architectures are highly distributed systems by their nature
That means failures will occur, and frequently
45. Downstream Dependency Failure
Any microservice calling another must handle downstream failure, with:
* Timeouts
* Circuit breakers to prevent cascading failure
* Backpressure
* Default response values
* Caching prior responses
* Retries
* Fallback to alternative endpoints
Don’t assume that downstream failures manifest as dead endpoints
* Services get sick more often than they die!
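Several of the techniques listed above can be combined in a small wrapper. The following is a minimal circuit breaker sketch with a default response as the fallback; the class and parameter names are hypothetical, and it assumes the wrapped call enforces its own timeout (for example by raising `TimeoutError`), so a "sick" service that responds slowly is treated as a failure too.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker for one downstream dependency."""

    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def _is_open(self):
        if self._opened_at is None:
            return False
        # After the reset interval, let one trial call through.
        return self._clock() - self._opened_at < self.reset_after_s

    def call(self, func, fallback=None):
        if self._is_open():
            # Fail fast instead of piling more load on a struggling service.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit open; call skipped")
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            if fallback is not None:
                return fallback()  # default response value
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller might use it as `breaker.call(fetch_recommendations, fallback=lambda: [])`, where `fetch_recommendations` is a placeholder for a real downstream call. Retries and backpressure are left out to keep the sketch short.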
46. Failing to Serve Upstream Dependencies
Understand what it means for the rest of the system when (not if) your service fails
A non-critical service (e.g. a logging service invoked asynchronously over UDP) can fail without causing upstream disruption, at the expense of log data loss
A critical synchronous service (e.g. a credit card payment service invoked over RPC) will require careful use by upstream components if transactions fail mid-stream
48. Trying Datawire Connect
It’s free and OSS!
https://github.com/datawire/datawire-connect
We work in a public Slack channel - feel free to join to ask questions about microservices in general, or about our tech (link on the GitHub page)
Watch the talks from our recent Microservices Practitioner Summit (speakers from Facebook, Netflix, Uber, Google, Yelp, New Relic…) on microservices.com
And like every other organization in here, we’re hiring!