12. Paradigm shift to Microservices
Loosely coupled service oriented architecture with
From Adrian Cockcroft (ex-Netflix Chief Architect)
13. What is an application?
A single container
– Putting multiple processes into a single container simplifies the deployment
– Breaks Docker best-practices model
– Needs a process supervisor: monit, supervisord, runsvdir, runit
A composition of related containers
– Pod (Kubernetes)
– Task (Amazon ECS – Elastic Container Service)
– Separation of operational concerns
– Not all frameworks understand the container composition
A graph of dependent containers
15. Release Process / Pipeline
1. A developer commits new code to a Repo
2. A build is triggered and creates an app artifact and
pushes it into the artifact repository with metadata:
1. Artifact has a hard version
2. Declares its contracts and contract versions
3. List of dependencies and their versions (Bill-of-materials) attached
3. Builds a Docker image and pushes it to the Docker registry
1. Inherits from official base image approved by InfoSec and Systems teams
2. Has exactly the same tag as the version of the app artifact – creates a 1:1
correlation with the source
4. Deployment ...
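
The metadata and tagging rules in steps 2 and 3 can be sketched as below. This is a minimal illustration, not a real pipeline: the `Artifact` class, the `image_reference` helper, the service name and the `registry.example.com` host are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Artifact:
    """Build output plus the metadata the pipeline attaches to it (hypothetical shape)."""
    name: str
    version: str                                       # hard version, never "latest"
    contracts: dict = field(default_factory=dict)      # contract name -> contract version
    dependencies: dict = field(default_factory=dict)   # bill of materials: dependency -> version


def image_reference(artifact: Artifact, registry: str) -> str:
    """Return an image reference whose tag is exactly the artifact version."""
    return f"{registry}/{artifact.name}:{artifact.version}"


artifact = Artifact(
    name="orders",
    version="1.4.2",
    contracts={"orders-api": "2"},
    dependencies={"requests": "2.31.0"},
)
print(image_reference(artifact, "registry.example.com"))
```

Deriving the tag from the artifact version, rather than assigning it separately, is what keeps image, artifact and source commit traceable to each other.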
16. Release Process Challenges
Pick Container Registry:
– Your own
Registry management is important:
– Disk space, Heavy images
– Tracking of what's in use
– Decommissioning and pruning of the artifacts
Prepare Docker host (configuration management)
– Fry and not Bake
Pull Docker container
– Beware of growing size
– Pre-warm the host with the base image or a previous version
– Single container – easy
– Composition of containers is a challenge (Fig? Your own? ...)
– What configuration (env vars, partitions, etc...) is needed?
External HIERARCHICAL config/settings management is the key (Consul,
– Passing secrets into the containers – think carefully!
Secret management is important (Consul, etcd, ...)
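
One way to read "hierarchical" config management is as layered overrides: global defaults, then per-environment values, then per-service values. The sketch below shows only that merge rule, assuming the layers have already been fetched from an external store; `resolve_config` and all keys and host names are hypothetical, and this is not how any particular tool implements it.

```python
def resolve_config(*layers: dict) -> dict:
    """Merge config layers; later (more specific) layers override earlier ones."""
    merged: dict = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are sections: merge them key by key.
                merged[key] = resolve_config(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical layers, ordered from least to most specific.
global_cfg = {"log_level": "info", "db": {"host": "db.internal", "pool": 5}}
env_cfg = {"db": {"host": "db.staging.internal"}}
service_cfg = {"log_level": "debug", "db": {"pool": 20}}

config = resolve_config(global_cfg, env_cfg, service_cfg)
print(config)
```

The resolved values would then be handed to the container as env vars or files at start time, so the image itself stays free of environment-specific settings.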
19. Testing Considerations
Not much different from testing a virtualized payload
Spin up sandbox environment
Test against API, Mocks, Fakes, Pact
– Use Blue/Green deployment
– Simpler and cheaper to do it in production
– Isolate traffic
– Gradually add load to the point of failure
– Monitor and measure
25. Service Discovery
No built-in SDN yet, just simple linking
Where are my dependencies?
Need to manage state of the App
When do you know that the app is healthy and running?
Runscope – tests contracts and validates the payload
– Or check the state from the LB – requires extra code
26. Am I alive?
When the service is ready to
How do you know if your service
is alive? Or still alive?
When the service can actually start accessing the
Introduce delayed initialization or retries
Make your orchestration smarter to recognize the
Stagger the start and introduce jitter into the system
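
Retries, delayed start and jitter can be combined in one small helper. The sketch below uses exponential backoff with full jitter, which is one common choice rather than something the slides prescribe; `backoff_delays`, `wait_until_ready` and the `check` callback are hypothetical names.

```python
import random
import time


def backoff_delays(retries, base=0.5, cap=30.0, rng=random.random):
    """Yield one randomized delay per retry (exponential backoff with full jitter)."""
    for attempt in range(retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def wait_until_ready(check, retries=5, sleep=time.sleep, **backoff):
    """Call check() until it returns True, sleeping a jittered delay between tries.

    Returns True as soon as check() succeeds, False if it still fails
    after the last retry.
    """
    for delay in backoff_delays(retries, **backoff):
        if check():
            return True
        sleep(delay)
    return check()
```

A service would call `wait_until_ready(lambda: can_reach_database())` before accepting traffic, where `can_reach_database` stands in for whatever dependency probe applies. Because every instance draws a different random delay, containers started together do not all retry at the same instant.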
27. Monitoring / Alerting
Adds another layer to monitor
Monitor both host and the
Rate of change is drastically
Location, Names, Versions – everything in motion
Multiple running versions at the same time
Multiple locations, regions, zones, DC, HA, etc...
Tools start to recognize Docker – DataDog, Librato, NewRelic,
Composite SLA metrics
28. Reasoning about failure
Tools assume containment
Most can't reason about
Your apps spanning
across multiple containers
Ex: Machine component
(disk?) failure will affect all
instances, VMs, Containers
29. Failure Detection, Cleanup
When to clean up the containers?
What does a container failure mean?
How to deal with the partial failure of the app
dependencies or linked containers?
Volume containers filling up the host storage – beware!
How to decommission / tear down:
– In what order?
– How to communicate with the Monitoring/Alerting
– Notify Change Management system
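
For the "in what order?" question, one reasonable answer is the reverse of the start order: stop a container before the things it depends on. The sketch below assumes the dependency graph is known up front; the container names and the `teardown_order` helper are hypothetical. It relies on `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def teardown_order(depends_on):
    """Return containers ordered so each one is stopped before its dependencies.

    depends_on maps a container name to the set of containers it depends on.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    start_order = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(start_order))


# Hypothetical app: web -> api -> (db, cache)
depends_on = {
    "web": {"api"},
    "api": {"db", "cache"},
    "db": set(),
    "cache": set(),
}
order = teardown_order(depends_on)
print(order)
```

The same graph, read in forward order, gives a start sequence, and each step in either direction is a natural place to mute or re-enable alerts and to notify change management.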
30. Container storage
Stateful containers are hard for the moment
Volumes disappear if the Docker host dies –
especially on the clouds: AWS, OpenStack, etc...
Use host mounts, but don't forget where your stuff is
and when to clean it up
Interesting: volume relocation by Flocker
31. Log Management
Eagerly move logs out – containers are short lived
Beware of sheer volume of logs – be smart about what and when
Can't truncate or rotate container STDOUT and STDERR
Write to volumes
Log rotation – volume rotation?
Log monitoring & alerting
– Scribe, LogStash
– Splunk (if you can afford it)
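
Since container STDOUT and STDERR cannot be rotated, one option is to have the app write size-capped, rotating files to a mounted volume and let a shipper pick them up from there. The sketch below uses the standard library `RotatingFileHandler`; the `build_logger` helper, the size limits and the temporary directory standing in for a volume are assumptions made for the example.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_dir, name="app", max_bytes=1_000_000, backups=3):
    """Log to a size-capped, rotating file under log_dir (ideally a mounted volume)."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    handler = RotatingFileHandler(
        directory / f"{name}.log", maxBytes=max_bytes, backupCount=backups
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


# A temporary directory stands in for the mounted log volume here.
log_dir = tempfile.mkdtemp()
logger = build_logger(log_dir, max_bytes=200, backups=2)
for i in range(50):
    logger.info("request %d handled", i)
```

With `backups=2` the volume never holds more than the live file plus two rotated ones, which bounds disk use on the host no matter how long the container runs.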
Cluster manager that provides efficient, fine-grained resource sharing
and isolation across distributed applications, or frameworks
Distributed resource broker
Running in production at Twitter since 2012
Became a top-level Apache project in July 2013
36. Missing Mesos features
No Pods support (multi-container apps)
No REST API to schedule jobs
No built-in clean-up
Tricky to write frameworks (but getting easier)
A lot of work to integrate with the
38. What's next?
– What will be the solution for SDN?
– Container dependencies discovery
– What's an on-prem alternative?
– How do we test apps?
– What is an app?
– Should we just stop using the app concept and move to stream processing?
39. Work in progress
– Correlation does not imply causation (from Wikipedia)
– Derivatives and predictive monitoring
– Machine learning
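
As a rough illustration of derivative-based, predictive monitoring: rather than alerting when a metric crosses a threshold, estimate its rate of change and alert on the projected time to the threshold. The sketch below uses a simple two-point slope and linear extrapolation, which is an assumption of this example and far cruder than a production model; the function names and sample values are hypothetical.

```python
def rate_of_change(samples):
    """Per-second slope between the first and last (timestamp_seconds, value) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def seconds_until(samples, threshold):
    """Linearly extrapolate when the metric reaches threshold; None if it is not rising."""
    slope = rate_of_change(samples)
    if slope <= 0:
        return None
    return (threshold - samples[-1][1]) / slope


# Hypothetical disk usage in percent, sampled one minute apart.
disk_usage = [(0, 50.0), (60, 80.0)]
print(seconds_until(disk_usage, threshold=95.0))
```

Applied to something like a volume filling up the host storage, this turns "disk is at 80%" into "disk will be full in about half a minute at the current rate", which is the more actionable signal.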