
The journey to container adoption in enterprise

War stories of Docker adoption in production at enterprise companies


  1. The journey to container adoption in Enterprise – personal observations by Igor Moochnick. Running Docker, Mesos and more in production
  2. Where do I come from?
  3. ● Monolithic architecture ● Local dependencies ● Everything in one place
  4. ● Static infrastructure ● Predictable operations ● Known change ● Scheduled downtime
  5. ● A lot of change control and coordination – MR, MC ● Waiting for approvals
  6. Paradigm shifts for speed – From: ● Requirements ● Correctness ● Stability ● Waterfall ● Monolith/3-tier – To: ● Market demand ● Customer's delight ● Speed ● Agile/Lean ● SOA/Services
  7. ● What's in it for us? ● Will it help? ● Is it hype? ● Static vs. Cloud ● Virtualization vs. Containers ● Private vs. public Docker?
  8. ● Gradual adoption of virtualization over 5 years ● Explosive adoption of containers over 2 years [Chart: interest over time for Virtualization, OpenStack and Docker (by Google Analytics)]
  9. ● Starting slow ● Getting used to it ● Finding limitations ● Isolation of the builds ● Slow? ● Container hosts ● Network vs. Storage
  10. Paradigm shift to Microservices ● Loosely coupled service-oriented architecture with bounded contexts – from Adrian Cockcroft (ex-Netflix Chief Architect)
  11. What is an application? ● A single container – Putting multiple processes into a single container simplifies the deployment – Breaks the Docker best-practices model – monit, supervisord, runsvdir, runit ● A composition of related containers – Pod (Kubernetes) – Task (Amazon AWS ECS – Elastic Container Service) – Separation of operational concerns – Not all frameworks understand container composition ● A graph of dependent containers
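As a minimal sketch of the "composition of related containers" option, here is what a Pod-like unit might look like with the Python docker SDK (docker-py); the image names and the shared network are hypothetical stand-ins.

```python
# Sketch: run two related containers (app + log-shipper sidecar) as one unit,
# joined by a private network -- a poor man's Pod. Assumes the docker SDK
# (pip install docker) and hypothetical image names.
import docker

client = docker.from_env()

# A private bridge network stands in for the Pod's shared network namespace.
client.networks.create("myapp-pod", driver="bridge")

app = client.containers.run(
    "registry.example.com/myapp:1.4.2",      # hypothetical app image
    name="myapp",
    network="myapp-pod",
    detach=True,
)
sidecar = client.containers.run(
    "registry.example.com/log-shipper:0.9",  # hypothetical sidecar image
    name="myapp-logs",
    network="myapp-pod",
    volumes_from=[app.id],                   # share the app's volumes
    detach=True,
)
```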
  12. Immutable Artifacts ● Configuration management doesn't guarantee immutability ● Cumulative change/drift vs. refresh ● Version everything! ● Turn your release process into an artifact! – Pipeline Builder: http://bit.ly/1Eoz7WV
  13. Release Process / Pipeline: 1. A developer commits new code to a repo 2. A build is triggered; it creates an app artifact and pushes it into the artifact repository with metadata: – the artifact has a hard version – it declares its contracts and contract versions – a list of dependencies and their versions (bill of materials) is attached 3. A Docker image is built and pushed to the Docker registry: – it inherits from an official base image approved by the InfoSec and Systems teams – it has exactly the same tag as the version of the app artifact, creating a 1:1 correlation with the source 4. Deployment ...
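A sketch of step 3 with docker-py: the image tag mirrors the artifact's hard version, which is what gives the 1:1 correlation with the source. The registry host, repo name and version are assumptions.

```python
# Sketch: build an image whose tag equals the app artifact's hard version,
# then push it to the registry. Names and versions are hypothetical.
import docker

ARTIFACT_VERSION = "1.4.2"            # the "hard version" produced by the build
REPO = "registry.example.com/myapp"   # hypothetical private registry/repo

client = docker.from_env()

# The Dockerfile should start with: FROM <infosec-approved-base-image>
image, build_logs = client.images.build(path=".", tag=f"{REPO}:{ARTIFACT_VERSION}")

# Push with exactly the same tag -- image 1.4.2 <-> artifact 1.4.2.
for line in client.images.push(REPO, tag=ARTIFACT_VERSION, stream=True, decode=True):
    print(line)
```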
  14. Release Process Challenges ● Pick a container registry: – Your own – DockerHub – Artifactory ● Registry management is important: – Disk space, heavy images – Tracking of what's in use – Decommissioning and pruning of artifacts – Availability – Auditing – Permissions
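For a self-hosted registry, the tracking and pruning chores can be scripted against the Docker Registry HTTP API v2 (DockerHub and Artifactory have their own APIs). A sketch, with the registry URL as an assumption:

```python
# Sketch: inventory a self-hosted registry via the Registry HTTP API v2
# to decide what to decommission. Registry URL is hypothetical.
import requests

REGISTRY = "https://registry.example.com"

repos = requests.get(f"{REGISTRY}/v2/_catalog").json()["repositories"]
for repo in repos:
    tags = requests.get(f"{REGISTRY}/v2/{repo}/tags/list").json().get("tags") or []
    print(repo, tags)
    # Deletion requires the manifest digest (Docker-Content-Digest header):
    # r = requests.head(f"{REGISTRY}/v2/{repo}/manifests/{tag}",
    #     headers={"Accept": "application/vnd.docker.distribution.manifest.v2+json"})
    # requests.delete(f"{REGISTRY}/v2/{repo}/manifests/{r.headers['Docker-Content-Digest']}")
```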
  15. Deployment ● Prepare the Docker host (configuration management) – Fry, don't Bake ● Pull the Docker container – Beware of growing image sizes – Pre-warm the host with the base image or a previous version ● Start the application – A single container is easy – A composition of containers is a challenge (Fig? Your own? ...) – What configuration (env vars, partitions, etc.) is needed? ● External HIERARCHICAL config/settings management is the key (Consul, Zookeeper, Hiera) – Passing secrets into the containers – think carefully! ● Secret management is important (Consul, EtcD, ...)
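A sketch of two of these steps: pre-warming the host with the base image, and resolving hierarchical config from Consul's KV HTTP API. The image name and key paths are assumptions.

```python
# Sketch: pre-warm the Docker host, then resolve hierarchical config
# from Consul's KV store. Image name and keys are hypothetical.
import base64
import docker
import requests

client = docker.from_env()

# Pre-pull so the real deploy only fetches the thin app layers.
client.images.pull("registry.example.com/approved-base", tag="7.1")

def consul_get(key, consul="http://localhost:8500"):
    """Read one key from Consul's KV store; values arrive base64-encoded."""
    resp = requests.get(f"{consul}/v1/kv/{key}")
    resp.raise_for_status()
    return base64.b64decode(resp.json()[0]["Value"]).decode()

# Hierarchical lookup: the most specific key wins (env > service > global).
for key in ("prod/myapp/db_url", "myapp/db_url", "global/db_url"):
    try:
        db_url = consul_get(key)
        break
    except requests.HTTPError:
        continue
```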
  16. Container Sprawl ● Versions ● Composition ● Ownership management ● Zombie containers ● Disappearing containers
  17. Testing Considerations ● Not much different from a virtualized payload ● Spin up a sandbox environment ● Test against APIs, Mocks, Fakes, Pact ● Go live? – Use Blue/Green deployment ● Pressure testing? – Simpler and cheaper to do it in production – Isolate traffic – Gradually add load to the point of failure – Monitor and measure
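One way to read "gradually add load to the point of failure": step the request rate up against an isolated traffic slice and watch the error rate. A rough sketch; the URL, steps and threshold are assumptions, and a single-threaded loop only approximates the target rate.

```python
# Sketch: step up load against an isolated endpoint until the error
# rate crosses a threshold. Endpoint and limits are hypothetical.
import time
import requests

URL = "https://myapp.example.com/health"   # isolated / dark-traffic slice

for rps in (1, 5, 10, 25, 50, 100):        # requests per second, stepped up
    errors = 0
    for _ in range(rps * 10):               # hold each step for ~10 seconds
        try:
            if requests.get(URL, timeout=2).status_code >= 500:
                errors += 1
        except requests.RequestException:
            errors += 1
        time.sleep(1.0 / rps)
    print(f"{rps} rps -> {errors} errors")
    if errors > rps:                         # point of failure found
        break
```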
  18. Environment Management ● Dev/QA/Prod/etc. environment parity ● Local dev machine vs. cloud deployment – BigRig: http://bit.ly/1Hnrq5w
  19. Lots of Microservices
  20. Change Management – http://accordance.github.io/
  21. Dependency Management ● Accordance tracks dependencies & ownership – http://accordance.github.io/
  22. Service Discovery ● No built-in SDN yet, just simple linking ● Where are my dependencies? – Eureka – EtcD – Consul ● Need to manage the state of the app – Starting – Running ● When do you know that the app is healthy and running? ● Healthchecks ● RunScope – tests contracts and validates the payload – Stopping – Dead – Or check the state from the LB – requires extra code
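A sketch of the Consul option: register the service with the local agent along with an HTTP healthcheck, so "where are my dependencies?" becomes a query for passing instances. Service names and ports are assumptions.

```python
# Sketch: register a service + HTTP healthcheck with the local Consul
# agent, then look up a dependency. Names/ports are hypothetical.
import requests

CONSUL = "http://localhost:8500"

requests.put(f"{CONSUL}/v1/agent/service/register", json={
    "Name": "myapp",
    "Port": 8080,
    "Check": {                        # Consul marks the service unhealthy
        "HTTP": "http://localhost:8080/health",
        "Interval": "10s",            # ...if this probe starts failing
    },
})

# Discover only the *healthy* instances of a dependency.
deps = requests.get(f"{CONSUL}/v1/health/service/payments?passing").json()
addresses = [(d["Node"]["Address"], d["Service"]["Port"]) for d in deps]
```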
  23. Am I alive? ● When is the service ready to receive traffic? ● How do you know if your service is alive? Or still alive? ● When can the service actually start accessing its linked dependencies/volumes? ● Introduce delayed initialization or retries ● Make your orchestration smart enough to recognize composition time ● Stagger the start and introduce jitter into the system
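"Delayed initialization or retries" plus "introduce jitter" combine naturally into one loop. A minimal sketch, where connect() is a hypothetical callable that raises until the linked dependency is up:

```python
# Sketch: retry dependency initialization with exponential backoff plus
# jitter, so a fleet of containers doesn't stampede a slow dependency.
import random
import time

def wait_for_dependency(connect, attempts=8, base=0.5):
    """connect: hypothetical callable that raises until the dependency is up."""
    for attempt in range(attempts):
        try:
            return connect()
        except Exception:
            # Exponential backoff, capped, with random jitter to stagger starts.
            delay = min(base * 2 ** attempt, 30) * random.uniform(0.5, 1.5)
            time.sleep(delay)
    raise RuntimeError("dependency never became available")
```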
  24. Monitoring / Alerting ● Adds another layer to monitor ● Monitor both the host and the containers ● The rate of change is drastically different ● Location, names, versions – everything is in motion ● Multiple running versions at the same time ● Multiple locations, regions, zones, DCs, HA, etc. ● Tools are starting to recognize Docker – DataDog, Librato, NewRelic, … ● Composite SLA metrics
  25. Reasoning about failure ● Tools assume a containment hierarchy ● Most can't reason about the relationships ● Your apps span multiple containers and hosts ● Ex: a machine component (disk?) failure will affect all instances, VMs, containers and apps [Diagram: containment hierarchy – Region > Zone/DC > Environment > Machine > VM/Instance > Container > Processes, plus linked containers and volume storage]
  26. Failure Detection, Cleanup ● When to clean up the containers? ● What does a container failure mean? ● How to deal with partial failure of the app's dependencies or linked containers? ● Volume containers filling up the host storage – beware! ● How to decommission / tear down: – What? – In what order? – How to communicate with Monitoring/Alerting – Notify the Change Management system
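One cleanup answer, sketched with docker-py: sweep exited containers and their anonymous volumes, telling the monitoring system first. The notify() hook is a hypothetical stand-in for a real alerting integration.

```python
# Sketch: reap exited containers and their anonymous volumes,
# notifying monitoring first. notify() is hypothetical.
import docker

client = docker.from_env()

def notify(event, name):
    print(f"[monitoring] {event}: {name}")   # stand-in for a real alerting hook

for c in client.containers.list(all=True, filters={"status": "exited"}):
    notify("decommissioning container", c.name)
    c.remove(v=True)                          # v=True also removes anon volumes
```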
  27. Container storage ● Stateful containers are hard for the moment ● Volumes disappear if the Docker host dies – especially on the clouds: AWS, OpenStack, etc. ● Use host mounts, but don't forget where your stuff is and when to clean it up ● Interesting: volume relocation by Flocker
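The host-mount approach, sketched with docker-py: state lives on a host path that outlives the container, which is exactly why you must track where it is and when to clean it. Paths and the image name are assumptions.

```python
# Sketch: keep state on a host mount so it survives the container.
# Host path and image name are hypothetical.
import docker

client = docker.from_env()
client.containers.run(
    "registry.example.com/mydb:2.0",
    detach=True,
    volumes={"/srv/mydb-data": {"bind": "/var/lib/db", "mode": "rw"}},
)
# /srv/mydb-data outlives the container -- but it is pinned to *this* host,
# so remember where it lives and schedule its cleanup yourself.
```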
  28. Log Management ● Eagerly move logs out – containers are short-lived ● Beware of the sheer volume of logs – be smart about what and when you ship ● Can't truncate or rotate container STDOUT and STDERR ● Write to volumes ● Log rotation – volume rotation? ● Log analysis ● Log monitoring & alerting ● Tool examples: – Scribe, LogStash – FluentD – Splunk (if you can afford it)
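A sketch of "eagerly move logs out": tail a container's STDOUT/STDERR with docker-py and hand each line to a shipper (LogStash, FluentD, ...). The container name and ship() function are hypothetical stand-ins.

```python
# Sketch: stream a container's stdout/stderr off the host as it is
# produced; ship() is a hypothetical hand-off to LogStash/FluentD/etc.
import docker

client = docker.from_env()
container = client.containers.get("myapp")    # hypothetical container name

def ship(line: str):
    print("shipping:", line)                  # replace with a real forwarder

for chunk in container.logs(stream=True, follow=True):
    ship(chunk.decode(errors="replace").rstrip())
```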
  29. Mesos ● Cluster management: provides efficient, fine-grained resource sharing and isolation across distributed applications, or frameworks ● Distributed resource broker ● Has run in production at Twitter since 2012 ● Became a top-level Apache project in July 2013
  30. Mesos Ecosystem ● Marathon ● Chronos ● Singularity (HubSpot) – Monitoring: growing queues, failure rates, health checking ● [Apache] Aurora (Twitter) – Working rolling upgrades – Service health-checks – Notifications/service ownership/quotas ● Note (can't wait): Mantis (Netflix) – Distributed scheduler (Fenzo) + predictive auto-scaling (Scryer) – Resource optimization – Auto-scaling micro-service graph
  31. Docker Cluster Management
  32. Missing Mesos features ● AWS multi-region? ● Sticky locations? ● Persistent volumes? ● No Pod support (multi-container apps) ● No REST API to schedule jobs ● No built-in clean-up ● Tricky to write frameworks (but getting easier) ● A lot of work to integrate with the monitoring/alerting/logging systems
  33. What's next? ● Kubernetes – What will be the solution for SDN? – Container dependency discovery ● Lambda architecture – What's an on-prem alternative? – How do we test apps? – What is an app? – Should we just stop using the app concept and move to stream processing?
  34. Work in progress ● Failure tracking – Correlation does not imply causation (from Wikipedia) – Derivatives and predictive monitoring – Machine learning
  35. Data, Request & Control Flow – Salp (inspired by Dapper)
  36. Credits ... ● "Who Moved My Cheese?" movie by Dr. Spencer Johnson ● Apache Mesos at Twitter (Texas LinuxFest 2014) ● Containers at the Hong Kong commercial port ● Yes, Prime Minister
  37. Thank you! Questions? @igor_moochnick igor@igorshare.com http://r44e.wordpress.com/
