Considerations for Operating Docker at Scale
Andrew Hromis, Solution Architect, Docker
Sujay Pillai (@sujaypillai), Sr. DevOps Engineer, Jabil
Scaling Docker EE
• Architecture
• Node sizing
• Orchestration
• Applications
Architectural Considerations
• Design for failure
• Leverage Docker Certified Infrastructure
• Prioritize centralized logging and metric collection
Single Logically Separated Cluster
[Diagram: one Docker Enterprise Edition swarm mode cluster of worker nodes, shared by a .NET dev team using Swarm, a Java dev team using K8s, a Java dev team using Swarm, and an ops team.]
Clusters by Lifecycle
[Diagram: lifecycle stages Development, CI/CD, Operations. Labels: Developer Machine, Docker for, Version Control, Docker Trusted Registry (shown twice), Docker UCP for Non-Production Environments, Docker UCP for Production Environments (shown twice), Datacenter 1, Datacenter 2.]
Multi-region DR
West East
CI Agent
$ eval $(<env.sh) … east
$ docker run
$ docker service
$ docker-compose up
myapp v1.0
Multi-region DR
West East
CI Agent
$ eval $(<env.sh) … west
$ docker run
$ docker service
$ docker-compose up
myapp v1.0
Blue/Green
Run your tests!
foo.example.com (1.0): PRODUCTION
foo.green.example.com (1.1): NEW RELEASE
Ship it!
Increase traffic to the new release
foo.example.com (1.0): PRODUCTION, 90% Traffic
foo.example.com (1.1): PRODUCTION, 10% Traffic
All good?
Measure the new release
foo.example.com (1.0): PRODUCTION, 90% Traffic
foo.example.com (1.1): PRODUCTION, 10% Traffic
Cluster Upgrade
Maintenance can be performed on blue, or rollback
foo.example.com (1.0): PRODUCTION, 0% Traffic
foo.example.com (1.1): PRODUCTION, 100% Traffic
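The gradual shift from blue to green can be sketched as a weighted routing decision. The slides do not say how traffic is split in practice, so the hashing scheme, the function names, and the stage list below are illustrative assumptions rather than the presenters' implementation.

```python
import zlib

# Rollout stages taken from the slides: test on the green hostname (0%),
# send 10% of production traffic to the new release, then 100%.
ROLLOUT_STAGES = [0, 10, 100]


def pick_release(request_key: str, green_percent: int) -> str:
    """Return "green" for roughly green_percent% of keys, otherwise "blue".

    Hypothetical helper: the key (for example a user or session id) is hashed
    into a stable bucket 0..99, so the same key always gets the same release
    at a given percentage.
    """
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    bucket = zlib.crc32(request_key.encode("utf-8")) % 100
    return "green" if bucket < green_percent else "blue"


def split_counts(keys, green_percent):
    """Count how many keys land on each release at the given percentage."""
    counts = {"blue": 0, "green": 0}
    for key in keys:
        counts[pick_release(key, green_percent)] += 1
    return counts
```

Because the bucket is derived from the key rather than drawn at random, a key that reaches green at 10% stays on green as the percentage grows, and setting the percentage back to 0 returns everyone to blue.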
Scaling w/ Docker Certified Infra
Node Sizing
Manager Nodes
• CPU: 4 vCPU
• Memory: 16GB
• Disk: SSD for /var/lib/docker
• Support 100s of worker nodes
• 3 or 5 managers are preferred
Worker Nodes
• Sizing depends on application workloads
• If migrating, there will be less overhead from the OS
• Leave headroom for rescheduling events
• Run under load and test
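The preference for 3 or 5 managers follows from Raft quorum arithmetic, which swarm mode managers rely on. A minimal sketch, with function names chosen here for illustration:

```python
def raft_quorum(managers: int) -> int:
    """Managers that must be reachable for the cluster to accept changes."""
    if managers < 1:
        raise ValueError("a cluster needs at least one manager")
    return managers // 2 + 1


def failures_tolerated(managers: int) -> int:
    """Managers that can be lost while a quorum still remains."""
    if managers < 1:
        raise ValueError("a cluster needs at least one manager")
    return (managers - 1) // 2


# Print the trade-off for small manager counts.
for n in (1, 3, 4, 5, 7):
    print(f"{n} managers: quorum {raft_quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

An even count adds a manager without adding fault tolerance (4 managers tolerate one failure, the same as 3), which is why odd counts are the usual choice.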
Orchestration
• Don’t schedule workloads on manager nodes
• Deploy to nodes that fit app profiles
• Constrain resources
• Use Kubernetes namespaces in environments with multiple users and teams
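As a hedged illustration of the first three points for Swarm services, the sketch below assembles a `docker service create` command that keeps tasks off manager nodes and caps CPU and memory. The service name, image, and limit values are placeholders invented for this example; only the CLI flags themselves are standard.

```python
import shlex


def service_create_args(name, image, cpu_limit="0.5", memory_limit="256M", replicas=2):
    """Build an argument list for `docker service create`.

    --constraint node.role==worker keeps tasks off manager nodes;
    --limit-cpu / --limit-memory constrain each task's resources.
    """
    return [
        "docker", "service", "create",
        "--name", name,
        "--replicas", str(replicas),
        "--constraint", "node.role==worker",
        "--limit-cpu", cpu_limit,
        "--limit-memory", memory_limit,
        image,
    ]


# Placeholder service and image names, printed rather than executed.
print(shlex.join(service_create_args("web", "nginx:alpine")))
```

Further constraints on node labels could express "nodes that fit app profiles", but the label names would be specific to each cluster, so none are assumed here.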
Scaling Applications
• Stateless applications scale the best
• Scale applications with any orchestrator
• Adjust replica count in Kubernetes and Swarm
• Understand the metrics by which to scale
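One way to turn a metric into a replica count is the proportional rule used by the Kubernetes Horizontal Pod Autoscaler: scale the current count by the ratio of the observed metric to its target. The sketch below applies that rule; the function name, the bounds, and the example numbers are assumptions for illustration, not values from the talk.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Proportional scaling: ceil(current * observed / target), clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Example: 4 replicas averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```

The result could then be applied with `docker service scale` on Swarm or `kubectl scale --replicas` on Kubernetes.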
Manufacturing at the Speed of Digital
© JABIL, INC. | PUBLIC
Built on a solid foundation
• Founded in Michigan in 1966
• World’s most technologically advanced manufacturing solutions provider
• 37 million square feet of manufacturing space
• 100+ sites in 29+ countries
• Tenured management team
• 180K dedicated employees
Over 100 sites in 29 countries
Our Markets
Innovation Fuel: Engineering Excellence
[Capability grid: Automation, Fluidics, Acoustics, Dynamic Tuning, Emerging Markets, Human Machine Interface, Adhesives, IoT, IT Cyber Security, Additive Manufacturing, Advanced Assembly, Intelligent Digital Supply Chain, Experience Design, Mechanical Engineering, Human Factors Research & Strategy, Materials Technology, Miniaturization, Optical Communications & Networking, Smart Clothing, Optics, Power Engineering, Precision Injection Mold Tooling, Precision Mechanics, Printed Electronics, Sensors, Test Engineering, Value Engineering, Wireless Connectivity, Industrial Design, UI/UX, Electrical Engineering / Firmware, Software Development.]
Our journey from Docker CE to EE
• Started on Docker CE 1.13, then upgraded to 17.06.0-ce
• 9-node cluster [5 manager + 4 worker nodes]
• DFP (Docker Flow Proxy) – HAProxy plus custom logic that provides on-demand reconfiguration
• GlusterFS for storage [3 clustered servers]
• Standalone registry server with Portus as the web frontend
• Portainer – management solution for Docker
• Prometheus – monitoring
Stages: Getting Started, First Project, Scale, Innovate
Docker EE Architecture
[Diagram: Docker EE cluster (Docker Enterprise Edition) with 3 manager nodes forming the management plane and 7 worker nodes.]
Docker EE Architecture
[Diagram: the same cluster of 3 manager nodes (management plane) and 7 worker nodes, with an Availability Set added and push / pull traffic through docker.corp.jabil.org.]
Docker EE Architecture
[Diagram: the same cluster, with the management plane placed in its own Availability Set and deploy / manage traffic through ucp.docker.corp.jabil.org.]
Docker EE Architecture
[Diagram: the same cluster, with the management plane and further node groups each in their own Availability Set, and application traffic through *.docker.corp.jabil.org.]
Dockerized Apps
Docker – Shop Floor Solutions
Shop Floor Solutions – Malaysia, Vietnam, Singapore, China, India, Italy
Web Kiosk, E-TV
• $450 savings per station in hardware costs ($500 vs $50)
• Lower energy usage per device (51.84 kWh vs 4.32 kWh per month)
• Lobby, Cafeteria, PCC Stations, PLSD (~1,000)
RDP – Remote Desktop Protocol
• $200 savings per station in hardware costs ($250 vs $50)
• Lower energy usage per device (11.51 kWh vs 4.32 kWh per month)
• Replaces Thin Clients (~20,000)
Highly scalable + automated Pi update system that is productionized
Alternate technologies lowering costs and easier to deploy
“Pi wouldn’t have happened without Docker” … Eric Kerin
194 Pis & growing
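The per-station figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes the $500 / 51.84 kWh figures belong to the Web Kiosk and E-TV column and the $250 / 11.51 kWh figures to the RDP column, which is how the slide appears to pair them; the class and field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationComparison:
    """Cost and energy of the replaced device versus the Raspberry Pi."""
    name: str
    old_cost_usd: float
    new_cost_usd: float
    old_kwh_month: float
    new_kwh_month: float

    def hardware_saving(self) -> float:
        """One-off hardware saving per station, in USD."""
        return self.old_cost_usd - self.new_cost_usd

    def energy_saving_kwh(self) -> float:
        """Monthly energy saving per device, in kWh (rounded to 2 places)."""
        return round(self.old_kwh_month - self.new_kwh_month, 2)


# Figures copied from the slide; the column pairing is an assumption.
KIOSK = StationComparison("Web Kiosk, E-TV", 500, 50, 51.84, 4.32)
RDP = StationComparison("RDP", 250, 50, 11.51, 4.32)

for station in (KIOSK, RDP):
    print(station.name, station.hardware_saving(), station.energy_saving_kwh())
```

Fleet-wide totals would multiply these per-station values by a station count; the slide gives only approximate target counts (~1,000 and ~20,000) alongside 194 Pis deployed so far, so no total is computed here.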
Azure Scale/Availability Set
[Diagram: availabilityset-dtr spreads five DTR nodes across fault domains FD 0 and FD 1 and across update domains: DTR01 in UD0, DTR02 in UD1, DTR03 in UD2, DTR04 in UD3, DTR05 in UD4.]
Scale Set
• Identical VMs
• Unpredictable workload
Availability Set
• VMs need not be identical
• Predictable workload
Image Caching
docker.jabil.com/pillais1/alpine
14.209s
20.293s
Image Caching
7.094s
5.159s
DTR Garbage Collection
• Until done
• For x minutes
• Never
Monitoring
Monitoring
Scheduler
Monitoring
docker service create --name devservices-omsagent --mode global \
  --mount type=bind,source=/var/run/docker.sock,destination=/var/run/docker.sock \
  --secret source=devservices_oms_wsid,target=WSID \
  --secret source=devservices_oms_wskey,target=KEY \
  -p 25225:25225 -p 25224:25224/udp --restart-condition=on-failure \
  microsoft/oms
Additional information:
• Docker Certified Infrastructure: https://success.docker.com/architectures
• DTR Image Caching: https://docs.docker.com/ee/dtr/admin/configure/deploy-caches/
• Running Docker EE at scale: https://success.docker.com/article/running-docker-ee-at-scale
Thank you!
