THE ENTERPRISE IT CHECKLIST FOR DOCKER OPERATIONS
Nicola Kabar
Solutions Architect
Docker
@nicolakabar
1. The Enterprise IT Goal: Deliver Value to Customers, Fast!
2. Docker is at the center of it all
3. Biggest challenge is moving to production in time to prove value
Docker Production Readiness Checklist for Enterprise IT
The Checklist
✓ Infrastructure
✓ Orchestration Management
✓ Image Distribution
✓ Security
✓ Network
✓ Storage
✓ Logging and Monitoring
✓ Integration
✓ Disaster Recovery
✓ Testing
You can also follow along!
Infrastructure
✓ Cluster Sizing and Zoning
✓ Supported and Compatible Versions (OS, Docker Engine, UCP, DTR)
✓ Host Sizing (Manager vs. Worker Nodes)
• Manager (minimum): 16 GB memory, 4 vCPU, 1+ Gbps network, 32+ GB disk
• Worker (minimum): 4 GB memory, 2 vCPU, 100+ Mbps network, 8 GB disk
Orchestration Management
✓ Redundant/highly available UCP managers
✓ Deployed in odd numbers (3, 5, 7) to maintain quorum
✓ Distributed across DCs or availability zones (1-1-1, 2-2-1, etc.)
✓ Fine-tuned orchestration settings (e.g. Task History Limit, Raft settings, node certificate rotation)
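The settings above map to ordinary `docker swarm update` flags. A minimal sketch, to be run on a manager node of an existing swarm; the values are illustrative, not tuned recommendations:

```shell
# Keep only the most recent task entry per slot (shrinks the Raft store)
# and rotate node certificates monthly.
docker swarm update \
  --task-history-limit 1 \
  --cert-expiry 720h

# Snapshot the Raft log every 10000 entries.
docker swarm update --snapshot-interval 10000
```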
Orchestration Management
✓ Upstream TCP load balancing
✓ No application workloads on managers
✓ Automated join and leave process
✓ Labeled resources (networks, volumes, containers, services, secrets, nodes)
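Labeling and constraining resources can be done directly from the CLI. A sketch with hypothetical names (`worker-01`, `zone`, `payments-team`):

```shell
# Label a node with its availability zone, then pin a service to that zone
# and tag the service itself with an ownership label.
docker node update --label-add zone=dc1 worker-01
docker service create --name web \
  --constraint 'node.labels.zone == dc1' \
  --label owner=payments-team \
  nginx:alpine
```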
Image Distribution
✓ Redundant (3, 5, 7) DTR replicas
✓ Replicated and secured image backend storage (NFS, S3, Azure Storage, etc.)
✓ Garbage collection enabled
✓ Security scanning enabled
Security
✓ Utilize the Docker EE RBAC model (subjects, grants, roles, collections, resources)
✓ AD/LDAP groups mapped to teams and organizations
✓ Docker Content Trust signing and enforcement
✓ Regular runs of Docker Bench for Security
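Docker Bench for Security ships as a container image; the invocation below follows the project's published pattern, but check the docker-bench-security README for the exact mounts expected by your version:

```shell
# Audit the local host against the CIS Docker Benchmark.
# Read-only mounts give the checks visibility into host config.
docker run --rm --net host --pid host --userns host \
  --cap-add audit_control \
  -v /etc:/etc:ro \
  -v /var/lib:/var/lib:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  docker/docker-bench-security
```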
Security
✓ Restricted direct access (SSH/RDP)
✓ Utilize built-in secrets functionality (encrypted, access-controlled)
✓ Rotate orchestration join keys
✓ Use the built-in CA or your own
✓ Valid SSL/TLS certificates for UCP and DTR
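The secrets and key-rotation items above correspond to built-in CLI commands. A sketch (the secret value and `myorg/api` image are placeholders):

```shell
# Store a credential as an encrypted swarm secret and attach it to a service.
printf 'S3cr3t!' | docker secret create db_password -
docker service create --name api --secret db_password myorg/api:1.0

# Rotate the worker join token so previously leaked tokens can no longer join.
docker swarm join-token --rotate worker

# Rotate the unlock key (when autolock is enabled on the swarm).
docker swarm unlock-key --rotate
```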
Network
✓ Pick the right networking driver for your application (overlay, bridge + host port mapping)
✓ Select the proper publishing mode for external traffic (ingress vs. host mode)
✓ Pick a suitable load-balancing mode (client-side = dnsrr, server-side = vip)
✓ Network latency < 100 ms
Network
✓ Segment apps at L3 with overlays (1 app → 1 overlay network)
✓ Utilize the built-in encrypted overlay feature (app <--> app encrypted)
✓ Pick the application subnet size carefully
✓ Designate non-overlapping subnets for Docker overlay networks
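Creating a per-application overlay with an explicit, non-overlapping subnet and encryption looks like this (the subnet and network name are illustrative):

```shell
# One overlay per application, with an explicit subnet chosen not to
# collide with the underlay, and IPsec encryption for east-west traffic.
docker network create \
  --driver overlay \
  --subnet 10.10.200.0/24 \
  --opt encrypted \
  app-a-net
```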
Customer Story (diagram): overlay network A uses 10.10.10.0/24, the same range as the underlay network; container A (10.10.10.10/24) therefore collides with server A (10.10.10.100) on the underlay.
Recommendation (diagram): move overlay network A to 10.10.200.0/24 (container A at 10.10.200.10/24), distinct from the 10.10.10.0/24 underlay where server A (10.10.10.100) lives.
Customer Story
✗ Improper network subnet design
✗ Overlapping subnet with the underlay range
✗ Black-holing traffic intended for services outside the cluster
✓ Recommendation: dedicate subnets from the underlay to be used by Docker
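A pre-flight check of the sort that would have caught this story: compare each proposed overlay subnet against the underlay range before creating networks. A sketch that shells out to python3's `ipaddress` module, using the ranges from the story:

```shell
# Return success (exit 0) when two CIDR ranges overlap.
overlaps() {
  python3 -c 'import ipaddress, sys
a = ipaddress.ip_network(sys.argv[1])
b = ipaddress.ip_network(sys.argv[2])
sys.exit(0 if a.overlaps(b) else 1)' "$1" "$2"
}

underlay="10.10.10.0/24"
for subnet in 10.10.10.0/24 10.10.200.0/24; do
  if overlaps "$subnet" "$underlay"; then
    echo "REJECT $subnet: overlaps underlay $underlay"
  else
    echo "OK     $subnet"
  fi
done
```

The first candidate is rejected (it equals the underlay range) and the second passes, matching the recommendation in the diagram.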
Storage
✓ Production-ready engine storage backend configuration (esp. devicemapper on CentOS/RHEL)
✓ Replicated and secure DTR storage backend
✓ Certified and tested application data storage plugin for replicating application data
Logging and Monitoring
✓ External centralized logging for engine and application container logs
✓ Local logging for active troubleshooting (json-file or journald)
✓ Host-level and container-level resource monitoring
✓ DTR image backend storage monitoring
✓ Docker engine storage monitoring
✓ Use the built-in application health checking functionality
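Two of the items above can be sketched concretely: capping local json-file logs via `daemon.json`, and attaching a health check to a service so the orchestrator replaces unhealthy tasks. Endpoint, image name, and thresholds are hypothetical:

```shell
# Cap local json-file logs so troubleshooting output cannot fill the disk
# (requires root; restart the engine after changing daemon.json).
cat > /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}
EOF

# Built-in health checking: a failing endpoint marks the task unhealthy
# and the orchestrator reschedules it.
docker service create --name web \
  --health-cmd 'curl -fsS http://localhost:8080/health || exit 1' \
  --health-interval 30s --health-retries 3 \
  myorg/web:1.0
```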
Customer Story (diagram): CI pushes images to DTR, which stores them on an NFS backend.
Customer Story
✗ DTR storage backend was not monitored
✗ GC was not enabled
✗ CI led to excessive image pushes
✗ Storage filled up -> can't push to DTR
✓ Recommendation: monitor the storage backend and enable GC
Integration
✓ UCP and DTR are well integrated (SSO, DCT, etc.)
✓ CI/CD tooling (Jenkins, Bamboo, CircleCI, Travis CI, etc.)
✓ Development tooling (dev machines, IDEs)
✓ Configuration automation tools (Puppet, Chef, Ansible, Salt)
✓ Resource provisioning systems (Terraform, etc.)
Integration
✓ Change management systems
✓ Internal/external DNS or other service discovery and registration systems
✓ Load balancing for both the management plane and each of the applications (L4/L7)
✓ Incident/ticketing management systems (ServiceNow, etc.)
Disaster Recovery
✓ Regular (recommended: weekly) backups (UCP, DTR, and Swarm)
✓ Well-tested, automated, and documented
• platform restoration
• upgrade + downgrade
• application recovery procedures
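A rough shape of the backup steps, assuming a manager you can briefly stop. UCP backup flags vary by version, so treat this as a sketch and consult the docs for your release:

```shell
# Swarm: snapshot the Raft state from a manager. Stop the engine first so
# the state is consistent; do this on one manager at a time.
systemctl stop docker
tar -czf /backup/swarm-$(date +%F).tar.gz -C /var/lib/docker swarm
systemctl start docker

# UCP: the bundled backup command streams a tarball to stdout.
# Replace <version> with your UCP release; flags differ across versions.
docker container run --rm -i --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp:<version> backup --passphrase "example" > /backup/ucp-$(date +%F).tar
```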
Customer Story (diagram): back up each layer separately: UCP, DTR, Swarm, and the DTR storage backend.
Customer Story
✗ No backups since installation
✗ Lost quorum led to cluster failure
✗ No way to recover UCP and DTR configurations
✗ Manual re-install + re-config of settings, teams, groups
✓ Recommendation: frequent backups and tested restore procedures
Testing
✓ Multi-platform image pull and push to DTR
✓ Confirm users have the right set of access to their respective resources
✓ Confirm application resource limits work as expected
✓ End-to-end stack deployment from CLI and UI
✓ Updating applications with new configurations, images, and networks using rolling upgrades
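Rolling upgrades from the last item can be exercised with `docker service update`; a sketch against a hypothetical `web` service:

```shell
# Roll a new image across the service two tasks at a time, pausing between
# batches, and roll back automatically if the update fails.
docker service update \
  --image myorg/web:2.0 \
  --update-parallelism 2 \
  --update-delay 10s \
  --update-failure-action rollback \
  web
```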
Docker Success Center
success.docker.com
Things you can find in Success Center
● Complete Docker EE Cluster Upgrade Guide
● End-to-End Security Best Practices
● Logging Design and Best Practices
● Support & Compatibility Matrix
● Troubleshooting Guides
● + 100s of Technical Assets
Key Takeaways
✓ Define a readiness checklist for an accelerated, smooth, and successful path to production
✓ Design the Docker Enterprise platform based on recommended architectures
✓ Deliver to Differentiate
Sign Up for Docker EE Hosted Demo and Kubernetes Beta!
docker.com/trial
docker.com/kubernetes
Thanks!
Questions?
@nicolakabar
Editor's Notes

  • #6 This talk is all about sharing a checklist that can accelerate your CaaS production readiness.
  • #7 High availability requires 3, 5, or 7 managers to match application requirements. Settings include task history limit, snapshotting limits, key rotation frequency, scheduling strategy, etc.
  • #9 Managers: 16 GB memory, 4 vCPU, 1+ Gbps network bandwidth, 32 GB storage.
  • #10 Fine-tune orchestration settings: Task History Limit = 1, node certificate expiry setting, Raft settings.
  • #13 Role-based access control provides the right level of access to the various team members.
  • #15 Bridge for local services, overlay for secured east-west traffic, or macvlan for north-south ingress traffic.