LB+HA webapp with Docker Swarm
Simone Soldateschi
ROME - APRIL 13/14 2018
“Let me tell you a story...”
Who I am
● Staff Engineer at Slack
● Previously at Microsoft, Rackspace
● 10+ years of experience as
○ Software Engineer
○ Systems Engineer
○ DevOps'in for the last 8 years
Simone Soldateschi
@soldasimo
simone.soldateschi@gmail.com
Friday: 5.50pm-6.30pm
+ during breaks
● Container Orchestration Engines, COEs
● Docker Swarm Mode, why
● Cloud Infrastructure, abs
● Service Management, how
Agenda
Container Orchestration
Engines
COEs
Kubernetes
- Lots of features (auto-scaling, secrets management, UI)
- YAML deployment model
- Pods
- Large community
- Google, Red Hat, Azure

Docker Swarm Mode
- Quite recent project
- YAML deployment model
- Multi-master
- Auto-healing
- TLS network security
- Quick and easy
- No UI
- No auto-scaling
- No external load-balancing

Apache Mesos
- Multi-master
- Highly scalable
- Multi OSes
- Steep learning curve
- Airbnb, ~Apple, eBay, Netflix, Twitter
Docker Swarm Mode
The path to Docker Swarm Mode
«If you are using a Docker version prior to 1.12.0,
you can use standalone swarm,
but we recommend updating.»
https://docs.docker.com/engine/swarm/
Standalone Swarm != Swarm mode
$ docker swarm init
$ docker swarm join
Key Features
● Cluster management with Docker Engine
● Declarative service model
● Scaling (declarative, not automatic)
● Multi-host networking
● Service Discovery
● Load balancing
● Secure by default (TLS)
● Rolling updates
● Developer oriented
Docker Swarm Security
Cloud Infrastructure
Manager initialises cluster
docker swarm init \
  --advertise-addr $MANAGER_IP
Stand up basic cluster
[Manager|Worker]
docker swarm join \
  --token $TOKEN_[MANAGER|WORKER] \
  $MANAGER_IP:2377
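The two commands above can be strung together into a small bootstrap script; `docker swarm join-token -q` prints the token each role needs (the IP below is a placeholder, not from the talk):

```shell
# On the first manager: initialise the swarm, advertising its own IP.
MANAGER_IP=10.0.0.10   # placeholder address
docker swarm init --advertise-addr "$MANAGER_IP"

# Retrieve the join tokens (run on the manager).
TOKEN_WORKER=$(docker swarm join-token -q worker)
TOKEN_MANAGER=$(docker swarm join-token -q manager)

# On each additional node: join as worker (or manager) on port 2377.
docker swarm join --token "$TOKEN_WORKER" "$MANAGER_IP:2377"
```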
Infrastructure as Code
[Diagram: DEV — template → PR → plan; DEPLOY — plan → apply / destroy]
Basic Cluster
Architecture
Raft
Internal Distributed Store
[Diagram: managers form a Raft consensus group (one Leader, two Followers) over the internal distributed store; workers communicate over a gossip network]
The Raft Consensus Algorithm
The Secret Lives of Data
Raft consensus
Given N managers:
● Raft tolerates up to (N-1)/2 failures.
● Raft requires a quorum of (N/2)+1 members to agree
on values proposed to the cluster.
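Plugging numbers into the two formulas makes the trade-off concrete; a minimal sketch using the slide's integer arithmetic:

```python
def fault_tolerance(n: int) -> int:
    """Manager failures a Raft group of n members tolerates: (N-1)/2."""
    return (n - 1) // 2

def quorum(n: int) -> int:
    """Members that must agree on a proposed value: (N/2)+1."""
    return n // 2 + 1

# Even manager counts add no fault tolerance: prefer 3, 5 or 7 managers.
for n in (1, 2, 3, 4, 5, 7):
    print(f"{n} managers -> quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Note that 4 managers tolerate no more failures than 3, which is why odd-sized manager groups are the norm.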
Cluster Fault Tolerance
Multi AZ
Scale Cluster in/out
[Diagram: separate auto-scaling groups (ASGs) for managers and workers]
Docker Swarm cluster revised
Manager
docker swarm join \
  --token $TOKEN_MANAGER \
  $MANAGER_IP:2377
Worker
docker swarm join \
  --token $TOKEN_WORKER \
  $MANAGER_IP:2377
Autojoin Swarm Nodes
Autojoin Swarm Nodes
1. Ask manager for a token,
   e.g. API, SSH
2. Fetch token
3. Join with token
Autojoin Swarm Nodes
1. Ask Vault for a token
2. Fetch token
3. Join with token
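The Vault flow might look like the sketch below; the KV path `secret/swarm` and the `worker_token` field name are assumptions for illustration, not part of the talk:

```shell
# Hypothetical setup: a manager stored the worker join token in Vault, e.g.
#   vault kv put secret/swarm worker_token="$(docker swarm join-token -q worker)"

# A new node fetches the token and joins (path/field names are assumptions).
TOKEN=$(vault kv get -field=worker_token secret/swarm)
docker swarm join --token "$TOKEN" "$MANAGER_IP:2377"
```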
Service Management
Container, Service and Stack
[Diagram: a stack contains services; a service runs containers]
A day in the life of a Docker Service
create · scale · update · rollback · rm
ls · ps · inspect · logs
$ docker service create --name web -p 80:80 nginx
overall progress: 1 out of 1 tasks
1/1: running [=================================>]
verify: Service converged
Docker service
$ docker service scale web=3
web scaled to 3
overall progress: 3 out of 3 tasks
1/3: running [=================================>]
2/3: running [=================================>]
3/3: running [=================================>]
verify: Service converged
$ docker service rm web
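Two of the lifecycle commands above deserve a demo of their own: `update` performs a rolling update task by task, and `rollback` reverts to the previously deployed spec (image tags below are illustrative):

```shell
# Recreate the service pinned to a tag, then roll it to a newer image.
docker service create --name web -p 80:80 nginx:1.13
docker service update --image nginx:1.14 web   # rolling update

# Something wrong? Revert to the previous service spec.
docker service rollback web

# Inspect which image each task is running now.
docker service ps web
```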
A day in the life of a Docker Stack
deploy · services · rm
ls · ps
Docker Stacks
version: "3"
services:
  web:
    image: nginx
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: "0.1"
          memory: 50M
      restart_policy:
        condition: on-failure
    ports:
      - "80:80"
    networks:
      - webnet
networks:
  webnet:
$ docker stack deploy -c docker-compose.yml webstack
Creating network webstack_webnet
Creating service webstack_web
$ docker stack ps --format \
  "{{.Name}}: {{.Image}} {{.Node}} {{.DesiredState}}" \
  webstack
webstack_web.1: nginx:latest worker2 Running
webstack_web.2: nginx:latest manager Running
webstack_web.3: nginx:latest worker1 Running
Docker Stacks
$ docker service scale webstack_web=6
webstack_web scaled to 6
overall progress: 6 out of 6 tasks
1/6: running [===================================>]
2/6: running [===================================>]
3/6: running [===================================>]
4/6: running [===================================>]
5/6: running [===================================>]
6/6: running [===================================>]
verify: Service converged
Secrets Management
● It’s best not to have secrets.
● Don’t write secrets down.
● Protect secrets in one place.
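In Swarm mode these rules map onto `docker secret`: the value lives only in the managers' encrypted Raft store and is exposed to granted services as an in-memory file under `/run/secrets/`. A minimal sketch (secret name and service are illustrative):

```shell
# Store the secret in the managers' encrypted Raft log.
printf 'S3cretPassw0rd' | docker secret create db_password -

# Grant a service access; the value appears at /run/secrets/db_password.
docker service create --name db \
  --secret db_password \
  -e POSTGRES_PASSWORD_FILE=/run/secrets/db_password \
  postgres
```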
Secrets Management
[Diagram: three managers (Raft consensus group) share the internal distributed store; services deploy to workers; all node-to-node traffic is TLS-encrypted]
Multi-region
Cloud Infrastructure
Multi-region
Pilot Light
Tying it all together
// wait for it...
Demo Time
● Stand up development Swarm cluster
● Start services
● Start monitoring
● Scale and Load-test
Stand up Dev Swarm cluster
$ git clone \
  https://github.com/siso/vagrant-docker-swarm.git
$ cd vagrant-docker-swarm
$ AUTO_START_SWARM=true vagrant up
Scale and Load-test
Monitor load-tests
Q&A
Simone Soldateschi
Friday: 5.50pm-6.30pm
+ during breaks
@soldasimo
simone.soldateschi@gmail.com
github.com/siso/vagrant-docker-swarm
Recap
● Swarm Mode
● Services and Stacks
● Provision Infrastructure
● Multi-region infrastructure
● Demo
Lessons Learned
● Options are good. Many COEs to choose from.
● Docker Swarm Mode is great for greenfield projects.
● Prototype, then automate. Don't do both at once.
Thank You!
Extras
$ git clone https://github.com/siso/swarmprom.git
$ cd /home/vagrant/swarmprom
$ ADMIN_USER=admin \
  ADMIN_PASSWORD=admin \
  docker stack deploy -c docker-compose.yml mon
$ echo "View Grafana Dashboard at http://$(docker node inspect self --format '{{ .Status.Addr }}'):3000"
Start Monitoring Systems
References
● vagrant-docker-swarm GitHub repo
● swarmprom — Docker Swarm instrumentation with Prometheus, Grafana, cAdvisor, Node Exporter and Alert Manager
● ...
License
Attribution-ShareAlike 4.0
International
Docker Engine
TODO -- Container Network Model
TODO -- Service Discovery

Load-balancing high-available web-app with Docker Swarm cluster. - Simone Soldateschi - Codemotion Rome 2018
