TERRAFORM + CoreOS FOR ETCD CLUSTERS ON AWS
1
ABOUT ME
▸ Software engineer, DevOps by chance
▸ Currently at reBuy.de, helping with the migration to AWS
▸ Previously: 4 years at Amazon (AWS)
2
THE PREMISE
WHAT IS ETCD?
▸ Distributed key-value store
▸ Based on the Raft consensus algorithm
▸ Similar to Consul and ZooKeeper
▸ Stores the shared state of distributed applications (Kubernetes, Fleet, CoreUpdate)
▸ Should be treated like a database
▸ Comes bundled with CoreOS
3
THE ENVIRONMENT
TYPICAL ETCD DEPLOYMENT
▸ Odd number of instances (typically 3 or 5)
▸ Evenly distributed across Availability Zones (AZs)
▸ Low-latency connectivity between nodes
▸ Persistent storage (EBS)
▸ A way to determine the list of nodes
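The "odd number of instances" rule follows from Raft's majority quorum. For a cluster of n members, the quorum q(n) needed to commit a write and the number of member failures f(n) the cluster survives are:

```latex
\begin{aligned}
q(n) &= \left\lfloor \tfrac{n}{2} \right\rfloor + 1 \\
f(n) &= n - q(n) = \left\lfloor \tfrac{n-1}{2} \right\rfloor
\end{aligned}
```

So n = 3 gives q = 2 and tolerates one failure; n = 4 needs q = 3 but still tolerates only one; n = 5 gives q = 3 and tolerates two. With three members spread over three AZs, losing a whole AZ leaves two members, which is still a quorum.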
4
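For the "persistent storage (EBS)" point, a minimal sketch of one data volume per node follows. It is an assumption, not part of the talk: the resource names, the 10 GB gp2 size and the /dev/xvdf device name are hypothetical, and it reuses the aws_instance.etcd_node resource from the "launch nodes" slide. Mounting the volume at etcd's data directory would still need a mount unit in the cloud-config, which is not shown.

```hcl
# Hypothetical: one EBS volume per etcd node, created in the same AZ as
# the instance it will be attached to (EBS volumes are AZ-local).
resource "aws_ebs_volume" "etcd_data" {
  count             = "${var.node_count}"
  availability_zone = "${element(aws_instance.etcd_node.*.availability_zone, count.index)}"
  size              = 10    # GB, arbitrary for this sketch
  type              = "gp2"
}

# Attach volume i to node i.
resource "aws_volume_attachment" "etcd_data" {
  count       = "${var.node_count}"
  device_name = "/dev/xvdf" # assumed device name
  volume_id   = "${element(aws_ebs_volume.etcd_data.*.id, count.index)}"
  instance_id = "${element(aws_instance.etcd_node.*.id, count.index)}"
}
```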
THE PROCESS
BOOTSTRAPPING ETCD
▸ Each node needs to know about all other nodes up front
▸ Bootstrapping is a one-off step, done when the cluster is first created
▸ Peers can be discovered through DNS SRV records
▸ Clients can use the same DNS discovery mechanism
5
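For the "discovery for clients" point, a client-facing SRV record could sit next to the peer record. This is a hedged sketch: the resource name etcd_srv_client is hypothetical, it assumes the etc_a_nodes and etcd_zone resources from the Terraform slides, and it relies on etcd v2 client discovery looking up _etcd-client._tcp.<domain> and pointing at the client port 2379 rather than the peer port 2380.

```hcl
# Hypothetical client-discovery record, mirroring _etcd-server._tcp
# but advertising the client port (2379) instead of the peer port (2380).
resource "aws_route53_record" "etcd_srv_client" {
  name    = "_etcd-client._tcp"
  type    = "SRV"
  ttl     = 300
  zone_id = "${aws_route53_zone.etcd_zone.id}"

  # SRV data: "<priority> <weight> <port> <target>", one entry per node
  records = ["${formatlist("0 0 2379 %s", aws_route53_record.etc_a_nodes.*.fqdn)}"]
}
```

With such a record in place, etcdctl v2's --discovery-srv option (e.g. `etcdctl --discovery-srv cluster.etcd cluster-health`) should be able to find the members without a hard-coded endpoint list.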
THE PROCESS
…ON AWS
▸ Prepare CoreOS configuration (cloud-config)
▸ Launch node instances
▸ Create discovery DNS records
▸ Profit!
6
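The last step is mostly about consuming the cluster. As a hypothetical addition, a Terraform output could publish the client URLs built from the per-node DNS records; it assumes the etc_a_nodes record resource from the Terraform slides, and the output name is illustrative.

```hcl
# Hypothetical output: one client URL per member, e.g.
# http://node-0.cluster.etcd:2379 when the zone is cluster.etcd.
output "etcd_client_urls" {
  value = ["${formatlist("http://%s:2379", aws_route53_record.etc_a_nodes.*.fqdn)}"]
}
```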
TERRAFORM + CoreOS
FINDING THE NODES
▸ Through DNS SRV records
▸ Route53 private hosted zone inside the VPC
▸ Nodes get stable hostnames
(not ip-172-31-2-219.eu-west-1.compute.internal)
7
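The Terraform code on the next slides references aws_route53_zone.etcd_zone without defining it. A minimal sketch of a private hosted zone follows, under stated assumptions: the VPC resource aws_vpc.etcd_vpc and its 10.0.0.0/16 range are hypothetical, and the syntax follows the Terraform 0.11-era style of the slides. AWS only resolves private hosted zones inside VPCs that have DNS support and DNS hostnames enabled, hence the two flags.

```hcl
# Hypothetical VPC hosting the etcd nodes.
resource "aws_vpc" "etcd_vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true # required for private hosted zones
  enable_dns_hostnames = true # required for private hosted zones
}

# Private zone, e.g. "cluster.etcd", visible only inside the VPC.
resource "aws_route53_zone" "etcd_zone" {
  name = "${var.cluster_domain}"

  # Associating a VPC makes the zone private. Older AWS provider versions
  # take vpc_id; newer ones use a nested vpc { vpc_id = ... } block instead.
  vpc_id = "${aws_vpc.etcd_vpc.id}"
}
```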
TERRAFORM + CoreOS
STABLE HOST NAMES
resource "aws_route53_record" "etcd_srv_discover" {
  name    = "_etcd-server._tcp"
  type    = "SRV"
  # SRV data is "<priority> <weight> <port> <target>"; 2380 is the etcd peer port
  records = ["${formatlist("0 0 2380 %s", aws_route53_record.etc_a_nodes.*.fqdn)}"]
  ttl     = 300
  zone_id = "${aws_route53_zone.etcd_zone.id}"
}

resource "aws_route53_record" "etc_a_nodes" {
  # One A record per node (node-0, node-1, ...) pointing at its private IP
  count   = "${var.node_count}"
  type    = "A"
  name    = "node-${count.index}"
  records = ["${aws_instance.etcd_node.*.private_ip[count.index]}"]
  ttl     = 300
  zone_id = "${aws_route53_zone.etcd_zone.id}"
}
$ dig _etcd-server._tcp.cluster.etcd SRV
_etcd-server._tcp.cluster.etcd. 183 IN SRV 0 0 2380 node-0.cluster.etcd.
_etcd-server._tcp.cluster.etcd. 183 IN SRV 0 0 2380 node-1.cluster.etcd.
_etcd-server._tcp.cluster.etcd. 183 IN SRV 0 0 2380 node-2.cluster.etcd.
8
TERRAFORM + CoreOS
CONFIGURING CoreOS
▸ CoreOS ships its own cloud-init implementation, supporting a subset of cloud-config
▸ The config is passed as EC2 user-data
▸ A Terraform template data source renders the user-data for each node
▸ It must include the node's hostname and the DNS domain used for discovery
9
TERRAFORM + CoreOS
CoreOS CONFIG AS USERDATA
#cloud-config

hostname: ${node_name}

coreos:
  update:
    # Coordinate auto-update reboots through a lock held in etcd
    reboot-strategy: "etcd-lock"
  etcd2:
    name: ${node_name}
    # Peers are looked up via SRV records under this domain
    discovery-srv: ${cluster_domain}
    # $private_ipv4 is substituted by CoreOS on EC2, not by Terraform
    listen-peer-urls: "http://$private_ipv4:2380"
    listen-client-urls: "http://0.0.0.0:2379"
    initial-advertise-peer-urls: "http://${node_name}:2380"
    advertise-client-urls: "http://${node_name}:2379"
  units:
    - name: "etcd2.service"
      enable: false
      command: start

data "template_file" "userdata" {
  # One rendered cloud-config per node
  count    = "${var.node_count}"
  template = "${file("${path.root}/../resources/userdata.yaml")}"

  vars {
    node_name      = "node-${count.index}.${var.cluster_domain}"
    cluster_domain = "${var.cluster_domain}"
  }
}
10
TERRAFORM + CoreOS
LAUNCH NODES
11
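resource "aws_instance" "etcd_node" {
  count         = "${var.node_count}"
  ami           = "${data.aws_ami.coreos_ami.id}"
  instance_type = "t2.medium"
  # Node i is placed in subnet i (az_subnet: one subnet per node, spread across AZs)
  subnet_id     = "${aws_subnet.az_subnet.*.id[count.index]}"
  key_name      = "${aws_key_pair.ssh-key.id}"
  # Per-node cloud-config rendered by the template_file data source
  user_data     = "${data.template_file.userdata.*.rendered[count.index]}"
}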
$ terraform apply
core@node-1 ~ $ etcdctl cluster-health
member 5bea3befcd2b527d is healthy: got healthy result from http://node-2.cluster.etcd:2379
member bfc4d7d3459cc4cb is healthy: got healthy result from http://node-1.cluster.etcd:2379
member d1b3f464b49063ac is healthy: got healthy result from http://node-0.cluster.etcd:2379
cluster is healthy
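The aws_instance resource on this slide refers to aws_subnet.az_subnet and expects one subnet per node. A possible definition, spreading the subnets round-robin over the region's AZs, is sketched here; the VPC resource aws_vpc.etcd_vpc is an assumption (the slides do not show it), as is the /24-per-subnet layout.

```hcl
# All AZs available to the account in the current region.
data "aws_availability_zones" "available" {}

# Hypothetical: one subnet per etcd node, assigned to AZs round-robin.
resource "aws_subnet" "az_subnet" {
  count  = "${var.node_count}"
  vpc_id = "${aws_vpc.etcd_vpc.id}"

  # element() wraps around: with three AZs, node 3 lands in the first AZ again.
  availability_zone = "${element(data.aws_availability_zones.available.names, count.index)}"

  # Carve a /24 per subnet out of the VPC range (assuming a /16 VPC).
  cidr_block = "${cidrsubnet(aws_vpc.etcd_vpc.cidr_block, 8, count.index)}"
}
```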
TERRAFORM + CoreOS
DEMO?!
honourable crowd
12
TERRAFORM + CoreOS
THAT'S IT!
Take-aways:
▸ etcd operations are deliberately “manual”
▸ etcd needs a source of truth for its member list (here, Terraform)
▸ Auto-scaling is possible, but discouraged
▸ Route53 is useful for service discovery
13
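In the code, the "source of truth" is concrete: var.node_count and var.cluster_domain drive every per-node resource. The slides do not show their declarations; a plausible variables file, with defaults taken from the demo (three nodes, cluster.etcd), might look like this. The descriptions are illustrative.

```hcl
# Hypothetical declarations for the variables used throughout the module.
variable "node_count" {
  description = "Number of etcd members; keep it odd (3 or 5)"
  default     = 3
}

variable "cluster_domain" {
  description = "Private DNS domain used for SRV discovery"
  default     = "cluster.etcd"
}
```

Raising node_count on a running cluster does not by itself grow etcd's membership: in etcd v2 a new member is normally registered with `etcdctl member add` before it starts, which is the deliberately manual step mentioned in the take-aways.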
TERRAFORM + CoreOS
QUESTIONS?
Terraform module at:
https://github.com/alexsomesan/tf-simple-etcd
Get in touch!
alex.somesan@gmail.com
@ASomesan
14

Etcd terraform by Alex Somesan
